Wikipedia:WikiProject Languages

This WikiProject aims primarily to provide a consistent treatment of each human language on Wikipedia. Many languages already have extensive pages, but the systematic information on those pages is not presented in a consistent way. The purpose of this WikiProject is to present that information consistently, and to ensure that each of the major areas is covered at least briefly for each language.

These are only suggestions, things to give you focus and to get you going, and you shouldn't feel obligated in the least to follow them. However, try to stick to the format for the Infobox for each language. See the template for an example Infobox.

The easiest way to get started writing for a language that doesn't already have an article or to convert an article to the WikiProject format is to start with the template.

Article alerts

Articles for deletion

Requested moves

Articles to be merged

Articles to be split

Articles for creation

Quality articles

Featured articles marked in bold have appeared on the Main Page.

Article assessment

Place the {{WikiProject Languages}} project banner template on the talk pages of any language-related articles. To rate the article on the quality scale, add one of the following parameters:

  • class=FA for featured articles
  • class=A for A-class articles
  • class=GA for good articles
  • class=B for B-class articles
  • class=start for Start-class articles
  • class=stub for Stub-class articles (which may not necessarily have a "stub" message on them!)
  • class=NA for non-articles (templates, images, etc.)

See WP:GRADES for pointers on classification.
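
As a rough illustration of how the banner and a rating parameter combine, the sketch below builds the wikitext string in Python. The helper name `banner_wikitext` and the `VALID_CLASSES` tuple are hypothetical and only mirror the list above; the pipe separator is ordinary template-parameter syntax.

```python
# Quality classes as listed in this section (illustrative constant, not an official list).
VALID_CLASSES = ("FA", "A", "GA", "B", "start", "stub", "NA")


def banner_wikitext(quality_class=None):
    """Return the project banner wikitext, optionally with a class rating."""
    if quality_class is None:
        return "{{WikiProject Languages}}"
    if quality_class not in VALID_CLASSES:
        raise ValueError("unknown class: " + repr(quality_class))
    # Template parameters are separated from the template name by a pipe.
    return "{{WikiProject Languages|class=" + quality_class + "}}"


print(banner_wikitext("B"))  # {{WikiProject Languages|class=B}}
```

The resulting string is what would be pasted at the top of the article's talk page.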

Statistics

Index · Statistics · Log


Article names

The guidelines for article titles for languages are at Wikipedia:Naming conventions (languages). In short, most language articles should be titled XXX language. Reasons for this recommendation:

  1. Ambiguity. While some languages have special forms that refer unambiguously to the language, English is inherently ambiguous about language names. Having a standard of "XXX language" ensures that the title is always unambiguous.
  2. Precedent. This is how Encyclopædia Britannica and many other English-language encyclopedias name their articles.

When there is nothing to disambiguate a language name from, such as Hindi, Esperanto or Inuktitut, there is no need for the "language".

Whether the varieties of Arabic and Chinese should be called "languages" or "dialects" continues to be a highly controversial issue. The current convention is: use NAME + Arabic for Arabic varieties (e.g. Egyptian Arabic) and NAME + Chinese for Chinese varieties (e.g. Mandarin Chinese). Infoboxes are put at both Arabic language and Chinese language and at their first-level subdivisions. However, where there is little controversy that a variety of Arabic or Chinese is a dialect (when it is demonstrably intelligible to speakers of other dialects), then 'dialect' is acceptable in the title.

Even where there is consensus that the varieties of a language have dialect status, the number of such dialects and the divisions between them are often vaguely defined, and dialectologists disagree over whether certain varieties should be treated in a unified way or are best understood as separate though related varieties. Separate articles should only be written on varieties (e.g., Estuary English) or related groups of varieties (e.g., Hispanic English) that have been studied well enough by linguists that at least a minimal body of literature exists about that variety or group of varieties as a distinct dialect or group of dialects. Phonological, morphosyntactic, or lexical variation that may be considered subdialectal should be noted as "differences within X dialect", where X is a dialect as discussed in the relevant literature. Controversies over dialect status can be noted as such in articles, but should also be based on citable work. In the title, names used in the literature to refer to that dialect should be preferred over folk-linguistic terms (e.g., Inland North versus Midwestern Accent).

Article structure

If you would like to create an article on a new language, you can use {{subst:New language article}} to help streamline the process. An example structure and explanation of the sections can be found at /Template for oral languages and /Template (sign language) for sign languages. Language articles are subject to Wikipedia's inclusion criteria.

Open tasks

General

Updates

Population data has been mostly updated from Ethnologue 16 to 17. However, an unknown number of articles which did not have the ref field set to "e16" slipped through the cracks; an example is Cumanagoto, which did not have a ref'd population figure because E16 had mistakenly listed it as extinct. Articles which are not ref'd to Ethnologue could be checked in case E17 has a more recent figure.

User:PotatoBot helps keep ISO redirects in sync with changing WP articles and ISO standards. The results of the latest run are displayed at ISO 639 log and ISO 639 language articles missing.

Names at Spurious_languages#Spurious_according_to_Glottolog with asterisks have not been addressed.

Articles to be created

Red links should either be redirected or have their own articles.

Articles with red links

99.9% of ISO language names have articles, though not always one-to-one (e.g. Fulani, Zhuang, and Mazatec); the remaining names are spurious, dubious, or insufficiently attested to justify their own article, and are redirected to an article stating that.

Lists for evaluation

The lists below are of self-links in our articles, language names from various sources which do not have articles or redirects, and suspicious cases to keep track of.

Lists of obscure names from common refs
INALI
  • 48 at INALI names for Mexican languages (27 Mixtec & 6 Nahuatl to be reviewed; 12 Zapotec & 3 others attempted). Even blue links may be wrong, due to confusion of similar town names or misidentification at Ethnologue.
AIATSIS
  • 7 potential languages with data. The AIATSIS database is periodically updated, with new languages confirmed.
Ethnologue 11
  • Holima ["near Dobu" – misreading of Molima?], Waelulu ["existence unconfirmed"; taken from V&V]
Voegelin (1977)
36 red-linked names; the list doesn't bother with red links for what Loukotka says is unattested.
Blue links have not been checked. Many are presumably inadvertent homonyms rather than the language intended by V&V.
Ruhlen (1987)
  • S.Am.: 12 (see key) extremely obscure names of mostly unattested languages, not even listed in Campbell & Grondona 2012, and for only a few does Loukotka say anything other than 'unknown'. Those not found in Loukotka might be copy errors.
There are also at least half a dozen names in Ruhlen which take you to what is apparently the wrong article. One is a typo, 3 are unidentified, and 2 have perhaps just been reclassified.
Campbell & Grondona
Linguist List local-use ISO
Glottolog
25 at Talk:Glottolog#Unclassified_languages
93 more at Wikipedia:WikiProject Languages/Glottolog languages without ISO codes -- both for Glottolog 2.2
Circular and suspicious links
Identity suspect
Nshi, Sotatipo, Lui, Pasto (wrong ISO?), Kanamarí and Karipuná (contradicted by E17), Gulei (marked "?" in list), Sonde, Ngoni, Pretoria-Tsonga (marked "§" in list) & Mangala
Circular links of ISO names with summary data
Loloish, Qiangic (3 listed + old name Pingfang, which I can't ID), unclassified Asian (Bhatola: presumably a Gond dialect, Warduji: presumably a Persian dialect), Hindi (Ghera: Pakistani enclave of unidentified Indian language), conlang codes (Kotava, Romanova: old articles were deleted as not-notable)
Cases to track
No 1-to-1 correspondence to ISO
Tracking only; no need to fix.
Gbaya language (Central African Republic), Gbaya language (Sudan), Syriac language
ISO languages without info box
Typically because there are problems in defining the language. Tracking only; no need to fix.
Minor languages covered in family article: Loloish (4)
Language uncertain: Mina, Majhwar
Rd. to script or history article: Epi-Olmec (undeciphered), Ancient Zapotec, Middle Korean
Rd. to spurious-language article: Parsi-Dari, Parsi, Tapeba
Newly discovered or unattested languages without ISO codes
Lubu (unattested and extinct)
Cuyama (unattested and extinct)

Requests for expansion

Images for articles in Category:Wikipedia requested photographs of languages.

Requests for attention

(no article Ashéninka people; Keres functions as the lang article but reads as a family article)

Tagged categories

Category:Articles lacking sources

Only language varieties are included here. Subjects such as 'French language in Jordan' and 'Westernized Chinese language', though in bad shape, are not listed because they would not be representative of the many unreferenced articles that are not about specific varieties.

  • 2004–2014: (only articles with 'language', 'dialect', 'creole', or 'pidgin' in name are included; distilled from an insane number of articles)
English: Jewish English languages
Germanic: Central Franconian dialects, Eastphalian dialect, Hamburgisch dialect, Norwegian dialects, Orsamål dialect, Ripuarian language, Sognamål dialect
Romance: Chipilo Venetian dialect, Comasco-Lecchese dialects, Fornes dialects, Pavese dialect, Sabino dialect, Sutsilvan dialects (Romansh)
Slavic: Debar dialect, Reka dialect, Strumica dialect
Maltese: Qormi dialect, Żejtun dialect
Chinese: Luoyang dialect, Mango dialect, Qihai dialect, Weihai dialect, Ningbo dialect, Ganyu dialect, Fu'an dialect, Xuzhou dialect
other: Kfar Kama Adyghe dialect (Adyghe), Enuani dialect (Igbo), Thanjavur Marathi dialect, South Korean standard language

Category:Orphaned articles

(same search terms as missing sources)

Ordek-Burnu language (moved to 'stele')

Open ISO issues

The following ISO change requests from previous years were still open as of January 2016. The articles should be updated if the requests are accepted. (See the current list, reviewed to 2016-06.)

Old open ISO change requests[3]
2006-084  gkm  Medieval Greek        Create
2009-060  ecg  Ecclesiastical Greek  Create
2009-081  elr  Katharevousa Greek    Create
2011-041  vsn  Vedic Sanskrit        Create
2011-165  jpd  Pando                 Create (removed from list)
2011-171  jkt  Kantana               Create (removed from list)
2012-090  lgo  Looma macrolanguage   = Toma [tod], Loma [lom] (removed from list)
2015-005  fmu  Far Western Muria     Update (removed from list)

Articles proposed for deletion

including WP:AFD, WP:PROD and other processes

Articles to watch

The following are language articles which come under repeated POV attack, often for ethnic or nationalistic reasons. Feel free to add ones you've noticed, and to remove languages which have not been a problem for some time. That way, if one of us drops out from editing, the articles we've been watching hopefully won't go to pot.

(Note: Ethnologue 17 and the Swedish Nationalencyklopedin use Indian census data, which is not a RS because it does not have a consistent definition of Hindi. For example, part of the Awadhi population is listed under Awadhi, but most is counted as Hindi. This problem is acknowledged in the presentation of the census results, but has been lost in secondary sources.)
  • Serbo-Croatian & Croatian (subject to ARBMAC)
  • Saraiki dialect, Punjabi dialects, and "Panjistani" (requires text searches to purge repeated additions of contradictory claims of "Panjistani" to multiple articles)
  • Southern Luri language. It may be worthwhile splitting the Luri article, but so far the attempts to do so have been incompetent and motivated by OR redefinition of the language. The present description of the two varieties in the Luri article is so intertwined that splitting them would create something close to a content fork. — kwami (talk) 02:32, 4 September 2015 (UTC)
  • Assyrian Neo-Aramaic and Chaldean Neo-Aramaic, along with the ethnic articles. A seemingly chronic ethnic dispute.
  • Luganda and Baganda: deletion of ISO name
  • Misleading maps: Many national languages have had maps with half the world filled in because of emigration, with no apparent standard for what counts as a speaking population. Most of these will be caught by checking the top 100 at List of languages by number of native speakers.

Interpreting Ethnologue and Glottolog data

Ethnologue is the default source for language data on WP. There are several advantages to Ethnologue: for many languages, it's the only demographic data we have; for others, it provides a check on the politicization and population inflation that we experience when we allow advocates of a language to cherry-pick sources. Nonetheless, Ethnologue data needs to be carefully evaluated, and if possible, their sources should be verified and cited directly, or better sources used instead of Ethnologue where these are known. There are a few common and serious problems:

  • The family trees are auto-generated, and should not be relied on. Auto-generation is skewed by idiosyncratic entries in the language articles. In E16, for example, the Maban family was listed as a branch of the Luo languages, because one of the Luo languages was named Maban; meanwhile, there were two separate Luo branches of Nilotic due to the spelling of "Luo" not matching across articles. The more obvious problems of this sort have been remedied in E17, but the trees are still not a RS for classification, and the nodes are not RSs for the languages in a particular group. Many of our articles say that there are X languages in the Y branch, based on Ethnologue, but all that can be relied on is the classification cited in individual Ethnologue articles.
  • Speaker data is inconsistent. For instance, in E14, Gawwada was cited as having 32,698 mother tongue speakers, including 27,477 monolinguals, based on the 1998 census. In E17, it is cited as having 68,600 speakers based on the 2007 census, but still 27,500 monolinguals. There is no reason to think that the percentage who are monolingual has changed drastically in ten years, so adding the cited number of monolinguals to a Wikipedia article would be irresponsible. Similarly, the cited size of the ethnic group may be only half the cited number of speakers, due to it being several decades older. If the number of monolinguals or ethnic members is not given a citation date by Ethnologue, it is useless and should not be repeated by us. The number of speakers and the dialects of the language may be from different sources, with the result that the number of speakers may not be that of all dialects. Very commonly, when a language is named after one of its dialects, the speaker number is that of the dialect, not of the language as a whole. Also, a language may be split up into separate ISO codes with the result that one article covers one variety but inherits the number of speakers of all varieties from the old article. Ethnologue has handled this well in recent years, but has not been able to go back and fix such errors inherited from old editions.
  • Ethnologue's arithmetic is consistently bad. For instance, Ethnologue lists five Central Iranian languages as having had 7,030 speakers reported in 2000. It appears that their source listed 35,000 speakers total, and Ethnologue divided that figure by 5 for the individual articles, with no indication that the result was no more than a guess. This kind of problem is not uncommon. Even more commonly, Ethnologue will add together incompatible data from various sources, paying no attention to significant figures. For example, if one source reported 2 to 5 million speakers in country A in 1975, and another 5 to 10 thousand in country B in 2006, Ethnologue will report the total as 3,507,500 speakers (3.5 million, the median of 2 and 5 million, plus 7,500, the median of 5–10,000). Old editions such as E14 are actually more reliable in this regard, as they tend to note that the estimate for country A was 2 to 5 million, when later editions will simply report 3.5 million as if that were the figure in the source. If the original source cannot be verified, we should at least look at each of the figures that make up the total and redo the math, so that we avoid spurious precision as much as practicable.
  • Dates are not reliable indicators of when the data was taken. Unless they are census data, which has the problem all censuses do of speakers intentionally misreporting their language, the dates given by Ethnologue are the date of publication of their source. They can be several decades after when the data was collected. The result is that an old date may report the same or more recent data than a newer date. For instance, several Australian languages are cited as "SIL 2011" in E17. However, in E16 they all had the same numbers of speakers cited to "Wurm and Hattori 1983". In other cases the source that Ethnologue uses may cite an old edition of Ethnologue, or the source that Ethnologue used in an old edition. And the sources themselves may have problems that are not mentioned in Ethnologue. For instance, one source from the 1990s notes that its numbers are copied from a publication from the 1980s that was based on field work in the 1950s. In the Ethnologue entry, however, only the date from the 1990s is given. For another example, the data for the Hindi languages was updated between E16 and E17, based on the new Indian census. However, the census makes it clear that many Awadhi speakers, for example, reported their language to be "Hindi" rather than Awadhi. The result is that the E17 figure for Hindi is inflated by perhaps 100 million people who should be listed under other languages, but there is no warning about this in Ethnologue. Many entries are also undated. Some of these are recent oversights that will be fixed in the next edition, but many are inherited from old editions of Ethnologue. In such cases, citing the edition of Ethnologue that first reported the figure might give the reader some indication that it is not recent data.
  • Figures may be ethnic numbers and an order of magnitude greater than the actual number of speakers. A good start in cleaning this up has been made in E17, but it's not clear how complete it is.
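
The checks recommended in the speaker-data and arithmetic points above can be sketched in a few lines of Python. This is only an illustration using the figures quoted in those points; the function names (`round_sig`, `combine_ranges`, `monolingual_share`) are hypothetical, and rounding the combined range to one significant figure is an assumption about what counts as avoiding spurious precision, not a project rule.

```python
from math import floor, log10


def round_sig(x, sig):
    """Round x to `sig` significant figures."""
    if x == 0:
        return 0
    digits = sig - 1 - floor(log10(abs(x)))
    return round(x, digits)


def combine_ranges(ranges):
    """Add (low, high) estimates and return the combined (low, high) range."""
    low = sum(lo for lo, _ in ranges)
    high = sum(hi for _, hi in ranges)
    return low, high


def monolingual_share(monolinguals, speakers):
    """Fraction of speakers reported as monolingual."""
    return monolinguals / speakers


# Country A (1975): 2 to 5 million; country B (2006): 5 to 10 thousand.
estimates = [(2_000_000, 5_000_000), (5_000, 10_000)]

# Adding the midpoints reproduces the spuriously precise total.
naive_total = sum((lo + hi) / 2 for lo, hi in estimates)
print(naive_total)  # 3507500.0

# Keeping the range and rounding shows what the sources actually support.
low, high = combine_ranges(estimates)
print(round_sig(low, 1), round_sig(high, 1))  # 2000000 5000000

# Gawwada: the monolingual share implied by E14 versus E17.
print(f"{monolingual_share(27_477, 32_698):.0%}")  # 84%
print(f"{monolingual_share(27_500, 68_600):.0%}")  # 40%
```

The combined range is dominated by the country A estimate, so "2 to 5 million" is the most that can responsibly be reported; the drop in implied monolingual share from 84% to 40% is the kind of inconsistency that signals mismatched sources.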

Such problems are understandable: Ethnologue is an enormous project with a very small editorial team and budget. For years, Ethnologue had a reputation for being unresponsive, so many linguists do not bother to correct the errors they find, but since ca. 2012 they have been appreciative of feedback, and the quality of their material has improved markedly.

Linguist List / MultiTree is a former undergrad student project that includes a large number of language names not found in Ethnologue, but their identification is highly unreliable, and can often be seen to be spurious with even a cursory glance at the literature. They should not be considered a reliable source, and since the creation of Glottolog they are no longer of much value as a source of references.

Glottolog[4] is for the most part a reliably cited and researched alternative to Ethnologue. It often does a superior job, for instance in verifying and updating the classifications it adopts, in marking languages as 'spurious' when they cannot be verified to exist, and most importantly in citing its sources. But it is largely the work of a single person (Harald Hammarström), and he has not had the time to improve on Ethnologue for all the languages of the world. In many places, Glottolog language inventories are copied from Ethnologue and it is not an independent source. At first, Glottolog linked to or cited Ethnologue for most such material, but since Ethnologue put up its paywall, Glottolog has deleted acknowledgement, often replacing it with sources which themselves are simply copies of Ethnologue. Although in many cases Hammarström has personally vetted sources, even to the extent of doing his own comparison of the raw lexical or morphological data to evaluate which classification is the most accurate, his work is not always transparent, and it may be difficult to tell whether a classification is his own work (in which case the classification does not match any of the cited sources), the work of the most credible published source, or an unacknowledged copy of some other source. Also, Glottolog should not be used for dialects, which it blindly copies from MultiTree simply for completeness but where it makes no attempt at reliability.

Global Recordings Network copies much of its data from Ethnologue, misidentifies alternative names as languages, and contradicts itself with speaker numbers.

In all these cases, primary sources should be used to check for the accuracy of such claims.

Templates

Infoboxes

Project banner

Please add {{WikiProject Languages}} to talk pages of relevant articles. Articles with this template are put into Category:WikiProject Languages articles.

Stubs

Language stubs should be tagged with the most appropriate template of these:

Userbox

After you sign up, you can add the project userbox to your user page by adding the following: {{User WikiProject Languages}}. Your username will then automatically be added to the Category:WikiProject Language members.

Related WikiProjects

This WikiProject is a descendant of WikiProject Linguistics. It has descendants of its own, most of which aren't particularly active at present:

See also:

Project volunteers

If you'd like to help out, be contacted by others interested in this WikiProject's subject, and receive task assignments and project-related updates on your talk page, please add your name here:

Categories
