About the Catalogue of the Endangered Languages of the World

Many languages of the world are at risk of extinction in the near future. The crisis of endangered languages is one of the most serious issues facing humanity today, posing moral, practical, and scientific problems of enormous proportions. This catalogue informs users about the plight of endangered languages and encourages efforts to slow the loss. It provides information on the endangered languages of the world as a resource for the public, for scholars, for those whose languages are in peril and community groups facing language loss, and for funding agencies that need to deploy limited resources.

Until this Catalogue of Endangered Languages, there was no single reliable source of information on the endangered languages of the world describing how endangered each language is and to what extent it has been documented. For many of the languages in this catalogue, little or no accessible information exists. For others, the existing sources are often inaccurate, unreliable, or inaccessible. For those seeking to understand where documentation efforts and resources might most effectively be directed, and where language conservation or revitalization efforts are most needed, it is important to know not only how critically endangered a language is, but also how well it has already been described, how different or unique it might be, and how further description might contribute to understanding of human language in general. This catalogue presents these kinds of information on the endangered languages of the world.

The Catalogue of Endangered Languages Project personnel.

The Catalogue of Endangered Languages is under the direction of Lyle Campbell (University of Hawai‘i Mānoa) and Anthony Aristar and Helen Aristar-Dry (LINGUIST List/Eastern Michigan University).
The team at Eastern Michigan University (EMU) is responsible for the programming, technical aspects of the Catalogue, bibliography management, and for the languages of Africa and Australia. The University of Hawai‘i Mānoa (UHM) team is responsible for the languages of Europe, the Caucasus, North Asia, East Asia, South Asia (the Indian subcontinent), Southeast Asia, North America, Central America, South America, and the Pacific, and for the endangerment scale and the documentation need scale.

The following individuals contributed to Phase I of the data collection:

EMU team: Dr. Anthony Aristar, Dr. Helen Aristar-Dry, Anna Belew (Team Leader), Lwin Moe, Kristen Dunkinson, Amy Brunett, Brent Woo

UHM team: Dr. Lyle Campbell, John Van Way (Project Coordinator), Huiying Nala Lee, Eve Okura, Dr. Kaori Ueki

The initial catalogue content was prepared by the members of these two teams, with some input from the Regional Directors: experts on the languages of specific regions who help to correct and expand the catalogue, and whose primary role begins in Phase II. The Regional Directors are:

Willem F.H. Adelaar (South America)
Greg Anderson (South Asia)
I. Wayan Arka (Indonesia)
Claire Bowern (Australia)
Matthias Brenzinger (Africa)
Lyle Campbell (the Americas)
Alice C. Harris (Caucasus)
Brian Joseph (Europe)
Juha Janhunen (Northern and Central Eurasia)
Bill Palmer (Pacific)
Keren Rice (North America)
David Solnit (East and Southeast Asia)

Acknowledgements.

The research for the Endangered Language Catalogue project is funded by a grant from the National Science Foundation: Collaborative Research: Endangered Languages Catalog (ELCat), BCS-1058096 to the University of Hawai'i at Mānoa (Principal Investigator, Lyle Campbell) and BCS-1057725 to Eastern Michigan University (Principal Investigators Helen Aristar-Dry and Anthony Aristar).
The goals and basic organization of the Catalogue were established in an international workshop with some 50 specialists from around the world, supported by National Science Foundation grant Collaborative Research: Endangered Languages Information and Infrastructure Project (NSF 0924140).

This is just the beginning.

It is extremely important to understand that the Catalogue is a work in progress. At the launch of this website, the Catalogue is still in Phase I, which is based only on the information available in existing publications and web resources about the individual endangered languages. Bringing in more recent and local information is critical to this project and is the focus of Phase II. The second phase will continue over the next two years. It involves an international team of regional specialists (see above) reaching out to knowledgeable individuals and organizations to fill in the missing information for languages in their areas, to check the accuracy of information, and to make needed corrections. For this phase and long into the future, the goal is to modify, update, and improve the catalogue contents constantly, as new information becomes available or as the situation for particular languages changes. If users of this website have particular knowledge or information about specific languages, we encourage submission of comments and suggestions for improvement of language entries. We are grateful for your help in improving the collective knowledge of the endangered languages.

The Language Endangerment Index and the Need for Documentation Index presented for each language are not meant to be the final word about degree of endangerment or extent of documentation. The scores for individual languages will change as more information becomes available. They are provided for practical purposes, to give a quick but rough visual indication of a language’s endangerment status and documentation needs.
The level of certainty accompanying each language shows the degree of confidence in the score: a label of “uncertain” may indicate that the level is not yet known, or that the score has been computed but further evaluation is needed.

How the Catalogue handles tough questions.

Some may wonder how differences of opinion have been handled. For example, some language varieties are believed to be independent languages by some scholars but are considered only dialects of a single language by others. In cases where it is not clear whether separate languages are involved or just dialects of one language, the entity in question is given its own entry as a potentially distinct language, but with the different opinions noted. In cases where the evidence is clear that only dialects of a single language are involved, the entities are joined in a single entry, but with differences of opinion registered. Similarly, in cases where the evidence is clear that separate independent languages are involved, though some believe they are dialects of a single language, these are given separate entries in the catalogue, with a description of the different interpretations. The thorny issue of distinguishing dialects from closely related languages is avoided simply by giving doubtful entities their own entries with comments representing the range of opinion. As more comes to be known, it will be possible to resolve the status of many of these entities; for others, the status may just remain unclear.

This benefit-of-the-doubt approach to inclusion in the Catalogue, however, means that it is not possible just to count the total number of entries in the catalogue to get an absolute number of how many endangered languages there are in the world. Almost certainly some entities given their own entry will turn out to be only dialects that need to be joined in a single entry as representatives of a single language, reducing the total number of entries in the Catalogue.
This approach results in the total number of entries being greater than the absolute number of true languages that are endangered, though hopefully not by a very large margin.

Opinions differ also over the word “extinct.” In cases where there have been no known speakers for hundreds or thousands of years, extinction is clear. However, there are cases where one source says “extinct,” “probably extinct,” “possibly extinct,” or “no known speakers,” and another credible source reports some speakers. In unclear instances, we include the language in the Catalogue, but report the conflicting designations. This means that almost certainly some languages in the Catalogue are in fact extinct – not just endangered – though definitive information is not yet available. As work on the Catalogue progresses, more accurate information on these cases will be obtained and their situations clarified. However, this means that it is not possible to take the total number of entries in the Catalogue as the absolute number of endangered languages in the world today, since some of these languages will prove not to be just endangered, but in fact extinct. There are 133 entries in the Catalogue that fall into this category (see “Silent Languages”).

The word “extinct” raises other questions. Some scholars consider a language extinct when there are no longer any completely fluent native speakers who learned the language as children from the previous generation. Often, however, even after there are no fully fluent native speakers, there remain speakers with some aptitude in the language, others with passive knowledge, and others who have learned or are learning their heritage tongue as a second language.
Many oppose calling these languages “extinct.” Some do not consider languages with any of these sorts of speakers (even if not fluent) as extinct, and they recommend avoiding premature declaration of extinction: for those attempting to learn or revitalize their languages, it can be demoralizing to read that their language is deemed dead. In order not to discourage learning and revitalization efforts in these situations, they recommend reporting these languages as having “no known speakers,” or something equivalent. This practice of avoiding the word “extinct” in such situations is followed in this catalogue, though when the number of native speakers is given as Ø, that is an indication that the language in question falls into this category, a language with no known speakers.

Endangered languages: Why so important?

Language extinction is not new – languages have been dying since ancient times. However, languages are becoming extinct today at an alarming rate. Of the nearly 7,000 languages in the world today, some 3,000 (over 40%) are endangered; many others will make their way into this catalogue in the near future. Experts predict that in the worst-case scenario 90% of all languages will be extinct within 100 years; in best-case scenarios, only 50% will survive, and just 10% are considered safe during the next century (see Krauss 1992). Languages not being learned by children are not just endangered but doomed. Of the Native American languages of the US, 90% are not being passed on to a new generation; 90% of Australian Aboriginal languages and over 50% of the minority languages of Russia are in a similar situation. There were 312 American Indian languages in use when Europeans first arrived in North America; of these, 123 (40%) are extinct and others were lost without record. In the US, of the 280 languages known from the time of first European contact, only 151 still have speakers (54%), but all are endangered.
Only 20 of these (13%) are being learned by children, but by ever fewer children each year. Most of these languages will be extinct in your lifetime, if language revitalization programs are not successful. California illustrates the crisis: at the time of the Gold Rush (c. 1850), California had about 100 Native American languages; only 50 of these survive with speakers, but none is being learned by children in the normal way – the youngest remaining speakers are well into senior-citizenhood.

The disappearance of an individual language constitutes a monumental loss of scientific information and cultural knowledge, comparable in gravity to the loss of a species, for example the Bengal tiger or the white whale. However, the extinction of whole families of languages is a tragedy comparable in magnitude to the loss of whole branches of the animal kingdom (classes, orders, families), for example to the loss of all felines or all cetaceans. Just as it would be difficult to understand the animal kingdom with major branches missing, it is impossible to understand the history and classification of human languages with the loss of entire language families. Yet this is what confronts us: already all the languages belonging to 108 of the 420 independent language families (including isolates) of the world are extinct – a staggering 26% of the linguistic diversity of the world is gone forever.

Why should you care?

We should all be concerned over the crisis of language loss for compelling reasons.

(1) Human concerns. Languages are treasure houses of information on literature, history, philosophy, and art. Their stories, ideas, and words help us make sense of our lives and the world round us.
For example, the life-enriching value of literature is well understood, and it holds also for the oral literatures of the indigenous peoples of the world – they, too, have grappled with the complexities of their world and the problems of life, and the insights and discoveries represented in their literatures are of value to us all. When a language becomes extinct without documentation, taking all its oral literature, oral tradition, and oral history with it into oblivion, we are all diminished. There are also great reservoirs of historical information to be recovered from the study of languages. The classification of related languages teaches us about the history of human groups and how they are related to one another, and we gain understanding of contacts and migrations, the original homelands where languages were spoken, and past cultures from the comparison of related languages and the study of language change – all irretrievably lost when a language becomes extinct without adequate documentation.

(2) Lost knowledge. Specific knowledge is often held by the smaller speech communities of the world – knowledge of medicinal plants and cures, identification of plants and animals yet unknown scientifically, new crops, etc. When the language is not learned by the next generation, the knowledge of the natural and cultural world encoded in the language typically fails to be transmitted. Loss of such knowledge could have devastating consequences for humanity. For example, the Seri (of Mexico, only 700 speakers) use xnois ‘eelgrass seed’ (Zostera marina L.) as a food. This is “the only known grain from the sea used as a human food source” and “it has considerable potential as a general food source … Its cultivation would not require fresh water, pesticides, or artificial fertilizer” (Felger and Moser 1973).
It is easy to imagine a future in which natural or human-caused catastrophes compromise land-based crops, leaving human survival in jeopardy if we lose knowledge such as this.

Medicines provide similar examples. Seventy-five percent of plant-derived pharmaceuticals were discovered by examining traditional medicines, and the languages of curers often played a key role. If these languages had become extinct and knowledge of the medicinal plants and associated cures had been lost in the process, all of humanity would have been impoverished and our survival as a species left more precarious. Paul Cox worked with Epenesa Mauigoa, a taulasea (traditional healer) on Upolu, Samoa, and they described 121 herbal remedies. Their work led to knowledge of the mamala plant (Homalanthus nutans) and the anti-viral drug prostratin, used to treat yellow fever. In trials at the National Cancer Institute, it also proved effective against HIV Type 1 (Cox 1993, 2001). Loss of this endangered traditional Samoan knowledge would have been a loss for all of humanity.

(3) Scientific understanding of human language. Linguists have the goal of understanding what is possible and impossible in human languages, and through the study of human language capacity, of advancing knowledge of how the human mind works. For these goals, language extinction is a disaster. The discovery of previously unknown features and traits in undescribed languages contributes to this goal. For example, the discovery of languages with OVS [Object-Verb-Subject] and OSV [Object-Subject-Verb] basic word orders forced abandonment of previously postulated universals of language. Since languages with these basic word orders were not previously known, it was claimed that “the dominant order is almost always one in which the subject precedes the object” (Greenberg 1966:177), like English with SVO or Japanese with SOV.
However, languages such as Hixkaryana (Brazil, 350 speakers) were discovered with OVS basic word order, as in:

toto    yonoye    kamura
man     ate       jaguar
‘The jaguar ate the man.’

Discovery of languages with these previously unattested basic word orders forced this claim to be abandoned. It is all too plausible, however, given the recent loss of many languages in Brazil where most of the OVS and OSV languages were found, that the few languages with these word orders could have become extinct before they were described, leaving us forever in error about what is possible in human language and how that reflects human cognition.

The discovery of a new speech sound is to linguists like the discovery of a new species to biologists. Recent discoveries of new speech sounds in threatened languages have led to testing scientific claims about sound systems and to refining our knowledge. Linguists document endangered languages to discover information of this sort, to determine the full range of what is possible in human languages.

(4) Human rights. Language loss is often not voluntary; it frequently involves violations of human rights, with oppression or repression of speakers of minority languages. It is a matter of injustice when people are forced to give up their languages by repressive regimes or prejudiced dominant societies. Related to this is the personal loss associated with the death of one’s heritage language. Language loss is often experienced as a crisis of social identity. Our psychological, social, and physical well-being is connected with our native language; it shapes our values, self-image, identity, relationships, and ultimately success in life. For many communities, work towards language revitalization is not about language alone, but is part of a “larger effort to restore personal and societal wellness” (Pfeiffer and Holm 1994, of the Navajo Nation’s Education Division).
Many indigenous voices affirm the importance of language in cultural identity:

Linguistic diversity ... constitutes one of the great treasures of humanity, an enormous storehouse of expressive power and profound understanding of the universe. The loss of hundreds of languages that have already passed into history is an intellectual catastrophe in every way comparable in magnitude to the ecological catastrophe we face today as the earth’s tropical forests are swept by fire. Each language still spoken is fundamental to the personal, social and – a key term in the discourse of indigenous peoples – spiritual identity of its speakers. (Zepeda [Tohono O’odham nation] and Hill 1991.)

But why save our languages ... we should save our languages because it is the spiritual relevance that is deeply embedded in our own languages that is important. (Richard Littlebear [Northern Cheyenne, President of Chief Dull Knife College, Lame Deer]. 1999:1.)

I canʼt stress enough the importance of retaining our tribal languages, when it comes to the core relevance or existence of our people … You could argue that when a tribe loses its language, it loses a piece of its inner-most being, a part of its soul or spirit … When it comes to native languages, the situation is simple: Use it or lose it. (Sonny Skyhawk [Sicangu Lakota, Hollywood actor] 2012.)

Language loss does not promote peace.

It is often claimed that there would be more harmony if there were just one or only a few languages in the world. Some see language loss as promoting greater understanding and fostering world peace. This is wrong. Having only one language is no guarantee of “understanding.” We need look no further than the conflicts in monolingual Northern Ireland, the former Yugoslavia (where Serbians and Croatians have a common language), or the 1994 Rwanda genocide (involving Hutu and Tutsi, both speakers of Kinyarwanda), not to mention the US Civil War.
National unity is not fostered by monolingualism; rather, recognition of minority languages’ rights may be a better way of bringing about peace, understanding, and ultimately national unity, as in relatively peaceful multilingual Belgium, Finland, or Switzerland.

References

Cox, Paul Alan. 1993. Saving the ethnopharmacological heritage of Samoa. Journal of Ethnopharmacology 38.181-8.
Cox, Paul Alan. 2001. Will Tribal Knowledge Survive the Millennium? Science 287.5450.44-5.
Felger, Richard and Mary Beck Moser. 1973. Eelgrass (Zostera marina L.) in the Gulf of California. Science 181.4097.355-6.
Greenberg, Joseph H. 1966. Some universals of grammar with particular reference to the order of meaningful elements. Universals of Language, ed. by Joseph H. Greenberg, 73-113. Cambridge, MA: MIT Press.
Littlebear, Richard. 1999. Some Rare and Radical Ideas for Keeping Indigenous Languages Alive. Revitalizing Indigenous Languages, edited by Jon Reyhner, Gina Cantoni, Robert N. St. Clair, and Evangeline Parsons Yazzie, 1-5. Flagstaff, AZ: Northern Arizona University.
Skyhawk, Sonny. 2012. Why should we keep tribal languages alive? Indian Country, April 6, 2012 (http://indiancountrytodaymedianetwork.com/2012/04/06/why-should-we-keep-tribal-languages-alive-99182).
Zepeda, Ofelia and Jane H. Hill. 1991. The condition of Native American languages in the United States. Endangered Languages, ed. by R. H. Robins and E. M. Uhlenbeck, 135-55. Oxford: Berg.

Scale of Endangerment

Each of four factors is rated on the same six levels of endangerment: 5 Critically Endangered, 4 Severely Endangered, 3 Endangered, 2 Threatened, 1 Vulnerable, 0 Safe¹.

Intergenerational Transmission
5 Critically Endangered: Few speakers, all elderly.
4 Severely Endangered: Many of the grandparent generation speak the language.
3 Endangered: Some adults of child-bearing age know the language, but do not speak it to children.
2 Threatened: Most adults of child-bearing age speak the language.
1 Vulnerable: Most adults and some children are speakers.
0 Safe: All community members/members of the ethnic group speak the language.

Absolute Number of Speakers
5 Critically Endangered: 1-9 speakers
4 Severely Endangered: 10-99 speakers
3 Endangered: 100-999 speakers
2 Threatened: 1000-9999 speakers
1 Vulnerable: 10,000-99,999 speakers
0 Safe: >100,000 speakers

Speaker Number Trends
5 Critically Endangered: A small percentage of community members or members of the ethnic group speak the language; the rate of language shift is very high.
4 Severely Endangered: Fewer than half of community members or members of the ethnic group speak the language; the rate of language shift is accelerated.
3 Endangered: About half of community members or members of the ethnic group speak the language; language shift is frequent but not rapidly accelerating.
2 Threatened: A majority of community members or members of the ethnic group speak the language; the number of speakers is gradually diminishing.
1 Vulnerable: Most community members or members of the ethnic group are speakers; speaker numbers are diminishing, but at a slow rate.
0 Safe: Almost all community members or members of the ethnic group speak the language; speaker numbers are stable or increasing.

Domains of Use of the Language
5 Critically Endangered: Used only in very few domains (for example, restricted to ceremonies, to a few specific domestic activities); a majority of speakers support language shift; no institutional support.
4 Severely Endangered: The language is being replaced even in the home; some speakers may value their language while the majority support language shift; very limited institutional support, if any.
3 Endangered: Used mainly just in the home; some speakers may value their language but many are indifferent or support language shift; no literacy or education programs exist for the language; government encourages shift to the majority language; there is little outside institutional support.
2 Threatened: Used in nonofficial domains; shares usage in social domains with other languages; most value their language but some are indifferent; education and literacy programs are rarely embraced by the community; government has no explicit policy regarding minority languages, though some outside institutions support the languages.
1 Vulnerable: Used in all domains except official ones (i.e., government and workplace); nearly all speakers value their language and are positive about using it (prestigious); education and literacy in the language are available, but valued only by some; government and other institutional support for use in non-official domains.
0 Safe: Used in government, mass media, education and the workplace; most speakers value their language and are enthusiastic about promoting it; education and literacy in the language are valued by most community members; government and other institutions support the language for use in all domains.

¹ In order for a language to be considered ‘Safe,’ it must receive a 0 rating in all four categories. If a language’s composite score is 0% but the level of certainty is anything less than ‘Certain,’ it will be considered ‘At risk.’

Computing Level of Endangerment:

Intergenerational Transmission will be worth twice each of the other factors. Because many languages will not have reliable data for some of these factors, the total score will be the percentage of points received out of the total points possible for the factors considered. (100-81% = Critically Endangered; 80-61% = Severely Endangered; 60-41% = Endangered; 40-21% = Threatened; 20-1% = Vulnerable; 0% = Safe)

Level of Certainty will be computed based simply on the percentage of factors that are known and entered. (25 points possible = Certain; 20 points possible = Mostly Certain; 15 points possible = Fairly Certain; 10 points possible = Mostly Uncertain; 5 points possible = Uncertain)

Examples:

                 Intergen.     Abs. #   Speaker   Domains   Total   Status
                 Trans. (x2)            Trends
Language A       6             4        3         3         16      Severely Endangered, Certain
  Pts. possible  10            5        5         5         25
Language B       8             5        0         0         13      Critically Endangered, Fairly Certain
  Pts. possible  10            5        0         0         15
Language C       0             3        0         0         3       Endangered, Uncertain
  Pts. possible  0             5        0         0         5

Need for Documentation Scale

The need for documentation is based on the adequacy of available documentation of three types: grammar, dictionary, and corpus. Each of these factors has a maximum number of points; the points received are converted to a percentage of that maximum, which is then weighted. Grammar is weighted 4; Dictionary is weighted 2; Corpus is weighted 1.

Grammar (Factor 1 out of 3)

Size (description)            Score
large, comprehensive          4
basic reference grammar       3
grammatical sketch            2
treats some aspects           1
nothing                       0

Criteria      Yes      No (remains unchanged)
Scientific    x 1.5    x 1
Accessible    x 1.5    x 1

Highest Score Possible: 9
Lowest Score: 0

Example: Basic reference grammar, pre-scientific, accessible: 3 x 1 x 1.5 = 4.5

Note: A grammar is considered either scientific or pre-scientific. In terms of its score, this is a function of its size: a scientific grammar is worth 1.5 times the value of a pre-scientific grammar of the same size. The same holds for accessibility: it is a binary matter (accessible or not) rather than a range.

Dictionary (Factor 2 out of 3)

Size (# of words)     Score
> 5,000               3
2,000 - 5,000         2
< 2,000               1
Nothing               0

Bonus points:
Criteria                 Present in dict.   Absent from dict.
Example sentences        +1                 0
Usage                    +1                 0
Cultural explanations    +1                 0

Accessibility – a factor of the total score for the dictionary:
Accessible      x 1.5
Inaccessible    x 1

Highest Score Possible: 9
Lowest Score: 0

Example: 2,750 words, no example sentences, usage present, no cultural explanations, accessible: (2 + 0 + 1 + 0) x 1.5 = 4.5

Corpus (Factor 3 out of 3)

Size of annotated audio/video texts    Score
> 120 min.                             4
119-60 min.                            3
59-15 min.                             2
< 15 min.                              1
Nothing                                0

Written texts (with no corresponding audio/video): +0.5
Unannotated audio/video: +0.5

Highest possible score: 5
Lowest score: 0

Example: 30 min. annotated transcription, some written texts, and some unannotated audio: 2 + 0.5 + 0.5 = 3

Example language score: Total Score Based on All Three Factors (weighted mean)

Section                    Score    Percentage
Grammar                    4.5/9    50%
Dictionary                 4.5/9    50%
Annotated Corpus (text)    3/5      60%

Grammar is weighted 2x dictionary; dictionary is weighted 2x annotated corpus:

(4(50) + 2(50) + 1(60)) / (4 + 2 + 1) = 51% (documented)

Result: High Need for Documentation

Need for Documentation:
Urgent       0-19%
Very High    20-39%
High         40-59%
Moderate     60-79%
Low          80-99%
Very Low     100%

Behind the Need for Documentation ratings

The Need for Documentation Index is designed to show, at a glance, how well documented a language is, and thus what the need for documentation is for that language. This is based on an evaluation of the published documentation in three areas: grammar, dictionaries/lexicon, and texts/corpora. All material relating to one of these areas is evaluated together to provide an overall picture. The initial evaluation is carried out by ELCat researchers, with further review sought from users as new documentation is discovered, written or published and becomes available.

Grammars – Grammatical documentation may consist of book-length published grammars, shorter grammatical sketches or articles on particular aspects of a language’s grammar. A large, comprehensive grammar (score of 4) covers all major aspects of the language (phonology, morphology, syntax, etc.) and leaves little to nothing to be desired by a person wishing to know more about the language. An example of this would be Dixon’s (1997) grammar of Yidiny. A basic reference grammar (score of 3) covers most, but not all major aspects of the language (e.g., little phonological information, but lots on syntax). A grammatical sketch (score of 2) is much shorter and provides only preliminary information about some aspects of the language. Documentation that treats some aspects (score of 1) provides information about very limited topics in the language’s grammar, even if it explores those topics in a thorough way.
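The Level of Endangerment and Level of Certainty computations given earlier (under "Computing Level of Endangerment") can be sketched in Python as follows. This is a minimal illustration, not ELCat's own implementation: the factor names, the function name `endangerment`, and the handling of fractional percentages that fall between the published whole-number bands are assumptions made for this sketch.

```python
from __future__ import annotations

# Factor weights: Intergenerational Transmission counts twice (per the text).
# The dictionary keys are hypothetical names chosen for this sketch.
WEIGHTS = {"transmission": 2, "absolute_number": 1, "trends": 1, "domains": 1}
MAX_RATING = 5  # each factor is rated from 0 (safe) to 5 (critically endangered)

# Level of Certainty, keyed by the number of points possible.
CERTAINTY = {25: "Certain", 20: "Mostly Certain", 15: "Fairly Certain",
             10: "Mostly Uncertain", 5: "Uncertain"}


def endangerment(ratings: dict[str, int | None]) -> tuple[int, int, str, str]:
    """Return (points, points_possible, level, certainty) for one language.

    `ratings` maps each factor name to a 0-5 rating, or None if unknown.
    """
    known = {name: r for name, r in ratings.items() if r is not None}
    if not known:
        raise ValueError("at least one factor must be rated")

    points = sum(WEIGHTS[name] * r for name, r in known.items())
    possible = sum(WEIGHTS[name] * MAX_RATING for name in known)
    percent = 100 * points / possible
    certainty = CERTAINTY[possible]

    # Assumption: a fractional percentage belongs to the band above the
    # whole-number boundary it exceeds (e.g. 60.5% is Severely Endangered).
    if percent > 80:
        level = "Critically Endangered"
    elif percent > 60:
        level = "Severely Endangered"
    elif percent > 40:
        level = "Endangered"
    elif percent > 20:
        level = "Threatened"
    elif percent > 0:
        level = "Vulnerable"
    else:
        # Footnote rule: 0% counts as Safe only when all four factors are known.
        level = "Safe" if certainty == "Certain" else "At risk"
    return points, possible, level, certainty


language_a = {"transmission": 3, "absolute_number": 4, "trends": 3, "domains": 3}
language_b = {"transmission": 4, "absolute_number": 5, "trends": None, "domains": None}
language_c = {"transmission": None, "absolute_number": 3, "trends": None, "domains": None}
```

With these inputs, `endangerment(language_a)` gives 16 of 25 points (Severely Endangered, Certain), matching the Language A example; Languages B and C reproduce the other two example rows in the same way.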
If the language has no available documentation dealing specifically with the grammar, it receives a score of 0. For example, at the time of writing, no documentation is available for the grammars of languages such as Kujarge and Guriaso.

If the documentation is informed by modern linguistic training and is written in a way that is useful to today’s linguists, it is rated as ‘scientific’ and the score is adjusted (multiplied by 1.5). Hence, the score for grammars such as Dixon’s grammar of Yidiny would be adjusted. If the documentation is not written in such a way that it takes advantage of common generalizations observed in linguistics, the score remains the same. Documentation that is easy to find through university libraries or on the internet, is written in a language of wider communication, and is not written in a specific theoretical framework is rated as accessible and the score is adjusted (multiplied by 1.5). This applies to grammars such as Elkins’ (1970) grammar of Western Bukidnon Manobo, which can be found in its entirety online. If the documentation does not meet all of these criteria, the score remains the same.

Dictionaries/Lexicon – Lexical documentation, including all available wordlists and/or dictionaries, is evaluated first by the number of entries. More than 5,000 entries receive a score of 3; 2,000 – 4,999 entries receive a score of 2; fewer than 2,000 entries receive a score of 1; and no available wordlists receive a score of 0. The quality of those entries is then evaluated by three criteria: if the entries include example sentences, the dictionary receives one extra point; if the entries include information about how the words are used in phrases, sentences or discourse, it receives one extra point; if the entries include information that places words in their cultural contexts, it receives one extra point. The above considerations are important because they can help to make a dictionary or wordlist more useful for its users.
Finally, if the dictionary/wordlist is not available through university libraries or on the internet, or if it uses special symbols or terms that are not explained, or does not include definitions in a language of wider communication, then it is considered inaccessible and the score remains the same. Otherwise, it is considered accessible and the score is adjusted (multiplied by 1.5). Hence, a work like Blust’s (2003) dictionary of Thao would receive a full score of three for having more than 5,000 entries, as well as extra points for including example sentences, information on how an entry is used in a phrase, sentence or discourse, and cultural information. Finally, its score is adjusted for being accessible: the information appears on Google Books.

Texts/Corpora – Textual documentation consists of recordings of connected speech in a variety of contexts, such as conversations, personal narratives, rituals, instructions, myths/folklore, etc. Our primary consideration is texts that are accessible online or through archives and that are most useful because they include recordings (audio or video) and are annotated with word-by-word or morpheme-by-morpheme glossing and a free translation. Texts meeting these criteria which total > 120 minutes receive a score of 4; 119-60 minutes receive a score of 3; 59-15 minutes receive a score of 2; < 15 minutes receive a score of 1; no texts of this kind merit a score of 0. If the language has written texts (annotated or not) with no corresponding audio or video, it receives 0.5 points, and if the language has audio/video recordings which include no annotation, it also receives 0.5 points. For example, since Chamorro has more than 120 minutes’ worth of annotated audio corpus, it receives a score of 4. It also receives 0.5 for unannotated audio material and 0.5 for having written texts available, hence scoring a total of 5 for corpus.
Unlike Chamorro, a language such as Chrau, which has no annotated corpus, no unannotated audio, and no written texts, would score a total of zero on the corpus scale.

Overall score – The total need for documentation is computed by weighting the scores in each of the categories. Grammars (x 4) are worth twice as much as dictionaries; dictionaries (x 2) are worth twice as much as texts; and texts (x 1) have a weight of one. The total grammar score is divided by the points possible for a grammar, yielding a percentage which is then weighted by four. This score is then added to the dictionary points percentage (calculated the same way as the grammar score), which is weighted by two, and to the percentage of text points, which is weighted by one. This weighted total is converted to a percentage of total documentation that corresponds to the following levels of need:

Urgent: 0-19%
Very High: 20-39%
High: 40-59%
Moderate: 60-79%
Low: 80-99%
Very Low: 100%

Reasons these scores are only rough guides:

It is impossible to know whether the grammatical documentation for a language covers all, or even close to all, of the topics in the language. First, it would take someone very familiar with the language to decide that all topics were adequately covered; second, there may be interesting topics in the language that have not yet been considered. Therefore, our evaluation of a grammar as ‘comprehensive’ is based on an educated guess.

Basing the quality of lexical documentation (i.e., dictionaries) on the number of entries is a necessary first step, but it can be misleading because not all entries are equal. Some dictionaries may inflate the number of entries by including predictable inflected forms. Conversely, a language may have a perfectly adequate dictionary based just on roots, and such a dictionary would have a much lower number of entries.

Textual documentation is evaluated only on what is available to the researchers.
In some cases, this may mean that there is significant textual documentation that we have not evaluated and that the score might be higher. However, until the texts are made available to the wider public, we cannot consider the amount of textual documentation to be satisfactory.

When considering the quality of textual documentation, it is important to consider whether a wide variety of genres exists in the available documentation. Because of practical considerations, however, we have reluctantly decided not to consider this factor. First, it is very hard to determine exactly which genres are covered in a corpus; second, it is difficult to determine whether some texts belong to a single genre or to several. (For example, are marriage ceremonies and funeral ceremonies one genre, ritual, or two?)

The overall rating of the need for documentation is inherently arbitrary because it is determined by numerical values. Ratings of Low and Moderate need can be separated by a single percentage point, which is of course unrealistic. Unfortunately, there is no easy solution to this.

References:

Blust, R. 2003. Thao dictionary. Language and Linguistics Monograph Series, No. A5. Taipei: Institute of Linguistics (Preparatory Office), Academia Sinica.
Dixon, R.M.W. 1977. A Grammar of Yidiny. Cambridge: Cambridge University Press.
Elkins, R.E. 1970. Major grammatical patterns of Western Bukidnon Manobo. SIL Publications in Linguistics and Related Fields.

Silent Languages

To understand the plight of endangered languages today, it is valuable to see just how many languages have become extinct, and to compare that number with the number of living languages and with the number of currently endangered languages. However, “extinction” is not straightforward. Two lists are presented here. One is of languages which are well and truly extinct.
The second list is of languages that are sometimes declared to have no remaining native speakers but whose status may not be certain. The languages of this second list underscore the need for careful and urgent attention to these cases.

The telling questions are when a language is “extinct” and, indeed, what it means for a language to be “extinct”. Where there have been no known speakers for hundreds or even thousands of years, extinction is clear and uncontroversial. However, there are uncertain cases in which one source says the language in question is “extinct,” “probably extinct,” “possibly extinct,” or has “no known speakers”, while another equally credible source reports it as still having some speakers or possibly some speakers. The second list includes these languages and also languages whose last fluent speaker is reported to have died in recent times, even when sources do not disagree. In some cases, other speakers were later found for languages that had recently been declared extinct. Most of the languages reported to have recently lost their last speakers probably are truly no longer spoken; nevertheless, it is possible that in some cases unknown speakers may yet turn up. For that reason we give these languages considerable benefit of the doubt. These languages are all included in the Catalogue of Endangered Languages, together with whatever is reported in sources about their status. There are 141 entries in the Catalogue of Endangered Languages that fall into this possibly speakerless but unclear category, where sources disagree or where the languages have only recently been said to no longer have speakers. A total of 141 is no small set.

But there is much more to the extinction story. Exactly when a language qualifies as extinct is not a precise matter. For some scholars, a language is considered extinct when there are no longer any completely fluent native speakers who learned the language as children.
For others, a language that lacks fluent native speakers but still has semi-speakers or is being learned as a second language is not considered extinct. Moreover, many prefer to avoid calling any language extinct where the people whose heritage language it is may be interested in attempting to learn or revitalize it, so as not to discourage such efforts. To encourage efforts toward recovery of a language that lacks fully fluent native speakers, or for that matter lacks speakers of any sort, some prefer to speak of such languages as “silent”, “sleeping”, “dormant”, or just “unspoken”.

The second list, of languages for which there is suspicion, or even good reason to believe, but no certainty, that no native speakers remain, serves to call attention to languages that are perhaps no longer spoken but for which speakers might nevertheless remain. Such extremely precarious languages merit high priority. Many will be, and should be, the objects of caring concern by those whose heritage languages these cases represent.

These lists demonstrate starkly the problem of language endangerment by showing just how many of the world’s languages have already become extinct or are “silent” (“sleeping”), in contrast to the great many languages that are currently endangered, listed in this catalogue. Up to now, 635 known languages appear on these two lists, just under 10% of the languages known ever to have existed. Already all the languages of more than 100 language families (including language isolates) are extinct, out of the 420 independent language families (including isolates) in the world: 25% of the linguistic diversity of the world has already disappeared. Worse, this number will change radically and rapidly: the Catalogue of Endangered Languages has just over 3,000 entries from among the approximately 7,000 living languages in the world. By this count, 43% of living languages are endangered!
The number of extinct languages will soon swell dramatically. Clearly, as these numbers show, languages on a course towards extinction are vastly more numerous currently than in the past.

1. Languages of Uncertain but Precarious Status (sometimes reported as having no speakers) (141 languages):

Agwamin Akkala Saami Alngith Amanaye Amonap Arabana-Wangkangurru Arapaso Arara (Arara do Beiradao) Ariba Atampaya Atsugewi Ayapathu Bare Baygo Bung Canichana Catawba Cayuvava Chiapanec Chilanga (Salvadoran Lenca) wmi sia aid ama mzo ard arp axg aea amz atw ayd bae byg bgd caz chc cyb cip len? Jumaytepeque Xinka Kansa Kushyana (Kaxuiana) Kerek Klallam Korana Kukatj Kuku-Mangk Kuku-Mu'inh Kuku-Ugbanh Kuku-Thaypan Laimon Lake Miwok Lamalama Lapachu Leco Lipan Lower Chinook Lower Chehalis Macaguaje xin ksk kbb krk clm kqz ggd xmg xmp ugb typ coj lmw lby qa6 lec apl chh cea mcl Chimariko cid Maidu nmu Chitimacha Chiwere Coast Miwok ctm iow csi Copper Island Aleut mud Cupeño Deti Dirari Djangun Duungidjawu Eastern Pomo Eel River Athabaskan Eyak Gamberre Ganggalida Garlali Guazacapan Xinka Gununa-Kune Gurr-goni Hanis Honduran Lenca Gros Ventre Hpon Ilgar Itene cup shg-det dit djf wkw-duu peb qt8 eya gma gcd nbx xin pue gge csz len?
ats hpo ilg ite Makolkol Malyangapa Mandahuaca Martha’s Vineyard Sign Language Mbariman-Gudhinma Miami-Illinois Miriti Miwa Muluridyi Mayi-Kutuna Mbabaram Ngamini Ngarinyin Ngawun Ngumbarl Nimbari Nisenan Njerep Nungali Nyaki Nyaki Nyang'i N|u Ona Opata-Eudeve Jawi djw Otoe Jiwarli djl Paraujano 13 trz tno tkf tud umg teb 0qk tol umd psm lwl-pha pit pmw qua qui kyl sar swn ser pom zmh 0h1 mht Tora Toromona Tukumanfed Tuxa Umbuygamu Teteté Teushen Tolowa Umpithamu Pauserna Phalok Pitta-Pitta Plains Miwok Quapaw Quileute Santiam Saraveca Sawknah Serrano Southeastern Pomo Southern Sierra Miwok Tapeba Tequiraca Umutina mre Unami unm zmv mia mvv vmi vmu xmy vmb nmv wil nxn 08s nmr nsz njr nug nys nyp ngh ona opt iowoto pbg Uradhi Uru Vilela Wanggamala Wangganguru Wappo Wik-Epa Wik-Keyangan Wirafed Wirangu Wiyot Wotapuri-Katargalai Xakriaba Xiriâna Yahuna Yameo Yangman Yaquina Yavitero Yir-Yoront urf ure vil wnm wgg wao wie wif wir wiw wiy wsv xkr xir ynu yme jng aes yvt yiy Zire sih Ziriya zir skd tbb ash umo 2. 
Extinct Languages (494 languages) //Xegwi xeg Aruá aru Chipiajes /Xam xam Assan xss Chiquimulilla Xinka Abipon Abishira Abnaki, Eastern Acroá Adai Aequian Aghu Tharnggalu Aghwan Agta, Dicamay Aguano Ahom Ajawa Aka-Bea Aka-Bo Aka-Cari Aka-Jeru Aka-Kede Aka-Kol Aka-Kora Akar-Bale Akkadian Alanic Algonquian, Carolina Alsea Ammonite Andaqui Andoa Anglo-Norman Anserma Apalachee Aquitanian Arabic, Andalusian Aramaic, Jewish Babylonian Aramaic, Jewish Palestinian Aramaic, Official Aramaic, Samaritan Aranama-Tamique Arára, Mato Grosso Aribwatsa Arikem Arin Arma Armazic axb ash aaq acs xad xae ggr xag duy aga aho ajw abj akm aci akj akx aky ack acl akk xln crr aes qgg ana anb xno ans xap xaq xaa Atakapa Atsahuaca Aushiri Auyokawa Avar, Old Avestan Awabakal Ayta, Tayabas Bactrian Baga Kaloum Baga Sobané Banggarla Baniva Barbacoas Barbareño Baré Basa-Gumna Basay Bayali Baygo Beothuk Berti Biloxi Bina Biri Birked Bolgarian Cacaopera Cagua Camunic Caramanta Carian aqp atc avs auo oav ave awk ayy xbc bqf bsv bjb bvv bpb boi bae bsl byq bjy byg bue byt bll bmn bzr brk xbo ccr cbh xcc crf xcr Cholón Chorasmian Chorotega Chumash Chuvantsy Coahuilteco Cochimi Comecrudo Coptic Coquille Cornish Cotoname Coxima Coyaima Creole Dutch, Skepi Cruzeño Cumanagoto Cumbric Cumeral Curonian Dacian Dagoman Dalmatian Deir Alla Delaware, Pidgin Dhurga Dieri Dororo Duli Dura Eblan Edomite cbe xinchi cht xco cjr chs xcv xcw coj xcm cop coq cnx xcn kox coy skw crz cuo xcb cum xcu xdc dgn dlm xdr dep dhu dif drr duz drq xeb xdm tmr Carib, Island crb Egyptian egy jpa Catawba chc Elamite elx arc sam xrt axg laz ait xrn aoh xrm Cauca Cayubaba Cayuse Celtiberian Chagatai Chané Chibcha Chicomuceltec Chimakum cca cyb xcy xce chg caj chb cob cmk Elymian Emok Epi-Olmec Esselen Esuma Etchemin Eteocretan Eteocypriot Etruscan xly emo xep esq esm etc ecr ecy ett 14 Faliscan Frankish GabrielinoFernandeño Gafat Galatian Galice Galindan Gamo-Ningi Gangulu Garza Gaulish, Cisalpine Gaulish, Transalpine Geez Gey xfa 
frk Kalapuya, Southern Kalarko sxk kba Langobardic Laurentian lng lre xgf Kalkutung ktg Lemnian xle gft xga gce xgl bte gnl xgr xcg xtg gez guv Kamakan Kamas Kamba Kambiwá Kaniet Kanoé Kapinawá Kara Kara Karakhanid Karami vkm xas xba xbw ktk kxo xpn lnj xlp xli xlg lab pml xlo xlb lmz xls xlu Ghomara gho Karankawa zkk Gothic Greek, Cappadocian got cpg Karipúna Karirí-Xocó kgm kzw Guana gqn Kariyarra vka Guanche Gugu Warra Gule Guliguli Gureng Gureng Guyani Hadrami Harami Hattic Hermit Hernican Hibito Hittite Homa Horo gnc wrw gly gli gnr gvy xhd xha xht llf xhr hib hit hom hor Karkin Kaskean Katabaga Kaurna Kawi Kazukuru Kepkiriwát Ketangalan Khazar Khorezmian Kitan Kitsai Knaanic Koguryo Koibal krb zsk ktq zku kaw kzk kpn kae zkz zkh zkt kii czk zkg zkb Hunnic xhc Kott zko Hurrian Iberian Ifo Illyrian Ineseño Iowa-Oto Jorá Jurchen Kaimbé Kakauhua xhu xib iff xil inz iow jor juc xai kbf koc zkv kof uun qwm ggk kuz kgg wka kwz Kalapuya, Northern nrt Kpati Krevinian Kubi Kulon-Pazeh Kuman Kungarakany Kunza Kusunda Kw'adza Kwadi KwalhioquaTlatskanai Leningitij Lepontic Liburnian Ligurian Linear A Lingua Franca Loup A Loup B Lumbee Lusitanian Luwian, Cuneiform Luwian, Hieroglyphic Lycian Lydian Macedonian, Ancient Maek Mahican Maidu, Valley Malgana Mamulique Manangkari Mandaic, Classical Mangue Manipuri, Old Manx Maritsauá Marrucinian Marsian Matagalpa Mator Mator-TaygiKaragas Mattole Mawa Maykulan Mbara Median Meroitic Mesmes Messapic Michigamea Miluk qwt Milyan imy zra xqa xar 15 hlu xlc xld xmk hmk mjy vmv vml emm znk myz mom omp glv msp umc ims mtn mtm ymt mvb wma mnt mvl xme xmr mys cms cmm iml Minaean Minoan Mittu Miwok, Bay Miwok, Coast Mlahsö Moabite Mobilian Mochica Mohegan-MontaukNarragansett Moksela Molale Mozarabic Mulaha Muskum Mysian Nadruvian Nagumi Nanticoke Narrinyeri Natagaimas Natchez inm omn mwu mkq csi lhs obm mod omc Oirat, Written Oko-Juwoi Omejes Omok Omurano Oscan Oti Otuke Ouma xwo okj ome omk omu osc oti otu oum Puquina Puri Purisimeño Puyo 
Puyo-Paekche Pyu Qatabanian Raetic Rema puq prr puy xpy xpp pyx xqt xrr bow mof Paekche pkc Remo rem vms mbe mxi mfw mje yms ndf ngv nnt nay nts ncz Paelignian Pahlavi Paisaci Prakrit Palaic Pali Palumata Pame, Southern Pamlico Pankararé Pankararú Panobo Papora pgn pal qpp plq pli pmc pmz pmk pax paz pno ppu rna xsa sbv xsk sln qey smp sjk sar xsc sxl sds Nawathinehena nwa Paranawát paf Negerhollands Neo-Aramaic, Barzani Jewish Newar, Middle Ngandi Nganyaywana Ngbee Niuatoputapu Nocamán Nooksack Noric Norn North Arabian, Ancient Nottoway-Meherrin Nubian, Old Nukuini Numidian Nyang'i Obispeño Ofayé dcr Parthian xpr Runa Sabaean Sabine Sakan Salinan Sam'alian Samaritan Sami, Kemi Saraveca Scythian Selian Sened Senhaja De Srair Sensi bjf Pataxó Hã-Ha-Hãe pth Seroa kqu nwx nid nyx jgb nkp nom nok nrc nrn xna nwy onw nuc nxm nyp obi opy Pecheneg Pentlatch Phoenician Phrygian Picene, North Picene, South Pictish Pidgin, Timor Pijao Pirlatapa Pomo, Northern Ponares Potiguára Powhatan Prākrit, Ardhamāgadhī Prākrit, Māhārāṣṭri Prākrit, Sauraseni xpc ptw phn xpg nrp spx xpi tvy pij bxi pej pod pog pim pka pmh psu szd snh sdt sxc scx xsd sgm fos sis svx sog xso sxo sut xsv sux sqn Ofo ofo Prussian prg Ohlone, Northern Ohlone, Southern cst css Pumpokol Punic xpm xpu Seru Shinabo Shuadit Sicanian Sicel Sidetic Singa Siraya Siuslaw Skalvian Sogdian Solano Sorothaptic Subtiaba Sudovian Sumerian Susquehannock Syriac, Classical Taino Takelma 16 sjs sni syc tnq tkm Tama Tamanaku Tanema ten tmz tnx Woccon Worimi Wulguru xwc kda qgu Tangut txg Wuliwuli wlu Tapeba Tartessian Tasmanian Tay Boi Tepecano Torona Totoro Tripuri, Early Truká Tsetsaut Tubar Tumshuqese Tunica Tupí Tupinambá Tupinikin Turiwára Turung Tutelo Tuxá Tuxináwa Twana Uamué Ubykh Ugaritic Umbrian Piro tbb txr xtz tas tep tqr ttk xtr tka txc tbu xtq tun tpw tpn tpk twt try tta tud tux twa uam uby uga xum pie wur wya xoo ybn ylr ynn ysc yei yob yug yub ysr tmg twc txh tbh til tjm tgv tju tgy xto txb toe tjn tqw umo 
Piscataway psy Pisidian Pochutec Polabian Pomo, Eastern Wampanoag Wandarang Wariyangga Wasu Weyto xps xpo pox Wurrugu Wyandot Xukurú Yabaâna Yalarnnga Yana Yassic Yeni Yoba Yug Yugambal Yupik, Sirenik Ternateño Teshenawa Thracian Thurawal Tillamook Timucua Tingui-Boto Tjurruru Togoyo Tokharian A Tokharian B Tomedes Tonjon Tonkawa Umotína Umpqua, Upper Urartian Uruava Urumi peb Vandalic xvn wam wnd wri wsu woy Vano Venetic Ventureño Vestinian Volscian vnk xve veo xvs xvo xup xur urv uru 17 Waamwang Wailaki Wakoná Yupiltepeque Xinka Zarphatic Zemgalian Zhang-Zhung wmn wlk waf xin-yul zrp xzm xzh