Introduction and background information for the Catalogue of

About the Catalogue of the Endangered Languages of the World
Many languages of the world are at risk of becoming extinct soon. The crisis of endangered languages is one of the most serious issues
facing humanity today, posing moral, practical, and scientific problems of enormous proportions. This catalogue informs
users about the plight of endangered languages and encourages efforts to slow the loss. It provides information on the
endangered languages of the world as a resource for the public, for scholars, for those whose languages are in peril and community
groups facing language loss, and for funding agencies that must decide how to deploy limited resources.
Until now, with this Catalogue of Endangered Languages, there has been no single reliable source of information on the
endangered languages of the world, describing how endangered each language is and to what extent it has been documented.
For many of the languages in this catalogue, little or no accessible information exists. For others, the existing sources are
often inaccurate, unreliable, or inaccessible. For those seeking to understand where documentation efforts and resources
might most effectively be directed, and where language conservation or revitalization efforts are most needed, it is important
to know not only how critically endangered a language is, but also how well it has already been described, how different or
unique it might be, and how further description might contribute to understanding of human language in general. This
catalogue presents these kinds of information on the endangered languages of the world.
The Catalogue of Endangered Languages Project personnel. The Catalogue of Endangered Languages is under the
direction of Lyle Campbell (University of Hawai‘i Mānoa) and Anthony Aristar and Helen Aristar-Dry (LINGUIST
List/Eastern Michigan University). The team at Eastern Michigan University (EMU) is responsible for the programming,
technical aspects of the Catalogue, bibliography management, and for the languages of Africa and Australia. The University
of Hawai‘i Mānoa (UHM) team is responsible for the languages of Europe, the Caucasus, North Asia, East Asia, South Asia
(the Indian subcontinent), Southeast Asia, North America, Central America, South America, and the Pacific, and for the
endangerment scale and the documentation need scale. The following individuals contributed to Phase I of the data
collection:
EMU team:
Dr. Anthony Aristar
Dr. Helen Aristar-Dry
Anna Belew (Team Leader)
Lwin Moe
Kristen Dunkinson
Amy Brunett
Brent Woo
UHM team:
Dr. Lyle Campbell
John Van Way (Project Coordinator)
Huiying Nala Lee
Eve Okura
Dr. Kaori Ueki
The initial catalogue content was prepared by the members of these two teams, with some input from Regional Directors,
experts on the languages of specific regions who provide expertise to correct and expand the catalogue, whose primary role
begins in Phase II. The Regional Directors are:
Willem F.H.Adelaar (South America)
Greg Anderson (South Asia)
I. Wayan Arka (Indonesia)
Claire Bowern (Australia)
Matthias Brenzinger (Africa)
Lyle Campbell (the Americas)
Alice C. Harris (Caucasus)
Brian Joseph (Europe)
Juha Janhunen (Northern and Central Eurasia)
Bill Palmer (Pacific)
Keren Rice (North America)
David Solnit (East and Southeast Asia)
Acknowledgements. The research for the Endangered Language Catalogue project is funded by a grant from the National
Science Foundation: Collaborative Research: Endangered Languages Catalog (ELCat), BCS-1058096 to the University of
Hawai‘i at Mānoa (Principal Investigator, Lyle Campbell) and BCS-1057725 to Eastern Michigan University (Principal
Investigators Helen Aristar-Dry and Anthony Aristar). The goals and basic organization of the Catalogue were established
in an international workshop with some 50 specialists from around the world supported by National Science Foundation
grant Collaborative Research: Endangered Languages Information and Infrastructure Project (NSF 0924140).
This is just the beginning. It is extremely important to understand that the Catalogue is a work in progress. At the launch of
this website, the Catalogue is still in Phase I, which is based only on the information available in existing publications and
web resources about the individual endangered languages. Bringing in more recent and local information is critical to this
project and is the focus of Phase II. The second phase will continue over the next two years. It involves an international
team of regional specialists (see above) reaching out to knowledgeable individuals and organizations to fill in the missing
information for languages in their areas, to check the accuracy of information, and to make needed corrections. For this
phase and long into the future, the goal is to modify, update, and improve the catalogue contents constantly, as new
information becomes available or as the situation for particular languages changes. If users of this website have particular
knowledge or information about specific languages, we encourage submission of comments and suggestions for
improvement of language entries. We are grateful for your help in improving the collective knowledge of the endangered
languages.
The Language Endangerment Index and the Need for Documentation Index presented for each language are not meant to be
the final word about degree of endangerment or extent of documentation. The scores for individual languages will change as
more information becomes available. They are provided for practical purposes, to give a quick but rough visual indication of
a language’s endangerment status and documentation needs. The level of certainty accompanying each language shows the
degree of confidence in the score: a label of “uncertain” may indicate that the level is not yet known or the score has been
computed and further evaluation is needed.
How the Catalogue handles tough questions. Some may wonder how differences of opinion have been handled. For
example, some language varieties are believed to be independent languages by some scholars but are considered only
dialects of a single language by others. In cases where it is not clear whether separate languages are involved or just dialects
of one language, the entity in question is given its own entry as a potentially distinct language, but with the different
opinions noted. In cases where the evidence is clear that only dialects of one language are involved, entities are joined in a single entry, but with differences of opinion
registered. Similarly, in cases where the evidence is clear that separate independent languages are involved, though some believe
they are dialects of a single language, these are given separate entries in the catalogue, with description of the different
interpretations. The thorny issue of distinguishing dialects from closely related languages is avoided simply by giving
doubtful entities their own entries with comments representing the range of opinion. As more comes to be known, it will
be possible to resolve the status of many of these entities; for others, the status may just remain unclear.
This benefit-of-the-doubt approach to inclusion in the Catalogue, however, means that it is not possible just to count the
total number of entries in the catalogue to get an absolute number of how many endangered languages there are in the world.
Almost certainly some entities given their own entry will turn out to be only dialects that need to be joined in a single entry
as representatives of a single language, reducing the total number of entries in the Catalogue. This approach results in the
total number of entries being greater than the absolute number of true languages that are endangered, though hopefully not
by a very large margin.
Opinions differ also over the word “extinct.” In cases where there have been no known speakers for hundreds or thousands of
years, extinction is clear. However, there are cases where one source says “extinct,” “probably extinct,” “possibly extinct,”
or “no known speakers,” and another credible source reports some speakers. In unclear instances, we include the language in
the Catalogue, but report the conflicting designations. This means that almost certainly some languages in the Catalogue are
in fact extinct -- not just endangered -- though definitive information is not yet available. As work on the Catalogue
progresses, more accurate information on these cases will be obtained and their situations clarified. However, this means
that it is not possible to take the total number of entries in the Catalogue as the absolute number of endangered languages in
the world today, since some of these languages will prove not to be just endangered, but in fact extinct. There are 133
entries in the Catalogue that fall into this category. [link to “Silent languages” here]
The word “extinct” raises other questions. Some scholars consider a language extinct when there are no longer any
completely fluent native speakers who learned the language as children from the previous generation. Often, however, even
after there are no fully fluent native speakers, there remain speakers with some aptitude in the language, others with passive
knowledge, and others who have learned or are learning their heritage tongue as a second language. Many oppose calling
these languages “extinct.” Some do not consider languages with any of these sorts of speakers (even if not fluent) as extinct,
and they recommend avoiding premature declaration of extinction: for those attempting to learn or revitalize their
languages, it can be demoralizing to read that their language is deemed dead. In order not to discourage learning and
revitalization efforts in these situations, they recommend reporting these languages as having “no known speakers,” or
something equivalent. This practice of avoiding the word “extinct” in such situations is followed in this catalogue, though
when the number of native speakers is given as Ø, that is an indication that the language in question falls into this category,
a language with no known speakers.
Endangered languages: Why so important?
Language extinction is not new – languages have been dying since ancient times. However, languages are becoming
extinct today at an alarming rate. Of the nearly 7,000 languages in the world today, some 3,000 (over 40%) are endangered;
many others will make their way into this catalogue in the near future.
Experts predict that in the worst-case scenario 90% of all languages will be extinct within 100 years; in best-case scenarios,
only 50% will survive, and just 10% are considered safe during the next century (see Krauss 1992). Languages not being
learned by children are not just endangered but doomed. Of the Native American languages of the US, 90% are not being
passed on to a new generation, and 90% of Australian Aboriginal languages and over 50% of minority languages of
Russia are in a similar situation. There were 312 American Indian languages in use when Europeans first arrived in North
America; of these, 123 (40%) are extinct and others were lost without record. In the US, of the 280 languages known from
the time of first European contact, only 151 still have speakers (54%), but all are endangered. Only 20 of these (13%) are
being learned by children, but by ever fewer children each year. Most of these languages will be extinct in your lifetime, if
language revitalization programs are not successful. California illustrates the crisis: at the time of the Gold Rush (c.1850),
California had about 100 Native American languages; only 50 of these survive with speakers, but none is being learned by
children in the normal way – the youngest remaining speakers are well into senior-citizenhood.
The disappearance of an individual language constitutes a monumental loss of scientific information and cultural
knowledge, comparable in gravity to the loss of a species, for example the Bengal tiger or the white whale. However, the
extinction of whole families of languages is a tragedy comparable in magnitude to the loss of whole branches of the animal
kingdom (classes, orders, families), for example to the loss of all felines or all cetaceans. Just as it would be difficult to
understand the animal kingdom with major branches missing, it is impossible to understand the history and classification of
human languages with the loss of entire language families. Yet this is what confronts us: already all the languages belonging
to 108 of the 420 independent language families (including isolates) of the world are extinct – a staggering 26% of the
linguistic diversity of the world is gone forever.
Why should you care? We should all be concerned over the crisis of language loss for compelling reasons.
(1) Human concerns. Languages are treasure houses of information on literature, history, philosophy, and art. Their stories,
ideas, and words help us make sense of our lives and the world around us. For example, the life-enriching value of literature
is well-understood and is true also of the oral literatures of the indigenous peoples of the world – they, too, have grappled
with the complexities of their world and the problems of life, and the insights and discoveries represented in their literatures
are of value to us all. When a language becomes extinct without documentation, taking all its oral literature, oral tradition,
and oral history with it into oblivion, we are all diminished. There are also great reservoirs of historical information to be
recovered from the study of languages. The classification of related languages teaches us about the history of human groups
and how they are related to one another, and we gain understanding of contacts and migrations, the original homelands
where languages were spoken, and past cultures from the comparison of related languages and the study of language change
– all irretrievably lost when a language becomes extinct without adequate documentation.
(2) Lost knowledge. Specific knowledge is often held by the smaller speech communities of the world – knowledge of
medicinal plants and cures, identification of plants and animals yet unknown scientifically, new crops, etc. When the
language is not learned by the next generation, the knowledge of the natural and cultural world encoded in the language
typically fails to be transmitted. Loss of such knowledge could have devastating consequences for humanity. For example,
the Seri (of Mexico, only 700 speakers) use xnois ‘eelgrass seed’ (Zostera marina L.) as a food. This is “the only known
grain from the sea used as a human food source” and it has considerable potential as a general food source … Its cultivation
would not require fresh water, pesticides, or artificial fertilizer” (Felger and Moser 1973). It is easy to imagine a future in
which natural or human-caused catastrophes compromise land-based crops, leaving human survival in jeopardy if we lose
knowledge such as this.
Medicines provide similar examples. Seventy-five percent of plant-derived pharmaceuticals were discovered by examining
traditional medicines, and the languages of curers often played a key role. If these languages had become extinct and
knowledge of the medicinal plants and associated cures had been lost in the process, all of humanity would have been
impoverished and our survival as a species left more precarious. Paul Cox worked with Epenesa Mauigoa, a taulasea,
traditional healer, on Upolu, Samoa, and they described 121 herbal remedies. Their work led to knowledge of the mamala
plant (Homalanthus nutans) and the anti-viral drug prostratin, used to treat yellow fever. In trials at the National Cancer
Institute, it also proved effective against HIV Type 1 (Cox 1993, 2001). Loss of this endangered traditional Samoan
knowledge would have been a loss for all of humanity.
(3) Scientific understanding of human language. Linguists have the goal of understanding what is possible and
impossible in human languages, and through the study of human language capacity, of advancing knowledge of how the
human mind works. For these goals, language extinction is a disaster. The discovery of previously unknown features and
traits in undescribed languages contributes to this goal. For example, the discovery of languages with OVS [Object-Verb-Subject] and OSV [Object-Subject-Verb] basic word orders forced abandonment of previously postulated universals of
language. Since languages with these basic word orders were not previously known, it was claimed that “the dominant order
is almost always one in which the subject precedes the object” (Greenberg 1966:177), like English with SVO or Japanese
with SOV. However, languages such as Hixkaryana (Brazil, 350 speakers) were discovered with OVS basic word order, as
in:
toto    yonoye    kamura
man     ate       jaguar
‘The jaguar ate the man.’
Discovery of languages with these previously unattested basic word orders forced this claim to be abandoned. It is all too
plausible, however, given the recent loss of many languages in Brazil where most of the OVS and OSV languages were
found, that the few languages with these word orders could have become extinct before they were described, leaving us
forever in error about what is possible in human language and how that reflects human cognition.
The discovery of a new speech sound is to linguists like the discovery of a new species to biologists. Recent discoveries of
new speech sounds in threatened languages have led to testing scientific claims about sound systems and to refining our
knowledge. Linguists document endangered languages to discover information of this sort, to determine the full range of
what is possible in human languages.
(4) Human rights. Language loss is often not voluntary; it frequently involves violations of human rights, with oppression
or repression of speakers of minority languages. It is a matter of injustice when people are forced to give up their languages
by repressive regimes or prejudiced dominant societies.
Related to this is the personal loss associated with the death of one’s heritage language. Language loss is often experienced
as a crisis of social identity. Our psychological, social, and physical well-being is connected with our native language; it
shapes our values, self-image, identity, relationships, and ultimately success in life. For many communities, work towards
language revitalization is not about language alone, but is part of a “larger effort to restore personal and societal wellness”
(Pfeiffer and Holm 1994, of the Navajo Nation’s Education Division). Many indigenous voices affirm the importance of
language in cultural identity:
Linguistic diversity ... constitutes one of the great treasures of humanity, an enormous storehouse of expressive
power and profound understanding of the universe. The loss of hundreds of languages that have already passed into
history is an intellectual catastrophe in every way comparable in magnitude to the ecological catastrophe we face
today as the earth’s tropical forests are swept by fire. Each language still spoken is fundamental to the personal,
social and – a key term in the discourse of indigenous peoples – spiritual identity of its speakers. (Zepeda [Tohono
O’odham nation] and Hill 1991.)
But why save our languages ... we should save our languages because it is the spiritual relevance that is deeply
embedded in our own languages that is important. (Richard Littlebear [Northern Cheyenne, President of Chief Dull
Knife College, Lame Deer]. 1999:1.)
I canʼt stress enough the importance of retaining our tribal languages, when it comes to the core relevance or
existence of our people … You could argue that when a tribe loses its language, it loses a piece of its inner-most
being, a part of its soul or spirit … When it comes to native languages, the situation is simple: Use it or lose it.
(Sonny Skyhawk [Sicangu Lakota, Hollywood actor] 2012.)
Language loss does not promote peace. It is often claimed that there would be more harmony if there were just one or only
a few languages in the world. Some see language loss as promoting greater understanding and fostering world peace. This is
wrong. Having only one language is no guarantee of “understanding.” We need look no further than the conflicts in
monolingual Northern Ireland, the former Yugoslavia (where Serbians and Croatians have a common language), or the 1994
Rwanda genocide (involving Hutu and Tutsi, both speakers of Kinyarwanda), not to mention the US Civil War. National
unity is not fostered by monolingualism; rather, recognition of minority languages’ rights may be a better way of bringing
about peace, understanding, and ultimately national unity, as in relatively peaceful multilingual Belgium, Finland, or
Switzerland.
References
Cox, Paul Alan. 1993. Saving the ethnopharmacological heritage of Samoa. Journal of Ethnopharmacology 38.181-8.
Cox, Paul Alan. 2001. Will Tribal Knowledge Survive the Millennium? Science 287.5450.44-5.
Felger, Richard and Mary Beck Moser. 1973. Eelgrass (Zostera marina L.) in the Gulf of California. Science 181.4097.355-6.
Greenberg, Joseph H. 1966. Some universals of grammar with particular reference to the order of meaningful elements.
Universals of Language, ed. by Joseph H. Greenberg, 73-113. Cambridge, MA: MIT Press.
Littlebear, Richard. 1999. Some Rare and Radical Ideas for Keeping Indigenous Languages Alive. Revitalizing Indigenous
Languages, edited by Jon Reyhner, Gina Cantoni, Robert N. St. Clair, and Evangeline Parsons Yazzie, 1-5. Flagstaff, AZ:
Northern Arizona University.
Skyhawk, Sonny. 2012. Why should we keep tribal languages alive? Indian Country, April 6, 2012
(http://indiancountrytodaymedianetwork.com/2012/04/06/why-should-we-keep-tribal-languages-alive-99182).
Zepeda, Ofelia and Jane H. Hill. 1991. The condition of Native American languages in the United States. Endangered
Languages, ed. by R. H. Robins and E. M. Uhlenbeck, 135-55. Oxford: Berg.
Scale of Endangerment

Level of Endangerment: 5 Critically Endangered; 4 Severely Endangered; 3 Endangered; 2 Threatened; 1 Vulnerable; 0 Safe [1]

Intergenerational Transmission
5 Critically Endangered: Few speakers, all elderly.
4 Severely Endangered: Many of the grandparent generation speak the language.
3 Endangered: Some of child-bearing age know the language, but do not speak it to children.
2 Threatened: Most adults of child-bearing age speak the language.
1 Vulnerable: Most adults and some children are speakers.
0 Safe: All community members/members of the ethnic group speak the language.

Absolute Number of Speakers
5 Critically Endangered: 1-9 speakers
4 Severely Endangered: 10-99 speakers
3 Endangered: 100-999 speakers
2 Threatened: 1000-9999 speakers
1 Vulnerable: 10,000-99,999 speakers
0 Safe: >100,000 speakers

Speaker Number Trends
5 Critically Endangered: A small percentage of community members or members of the ethnic group speak the language; the rate of language shift is very high.
4 Severely Endangered: Fewer than half of community members or members of the ethnic group speak the language; the rate of language shift is accelerated.
3 Endangered: About half of community members or members of the ethnic group speak the language; language shift is frequent but not rapidly accelerating.
2 Threatened: A majority of community members or members of the ethnic group speak the language; the number of speakers is gradually diminishing.
1 Vulnerable: Most community members or members of the ethnic group are speakers; speaker numbers are diminishing, but at a slow rate.
0 Safe: Almost all community members or members of the ethnic group speak the language; speaker numbers are stable or increasing.

Domains of Use of the Language
5 Critically Endangered: Used only in very few domains (for example, restricted to ceremonies or to a few specific domestic activities); a majority of speakers supports language shift; no institutional support.
4 Severely Endangered: The language is being replaced even in the home; some speakers may value their language while the majority support language shift; very limited institutional support, if any.
3 Endangered: Used mainly just in the home; some speakers may value their language but many are indifferent or support language shift; no literacy or education programs exist for the language; government encourages shift to the majority language; there is little outside institutional support.
2 Threatened: Used in non-official domains; shares usage in social domains with other languages; most value their language but some are indifferent; education and literacy programs are rarely embraced by the community; government has no explicit policy regarding minority languages, though some outside institutions support the languages.
1 Vulnerable: Used in all domains except official ones (i.e., government and workplace); nearly all speakers value their language and are positive about using it (prestigious); education and literacy in the language are available, but only valued by some; government and other institutional support for use in non-official domains.
0 Safe: Used in government, mass media, education and the workplace; most speakers value their language and are enthusiastic about promoting it; education and literacy in the language are valued by most community members; government and other institutions support the language for use in all domains.
[1] In order for a language to be considered ‘Safe,’ it must receive a 0 rating in all four categories. If a language’s composite score is 0%
but the level of certainty is anything less than ‘Certain,’ it will be considered ‘At risk.’

Computing Level of Endangerment:
Intergenerational Transmission will be worth twice each of the other factors. Because many languages will not have
reliable data for some of these factors, the total score will be based on the percentage of points out of the total points
possible based on the number of factors considered. (100-81% = Critically Endangered; 80-61% = Severely Endangered;
60-41% = Endangered; 40-21% = Threatened; 20-1% = Vulnerable; 0% = Safe)
Level of Certainty will be computed based simply on the percentage of factors that are known and entered. (25 points
possible = Certain; 20 points possible = Mostly Certain; 15 points possible = Fairly Certain; 10 points possible = Mostly
Uncertain; 5 points possible = Uncertain)
Examples:

                  Intergen.     Abs. #   Speaker   Domains   Total   Status
                  Trans. (x2)            Trends
Language A        6             4        3         3         16      Severely Endangered; Certain
  Pts. possible   10            5        5         5         25
Language B        8             5        0         0         13      Critically Endangered; Fairly Certain
  Pts. possible   10            5        0         0         15
Language C        0             3        0         0         3       Endangered; Uncertain
  Pts. possible   0             5        0         0         5
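
The computation described above can be sketched in a few lines of Python. This is only an illustration, not the Catalogue's own software: the function name, the factor names, and the use of None for an unknown factor are assumptions made here. The band edges are read as "above 80%", "above 60%", and so on, which is also an assumption, since the text gives only whole-number ranges.

```python
from typing import Dict, Optional, Tuple

# Each factor is rated 0-5; intergenerational transmission counts double.
WEIGHTS = {"transmission": 2, "absolute_number": 1, "trends": 1, "domains": 1}
MAX_RATING = 5

# Level of certainty keyed by the number of points possible.
CERTAINTY = {25: "Certain", 20: "Mostly Certain", 15: "Fairly Certain",
             10: "Mostly Uncertain", 5: "Uncertain"}


def endangerment_status(ratings: Dict[str, Optional[int]]) -> Tuple[float, str, str]:
    """Return (percent, level of endangerment, level of certainty).

    `ratings` maps a factor name to its 0-5 rating, or None if unknown
    (hypothetical input format, chosen for this sketch).
    """
    earned = 0
    possible = 0
    for factor, weight in WEIGHTS.items():
        rating = ratings.get(factor)
        if rating is None:
            continue  # unknown factors add nothing to either total
        earned += weight * rating
        possible += weight * MAX_RATING
    if possible == 0:
        raise ValueError("at least one factor must be rated")

    percent = 100 * earned / possible
    certainty = CERTAINTY[possible]

    if percent == 0:
        # Footnote 1: 'Safe' requires a 0 rating on all four factors.
        level = "Safe" if certainty == "Certain" else "At risk"
    elif percent > 80:
        level = "Critically Endangered"
    elif percent > 60:
        level = "Severely Endangered"
    elif percent > 40:
        level = "Endangered"
    elif percent > 20:
        level = "Threatened"
    else:
        level = "Vulnerable"
    return percent, level, certainty


# Language A from the examples: transmission rated 3 (counted as 6).
print(endangerment_status(
    {"transmission": 3, "absolute_number": 4, "trends": 3, "domains": 3}))
# (64.0, 'Severely Endangered', 'Certain')
```

Languages B and C from the examples correspond to ratings of transmission 4 and absolute number 5 (13 of 15 points), and absolute number 3 alone (3 of 5 points), with the remaining factors left unknown.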
Need for Documentation Scale
The need for documentation is based on the adequacy of available documentation of three types: grammar, dictionary, and
corpus. Each of these factors has a total number of points; the number of points received is a percentage that is then
weighted. Grammar weighs 4; Dictionary weighs 2; Corpus weighs 1.
Grammar (Factor 1 out of 3)

Size (description)            Score
large, comprehensive          4
basic reference grammar       3
grammatical sketch            2
treats some aspects           1
nothing                       0

Criteria                      Scientific   Accessible
Yes                           x 1.5        x 1.5
No (remains unchanged)        x 1          x 1

Highest Score Possible: 9
Lowest Score: 0

Example:
Basic reference grammar, pre-scientific, accessible
3 x 1 x 1.5 = 4.5
Note: A grammar is considered either scientific or pre-scientific. In terms of its score, this is a function of its size: a
scientific grammar is 1.5 times the value of a pre-scientific grammar of the same size. The same holds for accessibility: it is a
binary matter (accessible or not) rather than a range.
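
As a rough illustration of the grammar factor, the size score and the two multipliers can be combined as follows. The function name and the size labels used as dictionary keys are assumptions made for this sketch.

```python
# Size scores from the grammar table; the keys are labels chosen for this sketch.
GRAMMAR_SIZE = {
    "large, comprehensive": 4,
    "basic reference grammar": 3,
    "grammatical sketch": 2,
    "treats some aspects": 1,
    "nothing": 0,
}


def grammar_score(size: str, scientific: bool, accessible: bool) -> float:
    """Size score, multiplied by 1.5 for each criterion that is met."""
    score = float(GRAMMAR_SIZE[size])
    if scientific:
        score *= 1.5
    if accessible:
        score *= 1.5
    return score


# The worked example: basic reference grammar, pre-scientific, accessible.
print(grammar_score("basic reference grammar", scientific=False, accessible=True))  # 4.5
```

A large, comprehensive grammar that is both scientific and accessible reaches the stated maximum of 9 (4 x 1.5 x 1.5).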
Dictionary (Factor 2 out of 3)

Size (# of words)         Score
> 5,000                   3
2,000 - 5,000             2
< 2,000                   1
Nothing                   0

Bonus points:
Criteria                  Present in dict.   Absent from dict.
Example Sentences         +1                 0
Usage                     +1                 0
Cultural explanations     +1                 0

Accessibility – a factor of the total score for the dictionary
Accessible                x 1.5
Inaccessible              x 1

Highest Score Possible: 9
Lowest Score: 0

Example:
2,750 words, no example sentences, usage present, no cultural expl., accessible
(2 + 0 + 1 + 0) x 1.5 = 4.5
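
The dictionary factor can be sketched the same way. The function and parameter names are assumptions made here, and a dictionary of exactly 5,000 entries is placed in the 2,000 - 5,000 band, following the table.

```python
def dictionary_score(entries: int, example_sentences: bool, usage: bool,
                     cultural_explanations: bool, accessible: bool) -> float:
    """Size score plus bonus points, multiplied by 1.5 if accessible."""
    if entries > 5000:
        base = 3
    elif entries >= 2000:
        base = 2
    elif entries > 0:
        base = 1
    else:
        return 0.0  # no wordlist or dictionary at all

    # One bonus point for each quality criterion that is present.
    bonus = int(example_sentences) + int(usage) + int(cultural_explanations)
    multiplier = 1.5 if accessible else 1.0
    return (base + bonus) * multiplier


# The worked example: 2,750 words, usage information only, accessible.
print(dictionary_score(2750, False, True, False, True))  # 4.5
```

A dictionary with more than 5,000 entries, all three bonus criteria, and accessible reaches the stated maximum of 9.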
Corpus (Factor 3 out of 3)

Size of annotated audio/video texts:
Length          Score
> 120 min.      4
119-60 min.     3
59-15 min.      2
< 15 min.       1
Nothing         0

Written texts (with no corresponding audio/video): +0.5
Unannotated audio/video: +0.5

Highest possible score: 5
Lowest score: 0

Example:
30 min annotated transcription, and some written texts, and some unann. audio
2 + 0.5 + 0.5 = 3
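
A minimal sketch of the corpus factor follows. The names are assumptions made here. The table leaves exactly 120 minutes unassigned (it falls between "> 120" and "119-60"); this sketch places it in the lower band, which is also an assumption.

```python
def corpus_score(annotated_minutes: float, written_texts: bool,
                 unannotated_av: bool) -> float:
    """Score for annotated audio/video, plus 0.5 for each extra text type."""
    if annotated_minutes > 120:
        base = 4.0
    elif annotated_minutes >= 60:
        base = 3.0  # includes exactly 120 min. (assumption, see above)
    elif annotated_minutes >= 15:
        base = 2.0
    elif annotated_minutes > 0:
        base = 1.0
    else:
        base = 0.0

    if written_texts:
        base += 0.5  # written texts with no corresponding audio/video
    if unannotated_av:
        base += 0.5  # audio/video recordings without annotation
    return base


# The worked example: 30 min. annotated, some written texts, some unannotated audio.
print(corpus_score(30, written_texts=True, unannotated_av=True))  # 3.0
```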
Example language score:
Total Score Based on All Three Factors (weighted mean)

Section                    Score    Percent
Grammar                    4.5/9    50%
Dictionary                 4.5/9    50%
Annotated Corpus (text)    3/5      60%

Grammar weighted: 2x dictionary
Dictionary weighted: 2x annotated corpus

(4(50) + 2(50) + 1(60)) / (4 + 2 + 1) = 51% (documented)

High Need for Documentation

Need for Documentation:
Urgent       0-19%
Very High    20-39%
High         40-59%
Moderate     60-79%
Low          80-99%
Very Low     100%
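
The weighted mean and the resulting rating can be sketched as below. The function names are assumptions made for this illustration, and fractional percentages are assigned to bands by treating "0-19%" as "below 20%", and so on, which the text does not state explicitly.

```python
def documented_percent(grammar: float, dictionary: float, corpus: float) -> float:
    """Weighted mean of the three factor percentages (weights 4, 2, 1)."""
    grammar_pct = 100 * grammar / 9        # grammar is scored out of 9
    dictionary_pct = 100 * dictionary / 9  # dictionary is scored out of 9
    corpus_pct = 100 * corpus / 5          # corpus is scored out of 5
    return (4 * grammar_pct + 2 * dictionary_pct + 1 * corpus_pct) / (4 + 2 + 1)


def need_for_documentation(percent: float) -> str:
    """Map the documented percentage to a need-for-documentation rating."""
    if percent < 20:
        return "Urgent"
    if percent < 40:
        return "Very High"
    if percent < 60:
        return "High"
    if percent < 80:
        return "Moderate"
    if percent < 100:
        return "Low"
    return "Very Low"


# The example language: grammar 4.5/9, dictionary 4.5/9, corpus 3/5.
pct = documented_percent(4.5, 4.5, 3)
print(round(pct), need_for_documentation(pct))  # 51 High
```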
Behind the Need for Documentation ratings
The Need for Documentation Index is designed to offer, at a glance, how well documented a language is, and thus what the
need for documentation is for that language. This is based on an evaluation of the published documentation in three areas:
grammar, dictionaries/lexicon, and texts/corpora. All material relating to one of these areas is evaluated together to provide
an overall picture. The initial evaluation is carried out by ELCat researchers, with further review sought from users as new
documentation is discovered, written or published and becomes available.
Grammars – Grammatical documentation may consist of book-length published grammars, shorter grammatical sketches or
articles on particular aspects of a language’s grammar. A large, comprehensive grammar (score of 4) covers all major
aspects of the language (phonology, morphology, syntax, etc.) and leaves little to nothing to be desired by a person wishing
to know more about the language. An example of this would be Dixon’s (1977) grammar of Yidiny. A basic reference
grammar (score of 3) covers most, but not all major aspects of the language (e.g., little phonological information, but lots on
syntax). A grammatical sketch (score of 2) is much shorter and provides only preliminary information about some aspects
of the language. Documentation that treats some aspects (score of 1) provides information about very limited topics in the
language’s grammar, even if it explores those topics in a thorough way. Finally, if the language has no available
documentation dealing specifically with the grammar it receives a score of 0. For example, at the time of writing, no
documentation is available for the grammars of languages such as Kujarge and Guriaso.
If the documentation is informed by modern linguistic training and is written in a way that is useful to today’s
linguists, it is rated as ‘scientific’ and the score is adjusted. Hence, the score for grammars such as Dixon’s grammar of
Yidiny would be adjusted. If the documentation is not written in such a way that it takes advantage of common
generalizations observed in linguistics, the score remains the same. Documentation that is easy to find through university
libraries or on the internet, written in a language of wider communication, and is not written in a specific theoretical
framework is rated as accessible and the score is adjusted. This applies to grammars such as Elkins’ (1970) grammar of
Western Bukidnon Manobo, which can be found in its entirety online. If it fails to meet all of these criteria, then the score
remains the same.
Dictionaries/Lexicon – Lexical documentation, including all available wordlists and/or dictionaries, is evaluated first by the
number of entries. More than 5,000 entries receive a score of 3; 2,000 – 4,999 entries receive a score of 2; fewer than 2,000 entries receive
a score of 1; and no available wordlists receive a score of 0. The quality of those entries is then evaluated by three criteria:
if the entries include example sentences, it receives one extra point; if the entries include information about how the words
are used in phrases, sentences or discourse, it receives one extra point; if the entries include information that places words in
their cultural contexts, it receives one extra point. The above considerations are important because they can help to make a
dictionary or wordlist more useful for its users. Finally, if the dictionary/wordlist is not available through university
libraries or on the internet, or if it uses special symbols or terms that are not explained, or does not include definitions in a
9 language of wider communication, then it is considered inaccessible and the score remains the same. If not, it is considered
accessible and the score is adjusted. Hence, a work like Blust’s (2003) dictionary of Thao would receive a full score of three
for having more than 5000 entries, as well as extra points for including example sentences, information on how an entry is
used in a phrase, sentence or discourse, as well as cultural information. Finally its score is adjusted for being accessible: the
information appears on Google books.
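The dictionary rubric above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Catalogue's own implementation: the function name `dictionary_score`, its parameter names, and the size of the accessibility adjustment (`accessibility_adjustment`, defaulting to one point) are hypothetical, since the text says only that the score "is adjusted". The text also leaves a count of exactly 5,000 entries unassigned; the sketch places it in the top band.

```python
def dictionary_score(entries, has_examples=False, has_usage_info=False,
                     has_cultural_context=False, accessible=False,
                     accessibility_adjustment=1.0):
    """Score lexical documentation following the rubric described above.

    accessibility_adjustment is a hypothetical value: the source says only
    that the score "is adjusted" for accessible dictionaries.
    """
    if entries <= 0:
        return 0.0  # no available wordlists
    if entries < 2000:
        base = 1
    elif entries < 5000:
        base = 2
    else:
        base = 3  # assumption: exactly 5,000 entries falls in the top band
    # One extra point for each quality criterion that is met.
    quality = sum([has_examples, has_usage_info, has_cultural_context])
    adjustment = accessibility_adjustment if accessible else 0.0
    return base + quality + adjustment


# A dictionary like the Thao example: more than 5,000 entries, all three
# quality criteria met, and accessible.
thao_like = dictionary_score(5001, True, True, True, accessible=True)
```

With the assumed one-point adjustment, `thao_like` is the base score of 3 plus three quality points plus the adjustment; a different adjustment size would change only the last term.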
Texts/Corpora – Textual documentation consists of recordings of connected speech in a variety of contexts, such as
conversations, personal narratives, rituals, instructions, myths/folklore, etc. Our primary consideration is texts that are
accessible online or through archives and that are most useful because they include recordings (audio or video) and are
annotated with word-by-word or morpheme-by-morpheme glossing and a free translation. Texts meeting these criteria
that total more than 120 minutes receive a score of 4; 60-119 minutes receive a score of 3; 15-59 minutes receive a score of 2;
fewer than 15 minutes receive a score of 1; and no texts of this kind merit a score of 0. If the language has written texts (annotated or not)
with no corresponding audio or video, it receives 0.5 points, and if the language has audio/video recordings which include
no annotation, it also receives 0.5 points. For example, since Chamorro has more than 120 minutes worth of annotated
audio corpus, it receives a score of 4. It also receives 0.5 for unannotated audio material and 0.5 for having written texts
available, hence scoring a total of 5 for corpus. On the other hand, a language such as Chrau, which has no annotated
corpus, no unannotated audio and no written texts, would score a total of zero on the corpus scale.
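The text rubric can be sketched the same way. The function and parameter names below are hypothetical, and the handling of exactly 120 minutes (which the bands in the text leave unassigned) is an assumption; the two calls at the end mirror the Chamorro and Chrau examples.

```python
def text_score(annotated_minutes, has_unannotated_recordings=False,
               has_written_texts=False):
    """Score textual documentation following the rubric described above."""
    if annotated_minutes <= 0:
        base = 0  # no annotated, recorded texts
    elif annotated_minutes < 15:
        base = 1
    elif annotated_minutes < 60:
        base = 2
    elif annotated_minutes < 120:
        base = 3
    else:
        base = 4  # assumption: exactly 120 minutes falls in the top band
    # Half a point each for unannotated recordings and for written texts.
    bonus = 0.5 * has_unannotated_recordings + 0.5 * has_written_texts
    return base + bonus


chamorro_like = text_score(121, has_unannotated_recordings=True,
                           has_written_texts=True)  # 4 + 0.5 + 0.5
chrau_like = text_score(0)  # no annotated corpus, no audio, no written texts
```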
Overall score – The total need for documentation is computed by weighting the scores in each of the categories. Grammars
(x 4) are worth twice as much as dictionaries; dictionaries (x 2) are worth twice as much as texts; and texts (x 1) carry a
weight of one. The total grammar score is divided by the points possible for a grammar, yielding a percentage which is then
weighted by four. This score is then added to the dictionary points percentage (calculated the same way as the grammar
score), which is weighted by two, and added to the percentage of text points. This total is converted to a percentage of total
documentation that corresponds to the following levels of need:
Urgent       0-19%
Very High    20-39%
High         40-59%
Moderate     60-79%
Low          80-99%
Very Low     100%
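The weighting and the mapping to levels of need can be sketched as follows. This is a hedged illustration, not the Catalogue's code: the names are hypothetical, the maximum points per category are passed in as parameters because they are not all stated here (the values in the example are made up), and the final conversion to a percentage is assumed to be a division by the sum of the weights (4 + 2 + 1 = 7). Fractional percentages that fall between the printed bands (for example 19.5%) are assigned to the lower band.

```python
WEIGHTS = {"grammar": 4, "dictionary": 2, "texts": 1}

# Lower bound of each band (in percent) and the level of need it maps to.
LEVELS = [(100, "Very Low"), (80, "Low"), (60, "Moderate"),
          (40, "High"), (20, "Very High"), (0, "Urgent")]


def documentation_percentage(scores, max_points):
    """Weighted percentage of total documentation across the three categories."""
    weighted = sum(WEIGHTS[c] * scores[c] / max_points[c] for c in WEIGHTS)
    return 100 * weighted / sum(WEIGHTS.values())


def need_level(percentage):
    """Map a documentation percentage to a level of need."""
    for lower_bound, label in LEVELS:
        if percentage >= lower_bound:
            return label
    raise ValueError("percentage must be between 0 and 100")


# Hypothetical maxima and scores, for illustration only.
max_points = {"grammar": 4, "dictionary": 6, "texts": 5}
scores = {"grammar": 2, "dictionary": 3, "texts": 5}
pct = documentation_percentage(scores, max_points)
level = need_level(pct)
```

In this made-up case the grammar and dictionary are each half documented and the texts fully documented, so the weighted total is 4 out of 7, which lands in the 40-59% band.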
Reasons these scores are only rough guides:
It is impossible to know whether the grammatical documentation for a language covers all, or even close to all, of the topics
in the language. First, it would take someone very familiar with the language to decide that all topics were adequately
covered; second, there may be interesting topics in the language that have not yet been considered. Therefore, our
evaluation of a grammar as ‘comprehensive’ is based on an educated guess.
Basing the quality of lexical documentation (i.e., dictionaries) on the number of entries is a necessary first step, though this
can be misleading because not all entries are equal. Some dictionaries may inflate the number of entries by including
inflected forms which are predictable. A language may have a perfectly adequate dictionary based just on roots – a
dictionary like this would have a much lower number of entries.
Textual documentation is evaluated only on what is available to the researchers. In some cases, this may mean that there is
significant textual documentation that we have not evaluated and that the score might be higher. However, until the texts
are made available to the wider public, we cannot consider the amount of textual documentation to be satisfactory.
When considering the quality of textual documentation, it is important to consider whether a wide variety of genres exists in
the available documentation. Because of practical considerations, however, we have reluctantly decided not to consider this
factor. First, it is very hard to determine exactly which genres are covered in a corpus and, second, there are challenges in
determining whether some texts should be considered a single or multiple genres. (E.g., Are marriage ceremonies and
funeral ceremonies considered one genre – ritual – or two?)
The overall rating of the need for documentation is inherently arbitrary because it is determined by numerical cut-offs. The
difference between a rating of Low and Moderate need can be as little as one percentage point (80% versus 79%), which is
of course unrealistic. Unfortunately, there is no easy solution to this.
References:
Blust, R. 2003. Thao dictionary. Language and Linguistics Monograph Series, No. A5. Taipei: Institute of Linguistics
(Preparatory Office), Academia Sinica.
Dixon, R.M.W. 1997. A Grammar of Yidiny. Cambridge: Cambridge University Press.
Elkins, R.E. 1970. Major grammatical patterns of Western Bukidnon Manobo. SIL Publications in Linguistics and Related
Fields.
Silent Languages
To understand the plight of endangered languages today, it is valuable to be able to see just how many languages have
become extinct, and to compare the list with the number of living languages and with the number of currently endangered
languages. However, “extinction” is not straightforward.
Two lists are presented here. One is of languages which are well and truly extinct. The second list is of languages that are
sometimes declared to have no remaining native speakers but whose status may not be certain. The languages of this second
list underscore the need for careful and urgent attention to these cases.
Two telling questions are: when is a language “extinct”, and indeed what does it mean for a language to be “extinct”? Where
there have been no known speakers for hundreds or even thousands of years, extinction is clear and uncontroversial.
However, there are uncertain cases in which one source says the language in question is “extinct,” “probably
extinct,” “possibly extinct,” or has “no known speakers”, while another equally credible source reports it as still having
some speakers or possibly some speakers. The second list includes these languages and also languages whose last fluent
speaker is reported to have died in recent times, even when sources do not disagree. In some cases, further speakers were
later found for languages that had recently been declared extinct. Most of the languages reported to have recently lost their
last speakers probably are truly no longer spoken; nevertheless, it is possible that in some cases unknown speakers may yet
turn up. For that reason we give these languages considerable benefit of the doubt. These languages are all included in the
Catalogue of Endangered Languages, together with whatever is reported in sources about their status. There are 141 entries
in the Catalogue of Endangered Languages that fall into this possibly speakerless but unclear category, where sources
disagree or where the languages have only recently been said to no longer have speakers – 141 is no small set.
But there is much more to the extinction story. When a language qualifies as extinct is not a precise matter. For some
scholars, a language is considered extinct when there are no longer any completely fluent native speakers who learned the
language as children. For others, a language that may lack fluent native speakers but still has semi-speakers or is being
learned as a second language is not considered extinct. Moreover, many prefer to avoid calling any language extinct where
people whose heritage languages are involved may be interested in attempting to learn or revitalize it, to avoid discouraging
such efforts. To encourage efforts toward recovery of a language that lacks fully fluent native speakers, or for that matter,
lacks any speakers of any sort, some prefer to speak of such languages as “silent”, “sleeping”, “dormant”, or just
“unspoken”.
The second list covers languages for which there is suspicion, or even good reason to believe, but no certainty, that there
are no longer native speakers. It serves to call attention to those languages that are perhaps no longer spoken but where it
may be possible, nevertheless, that speakers remain. Such extremely precarious languages merit high priority. Many will
be and should be the objects of caring concern by those whose heritage languages these cases represent.
These lists demonstrate starkly the problem of language endangerment by showing just how many of the world’s languages
have already become extinct or are “silent” (“sleeping”), in contrast to the great many languages that are currently
endangered, listed in this catalogue. Up to now, 635 known languages appear on these two lists, just under 10% of the
languages known ever to have existed. Already all the languages of more than 100 language families (including language
isolates) are extinct from among the 420 independent language families (including isolates) in the world – roughly 25% of
the linguistic diversity of the world has already disappeared. Worse, this number will change radically and rapidly: the
Catalogue of Endangered Languages has just over 3,000 entries from among the approximately 7,000 living languages in
the world – by this count, 43% of living languages are endangered! The number of extinct languages will soon swell
dramatically. Clearly, as these numbers show, languages on a course towards extinction are vastly more numerous now
than in the past.
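The figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the round numbers given in the text (141 and 494 languages on the two lists, 100 of 420 families, 3,000 of 7,000 living languages); the variable and function names are illustrative. Because the text says "more than 100" families, the family share computed from exactly 100 is a lower bound of about 24%; 105 families would give exactly 25%.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100 * part / whole


uncertain_status = 141   # languages on the first list
extinct = 494            # languages on the second list
total_listed = uncertain_status + extinct  # 635, as stated in the text

# Lower bound: the text says "more than 100" of 420 families are extinct.
families_share = percent(100, 420)

# "Just over 3,000" entries out of "approximately 7,000" living languages.
endangered_share = percent(3000, 7000)

print(total_listed, round(families_share, 1), round(endangered_share, 1))
```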
1. Languages of Uncertain but Precarious Status (sometimes reported as having no speakers) (141 languages):
Agwamin [wmi]
Akkala Saami [sia]
Alngith [aid]
Amanaye [ama]
Amonap [mzo]
Arabana-Wangkangurru [ard]
Arapaso [arp]
Arara (Arara do Beiradao) [axg]
Ariba [aea]
Atampaya [amz]
Atsugewi [atw]
Ayapathu [ayd]
Bare [bae]
Baygo [byg]
Bung [bgd]
Canichana [caz]
Catawba [chc]
Cayuvava [cyb]
Chiapanec [cip]
Chilanga (Salvadoran Lenca) [len?]
Chimariko [cid]
Chitimacha [ctm]
Chiwere [iow]
Coast Miwok [csi]
Copper Island Aleut [mud]
Cupeño [cup]
Deti [shg-det]
Dirari [dit]
Djangun [djf]
Duungidjawu [wkw-duu]
Eastern Pomo [peb]
Eel River Athabaskan [qt8]
Eyak [eya]
Gamberre [gma]
Ganggalida [gcd]
Garlali [nbx]
Guazacapan Xinka [xin]
Gununa-Kune [pue]
Gurr-goni [gge]
Hanis [csz]
Honduran Lenca [len?]
Gros Ventre [ats]
Hpon [hpo]
Ilgar [ilg]
Itene [ite]
Jawi [djw]
Jiwarli [djl]
Jumaytepeque Xinka [xin]
Kansa [ksk]
Kushyana (Kaxuiana) [kbb]
Kerek [krk]
Klallam [clm]
Korana [kqz]
Kukatj [ggd]
Kuku-Mangk [xmg]
Kuku-Mu'inh [xmp]
Kuku-Ugbanh [ugb]
Kuku-Thaypan [typ]
Laimon [coj]
Lake Miwok [lmw]
Lamalama [lby]
Lapachu [qa6]
Leco [lec]
Lipan [apl]
Lower Chinook [chh]
Lower Chehalis [cea]
Macaguaje [mcl]
Maidu [nmu]
Makolkol [zmh]
Malyangapa [0h1]
Mandahuaca [mht]
Martha’s Vineyard Sign Language [mre]
Mbariman-Gudhinma [zmv]
Miami-Illinois [mia]
Miriti [mvv]
Miwa [vmi]
Muluridyi [vmu]
Mayi-Kutuna [xmy]
Mbabaram [vmb]
Ngamini [nmv]
Ngarinyin [wil]
Ngawun [nxn]
Ngumbarl [08s]
Nimbari [nmr]
Nisenan [nsz]
Njerep [njr]
Nungali [nug]
Nyaki Nyaki [nys]
Nyang'i [nyp]
N|u [ngh]
Ona [ona]
Opata-Eudeve [opt]
Otoe [iowoto]
Paraujano [pbg]
Pauserna [psm]
Phalok [lwl-pha]
Pitta-Pitta [pit]
Plains Miwok [pmw]
Quapaw [qua]
Quileute [qui]
Santiam [kyl]
Saraveca [sar]
Sawknah [swn]
Serrano [ser]
Southeastern Pomo [pom]
Southern Sierra Miwok [skd]
Tapeba [tbb]
Tequiraca [ash]
Teteté [teb]
Teushen [0qk]
Tolowa [tol]
Tora [trz]
Toromona [tno]
Tukumanfed [tkf]
Tuxa [tud]
Umbuygamu [umg]
Umpithamu [umd]
Umutina [umo]
Unami [unm]
Uradhi [urf]
Uru [ure]
Vilela [vil]
Wanggamala [wnm]
Wangganguru [wgg]
Wappo [wao]
Wik-Epa [wie]
Wik-Keyangan [wif]
Wirafed [wir]
Wirangu [wiw]
Wiyot [wiy]
Wotapuri-Katargalai [wsv]
Xakriaba [xkr]
Xiriâna [xir]
Yahuna [ynu]
Yameo [yme]
Yangman [jng]
Yaquina [aes]
Yavitero [yvt]
Yir-Yoront [yiy]
Zire [sih]
Ziriya [zir]
2. Extinct Languages (494 languages)
//Xegwi [xeg]
/Xam [xam]
Abipon [axb]
Abishira [ash]
Abnaki, Eastern [aaq]
Acroá [acs]
Adai [xad]
Aequian [xae]
Aghu Tharnggalu [ggr]
Aghwan [xag]
Agta, Dicamay [duy]
Aguano [aga]
Ahom [aho]
Ajawa [ajw]
Aka-Bea [abj]
Aka-Bo [akm]
Aka-Cari [aci]
Aka-Jeru [akj]
Aka-Kede [akx]
Aka-Kol [aky]
Aka-Kora [ack]
Akar-Bale [acl]
Akkadian [akk]
Alanic [xln]
Algonquian, Carolina [crr]
Alsea [aes]
Ammonite [qgg]
Andaqui [ana]
Andoa [anb]
Anglo-Norman [xno]
Anserma [ans]
Apalachee [xap]
Aquitanian [xaq]
Arabic, Andalusian [xaa]
Aramaic, Jewish Babylonian [tmr]
Aramaic, Jewish Palestinian [jpa]
Aramaic, Official [arc]
Aramaic, Samaritan [sam]
Aranama-Tamique [xrt]
Arára, Mato Grosso [axg]
Aribwatsa [laz]
Arikem [ait]
Arin [xrn]
Arma [aoh]
Armazic [xrm]
Aruá [aru]
Assan [xss]
Atakapa [aqp]
Atsahuaca [atc]
Aushiri [avs]
Auyokawa [auo]
Avar, Old [oav]
Avestan [ave]
Awabakal [awk]
Ayta, Tayabas [ayy]
Bactrian [xbc]
Baga Kaloum [bqf]
Baga Sobané [bsv]
Banggarla [bjb]
Baniva [bvv]
Barbacoas [bpb]
Barbareño [boi]
Baré [bae]
Basa-Gumna [bsl]
Basay [byq]
Bayali [bjy]
Baygo [byg]
Beothuk [bue]
Berti [byt]
Biloxi [bll]
Bina [bmn]
Biri [bzr]
Birked [brk]
Bolgarian [xbo]
Cacaopera [ccr]
Cagua [cbh]
Camunic [xcc]
Caramanta [crf]
Carian [xcr]
Carib, Island [crb]
Catawba [chc]
Cauca [cca]
Cayubaba [cyb]
Cayuse [xcy]
Celtiberian [xce]
Chagatai [chg]
Chané [caj]
Chibcha [chb]
Chicomuceltec [cob]
Chimakum [cmk]
Chipiajes [cbe]
Chiquimulilla Xinka [xinchi]
Cholón [cht]
Chorasmian [xco]
Chorotega [cjr]
Chumash [chs]
Chuvantsy [xcv]
Coahuilteco [xcw]
Cochimi [coj]
Comecrudo [xcm]
Coptic [cop]
Coquille [coq]
Cornish [cnx]
Cotoname [xcn]
Coxima [kox]
Coyaima [coy]
Creole Dutch, Skepi [skw]
Cruzeño [crz]
Cumanagoto [cuo]
Cumbric [xcb]
Cumeral [cum]
Curonian [xcu]
Dacian [xdc]
Dagoman [dgn]
Dalmatian [dlm]
Deir Alla [xdr]
Delaware, Pidgin [dep]
Dhurga [dhu]
Dieri [dif]
Dororo [drr]
Duli [duz]
Dura [drq]
Eblan [xeb]
Edomite [xdm]
Egyptian [egy]
Elamite [elx]
Elymian [xly]
Emok [emo]
Epi-Olmec [xep]
Esselen [esq]
Esuma [esm]
Etchemin [etc]
Eteocretan [ecr]
Eteocypriot [ecy]
Etruscan [ett]
Faliscan [xfa]
Frankish [frk]
Gabrielino-Fernandeño [xgf]
Gafat [gft]
Galatian [xga]
Galice [gce]
Galindan [xgl]
Gamo-Ningi [bte]
Gangulu [gnl]
Garza [xgr]
Gaulish, Cisalpine [xcg]
Gaulish, Transalpine [xtg]
Geez [gez]
Gey [guv]
Ghomara [gho]
Gothic [got]
Greek, Cappadocian [cpg]
Guana [gqn]
Guanche [gnc]
Gugu Warra [wrw]
Gule [gly]
Guliguli [gli]
Gureng Gureng [gnr]
Guyani [gvy]
Hadrami [xhd]
Harami [xha]
Hattic [xht]
Hermit [llf]
Hernican [xhr]
Hibito [hib]
Hittite [hit]
Homa [hom]
Horo [hor]
Hunnic [xhc]
Hurrian [xhu]
Iberian [xib]
Ifo [iff]
Illyrian [xil]
Ineseño [inz]
Iowa-Oto [iow]
Jorá [jor]
Jurchen [juc]
Kaimbé [xai]
Kakauhua [kbf]
Kalapuya, Northern [nrt]
Kalapuya, Southern [sxk]
Kalarko [kba]
Kalkutung [ktg]
Kamakan [vkm]
Kamas [xas]
Kamba [xba]
Kambiwá [xbw]
Kaniet [ktk]
Kanoé [kxo]
Kapinawá [xpn]
Kara [zra]
Kara
Karakhanid [xqa]
Karami [xar]
Karankawa [zkk]
Karipúna [kgm]
Karirí-Xocó [kzw]
Kariyarra [vka]
Karkin [krb]
Kaskean [zsk]
Katabaga [ktq]
Kaurna [zku]
Kawi [kaw]
Kazukuru [kzk]
Kepkiriwát [kpn]
Ketangalan [kae]
Khazar [zkz]
Khorezmian [zkh]
Kitan [zkt]
Kitsai [kii]
Knaanic [czk]
Koguryo [zkg]
Koibal [zkb]
Kott [zko]
Kpati [koc]
Krevinian [zkv]
Kubi [kof]
Kulon-Pazeh [uun]
Kuman [qwm]
Kungarakany [ggk]
Kunza [kuz]
Kusunda [kgg]
Kw'adza [wka]
Kwadi [kwz]
Kwalhioqua-Tlatskanai [qwt]
Langobardic [lng]
Laurentian [lre]
Lemnian [xle]
Leningitij [lnj]
Lepontic [xlp]
Liburnian [xli]
Ligurian [xlg]
Linear A [lab]
Lingua Franca [pml]
Loup A [xlo]
Loup B [xlb]
Lumbee [lmz]
Lusitanian [xls]
Luwian, Cuneiform [xlu]
Luwian, Hieroglyphic [hlu]
Lycian [xlc]
Lydian [xld]
Macedonian, Ancient [xmk]
Maek [hmk]
Mahican [mjy]
Maidu, Valley [vmv]
Malgana [vml]
Mamulique [emm]
Manangkari [znk]
Mandaic, Classical [myz]
Mangue [mom]
Manipuri, Old [omp]
Manx [glv]
Maritsauá [msp]
Marrucinian [umc]
Marsian [ims]
Matagalpa [mtn]
Mator [mtm]
Mator-Taygi-Karagas [ymt]
Mattole [mvb]
Mawa [wma]
Maykulan [mnt]
Mbara [mvl]
Median [xme]
Meroitic [xmr]
Mesmes [mys]
Messapic [cms]
Michigamea [cmm]
Miluk [iml]
Milyan [imy]
Minaean [inm]
Minoan [omn]
Mittu [mwu]
Miwok, Bay [mkq]
Miwok, Coast [csi]
Mlahsö [lhs]
Moabite [obm]
Mobilian [mod]
Mochica [omc]
Mohegan-Montauk-Narragansett [mof]
Moksela [vms]
Molale [mbe]
Mozarabic [mxi]
Mulaha [mfw]
Muskum [mje]
Mysian [yms]
Nadruvian [ndf]
Nagumi [ngv]
Nanticoke [nnt]
Narrinyeri [nay]
Natagaimas [nts]
Natchez [ncz]
Nawathinehena [nwa]
Negerhollands [dcr]
Neo-Aramaic, Barzani Jewish [bjf]
Newar, Middle [nwx]
Ngandi [nid]
Nganyaywana [nyx]
Ngbee [jgb]
Niuatoputapu [nkp]
Nocamán [nom]
Nooksack [nok]
Noric [nrc]
Norn [nrn]
North Arabian, Ancient [xna]
Nottoway-Meherrin [nwy]
Nubian, Old [onw]
Nukuini [nuc]
Numidian [nxm]
Nyang'i [nyp]
Obispeño [obi]
Ofayé [opy]
Ofo [ofo]
Ohlone, Northern [cst]
Ohlone, Southern [css]
Oirat, Written [xwo]
Oko-Juwoi [okj]
Omejes [ome]
Omok [omk]
Omurano [omu]
Oscan [osc]
Oti [oti]
Otuke [otu]
Ouma [oum]
Paekche [pkc]
Paelignian [pgn]
Pahlavi [pal]
Paisaci Prakrit [qpp]
Palaic [plq]
Pali [pli]
Palumata [pmc]
Pame, Southern [pmz]
Pamlico [pmk]
Pankararé [pax]
Pankararú [paz]
Panobo [pno]
Papora [ppu]
Paranawát [paf]
Parthian [xpr]
Pataxó Hã-Ha-Hãe [pth]
Pecheneg [xpc]
Pentlatch [ptw]
Phoenician [phn]
Phrygian [xpg]
Picene, North [nrp]
Picene, South [spx]
Pictish [xpi]
Pidgin, Timor [tvy]
Pijao [pij]
Pirlatapa [bxi]
Piro [pie]
Piscataway [psy]
Pisidian [xps]
Pochutec [xpo]
Polabian [pox]
Pomo, Eastern [peb]
Pomo, Northern [pej]
Ponares [pod]
Potiguára [pog]
Powhatan [pim]
Prākrit, Ardhamāgadhī [pka]
Prākrit, Māhārāṣṭri [pmh]
Prākrit, Sauraseni [psu]
Prussian [prg]
Pumpokol [xpm]
Punic [xpu]
Puquina [puq]
Puri [prr]
Purisimeño [puy]
Puyo [xpy]
Puyo-Paekche [xpp]
Pyu [pyx]
Qatabanian [xqt]
Raetic [xrr]
Rema [bow]
Remo [rem]
Runa [rna]
Sabaean [xsa]
Sabine [sbv]
Sakan [xsk]
Salinan [sln]
Sam'alian [qey]
Samaritan [smp]
Sami, Kemi [sjk]
Saraveca [sar]
Scythian [xsc]
Selian [sxl]
Sened [sds]
Senhaja De Srair [sjs]
Sensi [sni]
Seroa [kqu]
Seru [szd]
Shinabo [snh]
Shuadit [sdt]
Sicanian [sxc]
Sicel [scx]
Sidetic [xsd]
Singa [sgm]
Siraya [fos]
Siuslaw [sis]
Skalvian [svx]
Sogdian [sog]
Solano [xso]
Sorothaptic [sxo]
Subtiaba [sut]
Sudovian [xsv]
Sumerian [sux]
Susquehannock [sqn]
Syriac, Classical [syc]
Taino [tnq]
Takelma [tkm]
Tama [ten]
Tamanaku [tmz]
Tanema [tnx]
Tangut [txg]
Tapeba [tbb]
Tartessian [txr]
Tasmanian [xtz]
Tay Boi [tas]
Tepecano [tep]
Ternateño [tmg]
Teshenawa [twc]
Thracian [txh]
Thurawal [tbh]
Tillamook [til]
Timucua [tjm]
Tingui-Boto [tgv]
Tjurruru [tju]
Togoyo [tgy]
Tokharian A [xto]
Tokharian B [txb]
Tomedes [toe]
Tonjon [tjn]
Tonkawa [tqw]
Torona [tqr]
Totoro [ttk]
Tripuri, Early [xtr]
Truká [tka]
Tsetsaut [txc]
Tubar [tbu]
Tumshuqese [xtq]
Tunica [tun]
Tupí [tpw]
Tupinambá [tpn]
Tupinikin [tpk]
Turiwára [twt]
Turung [try]
Tutelo [tta]
Tuxá [tud]
Tuxináwa [tux]
Twana [twa]
Uamué [uam]
Ubykh [uby]
Ugaritic [uga]
Umbrian [xum]
Umotína [umo]
Umpqua, Upper [xup]
Urartian [xur]
Uruava [urv]
Urumi [uru]
Vandalic [xvn]
Vano [vnk]
Venetic [xve]
Ventureño [veo]
Vestinian [xvs]
Volscian [xvo]
Waamwang [wmn]
Wailaki [wlk]
Wakoná [waf]
Wampanoag [wam]
Wandarang [wnd]
Wariyangga [wri]
Wasu [wsu]
Weyto [woy]
Woccon [xwc]
Worimi [kda]
Wulguru [qgu]
Wuliwuli [wlu]
Wurrugu [wur]
Wyandot [wya]
Xukurú [xoo]
Yabaâna [ybn]
Yalarnnga [ylr]
Yana [ynn]
Yassic [ysc]
Yeni [yei]
Yoba [yob]
Yug [yug]
Yugambal [yub]
Yupik, Sirenik [ysr]
Yupiltepeque Xinka [xin-yul]
Zarphatic [zrp]
Zemgalian [xzm]
Zhang-Zhung [xzh]