Language Codes

Anthony Aristar
LINGUIST List, ILIT &
Eastern Michigan University
DELAMAN, London, 3rd November 2006
A Brief History of Ethnologue





Begun over 50 years ago
First edition 1951
3-letter codes instituted in 1971
Publishes a new version every 4 years
The latest version is Ethnologue 15
Nov 3, 2006
Principles behind Ethnologue Codes





Consistently apply an operational definition of
language so that all entities for which an identifier is
assigned are of a comparable nature
Encompass all of the languages of the world
Clearly document the speech variety that each
identifier denotes
Maintain and update the system on an on-going
basis
Make the system freely and readily accessible to
the public over the Internet
Range of Coverage


The Ethnologue system is intended to
encompass only those languages of the world
in current use. Thus Ge’ez (Ethnologue
code gez) and Sanskrit (Ethnologue code
san) both appear in Ethnologue, because they are still in use.
Most ancient languages are therefore absent.
Ancient Languages added


Agreement made between Ethnologue and
the LINGUIST List that LINGUIST would add
codes for ancient and constructed languages
Furthermore…
The Canary Agreement, 2002

All languages which require codes and
which became extinct before 1950 should
become the responsibility of LINGUIST. All
languages still in use in or after 1950 will be in the purview
of Ethnologue.
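
The division of labour in the agreement can be read as a simple rule on the year a language went extinct. The following is a minimal sketch of that reading, not anything published by Ethnologue or LINGUIST. The function name, the use of None for a living language, negative years for BCE, and the treatment of a language that went extinct in exactly 1950 (assigned here to Ethnologue, since the agreement says "before 1950" for LINGUIST) are all assumptions made for illustration. Constructed languages, which LINGUIST also took on, are not modelled.

```python
from typing import Optional

CUTOFF_YEAR = 1950  # the year named in the agreement


def responsible_registry(extinction_year: Optional[int]) -> str:
    """Return which body would curate a language's code under the 1950 rule.

    extinction_year is None for a language still in use (an assumption of
    this sketch); BCE years can be passed as negative integers.
    """
    if extinction_year is not None and extinction_year < CUTOFF_YEAR:
        # Extinct before the cutoff: LINGUIST assigns the code.
        return "LINGUIST"
    # Still in use, or extinct in or after 1950: Ethnologue's purview.
    return "Ethnologue"
```

For example, responsible_registry(None) falls to Ethnologue, while a language with extinction_year=-500 falls to LINGUIST.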
The need for a standard

In 1988 ISO had adopted a
2-letter set of language codes (ISO 639-1)


Inadequate: 136 codes
In 1998 it adopted the 3-letter ISO 639-2 codes

Still inadequate: 460 codes, many defining
multiple languages.
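
The figures above count codes actually assigned. A separate limit is the size of the code space itself, which the short calculation below works out. It is only a sketch: it assumes codes are drawn from the 26 lowercase ASCII letters and ignores any ranges a standard may reserve; the function name is hypothetical.

```python
import string

ALPHABET_SIZE = len(string.ascii_lowercase)  # 26 letters, a-z


def code_space(length: int) -> int:
    """Number of distinct identifiers of the given length over a-z."""
    return ALPHABET_SIZE ** length


two_letter = code_space(2)    # 26 * 26
three_letter = code_space(3)  # 26 * 26 * 26
print(two_letter, three_letter)
```

A 2-letter scheme can therefore never hold more than 676 identifiers, which is far too few for the several thousand languages a catalogue like Ethnologue covers, whereas a 3-letter scheme leaves room for 17576.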
Becoming a standard



In 2002, ISO TC37/SC2 invited Ethnologue to
participate in the development of a new
standard
Must be a superset of ISO 639-2
Would provide identifiers for all known
languages.
Issues

Ethnologue had to violate its own rules to
accomplish this:
Macrolanguages had to be included, e.g. code
zho for all Chinese languages
Codes had to be reused, e.g. code san, now
used for Sanskrit (previously skt), was once used
for the Niger-Congo language Sakata


But collective codes (e.g. afa for Afroasiatic)
were abandoned
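
Both issues have a practical consequence for anyone storing these codes: a macrolanguage code stands for several individual languages, and a reused code cannot be interpreted without knowing which code-set it came from. The sketch below illustrates this with the examples from the slide. The code-set labels, the function names and the table layout are hypothetical, and the members listed under zho are only an illustrative subset of the individual Chinese languages.

```python
# Hypothetical labels for the two code-sets discussed on the slide.
LEGACY = "ethnologue-pre-iso"
ISO = "iso-639-3"

# Meaning of a code depends on the code-set it belongs to.
MEANINGS = {
    (LEGACY, "san"): "Sakata",
    (LEGACY, "skt"): "Sanskrit",
    (ISO, "san"): "Sanskrit",
}

# Macrolanguage code -> individual language codes (illustrative subset only).
MACROLANGUAGES = {
    "zho": {"cmn", "yue", "wuu", "hak", "nan"},
}


def resolve(code_set, code):
    """Look up a language name; returns None if the pair is not in the table."""
    return MEANINGS.get((code_set, code))


def members(code):
    """Expand a macrolanguage code; an individual code expands to itself."""
    return MACROLANGUAGES.get(code, {code})
```

Here resolve(LEGACY, "san") and resolve(ISO, "san") give different languages, which is why records tagged before the change need their code-set recorded alongside the code.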
Ethnologue/LINGUIST codes become a standard



In 2004 the revised Ethnologue/LINGUIST
codes became what is called a DIS or “Draft
International Standard”.
Usually called simply ISO 639-3, but its
correct title is ISO/DIS 639-3.
SIL became the Registration Authority or
curator of the codes
Dissatisfaction with Ethnologue

But now that Ethnologue was a standard,
people had to use it
E.g. NSF now requires ISO 639-3 codes
LSA requires them when you submit an
abstract…
Many digital sites are using them…


And there are shortcomings in Ethnologue…
Shortcomings in Ethnologue

Every language in Ethnologue is documented to a
greater or lesser degree, but…






We usually do not have a clear idea of the evidence
upon which it was decided to assign the language a
unique code.
Languages which should not be there
Languages which should be there, but aren’t
Dialects called languages… and vice versa
Wrong names used for languages…
Wrong locations… wrong populations…
Most of all…

It was very hard to get SIL to make
a change in the code-set!
Meeting at LSA in Oakland

Members of the community and
representatives of SIL
Very cooperative meeting
SIL accepted the need for more community
input



Decision that committees would best handle
mass code-set changes for specific areas
Committee set up to oversee thoroughgoing
revision of code-set for the Americas;
chairman: Lyle Campbell
SSILA (Society for the Study of the Indigenous
Languages of the Americas) initiating the
process

How the process works

Forms available from address:
ISO639-3@sil.org
Two forms:
Change request form
 New code request form


Formal review of next set of code changes
starts December 1, 2006