Language Codes
Anthony Aristar
LINGUIST List, ILIT & Eastern Michigan University
DELAMAN, London, 3rd November 2006

A Brief History of Ethnologue
- Begun over 50 years ago
- First edition 1951
- 3-letter codes instituted in 1971
- Publishes a new version every 4 years
- The latest version is Ethnologue 15
Nov 3, 2006

Principles behind Ethnologue Codes
- Consistently apply an operational definition of language, so that all entities assigned an identifier are of a comparable nature
- Encompass all of the languages of the world
- Clearly document the speech variety that each identifier denotes
- Maintain and update the system on an ongoing basis
- Make the system freely and readily accessible to the public over the Internet

Range of Coverage
- The Ethnologue system is intended to encompass only those languages of the world in current use
- Ge’ez (Ethnologue code gez) and Sanskrit (Ethnologue code san), which are still in use, therefore both appear in Ethnologue
- Most ancient languages are thus absent

Ancient Languages Added
- Ethnologue and the LINGUIST List agreed that LINGUIST would add codes for ancient and constructed languages
- Furthermore…

The Canary Agreement, 2002
- All languages that require codes and that became extinct before 1950 become the responsibility of LINGUIST
- All languages that became extinct after 1950 remain within the purview of Ethnologue

The Need for a Standard
- In 1988 ISO adopted a 2-letter set of language codes (ISO 639-1)
- Inadequate: only 136 codes
- In 1998 it adopted the 3-letter ISO 639-2 codes
- Still inadequate: 460 codes, many of which denote multiple languages

Becoming a Standard
- In 2002, ISO TC37/SC2 invited Ethnologue to participate in the development of a new standard
- The new standard had to be a superset of ISO 639-2
- It would provide identifiers for all known languages

Issues
- Ethnologue had to violate its own rules to accomplish this:
- Macrolanguages had to be included, e.g. code zho for all Chinese languages
- Codes had to be reused, e.g. san, once the code for the Niger-Congo language Sakata, is now used for Sanskrit (previously skt)
- Collective codes (e.g. afa for Afroasiatic), however, were abandoned

Ethnologue/LINGUIST Codes Become a Standard
- In 2004 the revised Ethnologue/LINGUIST codes became what is called a DIS, or “Draft International Standard”
- Usually called simply ISO 639-3, but its correct title is ISO/DIS 639-3
- SIL became the Registration Authority, or curator, of the codes

Dissatisfaction with Ethnologue
- Now that Ethnologue was a standard, people had to use it
- E.g. NSF now requires ISO 639-3 codes
- LSA requires them when you submit an abstract
- Many digital sites are using them
- And Ethnologue has shortcomings…

Shortcomings in Ethnologue
- Every language in Ethnologue is documented to a greater or lesser degree, but…
- We usually do not have a clear idea of the evidence on which the decision to assign the language a unique code was based
- Languages that should not be there
- Languages that should be there, but aren’t
- Dialects called languages, and vice versa
- Wrong names used for languages
- Wrong locations, wrong populations

Most of All…
- It was very hard to get SIL to make a change in the code set!

Meeting at LSA in Oakland
- Members of the community and representatives of SIL
- A very cooperative meeting
- SIL accepted the need for more community input

Decision that committees would best handle mass code-set changes for specific areas
- Committee set up to oversee a thoroughgoing revision of the code set for the Americas, chaired by Lyle Campbell
- SSILA (Society for the Study of the Indigenous Languages of the Americas) is initiating the process

How the Process Works
- Forms available from: ISO639-3@sil.org
- Two forms:
  - Change request form
  - New code request form
- Formal review of the next set of code changes starts December 1, 2006