Use of language codes in RDA name authority records, Associated Language: ISO 639-2 versus ISO 639-3 - an Africana cataloger’s perspective (and learning curve)

Marcia Tiede
Area Studies Cataloger, Northwestern University Library
March 2014

One of the many optional MARC 21 fields in name and series authority records under RDA is Associated Language (377), defined as: "Codes for languages associated with the entity described in the record. Includes the language a person uses when writing for publication, broadcasting, etc., a language a corporate body uses in its communications, a language of a family, or a language in which a work is expressed."

The principal subfield codes in Associated Language are ‡a, Language code, and ‡l, Language term. Both are repeatable. ‡2, Source of language code (nonrepeatable), is used in association with ‡a if the code source is not ISO 639-2. As in other fields, if a term’s source is specified in ‡2, the value 7 is used in the second indicator.

ISO 639-2 and ISO 639-3 are both alpha-3 (three-letter) international standard codes for language identification. They represent two parts of a six-part standard that has been evolving since the 1990s from the original alpha-2 code known simply as ISO 639 (now ISO 639-1). ISO 639-2 (1998) is a code set for over 400 individual languages and collections of languages. ISO 639-3 (2007) is a code set containing all the codes from ISO 639-2 plus 7,000 others, which aims to be "comprehensive" in its coverage of languages.

ISO 639-2 is the default language code set in MARC, and was in fact based on the MARC Language Code. As of 2007, when the Introduction to the MARC Code List for Languages was written, there were 484 language codes in MARC, of which 55 were collective codes. The relationship between the MARC and ISO 639-2 language codes is described there as follows:

ISO 639-2 ... was based on the MARC Code List for Languages.
Language names in ISO 639-2 are not necessarily the same as those in MARC, particularly because of the practice of correlating the MARC language names with those used in Library of Congress Subject Headings. The MARC list includes references for unused forms of language names, while the ISO list has in [only] some cases included alternative name forms ... In addition the MARC documentation includes a list of individual languages under collective codes or language groups, while the ISO list only includes the group codes themselves. The Library of Congress is the maintenance agency for both lists, and the two are kept compatible in terms of code additions and deletions.

ISO 639-2 is also the code set used to represent written languages in NISO Z39.53, Codes for the Representation of Languages for Information Exchange. ISO 639-2 is intended for use in "libraries, archives, and other documentation applications." The Library of Congress is the registration authority for ISO 639-2 codes.

In accordance with LC Standards, Criteria for ISO 639-2 (Sep. 22, 2006, issued just before the publication of ISO 639-3), there are a variety of "objective and subjective metrics" for petitioning to include a language in ISO 639-2. The criteria include number of documents in that language (50 or more from a given institution or group of five institutions), along with other factors such as documentation of the language's official status and use in formal education. ISO 639-2 language codes cannot in theory be changed (though they can be "retired" or discontinued). The process of proposing a new language code for ISO 639-2 depends on a fairly formidable set of criteria. And in fact, almost no language codes have been added to ISO 639-2 in recent years.
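The coding rules for field 377 described at the start of this article (‡a for codes, ‡l for terms, and ‡2 together with second indicator 7 when the code source is not ISO 639-2) can be illustrated with a small sketch. This is only an illustration in Python, not part of any cataloging tool: the class name AssociatedLanguage and the method to_marc are invented here, and the display form (# for a blank indicator, $ for the subfield delimiter) follows the DCM Z1 example quoted later in this article.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AssociatedLanguage:
    """Hypothetical model of one MARC 377 (Associated Language) field."""
    codes: List[str] = field(default_factory=list)  # subfield a, repeatable
    terms: List[str] = field(default_factory=list)  # subfield l, repeatable
    source: Optional[str] = None                    # subfield 2, nonrepeatable

    def to_marc(self) -> str:
        # Second indicator is 7 only when a source is named in subfield 2;
        # otherwise it is left blank (shown here as '#').
        ind2 = "7" if self.source else "#"
        parts = [f"377 #{ind2}"]
        parts += [f"$a {code}" for code in self.codes]
        parts += [f"$l {term}" for term in self.terms]
        if self.source:
            parts.append(f"$2 {self.source}")
        return " ".join(parts)


# Collective code from the default list: no source, blank second indicator.
print(AssociatedLanguage(codes=["myn"]).to_marc())
# Specific code from ISO 639-3: source named, second indicator 7.
print(AssociatedLanguage(codes=["acr"], source="iso639-3").to_marc())
```

Keeping the source as a single optional value mirrors the nonrepeatable ‡2: codes from two different sources need two separate 377 fields.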
The Africana cataloging context

African languages for which few or no cataloged publications exist do not have a language identifier code assigned in ISO 639-2, or may be represented only by a collective (language cluster) code at a very broad level. I have not yet come across an explanation for how these collective codes were determined, beyond mention of the practical need to cluster related languages that are still in some cases being classified.

For most libraries the codes provided in ISO 639-2 are more than adequate. And for library users, subject or keyword access is what counts. The complexity of African (and other) language categorization, and the scarcity or nonexistence of publications about some African languages, mean that many African languages are still without established LC subject headings.

Unlike language codes, language subject headings may be proposed to the Library of Congress readily, at the point of need, when cataloging a publication about a language for which there is not yet a relevant heading. Usage in that publication is a starting point in creating the language heading proposal, and the format of the subject proposal follows the "pattern" described in the Subject Headings Manual, H 1154 - Languages. As with other subject headings, there is the possibility of proposing changes to them over time. Submission of African language subject proposals is generally done via the SACO Africana Subject Funnel.

Cataloging materials that are published in African languages is another matter, and not one that I will get into much here, because I’ve had limited experience of that so far. But even for libraries that specialize in collecting Africana, we outside the African context experience only a very small sampling of the actual volume that is produced in some languages. Identifying the language in itself can be very challenging, much less comprehending the content.
Locating language resources or expertise in such a variety of languages is something that requires patience and resourcefulness. Script can be an issue, even in a romanized context. The diversity of languages in a single country can be bewildering; South Africa has eleven official languages as of 1997, meaning that government and educational publications are often issued in parallel in multiple languages.

More to the point for use of Associated Language in name authority files, there is a fair likelihood of multilingual knowledge and production, be the person a missionary, musician or Muslim scholar. The definition of this field hews to a fairly narrow interpretation of “communication” for individuals – “writing for publication, broadcasting, etc.” – but I have generally interpreted this a little more broadly to include any languages a person is likely to use for communication of any sort. Someone who translates may not actually write in that language, but is able to use his/her understanding to “commune” with that language in order to transform the meaning into another language. And in the Africana context, there is a high likelihood that one of those languages might not have an equivalent MARC / ISO 639-2 code, or only be represented by a collective code.

Getting back to subject headings: Africana catalogers refer to Ethnologue: Languages of the World, a project of SIL International, as a primary resource for proposing new language headings. Ethnologue uses ISO 639-3 language identifier codes, and SIL International is the registration authority for ISO 639-3 codes. According to the code's home page on SIL's website, "ISO 639-3 attempts to provide as complete an enumeration of languages as possible, including living, extinct, ancient, and constructed languages, whether major or minor, written or unwritten." Gary Simons of SIL International recently did a synopsis of the history of, need for, and goals of ISO 639-3 and the other members of the ISO 639 family.
One small point where ISO language codes and MARC or other language names can meet up, in the context of a name authority record’s 377 field, is in the use of ‡l, Language term – specifically, to clarify an ISO 639-2 collective language code. (There are no collective language codes in ISO 639-3.) Field 377's ‡l would seem to be a useful subfield for the (anglophone) human eye. My first assumption was that this could be used at will. But the Descriptive Cataloging Manual (DCM, Z1) specifies, "Prefer language codes over language terms .... Use subfield $l (Language term) only to provide information not available in the MARC Code List for Languages." (In any case, the spelled-out justification for using a given language code should already be supplied in a 670 note field, Source Data Found, in human-readable form.)

Here is the example that DCM Z1 provides for use of Associated Language:

377 ## $a myn
377 #7 $a acr $2 iso639-3
(ISO 639-3 code for Achi (acr); assigned a collective code (myn) for Mayan languages in the MARC Code List for Languages)

Though “Achi” is established as a language subject heading and could therefore (I believe) have been supplied as a legitimate, clarifying term in 377 ‡l, a separate 377 was made to specify the Achi language according to its ISO 639-3 code.

This leads me to some questions. If the language is not yet given in the MARC Code List for Languages, and there is only a collective language code in MARC, what would one use in ‡l? The answer seems to be that one would use the relevant LC language heading if there is one (just the substantive part, dropping "language" or "dialect"); and if an LCSH for that language doesn't yet exist, one can supply the language name as given in a reference source such as Ethnologue.

My thought was that, in the "natural" progression of things, at some point newly approved LC language subject headings would appear on the MARC Code List for Languages, Name Sequence, and that they would be assigned a MARC code.
In other words, I had assumed, naively, that our efforts to establish LCSHs for languages led in some way to establishment of MARC language codes for these languages. But that is not the case – or at least not now, though language headings that were already extant did shape the MARC Code List for Languages.

There is an appendix of Changes to MARC Code List for Languages since the 2007 Edition, updated most recently four months ago (November 2013), but under Part IV: New Codes, it notes, "None since 2007 Edition." On the LC Standards ISO 639-2 Registration Authority site there is a table tracking changes from 1989 to 2012, last updated a year ago. After a flurry of language code additions through 2006, there were two in 2007 and one in 2012 (for zgh, Standard Moroccan Tamazight). So it seems that the MARC language codes upon which ISO 639-2 was based, and ISO 639-2 itself, are essentially becoming static, non-developing standards.

The changes that have taken place are tweaks to the language name itself (e.g. Maasai rather than Masai), reassignment of a language to a different collective language code, and three actual code changes - for Serbian, Croatian, and Moldovan - which occurred in mid-2008 and early 2009. This last part is interesting, since the whole point of creating "standard" codes for language identification is to have a fixed point of reference. But there were two factors that entered into these particular changes - political upheaval (the breakup of Yugoslavia) reflected in language splitting, and script differences in language expression (Moldovan in Cyrillic, Romanian in Latin) being "compressed" into a single code. (See Tanya Whipple's 2010 thesis describing the use of MARC codes for these languages.)

If one is using a code from ISO 639-3, what is to prohibit use of a complementary ‡l language term with that?
The Descriptive Cataloging Manual stipulation to "prefer language codes over language terms" was written specifically in the context of MARC language codes.

Here is a hypothetical case: Subject uses a language that has not been established as an LC subject heading, and (of course) has no ISO 639-2 code. Since in this case one is not specifying a language to make greater “sense” of a collective code, my initial thought was to use two 377 fields, as follows:

377 _7 ‡a bog ‡2 iso639-3
377 _7 ‡l Bamako Sign Language ‡2 iso639-3

But there is apparently a philosophical basis for preferring use of language codes rather than names, as a way to skirt potential political issues around the full expression of a language name. This controversy remains even at the code level, though, with some codes being truncations of language names that could be taken as pejorative to speakers of the language.

It is also good to remember, for perspective (something that one does lose track of sometimes in the thick of the effort), that these name authority records are not actually intended as reading material for anyone outside of the rarefied cataloging community. (Though shared public resources such as VIAF are exposing a narrow slice of that content.) The purpose of these codes is for machine recognition, to permit data retrieval and manipulation. Since use of the 377 field is merely optional, it is not clear how useful such manipulation might be (and to what ends it might be put). But we are just coming up on the first anniversary of RDA implementation, so time will tell more.

What formality or usefulness is there to the term used in ‡l, Language term? And if there is no formality or usefulness, why does that subfield even exist? I had assumed that one needed to use an established form, preferably as established in an LCSH if not (yet) in MARC. But given the lack of transfusion of newer LCSH language headings into MARC language codes / names, I now doubt that.
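To make the earlier point about machine recognition a little more concrete, here is a minimal Python sketch of the kind of retrieval that coded 377 fields could permit. Everything in it is an assumption for illustration: the function names, the record labels, and the display form of the fields (# for a blank indicator, $ for the subfield delimiter) are invented here, and real authority data would be read with a proper MARC library rather than parsed from display strings. The label "iso639-2" simply stands in for the default source when no $2 is present.

```python
import re
from typing import Dict, List, Tuple


def parse_377(line: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split a display-form 377 line into (indicators, [(subfield, value), ...])."""
    tag, indicators, rest = line.split(" ", 2)
    if tag != "377":
        raise ValueError(f"not a 377 field: {line!r}")
    subfields = re.findall(r"\$(\w)\s+([^$]+)", rest)
    return indicators, [(code, value.strip()) for code, value in subfields]


def language_codes(line: str) -> List[Tuple[str, str]]:
    """Return (source, code) pairs for every $a in one 377 field."""
    _, subfields = parse_377(line)
    # Assumption: a field without $2 uses the default code list.
    source = next((v for c, v in subfields if c == "2"), "iso639-2")
    return [(source, v) for c, v in subfields if c == "a"]


def find_records(records: Dict[str, List[str]], source: str, code: str) -> List[str]:
    """List the records that carry the given code from the given source."""
    return [
        name
        for name, fields in records.items()
        if any((source, code) in language_codes(f) for f in fields)
    ]


# Hypothetical records, reusing codes that appear in this article.
sample_records = {
    "record-1": ["377 ## $a myn", "377 #7 $a acr $2 iso639-3"],
    "record-2": ["377 #7 $a bog $2 iso639-3"],
}

print(find_records(sample_records, "iso639-3", "acr"))
```

Matching on the pair of source and code, rather than on the code alone, keeps a code from one list from being confused with the same three letters in another.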
A paper presented by Stephen Morey, Mark W. Post and Victor A. Friedman in December 2013, "The language codes of ISO 639: a premature and possibly unobtainable standardization," calls into question the usability and credibility of the entire ISO 639 standardization endeavor. Among other critiques, they single out ISO 639-3 and its maintenance by SIL International as being "excessively centralized" and potentially preserving offensive names for language communities, and argue that "the in-principle 'permanency' of language codes such as those of ISO 639-3 is fundamentally incompatible with the nature of human languages, which are demonstrably impermanent". (See a commentary on their presentation by Martin Haspelmath.)

In a bibliographic context, however, the printed word is at least somewhat permanent, and we can try to describe it as such. Our language coding efforts in name authority work – which only obliquely refer to language materials produced or potentially produced by those entities – are a different endeavor.

***

Following are three situations recently encountered in creating or revising personal name authority records under RDA, to illustrate Associated Language (377) field use with a combination of ISO 639-2 and ISO 639-3 language codes.

Case 1: Subject is bilingual, Arabic and Zaghawa. Zaghawa language has been established as LCSH since 1992, but there is no equivalent MARC / ISO 639-2 language code. It is classified in Ethnologue as Nilo-Saharan, Saharan, Eastern, and its ISO 639-3 language code, zag, is provided there. Two separate 377 fields may be created – the first, ara for Arabic, with no source flagging needed since the default source is ISO 639-2; and the second, zag for Zaghawa, flagged with its source as ISO 639-3, and second indicator 7.
In addition, based on the language classification in Ethnologue, one may enter a 377 field for the ISO 639-2 collective code for Nilo-Saharan languages, ssa, followed by a clarifying language term, even though this language is not referenced in the MARC Code List for Languages. When using ‡l to clarify a collective ISO 639-2 code, that pairing needs to be in its own field.

Case 2: Subject uses three languages: Dutch, English, and Adhola. Adhola language is established as LCSH since 2009, but has only a collective code in MARC / ISO 639-2: ssa, which represents Nilo-Saharan languages. Three separate 377 fields may be created. The first is for the established MARC / ISO 639-2 codes for Dutch and English. The second is for the ISO 639-2 collective code ssa and the subfield for the specific language, Adhola, since when employing ‡l to clarify a collective code, that pairing needs to be in its own field. The third is for the ISO 639-3 language code adh flagged with its source. An additional issue here is that none of the several more specific levels of linguistic group for this language have been established in MARC / ISO 639-2 as collective codes, though one of them, Nilotic languages, has been established since 1985 as an LCSH. So we are obliged to use the very broad collective code.

Case 3: Subject uses or has worked in three languages: French, Amharic, and Dogon. French and Amharic have MARC / ISO 639-2 language codes. Dogon has been assigned an ISO 639-2 collective code and has had an LC language subject heading since 1985. On further investigation, however, Dogon turns out to be a collective language name in itself, representing over a dozen languages, some of which are not mutually intelligible. There are several more specific codes available for Dogon languages in ISO 639-3. Three 377 fields may be entered here. The first gives the MARC / ISO 639-2 codes for French and Amharic.
The second gives the MARC / ISO 639-2 collective code nic for Niger-Kordofanian (Other), with a clarifying ‡l, Dogon - which, however, turns out to be a collective term in itself. The third 377 is for the ISO 639-3 language code dts for the more specific Dogon language that Griaule studied, Toro So.

References:

Codes for the Representation of Languages for Information Interchange, ANSI/NISO Z39.53-2001.
http://www.niso.org/apps/group_public/download.php/6541/Codes%20for%20the%20Representation%20of%20Languages%20for%20Information%20Interchange.pdf

Codes for the Representation of Names of Languages: ISO 639-2/RA [Registration Authority] change notice.
http://www.loc.gov/standards/iso639-2/php/code_changes.php

Criteria for ISO 639-2.
http://www.loc.gov/standards/iso639-2/criteria2.html

Ethnologue: Languages of the world. 17th edition; online version.
https://www.ethnologue.com/

Haspelmath, Martin. Can language identity be standardized? On Morey et al.'s critique of ISO 639-3. Diversity Linguistics Comment website, posted Dec. 4, 2013.
http://dlc.hypotheses.org/610

ISO 639-2: an international standard for language codes. November 1998 (rev. June 4, 1999).
http://www.loc.gov/marc/iso639.html

ISO 639-3 Registration Authority.
http://www-01.sil.org/iso639-3/

MARC 21 Format for Authority Data: 377: Associated Language.
http://www.loc.gov/marc/authority/ad377.html

MARC Code List for Languages.
http://www.loc.gov/marc/languages/

MARC Code List for Languages: Introduction (as of Sep. 2007).
http://www.loc.gov/marc/languages/introduction.pdf

Simons, Gary. ISO 639-3 : where are we and how did we get here? Workshop on Identifying Codes for Languages, Newcastle, Australia, 9 Feb. 2013.
http://www-01.sil.org/~simonsg/local/ISO%20639-3.pdf

Source Codes for Vocabularies, Rules, and Schemes: Language Code and Term Source Codes.
http://www.loc.gov/standards/sourcelist/language.html

Whipple, Tanya L. A study of the use of MARC language codes in OCLC catalog records.
Thesis, M.S.L.S., University of North Carolina at Chapel Hill, 2010.
https://cdr.lib.unc.edu/indexablecontent/uuid:be4aad8f-3846-472b-a8e9-05818d5d2186