Mixed Orthographies and Related AACR2 Cataloging Issues
White Paper
Geoff Husić for the ALA SEES Taskforce on Updating the Slavic Cataloging Manual

Background

Many languages of Europe have gone through periods in their histories when their written forms were very unsettled. This may have been because of various dialects competing to be the standard, the influence of prestige languages spoken by the elite but not necessarily by the common folk, intentional competition among language planners, or the influence of other cultural factors, such as the lingering sway of archaic liturgical languages. The topic of how to approach mixed orthographies is conspicuously missing from AACR2 and the LC Rule Interpretations, and a wide range of cataloging practices can therefore be observed in OCLC records.

Summary of problematic situations

1) These unsettled orthographic states are often most conspicuously seen in the languages’ writing systems and can be manifested by competing alphabets (e.g. Latin, Cyrillic, and Arabic scripts), competing spelling conventions in a single script (e.g. spelling the sound s (as in Sam) as s, ss, ß, sz, etc. in German and some Slavic languages in the 17th-18th centuries), the use of historically unjustified “fancy” letters for stylistic reasons, the introduction of idiosyncratic letters of limited use even among native speakers (e.g. the ʒ in Romani1), or combinations of the above.
In the context of Slavic languages this orthographic turmoil most often reflected the competing influences of Western and Eastern European languages. These influences clashed especially conspicuously on the linguistic territory corresponding to the former Yugoslavia and the Balkans, but the turmoil also occurred in Central Europe, in the Czech, Slovak, Polish, and Hungarian speech areas.2

2) A particularly complicated situation for Slavic/Eurasian catalogers arises when the mixed orthography also includes characters for which no valid ALA character is available. (Instructions for entering non-ALA diacritics and special characters can be found at: http://www.oclc.org/support/documentation/worldcat/diacritics/default.htm.)

3) Pseudo-languages: A phenomenon that catalogers have occasionally observed is a language that, for stylistic or artistic reasons, is disguised so as to appear to be another language, e.g. Russian written with so many Church Slavic letters and Church Slavic-looking grammatical endings that it appears to be Church Slavic, or Russian written so as to suggest that the writer speaks Russian but is familiar only with the Ukrainian alphabet.3 The author is clearly trying to make a point with such a graphical presentation, and this information should be preserved as faithfully as possible.

1 Depending on the Romani dialect, this is usually pronounced as one of a range of sounds, from e.g. BCS đ or dž to Macedonian ѓ, hence the idea of representing this very frequent sound with one common grapheme.

2 More recently, in the 20th-21st centuries, especially in the former Soviet countries of Central Asia, we have instead seen wholesale changes in the orthographic systems (e.g. Tajik went from Persian script to roman script, for a few years only, and then to Cyrillic; Uzbek has a similar history but is now reverting to roman script).
3 An example of the latter can be seen in the book at OCLC #71840031.

4) Another kind of situation that makes transcribing a title problematic is seen in cases where special symbols or logos are substituted for letters or even whole words in various ways. This is not really an “orthography” situation, per se. Think, for example, of “I ♥ Philly”. This category doesn’t come up in Slavic cataloging very frequently, but there is no reason it cannot. For the most part it can be handled according to current AACR2 cataloging conventions. Cf. AACR2 1.1B1 (“If the title proper as given in the chief source of information includes symbols that cannot be reproduced by the facilities available, replace them with a cataloguer’s description in square brackets. Make an explanatory note if necessary”). For non-English titles it might be advisable to include a few possible interpretations. For example, Я ♥ could have the added titles Я [сердце], Я [heart], and Я [люблю].

Main Cataloging Principles

As catalogers, our primary goal is to represent the resource through metadata as thoroughly as we can, to allow others to locate the resource and to distinguish it from other similar resources. In cases where the primary graphic representation either 1) cannot be reproduced because of technical or cataloging-policy limitations, or 2) is in a form so unusual that even a person very familiar with the language is unlikely to locate it by searching in the usual manner, we must find alternatives that provide appropriate access. The Task Force on Updating the Slavic Cataloging Manual suggests the following approaches to the descriptive part of the cataloging record. These suggestions are our opinion only and do not represent any official cataloging bodies, but reflect our understanding, from experience, of how best to provide access within the framework of AACR2. Differing practices may become apparent when and if RDA is widely adopted.
[Group: since part of the idea of RDA is to allow for easy repurposing of publisher and other data, in RDA I imagine some of this procedure might be inverted in some cases, i.e. the true transcription being in 246’s. If a cataloging agency follows a different practice than envisioned here for the 245, we can still freely add 246’s. Changing the 245 in an established OCLC master record might be more controversial. I guess we will have to wait and see. Geoff]

Suggested Practices:

Title Proper transcription:

In most cases we are dealing with problems in representing the title proper of the work, recorded in MARC field 245 in OCLC records. For the most part, cases of mixed orthography, whether they result from an unsettled orthographic history in a specific time period or are a more modern example of pseudo-language, can be handled in the same manner.

1) Strive to represent the title proper of the resource as faithfully as possible.4, 5

4 In most cases we are talking about books. Serials have some other stipulations about titles that are out of scope here.

5 In RDA this will also include the option of retaining the transcription of titles in all capital letters if that is how they appear on the piece.

a) If the title consists completely of valid ALA roman characters, then there is no particular problem in transcribing the title proper.

b) If the title proper of the resource is in a non-roman script for which characters are available for use in OCLC, e.g. most European Cyrillic languages (but not necessarily all Central Asian characters), then the title proper should be romanized as usual according to the ALA Romanization tables (http://www.indiana.edu/~libslav/slavcatman/sltrans.html), even if these letters seem to be from a different language than the prevailing language of the text. A matching vernacular field (MARC format 880 field, officially called Alternate Graphic
Representation: see http://www.loc.gov/marc/bibliographic/bd880.html) can then be created in OCLC Connexion (if your library chooses to use these) so that the original script can be searched and displayed. Titles containing mixed roman and Cyrillic characters can be handled in the same way by using a matching vernacular field, as long as all are valid ALA characters.

c) If the title string contains any non-valid ALA characters or characters that cannot yet be used in OCLC Connexion, then there are two options:

1) Romanize the title proper in the most faithful way possible, using the ALA Romanization tables and the Instructions for entering non-ALA diacritics and special characters as appropriate. Do so even in cases where, for example, a letter normally recognized as Church Slavic is used for apparent stylistic reasons in, say, modern Russian. This is an increasingly common phenomenon in recent publications. Note that in some cases there will be a valid romanization for the character even if there is no OCLC character available for the corresponding vernacular form, e.g. the Russian character yus malyi ѧ6 or Tajik ḣ = ҳ7. In such cases you will not be able to create the matching vernacular field, as you will be lacking certain characters.

[Note to group: I know we didn’t agree universally about how to transcribe these stylistically archaic letters used in modern works. So I am proceeding here with my personal bias that the author puts these in because he wishes them to be noticed and I want to respect that. But this is up to debate and if enough of you object I can change this. Geoff]

2) Although it is somewhat laborious, valid Unicode characters that are otherwise unavailable in Connexion can be represented in the matching vernacular fields by their hexadecimal Numeric Character Reference (NCR). To create one, input the NCR in the Connexion matching vernacular field surrounded thus8: &#xNCR; e.g. ѧ as &#x0467;. If the local catalog is properly Unicode compliant, then these characters should be displayed correctly in the OPAC. They will display correctly in OCLC WorldCat if they are entered in the matching vernacular field (880) through OCLC. Experimentation will be necessary to see whether your catalog can deal with these characters.

6 There is currently some controversy about how the ALA Romanization table says to romanize this character when it occurs in Russian. The table says to use the romanization ę, which is somewhat problematic, as this letter is usually used merely as a stylistic я. For this letter it will definitely be necessary to provide an added title with the romanization ia (with ligature), and preferably a matching vernacular field with я.

7 The Tajik ҳ, called “х with descender” (NCR 04B3), is not an available character in Connexion.

Added Title Entries

Liberally provide added titles with forms that a user could reasonably be expected to search, in order to assure adequate access:

For 1a: If the state of the orthography is different enough from the current language to make searching difficult, provide as many added title entries (246’s) as necessary to assure access, based on your knowledge of the language.

For 1b: Also provide as many added title entries as necessary to assure access, based on your knowledge of the language, giving the vernacular script (880) and the romanization (246) in the parallel fields of the MARC format.

Optionally, if your catalog can handle these, make the needed added titles in their vernacular script form only when their romanizations would regularize identically (i.e. are identical except for the presence or absence of diacritics) to other added titles. [Group: I removed this option because after reviewing the PCC guidelines at: http://www.loc.gov/catdir/pcc/scs/PCCNonLatinGuidelines.pdf (p.
3) I realized this really runs too counter to the agreed-upon practices, but again, it is up to debate. It could always be a local option after production in OCLC if anyone wants to do such post-production work. It might be more trouble than it’s worth. Geoff]

For 1c: In this situation, once you have chosen whether to follow option 1 or option 2, you will generally also want to provide added titles as described under For 1b.

Notes

Optionally, also for cases of mixed orthography: Try to make as clear a language note (546) as possible to explain the nature of the text and the presence of those added titles; otherwise the user might assume these were forms actually found on the piece, which would be misleading.9

Please note also that since this white paper specifically discusses the phenomenon of mixed orthographies, it does not per se affect those situations where an older form of the title was made obsolete by a well-known official change of orthography for the language (e.g. pre- and post-Revolution Russian orthographies). In such cases a uniform title (130 or 240, depending on the absence or presence of a 100) can be constructed for the modern form, provided that the title proper regularizes differently from the uniform title to be constructed (see LC Rule Interpretation 25.3A).10 In cases where identical regularization does occur, added titles (246’s in roman script, with the vernacular script in 880) can be made to ensure that those forms of the title are also searchable.

8 A quick way to find the correct hexadecimal value for such characters is to find them through Insert >> Symbol in Microsoft Word. The hexadecimal value will be displayed in the lower right.

9 The MARC format for 246 doesn’t really make available any indicator that shows an added entry has been supplied purely to facilitate searching.
10 “Orthographic Reform: For items published in countries where orthographic reform has taken place (Indonesia and Malaysia, the Netherlands, Soviet Union, etc.), record the data appearing in the area preceding the physical description area and in the series area exactly as found in the source of information with regard to orthography. For monographs, on the bibliographic record for any edition of a work whose title proper contains a word in the old orthography, provide a uniform title reflecting the new orthography, although no edition with the reformed orthography has been received.”
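
Illustration of the NCR notation (option 2 under 1c): the hexadecimal NCR for a character can be derived directly from its Unicode code point. The Python sketch below is only a minimal example. The helper names to_ncr and encode_unavailable are hypothetical, the sample string is purely illustrative, and the four-digit uppercase form is an assumption chosen to match the 04B3 value cited in footnote 7, not a documented Connexion requirement.

```python
import html


def to_ncr(ch: str) -> str:
    """Return the hexadecimal NCR (&#xNNNN;) for a single character."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    # Assumption: four-digit, uppercase hex, as in the 04B3 example.
    return f"&#x{ord(ch):04X};"


def encode_unavailable(text: str, unavailable: set[str]) -> str:
    """Replace only the characters listed in `unavailable` with their NCRs."""
    return "".join(to_ncr(c) if c in unavailable else c for c in text)


if __name__ == "__main__":
    print(to_ncr("ѧ"))  # &#x0467;
    print(to_ncr("ҳ"))  # &#x04B3;
    # Illustrative string: only the yus malyi is treated as unavailable.
    print(encode_unavailable("ѧзыкъ", {"ѧ"}))  # &#x0467;зыкъ
    # Round trip: decoding the NCR gives back the original character.
    print(html.unescape("&#x0467;") == "ѧ")  # True
```

The set of characters that are unavailable in Connexion has to be supplied by the cataloger; the sketch does not try to determine it.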