Mixed Orthographies and Related AACR2 Cataloging Issues
White Paper
Geoff Husić for the ALA SEES Taskforce on Updating the Slavic Cataloging Manual
Background
Many languages of Europe have gone through periods in their histories when their
written forms were very unsettled. This may have been because of various dialects
competing to be the standard, influence of prestige languages spoken by the elite but
not necessarily the common folk, intentional competition among language planners, or
the influence of other cultural factors, such as the lingering sway of archaic liturgical
languages. The topic of how to approach mixed orthographies is conspicuously missing
from AACR2 and the LC Rule Interpretations, and therefore a wide range of actual
cataloging practices can be observed when viewing these records in OCLC.
Summary of problematic situations
1) These unsettled orthographic states are often most conspicuously seen in the
languages’ writing systems and can be manifested by competing alphabets (e.g. Latin,
Cyrillic, and Arabic scripts), competing spelling conventions in a single script (e.g.
spelling the sound s (as in Sam) as s, ss, ß, sz, etc. in German and some Slavic
languages in the 17th-18th centuries), the use of historically unjustified “fancy” letters
for stylistic reasons, the introduction of idiosyncratic letters of limited use even among
native speakers (e.g. the ʒ in Romani1), or combinations of the above.
In the context of Slavic languages this orthographic turmoil most often reflected the
competing influences of Western and Eastern European languages, which clashed
especially conspicuously in the linguistic territory corresponding to the former
Yugoslavia and the Balkans, but it also occurred in Central Europe, in the
Czech, Slovak, Polish, and Hungarian speech areas.2
2) A particularly complicated situation for Slavic/Eurasian catalogers arises when the
mixed orthography also includes characters for which no valid ALA character is
available. (Instructions for entering non-ALA diacritics and special characters can be
found at: http://www.oclc.org/support/documentation/worldcat/diacritics/default.htm)
3) Pseudo-languages: Catalogers have occasionally observed cases where, for stylistic
or artistic reasons, a language is somewhat disguised in order to appear to be another
language, e.g. Russian is written with so many Church Slavic letters and Church
Slavic-looking grammatical endings that it appears to be Church Slavic, or, say,
Russian is written in such a way as to suggest that the writer speaks Russian but is
familiar only with the Ukrainian alphabet.3 The author is
clearly trying to make a point with such a graphical presentation, and this information
should be preserved as faithfully as possible.
1
Depending on the Romani dialect this is usually pronounced as one of a range of sounds, e.g. from BCS đ
or dž to the Macedonian ѓ, hence the idea of trying to represent this very frequent letter with one common
grapheme.
2
More recently, in the 20th-21st centuries, especially in the former Soviet countries of Central Asia, we
have instead seen wholesale changes in the orthographic systems (e.g. Tajik went from Persian script to
roman script (for a few years only) and then to Cyrillic; Uzbek has a similar history but is now
reverting once again to roman script).
3
An example of the latter can be seen in the book at OCLC #71840031.
4) Another kind of situation that makes transcribing a title problematic is seen in cases
where special symbols or logos are used to substitute for letters or even whole words in
various ways. This is not really an “orthography” situation, per se. Think, for example, “I
♥ Philly”. This category doesn’t come up in Slavic cataloging very frequently, but there
is no reason it cannot. For the most part it can be handled according to current AACR2
cataloging conventions. Cf. AACR2 1.1B1 (“If the title proper as given in the chief
source of information includes symbols that cannot be reproduced by the facilities
available, replace them with a cataloguer’s description in square brackets. Make an
explanatory note if necessary”). For non-English titles it might be advisable to include a
few possible interpretations. For example, Я ♥ could have the added titles Я [сердце], Я
[heart], and Я [люблю].
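The bracketed interpretations above can be generated mechanically once the cataloger has chosen them. The following is a minimal Python sketch for illustration only; the function name `bracketed_variants` is hypothetical, and the list of interpretations is still a matter of cataloger judgment, not something the code can decide.

```python
def bracketed_variants(title, symbol, interpretations):
    """Return one added-title candidate per interpretation of a symbol.

    Each candidate replaces the symbol with a cataloguer's description in
    square brackets, in the spirit of AACR2 1.1B1.
    """
    return [title.replace(symbol, f"[{word}]") for word in interpretations]


if __name__ == "__main__":
    # The interpretations are the ones suggested in the text above.
    for variant in bracketed_variants("Я ♥", "♥", ["сердце", "heart", "люблю"]):
        print(variant)
```

Each resulting string would be a candidate for a 246 added title; whether all of them are warranted depends on how users of the catalog are likely to search.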
Main Cataloging Principles
As catalogers, our primary goal is to represent the resource through metadata in as
thorough a manner as we can, to allow others to locate the resource and to distinguish it
from other similar resources. In cases where the primary graphic representation either
1) cannot be reproduced because of technical or cataloging-policy limitations, or 2) is in
a form that is unusual enough that a person very familiar with the language is unlikely to
locate it by searching in the usual manner, we must find some alternatives to
provide appropriate access.
The Task Force on Updating the Slavic Cataloging Manual suggests the following
approaches to the descriptive part of the cataloging record. These suggestions are our
opinion only and do not represent the position of any official cataloging body, but reflect
our understanding, from experience, of how best to provide access within the framework of
AACR2. Differing practices may become apparent when and if RDA is widely adopted.
[Group: since part of the idea of RDA is to allow for easy repurposing of publisher and
other data, in RDA I imagine some of this procedure might be inverted in some cases,
i.e. the true transcription being in 246’s. If a cataloging agency follows a different
practice than envisioned here for the 245, we can still freely add 246’s. Changing the
245 in an established OCLC master record might be more controversial. I guess we will
have to wait and see. Geoff].
Suggested Practices:
Title Proper transcription:
In most cases we are dealing with problems in representing the title proper of the work,
represented by MARC field 245 in OCLC records. For the most part, cases of mixed
orthography, whether because of an unsettled orthographic history in a specific time
period, or a more modern example of pseudo-language, can be handled in the same
manner.
1) Strive to represent the title proper of the resource as faithfully as possible.4 and 5
a) If the title consists completely of valid ALA roman characters then there is no
particular problem in transcribing the title proper.
b) If the title proper of the resource is in a non-roman script for which characters
are available for use in OCLC, e.g. most European Cyrillic languages (but not
necessarily all Central Asian characters), then the title proper should be romanized as
usual according to the ALA Romanization tables
(http://www.indiana.edu/~libslav/slavcatman/sltrans.html), even if these letters seem to
be from a different language than the prevailing language of the text. A matching
vernacular field (MARC format 880 field, officially called Alternate Graphic
Representation: see http://www.loc.gov/marc/bibliographic/bd880.html) can then be
created in OCLC Connexion (if your library chooses to use these) so that the original
script can be searched and displayed. Titles containing mixed roman and Cyrillic
characters can be handled in the same way by using a matching vernacular field, as
long as all are valid ALA characters.
4
In most cases we are talking about books. Serials have some other stipulations about titles that are out
of scope here.
5
In RDA this will also include the option of retaining the transcription of titles in all capital letters if that is
how they appear on the piece.
c) If any of the characters appearing in the title string are not valid ALA
characters, or cannot yet be used in OCLC Connexion, then there are a
few options:
1) Romanize the title proper in the most faithful way possible, using the
ALA Romanization tables and Instructions for entering non-ALA diacritics and special
characters as appropriate. Do so, even in cases where, for example, a letter normally
recognized as Church Slavic is used for apparent stylistic reasons in, say, modern
Russian. This is an increasingly common phenomenon in recent publications. Note that
in some cases there will be a valid romanization for the character even if there is no
OCLC character available for the corresponding vernacular form, e.g., the Russian
character yus malyi ѧ6 or Tajik ḣ = ҳ7. In such cases you will not be able to create the
matching vernacular field, as you will be lacking certain characters. [Note to group: I
know we didn’t agree universally about how to transcribe these stylistically archaic
letters used in modern works. So I am proceeding here with my personal bias that the
author puts these in because he wishes them to be noticed and I want to respect that.
But this is up to debate and if enough of you object I can change this. Geoff]
6
There is currently some controversy about how the ALA Romanization table says to romanize this
character when it occurs in Russian. The table says to use the romanization ę, which is somewhat
problematic as this letter is usually used merely as a stylistic я. For this letter it will definitely be
necessary to provide an added title for the romanized ia (with ligatures), and preferably a matching
vernacular field with я.
7
The Tajik ҳ, called “х with descender” (NCR 04B3), is not an available character in Connexion.
2) Although somewhat laborious, valid Unicode characters that are
otherwise unavailable in Connexion can be represented in the matching vernacular
fields by their hexadecimal Numerical Character Reference (NCR). To create these, the
NCR is input in the Connexion matching vernacular field surrounded thus8: &#xNCR;, e.g.
ѧ as &#x0467;
If the local catalog is properly Unicode compliant, then these characters should be
displayed in the OPAC correctly. They will display correctly in OCLC WorldCat if they
are input in the matching vernacular field (880) through OCLC. Experimentation
will be necessary to see if your catalog can deal with these characters.
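The hexadecimal NCR for a character can also be computed rather than looked up. The following is a minimal Python sketch, not a Connexion feature: the function names `hex_ncr` and `encode_unavailable` are hypothetical, and padding to four uppercase hexadecimal digits is an assumption that simply follows the 04B3 style used in this paper.

```python
def hex_ncr(char):
    """Return the hexadecimal NCR for one character in the form &#xNNNN;."""
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    # ord() gives the Unicode code point; format it as at least 4 hex digits.
    return f"&#x{ord(char):04X};"


def encode_unavailable(text, unavailable):
    """Replace only the characters listed in `unavailable` with their NCRs.

    `unavailable` is a string of the characters that cannot be keyed
    directly; which characters those are is an assumption supplied by the
    caller, not something this sketch can detect.
    """
    return "".join(hex_ncr(c) if c in unavailable else c for c in text)


if __name__ == "__main__":
    print(hex_ncr("ѧ"))  # little yus
    print(hex_ncr("ҳ"))  # Tajik kha with descender
    print(encode_unavailable("ҳама", "ҳ"))
```

Characters that are available in Connexion are left untouched, so only the problematic letters in a vernacular string are converted.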
Added Title Entries
Liberally provide added titles with forms that a user could reasonably be expected to
search in order to assure adequate access:
For 1a: If the state of the orthography is different enough from the current
language as to make searching difficult, provide as many added title entries (246’s) as
necessary to assure access based on your knowledge of the language.
For 1b: Also provide as many added title entries as necessary to assure access
based on your knowledge of the language, giving the vernacular script (880) and
romanization (246) in the parallel fields of the MARC format. Optionally, make the added
titles needed in their vernacular script form only, if their romanizations will regularize out
identically (i.e. are identical except for the presence or absence of diacritics) to other
added titles, if your catalog can handle these. [Group: I removed this option because
after reviewing the PCC guidelines at:
http://www.loc.gov/catdir/pcc/scs/PCCNonLatinGuidelines.pdf (p. 3) I realized this really
runs too counter to the agreed-upon practices, but again, up to debate. It could always
be a local option after production in OCLC if anyone wants to do such post-production
work. It might be more trouble than it’s worth. Geoff.]
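Whether two romanized added titles would "regularize out identically" can be checked roughly before deciding which 246s to add. The following is a minimal Python sketch, assuming that regularization amounts to ignoring case and combining diacritics; the normalization actually applied by OCLC or a local catalog may differ, and the function names are hypothetical.

```python
import unicodedata


def regularize(title):
    """Approximate regularization: drop combining marks, then fold case."""
    # NFD splits a precomposed letter into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFD", title)
    # Combining marks have a non-zero canonical combining class.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def regularizes_identically(first, second):
    """True if two titles differ only by diacritics and letter case."""
    return regularize(first) == regularize(second)


if __name__ == "__main__":
    # "i\u0361a" is i + combining double inverted breve (a ligature) + a.
    print(regularizes_identically("i\u0361a", "ia"))
    print(regularizes_identically("Žena", "zena"))
```

Letters that Unicode does not decompose into a base letter plus a mark, such as đ or ł, pass through this sketch unchanged, so a real catalog may treat them differently from what this check suggests.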
8
A quick way to find the correct hexadecimal value for such characters is to find them through Insert >>
Symbol in Microsoft Word. The hexadecimal value will be displayed in the lower right.
For 1c: In this situation, once you have chosen whether to follow option 1 or
option 2, you will generally also want to provide added titles as described under For 1b.
Notes
Optionally also for cases of mixed orthography: Try to make as clear a language note
(546) as possible to explain the nature of the text and the presence of those added
titles, otherwise the user might assume these were forms actually found on the piece,
which would be misleading.9
Please note also that since this white paper deals specifically with the phenomenon of
mixed orthographies, it does not per se affect those situations where an older form of
the title was made obsolete by a well-known official change of orthography for the
language (e.g. pre- and post-Revolution Russian orthographies). In such cases a
Uniform Title (130 or 240, depending on the absence or presence of a 100) can be
constructed for the modern form (see LC Rule Interpretation 25.3a10), if the title proper
regularizes differently from the uniform title to be constructed. In cases where identical
regularization does occur, added titles (246s in roman script, with vernacular script in
880s) can be made to ensure that those forms of the title are also searchable.
9
The MARC format for 246 doesn’t really make available any indicator that shows an added entry has
been supplied purely to facilitate searching.
10
“Orthographic Reform: For items published in countries where orthographic reform has taken place
(Indonesia and Malaysia, the Netherlands, Soviet Union, etc.), record the data appearing in the area
preceding the physical description area and in the series area exactly as found in the source of
information with regard to orthography. For monographs, on the bibliographic record for any edition of a
work whose title proper contains a word in the old orthography, provide a uniform title reflecting the new
orthography, although no edition with the reformed orthography has been received.”