Designing an Alphabet for an Unwritten Language

Roger Stone
SIL International
SIL Bagabag
3711 Nueva Vizcaya
Roger_Stone@sil.org

Neri Zamora
TAP
P.O. Box 1183 (MAIN)
1151 Quezon City
Neri_Zamora@sil.org

ABSTRACT
In this paper, we describe the process of developing an alphabet or orthography for an unwritten language.

Categories and Subject Descriptors
[Orthography Development]

General Terms
Design, Human Factors, Standardization, Languages.

Keywords
Phonetics, Phoneme, Morphology, Morphophonemics, Community input.

1. What is an orthography?

Orthography is a technical term that simply means the system of writing. Its purpose is to facilitate communication. As far as we in TAP and SIL are concerned, there are three types of orthography: phonetic, technical and practical.

1. A phonetic orthography is typically a non-Roman-based transcription that is suitable for phonetically transcribing data. An alphabet called the International Phonetic Alphabet (more about this in 3.1.1) is used for this type of orthography.
2. A technical orthography of a language is typically a Roman-based transcription that is suitable for academic publication. It should reflect the underlying phonemic representation of the language.
3. A practical orthography is typically the language encoding used by readers and writers of the language. For many languages there may be more than one writing system. Some spelling systems are much easier to learn and use than others. The most important consideration in a practical orthography is that the writing system is adapted to cultural trends and to prestige, educational, and political goals.

2. Who should be on an orthography development team?

Orthographies are not to be developed in isolation. A linguist may be able to determine the optimal writing system for a particular language, but if the native speakers of the language do not appreciate it, it will not be used. Speakers of the language have intuitions about how letters and words should be written but may not understand the ramifications of each choice they make. Educators will often be the first users and implementors of an orthography, so their input in the process can be invaluable.

3. Processes of orthography development

3.1 Linguistic analysis

3.1.1 Phonetics - What are the sounds of the language?

An initial inventory must be made of all the sounds in the language. This can be done by recording speech and transcribing the words in the International Phonetic Alphabet, using the charts from the International Phonetic Association.[1] The data can then be compiled and a list of the sounds of the language determined.

3.1.2 Phonology – What are the sounds that need to be represented in the orthography?

Just because a sound is found in the language does not mean that it needs a separate symbol in the alphabet. For instance, phonetically there are 12-13 vowels in English. But not all of those have a separate symbol in the English alphabet, which has only five vowel letters; sounds that are predictable variants of a single “phoneme” do not need a symbol of their own. The e in the English word roses is a good example. The phonetic sound is ɨ but the symbol e is used. The sound that actually comes out of people’s mouths is different from the prototypical e sound. Why does this happen? Sounds are influenced by the sounds around them. The presence of two s or z sounds (both symbolized by “s”) causes the e sound to become phonetic ɨ. This is predictable because the s or z sound is made with the tongue placed in the middle of the mouth. The e sound is made in the front of the mouth, whereas the ɨ sound is made in the middle of the mouth. So, the e sound assimilates to the place of articulation of the consonants around it.

In Ayta Abellen there are four vowel phonemes, although if one listens to speakers of the language and carefully transcribes the uses of the vowel o, one will hear that sometimes the speakers are actually saying u. To date the orthography has used only one symbol, o, for these two sounds, and we have yet to discover a word where not writing two symbols has affected reading ability.

So, if sounds are being influenced by the sounds around them, how do we know when to use two symbols or just one? In phonology there is a term called “minimal pair”, where two words have all the same sounds except one. When this happens we can say that the difference between those two sounds is significant and most likely the distinction will need to be represented in the orthography. An example in Tagalog is kulay and gulay. Even though k and g are phonetically similar (both made by placing the tongue at the back of the mouth), there have to be separate k and g symbols in the Tagalog orthography so that the differences in meaning can be determined by the reader.

Stress can also be phonemic but is often not written, as in the case of ta’yo ‘to stand’ and ‘tayo ‘we’. In cases like this the reader may use the context to determine which form is being referenced.

3.1.3 Morphology – Where do the words and affixes begin and end?

Morphology is the identification, analysis and description of the structure of words. This includes the identification of the boundaries between prefixes, suffixes, and infixes in relation to the root. It also includes the identification of the boundaries of words.

For unwritten languages it is important that there is adequate analysis of where words begin and end. For instance, many northern Philippine languages like Ilokano have pronouns and particles that always follow the verb, and when people speak it can sound as if those words are attached. The phrase “I also would like to stand”, for example, could be written in any of several ways.

Tumakder ak koma metten.
Tumakderakkomametten.
Tumakderak koma metten.
Tumakderak komametten.

It is important that the word divisions and their parts of speech be understood so that, if there is a desire to attach some of the parts of speech to the verb, a principled writing system can be specified in which words and sentences can be constructed consistently. In the case of the example above, the third sample is the most common way of writing this in Ilokano, with the pronoun attaching to the verb and the two particles being written as separate words.

3.1.4 Morphophonemics – What to do when affixes collide?

Morphophonemics is “the study of phonemic variations in morphemes”. When a prefix is added onto a root there may be changes in the sounds made either at the end of the prefix or at the beginning of the root. A simple example is darating in Tagalog. We know that the root is dating and the prefix is da-. The sound that is actually pronounced, though, is r, and so the Tagalog orthography calls for writing this phonetically rather than phonemically. Phonemically this would be dadating.

Language communities might not always want to choose the phonetic spelling, as they may want to retain the original phoneme. For example, the Ayta Mag-indi have a prefix in-. When it attaches to a root that begins with p or b, the natural tendency is to say im-. When occurring before roots beginning with t or d the natural tendency is to say in-, and before k or g the tendency is ing-. At present the Ayta Mag-indi writers are trying to decide whether it is best to write all of these cases as in- or to alternate between im-, in-, and ing- depending on the following letter.

3.2 Community Input

“Orthography development is a participatory process. It should be designed, implemented, and managed by the language community. During the process, participants must make many decisions related to factors that affect orthography development. As participants become more and more aware of the structures of the language, they will need to make orthography revisions. During the process of testing and revising, a developing orthography is classified according to the type of revisions it has undergone.”[2]

There are often three stages in the process. Stage one is a trial or “initial” orthography. Stage two is an approved or “working” orthography. Stage three is an established or “standard” orthography.

4. Questions to consider

“Is it easy to teach? Is it easy to read? Is it easy to write? Can it be typed? Will words be too long? Will bridging to Tagalog be difficult? Do people like it”[3]

5. Potential problem areas for Philippine languages

Use of “o and u”
Symbolization of the offglides (e.g. sya, bwaya)
Symbolizing the glottal
Representation of the juncture of “n” and “g” when they do not form “ng”
Readability of reduplication (use of hyphen)
Hyphenation
Vowels that are dropped from words during affixation (also fast speech although it is normally written in the full form)
Contractions
Consonant gemination
Pronoun attachment
Compound words

6. Orthography development samples

6.1 Bolinao

Four areas that we considered when we formed the Bolinao orthography:

The linguistic evidence (phonemics).
Graphic considerations (writing systems and instruments available).
Pedagogy (its teachability).
The social acceptance (cultural and political considerations).

6.1.1 The Linguistic Evidence (phonemics)

The ideal for a writing system based on sounds is that one and only one symbol would represent each “emic” sound. (Emic sounds are the sounds that would change the meaning of a word.) In Bolinao orthography, the letters being used for the nonborrowed words except proper names are: a, b, d, e, g, i, k, l, m, n, ng, o, p, r, s, t, w, y, and ‘.
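Emic contrasts of this kind are usually established with minimal pairs such as kulay and gulay (3.1.2). The following is only a minimal sketch of how a word list might be scanned for candidate pairs; it is not a description of any SIL tool. The function names are hypothetical, and the sketch assumes that words are already written in a trial orthography and that every “ng” sequence is the single letter ng, which is not true at the n + g junctures listed in section 5.

```python
from itertools import combinations

# Assumption: "ng" is the only two-character letter in the alphabet.
DIGRAPHS = ("ng",)


def graphemes(word):
    """Split a word into letters, keeping the digraph 'ng' as one unit."""
    units = []
    i = 0
    while i < len(word):
        if word.startswith(DIGRAPHS, i):
            units.append(word[i:i + 2])
            i += 2
        else:
            units.append(word[i])
            i += 1
    return units


def minimal_pairs(words):
    """Return (word_a, word_b, (letter_a, letter_b)) for every pair of
    words that have the same number of letters and differ in exactly one."""
    pairs = []
    for a, b in combinations(words, 2):
        ga, gb = graphemes(a), graphemes(b)
        if len(ga) != len(gb):
            continue
        diffs = [(x, y) for x, y in zip(ga, gb) if x != y]
        if len(diffs) == 1:
            pairs.append((a, b, diffs[0]))
    return pairs


if __name__ == "__main__":
    print(graphemes("dating"))                        # ['d', 'a', 't', 'i', 'ng']
    print(minimal_pairs(["kulay", "gulay", "tayo"]))  # [('kulay', 'gulay', ('k', 'g'))]
```

A pair found this way is only a candidate: the linguist and the community still have to confirm that the two words differ in meaning and that the contrast is worth a separate symbol.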
In borrowed words or loan words, all of the letters of the Romanized alphabet are used, following first of all the spelling in Filipino and, if that is not applicable, the foreign spelling.

Letters for which the symbols are used consistently in Bolinao, Tagalog and Ilocano (the lingua franca) are: a, b, d, g, l, m, n, ng, p, r, s, t.

6.1.2 Graphic considerations (Writing systems and instruments available)

If possible the symbols should be readily reproducible by local means.

1. Ideally, the symbols chosen should be available on the standard typewriter and computer keyboards, printing presses, or typesetters.
2. The symbol should be legible even with diacritics.
3. The symbol should be easily written in cursive script.

Symbols that could be embarrassing to a people group because of their strangeness should not be used. We did not use, for instance, η for ng? or for a? or ʃ for sya? or j for y?

6.1.3 Pedagogy (its teachability)

One test of a good orthography is that speakers of the language are able to learn to read and write it. Or to put it another way, the written language should be readily teachable. This makes consistency a very important factor in developing an orthography. It also means we need to keep relearning to a minimum. Thus what is already being taught by local schools and for the national language is a weighty factor to consider. Having learned to read the national language or the trade language of the area, a reader should be able to read in his native language with as little difficulty as possible in transferring the values of the symbols.

But Smalley (1963:23) has summarized an alternative viewpoint as follows: “It is not what is easiest to learn, but what people want to learn and use which ultimately determines orthographies.” We need to be open and very flexible in our attitudes as we seek to formulate orthographies.
Part I of the orthography testing done in 1980 was a spelling test consisting of 65 words given orally, in order to give individuals the opportunity to write words as they thought they should be written. Part II was a multiple-choice test that gave them the opportunity to choose from various options. The results of Part I and Part II were consistent with each other and confirmed the present orthographic choices.

Amongst the older people there was some use of C and QU instead of K, but even there some consistently used K. The younger people used K.

The younger people tended to use the hyphen for the glottal sound in the position following a consonant, as in Tagalog, but otherwise consistently used a symbol similar to the apostrophe preceding a consonant. Final glottals were generally marked. The symbols used in the pre-consonant and final positions were, first, the apostrophe ('), second, the grave accent (`), and third, the circumflex (^).

The use of the offglides Y and W depended on whether the word was cognate in Tagalog or Ilocano; the corresponding shape was used. All permutations were used.

Generally the O-U rule, with O used in the last syllable and U used elsewhere, was followed, except by the older people, who retained the shape of the root when affixing or reduplicating.

The separation of particles and pronouns following the verb was based on the formation of the stress groups, which normally meant that the pronouns were separated from the verb and grouped into separate words containing more than one morpheme. These pronoun “words” may or may not contain some of the particles.

Part III of the test was to demonstrate readability. Part A involved reading individual words out of context and Part B involved reading a story given by a Bolinao speaker. The reading test gave results consistent with the demonstrated reading ability of the person in other areas and so confirmed the orthographic choices.
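The O-U rule reported in the test results can be stated mechanically. The sketch below is only an illustration under loud assumptions: the function name is hypothetical, the last syllable is approximated by the last vowel letter of the word, the input is lowercase, and borrowed words and proper names (which follow other spellings) are not handled. It also ignores the older writers' practice of keeping the shape of the root under affixation or reduplication. The sample strings are made-up inputs, not attested Bolinao spellings.

```python
VOWELS = set("aeiou")


def apply_o_u_rule(word):
    """Write 'o' if the last vowel of the word is o/u, and 'u' for any
    o/u that occurs earlier in the word. Other letters are unchanged."""
    # Position of the last vowel letter, or -1 if the word has none.
    last_vowel = max(
        (i for i, ch in enumerate(word) if ch in VOWELS), default=-1
    )
    out = []
    for i, ch in enumerate(word):
        if ch in "ou":
            out.append("o" if i == last_vowel else "u")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical misspellings, used only to exercise the rule.
    for sample in ["tayu", "golay", "kolot"]:
        print(sample, "->", apply_o_u_rule(sample))
```

A check like this could flag drafts that depart from the convention, leaving the decision about each word to the writers.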
6.1.4 The social acceptance (cultural and political considerations)

What the people will use is the ultimate force. Of course we can anticipate this to some extent and also train people in the “official” way, but after that it will come back to what is being accepted and used.

A primary factor here is the major language. In your case, Tagalog or Filipino has the primary influence, since that is what is taught in schools, heard on TV and offers advancement within the country. This will influence the choice of symbols, word breaks, glides and other choices.

History also affects the cultural pressure, and so we have conflicts in Bolinao between Spanish spellings and national language spelling. The Spanish Romanized the ancient Baybayin form of writing. They, however, had no “K” in their alphabet and so used “C” and “QU”. They also had difficulty in representing the “NG” sound and used several ways to try to represent this as one sound. The “W” sound was also absent in their Romanization and so they used diphthongs instead (au, iu, ou, oe, ua, ui, and uo). Rizal tried to harmonize these points in Tagalog when he simplified the Spanish “C”, “CQ” and “QU” to “K” and addressed the problem of diphthongs by the use of “W” and “Y”.

The preferences of people are also considered. You may have varying opinions on the marking of the glottal. In Bolinao, it is generally the older people who want to retain “C” and “QU”.

Pressures that may be placed on languages by the government (approval of KWF) may also be a factor.

6.2 Ayta languages

The orthographies of the Ayta languages are not as developed. An orthography was developed for the Ayta Mag-antsi language in conjunction with the publication of the Ayta Mag-antsi-English dictionary and the Ayta Mag-antsi New Testament, but the other Ayta orthographies are still in process. The other Ayta languages have initial orthographies that are being used for test publications.

The process being utilized here is what I call background analysis.
As mother tongue speakers draft translated and original materials, the linguist watches for patterns and informs the writers about the implications of the emerging patterns. For example, if a writing pattern is noted that seems to capture better what is really happening in the language, I present the pattern to the writers and ask if they would all like to make that change. If yes, then the writers adjust their future drafts and I use the regular-expression search and replace capabilities of software to change existing documents. I also inform the writers of patterns that are not in conformity with the overall orthography. In this way the writing system is improved over time as the different choices are discussed in context. It is as if ongoing analysis is being done in the background.

7. Suggested Practical Procedure

Assess the language and cultural situation
Help community produce some sample materials
Test the materials with readers
Revise orthography as needed
Prepare more sample materials
Present findings to community and seek consensus through a “Language Congress”
Revise orthography in response to community input
Submit to LGU and/or Komisyon sa Wikang Filipino

8. Computational Tools Available

8.1 Speech Analyzer for Phonetic analysis

Speech Analyzer is for recording speech and analyzing the resulting wave files. The tool can be useful for determining syllable length, stress, and even the correct phonetic symbol for individual sounds. It is downloadable at: http://www.sil.org/computing/sa/sa_download.htm

8.2 Phonology Assistant for analyzing the sound system

Phonology Assistant takes phonetic data and helps the analyst find the patterns in the sound system of the language. It is downloadable at: http://www.sil.org/computing/pa/

8.3 FLEx/WeSay for dictionary storage

FLEx is a powerful tool for dictionary development and language analysis.
The tool is designed specifically for these tasks and can handle virtually any writing system (definitely all in the Philippines). There is a learning curve to using it, but it is free and very powerful for lexicography and analysis. It is downloadable at: http://www.sil.org/computing/fieldworks/flex/

A much less powerful but much easier to use solution for enabling native speakers of languages to document their words (and the meanings) is called WeSay. It can run on any Windows or Linux computer and is also freely downloadable. Teaching it to someone who is not familiar with computers is very simple. It is downloadable at: http://www.wesay.org/wiki/Downloads

9. ACKNOWLEDGMENTS

Our thanks to SIL and TAP.

10. REFERENCES

[1] http://www.langsci.ucl.ac.uk/ipa/ipachart.html
[2] SIL International Literacy Department. 1996. “Develop an Orthography”. SIL LinguaLinks Library.
[3] Easton, Catherine and Diane Wroge. 2002. Manual for Alphabet Design through Community Interaction for Papua New Guinea Elementary Teacher Trainers. SIL Papua New Guinea.