18th International Congress of Linguists
21 July 2008

A Web-Accessible Dictionary of Southeastern Pomo

Charles B. Chang, Shira Katseff, Russell Lee-Goldman, Marta Piqueras-Brunet, and Yao Yao
University of California, Berkeley

Outline
  1. Background
  2. Dictionary structure
  3. Behind the scenes: MySQL, XML, XSL
  4. Searches: words, audio, texts
  5. Future work

Background: Southeastern Pomo (SEP)

Southeastern Pomo (Northern Hokan, Pomoan) is an acutely endangered language historically spoken in the area around Clear Lake, CA (Moshinsky 1974, Gordon 2005).
Speakers/learners are mostly affiliated with the Elem Pomo Tribe.
Only two fluent speakers remain.
[Figure: Haynie (2007)]

Revitalization efforts are underway, led by Loretta Kelsey and Robert Geary (cf. Shavelson 2006, Fagan 2007), and have resulted in:
  community orthography
  teaching materials
  language camps
  print dictionary with pictures of flora & fauna
Another component of documentation and revitalization work: a web-accessible dictionary.

Background: online dictionaries

As resources for linguistic analysis have entered cyberspace, so have the products of linguistic documentation (cf. Babel et al. 2006, Dick & Haynes 2006).
Online dictionaries have been created for a number of languages native to the Americas (e.g. Yurok, Hupa, Northern Paiute, Washo).
These online dictionaries make the information gathered through fieldwork on these languages accessible to researchers, tribe members, and in particular, young learners.

Dictionary structure

Three main parts:
  1. lexicon
       entries for individual affixes, words, and fixed expressions
       each entry displays the following information about an item:
         transcription
         part of speech
         gloss
         links to sound clips in the audio dictionary (if available)
       different lexicon entries correspond to different forms, but not necessarily different lexemes
  2. audio dictionary
       entry for each audio file clipped out for a lexicon entry
  3. texts database
       entries for elicited sentences, narratives, and other discourse above the sentence level
       each entry displays the following information about an item:
         speaker
         genre
         transcriptions of individual sentences
         free translations
         interlinear glosses

Entering data into a MySQL database

Lexicon entries are made in a MySQL database with fields for:
  transcription
  variants (if any)
  community orthography (if available)
  part of speech
  free gloss & interlinear gloss
  semantic domain
  source file & start time
  links to other morphologically related entries
  notes

Desirable features:
  automatically generates ID numbers for entries
  fully sortable and searchable
  allows several researchers to make/edit entries at the same time without overwriting each other’s work
  easily exportable to XML (eXtensible Markup Language) format

Displaying XML with XSL

When the database is exported to XML, the data becomes essentially text.
Sample XML for a lexicon entry:

  <lemma>
    <id>101</id>
    <lx>kachuchu</lx>
    <community_orthography>kuchechoo</community_orthography>
    <ps>n</ps>
    <ge>cap</ge>
    <short-gloss>cap</short-gloss>
    <ref>21sep06_LK1b</ref>
    <time>18:39</time>
    <sd>clothes</sd>
    <is-headword>yes</is-headword>
  </lemma>

Such a text format allows the data to be easily manipulated into other formats as the technology of documentation changes over time.
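As one illustration of how such a text export can be manipulated, the sample entry can be read with a few lines of Python. This is only a minimal sketch, not the project's own code: the `<lexicon>` wrapper element and the function names `load_entries` and `find_by_gloss` are assumptions made for the example, and it assumes that `<ge>` holds the English gloss, as the tag names and the field list suggest.

```python
import xml.etree.ElementTree as ET

# Sample entry copied from the slide. The <lexicon> wrapper element is an
# assumption for this sketch; a real export would hold many <lemma> elements.
SAMPLE_XML = """
<lexicon>
  <lemma>
    <id>101</id>
    <lx>kachuchu</lx>
    <community_orthography>kuchechoo</community_orthography>
    <ps>n</ps>
    <ge>cap</ge>
    <short-gloss>cap</short-gloss>
    <ref>21sep06_LK1b</ref>
    <time>18:39</time>
    <sd>clothes</sd>
    <is-headword>yes</is-headword>
  </lemma>
</lexicon>
"""


def load_entries(xml_text):
    """Turn each <lemma> element into a plain dict keyed by tag name."""
    root = ET.fromstring(xml_text.strip())
    return [
        {child.tag: (child.text or "").strip() for child in lemma}
        for lemma in root.iter("lemma")
    ]


def find_by_gloss(entries, english):
    """Return entries whose gloss (<ge>, assumed English) matches, ignoring case."""
    wanted = english.lower()
    return [e for e in entries if e.get("ge", "").lower() == wanted]


entries = load_entries(SAMPLE_XML)
for entry in find_by_gloss(entries, "cap"):
    # Show transcription, part of speech, and gloss for each match.
    print(entry["lx"], entry["ps"], entry["ge"])  # prints: kachuchu n cap
```

Because every field arrives as plain text keyed by its tag, the same dictionaries could be written out to another format without touching the database itself.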
A separate XSL (eXtensible Stylesheet Language) file controls how the data from the XML file is displayed.
Summary of what goes on in a dictionary query:
[Diagram with four components: QUERY, XML, XSL, DISPLAY]

The sounds of Southeastern Pomo

Consonant inventory of SEP:

  LAB       DEN       ALV       PAL       VEL       P-VEL     GL
  p p’ b    t̪ t̪’     t t’ d              k k’      q q’      ʔ
                      ts ts’    (tʃ tʃ’)
  f                   s         ʃ         x         χ         h
  m                   n                   (ŋ)
                      ɾ l
  w                             j

Linguistic orthography of SEP consonants:

  LAB       DEN       ALV       PAL       VEL       P-VEL     GL
  p p’ b    th th’    t t’ d              k k’      q q’      7
                      ts ts’    (ch ch’)
  f                   s         sh        x         X         h
  m                   n                   (ng)
                      r l
  w                             y

Vowel inventory of SEP:

  FRONT     CENTRAL   BACK
  i (ɪ)               u (ʊ)
  e (ɛ)     (ə)       o
            a (ɐ)

Linguistic orthography of SEP vowels:

  FRONT     CENTRAL   BACK
  i                   u
  e                   o
            a

Sample dictionary queries

  Word searches
    “What does lq’olq’okin mean?”
    “How do you say ‘red’ in SEP?”
    “What are some kinship terms in SEP?”
  Audio searches
    “I want to hear all the words that contain the cluster /mf/.”
  Text searches
    “I want to see all the contexts in which the word mko ‘see’ appears.”

Future work

In the near future, we hope to:
  add different types of multimedia (e.g. photos of local flora & fauna, videos of the actions described by verbs of motion and placement)
  have multimedia display within the same window as the lexicon entry
  merge the data of the print dictionary with that of the online dictionary
  update all entries with their spelling in the Elem orthography
  have teachers and learners make use of this as a CALL (Computer-Aided Language Learning) tool

Thank you!
Acknowledgements:
  Jocelyn Ahlers ◆ Zhenya Antić ◆ Thera Crane ◆ Donna Fenton
  Andrew Garrett ◆ Robert Geary ◆ Hannah Haynie
  Leanne Hinton ◆ Jisup Hong ◆ Loretta Kelsey ◆ Julius Moshinsky
  Lindsey Newbold ◆ Ronald Sprouse ◆ Maziar Toosarvandani
  Corey Yoquelet ◆ UCB Linguistics

References

Babel, Molly, Andrew Garrett, Erin Haynes, Michael Houser, Reiko Kataoka, Fanny Liu, Nicole Marcus, Ruth Rouvier, Ronald Sprouse, Ange Strom-Weber, and Maziar Toosarvandani. 2006. A web-accessible Mono Lake Paiute dictionary and text archive. Paper presented at the Friends of Uto-Aztecan Conference. Salt Lake City, UT: University of Utah, August 24.

Dick, Grace, and Erin Haynes. 2006. A web-accessible Mono Lake Paiute dictionary and text archive. Paper presented at the Great Basin Language Conference. Bishop, CA, October 21.

Fagan, Kevin. 2007. Only living Elem Pomo speaker teaches so she won’t be the last. San Francisco Chronicle, September 30. http://www.sfgate.com/cgi-bin/article.cgi?file=/c/a/2007/09/30/MNAISEMAH.DTL. Retrieved 1 July 2008.

Gordon, Raymond G., Jr., ed. 2005. Ethnologue: Languages of the World, 15th edition. Dallas, TX: SIL International. Online version: http://www.ethnologue.com.

Haynie, Hannah. 2007. Southeastern Pomo. http://hjhaynie.berkeley.edu/southeasternpomo. Retrieved 5 November 2007.

Moshinsky, Julius. 1974. A Grammar of Southeastern Pomo. University of California Publications in Linguistics 72. Berkeley, CA: University of California Press.

Shavelson, Lonny. 2006. California tribe tries to save its language. Voice of America News, March 30. http://www.voanews.com/english/archive/2006-03/2006-03-30-voa46.cfm?CFID=88126261&CFTOKEN=81958375.