
18th International Congress of Linguists
21 July 2008
A Web-Accessible Dictionary of Southeastern Pomo
Charles B. Chang, Shira Katseff, Russell Lee-Goldman,
Marta Piqueras-Brunet, and Yao Yao
University of California, Berkeley
Outline
1. Background
2. Dictionary structure
3. Behind the scenes: MySQL, XML, XSL
4. Searches: words, audio, texts
5. Future work
Background: Southeastern Pomo (SEP)


• Southeastern Pomo (Northern Hokan, Pomoan) is an acutely endangered language historically spoken in the area around Clear Lake, CA (Moshinsky 1974, Gordon 2005).
• Speakers/learners are mostly affiliated with the Elem Pomo Tribe.
Haynie (2007)
• Only two fluent speakers remain.
Background: Southeastern Pomo (SEP)

• Revitalization efforts are underway, led by Loretta Kelsey and Robert Geary (cf. Shavelson 2006, Fagan 2007), and have resulted in:
  • community orthography
  • teaching materials
  • language camps
  • print dictionary with pictures of flora & fauna
• Another component of documentation and revitalization work: a web-accessible dictionary.
Background: online dictionaries

• As resources for linguistic analysis have entered cyberspace, so have the products of linguistic documentation (cf. Babel et al. 2006, Dick & Haynes 2006).
  • Online dictionaries have been created for a number of languages native to the Americas (e.g. Yurok, Hupa, Northern Paiute, Washo).
  • These online dictionaries make the information gathered through fieldwork on these languages accessible to researchers, tribe members, and, in particular, young learners.
Dictionary structure

Three main parts:
1. lexicon
   • entries for individual affixes, words, and fixed expressions
   • entry displays the following information about an item:
     • transcription
     • part of speech
     • gloss
     • links to sound clips in the audio dictionary (if available)
   • different lexicon entries correspond to different forms, but not necessarily different lexemes
2. audio dictionary
   • entry for each audio file clipped out for a lexicon entry
Dictionary structure

Three main parts:
3. texts database
   • entries for elicited sentences, narratives, and other discourse above the sentence level
   • entry displays the following information about an item:
     • speaker
     • genre
     • transcriptions of individual sentences
     • free translations
     • interlinear glosses
Entering data into a MySQL database

Lexicon entries are made in a MySQL database with fields for:
• transcription
• variants (if any)
• community orthography (if available)
• part of speech
• free gloss & interlinear gloss
• semantic domain
• source file & start time
• links to other morphologically related entries
• notes
Entering data into a MySQL database

Desirable features:
• automatically generates ID numbers for entries
• fully sortable and searchable
• allows several researchers to make/edit entries at the same time without overwriting each other’s work
• easily exportable to XML (eXtensible Markup Language) format
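
The first and last of these features can be sketched roughly as follows. This is only an illustration: Python's built-in sqlite3 module stands in for MySQL so that the sketch runs without a server, and the table layout, column names, and the <lexicon> wrapper element are assumptions rather than the project's actual schema. The demo row reuses the sample entry shown in this talk.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema; column names are illustrative only.
# SQLite spells the auto-ID keyword AUTOINCREMENT (MySQL uses AUTO_INCREMENT).
SCHEMA = """
CREATE TABLE lexicon (
    id INTEGER PRIMARY KEY AUTOINCREMENT,  -- ID generated automatically
    lx TEXT NOT NULL,                      -- transcription
    ps TEXT,                               -- part of speech
    ge TEXT,                               -- gloss
    sd TEXT                                -- semantic domain
)
"""


def build_demo_db():
    """Create an in-memory database holding one lexicon entry."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # No id is supplied: the database assigns it.
    conn.execute(
        "INSERT INTO lexicon (lx, ps, ge, sd) VALUES (?, ?, ?, ?)",
        ("kachuchu", "n", "cap", "clothes"),
    )
    conn.commit()
    return conn


def export_to_xml(conn):
    """Serialize every row as a <lemma> element with one child per column."""
    root = ET.Element("lexicon")
    cursor = conn.execute("SELECT id, lx, ps, ge, sd FROM lexicon ORDER BY id")
    columns = [col[0] for col in cursor.description]
    for row in cursor:
        lemma = ET.SubElement(root, "lemma")
        for name, value in zip(columns, row):
            if value is not None:
                ET.SubElement(lemma, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(export_to_xml(build_demo_db()))
```

Because the export is plain text, it can be regenerated from the database whenever entries change.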
Displaying XML with XSL
• When the database is exported to XML, the data becomes essentially text.
• Sample XML for a lexicon entry:

<lemma>
  <id>101</id>
  <lx>kachuchu</lx>
  <community_orthography>kuchechoo</community_orthography>
  <ps>n</ps>
  <ge>cap</ge>
  <short-gloss>cap</short-gloss>
  <ref>21sep06_LK1b</ref>
  <time>18:39</time>
  <sd>clothes</sd>
  <is-headword>yes</is-headword>
</lemma>
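
Since the export is ordinary XML, any standard parser can read it. As a small sketch (assuming Python and its standard xml.etree.ElementTree module, which the project itself may not have used), the sample entry can be turned into a tag-to-value dictionary; the function name lemma_to_dict is hypothetical.

```python
import xml.etree.ElementTree as ET

# The sample lexicon entry from the slide.
SAMPLE = """
<lemma>
<id>101</id>
<lx>kachuchu</lx>
<community_orthography>kuchechoo</community_orthography>
<ps>n</ps>
<ge>cap</ge>
<short-gloss>cap</short-gloss>
<ref>21sep06_LK1b</ref>
<time>18:39</time>
<sd>clothes</sd>
<is-headword>yes</is-headword>
</lemma>
"""


def lemma_to_dict(xml_text):
    """Return a {tag: text} dictionary for a single <lemma> element."""
    lemma = ET.fromstring(xml_text.strip())
    return {child.tag: (child.text or "") for child in lemma}


if __name__ == "__main__":
    entry = lemma_to_dict(SAMPLE)
    # Transcription, part of speech, and gloss of the entry.
    print(entry["lx"], entry["ps"], entry["ge"])
```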
Displaying XML with XSL
• Such a text format allows the data to be easily manipulated into other formats as the technology of documentation changes over time.
• A separate XSL (eXtensible Stylesheet Language) file controls how the data from the XML file is displayed.
• Summary of what goes on in a dictionary query:

[Diagram: QUERY, XML, XSL, DISPLAY]
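
The division of labor in a dictionary query can be imitated in a few lines. The sketch below assumes one plausible reading of the flow (a query selects entries from the XML, and a separate presentation step formats them for display). A plain Python function plays the part of the XSL stylesheet here, so this is not XSLT itself; the <lexicon> wrapper element, the function names, and the HTML layout are all illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from html import escape

# A one-entry XML export; the <lexicon> wrapper element is an assumption.
LEXICON_XML = """
<lexicon>
  <lemma>
    <id>101</id>
    <lx>kachuchu</lx>
    <ps>n</ps>
    <ge>cap</ge>
    <sd>clothes</sd>
  </lemma>
</lexicon>
"""


def query(root, gloss):
    """QUERY step: select the <lemma> elements whose gloss matches exactly."""
    return [lemma for lemma in root.iter("lemma") if lemma.findtext("ge") == gloss]


def render(lemmas):
    """DISPLAY step: the job an XSL stylesheet would do, written in Python."""
    items = []
    for lemma in lemmas:
        lx = escape(lemma.findtext("lx", default=""))
        ps = escape(lemma.findtext("ps", default=""))
        ge = escape(lemma.findtext("ge", default=""))
        items.append(f"<li><b>{lx}</b> <i>{ps}</i> {ge}</li>")
    return "<ul>" + "".join(items) + "</ul>"


if __name__ == "__main__":
    root = ET.fromstring(LEXICON_XML.strip())
    print(render(query(root, "cap")))
```

Keeping selection and presentation in separate functions mirrors the point of the slide: the display can change without touching the data.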

The sounds of Southeastern Pomo

Consonant inventory of SEP:
LAB       DEN       ALV       PAL       VEL       P-VEL     GL
p p’ b    t̪ t̪’     t t’ d              k k’      q q’      ʔ
                    ts ts’    (tʃ tʃ’)
f                   s         ʃ         x         χ         h
m                   n                   (ŋ)
                    ɾ l
w                             j
The sounds of Southeastern Pomo

Linguistic orthography of SEP consonants:
LAB       DEN       ALV       PAL       VEL       P-VEL     GL
p p’ b    th th’    t t’ d              k k’      q q’      7
                    ts ts’    (ch ch’)
f                   s         sh        x         X         h
m                   n                   (ng)
                    r l
w                             y
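
Comparing the two consonant charts gives a small set of correspondences between IPA symbols and the linguistic orthography. The sketch below encodes only the symbols that differ between the charts. The function name to_orthography is hypothetical, and the longest-match strategy (so that tʃ is treated as one unit rather than as t followed by ʃ) is an assumption about how such a converter might work, not a description of the project's tools. Vowels are passed through unchanged.

```python
# Correspondences read off the two consonant charts. Symbols written the
# same way in both charts (p, b, t, d, k, q, ts, f, s, x, h, m, n, l, w)
# need no entry.
IPA_TO_ORTHOGRAPHY = {
    "t\u032a": "th",  # dental stop: t + combining bridge below
    "t\u0283": "ch",  # palatal affricate
    "\u0283": "sh",   # palatal fricative
    "\u03c7": "X",    # post-velar fricative
    "\u0294": "7",    # glottal stop
    "\u014b": "ng",   # velar nasal
    "\u027e": "r",    # tap
    "j": "y",         # palatal glide
}

# Try longer keys first so that two-character symbols win over their parts.
_KEYS = sorted(IPA_TO_ORTHOGRAPHY, key=len, reverse=True)


def to_orthography(ipa):
    """Convert a string of IPA consonant symbols to the linguistic orthography."""
    out = []
    i = 0
    while i < len(ipa):
        for key in _KEYS:
            if ipa.startswith(key, i):
                out.append(IPA_TO_ORTHOGRAPHY[key])
                i += len(key)
                break
        else:
            # Not in the table: the symbol is spelled the same way.
            out.append(ipa[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for symbol in ["t\u032a", "t\u0283\u2019", "\u0294", "q\u2019"]:
        print(symbol, "->", to_orthography(symbol))
```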
The sounds of Southeastern Pomo

Vowel inventory of SEP:
FRONT     CENTRAL   BACK
i (ɪ)               u (ʊ)
e (ɛ)     (ə)       o
          a (ɐ)
The sounds of Southeastern Pomo

Linguistic orthography of SEP vowels:
FRONT     CENTRAL   BACK
i                   u
e                   o
          a
Sample dictionary queries

• Word searches
  • “What does lq’olq’okin mean?”
  • “How do you say ‘red’ in SEP?”
  • “What are some kinship terms in SEP?”
• Audio searches
  • “I want to hear all the words that contain the cluster /mf/.”
• Text searches
  • “I want to see all the contexts in which the word mko ‘see’ appears.”
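
The word and audio queries above amount to simple filters over lexicon entries. The following sketch shows the idea under stated assumptions: the Entry class, its field names, and the function names are hypothetical, and the two demo entries use only forms and glosses that appear in this talk. A real audio search would return playable clips rather than source-file names.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    lx: str        # transcription
    ge: str        # gloss
    sd: str = ""   # semantic domain, if recorded
    ref: str = ""  # source audio file, if any


# Demo data only; not a sample of the actual database.
LEXICON = [
    Entry(lx="kachuchu", ge="cap", sd="clothes", ref="21sep06_LK1b"),
    Entry(lx="mko", ge="see"),
]


def glosses_for(form):
    """'What does X mean?': glosses of entries with this transcription."""
    return [e.ge for e in LEXICON if e.lx == form]


def forms_for(gloss):
    """'How do you say X?': transcriptions of entries with this gloss."""
    return [e.lx for e in LEXICON if e.ge == gloss]


def forms_in_domain(domain):
    """'What are some X terms?': transcriptions in a semantic domain."""
    return [e.lx for e in LEXICON if e.sd == domain]


def audio_containing(sequence):
    """Audio sources for entries whose transcription contains the sequence."""
    return [e.ref for e in LEXICON if sequence in e.lx and e.ref]


if __name__ == "__main__":
    print(glosses_for("mko"))
    print(forms_for("cap"))
    print(forms_in_domain("clothes"))
    print(audio_containing("ch"))
```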
Future work

In the near future, we hope to:
• add different types of multimedia (e.g. photos of local flora & fauna, videos of the actions described by verbs of motion and placement)
• have multimedia display within the same window as the lexicon entry
• merge the data of the print dictionary with that of the online dictionary
• update all entries with their spelling in the Elem orthography
• have teachers and learners make use of this as a CALL (Computer-Aided Language Learning) tool
Thank you!
Acknowledgements:
Jocelyn Ahlers ◆ Zhenya Antić ◆ Thera Crane ◆ Donna Fenton
Andrew Garrett ◆ Robert Geary ◆ Hannah Haynie
Leanne Hinton ◆ Jisup Hong ◆ Loretta Kelsey ◆ Julius Moshinsky
Lindsey Newbold ◆ Ronald Sprouse ◆ Maziar Toosarvandani
Corey Yoquelet ◆ UCB Linguistics
References
Babel, Molly, Andrew Garrett, Erin Haynes, Michael Houser, Reiko Kataoka, Fanny Liu, Nicole Marcus, Ruth Rouvier, Ronald Sprouse, Ange Strom-Weber, and Maziar Toosarvandani. 2006. A web-accessible Mono Lake Paiute dictionary and text archive. Paper presented at the Friends of Uto-Aztecan Conference. Salt Lake City, UT: University of Utah, August 24.
Dick, Grace, and Erin Haynes. 2006. A web-accessible Mono Lake Paiute dictionary and text archive. Paper presented at the Great Basin Language Conference. Bishop, CA, October 21.
Fagan, Kevin. 2007. Only living Elem Pomo speaker teaches so she won’t be the last. San Francisco Chronicle, September 30. http://www.sfgate.com/cgibin/article.cgi?file=/c/a/2007/09/30/MNAISEMAH.DTL. Retrieved 1 July 2008.
Gordon, Raymond G., Jr., ed. 2005. Ethnologue: Languages of the World, 15th edition. Dallas, TX: SIL International. Online version: http://www.ethnologue.com.
Haynie, Hannah. 2007. Southeastern Pomo. http://hjhaynie.berkeley.edu/southeasternpomo. Retrieved 5 November 2007.
Moshinsky, Julius. 1974. A Grammar of Southeastern Pomo. University of California Publications in Linguistics 72. Berkeley, CA: University of California Press.
Shavelson, Lonny. 2006. California tribe tries to save its language. Voice of America News, March 30. http://www.voanews.com/english/archive/2006-03/2006-03-30-voa46.cfm?CFID=88126261&CFTOKEN=81958375.