AF-PAK LEARN Omaha
May 17, 2010
Corey Miller (cmiller@casl.umd.edu), Anne David, Michael Maxwell, Alina Twist, Claudia Brugman, Evelyn Browne, Melissa Fox, Michael Marlo, Paul Rodrigues, and Tristan Purvis
LANGUAGE RESEARCH IN SERVICE TO THE NATION
• Pashto is an indispensable Afghan language critical to our nation’s security
• Pashto is difficult for English speakers
• Updated, comprehensive, learner-oriented Pashto materials are needed
– Grammar
– Easy-access dictionary
• Ergativity
• Up to four cases: direct, oblique, ablative, and vocative
• Multiple noun and adjective declension classes
• Variety of adpositions: prepositions, postpositions, and circumpositions
• Retroflex consonants
• Variety of verbal structures
Project components
• Fieldwork
• Descriptive grammar
• Formal grammar
• Dictionary
• Parser
Parser enables easy access to dictionary
• Identified native speakers of Pashto from Afghanistan and Pakistan living in the US
– Pakistan: Peshawar, Quetta
– Afghanistan: Kabul, Kandahar
• Create and run elicitation guides highlighting a range of grammatical features
• Review all paradigms and example sentences, noting dialect variation
• Digitally record all sessions
Motivation for descriptive grammar
• Existing materials suffer from liabilities
– dated
– cover single dialect
• Tegey and Robson 1996: Kabul
• Penzl 1955: Kandahar
• Shafeev 1964: Kandahar
– lack Pashto script (except Tegey and Robson 1996)
• Contemporary data and presentation
• Use of Pashto script and transcription throughout
• Cover dialect variation wherever it applies
• Pashto language, orthography, phonology
• Adpositions
• Pronouns
• Nouns
• Adjectives
• Verbs
• Dialectology
• Miscellaneous
Pronoun paradigm: incorporation of dialect information
Formal grammar of inflectional affix
Formal grammar of phonological rule
• Inputs
– Formal grammar
– Dictionary (Lexicon)
• Output capability
– Analysis: given an inflected form, produce possible headwords
– Generation: given a headword, produce possible inflected forms
• Analysis capability enables dictionary lookup of inflected forms
• Generation has pedagogical uses including self-testing
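The two parser capabilities can be illustrated with a minimal sketch. It is not the project's parser: the one-word lexicon, the stem segmentation, the suffix rule, and the feature labels are all assumptions read off the two example verb forms in this talk, and the names `LEXICON`, `RULES`, `generate`, and `analyze` are hypothetical.

```python
from dataclasses import dataclass

HEADWORD = "wiʃtə́l"  # citation form, in transcription

# Toy lexicon: headword -> stems. The segmentation into stems is an
# assumption made for illustration, not the project's actual lexicon.
LEXICON = {
    HEADWORD: {"present": "wə́l", "past": "wiʃtə́l"},
}

# Toy stand-in for the formal grammar: (stem type, suffix, features).
# The suffix and the feature labels are illustrative assumptions.
RULES = [
    ("present", "əm", "1sg present imperfective"),
    ("past", "əm", "1sg past imperfective"),
]


@dataclass(frozen=True)
class Analysis:
    headword: str
    features: str


def generate(headword):
    """Generation: given a headword, return {inflected form: features}."""
    stems = LEXICON[headword]
    return {stems[stem] + suffix: feats for stem, suffix, feats in RULES}


def analyze(form):
    """Analysis: given an inflected form, return possible headwords."""
    results = []
    for headword in LEXICON:
        for inflected, feats in generate(headword).items():
            if inflected == form:
                results.append(Analysis(headword, feats))
    return results


if __name__ == "__main__":
    for form, feats in generate(HEADWORD).items():
        print(form, "->", feats, "->", analyze(form))
```

Here analysis is implemented by running generation over the whole lexicon and matching, which is only workable for a toy lexicon; it is shown because it makes the relationship between the two directions explicit.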
How morphological analysis aids lookup
• Inflected forms may differ substantially from citation forms
Pashto          ولم             ويشتلم
Transcription   wə́ləm           wiʃtə́ləm
Translation     I am shooting   I was shooting
• Experts can work around this problem, but non-experts often can’t
The parser maps inflected forms to citation forms (headwords)
What does this mean? ولم
Grammatical info: first person singular present imperfective
Citation form: ويشتل
Dictionary entry: ويشتل [wishtə́l] (verb) to shoot
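The lookup flow described here can be sketched as follows. This is an illustration under stated assumptions, not the project's software: the `PARSES` table is a hypothetical stand-in for the parser's output, the dictionary holds only the single entry shown in the talk, and the function names are invented for the example. Pashto strings are written in reading order (right to left).

```python
from typing import NamedTuple, Optional, Tuple

INFLECTED = "ولم"    # "I am shooting"
CITATION = "ويشتل"   # "to shoot"


class Entry(NamedTuple):
    headword: str
    transcription: str
    pos: str
    gloss: str


# One-entry dictionary keyed by citation form (the entry from the talk).
DICTIONARY = {
    CITATION: Entry(CITATION, "wishtə́l", "verb", "to shoot"),
}

# Hypothetical stand-in for parser output:
# inflected form -> (citation form, grammatical info).
PARSES = {
    INFLECTED: (CITATION, "first person singular present imperfective"),
}


def naive_lookup(form: str) -> Optional[Entry]:
    """Plain dictionary lookup: succeeds only for citation forms."""
    return DICTIONARY.get(form)


def parser_lookup(form: str) -> Optional[Tuple[Entry, str]]:
    """Try the dictionary first, then fall back to the parser's analysis."""
    entry = DICTIONARY.get(form)
    if entry is not None:
        return entry, "citation form"
    parse = PARSES.get(form)
    if parse is None:
        return None
    citation, features = parse
    return DICTIONARY[citation], features


if __name__ == "__main__":
    print(naive_lookup(INFLECTED))   # the inflected form is not a headword
    print(parser_lookup(INFLECTED))  # the parser bridges to the entry
```

The point of the fallback order is that a user who already types a citation form pays no extra cost, while a user who types an inflected form still reaches the same entry along with its grammatical information.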
• Updated descriptive grammar based on fieldwork
• Formal grammar and lexicon feed parser
• Parser enables simplified dictionary lookup
• Faster, more informed processing of Pashto
• David, Anne and Michael Maxwell. 2008. Joint grammar development by linguists and computer scientists. Workshop on NLP for Less Privileged Languages, Third International Joint Conference on Natural Language Processing, Hyderabad, India.
• Maxwell, Michael and Anne David. 2008. Interoperable Grammars. First International Conference on Global Interoperability for Language Resources, Hong Kong.
• Maxwell, Michael. 2010. Standardization as a means to Sustainability. LREC (to appear).
• Penzl, Herbert. 1955. A Grammar of Pashto. Washington, DC: American Council of Learned Societies.
• Tegey, Habibullah and Barbara Robson. 1996. A Reference Grammar of Pashto. Washington, DC: Center for Applied Linguistics.
• Shafeev, D. A. 1964. A Short Grammatical Outline of Pashto. International Journal of American Linguistics 30.