

Creating a dual-use pandialectal Pashto grammar

AF-PAK LEARN Omaha

May 17, 2010

Corey Miller (cmiller@casl.umd.edu), Anne David, Michael Maxwell, Alina Twist, Claudia Brugman, Evelyn Browne, Melissa Fox, Michael Marlo, Paul Rodrigues and Tristan Purvis

LANGUAGE RESEARCH IN SERVICE TO THE NATION


Motivation

• Pashto is an indispensable Afghan language critical to our nation’s security

• Pashto is difficult for English speakers

• Updated, comprehensive, learner-oriented Pashto materials are needed:

– Grammar

– Easy-access dictionary


What makes Pashto difficult?

• Ergativity

• Up to four cases: direct, oblique, ablative, and vocative

• Multiple noun and adjective declension classes

• Variety of adpositions: prepositions, postpositions, and circumpositions

• Retroflex consonants

• Variety of verbal structures


Project components

• Fieldwork

• Descriptive grammar

• Formal grammar

• Dictionary

• Parser (enables easy access to the dictionary)


Fieldwork

• Identify native speakers of Pashto from Afghanistan and Pakistan living in the US

– Peshawar, Quetta, Pakistan

– Kabul, Kandahar, Afghanistan

• Create and run elicitation guides highlighting a range of grammatical features

• Review all paradigms and example sentences, noting dialect variation

• Digitally record all sessions


Motivation for descriptive grammar

• Existing materials have several shortcomings:

– dated

– cover a single dialect

• Tegey and Robson 1996: Kabul

• Penzl 1955: Kandahar

• Shafeev 1964: Kandahar

– lack Pashto script (except Tegey and Robson)


Goals for descriptive grammar

• Contemporary data and presentation

• Use of Pashto script and transcription throughout

• Cover dialect variation wherever it applies


Descriptive grammar

• Pashto language, orthography, phonology

• Adpositions

• Pronouns

• Nouns

• Adjectives

• Verbs

• Dialectology

• Miscellaneous


Pashto dialects


Pronoun paradigm: incorporation of dialect information


Interlinear example sentences


Adjective paradigm


Formal grammar of inflectional affix


Stem allomorphy in nouns


Formal grammar of phonological rule


Morphological parsing

• Inputs

– Formal grammar

– Dictionary (Lexicon)

• Output capabilities

– Analysis: given an inflected form, produce possible headwords

– Generation: given a headword, produce possible inflected forms
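The two directions listed above can be illustrated with a minimal sketch in Python. It is not the project's parser, which is built from the formal grammar and lexicon; the `Entry` type, the `LEXICON` and `SUFFIXES` tables, and the functions `analyze` and `generate` are hypothetical. The data is limited, by assumption, to the verb wishtəl 'to shoot' and the two first person singular forms that appear in this presentation, written in transcription with stress marks dropped.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Entry:
    headword: str          # citation form used as the dictionary key
    gloss: str
    stems: dict[str, str]  # tense/aspect label -> stem (hypothetical layout)


# Hypothetical one-entry lexicon: the present stem wəl- and the past stem
# wishtəl- both belong to the headword wishtəl 'to shoot'.
LEXICON = [
    Entry("wishtəl", "to shoot",
          {"present imperfective": "wəl", "past imperfective": "wishtəl"}),
]

# Hypothetical affix table: person/number label -> suffix.
# Only the first person singular ending is modeled here.
SUFFIXES = {"first person singular": "əm"}


def generate(headword: str) -> list[tuple[str, str]]:
    """Given a headword, return (inflected form, features) pairs."""
    results = []
    for entry in LEXICON:
        if entry.headword != headword:
            continue
        for tense, stem in entry.stems.items():
            for person, suffix in SUFFIXES.items():
                results.append((stem + suffix, f"{person} {tense}"))
    return results


def analyze(form: str) -> list[tuple[str, str]]:
    """Given an inflected form, return (headword, features) pairs.

    Strips each known suffix, then looks the remaining string up
    among all stems in the lexicon.
    """
    results = []
    for person, suffix in SUFFIXES.items():
        if not form.endswith(suffix):
            continue
        stem = form[: len(form) - len(suffix)]
        for entry in LEXICON:
            for tense, entry_stem in entry.stems.items():
                if entry_stem == stem:
                    results.append((entry.headword, f"{person} {tense}"))
    return results
```

For example, `analyze("wələm")` returns `[("wishtəl", "first person singular present imperfective")]`, and every form produced by `generate("wishtəl")` analyzes back to that headword. A full-scale parser would need many more affixes, declension classes, and phonological rules; the point of the sketch is only that analysis and generation read the same tables in opposite directions.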


Uses of the morphological parser

• Analysis capability enables dictionary lookup of inflected forms

• Generation has pedagogical uses, including self-testing
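One way generated forms could drive self-testing is sketched below: each paradigm cell becomes a quiz prompt, and the learner's typed answer is graded against the generated form. The `PARADIGM` table stands in for generator output and holds only the two forms from this presentation; `make_quiz`, `check_answer`, `strip_stress`, and `score` are hypothetical names, and ignoring stress marks when grading is an assumption, not something the slides specify.

```python
from __future__ import annotations

import random
import unicodedata

# Hypothetical generator output: (headword, features) -> expected form.
# Transcriptions, including stress marks, follow this presentation.
PARADIGM = {
    ("wiʃtə́l", "first person singular present imperfective"): "wə́ləm",
    ("wiʃtə́l", "first person singular past imperfective"): "wiʃtə́ləm",
}


def strip_stress(text: str) -> str:
    """Trim spaces, decompose (NFD), and drop combining marks such as the stress acute."""
    decomposed = unicodedata.normalize("NFD", text.strip())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def make_quiz(rng: random.Random) -> list[tuple[str, str]]:
    """Turn each paradigm cell into a (prompt, expected answer) pair, shuffled."""
    items = [
        (f"Give the {features} form of {headword}", form)
        for (headword, features), form in PARADIGM.items()
    ]
    rng.shuffle(items)
    return items


def check_answer(expected: str, answer: str) -> bool:
    """Accept the answer if it matches once stress marks and outer spaces are ignored."""
    return strip_stress(answer) == strip_stress(expected)


def score(answers: dict[str, str], quiz: list[tuple[str, str]]) -> int:
    """Count correct answers; `answers` maps each prompt to what the learner typed."""
    return sum(check_answer(expected, answers.get(prompt, "")) for prompt, expected in quiz)
```

Passing a seeded `random.Random` makes a quiz reproducible. A stricter tutor could compare the raw strings instead of calling `strip_stress`, so that misplaced stress counts as an error.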


How morphological analysis aids lookup

• Inflected forms may differ substantially from citation forms

Pashto          Transcription   Translation

ولم             wə́ləm           I am shooting

ويشتلم          wiʃtə́ləm        I was shooting

• Experts can work around this problem, but non-experts often cannot


The parser maps inflected forms to citation forms (headwords)

What does this mean? ولم

Grammatical info: first person singular present imperfective

Citation form: ويشتل

ويشتل [wiʃtə́l] (verb) to shoot
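The lookup flow on this slide can be sketched with a full-form index: generate every inflected form once, key each form to the entry and features that produced it, and answer a query with a single dictionary access. This swaps query-time suffix stripping for a precomputed table. The entry, the stems ول and ويشتل, and the ending م are hypothetical data matching only the forms shown here, and joining stem and ending by plain string concatenation is an assumption; real Pashto orthography may need more than that. `build_index` and `look_up` are illustrative names.

```python
from collections import defaultdict

# Hypothetical entry matching this slide; stems are in Pashto script.
ENTRY = {
    "headword": "ويشتل",
    "transcription": "wiʃtə́l",
    "pos": "verb",
    "gloss": "to shoot",
    "stems": {"present imperfective": "ول", "past imperfective": "ويشتل"},
}

# Hypothetical ending table: person/number label -> ending.
ENDINGS = {"first person singular": "م"}


def build_index(entries):
    """Map every generated form to the (entry, features) pairs that produce it."""
    index = defaultdict(list)
    for entry in entries:
        for tense, stem in entry["stems"].items():
            for person, ending in ENDINGS.items():
                # Assumption: the written form is stem + ending.
                index[stem + ending].append((entry, f"{person} {tense}"))
    return index


def look_up(form, index):
    """Return slide-style result strings, or a not-found message."""
    hits = index.get(form, [])  # .get does not insert keys into a defaultdict
    if not hits:
        return [f"{form}: no analysis found"]
    return [
        f"Grammatical info: {features}\n"
        f"Citation form: {e['headword']} [{e['transcription']}] ({e['pos']}) {e['gloss']}"
        for e, features in hits
    ]
```

With `index = build_index([ENTRY])`, `look_up("ولم", index)` yields the grammatical information and citation form shown above. A full-form index grows with the number of forms per lexeme, which can be large for a language with Pashto's case and verb systems; stripping affixes at query time avoids that storage cost.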


Conclusion

• Updated descriptive grammar based on fieldwork

• Formal grammar and lexicon feed parser

• Parser enables simplified dictionary lookup

• Faster, more informed processing of Pashto


References

• David, Anne and Michael Maxwell. 2008. Joint grammar development by linguists and computer scientists. Workshop on NLP for Less Privileged Languages, Third International Joint Conference on Natural Language Processing, Hyderabad, India.

• Maxwell, Michael and Anne David. 2008. Interoperable Grammars. First International Conference on Global Interoperability for Language Resources, Hong Kong.

• Maxwell, Michael. 2010. Standardization as a means to Sustainability. LREC (to appear).


• Penzl, Herbert. 1955. A Grammar of Pashto. Washington, DC: American Council of Learned Societies.

• Tegey, Habibullah and Barbara Robson. 1996. A Reference Grammar of Pashto. Washington, DC: Center for Applied Linguistics.

• Shafeev, D. A. 1964. A Short Grammatical Outline of Pashto. International Journal of American Linguistics 30.
