NLP #1
Natural Language Processing
William S-Y. Wang
wsywang@ee.cuhk.edu.hk
September 2005
A major goal of NLP is to enable computers to process natural languages in the ways
that we humans do. We may think of language as occurring in three "S" domains: Sense,
Speech, and Script. Sense involves the internal organization of linguistic information
into various systems, including a grammatical system. Speech and Script are the two
domains that connect the internal representations of linguistic information with sound
waves and with graphic symbols, respectively. The four corresponding activities are
speaking and hearing for Speech, and writing and reading for Script.
In this course, we will discuss each of these domains and its associated activities.
We will examine some major perspectives on how language is organized, such as those
presented in the books by Goldberg, O'Grady and Tomasello listed below. We will also
review current efforts to harness the computer to process language, particularly
Chinese and English. These efforts essentially began in the 1950s and have grown with
the increasing power of the computer.
Simultaneous interpretation puts NLP in its most challenging situation: the
interpreter needs to hear what is being said in the source language, extract its
semantic content, compose equivalent content (or its nearest approximation) in the
target language, and speak it. Often the interpreter must already be listening to the
next sentence in the source language while speaking the sentence just interpreted, so
that no information is missed. Although progress has been made over the past
half-century, computers are still very far behind humans in NLP; we will discuss some
of the difficulties researchers have encountered as well as what future research
must include.
The grade for the course will be based mainly on two tasks. Each student will make a
class presentation on a topic agreed upon with the instructor, which will count for
1/3 of the grade. The topic will then be written up as a project report, which will
count for 1/2 of the grade. The remaining 1/6 of the grade will be based on the
student's contribution to class discussions, including comments on the lectures and
on classmates' presentations.
Selected References:
Goldberg, Adele. 1995. Constructions: a construction grammar approach to argument
structure. University of Chicago Press.
1. Introduction
2. The interaction between verbs and constructions
3. Relations among constructions
4. On linking
5. Partial productivity
6. The English ditransitive construction
7. The English caused-motion construction
8. The English resultative construction
9. The Way construction
10. Conclusion
Huang, Churen, K.J. Chen and B.K. T'sou, eds. 1996. Readings in Chinese Natural
Language Processing. Journal of Chinese Linguistics Monograph #9.
Huang, Churen and W. Lenders, eds. 2004. Computational Linguistics and Beyond.
Institute of Linguistics, Academia Sinica, Taiwan.
1. Huang and Lenders, an introduction
2. Fillmore, Ruppenhofer and Baker, FrameNet and representing the link between
semantic and syntactic relations
3. Wang, Ke and Minett, computational studies of language evolution
4. Uszkoreit, new chances for deep linguistic processing
5. Wilcock et al., the roles of natural language and XML in the semantic web
6. Tsou, language processing at the dawn of the 21st century
O’Grady, William. 2005. Syntactic Carpentry: an emergentist approach to syntax.
Lawrence Erlbaum.
1. Language without grammar
2. More on structure building
3. Pronoun interpretation
4. Control
5. Raising structures
6. Agreement
7. Wh questions
8. The syntax of contraction
9. Syntax and processing
10. Language acquisition
11. Concluding remarks
Tomasello, Michael. 2003. Constructing a Language: a usage-based theory of language
acquisition. Harvard University Press.
1. Usage-based linguistics
2. Origins of language
3. Words
4. Early syntactic constructions
5. Abstract syntactic constructions
6. Nominal and clausal constructions
7. Complex constructions and discourse
8. Biological, cultural, and ontogenetic processes
9. Toward a psychology of language acquisition
Wang, William S-Y., ed. 1986. Language, Writing, and the Computer. W.H. Freeman.
1. Geschwind, specialization of the human brain
2. Eimas, the perception of speech in early infancy
3. Bickerton, Creole languages
4. Schmandt-Besserat, the earliest precursor of writing
5. Fairservis, the script of the Indus Valley civilization
6. Wang, the Chinese language
7. Winograd, computer software for working with language
8. Waltz, artificial intelligence
9. Becker, multilingual word processors
10. Levinson and Liberman, speech recognition by computer
11. Tesler, programming languages
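Yu, Shiwen (俞士汶), ed. 2003. 计算语言学概论 (An Introduction to Computational
Linguistics). Commercial Press (商务印书馆).
1. What is computational linguistics?
2. Formal representation of linguistic knowledge
3. Corpora: another form of representing linguistic knowledge
4. Lexical analysis
5. Syntactic analysis
6. Machine translation
7. Text-oriented intelligent information processing
Glossary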