NLP #1

Natural Language Processing
William S-Y. Wang
wsywang@ee.cuhk.edu.hk
September 2005

A major goal of NLP is to enable computers to process natural languages in the ways that we humans do. We may think of language as occurring in three S domains: Sense, Speech, and Script. Sense involves the internal organization of linguistic information into various systems, including a grammatical system. Speech and Script are the two domains which connect the internal representations of linguistic information with sound waves and with graphic symbols, respectively. The four corresponding activities are speaking and hearing for Speech, and writing and reading for Script.

In this course, we will discuss each of these domains and its associated activities. We will examine some major perspectives on how language is organized, such as those presented in the books by Goldberg, O'Grady and Tomasello, listed below. We will also review current efforts at harnessing the computer to process language, particularly with respect to Chinese and English. These efforts essentially started in the 1950s, with the increasing power of the computer.

Simultaneous interpretation puts NLP in its most challenging situation: the interpreter needs to hear what is being said in the source language and extract its semantic content, compose an equivalent content in the target language, or the nearest approximation to it, and speak it in the target language. Often the interpreter must listen to the next sentence in the source language while speaking the sentence he has just finished interpreting, so that no information is missed.

Although progress has been made over the past half-century, computers are still very far behind humans in NLP; we will discuss some of the difficulties researchers have encountered as well as what future research must include.

The grade for the course will be based on two tasks.
Based on a topic agreed upon with the instructor, each student will make a class presentation, which will comprise 1/3 of the grade. The topic will then be written up as a project report, which will comprise 1/2 of the grade. The remaining 1/6 of the grade will be based on the student's contribution to the class discussions, including his comments on the lectures and on the presentations of his classmates.

Selected References:

Goldberg, Adele. 1995. Constructions: a construction grammar approach to argument structure. University of Chicago Press.
  1. Introduction
  2. The interaction between verbs and constructions
  3. Relations among constructions
  4. On linking
  5. Partial productivity
  6. The English ditransitive construction
  7. The English caused-motion construction
  8. The English resultative construction
  9. The Way construction
  10. Conclusion

Huang, Churen, K.J. Chen and B.K. T'sou, eds. 1996. Readings in Chinese Natural Language Processing. Journal of Chinese Linguistics Monograph #9.

Huang, Churen and W. Lenders, eds. 2004. Computational Linguistics and Beyond. Institute of Linguistics, Academia Sinica, Taiwan.
  1. Huang and Lenders, an introduction
  2. Fillmore, Ruppenhofer and Baker, FrameNet and representing the link between semantic and syntactic relations
  3. Wang, Ke and Minett, computational studies of language evolution
  4. Uszkoreit, new chances for deep linguistic processing
  5. Wilcock et al., the roles of natural language and XML in the semantic web
  6. Tsou, language processing at the dawn of the 21st century

O'Grady, William. 2005. Syntactic Carpentry: an emergentist approach to syntax. Lawrence Erlbaum.
  1. Language without grammar
  2. More on structure building
  3. Pronoun interpretation
  4. Control
  5. Raising structures
  6. Agreement
  7. Wh questions
  8. The syntax of contraction
  9. Syntax and processing
  10. Language acquisition
  11. Concluding remarks

Tomasello, Michael. 2003. Constructing a Language: a usage-based theory of language acquisition. Harvard University Press.
  1. Usage-based linguistics
  2. Origins of language
  3. Words
  4. Early syntactic constructions
  5. Abstract syntactic constructions
  6. Nominal and clausal constructions
  7. Complex constructions and discourse
  8. Biological, cultural, and ontogenetic processes
  9. Toward a psychology of language acquisition

Wang, William S-Y., ed. 1986. Language, Writing, and the Computer. W.H. Freeman.
  1. Geschwind, specialization of the human brain
  2. Eimas, the perception of speech in early infancy
  3. Bickerton, Creole languages
  4. Schmandt-Besserat, the earliest precursor of writing
  5. Fairservis, the script of the Indus Valley civilization
  6. Wang, the Chinese language
  7. Winograd, computer software for working with language
  8. Waltz, artificial intelligence
  9. Becker, multilingual word processors
  10. Levinson and Liberman, speech recognition by computer
  11. Tesler, programming languages

Yu Shiwen (俞士汶), ed. 2003. 计算语言学概论 [Introduction to Computational Linguistics]. 商务印书馆 (The Commercial Press).
  1. What is computational linguistics?
  2. Formal representation of linguistic knowledge
  3. Corpora: another form of representing linguistic knowledge
  4. Lexical analysis
  5. Syntactic analysis
  6. Machine translation
  7. Text-oriented intelligent information processing
  Glossary