Overview of Natural Language Processing
Advanced AI CSCE 976
Amy Davis
amydavis@cse.unl.edu

Outline
• Common Applications
• Dealing with Sentences (and words)
• Dealing with Discourses

Practical Applications
• Machine translation
• Database access
• Information Retrieval
  – Query-answering
  – Text categorization
  – Summarization
  – Data extraction

Machine Translation
• Proposals for mechanical translators of languages pre-date the invention of the digital computer
• First was a dictionary look-up system at Birkbeck College, London, 1948
• American interest was started by Warren Weaver, a code breaker in WW2; it was popular during the Cold War but, alas, rather unsuccessful

Machine Translation: Working Systems
• Taum-Meteo: translates weather reports from English to French in Montreal. Works because the language used in the reports is stylized and regular.
• Xerox Systran: translates Xerox manuals from English to all languages that Xerox deals in. Uses pre-edited texts.

Machine Translation: Difficulties
• Need a big dictionary with grammar rules in both (or all) languages; large start-up cost
• Direct word translation is often ambiguous
• Lexicons (words that aren’t in a dictionary, but made of common parts), ex. Lebensversicherungsgesellschaftsangestellter, a life insurance company employee
• Ambiguity even in the primary language
• Elements of language are different

Machine Translation: Difficulties
• Essentially requires a good understanding of the text, and finding a corresponding text in the target language that does a good job of describing the same (or similar) situation.
• Requires the computer to “understand”.

Machine Translation: Successes
• Limited domain: allows for limited vocabulary and grammar, easier disambiguation and understanding
  – Journal article: Church, K.W. and E.H. Hovy. 1993. Good Applications for Crummy Machine Translation. Machine Translation 8 (239--258)
• MAT (machine-aided translation): a machine starts, and a real person proof-reads for clarity. (Sometimes doesn’t require bilingual people.)

Example of MAT (page 692)
The extension of the coverage of the health services to the underserved or not served population of the countries of the region was the central goal of the Ten-Year Plan and probably that of greater scope and transcendence. Almost all the countries formulated the purpose of extending the coverage although could be appreciated a diversity of approaches for its attack, which is understandable in view of the different national policies that had acted in the configuration of the health systems of each one of the countries.
(Translated by SPANAM: Vasconcellos and Leon, 1985).

Database Access
• The first major success for NLP was in the area of database access
• Natural Language Interfaces to Databases were developed to save mainframe operators the work of accessing data through complicated programs.

Database Access: Working Systems
• LUNAR (by Woods for NASA, 1973) allowed queries of chemical analysis data of lunar rock and soil samples brought back by Apollo missions
• CHAT (Pereira, 1983) allows queries of a geographical database

Database Access: Difficulties
• Limited vocabulary
• User must phrase question correctly; system doesn’t understand everything
• Context detection: allowing questions that implicitly refer to previous questions
  – Becomes a text interpretation question

Database Access: Conclusion
• Worked well for a time
• Now more information is stored in text, not in databases (ex. email, news, articles, books, encyclopedias, web pages)
• The problem now is not to find information, it’s to sort through the information that’s available.

Information Retrieval
• Now the main focus of Natural Language Processing
• There are four types:
  1. Query answering
  2. Text categorization
  3. Text summary
  4. Data extraction

Information Retrieval: The task
• Choose from some set of documents ones that are related to my query
• Ex. Internet search

Information Retrieval Methods
• Boolean: “(Natural AND Language) OR (Computational AND Linguistics)”
  – too confusing for most users
• Vector: Assign different weights to each term in query. Rank documents by distance from query and report ones that are close.

Information Retrieval
• Mostly implemented using simple statistical models on the words only
• More advanced NLP techniques have not yielded significantly better results
• Information in a text is mostly in its words

Text Categorization
• Once upon a time… this was done by humans
• Computers are much better at it (and more consistent)
• Best success for NLP so far (90+ % accuracy)
• Much faster and more consistent than humans.
• Automated systems now perform most of the work.
• NLP works better for TC than IR because categories are fixed.

Text Summarization
• Main task: understand main meaning and describe in a shorter way
• Common Systems: Microsoft
• How:
  – Sentence/paragraph extraction (find the most important sentences/paragraphs and string them together for a summary)
  – Statistical methods are more common

Data extraction
• Goal: Derive from text assertions to store in a database
• Example: SCISOR (Jacobs and Rau, 1990). Summarizes Dow Jones News stories, and adds information to a database.

NLP Goals
• Have (or feign) some understanding based on communication with Natural Language
• In order to receive and send information in ways easily understandable by human users

How to get there
• NLP applications are all similar in that they require some level of understanding.
• Understand the query, understand the document, understand the data being communicated…

Understanding Sentences: Overview
• Parsing and Grammar
  – How is a sentence composed?
• Lexicons
  – How is a word composed?
• Ambiguity

Parsing Requirements
• Requires a defined Grammar
• Requires a big dictionary (10K words)
• Requires that sentences follow the grammar defined
• Requires ability to deal with words not in dictionary

Parsing (from Section 22.4)
• Goal: Understand a single sentence by syntax analysis
• Methods
  – Bottom-up
  – Top-down
• More efficient (and complicated) algorithm given in 23.2

A Parsing Example
Rules:
  S → NP VP
  NP → Article N | Proper
  VP → Verb NP
  N → home | boy | store
  Proper → Betty | John
  Verb → go | give | see
  Article → the | an | a
The Sentence: The boy went home.

A Parsing Example: The answer

Lexicons
• The current trend in parsing
• Goal: figure out this word
• Method:
  1. Tokenize with morphological analysis
     – Inflectional, derivational, compound
  2. Dictionary lookup on each token
  3. Error recovery (spelling correction, domain-dependent cues)

Lexicons in Practice
• 10,000 to 100,000 root word forms
• Expensive to develop, not readily shared
• Wordnet (George Miller, Princeton) clarity.princeton.edu

Ambiguity
• More extensive language → more ambiguity
• Disambiguation: task of finding correct interpretation
• Evidence:
  – Syntactic
  – Lexical
  – Semantic
  – Metonymy
  – Metaphor

Disambiguation Tools
• Syntax: modifiers (prepositions, adverbs) usually attach to nearest possible place
• Lexical: probability of a word having a particular meaning, or being used in a particular way
• Semantic: determine most likely meaning from context

Semantic Disambiguation
Example: “with”
  Sentence                           Relation
  I ate spaghetti with meatballs.    (ingredient of spaghetti)
  I ate spaghetti with salad.        (side dish of spaghetti)
  I ate spaghetti with abandon.      (manner of eating)
  I ate spaghetti with a fork.       (instrument of eating)
  I ate spaghetti with a friend.     (accompanier of eating)
Disambiguation is probabilistic!

More Disambiguation Tools
• Metonymy: “Chrysler announced” doesn’t mean companies can talk.
• Metaphor: “more is up”, as in confidence has fallen, prices have sky-rocketed.
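
As a minimal sketch of the top-down method named under Parsing, the rules from “A Parsing Example” can be run through a small recursive-descent parser in Python. This is an illustration, not the algorithm from the cited sections. The names GRAMMAR, LEXICON, LEMMAS, parse, match_sequence and parse_sentence are all hypothetical, and the LEMMAS table is an assumed stand-in for morphological analysis (the rules list only base verb forms such as “go”). Under the rules exactly as listed, a bare noun such as “home” does not form an NP by itself, so the sketch also tries a sentence the listed rules do cover.

```python
# Rules copied from the "A Parsing Example" slide.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Article", "N"], ["Proper"]],
    "VP": [["Verb", "NP"]],
}
LEXICON = {
    "N": {"home", "boy", "store"},
    "Proper": {"betty", "john"},
    "Verb": {"go", "give", "see"},
    "Article": {"the", "an", "a"},
}
# Assumption: a tiny hand-written table standing in for morphological analysis.
LEMMAS = {"went": "go", "gave": "give", "saw": "see"}


def parse(symbol, tokens, pos):
    """Yield (tree, next_pos) for each way `symbol` can start at tokens[pos]."""
    if symbol in LEXICON:
        # Word category: match a single token (after mapping it to its base form).
        if pos < len(tokens):
            word = tokens[pos]
            if LEMMAS.get(word, word) in LEXICON[symbol]:
                yield (symbol, word), pos + 1
        return
    # Phrase category: try each right-hand side in turn (top-down expansion).
    for production in GRAMMAR[symbol]:
        for children, end in match_sequence(production, tokens, pos):
            yield (symbol, *children), end


def match_sequence(symbols, tokens, pos):
    """Yield (list_of_trees, next_pos) for matching `symbols` left to right."""
    if not symbols:
        yield [], pos
        return
    for tree, mid in parse(symbols[0], tokens, pos):
        for rest, end in match_sequence(symbols[1:], tokens, mid):
            yield [tree] + rest, end


def parse_sentence(text):
    """Return the first parse tree covering the whole sentence, or None."""
    tokens = text.lower().rstrip(".").split()
    for tree, end in parse("S", tokens, 0):
        if end == len(tokens):
            return tree
    return None


print(parse_sentence("Betty saw the boy."))
print(parse_sentence("The boy went home."))
```

The first call prints a nested tuple with S at the root; the second prints None, because the listed rules have no NP rule for a noun without an article. Adding a rule such as NP → N (an assumption, not part of the rules above) would let the slide’s sentence parse.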

Beyond Sentences: Discourse understanding
• Sentences are nice but…
• Most communication takes place in the form of multiple sentences (discourses)
• There’s lots more to the world than parsing and grammar!

Discourse Understanding: Goals
• Correctly interpret sequences of sentences
• Increase knowledge about world from discourse (learn)
  – Dependent on facts as well as new knowledge gained from discourse.

Discourse Understanding: an example
John went to a fancy restaurant. He was pleased and gave the waiter a big tip. He spent $50.
• What is a proper understanding of this discourse?
• What is needed to have a proper understanding of this discourse?

General world knowledge
• Restaurants serve meals, so a reason for going to a restaurant is to eat.
• Fancy restaurants serve fancy meals, $50 is a typical price for a fancy meal. Paying and leaving a tip is customary after eating meals at restaurants.
• Restaurants employ waiters.

General Structure of Discourse
“John went to a fancy restaurant. He was pleased…”
• Describe some steps of a plan for a character
• Leave out steps that can be easily inferred from other steps.
• From first sentence: John is in the eat-at-restaurant plan.
• Inference: eat-meal step probably occurred, even if it wasn’t mentioned.

Syntax and Semantics
“...gave the waiter a big tip.”
• “the” is used for objects that have been mentioned before, OR
• have been implicitly alluded to; in this case, by the eat-at-restaurant plan

Specific knowledge about situation
“He spent $50”
• “He” is John.
• Recipients of the $50 are the restaurant and the waiter.

Structure of coherent discourse
• Discourses are composed of segments
• Relations between segments (coherence relations; more in Mann and Thompson, 1983)
  – Enablement
  – Evaluation
  – Causal
  – Elaboration
  – Explanation

Speaker Goals (Hobbs 1990)
The Speaker does 4 things:
1) wants to convey a message
2) has a motivation or goal
3) wants to make it easy for the hearer to understand.
4) links new information to what hearer knows.
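
The inference described under “General Structure of Discourse” (the eat-meal step probably occurred even though it wasn’t mentioned) can be sketched as a lookup against a stored plan. This is only an illustration under stated assumptions: the step names in EAT_AT_RESTAURANT and the function infer_steps are hypothetical, and only eating, paying and tipping are named in the slides; “order-meal” is added for the example.

```python
# Hypothetical eat-at-restaurant plan; step names are illustrative assumptions.
EAT_AT_RESTAURANT = [
    "go-to-restaurant",
    "order-meal",
    "eat-meal",
    "pay-bill",
    "leave-tip",
]


def infer_steps(plan, mentioned):
    """Return plan steps that were not mentioned but come before the
    latest mentioned step, so they probably occurred."""
    positions = [plan.index(step) for step in mentioned if step in plan]
    if not positions:
        return []
    last = max(positions)
    return [step for step in plan[: last + 1] if step not in mentioned]


# "John went to a fancy restaurant. ... gave the waiter a big tip."
mentioned = {"go-to-restaurant", "leave-tip"}
print(infer_steps(EAT_AT_RESTAURANT, mentioned))
```

A real discourse system would also have to choose which plan applies and match sentences to steps; here both are given by hand.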

A Theory of “Attention”
Grosz and Sidner, 1986
• Speaker or hearer’s attention is focused
• Focus follows a stack model
• Explains why order is important.

Order is important
What’s the difference?
• I visited Paris. I bought you some expensive cologne. Then I flew home. I went to Kmart. I bought some underwear.
• I visited Paris. Then I flew home. I went to Kmart. I bought you some expensive cologne. I bought some underwear.

Summary
• NLP has practical applications, but no system does a great job in an open-ended domain
• Sentences are understood through grammar, parsing and lexicons
• Choosing a good interpretation of a sentence requires evidence from many sources
• Most interesting NLP comes in connected discourse rather than in isolated sentences

Current NLP Crowd
– Originally, mostly mathematicians.
– Now Computer Scientists (computational linguists = linguists, statisticians, computer science folk).
– Big names are Perrault, Hobbs, Pereira, Grosz and Charniak

Current NLP conferences
• Association for Computational Linguistics
• Coling
• EACL (European Chapter of the Association for Computational Linguistics)

USA Schools with NLP Grad.
• Brown University
• Buffalo, SUNY at
• California at Berkeley, University of
• California at Los Angeles, University of
• Carnegie-Mellon University
• Columbia University
• Cornell University
• Delaware, University of
• Duke University
• Georgetown University
• Georgia, University of
• Georgia Institute of Technology
• Harvard University
• Indiana University
• Information Sciences Institute (ISI) at the University of Southern California
• Johns Hopkins University
• Massachusetts at Amherst, University of
• Massachusetts Institute of Technology
• Michigan, University of
• New Mexico State University
• New York University
• Ohio State University
• Pennsylvania, University of
• Rochester, University of
• Southern California, University of
• Stanford University
• Utah, University of
• Wisconsin - Milwaukee, University of
• Yale University

Current NLP Journals
• Computational Linguistics
• Journal of Natural Language Engineering (JNLE)
• Machine Translation
• Natural Language and Linguistic Theory

Industrial NLP Research Centers
• AT&T Labs - Research
• BBN Systems and Technologies Corporation
• DFKI (German research center for AI)
• General Electric R&D
• IRST, Italy
• IBM T.J. Watson Research, NY
• Lucent Technologies Bell Labs, Murray Hill, NJ
• Microsoft Research, Redmond, WA
• MITRE
• NEC Corporation
• SRI International, Menlo Park, CA
• SRI International, Cambridge, UK
• Xerox, Palo Alto, CA
• XRCE, Grenoble, France

Discourse comprehension
The procedure is actually quite simple. First you arrange things into different groups. Of course, one pile may be sufficient depending on how much there is to do. If you have to go somewhere else due to lack of facilities that is the next step, otherwise you are pretty well set. It is important not to overdo things. That is, it is better to do too few things at once than too many. In the short run this may not seem important but complications can easily arise. A mistake is expensive as well. At first the whole procedure will seem complicated. Soon however, it will become just another facet of life. It is difficult to foresee any end to the necessity of this task in the immediate future, but then one can never tell. After the procedure is completed one arranges the material into different groups again. Then they can get put into their appropriate places. Eventually they will be used once more and the whole cycle will have to be repeated. However, this is a part of life.

Now: What do you remember?
• What are the four steps mentioned?
• What step is left out?
• What is the “material” mentioned?
• What kind of mistake would be expensive?
• Is it better to do too few or too many? Why?

Oh Yeah - The title of the discourse is: “Washing Clothes”
• Now, re-read, and see if the questions are easier.
• What does this say about discourse comprehension?