Linguistically Rich Statistical Models of Language
Joseph Smarr
M.S. Candidate, Symbolic Systems Program
Advisor: Christopher D. Manning
December 5th, 2002

Grand Vision
- Talk to your computer like another human (HAL, Star Trek, etc.)
- Ask your computer a question, and it finds the answer
  - "Who's speaking at this week's SymSys Forum?"
- Your computer can read and summarize text for you
  - "What's the cutting edge in NLP these days?"

We're Not There (Yet)
- Turns out behaving intelligently is difficult
- What does it take to achieve the grand vision?
  - General Artificial Intelligence problems: knowledge representation, common-sense reasoning, etc.
  - Language-specific problems: the complexity, ambiguity, and flexibility of language
- Always underestimated, because language is so easy for us!

Are There Useful Sub-Goals?
- The grand vision is still too hard, but we can solve simpler problems that are still valuable:
  - Filter news for stories about new tech gadgets
  - Take the SSP talk email and add it to my calendar
  - Dial my cell phone by speaking my friend's name
  - Automatically reply to customer service emails
  - Find out which episode of The Simpsons is on tonight

Theoretical Linguistics vs. NLP
- Theoretical Linguistics
  - Goal: understand people's knowledge of language
  - Method: rich logical representations of language's hidden structure and meaning
  - Guiding principles: separation of (hidden) knowledge of language and (observable) performance; grammaticality is categorical (all or none); describe what is possible
- Natural Language Processing
  - Goal: develop practical tools for analyzing speech and text
  - Method: simple, robust models of everyday language use that are sufficient to perform tasks
  - Guiding principles: exploit (empirical) regularities and patterns in examples of language in text collections; sentence "goodness" is gradient (better or worse)

Linguistic Puzzle
- When dropping an argument, why do some verbs keep the subject and some keep the object? Not just "quirkiness of language":
  - John sang the song / John sang
  - John broke the vase / The vase broke
- Similar patterns show up in other languages; this seems to involve deep aspects of verb meaning
- Rules to account for this phenomenon posit two classes of verbs (unergative & unaccusative)
- Exception: imperatives ("Open the pod bay doors, HAL")
- Different goals lead to the study of different problems. In NLP...
  - Need to recognize this as a command
  - Need to figure out what specific action to take
  - Irrelevant how you'd say it in French
- Describing language vs. working with language

Theoretical Linguistics vs. NLP
- There is potential for much synergy between linguistics and NLP; however, historically they have remained quite distinct
- Chomsky (founder of generative grammar): "It must be recognized that the notion 'probability of a sentence' is an entirely useless one, under any known interpretation of this term."
- Karttunen (founder of finite-state technologies at Xerox), on linguists' reaction to NLP: "Not interested. You do not understand Theory. Go away you geek."
- Jelinek (former head of the IBM speech project): "Every time I fire a linguist, the performance of our speech recognition system goes up."

Potential Synergies
- Lexical acquisition (unknown words): statistically infer new lexical entries from context
- Modeling "naturalness" and "conventionality": use corpus data to weight constructions
- Dealing with ungrammatical utterances: find the "most similar / most likely" correction
- Richer patterns for finding information in text: use argument structure / semantic dependencies

Finding Information in Text
- The US Government has sponsored lots of research in "information extraction" from news articles
  - Find mentions of terrorists and which locations they're targeting
  - Find which companies are being acquired by which others, and for how much
- Progress has been driven by simplifying the models used
  - Early work used rich linguistic parsers but was unable to robustly handle natural text
  - Modern work mainly uses finite-state patterns

Web Information Extraction
- How much does that textbook cost on Amazon?
- Learn patterns for finding relevant fields, e.g. Our Price: $##.## (a minimal sketch of this style of pattern appears below)
- Concept: Book
  - Title: Foundations of Statistical Natural Language Processing
  - Author(s): Christopher D. Manning & Hinrich Schütze
  - Price: $58.45
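Below is a minimal Python sketch of the kind of finite-state extraction pattern shown above. The regexes, field names, and function name are illustrative assumptions for this page layout; a real wrapper-induction system would learn its patterns from labeled examples rather than use hand-written ones.

```python
import re

# Hand-written, finite-state-style extraction patterns like the
# "Our Price: $##.##" pattern above. These regexes are illustrative
# assumptions, not patterns an actual system learned.
TITLE_PATTERN = re.compile(r"Title:\s*(.+)")
AUTHOR_PATTERN = re.compile(r"Author\(s\):\s*(.+)")
PRICE_PATTERN = re.compile(r"Our Price:\s*\$(\d+\.\d{2})")

def extract_book_fields(page_text):
    """Fill in the slots of a 'Concept: Book' frame from raw page text."""
    fields = {}
    for name, pattern in [("title", TITLE_PATTERN),
                          ("author", AUTHOR_PATTERN),
                          ("price", PRICE_PATTERN)]:
        match = pattern.search(page_text)
        if match:
            fields[name] = match.group(1).strip()
    return fields

page = ("Title: Foundations of Statistical Natural Language Processing\n"
        "Author(s): Christopher D. Manning & Hinrich Schütze\n"
        "Our Price: $58.45\n")
print(extract_book_fields(page))
# {'title': 'Foundations of ...', 'author': 'Christopher D. Manning & Hinrich Schütze', 'price': '58.45'}
```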
Improving IE Performance on Natural Text Documents
- How can we scale IE back up for natural text? We need to look elsewhere for regularities to exploit
- Idea: consider grammatical structure
  - Run a shallow parser on each sentence
  - Flatten its output into a sequence of "typed chunks" (see the sketch below)
- Example of a tagged sentence:
  [NP_SEG Uba2p] [VP_SEG is located] [PP_SEG largely in] [NP_SEG the nucleus]
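To make the flattening step concrete, here is a minimal Python sketch, under the assumption that the shallow parser emits (chunk type, words) pairs; the input format and function name are hypothetical, not the actual pipeline used with BWI.

```python
# Minimal sketch of flattening shallow-parser output into typed chunks.
# The input format (a list of (chunk_type, words) pairs) is an assumed,
# hypothetical representation of the shallow parser's output.

def flatten_to_typed_chunks(shallow_parse):
    """Turn shallow-parse chunks into one token sequence where every
    token carries its enclosing chunk type as an extra feature."""
    tagged_tokens = []
    for chunk_type, words in shallow_parse:
        seg_tag = f"{chunk_type}_SEG"
        for word in words:
            tagged_tokens.append((word, seg_tag))
    return tagged_tokens

parse = [
    ("NP", ["Uba2p"]),
    ("VP", ["is", "located"]),
    ("PP", ["largely", "in"]),
    ("NP", ["the", "nucleus"]),
]
for word, tag in flatten_to_typed_chunks(parse):
    print(f"{word}\t{tag}")
# Uba2p    NP_SEG
# is       VP_SEG
# ...
```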
Power of Linguistic Features
- Using typed phrase segment tags uniformly improves BWI's performance on the 4 natural-text MEDLINE extraction tasks
- [Chart: average performance on the 4 data sets with and without tags; precision improves 21%, recall 65%, and F1 45%]

Linguistically Rich(er) IE
- Exploit more grammatical structure for patterns, e.g. Tim Grow's work on IE with PCFGs
- Example parse tree, with the extraction roles pur, acq, and amt propagated through the nodes:

  (S{pur,acq,amt}
    (NP{pur} (NNP First) (NNP Union) (NNP Corp))
    (VP{acq,amt} (MD will)
      (VP{acq,amt} (VB acquire)
        (NP{acq} (NNP Sheland) (NNP Bank) (NNP Inc))
        (PP{amt} (IN for)
          (NP{amt} (CD three) (CD million) (NNP dollars))))))

Classifying Unknown Words
- Which of the following is the name of a city?
  - Cotrimoxazole
  - Wethersfield
  - Alien Fury: Countdown to Invasion
- Most linguistic grammars assume a fixed lexicon
- How do humans learn to deal with new words?
  - Context ("I spent a summer living in Wethersfield")
  - Makeup of the word itself ("phonesthetics")

What's in a Name?
- [Table: frequencies of character sequences such as "oxa" and "field" in names of drugs, companies, movies, places, and people]

Generative Model of PNPs
- Length n-gram model and word model:

  P(pnp | c) = P_{n-gram}(word-lengths(pnp)) \cdot \prod_{w_i \in pnp} P(w_i | word-length(w_i))

- Word model: a mixture of a character n-gram model and a common-word model:

  P(w_i | len) = \lambda_{len} \cdot P_{n-gram}(w_i | len)^{k/len} + (1 - \lambda_{len}) \cdot P_{word}(w_i | len)

N-Gram Models: Deleted Interpolation
- P_{0-gram}(symbol | history) = uniform-distribution
- P_{n-gram}(s | h) = \lambda_{C(h)} \cdot P_{empirical}(s | h) + (1 - \lambda_{C(h)}) \cdot P_{(n-1)-gram}(s | h)
- (A runnable sketch of these models appears below)
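To make these formulas concrete, here is a minimal Python sketch of the deleted-interpolation character n-gram model plus a toy category chooser. Several simplifications are assumptions on my part: a single fixed lambda rather than the count-conditioned \lambda_{C(h)} above, omission of the word-length and common-word mixture components, and made-up training lists.

```python
import math
from collections import defaultdict

class InterpolatedCharNgram:
    """Character n-gram model smoothed with deleted interpolation:
    P_n(s|h) = lam * P_empirical(s|h) + (1 - lam) * P_{n-1}(s|h),
    bottoming out in a uniform 0-gram distribution."""

    def __init__(self, order=3, lam=0.7, alphabet_size=96):
        self.order = order
        # NOTE: a single fixed lambda is a simplifying assumption; the model
        # described above conditions lambda on the history count C(h).
        self.lam = lam
        self.uniform = 1.0 / alphabet_size  # the 0-gram model
        # counts[n-1][history][symbol], where len(history) == n - 1
        self.counts = [defaultdict(lambda: defaultdict(int)) for _ in range(order)]
        self.totals = [defaultdict(int) for _ in range(order)]

    def train(self, word):
        padded = "^" * (self.order - 1) + word  # '^' marks the word start
        for i in range(self.order - 1, len(padded)):
            for n in range(1, self.order + 1):
                history, symbol = padded[i - n + 1 : i], padded[i]
                self.counts[n - 1][history][symbol] += 1
                self.totals[n - 1][history] += 1

    def prob(self, symbol, history, n=None):
        if n is None:
            n = self.order
        if n == 0:
            return self.uniform
        h = history[len(history) - (n - 1):] if n > 1 else ""
        total = self.totals[n - 1][h]
        empirical = self.counts[n - 1][h][symbol] / total if total else 0.0
        return self.lam * empirical + (1 - self.lam) * self.prob(symbol, history, n - 1)

    def word_log_prob(self, word):
        padded = "^" * (self.order - 1) + word
        return sum(math.log(self.prob(padded[i], padded[i - self.order + 1 : i]))
                   for i in range(self.order - 1, len(padded)))

# Toy usage: train one model per category and pick the category whose model
# assigns the unknown word the highest probability. The tiny training lists
# are made-up stand-ins for the real name lists used in the experiments.
models = {}
for category, names in {"drug": ["cotrimoxazole", "oxazepam", "fluconazole"],
                        "place": ["wethersfield", "springfield", "hartford"]}.items():
    model = InterpolatedCharNgram()
    for name in names:
        model.train(name)
    models[category] = model

print(max(models, key=lambda c: models[c].word_log_prob("fluoxetine")))  # likely 'drug'
```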
Experimental Results
- [Chart: classification accuracies for pairwise, 1-vs-all, and n-way tasks, ranging from 88.11% on the full 5-way drug-nyse-movie-place-person task up to 98.93% on the easiest pairwise task (drug vs. nyse)]

Knowledge of Frequencies
- Linguistics traditionally assumes that knowledge of language doesn't involve counting
- Yet letter frequencies are clearly an important source of knowledge for unknown words
- Similarly, we saw before that there are regular patterns to exploit in grammatical information
- Take-home point: combining statistical NLP methods with richer linguistic representations is a big win!

Language is Ambiguous!
- Ban on Nude Dancing on Governor's Desk (from a Georgia newspaper column discussing current legislation)
- Lebanese chief limits access to private parts (about an Army General's initiative)
- Death may ease tension (an article about the death of Colonel Jean-Claude Paul in Haiti)
- Iraqi Head Seeks Arms
- Juvenile Court to Try Shooting Defendant
- Teacher Strikes Idle Kids
- Stolen Painting Found By Tree
- Local HS Dropouts Cut in Half
- Obesity Study Looks for Larger Test Group
- British Left Waffles on Falkland Islands
- Red Tape Holds Up New Bridges
- Man Struck by Lightning Faces Battery Charge
- Clinton Wins on Budget, but More Lies Ahead
- Hospitals Are Sued by 7 Foot Doctors
- Kids Make Nutritious Snacks

Coping With Ambiguity
- Categorical grammars like HPSG provide many possible analyses for sentences
  - 455 parses for "List the sales of the products produced in 1973 with the products produced in 1972." (Martin et al., 1987)
- In most cases, only one interpretation is intended
- The initial solution was hand-coded preferences among rules
  - Hard to manage as the number of rules increases
  - Need to capture interactions among rules

Statistical HPSG Parse Selection
- HPSG provides deep analyses of sentence structure and meaning, useful for NLP tasks like question answering
- Need to solve the disambiguation problem to make using these richer representations practical
- Idea: learn statistical preferences among constructions from a hand-disambiguated collection of sentences
- Result: the correct analysis is chosen >80% of the time

Towards Semantic Extraction
- HPSG provides a representation of meaning, and computers need meaning to do inference (who did what to whom?)
- Can we extend information extraction methods to extract meaning representations from pages?
- Current project: IE for the Semantic Web
  - A large project to build rich ontologies that describe the content of web pages for intelligent agents
  - Use IE to extract new instances of concepts from web pages (as opposed to manual annotation)

Towards the Grand Vision?
- Collaboration between theoretical linguistics and NLP is an important step forward: practical tools with sophisticated language power
- How can we ever teach computers enough about language and the world? The experts agree on an upcoming convergence:
  - Hawking: Moore's Law is sufficient
  - Moravec: mobile robots must learn like children
  - Kurzweil: reverse-engineer the human brain

Courses
- Ling 139M: Machine Translation (Win)
- Ling 239E: Grammar Engineering (Win)
- CS 276B: Text Information Retrieval (Win)
- Ling 239A: Parsing and Generation (Spr)
- CS 224N: Natural Language Processing (Spr)

Get Involved!!