Question Answering Tutorial Based on: John M. Prager IBM T.J. Watson Research Center jprager@us.ibm.com Taken from (with deletions and adaptations): RANLP 2003 tutorial http://lml.bas.bg/ranlp2003/ Tutorials link, Prager tutorial Part I - Anatomy of QA A Brief History of QA Terminology The Essence of Text-based QA Basic Structure of a QA System NE Recognition and Answer Types Answer Extraction John M. Prager RANLP 2003 Tutorial on Question Answering A Brief History of QA NLP front-ends to Expert Systems SHRDLU (Winograd, 1972) User manipulated, and asked questions about, blocks world First real demo of combination of syntax, semantics, and reasoning ** NLP front-ends to Databases LUNAR (Woods,1973) User asked questions about moon rocks Used ATNs and procedural semantics LIFER/LADDER (Hendrix et al. 1977) User asked questions about U.S. Navy ships Used semantic grammar; domain information built into grammar ** NLP + logic CHAT-80 (Warren & Pereira, 1982) NLP query system in Prolog, about world geography Definite Clause Grammars ** “Modern Era of QA” – answers from free text MURAX (Kupiec, 2001) NLP front-end to Encyclopaedia ** IR + NLP TREC-8 (1999) (Voorhees & Tice, 2000) Today – all of the above John M. Prager RANLP 2003 Tutorial on Question Answering Some “factoid” questions from TREC8-9 9: How far is Yaroslav from Moscow? 15: When was London's Docklands Light Railway constructed? 22: When did the Jurassic Period end? 29: What is the brightest star visible from Earth? * 30: What are the Valdez Principles? 73: Where is the Taj Mahal? 197: What did Richard Feynman say upon hearing he would receive the Nobel Prize in Physics? 198: How did Socrates die? 199: How tall is the Matterhorn? 200: How tall is the replica of the Matterhorn at Disneyland? * 227: Where does dew come from? 269: Who was Picasso? 298: What is California's state tree? John M. Prager RANLP 2003 Tutorial on Question Answering Terminology Question Type Answer Type Question Topic Candidate Passage Candidate Answer Authority File/List John M. Prager RANLP 2003 Tutorial on Question Answering Terminology – Question Type Question Type: an idiomatic categorization of questions for purposes of distinguishing between different processing strategies and/or answer formats E.g. TREC2003 FACTOID: “How far is it from Earth to Mars?” LIST: “List the names of chewing gums” DEFINITION: “Who is Vlad the Impaler?” Other possibilities: RELATIONSHIP: “What is the connection between Valentina Tereshkova and Sally Ride?” SUPERLATIVE: “What is the largest city on Earth?” YES-NO: “Is Saddam Hussein alive?” OPINION: “What do most Americans think of gun control?” CAUSE&EFFECT: “Why did Iraq invade Kuwait?” … John M. Prager RANLP 2003 Tutorial on Question Answering Terminology – Answer Type Answer Type: the class of object (or rhetorical type of sentence) sought by the question. E.g. PERSON (from “Who …”) PLACE (from “Where …”) DATE (from “When …”) NUMBER (from “How many …”) … but also EXPLANATION (from “Why …”) METHOD (from “How …”) … Answer types are usually tied intimately to the classes recognized by the system’s Named Entity Recognizer. John M. Prager RANLP 2003 Tutorial on Question Answering Broader Answer Types E.g. “In what state is the Grand Canyon?” “What is the population of Bulgaria?” “What colour is a pomegranate?” John M. Prager RANLP 2003 Tutorial on Question Answering Terminology – Question Topic Question Topic: the object (person, place, …) or event that the question is about. The question might well be about a property of the topic, which will be the question focus. E.g. “What is the height of Mt. Everest?” Mt. Everest is the topic Topic has to be mentioned in answer passage John M. Prager RANLP 2003 Tutorial on Question Answering Terminology – Candidate Passage Candidate Passage: a text passage (anything from a single sentence to a whole document) retrieved by a search engine in response to a question. Candidate passage expected to contain candidate answers. Candidate passages will usually have associated scores, from the search engine. John M. Prager RANLP 2003 Tutorial on Question Answering Terminology – Candidate Answer Candidate Answer: in the context of a question, a small quantity of text (anything from a single word to a sentence or bigger, but usually a noun phrase) that is of the same type as the Answer Type. In some systems, the type match may be approximate Candidate answers are found in candidate passages E.g. 50 Queen Elizabeth II September 8, 2003 by baking a mixture of flour and water John M. Prager RANLP 2003 Tutorial on Question Answering Terminology – Authority List Authority List (or File): a collection of instances of a class of interest, used to test a term for class membership. <Answer type> Instances should be derived from an authoritative source and be as close to complete as possible. Ideally, class is small, easily enumerated and with members with a limited number of lexical forms. Good: Days of week Planets Elements Good statistically, but difficult to get 100% recall: Animals Plants Colours Problematic People Organizations Impossible All numeric quantities Explanations and other clausal quantities John M. Prager RANLP 2003 Tutorial on Question Answering Essence of Text-based QA Need to find a passage that answers the question. Steps: Find a candidate passage (search) Check that semantics of passage and question match Extract the answer John M. Prager RANLP 2003 Tutorial on Question Answering Basic Structure of a QA-System See for example Abney et al., 2000; Clarke et al., 2001; Harabagiu et al.; Hovy et al., 2001; Prager et al. 2000 Question Question Analysis Query Search Answer Type Answer Documents/ passages Answer Extraction John M. Prager RANLP 2003 Tutorial on Question Answering Corpus or Web Essence of Text-based QA Search For a very small corpus, can consider every passage as a candidate, but this is not interesting Need to perform a search to locate good passages. If search is too broad, have not achieved that much, and are faced with lots of noise If search is too narrow, will miss good passages Two broad possibilities: Optimize search Use iteration John M. Prager RANLP 2003 Tutorial on Question Answering Essence of Text-based QA Match Need to test whether semantics of passage match semantics of question Approaches: Count question words present in passage Score based on proximity Score based on syntactic relationships Prove match John M. Prager RANLP 2003 Tutorial on Question Answering Essence of Text-based QA Answer Extraction Find candidate answers of same type as the answer type sought in question. Has implications for size of type hierarchy John M. Prager RANLP 2003 Tutorial on Question Answering Essence of Text-based QA High-Level View of Recall Have three broad locations in the system where expansion takes place, for purposes of matching passages Where is the right trade-off? Question Analysis. Expand individual terms to synonyms (hypernyms, hyponyms, related terms) Reformulate question (paraphrases) In Search Engine At indexing time Stemming/lemmatization John M. Prager RANLP 2003 Tutorial on Question Answering Essence of Text-based QA High-Level View of Precision Have three broad locations in the system where narrowing/filtering/matching takes place Where is the right trade-off? Question Analysis. Include all question terms in query, vs. allow partial matching Use IDF-style weighting to indicate preferences Search Engine Possibly store POS information for polysemous terms Answer Extraction Reward (penalize) passages/answers that (don’t) pass matching test John M. Prager RANLP 2003 Tutorial on Question Answering Answer Types and Modifiers Name 5 French Cities Most likely there is no type for “French Cities” Cf. Wikipedia So will look for CITY include “French/France” in bag of words, and hope for the best include “French/France” in bag of words, retrieve documents, and look for evidence (deep parsing, logic) If you have a list of French cities, could either Filter results by list Use Answer-Based QA (see later) Domain Model: Use longitude/latitude information of cities and countries – practical for domain oriented systems (e.g. geographical) John M. Prager RANLP 2003 Tutorial on Question Answering Answer Types and Modifiers Name a female figure skater Most likely there is no type for “female figure skater” Most likely there is no type for “figure skater” Look for PERSON, with query terms {figure, skater} What to do about “female”? Two approaches. 1. Include “female” in the bag-of-words. • • 2. Relies on logic that if “femaleness” is an interesting property, it might well be mentioned in answer passages. Does not apply to, say “singer”. Leave out “female” but test candidate answers for gender. • Needs either an authority file or a heuristic test • • e.g. look for she,her, … Test may not be definitive. John M. Prager RANLP 2003 Tutorial on Question Answering Named Entity Recognition BBN’s IdentiFinder (Bikel et al. 1999) Hidden Markov Model Sheffield GATE (http://www.gate.ac.uk/) Development Environment for IE and other NLP activities IBM’s Textract/Resporator (Byrd & Ravin, 1999; Wacholder et al. 1997; Prager et al. 2000) FSMs and Authority Files + others Inventory of semantic classes recognized by NER related closely to set of answer types system can handle John M. Prager RANLP 2003 Tutorial on Question Answering Named Entity Recognition John M. Prager RANLP 2003 Tutorial on Question Answering Answer Extraction Also called Answer Selection/Pinpointing Given a question and candidate passages, the process of selecting and ranking candidate answers. Usually, candidate answers are those terms in the passages which have the same answer type as that generated from the question Ranking the candidate answers depends on assessing how well the passage context relates to the question 3 Approaches: Heuristic features Shallow parse fragments Logical proof John M. Prager RANLP 2003 Tutorial on Question Answering Answer Extraction using Features Heuristic feature sets (Prager et al. 2003+). See also (Radev at al. 2000) Calculate feature values for each CA, and then calculate linear combination using weights learned from training data. Features are generic/non-lexicalized, question independent (vs. supervised IE) Ranking criteria: Good global context: the global context of a candidate answer evaluates the relevance of the passage from which the candidate answer is extracted to the question. Good local context the local context of a candidate answer assesses the likelihood that the answer fills in the gap in the question. Right semantic type the semantic type of a candidate answer should either be the same as or a subtype of the answer type identified by the question analysis component. Redundancy the degree of redundancy for a candidate answer increases as more instances of the answer occur in retrieved passages. John M. Prager RANLP 2003 Tutorial on Question Answering Answer Extraction using Features (cont.) Features for Global Context KeywordsInPassage: the ratio of keywords present in a passage to the total number of keywords issued to the search engine. NPMatch: the number of words in noun phrases shared by both the question and the passage. SEScore: the ratio of the search engine score for a passage to the maximum achievable score. FirstPassage: a Boolean value which is true for the highest ranked passage returned by the search engine, and false for all other passages. Features for Local Context AvgDistance: the average distance between the candidate answer and keywords that occurred in the passage. NotInQuery: the number of words in the candidate answers that are not query keywords. John M. Prager RANLP 2003 Tutorial on Question Answering Answer Extraction using Relationships Can be viewed as additional features Computing Ranking Scores – Linguistic knowledge to compute passage & candidate answer scores Perform syntactic processing on question and candidate passages Extract predicate-argument & modification relationships from parse Question: “Who wrote the Declaration of Independence?” Relationships: [X, write], [write, Declaration of Independence] Answer Text: “Jefferson wrote the Declaration of Independence.” Relationships: [Jefferson, write], [write, Declaration of Independence] Compute scores based on number of question relationship matches John M. Prager RANLP 2003 Tutorial on Question Answering Answer Extraction using Relationships (cont.) Example: When did Amtrak begin operations? Question relationships [Amtrak, begin], [begin, operation], [X, begin] Compute passage scores: passages and relationships In 1971, Amtrak began operations,… [Amtrak, begin], [begin, operation], [1971, begin]… “Today, things are looking better,” said Claytor, expressing optimism about getting the additional federal funds in future years that will allow Amtrak to begin expanding its operations. [Amtrak, begin], [begin, expand], [expand, operation], [today, look]… Airfone, which began operations in 1984, has installed air-to-ground phones…. Airfone also operates Railfone, a public phone service on Amtrak trains. [Airfone, begin], [begin, operation], [1984, operation], [Amtrak, train]… John M. Prager RANLP 2003 Tutorial on Question Answering Answer Extraction using Logic Logical Proof Convert question to a goal Convert passage to set of logical forms representing individual assertions Add predicates representing subsumption rules, real-world knowledge Prove the goal See section on LCC next John M. Prager RANLP 2003 Tutorial on Question Answering LCC Moldovan & Rus, 2001 Uses Logic Prover for answer justification Question logical form Candidate answers in logical form XWN glosses Linguistic axioms Lexical chains Inference engine attempts to verify answer by negating question and proving a contradiction If proof fails, predicates in question are gradually relaxed until proof succeeds or associated proof score is below a threshold. John M. Prager RANLP 2003 Tutorial on Question Answering LCC: Lexical Chains Q:1518 What year did Marco Polo travel to Asia? Answer: Marco polo divulged the truth after returning in 1292 from his travels, which included several months on Sumatra Lexical Chains: (1) travel_to:v#1 -> GLOSS -> travel:v#1 -> RGLOSS -> travel:n#1 (2) travel_to#1 -> GLOSS -> travel:v#1 -> HYPONYM -> return:v#1 (3) Sumatra:n#1 -> ISPART -> Indonesia:n#1 -> ISPART -> Southeast _Asia:n#1 -> ISPART -> Asia:n#1 Q:1570 What is the legal age to vote in Argentina? Answer: Voting is mandatory for all Argentines aged over 18. Lexical Chains: (1) legal:a#1 -> GLOSS -> rule:n#1 -> RGLOSS -> mandatory:a#1 (2) age:n#1 -> RGLOSS -> aged:a#3 (3) Argentine:a#1 -> GLOSS -> Argentina:n#1 John M. Prager RANLP 2003 Tutorial on Question Answering LCC: Logic Prover Question Which company created the Internet Browser Mosaic? QLF: (_organization_AT(x2) ) & company_NN(x2) & create_VB(e1,x2,x6) & Internet_NN(x3) & browser_NN(x4) & Mosaic_NN(x5) & nn_NNC(x6,x3,x4,x5) Answer passage ... Mosaic , developed by the National Center for Supercomputing Applications ( NCSA ) at the University of Illinois at Urbana - Champaign ... ALF: ... Mosaic_NN(x2) & develop_VB(e2,x2,x31) & by_IN(e2,x8) & National_NN(x3) & Center_NN(x4) & for_NN(x5) & Supercomputing_NN(x6) & application_NN(x7) & nn_NNC(x8,x3,x4,x5,x6,x7) & NCSA_NN(x9) & at_IN(e2,x15) & University_NN(x10) & of_NN(x11) & Illinois_NN(x12) & at_NN(x13) & Urbana_NN(x14) & nn_NNC(x15,x10,x11,x12,x13,x14) & Champaign_NN(x16) ... Lexical Chains develop <-> make and make <->create exists x2 x3 x4 all e2 x1 x7 (develop_vb(e2,x7,x1) <-> make_vb(e2,x7,x1) & something_nn(x1) & new_jj(x1) & such_jj(x1) & product_nn(x2) & or_cc(x4,x1,x3) & mental_jj(x3) & artistic_jj(x3) & creation_nn(x3)). all e1 x1 x2 (make_vb(e1,x1,x2) <-> create_vb(e1,x1,x2) & manufacture_vb(e1,x1,x2) & man-made_jj(x2) & product_nn(x2)). Linguistic axioms all x0 (mosaic_nn(x0) -> internet_nn(x0) & browser_nn(x0)) John M. Prager RANLP 2003 Tutorial on Question Answering USC-ISI Textmap system Ravichandran and Hovy, 2002 Hermjakob et al. 2003 Use of Surface Text Patterns When was X born -> Mozart was born in 1756 Gandhi (1869-1948) Can be captured in expressions <NAME> was born in <BIRTHDATE> <NAME> (<BIRTHDATE> These patterns can be learned Similar in nature to DIRT, using Web as a corpus Developed in the QA application context John M. Prager RANLP 2003 Tutorial on Question Answering USC-ISI TextMap Use bootstrapping to learn patterns. For an identified question type (“When was X born?”), start with known answers for some values of X Mozart 1756 Gandhi 1869 Newton 1642 Issue Web search engine queries (e.g. “+Mozart +1756” ) Collect top 1000 documents Filter, tokenize, smooth etc. Use suffix tree constructor to find best substrings, e.g. Mozart (1756-1791) Filter Mozart (1756- Replace query strings with e.g. <NAME> and <ANSWER> Determine precision of each pattern Find documents with just question term (Mozart) Apply patterns and calculate precision John M. Prager RANLP 2003 Tutorial on Question Answering USC-ISI TextMap Finding Answers Determine Question type Perform IR Query Do sentence segmentation and smoothing Replace question term by question tag i.e. replace Mozart with <NAME> Search for instances of patterns associated with question type Select words matching <ANSWER> Assign scores according to precision of pattern John M. Prager RANLP 2003 Tutorial on Question Answering Additional Linguistic Phenomena John M. Prager RANLP 2003 Tutorial on Question Answering Negation (1) Q: Who invented the electric guitar? A: While Mr. Fender did not invent the electric guitar, he did revolutionize and perfect it. Note: Not all instances of “not” will invalidate a passage. John M. Prager RANLP 2003 Tutorial on Question Answering Negation (2) Name a US state where cars are manufactured. versus Name a US state where cars are not manufactured. Certain kinds of negative events or instances are rarely asserted explicitly in text, but must be deduced by other means John M. Prager RANLP 2003 Tutorial on Question Answering Other Adverbial Modifiers (Only, Just etc.) Name an astronaut who nearly made it to the moon To satisfactorily answer such questions, need to know what are the different ways in which events can fail to happen. In this case there are several. John M. Prager RANLP 2003 Tutorial on Question Answering Attention to Details Tenses Who is the Prime Minister of Japan? Number What are the largest snakes in the world? ^ John M. Prager RANLP 2003 Tutorial on Question Answering Jeopardy Examples - Correct Literary Character Wanted for killing sir Danvers Carew ; Seems to have a split personality Hyde – correct ( Dr. Jekyll and Mr. Hyde) category: olympic oddities Milrad Cavic almost upset this man's perfect 2008 olypmics, losing to him by 100th of a second Michael Phelps (identified name type – “man”) John M. Prager RANLP 2003 Tutorial on Question Answering Jeopardy Examples - Wrong Name the decade: The first modern crossword puzzle is published & Oreo cookies are introduced Watson: wrong - 1920’s (57%), but the correct 1910’s with 30% largest US airport named after a World War II hero Toronto, the name of a Canadian city. (Missed that US airport means that the airport is in the US, or that Toronto isn’t in the U.S.) John M. Prager RANLP 2003 Tutorial on Question Answering General Perspective on Semantic Applications Semantic applications as “text matching” Matching between target texts and Supervised: training texts Unsupervised: user input (e.g. question) Cf. the textual entailment paradigm John M. Prager RANLP 2003 Tutorial on Question Answering