SIMS 290-2: Applied Natural Language Processing
Marti Hearst
November 15, 2004 1

Question Answering
Today:
Introduction to QA
A typical full-fledged QA system
A very simple system, in response to this
An intermediate approach
Wednesday:
Using external resources
– WordNet
– Encyclopedias, Gazetteers
Incorporating a reasoning system
Machine Learning of mappings
Other question types (e.g., biography, definitions) 2

A Spectrum of Search Types
Question/Answer: What is the typical height of a giraffe?
Browse and Build: What are some good ideas for landscaping my client's yard?
Text Data Mining: What are some promising untried treatments for Raynaud's disease? 3

Beyond Document Retrieval
Document Retrieval
– Users submit queries corresponding to their information needs.
– System returns (voluminous) list of full-length documents.
– It is the responsibility of the users to find information of interest within the returned documents.
Open-Domain Question Answering (QA)
– Users ask questions in natural language.
  What is the highest volcano in Europe?
– System returns list of short answers.
  … Under Mount Etna, the highest volcano in Europe, perches the fabulous town …
– A real use for NLP
Adapted from slide by Surdeanu and Pasca 4

Questions and Answers
What is the height of a typical giraffe?
The result can be a simple answer, extracted from existing web pages.
Can specify with keywords or a natural language query
– However, most web search engines are not set up to handle questions properly.
– Get different results using a question vs. keywords 5

[Slides 6–9: images only, no extractable text.]

The Problem of Question Answering
When was the San Francisco fire?
  … were driven over it. After the ceremonial tie was removed - it burned in the San Francisco fire of 1906 – historians believe an unknown Chinese worker probably drove the last steel spike into a wooden tie. If so, it was only…
What is the nationality of Pope John Paul II?
  … stabilize the country with its help, the Catholic hierarchy stoutly held out for pluralism, in large part at the urging of Polish-born Pope John Paul II. When the Pope emphatically defended the Solidarity trade union during a 1987 tour of the…
Where is the Taj Mahal?
  … list of more than 360 cities around the world includes the Great Reef in Australia, the Taj Mahal in India, Chartres Cathedral in France, and Serengeti National Park in Tanzania. The four sites Japan has listed include…
Adapted from slide by Surdeanu and Pasca 10

The Problem of Question Answering
Natural language question, not keyword queries:
  What is the nationality of Pope John Paul II?
Short text fragment, not URL list:
  … stabilize the country with its help, the Catholic hierarchy stoutly held out for pluralism, in large part at the urging of Polish-born Pope John Paul II. When the Pope emphatically defended the Solidarity trade union during a 1987 tour of the…
Adapted from slide by Surdeanu and Pasca 11

Question Answering from text
With massive collections of full-text documents, simply finding relevant documents is of limited use: we want answers.
QA: give the user a (short) answer to their question, perhaps supported by evidence.
An alternative to standard IR.
The first problem area in IR where NLP is really making a difference.
Adapted from slides by Manning, Harabagiu, Kushmerick, and ISI 12

People want to ask questions…
Examples from AltaVista query log:
who invented surf music?
how to make stink bombs
where are the snowdens of yesteryear?
which english translation of the bible is used in official catholic liturgies?
how to do clayart
how to copy psx
how tall is the sears tower?
Examples from Excite query log (12/1999):
how can i find someone in texas
where can i find information on puritan religion?
what are the 7 wonders of the world
how can i eliminate stress
What vacuum cleaner does Consumers Guide recommend
Adapted from slides by Manning, Harabagiu, Kushmerick, and ISI 13

A Brief (Academic) History
In some sense question answering is not a new research area.
Question answering systems can be found in many areas of NLP research, including:
Natural language database systems
– A lot of early NLP work on these
Problem-solving systems
– STUDENT (Winograd '77)
– LUNAR (Woods & Kaplan '77)
Spoken dialog systems
– Currently very active and commercially relevant
The focus on open-domain QA is new.
First modern system: MURAX (Kupiec, SIGIR'93):
– Trivial Pursuit questions
– Encyclopedia answers
FAQFinder (Burke et al. '97)
TREC QA competition (NIST, 1999–present)
Adapted from slides by Manning, Harabagiu, Kushmerick, and ISI 14

AskJeeves
AskJeeves is probably the most hyped example of "question answering".
How it used to work:
– Do pattern matching to match a question to their own knowledge base of questions
– If a match is found, returns a human-curated answer to that known question
– If that fails, it falls back to regular web search
(Seems to be more of a meta-search engine now)
A potentially interesting middle ground, but a fairly weak shadow of real QA.
Adapted from slides by Manning, Harabagiu, Kushmerick, and ISI 15

Question Answering at TREC
The question answering competition at TREC consists of answering a set of 500 fact-based questions, e.g., "When was Mozart born?". It has really pushed the field forward.
The document set
– Newswire textual documents from the LA Times, San Jose Mercury News, Wall Street Journal, NY Times, etc.: over 1M documents now.
– Well-formed lexically, syntactically and semantically (reviewed by professional editors).
The questions
– Hundreds of new questions every year; the total is ~2400.
Task
– Initially extract at most 5 answers: long (250B) and short (50B).
– Now extract only one exact answer.
– Several other sub-tasks added later: definition, list, biography.
Adapted from slides by Manning, Harabagiu, Kushmerick, and ISI 16

Sample TREC questions
1. Who is the author of the book, "The Iron Lady: A Biography of Margaret Thatcher"?
2. What was the monetary value of the Nobel Peace Prize in 1989?
3. What does the Peugeot company manufacture?
4. How much did Mercury spend on advertising in 1993?
5. What is the name of the managing director of Apricot Computer?
6. Why did David Koresh ask the FBI for a word processor?
7. What is the name of the rare neurological disease with symptoms such as: involuntary movements (tics), swearing, and incoherent vocalizations (grunts, shouts, etc.)? 17

TREC Scoring
For the first three years systems were allowed to return 5 ranked answer snippets (50/250 bytes) to each question.
Mean Reciprocal Rank Scoring (MRR):
– Each question is assigned the reciprocal rank of the first correct answer: if the correct answer is at position k, the score is 1/k (1, 0.5, 0.33, 0.25, 0.2, 0 for positions 1, 2, 3, 4, 5, 6+).
Mainly Named Entity answers (person, place, date, …)
From 2002 on, systems are only allowed to return a single exact answer, and the notion of confidence has been introduced. 18
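The reciprocal-rank arithmetic above is easy to pin down in code. The sketch below is a minimal illustration of MRR as used in the early TREC QA evaluations (up to five ranked answers per question); the function and argument names are illustrative, not NIST's actual scoring code.

```python
def mean_reciprocal_rank(ranked_answers_per_question, is_correct):
    """ranked_answers_per_question: one ranked list of answer strings per question.
    is_correct: judgment function (question_index, answer) -> bool."""
    total = 0.0
    for qi, answers in enumerate(ranked_answers_per_question):
        for rank, answer in enumerate(answers[:5], start=1):
            if is_correct(qi, answer):
                total += 1.0 / rank   # 1, 0.5, 0.33, 0.25, 0.2 for ranks 1-5
                break                 # only the first correct answer counts
        # questions with no correct answer in the top 5 contribute 0
    return total / len(ranked_answers_per_question)
```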
Top Performing Systems
In 2003, the best performing systems at TREC could answer approximately 60-70% of the questions.
Approaches and successes have varied a fair deal.
Knowledge-rich approaches, using a vast array of NLP techniques, stole the show in 2000-2003
– Notably Harabagiu, Moldovan et al. (SMU/UTD/LCC)
Statistical systems are starting to catch up.
The AskMSR system stressed how much could be achieved by very simple methods with enough text (and there are now various copycats).
People are experimenting with machine learning methods.
A middle ground is to use a large collection of surface matching patterns (ISI).
Adapted from slides by Manning, Harabagiu, Kushmerick, ISI 19

Example QA System
This system contains many components used by other systems, but is more complex in some ways.
Most work was completed in 2001; there have been advances by this group and others since then.
The next slides are based mainly on:
– Paşca and Harabagiu, High-Performance Question Answering from Large Text Collections, SIGIR'01.
– Paşca and Harabagiu, Answer Mining from Online Documents, ACL'01.
– Harabagiu, Paşca, Maiorano: Experiments with Open-Domain Textual Question Answering. COLING'00. 20

QA Block Architecture
[Architecture figure: Q → Question Processing (WordNet, parser, NER) → keywords and question semantics → Passage Retrieval (backed by Document Retrieval) → passages → Answer Extraction (WordNet, parser, NER) → A. Question Processing captures the semantics of the question and selects keywords for PR; Passage Retrieval extracts and ranks passages using surface-text techniques; Answer Extraction extracts and ranks answers using NL techniques.]
Adapted from slide by Surdeanu and Pasca 21

Question Processing Flow
[Flow figure: Q → question parsing → construction of the question representation, which feeds answer type detection (producing the AT category), keyword selection (producing the keywords), and the question semantic representation.]
Adapted from slide by Surdeanu and Pasca 22

Question Stems and Answer Types
Identify the semantic category of expected answers.
Q555: What was the name of Titanic's captain? (question stem: What; answer type: Person)
Q654: What U.S. Government agency registers trademarks? (question stem: What; answer type: Organization)
Q162: What is the capital of Kosovo? (question stem: What; answer type: City)
Q661: How much does one ton of cement cost? (question stem: How much; answer type: Quantity)
Other question stems: Who, Which, Name, How hot...
Other answer types: Country, Number, Product...
Adapted from slide by Surdeanu and Pasca 23

Detecting the Expected Answer Type
In some cases, the question stem is sufficient to indicate the answer type (AT):
– Why → REASON
– When → DATE
In many cases, the question stem is ambiguous. Examples:
– What was the name of Titanic's captain?
– What U.S. Government agency registers trademarks?
– What is the capital of Kosovo?
Solution: select additional question concepts (AT words) that help disambiguate the expected answer type. Examples:
– captain
– agency
– capital
Adapted from slide by Surdeanu and Pasca 24

Answer Type Taxonomy
Encodes 8707 English concepts to help recognize the expected answer type.
Mapping to parts of WordNet done by hand.
Can connect to the Noun, Adj, and/or Verb subhierarchies. 25

Answer Type Detection Algorithm
1. Select the answer type word from the question representation.
– Select the word(s) connected to the question; some "content-free" words are skipped (e.g. "name").
– From the previous set, select the word with the highest connectivity in the question representation.
2. Map the AT word into a previously built AT hierarchy.
– The AT hierarchy is based on WordNet, with some concepts associated with semantic categories, e.g. "writer" → PERSON.
3. Select the AT(s) from the first hypernym(s) associated with a semantic category.
Adapted from slide by Surdeanu and Pasca 26
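As a rough illustration of steps 2 and 3, here is a minimal sketch of the hypernym walk using NLTK's WordNet interface. The ANSWER_TYPE_SEEDS table is a stand-in containing a few synsets chosen for this example; it is not the system's hand-built 8707-concept hierarchy.

```python
from nltk.corpus import wordnet as wn   # requires the WordNet corpus to be downloaded

# A handful of seed synsets standing in for the hand-built answer type hierarchy.
ANSWER_TYPE_SEEDS = {
    "person.n.01": "PERSON",
    "organization.n.01": "ORGANIZATION",
    "location.n.01": "LOCATION",
    "measure.n.02": "QUANTITY",
}

def detect_answer_type(at_word):
    """Walk up the hypernym chains of the AT word until a seeded category is reached."""
    for synset in wn.synsets(at_word, pos=wn.NOUN):
        frontier = [synset]
        while frontier:
            current = frontier.pop(0)
            if current.name() in ANSWER_TYPE_SEEDS:
                return ANSWER_TYPE_SEEDS[current.name()]
            frontier.extend(current.hypernyms())
    return "UNKNOWN"

# detect_answer_type("captain") -> "PERSON"; detect_answer_type("agency") -> "ORGANIZATION"
```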
Answer Type Hierarchy
[Figure: a fragment of the PERSON subhierarchy, e.g. inhabitant/dweller/denizen (American, westerner, islander), scientist/man of science (chemist, researcher, oceanographer), and performer/performing artist (dancer, actor, tragedian, actress), with two example questions whose AT words map up to PERSON: "What researcher discovered the vaccine against Hepatitis-B?" and "What is the name of the French oceanographer who owned Calypso?"]
Adapted from slide by Surdeanu and Pasca 27

Evaluation of Answer Type Hierarchy
This evaluation was done in 2001.
Controlled the variation of the number of WordNet synsets included in the answer type hierarchy.
Test on 800 TREC questions.
Hierarchy coverage:                 0%      3%      10%     25%     50%
Precision score (50-byte answers):  0.296   0.404   0.437   0.451   0.461
The derivation of the answer type is the main source of unrecoverable errors in the QA system.
Adapted from slide by Surdeanu and Pasca 28

Keyword Selection
The Answer Type indicates what the question is looking for, but provides insufficient context to locate the answer in a very large document collection.
Lexical terms (keywords) from the question, possibly expanded with lexical/semantic variations, provide the required context.
Adapted from slide by Surdeanu and Pasca 29

Lexical Terms Extraction
Questions are approximated by sets of unrelated words (lexical terms), similar to bag-of-words IR models.
Questions (from the TREC QA track) and their lexical terms:
Q002: What was the monetary value of the Nobel Peace Prize in 1989? (monetary, value, Nobel, Peace, Prize)
Q003: What does the Peugeot company manufacture? (Peugeot, company, manufacture)
Q004: How much did Mercury spend on advertising in 1993? (Mercury, spend, advertising, 1993)
Q005: What is the name of the managing director of Apricot Computer? (name, managing, director, Apricot, Computer)
Adapted from slide by Surdeanu and Pasca 30

Keyword Selection Algorithm
1. Select all non-stopwords in quotations
2. Select all NNP words in recognized named entities
3. Select all complex nominals with their adjectival modifiers
4. Select all other complex nominals
5. Select all nouns with adjectival modifiers
6. Select all other nouns
7. Select all verbs
8. Select the AT word (which was skipped in all previous steps)
Adapted from slide by Surdeanu and Pasca 31

Keyword Selection Examples
What researcher discovered the vaccine against Hepatitis-B? → Hepatitis-B, vaccine, discover, researcher
What is the name of the French oceanographer who owned Calypso? → Calypso, French, own, oceanographer
What U.S. government agency registers trademarks? → U.S., government, trademarks, register, agency
What is the capital of Kosovo? → Kosovo, capital
Adapted from slide by Surdeanu and Pasca 32
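A much-simplified sketch of the prioritized keyword selection above, assuming the question has already been tokenized and POS-tagged. Quotation handling, named-entity recognition, and complex-nominal detection are collapsed into coarse POS tests, so this approximates the ordering of the heuristics rather than reimplementing them.

```python
STOPWORDS = {"the", "a", "an", "of", "in", "on", "at", "to", "is", "was", "does",
             "did", "who", "what", "when", "where", "which", "how", "much"}

def select_keywords(tagged_tokens, at_word):
    """tagged_tokens: list of (word, POS) pairs; at_word: the answer-type word.
    Returns keywords ordered by priority; the passage-retrieval loop adds or drops
    keywords from the low-priority end of this list."""
    tiers = [
        lambda pos: pos == "NNP",           # ~ heuristic 2: proper nouns / named-entity words
        lambda pos: pos in ("NN", "NNS"),   # ~ heuristics 3-6: nominals and other nouns
        lambda pos: pos.startswith("JJ"),   #   adjectival modifiers
        lambda pos: pos.startswith("VB"),   # ~ heuristic 7: verbs
    ]
    keywords = []
    for tier in tiers:
        for word, pos in tagged_tokens:
            if word.lower() in STOPWORDS or word == at_word or word in keywords:
                continue
            if tier(pos):
                keywords.append(word)
    keywords.append(at_word)                # heuristic 8: the AT word, skipped until now
    return keywords

# select_keywords([("What", "WP"), ("researcher", "NN"), ("discovered", "VBD"),
#                  ("the", "DT"), ("vaccine", "NN"), ("against", "IN"),
#                  ("Hepatitis-B", "NNP")], at_word="researcher")
# -> ['Hepatitis-B', 'vaccine', 'discovered', 'researcher']
```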
Passage Retrieval
[The QA block architecture figure again (Q → Question Processing → Passage Retrieval / Document Retrieval → Answer Extraction → A), now highlighting the Passage Retrieval stage, which extracts and ranks passages using surface-text techniques.]
Adapted from slide by Surdeanu and Pasca 33

Passage Extraction Loop
Passage Extraction Component
– Extracts passages that contain all selected keywords
– Passage size dynamic
– Start position dynamic
Passage quality and keyword adjustment
– In the first iteration, use the first 6 keyword selection heuristics
– If the number of passages is lower than a threshold, the query is too strict → drop a keyword
– If the number of passages is higher than a threshold, the query is too relaxed → add a keyword
Adapted from slide by Surdeanu and Pasca 34

Passage Retrieval Architecture
[Flow figure: keywords and documents (from document retrieval) → passage extraction → passage quality check; if quality is insufficient, keyword adjustment and another extraction pass; otherwise → passage scoring → passage ordering → ranked passages.]
Adapted from slide by Surdeanu and Pasca 35

Passage Scoring
Passages are scored based on keyword windows.
For example, if a question has the set of keywords {k1, k2, k3, k4}, and in a passage k1 and k2 are matched twice, k3 is matched once, and k4 is not matched, the following windows are built:
[Figure: four windows (Window 1 through Window 4) over the passage's matched keyword occurrences k1 k2 k2 k3 k1, each window starting and ending at different occurrences of the repeated keywords.]
Adapted from slide by Surdeanu and Pasca 36

Passage Scoring
Passage ordering is performed using a radix sort that involves three scores:
SameWordSequenceScore (largest)
– Computes the number of words from the question that are recognized in the same sequence in the window
DistanceScore (largest)
– The number of words that separate the most distant keywords in the window
MissingKeywordScore (smallest)
– The number of unmatched keywords in the window
Adapted from slide by Surdeanu and Pasca 37

Answer Extraction
[The QA block architecture figure again, now highlighting the Answer Extraction stage, which extracts and ranks answers using NL techniques.]
Adapted from slide by Surdeanu and Pasca 38

Ranking Candidate Answers
Q066: Name the first private citizen to fly in space.
Answer type: Person
Text passage: "Among them was Christa McAuliffe, the first private citizen to fly in space. Karen Allen, best known for her starring role in "Raiders of the Lost Ark", plays McAuliffe. Brian Kerwin is featured as shuttle pilot Mike Smith..."
Best candidate answer: Christa McAuliffe
Adapted from slide by Surdeanu and Pasca 39
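Before moving on to answer ranking, note that the radix-style passage ordering from the Passage Scoring slide above reduces to a lexicographic sort. The sketch below assumes the three window scores have already been computed, and the type and field names are illustrative, not the system's own.

```python
from typing import List, NamedTuple

class ScoredWindow(NamedTuple):
    text: str
    same_word_sequence: int   # question words matched in the same order (larger is better)
    distance: int             # words separating the most distant keywords (ordered largest first, per the slide)
    missing_keywords: int     # unmatched question keywords (smaller is better)

def order_passages(windows: List[ScoredWindow]) -> List[ScoredWindow]:
    """Radix-style ordering: primary key SameWordSequenceScore, then DistanceScore,
    then MissingKeywordScore, as listed on the Passage Scoring slide."""
    return sorted(windows, key=lambda w: (-w.same_word_sequence, -w.distance, w.missing_keywords))
```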
Features for Answer Ranking
relNMW: number of question terms matched in the answer passage
relSP: number of question terms matched in the same phrase as the candidate answer
relSS: number of question terms matched in the same sentence as the candidate answer
relFP: flag set to 1 if the candidate answer is followed by a punctuation sign
relOCTW: number of question terms matched, separated from the candidate answer by at most three words and one comma
relSWS: number of terms occurring in the same order in the answer passage as in the question
relDTW: average distance from the candidate answer to the question term matches
(SIGIR '01)
Adapted from slide by Surdeanu and Pasca 40

Answer Ranking based on Machine Learning
A relative relevance score is computed for each pair of candidates (answer windows):
relPAIR = wSWS·relSWS + wFP·relFP + wOCTW·relOCTW + wSP·relSP + wSS·relSS + wNMW·relNMW + wDTW·relDTW + threshold
If relPAIR is positive, then the first candidate from the pair is more relevant.
A perceptron model is used to learn the weights.
Scores in the 50% MRR range for short answers, and in the 60% MRR range for long answers.
Adapted from slide by Surdeanu and Pasca 41

Evaluation on the Web
– test on 350 questions from TREC (Q250-Q600)
– extract 250-byte answers
Precision score: Google 0.29; answer extraction from Google 0.44; AltaVista 0.15; answer extraction from AltaVista 0.37
Questions with a correct answer among top 5 returned answers: Google 0.44; answer extraction from Google 0.57; AltaVista 0.27; answer extraction from AltaVista 0.45
Adapted from slide by Surdeanu and Pasca 42

Can we make this simpler?
One reason systems became so complex is that they have to pick out one sentence within a small collection; the answer is likely to be stated in a hard-to-recognize manner.
Alternative idea: what happens with a much larger collection?
The web is so huge that you're likely to see the answer stated in a form similar to the question.
Goal: make the simplest possible QA system by exploiting this redundancy in the web.
Use this as a baseline against which to compare more elaborate systems.
The next slides are based on:
– Web Question Answering: Is More Always Better? Dumais, Banko, Brill, Lin, Ng, SIGIR'02
– An Analysis of the AskMSR Question-Answering System, Brill, Dumais, and Banko, EMNLP'02.
Adapted from slides by Manning, Harabagiu, Kushmerick, ISI 43

AskMSR System Architecture
[Figure: the AskMSR pipeline, with numbered stages corresponding to the five steps described next: (1) query rewriting, (2) querying the search engine, (3) gathering n-grams, (4) filtering n-grams, (5) tiling the answers.]
Adapted from slides by Manning, Harabagiu, Kushmerick, ISI 44

Step 1: Rewrite the questions
Intuition: the user's question is often syntactically quite close to sentences that contain the answer.
Where is the Louvre Museum located? → The Louvre Museum is located in Paris.
Who created the character of Scrooge? → Charles Dickens created the character of Scrooge.
Adapted from slides by Manning, Harabagiu, Kushmerick, ISI 45

Query rewriting
Classify the question into one of seven categories:
– Who is/was/are/were…?
– When is/did/will/are/were…?
– Where is/are/were…?
a. Hand-crafted category-specific transformation rules
e.g.: for Where questions, move 'is' to all possible locations and look to the right of the query terms for the answer.
"Where is the Louvre Museum located?"
→ "is the Louvre Museum located"
→ "the is Louvre Museum located"
→ "the Louvre is Museum located"
→ "the Louvre Museum is located"
→ "the Louvre Museum located is"
Nonsense, but ok. It's only a few more queries to the search engine.
b. Expected answer "datatype" (e.g., Date, Person, Location, …)
When was the French Revolution? → DATE
Adapted from slides by Manning, Harabagiu, Kushmerick, ISI 46
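The copula-moving rewrite is simple enough to mock up. The toy function below reproduces the "Where is the Louvre Museum located?" example from the slide above; it is only an illustration of the idea, not AskMSR's actual rule set, and the quoting and '+' syntax merely mimic the query operators of the era's search engines.

```python
def where_is_rewrites(question):
    """'Where is the Louvre Museum located?' -> phrase-query rewrites plus a fallback."""
    words = question.rstrip("?").split()
    if words[0].lower() != "where" or words[1].lower() not in ("is", "are", "was", "were"):
        raise ValueError("this toy handles only 'Where is/are/was/were ...?' questions")
    verb, rest = words[1], words[2:]
    rewrites = []
    for i in range(len(rest) + 1):                        # move the verb to every possible position
        candidate = rest[:i] + [verb] + rest[i:]
        rewrites.append('"' + " ".join(candidate) + '"')  # high-weight phrase queries
    rewrites.append("+" + " +".join(rest))                # low-weight backoff: AND of the bare terms
    return rewrites

# where_is_rewrites("Where is the Louvre Museum located?") yields
# '"is the Louvre Museum located"', ..., '"the Louvre Museum located is"',
# and finally '+the +Louvre +Museum +located'
```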
Query Rewriting - weighting
Some query rewrites are more reliable than others.
For "Where is the Louvre Museum located?":
– Weight 5: +"the Louvre Museum is located" (if a match, probably right)
– Weight 1: +Louvre +Museum +located (lots of non-answers could come back too)
Adapted from slides by Manning, Harabagiu, Kushmerick, ISI 47

Step 2: Query search engine
Send all rewrites to a Web search engine.
Retrieve the top N answers (100-200).
For speed, rely just on the search engine's "snippets", not the full text of the actual document.
Adapted from slides by Manning, Harabagiu, Kushmerick, ISI 48

Step 3: Gathering N-Grams
Enumerate all N-grams (N = 1, 2, 3) in all retrieved snippets.
Weight of an n-gram: occurrence count, each occurrence weighted by the "reliability" (weight) of the rewrite rule that fetched the document.
Example: "Who created the character of Scrooge?"
Dickens 117, Christmas Carol 78, Charles Dickens 75, Disney 72, Carl Banks 54, A Christmas 41, Christmas Carol 45, Uncle 31
Adapted from slides by Manning, Harabagiu, Kushmerick, ISI 49

Step 4: Filtering N-Grams
Each question type is associated with one or more "data-type filters" (regular expressions):
– When… → Date
– Where…, What… → Location
– Who… → Person
Boost the score of n-grams that match the regexp; lower the score of n-grams that don't match the regexp.
Details omitted from paper….
Adapted from slides by Manning, Harabagiu, Kushmerick, ISI 50

Step 5: Tiling the Answers
[Figure: tiling example for the n-grams "Charles Dickens" (score 20), "Dickens" (15), and "Mr Charles" (10): tile the highest-scoring n-gram with overlapping n-grams, merge them into "Mr Charles Dickens" (score 45), discard the old n-grams, and repeat until no more overlap.]
Adapted from slides by Manning, Harabagiu, Kushmerick, ISI 51

Results
Standard TREC contest test-bed (TREC 2001): ~1M documents; 900 questions.
The technique doesn't do too well (though it would have placed in the top 9 of ~30 participants):
– MRR, strict: .34
– MRR, lenient: .43
– 9th place
Adapted from slides by Manning, Harabagiu, Kushmerick, ISI 52

Results
From the EMNLP'02 paper: MRR of .577; answers 61% of questions correctly.
Would be near the top of TREC-9 runs.
Breakdown of feature contribution: [figure] 53

Issues
Works best/only for "Trivial Pursuit"-style fact-based questions.
Limited/brittle repertoire of:
– question categories
– answer data types/filters
– query rewriting rules
Adapted from slides by Manning, Harabagiu, Kushmerick, ISI 54

Intermediate Approach: Surface pattern discovery
Based on:
– Ravichandran, D. and Hovy, E.H. Learning Surface Text Patterns for a Question Answering System, ACL'02
– Hovy et al., Question Answering in Webclopedia, TREC-9, 2000.
Use of characteristic phrases: "When was <person> born"
Typical answers:
– "Mozart was born in 1756."
– "Gandhi (1869-1948)..."
Suggests regular expressions to help locate the correct answer:
– "<NAME> was born in <BIRTHDATE>"
– "<NAME> ( <BIRTHDATE>-"
Adapted from slides by Manning, Harabagiu, Kushmerick, ISI 55

Use Pattern Learning
Examples:
– "The great composer Mozart (1756-1791) achieved fame at a young age"
– "Mozart (1756-1791) was a genius"
– "The whole world would always be indebted to the great music of Mozart (1756-1791)"
The longest matching substring for all 3 sentences is "Mozart (1756-1791)".
A suffix tree would extract "Mozart (1756-1791)" as an output, with a score of 3.
Reminiscent of IE pattern learning.
Adapted from slides by Manning, Harabagiu, Kushmerick, ISI 56

Pattern Learning (cont.)
Repeat with different examples of the same question type: "Gandhi 1869", "Newton 1642", etc.
Some patterns learned for BIRTHDATE:
a. born in <ANSWER>, <NAME>
b. <NAME> was born on <ANSWER>,
c. <NAME> ( <ANSWER>
d. <NAME> ( <ANSWER> - )
Adapted from slides by Manning, Harabagiu, Kushmerick, ISI 57
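The "longest matching substring" step lends itself to a small worked example. The brute-force function below recovers the same "Mozart (1756-1791)" anchor with a count of 3 from the three sentences on the Pattern Learning slide; the published system uses a suffix tree, which does the same job far more efficiently.

```python
def longest_common_substring(sentences):
    """Longest substring shared by every sentence, plus its total occurrence count."""
    shortest = min(sentences, key=len)
    for length in range(len(shortest), 0, -1):            # try the longest candidates first
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in s for s in sentences):
                return candidate, sum(s.count(candidate) for s in sentences)
    return "", 0

examples = [
    "The great composer Mozart (1756-1791) achieved fame at a young age",
    "Mozart (1756-1791) was a genius",
    "The whole world would always be indebted to the great music of Mozart (1756-1791)",
]
print(longest_common_substring(examples))   # -> ('Mozart (1756-1791)', 3)
```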
QA Typology from ISI
Typology of typical question forms: 94 nodes (47 leaf nodes). Analyzed 17,384 questions (from answers.com).
(THING ((AGENT (NAME (FEMALE-FIRST-NAME (EVE MARY ...)) (MALE-FIRST-NAME (LAWRENCE SAM ...)))) (COMPANY-NAME (BOEING AMERICAN-EXPRESS)) JESUS ROMANOFF ...) (ANIMAL-HUMAN (ANIMAL (WOODCHUCK YAK ...)) PERSON) (ORGANIZATION (SQUADRON DICTATORSHIP ...)) (GROUP-OF-PEOPLE (POSSE CHOIR ...)) (STATE-DISTRICT (TIROL MISSISSIPPI ...)) (CITY (ULAN-BATOR VIENNA ...)) (COUNTRY (SULTANATE ZIMBABWE ...))))
(PLACE (STATE-DISTRICT (CITY COUNTRY...)) (GEOLOGICAL-FORMATION (STAR CANYON...)) AIRPORT COLLEGE CAPITOL ...)
(ABSTRACT (LANGUAGE (LETTER-CHARACTER (A B ...))) (QUANTITY (NUMERICAL-QUANTITY INFORMATION-QUANTITY MASS-QUANTITY MONETARY-QUANTITY TEMPORAL-QUANTITY ENERGY-QUANTITY TEMPERATURE-QUANTITY ILLUMINATION-QUANTITY (SPATIAL-QUANTITY (VOLUME-QUANTITY AREA-QUANTITY DISTANCE-QUANTITY)) ... PERCENTAGE)))
(UNIT ((INFORMATION-UNIT (BIT BYTE ... EXABYTE)) (MASS-UNIT (OUNCE ...)) (ENERGY-UNIT (BTU ...)) (CURRENCY-UNIT (ZLOTY PESO ...)) (TEMPORAL-UNIT (ATTOSECOND ... MILLENIUM)) (TEMPERATURE-UNIT (FAHRENHEIT KELVIN CELCIUS)) (ILLUMINATION-UNIT (LUX CANDELA)) (SPATIAL-UNIT ((VOLUME-UNIT (DECILITER ...)) (DISTANCE-UNIT (NANOMETER ...)))) (AREA-UNIT (ACRE)) ... PERCENT))
(TANGIBLE-OBJECT ((FOOD (HUMAN-FOOD (FISH CHEESE ...))) (SUBSTANCE ((LIQUID (LEMONADE GASOLINE BLOOD ...)) (SOLID-SUBSTANCE (MARBLE PAPER ...)) (GAS-FORM-SUBSTANCE (GAS AIR)) ...)) (INSTRUMENT (DRUM DRILL (WEAPON (ARM GUN)) ...) (BODY-PART (ARM HEART ...)) (MUSICAL-INSTRUMENT (PIANO))) ... *GARMENT *PLANT DISEASE)
Adapted from slides by Manning, Harabagiu, Kushmerick, ISI 58

Experiments
6 different question types from the Webclopedia QA Typology:
– BIRTHDATE
– LOCATION
– INVENTOR
– DISCOVERER
– DEFINITION
– WHY-FAMOUS
Adapted from slides by Manning, Harabagiu, Kushmerick, ISI 59

Experiments: pattern precision
BIRTHDATE:
1.0   <NAME> ( <ANSWER> - )
0.85  <NAME> was born on <ANSWER>,
0.6   <NAME> was born in <ANSWER>
0.59  <NAME> was born <ANSWER>
0.53  <ANSWER> <NAME> was born
0.50  - <NAME> ( <ANSWER>
0.36  <NAME> ( <ANSWER> -
INVENTOR:
1.0   <ANSWER> invents <NAME>
1.0   the <NAME> was invented by <ANSWER>
1.0   <ANSWER> invented the <NAME> in
Adapted from slides by Manning, Harabagiu, Kushmerick, ISI 60

Experiments (cont.)
DISCOVERER:
1.0   when <ANSWER> discovered <NAME>
1.0   <ANSWER>'s discovery of <NAME>
0.9   <NAME> was discovered by <ANSWER> in
DEFINITION:
1.0   <NAME> and related <ANSWER>
1.0   form of <ANSWER>, <NAME>
0.94  as <NAME>, <ANSWER> and
Adapted from slides by Manning, Harabagiu, Kushmerick, ISI 61

Experiments (cont.)
WHY-FAMOUS:
1.0   <ANSWER> <NAME> called
1.0   laureate <ANSWER> <NAME>
0.71  <NAME> is the <ANSWER> of
LOCATION:
1.0   <ANSWER>'s <NAME>
1.0   regional : <ANSWER> : <NAME>
0.92  near <NAME> in <ANSWER>
Depending on question type, the system gets high MRR (0.6-0.9), with higher results from use of the Web than the TREC QA collection.
Adapted from slides by Manning, Harabagiu, Kushmerick, ISI 62

Shortcomings & Extensions
Need for POS &/or semantic types:
– "Where are the Rocky Mountains?"
– "Denver's new airport, topped with white fiberglass cones in imitation of the Rocky Mountains in the background, continues to lie empty"
– <NAME> in <ANSWER>
An NE tagger &/or ontology could enable the system to determine that "background" is not a location.
Adapted from slides by Manning, Harabagiu, Kushmerick, ISI 63
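To make the "background" failure concrete, here is a toy regex instantiation of the <NAME> in <ANSWER> LOCATION pattern. The capture group and the crude one-chunk guess for <ANSWER> are simplifications introduced here, not part of the published system; the point is only that, without POS or named-entity checks, the pattern happily returns a non-location.

```python
import re

def location_candidates(name, text):
    """Instantiate '<NAME> in <ANSWER>' and capture the word(s) following 'in'."""
    pattern = re.escape(name) + r"\s+in\s+((?:the\s+)?\w+)"   # crude one-chunk <ANSWER>
    return re.findall(pattern, text)

sentence = ("Denver's new airport, topped with white fiberglass cones in imitation "
            "of the Rocky Mountains in the background, continues to lie empty")
print(location_candidates("Rocky Mountains", sentence))   # -> ['the background']
```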
Shortcomings... (cont.)
Long-distance dependencies:
– "Where is London?"
– "London, which has one of the busiest airports in the world, lies on the banks of the river Thames"
– would require a pattern like: <QUESTION>, (<any_word>)*, lies on <ANSWER>
The abundance and variety of Web data helps the system find an instance of the pattern without losing answers to long-distance dependencies.
Adapted from slides by Manning, Harabagiu, Kushmerick, ISI 64

Shortcomings... (cont.)
The system currently has only one anchor word.
Doesn't work for question types requiring multiple words from the question to be in the answer:
– "In which county does the city of Long Beach lie?"
– "Long Beach is situated in Los Angeles County"
– required pattern: <Q_TERM_1> is situated in <ANSWER> <Q_TERM_2>
Does not use case:
– "What is a micron?"
– "...a spokesman for Micron, a maker of semiconductors, said SIMMs are..."
– If "Micron" had been capitalized in the question, this would be a perfect answer.
Adapted from slides by Manning, Harabagiu, Kushmerick, ISI 65

Question Answering
Today:
Introduction to QA
A typical full-fledged QA system
A very simple system, in response to this
An intermediate approach
Wednesday:
Using external resources
– WordNet
– Encyclopedias, Gazetteers
Incorporating a reasoning system
Machine Learning of mappings
Alternative question types 66