Università di Pisa
Search Engines & Question Answering
Giuseppe Attardi
Dipartimento di Informatica, Università di Pisa

Question Answering
IR: find documents relevant to a query
– query: boolean combination of keywords
QA: find the answer to a question
– Question: expressed in natural language
– Answer: short phrase (< 50 bytes)

TREC-9 Q&A track
693 fact-based, short-answer questions
– either short (50 B) or long (250 B) answer
~3 GB newspaper/newswire text (AP, WSJ, SJMN, FT, LAT, FBIS)
Score: MRR (penalizes second answer)
Resources: top 50 documents (no answer for 130 questions)
Questions: 186 (Encarta), 314 (seeds from Excite logs), 193 (syntactic variants of 54 originals)

Commonalities
Approaches:
– question classification
– finding entailed answer type
– use of WordNet
High-quality document search helpful (e.g. Queens College)

Sample Questions
Q: Who shot President Abraham Lincoln?
A: John Wilkes Booth
Q: How many lives were lost in the Pan Am crash in Lockerbie?
A: 270
Q: How long does it take to travel from London to Paris through the Channel?
A: three hours 45 minutes
Q: Which Atlantic hurricane had the highest recorded wind speed?
A: Gilbert (200 mph)
Q: Which country has the largest part of the rain forest?
A: Brazil (60%)

Question Types
Class 1 – A: single datum or list of items; C: who, when, where, how (old, much, large)
Class 2 – A: multi-sentence; C: extract from multiple sentences
Class 3 – A: across several texts; C: comparative/contrastive
Class 4 – A: an analysis of retrieved information; C: synthesized coherently from several retrieved fragments
Class 5 – A: result of reasoning; C: world/domain knowledge and common-sense reasoning

Question subtypes
Class 1.A: about subjects, objects, manner, time or location
Class 1.B: about properties or attributes
Class 1.C: taxonomic nature

Results (long)
[Bar chart of MRR scores (official and unofficial) for the top TREC-9 systems, including SMU, Queens, Waterloo, IBM and Pisa; values range roughly from 0.1 to 0.8.]

Falcon: Architecture
Question Processing: the question is parsed (Collins parser + NE extraction) into a question semantic form and a question logical form; a question taxonomy and WordNet yield the expected answer type; question expansion generates variants.
Paragraph Processing: a paragraph index with paragraph filtering returns candidate answer paragraphs.
Answer Processing: answer paragraphs are parsed (Collins parser + NE extraction) into answer semantic and logical forms, with coreference resolution; an abduction filter justifies the answer against the question logical form.

Question parse
[Parse tree of "Who was the first Russian astronaut to walk in space", with POS tags (WP VBD DT JJ NNP NN TO VB IN NN) and phrase nodes (S, VP, NP, PP).]

Question semantic form
[Dependency graph linking first, Russian, astronaut, walk, space; the answer node has type PERSON.]
Question logic form:
first(x) ∧ astronaut(x) ∧ Russian(x) ∧ space(z) ∧ walk(y, z, x) ∧ PERSON(x)

Expected Answer Type
Question: What is the size of Argentina?
The WordNet chain size → dimension → QUANTITY gives the expected answer type QUANTITY.
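To make the mapping from question form to expected answer type concrete, here is a minimal rule-based sketch in Python. It is an illustrative assumption, not Falcon's actual classifier: the rule table and type names are hypothetical stand-ins for the question taxonomy and WordNet lookup described above.

```python
import re

# Hypothetical rule table: regex over the question -> coarse expected answer type.
# A real system derives these from a large answer-type taxonomy plus WordNet.
ANSWER_TYPE_RULES = [
    (r"^who\b", "PERSON"),
    (r"^when\b", "DATE"),
    (r"^where\b", "LOCATION"),
    (r"^how (many|much)\b", "QUANTITY"),
    (r"^how (long|old)\b", "DURATION"),
    (r"\bwhat .*\b(size|height|length)\b", "QUANTITY"),
    (r"\bwhat .*\b(country|city|continent)\b", "LOCATION"),
]

def expected_answer_type(question: str) -> str:
    """Return a coarse expected answer type for a factoid question."""
    q = question.lower().strip()
    for pattern, answer_type in ANSWER_TYPE_RULES:
        if re.search(pattern, q):
            return answer_type
    return "UNKNOWN"

if __name__ == "__main__":
    for q in ["Who was the first Russian astronaut to walk in space?",
              "What is the size of Argentina?",
              "How many lives were lost in the Pan Am crash in Lockerbie?"]:
        print(q, "->", expected_answer_type(q))
```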
Questions about definitions
Special patterns:
– What {is|are} …?
– What is the definition of …?
– Who {is|was|are|were} …?
Answer patterns:
– … {is|are} …
– …, {a|an|the} …
– … - …

Question Taxonomy
[Taxonomy tree rooted at Question; categories include Location (Country, City, Province, Continent), Reason, Manner, Nationality, Product, Language, Game, Organization, Mammal, Reptile, and Number (Speed, Currency, Degree, Dimension, Rate, Duration, Percentage, Count).]

Question expansion
Morphological variants
– invented → inventor
Lexical variants
– killer → assassin
– far → distance
Semantic variants
– like → prefer

Indexing for Q/A
Alternatives:
– IR techniques
– Parse texts and derive conceptual indexes
Falcon uses paragraph indexing:
– Vector-space plus proximity
– Returns weights used for abduction

Abduction to justify answers
Backchaining proofs from questions
Axioms:
– Logical form of answer
– World knowledge (WordNet)
– Coreference resolution in answer text
Effectiveness:
– 14% improvement
– Filters 121 erroneous answers (of 692)
– Requires 60% of question processing time

TREC 13 QA
Several subtasks:
– Factoid questions
– Definition questions
– List questions
– Context questions
LCC still best performance, but different architecture

LCC Block Architecture
Question Processing: question parse, semantic transformation, recognition of the expected answer type, keyword extraction (using NER and WordNet); captures the semantics of the question and selects keywords for passage retrieval.
Passage Retrieval: document retrieval followed by passage extraction and ranking using surface-text techniques.
Answer Processing: answer extraction and ranking using NL techniques, with a theorem prover, an axiomatic knowledge base, NER and WordNet for answer justification and reranking.

Question Processing
Two main tasks
– Determining the type of the answer
– Extracting keywords from the question and formulating a query

Answer Types
Factoid questions…
– Who, where, when, how many…
– The answers fall into a limited and somewhat predictable set of categories
• Who questions are going to be answered by…
• Where questions…
– Generally, systems select answer types from a set of Named Entities, augmented with other types that are relatively easy to extract

Answer Types
Of course, it isn't that easy…
– Who questions can have organizations as answers
• Who sells the most hybrid cars?
– Which questions can have people as answers
• Which president went to war with Mexico?

Answer Type Taxonomy
Contains ~9000 concepts reflecting expected answer types
Merges named entities with the WordNet hierarchy

Answer Type Detection
Most systems use a combination of hand-crafted rules and supervised machine learning to determine the right answer type for a question.
Not worthwhile to do something complex here if it can't also be done in candidate answer passages.

Keyword Selection
The answer type indicates what the question is looking for:
– It can be mapped to a NE type and used for search in an enhanced index
Lexical terms (keywords) from the question, possibly expanded with lexical/semantic variations, provide the required context.

Keyword Extraction
Questions are approximated by sets of unrelated keywords.
Questions (from the TREC QA track) and their keywords:
Q002: What was the monetary value of the Nobel Peace Prize in 1989? → monetary, value, Nobel, Peace, Prize
Q003: What does the Peugeot company manufacture? → Peugeot, company, manufacture
Q004: How much did Mercury spend on advertising in 1993? → Mercury, spend, advertising, 1993
Q005: What is the name of the managing director of Apricot Computer? → name, managing, director, Apricot, Computer
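A first approximation of this keyword extraction is plain stopword filtering. The sketch below is an illustrative assumption, not the LCC implementation, and uses a deliberately tiny stopword list; it reproduces the keyword sets shown in the table above.

```python
import re

# Minimal keyword extraction: strip stopwords and punctuation from the question.
# STOPWORDS is a tiny illustrative list; a real system uses a full list plus
# POS information (see the selection heuristics on the next slide).
STOPWORDS = {"what", "who", "how", "much", "many", "does", "did", "was", "is",
             "the", "a", "an", "of", "in", "on", "to", "for"}

def question_keywords(question):
    tokens = re.findall(r"[A-Za-z0-9]+", question)
    return [t for t in tokens if t.lower() not in STOPWORDS]

print(question_keywords("What does the Peugeot company manufacture?"))
# -> ['Peugeot', 'company', 'manufacture']
print(question_keywords("How much did Mercury spend on advertising in 1993?"))
# -> ['Mercury', 'spend', 'advertising', '1993']
```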
Keyword Selection Algorithm
1. Select all non-stopwords in quotations
2. Select all NNP words in recognized named entities
3. Select all complex nominals with their adjectival modifiers
4. Select all other complex nominals
5. Select all nouns with adjectival modifiers
6. Select all other nouns
7. Select all verbs
8. Select the answer type word

Passage Retrieval
Extracts and ranks passages using surface-text techniques (the Passage Retrieval block of the LCC architecture above: document retrieval followed by passage extraction, driven by the keywords and answer type from Question Processing).

Passage Extraction Loop
Passage Extraction Component
– Extracts passages that contain all selected keywords
– Passage size dynamic
– Start position dynamic
Passage quality and keyword adjustment
– In the first iteration use the first 6 keyword selection heuristics
– If the number of passages is lower than a threshold → query is too strict → drop a keyword
– If the number of passages is higher than a threshold → query is too relaxed → add a keyword

Passage Scoring
Passages are scored based on keyword windows
– For example, if a question has the keyword set {k1, k2, k3, k4}, and in a passage k1 and k2 are matched twice, k3 is matched once, and k4 is not matched, four windows (Window 1 … Window 4) are built, one for each combination of the matched occurrences: each window spans one occurrence of k1, one occurrence of k2 and the single occurrence of k3.

Passage Scoring
Passage ordering is performed using a sort that involves three scores:
– The number of words from the question that are recognized in the same sequence in the window
– The number of words that separate the most distant keywords in the window
– The number of unmatched keywords in the window
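A rough sketch of how the three window scores could be computed follows. This is an assumption for illustration; the exact window construction and tie-breaking used by the LCC system may differ.

```python
# Score one keyword window by (a) the longest run of question keywords that
# appear in question order, (b) the span between the most distant matched
# keywords, and (c) the number of unmatched question keywords.
def score_window(window_tokens, question_keywords):
    positions = [i for i, tok in enumerate(window_tokens) if tok in question_keywords]
    matched = {window_tokens[i] for i in positions}

    # (a) longest run of matched keywords occurring in question order
    order = {k: i for i, k in enumerate(question_keywords)}
    same_order = run = (1 if positions else 0)
    for prev, cur in zip(positions, positions[1:]):
        if order[window_tokens[cur]] > order[window_tokens[prev]]:
            run += 1
            same_order = max(same_order, run)
        else:
            run = 1

    # (b) distance between the most distant matched keywords
    span = (positions[-1] - positions[0]) if positions else 0

    # (c) question keywords with no match in the window
    unmatched = len(set(question_keywords) - matched)
    return same_order, span, unmatched

window = "the first private citizen to fly in space".split()
print(score_window(window, ["private", "citizen", "fly", "space"]))
# -> (4, 5, 0): more in-order matches and fewer unmatched keywords rank a
# window higher; a smaller span breaks ties.
```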
Answer Extraction
Extracts and ranks answers using NL techniques (the Answer Processing block of the LCC architecture above).

Ranking Candidate Answers
Q066: Name the first private citizen to fly in space.
Answer type: Person
Text passage: "Among them was Christa McAuliffe, the first private citizen to fly in space. Karen Allen, best known for her starring role in 'Raiders of the Lost Ark', plays McAuliffe. Brian Kerwin is featured as shuttle pilot Mike Smith..."
Best candidate answer: Christa McAuliffe

Features for Answer Ranking
Number of question terms matched in the answer passage
Number of question terms matched in the same phrase as the candidate answer
Number of question terms matched in the same sentence as the candidate answer
Flag set to 1 if the candidate answer is followed by a punctuation sign
Number of question terms matched, separated from the candidate answer by at most three words and one comma
Number of terms occurring in the same order in the answer passage as in the question
Average distance from the candidate answer to the question term matches

Lexical Chains
Question: When was the internal combustion engine invented?
Answer: The first internal combustion engine was built in 1867.
Lexical chain: (1) invent:v#1 HYPERNYM create_by_mental_act:v#1 HYPERNYM create:v#1 HYPONYM build:v#1
Question: How many chromosomes does a human zygote have?
Answer: 46 chromosomes lie in the nucleus of every normal human cell.
Lexical chain: (1) zygote:n#1 HYPERNYM cell:n#1 HAS.PART nucleus:n#1

Theorem Prover
Q: What is the age of the solar system?
QLF: quantity_at(x2) & age_nn(x2) & of_in(x2,x3) & solar_jj(x3) & system_nn(x3)
Question Axiom: exists x1 x2 x3 (quantity_at(x2) & age_nn(x2) & of_in(x2,x3) & solar_jj(x3) & system_nn(x3))
Answer: The solar system is 4.6 billion years old.
WordNet gloss of "old": old_jj(x6) & live_vb(e2,x6,x2) & for_in(e2,x1) & relatively_jj(x1) & long_jj(x1) & time_nn(x1) & or_cc(e5,e2,e3) & attain_vb(e3,x6,x2) & specific_jj(x2) & age_nn(x2)
Linguistic Axiom: all x1 (quantity_at(x1) & solar_jj(x1) & system_nn(x1) → of_in(x1,x1))
Proof: ¬quantity_at(x2) | ¬age_nn(x2) | ¬of_in(x2,x3) | ¬solar_jj(x3) | ¬system_nn(x3)
The refutation assigns a value to x2.

Is the Web Different?
In TREC (and most commercial applications), retrieval is performed against a smallish closed collection of texts.
The diversity/creativity in how people express themselves necessitates all that work to bring the question and the answer texts together.
But…

The Web is Different
On the Web popular factoids are likely to be expressed in a gazillion different ways.
At least a few of these will likely match the way the question was asked.
So why not just grep (or agrep) the Web using all or pieces of the original question?

AskMSR
Process the question by…
– Forming a search engine query from the original question
– Detecting the answer type
Get some results
Extract answers of the right type based on
– How often they occur

Step 1: Rewrite the questions
Intuition: the user's question is often syntactically quite close to sentences that contain the answer
– Where is the Louvre Museum located?
• The Louvre Museum is located in Paris
– Who created the character of Scrooge?
• Charles Dickens created the character of Scrooge.

Query rewriting
Classify the question into seven categories
– Who is/was/are/were…?
– When is/did/will/are/were…?
– Where is/are/were…?
Hand-crafted category-specific transformation rules, e.g. for Where questions, move 'is' to all possible locations:
"Where is the Louvre Museum located?" →
"is the Louvre Museum located"
"the is Louvre Museum located"
"the Louvre is Museum located"
"the Louvre Museum is located"
"the Louvre Museum located is"
Look to the right of the query terms for the answer.
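The rewrite step can be sketched by moving the auxiliary verb to every position among the remaining terms, reproducing the Louvre example above. This is a toy version, not the AskMSR code; real rewrites also carry per-rule reliability weights and category-specific rules.

```python
# Toy AskMSR-style rewrite generation: drop the wh-word, then move the
# auxiliary/copula to every position among the remaining terms.
def rewrites(question: str):
    tokens = question.rstrip("?").split()
    verb, rest = tokens[1], tokens[2:]   # tokens[0] is the wh-word, dropped
    out = []
    for i in range(len(rest) + 1):
        out.append(" ".join(rest[:i] + [verb] + rest[i:]))
    return out

for r in rewrites("Where is the Louvre Museum located?"):
    print('"%s"' % r)
# "is the Louvre Museum located", "the is Louvre Museum located", ...,
# "the Louvre Museum located is"
```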
Step 2: Query search engine
Send all rewrites to a Web search engine
Retrieve the top N answers (100-200)
For speed, rely just on the search engine's "snippets", not the full text of the actual document

Step 3: Gathering N-Grams
Enumerate all N-grams (N = 1, 2, 3) in all retrieved snippets
Weight of an n-gram: occurrence count, each weighted by the "reliability" (weight) of the rewrite rule that fetched the document
– Example: "Who created the character of Scrooge?"
Dickens 117
Christmas Carol 78
Charles Dickens 75
Disney 72
Carl Banks 54
A Christmas 41
Christmas Carol 45
Uncle 31

Step 4: Filtering N-Grams
Each question type is associated with one or more "data-type filters" = regular expressions for answer types
Boost the score of n-grams that match the expected answer type
Lower the score of n-grams that don't match

Step 5: Tiling the Answers
Overlapping n-grams are merged and the old n-grams discarded, e.g.:
"Charles Dickens" (score 20), "Dickens" (15), "Mr Charles" (10) → "Mr Charles Dickens" (score 45)

Results
Standard TREC contest test-bed (TREC 2001): 1M documents, 900 questions
– The technique does OK, not great (it would have placed in the top 9 of ~30 participants)
– But with access to the Web… they do much better: it would have come in second on TREC 2001

Harder Questions
Factoid question answering is really pretty silly.
A more interesting task is one where the answers are fluid and depend on the fusion of material from disparate texts over time.
– Who is Condoleezza Rice?
– Who is Mahmoud Abbas?
– Why was Arafat flown to Paris?

IXE Components
[IXE framework layers: OS abstraction (files, memory mapping, threads, synchronization); text layer (Unicode, RegExp, tokenizer, readers, suffix trees, object store); search layer (indexer, passage index, search, crawler, web service); language processing tools (sentence splitter, POS tagger, NE tagger, MaxEntropy/GIS, EventStream/ContextStream, clustering); wrappers for Python, Perl and Java.]

Language Processing Tools
Maximum Entropy classifier
Sentence Splitter
Multi-language POS Tagger
Multi-language NE Tagger
Conceptual clustering

Maximum Entropy
Machine learning approach to classification:
– System trained on test cases
– Learned model used for predictions
A classification problem is described as a number of features
Each feature corresponds to a constraint on the model
Maximum entropy model: the model with the maximum entropy among all the models that satisfy the constraints
Choosing a model with less entropy would add 'information' constraints not justified by the empirical evidence available

MaxEntropy: example data
Features → Outcome
Sunny, Happy → Outdoor
Sunny, Happy, Dry → Outdoor
Sunny, Happy, Humid → Outdoor
Sunny, Sad, Dry → Outdoor
Sunny, Sad, Humid → Outdoor
Cloudy, Happy, Humid → Outdoor
Cloudy, Happy, Humid → Outdoor
Cloudy, Sad, Humid → Outdoor
Cloudy, Sad, Humid → Outdoor
Rainy, Happy, Humid → Indoor
Rainy, Happy, Dry → Indoor
Rainy, Sad, Dry → Indoor
Rainy, Sad, Humid → Indoor
Cloudy, Sad, Humid → Indoor
Cloudy, Sad, Humid → Indoor

MaxEnt: example predictions
Context → Outdoor / Indoor
Cloudy, Happy, Humid → 0.771 / 0.228
Rainy, Sad, Humid → 0.001 / 0.998
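The indoor/outdoor example can be reproduced with any maximum-entropy package. The sketch below uses scikit-learn's LogisticRegression (multinomial logistic regression) as a stand-in for the GIS-based MaxEntropy tool mentioned above; this is an assumption for illustration, and the exact probabilities will differ slightly from those on the slide because of regularization.

```python
# Maximum entropy classification of the indoor/outdoor example.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

training = [
    ({"Sunny": 1, "Happy": 1}, "Outdoor"),
    ({"Sunny": 1, "Happy": 1, "Dry": 1}, "Outdoor"),
    ({"Sunny": 1, "Happy": 1, "Humid": 1}, "Outdoor"),
    ({"Sunny": 1, "Sad": 1, "Dry": 1}, "Outdoor"),
    ({"Sunny": 1, "Sad": 1, "Humid": 1}, "Outdoor"),
    ({"Cloudy": 1, "Happy": 1, "Humid": 1}, "Outdoor"),
    ({"Cloudy": 1, "Happy": 1, "Humid": 1}, "Outdoor"),
    ({"Cloudy": 1, "Sad": 1, "Humid": 1}, "Outdoor"),
    ({"Cloudy": 1, "Sad": 1, "Humid": 1}, "Outdoor"),
    ({"Rainy": 1, "Happy": 1, "Humid": 1}, "Indoor"),
    ({"Rainy": 1, "Happy": 1, "Dry": 1}, "Indoor"),
    ({"Rainy": 1, "Sad": 1, "Dry": 1}, "Indoor"),
    ({"Rainy": 1, "Sad": 1, "Humid": 1}, "Indoor"),
    ({"Cloudy": 1, "Sad": 1, "Humid": 1}, "Indoor"),
    ({"Cloudy": 1, "Sad": 1, "Humid": 1}, "Indoor"),
]

vec = DictVectorizer()
X = vec.fit_transform([features for features, _ in training])
y = [label for _, label in training]
model = LogisticRegression(max_iter=1000).fit(X, y)

for context in [{"Cloudy": 1, "Happy": 1, "Humid": 1},
                {"Rainy": 1, "Sad": 1, "Humid": 1}]:
    probs = model.predict_proba(vec.transform([context]))[0]
    print(context, dict(zip(model.classes_, probs.round(3))))
```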
MaxEntropy: application
Sentence Splitting
Not all punctuation marks are sentence boundaries:
– U.S.A.
– St. Helen
– 3.14
Use features like:
– Capitalization (previous, next word)
– Presence in abbreviation list
– Suffix/prefix digits
– Suffix/prefix long
Precision: > 95%

Part of Speech Tagging
TreeTagger: statistical package based on HMM and decision trees
Trained on manually tagged text
Full language lexicon (with all inflections: 140,000 words for Italian)

Training Corpus
Il → DET:def:*:*:masc:sg → _il
presidente → NOM:*:*:*:masc:sg → _presidente
della → PRE:det:*:*:femi:sg → _del
Repubblica → NOM:*:*:*:femi:sg → _repubblica
francese → ADJ:*:*:*:femi:sg → _francese
Francois → NPR:*:*:*:*:* → _Francois
Mitterrand → NPR:*:*:*:*:* → _Mitterrand
ha → VER:aux:pres:3:*:sg → _avere
proposto → VER:*:pper:*:masc:sg → _proporre
("Il presidente della Repubblica francese Francois Mitterrand ha proposto…": the president of the French Republic François Mitterrand has proposed…)

Named Entity Tagger
Uses MaxEntropy
NE categories:
– Top level: NAME, ORGANIZATION, LOCATION, QUANTITY, TIME, EVENT, PRODUCT
– Second level: 30-100, e.g. QUANTITY:
• MONEY, CARDINAL, PERCENT, MEASURE, VOLUME, AGE, WEIGHT, SPEED, TEMPERATURE, etc.
See resources at CoNLL (cnts.uia.ac.be/conll2004)

NE Features
Feature types:
– word-level (e.g. capitalization, digits, etc.)
– punctuation
– POS tag
– category designator (Mr, Av.)
– category suffix (center, museum, street, etc.)
– lowercase intermediate terms (of, de, in)
– presence in controlled dictionaries (locations, people, organizations)
Context: words in position -1, 0, +1

Sample training document
<TEXT>
Today the <ENAMEX TYPE='ORGANIZATION'>Dow Jones</ENAMEX> industrial average gained <NUMEX TYPE='MONEY'>thirtyeight and three quarter points</NUMEX>. When the first American style burger joint opened in <ENAMEX TYPE='LOCATION'>London</ENAMEX>'s fashionable <ENAMEX TYPE='LOCATION'>Regent street</ENAMEX> some <TIMEX TYPE='DURATION'>twenty years</TIMEX> ago, it was mobbed. Now it's <ENAMEX TYPE='LOCATION'>Asia</ENAMEX>'s turn.
</TEXT>
<TEXT>
The temperatures hover in the <NUMEX TYPE='MEASURE'>nineties</NUMEX>, the heat index climbs into the <NUMEX TYPE='MEASURE'>hundreds</NUMEX>. And that's continued bad news for <ENAMEX TYPE='LOCATION'>Florida</ENAMEX> where wildfires have charred nearly <NUMEX TYPE='MEASURE'>three hundred square miles</NUMEX> in the last <TIMEX TYPE='DURATION'>month</TIMEX> and destroyed more than a <NUMEX TYPE='CARDINAL'>hundred</NUMEX> homes.
</TEXT>

Clustering
Classification: assign an item to one among a given set of classes
Clustering: find groupings of similar items (i.e. generate the classes)

Conceptual Clustering of results
Similar to Vivisimo
– Built on the fly rather than from predefined categories (Northern Light)
Generalized suffix tree of snippets
Stemming
Stop words (articulated, essential)
Demo: python, upnp
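A toy illustration of phrase-based clustering of result snippets follows; shared word bigrams stand in for the generalized suffix tree, so this is an assumption for illustration only, not the IXE clustering code.

```python
# Group snippets by the phrases (word bigrams) they share, roughly mimicking
# suffix-tree based conceptual clustering of search results.
from collections import defaultdict

def bigrams(text):
    words = text.lower().split()
    return {" ".join(words[i:i + 2]) for i in range(len(words) - 1)}

def cluster_by_phrase(snippets):
    clusters = defaultdict(list)
    for snippet in snippets:
        for phrase in bigrams(snippet):
            clusters[phrase].append(snippet)
    # keep only phrases shared by at least two snippets, largest clusters first
    return sorted(((p, ss) for p, ss in clusters.items() if len(ss) > 1),
                  key=lambda item: -len(item[1]))

snippets = [
    "Python is a programming language",
    "The Python programming language is popular",
    "Monty Python is a comedy group",
]
for phrase, members in cluster_by_phrase(snippets):
    print(phrase, "->", len(members), "snippets")
```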
PiQASso: Pisa Question Answering System
"Computers are useless, they can only give answers" – Pablo Picasso

PiQASso Architecture
Question analysis: the question is parsed with MiniPar and classified; query formulation/expansion uses WordNet through WNSense.
Document collection: sentences are split (sentence splitter) and indexed (indexer).
Answer analysis: retrieved answer paragraphs are parsed with MiniPar; relation matching, type matching, answer scoring and popularity ranking select the answer; if no answer is found, the query is reformulated and the loop repeats.

Linguistic tools
WNSense
• extracts lexical knowledge from WordNet
• classifies words according to WordNet top-level categories, weighting their senses
• computes distance between words based on is-a links
• suggests word alternatives for query expansion
MiniPar [D. Lin]
• identifies dependency relations between words (e.g. subject, object, modifiers)
• provides POS tagging
• detects semantic types of words (e.g. location, person, organization)
• extensible: we integrated a Maximum Entropy based Named Entity Tagger

Example: Theatre
Categorization: artifact 0.60, communication 0.40
Synonyms: dramaturgy, theater, house, dramatics

Question Analysis
"What metal has the highest melting point?"
1. Parsing: the NL question is parsed
[dependency tree over "What metal has the highest melting point?" with relations subj, obj, lex-mod, mod]
2. Keyword extraction: POS tags are used to select search keywords → metal, highest, melting, point
3. Answer type detection: the expected answer type is determined by applying heuristic rules to the dependency tree → SUBSTANCE
4. Relation extraction: additional relations are inferred and the answer entity is identified →
<SUBSTANCE, has, subj> <point, has, obj> <melting, point, lex-mod> <highest, point, mod>

Answer Analysis
"Tungsten is a very dense material and has the highest melting point of any metal."
1. Parsing: retrieved paragraphs are parsed
2. Answer type check (SUBSTANCE): paragraphs not containing an entity of the expected type are discarded
3. Relation extraction: dependency relations are extracted from the MiniPar output →
<tungsten, material, pred> <tungsten, has, subj> <point, has, obj> …
4. Matching distance: the distance between word relations in question and answer is computed
5. Distance filtering: too-distant paragraphs are filtered out
6. Popularity ranking: popularity rank is used to weight distances → ANSWER

Match Distance between Question and Answer
Analyze relations between corresponding words, considering:
– number of matching words in question and in answer
– distance between words (e.g. moon matching with satellite)
– relation types (e.g. words related by subj in the question while the matching words in the answer are related by pred)
http://medialab.di.unipi.it/askpiqasso.html

Improving PiQASso: More NLP
NLP techniques have been largely unsuccessful at information retrieval
– Document retrieval as the primary measure of information retrieval success
• Document retrieval reduces the need for NLP techniques
– Discourse factors can be ignored
– Query words perform word-sense disambiguation
– Lack of robustness:
• NLP techniques are typically not as robust as word indexing

How do these technologies help?
Question Analysis
– The tag of the predicted category is added to the query
Named-Entity Detection
– The NE categories found in text are included as tags in the index
What party is John Kerry in? (ORGANIZATION)
John Kerry defeated John Edwards in the primaries for the Democratic Party.
Tags: PERSON, ORGANIZATION

NLP Technologies
Coreference Relations:
– Interpretation of a paragraph may depend on the context in which it occurs
Description Extraction:
– Appositive and predicate nominative constructions provide descriptive terms about entities

Coreference Relations
Represented as annotations associated to words, i.e. words in the same position as the reference
How long was Margaret Thatcher the prime minister? (DURATION)
The truth, which has been added to over each of her 11 1/2 years in power, is that they don't make many like her anymore.
Tags: DURATION
Colocated: her, MARGARET THATCHER

Description Extraction
Identifies the DESCRIPTION category
Allows descriptive terms to be used in term expansion
Who is Frank Gehry? (DESCRIPTION)
What architect designed the Guggenheim Museum in Bilbao? (PERSON)
Famed architect Frank Gehry…
Tags: DESCRIPTION, PERSON, LOCATION
Buildings he designed include the Guggenheim Museum in Bilbao.
Colocated: he, FRANK GEHRY
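The colocation idea above (posting the antecedent's canonical name at the position of the pronoun, so that NE-tagged queries can match it) can be sketched with a toy positional index. This is hypothetical illustrative code, not the IXE API.

```python
# Toy positional index where a coreferent pronoun also gets its antecedent's
# canonical name posted at the same position, so a query term like
# PERSON:margaret-thatcher matches "her 11 1/2 years in power".
from collections import defaultdict

def index_tokens(tokens, coref):
    """coref maps a token position to (NE_type, canonical_name)."""
    postings = defaultdict(list)            # term -> list of positions
    for pos, tok in enumerate(tokens):
        postings[tok.lower()].append(pos)
        if pos in coref:
            ne_type, name = coref[pos]
            postings[f"{ne_type}:{name}"].append(pos)   # colocated NE tag
    return postings

tokens = "over each of her 11 1/2 years in power".split()
postings = index_tokens(tokens, {3: ("PERSON", "margaret-thatcher")})
print(postings["PERSON:margaret-thatcher"])  # -> [3], same position as "her"
```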
NLP Technologies
Question Analysis:
– identify the semantic type of the expected answer implicit in the query
Named-Entity Detection:
– determine the semantic type of proper nouns and numeric amounts in text

Will it work?
Will these semantic relations improve paragraph retrieval?
– Are the implementations robust enough to see a benefit across large document collections and question sets?
– Are there enough questions where these relationships are required to find an answer?
Hopefully yes!

Preprocessing
Paragraph Detection
Sentence Detection
Tokenization
POS Tagging
NP-Chunking

Queries to a NE enhanced index
text matches bush
text matches PERSON:bush
text matches LOCATION:* & PERSON:bin-laden
text matches DURATION:* PERSON:margaret-thatcher primeminister

Coreference
Task:
– Determine the space of entity extents:
• Basal noun phrases
– Named entities consisting of multiple basal noun phrases are treated as a single entity
• Pre-nominal proper nouns
• Possessive pronouns
– Determine which extents refer to the same entity in the world

Paragraph Retrieval
Indexing:
– add NE tags for each NE category present in the text
– add coreference relationships
– use syntactically-based categorical relations to create a DESCRIPTION category for term expansion
– use the IXE passage indexer

High Composability
[Class diagram: DocInfo (name, date, size) and PassageDoc (text, boundaries) stored in Collection<DocInfo> and Collection<PassageDoc>; Cursor, QueryCursor and PassageQueryCursor all provide next().]

Tagged Documents
select documents where
– text matches bush
– text matches PERSON:bush
– text matches osama & LOCATION:*
QueryCursor → QueryCursorWord, QueryCursorTaggedWord

Combination
Searching passages on a collection of tagged documents:
QueryCursor<Collection>
PassageQueryCursor<Collection<TaggedDoc>>

Paragraph Retrieval
Retrieval:
– use the question analysis component to predict the answer category and append it to the question
– evaluate using TREC questions and answer patterns
• 500 questions

System Overview
Indexing: Documents → paragraph splitter → sentence splitter → tokenization → POS tagger → NE recognizer → coreference resolution → description extraction → IXE indexer (paragraphs + annotations).
Retrieval: Question → question analysis → IXE search → paragraphs.

Conclusion
QA is a challenging task
It involves state-of-the-art techniques from various fields:
– IR
– NLP
– AI
– Managing large data sets
– Advanced software technologies