Natural Language Question Answering
Gary Geunbae Lee, NLP Lab. Professor @ POSTECH, Chief Scientist @ DiQuest (KISS tutorial)

Question Answering (Dan Jurafsky)
• One of the oldest NLP tasks (punched-card systems in 1961)
• Simmons, Klein, McConlogue. 1964. Indexing and Dependency Logic for Answering English Questions. American Documentation 15(3), 196-204

Question Answering: IBM's Watson (Dan Jurafsky)
• Won Jeopardy! on February 16, 2011
• Clue: WILLIAM WILKINSON'S "AN ACCOUNT OF THE PRINCIPALITIES OF WALLACHIA AND MOLDAVIA" INSPIRED THIS AUTHOR'S MOST FAMOUS NOVEL
• Answer: Bram Stoker

Apple's Siri (Dan Jurafsky)
[Screenshot of a Siri dialogue]

Wolfram Alpha (Dan Jurafsky)
[Screenshot of a Wolfram Alpha result]

Task Definition
• Open-domain QA: find answers to open-domain natural language questions from large collections of documents
  • Questions are usually factoid questions
  • Documents include texts, web pages, databases, multimedia, maps, etc.
• Canned QA: map new questions onto predefined questions for which answers already exist (e.g., FAQ Finder)

Example (1)
Q: What is the fastest car in the world?
Correct answer: "…, the Jaguar XJ220 is the dearest (Pounds 415,000), fastest (217 mph / 350 kmh) and most sought-after car in the world."
Wrong answer: "… will stretch Volkswagen's lead in the world's fastest-growing vehicle market. Demand is expected to soar in the next few years as more Chinese are able to afford their own cars."

Example (2)
Q: How did Socrates die?
Answer: "… Socrates drank poisoned wine."
Needs world knowledge and reasoning: anyone who drinks or eats something poisoned is likely to die.

A Taxonomy of QA Systems (Moldovan et al., ACL 2002)
Class | Type                | Example
1     | Factual             | Q: What is the largest city in Germany?  A: "… Berlin, the largest city in Germany, …"
2     | Simple reasoning    | Q: How did Socrates die?  A: "… Socrates poisoned himself …"
3     | Fusion list         | Q: What are the arguments for and against prayer in school?  Answer assembled across several texts
4     | Interactive context | Clarification questions
5     | Speculative         | Q: Should the Fed raise interest rates at their next meeting?  Answer provides analogies to past actions

Enabling Technologies and Applications
• Enabling technologies: POS tagger, parser, WSD, named entity tagger, information retrieval, information extraction, inference engine, ontology (WordNet), knowledge acquisition, knowledge classification, language generation
• Applications: smart agents, situation management, e-commerce, summarization, tutoring / learning, personal assistant in business, on-line documentation, on-line troubleshooting, Semantic Web

Question Taxonomies
• Wendy Lehnert's taxonomy: 13 question categories, based on Schank's Conceptual Dependency
• Arthur Graesser's taxonomy: 18 question categories, expanding Lehnert's categories, based on speech act theory
• Question types in LASSO [Moldovan et al., TREC-8]: combines question stems, question focus and phrasal heads; unifies the question class with the answer type via the question focus

Earlier QA Systems
• The QUALM system (Wendy Lehnert, 1978): reads stories and answers questions about what was read
• The LUNAR system (W. Woods, 1977): one of the first user evaluations of question answering systems
• The STUDENT system (Daniel Bobrow, 1964): solved high-school algebra word problems
• The MURAX system (Julian Kupiec, 1993): uses an online encyclopedia for closed-class questions
• The START system (Boris Katz, 1997): uses annotations to process questions from the Web
• FAQ Finder (Robin Burke, 1997): uses Frequently Asked Questions from Usenet newsgroups
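Canned QA of the FAQ Finder sort reduces to matching a new question against a list of stored question-answer pairs. Below is a minimal sketch of that idea using plain TF-IDF cosine similarity; the FAQ entries are invented for illustration, and this is not FAQ Finder's actual algorithm (which also used WordNet-based semantic similarity).

```python
import math
from collections import Counter

# Hypothetical FAQ entries; FAQ Finder drew these from Usenet FAQ files.
FAQ = [
    ("How do I reset my password?", "Use the 'Forgot password' link on the login page."),
    ("What payment methods are accepted?", "We accept credit cards and bank transfer."),
    ("How can I contact customer support?", "Email support@example.com or call the hotline."),
]

def tokens(text):
    # Lowercase, split on whitespace, strip surrounding punctuation.
    return [w for w in (t.strip("?,.!'\"") for t in text.lower().split()) if w]

def tfidf_vectors(docs):
    # Document frequency is computed over the stored FAQ questions.
    tokenized = [tokens(d) for d in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    n = len(docs)
    def vec(toks):
        tf = Counter(toks)
        return {w: tf[w] * math.log((n + 1) / (df[w] + 1)) for w in tf}
    return [vec(t) for t in tokenized], vec

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def answer(question):
    faq_vecs, vectorize = tfidf_vectors([q for q, _ in FAQ])
    qvec = vectorize(tokens(question))
    best = max(range(len(FAQ)), key=lambda i: cosine(qvec, faq_vecs[i]))
    return FAQ[best][1]

print(answer("I forgot my password, how do I reset it?"))
```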
Recent QA Conferences
• TREC-8: 200 questions; 2 GB collection; questions written from the text; 41 participating systems; best performance 64.5%; answer format 50 & 250 bytes; evaluation Top 5, MRR
• TREC-9: 693 questions; 3 GB collection; questions from query logs; 75 systems; 76%; 50 & 250 bytes; Top 5, MRR
• TREC-10: 500 questions; 3 GB collection; query logs; 92 systems; 69%; 50 bytes; main task, list task, context task, null answers; Top 5, MRR
• TREC-11: 500 questions; 3 GB (AQUAINT) collection; query logs; 76 systems; 85.6%; exact answers; main task, list task, null answers; Top 1, confidence-weighted (CW) score
• NTCIR-3: 200 questions; 280 MB collection; questions written from the text; 36 systems; 60.8%; exact answers; Task 1, Task 2, Task 3; Top 5, MRR

Types of Questions in Modern Systems (Dan Jurafsky)
• Factoid questions
  • Who wrote "The Universal Declaration of Human Rights"?
  • How many calories are there in two slices of apple pie?
  • What is the average age of the onset of autism?
  • Where is Apple Computer based?
• Complex (narrative) questions
  • In children with an acute febrile illness, what is the efficacy of acetaminophen in reducing fever?
  • What do scholars think about Jefferson's position on dealing with pirates?

Commercial systems: mainly factoid questions (Dan Jurafsky)
• Where is the Louvre Museum located? → In Paris, France
• What's the abbreviation for limited partnership? → L.P.
• What are the names of Odin's ravens? → Huginn and Muninn
• What currency is used in China? → The yuan
• What kind of nuts are used in marzipan? → almonds
• What instrument does Max Roach play? → drums

Paradigms for QA (Dan Jurafsky)
• IR-based approaches: TREC; IBM Watson; Google
• Knowledge-based and hybrid approaches: IBM Watson; Apple Siri; Wolfram Alpha; True Knowledge Evi

Many questions can already be answered by web search (Dan Jurafsky)
[Screenshot of a web search result]

IR-based Question Answering (Dan Jurafsky)
[Screenshot of a search-engine answer box]

IR-based Factoid QA (Dan Jurafsky)
[Architecture diagram: Question → Question Processing (query formulation, answer type detection) → Document Retrieval over an indexed document collection → relevant documents → Passage Retrieval → passages → Answer Processing → Answer]

IR-based Factoid QA (Dan Jurafsky)
• Question processing: detect question type, answer type, focus, relations; formulate queries to send to a search engine
• Passage retrieval: retrieve ranked documents; break them into suitable passages and rerank
• Answer processing: extract candidate answers; rank candidates using evidence from the text and external sources

Knowledge-based approaches (Siri) (Dan Jurafsky)
• Build a semantic representation of the query: times, dates, locations, entities, numeric quantities
• Map from this semantics to queries over structured data or resources: geospatial databases; ontologies (Wikipedia infoboxes, DBpedia, WordNet, Yago); restaurant review sources and reservation services; scientific databases

Hybrid approaches (IBM Watson) (Dan Jurafsky)
• Build a shallow semantic representation of the query
• Generate answer candidates using IR methods, augmented with ontologies and semi-structured data
• Score each candidate using richer knowledge sources: geospatial databases, temporal reasoning, taxonomical classification

Factoid Q/A (Dan Jurafsky)
[Same architecture diagram as above: question processing → document and passage retrieval → answer processing]
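The three-stage pipeline in the diagrams above can be sketched as a thin skeleton. Everything here is a simplified illustration, not any particular system: the question processor, retriever, and extractor are stubs that a real system would replace with a classifier, a search engine, and a named entity tagger.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    text: str
    score: float

def process_question(question: str) -> dict:
    # Toy question processing: pick keywords and guess an answer type.
    stop = {"who", "what", "where", "is", "the"}
    keywords = [w for w in question.rstrip("?").split() if w.lower() not in stop]
    answer_type = "PERSON" if question.lower().startswith("who") else "OTHER"
    return {"keywords": keywords, "answer_type": answer_type}

def retrieve_passages(keywords: List[str], corpus: List[str]) -> List[str]:
    # Toy retrieval: rank passages by keyword overlap, keep the best few.
    def overlap(p: str) -> int:
        return sum(k.lower() in p.lower() for k in keywords)
    return sorted(corpus, key=overlap, reverse=True)[:3]

def extract_answers(passages: List[str], answer_type: str) -> List[Candidate]:
    # Toy answer extraction: capitalized tokens stand in for PERSON entities.
    cands = []
    for p in passages:
        for tok in p.split():
            if answer_type == "PERSON" and tok[:1].isupper():
                cands.append(Candidate(tok.strip(".,"), 1.0))
    return cands

corpus = ["Bram Stoker wrote the novel Dracula.", "The Louvre is located in Paris."]
q = "Who wrote Dracula?"
analysis = process_question(q)
passages = retrieve_passages(analysis["keywords"], corpus)
print(extract_answers(passages, analysis["answer_type"])[:3])
```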
Question Processing (Dan Jurafsky)
Things to extract from the question:
• Answer type detection: decide the named entity type (person, place, …) of the answer
• Query formulation: choose query keywords for the IR system
• Question type classification: is this a definition question, a math question, a list question?
• Focus detection: find the question words that are replaced by the answer
• Relation extraction: find relations between entities in the question

Question Processing (Dan Jurafsky)
"They're the two states you could be reentering if you're crossing Florida's northern border"
• Answer type: US state
• Query: two states, border, Florida, north
• Focus: the two states
• Relations: borders(Florida, ?x, north)

Answer Type Detection: Named Entities (Dan Jurafsky)
• Who founded Virgin Airlines? → PERSON
• What Canadian city has the largest population? → CITY

Answer Type Taxonomy (Dan Jurafsky)
Xin Li, Dan Roth. 2002. Learning Question Classifiers. COLING '02
• 6 coarse classes: ABBREVIATION, ENTITY, DESCRIPTION, HUMAN, LOCATION, NUMERIC
• 50 finer classes, e.g.
  • LOCATION: city, country, mountain, …
  • HUMAN: group, individual, title, description
  • ENTITY: animal, body, color, currency, …

Part of Li & Roth's Answer Type Taxonomy (Dan Jurafsky)
[Taxonomy diagram: LOCATION (city, country, state), ABBREVIATION (abbreviation, expression), DESCRIPTION (definition, reason), ENTITY (animal, currency, food), HUMAN (individual, group, title), NUMERIC (date, money, percent, distance, size)]

Answer Types (Dan Jurafsky)
[Table of answer types with example questions]

More Answer Types (Dan Jurafsky)
[Table of additional answer types with example questions]

Answer types in Jeopardy (Dan Jurafsky)
Ferrucci et al. 2010. Building Watson: An Overview of the DeepQA Project. AI Magazine, Fall 2010, 59-79.
• 2,500 answer types in a 20,000-question Jeopardy sample
• The most frequent 200 answer types cover < 50% of the data
• The 40 most frequent Jeopardy answer types: he, country, city, man, film, state, she, author, group, here, company, president, capital, star, novel, character, woman, river, island, king, song, part, series, sport, singer, actor, play, team, show, actress, animal, presidential, composer, musical, nation, book, title, leader, game

Answer Type Detection (Dan Jurafsky)
• Hand-written rules
• Machine learning
• Hybrids

Answer Type Detection (Dan Jurafsky)
• Regular-expression rules can get some cases:
  • Who {is|was|are|were} PERSON
  • PERSON (YEAR – YEAR)
• Other rules use the question headword (the headword of the first noun phrase after the wh-word):
  • Which city in China has the largest number of foreign financial companies?
  • What is the state flower of California?

Answer Type Detection (Dan Jurafsky)
• Most often, the problem is treated as machine-learning classification
  • Define a taxonomy of question types
  • Annotate training data for each question type
  • Train classifiers for each question class using a rich set of features
  • The features include those hand-written rules

Features for Answer Type Detection (Dan Jurafsky)
• Question words and phrases
• Part-of-speech tags
• Parse features (headwords)
• Named entities
• Semantically related words
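A minimal rule-based answer type detector in the spirit of the regex rules above. The patterns and the tiny headword lexicon are illustrative guesses, not Li & Roth's classifier or any deployed system's rule set.

```python
import re

# Illustrative hand-written rules mapping question patterns to answer types.
RULES = [
    (re.compile(r"^\s*who\s+(is|was|are|were)\b", re.I), "PERSON"),
    (re.compile(r"^\s*where\b", re.I), "LOCATION"),
    (re.compile(r"^\s*when\b", re.I), "DATE"),
    (re.compile(r"^\s*how\s+(many|much)\b", re.I), "NUMERIC"),
    (re.compile(r"^\s*what\s+(city|country|state)\b", re.I), "LOCATION"),
]

# Crude headword lexicon used as a fallback for "what/which NOUN ..." questions.
HEADWORD_TYPES = {"city": "LOCATION", "country": "LOCATION", "flower": "ENTITY",
                  "president": "PERSON", "author": "PERSON", "year": "DATE"}

def answer_type(question: str) -> str:
    for pattern, atype in RULES:
        if pattern.search(question):
            return atype
    # Fallback: look for the first word after the wh-word that the lexicon knows.
    for word in question.lower().replace("?", "").split():
        if word in HEADWORD_TYPES:
            return HEADWORD_TYPES[word]
    return "OTHER"

for q in ["Who was the 23rd president of the United States?",
          "Which city in China has the largest number of foreign financial companies?",
          "What is the state flower of California?"]:
    print(q, "->", answer_type(q))
```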
Factoid Q/A (Dan Jurafsky)
[Same architecture diagram as above]

Keyword Selection Algorithm (Dan Jurafsky)
Dan Moldovan, Sanda Harabagiu, Marius Pasca, Rada Mihalcea, Richard Goodrum, Roxana Girju and Vasile Rus. 1999. Proceedings of TREC-8.
1. Select all non-stop words in quotations
2. Select all NNP words in recognized named entities
3. Select all complex nominals with their adjectival modifiers
4. Select all other complex nominals
5. Select all nouns with their adjectival modifiers
6. Select all other nouns
7. Select all verbs
8. Select all adverbs
9. Select the QFW word (skipped in all previous steps)
10. Select all other words

Choosing keywords from the query (Dan Jurafsky; slide from Mihai Surdeanu)
Who coined the term "cyberspace" in his novel "Neuromancer"?
• Selected keywords, with the rule that selected each: cyberspace (1), Neuromancer (1), term (4), novel (4), coined (7)

Factoid Q/A (Dan Jurafsky)
[Same architecture diagram as above]

Passage Retrieval (Dan Jurafsky)
• Step 1: The IR engine retrieves documents using the query terms
• Step 2: Segment the documents into shorter units (something like paragraphs)
• Step 3: Passage ranking: use the answer type to help rerank passages

Features for Passage Ranking (Dan Jurafsky)
Used either in rule-based classifiers or with supervised machine learning:
• Number of named entities of the right type in the passage
• Number of query words in the passage
• Number of question N-grams also in the passage
• Proximity of query keywords to each other in the passage
• Longest sequence of question words
• Rank of the document containing the passage

Factoid Q/A (Dan Jurafsky)
[Same architecture diagram as above]

Answer Extraction (Dan Jurafsky)
• Run an answer-type named-entity tagger on the passages
  • Each answer type requires a named-entity tagger that detects it; if the answer type is CITY, the tagger has to tag CITY
  • Can be full NER, simple regular expressions, or a hybrid
• Return the string with the right type:
  • Who is the prime minister of India? (PERSON) → "Manmohan Singh, Prime Minister of India, had told left leaders that the deal would not be renegotiated."
  • How tall is Mt. Everest? (LENGTH) → "The official height of Mount Everest is 29035 feet."

Ranking Candidate Answers (Dan Jurafsky)
• But what if there are multiple candidate answers?
Q: Who was Queen Victoria's second son?
• Answer type: PERSON
• Passage: "The Marie biscuit is named after Marie Alexandrovna, the daughter of Czar Alexander II of Russia and wife of Alfred, the second son of Queen Victoria and Prince Albert"
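A toy version of answer extraction and candidate ranking for the Queen Victoria example, assuming a trivially simple PERSON "tagger" (runs of capitalized words) and a proximity-based score. Note that a proximity-only score can easily prefer the wrong entity (e.g., Queen Victoria herself rather than Alfred), which is exactly why the richer feature set listed on the following slide is needed.

```python
import re
from typing import List, Tuple

def person_candidates(passage: str) -> List[str]:
    # Stand-in PERSON tagger: maximal runs of capitalized words.
    return re.findall(r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*", passage)

def score(candidate: str, passage: str, keywords: List[str]) -> float:
    # Toy score: number of matched question keywords minus a penalty for the
    # average token distance between the candidate and each matched keyword.
    toks = passage.lower().split()
    cand_tok = len(passage[: passage.lower().find(candidate.lower())].split())
    matched = [i for i, t in enumerate(toks) if any(k.lower() in t for k in keywords)]
    if not matched:
        return 0.0
    avg_dist = sum(abs(i - cand_tok) for i in matched) / len(matched)
    return len(matched) - 0.1 * avg_dist

passage = ("The Marie biscuit is named after Marie Alexandrovna, the daughter of "
           "Czar Alexander II of Russia and wife of Alfred, the second son of "
           "Queen Victoria and Prince Albert")
keywords = ["Queen", "Victoria", "second", "son"]
ranked: List[Tuple[float, str]] = sorted(
    ((score(c, passage, keywords), c) for c in person_candidates(passage)),
    reverse=True)
for s, c in ranked:
    print(f"{s:6.2f}  {c}")
```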
Use machine learning: features for ranking candidate answers (Dan Jurafsky)
• Answer type match: the candidate contains a phrase with the correct answer type
• Pattern match: a regular-expression pattern matches the candidate
• Question keywords: the number of question keywords in the candidate
• Keyword distance: the distance in words between the candidate and the query keywords
• Novelty factor: a word in the candidate is not in the query
• Apposition features: the candidate is an appositive to question terms
• Punctuation location: the candidate is immediately followed by a comma, period, quotation marks, semicolon, or exclamation mark
• Sequences of question terms: the length of the longest sequence of question terms that occurs in the candidate answer

Candidate answer scoring in IBM Watson (Dan Jurafsky)
• Each candidate answer gets scores from more than 50 components (from unstructured text, semi-structured text, triple stores)
  • Logical form (parse) match between question and candidate
  • Passage source reliability
  • Geospatial location: California is "southwest of Montana"
  • Temporal relationships
  • Taxonomic classification

Common Evaluation Metrics (Dan Jurafsky)
1. Accuracy: does the answer match the gold-labeled answer?
2. Mean Reciprocal Rank (MRR)
   • For each query, return a ranked list of M candidate answers
   • The query's score is 1/rank of the first correct answer
   • Take the mean over all N queries:
     \mathrm{MRR} = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{\mathrm{rank}_i}

A Generic QA Architecture
[Diagram: Question → Question Processing → QA query → Document Retrieval over the document collection → documents → Answer Extraction & Formulation → Answers, supported by resources (dictionary, ontology, …)]

IBM: Statistical QA (Ittycheriah et al., TREC-9, TREC-10)
[Pipeline: Question → Answer Type Prediction → Query & Focus Expansion → IR → Named Entity Marking → Answer Selection → Answers]
• 31 answer categories; named entity tagging
• Focus expansion using WordNet; query expansion using an encyclopedia
• Dependency-relation matching
• Uses a maximum entropy model in both answer type prediction and answer selection

USC/ISI: Webclopedia (Hovy et al., TREC-9, TREC-10)
[Pipeline: question parsing → query creation → IR → sentence selection and ranking → matching against the QA typology, constraint patterns and inference → ranked answers]
• Parses questions and top document segments with CONTEX
• Query expansion using WordNet
• Scores each sentence using word scores
• Matches constraint patterns against parse trees; matches semantic types and parse-tree elements

LCC: PowerAnswer (Moldovan et al., ACL-02)
• M1: Keyword pre-processing (split/bind/spell)
• M2: Construction of the question representation
• M3: Derivation of the expected answer type
• M4: Keyword selection
• M5: Keyword expansion
• M6: Actual retrieval of documents and passages
• M7: Passage post-filtering
• M8: Identification of candidate answers
• M8': Logic proving
• M9: Answer ranking
• M10: Answer formulation
• Three feedback loops (Loop 1, Loop 2, Loop 3) feed retrieval results back into query formation (detailed on the next slide)

LCC: PowerAnswer (cont'd)
• Keyword alternations
  • Morphological alternations: invent → inventor
  • Lexical alternations: "How far" → distance
  • Semantic alternations: like → prefer
• Three feedback loops
  • Loop 1: adjust the Boolean queries
  • Loop 2: lexical alternations
  • Loop 3: semantic alternations
• Logic proving: unification between the question and answer logic forms

Relation Extraction (Dan Jurafsky)
• Answers: databases of relations
  • born-in("Emma Goldman", "June 27 1869")
  • author-of("Cao Xue Qin", "Dream of the Red Chamber")
  • Drawn from Wikipedia infoboxes, DBpedia, FreeBase, etc.
• Questions: extracting relations in questions
  • Whose granddaughter starred in E.T.?
  • (acted-in ?x "E.T."), (granddaughter-of ?x ?y)
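Picking up the MRR definition from the evaluation-metrics slide above, here is a minimal computation over hypothetical ranked answer lists; the data is invented purely for illustration.

```python
from typing import List

def mrr(ranked_answers: List[List[str]], gold: List[str]) -> float:
    """Mean Reciprocal Rank: average of 1/rank of the first correct answer;
    a query whose gold answer never appears contributes 0."""
    total = 0.0
    for answers, correct in zip(ranked_answers, gold):
        for rank, ans in enumerate(answers, start=1):
            if ans == correct:
                total += 1.0 / rank
                break
    return total / len(gold)

# Hypothetical system output for three questions (top-ranked lists) and gold answers.
runs = [
    ["Paris", "Lyon", "Marseille"],          # correct at rank 1 -> 1.0
    ["almonds", "walnuts"],                  # correct at rank 1 -> 1.0
    ["Hugin", "Odin", "Huginn and Muninn"],  # correct at rank 3 -> 1/3
]
gold = ["Paris", "almonds", "Huginn and Muninn"]
print(round(mrr(runs, gold), 3))  # (1 + 1 + 1/3) / 3 ≈ 0.778
```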
Temporal Reasoning (Dan Jurafsky)
• Relation databases (and obituaries, biographical dictionaries, etc.)
• IBM Watson example: "In 1594 he took a job as a tax collector in Andalusia"
  • Thoreau is a bad answer (born in 1817)
  • Cervantes is possible (was alive in 1594)

Geospatial knowledge: containment, directionality, borders (Dan Jurafsky)
• Beijing is a good answer for "Asian city"
• California is "southwest of Montana"
• Source: geonames.org

Context and Conversation in Virtual Assistants like Siri (Dan Jurafsky)
• Coreference helps resolve ambiguities
  U: "Book a table at Il Fornaio at 7:00 with my mom"
  U: "Also send her an email reminder"
• Clarification questions
  U: "Chicago pizza"
  S: "Did you mean pizza restaurants in Chicago or Chicago-style pizza?"

Answering harder questions (Dan Jurafsky)
Q: What is water spinach?
A: Water spinach (Ipomoea aquatica) is a semi-aquatic leafy green plant with long hollow stems and spear- or heart-shaped leaves, widely grown throughout Asia as a leaf vegetable. The leaves and stems are often eaten stir-fried flavored with salt or in soups. Other common names include morning glory vegetable, kangkong (Malay), rau muong (Viet.), ong choi (Cant.), and kong xin cai (Mand.). It is not related to spinach, but is closely related to sweet potato and convolvulus.

Answering harder questions (Dan Jurafsky)
Q: In children with an acute febrile illness, what is the efficacy of single-medication therapy with acetaminophen or ibuprofen in reducing fever?
A: Ibuprofen provided greater temperature decrement and longer duration of antipyresis than acetaminophen when the two drugs were administered in approximately equal doses. (PubMed ID: 1621668, Evidence Strength: A)

Answering harder questions via query-focused summarization (Dan Jurafsky)
• The (bottom-up) snippet method
  • Find a set of relevant documents
  • Extract informative sentences from the documents (using tf-idf, MMR)
  • Order and modify the sentences into an answer
• The (top-down) information extraction method
  • Build specific answerers for different question types: definition questions, biography questions, certain medical questions

The Information Extraction method (Dan Jurafsky)
• A good biography of a person contains: the person's birth/death, fame factor, education, nationality and so on
• A good definition contains: a genus or hypernym, e.g., "The Hajj is a type of ritual"
• A medical answer about a drug's use contains: the problem (the medical condition), the intervention (the drug or procedure), and the outcome (the result of the study)

Information that should be in the answer for 3 kinds of questions (Dan Jurafsky)
[Table of required answer content for biography, definition, and medical questions]
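A bare-bones sketch of the snippet method described above: score sentences by word overlap with the query and apply an MMR-style redundancy penalty when assembling the answer. The scoring is deliberately simplistic (raw overlap counts rather than real tf-idf), and the sentences are sample data, just to make the control flow concrete.

```python
from typing import List, Set

def words(text: str) -> Set[str]:
    return {w.strip(".,?()").lower() for w in text.split()}

def mmr_select(query: str, sentences: List[str], k: int = 2, lam: float = 0.7) -> List[str]:
    # MMR-style selection: relevance to the query minus similarity to
    # sentences already chosen, so the assembled snippet is not redundant.
    qw = words(query)
    chosen: List[str] = []
    remaining = list(sentences)
    while remaining and len(chosen) < k:
        def mmr(s: str) -> float:
            relevance = len(words(s) & qw)
            redundancy = max((len(words(s) & words(c)) for c in chosen), default=0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=mmr)
        chosen.append(best)
        remaining.remove(best)
    return chosen

docs = [
    "The Hajj is the annual pilgrimage to Mecca and a central duty of Islam.",
    "The Hajj is a pilgrimage to Mecca that Muslims must perform once if able.",
    "More than two million Muslims are expected to take the Hajj this year.",
]
print(mmr_select("What is the Hajj?", docs))
```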
Architecture for complex question answering: definition questions (Dan Jurafsky)
S. Blair-Goldensohn, K. McKeown and A. Schlaikjer. 2004. Answering Definition Questions: A Hybrid Approach.
"What is the Hajj?" (Ndocs=20, Len=8)
• Document retrieval: 11 web documents, 1,127 total sentences, e.g.
  • "The Hajj, or pilgrimage to Makkah [Mecca], is the central duty of Islam."
  • "More than two million Muslims are expected to take the Hajj this year."
  • "Muslims must perform the hajj at least once in their lifetime if physically and financially able."
  • "The Hajj is a milestone event in a Muslim's life."
  • "The annual hajj begins in the twelfth month of the Islamic year (which is lunar, not solar, so that hajj and Ramadan fall sometimes in summer, sometimes in winter)."
  • "The Hajj is a week-long pilgrimage that begins in the 12th month of the Islamic lunar calendar."
  • "Another ceremony, which was not connected with the rites of the Ka'ba before the rise of Islam, is the Hajj, the annual pilgrimage to 'Arafat, about two miles east of Mecca, toward Mina…"
• Predicate identification: 9 genus-species sentences, e.g.
  • "The Hajj, or pilgrimage to Makkah (Mecca), is the central duty of Islam."
  • "The Hajj is a milestone event in a Muslim's life."
  • "The hajj is one of five pillars that make up the foundation of Islam."
  plus 383 non-specific definitional sentences
• Definition creation: sentence clusters, importance ordering, data-driven analysis

SiteQ system: Answer Types
• Top-level types (18 of 84 in total): DATE, TIME, PROPERTY, LANGUAGE_UNIT, LANGUAGE, SYMBOLIC_REP, ACTION, ACTIVITY, LIFE_FORM, QUANTITY, NATURAL_OBJECT, LOCATION, SUBSTANCE, ARTIFACT, GROUP, PHENOMENON, STATUS, BODY_PART, …
• Finer-grained types: AGE, DISTANCE, DURATION, LENGTH, MONETARY_VALUE, NUMBER, PERCENTAGE, POWER, RATE, SIZE, SPEED, TEMPERATURE, VOLUME, WEIGHT, …

SiteQ system: Layout of SiteQ/J
• Part-of-speech tagging (POSTAG/J) of the question
• Document retrieval (POSNIR/J) against the index DB
• Chunking and category processing: NP/VP chunking; syntactic/semantic category check against the semantic category dictionary
• Passage selection: divide documents into dynamic passages; score passages
• Identification of the answer type: locate the question word; LSP matching against the LSP grammar for questions; date constraint check
• Detection of answer candidates: LSP matching against the LSP grammar for answers
• Answer filtering, scoring and ranking → answer

SiteQ system: Question Processing
[Module diagram: question → Tagger → NP Chunker → Query Formatter → query; in parallel, Answer Type Determiner (Normalizer with Qnorm dictionary, RE Matcher with LSP rules) → answer type]

SiteQ system: Lexico-Semantic Patterns
• An LSP is a pattern expressed with lexical entries, POS tags, syntactic categories and semantic categories; it is more flexible than a surface pattern
• Used for
  • Identification of the answer type of a question
  • Detection of answer candidates in selected passages

SiteQ system: Answer Type Determination (1)
Who was the 23rd president of the United States?
• Tagger: Who/WP was/VBD the/DT 23rd/JJ president/NN of/IN the/DT United/NP States/NPS ?/SENT
• NP Chunker: Who/WP was/VBD [[ the/DT 23rd/JJ president/NN ] of/IN [ the/DT United/NP States/NPS ]] ?/SENT
• Normalizer: %who%be@person
• RE Matcher: (%who)(%be)(@person) → PERSON

SiteQ system: Dynamic Passage Selection (1)
• Analysis of the answer passages for TREC-10 questions: about 80% of query terms occur within a 50-word (or 3-sentence) window centered on the answer string
• Definition of a passage
  • Prefer a sentence window to a word window
  • Prefer dynamic passages to static passages
  • A passage is defined as any consecutive three sentences

SiteQ system: Answer Extraction
• Query-term detection and query-term filtering (stop-word term list)
• Answer detection core: noun-phrase chunking, pivot creation, answer detection, stemming (Porter's), AAD processing
• Answer filtering (WordNet), answer scoring

SiteQ system: Answer Detection (1)
… The/DT velocity/NN of/IN light/NN is/VBZ 300,000/CD km/NN per/IN second/JJ in/IN the/DT physics/NN textbooks/NNS …
• Matched LSP: cd@unit_length%per@unit_time → answer type SPEED (speed|1|4|4)

SiteQ system: Answer Scoring
• Based on the passage score, the number of unique terms matched in the passage, and the distances from each matched term
• The answer score AScore combines the passage score PScore with the number of unique matched terms in the passage (ptuc, out of qtuc question terms), the LSP grammar weight, and the average distance to the matched terms
  • LSPwgt: the weight of the matched LSP grammar rule
  • avgdist: the average distance between the answer candidate and each matched term
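A toy re-creation of the LSP matching idea used throughout SiteQ (as in the Answer Type Determination example above): normalize the question into a string of lexical/semantic tags and match it against pattern rules. The tag inventory and rules here are invented stand-ins, not SiteQ's actual grammar.

```python
import re

# Invented normalization lexicon: map words to LSP-style tags.
LEXICON = {
    "who": "%who", "was": "%be", "is": "%be", "president": "@person",
    "where": "%where", "city": "@location", "when": "%when",
}

# Invented LSP rules: a regex over the normalized tag string -> answer type.
LSP_RULES = [
    (re.compile(r"(%who)(%be)(@person)"), "PERSON"),
    (re.compile(r"(%where)(%be)?(@location)?"), "LOCATION"),
    (re.compile(r"(%when)"), "DATE"),
]

def normalize(question: str) -> str:
    # Keep only the words the lexicon knows, in order, and concatenate their tags.
    toks = re.findall(r"[a-z]+", question.lower())
    return "".join(LEXICON[t] for t in toks if t in LEXICON)

def answer_type(question: str) -> str:
    norm = normalize(question)
    for rule, atype in LSP_RULES:
        if rule.search(norm):
            return atype
    return "UNKNOWN"

q = "Who was the 23rd president of the United States?"
print(normalize(q))    # %who%be@person
print(answer_type(q))  # PERSON
```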
SiteQ system: Performance in TREC-10
Rank distribution of correct answers (492 questions):
Rank    # of Qs (strict)   # of Qs (lenient)   Avg. of 67 runs
1       121                124                 88.58
2       45                 49                  28.24
3       24                 29                  20.46
4       15                 16                  12.57
5       11                 14                  12.46
None    276                260                 329.7
MRR     0.320              0.335               0.234

MRR by question type (strict / lenient):
• how + adj/adv (31): 0.316 / 0.332
• how do (2): 0.250 / 0.250
• what do (24): 0.050 / 0.050
• what is (242): 0.308 / 0.320
• what/which noun (88): 0.289 / 0.331
• when (26): 0.362 / 0.362
• where (27): 0.515 / 0.515
• who (46): 0.464 / 0.471
• why (4): 0.125 / 0.125
• name a (2): 0.500 / 0.500
• Total (492): 0.320 / 0.335

SiteQ system: Performance in QAC-1
• Returned 994 answer candidates, of which 183 responses were judged correct
• MRR is 0.608, the best performance among participants (average 0.303)

Questions   Gold answers   Output   Correct   Recall   Precision   F-measure   MRR
200         288            994      183       0.635    0.184       0.285       0.608

Rank distribution of correct answers: rank 1: 98, rank 2: 27, rank 3: 14, rank 4: 6, rank 5: 4, total: 149

Contents
• Introduction: QA as high-precision search
• Architecture of QA systems
• Main components for QA
• A QA system: SiteQ
• Issues for advanced QA

Question Interpretation
• Use pragmatic knowledge
  • "Will Prime Minister Mori survive the political crisis?"
  • The analyst does not intend the literal meaning ("Will Prime Minister Mori still be alive when the political crisis is over?") but expresses the belief that the current political crisis might cost the Japanese Prime Minister his job
• Decompose complex questions into series of simpler questions
• Expand the question with contextual information

Semantic Parsing
• Use the explicit representation of semantic roles as encoded in FrameNet
  • FrameNet provides a database of words with associated semantic roles
  • Enables an efficient, computable semantic representation of questions and texts
• Filling in semantic roles
  • Prominent role for named entities
  • Generalization across domains
  • FrameNet coverage and fallback techniques

Information Fusion
• Combining answer fragments into a concise response
• Technologies
  • Learning paraphrases, syntactic and lexical
  • Detecting important differences
  • Merging descriptions
  • Learning and formulating content plans

Answer Generation
• Compare request fills from several answer extractions to derive the answer
• Merge and remove duplicates
• Detect ambiguity
• Reconcile answers (secondary search)
• Identify possible contradictions
• Apply utility measures

And more …
• Real-time QA
• Multilingual QA
• Interactive QA
• Advanced reasoning for QA
• User profiling for QA
• Collaborative QA
• Spoken-language QA
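To make the merge-and-remove-duplicates step of answer generation above concrete, here is a small sketch that groups candidate answers by a normalized surface form and combines their scores. The normalization and the score-combination rule are arbitrary choices for illustration, and the candidates are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def normalize(answer: str) -> str:
    # Arbitrary normalization: lowercase, strip punctuation and articles.
    words = [w.strip(".,\"'") for w in answer.lower().split()]
    return " ".join(w for w in words if w not in {"the", "a", "an"})

def merge_answers(candidates: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Group candidates that normalize to the same string and sum their scores,
    keeping the longest surface form as the representative."""
    groups: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for text, score in candidates:
        groups[normalize(text)].append((text, score))
    merged = []
    for variants in groups.values():
        representative = max(variants, key=lambda v: len(v[0]))[0]
        merged.append((representative, sum(s for _, s in variants)))
    return sorted(merged, key=lambda v: v[1], reverse=True)

# Hypothetical candidates extracted from several passages.
candidates = [
    ("the Jaguar XJ220", 0.6),
    ("Jaguar XJ220", 0.5),
    ("Volkswagen", 0.3),
]
print(merge_answers(candidates))
# [('the Jaguar XJ220', 1.1), ('Volkswagen', 0.3)]
```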