Semantic Retrieval for Question Answering

Student Research Symposium
Language Technologies Institute
AQUAINT Program
Matthew W. Bilotti
mbilotti@cs.cmu.edu
September 23, 2005
Outline
• What is Question Answering?
• What is the cause of wrong answers?
• What is Semantic Retrieval, and can it help?
• What have other teams tried?
• How is JAVELIN using Semantic Retrieval?
• How can we evaluate the impact of Semantic Retrieval on Question Answering systems?
• Where can we go from here?
What is Question Answering?
• A process that finds succinct answers to questions
phrased in natural language
[Diagram: Input Question → Question Answering → Output Answers]
Q: “Where is Carnegie Mellon?”
A: “Pittsburgh, Pennsylvania, USA”
Q: “Who is Jared Cohon?”
A: “... is the current President of Carnegie Mellon University.”
Q: “When was Herbert Simon born?”
A: “15 June 1916”
Google. http://www.google.com
Classic “Pipelined” QA Architecture
[Diagram: Input Question → Question Analysis → Document Retrieval → Answer Extraction → Post-Processing → Output Answers]
• A sequence of discrete modules cascaded
such that the output of the previous
module is the input to the next module.
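To make the cascade concrete, here is a minimal sketch in Python; the module bodies are placeholders invented for illustration, not JAVELIN's actual components.

def question_analysis(question: str) -> dict:
    # Placeholder: a real module finds keywords, alternations, and the answer type.
    return {"keywords": question.rstrip("?").split(), "answer_type": "Location"}

def document_retrieval(analysis: dict) -> list:
    # Placeholder: a real module formulates IR queries and searches an index.
    return ["Andy Warhol was born in Pittsburgh, Pennsylvania."]

def answer_extraction(analysis: dict, documents: list) -> list:
    # Placeholder: a real module extracts candidates of the expected answer type.
    return ["Pittsburgh, Pennsylvania"]

def post_processing(candidates: list) -> list:
    # Placeholder: a real module merges, checks, and ranks the candidates.
    return candidates

def answer(question: str) -> list:
    # The cascade: each module's output is the next module's input.
    analysis = question_analysis(question)
    documents = document_retrieval(analysis)
    candidates = answer_extraction(analysis, documents)
    return post_processing(candidates)

print(answer("Where was Andy Warhol born?"))   # ['Pittsburgh, Pennsylvania']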
Classic “Pipelined” QA Architecture
“Where was Andy Warhol born?”
[Diagram: Input Question → Question Analysis → Document Retrieval → Answer Extraction → Post-Processing → Output Answers]
Classic “Pipelined” QA Architecture
“Where was Andy Warhol born?”
[Diagram: Input Question → Question Analysis → Document Retrieval → Answer Extraction → Post-Processing → Output Answers]
Question Analysis: discover keywords in the question, generate alternations, and determine the answer type.
Keywords: Andy (Andrew), Warhol, born
Answer type: Location (City)
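A rough sketch of what a question analysis module might produce for this example; the stopword list, alternation table, and typing rules below are invented for illustration.

import re

STOPWORDS = {"where", "was", "is", "the", "a", "who", "when", "does"}
ALTERNATIONS = {"andy": ["andrew"]}      # toy alternation table

def analyze(question: str) -> dict:
    # Keep content words as keywords, attach any known alternations,
    # and pick an answer type from the question word.
    tokens = re.findall(r"\w+", question.lower())
    keywords = [t for t in tokens if t not in STOPWORDS]
    alternations = {k: ALTERNATIONS.get(k, []) for k in keywords}
    if question.lower().startswith("where"):
        answer_type = "Location"
    elif question.lower().startswith("who"):
        answer_type = "Person"
    elif question.lower().startswith("when"):
        answer_type = "Date"
    else:
        answer_type = "Unknown"
    return {"keywords": keywords, "alternations": alternations,
            "answer_type": answer_type}

print(analyze("Where was Andy Warhol born?"))
# {'keywords': ['andy', 'warhol', 'born'],
#  'alternations': {'andy': ['andrew'], 'warhol': [], 'born': []},
#  'answer_type': 'Location'}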
Classic “Pipelined” QA Architecture
[Diagram: Input Question → Question Analysis → Document Retrieval → Answer Extraction → Post-Processing → Output Answers]
Document Retrieval: formulate IR queries using the keywords, and retrieve answer-bearing documents.
( Andy OR Andrew ) AND Warhol AND born
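A toy sketch of the query formulation step, assuming the keyword and alternation structures from the previous slide; it reproduces the Boolean query shown above.

def formulate_query(keywords, alternations):
    # Each keyword becomes an AND clause; alternations are OR'd with the keyword.
    clauses = []
    for keyword in keywords:
        variants = [keyword] + alternations.get(keyword, [])
        if len(variants) > 1:
            clauses.append("( " + " OR ".join(variants) + " )")
        else:
            clauses.append(keyword)
    return " AND ".join(clauses)

query = formulate_query(["Andy", "Warhol", "born"], {"Andy": ["Andrew"]})
print(query)   # ( Andy OR Andrew ) AND Warhol AND born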
Classic “Pipelined” QA Architecture
[Diagram: Input Question → Question Analysis → Document Retrieval → Answer Extraction → Post-Processing → Output Answers]
Answer Extraction: extract answers of the expected type from retrieved documents.
“Andy Warhol was born on August 6, 1928 in Pittsburgh and died February 22, 1987 in New York.”
“Andy Warhol was born to Slovak immigrants as Andrew Warhola on August 6, 1928, on 73 Orr Street in Soho, Pittsburgh, Pennsylvania.”
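A toy sketch of type-driven extraction over the two passages above; the tiny location gazetteer is a stand-in for a real named-entity recognizer.

LOCATION_GAZETTEER = [
    "73 Orr Street in Soho, Pittsburgh, Pennsylvania",
    "Pittsburgh, Pennsylvania",
    "Pittsburgh",
    "New York",
]

def extract_answers(documents, answer_type):
    # Scan retrieved documents for mentions whose type matches the expected answer type.
    candidates = []
    if answer_type != "Location":
        return candidates            # only locations handled in this toy example
    for doc in documents:
        for mention in LOCATION_GAZETTEER:
            if mention in doc and mention not in candidates:
                candidates.append(mention)
    return candidates

docs = [
    "Andy Warhol was born on August 6, 1928 in Pittsburgh and died "
    "February 22, 1987 in New York.",
    "Andy Warhol was born to Slovak immigrants as Andrew Warhola on "
    "August 6, 1928, on 73 Orr Street in Soho, Pittsburgh, Pennsylvania.",
]
print(extract_answers(docs, "Location"))
# ['Pittsburgh', 'New York',
#  '73 Orr Street in Soho, Pittsburgh, Pennsylvania', 'Pittsburgh, Pennsylvania']

Post-processing (next slide) is what merges and ranks these raw candidates.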
Classic “Pipelined” QA Architecture
[Diagram: Input Question → Question Analysis → Document Retrieval → Answer Extraction → Post-Processing → Output Answers]
Post-Processing: answer cleanup and merging, consistency or constraint checking, answer selection and presentation.
Candidate answers from the two passages:
1. “Pittsburgh, Pennsylvania”  2. “New York”
1. “73 Orr Street in Soho, Pittsburgh, Pennsylvania”  2. “New York”
Merge the candidates, select the appropriate granularity, and rank, yielding “Pittsburgh, Pennsylvania” as the top answer.
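A toy sketch of the merging and ranking idea, using string containment as a crude stand-in for real answer merging; the tie-breaking rule is an assumption made for illustration.

from collections import Counter

def post_process(candidate_lists):
    # Count raw support for each candidate across documents.
    support = Counter()
    for candidates in candidate_lists:
        for candidate in candidates:
            support[candidate] += 1
    # Merge: credit a candidate with the support of every more specific variant
    # that contains it as a substring (a crude granularity heuristic).
    merged = Counter()
    for candidate in support:
        for other, other_count in support.items():
            if candidate in other:
                merged[candidate] += other_count
    # Rank by merged support, breaking ties in favor of longer (more specific) answers.
    return sorted(merged, key=lambda c: (-merged[c], -len(c)))

lists = [
    ["Pittsburgh, Pennsylvania", "New York"],
    ["73 Orr Street in Soho, Pittsburgh, Pennsylvania", "New York"],
]
print(post_process(lists))
# ['Pittsburgh, Pennsylvania', 'New York',
#  '73 Orr Street in Soho, Pittsburgh, Pennsylvania']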
What is the cause of wrong answers?
[Diagram: pipeline as above, with Document Retrieval marked as the failure point]
• A pipelined QA system is only as good as its
weakest module
• Poor retrieval and/or query formulation can
result in low ranks for answer-bearing
documents, or no answer-bearing documents
retrieved
What is Semantic Retrieval, and
can it help?
• Semantic Retrieval is a broad term for a
document retrieval technique that
makes use of semantic information and
language understanding
• Hypothesis: Use of Semantic Retrieval
can improve performance, retrieving
more, and more highly-ranked, relevant
documents
What have other teams tried?
• LCC/SMU approach
– Use an existing IR system as a black box; rich
query expansion
• CL Research approach
– Process top documents retrieved from an IR
engine, extracting semantic relation triples,
index and retrieve using RDBMS
• IBM (Prager) Predictive Annotation
– Store answer types (QA-Tokens) in the IR
system’s index, and retrieve on them
LCC/SMU Approach
• Syntactic relationships (controlled synonymy), morphological and derivational expansions for Boolean keywords (a toy expansion sketch follows this slide)
• Statistical passage extraction finds windows around keywords
• Semantic constraint check for filtering (unification)
• NE recognition and pattern matching as a third pass for answer
extraction
• Ad hoc relevance scoring: term proximity, occurrence of the answer in an apposition, etc.
[Diagram: Keywords and Alternations + Extended WordNet → Boolean query → IR → Documents → Passage Extraction → Passages → Constraint Checking → Named Entity Extraction → Answer Candidates]
Moldovan, et. al., Performance issues and error analysis in an open-domain QA system, ACM TOIS, vol. 21, no. 2. 2003
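A toy sketch of the kind of WordNet-based morphological and derivational expansion used in this style of system (not LCC/SMU's code); it assumes NLTK and its WordNet data are installed.

from nltk.corpus import wordnet as wn   # requires: pip install nltk; nltk.download('wordnet')

def expand_keyword(keyword: str) -> list:
    """Collect synonyms and derivationally related forms for one keyword."""
    alternations = {keyword}
    for synset in wn.synsets(keyword):
        for lemma in synset.lemmas():
            alternations.add(lemma.name().replace("_", " "))
            for related in lemma.derivationally_related_forms():
                alternations.add(related.name().replace("_", " "))
    return sorted(alternations)

# e.g. expand_keyword("born") may include forms such as "bear" or "birth",
# depending on the WordNet version installed.
print(expand_keyword("born"))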
Litkowski/CL Research Approach
• Relation triples: discourse entity (NP) + semantic role or
relation + governing word; essentially similar to our
predicates
• Unranked XPath querying against RDBMS
[Diagram: 10-20 top PRISE documents → sentences → semantic relationship triples (with entity mention canonicalization) → RDBMS, queried via XML/XPath. Example: “The quick brown fox jumped over the lazy dog.” yields the discourse entities “quick brown fox” and “lazy dog”, each linked to the governing word “jumped”.]
Litkowski, K.C. Question Answering Using XML-Tagged Documents. TREC 2003
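A rough sketch (not CL Research's implementation) of relation triples stored as XML and queried with XPath; the element and attribute names here are invented.

import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<sentence text="The quick brown fox jumped over the lazy dog.">
  <triple entity="quick brown fox" role="ARG0" governor="jumped"/>
  <triple entity="lazy dog" role="ARG1" governor="jumped"/>
</sentence>
""")

# "What jumped over something?" -> entities governed by "jumped" in the ARG0 role.
for triple in doc.findall(".//triple[@governor='jumped']"):
    if triple.get("role") == "ARG0":
        print(triple.get("entity"))      # quick brown fox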
Predictive Annotation
• Textract identifies candidate answers at indexing time
• QA-Tokens are indexed as text items along with actual
doc tokens
• Passage retrieval, with a simple bag-of-words combo-match (heuristic) ranking formula
[Diagram: Corpus documents → Textract (IE/NLP), guided by an answer type taxonomy → IR index containing QA-Tokens. Example: “Gasoline cost $0.78 per gallon in 1999.” is indexed as “Gasoline cost $0.78 MONEY$ per gallon VOLUME$ in 1999 YEAR$.”]
Prager, et. al. Question-answering by predictive annotation. SIGIR 2000
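A toy sketch of predictive annotation (not IBM's Textract): answer types are detected at indexing time and QA-Tokens are appended next to the matching document tokens, so queries can match on the type itself. The patterns are illustrative only.

import re

QA_TOKEN_PATTERNS = [
    (re.compile(r"\$\d+(?:\.\d+)?"), "MONEY$"),
    (re.compile(r"\bgallons?\b"), "VOLUME$"),
    (re.compile(r"\b(?:1[89]|20)\d{2}\b"), "YEAR$"),
]

def annotate_for_indexing(text: str) -> str:
    """Append the matching QA-Token after each recognized span."""
    for pattern, token in QA_TOKEN_PATTERNS:
        text = pattern.sub(lambda m: f"{m.group(0)} {token}", text)
    return text

print(annotate_for_indexing("Gasoline cost $0.78 per gallon in 1999."))
# Gasoline cost $0.78 MONEY$ per gallon VOLUME$ in 1999 YEAR$.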
How is JAVELIN using Semantic
Retrieval?
• Annotate corpus with semantic content
(e.g. predicates), and index this content
• At runtime, perform similar analysis on
input questions to get predicate templates
• Maximal recall of documents that contain
matching predicate instances
• Constraint checking at the answer
extraction stage to filter out false positives
and rank best matches
Nyberg, et. al. “Extending the JAVELIN QA System with Domain Semantics”, AAAI 2005.
Annotating and Indexing the
Corpus
[Diagram: Text → Annotation Framework → predicate-argument structure, e.g. loves(ARG0: John, ARG1: Mary) → Indexer → actual index content stored in the IR engine and the RDBMS]
Nyberg, et. al. “Extending the JAVELIN QA System with Domain Semantics”, AAAI 2005.
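A toy sketch of how predicate-argument annotations might be turned into index content (not the JAVELIN indexer); the posting-list layout is an assumption made for illustration.

from collections import defaultdict

class PredicateIndex:
    def __init__(self):
        self.postings = defaultdict(set)   # (predicate, role, argument) -> doc ids

    def add(self, doc_id, predicate, arguments):
        """arguments maps role labels (e.g. 'ARG0') to argument strings."""
        for role, argument in arguments.items():
            self.postings[(predicate, role, argument.lower())].add(doc_id)

    def lookup(self, predicate, role, argument):
        return self.postings.get((predicate, role, argument.lower()), set())

index = PredicateIndex()
index.add("d1", "loves", {"ARG0": "John", "ARG1": "Mary"})
print(index.lookup("loves", "ARG0", "John"))    # {'d1'}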
Retrieval on Predicate-Argument
Structure
“Who does John love?”
[Diagram: Input Question → Question Analysis → Document Retrieval → Answer Extraction → Post-Processing → Output Answers]
Nyberg, et. al. “Extending the JAVELIN QA System with Domain Semantics”, AAAI 2005.
Retrieval on Predicate-Argument
Structure
“Who does John love?”
[Diagram: pipeline as above; Question Analysis produces the predicate-argument template loves(ARG0: John, ARG1: ?x)]
Nyberg, et. al. “Extending the JAVELIN QA System with Domain Semantics”, AAAI 2005.
Retrieval on Predicate-Argument
Structure
“Who does John love?”
[Diagram: pipeline as above, at the Document Retrieval stage]
What the IR engine sees: the template loves(ARG0: John, ARG1: ?x)
Some retrieved documents: “Frank loves Alice. John dislikes Bob.” and “John loves Mary.”
Nyberg, et. al. “Extending the JAVELIN QA System with Domain Semantics”, AAAI 2005.
Retrieval on Predicate-Argument
Structure
“Who does John love?”
[Diagram: pipeline as above, at the Answer Extraction stage; predicate instances from the RDBMS are checked against the template.
“Frank loves Alice. John dislikes Bob.” yields loves(ARG0: Frank, ARG1: Alice) and dislikes(ARG0: John, ARG1: Bob); neither matches, so the document is filtered out.
“John loves Mary.” yields the matching predicate instance loves(ARG0: John, ARG1: Mary), and the answer “Mary” is extracted.]
Nyberg, et. al. “Extending the JAVELIN QA System with Domain Semantics”, AAAI 2005.
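A toy sketch of matching the predicate-argument template against extracted instances, with ?x as the variable to be bound; this is illustrative only, not JAVELIN's matcher.

def match_template(template, instance):
    """Return {variable: value} if the instance satisfies the template, else None."""
    if template["predicate"] != instance["predicate"]:
        return None
    bindings = {}
    for role, value in template["args"].items():
        filler = instance["args"].get(role)
        if filler is None:
            return None
        if value.startswith("?"):
            bindings[value] = filler          # bind the variable to the filler
        elif value.lower() != filler.lower():
            return None                       # constraint check failed
    return bindings

template = {"predicate": "loves", "args": {"ARG0": "John", "ARG1": "?x"}}
instances = [
    {"predicate": "loves",    "args": {"ARG0": "Frank", "ARG1": "Alice"}},
    {"predicate": "dislikes", "args": {"ARG0": "John",  "ARG1": "Bob"}},
    {"predicate": "loves",    "args": {"ARG0": "John",  "ARG1": "Mary"}},
]
for instance in instances:
    print(instance["predicate"], instance["args"], match_template(template, instance))
# Only loves(ARG0: John, ARG1: Mary) matches, binding ?x -> "Mary".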
How can we evaluate the impact of
Semantic Retrieval on QA systems?
• Performance can be indirectly evaluated by measuring the performance of the end-to-end QA system while varying the document retrieval strategy employed, in one of two ways:
– NIST-style comparative evaluation
– Absolute evaluation against new test sets
• Direct analysis of document retrieval performance
– Requires an assumption such as, “maximal recall of relevant documents translates to best end-to-end system performance”
NIST-style Comparative Evaluation
• Answer keys developed by pooling
– All answers gathered by all systems are checked by a human to
develop the answer key
– Voorhees showed that the comparative orderings between
systems are stable regardless of exhaustiveness of judgments
– Answer keys from TREC evaluations are never suitable for post-hoc evaluation (nor were they intended to be), since they may
penalize a new strategy for discovering good answers not in the
original pool
• Manual scoring
– Judging system output involves semantics (Voorhees 2003)
– Abstract away from differences in vocabulary or syntax, and
robustly handle paraphrase
• This is the same methodology used in the Definition QA
evaluation in TREC 2003 and 2004
Absolute Evaluation
• Requires building new test collections
– Not dependent on pooled results from systems, so suitable for
post-hoc experimentation
– Human effort is required; a methodology is described in (Katz
and Lin 2005), (Bilotti, Katz and Lin 2004) and (Bilotti 2004)
• Automatic scoring methods based on n-grams (a toy sketch follows this slide), or fuzzy unification on predicate-argument structure (Lin and Demner-Fushman 2005; Van Durme et al. 2003), can be applied
• Can evaluate at the level of documents or passages
retrieved, predicates matched, or answers extracted,
depending on the level of detail in the test set.
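A toy sketch of n-gram-based automatic scoring (see Lin and Demner-Fushman 2005 for real evaluation methods); the recall formula below is an illustrative simplification, not a published metric.

import re

def ngrams(text, n):
    # All n-grams of the lowercased word tokens.
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def ngram_recall(system_answer, key_answer, n=2):
    # Fraction of the answer key's unigrams and bigrams found in the system answer.
    matched = total = 0
    for size in range(1, n + 1):
        key = ngrams(key_answer, size)
        total += len(key)
        matched += len(key & ngrams(system_answer, size))
    return matched / total if total else 0.0

print(ngram_recall("born in Pittsburgh, Pennsylvania",
                   "Pittsburgh, Pennsylvania, USA"))   # 0.6 (3 of 5 key n-grams matched)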
Preliminary Results:
TREC 2005 Relationship QA Track
• 25 scenario-type questions; the first time
such questions have occurred officially in
the TREC QA track
• Semi-automatic runs were allowed:
JAVELIN submitted a second run using
manual question analysis
• Results (in MRR of relevant nuggets):
– Run 1: 0.1356
– Run 2: 0.5303
• Example on the next slide!
Example: Question Analysis
The analyst is interested in Iraqi oil smuggling. Specifically, is Iraq
smuggling oil to other countries, and if so, which countries? In addition,
who is behind the Iraqi oil smuggling?
Predicate-argument templates extracted from the question:
interested(ARG0: The analyst, ARG1: Iraqi oil smuggling)
smuggling(ARG0: Iraq, ARG1: oil, ARG2: which countries)
smuggling(ARG0: Iraq, ARG1: oil, ARG2: other countries)
is behind(ARG0: Who, ARG1: the Iraqi oil smuggling)
Example: Results
The analyst is interested in Iraqi oil smuggling. Specifically, is Iraq
smuggling oil to other countries, and if so, which countries? In addition,
who is behind the Iraqi oil smuggling?
(7 of 15 relevant)
1. “The amount of oil smuggled out of Iraq has doubled since August last year, when oil prices began to increase,” Gradeck said in a telephone interview Wednesday from Bahrain.
2. U.S.: Russian Tanker Had Iraqi Oil By ROBERT BURNS, AP Military Writer WASHINGTON (AP) – Tests of oil samples taken from a Russian tanker suspected of violating the U.N. embargo on Iraq show that it was loaded with petroleum products derived from both Iranian and Iraqi crude, two senior defense officials said.
5. With no American or allied effort to impede the traffic, between 50,000 and 60,000 barrels of Iraqi oil and fuel products a day are now being smuggled along the Turkish route, Clinton administration officials estimate.
Where do we go from here?
• What to index and how to represent it
– Moving to Indri [1] allows exact representation of our predicate structure in the index
• Building a Scenario QA test collection
• Query formulation and relaxation
– Learning or planning strategies
• Ranking retrieved predicate instances
– Aggregating information across documents
• Inference and evidence combination
• Extracting answers from predicate-argument
structure
1. http://www.lemurproject.org
References
• Bilotti. Query Expansion Techniques for Question Answering. Master's Thesis, MIT. 2004.
• Bilotti, et al. What Works Better for Question Answering: Stemming or Morphological Query Expansion? IR4QA Workshop at SIGIR 2004.
• Lin and Demner-Fushman. Automatically Evaluating Answers to Definition Questions. HLT/EMNLP 2005.
• Litkowski, K.C. Question Answering Using XML-Tagged Documents. TREC 2003.
• Metzler and Croft. Combining the Language Model and Inference Network Approaches to Retrieval. Information Processing and Management Special Issue on Bayesian Networks and Information Retrieval, 40(5), 735-750, 2004.
• Metzler, et al. Indri at TREC 2004: Terabyte Track. TREC 2004.
• Moldovan, et al. Performance Issues and Error Analysis in an Open-Domain Question Answering System. ACM TOIS, vol. 21, no. 2. 2003.
• Nyberg, et al. Extending the JAVELIN QA System with Domain Semantics. Proceedings of the 20th National Conference on Artificial Intelligence (AAAI 2005).
• Pradhan, S., et al. Shallow Semantic Parsing Using Support Vector Machines. HLT/NAACL 2004.
• Prager, et al. Question-Answering by Predictive Annotation. SIGIR 2000.
• Van Durme, B., et al. Towards Light Semantic Processing for Question Answering. HLT/NAACL 2003.
• Voorhees, E. Overview of the TREC 2003 Question Answering Track. TREC 2003.