AnswerBus Question Answering System
Zhiping Zheng
School of Information, University of Michigan
HLT 2002
ABSTRACT
• AnswerBus is an open-domain question answering
system based on sentence-level Web information
retrieval.
• It accepts users’ natural-language questions in
English, German, French, Spanish, Italian and
Portuguese and provides answers in English.
• Five search engines and directories are used to
retrieve relevant Web pages.
• 70.5% of TREC-8’s 200 questions are answered
correctly within the top five responses.
Introduction
• Researchers have experimented with QA
systems based on
– closed, pre-tagged corpora
– knowledge bases
– Text REtrieval Conference (TREC) tasks
• Recent open-domain QA systems on the WWW:
– LCC, QuASM, IONAUT, START and Webclopedia
AnswerBus
• Questions:
– in natural language
– in English, German, French, Spanish, Italian
and Portuguese
• Answers:
– from the Web
– via Google, Yahoo, WiseNut, AltaVista, and
Yahoo News
Working Process of AnswerBus
• A simple language-recognition module first determines whether the
question is in English.
• If not, AltaVista’s translation tool BabelFish is used to translate it into
English.
1. select two or three of the five search engines for information retrieval
2. contact the search engines and retrieve the documents referred to at the
top of the hit lists
3. extract sentences that potentially contain answers from the documents
4. rank the answers and return the top-ranked sentences, with contextual
URL links, to the user (a sketch of this pipeline follows below)
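A minimal, runnable sketch of this four-step pipeline. Every helper here is a hypothetical stub standing in for an AnswerBus module (none of these names come from the paper); only the control flow mirrors the steps on this slide.

```python
def detect_language(question):
    return "en"                                  # stub: assume English input

def translate_to_english(question):
    return question                              # stub for AltaVista's BabelFish

def select_engines(question):
    return ["Google", "Yahoo"]                   # stub: pick 2-3 of the 5 engines

def retrieve_documents(engines, question):
    return ["The Nile is about 6,650 km long."]  # stub: top-of-hit-list pages

def extract_candidates(documents, question):
    return documents                             # stub: sentence splitting + filtering

def rank(candidates, question):
    return candidates                            # stub: scoring described later

def answer(question, top_n=5):
    # Step 0: detect the question language; translate if not English.
    if detect_language(question) != "en":
        question = translate_to_english(question)
    engines = select_engines(question)                    # step 1
    documents = retrieve_documents(engines, question)     # step 2
    candidates = extract_candidates(documents, question)  # step 3
    return rank(candidates, question)[:top_n]             # step 4

print(answer("How long is the Nile?"))
```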
Search Engine Selection
• Different search engines or directories may suit
different types of questions.
– for current events, Yahoo News may be a better choice
than Google
• Determination
– pre-answer 2000 questions
– record words in each question together with correct
answers returned by each search engine
– given a new query “word1 word2”:
word1: (Google, 7 answers), (AltaVista, 4 answers)
word2: (Google, 8 answers), (AltaVista, 6 answers)
– Google (7 + 8 = 15) outscores AltaVista (4 + 6 = 10),
so Google is chosen this time.
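A small sketch of this selection heuristic; the word_stats table is hypothetical (in AnswerBus it is built offline by pre-answering the 2000 questions and recording per-word answer counts for each engine).

```python
from collections import defaultdict

# Hypothetical per-word statistics: how many correct answers each
# engine returned for questions containing this word.
word_stats = {
    "word1": {"Google": 7, "AltaVista": 4},
    "word2": {"Google": 8, "AltaVista": 6},
}

def select_engines(query_words, stats, k=2):
    """Score each engine by summing its per-word answer counts and
    return the k best engines for this query."""
    scores = defaultdict(int)
    for word in query_words:
        for engine, count in stats.get(word, {}).items():
            scores[engine] += count
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(select_engines(["word1", "word2"], word_stats))
# ['Google', 'AltaVista'] -- Google sums to 15, AltaVista to 10
```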
Relevant Document Retrieval
• AnswerBus aims to retrieve enough relevant
documents from search engines within an
acceptable response time.
• The main tasks are to select one or more
appropriate search engines for a specific user
question and then to form the queries (a toy
sketch follows below):
– functional-word deletion (of, in, …)
– frequently-used-word deletion
– special-word deletion (give me, name one, …)
– word-form modification (Who did … end? → ended)
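A toy sketch of these query-formation steps; the deletion lists and the single verb rule are illustrative assumptions, not AnswerBus’s actual resources.

```python
FUNCTIONAL_WORDS = {"of", "in", "the", "a", "an", "is", "are", "did"}
SPECIAL_PHRASES = ["give me", "name one"]

def form_query(question):
    q = question.lower().rstrip("?")
    for phrase in SPECIAL_PHRASES:                 # special-word deletion
        q = q.replace(phrase, " ")
    # functional-word / frequently-used-word deletion
    words = [w for w in q.split() if w not in FUNCTIONAL_WORDS]
    # Word-form modification ("Who did ... end?" -> "ended"); a real
    # system would conjugate verbs properly, one toy rule is shown.
    words = ["ended" if w == "end" else w for w in words]
    return words

print(form_query("Who did end the war?"))   # -> ['who', 'ended', 'war']
```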
Candidate Answer Extraction
• AnswerBus first parses the documents into
sentences and then determines whether each
sentence is an answer candidate.
• Two classes of words in a candidate sentence:
– matching words: words that also appear in the query
– non-matching words: words that do not appear in the query
• Filtering
– sentences not matching the following formula are
filtered out.
Filtering Formula
q ≥ ⌊(Q − 1)/2⌋ + 1
• q is the number of matching words in the sentence
• Q is the total number of words in the query
• i.e., a sentence must match at least half of the
query words, rounded up
– Ex: if a query is three words long, only sentences
that match two or more of its words are kept for
answer ranking.
• Sentences that contain no non-matching words
are also dropped (they likely just restate the question).
• Sentences ending with ‘?’ are also dropped.
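A compact sketch of the whole sentence filter, assuming simple whitespace tokenization (the slide does not specify the tokenizer).

```python
def passes_filter(sentence, query_words):
    if sentence.rstrip().endswith("?"):   # drop question sentences
        return False
    words = set(sentence.lower().rstrip(".?!").split())
    query = {w.lower() for w in query_words}
    q = len(words & query)                # matching words in the sentence
    Q = len(query)                        # words in the query
    if q < (Q - 1) // 2 + 1:              # fewer than half, rounded up
        return False
    if not (words - query):               # no non-matching words:
        return False                      # likely restates the question
    return True

print(passes_filter("The Nile is 6,650 km long.", ["nile", "long", "river"]))  # True
```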
Answer Ranking
• Other factors beyond the primary score:
– the determination of question type and use of a
QA specific dictionary
– named entities extraction
– coreference resolution
• The final score is a combination of the
primary score and the influence of all the
different factors.
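The slide does not give the combination formula; a simple additive combination is assumed here purely for illustration, with hypothetical per-factor adjustments.

```python
def final_score(primary, adjustments):
    # 'adjustments' holds the influence of each factor, e.g. question-type
    # match, named entities found, coreference score sharing (all assumed
    # to be precomputed per candidate sentence).
    return primary + sum(adjustments.values())

print(final_score(2.0, {"qtype": 0.5, "named_entities": 1.0, "coref": 0.3}))  # 3.8
```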
Question Type and QA-Specific Dictionary
• “How far …?” and “How close …?”
– Qtype: DISTANCE
• In the QA-specific dictionary:
– “How close” units: mile, kilometer, light year,
inch, centimeter, …
– “How far” units: all of the above except short
units such as inch and centimeter
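A hypothetical fragment of such a dictionary, mapping question patterns to a question type and to the answer units the extractor should accept; the entries mirror this slide, the code structure is an assumption.

```python
QTYPE_PATTERNS = {
    "how close": "DISTANCE",
    "how far": "DISTANCE",
}

# Units per pattern; "how far" excludes short units such as inch/centimeter.
UNITS = {
    "how close": {"mile", "kilometer", "light year", "inch", "centimeter"},
    "how far": {"mile", "kilometer", "light year"},
}

def classify(question):
    q = question.lower()
    for pattern, qtype in QTYPE_PATTERNS.items():
        if q.startswith(pattern):
            return qtype, UNITS[pattern]
    return "UNKNOWN", set()

print(classify("How far is the Moon?"))  # ('DISTANCE', {...long-distance units...})
```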
Dynamic Named Entities Extraction
• A typical NE-tagging technique processes about
100 MB of text per hour.
– For one question, 50 HTML documents totaling
about 1 MB would need 36 seconds.
• AnswerBus instead performs dynamic named-entity
extraction: it extracts only the named entities
that match the question type.
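A sketch of the idea: only the entity pattern relevant to the question type is applied, skipping full NE tagging. The regular expressions are illustrative assumptions, not AnswerBus’s actual patterns.

```python
import re

TYPE_PATTERNS = {
    "DISTANCE": re.compile(r"\b\d[\d,.]*\s*(?:miles?|kilometers?|light[ -]years?)\b", re.I),
    "DATE": re.compile(r"\b(?:1[0-9]{3}|20[0-9]{2})\b"),
}

def extract_entities(sentence, qtype):
    """Run only the pattern for this question type."""
    pattern = TYPE_PATTERNS.get(qtype)
    return pattern.findall(sentence) if pattern else []

print(extract_entities("The Moon is 384,400 kilometers away.", "DISTANCE"))
# ['384,400 kilometers']
```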
Coreference Resolution
• AnswerBus resolves coreference only between
adjacent sentences.
– “he”, “they”, …
• When such a coreference is detected, the later
sentence receives part of the score of the
preceding sentence.
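A minimal sketch of this score sharing; coreference detection is simplified here to a leading pronoun, and the share factor is an assumption (the slide only says “part of the score”).

```python
PRONOUNS = {"he", "she", "it", "they"}

def propagate_scores(sentences, scores, share=0.5):
    adjusted = list(scores)
    for i in range(1, len(sentences)):
        tokens = sentences[i].split()
        if tokens and tokens[0].lower() in PRONOUNS:
            # The later sentence inherits part of its predecessor's score.
            adjusted[i] += share * scores[i - 1]
    return adjusted

print(propagate_scores(["Lincoln was born in 1809.", "He became president in 1861."],
                       [2.0, 1.0]))   # -> [2.0, 2.0]
```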
Hit Position and
Search Engine Confidence
• A sentence extracted from the first hit receives
the highest score; the score decreases with hit
position.
• Documents returned by different search
engines may also receive different scores.
• Redundant sentences from different search
engines are removed.
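Neither the decay schedule nor the per-engine weights are given on this slide; a geometric decay and a fixed engine-confidence weight are assumed here purely for illustration.

```python
def hit_score(rank, engine_confidence, decay=0.9):
    # rank 0 = first hit; scores fall geometrically with position and are
    # weighted by how much we trust the engine that returned the page.
    return engine_confidence * (decay ** rank)

print(hit_score(0, 1.0))   # 1.0   -- first hit from a fully trusted engine
print(hit_score(4, 0.8))   # ~0.52 -- fifth hit from a less trusted engine
```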
Evaluation
• Questions: 200 TREC-8 questions
• Systems compared (via the Internet): START,
LCC, IONAUT, and QuASM
• Answers were judged manually.
• In the following table, T refers to time and
L to the length of answers.
The Performance of Online Question Answering Systems
Systems     Correct  Correct  NIST    Tmax    Tmin   Tmean   Tstd    Lmean
            TOP 5    TOP 1    Score   (s)     (s)    (s)     dev     (byte)
AnswerBus   141      120      64.18%  15.06   3.79   7.2     3.07    141
IONAUT      –        –        –       44.88   2.78   12.51   6.81    1312
LCC         97       75       41.73%  342.52  4.3    44.24   32.63   178
QuASM       13       7        4.45%   284.29  2.61   20.72   33.92   1766
START       29       29       14.50%  62.07   2.02   9.84    7.45    –
Future Work
• Answer generation
– An ideal QA system should be able to extract
the exact answer or summarize the potential
answers.
• QA specific indexing
– instead of general search engines
• New question set
– TREC questions are not designed for Web-based QA systems.