Open Domain Question Answering: Techniques, Resources and Systems
Bernardo Magnini
ITC-irst, Trento, Italy
magnini@itc.it
Outline of the Tutorial
I. Introduction to QA
II. QA at TREC
III. System Architecture
- Question Processing
- Answer Extraction
IV. Answer Validation on the Web
V. Cross-Language QA
Previous Lectures/Tutorials on QA

- Dan Moldovan and Sanda Harabagiu: Question Answering, IJCAI 2001.
- Maarten de Rijke and Bonnie Webber: Knowledge-Intensive Question Answering, ESSLLI 2003.
- Jimmy Lin and Boris Katz: Web-based Question Answering, EACL 2004.
I. Introduction to Question Answering
- What is Question Answering
- Applications
- Users
- Question Types
- Answer Types
- Evaluation
- Presentation
- Brief history
Query Driven vs. Answer Driven Information Access
- What does LASER stand for?
- When did Hitler attack the Soviet Union?
Using Google we find documents containing the question itself, whether or not the answer is actually provided.
- Current information access is query driven.
- Question Answering proposes an answer driven approach to information access.
Question Answering

Find the answer to a question in a large collection of documents:
- questions (in place of a keyword-based query)
- answers (in place of documents)
Why Question Answering?
From the Caledonian Star in the Mediterranean – September 23, 1990 (www.expeditions.com) [document collection]:
On a beautiful early morning the Caledonian Star
approaches Naxos, situated on the east coast of Sicily.
As we anchored and put the Zodiacs into the sea we
enjoyed the great scenery. Under Mount Etna, the
highest volcano in Europe, perches the fabulous town
of Taormina. This is the goal for our morning.
After a short Zodiac ride we embarked our buses with
local guides and went up into the hills to reach the
town of Taormina.
Naxos was the first Greek settlement at Sicily. Soon a
harbor was established but the town was later
destroyed by invaders.[...]
Questions: Where is Naxos? What is the highest volcano in Europe? What continent is Taormina in?
Searching for: Etna, Naxos, Taormina
Alternatives to Information
Retrieval

- Document Retrieval
  - users submit queries corresponding to their information need
  - the system returns a (voluminous) list of full-length documents
  - it is the users' responsibility to locate the information they need within the returned documents
- Open-Domain Question Answering (QA)
  - users ask fact-based, natural language questions
    What is the highest volcano in Europe?
  - the system returns a list of short answers
    … Under Mount Etna, the highest volcano in Europe, perches the fabulous town …
  - more appropriate for specific information needs
What is QA?

Find the answer to a question in a large collection of documents.
What is the brightest star visible from Earth?
1. Sirio A is the brightest star visible from Earth even if it is…
2. the planet is 12-times brighter than Sirio, the brightest star in the sky…
QA: a Complex Problem (1)

Problem: discovering implicit relations between questions and answers.
Who is the author of the “Star Spangled Banner”?
…Francis Scott Key wrote the “Star Spangled Banner” in 1814.
…comedian-actress Roseanne Barr sang her famous rendition of the “Star Spangled Banner” before …
QA: a Complex Problem (2)

Problem: discovering implicit relations between questions and answers.
What is Mozart's birth date?
…. Mozart (1756 – 1791) ….
QA: a complex problem (3)

Problem: discovering implicit relations between questions and answers.
What is the distance between Naples and Ravello?
“From the Naples Airport follow the sign to Autostrade (green
road sign). Follow the directions to Salerno (A3). Drive for about
6 Km. Pay toll (Euros 1.20). Drive appx. 25 Km. Leave the
Autostrade at Angri (Uscita Angri). Turn left, follow the sign to
Ravello through Angri. Drive for about 2 Km. Turn right
following the road sign "Costiera Amalfitana". Within 100m you
come to traffic lights prior to narrow bridge. Watch not to miss
the next Ravello sign, at appx. 1 Km from the traffic lights. Now
relax and enjoy the views (follow this road for 22 Km). Once in
Ravello ...”.
QA: Applications (1)

- Information access:
  - Structured data (databases)
  - Semi-structured data (e.g. comment fields in databases, XML)
  - Free text
- To search over:
  - The Web
  - A fixed set of text collections (e.g. TREC)
  - A single text (reading comprehension evaluation)
QA: Applications (2)

- Domain independent QA
- Domain specific (e.g. help systems)
- Multi-modal QA
  - Annotated images
  - Speech data
QA: Users

- Casual users, first-time users
  - Understand the limitations of the system
  - Interpretation of the answer returned
- Expert users
  - Difference between novel and already provided information
  - User model
QA: Questions (1)

- Classification according to the answer type:
  - Factual questions (What is the largest city …)
  - Opinions (What is the author's attitude …)
  - Summaries (What are the arguments for and against …)
- Classification according to the question speech act:
  - Yes/No questions (Is it true that …)
  - WH questions (Who was the first president …)
  - Indirect requests (I would like you to list …)
  - Commands (Name all the presidents …)
QA: Questions (2)

- Difficult questions
  - Why, How questions require understanding causality or instrumental relations
  - What questions have little constraint on the answer type (e.g. What did they do?)
QA: Answers

- Long answers, with justification
- Short answers (e.g. phrases)
- Exact answers (named entities)
- Answer construction:
  - Extraction: cut and paste of snippets from the original document(s)
  - Generation: from multiple sentences or documents
  - QA and summarization (e.g. What is this story about?)
QA: Information Presentation

- Interfaces for QA
  - Not just isolated questions, but a dialogue
  - Usability and user satisfaction
- Critical situations
  - Real time, single answer
- Dialog-based interaction
  - Speech input
  - Conversational access to the Web
QA: Brief History (1)

- NLP interfaces to databases:
  - BASEBALL (1961), LUNAR (1973), TEAM (1979), ALFRESCO (1992)
  - Limitations: structured knowledge and limited domain
- Story comprehension: Schank (1977), Kintsch (1998), Hirschman (1999)
QA: Brief History (2)

- Information Retrieval (IR):
  - Queries are questions
  - Lists of documents are answers
  - QA is close to passage retrieval
  - Well established methodologies (i.e. the Text Retrieval Conferences, TREC)
- Information Extraction (IE):
  - Pre-defined templates are questions
  - Filled templates are answers
Research Context (1)
Question Answering can be domain specific or domain-independent; it can target structured data or free text; and free text can come from the Web, a fixed set of collections, or a single document.
Growing interest in QA (TREC, CLEF, NTCIR evaluation campaigns). Recent focus on multilinguality and context-aware QA.
Research Context (2)
Two dimensions: compactness and faithfulness.
- Automatic Summarization: as compact as possible
- Machine Translation: as faithful as possible
- Automatic Question Answering: answers must be faithful w.r.t. questions (correctness) and compact (exactness)
II. Question Answering at TREC




- The problem simplified
- Questions and answers
- Evaluation metrics
- Approaches
The problem simplified:
The Text Retrieval Conference

- Goal: encourage research in information retrieval based on large-scale collections
- Sponsors:
  - NIST: National Institute of Standards and Technology
  - ARDA: Advanced Research and Development Activity
  - DARPA: Defense Advanced Research Projects Agency
- QA track since 1999
- Participants are research institutes, universities, and industry
TREC Questions
Fact-based, short answer questions:
Q-1391: How many feet in a mile?
Q-1057: Where is the volcano Mauna Loa?
Q-1071: When was the first stamp issued?
Q-1079: Who is the Prime Minister of Canada?
Q-1268: Name a food high in zinc.

Definition questions:
Q-896: Who was Galileo?
Q-897: What is an atom?

Reformulation questions:
Q-711: What tourist attractions are there in Reims?
Q-712: What do most tourists visit in Reims?
Q-713: What attracts tourists in Reims?
Q-714: What are tourist attractions in Reims?
Answer Assessment

Criteria for judging an answer:
- Relevance: it should be responsive to the question
- Correctness: it should be factually correct
- Conciseness: it should not contain extraneous or irrelevant information
- Completeness: it should be complete, i.e. a partial answer should not get full credit
- Simplicity: it should be simple, so that the questioner can read it easily
- Justification: it should be supplied with sufficient context to allow a reader to determine why it was chosen as an answer to the question
Questions at TREC
- Yes/No: Is Berlin the capital of Germany?
- Entity, single answer: What is the largest city in Germany?
- Entity, multiple answer: Name 9 countries that import Cuban sugar.
- Definition: Who was Galileo?
- Opinion/Procedure/Explanation: What are the arguments for and against prayer in school?
Exact Answers


Basic unit of a response: [answer-string, docid] pair
An answer string must contain a complete, exact answer and nothing else.
What is the longest river in the United States?
The following are correct, exact answers
Mississippi,
the Mississippi,
the Mississippi River,
Mississippi River
mississippi
while none of the following are correct exact answers
At 2,348 miles the Mississippi River is the longest river in the US.
2,348 miles; Mississippi
Missipp
Missouri
Assessments

Four possible judgments for a triple [question, document, answer]:
- Right: the answer is appropriate for the question
- Inexact: used for incomplete answers
- Unsupported: answers without justification
- Wrong: the answer is not appropriate for the question
What is the capital city of New Zealand?                   R  1530  XIE19990325.0298  Wellington
What is the Boston Strangler's name?                       R  1490  NYT20000913.0267  Albert DeSalvo
What is the world's second largest island?                 R  1503  XIE19991018.0249  New Guinea
What year did Wilt Chamberlain score 100 points?           U  1402  NYT19981017.0283  1962
Who is the governor of Tennessee?                          R  1426  NYT19981030.0149  Sundquist
What's the name of King Arthur's sword?                    U  1506  NYT19980618.0245  Excalibur
When did Einstein die?                                     R  1601  NYT19990315.0374  April 18, 1955
What was the name of the plane that dropped the Atomic Bomb on Hiroshima?   X  1848  NYT19991001.0143  Enola
What was the name of FDR's dog?                            R  1838  NYT20000412.0164  Fala
What day did Neil Armstrong land on the moon?              R  1674  APW19990717.0042  July 20, 1969
Who was the first Triple Crown Winner?                     X  1716  NYT19980605.0423  Barton
When was Lyndon B. Johnson born?                           R  1473  APW19990826.0055  1908
Who was Woodrow Wilson's First Lady?                       R  1622  NYT19980903.0086  Ellen
Where is Anne Frank's diary?                               W  1510  NYT19980909.0338  Young Girl

R=Right, X=ineXact, U=Unsupported, W=Wrong
1402: What year did Wilt Chamberlain score 100 points?
DIOGENE: 1962
ASSESSMENT: UNSUPPORTED
PARAGRAPH: NYT19981017.0283
Petty's 200 victories, 172 of which came during a 13-year
span between 1962-75, may be as unapproachable as Joe DiMaggio's
56-game hitting streak or Wilt Chamberlain's 100-point game.
1506: What's the name of King Arthur's sword?
ANSWER: Excalibur
PARAGRAPH: NYT19980618.0245
ASSESSMENT: UNSUPPORTED
`QUEST FOR CAMELOT,' with the voices of Andrea Carr, Gabriel Byrne,
Cary Elwes, John Gielgud, Jessalyn Gilsig, Eric Idle, Gary Oldman, Bronson
Pinchot, Don Rickles and Bryan White. Directed by Frederik Du Chau (G, 100
minutes). Warner Brothers' shaky entrance into the Disney-dominated
sweepstakes of the musicalized animated feature wants to be a juvenile feminist
``Lion King'' with a musical heart that fuses ``Riverdance'' with formulaic
Hollywood gush. But its characters are too wishy-washy and visually unfocused
to be compelling, and the songs (by David Foster and Carole Bayer Sager) so
forgettable as to be extraneous. In this variation on the Arthurian legend, a
nondescript Celtic farm girl named Kayley with aspirations to be a knight wrests
the magic sword Excalibur from the evil would-be emperor Ruber (a Hulk
Hogan look-alike) and saves the kingdom (Holden).
1848: What was the name of the plane that dropped the
Atomic Bomb on Hiroshima?
DIOGENE: Enola
PARAGRAPH: NYT19991001.0143
ASSESSMENT: INEXACT
Tibbets piloted the Boeing B-29 Superfortress Enola Gay,
which dropped the atomic bomb on Hiroshima on Aug. 6, 1945,
causing an estimated 66,000 to 240,000 deaths. He named the plane
after his mother, Enola Gay Tibbets.
1716: Who was the first Triple Crown Winner?
DIOGENE: Barton
PARAGRAPH: NYT19980605.0423
ASSESSMENT: INEXACT
Not all of the Triple Crown winners were immortals.
The first, Sir Barton, lost six races in 1918 before his
first victory, just as Real Quiet lost six in a row last year.
Try to find Omaha and Whirlaway on anybody's list of
all-time greats.
1510: Where is Anne Frank's diary?
DIOGENE: Young Girl
PARAGRAPH: NYT19980909.0338
ASSESSMENT: WRONG
Otto Frank released a heavily edited version of “B” for its first
publication as “Anne Frank: Diary of a Young Girl” in 1947.
TREC Evaluation Metric:
Mean Reciprocal Rank (MRR)

- Reciprocal Rank (RR) = inverse of the rank at which the first correct answer was found: [1, 0.5, 0.33, 0.25, 0.2, 0]
- MRR: average of RR over all questions
- Strict score: unsupported answers count as incorrect
- Lenient score: unsupported answers count as correct
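To make the metric concrete, here is a minimal sketch (my own illustration, not code from the tutorial) of MRR with strict and lenient scoring computed from per-question ranked judgments:

```python
def mean_reciprocal_rank(ranked_judgments, strict=True):
    """ranked_judgments: one list per question with the TREC labels of its
    ranked answers ('R'ight, 'U'nsupported, 'X' inexact, 'W'rong)."""
    correct = {"R"} if strict else {"R", "U"}  # lenient: Unsupported counts as correct
    total = 0.0
    for judgments in ranked_judgments:
        for rank, label in enumerate(judgments, start=1):
            if label in correct:
                total += 1.0 / rank  # reciprocal rank of the first correct answer
                break                # answers below the first hit are ignored
    return total / len(ranked_judgments)

# Two toy questions with five ranked answers each
runs = [["W", "R", "W", "W", "W"],   # RR = 0.5
        ["U", "W", "R", "W", "W"]]   # strict RR = 0.33, lenient RR = 1.0
print(mean_reciprocal_rank(runs, strict=True))   # 0.416...
print(mean_reciprocal_rank(runs, strict=False))  # 0.75
```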
TREC Evaluation Metrics:
Confidence-Weighted Score (CWS)
CWS = (1/500) · Σ_{i=1..500} (number of correct answers in the first i questions) / i
(questions are ordered by the system's confidence in its answers)

System A: 1 C, 2 W, 3 C, 4 C, 5 W
CWS_A = [ 1/1 + (1+0)/2 + (1+0+1)/3 + (1+0+1+1)/4 + (1+0+1+1+0)/5 ] / 5 = 0.70

System B: 1 W, 2 W, 3 C, 4 C, 5 C
CWS_B = [ 0/1 + (0+0)/2 + (0+0+1)/3 + (0+0+1+1)/4 + (0+0+1+1+1)/5 ] / 5 = 0.29
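As a small sketch (mine, not from the slides), the CWS computation for the two example systems looks like this; answers are assumed to be sorted from most to least confident:

```python
def confidence_weighted_score(is_correct):
    """is_correct: booleans for each question, ordered by the system's
    confidence in its own answers (most confident first)."""
    correct_so_far, total = 0, 0.0
    for i, ok in enumerate(is_correct, start=1):
        correct_so_far += ok
        total += correct_so_far / i   # precision over the first i questions
    return total / len(is_correct)

print(confidence_weighted_score([True, False, True, True, False]))   # System A: ~0.70
print(confidence_weighted_score([False, False, True, True, True]))   # System B: ~0.29
```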
Evaluation

- Best result: 67%
- Average over 67 runs: 23%

TREC-8: best 66%, average 25%
TREC-9: best 58%, average 24%
TREC-10: best 67%, average 23%
Main Approaches at TREC

- Knowledge-Based
- Web-based
- Pattern-based
Knowledge-Based Approach

- Linguistic-oriented methodology
  - Determine the answer type from question form
  - Retrieve small portions of documents
  - Find entities matching the answer type category in text snippets
- Majority of systems use a lexicon (usually WordNet)
  - To find answer type
  - To verify that a candidate answer is of the correct type
  - To get definitions
- Complex architecture...
Web-Based Approach
QUESTION → Question Processing Component → Search Component (queries the Web) → Answer Extraction Component (uses the TREC corpus and an auxiliary corpus) → ANSWER
Pattern-Based Approach (1/3)
- Knowledge poor
- Strategy:
  - Search for predefined patterns of textual expressions that may be interpreted as answers to certain question types.
  - The presence of such patterns in candidate answer strings may provide evidence of the right answer.
Pattern-Based Approach (2/3)
- Conditions:
  - Detailed categorization of question types
    - Up to 9 types of “Who” questions; 35 categories in total
  - Significant number of patterns corresponding to each question type
    - Up to 23 patterns for the “Who-Author” type, 15 on average
  - Find multiple candidate snippets and check for the presence of patterns (emphasis on recall)
Pattern-Based Approach (3/3)
Example: patterns for definition questions. Question: What is A?
1. <A; is/are; [a/an/the]; X>                 … 23 correct answers
2. <A; comma; [a/an/the]; X; [comma/period]>  … 26 correct answers
3. <A; [comma]; or; X; [comma]>               … 12 correct answers
4. <A; dash; X; [dash]>                       … 9 correct answers
5. <A; parenthesis; X; parenthesis>           … 8 correct answers
6. <A; comma; [also] called; X [comma]>       … 7 correct answers
7. <A; is called; X>                          … 3 correct answers
Total: 88 correct answers
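As an illustration only (the regular expressions below are my rough approximations of patterns 1, 2 and 5 above, not the original system's rules), such surface patterns can be applied with simple regular expressions:

```python
import re

def definition_candidates(term, text):
    """Return candidate definitions X for 'What is <term>?' using a few of the
    surface patterns listed above (approximated as regular expressions)."""
    patterns = [
        rf"\b{re.escape(term)}\s+(?:is|are)\s+(?:a|an|the)\s+([^.,;]+)",  # <A; is/are; a/an/the; X>
        rf"\b{re.escape(term)}\s*,\s*(?:a|an|the)\s+([^.,;]+)[,.]",       # <A; comma; a/an/the; X; comma/period>
        rf"\b{re.escape(term)}\s*\(([^)]+)\)",                            # <A; parenthesis; X; parenthesis>
    ]
    hits = []
    for pattern in patterns:
        hits += [m.group(1).strip() for m in re.finditer(pattern, text, re.IGNORECASE)]
    return hits

print(definition_candidates("atom", "An atom is the smallest unit of ordinary matter."))
# ['smallest unit of ordinary matter']
```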
Use of answer patterns
1. For generating queries to the search engine:
   How did Mahatma Gandhi die?
   → Mahatma Gandhi die <HOW>
   → Mahatma Gandhi die of <HOW>
   → Mahatma Gandhi lost his life in <WHAT>
   The TEXTMAP system (ISI) uses 550 patterns, grouped in 105 equivalence blocks. On TREC-2003 questions, the system produced, on average, 5 reformulations for each question.
2. For answer extraction:
   When was Mozart born?
   P=1    <PERSON> (<BIRTHDATE> - DATE)
   P=.69  <PERSON> was born on <BIRTHDATE>
Acquisition of Answer Patterns
Relevant approaches:
- Manually developed surface pattern library (Soubbotin and Soubbotin, 2001)
- Automatically extracted surface patterns (Ravichandran and Hovy, 2002)

Pattern learning:
1. Start with a seed, e.g. (Mozart, 1756)
2. Download Web documents using a search engine
3. Retain sentences that contain both the question and the answer terms
4. Construct a suffix tree for extracting the longest matching substring that spans <Question> and <Answer>
5. Calculate the precision of the patterns:
   Precision = # of patterns with the correct answer / # of total patterns
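A heavily simplified sketch of steps 1-5 (my own toy code: the text between the two anchors stands in for the suffix-tree step, and the "Web documents" are just a hard-coded list):

```python
import re
from collections import Counter

SEED_NAME, SEED_ANSWER = "Mozart", "1756"   # seed pair from step 1

def learn_patterns(sentences):
    """Keep the short context between the question term and the answer term
    of each sentence as a candidate surface pattern."""
    patterns = Counter()
    for s in sentences:
        m = re.search(rf"{SEED_NAME}(.{{1,20}}?){SEED_ANSWER}", s)
        if m:
            patterns["<NAME>" + m.group(1) + "<ANSWER>"] += 1
    return patterns

def precision(pattern, qa_pairs, corpus):
    """# of times the pattern extracts the correct answer / # of times it fires."""
    infix = re.escape(pattern.replace("<NAME>", "").replace("<ANSWER>", ""))
    fired = correct = 0
    for name, answer in qa_pairs:
        for s in corpus:
            m = re.search(rf"{re.escape(name)}{infix}(\w+)", s)
            if m:
                fired += 1
                correct += (m.group(1) == answer)
    return correct / fired if fired else 0.0

docs = ["Wolfgang Amadeus Mozart (1756 - 1791) was a prolific composer.",
        "Mozart was born in 1756 in Salzburg."]
print(learn_patterns(docs))
# Counter({'<NAME> (<ANSWER>': 1, '<NAME> was born in <ANSWER>': 1})
print(precision("<NAME> was born in <ANSWER>", [("Mozart", "1756")], docs))  # 1.0
```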
Capturing variability with
patterns

- Pattern-based QA is more effective when supported by variable typing obtained using NLP techniques and resources.
  When was <A> born?
  <A:PERSON> (<ANSWER:DATE>        <A:PERSON> was born in <ANSWER:DATE>
- Surface patterns cannot deal with word reordering and appositive phrases:
  Galileo, the famous astronomer, was born in …
- The fact that most QA systems use syntactic parsing demonstrates that a successful solution to the answer extraction problem goes beyond surface-form analysis.
Syntactic answer patterns (1)
Answer patterns that capture the syntactic relations of a sentence.
When was <A> invented?
[Parse tree: S → NP (“The <A>”) VP (“was invented” PP (“in” <ANSWER>))]
Syntactic answer patterns (2)
The matching phase turns out to be a problem of partial matching between syntactic trees.
[Parse tree: S → NP (“The first phonograph”) VP (“was invented” PP (“in 1877”))]
III. System Architecture

- Knowledge-based approach
  - Question Processing
  - Search Component
  - Answer Extraction
Knowledge based QA
QUESTION →
Question Processing Component: tokenization & POS tagging, multiword recognition, question parsing, word sense disambiguation, answer type identification, keyword expansion →
Search Component: query composition, search engine over the document collection →
Answer Extraction Component: paragraph filtering, named entity recognition, answer identification, answer validation →
ANSWER
Question Analysis (1)


- Input: NL question
- Output:
  - a query for the search engine (i.e. a boolean composition of weighted keywords)
  - the answer type
  - additional constraints: the question focus, syntactic or semantic relations that should hold between a candidate answer entity and other entities
Question Analysis (2)

Steps:
1. Tokenization
2. POS tagging
3. Multi-word recognition
4. Parsing
5. Answer type and focus identification
6. Keyword extraction
7. Word sense disambiguation
8. Expansions
Tokenization and POS-tagging
NL-QUESTION: Who was the inventor of the electric light?

token     lemma     POS   span
Who       Who       CCHI  [0,0]
was       be        VIY   [1,1]
the       det       RS    [2,2]
inventor  inventor  SS    [3,3]
of        of        ES    [4,4]
the       det       RS    [5,5]
electric  electric  AS    [6,6]
light     light     SS    [7,7]
?         ?         XPS   [8,8]
Multi-Words recognition
NL-QUESTION: Who was the inventor of the electric light?

token           lemma           POS   span
Who             Who             CCHI  [0,0]
was             be              VIY   [1,1]
the             det             RS    [2,2]
inventor        inventor        SS    [3,3]
of              of              ES    [4,4]
the             det             RS    [5,5]
electric_light  electric_light  SS    [6,7]
?               ?               XPS   [8,8]
Syntactic Parsing
Identify the syntactic structure of a sentence: noun phrases (NP), verb phrases (VP), prepositional phrases (PP), etc.
Why did David Koresh ask the FBI for a word processor?
[Parse tree: SBARQ → WHADVP (“Why”) SQ (“did” NP (“David Koresh”) VP (“ask” NP (“the FBI”) PP (“for” NP (“a word processor”))))]
Answer Type and Focus


- Focus is the word that expresses the relevant entity in the question
  - Used to select a set of relevant documents
  - e.g. Where was Mozart born?
- Answer Type is the category of the entity to be searched for as the answer
  - PERSON, MEASURE, TIME PERIOD, DATE, ORGANIZATION, DEFINITION
  - e.g. Where was Mozart born? → LOCATION
Answer Type and Focus
What famous communist leader died in Mexico City?
RULENAME: WHAT-WHO
TEST: [“what” [¬ NOUN]* [NOUN:person-p]J +]
OUTPUT: [“PERSON” J]
Answer type: PERSON
Focus: leader
This rule matches any question starting with “what” whose first noun, if any, is a person (i.e. satisfies the person-p predicate).
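As an illustration (mine, with a toy person lexicon standing in for the person-p predicate and a toy POS-tagged question), the WHAT-WHO rule above could be realised like this:

```python
PERSON_NOUNS = {"leader", "president", "author", "inventor"}  # toy person-p lexicon

def what_who_rule(tagged_question):
    """tagged_question: list of (word, POS) pairs.
    Implements: ["what" [not-NOUN]* [NOUN:person-p]] -> answer type PERSON, focus = that noun."""
    words = [w.lower() for w, _ in tagged_question]
    if words and words[0] == "what":
        for word, pos in tagged_question[1:]:
            if pos == "NOUN":                       # first noun after "what"
                if word.lower() in PERSON_NOUNS:    # person-p predicate holds
                    return "PERSON", word           # (answer type, focus)
                break                               # first noun is not a person: rule fails
    return "UNKNOWN", None

question = [("What", "WH"), ("famous", "ADJ"), ("communist", "ADJ"),
            ("leader", "NOUN"), ("died", "VERB"), ("in", "ADP"),
            ("Mexico", "PROPN"), ("City", "PROPN"), ("?", "PUNCT")]
print(what_who_rule(question))   # ('PERSON', 'leader')
```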
Keywords Extraction
NL-QUESTION: Who was the inventor of the electric light?

token           lemma           POS   span
Who             Who             CCHI  [0,0]
was             be              VIY   [1,1]
the             det             RS    [2,2]
inventor        inventor        SS    [3,3]
of              of              ES    [4,4]
the             det             RS    [5,5]
electric_light  electric_light  SS    [6,7]
?               ?               XPS   [8,8]

Keywords: inventor, electric_light
Word Sense Disambiguation
What is the brightest star visible from Earth?

STAR:    star#1: celestial body (ASTRONOMY);  star#2: an actor who plays … (ART)
BRIGHT:  bright#1: bright, brilliant, shining (PHYSICS);  bright#2: popular, glorious (GENERIC);  bright#3: promising, auspicious (GENERIC)
VISIBLE: visible#1: conspicuous, obvious (PHYSICS);  visible#2: visible, seeable (ASTRONOMY)
EARTH:   earth#1: Earth, world, globe (ASTRONOMY);  earth#2: estate, land, landed_estate, acres (ECONOMY);  earth#3: clay (GEOLOGY);  earth#4: dry_land, earth, solid_ground (GEOGRAPHY);  earth#5: land, ground, soil (GEOGRAPHY);  earth#6: earth, ground (GEOLOGY)
Expansions
NL-QUESTION: Who was the inventor of the electric light?
BASIC-KEYWORDS: inventor, electric_light

inventor
  synonyms: discoverer, artificer
  derivation: invention → synonyms: innovation
  derivation: invent → synonyms: excogitate
electric_light
  synonyms: incandescent_lamp, light_bulb
Keyword Composition


- Keywords and expansions are composed in a boolean expression with AND/OR operators
- Several possibilities:
  - AND composition
  - Cartesian composition
(OR (inventor AND electric_light)
 OR (inventor AND incandescent_lamp)
 OR (discoverer AND electric_light)
 …
 OR (inventor OR electric_light))
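A minimal sketch of the Cartesian composition (my own illustration; the keyword lists are the expansions from the previous slide):

```python
from itertools import product

def cartesian_query(expansion_sets):
    """expansion_sets: one list of alternatives per basic keyword.
    Every combination of one alternative per keyword is AND-ed,
    and the conjunctions are OR-ed together."""
    conjunctions = ["(" + " AND ".join(combo) + ")" for combo in product(*expansion_sets)]
    return "(" + " OR ".join(conjunctions) + ")"

print(cartesian_query([["inventor", "discoverer"],
                       ["electric_light", "incandescent_lamp"]]))
# ((inventor AND electric_light) OR (inventor AND incandescent_lamp) OR
#  (discoverer AND electric_light) OR (discoverer AND incandescent_lamp))
```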
Document Collection Pre-processing

- For real-time QA applications, off-line pre-processing of the text is necessary:
  - Term indexing
  - POS tagging
  - Named Entity Recognition
Candidate Answer Document
Selection


- Passage Selection: identify relevant, small text portions
- Given a document and a list of keywords:
  - Fix a paragraph length (e.g. 200 words)
  - Consider the percentage of keywords present in the passage
  - Consider whether some keyword is obligatory (e.g. the focus of the question)
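A rough sketch of this passage-selection step (mine; the window size, step and scoring are arbitrary choices for illustration):

```python
def select_passages(doc_tokens, keywords, obligatory=None, size=200, step=100):
    """Slide a fixed-size window over the document, discard windows that miss
    the obligatory keyword (e.g. the question focus), and score the rest by
    the fraction of query keywords they contain."""
    keywords = {k.lower() for k in keywords}
    scored = []
    for start in range(0, max(1, len(doc_tokens) - size + 1), step):
        window = {t.lower() for t in doc_tokens[start:start + size]}
        if obligatory and obligatory.lower() not in window:
            continue
        scored.append((len(keywords & window) / len(keywords), start))
    return sorted(scored, reverse=True)

doc = ("Under Mount Etna , the highest volcano in Europe , "
       "perches the fabulous town of Taormina").split()
print(select_passages(doc, ["highest", "volcano", "Europe"], obligatory="volcano"))
# [(1.0, 0)]
```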
Candidate Answer Document Analysis


- Passage text tagging
- Named Entity Recognition
  Who is the author of the “Star Spangled Banner”?
  …<PERSON>Francis Scott Key</PERSON> wrote the “Star Spangled Banner” in <DATE>1814</DATE>
- Some systems:
  - passage parsing (Harabagiu, 2001)
  - logical forms (Zajac, 2001)
Answer Extraction (1)

Who is the author of the “Star Spangled Banner”?
…<PERSON>Francis Scott Key</PERSON> wrote the “Star Spangled Banner” in <DATE>1814</DATE>
Answer Type = PERSON
Candidate Answer = Francis Scott Key
Ranking candidate answers: keyword density in the passage, additional constraints (e.g. syntax, semantics), ranking candidates using the Web
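A small sketch of this extraction-and-ranking step over an NE-tagged passage (my own code; the tag format follows the example above and the density score is a simple keyword count around the entity):

```python
import re

def extract_candidates(tagged_passage, answer_type, keywords, window=60):
    """Collect entities whose tag matches the expected answer type and rank
    them by the number of question keywords occurring near the entity."""
    candidates = []
    for m in re.finditer(rf"<{answer_type}>(.*?)</{answer_type}>", tagged_passage):
        context = tagged_passage[max(0, m.start() - window): m.end() + window].lower()
        density = sum(context.count(k.lower()) for k in keywords)
        candidates.append((density, m.group(1).strip()))
    return sorted(candidates, reverse=True)

passage = ('<PERSON>Francis Scott Key </PERSON> wrote the "Star Spangled Banner" '
           'in <DATE>1814</DATE>')
print(extract_candidates(passage, "PERSON", ["Star", "Spangled", "Banner"]))
# [(3, 'Francis Scott Key')]
```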
Answer Identification
[Example: the candidate answer identified in the passage is Thomas A. Edison]
IV. Answer Validation


- Automatic answer validation
- Approach:
  - Web-based
  - use of patterns
  - combine statistics and linguistic information
- Discussion
- Conclusions
QA Architecture
QUESTION →
Question Processing Component: tokenization & POS tagging, question parsing, word sense disambiguation, answer type identification, keyword expansion →
Search Component: query composition, search engine over the document collection →
Answer Extraction Component: paragraph filtering, named entity recognition, answer identification, answer ranking →
ANSWER
The problem: Answer Validation
Given a question q and a candidate answer a,
decide if a is a correct answer for q
What is the capital of the USA?
Washington D.C.
San Francisco
Rome
The problem: Answer Validation
Given a question q and a candidate answer a,
decide if a is a correct answer for q
What is the capital of the USA?
Washington D.C.  → correct
San Francisco    → wrong
Rome             → wrong
Requirements for Automatic AV



- Accuracy: it has to compare well with respect to human judgments
- Efficiency: large scale (Web), real time scenarios
- Simplicity: avoid the complexity of QA systems
Approach

- Web-based
  - take advantage of Web redundancy
- Pattern-based
  - the Web is mined using patterns (i.e. validation patterns) extracted from the question and the candidate answer
- Quantitative (as opposed to content-based)
  - check whether the question and the answer tend to appear together on the Web, considering only the number of documents returned (i.e. documents are not downloaded)
Web Redundancy
What is the capital of the USA? → Washington
- Capital Region USA: Fly-Drive Holidays in and Around Washington D.C.
- the Insider’s Guide to the Capital Area Music Scene (Washington D.C., USA).
- The Capital Tangueros (Washington DC Area, USA)
- I live in the Nations’s Capital, Washington Metropolitan Area (USA)
- In 1790 Capital (also USA’s capital): Washington D.C. Area: 179 square km
Validation Pattern
(the same Web snippets as above)
[Capital NEAR USA NEAR Washington]
Related Work

- Pattern-based QA
  - Brill, 2001 – TREC-10
  - Soubbotin, 2001 – TREC-10
  - Ravichandran and Hovy, ACL-02
- Use of the Web for QA
  - Clarke et al., 2001 – TREC-10
  - Radev et al., 2001 – CIKM
- Statistical approaches on the Web
  - PMI-IR: Turney, 2001 and ACL-02
Architecture
question + candidate answer → validation pattern → filtering (#doc < k) → answer validity score → correct answer if the score is above a threshold t, wrong answer otherwise
Extracting Validation Patterns
question → answer type identification, stop-word filter, term expansion → question pattern (Qp)
candidate answer → named entity recognition, stop-word filter → answer pattern (Ap)
Qp + Ap → validation pattern
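A toy version of this step (mine: a tiny stop-word list, and capitalised tokens standing in for named entity recognition; term expansion is omitted):

```python
STOPWORDS = {"what", "is", "the", "of", "in", "a", "an", "who", "was"}

def question_pattern(question):
    """Content words of the question (stop-word filter)."""
    tokens = question.lower().replace("?", " ").split()
    return [t for t in tokens if t not in STOPWORDS]

def answer_pattern(candidate_answer):
    """Very crude NER: keep the capitalised tokens of the candidate answer."""
    return [t for t in candidate_answer.split() if t[0].isupper()]

qp = question_pattern("What is the capital of the USA?")   # ['capital', 'usa']
ap = answer_pattern("Washington D.C.")                     # ['Washington', 'D.C.']
validation_pattern = qp + ap   # e.g. submitted as [capital NEAR usa NEAR Washington NEAR D.C.]
print(validation_pattern)
```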
Answer Validity Score

- PMI-IR algorithm (Turney, 2001):
  PMI(Qp, Ap) = P(Qp, Ap) / (P(Qp) * P(Ap))
- The result is interpreted as evidence that the validation pattern is consistent, which implies that the answer is correct.
Answer Validity Score
PMI(Qp, Ap) = hits(Qp NEAR Ap) / (hits(Qp) * hits(Ap))
Three searches are submitted to the Web:
- hits(Qp)
- hits(Ap)
- hits(Qp NEAR Ap)
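A minimal sketch of the score (mine; the hit counts below are the figures from the Modesto example on the next slides, standing in for the three Web searches):

```python
def pmi_score(hits_qp, hits_ap, hits_near, total_pages=3e8):
    """PMI(Qp, Ap) estimated from hit counts, with total_pages normalising
    the counts into rough probabilities."""
    p_qp, p_ap, p_joint = hits_qp / total_pages, hits_ap / total_pages, hits_near / total_pages
    return p_joint / (p_qp * p_ap) if p_qp and p_ap else 0.0

# Qp = [county NEAR Modesto NEAR California], hits(Qp) = 909
print(pmi_score(909, 73641, 552))    # ~2474 for Ap = [Stanislaus] (the slide reports 2473)
print(pmi_score(909, 4072519, 11))   # ~0.89 for Ap = [San Francisco]
```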
Example
What county is Modesto, California in?
A1 = The Stanislaus County district attorney’s
A2 = In Modesto, San Francisco, and
Stop-word filter; Answer type: LOCATION
Qp = [county NEAR Modesto NEAR California]
P(Qp) = P(county, Modesto, California) = 909 / (3 * 10^8)
Example (cont.)
A1 = The Stanislaus County district attorney’s  → NER(location) → A1p = [Stanislaus]
A2 = In Modesto, San Francisco, and             → NER(location) → A2p = [San Francisco]
P(Stanislaus) = 73641 / (3 * 10^8)
P(San Francisco) = 4072519 / (3 * 10^8)
Example (cont.)
P(Qp, A1p) = 552 / (3 * 10^8)
P(Qp, A2p) = 11 / (3 * 10^8)
PMI(Qp, A1p) = 2473
PMI(Qp, A2p) = 0.89
t = 0.2 * MAX(AVS); score > t → correct answer, score < t → wrong answer
Experiments


- Data set:
  - 492 TREC-2001 questions
  - 2726 answers: 3 correct and 3 wrong answers for each question, randomly selected from the TREC-10 participants' human-judged corpus
- Search engine: AltaVista
  - allows the NEAR operator
Experiment: Answers
Q-916: What river in the US is known as the Big Muddy ?
 The Mississippi
 Known as Big Muddy, the Mississippi is the longest
 as Big Muddy, the Mississippi is the longest
 messed with. Known as Big Muddy, the Mississip
 Mississippi is the longest river in the US
 the Mississippi is the longest river(Mississippi)
 has brought the Mississippi to its lowest
 ipes.In Life on the Mississippi,Mark Twain wrote t
 Southeast;Mississippi;Mark Twain; officials began
 Known; Mississippi; US; Minnesota; Gulf Mexico
 Mud Island,;Mississippi;”The;--history,;Memphis
Baseline


- Consider the documents provided by NIST to the TREC-10 participants (1000 documents for each question)
- If the candidate answer occurs (i.e. string match) at least once in the top 10 documents, it is judged correct; otherwise it is considered wrong
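A sketch of this baseline judgment (my own one-liner; documents are assumed to be plain strings already ranked by the retrieval engine):

```python
def baseline_judgment(candidate_answer, ranked_docs, k=10):
    """Correct iff the answer string occurs in at least one of the top-k documents."""
    return any(candidate_answer.lower() in doc.lower() for doc in ranked_docs[:k])

docs = ["Known as Big Muddy, the Mississippi is the longest river in the US.",
        "The Missouri river flows into the Mississippi."]
print(baseline_judgment("Mississippi", docs))   # True
```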
Asymmetrical Measures


- Problem: some candidate answers (e.g. numbers) produce an enormous number of Web documents
- Scores for good (Ac) and bad (Aw) answers tend to be similar, making the choice more difficult:
  PMI(q, ac) =~ PMI(q, aw)
  How many Great Lakes are there?
  … to cross all five Great Lakes completed a 19.2 …
Asymmetric Conditional Probability (ACP)
ACP(Qsp, Asp) = P(Qsp | Asp) / (P(Qsp) * P(Asp)^(2/3)) = hits(Qsp NEAR Asp) / (hits(Qsp) * hits(Asp)^(2/3))
Comparing PMI and ACP
PMI(Great Lakes, five) ≈ 0.036;  PMI(Great Lakes, 19.2) ≈ 0.02    → ratio ≈ 1.8
ACP(Great Lakes, five) ≈ 0.015;  ACP(Great Lakes, 19.2) ≈ 0.0029  → ratio ≈ 5.17
ACP increases the difference between the right and the wrong answer.
Results
Success rate (SR) on all 492 TREC-2001 questions:
                      Baseline   MLHR   PMI    ACP
Absolute threshold    52.9       77.4   77.7   78.4
Relative threshold    52.9       79.6   79.5   81.2

SR on the 249 factoid questions:
                      Baseline   MLHR   PMI    ACP
Absolute threshold    82.1       83.3   83.3   84.9
Relative threshold    84.4       -      -      86.3
Discussion (1)

- Definition questions are the most problematic
  - on the subset of 249 named-entity questions the success rate is higher (i.e. 86.3)
- The relative threshold improves performance (+2%) over the fixed threshold
- Non-symmetric measures of co-occurrence work better for answer validation (+2%)
- Sources of errors:
  - answer type recognition
  - named-entity recognition
  - the TREC answer set (e.g. tokenization)
Discussion (2)


- Automatic answer validation is a key challenge for Web-based question answering systems
- Requirements:
  - accuracy with respect to human judgments: an 80% success rate is a good starting point
  - efficiency: documents are not downloaded
  - simplicity: based on patterns
- At present, it is suitable for a generate-and-test component integrated in a QA system
V. Cross-Language QA




- Motivations
- QA@CLEF
- Performance
- Approaches
Motivations




- Answers may be found in languages different from the language of the question.
- Interest in QA systems for languages other than English.
- Force the QA community to design truly multilingual systems.
- Check/improve the portability of the technologies implemented in current English QA systems.
Cross-Language QA
Quanto è alto il Mont Ventoux? (How tall is Mont Ventoux?)
Searched over the Italian, French, English and Spanish corpora.
“Le Mont Ventoux, impérial avec ses 1909 mètres et sa tour blanche telle un étendard, règne de toutes …” (French corpus)
→ 1909 metri
CL-QA at CLEF




- Adopt the same rules used at TREC QA
  - Factoid questions (i.e. no definition questions)
  - Exact answers + document id
- Use the CLEF corpora (news, 1994-1995)
- Return the answer in the language of the text collection in which it has been found (i.e. no translation of the answer)
- QA-CLEF-2003 was an initial step toward a more complex task organized at CLEF-2004 and 2005.
QA @ CLEF 2004
(http://clef-qa.itc.it/2004)
Seven groups coordinated the QA track:
- ITC-irst (IT and EN test set preparation)
- DFKI (DE)
- ELDA/ELRA (FR)
- Linguateca (PT)
- UNED (ES)
- U. Amsterdam (NL)
- U. Limerick (EN assessment)
Two more groups participated in the test set construction:
- Bulgarian Academy of Sciences (BG)
- U. Helsinki (FI)
CLEF QA - Overview
Process: question generation over the document collections (2.5 person/months per group) → 100 monolingual Q&A pairs with EN translation per group → 700 Q&A pairs in one language + EN translation → EN questions translated into 7 languages → Multieight-04: XML collection of 700 Q&A pairs in 8 languages → selection of an additional 80 + 20 questions → extraction of plain-text test sets (IT, FR, NL, ES, …) → Exercise (10-23/5): experiments (1-week window) → systems’ answers → manual assessment → evaluation (2 person/days for 1 run)
CLEF QA – Task Definition
Given 200 questions in a source language, find one exact answer per
question in a collection of documents written in a target language, and
provide a justification for each retrieved answer (i.e. the docid of the unique
document that supports the answer).
Source languages (S): BG, DE, EN, ES, FI, FR, IT, NL, PT.
Target languages (T): DE, EN, ES, FR, IT, NL, PT.
6 monolingual and 50 bilingual tasks. Teams participated in 19 tasks.
CLEF QA - Questions
All the test sets were made up of 200 questions:
- ~90% factoid questions
- ~10% definition questions
- ~10% of the questions did not have any answer in the corpora (the right answer-string was “NIL”)

Problems in introducing definition questions:
- What is the right answer? (it depends on the user’s model)
- What is the easiest and most efficient way to assess their answers?
- Overlap with factoid questions:
  F: Who is the Pope? → John Paul II
  D: Who is John Paul II? → the Pope; the head of the Roman Catholic Church
CLEF QA – Multieight
<q cnt="0675" category="F" answer_type="MANNER">
<language val="BG" original="FALSE">
<question group="BTB">Как умира Пазолини?</question>
<answer n="1" docid="">TRANSLATION[убит]</answer>
</language>
<language val="DE" original="FALSE">
<question group="DFKI">Auf welche Art starb Pasolini?</question>
<answer n="1" docid="">TRANSLATION[ermordet]</answer>
<answer n="2" docid="SDA.951005.0154">ermordet</answer>
</language>
<language val="EN" original="FALSE">
<question group="LING">How did Pasolini die?</question>
<answer n="1" docid="">TRANSLATION[murdered]</answer>
<answer n="2" docid="LA112794-0003">murdered</answer>
</language>
<language val="ES" original="FALSE">
<question group="UNED">¿Cómo murió Pasolini?</question>
<answer n="1" docid="">TRANSLATION[Asesinado]</answer>
<answer n="2" docid="EFE19950724-14869">Brutalmente asesinado en los arrabales de Ostia</answer>
</language>
<language val="FR" original="FALSE">
<question group="ELDA">Comment est mort Pasolini ?</question>
<answer n="1" docid="">TRANSLATION[assassiné]</answer>
<answer n="2" docid="ATS.951101.0082">assassiné</answer>
<answer n="3" docid="ATS.950904.0066">assassiné en novembre 1975 dans des circonstances mystérieuses</answer>
<answer n="4" docid="ATS.951031.0099">assassiné il y a 20 ans</answer>
</language>
<language val="IT" original="FALSE">
<question group="IRST">Come è morto Pasolini?</question>
<answer n="1" docid="">TRANSLATION[assassinato]</answer>
<answer n="2" docid="AGZ.951102.0145">massacrato e abbandonato sulla spiaggia di Ostia</answer>
</language>
<language val="NL" original="FALSE">
<question group="UoA">Hoe stierf Pasolini?</question>
<answer n="1" docid="">TRANSLATION[vermoord]</answer>
<answer n="2" docid="NH19951102-0080">vermoord</answer>
</language>
<language val="PT" original="TRUE">
<question group="LING">Como morreu Pasolini?</question>
<answer n="1" docid="LING-951120-088">assassinado</answer>
</language>
</q>
CLEF QA - Assessment
Judgments taken from the TREC QA tracks:
- Right
- Wrong
- ineXact
- Unsupported
Other criteria, such as the length of the answer-strings (instead of X, which is underspecified) or the usefulness of responses for a potential user, have not been considered.
Main evaluation measure was accuracy (fraction of Right responses). Whenever possible, a Confidence-Weighted Score was calculated:
CWS = (1/Q) · Σ_{i=1..Q} (number of correct responses in the first i ranks) / i
Evaluation Exercise - Participants
Distribution of participating groups in different QA evaluation campaigns.

                  America  Europe  Asia  Australia  TOTAL  submitted runs
TREC-8            13       3       3     1          20     46
TREC-9            14       7       6     -          27     75
TREC-10           19       8       8     -          35     67
TREC-11           16       10      6     -          32     67
TREC-12           13       8       4     -          25     54
NTCIR-3 (QAC-1)   1        -       15    -          16     36
CLEF 2003         3        5       -     -          8      17
CLEF 2004         1        17      -     -          18     48
Evaluation Exercise - Participants
Number of participating teams and number of submitted runs for each source-target language pair at CLEF 2004 (e.g. “2-3” = 2 teams, 3 runs). Most language pairs were attempted by only one or two teams, which raises a comparability issue.
Evaluation Exercise - Results
Systems’ performance at the TREC and CLEF QA tracks (accuracy %, best system vs. average).
[Bar chart: best systems range roughly from 65% to 83% across TREC-8 – TREC-12*, with far lower averages (roughly 20-35%); at CLEF-2003** and CLEF-2004 (monolingual and bilingual) the best systems reached roughly 35-45.5%, with averages roughly between 14.7% and 23.7%.]
* considering only the 413 factoid questions
** considering only the answers returned at the first rank
Evaluation Exercise – CL Approaches
Two cross-language strategies were used at CLEF 2004:
- Translate the question into the target language before processing (U. Amsterdam, U. Edinburgh, U. Neuchatel).
- Analyse the question in the source language (question analysis / keyword extraction) and translate the keywords or the retrieved data (Bulgarian Academy of Sciences, ITC-irst, U. Limerick, U. Helsinki, DFKI, LIMSI-CNRS).
In both cases the target-language pipeline is the usual one: candidate document selection over the document collection, candidate document analysis on preprocessed documents, and answer extraction, producing the OUTPUT in the target language.
Discussion on Cross-Language QA
CLEF multilingual QA track (like TREC QA) represents a formal evaluation,
designed with an eye to replicability. As an exercise, it is an abstraction
of the real problems.
Future challenges:
• investigate QA in combination with other applications (for instance
summarization)
• access not only free text, but also different sources of data (multimedia,
spoken language, imagery)
• introduce automated evaluation along with judgments given by humans
• focus on user’s need: develop real-time interactive systems, which means
modeling a potential user and defining suitable answer types.
References











Books
- Pasca, Marius. Open-Domain Question Answering from Large Text Collections. CSLI, 2003.
- Maybury, Mark (Ed.). New Directions in Question Answering. AAAI Press, 2004.

Journals
- Hirschman, Gaizauskas. Natural Language Question Answering: The View from Here. JNLE, 7(4), 2001.

TREC
- E. Voorhees. Overview of the TREC 2001 Question Answering Track.
- M.M. Soubbotin, S.M. Soubbotin. Patterns of Potential Answer Expressions as Clues to the Right Answers.
- S. Harabagiu, D. Moldovan, M. Pasca, M. Surdeanu, R. Mihalcea, R. Girju, V. Rus, F. Lacatusu, P. Morarescu, R. Bunescu. Answering Complex, List and Context Questions with LCC’s Question-Answering Server.
- C.L.A. Clarke, G.V. Cormack, T.R. Lynam, C.M. Li, G.L. McLearn. Web Reinforced Question Answering (MultiText Experiments for TREC 2001).
- E. Brill, J. Lin, M. Banko, S. Dumais, A. Ng. Data-Intensive Question Answering.
References

Workshop Proceedings
- H. Chen and C.-Y. Lin, editors. 2002. Proceedings of the Workshop on Multilingual Summarization and Question Answering at COLING-02, Taipei, Taiwan.
- M. de Rijke and B. Webber, editors. 2003. Proceedings of the Workshop on Natural Language Processing for Question Answering at EACL-03, Budapest, Hungary.
- R. Gaizauskas, M. Hepple, and M. Greenwood, editors. 2004. Proceedings of the Workshop on Information Retrieval for Question Answering at SIGIR-04, Sheffield, United Kingdom.
References





- N. Kando and H. Ishikawa, editors. 2004. Working Notes of the 4th NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering and Summarization (NTCIR-04), Tokyo, Japan.
- M. Maybury, editor. 2003. Proceedings of the AAAI Spring Symposium on New Directions in Question Answering, Stanford, California.
- C. Peters and F. Borri, editors. 2004. Working Notes of the 5th Cross-Language Evaluation Forum (CLEF-04), Bath, United Kingdom.
- J. Pustejovsky, editor. 2002. Final Report of the Workshop on TERQAS: Time and Event Recognition in Question Answering Systems, Bedford, Massachusetts.
- Y. Ravin, J. Prager and S. Harabagiu, editors. 2001. Proceedings of the Workshop on Open-Domain Question Answering at ACL-01, Toulouse, France.