Question Answering at CLEF 2012

CLEF 2012, Rome
QA4MRE: Question Answering for Machine Reading Evaluation
Anselmo Peñas (UNED, Spain)
Eduard Hovy (USC-ISI, USA)
Pamela Forner (CELCT, Italy)
Álvaro Rodrigo (UNED, Spain)
Richard Sutcliffe (U. Limerick, Ireland)
Roser Morante (U. Antwerp, Belgium)
Walter Daelemans (U. Antwerp, Belgium)
Caroline Sporleder (U. Saarland, Germany)
Corina Forascu (UAIC, Romania)
Yassine Benajiba (Philips, USA)
Petya Osenova (Bulgarian Academy of Sciences)
Question Answering Track at CLEF
[Timeline figure: QA tasks at CLEF, 2003–2012. Multiple Language QA Main Task (2003–2008, adding temporal restrictions and lists), ResPubliQA (2009–2010), and QA4MRE (2011–2012) with its Negation and Modality and Biomedical pilots; related exercises: Answer Validation Exercise (AVE), GikiCLEF, Real Time QA, QA over Speech Transcriptions (QAST), WiQA, WSD QA]
[Pipeline figure: Question → Question analysis → Passage Retrieval (0.8) → Answer Extraction (× 0.8) → Answer Ranking (× 1.0) = 0.64 overall accuracy]
Over the years, we learnt that this pipeline architecture is one of the main limitations on improving QA technology: even with strong modules, errors compound multiplicatively.
So we bet on a reformulation:
Hypothesis generation + validation
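As a back-of-the-envelope check of this compounding effect (the 0.8 / 0.8 / 1.0 figures are the illustrative ones from the diagram above):

```python
# Upper bound on end-to-end accuracy of a sequential QA pipeline:
# every question must survive every module, so accuracies multiply.
from math import prod

module_accuracy = {
    "passage_retrieval": 0.8,   # illustrative figures from the diagram
    "answer_extraction": 0.8,
    "answer_ranking":    1.0,
}

upper_bound = prod(module_accuracy.values())
print(f"best possible end-to-end accuracy: {upper_bound:.2f}")  # 0.64
```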
[Figure: reformulated architecture. The Question feeds hypothesis generation functions plus answer validation functions, which search the space of candidate answers to produce the Answer]
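A minimal sketch of that generate-and-validate loop; every function name here is a hypothetical placeholder, not part of the task definition:

```python
# Generate-and-validate QA: instead of a one-way pipeline, enumerate
# candidate answer hypotheses and keep the one the validators score best.
# All names below are hypothetical placeholders.

def generate_hypotheses(question: str, text: str) -> list[str]:
    """Produce candidate answers, e.g. the five options of a reading test."""
    ...

def validation_score(question: str, text: str, candidate: str) -> float:
    """Combine validation functions (lexical overlap, entailment, ...)."""
    ...

def answer(question: str, text: str) -> str:
    candidates = generate_hypotheses(question, text)
    return max(candidates, key=lambda c: validation_score(question, text, c))
```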
We focus on validation …
Is the candidate answer correct?
QA4MRE setting: Multiple Choice Reading Comprehension Tests

Measure progress in two reading abilities:
• Answer questions about a single text
• Capture knowledge from text collections
… and knowledge

Why capture knowledge from text collections?

We need knowledge to understand language: the ability to make inferences about texts is correlated with the amount of knowledge considered.

Texts always omit information that we need to recover:
• To build the complete story behind the document
• And to be sure about the answer
Text as source of knowledge
Text Collection (background collection)
• A set of documents that contextualize the one under reading (20,000–100,000 docs.)
• We can imagine this done on the fly by the machine, via retrieval
• Big and diverse enough to acquire knowledge from
• Defines a scalable strategy: topic by topic, with one reference collection per topic
Background Collections

They must serve to acquire:
• General facts (with categorization and relevant relations)
• Abstractions (such as …)

This is sensitive to how often things occur in texts, and thus also to the way we create the collection.

Key: retrieve all relevant documents and only them
• A classical IR problem
• Interdependence with topic definition: the topic is defined by the set of queries that produce the collection
Example: Biomedical
Alzheimer’s Disease Literature Corpus
Search PubMed for Alzheimer's disease
Query: (((((("Alzheimer Disease"[Mesh] OR "Alzheimer's disease antigen"[Supplementary Concept] OR "APP protein, human"[Supplementary Concept] OR "PSEN2 protein, human"[Supplementary Concept] OR "PSEN1 protein, human"[Supplementary Concept]) OR "Amyloid beta-Peptides"[Mesh]) OR "donepezil"[Supplementary Concept]) OR ("gamma-secretase activating protein, human"[Supplementary Concept] OR "gamma-secretase activating protein, mouse"[Supplementary Concept])) OR "amyloid beta-protein (1-42)"[Supplementary Concept]) OR "Presenilins"[Mesh]) OR "Neurofibrillary Tangles"[Mesh] OR "Alzheimer's disease"[All Fields] OR "Alzheimer's Disease"[All Fields] OR "Alzheimer s disease"[All Fields] OR "Alzheimers disease"[All Fields] OR "Alzheimer's dementia"[All Fields] OR "Alzheimer dementia"[All Fields] OR "Alzheimer-type dementia"[All Fields] NOT "non-Alzheimer"[All Fields] NOT ("non-AD"[All Fields] AND "dementia"[All Fields]) AND (hasabstract[text] AND English[lang])

Result: 66,222 abstracts
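Such a collection could be pulled programmatically through NCBI's Entrez E-utilities. The sketch below uses the standard esearch/efetch endpoints, with a shortened stand-in for the full query above:

```python
# Sketch: pull PubMed abstracts for a query via NCBI E-utilities
# (esearch returns matching PMIDs, efetch downloads the records).
import re
import urllib.parse
import urllib.request

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"
# Shortened stand-in for the full Alzheimer query above.
QUERY = '"Alzheimer Disease"[Mesh] AND hasabstract[text] AND English[lang]'

# 1) esearch: get the PubMed IDs matching the query.
params = urllib.parse.urlencode({"db": "pubmed", "term": QUERY, "retmax": "100"})
with urllib.request.urlopen(f"{EUTILS}/esearch.fcgi?{params}") as resp:
    pmids = re.findall(r"<Id>(\d+)</Id>", resp.read().decode())

# 2) efetch: download plain-text abstracts for those IDs.
params = urllib.parse.urlencode(
    {"db": "pubmed", "id": ",".join(pmids), "rettype": "abstract", "retmode": "text"}
)
with urllib.request.urlopen(f"{EUTILS}/efetch.fcgi?{params}") as resp:
    abstracts = resp.read().decode()

print(abstracts[:500])
```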
Questions (Main Task)
Distribution of question types:
• 27 PURPOSE
• 30 METHOD
• 36 CAUSAL
• 36 FACTOID
• 31 WHICH-IS-TRUE

Distribution of answer types:
• 75 REQUIRE NO EXTRA KNOWLEDGE
• 46 REQUIRE BACKGROUND KNOWLEDGE
• 21 REQUIRE INFERENCE
• 20 REQUIRE GATHERING INFORMATION FROM DIFFERENT SENTENCES
Questions (Biomedical Task)
Question types:
1. Experimental evidence/qualifier
2. Protein-protein interaction
3. Gene synonymy relation
4. Organism source relation
5. Regulatory relation
6. Increase (higher expression)
7. Decrease (reduction)
8. Inhibition

Answer types:
• Simple: the answer is found almost verbatim in the paper
• Medium: the answer is rephrased
• Complex: requires combining pieces of evidence and inference

All of them involve a predefined set of entity types.
Main Task
16 test documents, 160 questions, 800 candidate answers

4 topics:
1. AIDS
2. Music and Society
3. Climate Change
4. Alzheimer (new; popular sources: blogs, web, news, …)

4 reading tests per topic: document + 10 questions, 5 choices per question

6 languages: English, German, Spanish, Italian, Romanian, Arabic (new)
Biomedical Task



• Same setting
• Scientific language
• Focus on one disease: Alzheimer's

Alzheimer's Disease Literature Corpus (ADLC):
• 66,222 abstracts from PubMed
• 9,500 full articles
• Most of them processed with:
  • Dependency parser GDep (Sagae and Tsujii 2007)
  • UMLS-based NE tagger (CLiPS)
  • ABNER NE tagger (Settles 2005)
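A sketch of how the parsed corpus might be consumed downstream, assuming a CoNLL-style tab-separated serialization (one token per line, blank line between sentences), which is how GDep and similar dependency parsers typically emit their analyses; the exact column layout is an assumption to check against the real output:

```python
# Read dependency parses from a CoNLL-style tab-separated file.
# Column positions are illustrative; verify against actual GDep output.
from dataclasses import dataclass

@dataclass
class Token:
    idx: int      # token position in the sentence (1-based)
    form: str     # surface form
    head: int     # index of the syntactic head (0 = root)
    deprel: str   # dependency relation label

def read_conll(path: str):
    sentence = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:                  # blank line = sentence boundary
                if sentence:
                    yield sentence
                sentence = []
                continue
            cols = line.split("\t")
            # assume head and relation are the last two columns
            sentence.append(Token(int(cols[0]), cols[1],
                                  int(cols[-2]), cols[-1]))
    if sentence:
        yield sentence
```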
Task on Modality and Negation
Given an event in the text, decide whether it is:
1. Asserted (NONE: no negation and no speculation)
2. Negated (NEG: negation and no speculation)
3. Speculated and negated (NEGMOD)
4. Speculated and not negated (MOD)
[Decision tree: Is the event presented as certain? Yes → Did it happen? (Yes → NONE, No → NEG); No → Is it negated? (Yes → NEGMOD, No → MOD)]
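The four labels reduce to two binary decisions, as the tree shows; a minimal sketch of the mapping (the boolean flags are an assumed representation of a detector's output, not part of the task definition):

```python
# Map the two binary decisions (negated?, speculated?) onto the
# four QA4MRE modality/negation labels.
def modality_label(negated: bool, speculated: bool) -> str:
    if not speculated:
        return "NEG" if negated else "NONE"
    return "NEGMOD" if negated else "MOD"

assert modality_label(negated=False, speculated=False) == "NONE"
assert modality_label(negated=True,  speculated=False) == "NEG"
assert modality_label(negated=True,  speculated=True)  == "NEGMOD"
assert modality_label(negated=False, speculated=True)  == "MOD"
```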
Participation
Task                    Registered groups   Participant groups   Submitted runs
Main                    25                  11                   43
Biomedical              23                  7                    43
Modality and Negation   3                   3                    6
Total                   51                  21                   92

~100% increase over 2011
[Bar chart: number of participants and runs, 2011 vs. 2012]
Evaluation and results
QA perspective evaluation: c@1 over all questions (random baseline: 0.2)
• Best systems, Main: 0.65 and 0.40
• Best systems, Biomedical: 0.55 and 0.47

Reading perspective evaluation: results aggregated test by test (a test is passed if c@1 > 0.5)
• Best systems, Main: 12/16 and 6/16 tests passed
• Best system, Biomedical: 3/4 tests passed
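For reference, c@1 (Peñas and Rodrigo, ACL 2011) extends accuracy to reward leaving a question unanswered over guessing; a minimal sketch:

```python
# c@1: accuracy that credits unanswered questions with the system's
# observed precision, instead of forcing a 1-in-5 guess.
#   c@1 = (n_correct + n_unanswered * n_correct / n) / n
def c_at_1(n_correct: int, n_unanswered: int, n: int) -> float:
    return (n_correct + n_unanswered * n_correct / n) / n

# A system answering 6 of 10 questions correctly and skipping 2:
print(c_at_1(n_correct=6, n_unanswered=2, n=10))  # 0.72
```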
More details during the workshop
Monday 17th Sep.
17:00 – 18:00  Poster Session

Tuesday 18th Sep.
10:40 – 12:40  Invited Talk + Overviews
14:10 – 16:10  Reports from participants (Main + Bio)
16:40 – 17:15  Reports from participants (Mod&Neg)
17:15 – 18:10  Breakout session

Thanks!