slides - Rensselaer Polytechnic Institute

advertisement
Why Read if You Can Scan?
Trigger Scoping Strategy for Biographical Fact Extraction
and Chin-Yew
Polytechnic Institute 2Peking University
3 Microsoft
1. Task Introduction
3. Triggers in Extraction
• Biographical Fact Extraction:
Given a query, extract the values for given biographical
fact types.
• The Definition of Triggers:
The smallest extent of a text which most clearly expresses
an event occurrence or indicates a relation type.
born is an important trigger for
extracting birth-related facts.
Query:
Colin Firth
Date of Birth: 9/10/1960
Place of Birth: Grayshott,
Hampshire,
England
Spouse:
Livia Firth
Children:
Will Firth,
Luca Firth,
Matteo Firth
Siblings:
Kate Firth,
Jonathan Firth
• Should we read every word of the relative
documents for facts?
• The Importance of Triggers:
94.36% of the biographical facts are mentioned in a
sentence containing indicative triggers (KBP SF 2012
corpus).
• Trigger Scope:
The shortest fragment that is related to a trigger.
Our observation:
Each fact –specific trigger has its own scope and its
corresponding facts seldom appear outside of its scope.
the scope of “graduated”, a trigger for education-related facts
She <graduated> from Barnard in 1965 and soon
began teaching English at Chesterbrook Academy in
Pennsylvania.
Using scanning strategy, we only need to focus on the following segment:
She graduated from Barnard in 1965
2. Learn from Scanning Strategy
• Four Steps in implementing scanning strategy
----------------------------------------------------------------------------1) Keep in mind what you are searching for: e.g.,
KEYWORDs.
2) Anticipate in what form the information –
number, proper nouns, etc.
3) Let your eyes
run rapidly over several lines
of print at a same time until KEYWORDs are found.
4) Closer reading can occur.
-----------------------------------------------------------------------------
What are the “keywords” in our task?
• Trigger-based extraction process following the
scanning strategy
----------------------------------------------------------------------------1) Let the computer know the query and the fact type
to be extracted.
2) Set the type constrains the candidate answer
should satisfy- person, organization, GPE, etc.
3) Locate all the triggers of the given fact type and
recognize their respective scopes.
4) Within each scope, extract candidate answers
which satisfy the entity type constraint set in 2).
-----------------------------------------------------------------------------
4. Scope Identification
• Rule-based Method:
Left boundary: trigger
Right boundary: verb or trigger with other fact types
Paul Francis Conrad and his twin <brother>, James,
were <born> in Cedar Rapids, Iowa, on June 27, 1924,
<sons> of Robert H. Conrad and Florence Lawler Conrad.
Triggers are marked by angle brackets (<>) and the scopes of triggers are
colored in different colors.
RESEARCH POSTER PRESENTATION DESIGN © 2012
www.PosterPresentations.com
3
Lin
Research Asia
• Supervised Classification
For each detected trigger, we perform a binary
classification of each token in the sentence as to
whether it is within or outside of the scope of a trigger.
-----------------Features used to train a classifier------------------Position: the feature takes value 1 if the word appears
before the trigger
Distance: the distance (in words) between the word and
the trigger
POS: POS tags of the word and the trigger
Name Entity: the name entity type of the word
Interrupt: the feature takes value 1 if there is a verb or a
trigger with other fact type between trigger and the word.
• Trigger Mining
We mine fact-specific triggers from existing patterns (NYU,
RPI, PRIS Slot Filling (SF) systems) and ground truth
sentences from SF 2012 corpus.
5. Experiments
• Data
Development set: KBP SF 2012 corpus
Testing set: KBP 2013 corpus
Table 1: scope identification results.
95
96
90
94
F-score (%)
Sujian
2
Li
92
90
88
Rule
SVMs
86
birth sidence ucation
re
ed
ly
h
deat
fami
85
80
Rule
SVMs
75
e
ly
fami residenc
birth
h
n
deat ducatio
e
Table 2: performance on KBP 2013
Fact
Group
Birth
Fact Type
100
90
place_of_birth
80
date_of_birth
Death
place_of_death
date_of_death
Residence place_of_residence
Education school_attended
Family
parents
sibling
spouse
children
other_family
F-score (%)
1Rensselaer
Heng
1
Ji ,
Accuracy (%)
Dian
1
Yu ,
70
60
50
40
state-of-the-art
Rule
SVMs
30
ly nce use ren nts ling ath irth ded ath irth
i
m de po ild re ib de _b en de _b
a
f
i s ch pa s f_ of tt f_ of
_
s
r
o e_ l_a _o e_
e
e
r
_
h
te lac oo ace dat
ot _of_
a
d p ch pl
e
c
s
a
l
p
Download