Why Read if You Can Scan? Trigger Scoping Strategy for Biographical Fact Extraction

Dian Yu1, Heng Ji1, Sujian Li2, and Chin-Yew Lin3
1Rensselaer Polytechnic Institute 2Peking University 3Microsoft Research Asia

1. Task Introduction

• Biographical Fact Extraction: Given a query, extract the values for the given biographical fact types.

Query: Colin Firth
Date of Birth: 9/10/1960
Place of Birth: Grayshott, Hampshire, England
Spouse: Livia Firth
Children: Will Firth, Luca Firth, Matteo Firth
Siblings: Kate Firth, Jonathan Firth

• Should we read every word of the relevant documents for facts?

2. Learn from Scanning Strategy

• Four steps in implementing the scanning strategy
----------------------------------------------------------------------------
1) Keep in mind what you are searching for, e.g., KEYWORDs.
2) Anticipate in what form the information will appear: numbers, proper nouns, etc.
3) Let your eyes run rapidly over several lines of print at the same time until the KEYWORDs are found.
4) Closer reading can then occur.
----------------------------------------------------------------------------
What are the “keywords” in our task?

3. Triggers in Extraction

• The Definition of Triggers: The smallest extent of a text which most clearly expresses an event occurrence or indicates a relation type. For example, “born” is an important trigger for extracting birth-related facts.
• The Importance of Triggers: 94.36% of the biographical facts are mentioned in a sentence containing indicative triggers (KBP SF 2012 corpus).
• Trigger Scope: The shortest fragment that is related to a trigger.

Our observation: Each fact-specific trigger has its own scope, and its corresponding facts seldom appear outside of that scope.

Example: the scope of “graduated”, a trigger for education-related facts:
She <graduated> from Barnard in 1965 and soon began teaching English at Chesterbrook Academy in Pennsylvania.
Using the scanning strategy, we only need to focus on the following segment:
She graduated from Barnard in 1965
• Trigger-based extraction process following the scanning strategy
----------------------------------------------------------------------------
1) Let the computer know the query and the fact type to be extracted.
2) Set the type constraints that a candidate answer should satisfy: person, organization, GPE, etc.
3) Locate all the triggers of the given fact type and recognize their respective scopes.
4) Within each scope, extract the candidate answers that satisfy the entity type constraint set in 2).
----------------------------------------------------------------------------

4. Scope Identification

• Rule-based Method
Left boundary: the trigger
Right boundary: a verb or a trigger of another fact type

Paul Francis Conrad and his twin <brother>, James, were <born> in Cedar Rapids, Iowa, on June 27, 1924, <sons> of Robert H. Conrad and Florence Lawler Conrad.

Triggers are marked by angle brackets (<>); on the poster, the scope of each trigger is shown in a different color.

• Supervised Classification
For each detected trigger, we perform a binary classification of each token in the sentence as to whether it is within or outside of the scope of the trigger.

----------------- Features used to train a classifier ------------------
Position: takes value 1 if the word appears before the trigger
Distance: the distance (in words) between the word and the trigger
POS: the POS tags of the word and the trigger
Named Entity: the named entity type of the word
Interrupt: takes value 1 if there is a verb or a trigger of another fact type between the trigger and the word

• Trigger Mining
We mine fact-specific triggers from existing patterns (the NYU, RPI, and PRIS Slot Filling (SF) systems) and from ground-truth sentences in the SF 2012 corpus.

5. Experiments

• Data
Development set: KBP SF 2012 corpus
Testing set: KBP 2013 corpus

Table 1: scope identification results.
[Figure: two bar charts comparing the Rule and SVMs methods across the fact groups birth, death, residence, education, and family; y-axes: F-score (%) and Accuracy (%)]

Table 2: performance on KBP 2013

Fact Group    Fact Type
Birth         place_of_birth, date_of_birth
Death         place_of_death, date_of_death
Residence     place_of_residence
Education     school_attended
Family        parents, sibling, spouse, children, other_family

[Figure: bar chart of F-score (%) per fact type, comparing state-of-the-art, Rule, and SVMs]
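The rule-based scope boundaries, the classifier features, and step 4 of the trigger-based extraction process can be illustrated with a minimal sketch. It is not the authors' implementation. The `Token` class, the two-entry `TRIGGERS` lexicon, the function names, and the hand-assigned POS and entity tags are all assumptions made for illustration, and treating the right boundary as exclusive is likewise an assumed reading of the rule.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Token:
    text: str
    pos: str                   # Penn Treebank style POS tag (hand-assigned below)
    ner: Optional[str] = None  # entity type, or None outside any entity


# Hypothetical trigger lexicon: lower-cased trigger word -> fact group.
TRIGGERS: Dict[str, str] = {"graduated": "education", "born": "birth"}


def rule_scope(tokens: List[Token], trigger_idx: int) -> Tuple[int, int]:
    """Return (start, end) token indices of the trigger's scope, end exclusive.

    Left boundary: the trigger itself. Right boundary: the next verb or the
    next trigger of another fact group (assumed not to belong to the scope).
    """
    group = TRIGGERS[tokens[trigger_idx].text.lower()]
    for i in range(trigger_idx + 1, len(tokens)):
        other = TRIGGERS.get(tokens[i].text.lower())
        if tokens[i].pos.startswith("VB") or (other is not None and other != group):
            return trigger_idx, i
    return trigger_idx, len(tokens)


def token_features(tokens: List[Token], trigger_idx: int, word_idx: int) -> dict:
    """Build the five listed features for one (trigger, word) pair."""
    group = TRIGGERS[tokens[trigger_idx].text.lower()]
    lo, hi = sorted((trigger_idx, word_idx))
    interrupt = any(
        t.pos.startswith("VB") or TRIGGERS.get(t.text.lower(), group) != group
        for t in tokens[lo + 1:hi]  # tokens strictly between trigger and word
    )
    return {
        "position": int(word_idx < trigger_idx),
        "distance": abs(word_idx - trigger_idx),
        "pos_word": tokens[word_idx].pos,
        "pos_trigger": tokens[trigger_idx].pos,
        "ner": tokens[word_idx].ner or "O",
        "interrupt": int(interrupt),
    }


def extract(tokens: List[Token], group: str, entity_type: str) -> List[str]:
    """Collect entity mentions of entity_type inside the scope of each trigger of group."""
    answers: List[str] = []
    for idx, tok in enumerate(tokens):
        if TRIGGERS.get(tok.text.lower()) != group:
            continue
        start, end = rule_scope(tokens, idx)
        current: List[str] = []
        for t in tokens[start:end]:
            if t.ner == entity_type:
                current.append(t.text)   # extend the current mention
            elif current:
                answers.append(" ".join(current))
                current = []
        if current:
            answers.append(" ".join(current))
    return answers


# The "graduated" example sentence, with hand-assigned (assumed) tags.
SENTENCE = [Token(*t) for t in [
    ("She", "PRP"), ("graduated", "VBD"), ("from", "IN"),
    ("Barnard", "NNP", "ORG"), ("in", "IN"), ("1965", "CD", "DATE"),
    ("and", "CC"), ("soon", "RB"), ("began", "VBD"), ("teaching", "VBG"),
    ("English", "NNP"), ("at", "IN"), ("Chesterbrook", "NNP", "ORG"),
    ("Academy", "NNP", "ORG"), ("in", "IN"), ("Pennsylvania", "NNP", "GPE"),
    (".", "."),
]]

print(rule_scope(SENTENCE, 1))                 # (1, 8)
print(extract(SENTENCE, "education", "ORG"))   # ['Barnard']
```

Under these assumptions the scope of "graduated" ends just before the verb "began", so "Chesterbrook Academy" is never considered as a school_attended candidate even though it has the right entity type. The literal rule keeps the trailing "and soon" in the scope, which is slightly wider than the segment shown in the poster's example.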