Information Extraction in the Past 20 Years: Traditional vs. Open
Heng Ji (jih@rpi.edu)
Acknowledgement: some slides from Radu Florian and Stephen Soderland

A Long, Successful Run
– MUC
– CoNLL
– ACE
– TAC-KBP
– DEFT
– BioNLP
Programs:
– MUC
– ACE
– GALE
– MRP
– BOLT
– DEFT
Genres:
– Newswire
– Broadcast news
– Broadcast conversations
– Weblogs
– Blogs
– Newsgroups
– Speech
– Biomedical data
– Electronic medical records

[section slides: Quality; Portability; Challenges]

Where have we been?
– We’re thriving
– We’re making slow but consistent progress: Relation Extraction, Event Extraction, Slot Filling
– We’re running around in circles: Entity Linking, Name Tagging
– We’re stuck in a tunnel: Entity Coreference Resolution

Name Tagging: “Old” Milestones
Year | Tasks & Resources | Methods | F-Measure | Example References
1966 | - | First person name tagger, on punch cards; 30+ decision-tree-type rules | - | (Borkowski et al., 1966)
1998 | MUC-6 | MaxEnt with diverse levels of linguistic features | 97.12% | (Borthwick and Grishman, 1998)
2003 | CoNLL | System combination; sequential labeling with Conditional Random Fields | 89% | (Florian et al., 2003; McCallum et al., 2003; Finkel et al., 2005)
2006 | ACE | Diverse levels of linguistic features, re-ranking, joint inference | ~89% | (Florian et al., 2006; Ji and Grishman, 2006)
Our progress compared to 1966: more data, a few more features, and fancier learning algorithms.
Not much active work after ACE, because we tend to believe it’s a solved problem…

The end of extreme happiness is sadness…
[chart: state-of-the-art results as reported in papers]

The end of extreme happiness is sadness…
[chart: experiments on ACE2005 data]

Challenges
– Defining or choosing an IE schema
– Dealing with genres & variations
– Dealing with novelty
– Bootstrapping a new language
– Improving the state of the art with unlabeled data
– Dealing with a new domain
– Robustness

99 Schemas of IE on the Wall…
Many IE schemas over the years:
– MUC – 7 types
  • PER, ORG, LOC, DATE, TIME, MONEY, PERCENT
– ACE – 5, later 7 types
  • PER, ORG, GPE, LOC, FAC,
WEA, VEH
  • Has substructure (subtypes, mention types, specificity, roles)
– CoNLL: 4 types
  • ORG, PER, LOC, MISC
– OntoNotes: 18 types
  • CARDINAL, DATE, EVENT, FAC, GPE, LANGUAGE, LAW, LOC, MONEY, NORP, ORDINAL, ORG, PERCENT, PERSON, PRODUCT, QUANTITY, TIME, WORK_OF_ART
– IBM KLUE2: 50 types, including event anchors
– Freebase categories
– Wikipedia categories
Challenges:
– Selecting an appropriate schema to model
– Combining training data

My Favorite Booby-Trap Document
http://www.nytimes.com/2000/12/19/business/lvmh-makes-a-two-part-offer-for-donna-karan.html
LVMH Makes a Two-Part Offer for Donna Karan
By LESLIE KAUFMAN
Published: December 19, 2000

The fashion house of Donna Karan, which has long struggled to achieve financial equilibrium, has finally found a potential buyer. The giant luxury conglomerate LVMH-Moet Hennessy Louis Vuitton, which has been on a sustained acquisition bid, has offered to acquire Donna Karan International for $195 million in a cash deal with the idea that it could expand the company's revenues and beef up accessories and overseas sales.

At $8.50 a share, the LVMH offer represents a premium of nearly 75 percent to the closing stock price on Friday. Still, it is significantly less than the $24 a share at which the company went public in 1996. The final price is also less than one-third of the company's annual revenue of $662 million, a significantly smaller multiple than European luxury fashion houses like Fendi were receiving last year.

The deal is still subject to board approval, but in a related move that will surely help pave the way, LVMH purchased Gabrielle Studio, the company held by the designer and her husband, Stephan Weiss, that holds all of the Donna Karan trademarks, for $450 million. That price would be reduced by as much as $50 million if LVMH enters into an agreement to acquire Donna Karan International within one year.
In a press release, LVMH said it aimed to combine Gabrielle and Donna Karan International and that it expected that Ms. Karan and her husband ''will exchange a significant portion of their DKI shares for, and purchase additional stock in, the combined entity.''

Analysis of an Error
Donna Karan International

Analysis of an Error: How Can You Tell?
Test mentions such as “Donna Karan International”, “Saddam Hussein International”, “Ronald Reagan”, and “Dana International” are matched against names containing “International” in the training data (type, name, count):
FAC  Saddam Hussein International Airport  8
FAC  Saddam International Airport  7
FAC  Baghdad International Airport
FAC  Baghdad International  1
FAC  International Space Station  1
FAC  International Press Club  1
ORG  Amnesty International  3
ORG  International Criminal Court  1
ORG  Habitat for Humanity International  1
ORG  U-Haul International  1
ORG  International Committee of the Red Cross  4
ORG  International Committee for the Red Cross  1
ORG  International Committee of Red Cross  1
ORG  American International Group Inc.  1
ORG  Boots and Coots International Well Control Inc.  1
ORG  International Black Coalition for Peace and Justice  1
ORG  Center for Strategic and International Studies  2
ORG  International Monetary Fund  1

Dealing with Different Genres
Weblogs:
– All-lowercase data
  • obama has stepped up what bush did even to the point of helping our enemy in Libya.
– Non-standard capitalization / title case
  • LiveLeak.com - Hillary Clinton: Saddam Has WMD, Terrorist Ties (Video)
Solution: case restoration (truecasing)

Out-of-domain Data
Volunteers have also aided victims of numerous other disasters, including hurricanes Katrina, Rita, Andrew and Isabel, the Oklahoma City bombing, and the September 11 terrorist attacks.

Out-of-domain Data
Manchester United manager Sir Alex Ferguson got a boost on Tuesday as a horse he part owns, What A Friend, landed the prestigious Lexus Chase here at Leopardstown racecourse.
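The truecasing fix proposed above for all-lowercase weblog text can be sketched as a simple frequency-lookup baseline: learn each word's most common surface form from well-cased text, then restore it. This is a minimal, hypothetical illustration (the tiny corpus and all names are invented); real truecasers use sequence models rather than unigram lookup.

```python
from collections import Counter, defaultdict

def build_case_model(well_cased_corpus):
    """Count the observed surface forms of each lowercased token."""
    model = defaultdict(Counter)
    for sentence in well_cased_corpus:
        for token in sentence.split():
            model[token.lower()][token] += 1
    return model

def truecase(text, model):
    """Restore each token's most frequent observed casing; leave unseen tokens as-is."""
    out = []
    for token in text.split():
        forms = model.get(token.lower())
        out.append(forms.most_common(1)[0][0] if forms else token)
    return " ".join(out)

corpus = ["Hillary Clinton met reporters in Libya .",
          "Clinton spoke about Libya on Monday ."]
model = build_case_model(corpus)
print(truecase("hillary clinton visited libya", model))
# -> Hillary Clinton visited Libya
```

Restoring case this way lets a name tagger trained on well-edited newswire run unchanged on lowercased weblog input, at the cost of errors on words whose casing is context-dependent.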
Bootstrapping a New Language
English is resource-rich:
– Lexical resources: gazetteers
– Syntactic resources: Penn TreeBank
– Semantic resources: WordNet, entity-labeled data (MUC, ACE, CoNLL), FrameNet, PropBank, NomBank, OntoBank
How can we leverage these resources in other languages? MT to the rescue!

Mention Detection Transfer
ES: El soldado nepalés fue baleado por ex soldados haitianos cuando patrullaba la zona central de Haiti, informó Minustah.
EN: The Nepalese soldier was gunned down by former Haitian soldiers when patrolling the central area of Haiti, reported Minustah.
[figure: the tagged source sentence and its translation, with B-GPE, B-PER, and B-LOC labels projected across the word alignment]

Language | System | F-measure
Spanish | Direct Transfer | 66.5
Spanish | Source Only (100k words) | 71.0
Spanish | Source Only (160k words) | 76.0
Spanish | Source + Transfer | 78.5
Arabic | Direct Transfer | 51.6
Arabic | Source Only (186k tokens) | 79.6
Arabic | Source + Transfer | 80.5
Chinese | Direct Transfer | 58.5
Chinese | Source Only | 74.5
Chinese | Source + Transfer | 76.0

– How to deal with out-of-domain data? How to even detect if you’re out of domain?
– How to deal with unseen WotD? (e.g., ISIS, ISIL, IS, Ebola)
– How to significantly improve the state of the art using unlabeled data?

What’s Wrong?
– Name taggers are getting old (trained on 2003 news & tested on 2012 news)
– Genre adaptation (informal contexts, posters)
– Revisit the definition of name mention – extraction for linking
– Limited types of entities (we really only cared about PER, ORG, GPE)
– Old unsolved problems
  • Identification: “Asian Pulp and Paper Joint Stock Company , Lt.
of Singapore”
  • Classification: “FAW has also utilized the capital market to directly finance, …” (FAW = First Automotive Works)

Potential Solutions for Quality
– Word clustering, lexical knowledge discovery (Brown, 1992; Ratinov and Roth, 2009; Ji and Lin, 2010)
– Feedback from linking, relation, and event extraction (Sil and Yates, 2013; Li and Ji, 2014)

Potential Solutions for Portability
– Extend entity types based on AMR (140+)

Entity Linking Milestones
– 2006: The first definition of the Wikification task (Bunescu and Pasca, 2006)
– 2009: TAC-KBP Entity Linking launched (McNamee and Dang, 2009)
– 2008-2012: Supervised learning-to-rank with diverse levels of features, such as entity profiling and various popularity and similarity measures (Gao et al., 2010; Chen and Ji, 2011; Ratinov et al., 2011; Zheng et al., 2010; Dredze et al., 2010; Anastacio et al., 2011)
– 2008-2013: Collective inference and coherence measures (Milne and Witten, 2008; Kulkarni et al., 2009; Ratinov et al., 2011; Chen and Ji, 2011; Ceccarelli et al., 2013; Cheng and Roth, 2013)
– 2012: Various applications, e.g., coreference resolution (Ratinov & Roth, 2012) – Dan’s talk
– 2014: TAC-KBP Entity Discovery and Linking: end-to-end name tagging, cross-document entity clustering, entity linking (Ji et al., 2014)
– Many international evaluations were inspired by TAC-KBP; more than 130 papers have been published

Current Linking Problems and Possible Solutions
State-of-the-art Entity Linking: 85% B-cubed+ F-score on formal genres and 70% B-cubed+ F-score on informal genres
State-of-the-art Entity Discovery and Linking: 66% Discovery and Linking F-score, 73% Clustering CEAFm F-score
Remaining challenges:
– Popularity bias
– Requires better meaning representation
– Select collaborators from rich contexts
– Knowledge gap between source and KB
– Cross-lingual entity linking (the name translation problem)
Potential solutions:
– Deep knowledge acquisition and representation (e.g., AMR)
– Better graph search
alignment algorithms
– Make more people excited about Chinese and Spanish

Slot Filling Milestones
– 2009-2014: Top systems achieved 30%-40% F-measure
  • Ground truth is created by manual assessment of pooled system output – relative recall; scores may appear lower when the field of teams is stronger
  • 2014 queries are more challenging than 2013, including some ambiguous queries shared with entity linking (Stephen’s talk)
– Consistent progress for an individual system (RPI, tested on 2014 data): 2010: 20%; 2011: 22%; 2013: 28%; 2014: 34%
Successful methods:
– Multi-instance multi-label learning (Surdeanu et al., 2012)
– Combination of distant supervision with heuristic rules and patterns (Roth et al., 2013)
– Cross-source, cross-system inference (Chen et al., 2011; Yu et al., 2014)
– Linguistic constraints (Yu et al., 2014) – Heng’s one-week pencil-and-paper effort to semi-automatically acquire trigger phrases; an awfully simple trigger scoping method beat all 2013 systems

Have the Error Sources Changed over the Years?
[bar chart: distribution of error sources in 2010 (Min and Grishman, 2011) vs. 2014 (Yu and Ji, 2014)]

Blame Ourselves First…
– Non-verb and multi-word expressions as triggers
  • his men back to their compound
– Knowledge scarcity – the long tail
  • A suicide bomber detonated explosives at the entrance to a crowded
  • medical teams carting away dozens of wounded victims
  • Today I was let go from my job after working there for 4 1/2 years.
  • Possible solution: increase coverage with FrameNet (Li et al., 2014)
– Global context
  • I didn't want to hurt him. I miss him to death.
  • I threw stone out of the window. vs. I threw him out of the window.
  • Ellison to spend $10.3 billion to get his company.
  • We believe that the likelihood of them using those weapons goes up.
  • Fifteen people were killed and more than 30 wounded Wednesday as a suicide bomber blew himself up on a student bus in the northern town of Haifa
  • Possible solution: joint modeling of triggers and arguments (Li et al., 2013)

Then Blame Others…
Fundamental language problems – ambiguity and variety
Coreference, coreference, coreference…
25% of the examples involve coreference that is beyond current system capabilities, such as nominal anaphors and non-identity coreference:
– Almost overnight, he became fabulously rich, with a $3-million book deal, a $100,000 speech-making fee, and a lucrative multifaceted consulting business, Giuliani Partners. … His consulting partners included seven of those who were with him on 9/11, and in 2002 Alan Placa, his boyhood pal, went to work at the firm.
– After a successful karting career in Europe, Perera became part of the Toyota F1 Young Drivers Development Program and was a Formula One test driver for the Japanese company in 2006.
– “a woman charged with running a prostitution ring … her business, Pamela Martin and Associates”

Then Blame Others…
Paraphrase, paraphrase, paraphrase…
“employee/member”:
– Sutil, a trained pianist, tested for Midland in 2006 and raced for Spyker in 2007, where he scored one point in the Japanese Grand Prix.
– Daimler Chrysler reports 2004 profits of $3.3 billion; Chrysler earns $1.9 billion.
– In her second term, she received a seat on the powerful Ways and Means Committee
– Jennifer Dunn was the face of the Washington state Republican Party for more than two decades
– Buchwald lied about his age and escaped into the Marine Corps.
– By 1942, Peterson was performing with one of Canada's leading big bands, the Johnny Holmes Orchestra.
“spouse”:
– Buchwald's 1952 wedding -- Lena Horne arranged for it to be held in London's Westminster Cathedral -- was attended by Gene Kelly, John Huston, Jose Ferrer, Perle Mesta and Rosemary Clooney, to name a few

Then Blame Others…
Inference, inference, inference…
Systems would benefit from specialists that are able to reason about times, locations, family relationships, and employment relationships:
– People Magazine has confirmed that actress Julia Roberts has given birth to her third child, a boy named Henry Daniel Moder. Henry was born Monday in Los Angeles and weighed 8 lbs. Roberts, 39, and husband Danny Moder, 38, are already parents to twins Hazel and Phinnaeus, who were born in November…
– He [Pascal Yoadimnadji] has been evacuated to France on Wednesday after falling ill and slipping into a coma in Chad, Ambassador Moukhtar Wawa Dahab told The Associated Press. His wife, who accompanied Yoadimnadji to Paris, will repatriate his body to Chad, the amba. (Is he dead? In Paris?)
– Until last week, Palin was relatively unknown outside Alaska… (Does she live in Alaska?)
– The list says that the state is owed $2,665,305 in personal income taxes by singer Dionne Warwick of South Orange, N.J., with the tax lien dating back to 1997. (Does she live in NJ?)

Portability/Scalability Challenges

Defining the Problem
• Deep understanding of all possible relations?
• Open IE, pre-emptive IE, on-demand IE…
10/15/2014 DEFT PI meeting -- U. Washington

Defining the Problem
• Deep understanding of all possible relations?
• Deep Extraction for Focused Tasks (D.E.F.T.)
– User has a focused information need:
  • A few dozen relations, several entity types:
  • Date_of_birth(per, date), city_of_headquarters(org, city), …
  • Treatment(substance, condition), studies_disease(per/org, condition), …
  • Arrive_in(per, loc), meet_with(per, per), unveil(org, product), …
– Quickly train an extractor for the task
  • Domain independent: parsing, Open IE, SRL, …
  • Task specific: semantic tagging, extraction patterns, …
TAC-KBP
Freedman et al. Extreme Extraction -- Machine Reading in a Week. EMNLP 2011
Zhang et al. NewsSpike Event Extractor, in review

Aim for the Head?
[figure: a Zipfian distribution of the surface forms that express a textual relation (frequency vs. patterns) – the head is “dead simple”, the middle is “the real challenge”, the tail is “a hopeless case”]

Open IE for KBP
• Advantages of Open IE
– Robust
– Massively scalable
– Works out of the box
– Finds whatever relations are expressed in the text
– Not tied to an ontology of relations
• Disadvantages
– Finds whatever relations are expressed in the text
– Not tied to an ontology of relations
• Challenge
– Map Open IE to an ontology of relations
– Minimum of user effort
github/knowitall/openie

OpenIE–KBP Rule Language
(Arg1, Rel, Arg2) = (Smith, was appointed, Acting Director of Acme Corporation)
→ entity slotfill: per:employee_or_member_of (Smith, Acme Corporation)
Terms in Rule | Example
Target relation: | per:employee_or_member_of
Query entity in: | Arg1
Slotfill in: | Arg2
Slotfill type: | Organization
Arg1 terms: |
Relation terms: | appointed
Arg2 terms: | <JobTitle> of
10/15/2014 DEFT PI meeting -- U.
Washington

Hits the Head, but …
• High precision, average recall
• Limited recall from Open IE
– Good with verb-based relations
– Weak on noun-based relations
• “Implicit relation” patterns
– “Bashardost, 43, is …” → (Bashardost, [has age], 43)
– “… the Election Complaints Commission (ECC) …” → (Election Complaints Commission, [has acronym], ECC)
– “French journalist Jean LeGall reported that …” → (Jean LeGall, [has job title], journalist), (Jean LeGall, [has nationality], French)

NewsSpike Event Extractor
• Extracts event relations from news streams
– Event = event_phrase(arg1_type, arg2_type)
• NewsSpike = (entity1, entity2, date, {sentences})
– from parallel news streams
– Open IE identifies entity1, entity2, and the event phrase
– a spike in frequency on that date indicates an event between entity1 and entity2
• Automatically discovers relations not covered by Freebase
– arrive_in(person, location)
– beat(sports_team, sports_team)
– meet_with(person, person)
– nominate(person/politician, person)
– unveil(organization, product)
– …

NewsSpike Architecture
[diagram: Training phase – parallel news streams → discover events E = e(t1, t2) → generate training data, NewsSpikes NS = (a1, a2, d, S) with parallel sentences → learn an event extractor. Testing phase – test sentences s → extract s → E(a1, a2).]

High Quality Training
• Paraphrases in a NewsSpike give positive training examples
• Negative training from the temporal negation heuristic:
– If event phrases e1 and e2 are in the same NewsSpike
– and one of them is negated
– then e1 is probably not a paraphrase of e2
– “Team1 faces Team2” / “Team1 did not beat Team2” → face ≠ beat
• High precision from negative training
10/15/2014 DEFT PI meeting -- U.
Washington

High Precision Event Extractor
Doubles the area under the PR curve vs. Universal Schemas
[figure: precision–recall curves for NewsSpike-E2 on a news stream, Universal Schemas on a news stream, and Universal Schemas on NYT]
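The temporal negation heuristic for harvesting negative training pairs can be sketched in a few lines. This is a minimal illustration that assumes a NewsSpike has been reduced to its (event phrase, sentence) pairs; the published NewsSpike system is more involved.

```python
import re
from itertools import combinations

# Crude surface test for negation in a sentence.
NEG = re.compile(r"\b(?:not|never)\b|n't\b")

def is_negated(sentence):
    return bool(NEG.search(sentence))

def negative_pairs(newsspike_sentences):
    """Within one NewsSpike (same entity pair, same date), an event phrase
    from a negated sentence is probably NOT a paraphrase of one from a
    non-negated sentence -> emit such phrase pairs as negative examples."""
    pairs = []
    for (p1, s1), (p2, s2) in combinations(newsspike_sentences, 2):
        if is_negated(s1) != is_negated(s2):
            pairs.append((p1, p2))
    return pairs

# One NewsSpike: (Team1, Team2) mentioned in parallel news on the same date.
spike = [("face", "Team1 faces Team2 on Sunday"),
         ("beat", "Team1 did not beat Team2")]
print(negative_pairs(spike))
# -> [('face', 'beat')]
```

Because both sentences describe the same entity pair on the same date, the disagreement in polarity is evidence that "face" and "beat" denote different events, which is exactly the negative signal the slide's "face ≠ beat" example describes.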