Overview of Entity Discovery and Linking
Tasks at KBP2014
Heng Ji (RPI)
Joel Nothman, Ben Hachey (Univ. of Sydney)
Thanks to KBP2014 Organizing Committee
jih@rpi.edu
Goals and The Task
2
Overview
• Motivations
o The most popular EL Trend: Collective Inference - disambiguate
a set of relevant mentions simultaneously by leveraging the
global topical coherence between entities
o A lot of research has been done in parallel in the Wikification
community (Bunescu and Pasca, 2006) - extract prominent n-grams as
concept mentions, and link each concept mention to the KB
o One important research direction of KBP: “Cold-start”
• What’s New in 2014
o Extend English task to Entity Discovery and Linking (full Entity
Extraction + Entity Linking + NIL Clustering)
o Add discussion forums to Cross-lingual tracks
o Share some source collections and queries with regular and
cold-start slot filling tracks, to investigate the role of EDL in the
entire cold-start KBP pipeline
o Provide automatic annotations, reading list, software tools
3
Entity Mention Extraction
It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the “N”. Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997.
Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.
4
Clustering: Cross-doc Coreference Resolution
It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the “N”. Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997.
Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.
5
Linking: Disambiguation to KB
It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the “N”. Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997.
Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.
6
Evaluation Measures
• Added a type-matching variant to each measure
7
B-cubed: Precision
● Precision = sum of mention credits / #system-output-mentions
= (1/2 + 2/2 + 2/2 + 1/1 + 0)/6 = 0.583
[Figure: Gold Standard vs. System Output clusterings. Per-mention precision credits: mention 1: 1/2; mention 2: 2/2; mention 6: 2/2; mention 3: 1/1; mention 4: 0]
Legend: clusters group mentions together; color refers to kb_id; shape refers to entity type; number refers to doc_id + offset
B-cubed: Recall
● Recall = sum of mention credits / #gold-standard-mentions
= (1/3 + 2/3 + 2/3 + 1/2)/6 = 0.361
[Figure: Gold Standard vs. System Output clusterings. Per-mention recall credits: mention 1: 1/3; mention 2: 2/3; mention 6: 2/3; mention 3: 1/2; mention 4: 0]
Legend: clusters group mentions together; color refers to kb_id; shape refers to entity type; number refers to doc_id + offset
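A minimal sketch of how the per-mention credits above could be computed. The cluster assignments below are hypothetical: they were chosen only so that the credits match the numbers on the two slides (the figure itself is not recoverable), and the link check is a simplified stand-in for the official B-cubed+ scorer.

```python
def clusters_of(assignment):
    """Group mention ids by their cluster / KB id."""
    out = {}
    for mention, cluster_id in assignment.items():
        out.setdefault(cluster_id, set()).add(mention)
    return out


def bcubed_plus(gold, system):
    """Simplified B-cubed+ precision and recall.

    gold / system map mention id -> KB id; ids starting with "NIL"
    denote NIL clusters (an assumption of this sketch).
    """
    gold_c, sys_c = clusters_of(gold), clusters_of(system)

    def link_ok(m):
        g, s = gold[m], system[m]
        both_nil = g.startswith("NIL") and s.startswith("NIL")
        return both_nil or g == s

    def overlap(m):
        # No credit for spurious, missing, or wrongly linked mentions.
        if m not in gold or m not in system or not link_ok(m):
            return 0
        return len(gold_c[gold[m]] & sys_c[system[m]])

    precision = sum(overlap(m) / len(sys_c[system[m]]) for m in system) / len(system)
    recall = sum(overlap(m) / len(gold_c[gold[m]]) for m in gold) / len(gold)
    return precision, recall


# Hypothetical clusterings consistent with the credits on the slides.
gold = {1: "NIL1", 2: "NIL1", 6: "NIL1", 3: "E2", 5: "E2", 4: "E3"}
system = {1: "NIL_a", 7: "NIL_a", 2: "NIL_b", 6: "NIL_b", 3: "E2", 4: "E9"}

precision, recall = bcubed_plus(gold, system)
print(round(precision, 3), round(recall, 3))
```

Here mention 4 earns no credit because its (hypothetical) system link disagrees with the gold link, which is one way to obtain the "4: 0" entry in the figure.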
CEAF (Luo, 2005)
• Idea: a mention or entity should not be credited more
than once
• Formulated as a bipartite matching problem
o A special ILP problem
o Efficient algorithm: Kuhn-Munkres
CEAFm: Example
● Solid: best 1-1 alignment
● Recall = #common / #mentions-in-key = (2+1)/6 = 1/2
● Precision = #common / #mentions-in-response = (2+1)/6 = 1/2
[Figure: Gold Standard vs. System Output clusterings, with solid lines marking the best 1-1 alignment between clusters]
Legend: clusters group mentions together; color refers to kb_id; shape refers to entity type; number refers to doc_id + offset
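A small sketch of the CEAFm computation as a 1-1 cluster alignment. For brevity it searches all alignments by brute force rather than using Kuhn-Munkres, so it is only suitable for toy inputs. The clusters are hypothetical, picked to reproduce the (2+1)/6 figures above; the label "4x" stands for a system mention that does not match gold mention 4 (for example, because of a wrong link).

```python
from itertools import permutations


def ceaf_m(gold_clusters, sys_clusters):
    """CEAFm precision and recall via exhaustive 1-1 cluster alignment.

    Brute force over permutations; a real scorer would use the
    Kuhn-Munkres (Hungarian) algorithm instead.
    """
    n = max(len(gold_clusters), len(sys_clusters))
    # Pad the shorter side with empty clusters so every cluster has a partner.
    gold = list(gold_clusters) + [set()] * (n - len(gold_clusters))
    system = list(sys_clusters) + [set()] * (n - len(sys_clusters))

    # Best alignment = the permutation maximizing the total shared mentions.
    best_common = max(
        sum(len(g & system[j]) for g, j in zip(gold, perm))
        for perm in permutations(range(n))
    )
    n_gold = sum(len(c) for c in gold_clusters)
    n_sys = sum(len(c) for c in sys_clusters)
    return best_common / n_sys, best_common / n_gold


# Hypothetical clusters consistent with the slide's numbers.
gold_clusters = [{1, 2, 6}, {3, 5}, {4}]
sys_clusters = [{1, 7}, {2, 6}, {3}, {"4x"}]

ceaf_precision, ceaf_recall = ceaf_m(gold_clusters, sys_clusters)
print(ceaf_precision, ceaf_recall)
```

Because each cluster is aligned to at most one partner, gold cluster {1, 2, 6} gets credit only from {2, 6} and not also from {1, 7}, which is the "credited at most once" idea behind CEAF.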
Participants • EDL: 20 teams, 75 runs; EL: 17 teams, 55 runs
12
The Results
13
General Architecture

Feedback from linking to improve
extraction

New ranking algorithm:
Programming with Personalized
PageRank algorithm by
CohenCMU (Mazaitis et al., 2014)

A nice summary of the state-of-the-art ranking features by Tohoku NL
(Zhou et al., 2014)
14
Overall Performance: Extraction + Linking

Scoring: span, type and KB ID match

Systems with > 60% NERL F1 are significantly better than others (90%
confidence interval)
15
Overall Performance: Extraction + Clustering

Scoring: span, type and clustering

LCC and RPI systems are significantly better than others (90% confidence
interval)
16
Impact of Entity Mention Extraction
75%, much lower than state-of-the-art name tagging (89%)

NER: span; NERC: span_type; NERL: span_type_KBID; KBIDs: docid_KBID
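The span-based levels above (NER, NERC, NERL) can be read as set matching over progressively richer mention keys. The following is a sketch under that reading, not the official scorer; the mention tuples and their values are made up for illustration.

```python
def prf(gold, system, key):
    """Precision, recall and F1 over mentions compared by key(mention)."""
    gold_keys = {key(m) for m in gold}
    sys_keys = {key(m) for m in system}
    matched = len(gold_keys & sys_keys)
    p = matched / len(sys_keys) if sys_keys else 0.0
    r = matched / len(gold_keys) if gold_keys else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Hypothetical mentions: (doc_id, start, end, entity_type, kb_id)
gold = [
    ("d1", 0, 7, "ORG", "E1"),
    ("d1", 20, 26, "PER", "E2"),
    ("d2", 5, 12, "GPE", "NIL"),
]
system = [
    ("d1", 0, 7, "ORG", "E1"),    # correct span, type and link
    ("d1", 20, 26, "ORG", "E2"),  # correct span, wrong type
    ("d2", 5, 12, "GPE", "E7"),   # correct span and type, wrong link
    ("d2", 30, 35, "PER", "NIL"), # spurious mention
]

LEVELS = {
    "NER": lambda m: m[:3],   # span only
    "NERC": lambda m: m[:4],  # span + type
    "NERL": lambda m: m[:5],  # span + type + KB id
}
scores = {name: prf(gold, system, key) for name, key in LEVELS.items()}
for name, (p, r, f) in scores.items():
    print(name, round(p, 3), round(r, 3), round(f, 3))
```

Each stricter level can only keep or lose matches, so an extraction error caps every downstream score, which fits the observation that NER correlates well with NERL.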

NER (extraction) correlates with NERL (Extraction + Linking) well

Bug in IBM system
17
Diagnostic Entity Linking Performance
IBM is
somewhere
here too!

High performance with perfect entity mentions (70%-90%)
18
Entity Types and Textual Genres

Scoring: span, type
and linking

Easiest: persons and
discussion forum
19
Clustering Measures

B-cubed is very sensitive to mention extraction errors
20
Cross-lingual Entity Linking
B-cubed+ (%)
Query   | Team  | P    | R    | F
Spanish | HITS1 | 78.9 | 68.4 | 73.2
Spanish | IBM1  | 84.0 | 81.6 | 82.8
English | HITS1 | 68.4 | 60.3 | 64.1
English | IBM1  | 80.6 | 77.7 | 79.1

Both systems followed their English EL approaches

IBM achieved performance similar to that of the top English EDL system (although the
difficulty levels of the queries are not comparable)

Many Chinese teams chose to focus on English EDL (a cloned version in
NLPCC2014 organized by PKU)

Tri-lingual EDL in KBP2015
21
What’s New and What Works
- Or How to Make My Advisor Happy

A roller-coaster-style conversation 12 hours before this presentation…

R: I started to question why we are doing all of these…

H:  Please don’t tell me all of these are meaningless…

R: Did EDL produce any new science?

H: Of course! Blabla…blabla…blabla…blabla…and Blabla

R: You make me happy
22
Entity Linking Milestones







• 2006: The first definition of the Wikification task (Bunescu and Pasca, 2006)
• 2009: TAC-KBP Entity Linking launched (McNamee and Dang, 2009)
• 2008-2012: Supervised learning-to-rank with diverse levels of features, such as entity profiling and various popularity and similarity measures, was developed (Gao et al., 2010; Chen and Ji, 2011; Ratinov et al., 2011; Zheng et al., 2010; Dredze et al., 2010; Anastacio et al., 2011)
• 2008-2013: Collective inference and coherence measures were developed (Milne and Witten, 2008; Kulkarni et al., 2009; Ratinov et al., 2011; Chen and Ji, 2011; Ceccarelli et al., 2013; Cheng and Roth, 2013)
• 2012: Various applications, e.g., knowledge acquisition (via grounding), coreference resolution (Ratinov and Roth, 2012) and document classification (Vitale et al., 2012; Song and Roth, 2014; Gao et al., 2014)
• 2014: TAC-KBP Entity Discovery and Linking (end-to-end name tagging, cross-document entity clustering, entity linking)
• 2012-2014: Many different versions of international evaluations were inspired by TAC-KBP; more than 130 papers have been published
23
Joint Extraction and Linking

Some recent work (Sil and Yates, 2013; Meij et al., 2012; Guo et al., 2013;
Huang et al., 2014b) showed that extraction and linking can mutually enhance
each other



IBM (Sil and Florian, 2014), MSIIPL THU (Zhao et al., 2014), SemLinker
(Meurs et al., 2014), UBC (Barrena et al., 2014) and RPI (Hong et al., 2014)
used the properties in external KBs such as DBPedia as feedback to refine
the identification and classification of name mentions.





Bosch will provide the rear axle. → Robert Bosch Tool Corporation → ORG
Parker was 15 for 21 from the field, putting up a season high while scoring nine of San
Antonio’s final 10 points in regulation → San Antonio Spurs → ORG
RPI system successfully corrected 11.26% of wrong mentions
HITS team (Judea et al., 2014) proposed a joint approach that
simultaneously solves extraction, linking and clustering using Markov
Logic Networks
Document Linking → Event Extraction (Ji and Grishman, 2008)
Entity Linking → Relation Extraction (Chan and Roth, 2010)
Toward more interactions and joint inferences between tasks → Marry
EDL and SF in KBP2015
24
Entity Linking to Improve Relation Extraction
(Chan and Roth, 2010)
David Cone, a Kansas City native, was originally signed by the Royals and broke into the majors with the team
David Brian Cone (born January 2, 1963) is a former
Major League Baseball pitcher. He compiled an 8–3
postseason record over 21 postseason starts and was a
part of five World Series championship teams (1992 with
the Toronto Blue Jays and 1996, 1998, 1999 & 2000 with
the New York Yankees). He had a career postseason ERA
of 3.80. He is the subject of the book A Pitcher's Story:
Innings With David Cone by Roger Angell. Fans of David
are known as "Cone-Heads."
Cone lives in Stamford, Connecticut, and is formerly a
color commentator for the Yankees on the YES Network.[1]
Contents
[hide]
1 Early years
2 Kansas City Royals
3 New York Mets
Partly because of the resulting lack of leadership,
after the 1994 season the Royals decided to
reduce payroll by trading pitcher David Cone and
outfielder Brian McRae, then continued their
salary dump in the 1995 season. In fact, the team
payroll, which was always among the league's
highest, was sliced in half from $40.5 million in
1994 (fourth-highest in the major leagues) to $18.5
million in 1996 (second-lowest in the major
leagues)
25
Task-specific / Genre-specific Mention Extraction

Extraction for Linking
o 4% of entity mentions included nested mentions
o Posters in discussion forums should be extracted

HITS (Judea et al., 2014), LCC (Monahan et al., 2014), MSIIPL
THU (Zhao et al., 2014), NYU (Nguyen et al., 2014) and RPI
(Hong et al., 2014) developed heuristic rules to significantly
improve name tagging
26
Toward Deep Understanding of Full Documents

Old Query-driven Entity Linking
o Limited exploration of co-occurring entity mentions
o Bag-of-words style

New EDL Task
o Deep representation and understanding of the relations among entities in the source documents
o Natural Language Understanding style
o e.g., use Abstract Meaning Representation (details in RPI’s EDL talk)
27
Better Meaning Representation



It was a pool report typo. Here is exact Rhodes quote: ”this is not
gonna be a couple of weeks. It will be a period of days.”
At a WH briefing here in Santiago, NSA spox Rhodes came with a
litany of pushback on idea WH didn’t consult with Congress.
Rhodes singled out a Senate resolution that passed on March 1st
which denounced Khaddafy’s atrocities. WH says UN rez
incorporates it
Ben Rhodes
(Speech Writer)
28
Select Collaborators from Rich Context
Source:
No matter what, he never should have given Michael Jackson that
propofol. He seems to think a “proper” court would have let Murray go
free.
Social Relation
KB:
The trial of Conrad Murray was the American criminal trial of Michael
Jackson's personal physician, Conrad Murray.
29
Select Collaborators from Rich Context
Source:
Mubarak, the wife of deposed Egyptian President Hosni Mubarak,
…
wife
Family
KB:
Suzanne Mubarak (born 28 February 1941) is the wife of former
Egyptian President Hosni Mubarak…
30
Select Collaborators from Rich Context
Source:
Hundreds of protesters from various groups converged on the state
capitol in Topeka, Kansas today…
Second, I have a really hard time believing that there were any
ACTUAL “explosives” since the news story they link to talks about one
guy getting arrested for THREATENING Governor Brownback.
Employment
Sam Brownback
Peter Brownback
KB:
Sam Brownback was elected Governor of Kansas in 2010 and took
office in January 2011.
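As a rough illustration of why context helps here, a toy ranker can score each KB candidate by how many terms from the source context appear in its KB text. This is a deliberately naive sketch and not the method of any participating system; the function name, the chosen context terms, and the placeholder text for the second candidate are all assumptions.

```python
def rank_candidates(context_terms, candidates):
    """Rank KB candidates by the number of context terms found in their KB text."""
    scored = []
    for name, kb_text in candidates.items():
        text = kb_text.lower()
        score = sum(1 for term in context_terms if term.lower() in text)
        scored.append((score, name))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


# Context terms taken from the source passage on this slide.
context_terms = ["Governor", "Kansas", "Topeka"]

candidates = {
    "Sam Brownback": (
        "Sam Brownback was elected Governor of Kansas in 2010 and took "
        "office in January 2011."
    ),
    # Placeholder: the slides do not give KB text for this candidate.
    "Peter Brownback": "(no KB text shown in the slides)",
}

ranking = rank_candidates(context_terms, candidates)
print(ranking)
```

Real systems replace the substring test with typed relations between co-occurring mentions (employment, family, part-whole, and so on), which is what the slide labels point at.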
31
Select Collaborators from Rich Context
Source:
AT&T coverage in GA is good along the interstates and in the major
cities like Atlanta, Athens, Rome, Roswell and Albany.
Rome, Georgia
Part-whole
Rome, Italy
KB:
At the 2010 census, Rome had a total population of 36,303, and is the
largest city in Northwest [Georgia] and the 19th largest city in the state.
32
Select Collaborators from Rich Context
Source:
Going into the big Super Tuesday, Romney had won the most votes,
states and delegates, Santorum had won some contests and was
second, Gingrich had only one contest.
Start-position Event
George W. Romney
Mitt Romney
KB:
The Super Tuesday primaries took place on March 6. Mitt Romney
carried six states, Rick Santorum carried three, and Newt Gingrich won
only in his home state of Georgia.
33
Graph-based NIL Entity Clustering

Bad News in EL2012


CUNY-BLENDER (Tamang et al., 2012) explored more than 40
clustering algorithms and found that advanced graph-based clustering
algorithms did not significantly outperform a single baseline “All-in-one” clustering algorithm on the overall queries (except the most
difficult ones)
Good News in EDL2014

LCC (Monahan et al., 2014) showed that a graph-partition-based
algorithm achieved significant gains.
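For reference, a sketch of what an “All-in-one” style baseline might look like, assuming it means that all NIL mentions sharing the same name string are placed in one cluster. That reading, the normalization step, and the sample mentions are assumptions of this sketch, not details given in the slides.

```python
from collections import defaultdict


def all_in_one(nil_mentions):
    """Cluster NIL mentions by their (case-normalized) name string."""
    clusters = defaultdict(list)
    for mention_id, name in nil_mentions:
        clusters[name.strip().lower()].append(mention_id)
    return dict(clusters)


# Hypothetical NIL mentions: (mention id, name string)
nil_mentions = [("m1", "Chicago"), ("m2", "chicago"), ("m3", "Rome")]

clusters = all_in_one(nil_mentions)
print(clusters)
```

A baseline like this cannot separate two different unknown entities with the same name, which is where graph-based methods would be expected to help.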
34
Remaining Challenges
35
Name Tagging: “Old” Milestones
Year | Tasks & Resources | Methods | F-Measure | Example References
1966 | - | First person name tagger with punch card; 30+ decision tree type rules | - | (Borkowski et al., 1966)
1998 | MUC-6 | MaxEnt with diverse levels of linguistic features | 97.12% | (Borthwick and Grishman, 1998)
2003 | CONLL | System combination; sequential labeling with Conditional Random Fields | 89% | (Florian et al., 2003; McCallum et al., 2003; Finkel et al., 2005)
2006 | ACE | Diverse levels of linguistic features, re-ranking, joint inference | ~89% | (Florian et al., 2006; Ji and Grishman, 2006)

Our progress compared to 1966:
o More data, a few more features and fancier learning algorithms
o Not much active work after ACE because we tend to believe it’s a solved problem…
36
Cross-genre Name Tagging

Experiments on ACE2005 data
37
What’s Wrong?

Name taggers are getting old (trained from 2003 news & test on 2012 news)

Genre adaptation (informal contexts, posters)

Revisit the definition of name mention – extraction for linking

Old unsolved problems


Identification: “Asian Pulp and Paper Joint Stock Company , Lt. of Singapore”

Classification: “FAW has also utilized the capital market to directly finance,…” (FAW =
First Automotive Works)
Potential Solutions for Quality

Word clustering, Lexical Knowledge Discovery (Brown, 1992; Ratinov and Roth, 2009; Ji
and Lin, 2010)

Feedback from Linking, Relation, Event (Sil and Yates, 2013; Li and Ji, 2014)
38
Remaining Challenges for Linking


Remaining Challenges

Popularity bias

Knowledge gap between source and KB

Commonsense Knowledge
Potential Solutions

Deep knowledge acquisition and representation (e.g.,
AMR)

Better graph search alignment algorithms

Make more people excited about Chinese and Spanish by
providing more resources → Tri-lingual EDL in KBP2015
39
Popularity Bias
If you are called Michael Jordan…
A Little Better…
Knowledge Gap between Source and KB
Source: breaking news/new
information/rumors
KB: bio, summary, snapshot of life
Christies denial of marriage privledges
to gays will alienate independents and
his “I wanted to have the people vote
on it” will ring hollow.
Christie has said that he favoured New Jersey's
law allowing same-sex couples to form civil
unions, but would veto any bill legalizing same-sex marriage in New Jersey
Translation out of hype-speak: some
kook made threatening noises at
Brownback and go arrested
Samuel Dale "Sam" Brownback (born
September 12, 1956) is an American politician,
the 46th and current Governor of Kansas.
Connect/Sort
Background Knowledge
42
Man Accused Of Making Threatening
Phone Call To Kansas Gov. Sam
Brownback May Face Felony Charge
Commonsense Knowledge
2008-07-26
During talks in Geneva attended by William J. Burns Iran refused to respond to Solana’s
offers.
William_J._Burns (1861-1932)
William_Joseph_Burns (1956- )
43
Conclusions and Looking Forward


• The new EDL task has attracted much interest from the KBP community and produced some interesting research problems and new directions
• KBP2015
o Improve the annotation guidelines and annotation quality of the training and evaluation data sets
o Develop more open-source software, data and resources for Spanish and Chinese EDL
o Encourage researchers to revisit the entity mention extraction problem in the new cold-start KBP setting
o Propose a new tri-lingual EDL task on a source collection in three languages: English, Chinese and Spanish
o Investigate the impact of EDL on the end-to-end cold-start KBP framework; joint inference between EDL and SF
44
We can do it!
45