Relation Clustering, Entity Linking

Knowledge Gaps for Entity Linking
Heng Ji
Outline
• Relation Clustering
• Remaining Challenges for Entity Linking
Relation Clustering: The paper from ten years
ago (Hasegawa et al., 2004)
Relation Discovery Overview
• Assume that pairs of entities occurring in similar contexts can
be clustered, and that each pair in a cluster is an instance of the
same relation.
1. Tag NE in text corpora
2. Get co-occurrence pairs of NE and their context
3. Measure context similarities among pairs of NEs.
4. Make clusters of pairs of NEs.
5. Label each cluster of pairs of NEs.
• Run the NE tagger and collect all context words within a certain distance.
If the context words of the pairs A-B and C-D are similar, the two
pairs are placed in the same cluster (i.e., the same relation); in
this example, the relation is merger and acquisition.
Relation Discovery
• NE tagging: use the extended NE tagger (Sekine, 2001) to
detect useful relations.
• Collect the intervening words between two NEs for each co-occurrence.
o Two NEs are considered to co-occur if they appear within the
same sentence and are separated by at most N intervening words.
o Different orders are considered as different contexts. That is,
e1…e2 and e2…e1 are collected as different contexts.
o Passive voice: collect the base forms of words, as stemmed
by a POS tagger, but keep verb past participles distinct
from other verb forms.
• Less frequent pairs of NEs should be eliminated.
o Set a frequency threshold
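The collection step above can be sketched roughly as follows. This is a minimal illustration, not the original system: the input format (token lists plus `(start, end, name)` entity spans), the function names, and the default values for `max_gap` (the slide's N) and `min_count` are all assumptions, and the base-form/participle handling is left out.

```python
from collections import defaultdict


def collect_contexts(sentences, max_gap=5):
    """Map each NE pair to its list of (direction, intervening words).

    `sentences` is a list of (tokens, entities); each entity is a
    (start, end, name) token span with `end` exclusive. This input
    format is hypothetical, chosen only for the sketch.
    """
    contexts = defaultdict(list)
    for tokens, entities in sentences:
        ordered = sorted(entities)  # left to right by start offset
        for i, (_, end1, name1) in enumerate(ordered):
            for start2, _, name2 in ordered[i + 1:]:
                between = tokens[end1:start2]
                if name1 == name2 or len(between) > max_gap:
                    continue
                # One key per unordered pair; the surface order is kept
                # as a direction tag so e1...e2 and e2...e1 stay distinct.
                pair = tuple(sorted((name1, name2)))
                direction = "e1...e2" if pair[0] == name1 else "e2...e1"
                contexts[pair].append((direction, between))
    return dict(contexts)


def drop_rare_pairs(contexts, min_count=2):
    """Keep only NE pairs that co-occur at least `min_count` times."""
    return {p: c for p, c in contexts.items() if len(c) >= min_count}


# Toy input with made-up company names.
sentences = [
    ("Acme has agreed to buy Globex".split(), [(0, 1, "Acme"), (5, 6, "Globex")]),
    ("Globex was bought by Acme".split(), [(0, 1, "Globex"), (4, 5, "Acme")]),
    ("Initech praised Acme".split(), [(0, 1, "Initech"), (2, 3, "Acme")]),
]
frequent = drop_rare_pairs(collect_contexts(sentences))
print(frequent)  # only the (Acme, Globex) pair survives the threshold
```

Keying on the unordered pair while tagging each context with its direction keeps the frequency count per pair and still preserves the order information the slide asks for.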
Relation Discovery
• Calculate similarity between the set of contexts of NE pairs.
o Vector space model and cosine similarity
o Only compare NE pairs which have the same types, e.g., one
PERSON-GPE pair and another PERSON-GPE pair.
o Eliminate stop words, words in parallel expressions, and
expressions peculiar to particular source documents.
• A context vector for each NE pair consists of the bag of words formed
from all intervening words from all co-occurrences of two NEs.
o Different orders: if a word w_i occurs L times in e1…e2 contexts and M times
in e2…e1 contexts, its term frequency tf_i is defined as L − M.
o If the norm |α| is small due to the lack of context words, the
similarity might be unreliable, so define a threshold to eliminate
short context vectors.
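A small sketch of the context vector and similarity described above, under stated assumptions: contexts arrive as `(direction, words)` tuples (a hypothetical format), the weights are the raw signed term frequencies L − M with no further weighting, and the `min_norm` cutoff value is made up for illustration.

```python
import math
from collections import Counter


def context_vector(contexts, stop_words=frozenset()):
    """Bag of intervening words with signed term frequency L - M.

    `contexts` is a list of (direction, words), where direction is
    "e1...e2" or "e2...e1" (a hypothetical input format).
    """
    tf = Counter()
    for direction, words in contexts:
        sign = 1 if direction == "e1...e2" else -1
        for word in words:
            if word not in stop_words:
                tf[word] += sign
    # Words whose two directions cancel out carry no weight.
    return {w: c for w, c in tf.items() if c != 0}


def norm(vec):
    return math.sqrt(sum(v * v for v in vec.values()))


def cosine(a, b, min_norm=1.0):
    """Cosine similarity, or None when either vector is too short."""
    na, nb = norm(a), norm(b)
    if na < min_norm or nb < min_norm:
        return None  # too little context for a reliable similarity
    dot = sum(v * b.get(w, 0) for w, v in a.items())
    return dot / (na * nb)


a = context_vector([
    ("e1...e2", ["buy"]),
    ("e1...e2", ["buy", "acquire"]),
    ("e2...e1", ["buy"]),
])  # buy: 2 - 1 = 1, acquire: 1 - 0 = 1
b = context_vector([("e1...e2", ["acquire"]), ("e1...e2", ["acquire"])])
print(cosine(a, b))
```

One consequence of the signed frequencies is that two pairs whose contexts match only in opposite directions get a negative similarity, so they are kept apart.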
Relation Discovery
• We can cluster the NE pairs based on the similarity between
their context vectors.
o We do not know the # of clusters in advance so we adopt
hierarchical clustering.
o Using complete linkage
• Label each cluster with the word that occurs most frequently across all
combinations of the NE pairs in that cluster.
o The frequencies are normalized.
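The clustering and labeling steps might look like the sketch below. Several things here are assumptions rather than details from the slides: the stopping `threshold`, the toy Jaccard similarity used in the demo (the slides use cosine over context vectors), counting words shared by both pairs in a combination, and normalizing by the number of combinations.

```python
from collections import Counter
from itertools import combinations


def complete_linkage(items, sim, threshold):
    """Agglomerative clustering with complete linkage.

    Two clusters are as similar as their LEAST similar member pair;
    merging stops once the best such score falls below `threshold`.
    """
    clusters = [[x] for x in items]
    while len(clusters) > 1:
        best, best_pair = None, None
        for i, j in combinations(range(len(clusters)), 2):
            s = min(sim(a, b) for a in clusters[i] for b in clusters[j])
            if best is None or s > best:
                best, best_pair = s, (i, j)
        if best < threshold:
            break
        i, j = best_pair  # i < j, so deleting j leaves index i valid
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def label_cluster(cluster, words_of):
    """Most frequent shared word over all combinations of NE pairs,
    with its count divided by the number of combinations (assumed)."""
    combos = list(combinations(cluster, 2))
    counts = Counter()
    for a, b in combos:
        counts.update(words_of(a) & words_of(b))
    if not counts:
        return None
    word = min(counts, key=lambda w: (-counts[w], w))  # ties: alphabetical
    return word, counts[word] / len(combos)


# Toy data: context word sets for four made-up NE pairs.
words = {
    "A-B": {"buy", "acquire"},
    "C-D": {"acquire", "merge"},
    "E-F": {"acquire", "buy"},
    "G-H": {"visit"},
}


def jaccard(p, q):
    return len(words[p] & words[q]) / len(words[p] | words[q])


clusters = complete_linkage(list(words), jaccard, threshold=0.3)
for c in clusters:
    print(c, label_cluster(c, words.__getitem__))
```

Complete linkage is conservative: a pair joins a cluster only if it is similar enough to every member, which suits the case where the number of relations is unknown and a similarity cutoff decides where merging stops.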
Discussions
• How will embeddings play a role here?
• Did/Will we make fundamental changes to this “old”
framework?
Outline
• Relation Clustering
• Remaining Challenges for Entity Linking (10% Errors for News
and 15% Errors for Social Media)
Entity Linking
It’s a version of Chicago – the
standard classic Macintosh
menu font, with that distinctive
thick diagonal in the ”N”.
Chicago was used by default
for Mac menus through
MacOS 7.6, and OS 8 was
released mid-1997.
Chicago VIII was one of the
early 70s-era Chicago
albums to catch my
ear, along with Chicago II.
Knowledge Gap between Source and KB
Source: breaking news / new information / rumor
KB: bio, summary, snapshot of life
• Source: According to Darwin it is the Males who do the vamping.
o KB: Charles Robert Darwin was an English naturalist and geologist best known for his contributions to evolutionary theory.
• Source: I had no idea the victim in the Jackson cases was publicized.
o KB: In the summer of 1993, Jackson was accused of child sexual abuse by a 13-year-old boy named Jordan Chandler and his father, Dr. Evan Chandler, a dentist.
• Source: I went to youtube and checked out the Gulf oil crisis: all of the posts are one month old, or older…
o KB: On April 20, 2010, the Deepwater Horizon oil platform, located in the Mississippi Canyon about 40 miles (64 km) off the Louisiana coast, suffered a catastrophic explosion; it sank a day-and-a-half later.
Fill in the Gap with Background Knowledge
Source: breaking news / new information / rumors
KB: bio, summary, snapshot of life
• Source: Christies denial of marriage privledges to gays will alienate independents and his “I wanted to have the people vote on it” will ring hollow.
o KB: Christie has said that he favoured New Jersey's law allowing same-sex couples to form civil unions, but would veto any bill legalizing same-sex marriage in New Jersey.
• Source: Translation out of hype-speak: some kook made threatening noises at Brownback and go arrested
o KB: Samuel Dale "Sam" Brownback (born September 12, 1956) is an American politician, the 46th and current Governor of Kansas.
o Background knowledge (connect/sort): Man Accused Of Making Threatening Phone Call To Kansas Gov. Sam Brownback May Face Felony Charge
Knowledge Synthesis
The Stockholm Institute stated that 23 of 25 major armed conflicts in the world in 2000
occurred in impoverished nations.
Stockholm_International_Peace_Research_Institute
Stockholm_Institute_of_Education
Morphs
They passed a bill, and Christie the Hutt decides he's stull sucking up to be RomBot's
running mate.
Christie the Hutt → Chris Christie
RomBot → Mitt Romney
Commonsense Knowledge
2008-07-26
During talks in Geneva attended by William J. Burns Iran refused to respond to Solana’s
offers.
William_J._Burns (1861-1932)
William_Joseph_Burns (1956- )
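One way to encode the commonsense step in this example is a lifespan check: someone attending talks on the document date has to be alive then. The sketch below uses the years shown on the slide; the candidate record layout and function names are hypothetical, and the heuristic would wrongly discard historical figures who are merely mentioned in a text.

```python
from datetime import date


def alive_on(candidate, doc_date):
    """True if the document year falls within the candidate's lifespan."""
    born, died = candidate["born"], candidate["died"]
    return born <= doc_date.year and (died is None or doc_date.year <= died)


def filter_by_lifespan(candidates, doc_date):
    """Keep the KB titles of candidates who could take part in the event."""
    return [c["title"] for c in candidates if alive_on(c, doc_date)]


# Candidate records built from the years on the slide (layout is assumed).
candidates = [
    {"title": "William_J._Burns", "born": 1861, "died": 1932},
    {"title": "William_Joseph_Burns", "born": 1956, "died": None},
]
print(filter_by_lifespan(candidates, date(2008, 7, 26)))
# ['William_Joseph_Burns']
```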
Commonsense Knowledge
The petition demanded the introduction of a parliament elected by all adults - men and
women in Saudi Arabia.
Consultative Assembly of Saudi_Arabia
World Knowledge
Millions of Americans went to war for America, and came back broken or otherwise gave up
a lot, and now we look to take a huge chunk of their hide because Washington no longer
works.
Federal government of the United States
Collective Inference: What We've Done Before (Pan et al., 2015)
• Entity mentions involved in AMR conjunction relations should be
linked jointly to KB; their candidates in KB should also be strongly
connected to each other with high semantic relatedness
o “and”, “or”, “contrast-01”, “either”, “compared to”, “prep along with”,
“neither”, “slash”, “between” and “both”
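The idea of linking conjoined mentions jointly can be illustrated with a brute-force sketch: score every combination of candidates by per-mention prior plus pairwise relatedness and keep the best one. This is not the method of Pan et al. (2015); the scoring function, the `weight` parameter, and all numbers below are invented for illustration.

```python
from itertools import combinations, product


def link_jointly(candidates, prior, relatedness, weight=1.0):
    """Pick one KB entry per mention, maximizing prior + coherence.

    `candidates` maps mention -> list of KB titles, `prior` maps
    (mention, title) -> score, and `relatedness(a, b)` scores a pair
    of titles. Exhaustive search, so only suitable for a few mentions.
    """
    mentions = list(candidates)
    best_score, best = float("-inf"), None
    for choice in product(*(candidates[m] for m in mentions)):
        local = sum(prior[(m, e)] for m, e in zip(mentions, choice))
        coherence = sum(relatedness(a, b) for a, b in combinations(choice, 2))
        score = local + weight * coherence
        if score > best_score:
            best_score, best = score, dict(zip(mentions, choice))
    return best


# Toy case: "Chicago and Boston", where both names are cities and bands.
candidates = {
    "Chicago": ["Chicago", "Chicago_(band)"],
    "Boston": ["Boston", "Boston_(band)"],
}
prior = {  # made-up per-mention scores
    ("Chicago", "Chicago"): 0.6,
    ("Chicago", "Chicago_(band)"): 0.4,
    ("Boston", "Boston"): 0.45,
    ("Boston", "Boston_(band)"): 0.55,
}


def same_kind(a, b):
    """Made-up relatedness: 0.5 when both are bands or both are cities."""
    return 0.5 if a.endswith("(band)") == b.endswith("(band)") else 0.0


print(link_jointly(candidates, prior, same_kind))
# {'Chicago': 'Chicago', 'Boston': 'Boston'}
```

Linked independently, "Boston" would go to the band because of its higher prior; the coherence term pulls both mentions to the city reading.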
Collective Inference: Beyond Sentence and Beyond Syntax
I think Mitt drops out...
Ok, my answer is no one and Obama wins the GE.