Knowledge Harvesting
from Text and Web Sources
Part 3: Knowledge Linking
Gerhard Weikum
Max Planck Institute for Informatics
http://www.mpi-inf.mpg.de/~weikum/
Quiz Time
How many days do you need to visit
all Shangri-La places on this planet?
Source: geonames.org
Answer: 365
3-2
Quiz Time
How many days do you need to visit
all Shangri-La places on this planet?
3-3
Linked Data: RDF Triples on the Web
30 billion triples
500 million links
http://richard.cyganiak.de/2007/10/lod/lod-datasets_2011-09-19_colored.png
3-4
Linked RDF Triples on the Web
yago/wordnet: Artist109812338
yago/wordnet:Actor109765278
yago/wikicategory:ItalianComposer
imdb.com/name/nm0910607/
dbpedia.org/resource/Ennio_Morricone
imdb.com/title/tt0361748/
dbpedia.org/resource/Rome
rdf.freebase.com/ns/en.rome
data.nytimes.com/51688803696189142301
geonames.org/5134301/city_of_rome
N 43° 12' 46'' W 75° 27' 20''
3-5
Linked RDF Triples on the Web
yago/wordnet: Artist109812338
yago/wordnet:Actor109765278
yago/wikicategory:ItalianComposer
imdb.com/name/nm0910607/
dbpedia.org/resource/Ennio_Morricone
imdb.com/title/tt0361748/
dbpedia.org/resource/Rome
rdf.freebase.com/ns/en.rome_ny
?
?
data.nytimes.com/51688803696189142301
Referential data quality?
Hand-crafted sameAs links?
generated sameAs links?
?
geonames.org/5134301/city_of_rome
N 43° 12' 46'' W 75° 27' 20''
3-6
RDF Entities on the Web
http://sig.ma
3-7
RDF Entities on the Web
http://sig.ma
3-8
Entity-Name Ambiguity
http://sameas.org
3-9
Entities in HTML
http://sindice.com
3-10
http://schema.org/
Entity Markup in HTML:
Towards Standardized Microformats
3-11
http://schema.org/
Entity Markup in HTML:
Towards Standardized Microformats
3-12
Web Page in Standard HTML
http://schema.org/
Jane Doe
<img src="janedoe.jpg" />
Professor
20341 Whitworth Institute
405 Whitworth
Seattle WA 98052
(425) 123-4567
<a href="mailto:jane-doe@xyz.edu">jane-doe@xyz.edu</a>
Jane's home page:
<a href="http://www.janedoe.com">janedoe.com</a>
Graduate students:
<a href="http://www.xyz.edu/students/alicejones.html">Alice Jones</a>
<a href="http://www.xyz.edu/students/bobsmith.html">Bob Smith</a>
3-13
Web Page in HTML with Microdata
<div itemscope itemtype="http://schema.org/Person">
<span itemprop="name">Jane Doe</span>
<img src="janedoe.jpg" itemprop="image" />
<span itemprop="jobTitle">Professor</span>
<div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
<span itemprop="streetAddress">
20341 Whitworth Institute
405 N. Whitworth
</span>
<span itemprop="addressLocality">Seattle</span>,
<span itemprop="addressRegion">WA</span>
<span itemprop="postalCode">98052</span>
</div>
<span itemprop="telephone">(425) 123-4567</span>
<a href="mailto:jane-doe@xyz.edu" itemprop="email">
jane-doe@xyz.edu</a>
Jane's home page:
<a href="http://www.janedoe.com" itemprop="url">janedoe.com</a>
Graduate students:
<a href="http://www.xyz.edu/students/alicejones.html" itemprop="colleague">
Alice Jones</a>
<a href="http://www.xyz.edu/students/bobsmith.html" itemprop="colleague">
Bob Smith</a>
</div>
3-14
Web-of-Data vs. Web-of-Contents
Critical for knowledge linkage:
entity name ambiguity
→ more structured data combined with text
→ boosted by knowledge harvesting methods
3-15
Embedding RDFa in Web Contents
<html …>  May 2, 2011
<div typeof=event:music>
Maestro Morricone will perform on the stage of the Smetana Hall
to conduct the Czech National Symphony Orchestra and Choir.
The concert will feature both classical compositions and
soundtracks such as the Ecstasy of Gold.
In programme two concerts for July 14th and 15th.
</div>

Embedded RDFa annotations:
<span id="Maestro_Morricone"> Maestro Morricone
  <a rel="sameAs" resource="dbpedia…/Ennio_Morricone"/> </span>
…
<span property="event:location"> Smetana Hall </span>
…
<span property="rdf:type" resource="yago:performance"> The concert </span> will feature
…
<span property="event:date" content="14-07-2011"> July 14th </span>

RDF data and Web contents need to be interconnected
→ RDFa & microformats provide the mechanism
Need ways of creating more embedded RDF triples!
3-16
Outline
✓ Motivation
• Entity-Name Disambiguation
• Mapping Questions into Queries
• Entity Linkage
• Wrap-up
…
3-17
Named-Entity Disambiguation
Harry fought with you know who. He defeats the dark lord.
Candidate entities: Dirty Harry, Harry Potter, Prince Harry of England,
The Who (band), Lord Voldemort
Three NLP tasks:
1) named-entity detection: segment & label by HMM or CRF
(e.g. Stanford NER tagger)
2) co-reference resolution: link to preceding NP
(trained classifier over linguistic features)
3) named-entity disambiguation:
map each mention (name) to canonical entity (entry in KB)
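A minimal sketch of step 1, mention detection, using an off-the-shelf statistical NER tagger (here spaCy as a stand-in; the slides mention the Stanford NER tagger, which would work analogously; assumes spaCy and its small English model are installed):

```python
# Sketch: detect entity mentions in a sentence with an off-the-shelf NER tagger.
# Assumes: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Harry fought with you know who. He defeats the dark lord.")

# Each mention is a text span with a coarse label (PERSON, ORG, ...);
# disambiguation to a canonical KB entity is a separate, later step.
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
```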
3-18
Named Entity Disambiguation
Mentions (surface names):
Sergio talked to Ennio about Eli's role in the Ecstasy scene.
This sequence on the graveyard was a highlight in Sergio's trilogy of western films.

Candidate entities (meanings) in the KB, e.g.:
Eli (bible), Eli Wallach, Ecstasy (drug), Ecstasy of Gold,
Benny Goodman, Benny Andersson, Star Wars Trilogy, Lord of the Rings, Dollars Trilogy

KB name-entity dictionary:
Sergio means Sergio_Leone
Sergio means Serge_Gainsbourg
Ennio means Ennio_Antonelli
Ennio means Ennio_Morricone
Eli means Eli_(bible)
Eli means ExtremeLightInfrastructure
Eli means Eli_Wallach
Ecstasy means Ecstasy_(drug)
Ecstasy means Ecstasy_of_Gold
trilogy means Star_Wars_Trilogy
trilogy means Lord_of_the_Rings
trilogy means Dollars_Trilogy
3-19
Mention-Entity Graph
weighted undirected graph with two types of nodes
Mention context (bag-of-words or language model: words, bigrams, phrases):
Sergio talked to Ennio about Eli's role in the Ecstasy scene.
This sequence on the graveyard was a highlight in Sergio's trilogy of western films.

Candidate entities (from KB+Stats): Eli (bible), Eli Wallach, Ecstasy (drug),
Ecstasy of Gold, Star Wars, Lord of the Rings, Dollars Trilogy

Edge weights:
Popularity (m,e): freq(e|m), length(e), #links(e)
Similarity (m,e): cos/Dice/KL (context(m), context(e))
3-20
Mention-Entity Graph
weighted undirected graph with two types of nodes
Example text (mentions):
Sergio talked to Ennio about Eli's role in the Ecstasy scene.
This sequence on the graveyard was a highlight in Sergio's trilogy of western films.

Candidate entities (from KB+Stats): Eli (bible), Eli Wallach, Ecstasy (drug),
Ecstasy of Gold, Star Wars, Lord of the Rings, Dollars Trilogy
→ joint mapping of all mentions to entities

Edge weights:
Popularity (m,e): freq(e|m), length(e), #links(e)
Similarity (m,e): cos/Dice/KL (context(m), context(e))
3-21
Mention-Entity Graph
weighted undirected graph with two types of nodes
Example text (mentions):
Sergio talked to Ennio about Eli's role in the Ecstasy scene.
This sequence on the graveyard was a highlight in Sergio's trilogy of western films.

Candidate entities (from KB+Stats): Eli (bible), Eli Wallach, Ecstasy (drug),
Ecstasy of Gold, Star Wars, Lord of the Rings, Dollars Trilogy

Edge weights:
Popularity (m,e): freq(e|m), length(e), #links(e)
Similarity (m,e): cos/Dice/KL (context(m), context(e))
Coherence (e,e'): dist(types), overlap(links), overlap(anchor words)
3-22
Mention-Entity Graph
weighted undirected graph with two types of nodes
Example text (mentions):
Sergio talked to Ennio about Eli's role in the Ecstasy scene.
This sequence on the graveyard was a highlight in Sergio's trilogy of western films.

Candidate entities with their semantic types (from KB+Stats), e.g.:
Eli Wallach: American Jews, film actors, artists, Academy Award winners
Ecstasy of Gold: Metallica songs, Ennio Morricone songs, artifacts, soundtrack music
Dollars Trilogy: spaghetti westerns, film trilogies, movies, artifacts
(plus Eli (bible), Ecstasy (drug), Star Wars, Lord of the Rings)

Edge weights:
Popularity (m,e): freq(e|m), length(e), #links(e)
Similarity (m,e): cos/Dice/KL (context(m), context(e))
Coherence (e,e'): dist(types), overlap(links), overlap(anchor words)
3-23
Mention-Entity Graph
weighted undirected graph with two types of nodes
Example text (mentions):
Sergio talked to Ennio about Eli's role in the Ecstasy scene.
This sequence on the graveyard was a highlight in Sergio's trilogy of western films.

Candidate entities with their incoming links (from KB+Stats), e.g.:
Eli Wallach ← http://.../wiki/Dollars_Trilogy, http://.../wiki/The_Good,_the_Bad,_…,
  http://.../wiki/Clint_Eastwood, http://.../wiki/Honorary_Academy_A…
Ecstasy of Gold ← http://.../wiki/The_Good,_the_Bad,_…, http://.../wiki/Metallica,
  http://.../wiki/Bellagio_(casino), http://.../wiki/Ennio_Morricone
Dollars Trilogy ← http://.../wiki/Sergio_Leone, http://.../wiki/The_Good,_the_Bad,_…,
  http://.../wiki/For_a_Few_Dollars_M…, http://.../wiki/Ennio_Morricone

Edge weights:
Popularity (m,e): freq(e|m), length(e), #links(e)
Similarity (m,e): cos/Dice/KL (context(m), context(e))
Coherence (e,e'): dist(types), overlap(links), overlap(anchor words)
3-24
Mention-Entity Graph
weighted undirected graph with two types of nodes
Example text (mentions):
Sergio talked to Ennio about Eli's role in the Ecstasy scene.
This sequence on the graveyard was a highlight in Sergio's trilogy of western films.

Candidate entities with characteristic keyphrases (from KB+Stats), e.g.:
Eli Wallach: The Magnificent Seven, The Good, the Bad, and the Ugly, Clint Eastwood,
  University of Texas at Austin
Ecstasy of Gold: Metallica on Morricone tribute, Bellagio water fountain show, Yo-Yo Ma,
  Ennio Morricone composition
Dollars Trilogy: For a Few Dollars More, The Good, the Bad, and the Ugly,
  Man with No Name trilogy, soundtrack by Ennio Morricone

Edge weights:
Popularity (m,e): freq(e|m), length(e), #links(e)
Similarity (m,e): cos/Dice/KL (context(m), context(e))
Coherence (e,e'): dist(types), overlap(links), overlap(anchor words)
3-25
Different Approaches
Combine Popularity, Similarity, and Coherence Features
(Cucerzan: EMNLP‘07, Milne/Witten: CIKM‘08):
• for sim (context(m), context(e)):
consider surrounding mentions
and their candidate entities
• use their types, links, anchors
as features of context(m)
• set m-e edge weights accordingly
• use greedy methods for solution
Collective Learning with Prob. Factor Graphs
(Chakrabarti et al.: KDD‘09):
• model P[m|e] by similarity and P[e1|e2] by coherence
• consider likelihood of P[m1 … mk | e1 … ek]
• factorize by all m-e pairs and e1-e2 pairs
• use hill-climbing, LP, etc. for solution
3-26
Joint Mapping
(figure: mention-entity graph with weighted edges, e.g. weights 5 … 100 on
popularity/similarity and coherence edges)
• Build mention-entity graph or joint-inference factor graph
from knowledge and statistics in KB
• Compute high-likelihood mapping (ML or MAP) or
dense subgraph such that:
each m is connected to exactly one e (or at most one e)
3-27
Mention-Entity Popularity Weights
[Milne/Witten 2008, Spitkovsky/Chang 2012]
Need dictionary with entities‘ names:
• full names: Arnold Alois Schwarzenegger, Los Angeles, Microsoft Corporation
• short names: Arnold, Arnie, Mr. Schwarzenegger, New York, Microsoft, …
• nicknames & aliases: Terminator, City of Angels, Evil Empire, …
• acronyms: LA, UCLA, MS, MSFT
• role names: the Austrian action hero, Californian governor, the CEO of MS, …
…
plus gender info (useful for resolving pronouns in context):
Bill and Melinda met at MS. They fell in love and he kissed her.
Collect hyperlink anchor-text / link-target pairs from
• Wikipedia redirects
• Wikipedia links between articles
• Interwiki links between Wikipedia editions
• Web links pointing to Wikipedia articles
…
Build statistics to estimate P[entity | name]
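A minimal sketch of how such a prior can be estimated from anchor-text statistics (the names, link targets, and counts below are made up for illustration):

```python
# Sketch: estimate the name-entity prior P[entity | name] from
# (anchor text, link target) pairs harvested from Wikipedia/Web links.
from collections import Counter, defaultdict

# hypothetical harvested pairs: (surface name, linked entity)
anchor_links = [
    ("Arnie", "Arnold_Schwarzenegger"),
    ("Arnie", "Arnold_Schwarzenegger"),
    ("Arnie", "Arnie_(film)"),
    ("LA", "Los_Angeles"),
    ("LA", "Louisiana"),
    ("LA", "Los_Angeles"),
]

counts = defaultdict(Counter)          # name -> Counter over entities
for name, entity in anchor_links:
    counts[name][entity] += 1

def prior(name, entity):
    """P[entity | name] estimated by relative link frequency."""
    total = sum(counts[name].values())
    return counts[name][entity] / total if total else 0.0

print(prior("Arnie", "Arnold_Schwarzenegger"))  # 2/3
print(prior("LA", "Los_Angeles"))               # 2/3
```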
3-28
Mention-Entity Similarity Edges
Precompute characteristic keyphrases q for each entity e:
anchor texts or noun phrases in e's page with high PMI:

$$\mathrm{weight}(q,e) \sim \log \frac{\mathrm{freq}(q,e)}{\mathrm{freq}(q)\,\mathrm{freq}(e)}$$

Example keyphrase: "Metallica tribute to Ennio Morricone"

Match keyphrase q of candidate e in context of mention m:

$$\mathrm{score}(q \mid e) \sim \frac{\#\,\text{matching words}}{\text{length of }\mathrm{cover}(q)} \cdot \frac{\sum_{w \in \mathrm{cover}(q)} \mathrm{weight}(w \mid e)}{\sum_{w \in q} \mathrm{weight}(w \mid e)}$$

(first factor: extent of the partial match; second factor: weight of the matched words)

Example: The Ecstasy piece was covered by Metallica on the Morricone tribute album.

Compute overall similarity of context(m) and candidate e:

$$\mathrm{score}(e \mid m) \sim \sum_{\substack{q \in \mathrm{keyphrases}(e)\\ \text{in } \mathrm{context}(m)}} \frac{\mathrm{score}(q)}{\mathrm{dist}(\mathrm{cover}(q), m)}$$
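A small illustration of the partial keyphrase match, with made-up word weights (just the structure of the score(q|e) formula above, not the exact AIDA scoring function):

```python
# Sketch: partial match of an entity keyphrase against a mention context.
def keyphrase_score(keyphrase, context_tokens, word_weight):
    """keyphrase: list of words; context_tokens: tokenized context of the mention;
    word_weight: dict word -> weight(w|e). Returns ~score(q|e)."""
    positions = [i for i, t in enumerate(context_tokens) if t in keyphrase]
    if not positions:
        return 0.0
    matched = {context_tokens[i] for i in positions}
    cover_len = positions[-1] - positions[0] + 1          # length of cover(q)
    extent = len(matched) / cover_len                      # extent of partial match
    weight_ratio = (sum(word_weight.get(w, 0.0) for w in matched) /
                    sum(word_weight.get(w, 0.0) for w in keyphrase))
    return extent * weight_ratio

ctx = "the ecstasy piece was covered by metallica on the morricone tribute album".split()
q = ["metallica", "tribute", "morricone"]
weights = {"metallica": 3.0, "tribute": 1.0, "morricone": 4.0}
print(keyphrase_score(q, ctx, weights))   # 3 matched words over a cover of length 5 -> 0.6
```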
3-29
Entity-Entity Coherence Edges
Precompute overlap of incoming links for entities e1 and e2 (Milne-Witten relatedness):

$$\mathrm{mw\text{-}coh}(e_1,e_2) \sim 1 - \frac{\log \max(|in(e_1)|,|in(e_2)|) - \log |in(e_1) \cap in(e_2)|}{\log |E| - \log \min(|in(e_1)|,|in(e_2)|)}$$

Alternatively compute overlap of anchor texts for e1 and e2:

$$\mathrm{ngram\text{-}coh}(e_1,e_2) \sim \frac{|\mathrm{ngrams}(e_1) \cap \mathrm{ngrams}(e_2)|}{|\mathrm{ngrams}(e_1) \cup \mathrm{ngrams}(e_2)|}$$
or overlap of keyphrases, or similarity of bag-of-words, or …
Optionally combine with type distance of e1 and e2
(e.g., Jaccard index for type instances)
For special types of e1 and e2 (locations, people, etc.)
use spatial or temporal distance
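A minimal sketch of the Milne-Witten style link-overlap coherence, assuming we already have the sets of pages linking to each entity (link sets and the collection size are illustrative):

```python
# Sketch: entity-entity coherence from overlap of incoming links (Milne/Witten style).
import math

def mw_coherence(in1, in2, num_entities):
    """in1, in2: sets of pages linking to entity 1 / entity 2;
    num_entities: size |E| of the entity collection."""
    overlap = len(in1 & in2)
    if overlap == 0:
        return 0.0
    distance = ((math.log(max(len(in1), len(in2))) - math.log(overlap)) /
                (math.log(num_entities) - math.log(min(len(in1), len(in2)))))
    return max(0.0, 1.0 - distance)

# made-up link sets for illustration
in_a = {"Sergio_Leone", "Ennio_Morricone", "Clint_Eastwood", "Dollars_Trilogy"}
in_b = {"Ennio_Morricone", "Metallica", "Dollars_Trilogy"}
print(mw_coherence(in_a, in_b, num_entities=4_000_000))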
3-30
Coherence Graph Algorithm
[J. Hoffart et al.: EMNLP‘11]
(figure: mention-entity graph with edge weights; weighted degrees of the entity nodes,
e.g. 140, 180, 230, 470)
• Compute dense subgraph to
maximize min weighted degree among entity nodes
such that:
each m is connected to exactly one e (or at most one e)
• Greedy approximation:
iteratively remove weakest entity and its edges
• Keep alternative solutions, then use local/randomized search
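A compact sketch of the greedy part of this algorithm (iteratively dropping the entity node with the smallest weighted degree while keeping at least one candidate per mention, and remembering the intermediate solution with the best minimum degree); data structures are simplified and the robustness test and local search are omitted:

```python
# Sketch: greedy dense-subgraph heuristic for joint NED.
def greedy_disambiguate(graph, candidates):
    """graph: dict node -> dict neighbor -> weight (mention-entity + entity-entity edges);
    candidates: dict mention -> set of candidate entities."""
    entities = {e for cands in candidates.values() for e in cands}

    def wdeg(e, alive):
        return sum(w for n, w in graph.get(e, {}).items()
                   if n in alive or n in candidates)

    alive = set(entities)
    best_alive, best_min = set(alive), min(wdeg(e, alive) for e in alive)
    while True:
        # an entity that is the only remaining candidate of some mention must stay
        protected = set()
        for cands in candidates.values():
            live = cands & alive
            if len(live) == 1:
                protected |= live
        removable = alive - protected
        if not removable:
            break
        alive.remove(min(removable, key=lambda e: wdeg(e, alive)))
        cur_min = min(wdeg(e, alive) for e in alive)
        if cur_min > best_min:
            best_alive, best_min = set(alive), cur_min

    # map each mention to its strongest surviving candidate in the best solution
    return {m: max(c & best_alive, key=lambda e: wdeg(e, best_alive))
            for m, c in candidates.items()}
```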
3-31
Coherence Graph Algorithm
[J. Hoffart et al.: EMNLP‘11]
(animation step: the weakest entity node and its edges have been removed;
updated weighted degrees, e.g. 170, 210, 470)
3-32
Coherence Graph Algorithm
[J. Hoffart et al.: EMNLP‘11]
(animation step: further weak entities removed; remaining weighted degrees, e.g. 120, 210, 460)
3-33
Coherence Graph Algorithm
[J. Hoffart et al.: EMNLP‘11]
(animation step: final dense subgraph; each mention connected to one entity,
remaining weighted degrees, e.g. 100, 145, 210, 380)
3-34
Alternative: Random Walks
(figure: mention-entity graph with edge weights normalized into transition probabilities,
e.g. 0.5, 0.3, 0.2 out of a mention node)
• for each mention run random walks with restart
(like personalized PR with jumps to start mention(s))
• rank candidate entities by stationary visiting probability
• very efficient, decent accuracy
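A minimal sketch of random walk with restart over such a graph, in the spirit of personalized PageRank (power iteration on a tiny dictionary-based graph; all edge weights are illustrative):

```python
# Sketch: rank candidate entities for one mention by random walk with restart.
def random_walk_with_restart(graph, start, restart=0.15, iters=50):
    """graph: dict node -> dict neighbor -> weight; start: the mention node.
    Returns stationary visiting probabilities (personalized PageRank vector)."""
    nodes = set(graph) | {n for nbrs in graph.values() for n in nbrs}
    prob = {n: 0.0 for n in nodes}
    prob[start] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        nxt[start] += restart                     # jump back to the start mention
        for n, p in prob.items():
            nbrs = graph.get(n, {})
            total = sum(nbrs.values())
            if total == 0:
                nxt[start] += (1 - restart) * p   # dangling node: restart
                continue
            for m, w in nbrs.items():
                nxt[m] += (1 - restart) * p * w / total
        prob = nxt
    return prob

# tiny illustrative graph: one mention with two candidates plus one related entity
g = {
    "m:Ecstasy": {"e:Ecstasy_of_Gold": 90, "e:Ecstasy_(drug)": 10},
    "e:Ecstasy_of_Gold": {"e:Ennio_Morricone": 80, "m:Ecstasy": 90},
    "e:Ecstasy_(drug)": {"m:Ecstasy": 10},
    "e:Ennio_Morricone": {"e:Ecstasy_of_Gold": 80},
}
pi = random_walk_with_restart(g, "m:Ecstasy")
print(sorted(((v, k) for k, v in pi.items() if k.startswith("e:")), reverse=True))
```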
3-35
AIDA: Accurate Online Disambiguation
http://www.mpi-inf.mpg.de/yago-naga/aida/
3-36
AIDA: Accurate Online Disambiguation
AIDA: Very Difficult Example
http://www.mpi-inf.mpg.de/yago-naga/aida/
(screenshot walkthrough of the AIDA online demo, slides 3-37 to 3-43)
3-43
Some NED Online Tools
J. Hoffart et al.: EMNLP 2011, VLDB 2011
https://d5gate.ag5.mpi-sb.mpg.de/webaida/
P. Ferragina, U. Scaiella: CIKM 2010
http://tagme.di.unipi.it/
R. Isele, C. Bizer: VLDB 2012
http://spotlight.dbpedia.org/demo/index.html
Reuters Open Calais
http://viewer.opencalais.com/
S. Kulkarni, A. Singh, G. Ramakrishnan, S. Chakrabarti: KDD 2009
http://www.cse.iitb.ac.in/soumen/doc/CSAW/
D. Milne, I. Witten: CIKM 2008
http://wikipedia-miner.cms.waikato.ac.nz/demos/annotate/
… and more
Some of these tools use the Stanford NER tagger for detecting mentions:
http://nlp.stanford.edu/software/CRF-NER.shtml
3-44
NED: Experimental Evaluation
Benchmark:
• Extended CoNLL 2003 dataset: 1400 newswire articles
• originally annotated with mention markup (NER),
now with NED mappings to Yago and Freebase
• difficult texts:
… Australia beats India …  → Australian_Cricket_Team
… White House talks to Kremlin …  → President_of_the_USA
… EDS made a contract with …  → HP_Enterprise_Services
Results:
Best: AIDA method with prior+sim+coh + robustness test
82% precision @100% recall, 87% mean average precision
Comparison to other methods, see paper
J. Hoffart et al.: Robust Disambiguation of Named Entities in Text, EMNLP 2011
http://www.mpi-inf.mpg.de/yago-naga/aida/
3-45
Ongoing Research & Remaining Challenges
• More efficient graph algorithms (multicore, etc.)
• Allow mentions of unknown entities, mapped to null
• Leverage deep-parsing structures,
leverage semantic types
Example: Page played Kashmir on his Gibson
(dependency edges subj, obj, mod plus semantic types suggest a person
playing a musical piece on an instrument)
• Short and difficult texts:
• tweets, headlines, etc.
• fictional texts: novels, song lyrics, etc.
• incoherent texts
• Structured Web data: tables and lists
• Disambiguation beyond entity names:
• coreferences: pronouns, paraphrases, etc.
• common nouns, verbal phrases (general WSD)
3-46
General Word Sense Disambiguation
Which song writers covered ballads written by the Stones?
Candidate senses:
song writers → {songwriter, composer}
covered → {cover, perform} vs. {cover, report, treat} vs. {cover, help out}
3-47
Handling Out-of-Wikipedia Entities
Example text: Cave composed haunting songs like Hallelujah, O Children, and the Weeping Song.

Candidate entities, partly outside Wikipedia:
Cave → wikipedia.org/Good_Luck_Cave, wikipedia.org/Nick_Cave
Hallelujah → wikipedia/Hallelujah_Chorus, wikipedia/Hallelujah_(L_Cohen), last.fm/Nick_Cave/Hallelujah
O Children → wikipedia/Children_(2011_film), last.fm/Nick_Cave/O_Children
Weeping Song → wikipedia.org/Weeping_(song), last.fm/Nick_Cave/Weeping_Song
3-48
Handling Out-of-Wikipedia Entities
Example text: Cave composed haunting songs like Hallelujah, O Children, and the Weeping Song.

Candidate entities with characteristic keyphrases:
wikipedia.org/Good_Luck_Cave: Gunung Mulu National Park, Sarawak Chamber, largest underground chamber
wikipedia.org/Nick_Cave: Bad Seeds, No More Shall We Part, Murder Songs
wikipedia/Hallelujah_Chorus: Messiah oratorio, George Frideric Handel
wikipedia/Hallelujah_(L_Cohen): Leonard Cohen, Rufus Wainwright, Shrek and Fiona
last.fm/Nick_Cave/Hallelujah: eerie violin, Bad Seeds, No More Shall We Part
wikipedia/Children_(2011_film): South Korean film
last.fm/Nick_Cave/O_Children: Nick Cave & Bad Seeds, Harry Potter 7 movie, haunting choir
wikipedia.org/Weeping_(song): Dan Heymann, apartheid system
last.fm/Nick_Cave/Weeping_Song: Nick Cave, Murder Songs, P.J. Harvey, Nick and Blixa duet
3-49
Handling Out-of-Wikipedia Entities
[J. Hoffart et al.: CIKM‘12]
• Characterize all entities (and mentions) by sets of keyphrases
• Entity coherence then becomes:
  keyphrase overlap, no need for href link data
• For each mention add a "self" candidate:
  out-of-KB entity with keyphrases computed by Web search

Phrase overlap of two phrases p, q, with word weights γ:

$$PO(p,q) = \frac{\sum_{w \in p \cap q} \min(\gamma_p(w), \gamma_q(w))}{\sum_{w \in p \cup q} \max(\gamma_p(w), \gamma_q(w))}$$

Keyphrase-overlap relatedness of two entities e, f, with phrase weights φ:

$$\mathrm{KORE}(e,f) = \frac{\sum_{p \in e,\, q \in f} PO(p,q)^2 \cdot \min(\varphi_e(p), \varphi_f(q))}{\sum_{p \in e} \varphi_e(p) + \sum_{q \in f} \varphi_f(q)}$$

Efficient comparison of two keyphrase sets:
→ two-stage hashing, using min-hash sketches and LSH
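A minimal sketch of the two overlap measures above on toy keyphrase sets, using uniform word and phrase weights for simplicity (the real system uses non-uniform weights and min-hash/LSH for speed):

```python
# Sketch: keyphrase-overlap relatedness (KORE-style) with uniform weights.
def phrase_overlap(p, q):
    """p, q: phrases as sets of words; weighted Jaccard with unit weights."""
    return len(p & q) / len(p | q) if p | q else 0.0

def kore(e_phrases, f_phrases):
    """e_phrases, f_phrases: lists of phrases (sets of words) for two entities."""
    numer = sum(phrase_overlap(p, q) ** 2 for p in e_phrases for q in f_phrases)
    denom = len(e_phrases) + len(f_phrases)
    return numer / denom if denom else 0.0

e = [{"metallica", "tribute"}, {"ennio", "morricone", "composition"}]
f = [{"ennio", "morricone"}, {"dollars", "trilogy"}]
print(kore(e, f))
```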
3-50
Variants of NED at Web Scale
Tools can map a short text onto entities in a few seconds
• How to run this on a big batch of 1 million input texts?
  → partition inputs across distributed machines,
     organize the dictionary appropriately, …
  → exploit cross-document contexts
• How to deal with inputs from different time epochs?
  → consider time-dependent contexts,
     map to entities of the proper epoch
     (e.g. harvested from Wikipedia history)
• How to handle Web-scale inputs (100 million pages)
  restricted to a set of interesting entities?
  (e.g. tracking politicians and companies)
3-51
Outline
✓ Motivation
✓ Entity-Name Disambiguation
• Mapping Questions into Queries
• Entity Linkage
• Wrap-up
…
3-52
Word Sense Disambiguation for
Question-to-Query Translation
QA system
DEANNA
[M. Yahya et al.:
EMNLP‘12]
Question → Translation with WSD → SPARQL query over the KB → Answer
“Who played in Casablanca
and was married to a writer
born in Rome?”
Select ?p Where {
?p type person.
?p actedIn Casablanca_(film).
?p isMarriedTo ?w.
?w type writer .
?w bornIn Rome . }
?p
?w
www.mpi-inf.mpg.de/yago-naga/deanna/
3-53
DEANNA in a Nutshell
(DEANNA pipeline: Question → Phrase detection → Phrase mapping → Dependency detection →
Joint disambiguation → Query generation → SPARQL query over the KB → Answers;
slides 3-54 to 3-57 step through this pipeline)
3-57
DEANNA Components
(DEANNA pipeline with numbered components: 1 Phrase detection → 2 Phrase mapping →
3 Dependency detection → 4 Joint disambiguation → Query generation;
input: Question, output: SPARQL query over the KB, Answers)
3-58
Phrase Detection
Concepts (entities & classes): dictionary-based detection of phrases, e.g.:
Who | played | played in | Casablanca | Casablanca the film | a writer |
married | married to | was married to | …
Relations:
mainly use ReVerb [Fader et al.: EMNLP'11] with POS patterns V | VP | VW*P, e.g.:
… was/VBD married/VBN to/TO a/DT … (see the sketch below)
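A rough sketch of matching such a verb-centered relational pattern over POS-tagged tokens with a regular expression (a simplification of the actual ReVerb constraints; the tagged sentence is illustrative):

```python
# Sketch: detect relation phrases of the form V | V P | V W* P over POS-tagged tokens.
import re

# token/TAG pairs as produced by a POS tagger (Penn Treebank style tags)
tagged = "Who/WP was/VBD married/VBN to/TO a/DT writer/NN born/VBN in/IN Rome/NNP"
tokens = [t.rsplit("/", 1) for t in tagged.split()]

# encode the tag sequence as a string of one-letter classes:
# V = verb, P = preposition/particle, W = noun/adjective/adverb/determiner, O = other
def tag_class(tag):
    if tag.startswith("VB"):
        return "V"
    if tag in ("IN", "TO", "RP"):
        return "P"
    if tag[0] in ("N", "J", "R", "D"):
        return "W"
    return "O"

classes = "".join(tag_class(tag) for _, tag in tokens)
# relation pattern: verbs, optionally followed by words and ending in a preposition
for match in re.finditer(r"V+W*P?", classes):
    phrase = " ".join(w for w, _ in tokens[match.start():match.end()])
    print(phrase)        # prints "was married to" and "born in"
```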
3-59
DEANNA Components
(DEANNA pipeline overview; next step: 2 Phrase mapping)
3-60
Phrase Mapping
Concepts (entities & classes): dictionary-based mapping of phrases to candidates, e.g.:
Casablanca → e:Casablanca | e:Casablanca_(film) | e:White_House
Casablanca the film → e:Casablanca_(film)
played → e:Played_(film) | r:actedIn | r:hasMusicalRole
played in → r:actedIn

Relations: dictionary-based mapping, e.g.:
acted in → actedIn
played in → actedIn
plays → hasMusicalRole
mastered → hasMusicalRole
3-61
DEANNA Components
(DEANNA pipeline overview; next step: 3 Dependency detection)
3-62
Dependency Detection
Look for specific patterns in the dependency graph [de Marneffe et al.: LREC'06]

Example: "a writer born in Rome"
dependency path: writer --partmod--> born --prep--> in --pobj--> Rome
→ triple pattern candidate q1: (a writer, was born, Rome)
with ambiguous mappings such as
  Rome → e:Rome | e:Sydne_Rome
  born → e:Born_(film) | e:Max_Born | r:bornOnDate | r:bornInPlace
  a writer → c:writer
3-63
Disambiguation Graph
Phrase nodes, q-nodes, and semantic nodes:

q1: Rome → e:Rome | e:Sydne_Rome
    born / was born → e:Born_(film) | e:Max_Born | r:bornOnDate | r:bornInPlace
    a writer → c:writer
q2: Casablanca → e:White_House | e:Casablanca | e:Casablanca_(film)
    played / played in → e:Played_(film) | r:actedIn | r:hasMusicalRole
q3: Who → c:person
    married / married to / was married to → e:Married_(series) | c:married_person | r:isMarriedTo
3-64
DEANNA Components
(DEANNA pipeline overview; next step: 4 Joint disambiguation)
3-65
Joint Disambiguation - ILP
• ILP: Integer Linear Programming
• maximize α Σ_{i,j} w_{i,j} Y_{i,j} + β Σ_{k,l} v_{k,l} Z_{k,l} + …
• subject to:
  • no token in multiple phrases,
  • triples observe type constraints, …
(a toy version of this optimization is sketched below)
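A toy sketch of the joint objective: 0/1 assignment variables Y (phrase → semantic node), similarity and coherence weights, and a uniqueness constraint per phrase. Here the tiny candidate space is solved by brute-force enumeration rather than by an actual ILP solver; all weights are made up:

```python
# Sketch: joint disambiguation as constrained 0/1 optimization (brute force, toy sizes).
from itertools import product

# similarity weights w[(phrase, candidate)] and coherence weights v[(cand1, cand2)]
w = {("Rome", "e:Rome"): 0.9, ("Rome", "e:Sydne_Rome"): 0.3,
     ("born", "r:bornInPlace"): 0.6, ("born", "r:bornOnDate"): 0.5,
     ("a writer", "c:writer"): 0.9}
v = {("e:Rome", "r:bornInPlace"): 0.8, ("c:writer", "r:bornInPlace"): 0.7,
     ("e:Sydne_Rome", "r:bornOnDate"): 0.1}

candidates = {"Rome": ["e:Rome", "e:Sydne_Rome"],
              "born": ["r:bornInPlace", "r:bornOnDate"],
              "a writer": ["c:writer"]}
alpha, beta = 1.0, 1.0

def objective(assignment):
    sim = sum(w.get((p, c), 0.0) for p, c in assignment.items())
    chosen = list(assignment.values())
    coh = sum(v.get((a, b), 0.0) + v.get((b, a), 0.0)
              for i, a in enumerate(chosen) for b in chosen[i + 1:])
    return alpha * sim + beta * coh

# enumerate all assignments where each phrase picks exactly one candidate
phrases = list(candidates)
best = max((dict(zip(phrases, combo))
            for combo in product(*(candidates[p] for p in phrases))),
           key=objective)
print(best)   # Rome -> e:Rome, born -> r:bornInPlace, a writer -> c:writer
```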
3-66
Joint Disambiguation – Objective
Phrase nodes (Rome, born, was born, a writer), q-nodes (q1), and semantic nodes
(e:Rome, e:Sydne_Rome, e:Born_(film), e:Max_Born, r:bornOnDate, r:bornInPlace, c:writer),
connected by similarity edges (phrase → semantic node)
and coherence edges (semantic node ↔ semantic node).

Objective: α Σ_{i,j} w_{i,j} Y_{i,j} + β Σ_{k,l} v_{k,l} Z_{k,l}
(the first term captures the prior/similarity of the chosen mappings)
3-67
Joint Disambiguation – Objective
(same disambiguation graph; the second term of the objective,
β Σ_{k,l} v_{k,l} Z_{k,l}, captures the coherence among jointly chosen semantic nodes)
3-68
Joint Disambiguation – Constraints
A phrase node can be assigned to only one semantic node, e.g. for phrase a = Casablanca
with candidates 1 = e:White_House, 2 = e:Casablanca, 3 = e:Casablanca_(film):
Y_{a,1} + Y_{a,2} + Y_{a,3} ≤ 1

Objective: α Σ_{i,j} w_{i,j} Y_{i,j} + β Σ_{k,l} v_{k,l} Z_{k,l}
3-69
Joint Disambiguation – Constraints
Classes translate to type-constrained variables
→ every semantic triple should have a class to join & project:
person actedIn Casablanca_(film)
↓
?x type person . ?x actedIn Casablanca_(film)

(graph excerpt: phrase "a writer" with candidates e:The_Writer_(magazine) and c:writer;
phrase "was born" with r:bornOnDate and r:bornInPlace; phrase "Rome" with e:Rome and e:Sydne_Rome)
3-70
DEANNA Components
(DEANNA pipeline overview; final step: Query generation)
3-71
Structured Query Generation
Chosen mappings:
q1: a writer → c:writer, was born → r:bornIn, Rome → e:Rome   (query variable ?w)
q2: Who → c:person, played in → r:actedIn, Casablanca → e:Casablanca_(film)   (query variable ?p)
q3: Who → ?p, was married to → r:isMarriedTo, a writer → ?w

Generated query:
SELECT ?p WHERE {
  ?w type writer .
  ?w bornIn Rome .
  ?p type person .
  ?p actedIn Casablanca_(film) .
  ?p isMarriedTo ?w }
3-72
Outline
✓ Motivation
✓ Entity-Name Disambiguation
✓ Mapping Questions into Queries
• Entity Linkage
• Wrap-up
…
3-73
Entity Linkage for the Web of Data
30 billion triples
500 million links
yago/wordnet:Actor109765278
yago/wordnet: Artist109812338
yago/wikicategory:ItalianComposer
imdb.com/name/nm0910607/
dbpedia.org/resource/Ennio_Morricone
imdb.com/title/tt0361748/
dbpedia.org/resource/Rome
rdf.freebase.com/ns/en.rome_ny
?
?
data.nytimes.com/51688803696189142301
sameAs links ?
Where? How?
?
geonames.org/5134301/city_of_rome
N 43° 12' 46'' W 75° 27' 20''
3-74
Record Linkage (Entity Resolution)
record 1: Susan B. Davidson | Peter Buneman | Yi Chen — University of Pennsylvania — "Issues in …" — Int. Conf. on Very Large Data Bases
record 2: O.P. Buneman | S. Davison | Y. Chen — U Penn — "Issues in …" — VLDB Conf.
record 3: P. Baumann | S. Davidson | Cheng Y. — Penn State — "Issues in …" — PVLDB
…
record N: Y. Davidson | Sean Penn | S. Chen — Penn Station — "Issues in …" — XLDB Conference

Find equivalence classes of entities, and records, based on:
• similarity of values (edit distance, n-gram overlap, etc.; see the sketch below)
• joint agreement of linkage
→ similarity joins, grouping/clustering, collective learning, etc.
→ often domain-specific customization (similarity measures etc.)
Halbert L. Dunn: Record Linkage. American Journal of Public Health. 1946
H.B. Newcombe et al.: Automatic Linkage of Vital Records. Science, 1959.
I.P. Fellegi, A.B. Sunter: A Theory of Record Linkage. Journal of the American Statistical Association, 1969.
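A minimal sketch of a value-similarity test of the kind listed above (character n-gram Jaccard overlap between two author-name strings; n and any threshold would be tuned per domain):

```python
# Sketch: n-gram overlap similarity between record field values.
def ngrams(s, n=3):
    s = " " + s.lower() + " "                 # pad so word boundaries contribute
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def jaccard_sim(a, b, n=3):
    ga, gb = ngrams(a, n), ngrams(b, n)
    return len(ga & gb) / len(ga | gb)

print(jaccard_sim("Susan B. Davidson", "S. Davidson"))   # fairly high
print(jaccard_sim("Peter Buneman", "O.P. Buneman"))      # fairly high
print(jaccard_sim("Sean Penn", "Peter Buneman"))         # low
```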
3-75
Entity Linkage via Markov Logic
(records as on the previous slide: authors, affiliations, titles, venues)

probabilistic / uncertain rules:
sameTitle(x,y) ∧ sameAuths(x,y) ∧ sameVenue(x,y) ⇒ sameAs(x,y)
sameTitle(x,y) ∧ sameAuths(x,y) ∧ sameAffil(x,y) ⇒ sameAs(x,y)
overlapAuths(x,y) ∧ sameAffil(x,y) ⇒ sameAuths(x,y)
sameAs(rec1.auth1, rec2.auth1) [0.2]
sameAs(rec1.auth1, rec2.auth2) [0.9]
…

Find equivalence classes of entities and records:
• specify in Markov Logic or as a factor graph
• generate an MRF (or …) and solve by MCMC (or …)
(Singla/Domingos: ICDM'06, Hall/Sutton/McCallum: KDD'08)

Halbert L. Dunn: Record Linkage. American Journal of Public Health, 1946.
H.B. Newcombe et al.: Automatic Linkage of Vital Records. Science, 1959.
I.P. Fellegi, A.B. Sunter: A Theory of Record Linkage. Journal of the American Statistical Association, 1969.
3-76
sameAs-Link Test across Sources
LOD source 1 with entity ei, LOD source 2 with entity ej: sameAs(ei, ej)?

similarity: sim(ei, ej)
neighborhoods: N(ei), N(ej)
coherence: coh(x ∈ N(ei), y ∈ N(ej))

→ a record linkage problem:
sameAs(ei, ej) ⇐ sim(ei, ej) ≥ … ∧ Σ_{x,y} coh(x,y) ≥ …
3-77
sameAs-Link Generation across Sources
(figure: entities ei, ej, ek in LOD sources 1, 2, 3 with candidate sameAs links between all pairs)
3-78
sameAs-Link Generation across Sources
(candidate sameAs links between entities ei, ej, ek across LOD sources 1, 2, 3)

Joint mapping:
• ILP model or probabilistic factor graph or …
• use your favorite solver
• but how to do this at Web scale?

sim(ei, ej): likelihood of being equivalent, mapped to [-1,1]
coh(x, y): likelihood of being mentioned together, mapped to [-1,1]
0-1 decision variables: Xij … Xjk … Xik …

objective function:
Σ_{ij} ( Xij · sim(ei, ej) + Xij · Σ_{x ∈ Ni, y ∈ Nj} coh(x,y) )
+ Σ_{jk} (…) + Σ_{ik} (…) = max!

constraints:
Σ_j Xij ≤ 1 for all i
…
(1 − Xij) + (1 − Xjk) ≥ (1 − Xik) for all i, j, k   (transitivity)
…
3-79
Similarity Flooding
Graph with record / entity pairs as nodes (sameAs candidates)
and edges connecting related pairs:
R(x,y) and S(u,w) and sameAs candidates (x,u), (y,w)
 edge between (x,u) and (y,w)
• Node weights: belief strength in sameAs(x,u)
• Edge weights: degree of relatedness
Iterate until convergence:
• propagate node weights to neighbors
• new node weight is linear combination of inputs
Related to belief propagation, label propagation, etc.
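A minimal sketch of the propagation loop on such a pair-graph (node = candidate sameAs pair, edge = related pairs); the graph, initial beliefs, and damping factor are illustrative:

```python
# Sketch: similarity-flooding style propagation over a graph of sameAs candidate pairs.
def flood(pair_graph, init_belief, alpha=0.5, iters=20):
    """pair_graph: dict pair -> dict neighbor_pair -> edge weight (relatedness);
    init_belief: dict pair -> initial sameAs belief in [0,1]."""
    belief = dict(init_belief)
    for _ in range(iters):
        new = {}
        for node, nbrs in pair_graph.items():
            total = sum(nbrs.values())
            incoming = (sum(w * belief[n] for n, w in nbrs.items()) / total) if total else 0.0
            # new node weight = linear combination of own prior and neighbors' beliefs
            new[node] = (1 - alpha) * init_belief[node] + alpha * incoming
        belief = new
    return belief

# candidate pairs (x,u) and (y,w) are connected because R(x,y) and S(u,w) hold
g = {("x", "u"): {("y", "w"): 1.0},
     ("y", "w"): {("x", "u"): 1.0}}
b0 = {("x", "u"): 0.9, ("y", "w"): 0.2}
print(flood(g, b0))
```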
3-80
Blocking of Match Candidates
Avoid computing O(n2) similarities between records / entities
• Group potentially matching records
• Run more accurate & more expensive method per group
at risk of missing some matches
#  Name      Zip    Email
1  John Doe  49305  jdoe@yahoo
2  John Doe  94305  jdoe@gmail
3  Jon Foe   94305  jdoe@yahoo
4  Jane Foe  12345  jane@msn
5  Jane Fog  12345  jane@msn

Group by zip code: {1,4,5} and {2,3}
→ sameAs(4,5), sameAs(2,3)
Group by 1st char of last name: {1,2} and {3,4,5}
→ sameAs(1,2), sameAs(4,5)
• Iterative Blocking:
distribute found matches to other blocks,
then repeat per-block runs
• Multi-type Joint Resolution
blocks of different record types (author, venue, etc.)
propagate matches to other types, then repeat runs
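A minimal sketch of blocking followed by within-block pairwise matching (the blocking keys, bigram similarity, and threshold are illustrative):

```python
# Sketch: blocking by a cheap key, then pairwise matching only within each block.
from collections import defaultdict
from itertools import combinations

records = [
    {"id": 1, "name": "John Doe", "zip": "49305", "email": "jdoe@yahoo"},
    {"id": 2, "name": "John Doe", "zip": "94305", "email": "jdoe@gmail"},
    {"id": 3, "name": "Jon Foe",  "zip": "94305", "email": "jdoe@yahoo"},
    {"id": 4, "name": "Jane Foe", "zip": "12345", "email": "jane@msn"},
    {"id": 5, "name": "Jane Fog", "zip": "12345", "email": "jane@msn"},
]

def name_sim(a, b, n=2):
    ga = {a.lower()[i:i + n] for i in range(len(a) - n + 1)}
    gb = {b.lower()[i:i + n] for i in range(len(b) - n + 1)}
    return len(ga & gb) / len(ga | gb)

def block_and_match(records, key, threshold=0.3):
    blocks = defaultdict(list)
    for r in records:
        blocks[key(r)].append(r)
    matches = set()
    for block in blocks.values():            # run the expensive comparison only per block
        for r, s in combinations(block, 2):
            if name_sim(r["name"], s["name"]) >= threshold:
                matches.add((r["id"], s["id"]))
    return matches

print(block_and_match(records, key=lambda r: r["zip"]))                     # matches within zip blocks
print(block_and_match(records, key=lambda r: r["name"].split()[-1][0]))     # matches within last-name-initial blocks
```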
3-81
Iterative Blocking for Joint Resolution with Multiple Entity Types
[Whang et al. 2012]
(figure: blocks of publications, authors, and venues after round 1 and after round 2)
heuristics for constructing efficient execution plans,
exploiting an "influence graph"
3-82
RiMOM Method
[Juanzi Li et al.: TKDE'09]
Risk Minimization Based Ontology Matching Method
for joint matching of concepts (entities, classes) & properties (relations)
Strategies using a variety of matching criteria:
• linguistic-based: edit distance, context vector distance, …
• structure-based: similarity flooding, …
keg.cs.tsinghua.edu.cn/project/RiMOM/
3-83
COMA++ Framework
[E. Rahm et al.]
• Joint schema alignment and entity matching
• Comprehensive architecture with many plug-ins
for customizing to specific application
• Blocked matchers parallelizable on Map-Reduce platform
dbs.uni-leipzig.de/Research/coma.html/
3-84
PARIS Method
[F. Suchanek et al. 2012]
Probabilistic Alignment of Relations, Instances, and Schema:
joint reasoning on sameEntity, sameRelation, sameClass
with direct probabilistic assessment
P[literal1 ≡ literal2] = …   same constant value
P[r1 ⊆ r2] = …   sub-relation
P[e1 ≡ e2] = …   same entity
P[c1 ⊆ c2] = …   sub-class
Iterate through probabilistic equations
Empirically converges to fixpoint
Matching entities of DBpedia with YAGO:
90% precision, 73% recall, after 4 iterations, 5 h run-time
webdam.inria.fr/paris/
3-85
PARIS Method
[F. Suchanek et al. 2012]

P[x ≡ y]: same entity, based on similarity and co-occurrence
Example: P[Shangri-La ≡ Zhongdian] = … fun(bornIn⁻¹) · P[Jet Li ≡ Li Lianjie] …

If relations were already aligned:

$$P[x \equiv y] = \Big(1 - \prod_{r(x,u),\, r(y,w)} \big(1 - \mathrm{fun}(r^{-1})\, P[u \equiv w]\big)\Big) \cdot \prod_{r(x,u)} \Big(1 - \mathrm{fun}(r) \prod_{r(y,w)} \big(1 - P[u \equiv w]\big)\Big)$$

where the second product takes negative evidence into account, and

$$\mathrm{fun}(r) = \frac{\#x:\ \exists y:\ r(x,y)}{\#(x,y):\ r(x,y)}$$

is the degree to which r is a function.
webdam.inria.fr/paris/
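A toy sketch of the functionality measure and the positive-evidence part of the equivalence estimate above, on hypothetical fact sets (one PARIS-style step, ignoring negative evidence and sub-relation probabilities):

```python
# Sketch: PARIS-style functionality of a relation and one entity-equivalence estimate.
facts1 = [("bornIn", "Jet_Li", "Beijing"), ("bornIn", "Jackie_Chan", "Hong_Kong")]
facts2 = [("bornIn", "Li_Lianjie", "Beijing"), ("bornIn", "Jackie_Chan", "Hong_Kong")]

def inv_fun(relation, facts):
    """fun(r^-1) = #y with some x : r(x,y)  /  #(x,y) pairs with r(x,y)."""
    pairs = [(x, y) for r, x, y in facts if r == relation]
    return len({y for _, y in pairs}) / len(pairs) if pairs else 0.0

def p_equiv(x, y, facts1, facts2, p_obj_equiv):
    """positive evidence only: 1 - prod over matching facts r(x,u), r(y,w) of
    (1 - fun(r^-1) * P[u = w]); p_obj_equiv gives the current P[u = w]."""
    prod = 1.0
    for r1, s1, u in facts1:
        if s1 != x:
            continue
        for r2, s2, w in facts2:
            if s2 == y and r2 == r1:
                prod *= 1.0 - inv_fun(r1, facts1 + facts2) * p_obj_equiv(u, w)
    return 1.0 - prod

same_literal = lambda u, w: 1.0 if u == w else 0.0
print(p_equiv("Jet_Li", "Li_Lianjie", facts1, facts2, same_literal))
```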
3-86
PARIS Method
[F. Suchanek et al. 2012]

P[s ⊆ r]: sub-relation

If entities were already resolved:

$$P[s \subseteq r] = \frac{\#(x,u):\ s(x,u) \wedge r(x,u)}{\#(x,u):\ s(x,u)}$$

With same-entity probabilities:

$$P[s \subseteq r] = \frac{\sum_{s(x,u)} \big(1 - \prod_{r(y,w)} (1 - P[x \equiv y]\, P[u \equiv w])\big)}{\sum_{s(x,u)} \big(1 - \prod_{y,w} (1 - P[x \equiv y]\, P[u \equiv w])\big)}$$
webdam.inria.fr/paris/
3-87
PARIS Method
[F. Suchanek et al. 2012]
P[x ≡ y]: same entity, revisited with sub-relation probabilities

$$P[x \equiv y] = \Big(1 - \prod_{s(x,u),\, r(y,w)} \big(1 - P[s \subseteq r]\, \mathrm{fun}(s^{-1})\, P[u \equiv w]\big)\big(1 - P[r \subseteq s]\, \mathrm{fun}(r^{-1})\, P[u \equiv w]\big)\Big)$$
$$\cdot \prod_{s(x,u),\, r(y,w)} \Big(1 - P[s \subseteq r]\, \mathrm{fun}(s) \prod_{r(y,w)} (1 - P[u \equiv w])\Big)\Big(1 - P[r \subseteq s]\, \mathrm{fun}(r) \prod_{r(y,w)} (1 - P[u \equiv w])\Big)$$

(the second line takes negative evidence into account)
webdam.inria.fr/paris/
3-88
PARIS Method
[F. Suchanek et al. 2012]

P[c ⊆ d]: sub-class

If entities were already resolved:

$$P[c \subseteq d] = \frac{\#x:\ type(x,c) \wedge type(x,d)}{\#x:\ type(x,c)}$$

With same-entity probabilities:

$$P[c \subseteq d] = \frac{\sum_{x:\, type(x,c)} \big(1 - \prod_{y:\, type(y,d)} (1 - P[x \equiv y])\big)}{\#x:\ type(x,c)}$$
webdam.inria.fr/paris/
3-89
Partitioned MLN Method
[V. Rastogi et al. 2011]
• Use Markov Logic Network for entity resolution
• Partition MLN with replication of nodes so that:
• Each node has its neighborhood in the same partition
Repeat
• local computation:
run MLN inference via MCMC on each partition (in parallel)
• message passing:
exchange beliefs (on sameAs) among partitions
with overlapping node sets
Until convergence
R1: sim(x,y) ⇒ sameAuthor(x,y)
R2: sim(x,y) ∧ coAuthor(x,a) ∧ coAuthor(y,b) ∧ sameAuthor(a,b)
    ⇒ sameAuthor(x,y)
3-90
LINDA: Linked Data Alignment at Scale
[C. Böhm et al. 2012]
• uses context sim and joint inference to process
sameAs matrix with transitivity and other constraints
• alternates between setting sameAs and recomputing sim
• puts promising candidate pairs in priority queue
• queue is partitioned and processing parallelized
(figure: distributed architecture with an input entity graph G and a priority queue Q,
both partitioned across nodes 1 … n; each node reads its graph and queue partitions,
notifies other nodes of newly set sameAs pairs, and updates the global result matrix X
over entities e1 … em)

Experiment with the BTC+ dataset:
• 3 billion quads
• 345 million triples
• 95 million URIs
Result after 30 h run-time:
• 12.3 million sameAs links
• 66% precision
• > 80% for DBpedia-YAGO
3-91
Cross-Lingual Linking
+ simpler than monolingual: natural equivalences, interwiki links
− harder than monolingual: different terminologies & structures
en.wikipedia.org: 3.5 million articles
baike.baidu.com: 4 million articles
Source: Z. Wang et al.: WWW'12
Z. Wang et al., WWW'12: factor-graph learning → 200,000 sameAs links
T. Nguyen et al., VLDB'12: similarity features & LSI → infobox mappings
3-92
Challenges Remaining
Entity linkage is at the heart of semantic data integration !
More than 50 years of research, still some way to go!
• Highly related entities with ambiguous names
George W. Bush (jun.) vs. George H.W. Bush (sen.)
• Out-of-Wikipedia entities with sparse context
• Enterprise data (perhaps combined with Web2.0 data)
• Records with complex DB / XML / OWL schemas
• Entities with very noisy context (in social media)
Benchmarks:
• OAEI Ontology Alignment & Instance Matching: oaei.ontologymatching.org
• TAC KBP Entity Linking: www.nist.gov/tac/2012/KBP/
• TREC Knowledge Base Acceleration: trec-kba.org
3-93
TREC Task: Knowledge Base Acceleration
Goal: assist Wikipedia / KB editors
• recommend key citations as evidence of truth
• recommend infobox structure and categories
• recommend entity links and external links
http://trec-kba.org
3-94
TREC Task: Knowledge Base Acceleration
http://trec-kba.org
3-95
Outline
✓ Motivation
✓ Entity-Name Disambiguation
✓ Mapping Questions into Queries
✓ Entity Linkage
• Wrap-up
…
3-96
Take-Home Lessons
Web of Linked Data is great
  100s of KBs with 30 billion triples and 500 million links
  mostly reference data; dynamic maintenance is the bottleneck
  connection with the Web of Contents needs improvement
Entity detection and disambiguation is key
  for creating sameAs links in text (RDFa, microformats)
  for machine reading, semantic authoring, knowledge base acceleration, …
NED methods come close to human quality
  combine popularity, similarity, and coherence
  extend towards general WSD (e.g. for QA)
Linking entities across KBs is advancing
  integrated methods for aligning entities, classes and relations
3-97
Open Problems and Grand Challenges
Entity name disambiguation in difficult situations
Short and noisy texts about long-tail entities in social media
Robust disambiguation of entities, relations and classes
Relevant for question answering & question-to-query translation
Key building block for KB building and maintenance
Combine algorithms and crowdsourcing for NED & ER
with active learning, minimizing human effort or cost/accuracy
Automatic and continuously maintained sameAs links
for Web of Linked Data with high accuracy & coverage
3-98
End of Part 3
Questions?
3-99
Recommended Readings: Disambiguation
• J. Hoffart, M. A. Yosef, I. Bordino, et al.: Robust Disambiguation of Named Entities in Text. EMNLP 2011
• J. Hoffart et al.: KORE: Keyphrase Overlap Relatedness for Entity Disambiguation. CIKM 2012
• R.C. Bunescu, M. Pasca: Using Encyclopedic Knowledge for Named Entity Disambiguation. EACL 2006
• S. Cucerzan: Large-Scale Named Entity Disambiguation Based on Wikipedia Data. EMNLP 2007
• D.N. Milne, I.H. Witten: Learning to link with wikipedia. CIKM 2008
• S. Kulkarni et al.: Collective annotation of Wikipedia entities in web text. KDD 2009
• G.Limaye et al: Annotating and Searching Web Tables Using Entities, Types and Relationships. PVLDB 2010
• A. Rahman, V. Ng: Coreference Resolution with World Knowledge. ACL 2011
• L. Ratinov et al.: Local and Global Algorithms for Disambiguation to Wikipedia. ACL 2011
• M. Dredze et al.: Entity Disambiguation for Knowledge Base Population. COLING 2010
• P. Ferragina, U. Scaiella: TAGME: on-the-fly annotation of short text fragments. CIKM 2010
• X. Han, L. Sun, J. Zhao: Collective entity linking in web text: a graph-based method. SIGIR 2011
• M. Tsagkias, M. de Rijke, W. Weerkamp.: Linking Online News and Social Media. WSDM 2011
• J. Du et al.: Towards High-Quality Semantic Entity Detection over Online Forums. SocInfo 2011
• V.I. Spitkovsky, A.X. Chang: A Cross-Lingual Dictionary for English Wikipedia Concepts, LREC 2012
• J.R. Finkel, T. Grenager, C. Manning. Incorporating Non-local Information into Information Extraction
Systems by Gibbs Sampling. ACL 2005
• V. Ng: Supervised Noun Phrase Coreference Research: The First Fifteen Years. ACL 2010
• S. Singh, A. Subramanya, F.C.N. Pereira, A. McCallum: Large-Scale Cross-Document Coreference
Using Distributed Inference and Hierarchical Models. ACL 2011
• T. Lin et al.: No Noun Phrase Left Behind: Detecting and Typing Unlinkable Entities. EMNLP 2012
• A. Rahman, V. Ng: Inducing Fine-Grained Semantic Classes via Hierarchical Classification. COLING 2010
• X. Ling, D.S. Weld: Fine-Grained Entity Recognition. AAAI 2012
• R. Navigli: Word sense disambiguation: A survey. ACM Comput. Surv. 41(2), 2009
• M. Yahya et al.: Natural Language Questions for the Web of Data. EMNLP 2012
• S. Shekarpour: Automatically Transforming Keyword Queries to SPARQL on Large-Scale KBs. ISWC 2011
3-100
Recommended Readings:
Linked Data and Entity Linkage
• T. Heath, C. Bizer: Linked Data: Evolving the Web into a Global Data Space. Morgan&Claypool, 2011
• A. Hogan, et al.: An empirical survey of Linked Data conformance. J. Web Sem. 14, 2012
• H. Glaser, A. Jaffri, I.C. Millard: Managing Co-Reference on the Semantic Web. LDOW 2009
• J. Volz, C.Bizer, M.Gaedke, G.Kobilarov : Discovering and Maintaining Links on the Web of Data. ISWC 2009
• F. Naumann, M. Herschel: An Introduction to Duplicate Detection. Morgan&Claypool, 2010
• H.Köpcke et al: Learning-Based Approaches for Matching Web Data Entities. IEEE Internet Computing 2010
• H. Köpcke et al.: Evaluation of entity resolution approaches on real-world match problems. PVLDB 2010
• S. Melnik, H. Garcia-Molina, E. Rahm: Similarity Flooding: A Versatile Graph Matching Algorithm and its
Application to Schema Matching. ICDE 2002
• S. Chaudhuri, V. Ganti, R. Motwani: Robust Identification of Fuzzy Duplicates. ICDE 2005
• S.E. Whang et al.: Entity Resolution with Iterative Blocking. SIGMOD 2009
• S.E. Whang, H. Garcia-Molina: Joint Entity Resolution. ICDE 2012
• L. Kolb, A. Thor, E. Rahm: Load Balancing for MapReduce-based Entity Resolution. ICDE 2012
• J.Li, J.Tang, Y.Li, Q.Luo: RiMOM: A dynamic multistrategy ontology alignment framework. TKDE 21(8), 2009
• P. Singla, P. Domingos: Entity Resolution with Markov Logic. ICDM 2006
• I.Bhattacharya, L. Getoor: Collective Entity Resolution in Relational Data. TKDD 1(1), 2007
• R. Hall, C.A. Sutton, A. McCallum: Unsupervised deduplication using cross-field dependencies. KDD 2008
• V. Rastogi, N. Dalvi, M. Garofalakis: Large-Scale Collective Entity Matching. PVLDB 2011
• F. Suchanek et al.: PARIS: Probabilistic Alignment of Relations, Instances, and Schema. PVLDB 2012
• Z. Wang, J. Li, Z. Wang, J. Tang: Cross-lingual knowledge linking across wiki knowledge bases. WWW 2012
• T. Nguyen et al.: Multilingual Schema Matching for Wikipedia Infoboxes. PVLDB 2012
• A.Hogan et al.: Scalable and distributed methods for entity matching. J. Web Sem. 10, 2012
• C. Böhm et al.: LINDA: Distributed Web-of-Data-Scale Entity Matching. CIKM 2012
• J. Wang, T. Kraska, M. Franklin, J. Feng: CrowdER: Crowdsourcing Entity Resolution. PVLDB 2012
3-101
Knowledge Harvesting:
Overall Take-Home Lessons
KBs are a great opportunity in the big-data era:
  revive the old AI vision, make it real & large-scale!
  challenging, but high pay-off
Strong success story on entities and classes
Good progress on relational facts
Methods for open-domain relation discovery
Many opportunities remaining:
  temporal, spatial, visual, commonsense knowledge
  vertical domains: health, music, travel, …
Search and ranking:
  combine facts (SPO triples) with witness text
  extend SPARQL, LMs for ranking; UI unclear
Entity linking:
  from names in text to entities in the KB
  sameAs between entities in different KBs / DBs
Knowledge Harvesting: Research
Opportunities & Challenges
Explore & exploit synergies between
semantic, statistical, & social Web methods:
statistical evidence +
logical consistency +
wisdom of the crowd !
For DB / AI / IR / NLP / Web researchers:
• efficiency & scalability
• consistency constraints & reasoning
• search and ranking
• deep linguistic patterns & statistics
• text (& speech) disambiguation
• killer app for uncertain data management
• knowledge-base life-cycle
1-103
and more
cmn: 非常谢谢你
yue: 唔該
wu: 谢谢侬
dai: ขอบคุณ
tib: ཐུགས་རྗེ་ཆྗེ་།
en: thank you
expression of
gratitude
de: vielen Dank
fr: Merci beaucoup
es: muchas gracias
ru: Большое спасибо
3-104