Don’t Take Shortcuts!
Computational Lexical Semantics and the Turing Test
Roberto Navigli
http://lcl.uniroma1.it
ERC StG: Multilingual Joint Word Sense Disambiguation (MultiJEDI)
Turing Test (1950)
• A test of a machine’s ability to demonstrate intelligence
• Inspired by a party game known as the “imitation game” (a man vs. a woman)
“I propose to consider the question, ‘Can machines think?’ [Since “thinking” is difficult to define, let’s] replace the question by another, which is closely related to it and is expressed in relatively unambiguous words. […] Are there imaginable digital computers which would do well in the imitation game?”
-Alan Turing, “Computing Machinery and Intelligence” (1950)
Don’t Take Shortcuts! Computational Lexical Semantics and the Turing Test
Roberto Navigli
07/04/2015
Turing Test (1950)
• A test of a machine’s ability to demonstrate intelligence
[Figure: a human judge converses with a human player and a computer player, deciding which is which]
Why a language-driven test?
• Because intelligence is manifested through the ability to
converse (as a sufficient (?) condition)
Why a language-driven test?
• Because intelligence is manifested through the ability to
converse (as a sufficient but not necessary condition)
• Because Natural Language Understanding (NLU) is often
thought of as an AI-complete problem
• NLU requires extensive knowledge about the world and
the ability to interact with it
ELIZA (Weizenbaum, 1964-66)
• Simulates a Rogerian psychotherapist (Person-Centered Therapy)
E> Hi, I’m Eliza. What problem do you want to talk about?
U> Let’s talk about the SOFSEM conference
E> What is your problem with the SOFSEM conference?
U> No problem, really
E> Why not?
U> I just wanted to chat about the conference
E> Is that the real reason?
U> Yes
E> I understand
ELIZA (Weizenbaum, 1964-66)
• Simulates a Rogerian psychotherapist (Person-Centered Therapy)
U> So what
E> Have you asked such questions before?
U> About a zillion times
E> Can you elaborate on that?
U> Ok, let’s talk about the Turing session of the
conference
E> Does that question interest you?
U> Yes, tremendously, I’m here for giving an invited talk!
E> Are you sure?
ELIZA (Weizenbaum, 1964-66)
• An early example of a computer program performing primitive natural language processing
• Processes human replies to questions
• Uses simple template matching by inserting keywords
into template phrases
• No semantics, really!
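The template-matching trick fits in a few lines. A minimal sketch in the spirit of ELIZA, not Weizenbaum’s actual script; the rules and canned replies below are invented for illustration:

```python
import re

# Hypothetical ELIZA-style rules: each pattern captures a keyword phrase
# that is pasted verbatim into a canned reply template.
# There is no semantic analysis whatsoever.
RULES = [
    (re.compile(r".*talk about (.+)", re.I), "What is your problem with {0}?"),
    (re.compile(r".*no problem.*", re.I), "Why not?"),
    (re.compile(r".*\byes\b.*", re.I), "I understand"),
]
FALLBACK = "Is that the real reason?"

def reply(utterance: str) -> str:
    for pattern, template in RULES:
        match = pattern.match(utterance)
        if match:
            return template.format(*match.groups())
    return FALLBACK

print(reply("Let's talk about the SOFSEM conference"))
# → What is your problem with the SOFSEM conference?
```

Any sentence that matches no keyword falls through to a generic prompt: the illusion of understanding comes entirely from echoing the user’s own words back.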
Shortcut #1:
Flatter the user
(AKA: Template Matching)
Mgonz (Humphrys, 1989)
• Memoryless is good in a random conversation!
U> finger
M> cut this cryptic s**t speak in full sentences
U> well hello
M> ok thats it im not talking to you any more
U> mark?
M> mark isnt here and has left me to deal with cretins like you
U> hello mark are you there?
U> I can't tell if this thing is working.
M> ah type something interesting or shut up
Shortcut #2:
Insult the user
(AKA: “type something
interesting or shut up”)
Implementing the Turing Test: the Loebner Prize
• Annual competition in Artificial Intelligence
• Round: a judge converses with a human and a machine
and decides which is which
• Time limit: 5-20+ minutes
• Risk: a winning system might aim at trickery rather than
plausible intelligence
These systems do not really UNDERSTAND!
• JUDGE> I liked "Betty Blue" very much. I like movies
about crazy people.
• PROGRAM> Like wow. I'm so flattered. I saw "The City
of Lost Children" recently. Have you heard of it?
• You just need a database of movies…
These systems do not really UNDERSTAND!
• JUDGE> Yes, but I haven't seen it. Why did you say you
were "so flattered"?
• PROGRAM> OK. That's good... Does there have to be
a reason?
• Reminiscent of ELIZA, maybe?
These systems do not really UNDERSTAND!
• JUDGE> Oh, I suppose not. Say anything which will
convince me you are not a machine
• PROGRAM> Wow - you suppose not? Me too! Listen. I
don't know which convince you I am a machine will
not, and I don't wish to be told either! So, have you
ever visited Australia?
• Alright, better than ELIZA at changing the subject when
on a difficult topic
• But linguistically challenged!
These systems do not really UNDERSTAND!
• JUDGE> are you from australia?
• PROGRAM> It's cool being from Australia I guess.
Gives you something to chat about when talking
with a Yank!
• JUDGE> I am unsure of that reply--are you from
australia?
• PROGRAM> It's cool being from Australia I guess.
Gives you something to chat about when talking
with a Yank!
• Oooops, a little crack in the system?
Let’s get SERIOUS!
• JUDGE> I bought pens for my cattle in Perth, Australia.
What do you think about that?
• PROGRAM1> Check back later and see if I learn the
answer to that one.
• Oooops, a larger crack in the system?
Let’s get SERIOUS!
• JUDGE> I bought pens for my cattle in Perth, Australia.
What do you think about that?
• PROGRAM2> Hey, are they writing the next best seller?
• Nope, it doesn’t work…
Shortcut #3 (generalizes
shortcuts #1 and #2):
Keep away from
Natural Language Processing
(AKA: “the intelligent parrot
approach to the Turing Test”)
What knowledge does HAL 9000 need?
• HAL (Heuristically programmed ALgorithmic computer)

Dave Bowman: Hello, HAL. Do you read me, HAL?
HAL: Affirmative, Dave. I read you.
Dave Bowman: Open the pod bay doors, HAL.
HAL: I'm sorry, Dave. I'm afraid I can't do that.
Dave Bowman: What's the problem?
HAL: I think you know what the problem is just as well as I do.
Dave Bowman: What are you talking about, HAL?
HAL: This mission is too important for me to allow you to jeopardize it.
Dave Bowman: I don't know what you're talking about, HAL.

• Levels of linguistic knowledge involved: phonetics & phonology, morphology, syntax, semantics, discourse, pragmatics (and more?)
So what can we (researchers?) do about
automatic text understanding?
The (Commercial) State of the Art
• How to translate: “I like chocolate, so I bought a bar in a supermarket”?
• Google Translate: “Mi piace il cioccolato, così ho comprato un bar in un supermercato”! (in Italian, “un bar” is a café, not a chocolate bar)
Shortcut #4:
Just use statistics
(AKA: “tons of data will tell you”)
It’s not that the “big data” approach is bad,
it’s just that mere statistics is not enough
Symbol vs. Concept
Source: Little Britain.
Word Sense Disambiguation (WSD)
• The task of computationally determining the
senses (meanings) for words in context
• “Spring water can be found at different altitudes”
Word Sense Disambiguation (WSD)
• It is basically a classification task
– The objective is to learn how to associate word senses with
words in context
– This task is strongly linked to Machine Learning
“I like chocolate, so I bought a bar in a supermarket”
bar = [Figure: the candidate senses of “bar”]
Word Sense Disambiguation (WSD)
[Figure: a WSD system takes the target word “bar” and its context (“I like chocolate, so I bought a bar in a supermarket”), draws on resources (corpora and lexical knowledge resources), and selects the output sense among candidates such as bar/counter (at the bar? coffee bar?), bar/law (bar … court?) and bar/block]
Lexical sample vs. All words
• Lexical sample
– A system is asked to disambiguate a restricted set of words (typically one word per sentence):
• We are fond of fruit such as kiwi and banana.
• The kiwi is the national bird of New Zealand.
• ... a perspective from a native kiwi speaker (NZ).
• All words
– A system is asked to disambiguate all content words in a data set:
• The kiwi is the national bird of New Zealand.
• ... a perspective from a native kiwi speaker (NZ).
But… is WSD any good?
• Back in 2004 (Senseval-3, ACL):
[Figure: state of the art around 65% accuracy, across settings ranging from many instances of just a few words to few instances of many words]
What does “65% accuracy” mean?
• You understand 65 (content) words out of 100 in a text:
• You might behave the wrong way!
• For instance:
I used to be a banker, but
lost interest in the work.
I was arrested after my
therapist suggested I take
something for my kleptomania.
Source: [Iraolak, 2004]

And you need a lot of training data!
• Dozens (hundreds?) of person-years, at a throughput of
one word instance per minute [Edmonds, 2000]
• For each language of interest!
Why is WSD so difficult?
“For three days and nights she danced in the streets with
the crowd.”
WordNet [Miller et al. 1990; Fellbaum, 1998]:
• street1 - a thoroughfare (usually including sidewalks) that
is lined with buildings
• street2 - the part of a thoroughfare between the sidewalks;
the part of the thoroughfare on which vehicles travel
• street3 - the streets of a city viewed as a depressed
environment in which there is poverty and crime and
prostitution and dereliction
• street5 - people living or working on the same street
Inter-Tagger Agreement using WordNet
• On average human sense taggers agree on their
annotations 67-72% of the time
• Question 1: So why should we expect an automatic
system to be able to do any better than us?!
• Question 2: What can we do to improve this frustrating
scenario?
From Fine-Grained to Coarse-Grained Word Senses
• We need to remove unnecessarily fine-grained sense
distinctions
[Figure: clustering fine-grained senses — “street with sidewalks” and “street without sidewalks” merge into “street with/without sidewalks?”, while “people living/working on the street” and “depressed environment” remain distinct]
From Fine-Grained to Coarse-Grained Word Senses (1)
• Coarser sense inventories can be created manually
[Hovy et al., NAACL 2006]:
– Start with the WordNet sense inventory of a target word
– Iteratively partition its senses
– Until 90% agreement is reached among human annotators
in a sense annotation task
From Fine-Grained to Coarse-Grained Word Senses (2)
• Coarser sense inventories can be created automatically
using WSD [Navigli, ACL 2006]:
[Figure: the flat list of WordNet senses becomes a hierarchy of senses — homonyms at the top, polysemous senses below, subsenses at the leaves]
Is coarse-grained better than fine-grained?
• Let’s move forward to 2007 (Semeval-2007, ACL):
[Figure: Semeval-2007 results comparing systems using training data against the Most Frequent Sense baseline]
The Most Frequent Sense (MFS) Baseline
• Given an ambiguous word, simply always choose the
most common (i.e., most frequently occurring) sense
within a sense-annotated corpus
• High performance on open text (the predominant sense
is most likely to be correct)
• Looks like a pretty lazy way to tackle ambiguity!
[Figure: the MFS approach answers “Sense #1” for every occurrence of the word]
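The baseline fits in a handful of lines. A minimal sketch over a hypothetical toy sense-annotated corpus (the sense labels are invented for illustration):

```python
from collections import Counter

# Hypothetical sense-annotated corpus: one (word, sense) pair per occurrence.
annotated_corpus = [
    ("bar", "bar/counter"), ("bar", "bar/counter"), ("bar", "bar/counter"),
    ("bar", "bar/law"), ("bar", "bar/block"),
]

def mfs(word, corpus):
    """Return the word's most frequent sense in the corpus.

    The context of the occurrence is never consulted: the answer is
    always the same, no matter what the sentence says.
    """
    counts = Counter(sense for w, sense in corpus if w == word)
    return counts.most_common(1)[0][0]

print(mfs("bar", annotated_corpus))
# → bar/counter, for every occurrence of "bar" whatsoever
```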
But, hey, just think of the Web!
There are at least 8 billion Web pages out there!
On average each page contains ~500 words (2006)
MFS on 8G * 500 words = 4T words...
• At 78.9% performance, it makes 21.1% of errors
• So it chooses the wrong sense for 844G words (out of 4T)
• Can we afford that?
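The back-of-the-envelope arithmetic behind those numbers:

```python
# Error count for MFS applied to the whole Web, per the slide's figures.
pages = 8e9            # at least 8 billion Web pages
words_per_page = 500   # ~500 words per page on average (2006)
accuracy = 0.789       # MFS performance: 78.9%

total_words = pages * words_per_page   # 4e12: 4T words
errors = total_words * (1 - accuracy)  # ~8.44e11: 844G wrongly sensed words

print(f"{total_words:.0e} words, {errors:.2e} disambiguation errors")
```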
• So what’s the output on:
– JUDGE> I bought pens for my cattle in Perth, Australia. What
do you think about that?
– PROGRAM2> Hey, are they writing the next best
seller?
Shortcut #5:
Most Frequent Sense
(AKA: “lazy about senses”)
Coarse-grained Word Sense Disambiguation
[Navigli et al. 2007; Pradhan et al. 2007]
• Semeval-2007, ACL:
[Figure: Semeval-2007 results — systems using training data vs. the knowledge-based SSI system with a large knowledge base (KB); state of the art: 82-83% (best: 83.2%)]
Knowledge-based WSD: the core idea
• We view WordNet (or any other knowledge base) as a
graph
• We exploit the relational structure (i.e., edges) of the
graph to perform WSD
WordNet (Miller et al. 1990; Fellbaum, 1998)
[Figure: an excerpt of the WordNet graph, with is-a and has-part edges connecting synsets such as {wheeled vehicle}, {wheel}, {brake}, {splasher}, {self-propelled vehicle}, {wagon, waggon}, {tractor}, {motor vehicle}, {locomotive, engine, locomotive engine, railway locomotive}, {golf cart, golfcart}, {car, auto, automobile, machine, motorcar}, {convertible}, {air bag}, {car window}, {accelerator, accelerator pedal, gas pedal, throttle}]
Knowledge-based WSD: the main idea

“The waiter served white port”

[Figure: a disambiguation graph over the sentence — candidate senses such as waiter#1/waiter#2, serve#5/serve#15, white#1/white#3 and port#1/port#2 (the word senses of interest) are connected through intermediate nodes like person#1, player#1, beverage#1, alcohol#1, wine#1 and fortified wine#1]
Two Knowledge-Based WSD Algorithms which Exploit
Semantic Structure
• Graph connectivity measures [Navigli & Lapata, 2007; 2010]
• Structural Semantic Interconnections [Navigli & Velardi, 2005]
• Both based on a graph construction phase
Graph Construction
[Navigli & Lapata, IJCAI 2007; IEEE TPAMI 2010]
• Transform the input sentence (e.g. “She drank some milk”)
into a bag of content words:
{ drinkv, milkn }
• Build a disambiguation graph that includes all nodes in
paths of length ≤ L connecting pairs of senses of words in
context
• The disambiguation graph contains the possible semantic
interpretations for the sentence
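The construction can be sketched as follows. The mini knowledge base standing in for WordNet is hypothetical (a handful of invented sense nodes and edges), as is the sense notation:

```python
from itertools import combinations

# Hypothetical mini knowledge base standing in for WordNet:
# nodes are word senses, edges are semantic relations (kept symmetric).
KB = {
    "drink#v#1": ["beverage#n#1"],
    "beverage#n#1": ["drink#v#1", "milk#n#1"],
    "milk#n#1": ["beverage#n#1"],   # milk, the beverage
    "milk#n#2": ["river#n#1"],      # the Milk River: unrelated to drinking
    "river#n#1": ["milk#n#2"],
}
SENSES = {"drink_v": ["drink#v#1"], "milk_n": ["milk#n#1", "milk#n#2"]}

def paths_up_to(src, dst, max_nodes):
    """All simple paths src -> dst spanning at most max_nodes vertices (DFS)."""
    found, stack = [], [(src, [src])]
    while stack:
        node, path = stack.pop()
        if node == dst and len(path) > 1:
            found.append(path)
            continue
        if len(path) >= max_nodes:
            continue
        for nxt in KB.get(node, []):
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    return found

def build_graph(words, max_nodes=4):
    """Union of all nodes lying on short sense-to-sense paths
    between senses of different words in the context."""
    nodes = set()
    for w1, w2 in combinations(words, 2):
        for s1 in SENSES[w1]:
            for s2 in SENSES[w2]:
                for path in paths_up_to(s1, s2, max_nodes):
                    nodes.update(path)
    return nodes

# "She drank some milk" -> { drink_v, milk_n }: the river sense of
# "milk" lies on no short path to "drink", so it never joins the graph.
print(build_graph(["drink_v", "milk_n"]))
```

Only the semantically coherent interpretations survive in the resulting graph; the next step is to rank their nodes.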
Knowledge-based WSD with Connectivity Measures
[Navigli & Lapata, IJCAI 2007; IEEE TPAMI 2010]
• Measures from different areas of computer science (graph theory, social network analysis, link analysis, etc.) tested within a single framework
• Local measures:
– Degree
– Betweenness [Freeman, 1977]
– Key Player Problem [Borgatti, 2003]
– Maximum Flow
– PageRank [Brin and Page, 1998]
– HITS [Kleinberg, 1998]
• Global measures:
– Compactness [Botafogo et al., 1992]
– Graph Entropy
– Edge Density
Degree Centrality for WSD
[Navigli & Lapata, IEEE TPAMI 2010]
• Calculate the normalized number of edges terminating in
a given vertex
• Choose the highest-ranking sense(s) for each target word
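A minimal sketch on a hypothetical disambiguation graph (the adjacency lists below are invented for illustration, in the spirit of the “white port” example):

```python
# Hypothetical disambiguation graph over candidate senses,
# as a symmetric adjacency dict (invented for illustration).
graph = {
    "waiter#1": {"serve#5", "person#1"},  # the restaurant attendant
    "waiter#2": set(),                    # "one who waits": isolated here
    "serve#5": {"waiter#1", "port#2"},
    "serve#15": {"port#1"},
    "port#1": {"serve#15"},               # the harbor
    "port#2": {"serve#5", "wine#1"},      # the fortified wine
    "wine#1": {"port#2"},
    "person#1": {"waiter#1"},
}

def degree(g, v):
    """Normalized degree: edges at v over the maximum possible, |V| - 1."""
    return len(g[v]) / (len(g) - 1)

def disambiguate(g, candidates):
    """Pick the candidate sense best connected to the rest of the graph."""
    return max(candidates, key=lambda v: degree(g, v))

print(disambiguate(graph, ["port#1", "port#2"]))      # → port#2 (the wine)
print(disambiguate(graph, ["waiter#1", "waiter#2"]))  # → waiter#1
```

Despite its simplicity, degree turns out to be one of the strongest measures in the study.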
Structural Semantic Interconnections (SSI)
[Navigli & Velardi, IEEE TPAMI 2005]
• Key idea: “good” edge sequences (semantic
interconnections) encoded with an edge grammar
• Examples of good and bad semantic interconnections according to the SSI grammar: [figure]
• Exploit good edge sequences only for WSD
The SSI Algorithm
• It takes as input a set of words in context
• It consists of two phases:
1. graph construction
• A graph, induced from WordNet, is associated with the input
words
2. interpretation
• For each word the highest-ranking node (i.e. word sense) in the
graph is selected according to a path-based measure
Structural Semantic Interconnections (SSI)
[Navigli & Velardi, IEEE TPAMI 2005]
• Available online: http://lcl.uniroma1.it/ssi
• Example: “We drank a cup of chocolate”
[Figure: the sense graph for the example, densely connected by related-to edges]
Lesson Learned (1): Horizontal vs. Vertical Relations
• WordNet provides mostly paradigmatic (i.e. vertical)
relations
bunga bunga -is-a-> sex party
• We need syntagmatic (i.e. horizontal) relations for high-quality WSD
bunga bunga -related-to-> Silvio Berlusconi
beware: hot information!
Lesson Learned (2): The Richer, The Better
• Highly-interconnected semantic networks have a great
impact on knowledge-based WSD [Navigli & Lapata,
IEEE TPAMI 2010] even in a fine-grained setting
[Figure: WSD performance of Degree/SSI rises sharply with network connectivity beyond a divergence point, up to a “nirvana point”; source: [Navigli and Lapata, 2010]]
The knowledge acquisition bottleneck
• So why don’t we “simply” enrich 100k concepts with 100
semantic relations each?
• Again it would require dozens of person-years of
annotation work!
• A task which then has to be repeated for each new
language!
Multilingual Joint Word Sense Disambiguation
(MultiJEDI)
Key Idea 1: create knowledge for all languages
[Figure: many languages feeding a single multilingual semantic network]
Multilingual Joint Word Sense Disambiguation
(MultiJEDI)
Key Idea 2: use many languages to disambiguate one language
The BabelNet solution [Navigli & Ponzetto, ACL 2010]
• A wide-coverage multilingual semantic network
containing concepts, named entities and semantic
relations, obtained from the automatic integration of:
– Wikipedia (encyclopedic)
– WordNet (lexicographic)
BabelNet [Navigli & Ponzetto, ACL 2010]
• From a heterogeneous set of monolingual resources (WordNet, MultiWordNet, WOLF, BalkaNet, MCR, GermaNet)…
• …to a new unified multilingual language resource
• containing millions of ready-made semantic relations harvested from Wikipedia
• First version available at: http://lcl.uniroma1.it/babelnet
State-of-the-art Monolingual WSD
(Navigli & Ponzetto, AIJ 2012?, pending revision)
Ok, so now we have ~85% performance with non-naive WSD!
What for?
• JUDGE> I bought pens for my cattle
in Perth, Australia. What do you think
about that?
• PROGRAM3> Hey, do you like
raising livestock?
Keep cool, we have not solved the problem yet!
• What about pragmatics without tricks?
• And discourse? (Maybe not so relevant for a 5-minute
chat?)
• Also, we just dealt with (some) computational lexical
semantics
• Much more can be done!
Don’t take shortcuts!
The next level: our current work in Rome
• Andrea Moro: ontologized semantic relations, really
• Simone Ponzetto: multilingual WSD & semantic relatedness
• Stefano Faralli: domain WSD & Ontology Learning from Scratch
• Antonio Di Marco: Word Sense Induction & ontology-based query suggestion
• Tiziano Flati: ontologized selectional preferences
• Taher Pilehvar: large-scale WSD & Statistical Machine Translation
Thanks! (grazie)
Roberto Navigli
http://lcl.uniroma1.it
Joint work with: Mirella Lapata, Simone Ponzetto, Paola Velardi