Semantic relations

Mono- and bilingual modeling
of selectional preferences
Sebastian Padó
Institute for Computational Linguistics
Heidelberg University
(joint work with Katrin Erk, Ulrike Pado, Yves Peirsman)
Some context
• Computational lexical semantics: modeling the meaning
of words and phrases
• Distributional approach
• Observe the usage of words in corpora
[Diagram: Corpus → Knowledge]
• Robustness: Broad coverage, manageable complexity
• Flexibility: Corpus choice determines model
Structure
Application: Predictions of plausibility judgments
Methods: Distributional semantics
Phenomena: Semantic relations in bilingual dictionaries
Plausibility of Verb-Relation-Argument-Triples
Verb   Relation   Argument   Plausibility
eat    subject    customer   6.9
eat    object     customer   1.5
eat    subject    apple      1.0
eat    object     apple      6.4
• Central aspect of language
• Selectional preferences [Katz & Fodor 1963, Wilks 1975]
• Generalization of lexical similarity
• Incremental language processing [McRae & Matsuki 2009]
• Disambiguation [Toutanova et al. 2005], Applicability of
inference rules [Pantel et al. 2007], SRL [Gildea & Jurafsky 2002]
Modelling Plausibility
• Approximating plausibility by frequency
[Diagram: counts in an English corpus → plausibility estimates]
(eat, obj, apple) 100 → highly plausible
(eat, obj, hat) 1 → somewhat plausible
(eat, obj, telephone) 0 → ?
(eat, obj, caviar) 0 → ?
• Two lexical variables: Frequency of most triples is zero
• Implausibility or sparse data?
• Generalization based on an ontology (WordNet) [Resnik 1996]
• Generalization based on vector space [Erk, Padó, and Padó 2010]
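The sparse-data problem with frequency-based plausibility can be illustrated with a minimal sketch. The toy corpus below is hypothetical and only mirrors the illustrative counts on the slide; the function name `relative_frequency` is an assumption made for this example.

```python
from collections import Counter

# Hypothetical toy corpus of (verb, relation, argument) triples;
# the counts mirror the illustrative numbers on the slide.
observed = [("eat", "obj", "apple")] * 100 + [("eat", "obj", "hat")]
counts = Counter(observed)

def relative_frequency(verb, rel, arg):
    """Relative frequency of arg among all arguments seen for (verb, rel)."""
    total = sum(c for (v, r, _), c in counts.items() if (v, r) == (verb, rel))
    return counts[(verb, rel, arg)] / total if total else 0.0

for arg in ["apple", "hat", "telephone", "caviar"]:
    # Unseen triples get a count of 0, whether they are implausible or just rare.
    print(arg, counts[("eat", "obj", arg)], round(relative_frequency("eat", "obj", arg), 3))
```

Under this estimate, "telephone" and "caviar" receive the same score of zero, which is why some form of generalization beyond raw counts is needed.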
Semantic Spaces
Co-occurrence counts (Fr):
             cultiver   rouler
mandarine    5          1
clémentine   4          1
voiture      1          20
[Figure: mandarine, clémentine, and voiture plotted in a space with axes cultiver and rouler]
• Characterization of word meaning through a profile over occurrence contexts [Salton, Wang, and Yang 1974, Landauer & Dumais 1997, Schütze 1998]
• Geometrically: Vector in high-dimensional space
• High vector similarity implies high semantic similarity
• Nearest neighbors = synonyms
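As a rough illustration of vector similarity, the sketch below computes cosine similarity over the slide's toy example. The counts (mandarine: 5, 1; clémentine: 4, 1; voiture: 1, 20, for the contexts cultiver and rouler) are read off the extracted slide and should be treated as an assumption where the extraction is ambiguous. Cosine is one common similarity measure; the slides do not state which measure was used.

```python
import math

# Toy co-occurrence counts for the contexts (cultiver, rouler).
vectors = {
    "mandarine": (5, 1),
    "clémentine": (4, 1),
    "voiture": (1, 20),
}

def cosine(u, v):
    """Cosine similarity of two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_neighbor(word):
    """Most similar other word in the toy space."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine(vectors[word], vectors[w]))

print(nearest_neighbor("mandarine"))
```

In this toy space the two fruit words point in nearly the same direction, while "voiture" lies far from both.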
Similarity-based generalization
[Pado, Pado & Erk 2010]
• Plausibility is average vector space similarity to seen arguments
• (v, r, a): verb – relation – argument head word triple
• seenargs: set of argument head words seen in the corpus
• wt: weight function
• Z: normalization constant
• sim: semantic (vector space) similarity
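The slide's formula did not survive extraction. A form consistent with the symbol definitions above would be Plaus(v, r, a) = Σ_{a' ∈ seenargs(v, r)} wt(v, r, a') / Z(v, r) · sim(a, a'). The following is a minimal sketch under that assumption, with cosine as `sim`, a uniform default for `wt`, and hypothetical toy vectors; it is not the authors' implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def plausibility(arg, seen_args, vectors, wt=lambda head, freq: 1.0):
    """Weighted average similarity of arg to the seen arguments of one
    (verb, relation) slot. seen_args maps head words to corpus frequency;
    wt is an assumed weight function (uniform by default)."""
    weights = {head: wt(head, freq) for head, freq in seen_args.items()}
    z = sum(weights.values())  # normalization constant Z
    if z == 0:
        return 0.0
    return sum(w * cosine(vectors[arg], vectors[head])
               for head, w in weights.items()) / z

# Hypothetical two-dimensional vectors, for illustration only.
vectors = {
    "apple": (9, 1), "orange": (8, 1), "breakfast": (7, 2),
    "caviar": (6, 1), "telephone": (1, 9),
}
seen_objects_of_eat = {"apple": 100, "orange": 40, "breakfast": 20}

for candidate in ("caviar", "telephone"):
    print(candidate, round(plausibility(candidate, seen_objects_of_eat, vectors), 3))
```

Passing `wt=lambda head, freq: freq` would weight each seen argument by its frequency instead of uniformly.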
Geometrical interpretation
[Figure: words plotted in vector space (apple, telephone, orange, caviar, breakfast, Peter, husband, child), with two marked regions: seen objects of “eat” and seen subjects of “eat”]
Evaluation
Model                                Coverage   Spearman’s rho
Resnik 1996 [ontology-based]         100%       0.123 n.s.
EPP [vector space-based]             98%        0.325 ***
U. Pado et al. 2006 [“deep” model]   78%        0.415 ***
• Triples with human plausibility ratings [McRae et al. 1996]
• Evaluation: Correlation of model predictions with
human judgments
• Spearman’s ρ = 1: perfect correlation; ρ = 0: no correlation
• Result: Vector space model nearly attains the quality of the “deep” model, at 98% coverage (vs. 78%)
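The evaluation measure can be sketched in a few lines: Spearman's ρ is the Pearson correlation of the rank-transformed values. The human ratings below are the four example ratings from the plausibility table earlier in the deck; the model scores are hypothetical and chosen only to show the computation.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else float("nan")

def spearman_rho(x, y):
    return pearson(average_ranks(x), average_ranks(y))

human = [6.9, 1.5, 1.0, 6.4]   # example ratings from the slides
model = [0.8, 0.3, 0.1, 0.7]   # hypothetical model predictions
print(spearman_rho(human, model))
```

Because only ranks matter, the model scores do not need to be on the same scale as the human ratings.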
From one to many languages…
• Vector space model reduces the need for language
resources to predict plausibility judgments
• No ontologies
• Still necessary: Observations of triples, target words
• Large, accurately parsed corpus
• Problematic for basically all languages except English
Model                              Resources          Spearman’s ρ
Resnik [Brockmann & Lapata 2002]   TIGER + GermaNet   .37
EPP [Pado & Peirsman 2010]         HGC                .33
• Can we extend our strategy to new languages?
Predicting plausibility for new languages
• Transfer with a bilingual lexicon [Koehn and Knight 2002]
• Cross-lingual knowledge transfer
[Diagram: (cultiver, Obj, pomme) → bilingual lexicon (cultiver – grow, pomme – apple) → English model trained on an English corpus → (grow, obj, apple): highly plausible]
• Print dictionaries are problematic
• Instead: acquire from distributional data
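The transfer step can be sketched as a lexicon lookup followed by a query to the English model. The lexicon entries come from the slide's example; the function name `transfer`, the stand-in scoring function, and the score 0.9 are hypothetical.

```python
# Bilingual lexicon entries from the slide's example.
lexicon = {"cultiver": "grow", "pomme": "apple"}

# Stand-in for an English plausibility model; the score is hypothetical.
english_scores = {("grow", "obj", "apple"): 0.9}

def score_en(triple):
    return english_scores.get(triple, 0.0)

def transfer(triple, lexicon, score_en):
    """Translate verb and argument, then score the English triple.
    Returns None if either word is missing from the lexicon."""
    verb, rel, arg = triple
    if verb not in lexicon or arg not in lexicon:
        return None
    return score_en((lexicon[verb], rel.lower(), lexicon[arg]))

print(transfer(("cultiver", "Obj", "pomme"), lexicon, score_en))
```

The sketch assumes that grammatical relations carry over unchanged between the two languages, which the slide's example also presupposes.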
Bilingual semantic space
                 (cultiver, grow)   (rouler, drive)
mandarin (E)     4                  2
mandarine (Fr)   5                  1
car (E)          1                  20
[Figure: mandarine, mandarin, and car plotted in the bilingual space with axes cultiver/grow and rouler/drive]
• Joint semantic space for words from both languages [Rapp 1995, Fung & McKeown 1997]
• Dimensions are bilingual word pairs, can be bootstrapped
• Frequencies observable from comparable corpora
• Nearest neighbors: Cross-lingual synonyms ⟷ Translations
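Finding translation candidates as nearest cross-lingual neighbors can be sketched as follows. The counts for the dimensions (cultiver, grow) and (rouler, drive) are read off the extracted slide (mandarine: 5, 1; mandarin: 4, 2; car: 1, 20) and are an assumption where the extraction is ambiguous; cosine as the similarity measure and the function name `nearest_english` are also assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Vectors over the bilingual dimensions (cultiver/grow, rouler/drive).
french = {"mandarine": (5, 1)}
english = {"mandarin": (4, 2), "car": (1, 20)}

def nearest_english(word_fr, n=1):
    """The n English words closest to a French word in the joint space."""
    ranked = sorted(english,
                    key=lambda w: cosine(french[word_fr], english[w]),
                    reverse=True)
    return ranked[:n]

print(nearest_english("mandarine"))
```

Requesting more than one neighbor (`n > 1`) returns a ranked candidate list rather than a single translation.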
Nearest neighbors in bilingual space
               (cultiver, grow)   (rouler, drive)
pear (E)       5                  1
pomme (Fr)     4                  2
car (E)        1                  20
[Figure: pear, pomme, and car plotted in the bilingual space with axes cultiver/grow and rouler/drive]
• Similar usages / context profiles do not necessarily indicate synonymy
• Bilingual case: Peirsman & Pado (2011)
• Lexicon extraction for EN/DE and EN/NL
Evaluation against Gold Standard
• Evaluation of nearest cross-lingual neighbors against a
translators’ dictionary
Analysis of 200 noun pairs (EN-DE)
Meta-relation                Relation      Frequency   Example
Synonymy (50%)               -             99          Verhältnis - relationship
Semantic similarity (16%)    Antonymy      1           Inneres - exterior
                             Co-hyponymy   15          Straßenbahn - bus
                             Hyponymy      3           Kunstwerk - painting
                             Hypernymy     15          Dramatiker - poet
Semantic relatedness (19%)   -             39          Kapitel - essay
Errors (14%)                 -             28          DDR-Zeit - trainee
Similarity by relation
How to proceed?
• Classical reaction: Focus on cross-lingual synonyms
• Aggressive filtering of nearest-neighbor lists
• Risk: Sparse data issues
• Our hypothesis (preliminary version):
• Non-synonymous pairs still provide information about
bilingual similarity
• Should be exploited for cross-lingual knowledge transfer
• Experimental validation: Vary number of synonyms, observe
effect on cross-lingual knowledge transfer
Varying the number of neighbors
• Nearest neighbors: 50% are synonyms
• Further neighbors: quick decline to 10% synonyms
Experimental setup
[Diagram: (bagnole, subj, rouler) → bilingual lexicon (rouler – drive; bagnole – jalopy, banger, car) → English model trained on an English corpus]
Consider plausibilities for:
(jalopy, subj, drive)
(banger, subj, drive)
(car, subj, drive)
Details
• Model:
• English model: trained on BNC as before
• Bilingual lexicon extracted from BNC and Stuttgarter Nachrichtenkorpus HGC as comparable corpora
• Prediction based on n nearest English neighbours for German argument
• Evaluation:
• 90 German (v,r,a) triples with human plausibility ratings [Brockmann & Lapata 2003]
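Prediction from the n nearest English neighbours can be sketched as below. The slides do not say how the scores of the n neighbours are combined; averaging is an assumption made here. The neighbour list reuses the bagnole example from the experimental setup slide, and the English scores and function names are hypothetical.

```python
# Ranked English neighbours of a source-language argument
# (example from the experimental setup slide).
neighbours = {"bagnole": ["jalopy", "banger", "car"]}

# Stand-in English model scores; all values are hypothetical.
english_scores = {
    ("jalopy", "subj", "drive"): 0.4,
    ("banger", "subj", "drive"): 0.2,
    ("car", "subj", "drive"): 0.9,
}

def score_en(arg, rel, verb):
    return english_scores.get((arg, rel, verb), 0.0)

def transfer_plausibility(arg_src, rel, verb_en, n):
    """Average the English model's score over the n nearest English
    neighbours of the source-language argument (assumed aggregation).
    Returns None if no neighbours are known."""
    candidates = neighbours.get(arg_src, [])[:n]
    if not candidates:
        return None
    return sum(score_en(a, rel, verb_en) for a in candidates) / len(candidates)

for n in (1, 2, 3):
    print(n, transfer_plausibility("bagnole", "subj", "drive", n))
```

With n = 1 only the single nearest neighbour is used; larger n brings in further neighbours, which may or may not be synonyms.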
Results – EN-DE
Model                                Resources                               Spearman’s ρ
Resnik [Brockmann & Lapata 2002]     TIGER corpus, German WordNet (GermaNet)   .37
EPP German [Pado & Peirsman 2010]    HGC corpus parsed with PCFG             .33

Translated English EPP, by number of nearest neighbors (NN):
1 NN   2 NN   3 NN   4 NN   5 NN
0.34   0.44   0.41   0.46   0.40
• Result: Transfer model significantly better than
monolingual model, but only if non-synonymous
neighbors are included
Results: Details
                           1 NN   2 NN   3 NN   4 NN   5 NN
English EPP (all)          0.34   0.41   0.44   0.46   0.40
English EPP (subjects)     0.53   0.51   0.56   0.56   0.55
English EPP (objects)      0.58   0.61   0.61   0.64   0.58
English EPP (pp objects)   0.33   0.45   0.45   0.46   0.42
Sources of the positive effect
• Non-synonyms are in fact informative for plausibility
translation
• Semantically similar verbs: eat – munch – feast
• Similar events, similar arguments [Fillmore et al. 2003, Levin 1993]
• Semantically related verbs: peel – cook – eat
• Schemas/narrative chains: shared participants
[Schank & Abelson 1977, Chambers & Jurafsky 2009]
Our hypothesis with qualifications
• Using non-synonymous translation pairs is helpful
1. if transferred knowledge is lexical
• Many infrequently observed datapoints
2. if knowledge is stable across semantically related/similar word pairs
• Counterexample: polarity/sentiment judgments
• food – feast – grub
• Parallel experiment: best results for single nearest neighbor
Summary
• Plausibility can be modeled with fairly shallow methods
• Seen head words plus generalization in vector space
• Precondition: accurately parsed corpus
• If unavailable: Transfer from better-endowed language
• Translation through automatically induced lexicons
• Transfer of knowledge about certain phenomena can
benefit from non-synonymous translations
• Corresponding to monolingual results from QA [Harabagiu et al.
2000], paraphrases [Lin & Pantel 2001], entailment [Dagan et al. 2006], …