Ontologies - Departament de Ciències de la Computació

Ontologies
German Rigau i Claramunt
http://www.lsi.upc.es/~rigau
TALP Research Center
Departament de Llenguatges i Sistemes Informàtics
Universitat Politècnica de Catalunya
Ontologies
Outline
• WordNet (Miller et al. 90, Fellbaum 98)
• EuroWordNet (Vossen et al. 98)
• Spanish WordNet
• Combining Methods (Atserias et al. 97)
• Mapping hierarchies (Daudé et al. 01)
• Mikrokosmos (Viegas et al. 96)
• Cyc (Malesh et al. 96)
• WordNet 2 (Harabagiu 98)
• MindNet (Richardson et al. 97)
• ThoughtTreasure (Mueller 00)
• Meaning ...
WordNet &
EuroWordNet
WordNet & EuroWordNet
WordNet
• Princeton University (Miller et al. 1990)
• Lexicalized concepts (words, lexical units)
• Linked to one another by semantic relations
  • synonymy
  • antonymy
  • hypernymy-hyponymy
  • meronymy
  • entailment
  • cause
  • ...
WordNet & EuroWordNet
Semantic Relations in WN1.5
• Synonymy
  • Lexicalized concepts (SYNSETS)
  • Weak notion of synonymy: synonymy in context
  • Synset: a set of words or lexical units that express the same concept in a given context
• Hypernymy / Hyponymy
  • Class-to-subclass relation
WordNet & EuroWordNet
Semantic Relations in WN1.5
• Meronymy
  • Component part: {hand}{arm}
  • Member of a collection: {person}{people}
  • Substance: {newspaper}{paper}
WordNet & EuroWordNet
Semantic Relations in WN1.5
• Antonymy: {big}{small}
• Cause: {kill}{die}
• Entailment: {divorce}{marry}
• Derivation: {presidential}{president}
• Similarity: {good}{positive}
WordNet & EuroWordNet
WordNet Example
<conveyance>
<vehicle>
<motor vehicle, automobile,...>
<doorlock>
<car door>
<cruiser, squad car, patrol car, ...>
<cruiser, squad car, patrol car, ...> <cab, taxi, hack, ...>
WordNet & EuroWordNet
EuroWordNet
• Project LE-2 4003
  • Telematics Application Programme of the EU
• Semantic networks for several languages
• Integrated and interconnected
  • English: University of Sheffield
  • Dutch: University of Amsterdam
  • Italian: I.L.C., Pisa
  • Spanish: UB, UPC, UNED
• Computers and the Humanities (special issue, 1998)
• http://www.hum.uva.nl/~ewn/
WordNet & EuroWordNet
EuroWordNet Extensions
• EWN2: German, French, Czech, Swedish, Estonian
• ITEM project: Spanish, Catalan, Basque
• CREL (Centre de Referència d’Enginyeria Lingüística): Catalan (UB, UPC)
WordNet & EuroWordNet
Applications
• Development of basic resources
• Cross-lingual information processing
  - Multilingual information retrieval systems (e.g., Internet)
  - Lexical-semantic module of language engineering systems
    • Information extraction
    • Machine translation
WordNet & EuroWordNet
Design Requirements
• Preserve the semantic relations specific to each language
• Maximum compatibility across the different resources
• Relative independence of the wordnets
  • in the construction process
  • in the final result
WordNet & EuroWordNet
EuroWordNet Components
• Core
  • The ILI (Interlingual Index)
  • The Top Concept Ontology (TCO)
  • Domain Ontology (DO)
• Periphery
  • Language-specific wordnets
WordNet & EuroWordNet
Interlingual Index of EuroWordNet
• Unstructured collection of records
• Each linked to
  • at least one synset of an EWN wordnet
  • an element of the TCO or DO
• Associated with WN 1.5 synsets
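The role of the ILI as a flat pivot between wordnets can be sketched in a few lines of Python. This is only an illustration under assumptions: the class names `Synset` and `InterlingualIndex`, the record identifier `ili-head`, and the restriction to plain equivalence links are all hypothetical and not taken from the EuroWordNet database format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    language: str     # e.g. "es", "en"
    variants: tuple   # the words that make up the synset
    ili: str          # hypothetical identifier of the linked ILI record


class InterlingualIndex:
    """Unstructured collection of records; each record groups linked synsets."""

    def __init__(self):
        self._by_record = defaultdict(list)

    def add(self, synset):
        self._by_record[synset.ili].append(synset)

    def equivalents(self, synset, target_language):
        """Synsets of another language that share the same ILI record."""
        return [s for s in self._by_record[synset.ili]
                if s.language == target_language]


index = InterlingualIndex()
cabeza = Synset("es", ("cabeza",), "ili-head")
head = Synset("en", ("head",), "ili-head")
index.add(cabeza)
index.add(head)
print(index.equivalents(cabeza, "en"))
```

Because the index itself carries no hierarchy, every language keeps its own relations and only the equivalence links cross languages.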
WordNet & EuroWordNet
Top Concept Ontology of EuroWordNet
• Hierarchy of language-independent concepts
  • semantic distinctions: object, place, dynamic, …
  • abstract (not lexical)
• Superimposed on the ILI
• Three types of entities:
  • First order: concrete entities
  • Second order: static or dynamic situations
  • Third order: abstract propositions
WordNet & EuroWordNet
Top Concept Ontology of EuroWordNet
[Figure: Top Concept Ontology tree, with a count next to each node]
Top 0; 1stOrderEntity 1; 2ndOrderEntity 0; 3rdOrderEntity 33
Origin 0; Natural; Living; Plant 18; Human 106; Creature 2; Animal 23; Artifact 144
Form; Substance 32; Solid 63; Liquid 13; Gas 1; Object 1
Composition 0; Part 86; Group 63
Function 55; Vehicle 8
SituationType 6; Dynamic 134; BoundedEvent 183; UnboundedEvent 48; Static; Property 61; Relation 38
SituationComponent 0; Cause 67; Agentive 170; Phenomenal 17; Stimulating 25; Communication 50; Condition 62; Existence 27; Experience 43; Location 76; Manner 21; Mental 90
Counts not attached to a label in the figure: 21, 30, 0, 28, 62
WordNet & EuroWordNet
Domain Ontology of EuroWordNet
• Hierarchy of domain labels
• Reduces polysemy
• Domains:
  • Traffic:
    • Road traffic, air traffic
  • International information
  • Mycology
  • Medicine
WordNet & EuroWordNet
EuroWordNet Relations
• Richer than in WN
• Between:
  • synsets (monolingual modules)
  • ILI records (multilingual): {actuar-1} EQ-SYNONYM {‘behave in a certain manner’}
  • ILI records and the TCO or DO
WordNet & EuroWordNet
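Interlingual Relations in EuroWordNet
[Figure: language-specific synsets linked to ILI records through eq_synonym, has_eq_hyponym and has_eq_hypernym relations. ILI records: finger, toe, head, human head, animal head. Italian (IT): dito (finger or toe). Spanish (ES): dedo, cabeza. English (ING): finger, toe, head. Dutch (HOL): hoofd, kop.]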
WordNet & EuroWordNet
EuroWordNet Relations
relation | applicable labels | example | description
HAS_XPOS_HYPERONYM | dcr | destrucción > cambiar | cross-part-of-speech hypernymy
HAS_XPOS_HYPONYM | dr | cambio > destruir | cross-part-of-speech hyponymy
NEAR_SYNONYM | r | aparato <> instrumento | near-synonymy
NEAR_ANTONYM | r | construir <> destrozar | near-antonymy
XPOS_NEAR_ANTONYM | r | construcción > destrozar | cross-part-of-speech near-antonymy
INVOLVED | dcr | martillear > martillo | entity directly related to an event
ROLE | dcr | vino > beber | event directly related to an entity
involved_agent | dcr | educar > educador | involvement in which the entity plays an agentive role
role_agent | dcr | educador > educar | role in which the entity plays an agentive role
role_location | dcr | comedor > comer | role in which the entity plays a locative role
HAS_MERONYM | dcrn | cara > nariz | meronymy (generic)
has_mero_portion | dcrn | pan > mendrugo | inverse of the previous one
has_mero_location | dcrn | desierto > oasis | inverse of the previous one
BE_IN_STATE | dcrn | belleza > bello | state corresponding to the possession of a certain property
Spanish WordNet:
Building Process
Spanish WordNet
General Methodology
1) Mapping to WN1.5
– manual work
– automatic derivation of equivalents, using bilingual dictionaries
2) Manual correction
3) Re-structuring
Spanish WordNet
Main Steps: First Core (Manual Translation)
– Nouns:
A) WN1.5’s Tops file plus the first level of hyponyms (about 800 synsets).
B) The rest of EWN’s Common Base Concepts (those not already in our set).
C) Manual translation of the synsets lying between (A) and (B) in the WN1.5 hierarchy, yielding a compact taxonomy equivalent to WN1.5 without gaps.
– Verbs:
Manual translation of EWN’s Base Concepts (about 150 synsets).
Spanish WordNet
Main Steps: Subset 1 (Semi-automatic)

Nouns:
– Applying automatic methods using bilingual dictionaries
– Manually validating several subsets to check whether each link is correct
– Deriving a Confidence Score (CS) for every automatic method (heuristic)
– Selecting synset-word pairs with a CS above 85%
– Some manual correction of this Subset 1 (mainly filling gaps)
Verbs:
– 3,600 English verbs connected to WN1.5 senses and ambiguously translated into Spanish are manually inspected and disambiguated
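The noun procedure above (validate a sample per method, derive a confidence score, keep the links of reliable methods) can be sketched as follows. This is a minimal illustration, not the project's actual tooling: the function names, the tuple layout and the choice of treating the 85% threshold as inclusive are assumptions.

```python
from collections import defaultdict


def confidence_scores(validated):
    """validated: iterable of (method, is_correct) pairs from manual checking.

    Returns the percentage of correct links per method (its CS).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for method, is_correct in validated:
        total[method] += 1
        if is_correct:
            correct[method] += 1
    return {m: 100.0 * correct[m] / total[m] for m in total}


def select_links(candidates, scores, threshold=85.0):
    """candidates: iterable of (synset, word, method) proposals.

    Keeps the synset-word pairs produced by a method whose CS reaches the
    threshold (inclusive here, which is an assumption of this sketch).
    """
    return [(synset, word) for synset, word, method in candidates
            if scores.get(method, 0.0) >= threshold]


sample = [("a", True)] * 9 + [("a", False)] + [("b", True)] * 6 + [("b", False)] * 4
scores = confidence_scores(sample)
links = select_links([("s1", "w1", "a"), ("s2", "w2", "b")], scores)
print(scores, links)
```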
Spanish WordNet
Main Steps: Subset 1 (Results 1)
Measure | Nouns | Verbs | Others | Total
Synsets | 18577 | 2602 | 0 | 21179
Number of senses (variants) | 39620 | 6795 | 0 | 46415
Mean variants per synset | 2.22 | 2.61 | 0 | 2.27
Corresponding number of entries (words) | 23216 | 2278 | 0 | 25494
Mean senses per word | 1.77 | 2.98 | 0 | 1.88
Language Internal Relations | 40559 | 3749 | 0 | 44308
Average per synset | 2.18 | 1.44 | 0 | 2.09
Equivalent Relations to ILI (WN1.5) | 18634 | 2602 | 0 | 21236
Average per synset | 1.00 | 1.00 | 0 | 1.00
Synsets without ILI | 0 | 0 | 0 | 0
Percentage of synsets without translation | 0% | 0% | | 0%
Spanish WordNet
Main Steps: Subset 1 (Results 2)
CS | Nouns | Verbs | Total
100% (manual) | 5041 | 6795 | 11836
> 97% | 403 | 0 | 403
> 95% | 304 | 0 | 304
> 93% | 1598 | 0 | 1598
> 86% | 27649 | 0 | 27649
> 85% | 4625 | 0 | 4625
Total | 39620 | 6795 | 46415
Spanish WordNet
Main Steps: Subset 2
Main goals
– enhance the quality of Subset 1 by manual revision
– extend it by building synsets manually
4 sub-tasks
Spanish WordNet
Main Steps: Subset 2
1) Manually covering the gaps in the hyponymy chains that are covered by other languages
2) Manual cleaning of some automatically generated variants.
– (a) pairs of synsets which are adjacent in the hyponymy chain and share at least one variant.
  • deleting redundant variants
  • relocating variants to either pre-existing or newly created synsets
– (b) multi-word expressions present in synsets.
  • deleting non-lexicalized expressions
Spanish WordNet
Main Steps: Subset 2
3) Manual addition of new vocabulary considered relevant.
– It comes mainly from the Catalan WordNet: since we are building both wordnets in parallel, we detected the synsets that had been built for Catalan but not for Spanish
4) Manual addition of cross-part-of-speech relations between nominal and verbal synsets.
– This work has been based mainly on noun-verb pairs obtained by morphological criteria. (Work carried out by UNED, Madrid)
Spanish WordNet
Main Steps: Subset 2 (Results)
Measure | Nouns | Verbs | Others | Total
Synsets | 19663 | 3538 | 0 | 23201
Number of senses (variants) | 39782 | 8394 | 0 | 48176
Mean variants per synset | 2.02 | 2.37 | 0 | 2.08
Corresponding number of entries (words) | 22881 | 3324 | 0 | 26205
Mean senses per word | 1.74 | 2.53 | 0 | 1.84
Language Internal Relations | 43151 | 6756 | 2661 | 52568
Average per synset | 2.19 | 1.91 | ? | 2.27
Equivalent Relations to ILI (WN1.5) | 19534 | 3534 | 0 | 23068
Average per synset | 0.99 | 1.00 | 0 | 0.99
Synsets without ILI | 185 | 4 | 0 | 189
Percentage of synsets without translation | 1% | 0% | 0 | 1%
Spanish WordNet
Main Steps: Subset 2 (Results)
Confidence (Variants) | Nouns | Verbs | Total
100% (Manual) | 7819 | 8394 | 16213
>96% | 382 | 0 | 382
>94% | 2948 | 0 | 2948
>92% | 1364 | 0 | 1364
>85% | 23113 | 0 | 23113
>84% | 4156 | 0 | 4156
Total | 39782 | 8394 | 48176
Spanish WordNet
Main Steps: Beyond Subset 2

Massive Manual Checking (from Nov’98)
– Using WEI
– Variants automatically generated
– Filling gaps in the hierarchy
– New vocabulary
– New Adjectives
Spanish WordNet
Main Steps: Beyond Subset 2
Measure | Noun | Verb | Others | Total
Synsets | 24215 | 4079 | 2191 | 30485
No. of senses | 40759 | 9317 | 2439 | 52515
Sens./syns. | 1.68 | 2.28 | 1.11 | 1.72
Entries | 26485 | 3828 | 2439 | 32752
Sens./entry | 1.54 | 2.43 | 1.00 | 1.60
LIRels. | 54832 | 7978 | 10855 | 73665
LIRels/syns | 2.26 | 1.96 | * | 2.42
EQRels-ILI | 24209 | 4074 | 0 | 28283
EQRels/syn | 1.00 | 1.00 | 0 | 0.93
Synsets without ILI | 62 | 5 | 2191 | 2258
Spanish WordNet
Main Steps: Beyond Subset 2
CS | Nouns | Verbs | Adjectives | Total
99% (Manual) | 16568 | 9317 | 2439 | 28324
97% | 310 | 0 | 0 | 310
95% | 2652 | 0 | 0 | 2652
93% | 1173 | 0 | 0 | 1173
90% | 6 | 0 | 0 | 6
86% | 16605 | 0 | 0 | 16605
85% | 3445 | 0 | 0 | 3445
Total | 40759 | 9317 | 2439 | 52515
Spanish WordNet
Main Steps: Parole Coverage
Frequency | Nouns: parole entries | Nouns: covered | Nouns: %coverage | Verbs: parole entries | Verbs: covered | Verbs: %coverage
1001+ | 147 | 143 | 97.28 | 110 | 107 | 97.27
501-1000 | 261 | 246 | 94.25 | 139 | 118 | 84.89
251-500 | 462 | 429 | 92.86 | 218 | 172 | 78.90
101-250 | 933 | 863 | 92.50 | 381 | 257 | 67.45
51-100 | 959 | 863 | 89.99 | 374 | 265 | 70.86
31-50 | 892 | 804 | 90.13 | 347 | 185 | 53.31
21-30 | 730 | 632 | 86.57 | 286 | 141 | 49.30
11-20 | 1202 | 978 | 81.36 | 469 | 175 | 37.31
6-10 | 1024 | 790 | 77.15 | 360 | 129 | 35.83
3-5 | 968 | 665 | 68.70 | 254 | 74 | 29.13
2 | 435 | 257 | 59.08 | 123 | 32 | 26.02
1 | 643 | 334 | 51.94 | 131 | 26 | 19.85
overall | 8656 | 7004 | 80.91 | 3192 | 1681 | 52.66
Spanish WordNet
Current Figures
– Spanish, Catalan, Basque, (English)
– http://nipadio.lsi.upc.es/wei2.html
Language | Noun synsets | Noun words | Verb synsets | Verb words | Adj synsets | Adj words
English | 60556 | 87641 | 11363 | 14727 | 16428 | 19101
Spanish | 43522 | 47665 | 7934 | 5312 | 12481 | 8762
Catalan | 30701 | 32987 | 4505 | 4285 | 1444 | 1561
Combining Multiple Methods
for the Automatic Construction
of Multilingual WordNets
Combining Multiple Methods ...
Outline

Ten class methods
– Four monosemic criteria
– Four polysemic criteria
– Two hybrid criteria
Three conceptual distance methods
– CD1: using pairwise word co-occurrences
– CD2: using headword and genus
– CD3: using bilingual Spanish entries with multiple translations
Combining Multiple Methods ...
Ten class methods
– Four Classes
[Figure: the four classes of translation links between Spanish words (SW) and English words (EW)]
Combining Multiple Methods ...
Ten class methods
– Four monosemic criteria
[Figure: the four monosemic configurations linking Spanish words (SW), English words (EW) and a single synset]
Combining Multiple Methods ...
Ten class methods
– Four polysemic criteria
[Figure: the four polysemic configurations linking Spanish words (SW), English words (EW) and several synsets (Synset+)]
Combining Multiple Methods ...
Ten class methods
– Variant criterion
<..., EW, ..., EW, ...>
SW
– Field criterion
<..., headword-EW, ..., Ind-EW, ...>
SW
Combining Multiple Methods ...
Ten class methods

Results
Criterion | #links | #synsets | #words | %ok
mono1 | 3697 | 3583 | 3697 | 92
mono2 | 935 | 929 | 661 | 89
mono3 | 1863 | 1158 | 1863 | 89
mono4 | 2688 | 1328 | 2063 | 85
poly1 | 5121 | 4887 | 1992 | 80
poly2 | 1450 | 1426 | 449 | 75
poly3 | 11687 | 6611 | 3165 | 58
poly4 | 40298 | 9400 | 3754 | 61
Variant | 3164 | 2195 | 2261 | 85
Field | 510 | 379 | 421 | 78
Combining Multiple Methods ...
Conceptual Distance methods

Conceptual Distance (Agirre et al. 94)
– length of the shortest path
– specificity of the concepts
$$\mathrm{dist}(w_1, w_2) = \min_{c_{1i} \in w_1,\; c_{2j} \in w_2} \; \sum_{c_k \in \mathrm{path}(c_{1i}, c_{2j})} \frac{1}{\mathrm{depth}(c_k)}$$
– using WordNet
– Bilingual dictionary
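The conceptual distance above (shortest path between two senses, with each node on the path weighted by the inverse of its depth) can be tried on a toy hierarchy. The sketch below is only an illustration: the `HYPERNYM` table is a hand-made fragment with made-up node names, and it assumes a tree, so the shortest path always runs through the lowest common ancestor.

```python
# Toy child -> parent links (hypothetical fragment, not real WordNet data).
HYPERNYM = {
    "object": "entity",
    "artifact": "object",
    "structure": "artifact",
    "building": "structure",
    "place_of_worship": "building",
    "church": "place_of_worship",
    "abbey_church": "church",
    "house": "building",
    "religious_residence": "house",
    "monastery": "religious_residence",
    "abbey_monastery": "monastery",
}


def ancestors(concept):
    """Chain from the concept up to the root, concept included."""
    chain = [concept]
    while concept in HYPERNYM:
        concept = HYPERNYM[concept]
        chain.append(concept)
    return chain


def depth(concept):
    """Depth in the hierarchy, with the root at depth 1."""
    return len(ancestors(concept))


def path(c1, c2):
    """Nodes on the shortest path between c1 and c2 in the tree."""
    up1, up2 = ancestors(c1), ancestors(c2)
    common = next(a for a in up1 if a in up2)   # lowest common ancestor
    left = up1[:up1.index(common) + 1]
    right = up2[:up2.index(common)]
    return left + right[::-1]


def dist(senses1, senses2):
    """Minimum over sense pairs of the sum of 1/depth along the path."""
    return min(
        (sum(1.0 / depth(ck) for ck in path(c1, c2)), (c1, c2))
        for c1 in senses1 for c2 in senses2
    )


value, pair = dist(["church"], ["abbey_church", "abbey_monastery"])
print(value, pair)
```

Deep, specific concepts contribute little to the sum, so two senses that meet low in the hierarchy come out closer than two that only meet near the top.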
Combining Multiple Methods ...
Conceptual Distance methods

Three conceptual distance methods
– CD1: using pairwise word co-occurrences
– CD2: using headword and genus
– CD3: using bilingual Spanish entries with
multiple translations
Combining Multiple Methods ...
Conceptual Distance methods (Example CD2)
<entity>
<object, ...>
<artifact, artefact>
<structure, construction>
<building, edifice>
<house, lodging>
<place of worship, ...>
<church, church building>
<abbey>
<religious residence, cloister>
<convent>
<monastery>
<abbey>
<abbey>
abadía_1_2 Iglesia o monasterio regido por un abad o abadesa
(abbey, a church or a monastery ruled by an abbot or an abbess)
Combining Multiple Methods ...
Conceptual Distance methods (Example CD2)
<entity>
<object, ...>
<artifact, artefact>
<structure, construction>
<building, edifice>
<house, lodging>
<place of worship, ...>
<church, church building>
<abbey> 06 ARTIFACT
<religious residence, cloister>
<convent>
<monastery>
<abbey>
<abbey>
abadía_1_2 Iglesia o monasterio regido por un abad o abadesa
(abbey, a church or a monastery ruled by an abbot or an abbess)
Combining Multiple Methods ...
Three CD methods

Results
Criterion | #links | #synsets | #words | %ok
CD-1 | 23,828 | 11,269 | 7,283 | 56
CD-2 | 24,739 | 12,709 | 10,300 | 61
CD-3 | 4,567 | 3,089 | 2,313 | 75
Combining Multiple Methods ...
Combining methods

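Results (rows: method1; columns: method2)
method1 | | cd2 | cd3 | p1 | p2 | p3 | p4
cd1 | size | 15736 | 1849 | 2076 | 556 | 3146 | 15105
cd1 | %ok | 79 | 85 | 86 | 86 | 72 | 64
cd2 | size | 0 | 2401 | 2536 | 592 | 3777 | 13246
cd2 | %ok | 0 | 86 | 88 | 86 | 75 | 67
cd3 | size | 0 | 0 | 205 | 180 | 215 | 3114
cd3 | %ok | 0 | 0 | 95 | 95 | 100 | 77
p1 | size | 0 | 0 | 0 | 0 | 77 | 178
p1 | %ok | 0 | 0 | 0 | 0 | 100 | 88
p2 | size | 0 | 0 | 0 | 0 | 28 | 78
p2 | %ok | 0 | 0 | 0 | 0 | 77 | 96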
Combining Multiple Methods ...
Resulting Spanish WordNets
WNs | #links | #synsets | #word | #CS | #poly links
SpWN v0.0 | 10,982 | 7,131 | 8,396 | 87.4 | 1,777
Combination | 7,244 | 5,852 | 3,939 | 85.6 | 2,075
SpWN v0.1 | 15,535 | 10,786 | 9,986 | 86.4 | 3,373
Mapping
Conceptual Hierarchies Using
Relaxation Labelling
Mapping Conceptual Hierarchies using Relaxation Labelling
Outline
– Setting
– Relaxation Labelling Algorithm
– Constraints
– Experiments & Results I (multilingual)
– Experiments & Results II (monolingual)
– Further work
Mapping Conceptual Hierarchies using Relaxation Labelling
Setting
[Figure: hierarchy diagram with nodes C1 to C6]
Mapping Conceptual Hierarchies using Relaxation Labelling
Setting
Connecting already existing hierarchies
– Relaxation labelling algorithm
– Constraints
Between
– Spanish taxonomy automatically derived from an MRD (Rigau et al. 98)
– WordNet
using a bilingual MRD
Mapping Conceptual Hierarchies using Relaxation Labelling
Setting
animal
(Tops <animal, animate_being, ...>)
ave
faisán
rapaz
(person <beast, brute, ...>)
(person <dunce, blockhead, ...>)
(animal <bird>)
(artifact <bird, shuttle, ...>)
(food <fowl, poultry, ...>)
(person <dame, doll, ...>)
(animal <pheasant>)
(food <pheasant>)
(animal <bird>)
(artifact <bird, shuttle, ...>)
(food <fowl, poultry, ...>)
(person <dame, doll, ...>)
Mapping Conceptual Hierarchies using Relaxation Labelling
Relaxation Labelling Algorithm
– Iterative algorithm for function optimization based on local information
– It can deal with any kind of constraints
  • variables (senses of the taxonomy)
  • labels (synsets)
– Finds a weight assignment for each possible label of each variable
  • weights for the labels of the same variable add up to one
  • the weight assignment satisfies the set of constraints to the maximum possible extent
Mapping Conceptual Hierarchies using Relaxation Labelling
Relaxation Labelling Algorithm
1) Start with a random weight assignment
2) Compute the support value for each label of each variable (according to the constraints)
3) Increase the weights of the labels more compatible with the context and decrease those of the less compatible labels
4) If a stopping/convergence criterion is satisfied, stop; otherwise go to step 2
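The four steps above can be sketched as a small Python loop. This is a hedged illustration rather than the authors' implementation: the function name, the dictionary-based encoding of constraints, the scaling of support into [-1, 1] and the multiplicative update `w * (1 + support)` followed by normalisation are assumptions of this sketch.

```python
import random


def relaxation_labelling(labels, compat, iterations=50, eps=1e-6, seed=0):
    """labels: variable -> list of candidate labels.
    compat: ((var_a, label_a), (var_b, label_b)) -> compatibility value that
    label_b of var_b lends to label_a of var_a (hypothetical encoding).
    """
    rng = random.Random(seed)
    # 1) random start, normalised so each variable's weights add up to one
    weights = {}
    for var, cands in labels.items():
        raw = [rng.random() + 1e-3 for _ in cands]
        total = sum(raw)
        weights[var] = {l: r / total for l, r in zip(cands, raw)}

    for _ in range(iterations):
        new_weights, change = {}, 0.0
        for var, cands in labels.items():
            # 2) support of each label given the current context
            support = {l: 0.0 for l in cands}
            for (target, source), value in compat.items():
                if target[0] == var and target[1] in support:
                    support[target[1]] += value * weights[source[0]][source[1]]
            biggest = max(abs(s) for s in support.values()) or 1.0
            # 3) raise well-supported labels, lower the others
            updated = {l: weights[var][l] * (1.0 + support[l] / biggest)
                       for l in cands}
            total = sum(updated.values())
            if total == 0.0:
                updated, total = dict(weights[var]), 1.0
            new_weights[var] = {l: w / total for l, w in updated.items()}
            change = max(change, max(abs(new_weights[var][l] - weights[var][l])
                                     for l in cands))
        weights = new_weights
        # 4) stop when the weights no longer move
        if change < eps:
            break
    return weights


labels = {"ave": ["animal:bird", "artifact:bird"],
          "faisán": ["animal:pheasant", "food:pheasant"]}
compat = {(("ave", "animal:bird"), ("faisán", "animal:pheasant")): 1.0,
          (("faisán", "animal:pheasant"), ("ave", "animal:bird")): 1.0}
result = relaxation_labelling(labels, compat)
print(result)
```

In this toy run the only constraint rewards mapping a taxonomy sense and its hypernym to synsets that are themselves related, so the two "animal" labels reinforce each other.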
Mapping Conceptual Hierarchies using Relaxation Labelling
Constraints
– Rely on the taxonomy structure
– Coded with three characters XYZ
  • X: Spanish taxonomy, I (immediate) or A (ancestor)
  • Y: English taxonomy, I (immediate) or A (ancestor)
  • Z: relation, E (hypernym), O (hyponym), B (both)
– Examples: IIE, AAB
Mapping Conceptual Hierarchies using Relaxation Labelling
Hierarchical Constraints (NAACL’2001)
– II Constraints: IIE, IIO, IIB
– AI Constraints: AIE, AIO, AIB
– IA Constraints: IAE, IAO, IAB
– AA Constraints: AAE, AAO, AAB
[Figures: one diagram per constraint]
Combining Multiple Methods ...RANLP’97
Eight class methods
Criterion | Prec. | Cov.
monosemic 1 | 92% | 5%
monosemic 2 | 89% | 1%
monosemic 3 | 89% | 2%
monosemic 4 | 85% | 4%
polysemic 1 | 80% | 8%
polysemic 2 | 75% | 2%
polysemic 3 | 58% | 17%
polysemic 4 | 61% | 60%
Combining Multiple Methods ...RANLP’97
Experiments & Results
Poly | TOK, FOK | TOK, FNOK | total
animal | 279 | 30 (91%) | 209
food | 166 | 3 (100%) | 169
cognition | 198 | 27 (90%) | 225
communication | 533 | 40 (97%) | 573
all | TOK, FOK | TOK, FNOK | total
animal | 424 | 62 (95%) | 486
food | 166 | 83 (100%) | 249
cognition | 200 | 245 (90%) | 445
communication | 536 | 234 (97%) | 760
Further percentages on the slide, in groups of four: (90%) (94%) (67%) (77%); (93%) (94%) (67%) (77%); (90%) (94%) (69%) (78%); (90%) (96%) (82%) (81%)
Combining Multiple Methods ...RANLP’97
Experiments & Results
piel
(substance <skin, fur, peel>)
marta
visón
(substance <sable, marte, coal_back>)
(substance <mink, mink_coat>)
A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01
Generalized Constraints

All Relationships
– also-see, similar-to, attribute, antonym, etc.
R
R
A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01
Generalized Constraints

Non-structural constraints
– W: number of word coincidences
– G: word coincidences in glosses
– F: number of frame coincidences (verbs)
A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01
POS mapping dependencies
Nouns
Adjectives
Verbs
Adverbs
A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01
Constraints for Verbs

Structural constraints
– hyper/hyponymy
– antonymy
– also-see

Non-structural constraints
– W, G and F
A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01
Constraints for Adjectives

Structural constraints
– Adj-to-Adj

antonymy, similar-to and also-see
– Adj-to-Verb

participle-of
– Adj-to-Noun


pertains and attribute
Non-structural constraints
– W and G
A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01
Constraints for Adverbs

Structural constraints
– Adv-to-Adv

antonymy
– Adv-to-Adj


derived
Non-structural constraints
– W and G
A Complete... ACL’00, NAACL’01
Example extra-POS
WN1.5
02025107a
evangelical evangelistic
pertainym
04237485n
Gospel Gospels evangel
WN1.6
00843344a
evangelical evangelistic
Similar to
00842521a
enthusiastic
02025107a
evangelical
pertainym
04853575n
Gospel Gospels evangel
A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01
Example extra-POS
WN1.5
00057615r
impossibly absurdly
WN1.6
00294844r
impossibly
derived from
derived from
01393725a
impossible
01752468a
impossible
antonym
00294658a
possibly
A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01
Results

Basic constraint set: structural constraints
– Nouns: AA hyper/hyponym
– Verbs: AA hyper/hyponym, II also-see
– Adjectives: II antonymy, similar-to, also-see
– Adverbs: II antonymy
A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01
Results

Basic constraint set: structural constraints
POS | Coverage | Ambiguous (precision - recall) | Overall (precision - recall)
N | 99.7% | 94.9% - 99.6% | 97.6% - 99.8%
V | 96.9% | 93.5% - 99.2% | 94.6% - 99.2%
A | 94.1% | 82.8% - 98.9% | 89.5% - 99.4%
R | 80.8% | 97.5% - 100% | 99.0% - 100%
A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01
Results

Basic constraint set + W, G and F for verbs
POS | Coverage | Ambiguous (precision - recall) | Overall (precision - recall)
N | 99.9% | 97.5% - 97.7% | 98.8% - 98.9%
V | 99.8% | 99.4% - 99.7% | 99.3% - 99.6%
A | 98.9% | 96.5% - 98.8% | 97.9% - 99.3%
R | 99.5% | 97.5% - 100% | 99.0% - 100%
A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01
Results

Basic + extra-POS relationships
Coverage
N -
-
-
V
A
R
95.8% - 98.9%
69.2% - 94.2%
90.9% - 99.4%
97.9% - 98.1%
95.8%
88.0%
Ambiguous
Overall
Precision - recall
A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01
Results

Basic + extra-POS relationships + WGF
POS | Coverage | Ambiguous (precision - recall) | Overall (precision - recall)
N | 99.9% | 97.5% - 97.7% | 98.8% - 98.9%
V | 99.8% | 99.4% - 99.7% | 99.3% - 99.6%
A | 99.0% | 96.5% - 99.1% | 97.9% - 99.5%
R | 99.6% | 98.3% - 100% | 99.3% - 100%
Mapping Conceptual Hierarchies using Relaxation Labelling
Conclusions
– First complete mapping between Wordnet
versions
– Combining structural and non-structural
information
– Robust approach based on local
information, but with global effects
– Incremental POS approach
– http://www.lsi.upc.es/~nlp
– 90 downloads (since November 2000)
Mapping Conceptual Hierarchies using Relaxation Labelling
Further Work
– Mapping other structures
  • WN-EDR, WN-LDOCE, etc.
  • Other language taxonomies to EuroWordNet
– SpanishEWN to WN1.6
– Symmetrical philosophy rather than source-target
Mikrokosmos
Mikrokosmos
Outline
• Introduction
• Representational Issues
• The Lexicon
• The Ontology
• Acquisition Process
• Lexicon Acquisition
• Guidelines
• Ontology/Lexicon Trade-off
• Semantics in Action
Mikrokosmos
Introduction
• Knowledge-Based Machine Translation (KBMT)
• CRL, NMSU
• 5,000 concepts
• Events
• Objects
• Properties
• 7,000 Spanish word senses
• 40,000 word senses
• after expansion with productive Lexical Rules
• comprar -> comprador, comprable, ...
• Text Meaning Representation
Mikrokosmos
Representational Issues: The Lexicon
• Typed Feature Structures (Pollard and Sag 87)
• language-dependent
• 10 zones
• phonology
• orthography
• morphology
• Syntactic (subcategorization)
• Semantic (Lexical Semantic Representation)
• syntax-semantic linking
• stylistics
• paradigmatic
• syntagmatic
Mikrokosmos
Representational Issues: The Lexicon
Adquirir-V1
  syn: subj: cat: NP
       obj: cat: NP
  sem: acquire
       agent: HUMAN
       theme: OBJECT
Adquirir-V2
  syn: subj: cat: NP
       obj: cat: NP
  sem: acquire
       agent: HUMAN
       theme: INFORMATION
Mikrokosmos
Representational Issues: The Ontology
• Taxonomic, multi-hierarchical
• 14 local or inherited links on average
• language-impartial
• EVENTS, OBJECTS, PROPERTIES
• Methodology & Guidelines
Mikrokosmos
Representational Issues: The Ontology
• ACQUIRE
  DEFINITION: “The transfer of possession event where the agent transfers an object to its possession”
  IS-A: TRANSFER-POSSESSION
  SOURCE: HUMAN PLACE
  THEME: OBJECT (NOT HUMAN)
  AGENT: ANIMAL (DEFAULT HUMAN)
  DESTINATION: ANIMAL PLACE (DEFAULT HUMAN)
  INHERITED
  BENEFICIARY: HUMAN
Mikrokosmos
Acquisition Process: The Lexicon
• Multi-lingual
  • French, English, Japanese, Russian, Spanish, etc.
• Multi-media
• Multi-process
  • Analysis
  • Generation (mono and multilingual)
  • MT
  • Summarization
  • IE
  • Speech Processing
• Tools
  • corpus-search, lookup dictionary, ontology browser
Mikrokosmos
Acquisition Process: The Ontology
• Guidelines
1) Do not add instances as concepts
• Instances do not have their own instances
• Concepts do not have fixed position in space/time
2) Do not decompose concepts further
3) Use close concepts
4) Do not add EVENTs with particular arguments
5) Do not add concepts with instance-specific aspects,
temporal relations
6) Do not add language-specific concepts
7) Do not add ontological concepts for collections
Mikrokosmos
Acquisition Process: Ontology/Lexicon Trade-off
• Daily negotiations
• lexicon acquirers
• ontology acquirers
• Possibilities
• one-to-one mapping
• lexicon unspecification
• lexicon ontology balance
Mikrokosmos
Acquisition Process: Ontology/Lexicon Trade-off
• one-to-one mapping
  PREPARE-FOOD (INST: COOKING-EQUIPMENT)
    COOK (INST: STOVE), cook : cuire sur le feu
    BAKE (INST: OVEN), bake : cuire au four
• Problems
  • Lexical: every word in a language is a concept
  • Conceptual: cuire in French is not ambiguous
Mikrokosmos
Acquisition Process: Ontology/Lexicon Trade-off
• Lexicon Unspecification
  PREPARE-FOOD (INST: COOKING-EQUIPMENT)
    cook : cuire sur le feu
    bake : cuire au four (INST: OVEN)
• Problems
  • BAKE is not in the ontology
Mikrokosmos
Acquisition Process: Ontology/Lexicon Trade-off
• Lexicon-Ontology Balance
PREPARE-FOOD
INST: COOKING-EQUIPMENT
BAKE
FRY
INST: STOVE
INST: FRYING-PAN
INST: OVEN
bake
cook : cuire
Mikrokosmos
Semantics in Action
• El grupo Roche, a través de su compañía en España,
adquirió Doctor Andreu.
• El grupo Roche adquirió Doctor Andreu a través de su
compañía en España.
• La adquisición de Doctor Andreu por el grupo Roche fue
hecha a través de su compañía en España.
ACQUIRE-1
  Agent: ORGANIZATION-1
  Theme: ORGANIZATION-2
  Instrument: ORGANIZATION-3
ORGANIZATION-1
  Object-Name: Grupo Roche
ORGANIZATION-2
  Object-Name: Doctor Andreu
ORGANIZATION-3
  Location: España
Mikrokosmos
Semantics in Action
• Onto-Search: Ontological search mechanism to
check constraints
• check-onto(ACQUIRE, EVENT) = 1
• since ACQUIRE is a type of EVENT
• check-onto(ORGANIZATION, HUMAN) = 0.9
• since ORGANIZATION HAS-MEMBER HUMAN
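The two `check-onto` examples above can be reproduced with a small ontology walk. This is only a sketch under assumptions: the `IS_A` and `HAS_MEMBER` tables are a tiny hand-made fragment (the direct link from TRANSFER-POSSESSION to EVENT is a simplification), and the scoring rule (1 for an IS-A chain, 0.9 when one HAS-MEMBER hop is needed, 0 otherwise) is inferred from the two values on the slide.

```python
# Hypothetical ontology fragment, loosely following the slide's examples.
IS_A = {
    "ACQUIRE": "TRANSFER-POSSESSION",
    "TRANSFER-POSSESSION": "EVENT",
    "CORPORATION": "ORGANIZATION",
}
HAS_MEMBER = {"ORGANIZATION": "HUMAN"}


def is_a(concept, target):
    """True if target is the concept itself or one of its IS-A ancestors."""
    while concept is not None:
        if concept == target:
            return True
        concept = IS_A.get(concept)
    return False


def check_onto(concept, constraint):
    """Score how well a concept satisfies a selectional constraint."""
    if is_a(concept, constraint):
        return 1.0
    # Allow one HAS-MEMBER hop from the concept or any of its ancestors,
    # at a reduced score (the 0.9 value is taken from the slide).
    node = concept
    while node is not None:
        member = HAS_MEMBER.get(node)
        if member is not None and is_a(member, constraint):
            return 0.9
        node = IS_A.get(node)
    return 0.0


print(check_onto("ACQUIRE", "EVENT"))        # IS-A chain
print(check_onto("ORGANIZATION", "HUMAN"))   # via HAS-MEMBER
```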
Mikrokosmos
Semantics in Action
1) a-través-de INSTRUMENT, LOCATION
adquirir requires a PHYSICAL-OBJECT
2) en LOCATION, TEMPORAL
España is not a TEMPORAL-OBJECT
3) adquirir ACQUIRE, LEARN
Doctor Andreu is not an INFORMATION
4) Doctor Andreu ORGANIZATION, HUMAN
the Theme of ACQUIRE is not HUMAN
5) compañía CORPORATION, SOCIAL-EVENT
ORGANIZATIONs typically fill the INSTRUMENT slot of
ACQUIRE acts
Mikrokosmos
Experiment: WSD
Text | words | words/sentence | open-class words | ambiguous words | syntax | correct | %
1 | 347 | 16.5 | 183 | 57 | 21 | 51 | 97
2 | 385 | 24.0 | 167 | 42 | 19 | 41 | 99
3 | 370 | 26.4 | 177 | 57 | 20 | 45 | 93
4 | 353 | 20.8 | 177 | 35 | 12 | 34 | 99
Mean | 364 | 21.4 | 176 | 48 | 18 | 43 | 97
Mikrokosmos
Experiment: WSD
Text | words | words/sentence | open-class words | ambiguous words | syntax | correct | %
Mean | 364 | 21.4 | 176 | 48 | 18 | 43 | 97
Mean Unseen | 390 | 26 | 104 | 26 | 9 | 23 | 97
WordNet2
WordNet2
Outline
• Introduction
• Text Inferences
• Defining Features
• Plausible inferences
• Inference Rules
• Semantic Paths
• What WordNet cannot do
WordNet2
Introduction
• (Harabagiu 98)
• Commonsense reasoning requires extensive knowledge
• ~100 million concepts and relations
• WordNet
• represents almost all English words
• 100,000 synsets
• linked by semantic relations
• WordNet2
• each synset has a gloss that, when disambiguated
may increase the number of relations
• WordNet glosses into semantic networks
• NEW RELATIONS
WordNet2
Text Inferences
German was hungry
He opened the refrigerator
• hungry (feeling a need or desire to eat)
• eat (take in solid food)
• refrigerator (an appliance in which foods can be
stored at low temperature)
WordNet2
Defining Features
• Transform each concept’s gloss into a graph
where concepts are nodes and lexical relations
are links
• <culture> (all the knowledge shared by society)
<share> --AGENT--> <society>
• <doctor> (licensed medical practitioner)
<medical practitioner> --ATTRIBUTE--> <licensed>
WordNet2
Defining Features
[Figure: gloss graph for <pilot>, with nodes pilot, person, qualified, guide, ship, water and difficult, and links labelled GLOSS, ATTRIBUTE, PURPOSE, OBJECT and LOCATION]
WordNet2
Inference Rules
• 16 + 1 rules, for example:
  VC1 IS-A VC2
  VC2 IS-A VC3
  ------------------------
  VC1 IS-A VC3

  VC1 IS-A VC2
  VC2 ENTAIL VC3
  ------------------------
  VC1 ENTAIL VC3

  VC1 IS-A VC2
  VC2 R_IS-A VC3
  ------------------------
  VC1 PLAUSIBLE (not VC3)

  VC1 IS-A VC2
  VC2 R_ENTAIL VC3
  ------------------------
  VC1 EXPLAINS VC3
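Each rule on this slide combines two chained relations into a third one, which can be written as a lookup table. The sketch below is a minimal illustration: the triple format, the function name `infer`, the relation name `PLAUSIBLE-NOT` (standing for "PLAUSIBLE (not VC3)") and the example verbs are assumptions, and only the four rules shown here are encoded.

```python
# (relation of premise 1, relation of premise 2) -> relation of conclusion
RULES = {
    ("IS-A", "IS-A"): "IS-A",
    ("IS-A", "ENTAIL"): "ENTAIL",
    ("IS-A", "R_IS-A"): "PLAUSIBLE-NOT",
    ("IS-A", "R_ENTAIL"): "EXPLAINS",
}


def infer(first, second):
    """Combine two (source, relation, target) triples that chain together.

    Returns the inferred triple, or None when the premises do not chain
    or no rule covers the pair of relations.
    """
    vc1, rel1, vc2 = first
    vc2_again, rel2, vc3 = second
    if vc2 != vc2_again:
        return None
    conclusion = RULES.get((rel1, rel2))
    if conclusion is None:
        return None
    return (vc1, conclusion, vc3)


print(infer(("whisper", "IS-A", "speak"), ("speak", "IS-A", "communicate")))
```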
WordNet2
Semantic Paths
0) Create and load the KB
1) Place markers on KB concepts
2) Propagate markers
The algorithm avoids cycles
3) Detect collisions
Each marker collision corresponds to a path
4) Extract Inferences
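Steps 1 to 3 can be illustrated with a breadth-first marker propagation over a tiny graph. This is a sketch under assumptions: the graph is a hand-made fragment built from the glosses of the refrigerator example, the relation labels are placeholders, and a collision is simply a concept reached by the markers of both starting concepts.

```python
from collections import deque

# Hypothetical knowledge-base fragment: concept -> [(relation, concept)].
GRAPH = {
    "hungry": [("GLOSS", "eat")],
    "eat": [("GLOSS", "food")],
    "refrigerator": [("GLOSS", "appliance"), ("GLOSS", "food")],
}


def propagate(graph, start):
    """Spread a marker from start; record the path to every reached concept."""
    paths = {start: [start]}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for _relation, neighbour in graph.get(node, []):
            if neighbour not in paths:          # visited check avoids cycles
                paths[neighbour] = paths[node] + [neighbour]
                queue.append(neighbour)
    return paths


def collisions(graph, a, b):
    """Concepts reached by both markers, each with the joined path a..b."""
    from_a, from_b = propagate(graph, a), propagate(graph, b)
    return {c: from_a[c] + from_b[c][-2::-1]
            for c in from_a.keys() & from_b.keys()}


found = collisions(GRAPH, "hungry", "refrigerator")
print(found)
```

Each joined path is the raw material for step 4: it links the two sentences through the concept where the markers met.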
WordNet2
Semantic Paths
Inference sequence
• German was hungry
• German felt a desire to eat
• German felt a desire to take in food
COLLISION: German=he felt a desire to take
food, stored in an appliance, which he opened
• He opened an appliance where food is stored
• He opened the refrigerator
WordNet2
What WordNet cannot do
Major WordNet limitations:
1) The lack of compound concepts
2) The small number of causation and
entailment relations
3) the lack of preconditions for verbs
4) the absence of case relations
ThoughtTreasure
ThoughtTreasure
Overview
• a comprehensive platform for
  • NLP (English, French)
  • commonsense reasoning
    • A hotel room has a bed, night table, ...
    • People have fingernails
    • Soda is a drink
    • One hangs up at the end of a phone call
    • The sky is blue
    • Dogs bark
    • Someone who is 16 years old is a teenager
ThoughtTreasure
Overview
• 25,000 concepts organized into a hierarchy
EVIAN -> FLAT-WATER -> DRINKING-WATER
•55,000 words (English, French)
food <-> aliment <-> FOOD
•50,000 assertions about concepts
green-pea is green
•100 scripts
ThoughtTreasure
Overview
• Text agents for recognizing names, phones, etc.
• mechanisms for learning new words
  • X-phile is someone who likes X
• a syntactic parser
• a NL generator
• a semantic parser
• an anaphoric parser
• planning agents for achieving goals
• understanding agents
ThoughtTreasure
Example
• Who created Bugs Bunny?
• 1.0 (create human-interrogative-pronoun Bugs-Bunny)
• 0.9 (create rock-group-the-Who Bugs-Bunny)
• 1.0 (create Tex-Avery Bugs-Bunny)
• 0.1 (not (create rock-group-the-Who Bugs-Bunny))
Meaning
Meaning
Overview

Knowledge bases
– Automatic enrichment of EWN (verbal models, etc.)
– Mixed approach (KB + ML)
– Q/A
Problem
– structural and lexical ambiguity
Approach
– automatically locating examples of senses (Leacock et al. 98, Mihalcea and Moldovan 99)
– large-scale WSD (Boosting, SVM, transductive methods …)
– Knowledge acquisition (Ribas 95, McCarthy 01)
Meaning
Exploiting EWN Semantic Relations
<evento>
<agrupación grupo colectivo>
<evento social>
<grupo_social>
<competición, concurso>
<organización>
<partido_1>
<partido_2, partido_político>
<semifinal>
<cuartos_de_final>
<partido_laborista>
Meaning
Exploiting EWN Semantic Relations
partido 1
Todos los partidos piden reformas legales para TV3.
La derecha planea agruparse en un partido.
El diputado reiteró que ni él ni UDC, “como partido”, han recibido
dinero de Pellerols.
partido 2
Pero España puso al partido intensidad, ritmo y coraje.
El seleccionador cree que el partido de hoy contra Italia dará la
medida de España
El Racing no gana en su campo desde hace seis partidos.
Meaning
Exploiting EWN Semantic Relations
partido 1
No negociaremos nunca com un partido político que sea partidario
de la independencia de Taiwan.
Una vez más es noticia la desviación de fondos destinadoss a la
formación ocupacional hacia la financiación de un partido político.
Estas lleyess fueron votadas gracias a un consenso general de los
partidos políticos.
partido 2
Rivera pide el suporte de la afición para encarrilar las semifinales.
Sólo el equipo de Valero Ribera puede sentenciar una semifinal como
lo hizo ayer en un Palau Blaugrana completamente entregado.
El Racing ganó los cuartos de final en su campo.
Meaning
Architecture
[Figure: for each language (English, Spanish, Catalan, Italian, Basque), a web corpus and an EWN wordnet connected through ACQ and WSD components; the wordnets are connected to the Multilingual Central Repository by UPLOAD and PORT links]