Semantic relations

From WordNet,
to EuroWordNet,
to the Global Wordnet Grid:
anchoring languages to universal meaning
Piek Vossen
VU University Amsterdam
What kind of resource is wordnet?
• Most widely used database in language technology
• Enormous impact in language technology
development
• Large
• Free and downloadable
• English
WordNet
• http://wordnet.princeton.edu/
• Developed by George Miller and his team at
Princeton University, as the implementation of
a mental model of the lexicon
• Organized around the notion of a synset: a set
of synonyms in a language that represent a
single concept
• Semantic relations between concepts
• Covers over 117,000 concepts and over
150,000 English words
Relational model of meaning
[Figure: relational network linking the words man, woman, boy, girl, cat, dog, kitten, puppy and animal]
Wordnet: a network of semantically
related words
{conveyance;transport}
{vehicle}
{motor vehicle; automotive vehicle}
{car mirror}
{armrest}
{car door}
{doorlock}
{car; auto; automobile; machine; motorcar}
{bumper}
{car window}
{cruiser; squad car; patrol car; police car; prowl car}
{cab; taxi; hack; taxicab}
{hinge; flexible joint}
Wordnet Semantic Relations
WN 1.5 starting point
The ‘synset’ as a weak notion of synonymy:
“two expressions are synonymous in a linguistic context C
if the substitution of one for the other in C does not alter
the truth value.” (Miller et al. 1993)
Relations between synsets:
Relation     POS-combination          Example
ANTONYMY     adjective-to-adjective   good/bad
             verb-to-verb             open/close
HYPONYMY     noun-to-noun             car/vehicle
             verb-to-verb             walk/move
MERONYMY     noun-to-noun             head/nose
ENTAILMENT   verb-to-verb             buy/pay
CAUSE        verb-to-verb             kill/die
Wordnet Data Model
Vocabulary of a language -> Concepts
bank 1 -> rec: 12345 - financial institute
bank 2 -> rec: 54321 - side of a river
fiddle, violin -> rec: 9876 - small string instrument
fiddler, violist -> rec: 65438 - musician playing violin
string 1 -> rec: 35576 - string of instrument
string 2 -> rec: 29551 - underwear
Further concepts: rec: 42654 - musician; rec: 25876 - string instrument
Relations between concepts: type-of, type-of, part-of
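The data model above can be sketched in Python. This is a minimal illustration, not the actual WordNet database format: the names `Concept`, `CONCEPTS`, `VOCABULARY`, `RELATIONS`, `synset` and `senses` are hypothetical, and the three relation links are one plausible reading of the diagram (the slide only shows the labels type-of, type-of, part-of).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    rec: int    # record number as shown on the slide
    gloss: str  # short description of the concept


# Concept records copied from the slide.
CONCEPTS = {c.rec: c for c in [
    Concept(12345, "financial institute"),
    Concept(54321, "side of a river"),
    Concept(9876, "small string instrument"),
    Concept(65438, "musician playing violin"),
    Concept(42654, "musician"),
    Concept(35576, "string of instrument"),
    Concept(29551, "underwear"),
    Concept(25876, "string instrument"),
]}

# Vocabulary: each word lists the records of its senses, in sense order.
VOCABULARY = {
    "bank": [12345, 54321],
    "fiddle": [9876],
    "violin": [9876],
    "fiddler": [65438],
    "violist": [65438],
    "string": [35576, 29551],
}

# ASSUMPTION: which records the type-of / part-of arrows connect.
RELATIONS = [
    (9876, "type-of", 25876),
    (65438, "type-of", 42654),
    (35576, "part-of", 25876),
]


def synset(rec):
    """All words that point to one concept record, i.e. its synset."""
    return sorted(word for word, recs in VOCABULARY.items() if rec in recs)


def senses(word):
    """Glosses of all senses of a word, in sense order."""
    return [CONCEPTS[rec].gloss for rec in VOCABULARY.get(word, [])]
```

In this sketch a polysemous word such as bank simply points to several records, while synonyms such as fiddle and violin point to the same record.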
Some observations on Wordnet
• synsets are more compact representations for concepts than
word meanings in traditional lexicons
• synonyms and hypernyms are substitutional variants:
– begin – commence
– I once had a canary. The bird got sick. The poor animal died.
• hyponymy and meronymy chains are important transitive
relations for predicting properties and explaining textual
properties:
object -> artifact -> vehicle -> 4-wheeled vehicle -> car
• strict separation of part of speech although concepts are
closely related (bed – sleep) and are similar (dead – death)
• lexicalization patterns reveal important mental structures
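The transitivity of hyponymy chains can be illustrated with a short Python sketch. The links are taken from the examples on this slide; the names `HYPERNYM`, `hypernym_chain` and `is_a` are hypothetical, and each word is assumed to have a single hypernym, which real wordnets do not guarantee.

```python
# Hypernym links from the slide, stored as child -> parent.
HYPERNYM = {
    "car": "4-wheeled vehicle",
    "4-wheeled vehicle": "vehicle",
    "vehicle": "artifact",
    "artifact": "object",
    "canary": "bird",
    "bird": "animal",
}


def hypernym_chain(word):
    """Follow hypernym links upward and return all ancestors, nearest first."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def is_a(word, ancestor):
    """True if `ancestor` is reachable from `word` via hypernym links."""
    return ancestor in hypernym_chain(word)
```

Because the relation is transitive, `is_a("canary", "animal")` holds even though no direct link is stored, which is what licenses the substitution in "I once had a canary. ... The poor animal died."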
Lexicalization patterns
[Figure: hierarchy from the 25 unique beginners at the top, through basic level concepts, to specific concepts. Nodes shown: entity, object, garbage, threat, waste, artifact, organism, animal, plant, building, bird, tree, flower, church, canary, dog, crocodile, rose, abbey, common canary]
Basic level concepts:
• balance of two principles:
– predict most features
– apply to most subclasses
• where most concepts are created
• amalgamate most parts
• most abstract level at which a picture can be drawn
Wordnet top level
Meronymy & pictures
beak
tail
leg
Co-reference constraint in wordnet:
Cats cannot be a kind of cats
• S: (n) cat, true cat (feline mammal usually having thick soft fur and no ability to roar: domestic cats; wildcats)
• S: (n) guy, cat, hombre, bozo (an informal term for a youth or man) "a nice guy"; "the guy's only doing it for some doll"
• S: (n) cat (a spiteful woman gossip) "what a cat she is!"
• S: (n) kat, khat, qat, quat, cat, Arabian tea, African tea (the leaves of the shrub Catha edulis which are chewed like tobacco or used to make tea; has the effect of a euphoric stimulant) "in Yemen kat is used daily by 85% of adults"
• S: (n) cat-o'-nine-tails, cat (a whip with nine knotted cords) "British sailors feared the cat"
• S: (n) Caterpillar, cat (a large tracked vehicle that is propelled by two endless metal belts; frequently used for moving earth in construction and farm work)
• S: (n) big cat, cat (any of several large cats typically able to roar and living in the wild)
• S: (n) computerized tomography, computed tomography, CT, computerized axial tomography, computed axial tomography, CAT (a method of examining body organs by scanning them with X rays and using a computer to construct a series of cross-sectional scans along a single axis)
• S: (n) domestic cat, house cat, Felis domesticus, Felis catus (any domesticated member of the genus Felis)
Wordnet 3.0 statistics
POS         Unique Strings   Synsets   Total Word-Sense Pairs
Noun           117,798        82,115        146,312
Verb            11,529        13,767         25,047
Adjective       21,479        18,156         30,002
Adverb           4,481         3,621          5,580
Totals         155,287       117,659        206,941
Wordnet 3.0 statistics
POS         Monosemous Words and Senses   Polysemous Words   Polysemous Senses
Noun                101,863                    15,935             44,449
Verb                  6,277                     5,252             18,770
Adjective            16,503                     4,976             14,399
Adverb                3,748                       733              1,832
Totals              128,391                    26,896             79,450
Wordnet 3.0 statistics
POS         Average Polysemy                 Average Polysemy
            Including Monosemous Words       Excluding Monosemous Words
Noun                 1.24                             2.79
Verb                 2.17                             3.57
Adjective            1.40                             2.71
Adverb               1.25                             2.50
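The average polysemy figures can be recomputed from the counts in the statistics tables. The sketch below is only an arithmetic check under one assumption: a monosemous word contributes exactly one word-sense pair, so the polysemous part is obtained by subtraction. The names `STATS` and `average_polysemy` are hypothetical.

```python
# (unique strings, total word-sense pairs, monosemous words) per part of
# speech, copied from the WordNet 3.0 statistics tables above.
STATS = {
    "Noun": (117_798, 146_312, 101_863),
    "Verb": (11_529, 25_047, 6_277),
    "Adjective": (21_479, 30_002, 16_503),
    "Adverb": (4_481, 5_580, 3_748),
}


def average_polysemy(strings, pairs, monosemous):
    """Return (including, excluding) monosemous words, rounded to 2 decimals."""
    including = pairs / strings
    # ASSUMPTION: each monosemous word accounts for exactly one pair.
    excluding = (pairs - monosemous) / (strings - monosemous)
    return round(including, 2), round(excluding, 2)


for pos, row in STATS.items():
    print(pos, *average_polysemy(*row))
```

Computed this way, all four rows agree with the average polysemy table. For adjectives, the subtraction yields 13,499 polysemous senses, whereas the second table lists 14,399; the listed average of 2.71 agrees with the former.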
http://www.visuwords.com
Usage of Wordnet
• Improve recall of text-based analysis:
– Query -> Index
  • Synonyms: commence – begin
  • Hypernyms: taxi -> car
  • Hyponyms: car -> taxi
  • Meronyms: trunk -> elephant
  • Lexical entailments: gun -> shoot
• Inferencing:
– what things can burn?
• Expression in language generation and translation:
– alternative words and paraphrases
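Improving recall by expanding a query with related words can be sketched as follows. The relation tables are toy data built from the examples on this slide; the function `expand_query` and its flags are hypothetical and do not correspond to any particular retrieval system.

```python
# Toy relation tables built from the examples on the slide.
SYNONYMS = {"commence": {"begin"}, "begin": {"commence"}}
HYPERNYMS = {"taxi": {"car"}}
HYPONYMS = {"car": {"taxi"}}


def expand_query(terms, use_hyponyms=True, use_hypernyms=False):
    """Return the query terms plus their synonyms and, optionally,
    their direct hyponyms and hypernyms."""
    expanded = set(terms)
    for term in terms:
        expanded |= SYNONYMS.get(term, set())
        if use_hyponyms:
            expanded |= HYPONYMS.get(term, set())
        if use_hypernyms:
            expanded |= HYPERNYMS.get(term, set())
    return expanded
```

A query for car then also matches documents that only mention taxi. Expanding towards hypernyms is switched off by default in this sketch because it tends to cost precision; that default is a design choice here, not a claim from the slides.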
Improve recall
• Information retrieval:
– small databases without redundancy, e.g. image
captions, video text
• Text classification:
– small training sets
• Question & Answer systems
– query analysis: who, whom, where, what, when
Improve recall
• Anaphora resolution:
– The girl fell off the table. She....
– The glass fell off the table. It...
• Coreference resolution:
– When he moved the furniture, the antique table got
damaged.
• Information extraction (unstructured text to structured databases):
– generic forms or patterns "vehicle" -> text with specific cases "car"
Improve recall
• Summarizers:
– Sentence selection based on word counts ->
concept counts
– Avoid repetition in summary -> language
generation
• Limited inferencing: detect locations,
organisations, etc.
Many others
• Data sparseness for machine learning:
hapaxes can be replaced by semantic classes
• Use redundancy for more robustness: spelling correction and speech recognition can build semantic expectations using Wordnet and make better choices
• Sentiment and opinion mining
• Natural language learning
Recall & Precision
[Figure: Venn diagram for the query “cell”. The found set and the relevant set overlap in an intersection; example items shown: “jail”, “nerve cell”, “police cell”, “neuron”, “cell phone”, “mobile phones”]
recall = intersection / relevant
precision = intersection / found
Recall < 20% for basic search engines! (Blair & Maron 1985)
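The two measures can be written out in a few lines of Python. The document identifiers below are made up for illustration; only the two formulas come from the slide.

```python
def recall(found, relevant):
    """Share of the relevant documents that were actually found."""
    return len(found & relevant) / len(relevant) if relevant else 0.0


def precision(found, relevant):
    """Share of the found documents that are actually relevant."""
    return len(found & relevant) / len(found) if found else 0.0


# Hypothetical result for the query "cell": four documents found,
# six documents relevant, two documents in the intersection.
found = {"d1", "d2", "d3", "d4"}
relevant = {"d3", "d4", "d5", "d6", "d7", "d8"}

print(recall(found, relevant), precision(found, relevant))
```

With these made-up sets, recall is 2/6 and precision is 2/4.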
EuroWordNet
• The development of a multilingual database with wordnets
for several European languages
• Funded by the European Commission, DG XIII,
Luxembourg as projects LE2-4003 and LE4-8328
• March 1996 - September 1999
• 2.5 Million EURO.
• http://www.hum.uva.nl/~ewn
• http://www.illc.uva.nl/EuroWordNet/finalresultsewn.html
EuroWordNet
• Languages covered:
– EuroWordNet-1 (LE2-4003): English, Dutch, Spanish, Italian
– EuroWordNet-2 (LE4-8328): German, French, Czech, Estonian.
• Size of vocabulary:
– EuroWordNet-1: 30,000 concepts - 50,000 word meanings.
– EuroWordNet-2: 15,000 concepts - 25,000 word meanings.
• Type of vocabulary:
– the most frequent words of the languages
– all concepts needed to relate more specific concepts
EuroWordNet Model
[Figure: EuroWordNet architecture. Each language has its own Lexical Items Table: English (move, go, ride, drive), Dutch (bewegen, gaan, rijden, berijden), Spanish (mover, transitar, cabalgar, jinetear, conducir), Italian (andare, muoversi, cavalcare, guidare). Synsets are linked to records of the Inter-Lingual-Index, e.g. ILI-record {drive}; ILI records are linked to Domains (Traffic: Air, Road) and to the Ontology (2OrderEntity: Location, Dynamic)]
I = Language Independent link
II = Link from Language Specific to Inter-Lingual-Index
III = Language Dependent Link
Differences in relations between
EuroWordNet and WordNet
• Added Features to relations
• Cross-Part-Of-Speech relations
• New relations to differentiate shallow hierarchies
• New interpretations of relations
EWN Relationship Labels
{airplane}
  HAS_MERO_PART: conj1         {door}
  HAS_MERO_PART: conj2 disj1   {jet engine}
  HAS_MERO_PART: conj2 disj2   {propeller}
{door}
  HAS_HOLO_PART: disj1         {car}
  HAS_HOLO_PART: disj2         {room}
  HAS_HOLO_PART: disj3         {entrance}
{dog}
  HAS_HYPERONYM: conj1         {mammal}
  HAS_HYPERONYM: conj2         {pet}
{albino}
  HAS_HYPERONYM: disj1         {plant}
  HAS_HYPERONYM: disj2         {animal}
Default Interpretation: non-exclusive disjunction
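One possible reading of the conjunction and disjunction labels is sketched below: relations with the same conj index form a group of alternatives, and every group must be satisfied by at least one of its targets. This interpretation, the tuple layout and the names `AIRPLANE_PARTS` and `satisfies` are assumptions made for illustration, not the EuroWordNet database format.

```python
from collections import defaultdict

# (target, conjunction index, disjunction index or None), following the
# labels shown for {airplane}.
AIRPLANE_PARTS = [
    ("door", 1, None),
    ("jet engine", 2, 1),
    ("propeller", 2, 2),
]


def satisfies(labelled_targets, observed):
    """True if every conjunct is met by at least one of its alternatives.

    `observed` is the set of targets actually present. Alternatives within
    one conjunct are non-exclusive: more than one may be present.
    """
    groups = defaultdict(set)
    for target, conj, _disj in labelled_targets:
        groups[conj].add(target)
    return all(group & observed for group in groups.values())
```

Under this reading an airplane needs a door and either a jet engine or a propeller (or both).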
EWN Relationship Labels
Factive/Non-factive CAUSES (Lyons 1977)
factive (default interpretation):
“to kill causes to die”:
{kill} CAUSES {die}
non-factive: E1 probably or likely causes event E2, or E1 is intended to cause some event E2:
“to search may cause to find”:
{search} CAUSES {find} (non-factive)
Cross-Part-Of-Speech relations
WordNet1.5: nouns and verbs are not interrelated by basic semantic
relations such as hyponymy and synonymy:
adornment 2 => change of state -- (the act of changing something)
adorn 1 => change, alter -- (cause to change; make different)
EuroWordNet: words of different parts of speech can be inter-linked with
explicit xpos-synonymy, xpos-antonymy and xpos-hyponymy relations:
{adorn V} XPOS_NEAR_SYNONYM {adornment N}
{size N} XPOS_NEAR_HYPONYM {tall A}, {short A}
Role relations
In the case of many verbs and nouns the most salient relation is not the hyperonym
but the relation between the event and the involved participants. These relations
are expressed as follows:
{knife} ROLE_INSTRUMENT {to cut}
{to cut} INVOLVED_INSTRUMENT {knife} (reversed)
{school} ROLE_LOCATION {to teach}
{to teach} INVOLVED_LOCATION {school} (reversed)
These relations are typically used when other relations, mainly hyponymy, do not clarify the position of the concept in the network, but the word is still closely related to another word.
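The pairing of ROLE and INVOLVED relations can be sketched as a label rewrite. The triples come from the examples on this slide; treating every ROLE_x as the reverse of INVOLVED_x, and the function names `reverse_label` and `add_reversed`, are assumptions made for illustration.

```python
def reverse_label(label):
    """Map ROLE_x to INVOLVED_x and back."""
    if label.startswith("ROLE_"):
        return "INVOLVED_" + label[len("ROLE_"):]
    if label.startswith("INVOLVED_"):
        return "ROLE_" + label[len("INVOLVED_"):]
    raise ValueError(f"no reverse known for {label}")


def add_reversed(relations):
    """Return the relations together with their reversed counterparts."""
    result = set(relations)
    for source, label, target in relations:
        result.add((target, reverse_label(label), source))
    return result


# Examples from the slide, stored as (source, relation, target).
RELATIONS = {
    ("knife", "ROLE_INSTRUMENT", "to cut"),
    ("school", "ROLE_LOCATION", "to teach"),
}
```

Storing only one direction and deriving the other keeps the two directions consistent; whether a real wordnet editor does this automatically is not stated in the slides.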
Co_Role relations
guitar player    HAS_HYPERONYM           player
                 CO_AGENT_INSTRUMENT     guitar
player           HAS_HYPERONYM           person
                 ROLE_AGENT              to play music
                 CO_AGENT_INSTRUMENT     musical instrument
to play music    HAS_HYPERONYM           to make
                 ROLE_INSTRUMENT         musical instrument
guitar           HAS_HYPERONYM           musical instrument
                 CO_INSTRUMENT_AGENT     guitar player
ice saw          HAS_HYPERONYM           saw
                 CO_INSTRUMENT_PATIENT   ice
saw              HAS_HYPERONYM           saw
                 ROLE_INSTRUMENT         to saw
ice              CO_PATIENT_INSTRUMENT   ice saw (REVERSED)
Co_Role relations
Examples of the other relations are:
criminal               CO_AGENT_PATIENT       victim
novel writer/ poet     CO_AGENT_RESULT        novel/ poem
dough                  CO_PATIENT_RESULT      pastry/ bread
photographic camera    CO_INSTRUMENT_RESULT   photo
Overview of the Language Internal
relations in EuroWordnet
Same Part of Speech relations:
NEAR_SYNONYMY              apparatus - machine
HYPERONYMY/HYPONYMY        car - vehicle
ANTONYMY                   open - close
HOLONYMY/MERONYMY          head - nose
Cross-Part-of-Speech relations:
XPOS_NEAR_SYNONYMY         dead - death; to adorn - adornment
XPOS_HYPERONYMY/HYPONYMY   to love - emotion
XPOS_ANTONYMY              to live - dead
CAUSE                      die - death
SUBEVENT                   buy - pay; sleep - snore
ROLE/INVOLVED              write - pencil; hammer - hammer
STATE                      the poor - poor
MANNER                     to slurp - noisily
BELONG_TO_CLASS            Rome - city
Horizontal & vertical semantic relations
[Figure: network around patient, doctor, treat, cure and disease; disorder. Vertical links are labelled HYPONYM (chronic patient; mental patient; child doctor; stomach disease, kidney disorder, etc.). Horizontal links are labelled STATE, ρ-PATIENT, ρ-AGENT, ρ-CAUSE, ρ-PROCEDURE, ρ-LOCATION and co-ρAGENT-PATIENT; further nodes shown: physiotherapy, medicine, hospital, child]
The Multilingual Design
• Inter-Lingual-Index: unstructured fund of concepts to
provide an efficient mapping across the languages;
• Index-records are mainly based on WordNet synsets and
consist of synonyms, glosses and source references;
• Various types of complex equivalence relations are
distinguished;
• Equivalence relations from synsets to index records: not on a
word-to-word basis;
• Indirect matching of synsets linked to the same index items;
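Indirect matching through the Inter-Lingual-Index can be sketched as a lookup over equivalence links. The links below are illustrative: the words come from the slides, but which ILI record each synset is attached to is an assumption, as are the names `EQ_LINKS`, `by_ili` and `translate`.

```python
from collections import defaultdict

# ASSUMED equivalence links: (language, synset, ILI record).
EQ_LINKS = [
    ("nl", "versiersel", "decoration"),
    ("nl", "versiering", "decoration"),
    ("en", "decoration", "decoration"),
    ("nl", "rijden", "drive"),
    ("es", "conducir", "drive"),
    ("it", "guidare", "drive"),
]


def by_ili(links):
    """Index the links as ILI record -> language -> set of synsets."""
    index = defaultdict(lambda: defaultdict(set))
    for lang, synset, ili in links:
        index[ili][lang].add(synset)
    return index


def translate(synset, source, target, links=EQ_LINKS):
    """Synsets of the target language that share an ILI record with
    the given source-language synset."""
    matches = set()
    for per_lang in by_ili(links).values():
        if synset in per_lang.get(source, set()):
            matches |= per_lang.get(target, set())
    return matches
```

No direct Spanish-Italian link is stored anywhere; `translate("conducir", "es", "it")` finds guidare only because both synsets point to the same index record.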
Equivalent Near Synonym
1. Multiple Targets (1:many)
Dutch wordnet: schoonmaken (to clean) matches with 4
senses of clean in WordNet1.5:
• make clean by removing dirt, filth, or unwanted substances from
• remove unwanted substances from, such as feathers or pits, as of chickens or fruit
• remove in making clean; "Clean the spots off the rug"
• remove unwanted substances from - (as in chemistry)
2. Multiple Sources (many:1)
Dutch wordnet: versiersel near_synonym versiering
ILI-Record: decoration.
3. Multiple Targets and Sources (many:many)
Dutch wordnet: toestel near_synonym apparaat
ILI-records: machine; device; apparatus; tool
Equivalent Hyperonymy
Typically used for gaps in English WordNet:
• genuine, cultural gaps for things not known in
English culture:
– Dutch: klunen, to walk on skates over land from one
frozen water to the other
• pragmatic, in the sense that the concept is known but
is not expressed by a single lexicalized form in
English:
– Dutch: kunststof = artifact substance <=> artifact object
Equivalent Hyponymy
has_eq_hyponym
Used when WordNet1.5 only provides narrower terms. In this case there can only be a pragmatic difference, not a genuine cultural gap, e.g.: Spanish dedo = either finger or toe.
Complex mappings across languages
[Figure: EN-Net (toe, finger, head), IT-Net (dito), ES-Net (dedo) and NL-Net (hoofd, kop) linked to the ILI records { toe : part of foot }, { finger : part of hand }, { dedo, dito : finger or toe }, { head : part of body }, { hoofd : human head } and { kop : animal head }. Three link types are distinguished: normal equivalence, eq_has_hyponym, eq_has_hyperonym]
Typical gaps in the (English) ILI
• Dutch:
doodschoppen (to kick to death):
eq_hyperonym {kill}V and to {kick}V
aardig (Adjective, to like):
eq_near_synonym {like}V
cassière (female cashier)
eq_hyperonym {cashier}, {woman}
kunstproduct (artifact substance)
eq_hyperonym {artifact} and to {product}
• Spanish:
alevín (young fish):
eq_hyperonym {fish} and eq_be_in_state {young}
cajera (female cashier)
eq_hyperonym {cashier}, {woman}
Wordnets as semantic structures
• Wordnets are unique language-specific structures:
– different lexicalizations
– differences in synonymy and homonymy
– different relations between synsets
– same organizational principles: synset structure and same set of semantic relations.
• Language independent knowledge is assigned to the ILI and can thus be shared for all languages linked to the ILI: both an ontology and domain hierarchy
Autonomous & Language-Specific
[Figure: two hierarchies side by side. Wordnet1.5: object, with artifact, artefact (a man-made object) and natural object (an object occurring naturally) below it, and block, body, instrumentality, implement, device, container, tool, instrument, box, spoon and bag at lower levels. Dutch Wordnet: voorwerp {object}, with blok {block}, lichaam {body}, werktuig {tool}, bak {box}, lepel {spoon} and tas {bag} below it]
Linguistic versus Artificial Ontologies
Artificial ontology:
• better control or performance, or a more compact and
coherent structure.
• introduce artificial levels for concepts which are not
lexicalized in a language (e.g. instrumentality, hand tool),
• neglect levels which are lexicalized but not relevant for the
purpose of the ontology (e.g. tableware, silverware,
merchandise).
What properties can we infer for spoons?
spoon -> container; artifact; hand tool; object; made of metal or
plastic; for eating, pouring or cooking
Linguistic versus Artificial Ontologies
Linguistic ontology:
• Exactly reflects the relations between all the lexicalized words and
expressions in a language.
• Captures valuable information about the lexical capacity of
languages: what is the available fund of words and expressions in a
language.
What words can be used to name spoons?
spoon -> object, tableware, silverware, merchandise, cutlery,
Wordnets versus ontologies
• Wordnets:
– autonomous language-specific lexicalization patterns in a relational network.
– Usage: to predict substitution in text for information retrieval, text generation, machine translation, word-sense disambiguation.
• Ontologies:
– data structure with formally defined concepts.
– Usage: making semantic inferences.
Sharing world knowledge
• All wordnets in the world can be linked to
the same ontology
• All wordnets in the world can be linked to
the same thesaurus
Wordnet: Domain information
[Figure: the wordnet data model extended with domain labels. Vocabularies of languages (bank 1, bank 2, violin, violist, string 1, string 2) point to concept records (rec: 12345 - financial institute; rec: 54321 - river side; rec: 9876 - small string instrument; rec: 65438 - musician playing a violin; rec: 42654 - musician; rec: 35576 - string of an instrument; rec: 29551 - underwear; rec: 25876 - string instrument), which are connected by type-of and part-of relations and linked to Domains: Culture, Sport (Ball sports, Winter sports), Finance, Music, Clothing]
How to harmonize wordnets?
• Wordnets are unique language-specific lexicalization patterns
• Define universal sets of concepts that play a major
role in many different wordnets: so-called Base
Concepts
• Define base concepts in each language wordnet
– High level in the hierarchy
– Many hyponyms
• Provide the closest equivalent in English wordnet
• Determine the intersection of English
equivalences
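The last step, intersecting the English equivalences of the local selections, amounts to a set operation. The selections below are invented for illustration; the function `common_base_concepts` and its `min_languages` parameter are hypothetical and only sketch the idea.

```python
from collections import Counter

# Hypothetical local base-concept selections, each already translated
# to its closest English WordNet equivalent.
SELECTIONS = {
    "nl": {"person#1", "animal#1", "feeling#1"},
    "es": {"person#1", "animal#1", "food#1"},
    "it": {"person#1", "food#1", "feeling#1"},
}


def common_base_concepts(selections, min_languages=None):
    """Concepts selected by at least `min_languages` wordnets.

    With the default (None) a concept must be selected by every
    language, i.e. the strict intersection.
    """
    if min_languages is None:
        min_languages = len(selections)
    counts = Counter(c for concepts in selections.values() for c in concepts)
    return {c for c, n in counts.items() if n >= min_languages}
```

Lowering `min_languages` to 2 gives a much larger set than the strict intersection, which mirrors the choice between a full intersection and concepts selected by at least two languages.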
Lexicalization patterns
[Figure: lexicalization hierarchy with three marked layers: 25 unique beginners, 1024 base concepts, and basic level concepts. Nodes shown: entity, object, garbage, threat, artifact, organism, animal, building, bird, plant, tree, flower, church, canary, dog, crocodile, rose, abbey, common canary]
Base Concept Intersection
                              Nouns   Verbs
Intersection EN, NL, IT, ES     24       6
Intersection FR, DE, EE, CZ     70      30
Intersection All                13       2
{human#1; individual#1; mortal#1; person#1; someone#1; soul#1}
{animal#1; animate being#1; beast#1; brute#1; creature#1; fauna#1}
{flora#1; plant#1; plant life#1}
{matter#1; substance#1}
{food#1; nutrient#1}
{feeling#1}
{act#1; human action#1; human activity#1}
{cause#6; get#9; have#7; induce#2; make#12; stimulate#3}
{create#2; make#13}
{go#14; locomote#1; move#15; travel#4}
{be#4; have the quality of being#1}
Explanations for low intersection of
Base Concepts
• The individual selections are not representative
enough.
• There are major differences in the way meanings are
classified, which have an effect on the frequency of
the relations.
• The translations of the selection to WordNet1.5
synsets are not reliable
• The resources cover very different vocabularies
Concepts selected by at least two
languages: intersections of pairs
NOUNS     NL     ES     IT     EN
NL      1027    103    182    333
ES       103    523     45    284
IT       182     45    334    167
EN       333    284    167   1296

VERBS     NL     ES     IT     EN
NL       323     36     42     86
ES        36    128     18     43
IT        42     18    104     39
EN        86     43     39    236
Common Base Concepts
                                 Nouns   Verbs   Total
Physical objects & substances     491              491
Processes and states              272     228      500
Mental objects                     33               33
Total                             796     228     1024
Table 4: Number of Common BCs represented in the local wordnets
NL
ES
IT
Related to CBCs Eq_synonym
Eq_near
992
1012
878
269
0
191
725
1009
759
CBCs Without
Direct Equivalent
97
15
9
Table 5: BC4 Gaps in at least two wordnets (10 synsets)
body covering#1
body substance#1
social control#1
change of magnitude#1
contractile organ#1
psychological feature#1
mental object#1; cognitive content#1; content#2
natural object#1
place of business#1; business establishment#1
plant organ#1
plant part#1
spatial property#1; spatiality#1
Table 6: Local senses with complex equivalence
relations to CBCs
Eq_has_hyperonym
eq_has_hyponym
Eq_has_holonym
Eq_has_meronym
Eq_involved
Eq_is_caused_by
Eq_is_state_of
NL
61
34
2
3
3
3
1
ES
40
14
0
2
IT
4
20
Example of complex relation
CBC: cause to feel unwell#1, Verb
Closest Dutch concept: {onwel#1}, Adjective (sick)
Equivalence relation: eq_is_caused_by
EuroWordNet data
           Synsets   No. of senses   Sens./syns.   Entries   Sens./entry   LIRels.   LIRels/syns.   EQRels-ILI   EQRels/syn.   Synsets without ILI
Dutch        44015       70201          1.59         56283       1.25       111639       2.54          53448        1.21             7203
Spanish      23370       50526          2.16         27933       1.81        55163       2.36          21236        0.91                0
Italian      40428       48499          1.20         32978       1.47       117068       2.90          71789        1.78             1561
French       22745       32809          1.44         18777       1.75        49494       2.18          22730        1.00               20
German       15132       20453          1.35         17098       1.20        34818       2.30          16347        1.08                0
Czech        12824       19949          1.56         12283       1.62        26259       2.05          12824        1.00                0
Estonian      7678       13839          1.80         10961       1.26        16318       2.13           9004        1.17                0
English      16361       40588          2.48         17320       2.34        42140       2.58           n.a.        n.a.             n.a.
WN15         94515      187602          1.98        126617       1.48       211375       2.24           n.a.        n.a.             n.a.
From EuroWordNet to Global WordNet
• Currently, wordnets exist for more than 50
languages, including:
• Arabic, Bantu, Basque, Chinese, Bulgarian,
Estonian, Hebrew, Icelandic, Japanese, Kannada,
Korean, Latvian, Nepali, Persian, Romanian,
Sanskrit, Tamil, Thai, Turkish, Zulu...
• Many languages are genetically and typologically
unrelated
• http://www.globalwordnet.org
Global Wordnet Association
EuroWordNet
• English
• German
• Spanish
• French
• Italian
• Dutch
• Czech
• Estonian
BalkaNet
• Romanian
• Bulgarian
• Turkish
• Slovenian
• Greek
• Serbian
Other wordnets
• Danish
• Norwegian
• Swedish
• Portuguese
• Korean
• Russian
• Basque
• Catalan
• Thai
• Arabic
• Polish
• Welsh
• Chinese
• 20 Indian Languages
• Brazilian Portuguese
• Hebrew
• Latvian
• Persian
• Kurdish
• Avestan
• Baluchi
• Hungarian
http://www.globalwordnet.org
Some downsides of the EuroWordnet
model
• Construction is not done uniformly
• Coverage differs
• Not all wordnets can communicate with one
another
• Proprietary rights restrict free access and usage
• A lot of semantics is duplicated
• Complex and obscure equivalence relations due to
linguistic differences between English and other
languages
Next step: Global WordNet Grid
[Figure: the wordnets of the Grid languages linked to a shared Inter-Lingual Ontology (Object, Device, TransportDevice). German Words: Fahrzeug; Auto, Zug. Dutch Words: voertuig; auto, trein. English Words: vehicle; car, train. Spanish Words: vehículo; auto, tren. French Words: véhicule; voiture, train. Italian Words: veicolo; auto, treno. Czech Words: dopravní prostředník; auto, vlak. Estonian Words: liiklusvahend; auto, killavoor]
GWNG: Main Features
• Construct separate wordnets for each Grid
language
• Contributors from each language encode the
same core set of concepts plus
culture/language-specific ones
• Synsets (concepts) can be mapped
crosslinguistically via an ontology
The Ontology: Main Features
• Formal ontology serves as universal index of
concepts
• List of concepts is not just based on the lexicon of
a particular language (unlike in EuroWordNet) but
uses ontological observations
• Ontology contains only upper and mid-level
concepts
• Concepts are related in a type hierarchy
• Concepts are defined with axioms
The Ontology: Main Features
• In addition to high-level (“primitive”) concepts, the ontology needs to express low-level concepts lexicalized in the Grid languages
• Additional concepts can be defined with expressions in Knowledge Interchange Format (KIF), based on first order predicate calculus and atomic elements
The Ontology: Main Features
• Minimal set of concepts (Reductionist view):
– to express equivalence across languages
– to support inferencing
• Ontology must be powerful enough to encode all concepts
that are lexically expressed in any of the Grid languages
• Ontology need not and cannot provide a linguistic
encoding for all concepts found in the Grid languages
– Lexicalization in a language is not sufficient to warrant inclusion
in the ontology
– Lexicalization in all or many languages may be sufficient
• Ontological observations will be used to define the
concepts in the ontology
Ontological observations
• Identity criteria as used in OntoClean (Guarino & Welty 2002):
– rigidity: to what extent are properties true for entities
in all worlds? You are always a human, but you can be
a student for a short while.
– essence: what properties are essential for an entity?
Shape is essential for a statue but not for the clay it is
made of.
– unicity: what represents a whole and what entities are
parts of these wholes? An ocean is a whole but the
water it contains is not.
Type-role distinction
• Current WordNet treatment:
(1) a husky is a kind of dog (type)
(2) a husky is a kind of working dog (role)
• What’s wrong?
(2) is defeasible, (1) is not:
*This husky is not a dog
This husky is not a working dog
Other roles: watchdog, sheepdog, herding dog, lapdog, etc….
Ontology and lexicon
• Hierarchy of disjunct types:
Canine -> PoodleDog; NewfoundlandDog; GermanShepherdDog; Husky
• Lexicon:
– NAMES for TYPES:
{poodle}EN, {poedel}NL, {pudoru}JP
(instance x Poodle)
– LABELS for ROLES:
{watchdog}EN, {waakhond}NL, {banken}JP
((instance x Canine) and (role x GuardingProcess))
Ontology and lexicon
• Hierarchy of disjunct types:
River; Clay; etc…
• Lexicon:
– NAMES for TYPES:
{river}EN, {rivier, stroom}NL
(instance x River)
– LABELS for dependent concepts:
{rivierwater}NL (water from a river => water is not a unit)
((instance x water) and (instance y River) and (portion x y))
{kleibrok}NL (irregularly shaped piece of clay => non-essential)
((instance x Object) and (instance y Clay) and (portion x y) and (shape x Irregular))
Rigidity
• The “primitive” concepts represented in the
ontology are rigid types
• Entities with non-rigid properties will be
represented with KIF statements
• But: ontology may include some universal,
core concepts referring to roles like father,
mother
Properties of the Ontology
• Minimal: terms are distinguished by
essential properties only
• Comprehensive: includes all distinct concept types of all Grid languages
• Allows definitions via KIF of all lexemes
that express non-rigid, non-essential
properties of types
• Logically valid, allows inferencing
Mapping Grid Languages onto
the Ontology
• Explicit and precise equivalence relations among synsets in
different languages:
– type hierarchy is minimal
– subtle differences can be encoded in KIF expressions
• Grid database contains wordnets with synsets that label
– either “primitive” types in the hierarchies,
– or words relating to these types in ways made explicit in KIF expressions
• If two languages create the same KIF expression, this is a statement of equivalence!
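The idea that identical KIF expressions amount to a statement of equivalence can be sketched in Python. This is a stand-in, not KIF itself: an expression is modelled as an unordered set of atoms, variable renaming is not handled, and the names `expr`, `LEXICON` and `equivalents` are hypothetical. The entries reuse examples from these slides.

```python
def expr(*atoms):
    """An order-insensitive conjunction of atoms such as
    ("instance", "x", "Canine")."""
    return frozenset(atoms)


# (language, word) -> expression over ontology types.
LEXICON = {
    ("en", "watchdog"): expr(("instance", "x", "Canine"),
                             ("role", "x", "GuardingProcess")),
    ("nl", "waakhond"): expr(("role", "x", "GuardingProcess"),
                             ("instance", "x", "Canine")),
    ("jp", "banken"): expr(("instance", "x", "Canine"),
                           ("role", "x", "GuardingProcess")),
    ("en", "poodle"): expr(("instance", "x", "Poodle")),
    ("nl", "poedel"): expr(("instance", "x", "Poodle")),
    ("nl", "straathond"): expr(("instance", "x", "Canine"),
                               ("habitat", "x", "Street")),
}


def equivalents(lang, word):
    """All other lexicon entries that carry exactly the same expression."""
    target = LEXICON[(lang, word)]
    return sorted(key for key, e in LEXICON.items()
                  if e == target and key != (lang, word))
```

Equivalence falls out of comparing expressions; no word-to-word translation link is stored. A word such as straathond, whose expression no other entry shares, simply has no equivalent in this toy lexicon.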
How to construct the GWNG
• Take an existing ontology as starting point;
• Use English WordNet to maximize the number of disjunct
types in the ontology;
• Link English WordNet synsets as names to the disjunct
types;
• Provide KIF expressions for all other English words and
synsets
• Copy the relation to the ontology to other languages,
including KIF statements built for English
• Revise KIF statements to make the mapping more precise
• Map all words and synsets that are not and cannot be mapped to English WordNet to the ontology:
– propose extensions to the type hierarchy
– create KIF expressions for all non-rigid concepts
Initial Ontology: SUMO
(Niles and Pease)
SUMO = Suggested Upper Merged Ontology
--consistent with good ontological practice
--fully mapped to WordNet(s): 1000 equivalence
mappings, the rest through subsumption
--freely and publicly available
--allows data interoperability
--allows NLP
--allows reasoning/inferencing
SUMO
• 1,000 generic, abstract, high-level terms
• 4,000 definitional statements
• MILO (Mid-Level Ontology)
closer to lexicon, WordNet
Mapping Grid languages onto the
Ontology
• Check existing SUMO mappings to
Princeton WordNet -> extend the ontology
with rigid types for specific concepts
• Extend it to many other WordNet synsets
• Observe OntoClean principles! (Synsets referring to non-rigid, non-essential, non-unicitous concepts must be expressed in KIF)
Lexicalizations not mapped to WordNet
• Not added to the type hierarchy:
{straathond}NL (a dog that lives in the streets)
((instance x Canine) and (habitat x Street))
• Added to the type hierarchy:
{klunen}NL (to walk on skates over land from one frozen body of water to the next)
WalkProcess -> KluunProcess
Axioms:
(and (instance x Human) (instance y Walk) (instance z
Skates) (wear x z) (instance s1 Skate) (instance s2
Skate) (before s1 y) (before y s2) etc…
• National dishes, customs, games,....
Most mismatching concepts are not
new types
• Refer to sets of types in specific circumstances or to concepts that are dependent on these types; next to {rivierwater}NL there are many others:
{theewater}NL (water used for making tea)
{koffiewater}NL (water used for making coffee)
{bluswater}NL (water used for extinguishing fire)
• Relate to linguistic phenomena:
– gender, perspective, aspect, diminutives, politeness, pejoratives, part-of-speech constraints
KIF expression for gender marking
• {teacher}EN ((instance x Human) and (agent x TeachingProcess))
• {Lehrer}DE ((instance x Man) and (agent x TeachingProcess))
• {Lehrerin}DE ((instance x Woman) and (agent x TeachingProcess))
KIF expression for perspective
sell: subj(x), direct obj(z),indirect obj(y)
versus
buy: subj(y), direct obj(z),indirect obj(x)
(and (instance x Human)(instance y Human)
(instance z Entity) (instance e FinancialTransaction)
(source x e) (destination y e) (patient e)
The same process but a different perspective by subject and object realization: marry is expressed by two verbs in Russian; apprendre in French can mean both teach and learn
Aspectual variants
• Slavic languages: two members of a verb pair for an
ongoing event and a completed event.
• English: can mark perfectivity with particles, as in the
phrasal verbs eat up and read through.
• Romance languages: mark aspect by verb conjugations on
the same verb.
• Dutch: verbs with marked aspect can be created by
prefixing a verb with door: doorademen, dooreten,
doorfietsen, doorlezen, doorpraten (continue to
breathe/eat/bike/read/talk).
• These verbs are restrictions on phases of the same process
• Does NOT warrant the extension of the ontology with
separate processes for each aspectual variant
Kinship relations in Arabic
• عَمّ (Eam~): father's brother, paternal uncle.
• خَال (xaAl): mother's brother, maternal uncle.
• عَمَّة (Eam~ap): father's sister, paternal aunt.
• خَالَة (xaAlap): mother's sister, maternal aunt.
Kinship relations in Arabic
• ...
• شَقِيقَة ($aqiyqap): full sister, sister on the paternal and maternal side (as distinct from أُخْت (>uxot): 'sister', which may refer to a sister from the paternal or maternal side, or both sides).
• ثَكْلان (vakolAna): father bereaved of a child (as opposed to يَتِيم (yatiym), or يَتِيمَة (yatiymap) for feminine: 'orphan', a person whose father or mother died, or both father and mother died).
• ثَكْلَى (vakolaYa): mother bereaved of a child (as opposed to يَتِيم, or يَتِيمَة for feminine: 'orphan', a person whose father or mother died, or both father and mother died).
Complex Kinship concepts
father's brother, paternal uncle
WORDNET
paternal uncle
=> uncle
=> brother of ....????
ONTOLOGY
(=>
(paternalUncle ?P ?UNC)
(exists (?F)
(and
(father ?P ?F)
(brother ?F ?UNC))))
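The axiom can be mirrored in Python to show how the ontology side supports inference where the wordnet hierarchy cannot. The facts and names below are made up; argument order follows the axiom, so `("ali", "hasan")` in `FATHER` reads "hasan is the father of ali". Note that the axiom only states what follows from `paternalUncle`; using it in the other direction, as `paternal_uncles` does, is an added assumption.

```python
# Hypothetical facts, in the argument order of the axiom.
FATHER = {("ali", "hasan")}      # (father ?P ?F): hasan is ali's father
BROTHER = {("hasan", "omar")}    # (brother ?F ?UNC): omar is hasan's brother


def paternal_uncle_witnesses(p, unc):
    """All ?F such that (father p ?F) and (brother ?F unc) both hold."""
    return {f for (child, f) in FATHER
            if child == p and (f, unc) in BROTHER}


def paternal_uncles(p):
    """ASSUMPTION: read the axiom as a definition and derive the uncles."""
    return {unc
            for (child, f) in FATHER if child == p
            for (f2, unc) in BROTHER if f2 == f}
```

A claim `(paternalUncle ali omar)` is consistent with these facts because the witness set is non-empty.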
Universality as evidence
• The English verb cut abstracts from the precise process, but there are troponyms that implicate the manner:
– snip, clip imply scissors; chop and hack imply a large knife or an axe
• In Dutch there is no general verb, only specific verbs: knippen (“clip, snip, cut with scissors or a scissor-like tool”), snijden (“cut with a knife or knife-like tool”), hakken (“chop, hack, to cut with an axe or similar tool”).
• If lexicalization of the specific process is more universal, it can be seen as evidence that the specific processes should be listed in the ontology and not the generic verb
Open Questions/Challenges
• What is a word, i.e., a lexical unit?
• What is the status of complex lexemes like
English lightning rod, word of mouth, find
out, kick the bucket?
• What is a semantic unit, i.e. a concept?
Open Questions/Challenges
• Is there a core inventory of concepts that are
universally encoded?
• If so, what are these concepts?
• How can crosslinguistic equivalence be verified?
• Is there systematicity to the language-specific
extensions?
• What are the lexicalization patterns of individual
languages?
• Are lexical gaps accidental or systematic?
Coverage: what belongs in a
universal lexical database?
• Formal, linguistic criteria for inclusion
• Informal, cultural criteria
• Both are difficult to define and apply!
Advantages of the Global Wordnet
Grid
• Shared and uniform world knowledge:
– universal inferencing
– uniform text analysis and interpretation
• More compact and less redundant databases
• Clearer notion of how languages map to the knowledge
– better criteria for expressing knowledge
– better criteria for understanding variation
Expansion with pure hyponymy
relations
[Figure: dog with the hyponyms hunting dog, puppy, dachshund (short hair dachshund, long hair dachshund), lapdog, street dog, poodle, bitch and watchdog]
Expansion from a type to roles
Expansion with pure hyponymy
relations
[Figure: dog with the hyponyms hunting dog, puppy, dachshund (short hair dachshund, long hair dachshund), lapdog, street dog, poodle, bitch and watchdog]
Expansion from a role to types and other roles
Automotive ontology:
(http://www.ontoprise.de)
Who uses ontologies?