Ontology Learning O to ogy ea g Ontologies and Ontology Engineering

advertisement
Ontology
O
to ogy Learning
ea
g
Ontologies and Ontology Engineering
Kristian Stavåker and Dag Sonntag
VT 2011
Summary
•
Wikipedia:
•
•
Paul Buitelaar et al. / Ontology Learning from Text: An
Overview [3]:
•
2
“Ontology learning (ontology extraction, ontology generation,
or ontology acquisition) is a subtask of information extraction.
extraction
The goal of ontology learning is to (semi-)automatically extract
relevant concepts and relations from a given corpus or other
kinds of data sets to form an ontology
ontology.”
”The process of defining and instantiating a knowledge base
is referred to as knowledge markup or ontology population,
whereas (semi
(semi-)automatic
)automatic support in ontology development is
usually referred to as ontology learning.”
2011-06-12
Ontology Learning
S
Summary
(2)
3
•
A lot of the work in this area builds on work from natural
language processing, artificial intelligence and machine
learning.
learning
•
The ontology layer cake consists of the various subtasks
(increasing in complexity) involved in ontology learning.
2011-06-12
Ontology Learning
Th Ontology
The
O t l
Learning
L
i L
Layer C
Cake
k
x, y (sufferFrom(x, y)  ill(x))
cure (domain:Doctor, range:Disease)
is_a (Doctor, Person)
Disease:=<I
Disease:
<I, E,
E L>
{disease, illness}
disease, illness, hospital
4
2011-06-12
Ontology Learning
O tli
Outline
5
•
Background
•
Terms
•
Synonyms
•
Concepts
•
Concept Hierarchies
•
Relations
•
Rules
•
Conclusions
2011-06-12
Ontology Learning
Background
Terms
Synonyms
Concepts
Concept
Co
cept Hierarchies
e a c es
Relations
Rules
Conclusions
6
B k
Background
d
•
Used for building an ontology from scratch
through the application of a set of methods
and techniques.
techniques
•
The process of identifying terms, concepts,
relations and optionally axioms from textual
information to form an ontology.
2011-06-12
Ontology Learning
Background
Terms
Synonyms
Concepts
Concept
Co
cept Hierarchies
e a c es
Relations
Rules
Conclusions
7
B k
Background
d (2)
•
Unstructured sources (NLP techniques,
morphological and syntactic analysis, etc.)
•
Semi-structured sources (such as XML
schema)
•
Structured data (extract concepts and
relations from for instance databases)
2011-06-12
Ontology Learning
Background
Terms
Synonyms
Concepts
Concept
Co
cept Hierarchies
e a c es
Relations
Rules
Conclusions
T
Terms
disease, illness, hospital
8
2011-06-12
Ontology Learning
Background
Terms
Synonyms
Concepts
Concept
Co
cept Hierarchies
e a c es
Relations
Rules
Conclusions
9
T
Terms
•
Term extraction is a prerequisite for all
aspects of ontology learning from text.
•
Term extraction
T
t ti implies
i li more or lless
advanced levels of liguistic processing.
•
An example of extracting relevant terms is
counting frequencies of terms in a given set
of documents (the corpus).
2011-06-12
Ontology Learning
Background
Terms
Synonyms
Concepts
Concept
Co
cept Hierarchies
e a c es
Relations
Rules
Conclusions
10
T
Terms
•
The computational linguistics community has
proposed a wide range of more sophisticated
techniques for term extraction.
extraction
2011-06-12
Ontology Learning
Background
Terms
Synonyms
Concepts
Concept
Co
cept Hierarchies
e a c es
Relations
Rules
Conclusions
11
T
Terms
•
One common method:
•
A Part-Of-Speech (POS) tagger is run over the
domain corpus
•
Possible terms are identified by constructing
patterns, such as: Adj-Noun, Noun-noun, AdjNoun-Noun, … (names are ignored)
•
Apply statistical metrics in order to identify only
the relevant to the text terms
2011-06-12
Ontology Learning
Background
Terms
Synonyms
Concepts
Concept
Co
cept Hierarchies
e a c es
Relations
Rules
Conclusions
T
Terms
[[He SUBJ] [booked PRED] [[this] [table HEAD]NP:DOBJ:X1]…]…
[[ SUBJ:X1]] [was
[[It
[
PRED]] still available…]]
[[He SUBJ] [booked PRED] [[this] [table HEAD] NP:DOBJ]S]
[[the SPEC] [large MOD] [table HEAD] NP]
[[the] [large] [table] NP] [[in] [the] [corner] PP]
[work~ing V]
[table N:ARTIFACT] [table N:furniture]
[table] [2005
[2005-06-01]
06 01] [John Smith]
12
2011-06-12
Ontology Learning
Background
Terms
Synonyms
Concepts
Concept
Co
cept Hierarchies
e a c es
Relations
Rules
Conclusions
T
Terms
•
Statistical Analysis
•
Term Frequency Inverted Document
F
Frequency
(TFIDF) - a popular
l weighting
i hti
scheme
N
tfidf
f f ( w)  tff ( w)  log(
g(
)
df ( w)
13
2011-06-12
Ontology Learning
Background
Terms
Synonyms
Concepts
Concept
Co
cept Hierarchies
e a c es
Relations
Rules
Conclusions
S
Synonyms
{disease, illness}
14
2011-06-12
Ontology Learning
Background
Terms
Synonyms
Concepts
Concept
Co
cept Hierarchies
e a c es
Relations
Rules
Conclusions
S
Synonyms
•
Identification of terms that share semantics
(potentially refer to the same concept)
•
M th d ffor extracting
Methods
t ti synonyms
•
Based on WordNet/EuroWordNet
•
Harris’ distributional hypothesis
Harris
•
Latent Semantic Indexing (LSI)
•
15
2011-06-12
A NLP technique
q of analyzing
y g relationships
p
between a set of documents and the terms they
contain
Ontology Learning
Background
Terms
Synonyms
Concepts
Concept
Co
cept Hierarchies
e a c es
Relations
Rules
Conclusions
S
Synonyms
•
Pointwise Mutual Information measure for
extracting synonyms.
PMI ( x, y ) : log 2
PMIWeb ( x, y ) : log 2
16
2011-06-12
Ontology Learning
P ( x, y )
P( x) P( y )
Hits ( xANDy ) MaxPages
Hits ( x) Hits ( y )
Background
Terms
Synonyms
Concepts
Concept
Co
cept Hierarchies
e a c es
Relations
Rules
Conclusions
C
Concepts
t
Disease:=<I, E, L>
17
2011-06-12
Ontology Learning
Background
Terms
Synonyms
Concepts
Concept
Co
cept Hierarchies
e a c es
Relations
Rules
Conclusions
18
C
Concepts
t
•
Some confusion what extraction of concepts
is since it is not clear what exactly constitutes
a concept
concept.
2011-06-12
Ontology Learning
Background
Terms
Synonyms
Concepts
Concept
Co
cept Hierarchies
e a c es
Relations
Rules
Conclusions
C
Concepts
t
•
Intension – (in)formal definition of the set of
objects that this concept describes
•
•
Extension – a set of objects that the definition
of this concept describes
•
•
Example: music
music, noise
noise, speech
Lexical realizations – the term itself and its
multilingual synonyms
•
19
Example:
E
l sound
d iis a mechanical
h i l wave th
thatt iis
an oscillation of pressure composed of
frequencies within the range of hearing
2011-06-12
Example: sound, acoustics
Ontology Learning
Background
Terms
Synonyms
Concepts
Concept
Co
cept Hierarchies
e a c es
Relations
Rules
Conclusions
T
Taxonomy
is_a
_ ((Doctor,, Person))
20
2011-06-12
Ontology Learning
Background
Terms
Synonyms
Concepts
Concept
Co
cept Hierarchies
e a c es
Relations
Rules
Conclusions
T
Taxonomy
•
•
21
Simple ideas that work fairly well
•
Lexico-Synthatic Patterns (Hearst)
•
Formal Concept Analysis (FCA)
•
Phrase Analysis
•
WordNet
These methods can be weightened together
to g
get best results
2011-06-12
Ontology Learning
Background
Terms
Synonyms
Concepts
Concept
Co
cept Hierarchies
e a c es
Relations
Rules
Conclusions
•
•
22
Lexico-Synthatic
L
i S th ti P
Patterns
tt
(Hearst)
Hearst identified the following patterns
•
Hearst1: NPhyper such as {NPhypo
{NPhypo,}}* {(and | or)} NPhypo
•
Hearst2: Such NPhyper as {NPhypo,}* {(and | or)} NPhypo
•
Hearst3: NPhypo
yp {{,NP}*
} {{,}} or other NPhyper
yp
•
Hearst4: NPhypo {,NP}* {,} and other NPhyper
•
Hearst5: NPhyper including {NPhypo,}* NPhypo {(and | or)}
NPhypo
NPh
•
Hearst6: NPhyper especially {NPhypo,}* {(and | or)} NPhypo
Example: ”Vehicles
Vehicles such as bikes and cars”
cars
2011-06-12
Ontology Learning
Background
Terms
Synonyms
Concepts
Concept
Co
cept Hierarchies
e a c es
Relations
Rules
Conclusions
Machine
M
hi R
Readable
d bl
Dictionaries
•
Idea: Exploit the regularity of dictionaries like
Wikepedia or Google definition
•
E
Example:
l (f
(from wikipedia)
iki di )
•
23
2011-06-12
Car: ”An automobile, autocar, motor car or car
is a wheeled motor vehicle used for
transporting passengers, which also carries its
own engine or motor. “ => is(car, vehicle)
Ontology Learning
Background
Terms
Synonyms
Concepts
Concept
Co
cept Hierarchies
e a c es
Relations
Rules
Conclusions
24
Formall C
F
Conceptt A
Analysis
l i
(FCA)
•
Idea: Similar words share similiar attributes
•
Example:
2011-06-12
Ontology Learning
Background
Terms
Synonyms
Concepts
Concept
Co
cept Hierarchies
e a c es
Relations
Rules
Conclusions
25
Ph
Phrase
A
Analysis
l i
•
Idea: Adjectives or Nominals in front of
nouns in noun-phrases often indicate
subclass
•
Example: Focal epilepsy is a subclass of
epilepsy
p p y
2011-06-12
Ontology Learning
Background
Terms
Synonyms
Concepts
Concept
Co
cept Hierarchies
e a c es
Relations
Rules
Conclusions
W d t
Wordnet
•
Idea: Use already
y created ontologies
g
•
WordNet contains ≈155000 words
•
Type
yp ((noun, verb and so on))
•
Their definition
•
Semantic synsets like
•
26
2011-06-12
Hypernyms, hyponyms, meronyms,
synonyms…
Ontology Learning
Background
Terms
Synonyms
Concepts
Concept
Co
cept Hierarchies
e a c es
Relations
Rules
Conclusions
W d t
Wordnet
Entity
Artifact
Motor Vehicle
Car
27
Journey
Bike
Apartment
2011-06-12
Trip
Ontology Learning
Excursion
Background
Terms
Synonyms
Concepts
Concept
Co
cept Hierarchies
e a c es
Relations
Rules
Conclusions
R l ti
Relations
cure (domain:Doctor, range:Disease)
28
2011-06-12
Ontology Learning
Background
Terms
Synonyms
Concepts
Concept
Co
cept Hierarchies
e a c es
Relations
Rules
Conclusions
R l ti
Relations
•
•
Can be found similar as concept hierarchies
(is relation)
•
L i S th ti P
Lexico-Synthatic
Patterns
tt
•
Collocation Discovery
•
WordNet
Specific relations
•
•
Attributes
•
29
Part of (meronym)
(
y )
2011-06-12
Adjectives, e.g. color, weight and so on…
Ontology Learning
Background
Terms
Synonyms
Concepts
Concept
Co
cept Hierarchies
e a c es
Relations
Rules
Conclusions
L i S th ti P
Lexico-Synthatic
Patterns
tt
•
Finds relations by searching after patterns of
words
•
E.g. The car has wheels =>, part-of(wheel, car)
The car is red => is(car, red) or color(car,red)
The car consists of wheels, engine, …
=> part-of(engine, car)…
•
30
Hard to model every possible combination
2011-06-12
Ontology Learning
Background
Terms
Synonyms
Concepts
Concept
Co
cept Hierarchies
e a c es
Relations
Rules
Conclusions
C ll
Collocation
ti Di
Discovery
•
Idea: find words that occur together in a
statistically significant manner
•
Similar
Si
il tto th
the PCA
PCA-approach
h ffor the
th
taxonomy
•
The type of relation can then later be found
by linguistic approaches
•
Example: www-search for two concepts K1
and K2 using the Jaccard coefficient:
GoogleHits(Keyword1,Keyword2)
G
GoogleHits(K1)
i ( 1) +GoogleHits(K2)G
i ( ) GoogleHits(K1,K2)
G
i ( 1
)
31
2011-06-12
Ontology Learning
Background
Terms
Synonyms
Concepts
Concept
Co
cept Hierarchies
e a c es
Relations
Rules
Conclusions
32
W dN t
WordNet
•
Idea: Use already created ontologies
•
Previously found relations can be used to
t i other
train
th machine
hi llearning
i ttechniques
h i
((e.g.
decision trees, association rule learning,
neural networks)
2011-06-12
Ontology Learning
Background
Terms
Synonyms
Concepts
Concept
Co
cept Hierarchies
e a c es
Relations
Rules
Conclusions
A i
Axioms
&R
Rules
l
x, y (sufferFrom(x, y)  ill(x))
33
2011-06-12
Ontology Learning
Background
Terms
Synonyms
Concepts
Concept
Co
cept Hierarchies
e a c es
Relations
Rules
Conclusions
34
A i
Axioms
&R
Rules
l
• Mostly Lexico-Synthatic Patterns
• Example: LExO
2011-06-12
Ontology Learning
Background
Terms
Synonyms
Concepts
Concept
Co
cept Hierarchies
e a c es
Relations
Rules
Conclusions
C
Conclusions
l i
x, y (sufferFrom(x, y)  ill(x))
cure (domain:Doctor, range:Disease)
is_a
_ ((Doctor,, Person))
Disease:=<I, E, L>
{disease, illness}
disease, illness, hospital
35
2011-06-12
Ontology Learning
Background
Terms
Synonyms
Concepts
Concept
Co
cept Hierarchies
e a c es
Relations
Rules
Conclusions
36
C
Conclusions
l i
•
Mainly three methods used:
•
Natural language processing (NLP)
•
Statistical methods
•
Using other existing ontology
2011-06-12
Ontology Learning
Background
Terms
Synonyms
Concepts
Concept
Co
cept Hierarchies
e a c es
Relations
Rules
Conclusions
N t lL
Natural
Language P
Processing
i
•
•
37
Positive:
•
Easy to find corpus
•
High success-rate for certain patterns
Negative:
•
H d to model
Hard
d l every ki
kind
d off pattern
•
Ambiguity in natural text
2011-06-12
Ontology Learning
Background
Terms
Synonyms
Concepts
Concept
Co
cept Hierarchies
e a c es
Relations
Rules
Conclusions
St ti ti l M
Statistical
Methods
th d
•
•
Positive:
•
Needed in some parts of the layer cake, like
term extraction
•
Can give confidence to relations (or synonyms)
found by other methods
Negative:
•
38
2011-06-12
Even if a statistical similarity if found, the
relation
l ti bi
binding
di th
the words
d ttogether
th iis unknown
k
Ontology Learning
Background
Terms
Synonyms
Concepts
Concept
Co
cept Hierarchies
e a c es
Relations
Rules
Conclusions
U Oth
Use
Other E
Existing
i ti O
Ontology
t l
•
•
39
Positive:
•
Very easy to find relations and definitions
•
If a relation or similar is found the probabilityrate of it being correct is very high
•
Can be combined in a successful way with
other methods
Negative:
•
Can be hard to find ontologies in specialized
areas
•
Words can have multiple meaning
2011-06-12
Ontology Learning
Background
Terms
Synonyms
Concepts
Concept
Co
cept Hierarchies
e a c es
Relations
Rules
Conclusions
40
R f
References
•
[1] Ontology Learning – Philipp Cimiano,
Al
Alexander
d Mäd
Mädche,
h St
Steffen
ff St
Staab,
b JJohanna
h
Völker, 2009.
•
[2] Ontology Learning – Alexander Maedche
Maedche,
Steffen Staab.
•
[3] Ontology Learning from Text: An Overview
– Paul Buitelaar, Philipp Cimiano, and
Bernardo Magnini, 2003.
•
[4] Ontology Learning from Text: A Look back
and into the Future – Wilson Wong, Wei Liu,
Mohammed Bennamoun, 2011.
•
[5] Aquisition of OWL DL Axioms from Lexical
Resources – Johanna Völker, Pascal Hitzler
and Philipp Cimiano
2011-06-12
Ontology Learning
2011-06-12
www.liu.se
Ontology Learning
41
Download