10. Ontology Engineering.ppt

Ontology Engineering &
Maintenance
Semantic Web - Spring 2006
Computer Engineering Department
Sharif University of Technology
Outline
Ontology Engineering
 Ontology evaluation

Introduction

Why do we use ontologies?
 To describe the semantics of the data (which we
call metadata)

Why do we describe the semantics?
 To provide a uniform way for different
parties to understand each other

Which data?
 Any data (on the web, or in the existing legacy
databases)
Introduction

Formal definition of ontology:


Ontologies are knowledge bodies that provide a
formal representation of a shared
conceptualization of a particular domain.
Ontologies are widely used in the Semantic
Web.
 Recently ontologies have become
increasingly common on WWW where
they provide semantics of annotations in
web pages
What Is “Ontology Engineering”?
Ontology Engineering: Defining terms in the
domain and relations among them




Defining concepts in the domain (classes)
Arranging the concepts in a hierarchy
(subclass-superclass hierarchy)
Defining which attributes and properties (slots)
classes can have and constraints on their
values
Defining individuals and filling in slot values
Ontology-Development Process
here:
determine scope → consider reuse → enumerate terms → define classes → define properties → define constraints → create instances
In reality - an iterative process:
[Figure: the same steps, revisited repeatedly and in varying order]
Determine Domain and Scope
What is the domain that the ontology will
cover?
What are we going to use the ontology for?
What types of questions should the information in
the ontology answer?
Consider Reuse
Why reuse other ontologies?



to save the effort
to interact with the tools that use other
ontologies
to use ontologies that have been validated
through use in applications
What to Reuse?

Ontology libraries




DAML ontology library (www.daml.org/ontologies)
Ontolingua ontology library
(www.ksl.stanford.edu/software/ontolingua/)
Protégé ontology library
(protege.stanford.edu/plugins.html)
Upper ontologies


IEEE Standard Upper Ontology (suo.ieee.org)
Cyc (www.cyc.com)
What to Reuse? (II)

General ontologies



DMOZ (www.dmoz.org)
WordNet (www.cogsci.princeton.edu/~wn/)
Domain-specific ontologies


UMLS Semantic Net
GO (Gene Ontology) (www.geneontology.org)
Enumerate Important Terms
What are the terms we need to talk
about?
 What are the properties of these terms?
 What do we want to say about the terms?

Define Classes and the Class
Hierarchy
A class is a concept in the domain
 a class of wines
 a class of wineries
 a class of red wines
A class is a collection of elements with similar
properties
Instances of classes
 a glass of California wine you’ll have for lunch
Class Inheritance


Classes usually constitute a taxonomic hierarchy
(a subclass-superclass hierarchy)
A class hierarchy is usually an IS-A hierarchy:
an instance of a subclass is an instance of
a superclass


If you think of a class as a set of elements, a
subclass is a subset
e.g., Apple is a subclass of Fruit
Every apple is a fruit
Levels in the Hierarchy
[Figure: a class hierarchy with top-level, middle-level, and bottom-level classes]
Modes of Development
top-down – define the most general
concepts first and then specialize them
 bottom-up – define the most specific
concepts and then organize them in more
general classes
 combination – define the more salient
concepts first and then generalize and
specialize them

Documentation

Classes (and Properties) usually have
documentation




Describing the class in natural language
Listing domain assumptions relevant to the
class definition
Listing synonyms
Documenting classes and slots is as
important as documenting computer code!
Define Properties (Slots) of Classes
Properties in a class definition describe
attributes of instances of the class and
relations to other instances
Each wine will have color, sugar content,
producer, etc.
Properties (Slots)

Types of properties





“intrinsic” properties: flavor and color of wine
“extrinsic” properties: name and price of wine
parts: ingredients in a dish
relations to other objects: producer of wine (winery)
Simple and complex properties


simple properties (attributes): contain primitive values
(strings, numbers)
complex properties: contain (or point to) other objects
(e.g., a winery instance)
Property Constraints (facets)
Property constraints (facets) describe or
limit the set of possible values for a
property
The name of a wine is a string
The wine producer is an instance of Winery
A winery has exactly one location
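The three example facets above can be sketched as simple checks over an instance's slot values. This is a minimal illustration, not the mechanism of any particular ontology tool; the `Facet` class, the `check` function, and the sample wine and winery data are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Facet:
    """A constraint on one slot: a value check plus optional exact cardinality."""
    slot: str
    value_ok: Callable[[Any], bool]
    exactly: Optional[int] = None  # required number of values, if any


def check(instance: Dict[str, List[Any]], facets: List[Facet]) -> List[str]:
    """Return a list of human-readable violations (empty if there are none)."""
    problems = []
    for f in facets:
        values = instance.get(f.slot, [])
        if f.exactly is not None and len(values) != f.exactly:
            problems.append(
                f"{f.slot}: expected {f.exactly} value(s), got {len(values)}")
        for v in values:
            if not f.value_ok(v):
                problems.append(f"{f.slot}: bad value {v!r}")
    return problems


# Hypothetical data: each slot maps to a list of values.
wineries = {"ChateauExample"}
wine_facets = [
    Facet("name", lambda v: isinstance(v, str)),   # the name is a string
    Facet("producer", lambda v: v in wineries),    # producer is a Winery instance
]
winery_facets = [Facet("location", lambda v: isinstance(v, str), exactly=1)]

good_wine = {"name": ["Example Red"], "producer": ["ChateauExample"]}
bad_winery = {"location": ["Napa", "Sonoma"]}  # violates "exactly one location"
```

Calling `check(bad_winery, winery_facets)` reports the cardinality violation, while `check(good_wine, wine_facets)` returns an empty list.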
An Example: Domain and Range
[Figure: DOMAIN (class) → slot → RANGE (allowed values)]
When defining a domain or range for a slot, find the most
general class or classes
Consider the flavor slot
 Too specific: Domain: Red wine, White wine, Rosé wine
 Preferred: Domain: Wine
Consider the produces slot for a Winery:
 Too specific: Range: Red wine, White wine, Rosé wine
 Preferred: Range: Wine
Create Instances
Create an instance of a class
 The class becomes a direct type of the instance
 Any superclass of the direct type is a type of the
instance
Assign slot values for the instance frame
 Slot values should conform to the facet constraints
 Knowledge-acquisition tools often check that
Defining Classes and a Class Hierarchy

The things to remember:



There is no single correct class hierarchy
But there are some guidelines
The question to ask:
“Is each instance of the subclass an instance of
its superclass?”
Transitivity of the Class Hierarchy

The is-a relationship is
transitive:
If B is a subclass of A and
C is a subclass of B, then
C is a subclass of A

A direct superclass of a
class is its “closest”
superclass
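Transitivity means that all superclasses of a class can be derived from the direct superclass links alone. A minimal sketch, assuming the hierarchy is stored as a hypothetical `DIRECT` table mapping each class to its direct superclasses:

```python
from typing import Dict, Set

# Hypothetical direct-superclass table: class -> its direct superclasses.
DIRECT: Dict[str, Set[str]] = {
    "C": {"B"},
    "B": {"A"},
    "A": set(),
}


def superclasses(cls: str, direct: Dict[str, Set[str]]) -> Set[str]:
    """All superclasses of cls, following direct links transitively."""
    found: Set[str] = set()
    stack = list(direct.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(direct.get(parent, ()))
    return found
```

Here `superclasses("C", DIRECT)` contains both `B` (the direct superclass) and `A` (reached through `B`).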
Multiple Inheritance



A class can have more than
one superclass
A subclass inherits slots and
facet restrictions from all the
parents
Different systems resolve
conflicts differently
Disjoint Classes


Classes are disjoint if they cannot have common instances
Disjoint classes cannot have any common subclasses either
Red wine, White wine, Rosé wine are disjoint
Dessert wine and Red wine are not disjoint
[Figure: Wine with subclasses Dessert wine, Red wine, White wine, Rosé wine]
Avoiding Class Cycles
Danger of multiple
inheritance: cycles in the
class hierarchy
 Classes A, B, and C have
equivalent sets of instances


By many definitions, A, B, and
C are thus equivalent
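A cycle of this kind can be detected mechanically with a depth-first search over the direct-superclass links. The sketch below is one possible check, not a feature of any specific tool; `find_cycle` and the table layout (class mapped to its direct superclasses) are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set


def find_cycle(direct: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one cycle of classes as a list, or None if the hierarchy is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {c: WHITE for c in direct}
    path: List[str] = []

    def visit(c: str) -> Optional[List[str]]:
        color[c] = GREY
        path.append(c)
        for parent in sorted(direct.get(c, ())):
            state = color.get(parent, WHITE)
            if state == GREY:
                # parent is already on the current path: a cycle is closed.
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                found = visit(parent)
                if found:
                    return found
        path.pop()
        color[c] = BLACK
        return None

    for c in sorted(direct):
        if color[c] == WHITE:
            found = visit(c)
            if found:
                return found
    return None
```

For a hierarchy where A is a subclass of C, C of B, and B of A, the function returns a list that starts and ends with the same class; for a proper taxonomy it returns `None`.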
The Perfect Family Size



If a class has only one child,
there may be a modeling
problem
If the only Red Burgundy we
have is Côtes d’Or, why
introduce the sub-hierarchy?
Compare to bullets in a
bulleted list
The Perfect Family Size (II)


If a class has more
than a dozen children,
additional
subcategories may be
necessary
However, if no natural
classification exists,
the long list may be
more natural
Single and Plural Class Names
A “wine” is not a kind-of
“wines”
 A wine is an instance of the
class Wines
 Class names should be either
all singular
or all plural
[Figure: Class, instance-of, Instance]
Classes and Their Names



Classes represent concepts in the domain, not
their names
The class name can change, but it will still refer
to the same concept
Synonym names for the same concept are not
different classes

Many systems allow listing synonyms as part of the class
definition
Content: Top-Level Ontologies

What does “top-level” mean?






Objects: tangible, intangible
Processes, events, actors, roles
Agents, organizations
Spaces, boundaries, location
Time
IEEE Standard Upper Ontology effort


Goal: Design a single upper-level ontology
Process: Merge upper-level of existing ontologies
CYC: Top-Level Categories
WORDNET: Representation of Subclass
Relation among Synsets
Sowa’s Ontology
Ontology Evaluation




A key factor that makes a particular discipline or
approach scientific is the ability to evaluate and
compare the ideas within the area.
In most practical cases, ontologies are not uniquely expressible:
one can build many different ontologies that
conceptualize the same body of knowledge.
We should be able to say which of these
ontologies better serves some predefined
criterion.
Categories of Ontology Evaluation




Those based on comparing the ontology to a
“golden standard” (which may itself be an ontology).
Those based on using the ontology in an
application and evaluating the results of it.
Those involving comparisons with a source of
data (e.g. a collection of documents) about the
domain that is to be covered by the ontology.
Those where evaluation is done by humans who
try to assess how well the ontology meets a set
of predefined criteria, standards, requirements,
etc.
Different Levels of Evaluation
Lexical, vocabulary, or Data Layer
 Hierarchy or Taxonomy
 Other Semantic relations
 Context or application level
 Syntactic Level
 Structure, Architecture, Design
 Multiple-criteria approaches

A: Lexical, Vocabulary, or Data Layer




The focus is on which concepts, instances, facts, etc. have
been included in the ontology, and the vocabulary used to
represent or identify these concepts.
Evaluation on this level tends to involve comparisons with
various sources of data concerning the problem, as well as
techniques such as string similarity measures (e.g. edit
distance).
Maedche and Staab (2002): concepts are compared to a
“golden standard” set of strings that are considered a good
representation of the concepts.
The golden standard may be:
 Another ontology
 Taken statistically from a corpus of documents
 Prepared by domain experts
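As an illustration of the string similarity idea, the sketch below computes the Levenshtein edit distance and uses it to find the closest golden-standard string for a concept name. It is a generic example and not necessarily the exact measure used by Maedche and Staab; `best_match` and the sample strings are hypothetical.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def best_match(term: str, gold: List[str]) -> Tuple[str, int]:
    """Closest golden-standard string to term (case-insensitive), with its distance."""
    return min(((g, edit_distance(term.lower(), g.lower())) for g in gold),
               key=lambda pair: pair[1])
```

A small distance between an ontology's concept names and the golden-standard strings suggests good lexical coverage; what threshold counts as a match is a design choice left open here.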
B: Hierarchy or Taxonomy


An ontology typically includes a hierarchical “is-a”
(subsumption) relation between concepts.
Brewster et al. (2004) used a data-driven
approach to evaluate the degree of structural fit
between an ontology and a corpus of documents:
 Cluster the documents to obtain topics that represent
them
 Represent each concept c of the ontology by a set of
terms including its name in the ontology and the
hypernyms of this name, taken from WordNet
 Measure how well each concept fits a topic that results from the
clustering step
 A good fit indicates that the structure of the ontology is reasonably
well aligned with the hidden structure of topics in the
domain-specific corpus of documents
C: Context Level



An ontology may be part of a larger collection of ontologies,
and may reference or be referenced by various definitions
in these other ontologies. In this case it may be important
to take this context into account when evaluating it.
Swoogle search engine uses cross-references between
semantic-web documents to define a graph and compute a
score for each ontology in a manner analogous to PageRank
used by the Google web search engine. The resulting
“ontology rank” is used by Swoogle to rank its query
results.
An important difference in comparison to PageRank is that
not all “links” or references between ontologies are treated
the same. If one ontology defines a subclass of a class from
another ontology, this reference might be considered more
important than if one ontology only uses a class from
another as the domain or range of some relation.
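The idea of treating link types differently can be sketched as a weighted variant of PageRank. This is a generic illustration under stated assumptions, not Swoogle's actual algorithm: the `LINK_WEIGHT` values, the link type names, and the `ontology_rank` function are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical link weights: a subclass reference counts more than a mere
# domain/range use. The numbers are illustrative only.
LINK_WEIGHT = {"subclass": 3.0, "uses": 1.0}


def ontology_rank(links: List[Tuple[str, str, str]],
                  damping: float = 0.85,
                  iterations: int = 50) -> Dict[str, float]:
    """Weighted PageRank over (source, target, link_type) references."""
    nodes = sorted({n for s, t, _ in links for n in (s, t)})
    out_weight = {n: 0.0 for n in nodes}
    for s, _, kind in links:
        out_weight[s] += LINK_WEIGHT[kind]

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by ontologies without outgoing links is spread evenly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0.0)
        new = {n: (1.0 - damping) / len(nodes) + damping * dangling / len(nodes)
               for n in nodes}
        # Each source passes its rank on in proportion to the link weights.
        for s, t, kind in links:
            new[t] += damping * rank[s] * LINK_WEIGHT[kind] / out_weight[s]
        rank = new
    return rank
```

With links `[("X", "A", "subclass"), ("X", "B", "uses")]`, ontology `A` ends up ranked above `B` because the subclass reference carries more of `X`'s rank.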
D: Application Level



It may be more practical to evaluate an ontology
within the context of a particular application, and
to see how the results of the application are
affected by the use of the ontology in question.
The outputs of the application, or its performance
on the given task, might be better or worse
depending partly on the ontology used in it.
One might argue that a good ontology is one
which helps the application in question produce
good results on the given task.
E: Syntactic Level
For manually constructed Ontologies.
 The ontology is usually described in a
particular formal language and must
match the syntactic requirements of that
language (use of the correct keywords,
etc.).
 This is probably the one that lends itself
the most easily to automated processing.

F: Structure, Architecture, Design
This is primarily of interest in manually
constructed ontologies.
 Assuming
that some kind of design
principles or criteria have been agreed
upon prior to constructing the ontology,
evaluation on this level means checking to
what extent the resulting ontology
matches those criteria.
 Must usually be done largely or even
entirely manually by people such as
ontological engineers and domain experts.

G: Multiple-Criteria Approaches



Selecting a good ontology from a given set of
ontologies.
Techniques familiar from the area of decision
support systems can be used to help us evaluate
the ontologies and choose one of them.
These approaches are based on defining several decision criteria or
attributes;



for each criterion, the ontology is evaluated and given a
numerical score.
A weight is assigned to each criterion.
An overall score for the ontology is then computed as a
weighted sum of its per-criterion scores.
Example Select an Ontology - Type G:
Ontology Auditor Metrics Suite
Metric             Attribute          Description
Syntactic Quality  Lawfulness         Correctness of syntax used
                   Richness           Breadth of syntax used
Semantic Quality   Interpretability   Meaningfulness of terms
                   Consistency        Consistency of meaning of terms
                   Clarity            Average number of word senses
Pragmatic Quality  Comprehensiveness  Amount of information
                   Accuracy           Accuracy of information
                   Relevance          Relevance of information for a task
Social Quality     Authority          Extent to which other ontologies rely on it
                   History            Number of times ontology has been used
Example Cont.: Overall Quality Metric

Overall quality (Q) is a weighted function of its
constituents:
Q = c1 × S + c2 × E + c3 × P + c4 × O
where
S = syntactic quality
E = semantic quality
P = pragmatic quality
O = social quality, and
c1+c2+c3+c4 = 1

The weights sum to unity, and currently, are set
by the user, the application, or else assumed
equal
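The weighted sum can be written out directly. A minimal sketch, assuming equal weights by default; the function name and the sample scores are made up for illustration.

```python
from typing import Sequence


def overall_quality(s: float, e: float, p: float, o: float,
                    weights: Sequence[float] = (0.25, 0.25, 0.25, 0.25)) -> float:
    """Q = c1*S + c2*E + c3*P + c4*O, with the weights required to sum to 1."""
    if len(weights) != 4 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("expected four weights that sum to 1")
    c1, c2, c3, c4 = weights
    return c1 * s + c2 * e + c3 * p + c4 * o
```

For example, scores of 0.8, 0.6, 0.4, and 0.2 with equal weights give Q = 0.5; shifting weight toward syntactic quality, e.g. (0.4, 0.3, 0.2, 0.1), gives Q = 0.6.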
Example Cont.: Syntactic Quality (S)

Measures the quality of the ontology
according to the way it is written.

Lawfulness
 refers to the degree to which an ontology language’s
rules have been complied with.

Richness
 refers to the proportion of features in the ontology
language that have been used in an ontology
Syntactic Quality (S)
S = b1SL + b2SR
Lawfulness (SL)
Let X be total syntactical rules. Let Xb be total breached rules. Let NS
be the number of statements in the ontology. Then SL = Xb / NS.
Richness (SR)
Let Y be the total syntactical features available in ontology language.
Let Z be the total syntactical features used in this ontology.
Then SR = Z/Y.
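The two ratios and their weighted combination can be computed as follows. This is a direct transcription of the formulas above with equal default weights assumed; the function names and the example counts are hypothetical.

```python
def lawfulness(breached_rules: int, statements: int) -> float:
    """SL = Xb / NS. As defined on the slide, this grows with the number of breaches."""
    return breached_rules / statements


def richness(features_used: int, features_available: int) -> float:
    """SR = Z / Y: share of the language's features that the ontology uses."""
    return features_used / features_available


def syntactic_quality(sl: float, sr: float,
                      b1: float = 0.5, b2: float = 0.5) -> float:
    """S = b1*SL + b2*SR."""
    return b1 * sl + b2 * sr
```

For an ontology with 2 breached rules in 100 statements that uses 10 of 40 available language features, SL = 0.02, SR = 0.25, and S = 0.135 under equal weights.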
Example Cont.: Semantic Quality (E)

Evaluates the meaning of terms in the
ontology library.

Interpretability
 refers to the meaning of terms in the ontology
Consistency
 whether terms have consistent meaning
Clarity
 whether the context of terms is clear
Semantic Quality (E)
E = b1EI + b2EC + b3EA
Interpretability (EI)
Let C be the total number of terms used to define classes and
properties in ontology. Let W be the number of terms that have a
sense listed in WordNet. Then EI = W/C.
Consistency (EC)
Let I = 0. Let C be the number of classes and properties in ontology.
Ci, if meaning in ontology is inconsistent, I+1. I = number of terms
with inconsistent meaning. Ec = I/C.
Clarity (EA)
Let Ci = name of class or property in ontology.  Ci, count Ai , (the
number of word senses for that term in WordNet). Then EA = A/C.
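A worked sketch of the three semantic attributes follows. It assumes that each class or property name counts as one term, that A is the sum of the per-term sense counts, and that sense counts have already been looked up; the numbers in `sense_counts` are invented for illustration and are not real WordNet results.

```python
from typing import Dict, Set


def semantic_attributes(sense_counts: Dict[str, int],
                        inconsistent: Set[str]) -> Dict[str, float]:
    """Compute EI, EC, and EA from per-term sense counts.

    sense_counts maps each term to its number of word senses
    (0 if the term has no sense listed); inconsistent holds the
    terms judged to have inconsistent meaning in the ontology.
    """
    c = len(sense_counts)                                  # C: number of terms
    w = sum(1 for n in sense_counts.values() if n > 0)     # W: terms with a sense
    i = sum(1 for t in sense_counts if t in inconsistent)  # I: inconsistent terms
    a = sum(sense_counts.values())                         # A: total word senses
    return {"EI": w / c, "EC": i / c, "EA": a / c}


# Hypothetical sense counts, not actual WordNet lookups.
sense_counts = {"Wine": 2, "Winery": 1, "hasMaker": 0, "Color": 8}
scores = semantic_attributes(sense_counts, inconsistent={"Color"})
```

With these made-up inputs, three of four terms have a sense (EI = 0.75), one term is inconsistent (EC = 0.25), and the average number of senses per term is EA = 2.75.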
Example Cont.: Pragmatic Quality (P)

Refers to ontology’s usefulness for users or their
agents, irrespective of syntax or semantics.

Accuracy
 whether the claims an ontology makes are ‘true.’
Comprehensiveness
 measure of the size of the ontology.
Relevance
 whether ontology satisfies the agent’s specific requirements.
Pragmatic Quality (P)
P = b1PO + b2PU + b3PR
Comprehensiveness (PO)
Let C be the total number of classes and properties in ontology. Let V
be the average value for C across entire library. Then PO = C/V.
Accuracy (PU)
Relevance (PR)
Let NS be the number of statements in ontology. Let F be the number
of false statements. PU = F/NS. Requires evaluation by domain expert
and/or truth maintenance system.
Let NS be the number of statements in the ontology. Let S be the type
of syntax relevant to agent. Let R be the number of statements within
NS that use S. PR = R / NS.
Example Cont.: Social Quality (O)

Reflects that agents and ontologies exist
in communities.

Authority


number of other ontologies that link to it
History

number of times the ontology is accessed
Social Quality (O)
O = b1OT + b2OH
Authority (OT)
Let an ontology in the library be OA. Let the set of other ontologies
in the library be L. Let the total number of links from ontologies in
L to OA be K. Let the average value for K across ontology library be
V. Then OT = K/V.
History (OH)
Let the total number of accesses to an ontology be A. Let the
average value for A across ontology library be H. Then OH = A/H.
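Both social attributes normalize a per-ontology count by the library average. A minimal sketch, assuming the averages are taken over every ontology in the library (including the one being scored) and are nonzero; the function name, the equal default weights, and the sample library are hypothetical.

```python
from statistics import mean
from typing import Dict


def social_quality(name: str,
                   links_in: Dict[str, int],
                   accesses: Dict[str, int],
                   b1: float = 0.5, b2: float = 0.5) -> float:
    """O = b1*OT + b2*OH for one ontology in a library.

    links_in maps each ontology to the number of links it receives (K);
    accesses maps each ontology to its access count (A).
    """
    v = mean(links_in.values())   # V: average incoming links in the library
    h = mean(accesses.values())   # H: average accesses in the library
    ot = links_in[name] / v       # Authority
    oh = accesses[name] / h       # History
    return b1 * ot + b2 * oh


# Hypothetical library statistics.
links_in = {"OA": 6, "OB": 2, "OC": 1}
accesses = {"OA": 50, "OB": 100, "OC": 150}
```

Here `OA` receives twice the average number of links (OT = 2.0) but half the average number of accesses (OH = 0.5), so `social_quality("OA", links_in, accesses)` is 1.25 under equal weights.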
The End