Ontology development and evaluation Dr. Alexandra I. Cristea

Why develop an ontology?
Shared understanding of information structure
(between people or agents)
Enable reuse of domain knowledge
Make domain assumptions explicit
Separate domain knowledge from operational knowledge
Analyse the domain knowledge
Ontology Development 101: A Guide to Creating Your First Ontology
2
What is an ontology? (reminder)
A way of encoding domain knowledge, linking the
knowledge, which allows for reasoning with the data
Ontologies allow for data integration and inference, for
automated query-answering and automated use of data
formal explicit description of concepts in a domain of
discourse (classes/concepts), concept properties
(slots/roles/ properties), and restrictions on
slots (facets /role restrictions)
3
Developing an ontology in practice (reminder)
defining classes
arranging classes in hierarchy (sub/superclass)
defining properties/slots & their allowed values,
filling in values for slots for instances
4
Rules in ontology design: a simple
knowledge engineering methodology
1. There is no one correct way to model a domain— there are viable
alternatives. Best solution depends on application and extensions.
2. Ontology development is an iterative process.
3. Concepts should be close to objects (physical/logical) and
relationships in domain of interest.
– E.g., nouns (objects) or verbs (relationships) in domain describing sentences.
5
Ontology design steps
1. Determine the domain and scope of the ontology
2. Consider reusing existing ontologies
3. Enumerate important terms in the ontology
4. Define the classes and the class hierarchy
5. Define the properties of classes (slots)
6. Define the facets of the slots
7. Create instances
6
1. Domain and scope: general questions
What is the domain that the ontology will cover?
– E.g., wine and food
What are we going to use the ontology for?
– E.g., for applications that suggest good combinations of wines and food
What types of questions should the information in the ontology
provide answers to?
– See competency questions
Who will use and maintain the ontology?
– E.g., users could be restaurant customers deciding which wine to order, so
price information is necessary
– Maintainers may use a different language, so a mapping is necessary
8
1. Domain and scope: competency questions
a list of questions that a knowledge base based on the
ontology should be able to answer
These questions will serve as the litmus test later:
– Does the ontology contain enough information to
answer these types of questions?
– Do the answers require a particular level of detail or
representation of a particular area?
9
1. Example competency questions
Which wine characteristics should I consider when choosing a wine?
Is Bordeaux a red or white wine?
Does Cabernet Sauvignon go well with seafood?
What is the best choice of wine for grilled meat?
Which characteristics of a wine affect its appropriateness for a dish?
Does a bouquet or body of a specific wine change with vintage year?
What were good vintages for Napa Zinfandel?
10
2. Reusing existing ontologies
may be a requirement if the system needs to interact with
other applications
many ontologies are already available
may require translation from the formalism in which they are expressed
Libraries: SWOOGLE; DAML Library; Ontolingua
Consider also domain specific lists, classifications
– e.g. lists of wine properties from commercial websites
11
3. Important terms
write down a list of all terms we would like either
to make statements about or to explain to a user
– What are the terms we would like to talk about?
– What properties do those terms have?
– What would we like to say about those terms?
12
4. Classes and hierarchy:
approaches
top-down development process: starts with the definition of the most
general concepts in the domain, followed by specialization of the concepts.
bottom-up development process: starts with the definition of the most
specific classes (the leaves of the hierarchy), with subsequent grouping of
these classes into more general concepts.
combination development process: combines the top-down and
bottom-up approaches: define the more salient concepts first and then
generalize and specialize them appropriately.
13
4. Classes and hierarchy:
approach examples
top-down
– start with general concepts Wine, Food; specialize Wine into WhiteWine,
RedWine, RoséWine etc.
bottom-up
– start with leaves of hierarchy: Pauillac and Margaux wines; then, the common
superclass: Medoc, which is a subclass of Bordeaux.
combination
– Might start with top-level concepts Wine, then specific concepts Margaux,
then mid-concepts Medoc, etc.
14
5. Properties (slots)
Classes alone won’t answer all
competency questions (Step 1).
Once we have defined some classes, we must describe the
internal structure of concepts.
After selecting classes from the list of terms (Step 3),
remaining terms are likely to be properties of classes.
For each property, we must determine which class it
describes. These become slots attached to classes.
15
5. Object property types
“intrinsic” properties (e.g., flavour of a wine)
“extrinsic” properties (e.g., wine’s name, and
provenance area)
parts, for structured object (e.g., courses of a meal)
relationships to other individuals (e.g., maker of
wine, relationship between wine and winery)
16
6. Facets of properties/slots
value type (string, number, Boolean, enumerated, instance)
allowed values (e.g., name of wine is a String)
number of the values (cardinality: exact, min, max)
Domain: classes to which slot is attached (I)
Range: allowed classes for slots of type instance (O)
17
6. Domain and Range of properties: design issues
If a list of classes defining a range or a domain of a slot
includes a class and its subclass, remove the subclass.
If a list of classes defining a range or a domain of a slot
contains all subclasses of a class A, but not class A, the
range should contain only class A and not the subclasses.
If a list of classes defining a range or a domain of a slot
contains all but a few subclasses of a class A, consider if
class A makes a more appropriate range definition.
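The first two rules above are mechanical enough to sketch in code. The following is a minimal illustration, assuming the hierarchy is available as a plain dictionary from each class to its direct subclasses and contains no cycles; the function name `simplify_class_list` and the sample hierarchy are hypothetical. The third rule is a judgment call and is not automated here.

```python
def simplify_class_list(classes, direct_subclasses):
    """Apply the first two domain/range rules to a set of class names.

    direct_subclasses maps a class name to its direct subclasses
    (assumed acyclic).
    """
    def descendants(cls):
        out, stack = set(), list(direct_subclasses.get(cls, ()))
        while stack:
            c = stack.pop()
            if c not in out:
                out.add(c)
                stack.extend(direct_subclasses.get(c, ()))
        return out

    result = set(classes)
    changed = True
    while changed:
        changed = False
        # Rule 2: every direct subclass of A is listed but A is not -> add A.
        for parent, children in direct_subclasses.items():
            if children and set(children) <= result and parent not in result:
                result.add(parent)
                changed = True
        # Rule 1: drop any class that has an ancestor in the list.
        redundant = {c for p in result for c in descendants(p) if c in result}
        if redundant:
            result -= redundant
            changed = True
    return result


# Hypothetical hierarchy for illustration only.
subs = {"Wine": ["RedWine", "WhiteWine", "RoseWine"], "RedWine": ["Merlot"]}
print(simplify_class_list({"RedWine", "WhiteWine", "RoseWine"}, subs))
```

With the sample hierarchy, a range listing all three colour subclasses collapses to the single class Wine, while a range listing only two of them is left unchanged.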
18
7. Create instances
1) choosing a class
– E.g., class BeaujolaisWine
2) creating an individual instance of that class
– E.g., Chateau-Morgon-Beaujolais
3) filling in the property (slot) values
– E.g., hasBody Light; hasColor Red; hasFlavour
Delicate, etc.
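The three steps above, combined with the facets from Step 6, can be sketched as follows. This is only an illustration under stated assumptions: the `Facet` structure, the `create_instance` helper, and the enumerated allowed values other than Light, Red and Delicate are hypothetical and not taken from any particular ontology tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Facet:
    allowed: frozenset      # enumerated allowed values (assumed for illustration)
    min_card: int = 0       # minimum number of values
    max_card: int = 1       # maximum number of values


# Hypothetical slot definitions attached to the wine classes.
WINE_SLOTS = {
    "hasBody": Facet(frozenset({"Light", "Medium", "Full"}), 1, 1),
    "hasColor": Facet(frozenset({"Red", "White", "Rose"}), 1, 1),
    "hasFlavour": Facet(frozenset({"Delicate", "Moderate", "Strong"}), 0, 1),
}


def create_instance(name, cls, slots, values):
    """Return an instance record, or raise ValueError if a facet is violated."""
    for slot in values:
        if slot not in slots:
            raise ValueError(f"{slot} is not a slot of {cls}")
    for slot, facet in slots.items():
        vals = values.get(slot, [])
        if not facet.min_card <= len(vals) <= facet.max_card:
            raise ValueError(f"{slot}: cardinality {len(vals)} not allowed")
        bad = [v for v in vals if v not in facet.allowed]
        if bad:
            raise ValueError(f"{slot}: values {bad} not allowed")
    return {"name": name, "class": cls, "slots": dict(values)}


# 1) choose a class, 2) create an individual, 3) fill in slot values.
morgon = create_instance(
    "Chateau-Morgon-Beaujolais",
    "BeaujolaisWine",
    WINE_SLOTS,
    {"hasBody": ["Light"], "hasColor": ["Red"], "hasFlavour": ["Delicate"]},
)
```

Checking facets at creation time means an instance with a value outside the allowed set, or with a missing required slot, is rejected before it enters the knowledge base.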
19
Error detection in building
ontologies
20
4. Classes and class hierarchy: correction
Is-a: hierarchical relations:
– A subclass of a class represents a concept that is a “kind of” the
concept that the superclass represents.
– E.g., Chardonnay is a subclass of WhiteWine
– Common mistake: to include both singular and plural version of
same concept in hierarchy, making former subclass of latter. (e.g.,
class Wines and a class Wine as a subclass is an error).
21
4. Classes and class hierarchy: correction
Transitivity of hierarchical relations
– If B is a subclass of A and C is a subclass of B, then C is a subclass
of A
– E.g., class Wine, and then define a class WhiteWine as a subclass
of Wine. Then we define a class Chardonnay as a subclass of
WhiteWine.
– Conclusion: So, Chardonnay will be also a subclass of Wine.
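The transitivity argument can be checked with a few lines of code. This is a minimal sketch, assuming the direct subclass-of assertions are stored in a dictionary; the names `DIRECT_SUPERCLASSES`, `all_superclasses` and `is_subclass_of` are hypothetical.

```python
# Direct subclass-of assertions from the example (child -> direct parents).
DIRECT_SUPERCLASSES = {
    "Chardonnay": {"WhiteWine"},
    "WhiteWine": {"Wine"},
    "Wine": set(),
}


def all_superclasses(cls, direct=DIRECT_SUPERCLASSES):
    """Return every class reachable from cls by following subclass-of links."""
    found = set()
    stack = list(direct.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(direct.get(parent, ()))
    return found


def is_subclass_of(child, ancestor, direct=DIRECT_SUPERCLASSES):
    """True if ancestor is a direct or inferred superclass of child."""
    return ancestor in all_superclasses(child, direct)


print(is_subclass_of("Chardonnay", "Wine"))
```

Only the two direct links are stored; the relation between Chardonnay and Wine is inferred by following them.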
22
4. Classes and class hierarchy: correction
Evolution of a class hierarchy
– a subclass now may become a subclass of another class later
– E.g., Zinfandel wines were originally all red, so Zinfandel was a
subclass of the RedWine class. Now there are also rosé Zinfandels.
– Conclusion: We need to break the Zinfandel class into two
classes, WhiteZinfandel and RedZinfandel, and classify them
as subclasses of RoseWine and RedWine, respectively.
23
4. Classes and class hierarchy: correction
Classes as their names:
– Classes represent concepts in the domain and
not the words that denote these concepts.
• e.g., renaming the class Shrimp to Prawn does not change the concept it represents
– Synonyms for the same concept do not represent different classes.
– Common mistake: creating a separate class for each synonym, e.g., both Shrimp and Prawn (avoid if possible)
24
4. Classes and class hierarchy: correction
Avoiding class cycles:
– there is a cycle in a hierarchy when some class A has a subclass B
and at the same time B is a superclass of A.
– Correction: Creating such a cycle in a hierarchy amounts to
declaring that the classes A and B are equivalent: all instances of A
are instances of B and all instances of B are also instances of A.
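A cycle of this kind can be found with a depth-first search over the subclass-of links. The sketch below assumes the hierarchy is given as a dictionary from each class to its direct superclasses; the function name `find_cycle` is hypothetical.

```python
def find_cycle(direct_superclasses):
    """Return a list of classes forming a subclass-of cycle, or None."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    colour = {c: WHITE for c in direct_superclasses}
    path = []

    def visit(cls):
        colour[cls] = GREY
        path.append(cls)
        for parent in direct_superclasses.get(cls, ()):
            state = colour.get(parent, WHITE)
            if state == GREY:
                # parent is already on the current path: cycle found.
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                cycle = visit(parent)
                if cycle:
                    return cycle
        colour[cls] = BLACK
        path.pop()
        return None

    for cls in list(direct_superclasses):
        if colour[cls] == WHITE:
            cycle = visit(cls)
            if cycle:
                return cycle
    return None


print(find_cycle({"A": ["B"], "B": ["A"]}))
```

The same search also catches a class declared as its own subclass, since the class is then found on its own search path.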
25
4. Classes: corrections
Siblings in a class hierarchy:
– All the siblings in the hierarchy (except for the ones at the root)
must be at the same level of generality.
– Common mistake: White wine and Chardonnay being
subclasses of the same class (should be avoided)
26
4. Classes: corrections
How many is too many and how few is too few?
– If a class has only one direct subclass there may be a modelling
problem, or the ontology is not complete.
• E.g., an only subclass CotesD’Or for RedBurgundy may be an error.
– If there are more than a dozen (12) subclasses for a given class
then additional intermediate categories may be necessary.
• E.g., a long list of wines under the class Wine, without any sub-hierarchy,
may be an error.
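Both rules of thumb can be turned into a simple lint over the hierarchy. This is a sketch only: the function name `fan_out_warnings` is hypothetical, and a warning marks a place to review, not a definite error.

```python
def fan_out_warnings(direct_subclasses, max_children=12):
    """Flag classes with exactly one or more than max_children direct subclasses."""
    warnings = []
    for parent, children in sorted(direct_subclasses.items()):
        n = len(children)
        if n == 1:
            warnings.append((parent, "only one direct subclass"))
        elif n > max_children:
            warnings.append((parent, "more than %d direct subclasses" % max_children))
    return warnings


# Hypothetical hierarchy: 13 wines listed directly under Wine.
hierarchy = {
    "RedBurgundy": ["CotesDOr"],
    "Wine": ["Wine%d" % i for i in range(13)],
}
for cls, message in fan_out_warnings(hierarchy):
    print(cls, "-", message)
```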
27
4. Classes: other issues
Multiple inheritance
– These are allowed in most systems –
but beware that all slots and facets are
also inherited
– E.g., Port is a RedWine as well as a DessertWine.
Conclusion: So it will inherit:
• hasSugarLevel Sweet from DessertWine
• and containsTannin from RedWine.
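Python's own multiple inheritance gives a quick way to see the effect. The sketch below is an analogy, not how an ontology editor stores slots: the `slots` class attribute and the `inherited_slots` helper are hypothetical, and representing containsTannin as a boolean is an assumption.

```python
class Wine:
    slots = {}


class RedWine(Wine):
    slots = {"containsTannin": True}


class DessertWine(Wine):
    slots = {"hasSugarLevel": "Sweet"}


class Port(RedWine, DessertWine):
    slots = {}  # Port declares nothing itself; everything is inherited


def inherited_slots(cls):
    """Merge slot definitions along the method resolution order, most general first."""
    merged = {}
    for klass in reversed(cls.__mro__):
        merged.update(getattr(klass, "slots", {}))
    return merged


print(inherited_slots(Port))
```

Port ends up with slots from both parents, which is exactly why every slot and facet of each superclass should be checked before adding a second parent.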
28
4. Classes: corrections
New class? (or not)
– Subclasses of a class usually:
1. have additional properties that the superclass does not
have, or
2. different restrictions from those of the superclass, or
3. participate in different relationships than the superclasses
– However: Classes in terminological hierarchies do not have to
introduce new properties
29
4. Classes: corrections
New class or property(slot) value?
– If the concepts with different slot values become restrictions for
different slots in other classes, then we should create a new class
for the distinction. Otherwise, we represent the distinction in a
slot value. (e.g., RedMerlot, WhiteMerlot if they have different
properties)
– If a distinction is important in the domain and we think of the
objects with different values for the distinction as different kinds
of objects, then we should create a new class for the distinction.
– A class to which an individual instance belongs should not change
often. (e.g., chilledWine should be a property, not a class)
30
4. Classes: corrections
New class or instance?
– Individual instances are the most specific concepts represented
in a knowledge base. (e.g., individual wine bottles)
– If concepts form a natural hierarchy, then we should represent
them as classes. (e.g., BourgogneRegion is a class, as it has
subclasses or instances such as CotesD’Or)
– However: Abstract classes have no instances.
31
4. Classes: corrections
Limiting scope
– The ontology should not contain all the possible information
about the domain: you do not need to specialize (or generalize)
more than you need for your application (at most one extra level
each way). (e.g., the ontology has no FavouriteWine term)
Disjoint subclasses
– Many systems allow us to specify explicitly that several classes
are disjoint.
– Common mistake: defining a class as a subclass of two disjoint
classes; no such class should exist!
32
5. Defining properties
Inverse properties/slots/relations
– Storing the information “in both directions” is redundant.
However, it can be convenient to have both pieces of information
explicitly available. A knowledge-acquisition system could
automatically fill in value for inverse relation for consistency of
knowledge base.
– e.g., hasMaker (wine to winery) versus produces (winery to wine)
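The automatic fill-in of an inverse relation could look roughly like this. It is a minimal sketch, assuming facts are stored as (subject, slot, value) triples and that hasMaker and produces form the inverse pair; the `KnowledgeBase` class and the individual names are hypothetical.

```python
# Assumed inverse pair: a wine hasMaker a winery; a winery produces a wine.
INVERSE = {"hasMaker": "produces", "produces": "hasMaker"}


class KnowledgeBase:
    def __init__(self):
        self.facts = set()  # (subject, slot, value) triples

    def add(self, subject, slot, value):
        """Store a fact and, if the slot has an inverse, the mirrored fact."""
        self.facts.add((subject, slot, value))
        inverse = INVERSE.get(slot)
        if inverse is not None:
            self.facts.add((value, inverse, subject))


kb = KnowledgeBase()
kb.add("ChateauMorgonBeaujolais", "hasMaker", "ChateauMorgon")
print(sorted(kb.facts))
```

The user enters only one direction; the other is derived, so the two can never disagree.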
33
5. Defining properties
Default property (slot) values
– If a particular slot value is the same for most instances of a class,
we can define it as the default value for the slot.
When a new instance of a class containing this slot is created, the
system fills in the default value automatically. It can then be
changed to any other value allowed.
– e.g., hasSugarLevel Sweet as the default for DessertWine
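A default is filled in at creation time and remains overridable, which the following sketch illustrates. The `DEFAULTS` table, the `new_instance` helper and the instance names are hypothetical.

```python
# Hypothetical per-class default slot values.
DEFAULTS = {"DessertWine": {"hasSugarLevel": "Sweet"}}


def new_instance(name, cls, **overrides):
    """Create an instance, starting from the class defaults."""
    slots = dict(DEFAULTS.get(cls, {}))
    slots.update(overrides)  # explicit values replace the defaults
    return {"name": name, "class": cls, "slots": slots}


usual = new_instance("SomeSauternes", "DessertWine")
unusual = new_instance("SomeOtherWine", "DessertWine", hasSugarLevel="OffDry")
print(usual["slots"], unusual["slots"])
```

Unlike a facet restricting the allowed values, a default does not constrain the slot: the second instance is still valid.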
34
Previously: ontology building and error
correction
Next: Names; formal criteria to evaluate
ontologies
35
What’s in a name?
Define a naming convention for classes and slots and adhere to it.
Capitalisation and delimiters (space, underscore, dash)
– use consistent capitalisation for concept names;
– use lower case for property names
Singular or plural (a class Wine actually represents all wines)
Prefix and suffix conventions
– Prefix has, suffix Of: e.g., hasMaker; makerOf
Other naming conventions
– Do not add “class”, “property” or “slot” to concept names.
– Avoid abbreviations.
– Use similar conventions for sibling subclasses (e.g., not Red alongside
WhiteWine)
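A naming convention is easy to enforce once it is written down as patterns. The sketch below assumes one possible convention, consistent with the examples in these slides (class names such as WhiteWine, slot names such as hasMaker, ASCII only); the patterns and the `naming_problems` function are hypothetical, and the banned-word check is deliberately crude.

```python
import re

CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")   # e.g. WhiteWine
SLOT_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")    # e.g. hasMaker
BANNED_WORDS = ("class", "property", "slot")


def naming_problems(class_names, slot_names):
    """Return (name, problem) pairs for names that break the convention."""
    problems = []
    for name in class_names:
        if not CLASS_NAME.match(name):
            problems.append((name, "class name should start upper case, no delimiters"))
        # Crude substring test: would also flag a name such as Classic.
        if any(word in name.lower() for word in BANNED_WORDS):
            problems.append((name, "class name should not mention class/property/slot"))
    for name in slot_names:
        if not SLOT_NAME.match(name):
            problems.append((name, "slot name should start lower case, no delimiters"))
    return problems


for name, problem in naming_problems(["WhiteWine", "wine_class"], ["hasMaker", "MakerOf"]):
    print(name, "-", problem)
```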
36
(Formal) criteria to evaluate
ontologies
37
Evaluation criteria
Consistency
Completeness
Conciseness
Expandability
Sensitiveness
38
Consistency
Whether it is possible to obtain contradictory
conclusions from valid input definitions
A definition is consistent if and only if all 3 conditions hold:
– There is no contradiction between the formal definition
and the real world
– There is no contradiction between the informal
definition and the real world
– The formal and informal definitions have the same meaning
(internal consistency)
39
Example: internal consistency &
individual inconsistency
Informal: “The days in the week are: house,
Tuesday, Wednesday, Thursday, Friday, Saturday,
Sunday”
Formal: <owl:Class rdf:ID="Week"/>
<Week rdf:ID="house"/>
<Week rdf:ID="Tuesday"/> …
<Week rdf:ID="Sunday"/>
40
Example: internal consistency &
individual inconsistency
Informal: “The days in the week are: house,
Tuesday, Wednesday, Thursday, Friday, Saturday,
Sunday”
Formal: <owl:Class rdf:ID="Week"/>
<Week rdf:ID="house"/>
<Week rdf:ID="Tuesday"/> …
<Week rdf:ID="Sunday"/>
Internal consistency: informal and formal definitions have the
same meaning
41
Example: internal consistency &
individual inconsistency
Informal: “The days in the week are: house,
Tuesday, Wednesday, Thursday, Friday, Saturday,
Sunday”
Formal: <owl:Class rdf:ID="Week"/>
<Week rdf:ID="house"/>
<Week rdf:ID="Tuesday"/> …
<Week rdf:ID="Sunday"/>
Individual inconsistency: contradiction between real world and
formal/informal definition
42
Completeness
An ontology is complete if and only if:
– All that is supposed to be in the ontology is explicitly
stated in it, or can be inferred.
– Each definition is complete.
To prove incompleteness, we prove the
incompleteness of an individual definition
43
Conciseness
An ontology is concise if:
– It doesn’t store any unnecessary definitions;
– Explicit redundancies between definitions of
terms don’t exist;
– Redundancies cannot be inferred.
44
Expandability
The effort required to add new definitions
to an ontology (or more knowledge to its
definitions) without altering the set of
well-defined properties that are already
guaranteed.
45
Sensitiveness
How small changes in a definition alter the
set of well-defined properties that are already
guaranteed.
46
47
Semantic Inconsistency errors
Incorrect semantic classification:
– classifying a concept as a subclass of a concept
to which it does not belong
• e.g., Dog as subclass of House
– classifying an instance under the wrong class
• e.g., Pluto under House
48
Inconsistency: circularity errors
A class is defined as a specialisation or
generalisation of itself.
Distance: 0 (a class defined directly in terms of itself), 1, …, n
(a cycle through one or more other classes)
49
Inconsistency: circularity errors
50
Inconsistency: partition errors
Partitions: define concept classifications in a
disjoint and/or complete manner.
Mistakes:
– Subclass partition with common classes
– Subclass partition with common instances
– Exhaustive subclass partition with common classes
– Exhaustive subclass partition with common instances
– Exhaustive subclass partition with external instances
51
Subclass partition with common classes
When a class is a subclass of more than one class in the partition
– e.g., Dog and Cat as subclasses of Mammal, and Doberman as a
subclass of both classes
52
Subclass partition with common instances
When one or several instances belong to more than one subclass in
the partition
– e.g., Dog and Cat as subclasses of Mammal, and Pluto as an instance
of both classes
53
Exhaustive subclass partition with common classes
E.g., Odd and Even as an exhaustive subclass partition of Number,
with a class Prime defined as a subclass of both classes.
54
Exhaustive subclass partition with common instances
When one or several instances belong to more than one subclass of the
exhaustive partition
– e.g., Odd and Even as an exhaustive subclass partition of Number, with
the number Three as an instance of both classes
55
Exhaustive subclass partition with external instances
When an exhaustive subclass partition of the base class has been
defined, but there is an instance of the base class that doesn’t
belong to any class in the partition
– e.g., Odd and Even as an exhaustive subclass partition of
Number, where the instance Three of Number doesn’t belong to
either of the two classes
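The instance-level partition errors can be detected by comparing the instance sets of the classes involved. The sketch below assumes each class name maps to a set of its instances and that the base class lists every instance of the domain; the function name `partition_errors` and the sample data are hypothetical.

```python
def partition_errors(base, parts, instances_of, exhaustive=False):
    """Report instances that violate a disjoint (optionally exhaustive) partition.

    instances_of maps a class name to the set of its instances; the base
    class is assumed to list every instance, including those of its parts.
    """
    errors = []
    seen = {}
    for part in parts:
        for inst in instances_of.get(part, set()):
            seen.setdefault(inst, []).append(part)
    # Common instances: one individual in more than one part.
    for inst, owners in sorted(seen.items()):
        if len(owners) > 1:
            errors.append(("common instance", inst, tuple(owners)))
    # External instances: only an error if the partition claims to be exhaustive.
    if exhaustive:
        for inst in sorted(instances_of.get(base, set()) - set(seen)):
            errors.append(("external instance", inst, ()))
    return errors


# Deliberately faulty data: Three is in both parts, Four is in neither.
instances = {"Number": {"Three", "Four"}, "Odd": {"Three"}, "Even": {"Three"}}
for error in partition_errors("Number", ["Odd", "Even"], instances, exhaustive=True):
    print(error)
```

The class-level variants (a common subclass such as Doberman or Prime) would need the subclass hierarchy as well and are not covered by this sketch.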
56
Detecting Incompleteness
Check completeness of class hierarchy
(impreciseness or over-specification)
Check completeness of the domains and ranges of
properties (impreciseness or over-specification)
Check completeness of classes (properties missing,
different classes with same definition, etc.)
57
Example incomplete concept
classification
Class MusicalInstruments is defined
considering only the sub-classes
StringInstruments and WindInstruments
(overlooking, e.g. percussion instruments)
58
Partition errors
When a definition of a partition between a set of
classes is omitted.
Subclass partition omission: e.g., defining Dog and Cat as subclasses
of Mammal, but omitting that they form a subclass
partition of Mammal (disjoint classes)
Exhaustive subclass partition omission: defining a
partition of a class and omitting the fact that it is
exhaustive (e.g., Odd and Even as subclasses of Number,
without specifying exhaustiveness)
59
Detecting Redundancy
Grammatical redundancy errors: where there is
more than one explicit definition of a hierarchical
relation
– Redundancies of subclass-of relations: the same subclass-of relation
is stated more than once, directly or indirectly
– Redundancies of instance-of relations: direct/indirect
repetition (indirect: an instance declared both for a class and for
its subclass)
Identical formal definition of classes
Identical formal definition of instances
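Indirect repetition of subclass-of relations can be found by checking whether a stated link is already implied by a longer path. This is a minimal sketch, assuming the hierarchy is a dictionary from each class to its directly stated superclasses; the function name `redundant_subclass_links` is hypothetical.

```python
def redundant_subclass_links(direct_superclasses):
    """Return (child, parent) links already implied by a longer subclass-of path."""
    def ancestors(cls):
        out, stack = set(), list(direct_superclasses.get(cls, ()))
        while stack:
            c = stack.pop()
            if c not in out:
                out.add(c)
                stack.extend(direct_superclasses.get(c, ()))
        return out

    redundant = set()
    for child, parents in direct_superclasses.items():
        for parent in parents:
            others = set(parents) - {parent}
            # The link is redundant if another stated parent already leads to it.
            if any(parent in ancestors(other) for other in others):
                redundant.add((child, parent))
    return redundant


stated = {"Chardonnay": {"WhiteWine", "Wine"}, "WhiteWine": {"Wine"}}
print(redundant_subclass_links(stated))
```

Here the direct link from Chardonnay to Wine is reported, because it already follows from Chardonnay being a WhiteWine.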
60
Conclusions
Ontology-development methodology for
declarative frame-based systems (see, e.g., Generic UM
techniques)
complex issues of defining class hierarchies and
properties of classes and instances
there is no single correct ontology for any domain
However: this does not mean that errors cannot be
detected!
61