Ontology development and evaluation
Dr. Alexandra I. Cristea

Why develop an ontology?
– To share an understanding of the structure of information (between people or software agents)
– To enable reuse of domain knowledge
– To make domain assumptions explicit
– To separate domain knowledge from operational knowledge
– To analyse domain knowledge
(Ontology Development 101: A Guide to Creating Your First Ontology)

What is an ontology? (reminder)
A way of encoding and linking domain knowledge that allows reasoning with the data.
Ontologies enable data integration and inference, automated query answering and automated use of data.
An ontology is a formal, explicit description of:
– concepts in a domain of discourse (classes/concepts),
– properties of each concept (slots/roles/properties), and
– restrictions on slots (facets/role restrictions).

Developing an ontology in practice (reminder)
– defining classes;
– arranging the classes in a hierarchy (subclass/superclass);
– defining properties (slots) and their allowed values;
– filling in slot values for instances.

Rules in ontology design: a simple knowledge-engineering methodology
1. There is no one correct way to model a domain: there are viable alternatives. The best solution depends on the intended application and its anticipated extensions.
2. Ontology development is an iterative process.
3. Concepts should be close to the objects (physical or logical) and relationships in the domain of interest.
– E.g., nouns (objects) or verbs (relationships) in sentences that describe the domain.

Ontology design steps
1. Determine the domain and scope of the ontology
2. Consider reusing existing ontologies
3. Enumerate important terms in the ontology
4. Define the classes and the class hierarchy
5. Define the properties of classes (slots)
6. Define the facets of the slots
7. Create instances
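As a rough illustration of steps 4 to 7, the sketch below models classes as frames with a superclass link and typed slots, lets subclasses inherit slots transitively, and fills in slot values for one instance. It is a minimal Python sketch under stated assumptions, not an ontology editor or an OWL API: the `Frame` and `Instance` types, the `fill_slot` helper and the Wine/RedWine/BeaujolaisWine hierarchy are hypothetical; the instance and its slot values follow the wine example used in these slides.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Frame:
    """A class (frame) with an optional direct superclass and its own slots."""
    name: str
    superclass: Optional[Frame] = None
    slots: dict[str, type] = field(default_factory=dict)  # slot name -> value type

    def ancestors(self) -> list[Frame]:
        """All superclasses, nearest first: is-a is transitive."""
        result: list[Frame] = []
        current = self.superclass
        while current is not None:
            result.append(current)
            current = current.superclass
        return result

    def all_slots(self) -> dict[str, type]:
        """Slots declared locally plus those inherited from every superclass."""
        merged: dict[str, type] = {}
        for frame in reversed([self] + self.ancestors()):
            merged.update(frame.slots)  # more specific classes override
        return merged


@dataclass
class Instance:
    """An individual of one class, holding its slot values."""
    name: str
    of_class: Frame
    values: dict[str, object] = field(default_factory=dict)

    def fill_slot(self, slot: str, value: object) -> None:
        """Fill in a slot value after checking the slot exists and the value type."""
        slots = self.of_class.all_slots()
        if slot not in slots:
            raise KeyError(f"{self.of_class.name} has no slot {slot!r}")
        if not isinstance(value, slots[slot]):
            raise TypeError(f"{slot} expects a {slots[slot].__name__}")
        self.values[slot] = value


# Steps 4-7 on a fragment of the wine example (hypothetical hierarchy and slots).
wine = Frame("Wine", slots={"hasColor": str, "hasBody": str, "hasFlavour": str})
red_wine = Frame("RedWine", superclass=wine)
beaujolais = Frame("BeaujolaisWine", superclass=red_wine)

morgon = Instance("Chateau-Morgon-Beaujolais", beaujolais)
morgon.fill_slot("hasBody", "Light")
morgon.fill_slot("hasColor", "Red")
morgon.fill_slot("hasFlavour", "Delicate")
```

Single inheritance keeps the sketch short; supporting multiple inheritance would need a list of superclasses and a rule for slots inherited from more than one parent.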
1. Domain and scope: general questions
What is the domain that the ontology will cover?
– E.g., wine and food
What are we going to use the ontology for?
– E.g., for applications that suggest good combinations of wines and food
For what types of questions should the information in the ontology provide answers?
– See competency questions
Who will use and maintain the ontology?
– E.g., users could be restaurant customers deciding which wine to order, so price information is necessary
– Maintainers may describe the domain in a different language from the users, so a mapping between the two is necessary

1. Domain and scope: competency questions
A list of questions that a knowledge base based on the ontology should be able to answer.
These questions will serve as the litmus test later:
– Does the ontology contain enough information to answer these types of questions?
– Do the answers require a particular level of detail or representation of a particular area?

1. Example competency questions
– Which wine characteristics should I consider when choosing a wine?
– Is Bordeaux a red or white wine?
– Does Cabernet Sauvignon go well with seafood?
– What is the best choice of wine for grilled meat?
– Which characteristics of a wine affect its appropriateness for a dish?
– Does the bouquet or body of a specific wine change with vintage year?
– What were good vintages for Napa Zinfandel?

2. Reusing existing ontologies
– May be a requirement if the system needs to interact with other applications
– Many ontologies are already available
– May require translation of the formalism used
– Libraries: SWOOGLE; DAML Library; Ontolingua
– Consider also domain-specific lists and classifications, e.g., lists of wine properties from commercial websites

3. Important terms
Write down a list of all terms we would like either to make statements about or to explain to a user:
– What are the terms we would like to talk about?
– What properties do those terms have?
– What would we like to say about those terms?
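Competency questions are most useful when they are rechecked as the ontology evolves. One coarse way to automate part of the litmus test is to record, for each question, which slots an answer would rely on, and report the questions the current draft cannot yet support. This is a minimal Python sketch under stated assumptions: the question-to-slot mapping and the slot names `madeFromGrape`, `goesWellWith`, `hasVintageYear` and `hasQuality` are hypothetical, and the presence of a slot does not by itself prove that a question is answerable.

```python
from __future__ import annotations

# Hypothetical mapping: competency question -> slots an answer would rely on.
COMPETENCY_QUESTIONS: dict[str, set[str]] = {
    "Is Bordeaux a red or white wine?": {"hasColor"},
    "Does Cabernet Sauvignon go well with seafood?": {"madeFromGrape", "goesWellWith"},
    "What were good vintages for Napa Zinfandel?": {"hasVintageYear", "hasQuality"},
}


def unanswerable(questions: dict[str, set[str]], ontology_slots: set[str]) -> dict[str, list[str]]:
    """Map each question that lacks required slots to those missing slots (sorted)."""
    report: dict[str, list[str]] = {}
    for question, needed in questions.items():
        missing = needed - ontology_slots
        if missing:
            report[question] = sorted(missing)
    return report


# Slots in a hypothetical first draft of the wine ontology.
draft_slots = {"hasColor", "hasBody", "hasFlavour", "madeFromGrape"}
gaps = unanswerable(COMPETENCY_QUESTIONS, draft_slots)
for question, missing in gaps.items():
    print(f"Cannot answer {question!r} yet; missing slots: {', '.join(missing)}")
```

Rerunning the check after each iteration of the ontology shows which competency questions are still uncovered.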
Classes and hierarchy: approaches
– Top-down: the development process starts with the definition of the most general concepts in the domain, followed by their specialisation.
– Bottom-up: the development process starts with the definition of the most specific classes (the leaves of the hierarchy), with subsequent grouping of these classes into more general concepts.
– Combination: the development process combines the top-down and bottom-up approaches: define the more salient concepts first, and then generalise and specialise them appropriately.

4. Classes and hierarchy: approach examples
– Top-down: start with the general concepts Wine and Food; specialise Wine into WhiteWine, RedWine, RoseWine, etc.
– Bottom-up: start with the leaves of the hierarchy, the Pauillac and Margaux wines; then create their common superclass, Medoc, which is a subclass of Bordeaux.
– Combination: might start with the top-level concept Wine, then the specific concept Margaux, then the mid-level concept Medoc, etc.

5. Properties (slots)
Classes alone will not answer all competency questions (Step 1). Once we have defined some classes, we must describe the internal structure of the concepts.
After selecting classes from the list of terms (Step 3), the remaining terms are likely to be properties of classes.
For each property, we must determine which class it describes. These properties become slots attached to classes.

5. Object property types
– "Intrinsic" properties (e.g., the flavour of a wine)
– "Extrinsic" properties (e.g., a wine's name and area of provenance)
– Parts, for structured objects (e.g., the courses of a meal)
– Relationships to other individuals (e.g., the maker of a wine: the relationship between a wine and a winery)

6. Facets of properties (slots)
– Value type (string, number, Boolean, enumerated, instance), e.g., the name of a wine is a String
– Allowed values
– Number of values (cardinality: exact, min, max)
– Domain: the classes to which the slot is attached (I)
– Range: the allowed classes for slots of type instance (O)
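The value-type, allowed-value and cardinality facets translate directly into checks on the values of a slot. A minimal Python sketch, assuming a slot's values arrive as a list; the `Facets` record, the `check_slot` helper and the flavour values `Moderate` and `Strong` are hypothetical (only `Delicate` appears in the wine example). Domain and range checks would additionally need the class hierarchy and are left out.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Facets:
    """Facets of one slot (hypothetical representation)."""
    value_type: type                          # e.g. str, int, bool
    allowed: Optional[frozenset[str]] = None  # enumerated allowed values, if any
    min_card: int = 0                         # minimum number of values
    max_card: Optional[int] = None            # maximum number of values; None = unbounded


def check_slot(values: list[object], facets: Facets) -> list[str]:
    """Return a description of every facet violation for one slot's values."""
    problems: list[str] = []
    if len(values) < facets.min_card:
        problems.append(f"needs at least {facets.min_card} value(s)")
    if facets.max_card is not None and len(values) > facets.max_card:
        problems.append(f"allows at most {facets.max_card} value(s)")
    for value in values:
        if not isinstance(value, facets.value_type):
            problems.append(f"{value!r} is not a {facets.value_type.__name__}")
        elif facets.allowed is not None and value not in facets.allowed:
            problems.append(f"{value!r} is not an allowed value")
    return problems


# A wine has exactly one name (a String); hasFlavour takes enumerated values.
name_facets = Facets(str, min_card=1, max_card=1)
flavour_facets = Facets(str, allowed=frozenset({"Delicate", "Moderate", "Strong"}), min_card=1)

print(check_slot(["Chateau-Morgon-Beaujolais"], name_facets))
print(check_slot(["Delicate", "Fruity"], flavour_facets))
```

Returning a list of problems instead of raising on the first one lets a knowledge-acquisition tool show every violation for a slot at once.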
Domain and Range of properties: design issues
– If a list of classes defining the range or domain of a slot includes both a class and its subclass, remove the subclass.
– If a list of classes defining the range or domain of a slot contains all subclasses of a class A, but not class A itself, the range should contain only class A and not the subclasses.
– If a list of classes defining the range or domain of a slot contains all but a few subclasses of a class A, consider whether class A would make a more appropriate range definition.

7. Create instances
1) Choose a class
– E.g., class BeaujolaisWine
2) Create an individual instance of that class
– E.g., Chateau-Morgon-Beaujolais
3) Fill in the property (slot) values
– E.g., hasBody Light; hasColor Red; hasFlavour Delicate, etc.

Error detection in building ontologies

4. Classes and class hierarchy: correction
Is-a: hierarchical relations
– A subclass of a class represents a concept that is a "kind of" the concept that the superclass represents.
– E.g., Chardonnay is a subclass of WhiteWine.
– Common mistake: including both the singular and the plural version of the same concept in the hierarchy, making the former a subclass of the latter (e.g., a class Wines with a class Wine as its subclass is an error).

4. Classes and class hierarchy: correction
Transitivity of hierarchical relations
– If B is a subclass of A and C is a subclass of B, then C is a subclass of A.
– E.g., we define a class Wine, and then a class WhiteWine as a subclass of Wine. Then we define a class Chardonnay as a subclass of WhiteWine.
– Conclusion: Chardonnay is therefore also a subclass of Wine.

4. Classes and class hierarchy: correction
Evolution of a class hierarchy
– A class that is a subclass of one class now may need to become a subclass of another class later.
– E.g., Zinfandel wines used to be red, so Zinfandel was defined as a subclass of RedWine. Now, some Zinfandel wines are rosé.
– Conclusion: we need to break the Zinfandel class into two classes, WhiteZinfandel and RedZinfandel, and classify them as subclasses of RoseWine and RedWine, respectively.

4. Classes and class hierarchy: correction
Classes and their names
– Classes represent concepts in the domain, not the words that denote these concepts.
• E.g., renaming the class Shrimp to Prawn does not change the concept it represents.
– Synonyms for the same concept do not represent different classes.
– Common mistake: having both of the classes above (Shrimp and Prawn) in the ontology (avoid this if possible).

4. Classes and class hierarchy: correction
Avoiding class cycles
– There is a cycle in a hierarchy when some class A has a subclass B and at the same time B is a superclass of A.
– Why this is a problem: creating such a cycle in a hierarchy amounts to declaring that the classes A and B are equivalent: all instances of A are instances of B, and all instances of B are also instances of A.

4. Classes: corrections
Siblings in a class hierarchy
– All the siblings in the hierarchy (except for the ones at the root) must be at the same level of generality.
– Common mistake: WhiteWine and Chardonnay being subclasses of the same class (should be avoided).

4. Classes: corrections
How many is too many and how few is too few?
– If a class has only one direct subclass, there may be a modelling problem, or the ontology is not complete.
• E.g., CotesD'Or as the only subclass of RedBurgundy may be an error.
– If there are more than a dozen (12) subclasses for a given class, additional intermediate categories may be necessary.
• E.g., a long list of wines directly under the class Wine, without any sub-hierarchy, may be an error.

4. Classes: other issues
Multiple inheritance
– Allowed in most systems, but beware: all slots and facets are also inherited.
– E.g., Port is both a RedWine and a DessertWine.
Conclusion: Port will therefore inherit:
• hasSugarLevel Sweet from DessertWine
• and containsTannin from RedWine.

4. Classes: corrections
New class?
(or not)
– Subclasses of a class usually:
1. have additional properties that the superclass does not have, or
2. have different restrictions from those of the superclass, or
3. participate in different relationships than the superclass.
– However: classes in terminological hierarchies do not have to introduce new properties.

4. Classes: corrections
New class or property (slot) value?
– If concepts with different slot values become restrictions for different slots in other classes, then we should create a new class for the distinction. Otherwise, we represent the distinction in a slot value (e.g., RedMerlot and WhiteMerlot as classes if they have different properties).
– If a distinction is important in the domain and we think of the objects with different values for the distinction as different kinds of objects, then we should create a new class for the distinction.
– The class to which an individual instance belongs should not change often (e.g., chilledWine should be a property, not a class).

4. Classes: corrections
New class or instance?
– Individual instances are the most specific concepts represented in a knowledge base (e.g., individual wine bottles).
– If concepts form a natural hierarchy, then we should represent them as classes (e.g., BourgogneRegion is a class, as it has subclasses or instances such as CotesD'Or).
– However: abstract classes have no instances.

4. Classes: corrections
Limiting scope
– The ontology should not contain all the possible information about the domain: you do not need to specialise (or generalise) more than you need for your application (at most one extra level each way). (E.g., the ontology has no FavouriteWine term.)
Disjoint subclasses
– Many systems allow us to specify explicitly that several classes are disjoint.
– Common mistake: defining a class that is a subclass of two disjoint classes; such a class should not exist, as it cannot have any instances.

5. Defining properties
Inverse properties/slots/relations
– Storing the information "in both directions" is redundant. However, it can be convenient to have both pieces of information explicitly available.
A knowledge-acquisition system could automatically fill in the value of the inverse relation, to keep the knowledge base consistent.
– E.g., makerOf versus produces

5. Defining properties
Default property (slot) values
– If a particular slot value is the same for most instances of a class, it can be defined as the default value of that slot. When a new instance of a class containing this slot is created, the system fills in the default value automatically; it can then be changed to any other allowed value.
– E.g., hasSugarLevel Sweet for all DessertWines

Previously: ontology building and error correction
Next: names; formal criteria to evaluate ontologies

What's in a name?
Define a naming convention for classes and slots, and adhere to it.
– Capitalisation and delimiters (space, underscore, dash):
• use consistent capitalisation for concept names;
• use lower case for property names.
– Singular or plural (a class Wine actually represents all wines)
– Prefix and suffix conventions
• Prefix has-, suffix -of: e.g., hasMaker; makerOf
– Other naming conventions
• Do not add "class", "property" or "slot" to concept names.
• Avoid abbreviations.
• Use similar conventions for subclasses (e.g., RedWine and WhiteWine, not Red and WhiteWine).

(Formal) criteria to evaluate ontologies

Evaluation criteria
– Consistency
– Completeness
– Conciseness
– Expandability
– Sensitiveness

Consistency
Whether it is possible to obtain contradictory conclusions from valid input definitions.
Consistency holds if and only if all three conditions are met:
– there is no contradiction between the formal definition and the real world;
– there is no contradiction between the informal definition and the real world;
– the formal and informal definitions have the same meaning (internal consistency).

Example: internal consistency & individual inconsistency
Informal: "The days in the week are: house, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday"
Formal:
<owl:Class rdf:ID="Week"/>
<Week rdf:ID="house" />
<Week rdf:ID="Tuesday" />
…
<Week rdf:ID="Sunday" />
– Internal consistency: the informal and formal definitions have the same meaning (both list "house").
– Individual inconsistency: there is a contradiction between the real world and the formal/informal definition ("house" is not a day of the week).

Completeness
An ontology is complete if and only if:
– all that is supposed to be in the ontology is explicitly stated in it, or can be inferred;
– each definition is complete.
To prove incompleteness, it is enough to prove the incompleteness of an individual definition.

Conciseness
An ontology is concise if:
– it does not store any unnecessary definitions;
– explicit redundancies between definitions of terms do not exist;
– redundancies cannot be inferred.
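The second and third conciseness conditions can be partly checked by machine for the class hierarchy. A minimal Python sketch, assuming the hierarchy is given as a list of (subclass, superclass) pairs with no cycles; `redundant_subclass_axioms` is a hypothetical helper, not part of any ontology tool. It reports subclass-of axioms that are stated more than once (explicit redundancy) and axioms that already follow from the others by transitivity (inferable redundancy).

```python
from __future__ import annotations

from collections import Counter


def _reachable(start: str, goal: str, edges: set[tuple[str, str]]) -> bool:
    """Depth-first search: can `goal` be reached from `start` via subclass-of edges?"""
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        for sub, sup in edges:
            if sub == node and sup not in seen:
                if sup == goal:
                    return True
                seen.add(sup)
                stack.append(sup)
    return False


def redundant_subclass_axioms(
    axioms: list[tuple[str, str]],
) -> tuple[list[tuple[str, str]], list[tuple[str, str]]]:
    """Return (duplicates, inferable) for a list of (subclass, superclass) axioms."""
    counts = Counter(axioms)
    # Explicit redundancy: the same axiom asserted more than once.
    duplicates = sorted(pair for pair, n in counts.items() if n > 1)
    unique = set(counts)
    # Inferable redundancy: the axiom still follows when it is removed.
    inferable = sorted(
        pair for pair in unique if _reachable(pair[0], pair[1], unique - {pair})
    )
    return duplicates, inferable


axioms = [
    ("WhiteWine", "Wine"),
    ("Chardonnay", "WhiteWine"),
    ("Chardonnay", "Wine"),  # follows from the two axioms above
    ("RedWine", "Wine"),
    ("RedWine", "Wine"),     # asserted twice
]
duplicates, inferable = redundant_subclass_axioms(axioms)
print("duplicates:", duplicates)
print("inferable:", inferable)
```

Removing an inferable axiom does not change which subclass relations hold, which is why it counts against conciseness rather than consistency.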
Expandability
The effort required to add new definitions to an ontology (or more knowledge to its definitions) without altering the set of well-defined properties that are already guaranteed.

Sensitiveness
How much small changes in a definition alter the set of well-defined properties that are already guaranteed.

Semantic inconsistency errors
Incorrect semantic classification:
– classifying a concept as a subclass of a concept to which it does not belong
• e.g., Dog as a subclass of House; or
– classifying an instance under the wrong class
• e.g., Pluto as an instance of House

Inconsistency: circularity errors
A class is defined as a specialisation or generalisation of itself.
Distance of the cycle: 0 (a class within itself), 1, …, n.

Inconsistency: partition errors
Partitions define concept classifications in a disjoint and/or complete manner.
Mistakes:
– Subclass partition with common classes
– Subclass partition with common instances
– Exhaustive subclass partition with common classes
– Exhaustive subclass partition with common instances
– Exhaustive subclass partition with external instances

Subclass partition with common classes
When a class belongs to more than one subclass of the partition
– e.g., Dog and Cat as disjoint subclasses of Mammal, and Doberman as a subclass of both classes.

Subclass partition with common instances
When one or more instances belong to more than one subclass of the partition
– e.g., Dog and Cat as disjoint subclasses of Mammal, and Pluto as an instance of both classes.

Exhaustive subclass partition with common classes
When a class belongs to more than one subclass of an exhaustive partition
– e.g., Odd and Even as an exhaustive subclass partition of Number, and a class Prime defined as a subclass of both classes.
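The partition mistakes listed above can be detected mechanically once class membership is explicit. A minimal Python sketch, assuming membership is given as direct assertions (each class or instance mapped to the set of classes it is declared under) with no reasoning over the wider hierarchy; the functions `partition_errors` and `external_instances` and the instance `Two` are hypothetical, while the Mammal and Number examples come from these slides. It finds classes or instances that fall under two or more members of a disjoint partition and, for an exhaustive partition, instances of the base class that fall under none.

```python
from __future__ import annotations


def partition_errors(
    subclasses_of: dict[str, set[str]],
    instances_of: dict[str, set[str]],
    partition: list[str],
) -> tuple[dict[str, list[str]], dict[str, list[str]]]:
    """Find classes and instances asserted under two or more disjoint partition members."""
    parts = set(partition)

    def overlapping(assertions: dict[str, set[str]]) -> dict[str, list[str]]:
        hits: dict[str, list[str]] = {}
        for name, classes in assertions.items():
            shared = sorted(classes & parts)
            if len(shared) > 1:
                hits[name] = shared
        return hits

    return overlapping(subclasses_of), overlapping(instances_of)


def external_instances(
    instances_of: dict[str, set[str]], base: str, partition: list[str]
) -> list[str]:
    """For an exhaustive partition of `base`: instances of `base` in none of its parts."""
    parts = set(partition)
    return sorted(
        name for name, classes in instances_of.items()
        if base in classes and not classes & parts
    )


# Mammal partitioned into disjoint Dog and Cat.
mammal_classes = {"Dog": {"Mammal"}, "Cat": {"Mammal"}, "Doberman": {"Dog", "Cat"}}
mammal_instances = {"Pluto": {"Dog", "Cat"}}
common_classes, common_instances = partition_errors(
    mammal_classes, mammal_instances, ["Dog", "Cat"]
)

# Number exhaustively partitioned into Odd and Even.
number_instances = {"Two": {"Number", "Even"}, "Three": {"Number"}}
outside = external_instances(number_instances, "Number", ["Odd", "Even"])
print(common_classes, common_instances, outside)
```

A reasoner that computes inferred membership would catch more cases; the sketch only inspects what is asserted directly.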
Exhaustive subclass partition with common instances
When one or more instances belong to more than one subclass of the exhaustive partition
– e.g., Odd and Even as an exhaustive subclass partition of Number, and a number Three as an instance of both classes.

Exhaustive subclass partition with external instances
When, after defining an exhaustive subclass partition of a base class, there is an instance of the base class that does not belong to any class of the partition
– e.g., Odd and Even as an exhaustive subclass partition of Number, and an instance Three of Number that belongs to neither Odd nor Even.

Detecting incompleteness
– Check the completeness of the class hierarchy (impreciseness or over-specification).
– Check the completeness of the domains and ranges of properties (impreciseness or over-specification).
– Check the completeness of classes (missing properties, different classes with the same definition, etc.).

Example: incomplete concept classification
The class MusicalInstruments is defined considering only the subclasses StringInstruments and WindInstruments (overlooking, e.g., percussion instruments).

Partition errors
When the definition of a partition between a set of classes is omitted:
– Subclass partition omission: Dog and Cat are defined as subclasses of Mammal, but we forget to state that they form a subclass partition of Mammal (i.e., that they are disjoint classes).
– Exhaustive subclass partition omission: defining a partition of a class but omitting the fact that it is exhaustive (e.g., Odd and Even as subclasses of Number, without specifying exhaustiveness).

Detecting redundancy
Grammatical redundancy errors: there is more than one definition of the same hierarchical relation.
– Redundancy of subclass-of relations: a class has more than one subclass-of relation to the same class, either by direct repetition or indirectly (e.g., C is declared a subclass of A although it is already a subclass of B, which is a subclass of A).
– Redundancy of instance-of relations: direct or indirect repetition (in the indirect case, an instance is declared an instance of a class and also of one of its subclasses).
– Identical formal definitions of classes
– Identical formal definitions of instances

Conclusions
– An ontology-development methodology for declarative frame-based systems (see, e.g., Generic UM techniques)
– Complex issues of defining class hierarchies and properties of classes and instances
– There is no single correct ontology for any domain.
However: this does not mean that errors cannot be detected!