Principles of Best Practice in Ontology Development
Barry Smith

Prospective standardization is a good thing
Prospective standardization is the only thing which will work in mission-critical domains
Prospective standardization means that certain limits to tolerance must be imposed
Need for top-down governance to ensure common architecture and resolution of border disputes in areas of overlap between domains

Problem of ensuring sensible cooperation in a massively interdisciplinary community
Consider multiple uses of technical terms such as
− type
− concept
− instance
− model
− representation
− data

Three Levels
L3. Words, models (published representations, ontologies, databases ...)
L2. Ideas (concepts, thoughts, memories, ...)
L1. Things (cells, planets, processes of cell division ...)

Entity =def anything which exists, including things and processes, functions and qualities, beliefs and actions, documents and software (entities on levels 1, 2 and 3)

First basic distinction among entities
type vs. instance
(science text vs. diary)
(human being vs. Tom Cruise)

For ontologies it is generalizations that are important = types, universals, kinds, species

Catalog vs. inventory
A  515287  DC3300 Dust Collector Fan
B  521683  Gilmer Belt
C  521682  Motor Drive Belt

An ontology is a representation of types
We learn about types in reality from looking at the results of scientific experiments in the form of scientific theories
experiments relate to what is particular
science describes what is general

[Diagram: types (object, organism, animal, mammal, cat, siamese, frog) and their instances]

Domain =def a portion of reality that forms the subject matter of a single science or technology or mode of study or administrative practice:
proteomics
epidemiology
C2
M&S

Representation =def an image, idea, map, picture, name or description ... of some entity or entities.
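
The type vs. instance distinction above can be made concrete with a small sketch. It is only an illustration: the parent links are an assumed reading of the types diagram, and the particulars `this_cat` and `this_frog` are hypothetical names that do not come from the slides.

```python
# Assumed reading of the diagram: each type has one asserted parent type.
IS_A = {
    "organism": "object",
    "animal": "organism",
    "mammal": "animal",
    "cat": "mammal",
    "siamese": "cat",
    "frog": "animal",
}

# Hypothetical particulars, each linked to the most specific type it instantiates.
INSTANCE_OF = {
    "this_cat": "siamese",
    "this_frog": "frog",
}


def ancestors(type_name):
    """Return the types above type_name, nearest parent first."""
    chain = []
    while type_name in IS_A:
        type_name = IS_A[type_name]
        chain.append(type_name)
    return chain


def instantiates(particular, type_name):
    """A particular instantiates its direct type and every type above it."""
    direct = INSTANCE_OF[particular]
    return type_name == direct or type_name in ancestors(direct)


print(ancestors("siamese"))
print(instantiates("this_cat", "mammal"))
print(instantiates("this_frog", "mammal"))
```

Types live in `IS_A` (the catalog); particulars live only in `INSTANCE_OF` (the inventory). Keeping the two tables apart mirrors the point that an ontology represents types, not individual instances.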

Ontologies are representational artifacts comparable to science texts and subject to the same sorts of constraints (including need for update)

Representational units =def terms, icons, alphanumeric identifiers ... which refer, or are intended to refer, to entities and which are minimal (atoms)

Composite representation =def representation
(1) built out of representational units which
(2) form a structure that mirrors, or is intended to mirror, the entities in some domain

The Periodic Table

Ontologies are here

or here

Ontologies represent general structures in reality (leg)

Ontologies do not represent concepts in people’s heads

They represent types in reality

How do we know which general terms designate types?
Types are repeatables: cell, electron, weapon, F16 ...
Instances are one-off: Bill Clinton, this laptop, this handwave

Problem
The same general term can be used to refer both to types and to collections of particulars. Consider:
HIV is an infectious retrovirus
HIV is spreading very rapidly through Asia

Class =def a maximal collection of particulars determined by a general term (‘cell’, ‘electron’ but also: ‘restaurant in Palo Alto’, ‘Italian’)
the class A = the collection of all particulars x for which ‘x is A’ is true

types vs. their extensions
types
{a,b,c,...} collections of particulars

Extension =def The extension of a type is the class of its instances

types vs. classes
types
{c,d,e,...} classes

types vs. classes
types
extensions
other sorts of classes
compare: ‘natural kinds’

types vs. classes
types
populations, ...
the class of all diabetic patients in Leipzig on 4 June 1952

OWL is a good representation of classes
• F16s
• sibling of Finnish spy
• member of Abba aged > 50 years

types, classes, concepts
types
classes
‘concepts’ ?

types < classes < ‘concepts’ ?
Cases of ‘concepts’ which, some people say, do not correspond to classes:
‘Cancelled oophorectomy’
‘Absent nipple’
‘Unlocalized ligand’
A cancelled oophorectomy is not a special kind of conceptual oophorectomy
Use: Information Artifact Ontology (IAO)

Ontology =def. a representational artifact whose representational units (which may be drawn from a natural or from some formalized language) are intended to represent
1. types in reality
2. those relations between these types which obtain universally (= for all instances)
lung is_a anatomical structure
lobe of lung part_of lung

Relation Ontology
The prime goal is to create a limited repertoire of relations linking types
A is_a B
A part_of B
To do this we need coherent treatment of the relations between the underlying instances

RO1.0
http://obofoundry.org/ro/
is_a
part_of
has_part
located_in
contained_in
adjacent_to
transformation_of
derives_from
preceded_by
has_participant
has_agent
plus: instance_of, instance-level relations
plus multiple defined (short-cut) relations

Rules for including relations in RO
To avoid forking, keep RO as small as possible
If we have a relation, say, adjacent_to in RO, then we should not add lists of easily defined relations of the form
adjacent_to_organ:
adjacent_to_cytoplasm:
adjacent_to_neuron:
In general: include a relation only if it is lexicalized

Thus for example instead of:
results_in_reception_of_stimulus_and_conversion_into_molecular_signal_of
use just the relations: results_in, is_a
and biological process terms: reception_of_stimulus, conversion_into_molecular_signal

Or in other words:
A results_in_reception_of_stimulus_and_conversion_into_molecular_signal_of B =Def.
A results_in B & B is_a reception_of_stimulus & B is_a conversion_into_molecular_signal

Instance-level relations to be added to RO 1.0
dependent_on (between a dependent entity and its carrier or bearer)
quality_of (between a dependent and an independent continuant)
functioning_of (between a process and an independent continuant)

Definitions of type-level relations presuppose underlying instance-level relations
A is_a B presupposes instance_of:
All instances of A are instances of B
A part_of B presupposes instance-level part-of:
Every instance of A is an instance-level part of some instance of B

What is symmetric on the level of instances need not be symmetric on the level of types
adjacency on the instance level is always symmetric

Not however on the level of types:
seminal vesicle adjacent_to urinary bladder
Not: urinary bladder adjacent_to seminal vesicle

Similarly, on the level of types, while:
nucleus adjacent_to cytoplasm
it is not the case that
cytoplasm adjacent_to nucleus

Principle of Low Hanging Fruit
Include even absolutely trivial assertions (assertions you know to be universally true)
pneumococcal virus is_a virus
Computers need to be led by the hand

MeSH
MeSH Descriptors
Index Medicus Descriptor
Anthropology, Education, Sociology and Social Phenomena (MeSH Category)
Social Sciences
Political Systems
National Socialism
National Socialism is_a Political Systems
National Socialism is_a Anthropology ...

Principle of singular nouns
Terms in ontologies represent types
Goal: Each term in an ontology should represent exactly one type
Thus every term should be a singular noun

Principle: do not commit the use-mention confusion
mouse =def.
common name for the species mus musculus
swimming is healthy and has eight letters

Principle: do not commit the use-mention confusion
Avoid confusing words with things
Avoid confusing concepts in our minds with entities in reality
Recommendation: avoid the word ‘concept’ entirely

Trialbank
‘information’ =def. ‘a written or spoken designation of a concept’

Trialbank
‘Heparin therapy’ is an instance of ‘written or spoken designation of a concept’
What are the problems here?
1. misuse of quotation marks
2. confusion of instances and types
3. confusion of concept and reality

Principle: beware of terminological baggage
For the sake of interoperability with other ontologies, do not give special meanings to terms with established general meanings
(Don’t use ‘cell’ when you mean ‘plant cell’)

ICNP: International Classification for Nursing Practice
water =def. a type of Nursing Phenomenon of Physical Environment with the specific characteristics: clear liquid compound of hydrogen and oxygen that is essential for most plant and animal life influencing life and development of human beings.

Principle of definitions
Supply definitions for every term
1. human-understandable natural language definition
2. an equivalent formal definition

Principle: definitions must be unique
Each term should have exactly one definition
it may have both natural-language and formal versions
(issue with ontologies which exist with different levels of expressivity)

The Problem of Circularity
A Person =def. A person with an identity document
Hemolysis =def. The causes of hemolysis

Principle of non-circularity
The term defined should not appear in its own definition

HL7
‘stopping a medication’ =def.
change of state in the record of a Substance Administration Act from Active to Aborted

Principle of increase in understandability
A definition should use only terms which are easier to understand than the term defined
Definitions should not make simple things more difficult than they are

Generalized Tarski principle
(a good, general constraint on a theory of meaning)
For each linguistic expression ‘E’
‘E’ means E
‘snow’ means: snow
‘pneumonia’ means: pneumonia

HL7 Reference Information Model
‘medication’ does not mean: medication
rather it means: the record of medication in an information system
‘disease’ does not mean: disease
rather it means: the observation of a disease

Principle of acknowledging primitives
In every ontology some terms and some relations are primitive = they cannot be defined (on pain of infinite regress)
Examples of primitive relations:
identity
instance_of

Principle of Aristotelian definitions
Use Aristotelian definitions
An A is a B which C’s.
A human being is an animal which is rational

Rules for formulating terms
Avoid abbreviations even when it is clear in context what they mean (‘breast’ for ‘breast tumor’)
Avoid acronyms
Avoid mass terms (‘tissue’, ‘brain mapping’, ‘clinical research’ ...)
Treat each term ‘A’ in an ontology as shorthand for a term of the form ‘the type A’

Univocity
Terms should have the same meanings on every occasion of use.
(= They should refer to the same types)
Basic ontological relations such as is_a and part_of should be used in the same way by all ontologies

Universality
Ontologies are made of relational assertions
They should include only those which hold universally

Universality
Often, order will matter:
We can assert
adult transformation_of child
but not
child transforms_into adult

Universality
viral pneumonia caused by virus
but not
virus causes pneumonia
pneumococcal virus causes pneumonia

Principle of Universality
results analysis later_than protocol-design
but not
protocol-design earlier_than results analysis

Principle of positivity
Complements of types are not themselves types. Terms such as
non-mammal
non-membrane
other metalworker in New Zealand
do not designate types in reality

Generalized Anti-Boolean Principle
There are no conjunctive and disjunctive types:
anatomic structure, system, or substance
musculoskeletal and connective tissue disorder

Objectivity
Which types exist in reality is not a function of our knowledge. Terms such as
unknown
unclassified
unlocalized
arthropathies not otherwise specified
do not designate types in reality.
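
The positivity, anti-Boolean and objectivity principles above lend themselves to a simple label check. The sketch below is a rough heuristic, not an established tool: the rule names and regular expressions are assumptions made for illustration, and any flagged term would still need review by a curator.

```python
import re

# Hypothetical patterns derived from the example terms on the slides.
RULES = [
    # Complements of types: "non-mammal", "other metalworker in New Zealand".
    ("positivity", re.compile(r"^(non-|other\b)", re.IGNORECASE)),
    # Conjunctive or disjunctive labels: "... and ...", "... or ...".
    ("anti-Boolean", re.compile(r"\b(and|or)\b", re.IGNORECASE)),
    # Labels that encode the state of our knowledge rather than a type.
    (
        "objectivity",
        re.compile(
            r"\b(unknown|unclassified|unlocalized|not otherwise specified)\b",
            re.IGNORECASE,
        ),
    ),
]


def lint_term(label):
    """Return the names of the principles a term label appears to violate."""
    return [name for name, pattern in RULES if pattern.search(label)]


for label in [
    "non-mammal",
    "anatomic structure, system, or substance",
    "arthropathies not otherwise specified",
    "lobe of lung",
]:
    print(label, "->", lint_term(label))
```

A pattern match is only a hint. Labels that merely contain "and" or "or" as words can be legitimate, so the output is best treated as a review queue and not as a verdict.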

Keep Epistemology Separate from Ontology
If you want to say that
We do not know where A’s are located
do not invent a new class of A’s with unknown locations
(A well-constructed ontology should grow linearly; it should not need to delete classes or relations because of increases in knowledge)

Keep Sentences Separate from Terms
If you want to say
I surmise that this is a case of pneumonia
do not invent a new class of surmised pneumonias
Confusion of ‘findings’ in medical terminologies

Single Inheritance
No kind in a classificatory hierarchy should be asserted to have more than one is_a parent on the immediate higher level

Multiple Inheritance
[Diagram: thing at the top; car and blue thing below it; blue car linked by is_a to both car and blue thing]

Multiple Inheritance
is a source of errors
encourages laziness
serves as obstacle to integration with neighboring ontologies
hampers use of Aristotelian methodology for defining terms
hampers use of statistical search tools

Multiple Inheritance
[Diagram: the same hierarchy, with the two links from blue car distinguished as is_a1 and is_a2]

Principle of asserted single inheritance
Each reference ontology module should be built as an asserted monohierarchy (a hierarchy in which each term has at most one parent)
Asserted hierarchy vs. inferred hierarchy

Principle of normalization
Polyhierarchies should be decomposable into homogeneous disjoint monohierarchies

Principle of instantiation
A term should be included in an ontology only if there is evidence that instances to which that term refers exist in reality.

Avoid mass nouns
Count nouns = an organism, a planet, a handshake
Mass nouns = tissue, information, discourse
Mass nouns almost always go hand in hand with ontological confusion

is_a Overloading
The success of ontology alignment demands that ontological relations (is_a, part_of, ...) have the same meanings in the different ontologies to be aligned.
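
The principle of asserted single inheritance stated above is easy to check mechanically. The following is a minimal sketch, assuming the asserted hierarchy is available as (child, parent) pairs; the function name `multiple_parents` is hypothetical, and the sample data encodes the blue car example from the slides.

```python
from collections import defaultdict


def multiple_parents(is_a_assertions):
    """Map each term with more than one asserted is_a parent to its parents."""
    parents = defaultdict(set)
    for child, parent in is_a_assertions:
        parents[child].add(parent)
    return {
        child: sorted(found)
        for child, found in parents.items()
        if len(found) > 1
    }


# The blue car example: two asserted parents for one term.
asserted = [
    ("car", "thing"),
    ("blue thing", "thing"),
    ("blue car", "car"),
    ("blue car", "blue thing"),
]

print(multiple_parents(asserted))
```

An empty result means the asserted hierarchy is a monohierarchy. The check says nothing about the inferred hierarchy, where a reasoner may legitimately derive further parents.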

Multiple Inheritance
[Diagram: thing at the top; blue thing and car below it; blue car linked to both via is_a1 and is_a2]

How to solve this problem
Create two ontologies:
of cars
of colors
Link the two together via cross-products
(= factoring, normalization, modularization)

Compositionality
The meanings of compound terms should be determined
1. by the meanings of component terms
together with
2. the rules governing syntax

Why do we need rules/standards for good ontology?
Ontologies must be intelligible both to humans (for annotation and curation) and to machines (for reasoning and error-checking): the lack of rules for classification leads to human error and blocks automatic reasoning and error-checking
Intuitive rules facilitate training of curators and annotators
Common rules allow alignment with other ontologies

Ontology path dependence principle
The decisions made by the creators of an ontology – including those decisions which pertain to the ontology’s upper-level architecture – should as far as possible be made on the basis of the degree to which they advance the consistency of that ontology with the reference ontologies already existing in relevant domains.

User feedback principle
An ontology should evolve on the basis of feedback derived from those who are using the ontology, for example for purposes of annotation.
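
The cross-product solution to the blue car problem described earlier can be sketched as follows. This is an illustration under stated assumptions: the contents of the two small ontologies, the relation name `has_quality`, and the helper `make_cross_product` are all hypothetical and are not taken from RO or from the slides.

```python
from dataclasses import dataclass

# Two separate monohierarchies (illustrative contents only).
CAR_IS_A = {"car": "thing"}
COLOR_IS_A = {"blue": "color"}


@dataclass(frozen=True)
class CrossProduct:
    """A compound term built from one term of each ontology."""

    label: str
    genus: str        # term from the car ontology; the single asserted parent
    differentia: str  # term from the color ontology; linked, not inherited

    def definition(self):
        # Aristotelian form: "An A is a B which C's."
        # has_quality is an assumed relation name.
        return f"a {self.genus} which has_quality {self.differentia}"


def make_cross_product(label, genus, differentia):
    """Build a compound term, checking that both components already exist."""
    if genus not in CAR_IS_A:
        raise ValueError(f"unknown genus: {genus}")
    if differentia not in COLOR_IS_A:
        raise ValueError(f"unknown differentia: {differentia}")
    return CrossProduct(label, genus, differentia)


blue_car = make_cross_product("blue car", "car", "blue")
print(blue_car.definition())
```

The compound term has exactly one asserted is_a parent (its genus), so each ontology stays a monohierarchy. The meaning of "blue car" is fixed by its components and the definition template, in line with the compositionality principle.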