The Foundations of Biomedical Ontology
Barry Smith
http://ontology.buffalo.edu/smith

Ontology (Phil.) = the science of the types of objects, qualities, processes, events, functions, environments, relations ... in all spheres of reality

Google hits (in millions) 12.10.06
ontology 24.0
ontology + philosophy 4.6
ontology + information science 7.4
ontology + database 11.1

ontology (computer science)
(roughly) the construction of standardized classification systems designed to support compatibility and integration of data

National Center for Biomedical Ontology
$18.8 mill. NIH Roadmap Center
• Stanford Medical Informatics
• University of San Francisco Medical Center
• Berkeley Drosophila Genome Project
• Cambridge University Department of Genetics
• The Mayo Clinic
• University at Buffalo Department of Philosophy

From chromosome to disease
genomics transcriptomics proteomics reactomics metabonomics phenomics behavioromics connectomics toxicopharmacogenomics bibliomics …
legacy of Human Genome Project

where in the body?
what kind of disease process?
need for semantic annotation of data

how to create broad-coverage semantic annotation systems for biomedicine?
covering:
in vitro biological phenomena
model organisms
humans

Two types of ontology
natural-science ontologies capture terminology-level knowledge used by best current science
vs.
administrative ontologies (e.g.
billing ontologies, bloodbank ontologies, lab workflow ontologies)

Mission of the NCBO
To create software and support services for science-based ontology development and use in the biomedical domain
Science-based = ontologies for support of scientific research (taken as encompassing evidence-based medicine)
Science-based = using the scientific method as part of the process of ontology development and testing

Scientific ontologies have special features
Every term in a scientific ontology must be such that the developers of the ontology believe it to refer to some entity on the basis of the best current evidence

For scientific ontologies
reusability is crucial
compatibility with neighboring scientific ontologies is crucial
it should not be too easy to add new terms to an ontology
we want to introduce these features in clinical medicine ...

For scientific ontologies
the issue of how the ontology will be used is not a factor relevant for determining which entities will be acknowledged by the ontology
If this decision is made on specific practical needs, this will thwart reusability of the data the ontology is used to annotate

Administrative ontologies
Entities may be brought into existence by the ontology itself. (Convention ...)
Highly task-dependent – reusability and compatibility not (always) important
Developers may invent dummy entities (‘surgical procedure not performed because of patient request’) e.g.
for forensic reasons (reality and knowledge are confused)

Hypothesis
Many of the shortfalls of existing administrative ontologies can be overcome by adopting the scientific approach
A good theory is, in the long run, also practically useful
Administrative ontologies ~ data models

An Ontological Square
Ontologies in support of science
  Upper-level integrating ontologies: BFO (Basic Formal Ontology), DOLCE
  Domain ontologies: SNOMED, SwissProt, FMA
Administrative ontologies (for ecommerce, etc.)
  Upper-level integrating ontologies: FOAF top level: person, topic, document, primary topic ...
  Domain ontologies: Amazon.com ontology, Library of Congress Catalog

Problem of ensuring sensible cooperation in a massively interdisciplinary community
concept type instance model representation data

Entity =def anything which exists, including things and processes, functions and qualities, beliefs and actions, documents and software (Levels 1, 2 and 3)

what are the kinds of entity?

First basic distinction
universal vs. instance
(science text vs. diary)
(human being vs. Tom Cruise)

For science, and thus for scientific ontologies, it is generalizations that are important = universals, types, kinds, species

Catalog vs. inventory
A B C 515287 521683 521682 DC3300 Dust Collector Fan Gilmer Belt Motor Drive Belt

Catalog vs.
inventory

Catalog of Universals/Types
Ontology Universals Instances

Ontology = A Representation of Universals

Each node of an ontology consists of:
• preferred term (aka term)
• term identifier (TUI, aka CUI)
• synonyms
• definition, glosses, comments
Ontology = A representation of universals

An ontology is a representation of universals
We learn about universals in reality from looking at the results of scientific experiments in the form of scientific theories
experiments relate to what is particular
science describes what is general

universals substance organism animal mammal cat siamese instances frog

Domain =def a portion of reality that forms the subject-matter of a single science or technology or mode of study or administrative practice ...; proteomics, HIV, epidemiology

Representation =def an image, idea, map, picture, name or description ... of some entity or entities.

Ontologies are representational artifacts
comparable to science texts and subject to the same sorts of constraints (including need for update)

Representational units =def terms, icons, alphanumeric identifiers ... which refer, or are intended to refer, to entities and which are minimal (atoms)

The Periodic Table

Ontologies are here

or here

ontologies represent general structures in reality (leg)

Ontologies do not represent concepts in people’s heads

They represent universals in reality

“leg” is not the name of a concept
concepts do not stand in the part_of, connectedness, causes, treats ... relations used by biomedical ontologies

instances A B C 515287 521683 521682 DC3300 Dust Collector Fan Gilmer Belt Motor Drive Belt universals
Inventory vs. Catalog: Two kinds of representational artifact
Databases represent instances
Ontologies represent universals

How do we know which general terms designate universals?
Roughly: terms used by scientists to designate entities about which we have a plurality of different kinds of testable proposition (cell, electron ...)

Language has the power to create general terms which go beyond the domain of universals studied by science

Problem: fiat demarcations
male over 30 years of age with family history of diabetes
abnormal curvature of spine
participant in trial #2030

Problem: roles
fist
patient
FDA-approved drug

Administrative ontologies often need to go beyond universals
Fall on stairs or ladders in water transport injuring occupant of small boat, unpowered
Railway accident involving collision with rolling stock and injuring pedal cyclist
Nontraffic accident involving motor-driven snow vehicle injuring pedestrian

Class =def a maximal collection of particulars determined by a general term (‘cell’, ‘electron’ but also: ‘restaurant in Palo Alto’, ‘Italian’)
the class A = the collection of all particulars x for which ‘x is A’ is true

universals vs. their extensions
universals {a,b,c,...} collections of particulars

Extension =def The extension of a universal A is the class: instance of the universal A
(it is the class of A’s instances)
(the class of all entities to which the term ‘A’ applies)

Problem
The same general term can be used to refer both to universals and to collections of particulars. Consider:
HIV is an infectious retrovirus
HIV is spreading very rapidly through Asia

universals vs. classes
universals {c,d,e,...} classes

universals vs. classes
universals defined classes

universals vs. classes
universals populations, ...

Defined class =def a class defined by a general term which does not designate a universal
the class of all diabetic patients in Leipzig on 4 June 1952

OWL is a good representation of defined classes
• sibling of Finnish spy
• member of Abba aged > 50 years

Terminology =def.
a representational artifact whose representational units are natural language terms (with IDs, synonyms, comments, etc.) which are intended to designate universals together with defined classes.

universals, classes, concepts
universals defined classes ‘concepts’ ?

universals < defined classes < ‘concepts’
‘concepts’ which do not correspond to defined classes:
‘Surgical or other procedure not carried out because of patient's decision’
‘Congenital absent nipple’
because they do not correspond to anything

(Scientific) Ontology =def. a representational artifact whose representational units (which may be drawn from a natural or from some formalized language) are intended to represent
1. universals in reality
2. those relations between these universals which obtain universally (= for all instances)
lung is_a anatomical structure
lobe of lung part_of lung

Part II: How to Build an Ontology

How to build an ontology
work with scientists to create an initial top-level classification
find ~50 most commonly used terms corresponding to universals in reality
arrange these terms into an informal is_a hierarchy according to this Universality principle:
A is_a B: every instance of A is an instance of B
fill in missing terms to give a complete hierarchy
(leave it to domain scientists to populate the lower levels of the hierarchy)

Principle of Low Hanging Fruit
Include even absolutely trivial assertions (assertions you know to be universally true)
pneumococcal virus is_a virus
Computers need to be led by the hand

MeSH
MeSH Descriptors
Index Medicus Descriptor
Anthropology, Education, Sociology and Social Phenomena (MeSH Category)
Social Sciences
Political Systems
National Socialism
National Socialism is_a Political Systems
National Socialism is_a Anthropology ...

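The consequence drawn on the MeSH slide can be reproduced mechanically. The following is a minimal Python sketch, assuming a hypothetical `IS_A` dictionary that records only the direct parent links read off the slide; the dictionary, the lower-cased labels and the function name `ancestors` are illustrative and are not part of MeSH or of any ontology tool.

```python
# Direct parent links as read off the MeSH slide (illustrative encoding only).
IS_A = {
    "national socialism": "political systems",
    "political systems": "social sciences",
    "social sciences": "anthropology, education, sociology and social phenomena",
}

def ancestors(term, is_a=IS_A):
    """Return every term reached from `term` by following parent links upward."""
    found = []
    while term in is_a:
        term = is_a[term]
        found.append(term)
    return found

# If is_a means "every instance of A is an instance of B", the relation is
# transitive, so the hierarchy commits us to each assertion printed here.
for parent in ancestors("national socialism"):
    print(f"national socialism is_a {parent}")
```

Listing the inherited assertions in this way makes it easier to spot those that fail the universality test, such as National Socialism is_a Anthropology.
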
Principle
Use singular nouns
Terms in ontologies represent universals

Goal: Each term in an ontology represents exactly one universal
there are universals also of collectivities:
population
complex of cells

the use-mention confusion
Conceptual Entities =Def. An organizational header for concepts representing mostly abstract entities.
swimming is healthy and has eight letters

Principle
Avoid confusing words with things
Avoid confusing concepts in our minds with entities in reality
Recommendation: avoid the word ‘concept’ entirely

Trialbank
‘information’ = def. ‘a written or spoken designation of a concept’

Trialbank
‘Heparin therapy’ is an instance of ‘written or spoken designation of a concept’
What are the problems here?
1. misuse of quotation marks
2. confusion of instances and universals
3. confusion of concept and reality

Plant Ontology
cell = def. plant cell, consisting of protoplast and cell wall; ...
what happens when the users of the Plant Ontology need to consider bacterial pathogens in plants?

Principle
For the sake of interoperability with other ontologies, do not give special meanings to terms with established general meanings
(Don’t use ‘cell’ when you mean ‘plant cell’)

ICNP: International Classification for Nursing Practice
water =def. a type of Nursing Phenomenon of Physical Environment with the specific characteristics: clear liquid compound of hydrogen and oxygen that is essential for most plant and animal life influencing life and development of human beings.

Principle
Supply definitions wherever possible (both human-understandable natural language definitions, and equivalent formal definitions)

Principle
Each term should have at most one definition
which may have both natural-language and formal versions

The Problem of Circularity
A Person = def. A person with an identity document
cell = def. plant cell, consisting of protoplast and cell wall; ...

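The two circular definitions above can be flagged with a simple textual check. This is a rough Python sketch, assuming that "circular" means nothing more than "the defined term occurs as a whole word or phrase in its own definition"; the function name `is_circular` and the `examples` dictionary are hypothetical, and the check will miss circles that run through several definitions.

```python
import re

def is_circular(term, definition):
    """True if the defined term occurs as a whole word/phrase in its own definition."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, definition.lower()) is not None

# The first two entries are the definitions quoted on the slide; the third is
# a non-circular contrast case added for illustration.
examples = {
    "person": "a person with an identity document",
    "cell": "plant cell, consisting of protoplast and cell wall",
    "human being": "an animal which is rational",
}

for term, definition in examples.items():
    verdict = "circular" if is_circular(term, definition) else "ok"
    print(f"{term}: {verdict}")
```

A check of this kind only catches the crudest cases, but it is cheap enough to run over every definition in an ontology before a human curator reviews them.
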
Principle
Avoid circular definitions
(The term defined should not appear in its own definition)

HL7
‘stopping a medication’ = def. change of state in the record of a Substance Administration Act from Active to Aborted

Principle
A definition should use terms which are easier to understand than the term defined
(HL7 creates a topsy-turvy world, in which simple things are made difficult)

Principle
Use Aristotelian definitions
An A is a B which C’s.
A human being is an animal which is rational

Principle
Do not seek to define everything

In every ontology some terms and some relations are primitive = they cannot be defined (on pain of infinite regress)
Examples of primitive relations:
identity
instance_of

Principle (a good, general constraint on a theory of meaning)
For each linguistic expression ‘E’
‘E’ means E
‘snow’ means: snow
‘pneumonia’ means: pneumonia

HL7 Reference Information Model
‘medication’ does not mean: medication
rather it means: the record of medication in an information system
‘disease’ does not mean: disease
rather it means: the observation of a disease

Univocity
Terms should have the same meanings on every occasion of use.
(= They should refer to the same universals)
Basic ontological relations such as is_a and part_of should be used in the same way by all ontologies

Universality
Ontologies are made of relational assertions
They should include only those which hold universally
pneumococcal virus causes pneumonia

Universality
Often, order will matter:
We can assert
adult transformation_of child
but not
child transforms_into adult

Universality
viral pneumonia caused by virus
but not
virus causes pneumonia
pneumococcal virus causes pneumonia

Universality
protocol-design earlier_than results analysis
but not
results analysis later_than protocol-design

Positivity
Complements of universals are not themselves universals.
Terms such as
non-mammal
non-membrane
other metalworker in New Zealand
do not designate universals in reality

Ontology of universals logic of terms
There are no conjunctive and disjunctive universals:
anatomic structure, system, or substance
musculoskeletal and connective tissue disorder
rheumatism, excluding the back

Objectivity
Which universals exist in reality is not a function of our knowledge.
Terms such as
unknown
unclassified
unlocalized
arthropathies not otherwise specified
do not designate universals in reality.

Keep Epistemology Separate from Ontology
If you want to say that
We do not know where A’s are located
do not invent a new class of A’s with unknown locations
(A well-constructed ontology should grow linearly; it should not need to delete classes or relations because of increases in knowledge)

Keep Sentences Separate from Terms
If you want to say
I surmise that this is a case of pneumonia
do not invent a new class of surmised pneumonias
Confusion of ‘findings’ in medical terminologies

Single Inheritance
No kind in a classificatory hierarchy should have more than one is_a parent on the immediate higher level

Multiple Inheritance
thing car blue thing is_a is_a blue car

Multiple Inheritance
is a source of errors
encourages laziness
serves as obstacle to integration with neighboring ontologies
hampers use of Aristotelian methodology for defining terms
hampers use of statistical search tools

Multiple Inheritance
thing blue thing car is_a1 is_a2 blue car

is_a Overloading
The success of ontology alignment demands that ontological relations (is_a, part_of, ...) have the same meanings in the different ontologies to be aligned.

How to solve this problem
Create two ontologies:
of cars
of colors
Link the two together via cross-products (= factoring, normalization, modularization)

Compositionality
The meanings of compound terms should be determined
1.
by the meanings of component terms together with
2. the rules governing syntax

Why do we need rules/standards for good ontology?
Ontologies must be intelligible both to humans (for annotation and curation) and to machines (for reasoning and error-checking): the lack of rules for classification leads to human error and blocks automatic reasoning and error-checking
Intuitive rules facilitate training of curators and annotators
Common rules allow alignment with other ontologies

think of ontologies as legends for cartoons
cartoons, like maps, always have a certain threshold of granularity but they can be veridical representations of reality nonetheless
Goal: use logically well-structured ontologies to create algorithmic, dynamic cartoons

Randomized controlled trials
http://rctbank.ucsf.edu/ontology/outline/index.htm

Top-Level Class Hierarchy for RCT
Root Secondary-study Trial-details Trial Concept
• Generic-concept
• Population-concept
• Protocol-concept
• Design-concept
• Outcome-concept
• Administrative-concept
• Intervention-concept

Trial Details
Root Secondary-study Trial-details
• Erratum
• Publication-details
• Trial-entry-details
• Administrative-details
  – Secondary-administrative-details
  – Primary-administrative-details
    » Executed-administrative-details
    » Intended-administrative-details
• Conclusion-details
• Background-details
  – Intended-background-details
  – Executed-background-details
• Stopping-details
• Retraction-details
• Correction-details
• Fraud-details

Concept
• Generic-concept
  – Term-information
  – Time-entity
  – Rule-concept
  – Situation
• Population-concept
  – Subgroup
  – Recruitment-flowchart
  – Population
  – Recruitment
  – Site-enrollment
• Protocol-concept
  – Follow-up-compliance
  – Follow-up-activity
  – Follow-up
  – Protocol-change
  – Treatment-assignment
  – Protocol
  – Reason
  – Outcomes-followup
  – Secondary-study-protocol

Concept
• Design-concept
  – Survival-analysis-and-results
  – Statistical-analysis-and-results
  – Sample-size-calculation
  – Trial-design
  – Hypothesis-concept
  – Study-objective
  – Study-monitoring
  – Regression-analysis-and-results
  – Stopping-rule
• Outcome-concept
  – Special-variable-information
  – Outcome-assessment
  – Miscellaneous-outcome-entity
  – Result-entity
  – Outcome-value-entity
  – Outcome

Concept
• Administrative-concept
  – Publication-concept
  – Study-site
  – Person
  – Ethics
  – Study-committee
  – Funder
  – Institution
  – Registry-id
• Intervention-concept
  – Blinding-concept
  – Compliance-details
  – Intervention-step
  – Intervention-arm
  – Co-intervention
  – Intervention
  – Compliance-result
  – Intervention-logic

Basic Formal Ontology
What the top level should look like

Two kinds of entities
occurrents (processes, events, happenings)
continuants (objects, qualities, states...)

Continuants (aka endurants)
have continuous existence in time
preserve their identity through change
exist in toto whenever they exist at all
Occurrents (aka processes)
have temporal parts
unfold themselves in successive phases
exist only in their phases

You are a continuant
Your life is an occurrent
You are 3-dimensional
Your life is 4-dimensional

Dependent entities
require independent continuants as their bearers
There is no run without a runner
There is no grin without a cat

Dependent vs.
independent continuants
Independent continuants (organisms, buildings, environments)
Dependent continuants (quality, shape, role, propensity, function, status, power, right)

All occurrents are dependent entities
They are dependent on those independent continuants which are their participants (agents, patients, media ...)

BFO Top-Level Ontology
Continuant Independent Continuant Occurrent (always dependent on one or more independent continuants) Dependent Continuant

= A representation of top-level types
Continuant Occurrent biological process Independent Continuant Dependent Continuant cell component molecular function

Top-Level Ontology
Continuant Independent Continuant Occurrent Dependent Continuant Function Side-Effect, Stochastic Process, ... Functioning

Top-Level Ontology
Continuant Independent Continuant Dependent Continuant Occurrent Functioning Side-Effect, Stochastic Process, ... Function

Top-Level Ontology
Continuant Independent Continuant Quality Dependent Continuant Function Occurrent Functioning Side-Effect, Stochastic Process, ...
Spatial Region instances (in space and time)

Towards a Clinical Trial Ontology
To serve merger of data schemas
To serve flexibility of collaborative clinical trial research
To serve management of clinical trial research
To serve data access and reuse

CTO will be part of OBI
Ontology of Biomedical Investigations http://obi.sourceforge.net
which is in turn part of the OBO Foundry http://obofoundry.org

Overview of the Ontology of Biomedical Investigations
with thanks to Trish Whetzel on behalf of the FuGO Working Group

OBI Purpose
Provide a resource for the unambiguous description of the components of biomedical investigations such as the design, protocols and instrumentation, material, data and types of analysis on the data
NOT designed to model biology
Enables
Allow consistent annotation of data across different technological and biological domains
Enable powerful queries
Facilitate semantically-driven data integration

Motivation for OBI
Standardization efforts in biological and technological domains
Standard syntax - Data exchange formats
To provide a mechanism for software interoperability, e.g. FuGE Object Model
Standard semantics - Controlled vocabularies or ontology
Centralize commonalities for annotation term needs across domains to describe an investigation/study/experiment, e.g. FuGO

Biomedical Investigation Components
Investigation Design: Describe the design and purpose or general aim of the investigation.
Material and Its Characteristics: Describe the material and characteristics.
Treatments: Describe the manipulations or perturbations or observations performed on the material to meet the general aim of the investigation.
Sample Analysis Preparation: Describe how the material was prepared for analysis - e.g. labeling, protein digest, etc.
Instrumental Analysis: Describe the instrument and settings that were used.
Data Pre-Processing: Describe the results from the instrument, e.g. what units are represented.
Computational/Higher Level Analysis: Describe the type of analysis performed to confirm/deny the hypothesis, e.g. clustering.

FuGO Development Strategy Decisions
Unified Development
Pros
Overlap of terms is identified early in development
Universal/Common terms are defined by all those collaborating
Additional technological or biological terms can be added as needed by collaborators
Cons
Time needed to develop the ontology
Independent Development
Pros
Develop ‘Ontology’ in a time frame limited only by the community
Cons
Development of different working policies?
Use of different top level classes?
Overlap of terms at lower levels of the ontology tree

FuGO Development Process
Collect Use Cases - within community activity
Collect examples of investigations as performed within a community and present Use Cases to developers group
Bottom up approach - within community activity
Identify concepts to describe using controlled terms
Collect terms and their definitions
Bin terms in the top level ontology structure
Top down approach - collaborative activity
Build a top level ontology structure, is_a (vertical) relationships
Make a list of other foreseen (horizontal) relationships
Review how Top Level Nodes fit in with the Upper Level Ontologies

FuGO - Top Level Classes
Continuant: an entity that endures/remains the same through time
Dependent Continuant: depends on another entity
E.g. Environment (depends on the set of ranges of conditions, e.g. geographic location)
E.g. Characteristics (entity that can be measured, e.g. temperature, unit)
- Realizable: an entity that is realizable through a process (executed/run)
E.g. Software (a set of machine instructions)
E.g. Design (the plan that can be realized in a process)
E.g. Role (the part played by an entity within the context of a process)
Independent Continuant: stands on its own
E.g. All physical entities (instrument, technology platform, document etc.)
E.g. Biological material (organism, population etc.)
Occurrent: an entity that occurs/unfolds in time
E.g. Temporal Regions, Spatio-Temporal Regions (single actions or Event)
Process
E.g. Investigation (the entire ‘experimental’ process)
E.g. Study (process of acquiring and treating the biological material)
E.g. Assay (process of performing some tests and recording the results)

Emerging FuGO Design Principles
OBO Foundry ontology, utilize ontology best practices
Inherit top level classes from an Upper Level ontology
Use of the Relation Ontology
Follow additional OBO Foundry principles
Facilitates interoperability with other OBO Foundry ontologies
Develop recommendations for naming conventions and metadata
Format for term names, e.g. underscore vs. camel case, no plurals
Use of alphanumeric identifier for terms, i.e. something that does not have semantic meaning
Mechanisms for adding synonyms, etc.
Open source approach
Protégé/OWL
Weekly conference calls
Shared environment using Sourceforge (SF) and SF mailing lists

Future Plans
Binning process - ongoing
Reconciliations into one canonical version
Iterative process
Common working practices - established
Each class consists of: unique alphanumeric identifier, human readable string name, definition and comments
Sourceforge tracker in place to collect comments on terms, definitions, relationships
Review ontology so that top level classes meet the needs of all involved ‘communities’

OBI Collaborating Communities
Crop sciences Generation Challenge Programme (GCP), www.generationcp.org
Environmental genomics MGED RSBI Group, www.mged.org/Workgroups/rsbi
Genomic Standards Consortium (GSC), www.genomics.ceh.ac.uk/genomecatalogue
HUPO Proteomics Standards Initiative (PSI), psidev.sourceforge.net
Immunology Database and Analysis Portal, www.immport.org
Immune Epitope Database and Analysis Resource (IEDB), http://www.immuneepitope.org/home.do
International Society for Analytical Cytology, http://www.isac-net.org/
Metabolomics Standards Initiative (MSI),
msi.workgroups.sourceforge.net
Neurogenetics, Biomedical Informatics Research Network (BIRN), www.nbirn.net
Nutrigenomics MGED RSBI Group, www.mged.org/Workgroups/rsbi
Polymorphism
Toxicogenomics MGED RSBI Group, www.mged.org/Workgroups/rsbi
Transcriptomics MGED Ontology Group, mged.sourceforge.net/ontologies

http://fugo.sourceforge.net

http://obi.sourceforge.net

Top-Level Class Hierarchy for RCT
Root Secondary-study Trial-details Trial Concept
• Generic-concept
• Population-concept
• Protocol-concept
• Design-concept
• Outcome-concept
• Administrative-concept
• Intervention-concept

Amended Top-Level Class Hierarchy for RCT
Entity
  Continuant
    Population
    Protocol
    Design
  Occurrent
    Trial
    Secondary-study
    Intervention
?? Trial-details
?? Outcome-concept
?? Administrative-concept

Concept
• Generic-concept
  – Term-information
  – Time-entity
  – Rule-concept
    » Clinical-rule
      Exclusion-rule
      Inclusion-rule
    » Rule-entity
      Recursive-rule
      Base-rule
    » Ethnicity-language-rule
    » Age-gender-rule
  – Situation

Concept
• Protocol-concept
  – Follow-up-compliance
  – Follow-up-activity
  – Follow-up
  – Protocol-change
  – Treatment-assignment
  – Protocol
  – Reason
  – Outcomes-followup
  – Secondary-study-protocol

Amended Top-Level Class Hierarchy for RCT
Entity
  Continuant
    Protocol
    • Secondary-study-protocol
    Reason
  Occurrent
    • Treatment-assignment
    • Follow-up
      – Follow-up-activity
      – Outcomes-follow-up
    • Protocol-change

Concept
• Population-concept
  – Subgroup
  – Recruitment-flowchart
  – Population
  – Recruitment
  – Site-enrollment

Amended Top-Level Class Hierarchy for RCT
Entity
  Continuant
    Protocol
    • Secondary-study-protocol
    Recruitment-flowchart
    Reason
    Population
    • Subgroup
  Occurrent
    • Priors
      – Recruitment
      – Site-enrollment
      – Treatment-assignment
    • Follow-up
      – Follow-up-activity
      – Outcomes-follow-up
    • Protocol-change

Concept
• Administrative-concept
  – Publication-concept
  – Study-site
  – Person
  – Ethics
  – Study-committee
  – Funder
  – Institution
  – Registry-ID

Continuant
• Information object
  – Publication
  – Registry-ID
• Study-site
• Person
• Institution
  – Study-committee
  – Funder
??? Ethics

Concept
• Intervention-concept
  – Blinding-concept
  – Compliance-details
  – Intervention-step
  – Intervention-arm
  – Co-intervention
  – Intervention
  – Compliance-result
  – Intervention-logic

Occurrent
• Intervention
  – Blinding
  – Intervention-step
  – Intervention-arm
  – Co-intervention
• ??? Intervention-logic
• ??? Compliance-result
• ??? Compliance-details
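
The amended Continuant and Occurrent slides can be written down as data and queried. The sketch below is one possible Python encoding, not the RCT Bank or OBI format: the `PARENT` dictionary and the function `top_level` are hypothetical names, the links are transcribed from the two amended slides, and the terms the slides mark with "???" are deliberately left out rather than guessed.

```python
# child -> parent links transcribed from the amended slides (illustrative).
# Terms marked "???" on the slides (Ethics, Intervention-logic,
# Compliance-result, Compliance-details) are not placed here.
PARENT = {
    "Continuant": "Entity",
    "Occurrent": "Entity",
    "Information object": "Continuant",
    "Publication": "Information object",
    "Registry-ID": "Information object",
    "Study-site": "Continuant",
    "Person": "Continuant",
    "Institution": "Continuant",
    "Study-committee": "Institution",
    "Funder": "Institution",
    "Intervention": "Occurrent",
    "Blinding": "Intervention",
    "Intervention-step": "Intervention",
    "Intervention-arm": "Intervention",
    "Co-intervention": "Intervention",
}

def top_level(term):
    """Follow is_a links upward and return 'Continuant' or 'Occurrent'."""
    while term not in ("Continuant", "Occurrent"):
        term = PARENT[term]  # raises KeyError for terms not yet placed
    return term

for term in ("Funder", "Registry-ID", "Blinding"):
    print(f"{term}: {top_level(term)}")
```

Because a dictionary holds exactly one parent per term, this encoding cannot express multiple inheritance, which matches the single-inheritance rule stated earlier; a term that seems to need two parents is a signal to split the hierarchy rather than to add a second link.
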