Barry Smith

Three US partner institutions:
• Stanford University, Biomedical Informatics Research
• Mayo Clinic, Department of Biomedical Informatics
• University at Buffalo, Department of Philosophy
https://immport.niaid.nih.gov/

Lindsay Cowell
http://infectiousdiseaseontology.org

How to Build an Ontology
Barry Smith
http://ontology.buffalo.edu/smith

Schedule and Rules for Practicum
http://ncorwiki.buffalo.edu/index.php/Immunology_Ontology

Why Build an Ontology?
• Scientific data are stored in databases
• There are few constraints on the creation of new databases
• Scientific data are siloed
• How can this silo formation be counteracted?
• Create a common, non-redundant suite of ontologies covering all scientific domains, for use in annotating (‘tagging’, ‘curating’) scientific data

More precisely: How do we build this suite of ontologies? How do we build ontologies that will integrate well together?
One answer: the Semantic Web

Integration via Linked Open Data
• HTML demonstrated the power of the Web to allow sharing of information
• Use the power of hyperlinks to break down silos and create useful integration of online data
• via the Web Ontology Language (OWL)

Ontology success stories, and some reasons for failure
• A fragment of the “Linked Open Data” in the biomedical domain
• Not all of the links here are what they seem

The more successful Semantic Technology is, the more it fails to solve the problem of silos. Indeed, it leads to the creation of multiple new semantic silos.

Ontology success stories, and some reasons for failure
• “Linked Open Data” = integration via mappings

What you get with ‘mappings’
• HPO: all phenotypes (excess hair loss, duck feet ...)
• NCIT: all organisms
• allose (a form of sugar)
• Acute Lymphoblastic Leukemia (A.L.L.)

Mappings are hard
• They are fragile, and expensive to maintain
• They yield a new risk of forking
• The goal should be to reduce the need for mappings to the minimum possible, by creating orthogonal ontologies: one ontology for each domain
Where to begin?

Uses of ‘ontology’ in PubMed abstracts

By far the most successful: GO (Gene Ontology)

GO provides a controlled system of terms for use in annotating (describing, tagging) data
• multi-species, multi-disciplinary, open source
• contributes to the cumulativity of scientific results obtained by distinct research communities
• compare the use of kilograms, meters and seconds in formulating experimental results

Hierarchical view representing relations between represented types
[Figure: anatomical hierarchy for the pleura, with nodes Anatomical Structure, Anatomical Space, Organ Cavity, Organ Cavity Subdivision, Serous Sac Cavity, Serous Sac Cavity Subdivision, Pleural Cavity, Interlobar recess, Organ, Organ Part, Organ Component, Organ Subdivision, Serous Sac, Pleural Sac, Pleura (Wall of Sac), Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Tissue, Mesothelium of Pleura]

• US $200 million invested in literature and data curation using GO
• over 12 million annotations relating gene products described in UniProt, Ensembl and other databases to terms in the GO
• experimental results reported in more than 60,000 scientific journal articles manually annotated by expert biologists using GO

GO has learned the lessons of successful cooperation
• Based on community consensus
• Updated every night
• Clear documentation
• The terms chosen are already familiar
• Fully open source
• Subjected to considerable third-party critique
• Tracker with rapid turnaround to identify errors and gaps

Compare: legends for maps

Compare: legends for diagrams

Legends for mathematical equations
  x_i = vector of measurements of gene i
  k = the state of the gene (as “on” or “off”)
  θ_i = set of parameters of the Gaussian model
  ... (see proposal p. 124f.)

Legends for chemistry diagrams
Prasanna et al., “Chemical Compound Navigator: A Web-Based Chem-BLAST, Chemical Taxonomy-Based Search Engine for Browsing Compounds”, PROTEINS: Structure, Function, and Bioinformatics 63:907–917 (2006)

Ontologies are legends for data

Ontologies are legends for databases
[Figure: the GO term ‘sphingolipid transporter activity’ shown with the databases GlyProt, MouseEcotope, DiabetInGene and GluChem]

Annotation using common ontologies yields integration of databases
[Figure: the GO term ‘Holliday junction helicase complex’ shown with the databases GlyProt, MouseEcotope, DiabetInGene and GluChem]

GO has limited coverage
It represents only three groups of biological entities:
• cellular components
• molecular functions
• biological processes
and it does not provide representations of proteins, diseases, symptoms, …

OPEN BIOMEDICAL ONTOLOGIES FOUNDRY
[Figure: the original OBO Foundry ontologies (Gene Ontology in yellow), arranged by relation to time (continuant: independent or dependent; occurrent) and by granularity (organ and organism; cell and cellular component; molecule): Organism (NCBI Taxonomy), Anatomical Entity (FMA, CARO), Organ Function (FMP, CPRO), Phenotypic Quality (PaTO), Cell (CL), Cellular Component (FMA, GO), Cellular Function (GO), Molecule (ChEBI, SO, RnaO, PrO), Molecular Function (GO), Biological Process (GO), Molecular Process (GO)]
[Figure: the same grid, with the Environment Ontology added (“environments are here”)]
[Figure: the grid extended with a further level of granularity, complex of organisms: Family, Community, Deme, Population; Population Phenotype; Population Process]

The OBO Foundry: a step-by-step, evidence-based approach to expand the GO
Developers commit to working to ensure that, for each domain, there is community convergence on a single ontology, and agree in advance to collaborate with developers of ontologies in adjacent domains.
http://obofoundry.org
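A small sketch can contrast the ‘mappings’ slides with the slides on annotation using common ontologies. It assumes a naive mapping that matches labels by case-insensitive prefix. Every vocabulary, label, record name and term ID below is hypothetical. The spurious matches mirror the list under ‘What you get with mappings’, whose entries share nothing except the string “all”. Joining two databases on a shared ontology ID, by contrast, links exactly the records annotated with the same term.

```python
# Sketch only: vocabularies, labels, records and IDs are all hypothetical.

# Labels from three independent vocabularies (invented for illustration).
vocab_terms = {
    "vocab_A": ["all phenotypes", "hair loss"],
    "vocab_B": ["all organisms", "mouse"],
    "vocab_C": ["allose", "ALL (acute lymphoblastic leukemia)"],
}


def naive_label_mapping(query, vocabularies):
    """Return (vocabulary, label) pairs whose label starts with `query`,
    ignoring case: a fragile, string-based 'mapping'."""
    q = query.lower()
    return [
        (vocab, label)
        for vocab, labels in vocabularies.items()
        for label in labels
        if label.lower().startswith(q)
    ]


# Two hypothetical databases whose records are annotated with term IDs
# taken from one shared ontology.
db1 = {"rec1": {"TERM:0001"}, "rec2": {"TERM:0002"}}
db2 = {"recX": {"TERM:0001"}, "recY": {"TERM:0003"}}


def integrate_by_annotation(a, b):
    """Pair records from two databases that share at least one term ID."""
    return sorted(
        (ra, rb)
        for ra, ids_a in a.items()
        for rb, ids_b in b.items()
        if ids_a & ids_b
    )


print(naive_label_mapping("all", vocab_terms))
# [('vocab_A', 'all phenotypes'), ('vocab_B', 'all organisms'), ('vocab_C', 'allose'), ('vocab_C', 'ALL (acute lymphoblastic leukemia)')]
print(integrate_by_annotation(db1, db2))
# [('rec1', 'recX')]
```

Prefix matching on labels links a phenotype, an organism, a sugar and a leukemia. The shared-ID join returns only the one pair of records annotated with the same term, which is the kind of integration the slides attribute to common ontologies.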
OBO Foundry Principles
• Common governance (coordinating editors)
• Common training
• Common architecture:
  • simple shared top-level ontology = BFO
  • shared Relation Ontology: www.obofoundry.org/ro
  • one ontology for each domain, so no need for mappings

[Figure: the anatomical hierarchy for the pleura, from Anatomical Structure down to Mesothelium of Pleura]

For ontologies it is generalizations that are important = types, kinds, species

Catalog vs. inventory
  A  515287  DC3300 Dust Collector Fan
  B  521683  Gilmer Belt
  C  521682  Motor Drive Belt

Types vs. instances

Names of instances

Names of types

An ontology is a representation of types
We learn about types in reality from looking at the results of scientific experiments in the form of scientific theories: experiments relate to what is particular; science describes what is general.
[Figure: a hierarchy of types (object, organism, animal, mammal, cat, Siamese, frog) with instances shown below]

Three kinds of (binary) relations
Between types
• human is_a mammal
• human heart part_of human
Between an instance and a type
• this human instance_of the type human
• this human allergic_to the type tamiflu
Between instances
• Mary’s heart part_of Mary
• Mary’s aorta connected_to Mary’s heart

Type-level relations presuppose the underlying instance-level relations
A is_a B =def. A and B are types and all instances of A are instances of B
A part_of B =def. all instances of A are instance-level parts of some instance of B

[Figure: the same pleura hierarchy]

The assertions linking terms in ontologies must hold universally. Hence all type-level relations are provided with All-Some definitions:
A has_part B =def. all As have some B as instance-level part
A part_of B =def. all As are instance-level parts of some B

Ontology for Biomedical Investigations (OBI)
[Figure: OBI representation of a trial in a neuroscience study]
[Figure: OBI representation of a vaccine protection investigation]
[Figure: examples of ontology classes (types) used in these examples, and the ontology relations]

BFO Top-Level Ontology
Continuant
  Independent Continuant
  Dependent Continuant
Occurrent (always dependent on one or more independent continuants)

OBO Foundry coverage
[Figure: the OBO Foundry ontologies arranged by relation to time and granularity, with Organism-Level Process (GO), Cellular Process (GO) and Molecular Process (GO) among the occurrents]

Two kinds of entities
• occurrents (processes, events, happenings)
• continuants (objects, qualities, states ...)
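The All-Some definitions of is_a, part_of and has_part given above can be turned into checks over a small set of instance-level facts. This is a minimal sketch in which all instances and types (mary, rex and so on) are hypothetical. A check over a finite sample can refute a type-level assertion but can never establish that it holds universally. The sample also shows a consequence of the definitions: human has_part heart can hold while heart part_of human fails, because a dog's heart is a heart that is part of no human.

```python
# Sketch only: all instances, types and facts are hypothetical.

# Every (instance, type) pair that holds, including supertypes.
instance_of = {
    ("mary", "human"), ("mary", "mammal"),
    ("john", "human"), ("john", "mammal"),
    ("rex", "dog"), ("rex", "mammal"),
    ("marys_heart", "heart"), ("johns_heart", "heart"), ("rexs_heart", "heart"),
}

# Instance-level parthood facts, as (part, whole) pairs.
part_of_facts = {
    ("marys_heart", "mary"), ("johns_heart", "john"), ("rexs_heart", "rex"),
}


def instances(t):
    """All instances of type t in the sample."""
    return {i for (i, ty) in instance_of if ty == t}


def is_a(a, b):
    """A is_a B: all instances of A are instances of B."""
    return instances(a) <= instances(b)


def part_of(a, b):
    """A part_of B: all As are instance-level parts of some B."""
    return all(
        any((x, y) in part_of_facts for y in instances(b)) for x in instances(a)
    )


def has_part(a, b):
    """A has_part B: all As have some B as instance-level part."""
    return all(
        any((y, x) in part_of_facts for y in instances(b)) for x in instances(a)
    )


print(is_a("human", "mammal"))     # True
print(is_a("mammal", "human"))     # False: rex is a mammal but not a human
print(has_part("human", "heart"))  # True: every human in the sample has a heart
print(part_of("heart", "human"))   # False: rex's heart is part of no human
print(part_of("heart", "mammal"))  # True
```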
You are a continuant; your life is an occurrent.
You are 3-dimensional; your life is 4-dimensional.

Dependent entities require independent continuants as their bearers
• There is no run without a runner
• There is no grin without a cat

Phenotype Ontology (PATO)
• this particular case of redness (of a particular fly eye) instantiates the universal red
• an instance of eye (in a particular fly) instantiates the universal eye
• the particular case of redness depends_on the instance of eye
• red is_a color
• eye is_a anatomical structure

Dependent vs. independent continuants
• Independent continuants (organisms, buildings, environments)
• Dependent continuants (quality, shape, role, propensity, function, status, power, right)

All occurrents are dependent entities
They are dependent on those independent continuants which are their participants (agents, patients, media ...)

Blinding Flash of the Obvious (BFO)
Continuant
  Independent Continuant (thing, ...)
  Dependent Continuant (quality, ...)
Occurrent (process, ...)
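The dependence principles above (no run without a runner, no grin without a cat, occurrents dependent on their participants) can be enforced as a simple consistency check. This sketch assumes a hypothetical toy model, not an official BFO file format or API. Each entity is assigned a top-level category, and a single “depends on” list stands in for both the bearer of a dependent continuant and the participants of an occurrent.

```python
# Sketch only: a toy model of the BFO top level, not an official BFO API.

# Child category -> parent category; Continuant and Occurrent are roots here.
BFO_PARENT = {
    "independent continuant": "continuant",
    "dependent continuant": "continuant",
}


def is_under(category, ancestor):
    """True if `category` is `ancestor` or lies below it in the toy tree."""
    while category is not None:
        if category == ancestor:
            return True
        category = BFO_PARENT.get(category)
    return False


# Hypothetical entities: name -> (top-level category, what it depends on).
# For occurrents the list holds participants; for dependent continuants, bearers.
entities = {
    "the cat": ("independent continuant", []),
    "the runner": ("independent continuant", []),
    "this fly eye": ("independent continuant", []),
    "the grin": ("dependent continuant", ["the cat"]),
    "this redness": ("dependent continuant", ["this fly eye"]),
    "the run": ("occurrent", ["the runner"]),
    "an orphan grin": ("dependent continuant", []),
}


def missing_bearers(ents):
    """Names of entities that are not independent continuants yet depend on
    no independent continuant ('no grin without a cat')."""
    problems = []
    for name, (category, depends_on) in ents.items():
        if is_under(category, "independent continuant"):
            continue  # independent continuants need no bearer
        has_bearer = any(
            d in ents and is_under(ents[d][0], "independent continuant")
            for d in depends_on
        )
        if not has_bearer:
            problems.append(name)
    return problems


print(missing_bearers(entities))  # ['an orphan grin']
```

Here ‘an orphan grin’ is flagged because nothing in the model is its independent-continuant bearer.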
OBO Foundry organized in terms of Basic Formal Ontology
Each Foundry ontology can be seen as an extension of a single upper-level ontology (BFO), either post hoc, as in the case of the GO, or in virtue of creation ab initio via downward population from BFO.

Extension Strategy + Modular Organization
• top level: Basic Formal Ontology (BFO)
• mid-level: Ontology for Biomedical Investigations (OBI), Information Artifact Ontology (IAO)
• domain level: Anatomy Ontology (FMA*, CARO), Cell Ontology (CL), Cellular Component Ontology (FMA*, GO*), Environment Ontology (EnvO), Subcellular Anatomy Ontology (SAO), Sequence Ontology (SO*), Protein Ontology (PRO*), Spatial Ontology (BSPO), Infectious Disease Ontology (IDO*), Phenotypic Quality Ontology (PaTO), Biological Process Ontology (GO*), Molecular Function (GO*)

Principle of Low Hanging Fruit
Include even absolutely trivial assertions (assertions you know to be universally true), e.g. pneumococcal bacterium is_a bacterium.
Computers need to be led by the hand.

Principle of singular nouns
Terms in ontologies represent types.
Goal: each term in an ontology should represent exactly one type.
Thus every term should be a singular noun.

MeSH
MeSH Descriptors > Index Medicus Descriptor > Anthropology, Education, Sociology and Social Phenomena (MeSH Category) > Social Sciences > Political Systems > National Socialism
• National Socialism is_a Political Systems
• National Socialism is_a Anthropology ...

Principle: do not confuse words with things
• mouse =def. common name for the species Mus musculus
• swimming is healthy and has eight letters

Principle of Aristotelian definitions
All definitions should be of the form
  an S =def. a G which Ds
where ‘G’ (for: genus) is the parent term of S (for: species) in the corresponding reference ontology, and ‘D’ is the differentia.
For example: a human being is an animal which is rational.

Single Inheritance
No kind in a classificatory hierarchy should be asserted to have more than one is_a parent on the immediate higher level.

Multiple Inheritance
[Figure: thing, car, blue thing; blue car has two is_a parents, car and blue thing]

Multiple inheritance
• is a source of errors
• encourages laziness
• serves as an obstacle to integration with neighboring ontologies
• hampers use of the Aristotelian methodology for defining terms
• hampers use of statistical search tools

Multiple Inheritance
[Figure: the same diagram, with the two is_a links labeled is_a1 and is_a2]

Principle of asserted single inheritance
Each reference ontology module should be built as an asserted monohierarchy (a hierarchy in which each term has at most one parent).
Asserted hierarchy vs. inferred hierarchy

Ontology Development Principles
• Reference ontologies: capture generic content and are designed for aggressive reuse in multiple different types of context
  • single inheritance
  • single reference ontology for each domain of interest
• Application ontologies: created by combining local content with generic content taken from relevant reference ontologies

OBO Foundry: Downward Population from BFO
[Figure: the three-level architecture of top-level (BFO), mid-level (OBI, IAO) and domain-level ontologies]

Example: The Cell Ontology

How to build an ontology
• import BFO into an ontology editor (Protégé)
• work with domain experts to create an initial mid-level classification
• find the ~50 most commonly used terms corresponding to types in reality
• arrange these terms into an informal is_a hierarchy according to the universality principle:
  A is_a B iff every instance of A is an instance of B
• fill in missing terms to give a complete hierarchy
• (leave it to domain experts to populate the lower levels of the hierarchy)

BFO Top-Level Ontology
Continuant
  Independent Continuant
    Material Entity
  Dependent Continuant
    Attribute
Occurrent (always dependent on one or more independent continuants)
  Process

[Figure: simplified BFO top level: Continuant (Material Entity, Attribute) and Process]

http://ncorwiki.buffalo.edu/index.php/Immunology_Ontology

Terms for an Allergy Ontology
• peanut allergy disease = IgE-mediated hypersensitivity to peanut allergen(s)
• peanut allergy disorder = mast cells and basophils with peanut allergen-specific IgE bound to their membranes
• peanut allergy
• milk allergy
• ragweed allergy
• dust mite allergy
• allergic reaction
• allergic asthma
• allergic rhinitis
• allergic angioedema
• acute urticaria (hives)
• anaphylaxis (as reaction)
• anaphylaxis (as syndrome)
• mast cell/basophil degranulation

From the allergy example
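Several of the principles above lend themselves to simple automated checks: asserted single inheritance, Aristotelian definitions, and the step ‘fill in missing terms’. The sketch below assumes a hypothetical dictionary representation of an asserted hierarchy, mapping each term to its list of asserted is_a parents. It does not use any real ontology file format or the Protégé API, and the terms echo examples from the slides.

```python
# Sketch only: a hypothetical dict-based hierarchy, not an OWL/OBO file or Protégé API.

# Term -> asserted is_a parents.
asserted_is_a = {
    "bacterium": ["organism"],
    "pneumococcal bacterium": ["bacterium"],
    "animal": ["organism"],
    "human being": ["animal"],
    "blue car": ["car", "blue thing"],  # violates asserted single inheritance
}

# Term -> differentia, the 'D' in "an S =def. a G which Ds".
differentia = {"human being": "is rational"}


def multiple_inheritance_violations(hierarchy):
    """Terms asserted to have more than one is_a parent."""
    return sorted(t for t, parents in hierarchy.items() if len(parents) > 1)


def missing_terms(hierarchy):
    """Parents that are used but never declared as terms themselves."""
    used = {p for parents in hierarchy.values() for p in parents}
    return sorted(used - set(hierarchy))


def article(word):
    """Naive English indefinite article."""
    return "an" if word[0].lower() in "aeiou" else "a"


def aristotelian_definition(term, hierarchy, diffs):
    """Build 'an S is a G which Ds' from the single asserted parent G."""
    parents = hierarchy[term]
    if len(parents) != 1:
        raise ValueError(f"{term!r} needs exactly one asserted is_a parent")
    genus = parents[0]
    return f"{article(term)} {term} is {article(genus)} {genus} which {diffs[term]}"


print(multiple_inheritance_violations(asserted_is_a))  # ['blue car']
print(missing_terms(asserted_is_a))  # ['blue thing', 'car', 'organism']
print(aristotelian_definition("human being", asserted_is_a, differentia))
# a human being is an animal which is rational
```

Calling aristotelian_definition on "blue car" raises ValueError. With two asserted parents there is no single genus G to define it from, which is one concrete sense in which multiple inheritance hampers the Aristotelian methodology.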