The OBO Foundry
Barry Smith

History of Ontology as Computational Artifact
1970s: AI (based on FOL: McCarthy, Hayes)
1980s: KR, Knowledge Interchange Formats (Gruber, Hobbs ...)
1999: GO, OBO format (Ashburner, ...)
2000s: Semantic Web (based on OWL; Horrocks, Hendler, 1000 lite ontologies)
2009: Reconciliation of OBO with OWL; but still 2 methodologies: OBO Foundry; NCBO BioPortal

Ontology and the Semantic Web
• HTML demonstrated the power of the Web to allow sharing of information
• can we use semantic technology to create a Web 2.0 which would allow algorithmic reasoning with online information, based on XML, RDF and above all OWL (Web Ontology Language)?
• can we use RDF and OWL to break down silos, and create useful integration of online data and information?

People tried, but the more they were successful, the more they failed
OWL breaks down data silos via controlled vocabularies for the description of data dictionaries
Unfortunately the very success of this approach led to the creation of multiple, new, semantic silos, because multiple ontologies are being created in ad hoc ways

Reasons for this effect
• Semantic Web (original) idea: if a million ‘lite ontologies bloom’, then somehow intelligence will be created
• let’s all build new ones (shrink-wrapped software mentality: you will not get paid for reusing existing ontologies)
• requirements-driven software development promotes forking and reduces potential for secondary uses

Ontology success stories, and some reasons for failure
• A fragment of the “Linked Open Data” in the biomedical domain

What you get with ‘mappings’
HPO: all phenotypes (excess hair loss, duck feet ...)
NCIT: all organisms
allose (a form of sugar)
Acute Lymphoblastic Leukemia (A.L.L.)

Mappings are hard
They are fragile, and expensive to maintain
Need new authorities to maintain them (one for each pair of mapped ontologies), yielding new risk of forking. Who will police the mappings?
The goal should be to minimize the need for mappings, by avoiding redundancy in the first place
Invest resources in disjoint ontology modules which work well together, reducing the need for mappings to the minimum possible

Why should you care?
• you need to create systems for data mining and text processing which will yield useful digitally coded output
• if the codes you use are constantly in need of ad hoc repair, huge resources will be wasted
• serious investment in annotation will be defeated from the start
• relevant data will not be found, because it will be lost in multiple semantic cemeteries

How to do it right?
• how to create an incremental, evolutionary process, where what is good survives, and what is bad fails
• where the number of ontologies needing to be linked is small
• where links are stable
• create a scenario in which people will find it profitable to reuse ontologies, terminologies and coding systems which have been tried and tested

Reasons why GO has been successful
It is a system for prospective standardization built with a coherent top level but with content contributed and monitored by domain specialists
Based on community consensus
Updated every night
Clear versioning principles ensure backwards compatibility; prior annotations do not lose their value
Initially low-tech to encourage users, with movement to more powerful formal approaches (including OWL-DL, though the GO community still recommends caution)

GO has learned the lessons of successful cooperation
• Clear documentation
• The terms chosen are already familiar
• Fully open source (allows thorough testing in manifold combinations with other ontologies)
• Subjected to considerable third-party critique
• Tracker for user input with rapid turnaround and help desk

GO has been amazingly successful in overcoming the data balkanization problem
but it covers only generic biological entities of three sorts:
– cellular components
– molecular functions
– biological processes
no diseases, symptoms, disease biomarkers, protein interactions, experimental processes …

OBO (Open Biomedical Ontology) Foundry proposal (Gene Ontology in yellow)
Grid of ontologies by RELATION TO TIME (CONTINUANT, either INDEPENDENT or DEPENDENT; OCCURRENT) and by GRANULARITY:
ORGAN AND ORGANISM
  Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
  Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
  Occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT
  Independent continuant: Cell (CL); Cellular Component (FMA, GO)
  Dependent continuant: Cellular Function (GO)
MOLECULE
  Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
  Dependent continuant: Molecular Function (GO)
  Occurrent: Molecular Process (GO)

Environment Ontology
The same grid, with the annotation “environments are here”

Population-level ontologies
The same grid, extended with a further level of granularity:
COMPLEX OF ORGANISMS
  Independent continuant: Family, Community, Deme, Population
  Dependent continuant: Population Phenotype
  Occurrent: Population Process

Ontology success stories, and some reasons for failure

The same extended grid, labeled http://obofoundry.org

The OBO Foundry: a step-by-step, evidence-based approach to expand the GO
Developers commit to working to ensure that, for each domain, there is community convergence on a single ontology, and agree in advance to collaborate with developers of ontologies in adjacent domains.
http://obofoundry.org

OBO Foundry Principles
Common governance (coordinating editors)
Common training
Common architecture to overcome Tim Berners-Lee-ism:
• simple shared top level ontology
• shared Relation Ontology: www.obofoundry.org/ro

Open Biomedical Ontologies Foundry
Seeks to create high quality, validated terminology modules across all of the life sciences which will be
• one ontology for each domain, so no need for mappings
• close to language use of experts
• evidence-based
• supported by a strategy for motivating potential developers and users
• revisable as science advances

Principles
http://obofoundry.org/wiki/index.php/OBO_Foundry Principles

Pistoia Alliance
Open standards for data and technology interfaces in the life science research industry
Consortium of major pharmaceutical and life science companies
Can we address the data silo problems created by the multiplicity of proprietary terminologies by declaring terminology ‘pre-competitive’?
Require shared use of something like OBO Foundry ontologies in the presentation of information?

A high-level master taxonomy of included entity-types (drug, disease, target, mechanism, cell, etc.) and references to which ontology instances describe them
• Drug type (small molecule, antibody, siRNA etc)
• Drug delivery mechanism (oral, inhaled, injectable)
• Developmental Status
• Disease †
• Disease Models (and disease-relevant model assays, e.g. mouse model of Huntington's)
• Disease state (chronic, acute)
• Pharmacology measurement type (IC50, IC90, Ki)*
• Pharmacology: Assay Type (in vitro, cell-free)
• Target (beta2 receptor, gastric pump)
• Pharmacology: Assay Conditions (passage number, media?)
• Mechanism of action (inhibitor, agonist)
• Species/Strain for experimental models (including plants, fungi etc)
• Cell type (adipocyte, neuron)
• Cell state (quiescent, pluripotent, differentiating)
• Tissue-state (e.g. necrosing)
• Cell Line
• Genome Variation type (SNP, CNV)
• Protein (limited to key species?)
• Protein-protein interaction type (inhibits, phosphorylates)
• Author ‡
• Genomic analysis technique (array, next-gen seq etc)
• Toxicology observation
• Bioprocess
• Tissue
• Biomarker type (imaging, pharmacological)
• Gene (limited to key species?)
• Post-translational modification
• ADME:
• Institute (company, university)
• Brain-region
• Pathway

Virtual Physiological Human

Only with a prospective standard like that of the OBO Foundry could something like the VPH work
Designed to guarantee interoperability of ontologies from the very start (and to keep out weeds)
Initial set of 10 criteria tested in the annotation of
• scientific literature
• model organism databases
• life science experimental results

OBO Foundry coverage
Grid of ontologies by RELATION TO TIME (CONTINUANT, either INDEPENDENT or DEPENDENT; OCCURRENT) and by GRANULARITY:
ORGAN AND ORGANISM
  Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
  Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
  Occurrent: Organism-Level Process (GO)
CELL AND CELLULAR COMPONENT
  Independent continuant: Cell (CL); Cellular Component (FMA, GO)
  Dependent continuant: Cellular Function (GO)
  Occurrent: Cellular Process (GO)
MOLECULE
  Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
  Dependent continuant: Molecular Function (GO)
  Occurrent: Molecular Process (GO)

ORTHOGONALITY
modularity ensures
• annotations can be additive
• division of labor amongst domain experts
• high value of training in any given module
• lessons learned in one module can benefit work on other modules
• incentivization of those responsible for individual modules

Benefits of coordination
• Can more easily reuse what is made by others
• Can more easily inspect and criticize what is made by others
• Leads to innovations (e.g.
MIREOT strategy for importing terms into ontologies)

8 Foundry members (2010)
CHEBI: Chemical Entities of Biological Interest
GO: Gene Ontology
PATO: Phenotypic Quality Ontology
PRO: Protein Ontology
XAO: Xenopus Anatomy Ontology
ZFA: Zebrafish Anatomy Ontology

Current Foundry members in yellow
Grid of ontologies by RELATION TO TIME (CONTINUANT, either INDEPENDENT or DEPENDENT; OCCURRENT) and by GRANULARITY:
ORGAN AND ORGANISM
  Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO); XAO; ZFA
  Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
  Occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT
  Independent continuant: Cell (CL); Cellular Component (FMA, GO)
  Dependent continuant: Cellular Function (GO)
MOLECULE
  Independent continuant: Molecule (SO, RnaO); ChEBI; PRO
  Dependent continuant: Molecular Function (GO)
  Occurrent: Molecular Process (GO)

Prospective Foundry ontologies (in green):
Foundational Model of Anatomy Ontology (FMA)
Cell Ontology (CL)
Sequence Ontology (SO)
RNA Ontology (RnaO)
Shown on the same grid, with ontologies listed individually (NCBI Taxonomy, CARO, FMA, XAO, ZFA, CL, SO, RnaO, ChEBI, PRO)

OBO Foundry Modular Organization
Levels: top level, mid-level, domain level
Basic Formal Ontology (BFO)
Information Artifact Ontology (IAO)
Ontology for Biomedical Investigations (OBI)
Anatomy Ontology (FMA*, CARO)
Cell Ontology (CL)
Cellular Component Ontology (FMA*, GO*)
Environment Ontology (EnvO)
Subcellular Anatomy Ontology (SAO)
Sequence Ontology (SO*)
Protein Ontology (PRO*)
Ontology for General Medical Science (OGMS)
Infectious Disease Ontology (IDO*)
Phenotypic Quality Ontology (PaTO)
Biological Process Ontology (GO*)
Molecular Function (GO*)

Problem cases
Common Anatomy Reference Ontology
Disease Ontology
Function Ontologies
Cellular Component Function
Cellular Function
Organ Function
Artifact Function (pumping, transporting ...)
Environment Ontology
Species Ontology (NCBI Taxonomy)

IDO (Infectious Disease Ontology) Core
Follows the GO strategy of providing a canonical ontology of what is involved in every infectious disease (host, pathogen, vector, virulence, vaccine, transmission), accompanied by IDO Extensions for specific diseases, pathogens and vectors
Provides common terminology resources and tested common guidelines for a vast array of different disease communities

IDO (Infectious Disease Ontology) Consortium
• MITRE, Mount Sinai, UT Southwestern – Influenza
• IMBB/VectorBase – Vector-borne diseases (A. gambiae, A. aegypti, I. scapularis, C. pipiens, P. humanus)
• Colorado State University – Dengue Fever
• Duke University – Tuberculosis, Staph. aureus
• Cleveland Clinic – Infective Endocarditis
• University of Michigan – Brucellosis
• Duke University, University at Buffalo – HIV

Ontology for General Medical Science
http://code.google.com/p/ogms/
(OBO) http://purl.obolibrary.org/obo/ogms.obo
(OWL) http://purl.obolibrary.org/obo/ogms.owl

OGMS-based initiatives
Vital Signs Ontology (VSO) (Welch Allyn)
EHR / Demographics Ontology
Infectious Disease Ontology
Mental Health Ontology
Emotion Ontology

Ontology for General Medical Science
Jobst Landgrebe (then Co-Chair of the HL7 Vocabulary Group): “the best ontology effort in the whole biomedical domain by far”

EXPERIMENTAL ARTIFACTS: Ontology for Biomedical Investigations (OBI)
CLINICAL MEDICINE: Ontology for General Medical Science (OGMS)
INFORMATION ARTIFACTS: Information Artifact Ontology (IAO)
How to keep clear about the distinction
• processes of observation,
• results of such processes (measurement data)
• the entities observed
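
The three-way distinction between processes of observation, their results, and the entities observed can be made concrete with a small data model. What follows is a minimal sketch in Python, not code from OBI, IAO or OGMS: the class names, field names and the example values are all hypothetical, chosen only to show that each kind of thing gets its own type and that a measurement datum points back to the observed entity through the process that produced it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservedEntity:
    """The thing in reality that is observed (hypothetical class name)."""
    label: str


@dataclass(frozen=True)
class ObservationProcess:
    """The act of observing, which has an observed entity as its target."""
    label: str
    observes: ObservedEntity


@dataclass(frozen=True)
class MeasurementDatum:
    """The recorded result of an observation process."""
    value: float
    unit: str
    output_of: ObservationProcess

    @property
    def is_about(self) -> ObservedEntity:
        # The datum is about the entity, not about the process that produced it.
        return self.output_of.observes


# Illustrative values only; nothing here comes from a real record.
temperature = ObservedEntity("body temperature of a patient")
assay = ObservationProcess("oral thermometer reading", observes=temperature)
datum = MeasurementDatum(value=38.2, unit="degree Celsius", output_of=assay)

print(datum.is_about.label)
```

Keeping three separate types means a number such as 38.2 can never be mistaken for the temperature itself or for the act of measuring it; one possible reading is that the three types correspond roughly to the clinical, experimental and information-artifact columns above, but that pairing is an assumption of this sketch.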