Development of an Application Ontology for Beta Cell Genomics Based On the Ontology for Biomedical Investigations

Jie Zheng, Elisabetta Manduchi and Christian J. Stoeckert Jr
Department of Genetics, Perelman School of Medicine, University of Pennsylvania
ICBO July 2013, Montreal
Computational Biology and Informatics Laboratory

Beta Cell Genomics Database
• http://genomics.betacell.org/gbco/
• A functional genomics resource focused on pancreatic beta cell research, supporting a consortium of 62 investigators and their groups
• 128 studies (version 4.1) addressing the biology of beta cells, aspects of diabetes, and the production of functional beta cells from
  – embryonic stem cells
  – mature cells of other types such as exocrine cells

Desired Features of a Beta Cell Genomics Ontology
• Support semantic annotation of beta cell studies with enough granularity, covering both biological and experimental aspects
  – Specimen characteristics, species, strain, anatomical entity, cell type, etc.
  – Assay, protocol, data analysis methods, etc.
• Enable queries of increasing complexity (competency questions)
  – Find gene expression data of endocrine cells
  – Find studies using cells which develop from either mesoderm or endoderm
  – Find high throughput sequencing gene expression data in samples obtained during the embryo stage from mouse strains with genetic background C57BL/6J
• Enable knowledge discovery based on computable definitions
  – Automated cell type classification based on cell phenotype/functions and/or genetic signatures using reasoners
• Leverage existing efforts covering the domains of investigations, cells, anatomy, proteins, and genes
  – OBO Foundry ontologies

OBO Foundry Reference Ontologies
• Shared common upper level ontology, Basic Formal Ontology (BFO), and common relations
• Orthogonal interoperable ontologies
  – reuse existing terms defined in OBO Foundry ontologies
• Each reference ontology covers a specific domain:
  – Cell type ontology (CL): cell type
  – Gene ontology (GO): biological process, molecular function, cellular component
  – Protein ontology (PR): protein (cross species)
  – Uber anatomy ontology (UBERON): cross-species anatomy
  – Ontology for biomedical investigations (OBI): all aspects of an experiment
Facilitate ontology integration

Motivation for Developing an Application Ontology for Beta Cell Genomics Research
• No single OBO Foundry ontology can meet our needs
• No available ontology provides the granularity needed by beta cell genomics research
• Typical use of multiple disconnected ontologies loses semantic power

Principles of Beta Cell Genomics Ontology (BCGO) Development
• Reuse terms existing in the OBO Foundry ontologies if possible
• Reuse existing ontology design patterns
• Use OBI as the ontology framework and integrate subsets of other OBO Foundry ontologies into it
• Enrich the ontology with additional axioms when
  needed

Ontology for Biomedical Investigations (OBI)
• Covers all aspects of an investigation
• Contains classes that connect OBI with other OBO Foundry reference ontologies, such as CL, UBERON, and GO, and serve as the parents of referenced external terms
[Figure: OBI class hierarchy (subClass of / is a). Classes shown: material entity, specimen, cell, cultured cell, gross anatomical entity, molecular entity, cellular_component, biological_process, process, assay, data transformation, data item, protocol, measurement unit label, information content entity, ... External ontologies shown: CL, CLO, UBERON, GO, ChEBI, UO]

Development of BCGO
1. Identification of terms defined in OBO Foundry ontologies
2. Extraction of terms from OBO Foundry ontologies
3. Integration of terms from different OBO Foundry ontologies
4. Enrichment of BCGO by adding additional terms and axioms

Step 1: Identification of Terms Defined in OBO Foundry Ontologies
1. Draw terms from the MO to OBI mapping list
  – Beta Cell Genomics Database was annotated using multiple controlled vocabularies and ontologies, including the MGED Ontology (MO)
2. BioPortal Annotation Tool
  – High accuracy (>95%)
  – May not include the latest version of ontologies
3. BioPortal Search Tool
  – Includes partial and exact matches of input text
  – Requires more manual review than the BioPortal Annotation Tool

Most Terms Needed Could Be Matched to Small Subsets of Many Ontologies

Ontology    Version      Total Classes   Matched Terms
OBI         2012-07-01            2042             200
BTO*        12/20/2012            5391               2
CARO        N/A                     50               1
EnVO        2013-01-08            1557               1
ERO*        2012-10-03            1579               2
FMA         3.1                  83281               1
GAZ         1.512               518195               1
MP          07/14/2012            9164               1
OGMS        2011-09-20              81               3
RS          1/14/2013             3361               1
SO          11/1/2012             2151               1
SWO         0.5                    661               1
EFO*        2.31                  4057              40
ChEBI       100                  38901              12
CLO         2.1.03               35436              11
GO          2012-12-18           38747               2
NCBITaxon   2013-01-24          981148               1
PR          31.0.                35488               1
UO          2012-08-30             313              67
CL          2013-01-31            2120              46
PATO        01/09/2013            2426              19
UBERON      2013-01-07            7318             126

*: application ontology

• 852 terms used in the Beta Cell Genomics database
• 644 terms were matched to 543 ontology terms
• Mapped terms are defined in 24 OBO Foundry ontologies, including BFO and IAO

Abbreviations
BTO: BRENDA tissue / enzyme source
CARO: Common Anatomy Reference Ontology
EnVO: Environment Ontology
ERO: eagle-i resource ontology
FMA: Foundational Model of Anatomy
GAZ: Gazetteer
MP: Mammalian Phenotype
OGMS: Ontology for General Medical Science
RS: Rat Strain ontology
SO: Sequence types and features
SWO: Software Ontology
EFO: Experimental Factor Ontology
ChEBI: Chemical entities of biological interest
CLO: cell line ontology
NCBITaxon: NCBI organismal classification
PR: protein ontology
UO: Units of measurement
PATO: Phenotypic quality

Step 2: Extraction of Terms from OBO Foundry Ontologies
• Ontodog tool: OBI subset extraction
  – Generates a community view including all related terms and axioms
  Reference: Zheng et al. International Conference on Biomedical Ontology (ICBO), Graz, Austria, July 2012
• OntoFox tool for extracting terms from all other OBO Foundry ontologies
  – Option 1: MIREOT
  – Option 2: include minimal intermediate ontology terms
  – Option 3: all related terms and axioms
  Reference: Xiang et al. (2010) BMC Research Notes, 3:175

Extraction Option 1
• Applied when five or fewer terms in an ontology were used by BCGO
• MIREOT: minimum information to reference an external ontology term
  Reference: Courtot et al.
  (2011) Applied Ontology, 6:23
  – IRI of the term
  – IRI of the source ontology
  – IRI of the term parent in the target ontology
  – Can be done manually

Extraction Option 2
• Keep hierarchical structure with minimal intermediates
• Example: reference human, mouse, rat in NCBITaxon
[Figure: NCBITaxon hierarchies for human, mouse, and rat. Panel labels: MIREOT; Include computed intermediate classes; Include all intermediate classes. Annotations: "… 14 intermediate classes"; "Option 2"]

Extraction Option 3
• Reuse logical axioms of terms defined in source ontologies
• Example – ontology design pattern of cell in CL
  Reference: Meehan et al. BMC Bioinformatics 2011, 12:6

Summary of Extraction Methods And Results
[Table not reproduced]

Step 3: Integration of Terms Extracted From Different OBO Ontologies (1)
• Import retrieved terms into the OBI subset (BCGO community view) under corresponding parent classes
[Figure: terms of interest in other OBO Foundry ontologies (OntoFox output file) imported into the subset of OBI (Beta Cell Genomics view of OBI) via subClass of / is a. Classes shown: material entity, specimen, cell, cultured cell, gross anatomical entity, molecular entity, cellular_component, biological_process, process, assay, data transformation, data item, protocol, measurement unit label, ... Imported subsets shown: subset of CL, subset of CLO, subset of UBERON, subset of GO]
[Figure labels, continued: information content entity; subset of ChEBI; subset of UO]
- Using owl:imports
- Keep retrieved terms that belong to the same source ontology in one OWL file
- Contains 2389 classes

Step 3: Integration of Terms Extracted From Different OBO Ontologies (2)
• To avoid inconsistencies caused by integrating terms from different paths, we remove textual and logical definitions of terms referenced to external ontologies
[Figure: PATO example. Labels: PATO terms retrieved from OBI; deprecated; removal of definitions of PATO terms in retrieved OBI subset; retrieval of definitions from PATO]

Step 4: Enrichment of BCGO
• 208 terms could not be matched to OBO Foundry ontologies
• 42 new terms have been added to BCGO
• Example – ‘insulin-expressing mature beta cell’
[Figure: ‘insulin-expressing mature beta cell’ with related terms: insulin secretion; detection of glucose; mature insulin; type B pancreatic cell; islet of Langerhans]
  Reference: Meehan et al.
  BMC Bioinformatics 2011, 12:6

Ontology Validation
• Annotation: 83% of terms covered by BCGO
• Competency questions can be answered:
  – Find gene expression data of endocrine cells
  – Find studies using cells which develop from either mesoderm or endoderm
  – Find high throughput sequencing gene expression data in samples obtained during the embryo stage from mouse strains with genetic background C57BL/6J
• Automated cell type classification: ongoing

Challenges
• OBO Foundry ontologies use different versions of the upper level ontology
  – BFO
• Inconsistent representation of the same entities in different OBO Foundry ontologies
  – Example: ‘cell line cell’; alignment work has been done by CL, CLO and OBI developers
  – Resolution: alignment work presented in the ICBO poster session with the title ‘Alignment of Cultured Cell Modeling Across OBO Foundry Ontologies: Key Outcomes and Insights’ by Dr. Matthew Brush

Summary
• BCGO is available at: http://purl.obolibrary.org/obo/bcgo.owl
• All related documents are available at: http://code.google.com/p/bcgo-ontology/
• Development of a cross-domain application ontology
  – based on the OBI framework
  – reuses existing reference ontologies and ontology design patterns
• The approach should be generally applicable when using interoperable source ontologies
• Orthogonal interoperable OBO Foundry ontologies facilitate ontology integration

Acknowledgements
• Emily Greenfest-Allen
• Matthew Brush
• OBI, CLO, and CL developers
• Oliver He and Allen Xiang
• NIH grants 1R01GM093132-01 and 5 U01 DK 072473

Questions?
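
Backup: sketch of a competency question
The second competency question listed under Ontology Validation ("Find studies using cells which develop from either mesoderm or endoderm") amounts to a transitive lookup over a "develops from" relation. The following is a minimal sketch of that idea only, not the BCGO query itself: the edge table, the study identifiers, and the function names are hypothetical toy data, and a real query would run against the ontology with a reasoner or a SPARQL endpoint.

```python
# Toy "develops from" edges (term -> direct precursors).
# Illustrative only; these are NOT taken from BCGO or CL.
DEVELOPS_FROM = {
    "type B pancreatic cell": {"endocrine progenitor"},
    "endocrine progenitor": {"endoderm"},
    "cardiac muscle cell": {"mesoderm"},
    "neuron": {"ectoderm"},
}

# Hypothetical study annotations (study id -> annotated cell type).
STUDY_CELL_TYPE = {
    "study_A": "type B pancreatic cell",
    "study_B": "cardiac muscle cell",
    "study_C": "neuron",
}


def ancestors(term, edges):
    """Return every term reachable from `term` by following edges transitively."""
    seen = set()
    stack = [term]
    while stack:
        for parent in edges.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def studies_developing_from(germ_layers, annotations, edges):
    """Select studies whose cell type develops from any of the given germ layers."""
    wanted = set(germ_layers)
    return sorted(
        study
        for study, cell in annotations.items()
        if ancestors(cell, edges) & wanted
    )


result = studies_developing_from(
    {"mesoderm", "endoderm"}, STUDY_CELL_TYPE, DEVELOPS_FROM
)
print(result)  # ['study_A', 'study_B']
```

The point of the sketch is that the answer depends on following the relation through intermediate terms, which is why the hierarchy and axioms retained during extraction matter for answering such questions.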

Advantages of Using OntoFox
• Provides many options for ontology term extraction
• Backend RDF store contains all OBO Foundry ontologies and is reloaded daily if updated
• Input settings can be saved as a text file and reused
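
Backup: sketch of a MIREOT record
The Extraction Option 1 slide lists the three pieces of information MIREOT requires (IRI of the term, IRI of the source ontology, IRI of the term parent in the target ontology) and the rule that it was applied when five or fewer terms of an ontology were used. A minimal sketch of both, assuming the record is written out as N-Triples: the class name, the function names, and the example IRIs are illustrative choices, not OntoFox output or BCGO content. The "imported from" annotation is assumed to be IAO_0000412.

```python
from dataclasses import dataclass

OBO = "http://purl.obolibrary.org/obo/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
IMPORTED_FROM = OBO + "IAO_0000412"  # assumed: IAO 'imported from'


@dataclass(frozen=True)
class MireotRecord:
    """The minimum information MIREOT needs for one external term."""
    term_iri: str             # IRI of the term
    source_ontology_iri: str  # IRI of the source ontology
    target_parent_iri: str    # IRI of the term's parent in the target ontology


def use_mireot(n_terms_used: int) -> bool:
    """Rule from the slide: Option 1 applies when five or fewer terms are used."""
    return 0 < n_terms_used <= 5


def to_ntriples(rec: MireotRecord) -> list:
    """Serialize one record as three N-Triples statements."""
    triples = [
        (rec.term_iri, RDF_TYPE, OWL_CLASS),
        (rec.term_iri, SUBCLASS_OF, rec.target_parent_iri),
        (rec.term_iri, IMPORTED_FROM, rec.source_ontology_iri),
    ]
    return [f"<{s}> <{p}> <{o}> ." for s, p, o in triples]


# Illustrative example: a mouse taxon term placed under an OBI parent class.
mouse = MireotRecord(
    term_iri=OBO + "NCBITaxon_10090",
    source_ontology_iri=OBO + "ncbitaxon.owl",
    target_parent_iri=OBO + "OBI_0100026",
)

if use_mireot(n_terms_used=1):
    for line in to_ntriples(mouse):
        print(line)
```

Because only three IRIs are needed per term, this can be done by hand for a few terms, which matches the slide's note that MIREOT can be done manually.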