Developing and Using the Gene Ontology
Midori A. Harris
EBI & GO Consortium
RDF, Ontologies and Meta-Data Workshop
7–9 June 2006

GO project aims
• Compile structured vocabularies describing aspects of molecular biology
• Describe gene products using vocabulary terms (annotation)
• Provide public resource of data and tools:
  ‣ to query and modify the vocabularies and annotations
  ‣ annotation tools for curators

GO Scope: 3 domains
• Molecular Function: elemental activity or task
  ‣ nuclease, DNA binding, transcription factor
• Biological Process: broad objective or goal
  ‣ mitosis, signal transduction, metabolism
• Cellular Component: location or complex
  ‣ nucleus, ribosome, origin recognition complex
✴ ‘Normal’ functions and processes only:
  ‣ No pathological processes
  ‣ No experimental conditions

GO Structure: DAG
[Figure: example graph with the terms cell, membrane, mitochondrion, lysosomal membrane and mitochondrial membrane, linked by is_a and part_of relationships]
GO is a directed acyclic graph (DAG): a term can have one or more parent(s)

Applications of GO
• Gene product annotation
  ‣ model organism databases
  ‣ genome sequence analysis
• Expression data analysis
• Text mining
• More ...

GO Annotation
• What is GO annotation?
• An annotation is a statement that a gene product …
  ‣ … has a particular molecular function
  ‣ … is involved in a particular biological process
  ‣ … is located within a certain cellular component
• … as determined by a particular method
• … as described in a particular reference
• The parts of this statement correspond to the parts of an annotation:
  ‣ “a gene product” → gene product
  ‣ molecular function, biological process, cellular component → GO terms
  ‣ “a particular method” → evidence
  ‣ “a particular reference” → reference

GO Annotation
• Anatomy of an annotation:
  ‣ Database object (gene or gene product)
  ‣ GO term ID
  ‣ Reference ID
  ‣ Evidence code: IDA, IPI, IMP, IEP, IGI, ISS, IEA, TAS, NAS, or IC
  ‣ Optional and supporting items, e.g. term qualifiers, gene product names, etc.

GO Annotation
• Annotation methods
• Electronic annotation
  ‣ sequence similarity
  ‣ transitive annotation
  ‣ nomenclature, other text matching
  ➡ high quantity but low quality
• Manual annotation
  ‣ literature curation by biologists
  ➡ time consuming but high quality

GO Annotation
[Figure: the example graph (cell, membrane, mitochondrion, lysosomal membrane, mitochondrial membrane) with the gene products grx2 and mgr2 attached to terms]
Annotate to any level within GO DAG

GO & gene expression
D.R. Williams et al.
(2005) Physiol Genomics 24(1): 13–22 (Fig. 2)
[Figure labels: Genes; GO biological process]

Why GO Changes
• Advances in biology
• New groups join, requiring new terms or different relationships between terms
• Update legacy terms
• Improve logical consistency

How GO Changes: Overview of the GO Editorial Procedure
[Diagram: curators, annotators, anyone else → tracking system → editorial process → changes made public]
Editorial process:
- checklist of steps; discussion
- document in tracker
- implement using OBO-Edit and CVS

Sources of Ideas
• Curators: implement decisions from meetings; ongoing maintenance; specific areas of interest
• Annotators: using GO to describe gene products
• Interest groups: involving experts in their areas of expertise
• Computation: parsing, reasoning

GO Interest groups
• Cover specific topics within GO
• Include GO curators and outside experts
• Group members develop portions of the ontology, then report to GO Consortium
Sample topics:
• metabolism
• developmental biology
• plant biology
• transcription

Ontology Rules (examples)
• Univocity
  ‣ A term (or relationship) should mean the same thing every time it’s used
• Positivity
  ‣ Complements of classes are not themselves classes; i.e., terms such as ‘non-mammal’ or ‘non-membrane’ do not designate genuine classes
• Objectivity
  ‣ Which classes exist does not depend on our biological knowledge; terms such as ‘unknown’, ‘unclassified’ or ‘unlocalized’ are thus unsuitable

Univocity: GO uses one term and many characterized synonyms
• Synonyms: Tactition, Taction, Tactile sense
• Term: perception of touch ; GO:0050975

The Challenge of Univocity: People use the same words to describe different things
[image] = bud initiation
[image] = bud initiation
[image] = bud initiation
Bud initiation?
How is a computer to know?

Univocity: GO adds “sensu” descriptors to discriminate among organisms
[image] = bud initiation sensu Metazoa
[image] = bud initiation sensu Saccharomyces
[image] = bud initiation sensu Viridiplantae

The Challenge of Positivity
[Figure label: centriole]
• Some organelles are membrane-bound.
• A centrosome is not a membrane-bound organelle, but it still may be considered an organelle.

The Challenge of Positivity: Sometimes absence is a distinction in a biologist’s mind
• centriole: non-membrane-bound organelle GO:0043228
• nucleus: membrane-bound organelle GO:0043227
Note the logical difference between “non-membrane-bound organelle” and “not a membrane-bound organelle”
The latter includes everything that is not a membrane-bound organelle!

Ontology alignment
One of the current goals of GO is to align Cell Types in GO with Cell Types in the Cell Ontology:
• keratinocyte differentiation ↔ keratinocyte
• fat cell differentiation (synonym: adipocyte differentiation) ↔ fat cell (synonym: adipocyte)
• dendritic cell activation ↔ dendritic cell
• lymphocyte proliferation ↔ lymphocyte
• T cell homeostasis ↔ T cell
• garland cell differentiation ↔ garland cell
• heterocyst differentiation ↔ heterocyst

GO + CL: Alignment will permit the generation of consistent and complete definitions
Cell type (CL):
  id: CL:0000062
  name: osteoblast
  def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone." [MESH:A.11.329.629]
  is_a: CL:0000055
  relationship: develops_from CL:0000008
  relationship: develops_from CL:0000375
Osteoblast differentiation (GO): Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
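The osteoblast record above is a list of `tag: value` lines, so it can be read by machine with very little code. The following is a minimal Python sketch that handles only the six lines shown on the slide; it is not a full OBO parser, and the function names `parse_stanza` and `relationships` are made up for this illustration.

```python
from collections import defaultdict

# The osteoblast record as it appears on the slide.
STANZA = '''\
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone." [MESH:A.11.329.629]
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
'''


def parse_stanza(text):
    """Collect 'tag: value' lines into a dict of lists (a tag may repeat)."""
    tags = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first ': ' only, so IDs such as CL:0000062 stay intact.
        tag, _, value = line.partition(": ")
        tags[tag].append(value)
    return dict(tags)


def relationships(stanza):
    """Split each 'relationship' value into a (relation, target ID) pair."""
    return [tuple(value.split(None, 1)) for value in stanza.get("relationship", [])]


cell = parse_stanza(STANZA)
print(cell["id"][0], cell["name"][0])
for relation, target in relationships(cell):
    print(relation, target)
```

Storing every tag as a list keeps the two `develops_from` relationships side by side. These are the two progenitor cell types that the GO text definition spells out in words.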
New Definition = GO + CL: Consistent and complete definitions
  id: GO:0001649
  name: osteoblast differentiation
  synonym: osteoblast cell differentiation
  genus: differentiation GO:0030154 (differentiation)
  differentium: acquires_features_of CL:0000062 (osteoblast)
  definition (text): Processes whereby a relatively unspecialized cell acquires the specialized features of an osteoblast, the mesodermal cell that gives rise to bone
Formal definitions with necessary and sufficient conditions, in both human readable and computer readable forms

Other ontologies that can be aligned with GO
• Chemical ontologies
  ‣ 3,4-dihydroxy-2-butanone-4-phosphate synthase activity
• Anatomy ontologies
  ‣ metanephros development
• GO itself
  ‣ mitochondrial inner membrane peptidase activity

Acknowledgements
• The GO Editorial Office
  ‣ Jane Lomax
  ‣ Amelia Ireland
  ‣ Jennifer Clark
• GO Consortium members
  ‣ special thanks to David Hill (MGI) for slides
• Funding: The Gene Ontology Consortium is supported by a P41 grant from the National Human Genome Research Institute (NHGRI), and has been supported by grants from the European Union RTD Programme "Quality of Life and Management of Living Resources." The Gene Ontology project also thanks AstraZeneca for financial support.

The GO Consortium
www.geneontology.org
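
Appendix sketch: the DAG and annotation slides earlier in the talk can be illustrated with a few lines of Python. Everything specific here is an assumption for illustration. The edges between the five example terms are guessed, because the slide's figure is not reproduced in this text. The grx2 annotation, its evidence code and its reference are hypothetical placeholders. The names `PARENTS`, `ancestors`, `Annotation` and `implied_terms` are invented. The sketch shows a term with more than one parent, and GO's convention that an annotation to a term also holds for that term's parents.

```python
from dataclasses import dataclass

# ASSUMED edges among the slide's example terms (the figure itself is lost).
# Each term maps to a list of (relationship type, parent term) pairs.
PARENTS = {
    "cell": [],
    "membrane": [("part_of", "cell")],
    "mitochondrion": [("part_of", "cell")],
    "lysosomal membrane": [("is_a", "membrane")],
    "mitochondrial membrane": [("is_a", "membrane"), ("part_of", "mitochondrion")],
}

# Evidence codes listed on the "Anatomy of an annotation" slide.
EVIDENCE_CODES = {"IDA", "IPI", "IMP", "IEP", "IGI", "ISS", "IEA", "TAS", "NAS", "IC"}


def ancestors(term):
    """Return every term reachable by following parent edges upward."""
    seen = set()
    stack = [term]
    while stack:
        for _relation, parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


@dataclass(frozen=True)
class Annotation:
    """The four required parts of an annotation named on the slides."""
    gene_product: str
    go_term: str
    evidence: str
    reference: str


def implied_terms(annotation):
    """The annotated term plus all of its ancestors in the DAG."""
    if annotation.evidence not in EVIDENCE_CODES:
        raise ValueError(f"unknown evidence code: {annotation.evidence}")
    return {annotation.go_term} | ancestors(annotation.go_term)


# Hypothetical annotation: term, evidence code and reference are placeholders.
example = Annotation("grx2", "mitochondrial membrane", "IDA", "REF:hypothetical")
print(sorted(implied_terms(example)))
```

Because "mitochondrial membrane" has two parents in this assumed graph, the walk upward reaches both "membrane" and "mitochondrion" before arriving at "cell". A tree could not represent this, which is why GO is structured as a DAG.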