Principles for Building Biomedical Ontologies

Talk delivered by Jennifer Clark, GO Editorial Office

[Figure: is_a and part_of relations. Clark et al., 2005]

Slides and content by:
Barry Smith, http://ifomis.de
• formal ontology
• information science
• special reference to the bio-medical domain
Rama Balakrishnan, David Hill, Jennifer Clark. http://www.geneontology.org

The Rules
1. Univocity
2. Positivity
3. Objectivity
4. Single Inheritance
5. Intelligibility of Definitions
6. Basis in Reality

classes: GO terms, types, kinds, universals
instances: annotated gene product attributes, tokens, individuals, particulars

1 Univocity: Terms should have the same meanings on every occasion of use

The Challenge of Univocity: People use the same words to describe different things
[Three images, each labelled] = bud initiation
Bud initiation? How is a computer to distinguish?

Univocity: GO adds “sensu” descriptors to discriminate among organisms
= bud initiation sensu Metazoa
= bud initiation sensu Saccharomyces
= bud initiation sensu Viridiplantae

The Challenge of Univocity: People call the same thing by different names
Tactition, Taction, Tactile sense?

Univocity: GO uses 1 term and many characterized synonyms
Term: perception of touch ; GO:0050975
Synonyms: Tactition, Taction, Tactile sense

Univocity in part_of relation
‘is at times part of’: antlers part_of red deer
‘necessarily is_part’: Seed dormancy part_of seed development
‘necessarily has_part’: Plant embryo part_of seed

2 Positivity: The complements of classes are not themselves classes.
Set of all things
Vertebrates
non-vertebrates
Images: http://www.cucco.org/CatPictures/Cat%20Nap.jpg, http://www.digibarn.com/collections/systems/canon-cat/Image53.jpg

Set of all organisms
Vertebrates
Invertebrates
Images: http://www.cucco.org/CatPictures/Cat%20Nap.jpg, http://www.artalyst.com/files/userimages/user70/21088058-O.preview.jpg

membrane-bound organelle GO:0043227
Non-membrane-bound organelle vs. not a membrane-bound organelle
Non-membrane-bound organelles: A centrosome is not a membrane-bound organelle, but it may still be considered an organelle.

3 Objectivity: The existence of classes is not dependent on our biological knowledge.
Image: http://news.bbc.co.uk/1/hi/sci/tech/4501152.stm
Terms such as ‘unlocalised’, ‘unknown’ and ‘unclassified’ do not designate biological natural kinds.
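The Positivity and Objectivity rules can be approximated by a simple name check. The sketch below is only a heuristic under stated assumptions: the function name `lint_term_name` and the two patterns are hypothetical and are not part of any GO tooling. It flags candidate term names for a curator to review, nothing more.

```python
import re

# Assumed patterns for illustration only (not GO policy).
# A leading "non-" or "not " suggests a class defined as a complement.
COMPLEMENT = re.compile(r"^(non-|not )", re.IGNORECASE)
# These words describe the state of our knowledge, not a biological kind.
KNOWLEDGE = re.compile(r"\b(unknown|unclassified|unlocali[sz]ed)\b", re.IGNORECASE)


def lint_term_name(name):
    """Return the names of the rules a term name may violate."""
    flagged = []
    if COMPLEMENT.search(name.strip()):
        flagged.append("positivity")
    if KNOWLEDGE.search(name):
        flagged.append("objectivity")
    return flagged


if __name__ == "__main__":
    for term in ["non-vertebrates", "molecular function unknown",
                 "membrane-bound organelle"]:
        print(term, lint_term_name(term))
```

A name-based check can only nominate candidates. The centrosome example above suggests that a name beginning with "non-" may still pick out a genuine kind of organelle, so every flagged term still needs human judgment.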
Task: Annotate molecular function of 10-4, a gene from Drosophila melanogaster
Molecular function ontology: molecular function unknown is_a molecular function
Annotations: 10-4

4 Single Inheritance: No class in a classification hierarchy should have more than one is_a parent on the immediate higher level
[Figure: is_a and part_of relations. Clark et al., 2005]

Rule of Single Inheritance
no diamonds:
[Diagram: classes A, B and C linked by is_a1 and is_a2]

Problems with multiple inheritance
[Diagram: classes A, B and C linked by is_a1 and is_a2]
‘is_a’ no longer univocal (univocal: having only one meaning)

Is_a diamond in GO Process
behavior
  locomotory behavior (is_a behavior)
  larval behavior (is_a behavior)
    larval locomotory behavior (is_a locomotory behavior, is_a larval behavior)

[Diagram: behavior; behavior of a thing; descriptive behavior; is_a; locomotory behavior; larval behavior; larval locomotory behavior]

Is_a diamond in GO Process
[Diagram: behavior; locomotory behavior; larval behavior; larval locomotory behavior, with the two is_a links labelled is_a1 and is_a2]

5 Intelligibility of Definitions: The terms used in a definition should be simpler than the term to be defined

cellular process
  is_a: cell differentiation
    part_of: cell fate specification
    part_of: cell development

cell differentiation
  is_a: osteoblast differentiation
  is_a: neuron differentiation
  is_a: adipocyte differentiation
  is_a: keratinocyte differentiation
  is_a: garland cell differentiation

‘X cell differentiation’
Essence = Genus + Differentiae
Genus: differentiation
Differentiae: a neuron (or x cell)

X cell differentiation
Differentiation of an x cell.

X cell differentiation
The process whereby a relatively unspecialized cell acquires specialized features of an x cell. [List characteristics of x cell.]
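Two of the ideas above lend themselves to a short sketch: finding classes with more than one is_a parent, and filling in the ‘X cell differentiation’ definition template. Everything here is an assumption made for illustration. The edge list `IS_A` reads the behavior diamond as a list of (child, parent) pairs, and the function names are hypothetical rather than part of any GO software.

```python
from collections import defaultdict

# Hypothetical (child, parent) is_a pairs, read off the behavior diamond.
IS_A = [
    ("locomotory behavior", "behavior"),
    ("larval behavior", "behavior"),
    ("larval locomotory behavior", "locomotory behavior"),
    ("larval locomotory behavior", "larval behavior"),
]


def multiple_is_a_parents(edges):
    """Map each class with more than one is_a parent to its sorted parents."""
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)
    return {c: sorted(p) for c, p in parents.items() if len(p) > 1}


def differentiation_definition(cell_type):
    """Fill the genus-differentia template for 'X cell differentiation'."""
    # Crude article choice based on the first letter; an assumption.
    article = "an" if cell_type[0].lower() in "aeiou" else "a"
    return ("The process whereby a relatively unspecialized cell acquires "
            f"specialized features of {article} {cell_type}.")


if __name__ == "__main__":
    print(multiple_is_a_parents(IS_A))
    print(differentiation_definition("neuron"))
```

Generating the definition from the cell type name keeps the genus fixed (differentiation) and varies only the differentia, which is the point of the template. The list of characteristics of the cell type would still have to come from a separate source.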
Process ontology / Cell Ontology
cone cell fate commitment / retinal_cone_cell
keratinocyte differentiation / keratinocyte
adipocyte differentiation / fat_cell
dendritic cell activation / dendritic_cell

[Term]
id: GO:0030182
name: neuron differentiation
namespace: biological_process
def: "The process whereby a relatively unspecialized cell acquires specialized features of a neuron." [GO:mah]
is_a: GO:0030154 ! cell differentiation
relationship: part_of GO:0048699 ! neurogenesis

[Term]
id: CL:0000540
name: neuron
def: "The basic cellular unit of nervous tissue. Each neuron consists of a body\, an axon\, and dendrites. Their purpose is to receive\, conduct\, and transmit impulses in the nervous system." [MESH:A.08.663]
xref_analog: FBbt:00005106
xref_analog: FBbt:00005146
is_a: CL:0000393 ! electrically responsive cell
is_a: CL:0000404 ! electrically signaling cell
relationship: develops_from CL:0000031 ! neuroblast

[Term]
id: GO:0030182
name: neuron differentiation
namespace: biological_process
def: "The process whereby a relatively unspecialized cell acquires specialized features of a neuron. The basic cellular unit of nervous tissue. Each neuron consists of a body\, an axon\, and dendrites. Their purpose is to receive\, conduct\, and transmit impulses in the nervous system." [MESH:A.08.663, GO:mah]
is_a: GO:0030154 ! cell differentiation
intersection_of: is_a GO:0030154 ! cell differentiation
intersection_of: has_participant CL:0000540 ! neuron

Other Ontologies that can be aligned with GO
Chemical ontologies: 3,4-dihydroxy-2-butanone-4-phosphate synthase activity
Anatomy ontologies: metanephros development

But Eventually…
[Cycle diagram: Building Ontology; Improve; Collaborate and Learn]

6 Basis in Reality: When building or maintaining an ontology, always think carefully about how classes relate to instances in reality

Catwoman: strength, speed, agility and ultra-keen senses of a cat.
Image: http://home.austarnet.com.au/davekimble/catwoman.jpg

Superman: strength, flight, x-ray vision, leaps over tall buildings in a single bound
Image: http://www.uncleodiescollectibles.com/doesnotcompute/2004-10-11/Actor%20Christoper%20Reeve.jpg

Ontology: cartoon character super power ontology
  super senses
    is_a: x-ray vision
    is_a: cat senses
  super physical powers
    is_a: super leaping
    is_a: super strength

Annotations (characters attached to terms):
  x-ray vision: Superman
  cat senses: Catwoman
  super leaping: Superman
  super strength: Catwoman

Annotations (instances of the classes):
  x-ray vision: Superman’s X-ray vision
  cat senses: Catwoman’s cat senses
  super leaping: Superman’s super leaping
  super strength: Catwoman’s super strength

Annotations (instances grouped under the parent classes):
  super senses: Superman’s X-ray vision, Catwoman’s cat senses
  super physical powers: Superman’s super leaping, Catwoman’s super strength

Ontology: molecular function
[Diagram, linked by is_a: binding; tetrapyrrole binding; cofactor binding; chlorophyll binding; heme binding; coenzyme binding; quinone binding]

Annotations (gene product attached to terms):
  chlorophyll binding: PSBI
  quinone binding: PSBI

Annotations (instances of the classes):
  chlorophyll binding: PSBI’s chlorophyll binding function
  quinone binding: PSBI’s quinone binding function

The Rules
1. Univocity: Terms should have the same meanings on every occasion of use
2. Positivity: Terms such as ‘non-mammal’ or ‘nonmembrane’ do not designate genuine classes.
3. Objectivity: Terms such as ‘unknown’ or ‘unclassified’ or ‘unlocalized’ do not designate biological natural kinds.
4. Single Inheritance: No class in a classification hierarchy should have more than one is_a parent on the immediate higher level
5. Intelligibility of Definitions: The terms used in a definition should be simpler (more intelligible) than the term to be defined
6. Basis in Reality: When building or maintaining an ontology, always think carefully about how classes relate to instances in reality

END. Spare slides follow.

How to define A is_a B
A is_a B =def.
1. A and B are names of universals (natural kinds, types) in reality
2. all instances of A are as a matter of biological science also instances of B

True path violation: What is it?
  chromosome part_of nucleus (part_of relationship)
  mitochondrial chromosome is_a chromosome (is_a relationship)

True path violation: What is it?
  nuclear chromosome part_of nucleus (part_of relationship)
  nuclear chromosome is_a chromosome (is_a relationship)
  mitochondrial chromosome is_a chromosome (is_a relationship)

The Importance of synonyms for utility: How do we represent the function of tRNA?
Molecular_function: Triplet_codon amino acid adaptor activity
GO Definition: Mediates the insertion of an amino acid at the correct point in the sequence of a nascent polypeptide chain during protein synthesis.
Synonym: tRNA

Main obstacle to integration
Current ontologies do not deal well with
• Time and
• Space and
• Instances (particulars)
Our definitions should link the terms in the ontology to instances in spatiotemporal reality

7 Distinguish Universals and Instances
Don’t forget instances when defining relations
part_of as a relation between classes: nucleus part_of cell
versus
part_of as a relation between instances: your heart part_of you
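The true path violation from the spare slides can be sketched as a small inheritance check: a class inherits every part_of statement made about its is_a ancestors, so a statement placed too high in the hierarchy leaks onto children for which it is false. The dictionaries and function names below are hypothetical and exist only to illustrate the chromosome example.

```python
def ancestors(term, is_a):
    """Return all transitive is_a ancestors of a term."""
    seen = set()
    stack = [term]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def implied_part_of(term, is_a, part_of):
    """part_of targets asserted on the term itself or on any is_a ancestor."""
    implied = set()
    for t in {term} | ancestors(term, is_a):
        implied |= set(part_of.get(t, ()))
    return implied


# Before the fix: part_of is asserted on the general class (assumed data).
BEFORE_IS_A = {"mitochondrial chromosome": ["chromosome"]}
BEFORE_PART_OF = {"chromosome": ["nucleus"]}

# After the fix: part_of is asserted only on nuclear chromosome.
AFTER_IS_A = {
    "nuclear chromosome": ["chromosome"],
    "mitochondrial chromosome": ["chromosome"],
}
AFTER_PART_OF = {"nuclear chromosome": ["nucleus"]}

if __name__ == "__main__":
    # Before: mitochondrial chromosome wrongly inherits part_of nucleus.
    print(implied_part_of("mitochondrial chromosome", BEFORE_IS_A, BEFORE_PART_OF))
    # After: the inherited set for mitochondrial chromosome is empty.
    print(implied_part_of("mitochondrial chromosome", AFTER_IS_A, AFTER_PART_OF))
```

The check does not decide which inherited statements are false; that remains a biological judgment. It only makes the inherited statements visible so that a curator can spot a case like a mitochondrial chromosome being placed in the nucleus.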