Chemical Ontologies
I533 Seminar
21.Feb.2006
Kent Holaday

Overview
• Define Ontology
• Knowledge Representations
• Applications of Ontologies
• Chemical Ontologies

Ontology Defined
Merriam-Webster Online Dictionary
1 : a branch of metaphysics concerned with the nature and relations of being
2 : a particular theory about the nature of being or the kinds of existents

Ontology Defined (cont.)
Google Definitions on the web
• An ontology is a controlled vocabulary that describes objects and the relations between them in a formal way, and has a grammar for using the vocabulary terms to express something meaningful within a specified domain of interest.
  Source: members.optusnet.com.au/~webindexing/Webbook2Ed/glossary.htm
• Ontology is the newest label attached to some KOSs. Ontologies are being developed as specific concept models by the Knowledge Management community. They can represent complex relationships between objects, and include the rules and axioms missing from semantic networks. Ontologies that describe knowledge in a specific area are often connected with systems for data mining and knowledge management.
  Source: www.und.nodak.edu/dept/library/Departments/abc/SACSEM-SemInGlossary.htm

Knowledge Representations
INFORMAL
• Tagging
• Folksonomies
Examples
• del.icio.us
• flickr
• Yahoo My Web 2.0
FORMAL
• Lists
• Thesauri
• Taxonomies
• Ontologies
http://www.biowisdom.com/ontology/faq_q1.htm
Examples
• IUPAC, MeSH, LCSH, XML schema/DTD

IUPAC Nomenclature
• Compendium of Chemical Terminology
carbon
Element number 6 of the periodic table of elements (electronic ground state 1s² 2s² 2p²). For a description of the various types of carbon as a solid the term carbon should be used only in combination with an additional noun or a clarifying adjective. See also amorphous carbon, carbon fibres, carbon material, glasslike carbon, graphitic carbon, non-graphitic carbon, pyrolytic carbon.
1995, 67, 479
• Nomenclature of Inorganic Compounds
• Nomenclature of Organic Chemistry
http://www.iupac.org/publications/books/seriestitles/nomenclature.html

MeSH Tree Structures 2006
1. Anatomy [A]
2. Organisms [B]
3. Diseases [C]
4. Chemicals and Drugs [D]
   o Inorganic Chemicals [D01] +
   o Organic Chemicals [D02] +
   o Heterocyclic Compounds [D03] +
   o Polycyclic Compounds [D04] +
   o Macromolecular Substances [D05] +
   o Hormones, Hormone Substitutes, and Hormone Antagonists [D06] +
   o Enzymes and Coenzymes [D08] +
   o Carbohydrates [D09] +
   o Lipids [D10] +
   o Amino Acids, Peptides, and Proteins [D12] +
   o Nucleic Acids, Nucleotides, and Nucleosides [D13] +
   o Complex Mixtures [D20] +
   o Biological Factors [D23] +
   o Biomedical and Dental Materials [D25] +
   o Pharmaceutical Preparations [D26] +
   o Chemical Actions and Uses [D27] +
5. Analytical, Diagnostic and Therapeutic Techniques and Equipment [E]
6. Psychiatry and Psychology [F]
7. Biological Sciences [G]
8. Physical Sciences [H]
9. Anthropology, Education, Sociology and Social Phenomena [I]
10. Technology and Food and Beverages [J]
11. Humanities [K]
12. Information Science [L]
13. Persons [M]
14. Health Care [N]
15. Publication Characteristics [V]
16. Geographic Locations [Z]

Visual vs Linguistic
Caffeine
CAS: 58-08-2
C8H10N4O2
Synonyms:
• 3,7-dihydro-1,3,7-trimethyl-1H-purine-2,6-dione
• Methyltheobromin
• guaranine
Source: Krallinger, M. et al. (2005) Text-mining approaches in molecular biology and biomedicine. DDT 10(6) 440

Data Sources
Structured
• Medline
• SwissProt
• ChemID Plus
• Medline Plus
• Chemical Abstracts
• NCBI databases
• Misc. databases
Unstructured
• Text documents
• Journal articles
• Lab notebooks
• Web pages
• Database BLOBs
• Email
Source: Gardner, S. (2005) Ontologies and semantic data integration. DDT 10(14) 1004

Semantic Web
Figure 1: The Semantic Web "layer cake" as presented by Tim Berners-Lee.
Source: Hendler, J. (2001) Agents and the semantic web.
http://www.cs.umd.edu/users/hendler/AgentWeb.html

Chemical Ontology
• Describe chemical objects and relationships
• Enable the search across multiple data sources
• Bridge some of the graphical versus linguistic representations

Fragment of Chemical Ontology
grouped_by_chemistry
  molecules
    organic molecules
      heterocyclic compounds
        bridged-ring heterocyclic compounds
          morphinans
            morphine
        two-ring heterocyclic compounds
          isoquinolines
            isoquinoline alkaloids
              morphinans
                morphine
[Figure: structure diagrams of morphine and morphinan; morphine IsA morphinan]
Source: Ennis, M. (2004) ChEBI A Dictionary of Chemical Entities with an Associated Ontology. SOFG-2, Philadelphia, October 23-26 2004

ChEBI: What is it?
• Chemical Entities of Biological Interest – an EBI database/dictionary of ‘biochemical compounds’ and other chemical entities of biochemical interest with an associated ontology
• ChEBI’s goal is to provide standard terminology of (bio)chemical compounds that should finally be used in biological databases
Source: Ennis, M. (2004) ChEBI A Dictionary of Chemical Entities with an Associated Ontology. SOFG-2, Philadelphia, October 23-26 2004

Relationships in ChEBI ontology
Current
• IsA: inherited from Chemical Ontology - class to class; instance to class
To be implemented…
• IsPartOf - group to molecule; group to group; group to class
• IsEnantiomerOf - molecule to molecule; cycles allowed
• IsTautomerOf - molecule to molecule; cycles allowed
• IsConjugateBaseOf/IsConjugateAcidOf - molecule to molecule (e.g. anion to acid)
• IsParentHydrideOf - molecule to molecule (later?)
Source: Ennis, M. (2004) ChEBI A Dictionary of Chemical Entities with an Associated Ontology. SOFG-2, Philadelphia, October 23-26 2004

[Figure: structure diagrams of Amino acid, L-Amino acid and D-Amino acid, linked by is_a, is_enantiomer_of, is_tautomer_of, is_part_of and is_conjugate_base_of]
Source: Ennis, M.
(2004) ChEBI A Dictionary of Chemical Entities with an Associated Ontology. SOFG-2, Philadelphia, October 23-26 2004

Source: http://www.cse.buffalo.edu/~rapaport/663/F03/ontology.html
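
Example: walking the ontology fragment
The morphine fragment and the relationship types listed under "Relationships in ChEBI ontology" can be tried out with a small in-memory graph. What follows is a minimal sketch in Python, not ChEBI's actual data model or API. The names TinyOntology, add_symmetric and build_morphine_fragment are hypothetical. Two further assumptions are made: grouped_by_chemistry is treated as a grouping label rather than an IsA parent, and two-ring heterocyclic compounds is placed under heterocyclic compounds based on the slide layout.

```python
from collections import defaultdict


class TinyOntology:
    """Hypothetical toy store of (subject, relation, object) statements."""

    def __init__(self):
        # Maps (relation, subject) to the set of objects it points at.
        self._edges = defaultdict(set)

    def add(self, subject, relation, obj):
        self._edges[(relation, subject)].add(obj)

    def add_symmetric(self, a, relation, b):
        # Symmetric relations such as is_enantiomer_of form a two-node
        # cycle, which the slide marks as "cycles allowed".
        self.add(a, relation, b)
        self.add(b, relation, a)

    def related(self, subject, relation):
        # Return a copy so callers cannot mutate the internal state.
        return set(self._edges.get((relation, subject), set()))

    def parents(self, term):
        return self.related(term, "is_a")

    def ancestors(self, term):
        """Every class reachable by following is_a edges upward."""
        seen = set()
        stack = [term]
        while stack:
            current = stack.pop()
            for parent in self.parents(current):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


def build_morphine_fragment():
    """Load the two IsA chains from the 'Fragment of Chemical Ontology' slide."""
    onto = TinyOntology()
    bridged_ring_chain = [
        "morphine",
        "morphinans",
        "bridged-ring heterocyclic compounds",
        "heterocyclic compounds",
        "organic molecules",
        "molecules",
    ]
    two_ring_chain = [
        "morphine",
        "morphinans",
        "isoquinoline alkaloids",
        "isoquinolines",
        "two-ring heterocyclic compounds",
        "heterocyclic compounds",  # assumed placement, see lead-in
    ]
    for chain in (bridged_ring_chain, two_ring_chain):
        # Each term is_a the term that follows it in the chain.
        for child, parent in zip(chain, chain[1:]):
            onto.add(child, "is_a", parent)
    return onto


if __name__ == "__main__":
    onto = build_morphine_fragment()
    onto.add_symmetric("L-amino acid", "is_enantiomer_of", "D-amino acid")
    for name in sorted(onto.ancestors("morphine")):
        print("morphine is_a (inherited):", name)
    print(onto.related("L-amino acid", "is_enantiomer_of"))
```

Because morphinans has two parents, a search for either "bridged-ring heterocyclic compounds" or "isoquinoline alkaloids" reaches morphine, which is the kind of cross-source lookup the Chemical Ontology slide describes. Only is_a is followed when collecting ancestors, so symmetric relations such as is_enantiomer_of cannot cause endless loops.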