Ontology Tutorial Part 1: What is Ontology and What Can It Do?
Barry Smith http://ontology.buffalo.edu/smith
1 The problem of data integration / information fusion: about 30,000 genes in a human; probably 100,000-200,000 proteins; individual variation in most genes; 100s of cell types; 100,000s of disease types
2 Musculo-skeletal system, Circulatory system, Respiratory system, Digestive system, Nervous system, Urinary system, Reproductive system, Endocrine system, Lymphoid system, Organism, Organ, Tissue, Muscle tissue, Nerve tissue, Connective tissue, Epithelial tissue, Blood, Cell, Organelle, Mitochondria, Nucleus, Endoplasmic reticulum, Cell membrane, Protein, DNA
3 The Challenge: Each (clinical, pathological, genetic, proteomic, pharmacological …) information system uses its own terminology and category system. Biomedical research demands the ability to navigate through all such information systems. How can we overcome the incompatibilities which become apparent when data from distinct sources are combined?
4 Answer: “Ontology”
5 Three senses of ontology
1. Philosophical sense: an inventory of the types of entities and relations in reality
2. Knowledge engineering sense: an ontology as a consensus representation of the concepts used in a given domain (Semantic Web)
3. Ontology as controlled vocabulary (Gene Ontology, Open Biological Ontologies Consortium)
7 Ontology as a branch of philosophy seeks to establish: the basic formal-ontological structures; the kinds and structures of objects, properties, events, processes and relations in each material domain of reality
8 Formal ontology: an analogue of pure mathematics; can be applied to different domains
9 Material ontology: a kind of generalized chemistry or zoology (Aristotle’s ontology grew out of biological classification)
10 Aristotle: world’s first ontologist
11 World’s first ontology (from Porphyry’s Commentary on Aristotle’s Categories)
12 Linnaean Ontology
13 Formal Ontology
– theory of part and whole
– theory of dependence / unity
– theory of boundary, continuity and contact
– theory of universals and instances
– theory of continuants and occurrents (objects and processes)
– theory of functions and functioning
– theory of granularity
14 Formal Ontology: the theory of those ontological structures (such as part-whole, universal-particular) which apply to all domains whatsoever
15 Formal-Ontological Categories: substance, process, function, unity, plurality, site, dependent part, independent part. These are able to form complex structures in nonarbitrary ways, joined by relations such as part, dependence, location.
16 A Network of Domain Ontologies: Basic Formal Ontology; Material (Regional) Ontologies
18 Three senses of ontology
1. Philosophical sense: an inventory of the types of entities and relations in reality
2. Knowledge engineering sense: an ontology as a consensus representation of the concepts used in a given domain (Semantic Web)
3.
Ontology as controlled vocabulary (Gene Ontology, Open Biological Ontologies Consortium)
19 Assumptions: Communication / compatibility problems should be solved automatically (by machine). Hence ontologies must be applications running in real time
20 Application ontology: Ontologies are inside the computer and thus subject to severe constraints on expressive power (effectively the expressive power of Description Logic)
21 Problem: Confusion of concepts and entities in reality. Don’t construct theories of reality; construct ‘models’ of ‘concepts’
22 Ontology in the Knowledge Engineering Sense: The Semantic Web
23 A new silver bullet
24 The Semantic Web: designed to integrate the vast amounts of heterogeneous online data and services via dramatically better support at the level of metadata; designed to yield the ability to query and integrate across different conceptual systems
25 Tim Berners-Lee, inventor of the World Wide Web, ‘sees a more powerful Web emerging, one where documents and data will be annotated with special codes allowing computers to search and analyze the Web automatically. The codes … are designed to add meaning to the global network in ways that make sense to computers’
26 Hyperlinked vocabularies, called ‘ontologies’, will be used by Web authors ‘to explicitly define their words and concepts as they post their stuff online.’
‘The idea is the codes would let software "agents" analyze the Web on our behalf, making smart inferences that go far beyond the simple linguistic analyses performed by today's search engines.’
27 Exploiting tools such as: XML; OWL (Web Ontology Language); RDF (Resource Description Framework); DAML+OIL (DARPA Agent Markup Language + Ontology Inference Layer) (confusing syntactic integration with semantic integration)
28 Ontology confused with the language of ontology. ‘Ontology’ for semantic webbers is without content. Philosophical ontology = build a theory of reality. Semantic-web-style ontology = build a model of the data in our computers
29 Defining ‘gene’. GDB: a gene is a DNA fragment that can be transcribed and translated into a protein. GenBank: a gene is a DNA region of biological interest with a name and that carries a genetic trait or phenotype
30 Example: The Enterprise Ontology. A Sale is an agreement between two Legal-Entities for the exchange of a Product for a Sale-Price. A Strategy is a Plan to Achieve a high-level Purpose. A Market is all Sales and Potential Sales within a scope of interest.
31 Example: Statements of Accounts. Company financial statements may be prepared under either the (US) GAAP or the (European) IASC standards. These allocate cost items to different categories depending on the laws of the countries involved.
32 Job: to develop an algorithm for the automatic conversion of income statements and balance sheets between the two systems. Not even this relatively simple problem has been satisfactorily resolved … why not? Because the very same terms mean different things and are applied in different ways in different cultures
33 The Semantic Web Initiative: The Web is a vast edifice of heterogeneous data sources. Needs the ability to query and integrate across different conceptual systems
34 How to resolve incompatibilities? Enforce terminological compatibility via standardized term hierarchies, with standardized definitions of terms, which 1.
satisfy the constraints of a description logic (DL)
2. are applied as meta-tags to the content of websites
35 Clay Shirky: The Semantic Web is a machine for creating syllogisms. Humans are mortal. Greeks are human. Therefore, Greeks are mortal
36 Lewis Carroll
- No interesting poems are unpopular among people of real taste
- No modern poetry is free from affectation
- All your poems are on the subject of soapbubbles
- No affected poetry is popular among people of real taste
- No ancient poetry is on the subject of soapbubbles
Therefore: All your poems are bad.
37 The promise of the Semantic Web: it will improve all the areas of your life where you currently use syllogisms
38 We can use the Semantic Web to prove that Joe loves Mary: we found two documents on a trusted site, one of which said that ":Joe :loves :MJS", and another of which said that ":MJS daml:equivalentTo :Mary". We also got the checksums of the files in person from the maintainer of the site. To check this information, we can list the checksums in a local file, and then set up some FOPL rules that say "if file 'a' contains the information Joe loves mary and has the checksum md5:0qrhf8q3hfh, then record SuccessA", "if file 'b' contains the information MJS is equivalent to Mary, and has the checksum md5:0892t925h, then record SuccessB", and "if SuccessA and SuccessB, then Joe loves Mary". [http://infomesh.net/2001/swintro/]
39 Merging Databases: Merging databases simply becomes a matter of recording in RDF somewhere that "Person Name" in your database is equivalent to "Name" in my database, and then throwing all of the information together and getting a processor to think about it. [http://infomesh.net/2001/swintro/] Is your "Person Name = John Smith" the same person as my "Name = John Q. Smith"? Who knows?
Not the Semantic Web
40 XML-syntax does not help
<BUSINESS-CARD>
  <FIRSTNAME>Jules</FIRSTNAME>
  <LASTNAME>Deryck</LASTNAME>
  <COMPANY>Newco</COMPANY>
  <MEMBEROF>XTC Group</MEMBEROF>
  <JOBTITLE>Business Manager</JOBTITLE>
  <TEL>+32(0)3.471.99.60</TEL>
  <FAX>+32(0)3.891.99.65</FAX>
  <GSM>+32(0)465.23.04.34</GSM>
  <WEBSITE>www.newco.com</WEBSITE>
  <ADDRESS>
    <STREET>Dendersesteenweg 17</STREET>
    <ZIP>2630</ZIP>
    <CITY>Aartselaar</CITY>
    <COUNTRY>Belgium</COUNTRY>
  </ADDRESS>
</BUSINESS-CARD>
41 and with correct XML-syntax, the same card still leaves questions open:
42 Is "Jules" the first name of the person, or of the business-card?
43 Is Jules or Newco the member of XTC Group?
44 Do the phone numbers and address belong to Jules or to the business?
45 Shirky: The Semantic Web's philosophical argument -- the world should make more sense than it does -- is hard to argue with. The Semantic Web, with its neat ontologies and its syllogistic logic, is a nice vision. However, like many visions that project future benefits but ignore present costs, it requires too much coordination and too much energy to be effective in the real world …
46 Semantic Web effort thus far devoted primarily to developing systems for standardized representation of web pages and web processes (= ontology of web typography), not to the harder task of developing ontologies (term hierarchies) for the content of such web pages
47 Cory Doctorow: A world of exhaustive, reliable metadata would be a utopia.
48 Problem 1: People lie. Meta-utopia is a world of reliable metadata. But poisoning the well can confer benefits to the poisoners. Metadata exists in a competitive world. Some people are crooks. Some people are cranks. Some people are French philosophers.
49 Problem 2: People are lazy. Half the pages on Geocities are called “Please title this page”
50 Problem 3: People are stupid. The vast majority of the Internet's users (even those who are native speakers of English) cannot spell or punctuate. Will internet users learn to accurately tag their information with whatever DL hierarchy they're supposed to be using?
51 Problem 4: Ontology Impedance = semantic mismatch between ontologies being merged. This problem is recognized in the Semantic Web literature: http://ontoweb.aifb.uni-karlsruhe.de/About/Deliverables/ontoweb-del-7.6-swws1.pdf
52 Solution 1: treat it as (inevitable) ‘impedance’ and learn to find ways to cope with the disturbance which it brings. Suggested here: http://ontoweb.aifb.uni-karlsruhe.de/About/Deliverables/ontoweb-del-7.6-swws1.pdf
53 Solution 2: resolve the impedance problem on a case-by-case basis. Suppose two databases are put on the web. Someone notices that "where" in the friends table and "zip" in the places table mean the same thing. http://www.w3.org/DesignIssues/Semantic.html
54 Both solutions fail
1. treating mismatches as ‘impedance’ ignores the problem of error propagation (and is inappropriate in an area like medicine)
2. resolving impedance on a case-by-case basis defeats the very purpose of the Semantic Web
55 Ontology Impedance: ‘gene’ as used in websites issued by
- biotech companies involved in gene patenting
- medical researchers interested in the role of genes in predisposition to smoking
- insurance companies
56 The idea: distinguish two separate tasks:
- developing expressively rich, correct ontologies of given domains
- developing on this basis computer applications capable of running in real time
57 Basic Formal Ontology (BFO): The Vampire Slayer
59 BFO: ontology is not the ‘standardization’ or ‘specification’ of concepts (not a branch of knowledge or concept engineering) but an inventory of the types of entities existing in reality
60 BFO: not a computer application but a reference ontology in the sense of Aristotelian philosophy; it sacrifices tractability for the sake of expressive power
61 Defining ‘gene’. GDB: a gene is a DNA fragment that can be transcribed and translated into a protein. GenBank: a gene is a DNA region of biological interest with a name and that carries a genetic trait or phenotype
62 Ontology: ‘fragment’, ‘region’, ‘name’, ‘carry’, ‘trait’, ‘type’ ...
‘part’, ‘whole’, ‘function’, ‘inhere’, ‘substance’ … are ontological terms in the sense of traditional (philosophical) ontology
63 BFO: not just a system of categories but a formal theory with definitions, axioms, theorems, designed to provide formal resources for the building of reference ontologies for specific domains. The latter should be of sufficient richness that terminological incompatibilities can be resolved intelligently rather than by brute force
64 The Reference Ontology Community
- IFOMIS (Saarbrücken)
- Laboratories for Applied Ontology (Trento/Rome, Turin)
- Foundational Ontology Project (Leeds)
- Ontology Works (Baltimore)
- Department of Structural Biology (Seattle)
- Virtual Soldier Project (DARPA)
- Open Biological Ontologies Consortium (Cambridge, Berkeley, Bar Harbor)
66 Ontology Tutorial Part 2: The Future of Ontology in Biomedicine
67 Ontology Tutorial Part 2: The Future of Ontology in Buffalo
69 Three senses of ontology
1. Philosophical sense: an inventory of the types of entities and relations in reality
2. Knowledge engineering sense: an ontology as a consensus representation of the concepts used in a given domain (Semantic Web)
3. Ontology as controlled vocabulary (Gene Ontology, Open Biological Ontologies Consortium)
70 Philosophical Ontology: Ontologies are WINDOWS ON REALITY. Ontologies deal with classes/universals/invariants in reality which exist independently of our theorizing and independently of our language
71 What are universals? Invariants in reality satisfying biological laws (there are truths about universals in biological textbooks)
72 A universal is not determined by its instances, as a state is not determined by its citizens. A universal may vary with time, as an organism may vary with time (by gaining and losing molecules)
73 Universals are Not Sets. A set is an abstract structure, existing outside time and space. The set of Romans timelessly has Julius Caesar as a member.
Universals exist in time.
74 A Window on Reality
75 Medical Diagnostic Hierarchy
76 A hierarchy in the realm of diseases; Dependence Relations
77 A Window on Reality: Organisms, Diseases
79 [Figure: universals (substance, organism, animal, mammal, cat, siamese, frog) and their instances]
81 Many current standard ‘ontologies’ are ramshackle because they have no counterpart of formal ontology. The Unified Medical Language System (UMLS): a compendium of source vocabularies including:
- HL7 RIM
- SNOMED
- International Classification of Diseases
- MeSH – Medical Subject Headings
- Gene Ontology
82 Three senses of ontology
1. Philosophical sense: an inventory of the types of entities and relations in reality
2. Knowledge engineering sense: an ontology as a consensus representation of the concepts used in a given domain (Semantic Web)
3. Ontology as controlled vocabulary (Gene Ontology, Open Biological Ontologies Consortium)
83 Problem: The different source vocabularies are incompatible with each other
84 Problem: They contain bad coding, which often derives from failure to pay attention to simple logical or ontological principles or to principles of good definition
85 Bad Coding: Plant roots is-a Plant; Plant leaves is-a Plant; Pollen is-a Plant; Both testes is-a testis; Both uterii is-a uterus
86 Bad definitions: Heptolysis =def the cause of heptolysis; Biological process =def a biological goal that requires more than one function
87 The Concept Orientation: Work on biomedical ontologies grew out of work on medical dictionaries and nomenclatures. It has focused almost exclusively on ‘concepts’ (sometimes confused with terms/descriptions).
88 The Curse of Linguistics: Work on biomedical ontologies grew out of work on medical dictionaries and nomenclatures. This led to the assumption that all that need be said about classes can be said without appeal to time or to instances in reality.
Ontology is about meanings/terms/strings
89 An alternative research programme for ontology, based on philosophical principles: terms in bio-ontologies refer not to ‘concepts’ but to universals in reality
90 already reformed: Foundational Model of Anatomy, Anatomy Reference Ontology
91 [Figure: is-a hierarchy, "A window on reality". Nodes: Anatomical Entity, Physical Anatomical Entity, Conceptual Anatomical Entity, Anatomical Relationship, Material Physical Anatomical Entity, Non-material Physical Anatomical Entity, Body Substance, Anatomical Space, Anatomical Structure, Biological Macromolecule, Cell Part, Cell, Tissue, Organ, Organ Part, Organ System, Body Part, Human Body]
93 [Figure: hierarchy. Nodes: Anatomical Structure, Anatomical Space, Organ Cavity Subdivision, Organ Cavity, Organ, Serous Sac Cavity Subdivision, Serous Sac Cavity, Serous Sac, Organ Component, Organ Subdivision, Pleural Sac, Pleural Cavity, Parietal Pleura, Interlobar recess, Organ Part, Mediastinal Pleura, Tissue, Pleura (Wall of Sac), Visceral Pleura, Mesothelium of Pleura]
94 To represent ontological relations we need to take instances into account. To say A part_of B is not to say anything about Bs’ need for As as parts
95 part_of as a relation between universals: A part_of B =def given any x, if inst(x, A) then there is some y such that inst(y, B) and part(x, y). human testis part_of human being. But not: heart part_of human being.
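The instance-level definition of part_of above can be checked mechanically on a toy model. The following is a minimal sketch under stated assumptions: the instance names, the `inst` and `part` tables, and the helper functions `part_of` and `has_part` are all hypothetical and chosen only for illustration; they do not come from any ontology tool.

```python
# Toy model (all data hypothetical).
# inst maps each universal to the set of its instances;
# part is the set of instance-level (part, whole) pairs.
inst = {
    "human testis": {"testis_1"},
    "heart": {"heart_1", "heart_2"},
    "human being": {"john", "mary"},
}
part = {
    ("testis_1", "john"),
    ("heart_1", "mary"),
    ("heart_2", "tibbles"),  # a heart belonging to a cat, not to a human being
}


def part_of(a, b):
    """A part_of B: every instance x of A is part of some instance y of B."""
    return all(any((x, y) in part for y in inst[b]) for x in inst[a])


def has_part(b, a):
    """Converse reading: every instance y of B has some instance x of A as part."""
    return all(any((x, y) in part for x in inst[a]) for y in inst[b])


print(part_of("human testis", "human being"))   # True
print(part_of("heart", "human being"))          # False: heart_2 is in no human being
print(has_part("human being", "human testis"))  # False: mary has no testis
```

The last line illustrates the point of slide 94: asserting A part_of B quantifies over the instances of A only, so it says nothing about whether every B needs an A as part.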
96 already reformed: Foundational Model of Anatomy, Anatomy Reference Ontology
97 under construction / overhaul: Physiology Reference Ontology, Gene Ontology, OBOL
98 The Gene Ontology: a controlled vocabulary for annotations of genes and gene products
99 When a gene is identified, three important types of questions need to be addressed:
1. Where is it located in the cell?
2. What functions does it have on the molecular level?
3. To what biological processes do these functions contribute?
100 GO has three ontologies: biological processes, molecular functions, cellular components
101 GO is astonishingly influential: used by all major species genome projects; used by all major pharmacological research groups; used by all major bioinformatics research groups
102 GO is part of the Open Biological Ontologies consortium: Fungal Ontology, Plant Ontology, Yeast Ontology, Disease Ontology, Mouse Anatomy Ontology, Cell Ontology, Sequence Ontology, Relations Ontology
103 Each of GO’s ontologies is organized in a graph-theoretical structure involving two sorts of links or edges:
- is-a (= is a subtype of): copulation is-a biological process
- part-of: cell wall part-of cell
106 cellular components: 1372 component terms; molecular functions: 7271 function terms; biological processes: 8069 process terms
107 The Cellular Component Ontology (counterpart of anatomy): flagellum, chromosome, membrane, cell wall, nucleus
108 The Molecular Function Ontology: ice nucleation, protein stabilization, kinase activity, binding. The Molecular Function ontology is (roughly) an ontology of actions on the molecular level of granularity
109 Biological Process Ontology: glycolysis, copulation, death. An ontology of occurrents on the level of granularity of cells, organs and whole organisms
110 GO built by biologists: free of the Curse of Linguistics; free of the Curse of Computer Science
111 but problems still remain: menopause part_of aging; aging part_of death; menopause part_of death
112 heptolysis. Definition: The causes of heptolysis …
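If the third statement in the menopause example above is read as following from the first two, it is what any reasoner produces once part_of is treated as transitive. The sketch below is only an illustration under that assumption: `transitive_closure` is a hypothetical helper, and the only data are the two asserted edges from the slide.

```python
def transitive_closure(edges):
    """Return every (a, c) pair reachable by chaining (a, b), (b, c) edges."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# The two part_of assertions from the slide.
asserted = {("menopause", "aging"), ("aging", "death")}

# Edges that were never asserted but follow by transitivity.
inferred = transitive_closure(asserted) - asserted
print(inferred)  # {('menopause', 'death')}
```

A purely formal closure of this kind cannot tell a sound assertion from an unsound one; the unwanted conclusion goes away only if the asserted edges, or the definition of part_of itself, are corrected.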
113 regulation of sleep part_of sleep; extrinsic to membrane part_of membrane
114 GO uses only two relations: is_a and part_of
115 hence GO has only sentences of the forms A is_a B and A part_of B: no way to express ‘not’, no way to express ‘is localized at’, and no way to express ‘I don’t know’:
116 Holliday junction helicase complex is-a unlocalized; cellular component unknown is-a cellular component
117 Old GO definition of part_of: A part_of B =def A can be part of B
118 New GO definition of part_of, as part of current OBOL reform effort: A part_of B =def given any x, if inst(x, A) then there is some y such that inst(y, B) and part(x, y)
119 Analogous problems for nearly all foundational relations of ontologies and semantic networks: A causes B; A is associated with B; A is located in B; etc. Reference to instances is necessary to clear up these problems
121 The Future of Ontology in Buffalo, http://ontology.buffalo.edu/bcor/
- to provide a forum within which philosophical ontologists and those involved in ontology applications can work together in high-level interdisciplinary research
- to assist in coordination and integration of projects in ontological research being pursued in Buffalo
122 Gary Byrd, Charles Dement, Randall Dipert, John Eisner, Daniel Fischer, Louis Goldberg, Jorge Gracia, David Hershenov, Rajiv Kishore, Eric Little, James Llinas, David Mark, Bill Rapaport, Galina Rogova, Ram Ramesh, Stuart C. Shapiro, Barry Smith, Rohini Srihari, Moises Sudit
123 College of Arts and Sciences; Computer Science and Engineering; School of Management; Center of Excellence in Bioinformatics; School of Informatics; School of Dental Medicine; Center for Multisource Information Fusion; National Center for Geographic Information and Analysis; School of Medicine and Biomedical Sciences
124 Computer Science and Engineering, School of Management: Charles Dement, Pharma of the Future
125 Computer Science and Engineering: Daniel Fischer, Bill Rapaport, Stuart Shapiro, Rohini Srihari
126 School of Management: Ram Ramesh, Rajiv Kishore
127 Center of Excellence in Bioinformatics: Daniel Fischer
128 School of Informatics / School of Medicine: Gary Byrd, Medical Informatics Certificate Program
129 School of Dental Medicine: John Eisner, Louis Goldberg, SNODENT
130 Center for Multisource Information Fusion: Eric Little, James Llinas, Galina Rogova, Moises Sudit
131 National Center for Geographic Information and Analysis: David Mark, Barry Smith
132 Department of Philosophy: Barry Smith (Director?), Randall Dipert, Jorge Gracia, David Hershenov, Ingvar Johansson, Jiyuan Yu
133 Goal: To show how philosophical ontology can contribute to the successful application of ontologies in information systems