A Semantic Grid Browser for the Life Sciences Applied to the Study of Infectious Diseases

Knowledge Representation for the Web
Simon Jupp
Bio-Health Informatics Group, University of Manchester, UK
SWAT4LS, Edinburgh, 2008

Introduction
- Library science application for a Semantic Web
- Dynamic addition of semantic links to the existing web
- Improve document retrieval, indexing and navigation
- This is not science, but a science-enabling tool
- The application requires some domain knowledge
- What semantics do we need in the knowledge to support navigation?

COHSE (Conceptual Open Hypermedia SErvice)
- Document navigation system for the web
- The original COHSE used OWL ontologies for background knowledge
- The ontology structure is used to drive navigation around web documents
- SeaLife project: extending COHSE with a focus on life sciences use cases
- Do OWL ontologies meet all the requirements for SeaLife?

COHSE Architecture
[Figure: COHSE architecture diagram]

Knowledge requirements for a navigation system
What should the background knowledge provide?
1. Rich lexical support for adding appropriate meta-data to documents on the web
2. Semantics for representing relationships between terms, in particular to generalise or specialise a term
3.
A simple data structure flexible enough to incorporate a wide range of new or existing knowledge organisation systems (KOS)

OWL ontologies for navigation systems
1. Rich lexical support for adding meta-data/labels to documents on the web
   - The annotation space is flexible, but there are few built-in standards for rich label support
2. Semantics for representing relationships between terms, in particular to generalise or specialise a term
   - Yes, but strict (the class hierarchy provides a natural navigational structure)
3. A common data structure that is flexible enough to represent a wide range of new or existing knowledge bases
   - Conversions of existing knowledge bases to OWL are common, but do not always respect OWL semantics

Issues with using OWL
- OWL classes describe sets of instances and the conditions for set membership
- Modelling terminologies at the class (T-box) level is difficult, partly due to the universal nature of the statements you make in OWL
  - Tuberculosis is a Lung Disease
  - Cells have nuclei
  - Bacteria may cause pneumonia
- Most existing terminologies aren't built with the strict semantics of OWL in mind
- We want to model the relationships between terms, not a description of the instances

Tuberculosis in OWL
[Figure: graph of terms linked to Tuberculosis. Terms: Infectious Disease, TB, Bacteria, BCG vaccine, Isoniazid, Chest X-ray, Coughing, Lung, Mycobacterium bovis. Link labels: is a, abbreviation, caused by, vaccine, drug, diagnosis/detection, symptom, affects, similar to.]
- These are all useful links to terms relating to tuberculosis
- Modelling these relationships between classes in an OWL ontology is extremely difficult and is not necessary for building a navigation hierarchy

"Something to do with"
- The semantics of "something to do with" is all we need to build a connected graph of related terms that can aid document navigation
- We simply don't need to consider the philosophical and logical aspects of modelling domain terminology in an OWL T-box
- It is easier to merge and align two semantically different resources when we weaken the semantics
- Simple terminologies serve as useful intermediates for future OWL ontologies

Modelling terminologies at the instance level
- Use OWL vocabulary to build a schema for representing terminologies at the instance (A-box) level
- Define a set of data properties to capture richer lexical support for terminologies
- Define a set of object properties to capture the kinds of relationships between terms that we want to express in navigation systems
  - Generalise, specialise, related terms
- Exploit RDF/OWL machinery, e.g.
property characteristics for simple inferences
  - Extending/constraining the schema
  - Transitive, Symmetric, Property Chains

Simple Knowledge Organisation System (SKOS)
- W3C standard for representing Knowledge Organisation Systems (KOS) such as thesauri, classification schemes, subject heading systems, taxonomies and dictionaries on the web
- Uses RDF/OWL to define the schema; models terminologies at the instance level
- Rich support for labelling and documenting concept meta-data
  - Preferred label, alternate labels, hidden labels, definitions, examples, scope notes
- Semantic relationships for building concept hierarchies
  - Has Broader, Has Narrower, Related, Close Match, Exact Match
- http://www.w3.org/2004/02/skos/

Tuberculosis in SKOS
[Figure: the same graph of terms linked to Tuberculosis, with SKOS properties as link labels. Terms: Infectious Disease, TB, Bacteria, BCG vaccine, Chest X-ray, Isoniazid, Coughing, Lung, Mycobacterium bovis. Link labels: skos:altLabel, skos:broader, skos:narrower, skos:related.]
- Not a replacement for OWL, just an additional representation for terminologies
- A syntax for building navigational hierarchies

SKOS in COHSE
- The COHSE knowledge base now supports both SKOS and OWL representations
- Rapidly develop SKOS vocabularies to support navigation
- SKOS provides a standard syntax to represent a wide range of terminologies that don't readily convert into an OWL ontology, e.g. MeSH, Online Medical Dictionary, BioThesaurus
- The weaker semantics of SKOS are "enough" for some applications, such as navigation systems

Conversions to SKOS
- Reuse!
Publish existing life science ontologies, thesauri and dictionaries on the web using SKOS, for applications that only require SKOS semantics
- OWL, OBO, UMLS etc.
- Large coverage of terminologies that could be represented in SKOS

Areas covered: Proteins, Sequence, Pathways, Anatomy, Phenotype, Gene products, Development, Transcript, Plasmodium life cycle, Cell type

Terminologies listed:
- Protein covalent bond
- Protein domain
- UniProt taxonomy
- Sequence types features
- Genetic Context
- Pathway ontology
- Event (INOH pathway ontology)
- Systems Biology
- Protein-protein interaction and
- Mosquito gross anatomy
- Mouse adult gross anatomy
- Mouse gross anatomy and development
- C. elegans gross anatomy
- Arabidopsis gross anatomy
- Cereal plant gross anatomy
- Drosophila gross anatomy
- Dictyostelium discoideum anatomy
- Fungal gross anatomy (FAO)
- Plant structure
- Maize gross anatomy
- Medaka fish anatomy and development
- Zebrafish anatomy and development
- BRENDA tissue / enzyme source
- Molecule role
- Molecular Function
- Biological process
- Cellular component
- eVOC (Expressed Sequence Annotation for Humans)
- Arabidopsis development
- Cereal plant development
- Plant growth and developmental stage
- C. elegans development
- Drosophila development (FBdv, fly development.obo)
- Human developmental anatomy, abstract version
- Human developmental anatomy, timed version
- NCI Thesaurus
- Mouse pathology
- Human disease
- Cereal plant trait
- PATO (attribute and value.obo)
- Mammalian phenotype
- Habronattus courtship
- Loggerhead nesting
- Animal natural history and life history

NeLI use case
National Electronic Library of Infection
- Built a new SKOS vocabulary to improve search/navigation around their website (powered by COHSE; evaluation ongoing)
- Reuse existing vocabularies that have been converted to SKOS to fill in the gaps (MeSH, OBO Disease Ontology)

SKOS relation from neli:Polio_Virus    SKOS Concept
skos:altLabel                          Brunhilde Virus
skos:broader                           Spinal cord disease
skos:narrower                          Postpoliomyelitis Syndrome
skos:broader                           Microorganism
skos:broader                           Enterovirus
Sources: MeSH, Disease Ontology, SNOMED

Conclusion
- We need a large knowledge artefact that supports navigation between related web resources
- Rapid generation and reuse of existing terminologies (cheap)
- Loosening the semantics of our model enables this with an acceptable trade-off
- SKOS is a suitable data model to represent our background knowledge
- We have an implementation in COHSE and use cases from the life sciences that are good showcases for semantic web technologies

Conclusion
- SKOS is new; we are still exploring what it can do for the life sciences and the semantic web, e.g. ontology indexing, a resource for text mining
- Not a replacement for OWL, but a cost-effective alternative for certain tasks
- This is not a criticism of OWL, rather a criticism of the misuse of OWL

Thank you.

SKOSEd
Thesaurus editor for the Semantic Web
http://code.google.com/p/skoseditor/

Acknowledgments
Manchester: Robert Stevens, Sean Bechhofer, Yeliz Yesilada
Other: COHSE developers, SeaLife project, NeLI, Patty Kostkova
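
Appendix: SKOS relations as navigation links (illustrative sketch)

The NeLI slide lists SKOS relations from neli:Polio_Virus. As a rough illustration of how such instance-level statements could feed a navigation system, the sketch below stores them as plain triples and derives the links to offer for a concept. It is not COHSE code. The `ex:` identifiers, the function names, and the pairing of each relation with a concept (read off the flattened slide table) are assumptions made for this example. The only inferences applied are the standard SKOS ones: skos:broader and skos:narrower are inverses, and skos:related is symmetric.

```python
from collections import defaultdict

# Standard SKOS property characteristics used for simple inference.
INVERSE = {
    "skos:broader": "skos:narrower",
    "skos:narrower": "skos:broader",
    "skos:related": "skos:related",  # symmetric property
}

# Hypothetical triples; the relation/concept pairings are an assumption
# based on the NeLI slide, and the "ex:" identifiers are made up.
TRIPLES = [
    ("neli:Polio_Virus", "skos:altLabel", "Brunhilde Virus"),
    ("neli:Polio_Virus", "skos:broader", "ex:Spinal_cord_disease"),
    ("neli:Polio_Virus", "skos:narrower", "ex:Postpoliomyelitis_Syndrome"),
    ("neli:Polio_Virus", "skos:broader", "ex:Microorganism"),
    ("neli:Polio_Virus", "skos:broader", "ex:Enterovirus"),
]


def build_index(triples):
    """Index triples by subject and property, adding inverse statements."""
    index = defaultdict(lambda: defaultdict(set))
    for subject, prop, obj in triples:
        index[subject][prop].add(obj)
        if prop in INVERSE:
            # e.g. A skos:broader B implies B skos:narrower A
            index[obj][INVERSE[prop]].add(subject)
    return index


def navigation_links(index, concept):
    """Return the labels and related concepts to offer as links."""
    props = index.get(concept, {})
    return {
        name: sorted(props.get("skos:" + name, ()))
        for name in ("altLabel", "broader", "narrower", "related")
    }


if __name__ == "__main__":
    idx = build_index(TRIPLES)
    print(navigation_links(idx, "neli:Polio_Virus"))
    print(navigation_links(idx, "ex:Enterovirus"))
```

Because the statements sit at the instance level, adding a vocabulary converted from another source amounts to appending more triples; no class definitions need to be reconciled.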