Motivation | Proposal | Lexicon in Ontology Lifecycle | Experiences | Conclusions

Use of shared lexical resources for efficient ontological engineering

Jimeno Yepes (1), Jiménez Ruiz (2), Berlanga Llavori (2), Rebholz-Schuhmann (1)
(1) Rebholz Group, EBI, Wellcome Trust, Hinxton, UK
(2) Computer Languages and Systems, Universitat Jaume I, Spain

SWAT4LS, 28 November 2008

Outline
1 Motivation: HeC Use Case
2 A Common Terminological Resource for Life Sciences
3 The Role of Lexicons in the Ontology Lifecycle
4 Limitations in Current Resources
5 Conclusions and Future Work

Scenario: Health-e-Child project

The Health-e-Child (HeC) project...
- aims at the creation of an integrated (Grid-based) healthcare platform for European paediatrics
- deals with different children-related diseases
- considers different medical levels: from molecular to cellular, tissue, organ, individual and population
- requires an ontology-based representation of the domain

References
J. Freund et al.
“Health-e-Child: An integrated biomedical platform for grid-based pediatrics.” http://www.health-e-child.org/

Scenario: creation of JRAO

Juvenile Rheumatoid Arthritis Ontology (JRAO)
- JRAO is intended to describe the types of JRA
- The ILAR classification considers 7 types of JRA
- JRAs require specialised treatments and are diagnosed according to several factors:
    - Joints affected
    - Laboratory analysis
    - Occurrence of fever
    - Temporal evolution
- GALEN and NCI contain information relevant to JRA

[Figure: diagram relating JRAO (C1 ... C7) to NCI and Galen. Recoverable labels: Arthritis diseases, Arthropathy, Autoimmune Disease, Arthritis, Atrophic Arthritis, Polyarthritis, Juvenile Chronic Polyarthritis, Rheumatologic Disorder, Rheumatoid Arthritis, Juvenile Rheumatoid Arthritis, Joints, Drugs; relations: affects, isTreatedBy]

Challenges in Ontology Reuse and Integration

Reuse from available Ontologies
✓ We want to reuse well-established knowledge
✓ ... with little expertise in drugs, proteins, anatomy etc.
✗ Domain ontologies use different notations:
    - Juvenile Rheumatoid Arthritis
    - JuvenileArthritis
    - Chronic Childhood Arthritis
✗ Lexical information (i.e. synonyms) is not always found

Alignment Tools
✓ Great effort has been made in ontology alignment: http://www.ontologymatching.org/
✗ However, those techniques provide approximate mappings (with a confidence value):
    - schema distance
    - string matching (PositiveRheumatoidFactor ≈ NegativeRheumatoidFactor)
    - other statistical techniques
✓ Some of them try to use thesauri like WordNet or UMLS
✗ ... but they suffer from scalability problems

2 A Common Terminological Resource for Life Sciences

Problems will be relaxed with a common reference thesaurus

Terminological Resources and Ontologies

Definition (Terminological Resources)
- Collection of entity names in a domain with synonyms
- Taxonomy of more general and specific terms (DAG)
- Complex terms are described with natural language

Definition (Ontologies)
- Specification of a conceptualization [Gruber 93]
✓ A formal specification of a shared conceptualization [Borst 97]
- More expressivity (union, intersection, restrictions, disjointness) to define complex concepts

3 The Role of Lexicons in the Ontology Lifecycle

Ontology Lifecycle
[Figure: ontology lifecycle diagram]
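A minimal sketch of why a shared lexicon matters, using the labels from the Reuse and Alignment Tools slides. Pure string matching scores the opposite concepts PositiveRheumatoidFactor and NegativeRheumatoidFactor as near-identical, while a synonym lookup keeps them apart and still unifies the different JRA notations. `TOY_THESAURUS`, its concept identifiers and all helper names are hypothetical; the synonym groups are assumed from the slide text and are not taken from UMLS or any other real resource.

```python
import re
from difflib import SequenceMatcher


def normalize(label: str) -> str:
    """Split CamelCase and separators, then lowercase: 'JuvenileArthritis' -> 'juvenile arthritis'."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", label)
    spaced = re.sub(r"[_\-\s]+", " ", spaced)
    return spaced.strip().lower()


def string_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], as a stand-in for a string-matching aligner."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


# Hypothetical toy thesaurus: concept id -> list of synonymous terms (assumed for illustration).
TOY_THESAURUS = {
    "C_JRA": [
        "juvenile rheumatoid arthritis",
        "juvenile arthritis",
        "chronic childhood arthritis",
    ],
    "C_RF_POS": ["positive rheumatoid factor"],
    "C_RF_NEG": ["negative rheumatoid factor"],
}


def build_index(thesaurus: dict) -> dict:
    """Invert the thesaurus into a term -> concept id lookup table."""
    return {term: cid for cid, terms in thesaurus.items() for term in terms}


def same_concept(a: str, b: str, index: dict) -> bool:
    """True only if both labels are known to the thesaurus and map to the same concept."""
    ca = index.get(normalize(a))
    cb = index.get(normalize(b))
    return ca is not None and ca == cb


if __name__ == "__main__":
    index = build_index(TOY_THESAURUS)
    pairs = [
        ("PositiveRheumatoidFactor", "NegativeRheumatoidFactor"),
        ("JuvenileArthritis", "Chronic Childhood Arthritis"),
    ]
    for a, b in pairs:
        print(a, "|", b, "| similarity:", round(string_similarity(a, b), 2),
              "| same concept:", same_concept(a, b, index))
```

In this sketch the string measure ranks the positive/negative pair above the two JRA synonyms, whereas the lookup gives the opposite (and intended) answer. A dictionary lookup is also constant-time per label, which is one simple way to sidestep the scalability problem noted for thesaurus-based aligners, though a real resource of UMLS size raises further issues that this toy example does not address.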
Requirements and Acquisition

Requirements Specification
- Granularity scope (diseases, organs, cells...)
- Classification criteria (e.g. ILAR, EULAR...)
- Medical protocols: WHICH knowledge is to be represented
✓ A thesaurus will help in selecting the desired terms (synonyms) for labeling the future ontology concepts

Knowledge Acquisition (in HeC)
- Based on medical protocols (HOW to represent)
- Domain ontologies can provide a partial representation
- KA will require reuse → integration of disparate sources
✓ A common thesaurus will help during the KA step

Conceptualization and Maintenance

Conceptualization
- A thesaurus will provide the lexical terms for ontology concepts
- Same term and different interpretations:
    ACR:  JRA ≡ SystemicJRA ⊔ PolyArticularJRA ⊔ PauciarticularJRA
    ILAR: JIA ≡ SystemicJIA ⊔ PolyArticularJIA ⊔ OligoarticularJIA ⊔ Psoriatic Arthritis ⊔ Enthesitis-related Arthritis
✗ Thesauri do not always contain the desired term...
✓ ...but ontologies can help the thesaurus to evolve

Maintenance and Evolution
- The domain will evolve during the development
- Both ontologies and thesaurus should be updated accordingly
✓ Text resources could help in the evolution...
✗ ...but there is a decoupling between the lexicon/ontology development effort and the literature

4 Limitations in Current Resources

Experiences in HeC: Limitations in Current Resources

Current Thesaurus
✓ The UMLS Metathesaurus represents the best effort
✓ Terms from more than 100 terminologies (e.g. MeSH, SNOMED or ICD)
✗ Ambiguity: retinoblastoma (gene or disease?)
✗ Inappropriate term labels:
    - Long term names
    - Use of formulae or other codifications

OBO Ontologies
✓ Represent a huge community effort in the development of ontologies
✓ Ontologies are enriched lexically
✗ In some cases there is an overload of lexical entries
✗ The use of a shared reference thesaurus is still missing
✗ The underlying logic has limited expressivity
✗ Labels of complex concepts are closer to descriptions and definitions

Galen and FMA
✓ Represent more formal ontologies (Frames and DL)
✓ FMA uses Terminologia Anatomica (TA)
✗ Efforts independent of UMLS and OBO
✗ Galen contains little lexical information

5 Conclusions and Future Work

Conclusions
- A common thesaurus helps the ontology engineering lifecycle
- ... and this common thesaurus profits from the ontology engineering lifecycle
but...
- A common shared thesaurus requires a community effort
- Evolution (versioning) of the thesaurus and the ontologies must be managed
- The creation of expressive and powerful ontologies requires DL expertise

Next Steps

Light-weight thesaurus for Health-e-Child
- JIA, Cardio, Brain Tumours, ...
- Application within HeC tasks

Semantic Integration of Ontologies
- A thesaurus could be considered to normalize concept labels
- ... but conflicts could still arise due to semantic incompatibility

Feedback
Questions?
Thank you very much!

More information from...
- Temporal Knowledge Bases Group (UJI): http://krono.act.uji.es
- Rebholz’s Text Mining Group (EBI): http://www.ebi.ac.uk/Rebholz/