Use of shared lexical resources for efficient ontological engineering
Jimeno Yepes (1), Jiménez Ruiz (2), Berlanga Llavorí (2), Rebholz-Schuhmann (1)
(1) Rebholz Group, EBI, Wellcome Trust, Hinxton, UK
(2) Computer Languages and Systems, Universitat Jaume I, Spain
SWAT4LS 28 November, 2008
Outline
1. Motivation: HeC Use Case
2. A Common Terminological Resource for Life Sciences
3. The Role of Lexicons in the Ontology Lifecycle
4. Limitations in Current Resources
5. Conclusions and Future Work
Scenario: Health-e-Child project
The Health-e-Child (HeC) project ...
aims at the creation of an integrated (Grid-based) healthcare platform for European paediatrics
deals with different childhood diseases
considers different medical levels: from molecular to cellular, tissue, organ, individual and population
requires an ontology-based representation of the domain
References
J. Freund et al. "Health-e-Child: An integrated biomedical platform for grid-based pediatrics."
http://www.health-e-child.org/
Scenario: creation of JRAO
Juvenile Rheumatoid Arthritis Ontology (JRAO)
JRAO is intended to describe the types of JRA
ILAR classification considers 7 types of JRA
JRAs require specialised treatments and are diagnosed according to several factors:
Joints affected
Laboratory analysis
Occurrence of fever
Temporal evolution
GALEN and NCI contain information relevant to JRA
[Figure: JRAO (classes C1 ... C7) shown alongside the NCI and Galen ontologies. Labels recoverable from the diagram: Arthropathy, Autoimmune Disease, Arthritis, Atrophic Arthritis, Polyarthritis, Juvenile Chronic Polyarthritis, Rheumatologic Disorder, Rheumatoid Arthritis, Juvenile Rheumatoid Arthritis (NCI); Arthritis diseases, Joints, Drugs; relations: affects, isTreatedBy.]
Challenges in Ontology Reuse and Integration
Reuse from available Ontologies
✓ We want to reuse well-established knowledge
✓ ... with little expertise in drugs, proteins, anatomy etc.
✗ Domain ontologies use different notations for the same concept, e.g.:
Juvenile Rheumatoid Arthritis
JuvenileArthritis
Chronic Childhood Arthritis
✗ Lexical information (such as synonyms) is not always available
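The notation problem can be illustrated with a minimal Python sketch. It assumes a toy thesaurus in which the three labels listed on this slide are recorded as variants of one concept; the `THESAURUS` dictionary, the concept id `"JRA"` and the helper names `normalize_label` and `lookup_concept` are hypothetical and do not come from HeC or any real resource.

```python
import re

def normalize_label(label: str) -> str:
    """Turn an ontology label into a lowercase, space-separated string."""
    # Split CamelCase: "JuvenileArthritis" -> "Juvenile Arthritis"
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", label)
    # Treat underscores and hyphens as word separators
    spaced = re.sub(r"[_\-]+", " ", spaced)
    # Lowercase and collapse repeated whitespace
    return " ".join(spaced.lower().split())

# Hypothetical thesaurus: concept id -> set of normalized synonyms.
# Grouping these three labels under one concept is an assumption made
# for illustration only.
THESAURUS = {
    "JRA": {
        "juvenile rheumatoid arthritis",
        "juvenile arthritis",
        "chronic childhood arthritis",
    },
}

def lookup_concept(label: str):
    """Return the concept id whose synonyms contain the label, or None."""
    normalized = normalize_label(label)
    for concept_id, synonyms in THESAURUS.items():
        if normalized in synonyms:
            return concept_id
    return None

if __name__ == "__main__":
    for label in ("Juvenile Rheumatoid Arthritis",
                  "JuvenileArthritis",
                  "Chronic Childhood Arthritis"):
        print(label, "->", lookup_concept(label))
```

With such a shared synonym set, labels written in different notations resolve to the same concept without any approximate matching; labels missing from the thesaurus simply return `None`.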
Challenges in Ontology Reuse and Integration
Alignment Tools
✓ Great effort has been made in ontology alignment:
http://www.ontologymatching.org/
✗ However, those techniques provide only approximate mappings (with a confidence value), based on:
schema distance
string matching
(PositiveRheumatoidFactor ≈ NegativeRheumatoidFactor)
other statistical techniques
✓ Some of them try to use thesauri such as WordNet or UMLS
✗ ... but they suffer from scalability problems
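The string-matching pitfall mentioned on this slide can be reproduced with a short Python sketch. It uses the standard-library `difflib.SequenceMatcher` as a stand-in for a generic string-similarity measure; the alignment tools referred to above may use other measures, and the `THRESHOLD` value and the function names are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical acceptance threshold for proposing a mapping.
THRESHOLD = 0.8

def string_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] between two labels."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def looks_equivalent(a: str, b: str) -> bool:
    """Naive matcher: propose a mapping when the labels look alike."""
    return string_similarity(a, b) >= THRESHOLD

if __name__ == "__main__":
    a, b = "PositiveRheumatoidFactor", "NegativeRheumatoidFactor"
    print(round(string_similarity(a, b), 3), looks_equivalent(a, b))
```

The two labels share most of their characters, so a purely lexical matcher proposes a mapping between concepts with opposite meanings. This is the kind of error that a curated thesaurus, rather than a similarity score, is meant to avoid.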
A Common Terminological Resource for Life Sciences
These problems can be alleviated with a common reference thesaurus
Terminological Resources and Ontologies
Definition (Terminological Resources)
Collection of entity names in a domain with synonyms
Taxonomy of more general and specific terms (DAG)
Complex terms are described with Natural Language
Definition (Ontologies)
Specification of a conceptualization [Gruber 93]
✓ A formal specification of a shared conceptualization [Borst 97]
More expressivity (union, intersection, restrictions, disjointness) to define complex concepts
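As a rough illustration of the terminological-resource definition (entity names with synonyms, organised as a DAG of broader and narrower terms), the following Python sketch models a tiny thesaurus. The `Term` class, the `ancestors` helper, the term ids and the broader/narrower links are all hypothetical and only illustrate the data structure; they do not reproduce the actual structure of NCI, UMLS or any other resource.

```python
from dataclasses import dataclass, field

@dataclass
class Term:
    preferred: str                               # preferred entity name
    synonyms: set = field(default_factory=set)   # alternative names
    broader: set = field(default_factory=set)    # ids of more general terms

def ancestors(thesaurus: dict, term_id: str) -> set:
    """Collect all more general terms reachable from term_id.

    A term may have several broader terms, so the structure is a DAG
    rather than a tree; the 'seen' set avoids visiting a node twice.
    """
    seen = set()
    stack = list(thesaurus[term_id].broader)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(thesaurus[current].broader)
    return seen

# Illustrative entries only: the links below are assumptions.
THESAURUS = {
    "arthritis": Term("Arthritis"),
    "rheumatoid_arthritis": Term("Rheumatoid Arthritis",
                                 broader={"arthritis"}),
    "polyarthritis": Term("Polyarthritis", broader={"arthritis"}),
    "jra": Term("Juvenile Rheumatoid Arthritis",
                synonyms={"JRA"},
                broader={"rheumatoid_arthritis", "polyarthritis"}),
}

if __name__ == "__main__":
    print(sorted(ancestors(THESAURUS, "jra")))
```

Note what the structure cannot express: there is no union, intersection, restriction or disjointness here, only names and a generalisation hierarchy. Those constructs are what the ontology definition adds.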
Ontology Lifecycle
Requirements and Acquisition
Requirements Specification
Granularity scope (diseases, organs, cells...)
Classification criteria (e.g. ILAR, EULAR...)
Medical protocols: WHICH knowledge is to be represented
✓ A thesaurus will help in selecting the desired terms (synonyms) for labeling the future ontology concepts
Knowledge Acquisition (in HeC)
Based on medical protocols (HOW to represent)
Domain ontologies can provide a partial representation
KA will require reuse → integration of disparate sources
✓ A common thesaurus will help during the KA step
Conceptualization and Maintenance
Conceptualization
A thesaurus will provide the lexical terms for the ontology concepts
Same term, different interpretations
ACR: JRA ≡ SystemicJRA ⊔ PolyArticularJRA ⊔ PauciarticularJRA
ILAR: JIA ≡ SystemicJIA ⊔ PolyArticularJIA ⊔ OligoarticularJIA ⊔ Psoriatic Arthritis ⊔ Enthesis-related Arthritis
✗ Thesauri do not always contain the desired term...
✓ ...but ontologies can help the thesaurus to evolve
Maintenance and Evolution
The domain will evolve during the development
Both the ontologies and the thesaurus should be updated accordingly
✓ Text resources could help in the evolution...
✗ ...but there is a disconnect between lexicon/ontology development efforts and the literature
Experiences in HeC: Limitations in Current Resources
Current Thesauri
✓ The UMLS Metathesaurus represents the best effort
✓ Terms from more than 100 terminologies (e.g. MeSH, SNOMED or ICD)
✗ Ambiguity
retinoblastoma (gene or disease?)
✗ Inappropriate term labels
Long term names
Use of formulae or other codifications
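The ambiguity issue can be sketched in Python as a simple check over thesaurus rows: a term is flagged when it names concepts of more than one semantic type. The row layout, the concept ids (`concept-1` and so on) and the type names are hypothetical placeholders, not actual UMLS identifiers or semantic types.

```python
from collections import defaultdict

# Hypothetical rows: (term, concept id, semantic type).
ROWS = [
    ("retinoblastoma", "concept-1", "gene"),
    ("retinoblastoma", "concept-2", "disease"),
    ("juvenile rheumatoid arthritis", "concept-3", "disease"),
]

def ambiguous_terms(rows):
    """Map each term that has more than one semantic type to its types."""
    types_by_term = defaultdict(set)
    for term, _concept_id, semantic_type in rows:
        types_by_term[term.lower()].add(semantic_type)
    return {
        term: sorted(types)
        for term, types in types_by_term.items()
        if len(types) > 1
    }

if __name__ == "__main__":
    print(ambiguous_terms(ROWS))
```

A check of this kind only detects candidate ambiguities; deciding which sense a given ontology concept should take still requires domain expertise.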
OBO Ontologies
✓ Represent a huge community effort in the development of ontologies
✓ Ontologies are enriched lexically
✗ In some cases there is an overload of lexical entries
✗ A shared reference thesaurus is still not used
✗ The underlying logic has limited expressive power
✗ Labels of complex concepts are closer to descriptions and definitions
Galen and FMA
✓ Represent more formal ontologies (Frames and DL)
✓ FMA uses Terminologia Anatomica (TA)
✗ Efforts independent of UMLS and OBO
✗ Galen contains little lexical information
Conclusions
A common thesaurus helps the ontology engineering lifecycle
... and this common thesaurus benefits from the ontology engineering lifecycle
but...
A common shared thesaurus requires a community effort
Evolution (versioning) of the thesaurus and the ontologies
The creation of expressive and powerful ontologies requires DL expertise
Next Steps
Light-weight thesaurus for Health-e-Child
JIA, Cardio, Brain Tumours, ...
Application within HeC tasks
Semantic Integration of Ontologies
A thesaurus could be considered to normalize concept labels
... but conflicts could still arise due to semantic incompatibility
Feedback
Questions?
More information from . . .
Temporal Knowledge Bases Group (UJI): http://krono.act.uji.es
Rebholz’s Text Mining Group (EBI): http://www.ebi.ac.uk/Rebholz/
Thank you very much!