Outline Alignment of Biomedical Ontologies using Life Science Literature

advertisement
Outline
„
„
Alignment of Biomedical Ontologies
using Life Science Literature
„
Ontologies and ontology alignment
Ontology alignment approaches using life science
literature
Conclusion and Future Work
He Tan, Vaida Jakoniene,
Jakoniene, Patrick Lambrix,
Lambrix,
Johan Aberg,
Aberg, Nahid Shahmehri
Department of Computer and Information Science
Linköpings universitet
2
Ontologies
Outline
„
„
„
Ontologies and ontology alignment
Ontology alignment approaches using life science
literature
Conclusion and Future Work
“Ontologies define the basic terms and relations
comprising the vocabulary of a topic area, as
well as the rules for combining terms and
relations to define extensions to the
vocabulary.”
3
Ontologies
4
Motivation
„
Ontologies used
- for communication between people and
organizations
- for enabling knowledge reuse and sharing
- as basis for interoperability between systems
- as repository of information
- as query model for information sources
Ontologies in biomedical research
…
many biomedical ontologies
…
practical use of biomedical ontologies
e.g. GO, OBO, SNOMED-CT
e.g. databases annotated with GO
GENE ONTOLOGY (GO)
immune response
i- acute-phase response
i- anaphylaxis
i- antigen presentation
i- antigen processing
i- cellular defense response
i- cytokine metabolism
i- cytokine biosynthesis
synonym cytokine production
…
p- regulation of cytokine
biosynthesis
…
…
i- B-cell activation
i- B-cell differentiation
i- B-cell proliferation
i- cellular defense response
…
i- T-cell activation
i- activation of natural killer
cell activity
…
Key technology for the Semantic Web
5
6
1
Motivation
„
Motivation
Ontologies with overlapping information
GENE ONTOLOGY (GO)
SIGNAL-ONTOLOGY (SigO)
immune response
i- acute-phase response
i- anaphylaxis
i- antigen presentation
i- antigen processing
i- cellular defense response
i- cytokine metabolism
i- cytokine biosynthesis
synonym cytokine production
…
p- regulation of cytokine
biosynthesis
…
…
i- B-cell activation
i- B-cell differentiation
i- B-cell proliferation
i- cellular defense response
…
i- T-cell activation
i- activation of natural killer
cell activity
…
Immune Response
i- Allergic Response
i- Antigen Processing and Presentation
i- B Cell Activation
i- B Cell Development
i- Complement Signaling
synonym complement activation
i- Cytokine Response
i- Immune Suppression
i- Inflammation
i- Intestinal Immunity
i- Leukotriene Response
i- Leukotriene Metabolism
i- Natural Killer Cell Response
i- T Cell Activation
i- T Cell Development
i- T Cell Selection in Thymus
„
Use of multiple ontologies
e.g. custom-specific ontology + standard ontology
„
Bottom-up creation of ontologies
experts can focus on their domain of expertise
Æ important to know the interinter-ontology relationships
7
8
Aligning ontologies
GENE ONTOLOGY (GO)
SIGNAL-ONTOLOGY (SigO)
GENE ONTOLOGY (GO)
SIGNAL-ONTOLOGY (SigO)
immune response
i- acute-phase response
i- anaphylaxis
i- antigen presentation
i- antigen processing
i- cellular defense response
i- cytokine metabolism
i- cytokine biosynthesis
synonym cytokine production
…
p- regulation of cytokine
biosynthesis
…
…
i- B-cell activation
i- B-cell differentiation
i- B-cell proliferation
i- cellular defense response
…
i- T-cell activation
i- activation of natural killer
cell activity
…
Immune Response
i- Allergic Response
i- Antigen Processing and Presentation
i- B Cell Activation
i- B Cell Development
i- Complement Signaling
synonym complement activation
i- Cytokine Response
i- Immune Suppression
i- Inflammation
i- Intestinal Immunity
i- Leukotriene Response
i- Leukotriene Metabolism
i- Natural Killer Cell Response
i- T Cell Activation
i- T Cell Development
i- T Cell Selection in Thymus
immune response
i- acute-phase response
i- anaphylaxis
i- antigen presentation
i- antigen processing
i- cellular defense response
i- cytokine metabolism
i- cytokine biosynthesis
synonym cytokine production
…
p- regulation of cytokine
biosynthesis
…
…
i- B-cell activation
i- B-cell differentiation
i- B-cell proliferation
i- cellular defense response
…
i- T-cell activation
i- activation of natural killer
cell activity
…
Immune Response
i- Allergic Response
i- Antigen Processing and Presentation
i- B Cell Activation
i- B Cell Development
i- Complement Signaling
synonym complement activation
i- Cytokine Response
i- Immune Suppression
i- Inflammation
i- Intestinal Immunity
i- Leukotriene Response
i- Leukotriene Metabolism
i- Natural Killer Cell Response
i- T Cell Activation
i- T Cell Development
i- T Cell Selection in Thymus
equivalent concepts
equivalent relations
is-a relation
9
Merging ontologies
Alignment Strategies
GENE ONTOLOGY (GO)
SIGNAL-ONTOLOGY (SigO)
The New Ontology
immune response
i- acute-phase response
i- anaphylaxis
i- antigen presentation
i- antigen processing
i- cellular defense response
i- cytokine metabolism
i- cytokine biosynthesis
synonym cytokine production
…
p- regulation of cytokine
biosynthesis
…
…
i- B-cell activation
i- B-cell differentiation
i- B-cell proliferation
i- cellular defense response
…
i- T-cell activation
i- activation of natural killer
cell activity
…
Immune Response
i- Allergic Response
i- Antigen Processing and Presentation
i- B Cell Activation
i- B Cell Development
i- Complement Signaling
synonym complement activation
i- Cytokine Response
i- Immune Suppression
i- Inflammation
i- Intestinal Immunity
i- Leukotriene Response
i- Leukotriene Metabolism
i- Natural Killer Cell Response
i- T Cell Activation
i- T Cell Development
i- T Cell Selection in Thymus
immune response
i- acute-phase response
i- Allergic Response
i- anaphylaxis
i- Antigen Processing and Presentation
i- antigen presentation
i- antigen processing
i- cellular defense response
i- Complement Signaling
synonym complement activation
i- Cytokine Response
i- cytokine metabolism
i- cytokine biosynthesis
synonym cytokine production
…
p- regulation of cytokine_biosynthesis
…
…
i- B-cell activation
i- B-cell differentiation
i- B-cell proliferation
i- cellular defense response
…
i- T-cell activation
i- Natural Killer Cell Response
i- activation of natural killer cell activity
…
equivalent concepts
equivalent relations
10
„
„
„
„
„
„
Strategies based on linguistic matching
Structure-based strategies
Constraint-based approaches
GO: Complement Activation
Instance-based strategies
Use of auxiliary information
SigO: complement signaling
synonym complement activation
Combining different approaches
is-a relation
11
12
2
Alignment Strategies
„
„
„
„
„
„
Alignment Strategies
Strategies based on linguistic matching
StructureStructure-based strategies
Constraint-based approaches
Instance-based strategies
Use of auxiliary information
Combining different approaches
„
„
„
„
Strategies based on linguistic matching
Structure-based strategies
ConstraintConstraint-based approaches
Instance-based strategies
Use of auxiliary information
Person
Human
Combining different approaches
O2
O1
„
„
Animal
Animal
13
Alignment Strategies
„
„
„
„
„
„
14
Alignment Strategies
Strategies based on linguistic matching
instance
corpus
Structure-based strategies
Constraint-based approaches
Ontology
InstanceInstance-based strategies
Use of auxiliary information
Combining different approaches
„
„
„
„
„
„
Strategies based linguistic matchingdictionary
Structure-based strategies
intermediate
thesauri
ontology
Constraint-based approaches
alignment strategies
Instance-based strategies
Use of auxiliary information
Combining different approaches
15
The General Alignment Strategy
Alignment Strategies
„
„
„
„
general
dictionaries
domain
thesauri
alignment algorithm
matcher
matcher
combination
filter
suggestions
accepted
suggestions
17
alignments
„
instance
corpora
Strategies based on linguistic matching
Structure-based strategies
Constraint-based approaches
Instance-based strategies
Use of auxiliary information
Combining different approaches
source ontologies
„
16
user
conflict
checker
18
3
SAMBO – matchers
„
Outline
Terminological matchers
„
algorithms based on textual description of concepts and relations
„
…
Structural matcher
…
…
an iterative algorithm based on is-a and part-of hierarchies
„
„
Domain matcher
utilizing the domain lexicon UMLS
„
Introduction
„
Learning matchers
Motivation
Ontologies
Ontology alignment
Ontology alignment approaches using life science
literature
Conclusion and Future Work
bayes learning algorithms based on related biomedical literature
19
Learning matchers – instance-based
strategies
„
Learning matchers - steps
Basic intuition
„
A similarity measure between concepts can be computed
based on the probability that documents about one
concept are also about the other concept and vice versa.
„
20
Generate corpora
…
…
Intuition for structure-based extensions
„
Generate classifiers
„
Classification
…
Documents about a concept are also about their superconcepts.
(No requirement for previous alignment results.)
Use concept as query term in PubMed
Retrieve most recent PubMed abstracts
…
„
One classifier per ontology
Abstracts related to one ontology are classified by the
other ontology’s classifier and vice versa
Calculate similarities
21
Structural extension ‘Cl’
Basic learning matcher
„
Generate corpora
Generate classifiers
„
Classification
„
…
…
„
22
„
Generate classifiers
…
Naive Bayes classifiers
…
Take (is-a) structure of the ontologies into account when
building the classifiers
Extend the set of abstracts associated to a concept by
adding the abstracts related to the sub-concepts
Abstracts related to one ontology are classified to the concept
in the other ontology with highest posterior probability
C1
Calculate similarities
C2
C3
C4
23
24
4
Evaluation - cases
Structural extension ‘Sim’
…
„
Calculate similarities
…
…
GO vs. SigO
GO: 70 terms
Take structure of the ontologies into account when calculating
similarities
Similarity is computed based on the classifiers applied to the
concepts and their sub-concepts
SigO: 15 terms
GO-immune defense
SigO-immune defense
GO: 60 terms
SigO: 10 terms
GO-behavior
SigO-behavior
New Ontology
New Ontology
“immune defense”
…
“behavior”
MA vs. MeSH
MA: 15 terms
MeSH: 18 terms
MA-nose
MeSH-nose
MA: 77 terms
MeSH: 39 terms
MA-ear
New Ontology
MeSH-ear
New Ontology
“nose”
“ear”
MA: 112terms
MeSH: 45 terms
MA-eye
MeSH-eye
New Ontology
“eye”
25
Evaluation – influence of number of
PubMed abstracts
Evaluation
„
Matchers
Basic, StrucCl, StrucSim, StrucClSim
Term, TermWN, Dom
„
26
„
„
Parameters
Different results for different cases.
Quality of results does not necessarily increase
when corpora become larger.
Maximum number of PubMed abstracts
Quality of suggestions: precision/recall
Thresholds : 0.4, 0.5, 0.6, 0.7, 0.8
Weights for combination: 1.0/1.2
Time
27
Evaluation – quality of suggestions
28
Evaluation – quality of suggestions
Basic
„
„
1
1
0.8
0.8
0.7
0.7
nos e
ear
0.4
precision
ID
0.5
„
B
B
0.6
rec all
„
0.9
0.9
0.6
ID
0.5
nose
ear
0.4
eye
eye
Basic usually outperforms structure-based algorithms.
No clear winner among StrucCl and StrucSim.
StruClSim performs worse than StrucCl and
StrucSim.
0.3
0.3
0.2
0.2
0.1
0.1
0
0
0.4
0.5
0. 6
threshold
0.7
0.8
0.4
0.5
0.6
0.7
0.8
threshold
29
30
5
Evaluation – quality of suggestions
Evaluation – quality of suggestions
Terminological matchers
„
1
1
0.9
0.9
0.9
0.8
0.8
0.8
0.7
0.7
0.7
B
0.7
B
0.6
ID
0.6
ID
0.5
nose
0.5
nose
0.4
ear
0.4
ear
0.3
eye
1
ear
eye
0.3
0.2
0.6
ID
nose
0.5
ear
0.4
eye
0.8
0.2
0.2
0.2
0.1
0.1
0
0
0.4
0.5
0.6
0.7
eye
0.3
0.3
0.1
0.1
1
0.9
precision
nos e
0.4
precision
ID
0.5
recall
B
B
0.6
recall
Domain matcher
„
0.8
0.4
0.5
0.6
0.7
0.8
0
0
0.4
0.5
0.6
0.7
0.8
0.4
0.5
0.6
0.7
threshold
0.8
threshold
threshold
threshold
31
Evaluation – quality of suggestions
„
Evaluation - time
„
Comparison of the matchers
CS_TermWN
32
⊇ CS_Dom ⊇ CS_Basic
T_StrucCl ~ T_StrucClSim > T_StrucSim > T_Basic
1100
StrucCl
StrucClSim
„
Combinations of the different matchers
leads to higher quality results
StrucSim
900
700
Basic
500
time (sec.)
case: ear
300
Dom
100
Term
TermWN
-100
33
Outline
„
…
…
„
„
Conclusion
Introduction
…
34
„
Motivation
Ontologies
Ontology alignment
„
Instance-based algorithms for aligning ontologies
using life science literature
Evaluations of matchers
…
Ontology alignment approaches using life science
literature
Conclusion and Future Work
…
…
35
Basic outperforms structure-based approaches
Our structure-based approaches do not require previous
alignments
Combination with other approaches gives best results
36
6
Future Work
„
Algorithms
…
…
…
Classify abstracts to multiple concepts
Use of auxiliary information
Other classifiers
„
Structure-based filtering
„
Evaluation tool - KitAMO
37
7
Download