Outline Alignment of Biomedical Ontologies using Life Science Literature Ontologies and ontology alignment Ontology alignment approaches using life science literature Conclusion and Future Work He Tan, Vaida Jakoniene, Jakoniene, Patrick Lambrix, Lambrix, Johan Aberg, Aberg, Nahid Shahmehri Department of Computer and Information Science Linköpings universitet 2 Ontologies Outline Ontologies and ontology alignment Ontology alignment approaches using life science literature Conclusion and Future Work “Ontologies define the basic terms and relations comprising the vocabulary of a topic area, as well as the rules for combining terms and relations to define extensions to the vocabulary.” 3 Ontologies 4 Motivation Ontologies used - for communication between people and organizations - for enabling knowledge reuse and sharing - as basis for interoperability between systems - as repository of information - as query model for information sources Ontologies in biomedical research many biomedical ontologies practical use of biomedical ontologies e.g. GO, OBO, SNOMED-CT e.g. databases annotated with GO GENE ONTOLOGY (GO) immune response i- acute-phase response i- anaphylaxis i- antigen presentation i- antigen processing i- cellular defense response i- cytokine metabolism i- cytokine biosynthesis synonym cytokine production … p- regulation of cytokine biosynthesis … … i- B-cell activation i- B-cell differentiation i- B-cell proliferation i- cellular defense response … i- T-cell activation i- activation of natural killer cell activity … Key technology for the Semantic Web 5 6 1 Motivation Motivation Ontologies with overlapping information GENE ONTOLOGY (GO) SIGNAL-ONTOLOGY (SigO) immune response i- acute-phase response i- anaphylaxis i- antigen presentation i- antigen processing i- cellular defense response i- cytokine metabolism i- cytokine biosynthesis synonym cytokine production … p- regulation of cytokine biosynthesis … … i- B-cell activation i- B-cell differentiation i- B-cell proliferation i- cellular defense response … i- T-cell activation i- activation of natural killer cell activity … Immune Response i- Allergic Response i- Antigen Processing and Presentation i- B Cell Activation i- B Cell Development i- Complement Signaling synonym complement activation i- Cytokine Response i- Immune Suppression i- Inflammation i- Intestinal Immunity i- Leukotriene Response i- Leukotriene Metabolism i- Natural Killer Cell Response i- T Cell Activation i- T Cell Development i- T Cell Selection in Thymus Use of multiple ontologies e.g. custom-specific ontology + standard ontology Bottom-up creation of ontologies experts can focus on their domain of expertise Æ important to know the interinter-ontology relationships 7 8 Aligning ontologies GENE ONTOLOGY (GO) SIGNAL-ONTOLOGY (SigO) GENE ONTOLOGY (GO) SIGNAL-ONTOLOGY (SigO) immune response i- acute-phase response i- anaphylaxis i- antigen presentation i- antigen processing i- cellular defense response i- cytokine metabolism i- cytokine biosynthesis synonym cytokine production … p- regulation of cytokine biosynthesis … … i- B-cell activation i- B-cell differentiation i- B-cell proliferation i- cellular defense response … i- T-cell activation i- activation of natural killer cell activity … Immune Response i- Allergic Response i- Antigen Processing and Presentation i- B Cell Activation i- B Cell Development i- Complement Signaling synonym complement activation i- Cytokine Response i- Immune Suppression i- Inflammation i- Intestinal Immunity i- Leukotriene Response i- Leukotriene Metabolism i- Natural Killer Cell Response i- T Cell Activation i- T Cell Development i- T Cell Selection in Thymus immune response i- acute-phase response i- anaphylaxis i- antigen presentation i- antigen processing i- cellular defense response i- cytokine metabolism i- cytokine biosynthesis synonym cytokine production … p- regulation of cytokine biosynthesis … … i- B-cell activation i- B-cell differentiation i- B-cell proliferation i- cellular defense response … i- T-cell activation i- activation of natural killer cell activity … Immune Response i- Allergic Response i- Antigen Processing and Presentation i- B Cell Activation i- B Cell Development i- Complement Signaling synonym complement activation i- Cytokine Response i- Immune Suppression i- Inflammation i- Intestinal Immunity i- Leukotriene Response i- Leukotriene Metabolism i- Natural Killer Cell Response i- T Cell Activation i- T Cell Development i- T Cell Selection in Thymus equivalent concepts equivalent relations is-a relation 9 Merging ontologies Alignment Strategies GENE ONTOLOGY (GO) SIGNAL-ONTOLOGY (SigO) The New Ontology immune response i- acute-phase response i- anaphylaxis i- antigen presentation i- antigen processing i- cellular defense response i- cytokine metabolism i- cytokine biosynthesis synonym cytokine production … p- regulation of cytokine biosynthesis … … i- B-cell activation i- B-cell differentiation i- B-cell proliferation i- cellular defense response … i- T-cell activation i- activation of natural killer cell activity … Immune Response i- Allergic Response i- Antigen Processing and Presentation i- B Cell Activation i- B Cell Development i- Complement Signaling synonym complement activation i- Cytokine Response i- Immune Suppression i- Inflammation i- Intestinal Immunity i- Leukotriene Response i- Leukotriene Metabolism i- Natural Killer Cell Response i- T Cell Activation i- T Cell Development i- T Cell Selection in Thymus immune response i- acute-phase response i- Allergic Response i- anaphylaxis i- Antigen Processing and Presentation i- antigen presentation i- antigen processing i- cellular defense response i- Complement Signaling synonym complement activation i- Cytokine Response i- cytokine metabolism i- cytokine biosynthesis synonym cytokine production … p- regulation of cytokine_biosynthesis … … i- B-cell activation i- B-cell differentiation i- B-cell proliferation i- cellular defense response … i- T-cell activation i- Natural Killer Cell Response i- activation of natural killer cell activity … equivalent concepts equivalent relations 10 Strategies based on linguistic matching Structure-based strategies Constraint-based approaches GO: Complement Activation Instance-based strategies Use of auxiliary information SigO: complement signaling synonym complement activation Combining different approaches is-a relation 11 12 2 Alignment Strategies Alignment Strategies Strategies based on linguistic matching StructureStructure-based strategies Constraint-based approaches Instance-based strategies Use of auxiliary information Combining different approaches Strategies based on linguistic matching Structure-based strategies ConstraintConstraint-based approaches Instance-based strategies Use of auxiliary information Person Human Combining different approaches O2 O1 Animal Animal 13 Alignment Strategies 14 Alignment Strategies Strategies based on linguistic matching instance corpus Structure-based strategies Constraint-based approaches Ontology InstanceInstance-based strategies Use of auxiliary information Combining different approaches Strategies based linguistic matchingdictionary Structure-based strategies intermediate thesauri ontology Constraint-based approaches alignment strategies Instance-based strategies Use of auxiliary information Combining different approaches 15 The General Alignment Strategy Alignment Strategies general dictionaries domain thesauri alignment algorithm matcher matcher combination filter suggestions accepted suggestions 17 alignments instance corpora Strategies based on linguistic matching Structure-based strategies Constraint-based approaches Instance-based strategies Use of auxiliary information Combining different approaches source ontologies 16 user conflict checker 18 3 SAMBO – matchers Outline Terminological matchers algorithms based on textual description of concepts and relations Structural matcher an iterative algorithm based on is-a and part-of hierarchies Domain matcher utilizing the domain lexicon UMLS Introduction Learning matchers Motivation Ontologies Ontology alignment Ontology alignment approaches using life science literature Conclusion and Future Work bayes learning algorithms based on related biomedical literature 19 Learning matchers – instance-based strategies Learning matchers - steps Basic intuition A similarity measure between concepts can be computed based on the probability that documents about one concept are also about the other concept and vice versa. 20 Generate corpora Intuition for structure-based extensions Generate classifiers Classification Documents about a concept are also about their superconcepts. (No requirement for previous alignment results.) Use concept as query term in PubMed Retrieve most recent PubMed abstracts One classifier per ontology Abstracts related to one ontology are classified by the other ontology’s classifier and vice versa Calculate similarities 21 Structural extension ‘Cl’ Basic learning matcher Generate corpora Generate classifiers Classification 22 Generate classifiers Naive Bayes classifiers Take (is-a) structure of the ontologies into account when building the classifiers Extend the set of abstracts associated to a concept by adding the abstracts related to the sub-concepts Abstracts related to one ontology are classified to the concept in the other ontology with highest posterior probability C1 Calculate similarities C2 C3 C4 23 24 4 Evaluation - cases Structural extension ‘Sim’ Calculate similarities GO vs. SigO GO: 70 terms Take structure of the ontologies into account when calculating similarities Similarity is computed based on the classifiers applied to the concepts and their sub-concepts SigO: 15 terms GO-immune defense SigO-immune defense GO: 60 terms SigO: 10 terms GO-behavior SigO-behavior New Ontology New Ontology “immune defense” “behavior” MA vs. MeSH MA: 15 terms MeSH: 18 terms MA-nose MeSH-nose MA: 77 terms MeSH: 39 terms MA-ear New Ontology MeSH-ear New Ontology “nose” “ear” MA: 112terms MeSH: 45 terms MA-eye MeSH-eye New Ontology “eye” 25 Evaluation – influence of number of PubMed abstracts Evaluation Matchers Basic, StrucCl, StrucSim, StrucClSim Term, TermWN, Dom 26 Parameters Different results for different cases. Quality of results does not necessarily increase when corpora become larger. Maximum number of PubMed abstracts Quality of suggestions: precision/recall Thresholds : 0.4, 0.5, 0.6, 0.7, 0.8 Weights for combination: 1.0/1.2 Time 27 Evaluation – quality of suggestions 28 Evaluation – quality of suggestions Basic 1 1 0.8 0.8 0.7 0.7 nos e ear 0.4 precision ID 0.5 B B 0.6 rec all 0.9 0.9 0.6 ID 0.5 nose ear 0.4 eye eye Basic usually outperforms structure-based algorithms. No clear winner among StrucCl and StrucSim. StruClSim performs worse than StrucCl and StrucSim. 0.3 0.3 0.2 0.2 0.1 0.1 0 0 0.4 0.5 0. 6 threshold 0.7 0.8 0.4 0.5 0.6 0.7 0.8 threshold 29 30 5 Evaluation – quality of suggestions Evaluation – quality of suggestions Terminological matchers 1 1 0.9 0.9 0.9 0.8 0.8 0.8 0.7 0.7 0.7 B 0.7 B 0.6 ID 0.6 ID 0.5 nose 0.5 nose 0.4 ear 0.4 ear 0.3 eye 1 ear eye 0.3 0.2 0.6 ID nose 0.5 ear 0.4 eye 0.8 0.2 0.2 0.2 0.1 0.1 0 0 0.4 0.5 0.6 0.7 eye 0.3 0.3 0.1 0.1 1 0.9 precision nos e 0.4 precision ID 0.5 recall B B 0.6 recall Domain matcher 0.8 0.4 0.5 0.6 0.7 0.8 0 0 0.4 0.5 0.6 0.7 0.8 0.4 0.5 0.6 0.7 threshold 0.8 threshold threshold threshold 31 Evaluation – quality of suggestions Evaluation - time Comparison of the matchers CS_TermWN 32 ⊇ CS_Dom ⊇ CS_Basic T_StrucCl ~ T_StrucClSim > T_StrucSim > T_Basic 1100 StrucCl StrucClSim Combinations of the different matchers leads to higher quality results StrucSim 900 700 Basic 500 time (sec.) case: ear 300 Dom 100 Term TermWN -100 33 Outline Conclusion Introduction 34 Motivation Ontologies Ontology alignment Instance-based algorithms for aligning ontologies using life science literature Evaluations of matchers Ontology alignment approaches using life science literature Conclusion and Future Work 35 Basic outperforms structure-based approaches Our structure-based approaches do not require previous alignments Combination with other approaches gives best results 36 6 Future Work Algorithms Classify abstracts to multiple concepts Use of auxiliary information Other classifiers Structure-based filtering Evaluation tool - KitAMO 37 7