Aligning and Combining Ontologies
Nigam Shah
nigam@stanford.edu

Alignment: What and Why
- Alignment
- Additivity of Annotations
- Completeness

Ontology Alignment
- Alignment (R) = the identification of synonymy relationships b/w terms from different ontologies.
- Mapping = the identification of some relationship b/w terms from different ontologies.
[Diagram labels: Alignment (R), Mapping, Alignment]
- ‘Alignment’ = the process of detecting potential mappings

Approaches to alignment
- Pre-defined, during the process of creation of the ontology…
  - The OBO Foundry paradigm (http://obofoundry.org)
  - Authors discuss, argue, vote and reach a consensus
  - Takes a long time!
- Post-hoc, after the relevant ontologies have been in use for some time
  - Human curated: might not scale
  - Algorithm driven (PROMPT, FOAM …): might not be accurate enough

Why care about ‘alignment’?
Alignment [/combination] of ontologies is essential for creating structured annotations, analyzing existing annotations and integrating annotations derived from different ontologies.

Annotations (biomedical)
- Annotation = an assertion declaring a relationship b/w a biomedical entity and a type in an ontology. e.g. p53 <associated_with> cell death
- Annotations tell us what the biologists believe to be true (in particular or in general)
- Most annotations are based on particular observations and are generalized during interpretation by a biologist/curator.
- Semantics of annotations are not always declared a priori (e.g. associated_with, involves)
- Annotations of clinical / medical data are usually not generalized but are at the particular (or instance) level.

Using GO annotations
- Descriptions built by connecting/linking ontology terms
- Biologists interpret a list of genes and form a result statement such as: The photosynthesis genes located in the chloroplast are repressed in response to ozone stress and have the ABRE binding site enriched in their promoters.

… but we need more!
A sentence conveying <Some molecular function> in <some biological process> at <some cellular location> is more meaningful than the three terms separately.
- Such statements are articulated in natural language in a paper (which someone has to “mine” using text analysis or curate)
- Simple mapping/linking/aligning/combining is not enough.
- I believe that we also need to SPECIFY HOW TO COMBINE terms from different ontologies as “terminals” to create statements at a higher level.

Adding more structure to GO…
[Diagram labels: OBOL; Relations Ontology; ?<link>?; <Some MF> in <Some BP>; Between-ontology structure]

Adding structure [beyond GO]: PATO
The building blocks of phenotype descriptions: EQ
- Entity (bearer) such as spermatocyte, wing
- Quality (property, attribute) - a kind of dependent continuant
Formally, an EQ description defines:
- a Quality which inheres_in a bearer entity

PATO: Annotation patterns for absence are under discussion
e.g. "spermatocyte devoid of asters"
- E = CL:spermatocyte
  Inheres in the spermatocyte
- Q = PATO:lacks_part
  The quality/relation of missing some part or parts
- E2 = GO-CC:aster
  The quality is with respect to the type "aster"

Semantically rich cross ontology annotations
Ontologies used:
1. Relationship ontology
2. Mouse Pathology ontology
3. Tissue/Organ
4. Gene ontology
Example statements:
- Basal layer of organ shows membranous staining
- mRNA of genes encoding proteins with mf in bp at cc is increased in sample-id which shows some pathology in some tissue in some organ
Queries enabled:
1. Identify all images with a specific pathology
2. Identify cases with pathology and some gene expression changes
3. Correlate changes in biological processes with changes in morphology
Discovery enabled:
1. Classify samples in expression space and “look” for histological changes that correlate with it.
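The EQ pattern and the second query above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the format used by PATO, OBOL or any existing tool: the class names, the field names, the `with_respect_to` label, the sample identifiers and the `MPATH:example_pathology` and `GO:example_process` identifiers are all hypothetical. Only `CL:spermatocyte`, `PATO:lacks_part`, `GO-CC:aster` and `inheres_in` come from the slides.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class EQ:
    """An Entity-Quality description: a quality that inheres_in a bearer."""
    entity: str                            # E, the bearer
    quality: str                           # Q, the quality or relation
    related_entity: Optional[str] = None   # E2, what the quality refers to

    def render(self) -> str:
        # "with_respect_to" is a hypothetical label for the E2 link.
        text = f"{self.quality} inheres_in {self.entity}"
        if self.related_entity is not None:
            text += f" with_respect_to {self.related_entity}"
        return text


@dataclass(frozen=True)
class SampleAnnotation:
    """A composed, cross-ontology annotation of one sample (hypothetical)."""
    sample_id: str
    pathology: Optional[str]        # pathology term, if any was observed
    increased_go_terms: frozenset   # GO terms whose genes show increased mRNA


def cases_with_pathology_and_expression_change(
    annotations: List[SampleAnnotation], pathology: str
) -> List[str]:
    """Samples showing the given pathology and at least one expression change."""
    return [
        a.sample_id
        for a in annotations
        if a.pathology == pathology and a.increased_go_terms
    ]


# "spermatocyte devoid of asters" from the slide
devoid_of_asters = EQ("CL:spermatocyte", "PATO:lacks_part", "GO-CC:aster")

# Made-up samples using placeholder identifiers
samples = [
    SampleAnnotation("s1", "MPATH:example_pathology", frozenset({"GO:example_process"})),
    SampleAnnotation("s2", "MPATH:example_pathology", frozenset()),
    SampleAnnotation("s3", None, frozenset({"GO:example_process"})),
]

hits = cases_with_pathology_and_expression_change(samples, "MPATH:example_pathology")
print(devoid_of_asters.render())
print(hits)
```

Because each part of the statement sits in its own typed slot, the query reduces to a simple filter. A free-text statement would first have to be mined or curated before the same question could be asked.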
Increasing structure b/w Ontologies
[Diagram labels: HOW, WHY; Machine, Prose; Ontologies: Reactome, BioPAX, PaTO, GO, FMA; percentages: 5%, 15%, 80%; relations: is_a, part_of, has_quality, has_participant, has_reaction, effects, induces]
- Have to be aware of the difference b/w markup, indexing, key wording etc. and explicit model specification. Currently it is not clear which one is of more value.
- More formal relations [RO, BioTOP-relations, what else?]

Open Questions/Challenges
- Creation/acceptance of a systematic formalism for creating semantically rich annotations. (e.g. associated_with, involves)
- A generic tool that uses ontologies and allows the user to compose terms and cross ontology annotations
  - Easy term/annotation composition
  - Easy navigation of ontologies for finding relevant entities that participate in the term composition and/or composed annotations.
  - Control the amount of alternative [compositional] statements allowed
- Can we allow users to type in whatever they want, parse it and "hook" it into ontologies to generate a compositional representation of what they typed, and ask them to "accept" or edit? Iterate with the user at the point of first declaration of the statements rather than having to curate it again later.
- Estimate annotation man-power (how much time will this take?) and return on time spent.
- Find a suitable domain/data-set and create composable annotations for several data types (or articulate results of existing studies in the composable form) to demonstrate utility

Next conference?
“enabling technologies for ontological access to clinical and animal model data”
A hands-on problem solving workshop – addressing a use case