Aligning and Combining Ontologies
Nigam Shah
nigam@stanford.edu
Alignment: What and Why
Alignment
Additivity of Annotations
Completeness
Ontology Alignment
- Alignment (R) = the identification of a synonymy relationship between terms from different ontologies.
- Mapping = the identification of some relationship between terms from different ontologies.
(Diagram labels: Alignment (R), Mapping, Alignment)
- ‘Alignment’ = the process of detecting potential mappings.
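One way to make the distinction concrete is to treat every mapping as a (term, relation, term) triple; alignment in the restricted sense (R) is then the subset whose relation is synonymy. This is a minimal sketch, not a standard mapping format: the relation name `synonym_of` and the `A:`/`B:` identifiers are hypothetical placeholders.

```python
from dataclasses import dataclass

# Hypothetical relation name marking synonymy; not taken from any
# particular mapping vocabulary.
SYNONYM = "synonym_of"


@dataclass(frozen=True)
class Mapping:
    """A relationship asserted between terms from two different ontologies."""
    source_term: str  # term identifier in ontology A
    relation: str     # kind of relationship between the two terms
    target_term: str  # term identifier in ontology B


def alignments(mappings):
    """Alignment (R): keep only the mappings that assert synonymy."""
    return [m for m in mappings if m.relation == SYNONYM]


# Hypothetical identifiers, for illustration only.
mappings = [
    Mapping("A:heart", SYNONYM, "B:cardiac_organ"),
    Mapping("A:left_ventricle", "part_of", "B:cardiac_organ"),
]
for m in alignments(mappings):
    print(m.source_term, m.relation, m.target_term)
# A:heart synonym_of B:cardiac_organ
```

Every alignment (R) is a mapping, but the part_of mapping above is not an alignment (R).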
Approaches to alignment
- Pre-defined, during the creation of the ontology…
  - The OBO Foundry paradigm (http://obofoundry.org)
  - Authors discuss, argue, vote and reach a consensus
  - Takes a long time!
- Post-hoc, after the relevant ontologies have been in use for some time
  - Human-curated → might not scale
  - Algorithm-driven (PROMPT, FOAM …) → might not be accurate enough
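To illustrate why an algorithm-driven approach may not be accurate enough, the sketch below uses a deliberately simple technique, exact matching of normalized labels, which is not how PROMPT or FOAM work. It proposes candidate mappings and scores them against a curated reference with precision and recall. All ontologies, identifiers and the "curated" reference are hypothetical toy data.

```python
import re


def normalize(label):
    """Lower-case a label and collapse punctuation/whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()


def candidate_mappings(onto_a, onto_b):
    """Propose (id_a, id_b) pairs whose normalized labels coincide.

    Each ontology is a dict: term id -> list of labels (preferred label and
    synonyms). Every pair returned is only a candidate for curator review.
    """
    index = {}  # normalized label -> set of term ids in ontology B
    for term_id, labels in onto_b.items():
        for label in labels:
            index.setdefault(normalize(label), set()).add(term_id)
    candidates = set()
    for term_id, labels in onto_a.items():
        for label in labels:
            for match in index.get(normalize(label), ()):
                candidates.add((term_id, match))
    return sorted(candidates)


def precision_recall(predicted, reference):
    """Score algorithm-proposed mappings against a human-curated reference."""
    predicted, reference = set(predicted), set(reference)
    hits = len(predicted & reference)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


# Toy ontologies with hypothetical identifiers.
onto_a = {"A:1": ["Cell death"], "A:2": ["Aster"], "A:3": ["Nuclear envelope"]}
onto_b = {"B:7": ["cell-death"], "B:8": ["aster"], "B:9": ["nuclear membrane"]}
# Hypothetical curated reference: A:2 is taken to be a homonym (say, the plant
# genus) and is not mapped; A:3 and B:9 are synonyms with no shared label.
reference = {("A:1", "B:7"), ("A:3", "B:9")}

predicted = candidate_mappings(onto_a, onto_b)
p, r = precision_recall(predicted, reference)
print(predicted)                            # [('A:1', 'B:7'), ('A:2', 'B:8')]
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.50
```

The homonym produces a false positive and the unshared synonym a false negative, the two failure modes a curator would still have to catch.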
Why care about ‘alignment’?
Alignment (or combination) of ontologies is essential for creating structured annotations, analyzing existing annotations, and integrating annotations derived from different ontologies.
Annotations (biomedical)
- Annotation = an assertion declaring a relationship between a biomedical entity and a type in an ontology.
  - e.g. p53 <associated_with> cell death
- Annotations tell us what biologists believe to be true (in particular or in general).
- Most annotations are based on particular observations and are generalized during interpretation by a biologist or curator.
- The semantics of annotations are not always declared a priori (e.g. associated_with, involves).
- Annotations of clinical/medical data are usually not generalized but remain at the particular (or instance) level.
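An annotation as defined above can be held as a simple (entity, relation, ontology type) record, which makes it easy to flag annotations whose relation has no declared semantics. This is a minimal sketch; the set of "declared" relations and the `gene_X`/`protein_Y` records are hypothetical, and only the p53 example comes from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """An assertion relating a biomedical entity to a type in an ontology."""
    entity: str         # e.g. a gene, protein, or patient sample
    relation: str       # e.g. "associated_with"
    ontology_type: str  # label of a term in some ontology


# Hypothetical set of relations whose semantics were declared up front.
DECLARED_RELATIONS = {"located_in"}


def undeclared(annotations):
    """Return annotations whose relation has no declared semantics."""
    return [a for a in annotations if a.relation not in DECLARED_RELATIONS]


annotations = [
    Annotation("p53", "associated_with", "cell death"),  # example from the slide
    Annotation("gene_X", "involves", "apoptosis"),        # hypothetical
    Annotation("protein_Y", "located_in", "nucleus"),     # hypothetical
]
for a in undeclared(annotations):
    print(a.entity, a.relation, a.ontology_type)
```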
Using GO annotations
Descriptions are built by connecting/linking ontology terms.
Biologists interpret a list of genes and form a result statement such as:
The photosynthesis genes located in the chloroplast are repressed in response to ozone stress and have the ABRE binding site enriched in their promoters.
… but we need more!
- A sentence conveying <Some molecular function> in <some biological process> at <some cellular location> is more meaningful than the three terms separately.
- Such statements are articulated in natural language in a paper (which someone has to “mine” using text analysis, or curate).
- Simple mapping/linking/aligning/combining is not enough.
- I believe that we also need to SPECIFY HOW TO COMBINE terms from different ontologies as “terminals” to create statements at a higher level.
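As a sketch of "specifying how to combine" terms, the pattern <Some MF> in <Some BP> at <Some CC> can be treated as a template whose slots only accept terminals from the right ontology (here, the right GO branch). The `Term` type, the `PATTERN` encoding and the branch tags `MF`/`BP`/`CC` are hypothetical choices for illustration; the term labels are just sample strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    label: str
    ontology: str  # source ontology or GO branch: "MF", "BP" or "CC"


# Hypothetical encoding of <Some MF> in <Some BP> at <Some CC>:
# each slot names the expected source and the text that follows it.
PATTERN = (("MF", " in "), ("BP", " at "), ("CC", ""))


def compose(*terms):
    """Combine terms into one statement, checking each slot's source ontology."""
    if len(terms) != len(PATTERN):
        raise ValueError(f"expected {len(PATTERN)} terms, got {len(terms)}")
    parts = []
    for term, (expected, joiner) in zip(terms, PATTERN):
        if term.ontology != expected:
            raise ValueError(f"{term.label!r} is from {term.ontology}, expected {expected}")
        parts.append(term.label + joiner)
    return "".join(parts)


print(compose(Term("kinase activity", "MF"),
              Term("signal transduction", "BP"),
              Term("plasma membrane", "CC")))
# kinase activity in signal transduction at plasma membrane
```

Rejecting a term from the wrong ontology at composition time is what distinguishes this from free-text statements that must be mined later.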
Adding more structure to GO…
(Diagram labels: OBOL; Relations Ontology; ?<link>?; <Some MF> in <Some BP>; Between-ontology structure)
Adding structure [beyond GO]: PATO
The building blocks of phenotype descriptions: EQ
- Entity (E): the bearer, such as spermatocyte or wing
- Quality (Q): a property or attribute; a kind of dependent continuant
Formally, an EQ description defines:
- a Quality which inheres_in a bearer entity
PATO: annotation patterns for absence are under discussion
e.g. "spermatocyte devoid of asters"
- E = CL:spermatocyte (the quality inheres in the spermatocyte)
- Q = PATO:lacks_part (the quality/relation of missing some part or parts)
- E2 = GO-CC:aster (the quality is with respect to the type "aster")
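The EQ pattern, including the extra entity E2 used for relational qualities such as lacks_part, can be sketched as a small record. Since annotation patterns for absence are still under discussion, this is only one possible reading; the class and field names are hypothetical, and the term names are the ones given on the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EQ:
    """Entity-Quality phenotype description: a quality that inheres_in an entity."""
    entity: str                           # E, the bearer of the quality
    quality: str                          # Q, a PATO quality
    related_entity: Optional[str] = None  # E2, only for relational qualities

    def describe(self):
        text = f"{self.quality} inheres_in {self.entity}"
        if self.related_entity is not None:
            text += f" (with respect to {self.related_entity})"
        return text


devoid_of_asters = EQ(entity="CL:spermatocyte",
                      quality="PATO:lacks_part",
                      related_entity="GO-CC:aster")
print(devoid_of_asters.describe())
# PATO:lacks_part inheres_in CL:spermatocyte (with respect to GO-CC:aster)
```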
Semantically rich cross-ontology annotations
1. Relations Ontology
2. Mouse Pathology ontology
3. Tissue/Organ
4. Gene ontology
(Image: basal layer of organ shows membranous staining)
mRNA of genes encoding proteins with mf in bp at cc is increased in sample-id, which shows some pathology in some tissue in some organ
Queries enabled:
1. Identify all images with a specific pathology
2. Identify cases with pathology and some gene expression changes
3. Correlate changes in biological processes with changes in morphology
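Once each sample carries annotations from several ontologies (pathology, tissue/organ, gene expression), the first two queries reduce to simple filters. This minimal sketch uses a hypothetical record layout and made-up samples, image names and genes; query 3 would additionally need GO process annotations for the genes and is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class SampleAnnotation:
    sample_id: str
    image: str
    pathology: str  # term from a pathology ontology
    organ: str      # term from a tissue/organ ontology
    # gene -> "increased" or "decreased"
    expression_changes: dict = field(default_factory=dict)


# Hypothetical samples, for illustration only.
samples = [
    SampleAnnotation("s1", "img_001.png", "hyperplasia", "mammary gland", {"gene_A": "increased"}),
    SampleAnnotation("s2", "img_002.png", "hyperplasia", "mammary gland"),
    SampleAnnotation("s3", "img_003.png", "carcinoma", "mammary gland", {"gene_B": "decreased"}),
]


def images_with_pathology(samples, pathology):
    """Query 1: all images annotated with a given pathology term."""
    return [s.image for s in samples if s.pathology == pathology]


def cases_with_pathology_and_expression_change(samples, pathology):
    """Query 2: samples showing the pathology and at least one expression change."""
    return [s.sample_id for s in samples
            if s.pathology == pathology and s.expression_changes]


print(images_with_pathology(samples, "hyperplasia"))                       # ['img_001.png', 'img_002.png']
print(cases_with_pathology_and_expression_change(samples, "hyperplasia"))  # ['s1']
```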
Discovery enabled:
1. Classify samples in expression space and “look” for histological changes that correlate with the classification.
HOW
WHY
Increasing structure between ontologies
(Figure: a scale from Ontologies to Machine Prose. Labels: GO, FMA; PATO; Reactome, BioPAX; 80%, 15%, 5%; relations is_a, part_of, has_quality, has_participant, has_reaction, effects, induces.)
We have to be aware of the difference between markup, indexing, keywording, etc. and explicit model specification. It is currently not clear which of these is of more value.
More formal relations [RO, BioTOP-relations, what else?]
Open Questions/Challenges
- Creation/acceptance of a systematic formalism for creating semantically rich annotations (e.g. associated_with, involves).
- A generic tool that uses ontologies and allows the user to compose terms and cross-ontology annotations:
  - Easy term/annotation composition
  - Easy navigation of ontologies for finding relevant entities that participate in the term composition and/or composed annotations
  - Control over the number of alternative [compositional] statements allowed
  - Can we allow users to type in whatever they want, parse it and "hook" it into ontologies to generate a compositional representation of what they typed, and ask them to "accept" or edit it? Iterate with the user at the point of first declaration of the statements rather than having to curate them again later.
- Estimate annotation manpower (how much time will this take?) and the return on time spent.
- Find a suitable domain/data set and create composable annotations for several data types (or articulate results of existing studies in composable form) to demonstrate utility.
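The idea of letting users type free text and then "hooking" it into ontologies can be prototyped with greedy longest-match dictionary lookup, whose output would be shown to the user to accept or edit. This is a minimal sketch of that simple technique, not a real concept recognizer; the lexicon, including mapping "devoid of" to PATO:lacks_part as in the EQ example, is hypothetical.

```python
import re

# Hypothetical lexicon: lower-cased phrase -> (ontology, term label).
LEXICON = {
    "spermatocyte": ("CL", "spermatocyte"),
    "aster": ("GO-CC", "aster"),
    "asters": ("GO-CC", "aster"),
    "devoid of": ("PATO", "lacks_part"),
}
MAX_WORDS = max(len(phrase.split()) for phrase in LEXICON)


def tag(text):
    """Greedy longest-match lookup of lexicon phrases in free text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    i, found = 0, []
    while i < len(words):
        # Try the longest phrase starting at word i first.
        for n in range(min(MAX_WORDS, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in LEXICON:
                found.append((phrase, LEXICON[phrase]))
                i += n
                break
        else:
            i += 1  # no phrase starts here; move to the next word
    return found


for phrase, (ontology, term) in tag("Spermatocyte devoid of asters"):
    print(f"{phrase!r} -> {ontology}:{term}")
```

Each proposed match is only a suggestion; the iteration with the user described above is what would turn it into a curated compositional statement.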
Next conference?
“Enabling technologies for ontological access to clinical and animal model data”
A hands-on problem-solving workshop, addressing a use case