2-The_OBO_Foundry - Buffalo Ontology Site

The OBO Foundry
Barry Smith
1
History of Ontology as Computational
Artifact
1970s: AI (based on FOL: McCarthy, Hayes)
1980s: KR, Knowledge Interchange Formats
(Gruber, Hobbs ...)
1999: GO, OBO format (Ashburner, ...)
2000s: Semantic Web (based on OWL; Horrocks,
Hendler, 1000 lite ontologies)
2009: Reconciliation of OBO with OWL; but still 2
methodologies: OBO Foundry; NCBO Bioportal
2
Ontology and the Semantic Web
• html demonstrated the power of the Web to
allow sharing of information
• can we use semantic technology to create a Web
2.0 which would allow algorithmic reasoning with
online information based on XML, RDF and above
all OWL (Web Ontology Language)?
• can we use RDF and OWL to break down silos,
and create useful integration of on-line data and
information?
3
people tried, but the more they were successful,
the more they failed
OWL breaks down data silos via controlled
vocabularies for the description of data
dictionaries
Unfortunately the very success of this approach
led to the creation of multiple, new, semantic
silos – because multiple ontologies are being
created in ad hoc ways
4
reasons for this effect
• Semantic Web (original) idea: if a million ‘lite
ontologies bloom’, then somehow intelligence
will be created
• let’s all build new ones (shrink-wrapped
software mentality – you will not get paid for
reusing existing ontologies)
• requirements-driven software development,
promotes forking, reduces potential for
secondary uses
5
Ontology success stories, and some
reasons for failure
A fragment of the “Linked Open
Data” in the biomedical domain
6
What you get with ‘mappings’
HPO: all phenotypes (excess hair loss, duck feet
...)
7
What you get with ‘mappings’
HPO: all phenotypes (excess hair loss, duck feet ...)
NCIT: all organisms
8
What you get with ‘mappings’
all phenotypes (excess hair loss, duck feet)
all organisms
allose (a form of sugar)
9
What you get with ‘mappings’
all phenotypes (excess hair loss, duck feet)
all organisms
allose (a form of sugar)
Acute Lymphoblastic Leukemia (A.L.L.)
10
Mappings are hard
They are fragile, and expensive to maintain
Need new authorities to maintain (one for each
pair of mapped ontologies), yielding new risk
of forking – who will police the mappings?
The goal should be to minimize the need for
mappings, by avoiding redundancy in the first
place
Invest resources in disjoint ontology modules
which work well together – reduce need for
mappings to minimum possible
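
The cost of "one authority for each pair of mapped ontologies" can be made concrete with a little arithmetic. The sketch below is only an illustration: the ontology names are hypothetical placeholders, and it assumes the worst case in which every ontology overlaps with every other.

```python
from itertools import combinations


def pairwise_mappings(ontologies):
    """Return every unordered pair of ontologies that would need its own mapping."""
    return list(combinations(sorted(ontologies), 2))


# Hypothetical overlapping ontologies, named only for illustration.
overlapping = ["A", "B", "C", "D", "E"]
pairs = pairwise_mappings(overlapping)

# n ontologies that all overlap need n * (n - 1) / 2 separately maintained mappings.
n = len(overlapping)
assert len(pairs) == n * (n - 1) // 2
print(len(pairs))
```

Under this assumption the number of mappings grows quadratically with the number of ontologies, whereas disjoint modules that never cover the same domain need none.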
11
Why should you care?
• you need to create systems for data mining
and text processing which will yield useful
digitally coded output
• if the codes you use are constantly in need of
ad hoc repair, huge resources will be wasted
• serious investment in annotation will be
defeated from the start
• relevant data will not be found, because it will
be lost in multiple semantic cemeteries
12
How to do it right?
• how to create an incremental, evolutionary process,
where what is good survives, and what is bad
fails
• where the number of ontologies needing to be
linked is small
• where links are stable
• create a scenario in which people will find it
profitable to reuse ontologies, terminologies and
coding systems which have been tried and tested
13
Reasons why GO has been
successful
It is a system for prospective standardization
built with a coherent top level but with content
contributed and monitored by domain specialists
Based on community consensus
Updated every night
Clear versioning principles ensure backwards
compatibility; prior annotations do not lose their
value
Initially low-tech to encourage users, with
movement to more powerful formal approaches
(including OWL-DL – though GO community still
recommending caution)
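
The slide does not say how GO keeps prior annotations valid across versions. One mechanism available in the OBO format is to mark a retired term as obsolete and record a `replaced_by` tag pointing at its successor. The sketch below assumes that mechanism; the identifiers and gene names are hypothetical, and `resolve` is an illustrative helper rather than part of any GO tooling.

```python
def resolve(term_id, replaced_by, max_hops=10):
    """Follow replaced_by links until reaching a term that is still current."""
    seen = set()
    current = term_id
    while current in replaced_by:
        if current in seen or len(seen) >= max_hops:
            raise ValueError(f"replacement chain for {term_id} does not terminate")
        seen.add(current)
        current = replaced_by[current]
    return current


# Hypothetical history: EX:0000001 was replaced by EX:0000002,
# which was later replaced by EX:0000003.
replaced_by = {"EX:0000001": "EX:0000002", "EX:0000002": "EX:0000003"}

# Annotations made against an old release remain usable after resolution.
old_annotations = [("geneA", "EX:0000001"), ("geneB", "EX:0000003")]
current_annotations = [
    (gene, resolve(term, replaced_by)) for gene, term in old_annotations
]
print(current_annotations)
```

Because old identifiers are never reused and always lead to a current term, an annotation made years ago can still be interpreted against the latest release.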
14
GO has learned the lessons of
successful cooperation
• Clear documentation
• The terms chosen are already familiar
• Fully open source (allows thorough testing in
manifold combinations with other ontologies)
• Subjected to considerable third-party critique
• Tracker for user input with rapid turnaround and
help desk
15
GO has been amazingly successful in
overcoming the data balkanization
problem
but it covers only generic biological entities of
three sorts:
– cellular components
– molecular functions
– biological processes
no diseases, symptoms, disease biomarkers,
protein interactions, experimental processes …
16
Columns (RELATION TO TIME): CONTINUANT (INDEPENDENT, DEPENDENT); OCCURRENT
Rows (GRANULARITY):
ORGAN AND ORGANISM
– independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT
– independent continuant: Cell (CL); Cellular Component (FMA, GO)
– dependent continuant: Cellular Function (GO)
MOLECULE
– independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
– dependent continuant: Molecular Function (GO)
– occurrent: Molecular Process (GO)
OBO (Open Biomedical Ontology) Foundry proposal
(Gene Ontology in yellow)
17
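Columns (RELATION TO TIME): CONTINUANT (INDEPENDENT, DEPENDENT); OCCURRENT
Rows (GRANULARITY):
ORGAN AND ORGANISM
– independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT
– independent continuant: Cell (CL); Cellular Component (FMA, GO)
– dependent continuant: Cellular Function (GO)
MOLECULE
– independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
– dependent continuant: Molecular Function (GO)
– occurrent: Molecular Process (GO)
Annotation on the figure: environments are here
Environment Ontology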
18
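Columns (RELATION TO TIME): CONTINUANT (INDEPENDENT, DEPENDENT); OCCURRENT
Rows (GRANULARITY):
COMPLEX OF ORGANISMS
– independent continuant: Family, Community, Deme, Population
– dependent continuant: Population Phenotype
– occurrent: Population Process
ORGAN AND ORGANISM
– independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT
– independent continuant: Cell (CL); Cellular Component (FMA, GO)
– dependent continuant: Cellular Function (GO)
MOLECULE
– independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
– dependent continuant: Molecular Function (GO)
– occurrent: Molecular Process (GO)
Population-level ontologies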
19
Ontology success stories, and
some reasons for failure
20
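Columns (RELATION TO TIME): CONTINUANT (INDEPENDENT, DEPENDENT); OCCURRENT
Rows (GRANULARITY):
COMPLEX OF ORGANISMS
– independent continuant: Family, Community, Deme, Population
– dependent continuant: Population Phenotype
– occurrent: Population Process
ORGAN AND ORGANISM
– independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT
– independent continuant: Cell (CL); Cellular Component (FMA, GO)
– dependent continuant: Cellular Function (GO)
MOLECULE
– independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
– dependent continuant: Molecular Function (GO)
– occurrent: Molecular Process (GO)
http://obofoundry.org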
21
The OBO Foundry: a step-by-step,
evidence-based approach to expand
the GO
• Developers commit to working to ensure
that, for each domain, there is community
convergence on a single ontology
• and agree in advance to collaborate with
developers of ontologies in adjacent
domains.
http://obofoundry.org
22
OBO Foundry Principles
• Common governance (coordinating editors)
• Common training
• Common architecture to overcome Tim
Berners-Lee-ism:
– simple shared top level ontology
– shared Relation Ontology:
www.obofoundry.org/ro
23
Open Biomedical Ontologies Foundry
Seeks to create high quality, validated terminology
modules across all of the life sciences which will be
• one ontology for each domain, so no need for
mappings
• close to language use of experts
• evidence-based
• incorporate a strategy for motivating potential
developers and users
• revisable as science advances
24
Principles
http://obofoundry.org/wiki/index.php/OBO_Foundry
Principles
25
Pistoia Alliance
Open standards for data and technology interfaces in
the life science research industry
• consortium of major pharmaceutical and life
science companies
• can we address the data silo problems created
by multiplicity of proprietary terminologies by
declaring terminology ‘pre-competitive’
• require shared use of something like OBO
Foundry ontologies in presentation of
information?
26
A high-level master taxonomy of included entity-types (drug, disease, target, mechanism, cell) etc
and references to which ontology instances describe them
• Drug type (small molecule, antibody, siRNA etc)
• Drug delivery mechanism (oral, inhaled, injectable)
• Developmental Status
• Disease †
• Disease Models (and disease-relevant model assays, e.g. mouse model of Huntington's)
• Disease state (chronic, acute)
• Pharmacology measurement type (IC50, IC90, Ki)*
• Pharmacology: Assay Type (in vitro, cell-free)
• Pharmacology: Assay Conditions (passage number, media?)
• Target (beta2 receptor, gastric pump)
• Mechanism of action (inhibitor, agonist)
• Species/Strain for experimental models (including plants, fungi etc)
• Cell type (adipocyte, neuron)
• Cell state (quiescent, pluripotent, differentiating)
• Cell Line
• Tissue
• Tissue-state (e.g. necrosing)
• Brain-region
• Genome Variation type (SNP, CNV)
• Genomic analysis technique (array, next-gen seq etc)
• Gene (limited to key species?)
• Protein (limited to key species?)
• Protein-protein interaction type (inhibits, phosphorylates)
• Post-translational modification
• Pathway
• Bioprocess
• Biomarker type (imaging, pharmacological)
• Toxicology observation
• ADME:
• Author ‡
• Institute (company, university)
27
Virtual Physiological Human
28
Only with a prospective standard
like that of the OBO Foundry could
something like the VPH work
designed to guarantee interoperability of ontologies
from the very start (and to keep out weeds)
initial set of 10 criteria tested in the annotation of
scientific literature
model organism databases
life science experimental results
29
Columns (RELATION TO TIME): CONTINUANT (INDEPENDENT, DEPENDENT); OCCURRENT
Rows (GRANULARITY):
ORGAN AND ORGANISM
– independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– occurrent: Organism-Level Process (GO)
CELL AND CELLULAR COMPONENT
– independent continuant: Cell (CL); Cellular Component (FMA, GO)
– dependent continuant: Cellular Function (GO)
– occurrent: Cellular Process (GO)
MOLECULE
– independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
– dependent continuant: Molecular Function (GO)
– occurrent: Molecular Process (GO)
OBO Foundry coverage
30
ORTHOGONALITY
modularity ensures
• annotations can be additive
• division of labor amongst domain experts
• high value of training in any given module
• lessons learned in one module can benefit
work on other modules
• incentivization of those responsible for
individual modules
31
Benefits of coordination
• Can more easily reuse what is made by others
• Can more easily inspect and criticize what is
made by others
• Leads to innovations (e.g. MIREOT strategy for
importing terms into ontologies)
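
MIREOT stands for Minimum Information to Reference an External Ontology Term. As a rough sketch of the idea, each imported term can be recorded with three URIs: the source ontology, the source term, and the local class it is placed under. The class and function names below are hypothetical, the target superclass URI is a made-up example, and the code is an illustration of the bookkeeping rather than any official MIREOT tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalTermReference:
    source_ontology: str    # URI of the ontology the term comes from
    source_term: str        # URI of the term being imported
    target_superclass: str  # URI of the local class it is placed under


def import_terms(references):
    """Group imported term URIs under their local parent class."""
    hierarchy = {}
    for ref in references:
        hierarchy.setdefault(ref.target_superclass, []).append(ref.source_term)
    return hierarchy


# Import a single GO term without importing the whole of GO.
ref = ExternalTermReference(
    source_ontology="http://purl.obolibrary.org/obo/go.owl",
    source_term="http://purl.obolibrary.org/obo/GO_0008150",
    target_superclass="http://example.org/my-ontology#Process",  # hypothetical
)
hierarchy = import_terms([ref])
print(hierarchy)
```

The point of the design is that the importing ontology reuses the external identifier as is, so no mapping between two competing terms is ever created.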
32
8 Foundry members (2010)
CHEBI: Chemical Entities of Biological
Interest
GO: Gene Ontology
PATO: Phenotypic Quality Ontology
PRO: Protein Ontology
XAO: Xenopus Anatomy Ontology
ZFA: Zebrafish Anatomy Ontology
33
Columns (RELATION TO TIME): CONTINUANT (INDEPENDENT, DEPENDENT); OCCURRENT
Rows (GRANULARITY):
ORGAN AND ORGANISM
– independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO); XAO; ZFA
– dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT
– independent continuant: Cell (CL); Cellular Component (FMA, GO)
– dependent continuant: Cellular Function (GO)
MOLECULE
– independent continuant: Molecule (SO, RnaO); ChEBI; PRO
– dependent continuant: Molecular Function (GO)
– occurrent: Molecular Process (GO)
Current Foundry members in yellow
34
ORGAN AND ORGANISM: Organism (NCBI Taxonomy); CARO; FMA; XAO; ZFA; Organ Function (FMP, CPRO); Phenotypic Quality (PaTO); Biological Process (GO)
CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO); Cellular Function (GO)
MOLECULE: SO; RnaO; ChEBI; PRO; Molecular Function (GO); Molecular Process (GO)
Prospective Foundry ontologies (in green):
Foundational Model of Anatomy Ontology (FMA)
Cell Ontology (CL)
Sequence Ontology (SO)
RNA Ontology (RnaO)
35
top level
– Basic Formal Ontology (BFO)
mid-level
– Information Artifact Ontology (IAO)
– Ontology for Biomedical Investigations (OBI)
– Ontology of General Medical Science (OGMS)
domain level
– Anatomy Ontology (FMA*, CARO)
– Cell Ontology (CL)
– Cellular Component Ontology (FMA*, GO*)
– Environment Ontology (EnvO)
– Subcellular Anatomy Ontology (SAO)
– Sequence Ontology (SO*)
– Protein Ontology (PRO*)
– Infectious Disease Ontology (IDO*)
– Phenotypic Quality Ontology (PaTO)
– Biological Process Ontology (GO*)
– Molecular Function (GO*)
OBO Foundry Modular Organization
36
Problem cases
Common Anatomy Reference Ontology
Disease Ontology
Function Ontologies
Cellular Component Function
Cellular Function
Organ Function
Artifact Function (pumping, transporting ...)
Environment Ontology
Species Ontology (NCBI Taxonomy)
37
IDO (Infectious Disease Ontology) Core
Follows GO strategy of providing a canonical
ontology of what is involved in every
infectious disease – host, pathogen, vector,
virulence, vaccine, transmission –
accompanied by IDO Extensions for specific
diseases, pathogens and vectors
Provides common terminology resources
and tested common guidelines for a vast
array of different disease communities
38
IDO (Infectious Disease Ontology) Consortium
• MITRE, Mount Sinai, UT Southwestern – Influenza
• IMBB/VectorBase – Vector borne diseases (A.
gambiae, A. aegypti, I. scapularis, C. pipiens, P.
humanus)
• Colorado State University – Dengue Fever
• Duke University – Tuberculosis, Staph. aureus
• Cleveland Clinic – Infective Endocarditis
• University of Michigan – Brucellosis
• Duke University, University at Buffalo – HIV
39
Ontology for General Medical
Science
http://code.google.com/p/ogms/
(OBO) http://purl.obolibrary.org/obo/ogms.obo
(OWL) http://purl.obolibrary.org/obo/ogms.owl
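
An OBO-format file is plain text made of stanzas such as `[Term]`, each holding `tag: value` lines. The sketch below shows one minimal way to read term stanzas. It works on a tiny inline sample with hypothetical identifiers rather than the real OGMS file, and `parse_obo_terms` is an illustrative helper that ignores many features of the format (escapes, trailing modifiers, non-term stanzas).

```python
def parse_obo_terms(text):
    """Parse [Term] stanzas from OBO-format text into a list of dicts."""
    terms = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and whole-line comments
        if line.startswith("["):
            # Only [Term] stanzas are collected; others (e.g. [Typedef]) are skipped.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
            continue
        if current is None:
            continue  # header lines or lines inside a skipped stanza
        tag, _, value = line.partition(":")
        # Drop the trailing "! comment" that often follows an is_a target.
        value = value.split(" ! ")[0].strip()
        current.setdefault(tag.strip(), []).append(value)
    return terms


# Hypothetical sample; the identifiers are placeholders, not real OGMS terms.
SAMPLE = """\
format-version: 1.2

[Term]
id: EX:0000001
name: disease

[Term]
id: EX:0000002
name: infectious disease
is_a: EX:0000001 ! disease

[Typedef]
id: part_of
name: part of
"""

terms = parse_obo_terms(SAMPLE)
for term in terms:
    print(term["id"][0], term["name"][0], term.get("is_a", []))
```

Values are kept as lists because a tag such as `is_a` may legitimately appear more than once in a stanza.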
40
OGMS-based initiatives
Vital Signs Ontology (VSO) (Welch Allyn)
EHR / Demographics Ontology
Infectious Disease Ontology
Mental Health Ontology
Emotion Ontology
41
Ontology for General Medical
Science
Jobst Landgrebe (then Co-Chair of the HL7
Vocabulary Group):
“the best ontology effort in the whole
biomedical domain by far”
42
EXPERIMENTAL
ARTIFACTS
Ontology for Biomedical Investigations (OBI)
CLINICAL
MEDICINE
Ontology of General Medical Science (OGMS)
INFORMATION
ARTIFACTS
Information Artifact Ontology (IAO)
How to keep clear about the distinction between
• processes of observation,
• results of such processes (measurement
data)
• the entities observed
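
The three-way distinction above can be sketched as three separate data types, so that a measurement result is never confused with the thing it is about or with the act that produced it. All class and field names below are hypothetical and chosen only for illustration; they are not taken from OBI, OGMS or IAO.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservedEntity:
    """The entity observed, e.g. a patient's body temperature."""
    label: str


@dataclass(frozen=True)
class ObservationProcess:
    """The process of observation, e.g. one act of taking a temperature."""
    label: str
    observes: ObservedEntity


@dataclass(frozen=True)
class MeasurementDatum:
    """The result of an observation process: a piece of information."""
    value: float
    unit: str
    output_of: ObservationProcess
    is_about: ObservedEntity


temperature = ObservedEntity("body temperature of patient 1")
reading = ObservationProcess("thermometer reading", observes=temperature)
datum = MeasurementDatum(37.2, "degree Celsius", output_of=reading, is_about=temperature)

# The datum points at the entity and the process but is neither of them.
print(datum.is_about.label, "|", datum.output_of.label, "|", datum.value, datum.unit)
```

Keeping the types apart makes it possible to record two different readings of the same temperature without implying that the patient has two temperatures.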
43