HowToBuildAnOntology.. - Buffalo Ontology Site

Barry Smith
1
Three US partner institutions:
• Stanford University Biomedical Informatics Research
• Mayo Clinic Department of Biomedical Informatics
• University at Buffalo Department of Philosophy
https://immport.niaid.nih.gov/
3
Lindsay Cowell
4
http://infectiousdiseaseontology.org
5
How to Build an Ontology
Barry Smith
http://ontology.buffalo.edu/smith
6
Schedule and Rules for Practicum
http://ncorwiki.buffalo.edu/index.php/Immunology_Ontology
7
Why to Build an Ontology?
• Scientific data are stored in databases
• There are few constraints on the creation
of new databases
• Scientific data are siloed
• How to counteract this silo-formation?
• Create a common non-redundant suite of
ontologies covering all scientific domains
to annotate (‘tag’, ‘curate’) scientific
data
8
More precisely:
How to build this suite of ontologies?
How to build ontologies that will integrate well
together?
One answer: The Semantic Web
9
integration via Linked Open Data
• HTML demonstrated the power of the Web
to allow sharing of information
• use power of hyperlinks to break down
silos, and create useful integration of online data
• via Web Ontology Language (OWL)
10
Ontology success stories, and some reasons for failure
• A fragment of the “Linked Open Data” in the biomedical domain
Not all of the links here are what they seem
11
The more Semantic Technology is
successful, the more it fails to
solve the problem of silos
Indeed it leads to the creation of
multiple, new, semantic silos
12
13
14
15
Ontology success stories, and some
reasons for failure
• “Linked Open Data” = integration via mappings
16
What you get with ‘mappings’
HPO: all phenotypes (excess hair loss, duck feet ...)
NCIT: all organisms
allose (a form of sugar)
Acute Lymphoblastic Leukemia (A.L.L.)
20
Mappings are hard
They are fragile, and expensive to maintain
This yields a new risk of forking
The goal should be to reduce the need for mappings
to the minimum possible
By creating orthogonal ontologies – one
ontology for each domain
Where to begin?
21
Uses of ‘ontology’ in PubMed abstracts
22
By far the most successful: GO (Gene Ontology)
23
GO provides a controlled system of terms for
use in annotating (describing, tagging) data
• multi-species, multi-disciplinary, open
source
• contributing to the cumulativity of scientific
results obtained by distinct research
communities
• compare use of kilograms, meters, seconds
in formulating experimental results
24
Hierarchical view representing
relations between represented
types
25
Figure: hierarchy of anatomical types. Node labels: Anatomical Structure, Anatomical Space, Organ Cavity Subdivision, Organ Cavity, Organ, Serous Sac Cavity Subdivision, Serous Sac Cavity, Serous Sac, Organ Component, Organ Subdivision, Organ Part, Tissue, Pleural Sac, Pleural Cavity, Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Interlobar recess, Pleura (Wall of Sac), Mesothelium of Pleura
US $200 mill. invested in literature and data curation using GO
more than 12 million annotations relating gene products described in the UniProt, Ensembl and other databases to terms in the GO
experimental results reported in > 60,000 scientific journal articles manually annotated by expert biologists using GO
27
GO has learned the lessons of
successful cooperation
• Based on community consensus
• Updated every night
• Clear documentation
• The terms chosen are already familiar
• Fully open source
• Subjected to considerable third-party critique
• Tracker with rapid turnaround to identify errors and gaps
28
compare: legends for maps
29
compare: legends for diagrams
30
legends for mathematical equations
x_i = vector of measurements of gene i
k = the state of the gene (as “on” or “off”)
θ_i = set of parameters of the Gaussian model
...
(see proposal p. 124f.)
legends for chemistry diagrams
Prasanna, et al. Chemical Compound Navigator: A Web-Based Chem-BLAST, Chemical Taxonomy-Based Search Engine for Browsing Compounds. PROTEINS: Structure, Function, and Bioinformatics 63:907–917 (2006)
ontologies are legends for data
33
ontologies are legends for databases
Figure: the databases GlyProt, MouseEcotope, DiabetInGene and GluChem, shown around the ontology term “sphingolipid transporter activity”
34
annotation using common ontologies
yields integration of databases
Figure: the databases GlyProt, MouseEcotope, DiabetInGene and GluChem, shown around the ontology term “Holliday junction helicase complex”
35
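A minimal sketch of why annotation with a common ontology integrates databases: once every database tags its entries with the same term, one lookup on the term reaches all of them. The database names and term labels come from the slides; the entry identifiers and the assignment of terms to entries are invented for illustration.

```python
from collections import defaultdict

# Hypothetical records: each database tags its own entries with ontology terms.
# Entry IDs and term assignments are made up; only the names are from the slides.
annotations = {
    "GlyProt": [("entry-1", "Holliday junction helicase complex")],
    "MouseEcotope": [("entry-7", "Holliday junction helicase complex")],
    "DiabetInGene": [("entry-3", "sphingolipid transporter activity")],
    "GluChem": [("entry-9", "Holliday junction helicase complex")],
}


def index_by_term(annotations):
    """Group (database, entry) pairs under the ontology term that tags them."""
    index = defaultdict(list)
    for database, records in annotations.items():
        for entry_id, term in records:
            index[term].append((database, entry_id))
    return dict(index)


index = index_by_term(annotations)
# One query against the shared term reaches entries in several databases.
hits = index["Holliday junction helicase complex"]
print(sorted(db for db, _ in hits))
```

If each database used its own private vocabulary instead, the same query would need a mapping per pair of databases.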
GO has limited coverage
represents only three groups of biological
entities:
– cellular components
– molecular functions
– biological processes
and it does not provide representations of
proteins, diseases, symptoms, …
OPEN BIOMEDICAL ONTOLOGIES FOUNDRY
36
Original OBO Foundry ontologies (Gene Ontology in yellow)
Rows: GRANULARITY. Columns: RELATION TO TIME (INDEPENDENT CONTINUANT, DEPENDENT CONTINUANT, OCCURRENT).
ORGAN AND ORGANISM
  INDEPENDENT CONTINUANT: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
  DEPENDENT CONTINUANT: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
  OCCURRENT: Biological Process (GO)
CELL AND CELLULAR COMPONENT
  INDEPENDENT CONTINUANT: Cell (CL); Cellular Component (FMA, GO)
  DEPENDENT CONTINUANT: Cellular Function (GO)
MOLECULE
  INDEPENDENT CONTINUANT: Molecule (ChEBI, SO, RnaO, PrO)
  DEPENDENT CONTINUANT: Molecular Function (GO)
  OCCURRENT: Molecular Process (GO)
37
Environment Ontology
Rows: GRANULARITY. Columns: RELATION TO TIME (INDEPENDENT CONTINUANT, DEPENDENT CONTINUANT, OCCURRENT).
ORGAN AND ORGANISM
  INDEPENDENT CONTINUANT: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
  DEPENDENT CONTINUANT: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
  OCCURRENT: Biological Process (GO)
CELL AND CELLULAR COMPONENT
  INDEPENDENT CONTINUANT: Cell (CL); Cellular Component (FMA, GO)
  DEPENDENT CONTINUANT: Cellular Function (GO)
MOLECULE
  INDEPENDENT CONTINUANT: Molecule (ChEBI, SO, RnaO, PrO)
  DEPENDENT CONTINUANT: Molecular Function (GO)
  OCCURRENT: Molecular Process (GO)
Callout on the diagram: environments are here
38
Rows: GRANULARITY. Columns: RELATION TO TIME (INDEPENDENT CONTINUANT, DEPENDENT CONTINUANT, OCCURRENT).
COMPLEX OF ORGANISMS
  INDEPENDENT CONTINUANT: Family, Community, Deme, Population
  DEPENDENT CONTINUANT: Population Phenotype
  OCCURRENT: Population Process
ORGAN AND ORGANISM
  INDEPENDENT CONTINUANT: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
  DEPENDENT CONTINUANT: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
  OCCURRENT: Biological Process (GO)
CELL AND CELLULAR COMPONENT
  INDEPENDENT CONTINUANT: Cell (CL); Cellular Component (FMA, GO)
  DEPENDENT CONTINUANT: Cellular Function (GO)
MOLECULE
  INDEPENDENT CONTINUANT: Molecule (ChEBI, SO, RnaO, PrO)
  DEPENDENT CONTINUANT: Molecular Function (GO)
  OCCURRENT: Molecular Process (GO)
http://obofoundry.org
39
The OBO Foundry: a step-by-step,
evidence-based approach to expand
the GO
• Developers commit to working to ensure that, for each domain, there is community convergence on a single ontology
• and agree in advance to collaborate with developers of ontologies in adjacent domains.
http://obofoundry.org
40
OBO Foundry Principles
• Common governance (coordinating editors)
• Common training
• Common architecture
  – simple shared top level ontology = BFO
  – shared Relation Ontology: www.obofoundry.org/ro
  – One ontology for each domain, so no need for mappings
41
Figure (repeated): the hierarchy of anatomical types, from Anatomical Structure and Anatomical Space down to Pleural Sac, Parietal Pleura, Visceral Pleura and Mesothelium of Pleura
42
For ontologies
it is generalizations that are important = types, kinds, species
43
Catalog vs. inventory
A  515287  DC3300 Dust Collector Fan
B  521683  Gilmer Belt
C  521682  Motor Drive Belt
44
types vs. instances
45
names of instances
46
names of types
47
An ontology is a representation
of types
We learn about types in reality from looking
at the results of scientific experiments in the
form of scientific theories
experiments relate to what is particular
science describes what is general
Figure: types (object, organism, animal, mammal, cat, siamese, frog) and their instances
3 kinds of (binary) relations
Between types
• human is_a mammal
• human heart part_of human
Between an instance and a type
• this human instance_of the type human
• this human allergic_to the type tamiflu
Between instances
• Mary’s heart part_of Mary
• Mary’s aorta connected_to Mary’s heart
50
Type-level relations presuppose the
underlying instance-level relations
A is_a B =def. A and B are types and all
instances of A are instances of B
A part_of B =def. All instances of A are
instance-level-parts-of some instance
of B
51
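The two definitions above can be sketched as checks over instance-level data. This is a toy illustration only: the individuals and the helper names (`instance_of`, `instance_part_of`, `instances`) are assumptions made for the example, and a finite sample can only approximate definitions that quantify over all instances.

```python
# Toy instance-level data; all names are illustrative, not from any real ontology file.
instance_of = {
    "Mary": {"human", "mammal"},
    "John": {"human", "mammal"},
    "Felix": {"cat", "mammal"},
    "Mary's heart": {"human heart"},
    "John's heart": {"human heart"},
}
# Instance-level parthood as (part, whole) pairs.
instance_part_of = {("Mary's heart", "Mary"), ("John's heart", "John")}


def instances(type_name):
    """All individuals recorded as instances of the given type."""
    return {x for x, types in instance_of.items() if type_name in types}


def is_a(a, b):
    """A is_a B: all instances of A are instances of B."""
    return instances(a) <= instances(b)


def part_of(a, b):
    """A part_of B: every instance of A is an instance-level part of some instance of B."""
    return all(
        any((x, y) in instance_part_of for y in instances(b))
        for x in instances(a)
    )


print(is_a("human", "mammal"))          # True
print(is_a("mammal", "human"))          # False: Felix is a mammal but not a human
print(part_of("human heart", "human"))  # True
```

Note that both checks hold vacuously for a type with no recorded instances, which is one reason such a check cannot replace the judgment of a domain expert.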
Figure (repeated): the hierarchy of anatomical types, from Anatomical Structure and Anatomical Space down to Pleural Sac, Parietal Pleura, Visceral Pleura and Mesothelium of Pleura
52
The assertions linking terms in
ontologies must hold universally
Hence all type-level relations are provided with
All-Some definitions
A has-part B =def. All As have some B as instance-level part
A part-of B =def. All As are instance-level parts of some B
53
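Written out in first-order notation, the two All-Some definitions might look as follows. This is a sketch: the predicate symbols inst (instance-of) and part (instance-level part, with part(u, v) read as "u is part of v") are introduced here for illustration, and time indices are left out.

```latex
\begin{align*}
A\ \textit{has-part}\ B \;&=_{\mathrm{def}}\; \forall x\,\bigl(\mathrm{inst}(x, A) \rightarrow \exists y\,(\mathrm{inst}(y, B) \wedge \mathrm{part}(y, x))\bigr)\\
A\ \textit{part-of}\ B \;&=_{\mathrm{def}}\; \forall x\,\bigl(\mathrm{inst}(x, A) \rightarrow \exists y\,(\mathrm{inst}(y, B) \wedge \mathrm{part}(x, y))\bigr)
\end{align*}
```

On this reading the two relations are not inverses of each other: A part-of B says something about every A, while B has-part A says something about every B, and neither follows from the other.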
Ontology for Biomedical
Investigations
OBI representation of a trial in a neuroscience
study
OBI representation of a vaccine
protection investigation
57
Examples of ontology classes
(types) used in these examples
Ontology relations
BFO Top-Level Ontology
Continuant
  Independent Continuant
  Dependent Continuant
Occurrent (always dependent on one or more independent continuants)
60
OBO Foundry coverage
Rows: GRANULARITY. Columns: RELATION TO TIME (INDEPENDENT CONTINUANT, DEPENDENT CONTINUANT, OCCURRENT).
ORGAN AND ORGANISM
  INDEPENDENT CONTINUANT: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
  DEPENDENT CONTINUANT: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
  OCCURRENT: Organism-Level Process (GO)
CELL AND CELLULAR COMPONENT
  INDEPENDENT CONTINUANT: Cell (CL); Cellular Component (FMA, GO)
  DEPENDENT CONTINUANT: Cellular Function (GO)
  OCCURRENT: Cellular Process (GO)
MOLECULE
  INDEPENDENT CONTINUANT: Molecule (ChEBI, SO, RnaO, PrO)
  DEPENDENT CONTINUANT: Molecular Function (GO)
  OCCURRENT: Molecular Process (GO)
61
Two kinds of entities
occurrents (processes, events, happenings)
continuants (objects, qualities, states...)
62
You are a continuant
Your life is an occurrent
You are 3-dimensional
Your life is 4-dimensional
63
Dependent entities
require independent continuants as their
bearers
There is no run without a runner
There is no grin without a cat
64
Phenotype Ontology (PATO)
this particular case of redness (of a particular fly eye) instantiates the universal red
an instance of eye (in a particular fly) instantiates the universal eye
this particular case of redness depends_on the instance of eye
65
red is_a color
the particular case of redness (of a particular fly eye) instantiates red
eye is_a anatomical structure
an instance of an eye (in a particular fly) instantiates eye
the particular case of redness depends on the instance of an eye
66
Dependent vs. independent
continuants
Independent continuants (organisms,
buildings, environments)
Dependent continuants (quality, shape,
role, propensity, function, status,
power, right)
67
All occurrents are dependent entities
They are dependent on those independent
continuants which are their participants
(agents, patients, media ...)
68
BFO Top-Level Ontology
Continuant
  Independent Continuant
  Dependent Continuant
Occurrent (always dependent on one or more independent continuants)
69
Blinding Flash of the Obvious (BFO)
Continuant
  Independent Continuant: thing ...
  Dependent Continuant: quality ...
Occurrent: process ...
70
OBO Foundry organized in terms of
Basic Formal Ontology
Each Foundry ontology can be seen as an
extension of a single upper level ontology
(BFO)
either post hoc, as in the case of the GO
or in virtue of creation ab initio via
downward population from BFO
71
Extension Strategy + Modular Organization
top level: Basic Formal Ontology (BFO)
mid-level: Information Artifact Ontology (IAO); Ontology for Biomedical Investigations (OBI); Spatial Ontology (BSPO)
domain level: Anatomy Ontology (FMA*, CARO); Cell Ontology (CL); Cellular Component Ontology (FMA*, GO*); Environment Ontology (EnvO); Subcellular Anatomy Ontology (SAO); Sequence Ontology (SO*); Protein Ontology (PRO*); Infectious Disease Ontology (IDO*); Phenotypic Quality Ontology (PaTO); Biological Process Ontology (GO*); Molecular Function (GO*)
72
Principle of Low Hanging Fruit
Include even absolutely trivial assertions
(assertions you know to be universally true)
pneumococcal bacterium is_a bacterium
Computers need to be led by the hand
73
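A small sketch of why the trivial assertion matters. A program that follows asserted is_a links cannot read the word "bacterium" inside a label; it only sees the edges it is given. The first edge below is the one from the slide; the second edge and the function name `ancestors` are assumptions for illustration.

```python
# Asserted is_a edges (child -> parent). Only the first is from the slide;
# the second is an illustrative assumption.
asserted = {
    "pneumococcal bacterium": "bacterium",
    "bacterium": "organism",
}


def ancestors(term, is_a):
    """Follow asserted is_a links upward; nothing is inferred from the term's spelling."""
    found = []
    while term in is_a:
        term = is_a[term]
        found.append(term)
    return found


print(ancestors("pneumococcal bacterium", asserted))  # ['bacterium', 'organism']

# Drop the "trivial" assertion and the machine can no longer reach 'bacterium',
# even though the word appears inside the label.
without_trivial = {k: v for k, v in asserted.items() if k != "pneumococcal bacterium"}
print(ancestors("pneumococcal bacterium", without_trivial))  # []
```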
Principle of singular nouns
Terms in ontologies represent types
Goal: Each term in an ontology should
represent exactly one type
Thus every term should be a singular noun
74
MeSH
MeSH Descriptors
Index Medicus Descriptor
Anthropology, Education, Sociology and
Social Phenomena (MeSH Category)
Social Sciences
Political Systems
National Socialism
National Socialism is_a Political Systems
National Socialism is_a Anthropology ...
75
Principle: do not confuse words
with things
mouse =def. common name for the species
Mus musculus
swimming is healthy and has eight letters
76
Principle of Aristotelian definitions
All definitions should be of the form
an S = Def. a G which Ds
where ‘G’ (for: genus) is the parent term of S
(for: species) in the corresponding reference
ontology
For example:
A human being is an animal which is rational
77
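The genus-plus-differentia form lends itself to a simple structural check: the genus named in a definition should be the term's is_a parent. A minimal sketch, assuming a hypothetical `Definition` record and a plain dictionary of asserted parents (neither comes from any ontology tool):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Definition:
    species: str      # S, the term being defined
    genus: str        # G, which should be the is_a parent of S
    differentia: str  # D, what marks out the Ss among the Gs

    def render(self):
        return f"{self.species} =def. {self.genus} which {self.differentia}"


def genus_matches_parent(definition, is_a_parent):
    """True when the definition's genus is the term's asserted is_a parent."""
    return is_a_parent.get(definition.species) == definition.genus


# Asserted parent for the slide's example term.
is_a_parent = {"human being": "animal"}
human = Definition("human being", "animal", "is rational")
print(human.render())  # human being =def. animal which is rational
print(genus_matches_parent(human, is_a_parent))  # True
```

A definition whose genus is some other term, for instance "thing", would fail the check and could be flagged for review.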
Single Inheritance
No kind in a classificatory hierarchy
should be asserted to have more
than one is_a parent on the
immediate higher level
78
Multiple Inheritance
Figure: blue car with two is_a parents, car and blue thing, both of which fall under thing
79
Multiple Inheritance
is a source of errors
encourages laziness
serves as an obstacle to integration with
neighboring ontologies
hampers use of Aristotelian methodology for
defining terms
hampers use of statistical search tools
80
Multiple Inheritance
Figure: the same diagram (thing, blue thing, car, blue car), with the two is_a links now labeled is_a1 and is_a2
81
Principle of asserted single
inheritance
Each reference ontology module should be
built as an asserted monohierarchy (a
hierarchy in which each term has at most
one parent)
Asserted hierarchy vs. inferred hierarchy
82
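Whether an asserted hierarchy is a monohierarchy can be checked mechanically by counting asserted parents per term. A minimal sketch: the edge list is reconstructed from the blue car example and its exact edges are an assumption, as is the function name `multiple_parents`.

```python
from collections import defaultdict

# Asserted is_a edges as (child, parent) pairs, reconstructed from the
# blue car example; the exact edges are an assumption.
asserted_is_a = [
    ("car", "thing"),
    ("blue thing", "thing"),
    ("blue car", "car"),
    ("blue car", "blue thing"),
]


def multiple_parents(edges):
    """Return the terms that have more than one asserted is_a parent."""
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)
    return {term: sorted(ps) for term, ps in parents.items() if len(ps) > 1}


print(multiple_parents(asserted_is_a))  # {'blue car': ['blue thing', 'car']}
```

Dropping one of the two asserted links for blue car makes the asserted hierarchy a monohierarchy again; the dropped link can still appear in the inferred hierarchy if a reasoner derives it from a definition.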
Ontology Development Principles
Reference ontologies – capture generic
content and are designed for aggressive
reuse in multiple different types of context
Single inheritance
Single reference ontology for each domain of
interest
Application ontologies – created by combining
local content with generic content taken
from relevant reference ontologies
OBO Foundry: Downward Population from BFO
top level: Basic Formal Ontology (BFO)
mid-level: Information Artifact Ontology (IAO); Ontology for Biomedical Investigations (OBI); Spatial Ontology (BSPO)
domain level: Anatomy Ontology (FMA*, CARO); Cell Ontology (CL); Cellular Component Ontology (FMA*, GO*); Environment Ontology (EnvO); Subcellular Anatomy Ontology (SAO); Sequence Ontology (SO*); Protein Ontology (PRO*); Infectious Disease Ontology (IDO*); Phenotypic Quality Ontology (PaTO); Biological Process Ontology (GO*); Molecular Function (GO*)
84
Example: The Cell Ontology
How to build an ontology
• import BFO into ontology editor (Protégé)
• work with domain experts to create an initial mid-level classification
• find ~50 most commonly used terms corresponding to types in reality
• arrange these terms into an informal is_a hierarchy according to this universality principle
• A is_a B ⇔ every instance of A is an instance of B
• fill in missing terms to give a complete hierarchy
• (leave it to domain experts to populate the lower levels of the hierarchy)
86
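The steps above can be sketched without an ontology editor, using a plain dictionary in place of Protégé. The top-level categories are the ones named on the BFO slides; the domain terms, their parents and the helper names (`missing_parents`, `path_to_root`) are hypothetical and chosen only to show how a gap in the hierarchy is found and filled.

```python
# Top-level categories as named on the BFO slides (child -> parent; roots map to None).
bfo = {
    "Continuant": None,
    "Occurrent": None,
    "Independent Continuant": "Continuant",
    "Dependent Continuant": "Continuant",
    "Material Entity": "Independent Continuant",
    "Process": "Occurrent",
}

# Hypothetical first-pass domain terms gathered from experts (child -> parent).
domain_terms = {
    "cell": "Material Entity",
    "T cell": "lymphocyte",  # parent not yet declared
    "cell division": "Process",
}


def missing_parents(hierarchy):
    """Terms used as a parent but never declared: gaps to fill in."""
    return sorted({p for p in hierarchy.values() if p is not None and p not in hierarchy})


def path_to_root(term, hierarchy):
    """Walk is_a links from a term up to a top-level root."""
    path = [term]
    while hierarchy.get(path[-1]) is not None:
        path.append(hierarchy[path[-1]])
    return path


ontology = {**bfo, **domain_terms}
print(missing_parents(ontology))   # ['lymphocyte']
ontology["lymphocyte"] = "cell"    # fill in the missing term
print(missing_parents(ontology))   # []
print(path_to_root("T cell", ontology))
```

Once no parents are missing, every domain term has an is_a path up to a BFO category, which is what downward population from BFO aims for.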
BFO Top-Level Ontology
Continuant
  Independent Continuant
    Material Entity
  Dependent Continuant
    Attribute
Occurrent (always dependent on one or more independent continuants)
  Process
87
BFO Top-Level Ontology
Continuant
  Material Entity
  Attribute
Process
88
http://ncorwiki.buffalo.edu/index.php/Immunology_Ontology
89
Terms for an Allergy Ontology
peanut allergy disease = IgE-mediated hypersensitivity to peanut allergen(s)
allergic asthma
anaphylaxis (as reaction)
peanut allergy disorder = Mast cells and basophils with peanut allergen-specific IgE bound to their membranes
mast cell/basophil degranulation
anaphylaxis (as syndrome)
peanut allergy
allergic reaction
milk allergy
acute urticaria (hives)
ragweed allergy
allergic angioedema
dust mite allergy
allergic rhinitis
90
From the allergy example
91
From the allergy example
92
93