Computational Biology and Informatics Laboratory

Development of an Application Ontology for Beta Cell Genomics Based on the Ontology for Biomedical Investigations
Jie Zheng, Elisabetta Manduchi and Christian J. Stoeckert Jr
Department of Genetics, Perelman School of Medicine,
University of Pennsylvania
ICBO July 2013, Montreal
Beta Cell Genomics Database
• http://genomics.betacell.org/gbco/
• A functional genomics resource focused on
pancreatic beta cell research supporting a
consortium of 62 investigators and their groups
• 128 studies (version 4.1) addressing the biology
of beta cells, aspects of diabetes, and the
production of functional beta cells from
– embryonic stem cells
– mature cells of other types such as exocrine cells
Desired Features of a Beta Cell Genomics Ontology
• Support semantic annotation of beta cell studies with sufficient
granularity to cover both biological and experimental aspects
– Specimen characteristics, species, strain, anatomical entity, cell type, etc.
– Assay, protocol, data analysis methods, etc.
• Enable queries of increasing complexity (competency questions)
– Find gene expression data of endocrine cells
– Find studies using cells which develop from either mesoderm or endoderm
– Find high throughput sequencing gene expression data in samples obtained
during the embryo stage from mouse strains with genetic background
C57BL/6J
• Enable knowledge discovery based on computable definitions
– Automated cell type classification based on cell phenotype/functions and/or
genetic signatures using reasoners
• Leverage existing efforts covering the domains of investigations,
cells, anatomy, proteins, and genes
– OBO Foundry ontologies
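The first competency question only works if a query follows the subclass hierarchy: data annotated with 'type B pancreatic cell' should be returned when asking for 'endocrine cell'. The Python sketch below illustrates that behaviour with a hand-written is_a dictionary. The hierarchy, the sample records, and the helper names are hypothetical; a real system would query the OWL ontology through a reasoner or SPARQL instead.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical toy is_a hierarchy (child -> single parent), loosely modeled on CL.
IS_A: Dict[str, str] = {
    "type B pancreatic cell": "pancreatic endocrine cell",
    "type A pancreatic cell": "pancreatic endocrine cell",
    "pancreatic endocrine cell": "endocrine cell",
    "endocrine cell": "cell",
    "pancreatic acinar cell": "exocrine cell",
    "exocrine cell": "cell",
}

# Hypothetical annotated samples: (sample id, data type, cell type annotation).
SAMPLES: List[Tuple[str, str, str]] = [
    ("sample-1", "gene expression", "type B pancreatic cell"),
    ("sample-2", "gene expression", "pancreatic acinar cell"),
    ("sample-3", "gene expression", "type A pancreatic cell"),
    ("sample-4", "DNA methylation", "type B pancreatic cell"),
]


def self_and_ancestors(term: Optional[str]) -> Iterator[str]:
    """Yield the term itself, then every ancestor up to the root."""
    while term is not None:
        yield term
        term = IS_A.get(term)


def find_data(data_type: str, cell_type: str) -> List[str]:
    """Ids of samples of `data_type` annotated with `cell_type` or any of its subclasses."""
    return [
        sample_id
        for sample_id, dtype, cell in SAMPLES
        if dtype == data_type and cell_type in self_and_ancestors(cell)
    ]


print(find_data("gene expression", "endocrine cell"))  # ['sample-1', 'sample-3']
```

Without the ancestor walk, a query for 'endocrine cell' would return nothing, because no sample is annotated with that exact term.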
OBO Foundry Reference Ontologies
• A shared upper-level ontology, the Basic Formal Ontology (BFO), and common relations
• Orthogonal, interoperable ontologies that reuse existing terms
defined in OBO Foundry ontologies
• Each reference ontology covers a specific domain:
– Cell Ontology (CL): cell types
– Gene Ontology (GO): biological processes, molecular functions, and cellular components
– Protein Ontology (PR): proteins (across species)
– Uber anatomy ontology (UBERON): cross-species anatomy
– Ontology for Biomedical Investigations (OBI): all aspects of an experiment
These shared foundations facilitate ontology integration
Motivation for Developing An Application
Ontology for Beta Cell Genomics Research
• No single OBO Foundry ontology can meet our needs
• No available ontology covers the level of granularity needed by beta cell genomics research
• The typical use of multiple disconnected ontologies loses semantic power
Principles of Beta Cell Genomics
Ontology (BCGO) Development
• Reuse existing terms from OBO Foundry ontologies where possible
• Reuse existing ontology design patterns
• Use OBI as the ontology framework and
integrate subsets of other OBO Foundry
ontologies into it
• Enrich the ontology with additional axioms
when needed
Ontology for Biomedical
Investigations (OBI)
• Covers all aspects of an investigation
• Contains classes that connect OBI with other OBO Foundry
reference ontologies, such as CL, UBERON, and GO, and that
serve as parents of referenced external terms
[Figure: OBI framework classes (e.g., material entity, specimen, process, assay, data transformation, information content entity, data item, protocol, measurement unit label) serving as parents ('is a' / subClass of) of terms referenced from CL (cell), CLO (cultured cell), UBERON (gross anatomical entity), ChEBI (molecular entity), GO (cellular_component, biological_process), and UO.]
Development of BCGO
1. Identification of terms defined in OBO
Foundry Ontologies
2. Extraction of terms from OBO Foundry
ontologies
3. Integration of terms from different OBO
Foundry ontologies
4. Enrichment of BCGO by adding additional
terms and axioms
Step 1: Identification of Terms Defined
in OBO Foundry Ontologies
1. Draw terms from the MO to OBI mapping list
– The Beta Cell Genomics Database was annotated using multiple controlled vocabularies and ontologies, including the MGED Ontology (MO)
2. BioPortal Annotation Tool
– High accuracy (>95%)
– May not include the latest versions of ontologies
3. BioPortal Search Tool
– Includes partial and exact matches of the input text
– Requires more manual review than the BioPortal Annotation Tool
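The trade-off between the two BioPortal tools, exact matches that rarely need review versus partial matches that do, can be sketched with a small matcher. This is not the BioPortal API: the label index, the stop-word list, and `match_terms` are hypothetical, and the token-overlap rule is only a stand-in for BioPortal's partial matching.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical label index: normalized ontology label -> source ontology.
LABEL_INDEX: Dict[str, str] = {
    "type b pancreatic cell": "CL",
    "islet of langerhans": "UBERON",
    "insulin secretion": "GO",
}
STOP_WORDS: Set[str] = {"of", "the", "and", "in"}


def normalize(text: str) -> str:
    """Lower-case and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.strip().lower())


def tokens(text: str) -> Set[str]:
    """Content words of a label, ignoring stop words."""
    return set(normalize(text).split()) - STOP_WORDS


def match_terms(db_terms: List[str]) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Split database terms into exact matches and partial candidates for manual review."""
    exact: Dict[str, str] = {}
    partial: Dict[str, List[str]] = {}
    for term in db_terms:
        key = normalize(term)
        if key in LABEL_INDEX:
            exact[term] = LABEL_INDEX[key]
            continue
        candidates = sorted(label for label in LABEL_INDEX if tokens(label) & tokens(term))
        if candidates:
            partial[term] = candidates
    return exact, partial


exact, partial = match_terms(
    ["Islet of Langerhans", "pancreatic islet", "beta cell", "glucose-stimulated insulin secretion"]
)
print(exact)  # {'Islet of Langerhans': 'UBERON'}
print(partial)
```

Here 'pancreatic islet' yields two candidates ('islet of langerhans' and 'type b pancreatic cell'), which is the kind of hit that a curator has to resolve by hand.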
Most Terms Needed Could Be Matched
to Small Subsets of Many Ontologies
Ontology | Version | Total classes | Matched terms
OBI | 2012-07-01 | 2042 | 200
BTO* | 12/20/2012 | 5391 | 2
CARO | N/A | 50 | 1
EnVO | 2013-01-08 | 1557 | 1
ERO* | 2012-10-03 | 1579 | 2
FMA | 3.1 | 83281 | 1
GAZ | 1.512 | 518195 | 1
MP | 07/14/2012 | 9164 | 1
OGMS | 2011-09-20 | 81 | 3
RS | 1/14/2013 | 3361 | 1
SO | 11/1/2012 | 2151 | 1
SWO | 0.5 | 661 | 1
EFO* | 2.31 | 4057 | 40
ChEBI | 100 | 38901 | 12
CLO | 2.1.03 | 35436 | 11
GO | 2012-12-18 | 38747 | 2
NCBITaxon | 2013-01-24 | 981148 | 1
PR | 31.0 | 35488 | 1
UO | 2012-08-30 | 313 | 67
CL | 2013-01-31 | 2120 | 46
PATO | 01/09/2013 | 2426 | 19
UBERON | 2013-01-07 | 7318 | 126
• 852 terms are used in the Beta Cell Genomics database
• 644 of these terms were matched to 543 ontology terms
• The matched ontology terms are defined in 24 OBO Foundry ontologies, including BFO and IAO
*: application ontology
BTO: BRENDA tissue / enzyme source
CARO: Common Anatomy Reference Ontology
ChEBI: Chemical Entities of Biological Interest
CLO: Cell Line Ontology
EFO: Experimental Factor Ontology
EnVO: Environment Ontology
ERO: eagle-i resource ontology
FMA: Foundational Model of Anatomy
GAZ: Gazetteer
MP: Mammalian Phenotype
NCBITaxon: NCBI organismal classification
OGMS: Ontology for General Medical Science
PATO: Phenotypic quality
PR: Protein Ontology
RS: Rat Strain Ontology
SO: Sequence types and features
SWO: Software Ontology
UO: Units of measurement
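The counts on this slide can be cross-checked with simple arithmetic: 852 - 644 = 208 database terms were left unmatched, which agrees with the 208 unmatched terms reported under Step 4, and 644 matches onto 543 ontology terms means that some database terms share an ontology term. In Python, using only numbers from the slides:

```python
DB_TERMS = 852        # terms used in the Beta Cell Genomics database
MATCHED_TERMS = 644   # database terms matched to ontology terms
ONTOLOGY_TERMS = 543  # distinct ontology terms those matches point to

unmatched = DB_TERMS - MATCHED_TERMS
match_rate = MATCHED_TERMS / DB_TERMS
db_terms_per_ontology_term = MATCHED_TERMS / ONTOLOGY_TERMS

print(unmatched)                            # 208
print(f"{match_rate:.1%}")                  # 75.6%
print(f"{db_terms_per_ontology_term:.2f}")  # 1.19
```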
Step 2: Extraction of Terms from
OBO Foundry Ontologies
• Ontodog tool: OBI subset extraction
– Generates a community view including all related
terms and axioms
Reference: Zheng et al. International Conference on Biomedical
Ontology (ICBO), Graz, Austria, July 2012
• OntoFox tool for extracting terms from all other
OBO Foundry ontologies
– Option 1: MIREOT
– Option 2: include only minimal intermediate ontology terms
– Option 3: include all related terms and axioms
Reference: Xiang et al. (2010) BMC Research Notes, 3:175
Extraction Option 1
• Applied when BCGO used five or fewer terms from an ontology
• MIREOT: Minimum Information to Reference an External Ontology Term
– IRI of the term
– IRI of the source ontology
– IRI of the term's parent in the target ontology
– Can be done manually
Reference: Courtot et al. (2011) Applied Ontology, 6:23
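The three MIREOT fields map onto a handful of RDF triples, which is why the option can be done manually. The sketch below renders them as N-Triples with plain string formatting. Using `rdfs:subClassOf` for placement follows the slide; recording the source with IAO 'imported from' (IAO_0000412) is our assumption about common OBO practice, and the example placement of CL 'cell' directly under BFO 'material entity' is hypothetical.

```python
from dataclasses import dataclass

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# Assumption: IAO 'imported from' records the source ontology of a MIREOTed term.
IMPORTED_FROM = "http://purl.obolibrary.org/obo/IAO_0000412"


@dataclass(frozen=True)
class MireotRecord:
    term_iri: str             # IRI of the external term
    source_ontology_iri: str  # IRI of the ontology it comes from
    target_parent_iri: str    # IRI of its parent in the target ontology
    label: str = ""           # optional; assumed not to contain double quotes


def mireot_ntriples(rec: MireotRecord) -> str:
    """Render the minimum MIREOT information as N-Triples lines."""
    lines = [
        f"<{rec.term_iri}> <{RDF_TYPE}> <{OWL_CLASS}> .",
        f"<{rec.term_iri}> <{RDFS_SUBCLASS_OF}> <{rec.target_parent_iri}> .",
        f"<{rec.term_iri}> <{IMPORTED_FROM}> <{rec.source_ontology_iri}> .",
    ]
    if rec.label:
        lines.append(f'<{rec.term_iri}> <{RDFS_LABEL}> "{rec.label}" .')
    return "\n".join(lines)


# Hypothetical example record.
record = MireotRecord(
    term_iri="http://purl.obolibrary.org/obo/CL_0000000",
    source_ontology_iri="http://purl.obolibrary.org/obo/cl.owl",
    target_parent_iri="http://purl.obolibrary.org/obo/BFO_0000040",
    label="cell",
)
print(mireot_ntriples(record))
```

No term from the source ontology's own hierarchy is copied, only the referenced term and the new parent link, which keeps small imports lightweight.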
Extraction Option 2
• Keeps the hierarchical structure with a minimal set of intermediate classes
• Example: referencing human, mouse, and rat in NCBITaxon
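[Figure: the human, mouse, and rat NCBITaxon terms extracted three ways: MIREOT (no intermediates); Option 2, including computed intermediate classes; and including all intermediate classes (… 14 intermediate classes).]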
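One way to read "computed intermediates" is: keep the selected terms and the top term, plus only those ancestors where the lineages of two or more selected terms branch. The sketch below implements that reading on a simplified, abbreviated taxonomy (several real NCBITaxon ranks are omitted). It is our interpretation for illustration, not OntoFox's actual code.

```python
from typing import Dict, List, Optional, Set

# Simplified, abbreviated lineage (child -> parent); real NCBITaxon has more ranks.
PARENT_OF: Dict[str, str] = {
    "Homo sapiens": "Homo",
    "Homo": "Hominidae",
    "Hominidae": "Primates",
    "Primates": "Euarchontoglires",
    "Mus musculus": "Mus",
    "Mus": "Murinae",
    "Rattus norvegicus": "Rattus",
    "Rattus": "Murinae",
    "Murinae": "Rodentia",
    "Rodentia": "Euarchontoglires",
    "Euarchontoglires": "Mammalia",
}


def lineage(term: str, top: str) -> List[str]:
    """Return [term, parent, ..., top]."""
    path = [term]
    while path[-1] != top:
        path.append(PARENT_OF[path[-1]])
    return path


def computed_intermediates(selected: List[str], top: str) -> Dict[str, Optional[str]]:
    """Keep selected terms, the top term, and branch points; re-parent to nearest kept ancestor."""
    children: Dict[str, Set[str]] = {}
    for term in selected:
        path = lineage(term, top)
        for child, parent in zip(path, path[1:]):
            children.setdefault(parent, set()).add(child)
    branch_points = {node for node, kids in children.items() if len(kids) >= 2}
    keep = set(selected) | branch_points | {top}

    result: Dict[str, Optional[str]] = {}
    for term in keep:
        if term == top:
            result[term] = None
            continue
        node = PARENT_OF[term]
        while node not in keep:  # skip intermediates that are not branch points
            node = PARENT_OF[node]
        result[term] = node
    return result


hierarchy = computed_intermediates(["Homo sapiens", "Mus musculus", "Rattus norvegicus"], "Mammalia")
for term in sorted(hierarchy):
    print(term, "->", hierarchy[term])
```

Only 'Murinae' (where mouse and rat meet) and 'Euarchontoglires' (where primates and rodents meet) survive as intermediates, so the extracted subset stays small while still preserving the grouping of the three species.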
Extraction Option 3
• Reuse logical axioms of terms defined in source ontologies
• Example – ontology design pattern of cell in CL
Meehan et al. BMC Bioinformatics 2011, 12:6
Summary of Extraction
Methods And Results
Step 3: Integration of Terms Extracted
From Different OBO Ontologies (1)
Import retrieved terms into OBI subset (BCGO community
view) under corresponding parent classes
[Figure: terms of interest in other OBO Foundry ontologies are retrieved as OntoFox output files (subsets of CL, CLO, UBERON, GO, ChEBI, UO, ...) and placed under the corresponding parent classes (e.g., material entity, specimen, process, assay, data transformation, information content entity, data item, protocol, measurement unit label) of the Beta Cell Genomics view of OBI, linked by 'is a' / subClass of.]
- Integrated using owl:imports
- Retrieved terms from the same source ontology are kept in one OWL file
- Contains 2389 classes
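Keeping one OWL file per source ontology and tying them together with owl:imports can be sketched as two small steps: group the retrieved term IRIs by their OBO ID-space prefix, then write an ontology header that imports one module per group. The module file naming scheme and the helper names below are hypothetical; only the OBO PURL pattern (`http://purl.obolibrary.org/obo/PREFIX_ID`) and the `owl:imports` property are standard.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

OBO_PURL = "http://purl.obolibrary.org/obo/"


def source_prefix(iri: str) -> str:
    """Return the ID-space prefix of an OBO PURL, e.g. '.../CL_0000000' -> 'CL'."""
    if not iri.startswith(OBO_PURL):
        raise ValueError(f"not an OBO PURL: {iri}")
    return iri[len(OBO_PURL):].split("_", 1)[0]


def group_by_source(iris: Iterable[str]) -> Dict[str, List[str]]:
    """Group retrieved term IRIs so each source ontology can go into its own OWL file."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for iri in iris:
        groups[source_prefix(iri)].append(iri)
    return {prefix: sorted(members) for prefix, members in sorted(groups.items())}


def imports_header(ontology_iri: str, import_iris: List[str]) -> str:
    """Turtle header declaring the ontology and one owl:imports per module file."""
    statements = ["a owl:Ontology"] + [f"owl:imports <{iri}>" for iri in import_iris]
    body = " ;\n    ".join(statements)
    return f"@prefix owl: <http://www.w3.org/2002/07/owl#> .\n\n<{ontology_iri}> {body} ."


retrieved = [
    OBO_PURL + "CL_0000000",
    OBO_PURL + "UBERON_0001264",
    OBO_PURL + "CL_0000169",
    OBO_PURL + "GO_0030073",
]
groups = group_by_source(retrieved)
# Hypothetical module naming scheme: one import file per source ontology.
modules = [f"{OBO_PURL}bcgo/imports/{prefix}_import.owl" for prefix in groups]
print(imports_header(OBO_PURL + "bcgo.owl", modules))
```

Splitting by source keeps each module independently refreshable: when a source ontology is updated, only its module needs to be regenerated.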
Step 3: Integration of Terms Extracted
From Different OBO Ontologies (2)
To avoid inconsistencies caused by integrating the same terms through different paths, we removed the textual and logical definitions of externally referenced terms
[Figure: PATO terms retrieved via the OBI subset (some deprecated): their definitions are removed from the retrieved OBI subset, and definitions are retrieved from PATO instead.]
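The clean-up described above can be sketched as: for every term in the retrieved OBI subset whose ID space is not OBI, drop the copied textual definition and logical axioms, then refill them from the term's own source ontology. The record layout (dicts with `definition` and `axioms` keys), the toy data, and `reconcile_definitions` are hypothetical stand-ins for real OWL annotations and axioms.

```python
from typing import Dict

OBO = "http://purl.obolibrary.org/obo/"
REPLACEABLE = ("definition", "axioms")


def reconcile_definitions(
    subset: Dict[str, dict], home_prefix: str, sources: Dict[str, Dict[str, dict]]
) -> Dict[str, dict]:
    """Strip definitions/axioms of externally referenced terms and refill from their source."""
    result: Dict[str, dict] = {}
    for iri, record in subset.items():
        prefix = iri[len(OBO):].split("_", 1)[0]
        if prefix == home_prefix:  # native term: keep as-is
            result[iri] = dict(record)
            continue
        cleaned = {k: v for k, v in record.items() if k not in REPLACEABLE}
        source_record = sources.get(prefix, {}).get(iri, {})
        for key in REPLACEABLE:
            if key in source_record:
                cleaned[key] = source_record[key]
        result[iri] = cleaned
    return result


# Toy data: a PATO term carried inside the OBI subset with a possibly stale copy.
SUBSET = {
    OBO + "OBI_0000070": {"label": "assay", "definition": "OBI's own definition"},
    OBO + "PATO_0000001": {
        "label": "quality",
        "definition": "copy carried inside the OBI subset",
        "axioms": ["(copied axiom)"],
    },
}
SOURCES = {"PATO": {OBO + "PATO_0000001": {"definition": "definition from the PATO release"}}}

reconciled = reconcile_definitions(SUBSET, "OBI", SOURCES)
print(reconciled[OBO + "PATO_0000001"])
```

Because the copied axioms are dropped rather than merged, the integrated ontology carries only one version of each external term's definition: the one from its home ontology.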
Step 4: Enrichment of BCGO
• 208 terms could not be matched to OBO Foundry ontology terms
• 42 new terms have been added to BCGO
• Example – ‘insulin-expressing mature beta cell’
[Figure: logical definition of ‘insulin-expressing mature beta cell’, relating it to ‘type B pancreatic cell’, ‘islet of Langerhans’, ‘insulin’, ‘insulin secretion’, ‘detection of glucose’, and ‘mature’ (design pattern: Meehan et al. BMC Bioinformatics 2011, 12:6).]
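Computable definitions like this one are what make automated cell type classification possible: a reasoner can place a described cell under every defined class whose conditions it meets. As a rough illustration, the sketch below implements naive structural subsumption over genus-plus-restriction definitions; a real OWL reasoner handles far more. The relation names (`capable_of`, `part_of`, `has_quality`), the toy hierarchy, and the restrictions chosen for ‘insulin-expressing mature beta cell’ are assumptions, not the actual BCGO axioms.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple

# Hypothetical toy is_a hierarchy over named classes (child -> parent).
IS_A: Dict[str, str] = {
    "type B pancreatic cell": "endocrine cell",
    "endocrine cell": "cell",
    "insulin secretion": "peptide hormone secretion",
    "peptide hormone secretion": "biological_process",
}


def is_subclass(a: Optional[str], b: str) -> bool:
    """Reflexive, transitive is_a check over the toy hierarchy."""
    while a is not None:
        if a == b:
            return True
        a = IS_A.get(a)
    return False


@dataclass(frozen=True)
class ClassExpr:
    """genus AND (relation some filler) AND ... (EL-style pattern)."""
    genus: str
    restrictions: FrozenSet[Tuple[str, str]] = frozenset()


def subsumes(general: ClassExpr, specific: ClassExpr) -> bool:
    """True if `specific`'s genus is under `general`'s genus and every restriction of
    `general` is matched by one in `specific` (same relation, subclass filler)."""
    if not is_subclass(specific.genus, general.genus):
        return False
    return all(
        any(rel_s == rel_g and is_subclass(fill_s, fill_g) for rel_s, fill_s in specific.restrictions)
        for rel_g, fill_g in general.restrictions
    )


# Hypothetical definition and cell descriptions.
MATURE_BETA = ClassExpr("type B pancreatic cell", frozenset({
    ("capable_of", "insulin secretion"), ("part_of", "islet of Langerhans"), ("has_quality", "mature")}))
observed = ClassExpr("type B pancreatic cell", frozenset({
    ("capable_of", "insulin secretion"), ("capable_of", "detection of glucose"),
    ("part_of", "islet of Langerhans"), ("has_quality", "mature")}))
immature = ClassExpr("type B pancreatic cell", frozenset({
    ("capable_of", "insulin secretion"), ("part_of", "islet of Langerhans")}))

print(subsumes(MATURE_BETA, observed), subsumes(MATURE_BETA, immature))  # True False
```

The described cell is classified as an ‘insulin-expressing mature beta cell’ only when it carries every condition of the definition; the one lacking ‘mature’ is not.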
Ontology Validation
• Annotation: 83% of terms covered by BCGO
• Competency questions can be answered:
– Find gene expression data of endocrine cells
– Find studies using cells that develop from either mesoderm or endoderm
– Find high-throughput sequencing gene expression data in samples obtained during the embryo stage from mouse strains with genetic background C57BL/6J
• Automated cell type classification: ongoing
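The second competency question combines a union ("mesoderm or endoderm") with a chain of develops-from links. The sketch below shows that query shape on hand-written data. The develops_from links (including the intermediate 'endocrine progenitor cell'), the study annotations, and the helper names are hypothetical, and treating develops_from as transitive is an assumption made for the illustration.

```python
from typing import Dict, List, Set

# Hypothetical develops_from links (cell or tissue -> what it develops from).
DEVELOPS_FROM: Dict[str, str] = {
    "type B pancreatic cell": "endocrine progenitor cell",
    "endocrine progenitor cell": "endoderm",
    "cardiac muscle cell": "mesoderm",
    "neuron": "ectoderm",
}

# Hypothetical study annotations: study id -> cell types used.
STUDIES: Dict[str, List[str]] = {
    "study-1": ["type B pancreatic cell"],
    "study-2": ["cardiac muscle cell"],
    "study-3": ["neuron"],
    "study-4": ["neuron", "type B pancreatic cell"],
}


def develops_from_any(cell: str, origins: Set[str]) -> bool:
    """Follow develops_from links (treated as transitive) and test for any origin."""
    current = DEVELOPS_FROM.get(cell)
    while current is not None:
        if current in origins:
            return True
        current = DEVELOPS_FROM.get(current)
    return False


def find_studies(origins: Set[str]) -> List[str]:
    """Studies using at least one cell type that develops from any of `origins`."""
    return sorted(
        study for study, cells in STUDIES.items()
        if any(develops_from_any(cell, origins) for cell in cells)
    )


print(find_studies({"mesoderm", "endoderm"}))  # ['study-1', 'study-2', 'study-4']
```

'study-1' is found only through the two-step chain via the progenitor, which is why the link has to be followed transitively rather than matched directly.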
Challenges
• OBO Foundry ontologies use different versions
of the upper-level ontology, BFO
• Inconsistent representation of the same
entities in different OBO Foundry ontologies
– Example: ‘cell line cell’; alignment work has been
done by CL, CLO, and OBI developers
– Resolution: Alignment work presented in the ICBO
poster session with title ‘Alignment of Cultured
Cell Modeling Across OBO Foundry Ontologies:
Key Outcomes and Insights’ by Dr. Matthew Brush
Summary
• BCGO is available on:
http://purl.obolibrary.org/obo/bcgo.owl
• All related documents are available on:
http://code.google.com/p/bcgo-ontology/
• Development of a cross-domain application ontology
– based on the OBI framework
– reusing existing reference ontologies and ontology design
patterns
• The approach should be generally applicable when
using interoperable source ontologies
• Orthogonal interoperable OBO Foundry ontologies
facilitate ontology integration
Acknowledgements
• Emily Greenfest-Allen
• Matthew Brush
• OBI, CLO, and CL developers
• Oliver He and Allen Xiang
• NIH grants 1R01GM093132-01 and 5 U01 DK 072473
Questions?
Advantages Of Using OntoFox
• Provides many options for extracting ontology terms
• The backend RDF store contains all OBO Foundry ontologies and is reloaded daily when they are updated
• Input settings can be saved as a text file and reused