Primary Immunodeficiency Disease (PID) PhenomeR
(An integrated web-based ontology resource towards establishment of PID E-clinical decision support system)
Subazini Thankaswamy Kosalai and Sujatha Mohan (1)
(1) Research Unit for Immunoinformatics, RIKEN Research Center for Allergy and Immunology, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan.
ABSTRACT
The main challenge for in silico genotype-phenotype correlation in any genetic disease is standardizing phenotype ontology
terms and genotype data. We previously developed and established a molecular disease database named RAPID, the
Resource of Asian Primary Immunodeficiency Diseases (PID) (http://rapid.rcai.riken.jp), a web-based informatics platform
that enables PID experts to easily mine collected genomic, transcriptomic, and proteomic data on PID-causing genes. As of
February 2013, RAPID comprises 265 PIDs and 243 genes, of which 233 genes are reported with over 5000 unique
disease-causing mutations annotated from about 1800 PubMed citations. Here we introduce a newly
developed PID ontology browser, “PhenomeR” (http://rapid.rcai.riken.jp/ontology/v1.0/phenomer.php), for systematic
integration and analysis of PID phenotype data with the genotype data taken from RAPID. It currently holds 1438 PID
phenotype terms that are mapped and standardized using a logic-based assessment approach and represented in Web
Ontology Language (OWL) and Resource Description Framework (RDF) formats using semantic web technology, for easy
data exchange and validation and for interpretation of PID phenotype-genotype correlation using various computational approaches.
PhenomeR was developed mainly to help researchers and clinicians identify reported and novel
PID-causing genes, and to determine genes involved in PID through reported disease-causing mutations
and their observed symptoms. In essence, PID PhenomeR serves as an active integrated platform for PID phenotype
data, in which the generated semantic framework is implemented in an integrated knowledge-base query interface, i.e. a SPARQL
Protocol and RDF Query Language (SPARQL) endpoint, for establishing a well-informed PID e-clinical decision support system.
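To make the RDF representation mentioned in the abstract concrete, the following is a minimal sketch that serializes one phenotype term record as N-Triples using only the Python standard library. The namespace `http://example.org/pid/`, the predicates `hasCategory` and `associatedGene`, the identifiers `T0001` and `GENE1`, and the helper names are all illustrative assumptions; they are not PhenomeR's actual schema or data.

```python
# Hypothetical namespace and predicates, chosen for illustration only.
BASE = "http://example.org/pid/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def iri(local_name: str) -> str:
    """Wrap a local name in the hypothetical namespace as an N-Triples IRI."""
    return f"<{BASE}{local_name}>"


def literal(text: str) -> str:
    """Encode a plain string literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object) strings, one triple per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Placeholder record: one term with a label, a category and a gene link.
triples = [
    (iri("term/T0001"), f"<{RDFS_LABEL}>", literal("example phenotype term")),
    (iri("term/T0001"), iri("hasCategory"), iri("category/Cardiovascular")),
    (iri("term/T0001"), iri("associatedGene"), iri("gene/GENE1")),
]

ntriples_text = to_ntriples(triples)
print(ntriples_text)
```

A line-oriented format such as N-Triples is used here only because it needs no library support; the poster itself names OWL and RDF as the distributed formats.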
PID-phenomeR features
- Presents a web-based, user-friendly interface for accessing, querying, browsing and
  analyzing PID phenotype terms
- Integrates semantically standardized phenotype vocabularies from RAPID, along with PIDs,
  genes and disease-causing mutations, into a relational ontology for inference of
  genotype-phenotype correlation
- Provides PID-phenotype data in standardized downloadable formats (OWL, RDF and Excel) for
  easy sharing and data exchange among interested research groups
- Displays the phenotype terms in a tree structure using the NCBO widget
- Facilitates an integrated knowledge-base query interface: SPARQL Protocol and RDF Query
  Language (SPARQL)
- Promotes a network of active, open, community-driven semantic web technology
Figure panels: RAPID home page; Home page; PID PhenomeR database schema; RDF and OWL formats
viewed in Link Data and Protégé; term "C3 deficiency" viewed using Protégé 4.1 OntoGraf;
flow label "RESPONSE"
Masuya, H., Y. Makita, et al. (2011). "The RIKEN integrated
database of mammals." Nucleic Acids Res. 39:D861-70.
Overview of PID-phenomeR
(A) DATA COLLECTION
RAPID, IDR and literature; phenotype annotation tool; collected PID phenotype terms
(B) DATA STANDARDIZATION
Mapped terms using standard sources:
- Human Disease (DOID)
- Human Phenotype Ontology (HPO)
- Online Mendelian Inheritance in Man Metathesaurus source processing (OMIM-MTHU)
- Symptom Ontology (SYMP)
- Systematized Nomenclature of Medicine Clinical Terms (SNOMEDCT)
- The Unified Medical Language System - Concept Unique Identifiers (UMLS_CUI)
Decision points (Yes/No): Is mapped?; conservativity principle; consistency principle;
locality principle
PID quality check by logic-based assessment method
PID quality check by semi-automated method
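The "Is mapped?" decision in the data standardization step can be sketched as a lookup of each collected term against the listed standard sources. This is only an illustration: the source vocabularies below are tiny made-up stand-ins, the identifiers are placeholders rather than real HPO or SYMP codes, and case-insensitive exact label matching is an assumption, since the poster does not describe the matching procedure.

```python
def normalize(label: str) -> str:
    """Lowercase and collapse whitespace so labels compare consistently."""
    return " ".join(label.lower().split())


def map_terms(collected, sources):
    """Split collected terms into mapped (with their source hits) and unmapped."""
    # Build one index from normalized label to (source name, identifier) pairs.
    index = {}
    for source_name, entries in sources.items():
        for label, identifier in entries.items():
            index.setdefault(normalize(label), []).append((source_name, identifier))

    mapped, unmapped = {}, []
    for term in collected:
        hits = index.get(normalize(term))
        if hits:
            mapped[term] = hits      # "Is mapped?" -> Yes
        else:
            unmapped.append(term)    # "Is mapped?" -> No
    return mapped, unmapped


# Placeholder vocabularies; labels and identifiers are hypothetical.
sources = {
    "HPO": {"Example term A": "HPO:PLACEHOLDER1"},
    "SYMP": {
        "example term a": "SYMP:PLACEHOLDER2",
        "Example term B": "SYMP:PLACEHOLDER3",
    },
}
collected = ["Example  Term A", "Example term B", "Example term C"]

mapped, unmapped = map_terms(collected, sources)
print(mapped)
print(unmapped)
```

Terms that end up in `unmapped` correspond to the new terms the poster describes as still awaiting mapping.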
(C) DATA STORAGE & RETRIEVAL
Phenotype ontology database; OWL, RDF files generation; PID Phenotype KnowledgeBase;
Search and Query interface "PhenomeR" (flow label "QUERY")
Figure panels: PID PhenomeR download option; RDF file generated using OWL Syntax Converter;
search result of phenotype term; primary information page of STK4 gene in RAPID;
PID PhenomeR advanced search options
Statistics
Database Statistics
Rows: Phenotype terms; Semantic types; Category; Subcategory; Terms in multiple categories;
Terms in multiple subcategories; Newly mapped terms
Values, in the order extracted: 1466, 24, 29, 45
OWL Statistics
Rows: Classes; Individuals; Classes with single subclass; Classes with more than 25
subclasses; Average number of siblings; Object Property; Data Property
Values, in the order extracted: 1549, 144, 1346, 17, 276, 10, 161, 51, 9
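Metrics of the kind listed under OWL Statistics (classes with a single subclass, classes with more than 25 subclasses, average number of siblings) can be derived from a parent-to-children map. The sketch below uses a toy hierarchy with hypothetical class names. The definition of "average number of siblings" used here, the mean number of direct subclasses over classes that have at least one, is an assumption; the poster does not define the metric.

```python
def hierarchy_stats(children_of, many_threshold=25):
    """Compute simple class-hierarchy metrics from a parent -> children map."""
    # Every class appears either as a key or as somebody's child.
    all_classes = set(children_of)
    for kids in children_of.values():
        all_classes.update(kids)

    # Direct-subclass counts for classes that have at least one subclass.
    counts = [len(kids) for kids in children_of.values() if kids]

    return {
        "classes": len(all_classes),
        "single_subclass": sum(1 for n in counts if n == 1),
        "many_subclasses": sum(1 for n in counts if n > many_threshold),
        "average_siblings": sum(counts) / len(counts) if counts else 0.0,
    }


# Toy hierarchy; the class names are hypothetical, not taken from the ontology.
toy = {
    "Phenotype": ["Infection", "Cardiovascular"],
    "Infection": ["Recurrent infection"],
    "Cardiovascular": [],
}

stats = hierarchy_stats(toy)
print(stats)
```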
Successful outcome and challenges
PhenomeR aims to build hierarchical ontology class structures and entities
for all observed PID phenotypic terms. These can then be exposed through an
integrated knowledge-base query interface, a SPARQL Protocol and RDF Query
Language (SPARQL) endpoint, for screening and for implementing algorithms that
compile data from multiple sources and identify statistically significant
datasets with greater sensitivity, specificity and degree of confidence,
towards a well-informed clinical decision support system.
Mapping the unmapped terms in PhenomeR is a challenging
task, since some of them are not available in any of the databases. This
ongoing effort will soon implement a systematic, integrated approach for
mapping all of these new terms, towards an open community-driven semantic web (SW) technology.
PhenomeR enables users to easily access, search, query and analyze PID
phenotype terms associated with genes, diseases and mutations
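The search modes shown on the poster (terms beginning with a prefix such as 'Recurrent', terms in a category such as 'Cardiovascular', terms with a semantic type such as 'Acquired Abnormality') amount to filters over term records. The following is a minimal sketch under assumptions: the `Term` fields, the `search` function and the sample rows are hypothetical and do not reflect PhenomeR's actual database schema or content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """Hypothetical phenotype term record."""
    label: str
    category: str
    semantic_type: str


def search(terms, prefix=None, category=None, semantic_type=None):
    """Return terms matching every filter that is given (case-insensitive)."""
    def matches(term):
        if prefix is not None and not term.label.lower().startswith(prefix.lower()):
            return False
        if category is not None and term.category.lower() != category.lower():
            return False
        if semantic_type is not None and term.semantic_type.lower() != semantic_type.lower():
            return False
        return True

    return [term for term in terms if matches(term)]


# Placeholder rows for illustration only.
TERMS = [
    Term("Recurrent example finding", "Example category", "Finding"),
    Term("Example heart finding", "Cardiovascular", "Acquired Abnormality"),
    Term("Recurrent example sign", "Cardiovascular", "Finding"),
]

for hit in search(TERMS, prefix="Recurrent"):
    print(hit.label)
```

Combining filters, for example `search(TERMS, prefix="Recurrent", category="Cardiovascular")`, narrows the result in the same way an advanced search form would.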
Figure panels:
Reported list of genes
Reported list of mutation data
Mutation analysis of STK4 gene
Search result of phenotype term beginning with ‘Recurrent’
Multiple terms search output
Hyperlinked PubMed reference citation
Search result of PID phenotype term with category ‘Cardiovascular’
Search result of PID phenotype term with semantic type ‘Acquired Abnormality’
Master list of PID phenotype terms, associated features and relationships in Excel format
PID PhenomeR – Download Option – OWL format
CONCLUSION
Overall, this kind of analysis should bridge the gap between genotype and
phenotype, thereby improving phenotype-based genetic analysis of
PID genes. Moreover, it should help clinicians confirm early PID
diagnosis and implement proper therapeutic interventions.
We sincerely believe that the structured data format presented in RPO should
help biomedical researchers carry out further computational analysis
and assist clinicians in the identification of diagnosed PID.
Publications – PID project
Subazini Thankaswamy Kosalai and Sujatha Mohan. PID PhenomeR: An integrated
platform for developing phenotype ontology structures for primary immunodeficiency
diseases (Database, Oxford University Press; in communication)
PID PhenomeR project in NCBO BioPortal
http://bioportal.bioontology.org/projects/171
Term hierarchy visualization using NCBO widget from NCI Thesaurus
All distinct subjects from
RPO ontology queried
using SPARQL
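A query for all distinct subjects typically has the shape `SELECT DISTINCT ?s WHERE { ?s ?p ?o }`. The sketch below builds a GET request URL that passes such a query in the `query` parameter, as the SPARQL protocol specifies, and mirrors the query's meaning over a small in-memory triple list. The endpoint URL and the sample triples are placeholders, not PhenomeR's actual endpoint or data, and nothing is sent over the network.

```python
from urllib.parse import parse_qs, urlencode, urlparse

QUERY = "SELECT DISTINCT ?s WHERE { ?s ?p ?o }"
ENDPOINT = "http://example.org/sparql"  # placeholder endpoint, an assumption


def build_request_url(endpoint: str, query: str) -> str:
    """Encode the query string into the 'query' parameter of a GET URL."""
    return f"{endpoint}?{urlencode({'query': query})}"


def distinct_subjects(triples):
    """In-memory equivalent of the query: unique subjects in first-seen order."""
    return list(dict.fromkeys(subject for subject, _predicate, _obj in triples))


# Placeholder triples using made-up prefixed names.
triples = [
    ("ex:termA", "ex:hasCategory", "ex:cat1"),
    ("ex:termB", "ex:hasCategory", "ex:cat1"),
    ("ex:termA", "ex:associatedGene", "ex:gene1"),
]

url = build_request_url(ENDPOINT, QUERY)
# Round-trip check: decoding the URL recovers the original query text.
decoded_query = parse_qs(urlparse(url).query)["query"][0]
subjects = distinct_subjects(triples)
print(url)
print(subjects)
```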
RPO summary page in NCBO BioPortal
(http://bioportal.bioontology.org/ontologies/3114)
Contact: sujatha@rcai.riken.jp
Registration form for submitting new PID terms
Acknowledgements
The authors acknowledge RIKEN for providing the necessary computing
resources, the research team at the Institute of Bioinformatics (IOB),
Bangalore, India, for their collaboration in developing RAPID, and alumni of our
lab as well as all PID physicians involved in the PID Japan project for their
valuable input and suggestions.
Collaboration and funding
The PID project was initiated by the IOB and the Immunogenomics
research group at the Research Center for Allergy and Immunology (RCAI),
RIKEN Yokohama Institute, Japan, and was funded by the Asia S&T
Strategic Cooperation Promotion Program, Special Coordination Funds for
Promoting Science and Technology, MEXT, Japan.