Primary Immunodeficiency Disease (PID) PhenomeR
(An integrated web-based ontology resource towards establishment of a PID e-clinical decision support system)

Subazini Thankaswamy Kosalai and Sujatha Mohan1
1Research Unit for Immunoinformatics, RIKEN Research Center for Allergy and Immunology, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan.

ABSTRACT
The main challenge for in silico genotype-phenotype correlation in any genetic disease is standardizing the phenotype ontology terms and the genotype data. We previously developed and established a molecular disease database named RAPID, the Resource of Asian Primary Immunodeficiency Diseases (PID) (http://rapid.rcai.riken.jp), a web-based informatics platform that enables PID experts to easily mine collected genomic, transcriptomic, and proteomic data on PID-causing genes. As of February 2013, RAPID comprises a total of 265 PIDs and 243 genes, of which 233 genes are reported with over 5000 unique disease-causing mutations annotated from about 1800 PubMed citations. We hereby introduce a newly developed PID ontology browser, “PhenomeR” (http://rapid.rcai.riken.jp/ontology/v1.0/phenomer.php), for systematic integration and analysis of PID phenotype data together with the genotype data taken from RAPID. It currently holds 1438 PID phenotype terms that are mapped and standardized using a logic-based assessment approach and represented in Web Ontology Language (OWL) and Resource Description Framework (RDF) formats using semantic web technology, both for easy data exchange and validation and for interpretation of PID phenotype-genotype correlation using various computational approaches. The motivation for developing PhenomeR is mainly to assist researchers and clinicians in identifying reported and novel PID-causing genes, and in determining the genes involved in PID through the identification of reported disease-causing mutations and their respective observed symptoms.
In essence, PID PhenomeR serves as an active integrated platform for PID phenotype data, wherein the generated semantic framework is implemented in an integrated knowledge-base query interface, i.e. a SPARQL Protocol and RDF Query Language (SPARQL) endpoint, for establishing a well-informed PID e-clinical decision support system.

PID PhenomeR features
- Presents a web-based, user-friendly interface for accessing, querying, browsing and analyzing PID phenotype terms
- Integrates semantically standardized phenotype vocabularies from RAPID, along with PIDs, genes and disease-causing mutations, into a relational ontology for inference of genotype-phenotype correlation
- Provides PID phenotype data in standardized downloadable formats (OWL, RDF and Excel) for easy sharing and data exchange with other interested research groups
- Displays the phenotype term structure using the NCBO widget

Overview of PID PhenomeR
(A) DATA COLLECTION
Flowchart boxes: RAPID, IDR and Literature; Phenotype annotation tool; Collected PID phenotype terms
(B) DATA STANDARDIZATION
Terms are mapped using standard sources, followed by an "Is Mapped?" decision (Yes / No):
- Human Disease (DOID)
- Human Phenotype Ontology (HPO)
- Online Mendelian Inheritance in Man Metathesaurus source processing (OMIM-MTHU)
- Symptom Ontology (SYMP)
- Systematized Nomenclature of Medicine Clinical Terms (SNOMEDCT)
- The Unified Medical Language System - Concept Unique Identifiers (UMLS_CUI)

Figure panel: RAPID - Home page

Masuya, H., Y. Makita, et al. (2011). "The RIKEN integrated database of mammals." Nucleic Acids Res. 39:D861-70.
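The standardization step in the overview (each collected term is looked up in the standard sources, followed by the "Is Mapped?" decision) can be illustrated with a minimal Python sketch. The source abbreviations are taken from the poster. The lookup tables, the identifiers in them, the term "fever", the normalization rule and the function names `normalize`, `map_term` and `split_by_mapping` are hypothetical placeholders for illustration, not the authors' actual mapping procedure or real vocabulary codes.

```python
# Hypothetical lookup tables: source abbreviation -> {normalized label: identifier}.
# The identifiers are made-up placeholders, not real vocabulary codes.
STANDARD_SOURCES = {
    "HPO": {"recurrent infections": "HPO:EXAMPLE-1"},
    "DOID": {"c3 deficiency": "DOID:EXAMPLE-2"},
    "SYMP": {"fever": "SYMP:EXAMPLE-3"},
}


def normalize(term):
    """Lower-case and collapse whitespace so trivially different spellings match."""
    return " ".join(term.lower().split())


def map_term(term, sources=STANDARD_SOURCES):
    """Return {source: identifier} for every source that contains the term."""
    key = normalize(term)
    return {name: table[key] for name, table in sources.items() if key in table}


def split_by_mapping(terms):
    """Apply the 'Is Mapped?' decision: mapped terms vs. terms needing curation."""
    mapped, unmapped = {}, []
    for term in terms:
        hits = map_term(term)
        if hits:
            mapped[term] = hits
        else:
            unmapped.append(term)
    return mapped, unmapped


if __name__ == "__main__":
    collected = ["Recurrent  infections", "C3 deficiency", "Some novel phenotype"]
    mapped, unmapped = split_by_mapping(collected)
    print(mapped)
    print(unmapped)
```

Terms that land in `unmapped` correspond to the "No" branch of the flowchart. In this sketch they are simply collected for later review; how the real pipeline treats them is not specified here.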
Quality check of mapped terms
- PID quality check by logic-based assessment method in tree, with Yes/No decisions on the Conservativity principle, the Consistency principle and the Locality principle
- PID quality check by semiautomated method

Further PID PhenomeR features
- Facilitates an integrated knowledge-base query interface: SPARQL Protocol and RDF Query Language (SPARQL)
- Promotes a network of active, community-driven, open semantic web technology

(C) DATA STORAGE & RETRIEVAL
Flowchart boxes: OWL, RDF files generation; Phenotype ontology database; PID Phenotype KnowledgeBase; Search and Query interface "PhenomeR" (QUERY / RESPONSE)

Figure panels: PID PhenomeR Database Schema; RDF and OWL formats viewed in Link Data and Protégé; Home page; Term C3 deficiency viewed using Protégé 4.1 OntoGraf; PID PhenomeR - Download Option; RDF file generated using OWL Syntax Converter; Search result of phenotype term; Primary information page of STK4 gene in RAPID; PID PhenomeR Advanced search options

Statistics
Database Statistics: Phenotype terms 1466; Semantic types 24; Category 29; Subcategory 45. Further rows: Terms in Multiple Category; Terms in Multiple subcategory; Newly mapped terms.
OWL Statistics rows: Classes; Individuals; Classes with single subclass; Classes with more than 25 subclasses; Average number of Siblings; Object Property; Data Property 9.
Values of the remaining rows, in the order they appear in the source: 1549, 144, 1346, 17, 276, 10, 161, 51.

Successful outcome and challenges
PhenomeR aims to build hierarchical ontology class structures and entities for all observed PID phenotype terms. These can then serve as an integrated knowledge-base query interface (SPARQL Protocol and RDF Query Language, SPARQL) for screening and for implementing algorithms that compile data from multiple sources, so that statistically significant datasets can be measured with greater sensitivity, specificity and degree of confidence towards a well-informed clinical decision support system. Mapping the unmapped terms in PhenomeR is a challenging task, since some of them are not available in any of the databases.
This ongoing effort will soon implement a systematic, integrated approach for mapping all of these unmapped new terms, moving towards an open community-driven semantic web (SW) technology.

PhenomeR enables users to easily access, search, query and analyze PID phenotype terms associated with genes, diseases and mutations
Figure panels: Reported list of genes; Reported list of mutation data; Mutation analysis of STK4 gene; Search result of phenotype term beginning with ‘Recurrent’; Multiple terms search output; Hyperlinked PubMed reference citation; Search result of PID phenotype term with category ‘Cardiovascular’; Search result of PID phenotype term with semantic type ‘Acquired Abnormality’; Master list of PID phenotype terms, associated features and relationships in Excel format; PID PhenomeR - Download Option - OWL format; Term hierarchy visualization using NCBO widget from NCI thesaurus

CONCLUSION
Overall, this kind of analysis should bridge the gap between genotype and phenotype, thereby improving phenotype-based genetic analysis of PID genes. Moreover, it should help clinicians confirm early PID diagnosis and implement proper therapeutic interventions. We sincerely believe that the structured data format presented in RPO should help biomedical researchers carry out further computational analysis and assist clinicians in the identification of diagnosed PID.

PID PhenomeR project in NCBO BioPortal: http://bioportal.bioontology.org/projects/171

Publications - PID project
Subazini Thankaswamy Kosalai and Sujatha Mohan.
PID PhenomeR: An integrated platform for developing phenotype ontology structures for primary immunodeficiency diseases (Database, Oxford University Press; in communication)

Figure panels: All distinct subjects from RPO ontology queried using SPARQL; RPO summary page in NCBO BioPortal (http://bioportal.bioontology.org/ontologies/3114); Registration form for submitting new PID terms

Contact: sujatha@rcai.riken.jp

Acknowledgements
The authors acknowledge RIKEN for providing the necessary computing resources, the research team at the Institute of Bioinformatics (IOB), Bangalore, India, for their collaboration in developing RAPID, and the alumni of our lab as well as all PID physicians involved in the PID Japan project for their valuable input and suggestions.

Collaboration and funding
The PID project was initiated by the IOB and the Immunogenomics research group at the Research Centre for Allergy and Immunology (RCAI), RIKEN Yokohama Institute, Japan, and was funded by the Asia S&T Strategic Cooperation Promotion Program, Special Coordination Funds for Promoting Science and Technology, MEXT, Japan.
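As an illustration of the "all distinct subjects" SPARQL query mentioned among the figure panels, the following minimal Python sketch shows the general shape of such a query and what it computes. The query text uses standard SPARQL syntax. The triples, the `rpo:` prefix and the `rpo:hasCategory` property are invented placeholders rather than actual RPO content, and the set comprehension only mimics the SELECT DISTINCT semantics locally instead of calling the real endpoint.

```python
# General shape of a SPARQL query that lists every distinct subject in a graph.
DISTINCT_SUBJECTS_QUERY = """
SELECT DISTINCT ?subject
WHERE { ?subject ?predicate ?object }
"""

# Hypothetical (subject, predicate, object) triples standing in for RPO content.
TRIPLES = [
    ("rpo:C3_deficiency", "rdf:type", "owl:Class"),
    ("rpo:C3_deficiency", "rdfs:label", "C3 deficiency"),
    ("rpo:Recurrent_infections", "rdf:type", "owl:Class"),
    ("rpo:Recurrent_infections", "rpo:hasCategory", "rpo:Infection"),
]


def distinct_subjects(triples):
    """Return the sorted distinct subjects, mirroring SELECT DISTINCT ?subject."""
    return sorted({subject for subject, _predicate, _object in triples})


if __name__ == "__main__":
    print(DISTINCT_SUBJECTS_QUERY.strip())
    for subject in distinct_subjects(TRIPLES):
        print(subject)
```

Against a real endpoint the same query text would be sent over the SPARQL protocol. The local emulation is used here only so the example runs without network access or third-party libraries.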