A Guide to Ontology-Based Phenotyping Systems – Rationale and Methods Based on the Rockefeller University Experience

Andreas C. Mauer, Edward M. Barbour, Nickolay A. Khazanov, Natasha Levenkova, Shamim A. Mollah, Barry S. Coller

I. Background and Rationale

One of the major obstacles to clinical and translational science is the lack of a standardized method for recording and retrieving clinical information, including medical histories, physical examination findings, and details regarding responses to therapy. Collectively, such information helps to define an individual's phenotype. The benefits of carefully collecting and organizing longitudinal phenotypic information for research purposes are perhaps best exemplified by the Framingham study, which has provided and continues to provide enormously valuable clinical and translational information that has directly influenced medical practice and led to improvements in human health.(1)

The revolutionary advances in genomics and the growing sophistication of proteomics reinforce the need for high-quality, detailed phenotypic information because medically and scientifically meaningful gene-gene and gene-environment interactions can only be identified when correlated with detailed and reliable phenotypes. Yet, despite the general recognition of its importance to clinical and translational research, phenotyping as a scientific discipline has lagged behind advances in genetics, a deficiency that prompted Freimer and Sabatti to call for a “human phenome project.”(2) Therefore, Rockefeller University investigators have undertaken an initiative to enhance human phenotyping under the auspices of a Clinical and Translational Science Award (CTSA).
Ontology-Based Phenotyping Systems V. 0.3 Page 1 of 12 2/16/2016

To address the deficiencies in current practices, including the lack of standardized, rigorous, and comprehensive data recording instruments, the common practice of discarding case report forms after study completion, and the use of differing instruments by different investigators, we developed an electronic phenotyping system for use by investigators worldwide. This prototype system uses the bleeding history as a paradigm.

In order to promote standardization, transparency, and aggregation of data from multiple sources, as well as to facilitate data retrieval and analysis, the phenotyping system is grounded in the creation of domain ontologies for the disorders under study. Ontologies help to achieve these goals by explicitly defining the existing knowledge about a disorder, which allows a group of investigators to formally define and encode that information. In this way, the ontology allows a community of investigators to develop a common understanding of a disorder and to make assumptions about the disorder explicit. The ontology’s electronic structure facilitates the organizational analysis of the encoded knowledge, including database design and the merger of different databases. Examples of ontologies range from the Gene Ontology (GO)(3) to the Internet search engine Yahoo (“Yet Another Hierarchical Officious Oracle”).

We set the following goals for the system:
1. Ensure the quality of the instrument by expert review.
2. Maximize the use of standardized vocabulary for medical terms.
3. Ensure the security of the system.
4. Ensure transparency by making the instrument publicly available.
5. Facilitate adoption of the recording instrument by investigators at other sites by making it Web-accessible.
6. Connect the instrument to a scalable database.

II. Building the Bleeding History Phenotyping System

The Bleeding History Phenotyping System (BHPS, Figure 1) consists of a comprehensive bleeding history questionnaire, a bleeding history ontology, an electronic phenotype recording instrument (PRI), and a database.

The first step in developing the BHPS was the creation of a comprehensive Bleeding History Questionnaire (BHQ). The BHQ was used as the reference for constructing a Bleeding History Ontology (BHO). The BHO explicitly defines knowledge about, and relations between and among, bleeding symptoms in an electronic format that is scalable, standardizable, and tractable for database manipulation and machine learning applications. The BHO served as the foundation for an electronic Phenotype Recording Instrument (PRI). The PRI employs logical axioms to speed data collection as well as pictorial aids to facilitate accurate data collection; it also includes integrated data representation and analysis utilities (see Section II, 4: Phenotype Recording Instrument). The PRI is available at https://bh.rockefeller.edu/prat/. Instructions for use can be obtained from Dr. Shamim Mollah (smollah@rockefeller.edu). The BHO also serves as the template for a Bleeding History Database (BHD) that stores de-identified demographic data and question responses.

Figure 1: Bleeding History Phenotyping System. After an extensive literature search and review by experts, a paper clinical reminder form was converted into a comprehensive Bleeding History Questionnaire (BHQ). The questionnaire formed the basis for a Bleeding History Ontology (BHO) as well as a Bleeding History Database (BHD) and a graphical user interface and electronic recording instrument, the Phenotype Recording Instrument (PRI).
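To make concrete what it means for an ontology to encode relations among symptoms in a machine-tractable form, the following minimal Python sketch represents a handful of terms linked by "is a" relations and answers a subsumption query. The term names and the hierarchy are hypothetical illustrations and are not taken from the actual BHO, which was built in Protégé rather than in Python.

```python
# Hypothetical ontology fragment: each term maps to its direct "is a" parents.
# These names are illustrative only and do not reproduce the real BHO.
ONTOLOGY = {
    "BleedingSymptom": [],
    "MucosalBleeding": ["BleedingSymptom"],
    "Epistaxis": ["MucosalBleeding"],
    "GumBleeding": ["MucosalBleeding"],
    "SkinBleeding": ["BleedingSymptom"],
    "Bruising": ["SkinBleeding"],
}


def ancestors(term, ontology=ONTOLOGY):
    """Return every term reachable from `term` by following "is a" links."""
    seen = set()
    stack = list(ontology[term])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(ontology[parent])
    return seen


def is_a(term, candidate, ontology=ONTOLOGY):
    """True if `term` is `candidate` or one of its descendants."""
    return term == candidate or candidate in ancestors(term, ontology)
```

Under these assumptions, a query such as `is_a("Epistaxis", "MucosalBleeding")` lets data recorded against specific symptoms be pooled under a broader category, which is one reason an explicit hierarchy helps when aggregating data from multiple sources.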
As of August 4, 2009, the BHPS has been used by four investigators to collect comprehensive phenotypic information on bleeding symptoms from 500 normal individuals across three sites (an academic research facility and two community health centers). The BHPS is freely available to investigators worldwide, and an administrative framework for the dissemination of BHPS instruments to the hemostasis community has been established. The BHPS methodology was presented at the American Medical Informatics Association 2009 Summit on Translational Bioinformatics,(4) and preliminary analyses of data collected with the BHPS were presented at the XXII Congress of the International Society on Thrombosis and Haemostasis.

We are eager to extend our approach to the phenotyping of other disorders and are therefore pleased to offer our support to other investigators interested in developing phenotyping instruments in their own fields of expertise.

1. Medical History, Physical Examination, and Laboratory Data Selection and Organization

The first step of our phenotyping system methodology (Figure 2) entails the creation of a comprehensive phenotyping questionnaire to collect signs and symptoms associated with the disorder or group of disorders. Given the importance of expert opinion in knowledge modeling,(5) it is vital that the ontologies reflect the most recent and complete information, based on a comprehensive review of the literature and the opinion of experts in the field.
In addition, to standardize the language used in the ontology and questionnaire, we recommend mapping as many terms as possible to the codes contained in controlled vocabularies such as the Unified Medical Language System (UMLS).(6) Other mappings employed in the BHPS include the International Classification of Diseases, 9th Revision (ICD-9)(7) for medical diagnoses and Online Mendelian Inheritance in Man (OMIM)(8) for genetic information on particular disorders.

Figure 2: Sample Phenotyping System Methodology. A Phenotyping Questionnaire (1) is used to derive a Phenotype Ontology (2). The ontology is in turn used to build a Phenotype Database (3) and an electronic Phenotype Recording Instrument (4). Existing databases and/or registries may be incorporated using ontology-mediated approaches. Possible applications for phenotyping systems include generation of phenotype scoring instruments and analyses of genotype-phenotype correlations.

2. Ontology

The second step is ontology construction. Many existing ontologies may be adapted for phenotyping applications, including those contained in public ontology repositories such as BioPortal(9) and the Open Biomedical Ontologies (OBO) Foundry.(10) If no appropriate ontology already exists for the desired purpose, a new ontology must be constructed. Numerous methodologies for ontology construction have been proposed,(11-14) but all share a few common principles (adapted from Uschold(14)):
1. Identify the purpose of the ontology. What will it be used for? Who will use it?
2. Define the level of formality. In general, the more informal the ontology, the easier it is for humans to interpret. Conversely, more formal ontologies are more tractable for computerized applications such as database mergers and automated reasoning.
3. Define the scope.
Should all possible terms relevant to a given disorder be included, or will a subset of terms suffice? The scope of the ontology will be directly related to the ontology’s purpose.
4. Build the ontology. A variety of computer programs are available for ontology development, but the standard in medical domains is Protégé,(15) an open-source ontology editor supported by the National Center for Biomedical Ontology (NCBO), an element of the NIH Roadmap for Medical Research. Protégé has gained wide acceptance in the biomedical informatics community and supports several ontology formats in addition to database functions. For this reason, the BHO constructed by the Rockefeller team was encoded using Protégé.
5. Make the ontology publicly available and continually re-evaluate and revise the ontology. One of the benefits of an ontology is to help develop a consensus understanding of a topic, and this is best achieved by making the ontology publicly available so that experts in the disorder and experts in biomedical informatics can review and comment on its content and organization. This can be achieved by uploading the ontology to one of the two leading repositories of biomedical ontologies, BioPortal and the OBO Foundry. A systematic approach to updating ontologies at regular intervals based on community feedback is particularly important.

For additional details on the purpose of ontologies and methods for their construction, see the reviews by Noy and McGuinness(12) and Uschold(14).

3. Database

After ontology construction is complete, the ontology structure can be used as the template for building a database.
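As a rough illustration of using an ontology's structure as a database template, the sketch below derives one table per ontology class and one column per attribute. It is a minimal sketch under stated assumptions: the class names, attributes, and the `belongs_to` key are hypothetical, and Python's built-in `sqlite3` module stands in for a server-based system such as MySQL so that the example is self-contained. It does not reproduce the actual BHD schema.

```python
import sqlite3

# Hypothetical ontology fragment (not the real BHO): each class lists the
# class it belongs to, if any, and its attributes with SQL column types.
ONTOLOGY = {
    "Subject": {
        "belongs_to": None,
        "attributes": {"age_years": "INTEGER", "sex": "TEXT"},
    },
    "Epistaxis": {
        "belongs_to": "Subject",
        "attributes": {"ever_occurred": "INTEGER", "episodes_per_year": "INTEGER"},
    },
}


def create_statements(ontology):
    """Translate each ontology class into a CREATE TABLE statement."""
    statements = []
    for cls, spec in ontology.items():
        columns = ["id INTEGER PRIMARY KEY"]
        if spec["belongs_to"] is not None:
            owner = spec["belongs_to"].lower()
            # Link the symptom record back to the record that owns it.
            columns.append(f"{owner}_id INTEGER NOT NULL REFERENCES {owner}(id)")
        columns += [f"{name} {sqltype}" for name, sqltype in spec["attributes"].items()]
        statements.append(f"CREATE TABLE {cls.lower()} ({', '.join(columns)})")
    return statements


def build_database(ontology, path=":memory:"):
    """Create a database whose tables mirror the ontology's classes."""
    conn = sqlite3.connect(path)
    for statement in create_statements(ontology):
        conn.execute(statement)
    conn.commit()
    return conn
```

Because the schema is generated from the ontology rather than written by hand, a revision to the ontology can be carried through to the database by regenerating the statements, which is the sense in which the ontology acts as a template here.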
Because one aim of ontology-based phenotyping systems is to make data sources freely accessible via the Internet, “database” refers here to relational database systems such as Oracle, Microsoft SQL Server, or MySQL.(16) The Rockefeller BHPS is implemented in MySQL because this database package is open source, fast, and works well with programming languages such as Python and Perl that are useful for Web design.

4. Phenotype Recording Instrument

The ontology can also serve as the basis for a comprehensive, Web-based PRI similar to that used by the BHPS. The Rockefeller PRI was developed using the Python programming language and the Django Web Application Framework,(17) but numerous other options such as Adobe Dreamweaver(18) can also be used to design a PRI. Within our PRI, each group of phenotypic symptoms is independently accessible so as to create convenient, modular questionnaire sections. Within sections, logical axioms are implemented to speed questionnaire completion. For instance, if a subject answers “Yes” to the question “Have you ever had or do you currently have spontaneous nosebleeds?” the PRI will direct the subject to appropriate follow-up questions; in contrast, if the answer is “No,” the subject will not be asked any further questions about nosebleeds and will be immediately directed to the next question module. The Rockefeller PRI is time-stamped so that the time required to complete the study can be analyzed. Users can log off and log on as they wish, so the PRI does not have to be completed in a single session. Visual aids such as high-quality photographs (Figure 3) can be included to help individuals understand the questions and provide accurate responses. In addition, data representation utilities (Figure 4) can be implemented to help investigators review their data.

Figure 3: Phenotype Recording Instrument

Figure 4: Data representation utilities
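The branching behavior described for the PRI, in which a "No" to a module's opening question skips that module's follow-up questions, can be sketched in a few lines of Python. This is an illustrative sketch only and not the Rockefeller Django implementation: the module layout, the `administer` function, the follow-up questions, and the bruising module are all hypothetical. Only the nosebleed gate question is quoted from the text above.

```python
from datetime import datetime, timezone

# Hypothetical questionnaire modules. Only the nosebleed gate question comes
# from the guide; every other question here is an invented placeholder.
MODULES = [
    {
        "name": "nosebleeds",
        "gate": "Have you ever had or do you currently have spontaneous nosebleeds?",
        "follow_ups": [
            "How many nosebleeds do you have in a typical year?",
            "Has a nosebleed ever required medical attention?",
        ],
    },
    {
        "name": "bruising",
        "gate": "Do you bruise easily?",
        "follow_ups": ["Where on your body do the bruises appear?"],
    },
]


def _record(module_name, question, answer):
    """Store one response together with the time at which it was given."""
    return {
        "module": module_name,
        "question": question,
        "answer": answer,
        "answered_at": datetime.now(timezone.utc),
    }


def administer(modules, answer_fn):
    """Ask each gate question; ask follow-ups only when the gate answer is "Yes"."""
    responses = []
    for module in modules:
        gate_answer = answer_fn(module["gate"])
        responses.append(_record(module["name"], module["gate"], gate_answer))
        if gate_answer != "Yes":
            continue  # skip this module's follow-ups and go to the next module
        for question in module["follow_ups"]:
            responses.append(_record(module["name"], question, answer_fn(question)))
    return responses
```

Here `answer_fn` is any callable that takes a question and returns the subject's answer, so the same logic could be driven by a Web form, a console prompt, or scripted answers in a test. Storing a timestamp with each response is one simple way to support the completion-time analysis mentioned above.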
The process of questionnaire, ontology, database, and PRI development requires the collaboration of biomedical informaticists and clinicians. At Rockefeller, it required approximately one year to complete all of the steps in constructing the BHPS. The time required for other systems will depend on the nature of available data collection instruments, as well as the availability of clinical and biomedical informatics expertise. We are eager to help other groups develop ontology-driven phenotyping systems for other disorders by sharing our experience and offering guidance on their construction and deployment.

III. Bibliography

(1) Shindler E. Framingham Heart Study. 2008 December 3; Available from: URL: http://www.framinghamheartstudy.org/about/milestones.html
(2) Freimer N, Sabatti C. The human phenome project. Nat Genet 2003 May;34(1):15-21.
(3) Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 2000 May;25(1):25-9.
(4) Mauer AC, Barbour E, Khazanov N, Levenkova N, Mollah S, Coller BS. An Ontology-Driven Bleeding History Phenotyping System to Pool Data Across Sites. 2009 Mar 15; American Medical Informatics Association; 2009 p. 176.
(5) Gomez Perez A, Benjamins VR. Overview of Knowledge Sharing and Reuse Components: Ontologies and Problem-Solving Methods. 2009 Aug 2; 2009.
(6) Kashyap V, Borgida A. Representing the UMLS Semantic Network using OWL: (Or "What's in a Semantic Web link?"). In: Fensel D, Sycara K, Mylopoulos J, editors. The Semantic Web International Semantic Web Conference. Springer-Verlag, Heidelberg; 2003. p. 1-16.
(7) National Center for Health Statistics. International Classification of Diseases, 9th Revision. 2009 April 16; Available from: URL: http://www.cdc.gov/nchs/about/major/dvs/icd9des.htm
(8) Online Mendelian Inheritance in Man. 2009 January 5; Available from: URL: http://www.ncbi.nlm.nih.gov/omim/
(9) National Center for Biomedical Ontology. BioPortal. 2009; Available from: URL: http://bioportal.bioontology.org/
(10) Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W, Goldberg LJ, Eilbeck K, Ireland A, Mungall CJ, Leontis N, Rocca-Serra P, Ruttenberg A, Sansone SA, Scheuermann RH, Shah N, Whetzel PL, Lewis S. The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat Biotechnol 2007 November;25(11):1251-5.
(11) Gruber T. A translation approach to portable ontology specifications. Knowledge Acquisition 1993;5(2):199-220.
(12) Noy N, McGuinness D. Ontology Development 101: A Guide to Creating Your First Ontology. 2008 December 30; Available from: URL: http://protege.stanford.edu/publications/ontology_development/ontology101-noy-mcguinness.html
(13) Stevens R, Goble CA, Bechhofer S. Ontology-based knowledge representation for bioinformatics. Brief Bioinform 2000 November;1(4):398-414.
(14) Uschold M. Building Ontologies: Towards a Unified Methodology. 1996 Dec 16; 1996.
(15) Protege-OWL. 2009 January 5; Available from: URL: http://protege.stanford.edu/overview/protege-owl.html
(16) MySQL. 2009 April 2; Available from: URL: http://www.mysql.com
(17) Django. 2009 April 2; Available from: URL: www.djangoproject.com/
(18) Adobe Dreamweaver. 2009 April 2; Available from: URL: http://www.adobe.com/products/dreamweaver/