Phenotyping System Development Guide

A Guide to Ontology-Based Phenotyping Systems – Rationale and Methods Based on the Rockefeller University Experience
Andreas C. Mauer, Edward M. Barbour, Nickolay A. Khazanov, Natasha Levenkova, Shamim A. Mollah, Barry S. Coller
I. Background and Rationale
One of the major obstacles to clinical and translational science is the lack of a
standardized method for recording and retrieving clinical information, including medical
histories, physical examination findings, and details regarding responses to therapy.
Collectively, such information helps to define an individual's phenotype. The benefits of carefully
collecting and organizing longitudinal phenotypic information for research purposes are perhaps
best exemplified by the Framingham study, which has provided and continues to provide
enormously valuable clinical and translational information that has directly influenced medical
practice and led to improvements in human health.(1) The revolutionary advances in genomics
and the growing sophistication of proteomics reinforce the need for high-quality, detailed
phenotypic information because medically and scientifically meaningful gene-gene and gene-environment interactions can only be identified when correlated with detailed and reliable
phenotypes. Yet, despite the general recognition of its importance to clinical and translational
research, phenotyping as a scientific discipline has lagged behind advances in genetics, a
deficiency that prompted Freimer and Sabatti to call for a “human phenome project.”(2)
Therefore, Rockefeller University investigators have undertaken an initiative to enhance
human phenotyping under the auspices of a Clinical and Translational Science Award (CTSA).
To address the deficiencies in current practices – including the lack of standardized, rigorous,
and comprehensive data recording instruments, the common practice of discarding case report
forms after study completion, and the use of differing instruments by different investigators – we
developed an electronic phenotyping system for use by investigators worldwide. This prototype
system uses the bleeding history as a paradigm.
Ontology-Based Phenotyping Systems
V. 0.3
Page 1 of 12
2/16/2016
In order to promote standardization, transparency, and aggregation of data from multiple
sources, as well as to facilitate data retrieval and analysis, the phenotyping system is grounded
in the creation of domain ontologies for the disorders under study. Ontologies help to achieve
these goals by explicitly defining the existing knowledge about a disorder. This allows a group of
investigators to formally define and encode that information. In this way, the ontology allows one
to develop a common understanding of a disorder among a community of investigators and
make assumptions about the disorder explicit. The ontology’s electronic structure facilitates the
organizational analysis of the encoded knowledge, including database design and the merger of
different databases. Examples of ontologies range from the Gene Ontology (GO)(3) to the
hierarchical topic directory of the Internet search engine Yahoo (“Yet Another Hierarchical Officious Oracle”).
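To make the idea of explicitly encoded relations more concrete, the following minimal Python sketch represents a handful of bleeding-symptom terms as an is_a hierarchy and answers a subsumption query (which narrower terms fall under a broader class). The term labels, the hypothetical BHO: identifiers, and the parent-child links are illustrative assumptions, not the content of the actual BHO.

```python
from collections import defaultdict

# Hypothetical identifiers and labels; NOT taken from the actual BHO.
TERMS = {
    "BHO:0001": "bleeding symptom",
    "BHO:0002": "mucocutaneous bleeding",
    "BHO:0003": "epistaxis",
    "BHO:0004": "gum bleeding",
    "BHO:0005": "procedural bleeding",
    "BHO:0006": "bleeding after tooth extraction",
}

# Each (child, parent) pair asserts that child is_a parent.
IS_A = [
    ("BHO:0002", "BHO:0001"),
    ("BHO:0003", "BHO:0002"),
    ("BHO:0004", "BHO:0002"),
    ("BHO:0005", "BHO:0001"),
    ("BHO:0006", "BHO:0005"),
]

# Index children by parent so the hierarchy can be walked downward.
CHILDREN: dict[str, list[str]] = defaultdict(list)
for child, parent in IS_A:
    CHILDREN[parent].append(child)


def descendants(term_id: str) -> set[str]:
    """Return every term subsumed by term_id (transitive closure of is_a)."""
    found: set[str] = set()
    stack = [term_id]
    while stack:
        for child in CHILDREN[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def is_subsumed_by(term_id: str, ancestor_id: str) -> bool:
    """True if term_id is ancestor_id itself or one of its descendants."""
    return term_id == ancestor_id or term_id in descendants(ancestor_id)


if __name__ == "__main__":
    print(sorted(TERMS[t] for t in descendants("BHO:0002")))
    # ['epistaxis', 'gum bleeding']
```

Because the hierarchy is explicit, a query for a broad class automatically picks up data recorded against any narrower term, which is one way an ontology can help pool data collected at different levels of detail.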
We set the following goals for the system:
1. Ensure the quality of the instrument by expert review.
2. Maximize the use of standardized vocabulary for medical terms.
3. Ensure the security of the system.
4. Ensure transparency by making the instrument publicly available.
5. Facilitate adoption of the recording instrument by investigators at other sites by making it Web-accessible.
6. Connect the instrument to a scalable database.
II. Building the Bleeding History Phenotyping System
The Bleeding History Phenotyping System (BHPS; Figure 1) consists of a
comprehensive bleeding history questionnaire, a bleeding history ontology, an electronic
phenotype recording instrument (PRI), and a database.
The first step in developing the BHPS was the creation of a comprehensive Bleeding
History Questionnaire (BHQ). The BHQ was used as the reference for constructing a Bleeding
History Ontology (BHO). The BHO explicitly defines knowledge about, and relations between
and among, bleeding symptoms in an electronic format that is scalable, standardizable, and
tractable for database manipulation and machine learning applications. The BHO served as the
foundation for an electronic Phenotype Recording Instrument (PRI). The PRI employs logical
axioms to speed data collection as well as pictorial aids to facilitate accurate data collection; it
also includes integrated data representation and analysis utilities (see Section II.4, Phenotype
Recording Instrument). The PRI is available at https://bh.rockefeller.edu/prat/. Instructions for
use can be obtained from Dr. Andreas Mauer (smollah@rockefeller.edu). The BHO also serves
as the template for a Bleeding History Database (BHD) that stores de-identified demographic
data and question responses.
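The BHD stores de-identified demographic data alongside question responses. The Python sketch below shows one simple way a submission might be reduced before storage, assuming a hypothetical raw record containing direct identifiers (name, mrn, date_of_birth). It drops those fields, assigns a random study identifier, and keeps only age in whole years, pooling ages above 89 into a single "90+" category as the HIPAA Safe Harbor method requires. It illustrates the idea only and is not a complete de-identification procedure.

```python
import uuid
from datetime import date

# Hypothetical direct identifiers that a raw intake record might contain.
DIRECT_IDENTIFIERS = {"name", "mrn", "date_of_birth", "email", "phone"}


def age_in_years(born: date, on: date) -> int:
    """Whole years elapsed between born and on."""
    had_birthday = (on.month, on.day) >= (born.month, born.day)
    return on.year - born.year - (0 if had_birthday else 1)


def deidentify(record: dict, visit_date: date) -> dict:
    """Return a copy of record without direct identifiers.

    Age is kept in whole years, with ages over 89 pooled as "90+".
    A random study ID replaces the identifiers; any key linking it back
    to the person would have to be stored outside the research database.
    """
    age = age_in_years(record["date_of_birth"], visit_date)
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["study_id"] = uuid.uuid4().hex
    cleaned["age"] = "90+" if age > 89 else age
    return cleaned


if __name__ == "__main__":
    raw = {
        "name": "Jane Doe",
        "mrn": "000123",
        "date_of_birth": date(1970, 6, 15),
        "sex": "F",
        "epistaxis": "yes",
    }
    print(deidentify(raw, visit_date=date(2009, 8, 4)))
```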
Figure 1: Bleeding History Phenotyping System.
After an extensive literature search and review by experts, a paper clinical reminder form was converted into a
comprehensive Bleeding History Questionnaire (BHQ). The questionnaire formed the basis for a Bleeding History
Ontology (BHO) as well as a Bleeding History Database (BHD) and a graphical user interface and electronic
recording instrument, the Phenotype Recording Instrument (PRI).
As of August 4, 2009, the BHPS had been used by four investigators to collect
comprehensive phenotypic information on bleeding symptoms from 500 normal individuals
across three sites (an academic research facility and two community health centers). The BHPS
is freely available to investigators worldwide, and an administrative framework for the
dissemination of BHPS instruments to the hemostasis community has been established. The
BHPS methodology was presented at the American Medical Informatics Association 2009
Summit on Translational Bioinformatics,(4) and preliminary analyses of data collected with the
BHPS were presented at the XXII Congress of the International Society on Thrombosis and
Haemostasis.
We are eager to extend our approach to the phenotyping of other disorders and are
therefore pleased to offer our support to other investigators interested in developing
phenotyping instruments in their own fields of expertise.
1. Medical History, Physical Examination, and Laboratory Data Selection and Organization
The first step of our phenotyping system methodology (Figure 2) entails the creation of a
comprehensive phenotyping questionnaire to collect signs and symptoms associated with the
disorder or group of disorders. Given the importance of expert opinion in knowledge
modeling,(5) it is vital that the questionnaire, and the ontology built from it, reflect current
knowledge, based on a comprehensive review of the literature and the opinion of experts in the field. In
addition, to standardize the language used in the ontology and questionnaire, we recommend
mapping as many terms as possible to the codes contained in controlled vocabularies such as
the Unified Medical Language System (UMLS).(6) Other mappings employed in the BHPS
include the International Classification of Diseases, 9th Revision (ICD-9)(7) for medical diagnoses
and Online Mendelian Inheritance in Man (OMIM)(8) for genetic information on particular
disorders.
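One lightweight way to track progress toward mapping as many terms as possible is to store each questionnaire item's vocabulary codes next to its wording and flag items that are still unmapped or carry malformed codes. The Python sketch below does this for two hypothetical items. The epistaxis codes shown (UMLS CUI C0014591, ICD-9 784.7) are for illustration and should be verified against current vocabulary releases; the regular expressions are rough format checks only and cannot confirm that a code exists or is correct.

```python
import re

# Rough format checks only: a well-formed code may still be wrong or retired.
CODE_PATTERNS = {
    "UMLS": re.compile(r"^C\d{7}$"),  # concept unique identifier (CUI)
    "ICD-9": re.compile(r"^(\d{3}|V\d{2}|E\d{3})(\.\d{1,2})?$"),
    "OMIM": re.compile(r"^\d{6}$"),  # six-digit MIM number
}

# Hypothetical questionnaire items with their vocabulary mappings.
ITEM_MAPPINGS = {
    "nosebleeds": {
        "label": "Spontaneous nosebleeds",
        "UMLS": "C0014591",  # Epistaxis (verify against the current UMLS release)
        "ICD-9": "784.7",    # Epistaxis
    },
    "gum_bleeding": {"label": "Bleeding from the gums"},  # not yet mapped
}


def unmapped_items(mappings: dict[str, dict[str, str]]) -> list[str]:
    """Items that carry no code from any vocabulary in CODE_PATTERNS."""
    return sorted(
        item for item, codes in mappings.items()
        if not any(vocab in codes for vocab in CODE_PATTERNS)
    )


def malformed_codes(mappings: dict[str, dict[str, str]]) -> list[tuple[str, str, str]]:
    """(item, vocabulary, code) triples whose code fails the format check."""
    return [
        (item, vocab, codes[vocab])
        for item, codes in sorted(mappings.items())
        for vocab, pattern in CODE_PATTERNS.items()
        if vocab in codes and not pattern.match(codes[vocab])
    ]


if __name__ == "__main__":
    print("Unmapped:", unmapped_items(ITEM_MAPPINGS))
    print("Malformed:", malformed_codes(ITEM_MAPPINGS))
```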
Figure 2: Sample Phenotyping System Methodology
A Phenotyping Questionnaire (1) is used to derive a Phenotype Ontology (2). The ontology is in turn used to build a Phenotype
Database (3) and an electronic Phenotype Recording Instrument (4). Existing databases and/or registries may be incorporated
using ontology-mediated approaches. Possible applications for phenotyping systems include generation of phenotype scoring
instruments and analyses of genotype-phenotype correlations.
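The caption mentions incorporating existing databases through ontology-mediated approaches. One minimal reading of that idea, sketched in Python below, is to map each source's local field names onto shared ontology term identifiers, normalize answer encodings, and then pool records keyed by those identifiers. The two site schemas, their field names, and the BHO: identifiers are hypothetical.

```python
# Hypothetical local field names at two sites, each mapped to a shared
# (also hypothetical) ontology term identifier.
SITE_A_FIELDS = {"nose_bleed": "BHO:0003", "gum_bleed": "BHO:0004"}
SITE_B_FIELDS = {"epistaxis_yn": "BHO:0003", "bleeding_gums": "BHO:0004"}

# Sites may also encode answers differently; normalize them to True/False.
YES_NO = {"yes": True, "y": True, "1": True, "true": True,
          "no": False, "n": False, "0": False, "false": False}


def normalize(value) -> bool:
    """Map a site-specific yes/no encoding onto a boolean."""
    key = str(value).strip().lower()
    if key not in YES_NO:
        raise ValueError(f"unrecognized answer: {value!r}")
    return YES_NO[key]


def to_ontology_terms(record: dict, field_map: dict[str, str]) -> dict:
    """Re-key one site's record by ontology ID, dropping unmapped fields."""
    return {field_map[k]: normalize(v) for k, v in record.items() if k in field_map}


def pool(sources) -> list[dict]:
    """Merge (site, records, field_map) triples into one term-keyed table."""
    pooled = []
    for site, records, field_map in sources:
        for record in records:
            row = to_ontology_terms(record, field_map)
            row["site"] = site
            pooled.append(row)
    return pooled


if __name__ == "__main__":
    site_a = [{"nose_bleed": "yes", "gum_bleed": "no", "chart_no": "A-17"}]
    site_b = [{"epistaxis_yn": 1, "bleeding_gums": 0}]
    for row in pool([("A", site_a, SITE_A_FIELDS), ("B", site_b, SITE_B_FIELDS)]):
        print(row)
```

Fields with no ontology mapping (here the local chart_no) are simply dropped, which keeps site-specific and potentially identifying data out of the pooled table.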
2. Ontology
The second step is ontology construction. Many existing ontologies may be adapted for
phenotyping applications, including those contained in public ontology repositories such as
BioPortal(9) and the Open Biomedical Ontologies (OBO) Foundry.(10) If no appropriate
ontology already exists for the desired purpose, a new ontology must be constructed. Numerous
methodologies for ontology construction have been proposed,(11-14) but all share a few
common principles (adapted from Uschold(14)):
1. Identify the purpose of the ontology. What will it be used for? Who will use it?
2. Define the level of formality. In general, the more informal the ontology, the easier it is
for humans to interpret. Conversely, more formal ontologies are more tractable for computerized
applications such as database mergers and automated reasoning.
3. Define the scope. Should all possible terms relevant to a given disorder be included,
or will a subset of terms suffice? The scope of the ontology will be directly related to the
ontology’s purpose.
4. Build the ontology. A variety of computer programs are available for ontology
development, but a de facto standard in biomedical domains is Protégé,(15) an open-source ontology
editor supported by the National Center for Biomedical Ontology (NCBO), an element of the NIH
Roadmap for Medical Research. Protégé has gained wide acceptance in the biomedical
informatics community and supports several ontology formats in addition to database functions.
For this reason, the BHO constructed by the Rockefeller team was encoded using Protégé.
5. Make the ontology publicly available and continually re-evaluate and revise the
ontology. One of the benefits of an ontology is to help develop a consensus understanding of a
topic, and this is best achieved by making the ontology publicly available so that experts in the
disorder and experts in biomedical informatics can review and comment on its content and
organization. This can be achieved by uploading the ontology to one of the two leading
repositories of biomedical ontologies, BioPortal and the OBO Foundry. A systematic approach
to updating ontologies at regular intervals based on community feedback is particularly
important.
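Ontologies in repositories such as BioPortal and the OBO Foundry are commonly distributed as OWL or OBO files. In practice an editor such as Protégé would produce the file, but as an illustration of what step 5 publishes, the Python sketch below serializes a small term list into OBO 1.2 flat-file [Term] stanzas. The namespace bleeding_history, the term identifiers, and the definitions are hypothetical.

```python
# (id, name, definition, parent id or None); all hypothetical.
TERMS = [
    ("BHO:0000001", "bleeding symptom",
     "Any reported episode of abnormal bleeding.", None),
    ("BHO:0000002", "mucocutaneous bleeding",
     "Bleeding from the skin or mucous membranes.", "BHO:0000001"),
    ("BHO:0000003", "epistaxis", "Bleeding from the nose.", "BHO:0000002"),
]


def to_obo(terms, namespace: str = "bleeding_history") -> str:
    """Serialize (id, name, definition, parent) tuples as an OBO 1.2 document."""
    names = {term_id: name for term_id, name, _, _ in terms}
    lines = ["format-version: 1.2", f"default-namespace: {namespace}", ""]
    for term_id, name, definition, parent in terms:
        lines += ["[Term]", f"id: {term_id}", f"name: {name}"]
        # OBO definitions are quoted strings followed by a (here empty) xref list.
        escaped = definition.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'def: "{escaped}" []')
        if parent is not None:
            # Text after "!" is a human-readable comment naming the parent.
            lines.append(f"is_a: {parent} ! {names[parent]}")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_obo(TERMS))
```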
For additional details on the purpose of ontologies and methods for their construction,
reviews by Noy(12) and Uschold(14) can be consulted.
3. Database
After ontology construction is complete, the ontology structure can be used as the
template for building a database. Because one aim of ontology-based phenotyping systems is to
make data sources freely accessible via the Internet, “database” here refers to server-based relational
database management systems such as Oracle, Microsoft SQL Server, or MySQL.(16) The Rockefeller BHPS
is implemented in MySQL because this database package is open source, fast, and well supported
by programming languages such as Python and Perl that are widely used for Web development.
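One way to use the ontology as a database template is to store the term hierarchy itself in a table and record each subject's answers against term identifiers, so that adding a question means adding rows rather than altering columns. The sketch below uses Python's built-in sqlite3 module so that it runs without a server; the table and column names are hypothetical and are not the BHD's actual schema. A MySQL deployment would use a MySQL client library instead, and MySQL supports the same WITH RECURSIVE query form from version 8.0 onward.

```python
import sqlite3

# Hypothetical schema: terms mirror the ontology; responses reference terms.
SCHEMA = """
CREATE TABLE term (
    term_id   TEXT PRIMARY KEY,               -- ontology identifier
    label     TEXT NOT NULL,
    parent_id TEXT REFERENCES term (term_id)  -- is_a parent, NULL at the root
);
CREATE TABLE subject (
    study_id TEXT PRIMARY KEY,                -- de-identified subject key
    site     TEXT NOT NULL,
    age      INTEGER
);
CREATE TABLE response (
    study_id TEXT NOT NULL REFERENCES subject (study_id),
    term_id  TEXT NOT NULL REFERENCES term (term_id),
    answer   TEXT NOT NULL,
    PRIMARY KEY (study_id, term_id)
);
"""

# Subjects answering "yes" to any term at or below a given class.
SUBJECTS_UNDER = """
WITH RECURSIVE subtree(term_id) AS (
    SELECT ?
    UNION
    SELECT t.term_id FROM term AS t JOIN subtree AS s ON t.parent_id = s.term_id
)
SELECT DISTINCT r.study_id
FROM response AS r JOIN subtree AS s ON r.term_id = s.term_id
WHERE r.answer = 'yes'
ORDER BY r.study_id
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database populated with illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO term VALUES (?, ?, ?)", [
        ("BHO:0001", "bleeding symptom", None),
        ("BHO:0002", "mucocutaneous bleeding", "BHO:0001"),
        ("BHO:0003", "epistaxis", "BHO:0002"),
        ("BHO:0004", "gum bleeding", "BHO:0002"),
    ])
    conn.executemany("INSERT INTO subject VALUES (?, ?, ?)",
                     [("S1", "A", 34), ("S2", "B", 51), ("S3", "B", 27)])
    conn.executemany("INSERT INTO response VALUES (?, ?, ?)", [
        ("S1", "BHO:0003", "yes"), ("S1", "BHO:0004", "no"),
        ("S2", "BHO:0004", "yes"), ("S3", "BHO:0003", "no"),
    ])
    return conn


if __name__ == "__main__":
    db = build_demo_db()
    print([row[0] for row in db.execute(SUBJECTS_UNDER, ("BHO:0002",))])
```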
4. Phenotype Recording Instrument
The ontology can also serve as the basis for a comprehensive, Web-based PRI similar
to that used by the BHPS. The Rockefeller PRI was developed using the Python programming
language and the Django Web Application Framework,(17) but numerous other options such as
Adobe Dreamweaver(18) can also be used to design a PRI.
Within our PRI, each group of phenotypic symptoms is independently accessible so as
to create convenient, modular questionnaire sections. Within sections, logical axioms are
implemented to speed questionnaire completion. For instance, if a subject answers “Yes” to the
question “Have you ever had or do you currently have spontaneous nosebleeds?” the PRI will
direct the subject to appropriate follow-up questions; in contrast, if the answer is “No,” the subject
will not be asked any further questions about nosebleeds and will be immediately directed to the
next question module.
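The branching described above can be expressed as data rather than as hard-coded pages. In the Python sketch below, each question lists follow-up questions that are asked only after a "Yes"; a "No" skips the whole branch and the module ends. Apart from the nosebleed screening question quoted above, the question IDs and wording are hypothetical, and the actual Django-based PRI may implement its logic differently.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    qid: str
    text: str
    # Follow-ups are asked only when this question is answered "yes".
    follow_ups: list["Question"] = field(default_factory=list)


NOSEBLEED_MODULE = Question(
    "nose_spont",
    "Have you ever had or do you currently have spontaneous nosebleeds?",
    follow_ups=[
        Question("nose_freq", "How many nosebleeds have you had in the past year?"),
        Question("nose_treat", "Did any nosebleed require medical attention?"),
    ],
)


def administer(question: Question, answer_for) -> dict[str, str]:
    """Walk one module, descending into follow-ups only after a 'yes'.

    answer_for is any callable mapping a Question to an answer string,
    for example a Web form handler or, in testing, a dictionary lookup.
    """
    answers = {question.qid: answer_for(question)}
    if answers[question.qid].strip().lower() == "yes":
        for follow_up in question.follow_ups:
            answers.update(administer(follow_up, answer_for))
    return answers


if __name__ == "__main__":
    canned = {"nose_spont": "No"}
    # Follow-ups are never requested, so canned needs no entries for them.
    print(administer(NOSEBLEED_MODULE, lambda q: canned[q.qid]))
```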
The Rockefeller PRI is time-stamped so that the time required to complete the questionnaire can
be analyzed. Users can log off and back on as they wish, so the PRI does not have to be
completed in a single session. Visual aids such as high-quality photographs (Figure 3) can be
included to help individuals understand the questions and provide accurate responses. In
addition, data representation utilities (Figure 4) can be implemented to help investigators review
their data.
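Because the PRI can be completed over several sessions, the wall-clock time from the first to the last timestamp overstates the effort involved. A minimal Python sketch, assuming each saved answer is logged with a hypothetical session ID and timestamp, sums the span within each session instead and excludes the gaps between sessions.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def active_time(events: list[tuple[str, datetime]]) -> timedelta:
    """Sum, over sessions, the span from first to last timestamped answer.

    events holds (session_id, timestamp) pairs in any order. Time between
    sessions (after the user logged off) is excluded from the total.
    """
    by_session: dict[str, list[datetime]] = defaultdict(list)
    for session_id, stamp in events:
        by_session[session_id].append(stamp)
    return sum(
        (max(stamps) - min(stamps) for stamps in by_session.values()),
        timedelta(0),
    )


if __name__ == "__main__":
    log = [
        ("s1", datetime(2009, 8, 4, 9, 0)),
        ("s1", datetime(2009, 8, 4, 9, 20)),
        ("s2", datetime(2009, 8, 5, 14, 0)),
        ("s2", datetime(2009, 8, 5, 14, 10)),
    ]
    print(active_time(log))
```

This measure starts at the first recorded answer of each session, so time spent reading before that answer is not counted; logging a session-start event would close that gap.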
Figure 3: Phenotype Recording Instrument
Figure 4: Data representation utilities
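Figure 4 is not reproduced here, but a basic data representation utility of the kind described might tabulate how often each symptom was reported. The Python sketch below computes, for each symptom, the proportion of subjects answering "yes" and renders it as a fixed-width text bar; the response format and symptom names are assumptions.

```python
from collections import Counter


def symptom_frequencies(responses: list[dict[str, str]]) -> dict[str, float]:
    """Proportion of all subjects answering 'yes' to each symptom question.

    Subjects who skipped a question count as not reporting that symptom.
    """
    n = len(responses)
    yes_counts = Counter(
        symptom
        for subject in responses
        for symptom, answer in subject.items()
        if answer.strip().lower() == "yes"
    )
    symptoms = sorted({s for subject in responses for s in subject})
    return {s: yes_counts[s] / n for s in symptoms} if n else {}


def text_chart(freqs: dict[str, float], width: int = 20) -> str:
    """Render proportions as a simple fixed-width text bar chart."""
    label_width = max((len(s) for s in freqs), default=0)
    return "\n".join(
        f"{s:<{label_width}} {'#' * round(p * width):<{width}} {p:.0%}"
        for s, p in freqs.items()
    )


if __name__ == "__main__":
    data = [
        {"epistaxis": "yes", "gum bleeding": "no"},
        {"epistaxis": "no", "gum bleeding": "no"},
        {"epistaxis": "Yes", "gum bleeding": "yes"},
        {"epistaxis": "no", "gum bleeding": "no"},
    ]
    print(text_chart(symptom_frequencies(data)))
```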
The process of questionnaire, ontology, database, and PRI development requires the
collaboration of biomedical informaticists and clinicians. At Rockefeller, completing all of the
steps in constructing the BHPS took approximately one year. The time required for other
systems will depend on the nature of available data collection instruments, as well as the
availability of clinical and biomedical informatics expertise. We are eager to help other groups
develop ontology-driven phenotyping systems for other disorders by sharing our experience and
offering guidance on their construction and deployment.
III. Bibliography
(1) Shindler E. Framingham Heart Study. 2008 December 3. Available from: URL: http://www.framinghamheartstudy.org/about/milestones.html
(2) Freimer N, Sabatti C. The human phenome project. Nat Genet 2003 May;34(1):15-21.
(3) Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS,
Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE,
Ringwald M, Rubin GM, Sherlock G. Gene ontology: tool for the unification of biology. The Gene
Ontology Consortium. Nat Genet 2000 May;25(1):25-9.
(4) Mauer AC, Barbour E, Khazanov N, Levenkova N, Mollah S, Coller BS. An Ontology-Driven
Bleeding History Phenotyping System to Pool Data Across Sites. American Medical Informatics
Association 2009 Summit on Translational Bioinformatics; 2009 Mar 15. p. 176.
(5) Gomez Perez A, Benjamins VR. Overview of Knowledge Sharing and Reuse Components:
Ontologies and Problem-Solving Methods. 2009 Aug 2; 2009.
(6) Kashyap V, Borgida A. Representing the UMLS Semantic Network using OWL: (Or "What's in a
Semantic Web link?"). In: Fensel D, Sycara K, Mylopoulos J, editors. The Semantic Web: International
Semantic Web Conference. Heidelberg: Springer-Verlag; 2003. p. 1-16.
(7) National Center for Health Statistics. International Classification of Diseases, 9th Revision.
2009 April 16. Available from: URL: http://www.cdc.gov/nchs/about/major/dvs/icd9des.htm
(8) Online Mendelian Inheritance in Man. 2009 January 5. Available from: URL: http://www.ncbi.nlm.nih.gov/omim/
(9) National Center for Biomedical Ontology. BioPortal. 2009. Available from: URL: http://bioportal.bioontology.org/
(10) Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W, Goldberg LJ, Eilbeck K, Ireland A,
Mungall CJ, Leontis N, Rocca-Serra P, Ruttenberg A, Sansone SA, Scheuermann RH, Shah N,
Whetzel PL, Lewis S. The OBO Foundry: coordinated evolution of ontologies to support
biomedical data integration. Nat Biotechnol 2007 November;25(11):1251-5.
(11) Gruber T. A translation approach to portable ontology specifications. Knowledge Acquisition
1993;5(2):199-220.
(12) Noy N, McGuinness D. Ontology Development 101: A Guide to Creating Your First Ontology.
2008 December 30. Available from: URL:
http://protege.stanford.edu/publications/ontology_development/ontology101-noy-mcguinness.html
(13) Stevens R, Goble CA, Bechhofer S. Ontology-based knowledge representation for bioinformatics.
Brief Bioinform 2000 November;1(4):398-414.
(14) Uschold M. Building Ontologies: Towards a Unified Methodology. 1996 Dec 16; 1996.
(15) Protege-OWL. 2009 January 5. Available from: URL: http://protege.stanford.edu/overview/protege-owl.html
(16) MySQL. 2009 April 2. Available from: URL: http://www.mysql.com
(17) Django. 2009 April 2. Available from: URL: www.djangoproject.com/
(18) Adobe Dreamweaver. 2009 April 2. Available from: URL: http://www.adobe.com/products/dreamweaver/