Biomedical Informatics - UNC School of Information and Library

Watkins, Paul B.
BIOMEDICAL INFORMATICS CORE
The TraCS Biomedical Informatics Core will unite the silos of biomedical informatics research excellence at
UNC and across North Carolina to maximize re-use of data, knowledge and processes. With the
establishment of the North Carolina Collaboratory for Biomedical Informatics (NCCBI), TraCS will support
research, patient care, education and policy-making while building upon, leveraging and extending the current
biomedical informatics infrastructure at UNC-CH. This core involves several external partners with a strong
presence in NC and world-wide: Red Hat, IBM, SAS, Allscripts, Quintiles and NCHICA. We are committed to
achieving a national leadership role in the design and development of best practices for the inclusion of clinical
data into shared repositories of biomedical data.
A. Biomedical Informatics Core Leadership
Investigators: The following investigators will form the leadership of the Biomedical Informatics Core through
membership on the NCCBI UNC-CH Steering Committee:
Jose-Marie Griffiths, PhD, TraCS Biomedical Informatics Leader and Dean, School of Information and Library
Science (SILS), is a member of the National Science Board and a Fellow of the American Association for the
Advancement of Science, and co-authored Revolutionizing Health Care Through Information Technology while
serving as a member of the President’s Information Technology Advisory Committee.
Tim Carey, MD, Professor of Medicine and Director of the UNC-CH Sheps Center for Health Services
Research, is an accomplished Type 2 Clinical Investigator with extensive experience with health care
databases. He is a member of the TraCS Translational Research Advisory Board and chair of the Type 2
Translation Subcommittee.
Ronald Falk, MD, Professor of Medicine, Chief, Division of Nephrology, is an accomplished clinical and
translational researcher who has successfully merged data from UNC electronic medical records with his
research data. He directs the Glomerular Disease Collaborative Network, which includes over 400
nephrologists throughout the southeastern United States.
Bradley Mark Hemminger, PhD, Assistant Professor, joint appointment in the School of Information and Library
Science and the Carolina Center for Genome Sciences; Adjunct Assistant Professor, Department of Radiology,
School of Medicine, will co-lead the Educational Opportunities Working Group. He has conducted an extensive
needs assessment with bioinformatics researchers at UNC-CH and is working on the development of a unified
vocabulary to support data access.
Carol Jenkins, MLS, Director, Health Sciences Library, will work with SILS to design support services for the
NCCBI. Ms. Jenkins chairs the UNC-CH campus-wide Information Technology Strategic Planning Committee.
John P. Kichak, Vice President, Information Services Division of UNC Hospitals, will direct the development of
the WebCIS data warehouse and provide significant leadership in the development of the NCCBI.
Lisa LaVange, PhD, Professor of the Practice of Biostatistics in the School of Public Health and director of the
Collaborative Studies Coordinating Center, will lead the NCCBI Data Management Initiative. She joined UNC
after 10 years in industry and brings extensive experience in clinical data management and statistics to the
TraCS Institute, with a particular emphasis on trial design, data management and analysis in a regulatory
environment. She is a Co-PI of this CTSA proposal and an associate director of the TraCS Institute.
Terry Magnuson, PhD, Sarah Graham Kenan Professor and Chair of the Department of Genetics, director of the
Carolina Center for Genome Sciences Program and director of Cancer Genetics, Lineberger Comprehensive
Cancer Center, will provide leadership in building the TraCS-facilitated Translational Genomics Research
Initiative.
Chuck Perou, PhD, TraCS Biomedical Informatics Core Co-Director, Assistant Professor, Department of
Genetics with a research focus in cancer molecular genetics, will lead the Translational Genomics Research
Initiative.
Russell Taylor, PhD, Research Professor, joint appointment in the Department of Computer Science and the
Department of Physics and Astronomy, will co-lead the Educational Opportunities Working Group.
Daniel Reed, PhD, founding director of the Renaissance Computing Institute, Chancellor’s Eminent Professor,
Vice Chancellor for Information Technology and UNC-CH CIO, is a member of the President’s Council of
Advisors on Science and Technology, Chair of the Board of Directors of the Computing Research Association
and a member of the Biomedical Informatics Expert Panel for the NIH National Center. Dr. Reed was formerly
director of the National Center for Supercomputing Applications (NCSA) and, while a member of the
President’s Information Technology Advisory Committee, co-authored its report, Revolutionizing Health Care
Through Information Technology. He will lead the integration of the Portal into NCCBI and provide design
leadership.
External Partners: UNC-CH is in close geographic proximity to several leading information technology
companies that will work with us to define the parameters of this ambitious project, helping to identify
milestones and contributing intellectual capital to their achievement. These companies include:
Red Hat – the leading open source software company – will provide intellectual guidance on an open source
architecture, help populate the international advisory council for the NCCBI and connect us with related global
open source activities.
IBM – has relationships with UNC-CH through its research labs worldwide and its healthcare and life sciences
solutions group located in Research Triangle Park, NC. Preliminary discussions related to the biomedical
informatics core have focused on the National Health Information Network and Health Information Exchange,
bioinformatics research and underlying architecture for the NCCBI cyberinfrastructure.
SAS – the statistical software company – is currently engaged in discussions to develop the UNC Health Care
System research data warehouse. The company is also world-class in the area of analytics – an important
component of biomedical informatics.
Allscripts – provides software for maintaining electronic health records. Allscripts is in the process of
developing a data warehouse for many millions of patients and plans to make its electronic health records
available to the shared repository, thereby enhancing the potential to enroll research subjects in clinical trials.
Quintiles – will assist in clinical data and policy issues through the participation of Judith Beach, a nationally
recognized expert on HIPAA, and via other in-kind support.
North Carolina Healthcare Information and Communications Alliance (NCHICA) – is a nonprofit
consortium of over 220 organizations dedicated to improving healthcare statewide by accelerating the adoption
of information technology.
B. Vision
The history of science has taught us that the more broadly and transparently scientific data, knowledge and
processes are shared and re-used, the stronger the science and the faster the rate at which the science
advances. Activities over the past decade have leveraged public infrastructure and open access to improve the
practice of science itself through, for example, the National Library of Medicine’s National Center for
Biotechnology Information (NCBI), a resource for molecular biology information. NCBI creates public
databases, conducts research in computational biology, develops software tools for analyzing genome data
and disseminates biomedical information. Of particular significance for NCBI’s usability and functionality are
Entrez, NCBI’s cross-database search engine, and its toolkits and educational programs. The TraCS Biomedical
Informatics Core will use this basic framework and infrastructure as a model to build an Entrez-like data system
for clinical data, which may come from clinical trials and studies, actual patient care or both.
The problem of collecting, cataloging, protecting, sharing and reusing clinical data is a non-trivial one, but it is a
problem that also contains its own solution. As the Entrez system teaches, the solution comes not from designing
the system around the method of collection or the structure of the catalog, but from thinking about how
to maximize the nature and the number of ways that data can be properly shared.
UNC-CH, along with its partners in government, healthcare and industry, is well-positioned to make this vision
a reality, with measurable implementation milestones. Several recent developments at UNC-CH confirm the
institutional commitment to build collaborative and far-reaching biomedical informatics activities:
• A Bioinformatics planning group, led by Dean Jose-Marie Griffiths and composed of representatives from
the UNC-CH Schools of Medicine, Public Health, Dentistry, Nursing, Pharmacy, Information and Library
Science and the Department of Computer Science, has been working for the past 2 years on plans for
integrated health informatics education and related research. This effort has provided the foundation for the
TraCS Biomedical Informatics Core.
• UNC Health Care System is planning a research data warehouse that will facilitate
access to clinical and administrative data for research purposes.
• The Renaissance Computing Institute (RENCI), founded in 2004, is a major collaborative venture anchored at
UNC-CH and supported by North Carolina State University and Duke University. One of RENCI’s primary
foci is integrating information technologies for applications to biology, biomedicine and genomics.
• NCHICA (NC Healthcare Informatics Work Group), led by David Potenziani, Director of Information
Technology and Adjunct Assistant Professor, UNC School of Public Health, is assisting with the re-tooling
and transformation of the currently fragmented NC healthcare community from largely paper-based records
to an interoperable, federated system of electronic information to improve healthcare.
• Strong relationships exist among UNC-CH’s School of Information and Library Science, Department of
Computer Science, Red Hat and IBM, especially in the development and dissemination of open source
software, open access digital libraries, knowledge representation, computer modeling and visualization.
To support the goals of the TraCS Institute, the Biomedical Informatics Core will create a statewide
interdisciplinary and inter-institutional collaboratory (collaborative laboratory): the North Carolina Collaboratory
for Biomedical Informatics (NCCBI). It will build on the transformative technology used by the NIH to create
Entrez for the NCBI. The long-term goal is to create a shared biomedical informatics data repository
connecting clinical enterprises across the State of North Carolina, as a demonstration project that will serve
as a model for the sharing and re-use of clinical data. This repository will contain appropriately de-identified
data from clinical trials and clinical care. With the establishment of the NCCBI, the TraCS Biomedical
Informatics Core will transform the excellent but fragmented biomedical informatics capabilities at UNC-CH into
a coherent and connected system that facilitates routine re-use of research knowledge, data and processes
throughout UNC and North Carolina, serving as a prototype for the nation.
An effort of this magnitude stems from the rapidly emerging and urgent need to enhance and promote a culture
of collaboration among researchers, clinicians, information technologists, educators, consumers, payers, and
others. This requires fostering and sustaining interdisciplinary collaboration across traditional boundaries;
managing large and diverse data sets generated by a variety of methods from different disciplines; mining and
analyzing these data with advanced statistical, mathematical and computational techniques; and developing
robust, user-friendly tools and services for a variety of stakeholder constituencies.
The construction of the NCCBI will require funding that exceeds existing and proposed resources and will
require years of dedicated work. UNC-CH is committed to seeking the necessary funds from the private as well
as the public sector to realize this vision.
C. Specific Aims
The TraCS Biomedical Informatics Core has identified 7 specific aims, or requirements, that we believe we must
achieve to harness the power of re-use and transform biomedical informatics at UNC and ultimately
across North Carolina.
C-1. Improved collaboration among researchers, clinicians and educators. Effective collaboration results
from a combination of social, organizational and technical strategies and processes. Achievement of this aim
will require extensive user involvement, change management, promotion of sharing and collaboration,
development of effective processes and tools to address specific problems and adoption of standards.
C-2. Improved ability to search for clinical data and re-use with other research data. A critical step in
achieving this aim is to implement a research data warehouse for accessing UNC Health Care System
electronic health records and other clinical and administrative data. The end result of this project will be a
secure, searchable data repository with capabilities to download clinical care data for integration with other
research databases, including databases from primary data collection efforts under TraCS protocols.
C-3. Improved data mining and analysis capabilities to facilitate re-use of genomic and clinical data.
This objective will be accomplished through the expansion of the widely recognized bioinformatics center at
UNC’s Lineberger Comprehensive Cancer Center to develop a network through which genomic research
centers across campus can be linked to each other, to clinical care databases and to other research databases.
C-4. Clinical research data management systems that are re-usable and scalable across varied
research protocols. We will draw upon existing clinical research data management resources in the current
GCRC and the Collaborative Studies Coordinating Center, integrating and expanding resources and tools to
enable more efficient and accurate real-time data collection, processing, management and analysis for TraCS
research protocols.
C-5. Flexible data collection procedures and tools that will support patient quality-of-care analyses as
well as facilitate subject recruitment for future research studies. This initiative will provide high quality
clinical databases populated with information collected from consenting, treatment-seeking patients in the UNC
Health Care System and, ultimately, throughout North Carolina to facilitate analyses of patient quality of care
as well as greatly shorten the length of time required to screen and enroll patients as subjects in future
research protocols.
C-6. Expanded community engagement. We will build on existing relationships with the broader biomedical
community, described more fully in the Community Engagement Core. Achievement of the aim will be based
on community biomedical informatics needs, priorities and perceived barriers to participation in collaborative
activities, the capture and accumulation of data in a shared and appropriately secure repository, the availability of
a variety of tools for accessing, analyzing and re-using the data, and the development of targeted user portals,
registries and educational programs aimed at multiple audiences.
C-7. Coordination of research infrastructure designed for re-use of data, processes, tools and
services. This aim encompasses the design, development, testing and implementation of a biomedical informatics
cyberinfrastructure comprising a shared data repository; reusable processes, protocols, tools and
applications; and a variety of user services.
D. Detailed Vision
D-1. Improved collaboration among researchers, clinicians and educators:
Establishment of the North Carolina Collaboratory for Biomedical Informatics
Overview: Engaging the power of re-use requires significant and challenging behavior changes among
researchers. First and foremost among these is the development of a culture of collaboration. The NCCBI will
establish the social, organizational and technical framework needed to promote and sustain this new high level
of collaboration. This initiative will draw upon investigators’ experience in developing and sustaining successful
interdisciplinary collaborations.
Problem Statement: There are many areas of excellence in biomedical informatics at UNC-CH. Strong groups
of researchers and clinicians use the most modern informatics tools and data sources available, but they lack
awareness of the full range of activities and capabilities as well as adequate infrastructure to support sharing and
collaboration, and they work within a collaboration culture that is not as interdisciplinary or routine as it might
be. This results in suboptimal use of existing resources, duplication of effort, and lost opportunities. There
exists a critical need for coordination across organizational units to 1) expand awareness of the wide range of
relevant biomedical informatics activities and resources; 2) foster and support effective, efficient and routine
re-use of research knowledge, data and processes; and 3) stimulate the development and dissemination of new
ideas. However, new collaborations are not easy to build and sustain, especially when they cross multiple
disciplines and institutions. They need deliberate and focused effort and must deliver clear and immediate
value to those participating through demonstrated convenience, usability, reward and recognition.
What Do We Have to Build Upon? The current biomedical informatics infrastructure and leadership at UNC-CH
provide a substantial base on which to build the work of the NCCBI. UNC-CH has significant ongoing activities
in biomedical informatics (described in more detail below). However, there exists a critical need for
coordination across both databases and UNC-CH schools and research centers to stimulate and support
effective and efficient clinical and translational research. Strong, yet insulated and separated groups of
researchers and clinicians use the most modern informatics tools available but need the people, software and
tools to link disparate databases together and to help people identify new opportunities for collaboration.
To date, UNC-CH has established a number of focused interdisciplinary research centers with varying degrees
of informatics capability: the General Clinical Research Center, the Collaborative Studies
Coordinating Center, the Lineberger Comprehensive Cancer Center, the Carolina Center for Exploratory
Genetic Analysis, the Carolina Center for Genome Sciences, the Carolina Exploratory Center for
Cheminformatics Research, the Biomedical Imaging Research Center, the Carolina Environmental
Bioinformatics Center, the Center for Bioinformatics, the Renaissance Computing Institute, and the Odum
Institute for Research in Social Science, among others. These centers engage faculty, staff and students from
schools and departments at UNC-CH and occasionally from other UNC institutions. Each of these centers has
been successful in its own right, but each could potentially contribute to an institution-wide or broader research
infrastructure, leveraging critical mass and specialized resources.
Experience of the lead investigators with computer-supported collaborative work in multi-institutional
interdisciplinary collaboratives such as the Center for Environmentally Responsible Solvents and Processes
(http://www.nsfstc.unc.edu/), the Space Physics and Aeronomy Research Collaboratory (http://www.si.umich.edu/sparc)
and the ATLAS high-energy physics collaboratory (http://ganesh.lsa.umich.edu), to name a few, has shown
that the necessary social, behavioral and organizational transformations take time to develop. They result from
deliberate and persistent attention to human perceptions, preferences, values, incentives and connections, along
with organizational and technological innovation. These collaboratives also demonstrate that transformational
changes can and do occur, especially when the collaborations include social and behavioral scientists who can
study the evolving collaboration patterns and design interventions to overcome problems as they arise. Research
on collaboration (http://www.scienceofcollaboratories.org/) confirms that success in science collaboratories is
based on a complex mix of social and technical factors.
Proposed Solution: We plan to design and establish a statewide interdisciplinary and inter-institutional
collaboratory: the North Carolina Collaboratory for Biomedical Informatics (NCCBI) to 1) develop a culture of
collaboration through rich, recurring human engagement oriented to common interests/concerns; 2) build a
new organization with components of existing organizations (GCRC, Coordinating Center, RENCI), newly
established positions, and a participatory governance structure; and 3) expand the technical infrastructure to
facilitate routine re-use of biomedical research knowledge, data and processes throughout UNC and eventually
across North Carolina. Achieving this vision will require promotion of sharing and collaboration (with incentives
for participation derived from access to new data and services), extensive user involvement, change management,
implementation of tools and applications to address specific problems, and adoption of standards for
representation of data, metadata, processes and knowledge.
Implementation Details: Past experience has shown that collaboration cannot be driven by technological
infrastructure alone. Effective collaboration is achieved through the combination of a set of social processes to
stimulate and sustain awareness, engagement, sharing and innovation, along with the development and
deployment of a technical infrastructure of processes, data and information sources, tools and services that
deliver needed functionality in a convenient, easy-to-use form. It is the social interaction that generates and
drives the technical agendas. Since specific implementation tactics will depend on ongoing user input, the
details will be developed as the project evolves. However, we will begin immediately to design start-up
strategies and to develop strategic and implementation plans for NCCBI.
Start-up strategies will build on the momentum generated by the broader campus-wide
discussions of integrated health informatics activities and the intensive interactions associated with the
development of this proposal. To begin to build awareness of the potential power of re-use, a monthly NCCBI
lecture series addressing issues of sharing, collaboration and re-use will be established starting in February
2007. A related website, as part of or linked to the TraCS website, will disseminate announcements, press
releases, speaker information and presentation materials, and a wiki will be started for ongoing discussion.
A formal strategic planning process for NCCBI will begin in Spring 2007, consolidating 3 existing planning
threads: research data warehouse, integrated biomedical/health informatics research and education initiative,
and TraCS Biomedical Informatics Core. The process will involve multiple stakeholder constituencies to identify
needs, resources, capabilities and aspirations along with perceived barriers to sharing and re-use. Anticipated
elements of the NCCBI strategic plan include a long-term vision and roadmap with phased implementation
steps. The plan will address the social, organizational and technical developments necessary to realize the
vision. We anticipate that a strategic plan will be prepared by September 2007. An annual planning retreat will
review and refine priorities for the upcoming year in the context of a 3-year window. An annual conference will
present updates on research and development projects as well as results that can inform the planning and
prioritization process.
The social developments include a range of communication and engagement opportunities: meetings,
presentations, symposia, workshops, conferences, and open forums, along with mechanisms to overcome
potential barriers to participation and sharing: representation, intellectual property, ownership, credit,
recognition and reward mechanisms.
The organization and governance structure of the NCCBI will be enhanced and enabled by several key
committees and working groups that will engage in development of collaborative agendas for research,
information technology, educational opportunities, and clinical care. The NCCBI UNC-CH Steering Committee
(members listed in part A of this Core Section) will guide the ongoing development of NCCBI and especially
the prioritization of activities and resource allocations to ensure that the NCCBI biomedical informatics
agendas are aligned with and supportive of the overall goals of TraCS and those of the other cores. The
Deans’ Research Collaboration Council, along with 4 specific topic working groups (on educational
opportunities, clinical care, research and IT domains) will focus on interdisciplinary research priorities and the
underlying IT requirements. The International Advisory Council will offer a global context for cutting-edge
developments in biomedical informatics. The External Partner Committee will bring together external providers
and their respective capabilities and contributions to ensure that their efforts are harmonized and leveraged to
support the NCCBI goals and priorities. Two staff teams will support data management and cyberinfrastructure
services, respectively. A communications specialist will facilitate internal and external communications, working
with the TraCS Institute Office of Communications as needed. Meetings will occur at regular intervals: weekly
(data management team, cyberinfrastructure team), monthly (Steering Committee, External Partner Committee,
Working Groups), quarterly (Deans’ Research Collaboration Council) and annually (International Advisory
Council).
Anticipated technical solutions to improve collaboration through NCCBI include:
• Development of a research portal to register researchers, projects, data sets, tools developed or used, problems,
publications and more.
• Availability of personalized library and information services such as alerting services and on-demand reference services.
• Development of rules of engagement that address the potential barriers to participation.
• Implementation of the WebCIS research data warehouse (see D-2).
• Establishment of a Translational Genomics Research Initiative (see D-3).
• Establishment of Data Management Services (see D-4).
• Recruitment of subjects for clinical trials through kiosks and other smart technology (see D-5).
• Community engagement (see D-6).
• Coordination of research infrastructure designed for re-use of data, processes and tools (see D-7).
D-2. Improved ability to search for clinical data and re-use with other research data:
Implementing the WebCIS research data warehouse
Overview: The development of a research data warehouse derived from UNC’s existing electronic medical
record system, WebCIS, is a critical component of the NCCBI. The warehouse will physically reside within the
Health Care System firewall but will have expanded search capabilities for analysis as well as secure, real-time
links with research databases for data transfer. UNC-CH will partner with SAS, IBM and possibly other
companies in creating the research data warehouse, the first step in achieving the NCCBI.
Problem Statement: WebCIS was designed as a transaction and workflow system. While clinical data can be
exported, the system cannot, in its current form, support clinical research on a large scale, and documentation
of the current data is relatively poor. The ability to download and link clinical data with research data such as
questionnaires and laboratory data does exist, but linkage is performed ad hoc for each new project. The institution
urgently needs a clinical data repository.
What Do We Have to Build Upon? The UNC Health Care System’s Information Technology Division is
responsible for supporting all IT infrastructure and clinical patient care applications for UNC hospitals and
affiliated entities, with over 1M active records. The division’s objective is to maximize the quality of patient care and
operational efficiency by integrating back-end systems and operational processes. Over the past 12 years the
division has built an in-house electronic medical record system, WebCIS, which offers a common interface for
12,400 physicians regardless of where they practice within the UNC Health Care System. All ambulatory care
data are computerized. Inpatient clinical notes are fully electronic. Computerized physician order entry is
universal across the hospital. Outpatient prescriptions are
online, including the ability to directly transmit prescriptions to pharmacies. Patient care data are stored and
have the potential to provide a rich resource for clinical and health services research. The research data
warehouse has been approved for internal UNC-CH funding in FY07.
Proposed Solution: This integrated enterprise data warehouse will create actionable intelligence that can
impact clinical effectiveness, fiscal integrity and research outcomes across the organization via access to and
use of timely and accurate data. The result will be an enterprise intelligence platform that can be utilized for
analysis, prediction and alignment and that will consistently enhance the leadership position of the UNC Health
Care System.
Implementation Details: This project will be divided into 3 stages, which will proceed in parallel, each
building toward a federated data warehouse:
1. Building a research data warehouse for all the functionality mentioned above by extracting and transferring
the 15 years of clinical data contained within the UNC HCS electronic medical record.
2. Building and linking all the administrative/financial decision support databases under the umbrella of the
data warehouse.
3. Building and implementing an easy-to-use query tool that will have access to the federated data dictionary
to enable all researchers and clinicians to perform queries on clinical data linked to financial and research
data.
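The third stage, a query tool that uses a federated data dictionary to link clinical data with financial and research data, can be illustrated with a minimal, hypothetical Python sketch. It assumes, beyond what is stated here, that each source can be exported as rows keyed by a shared de-identified study ID, and that the data dictionary maps a researcher-facing term to a source and column. All names (`DATA_DICTIONARY`, `federated_query`, `study_id` and the column names) are illustrative only and do not describe the actual WebCIS warehouse design.

```python
"""Hypothetical sketch: resolving query terms through a federated data dictionary.

Assumptions (illustrative only, not the UNC design): every source is a list of
row dicts sharing a de-identified "study_id" key, and the data dictionary maps a
researcher-facing term to the source and column that hold it.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    source: str  # hypothetical source name, e.g. "clinical"
    column: str  # hypothetical column name within that source


# Hypothetical federated data dictionary: researcher-facing term -> location.
DATA_DICTIONARY = {
    "hba1c": DictionaryEntry("clinical", "lab_hba1c"),
    "total_charges": DictionaryEntry("financial", "charges_usd"),
    "qol_score": DictionaryEntry("research", "questionnaire_qol"),
}


def federated_query(sources, terms, key="study_id"):
    """Return one row per ID present in every source the requested terms need.

    Each term is looked up in DATA_DICTIONARY, rows are indexed by the shared
    key, and the sources are combined with an inner join on that key.
    """
    if not terms:
        return []
    entries = {term: DATA_DICTIONARY[term] for term in terms}
    # Index only the sources that the requested terms actually touch.
    needed = {entry.source for entry in entries.values()}
    indexed = {name: {row[key]: row for row in sources[name]} for name in needed}
    # Inner join: keep IDs found in all of the needed sources.
    common_ids = set.intersection(*(set(rows) for rows in indexed.values()))
    results = []
    for sid in sorted(common_ids):
        row = {key: sid}
        for term, entry in entries.items():
            row[term] = indexed[entry.source][sid][entry.column]
        results.append(row)
    return results


if __name__ == "__main__":
    # Toy, made-up rows for illustration only.
    sources = {
        "clinical": [{"study_id": "S1", "lab_hba1c": 7.2},
                     {"study_id": "S2", "lab_hba1c": 6.1}],
        "financial": [{"study_id": "S1", "charges_usd": 1200.0}],
        "research": [{"study_id": "S1", "questionnaire_qol": 68},
                     {"study_id": "S2", "questionnaire_qol": 74}],
    }
    print(federated_query(sources, ["hba1c", "total_charges"]))
    # prints [{'study_id': 'S1', 'hba1c': 7.2, 'total_charges': 1200.0}]
```

The inner join is a simplifying choice so that every returned row has a value for every requested term. A real query tool might instead push term resolution down to the warehouse's own query engine and support other join semantics.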
Subsequent phases will include the creation of a shared data repository for clinical programs from across the
state of North Carolina using open source, open standards and open access. These efforts will involve
partnerships with IBM, Red Hat, Allscripts and NCHICA.
D-3. Improved data mining and analysis capabilities to facilitate re-use of genomic and clinical data:
Establishment of a Translational Genomics Research Initiative
Overview: This aim will be accomplished through the expansion of the very successful bioinformatics center at
UNC’s Lineberger Comprehensive Cancer Center and the continued development of the infrastructure through
which various clinical research centers located throughout the campus can be linked to each other, to clinical
care databases and to other research databases.
Problem Statement: There is a need for greater coordination of efforts and research infrastructure for sharing
data, tools and services. There is an urgent need for an informatics infrastructure to link clinical scientists with
the scientists who generate genomic data on clinical materials and who can advise on types of genomic
analyses to perform and genomic assay experimental designs that work, as well as to help with analysis and
storage of genomic data.
What Do We Have to Build Upon? UNC-CH has numerous groups excelling in bioinformatics research, with
faculty expertise in biology, biostatistics, statistics, chem-informatics, computer science, genetics, information
science, library science, pharmacogenomics and systems biology. Areas of application include all areas of
high-output –omics technologies, as well as sequence analysis, traditional genetics, the synthesis of
information across data types and interfacing with clinical information.
UNC-CH also has longstanding expertise in relevant biomedical areas, such as mouse genetics and numerous
clinical disciplines. In 2001 the Carolina Center for Genome Sciences (genomics.unc.edu) was established as part of a
10-year $245M Genome Sciences Initiative at UNC-CH, with over 40 faculty members in departments across
the university. At least half of the new hires work in various areas of bioinformatics. The Carolina Center for
Genome Sciences unites bioinformatics and biomedical investigators and has fostered extensive collaboration.
Although the bioinformatics expertise at UNC-CH is strong and maturing, areas designated/targeted for growth
include statistical genetics, biostatistics, genetics, pharmacogenomics, individualized therapy, proteomics,
metabolomics and imaging.
Bioinformatics Consortia and Research Centers: UNC-CH is a funded participant in the large Cancer
Bioinformatics Grid (CaBIG, cabig.nci.nih.gov), with special emphasis on distance-weighted discrimination
tools for machine learning and cross-platform normalization of microarrays. The Carolina Center for
Exploratory Genetic Analysis is funded by an NIH grant to explore methods for genotype-phenotype analysis
and models for matching clinical and genomic datasets. The Carolina Exploratory Center for Cheminformatics
Research (neccr.org) is developing quantitative tools that will design and explore chemical libraries and high-throughput screening results to better understand toxicity and efficacy of small molecules in complex biological
systems. The Biomedical Research Imaging Center (bric.unc.edu) builds upon UNC-CH strengths in image
analysis. The Carolina Environmental Bioinformatics Center is funded by a 5-year, $4.5M grant to create tools
and methods for handling toxicogenomics and related environmental science datasets and brings together 17
faculty across UNC-CH in a truly cross-disciplinary effort.
Standard Software Applications, Software Training and Funded Bioinformatics Cores: The UNC-CH Center for
Bioinformatics (bioinformatics.unc.edu) supports the use of computational biology tools throughout UNC-CH.
The Center for Bioinformatics serves as a resource for numerous standard bioinformatics applications,
including sequence analysis and database development.
UNC-CH provides funding for several bioinformatics cores that support large federal grants on campus. We list
a few examples, beyond the group led by Dr. Perou in the Cancer Center (see below) and highlighted
elsewhere in this CTSA application. The bioinformatics cores include the Biostatistics and Bioinformatics Core
for the UNC-CH Lineberger Cancer Center’s Gastro-Intestinal Specialized Program of Research Excellence
(SPORE), the Biostatistics Core of the Center for Environmental Health Susceptibility, the Microarray Analysis
Core of the Neurodevelopmental Disorders Research Center and a similar core for the UNC Neurosciences
Center.
Bioinformatics Training at UNC: Numerous federally funded pre-doctoral and post-doctoral training grants and
programs on campus provide training in bioinformatics. The flagship is the grant for the UNC Bioinformatics
and Computational Biology PhD certificate program (bcb.unc.edu, funded by NIGMS), which unites interested
students and bioinformatics advising faculty from across UNC; it will become a freestanding PhD program in
the future. The Cancer Genomics training grant in Biostatistics (NCI) focuses on statistical genomics. The
Environmental Sciences training grant (NIEHS) has several students working on genomics bioinformatics
methods. Similarly, the UNC-CH Toxicology Curriculum is moving steadily to incorporate more toxicogenomics
and bioinformatics research training.
Computing facilities: Computing capabilities at UNC-CH are excellent and continuously improving. In addition
to the computing facilities in individual departments, UNC-CH’s Information Technology Services maintains
several major multiprocessor clusters. Moreover, several UNC-CH researchers have active research
collaborations with RENCI, which provides additional leadership and assistance and has recently brought
online a 1024-node Blue Gene/L system with a peak performance of 5.6 Tflops.
Lineberger Comprehensive Cancer Center’s Bioinformatics Group: One of our most successful groups has
been the Lineberger Comprehensive Cancer Center’s (LCCC) Bioinformatics Group, which represents an
outstanding multidisciplinary team of cancer biologists, clinical researchers, genomic specialists,
bioinformaticians and biostatisticians, co-directed by LCCC faculty members Charles Perou, PhD
(Departments of Genetics and Pathology), Steve Marron, PhD (Department of Statistics and Operations
Research), and D. Neil Hayes, MD (Department of Oncology), who is serving as the medical director;
additional statistical analysis support is also provided by Andrew Nobel, PhD, of Statistics and Operations
Research. The Bioinformatics Group provides several major services:
o UNC Microarray Database (with 2-color arrays and a database for Affymetrix data). The UNC Microarray
Database currently houses over 10,000 experiments and has 289 registered users. Microarray data
analysis has resulted in the publication of 14 papers since 2003 co-authored by at least 2 members of the
LCCC-BG.[1-14] UNC’s work will be part of the Genome Atlas project.
o Expertise in the development, maintenance and mining of databases that contain cancer patient clinical
information.
o An honest broker system to link data from the same participant across databases.
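An honest broker links records from the same participant across databases without exposing identifiers to researchers. The mechanism used by the LCCC system is not described here; the following is a minimal Python sketch, assuming a keyed-hash approach (HMAC-SHA256) in which only the broker holds the secret key. `BROKER_KEY`, `research_id`, `deidentify` and the sample records are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held only by the honest broker; never shared with researchers.
BROKER_KEY = b"replace-with-a-securely-stored-secret"


def research_id(medical_record_number: str, key: bytes = BROKER_KEY) -> str:
    """Map an identifier to a stable pseudonymous research ID via HMAC-SHA256."""
    normalized = medical_record_number.strip().upper()
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    return "R" + digest[:16]


def deidentify(records, id_field="mrn"):
    """Drop the direct identifier from each record and attach its research ID."""
    out = []
    for rec in records:
        clean = {k: v for k, v in rec.items() if k != id_field}
        clean["research_id"] = research_id(rec[id_field])
        out.append(clean)
    return out


# Hypothetical example: one participant appears in a clinical and a genomic database.
clinical = [{"mrn": "000123", "er_status": "positive"}]
genomic = [{"mrn": " 000123 ", "array_id": "ARR-77"}]

linked = {}
for rec in deidentify(clinical) + deidentify(genomic):
    # Records sharing a research ID are merged without the MRN ever leaving the broker.
    linked.setdefault(rec["research_id"], {}).update(rec)
```

Because the same key always yields the same pseudonym, new databases can be linked later without re-identifying anyone; losing or rotating the key, however, breaks existing links.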
Proposed Solution: The already existing goals of the LCCC Bioinformatics Group are to provide genomic
database services (maintaining a gene expression and SNP database), provide relational patient clinical
databases (for tissues, for patient treatment information and for tumor sample information), and to provide
expertise in analyzing the data that are stored in these 2 types of databases (i.e., statistical analysis of
genomic data and biostatistical analysis of clinical/patient data). We propose here to expand these goals in 2
ways to improve our data mining and analysis capabilities: 1) to provide a direct computational bridge between
these 2 types of databases and 2) to provide the computational know-how to other translational groups of
researchers and clinicians at UNC-CH to facilitate additional translational genomics studies. To enable this
expansion and implementation, we will establish the Translational Genomics Research Initiative, which will be led
by Dr. Charles Perou.
Implementation Details: The LCCC Bioinformatics Group already maintains at least 3 different genomic and 4
different clinical databases. A focus of the LCCC Bioinformatics Group is on breast cancer, and thus as one of
our early CTSA-TraCS projects through the Translational Genomics Research Initiative, we propose to work to
link together our breast genomic data with our breast cancer patient clinical databases. We are currently
building a comprehensive search and retrieval portal on top of our existing honest broker system. This will
enable approved researchers to seamlessly query multiple databases, including the clinical, gene expression,
SNP and breast tissue databases, which will retrieve de-identified, HIPAA-compliant search results. The
“breast research portal” will provide the honest brokers with the ability to authorize and control each
researcher’s ability to search databases based on IRB approvals. The portal, while integrating the databases,
will also help build common vocabularies that will facilitate future database integration with other
institutions, including the hospital systems. As part of this first effort of the Translational Genomics Research
Initiative, we also propose to integrate this breast data system into the WebCIS research data warehouse (see
D-2), and link it to the Bioportal that is part of the Carolina Center for Exploratory Genetic Analysis. This breast
cancer focused project will be one of our first attempts at linking multiple databases and data types together.
We are confident that the integration and federation of the breast research portal, WebCIS data warehouse
and Bioportal will be successful given our already significant success in merging breast clinical and genomic
data together to make an impact upon breast cancer biology and treatment.[3] The next step for our
Translational Genomics Research Initiative will be to adapt this system for other CTSA researchers; that would
potentially include other cancer research groups (lung, GI, ovarian), and other disease-focus groups including
mental disorders, cystic fibrosis and cardiovascular diseases. We can provide the microarray and SNP
database to serve the needs of all UNC researchers, and using the funds requested here, our TraCS genomics
initiative will work with other disease-focus groups to 1) identify the existing clinical databases and data types
that researchers have, 2) create a computational link between these existing databases and our gene
expression and SNP databases, and 3) assist these researchers with the data retrieval and analysis, including
combined analyses of genomic and clinical data. As research groups and priorities are set (which will likely
include the well-established cancer, cystic fibrosis and cardiovascular groups at UNC-CH), the Translational
Genomics Research Initiative within the TraCS infrastructure will work individually with each group to link
existing databases within the LCCC Bioinformatics Group with the individual databases, and then most
importantly, individually work with the clinical scientists to query the genomic data relative to the clinical data so
that any potential correlates can be found. Thus, the experience in database linking and analysis of Dr. Perou
and colleagues in the new Translational Genomics Research Initiative will be shared and used among TraCS
investigators and trainees to the benefit of these other strong and collaborative clinical research
programs.
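The breast research portal is described as letting honest brokers authorize each researcher's searches according to IRB approvals and returning only de-identified results. A minimal Python sketch of that gatekeeping step follows, assuming approvals are stored as (protocol, database) pairs. `Researcher`, `query`, `IDENTIFIER_FIELDS` and the sample data are hypothetical, and the identifier list is illustrative only: it is far shorter than the full HIPAA Safe Harbor list.

```python
from dataclasses import dataclass, field


@dataclass
class Researcher:
    name: str
    # Hypothetical (irb_protocol_id, database_name) pairs approved by an honest broker.
    approvals: set = field(default_factory=set)


# Illustrative direct identifiers stripped before results leave the portal.
IDENTIFIER_FIELDS = {"name", "mrn", "address", "birth_date"}


def authorized(researcher: Researcher, protocol_id: str, database: str) -> bool:
    return (protocol_id, database) in researcher.approvals


def query(researcher, protocol_id, database, rows, predicate):
    """Return de-identified matching rows, or raise if the search is not approved."""
    if not authorized(researcher, protocol_id, database):
        raise PermissionError(
            f"{researcher.name} is not approved to search {database} under {protocol_id}")
    return [
        {k: v for k, v in row.items() if k not in IDENTIFIER_FIELDS}
        for row in rows
        if predicate(row)
    ]


# Hypothetical usage against an in-memory stand-in for the clinical database.
rows = [
    {"mrn": "000123", "name": "Jane Doe", "er_status": "positive", "age": 51},
    {"mrn": "000456", "name": "Ann Roe", "er_status": "negative", "age": 47},
]
reviewer = Researcher("Dr. A", {("IRB-07-0001", "clinical")})
er_positive = query(reviewer, "IRB-07-0001", "clinical", rows,
                    lambda row: row["er_status"] == "positive")
```

Keeping the authorization check in one function means the broker's approval table, not each database, decides what a researcher may search.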
D-4. Clinical research data management systems that are re-usable and scalable across varied
research protocols: Combining resources from the GCRC and Collaborative Studies Coordinating
Center to establish the TraCS Data Management Services
Overview: This initiative will draw upon existing clinical research data management resources in the current
GCRC and the UNC Collaborative Studies Coordinating Center, integrating and expanding resources and tools
to enable more efficient and accurate real-time data collection, processing, management and analysis for
TraCS research protocols.
Problem Statement: Development of unified best practices in data management and reasonable unified data
structure specifications are key to the success of TraCS research in that both are needed to facilitate
combining complex data between vastly different scientific enterprises. As an example, consider linking
genotype, imaging, and clinic population data in order to analyze genetic modifiers and risk factors in the
longitudinal evolution of brain changes due to Alzheimer’s disease. Such a massively complex unification of
diverse data is not possible with the current data management resources available to UNC-CH investigators.
The proposed Biomedical Informatics Core will provide the infrastructure within which these linkages can take
place. A key component of this infrastructure will be clinical data management for research studies.
What Do We Have to Build Upon? Currently, the Informatics Core of the GCRC provides data management
support to investigators, administrative support to GCRC staff and server/workstation management for the
entire GCRC. Data management support, ranging from initial consultation to final database export for analysis,
is tailored to the unique needs of each research protocol. While the staff is extremely productive, efficient and
able to handle the workload of the existing GCRC, most applications are developed on a per protocol basis
and are not easily re-usable.
The UNC Collaborative Studies Coordinating Center, established in 1971 and continuously funded by NIH for
the past 35 years, has set standards for study coordination in general, and for data management of large,
multi-center studies in particular. The center has been a pioneer in clinical data management, implementing
remote data entry on a national project in 1987 (the first NIH coordinating center to do so), followed by a web-based data management system in 2001. Features of center-developed data management systems include
interactive data entry with real-time field validation, audit logs to record database modifications, integrity
checks for the database, security (in logins, permissions based on need and encryption), automated data
queries, reporting, specimen tracking, re-key verification, forms inventory, data imports and exports (for
analysis) and options for local or server-based software. The center is currently on its 4th generation system,
based on Visual Basic-generated HTML screens and a Microsoft SQL Server database. All systems satisfy
FDA guidelines for electronic records and signatures (21 CFR Part 11). However, as an NIH-funded
Coordinating Center, the center has had more experience working with medical centers outside UNC-CH than
within. The extensive data management, project coordination, and statistical consulting services available at
the Coordinating Center have not, for the most part, been utilized by UNC School of Medicine research
projects.
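Two of the features listed above, re-key verification and audit logs of database modifications, can be illustrated briefly. The following is a minimal Python sketch, assuming re-key verification means comparing two independent keyings of the same form field by field, and that each correction is logged with its old and new values. It is not the Coordinating Center's implementation, which runs on Visual Basic and Microsoft SQL Server; `rekey_discrepancies`, `update_field` and the sample values are hypothetical.

```python
from datetime import datetime, timezone


def rekey_discrepancies(first_entry: dict, second_entry: dict) -> list:
    """Compare two independent keyings of one form; list (field, first, second) mismatches."""
    fields = sorted(set(first_entry) | set(second_entry))
    return [
        (name, first_entry.get(name), second_entry.get(name))
        for name in fields
        if first_entry.get(name) != second_entry.get(name)
    ]


audit_log = []


def update_field(record: dict, name: str, new_value, user: str, reason: str) -> None:
    """Change one field and append an audit entry recording who changed what, and why."""
    audit_log.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "field": name,
        "old": record.get(name),
        "new": new_value,
        "reason": reason,
    })
    record[name] = new_value


# Hypothetical example: two data-entry staff key the same blood pressure form.
first = {"Q1_SYSBP": "127", "Q2_DIABP": "80"}
second = {"Q1_SYSBP": "172", "Q2_DIABP": "80"}
mismatches = rekey_discrepancies(first, second)
# After checking the paper form, the data manager corrects the first keying.
update_field(first, "Q1_SYSBP", "172", user="dm01",
             reason="re-key mismatch resolved against source form")
```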
Two examples serve to illustrate the need for a unified, accessible approach to research data management in
the TraCS. The first example is a proposed study of a dietary intervention to reduce the risk of pre-term births.
This clinical trial was conceived as a potential R01 application from UNC’s Department of Obstetrics and
Gynecology (John Thorp, MD, as PI). For the study to be successful, many more pregnant women will need to
be enrolled than are available through UNC clinics alone. Providing full coordinating center support of a multi-site study is beyond the scope of the current GCRC Informatics Core services, and funding for a stand-alone
coordinating center is beyond the financial limits of a standard R01 application; therefore, this project would be
viewed as too big for the GCRC informatics core and too small for the Coordinating Center. This study needs
access to web-based data entry and tracking systems at a cost well within the R01 cap.
The recently awarded SCCOR project (UNC Cystic Fibrosis Center; Ric Boucher, MD, as PI) provides a
second example. In addition to basic science studies, the project includes 3 clinical studies, each consisting of
an observational component and a randomized clinical trial. The clinical studies share common objectives and
methodologies across a spectrum of research subjects (healthy smokers, COPD patients, CF patients), and
the potential to pool data across studies is a key aspect of the design. Further, sputum samples and other
specimens will be moved from lab to lab as part of this project, and the ability to enter data at each station via a
web-based system will facilitate the processing and tracking of study data and results. Although this is not a multi-center study, the tools developed for multi-center studies would greatly enhance its data infrastructure.
Proposed Solution: We intend to build upon the strengths of the 2 existing core facilities, the GCRC Informatics
Core and the Coordinating Center, to provide superb data management capabilities to the TraCS Institute
investigators. The proposed strategy for clinical data management will provide research databases of
extremely high quality that are locked and ready for statistical analysis soon after the last subject visits are
completed. Research databases will be automatically linked to other components of the TraCS Collaboratory
through the proposed biomedical informatics infrastructure and readily accessible to clinical investigators at
UNC as well as at other CTSA institutions for joint research efforts, provided access is granted.
Implementation Details: A Data Management Service will be established as part of the Biomedical Informatics
Core. This service will be located within the TraCS offices and will incorporate the existing GCRC Informatics
Core staff and facilities. The new Data Management Service will represent an increase in staffing levels of the
current GCRC informatics core to reflect both the greater number of studies anticipated and an expansion of
services. In addition, a core group of database and web programmers and network staff from the UNC
Collaborative Studies Coordinating Center will be assigned to the TraCS Data Management Service and
located within TraCS Institute offices to facilitate implementation of the Coordinating Center’s web-based
systems and tools for use in TraCS protocols.
The Coordinating Center’s web-based systems proposed for the TraCS Data Management Service are table-driven, in that the core code is independent of the study design: study-specific information, including
electronic case report form layouts and edit specifications, is stored in database tables. The system is self-documenting, with variable names in each table corresponding to question numbers on case report forms or
data entry screens. Query reporting and resolution are incorporated into the data management system, and
SAS reports of study progress and database status can be generated automatically. This particular design
lends itself extremely well to a quick study start-up period, once protocols and data collection instruments are
finalized.
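The table-driven design described above keeps study-specific edit specifications in database tables, so one generic routine validates any form. A minimal Python sketch of that idea follows, using an in-memory SQLite table as a stand-in for the production SQL Server tables; the `edit_spec` schema, the `VITALS` form, its variable names and ranges, and the `validate` function are all hypothetical.

```python
import sqlite3

# Hypothetical edit-specification table: one row per case report form variable.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE edit_spec (
        form TEXT, variable TEXT, data_type TEXT,
        min_value REAL, max_value REAL, required INTEGER
    )""")
conn.executemany(
    "INSERT INTO edit_spec VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("VITALS", "Q1_SYSBP", "int", 60, 260, 1),
        ("VITALS", "Q2_DIABP", "int", 30, 160, 1),
        ("VITALS", "Q3_WEIGHT_KG", "float", 20, 300, 0),
    ],
)


def validate(form: str, entry: dict) -> list:
    """Check an entry using only the edit_spec rows for `form`; no study-specific code."""
    errors = []
    specs = conn.execute(
        "SELECT variable, data_type, min_value, max_value, required "
        "FROM edit_spec WHERE form = ? ORDER BY rowid", (form,))
    for variable, data_type, low, high, required in specs:
        raw = entry.get(variable, "")
        if raw == "":
            if required:
                errors.append(f"{variable}: required")
            continue
        try:
            value = int(raw) if data_type == "int" else float(raw)
        except ValueError:
            errors.append(f"{variable}: not a valid {data_type}")
            continue
        if not low <= value <= high:
            errors.append(f"{variable}: {value} outside [{low}, {high}]")
    return errors
```

Starting a new study then means inserting rows into `edit_spec` rather than writing new validation code, which is the quick start-up property the text describes.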
The merging of these 2 programming and network support groups into the proposed Data Management
Service will enable TraCS protocols to take advantage of web-based data management, randomization and
tracking systems that are integrated to facilitate reporting of study progress and availability of data for analysis
(interim and final). Data management consulting services will be provided to projects and programs during the
design phase to advise on data collection modalities for a particular research protocol as well as case report
form design. The goal is to use a common set of case report forms across TraCS protocols as often as
possible and to standardize data definitions and edit specifications in order to facilitate pooling of data across
studies and the eventual sharing of data with other CTSA centers. This core will also promote and encourage
adoption across the CTSA network of common Case Report Forms to be determined within this network.
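Standardized data definitions make pooling across studies a mechanical step. The following minimal Python sketch assumes each study's variables are mapped to a shared definition, with a conversion into common units; the `COMMON_DEFS` table, study names and variable names are hypothetical and are not taken from any TraCS protocol.

```python
# Hypothetical common definitions: common variable -> {study: (study variable, converter)}.
COMMON_DEFS = {
    "weight_kg": {
        "STUDY_A": ("WT_KG", lambda v: v),
        "STUDY_B": ("WEIGHT_LB", lambda v: v * 0.45359237),  # exact lb-to-kg factor
    },
    "age_years": {
        "STUDY_A": ("AGE", lambda v: v),
        "STUDY_B": ("AGE_YRS", lambda v: v),
    },
}


def harmonize(study: str, record: dict) -> dict:
    """Rename and convert one study record into the common data definitions."""
    out = {"study": study}
    for common_name, per_study in COMMON_DEFS.items():
        if study in per_study:
            source_name, convert = per_study[study]
            if record.get(source_name) is not None:
                out[common_name] = convert(record[source_name])
    return out


pooled = [
    harmonize("STUDY_A", {"WT_KG": 70.0, "AGE": 41}),
    harmonize("STUDY_B", {"WEIGHT_LB": 154.0, "AGE_YRS": 38}),
]
```

When studies use a common set of case report forms from the outset, most converters reduce to the identity and this mapping step becomes trivial.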
D-5. Flexible data collection procedures and tools that will support patient quality-of-care analyses as
well as facilitate patient recruitment for future research studies
Overview: This initiative will provide high quality clinical databases populated with information collected from
consenting, treatment-seeking patients in the UNC Health Care system and throughout the state of North
Carolina to facilitate analyses of patient quality of care as well as greatly shorten the length of time required to
screen and enroll subjects in future research protocols.
Problem Statement: Information collected via paper forms at the time a patient enters a UNC Health Care clinic
is currently not utilized to facilitate either assessments of quality of care or future subject recruitment in
research protocols.
What Do We Have to Build Upon? The web-based data management systems at the Collaborative Studies
Coordinating Center, in conjunction with the administrative computing support from the GCRC Informatics Core,
both described in Section D-4 above, are well-positioned to support this initiative. The Coordinating Center’s
web-based data management and tracking systems are designed to easily accommodate new data collection
modalities required for this initiative, and the experience of the GCRC Informatics Core in supporting clinic
operations provides the expertise needed to plan for extensive data collection and web-based subject
recruitment throughout the UNC Health Care system and into community areas.
Proposed Solution: 1) We will develop, pilot test and implement data collection procedures that are suitable for
use at points of care in UNC’s Health Care system in order to populate a clinical database of sufficient quality
to support patient quality of care analyses. 2) We will merge relevant background and medical history data
from WebCIS into this clinical database, and the resulting database will serve as a resource for subject
recruitment for future TraCS-based studies and clinical trials. 3) Web-based subject recruitment systems will
be developed for implementation in health care communities outside of UNC.
Implementation Details (Carpe Diem pilot study): The Coordinating Center is currently launching a pilot study in
conjunction with the oncology clinics at UNC School of Medicine that incorporates flexible data collection tools
into the web-based data management system, namely, smart pens and tablet PCs. In this pilot, patients visiting
one of the oncology clinics are approached and asked to participate and give informed consent for minimal
data collection at the time of their visit. The study is therefore pilot testing not only the use of flexible data
collection methodologies, but also the ability to consider treatment-seeking patients as potential research
subjects for future studies. Data are collected using digital tablets or digital pens; both devices are minimally
intrusive and therefore ideal for use in a clinic setting. The clinical database is then populated with data
collected at point of entry through the digital devices. These newly collected data will be coded and processed
for use in assessments of patient quality of care. Data standards consistent with regulatory guidelines for
clinical trials will be applied, thereby facilitating the use of these point-of-care data in the event that the patients
eventually join a research protocol as research subjects.
In addition to the data collected in the clinic via digital devices, other background medical information will be
downloaded from electronic medical records (WebCIS). This is currently being accomplished through a live
HL7 feed for the Carpe Diem pilot study. With the completion of the research data warehousing project for
WebCIS data, this information transfer will be greatly improved.
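The live HL7 feed mentioned above delivers results from WebCIS as structured messages. The HL7 version is not stated; assuming pipe-delimited HL7 v2.x messages, the following minimal Python sketch pulls lab results out of OBX segments (OBX-3 observation identifier, OBX-5 value, OBX-6 units). The sample message, its sending/receiving application names and values are invented, and the parser ignores escape sequences, repetitions and multi-part values that a production interface engine would handle.

```python
def parse_hl7_observations(message: str) -> list:
    """Extract coded results from the OBX segments of an HL7 v2.x message."""
    results = []
    # Segments are terminated by carriage returns; accept newlines too.
    segments = [s for s in message.replace("\n", "\r").split("\r") if s]
    for segment in segments:
        fields = segment.split("|")
        if fields[0] != "OBX" or len(fields) < 6:
            continue
        # Outside the MSH segment, fields[i] holds field OBX-i.
        # OBX-3 components are identifier^text^coding system.
        components = fields[3].split("^") + [""]
        results.append({
            "code": components[0],
            "name": components[1],
            "value": fields[5],
            "units": fields[6].split("^")[0] if len(fields) > 6 else "",
        })
    return results


# Hypothetical ORU^R01 lab result message (identifiers and values invented).
sample = (
    "MSH|^~\\&|WEBCIS|UNCHCS|CSCC|UNC|200701151200||ORU^R01|MSG0001|P|2.3\r"
    "PID|1||000123\r"
    "OBX|1|NM|2345-7^Glucose^LN||105|mg/dL|70-99|H|||F\r"
)
observations = parse_hl7_observations(sample)
```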
The Carpe Diem pilot study will serve as the framework for a large-scale data collection initiative of the TraCS
Institute Biomedical Informatics Core. Kiosks at clinics throughout the health care system (and eventually
throughout the state) will be established to educate patients about possible participation in future clinical trials.
Subjects will be recruited to enter a registry of subjects willing to have their clinical data screened for eligibility
in clinical studies as protocols are developed. Basic demographic and phenotyping data will be collected using
the flexible tools described above and stored in the registry. The registry database will be processed to achieve
the quality required for a regulatory submission. Therefore, the registry will not only provide easy and informed
identification of potential subjects for future trials, but background data will already be collected, thereby
enabling a very quick study start-up phase. Success rates based on eventual enrollment into specific clinical
trials of subjects who were initially contacted at the kiosks will be closely monitored.
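Once background data sit in the registry, screening for a new protocol reduces to applying its eligibility criteria to consenting registrants. A minimal Python sketch follows, assuming criteria can be expressed as an age range plus required and excluded condition codes; real protocols need richer logic and clinician review. `RegistryEntry`, `screen` and the condition codes are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RegistryEntry:
    research_id: str
    age: int
    diagnoses: set = field(default_factory=set)   # hypothetical local condition codes
    screening_consent: bool = False               # agreed to be screened for future studies


def screen(registry, min_age, max_age, required_dx, excluded_dx=frozenset()):
    """Return IDs of consenting registrants who meet simple inclusion/exclusion rules."""
    required_dx, excluded_dx = set(required_dx), set(excluded_dx)
    return [
        entry.research_id
        for entry in registry
        if entry.screening_consent
        and min_age <= entry.age <= max_age
        and required_dx <= entry.diagnoses          # has every required condition
        and not (excluded_dx & entry.diagnoses)     # has no excluded condition
    ]


registry = [
    RegistryEntry("R001", 54, {"T2DM"}, True),
    RegistryEntry("R002", 61, {"T2DM", "CKD"}, True),
    RegistryEntry("R003", 45, {"T2DM"}, False),
    RegistryEntry("R004", 72, {"T2DM"}, True),
]
candidates = screen(registry, 40, 65, required_dx={"T2DM"}, excluded_dx={"CKD"})
```

Recording which screened candidates later enroll would supply the kiosk success rates the text proposes to monitor.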
Several other small scale initiatives are currently underway that incorporate hospital clinical records from
WebCIS into a research protocol involving primary data collection. One such pilot is using downloaded
WebCIS data to assist in randomized clinical trials of disease management for patients with type 2 diabetes in
the general internal medicine practice.
PCIR Core Support: Another important initiative of the TraCS Institute Data Management Service will be to
support operations at the PCIR Core. Following are the activities that we envision under this initiative:
o E-Protocol web-based system to track and manage study documents
o Web-based recruitment tools for use throughout UNC Health Care and in NC communities
o Web-based adverse event/serious adverse event reporting and tracking system, linked to clinical study
databases
o Lab downloads from the WebCIS warehouse; these are currently supported on a protocol-by-protocol basis.
The Carpe Diem pilot study is testing a live HL7 feed from WebCIS into the clinical database at the
Coordinating Center. Upon completion of the research warehouse initiative for WebCIS, clinical data
retrieval, such as downloads, will be easily implemented for all TraCS protocols.
o Data transfers from external central labs/centers with quality control procedures implemented (e.g., 5%
blinded replicate analysis)—these procedures are in place at the Coordinating Center and will be available
for all TraCS protocols.
o Collection of basic phenotyping and demographic data on all PCIR research subjects using standard case
report forms and existing web-based systems.
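The 5% blinded replicate analysis listed above resubmits a random subset of specimens to the external lab under new labels and compares paired results. A minimal Python sketch follows, assuming concordance is summarized as the mean within-pair coefficient of variation; that metric, the function names and the sample IDs are hypothetical choices for illustration, not the Coordinating Center's documented procedure.

```python
import random
import statistics


def choose_blinded_replicates(sample_ids, fraction=0.05, seed=None):
    """Pick about `fraction` of samples and give each a new blind label."""
    rng = random.Random(seed)
    n = max(1, round(len(sample_ids) * fraction))
    chosen = rng.sample(sample_ids, n)
    # Blind label -> original sample ID; this key stays with the coordinating center.
    return {f"BLIND-{i:04d}": sid for i, sid in enumerate(chosen, start=1)}


def replicate_cv_percent(pairs):
    """Mean within-pair coefficient of variation (%) over (original, replicate) results."""
    cvs = []
    for original, replicate in pairs:
        mean = (original + replicate) / 2
        sd = statistics.stdev([original, replicate])
        cvs.append(100 * sd / mean)
    return statistics.mean(cvs)


sample_ids = [f"S{i:03d}" for i in range(200)]
blind_key = choose_blinded_replicates(sample_ids, seed=2007)
```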
Integration of both staff and software tools from the GCRC informatics core and the Collaborative Studies
Coordinating Center and linkage of the resulting TraCS Data Management Service to the TraCS Institute will
facilitate learning opportunities for clinical investigational scholars who would benefit from immersion in this
living laboratory of clinical research data management. The TraCS Institute’s goal is to train the next
generation of superb and knowledgeable translational and clinical investigators, who will be the drivers of new
research endeavors. However, reaching the finish line, defined as the translation of a discovery into an accepted
application that improves public health, cannot occur without a well-engineered, well-running, state-of-the-art machine: clinical data management. Data drive the results that determine investigational product
safety and efficacy. Clinical investigators analyzing trial data and using those results to shape clinical practice
would benefit from a better understanding of how those data are generated – from design of case report forms,
to database construction, to data collection, editing, validation, summarization and analysis.
D-6. Expanded Community Engagement
Overview: This initiative will deliver for re-use throughout North Carolina the resources and services of NCCBI,
developed and tested at UNC-CH. These will include flexible and easy-to-use tools for electronic data capture,
research and community engagement portals, and access to the shared repository of clinical and other
biomedical data, protocols, processes, tools and knowledge resources. The social engagement processes that
encourage participation in NCCBI will extend to include healthcare providers, professionals, consumers and
payers.
Problem Statement: UNC-CH researchers have successfully performed clinical research in communities
across the state, but have had to work through informatics issues de novo with each project. Many studies
have involved laborious review of paper charts to create electronic databases that are useful as a one-time tool
but are too cumbersome for ongoing use. While these efforts have been effective for individual studies, they do
not have the capability to efficiently accrue the cumulative data necessary to monitor dissemination efforts and
concomitant long term health outcomes.
What Do We Have to Build Upon? The North Carolina Area Health Education Centers (AHEC) and existing
investments in community-based research infrastructure are described in the Community Engagement Core
section. Past community-related biomedical informatics projects in this area have been successful in their own
right. These include syndromic surveillance of primary care practices, the accumulation of data from
emergency departments with disparate record systems, electronic surveys administered via laptops with
uploading of appropriately de-identified data to a central website, the Carpe Diem project piloting flexible data
collection tools, to name a few. The TraCS Biomedical Informatics Core along with the Community
Engagement Core through the NCCBI and related activities will add the social, organizational and technical
infrastructure needed to facilitate convenient and cost-effective re-use of data, knowledge and processes.
Proposed Solution: We will engage community stakeholders in continuous discussion of needs, priorities,
capabilities and concerns, resulting in a living community engagement agenda. Based on that agenda, we will
develop, pilot, test and implement convenient and cost-effective solutions that leverage resource investments
and optimize sharing and re-use.
Implementation Details: To identify needs and priorities for community engagement informatics support, we will
conduct periodic workshops with representatives from the Community Engagement Core and the distributed
communities. We will visit some of these communities to fully understand their problems and concerns and to
assist in designing workable solutions. A Community Engagement Portal will be developed to disseminate
current activities, capabilities, needs, interests, and the like, and will link to the Research Portal. The TraCS
Community Advisory Board will be able to advise and facilitate these interactions as well.
Alternative technologies will be demonstrated, tested, piloted and evaluated for field data capture (scanning,
intelligent pens, tablets, speech recognition, for example). Ongoing data accumulation, creation and
maintenance of registries, and uploading of information to the relevant data warehouse will occur, as
appropriate. The shared data repository will have the capability of combining clinical information with biologic
and genetic information. We also envision eventual incorporation of real time clinical reference materials and
decision aids as part of our informatics service to participating community practices, providing an educational
service and serving as incentives for continued participation in the TraCS informatics program.
We will develop the AHEC-based Regional Translation Research Units (RTRU) into regional health information
sites that can merge data and contribute to the NCCBI shared data repository at the TraCS Institute. Given the
variability of electronic record capabilities in the practices that will make up our clinical network, our informatics
solutions will be multi-faceted. We will leverage our partnerships with open source software companies to
develop and distribute these solutions. These multifaceted communication approaches will be delivered
through a range of user and outreach services, such as FAQs, websites, wikis and educational programs.
We envision strong representation by the county health departments across North Carolina in each RTRU. The
application of information technology resources to the collection, aggregation and analysis of health data
inherent in the NCCBI can be leveraged in the public health domain. The data residing in current and emerging
clinical information systems can be adapted and accessed for use by public health practitioners to address a
variety of issues. Such access can support an ongoing assessment of data quality in systems designed to
support clinical care. Access to clinical data systems could also provide a rich data source for syndromic
surveillance of disease outbreaks. Such an early warning system could flag an emerging outbreak during the
critical hours before it spreads beyond a limited area. These advances will require new technologies to
translate data from a variety of clinical systems into a normalized format and content.
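The normalization step can be illustrated with a minimal Python sketch. It assumes two hypothetical source systems (`system_a`, `system_b`) that record the same visit under different field names, maps both into one common record, and then applies a simple alert rule (a daily count above the baseline mean plus two standard deviations). The field maps, syndrome codes and threshold are illustrative assumptions, not NCCBI specifications; operational surveillance systems generally use more robust detection methods, such as CUSUM-based rules.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from statistics import mean, stdev

@dataclass(frozen=True)
class NormalizedVisit:
    """One clinical visit in a hypothetical common (normalized) format."""
    visit_date: date
    county: str
    syndrome: str

# Hypothetical mappings from each source system's field names to the common format.
FIELD_MAPS = {
    "system_a": {"visit_date": "dt", "county": "cnty", "syndrome": "synd"},
    "system_b": {"visit_date": "VisitDate", "county": "County", "syndrome": "SyndromeCode"},
}

def normalize(source: str, raw: dict) -> NormalizedVisit:
    """Translate a raw record from a known source system into the common format."""
    fields = FIELD_MAPS[source]
    return NormalizedVisit(
        visit_date=date.fromisoformat(raw[fields["visit_date"]]),
        county=raw[fields["county"]].strip().title(),       # harmonize spacing and case
        syndrome=raw[fields["syndrome"]].strip().lower(),
    )

def daily_counts(visits, syndrome: str) -> Counter:
    """Count normalized visits per day for one syndrome category."""
    return Counter(v.visit_date for v in visits if v.syndrome == syndrome)

def flag_spike(baseline: list, today: int, k: float = 2.0) -> bool:
    """Flag today's count if it exceeds the baseline mean by more than k sample standard deviations."""
    return today > mean(baseline) + k * stdev(baseline)
```

Because every source is mapped into the same record type before counting, a new clinical system only needs a new entry in the field map rather than changes to the surveillance logic.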
D-7. Coordination of research infrastructure designed for re-use of data, processes, tools and services.
Overview: At the heart of the Collaboratory is the research cyberinfrastructure, which will extend beyond
today’s existing systems to include a shared data repository, reusable processes, standards and best practices,
applications and tools, interfaces and services, leveraging North Carolina’s leadership in open source software.
The extended cyberinfrastructure will be modeled after the NCBI, contributing, in particular, the protocols and
best practices, tools and services for a “Clinical Entrez.” The cyberinfrastructure will leverage the RENCI North
Carolina Bioportal project, which brings open source bioinformatics applications and data together with
high-performance distributed computing resources (www.renci.org/projects/bio.php).
Problem Statement: Current biomedical research infrastructure is fragmented; tools for searching and
analyzing heterogeneous databases are developed for one study at a time; and the processes and applications
are not shared. There is a critical need to back the proposed culture of collaboration with capabilities that
make the re-use of clinical and other research data, processes and tools routine and convenient.
What Do We Have to Build Upon? The underlying IT requirements for integrating access to heterogeneous
databases and infrastructure are currently addressed in the RENCI Bioportal, a project focused on the diverse
needs of biology and biomedical research communities. The Bioportal's service layer is a programmatic
interface currently accessible to users via an interactive web portal, through workflow automation tools such as
Taverna, and potentially through other client software.
Proposed Solution: We will implement a comprehensive research cyberinfrastructure (http://sils.unc.edu/griffiths/nccbi_model). Critical elements to achieve this vision are 1) the commitment to improving the capture,
quality, comprehensiveness and curation of biomedical data and metadata; 2) implementation of a statewide
biomedical data/metadata repository with linkage and contribution to national level efforts; 3) development,
adoption and promulgation of standards for system interoperability and data portability; 4) development and
evaluation of improved applications and tools for discovery, analysis, presentation and decision support; 5)
design, implementation and evaluation of educational programs aimed at a wide variety of audiences and
offered through a range of modalities; 6) an improved understanding of the science and practice of
collaboration; and 7) multiple levels of data security and (when necessary) data encryption to comply with IRB
and HIPAA regulations and to address the concerns of research partners such as payers.
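One way to picture "multiple levels of data security" is as tiered views of the same records. The sketch below is a hypothetical illustration: the three tiers, the short list of direct identifiers and the small-cell suppression threshold of 5 are assumptions chosen for the example, and the identifier list is far shorter than the 18 categories of the HIPAA Safe Harbor de-identification method.

```python
from enum import IntEnum
from typing import Callable, Iterable, Optional

class AccessLevel(IntEnum):
    """Hypothetical access tiers, ordered from least to most privileged."""
    AGGREGATE = 1     # summary counts only
    DEIDENTIFIED = 2  # record-level data with direct identifiers removed
    IDENTIFIED = 3    # full records, e.g. for users on an approved IRB protocol

# Illustrative subset of direct identifiers; HIPAA Safe Harbor lists 18 categories.
DIRECT_IDENTIFIERS = {"name", "mrn", "address", "phone", "ssn"}

def view_record(record: dict, level: AccessLevel) -> dict:
    """Return only the fields of a record that a user at `level` may see."""
    if level >= AccessLevel.IDENTIFIED:
        return dict(record)
    if level == AccessLevel.DEIDENTIFIED:
        return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    raise PermissionError("record-level access requires DEIDENTIFIED or higher")

def aggregate_count(records: Iterable[dict], predicate: Callable[[dict], bool],
                    min_cell: int = 5) -> Optional[int]:
    """Count matching records, suppressing small non-zero counts by returning None."""
    n = sum(1 for r in records if predicate(r))
    return None if 0 < n < min_cell else n
```

Keeping the tiers ordered (an `IntEnum`) lets every check be a simple comparison, so adding an intermediate tier later does not require rewriting the existing rules.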
Implementation Details: The proposed NCCBI cyberinfrastructure model includes 6 layers: 1) Base
Technologies include the general purpose computation, storage and communications technologies; 2)
Enabling Technologies include networking, operating systems, middleware and security technologies; 3)
Existing Systems include the biomedical information systems currently in place at UNC-CH and across the
state. These include a wide variety of independent clinical and research information systems, with minimal
linkage among them. (These first 3 layers largely make up today's biomedical computing environment; the
proposed expanded and enhanced cyberinfrastructure will contribute the 3 new and distinct layers that follow.)
4) Shared Data Repository, which will include data of all types and associated metadata; 5)
Repository Tools and Services, including processes and best practices for a) data capture, b) data quality
assurance, c) long-term data curation, d) development and evaluation of metadata schemas, taxonomies and
other forms of knowledge representation, e) access management and adoption and f) development of relevant
standards in each of these areas; and 6) User and Outreach Services, which will also be developed and
evaluated, to deliver information and knowledge to specific populations.
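The six-layer model can also be written down as a simple data structure that makes explicit which layers already exist and which the proposal adds. This is only a descriptive sketch: the names `Layer`, `new_layers` and `supporting_layers` are hypothetical, and the rule that each layer builds on the layers numbered below it is an assumption about layered architectures in general, not a statement from the model itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Layer:
    number: int
    name: str
    is_new: bool  # True for layers contributed by the proposed cyberinfrastructure

NCCBI_LAYERS = (
    Layer(1, "Base Technologies", False),
    Layer(2, "Enabling Technologies", False),
    Layer(3, "Existing Systems", False),
    Layer(4, "Shared Data Repository", True),
    Layer(5, "Repository Tools and Services", True),
    Layer(6, "User and Outreach Services", True),
)

# Process areas a) through f) of the Repository Tools and Services layer.
REPOSITORY_SERVICE_AREAS = (
    "data capture",
    "data quality assurance",
    "long-term data curation",
    "metadata schemas, taxonomies and other knowledge representation",
    "access management and adoption",
    "standards development",
)

def new_layers():
    """Layers the proposal adds on top of today's biomedical computing environment."""
    return [layer for layer in NCCBI_LAYERS if layer.is_new]

def supporting_layers(number: int):
    """Assumption: a layer builds on every layer numbered below it."""
    return [layer for layer in NCCBI_LAYERS if layer.number < number]
```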
We propose to extend the Bioportal architecture to support clinical and translational biomedical science. This
work is concentrated primarily in 2 areas: 1) federated integration of clinical databases alongside existing
Bioportal data sources and 2) development of new translational services to enable efficient and effective
capture and sharing of multidisciplinary data across the Collaboratory, including security and user rights
management, data curation, metadata management and improved monitoring.
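Federated integration means querying each data source where it lives and combining the answers, rather than copying everything into a single database. The minimal sketch below assumes that each source (for example, the WebCIS research data warehouse or a literature search service) is wrapped in a hypothetical adapter function. It tags every result with its source for provenance and keeps going when one source is unavailable. It does not reflect the actual Bioportal service interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

@dataclass(frozen=True)
class Hit:
    source: str   # provenance: which federated source returned this item
    item_id: str
    title: str

# Hypothetical adapter interface: wraps one data source and answers a text query
# with (item_id, title) pairs.
SourceAdapter = Callable[[str], Iterable[Tuple[str, str]]]

def federated_search(query: str,
                     sources: Dict[str, SourceAdapter]) -> Tuple[List[Hit], List[str]]:
    """Send the query to every source; return tagged hits plus the names of sources that failed."""
    hits: List[Hit] = []
    failed: List[str] = []
    for name, adapter in sources.items():
        try:
            results = list(adapter(query))
        except OSError:           # e.g. connection errors or timeouts from a remote source
            failed.append(name)   # one unavailable source should not sink the whole query
            continue
        hits.extend(Hit(name, item_id, title) for item_id, title in results)
    return hits, failed
```

Returning the list of failed sources alongside the hits lets a portal tell users that an answer is partial instead of silently omitting a database.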
The proposed architecture for NCCBI involves extending the Bioportal to provide access to clinical databases
and applications. These include 1) the WebCIS research data warehouse, 2) data from clinical trials, 3) data
from population demographic studies, 4) data from the UNC-CH schools and departments, 5) other data from
UNC-CH and affiliates across North Carolina, 6) literature search services for aggregated searching of
research publications, and 7) decision support tools.
In addition to data, the architecture requires the development of translational services not currently supported
by the Bioportal. They include: 1) Collaboratory services for capturing and sharing of data across disciplines;
2) user access (rights and permissions) management and improved multi-level security; 3) data curation
services to provide users with support for contribution, provenance, sharing and ensuring long-term access to
data; 4) metadata management services to help unify and enhance interoperability of diverse data; and 5)
improved monitoring of infrastructure and user activity. Collaboratory services will enable the capturing of
results from one or more services, grouping and annotating the captured data and sharing them with other
users and communities as permitted by access restrictions on the derivative data. For example, a selection of
experimental data, clinical data and research literature can be annotated, combined into a virtual publication
and shared across the Collaboratory.
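The capture, grouping, annotation and sharing steps behind a virtual publication can be sketched as follows. The three `Restriction` levels and the rule that a derivative collection inherits the restriction of its most restricted component are assumptions made for illustration; the proposal does not specify how access restrictions on derivative data will be computed.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Optional

class Restriction(IntEnum):
    """Hypothetical sharing levels, ordered from least to most restrictive."""
    PUBLIC = 0
    COLLABORATORY = 1  # any registered Collaboratory member
    PROJECT_TEAM = 2   # named members of one project only

@dataclass
class CapturedItem:
    service: str       # service that produced the result, e.g. a literature search
    item_id: str
    restriction: Restriction
    annotations: List[str] = field(default_factory=list)

@dataclass
class VirtualPublication:
    title: str
    items: List[CapturedItem] = field(default_factory=list)

    def add(self, item: CapturedItem, note: Optional[str] = None) -> None:
        """Group a captured result into the publication, optionally annotating it."""
        if note:
            item.annotations.append(note)
        self.items.append(item)

    @property
    def restriction(self) -> Restriction:
        # Assumption: a derivative collection is as restricted as its most restricted part.
        return max((item.restriction for item in self.items), default=Restriction.PUBLIC)

    def visible_to(self, clearance: Restriction) -> bool:
        """A user may see the publication only if cleared for its strictest component."""
        return self.restriction <= clearance
```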
A Cyberinfrastructure Group will be drawn from existing UNC-CH resources, consolidated and expanded to
ensure dedicated focus and effort on building the proposed shared resource. IT personnel from the existing
GCRC and Coordinating Center will be combined and expanded with new hires to create a co-located IT group
of 13-14 FTEs total. Seven of these individuals, 2 contributed by RENCI and 5 new hires, will be assigned to
cyberinfrastructure development. The cyberinfrastructure group will work to incorporate or develop new open
source code in modular form, in close collaboration with Red Hat, IBM and SAS.
A key transformative component of this proposal is the establishment of a group comprising Computer
Science and School of Information and Library Science (SILS) faculty and students to use the research
portal to identify translational and clinical biomedical problems that could benefit from applied Computer Science
or SILS solutions. UNC has a strong history of collaboration among SILS, Computer Science and the Schools of
Medicine and Public Health. Examples of such collaborations include:
• Computer-Integrated Systems for Microscopy and Manipulation ("CISMM"), originally focused on molecular
structure solution and on simulating static and dynamic interactions among molecules to understand cellular
mitosis.
• Medical Image Display and Analysis Group ("MIDAG"), led by Stephen M. Pizer, PhD, Kenan Professor,
Departments of Computer Science, Radiology, Radiation Oncology, and Biomedical Engineering. This
collaborative group of about 110 professionals from the departments of Computer Science, Radiology,
Radiation Oncology, Surgery, Psychiatry, Urology, Statistics, Mathematics, Biostatistics and Biomedical
Engineering includes ~25 graduate students.
• Chief Complaint System: research on the language and sublanguage used by patients and triage nurses to
document the reason for an Emergency Department visit.
• Information Extraction from WebCIS Notes, part of the DEcIDE project, whose goal is to explore
relationships between over-medication and diabetes outcomes.
• Multi-User Extraction and Information Synthesis (METIS), which detects hidden connections in the
literature. The system, designed by SILS faculty member Cathy Blake, combines a knowledge-based
approach with shallow language processing to identify pre-defined facts in each full-text scientific article.
• Personalized Information Synthesis for Breast Cancer Patients, which seeks to give a breast cancer
patient a personalized perspective on the latest scientific literature regarding her medical condition, using
electronically available patient information from WebCIS together with full-text scientific literature.
• Personal Health Record (PHR) Usability study, which will determine the effectiveness of alternative
organizational structures for PHRs that are personalized to specific kinds of health conditions. Organizational
structure and personalization are 2 important design features for user interfaces, and the results will inform
PHR design at this early stage of development. This research could yield important information for the CTSA
network.
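As an example of the kind of problem these collaborations address, chief complaint text is full of abbreviations that must be cleaned before analysis; this is the task of the Emergency Medical Text Processor evaluated by Travers and Haas. The sketch below is not that system. It is a toy illustration with a hypothetical five-entry abbreviation table that splits a complaint into phrases and expands known abbreviations.

```python
import re
from typing import List

# Hypothetical abbreviation table; real systems rely on much larger curated lexicons.
ABBREVIATIONS = {
    "abd": "abdominal",
    "cp": "chest pain",
    "sob": "shortness of breath",
    "n/v": "nausea and vomiting",
    "ha": "headache",
}

def clean_chief_complaint(text: str) -> List[str]:
    """Split free text into comma/semicolon-separated phrases and expand abbreviations."""
    cleaned = []
    for phrase in re.split(r"[,;]+", text.lower()):
        tokens = re.findall(r"[a-z0-9/]+", phrase)  # keep "/" so "n/v" stays one token
        if tokens:
            cleaned.append(" ".join(ABBREVIATIONS.get(t, t) for t in tokens))
    return cleaned
```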
To date, these collaborations have been driven by the interests and relationships among faculty. The TraCS
Institute will fund 4 supervised graduate students per year, creating opportunities for them to work with
interdisciplinary teams on projects that further biomedical research or patient care. The portal could also be
extended to offer field experiences and internships for Computer Science and SILS students as well as other
students engaged in biomedical and health informatics programs at UNC-CH and other institutions.
E. Measurement of Progress and Evaluation
The focus of the proposed biomedical informatics activities is on sharing and re-use of research
knowledge, data and protocols. The following measures of progress will be monitored in an ongoing manner
and reported quarterly: 1) number of participants (by discipline, affiliation, role and level/frequency of activity);
2) number of collaborations (by type, size, scope, modality and longevity); 3) amount of re-use of data,
protocols, tools, services and other research knowledge (such as best practices, outcomes and literature); 4)
number of new resources developed (by type); and 5) use and re-use of newly developed resources.
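Several of these measures could be computed directly from an activity log. The sketch assumes a hypothetical `ActivityEvent` record for each use of a resource, with a flag marking re-use of a resource created by someone else, and derives measures 1 and 3 for one calendar quarter. The record layout and field names are illustrative only.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable

@dataclass(frozen=True)
class ActivityEvent:
    participant: str
    discipline: str
    resource_id: str   # data set, protocol, tool or service that was used
    event_date: date
    is_reuse: bool     # True when the resource was created by someone else

def quarter_of(d: date) -> str:
    """Label a date with its calendar quarter, e.g. '2007-Q1'."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"

def quarterly_measures(events: Iterable[ActivityEvent], quarter: str) -> Dict[str, object]:
    """Summarize participation and re-use for one quarter."""
    in_quarter = [e for e in events if quarter_of(e.event_date) == quarter]
    # One entry per participant (assumes each participant has a single discipline).
    disciplines = {e.participant: e.discipline for e in in_quarter}
    return {
        "participants": len(disciplines),
        "participants_by_discipline": Counter(disciplines.values()),
        "reuse_events": sum(e.is_reuse for e in in_quarter),
        "resources_reused": len({e.resource_id for e in in_quarter if e.is_reuse}),
    }
```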
Evaluation of NCCBI products and services will be conducted at regular intervals, as appropriate. For example,
the functionality and usability of each new resource, tool and service will be evaluated as it is made
available in prototype or pilot form. An annual evaluation of the NCCBI as a whole will be conducted to include
a participant survey to determine satisfaction with products, services and activities; perceived importance and
contribution of the products, services and activities to research, education and practice; and organizational
effectiveness. Complete detail for Tracking/Evaluation and Implementation and Milestones for this section is
found on page 1096.
Literature Cited (not in page counts):
1. Benito, M. et al. Adjustment of systematic microarray data biases. Bioinformatics 20, 105-114 (2004).
2. Blake, C. & Pratt, W. Collaborative information synthesis. Proceedings of the 65th Annual Meeting of the
American Society for Information Science and Technology, Philadelphia, PA (2002).
3. Blake, C. Information synthesis: a mixed-initiative meta-analytic approach to facilitate knowledge
discovery from scientific text. Unpublished doctoral dissertation, University of California, Irvine (2003).
Carey, L.A. et al. Race, breast cancer subtypes, and survival in the Carolina Breast Cancer Study. JAMA 295,
2492-502 (2006).
4. Chung, C.H. et al. Molecular classification of head and neck squamous cell carcinomas using patterns
of gene expression. Cancer Cell 5, 489-500 (2004).
5. Fan, C. et al. Concordance among gene-expression-based predictors for breast cancer. N Engl J Med
355, 560-9 (2006).
6. Hayes, D.N. et al. Gene expression profiling reveals reproducible human lung adenocarcinoma subtypes in multiple independent patient cohorts. J Clin Oncol 24, 5079-90 (2006).
7. Hu, Z. et al. The molecular portraits of breast tumors are conserved across microarray platforms. BMC
Genomics 7, 96 (2006).
8. Oh, D.S. et al. Estrogen-Regulated Genes Predict Survival in Hormone Receptor-Positive Breast Cancers. J Clin Oncol (2006).
9. Perou, C.M. et al. Molecular portraits of human breast tumors. Nature 406, 747-52 (2000).
10. Perreard, L. et al. Classification and risk stratification of invasive breast carcinomas using a real-time
quantitative RT-PCR assay. Breast Cancer Res 8, R23 (2006).
11. Rouzier, R. et al. Breast cancer molecular subtypes respond differently to preoperative chemotherapy.
Clin Cancer Res 11, 5678-85 (2005).
12. Scott Kraus, S., Blake, C. & West, S.L. Information extraction from medical notes: the power of one.
MEDINFO (under review).
13. Sorlie, T. et al. Gene expression profiles do not consistently predict the clinical treatment response in
locally advanced breast cancer. Mol Cancer Ther 5, 2914-8 (2006).
14. Sorlie, T. et al. Repeated observation of breast tumor subtypes in independent gene expression data
sets. Proc Natl Acad Sci U S A 100, 8418-23 (2003).
15. Sullivan, P.F., Fan, C. & Perou, C.M. Evaluating the comparability of gene expression in blood and
brain. Am J Med Genet B Neuropsychiatr Genet 141, 261-8 (2006).
16. Thomson, J.M., Parker, J., Perou, C.M. & Hammond, S.M. A custom microarray platform for analysis of
microRNA gene expression. Nat Methods 1, 47-53 (2004).
17. Travers, D.A., and Haas, S.W. Evaluation of Emergency Medical Text Processor, a system for cleaning
chief complaint data. Academic Emergency Medicine 11, 1170-1176 (2004).
18. Travers, D.A., and Haas, S.W. The Unified Medical Language System© coverage of emergency department chief complaints. Academic Emergency Medicine (in press).
19. Travers, D.A., and Haas, S.W. Using nurses’ natural language entries to build a concept-oriented terminology
for patients’ chief complaints in the emergency department. Journal of Biomedical Informatics 36, 260-270 (2003).
20. Troester, M.A., Hoadley, K.A., Parker, J.S. & Perou, C.M. Prediction of toxicant-specific gene expression signatures after chemotherapeutic treatment of breast cell lines. Environ Health Perspect 112,
1607-13 (2004).
21. Weigelt, B. et al. Molecular portraits and 70-gene prognosis signature are preserved throughout the
metastatic process of breast cancer. Cancer Res 65, 9155-8 (2005).