Biomedical Informatics - UNC School of Information and Library

Watkins, Paul B. BIOMEDICAL INFORMATICS CORE The TraCS Biomedical Informatics Core will unite the silos of biomedical informatics research excellence at UNC and across North Carolina to maximize re-use of data, knowledge and processes. With the establishment of the North Carolina Collaboratory for Biomedical Informatics (NCCBI), TraCS will support research, patient care, education and policy-making while building upon, leveraging and extending the current biomedical informatics infrastructure at UNC-CH. This core involves several external partners with a strong presence in NC and world-wide: Red Hat, IBM, SAS, Allscripts, Quintiles and NCHICA. We are committed to achieving a national leadership role in the design and development of best practices for the inclusion of clinical data into shared repositories of biomedical data. A. Biomedical Informatics Core Leadership Investigators: The following investigators will form the leadership of the Biomedical Informatics Core through membership on the NCCBI UNC-CH Steering Committee: Jose-Marie Griffiths, PhD, TraCS Biomedical Informatics Leader and Dean, School of Information and Library Science (SILS), is a member of the National Science Board and a Fellow of the American Association for the Advancement of Science, and co-authored Revolutionizing Health Care Through Information Technology while serving as a member of the President’s Information Technology Advisory Committee. Tim Carey, MD, Professor of Medicine and Director of the UNC-CH Sheps Center for Health Services Research, is an accomplished Type 2 Clinical Investigator with extensive experience with health care data bases. He is a member of the TraCS Translational Research Advisory Board and chair of the Type 2 Translation Subcommittee. Ronald Falk, MD, Professor of Medicine, Chief, Division of Nephrology, is an accomplished clinical and translational researcher who has successfully merged data from UNC electronic medical records with his research data. 
He directs the Glomerular Disease Collaborative Network which includes over 400 nephrologists throughout the Southeast United States. Bradley Mark Hemminger, PhD, Assistant Professor, joint appointment in School of Information and Library Science and the Carolina Center for Genome Sciences; Adjunct Assistant Professor, Department of Radiology, School of Medicine, will co-lead the Educational Opportunities Working Group. He has conducted an extensive needs assessment with bioinformatics researchers at UNC-CH and is working on the development of a unified vocabulary to support data access. Carol Jenkins, MLS, Director, Health Sciences Library, will work with SILS to design support services for the NCCBI. Ms. Jenkins chairs the UNC-CH campus-wide Information Technology Strategic Planning Committee. John P. Kichak, Vice President, Information Services Division of UNC Hospitals, will direct the development of the WebCIS data warehouse and provide significant leadership in the development of the NCCBI. Lisa LaVange, PhD, Professor of the Practice of Biostatistics in the School of Public Health and director of the Collaborative Studies Coordinating Center, will lead the NCCBI Data Management Initiative. She joined UNC after 10 years in industry and brings extensive experience in clinical data management and statistics to the TraCS Institute, with a particular emphasis on trial design, data management and analysis in a regulatory environment. She is a Co-PI of this CTSA proposal and an associate director of the TraCS Institute. Terry Magnuson, PhD, Sarah Graham Kenan Professor and Chair of Department of Genetics, director of the Carolina Center for Genome Sciences Program and director of Cancer Genetics, Lineberger Comprehensive Cancer Center, will provide leadership in building the TraCS-facilitated Translational Genomics Research Initiative. 
Chuck Perou, PhD, TraCS Biomedical Informatics Core Co-Director, Assistant Professor, Department of Genetics, with a research focus in cancer molecular genetics, will lead the Translational Genomics Research Initiative. Russell Taylor, PhD, Research Professor, joint appointment in the Department of Computer Science and the Department of Physics and Astronomy, will co-lead the Educational Opportunities Working Group. Daniel Reed, PhD, founding director of the Renaissance Computing Institute, Chancellor’s Eminent Professor, Vice Chancellor for Information Technology, and UNC-CH CIO, is a member of the President’s Council of Advisors on Science and Technology, Chair of the Board of Directors of the Computing Research Association and member of the Biomedical Informatics Expert Panel for the NIH National Center. Dr. Reed was formerly director of the National Center for Supercomputing Applications (NCSA) and, while a member of the President’s Information Technology Advisory Committee, co-authored their report, Revolutionizing Health Care through Information Technology. He will lead the integration of the Portal into NCCBI and provide design leadership. External Partners: UNC-CH is in close geographic proximity to several leading information technology companies which will work together with us to define the parameters of this ambitious project, helping to identify the milestones and contribute intellectual capital to its achievement. These companies include: Red Hat – the leading open source software company will provide intellectual guidance on an open source architecture, help populate the international advisory council for the NCCBI and connect us with related global open source activities. IBM – has relationships with UNC-CH through its research labs worldwide and its healthcare and life sciences solutions group located in Research Triangle Park, NC. 
Preliminary discussions related to the biomedical informatics core have focused on the National Health Information Network and Health Information Exchange, bioinformatics research and underlying architecture for the NCCBI cyberinfrastructure. SAS – the statistical software company is currently engaged in discussions to develop the UNC Health Care System research data warehouse. The company is also world-class in the area of analytics – an important component of biomedical informatics. Allscripts – provides software for maintaining electronic health records. Allscripts is in the process of developing a data warehouse for many millions of patients and plans to make available its electronic health records to the shared repository, thereby enhancing the potential to enroll research subjects into clinical trials. Quintiles – will assist in clinical data and policy issues through the participation of Judith Beach, a nationally recognized expert on HIPAA, and via other in-kind support. North Carolina Healthcare Information and Communications Alliance (NCHICA) – is a nonprofit consortium of over 220 organizations dedicated to improving healthcare statewide by accelerating the adoption of information technology. B. Vision The history of science has taught us that the more broadly and transparently scientific data, knowledge and processes are shared and re-used, the stronger the science and the faster the rate at which the science advances. Activities over the past decade have leveraged public infrastructure and open access to better practice of science itself through, for example, the National Library of Medicine’s National Center for Biotechnology Information (NCBI), a resource for molecular biology information. NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data and disseminates biomedical information. 
Of particular significance for their usability and functionality are Entrez, NCBI’s cross-database search engine, toolkits and educational programs. The TraCS Biomedical Informatics Core will use this basic framework and infrastructure as a model to build an Entrez-like data system for clinical data, which may come from clinical trials and studies, actual patient care or both. The problem of collecting, cataloging, protecting, sharing and reusing clinical data is a non-trivial one, but it is a problem that also contains its own solution. As the Entrez system teaches, the solution comes not by designing the system around the method of collection or the structure of the catalog, but rather from thinking about how to maximize the nature and the number of ways that data can be properly shared. UNC-CH, along with its partners in government, healthcare and industry, is well-positioned to make this vision a reality, with measurable implementation milestones. Several recent developments at UNC-CH confirm the institutional commitment to build collaborative and far-reaching biomedical informatics activities: • A Bioinformatics planning group, led by Dean Jose-Marie Griffiths and comprised of representatives from the UNC-CH Schools of Medicine, Public Health, Dentistry, Nursing, Pharmacy, Information and Library Science and the Department of Computer Science, has been working for the past 2 years on plans for integrated health informatics education and related research. This effort has provided the foundation for the TraCS Biomedical Informatics Core. • UNC Health Care System is in the process of planning for a research data warehouse that will facilitate access to clinical and administrative data for research purposes. The Renaissance Computing Institute (RENCI), founded in 2004, is a major collaborative anchored at UNC-CH and supported by North Carolina State University and Duke University. 
One of RENCI’s primary foci is integrating information technologies for applications to biology, biomedicine and genomics. NCHICA (NC Healthcare Informatics Work Group) led by David Potenziani, Director of Information Technology and Adjunct Assistant Professor, UNC School of Public Health, is assisting with the re-tooling and transformation of the currently fragmented NC healthcare community from largely paper-based records to an interoperable federated system of electronic information to improve healthcare. Strong relationships exist among UNC-CH’s School of Information and Library Science, Department of Computer Science, Red Hat and IBM, especially in development and dissemination of open source software, open access digital libraries, knowledge representation, computer modeling and visualization. To support the goals of the TraCS Institute, the Biomedical Informatics Core will create a statewide interdisciplinary and inter-institutional collaboratory (collaborative laboratory): the North Carolina Collaboratory for Biomedical Informatics (NCCBI). It will build on the transformative technology used by the NIH to create Entrez for the NCBI. The long-term goal is to create a shared biomedical informatics data repository connecting clinical enterprises across the State of North Carolina to create a demonstration project for clinical data that will be a model for sharing and re-use of clinical data. This repository will contain appropriately deidentified data from clinical trials and clinical care. With the establishment of the NCCBI, the TraCS Biomedical Informatics Core will transform the excellent but fragmented biomedical informatics capabilities at UNC-CH into a coherent and connected system that facilitates routine re-use of research knowledge, data and processes throughout UNC and North Carolina, serving as a prototype for the nation. 
An effort of this magnitude stems from the rapidly emerging and urgent need to enhance and promote a culture of collaboration among researchers, clinicians, information technologists, educators, consumers, payers, and others. This requires fostering and sustaining interdisciplinary collaboration across traditional boundaries; managing large and diverse data sets generated by a variety of methods from different disciplines; mining and analyzing these data with advanced statistical, mathematical and computational techniques; and developing robust, user-friendly tools and services for a variety of stakeholder constituencies. The construction of the NCCBI will require funding that exceeds existing and proposed resources and will require years of dedicated work. UNC-CH is committed to seeking the necessary funds from the private as well as the public sector to realize this vision. C. Specific Aims The TraCS Biomedical Informatics Core has identified 7 specific aims or requirements that we believe we must effect to leverage and facilitate the power of re-use and transform biomedical informatics at UNC and ultimately across North Carolina. C-1. Improved collaboration among researchers, clinicians and educators. Effective collaboration results from a combination of social, organizational and technical strategies and processes. Achievement of this aim will require extensive user involvement, change management, promotion of sharing and collaboration, development of effective processes and tools to address specific problems and adoption of standards. C-2. Improved ability to search for clinical data and re-use with other research data. A critical step in achieving this aim is to implement a research data warehouse for accessing UNC Health Care System electronic health records and other clinical and administrative data. 
The end result of this project will be a secure, searchable data repository with capabilities to download clinical care data for integration with other research databases, including databases from primary data collection efforts under TraCS protocols. C-3. Improved data mining and analysis capabilities to facilitate re-use of genomic and clinical data. This objective will be accomplished through the expansion of the widely recognized bioinformatics center at UNC’s Lineberger Comprehensive Cancer Center to develop a network through which genomic research centers across campus can be linked to each other, to clinical care databases and to other research databases. C-4. Clinical research data management systems that are re-usable and scalable across varied research protocols. We will draw upon existing clinical research data management resources in the current GCRC and the Collaborative Studies Coordinating Center, integrating and expanding resources and tools to enable more efficient and accurate real-time data collection, processing, management and analysis for TraCS research protocols. C-5. Flexible data collection procedures and tools that will support patient quality-of-care analyses as well as facilitate subject recruitment for future research studies. This initiative will provide high quality clinical databases populated with information collected from consenting, treatment-seeking patients in the UNC Health Care System and, ultimately, throughout North Carolina to facilitate analyses of patient quality of care as well as greatly shorten the length of time required to screen and enroll patients as subjects in future research protocols. C-6. Expanded community engagement. We will build on existing relationships with the broader biomedical community, described more fully in the Community Engagement Core. 
Achievement of the aim will be based on community biomedical informatics needs, priorities and perceived barriers to participation in collaborative activities, the capture and accumulation of data in a shared and appropriately secure repository, availability of a variety of tools for accessing, analyzing and re-using the data and development of targeted user portals, and registries and educational programs aimed at multiple audiences. C-7. Coordination of research infrastructure designed for re-use of data, processes, tools and services. We will include design, development, testing and implementation of a biomedical informatics cyberinfrastructure that includes a shared data repository; reusable processes, protocols, tools and applications and a variety of user services. D. Detailed Vision D-1. Improved collaboration among researchers, clinicians and educators: Establishment of the North Carolina Collaboratory for Biomedical Informatics Overview: Engaging the power of re-use requires significant and challenging behavior changes among researchers. First and foremost among these is the development of a culture of collaboration. The NCCBI will establish the social, organizational and technical framework needed to promote and sustain this new high level of collaboration. This initiative will draw upon investigators’ experience in developing and sustaining successful interdisciplinary collaborations. Problem Statement: There are many areas of excellence in biomedical informatics at UNC-CH. Strong groups of researchers and clinicians use the most modern informatics tools and data sources available but they lack awareness of the full range of activities and capabilities, and an adequate infrastructure to support sharing and collaboration, and they experience a collaboration culture that is not as interdisciplinary or routine as it might be. This results in suboptimal use of existing resources, duplication of effort, and lost opportunities. 
There exists a critical need for coordination across organizational units to 1) expand awareness of the wide range of relevant biomedical informatics activities and resources; 2) foster and support effective, efficient and routine reuse of research knowledge, data and processes; and 3) stimulate the development and dissemination of new ideas. However, new collaborations are not easy to build and sustain, especially when they cross multiple disciplines and institutions. They need deliberate and focused effort and must deliver clear and immediate value to those participating through demonstrated convenience, usability, reward and recognition. What Do We Have to Build Upon? The current biomedical informatics infrastructure and leadership at UNC-CH provide a substantial base on which to build the work of the NCCBI. UNC-CH has significant ongoing activities in biomedical informatics (described in more detail subsequently). However, there exists a critical need for coordination across both databases and UNC-CH schools and research centers to stimulate and support effective and efficient clinical and translational research. Strong, yet insulated and separated, groups of researchers and clinicians use the most modern informatics tools available but need the people, software and tools to link disparate databases together and to help people identify new opportunities for collaboration. 
To date, UNC-CH has established a number of focused interdisciplinary research centers with varying degrees of informatics capabilities and activities: the General Clinical Research Center, the Collaborative Studies Coordinating Center, the Lineberger Comprehensive Cancer Center, the Carolina Center for Exploratory Genetic Analysis, the Carolina Center for Genome Sciences, the Carolina Exploratory Center for Cheminformatics Research, the Biomedical Imaging Research Center, the Carolina Environmental Bioinformatics Center, the Center for Bioinformatics, the Renaissance Computing Institute, and the Odum Institute for Research in Social Science, among others. These centers engage faculty, staff and students from schools and departments at UNC-CH and occasionally from other UNC institutions. Each of these centers has been successful in its own right, but each could potentially contribute to an institution-wide or broader research infrastructure, leveraging critical mass and specialized resources. Experience of the lead investigators with computer-supported collaborative work in multi-institutional interdisciplinary collaboratives such as the Center for Environmentally Responsible Solvents and Processes (http://www.nsfstc.unc.edu/), Space Physics and Aeronomy Research Collaboratory (http://www.si.umich.edu/sparc), the high energy physics collaboratory: ATLAS (http://ganesh.lsa.umich.edu), to name a few, has shown that the social, behavioral and organizational transformations take time to develop. They result from deliberate and persistent attention to the human perceptions, preferences, values, incentives and connections, along with organizational and technological innovation. They also demonstrate that transformational changes can and do occur especially when the collaborations include social and behavioral scientists who can study the evolving collaboration patterns and design interventions to overcome problems as they arise. 
Research on collaboration (http://www.scienceofcollaboratories.org/) confirms that success in science collaboratories is based on a complex mix of social and technical factors. Proposed Solution: We plan to design and establish a statewide interdisciplinary and inter-institutional collaboratory: the North Carolina Collaboratory for Biomedical Informatics (NCCBI) to 1) develop a culture of collaboration through rich, recurring human engagement oriented to common interests/concerns; 2) build a new organization with components of existing organizations (GCRC, Coordinating Center, RENCI), newly established positions, and a participatory governance structure; and 3) expand the technical infrastructure to facilitate routine re-use of biomedical research knowledge, data and processes throughout UNC and eventually across North Carolina. Achieving this vision will require promotion of sharing and collaboration, with incentives for participation to the participants derived from access to new data and services, extensive user involvement, change management, implementation of tools and applications to address specific problems and adoption of standards for representation of data, metadata, processes and knowledge. Implementation Details: Past experience has shown that collaboration cannot be driven by technological infrastructure alone. Effective collaboration is achieved through the combination of a set of social processes to stimulate and sustain awareness, engagement, sharing and innovation, along with the development and deployment of a technical infrastructure of processes, data and information sources, tools and services that deliver needed functionality in a convenient, easy-to-use form. It is the social interaction that generates and drives the technical agendas. Since specific implementation tactics will depend on ongoing user input, the details will be developed as the project evolves. 
However, we will begin immediately to design start-up strategies and to develop strategic and implementation plans for NCCBI. Start-up strategies will begin immediately to build on momentum developed by the broader campus-wide discussions of integrated health informatics activities and the intensive interactions associated with the development of this proposal. To begin to build awareness of the potential power of re-use, a monthly NCCBI lecture series addressing issues of sharing, collaboration and re-use will be established starting in February 2007. A related website, as part of or linked to the TraCS website, will disseminate announcements, press releases, speaker information and presentation materials, and a wiki will be started for ongoing discussion. A formal strategic planning process for NCCBI will begin in Spring 2007, consolidating 3 existing planning threads: research data warehouse, integrated biomedical/health informatics research and education initiative, and TraCS Biomedical Informatics Core. The process will involve multiple stakeholder constituencies to identify needs, resources, capabilities and aspirations along with perceived barriers to sharing and re-use. Anticipated elements of the NCCBI strategic plan include a long-term vision and roadmap with phased implementation steps. The plan will address the social, organizational and technical developments necessary to realize the vision. We anticipate that a strategic plan will be prepared by September 2007. An annual planning retreat will review and refine priorities for the upcoming year in the context of a 3-year window. An annual conference will present updates on research and development projects as well as results that can inform the planning and prioritization process. 
The social developments include a range of communication and engagement opportunities: meetings, presentations, symposia, workshops, conferences, and open forums, along with mechanisms to overcome potential barriers to participation and sharing: representation, intellectual property, ownership, credit, recognition and reward mechanisms. The organization and governance structure of the NCCBI will be enhanced and enabled by several key committees and working groups that will engage in development of collaborative agendas for research, information technology, educational opportunities, and clinical care. The NCCBI UNC-CH Steering Committee (members listed in part A of this Core Section) will guide the ongoing development of NCCBI and especially the prioritization of activities and resource allocations to ensure that the NCCBI biomedical informatics agendas are aligned with and supportive of the overall goals of TraCS and those of the other cores. The Deans’ Research Collaboration Council, along with 4 specific topic working groups (on educational opportunities, clinical care, research and IT domains) will focus on interdisciplinary research priorities and the underlying IT requirements. The International Advisory Council will offer a global context for cutting-edge developments in biomedical informatics. The External Partner Committee will bring together external providers and their respective capabilities and contributions to ensure that their efforts are harmonized and leveraged to support the NCCBI goals and priorities. Two staff teams will support data management and cyberinfrastructure services, respectively. A communications specialist will facilitate internal and external communications, working with the TraCS Institute Office of Communications as needed. 
Meetings will occur at regular intervals: weekly (data management team, cyberinfrastructure team), monthly (Steering Committee, External Partner Committee, Working Groups), quarterly (Deans’ Research Collaboration Council) and annually (International Advisory Council). Anticipated technical solutions to improve collaboration through NCCBI include: Development of a research portal to register researchers, projects, data sets, tools developed or used, problems, publications, and more. Availability of personalized library and information services such as alerting services and on-demand reference services. Development of rules of engagement that address the potential barriers to participation. Implementing the WebCIS research data warehouse (see D-2). Establishment of a Translational Genomics Initiative (see D-3). Establishment of Data Management Services (see D-4). Recruitment of subjects for clinical trials through kiosks and other smart technology (see D-5). Community engagement (see D-6). Coordination of research infrastructure designed for re-use of data, processes, tools (see D-7). D-2. Improved ability to search for clinical data and re-use with other research data: Implementing the WebCIS research data warehouse Overview: The development of a research data warehouse derived from UNC’s existing electronic medical record system, WebCIS, is a critical component of the NCCBI. The warehouse will physically reside within the Health Care System firewall but will have expanded search capabilities for analysis as well as secure, real time links with research databases for data transfer. UNC-CH will partner with SAS, IBM and possibly other companies in creating the research data warehouse, the first step in achieving the NCCBI. Problem Statement: WebCIS was designed as a transaction and workflow system. 
While clinical data can be exported, the system cannot, in its current form, support clinical research on a large scale, and documentation of the current data is relatively poor. The ability to download and link clinical data with research data such as questionnaires and laboratory data does exist, but is conducted ad-hoc for each new project. The institution urgently needs a clinical data repository. What Do We Have to Build Upon? The UNC Health Care System’s Information Technology Division is responsible for supporting all IT infrastructure and clinical patient care applications for UNC hospitals and affiliated entities, with over 1M active records. The division’s objective is to maximize patient care and operational efficiency by ensuring the ability to incorporate back end systems and operational process integration. Over the past 12 years the division has built an electronic medical record with an in-house developed system, WebCIS, which offers a common interface for 12,400 physicians regardless of where they practice within the UNC Health Care System. All ambulatory care data are computerized. Inpatient care noting is complete. Computerized physician order entry is universal across the hospital. Outpatient prescriptions are online, including the ability to directly transmit prescriptions to pharmacies. Patient care data are stored and have the potential to provide a rich resource for clinical and health services research. The research data warehouse has been approved for internal UNC-CH funding in FY07. Proposed Solution: This integrated enterprise data warehouse will create actionable intelligence that can impact clinical effectiveness, fiscal integrity and research outcomes across the organization via access to and use of timely and accurate data. 
The result will be an enterprise intelligence platform that can be utilized for analysis, prediction and alignment and that will consistently enhance the leadership position of the UNC Health Care System. Implementation Details: This project will be divided into 3 stages, with each stage occurring in parallel and each building towards a federated data warehouse: 1. Building a research data warehouse for all the functionality mentioned above by extracting and transferring the 15 years of clinical data contained within the UNC HCS electronic medical record. 2. Building and linking all the administrative/financial decision support databases under the umbrella of the data warehouse. 3. Building and implementing an easy-to-use query tool that will have access to the federated data dictionary to enable all researchers and clinicians to perform queries on clinical data linked to financial and research data. Subsequent phases will include the creation of the shared data repository for clinical programs from across the state of North Carolina using open source, open standards and open access. These efforts will involve partnerships with IBM, Red Hat, Allscripts and NCHICA. D-3. Improved data mining and analysis capabilities to facilitate re-use of genomic and clinical data: Establishment of a Translational Genomics Research Initiative Overview: This aim will be accomplished through the expansion of the very successful bioinformatics center at UNC’s Lineberger Comprehensive Cancer Center and the continued development of the infrastructure through which various clinical research centers located throughout the campus can be linked to each other, to clinical care databases and to other research databases. Problem Statement: There is need for greater coordination of efforts and research infrastructure for sharing data, tools and services. 
There is an urgent need for an informatics infrastructure to link clinical scientists with the scientists who generate genomic data on clinical materials and who can advise on types of genomic analyses to perform and genomic assay experimental designs that work, as well as to help with analysis and storage of genomic data. What Do We Have to Build Upon? UNC-CH has numerous groups excelling in bioinformatics research, with faculty expertise in biology, biostatistics, statistics, chem-informatics, computer science, genetics, information science, library science, pharmacogenomics and systems biology. Areas of application include all areas of high-output –omics technologies, as well as sequence analysis, traditional genetics, the synthesis of information across data types and interfacing with clinical information. UNC-CH has also longstanding expertise in relevant biomedical areas, such as mouse genetics and numerous clinical disciplines. In 2001 the Carolina Center for Genome Sciences (genomics.unc.edu) was developed in a 10-year $245M Genome Sciences Initiative at UNC-CH, with over 40 faculty members in departments across the university. At least half of the new hires work in various areas of bioinformatics. The Carolina Center for Genome Sciences unites bioinformatics and biomedical investigators and has fostered extensive collaboration. Although the bioinformatics expertise at UNC-CH is strong and maturing, areas designated/targeted for growth include statistical genetics, biostatistics, genetics, pharmacogenomics, individualized therapy, proteomics, metabolomics and imaging. Bioinformatics Consortia and Research Centers: UNC-CH is a funded participant in the large Cancer Bioinformatics Grid (CaBIG, cabig.nci.nih.gov), with special emphasis on distance-weighted discrimination tools for machine learning and cross-platform normalization of microarrays. 
The Carolina Center for Exploratory Genetic Analysis is funded by an NIH grant to explore methods for genotype-phenotype analysis and models for matching clinical and genomic datasets. The Carolina Exploratory Center for Cheminformatics Research (neccr.org) is developing quantitative tools that will design and explore chemical libraries and high-throughput screening results to better understand toxicity and efficacy of small molecules in complex biological systems. The Biomedical Research Imaging Center (bric.unc.edu) builds upon UNC-CH strengths in image analysis. The Carolina Environmental Bioinformatics Center is funded by a 5-year, $4.5M grant to create tools and methods for handling toxicogenomics and related environmental science datasets and brings together 17 faculty across UNC-CH in a truly cross-disciplinary effort. Standard Software Applications, Software Training and Funded Bioinformatics Cores: The UNC-CH Center for Bioinformatics (bioinformatics.unc.edu) supports the use of computational biology tools throughout UNC-CH. The Center for Bioinformatics serves as a resource for numerous standard bioinformatics applications, including sequence analysis and database development. UNC-CH provides funding for several bioinformatics cores that support large federal grants on campus. We list a few examples, beyond the group led by Dr. Perou in the Cancer Center (see below) and highlighted elsewhere in this CTSA application. The bioinformatics cores include the Biostatistics and Bioinformatics Core for the UNC-CH Lineberger Cancer Center’s Gastro-Intestinal Specialized Program of Research Excellence (SPORE), the Biostatistics Core of the Center for Environmental Health Susceptibility, the Microarray Analysis Core of the Neurodevelopmental Disorders Research Center and a similar core for the UNC Neurosciences Center. 
Bioinformatics Training at UNC: Numerous federally funded pre-doctoral and post-doctoral training grants and programs on campus provide training in bioinformatics. The flagship is the grant for the UNC Bioinformatics and Computational Biology PhD certificate program (bcb.unc.edu, funded by NIGMS), which unites interested students and bioinformatics advising faculty from across UNC; it will become a freestanding PhD program in the future. The Cancer Genomics training grant in Biostatistics (NCI) focuses on statistical genomics. The Environmental Sciences training grant (NIEHS) has several students working on genomics bioinformatics methods. Similarly, the UNC-CH Toxicology Curriculum is moving steadily to incorporate more toxicogenomics and bioinformatics research training. Computing facilities: Computing capabilities at UNC-CH are excellent and continuously improving. In addition to the computing facilities in individual departments, UNC-CH’s Information Technology Services maintains several major multiprocess clusters. Moreover, several UNC-CH researchers have active research collaborations with RENCI, which provides additional leadership and assistance and has recently brought online a 1024 computer-node Blue Gene L cluster with 5.6 Tflop peak performance. Lineberger Comprehensive Cancer Center’s Bioinformatics Group: One of our most successful groups has been the Lineberger Comprehensive Cancer Center’s (LCCC) Bioinformatics Group, which represents an outstanding multidisciplinary team of cancer biologists, clinical researchers, genomic specialists, bioinformaticians and biostatisticians, co-directed by LCCC faculty members Charles Perou, PhD (Departments of Genetics and Pathology), Steve Marron, PhD (Department of Statistics and Operations Research), and D. Neil Hayes, MD (Department of Oncology), who is serving as the medical director; additional statistical analysis support is also provided by Andrew Nobel, PhD, of Statistics and Operations Research. 
The Bioinformatics Group provides several major services:
o UNC Microarray Database (with 2-color arrays and a database for Affymetrix data). The UNC Microarray Database currently houses over 10,000 experiments and has 289 registered users. Microarray data analysis has resulted in the publication of 14 papers since 2003 co-authored by at least 2 members of the LCCC-BG (refs. 1-14). UNC’s work will be part of the Genome Atlas project.
o Expertise in the development, maintenance and mining of databases that contain cancer patient clinical information.
o An honest broker system to link data from the same participant across databases.
Proposed Solution: The already existing goals of the LCCC Bioinformatics Group are to provide genomic database services (maintaining a gene expression and SNP database), to provide relational patient clinical databases (for tissues, for patient treatment information and for tumor sample information), and to provide expertise in analyzing the data that are stored in these 2 types of databases (i.e., statistical analysis of genomic data and biostatistical analysis of clinical/patient data). We propose here to expand these goals in 2 ways to improve our data mining and analysis capabilities: 1) to provide a direct computational bridge between these 2 types of databases and 2) to provide the computational know-how to other translational groups of researchers and clinicians at UNC-CH to facilitate additional translational genomics studies. To enable this expansion and implementation, we will establish the Translational Genomics Research Initiative, which will be led by Dr. Charles Perou. Implementation Details: The LCCC Bioinformatics Group already maintains at least 3 different genomic and 4 different clinical databases. 
A focus of the LCCC Bioinformatics Group is on breast cancer, and thus as one of our early CTSA-TraCS projects through the Translational Genomics Research Initiative, we propose to work to link together our breast genomic data with our breast cancer patient clinical databases. We are currently building a comprehensive search and retrieval portal on top of our existing honest broker system. This will enable approved researchers to seamlessly query multiple databases, including the clinical, gene expression, SNP and breast tissue databases, which will retrieve de-identified, HIPAA-compliant search results. The “breast research portal” will provide the honest brokers with the ability to authorize and control each researcher’s ability to search databases based on IRB approvals. The portal, while integrating the databases, will also help in building common vocabularies that would help future database integration with other institutions, including the hospital systems. As part of this first effort of the Translational Genomics Research Initiative, we also propose to integrate this breast data system into the WebCIS research data warehouse (see D-2), and link it to the Bioportal that is part of the Carolina Center for Exploratory Genetic Analysis. This breast cancer focused project will be one of our first attempts at linking multiple databases and data types together. We are confident that the integration and federation of the breast research portal, WebCIS data warehouse and Bioportal will be successful given our already significant success in merging breast clinical and genomic data together to make an impact upon breast cancer biology and treatment.3 The next step for our Translational Genomics Research Initiative will be to adapt this system for other CTSA researchers; that would potentially include other cancer research groups (lung, GI, ovarian), and other disease-focus groups including mental disorders, cystic fibrosis and cardiovascular diseases. 
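The honest broker arrangement described above can be illustrated with a minimal sketch. It is not the LCCC implementation: the class name HonestBroker, the field names (mrn, name, birth_date), the database names and the sample records are all hypothetical, and real HIPAA de-identification involves considerably more than dropping a few columns. The sketch shows only the two ideas in the text: each researcher may search only the databases covered by his or her IRB approvals, and records for the same participant are linked across databases under a pseudonym so that no direct identifier leaves the broker.

```python
import hashlib
import hmac

# Hypothetical list of direct identifiers that must never be returned.
IDENTIFYING_FIELDS = {"mrn", "name", "birth_date"}


class HonestBroker:
    """Illustrative broker: checks approvals, links records, strips identifiers."""

    def __init__(self, secret_key: bytes):
        self._key = secret_key      # known only to the honest brokers
        self._databases = {}        # database name -> list of record dicts
        self._approvals = {}        # researcher -> set of approved database names

    def register_database(self, name, records):
        self._databases[name] = records

    def approve(self, researcher, database_names):
        # Record which databases an IRB approval covers for this researcher.
        self._approvals.setdefault(researcher, set()).update(database_names)

    def _pseudonym(self, mrn):
        # Keyed hash: the same participant gets the same pseudonym in every
        # database, but the pseudonym cannot be reversed without the key.
        return hmac.new(self._key, mrn.encode(), hashlib.sha256).hexdigest()[:12]

    def query(self, researcher, database_names):
        allowed = self._approvals.get(researcher, set())
        denied = set(database_names) - allowed
        if denied:
            raise PermissionError(f"no IRB approval for: {sorted(denied)}")
        linked = {}
        for db_name in database_names:
            for record in self._databases[db_name]:
                pid = self._pseudonym(record["mrn"])
                clean = {k: v for k, v in record.items()
                         if k not in IDENTIFYING_FIELDS}
                linked.setdefault(pid, {}).update(clean)
        return linked


broker = HonestBroker(secret_key=b"example-key-not-for-production")
broker.register_database("clinical", [
    {"mrn": "A001", "name": "Jane Doe", "birth_date": "1950-01-01", "stage": "II"},
])
broker.register_database("gene_expression", [
    {"mrn": "A001", "subtype": "luminal A"},
])
broker.approve("researcher_1", ["clinical", "gene_expression"])
results = broker.query("researcher_1", ["clinical", "gene_expression"])
```

In this sketch the linked result holds one entry per participant, combining the clinical and genomic fields, while a researcher who lacks approval for any requested database receives an error rather than a partial result. A keyed hash is used here purely for illustration; a production broker might instead keep an explicit, access-controlled crosswalk table.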
We can provide the microarray and SNP database to serve the needs of all UNC researchers, and using the funds requested here, our TraCS genomics initiative will work with other disease-focus groups to 1) identify the existing clinical databases and data types that researchers have, 2) create a computational link between these existing databases and our gene expression and SNP databases, and 3) assist these researchers with the data retrieval and analysis, including combined analyses of genomic and clinical data. As research groups and priorities are set (which will likely include the well established cancer, cystic fibrosis and cardiovascular groups at UNC-CH), the Translational Genomics Research Initiative within the TraCS infrastructure will work individually with each group to link the existing databases within the LCCC Bioinformatics Group with the individual databases, and then, most importantly, work individually with the clinical scientists to query the genomic data relative to the clinical data so that any potential correlates can be found. Thus, the experience in database linking and analysis of Dr. Perou and colleagues in the new Translational Genomics Research Initiative will be shared and used among TraCS investigators and trainees to the benefit of these other existing strong and collaborative clinical research programs. D-4. Clinical research data management systems that are re-usable and scalable across varied research protocols: Combining resources from the GCRC and Collaborative Studies Coordinating Center to establish the TraCS Data Management Services Overview: This initiative will draw upon existing clinical research data management resources in the current GCRC and the UNC Collaborative Studies Coordinating Center, integrating and expanding resources and tools to enable more efficient and accurate real-time data collection, processing, management and analysis for TraCS research protocols. 
Problem Statement: Development of unified best practices in data management and reasonable unified data structure specifications are key to the success of TraCS research in that both are needed to facilitate combining complex data between vastly different scientific enterprises. As an example, consider linking genotype, imaging, and clinic population data in order to analyze genetic modifiers and risk factors in the longitudinal evolution of brain changes due to Alzheimer’s disease. Such a massively complex unification of diverse data is not possible with the current data management resources available to UNC-CH investigators. The proposed Biomedical Informatics Core will provide the infrastructure within which these linkages can take place. A key component of this infrastructure will be clinical data management for research studies. What Do We Have to Build Upon? Currently, the Informatics Core of the GCRC provides data management support to investigators, administrative support to GCRC staff and server/workstation management for the entire GCRC. Data management support, ranging from initial consultation to final database export for analysis, is tailored to the unique needs of each research protocol. While the staff is extremely productive, efficient and able to handle the workload of the existing GCRC, most applications are developed on a per protocol basis and are not easily re-usable. The UNC Collaborative Studies Coordinating Center, established in 1971 and continuously funded by NIH for the past 35 years, has set the standards for study coordination in general and for data management in particular in large, multi-center studies. The center has been a pioneer in clinical data management, implementing remote data entry on a national project in 1987 (the first NIH coordinating center to do so), followed by a web-based data management system in 2001. 
Features of center-developed data management systems include interactive data entry with real-time field validation, audit logs to record database modifications, integrity checks for the database, security (in logins, permissions based on need and encryption), automated data queries, reporting, specimen tracking, re-key verification, forms inventory, data imports and exports (for analysis) and options for local or server-based software. The center is currently on its 4th generation system, based on Visual Basic generated HTML screens and a Microsoft SQL Server database. All systems satisfy FDA guidelines for electronic records and signatures (21 CFR Part 11). However, as an NIH-funded Coordinating Center, the center has had more experience working with medical centers outside UNC-CH than within. The extensive data management, project coordination, and statistical consulting services available at the Coordinating Center have not, for the most part, been utilized by UNC School of Medicine research projects. Two examples serve to illustrate the need for a unified, accessible approach to research data management in the TraCS. The first example is a proposed study of a dietary intervention to reduce the risk of pre-term births. This clinical trial was conceived as a potential R01 application from UNC’s Department of Obstetrics and Gynecology (John Thorp, MD, as PI). For the study to be successful, many more pregnant women will need to be enrolled than are available through UNC clinics alone. Providing full coordinating center support of a multisite study is beyond the scope of the current GCRC Informatics Core services, and funding for a stand-alone coordinating center is beyond the financial limits of a standard R01 application; therefore, this project would be viewed as too big for the GCRC informatics core and too small for the Coordinating Center. This study needs access to web-based data entry and tracking systems at a cost well within the R01 cap. 
The recently awarded SCCOR project (UNC Cystic Fibrosis Center; Ric Boucher, MD, as PI) provides a second example. In addition to basic science studies, the project includes 3 clinical studies, each consisting of an observational component and a randomized clinical trial. The clinical studies share common objectives and methodologies across a spectrum of research subjects (healthy smokers, COPD patients, CF patients), and the potential to pool data across studies is a key aspect of the design. Further, sputum samples and other specimens will be moved from lab to lab as part of this project, and the ability to enter data at each station via a web-based system will facilitate the processing and tracking of study data and results. While this is not a multicenter study, clearly the tools developed for same would greatly enhance the data infrastructure of this study. Proposed Solution: We intend to build upon the strengths of the 2 existing core facilities, the GCRC Informatics Core and the Coordinating Center, to provide superb data management capabilities to the TraCS Institute investigators. The proposed strategy for clinical data management will provide research databases of extremely high quality that are locked and ready for statistical analysis soon after last subject visits are completed. Research databases will be automatically linked to other components of the TraCS Collaboratory through the proposed biomedical informatics infrastructure and readily accessible to clinical investigators at UNC as well as at other CTSA institutions for joint research efforts, provided access is granted. Implementation Details: A Data Management Service will be established as part of the Biomedical Informatics Core. This service will be located within the TraCS offices and will incorporate the existing GCRC Informatics Core staff and facilities. 
The new Data Management Service will represent an increase in staffing levels of the current GCRC informatics core to reflect both the greater number of studies anticipated and an expansion of services. In addition, a core group of database and web programmers and network staff from the UNC Collaborative Studies Coordinating Center will be assigned to the TraCS Data Management Service and located within TraCS Institute offices to facilitate implementation of the Coordinating Center’s web-based systems and tools for use in TraCS protocols. The Coordinating Center’s web-based systems proposed for the TraCS Data Management Service are table-driven in that the core code is independent of the study design. Moreover, study-specific information, including electronic case report form layouts and edit specifications, is stored in database tables. The system is self-documenting, with variable names in each table corresponding to question numbers on case report forms or data entry screens. Query reporting and resolution are incorporated into the data management system, and SAS reports of study progress and database status can be generated automatically. This particular design lends itself extremely well to a quick study start-up period, once protocols and data collection instruments are finalized. The merging of these 2 programming and network support groups into the proposed Data Management Service will enable TraCS protocols to take advantage of web-based data management, randomization and tracking systems that are integrated to facilitate reporting of study progress and availability of data for analysis (interim and final). Data management consulting services will be provided to projects and programs during the design phase to advise on data collection modalities for a particular research protocol as well as case report form design. 
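The table-driven design described above can be sketched briefly. This is an illustration only, not the Coordinating Center’s system (which the text describes as Visual Basic generated HTML over Microsoft SQL Server): the list EDIT_SPECS stands in for a database table of edit specifications, and the form name DEMO, the variables Q1 to Q3 and their ranges are hypothetical. The point is that the validate function contains no study-specific logic; starting a new study means adding rows, not code.

```python
# Stand-in for a database table of edit specifications. Variable names follow
# the question numbers on a (hypothetical) case report form called DEMO.
EDIT_SPECS = [
    {"form": "DEMO", "variable": "Q1", "type": "int",
     "min": 18, "max": 110, "required": True},      # age in years
    {"form": "DEMO", "variable": "Q2", "type": "choice",
     "choices": ["F", "M"], "required": True},      # sex
    {"form": "DEMO", "variable": "Q3", "type": "float",
     "min": 30.0, "max": 300.0, "required": False}, # weight in kg
]


def validate(form, entry, specs=EDIT_SPECS):
    """Return a list of data queries for one entered form (empty if clean)."""
    queries = []
    for spec in specs:
        if spec["form"] != form:
            continue
        var = spec["variable"]
        raw = entry.get(var)
        if raw is None or raw == "":
            if spec["required"]:
                queries.append(f"{var}: value is required")
            continue
        if spec["type"] == "choice":
            if raw not in spec["choices"]:
                queries.append(f"{var}: {raw!r} is not one of {spec['choices']}")
            continue
        caster = int if spec["type"] == "int" else float
        try:
            value = caster(raw)
        except (TypeError, ValueError):
            queries.append(f"{var}: {raw!r} is not a valid {spec['type']}")
            continue
        if not spec["min"] <= value <= spec["max"]:
            queries.append(
                f"{var}: {value} outside range {spec['min']}-{spec['max']}")
    return queries


clean_queries = validate("DEMO", {"Q1": "45", "Q2": "F"})
problem_queries = validate("DEMO", {"Q1": "12", "Q2": "X", "Q3": "abc"})
```

Because the same specification rows could also drive screen layout and automated query reports, a design of this kind is one plausible reason the text can describe the system as self-documenting and quick to start up.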
The goal is to use a common set of case report forms across TraCS protocols as often as possible and to standardize data definitions and edit specifications in order to facilitate pooling of data across studies and the eventual sharing of data with other CTSA centers. This core will also promote and encourage adoption across the CTSA network of common Case Report Forms to be determined within this network. D-5. Flexible data collection procedures and tools that will support patient quality-of-care analyses as well as facilitate patient recruitment for future research studies Overview: This initiative will provide high quality clinical databases populated with information collected from consenting, treatment-seeking patients in the UNC Health Care system and throughout the state of North Carolina to facilitate analyses of patient quality of care as well as greatly shorten the length of time required to screen and enroll subjects in future research protocols. Problem Statement: Information collected via paper forms at the time a patient enters a UNC Health Care clinic is currently not utilized to facilitate either assessments of quality of care or future subject recruitment in research protocols. What Do We Have to Build Upon? The web-based data management systems at the Collaborative Studies Coordinating Center, in conjunction with the administrative computing support from the GCRC Informatics Core, both described in Section D-4 above, are well-positioned to support this initiative. The Coordinating Center’s web-based data management and tracking systems are designed to easily accommodate new data collection modalities required for this initiative, and the experience of the GCRC Informatics Core in supporting clinic operations provides the expertise needed to plan for extensive data collection and web-based subject recruitment throughout the UNC Health Care system and into community areas. 
Proposed Solution: 1) We will develop, pilot test and implement data collection procedures that are suitable for use at points of care in UNC’s Health Care system in order to populate a clinical database of sufficient quality to support patient quality of care analyses. 2) We will merge relevant background and medical history data from WebCIS to this clinical database, and the resulting database will serve as a resource for subject recruitment for future TraCS-based studies and clinical trials. 3) Web-based subject recruitment systems will be developed for implementation in health care communities outside of UNC. Implementation Details (Carpe Diem pilot study): The Coordinating Center is currently launching a pilot study in conjunction with the oncology clinics at UNC School of Medicine that incorporates flexible data collection tools into the web-based data management system, namely, smart pens and tablet PCs. In this pilot, patients visiting one of the oncology clinics are approached and asked to participate and give informed consent for minimal data collection at the time of their visit. The study is therefore pilot testing not only the use of flexible data collection methodologies, but also the ability to consider treatment-seeking patients as potential research subjects for future studies. Data are collected using digital tablets or digital pens; both devices are minimally invasive and therefore ideal for use in a clinic setting. The clinical database is then populated with data collected at point of entry through the digital devices. These newly collected data will be coded and processed for use in assessments of patient quality of care. Data standards consistent with regulatory guidelines for clinical trials will be applied, thereby facilitating the use of these point-of-care data in the event that the patients eventually join a research protocol as research subjects. 
In addition to the data collected in the clinic via digital devices, other background medical information will be downloaded from electronic medical records (WebCIS). This is currently being accomplished through a live HL7 feed for the Carpe Diem pilot study. With the completion of the research data warehousing project for WebCIS data, this information transfer will be greatly improved. The Carpe Diem pilot study will serve as the framework for a large-scale data collection initiative of the TraCS Institute Biomedical Informatics Core. Kiosks at clinics throughout the health care system (and eventually throughout the state) will be established to educate patients about possible participation in future clinical trials. Subjects will be recruited to enter a registry of subjects willing to have their clinical data screened for eligibility in clinical studies as protocols are developed. Basic demographic and phenotyping data will be collected using the flexible tools described above and stored in the registry. The registry database will be processed to achieve the quality required for a regulatory submission. Therefore, the registry will not only provide easy and informed identification of potential subjects for future trials, but background data will already be collected, thereby enabling a very quick study start-up phase. Success rates based on eventual enrollment into specific clinical trials of subjects who were initially contacted at the kiosks will be closely monitored. Several other small-scale initiatives are currently underway that incorporate hospital clinical records from WebCIS into a research protocol involving primary data collection. One such pilot is using downloaded WebCIS data to assist in randomized clinical trials of disease management for patients with type 2 diabetes in the general internal medicine practice. 
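The registry screening step described above can be sketched as follows. Everything here is an assumption made for illustration: the RegistryEntry and Protocol structures, their field names and the sample eligibility criteria are hypothetical, and an actual protocol would have far richer inclusion and exclusion rules. The sketch shows only that subjects are screened solely if they agreed to screening, and that stored demographic and phenotyping data are compared against a protocol’s criteria.

```python
from dataclasses import dataclass, field


@dataclass
class RegistryEntry:
    subject_id: str
    consented_to_screening: bool
    age: int
    diagnoses: set = field(default_factory=set)


@dataclass
class Protocol:
    name: str
    min_age: int
    max_age: int
    required_diagnoses: set
    excluded_diagnoses: set = field(default_factory=set)


def screen(registry, protocol):
    """Return the IDs of consenting registry subjects who meet the criteria."""
    eligible = []
    for entry in registry:
        if not entry.consented_to_screening:
            continue  # never screen subjects who did not agree to it
        if not protocol.min_age <= entry.age <= protocol.max_age:
            continue
        if not protocol.required_diagnoses <= entry.diagnoses:
            continue  # at least one required diagnosis is missing
        if protocol.excluded_diagnoses & entry.diagnoses:
            continue  # at least one exclusion applies
        eligible.append(entry.subject_id)
    return eligible


registry = [
    RegistryEntry("S1", True, 54, {"type 2 diabetes"}),
    RegistryEntry("S2", True, 71, {"type 2 diabetes", "renal failure"}),
    RegistryEntry("S3", False, 60, {"type 2 diabetes"}),
    RegistryEntry("S4", True, 17, {"type 2 diabetes"}),
]
trial = Protocol("diabetes management trial", 18, 80,
                 required_diagnoses={"type 2 diabetes"},
                 excluded_diagnoses={"renal failure"})
candidates = screen(registry, trial)
```

Comparing the candidate list with eventual enrollment would give the kiosk success rate that the text proposes to monitor.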
PCIR Core Support: Another important initiative of the TraCS Institute Data Management Service will be to support operations at the PCIR Core. Following are the activities that we envision under this initiative:
o E-Protocol web-based system to track and manage study documents
o Web-based recruitment tools for use throughout UNC Health Care and in NC communities
o Web-based adverse event/serious adverse event reporting and tracking system, linked to clinical study databases
o Lab downloads from the WebCIS warehouse. These are currently supported on a protocol-by-protocol basis. The Carpe Diem pilot study is testing a live HL7 feed from WebCIS into the clinical database at the Coordinating Center. Upon completion of the research warehouse initiative for WebCIS, clinical data retrieval, such as downloads, will be easily implemented for all TraCS protocols.
o Data transfers from external central labs/centers with quality control procedures implemented (e.g., 5% blinded replicate analysis). These procedures are in place at the Coordinating Center and will be available for all TraCS protocols.
o Collection of basic phenotyping and demographic data on all PCIR research subjects using standard case report forms and existing web-based systems.
Integration of both staff and software tools from the GCRC informatics core and the Collaborative Studies Coordinating Center and linkage of the resulting TraCS Data Management Service to the TraCS Institute will facilitate learning opportunities for clinical investigational scholars who would benefit from immersion in this living laboratory of clinical research data management. The TraCS Institute’s goal is to train the next generation of superb and knowledgeable translational and clinical investigators, who will be the drivers of new research endeavors. 
However, success at reaching the finish line, defined as delivery of discovery to accepted application to the public for improvement of health, cannot occur without a well-engineered, well-running, state-of-the-art machine: clinical data management. Data drive the results that determine investigational product safety and efficacy. Clinical investigators analyzing trial data and using those results to shape clinical practice would benefit from a better understanding of how those data are generated: from design of case report forms, to database construction, to data collection, editing, validation, summarization and analysis. D-6. Expanded Community Engagement Overview: This initiative will deliver for re-use throughout North Carolina the resources and services of NCCBI, developed and tested at UNC-CH. These will include flexible and easy-to-use tools for electronic data capture, research and community engagement portals, and access to the shared repository of clinical and other biomedical data, protocols, processes, tools and knowledge resources. The social engagement processes that encourage participation in NCCBI will extend to include healthcare providers, professionals, consumers and payers. Problem Statement: UNC-CH researchers have successfully performed clinical research in communities across the state, but have had to work through informatics issues de novo with each project. Many studies have involved laborious review of paper charts to create electronic databases that are useful as a one-time tool but are too cumbersome for ongoing use. While these efforts have been effective for individual studies, they do not have the capability to efficiently accrue the cumulative data necessary to monitor dissemination efforts and concomitant long term health outcomes. What Do We Have to Build Upon? 
The North Carolina Area Health Education Centers (AHEC) and existing investments in community-based research infrastructure are described in the Community Engagement Core section. Past community-related biomedical informatics projects in this area have been successful in their own right. These include syndromic surveillance of primary care practices, the accumulation of data from emergency departments with disparate record systems, electronic surveys administered via laptops with uploading of appropriately de-identified data to a central website, the Carpe Diem project piloting flexible data collection tools, to name a few. The TraCS Biomedical Informatics Core along with the Community Engagement Core through the NCCBI and related activities will add the social, organizational and technical infrastructure needed to facilitate convenient and cost-effective re-use of data, knowledge and processes. Proposed Solution: We will engage community stakeholders in continuous discussion of needs, priorities, capabilities and concerns, resulting in a living community engagement agenda. Based on that agenda, we will develop, pilot, test and implement convenient and cost-effective solutions that leverage resource investments and optimize sharing and re-use. Implementation Details: To identify needs and priorities for community engagement informatics support, we will conduct periodic workshops with representatives from the Community Engagement Core and the distributed communities. We will visit some of these communities to fully understand their problems and concerns and to assist in designing workable solutions. A Community Engagement Portal will be developed to disseminate current activities, capabilities, needs, interests, and the like, and will link to the Research Portal. The TraCS Community Advisory Board will be able to advise and facilitate these interactions as well. 
Alternative technologies will be demonstrated, tested, piloted and evaluated for field data capture (scanning, intelligent pens, tablets, speech recognition, for example). Ongoing data accumulation, creation and maintenance of registries, and uploading of information to the relevant data warehouse will occur, as appropriate. The shared data repository will have the capability of combining clinical information with biologic and genetic information. We also envision eventual incorporation of real time clinical reference materials and decision aids as part of our informatics service to participating community practices, providing an educational service and serving as incentives for continued participation in the TraCS informatics program. We will develop the AHEC-based Regional Translation Research Units (RTRU) into regional health information sites that can merge data and contribute to the NCCBI shared data repository at the TraCS Institute. Given the variability of electronic record capabilities in the practices that will make up our clinical network, our informatics solutions will be multi-faceted. We will leverage our partnerships with open source software companies to develop and distribute these solutions. These multifaceted communication approaches will be delivered through the range of user and outreach services – FAQs, websites, wikis, educational programs and so on. We envision strong representation by the county health departments across North Carolina in each RTRU. The application of information technology resources to the collection, aggregation and analysis of health data inherent in the NCCBI can be leveraged in the public health domain. The data residing in current and emerging clinical information systems can be adapted and accessed for use by public health practitioners to address a variety of issues. Such access can support an ongoing assessment of data quality in systems designed to support clinical care. 
Accessing clinical data systems can potentially provide a rich data source for syndromic surveillance for disease outbreaks. Such an early warning system can highlight issues during those critical hours before they spread beyond a limited area. These advances will require development of new technologies to translate data from a variety of clinical systems into a normalized format and content. D-7. Coordination of research infrastructure designed for re-use of data, processes, tools and services. Overview: At the heart of the Collaboratory is the research cyberinfrastructure, which will extend beyond today’s existing systems to include a shared data repository, reusable processes, standards and best practices, applications and tools, interfaces and services, leveraging North Carolina’s leadership in open source software. The extended cyberinfrastructure will be modeled after the NCBI, contributing, in particular, the protocols and best practices, tools and services for a “Clinical Entrez.” The cyberinfrastructure will leverage the RENCI North Carolina Bioportal project, which brings open source bioinformatics applications and data together with high performance distributed computing resources (www.renci.org/projects/bio.php). Problem Statement: Current biomedical research infrastructure is fragmented; tools for searching and analyzing heterogeneous databases are developed for one study at a time; and the processes and applications are not shared. There is a critical need to support the proposed culture of collaboration with the capabilities to support routine and convenient re-use of clinical and other research data, processes, and tools. What Do We Have to Build Upon? The underlying IT requirements for integrating access to heterogeneous databases and infrastructure are currently addressed in the RENCI Bioportal, a project focused on the diverse needs of biology and biomedical research communities. 
This service layer is a programmatic interface currently accessible to users via an interactive web portal, workflow automation tools such as Taverna, and potentially other client software.

Proposed Solution: We will implement a comprehensive research cyberinfrastructure (http://sils.unc.edu/griffiths/nccbi_model). Critical elements to achieve this vision are:
1) the commitment to improving the capture, quality, comprehensiveness and curation of biomedical data and metadata;
2) implementation of a statewide biomedical data/metadata repository with linkage and contribution to national level efforts;
3) development, adoption and promulgation of standards of system interoperability and data portability;
4) development and evaluation of improved applications and tools for discovery, analysis, presentation and decision support;
5) design, implementation and evaluation of educational programs aimed at a wide variety of audiences and offered through a range of modalities;
6) an improved understanding of the science and practice of collaboration; and
7) multiple levels of data security and (when necessary) data encryption in order to conform with IRB and HIPAA regulations as well as satisfy appropriate concerns of research partners such as payers.

Implementation Details: The proposed NCCBI cyberinfrastructure model includes 6 layers. The first 3 layers largely comprise today’s biomedical computing environment:
1) Base Technologies include the general purpose computation, storage and communications technologies;
2) Enabling Technologies include networking, operating systems, middleware and security technologies;
3) Existing Systems include the biomedical information systems currently in place at UNC-CH and across the state. These include a wide variety of independent clinical and research information systems, with minimal linkage among them.
The proposed expanded and enhanced cyberinfrastructure will contribute 3 new and distinct layers:
4) Shared Data Repository, which will include data of all types and associated metadata;
5) Repository Tools and Services, including processes and best practices for a) data capture, b) data quality assurance, c) long-term data curation, d) development and evaluation of metadata schema, taxonomies, other forms of knowledge representation, e) access management and adoption and f) development of relevant standards in each of these areas; and
6) User and Outreach Services, which will also be developed and evaluated, to deliver information and knowledge to specific populations.

We propose to extend the Bioportal architecture to support clinical and translational biomedical science. This work is concentrated primarily in 2 areas: 1) federated integration of clinical databases alongside existing Bioportal data sources and 2) development of new translational services to enable efficient and effective capture and sharing of multidisciplinary data across the Collaboratory, including security and user rights management, data curation, metadata management and improved monitoring.

The proposed architecture for NCCBI involves extending the Bioportal to provide access to clinical databases and applications. These include 1) the WebCIS research data warehouse, 2) data from clinical trials, 3) data from population demographic studies, 4) data from the UNC-CH schools and departments, 5) other data from UNC-CH and affiliates across North Carolina, 6) literature search services for aggregated searching of research publications, and 7) decision support tools. In addition to data, the architecture requires the development of translational services not currently supported by the Bioportal.
They include:
1) Collaboratory services for capturing and sharing of data across disciplines;
2) user access (rights and permissions) management and improved multi-level security;
3) data curation services to provide users with support for contribution, provenance, sharing and ensuring long-term access to data;
4) metadata management services to help unify and enhance interoperability of diverse data; and
5) improved monitoring of infrastructure and user activity.

Collaboratory services will enable the capturing of results from one or more services, grouping and annotating the captured data and sharing them with other users and communities as permitted by access restrictions on the derivative data. For example, a selection of experimental data, clinical data and research literature can be annotated, combined into a virtual publication and shared across the Collaboratory.

A Cyberinfrastructure Group will be drawn from existing UNC-CH resources, consolidated and expanded to ensure dedicated focus and effort on building the proposed shared resource. IT personnel from the existing GCRC and Coordinating Center will be combined and expanded with new hires to create a co-located IT group of 13-14 FTEs total. Seven of these individuals, 2 contributed by RENCI and 5 new hires, will be assigned to cyberinfrastructure development. The cyberinfrastructure group will work to incorporate or develop new open source code in modular form, in close collaboration with Red Hat, IBM and SAS.

A key transformative component of this proposal is the establishment of a group comprising Computer Science and School of Information and Library Science (SILS) faculty and students to use the research portal to identify translational and clinical biomedical problems that are in need of CS or SILS applied solutions. UNC has a strong history of collaboration among SILS, Computer Science and the Schools of Medicine and Public Health.
Examples of such collaborations include:
• Computer-Integrated Systems for Microscopy and Manipulation (“CISMM”), originally focused on the molecular structure solution and simulation of static and dynamic interactions among molecules, to understand cellular mitosis.
• Medical Image Display and Analysis Group (“MIDAG”) led by Stephen M. Pizer, PhD, Kenan Professor, Departments of Computer Science, Radiology, Radiation Oncology, and Biomedical Engineering. This is a collaborative group of about 110 professionals from the departments of Computer Science, Radiology, Radiation Oncology, Surgery, Psychiatry, Urology, Statistics, Mathematics, Biostatistics and Biomedical Engineering, including ~25 graduate students.
• Chief Complaint System, research related to the language/sublanguage used by the patient and triage nurse to document the reason for the Emergency Department visit.
• Information Extraction from WebCIS Notes as part of the DEcIDE project, whose goal is to explore relationships between over-medication and diabetes outcomes.
• Multi-User Extraction and Information Synthesis (METIS) enables one to detect hidden connections from literature. The system, designed by SILS faculty member Cathy Blake, combines a knowledge-based approach with shallow language processing to identify pre-defined facts from each full-text scientific article.
• Personalized Information Synthesis for Breast Cancer Patients seeks to provide a breast cancer patient with a personalized perspective on the latest scientific literature regarding her medical condition. It uses electronically available patient information from WebCIS and full-text scientific literature.
• Personal Health Record (PHR) Usability study to determine the effectiveness of alternative organizational structures for PHRs that are personalized to specific kinds of health conditions. These are 2 important design features for user interfaces, and results will inform PHR design in this early stage of development.
This research could yield important information for the CTSA network.

To date, these collaborations have been driven by the interests and relationships among faculty. The TraCS Institute will fund 4 supervised graduate students per year and create opportunities to work with interdisciplinary teams on projects that further biomedical research or patient care. The portal could also be extended to offer field experiences and internships for Computer Science and SILS students as well as other students engaged in biomedical and health informatics programs at UNC-CH and other institutions.

E. Measurement of Progress and Evaluation

The focus of the proposed biomedical informatics activities is on sharing and re-use of research knowledge, data and protocols. The following measures of progress will be monitored on an ongoing basis and reported quarterly:
1) number of participants (by discipline, affiliation, role and level/frequency of activity);
2) number of collaborations (by type, size, scope, modality and longevity);
3) amount of re-use of data, protocols, tools, services, and other research knowledge (such as best practices, outcomes, literature);
4) number of new resources developed (by type); and
5) use and re-use of newly developed resources.

Evaluation of NCCBI products and services will be conducted at regular intervals, as appropriate. For example, the functionality and usability of each new resource, tool and service will be evaluated as it is made available in prototype or pilot form. An annual evaluation of the NCCBI as a whole will be conducted to include a participant survey to determine satisfaction with products, services and activities; perceived importance and contribution of the products, services and activities to research, education and practice; and organizational effectiveness. Complete detail for Tracking/Evaluation and Implementation and Milestones for this section is found on page 1096.
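
As a minimal sketch of how the quarterly measures of progress listed above might be tallied, the following Python example assumes a hypothetical activity log. The `ActivityEvent` record, its field names, the `create` and `reuse` action labels and the sample entries are illustrative assumptions only; they are not part of the NCCBI design, and only three of the five measures are shown.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ActivityEvent:
    """One logged action in the Collaboratory (hypothetical record layout)."""
    day: date
    participant: str
    discipline: str
    resource: str   # data set, protocol, tool or service that was touched
    action: str     # "create" for a new resource, "reuse" for an existing one


def quarter_of(day: date) -> tuple[int, int]:
    """Return (year, quarter) for a calendar date, quarters numbered 1-4."""
    return day.year, (day.month - 1) // 3 + 1


def quarterly_measures(events, year, quarter):
    """Tally illustrative progress measures for one calendar quarter."""
    in_quarter = [e for e in events if quarter_of(e.day) == (year, quarter)]
    # Count each participant once per discipline (measure 1).
    participants = {(e.participant, e.discipline) for e in in_quarter}
    by_discipline = Counter(discipline for _, discipline in participants)
    return {
        "participants_by_discipline": dict(by_discipline),
        # Number of re-use events on existing resources (measure 3).
        "reuse_events": sum(1 for e in in_quarter if e.action == "reuse"),
        # Number of distinct resources created in the quarter (measure 4).
        "new_resources": len(
            {e.resource for e in in_quarter if e.action == "create"}
        ),
    }


# Made-up sample log, for illustration only.
events = [
    ActivityEvent(date(2008, 1, 15), "alice", "Medicine", "registry-a", "create"),
    ActivityEvent(date(2008, 2, 3), "bob", "SILS", "registry-a", "reuse"),
    ActivityEvent(date(2008, 2, 20), "bob", "SILS", "registry-a", "reuse"),
    ActivityEvent(date(2008, 4, 1), "carol", "Medicine", "tool-b", "create"),
]
report = quarterly_measures(events, 2008, 1)
print(report)
```

Keeping the raw events and deriving the counts on demand would let the same log be broken down by affiliation, role or resource type without changing what is recorded.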