October 2, 2007
Biomedical and Health Informatics Lecture Series
Peter Tarczy-Hornoch MD
Head and Professor, Division of Biomedical and Health Informatics
University of Washington

Biomedical and Health Informatics Lecture Series
  Focus: current topics and developments in informatics
  Presenters: faculty, students, researchers and developers from UW, other academic institutions, government, and industry (locally and nationally)
  Intended audience:
    Broader UW & Seattle community interested in BHI
    BHI faculty and students
  History:
    Early 1990s: initiated as part of IAIMS (MEDED 590)
    2003-2006: temporarily changed to closed journal club format
    Fall 2006: return to public lecture series format
    Fall 2007: 10th year of Division of Biomedical & Health Informatics

MEBI 590 & BHI Lecture Series
  Biomedical and Health Informatics (BHI) Lecture series available for credit as MEBI 590
  Details & upcoming lectures available at:
    http://courses.washington.edu/mebi590/
    pth@u.washington.edu
  Key points for those taking for credit
    Need to sign in at each lecture to get credit
    CR/NC course
    Must attend 9 of 10 lectures for credit

Informatics and the New Northwest Institute of Translational Health Sciences
Peter Tarczy-Hornoch MD
Director, Biomedical Informatics Core, Northwest Institute of Translational Health Sciences
Head and Professor, Division of Biomedical and Health Informatics
Professor, Division of Neonatology
bhi.washington.edu

Outline
  Clinical Translational Science Awards
  Northwest Institute of Translational Health Sciences
  Biomedical Informatics Core of NW ITHS
  Data Integration
  Summary

NIH Roadmap - Process
  Initiated in 2002 by NIH Director (Zerhouni)
  http://nihroadmap.nih.gov/
  Chart a roadmap for medical research in the 21st century
  NIH Leadership
    What are today’s scientific challenges?
    What are the roadblocks to progress?
    What do we need to do to overcome roadblocks?
    What can’t be accomplished by any single Institute – but is the responsibility of NIH as a whole
  Working Groups
  Implementation Groups
  Implementation Groups => RFAs
  Summer/Fall 2006: New initiatives (Roadmap 1.5)

NIH Roadmap – Themes
  New Pathways to Discovery
    Building Blocks, Biological Pathways, and Networks
    Molecular Libraries & Molecular Imaging
    Structural Biology
    Bioinformatics and Computational Biology (BISTI/NCBC)
    Nanomedicine
  Research Teams of the Future
    High-Risk Research
    Interdisciplinary Research
    Public-Private Partnerships
  Re-engineering the Clinical Research Enterprise
    Clinical Research Networks/NECTAR
    Clinical Research Policy Analysis and Coordination
    Clinical Research Workforce Training
    Dynamic Assessment of Patient-Reported Chronic Disease Outcomes
    Translational Research (Clinical Translational Science Awards)

NIH Roadmap Clinical Translational Science Awards
  Initial request for applications October 2005
  Current RFA: RFA-RM-07-007
  CTSA planning grants (one year), implementation grants (five years)
  “The purpose of this initiative is to assist institutions to create a uniquely transformative, novel, and integrative academic home for Clinical and Translational Science that has the resources to train and advance a cadre of well-trained multi- and inter-disciplinary investigators and research teams with access to innovative research tools and information technologies to promote the application of new knowledge and techniques to patient care.”

Definition of Translational Research
  “Translational research transforms scientific discoveries arising from laboratory, clinical or population studies into clinical or population-based applications to improve health by reducing disease incidence, morbidity and mortality”
  Modified from the NCI translational research working group (2006)
  UW: human subjects, specimens or plans

CTSA: From Bench to Bedside to Community

NIH Roadmap Clinical Translational Science Awards
  Integrate existing Clinical Research Centers (CRCs) with
  existing clinical/translational science training grants (K12, K30, T32) and expand capabilities through new cores (e.g. Biomedical Informatics, Evaluation, Novel Technologies, etc.)
  Establish regional and national consortia with the aim of transforming how clinical and translational research is conducted, and ultimately enabling researchers to provide new treatments more efficiently and quickly to patients
  When fully implemented in 2012, the initiative is expected to provide a total of about $500 million annually to 60 academic health centers in the US

National CTSA Awards 2006 & 2007

CTSA Full Center Awards
  2006
    Columbia University Health Sciences
    Duke University
    Mayo Clinic College of Medicine
    Oregon Health & Science University
    Rockefeller University
    University of California, Davis
    University of California, San Francisco
    University of Pennsylvania
    University of Pittsburgh
    University of Rochester
    University of Texas Health Science Center at Houston
    Yale University
  2007
    Case Western Reserve University
    Emory University
    Johns Hopkins
    University of Chicago
    University of Iowa
    University of Michigan
    University of Texas Southwestern Medical Center
    University of Washington
    University of Wisconsin
    Vanderbilt University
    Washington University
    Weill Cornell Medical College

Outline
  Clinical Translational Science Awards
  Northwest Institute of Translational Health Sciences
  Biomedical Informatics Core of NW ITHS
  Data Integration
  Summary

Institute of Translational Health Sciences
  Northwest ITHS is the name for the regional inter-disciplinary consortium funded through the NIH-NCRR Clinical Translational Science Award (CTSA)
    Planning grant: 2006-7
    Full Center grant: 2007-12 funded $62M
  NW ITHS will provide an “academic home” and integrated resources to:
    Advance clinical and translational science;
    Create and nurture a cadre of well-trained clinical investigators;
    Speed translation of discoveries into clinical practice
    Foster interactions between the university, non-profit, and
    business research communities
    Create an incubator for novel ideas and collaborations that cross disciplines

Institute of Translational Health Sciences

NW ITHS – “Collaboratory” Model

NW ITHS - Partners
  Founding Members of the NW ITHS and Key Collaborators
    University of Washington
    Children’s Hospital and Regional Medical Center
    Fred Hutchinson Cancer Research Center
    Group Health Cooperative Center for Health Studies
    Benaroya Research Institute
    PATH
    Six proposed American Indian and Alaska Native Network Sites
  6 Health Sciences Schools, 12 sites, 67 key scientific personnel, more than 150 centers
  Drs. Nora Disis (UW), Bonnie Ramsey (CHRMC), Mac Cheever (FHCRC/SCCA) co-leaders

Institute of Translational Health Sciences

Eleven ITHS Cores
  Administrative
  Novel clinical and translational methodologies
  Pilot and collaborative translational and clinical studies
  Biomedical informatics
  Study design and biostatistics
  Regulatory knowledge, support and research ethics
  Participant clinical interactions resources (CRC+)
  Community engagement
  Translational technologies and resources
  Research education, training and career development
  Tracking and evaluation

Institute of Translational Health Sciences

Outline
  Clinical Translational Science Awards
  Northwest Institute of Translational Health Sciences
  Biomedical Informatics Core of NW ITHS
  Data Integration
  Summary

CTSA RFA & Biomedical Informatics
  Biomedical Informatics is the cornerstone of communication within CTSAs and with all collaborating organizations
  Applicants should describe:
    support provided for operations, administration, research and clinical/translational research activities
    plan to establish communication with external organizations relevant to their mission
    the process by which standards and other mechanisms will be developed and used to maximize interoperability between internal systems and systems in outside organizations
    assessment of informatics performance across the CTSA programs and with external partners
    inter-
    and intra-organizational sharing of data, technology and best practices
  Biomedical Informatics is expected to be the subject of an overall NIH CTSA Informatics Steering Committee that ensures interoperability between the CTSA institutions and with their external partners.

Biomedical Informatics Core Team
  Peter Tarczy-Hornoch MD, Core Director
  Jim Brinkley MD PhD, Core Co-Director
  Nick Anderson PhD, Core Deputy Director
  Bill Lober MD
  Jim LoGerfo MD MPH
  Dan Suciu PhD
  Dan Ach (GCRC Informatics Lead)
  To be hired: ~14 professional staff and 3 RA slots

ITHS Biomedical Informatics Core
  [Diagram labels: Aim 1, Aim 2, Aim 3, Aim 4]
  Aim 5: Develop & maintain ITHS administrative databases & Web interfaces

Aim 1: Provide access to electronic health data at ITHS institutions
  Inventory and model recurring common queries
  Develop new interfaces to electronic health data from partner institutions
  Provide ITHS researchers access to electronic health data from partner institutions via a new common web interface
  Pilot a Virtual Data Warehouse (VDW) across the ITHS partner institutes building on the common web interface
  Extend the pilot VDW to include clinics in the WWAMI region

Access to electronic health record data
  Existing resources: MIND Access Project (UW), Cerner Research Query System (CHRMC), Clinical Data Repository (FHCRC), Research-O-Matic (CHS)
  Gaps: no convenient access, repository data limited
  Goals:
    Simplify appropriate access to existing data
    Extend appropriate access to existing data
    Extend sources of electronic health record data
  Note: research still needed to solve Aim 1-4 gaps

Aim 2: Support access to study data management tools for translational research
  Provide consultation to ITHS researchers regarding choosing and implementing study management tools
  Continue to develop and enhance existing ITHS data management tools
  Maintain and augment an inventory of data management tools
  Develop interfaces to most commonly used data management tools
  Perform a feasibility study of the establishment
  of a Data coordinating center

Access to study data management tools
  Existing resources: GCRC Study Data Management (UW/CHRMC), Seedpod/Celo (UW), CF TDN (CHRMC), Clinical Informatics Shared Resource (FHCRC), multiple tools elsewhere
  Gaps: ease of use, limited features, not integrated
  Goals:
    Move local systems from prototype to production
    Develop centralized resources for currently used case report forms/study data management tools
    Extend centralized repository to include other CTSA tools

Aim 3: Interface to biological study data from scientific instrumentation cores
  Provide ITHS researchers access to data from ITHS scientific instrumentation cores
  Prioritize list of other scientific instrumentation cores suitable to access
  Develop protocols and interfaces to new ITHS Human Genomics and Coordinated Tissue Bank core

Access to instrumentation cores data
  Existing resources: large number of scientific instrumentation cores across consortium sites, generalizing interfaces via caBIG & SCHARP collaboration with Labkey Software (FHCRC)
  Gap: data not integrated with clinical/study data
  Goals:
    Build reusable interfaces to key scientific instrumentation
    Ensure compatibility with Aim 4 and national standards

Aim 4: Integrate access across these three data sources
  Provide ad-hoc integration of aims 1-3 to ITHS researchers via ITHS BMI personnel
  Develop a data integration model for ITHS BMI by adapting existing tools
  Implement, test and refine prototype ITHS BMI Data Integration System
  Deploy and continue to refine the ITHS BMI data integration system

Integrate access across these resources
  Existing resources: BioMediator (UW), XBrain (UW), CNICS, NA-ACCORD (UW), MIND/MAP (UW), Clinical Data Repository (FHCRC), caBIG (FHCRC), SCHARP (FHCRC), Virtual Data Warehouse (CHS)
  Gaps: no system integrates sources from Aim 1-3, no system across consortium members
  Goals:
    Adapt and evolve existing local systems to meet needs
    Continue to assess commercial systems
    Adopt interoperable
    approaches across CTSA sites

Outline
  Clinical Translational Science Awards
  Northwest Institute of Translational Health Sciences
  Biomedical Informatics Core of NW ITHS
  Data Integration
  Summary

UW Biomedical Data Integration and Analysis Research Group
  Peter Tarczy-Hornoch MD, PI
  Dan Suciu PhD, PI
  Alon Halevy PhD, Past PI
  6 collaborating faculty
    Jim Brinkley, Chris Carlson, Eugene Kolker, Peter Myler,
  4 programmers
    Ron Shaker, Todd Detwiler
  13 students (over time)
    Eithon Cadag, Brent Louie, Terry Shen, Kelan Wang

Motivation for Data Integration
  [Diagram labels: Genomics Data, Literature, Clinical Data, Proteomics Information, Pathways, Experimental Data, Others…; Knowledge Discovery (understanding)]
  Adapted from Chung and Wooley, 2003. Slide K. Wang, 2005

The Growth of Biologic Databases
  [Chart: number of databases (y-axis, 0 to 900) by year (x-axis, 2000 to 2006). Source: Nucleic Acids Research, Database Issues 2000-2006]
  Slide E Cadag, 2006

BioMediator System
  Federated, general purpose, modular, decoupled
  NIH NHGRI/NLM funded 2000-2007
  www.biomediator.org
  [Diagram labels: Query, Query Translation, Common data model; an Interface with Query` and Query`` for each of Pfam, ProSite and CDD]

BioMediator Use Case: Annotation
  [Diagram labels: PubMed, Entrez, PROSITE, COGs, GO, BLAST, PSORT, Pfam, CDD, BLOCKS, Local databases, Local algorithms, Human analysis and curation]
  Slide E Cadag, 2006

Finding Needle in Haystack: Inference
  Complete Result Set
  Relevant Subset

Inference to Emulate Human Annotator
  Working memory
    Pfam.DomainHit         e-value: 10e-10   name: neurotransmitter
    ProSite.DomainHit      e-value: 10e-20   name: neurotrans.
    BLAST.DatabaseHit      e-value: 10e-10   name: nic. acetylcholine
    BLAST.DatabaseHit      e-value: 10e-20   name: acetylcholine rec.
    ...
  Rule-base
    IF DomainHit e-value > 10e-15 THEN remove
    IF DatabaseHit Name is similar to other DatabaseHit Names THEN increase evidence
    ...
  Result: evidence for acetylcholine increased
  Slide E.
  Cadag, 2006

Evaluation Scoring System
  Dimensions of granularity and utility
  Score -2
    Granularity: Automated annotation is incorrect
    Utility: Phrasing or representation of automated annotation is not useful for functional annotation
  Score -1
    Granularity: Automated annotation is less specific than actual
    Utility: Automated annotation is less useful than actual
  Score 0
    Granularity: Automated annotation is indistinguishable from actual
    Utility: Automated annotation is as useful as actual
  Score +1
    Granularity: Automated annotation is more specific than actual
    Utility: Automated annotation is more useful than actual
  Slide E. Cadag, 2006

Scores for Automated Annotations
  Automated Score            Granularity, % (n)   Utility, % (n)
  Incorrect or useless       3.0% (1)             0% (0)
  Less granular or useful    20.6% (7)            5.8% (2)
  Same as actual             52.9% (18)           73.6% (25)
  More granular or useful    23.5% (8)            20.6% (7)
  Total                      100% (34)            100% (34)
  Granularity average (selected annotations): -0.029
  Utility average (selected annotations): 0.147
  Slide E. Cadag, 2006

Finding Needle in Haystack: Uncertainty
  NSF IIS funded 2005-2009
  Complete Result Set
  Relevant Subset

Data Source Measures: Ps
  [Diagram labels: Source 1, Source 2, Source 3, Source 4, each with Concept 1 and Concept 2]
  Ps: user's belief in a concept from a particular source
  Slide B. Louie, 2007

Data Source Measures: Qs
  [Diagram labels: Source 1, Source 2, Source 3, Source 4, each with Concept 1 and Concept 2, joined by relationships]
  Qs: user's belief in the interconnections (relationship) between two sources
  Slide B. Louie, 2007

Data Record Measures: Pr
  [Diagram labels: Source 1, Source 2; Concept 1, Concept 2; Record 1, Record 2]
  Pr: measure of belief in a particular data record
  Slide B. Louie, 2007

Data Record Measures: Qr
  [Diagram labels: Source 1, Source 2; Concept 1, Concept 2; Record 1, Record 2, joined by a link]
  Qr: measure of belief in a particular link between data records
  Slide B. Louie, 2007

Result Graph with Uncertainty Measures
  [Graph labels]
    Edge: Qs: 0.8, Qr: 0.9
    Node: Ps: 0.7, Pr: 0.3
    Node: Ps: 1.0, Pr: 0.8
    Node: Ps: 0.8, Pr: 0.5
    Edge: Qs: 0.8, Qr: 0.3
  Slide B.
  Louie, 2007

Network Reliability Theory
  UII (U2) Score = probability that a node is reachable from the start (seed) node.
  [Diagram: graph starting at seed node S; each node is labeled Psn1 * Prn1 and each edge is labeled Qse1 * Qre1]
  Computing the U2 score is #P-hard. Approximation algorithms exist (Karger 2001), but are impractical.
  Slide B. Louie, 2007

Result Graph with Uncertainty Scores
  [Graph labels]
    Edge: Qs: 0.8, Qr: 0.9, U2: 0.72
    Node: Ps: 0.7, Pr: 0.3, U2: 0.21
    Node: Ps: 1.0, Pr: 0.8, U2: 0.80
    Node: Ps: 0.8, Pr: 0.5, U2: 0.40
    Edge: Qs: 0.8, Qr: 0.3, U2: 0.24
  Slide B. Louie, 2007

BioMediator & Uncertainty: Evaluation
  Preliminary evaluation
  Gold standard: COG functional categorization
  Comparison: BioMediator + Uncertainty
  Agreement with actual: 94.4%
  After increasing number of simulations to estimate UII scores: 100%

NW ITHS and Data Integration
  [Diagram labels: Aim 1, Aim 2, Aim 3, Aim 4]
  Aim 5: Develop & maintain ITHS administrative databases & Web interfaces

Outline
  Clinical Translational Science Awards
  Northwest Institute of Translational Health Sciences
  Biomedical Informatics Core of NW ITHS
  Data Integration
  Summary

Summary/Questions
  CTSAs are seen as a key part of the NIH Roadmap “Re-engineering the clinical research enterprise”
  Biomedical informatics (BMI) cores are seen as key nationally as well as locally for NW ITHS
  The BMI core is focused on addressing identified gaps through both research and tool development
  An important foundational element of the BMI core is data integration
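
Supplementary sketch: estimating U2 scores by simulation
  The Network Reliability Theory slide defines the U2 score as the probability that a node is reachable from the seed node, and the evaluation slide mentions estimating these scores with simulations. The Python below is one minimal way such an estimate could be done, not the BioMediator implementation. It assumes that each node is present independently with probability Ps * Pr, that each directed edge is present independently with probability Qs * Qr, and that the seed is always present. The function name estimate_u2, the node names, and the small example graph are hypothetical; only the probability values are taken from the result graph slides.

```python
import random
from collections import deque


def estimate_u2(nodes, edges, seed, trials=20000, rng=None):
    """Monte Carlo estimate of reachability from `seed` for every node.

    nodes: dict mapping node -> presence probability (assumed Ps * Pr)
    edges: dict mapping (src, dst) -> presence probability (assumed Qs * Qr)
    The seed node is assumed to be always present.
    """
    rng = rng or random.Random()
    hits = {n: 0 for n in nodes}
    for _ in range(trials):
        # Sample one "possible world": keep each node with its probability.
        present = {n for n, p in nodes.items()
                   if n == seed or rng.random() < p}
        # Keep each edge with its probability, if both endpoints survived.
        adjacency = {}
        for (src, dst), q in edges.items():
            if src in present and dst in present and rng.random() < q:
                adjacency.setdefault(src, []).append(dst)
        # Breadth-first search from the seed in the sampled world.
        seen = {seed}
        queue = deque([seed])
        while queue:
            cur = queue.popleft()
            for nxt in adjacency.get(cur, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        for n in seen:
            hits[n] += 1
    # Fraction of sampled worlds in which each node was reached.
    return {n: hits[n] / trials for n in nodes}


if __name__ == "__main__":
    # Hypothetical topology; probabilities reuse values from the slides.
    nodes = {"S": 1.0, "A": 0.7 * 0.3, "B": 1.0 * 0.8, "C": 0.8 * 0.5}
    edges = {("S", "A"): 0.8 * 0.9, ("S", "B"): 0.8 * 0.3,
             ("A", "C"): 0.8 * 0.9, ("B", "C"): 0.8 * 0.3}
    for node, score in estimate_u2(nodes, edges, "S").items():
        print(node, round(score, 3))
```

  Sampling avoids enumerating every combination of present and absent nodes and edges, at the cost of an estimate whose error shrinks only as the number of trials grows. That trade-off is consistent with the evaluation slide's note that agreement improved after the number of simulations was increased.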