BioVU and the Synthetic Derivative
Erica Bowton, PhD
Program Manager, Personalized Medicine

What is BioVU?
• The move towards personalized medicine requires very large sample sets for discovery and validation
• BioVU: biobank intended to support a broad view of biology and enable personalized medicine
• Contains de-identified DNA extracted from leftover blood after clinically-indicated testing of Vanderbilt patients who have not opted out
• Linked to Synthetic Derivative: de-identified EMR

[Diagram: a patient identity (John Doe) passes through a one way hash to a research ID (A7CCF99DE65732….) in the Synthetic Derivative (~2 million records; can be updated). Eligible samples carry the same hashed ID to DNA extraction.]

How BioVU Samples are Accepted
Accepted samples must:
• Be of good quality
• Have sufficient amount of blood
• Be from a patient who has signed the BioVU form
• Be from a patient who has not opted out

The BioVU Form
A component of the Consent for Treatment process

Awareness Generation
• Posters in phlebotomy areas in English and Spanish
• Brochures freely available to VUMC clinics in English and Spanish
• BioVU hotline available for questions and opt-out

BioVU Sample Accrual: 176,448
Current accrual as of 2-19-2014: 155,090 adult, 21,472 pediatric
[Chart: sample accrual over time, y-axis 0 to 225,000. Series: adult samples accrued, pediatric samples accrued, anticipated adult sample accrual, anticipated pediatric sample accrual.]

Where are BioVU samples stored?
RTS SmaRTStore

BioVU Operations Oversight
[Diagram: bodies connected to BioVU by two kinds of link, one meaning oversight and one meaning input/advisory.]
• Institutional Review Board
• BioVU Protocol Review Committee
• General Counsel
• Med Ctr Ethics
• Ethics Advisory Board*
• Community Advisory Board*
• Vice Chancellor’s Office
• Operations Oversight Board**
Operations Oversight Board** membership: Vice Chancellor (Chair), Ethics/ELSI (2), Ctr Human Genetics Research (2), Clinical genetic testing lab (1), Genetics/Genetic Medicine (6), Clin.
Pharmacology (PI), Patient advocacy (2), University counsel (1), Biostatistics (3), Cancer center (3), Pediatric genetics (1), Program staff
* Includes (or exclusively) external membership
** (n) = number of members representing this discipline/area. Several members are represented in more than one area.

Resources for EMR-based research at VUMC
The Synthetic Derivative
• A de-identified and continuously-updated image of the EMR (2 M records)
BioVU
• DNA samples available: >175,000
• Plasma collection underway
Redeposited genotypes
• Subjects with GWAS data: >12,000
• Subjects with any genotyping: >60,000
• >8,000,000,000 genotypes

The Synthetic Derivative
• Rich, multi-source database of de-identified clinical and demographic data
• A derivative of the EMR: information content reduced by ‘scrubbing’ identifiers
• Systematically shifted event dates
• User Interface tool that can be used for access and analysis
• Services are available to help deliver results for non-standard queries (temporal queries, controls matching, etc.)
• Contains ~2.1 million records
  o ~1 million with detailed longitudinal data
  o averaging 100,000 bytes in size
  o an average of 27 codes per record
• Records are updated over time and are current through 8/31/13

Synthetic Derivative Data Types
• Narratives, such as: Clinical Notes, Discharge Summaries, History and Physicals, Problem Lists, Surgical Reports, Progress Notes, Letters
• Diagnostic Codes, Procedural Codes
• Forms (intake, assessment)
• Reports (pathology, ECGs, echocardiograms)
• Clinical Communications
• Lab Values and Vital Signs
• Medication Orders
• TraceMaster (ECGs)
• Tumor Registry

Technology + policy
De-identification
• Derivation of 128-character identifier (RUI) from the MRN generated by Secure Hash Algorithm (SHA-512)
• HIPAA identifiers removed using combination of custom techniques and established de-identification software
Date Shift
• Our algorithm shifts the dates within a record by a time period (up to 364 days backwards) that is
  consistent within each record, but differs across records
Restricted access & continuous oversight
• Access restricted to VU; not a public resource
• IRB approval for study (non-human)
• Data Use Agreement
• Audit logs of all searches and data exports

Data Use Agreement
• No attempt at re-identification
• Inform BioVU staff if a record is identifiable
• Research confined to that which is described
• Genotypes to be re-deposited back to BioVU

Phenotyping Approach
Algorithm Development
1. Identify phenotype of interest
2. Case & control algorithm development and refinement
3. Manual review; assess precision
   o <95%: return to algorithm development and refinement
   o ≥95%: deploy in BioVU

Disease Cohorts
Disease                        Number in SD   Number in BioVU
Alzheimer’s                    3,429          497
Parkinson’s                    4,365          778
Migraine                       15,699         3,299
Dementia                       3,747          1,045
Major Depressive Disorder      20,008         3,385
ADHD                           12,922         1,184
Generalized Anxiety Disorder   5,828          1,195
Schizophrenia                  4,069          495
Category labels on the slide: Central Nervous System, Psychiatric

BioVU Utilization
Pre-Review, then BioVU Committee Review:
• Expedited Review: genotyping data requests; reviewed by BioVU Chair
• Full Review: DNA sample access requests; reviewed and scored by Primary and Secondary reviewers
BioVU Projects: Requests: 104; Approved so far: 86
[Chart: DNA Requests, Data Requests, BioVU Requests, and BioVU Approvals; y-axis 0 to 120.]

Current BioVU Studies
[Chart: BioVU Study Areas; y-axis: Number of Studies, 0 to 25.]

USE CASE 1
Synthetic Derivative Study
Ability to analyze quantitative, longitudinal repeated measures
[Chart: BMI over time (y-axis 15 to 40) with the normal range and the Zyprexa prescription marked.]
[Chart: histogram by BMI (x-axis 0, 13.3, 26.2, 40.9, 73.4, 300+; y-axis 0 to 900).]

USE CASE 2
Existing Genetic Data

USE CASE 3
New Genotyping/Sequencing
[Diagram: an investigator query, run under a data use agreement against scrubbed records, identifies cases + controls. One way hash IDs (e.g. B699tre563msd.., F5rt783mbncds…) link the selected records to sample retrieval, followed by genotyping, genotype-phenotype relations, and data analysis.]

BioVU Project Life Cycle
BioVU (2-3 months)
• Access approvals/application
• Cohort identification
• Clinical data extraction
• Programming support
• Study design
• Agreements
VANTAGE: Vanderbilt Technologies for Advanced Genomics (1-2 months)
• Genotyping/sequencing approaches
• Assay design
• SNP selection
• Sample pulling and plating
VANGARD: Vanderbilt Technologies for Advanced Genomics Analysis and Research Design (1-2 months)
• Genomic data analysis and research design
• Biostatistical/bioinformatic support

For ALL BioVU Studies…
Resources:
1. BioVU Project Management: BioVU@vanderbilt.edu
2. Programming services: IDASC CORE
3. Genomic technologies: VANTAGE CORE
4. Data analysis services: VANGARD CORE
https://starbrite.vanderbilt.edu/biovu/

END

Validating EMR phenotype algorithms
Diseases: Atrial fibrillation, Crohn's disease, Multiple sclerosis, Rheumatoid arthritis, Type 2 diabetes
Marker       Gene / region
rs2200733    Chr. 4q25
rs10033464   Chr. 4q25
rs11805303   IL23R
rs17234657   Chr. 5
rs1000113    Chr. 5
rs17221417   NOD2
rs2542151    PTPN22
rs3135388    DRB1*1501
rs2104286    IL2RA
rs6897932    IL7RA
rs6457617    Chr. 6
rs6679677    RSBN1
rs2476601    PTPN22
rs4506565    TCF7L2
rs12255372   TCF7L2
rs12243326   TCF7L2
rs10811661   CDKN2B
rs8050136    FTO
rs5219       KCNJ11
rs5215       KCNJ11
rs4402960    IGF2BP2
[Forest plot: odds ratio per marker, axis ticks at 0.5, 1.0, 2.0, 5.0; published vs. observed.]
Ritchie et al, 2010
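
Illustration: de-identification sketch
The two steps on the "Technology + policy" slide (a 128-character identifier derived from the MRN with SHA-512, and a per-record backward date shift of up to 364 days) might look roughly like the Python sketch below. It is an illustration under stated assumptions, not the BioVU implementation: the function names, the HMAC-based way of choosing the offset, and `SECRET_KEY` are all hypothetical, and the slides do not say whether the MRN hash is salted or keyed.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical secret used only to pick the date offset; not from the source.
SECRET_KEY = b"example-secret-not-from-the-source"


def derive_rui(mrn: str) -> str:
    """Return a 128-character hex identifier for an MRN.

    SHA-512 yields 512 bits, which is 128 hexadecimal characters.
    Assumption: a plain, unkeyed hash of the MRN string.
    """
    return hashlib.sha512(mrn.encode("utf-8")).hexdigest()


def shift_days(mrn: str, key: bytes = SECRET_KEY) -> int:
    """Return a per-record offset in days, in the range 1..364.

    Assumption: the offset is derived from a keyed hash of the MRN so it
    is stable for one record and varies from record to record.
    """
    digest = hmac.new(key, mrn.encode("utf-8"), hashlib.sha512).digest()
    return 1 + int.from_bytes(digest[:8], "big") % 364


def shift_dates(mrn: str, event_dates: list) -> list:
    """Move every date in one record backwards by the same offset."""
    offset = timedelta(days=shift_days(mrn))
    return [d - offset for d in event_dates]


if __name__ == "__main__":
    mrn = "0001234"  # made-up MRN for illustration
    visits = [date(2013, 1, 10), date(2013, 3, 1)]
    print(derive_rui(mrn)[:14] + "...")
    print(shift_dates(mrn, visits))
```

Because every date in a record moves by the same amount, intervals between events (for example, time from a prescription to a later BMI measurement) are preserved, which is what makes the longitudinal analyses in Use Case 1 possible on shifted data.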