EXPLORING DATA AND COHORT DISCOVERY IN THE SYNTHETIC DERIVATIVE

Feasibility & Hypothesis Testing

The RecordCounter
The Synthetic Derivative Record Counter (RecordCounter) provides exploratory data figures and counts to members of the VU research community for research planning and feasibility assessment.
• Available to anyone with a VUNET ID
• Allows the user to enter basic medical data, such as ICD-9 codes or text keywords (e.g., "lung cancer"), together with demographic information, and then search the Synthetic Derivative database for the approximate number of records that meet those criteria
• Users can start investigating immediately

Secondary Use of Clinical Data

What is the Synthetic Derivative (SD)?
• Rich, multi-source database of de-identified clinical and demographic data
• User interface tool that can be used for access and analysis
• Services are available to help deliver results for non-standard queries (temporal queries, control matching, etc.)
• Contains ~2.3 million records
  – ~1 million with detailed longitudinal data
  – averaging 100k bytes in size
  – an average of 27 codes per record
• Records are updated over time and are current through 6/30/2014, soon to be updated to 10/31/2014

The RecordCounter vs. the SD
• The RecordCounter: users can use search criteria to return exploratory counts. The results returned are not exact and are meant for a high-level assessment of the available data.
• The SD: users can use search criteria to return exact counts and the associated longitudinal data for review.

What is the Research Derivative (RD)?
• Fully identified repository of integrated clinical data with tight IRB/DUA access requirements
• Contains ~2.3 million records
• Updates regularly and is typically about 4 weeks behind the present date
• There is no tool supporting the Research Derivative; all access to the data must go through programming support

The Synthetic Derivative has proven transformative, but lacks the ability to support:
1. Seasonality studies
2. Outbreaks and other date-specific studies (catastrophes, etc.)
3. Finding a specific patient (e.g., to contact)

What is BioVU?
BioVU is the Vanderbilt DNA biorepository. It holds DNA extracted from discarded blood collected during routine clinical testing, linked to information in the Synthetic Derivative.
• Current sample number: 212,059
• 168,014 adult samples
• 23,280 pediatric samples

Resources for EMR-based research at VUMC
The Synthetic Derivative
• A de-identified and continuously updated version of the EMR (~2.3 M records)
BioVU
• DNA samples available: >212,000
• Expansion efforts underway
Redeposited genotypes
• Subjects with GWAS data: >13,000
• Subjects with any genotyping: >60,000
• >8,000,000,000 genotypes

Synthetic Derivative (De-ID EMR Information)
BioVU = SD + Genotyping Data
Record Counter (Feasibility/Hypothesis)

1) Self-service tools available at no or low cost to researchers
2) Customized tools and data extraction services, provided under a fee-for-service agreement in which researchers sponsor ORI programmers, when existing self-service tools are not adequate for complex use cases
Scientific Portfolio

Synthetic Derivative Data Types
Documents, such as:
• Clinical Notes
• Discharge Summaries
• History and Physicals
• Problem Lists
• Surgical Reports
• Operative Notes
• Progress Notes
• Letters
Diagnostic Codes, Procedural Codes
Reports (pathology, ECGs, echocardiograms)
Lab Values and Vital Signs
Medications
TraceMaster (ECGs)
Tumor Registry

Technology + policy
De-identification
• Derivation of a 128-character identifier (RUI) from the MRN, generated by the Secure Hash Algorithm (SHA-512)
• HIPAA identifiers removed using a combination of custom techniques and established de-identification software
Date Shift
• Our algorithm shifts the dates within a record by a time period (up to 364 days backwards) that is consistent within each record but differs across records
Restricted access & continuous oversight
• Access restricted to VU; not a public resource
• IRB approval for study (non-human subjects)
• Data Use Agreement
• Audit logs of all searches and data exports

Creating Phenotypes
• Definition of phenotype for cases and controls is critical
  – May require consultation with experts
• A basic understanding of the data elements, and of the uses and limitations of particular data points, is important
• Reviewing records manually to make case determinations (or even to calculate the PPV of a search methodology) will be somewhat time-consuming

The problem with ICD9 codes
• ICD9 codes give both false negatives and false positives
• False negatives:
  – Outpatient billing is limited to 4 diagnoses per visit
  – Outpatient billing is done by physicians (e.g., it takes too long to find the unknown ICD9)
  – Inpatient billing is done by professional coders, who:
    · omit codes that don't pay well
    · can only code problems explicitly mentioned in the documentation
• False positives:
  – Diagnoses evolve over time: physicians may initially bill for suspected diagnoses that are later determined to be incorrect
  – Billing the wrong code (perhaps it is easier to find for a busier clinician)
  – Physicians may bill for a different condition if it pays for a given treatment
  – Example: Anti-TNF biologics (e.g., infliximab) were originally not covered for psoriatic arthritis, so rheumatologists would code the patient as having rheumatoid arthritis

Phenotyping Approach
1. Identify phenotype of interest
2. Algorithm development: case & control algorithm development and refinement
3. Manual review; assess precision
   – Precision <95%: return to algorithm development
   – Precision ≥95%: deploy in BioVU

Once you have logged in…

Your Dashboard
• A welcome and announcement section gives the Investigator any immediate information or help when accessing the SD
• Projects and sets are found on the left-hand side
• On the dashboard, add project teams to sets you have created
• Overall SD/BioVU population demographics give up-to-date population details of the resource

Drag and Drop Search for Clinical Features
• Same interface as the Record Counter
• Can create complex logic statements with OR, AND, & NOT
• Can limit the search to look only at subjects in BioVU

User-friendly Record Review Interface
• Subjects listed on the left-hand side
• Filter and search functionality
• Status designation

Data Visualization Features
In the Summary tab and in the Vitals view, the new SD has data visualization features that give a reviewer a quick view of a subject's longitudinal data.
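The de-identification and date-shift steps listed under "Technology + policy" can be illustrated with a minimal Python sketch. It is not the SD's actual implementation: the slides state only that the RUI is a 128-character SHA-512 value derived from the MRN and that dates move backwards by up to 364 days, consistently within a record. The function names, the keyed-hash way of choosing the per-record offset, the 1 to 364 day range, and the sample MRN are all assumptions made for illustration.

```python
import hashlib
import hmac
from datetime import date, timedelta


def derive_rui(mrn: str) -> str:
    """Return the SHA-512 hex digest of an MRN.

    A SHA-512 digest is 512 bits, i.e. 128 hexadecimal characters,
    which matches the 128-character RUI mentioned in the slides.
    Whether the real system adds a secret or salt is not stated.
    """
    return hashlib.sha512(mrn.encode("utf-8")).hexdigest()


def record_shift_days(rui: str, secret: bytes) -> int:
    """Pick a fixed backward shift of 1..364 days for one record.

    ASSUMPTION: the offset is derived from a keyed hash of the RUI so
    that it is stable for a record but varies across records.
    """
    digest = hmac.new(secret, rui.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % 364 + 1


def shift_dates(dates, days):
    """Shift every date in a record backwards by the same number of days."""
    return [d - timedelta(days=days) for d in dates]


# Hypothetical record: two visits 166 days apart.
rui = derive_rui("000123456")
offset = record_shift_days(rui, secret=b"demo-only-secret")
visits = [date(2014, 1, 15), date(2014, 6, 30)]
shifted = shift_dates(visits, offset)

print(len(rui))                 # length of the identifier
print(shifted[1] - shifted[0])  # interval between visits is preserved
```

Because every date in a record moves by the same offset, intervals between events survive, while calendar position does not. That is why seasonality and date-specific studies are listed as unsupported by the SD.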
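The phenotyping loop (Boolean search logic, manual review, precision check against the 95% threshold) can be sketched in a few lines of Python. This is a toy illustration rather than the SD interface: the subject IDs, the sets, and the function names are hypothetical, and the case definition borrows the rheumatoid arthritis example from the ICD9 slide.

```python
def select_cases(code_hits, keyword_hits, exclusions, biovu=None):
    """Cases = (code AND keyword) NOT exclusion, optionally limited to BioVU."""
    cases = (code_hits & keyword_hits) - exclusions
    if biovu is not None:
        cases &= biovu
    return cases


def ppv(confirmed, reviewed):
    """Positive predictive value: confirmed cases / records reviewed."""
    if not reviewed:
        raise ValueError("no records were reviewed")
    return len(confirmed & reviewed) / len(reviewed)


def ready_to_deploy(precision, threshold=0.95):
    """Deploy only when manual review shows precision at or above threshold."""
    return precision >= threshold


# Hypothetical search results (sets of de-identified subject IDs).
has_ra_code = {"s1", "s2", "s3", "s4", "s5"}     # RA billing code present
has_ra_keyword = {"s2", "s3", "s4", "s5", "s6"}  # keyword found in notes
has_psa_code = {"s5"}                            # psoriatic arthritis code
in_biovu = {"s1", "s2", "s3", "s4", "s5", "s6"}

cases = select_cases(has_ra_code, has_ra_keyword, has_psa_code, in_biovu)
print(sorted(cases))  # ['s2', 's3', 's4']

# Manual review confirms two of the three selected subjects.
confirmed = {"s2", "s3"}
precision = ppv(confirmed, cases)
print(ready_to_deploy(precision))  # False: refine the algorithm and review again
```

Requiring both a code and a keyword addresses false positives from billing codes alone, and the exclusion set handles cases such as the anti-TNF coding example. Here the precision of 2/3 falls below 95%, so the algorithm goes back for refinement rather than being deployed in BioVU.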
Easy Search and Filtering for Document Review

Export Data
• Detailed data to text files
• Demographics and annotations to REDCap

New Directions…
• Plasma in BioVU: a pilot project is underway to establish a program to bank plasma in the areas of biomarker discovery (heart failure), antibody therapy (breast cancer) & medication adherence (resistant hypertension)
• PathLink: a tissue repository that will collect and store leftover tissues obtained during the course of standard medical care. Tissue samples and data will be linked to other clinical databases and BioVU.
• ImageVU: linking images such as MRIs and PET scans to the RD and SD
• Additional data sources…

SD Access Protocol
1. Researcher requests IRB exemption
2. Researcher enters StarBRITE to complete the electronic application (IRB status is in StarBRITE)
3. Researcher signs the DUA
4. SD staff verify; access granted
5. Researcher accesses the SD

Questions or Comments?
SD Help Sessions are held on the second and fourth Wednesday of each month. All are welcome.
Time: 1:00-2:00 PM
Location (2nd Wed): 2525 West End, 600 conference room
Location (4th Wed): Light Hall, Room 437
If you have any questions or feedback about the SD, please contact us by email at Jacqueline.Kirby@Vanderbilt.edu