NEW ENHANCEMENTS IN THE SYNTHETIC DERIVATIVE AND WHAT THAT MEANS FOR THE RESEARCHER
Jacqueline Kirby
June 7th, 2013

Resources
• StarPanel
  o Identified clinical data; designed for clinical use
• Record Counter
  o De-identified clinical data; sophisticated phenotype searching
  o Returns a number: record counts and aggregate demographics
• Synthetic Derivative
  o De-identified clinical data; sophisticated phenotype searching
  o Returns record counts AND de-identified narratives, test values, medications, etc., for review and creation of study data sets
• Research Derivative
  o Identified clinical data
  o Programmer (human) supported
• BioVU
  o Genotype data
  o De-identified clinical data; sophisticated phenotype searching
  o Able to link phenotype information to biological samples

What is the RecordCounter?
The Synthetic Derivative Record Counter (RecordCounter) provides exploratory data figures and counts to members of the VU research community for research planning and feasibility assessment.
• Available to ANYONE with a VUNet ID
• Allows the user to enter basic medical data, such as ICD-9 codes or text keywords (e.g., lung cancer), along with demographic information, and then search the Synthetic Derivative database for the approximate number of records that meet those criteria

What is the Synthetic Derivative (SD)?
• Rich, multi-source database of de-identified clinical and demographic data
• User interface tool that can be used for access and analysis
• Services are available to help deliver results for non-standard queries (temporal queries, control matching, etc.)
• Contains ~2.3 million records
  o ~1 million with detailed longitudinal data
  o averaging 100k bytes in size
  o an average of 27 codes per record
• Records are updated over time and are current through December 2012
  o Soon to be current through 5/31/2013

The RecordCounter vs. the SD
• The RecordCounter: users can use search criteria to return exploratory counts. (The results returned are not exact and are meant for a high-level assessment of the available data.)
• The SD: users can use search criteria to return an exact count and the associated longitudinal data for review.

What is BioVU?
• The move toward personalized medicine requires very large sample sets for discovery and validation
• BioVU: a biobank intended to support a broad view of biology and enable personalized medicine
• Contains de-identified DNA extracted from blood left over after clinically indicated testing of Vanderbilt patients who have not opted out
• Linked to the Synthetic Derivative: a de-identified EMR
• Current sample number: 166,397
  o 147,292 adult samples
  o 19,220 pediatric samples

Synthetic Derivative vs. BioVU

Synthetic Derivative Data Types
• Documents, such as:
  o Clinical notes
  o Discharge summaries
  o History and physicals
  o Problem lists
  o Surgical reports
  o Progress notes
  o Letters
• Diagnostic codes, procedural codes
• Forms (intake, assessment)
• Reports (pathology, ECGs, echocardiograms)
• Clinical communications
• Lab values and vital signs
• Medication orders
• TraceMaster (ECGs)
• Tumor registry

Technology + policy
De-identification
• Derivation of a 128-character identifier (RUI) from the MRN, generated by the Secure Hash Algorithm (SHA-512)
• HIPAA identifiers removed using a combination of custom techniques and established de-identification software
Date Shift
• Our algorithm shifts the dates within a record by a time period (up to 364 days backward) that is consistent within each record but differs across records
Restricted access & continuous oversight
• Access restricted to VU; not a public resource
• IRB approval for study (non-human subjects)
• Data Use Agreement
• Audit logs of all searches and data exports

The New SD…
Synthetic Derivative 3.0 was launched on February 25, 2013.
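Returning to the Technology + policy list above: the RUI derivation and the per-record date shift can be sketched in Python. This is a minimal illustration under stated assumptions, not the SD's actual implementation. SHA-512 yields a 64-byte digest, which is 128 characters in hexadecimal, matching the 128-character RUI. The `derive_rui` and `DateShifter` names, the optional salt, and drawing each record's offset once with `secrets.randbelow` are hypothetical choices, since the slides do not say how the shift is chosen or stored.

```python
import datetime as dt
import hashlib
import secrets


def derive_rui(mrn: str, salt: bytes = b"") -> str:
    """Return a 128-character hex identifier derived from an MRN via SHA-512.

    The optional `salt` is a hypothetical addition; the slides only say
    the RUI is generated from the MRN with SHA-512.
    """
    digest = hashlib.sha512(salt + mrn.encode("utf-8"))
    return digest.hexdigest()  # 64-byte digest -> 128 hex characters


class DateShifter:
    """Shift every date in a record back by one per-record offset (1-364 days).

    Hypothetical helper: how the real offset is chosen is not described.
    """

    def __init__(self) -> None:
        self._offsets: dict[str, int] = {}  # RUI -> days to subtract

    def offset_for(self, rui: str) -> int:
        # Draw the offset once per record and reuse it, so the offset is
        # consistent within a record but differs across records.
        if rui not in self._offsets:
            self._offsets[rui] = 1 + secrets.randbelow(364)
        return self._offsets[rui]

    def shift(self, rui: str, when: dt.date) -> dt.date:
        return when - dt.timedelta(days=self.offset_for(rui))


if __name__ == "__main__":
    shifter = DateShifter()
    rui = derive_rui("12345678")  # made-up MRN for illustration
    admit, discharge = dt.date(2012, 3, 1), dt.date(2012, 3, 9)
    shifted = [shifter.shift(rui, d) for d in (admit, discharge)]
    print(len(rui), shifted[1] - shifted[0])  # prints: 128 8 days, 0:00:00
```

Because every date in a record moves by the same offset, intervals such as length of stay survive de-identification, while calendar dates from different records can no longer be lined up with each other.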
SD 3.0 leverages the power of an IBM Netezza data warehouse appliance to provide faster, near-immediate counts as the user builds search criteria, along with new review features that include enhanced data visualization and covariate annotation capabilities.
SEARCH: Counts are provided for each search item in real time as you build your algorithm, letting you adjust your criteria immediately. Modifiers for ICD-9 codes allow searches to require 2 or more codes.
REVIEW: Filter and highlight documents, medications and labs to make review efficient.
ANNOTATE: Create your own set-based annotations that are sharable across the study team.

General algorithm for determining a phenotype
• The definition of the phenotype for cases and controls is critical
• May require consultation with experts
• A basic understanding of the data elements, and of the uses and limitations of particular data points, is important
• Reviewing records manually to make case determinations (or even to calculate the PPV of a search methodology) will be somewhat time-consuming

The problem with ICD-9 codes
• ICD-9 codes give both false negatives and false positives
• False negatives:
  o Outpatient billing is limited to 4 diagnoses per visit
  o Outpatient billing is done by physicians (e.g., it takes too long to look up an unfamiliar ICD-9 code)
  o Inpatient billing is done by professional coders, who:
    - omit codes that don't pay well
    - can only code problems explicitly mentioned in the documentation
• False positives:
  o Diagnoses evolve over time: physicians may initially bill for suspected diagnoses that are later determined to be incorrect
  o Billing the wrong code (perhaps one that is easier to find for a busy clinician)
  o Physicians may bill for a different condition if it pays for a given treatment
    - Example: anti-TNF biologics (e.g., infliximab) were originally not covered for psoriatic arthritis, so rheumatologists would code the patient as having rheumatoid arthritis

Lessons from preliminary phenotype development
• Eliminating negated and uncertain terms:
  o “I don’t think this is MS”, “uncertain if multiple sclerosis”
• Delineating the section tag of the note:
  o “FAMILY MEDICAL HISTORY: Mother had multiple sclerosis.”
• Adding requirements for further signs of disease severity:
  o For MS: an MRI with T2 enhancement, myelin basic protein or oligoclonal bands on lumbar puncture, etc.
  o This could, however, miss patients with outside work-ups

Once you have logged in…
The new SD offers a cleaner Home page interface with aggregate SD graphs.
New features for the investigator:
• A welcome and announcement section that gives the investigator any immediate information or help when accessing the SD
• Overall SD/BioVU population demographics that give up-to-date population details for the resource

Improved Search Features
Once you have selected “Start a New Search”, you will go to the Search Interface. Users can see record counts by dragging and dropping search criteria (e.g., ICD codes, labs, document keywords, medications) into the Search box.
New search features include:
• Counts for each specific criteria element, shown on the right-hand side of the search box (circled in red); summary counts for combined criteria (this OR that), shown at the bottom of the group box (circled in blue); and a final total count in the right corner of your search (circled in green)
• Limit the search to BioVU records, non-compromised BioVU samples, or only BioVU samples available for external assay
• Limit your search based on the number of ICD code occurrences in the subject record, to require multiple instances of an ICD code

Improved Set Review
After you have built your set, you can begin reviewing your records. The new SD has both a Summary view, which gives a high-level graphical view of a subject, AND a Detail view that you can customize with a new Tabular view.
What’s new in Review:
• Subject IDs listed on the left-hand side to move easily through the records
• Tabular view of the different data elements, with custom sorting of tabs
• Radio buttons for determining subject status

New Data Visualization Features
In the Summary tab and in the Vitals view, the new SD has data visualization features that give a reviewer a quick view of a subject’s longitudinal data.

Improved Document View
Documents are divided into three tabs:
• High Value Documents
• Other Documents
• Problem Lists
On each Document tab, you can:
1. Filter based on keywords, document type, and subtypes
2. Filter keyword searches and display only the context
3. Highlight based on keywords and display either the full documents or the word(s) in context

New Medications and Labs Display
The Medication and Lab views now have two displays for easier review. The Summary view displays aggregate mentions of meds/labs with beginning and end dates. The Details view shows each instance of the med/lab in a full-detail display, with the ability to filter by data element.

Improved Annotations
Annotations make it easier to identify and save covariate information during set review. Create your own set-based annotations that are sharable across the study team. These can be exported to Excel when performing your data analysis.

What’s Next?
• Data export into REDCap
• Adding PheWAS to the search criteria
• Predict Labs in the Lab view
• Custom and Timeline View
• …
The SD has evolved greatly in the past six months, largely due to suggestions and needs from its users. Please let us know what YOU would like in the SD so that it can continue to evolve.
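The keyword filtering described under Improved Document View, together with the earlier lessons about negated terms and section tags, can be illustrated with a small keyword-in-context function. This is a hypothetical sketch, not how the SD filters documents: the negation cue list, the assumption that section tags are ALL-CAPS headers followed by a colon at the start of a line, and the excluded section names are all illustrative. Production negation detection (for example, NegEx-style algorithms) relies on much larger, validated lexicons and scope rules.

```python
import re

# Hypothetical negation/uncertainty cues; real detectors use far larger lexicons.
NEGATION = re.compile(
    r"\b(no|denies|don't think|do not think|uncertain if|ruled out|negative for)\b",
    re.IGNORECASE,
)
# Hypothetical sections whose mentions describe someone other than the patient.
EXCLUDED_SECTIONS = {"FAMILY MEDICAL HISTORY", "FAMILY HISTORY"}
# Assumes section tags look like "ALL CAPS HEADER:" at the start of a line.
HEADER = re.compile(r"^([A-Z][A-Z ]+):", re.MULTILINE)


def section_at(text: str, pos: int) -> str:
    """Return the last section header that ends before `pos` ('' if none)."""
    name = ""
    for m in HEADER.finditer(text, 0, pos):
        name = m.group(1)
    return name


def keyword_in_context(text: str, keyword: str, width: int = 30) -> list[str]:
    """Return a context snippet for each affirmed mention of `keyword`.

    A mention is skipped if it falls in an excluded section or if a
    negation/uncertainty cue appears earlier in the same sentence.
    """
    snippets = []
    for m in re.finditer(re.escape(keyword), text, re.IGNORECASE):
        if section_at(text, m.start()) in EXCLUDED_SECTIONS:
            continue
        line_start = text.rfind("\n", 0, m.start()) + 1
        line_end = text.find("\n", m.end())
        if line_end == -1:
            line_end = len(text)
        # Treat the text after the last period on this line as the sentence.
        sentence_start = max(line_start, text.rfind(".", 0, m.start()) + 1)
        if NEGATION.search(text, sentence_start, m.start()):
            continue
        # Keep only a window of context around the match, within the line.
        lo = max(line_start, m.start() - width)
        hi = min(line_end, m.end() + width)
        snippets.append(text[lo:hi].strip())
    return snippets


note = (
    "HISTORY OF PRESENT ILLNESS: Uncertain if multiple sclerosis.\n"
    "ASSESSMENT: New T2 lesions, consistent with multiple sclerosis.\n"
    "FAMILY MEDICAL HISTORY: Mother had multiple sclerosis.\n"
)
print(keyword_in_context(note, "multiple sclerosis"))
```

On this made-up note only the ASSESSMENT mention survives: the first is dropped by the “uncertain if” cue and the third by the FAMILY MEDICAL HISTORY section tag. Each rule trades some recall for precision, so its effect is worth checking against manually reviewed records.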
SD Access Protocol
1. Researcher requests IRB exemption
2. Researcher enters StarBRITE to complete the electronic application (IRB status is in StarBRITE)
3. Researcher signs the DUA
4. SD staff verify; access is granted
5. Researcher accesses the SD

Leveraging VICTR Resources
• Record Counter (RC): part of the SD but open to anyone with a VUNet ID: https://biovu.vanderbilt.edu/RC/RC.html
• SD (/BioVU): Erica Bowton (via StarBRITE)
• RD: email or call me, or fill out a Request form at https://starbrite.vanderbilt.edu/ (https://starbrite.vanderbilt.edu/managedata/datarequest.html)

Questions or Comments?
SD User Group Sessions will be held the fourth Wednesday of each month at 1 pm. All are welcome.
Time: 1:00-2:00 PM
Location: Light Hall, Room 439
If you have any questions or feedback about the new SD, please email Jacqueline.Kirby@Vanderbilt.edu
THANK YOU!