What is the RecordCounter?

EXPLORING DATA AND COHORT DISCOVERY IN THE SYNTHETIC DERIVATIVE
Feasibility & Hypothesis Testing
The RecordCounter
The Synthetic Derivative Record Counter (RecordCounter) provides
exploratory data figures and counts to members of the VU research
community for research planning purposes and feasibility assessment.
• Available to anyone with a VUNET ID
• Allows the user to input basic medical data, such as ICD-9 codes or
text keywords (e.g., lung cancer), as well as demographic information,
and then search the Synthetic Derivative database to determine the
approximate number of records that meet those criteria.
• Users can start investigating immediately.
Secondary Use of Clinical Data
What is the Synthetic Derivative (SD)?
• Rich, multi-source database of de-identified clinical and demographic data
• User Interface tool that can be used for access and analysis
• Services are available to help deliver results for non-standard queries
(temporal queries, controls matching, etc)
• Contains ~2.3 million records
• ~1 million with detailed longitudinal data
• averaging 100k bytes in size
• an average of 27 codes per record
• Records are updated over time and are current through 6/30/2014, soon to be
updated to 10/31/2014
The RecordCounter Vs. The SD
The RecordCounter – Users can use search criteria to return
exploratory counts. (The results returned are not exact and
are meant for a high-level assessment of the available data.)
The SD – Users can use search criteria to return exact counts
and the associated longitudinal data for review.
What is the Research Derivative (RD)?
• Fully identified repository of integrated clinical data with tight IRB/DUA access
requirements
• Contains ~2.3 million records
• Updates regularly and is typically about 4 weeks behind the present date
• There is no tool supporting the Research Derivative and all access to the data must be
through programming support
The Synthetic Derivative has proven transformative, but lacks the ability to support:
1. Seasonality studies;
2. Outbreaks and other date-specific studies (catastrophes, etc.);
3. Finding a specific patient (e.g., to contact)
What is BioVU?
BioVU is the Vanderbilt biorepository of DNA extracted from
discarded blood collected during routine clinical testing and linked to
information in the Synthetic Derivative.
Current sample number: 212,059
• 168,014 adult samples
• 23,280 pediatric samples
Resources for EMR-based research at VUMC
The Synthetic Derivative
A de-identified, continuously updated version of the EMR (~2.3 M records)
BioVU
• DNA samples available: >212,000
• Expansion efforts underway
Redeposited genotypes
• Subjects with GWAS data: >13,000
• Subjects with any genotyping: >60,000
• >8,000,000,000 genotypes
Synthetic Derivative (De-ID EMR
Information)
BioVU = SD + Genotyping Data
Record Counter
(Feasibility/Hypothesis)
1) Self-service tools available at no or low cost for researchers;
2) Customized tools and data extraction services, provided under a
fee-for-service agreement in which researchers sponsor ORI
programmers when existing self-service tools are not adequate to
fulfill complex use cases.
Scientific
Portfolio
Synthetic Derivative Data Types
Documents, such as:
• Clinical Notes
• Discharge Summaries
• History and Physicals
• Problem Lists
• Surgical Reports
• Operative Notes
• Progress Notes
• Letters
Diagnostic Codes, Procedural Codes
Reports (pathology, ECGs, echocardiograms)
Lab Values and Vital Signs
Medications
TraceMaster (ECGs)
Tumor Registry
Technology + policy
De-identification
• A 128-character identifier (RUI) is derived from the MRN using the
Secure Hash Algorithm (SHA-512)
• HIPAA identifiers removed using combination of custom techniques and
established de-identification software
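A SHA-512 digest is 512 bits, which is 128 characters when written in hexadecimal, matching the identifier length given above. The following is a minimal sketch of that idea only. The function name `derive_rui` and the `secret` parameter are hypothetical; the slides do not say whether a salt or key is combined with the MRN, or how the result is encoded.

```python
import hashlib


def derive_rui(mrn: str, secret: str = "") -> str:
    """Return a 128-character hex string computed from an MRN with SHA-512.

    `secret` is an assumed extra input (for example a site-held salt).
    It is included for illustration and is not described in the source.
    """
    digest = hashlib.sha512((secret + mrn).encode("utf-8"))
    return digest.hexdigest()


# Made-up MRN, for illustration only.
example_rui = derive_rui("00123456", secret="example-secret")
print(len(example_rui))  # 128
```

The same MRN always maps to the same identifier, so data from multiple sources can still be joined on it after the MRN itself is removed.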
Date Shift
• Our algorithm shifts the dates within a record by a time period (up to
364 days backwards) that is consistent within each record, but differs
across records
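The date shift can be illustrated with a short sketch. How the real algorithm chooses each record's offset is not described here; deriving it from a hash of the record identifier, and using a range of 1 to 364 days, are assumptions made for this example. The names `shift_days_for` and `shift_record_dates` are hypothetical.

```python
import hashlib
from datetime import date, timedelta


def shift_days_for(record_id: str) -> int:
    """Map a record ID to a fixed offset of 1 to 364 days (assumed range)."""
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 364 + 1


def shift_record_dates(record_id: str, dates: list[date]) -> list[date]:
    """Move every date in one record backwards by that record's offset."""
    offset = timedelta(days=shift_days_for(record_id))
    return [d - offset for d in dates]


# Two made-up visits, 21 days apart, in a single record.
visits = [date(2014, 3, 1), date(2014, 3, 22)]
shifted = shift_record_dates("record-A", visits)
print((shifted[1] - shifted[0]).days)  # 21
```

Because one offset is applied to the whole record, intervals between events are preserved, while calendar dates no longer line up across records. This is why seasonality and date-specific studies cannot be run on shifted data.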
Restricted access & continuous oversight
• Access restricted to VU; not a public resource
• IRB approval for study (non-human)
• Data Use Agreement
• Audit logs of all searches and data exports
Creating Phenotypes
• Definition of phenotype for cases and controls
is critical
– May require consultation with experts
• A basic understanding of the data elements, and of
the uses and limitations of particular data points, is
important
• Reviewing records manually to make case
determinations (or even to calculate the PPV of a
search methodology) will be somewhat
time-consuming
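As a small worked example of the PPV calculation mentioned above: every record in the reviewed sample was returned by the search, so the PPV is the fraction that manual review confirmed as true cases. The function name and the counts are made up for illustration.

```python
def positive_predictive_value(review_results: list[bool]) -> float:
    """PPV of a search: confirmed cases / records returned and reviewed.

    Each entry is True when manual review confirmed the record as a case.
    """
    if not review_results:
        raise ValueError("at least one reviewed record is required")
    return sum(review_results) / len(review_results)


# Hypothetical review: 50 records returned by a search, 46 confirmed.
reviewed = [True] * 46 + [False] * 4
ppv = positive_predictive_value(reviewed)
print(f"PPV = {ppv:.2f}")  # PPV = 0.92
```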
The problem with ICD9 codes
• ICD9 codes give both false negatives and false positives
• False negatives:
– Outpatient billing is limited to 4 diagnoses/visit
– Outpatient billing is done by physicians, who may skip a code that takes too
long to look up
– Inpatient billing is done by professional coders, who omit codes that don’t pay
well and can only code problems explicitly mentioned in documentation
• False positives:
– Diagnoses evolve over time: physicians may initially bill for suspected
diagnoses that are later determined to be incorrect
– Billing the wrong code (perhaps because it is easier for a busy clinician to find)
– Physicians may bill for a different condition if it pays for a given treatment.
Example: anti-TNF biologics (e.g., infliximab) were originally not covered for
psoriatic arthritis, so rheumatologists would code the patient as having
rheumatoid arthritis
Phenotyping Approach: Algorithm Development
1. Identify phenotype of interest
2. Case & control algorithm development and refinement
3. Manual review; assess precision
4. If precision is <95%, return to step 2; if ≥95%, deploy in BioVU
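The review-and-refine cycle above can be sketched as a simple gate: each candidate algorithm is scored against manually reviewed records and is accepted only once its precision reaches 95%. The record fields, the two candidate rules, and the data below are all hypothetical and chosen only to make the example run.

```python
PRECISION_TARGET = 0.95  # threshold taken from the flow above


def precision(algorithm, reviewed_records) -> float:
    """Fraction of algorithm-flagged records that manual review confirmed."""
    flagged = [r for r in reviewed_records if algorithm(r)]
    if not flagged:
        return 0.0
    return sum(r["is_case"] for r in flagged) / len(flagged)


def first_deployable(algorithms, reviewed_records):
    """Return the name of the first candidate that meets the target, or None."""
    for name, algorithm in algorithms:
        if precision(algorithm, reviewed_records) >= PRECISION_TARGET:
            return name
    return None


# Made-up reviewed sample: code_count is the number of matching diagnosis codes.
records = (
    [{"code_count": 3, "is_case": True}] * 20
    + [{"code_count": 1, "is_case": False}] * 5
    + [{"code_count": 1, "is_case": True}] * 2
)

# Two illustrative candidates: a loose rule, then a refined, stricter one.
candidates = [
    ("v1_any_code", lambda r: r["code_count"] >= 1),
    ("v2_two_or_more_codes", lambda r: r["code_count"] >= 2),
]

print(first_deployable(candidates, records))  # v2_two_or_more_codes
```

Here the loose rule flags 27 records with 22 confirmed (about 81%), so it goes back for refinement; the stricter rule flags 20 with 20 confirmed and passes. The stricter rule also misses two true cases, which is the usual trade-off when tuning for precision.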
Once you have logged in…
Your Dashboard
• A welcome and announcement section gives the
investigator immediate information and help
when accessing the SD
• Projects and sets are found on the left-hand
side
• On the dashboard, add project teams to
sets you have created
• Overall SD/BioVU population
demographics give up-to-date
population details of the resource
Drag and Drop Search for Clinical Features
• Same interface as the Record Counter
• Can create complex logic statements with
AND, OR, and NOT.
• Can limit search to look only at subjects in
BioVU
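The AND / OR / NOT logic can be pictured as set operations on subject identifiers. This is an illustrative sketch only, not the search tool's actual implementation; the criteria names and subject IDs are made up.

```python
# Hypothetical sets of subject IDs matching each search criterion.
lung_cancer = {101, 102, 103, 104}
smoker = {102, 103, 105}
pediatric = {103, 106}
in_biovu = {101, 102, 103, 106}

# OR: subjects matching either criterion.
either = lung_cancer | smoker

# (lung cancer AND smoker) NOT pediatric, then limited to subjects in BioVU.
cohort = ((lung_cancer & smoker) - pediatric) & in_biovu

print(sorted(either))  # [101, 102, 103, 104, 105]
print(sorted(cohort))  # [102]
```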
User friendly Record Review Interface
• Subjects listed on the left-hand side
• Filter and search functionality
• Status designation
Data Visualization Features
In the Summary tab and in the Vitals
view, the new SD has data
visualization features that give a
reviewer a quick view of a
subject’s longitudinal data.
Easy Search and Filtering
for Document Review
Export Data
• Detailed data to text files
• Demographics and annotations to REDCap
New Directions…
• Plasma in BioVU - Pilot project is underway to establish a program to bank
plasma in the areas of biomarker discovery (heart failure), antibody therapy
(breast cancer) & medication adherence (resistant hypertension)
• PathLink – A tissue repository that will collect and store leftover tissues
obtained during the course of standard medical care. Tissue samples and
data will be linked to other clinical databases and BioVU.
• ImageVU - Linking images such as MRIs and PET scans to the RD and
SD
• Additional Data Sources….
SD Access Protocol
1. Researcher requests IRB exemption
2. Researcher enters StarBRITE to complete the electronic application
(IRB status is in StarBRITE)
3. Researcher signs DUA
4. SD staff verify; access granted
5. Researcher accesses SD
Questions or Comments?
SD Help Sessions will be held the second and fourth
Wednesday of each month at 1 pm. All are welcome.
Time: 1:00-2:00 PM
Location (2nd Wed): 2525 West End, 600 conference room
Location (4th Wed): Light Hall, Room 437
If you have any questions or feedback about the SD, please
email Jacqueline.Kirby@Vanderbilt.edu