Clinical Research Management System

Using EMR Data for Population Registries
Diana Gumas, JHMCIS Senior Director for Research
Systems, EPR and EPR2020/Amalga
David Thiemann, Center for Clinical Data Analysis
1
Potential Data Uses
• Sample Size Estimates (aggregate data without IRB approval)
– Feasibility, grant applications, statistical planning
• Identifying patients for enrollment/recruitment
– By diagnosis, pathology, stage, labs, meds
• Identifying/creating matched study controls
• Obtaining current demographics (name, address) for mail
solicitation
– From research list or by clinic, provider, clinical criteria
• Obtaining ongoing clinical + administrative data on a registry
panel
– Labs, visits, procedures, immunizations, CPT/ICD9 codes,
resource use
2
Possible research data sources
• EPR (JHH & JHBMC)
• Sunrise Clinical Manager (JHH – inpatient)
• Meditech (Bayview)
• Casemix Datamart
• GE Centricity (JHCP)
• EPR2020
• Departmental Systems (ED, OR, Anesthesia)
• Clinical Research Management System (CRMS)
• IDX (professional fees)
• Death Registry
3
Methods for Data Access
• Historical: Researcher Negotiates Access With Clinical System /DBA
– Logistic nightmare, technical challenge
• Clinical Research Management System (CRMS)
– Study cohort with real-time links to enterprise data
• Center for Clinical Data Analysis
– Monthly/quarterly data extracts from designated systems
4
Clinical Research Management System
(CRMS)
• 1,054 Users
• 1,079 Active Studies
• 25,430 Participants
Data Available in CRMS
– eIRB
– EPR (patient demographics)
– Study participants / accruals
– Electronic Case Report Forms - in next 2-3 months
5
Clinical Research Management System
(CRMS)
Ways to extract data
– Canned Reports (click for examples)
– Ad-hoc querying using SQL
– Possible with CCDA support - automated study-specific data extracts
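An ad-hoc query of the kind listed above might look like the following sketch. The table and column names (study, participant, study_id, participant_id) are hypothetical and are not the actual CRMS schema; an in-memory SQLite database stands in for the real system.

```python
import sqlite3


def demo_connection():
    """Build a tiny in-memory database with a made-up schema for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE study (study_id TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE participant (
            participant_id INTEGER PRIMARY KEY,
            study_id TEXT REFERENCES study(study_id)
        );
        INSERT INTO study VALUES ('S1', 'Diabetes registry'), ('S2', 'Pilot study');
        INSERT INTO participant (study_id) VALUES ('S1'), ('S1');
    """)
    return conn


def accrual_by_study(conn):
    """Count participants per study; studies with no accruals still appear."""
    sql = """
        SELECT s.study_id, s.title, COUNT(p.participant_id) AS accrued
        FROM study AS s
        LEFT JOIN participant AS p ON p.study_id = s.study_id
        GROUP BY s.study_id, s.title
        ORDER BY s.study_id
    """
    return conn.execute(sql).fetchall()
```

The LEFT JOIN is deliberate: an inner join would silently drop studies that have not yet enrolled anyone.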
6
EPR2020 Data for Researchers
Today (from EPR)
• 4.2M Patients, 23.4M Visits
• 12.3M Documents, 6.8M Radiology Reports
• 25.6M Lab Results
• 1.5M Problems, 2.2M Medications, 140K Allergies
Planned
• Bayview & JHCP data
• ICD9 diagnosis codes and CPT charges (IDX)
Future
• Death Registry
• Blood Product Data for Transfusions
• Eclipsys SCM Order data
• HMED (ED), ORMIS, eADR/Medivision
7
My Participant’s Lab Data
Reliable.
Driven by the CRMS Participant Registry.
Exportable.
8
Registry Cohort Discovery using
EPR2020
A JHM investigator wants to find and enroll diabetic patients
aged 45-65 years
with hemoglobin A1C between 7 and 9%
serum creatinine < 2 mg/dl
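The criteria above can be written down as an explicit filter. This is a minimal sketch, not the EPR2020 query itself: the PatientSummary record and its field names are hypothetical, the bounds are assumed to be inclusive for age and A1C, and patients with a missing lab value are assumed to be excluded rather than guessed at.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PatientSummary:
    """Hypothetical one-row-per-patient summary; not an EPR2020 structure."""
    mrn: str
    age_years: int
    has_diabetes: bool
    latest_a1c_pct: Optional[float]           # None means no result on file
    latest_creatinine_mg_dl: Optional[float]  # None means no result on file


def is_candidate(p: PatientSummary) -> bool:
    """Diabetic, aged 45-65, A1C between 7 and 9%, creatinine < 2 mg/dl."""
    if not p.has_diabetes:
        return False
    if not 45 <= p.age_years <= 65:
        return False
    if p.latest_a1c_pct is None or p.latest_creatinine_mg_dl is None:
        return False  # assumption: missing labs exclude the patient
    return 7.0 <= p.latest_a1c_pct <= 9.0 and p.latest_creatinine_mg_dl < 2.0


def find_candidates(patients: List[PatientSummary]) -> List[str]:
    """Return the MRNs of patients who meet every criterion."""
    return [p.mrn for p in patients if is_candidate(p)]
```

Writing each assumption (inclusive bounds, handling of missing labs) as its own line makes it easy to review with the investigator before the query is run.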
9
Center for Clinical Data Analysis (CCDA)
Provides periodic (monthly/quarterly) bulk data extracts
(delimited/flat files, .xls):
• Preliminary, anonymous data for feasibility, grant applications and statistical sample-size estimates
• IRB-approved case-finding for study enrollment (mailings, phone solicitation), chart review, and cohort/case-control studies
• Research data extracts - monthly/quarterly integrated extracts from EPR, POE, ORMIS, lab/PDS, billing systems, vaccination/transfusion/culture data, etc.
10
How CCDA works
• Email CCDA@jhmi.edu, cc: dthiema1@jhmi.edu; phone 410-955-65558 (Thiemann)
• For IRB-approved research:
– Provide full protocol + IRB approval
– Meet to discuss query methods, format
– Iterate, then schedule prod (email extracts, Jshare)
– Cost: $100/hour
• For non-IRB projects (exploratory analyses, QI)
– Same process, cost subsidized by ICTR/JHM
– Do NOT implicitly morph QI into IRB
11
The Basics: Getting Clinical Data
Into a Registry Database
• Real work, not ad hoc/bootstrap
• Need $$$ and FTE(s)
• Smart analyst(s) who know database technology and understand (or can learn) nuances of the sources and content domain
• Hands-on PI management/guidance
• Statistical liaison early, before database schema and ETL methods are set in stone
12
The Extract-Transform-Load process:
Getting Clinical Data into Research DB
• Raw clinical/administrative data is useless for research
• Build an intermediate (staging) database
– Don’t do data management in SAS/Stata/Excel
• Data dictionary: derivation for each field
• Templated, tested, documented cleanup scripts/routines.
• Intermediate tables: Log each step/modification
– For each batch, be able to re-create data transform from scratch
– Version control, change control and documentation are vital
– Build data versioning into the database
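One way to make a batch re-creatable from scratch is to keep the raw rows untouched, stamp every staged row with a batch ID and a transform version, and log each step. The sketch below assumes a made-up three-table layout (raw_lab, stage_lab, etl_log) in SQLite; none of these names come from a JHM system, and the cleanup shown (trimming and upper-casing) is only a placeholder for real, documented cleanup routines.

```python
import sqlite3

SCHEMA = """
CREATE TABLE raw_lab (batch_id INTEGER, mrn TEXT, test_name TEXT, result_text TEXT);
CREATE TABLE stage_lab (batch_id INTEGER, mrn TEXT, test_name TEXT, result_text TEXT,
                        transform_version TEXT);
CREATE TABLE etl_log (log_id INTEGER PRIMARY KEY, batch_id INTEGER, step TEXT,
                      transform_version TEXT, rows_affected INTEGER,
                      logged_at TEXT DEFAULT CURRENT_TIMESTAMP);
"""


def log_step(conn, batch_id, step, version, rows):
    """Record one ETL step so every modification is traceable."""
    conn.execute(
        "INSERT INTO etl_log (batch_id, step, transform_version, rows_affected) "
        "VALUES (?, ?, ?, ?)",
        (batch_id, step, version, rows),
    )


def load_raw(conn, batch_id, rows):
    """Store source rows exactly as received, tagged with their batch."""
    conn.executemany(
        "INSERT INTO raw_lab VALUES (?, ?, ?, ?)",
        [(batch_id, *row) for row in rows],
    )
    log_step(conn, batch_id, "load_raw", "n/a", len(rows))


def rebuild_stage(conn, batch_id, version):
    """Discard and re-derive the staged rows for one batch from the raw rows."""
    conn.execute("DELETE FROM stage_lab WHERE batch_id = ?", (batch_id,))
    conn.execute(
        "INSERT INTO stage_lab "
        "SELECT batch_id, TRIM(mrn), UPPER(TRIM(test_name)), TRIM(result_text), ? "
        "FROM raw_lab WHERE batch_id = ?",
        (version, batch_id),
    )
    n = conn.execute(
        "SELECT COUNT(*) FROM stage_lab WHERE batch_id = ?", (batch_id,)
    ).fetchone()[0]
    log_step(conn, batch_id, "rebuild_stage", version, n)
    return n
```

Because rebuild_stage never edits raw_lab, a changed cleanup rule only requires bumping the version string and re-running the step; the log shows which version produced the rows currently staged.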
13
Transforming Data
• Raw data typically string (char/text) fields
• Unanalyzable characters (* < >, comments) still have meaning
– Put non-numeric data in separate field. Avoid numerical recoding (999)
• ~3% of pts have multiple/non-preferred MRNs
– Need 1-to-many link table
• Assays/reference ranges/coding changes
– Avoid using raw codes (CPT/ICD) in research db
– Map clinical codes to research terms
• Defer analytic assumptions. When recoding data, anticipate problems. Keep options open.
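The advice to keep non-numeric content in a separate field, rather than recoding it as a sentinel number such as 999, can be sketched as a small parser. The function name, field names, and regular expression are illustrative assumptions; real result strings are messier, and this pattern will mis-split values such as dates or scientific notation.

```python
import re
from typing import NamedTuple, Optional


class ParsedResult(NamedTuple):
    value: Optional[float]      # numeric part, or None if there is none
    comparator: Optional[str]   # one of "<", ">", "<=", ">=", or None
    text: Optional[str]         # flags and comments, kept verbatim


# Optional comparator, then a plain decimal number, then any trailing text.
_PATTERN = re.compile(r"^\s*(<=|>=|<|>)?\s*(-?\d+(?:\.\d+)?)\s*(.*?)\s*$")


def parse_result(raw: Optional[str]) -> ParsedResult:
    """Split a raw result string into number, comparator, and leftover text."""
    if raw is None or not raw.strip():
        return ParsedResult(None, None, None)
    match = _PATTERN.match(raw)
    if match is None:
        # No leading number: keep the whole string as text, never a fake number.
        return ParsedResult(None, None, raw.strip())
    comparator, number, rest = match.groups()
    return ParsedResult(float(number), comparator, rest or None)
```

For example, parse_result("<0.5") keeps the "<" in its own field so the analyst can decide later how to treat censored values, and parse_result("hemolyzed") yields no number at all.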
14
More Data Transform Challenges
• NEVER trust raw data. Learn business logic of source system.
– CPTs morph annually, internal complexity/redundancy
– Lab assays/reference/terms change
– Parsing is inherently unreliable
– Administrative names/groups change (clinic #s, departments).
• Duplicate-value problems (labs, orders)
• System-attribution source/datetime (POE, lab)
• Always run an aggregate (“group by”) query to identify alternative names (eg lab name) and values (number, result) before transform. Otherwise you’ll miss something
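The aggregate check recommended above can be as simple as a GROUP BY over the column about to be transformed. A minimal sketch, assuming the raw data sits in an SQLite table; the helper name profile_column and any table or column names passed to it are hypothetical.

```python
import sqlite3


def profile_column(conn, table, column):
    """Return (distinct value, row count) pairs, most frequent first."""
    # Identifiers cannot be bound as SQL parameters, so validate them instead.
    for name in (table, column):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    sql = (
        f'SELECT "{column}", COUNT(*) AS n FROM "{table}" '
        f'GROUP BY "{column}" ORDER BY n DESC, "{column}"'
    )
    return conn.execute(sql).fetchall()
```

Run against a lab-name column, the output lists every spelling the source system has used for the same assay, which is the list the mapping step has to cover.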
15
Understanding Business Logic
• Trust but verify: Test coding accuracy
– Providers may habitually use imprecise/inaccurate diagnosis codes (especially in profee data)
– ICD9 procedure indications often a billing fiction
– Trained coders may make systematic errors
– Different content domains may have different standards (inpt vs outpt coders)
– Don’t infer/assume dependencies unless enforced by source system.
• Run min/max queries, aggregates, outer joins
– Confirm date ranges, data ranges, relative proportions by year
• Don’t assume that null rows actually are empty. Maybe the query missed something
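The min/max, aggregate, and outer-join checks above might be sketched as three small queries. The patient and lab tables, their columns, and the ISO-formatted result_date text field are all assumptions made for illustration.

```python
import sqlite3


def range_check(conn):
    """Earliest/latest result date and smallest/largest numeric value."""
    return conn.execute(
        "SELECT MIN(result_date), MAX(result_date), MIN(value), MAX(value) FROM lab"
    ).fetchone()


def rows_by_year(conn):
    """Row counts per calendar year, to spot gaps or sudden jumps."""
    return conn.execute(
        "SELECT substr(result_date, 1, 4) AS yr, COUNT(*) "
        "FROM lab GROUP BY yr ORDER BY yr"
    ).fetchall()


def patients_without_labs(conn):
    """Outer join: registry patients for whom the extract returned no lab rows."""
    rows = conn.execute(
        "SELECT p.mrn FROM patient AS p "
        "LEFT JOIN lab AS l ON l.mrn = p.mrn "
        "WHERE l.mrn IS NULL ORDER BY p.mrn"
    )
    return [r[0] for r in rows]
```

A patient who shows up in patients_without_labs may truly have no results, or the extract may have missed them (for example under a non-preferred MRN); the list is a prompt to check, not a finding.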
16
JHM Clinical Data Landscape:
Past, Present and Future
Past: Babble of unintegrated systems
• EPR (antiquated technology, VSAM files, DB2) contains text, not queryable, analyzable data
Present: EPR2020 (aka Amalga) – integrated data!!
• Has everything in EPR, plus JHCP, plus gradually adding data from clinical/departmental/administrative systems (IDX CPTs, transfusion medicine, ORMIS, HMED, eADR, death registry, ad infinitum)
Future: ? Epic, ? JHM Data Warehouse
• Epic: One system replacing all major JHM systems
• JHH timeline: 4+ years
17
JHM Data Sources: Casemix Datamart
• Gold standard for JHM (non-profee) administrative data, including payer/insurance data
• Combines data from Keane (hospital charges), ADT (admission/discharge/transfer), HDM (ICD9 diagnosis + procedure coding), HSCRC (regulatory submissions)
• Not a true data warehouse; meager reconciliation
• Best source for length of stay, resource use, ICD9 diagnoses
• Outpatient ICD9s limited
• Has JHH + BMC + HCGH data
18
JHM Data Sources: IDX (profee)
• Gold standard for inpatient + outpatient CPT (profee charge) data
• ICD9 diagnosis data problematic
• Limitation: No data from non-faculty providers (private physicians, etc.)
• Difficult to query. Has a data warehouse, limited access.
• Early target for EPR2020/Amalga integration.
19
JHH Data Sources: SCM/POE
• Sunrise Clinical Manager/Provider Order Entry
• Replicated transactional database, difficult to query
• For registry purposes POE has large attribution/process challenges: Stutter-step orders, multiple alerts, imputed times
• Great source for inpatient meds, labs, physiologic monitor data
• No codified ICD9/Snomed/RxNorm data
• No outpatient data
20
JHH Data Sources: SCC/AIM
• Sunrise Critical Care (aka Emtek, Eclipsys). JHH ICUs + stepdown units + oncology
• AIM analytic database contains selected but comprehensive batch extract
• Sunsets as ICUs switch to POE ClinDoc
• Challenging to query. Lots of denormalized fields
21
JHH + BMC Data Sources: PDS
• PDS = Pathology Data Systems
• Includes lab, transfusion medicine, anatomic pathology, cytopath, John Boitnott’s death registry
• Lab data also available via EPR2020/Amalga and POE
22
BMC Data Sources: Meditech
• Shrink-wrapped, comprehensive inpatient + outpatient clinical + financial system
• Difficult for ad hoc research queries.
• Exports data to Datamart and EPR2020
• BMC-JHH patient linkage doable but difficult, needs caution
23
JHCP Data Sources: GE Centricity
• All clinical + administrative data for JHCP clinics
• Largely opaque to research query; JHCP sometimes collaborates directly, especially for its physician/investigators
• Early target for EPR2020/Amalga integration
• Linkage challenges to BMC and JHH MRNs
24
JHH Departmental Data:
ORMIS + eADR/Medivision
• ORMIS: Operating Room Management Information System
• Mostly transactional scheduling/tracking/administrative data, limited clinical data.
• Has diagnoses, procedures, case start/stop times
• eADR/Medivision (anesthesia) still evolving, limited research data access
• Design challenges similar to legacy SCC critical-care system.
25
JHH Departmental Data:
HMED (Emergency Department)
• Mostly opaque to research
• Replicated data hosted by Datamart
26