Case Study: Merging EMR data from VA hospitals
Merging data from hospitals that use the same EMR system yields massive amounts of data; however, even installations of the same system will vary
Brian Nordberg
Data Manager
University of Utah

Background
• VHA developed VistA from the ground up
• Deployed at all 130 VA sites and CBOCs
• ~27 million patients in our MPI
• ~845 million outpatient encounters (as of 6/10/2010)
• Individual sites have tailored VistA for their own unique uses
• As much as 60% of all data in VistA are in text notes
• ~100 different “packages”, including:
  • Pharmacy
  • Surgery
  • Medicine
  • Radiology

National Data - The Good, Bad and Ugly
• Good – Already merged at a national level and data are very usable
  • Outpatient Encounters, ICD9, CPT
  • Vitals (Height, Weight, BP, Pulse, Temp, Pain Score, Hearing..)
  • Demographics
  • Inpatient visits, ICD9, CPT
  • Budgeting/Cost
• Bad – Merged, but have issues
  • Pharmacy
  • Laboratory
  • Microbiology
• Ugly
  • Orders
  • Possibly everything else – we won’t know until we pull it.
  • Notes - 569,176,954 for 2 regions

Notes in VistA by Year
[Figure: Sum of Encounters and Sum of Notes by year, 1999 to 2009; vertical axis 0 to 90,000,000]

Areas of Focus
• Metadata – Where are the data, what do they mean?
• Data pull “methods” and validation
• Data coding differences and data profiling

Metadata
VistA contains 78,301 distinct data fields over 9,675 different files (~tables)
The Medicine Package: 1 of ~100 packages in VistA

Metadata Sources
• Corporate Data Warehouse (fully merged data)
  • Outpatient data
  • Vitals
  • More on the way
• VA Information Resource Center (VIReC) (Data in files by VISN per year - so 23 files per year)
  • Medical SAS datasets
  • Encounters
  • Demographics
• Decision Support Systems
  • Labs (~50 most frequent lab tests)
  • Utilization
  • Some pharmacy (150 most frequent PHA prescriptions)
  • Budget and Cost

VistA Files and Fields Metadata

Simple Queries of Metadata
Here we were looking for Ejection Fraction. This gives us a starting point to attempt to find the needed data for the study.
Next we spoke to some experts at a station who indicated they were putting the data in those fields, so we pulled those tables.
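
A metadata query of this kind can be sketched as a keyword search over a catalog of file and field names. This is only an illustration: `FieldMeta`, `find_fields`, and the catalog rows (including the file and field names attached to 698.1 and 698.8) are hypothetical and do not reflect the actual VistA data dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMeta:
    file_number: str  # VistA file (~table) number
    file_name: str
    field_name: str


def find_fields(catalog, phrase):
    """Return entries whose file or field name contains every word of phrase."""
    words = phrase.lower().split()
    return [
        m for m in catalog
        if all(w in f"{m.file_name} {m.field_name}".lower() for w in words)
    ]


# Made-up catalog rows, for illustration only.
CATALOG = [
    FieldMeta("698.1", "EXAMPLE ECHO FILE A", "EJECTION FRACTION"),
    FieldMeta("698.8", "EXAMPLE ECHO FILE B", "LV EJECTION FRACTION"),
    FieldMeta("999.9", "EXAMPLE VITALS FILE", "PULSE RATE"),
]

hits = find_fields(CATALOG, "ejection fraction")
print([h.file_number for h in hits])
```

A hit list like this is only a starting point; as the slide notes, station experts still have to confirm which fields are actually populated.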

Ejection Fraction Study
Discrete Ejection Fraction fields (698.1, 698.8) populated

VHA Station   Echo Reports
600           2
640 *         10391
653           132
660           1
691           910
* Station with whom we discussed data

Other Stations Put EF Data in Notes
Echo Reports: discrete Ejection Fraction fields (698.1, 698.8) populated
Notes: text notes referencing “Ejection Fraction”

VHA Station   Echo Reports   Notes
600           2              44948
640 *         10391          76584
653           132            11482
660           1              64264
691           910            54785

Metadata tell us where, next is how we pull the data
Several ways to pull data from MUMPS, but different methods can yield different results
• MUMPS Data Extractor – very expensive software – not many stations have it
• Custom M code – now frowned upon as it can bring VistA to its knees
• Shadow (replicated) VistA system – corporate data warehouse uses this method

Validating Data Pull Methods
• Microbiology accessions by site by month for 2 different methods
• Blue was supposed to be our “Gold Standard”; after review we invalidated our “Gold Standard”
[Figure: microbiology accessions (vertical axis 0 to 3,000) by VA station (A to J) by month/year, 10/1/2008 to 7/1/2009, one series per pull method]

Data Pull Method 2
Compared to yet another “gold standard”: much closer, but unable to get an exact match
[Figure: accessions by VA station by month/year; vertical axis 0 to 1,600]
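
The comparison behind these charts can be sketched as a per-station, per-month reconciliation of counts from two pull methods. The function name, the 5% tolerance, and the sample counts below are assumptions made for illustration; they are not the values or the procedure used in the study.

```python
def compare_pulls(method_a, method_b, tolerance=0.05):
    """Compare counts keyed by (station, month) from two pull methods.

    Returns (key, count_a, count_b, relative_difference) for every key whose
    counts differ by more than `tolerance`, relative to the larger count.
    A key missing from one method is treated as a count of 0.
    """
    mismatches = []
    for key in sorted(set(method_a) | set(method_b)):
        a = method_a.get(key, 0)
        b = method_b.get(key, 0)
        baseline = max(a, b)
        rel_diff = 0.0 if baseline == 0 else abs(a - b) / baseline
        if rel_diff > tolerance:
            mismatches.append((key, a, b, round(rel_diff, 3)))
    return mismatches


# Made-up accession counts for two anonymized stations.
pull_a = {("A", "2008-10"): 1200, ("A", "2009-01"): 1180, ("B", "2008-10"): 450}
pull_b = {("A", "2008-10"): 1195, ("A", "2009-01"): 700, ("B", "2008-10"): 450}

flagged = compare_pulls(pull_a, pull_b)
for key, a, b, diff in flagged:
    print(key, a, b, diff)
```

A report like this shows where two methods disagree, not which one is right; the slides make the same point when the first “gold standard” turned out to be the invalid one.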
Aggregating the Data and Handling Data Conversions

Profiling for Data Coding and Conversions
• Do the data conform to the defined values or range of values expected?
• Data Types – Alpha, Numeric, Date
• Outliers – note, transform or remove

Vital Type   Counts        Min Result   Max Result   Avg Result   StdDev Result
Height       77,506,463    -18          77,295       69.1         25.6
Weight       150,553,303   -5           2,778,808    200.4        21,827.4
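
A profile like the one in this table can be sketched in a few lines. The plausible range of 36 to 96 and the sample values are assumptions for illustration; only the extremes -18 and 77295 are taken from the Height row above.

```python
import statistics


def profile(values, low, high):
    """Summarize a numeric column and count values outside [low, high]."""
    nums = [float(v) for v in values]
    return {
        "count": len(nums),
        "min": min(nums),
        "max": max(nums),
        "avg": statistics.mean(nums),
        "stdev": statistics.pstdev(nums),  # population standard deviation
        "out_of_range": sum(1 for v in nums if not low <= v <= high),
    }


# Made-up height readings; -18 and 77295 mirror the extremes in the table.
heights = [69, 70.5, 66, -18, 72, 77295]
report = profile(heights, low=36, high=96)
print(report)
```

Even one extreme value dominates the average and standard deviation, which is why the out-of-range count is reported alongside them.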

• Larger data types take up more database space, but if sized too small, imports will fail or data will get truncated
  • bigint: Integer (whole number) data from -2^63 (-9,223,372,036,854,775,808) through 2^63 - 1 (9,223,372,036,854,775,807).
  • int: Integer (whole number) data from -2^31 (-2,147,483,648) through 2^31 - 1 (2,147,483,647).
  • smallint: Integer data from -2^15 (-32,768) through 2^15 - 1 (32,767).
  • tinyint: Integer data from 0 through 255.
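
One way to apply these ranges is to pick the smallest integer type that holds the minimum and maximum found during profiling. This is a minimal sketch: the helper name `smallest_int_type` is hypothetical and the ranges are the ones listed above.

```python
# (type name, lowest value, highest value), ordered from smallest to largest.
INT_TYPES = [
    ("tinyint", 0, 255),
    ("smallint", -2**15, 2**15 - 1),
    ("int", -2**31, 2**31 - 1),
    ("bigint", -2**63, 2**63 - 1),
]


def smallest_int_type(min_value, max_value):
    """Return the first (smallest) type whose range covers both values."""
    for name, low, high in INT_TYPES:
        if low <= min_value and max_value <= high:
            return name
    raise ValueError("range does not fit in bigint")


# Weight extremes from the profiling table: -5 through 2,778,808.
print(smallest_int_type(-5, 2_778_808))
```

Sizing a column to erroneous outliers wastes space, so this choice is best made after deciding how outliers will be handled.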

Data Coding
Outliers may skew analysis
[Figure: Average Pain Scores by VISN; vertical axis 0 to 180, VISNs 1 to 12 and 15 to 23; chart annotation “OOCH!”]
Do we convert, delete, or leave and note in the metadata? Regardless, we need to discuss with the people coding at VISN 23.

Data Conversions – Data Typing
MUMPS is not a strongly typed database, so it allows invalid dates, times, numerics…
• VistA stores dates and times as a string. To convert, simply add 1700 to the first 3 digits; the next 2 digits are the month…
• SQL (Oracle, SQL Server) datetime data types cannot contain missing days

AdmitDate     Converted AdmitDate
3000306.215   3/6/2000 21:50
3000316.101   3/16/2000 10:11
2840913       9/13/1984 0:00
3000316.102   3/16/2000 10:18
30003         3//2000
Admit Date cases shown: Complete, Missing Day, Missing Hour, Missing Minute
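
The conversion rule can be sketched as follows. Beyond the slide, this assumes the usual YYYMMDD.HHMMSS layout with trailing zeros of the time part dropped. The function name and the returned status strings are hypothetical.

```python
from datetime import datetime


def parse_vista_datetime(raw):
    """Convert a VistA date string to (datetime or None, status).

    Status is "ok", "missing day" or "invalid". Values without a day are
    reported rather than guessed, because a SQL datetime cannot hold them.
    """
    date_part, _, time_part = raw.strip().partition(".")
    if len(date_part) < 5 or not date_part.isdigit():
        return None, "invalid"
    if time_part and not time_part.isdigit():
        return None, "invalid"
    year = 1700 + int(date_part[:3])   # first 3 digits are years since 1700
    month = int(date_part[3:5])        # next 2 digits are the month
    day_digits = date_part[5:7]
    if len(day_digits) < 2 or day_digits == "00":
        return None, "missing day"
    padded = (time_part + "000000")[:6]  # restore dropped trailing zeros
    hour, minute, second = int(padded[:2]), int(padded[2:4]), int(padded[4:6])
    try:
        return datetime(year, month, int(day_digits), hour, minute, second), "ok"
    except ValueError:
        return None, "invalid"


print(parse_vista_datetime("3000306.215"))
print(parse_vista_datetime("2840913"))
print(parse_vista_datetime("30003"))
```

Returning a status next to the value keeps the decision about missing days (drop, impute, or store the parts separately) out of the parser, so it can be recorded in the metadata.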

Data Conversions
[Figure: Race; patient counts by race category, vertical axis 0 to 14,000,000]
711,474 patients have different race categories from different stations
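
Finding such patients can be sketched as grouping race values by patient across stations. The record layout `(patient_id, station, race)`, the function name, and the sample rows are assumptions for illustration.

```python
from collections import defaultdict


def conflicting_race(records):
    """Return {patient_id: set of race values} for patients with more than one.

    `records` is an iterable of (patient_id, station, race). Blank or missing
    values are ignored and case differences are not treated as conflicts.
    """
    seen = defaultdict(set)
    for patient_id, _station, race in records:
        if race and race.strip():
            seen[patient_id].add(race.strip().upper())
    return {pid: races for pid, races in seen.items() if len(races) > 1}


# Made-up records for three patients seen at several stations.
records = [
    ("P1", "442", "WHITE"),
    ("P1", "660", "white"),
    ("P2", "442", "WHITE"),
    ("P2", "554", "UNKNOWN"),
    ("P3", "660", None),
]
conflicts = conflicting_race(records)
print(conflicts)
```

Whether a value such as "UNKNOWN" should count as a real conflict is a coding decision to make with the data stewards; this sketch counts it.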

Data Conversion
Lab data
• 36,433 different laboratory test names (FY07 – FY09)
• Similar test names may contain very different results
  • Example – CREATININE: very similar test names, very different results
• Combining tests requires clinical knowledge of tests and possible result values

Creatinine Lab Test Names
Normal levels of creatinine in the blood are ~0.5 to 1.7 milligrams
Normal creatinine urine value is ~27-260 ml
[Table: Counts, Min Result, Max Result, Avg Result and Standard Dev for each of the following test names]
ALBUMIN CREATININE RATIO
AMYLASE ISOENZYMES (PANCREATIC)
BUN/CREAT RATIO
COMPUTED CREATININE CLEARANCE
CREAT 24H CONC(DC'd 9-07)
CREATININE
CREATININE (FLUID only)
CREATININE (mg/24 Hr)
CREATININE (O)
CREATININE (PRIOR TO 8-10-04)
CREATININE (random urine)
CREATININE (Ref.Lab)
CREATININE (Serum)
CREATININE (urine) - mg/24HRS
CREATININE (urines)
CREATININE {St.}
CREATININE 24H CONC
CREATININE CLEARANCE
Creatinine Clearance Result
Creatinine Serum Result
CREATININE(CRT), URINE
CREATININE(sera,blood)
CREATININE(serum/plasma)
CREATININE(spot ua or fluids)
CREATININE(ua-random)
CREATININE,urine
CREATININE,urine(prior 3-06)
CREATININE,urine24hr, not clrnce
CT CREATININE
MERCURY/CREAT RATIO
MICROALBUMIN/CREAT. RATIO
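
One way to screen similarly named tests before combining them is to average results per test name and compare each average with the expected range. The sketch below uses the approximate blood creatinine range quoted on the slide (0.5 to 1.7); the function names and the result values are made up for illustration, and a screen like this does not replace clinical review.

```python
from collections import defaultdict
from statistics import mean

SERUM_RANGE = (0.5, 1.7)  # approximate normal blood creatinine range from the slide


def average_by_test(results):
    """Group (test_name, value) pairs and return {test_name: average value}."""
    grouped = defaultdict(list)
    for name, value in results:
        grouped[name].append(value)
    return {name: mean(values) for name, values in grouped.items()}


def outside_range(averages, low, high):
    """Return the sorted test names whose average falls outside [low, high]."""
    return sorted(name for name, avg in averages.items() if not low <= avg <= high)


# Made-up results; only the test names come from the list above.
results = [
    ("CREATININE", 1.0),
    ("CREATININE", 1.4),
    ("CREATININE (Serum)", 0.9),
    ("CREATININE CLEARANCE", 125.0),
    ("CREATININE,urine", 95.0),
]
suspects = outside_range(average_by_test(results), *SERUM_RANGE)
print(suspects)
```

Names flagged this way are candidates for a different specimen or unit and should not be pooled with serum creatinine without checking.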

Different Coding Practices
• VistA has a complex hierarchical database
• Many discrete fields to store data
• Sites may choose to store data in those fields – or choose other fields
  • Ejection Fraction
  • Blood Pressure – may be stored in Discrete or Text

Coding of Text Documents at Each Site
Study: Review of History and Physical Notes for Coding of Hospital Acquired Infections
I sampled all records containing “History” to avoid “&” vs “AND” - 112,274 records

Sta3n   Document Titles – for History and Physical   Text Documents
436     HISTORY & PHYSICAL                           30,470
442     HISTORY & PHYSICAL (BURROWS)                 601
442     HISTORY & PHYSICAL (FERMELIA)                85
575     HISTORY & PHYSICAL NOTE                      77
442     HISTORY & PHYSICAL TEMPLATE                  1,212
660     HISTORY & PHYSICAL*                          657
442     HISTORY AND PHYSICAL                         12,591
554     HISTORY AND PHYSICAL                         36,068
442     HISTORY AND PHYSICAL CONSULT REPORT          1,353
554     HISTORY AND PHYSICAL EXAM SCB                22,906
442     HISTORY AND PHYSICAL O&E                     145
554     HISTORY CBCB                                 6,109
        TOTAL                                        112,274

H&P would have been missed
Sta3n   Document Titles – for History and Physical           Text Documents
660     H&P * CARDIOLOGY                                     206
660     H&P **SURGERY PRE-OP                                 4,829
554     H&P FOR VISUAL IMPAIRMENT SERVICES                   27
660     H&P GEM EVALUATION                                   21
660     H&P GENERAL SURGERY ADMIT                            86
660     H&P GERIATRICS                                       17
660     H&P MEDICAL STUDENT                                  98
660     H&P MEDICINE ADMIT                                   781
660     H&P MEDICINE INTERN ADMISSION NOTE                   22,894
660     H&P MEDICINE RESIDENT ADMISSION                      19,729
660     H&P MEDICINE RESIDENT ADMIT NOTE                     966
660     H&P MEDICINE STUDENT ADMISSION                       6,129
660     H&P MEDICINE SUB-I ADMIT NOTE                        628
660     H&P MH 3A PSYCHIATRIC ADMIT                          4,445
660     H&P MH HOMELESS PRIMARY CARE PROVIDER NEW PATIENT    112
660     H&P MH HOMELESS PROGRAM NOTE                         22
660     H&P MH SUBSTANCE ABUSE TREATMENT                     227
660     H&P MICU RESIDENT ADMISSION                          4,849
660     H&P NEUROLOGY                                        1,220
660     H&P NEUROSURGERY                                     11
660     H&P PM&R INPATIENT ADMISSION                         175
660     H&P PM&R INTERDISCIPLINARY ADMIT NOTE                718
660     H&P PODIATRY                                         71
660     H&P PRIMARY CARE                                     2,430
660     H&P PRIMARY CARE MID LEVEL                           898
660     H&P PRIMARY CARE NOTE                                1,528
660     H&P PRIMARY-CARE PROVIDER NEW-PATIENT                29,012
660     H&P PSYCHIATRIC ADMIT                                218
660     H&P PSYCHIATRY ADMIT                                 2,873
660     H&P SURGERY                                          11,826
660     H&P SURGERY (INPATIENT)                              539
660     H&P SURGERY CONSULT                                  2,165
660     H&P SURGERY UPDATE                                   1,221
660     H&P UROLOGY                                          45
660     H&P VASCULAR SURGERY ADMIT                           72
660     H&P WOMEN'S                                          29
660     H&P** PRE-OP MULTIDISCIPLINARY NOTE                  26
        TOTAL                                                121,143
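
A title filter that covers both spellings and the “H&P” abbreviation can be sketched with a regular expression. The pattern is an assumption drawn only from the title variants listed in these tables; other stations may use forms it does not cover.

```python
import re

# Matches "HISTORY & PHYSICAL", "HISTORY AND PHYSICAL" and "H&P" as whole words.
HP_TITLE = re.compile(
    r"\bHISTORY\s*(?:&|AND)\s*PHYSICAL\b|\bH\s*&\s*P\b",
    re.IGNORECASE,
)


def is_history_and_physical(title):
    """Return True if a document title looks like a History and Physical note."""
    return HP_TITLE.search(title) is not None


titles = [
    "HISTORY & PHYSICAL (BURROWS)",
    "HISTORY AND PHYSICAL O&E",
    "H&P MEDICINE INTERN ADMISSION NOTE",
    "H&P** PRE-OP MULTIDISCIPLINARY NOTE",
    "DISCHARGE SUMMARY",
]
matched = [t for t in titles if is_history_and_physical(t)]
print(matched)
```

A plain search for “History” would have returned only the first two of these titles, which is the gap the second table illustrates.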

What Do We Do?
• Metadata - data about the data. More time needs to be spent with data owners to document the data
• Data stewards need to understand the coding practices in their institutions, and researchers need to work with data stewards
• Data validation – each data pull should be validated against some standard
• Data profiling - each dataset will need to be analyzed for what it contains, the range of values…
• Standardization – working with standards bodies at the hospital and national levels