Adam Wilcox, PhD
Associate Professor of Biomedical Informatics
Benefits
◦ Unobtrusive
◦ Fast & inexpensive
◦ Easy

Challenges
◦ Availability
◦ Quality
◦ Security

What data are available?

How good are the data?

How do I get data?

What’s the worst that can happen?









Names
MRNs
Addresses
Telephone and fax #s
SSNs
Email addresses
Dates
Certificate numbers
Employers' names/addresses
Geographic subdivisions smaller than state, except initial 3 digits of zip code
Account #s
URLs
IP addresses
Biometric identifiers
Full face photographs
Any other characteristics that may be used individually or in combination to identify the individual
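
As an illustration of how the identifier list above might be applied to an extract, here is a minimal Python sketch. The field names (`name`, `mrn`, `zip_code`, and so on) are hypothetical and stand in for whatever schema a real data set uses; they are not taken from the warehouse. The sketch drops the listed direct identifiers and keeps only the initial 3 digits of the zip code. It does not address the final item on the list (other characteristics that identify in combination), so it should be read as a starting point and not as a complete de-identification procedure.

```python
# Hypothetical field names; a real extract will have its own schema.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "mrn", "address", "phone", "fax", "ssn", "email",
    "certificate_number", "employer_name", "employer_address",
    "account_number", "url", "ip_address", "biometric_id", "face_photo",
}

# Dates appear on the identifier list, so date fields are dropped as well.
DATE_FIELDS = {"birth_date", "visit_date", "admit_date", "discharge_date"}


def scrub_record(record: dict) -> dict:
    """Return a copy of `record` without the listed identifier fields.

    The zip code is reduced to its initial 3 digits, the one geographic
    detail below state level that the list allows.
    """
    cleaned = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIER_FIELDS or field in DATE_FIELDS:
            continue  # drop direct identifiers and dates entirely
        if field == "zip_code":
            cleaned["zip3"] = str(value)[:3]  # keep initial 3 digits only
            continue
        cleaned[field] = value
    return cleaned


example = {
    "name": "Jane Doe",
    "mrn": "0012345",
    "zip_code": "10032-1234",
    "visit_date": "2011-03-01",
    "diagnosis": "E11.9",
}
scrubbed = scrub_record(example)
```

A deny-list like this only removes fields it knows about; free-text fields such as notes can still carry names or dates and would need separate handling.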

Notification of Breach
◦ If more than 500 patients, HHS also notified
◦ Media

Civil penalties
◦ Up to $250,000
◦ Repeat violations up to $1.5M

1994: Created, sponsored by Columbia University Department of Medical Informatics and Office of Clinical Trials
◦ Populated with data from existing clinical data repository
◦ Supporting clinical research

1998: Columbia + Cornell = NewYork-Presbyterian Hospital
◦ Warehouse funded by NYPH
◦ Goal to incorporate and provide data across whole system

2004: Formal analysis of CDW user needs by Clinical Quality and Information Technology Committee (CQIT)
◦ Creation of Data Warehousing Subgroup
◦ Need to bring together disparate clinical data sources
◦ Need to manage user requests for data








Patient demographics
Visit history
Diagnoses
Procedures
Vital signs
Medications
Flowsheet elements, structured notes
(Notes)
[Figure: Patients and Visits by year, 1985 to 2011; y-axis 0 to 1,400,000]
[Figure: counts by category, y-axis 0 to 4.5M. Categories: Asian, Black/Non-Hispanic, Declined, White Hispanic, American Indian, Other, Pacific Islander, Unknown, White/Non-Hispanic; Male, Female]
Data type                        Count
Diagnoses                        3.3M
Procedures                       570K
Lab tests
Medications                      1.5M
Vital signs                      ~80% of patients
Flowsheet/structured elements    400M
Notes                            6.3M
1: Gain access to data (to be updated in coming weeks)
◦ I have a WebCIS login
   Y: Submit HIPAA D preparatory to research forms, then receive HIPAA approval
   N: Contact Adam Wilcox
◦ De-identified databases: RedEx, I2B2*, Other**

2: Explore data using tools & select variables
◦ Top 50 Variables List & Meaningful Use variables
◦ Pin down key variables to submit via DISCOVERY

3: Request & refine data from Clinical Data Warehouse (CDW)
◦ Fill out DISCOVERY form to request data
◦ What level of identifying patient information are you requesting?
   De-identified: Covered by HIPAA G§
   Limited***
   Identifiable: Fill out HIPAA B, receive HIPAA approval, submit IRB & receive approval
◦ Work with programmer to refine data
◦ Receive data set

4: Data management & analysis
◦ Import & manage data for analysis using: SAS, Stata, REDCap, AMALGA, Other
◦ Loop back to DISCOVERY for approval to publish data and findings
◦ Share results with CER Studio regarding findings & DISCOVERY process
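
The branch in step 3 on the level of identifying patient information can be sketched as a small lookup. This is one reading of the flowchart above, not an official rule set: the assignment of forms to levels follows the order in which the boxes appear, and the requirements for a Limited data set sit in a footnote (***) that is not part of this text, so the sketch leaves them unspecified. The names `IdentifiabilityLevel` and `approval_steps` are hypothetical.

```python
from enum import Enum
from typing import List, Optional


class IdentifiabilityLevel(Enum):
    DE_IDENTIFIED = "de-identified"
    LIMITED = "limited"
    IDENTIFIABLE = "identifiable"


# Assumed mapping, read off the flowchart box order. None means the
# flowchart text does not state the steps (they are in a footnote).
_STEPS = {
    IdentifiabilityLevel.DE_IDENTIFIED: ["Covered by HIPAA G"],
    IdentifiabilityLevel.LIMITED: None,
    IdentifiabilityLevel.IDENTIFIABLE: [
        "Fill out HIPAA B",
        "Receive HIPAA approval",
        "Submit IRB & receive approval",
    ],
}


def approval_steps(level: IdentifiabilityLevel) -> Optional[List[str]]:
    """Return the approval steps for a requested level, or None if unstated."""
    steps = _STEPS[level]
    return list(steps) if steps is not None else None


identifiable_steps = approval_steps(IdentifiabilityLevel.IDENTIFIABLE)
```

Returning `None` for the Limited level, instead of guessing, keeps the gap visible to whoever fills in the footnoted requirements.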
Existing Studies
8,000+ surveys
Ambulatory Clinics
Community Outreach Center
Household Surveys
• Integration of data
• Collection and storage of patient-reported data
• Identify individuals based upon eligibility criteria
• EHR plug-in
• Informatics tools to support data retrieval
• Intervention delivery
• De-identify and link datasets




Identify priority disparity areas for CER
Integrate statistical expertise via preliminary studies
Validation analyses on cost and service utilization
Identify high-risk physical & mental comorbidities

What data are available?

How good are the data?

How do I get data?

What’s the worst that can happen?

Washington Heights / Inwood Informatics Infrastructure for