Adam Wilcox, PhD
Associate Professor of Biomedical Informatics

Benefits
◦ Unobtrusive
◦ Fast & inexpensive
◦ Easy

Challenges
◦ Availability
◦ Quality
◦ Security

What data are available?
How good are the data?
How do I get data?
What's the worst that can happen?

◦ Names
◦ MRNs
◦ Addresses
◦ Telephone and fax #s
◦ SSNs
◦ Email addresses
◦ Dates
◦ Certificate numbers
◦ Employers' names/addresses
◦ Geographic subdivisions smaller than a state, except the initial 3 digits of the ZIP code
◦ Account #s
◦ URLs
◦ IP addresses
◦ Biometric identifiers
◦ Full-face photographs
◦ Any other characteristics that may be used individually or in combination to identify the individual

Notification of breach
◦ If more than 500 patients are affected, HHS is also notified
◦ Media

Civil penalties
◦ Up to $250,000
◦ Repeat violations: up to $1.5M

1994: Created, sponsored by the Columbia University Department of Medical Informatics and the Office of Clinical Trials
◦ Populated with data from the existing clinical data repository
◦ Supporting clinical research
1998: Columbia + Cornell = NewYork-Presbyterian Hospital
◦ Warehouse funded by NYPH
◦ Goal: incorporate and provide data across the whole system
2004: Formal analysis of clinical data warehouse (CDW) user needs by the Clinical Quality and Information Technology Committee (CQIT)
◦ Creation of a Data Warehousing Subgroup
◦ Need to bring together disparate clinical data sources
◦ Need to manage user requests for data

◦ Patient demographics
◦ Visit history
◦ Diagnoses
◦ Procedures
◦ Vital signs
◦ Medications
◦ Flowsheet elements, structured notes (Notes)

[Chart: number of patients and visits per year, 1985 to 2011]
[Chart: patient counts by age, by race/ethnicity (Asian, Black/Non-Hispanic, Declined, White, Hispanic, American Indian, Other, Pacific Islander, Unknown, White/Non-Hispanic), and by sex (Male, Female)]
[Chart: record counts by data type]
Data type                        Count
Diagnoses                        3.3M
Procedures                       570K
Lab tests
Medications                      1.5M
Vital signs                      ~80% of patients
Flowsheet/structured elements    400M
Notes                            6.3M

1: Gain access to data (to be updated in coming weeks)
◦ Do I have a WebCIS login?
    ◦ Y: Submit HIPAA D (preparatory to research) forms, then receive HIPAA approval
    ◦ N: Contact Adam Wilcox
2: Explore data using tools & select variables
◦ Top 50 Variables List & Meaningful Use variables
◦ De-identified databases: RedEx, I2B2*, Other**
◦ Pin down key variables to submit via DISCOVERY
3: Request & refine data from the Clinical Data Warehouse (CDW)
◦ Fill out the DISCOVERY form to request data
◦ What level of identifying patient information are you requesting?
    ◦ De-identified: covered by HIPAA G§
    ◦ Limited*** or Identifiable: fill out HIPAA B, receive HIPAA approval, then submit to the IRB and receive approval
◦ Work with a programmer to refine data
◦ Receive data set
4: Data management & analysis
◦ Import & manage data for analysis using SAS, Stata, REDCap, AMALGA, or other tools
◦ Loop back to DISCOVERY for approval to publish data and findings
◦ Share results with CER Studio regarding findings & the DISCOVERY process

◦ Existing studies (8,000+ surveys)
◦ Ambulatory clinics
◦ Community Outreach Center
◦ Household surveys

◦ Integration of data
◦ Collection and storage of patient-reported data
◦ Identify individuals based upon eligibility criteria
◦ EHR plug-in
◦ Informatics tools to support data retrieval
◦ Intervention delivery
◦ De-identify and link datasets
◦ Identify priority disparity areas for CER
◦ Integrate statistical expertise via preliminary studies
◦ Validation analyses on cost and service utilization
◦ Identify high-risk physical & mental comorbidities
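The "Notification of breach" slide reduces to a threshold check on the number of patients affected. Below is a minimal Python sketch that follows the slide's simplified rule. It assumes the "Media" bullet applies at the same more-than-500 threshold as HHS. The function name `notification_targets` is hypothetical. The HIPAA Breach Notification Rule itself is more detailed: HHS must be notified promptly for breaches involving 500 or more individuals, smaller breaches are reported to HHS in an annual log, and media notice depends on how many residents of a single state are affected. This sketch models none of those details.

```python
def notification_targets(patients_affected: int) -> list[str]:
    """Return who is notified of a breach, per the slide's simplified rule.

    Assumption (hypothetical reading of the slide): "Media" shares the
    more-than-500 threshold with HHS. The actual regulation is more
    detailed and is not modeled here.
    """
    if patients_affected < 0:
        raise ValueError("patients_affected must be non-negative")
    targets = ["affected individuals"]  # individuals are always notified
    if patients_affected > 500:
        targets += ["HHS", "media"]     # larger breaches: also HHS and media
    return targets


print(notification_targets(120))   # ['affected individuals']
print(notification_targets(2000))  # ['affected individuals', 'HHS', 'media']
```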
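The identifier list earlier in the deck matches the HIPAA Safe Harbor de-identification method, and "De-identify and link datasets" appears among the tasks above. Below is a minimal Python sketch of the mechanical part of that step. It assumes a flat dict-per-record layout. The field names, `DIRECT_IDENTIFIERS`, `truncate_zip`, and `deidentify` are hypothetical. The sketch does three things: it drops direct identifiers, reduces dates to the year, and keeps only the first 3 digits of a ZIP code. It does not handle free-text notes, ages over 89, or the catch-all "any other characteristics" item, so it covers only part of Safe Harbor.

```python
from datetime import date

# Hypothetical field names for the direct identifiers listed on the slide.
DIRECT_IDENTIFIERS = {
    "name", "mrn", "address", "phone", "fax", "ssn", "email",
    "certificate_number", "employer", "account_number", "url",
    "ip_address", "biometric_id", "face_photo",
}


def truncate_zip(zip_code: str, restricted_prefixes: frozenset = frozenset()) -> str:
    """Keep only the initial 3 digits of a ZIP code.

    Under Safe Harbor, a 3-digit prefix whose area holds 20,000 or fewer
    people must become "000"; the caller supplies those prefixes.
    """
    prefix = zip_code[:3]
    return "000" if prefix in restricted_prefixes else prefix


def deidentify(record: dict, restricted_prefixes: frozenset = frozenset()) -> dict:
    """Return a copy of `record` with direct identifiers removed,
    dates reduced to the year, and ZIP codes truncated."""
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue  # drop the field entirely
        if isinstance(value, date):  # also matches datetime objects
            out[field] = value.year  # keep the year only
        elif field == "zip":
            out[field] = truncate_zip(value, restricted_prefixes)
        else:
            out[field] = value
    return out


if __name__ == "__main__":
    visit = {
        "name": "Jane Doe",
        "mrn": "0012345",
        "zip": "10032",
        "birth_date": date(1950, 4, 17),
        "admit_date": date(2011, 6, 2),
        "diagnosis": "I10",
    }
    print(deidentify(visit))
    # {'zip': '100', 'birth_date': 1950, 'admit_date': 2011, 'diagnosis': 'I10'}
```

The sketch leaves the restricted-prefix list as an input rather than hard-coding it, because that list depends on current census population data.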