HIPAA and its Implications on Epidemiological Research Using Large Databases
K. Arnold Chan, MD, ScD
Harvard School of Public Health
Channing Laboratory, Brigham & Women's Hospital and Harvard Medical School

Brief outline of this presentation
● Using large linked automated data for public health research
● Data development processes to ensure HIPAA compliance
● Examples
● Some thoughts

Two types of data for public health research
● Primary data
  – Prospectively collected
  – Well-designed data collection tool
  – Informed consent
● Secondary data
  – Data originally collected for other purposes
  – May be proprietary
  – Privacy and confidentiality (particularly important if no prior authorization)
  – Different data systems

Large linked healthcare databases
● Health insurance claims data
  – Medicaid
  – Medicare
  – Managed Care Organizations (MCO)
● Automated medical records
● Hospital / Clinic IT systems
● Availability of written records
● Need to contact patients / individuals?

Public health research within MCOs
● Harvard Community Health Plan (subsequently became Harvard Pilgrim HealthCare)
● Kaiser Permanente (several states)
● Group Health Cooperative (Seattle area)
● Others
● HMO Research Network
  – 10+ MCOs across the U.S.

Public health research within MCOs
● Different types of MCOs
  – Group model
  – Staff model
  – Different relationship with hospitals
  – Implications on data access
● MCOs with research programs
  – Separate research departments
  – Full-time investigators and support staff

Data elements in the MCO data
● Demographic information
● Membership
  – Start date, termination date, benefit plan, ...
● Office visits
  – Type of visit, diagnosis(es), special procedures
● Special examinations
  – Radiology, Laboratory examinations
● Hospitalizations
● Drug dispensings
● Linkable by a unique ID

HIPAA and Research with Databases
● Authorization from individual research subjects not feasible
● Individual authorization may be waived by Institutional Review Board or Privacy Board
  – Minimal Risk
  – Data reported in aggregate fashion
    ● No single-case report
  – "Minimum necessary" principle
  – De-identification

HIPAA and Research with Databases
● Single MCO studies
  – Investigators and research staff are MCO employees
● Multiple-MCO studies
  – May involve transfer of data across MCOs or to a Data Center
● Other types of studies not covered in this presentation
  – e.g. Generate a de-identified dataset for public or commercial use

HIPAA and data development
● Do not move individual level data unless absolutely necessary
  – Generate summary tables at each study site
  – Combine the tables for final report
  – Smalley et al. Contraindicated use of cisapride: the impact of an FDA regulatory action. JAMA 2000; 284: 3036-9.

HIPAA and data development
● Randomly generated Study ID to replace True ID
  – Crosswalk between the two stored at secured location
  – Destroy the crosswalk after successful linkage of data and quality check
  – Implications for storage and back-up

HIPAA and data development
● Roll-up / transform variables
  – Age --> Age groups
  – National Drug Code --> Drug or Group of drugs
  – ICD-9 diagnosis code --> Disease
● e.g. A man born on Dec 10, 1934 with diagnosis code xxx.yy received drug 55555-333-22
  – 65-70 y/o m with Heart Failure received Digoxin

HIPAA and data development
● Preserve temporal sequence of events but disguise the real dates
● e.g.
Drug use during pregnancy study
  – 29 year-old received 55555-333-22 on Nov 25, 1999 and delivered a baby on Dec 10, 1999
  – --> 26-30 year-old mother delivered in 1999, baby exposed to amoxicillin at -16 days

HIPAA and data development
● Only extract information relevant to the study
  – e.g. A study of osteoporosis does not require information on subjects' mental health status
● Co-morbid conditions may be relevant
  – Use proxy measures to describe level of comorbidity
    ● Charlson's Index (based on concomitant diagnoses)
    ● Chronic Disease Score (based on co-medications)

HIPAA and data development
● Geocoding
  – Describe socioeconomic status of study subjects based on census tract data
  – Send out (Study ID, address) to a geocoding firm
  – (Study ID, X1, X2, X3) returned
    ● X1 : education level
    ● X2 : income level
    ● X3 : race/ethnicity information

An example
Finkelstein et al. Decreasing Antibiotic Use Among US Children: The Impact of Changing Diagnosis Patterns. Pediatrics 2003; 112: 620-7.
● Data elements involved
  – Date of birth, gender
  – Membership
  – Drug dispensings
  – Diagnoses in close proximity to antibiotic dispensings
● Data from nine MCOs

Finkelstein et al.
Pediatric antibiotics use study
● Data development at each MCO
  – Extract antibiotic use information
  – Extract diagnoses of interest (infections)
  – Use date of birth, gender, and membership data to calculate person-time of interest
● Refined, aggregate data forwarded to the Data Center
  – Rate of antibiotic use = number of antibiotic uses per 1,000 person-years, for each age-gender group

HIPAA and data development
● Individual identification is needed for certain types of research
  – Obtain medical records
  – Contact patient to conduct interview and/or request specimen
  – Linkage with external data
    ● Cancer registry
    ● National Death Index

HIPAA and data development
● The process
  – Data extraction, transformation, reduction, and de-identification carried out at each MCO
  – Governed by State laws and local HIPAA-compliant Standard Operating Procedures
  – Principle of Limited Dataset / Minimum necessary
● The goal
  – Highly processed and de-identified data available for concatenation across study sites and complex analyses

k-anonymity and large datasets
● The goal
  – A de-identified dataset at a certain level of individual anonymity
A 43 year-old man with hypertension, diabetes, and anxiety, taking atenolol, rosiglitazone, and lorazepam
vs.
A man 40-45 taking a beta-blocker and a thiazolidinedione

HIPAA, Data Storage and Access
● Implications on Data Backup Plans
  – Data need to be destroyed after the report is published
● Data only used to support pre-defined analyses
● Ancillary analyses are possible after IRB review and approval

Epidemiology studies using large databases
● In the old days ...
  – Give me all the data, do what I say ...
  – What if the investigator / reviewer wants to do THIS analysis?
  – Use existing datasets to test new hypotheses
● Good research practice
  – Define necessary data elements according to research protocol
  – Pre-defined analytic plan

Epidemiology studies using large databases
● Keys to protection of human subjects
  – Competent, responsible investigators and staff
  – IRB review and oversight
  – Data development guidelines
    ● e.g. Good Epidemiology Practice
  – Information technology
● Some reasonable rules/guidelines are better than no guideline
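
Illustration (not part of the original slides): the steps listed under "HIPAA and data development" and "k-anonymity and large datasets" can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the procedure used at any MCO. All function names, field names, and example records are hypothetical; the 5-year age bands follow the "26-30" example, and the day offset is a plain calendar-day difference. For the example dates Nov 25 and Dec 10, 1999 that difference is -15, so the -16 shown earlier may reflect a different counting convention that this sketch does not try to reproduce.

```python
import secrets
from collections import Counter
from datetime import date


def assign_study_ids(true_ids):
    """Return a crosswalk {true ID: random Study ID}.

    The crosswalk is the object that would be stored at a secured
    location and destroyed after linkage and quality checks.
    """
    crosswalk = {}
    used = set()
    for true_id in true_ids:
        if true_id in crosswalk:  # same person seen again: keep the first Study ID
            continue
        study_id = secrets.token_hex(8)
        while study_id in used:  # regenerate on the (unlikely) collision
            study_id = secrets.token_hex(8)
        used.add(study_id)
        crosswalk[true_id] = study_id
    return crosswalk


def age_group(age, width=5):
    """Roll an exact age (>= 1) up into a band such as '26-30'."""
    if age < 1:
        raise ValueError("this sketch only handles ages of 1 year or more")
    lower = ((age - 1) // width) * width + 1
    return f"{lower}-{lower + width - 1}"


def days_relative_to(event_date, index_date):
    """Signed day offset of an event from an index date (negative = before)."""
    return (event_date - index_date).days


def deidentify(rows, crosswalk):
    """Replace IDs, roll up age, and turn real dates into relative offsets."""
    return [
        {
            "study_id": crosswalk[row["true_id"]],
            "age_group": age_group(row["age"]),
            "delivery_year": row["delivered"].year,
            "exposure_day": days_relative_to(row["dispensed"], row["delivered"]),
        }
        for row in rows
    ]


def smallest_group(records, quasi_identifiers):
    """Size of the smallest group sharing the same quasi-identifier values.

    The records are k-anonymous on these columns when this value is >= k.
    """
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(counts.values(), default=0)


# Hypothetical input records (invented for illustration only).
example_rows = [
    {"true_id": "A001", "age": 29,
     "dispensed": date(1999, 11, 25), "delivered": date(1999, 12, 10)},
    {"true_id": "A002", "age": 27,
     "dispensed": date(1999, 3, 1), "delivered": date(1999, 6, 15)},
]

example_crosswalk = assign_study_ids(row["true_id"] for row in example_rows)
example_output = deidentify(example_rows, example_crosswalk)
```

Checking `smallest_group(example_output, ["age_group", "delivery_year"])` against a chosen k before data leave the study site is one simple way to act on the k-anonymity goal; which columns count as quasi-identifiers is a study-specific decision.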