HIPAA and its Implications on Epidemiological Research Using Large Databases

K. Arnold Chan, MD, ScD
Harvard School of Public Health
Channing Laboratory,
Brigham & Women’s Hospital
and Harvard Medical School
Brief outline of this presentation
● Using large linked automated data for public health research
● Data development processes to ensure HIPAA compliance
● Examples
● Some thoughts
Two types of data for public health research
● Primary data
– Prospectively collected
– Well-designed data collection tool
– Informed consent
● Secondary data
– Data originally collected for other purposes
– May be proprietary
– Privacy and confidentiality (particularly important if no prior authorization)
– Different data systems
Large linked healthcare databases
● Health insurance claims data
– Medicaid
– Medicare
– Managed Care Organizations (MCO)
● Automated medical records
● Hospital / Clinic IT systems
● Availability of written records
● Need to contact patients / individuals?
Public health research within MCOs
● Harvard Community Health Plan (subsequently became Harvard Pilgrim HealthCare)
● Kaiser Permanente (several states)
● Group Health Cooperative (Seattle area)
● Others
● HMO Research Network
– 10+ MCOs across the U.S.
Public health research within MCOs
● Different types of MCOs
– Group model
– Staff model
– Different relationship with hospitals
– Implications on data access
● MCOs with research programs
– Separate research departments
– Full-time investigators and support staff
Data elements in the MCO data
● Demographic information
● Membership
– Start date, termination date, benefit plan, ...
● Office visits
– Type of visit, diagnosis(es), special procedures
● Special examinations
– Radiology, Laboratory examinations
● Hospitalizations
● Drug dispensings
● Linkable by a unique ID
HIPAA and Research with Databases
● Authorization from individual research subjects not feasible
● Individual authorization may be waived by Institutional Review Board or Privacy Board
– Minimal Risk
– Data reported in aggregate fashion
  ● No single-case report
– “Minimum necessary” principle
– De-identification
HIPAA and Research with Databases
● Single MCO studies
– Investigators and research staff are MCO employees
● Multiple-MCO studies
– May involve transfer of data across MCOs or to a Data Center
● Other types of studies not covered in this presentation
– e.g. Generate a de-identified dataset for public or commercial use
HIPAA and data development
● Do not move individual level data unless absolutely necessary
– Generate summary tables at each study site
– Combine the tables for final report
– Smalley et al. Contraindicated use of cisapride: the impact of an FDA regulatory action. JAMA 2000; 284: 3036-9.
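The "summary tables at each site, combined centrally" idea can be sketched roughly as follows. This is a minimal illustration, not the procedure used in the cited study; the function names, field names, and strata are all hypothetical.

```python
from collections import Counter

def summarize_site(records):
    """Runs inside one study site: reduce individual rows to stratum counts."""
    counts = Counter()
    for rec in records:
        counts[(rec["age_group"], rec["sex"])] += 1
    return dict(counts)

def combine_tables(site_tables):
    """Runs at the coordinating center: only summary tables arrive here."""
    total = Counter()
    for table in site_tables:
        total.update(table)  # adds counts stratum by stratum
    return dict(total)

# Toy records; in practice these never leave the site.
site_a = summarize_site([
    {"age_group": "61-65", "sex": "M"},
    {"age_group": "61-65", "sex": "M"},
    {"age_group": "66-70", "sex": "F"},
])
site_b = summarize_site([
    {"age_group": "61-65", "sex": "M"},
])
combined = combine_tables([site_a, site_b])
```

Only the per-stratum counts cross site boundaries, so no individual level row is moved.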
HIPAA and data development
● Randomly generated Study ID to replace True ID
– Crosswalk between the two stored at secured location
– Destroy the crosswalk after successful linkage of data and quality check
– Implications for storage and back-up
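A minimal sketch of the Study ID replacement, assuming records are simple dictionaries keyed by a hypothetical `true_id` field. The helper names are illustrative. Clearing the in-memory crosswalk here only stands in for destruction: any stored or backed-up copy would need to be handled separately.

```python
import secrets

def build_crosswalk(true_ids):
    """Map each True ID to a random Study ID with no derivable relationship."""
    crosswalk = {}
    used = set()
    for true_id in true_ids:
        if true_id in crosswalk:
            continue
        study_id = secrets.token_hex(8)
        while study_id in used:  # collisions are very unlikely, but keep IDs unique
            study_id = secrets.token_hex(8)
        used.add(study_id)
        crosswalk[true_id] = study_id
    return crosswalk

def apply_crosswalk(rows, crosswalk, id_field="true_id"):
    """Return copies of rows with the True ID replaced by the Study ID."""
    out = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != id_field}
        new_row["study_id"] = crosswalk[row[id_field]]
        out.append(new_row)
    return out

# Toy files from one site (values are made up).
membership = [{"true_id": "A001", "plan": "HMO"}, {"true_id": "B002", "plan": "PPO"}]
dispensings = [{"true_id": "A001", "drug": "digoxin"}]

crosswalk = build_crosswalk(r["true_id"] for r in membership)
membership_out = apply_crosswalk(membership, crosswalk)
dispensings_out = apply_crosswalk(dispensings, crosswalk)

# After linkage and quality checks pass, the crosswalk is discarded.
crosswalk.clear()
```

Because the same crosswalk is applied to every file before it is discarded, the files stay linkable by Study ID.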
HIPAA and data development
● Roll-up / transform variables
– Age --> Age groups
– National Drug Code --> Drug or Group of drugs
– ICD-9 diagnosis code --> Disease
e.g. A man born on Dec 10, 1934 with diagnosis code xxx.yy received drug 55555-333-22
--> 65-70 y/o male with Heart Failure received Digoxin
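The roll-up can be sketched as below. The lookup tables are hypothetical stand-ins for real NDC and ICD-9 references, the `as_of` date is an assumption, and the 5-year band edges are one possible choice that will not necessarily match the band labels used on the slide.

```python
from datetime import date

# Hypothetical lookup tables; real ones would come from NDC and ICD-9 references.
NDC_TO_DRUG = {"55555-333-22": "Digoxin"}
ICD9_TO_DISEASE = {"xxx.yy": "Heart Failure"}

def age_on(birth_date, as_of):
    """Age in completed years on the as_of date."""
    years = as_of.year - birth_date.year
    if (as_of.month, as_of.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years

def age_group(age, width=5):
    """Assumed banding: 0-4, 5-9, ..., 65-69, ..."""
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"

def roll_up(record, as_of):
    """Replace exact values with coarser categories."""
    return {
        "age_group": age_group(age_on(record["birth_date"], as_of)),
        "sex": record["sex"],
        "disease": ICD9_TO_DISEASE[record["icd9"]],
        "drug": NDC_TO_DRUG[record["ndc"]],
    }

raw = {"birth_date": date(1934, 12, 10), "sex": "M",
       "icd9": "xxx.yy", "ndc": "55555-333-22"}
rolled = roll_up(raw, as_of=date(2003, 6, 30))
```

The date of birth, the exact diagnosis code, and the product-level drug code do not appear in the rolled-up record.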
HIPAA and data development
● Preserve temporal sequence of events but disguise the real dates
● e.g. Drug use during pregnancy study
– 29 year-old received 55555-333-22 on Nov 25, 1999 and delivered a baby on Dec 10, 1999
-->
– 26-30 year-old mother delivered in 1999, baby exposed to amoxicillin at -15 days
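One way to keep the sequence while hiding real dates is to express every event as a day offset from an index event and retain only the index year. A minimal sketch with hypothetical names, using the dates from the pregnancy example (Nov 25 to Dec 10, 1999 is 15 days):

```python
from datetime import date

def to_relative_days(events, index_event):
    """Express each event date as a day offset from the index event."""
    index_date = events[index_event]
    return {name: (d - index_date).days for name, d in events.items()}

events = {
    "dispensing": date(1999, 11, 25),
    "delivery": date(1999, 12, 10),
}
relative = to_relative_days(events, index_event="delivery")
delivery_year = events["delivery"].year  # only the year is kept
```

Negative offsets mark events before delivery, so the order and spacing of events survive even though the calendar dates are dropped.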
HIPAA and data development
● Only extract information relevant to the study
– e.g. A study of osteoporosis does not require information on subjects' mental health status
● Co-morbid conditions may be relevant
– Use proxy measures to describe level of comorbidity
  ● Charlson's Index (based on concomitant diagnoses)
  ● Chronic Disease Score (based on co-medications)
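A rough sketch of extracting only protocol fields and replacing the diagnosis list with a single summary score. The field list is hypothetical, and the weights are placeholders for illustration only: they are not the published Charlson or Chronic Disease Score weights.

```python
# Fields the hypothetical protocol needs; everything else is left behind.
PROTOCOL_FIELDS = ("study_id", "age_group", "sex", "fracture")

# Placeholder weights for illustration; NOT a published comorbidity index.
ILLUSTRATIVE_WEIGHTS = {"heart failure": 1, "diabetes": 1, "metastatic cancer": 6}

def comorbidity_score(diagnoses, weights=ILLUSTRATIVE_WEIGHTS):
    """Sum the weights of distinct diagnoses; unlisted diagnoses count as 0."""
    return sum(weights.get(dx, 0) for dx in set(diagnoses))

def extract_for_study(record):
    """Keep protocol fields and a summary score, not the diagnoses themselves."""
    out = {field: record[field] for field in PROTOCOL_FIELDS}
    out["comorbidity_score"] = comorbidity_score(record["diagnoses"])
    return out

record = {
    "study_id": "S1", "age_group": "65-69", "sex": "F", "fracture": True,
    "diagnoses": ["diabetes", "heart failure", "depression"],
    "mental_health_visits": 4,
}
extracted = extract_for_study(record)
```

The score carries the level of comorbidity into the analysis file without carrying the individual diagnoses.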
HIPAA and data development
● Geocoding
– Describe socioeconomic status of study subjects based on census tract data
– Send out (Study ID, address) to a geocoding firm
– (Study ID, X1, X2, X3) returned
  ● X1 : education level
  ● X2 : income level
  ● X3 : race/ethnicity information
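The geocoding exchange can be sketched as two steps: build an outgoing file that holds nothing but Study ID and address, then merge the returned tract-level values by Study ID. Function names, field names, and the toy values are all assumptions.

```python
def outgoing_file(subjects):
    """Only (Study ID, address) leaves the site; no clinical data is sent."""
    return [(s["study_id"], s["address"]) for s in subjects]

def merge_returned(subjects, returned):
    """Attach tract-level X1..X3 by Study ID and drop the address."""
    by_id = {study_id: (x1, x2, x3) for study_id, x1, x2, x3 in returned}
    merged = []
    for s in subjects:
        x1, x2, x3 = by_id[s["study_id"]]
        row = {k: v for k, v in s.items() if k != "address"}
        row.update({"x1_education": x1, "x2_income": x2, "x3_race_ethnicity": x3})
        merged.append(row)
    return merged

# Toy subject and toy returned values (made up for illustration).
subjects = [{"study_id": "S1", "address": "1 Example St", "drug": "Digoxin"}]
to_send = outgoing_file(subjects)
returned = [("S1", "tract-level education", "tract-level income", "tract-level race/ethnicity")]
analysis_rows = merge_returned(subjects, returned)
```

The geocoding firm sees addresses but no health data, and the analysis file holds tract-level measures but no address.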
An example
Finkelstein et al. Decreasing Antibiotic Use Among US Children: The Impact of Changing Diagnosis Patterns. Pediatrics 2003; 112: 620-7.
● Data elements involved
– Date of birth, gender
– Membership
– Drug dispensings
– Diagnoses in close proximity to antibiotic dispensings
● Data from nine MCOs
Finkelstein et al. Pediatric antibiotics use study
● Data development at each MCO
– Extract antibiotic use information
– Extract diagnoses of interest (infections)
– Use date of birth, gender, and membership data to calculate person-time of interest
● Refined, aggregate data forwarded to the Data Center
– Rate of antibiotic use = # of antibiotic uses per 1,000 person-years, for each age-gender group
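A simplified sketch of the site-level rate calculation. It assumes each member stays in one age-gender group for the whole membership span, which a real pediatric study would not assume since children age across groups, and it approximates a year as 365.25 days. All names and toy values are hypothetical.

```python
from collections import defaultdict
from datetime import date

def person_years(start, end):
    """Membership span in years, using 365.25 days per year."""
    return (end - start).days / 365.25

def rates_per_1000(members, dispensings):
    """Dispensings per 1,000 person-years for each (age group, sex) stratum."""
    py = defaultdict(float)
    events = defaultdict(int)
    group_of = {}
    for m in members:
        group = (m["age_group"], m["sex"])
        group_of[m["study_id"]] = group
        py[group] += person_years(m["start"], m["end"])
    for d in dispensings:
        events[group_of[d["study_id"]]] += 1
    return {g: 1000 * events[g] / py[g] for g in py if py[g] > 0}

members = [
    {"study_id": "S1", "age_group": "0-4", "sex": "F",
     "start": date(2000, 1, 1), "end": date(2001, 1, 1)},
    {"study_id": "S2", "age_group": "0-4", "sex": "F",
     "start": date(2001, 1, 1), "end": date(2002, 1, 1)},
]
dispensings = [{"study_id": "S1"}, {"study_id": "S1"}, {"study_id": "S2"}]
rates = rates_per_1000(members, dispensings)
```

Only the resulting stratum-level rates, or the counts and person-years behind them, would be forwarded to the Data Center.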
HIPAA and data development
● Individual identification is needed for certain types of research
– Obtain medical records
– Contact patient to conduct interview and/or request specimen
– Linkage with external data
  ● Cancer registry
  ● National Death Index
HIPAA and data development
● The process
– Data extraction, transformation, reduction, and de-identification carried out at each MCO
– Governed by State laws and local HIPAA-compliant Standard Operating Procedures
– Principle of Limited Dataset / Minimum necessary
● The goal
– Highly processed and de-identified data available for concatenation across study sites and complex analyses
k-anonymity and large datasets
● The goal
– A de-identified dataset at a certain level of individual anonymity
A 43 year-old man with hypertension, diabetes, and anxiety, taking atenolol, rosiglitazone, and lorazepam
vs.
A man 40-45 taking a beta-blocker and a thiazolidinedione
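A dataset is k-anonymous when every combination of quasi-identifier values appears in at least k records. A minimal check can be sketched as follows; the column names and toy rows are hypothetical.

```python
from collections import Counter

def smallest_group(rows, quasi_identifiers):
    """Size of the rarest combination of quasi-identifier values."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
    return min(counts.values(), default=0)

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    return smallest_group(rows, quasi_identifiers) >= k

# Detailed rows: each combination is unique, so individuals stand out.
detailed = [
    {"age": 43, "sex": "M", "drug": "atenolol"},
    {"age": 41, "sex": "M", "drug": "metoprolol"},
    {"age": 44, "sex": "M", "drug": "atenolol"},
]
# The same rows after rolling up age and drug.
rolled = [
    {"age_group": "40-45", "sex": "M", "drug_class": "beta-blocker"},
    {"age_group": "40-45", "sex": "M", "drug_class": "beta-blocker"},
    {"age_group": "40-45", "sex": "M", "drug_class": "beta-blocker"},
]
detailed_ok = is_k_anonymous(detailed, ["age", "sex", "drug"], k=2)
rolled_ok = is_k_anonymous(rolled, ["age_group", "sex", "drug_class"], k=3)
```

Coarser categories raise the smallest group size, which is the trade-off between detail and anonymity shown in the two descriptions above.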
HIPAA, Data Storage and Access
● Implications on Data Backup Plans
– Data need to be destroyed after the report is published
● Data only used to support pre-defined analyses
● Ancillary analyses are possible after IRB review and approval
Epidemiology studies using large databases
● In the old days ...
– Give me all the data, do what I say ...
– What if the investigator / reviewer wants to do THIS analysis?
– Use existing datasets to test new hypotheses
● Good research practice
– Define necessary data elements according to research protocol
– Pre-defined analytic plan
Epidemiology studies using large databases
● Keys to protection of human subjects
– Competent, responsible investigators and staff
– IRB review and oversight
– Data development guidelines
  ● e.g. Good Epidemiology Practice
– Information technology
● Some reasonable rules/guidelines are better than no guideline