Privacy, Confidentiality and Data Security (PCDS) in HSR: Best Practices Alan M. Zaslavsky

Department of Health Care Policy
Harvard Medical School
1
Privacy, Confidentiality
and Data Security (PCDS)
• Importance and sensitivity of PCDS
• Basic concepts of disclosure risk
– Deidentification and reidentification
– Disclosure control
• Institutional and regulatory frameworks
– Common Rule, HIPAA, Data use agreements
• File organization, data flow and computer
security
2
• This presentation offered in our department at
least annually
– Required attendance by all programmers,
students, fellows, project managers with data
responsibilities
– Presented to faculty at meetings
– Shortened version for lower-level staff
– Tracking of attendance by personnel manager
– Sanction is loss of computer account
• Seek to fully involve project management in
PCDS issues
3
Definitions
• Privacy: the right of an individual to keep
information about herself or himself from
others.
• Confidentiality: safeguarding, by a
recipient, of information about another
individual
• Disclosure: release (direct or indirect) of
information about an identifiable individual
4
Definitions (continued)
• Data security: protections on data to
prevent unauthorized access or destruction
• Informed consent: a person's agreement to
allow personal data to be provided for
research and statistical purposes
• Research: study producing generalizable
knowledge
– excludes internal operations, quality assurance
5
Importance of PCDS
Nexus for balance between
• benefits of information to society
• possible harms of information use to
individuals
in conducting the research enterprise.
One person’s “invasion of privacy” is
another’s “essential use of information.”
6
Inherent conflicts
• Law enforcement / legal process
• General access to research data
– Freedom of Information Act (FOIA)
• Commercial use / beneficial products &
services?
• Prevention of harm
• Need to save data for verification, revision
7
Costs of violations of PCDS
• Damage to subjects
– Material
– Psychological/social
• Damage to the research enterprise
• Exposure to legal/administrative sanctions
for researchers and data providers and their
institutions
8
Direct and indirect identifiers
Key: variable or combination of variables, the value
for which results in a record being unique in the
target and population data
Direct identifier: Information that is uniquely
associated with a person.
Indirect identifier: Data that, in combination, are
uniquely associated with a person, or information
that facilitates such associations.
9
Direct Identifiers (keys)
• Name
• Telephone number
• Street / e-mail address
• Unique features (SSN, Medicare ID, health
plan, medical record #, certificate/license,
voice prints, fingerprints, photos)
10
Re-identification by Matching
[Diagram]
• De-identification: original target file (Name +
variables abcdefghijkl) becomes anonymized target
file (abcdefghijkl)
• Re-identification: anonymized target file
(abcdefghijkl) is matched on a key to a population
file (abcdefmnop + Name)
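The matching step in the diagram can be sketched in a few lines. This is a minimal illustration, not a real attack tool: the records, the field names and the choice of key (birthdate, sex, zip) are all invented for the example.

```python
# Hypothetical records; every name and value here is invented for illustration.
anonymized_target = [
    {"birthdate": "1950-03-14", "sex": "F", "zip": "04101", "diagnosis": "asthma"},
    {"birthdate": "1962-11-02", "sex": "M", "zip": "04330", "diagnosis": "diabetes"},
]
population = [
    {"name": "Pat Example", "birthdate": "1950-03-14", "sex": "F", "zip": "04101"},
    {"name": "Lee Sample", "birthdate": "1971-07-09", "sex": "M", "zip": "04330"},
    {"name": "Kim Placeholder", "birthdate": "1971-07-09", "sex": "M", "zip": "04330"},
]
KEY = ("birthdate", "sex", "zip")  # assumed set of variables shared by both files


def reidentify(target, population, key):
    """Return (name, target_record) pairs whose key matches exactly one person."""
    index = {}
    for person in population:
        index.setdefault(tuple(person[k] for k in key), []).append(person)
    matches = []
    for record in target:
        candidates = index.get(tuple(record[k] for k in key), [])
        if len(candidates) == 1:  # unique in the population file
            matches.append((candidates[0]["name"], record))
    return matches


matches = reidentify(anonymized_target, population, KEY)
```

Only the first target record is linked to a name here; the second has no candidate in the population file, which is why the intruder's external data matters as much as the released file.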
11
Data in Combination
Variables might be identifying in
combination that are not identifying
by themselves
• Month, day and year of birth
• Gender
• Zip code
12
Example of reidentification using
three variables
Variables list          % Unique in Maine state voter registration
Birthdate alone         12
Birthdate + gender      29
Birthdate + Zip (5)     69
Birthdate + Zip (9)     97
Sweeney, 1997
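A percentage like those above can be computed for any file by counting how many records have a combination of values that no other record shares. The sketch below assumes records are plain dictionaries; the field names and the four toy records are invented and do not reproduce the voter registration figures.

```python
from collections import Counter


def percent_unique(records, key):
    """Percentage of records whose values on `key` are shared by no other record."""
    counts = Counter(tuple(r[k] for k in key) for r in records)
    unique = sum(1 for r in records if counts[tuple(r[k] for k in key)] == 1)
    return 100.0 * unique / len(records)


# Toy data (hypothetical field names and values).
records = [
    {"birthdate": "1950-03-14", "sex": "F", "zip5": "04101"},
    {"birthdate": "1950-03-14", "sex": "M", "zip5": "04101"},
    {"birthdate": "1950-03-14", "sex": "M", "zip5": "04330"},
    {"birthdate": "1962-11-02", "sex": "F", "zip5": "04101"},
]

by_birthdate = percent_unique(records, ("birthdate",))
by_birthdate_sex = percent_unique(records, ("birthdate", "sex"))
by_all_three = percent_unique(records, ("birthdate", "sex", "zip5"))
```

Running the same function on a project's own analytic file, for each candidate combination of variables, gives a quick first look at which combinations act as keys.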
13
Population (External) Data Bases
• Voter Registration Lists
• Research files
• State & Federal Files
– Survey files with added administrative data
• Information Vendor Files
• The unknown: what might an “intruder” know
about some or all members of your population?
14
Identifiable population groups
(entire data set highly
identifiable)
• Rare diseases
• Sample drawn from a particular area
15
Unique/unusual cases: rare values
• 110 year-old woman
• Man who weighs 350 pounds
• Income > $100 million
• Verbatim text containing identifying
details
16
Unique/unusual cases: rare
combinations of values
• 16 year-old widow
• 20 year-old Ph.D.
• Asian race in rural Midwest
• Female/Asian Executive
• 60 year-old male married to 30 year-old
female
• Cause of death = prostate cancer for 30
year-old male
17
Micro Data Protection 1
• Remove direct identifiers
• Restrict geographical detail
• Code to remove detail – larger categories,
top/bottom coding
• Remove, code or edit verbatim comments
• Case suppression
• Variable suppression
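Several of these steps are simple record-level transformations. The sketch below shows removal of direct identifiers, top-coding of age, truncation of ZIP code and recoding of income into bands. The field names, the age cap of 90, the three-digit ZIP and the band boundaries are assumptions chosen for illustration, not recommended thresholds.

```python
DIRECT_IDENTIFIERS = {"name", "phone", "ssn"}  # assumed field names


def income_band(income_thousands):
    """Recode income (in $'000) into larger categories; boundaries are illustrative."""
    if income_thousands < 10:
        return "<10"
    if income_thousands < 25:
        return "10-24"
    if income_thousands < 50:
        return "25-49"
    return "50+"


def protect(record, age_cap=90, zip_digits=3):
    """Return a copy of `record` with basic microdata protections applied."""
    # Remove direct identifiers.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Top-code age so that very old respondents are not unique.
    out["age"] = min(out["age"], age_cap)
    # Restrict geographical detail by keeping only the leading ZIP digits.
    out["zip"] = out["zip"][:zip_digits]
    # Replace exact income with a coarse band.
    out["income_band"] = income_band(out.pop("income_thousands"))
    return out


raw = {"name": "Pat Example", "ssn": "000-00-0000", "age": 110,
       "zip": "04101", "income_thousands": 72}
protected = protect(raw)
```

Case suppression and editing of verbatim comments usually need human review and are not covered by a mechanical recode like this one.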
18
Micro Data Protection 2
• Special handling (e.g. coding) of data from
external sources (esp. area data)
• Statistical modification (“noise”)
• Sample/subsample
• Eliminate link between persons and
establishments
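Statistical modification and subsampling can be sketched with the standard library. This is a toy version under stated assumptions: Gaussian noise with a fixed scale and a simple random subsample. Production methods calibrate the noise to the analyses the data must still support.

```python
import random


def add_noise(values, scale, rng):
    """Add independent zero-mean Gaussian noise with standard deviation `scale`."""
    return [v + rng.gauss(0, scale) for v in values]


def subsample(records, fraction, rng):
    """Draw a simple random subsample without replacement."""
    k = max(1, round(len(records) * fraction))
    return rng.sample(records, k)


rng = random.Random(0)  # fixed seed only so the example is repeatable
weights = [150.0, 182.5, 350.0, 164.0, 171.0]
noisy_weights = add_noise(weights, scale=2.0, rng=rng)
released = subsample(list(range(100)), fraction=0.1, rng=rng)
```

Subsampling helps because an intruder can no longer be sure that a person known to be in the population is in the released file.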
19
Tabular data
• Information on individuals deduced from
unique cases in tables
• Reidentification usually related to small
groups, small cell counts
• Rounding, cell suppression, complementary
suppression might be required
20
Disclosure of individual
information from a table
                  Cancer type
Income ($’000)    Colon   Lung   Kidney   Breast
<10               60      80     0        24
10-25             25      36     0        36
25-50             19      12     2        17
>50               22      14     0        35
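A first-pass check for risky cells in a table like this one can be automated. The sketch below reads the slide's counts as one row per income band and flags two patterns: small nonzero cells, and columns whose cases all fall in a single row (so that knowing the cancer type reveals the income band). The threshold of 5 is an assumption, and complementary suppression is not implemented.

```python
INCOME_BANDS = ["<10", "10-25", "25-50", ">50"]
# Counts per cancer type, listed in the order of INCOME_BANDS.
TABLE = {
    "Colon": [60, 25, 19, 22],
    "Lung": [80, 36, 12, 14],
    "Kidney": [0, 0, 2, 0],
    "Breast": [24, 36, 17, 35],
}


def small_cells(table, threshold=5):
    """Cells with a nonzero count below `threshold` (candidates for suppression)."""
    return [(col, INCOME_BANDS[i])
            for col, counts in table.items()
            for i, n in enumerate(counts)
            if 0 < n < threshold]


def concentrated_columns(table):
    """Columns whose cases all fall in one row: the column reveals the row."""
    flagged = []
    for col, counts in table.items():
        nonzero = [i for i, n in enumerate(counts) if n > 0]
        if len(nonzero) == 1:
            flagged.append((col, INCOME_BANDS[nonzero[0]]))
    return flagged


risky_small = small_cells(TABLE)
risky_concentrated = concentrated_columns(TABLE)
```

Both checks flag the kidney cancer column: its only two cases sit in the 25-50 income band.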
21
Technical issues
• Highly technical issues in both microdata
and tabular nondisclosure
– Intersection of stats, math, computer science
• Software for detecting disclosure risk
– RTI, μ-ARGUS, etc.
• Nontechnical variables
– Resources and intentions of “intruder”
22
Disclosure control in released data
• Affects us as producers and consumers of
data
• Masking
– Affects analyses if performed on data we
receive
– Complex to implement on our releases
• Limited access data centers
23
Restricted access data centers
• Alternative to fully-deidentified public-use
microdata files
• Data are held at restricted center
– Limited set of researchers submit analyses
through intermediaries
– Output reviewed for nondisclosure
• Only feasible for organizations with
substantial, persistent resources
– e.g. NCHS, Census
24
Institutional and regulatory
frameworks for PCDS
• Common Rule / IRB
• HIPAA
• Data Use Agreements
• State regulations
25
Common Rule
• Governs protection of research subjects in
all Federally-funded research
– IRB evaluates adherence by researcher
– Institutional sanctions for violations
– Many institutions extend to all research
• Objective: protection of subject from harm
– In HSR, often there is no intervention
– Typically, commitment to minimal risk of
disclosure
26
Common Rule (continued)
• Informed consent
– generally required in primary data-collection
– appropriate information about use of data
– might be waived where impractical to obtain (e.g.
intrusive), if risks minimal & rights not injured
• Exemption from (full) review
– No intervention that could harm subject
– Secondary data with no identifiable data
– Requires determination by IRB (but less tedious)
27
Implications for researchers
• Commitments are made
– To subjects: consent language
– To IRB: safeguards promised in IRB
application
– To funding agencies: in grant application
• May involve
– Protection of data while used
– Limits on duration of use
28
HIPAA
Health Insurance Portability and
Accountability Act
• Specific rules for electronic transmission of
health data
– Primarily for efficiency but includes Privacy Rule
• Obligations imposed on health care providers
– Includes direct providers, health plans and insurers
– Research data distinguished from health plan /
provider operational functions
• Researchers must respect these obligations
29
Who is Covered by HIPAA?
• A health care provider who transmits health
information in electronic transactions
Example: a physician or hospital who
electronically bills for services
• A health plan
• A health care clearinghouse
30
HIPAA implications for research
• Practical implications of HIPAA
– What data providers will be looking for
– Need to work around restrictions on content
– More elaborate paths for data control
• HIPAA provisions for releasing data for
research
– fully deidentified
– limited use dataset
– waiver
31
Option 1: De-identified Health
Information
• Completely de-identified information (18 elements
removed) and no knowledge that remaining
information can identify the individual. OR
• Statistically “de-identified” information where a
qualified statistician determines that there is a
“very small risk” that the information could be
used to identify the individual and documents the
methods and analysis.
32
Removal of These Identifiers
Makes Information De-identified
– Names
– Geographic info (including city and ZIP)
– Elements of dates (except year)
– Telephone #s
– Fax #s
– E-mail address
– Social Security #
– Medical record, prescription #s
– Health plan beneficiary #s
– Account #s
– Certificate/license #s
– VIN and serial #s, license plate #s
– Device identifiers, serial #s
– Web URLs
– IP address #s
– Biometric identifiers (fingerprints)
– Full face, comparable photo images
– Unique identifying #s
If the covered entity has actual knowledge that remaining information can be used to identify the
individual, the information is considered individually identifiable, and therefore, generally is PHI.
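The mechanical part of this option (dropping listed fields and reducing dates to the year) might look like the sketch below. The field names are hypothetical and cover only some of the categories listed; a real extract needs a reviewed mapping from every variable to the list, and code cannot address the "actual knowledge" condition.

```python
from datetime import date

# Assumed field names for some of the listed identifier categories (not exhaustive).
IDENTIFIER_FIELDS = {
    "name", "city", "zip", "phone", "fax", "email",
    "ssn", "mrn", "account_number", "url", "ip_address",
}


def deidentify(record):
    """Drop listed identifier fields and keep only the year of any date value."""
    out = {}
    for field, value in record.items():
        if field in IDENTIFIER_FIELDS:
            continue  # listed identifier: remove entirely
        if isinstance(value, date):
            out[field + "_year"] = value.year  # elements of dates except year
        else:
            out[field] = value
    return out


record = {
    "name": "Pat Example",
    "zip": "04101",
    "admit_date": date(2004, 5, 17),
    "diagnosis": "asthma",
}
deidentified = deidentify(record)
```

Free-text fields are the usual weak point of a field-based filter like this, since names and dates can appear inside comments.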
33
Option 2: Limited Data Set with
Data Use Agreement
• The Privacy Rule permits limited types of
identifiers to be released for research with
health information (referred to as a Limited
Data Set).
• Limited Data Sets can only be used and
released in accordance with a Data Use
Agreement between the covered entity and
the recipient.
34
Limited Data Set w/ Data Use
Agreement
• The Limited Data Set CAN contain
– Elements of Dates
– City and ZIP
– Other unique identifiers, characteristics and
codes not previously listed as direct identifiers
(previous slide)
• CANNOT contain other direct identifiers
(among the 18)
35
Option 3: Waiver of
Authorization
May use or disclose personal information for
research if IRB or Privacy Board determines
that:
– research involves no more than minimal risk
– research does not adversely affect the “rights and
welfare” of subjects
– the research could not be done without a waiver
36
Data Use Agreements (DUA)
• Between data provider and data user
• Restrictions:
– access by specific personnel
– use for a specific reason
– defined duration of retention
• Implements commitments made by data
provider
37
State regulations
• Variable from state to state
• Some are relatively restrictive
– requires negotiation with data provider
38
Iron-clad protection?
• Certificate of Confidentiality
– Issued by DHHS
– Protects data against legal process
– Typically for sensitive topics, e.g. illicit drugs
• O, Canada!
39
Data security in complex projects
• Multisite projects: special needs
• Careful mapping of data flow and access
• Minimal identifying information at each
stage
• Particular care in technical aspects of
security
40
Example of a data flow plan (with security provisions)
41
File management for PCDS
• General practices of good management
– Practices necessary to maintain project continuity
• Well-structured directory organization and
naming
• Include documentation with files
• Separate project data from personal directories
• Separate datasets from programs
• Separate raw data from analytic datasets
42
• We typically follow this presentation with a
15-minute tutorial on good practices for
data and file management
43
Backups
• Conflict of privacy/confidentiality (restrict)
and data security (maintain)
• Basic backup schedule (undeletable)
– All Unix files: 4 month retention
– PC files: 2 month retention
• Project-specific backup: by request
– Only possible if material is properly organized
– Permanent media, physical security
44
• The backup policy described here was
adopted after several months of faculty
discussion
– Computer system managers wanted longer
retention
– Faculty concerned about unexpected discovery
of material intended to be deleted
– Conflicts of DUA requirements with rules
regarding retention of data for verification,
revision of manuscripts, etc.
45
General computer security
• Proper use of computer accounts, only by
authorized individuals
• Secure connections for outside access
– Remote users
– Home or “on road” access via Internet
– Applications can be “tunneled” securely
• Good practices with passwords
• Maintain file permissions to restrict access to
authorized users
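On the Unix side, restricting a project directory to its owner and group can be scripted. The sketch below is one possible approach using the Python standard library; the mode 0o750 is an assumed policy, not a department rule, and on Windows `os.chmod` has only a limited effect, so access control lists would be needed there instead.

```python
import os
import stat
import tempfile


def restrict_to_owner_and_group(path):
    """Set mode 0o750: owner full access, group read/traverse, others none."""
    os.chmod(path, stat.S_IRWXU | stat.S_IRGRP | stat.S_IXGRP)


def world_accessible(path):
    """True if users outside the owner and group have any permission on `path`."""
    return bool(os.stat(path).st_mode & stat.S_IRWXO)


# Demonstration on a throwaway directory standing in for a project data directory.
with tempfile.TemporaryDirectory() as project_dir:
    os.chmod(project_dir, 0o755)
    before = world_accessible(project_dir)
    restrict_to_owner_and_group(project_dir)
    after = world_accessible(project_dir)
```

A periodic scan that calls `world_accessible` on each project data directory is a cheap way to catch permissions that have drifted.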
46
• We follow this up with a training on
mechanics of computer security
– Permissions, file organization, etc.
• More or less fine-grained tools for
protection of various files
• IT staff included in training
– Responsible for implementing security and data
retention policies for various project datasets
• Teach methods for both Unix and Windows
sides of our system
47
Conclusions
• Know your data
• Be prepared to accommodate restrictions
required by data providers
• Maintain general security
• Seek guidance for tough situations!
48