Privacy, Confidentiality and Data Security (PCDS) in HSR: Best Practices
Alan M. Zaslavsky
Department of Health Care Policy
Harvard Medical School

Privacy, Confidentiality and Data Security (PCDS)
• Importance and sensitivity of PCDS
• Basic concepts of disclosure risk
– Deidentification and reidentification
– Disclosure control
• Institutional and regulatory frameworks
– Common Rule, HIPAA, Data use agreements
• File organization, data flow and computer security

• This presentation is offered in our department at least annually
– Attendance required of all programmers, students, fellows and project managers with data responsibilities
– Presented to faculty at meetings
– Shortened version for lower-level staff
– Attendance tracked by personnel manager
– Sanction for nonattendance: loss of computer account
• We seek to fully involve project management in PCDS issues

Definitions
• Privacy: the right of an individual to keep
information about herself or himself from
others.
• Confidentiality: safeguarding, by a
recipient, of information about another
individual
• Disclosure: release (direct or indirect) of
information about an identifiable individual

Definitions (continued)
• Data security: protections on data to prevent unauthorized access or destruction
• Informed consent: a person's agreement to allow personal data to be provided for research and statistical purposes
• Research: study producing generalizable knowledge
– excludes internal operations, quality assurance

Importance of PCDS
Nexus for balance between
• benefits of information to society
• possible harms of information use to individuals
in conducting the research enterprise.
One person’s “invasion of privacy” is another’s “essential use of information.”

Zaslavsky: Privacy, Confidentiality,
Data Security: Introduction

Inherent conflicts
• Law enforcement / legal process
• General access to research data
– Freedom of Information Act (FOIA)
• Commercial use / beneficial products & services?
• Prevention of harm
• Need to save data for verification, revision

Costs of violations of PCDS
• Damage to subjects
– Material
– Psychological/social
• Damage to the research enterprise
• Exposure to legal/administrative sanctions for researchers and data providers and their institutions

Direct and indirect identifiers
Key: variable or combination of variables whose values make a record unique in both the target and the population data
Direct identifier: information that is uniquely associated with a person.
Indirect identifier: data which, in combination, are uniquely associated with a person; information which facilitates such associations.

Direct Identifiers (keys)
• Name
• Telephone number
• Street/e-mail address
• Unique features (SSN, Medicare ID, health plan, medical record #, certificate/license, voiceprints/fingerprints, photos)

Data in Combination
Variables that are not identifying by themselves might be identifying in combination:
• Month, day and year of birth
• Gender
• Zip code

Re-identification by Matching
[Figure: Original target file (Name + variables abcdefghijkl) → De-identification → Anonymized target file (abcdefghijkl, no Name). The anonymized file shares variables abcdef with a population file (abcdefmnop + Name); those shared variables act as a re-identification key linking records back to names.]
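
The matching step described above can be sketched in Python as an exact join on the variables shared by an anonymized file and a named population file. Everything here is hypothetical: the field names (`birth_date`, `sex`, `zip`), the records and the `reidentify` helper are illustrations, and a real intruder may also use approximate or probabilistic matching, which this sketch does not attempt.

```python
from collections import defaultdict

# Hypothetical key variables assumed to appear in both files
KEY = ("birth_date", "sex", "zip")


def reidentify(anonymized, population, key=KEY):
    """Link anonymized records to named population records by exact match on `key`.

    Returns (anonymized_record, name) pairs for records whose key values match
    exactly one population record; ambiguous or missing matches are skipped.
    """
    index = defaultdict(list)
    for person in population:
        index[tuple(person[k] for k in key)].append(person["name"])

    links = []
    for rec in anonymized:
        names = index.get(tuple(rec[k] for k in key), [])
        if len(names) == 1:  # a unique match re-identifies the record
            links.append((rec, names[0]))
    return links


# Hypothetical anonymized file: no names, only key variables plus a diagnosis
anonymized = [
    {"birth_date": "1950-03-02", "sex": "F", "zip": "02115", "dx": "diabetes"},
    {"birth_date": "1962-07-19", "sex": "M", "zip": "02138", "dx": "asthma"},
]
# Hypothetical population file that carries names
population = [
    {"name": "A. Smith", "birth_date": "1950-03-02", "sex": "F", "zip": "02115"},
    {"name": "B. Jones", "birth_date": "1962-07-19", "sex": "M", "zip": "02138"},
    {"name": "C. Brown", "birth_date": "1962-07-19", "sex": "M", "zip": "02138"},
]

for rec, name in reidentify(anonymized, population):
    print(name, "->", rec["dx"])  # prints "A. Smith -> diabetes"
```

Only the first record is re-identified; the second survives only because two people in the population file share its key values.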

Population (External) Data Bases
• Voter Registration Lists
• Research files
• State & Federal Files
– Survey files with added administrative data
• Information Vendor Files
• The unknown: what might an “intruder” know about some or all members of your population?

Example of reidentification using three variables
Variables               % unique in Maine state voter registration list
Birthdate alone         12
Birthdate + gender      29
Birthdate + Zip (5)     69
Birthdate + Zip (9)     97
(Sweeney, 1997)
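
Percentages like those in Sweeney's example are shares of records that are unique on a given combination of variables. A minimal sketch of that calculation, assuming records are Python dicts; the `percent_unique` helper and the toy file are hypothetical and do not reproduce the Maine data.

```python
from collections import Counter


def percent_unique(records, variables):
    """Percentage of records whose combination of `variables` occurs exactly once."""
    def combo(r):
        return tuple(r[v] for v in variables)

    counts = Counter(combo(r) for r in records)
    n_unique = sum(1 for r in records if counts[combo(r)] == 1)
    return 100.0 * n_unique / len(records)


# Hypothetical toy file (not Sweeney's data)
records = [
    {"birth_date": "1950-03-02", "sex": "F", "zip5": "04101"},
    {"birth_date": "1950-03-02", "sex": "M", "zip5": "04101"},
    {"birth_date": "1961-11-30", "sex": "F", "zip5": "04101"},
    {"birth_date": "1961-11-30", "sex": "F", "zip5": "04330"},
]

for variables in [("birth_date",), ("birth_date", "sex"), ("birth_date", "sex", "zip5")]:
    # Uniqueness rises as variables are combined: 0.0, then 50.0, then 100.0
    print(variables, percent_unique(records, variables))
```

Uniqueness within a sample does not by itself establish uniqueness in the population; what matters for re-identification is how many population members share the combination, which is what a population-wide list such as a voter registration file measures.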

Identifiable population groups (entire data set highly identifiable)
• Sample drawn from a particular area

Unique/unusual cases: rare values
• 110-year-old woman
• Man who weighs 350 pounds
• Income > $100 million
• Rare diseases
• Verbatim text containing identifying details

Unique/unusual cases: rare combinations of values
• 16-year-old widow
• 20-year-old Ph.D.
• Asian race in rural Midwest
• Female/Asian executive
• 60-year-old male married to 30-year-old female
• Cause of death = prostate cancer for 30-year-old male

Micro Data Protection 1
• Remove direct identifiers
• Restrict geographical detail
• Code to remove detail: larger categories, top/bottom coding
• Remove, code or edit verbatim comments
• Case suppression
• Variable suppression
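
Several of the protections above (removing direct identifiers, restricting geography, coding into larger categories, top coding) can be applied record by record. A minimal sketch, assuming dict records; the field names, the 10-year age bands, the 90+ top code, the income top code and the three-digit ZIP truncation are hypothetical choices, not recommended thresholds.

```python
# Hypothetical settings; appropriate cut-offs depend on the data and population
DIRECT_IDENTIFIERS = {"name", "phone", "address", "ssn"}
INCOME_TOP_CODE = 250_000  # incomes above this are reported as 250,000


def age_band(age, width=10, top=90):
    """Code age into larger categories, with a top code such as '90+'."""
    if age >= top:
        return f"{top}+"
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"


def mask_record(rec):
    """Apply simple microdata protections to one record (a dict)."""
    # Remove direct identifiers
    out = {k: v for k, v in rec.items() if k not in DIRECT_IDENTIFIERS}
    if "zip" in out:
        out["zip3"] = out.pop("zip")[:3]  # restrict geographic detail
    if "age" in out:
        out["age_group"] = age_band(out.pop("age"))  # larger categories + top code
    if "income" in out:
        out["income"] = min(out["income"], INCOME_TOP_CODE)  # top coding
    return out


rec = {"name": "A. Smith", "ssn": "000-00-0000", "zip": "02115",
       "age": 110, "income": 120_000_000, "dx": "rare disease X"}
print(mask_record(rec))
```

The rare diagnosis passes through unchanged, which is where case or variable suppression would be needed.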

Micro Data Protection 2
• Special handling (e.g. coding) of data from external sources (esp. area data)
• Statistical modification (“noise”)
• Sample/subsample
• Eliminate link between persons and establishments

Tabular data
• Information on individuals can be deduced from unique cases in tables
• Reidentification usually related to small groups, small cell counts
• Rounding, cell suppression, complementary suppression might be required
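
Statistical modification and subsampling can be illustrated with Python's standard `random` module. A minimal sketch; the noise standard deviation, the subsampling fraction, the seed and the helper names are hypothetical choices for illustration.

```python
import random


def add_noise(values, sd, rng):
    """Statistical modification: add mean-zero Gaussian noise to each value."""
    return [v + rng.gauss(0.0, sd) for v in values]


def subsample(records, fraction, rng):
    """Release a simple random subsample (without replacement) instead of the full file."""
    k = round(len(records) * fraction)
    return rng.sample(records, k)


rng = random.Random(2024)  # fixed seed so the example is reproducible
incomes = [18_000, 42_500, 67_000, 150_000, 31_200]
noisy = add_noise(incomes, sd=2_000, rng=rng)  # sd is a hypothetical choice
released = subsample(list(range(100)), fraction=0.2, rng=rng)

print([round(x) for x in noisy])
print(len(released))  # 20 of the 100 record IDs
```

Noise of this kind protects individual values at the cost of added variance in any analysis of the released file.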

Disclosure of individual information from a table
Income ($’000)   Cancer type
                 Colon   Lung   Kidney   Breast
<10                 60     80        0       24
10-25               25     36        0       36
25-50               19     12        2       17
>50                 22     14        0       35

Technical issues
• Highly technical issues in both microdata and tabular nondisclosure
– Intersection of stats, math, computer science
• Software for detecting disclosure risk
– RTI, μ-argus, etc.
• Nontechnical variables
– Resources and intentions of “intruder”
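
Primary and complementary cell suppression for a table of counts can be sketched as follows. The threshold of 3 and the `suppress` helper are hypothetical; agencies set their own rules. This simplified version leaves zero cells alone (although a zero can itself be disclosive) and protects only row totals.

```python
THRESHOLD = 3  # hypothetical minimum publishable cell count


def suppress(table, threshold=THRESHOLD):
    """Return a copy of `table` (list of rows of counts) with risky cells set to None.

    Primary suppression: nonzero cells below `threshold` are blanked.
    Complementary suppression (rows only, simplified): if a row has exactly one
    suppressed cell, also blank its smallest other nonzero cell, so the
    suppressed value cannot be recovered by subtracting from the row total.
    """
    out = [[None if 0 < c < threshold else c for c in row] for row in table]
    for row in out:
        if row.count(None) == 1:
            candidates = [(c, j) for j, c in enumerate(row) if c is not None and c > 0]
            if candidates:
                _, j = min(candidates)
                row[j] = None
    return out


counts = [[40, 12, 1],
          [35, 20, 8]]  # hypothetical counts
print(suppress(counts))  # [[40, None, None], [35, 20, 8]]
```

In the example the 1 is suppressed and the 12 is suppressed as its complement, so the first row's total no longer reveals it. The third column's total would still give it away (9 minus 8), which is why real suppression routines must protect all margins jointly.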

Disclosure control in released data
• Affects us as producers and consumers of data
• Masking
– Affects analyses if performed on data we receive
– Complex to implement on our releases
• Limited access data centers
– e.g. NCHS, Census

Restricted access data centers
• Alternative to fully deidentified public-use microdata files
• Data are held at restricted center
– Limited set of researchers submit analyses through intermediaries
– Output reviewed for nondisclosure
• Only feasible for organizations with substantial, persistent resources

Institutional and regulatory frameworks for PCDS
• Common Rule / IRB
• HIPAA
• Data Use Agreements
• State regulations

Common Rule
• Governs protection of research subjects in all Federally-funded research
– IRB evaluates adherence by researcher
– Institutional sanctions for violations
– Many institutions extend to all research
• Objective: protection of subject from harm
– In HSR, often there is no intervention
– Typically, commitment to minimal risk of disclosure

Common Rule (continued)
• Informed consent
– generally required in primary data collection
– appropriate information about use of data
– might be waived where impractical to obtain (e.g. intrusive), if risks minimal & rights not injured
• Exemption from (full) review
– No intervention that could harm subject
– Secondary data with no identifiable information
– Requires determination by IRB (but less tedious)

Implications for researchers
• Commitments are made
– To subjects: consent language
– To IRB: safeguards promised in IRB application
– To funding agencies: in grant application
• May involve
– Protection of data while used
– Limits on duration of use

HIPAA
Health Insurance Portability and Accountability Act
• Specific rules for electronic transmission of health data
– Primarily for efficiency but includes Privacy Rule
• Obligations imposed on health care providers
– Includes direct providers, health plans and insurers
– Research data distinguished from health plan / provider operational functions
• Researchers must respect these obligations

Who is Covered by HIPAA?
• A health care provider who transmits health information in electronic transactions
– Example: a physician or hospital who electronically bills for services
• A health plan
• A health care clearinghouse

HIPAA implications for research
• Practical implications of HIPAA
– What data providers will be looking for
– Need to work around restrictions on content
– More elaborate paths for data control
• HIPAA provisions for releasing data for research
– fully deidentified
– limited data set
– waiver

Option 1: De-identified Health Information
• Completely de-identified information (18 elements removed) and no knowledge that remaining information can identify the individual, OR
• Statistically “de-identified” information where a qualified statistician determines that there is a “very small risk” that the information could be used to identify the individual and documents the methods and analysis.

Removal of These Identifiers Makes Information De-identified
– Names
– Geographic info (including city and ZIP)
– Elements of dates (except year)
– Telephone #s
– Fax #s
– E-mail address
– Social Security #
– Medical record, prescription #s
– Health plan beneficiary #s
– Account #s
– Certificate/license #s
– VIN and Serial #s, license plate #s
– Device identifiers, serial #s
– Web URLs
– IP address #s
– Biometric identifiers (fingerprints)
– Full face, comparable photo images
– Unique identifying #s
If the covered entity has actual knowledge that remaining information can be used to identify the individual, the information is considered individually identifiable, and therefore generally is protected health information (PHI).

Option 2: Limited Data Set with Data Use Agreement
• The Privacy Rule permits limited types of identifiers to be released for research with health information (referred to as a Limited Data Set).
• Limited Data Sets can only be used and released in accordance with a Data Use Agreement between the covered entity and the recipient.
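
Mechanically, the identifier-removal part of Option 1 amounts to dropping fields and reducing dates to years. A minimal sketch, assuming a record is a dict; the column names and the mapping from columns to identifier types are hypothetical, and the mapping must cover all 18 types for the result to qualify.

```python
from datetime import date

# Hypothetical mapping: this file's columns that fall under the 18 identifier types
IDENTIFIER_FIELDS = {"name", "city", "zip", "phone", "email", "ssn", "mrn", "health_plan_id"}
DATE_FIELDS = {"admit_date", "birth_date"}  # hypothetical date columns


def safe_harbor(rec):
    """Drop identifier fields and reduce dates to their year."""
    out = {}
    for field, value in rec.items():
        if field in IDENTIFIER_FIELDS:
            continue  # remove the identifier entirely
        if field in DATE_FIELDS:
            out[field + "_year"] = value.year  # keep only the year
        else:
            out[field] = value
    return out


rec = {"name": "A. Smith", "zip": "02115", "birth_date": date(1950, 3, 2),
       "admit_date": date(2004, 6, 15), "dx": "hypertension"}
print(safe_harbor(rec))
```

Code cannot check the other half of the rule: that the covered entity has no actual knowledge that the remaining information identifies the individual.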

Limited Data Set w/ Data Use Agreement
• The Limited Data Set CAN contain
– Elements of dates
– City and ZIP
– Other unique identifiers, characteristics and codes not listed as direct identifiers in the list of 18 above
• CANNOT contain other direct identifiers (among the 18)

Option 3: Waiver of Authorization
May use or disclose personal information for research if IRB or Privacy Board determines that:
– research involves no more than minimal risk
– research does not adversely affect the “rights and welfare” of subjects
– the research could not be done without a waiver
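
The three options can be summarized as a decision on which identifier types a file contains. A simplified sketch; the type labels and the `hipaa_option` helper are hypothetical (geography is split into street address versus city/ZIP to reflect what a Limited Data Set may keep), and it cannot capture the statistical route to de-identification or the IRB/Privacy Board findings a waiver requires.

```python
# Hypothetical labels for the identifier types listed above.
# Identifier types a Limited Data Set may keep:
LDS_PERMITTED = {"dates", "city_zip", "other_unique_code"}
# All identifier types (geography split into two labels):
SAFE_HARBOR_TYPES = LDS_PERMITTED | {
    "names", "street_address", "phone", "fax", "email", "ssn",
    "medical_record", "health_plan_id", "account", "certificate_license",
    "vehicle_id", "device_id", "url", "ip_address", "biometric", "photo",
}


def hipaa_option(identifier_types):
    """Suggest which release route fits the identifier types present in a file."""
    present = set(identifier_types) & SAFE_HARBOR_TYPES  # ignore non-identifier columns
    if not present:
        return "Option 1: de-identified"
    if present <= LDS_PERMITTED:
        return "Option 2: limited data set (requires DUA)"
    return "identifiable: authorization or Option 3 waiver needed"


print(hipaa_option(["diagnosis"]))
print(hipaa_option(["dates", "city_zip"]))
print(hipaa_option(["dates", "ssn"]))
```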

Data Use Agreements (DUA)
• Between data provider and data user
• Restrictions:
– access by specific personnel
– use for a specific reason
– defined duration of retention
• Implements commitments made by data provider

State regulations
• Vary from state to state
• Some are relatively restrictive
– require negotiation with data provider

Iron-clad protection?
• Certificate of Confidentiality
– Issued by DHHS
– Protects data against legal process
– Typically for sensitive topics, e.g. illicit drugs
• O, Canada!

Data security in complex projects
• Multisite projects: special needs
• Careful mapping of data flow and access
• Minimal identifying information at each stage
• Particular care in technical aspects of security
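
One common way to keep minimal identifying information at each stage is to replace direct identifiers with pseudonymous study IDs computed by a keyed hash (HMAC), with the key held only at the stage that handles identifiers. This technique is not described in the slides; it is offered as a hypothetical sketch using Python's standard `hmac`, `hashlib` and `secrets` modules, and the field names and `make_study_id` helper are illustrative.

```python
import hashlib
import hmac
import secrets


def make_study_id(identifier: str, key: bytes) -> str:
    """Replace a direct identifier with a keyed pseudonym (HMAC-SHA256)."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # first 64 bits, shortened for readability


# Hypothetical setup: the key stays with the data manager who sees identifiers;
# downstream analysts receive study IDs but never the key or the identifiers.
key = secrets.token_bytes(32)

intake = [{"mrn": "123456", "dx": "asthma"}, {"mrn": "654321", "dx": "diabetes"}]
analytic = [{"study_id": make_study_id(r["mrn"], key), "dx": r["dx"]} for r in intake]
print(analytic)
```

A keyed hash is used rather than a plain hash because identifiers such as medical record numbers come from a small, guessable space: anyone could hash every possible value and reverse a plain hash, but not without the key.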
Example of a data flow plan (with security provisions)

File management for PCDS
• General practices of good management
– Practices necessary to maintain project continuity
• Well-structured directory organization and naming
• Include documentation with files
• Separate project data from personal directories
• Separate datasets from programs
• Separate raw data from analytic datasets
• We typically follow this presentation with a 15-minute tutorial on good practices for data and file management

Backups
• Conflict of privacy/confidentiality (restrict) and data security (maintain)
• Basic backup schedule (undeletable)
– All Unix files: 4-month retention
– PC files: 2-month retention
• Project-specific backup: by request
– Only possible if material is properly organized
– Permanent media, physical security

• The backup policy described here was adopted after several months of faculty discussion
– Computer system managers wanted longer retention
– Faculty concerned about unexpected discovery of material intended to be deleted
– Conflicts of DUA requirements with rules regarding retention of data for verification, revision of manuscripts, etc.

General computer security
• Proper use of computer accounts, only by authorized individuals
• Secure connections for outside access
– Remote users
– Home or “on road” access via Internet
– Applications can be “tunneled” securely
• Good practices with passwords
• Maintain file permissions to restrict access to authorized users
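
The directory-separation and file-permission practices above can be sketched with Python's standard library on a Unix system. The directory names, the `create_project` and `world_accessible` helpers, and the choice of owner-plus-group access are hypothetical; on Windows, `os.chmod` only toggles the read-only flag, so access there is managed through ACLs instead.

```python
import os
import stat
import tempfile
from pathlib import Path

# Hypothetical layout: raw data apart from analytic data, datasets apart from programs
SUBDIRS = ["data/raw", "data/analytic", "programs", "docs"]


def create_project(root: Path, group_access: bool = True) -> None:
    """Create a project tree with no access for users outside owner (and group)."""
    mode = 0o770 if group_access else 0o700  # no permissions for "other" users
    for sub in SUBDIRS:
        (root / sub).mkdir(parents=True, exist_ok=True)
    # chmod explicitly, because mkdir's mode argument is masked by the umask
    for path in [root, root / "data", *(root / s for s in SUBDIRS)]:
        os.chmod(path, mode)


def world_accessible(root: Path) -> list:
    """List paths under root that any user on the system can read, write or enter."""
    other_bits = stat.S_IROTH | stat.S_IWOTH | stat.S_IXOTH
    return [p for p in [root, *root.rglob("*")] if p.stat().st_mode & other_bits]


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "project_x"  # hypothetical project name
    create_project(project)
    print(world_accessible(project))  # [] on a Unix system
```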

• We follow this presentation with training on the mechanics of computer security
– Permissions, file organization, etc.
• More or less fine-grained tools for protection of various files
• IT staff included in training
– Responsible for implementing security and data retention policies for various project datasets
• Teach methods for both Unix and Windows sides of our system

Conclusions
• Know your data
• Be prepared to accommodate restrictions required by data providers
• Maintain general security
• Seek guidance for tough situations!