Data Privacy in Biomedicine Lecture 1: Introduction and Overview

Bradley Malin, PhD (b.malin@vanderbilt.edu)
Associate Professor of Biomedical Informatics, School of Medicine
Associate Professor of Computer Science, School of Engineering
Vanderbilt University
January 11, 2016
Data Privacy in Biomedicine: Lecture 1
© 2016 Bradley Malin
April 2011
May 2014: http://www.nytimes.com/2014/05/16/us/us-mines-personal-health-data-to-aid-emergency-response
Nov 2014: http://www.nytimes.com/2014/11/09/sunday-review/medical-records-top-secret.html
Dec 2014: http://www.thestar.com/news/gta/2014/12/26/star_investigation_3_gta_hospitals_dont_proactively_audit_accesstopatient_files.html
Dec 2014: http://healthitsecurity.com/2014/12/23/is-patient-privacy-violated-with-ny-database/
Dec 2015: http://www.nytimes.com/2015/12/24/nyregion/a-patient-is-sued-and-his-mental-health-diagnosisbecomes-public.html

Welcome to Data Privacy in Biomedicine
For CS: You’re sitting in 8396-02
For Informatics: You’re sitting in 7380
When: Mondays and Wednesdays, 3:10 – 4:25pm
Where: Featheringill Hall, Room 313
Office Hours: Upon Request
Contact: b.malin@vanderbilt.edu





Your Professor is Brad Malin
BS, Molecular Biology
MS, Computer Science (Data Mining)
MPhil, Public Policy & Management
PhD, Computer Science (Computation, Organizations & Society)
Faculty Member: DBMI (1st), EECS (2nd)
Directs: Health Data Science Center (hiplab.org)
Directs: Health Information Privacy Laboratory (hiplab.org)

Sample Research Areas
• Medical Record Access Control, Mining, and Modeling
• Anonymization of Medical & Genomic Data
• Service-Oriented Clinical Information Systems Design
• Big Data Record Linkage

http://www.hiplab.org/courses/BMIF380
More to come (projects, homeworks, etc.) – links will be available from the front page of the website

Course Objectives
After this course, you should be able to analyze data privacy from three non-exclusive perspectives:
• Data Detectives: Understand how seemingly private information can be discovered (or exploited) using automated strategies.
• Data Protectors: Construct privacy protection technologies that provide formal computational guarantees of privacy in disclosed databases.
• Technology Policy Designers: Develop privacy protection technologies that complement policy regulations.

Expectations
• You are expected to be competent in an object-oriented programming language (Java, C++, Python, …)
• You are expected to have a working knowledge of the Internet, word processing, basic databases (Access, Oracle, MySQL, Postgres), and analysis tools (R, Matlab, Scala, Excel)

Beyond Expectations
You have experience in
• information security
• data structures, algorithms, and statistics
• public policy and legal frameworks

Grading
• This is a research-oriented course. There are no exams.
• A substantial portion of your grade will be based on your “final” project.
Criteria: % of Total Grade
Final Project: 50%
Homework Assignments: 30%
Reading Summaries: 10%
Class Participation: 10%

Homework Policy
• Unless the assignment calls for a group project, please do your own homework.
• You can discuss the homework with other students, including the ways in which you approach the solutions to the questions, but the final submission must be your own.
• Do not plagiarize, and always give proper attribution, even in your reading summaries, which leads me to…

Reading Summaries
• There is no textbook for this course.
• Assigned readings will be available the lecture before they are due (at the latest).
• Your summaries should be no more than 1 page in length.
• Summaries will be graded on a {-, ✓, +} scale:
-: You skimmed the reading and barely understood its meaning
✓: You read the reading and provided a reasonable account of its contents
+: You demonstrated critical reasoning and insight regarding the topic
• Submit summaries to b.malin@vanderbilt.edu before class

Final Projects
• Your project should be an independent study on a data privacy issue related to biology, medicine, or health more generally.
• You may design your own project or choose from a predefined set of topics (which will be available on the course website later in the semester).
• Do not be afraid to discuss your project ideas with the instructor!

Sample Topics
• Access Control Frameworks for Distributed Medical Record Systems
• Surveillance of Electronic Medical Record Accesses for Suspicious Behavior
• Evaluation and Design of Privacy Technologies for Personal Health Records (See Microsoft HealthVault Initiative)
• Finding & Relating Publicly Available Repositories of Person-Specific Biomedical Information
• Fast Graph-Based Approaches to Data Re-identification
• Building and Evaluating Clinical Text De-identification Tools
• Anonymization of clinical profiles / sets of diagnoses

Final Projects
Criteria (due date; % of grade):
• Project Proposal (March 20; 5%): A one-pager that describes the project area and how you intend to address the research within the confines of this semester. This will be broken down into several phases.
• Status Report Presentation (March 30; 5%): Briefing for the class on project area and first phase of research. (No more than 5 minutes)
• Written Project Status Report (April 3; 10%): A summary of the progress you have made. (No more than 4 pages)
• Final Project Presentation (April 25, last day of class; 5%): Showcase of research methods and results. (No more than 15 minutes)
• Final Project Report (May 1, in lieu of final; 25%): This will be in the form of a conference-style paper. It will summarize the research area, your methodology, experience, and contributions of your work.

Why Do We Need A Course on Privacy?

Security for Privacy?
• Authentication: login with password, tokens, keys
• Authorization: permission and role-based models to read/write data
• Encryption: to avoid eavesdropping during transmission and storage

Security for Privacy?
Authentication, Authorization, Encryption
But Data Can Re-identify!
“Can I see some anonymous data?”
“Ah! I know who this is!”

Data Privacy Definitions (paraphrase Sweeney)
• The study of computational solutions for releasing data such that a) the data is practically useful (utility) while b) the aspects of the subjects of the data are not revealed (privacy).
• Privacy Protection (“data protectors”):
  • release information such that entity-specific properties (e.g. identity) are controlled
  • restrict what can be learned
• Data Linkage (“data detectives”):
  • combining disparate pieces of entity-specific information to learn more about an entity

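The “data detective” idea of linking disparate records can be illustrated with a small sketch. Everything here is hypothetical: the two tiny datasets, the field names, and the choice of ZIP code, birth date, and sex as the shared fields are assumptions made only for illustration. The sketch joins a release that has no names with an identified list and reports which released records match exactly one named person.

```python
from collections import defaultdict

# Hypothetical "de-identified" release: names removed, other fields kept.
# All values are invented for illustration.
released = [
    {"zip": "37203", "dob": "1975-03-02", "sex": "F", "diagnosis": "asthma"},
    {"zip": "37203", "dob": "1980-11-17", "sex": "M", "diagnosis": "diabetes"},
    {"zip": "37212", "dob": "1975-03-02", "sex": "F", "diagnosis": "depression"},
]

# Hypothetical identified list that happens to share the same fields.
public = [
    {"name": "A. Example", "zip": "37203", "dob": "1975-03-02", "sex": "F"},
    {"name": "B. Sample", "zip": "37203", "dob": "1980-11-17", "sex": "M"},
    {"name": "C. Person", "zip": "37203", "dob": "1980-11-17", "sex": "M"},
]

# Assumed linking fields (not taken from the slides).
LINK_FIELDS = ("zip", "dob", "sex")


def key(record):
    """Project a record onto the linking fields."""
    return tuple(record[f] for f in LINK_FIELDS)


def link(released, public):
    """Map each released record's index to the names that share its key."""
    index = defaultdict(list)
    for person in public:
        index[key(person)].append(person["name"])
    return {i: index.get(key(r), []) for i, r in enumerate(released)}


candidates = link(released, public)
# A released record is singled out when exactly one named person matches.
reidentified = {i: names[0] for i, names in candidates.items() if len(names) == 1}
print(reidentified)
```

In this toy data only the first released record matches a single name; the second matches two people and the third matches nobody, so neither is singled out by this exact-match join.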
A Visual Perspective
[Figure: Utility vs. Privacy]
• To ensure utility, you must reveal all the data
• To ensure privacy, you must not reveal any data
• Here Lives Data Privacy

Privacy, Policy, & Preference
Individuals want control over who can – AND CANNOT – view their health-related records
[Figure: Yoda’s data in HOSPITAL RECORDS, with Physicians, Insurance, and Researchers as requesters]
Yoda’s Preferences
Physicians = Yes
Insurance = Yes
Researchers = No

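As a rough sketch, preferences like Yoda’s could be written down as data and checked before a record is shown. The dictionary layout and the `may_view` function are hypothetical names chosen for illustration, and the deny-by-default behavior for unknown patients or groups is an assumption, not something the slide states.

```python
# Hypothetical encoding of the preferences shown for Yoda.
PREFERENCES = {
    "Yoda": {"Physicians": True, "Insurance": True, "Researchers": False},
}


def may_view(patient, requester_group, preferences=PREFERENCES):
    """Return True only if the patient explicitly allowed this group.

    Assumption: anything not listed is denied.
    """
    return preferences.get(patient, {}).get(requester_group, False)


for group in ("Physicians", "Insurance", "Researchers", "Marketers"):
    print(group, may_view("Yoda", group))
```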
Enabling Personal Control Through Technology
Can design technology to:
• Standardize policy specification
• Inform about data collection
• Address specific privacy concerns in data sharing
  • Anonymity
  • Confidentiality
  • Solitude

Preventing Data Collection in Public Spaces
• http://www.appliedautonomy.com/isee.html

Beyond Policy and Informative Technology
• We cannot always control who gets, and has access to, our information
• Legally, however, data collectors may be required to maintain your privacy

Data Collection, Policy, and Privacy
Data collection occurs everywhere, every day, in all different forms (http://webcams.vanderbilt.edu/)
[Webcam images: Commons Center, Ingram Commons]

Featheringill!

More Cameras
Peabody Library Terrace
http://peabody.vanderbilt.edu/about/webcams/peabody_library_terrace_webcam.php

Even More Cameras!
http://peabody.vanderbilt.edu/about/webcams/index.php
http://webcams.vanderbilt.edu/kissam/

Biomedical Information
• Not quite in the public
• But… information is shared for various purposes in various contexts
• How do you protect privacy of corresponding individuals?

Schedule
Let’s look at the syllabus.

Privacy Policy & the Law (Week 1)
• Privacy ideologies & frameworks
• Who gets to collect information?
• When is health information shared?
• How is health information reused and why?

Access Control & Roles (Week 2)
Access Control
• Who gets to see information when?
• Roles, job functions, & permissions
• Formal representations
What Constitutes a “Good” Role
• Representation of organizational behavior
• Grouping users based on legacy knowledge provided by system administrators

Auditing (Week 3)
Medical Records & Audits
• Who looked at my medical record?
• Should they be looking?
• How do we construct machine learning strategies that make sense?

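The access control and auditing questions above can be sketched together in a few lines. This is a minimal illustration under stated assumptions: the role names, permissions, users, and log entries are all hypothetical, and the audit is a simple rule check standing in for the machine learning strategies the course covers.

```python
from dataclasses import dataclass

# Hypothetical roles, permissions, and user assignments.
ROLE_PERMISSIONS = {
    "physician": {"read_record", "write_record"},
    "billing_clerk": {"read_billing"},
}
USER_ROLES = {"dr_lee": "physician", "sam": "billing_clerk"}


@dataclass(frozen=True)
class Access:
    """One entry in a (hypothetical) access log."""
    user: str
    action: str
    patient: str


def is_permitted(user, action):
    """Role-based check: a user may act only if their role grants the action."""
    role = USER_ROLES.get(user)
    return action in ROLE_PERMISSIONS.get(role, set())


def audit(log):
    """Return the log entries whose action the user's role does not grant."""
    return [entry for entry in log if not is_permitted(entry.user, entry.action)]


log = [
    Access("dr_lee", "read_record", "patient_1"),
    Access("sam", "read_record", "patient_1"),
    Access("sam", "read_billing", "patient_2"),
]
flagged = audit(log)
print(flagged)
```

A rule check like this only answers “was the access outside the role?”; whether an in-role access was appropriate is the harder question that motivates learning-based auditing.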
Identifiability & A Whole Lot of Data (Weeks 4 - 5)
It’s hard to protect health information… simply because there’s so much data
• Using sample & population statistics to identify
• Techniques to compute distinguishability
And there’s a lot more data than what’s in the medical record!
• We’ll use Social Security Numbers as an example

De-identification & Scrubbing Narratives (Week 6)
• How can we suppress “identifiers” from data?
  • Structured data (e.g., database tables)
• How can we detect and suppress “identifiers” from unstructured data (e.g., clinical narratives)?
  • Welcome to the wonderful world of natural language processing
• Why does clinical treatment context influence de-identification strategies?
• When can we find what was suppressed?

Record Linkage (Week 7)
Given all the data, how can we link it?
• Look at “deterministic” methods, such as rules
• Look at probabilistic methods based on frequentist and Bayesian statistics
• What are the idiosyncrasies in the health domain that allow them to work? Enable them to fail?

Spring Break!!!
No class on March 7 or 9

Anonymization (Week 8 - 9)
If de-identification fails, can we provably protect identity?
Yes we can! Welcome to formal models of anonymization
We’ll be looking at k-based models
• Guarantee every shared record corresponds to at least k people
• Efficient algorithms to achieve this goal
• Various “types” of data

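Two ideas from this part of the schedule, distinguishability and k-based models, can be sketched on a toy table. The records, the field names, and the choice of linking fields are hypothetical. Group sizes inside the released table stand in for “at least k people”, which is an assumption, and the suppression step is a deliberately naive way to reach the goal, not one of the efficient algorithms covered in the course.

```python
from collections import Counter

# Assumed fields an outsider could use to single someone out.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")

# Hypothetical released table; all values are invented.
records = [
    {"zip": "372**", "birth_year": 1975, "sex": "F", "diagnosis": "asthma"},
    {"zip": "372**", "birth_year": 1975, "sex": "F", "diagnosis": "flu"},
    {"zip": "372**", "birth_year": 1980, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "372**", "birth_year": 1980, "sex": "M", "diagnosis": "asthma"},
    {"zip": "370**", "birth_year": 1962, "sex": "M", "diagnosis": "gout"},
]


def group_key(record, fields=QUASI_IDENTIFIERS):
    return tuple(record[f] for f in fields)


def group_sizes(records, fields=QUASI_IDENTIFIERS):
    """Count how many records share each combination of field values."""
    return Counter(group_key(r, fields) for r in records)


def fraction_unique(records, fields=QUASI_IDENTIFIERS):
    """Share of records that are the only one with their combination."""
    sizes = group_sizes(records, fields)
    unique = sum(1 for r in records if sizes[group_key(r, fields)] == 1)
    return unique / len(records)


def is_k_anonymous(records, k, fields=QUASI_IDENTIFIERS):
    """True if every combination present is shared by at least k records."""
    return all(size >= k for size in group_sizes(records, fields).values())


def suppress_small_groups(records, k, fields=QUASI_IDENTIFIERS):
    """Naive fix: drop every record whose group is smaller than k."""
    sizes = group_sizes(records, fields)
    return [r for r in records if sizes[group_key(r, fields)] >= k]


print(fraction_unique(records))
print(is_k_anonymous(records, k=2))
protected = suppress_small_groups(records, k=2)
print(len(protected), is_k_anonymous(protected, k=2))
```

Dropping records costs utility, which is why the course looks at algorithms that generalize values instead of simply removing rows.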
Ethical Reasoning and Privacy (Week 9)
• When should you publish on privacy and vulnerabilities?
• Should you disseminate re-identification software or findings?

Advanced Topics in Anonymization (Week 10 - 11)
• Hiding in a crowd doesn’t always protect sensitive knowledge
  • How to design algorithms to protect against homogeneity and inference attacks.
• We’ll look at the identifiability concerns associated with high-dimensional data, with a focus on genetic information
  • Statistical methods for identification
  • Strategies for anonymization of DNA data

Image and Video Privacy (Weeks 12 - 13)
• Images are everywhere in healthcare
• Video is becoming more prevalent
• How can we remove identifiers from JPEG, MPEG and other multimedia?

Epidemiology and Geospatial Privacy (Week 13)
• Location data is shared for various purposes, but too much granularity can lead to identification
• How does identification occur?
• What anonymization strategies work for geocoded and spatial data? When?

Privacy Preserving Data Mining (Week 14)
You have data. I have data. We all have data. How can we combine data to reveal results, but no individual records?
• Consider “horizontal” (different people, different place) vs. “vertical” (same person, different place) partitioned data systems
• We’ll look at cryptographic methods for secure multiparty computation.

Private Record Linkage & Information Retrieval (Week 14)
• Multiparty computation isn’t always efficient
• We’ll look at two special problems from earlier in the semester and see if they can be solved with relaxed assumptions of the “adversaries”
• We’ll define the conditions under which these solutions work and show why.

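One way to see how parties can combine data to reveal a result but no individual inputs is a secure sum built on additive secret sharing. This is a single-process simulation offered as a sketch, not the specific protocol taught in the course. The modulus, the hospital counts, and the function names are hypothetical, and it assumes every party follows the protocol honestly.

```python
import secrets

# Hypothetical public modulus; assumed larger than any possible total.
MODULUS = 2**61 - 1


def share(value, n_parties, modulus=MODULUS):
    """Split value into n_parties random shares that sum to it mod modulus."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % modulus
    return shares + [last]


def secure_sum(private_values, modulus=MODULUS):
    """Simulate each party sharing its value, then summing received shares."""
    n = len(private_values)
    # Row i holds the shares party i sends out, one per party.
    distributed = [share(v, n, modulus) for v in private_values]
    # Party j adds the j-th share from every party and publishes only that sum.
    partial_sums = [sum(row[j] for row in distributed) % modulus for j in range(n)]
    # The published partial sums add up to the true total.
    return sum(partial_sums) % modulus


hospital_counts = [120, 45, 310]  # invented per-site counts
total = secure_sum(hospital_counts)
print(total)
```

Each individual share is a uniformly random number, so no single message reveals a site’s count; only the combination of all published partial sums yields the total.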
Final Project Presentations! (Week 15)
• The students are in control
• You’ll be graded by a committee of special reviewers

Readings for Next Lecture
• S. Warren and L. Brandeis. The right to privacy. Harvard Law Review. 1890; V. IV, No. 5.
  http://faculty.uml.edu/sgallagher/Brandeisprivacy.htm
• Department of Health and Human Services Summary of the Privacy Rule of the Health Insurance Portability and Accountability Act (HIPAA)
  http://www.hhs.gov/ocr/privacy/hipaa/understanding/summary/index.html
Optional
• D. McGraw, J. Dempsey, L. Harris, and J. Goldman. Privacy as an enabler, not an impediment: building trust into health information exchange. Health Affairs. 2009; 28(2): 416-427.
  http://content.healthaffairs.org/content/28/2/416.full.pdf+html
• O. Tene and J. Polonetsky. Privacy in the age of big data: a time for big decisions. Stanford Law Review (Online). 2012; 64: 63.