Data Privacy in Biomedicine Lecture 1: Introduction and Overview Bradley Malin, PhD (b.malin@vanderbilt.edu) Associate Professor of Biomedical Informatics, School of Medicine Associate Professor of Computer Science, School of Engineering Vanderbilt University January 11, 2016 Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin 2 Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin 3 Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin 4 Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin 5 Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin 6 1 April 2011 Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin 7 http://www.nytimes.com/2014/05/16/us/us-mines-personal-health-data-to-aid-emergency-response © 2016 Bradley Malin Nov 2014 © 2016 Bradley Malin 9 Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin http://www.thestar.com/news/gta/2014/12/26/star_investigation_3_gta_hospitals_dont_proactively_audit_a ccesstopatient_files.html http://healthitsecurity.com/2014/12/23/is-patient-privacy-violated-with-ny-database/ Dec 2014 Dec 2014 Data Privacy in Biomedicine: Lecture 1 8 http://www.nytimes.com/2014/11/09/sunday-review/medical-records-top-secret.html May 2014 Data Privacy in Biomedicine: Lecture 1 Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin 11 Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin 10 12 2 http://www.nytimes.com/2015/12/24/nyregion/a-patient-is-sued-and-his-mental-health-diagnosisbecomes-public.html Welcome to Data Privacy in Biomedicine Dec 2015 For CS : You’re sitting in 8396-02 For Informatics: You’re sitting in 7380 When: Mondays and Wednesdays, 3:10 – 4:25pm Where: Featheringill Hall, Room 313 Office Hours: Upon Request Contact: b.malin@vanderbilt.edu Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin Sample Research Areas Medical Record Access Control, Mining, and Modeling Anonymization of Medical & Genomic Data Service-Oriented Clinical Information Systems Design Big Data Record Linkage Data Privacy in Biomedicine: Lecture 1 14 More to come (projects, homeworks, etc.) – links will be available from front page of website © 2016 Bradley Malin 15 Data Privacy in Biomedicine: Lecture 1 Course Objectives © 2016 Bradley Malin BS, Molecular Biology MS, Computer Science (Data Mining) MPhil, Public Policy & Management PhD, Computer Science (Computation, Organizations & Society) Faculty Member: DBMI (1st), EECS (2nd) Directs: Health Data Science Center(hiplab.org) Directs: Health Information Privacy Laboratory (hiplab.org) Data Privacy in Biomedicine: Lecture 1 http://www.hiplab.org/courses/BMIF380 Your Professor is Brad Malin 13 You are expected to be competent in an object oriented programming language (Java, C++, Python, …) You are expected to have a working knowledge of the Internet, word processing, and basic databases (Access, Oracle, MySQL, PostGres) and analysis tools (R, Matlab, Scala, Excel) Data Detectives: Understand how seemingly private information, can be discovered (or exploited) using automated strategies. Data Protectors: Construct privacy protection technologies that provide formal computational guarantees of privacy in disclosed databases. Technology Policy Designers: Develop privacy protection technologies that complement policy regulations. © 2016 Bradley Malin 16 Expectations After this course, you should be able to analyze data privacy from three non-exclusive perspectives: Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin 17 Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin 18 3 Grading Beyond Expectations This is a research-oriented course. There are no exams. A substantial portion of your grade will be based on your “final” project. You have experience in information data security structures, algorithms, and statistics public Criteria Final Project Homework Assignments Reading Summaries Class Participation policy and legal frameworks Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin 19 Data Privacy in Biomedicine: Lecture 1 Unless the assignment calls for a group project, please do your own homework. You can discuss the homework with other students, including the ways in which you approach the solutions to the questions, but the final submission must be your own. Do not plagiarize without proper attribution – not even in your reading summaries, which leads me to… Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin 21 You may design your own project or choose from a predefined set of topics (will be available on the course website later in the semester) Do not be afraid to discuss your project ideas with the instructor! Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin Submit summaries to b.malin@vanderbilt.edu before class © 2016 Bradley Malin 22 Sample Topics Your project should be an independent study on a data privacy issue, with relationship to the area of biology, medicine, or health more generally - : You skimmed the reading and barely understood its meaning : You read the reading and provided a reasonable account of its contents +: You demonstrated critical reasoning and insight regarding the topic Data Privacy in Biomedicine: Lecture 1 Final Projects 20 There is no textbook for this course. Assigned readings will be available the lecture before it is due (at the latest). Your summaries should be no more than 1 page in length Summaries will be graded on a {-, , +} scale © 2016 Bradley Malin Reading Summaries Homework Policy % of Total Grade 50% 30% 10% 10% 23 Access Control Frameworks for Distributed Medical Record Systems Surveillance of Electronic Medical Record Accesses for Suspicious Behavior Evaluation and Design of Privacy Technologies for Personal Health Records (See Microsoft HealthVault Initiative) Finding & Relating Publicly Available Repositories of Person Specific Biomedical Information Fast Graph-Based Approaches to Data Re-identification Building and Evaluating Clinical Text De-identification Tools Anonymization of clinical profiles / sets of diagnoses Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin 24 4 Final Projects Criteria Due Date % of Grade Project Proposal: A one-pager that describes the project area and how you intend to address the research within the confines of this semester. This will be broken down into a several phases. March 20 5% Status Report Presentation: Briefing for the class on project area and first phase of research. (No more than 5 minutes) March 30 5% Written Project Status Report: A summary of the progress you have made (No more than 4 pages). April 3 10% Final Project Presentation: Showcase of research methods and results. (No more than 15 minutes) April 25 (last day of class) 5% Final Project Report: This will be in the form of a conference-style paper. It will summarize the research area, your methodology, experience, and contributions of your work. May 1 (in lieu of final) 25% Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin Why Do We Need A Course on Privacy? 25 Data Privacy in Biomedicine: Lecture 1 Security for Privacy? Authorization: permission and rolebased models to read/write data Encryption: to avoid eavesdropping during transmission and storage © 2016 Bradley Malin 26 Security for Privacy? Authentication: login with password, tokens, keys Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin 27 Authentication Authorization Encryption But Data Can Re-identify! Data Privacy in Biomedicine: Lecture 1 Can I see some anonymous data? © 2016 Bradley Malin 28 Data Privacy Definitions Security for Privacy? (paraphrase Sweeney) • The study of computational solutions for releasing data such that a) the data is practically useful (utility) while b) the aspects of the subjects of the data are not revealed (privacy). • Privacy Protection (“data protectors”): • release information such that entity-specific properties (e.g. identity) are controlled • restrict what can be learned Authentication Authorization Encryption But Data Can Re-identify! Data Privacy in Biomedicine: Lecture 1 • Data Linkage (“data detectives”) • combining disparate pieces of entity-specific information to learn more about an entity Ah! I know who this is! © 2016 Bradley Malin 29 Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin 30 5 A Visual Perspective A Visual Perspective To ensure utility, you must reveal all the data Utility Utility Privacy Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin Privacy 31 Data Privacy in Biomedicine: Lecture 1 A Visual Perspective © 2016 Bradley Malin 32 A Visual Perspective To ensure privacy, you must not reveal any data Utility Here Lives Data Privacy Utility Privacy Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin Privacy 33 Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin 34 Privacy, Policy, & Preference A Visual Perspective Individuals want control over who can – AND CAN NOT – view their health-related records Yoda Data Utility Physicians Yoda Data Yoda Data Insurance Yoda’s Preferences Physicians = Yes Insurance = Yes Researchers = No Privacy Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin 35 HOSPITAL RECORDS Data Privacy in Biomedicine: Lecture 1 Researchers © 2016 Bradley Malin 36 6 Enabling Personal Control Through Technology Data Collection, Policy, and Privacy Can design technology to: Preventing Data Collection in Public Spaces http://www.appliedautonomy.com/isee.html Standardize policy specification Inform about data collection Address specific privacy concerns in data sharing Anonymity Confidentiality Solitude Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin 37 Commons Center Beyond Policy and Informative Technology We can not always control who gets, and has access to, our information Legally, however, data collectors may be required to maintain your privacy Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin 38 Ingram Commons Data Collection occurs everywhere, everyday, in all different forms (http://webcams.vanderbilt.edu/) Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin 39 Featheringill! Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin 40 More Cameras Peabody Library Terrace http://peabody.vanderbilt.edu/about/webcams/peabody_library_terrace_webcam.php Even More Cameras! http://peabody.vanderbilt.edu/about/webcams/index.php http://webcams.vanderbilt.edu/kissam/ Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin 41 Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin 42 7 Biomedical Information Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin 43 Not quite in the public But… information is shared for various purposes in various contexts How do you protect privacy of corresponding individuals? Data Privacy in Biomedicine: Lecture 1 44 Privacy Policy & the Law (Week 1) Schedule © 2016 Bradley Malin Let’s look at the syllabus. Privacy ideologies & frameworks Who gets to collect information? When is health information shared? How is health information reused and why? Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin 45 Data Privacy in Biomedicine: Lecture 1 Access Control & Roles (Week 2) 46 Auditing (Week 3) Access Control © 2016 Bradley Malin Who gets to see information when? Roles, job functions, & permissions Formal representations What Constitutes a “Good” Role Representation of organizational behavior Grouping users based on legacy knowledge provided by system administrators Medical Records & Audits Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin 47 Who looked at my medical record? Should they be looking? How do we construct machine learning strategies that make sense? Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin 48 8 Identifiability & A Whole Lot of Data (Weeks 4 - 5) Identifiability & A Whole Lot of Data (Weeks 4 - 5) When can we find what was suppressed? How can we suppress “identifiers” from data? Why does clinical treatment context influence de-identification strategies? Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin 49 © 2016 Bradley Malin Given all the data, how can we link it? Look at “deterministic” methods, such as rules 51 Look at probabilistic methods based on frequentist and Bayesian statistics © 2016 Bradley Malin 52 Spring Break!!! If de-identification fails, can we provably protect identity? 50 What are the idiosynchrasies in the health domain that allow them to work? Enable them to fail? Data Privacy in Biomedicine: Lecture 1 Anonymization (Week 8 - 9) © 2016 Bradley Malin Record Linkage (Week 7) How can we detect and suppress “identifiers” from unstructured data (e.g., clinical narratives)? Welcome to the wonderful world of natural language processing Data Privacy in Biomedicine: Lecture 1 And there’s alot more data than what’s in the medical record! We’ll use Social Security Numbers as an example Data Privacy in Biomedicine: Lecture 1 De-identification & Scrubbing Narratives (Week 6) It’s hard to protect health information… simply because there’s so much data Structured data (e.g., database tables) Using sample & population statistics to identify Techniques to compute distinguishability Yes we can! Welcome to formal models of anonymization No class on March 7 or 9 We’ll be looking at k-based models Guarantee every shared record corresponds to at least k people Efficient algorithms to achieve this goal Various “types” of data Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin 53 Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin 54 9 Advanced Topics in Anonymization (Week 10 - 11) Ethical Reasoning and Privacy (Week 9) Hiding in a crowd doesn’t always protect sensitive knowledge We’ll look at the identifiability concerns associated highdimensional data with a focus on: When should you publish on privacy ad vulnerabilities? Should you disseminate re-identification software or findings? Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin Images are everywhere in healthcare Video is becoming more prevalent How can we remove identifiers from JPEG, MPEG and other multimedia? Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin 55 57 56 Location data is shared for various purposes, but too much granularity can lead to identification How does identification occur? What anonymization strategies work for geocoded and spatial data? When? © 2016 Bradley Malin 58 Private Record Linkage & Information Retrieval (Week 14) Consider “horizontal” (different people different place) vs. “vertical” (same person, different place) partitioned data systems © 2016 Bradley Malin Data Privacy in Biomedicine: Lecture 1 Multiparty computation isn’t always efficient We’ll look at two special problems from earlier in the semester and see if they can be solved with relaxed assumptions of the “adversaries” We’ll define the conditions under which these solutions work and show why. We’ll look at cryptographic methods for secure multiparty computation. Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin Epidemiology and Geospatial Privacy (Week 13) You have data. I have data. We all have data. How can be combine data to reveal results, but no individual records? Statistical methods for identification Strategies for anonymization of DNA data Data Privacy in Biomedicine: Lecture 1 Privacy Preserving Data Mining (Week 14) genetic information Image and Video Privacy (Weeks 12 - 13) How to design algorithms to protect against homogeneity and inference attacks. 59 Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin 60 10 Readings for Next Lecture Final Project Presentations! (Week 15) The students are in control S. Warren and L. Brandeis. The right to privacy. Harvard Law Review. 1890; V. IV, No. 5. Department of Health and Human Services Summary of the Privacy Rule of the Health Information Portability and Accountability Act (HIPAA) You’ll be graded by a committee of special reviewers Data Privacy in Biomedicine: Lecture 1 http://faculty.uml.edu/sgallagher/Brandeisprivacy.htm http://www.hhs.gov/ocr/privacy/hipaa/understanding/summary/index.html Optional D. McGraw, J. Dempsey, L. Haris, and J. Goldman. Privacy as an enabler, not an impediment: building trust into health information exchange. Health Affairs. 2009; 28(2): 416-427. © 2016 Bradley Malin 61 http://content.healthaffairs.org/content/28/2/416.full.pdf+html O. Tene and J. Polonetsky. Privacy in the age of big data: a time for big decisions. Stanford Law Review (Online). 2012; 64: 63. Data Privacy in Biomedicine: Lecture 1 © 2016 Bradley Malin 62 11