Holistic Privacy
From Location Privacy to Genomic Privacy
Jean-Pierre Hubaux
With contributions from E. Ayday, M. Humbert, J.-Y. Le Boudec, J.-L. Raisaro, R. Shokri, G. Theodorakopoulos

Make It Faster!!
Benz Motorwagen, 1885
Ford-T, 1915

After Some Decades…

… the Concerns Have Changed
• Reduce casualties
– Better brakes
– Safety belts
– Airbags
– …
• Mitigate side effects
– Road congestion
– Depletion of fossil fuel
– Climate change
– …

Similar Phenomenon with IT
Assault on privacy
For each end user:
• 10s to 1000s Mb/s
• Terabytes of storage
• Processor in the GHz range
Cyber-crime, cyberwar
Information overload, attention deficit disorder

Holistic Privacy
From Location Privacy to Genomic Privacy
1. On Privacy Protection
2. Location Privacy
3. Genomic Privacy

Another Observation Tool…
“The Right to Privacy”
Warren and Brandeis
Harvard Law Review, Vol. IV, No. 5, Dec. 15, 1890
Major concern: photography without consent

Some Modern Observation Tools
Cellular phones
Online Social Networks
Genomic sequencing

Privacy: Definition
• Privacy control is the ability of individuals to determine when, how, and to what extent information about themselves is revealed to others.
• Goal: let personal data be used only in the context in which they were released
• Privacy is about the data of individuals

Main Risk: People’s Mind Manipulation
Those observing us
Citizens (us)

Privacy Protection at Odds with…
• Security (e.g., homeland security)
• Usability
• Business (e.g., targeted advertisement)
• System performance
• Medical progress

Holistic Privacy
From Location Privacy to Genomic Privacy
1. On Privacy Protection
2. Location Privacy
3. Genomic Privacy

Location-Based Services
Users upload location episodically through WiFi or cellular networks
Query, Location, Time

Why Reveal Your Location?
• To use service
– Cellular connectivity
– Location-based services
– Local recommendations
– Road toll payment
– …
• For social benefits
– Find friends

Can You Clean up Your Digital Trace?
[Figure: map of numbered location events. Color: user identity; number: time-stamp; position on the map: location-stamp]

Threat
[Figure: User A's trace annotated with observation times: Monday 8am, Tuesday 8am, Wednesday 12pm, Monday 6pm, Thursday 5pm, Friday 5pm]
The contextual information attached to a trace tells much about our habits, interests, activities, beliefs and relationships

Quantification of Location Privacy
• Many privacy-preserving mechanisms proposed
• No unified formal framework in previous work
• Various metrics for location privacy
• How to compare different mechanisms?
• Which metric to use?

Time and Space
• Consider discrete time and space
• Attacker: service provider (“honest but curious”)

Quantifying Location Privacy
KC: Knowledge Constructor
LPPM: Location Privacy Protection Mechanism
– Deliberately imprecise coordinate reports (e.g., drop some of the least significant bits)
– Swap user identifiers

Correctness
The adversary’s estimation of x given the observed traces o

Location-Privacy Preserving Mechanisms
Implemented LPPMs:

Location-Privacy Meter
Open source software tool (C++) to quantify location privacy
Inputs to the Location-Privacy Meter (LPM):
– Some traces to learn the users’ mobility profiles (background knowledge)
– Observed traces
Output of the LPM:
– Location privacy of users with respect to various attacks: Localization, Tracking, Meeting Disclosure, Aggregate Presence Disclosure, …

LPM: Example
• N = 20 users
• R = 40 regions
• T = 96 time instants
• Protection mechanism:
– Hiding location
– Precision reduction (dropping low-order bits from the x, y coordinates of the location)

Attacks
• LO-ATT: Localization Attack: For a given user u and time t, what is the location of u at t?
• MD-ATT: Meeting Disclosure Attack: For a given pair of users u and v, what is the expected number of meetings between u and v?
• AP-ATT: Aggregated Presence Attack: For a given region r and time t, what is the expected number of users present in r at t?

Results

Protecting Location Privacy: Optimal Strategy against Localization Attacks
Adversary Knowledge: User’s “Location Access Profile”
Data source: Location traces collected by Nokia Lausanne (Lausanne Data Collection Campaign)

Location Obfuscation Mechanism
Consequence: “Service Quality Loss”

Location Inference Attack
Estimation Error: “Location Privacy”

Problem Statement

Zero-sum Bayesian Stackelberg Game
User (leader)
LBS message
Adversary (follower)
User gain / adversary loss

Optimal Strategy for the User
Respect service quality constraint
Proper probability distribution

Optimal Strategy for the Adversary
Minimizing the user’s maximum privacy under the service quality constraint
Proper probability distribution
Shadow price of the service quality constraint (exchange rate between service quality and privacy)
Note: This is the dual of the previous optimization problem

Evaluation: Obfuscation Function

Output Visualization of Obfuscation Mechanisms
Optimal Obfuscation
Basic Obfuscation (k = 7)

Conclusion on Location Privacy
• Protecting location privacy is a major challenge
• Quantification expressed as adversary’s expected estimation error (incorrectness)
• Techniques to protect location privacy: introduce imprecision in the reported location, reduce location report frequency, make use of pseudonyms, …
• Privacy (like any security property) is adversary-dependent. Neglecting the adversary’s strategy and knowledge limits the privacy protection
• More information and pointers: http://lca.epfl.ch/projects/quantifyingprivacy

Holistic Privacy
From Location Privacy to Genomic Privacy
1. On Privacy Protection
2. Location Privacy
3. Genomic Privacy

On Convergence…
Digital medicine:
– Digital medical records
– Digital imaging
– Medical online social networks
– Genome sequencing
– Other ’omics data
– Wireless biosensors
– …
Computing, Telecom, ICT
“The last inch”
…0100110100011…
…CGTTAATTCCGTA…

The Genomic Avalanche Is Coming…

Genetic Sequencing

GATTACA (1997 Movie)

Basics of Genomics – 1
• A full genome sequence:
– uniquely identifies each one of us
– contains information about our ethnic heritage, disease predispositions, and many other phenotypic traits.
• Human genome: 3 billion letters

Basics of Genomics – 2
• The cell’s nucleus holds the genetic program that determines most of our physical characteristics.
• This information is stored in chromosomes.
• Billions of identical copies of the genetic program, one for each cell nucleus.

Basics of Genomics – 3
• Chromosomes: molecules of a double-stranded chemical known as deoxyribonucleic acid (DNA)
• DNA consists of chemical units, known as nucleotides, that hook together

Basics of Genomics – 4
• DNA has two strands and four nucleotides (A T G C):
• A = Adenine
• T = Thymine
• G = Guanine
• C = Cytosine
Pairs: A-T and G-C
• The genetic information is stored in the exact sequence of nucleotides.

Basics of Genomics – 5
Human Genome: complete and ordered sequence of all 23 chromosomes

Basics of Genomics – 6
• The Human Genome is identical in most places for all people.
• SNP (Single Nucleotide Polymorphism): positions where some people have one nucleotide pair while others have another.

Basics of Genomics – 7
40 million SNPs
• SNPs make up only 1.3% of the genome
• The differences at these places make each of us unique
An allele designates which nucleotide is present at a SNP.

Summary of Key Concepts
• Our genetic information is stored in the sequence of DNA in our chromosomes.
• There are 23 chromosomes in a human genome. Men and women have slightly different sets of chromosomes.
• SNPs are chromosome addresses. They are spots where some people have one nucleotide, while others have another.
• SNPs have four possible alleles: A, T, G, and C.
• Our collection of SNP alleles is what makes each of us unique.
• Modern techniques make it possible to determine the status of large numbers of SNPs very efficiently.

From the Sample to the Full Genome Sequence
Samples → sequencing machine (Illumina, Roche, Life Technologies, Oxford Nanopore, Pacific Biosciences, …), deep / ultra-deep sequencing → raw data (FASTQ) → SAM file (aligned reads) → full genome
Uses of the full genome:
• Individual diagnosis, personalized medicine
• Statistics

Threat
• Leakage of genomic data
• Revelation of privacy-sensitive data about the patient
– Predisposition to disease, ethnicity, paternity or filiation, etc.
– Denial of access to health insurance, mortgage, education, and employment
• Cross-layer attacks
– Using privacy-sensitive information belonging to a victim retrieved from different sources

Goals
• Allow specialists to access only the genomic data they need
• Protect data, including from insiders (e.g., curious sysadmins) → homomorphic encryption
• Access time to a single patient’s genomic data below a few seconds
• Access time to the data of a cohort of thousands of patients below a few minutes

Cryptographic Tools

Possible Solution
Parties: Patient (P), Certified Institution, Storage and Processing Unit (SPU), Medical Center (MC)
Adversaries: Curious Party @ SPU, Malicious 3rd party
1) Sample
2) Sequencing and encryption
3) Encrypted variants
5) “Check my susceptibility to disease X” and part of P’s secret key, x(2)
6) Markers related to disease X and their contributions
7) Homomorphic operations and proxy encryption
8) End-result or related variants

Disease Susceptibility – Weighted Averaging
[Figure: P’s SNPs, the markers for disease X, their probabilities, the contributions of the markers, and the resulting susceptibility of P for disease X]
• All operations are conducted in ciphertext using homomorphic encryption.
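
The weighted averaging in ciphertext described above can be illustrated with a small sketch. The slides only say "homomorphic encryption"; the choice of a textbook Paillier scheme here is an assumption, as are the toy key size, the 0/1 encoding of each SNP (risk allele present or not), the per-mille scaling of probabilities, and all marker values. The proxy-encryption step and the split secret key from the protocol are not modeled: a single party decrypts.

```python
import math
import secrets

# Toy Paillier key built from two known Mersenne primes.
# Assumption for illustration only: far too small for real use.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p-1, q-1)
MU = pow(LAM, -1, N)  # valid because the generator is g = n + 1


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N under the public key (N, g = N + 1)."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (1 + m * N) % N2 * pow(r, N, N2) % N2


def decrypt(c: int) -> int:
    """Recover the plaintext with the private values LAM and MU."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def add(c1: int, c2: int) -> int:
    """Ciphertext of the sum of the two underlying plaintexts."""
    return c1 * c2 % N2


def scale(c: int, k: int) -> int:
    """Ciphertext of k times the underlying plaintext (k may be negative)."""
    return pow(c, k % N, N2)


def encrypted_weighted_sum(enc_snps, markers):
    """Sum of weight * probability over all markers, computed on ciphertexts.

    markers: (weight, p_risk, p_base) with probabilities scaled by 1000;
    enc_snps: encryptions of 1 (risk allele present) or 0 (absent).
    """
    acc = encrypt(0)
    for c_snp, (weight, p_risk, p_base) in zip(enc_snps, markers):
        # p_base + snp * (p_risk - p_base) selects the right probability.
        term = add(encrypt(p_base), scale(c_snp, p_risk - p_base))
        acc = add(acc, scale(term, weight))
    return acc


def susceptibility(total: int, markers) -> float:
    """Normalize the decrypted sum into a weighted average in [0, 1]."""
    return total / (1000 * sum(weight for weight, _, _ in markers))


# Hypothetical markers for disease X and hypothetical patient SNPs.
MARKERS = [(3, 600, 200), (1, 300, 100), (2, 150, 400)]
PATIENT_SNPS = [1, 0, 1]

if __name__ == "__main__":
    enc = [encrypt(s) for s in PATIENT_SNPS]   # done once, at sequencing time
    total = decrypt(encrypted_weighted_sum(enc, MARKERS))
    print(susceptibility(total, MARKERS))
```

In this sketch the party holding the encrypted SNPs never sees a plaintext SNP; it only multiplies and exponentiates ciphertexts, and the final normalization happens after decryption.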
Prototype – Patient Interface

Prototype – SPU Interface

Prototype – Medical Center Interface

Holistic Privacy: Data about an Individual
Mobility + Body Area Network
Human Relationships
Genome

Conclusion on Genomic Privacy
• Digital medicine is coming
• It will forever change the landscape of privacy protection
• Genomics is particularly relevant and there is a huge ongoing research effort
• Highly sensitive data + huge amounts of data + complex correlations between data → Complex field, Big Data
• Tools (cryptography, security protocols, database/differential privacy, anonymization techniques, …) already used for privacy protection in ICT can (and should) be applied here
• More information and pointers: http://lca.epfl.ch/projects/genomic-privacy/

Overall Conclusion
• Assault on privacy → huge research challenges
• Location privacy
– quantifiable at the physical level ((x, y) coordinates)
– ongoing work at the semantic level
• Online Social Networks → part of the background knowledge of the adversary
• Genomic privacy
– still in its infancy
– soon to be very hot
– first results coming out
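
As a closing illustration of "quantifiable at the physical level", here is a minimal sketch of the location-privacy metric used in this talk, the adversary's expected estimation error, applied to the precision-reduction mechanism (dropping low-order bits of the x, y coordinates). The adversary model is an assumption made for brevity: it has no mobility profile and treats every cell consistent with the observation as equally likely. The function names are hypothetical and are not taken from the Location-Privacy Meter tool.

```python
import math
from typing import Dict, Tuple

Cell = Tuple[int, int]


def reduce_precision(x: int, y: int, dropped_bits: int) -> Cell:
    """LPPM sketch: zero the low-order bits of non-negative grid coordinates."""
    mask = ~((1 << dropped_bits) - 1)
    return x & mask, y & mask


def uniform_posterior(observed: Cell, dropped_bits: int) -> Dict[Cell, float]:
    """Assumed adversary: uniform belief over all cells matching the report."""
    side = 1 << dropped_bits
    ox, oy = observed
    cells = [(ox + dx, oy + dy) for dx in range(side) for dy in range(side)]
    return {cell: 1.0 / len(cells) for cell in cells}


def expected_estimation_error(posterior: Dict[Cell, float], true_loc: Cell) -> float:
    """Location privacy as the adversary's expected Euclidean error."""
    tx, ty = true_loc
    return sum(p * math.hypot(x - tx, y - ty) for (x, y), p in posterior.items())


def privacy_of(true_loc: Cell, dropped_bits: int) -> float:
    """Privacy obtained at true_loc when dropped_bits low-order bits are hidden."""
    observed = reduce_precision(true_loc[0], true_loc[1], dropped_bits)
    return expected_estimation_error(uniform_posterior(observed, dropped_bits), true_loc)


if __name__ == "__main__":
    for bits in range(4):
        print(bits, privacy_of((5, 6), bits))
```

A stronger adversary would replace `uniform_posterior` with a posterior built from the user's mobility profile, which is why the metric is adversary-dependent.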