Location Privacy - LCA

Holistic Privacy
From Location Privacy to Genomic Privacy
Jean-Pierre Hubaux
With contributions from
E. Ayday, M. Humbert, J.-Y. Le Boudec,
J.-L. Raisaro, R. Shokri, G. Theodorakopoulos
Make It Faster!!
Benz Motorwagen, 1885
Ford-T, 1915
After Some Decades…
… the Concerns Have Changed
• Reduce casualties
– Better brakes
– Safety belts
– Airbags
– …
• Mitigate side effects
– Road congestion
– Depletion of fossil fuel
– Climate change
– …
Similar Phenomenon with IT
Assault on privacy
For each end user:
• 10s to 1000s of Mb/s
• Terabytes of storage
• Processors in the GHz range
Cyber-crime, cyberwar
Information overload, attention deficit disorder
Holistic Privacy
From Location Privacy to Genomic Privacy
1. On Privacy Protection
2. Location Privacy
3. Genomic Privacy
Another Observation Tool…
“The Right to Privacy”
Warren and Brandeis
Harvard Law Review
Vol. IV Dec. 15, 1890 No. 5
Major concern: photography without consent
Some Modern Observation Tools
Cellular phones
Online Social Networks
Genomic sequencing
Privacy: Definition
• Privacy control is the ability of individuals to determine when, how, and to what extent information about themselves is revealed to others.
• Goal: let personal data be used only in the context in which they were released
• Privacy is about the data of individuals
Main Risk: Manipulation of People’s Minds
Those observing us
Citizens (us)
Privacy Protection at Odds with…
Security (e.g., homeland security)
Usability
Privacy Protection
Business (e.g., targeted advertising)
System performance
Medical progress
Holistic Privacy
From Location Privacy to Genomic Privacy
1. On Privacy Protection
2. Location Privacy
3. Genomic Privacy
Location-Based Services
Users upload location episodically
through WiFi or cellular networks
Query, Location, Time
Why Reveal Your Location?
• To use a service
– Cellular connectivity
– Location-based services
– Local recommendations
– Road toll payment
– …
• For social benefits
– Find friends
Can You Clean up Your Digital Trace?
[Figure: map of time-stamped location events for several users]
Legend for events:
Color: user identity
Number: time-stamp
Position on the map: location-stamp
Threat
User A: Monday 8am, Tuesday 8am
User A: Wednesday 12pm
User A: Monday 6pm, Thursday 5pm
User A: Friday 5pm
The contextual information attached to a
trace tells much about our habits, interests,
activities, beliefs and relationships
Quantification of Location Privacy
• Many privacy-preserving mechanisms
proposed
• No unified formal framework in previous work
• Various metrics for location privacy
• How to compare different mechanisms?
• Which metric to use?
Time and Space
• Consider discrete time and space
• Attacker: service provider (“honest but curious”)
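Discretizing time and space can be pictured as follows. This is a minimal sketch: the 8 × 5 grid of regions, the 15-minute time slots, and both function names are assumptions chosen for illustration, not part of any tool mentioned in these slides.

```python
def to_region(x, y, width=8.0, height=5.0, cols=8, rows=5):
    """Map a point of a width x height area to a region index in [0, cols * rows)."""
    # Clamp so that points on the far border fall into the last column/row.
    col = min(int(x / width * cols), cols - 1)
    row = min(int(y / height * rows), rows - 1)
    return row * cols + col


def to_time_instant(seconds_since_midnight, slot_seconds=900):
    """Map a time of day to a slot index (900 s slots give 96 instants per day)."""
    return int(seconds_since_midnight // slot_seconds)


region = to_region(3.2, 4.9)          # column 3, row 4 -> region 35
instant = to_time_instant(8 * 3600)   # 8am -> slot 32
```

With such a mapping, a trace becomes a sequence of (region, time instant) pairs, which is the form the attacker is assumed to reason about.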
Quantifying Location Privacy
KC: Knowledge Constructor
LPPM: Location Privacy Protection Mechanism, for example:
- Deliberately imprecise coordinate reports (e.g., drop some of the least significant bits)
- Swap user identifiers
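The two mechanisms listed above can be sketched in a few lines. This is an illustrative sketch only: it assumes non-negative integer coordinates and events stored as (user, time, (x, y)) tuples, and the function names are hypothetical.

```python
import random


def reduce_precision(x, y, dropped_bits):
    """Zero the `dropped_bits` least significant bits of integer coordinates."""
    mask = ~((1 << dropped_bits) - 1)
    return x & mask, y & mask


def swap_identifiers(events, seed=0):
    """Relabel users with a random permutation of the existing identifiers."""
    users = sorted({user for user, _, _ in events})
    shuffled = users[:]
    random.Random(seed).shuffle(shuffled)
    pseudonym = dict(zip(users, shuffled))
    return [(pseudonym[user], t, loc) for user, t, loc in events]


coarse = reduce_precision(0b101101, 0b110011, 3)   # (0b101000, 0b110000)
events = [("alice", 1, (45, 51)), ("bob", 1, (12, 7)), ("alice", 2, (46, 50))]
relabeled = swap_identifiers(events)
```

Dropping more bits enlarges the area a report could have come from; swapping identifiers leaves locations exact but breaks the link between a trace and its owner.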
Correctness
The adversary’s estimation of x given the observed traces o
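One way to turn the adversary's estimate into a number is the expected distance between the estimated and the true location, weighted by the adversary's posterior over x given o. The sketch below assumes the posterior is given as a dictionary over candidate locations and that Euclidean distance is the metric of interest; both are illustrative choices.

```python
import math


def expected_estimation_error(posterior, true_location, distance=math.dist):
    """Sum over candidate locations x of Pr(x | o) * distance(x, true location)."""
    return sum(p * distance(x, true_location) for x, p in posterior.items())


# Hypothetical posterior of the adversary after observing o.
posterior = {(0, 0): 0.5, (3, 4): 0.25, (6, 8): 0.25}
error = expected_estimation_error(posterior, (0, 0))   # 0.25 * 5 + 0.25 * 10 = 3.75
```

A larger value means the adversary is, on average, further from the truth, i.e., the user enjoys more location privacy.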
Location-Privacy Preserving Mechanisms
Implemented LPPMs:
Location-Privacy Meter
Open source software tool (C++) to quantify location privacy
Inputs to the Location-Privacy Meter (LPM):
– Some traces to learn the users’ mobility profiles (background knowledge)
– Observed traces
Output of the LPM:
– Location privacy of users with respect to various attacks: Localization, Tracking, Meeting Disclosure, Aggregate Presence Disclosure, …
LPM: Example
• N = 20 users
• R = 40 regions
• T = 96 time instants
• Protection mechanism:
– Hiding location
– Precision reduction (dropping
low-order bits from the x, y
coordinates of the location)
Attacks
• LO-ATT: Localization Attack: For a given user u and time t,
what is the location of u at t?
• MD-ATT: Meeting Disclosure Attack: For a given pair of users
u and v, what is the expected number of meetings between u
and v?
• AP-ATT: Aggregated Presence Attack: For a given region r
and time t, what is the expected number of users present in r
at t?
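Given a per-user posterior over regions at each time instant, the three questions above reduce to simple sums. The sketch below is an illustration only: the data layout (`post[user][t][region] = probability`) and the function names are assumptions, and the meeting count additionally assumes that the two users move independently.

```python
def localization(post, u, t):
    """LO-ATT: most likely region of user u at time t."""
    return max(post[u][t], key=post[u][t].get)


def expected_meetings(post, u, v):
    """MD-ATT: expected number of time instants at which u and v share a region
    (assumes, for illustration, that the two users move independently)."""
    return sum(
        sum(p * post[v][t].get(r, 0.0) for r, p in regions.items())
        for t, regions in enumerate(post[u])
    )


def expected_presence(post, r, t):
    """AP-ATT: expected number of users in region r at time t."""
    return sum(trajectory[t].get(r, 0.0) for trajectory in post.values())


# Hypothetical posteriors for two users over two time instants.
post = {
    "u": [{0: 0.5, 1: 0.5}, {1: 1.0}],
    "v": [{0: 1.0}, {1: 0.5, 2: 0.5}],
}
where_u = localization(post, "u", 1)       # region 1
meetings = expected_meetings(post, "u", "v")   # 0.5 + 0.5 = 1.0
crowd = expected_presence(post, 1, 1)      # 1.0 + 0.5 = 1.5
```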
Results
Protecting Location Privacy:
Optimal Strategy against Localization Attacks
Adversary Knowledge:
User’s “Location Access Profile”
Data source: Location traces collected by Nokia Lausanne (Lausanne Data Collection Campaign)
Location Obfuscation Mechanism
Consequence: “Service Quality Loss”
Location Inference Attack
Estimation Error: “Location Privacy”
Problem Statement
Zero-sum Bayesian Stackelberg Game
User (leader)
LBS message
Adversary (follower)
user gain / adversary loss
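The leader/follower structure can be illustrated with a small sketch: the user commits to an obfuscation function, the adversary observes the reported location and picks the estimate that minimizes its expected error, and that error is the user's gain. Everything below (names, the dictionary layout `f[r][observed]`, the toy distance) is an assumption made for illustration, not the authors' implementation.

```python
def adversary_best_response(prior, f, d_p, observed):
    """Follower: the estimate that minimizes the expected error given `observed`."""
    regions = list(prior)

    def expected_error(guess):
        return sum(prior[r] * f[r][observed] * d_p(guess, r) for r in regions)

    return min(regions, key=expected_error)


def user_privacy(prior, f, d_p):
    """Leader's payoff: expected estimation error when the adversary best-responds."""
    regions = list(prior)
    total = 0.0
    for observed in regions:
        guess = adversary_best_response(prior, f, d_p, observed)
        total += sum(prior[r] * f[r][observed] * d_p(guess, r) for r in regions)
    return total


# Toy setting: three regions on a line, uniform location access profile.
regions = [0, 1, 2]
prior = {r: 1 / 3 for r in regions}
d_p = lambda a, b: abs(a - b)
no_obfuscation = {r: {o: 1.0 if o == r else 0.0 for o in regions} for r in regions}
uniform_noise = {r: {o: 1 / 3 for o in regions} for r in regions}

privacy_none = user_privacy(prior, no_obfuscation, d_p)   # adversary is always right
privacy_noise = user_privacy(prior, uniform_noise, d_p)   # adversary guesses the middle
```

In this toy case, reporting the true location yields zero privacy, while a report that is independent of the true location forces the adversary back onto its prior knowledge.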
Optimal Strategy for the User
Respect service quality constraint
Proper probability distribution
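The two annotations above label the constraints of a linear program. One possible formulation is sketched here under stated assumptions: ψ(r) denotes the user's location access profile, f(r'|r) the probability of reporting r' when located at r, d_p and d_q the privacy and service-quality distance functions, and Q^max the tolerated quality loss. All symbols are notation introduced here for illustration.

```latex
\begin{aligned}
\max_{f,\,x}\quad & \sum_{r'} x_{r'} \\
\text{s.t.}\quad
 & x_{r'} \le \sum_{r} \psi(r)\, f(r' \mid r)\, d_p(\hat{r}, r)
   && \forall\, \hat{r},\, r' \\
 & \sum_{r} \psi(r) \sum_{r'} f(r' \mid r)\, d_q(r', r) \le Q^{\max}
   && \text{(service quality constraint)} \\
 & \sum_{r'} f(r' \mid r) = 1 \;\; \forall\, r, \qquad f(r' \mid r) \ge 0 \;\; \forall\, r, r'
   && \text{(proper probability distribution)}
\end{aligned}
```

Each variable x_{r'} is bounded by the adversary's expected error for every possible estimate, so at the optimum it equals the error of the adversary's best estimate after observing r'.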
Optimal Strategy for the Adversary
Minimizing the user’s maximum privacy under the service quality constraint
Proper probability distribution
Shadow price of the service quality constraint (exchange rate between service quality and privacy)
Note: This is the dual of the previous optimization problem
Evaluation: Obfuscation Function
Output Visualization of Obfuscation Mechanisms
Optimal Obfuscation
Basic Obfuscation (k = 7)
Conclusion on Location Privacy
• Protecting location privacy is a major challenge
• Location privacy is quantified as the adversary’s expected estimation error (incorrectness)
• Techniques to protect location privacy: introduce imprecision in the reported location, reduce location report frequency, make use of pseudonyms, …
• Privacy, like any security property, is adversary-dependent. Neglecting the adversary’s strategy and knowledge limits privacy protection
• More information and pointers:
http://lca.epfl.ch/projects/quantifyingprivacy
Holistic Privacy
From Location Privacy to Genomic Privacy
1. On Privacy Protection
2. Location Privacy
3. Genomic Privacy
On Convergence…
Digital medicine:
- Digital medical records
- Digital imaging
- Medical online social networks
- Genome sequencing
- Other ’omics data
- Wireless biosensors
…
Computing
Telecom
ICT
“The last inch”
…0100110100011…
…CGTTAATTCCGTA…
The Genomic Avalanche Is Coming…
Genetic Sequencing
GATTACA (1997 Movie)
Basics of Genomics – 1
• A full genome sequence:
– uniquely identifies each one of us
– contains information about our ethnic
heritage, disease predispositions, and
many other phenotypic traits.
• Human genome: 3 billion letters
Basics of Genomics - 2
• The cell’s nucleus holds the genetic program that determines most of
our physical characteristics.
• This information is stored in chromosomes.
• Billions of identical copies of the genetic program, one for each cell
nucleus.
Basics of Genomics – 3
• Chromosomes: molecules of a double-stranded chemical known as deoxyribonucleic acid (DNA)
• DNA consists of chemical units, known as nucleotides, that hook together
Basics of Genomics – 4
• DNA has two strands and four nucleotides (A, T, G, C):
– A = Adenine
– T = Thymine
– G = Guanine
– C = Cytosine
Pairs: A-T and G-C
• The genetic information is stored in the exact sequence of nucleotides.
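Because of the A-T and G-C pairing, one strand determines the other. A minimal sketch follows; the function name is hypothetical, and the result is written position by position against the input rather than reversed.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand):
    """Return the nucleotide facing each position of `strand` on the other strand."""
    return "".join(COMPLEMENT[base] for base in strand)


paired = complementary_strand("GATTACA")   # "CTAATGT"
```

Applying the function twice returns the original sequence, which is why storing one strand is enough to store the information.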
Basics of Genomics – 5
Human genome: complete and ordered sequence of all 23 chromosomes
Basics of Genomics - 6
• The human genome is identical in most places for all people.
• SNP (Single Nucleotide Polymorphism): a position where some people have one nucleotide pair while others have another.
Basics of Genomics – 7
40 million SNPs
• SNPs make up only 1.3% of the genome
• The differences at these places make each of us unique
• An allele designates which nucleotide is present at a SNP.
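To make the vocabulary concrete, the sketch below compares two short aligned sequences, lists the positions where they differ, and reads off the allele each one carries there. The sequences and function names are made up for illustration. The last line checks the quoted proportion: 40 million SNPs out of 3 billion letters is about 1.3%.

```python
def differing_positions(seq_a, seq_b):
    """Positions at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def alleles_at(seq, positions):
    """Allele (nucleotide) carried by `seq` at each given position."""
    return {i: seq[i] for i in positions}


person_1 = "CGTTAATTCCGTA"
person_2 = "CGTTACTTCCGTG"
positions = differing_positions(person_1, person_2)   # [5, 12]
alleles_1 = alleles_at(person_1, positions)           # {5: "A", 12: "A"}
alleles_2 = alleles_at(person_2, positions)           # {5: "C", 12: "G"}

snp_fraction = 40_000_000 / 3_000_000_000             # about 0.013, i.e. 1.3%
```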
Summary of Key Concepts
• Our genetic information is stored in the sequence of DNA in
our chromosomes.
• There are 23 chromosomes in a human genome. Men and
women have slightly different sets of chromosomes.
• SNPs are chromosome addresses. They are spots where some
people have one nucleotide, while others have another.
• SNPs have four possible alleles: A, T, G, and C.
• Our collection of SNP alleles is what makes each of us unique.
• Modern techniques make it possible to determine the status
of large numbers of SNPs very efficiently.
From the Sample to the Full Genome Sequence
Samples → Sequencing machine (Illumina, Roche, Life Technologies, Oxford Nanopore, Pacific Biosciences, …) → Raw data (FASTQ) → SAM file (aligned reads) → Full genome
Deep / ultra-deep sequencing
• Individual diagnosis, personalized medicine
• Statistics
Threat
• Leakage of genomic data
• Revelation of privacy-sensitive data about the
patient
– Predisposition to disease, ethnicity, paternity or
filiation, etc.
– Denial of access to health insurance, mortgage,
education, and employment
• Cross-layer attacks
– Using privacy-sensitive information belonging to a
victim retrieved from different sources
Goals
• Allow specialists to access only the genomic data they need
• Protect data, including from insiders (e.g., curious sysadmins) → homomorphic encryption
• Access time to a single patient’s genomic data
below a few seconds
• Access time to the data of a cohort of
thousands of patients below a few minutes
Cryptographic Tools
Possible Solution
Parties:
– Patient (P)
– Certified Institution
– Storage and Processing Unit (SPU)
– Medical Center (MC)
Adversaries:
– Curious Party @ SPU
– Malicious 3rd party
Steps:
1) Sample
2) Sequencing and encryption
3) Encrypted variants
5) “Check my susceptibility to disease X” and part of P’s secret key, x(2)
6) Markers related to disease X and their contributions
7) Homomorphic operations and proxy encryption
8) End-result or related variants
Disease Susceptibility – Weighted Averaging
[Figure: P’s SNPs; markers for disease X; probabilities; contributions of markers; P’s susceptibility for disease X]
• All operations are conducted in ciphertext using
homomorphic encryption.
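How a weighted sum can be computed on ciphertexts is sketched below with a toy Paillier cryptosystem, a standard additively homomorphic scheme. This is an illustration under explicit assumptions, not the scheme used in the prototype: the primes are tiny and insecure, proxy encryption is omitted, the weights are left in the clear, and the probabilities and contributions are invented integers.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy key generation with tiny primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)   # modular inverse; valid for the generator g = n + 1
    return n, (lam, mu)


def encrypt(n, m, rng=random):
    """Enc(m) = (1 + n)^m * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return ((1 + m * n) % n2) * pow(r, n, n2) % n2


def decrypt(n, secret, c):
    lam, mu = secret
    return (pow(c, lam, n * n) - 1) // n * mu % n


def encrypted_weighted_sum(n, ciphertexts, weights):
    """Compute Enc(sum of w_i * m_i) from the Enc(m_i) without decrypting."""
    n2 = n * n
    acc = 1   # a trivial encryption of 0
    for c, w in zip(ciphertexts, weights):
        acc = acc * pow(c, w, n2) % n2   # Enc(m)^w = Enc(w * m); product adds
    return acc


n, secret = keygen()
probabilities = [30, 60, 10]   # hypothetical per-marker risks, in percent
contributions = [5, 3, 2]      # hypothetical integer weights of the markers
ciphertexts = [encrypt(n, m) for m in probabilities]
enc_sum = encrypted_weighted_sum(n, ciphertexts, contributions)
susceptibility = decrypt(n, secret, enc_sum) / sum(contributions)   # 350 / 10 = 35.0
```

The party that combines the ciphertexts never sees the individual values; only the holder of the secret key learns the final weighted average.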
Prototype – Patient Interface
Prototype – SPU Interface
Prototype – Medical Center Interface
Holistic Privacy: Data about an Individual
Mobility + Body Area Network
Human Relationships
Genome
Conclusion on Genomic Privacy
• Digital medicine is coming
• It will forever change the landscape of privacy protection
• Genomics is particularly relevant and there is a huge ongoing research effort
• Highly sensitive data + huge amounts of data + complex correlations between data → complex field, Big Data
• Tools (cryptography, security protocols,
database/differential privacy, anonymization
techniques,…) already used for privacy protection in ICT
can (and should) be applied here
• More information and pointers:
http://lca.epfl.ch/projects/genomic-privacy/
Overall Conclusion
• Assault on privacy → huge research challenges
• Location privacy
– quantifiable at the physical level ((x, y) coordinates)
– ongoing work at the semantic level
• Online Social Networks → part of the background knowledge of the adversary
• Genomic privacy
– still in its infancy
– soon to be very hot
– first results coming out