Tuning Privacy-Utility Tradeoffs in
Statistical Databases using Policies
Ashwin Machanavajjhala
ashwin @ cs.duke.edu
Collaborators: Daniel Kifer (PSU), Bolin Ding (MSR), Xi He (Duke)
Summer @ Census, 8/15/2013
Overview of the talk
• An inherent trade-off between privacy (confidentiality) of
individuals and utility of statistical analyses over data collected
from individuals.
• Differential privacy has revolutionized how we reason about
privacy
– Nice tuning knob ε for trading off privacy and utility
Overview of the talk
• However, differential privacy only captures a small part of the
privacy-utility trade-off space
– No Free Lunch Theorem
– Differentially private mechanisms may not ensure sufficient utility
– Differentially private mechanisms may not ensure sufficient privacy
Overview of the talk
• I will present a new privacy framework that allows data publishers
to trade off privacy for utility more effectively
– Better control on what to keep secret and who the adversaries are
– Can ensure more utility than differential privacy in many cases
– Can ensure privacy where differential privacy fails
Outline
• Background
– Differential privacy
• No Free Lunch [Kifer-M SIGMOD ’11]
– No ‘one privacy notion to rule them all’
• Pufferfish Privacy Framework [Kifer-M PODS’12]
– Navigating the space of privacy definitions
• Blowfish: Practical privacy using policies [ongoing work]
Data Privacy Problem
[Figure: Individuals 1 … N each contribute a record r1 … rN to a server holding database DB]
Privacy: No breach about any individual
Utility: Statistical analyses over the collected data
Data Privacy in the real world

Application | Data Collector | Third Party (adversary) | Private Information | Function (utility)
Medical | Hospital | Epidemiologist | Disease | Correlation between disease and geography
Genome analysis | Hospital | Statistician / Researcher | Genome | Correlation between genome and disease
Advertising | Google/FB/Y! | Advertiser | Clicks / Browsing | Number of clicks on an ad by age/region/gender …
Social Recommendations | Facebook | Another user | Friend links / profile | Recommend other users or ads to users based on social network
Many definitions & several attacks

Definitions:
• K-Anonymity [Sweeney et al., IJUFKS ’02]
• L-diversity [Machanavajjhala et al., TKDD ’07]
• T-closeness [Li et al., ICDE ’07]
• Differential Privacy [Dwork et al., ICALP ’06]
• E-Privacy [Machanavajjhala et al., VLDB ’09]

Attacks:
• Linkage attack
• Background knowledge attack
• Minimality / Reconstruction attack
• de Finetti attack
• Composition attack
Differential Privacy
For every pair of inputs D1, D2 that differ in one value, and for every output O, the adversary should not be able to distinguish between D1 and D2 based on O:

log ( Pr[A(D1) = O] / Pr[A(D2) = O] ) < ε      (ε > 0)
Algorithms
• No deterministic algorithm guarantees differential privacy.
• Random sampling does not guarantee differential privacy.
• Randomized response satisfies differential privacy.
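To make the last bullet concrete, here is a minimal sketch (not from the talk) of binary randomized response; the function names and the debiasing helper are my own, and the flipping probability e^ε / (e^ε + 1) is the standard calibration that yields ε-differential privacy:

```python
import math
import random

def randomized_response(true_bit, epsilon):
    # Report the true bit with probability e^eps / (e^eps + 1),
    # otherwise report its flip; this satisfies eps-differential privacy.
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return true_bit if random.random() < p_truth else 1 - true_bit

def debiased_mean(reports, epsilon):
    # Invert the known flipping probability to obtain an unbiased
    # estimate of the true proportion of 1s from the noisy reports.
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)
```

Because the randomness is in the reporting, any post-processing of the reports (such as the debiasing step) preserves the privacy guarantee.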
Laplace Mechanism
A researcher issues query q over database D; the curator returns the true answer plus noise: q(D) + η. The noise η is drawn from the Laplace distribution Lap(λ), with density h(η) ∝ exp(−|η| / λ), mean 0, and variance 2λ². Privacy depends on the λ parameter.
[Figure: density of the Laplace distribution Lap(λ)]
Laplace Mechanism [Dwork et al., TCC 2006]
Thm: If the sensitivity of the query is S, then adding Laplace noise with λ = S/ε guarantees ε-differential privacy.
Sensitivity: the smallest number S(q) such that for any D, D’ differing in one entry,
|| q(D) – q(D’) ||1 ≤ S(q)
Contingency tables
Each tuple takes one of k = 4 different values. The table D is summarized by the cell counts Count(·, ·):
2   2
2   8
Laplace Mechanism for Contingency Tables
2 + Lap(2/ε)   2 + Lap(2/ε)
2 + Lap(2/ε)   8 + Lap(2/ε)
Sensitivity = 2
Noisy estimate of the last cell: Mean 8, Variance 8/ε²
Composition Property
If algorithms A1, A2, …, Ak use independent randomness and each Ai satisfies εi-differential privacy, then outputting all the answers together satisfies differential privacy with
ε = ε1 + ε2 + … + εk
This total is called the privacy budget.
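The composition property is what lets a curator meter a fixed total budget across queries. A minimal sketch of such a meter (the `PrivacyBudget` class is my own illustration, not an artifact from the talk):

```python
class PrivacyBudget:
    # Sequential composition: k mechanisms with privacy parameters
    # eps_1, ..., eps_k jointly satisfy (eps_1 + ... + eps_k)-differential
    # privacy, so a curator can meter a fixed total budget.
    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon):
        # Refuse the query once the cumulative spend would exceed the total.
        if self.spent + epsilon > self.total + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon
        return epsilon
```

A curator with total budget 1.0 could, for instance, answer two queries at ε = 0.5 each; a third query would be refused.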
Differential Privacy
• Privacy definition that is independent of the attacker’s prior
knowledge.
• Resists many attacks to which other definitions are susceptible.
– Avoids composition attacks
– Claimed to be tolerant against adversaries with arbitrary background
knowledge.
• Allows simple, efficient and useful privacy mechanisms
– Used in LEHD’s OnTheMap [M et al., ICDE ’08]
Differential Privacy & Utility
• Differentially private mechanisms may not ensure sufficient utility
for many applications.
• Sparse Data: the integrated mean squared error of the Laplace mechanism can be worse than that of returning a random contingency table for typical values of ε (around 1)
• Social Networks [M et al., PVLDB 2011]
Differential Privacy & Privacy
• Differentially private algorithms may not limit the ability of an
adversary to learn sensitive information about individuals when
records in the data are correlated
• Correlations across individuals occur in many ways:
– Social Networks
– Data with pre-released constraints
– Functional Dependencies
Laplace Mechanism and Correlations
2 + Lap(2/ε)   2 + Lap(2/ε)    (row marginal: 4)
2 + Lap(2/ε)   8 + Lap(2/ε)    (row marginal: 10)
(column marginals: 4, 10)
Auxiliary marginals are published for the following reasons:
1. Legal: 2002 Supreme Court case Utah v. Evans
2. Contractual: advertisers must know exact demographics at coarse granularities
Does the Laplace mechanism still guarantee privacy?
Laplace Mechanism and Correlations
With the exact marginals known, the adversary can derive four independent estimates of the same cell count:
Count(·, ·) = 8 + Lap(2/ε)   (from the noisy cell itself)
Count(·, ·) = 8 – Lap(2/ε)   (via a row marginal)
Count(·, ·) = 8 – Lap(2/ε)   (via a column marginal)
Count(·, ·) = 8 + Lap(2/ε)   (via both marginals and the opposite cell)
Laplace Mechanism and Correlations
Averaging k such independent estimates yields an estimator with Mean 8 and Variance 8/(kε²), so the adversary can reconstruct the table with high precision for large k.
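The averaging attack above is easy to simulate; this sketch (names and trial counts are my own) measures the empirical variance of an attacker who averages k independent Lap(2/ε)-noisy views of the same count:

```python
import math
import random

def lap(scale):
    # Inverse-CDF sampling of a Laplace(0, scale) random variate.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def reconstruct(true_count, k, epsilon, trials=5000):
    # Empirical mean and variance of an attacker's estimator that
    # averages k independent Lap(2/eps)-noisy views of the same count,
    # as an adversary holding the exact marginals could obtain.
    scale = 2.0 / epsilon
    estimates = []
    for _ in range(trials):
        views = [true_count + lap(scale) for _ in range(k)]
        estimates.append(sum(views) / k)
    mean = sum(estimates) / trials
    var = sum((e - mean) ** 2 for e in estimates) / trials
    return mean, var
```

At ε = 1 the single-view variance is about 8, while averaging k = 4 views drops it to about 2, matching the 8/(kε²) formula.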
No Free Lunch Theorem
It is not possible to guarantee any utility in addition to privacy without making assumptions about
• the data generating distribution [Kifer-M SIGMOD ‘11]
• the background knowledge available to an adversary [Dwork-Naor JPC ‘10]
To sum up …
• Differential privacy only captures a small part of the privacy-utility
trade-off space
– No Free Lunch Theorem
– Differentially private mechanisms may not ensure sufficient privacy
– Differentially private mechanisms may not ensure sufficient utility
Pufferfish Framework
Pufferfish Semantics
• What is being kept secret?
• Who are the adversaries?
• How is information disclosure bounded?
– (similar to epsilon in differential privacy)
Sensitive Information
• Secrets: let S be a set of potentially sensitive statements
– “individual j’s record is in the data, and j has Cancer”
– “individual j’s record is not in the data”
• Discriminative Pairs:
Mutually exclusive pairs of secrets.
– (“Bob is in the table”, “Bob is not in the table”)
– (“Bob has cancer”, “Bob has diabetes”)
Adversaries
• We assume a Bayesian adversary who can be completely
characterized by his/her prior information about the data
– We do not assume computational limits
• Data Evolution Scenarios: set of all probability distributions that
could have generated the data ( … think adversary’s prior).
– No assumptions: All probability distributions over data instances are
possible.
– I.I.D.: Set of all f such that: P(data = {r1, r2, …, rk}) = f(r1) x f(r2) x…x f(rk)
Information Disclosure
• Mechanism M satisfies ε-Pufferfish(S, Spairs, D) if, for every output ω, every discriminative pair (s, s’) in Spairs, and every prior θ in D under which both s and s’ have nonzero probability:
e^(–ε) ≤ Pr[M(Data) = ω | s, θ] / Pr[M(Data) = ω | s’, θ] ≤ e^ε
Pufferfish Semantic Guarantee
For every output ω, the adversary’s posterior odds of s vs s’ are within a factor e^ε of the prior odds:
Pr[s | M(Data) = ω, θ] / Pr[s’ | M(Data) = ω, θ] ≤ e^ε · Pr[s | θ] / Pr[s’ | θ]
Pufferfish & Differential Privacy
• Spairs:
– s_ix: record i takes the value x
– Spairs = { (s_ix, s_iy) : for every record i and every pair of values x ≠ y in the domain }
• Attackers should not be able to significantly distinguish between
any two values from the domain for any individual record.
Pufferfish & Differential Privacy
• Data evolution:
– For all θ = [ f1, f2, f3, …, fk ],
P(data = {r1, …, rk} | θ) = f1(r1) × f2(r2) × … × fk(rk)
• The adversary’s prior may be any distribution that makes records
independent.
Pufferfish & Differential Privacy
• Spairs:
– s_ix: record i takes the value x
– Spairs = { (s_ix, s_iy) : for every record i and every pair of values x ≠ y in the domain }
• Data evolution:
– For all θ = [ f1, f2, f3, …, fk ]
A mechanism M satisfies differential privacy
if and only if
it satisfies Pufferfish instantiated using Spairs and the set of all such priors θ
Summary of Pufferfish
• A semantic approach to defining privacy
– Enumerates the information that is secret and the set of adversaries.
– Bounds the odds ratio of pairs of mutually exclusive secrets
• Helps understand assumptions under which privacy is guaranteed
– Differential privacy is one specific choice of secret pairs and adversaries
• How should a data publisher use this framework?
• Algorithms?
Blowfish Privacy
• A special class of Pufferfish instantiations
Both pufferfish and blowfish are marine fish of the Tetraodontidae family
Blowfish Privacy
• A special class of Pufferfish instantiations
• Extends differential privacy using policies
– Specification of sensitive information
• Allows more utility
– Specification of publicly known constraints in the data
• Ensures privacy in correlated data
• Satisfies the composition property
Sensitive information in Differential Privacy
• Spairs:
– s_ix: record i takes the value x
– Spairs = { (s_ix, s_iy) : for every record i and every pair of values x ≠ y in the domain }
• Attackers should not be able to significantly distinguish between
any two values from the domain for any individual record.
Other notions of Sensitive Information
• Medical Data
– OK to infer whether an individual is healthy or not.
– E.g., (“Bob is healthy”, “Bob has diabetes”) is not a discriminative pair
of secrets for any individual.
• Partitioned Sensitive Information
Other notions of Sensitive Information
• Geospatial Data
– Do not want the attacker to distinguish between “close-by” points in the
space.
– May distinguish between “far-away” points
• Distance-based Sensitive Information
Generalization as a graph
• Consider a graph G = (V, E), where V is the set of values that an
individual’s record can take.
• E encodes the set of discriminative pairs
– Same for all records.
Blowfish Privacy + “Policy of Secrets”
• A mechanism M satisfies blowfish privacy w.r.t. policy G if, for every set of outputs S of the mechanism and every pair of datasets D1, D2 that differ in one record, with values x and y s.t. (x, y) ∈ E:
Pr[M(D1) ∈ S] ≤ e^ε · Pr[M(D2) ∈ S]
Blowfish Privacy + “Policy of Secrets”
• More generally, for any x and y in the domain, the guarantee degrades with d_G(x, y), the shortest distance between x and y in G:
Pr[M(D1) ∈ S] ≤ e^(ε · d_G(x, y)) · Pr[M(D2) ∈ S]
Blowfish Privacy + “Policy of Secrets”
• The adversary is allowed to distinguish between values x and y that
appear in different connected components of G.
Algorithms for Blowfish
• Consider an ordered 1-D attribute
– Dom = {x1, x2, x3, …, xd}
– E.g., ranges of Age, Salary, etc.
• Suppose our policy is: the adversary should not distinguish whether
an individual’s value is xj or xj+1.
• G is then the line graph x1 - x2 - x3 - … - xd.
Algorithms for Blowfish
• Suppose we want to release the histogram privately
– Number of individuals in each age range: C(x1), C(x2), …, C(xd)
• Any differentially private algorithm also satisfies blowfish
– Can use the Laplace mechanism (with sensitivity 2)
Ordered Mechanism
• We can answer a different set of queries to get a different private
estimator for the histogram: the nested prefix ranges S1 ⊆ S2 ⊆ … ⊆ Sd,
where Si counts the individuals with values in {x1, …, xi}.
Ordered Mechanism
• We can answer each Si using the Laplace mechanism …
• … but the sensitivity of all the queries together is only 1
– Changing one tuple from x2 to x3 changes C(x2) by –1 and C(x3) by +1,
but only changes S2
Ordered Mechanism
• Answering each Si with Lap(1/ε) noise and differencing the noisy prefix
counts recovers the histogram with a factor of 2 improvement in variance
over applying the Laplace mechanism to the counts directly.
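The prefix-sum idea above can be sketched in a few lines (my own illustration, not code from the talk): add Lap(1/ε) noise to each prefix sum, then difference adjacent noisy prefixes to recover the per-bin counts.

```python
import math
import random

def lap(scale):
    # Inverse-CDF sampling of a Laplace(0, scale) random variate.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def ordered_mechanism(counts, epsilon):
    # Noisy prefix sums with Lap(1/eps) noise: moving one tuple from
    # x_j to x_{j+1} changes only S_j, so the total sensitivity is 1.
    noisy_prefix = []
    running = 0
    for c in counts:
        running += c
        noisy_prefix.append(running + lap(1.0 / epsilon))
    # Difference the noisy prefix sums to recover the histogram.
    hist = [noisy_prefix[0]]
    for i in range(1, len(noisy_prefix)):
        hist.append(noisy_prefix[i] - noisy_prefix[i - 1])
    return noisy_prefix, hist
```

Each recovered bin is a difference of two Lap(1/ε) variates (variance 4/ε²), versus 8/ε² for the direct Laplace mechanism with sensitivity 2.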
Ordered Mechanism
• In addition, the prefix counts satisfy the constraint S1 ≤ S2 ≤ … ≤ Sd.
• However, the noisy counts may not satisfy this constraint.
• We can post-process the noisy counts to enforce it.
Ordered Mechanism
• Post-processing the noisy prefix counts to satisfy S1 ≤ S2 ≤ … ≤ Sd
(projecting onto the nearest non-decreasing sequence) yields an
order of magnitude improvement for large d.
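One standard way to project onto non-decreasing sequences is the pool-adjacent-violators algorithm (PAVA); this sketch is my own illustration of that projection, assuming a least-squares objective:

```python
def isotonic_prefix(noisy_prefix):
    # Pool-adjacent-violators: least-squares projection of the noisy
    # prefix sums onto the non-decreasing sequences S1 <= ... <= Sd.
    blocks = []  # each block is [sum_of_values, count]
    for v in noisy_prefix:
        blocks.append([float(v), 1])
        # Merge backwards while an earlier block's mean violates monotonicity.
        while len(blocks) > 1 and \
                blocks[-2][0] / blocks[-2][1] >= blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)  # each merged block takes its mean value
    return out
```

For example, the out-of-order noisy prefixes [1, 3, 2, 5] are smoothed to [1, 2.5, 2.5, 5], which is non-decreasing and closest in squared error.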
Ordered Mechanism
• By leveraging the weaker sensitive information in the policy, we
can provide significantly better utility
• Extends to more general policy specifications.
• Ordered mechanisms and other blowfish algorithms are being tested
on the synthetic data generator for the LODES data product.
Blowfish Privacy & Correlations
• Differentially private mechanisms may not ensure privacy when
correlations exist in the data.
• Blowfish can handle correlations in the form of publicly known
constraints.
– Known marginal counts in the data
– Other dependencies
• Privacy definition is similar to differential privacy with a modified
notion of neighboring tables
Other instantiations of Pufferfish
• All blowfish instantiations are extensions of differential privacy
using
– Weaker notions of sensitive information
– Allowing knowledge of constraints about the data
– All blowfish mechanisms satisfy the composition property
• We can instantiate Pufferfish with other “realistic” adversary
notions
– Only prior distributions that are similar to the expected data distribution
– Open question: which definitions satisfy the composition property?
Summary
• Differential privacy (and the tuning knob epsilon) is insufficient
for trading off privacy for utility in many applications
– Sparse data, Social networks, …
• Pufferfish framework allows more expressive privacy definitions
– Can vary sensitive information, adversary priors, and epsilon
• Blowfish shows one way to create more expressive definitions
– Can provide useful composable mechanisms
• There is an opportunity to correctly tune privacy by using the
above expressive privacy frameworks
Thank you!

References
[M et al PVLDB’11] A. Machanavajjhala, A. Korolova, A. Das Sarma, “Personalized Social Recommendations – Accurate or Private?”, PVLDB 4(7), 2011
[Kifer-M SIGMOD’11] D. Kifer, A. Machanavajjhala, “No Free Lunch in Data Privacy”, SIGMOD 2011
[Kifer-M PODS’12] D. Kifer, A. Machanavajjhala, “A Rigorous and Customizable Framework for Privacy”, PODS 2012
[ongoing work] A. Machanavajjhala, B. Ding, X. He, “Blowfish Privacy: Tuning Privacy-Utility Trade-offs using Policies”, in preparation