Dynamic Enforcement of Knowledge-based Security

advertisement
DYNAMIC ENFORCEMENT OF
KNOWLEDGE-BASED
SECURITY POLICIES
Piotr (Peter) Mardziel, Stephen Magill, Michael Hicks,
and Mudhakar Srivatsa
2
Your information
3
Bad + Good
4
Want
• Protect personal information.
• Preserve beneficial uses.
5
Take back control
Photography
querier
you
6
Useful and non-revealing
out = 24 ≤ Age ≤ 30
Æ Female?
Æ Engaged? *
Photography
true
querier
* real query used by a facebook advertiser
you
7
out =
(age,
zip-code,
birth-date) *
reject
* - zip-code: postal code in USA
- age, zip-code, birth-date can be used to uniquely identify 63% of Americans
8
Benefits
• Can provide exact answer to query.
• Need not know ahead of time how your information can be useful.
• Need not preemptively hide information.
• doesn’t restricts uses
9
The problem: how to decide to run a
query or not?
Q1
out := 24 ≤ age ≤ 30
Æ female?
Æ engaged?
Q2
out := age
?
Q3
out :=
(age,
zip-code,
birth-date)
10
Setting
• Past answers, unknown future queries.
• Cannot view queries in isolation.
• Cannot use various static QIF measures.
• Rejection.
• Cannot let rejection defeat our protection.
…
Q41
Q42
X reject
time
Q43
…
11
Goal: restrict querier knowledge
• Define knowledge.
• Define learning.
12
Knowledge model
= belief (probability distribution)
over secret data
• Given belief, a query, determine revised belief given
the output of the query (Bayesian revision)
• evaluate revised belief as acceptable or not
• Can repeat the process on future queries
• start with the revised belief from the previous one
13
Meet Bob
Bob (born September 24, 1980)
bday = 267
Secret
byear = 1980
=
byear
0 · bday · 364
1956 · byear · 1992
Assumption: this is accurate
each equally likely
1992
1956
bday
0
364
14
1992
bday-query1
today := 260;
if bday ¸ today Æ bday < (today + 7)
then out := 1
else out := 0
1956
0
364
1992
=
1956
0
259
267
364
| (out = 0)
P
15
bday-query1
today := 260;
if bday ¸ today Æ bday < (today + 7)
then out := 1
else out := 0
Problem
Policy: Is this acceptable?
1992
=
=
1956
0
259
267
364
| (out = 0)
16
Probabilistic interpretation
• We can use (general purpose) probabilistic languages to model
attacker knowledge and how it changes due to query results.
• prob-scheme, IBAL, etc.
• Problem: (standard) enumeration-based probabilistic interpretation
and revision is too slow.
17
Outline
• Our results
• Problem 1: Suitable policy definition.
• Problem 2: Approximate (but sound in terms of policy) probabilistic
computation.
• Implementation.
• Experimental results.
• compare to prob-scheme, enumeration-based probabilistic interpretation
18
Problem 1: Policy
• Let us consider a policy
• Pr[my secret] < t
• Choice of threshold t might depend on risks involved in
revelation of secret.
19
P
Bob’s policy
Bob (born September 24, 1980)
bday = 267
Secret
byear = 1980
=
byear
0 · bday · 364
1956 · byear · 1992
each equally likely
Pr[bday = 267] …
Policy
Pr[bday] < 0.2
Pr[bday,byear] < 0.05
1992
Currently
Pr[bday] = 1/365
Pr[bday,byear] = 1/(365*37)
1956
bday
0
364
20
Bob’s policies
Policy
Pr[bday] < 0.2
• bday specifically should never be certainly known
21
Bob’s policies
Policy
Pr[bday,byear] < 0.05
• the pair of bday,byear should never be certainly known
• doesn’t commit to protecting bday nor byear alone
• allows queries that need high precision in one, or low precision in both
22
1992
bday-query1
today := 260;
if bday ¸ today Æ bday < (today + 7)
then out := 1
else out := 0
1956
0
364
1992
=
1956
0
259
267
364
| (out = 0)
23
bday-query1
today := 260;
if bday ¸ today Æ bday < (today + 7)
then out := 1
else out := 0
Potentially
Pr[bday] = 1/358 < 0.2
Pr[bday,byear] = 1/(358*37) < 0.05
1992
=
=
1956
0
259
267
364
| (out = 0)
24
Next day …
1992
bday-query2
today := 261;
if bday ¸ today Æ bday < (today + 7)
then out := 1
else out := 0
1956
0
259
267
364
1992
=
| (out = 1)
1956
0
267
Pr[bday] = 1
So reject?
P
25
Querier’s perspective
Assume querier knows policy
if bday != 267
if bday = 267
1992
1992
1956
1956
0
259
268
will get answer
364
0
267
will get reject
26
Rejection problem
|
reject
=
• Policy: Pr[bday = 267 | out = o] < t
• Rejection, intended to protect secret, reveals secret.
27
Rejection revised
• Policy: Pr[bday = 267 | out = o] < t
• Solution?
• Decide policy independently of secret
• Revised policy
• for every possible output o,
• for every possible bday b,
• Pr[bday = b | out = o] < t
•
So the real bday in particular
28
bday-query1
today := 260;
if bday ¸ today Æ bday < (today + 7)
then out := 1
else out := 0
accept
initial belief
29
(after bday-query1)
bday-query2
today := 261;
if bday ¸ today Æ bday < (today + 7)
then out := 1
else out := 0
reject
(regardless of what bday actually is)
30
(after bday-query1)
bday-query3
today := 266;
if bday ¸ today Æ bday < (today + 7)
then out := 1
else out := 0
accept
31
Problem 2: Slowness
• We can use (general purpose) probabilistic languages to model
attacker knowledge and how it changes due to query results.
• prob-scheme, IBAL, etc.
• (standard) enumeration-based probabilistic interpretation and revision
is too slow.
• Let us look at why.
32
Enumeration
1992
input state
1956
0
364
bday-query1
today := 260;
if bday ¸ today Æ bday < (today + 7)
then out := 1
else out := 0
output state
Æ (out = 0) *
* ± | (out = 0) = normalize(± Æ (out = 0))
33
Enumeration
• A lot of states = slow
• Can (over estimate) probabilities after partial enumeration.
• Assume unseen mass is in worst possible spot
prob-scheme
prob-poly-set
precision
1
0.8
0.6
0.4
0.2
0
0
4
8
running time [s]
12
16
34
Problem 2: Slowness
• Let us do away with enumeration.
15:00
35
Representation
1992
P1
1956
0
P1: 0 · bday · 364, 1956 · byear · 1992
p = 0.000074
364
P1: 0 · bday · 259, 1956 · byear · 1992, out = 0
p = 0.000074
P2: 267 · bday · 364, 1956 · byear · 1992, out = 1
p = 0.000074
…
P
36
Region
• Enumeration of states ! Manipulation of regions of states
• Problems
• too many regions
• non-uniform regions
37
Too many regions
1992
1991
1981
1971
nasty-query1
… disjuncts …
… more disjuncts …
… disjuncts EVERYWHERE …
1961
1956
0
259
267
364
38
Approximation
abstract
P1
1992
1992
1991
P2
1981
1981
1971
1971
1961
1961
1956
1956
0
259
364
267
P1: 0 · bday · 259, 1992 · byear · 1992
p = 0.000067
P2: 0 · bday · 259, 1982 · byear · 1990
p = 0.000067
…
P1
0
P2
259
267
364
P1: 0 · bday · 259, 1956 · byear · 1992
p = 0.000067
s · 8580 (  size of P1 )
P2: 267 · bday · 364, 1956 · byear · 1992
p = 0.000067
s · 3234 (  size of P2 )
P10
p,s refer to possible (non-zero probability) points in region
39
Abstraction imprecision
abstract
1992
1992
1991
1981
1981
1971
1971
1961
1961
1956
1956
0
259
267
364
0
1992
1992
1981
1981
1971
1971
1961
1961
1956
1956
0
259
267
364
P1
0
P2
259
259
267
267
364
364
40
Non-uniform regions
P1
1992
P2
1981
1971
1961
nasty-query-2
… disjuncts …
… more disjuncts …
… probabilistic choice …
1956
0
259
267
364
P
41
Approximation
approximate
P1
1992
1992
1991
P2
1981
1981
1971
1971
1961
1961
1956
1956
0
259
267
P1: 0 · bday · 259, 1992 · byear · 1992
p = 0.0000074
P2: 0 · bday · 259, 1991 · byear · 1991
p = 0.000074
…
P18
364
P1
0
P2
259
267
364
P1: 0 · bday · 259, 1956 · byear · 1992
p · 0.000074
s · 9620
P2: 267 · bday · 364, 1956 · byear · 1992
p · 0.000074
s · 3626
for policy check
8 … 8 … Pr[bday = b | out = o] < t
42
Abstraction
abstract
1992
1991
1981
P1
1971
P2
1961
1956
0
Probabilistic
Polyhedron
259
267
364
For each Pi, store
region (polyhedron)
upper bound on probability of each possible point
upper bound on the number of (possible) points
Also store
lower bounds on the above
Pr[A | B] = Pr[A Æ B] / Pr[B]
43
Abstract Conditioning
1992
1981
1971
1961
1956
0
259
267
364
0
259
267
364
1992
1981
1971
1961
1956
44
Abstract Conditioning
approximate
1992
1992
1991
1981
1981
1971
1971
1961
1961
1956
1956
0
259
267
364
1992
1992
1981
1981
1971
1971
1961
1961
1956
1956
0
259
267
364
P1
P2
0
259
267
364
0
259
267
364
45
Approximation benefit
Bob (born September 24, 1980)
bday = 267
Secret
byear = 1980
=
byear
0 · bday · 364
1956 · byear · 1992
Assumption: this is accurate
each equally likely
1992
1956
bday
0
364
20:00
46
Initial distribution
byear
1992
P1
1956
bday
0
364
1/(37*365) - ² · probability · 1/(37*365) + ²
37*365 · # of points · 37*365
• Need attacker’s actual belief to be represented by this P1
• Much easier task than knowing the exact querier belief.
47
Abstraction Summary
• Approximate representation of a set of probability
distributions.
•
Abstract operations for Clarkson’s probabilistic semantics.
•
•
Sound in terms of policy, even after belief revision.
Restricted arithmetic expressions (linear only)
• No state enumeration and a way to restrict number of regions
• (more) computationally feasible
• Implementation
• Uses Parma PPL for polyhedra operations
• Uses Latte to determine size (# of integer points) inside polyhedra
48
Results
• Statespace size resistance
• See technical report for
•
•
more queries
precision/performance tradeoff issues
49
Statespace size resistance
=
byear
0 · bday · 364
1956 · byear · 1992
each equally likely
1992
1956
bday
0
364
50
=
0 · bday · 364
1956 · byear · 1992
each equally likely
byear
bday-query1
today := 260;
if bday ¸ today Æ bday < (today + 7)
then out := 1
else out := 0
1992
1956
bday
0
364
1. prob-poly-set (our implementation)
2. prob-scheme (sampling/enumeration)
• provides sound estimation after partial enumeration
•
measure time and probability bound of most probable state (pair
bday,byear)
51
Statespace size resistance
prob-scheme
1 pp
prob-poly-set
=
0 · bday · 364
1956 · byear · 1992
each equally likely
max belief
1
0.8
0.6
0.4
0.2
0
0
4
> 1 pp
8
12
16
running time [s]
=
0 · bday · 364
1910 · byear · 2010
each equally likely
max belief
1
0.8
0.6
0.4
0.2
0
0
4
8
12
16
20
running time [s]
24
28
32
36
24:00
52
Conclusions
• Knowledge modeling
• Probabilistic computation and revision of knowledge due to query outputs.
• Our results
• Suitable policy definition.
• Approximate (but sound in terms of policy) probabilistic interpretation.
• Experimental results.
• State size resistance.
• Drawbacks
• Restricted language, integer linear expressions
• Still too slow for practical use
• The future
• performance / accuracy
• simpler domains (octagons) can replace polyhedra (and Latte counting tool)
• use of Latte accounts for over 95% of most tests we ran
• various other technicalities
• less restricted language and/or state space
53
Thank you!
Go back.
54
Creating dependency
strange-query
if bday – 1956 ¸ byear - 5 Æ
bday – 1956 · byear + 5
then out := 1
else out := 0
• Dependencies can be created.
1992
1992
P1
P1
1956
1956
0
364
0
364
* not to scale and/or shape
55
Differential privacy
•
•
•
•
•
Encourage participation in a database by protecting inference of a record.
Add noise to query result.
Requires trusted curator.
Suitable for aggregate queries where exact results are not required.
Not suitable to a single user protecting him(her)self.
• Using differential privacy on a query with 2 output values results in the
incorrect query output almost half the time (for reasonable ²)
56
Composability / Collusion
• If collusion is suspected among queries, one can make them share belief
• query outputs are assumed to have been learned by all of them
• assumes deterministic queries only
• Confusion problem: non-deterministic queries can result in a revised belief
which is less sure of the true (or some) secret value than the initial belief
• If collusion is suspected but not present, this can result in incorrect belief
• Abstract solution: abstract union of the revised belief (colluded) and the
prebelief (not colluded)
P1 [ P2
if ±i 2 °(Pi) for i = 1,2 then ±1, ±2 2 °(P1 [ P2)
57
Setting, Known Secret
• Measures of badness that weigh their values by the distribution of the secret
(or the distribution of the outputs of a query) can be problematic.
• An unlikely user could theorize a query is safe, but know it is not safe for
them
• Consider conditional min-entropy
• (see Information-theoretic Bounds for Differentially Private Mechanisms)
H1(X | Y) = -log2 y PY(y) maxx PX|Y(x,y) > - log2 t
probability of certain query output
• Our policy:
• -log2 maxy maxx PX|Y(x,y) > - log2 t
• so that -log2 maxx PX|Y(x,yreal) > - log2 t
Download