DYNAMIC ENFORCEMENT OF KNOWLEDGE-BASED SECURITY POLICIES Piotr (Peter) Mardziel, Stephen Magill, Michael Hicks, and Mudhakar Srivatsa 2 Your information 3 Bad + Good 4 Want • Protect personal information. • Preserve beneficial uses. 5 Take back control Photography querier you 6 Useful and non-revealing out = 24 ≤ Age ≤ 30 Æ Female? Æ Engaged? * Photography true querier * real query used by a facebook advertiser you 7 out = (age, zip-code, birth-date) * reject * - zip-code: postal code in USA - age, zip-code, birth-date can be used to uniquely identify 63% of Americans 8 Benefits • Can provide exact answer to query. • Need not know ahead of time how your information can be useful. • Need not preemptively hide information. • doesn’t restricts uses 9 The problem: how to decide to run a query or not? Q1 out := 24 ≤ age ≤ 30 Æ female? Æ engaged? Q2 out := age ? Q3 out := (age, zip-code, birth-date) 10 Setting • Past answers, unknown future queries. • Cannot view queries in isolation. • Cannot use various static QIF measures. • Rejection. • Cannot let rejection defeat our protection. … Q41 Q42 X reject time Q43 … 11 Goal: restrict querier knowledge • Define knowledge. • Define learning. 12 Knowledge model = belief (probability distribution) over secret data • Given belief, a query, determine revised belief given the output of the query (Bayesian revision) • evaluate revised belief as acceptable or not • Can repeat the process on future queries • start with the revised belief from the previous one 13 Meet Bob Bob (born September 24, 1980) bday = 267 Secret byear = 1980 = byear 0 · bday · 364 1956 · byear · 1992 Assumption: this is accurate each equally likely 1992 1956 bday 0 364 14 1992 bday-query1 today := 260; if bday ¸ today Æ bday < (today + 7) then out := 1 else out := 0 1956 0 364 1992 = 1956 0 259 267 364 | (out = 0) P 15 bday-query1 today := 260; if bday ¸ today Æ bday < (today + 7) then out := 1 else out := 0 Problem Policy: Is this acceptable? 1992 = = 1956 0 259 267 364 | (out = 0) 16 Probabilistic interpretation • We can use (general purpose) probabilistic languages to model attacker knowledge and how it changes due to query results. • prob-scheme, IBAL, etc. • Problem: (standard) enumeration-based probabilistic interpretation and revision is too slow. 17 Outline • Our results • Problem 1: Suitable policy definition. • Problem 2: Approximate (but sound in terms of policy) probabilistic computation. • Implementation. • Experimental results. • compare to prob-scheme, enumeration-based probabilistic interpretation 18 Problem 1: Policy • Let us consider a policy • Pr[my secret] < t • Choice of threshold t might depend on risks involved in revelation of secret. 19 P Bob’s policy Bob (born September 24, 1980) bday = 267 Secret byear = 1980 = byear 0 · bday · 364 1956 · byear · 1992 each equally likely Pr[bday = 267] … Policy Pr[bday] < 0.2 Pr[bday,byear] < 0.05 1992 Currently Pr[bday] = 1/365 Pr[bday,byear] = 1/(365*37) 1956 bday 0 364 20 Bob’s policies Policy Pr[bday] < 0.2 • bday specifically should never be certainly known 21 Bob’s policies Policy Pr[bday,byear] < 0.05 • the pair of bday,byear should never be certainly known • doesn’t commit to protecting bday nor byear alone • allows queries that need high precision in one, or low precision in both 22 1992 bday-query1 today := 260; if bday ¸ today Æ bday < (today + 7) then out := 1 else out := 0 1956 0 364 1992 = 1956 0 259 267 364 | (out = 0) 23 bday-query1 today := 260; if bday ¸ today Æ bday < (today + 7) then out := 1 else out := 0 Potentially Pr[bday] = 1/358 < 0.2 Pr[bday,byear] = 1/(358*37) < 0.05 1992 = = 1956 0 259 267 364 | (out = 0) 24 Next day … 1992 bday-query2 today := 261; if bday ¸ today Æ bday < (today + 7) then out := 1 else out := 0 1956 0 259 267 364 1992 = | (out = 1) 1956 0 267 Pr[bday] = 1 So reject? P 25 Querier’s perspective Assume querier knows policy if bday != 267 if bday = 267 1992 1992 1956 1956 0 259 268 will get answer 364 0 267 will get reject 26 Rejection problem | reject = • Policy: Pr[bday = 267 | out = o] < t • Rejection, intended to protect secret, reveals secret. 27 Rejection revised • Policy: Pr[bday = 267 | out = o] < t • Solution? • Decide policy independently of secret • Revised policy • for every possible output o, • for every possible bday b, • Pr[bday = b | out = o] < t • So the real bday in particular 28 bday-query1 today := 260; if bday ¸ today Æ bday < (today + 7) then out := 1 else out := 0 accept initial belief 29 (after bday-query1) bday-query2 today := 261; if bday ¸ today Æ bday < (today + 7) then out := 1 else out := 0 reject (regardless of what bday actually is) 30 (after bday-query1) bday-query3 today := 266; if bday ¸ today Æ bday < (today + 7) then out := 1 else out := 0 accept 31 Problem 2: Slowness • We can use (general purpose) probabilistic languages to model attacker knowledge and how it changes due to query results. • prob-scheme, IBAL, etc. • (standard) enumeration-based probabilistic interpretation and revision is too slow. • Let us look at why. 32 Enumeration 1992 input state 1956 0 364 bday-query1 today := 260; if bday ¸ today Æ bday < (today + 7) then out := 1 else out := 0 output state Æ (out = 0) * * ± | (out = 0) = normalize(± Æ (out = 0)) 33 Enumeration • A lot of states = slow • Can (over estimate) probabilities after partial enumeration. • Assume unseen mass is in worst possible spot prob-scheme prob-poly-set precision 1 0.8 0.6 0.4 0.2 0 0 4 8 running time [s] 12 16 34 Problem 2: Slowness • Let us do away with enumeration. 15:00 35 Representation 1992 P1 1956 0 P1: 0 · bday · 364, 1956 · byear · 1992 p = 0.000074 364 P1: 0 · bday · 259, 1956 · byear · 1992, out = 0 p = 0.000074 P2: 267 · bday · 364, 1956 · byear · 1992, out = 1 p = 0.000074 … P 36 Region • Enumeration of states ! Manipulation of regions of states • Problems • too many regions • non-uniform regions 37 Too many regions 1992 1991 1981 1971 nasty-query1 … disjuncts … … more disjuncts … … disjuncts EVERYWHERE … 1961 1956 0 259 267 364 38 Approximation abstract P1 1992 1992 1991 P2 1981 1981 1971 1971 1961 1961 1956 1956 0 259 364 267 P1: 0 · bday · 259, 1992 · byear · 1992 p = 0.000067 P2: 0 · bday · 259, 1982 · byear · 1990 p = 0.000067 … P1 0 P2 259 267 364 P1: 0 · bday · 259, 1956 · byear · 1992 p = 0.000067 s · 8580 ( size of P1 ) P2: 267 · bday · 364, 1956 · byear · 1992 p = 0.000067 s · 3234 ( size of P2 ) P10 p,s refer to possible (non-zero probability) points in region 39 Abstraction imprecision abstract 1992 1992 1991 1981 1981 1971 1971 1961 1961 1956 1956 0 259 267 364 0 1992 1992 1981 1981 1971 1971 1961 1961 1956 1956 0 259 267 364 P1 0 P2 259 259 267 267 364 364 40 Non-uniform regions P1 1992 P2 1981 1971 1961 nasty-query-2 … disjuncts … … more disjuncts … … probabilistic choice … 1956 0 259 267 364 P 41 Approximation approximate P1 1992 1992 1991 P2 1981 1981 1971 1971 1961 1961 1956 1956 0 259 267 P1: 0 · bday · 259, 1992 · byear · 1992 p = 0.0000074 P2: 0 · bday · 259, 1991 · byear · 1991 p = 0.000074 … P18 364 P1 0 P2 259 267 364 P1: 0 · bday · 259, 1956 · byear · 1992 p · 0.000074 s · 9620 P2: 267 · bday · 364, 1956 · byear · 1992 p · 0.000074 s · 3626 for policy check 8 … 8 … Pr[bday = b | out = o] < t 42 Abstraction abstract 1992 1991 1981 P1 1971 P2 1961 1956 0 Probabilistic Polyhedron 259 267 364 For each Pi, store region (polyhedron) upper bound on probability of each possible point upper bound on the number of (possible) points Also store lower bounds on the above Pr[A | B] = Pr[A Æ B] / Pr[B] 43 Abstract Conditioning 1992 1981 1971 1961 1956 0 259 267 364 0 259 267 364 1992 1981 1971 1961 1956 44 Abstract Conditioning approximate 1992 1992 1991 1981 1981 1971 1971 1961 1961 1956 1956 0 259 267 364 1992 1992 1981 1981 1971 1971 1961 1961 1956 1956 0 259 267 364 P1 P2 0 259 267 364 0 259 267 364 45 Approximation benefit Bob (born September 24, 1980) bday = 267 Secret byear = 1980 = byear 0 · bday · 364 1956 · byear · 1992 Assumption: this is accurate each equally likely 1992 1956 bday 0 364 20:00 46 Initial distribution byear 1992 P1 1956 bday 0 364 1/(37*365) - ² · probability · 1/(37*365) + ² 37*365 · # of points · 37*365 • Need attacker’s actual belief to be represented by this P1 • Much easier task than knowing the exact querier belief. 47 Abstraction Summary • Approximate representation of a set of probability distributions. • Abstract operations for Clarkson’s probabilistic semantics. • • Sound in terms of policy, even after belief revision. Restricted arithmetic expressions (linear only) • No state enumeration and a way to restrict number of regions • (more) computationally feasible • Implementation • Uses Parma PPL for polyhedra operations • Uses Latte to determine size (# of integer points) inside polyhedra 48 Results • Statespace size resistance • See technical report for • • more queries precision/performance tradeoff issues 49 Statespace size resistance = byear 0 · bday · 364 1956 · byear · 1992 each equally likely 1992 1956 bday 0 364 50 = 0 · bday · 364 1956 · byear · 1992 each equally likely byear bday-query1 today := 260; if bday ¸ today Æ bday < (today + 7) then out := 1 else out := 0 1992 1956 bday 0 364 1. prob-poly-set (our implementation) 2. prob-scheme (sampling/enumeration) • provides sound estimation after partial enumeration • measure time and probability bound of most probable state (pair bday,byear) 51 Statespace size resistance prob-scheme 1 pp prob-poly-set = 0 · bday · 364 1956 · byear · 1992 each equally likely max belief 1 0.8 0.6 0.4 0.2 0 0 4 > 1 pp 8 12 16 running time [s] = 0 · bday · 364 1910 · byear · 2010 each equally likely max belief 1 0.8 0.6 0.4 0.2 0 0 4 8 12 16 20 running time [s] 24 28 32 36 24:00 52 Conclusions • Knowledge modeling • Probabilistic computation and revision of knowledge due to query outputs. • Our results • Suitable policy definition. • Approximate (but sound in terms of policy) probabilistic interpretation. • Experimental results. • State size resistance. • Drawbacks • Restricted language, integer linear expressions • Still too slow for practical use • The future • performance / accuracy • simpler domains (octagons) can replace polyhedra (and Latte counting tool) • use of Latte accounts for over 95% of most tests we ran • various other technicalities • less restricted language and/or state space 53 Thank you! Go back. 54 Creating dependency strange-query if bday – 1956 ¸ byear - 5 Æ bday – 1956 · byear + 5 then out := 1 else out := 0 • Dependencies can be created. 1992 1992 P1 P1 1956 1956 0 364 0 364 * not to scale and/or shape 55 Differential privacy • • • • • Encourage participation in a database by protecting inference of a record. Add noise to query result. Requires trusted curator. Suitable for aggregate queries where exact results are not required. Not suitable to a single user protecting him(her)self. • Using differential privacy on a query with 2 output values results in the incorrect query output almost half the time (for reasonable ²) 56 Composability / Collusion • If collusion is suspected among queries, one can make them share belief • query outputs are assumed to have been learned by all of them • assumes deterministic queries only • Confusion problem: non-deterministic queries can result in a revised belief which is less sure of the true (or some) secret value than the initial belief • If collusion is suspected but not present, this can result in incorrect belief • Abstract solution: abstract union of the revised belief (colluded) and the prebelief (not colluded) P1 [ P2 if ±i 2 °(Pi) for i = 1,2 then ±1, ±2 2 °(P1 [ P2) 57 Setting, Known Secret • Measures of badness that weigh their values by the distribution of the secret (or the distribution of the outputs of a query) can be problematic. • An unlikely user could theorize a query is safe, but know it is not safe for them • Consider conditional min-entropy • (see Information-theoretic Bounds for Differentially Private Mechanisms) H1(X | Y) = -log2 y PY(y) maxx PX|Y(x,y) > - log2 t probability of certain query output • Our policy: • -log2 maxy maxx PX|Y(x,y) > - log2 t • so that -log2 maxx PX|Y(x,yreal) > - log2 t