Foundations of Privacy
Lecture 3
Lecturer: Moni Naor

Recap of Last Week's Lecture
• The Simulation Paradigm for Defining and Proving Security of Cryptographic Protocols
• The Basic Impossibility of Disclosure Prevention:
– cannot hope to obtain results that are based on all possible auxiliary information
• Extractors and Fuzzy Extractors
• Differential Privacy
– For all adjacent databases, the output probabilities are very close

Desirable Properties from a Sanitization Mechanism
• Composability
– Applying the sanitization several times yields a graceful degradation
– q releases, each ε-DP, are q·ε-DP
• Robustness to side information
– No need to specify exactly what the adversary knows
Differential privacy satisfies both…

Differential Privacy
Protect individual participants [Dwork, McSherry, Nissim and Smith]:
The probability of every bad event (or any event) increases only by a small multiplicative factor when I enter the DB. May as well participate in the DB…
Adjacency: D+Me and D−Me
A sanitizer A is ε-differentially private if for all DBs D, all Me and all events T:
$e^{-\varepsilon} \le \frac{\Pr_A[A(D+Me) \in T]}{\Pr_A[A(D-Me) \in T]} \le e^{\varepsilon} \approx 1+\varepsilon$
Handles auxiliary input.

Differential Privacy
A gives ε-differential privacy if for all neighboring D1 and D2, and all $T \subseteq \mathrm{range}(A)$:
$\Pr[A(D_1) \in T] \le e^{\varepsilon} \Pr[A(D_2) \in T]$
Neutralizes all linkage attacks.
Composes unconditionally and automatically: $\sum_i \varepsilon_i$
[Figure: Pr[response] on two neighboring databases; the ratio is bounded everywhere, including on bad responses (marked X)]
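When A has finitely many possible outputs, the condition "for all T ⊆ range(A)" can be checked one output at a time: a ratio of two sums of positive terms never exceeds the largest ratio of corresponding terms. The sketch below is a hypothetical helper (the name `smallest_epsilon` and the dict representation of output distributions are assumptions, not from the lecture) that computes the smallest ε for which two such distributions satisfy the two-sided definition.

```python
import math


def smallest_epsilon(p, q):
    """Smallest eps with e^-eps <= Pr[A(D1) in T] / Pr[A(D2) in T] <= e^eps for all T.

    p and q map each possible output of A to its probability under D1 and D2.
    For finite output spaces it is enough to compare single outputs, since
    sum(p[z] for z in T) / sum(q[z] for z in T) <= max over z in T of p[z] / q[z].
    """
    eps = 0.0
    for z in set(p) | set(q):
        pz, qz = p.get(z, 0.0), q.get(z, 0.0)
        if pz == 0.0 and qz == 0.0:
            continue  # output impossible under both databases
        if pz == 0.0 or qz == 0.0:
            return math.inf  # an event possible under only one database
        eps = max(eps, abs(math.log(pz / qz)))
    return eps


# Warner-style randomized response on one bit: tell the truth w.p. 3/4.
truth_is_1 = {1: 0.75, 0: 0.25}
truth_is_0 = {1: 0.25, 0: 0.75}
print(smallest_epsilon(truth_is_1, truth_is_0))  # ln 3, about 1.0986
```

An output that has positive probability under one database and zero under the other makes ε infinite, which is exactly the failure in the "release a few random tags" example.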
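Differential Privacy: Important Properties
Handles auxiliary information
Composes naturally
• A1(D) is ε1-diffP
• for all z1, A2(D, z1) is ε2-diffP
Then releasing (A1(D), A2(D, A1(D))) is (ε1+ε2)-diffP.
Proof: write
$P[z_1] = \Pr_{z \sim A_1(D)}[z = z_1]$, $P'[z_1] = \Pr_{z \sim A_1(D')}[z = z_1]$,
$P[z_2] = \Pr_{z \sim A_2(D, z_1)}[z = z_2]$, $P'[z_2] = \Pr_{z \sim A_2(D', z_1)}[z = z_2]$.
For all adjacent D, D' and all (z1, z2):
$e^{-\varepsilon_1} \le P[z_1]/P'[z_1] \le e^{\varepsilon_1}$
$e^{-\varepsilon_2} \le P[z_2]/P'[z_2] \le e^{\varepsilon_2}$
Since $P[(z_1,z_2)] = P[z_1] \cdot P[z_2]$ and $P'[(z_1,z_2)] = P'[z_1] \cdot P'[z_2]$, multiplying the two bounds gives
$e^{-(\varepsilon_1+\varepsilon_2)} \le P[(z_1,z_2)]/P'[(z_1,z_2)] \le e^{\varepsilon_1+\varepsilon_2}$

Example: NO Differential Privacy
U: set of (name, tag ∈ {0,1}) tuples
One counting query: # of participants with tag = 1
Sanitizer A: choose and release a few random tags
Bad event T: only my tag is 1, and my tag is released
$\Pr_A[A(D+Me) \in T] \ge 1/n$
$\Pr_A[A(D-Me) \in T] = 0$
The ratio is unbounded, so the requirement $e^{-\varepsilon} \le \Pr_A[A(D+Me) \in T] / \Pr_A[A(D-Me) \in T] \le e^{\varepsilon}$ fails: A is not differentially private for any ε!

Size of ε
Let D, D' be totally unrelated databases; the utility (the answers) should be very different.
How small can ε be?
• It cannot be negligible. Why? Hybrid argument: consider a sequence D0 = D, D1, D2, …, Dn = D' where Di and Di+1 are adjacent databases. For each output set T:
$\Pr[T \mid D] \ge \Pr[T \mid D'] \cdot e^{-\varepsilon n}$
If ε is negligible, then $e^{-\varepsilon n} \approx 1$, so every event has nearly the same probability under D and D', and the output cannot reflect the difference between them.
How large can it be?
• Think of a small constant.

Answering a Single Counting Query
U: set of (name, tag ∈ {0,1}) tuples
One counting query: # of participants with tag = 1
Sanitizer A: output # of 1's + noise
Differentially private, if the noise is chosen properly: choose the noise from the Laplace distribution.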
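The composition theorem above can be checked exactly on a toy case. The sketch assumes a one-person database holding a single bit (the two adjacent databases differ in that bit), takes A1 to be randomized response with parameter ε1, and lets A2 adaptively pick parameter ε2 or ε2/2 depending on z1, so that A2(D, z1) is ε2-differentially private for every z1. The adaptive rule and all function names are hypothetical illustrations, not from the lecture.

```python
import math


def randomized_response(bit, eps):
    """Report the true bit w.p. e^eps / (1 + e^eps), else flip it; this is eps-DP."""
    keep = math.exp(eps) / (1.0 + math.exp(eps))
    return {bit: keep, 1 - bit: 1.0 - keep}


def joint_output(bit, eps1, eps2):
    """Distribution of (z1, z2) with z1 ~ A1(D) and z2 ~ A2(D, z1)."""
    joint = {}
    for z1, p_z1 in randomized_response(bit, eps1).items():
        # Adaptive second step: after z1 = 0 it spends only eps2 / 2.
        eps_used = eps2 if z1 == 1 else eps2 / 2
        for z2, p_z2 in randomized_response(bit, eps_used).items():
            joint[(z1, z2)] = p_z1 * p_z2  # P[(z1, z2)] = P[z1] * P[z2]
    return joint


def privacy_loss(eps1, eps2):
    """max over (z1, z2) of |ln(P[(z1, z2)] / P'[(z1, z2)])| for D = 1, D' = 0."""
    p, p_prime = joint_output(1, eps1, eps2), joint_output(0, eps1, eps2)
    return max(abs(math.log(p[z] / p_prime[z])) for z in p)


print(privacy_loss(0.5, 0.3))  # about 0.8: the bound eps1 + eps2 is attained
```

The maximum is reached at (z1, z2) = (1, 1), where both steps report the true bit and the two per-step ratios e^ε1 and e^ε2 multiply, just as in the proof.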
Laplacian Noise
The Laplace distribution Y = Lap(b) has density function $\Pr[Y=y] = \frac{1}{2b} e^{-|y|/b}$
Standard deviation: O(b)
Take b = 1/ε, and get $\Pr[Y=y] \propto e^{-\varepsilon |y|}$
[Figure: density of Lap(b), plotted for y from −4 to 5]

Laplacian Noise: ε-Privacy
Release: q(D) + Lap(1/ε)
For adjacent D, D': |q(D) − q(D')| ≤ 1
For every output a: $e^{-\varepsilon} \le \Pr_{\text{by } D}[a] / \Pr_{\text{by } D'}[a] \le e^{\varepsilon}$
(The ratio equals $e^{\varepsilon(|a - q(D')| - |a - q(D)|)}$, and by the triangle inequality the exponent lies between $-\varepsilon|q(D)-q(D')|$ and $\varepsilon|q(D)-q(D')|$, hence between −ε and ε.)

Laplacian Noise: Õ(1/ε)-Error
$\Pr_{y \sim Y}[|y| > k/\varepsilon] = O(e^{-k})$
The expected error is 1/ε; w.h.p. the error is Õ(1/ε).

Randomized Response
• Randomized Response Technique [Warner 1965]
– Method for polling stigmatizing questions
– Idea: lie with known probability
• Specific answers are deniable
• Aggregate results are still valid
"Trust no one"
• The data is never stored "in the plain"
• Popular in the DB literature
[Figure: each user reports its own bit plus noise: 1 + noise, 0 + noise, 1 + noise, …]

Randomized Response with Laplacian Noise
Initial idea: each user i, on input xi ∈ {0,1}, adds to xi independent Laplace noise with magnitude 1/ε.
Privacy: since each increment is protected by Laplace noise, the output is differentially private whether xi is 0 or 1.
Accuracy: the noise partly cancels out, so the error is Õ(√T/ε), i.e., Õ(√T) for constant ε, where T is the total number of users.
Is it too high?
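A minimal sketch of the two releases above, using only the Python standard library. Since `random` has no Laplace sampler, the sketch assumes the standard fact that the difference of two independent exponentials with mean b has distribution Lap(b). The function names are hypothetical. `worst_ratio` checks the ε-privacy bound numerically on a grid of outputs for two adjacent counts, and the usage lines compare one curator-added Lap(1/ε) with T user-added copies.

```python
import math
import random


def laplace_noise(b, rng=random):
    """Sample Lap(b) as the difference of two i.i.d. exponentials with mean b."""
    return rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)


def laplace_density(y, b):
    """Density of Lap(b) at y: (1 / 2b) * e^(-|y| / b)."""
    return math.exp(-abs(y) / b) / (2.0 * b)


def central_count(bits, eps, rng=random):
    """Curator releases q(D) + Lap(1/eps) for the counting query q = #1's."""
    return sum(bits) + laplace_noise(1.0 / eps, rng)


def local_count(bits, eps, rng=random):
    """Randomized response with Laplace noise: each user i reports x_i + Lap(1/eps)."""
    return sum(x + laplace_noise(1.0 / eps, rng) for x in bits)


def worst_ratio(count, eps, outputs):
    """Largest density ratio at the given outputs for adjacent counts count, count + 1."""
    b = 1.0 / eps
    return max(laplace_density(a - count, b) / laplace_density(a - count - 1, b)
               for a in outputs)


rng = random.Random(0)
bits = [1] * 5000 + [0] * 5000  # T = 10000 users, true count 5000
print(abs(central_count(bits, 1.0, rng) - 5000))  # typically about 1 / eps = 1
print(abs(local_count(bits, 1.0, rng) - 5000))    # typically on the order of sqrt(T) / eps
print(worst_ratio(10, 0.5, [k / 10 for k in range(-100, 300)]) <= math.exp(0.5) + 1e-9)  # True
```

The gap between the two printed errors is the "Is it too high?" question: the curator adds noise once, while in the local version every one of the T users adds its own.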
Scaling Noise to Sensitivity
Global sensitivity of a query $q: U^n \to \mathbb{R}$:
$GS_q = \max_{D,D'} |q(D) - q(D')|$ (over adjacent D, D')
For a counting query q: GS_q = 1
The previous argument generalizes: for any query $q: U^n \to \mathbb{R}$, release $q(D) + \mathrm{Lap}(GS_q/\varepsilon)$
• ε-private
• error Õ(GS_q/ε)

Scaling Noise to Sensitivity: Many Dimensions
Global sensitivity of a query $q: U^n \to \mathbb{R}^d$:
$GS_q = \max_{D,D'} \|q(D) - q(D')\|_1$
The previous argument generalizes: for any query $q: U^n \to \mathbb{R}^d$, release $q(D) + (Y_1, Y_2, \ldots, Y_d)$
– Each Y_i is independent Lap(GS_q/ε)
• ε-private
• error Õ(GS_q/ε)

Example: Histograms
• Say x1, x2, …, xn are in domain U
• Partition U into d disjoint bins
• q(x1, x2, …, xn) = (n1, n2, …, nd), where nj = #{i : xi in j-th bin}
• GS_q = 2 (changing one xi moves it from one bin to another, changing two counts by 1 each)
• It suffices to add Lap(2/ε) noise to each count
Problem: the result might not look like a histogram

Covariance Matrix
• Suppose each person's data is a real vector (r1, r2, …, rn)
• The database is a matrix X
• The covariance matrix of X is (roughly) the matrix $X^T X$
• Entries measure correlation between attributes
• First step of many analyses, e.g., PCA

Distance to a DB with Property P
• Suppose P = set of "good" databases
– e.g., well-clustered databases
• Distance to P = # points in x that must be changed to put x in P
• Always has GS = 1
• Example:
– Distance to a data set with "good clustering"

K-Means
• A clustering algorithm with iterations
• Always keeps k centers

Median
Median of $x_1, x_2, \ldots, x_n \in [0,1]$
• X = 0,…,0,0,1,…,1 and X' = 0,…,0,1,1,…,1, where each run "0,…,0" and "1,…,1" has length (n−1)/2
• median(X) = 0, median(X') = 1
• GS_median = 1
• Noise magnitude: 1/ε. Too much noise!
• But for "most" neighbor databases X, X', |median(X) − median(X')| is small.
Can we add less noise on "good" instances?
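The histogram release can be sketched the same way, again assuming a standard-library Laplace sampler built from two exponentials; `histogram`, `noisy_histogram`, `l1_distance` and the bin edges are hypothetical. The clamping and rounding at the end is only post-processing of the noisy counts, so it does not affect privacy, and it is one simple response to the "might not look like a histogram" problem. The last lines rebuild the median example to show why GS_median = 1.

```python
import random
import statistics


def laplace_noise(b, rng=random):
    """Sample Lap(b) as the difference of two i.i.d. exponentials with mean b."""
    return rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)


def histogram(xs, edges):
    """Counts n_j = #{i : x_i in [edges[j], edges[j+1])} for d = len(edges) - 1 bins.

    Values are assumed to lie in [edges[0], edges[-1]).
    """
    counts = [0] * (len(edges) - 1)
    for x in xs:
        for j in range(len(edges) - 1):
            if edges[j] <= x < edges[j + 1]:
                counts[j] += 1
                break
    return counts


def l1_distance(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def noisy_histogram(xs, edges, eps, rng=random):
    """Add independent Lap(GS/eps) with GS = 2 to every count."""
    noisy = [c + laplace_noise(2.0 / eps, rng) for c in histogram(xs, edges)]
    # Post-processing only: clamp negatives and round to integers.
    return [max(0, round(v)) for v in noisy]


edges = [0.0, 0.25, 0.5, 0.75, 1.0]
xs = [0.1, 0.5, 0.7, 0.95]
neighbor = [0.9, 0.5, 0.7, 0.95]  # x_1 changed: it moves from bin 1 to bin 4
print(histogram(xs, edges), histogram(neighbor, edges))  # [1, 0, 2, 1] [0, 0, 2, 2]
print(l1_distance(histogram(xs, edges), histogram(neighbor, edges)))  # 2
print(noisy_histogram(xs, edges, eps=1.0, rng=random.Random(0)))

# Median example with n = 101, runs of length (n - 1) / 2 = 50.
X = [0] * 51 + [1] * 50        # 0,...,0,0,1,...,1
X_prime = [0] * 50 + [1] * 51  # 0,...,0,1,1,...,1 (one entry changed)
print(statistics.median(X), statistics.median(X_prime))  # 0 1
```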
Global Sensitivity vs. Local Sensitivity
• Global sensitivity is worst case over inputs
Local sensitivity of query q at point D:
$LS_q(D) = \max_{D'} |q(D) - q(D')|$ (over D' adjacent to D)
• Reminder: $GS_q = \max_D LS_q(D)$
• Goal: add less noise when local sensitivity is lower
• Problem: the amount of noise can leak information

Local Sensitivity of Median
• For sorted X = x1, x2, …, xn (n odd) with median xm:
$LS_{median}(X) = \max(x_m - x_{m-1},\; x_{m+1} - x_m)$

Sensitivity of Local Sensitivity of Median
Median of $x_1, x_2, \ldots, x_n \in [0,1]$
• X = 0,…,0,0,0,0,1,…,1 and X' = 0,…,0,0,0,1,1,…,1, where each run "0,…,0" and "1,…,1" has length (n−3)/2
• LS(X) = 0, LS(X') = 1
The noise magnitude must be an insensitive function!

Smooth Upper Bound
• Compute a "smoothed" version of local sensitivity
• Design a sensitivity function S(X)
• S(X) is an ε-smooth upper bound on LS_f(X) if:
– for all X: $S(X) \ge LS_f(X)$
– for all neighbors X, X': $S(X) \le e^{\varepsilon} S(X')$
• Theorem: if A(X) = f(X) + noise(S(X)/ε), then A is 2ε-differentially private.

Smooth Sensitivity
• Smooth sensitivity: $S^*_f(X) = \max_Y \{ LS_f(Y)\, e^{-\varepsilon \cdot \mathrm{dist}(X,Y)} \}$
Claim: if S(X) is an ε-smooth upper bound on LS_f(X) for all X, then $S^*_f(X) \le S(X)$; that is, smooth sensitivity is the smallest ε-smooth upper bound.

The Exponential Mechanism [McSherry and Talwar]
A general mechanism that
• yields differential privacy
• may yield utility/approximation
• is defined (and evaluated) by considering all possible answers
The definition does not yield an efficient way of evaluating it.
Application: approximate truthfulness of auctions
• Collusion resistance
• Compatibility

Example of the Exponential Mechanism
• Data: xi = website visited by student i today
• Range: Y = {website names}
• For each name y, let q(y; X) = #{i : xi = y}
Goal: output the most frequently visited site
• Procedure: given X, output website y with probability proportional to $e^{\varepsilon q(y;X)}$
• Popular sites are exponentially more likely than rare ones
• Website scores don't change too quickly
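Two pieces of the slides above, sketched under explicit assumptions. For the median, the code does not search over all Y; it assumes the closed form for the median's smooth sensitivity from Nissim, Raskhodnikova and Smith: with sorted values in [0, Λ], odd n and median x_m, the largest local sensitivity within distance k is the maximum over 0 ≤ t ≤ k+1 of x_{m+t} − x_{m+t−k−1}, padding x_i = 0 for i < 1 and x_i = Λ for i > n, and S* is the maximum of these values weighted by e^{−kε}. For the exponential mechanism, weights are e^{εq(y;X)} as on the slide; for this counting score, adding or removing one student changes a single count by 1, which is enough for ε-differential privacy, whereas general scores with sensitivity Δ are usually weighted by e^{εq/(2Δ)}. All function names are hypothetical.

```python
import math
import random


def order_stat(xs, i, lam):
    """x_i (1-indexed) of sorted xs, padded with 0 for i < 1 and lam for i > n."""
    if i < 1:
        return 0.0
    if i > len(xs):
        return lam
    return xs[i - 1]


def ls_at_distance(xs, k, lam=1.0):
    """A^(k)(X): largest local sensitivity of the median over inputs within distance k.

    Assumes xs is sorted, values lie in [0, lam], and n is odd with median x_m.
    """
    m = (len(xs) + 1) // 2
    return max(order_stat(xs, m + t, lam) - order_stat(xs, m + t - k - 1, lam)
               for t in range(k + 2))


def local_sensitivity_median(xs, lam=1.0):
    """k = 0 case: max(x_m - x_(m-1), x_(m+1) - x_m)."""
    return ls_at_distance(xs, 0, lam)


def smooth_sensitivity_median(xs, eps, lam=1.0):
    """S*(X) = max over k of e^(-k * eps) * A^(k)(X)."""
    return max(math.exp(-k * eps) * ls_at_distance(xs, k, lam)
               for k in range(len(xs) + 1))


def exponential_mechanism_probs(visits, candidates, eps):
    """Pr[output y] proportional to e^(eps * q(y; X)) with q(y; X) = #{i : x_i = y}.

    `candidates` is the range Y; it must be fixed in advance, not read off the data.
    """
    scores = {y: sum(1 for x in visits if x == y) for y in candidates}
    top = max(scores.values())
    # Subtracting the top score changes no probability but avoids overflow.
    weights = {y: math.exp(eps * (s - top)) for y, s in scores.items()}
    total = sum(weights.values())
    return {y: w / total for y, w in weights.items()}


def exponential_mechanism(visits, candidates, eps, rng=random):
    probs = exponential_mechanism_probs(visits, candidates, eps)
    sites = list(probs)
    return rng.choices(sites, weights=[probs[y] for y in sites], k=1)[0]


# The lecture's pair X, X' with n = 23, runs of length (n - 3) / 2 = 10.
X = [0.0] * 13 + [1.0] * 10
X_prime = [0.0] * 12 + [1.0] * 11
print(local_sensitivity_median(X), local_sensitivity_median(X_prime))  # 0.0 1.0
print(smooth_sensitivity_median(X, 1.0), smooth_sensitivity_median(X_prime, 1.0))

sites = ["a.com", "b.com", "c.com"]
print(exponential_mechanism(["a.com", "a.com", "b.com"], sites, 0.5, random.Random(0)))
```

On the lecture's pair, local sensitivity jumps from 0 to 1, while the smoothed values (e^{−1} and 1 for ε = 1) differ by exactly the allowed factor e^ε.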
Projects
• Report on a paper
• Apply a notion studied to some known domain
• Check the state of privacy in some setting
– Privacy in GWAS
– Privacy in crowdsourcing
– Privacy-preserving Wordle
– Unique identification bounds
• How much worse are differential privacy guarantees in estimation?
• Contextual privacy

Planned Topics
Privacy of Data Analysis
• Differential Privacy
– Definition and Properties
– Statistical databases
– Dynamic data
• Privacy of learning algorithms
• Privacy of genomic data
Interaction with Cryptography
• SFE
• Voting
• Entropic Security
• Data Structures
• Everlasting Security
• Privacy Enhancing Tech.
– Mix nets

Course Information
Foundation of Privacy - Spring 2010
Instructor: Moni Naor
Office: Ziskind 248 Phone: 3701 E-mail: moni.naor@
When: Mondays, 11:00--13:00 (2 points)
Where: Ziskind 1
• Course web page:
• Prerequisites: familiarity with algorithms, data structures, probability theory, and linear algebra, at an undergraduate level; a basic course in computability is assumed.
• Requirements:
– Participation in discussion in class
• Best: read the papers ahead of time
– Homework: there will be several homework assignments
• Homework assignments should be turned in on time (usually two weeks after they are given)!
– Class project and presentation
– Exam: none planned