From Stability to Differential Privacy
Abhradeep Guha Thakurta, Yahoo! Labs, Sunnyvale

Thesis: Stable algorithms yield differentially private algorithms

Differential privacy: A short tutorial

Privacy in Machine Learning Systems
• Individuals contribute their records d_1, d_2, ..., d_{n-1}, d_n to a trusted learning algorithm
• The algorithm releases summary statistics (classifiers, clusters, regression coefficients) to users
• An attacker can observe everything that is released to the users

Two conflicting goals:
1. Utility: release accurate information
2. Privacy: protect the privacy of individual entries

Balancing the tradeoff is a difficult problem:
1. Netflix prize database attack [NS08]
2. Facebook advertisement system attack [Korolova11]
3. Amazon recommendation system attack [CKNFS11]

Data privacy is an active area of research:
• Computer science, economics, statistics, biology, social sciences, ...

Differential Privacy [DMNS06, DKMMN06]
Intuition: the adversary learns essentially the same thing irrespective of your presence or absence in the data set.
• Run the mechanism M (with its own random coins) on a data set D and on a data set D' that differs from D in one element; D and D' are called neighboring data sets
• Require: neighboring data sets induce close distributions on the outputs M(D) and M(D')

Definition: A randomized algorithm M is (ε, δ)-differentially private if
• for all data sets D and D' that differ in one element, and
• for all sets of answers S,
  Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D') ∈ S] + δ

Semantics of Differential Privacy
• Differential privacy is a condition on the algorithm
• The guarantee is meaningful in the presence of any auxiliary information
• Typical privacy parameters: ε ≈ 0.1 and δ = 1/n^{log n}, where n = # of data samples
• Composition: the ε's and δ's add up over multiple executions

Laplace Mechanism [DMNS06]
Data set D = {d_1, ..., d_n} and f: U* → ℝ a function on D.
Sensitivity: S(f) = max over neighboring D, D' (d_H(D, D') = 1) of |f(D) − f(D')|
1. E: random variable sampled from Lap(S(f)/ε)
2. Output f(D) + E
Theorem (Privacy): The algorithm is ε-differentially private.

This Talk
1. Differential privacy via stability arguments: a meta-algorithm
2. Sample and aggregate framework and private model selection
3. Non-private sparse linear regression in high dimensions
4. Private sparse linear regression with (nearly) optimal rate

Perturbation stability (a.k.a. zero local sensitivity)

Perturbation Stability
• A function f maps a data set D = (d_1, ..., d_n) to an output
• Stability of f at D: the output does not change on changing any one entry of D
• Equivalently, the local sensitivity of f at D is zero

Distance to Instability Property
• Definition: A function f: U* → ℛ is k-stable at a data set D if for any data set D' ∈ U* with |D Δ D'| ≤ k, f(D) = f(D')
• Distance to instability of f at D: max{k : f is k-stable at D}
• Picture: within the space of all data sets, a stable D sits at distance > k from the unstable data sets
• Objective: output f(D) while preserving differential privacy

Propose-Test-Release (PTR) framework [DL09, KRSY11, Smith T.'13]

A Meta-algorithm: Propose-Test-Release (PTR)
Basic tool: Laplace mechanism
1. dist ← max{k : f is k-stable at D}
2. d̂ ← dist + Lap(1/ε)
3. If d̂ > log(1/δ)/ε, then return f(D), else return ⊥
Theorem: The algorithm is (ε, δ)-differentially private.
Theorem: If f is (2 log(1/δ)/ε)-stable at D, then with probability ≥ 1 − δ the algorithm outputs f(D).
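To make the two building blocks above concrete, here is a minimal Python sketch of the Laplace mechanism and the generic PTR pattern. The names (laplace_mechanism, propose_test_release, and the distance_to_instability callback) are illustrative and not from the talk; computing the distance to instability exactly can itself be hard, which is the issue the LASSO-specific sections below work around.

```python
import numpy as np

def laplace_mechanism(value, sensitivity, eps, rng=None):
    """Release a real-valued statistic with eps-differential privacy by
    adding Laplace noise scaled to the statistic's global sensitivity."""
    rng = rng or np.random.default_rng()
    return value + rng.laplace(scale=sensitivity / eps)

def propose_test_release(data, f, distance_to_instability, eps, delta, rng=None):
    """Generic PTR: release f(data) only if a noisy estimate of the
    distance to instability clears the log(1/delta)/eps threshold.

    distance_to_instability(data) should return the largest k such that
    f gives the same answer on every data set within symmetric difference
    k of `data`; that quantity has global sensitivity 1, so Laplace noise
    of scale 1/eps suffices for the test.
    """
    rng = rng or np.random.default_rng()
    noisy_dist = distance_to_instability(data) + rng.laplace(scale=1.0 / eps)
    if noisy_dist > np.log(1.0 / delta) / eps:
        return f(data)   # stable instance: the exact answer is released
    return None          # the "bottom" output: refuse to answer
```

Note that only the distance query is noised: whenever the test passes, the released value f(D) is exact, which is what makes PTR attractive for discrete outputs such as model supports.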
This Talk
1. Differential privacy via stability arguments: a meta-algorithm
2. Sample and aggregate framework and private model selection
3. Non-private sparse linear regression in high dimensions
4. Private sparse linear regression with (nearly) optimal rate

Sample and aggregate framework [NRS07, Smith11, Smith T.'13]

Sample and Aggregate Framework
• Subsample the data set D into blocks D_1, ..., D_m, run the learning algorithm on each block, and pass the m outputs to an aggregator; the aggregator's answer is the final output
Theorem: If the aggregator is (ε/k, δ)-differentially private, then the overall framework is (ε, δ)-differentially private.
Assumption: Each entry appears in at most k data blocks.
Proof idea: Each data entry affects only the blocks that contain it.

A differentially private aggregator using the PTR framework [Smith T.'13]

An (ε, δ)-differentially Private Aggregator
Assumption: p discrete possible outputs (candidate models) m_1, ..., m_p
• Each block D_1, ..., D_m votes for one candidate; the votes form a histogram of counts over m_1, m_2, ..., m*, ..., m_p
• The histogram is fed to the PTR + report-noisy-max aggregator

PTR + Report-Noisy-Max Aggregator
Function f: the candidate output with the maximum number of votes
1. dist ← max{k : f is k-stable at D}
2. d̂ ← dist + Lap(1/ε)
3. If d̂ > log(1/δ)/ε, then return f(D), else return ⊥
Observation: 2 × (dist + 1) is the gap between the vote counts of the highest and the second-highest scoring model.
Observation: The algorithm is always computationally efficient.

Analysis of the aggregator under subsampling stability [Smith T.'13]

Subsampling Stability
• D_i: a random subsample of the data set D (with replacement)
• Stability: f(D_i) = f(D) with probability > 3/4

A Private Aggregator using Subsampling Stability
• D_i: sample each entry from D with probability q = O(ε / log(1/δ))
• Each entry of D appears in ≈ qm data blocks in expectation, and in at most 2qm data blocks with probability ≥ 1 − δ
• In expectation, the winning candidate m* collects more than 3m/4 of the votes in the histogram over m_1, m_2, ..., m*, ..., m_p
1. dist ← (c* − c_2)/(4qm) − 1, where c* and c_2 are the highest and second-highest vote counts
2. d̂ ← dist + Lap(1/ε)
3. If d̂ > log(1/δ)/ε, then return f(D), else return ⊥
Theorem: The above algorithm is (ε, 2δ)-differentially private.
Notice: The utility guarantee does not depend on the number of candidate models.
Theorem: If m = log(n/δ)/q², then with probability at least 1 − 2δ, the true answer f(D) is output.
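A minimal sketch of the subsampling-based aggregator just described, assuming the learner f maps a list of records to a hashable model (e.g., a frozenset of selected coordinates). The function name subsample_and_aggregate is illustrative, and q and m are left as caller-supplied parameters; the slides' recipe is roughly q = O(ε/log(1/δ)) and m = log(n/δ)/q², with constants omitted.

```python
import numpy as np
from collections import Counter

def subsample_and_aggregate(data, f, eps, delta, q, m, rng=None):
    """Vote-based private aggregation: run the learner f on m random
    subsamples (each record kept independently with probability q), then
    release the winning model via the PTR / report-noisy-max test."""
    rng = rng or np.random.default_rng()
    n = len(data)
    votes = Counter()
    for _ in range(m):
        mask = rng.random(n) < q                        # Bernoulli(q) subsample
        votes[f([data[i] for i in range(n) if mask[i]])] += 1

    ranked = votes.most_common(2)
    top_model, top_count = ranked[0]
    runner_up_count = ranked[1][1] if len(ranked) > 1 else 0

    # Distance-to-instability proxy from the slides: the vote gap, rescaled
    # by 4*q*m because one record lands in roughly q*m of the blocks.
    dist = (top_count - runner_up_count) / (4 * q * m) - 1
    if dist + rng.laplace(scale=1.0 / eps) > np.log(1.0 / delta) / eps:
        return top_model
    return None
```

The q = O(ε/log(1/δ)) requirement is what keeps the rescaled vote gap an (approximately) sensitivity-one quantity, since a single record can influence at most about 2qm of the votes.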
This Talk
1. Differential privacy via stability arguments: a meta-algorithm
2. Sample and aggregate framework and private model selection
3. Non-private sparse linear regression in high dimensions
4. Private sparse linear regression with (nearly) optimal rate

Sparse linear regression in high dimensions and the LASSO

Sparse Linear Regression in High Dimensions (p ≫ n)
• Data set: D = {(x_1, y_1), ..., (x_n, y_n)} where x_i ∈ ℝ^p and y_i ∈ ℝ
• Assumption: data generated by a noisy linear system, y_i = ⟨x_i, θ*⟩ + w_i, with feature vector x_i, parameter vector θ* ∈ ℝ^p, and field noise w_i
• In matrix form: y_{n×1} = X_{n×p} θ*_{p×1} + w_{n×1}, with design matrix X and response vector y
• Data normalization: ∀i ∈ [n], ||x_i||_∞ ≤ 1, and each w_i is sub-Gaussian
• Sparsity: θ* has s < n non-zero entries
• Bounded entries: every non-zero coordinate of θ* satisfies |θ*_j| ∈ (Φ, 1 − Φ) for an arbitrarily small constant Φ

Model selection problem: find the non-zero coordinates (the support) of θ*
Solution: the LASSO estimator [Tibshirani94, EFJT03, Wainwright06, CT07, ZY07, ...]
  θ̂ ∈ argmin_{θ ∈ ℝ^p} (1/2n) ||y − Xθ||²₂ + Λ ||θ||₁

Consistency of the LASSO Estimator
Consistency conditions* [Wainwright06, ZY07], where Γ is the support of the underlying parameter vector θ*:
• Incoherence: ||| X_{Γᶜ}ᵀ X_Γ (X_Γᵀ X_Γ)⁻¹ |||_∞ < 1/4
• Restricted strong convexity: λ_min(X_Γᵀ X_Γ) = Ω(n)
Theorem*: Under a proper choice of Λ and n = Ω(s log p), the support of the LASSO estimator θ̂ equals the support of θ*.

Stochastic Consistency of the LASSO
Theorem [Wainwright06, ZY07]: If each entry of X is drawn i.i.d. from N(0, 1/4), then the conditions above are satisfied w.h.p.

We show [Smith, T.'13]: consistency conditions ⇒ proxy conditions ⇒ perturbation stability.

This Talk
1. Differential privacy via stability arguments: a meta-algorithm
2. Sample and aggregate framework and private model selection
3. Non-private sparse linear regression in high dimensions
4. Private sparse linear regression with (nearly) optimal rate

Interlude: a simple subsampling-based private LASSO algorithm [Smith, T.'13]

Notion of Neighboring Data Sets
• A data set D consists of the design matrix X and the response vector y, i.e., the n rows (x_i, y_i)
• D and D' are neighboring data sets if they differ in exactly one row (x_i, y_i)

Recap: Subsampling Stability
• D_i: a random subsample of D (with replacement)
• Stability: f(D_i) = f(D) with probability > 3/4

Recap: PTR + Report-Noisy-Max Aggregator
• Assumption: a finite set of candidate models m_1, ..., m_p (here, the possible supports); each block D_1, ..., D_m votes for one of them
• D_i: sample each entry from D with probability q = O(ε / log(1/δ)); each entry of D appears in at most 2qm data blocks with probability ≥ 1 − δ
• Fix m = log(n/δ)/q²
1. dist ← (c* − c_2)/(4qm) − 1
2. d̂ ← dist + Lap(1/ε)
3. If d̂ > log(1/δ)/ε, then return f(D), else return ⊥
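As a concrete base learner f for the recapped aggregator, here is a sketch of "support of the LASSO estimator" using scikit-learn, whose Lasso objective (1/(2n))||y − Xθ||²₂ + α||θ||₁ coincides with the one above, with alpha playing the role of Λ. The support threshold tol and the toy parameters in the usage example are illustrative choices, not values from the talk.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_support(block, lam, tol=1e-8):
    """Fit the LASSO on one data block and return its support as a
    frozenset, so it can serve as the hashable voting function f in the
    subsample-and-aggregate sketch above."""
    X = np.array([x for x, _ in block])
    y = np.array([yi for _, yi in block])
    theta_hat = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
    return frozenset(np.flatnonzero(np.abs(theta_hat) > tol))

# Toy usage: n = 2000 samples, p = 50 features, s = 3 non-zero coefficients.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 0.5, size=(2000, 50))        # entries ~ N(0, 1/4)
theta_star = np.zeros(50)
theta_star[:3] = 0.9
y = X @ theta_star + rng.normal(0.0, 0.1, size=2000)
print(lasso_support(list(zip(X, y)), lam=0.05))  # ideally recovers {0, 1, 2}
```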
Subsampling Stability of the LASSO
• Model: y_{n×1} = X_{n×p} θ*_{p×1} + w_{n×1}
• Stochastic assumptions: each entry of X is drawn i.i.d. from N(0, 1/4), and the noise w ∼ N(0, σ² I_n)
Theorem [Wainwright06, ZY07] (non-private): Under a proper choice of Λ and n = Ω(s log p), the support of the LASSO estimator θ̂ equals the support of θ*.
Theorem: Under a proper choice of Λ, n = Ω((log(1/δ)/ε) · s log p), and f = support of the LASSO estimator, the output of the aggregator equals the support of θ*.
Notice the multiplicative gap of log(1/δ)/ε relative to the non-private rate (recall the typical scale δ = 1/n^{log n}).

Perturbation stability based private LASSO and optimal sample complexity [Smith, T.'13]

Recap: Distance to Instability Property
• A function f: U* → ℛ is k-stable at D if f(D) = f(D') for every D' ∈ U* with |D Δ D'| ≤ k; the distance to instability of f at D is max{k : f is k-stable at D}
• Objective: output f(D) while preserving differential privacy

Recap: Propose-Test-Release (PTR) Framework
To be designed: a query of global sensitivity one that lower-bounds the distance to instability.
1. dist ← max{k : f is k-stable at D}
2. d̂ ← dist + Lap(1/ε)
3. If d̂ > log(1/δ)/ε, then return f(D), else return ⊥
Theorem: The algorithm is (ε, δ)-differentially private.
Theorem: If f is (2 log(1/δ)/ε)-stable at D, then with probability ≥ 1 − δ the algorithm outputs f(D).

Instantiation of PTR for the LASSO
LASSO: θ̂ ∈ argmin_{θ ∈ ℝ^p} (1/2n) ||y − Xθ||²₂ + Λ ||θ||₁
• Set the function f = support of θ̂
• Issue: for this f, the distance to instability might not be efficiently computable
• From [Smith, T.'13]: consistency conditions ⇒ proxy conditions ⇒ perturbation stability
• This talk: proxy conditions that are efficiently testable with privacy

Perturbation Stability of the LASSO
Write J_D(θ) = (1/2n) ||y − Xθ||²₂ + Λ ||θ||₁ for the LASSO objective on D.
Theorem: The consistency conditions for the LASSO are sufficient for perturbation stability.
Proof sketch:
1. Analyze the Karush-Kuhn-Tucker (KKT) optimality conditions at θ̂
2. Show that support(θ̂) is stable via a "dual certificate" argument on stable instances

Sub-gradient of the LASSO objective: ∂J_D(θ) = −(1/n) Xᵀ(y − Xθ) + Λ ∂||θ||₁
• On D the minimizer satisfies 0 ∈ ∂J_D(θ̂); on a neighboring D' the minimizer satisfies 0 ∈ ∂J_{D'}(θ̂')
Argue using the optimality conditions of ∂J_D(θ̂) and ∂J_{D'}(θ̂'):
1. No zero coordinate of θ̂ becomes non-zero in θ̂' (uses the mutual incoherence condition)
2. No non-zero coordinate of θ̂ becomes zero in θ̂' (uses the restricted strong convexity condition)

Perturbation Stability Test for the LASSO
Let Γ be the support of θ̂ and Γᶜ its complement. Test for the following (the real test is more complex):
• Restricted strong convexity (RSC): the minimum eigenvalue of X_Γᵀ X_Γ is Ω(n)
• Strong stability: the absolute values of the coordinates of the gradient of the least-squared loss on Γᶜ are ≪ Λ

Geometry of the Stability of the LASSO
Intuition: strong convexity ensures supp(θ̂) ⊆ supp(θ̂')
1. Strong convexity of the LASSO objective along Γ ensures ||θ̂_Γ − θ̂'_Γ||_∞ is small
2. If |θ̂_Γ(j)| is large for every j, then θ̂'_Γ(j) is non-zero for every j
3. The consistency conditions imply that |θ̂_Γ(j)| is large for every j

Intuition: strong stability ensures that no zero coordinate of θ̂ becomes non-zero in θ̂'
• Along Γᶜ the ℓ₁ penalty contributes slopes +Λ and −Λ to the LASSO objective, so the objective has a kink at zero in every coordinate of Γᶜ
• For the minimizer to move along Γᶜ, the perturbation to the gradient of the least-squared loss has to be large
• Gradient of the least-squared loss: g = −(1/n) Xᵀ(y − Xθ̂), with coordinates g_j
• Strong stability: |g_j| ≪ Λ for all j ∈ Γᶜ ⇒ every j ∈ Γᶜ still has a zero sub-gradient for LASSO(D')

Making the Stability Test Private (Simplified)
• Test for restricted strong convexity: is f_1(D) > t_1? Test for strong stability: is f_2(D) > t_2?
• Issue: the sensitivities of f_1 and f_2 are Δ_1 and Δ_2 only when f_1(D) > t_1 and f_2(D) > t_2
• Our solution: the proxy distance d̂ = max( min_i (f_i(D) − t_i)/Δ_i + 1, 0 )
• d̂ has global sensitivity one, and it is large exactly when f_1 and f_2 are both large and insensitive

Private Model Selection with Optimal Sample Complexity
1. Compute the proxy distance d̂ as a function of f_1(D) and f_2(D)
2. d̂ ← d̂ + Lap(1/ε)
3. If d̂ > log(1/δ)/ε, then return supp(θ̂), else return ⊥
Theorem: The algorithm is (ε, δ)-differentially private.
Theorem (nearly optimal sample complexity): Under the consistency conditions, if log p > α² s³ and n = Ω(s log p), then w.h.p. the support of θ* is output. Here α = log(1/δ)/ε.
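To make the last two steps concrete, here is a simplified sketch of the private stability test: f1 is the smallest eigenvalue of X_Γᵀ X_Γ (restricted strong convexity), f2 is the margin between Λ and the largest gradient coordinate off the support (strong stability), and the proxy distance folds both into a single quantity that is noised and thresholded exactly as in PTR. The thresholds t1, t2 and sensitivity bounds delta1, delta2 are caller-supplied placeholders, since their precise values in [Smith, T.'13] are not on the slides and the real test is more involved than this simplification.

```python
import numpy as np

def proxy_distance(X, y, theta_hat, lam, t1, t2, delta1, delta2, tol=1e-8):
    """Simplified proxy for the distance to instability of the LASSO support."""
    n, p = X.shape
    support = np.flatnonzero(np.abs(theta_hat) > tol)
    off_support = np.setdiff1d(np.arange(p), support)

    # f1: restricted strong convexity statistic.
    X_s = X[:, support]
    f1 = np.linalg.eigvalsh(X_s.T @ X_s).min() if support.size else 0.0

    # f2: strong-stability margin; off-support gradient coordinates must
    # stay strictly inside the [-lam, lam] sub-gradient interval.
    grad = -(X.T @ (y - X @ theta_hat)) / n
    f2 = (lam - np.abs(grad[off_support]).max()) if off_support.size else lam

    # If delta_i bounds how much one record can move f_i, this proxy has
    # global sensitivity at most 1 (the paper's argument is more delicate).
    return max(min((f1 - t1) / delta1, (f2 - t2) / delta2) + 1, 0.0)

def private_lasso_support(X, y, theta_hat, lam, eps, delta,
                          t1, t2, delta1, delta2, rng=None, tol=1e-8):
    """PTR release of the LASSO support using the proxy distance."""
    rng = rng or np.random.default_rng()
    d = proxy_distance(X, y, theta_hat, lam, t1, t2, delta1, delta2, tol)
    if d + rng.laplace(scale=1.0 / eps) > np.log(1.0 / delta) / eps:
        return frozenset(np.flatnonzero(np.abs(theta_hat) > tol))
    return None
```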
Thesis: Stable algorithms yield differentially private algorithms
Two notions of stability:
1. Perturbation stability
2. Subsampling stability

This Talk
1. Differential privacy via stability arguments: a meta-algorithm
2. Sample and aggregate framework and private model selection
3. Non-private sparse linear regression in high dimensions
4. Private sparse linear regression with (nearly) optimal rate

Concluding Remarks
1. The sample and aggregate framework with the PTR + report-noisy-max aggregator is a generic tool for designing private learning algorithms
   • Example: learning with non-convex models [Bilenko, Dwork, Rothblum, T.]
2. The propose-test-release framework is an interesting tool whenever the distance to instability can be computed efficiently
3. Open problem: private high-dimensional learning without assumptions like incoherence and restricted strong convexity