From Differential Privacy to Machine Learning, and Back
Abhradeep Guha Thakurta, Yahoo Labs, Sunnyvale

Thesis: Differential privacy ⇒ generalizability. Stable learning ⇒ differential privacy.

Part I of this Talk
1. Towards a rigorous notion of statistical data privacy
2. Differential privacy: An overview
3. Generalization guarantee via differential privacy
4. Application: Follow-the-perturbed-leader

Need for a rigorous notion of privacy

Learning from Private Data
• Individuals contribute records d_1, d_2, …, d_{n−1}, d_n to a trusted learning algorithm 𝒜.
• The algorithm releases summary statistics to users, e.g., 1. classifiers, 2. clusters, 3. regression coefficients.
• An attacker can see whatever the users see, so the released statistics themselves must protect the individuals.

Learning from Private Data: whose data, analyzed by whom, to what end, and with what typical output?
• Government agency: census data, analyzed by the general public, e.g., for apportionment; typical output is summary statistics.
• Non-profit agency: MOOC data, analyzed by researchers to improve class participation; typical output is summary statistics and trends.
• For-profit agency: Yahoo! data, analyzed by researchers to improve sales; typical output is recommendations.

Learning from Private Data: two conflicting goals
1. Utility: release accurate information.
2. Privacy: protect the privacy of individual entries.
Balancing the tradeoff is a difficult problem:
1. Netflix prize database attack [NS08]
2. Facebook advertisement system attack [Korolova11]
3. Amazon recommendation system attack [CKNFS11]
Data privacy is an active area of research across computer science, economics, statistics, biology, the social sciences, and more.

Reconstruction attacks: a case of blatant non-privacy

Reconstruction Attacks: General Principle
• An analyst issues queries q_1, …, q_m against the data set and receives responses a_1, …, a_m.
• Show: answering a set of m queries fairly accurately allows recovering 99% of the data set.
• This violates any reasonable notion of privacy.

Linear Reconstruction Attack [DN03]
• Data set D of n records, viewed as a vector D ∈ {0,1}^n; each query is a vector q ∈ {0,1}^n with true response ⟨q, D⟩.
• Objective: output answers a_1, …, a_m such that for all i ∈ [m], |a_i − ⟨q_i, D⟩| ≤ α.
• Theorem: if α = o(√n) and m = Ω(n), then answering queries q_1, …, q_m drawn uniformly from {0,1}^n recovers 99% of the records in D with probability ≥ 1 − negl(n).

Proof sketch (recovery algorithm):
• Do an exhaustive search over candidates c ∈ {0,1}^n until no query disqualifies the candidate, i.e., for all i ∈ [m], |⟨q_i, c⟩ − a_i| ≤ α.
• There is also an efficient version of the recovery via linear programming.

Disqualifying lemma: if a candidate c disagrees with D on a constant fraction of coordinates, then a query q ∼_u {0,1}^n disqualifies c with probability ≥ 2/3.
• |⟨q, D⟩ − ⟨q, c⟩| is (symmetric) binomially distributed over the coordinates where c and D differ.
• Anti-concentration: with probability ≥ 2/3 this gap is Ω(√n), while by definition the released answers are within α = o(√n) of the true answers, so such a c fails the criterion.

Finishing the proof:
• Pr[no query disqualifies a fixed bad candidate c] ≤ 1/3^m, so Pr[∃ a bad candidate that survives all queries] ≤ 2^n/3^m ≤ negl(n) for m = Ω(n).
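To make the attack concrete, here is a minimal numpy/scipy sketch of the LP-based recovery described above. The code is not from the talk; the database size, query set, and error level α are illustrative choices.

```python
# Sketch of the Dinur-Nissim linear reconstruction attack via the LP
# relaxation: find a fractional candidate c in [0,1]^n consistent with all
# noisy answers, then round to {0,1}. Parameters are illustrative only.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n = 100                               # number of records (bits)
m = 10 * n                            # m = Omega(n) random subset-sum queries
alpha = 0.2 * np.sqrt(n)              # per-query error, o(sqrt(n))

D = rng.integers(0, 2, size=n)        # secret database in {0,1}^n
Q = rng.integers(0, 2, size=(m, n))   # random queries q_i in {0,1}^n
answers = Q @ D + rng.uniform(-alpha, alpha, size=m)   # noisy responses

# Feasibility LP:  answers - alpha <= Q c <= answers + alpha,  0 <= c <= 1
A_ub = np.vstack([Q, -Q])
b_ub = np.concatenate([answers + alpha, -(answers - alpha)])
res = linprog(c=np.zeros(n), A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, 1)] * n, method="highs")

recovered = (res.x > 0.5).astype(int)          # round the fractional solution
print("fraction of records recovered:",
      np.mean(recovered == D))                 # usually close to 1 here
```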
Other Reconstruction Attacks [DMT07, DY08, KS13]
• [DMT07] allows reconstruction even when the answers to around 20% of the queries are arbitrary.
• [DY08] improves [DMT07] from tolerating arbitrary answers on 20% of the queries to tolerating them on almost half of the queries.
• [KS13] extends reconstruction attacks beyond linear queries, e.g., to machine learning applications.

Part I of this Talk — next up: 2. Differential privacy: An overview

Differential privacy: an overview

What Does It Mean to Be Private?
What we cannot hope to achieve [DN10]: "the adversary learns very little about an individual from the output."
• Example: a Martian scientist wants to know whether I have two left feet.
• Prior belief: I have two left feet. A survey then shows that every human being has one left and one right foot.
• Posterior belief: I have one left foot and one right foot.
• Notice: this inference does not depend on whether I participated in the survey at all, so "learning nothing about an individual" is not an achievable privacy goal.

What we can hope to achieve [DMNS06, DKMMN06]: the adversary learns essentially the same thing irrespective of your presence or absence in the data set.
• Run the randomized algorithm A (with its own random coins) on a data set D containing your record to get A(D), and on a data set D′ with your record replaced to get A(D′).
• D and D′ are called neighboring data sets.
• Require: neighboring data sets induce close distributions on the outputs.

Differential Privacy [DMNS06, DKMMN06]
Definition: a randomized algorithm A is (ε, δ)-differentially private if
• for all data sets D and D′ that differ in one element, and
• for all sets of answers S,
Pr[A(D) ∈ S] ≤ e^ε · Pr[A(D′) ∈ S] + δ.

Semantics of Differential Privacy
• Differential privacy is a condition on the algorithm, not on a particular output.
• The guarantee is meaningful in the presence of any auxiliary information.
• Typical privacy parameters: ε ≈ 0.1 and δ = 1/n^{log n}, where n = # of data samples.
• Composition: the ε's and δ's add up over multiple executions.

Few tools to design differentially private algorithms

Laplace Mechanism [DMNS06]
• Data set D = {d_1, …, d_n} and a function f : U* → ℝ^p on D.
• Global ℓ₁ sensitivity: GS(f, 1) = max_{D, D′ : d_H(D, D′) = 1} ||f(D) − f(D′)||₁.
1. b: random vector with each coordinate sampled i.i.d. from Lap(GS(f, 1)/ε).
2. Output f(D) + b.
Theorem (privacy): the algorithm is ε-differentially private.

Gaussian Mechanism [DKMMN06]
• Global ℓ₂ sensitivity: GS(f, 2) = max_{D, D′ : d_H(D, D′) = 1} ||f(D) − f(D′)||₂.
1. b: random vector with each coordinate sampled i.i.d. from N(0, σ²), with σ = O(GS(f, 2) · √(log(1/δ))/ε).
2. Output f(D) + b.
Theorem (privacy): the algorithm is (ε, δ)-differentially private.
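For concreteness, here is a minimal numpy sketch of these two mechanisms (my own illustration, not code from the talk). The sensitivities gs1 and gs2 must be supplied by the analyst, and the Gaussian noise scale uses the standard √(2 ln(1.25/δ))/ε calibration in place of the slide's O(·) expression.

```python
# Laplace and Gaussian mechanisms for a generic vector-valued query f(D).
import numpy as np

rng = np.random.default_rng()

def laplace_mechanism(f_of_D, gs1, eps):
    """eps-DP: add Lap(gs1/eps) noise independently to each coordinate."""
    f_of_D = np.asarray(f_of_D, dtype=float)
    return f_of_D + rng.laplace(scale=gs1 / eps, size=f_of_D.shape)

def gaussian_mechanism(f_of_D, gs2, eps, delta):
    """(eps, delta)-DP with sigma = gs2 * sqrt(2 ln(1.25/delta)) / eps."""
    f_of_D = np.asarray(f_of_D, dtype=float)
    sigma = gs2 * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return f_of_D + rng.normal(scale=sigma, size=f_of_D.shape)

# Example: the histogram query discussed next has L1 sensitivity 2 when
# neighboring data sets are defined by replacing one record.
counts = np.array([40, 25, 20, 15])               # red, yellow, green, blue
noisy_counts = laplace_mechanism(counts, gs1=2.0, eps=0.1)
```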
Laplace Mechanism in Action: Computing a Histogram
• Data domain U = {u_1, …, u_k} (e.g., {red, yellow, green, blue}).
• Data set D ∈ U^n: d_1, …, d_n, i.e., n samples from the domain U.
• Histogram representation: H(D) is the vector of counts for the k bins.
• For all neighbors D and D′ (one record replaced), ||H(D) − H(D′)||₁ ≤ 2: one bin loses a count and another gains one, so the ℓ₁ sensitivity of H is 2.
Algorithm:
1. b: random vector with each coordinate sampled i.i.d. from Lap(2/ε).
2. Output H(D) + b.

Report-Noisy-Max (a.k.a. the Exponential Mechanism) [MT07, BLST10]
• Set of candidate outputs S, and a score function q : S × U^n → ℝ, where U^n is the domain of data sets.
• Objective: output an s ∈ S (approximately) maximizing q(s, D).
• Global sensitivity: GS(q) = max_{s ∈ S, neighboring D, D′} |q(s, D) − q(s, D′)|.
Algorithm:
1. For every s ∈ S, compute ẑ_s ← q(s, D) + Lap(GS(q)/ε).
2. Output the s with the highest value of ẑ_s.
Theorem (privacy): the algorithm is 2ε-differentially private.
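Below is a minimal sketch of report-noisy-max (my own illustration, not the talk's code). The score function and its sensitivity are supplied by the analyst, and the toy data set is made up.

```python
# Report-noisy-max: add Laplace noise to each candidate's score and
# release the arg max (2*eps-DP per the slide above).
import numpy as np

rng = np.random.default_rng()

def report_noisy_max(candidates, score, data, gs_q, eps):
    """Return the candidate with the largest Laplace-noised score."""
    noisy = [score(s, data) + rng.laplace(scale=gs_q / eps) for s in candidates]
    return candidates[int(np.argmax(noisy))]

# Toy usage: privately pick the most frequent color in a data set.
data = ["red"] * 40 + ["yellow"] * 25 + ["green"] * 20 + ["blue"] * 15
colors = ["red", "yellow", "green", "blue"]
freq = lambda s, D: sum(1 for d in D if d == s)   # sensitivity 1 under replacement
winner = report_noisy_max(colors, freq, data, gs_q=1.0, eps=0.1)
```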
Local Sensitivity [NRS07]
• Data set D = {d_1, …, d_n} and a function f : U* → ℝ^p on D.
• Local sensitivity: LS(f, D, 1) = max_{D′ : d_H(D, D′) = 1} ||f(D) − f(D′)||₁.
• Tempting algorithm: 1. sample b from Lap(LS(f, D, 1)/ε)^p; 2. output f(D) + b.
• This is not differentially private: the noise magnitude itself depends on D and can reveal information.
• Part II (of the full talk): we show that local sensitivity is nevertheless a useful tool.

Composition Theorems [DMNS06, DL09, DRV10]
• Let A_1, …, A_k each be an (ε, δ)-differentially private algorithm run on the same data set.
• Weak composition: A_1 ∘ ⋯ ∘ A_k is (kε, kδ)-differentially private.
• Strong composition: A_1 ∘ ⋯ ∘ A_k is approximately (√k · ε, kδ)-differentially private.

Part I of this Talk — next up: 3. Generalization guarantee via differential privacy

Convex risk minimization

Convex Empirical Risk Minimization (ERM): An Example
• Linear classifiers in ℝ^p. Domain: feature vector x ∈ ℝ^p with label y ∈ {yellow, red}, encoded as {+1, −1}.
• Data set D = {(x_1, y_1), …, (x_n, y_n)} drawn i.i.d. from a distribution τ over (x, y).
• Goal: find θ in a convex set C of constant diameter that classifies points (x, y) ∼ τ.
• Minimize the risk: θ* = argmin_{θ ∈ C} E_{(x,y)∼τ}[−y⟨x, θ⟩].
• ERM: use the empirical loss ℒ(θ; D) = (1/n) Σ_{i=1}^n −y_i⟨x_i, θ⟩ to approximate θ*.

Empirical Risk Minimization (ERM) Setup
• Convex loss function ℓ : C × U → ℝ and data set D = {d_1, …, d_n}; empirical loss ℒ(θ; D) = (1/n) Σ_{i=1}^n ℓ(θ; d_i).
• Regularized ERM: θ̂ = argmin_{θ ∈ C} (1/n) Σ_{i=1}^n ℓ(θ; d_i) + r(θ), where the regularizer r(θ) is used to stop overfitting.
• Objective: minimize the excess risk E_{d∼τ}[ℓ(θ̂; d)] − min_{θ ∈ C} E_{d∼τ}[ℓ(θ; d)].

Differential privacy yields generalizability

Privacy Implies Generalizability: A Bird's-Eye View
Differential privacy ⇒ stability (robustness to outliers) ⇒ generalization (low excess risk).

Prediction Stability and Generalizability [SSSS09]
• α-prediction stability: for any pair of neighboring data sets D, D′ and every d ∈ U, |E_𝒜[ℓ(θ(D); d)] − E_𝒜[ℓ(θ(D′); d)]| ≤ α, where the expectation is over the coins of the algorithm 𝒜.
• Theorem (stability ⇒ generalization): if the excess empirical risk of 𝒜 is bounded, E_𝒜[ℒ(θ(D); D)] − min_θ ℒ(θ; D) ≤ A, and the prediction stability of 𝒜 is α, then E_{d∼τ}[ℓ(θ(D); d)] − min_{θ ∈ C} E_{d∼τ}[ℓ(θ; d)] ≤ A + α.
• Regularization implies stability [SSSS09]: if ||∇ℓ(θ; d)||₂ ≤ L, the Δ-regularized ERM θ(D) = argmin_{θ ∈ C} (1/n) Σ_{i=1}^n ℓ(θ; d_i) + (Δ/2)||θ||₂² has prediction stability α = 2L²/(Δn).

Prediction Stability and Generalizability [Bassily, Smith, T. 2014]
• Theorem: if 𝒜 is (ε, δ)-differentially private and ℓ is L-Lipschitz over the set C of diameter ||C||₂, then α = εL||C||₂ + δL||C||₂.
• Advantage over ℓ₂-regularization: this route to stability does not rely on convexity of ℓ.
• Theorem (Part II of the talk): one can achieve excess empirical risk A = O(L√p/(εn)) with an (ε, δ)-differentially private algorithm, i.e., E_𝒜[ℒ(θ(D); D)] − min_θ ℒ(θ; D) = O(L√p/(εn)), while the prediction stability is roughly εL (recall C has constant diameter).
• Combining the two and setting ε = (√p/n)^{1/2} gives excess risk E_{d∼τ}[ℓ(θ; d)] − min_{θ ∈ C} E_{d∼τ}[ℓ(θ; d)] = O(L·p^{0.25}/√n).
• Uniform convergence would have resulted in a √p dependence.
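The following is a minimal sketch (assumptions mine) of one standard way to obtain an (ε, δ)-differentially private ERM algorithm: noisy projected gradient descent on the empirical loss. It is meant only to make the "private ⇒ stable ⇒ generalizing" pipeline concrete; it is not necessarily the Part II algorithm of the talk, and the noise calibration is a rough Gaussian-mechanism-plus-strong-composition heuristic with constants omitted.

```python
# Noisy projected gradient descent for (roughly) (eps, delta)-DP ERM.
import numpy as np

def noisy_gradient_descent(grad_loss, d_points, dim, L, eps, delta,
                           steps=100, radius=1.0, eta=0.1, seed=0):
    """grad_loss(theta, d) must return a gradient with L2 norm <= L."""
    rng = np.random.default_rng(seed)
    n = len(d_points)
    # Per-step Gaussian noise so the whole run is roughly (eps, delta)-DP
    # under strong composition (constants omitted for readability).
    sigma = (L / n) * np.sqrt(steps * np.log(1.0 / delta)) / eps
    theta = np.zeros(dim)
    for _ in range(steps):
        g = np.mean([grad_loss(theta, d) for d in d_points], axis=0)
        theta = theta - eta * (g + rng.normal(scale=sigma, size=dim))
        norm = np.linalg.norm(theta)
        if norm > radius:                      # project back onto C
            theta = theta * (radius / norm)
    return theta

# Toy usage on the linear-loss classifier from the ERM example:
# loss(theta; (x, y)) = -y * <x, theta>, whose gradient is -y * x.
rng = np.random.default_rng(1)
p = 20
X = rng.normal(size=(500, p))
w_true = rng.normal(size=p); w_true /= np.linalg.norm(w_true)
y = np.sign(X @ w_true)
data = list(zip(X, y))
L_lip = float(np.max(np.linalg.norm(X, axis=1)))   # gradient norm bound
theta_priv = noisy_gradient_descent(lambda t, d: -d[1] * d[0],
                                    data, dim=p, L=L_lip,
                                    eps=1.0, delta=1e-5)
```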
Proof sketches…

Stability Implies Generalizability: Proof Sketch
Theorem (restated): if the excess empirical risk of 𝒜 is at most A and its prediction stability is α, then E_{d∼τ}[ℓ(θ(D); d)] − min_{θ ∈ C} E_{d∼τ}[ℓ(θ; d)] ≤ A + α.
• Define the true risk R(θ) = E_{d∼τ}[ℓ(θ; d)]. Draw the data set D ∼ τ^n and let D^i be D with its i-th sample replaced by a fresh draw from τ; the true risk does not change on resampling a data point.
• Claim: for all i ∈ [n], E[R(θ(D))] = E[R(θ(D^i))], since D and D^i are identically distributed.
• Claim: E[R(θ(D))] = (1/n) Σ_{i=1}^n E[ℓ(θ(D^i); d_i)]; this follows from the fact that d_i is independent of D^i.
• Therefore E[R(θ(D)) − ℒ(θ(D); D)] = (1/n) Σ_{i=1}^n E[ℓ(θ(D^i); d_i) − ℓ(θ(D); d_i)] ≤ α, where the last step follows from prediction stability (D and D^i are neighbors).
• Let θ* be the true risk minimizer. Then E[R(θ*)] = E_{D∼τ^n}[ℒ(θ*; D)] ≥ E[ℒ(θ(D); D)] − A, using the excess empirical risk bound.
• Combining the two displays gives E[R(θ(D))] ≤ E[R(θ*)] + A + α, which is the theorem.

α-Prediction Stability via Differential Privacy: Proof Sketch
Theorem (restated): if 𝒜 is (ε, δ)-differentially private, then α = εL||C||₂ + δL||C||₂.
• Fix some element θ* ∈ C and consider the shifted loss ℓ(·; d) − ℓ(θ*; d), which is bounded in magnitude by L||C||₂.
• By the definition of differential privacy (using e^ε ≈ 1 + ε for small ε), for every d ∈ U and neighboring D, D′:
  (E[ℓ(θ(D); d)] − E[ℓ(θ*; d)]) − (E[ℓ(θ(D′); d)] − E[ℓ(θ*; d)]) ≤ ε·(E[ℓ(θ(D′); d)] − E[ℓ(θ*; d)]) + δL||C||₂ ≤ εL||C||₂ + δL||C||₂.

Part I of this Talk — next up: 4. Application: Follow-the-perturbed-leader

Online learning with linear costs

Online Learning Setup
• In round t, Player 1 picks x_t in the ℓ₁-ball and Player 2 picks a cost vector c_t in the ℓ∞-ball; Player 1 pays ⟨x_t, c_t⟩.
• Player 1 minimizes the regret R(c_1, …, c_T) = Σ_{t=1}^T ⟨x_t, c_t⟩ − min_{x ∈ L₁} Σ_{t=1}^T ⟨x, c_t⟩.
• Oblivious adversary: c_1, …, c_T are fixed ahead of time.

Follow-the-Leader and Follow-the-Perturbed-Leader
Follow-the-leader (FTL):
• x_{t+1} ← argmin_{x ∈ L₁} Σ_{τ=1}^t ⟨c_τ, x⟩.
• Theorem: FTL has regret Ω(T). Proof sketch: generate the costs c_1, …, c_T i.i.d. uniformly at random from {0,1}^p.

Follow-the-perturbed-leader (FTPL) [KV05]:
• x_{t+1} ← argmin_{x ∈ L₁} Σ_{τ=1}^t ⟨c_τ, x⟩ + ⟨b_t, x⟩, where b_t ∼ Lap(1/ε)^p.
• Theorem: FTPL with ε ≈ log(T)/√T has regret O(log(T)·√T).
• Theorem: FTPL is (ε, δ)-differentially private with respect to the cost sequence; this follows from 1. the report-noisy-max / exponential-mechanism argument and 2. the strong composition theorem.
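Here is a minimal sketch of FTPL with Laplace perturbations on this ℓ₁-ball versus ℓ∞-ball game, alongside plain FTL for comparison. It is my own illustration: the dimension, horizon, and cost distribution are made up, and only the Lap(1/ε) perturbation scale comes from the slide.

```python
# Follow-the-(perturbed-)leader on the l1-ball with l_inf-bounded costs.
import numpy as np

rng = np.random.default_rng(0)
p, T = 5, 2000
eps = np.log(T) / np.sqrt(T)
costs = rng.uniform(-1.0, 1.0, size=(T, p))      # oblivious cost sequence

def best_in_l1_ball(v):
    """argmin over the l1 ball of <x, v>: all mass on the largest |v_j|."""
    j = int(np.argmax(np.abs(v)))
    x = np.zeros_like(v)
    x[j] = -np.sign(v[j])
    return x

def play(perturbed):
    total, cum = 0.0, np.zeros(p)
    for t in range(T):
        noise = rng.laplace(scale=1.0 / eps, size=p) if perturbed else 0.0
        x = best_in_l1_ball(cum + noise)          # (perturbed) leader so far
        total += x @ costs[t]
        cum += costs[t]
    return total

comparator = best_in_l1_ball(costs.sum(axis=0)) @ costs.sum(axis=0)
print("FTL  regret:", play(perturbed=False) - comparator)
print("FTPL regret:", play(perturbed=True) - comparator)
```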
Regret Analysis of FTPL via Differential Privacy
Theorem (restated): FTPL with ε ≈ log(T)/√T has regret O(log(T)·√T).
Proof sketch:
• Be-the-leader (BTL), which is allowed to peek one step ahead: x_{t+1} ← argmin_{x ∈ L₁} Σ_{τ=1}^{t+1} ⟨c_τ, x⟩. Claim: BTL has zero regret.
• Be-the-perturbed-leader (BTPL): x_{t+1} ← argmin_{x ∈ L₁} Σ_{τ=1}^{t+1} ⟨c_τ, x⟩ + ⟨b_t, x⟩.
• Claim: by (ε, δ)-differential privacy, E[cost of FTPL] ≤ e^ε · E[cost of BTPL] + δ; this is the prediction-stability consequence of differential privacy from the previous section.
• Claim: E[cost of BTPL] ≤ E[cost of BTL] + O(log(T)/ε).
⇒ Regret(FTPL) = O(log(T)/ε + εT), ignoring the dependence on p and δ for simplicity; optimizing over ε gives the stated regret bound.

Part I of this Talk (recap)
1. Towards a rigorous notion of statistical data privacy
2. Differential privacy: An overview
3. Generalization guarantee via differential privacy
4. Application: Follow-the-perturbed-leader

Thesis: Differential privacy ⇒ generalizability. Stable learning ⇒ differential privacy.

Concluding Remarks
• Differential privacy is a cryptographically strong notion of data privacy.
• Differential privacy provides a very strong form of stability, i.e., stability of the measure induced on the output space.
• Stability obtained from differential privacy is composable, i.e., any post-processing of the output is also stable.

References
[DMNS06] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. 2006.
[DKMMN06] Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Kobbi Nissim. Our data, ourselves: Privacy via distributed noise generation. 2006.
[MT07] Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. 2007.
[BLST10] Raghav Bhaskar, Srivatsan Laxman, Adam Smith, and Abhradeep Thakurta. Discovering frequent patterns in sensitive data. 2010.
[BST14] Raef Bassily, Adam Smith, and Abhradeep Thakurta. Private empirical risk minimization: Efficient algorithms and tight error bounds. 2014.
[DN10] Cynthia Dwork and Moni Naor. On the difficulties of disclosure prevention in statistical databases, or the case for differential privacy. 2010.
[DN03] Irit Dinur and Kobbi Nissim. Revealing information while preserving privacy. 2003.