From Differential Privacy to
Machine Learning, and Back
Abhradeep Guha Thakurta
Yahoo Labs, Sunnyvale
Thesis: Differential privacy ⇒ generalizability
Stable learning ⇒ differential privacy
Part I of this Talk
1. Towards a rigorous notion of statistical data privacy
2. Differential privacy: An overview
3. Generalization guarantee via differential privacy
4. Application: Follow-the-perturbed-leader
Need for a rigorous notion of privacy
Learning from Private Data
[Figure: individuals contribute records d_1, d_2, …, d_{n-1}, d_n to a trusted learning algorithm ℳ, which releases summary statistics (1. classifiers, 2. clusters, 3. regression coefficients) to users; the released output may also be seen by an attacker.]
Learning from Private Data
Whose data? By whom? To what end? Typical output?
• Government agency: Census, general public, apportionment, summary stats
• Non-profit agency: MOOCs, researcher, improve class participation, summary stats and trends
• For-profit agency: Yahoo!, researcher, improve sales, recommendations
Learning from Private Data
Two conflicting goals:
1. Utility: Release accurate information
2. Privacy: Protect the privacy of individual entries
Balancing the tradeoff is a difficult problem:
1. Netflix prize database attack [NS08]
2. Facebook advertisement system attack [Korolova11]
3. Amazon recommendation system attack [CKNFS11]
Data privacy is an active area of research:
• Computer science, economics, statistics, biology, social sciences, …
Reconstruction attacks: A case of blatant non-privacy
Reconstruction Attacks: General Principle
Data set of n records: D ∈ {0,1}^n
An analyst issues queries f_1, …, f_k and receives responses a_1, …, a_k
Show: answering a set of k queries f_1, …, f_k fairly accurately allows recovering 99% of the data set
⇒ violates any reasonable notion of privacy
Linear Reconstruction Attack [DN03]
Data set D ∈ {0,1}^n; set of queries f_1, …, f_k ∈ {0,1}^n
True response: a_i = ⟨f_i, D⟩
Objective: Output ã_1, …, ã_k such that ∀i ∈ [k], |ã_i − a_i| ≤ α
Theorem: If α = o(√n) and k = Ω(n), then f_1, …, f_k ∼_u {0,1}^n recover 99% of the records in D w.p. ≥ 1 − negl(n)
Linear Reconstruction Attack [DN03]
Theorem: If α = o(√n) and k = Ω(n), then f_1, …, f_k ∼_u {0,1}^n recover 99% of the records in D w.p. ≥ 1 − negl(n)
(There is also an efficient version of the attack via linear programming.)
Proof sketch (recovery algorithm):
• Do an exhaustive search over X ∈ {0,1}^n until the following is true:
• For all i ∈ [k], |⟨X, f_i⟩ − ã_i| ≤ α   (a query violating this disqualifies X)
Linear Reconstruction Attack [DN03]
• Do an exhaustive search over X ∈ {0,1}^n until the following is true:
• For all i ∈ [k], |⟨X, f_i⟩ − ã_i| ≤ α   (a query violating this disqualifies X)
Disqualifying lemma: A query f ∼_u {0,1}^n disqualifies a candidate X that is far from D w.p. ≥ 2/3
• |⟨f, D⟩ − ⟨f, X⟩| is (symmetric) binomially distributed
• Anti-concentration: w.p. ≥ 2/3 it is Ω(√n)
• By definition, |ã − ⟨f, D⟩| ≤ α = o(√n), so w.p. ≥ 2/3 we get |⟨f, X⟩ − ã| > α, and f disqualifies X
Linear Reconstruction Attack [DN03]
Theorem: If α = o(√n) and k = Ω(n), then f_1, …, f_k ∼_u {0,1}^n recover 99% of the records in D w.p. ≥ 1 − negl(n)
Proof sketch:
Disqualifying lemma: A query f ∼_u {0,1}^n disqualifies a candidate X that is far from D w.p. ≥ 2/3
• Pr[no f_i eliminates a fixed bad X] ≤ (1/3)^k
⇒ Pr[∃ a bad X not eliminated] ≤ 2^n/3^k ≤ negl(n) for k = Ω(n)
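As a rough illustration of the linear-programming variant of the attack mentioned above, here is a minimal sketch. The data set size, number of queries, error level α, and the use of scipy's linprog in place of the exhaustive search are all illustrative choices, not details from the talk.

```python
# Sketch of a [DN03]-style reconstruction attack via an LP relaxation.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, k = 50, 400                        # k = Omega(n) random linear queries
D = rng.integers(0, 2, size=n)        # the secret data set: n bits
F = rng.integers(0, 2, size=(k, n))   # queries f_1, ..., f_k ~ uniform over {0,1}^n
alpha = np.sqrt(n) / 8                # answers released with o(sqrt(n)) error
answers = F @ D + rng.uniform(-alpha, alpha, size=k)

# LP: find x in [0,1]^n with |<f_i, x> - a_i| <= alpha for every query,
# then round the fractional solution to bits.
A_ub = np.vstack([F, -F]).astype(float)
b_ub = np.concatenate([answers + alpha, -(answers - alpha)])
res = linprog(c=np.zeros(n), A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, 1)] * n, method="highs")
X = np.round(res.x).astype(int)
print("fraction of records recovered:", np.mean(X == D))
```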
Other Reconstruction Attacks [DMT07,DY08,KS13]
• [DMT07] allows reconstruction even when answers to around 20%
of the queries are arbitrary
• [DY08] improves [DMT07] from arbitrarily answering 20% of the
queries to arbitrarily answering almost 1/2 the queries
• [KS13] extends reconstruction attacks beyond linear queries,
e.g., machine learning applications
Part I of this Talk
1. Towards a rigorous notion of statistical data privacy
→ 2. Differential privacy: An overview
3. Generalization guarantee via differential privacy
4. Application: Follow-the-perturbed-leader
Differential privacy: An overview
What Does It Mean to Be Private?
What we cannot hope to achieve [DN10]:
• Adversary learns very little about an individual from the output
Example: A Martian scientist discovers that I have one left foot and one right foot
• Prior belief: [the speaker] has two left feet
• The survey shows that every human being has one left foot and one right foot
• Posterior belief: [the speaker] has one left foot and one right foot
• Notice: this inference does not depend on [the speaker] being in the survey
What Does It Mean to Be Private?
What we can hope to achieve [DMNS06, DKMMN06]:
• Adversary learns essentially the same thing irrespective of your presence or absence in the data set
[Figure: data set D (containing record d_1) is fed to algorithm A with random coins, producing A(D); the neighboring data set D′ (with d_1 removed or replaced) produces A(D′).]
• D and D′ are called neighboring data sets
• Require: neighboring data sets induce close distributions on outputs
Differential Privacy [DMNS06, DKMMN06]
Definition: A randomized algorithm A is (ε, δ)-differentially private if
• for all data sets D and D′ that differ in one element
• for all sets of answers S
Pr[A(D) ∈ S] ≤ e^ε · Pr[A(D′) ∈ S] + δ
Semantics of Differential Privacy
• Differential privacy is a condition on the algorithm
• Guarantee is meaningful in the presence of any auxiliary information
• Typically, think of privacy parameters ε ≈ 0.1 and δ = 1/n^{log n}, where n = # of data samples
• Composition: ε's and δ's add up over multiple executions
A few tools for designing differentially private algorithms
Laplace Mechanism [DMNS06]
Data set D = {d_1, …, d_n}; let f: U* → ℝ^p be a function on D
Global sensitivity: GS(f, 1) = max over neighbors D, D′ (d_H(D, D′) = 1) of ||f(D) − f(D′)||_1
1. E: random variable sampled from Lap(GS(f, 1)/ε)^p
2. Output f(D) + E
Theorem (Privacy): The algorithm is ε-differentially private
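A minimal sketch of this mechanism, assuming the ℓ1 global sensitivity of f is known; the query below (a sum of records in [0, 1], so GS(f, 1) = 1) and all names are illustrative.

```python
# Sketch of the Laplace mechanism: f(D) plus coordinate-wise Lap(GS/eps) noise.
import numpy as np

def laplace_mechanism(data, f, gs_l1, eps, rng=np.random.default_rng()):
    value = np.atleast_1d(f(data)).astype(float)
    noise = rng.laplace(scale=gs_l1 / eps, size=value.shape)
    return value + noise

data = np.random.default_rng(1).uniform(size=1000)   # records in [0, 1]
print(laplace_mechanism(data, np.sum, gs_l1=1.0, eps=0.1))
```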
Gaussian Mechanism [DKMMN06]
Data set D = {d_1, …, d_n}; let f: U* → ℝ^p be a function on D
Global sensitivity: GS(f, 2) = max over neighbors D, D′ (d_H(D, D′) = 1) of ||f(D) − f(D′)||_2
1. E: random variable sampled from 𝒩(0, 𝕀_p · (GS(f, 2)·√(ln(1/δ))/ε)²)
2. Output f(D) + E
Theorem (Privacy): The algorithm is (ε, δ)-differentially private
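A matching sketch for the Gaussian mechanism; the calibration σ = GS(f, 2)·√(2 ln(1.25/δ))/ε used below is one standard choice and its constants may differ from the slide's. The mean query and parameters are illustrative.

```python
# Sketch of the Gaussian mechanism: f(D) plus N(0, sigma^2 I) noise.
import numpy as np

def gaussian_mechanism(data, f, gs_l2, eps, delta, rng=np.random.default_rng()):
    value = np.atleast_1d(f(data)).astype(float)
    sigma = gs_l2 * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return value + rng.normal(scale=sigma, size=value.shape)

data = np.random.default_rng(2).uniform(size=1000)   # records in [0, 1]
print(gaussian_mechanism(data, np.mean, gs_l2=1.0 / len(data), eps=0.5, delta=1e-6))
```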
Laplace Mechanism in Action: Computing Histograms
• Data domain 𝒟 = {u_1, …, u_p} (e.g., {red, yellow, green, blue})
• Data set D ∈ 𝒟^n: d_1, …, d_n, n samples from the domain 𝒟
• Histogram representation H(D): vector of counts for the p bins
[Figure: bar chart of the count in each bin u_1, u_2, …, u_p]
For all neighbors D and D′: ||H(D) − H(D′)||_1 = Σ_{i=1}^p |H(D)[u_i] − H(D′)[u_i]| = 2
Algorithm:
1. E: random variable sampled from Lap(2/ε)^p
2. Output H(D) + E
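A minimal sketch of this private histogram, with per-bin Laplace noise of scale 2/ε; the domain and data below are illustrative.

```python
# Sketch of a differentially private histogram (L1 sensitivity 2 under
# replacement of one record), using the Laplace mechanism per bin.
import numpy as np

def private_histogram(data, domain, eps, rng=np.random.default_rng()):
    counts = np.array([np.sum(data == u) for u in domain], dtype=float)
    return counts + rng.laplace(scale=2.0 / eps, size=len(domain))

domain = np.array(["red", "yellow", "green", "blue"])
data = np.random.default_rng(3).choice(domain, size=500)
noisy = private_histogram(data, domain, eps=0.5)
print(dict(zip(domain, np.round(noisy, 1))))
```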
Report-Noisy-Max (a.k.a. Exponential Mechanism) [MT07, BLST10]
Set of candidate outputs S; score function f: S × 𝒟^n → ℝ (where 𝒟^n is the domain of data sets)
Objective: Output s ∈ S maximizing f(s, D)
Global sensitivity: GS(f) = max over s ∈ S and neighbors D, D′ of |f(s, D) − f(s, D′)|
1. For each s ∈ S: a_s ← f(s, D) + Lap(GS(f)/ε)
2. Output the s with the highest value of a_s
Theorem (Privacy): The algorithm is 2ε-differentially private
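A minimal sketch of report-noisy-max as stated above; the candidate set and score function (the count of each item in the data, which changes by at most 1 on a neighboring data set) are illustrative.

```python
# Sketch of report-noisy-max: perturb each candidate's score with Laplace
# noise of scale GS(f)/eps and release the argmax.
import numpy as np

def report_noisy_max(candidates, score_fn, data, gs, eps,
                     rng=np.random.default_rng()):
    noisy = [score_fn(s, data) + rng.laplace(scale=gs / eps) for s in candidates]
    return candidates[int(np.argmax(noisy))]

data = np.random.default_rng(4).choice(["a", "b", "c", "d"], size=200,
                                        p=[0.4, 0.3, 0.2, 0.1])
winner = report_noisy_max(["a", "b", "c", "d"],
                          lambda s, d: np.sum(d == s), data, gs=1.0, eps=0.5)
print("noisy-max item:", winner)
```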
Local Sensitivity [NRS07]
Data set D = {d_1, …, d_n}; let f: U* → ℝ^p be a function on D
Local sensitivity: LS(f, D, 1) = max over D′ with d_H(D, D′) = 1 of ||f(D) − f(D′)||_1
1. E: random variable sampled from Lap(LS(f, D, 1)/ε)^p
2. Output f(D) + E
⇒ Not differentially private (the noise scale now depends on the data itself)
Part II: We show that local sensitivity is a useful tool
Composition Theorems [DMNS06, DL09, DRV10]
(ε, δ)-diff. private algorithms 𝒜_1, …, 𝒜_k are run on the same data set
Weak composition: 𝒜_1 ∘ ⋯ ∘ 𝒜_k is (kε, kδ)-diff. private
Strong composition: 𝒜_1 ∘ ⋯ ∘ 𝒜_k is approximately (√k·ε, kδ)-diff. private
Part I of this Talk
1. Towards a rigorous notion of statistical data privacy
2. Differential privacy: An overview
→ 3. Generalization guarantee via differential privacy
4. Application: Follow-the-perturbed-leader
Convex risk minimization
Convex Empirical Risk Minimization (ERM): An Example
Linear classifiers in ℝ^p:
• Domain: feature vector x ∈ ℝ^p, label y ∈ {+1, −1} (yellow / red in the figure)
• Distribution τ over (x, y); data set D = {(x_1, y_1), …, (x_n, y_n)} ∼_i.i.d. τ
• Goal: find θ in a convex set 𝒞 of constant diameter that classifies (x, y) ∼ τ
• Minimize risk: θ_R = argmin_{θ ∈ 𝒞} 𝔼_{(x, y) ∼ τ}[y⟨x, θ⟩]
• ERM: use ℒ(θ; D) = (1/n) Σ_{i=1}^n y_i⟨x_i, θ⟩ to approximate θ_R
Empirical Risk Minimization (ERM) Setup
Convex loss function ℓ: 𝒞 × 𝒟 → ℝ; data set D = {d_1, …, d_n}
Loss function: ℒ(θ; D) = (1/n) Σ_{i=1}^n ℓ(θ; d_i)
Regularized ERM: θ̂ = argmin_{θ ∈ 𝒞} (1/n) Σ_{i=1}^n ℓ(θ; d_i) + r(θ), where the regularizer r(θ) is used to stop overfitting
Objective: minimize the excess risk 𝔼_{d∼τ}[ℓ(θ̂; d)] − min_{θ ∈ 𝒞} 𝔼_{d∼τ}[ℓ(θ; d)]
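A minimal sketch of regularized ERM by projected gradient descent; the logistic loss, the ℓ2 regularizer, the step size, and the ℓ2-ball standing in for the constraint set 𝒞 are all illustrative choices, not details from the talk.

```python
# Sketch of regularized ERM: minimize (1/n) sum_i loss(theta; d_i) + (reg/2)||theta||^2
# over an L2 ball, by projected gradient descent.
import numpy as np

def regularized_erm(X, y, reg=0.1, radius=1.0, steps=500, lr=0.1):
    n, p = X.shape
    theta = np.zeros(p)
    for _ in range(steps):
        margins = y * (X @ theta)
        # gradient of the average logistic loss log(1 + exp(-y <x, theta>))
        grad = -(X * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
        grad += reg * theta                      # gradient of the regularizer
        theta -= lr * grad
        norm = np.linalg.norm(theta)             # project back onto the L2 ball
        if norm > radius:
            theta *= radius / norm
    return theta

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 5))
y = np.sign(X @ np.array([1.0, -1.0, 0.5, 0.0, 0.0]) + 0.1 * rng.normal(size=200))
print(regularized_erm(X, y))
```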
Differential privacy yields generalizability
Privacy implies Generalizability: A Bird's Eye View
Differential privacy ⇒ stability (robustness to outliers) ⇒ generalization (low excess risk)
Prediction Stability and Generalizability [SSSS09]
α-prediction stability: for any pair of neighboring data sets D, D′ and all d ∈ 𝒟,
|𝔼_𝒜[ℓ(𝒜(D); d)] − 𝔼_𝒜[ℓ(𝒜(D′); d)]| ≤ α
Theorem: Suppose the excess empirical risk of algorithm 𝒜 is bounded by A, i.e.,
𝔼_𝒜[ℒ(𝒜(D); D)] − min_θ ℒ(θ; D) ≤ A,
and the prediction stability of 𝒜 is α. Then
𝔼_{d∼τ}[ℓ(𝒜(D); d)] − min_{θ ∈ 𝒞} 𝔼_{d∼τ}[ℓ(θ; d)] ≤ A + α
Prediction Stability and Generalizability [SSSS09]
α-prediction stability: for any pair of neighboring data sets D, D′ and all d ∈ 𝒟,
|𝔼_𝒜[ℓ(𝒜(D); d)] − 𝔼_𝒜[ℓ(𝒜(D′); d)]| ≤ α
Regularization implies stability:
𝒜(D): θ̂ = argmin_{θ ∈ 𝒞} (1/n) Σ_{i=1}^n ℓ(θ; d_i) + (Δ/2)||θ||_2^2
Theorem: If ||∇ℓ(θ; d)||_2 ≤ L, then α = 2L²/(Δn)
Prediction Stability and Generalizability [Bassily, Smith, T. 2014]
α-prediction stability: for any pair of neighboring data sets D, D′ and all d ∈ 𝒟,
|𝔼_𝒜[ℓ(𝒜(D); d)] − 𝔼_𝒜[ℓ(𝒜(D′); d)]| ≤ α
Theorem: If 𝒜 is (ε, δ)-differentially private, then α = εL + δL₂
Advantage over ℓ2-regularization: stability does not rely on convexity of ℓ
Prediction Stability and Generalizability [Bassily, Smith, T. 2014]
Theorem: Suppose the excess empirical risk of algorithm 𝒜 is bounded by A, i.e.,
𝔼_𝒜[ℒ(𝒜(D); D)] − min_θ ℒ(θ; D) ≤ A,
and the prediction stability of 𝒜 is α. Then
𝔼_{d∼τ}[ℓ(𝒜(D); d)] − min_{θ ∈ 𝒞} 𝔼_{d∼τ}[ℓ(θ; d)] ≤ A + α
Theorem: If 𝒜 is (ε, δ)-differentially private, then α = εL + δL₂
Part II of the talk: can achieve A = O(L√p/(εn))
Prediction Stability and Generalizability [Bassily, Smith, T. 2014]
Theorem: Excess empirical risk of an (ε, δ)-diff. private algorithm 𝒜:
𝔼_𝒜[ℒ(𝒜(D); D)] − min_θ ℒ(θ; D) = O(L√p/(εn))
Prediction stability of 𝒜 is εL. Setting ε = p^{1/4}/√n:
𝔼_{d∼τ}[ℓ(𝒜(D); d)] − min_{θ ∈ 𝒞} 𝔼_{d∼τ}[ℓ(θ; d)] = O(L·p^{0.25}/√n)
(Uniform convergence would have resulted in a √p dependence.)
Proof sketches…
Stability implies Generalizability
Theorem: Suppose the excess empirical risk of algorithm 𝒜 is bounded by A, i.e.,
𝔼_𝒜[ℒ(𝒜(D); D)] − min_θ ℒ(θ; D) ≤ A,
and the prediction stability of 𝒜 is α. Then
𝔼_{d∼τ}[ℓ(𝒜(D); d)] − min_{θ ∈ 𝒞} 𝔼_{d∼τ}[ℓ(θ; d)] ≤ A + α
Stability implies Generalizability: Proof Sketch
Define the true risk T(θ) = 𝔼_{d∼τ}[ℓ(θ; d)]
Data set D ∼ τ^n; let D^i be D with its i-th sample replaced by a fresh draw from τ
Claim: ∀i ∈ [n], 𝔼_𝒜[T(𝒜(D))] = 𝔼_𝒜[T(𝒜(D^i))]   (the true risk does not change on resampling a data point)
⇒ Claim: 𝔼_𝒜[T(𝒜(D))] = (1/n) Σ_{i=1}^n 𝔼[ℓ(𝒜(D^i); d_i)]   (follows from the fact that d_i is independent of D^i)
Stability implies Generalizability: Proof Sketch
Claim: ∀i ∈ [n], 𝔼_𝒜[T(𝒜(D))] = 𝔼_𝒜[T(𝒜(D^i))]
Claim: 𝔼_𝒜[T(𝒜(D))] = (1/n) Σ_{i=1}^n 𝔼[ℓ(𝒜(D^i); d_i)]
⇒ 𝔼[T(𝒜(D))] − 𝔼[ℒ(𝒜(D); D)] = (1/n) Σ_{i=1}^n 𝔼[ℓ(𝒜(D^i); d_i) − ℓ(𝒜(D); d_i)] ≤ α
(the last inequality follows from prediction stability)
Stability implies Generalizability: Proof Sketch
• 𝔼[T(𝒜(D))] − 𝔼[ℒ(𝒜(D); D)] ≤ α
• For the true risk minimizer θ_R: 𝔼[T(θ_R)] = 𝔼_{D∼τ^n}[ℒ(θ_R; D)] ≥ 𝔼[ℒ(𝒜(D); D)] − A
⇒ Theorem: 𝔼_{d∼τ}[ℓ(𝒜(D); d)] − min_{θ ∈ 𝒞} 𝔼_{d∼τ}[ℓ(θ; d)] ≤ A + α
α-Prediction Stability via Differential Privacy
Theorem: If 𝒜 is (ε, δ)-differentially private, then α = εL + δL₂
Proof sketch: Let θ* be some fixed element of 𝒞. By the definition of differential privacy, for all d ∈ 𝒟:
(𝔼[ℓ(𝒜(D); d)] − 𝔼[ℓ(θ*; d)]) − (𝔼[ℓ(𝒜(D′); d)] − 𝔼[ℓ(θ*; d)])
  ≤ ε·(𝔼[ℓ(𝒜(D′); d)] − 𝔼[ℓ(θ*; d)]) + δL₂
  ≤ εL + δL₂
Part I of this Talk
1. Towards a rigorous notion of statistical data privacy
2. Differential privacy: An overview
3. Generalization guarantee via differential privacy
→ 4. Application: Follow-the-perturbed-leader
Online learning with linear costs
Online Learning Setup
In each round t = 1, …, T:
• Player 1 plays θ_t ∈ L1-ball
• Player 2 (the adversary) plays a cost vector f_t ∈ L∞-ball
• Player 1 incurs cost ⟨f_t, θ_t⟩
Minimize regret: R(θ_1, …, θ_T) = Σ_{t=1}^T ⟨θ_t, f_t⟩ − min_{θ ∈ L1} Σ_{t=1}^T ⟨θ, f_t⟩
Oblivious adversary: f_1, …, f_T are fixed ahead of time
Follow-the-leader and Follow-the-perturbed-leader
Follow-the-leader (FTL) algorithm:
• θ_{t+1} ← argmin_{θ ∈ L1} Σ_{τ=1}^t ⟨f_τ, θ⟩
Theorem: FTL has regret Ω(T)
(Proof sketch: generate f_1, …, f_T ∼_iid {0,1}^p)
Follow-the-perturbed-leader (FTPL) algorithm [KV05]:
• θ_{t+1} ← argmin_{θ ∈ L1} Σ_{τ=1}^t ⟨f_τ + b_t, θ⟩, where b_t ∼ Lap(1/ε)^p
Theorem: FTPL with ε ≈ √(log(p)/T) has regret O(√(log(p)·T))
Theorem: FTPL is (ε, δ)-diff. private (follows from: 1. the exponential mechanism, 2. the strong composition theorem)
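A minimal sketch of FTPL over the unit L1 ball with Laplace perturbations, as described above; the cost sequence, horizon, and ε below are illustrative. The linear minimizer over the L1 ball simply puts all its mass on the coordinate with the largest cumulative cost magnitude.

```python
# Sketch of follow-the-perturbed-leader (FTPL) with Laplace noise.
import numpy as np

def l1_ball_argmin(c):
    """argmin of <c, theta> over the unit L1 ball: a signed vertex of the ball."""
    j = int(np.argmax(np.abs(c)))
    theta = np.zeros_like(c)
    theta[j] = -np.sign(c[j])
    return theta

def ftpl_regret(F, eps, rng=np.random.default_rng(6)):
    T, p = F.shape
    cum, total_cost = np.zeros(p), 0.0
    for t in range(T):
        b = rng.laplace(scale=1.0 / eps, size=p)   # fresh perturbation b_t
        theta = l1_ball_argmin(cum + b)            # follow the perturbed leader
        total_cost += F[t] @ theta
        cum += F[t]
    best_fixed = F.sum(axis=0) @ l1_ball_argmin(F.sum(axis=0))
    return total_cost - best_fixed

T, p = 2000, 10
F = np.random.default_rng(7).uniform(-1, 1, size=(T, p))  # costs in the L-infinity ball
print("FTPL regret:", round(ftpl_regret(F, eps=np.sqrt(np.log(p) / T)), 2))
```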
Regret Analysis of FTPL via Differential Privacy
Theorem: FTPL with ε ≈ √(log(p)/T) has regret O(√(log(p)·T))
Proof sketch:
Be-the-leader (BTL): θ_{t+1} ← argmin_{θ ∈ L1} Σ_{τ=1}^{t+1} ⟨f_τ, θ⟩
Claim: BTL has zero regret
Be-the-perturbed-leader (BTPL): θ_{t+1} ← argmin_{θ ∈ L1} Σ_{τ=1}^{t+1} ⟨f_τ + b_t, θ⟩
Claim: By (ε, δ)-differential privacy, 𝔼[FTPL] ≤ e^ε·𝔼[BTPL] + δ   (follows from the prediction stability of differential privacy)
Regret Analysis of FTPL via Differential Privacy
Proof sketch (continued):
Be-the-leader (BTL): θ_{t+1} ← argmin_{θ ∈ L1} Σ_{τ=1}^{t+1} ⟨f_τ, θ⟩
Be-the-perturbed-leader (BTPL): θ_{t+1} ← argmin_{θ ∈ L1} Σ_{τ=1}^{t+1} ⟨f_τ + b_t, θ⟩
Claim: By (ε, δ)-differential privacy, 𝔼[FTPL] ≤ e^ε·𝔼[BTPL] + δ   (dependence on δ ignored for simplicity)
Claim: 𝔼[BTPL] ≤ 𝔼[BTL] + O(log(p)/ε)
⇒ Regret(FTPL) = O(log(p)/ε + εT); optimizing the bound over ε gives the regret theorem
Part I of this Talk
1. Towards a rigorous notion of statistical data privacy
2. Differential privacy: An overview
3. Generalization guarantee via differential privacy
4. Application: Follow-the-perturbed-leader
Thesis: Differential privacy ⇒ generalizability
Stable learning ⇒ differential privacy
Concluding Remarks
• Differential privacy is a cryptographically strong notion of data
privacy
• Differential privacy provides a very strong form of stability,
i.e., stability on the measure induced on the output space
• Stability from differential privacy is composable, i.e., any post-processing on the output is also stable
References
[DMNS06] Calibrating Noise to Sensitivity in Private Data Analysis. Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith, 2006.
[DKMMN06] Our Data, Ourselves: Privacy via Distributed Noise Generation. Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Kobbi Nissim, 2006.
[MT07] Mechanism Design via Differential Privacy. Frank McSherry and Kunal Talwar, 2007.
[BLST10] Discovering Frequent Patterns in Sensitive Data. Raghav Bhaskar, Srivatsan Laxman, Adam Smith, and Abhradeep Thakurta, 2010.
[BST14] Differentially Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds. Raef Bassily, Adam Smith, and Abhradeep Thakurta, 2014.
[DN10] On the Difficulties of Disclosure Prevention, or The Case for Differential Privacy. Cynthia Dwork and Moni Naor, 2010.
[DN03] Revealing Information While Preserving Privacy. Irit Dinur and Kobbi Nissim, 2003.