From Stability to Differential Privacy
Abhradeep Guha Thakurta
Yahoo! Labs, Sunnyvale
Thesis: Stable algorithms yield
differentially private algorithms
Differential privacy: A short tutorial
Privacy in Machine Learning Systems
[Figure: individuals d_1, d_2, ⋯, d_{n−1}, d_n → trusted learning algorithm ℳ → summary statistics (1. classifiers, 2. clusters, 3. regression coefficients) → users; an attacker may also observe the released statistics]
Privacy in Machine Learning Systems
Two conflicting goals:
1. Utility: Release accurate information
2. Privacy: Protect the privacy of individual entries
Balancing the tradeoff is a difficult problem:
1. Netflix prize database attack [NS08]
2. Facebook advertisement system attack [Korolova11]
3. Amazon recommendation system attack [CKNFS11]
Data privacy is an active area of research:
• Computer science, economics, statistics, biology, social sciences, …
[Figure: individuals d_1, d_2, ⋯, d_{n−1}, d_n → learning algorithm ℳ → users]
Differential Privacy [DMNS06, DKMMN06]
Intuition:
• Adversary learns essentially the same thing irrespective of your presence or absence in the data set
[Figure: data set D containing your record d_1 → M (random coins) → M(D); data set D′ with d_1 replaced → M (random coins) → M(D′)]
• D and D′ are called neighboring data sets
• Require: neighboring data sets induce close distributions on outputs
Differential Privacy [DMNS06, DKMMN06]
Definition:
A randomized algorithm M is (ε, δ)-differentially private if
• for all data sets D and D′ that differ in one element
• for all sets of answers S
Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S] + δ
Semantics of Differential Privacy
• Differential privacy is a condition on the algorithm
• The guarantee is meaningful in the presence of any auxiliary information
• Typically, think of privacy parameters ε ≈ 0.1 and δ = 1/n^{log n}, where n = # of data samples
• Composition: the ε's and δ's add up over multiple executions
Laplace Mechanism [DMNS06]
Let the data set D = {d_1, ⋯, d_n}, and let f: U* → ℝ be a function on D
Sensitivity: S(f) = max_{D, D′ : d_H(D, D′) = 1} |f(D) − f(D′)|
1. E: random variable sampled from Lap(S(f)/ε)
2. Output f(D) + E
Theorem (Privacy): The algorithm is ε-differentially private
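
As a concrete illustration, a minimal Python sketch of the Laplace mechanism (the function name and the counting-query example are illustrative, not from the slides):

    import numpy as np

    def laplace_mechanism(value, sensitivity, epsilon, rng=None):
        """Release value + Lap(sensitivity/epsilon) noise."""
        rng = rng or np.random.default_rng()
        return value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

    # Example: a counting query f(D) = number of records with a given property
    # has sensitivity 1, so Lap(1/epsilon) noise suffices for epsilon-DP.
    records = [0, 1, 1, 0, 1, 1, 1]
    private_count = laplace_mechanism(sum(records), sensitivity=1.0, epsilon=0.1)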
This Talk
1. Differential privacy via stability arguments: A meta-algorithm
2. Sample and aggregate framework and private model selection
3. Non-private sparse linear regression in high-dimensions
4. Private sparse linear regression with (nearly) optimal rate
Perturbation stability
(a.k.a. zero local sensitivity)
Perturbation Stability
[Figure: data set D = (d_1, d_2, ⋮, d_n) → function f → output]
Perturbation Stability
[Figure: neighboring data set D′ = (d_1, d_2′, ⋮, d_n) → function f → same output]
Stability of f at D: the output does not change when any one entry is changed
Equivalently, the local sensitivity of f at D is zero
Distance to Instability Property
• Definition: A function f: U* → ℛ is k-stable at a data set D if
  • for any data set D′ ∈ U* with |D Δ D′| ≤ k, f(D) = f(D′)
• Distance to instability: max{k : f(D) is k-stable}
• Objective: Output f(D) while preserving differential privacy
[Figure: the space of all data sets, split into stable and unstable data sets; D lies in the stable region, at distance > k from the unstable data sets]
Propose-Test-Release (PTR) framework
[DL09, KRSY11, Smith T.’13]
A Meta-algorithm: Propose-Test-Release (PTR)
Basic tool: the Laplace mechanism
1. dist ← max{k : f(D) is k-stable}
2. dist ← dist + Lap(1/ε)
3. If dist > log(1/δ)/ε, then return f(D), else return ⊥
Theorem: The algorithm is (ε, δ)-differentially private
Theorem: If f is (2·log(1/δ)/ε)-stable at D, then w.p. ≥ 1 − δ the algorithm outputs f(D)
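
A minimal Python sketch of the PTR pattern, assuming the caller supplies a routine computing the exact distance to instability of f at D (that routine and all names below are hypothetical; computing the distance is the hard part in general):

    import numpy as np

    def propose_test_release(D, f, distance_to_instability, epsilon, delta, rng=None):
        """PTR: release f(D) only if the noisy distance to instability is large."""
        rng = rng or np.random.default_rng()
        dist = distance_to_instability(D)              # max k such that f is k-stable at D
        noisy_dist = dist + rng.laplace(scale=1.0 / epsilon)
        if noisy_dist > np.log(1.0 / delta) / epsilon:
            return f(D)        # safe: f is constant on a large neighborhood of D
        return None            # ⊥ : refuse to answer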
This Talk
1. Differential privacy via stability arguments: A meta-algorithm
2. Sample and aggregate framework and private model selection
3. Non-private sparse linear regression in high-dimensions
4. Private sparse linear regression with (nearly) optimal rate
Sample and aggregate framework
[NRS07, Smith11, Smith T.’13]
Sample and Aggregate Framework
[Figure: data set D → subsample into blocks D_1, ⋯, D_m → run the learning algorithm on each block → aggregator → output]
Sample and Aggregate Framework
Theorem: If the aggregator is (ε/η, δ)-differentially private, then the overall framework is (ε, δ)-differentially private
Assumption: each entry appears in at most η data blocks
Proof idea: changing one data entry affects at most η data blocks
A differentially private aggregator using PTR
framework [Smith T.’13]
An ๐œ–, ๐›ฟ −differentially Private Aggregator
Assumption: ๐‘Ÿ discrete possible outputs ๐‘†1 , โ‹ฏ , ๐‘†๐‘Ÿ
๐ท1
๐ท๐‘š
Vote
Count
Vote
๐‘†1
๐‘†2
๐‘†∗
๐‘†๐‘Ÿ
PTR+Report-Noisy-Max Aggregator
Function f: the candidate output with the maximum number of votes
1. dist ← max{k : f(D) is k-stable}
2. dist ← dist + Lap(1/ε)
3. If dist > log(1/δ)/ε, then return f(D), else return ⊥
Observation: 2 × (dist + 1) is the gap between the vote counts of the highest and the second-highest scoring model
Observation: The algorithm is always computationally efficient
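
A rough Python sketch of this aggregator (hypothetical interface: block_outputs is the list of models selected by the individual blocks; the distance to instability is read off the gap between the top two vote counts):

    from collections import Counter
    import numpy as np

    def ptr_noisy_max_aggregator(block_outputs, epsilon, delta, rng=None):
        """Release the most-voted candidate if its lead over the runner-up is large."""
        rng = rng or np.random.default_rng()
        counts = Counter(block_outputs)
        top_two = counts.most_common(2) + [(None, 0)]
        (best, c1), (_, c2) = top_two[0], top_two[1]
        dist = (c1 - c2) / 2.0 - 1                     # distance to instability of the arg-max
        if dist + rng.laplace(scale=1.0 / epsilon) > np.log(1.0 / delta) / epsilon:
            return best
        return None                                    # ⊥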
Analysis of the aggregator under subsampling
stability [Smith T.’13]
Subsampling Stability
[Figure: data set D → random subsamples D_1, ⋯, D_m drawn with replacement]
Stability: f(D_i) = f(D) w.p. > 3/4
A Private Aggregator using Subsampling Stability
• D_i: sample each entry from D w.p. q = O(ε/log(1/δ))
• Each entry of D appears in ≈ mq data blocks
[Figure: voting histogram (in expectation) over the candidate models S_1, S_2, ⋯, S*, ⋯, S_r]
PTR+Report-Noisy-Max Aggregator
• ๐ท๐‘– : Sample each entry from ๐ท w.p. ๐‘ž = ๐‘‚(๐œ–/log(1/๐›ฟ))
• Each entry of ๐ท appears in 2๐‘š๐‘ž data blocks w.p. ≥ 1 − ๐›ฟ
1. ๐‘‘๐‘–๐‘ ๐‘ก ← (๐‘†∗ − ๐‘†2 )/4๐‘š๐‘ž − 1
2. ๐‘‘๐‘–๐‘ ๐‘ก ← ๐‘‘๐‘–๐‘ ๐‘ก + ๐ฟ๐‘Ž๐‘
3. If ๐‘‘๐‘–๐‘ ๐‘ก >
๐‘™๐‘œ๐‘”(1/๐›ฟ)
,
๐œ–
1
๐œ–
then return ๐‘“(๐ท), else return ⊥
๐‘†1
๐‘†2
๐‘†∗ ๐‘†๐‘Ÿ
A Private Aggregator using Subsampling Stability
Theorem: The above algorithm is (ε, 2δ)-differentially private
Note: The utility guarantee does not depend on the number of candidate models
Theorem: If m = log(n/δ)/q², then with probability at least 1 − 2δ, the true answer f(D) is output
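
Putting the pieces together, a hedged sketch of the subsampling-based sample-and-aggregate wrapper (all names are hypothetical; the constant inside q is illustrative, and model_selector stands for any non-private selection procedure that returns a hashable label):

    from collections import Counter
    import numpy as np

    def subsample_and_aggregate(D, model_selector, epsilon, delta, rng=None):
        """Vote over random sub-blocks of D and privately release the winner (sketch)."""
        rng = rng or np.random.default_rng()
        n = len(D)
        q = epsilon / (64.0 * np.log(1.0 / delta))     # inclusion probability; constant is illustrative
        m = int(np.ceil(np.log(n / delta) / q ** 2))   # number of blocks
        votes = []
        for _ in range(m):
            mask = rng.random(n) < q                   # each entry included independently w.p. q
            votes.append(model_selector([D[i] for i in range(n) if mask[i]]))
        counts = Counter(votes)
        top_two = counts.most_common(2) + [(None, 0)]
        (best, c1), (_, c2) = top_two[0], top_two[1]
        dist = (c1 - c2) / (4.0 * m * q) - 1           # one entry touches about 2mq blocks
        if dist + rng.laplace(scale=1.0 / epsilon) > np.log(1.0 / delta) / epsilon:
            return best
        return None                                    # ⊥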
This Talk
1. Differential privacy via stability arguments: A meta-algorithm
2. Sample and aggregate framework and private model selection
3. Non-private sparse linear regression in high-dimensions
4. Private sparse linear regression with (nearly) optimal rate
Sparse linear regression in high-dimensions
and the LASSO
Sparse Linear Regression in High-dimensions (๐‘ โ‰ซ ๐‘›)
• Data set: D = {(x_1, y_1), ⋯, (x_n, y_n)}, where x_i ∈ ℝ^p and y_i ∈ ℝ
• Assumption: data generated by a noisy linear system
  y_i = ⟨x_i, θ*⟩ + w_i   (feature vector x_i, parameter vector θ*_{p×1}, field noise w_i)
Data normalization:
• ∀i ∈ [n], ||x_i||_∞ ≤ 1
• ∀i, w_i is sub-Gaussian
Sparse Linear Regression in High-dimensions (๐‘ โ‰ซ ๐‘›)
• Data set: D = {(x_1, y_1), ⋯, (x_n, y_n)}, where x_i ∈ ℝ^p and y_i ∈ ℝ
• Assumption: data generated by a noisy linear system
  y_{n×1} = X_{n×p} θ*_{p×1} + w_{n×1}   (response vector = design matrix × parameter vector + field noise)
Sparse Linear Regression in High-dimensions (๐‘ โ‰ซ ๐‘›)
๐‘ค๐‘›×1
๐‘‹๐‘›×๐‘
∗
๐œƒ๐‘×1
• Sparsity: ๐œƒ ∗ has ๐‘  < ๐‘› non-zero entries
• Bounded norm: ∀๐‘–
∗
|๐œƒ๐‘– |
∈ (1 − Φ, Φ) for arbitrary small const. Φ
Model selection problem: Find the non-zero coordinates of ๐œƒ ∗
๐‘ฆ๐‘›×1
=
Design matrix
+
Field noise
Response vector
Sparse Linear Regression in High-dimensions (๐‘ โ‰ซ ๐‘›)
๐‘ค๐‘›×1
๐‘‹๐‘›×๐‘
∗
๐œƒ๐‘×1
Model selection: Non-zero coordinates (or the support) of ๐œƒ ∗
Solution: LASSO estimator [Tibshirani94,EFJT03,Wainwright06,CT07,ZY07,…]
1
๐œƒ ∈ arg min๐‘
||๐‘ฆ − ๐‘‹๐œƒ||22 + Λ||๐œƒ||1
๐œƒ∈โ„ 2๐‘›
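
For concreteness, a minimal non-private support-selection sketch using scikit-learn's Lasso (the function name is hypothetical; scikit-learn uses the same 1/(2n) scaling as the objective above):

    import numpy as np
    from sklearn.linear_model import Lasso

    def lasso_support(X, y, lam):
        """Return the support (non-zero coordinates) of the LASSO estimate."""
        # scikit-learn's Lasso minimizes (1/2n)||y - X theta||_2^2 + alpha ||theta||_1
        model = Lasso(alpha=lam, fit_intercept=False)
        model.fit(X, y)
        return frozenset(np.flatnonzero(model.coef_))

Returning a frozenset makes the selected support hashable, so it could serve directly as a block-level vote in the subsample-and-aggregate sketch above.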
Consistency of the LASSO Estimator
y = Xθ* + w
Consistency conditions* [Wainwright06, ZY07]:
• Γ: support of the underlying parameter vector θ*
• Incoherence: |||X_{Γ^c}^T X_Γ (X_Γ^T X_Γ)^{-1}|||_∞ < 1/4
• Restricted Strong Convexity: λ_min(X_Γ^T X_Γ) = Ω(n)
Consistency of the LASSO Estimator
y = Xθ* + w
Consistency conditions* [Wainwright06, ZY07]:
• Γ: support of the underlying parameter vector θ*
• Incoherence: |||X_{Γ^c}^T X_Γ (X_Γ^T X_Γ)^{-1}|||_∞ < 1/4
• Restricted Strong Convexity: λ_min(X_Γ^T X_Γ) = Ω(n)
Theorem*: Under a proper choice of Λ and n = Ω(s log p), the support of the LASSO estimator θ̂ equals the support of θ*
Stochastic Consistency of the LASSO
y = Xθ* + w
Consistency conditions* [Wainwright06, ZY07]:
• Γ: support of the underlying parameter vector θ*
• Incoherence: |||X_{Γ^c}^T X_Γ (X_Γ^T X_Γ)^{-1}|||_∞ < 1/4
• Restricted Strong Convexity: λ_min(X_Γ^T X_Γ) = Ω(n)
Theorem [Wainwright06, ZY07]: If each data entry in X ∼ 𝒩(0, 1/4), then the assumptions above are satisfied w.h.p.
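
For intuition, a tiny simulation sketch of data satisfying these stochastic assumptions (the dimensions and noise level are illustrative only):

    import numpy as np

    rng = np.random.default_rng(0)
    n, p, s = 500, 2000, 5
    X = rng.normal(0.0, 0.5, size=(n, p))               # each entry ~ N(0, 1/4), i.e. std 1/2
    theta_star = np.zeros(p)
    theta_star[:s] = 1.0                                 # s-sparse parameter vector
    y = X @ theta_star + rng.normal(0.0, 0.1, size=n)    # sub-Gaussian field noise
    # Under these conditions, LASSO with a proper Lambda recovers {0, ..., s-1} w.h.p.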
We show [Smith, T. '13]:
Consistency conditions ⇒ proxy conditions ⇒ perturbation stability
This Talk
1. Differential privacy via stability arguments: A meta-algorithm
2. Sample and aggregate framework and private model selection
3. Non-private sparse linear regression in high-dimensions
4. Private sparse linear regression with (nearly) optimal rate
Interlude: A simple subsampling-based private LASSO algorithm [Smith, T. '13]
Notion of Neighboring Data sets
[Figure: data set D = (X, y), an n × p design matrix with an n × 1 response vector; D′ is obtained by replacing one row (x_i, y_i) with (x_i′, y_i′)]
D and D′ are neighboring data sets
Recap: Subsampling Stability
[Figure: data set D → random subsamples D_1, ⋯, D_m drawn with replacement]
Stability: f(D_i) = f(D) w.p. > 3/4
Recap: PTR+Report-Noisy-Max Aggregator
y = Xθ* + w
Assumption: all r = (p choose s) candidate models S_1, ⋯, S_r
[Figure: each block D_1, ⋯, D_m casts a vote f(block); the votes are counted into a histogram over S_1, S_2, ⋯, S*, ⋯, S_r]
Recap: PTR+Report-Noisy-Max Aggregator
• ๐ท๐‘– : Sample each entry from ๐ท w.p. ๐‘ž = ๐‘‚(๐œ–/log(1/๐›ฟ))
• Each entry of ๐ท appears in 2๐‘š๐‘ž data blocks w.p. ≥ 1 − ๐›ฟ
• Fix ๐‘š = log(๐‘›/๐›ฟ)/๐‘ž2
1. ๐‘‘๐‘–๐‘ ๐‘ก ← (๐‘†∗ − ๐‘†2 )/4๐‘š๐‘ž − 1
2. ๐‘‘๐‘–๐‘ ๐‘ก ← ๐‘‘๐‘–๐‘ ๐‘ก + ๐ฟ๐‘Ž๐‘
3. If ๐‘‘๐‘–๐‘ ๐‘ก >
๐‘™๐‘œ๐‘”(1/๐›ฟ)
,
๐œ–
1
๐œ–
then return ๐‘“(๐ท), else return ⊥
๐‘†1
๐‘†2
๐‘†∗ ๐‘†๐‘Ÿ
Subsampling Stability of the LASSO
๐‘ฆ๐‘›×1
๐‘‹๐‘›×๐‘
∗
๐œƒ๐‘×1
+
Field noise
=
Design matrix
Parameter vector
Response vector
Stochastic assumptions: Each data entry in ๐‘‹ ∼ ๐’ฉ(0,1/4)
Noise ๐‘ค ∼ ๐’ฉ(0, ๐œŽ 2 ๐•€๐‘› )
๐‘ค๐‘›×1
Subsampling Stability of the LASSO
y = Xθ* + w
Stochastic assumptions: each data entry in X ∼ 𝒩(0, 1/4); noise w ∼ 𝒩(0, σ²𝕀_n); scale of δ = 1/n^{ω(1)}
Theorem [Wainwright06, ZY07]: Under a proper choice of Λ and n = Ω(s log p), the support of the LASSO estimator θ̂ equals the support of θ*
Theorem: Under a proper choice of Λ, n = Ω(log(1/δ) · s log p / ε), and f = LASSO, the output of the aggregator equals the support of θ*
Notice the gap of log(1/δ)/ε between the two sample-complexity bounds
Perturbation-stability-based private LASSO and optimal sample complexity [Smith, T. '13]
Recap: Distance to Instability Property
• Definition: A function f: U* → ℛ is k-stable at a data set D if
  • for any data set D′ ∈ U* with |D Δ D′| ≤ k, f(D) = f(D′)
• Distance to instability: max{k : f(D) is k-stable}
• Objective: Output f(D) while preserving differential privacy
[Figure: the space of all data sets, split into stable and unstable data sets; D lies in the stable region, at distance > k from the unstable data sets]
Recap: Propose-Test-Release (PTR) Framework
TBD: some global-sensitivity-one query (to replace the exact distance in step 1)
1. dist ← max{k : f(D) is k-stable}
2. dist ← dist + Lap(1/ε)
3. If dist > log(1/δ)/ε, then return f(D), else return ⊥
Theorem: The algorithm is (ε, δ)-differentially private
Theorem: If f is (2·log(1/δ)/ε)-stable at D, then w.p. ≥ 1 − δ the algorithm outputs f(D)
Instantiation of PTR for the LASSO
y = Xθ* + w
LASSO: θ̂ ∈ argmin_{θ ∈ ℝ^p} (1/2n) ||y − Xθ||₂² + Λ||θ||₁
• Set the function f = support of θ̂
• Issue: for this f, the distance to instability might not be efficiently computable
From [Smith, T. '13]: consistency conditions ⇒ proxy conditions ⇒ perturbation stability
This talk: consistency conditions ⇒ proxy conditions (efficiently testable with privacy) ⇒ perturbation stability
Perturbation Stability of the LASSO
y = Xθ* + w
LASSO: θ̂ ∈ argmin_{θ ∈ ℝ^p} (1/2n) ||y − Xθ||₂² + Λ||θ||₁ =: J(θ)
Theorem: The consistency conditions on the LASSO are sufficient for perturbation stability
Proof sketch:
1. Analyze the Karush-Kuhn-Tucker (KKT) optimality conditions at θ̂
2. Show that support(θ̂) is stable using a "dual certificate" on stable instances
Perturbation Stability of the LASSO
y = Xθ* + w
Proof sketch:
Subgradient of the LASSO objective: ∂J_D(θ) = −(1/n) X^T(y − Xθ) + Λ ∂||θ||₁
LASSO objective on D: the minimizer θ̂ satisfies 0 ∈ ∂J_D(θ̂)
LASSO objective on D′: the minimizer θ̂′ satisfies 0 ∈ ∂J_{D′}(θ̂′)
Perturbation Stability of the LASSO
y = Xθ* + w
Proof sketch:
Subgradient of the LASSO objective: ∂J_D(θ) = −(1/n) X^T(y − Xθ) + Λ ∂||θ||₁
Argue using the optimality conditions of ∂J_D(θ̂) and ∂J_{D′}(θ̂′):
1. No zero coordinate of θ̂ becomes non-zero in θ̂′ (uses the mutual incoherence condition)
2. No non-zero coordinate of θ̂ becomes zero in θ̂′ (uses the restricted strong convexity condition)
Perturbation Stability Test for the LASSO
y = Xθ* + w
Γ: support of θ̂;  Γ^c: complement of the support of θ̂
Test for the following (the real test is more complex):
• Restricted Strong Convexity (RSC): the minimum eigenvalue of X_Γ^T X_Γ is Ω(n)
• Strong stability: the absolute values of the coordinates of the gradient of the least-squares loss on Γ^c are ≪ Λ
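
A rough Python sketch of the two proxy quantities (a simplified version of the real test; the function name and the exact form of g_1 and g_2 are illustrative assumptions):

    import numpy as np

    def stability_proxies(X, y, theta_hat, lam):
        """Simplified proxy quantities for the LASSO perturbation-stability test."""
        n, p = X.shape
        support = np.flatnonzero(theta_hat)                     # Gamma
        off_support = np.setdiff1d(np.arange(p), support)       # Gamma^c
        # g1: restricted strong convexity on the support (want Omega(n))
        XG = X[:, support]
        g1 = float(np.linalg.eigvalsh(XG.T @ XG).min()) if support.size else 0.0
        # g2: strong stability off the support (want gradient magnitudes << Lambda)
        grad = -(X.T @ (y - X @ theta_hat)) / n                 # gradient of least-squares loss
        g2 = float(lam - np.abs(grad[off_support]).max()) if off_support.size else lam
        return g1, g2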
Geometry of the Stability of LASSO
y = Xθ* + w
Intuition: strong convexity ensures supp(θ̂) ⊆ supp(θ̂′)
1. Strong convexity ensures ||θ̂_Γ − θ̂′_Γ||_∞ is small
2. If |θ̂_Γ(i)| is large for all i, then |θ̂′_Γ(i)| > 0 for all i
3. The consistency conditions imply |θ̂_Γ(i)| is large for all i
[Figure: LASSO objective along Γ (dimension 1 in Γ, dimension 2 in Γ^c), with minimizer θ̂]
Geometry of the Stability of LASSO
y = Xθ* + w
Intuition: strong stability ensures no zero coordinate in θ̂ becomes non-zero in θ̂′
[Figure: LASSO objective along Γ^c, with slopes ±Λ from the ℓ₁ penalty; minimizer θ̂]
• For the minimizer θ̂ to move along Γ^c, the perturbation to the gradient of the least-squares loss has to be large
Geometry of the Stability of LASSO
y = Xθ* + w
Gradient of the least-squares loss: −X^T(y − Xθ̂), with coordinates a_i split across Γ and Γ^c
[Figure: LASSO objective along Γ^c, with slopes ±Λ; minimizer θ̂]
• Strong stability: |a_i| ≪ Λ for all i ∈ Γ^c ⇒ every i ∈ Γ^c has a subgradient of zero for LASSO(D′)
Making the Stability Test Private (Simplified)
y = Xθ* + w
• Test for Restricted Strong Convexity: g_1(D) > t_1
• Test for strong stability: g_2(D) > t_2
• Issue: g_1 and g_2 have sensitivities Δ_1 and Δ_2, so the tests cannot be released directly
• Our solution: proxy distance d = max( min_i (g_i(D) − t_i)/Δ_i + 1, 0 )
• d has global sensitivity of one
• d is large only when g_1 and g_2 are both large and insensitive
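
A small Python sketch of this proxy distance (the thresholds t_i and sensitivities Δ_i are inputs; the exact form used in [Smith, T. '13] is more involved):

    def proxy_distance(g_values, thresholds, sensitivities):
        """max(min_i (g_i - t_i)/Delta_i + 1, 0); changes by at most 1 on neighboring data sets."""
        ratios = [(g - t) / delta for g, t, delta in zip(g_values, thresholds, sensitivities)]
        return max(min(ratios) + 1.0, 0.0)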
Private Model Selection with Optimal Sample Complexity
y = Xθ* + w
1. Compute d = (function of g_1(D) and g_2(D), as above)
2. dist ← d + Lap(1/ε)
3. If dist > log(1/δ)/ε, then return supp(θ̂), else return ⊥
Theorem: The algorithm is (ε, δ)-differentially private
Theorem: Under the consistency conditions, log p > α²s³, and n = Ω(s log p), w.h.p. the support of θ* is output. Here α = log(1/δ)/ε.
Note: n = Ω(s log p) is a nearly optimal sample complexity
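
Combining the pieces, a hedged end-to-end sketch of perturbation-stability-based private support selection (it reuses the hypothetical helpers stability_proxies and proxy_distance sketched above; the thresholds and sensitivities are placeholders, not the paper's constants):

    import numpy as np
    from sklearn.linear_model import Lasso

    def private_lasso_support(X, y, lam, epsilon, delta, thresholds, sensitivities, rng=None):
        """PTR with the proxy distance: release supp(theta_hat) only when the fit looks stable."""
        rng = rng or np.random.default_rng()
        theta_hat = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
        g1, g2 = stability_proxies(X, y, theta_hat, lam)   # RSC and strong-stability proxies
        d = proxy_distance((g1, g2), thresholds, sensitivities)
        if d + rng.laplace(scale=1.0 / epsilon) > np.log(1.0 / delta) / epsilon:
            return frozenset(np.flatnonzero(theta_hat))    # supp(theta_hat)
        return None                                        # ⊥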
Thesis: Stable algorithms yield
differentially private algorithms
Two notions of stability:
1. Perturbation stability
2. Subsampling stability
This Talk
1. Differential privacy via stability arguments: A meta-algorithm
2. Sample and aggregate framework and private model selection
3. Non-private sparse linear regression in high-dimensions
4. Private sparse linear regression with (nearly) optimal rate
Concluding Remarks
1. Sample and aggregate framework with PTR+report-noisy-max
aggregator is a generic tool for designing learning algorithms
• Example: learning with non-convex models [Bilenko,Dwork,Rothblum,T.]
2. Propose-test-release framework is an interesting tool if one can
compute distance to instability efficiently
3. Open problem: Private high-dimensional learning without
assumptions like incoherence and restricted
strong convexity