Bayesianism, Convexity, and the
quest towards Optimal Algorithms
Boaz Barak
Harvard University
Microsoft Research
Partially based on work in progress with Sam Hopkins,
Jon Kelner, Pravesh Kothari, Ankur Moitra and Aaron Potechin.
Talk Plan
• Dubious historical analogy.
• Philosophize about automating algorithms.
• Wave hands about convexity and the Sum of Squares
algorithm.
• Sudden shift to Bayesianism vs Frequentism.
• Work in progress on the planted clique problem.
Skipping today:
• Sparse coding / dictionary learning / tensor completion
[B-Kelner-Steurer’14,’15 B-Moitra’15]
• Unique games conjecture / small set expansion
[..B-Brandao-Harrow-Kelner-Steurer-Zhou’12..]
• Connections to quantum information theory
Prologue: Solving equations
Babylonians (~2000BC): Solutions for quadratic equations.
del Ferro-Tartaglia-Cardano-Ferrari (1500's): Solutions for cubics and quartics.
van Roomen/Viete (1593): "Challenge all mathematicians in the world"
  $x^{45} - 45x^{43} + \cdots + 45x = \sqrt{7/4 - \cdots - \sqrt{45/64}}$
Euler (1740's): Special cases of quintics.
Vandermonde (1777): Solve $x^{11} = 1$ with square and fifth roots.
Gauss (1796): root of $x^{17} = 1$:
  $\frac{-1 + \sqrt{17} + \sqrt{34 + 2\sqrt{17}}}{8} + \frac{\sqrt{68 + 12\sqrt{17} - 16\sqrt{34 + 2\sqrt{17}} - 2(1 - \sqrt{17})\sqrt{34 - 2\sqrt{17}}}}{8} + \cdots$
Ruffini-Abel-Galois
(early 1800’s):
• Some equations can’t be solved in radicals
• Characterization of solvable equations.
• Birth of group theory
• 17-gon construction now “boring”:
few lines of Mathematica.
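As a small illustration of that last bullet, here is our own Python stand-in for the "few lines of Mathematica" remark: a numeric check of the standard nested-radical closed form for $\cos(2\pi/17)$, the quantity behind the 17-gon construction (this standard form may be grouped differently from the expression on the previous slide).

```python
# Our own stand-in for "few lines of Mathematica": numerically check the standard
# nested-radical closed form for cos(2*pi/17), the quantity behind the 17-gon
# construction (possibly grouped differently from the slide's expression).
from math import cos, pi, sqrt

s = sqrt(17)
radical = (-1 + s + sqrt(34 - 2 * s)
           + 2 * sqrt(17 + 3 * s - sqrt(34 - 2 * s) - 2 * sqrt(34 + 2 * s))) / 16

print(radical, cos(2 * pi / 17))  # both ~ 0.93247, agreeing to machine precision
```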
A prototypical TCS paper
Interesting problem →
• Algorithmica: Efficient Algorithm (e.g. MAX-FLOW in P)
• Intractabilia: Hardness Reduction (e.g. MAX-CUT NP-hard)
Can we make algorithms boring?
Can we reduce creativity in algorithm design?
Can we characterize the "easy" problems?
Theme: Convexity
Algorithmica
Intractabilia
Convexity in optimization
Interesting Problem → (Creativity!!) → Convex Problem → General Solver
Example: can embed $\{\pm 1\}$ in $[-1,+1]$ or in $\{x : \|x\| = 1\}$ [Goemans-Williamson'94]
Sum of Squares Algorithm [Shor'87, Parrilo'00, Lasserre'01]:
Universal embedding of any* optimization problem into an $n^d$-dimensional convex set.
Algorithmic version of works related to Hilbert's 17th problem [Artin'27, Krivine'64, Stengle'74].
• Both "quality" of embedding and running time grow with $d$.
• $d = n$ ⇒ optimal solution, exponential time.
• Encapsulates many natural algorithms; optimal among a natural class [Lee-Raghavendra-Steurer'15].
Hope*: problem is easy iff embeddable with small $d$.
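A minimal sketch of the lowest level of this hierarchy, assuming the cvxpy package is available: the degree-2 relaxation of MAX-CUT from the Goemans-Williamson example above, with $\{\pm 1\}^n$ replaced by a PSD moment matrix. The function name and the 5-cycle test are ours, not from the talk.

```python
# Minimal sketch (not the general SoS machinery): the degree-2 / Goemans-Williamson
# relaxation of MAX-CUT, replacing x in {-1,+1}^n by a PSD "moment matrix" X ~ x x^T.
import cvxpy as cp
import numpy as np

def maxcut_sdp_value(adj: np.ndarray) -> float:
    n = adj.shape[0]
    X = cp.Variable((n, n), PSD=True)                # relaxation of the rank-1 matrix x x^T
    constraints = [cp.diag(X) == 1]                  # x_i^2 = 1 for +/-1 variables
    cut = 0.25 * cp.sum(cp.multiply(adj, 1 - X))     # sum over edges of (1 - x_i x_j)/2
    prob = cp.Problem(cp.Maximize(cut), constraints)
    prob.solve()
    return prob.value

# Example: the 5-cycle, where the true max cut is 4 and the SDP value is about 4.52.
C5 = np.zeros((5, 5))
for i in range(5):
    C5[i, (i + 1) % 5] = C5[(i + 1) % 5, i] = 1
print(maxcut_sdp_value(C5))
```

Higher degrees $d$ add moment variables for larger monomials, which is where both the "quality" of the embedding and the $n^d$ running time in the bullets above come from.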
Talk Plan
• Dubious historical analogy.
• Philosophize about automating algorithms.
• Wave hands about convexity and the Sum of
Squares algorithm.
• Sudden shift to Bayesianism vs Frequentism.
• Non-results on the planted clique problem.
Frequentists vs Bayesians
"There is a 10% chance that the $10^{20}$-th digit of $\pi$ is 7."
"Nonsense! The digit is either 7 or isn't."
"I will take an 11:1 bet on this."
Planted Clique Problem
[Karp’76,Kucera’95]
Distinguish between $G_{n,1/2}$ and $G_{n,1/2}$ + $k$-clique
Central problem in average-case complexity:
• Cryptography [Juels’02,Applebaum-B-Wigderson’10]
• Motifs in biological networks [Milo et al Science’02, Lotem et al PNAS’04,..]
• Sparse principal component analysis [Berthet-Rigollet’12]
• Nash equilibrium [Hazan-Krauthgamer’09]
• Certifying Restricted isometry property [Koiran-Zouzias’12]
No poly-time algorithm known when $k \ll \sqrt{n}$
Image credit: Andrea Montanari
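For concreteness, a small sketch (our own code, with hypothetical names) of the two distributions being distinguished:

```python
# Sample either G(n, 1/2) or G(n, 1/2) with a planted k-clique on a random set S.
import numpy as np

def sample_instance(n, k, planted, rng=np.random.default_rng()):
    A = rng.integers(0, 2, size=(n, n))
    A = np.triu(A, 1)
    A = A + A.T                                  # symmetric 0/1 adjacency matrix, zero diagonal
    S = None
    if planted:
        S = rng.choice(n, size=k, replace=False)
        A[np.ix_(S, S)] = 1                      # make S a clique...
        A[S, S] = 0                              # ...while keeping the diagonal at zero
    return A, S

A, S = sample_instance(n=200, k=20, planted=True)
```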
Planted Clique Problem
[Karp’76,Kucera’95]
Distinguish between $G_{n,1/2}$ and $G_{n,1/2}$ + $k$-clique
"Vertex 17 is in the clique with probability ≈ $k/n$"
“Nonsense! The probability is either 0 or 1.”
Making this formal
Distinguish between $G_{n,1/2}$ and $G_{n,1/2}$ + $k$-clique
Classical Bayesian uncertainty: a posterior distribution $\mu: \{0,1\}^n \to \mathbb{R}$ with $\mu(x) \ge 0$ for all $x$ and $\sum_x \mu(x) = 1$.
Computational uncertainty: a degree-$d$ pseudo-distribution $\mu: \{0,1\}^n \to \mathbb{R}$ with $\sum_x \mu(x) = 1$ and $\mathbb{E}_\mu[p^2] \ge 0$ for every $p$ with $\deg p \le d/2$.
In both cases $\mu$ is consistent with the observations: $\mathbb{E}_\mu[x_i x_j] := \sum_x \mu(x)\, x_i x_j = 0$ for all non-edges $i \not\sim j$.
This is a convex set, defined by $n^d$ equations + a PSD constraint.
"Vertex 17 is in the clique with probability $\mathbb{E}_\mu[x_{17}]$"
Definition*: $SOS_d(G) = \max_{\mu\,:\,\text{degree-}d\ \text{p.dist}} \mathbb{E}_\mu \sum_i x_i$
Corollary: $SOS_d(G) \ge \omega(G)$ for all $d$ (the uniform distribution over a maximum clique is a valid pseudo-distribution).
Open Question: Is $SOS_d(G) \ll \sqrt{n}$ for some $d = O(1)$?
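Under our reading of the definition above, the $d = 2$ level of $SOS_d(G)$ can be written as a small semidefinite program over the pseudo-moment matrix. A sketch assuming cvxpy (function name ours, not from the talk):

```python
# Degree-2 sketch of SOS_d(G): maximize E_mu[sum_i x_i] over pseudo-moment matrices.
# M is indexed by {1, x_1, ..., x_n}: M[0,0] = 1, M[0,i] = E[x_i], M[i,j] = E[x_i x_j].
import cvxpy as cp
import numpy as np

def sos2_value(adj: np.ndarray) -> float:
    n = adj.shape[0]
    M = cp.Variable((n + 1, n + 1), PSD=True)          # E_mu[p^2] >= 0 for deg p <= 1
    cons = [M[0, 0] == 1]                              # sum_x mu(x) = 1
    for i in range(n):
        cons.append(M[i + 1, i + 1] == M[0, i + 1])    # x_i^2 = x_i for 0/1 variables
        for j in range(i + 1, n):
            if adj[i, j] == 0:
                cons.append(M[i + 1, j + 1] == 0)      # non-edge: E_mu[x_i x_j] = 0
    prob = cp.Problem(cp.Maximize(cp.sum(M[0, 1:])), cons)
    prob.solve()
    return prob.value
```

Higher $d$ adds rows and columns for higher-degree monomials, giving the $n^d$-dimensional convex set from the earlier slide.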
Open Question: Is $SOS_d(G) \ll \sqrt{n}$ for some $d = O(1)$?
"Theorem" [Meka-Wigderson'13]: for $G \sim G_{n,1/2}$, $SOS_d(G) \cong \sqrt{n}$.
"Proof": Let $k = \delta\sqrt{n}$ and define $\mu$ of "maximal ignorance":
If $\{i,j\}$ is an edge then $\mathbb{E}_\mu[x_i x_j] \cong k^2/n^2$; if $\{i,j,\ell\}$ is a triangle then $\mathbb{E}_\mu[x_i x_j x_\ell] \cong k^3/n^3$; and so on.
$\mu$ is a valid pseudo-distribution assuming a higher-degree matrix-valued Chernoff bound.
Bug [Pisier]: the concentration bound is false.
In fact, for $k > n^{1/3}$ there exists $p$ of degree 2 such that $\mathbb{E}_\mu[p^2] < 0$ [Kelner].
Maximal-ignorance moments are OK for $k \sim n^{1/(0.5d+1)}$ (e.g. $k \sim \sqrt{n}$ for $d = 2$, $k \sim n^{1/3}$ for $d = 4$)
[Meka-Potechin-Wigderson'15, Deshpande-Montanari'15, Hopkins-Kothari-Potechin'15]
MW’s “conceptual” error
Pseudo-distributions should be as simple as possible
but not simpler.
Following A. Einstein.
Pseudo-distributions should have maximum entropy
but respect the data.
MW violated Bayesian reasoning:
Consider $\mathbb{E}_\mu[x_i] = \Pr[i \text{ in clique}]$ for a vertex $i$ with $\deg(i) = n/2 + \Delta$, $\Delta = \Theta(\sqrt{n})$.
According to MW: $\mathbb{E}_\mu[x_i] = k/n$, regardless of $\Delta$.
By Bayesian reasoning: $i \notin S \Rightarrow \deg(i) \sim N(n/2,\ \sqrt{n}/2)$, while $i \in S \Rightarrow \deg(i) \sim N(n/2 + k,\ \sqrt{n}/2)$.
So $\mathbb{E}_\mu[x_i]$ should be reweighted by $\exp\!\left(-\tfrac{2(\Delta-k)^2}{n}\right) / \exp\!\left(-\tfrac{2\Delta^2}{n}\right) \propto \exp\!\left(\tfrac{4k\Delta}{n}\right)$.
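A quick numeric check of that reweighting factor (our own sketch, assuming scipy is available): the likelihood ratio of the two Gaussian degree distributions above equals $\exp(-2(\Delta-k)^2/n)/\exp(-2\Delta^2/n)$, which up to a $\Delta$-independent factor is $\exp(4k\Delta/n)$.

```python
# Numeric check of the reweighting factor: likelihood ratio of deg(i) = n/2 + Delta
# under "i in S" (mean n/2 + k) vs "i not in S" (mean n/2), both with std sqrt(n)/2.
import numpy as np
from scipy.stats import norm

n, k = 10_000, 100               # hypothetical sizes, with k ~ sqrt(n)
delta = 2 * np.sqrt(n)           # vertex i has degree n/2 + delta
sigma = np.sqrt(n) / 2

ratio = norm.pdf(n / 2 + delta, loc=n / 2 + k, scale=sigma) / \
        norm.pdf(n / 2 + delta, loc=n / 2, scale=sigma)
closed_form = np.exp((4 * k * delta - 2 * k ** 2) / n)

print(ratio, closed_form)        # equal; dropping the Delta-independent exp(-2k^2/n)
                                 # leaves the exp(4*k*delta/n) reweighting of E_mu[x_i]
```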
Going Bayesian
Theorem [B-Hopkins-Kelner-Kothari-Moitra]: for every $d$ and $\epsilon > 0$, w.h.p. $SOS_d(G_{n,1/2}) \ge n^{1/2-\epsilon}$.
(The $d = 4$ case was recently shown by [Hopkins-Kothari-Potechin-Raghavendra-Schramm'16].)
Proof: for every graph $G$ we define $\mu = \mu_G : \{0,1\}^n \to \mathbb{R}$ s.t.
$\mathbb{E}_\mu \sum_i x_i \ge n^{1/2-\epsilon}$ and $\mathbb{E}_\mu[x_i x_j] = 0$ for all $\{i,j\} \notin E(G)$.
Bayesian desiderata: for every "simple" map $G \mapsto p_G$ with $\deg p_G \le d$,
$\mathbb{E}_{G \sim G_{n,1/2}}\left[\mathbb{E}_{\mu_G}\, p_G\right] \cong \mathbb{E}_{G' \sim G_{n,1/2} \cup K_S,\ x' = 1_S}\left[p_{G'}(x')\right]$   (*)
Crucial observation: if "simple" means low degree, then (*) essentially* determines the moments: no creativity needed!!
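To make (*) concrete, here is a small Monte Carlo sketch (ours, with hypothetical names) of the planted-model side for two simple degree-1 statistics; on average over $G \sim G_{n,1/2}$, the pseudo-expectations are required to match these numbers, and the degree-weighted statistic is exactly what forces the reweighting from the previous slide.

```python
# Monte Carlo sketch of the right-hand side of (*): sample G' = G(n,1/2) + planted
# clique on S, set x' = indicator of S, and average two simple statistics p_G(x).
import numpy as np

def planted_rhs(n, k, trials=200, rng=np.random.default_rng(1)):
    count, deg_weighted = 0.0, 0.0
    for _ in range(trials):
        A = rng.integers(0, 2, size=(n, n))
        A = np.triu(A, 1); A = A + A.T
        S = rng.choice(n, size=k, replace=False)
        A[np.ix_(S, S)] = 1; A[S, S] = 0          # plant the clique on S
        x = np.zeros(n); x[S] = 1.0
        count += x.sum()                           # p_G(x) = sum_i x_i
        deg_weighted += A.sum(axis=1) @ x          # p_G(x) = sum_i deg_G(i) x_i
    return count / trials, deg_weighted / trials

print(planted_rhs(n=1000, k=31))
# The first value is k. The second is ~ k*n/2 + k^2/2: larger than the k*n/2 one
# would get by ignoring degrees (E_mu[x_i] = k/n for every i). Matching it is the
# calibration that forces the degree reweighting from the previous slide.
```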
Why is this interesting?
• Shows SoS captures Bayesian reasoning in a way that
other algorithms do not.
• Suggests new way to define what a computationally
bounded observer knows about some quantity..
• ..and a more principled way to design algorithms based
on such knowledge. (see [B-Kelner-Steurer’14,’15])
Even if SoS is not the optimal algorithm we’re looking for, the
dream of a more general theory of hardness, easiness and
knowledge is worth pursuing.
Thanks!!