Bayesianism, Convexity, and the quest towards Optimal Algorithms
Boaz Barak (Harvard University / Microsoft Research)
Partially based on work in progress with Sam Hopkins, Jon Kelner, Pravesh Kothari, Ankur Moitra and Aaron Potechin.

Talk Plan
• Dubious historical analogy.
• Philosophize about automating algorithms.
• Wave hands about convexity and the Sum of Squares algorithm.
• Sudden shift to Bayesianism vs. Frequentism.
• Work in progress on the planted clique problem.

Skipping today:
• Sparse coding / dictionary learning / tensor completion [B-Kelner-Steurer'14,'15, B-Moitra'15]
• Unique Games Conjecture / small-set expansion [..B-Brandao-Harrow-Kelner-Steurer-Zhou'12..]
• Connections to quantum information theory

Prologue: Solving equations
• Babylonians (~2000 BC): solutions for quadratic equations.
• del Ferro-Tartaglia-Cardano-Ferrari (1500's): solutions for cubics and quartics.
• van Roomen/Viète (1593): "Challenge all mathematicians in the world":
  x^45 − 45x^43 + ⋯ + 45x = √(7/4 + ⋯ √(45/64))
• Euler (1740's): special cases of quintics.
• Vandermonde (1777): solve x^11 = 1 with square and fifth roots.
• Gauss (1796): x^17 = 1, with root
  (−1 + √17 + √(34+2√17))/8 + √(68 + 12√17 − 16√(34+2√17) − 2(1−√17)√(34−2√17))/8 + …
• Ruffini-Abel-Galois (early 1800's):
  • Some equations can't be solved in radicals.
  • Characterization of solvable equations.
  • Birth of group theory.
  • The 17-gon construction is now "boring": a few lines of Mathematica.

A prototypical TCS paper
Interesting problem ⇒ either an efficient algorithm (e.g., MAX-FLOW is in P), placing it in "Algorithmica", or a hardness reduction (e.g., MAX-CUT is NP-hard), placing it in "Intractabilia".
Can we make algorithms boring? Can we reduce creativity in algorithm design? Can we characterize the "easy" problems?

Theme: Convexity (Algorithmica vs. Intractabilia)

Convexity in optimization
Interesting problem → (Creativity!!) → convex problem → general solver.
Example: can embed {±1} in [−1, +1] or in {x : ‖x‖ = 1} [Goemans-Williamson'94].

Convexity in optimization: the Sum of Squares algorithm [Shor'87, Parrilo'00, Lasserre'01]
Universal embedding of any* optimization problem into an n^d-dimensional convex set.
Algorithmic version of works related to Hilbert's 17th problem [Artin'27, Krivine'64, Stengle'74].
• Both the "quality" of the embedding and the running time grow with d.
• d = n ⇒ optimal solution, exponential time.
• Encapsulates many natural algorithms; optimal among a natural class [Lee-Raghavendra-Steurer'15].
Hope*: a problem is easy iff it is embeddable with small d.

Talk Plan
• Dubious historical analogy.
• Philosophize about automating algorithms.
• Wave hands about convexity and the Sum of Squares algorithm.
• Sudden shift to Bayesianism vs. Frequentism.
• Non-results on the planted clique problem.

Frequentists vs. Bayesians
"There is a 10% chance that the 10^20-th digit of π is 7."
"Nonsense! The digit either is 7 or isn't."
"I will take an 11:1 bet on this."

Planted Clique Problem [Karp'76, Kucera'95]
Distinguish between G(n, 1/2) and G(n, 1/2) + k-clique.
Central problem in average-case complexity:
• Cryptography [Juels'02, Applebaum-B-Wigderson'10]
• Motifs in biological networks [Milo et al. Science'02, Lotem et al. PNAS'04, ...]
• Sparse principal component analysis [Berthet-Rigollet'12]
• Nash equilibrium [Hazan-Krauthgamer'09]
• Certifying the Restricted Isometry Property [Koiran-Zouzias'12]
No polynomial-time algorithm is known when k ≪ √n. (Image credit: Andrea Montanari)
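To make the two distributions concrete, here is a minimal sketch, not from the talk: numpy, the sizes n and k, and the helper names are illustrative assumptions. The distinguishing problem is: given only an adjacency matrix, decide which of the two procedures below produced it.

```python
import numpy as np

def random_graph(n, rng):
    """Adjacency matrix of G(n, 1/2): each edge present independently with prob. 1/2."""
    A = np.triu(rng.integers(0, 2, size=(n, n)), 1)
    return A + A.T

def plant_clique(A, k, rng):
    """Copy of A with a clique planted on a uniformly random set S of k vertices."""
    B = A.copy()
    S = rng.choice(len(A), size=k, replace=False)
    B[np.ix_(S, S)] = 1
    np.fill_diagonal(B, 0)          # keep the graph simple (no self-loops)
    return B, S

rng = np.random.default_rng(0)
n, k = 1000, 40                     # k around sqrt(n): the regime with no known poly-time distinguisher
A = random_graph(n, rng)            # the "null" distribution G(n, 1/2)
B, S = plant_clique(A, k, rng)      # the "planted" distribution G(n, 1/2) + k-clique
```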
Planted Clique Problem [Karp'76, Kucera'95]
Distinguish between G(n, 1/2) and G(n, 1/2) + k-clique.
"Vertex 17 is in the clique with probability ≈ k/n."
"Nonsense! The probability is either 0 or 1."

Making this formal
Classical Bayesian uncertainty: a posterior distribution
  μ: {0,1}^n → ℝ with Σ_x μ(x) = 1 and μ(x) ≥ 0 for all x,
  consistent with the observations: x_i x_j = 0 for all {i,j} ∉ E(G) and all x in the support of μ.
Computational analogue: a degree-d pseudo-distribution
  μ: {0,1}^n → ℝ with Σ_x μ(x) = 1 and E_μ[p^2] ≥ 0 for every p with deg p ≤ d/2,
  consistent with the observations: E_μ[x_i x_j] = 0 for all {i,j} ∉ E(G).
This is a convex set, defined by roughly n^d linear equations plus a PSD constraint.
"Vertex 17 is in the clique with probability E_μ[x_17]."
Definition*: clique_d(G) = max of E_μ[Σ_i x_i] over degree-d pseudo-distributions μ consistent with G.
Corollary: clique_d(G) ≥ ω(G) for all d.
Open Question: Is clique_d(G) ≪ √n for some d = O(1)?
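For d = 2 the convex set above is a small semidefinite program, so clique_2(G) can be computed directly. Below is a minimal sketch under assumptions not in the talk (the cvxpy package with its SCS solver, and an illustrative graph size): the degree-2 pseudo-distribution is represented by its moment matrix, and the consistency conditions become linear equalities plus one PSD constraint.

```python
import cvxpy as cp
import numpy as np

n = 30
rng = np.random.default_rng(0)
A = np.triu(rng.integers(0, 2, size=(n, n)), 1)
A = A + A.T                                    # adjacency matrix of a sample of G(n, 1/2)

# Moment matrix indexed by (1, x_1, ..., x_n):
# M[0,0] = E[1], M[0,i] = E[x_i], M[i,j] = E[x_i x_j].
M = cp.Variable((n + 1, n + 1), symmetric=True)
constraints = [M >> 0, M[0, 0] == 1]           # E[p^2] >= 0 for every p of degree <= 1
for i in range(n):
    constraints.append(M[i + 1, i + 1] == M[0, i + 1])    # x_i^2 = x_i (0/1 variables)
    for j in range(i + 1, n):
        if A[i, j] == 0:
            constraints.append(M[i + 1, j + 1] == 0)      # non-edge: x_i * x_j = 0

prob = cp.Problem(cp.Maximize(cp.sum(M[0, 1:])), constraints)   # max E[sum_i x_i]
prob.solve(solver=cp.SCS)
print("clique_2(G) =", prob.value)             # upper-bounds the true clique number, up to solver accuracy
```

Larger d replaces this matrix by one indexed by all monomials of degree at most d/2, which is the n^d-dimensional convex set mentioned earlier.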
Open Question: Is clique_d(G) ≪ √n for some d = O(1)?
"Theorem" [Meka-Wigderson'13]: For G ∼ G(n, 1/2), clique_d(G) ≅ √n.
"Proof": Let k = δ√n and define μ of "maximal ignorance":
  if {i,j} is an edge then E_μ[x_i x_j] ≅ k^2/n^2; if {i,j,ℓ} is a triangle then E_μ[x_i x_j x_ℓ] ≅ k^3/n^3; …
  Then μ is a valid pseudo-distribution, assuming a matrix-valued Chernoff bound for the higher-degree moments.
Bug [Pisier]: the concentration bound is false. In fact, for k > n^(1/3) there exists p of degree 2 with E_μ[p^2] < 0.
[Kelner]: the maximal-ignorance moments are OK for k ∼ n^(1/(d/2+1)).
[Meka-Potechin-Wigderson'15, Deshpande-Montanari'15, Hopkins-Kothari-Potechin'15]

MW's "conceptual" error
Pseudo-distributions should be as simple as possible, but not simpler. (Following A. Einstein.)
Pseudo-distributions should have maximum entropy, but respect the data.
MW violated Bayesian reasoning: consider E_μ[x_i] = Pr[i in clique] for a vertex with deg(i) = n/2 + Δ, where Δ = Θ(√n).
According to MW: E_μ[x_i] = k/n, regardless of Δ.
By Bayesian reasoning: i ∉ clique ⇒ deg(i) ∼ n/2, while i ∈ clique ⇒ deg(i) ∼ n/2 + k/2,
so E_μ[x_i] should be reweighed by exp(−2(Δ − k/2)^2/n) / exp(−2Δ^2/n) ∝ exp(2kΔ/n).

Going Bayesian
Theorem [B-Hopkins-Kelner-Kothari-Moitra]: For every d and ε > 0, w.h.p. clique_d(G(n, 1/2)) ≥ n^(1/2−ε).
(The d = 4 case was recently shown by [Hopkins-Kothari-Potechin-Raghavendra-Schramm'16].)
Proof: For every graph G we define μ = μ(G): {0,1}^n → ℝ such that
  E_μ[Σ_i x_i] ≥ n^(1/2−ε) and E_μ[x_i x_j] = 0 for all {i,j} ∉ E(G).
Bayesian desiderata: for every "simple" map G ↦ p_G with deg p_G ≤ d,
  E_{G∼G(n,1/2)}[E_{μ(G)}[p_G]] ≅ E_{G′∼G(n,1/2)∪K_k, x′=1_K}[p_{G′}(x′)]   (*)
Crucial observation: if "simple" means low degree, then (*) essentially* determines the moments: no creativity needed!!

Why is this interesting?
• Shows SoS captures Bayesian reasoning in a way that other algorithms do not.
• Suggests a new way to define what a computationally bounded observer knows about some quantity...
• ...and a more principled way to design algorithms based on such knowledge (see [B-Kelner-Steurer'14,'15]).
Even if SoS is not the optimal algorithm we're looking for, the dream of a more general theory of hardness, easiness and knowledge is worth pursuing.

Thanks!!