Games, Proofs, Norms, and Algorithms
Boaz Barak – Microsoft Research
Based (mostly) on joint works with Jonathan Kelner and David Steurer

This talk is about:
• Hilbert's 17th problem / Positivstellensatz
• Proof complexity
• Semidefinite programming
• The Unique Games Conjecture
• Machine learning
• Cryptography… (in spirit)

Theorem: ∀x ∈ ℝ, 10x − x² ≤ 25
Proof: 10x − x² = 25 − (x − 5)²   (a "square completion" / sum-of-squares certificate)

[Minkowski 1885, Hilbert 1888, Motzkin 1967]: ∃ a (multivariate) polynomial inequality without any "square completion" proof.

Hilbert's 17th problem: Can we always prove P(x₁, …, xₙ) ≤ C by showing P = C − sos/(1 + sos′), where sos and sos′ are sums of squares of polynomials?
[Artin '27, Krivine '64, Stengle '73]: Yes! Even for more general systems of polynomial equations. Known as the "Positivstellensatz".

[Grigoriev-Vorobjov '99]: Measure the complexity of a proof by the degree of sos, sos′.
• Typical TCS inequalities (e.g., bounding P(x) for x ∈ {0,1}ⁿ) have degree O(n).
• Often the degree is much smaller.
• Exception: probabilistic-method examples requiring Ω(n) degree [Grigoriev '99].
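The square-completion example above, and the Motzkin-style obstruction, can be checked concretely. A small sketch assuming sympy is available; the grid evaluation is only a spot-check of nonnegativity, not a proof:

```python
import sympy as sp

x, y = sp.symbols("x y")

# The talk's toy theorem: 10x - x^2 <= 25, certified by completing the square.
assert sp.expand(25 - (x - 5) ** 2) == sp.expand(10 * x - x ** 2)

# Motzkin's polynomial: nonnegative on all of R^2 (by AM-GM), yet provably
# NOT a sum of squares -- so square completion alone cannot certify
# every true polynomial inequality.
motzkin = x**4 * y**2 + x**2 * y**4 - 3 * x**2 * y**2 + 1
vals = [motzkin.subs({x: a / 4, y: b / 4})
        for a in range(-8, 9) for b in range(-8, 9)]
assert all(v >= 0 for v in vals)  # spot-check nonnegativity on a grid
```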
SOS / Lasserre SDP hierarchy
[Shor '87, Parrilo '00, Nesterov '00, Lasserre '01]: Degree-d SOS proofs for n-variable inequalities can be found in n^O(d) time.

General algorithm for polynomial optimization: maximize P(x) over x ∈ {0,1}ⁿ.
(More generally: optimize over x s.t. P₁(x) = … = P_k(x) = 0 for low-degree P₁, …, P_k.)
Efficient if ∃ a low-degree SOS proof of the bound; exponential time in the worst case.

This talk: a general method to analyze the SOS algorithm. [B-Kelner-Steurer '13]
Applications:
• Optimizing polynomials with non-negative coefficients over the sphere.
• Algorithms for the quantum separability problem [Brandao-Harrow '13].
• Finding sparse vectors in subspaces:
  • Non-trivial worst-case approximation, with implications for the small-set expansion problem.
  • Strong average-case approximation, with implications for machine learning and optimization [Demanet-Hand '13].
• An approach to refuting the Unique Games Conjecture.
• Learning sparse dictionaries beyond the √n barrier.

Rest of this talk:
• SOS proofs: previously used for lower bounds, here used for upper bounds.
• Define "pseudoexpectations", aka "fake marginals".
• The pseudoexpectation ↔ SOS proofs connection.
• Describe a general rounding approach: using pseudoexpectations for combining ⇒ rounding.
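The n^O(d) claim has a closed-form special case worth seeing: for a quadratic objective over the unit sphere, the degree-2 SOS bound is the top eigenvalue, because λmax·‖x‖² − xᵀAx is a sum of squares exactly when λmax·I − A is p.s.d. A minimal numpy sketch of this illustrative special case (not the general SDP solver):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                              # random symmetric objective x^T A x

# Degree-2 SOS certificate for max of x^T A x over the unit sphere:
# lmax*||x||^2 - x^T A x is a sum of squares iff lmax*I - A is psd.
lmax = np.linalg.eigvalsh(A)[-1]
C = lmax * np.eye(n) - A
assert np.all(np.linalg.eigvalsh(C) > -1e-9)   # psd => SOS certificate exists

# The bound is tight: the top eigenvector attains it.
u = np.linalg.eigh(A)[1][:, -1]
assert abs(u @ A @ u - lmax) < 1e-9
```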
• Example: finding sparse vectors in subspaces (main tool: hypercontractive norms ‖·‖_{q→p} for p > q)
• Algorithms for the quantum separability problem [Brandao-Harrow '13]
• Relation to the Unique Games Conjecture
• Future directions

Problem: Given low-degree P, P₁, …, P_k : ℝⁿ → ℝ, maximize P(x) s.t. P_i(x) = 0 for all i.
Hard: encapsulates SAT, CLIQUE, MAX-CUT, etc.

Easier problem: given many good solutions, find a single OK one.
Given a (multi)set S of x's s.t. P(x) ≥ v and P_i(x) = 0 for all i, a non-trivial combiner depends only on the low-degree marginals of S,
{ E_{x∼S} x_{i₁} ⋯ x_{i_d} } for i₁, …, i_d ∈ [n],
and outputs a single x* s.t. P(x*) ≥ v′ and P_i(x*) = 0 for all i.

[B-Kelner-Steurer '13]: Transform "simple" non-trivial combiners into an algorithm for the original problem. Crypto flavor…
Idea in a nutshell: simple combiners output a solution even when fed "fake marginals".

Next: definition of "fake marginals".
Def: A degree-d pseudoexpectation is an operator mapping every polynomial P of degree ≤ d to a number Ẽ P, satisfying:
• Normalization: Ẽ 1 = 1
• Linearity: Ẽ(aP + bQ) = a·Ẽ P + b·Ẽ Q for all P, Q of degree ≤ d
• Positivity: Ẽ P² ≥ 0 for all P of degree ≤ d/2

Dual view of SOS/Lasserre: can describe the operator as an n^{d/2} × n^{d/2} matrix M, indexed by monomials of degree ≤ d/2, with M_{I,J} = Ẽ x^I x^J.
The positivity condition means M is p.s.d.: Pᵀ M P ≥ 0 for every coefficient vector P ∈ ℝ^{n^{d/2}}.
⇒ Can optimize over degree-d pseudoexpectations in n^{O(d)} time.

Fundamental fact: ∃ a degree-d SOS proof of P > 0 ⇔ Ẽ P > 0 for every degree-d pseudoexpectation operator Ẽ.

Take-home message:
• A pseudoexpectation "looks like" a real expectation to low-degree polynomials.
• Can efficiently find a pseudoexpectation matching any given polynomial constraints.
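The three conditions in the definition can be seen concretely in the dual (moment matrix) view: any genuine distribution yields a valid pseudoexpectation, and its moment matrix is p.s.d. A small numpy sketch for degree 2, using empirical moments of sampled points (the sample size and dimension are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
N, n = 50, 4                                   # 50 sample points in R^4

# Empirical moment matrix over degree <= 1 monomials (1, x_1, ..., x_n):
# M[a, b] = E[mono_a(x) * mono_b(x)] -- a genuine degree-2 expectation operator.
xs = rng.standard_normal((N, n))
monos = np.hstack([np.ones((N, 1)), xs])
M = monos.T @ monos / N

assert abs(M[0, 0] - 1) < 1e-12                # normalization: E[1] = 1
assert np.all(np.linalg.eigvalsh(M) >= -1e-9)  # positivity: p^T M p = E[p(x)^2] >= 0
```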
• Proofs about real random variables can often be "lifted" to pseudoexpectations.

Combining ⇒ Rounding
Problem: Given low-degree P, P₁, …, P_k : ℝⁿ → ℝ, maximize P(x) s.t. P_i(x) = 0 for all i.
[B-Kelner-Steurer '13]: Transform "simple" non-trivial combiners into an algorithm for the original problem.
Non-trivial combiner: an algorithm C with
• Input: { E X_{i₁} ⋯ X_{i_d} } for i₁, …, i_d ∈ [n], where X is a r.v. over ℝⁿ s.t. E (P(X) − v)² = 0 and E P_i(X)² = 0 for all i.
• Output: x* ∈ ℝⁿ s.t. P(x*) ≥ v/2 and P_i(x*) = 0 for all i.

Crucial observation: If the proof that x* is a good solution is in the SOS framework, then it holds even when C is fed a pseudoexpectation.
Corollary: In this case we can find x* efficiently:
• Use the SOS SDP to find a pseudoexpectation matching the input conditions.
• Use C to round the SDP solution into an actual solution x*.

Example: Finding a planted sparse vector
Let unit v⁰ ∈ ℝⁿ be sparse (|Supp(v⁰)| = μn), and let v¹, …, vᵈ ∈ ℝⁿ be random.
Goal: Given a basis for V = Span{v⁰, …, vᵈ}, find v⁰.
(Motivation: machine learning, optimization [Demanet-Hand '13]; a worst-case variant is the algorithmic bottleneck in the UG/SSE algorithm of [Arora-B-Steurer '10].)
Previous best results: μ ≪ 1/√n [Spielman-Wang-Wright '12, Demanet-Hand '13].
We show: μ ≪ 1 is sufficient, as long as d ≤ √n.
Approach: v⁰ is "spiky" (its mass sits on few coordinates), while any vector v ∈ Span{v¹, …, vᵈ} is "spread out" (Gaussian-like coordinates). In particular one can prove ‖v⁰‖₄⁴ ≫ ‖v‖₄⁴ for every unit v ∈ Span{v¹, …, vᵈ}.
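The ‖v⁰‖₄⁴ ≫ ‖v‖₄⁴ separation behind the approach is easy to see numerically: a μn-sparse flat unit vector has ‖·‖₄⁴ = 1/(μn), while a random unit vector concentrates around 3/n. A numpy spot-check with one random draw (illustrative parameters, not a proof):

```python
import numpy as np

rng = np.random.default_rng(2)
n, mu = 10000, 0.01
k = int(mu * n)

v0 = np.zeros(n)
v0[:k] = 1 / np.sqrt(k)                        # mu*n-sparse unit vector: ||v0||_4^4 = 1/k

g = rng.standard_normal(n)
v = g / np.linalg.norm(g)                      # random unit vector: ||v||_4^4 ~ 3/n

assert abs(np.sum(v0 ** 4) - 1 / k) < 1e-12
assert np.sum(v0 ** 4) > 10 * np.sum(v ** 4)   # 1/(mu*n) = 0.01 vs. roughly 3e-4
```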
Lemma: If w ∈ V is a unit vector with ‖w‖₄⁴ ≥ (1 − o(1))‖v⁰‖₄⁴, then ⟨w, v⁰⟩ ≥ 1 − o(1), i.e., w is essentially v⁰.
Proof: Write w = αv⁰ + w′ with w′ ∈ Span{v¹, …, vᵈ}. Then
(1 − o(1))‖v⁰‖₄ ≤ ‖w‖₄ ≤ α‖v⁰‖₄ + ‖w′‖₄ ≤ α‖v⁰‖₄ + o(‖v⁰‖₄),
so α ≥ 1 − o(1).

Corollary: If D is a distribution over such w's, then the top eigenvector of E_{w∼D} w⊗² is (1 − o(1))-correlated with v⁰.
The algorithm follows by noting that the Lemma has an SOS proof. Hence even when D is only a pseudoexpectation, we can still recover v⁰ from its moments.

Other results
• Solve the sparse vector problem* for an arbitrary (worst-case) subspace V if μ ≪ d^{−1/3}.
• Sparse dictionary learning (aka "sparse coding" / "blind source separation"): recover v¹, …, vᵐ ∈ ℝⁿ from random μ-sparse linear combinations of them. An important tool for unsupervised learning.
  Previous work: only for μ ≪ 1/√n [Spielman-Wang-Wright '12, Arora-Ge-Moitra '13, Agarwal-Anandkumar-Netrapalli '13].
  Our result: any μ ≪ 1 (can also handle m > n).
• [Brandao-Harrow '12]: Using our techniques, find a separable quantum state maximizing a "local operations and classical communication" (LOCC) measurement.

A personal overview of the Unique Games Conjecture
Unique Games Conjecture [Khot '02, Raghavendra-Steurer '08]: the UG/SSE problem is NP-hard.
Reasons to believe:
• "Standard crypto heuristic": tried to solve it and couldn't.
• Very clean picture of the complexity landscape: simple algorithms are optimal [Khot '02 … Raghavendra '08 …].
Reasons to suspect:
• Random instances are easy via a simple algorithm [Arora-Khot-Kolla-Steurer-Tulsiani-Vishnoi '05].
• The SOS proof system.
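The corollary's recovery step, "take the top eigenvector of E_{w∼D} w⊗²", can be sketched on a toy stand-in for D: noisy copies of the planted vector. The noise model and parameters below are illustrative, not the actual SOS pseudodistribution:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
v0 = np.zeros(n)
v0[:5] = 1 / np.sqrt(5)                        # planted sparse unit vector

# Toy distribution D of near-optimal solutions: small perturbations of v0,
# renormalized to unit length.
ws = v0 + 0.02 * rng.standard_normal((200, n))
ws /= np.linalg.norm(ws, axis=1, keepdims=True)

M2 = ws.T @ ws / len(ws)                       # second moment E_{w~D}[w w^T]
top = np.linalg.eigh(M2)[1][:, -1]             # its top eigenvector
assert abs(top @ v0) > 0.9                     # highly correlated with v0
```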
• Simple poly-time algorithms can't refute it [Khot-Vishnoi '04]; quasipoly algorithm on the KV instance [Kolla '10].
• Simple subexponential algorithms can't refute it [B-Gopalan-Håstad-Meka-Raghavendra-Steurer '12]; subexponential algorithm [Arora-B-Steurer '10].
• SOS solves all candidate hard instances [B-Brandao-Harrow-Kelner-Steurer-Zhou '12].
• SOS useful for the sparse vector problem; candidate algorithm for the search problem [B-Kelner-Steurer '13].

Conclusions
• Sum of Squares is a powerful algorithmic framework that can yield strong results for the right problems. (Contrast with previous results on SDP/LP hierarchies, which showed lower bounds when using either the wrong hierarchy or the wrong problem.)
• The "combiner" view lets us focus on the features of the problem rather than the details of the relaxation.
• SOS seems particularly useful for problems with some geometric structure, including several problems related to unique games and machine learning.
• We still have only a rudimentary understanding of when SOS works and when it doesn't.
• Other proof complexity ↔ approximation algorithms connections?