Time-Space Tradeoffs in Proof Complexity: Superpolynomial Lower Bounds for Superlinear Space
Chris Beck, Princeton University
Joint work with Paul Beame & Russell Impagliazzo, Jakob Nordström & Bangsheng Tang

SAT
• The satisfiability problem is a central problem in computer science, in theory and in practice.
• Terminology:
– A clause is a Boolean formula which is an OR of variables and negations of variables.
– A CNF formula is an AND of clauses.
• Object of study for this talk — CNF-SAT: given a CNF formula, is it satisfiable?

Resolution Proof System
• Lines are clauses; one simple proof step.
• A proof is a sequence of clauses, each of which is
– an original clause, or
– follows from previous ones via the resolution step.
• A CNF is UNSAT iff we can derive the empty clause ⊥.

Proof DAG
• General resolution: arbitrary DAG.
• Tree-like resolution: the DAG is a tree.

SAT Solvers
• Well-known connection between Resolution and SAT solvers based on backtracking.
• These algorithms are very powerful — sometimes they can quickly handle CNFs with millions of variables.
• On UNSAT formulas, the computation history yields a Resolution proof.
– Tree-like Resolution ≈ DPLL algorithm
– General Resolution ≳ DPLL + "Clause Learning"
• The best current SAT solvers use this approach.

SAT Solvers
• DPLL algorithm: backtracking search for a satisfying assignment.
– Given a formula which is not constant, guess a value for one of its variables x, and recurse on the simplified formula.
– If we don't find an assignment, set x the other way and recurse again.
– If one of the recursive calls yields a satisfying assignment, adjoin the good value of x and return; otherwise report failure.

SAT Solvers
• DPLL requires very little memory.
• Clause learning adds a new clause to the input CNF every time the search backtracks.
– Uses lots of memory to try to beat DPLL.
– In practice, must use heuristics to guess which clauses are "important" and store only those. Hard to do well! Memory becomes a bottleneck.
• Question: Is this inherent?
Or can the right heuristics avoid the memory bottleneck?

Proof Complexity & SAT Solvers
• Proof Size ≤ Time for an ideal SAT solver.
• Proof Space ≤ Memory for an ideal SAT solver.
• Many explicit hard UNSAT examples are known with exponential lower bounds on Resolution proof size.
• Question: Is this also true for proof space?

Space in Resolution
• (figure: the clauses C₁, C₂, … in memory at time step t, on the way to ⊥ — these must all be stored.)
• Clause space := max over time steps t of the number of active clauses. [Esteban, Torán '99]

Lower Bounds on Space?
• Considering space alone, tight lower and upper bounds of Θ(n) for explicit tautologies of size n.
• Lower bounds: [ET '99, ABRW '00, T '01, AD '03].
• Upper bound: all UNSAT formulas on n variables have a tree-like refutation of space ≤ n. [Esteban, Torán '99]
• For a tree-like proof, space is at most the height of the tree, which is the stack height of the DPLL search.
• But these tree-like proofs are typically too large to be practical, so we should instead ask whether small space and small size are simultaneously feasible.

Size-Space Tradeoffs for Resolution
• [Ben-Sasson '01] Pebbling formulas with linear-size refutations, but for which all proofs have Space · log Size = Ω(n / log n).
• [Ben-Sasson, Nordström '10] Pebbling formulas which can be refuted in Size O(n), Space O(n), but Space O(n / log n) forces Size exp(n^Ω(1)).
• But these are all for Space < n, and SAT solvers generally can afford to store the input formula in memory. Can we break the linear space barrier?

Size-Space Tradeoffs
• Informal question: Can we formally show that memory rather than time can be a real bottleneck for resolution proofs and SAT solvers?
• Formal question (Ben-Sasson): "Does there exist a c such that any CNF with a refutation of size T also has a refutation of size T^c in space O(n)?"
• Our results: families of formulas of size n having refutations in Time and Space ≈ n^k, but all resolution refutations require T ≥ (n^{0.58k}/S)^{Ω(log log n / log log log n)}.

Tseitin Tautologies
• Given an undirected graph G = (V, E) and a charge function χ : V → 𝔽₂, define a CSP:
– Boolean variables: x_e for each edge e ∈ E.
– Parity constraints (linear equations): for each vertex v, ⊕_{e ∋ v} x_e = χ(v).
• When χ has odd total parity, the CSP is UNSAT.

Tseitin Tautologies
• When χ is odd and G is connected, the corresponding CNF is called a Tseitin tautology. [Tseitin '68]
• Only the total parity of χ matters.
• Hard when G is a constant-degree expander:
– [Urquhart '87]: Resolution size = 2^{Ω(|E|)}
– [Torán '99]: Resolution space = Ω(|E|)
• This work: tradeoffs on the n × l grid, l ≫ n, and similar graphs, using isoperimetry.

Tseitin formula on Grid
• Consider the Tseitin formula on the n × l grid, l = 4n².
• How can we build a resolution refutation?

Tseitin formula on Grid
• One idea: mimic the linear-algebra refutation.
• If we add the linear equations in any order, we get 1 = 0.
• A linear equation on k variables corresponds to 2^{k−1} clauses. Resolution is implicationally complete, so it can simulate each step of this with 2^{O(k)} blowup.

Tseitin formula on Grid
• One idea: mimic the linear-algebra refutation.
• If we add the linear equations in column order, we never have an intermediate equation with more than ≈ n variables.
• Get a proof of Size ≈ #vertices · 2^n, Space ≈ 2^n.

Tseitin formula on Grid
• Different idea: divide and conquer.
• In DPLL, repeatedly bisect the graph.
• Each time we cut, one of the components is UNSAT. Done after n·log|V| queries; a tree-like proof with Space ≈ n·log(nl), Size ≈ (nl)^n.
• Savitch-like savings in space, for l = poly(n).

Tseitin formula on Grid
• "Chris, isn't this just tree-width?" — Exactly.
• Our work is about whether this can be improved.
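The column-order elimination strategy above can be checked concretely. Below is a minimal Python sketch (function and variable names are mine, not from the talk) that sums the Tseitin parity constraints of an n × l grid in column-major order over 𝔽₂: it derives the contradiction 0 = 1 and confirms that no intermediate equation ever touches more than n + 1 edge variables (the slide's ≈ n bound; the extra one is the in-progress vertical edge mid-column).

```python
from itertools import product

def tseitin_grid_elimination(n, l):
    """Sum the Tseitin parity equations of an n x l grid in column order.

    Each vertex (i, j) contributes the GF(2) equation
        XOR of its incident edge variables = chi(i, j),
    where chi is 1 only at vertex (0, 0), so the total charge is odd
    and the system is unsatisfiable.  Returns the final linear
    combination (support, rhs) and the maximum intermediate support.
    """
    def edges_of(i, j):
        nbrs = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
        return {frozenset({(i, j), (a, b)})
                for a, b in nbrs if 0 <= a < n and 0 <= b < l}

    support, rhs = set(), 0                    # running linear combination
    max_width = 0
    for j, i in product(range(l), range(n)):   # column-major vertex order
        support ^= edges_of(i, j)              # GF(2) addition of the equation
        rhs ^= 1 if (i, j) == (0, 0) else 0    # chi is 1 only at (0, 0)
        max_width = max(max_width, len(support))
    return support, rhs, max_width
```

Running `tseitin_grid_elimination(4, 64)` ends with empty support and right-hand side 1 — the contradiction 0 = 1 — and the intermediate width stays at most n + 1 = 5, whereas an arbitrary addition order could accumulate Θ(n·l) variables.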
• The lower bound shows that when the space is below 2^n, a quasipolynomial blowup in size is necessary for the proof systems we studied.
• For technical reasons, we work with a "doubled" grid.

High Level Overview of Lower Bound
• [Beame, Pitassi '96] simplified previous proof-size lower bounds using random restrictions.
• A restriction ρ to a formula φ is a partial assignment of truth values to variables, resulting in some simplification. We denote the restricted formula by φ|ρ.
• If Π is a proof of φ, then Π|ρ is a proof of φ|ρ.

High Level Overview of Lower Bound
• Philosophy of [Beame, Pitassi '96]:
– Find φ such that any refutation contains a "complex clause".
– Find φ′ and a random restriction which restricts φ′ to φ, but also kills any complex clause whp.
– A modern form of "bottleneck counting", due to Haken. Can reinterpret this as finding a large collection of assignments, one extending each restriction, each passing through a wide clause.

High Level Overview of Lower Bound
• This work: multiple classes of complex clauses.
– Can state the main idea as a pebbling argument.
– Suppose G is a DAG with T vertices that can be scheduled in space S, and the vertices can be broken into 4 classes, μ : V → {1, 2, 3, 4}.
– Assume that for every arc (u, v), μ(v) ≤ μ(u) + 1, i.e. the level never jumps up by more than one along an arc.
– All sources are in class 1, all sinks in class 4. Everything in class 2 is a "complex clause" of the first type; everything in class 3 is a "complex clause" of the second type.

High Level Overview of Lower Bound
• Suppose there is a distribution Π on source-to-sink paths and a real number p such that:
– For any vertex v with μ(v) ∈ {2, 3}: Pr_{π∼Π}[v ∈ π] ≤ p.
– For any two vertices v₁, v₂ with μ(v₁) = 2, μ(v₂) = 3: Pr_{π∼Π}[v₁, v₂ ∈ π] ≤ p².
• Then T²S = Ω(p⁻³).

High Level Overview of Lower Bound
• Proof:
– Let v₁ … v_T be the sequence of pebbling steps, using only S pebbles. Divide time into m epochs, m to be determined later.
– Plot μ vs. time. (figure: levels 1–4 on the vertical axis, time on the horizontal axis.)

High Level Overview of Lower Bound
• Let π be any source-to-sink path.
• Claim:
– Either π hits a vertex of level 2 and a vertex of level 3 during the same epoch,
– or π hits a vertex of level 2 or 3 at one of the breakpoints between epochs.

High Level Overview of Lower Bound
• Now choose π randomly.
– Either π hits a vertex of level 2 and a vertex of level 3 during the same epoch. Probability of this: by a union bound, at most m·(T/m)²·p².
– Or π hits a vertex of level 2 or 3 at one of the breakpoints between epochs. Probability of this: by a union bound, at most m·S·p.

High Level Overview of Lower Bound
• Conclude: 1 ≤ m⁻¹T²p² + mSp.
• Optimizing m yields T²S = Ω(p⁻³).
• Previous such results were only for expanders, etc.
• The argument gets stronger with more levels. Later we will see that with l levels, we get T ≥ (1/(pS))^{Ω(log l / log log l)}.

High Level Overview of Lower Bound
• Goal for the remainder of the analysis:
– Show that any proof of Tseitin on the doubled grid has such a flow (using restrictions), for the best p and as many complex clause classes as possible.
– It is enough that any k complex clauses from distinct classes have collective width ≥ kW for some large W. (For the n × l grid, W will be n.)
– How did we get complex clauses before, in [Beame, Pitassi '96]? A "measure of progress" argument…

Progress Measure
• The following progress measure construction appeared in [Ben-Sasson, Wigderson '01].
• Let A₁ … A_m be a partition of the clauses of an UNSAT CNF. Assume it is minimally unsatisfiable.
• For any clause C, define μ(C) := min{ |S| : S ⊆ [m], ⋀_{i∈S} A_i ⊨ C }.

Progress Measure
• μ is a sub-additive complexity measure:
– μ(initial clause) = 1, μ(⊥) = m,
– μ(C) ≤ μ(C₁) + μ(C₂) when C₁, C₂ ⊢ C.
• In any refutation, there must occur a clause C with μ(C)/m ∈ [1/3, 2/3]. In previous work, choose A₁ … A_m so that this implies C is wide.
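The existence of a medium-complexity clause is the intermediate-value consequence of sub-additivity; spelled out (assuming m > 3, so that axioms, with μ = 1, sit below the threshold):

```latex
% Axioms have \mu = 1, the final clause has \mu(\bot) = m, and
% \mu(C) \le \mu(C_1) + \mu(C_2) for every resolution step C_1, C_2 \vdash C.
\begin{aligned}
&\text{Let } C \text{ be the first clause in the refutation with } \mu(C) \ge m/3
  \text{ (it exists, since } \mu(\bot) = m\text{).} \\
&\text{Since } \mu = 1 < m/3 \text{ for axioms, } C \text{ is derived, and its
  premises } C_1, C_2 \text{ occur earlier, so } \mu(C_1), \mu(C_2) < m/3. \\
&\text{By sub-additivity: } \mu(C) \le \mu(C_1) + \mu(C_2) < \tfrac{2m}{3}. \\
&\text{Hence } \mu(C)/m \in \left[\tfrac{1}{3}, \tfrac{2}{3}\right].
\end{aligned}
```

The same first-crossing argument is what rules out skipping any [t, 2t] window of μ-values later in the talk.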
Progress Measure for Tseitin
• For Tseitin tautologies, the natural choice is one A_v for each vertex v, containing all of its associated clauses. So each A_v is an 𝔽₂ linear equation.
• Claim: Let C be any clause, and let S ⊆ V, S ⊨ C be any subset attaining the minimum, |S| = μ(C). Then δ(S) ⊆ Vars(C), where δ(S) is the set of boundary edges of S.

Progress Measure for Tseitin
• Claim: Let C be any clause, and let S ⊆ V, S ⊨ C, |S| = μ(C). Then δ(S) ⊆ Vars(C).
• Proof:
– If S ⊨ C, then S ∧ ¬C is an unsatisfiable linear system, so we can derive 1 = 0 by linear combinations.
– If any equation from S had coefficient 0, S wasn't minimal. In the sum of the equations of S, the non-cancelling variables are exactly the boundary-edge variables. These must cancel against ¬C to get 1 = 0, so δ(S) ⊆ Vars(C).

Isoperimetry → Lower Bounds
• Consequence: a Tseitin formula has minimum refutation width at least the balanced bisection width of the graph G.
• Corollary: Tseitin on a constant-degree expander requires width Ω(n).
• Corollary: on the n × l grid, width ≥ n is needed.
• Here, "complex clause" := μ(C)/m ∈ [1/3, 2/3].

Size Lower Bounds from Width
• We have formulas with width lower bounds; we can use XOR-substitution and restrictions to get size lower bounds.
• For F a CSP, F[⊕] is the CSP over the variable set with two copies x₁, x₂ of each variable x of F, and each constraint of F interpreted as constraining the parities x₁⊕x₂ rather than the variables x. (Then expand as a CNF.)

Size Lower Bounds from Width
• F[⊕] has a natural random restriction ρ:
– For each x, independently choose one of x₁, x₂ to restrict to {0, 1} at random. Substitute either x or ¬x for the other, so that (x₁⊕x₂)|ρ = x.
• Previously used in [B '01]. Nice properties:
– F[⊕]|ρ = F, always.
– Fairly dense; kills all wide clauses whp.
– Black box; doesn't depend on F.

Size Lower Bounds from Width
• When F is a Tseitin tautology, F[⊕] is "multigraph Tseitin" on the same graph, but with each edge doubled.
• Suppose Tseitin on the doubled grid has a proof of size less than 2^{cn}.
Then by a union bound, some restriction kills all clauses of width ≥ n, so Tseitin on the grid has a proof of width < n. Contradiction, for small enough c.

Additional Complex Clause Levels
• Idea: instead of just considering C with μ(C)/m ∈ [1/3, 2/3], what about [1/6, 1/3], [1/12, 1/6], etc.?
• This gives us the levelled property we need.
• Task: show that clauses from k different levels are collectively wide.
• Given the machinery we have, this is an "extended isoperimetry" property of the grid.

Extended Isoperimetry
• Claim: Let S₁ … S_k be subsets of vertices of superincreasing sizes, with |S_i| ∈ [n², nl/2]. Then Σᵢ |δ(Sᵢ)| ≥ Ω(kn).
• Several simple combinatorial proofs of this.
• Implies clauses from distinct levels are collectively wide.

Main Lemma
• For any set S of M clauses in doubled-grid Tseitin,
Pr[S|ρ contains ≥ k complex clauses of distinct complexity levels] ≤ (M·2^{−Ω(n)})^k.
• Proof: for any k-tuple, there are Ω(kn) opportunities for ρ to kill at least one of the clauses. Union bound over k-tuples.

Complexity vs. Time
• Consider the time ordering of any proof, and plot the complexity of the clauses in memory vs. time. (figure: μ rising from the input clauses up to ⊥.)
• Sub-additivity implies the proof cannot skip over any [t, 2t] window of μ-values on the way up.

Complexity vs. Time
• Consider the time ordering of any proof, and divide time into m equal epochs (fix m later). (figure: Hi / Med / Low complexity bands vs. time.)

Two Possibilities
• Either a clause of medium complexity appears in memory at one of the breakpoints between epochs,

Two Possibilities
• or all breakpoints contain only Hi and Low clauses. Then some epoch starts Low and ends Hi, and so contains clauses of all ≈ log n medium levels.

Analysis
• Consider the restricted proof. With probability 1, one of the two scenarios applies.
• For the first scenario, there are M = mS clauses: Pr[1st scenario] ≤ mS·2^{−Ω(n)}.
• For the second scenario, there are m epochs with M = T/m clauses each. Union bound and Main Lemma:
Pr[2nd scenario] ≤ m·((T/m)·2^{−Ω(n)})^k, where k ≈ log n.

Analysis
• Optimizing m yields something of the form T·S = exp(Ω(n)) for k = ω(1).
• Can get a nontrivial tradeoff this way, but it needs a lot of work on constants.

Better tradeoff
• Don't just divide into epochs once.
• Recursively divide the proof into epochs and sub-epochs, where each epoch contains m sub-epochs of the next smaller size.
• Prove that if an epoch does a lot of work, then either
– its breakpoints contain many complexity levels, or
– a sub-epoch does a lot of work.

An even better tradeoff
• If an epoch contains a clause of level b at its end, but every clause at its start has level ≤ b − a (so the epoch makes a progress),
• and the breakpoints of its m child epochs together contain ≤ k complexity levels,
• then some child epoch makes ≥ a/k progress.
• (figure: recursion tree of depth h; each internal node is a set of mS breakpoint clauses, each leaf a set of T/m^{h−1} clauses.) The previous slide shows: some set has ≥ k complexities, where k^h = log n.

An even better tradeoff
• Choose h = k, so that k^k = log n, hence k = ω(1).
• Choose m so that all sets are the same size: T/m^{k−1} ≈ mS, so all events are rare together.
• There are m^k sets in total.
• Finally, a union bound yields T ≥ (exp(Ω(n))/S)^k.

Final Tradeoff
• Tseitin formulas on the n × l grid for l = 2n⁴:
– are of size ≈ n⁵,
– have resolution refutations in Size and Space ≈ 2^n,
– have Space ≈ n·log n refutations of Size 2^{O(n log n)},
– and any resolution refutation in Size T and Space S requires T ≥ (2^{0.58n}/S)^{ω(1)}.
• If the space is at most 2^{n/2}, then the size blows up by a super-polynomial amount.

Open Questions
• More than quasi-polynomial separations?
– For Tseitin formulas, the small-space upper bound is only a log n power of the unrestricted size.
– Candidate formulas? Are these even possible?
• Other proof systems?
– In [BNT '12], extended to Polynomial Calculus Resolution.
– Cutting Planes? Frege subsystems?
• Other cases for separating search paradigms: "dynamic programming" vs. "binary search"?

Thanks!

Extended Isoperimetric Inequality
• If the sets aren't essentially blocks, we're done. If they are blocks, reduce to the line:

Intervals on the line
• Let [a₁, b₁], …, [a_k, b_k] be intervals on the line such that b_i − a_i ≥ 2(b_{i−1} − a_{i−1}) for each i.
• Let μ(k) be the minimum number of distinct endpoints of intervals in such a configuration.
• Then a simple inductive proof shows μ(k) ≥ k + 1.

Proof DAG
• "Regular": on every root-to-leaf path, no variable is resolved more than once.

Tradeoffs for Regular Resolution
• Theorem: for any k, there are 4-CNF formulas (Tseitin formulas on long and skinny grid graphs) of size n with
– regular resolution refutations in Size n^{k+1}, Space n^k,
– but with Space only n^{k−ε}, for any ε > 0, any regular resolution refutation requires Size at least n^{Ω(log log n / log log log n)}.

Regular Resolution
• Can define partial information more precisely.
• Complexity is monotone with respect to the proof DAG edges. This part uses the regularity assumption, and simplifies the arguments with the complexity plot.
• A random adversary selects random assignments based on the proof.
– No random restrictions; conceptually clean, and we don't lose constant factors here and there.
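The inductive bound μ(k) ≥ k + 1 from the "Intervals on the line" slide can be sanity-checked by brute force for small k. Below is a minimal Python sketch (the helper name is mine; restricting to integer endpoints in a bounded window is an assumption made so the search is finite):

```python
from itertools import product

def min_distinct_endpoints(k, limit):
    """Exhaustively search configurations [a_1,b_1], ..., [a_k,b_k] with
    integer endpoints in 0..limit satisfying the doubling condition
    b_i - a_i >= 2*(b_{i-1} - a_{i-1}), and return the minimum number
    of distinct endpoints over all such configurations.
    """
    intervals = [(a, b) for a in range(limit + 1)
                 for b in range(a + 1, limit + 1)]
    best = 2 * k  # trivial upper bound: all endpoints distinct
    for config in product(intervals, repeat=k):
        # check the superincreasing-length condition
        if all(config[i][1] - config[i][0] >=
               2 * (config[i - 1][1] - config[i - 1][0])
               for i in range(1, k)):
            endpoints = {x for iv in config for x in iv}
            best = min(best, len(endpoints))
    return best
```

The search returns 3 for k = 2 and 4 for k = 3, matching k + 1: intervals can share endpoints chain-style (e.g. [0,1], [1,3], [3,7]), but the doubling condition prevents three intervals from living on only three points.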