Solving Random Satisfiable 3CNF Formulas in Expected Polynomial Time
M. Krivelevich, D. Vilenchik
SODA 2006

Lecture Outline
What is expected polynomial time and some motivation
The planted SAT distribution and related work
Description of our algorithm
Outline of the analysis
Open problems

Why Consider Prob. Models ?
Many interesting problems are known to be NP-hard.
Hardness results only show that hard instances exist.
This should not discourage us from trying to design heuristics that work well for "almost all" instances.
For a rigorous analysis, "almost all" must be defined in a meaningful way.
One possibility: use probabilistic models such as G_{n,p}.

Expected Polynomial Time
D – a distribution on the inputs.
An algorithm works whp over D if it succeeds whp when the instance is sampled according to D.
Such an algorithm may fail completely on some instances.
E.g. the Greedy Coloring Algorithm:
Fix the vertices in some arbitrary order.
For every vertex, assign the minimal possible color.

Expected Polynomial Time
Greedy uses whp at most (1+o(1))·n/log₂n colors for G_{n,1/2} [GM75].
χ(G_{n,1/2}) ~ n/(2log₂n) whp.
Therefore, Greedy yields whp a (2+o(1))-approximation of χ(G) for G ∈ G_{n,1/2}.
However, let G = K_{n/2,n/2} minus some perfect matching.
If the vertices are ordered according to the matching, Greedy uses n/2 colors, whereas χ(G) = 2: Greedy fails completely.

Expected Polynomial Time Cont.
Alternatively, demand success for all instances while keeping the overall average running time polynomial. Formally:
Def. An algorithm A with running time t_A(I) on input I runs in expected polynomial time over distribution D if Σ_I Pr_D[I]·t_A(I) is polynomial in n.

Expected Polynomial Time Cont.
To achieve this, separate the "easy" instances (which can be handled in polynomial time) from the "hard" ones (rare, but may require super-polynomial time).
This requires a better understanding of the probability space.
It encourages efficient, natural and more robust algorithms.

What's Next ?
What is expected polynomial time and some motivation
The planted SAT distribution and related work
Description of our algorithm
Outline of the analysis
Open problems

3SAT – Definition
A 3CNF formula is a conjunction of clauses, each a disjunction of three literals:
(x1 ∨ x2 ∨ ¬x5) ∧ (x3 ∨ ¬x4 ∨ ¬x1) ∧ (x1 ∨ x2 ∨ x6) ∧ …
Partial truth assignment: x1 = T, x2 = F, x3 = T, x4 = F, x5 = T, x6 = * (unassigned).
3SAT = {all satisfiable 3CNF formulas}. 3SAT is NP-complete [Cook71].

Different SAT Distributions
(Arguably) the most natural distribution is P_{n,p}, the analog of G_{n,p}: include every possible clause w.p. p = p(n).
Let ρ = expected number of clauses / n = p·8·C(n,3)/n.
Satisfiability shows a sharp threshold behavior [Fri99]:
for ρ < 3.42, almost all instances are satisfiable [KKL02];
for ρ > 4.5, almost all instances are unsatisfiable [KKS+01].
Our focus is ρ = d, d a sufficiently large constant.

Different SAT Distributions
P_{n,p} is not interesting at such ratios (for satisfiability algorithms), since almost all of its instances are unsatisfiable.
Alternatively, consider distributions over satisfiable instances.
One possibility is PSAT_{n,p}, where PSAT_{n,p}(I) = P_{n,p}(I | I is satisfiable).
PSAT_{n,p} is hard to sample (experimentally).
PSAT_{n,p} seems hard to tackle rigorously (no efficient algorithm is known for ρ = o(log n)).

Different SAT Distributions
Planted SAT can serve as an intermediate step towards PSAT_{n,p}.
It is interesting and well studied in its own right.
It is the analog of Planted k-Coloring [BS95], [AK97] and Planted Clique [AKS98], [FK00].
It is a random distribution over satisfiable 3CNF formulas with an arbitrarily large clauses/variables ratio.
It can be sampled efficiently.

The Planted 3SAT Distribution
Generating an instance:
Randomly pick a truth assignment φ.
Include every clause satisfied by φ w.p. p = d/n².
E.g. φ: x1 = T, x2 = F, x3 = T, x4 = F, x5 = T, x6 = F.
(x1 ∨ x2 ∨ ¬x5) and (x3 ∨ ¬x4 ∨ x1) are satisfied by φ and may be included; (¬x1 ∨ x2 ∨ x6) is violated by φ and is never included.

Planted Distributions: Related Work
[KP92] – greedy variable assignment, p ≥ d/n; (implicitly) works in expected polynomial time.
[AK97] – spectral technique for coloring sparse planted 3-colorable graphs (np = d).
[BSBG02] – majority vote suffices for p ≥ d·log n/n².
[Fla03] – techniques similar to [AK97]; solves planted 3SAT whp for p ≥ d/n².

Related Work Cont.
[CO04] – SDP-based expected polynomial time algorithm for (semi-random) planted k-colorable graphs, np ≥ d·k·log n.
[Böt05] – SDP-based expected polynomial time algorithm for planted k-colorable graphs, np ≥ d·k².

What's Next ?
What is expected polynomial time and some motivation
The planted SAT distribution and related work
Description of our algorithm
Outline of the analysis
Open problems

Our Results
An algorithm that decides 3SAT.
Expected polynomial running time over planted 3SAT, p = d/n².
The result extends to any constant k (in which case d = d0k).
First work to address the issue of expected polynomial time algorithms for satisfiable SAT distributions.

Algorithm: General Outline
The algorithm proceeds in 2 steps ("correct" means coinciding with the planted solution):
1. Find a correct partial solution containing a large fraction of the variables (always polynomial time); typically, all but a small constant fraction, e^{-Θ(d)}, of the variables.
2. a. Try to complete the partial solution to a satisfying assignment.
   b. If not possible, gradually fix the partial solution until step 2.a ends up successfully.
   (Steps a+b run in expected polynomial time.)
By contrast, most expected polynomial time heuristics discard the solution and exhaustively search for a correct one.

Algorithm: Basic Ingredients
The Majority Vote: every variable gets the polarity in which it appears more often.
(x1 ∨ x2 ∨ ¬x3) ∧ (x4 ∨ x2 ∨ ¬x1) ∧ (¬x1 ∨ x2 ∨ x4) ∧ (x3 ∨ ¬x2 ∨ x4)
→ x1 = F, x2 = T, x3 = T, x4 = T (x3 appears once in each polarity and is set to T).

Basic Ingredients Cont.
The Unassignment Procedure:
If C = (x ∨ ¬y ∨ z) evaluates to (T ∨ F ∨ F) under ψ, then x supports C w.r.t. ψ. Note: all three variables are assigned by ψ.
Repeatedly unassign variables that support fewer than t clauses; unassignment stops when all remaining variables support at least t clauses.
E.g. unassignment with threshold t = 1 applied to (x1 ∨ x2 ∨ ¬x3) ∧ (x4 ∨ x2 ∨ ¬x1) ∧ (¬x1 ∨ x2 ∨ ¬x4) ∧ (x3 ∨ ¬x1 ∨ ¬x4).

Basic Ingredients Cont.
The Exhaustive Search:
If every component is of size O(log n), the procedure is polynomial.
Given a 3CNF formula I, define its induced graph G_I = (V,E):
V = {x1, x2, …, xn} – the set of variables;
(xi, xj) ∈ E if there exists a clause C containing both (polarity disregarded).
Given I, find the connected components of G_I.
Search every component separately for a satisfying assignment.

Basic Ingredients: Motivation
Assume the input is sampled according to planted 3SAT with planted assignment φ.
Suppose φ(x) = T. In every clause, x appears w.p. 4/7·3/n and ¬x w.p. 3/7·3/n.
Therefore, the Majority Vote approximates φ closely whp.
But we also expect the Majority to assign some variables (a small fraction) wrongly whp. We call such a variable a wrong variable.
Suppose a wrong variable x survives the unassignment. A clause supported by x evaluates to (T ∨ F ∨ F) under MAJ; under φ, x's literal is F, so if the other two variables were correct the clause would be (F ∨ F ∨ F). Since φ satisfies every clause, the clause must contain another wrong variable.

Motivation Cont.
W – the set of wrong variables surviving the unassignment.
There exist at least t·|W| clauses, each containing at least 2 variables from W (each clause is counted once, as the support is unique).
We call such a W dense.
If |W| is small, this is analogous to a small subgraph with atypically high average degree, which happens with small probability in random graphs G_{n,p}.

Algorithm: General Outline
The algorithm basically proceeds in 2 steps:
1. Find a correct partial solution containing a large fraction of the variables (Majority Vote + Unassignment).
2. a. Try to complete the partial solution to a satisfying assignment (Exhaustive Search).
   b. If not possible, gradually fix the partial solution until step 2.a ends up successfully (this makes sure the algorithm always succeeds).

Putting Everything Together
Algorithm SAT(I):
1. MAJ ← Majority Vote of I.
2. Carry out unassignment with threshold 0.999d/2 w.r.t. MAJ (d/2 is the expected support; the threshold trades off completeness against soundness).
3. Let ψ be the resulting partial assignment.
4. Let U be the set of unassigned variables.
5. Construct G = (U,E).
6. For all subsets Y ⊆ V\U, |Y| = 0, …, |V\U|, and for all possible assignments ψ_Y of Y:
   1. Fix ψ according to ψ_Y.
   2. Using exhaustive search on G(U,E), try to complete ψ to a satisfying assignment.
   3. If successful, return the assignment.
Y is the fixing set of variables.

What's Next ?
What is expected polynomial time and some motivation
The planted SAT distribution and related work
Description of our algorithm
Outline of the analysis
Open problems

Analyzing the Running Time
In Algorithm SAT(I), steps 1–5 are always polynomial; in fact, they run in expected linear time.
The loop in step 6 is expected to perform O(1) iterations; the expected running time is O(n^{1+…}).

Analysis Outline
Typically (for planted 3SAT), the following happens:
The distance between MAJ and the planted assignment is e^{-Θ(d)}n.
Almost all correct variables, (1 − e^{-Θ(d)})n, survive the unassignment.
Only correct variables survive the unassignment ("density" arguments).
G = (U,E) breaks down into connected components of size O(log n) (arguments similar to G_{n,p} with np < 1).
Therefore, the exhaustive search is successful and polynomial.

Analysis Outline
What can go wrong, preventing a successful execution?
Wrong variables survived the unassignment, so that
the partial assignment induces a (F ∨ F ∨ F) clause, or
the formula induced by the unassigned variables is not satisfiable.
Y0 – the set of fixing variables with which the algorithm ends. Typically, Y0 = ∅.

Analysis Outline Cont.
Key observation: if Y0 ≠ ∅, then:
1. The Majority Vote is wrong for at least |Y0| variables.
2. Y0 is a dense set of variables.
For "large" |Y0|, (1) happens with small probability; for "small" |Y0|, (2) happens with small probability.
Why (2): suppose x ∈ Y0 → x survives the unassignment → x supports ~d/2 clauses, each of the form (T ∨ F ∨ F) under the partial assignment, with x's literal the true one.
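Since the fixing flips x (otherwise, the algorithm would have ended with a smaller set Y′ ⊂ Y0), x's literal in each such clause turns from T to F, so the clause must also contain some y ∈ Y0 (otherwise, the algorithm cannot end). Hence Y0 is dense.
It remains to carry out the exact calculations.

A Taste of Rigorous Analysis
The following properties hold for planted 3SAT (C0, C1 are suitable constants):
Let α0 = e^{-d/C0}.
F_MAJ – the set of variables on which MAJ and φ disagree.
Claim: for y ≥ α0·n, Pr[|F_MAJ| ≥ y] ≤ e^{-yd/C1}.
For J ⊆ V, F(J) is the set of clauses in I containing at least 2 variables from J.
Claim: for every j ≤ α0·n, Pr[∃J, |J| = j, |F(J)| ≥ jd/3] ≤ e^{-j·log(n/j)·d/12}.
The properties are proved using standard probabilistic techniques (union bound, Chernoff bounds).

A Taste of Rigorous Analysis
The expected number of fixing iterations is at most
$$\mathbb{E}[\#\,\text{iterations}] \le \sum_{y=0}^{n} \Pr[|Y_0| = y] \sum_{y'=0}^{y} \binom{n}{y'} 2^{y'}$$
where, since \binom{n}{y'} is nondecreasing for y' ≤ n/2,
$$\sum_{y'=0}^{y} \binom{n}{y'} 2^{y'} \le (y+1)\binom{n}{y} 2^{y} \le \exp\Big(O\big(y\log\tfrac{n}{y}\big)\Big) \ \text{ for } 1 \le y \le \tfrac{n}{2}, \qquad \sum_{y'=0}^{y} \binom{n}{y'} 2^{y'} \le \sum_{y'=0}^{n} \binom{n}{y'} 2^{y'} = 3^n \ \text{ always.}$$

A Taste of Rigorous Analysis
$$\mathbb{E}[\#\,\text{iterations}] \le 1 + \underbrace{\sum_{y=1}^{\alpha_0 n} e^{O(y\log\frac{n}{y})}\, e^{-y\log\frac{n}{y}\cdot\frac{d}{12}}}_{Y_0\ \text{is a dense set of size}\ y} + \underbrace{\sum_{y=\alpha_0 n}^{n/2} e^{O(y\log\frac{n}{y})}\, e^{-yd/C_1}}_{\text{MAJ wrong for}\ |Y_0|\ \text{variables}} + \sum_{y=n/2}^{n} 3^n e^{-yd/C_1} = O(1)$$
In the second sum, log(n/y) ≤ log(1/α0) = d/C0; in the third, yd/C1 ≥ dn/(2C1), so each term is at most 3^n·e^{-dn/(2C1)} ≤ 2^{-n} for d large enough.

Open Problems
[FV04] show a k-opt based heuristic that solves planted 3SAT whp, p = d/n².
Change the k-opt version to run in expected polynomial time.
Challenge: there is no explicit distinction between wrong and correct variables.
Simplify [Böt05], e.g. by replacing the SDP approximation with a simpler and stronger procedure (similar to the Majority Vote).
Design an efficient algorithm for random (not planted) satisfiable formulas, p = d/n².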
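To make Algorithm SAT(I) from "Putting Everything Together" concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation. Clauses are tuples of nonzero integers (+v for x_v, -v for ¬x_v), and all function names (planted_3sat, majority_vote, unassign, complete, solve) are hypothetical. Majority ties are broken towards T, as in the Majority Vote example; the graph G(U,E) is built only from the clauses that the fixed partial assignment does not already satisfy (an assumption, since the slides only say "construct G = (U,E)"); and the planted sampler enumerates all 8·C(n,3) clauses, so it is meant for small n only.

```python
# Hypothetical sketch of Algorithm SAT(I); names and data layout are illustrative assumptions.
import itertools
import random
from collections import Counter


def lit_true(lit, assignment):
    # Literal +v means x_v, -v means NOT x_v; variables absent from `assignment` are unassigned.
    return abs(lit) in assignment and assignment[abs(lit)] == (lit > 0)


def planted_3sat(n, d, seed=0):
    """Pick phi at random, then keep each clause satisfied by phi w.p. p = d/n^2 (assumes d <= n^2)."""
    rng = random.Random(seed)
    phi = {v: rng.random() < 0.5 for v in range(1, n + 1)}
    clauses = []
    for triple in itertools.combinations(range(1, n + 1), 3):
        for signs in itertools.product((1, -1), repeat=3):
            clause = tuple(s * v for s, v in zip(signs, triple))
            if any(lit_true(l, phi) for l in clause) and rng.random() < d / n ** 2:
                clauses.append(clause)
    return phi, clauses


def majority_vote(clauses, n):
    """x_v = True iff x_v occurs positively at least as often as negatively (ties -> True)."""
    score = Counter()
    for c in clauses:
        for l in c:
            score[abs(l)] += 1 if l > 0 else -1
    return {v: score[v] >= 0 for v in range(1, n + 1)}


def unassign(clauses, assignment, t):
    """Repeatedly unassign every variable supporting fewer than t clauses; return the survivors."""
    psi = dict(assignment)
    while True:
        support = Counter()
        for c in clauses:
            if all(abs(l) in psi for l in c):  # support requires all three variables assigned
                true_lits = [l for l in c if lit_true(l, psi)]
                if len(true_lits) == 1:  # the variable of the unique true literal supports c
                    support[abs(true_lits[0])] += 1
        weak = [v for v in psi if support[v] < t]
        if not weak:
            return psi
        for v in weak:
            del psi[v]


def complete(clauses, partial, free):
    """Exhaustive search over `free`, one component at a time (G built from unsatisfied clauses: an assumption)."""
    residual = []
    for c in clauses:
        if any(lit_true(l, partial) for l in c):
            continue  # already satisfied by the partial assignment
        rest = tuple(l for l in c if abs(l) in free)
        if not rest:
            return None  # an (F or F or F) clause: no extension exists
        residual.append(rest)
    parent = {v: v for v in free}  # union-find over the free variables
    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v
    for c in residual:  # variables sharing a residual clause end up in one component
        for l in c[1:]:
            parent[find(abs(l))] = find(abs(c[0]))
    comps, groups = {}, {}
    for v in free:
        comps.setdefault(find(v), []).append(v)
    for c in residual:
        groups.setdefault(find(abs(c[0])), []).append(c)
    result = dict(partial)
    for root, comp in comps.items():
        for values in itertools.product((False, True), repeat=len(comp)):
            trial = dict(zip(comp, values))
            if all(any(lit_true(l, trial) for l in c) for c in groups.get(root, [])):
                result.update(trial)
                break
        else:
            return None  # this component has no satisfying assignment
    return result


def solve(clauses, n, d):
    """Algorithm SAT(I): majority vote, unassignment, then fixing sets Y of increasing size."""
    psi = unassign(clauses, majority_vote(clauses, n), 0.999 * d / 2)
    free = set(range(1, n + 1)) - set(psi)  # U, the unassigned variables
    assigned = sorted(psi)                  # V \ U
    for size in range(len(assigned) + 1):
        for Y in itertools.combinations(assigned, size):
            for values in itertools.product((False, True), repeat=size):
                fixed = dict(psi)
                fixed.update(zip(Y, values))  # fix psi according to psi_Y
                result = complete(clauses, fixed, free)
                if result is not None:
                    return result
    return None  # every fixing failed, so the formula is unsatisfiable
```

For example, `phi, clauses = planted_3sat(9, 30)` followed by `solve(clauses, 9, 30)` returns a satisfying assignment as a dict from variable to bool. The fixing loop enumerates the sets Y in order of increasing size, which is what the analysis relies on when it argues that Y0 is minimal; its worst case is exponential, and the O(1) expected number of iterations is an asymptotic statement about planted instances. For small n, the expected support d(n−1)(n−2)/(2n²) lies below the threshold 0.999·d/2, so on such tiny instances many or all variables tend to be unassigned and the run behaves like plain exhaustive search.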