“Predicting Learnt Clauses Quality in Modern SAT Solvers” & “Blocked Clause Elimination”
Ateeq Sharfuddin
CS 297: Championship Algorithms

Topics
1. Audemard, G. and Simon, L. “Predicting Learnt Clauses Quality in Modern SAT Solvers.”
2. Järvisalo, M. et al. “Blocked Clause Elimination.”

Basic SAT Background
Given a Boolean variable x, there are two literals:
- x (a positive literal)
- -x (a negative literal)
A clause is a disjunction of literals: (-x + y)
A CNF formula is a conjunction of clauses: (-x + y)(a + x + -b)

Audemard and Simon’s Paper (1)
- Specific to Conflict-Driven Clause Learning (CDCL) solvers.
- Describes the results of experiments exploiting a phenomenon of CDCL solvers on industrial problems.
- Describes a static measure that quantifies the usefulness of a “learnt clause.”
- Introduces this measure into CDCL solvers.
- Compares the performance of the resulting GLUCOSE solver with other current state-of-the-art solvers.

Conflict-Driven Clause Learning (CDCL)
Basic idea: when backtracking, add new clauses corresponding to the causes of the failure of the search.
A typical branch of a CDCL solver is a sequence of decisions followed by propagations, repeated until a conflict is reached [1].

Modern SAT Solvers (1)
Not much new development since zChaff (2001), an efficient implementation of DPLL; most solvers are essentially a rehash of zChaff, with data-structure tricks and a few minor improvements:
- MINISAT (2005)
- Phase caching (Pipatsrisawat et al., 2007)
- Luby restarts (Huang, 2007)
Modern SAT solvers focus on reaching conflicts as soon as possible, relying on:
- Boolean Constraint Propagation (BCP)
- The Variable State Independent Decaying Sum (VSIDS) heuristic

Boolean Constraint Propagation
The iterative process of assigning the value true to every unit literal until an empty clause is encountered or no unit clause remains in the formula. The “heart” of modern SAT solvers [1].

Variable State Independent Decaying Sum (VSIDS)
A branching heuristic that favors variables used recently and often.
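The BCP loop just described can be sketched in Python (a minimal illustration, not any solver's actual implementation; the clause representation, lists of nonzero integers with negative integers for negated variables, is my own choice):

```python
def bcp(clauses, assignment=None):
    """Repeatedly assign unit literals the value true until an empty
    clause appears (conflict) or no unit clause remains.

    clauses: list of clauses, each a list of nonzero ints (-v = not v).
    Returns (status, assignment), status in {"conflict", "no_unit"}.
    """
    assignment = dict(assignment or {})  # var -> bool
    while True:
        unit = None
        for clause in clauses:
            unassigned, satisfied = [], False
            for lit in clause:
                var, want = abs(lit), lit > 0
                if var in assignment:
                    satisfied = satisfied or (assignment[var] == want)
                else:
                    unassigned.append(lit)
            if satisfied:
                continue
            if not unassigned:            # clause empty under assignment
                return "conflict", assignment
            if len(unassigned) == 1:      # unit clause found
                unit = unassigned[0]
                break
        if unit is None:                  # no unit clause remains
            return "no_unit", assignment
        assignment[abs(unit)] = unit > 0  # set the unit literal true
```

For example, on the chain (x1)(-x1 + x2)(-x2 + x3), i.e. `bcp([[1], [-1, 2], [-2, 3]])`, propagation assigns all three variables true.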
VSIDS scores are also used in conflict analysis and in estimating the future usefulness of learnt clauses.
Solvers tend to let the maximum number of learnt clauses grow exponentially, since deleting a useful clause can have a dramatic effect on performance.

Audemard and Simon’s First Experiment
- Ran MINISAT on a selection of benchmarks from recent SAT contests and races.
- Each time a conflict is reached, record the point (xc, yl): conflict number xc and decision level yl.
- Limit the search to two million conflicts.
- Compute a simple least-squares linear regression (the characteristic line y = mx + b) over the set of points (xc, yl).
- If m is negative, decision levels decrease during the search.
- If m is negative, when the solver will finish the search can be trivially “predicted”: if the solver follows the characteristic line, it will finish when the line intersects the x-axis. This point is called the “look-back justification” point and has coordinates (-b/m, 0).

Table 1: Decision-level decrease (median values of xc)

Series      #Benchs  % Decr.  -b/m (>0)  Reduc.
een              8     62%     1.1e3     1762%
goldb           11    100%     1.4e6       93%
grieu            7     71%     1.3e6        -
hoons            5    100%     7.2e4      123%
ibm-2002         7     71%     4.6e4       28%
ibm-2004        13     92%     1.9e5       52%
manol-pipe      55     91%     1.9e5       64%
miz             13      0%       -          -
schup            5     80%     4.8e5       32%
simon           10     90%     1.1e6       50%
vange            3     66%     4.0e5        6%
velev           54     92%     1.5e5       81%
all            199     83%     3.2e5       68%

(#Benchs = number of benchmarks in the series; % Decr. = percentage of benchmarks that exhibit decreasing decision levels; the miz series is always increasing.)

Hypotheses
1. The solver follows a linear decrease of its decision levels (this was found to be false).
2. Finding a contradiction or a solution gives the same look-back justification.
3. The solution (or contradiction) is not found by chance at any point of the computation.

Experimental Results
- The phenomenon seems to hold for almost all industrial problems.
- The phenomenon does not hold for the “miz” series of industrial problems, which encodes cryptographic problems (decision levels are 100% increasing for this series).
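The least-squares fit used in the first experiment can be sketched as follows (the function name is mine, and the sample trace in the usage note is invented, not the paper's data):

```python
def look_back_justification(points):
    """Fit the characteristic line y = m*x + b by least squares over
    (conflict number, decision level) points.

    Returns (m, b, justification), where justification = -b/m is the
    x-intercept of the line (None when m >= 0, i.e. levels not
    decreasing)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - m * sx) / n
    return m, b, (-b / m if m < 0 else None)
```

On an invented trace whose decision levels fall linearly from 100 at a rate of 0.5 per conflict, the fit recovers m = -0.5, b = 100, and a look-back justification of 200 conflicts.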
- Strong relationship between the look-back justification and the effective number of conflicts needed to solve the problem: the justification is bounded between 0.90 and 8.33 times the real number of conflicts needed; in most cases it is 1.37 times the effective number of conflicts.
- CDCL solvers enforce the decision level to decrease along the line.

Justification vs Conflicts
[Figure: historical justification of needed conflicts vs. effective number of conflicts reached]

Conclusions from the First Experiment
- The results indicate that CDCL solvers do not come to the solution suddenly.
- On SAT instances, the solver does not correctly guess a value for a literal; rather, it learns that the opposite value leads to a contradiction.
- If the part of the learning schema that enforces this decrease can be identified, one could perhaps:
  - speed up the decrease;
  - identify in advance the clauses that play this part and protect them against clause deletion.

Measuring Learnt Clause Quality
- Literals assigned at the same decision level form a “block” of literals: there is a chance they are linked with each other by direct dependencies.
- The learning schema should add links between these independent blocks of literals.

Literals Blocking Distance (LBD)
Given a clause C and a partition of its literals into n subsets according to the current assignment, such that the literals are partitioned with respect to their decision level, the LBD of C is exactly n.
The LBD of each learnt clause is stored when the clause is learnt – a static measure.

“Glue Clauses”: learnt clauses of LBD 2
- They contain only one variable of the last decision level (the First Unique Implication Point).
- This variable will be “glued” with the block of literals propagated above.

Unique Implication Point
A vertex in the implication graph that dominates both vertices corresponding to the literals of the conflicting variable.

Experiment on LBD
Run MINISAT on the set of SAT-Race 06 benchmarks.
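The LBD measure itself is simple to compute; a minimal sketch, assuming `levels` maps each variable to its decision level under the current assignment (names are my own):

```python
def lbd(clause, levels):
    """Literals Blocking Distance: the number of distinct decision
    levels among the literals of a clause under the current
    assignment (i.e. the number of blocks it touches).

    clause: iterable of nonzero ints (-v = not v).
    levels: dict mapping variable -> decision level."""
    return len({levels[abs(lit)] for lit in clause})
```

For example, if variables 1 and 2 were assigned at level 3 and variable 3 at level 5, the clause (-x1 + x2 + x3) has LBD 2 and would count as a glue clause.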
For each learnt clause, measure the number of times it was useful in unit propagation and in conflict analysis.

LBD Experiment Result
[Figure: results of the LBD experiment]

Conclusions of the LBD Experiment
- 40% of the unit propagations on learnt clauses are done on glue clauses, whereas only 20% are done on clauses of size 2.
- Half of the learnt clauses used in the resolution mechanism during all conflict analyses have LBD < 6, whereas clauses of size up to 13 are needed for the same result.

Aggressive Clause Deletion
- CDCL solver performance is tied to clause-database management.
- Keeping too many clauses decreases BCP efficiency; deleting too many breaks the overall learning benefit.
- Good learnt clauses are identified by the VSIDS heuristic.
- Solvers often let the clause set grow exponentially to prevent good clauses from being deleted; this scheme deteriorates on hard instances, making some hard instances even harder to solve.

Aggressive Cleaning Strategy
No matter the size of the initial formula, remove half of the learnt clauses (asserting clauses are kept) every 20,000 + 500x conflicts, where x is the number of times this deletion has previously been performed.

MINISAT with Different Deletion Strategies

Strategy          #N (sat-unsat)  avg time (s)
MINISAT           70 (35-35)      209
MINISAT +ag       74 (41-33)      194
MINISAT +lbd      79 (47-32)      145
MINISAT +ag+lbd   82 (45-37)      175

(200 benchmarks from SAT Race 2006, timeout of 1000 seconds)

GLUCOSE
The ideas described above were embedded into MINISAT with the Luby restart strategy and phase saving. The solver is called “GLUCOSE” for its ability to detect and keep “glue clauses.” Two tricks were added:
- Each time a learnt clause is used in unit propagation, a new LBD score is computed and updated.
- The scores of the variables of a learnt clause that were propagated by a glue clause are increased.
The next table compares performance against other SAT solvers.
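The aggressive cleaning schedule can be sketched as follows; this is one reading of the 20,000 + 500x rule (each cleaning happens 20,000 + 500x conflicts after the previous one), and the function name is mine:

```python
def cleaning_points(max_conflicts):
    """Yield the conflict counts at which the learnt-clause database
    is halved: a cleaning occurs every 20000 + 500*x conflicts, where
    x is the number of cleanings already performed."""
    conflicts, x = 0, 0
    while True:
        conflicts += 20000 + 500 * x   # interval grows by 500 each time
        if conflicts > max_conflicts:
            return
        yield conflicts
        x += 1
```

Under this reading, cleanings fall at conflicts 20,000, then 40,500 (after a further 20,500), then 61,500, and so on, so the database is pruned steadily but at a slowly decreasing rate.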
Performance Data

Solver      #N (SAT-UNSAT)   #U   #B
ZCHAFF 01    84 (47-37)       0   13
ZCHAFF 04    80 (39-41)       0    5
MINISAT+    136 (66-74)       0   15
MINISAT     132 (53-79)       1   16
PICOSAT     153 (75-78)       1   26
RSAT        139 (63-75)       1   14
GLUCOSE     176 (75-101)     22   68

(#U = number of times the solver is the only one to solve an instance; #B = number of times the solver is the fastest.)

Blocked Clause Elimination
- Conceived by Matti Järvisalo, Armin Biere, and Marijn Heule.
- Studies the effectiveness of BCE on standard CNF encodings of circuits.
- Achieves the same level of simplification as a combination of:
  - CNF encodings: the Tseitin CNF encoding for circuits and the Plaisted-Greenbaum encoding;
  - circuit-level simplifications: cone of influence, non-shared input elimination, and monotone input reduction.

Blocking Literal / Blocked Clause
A literal x in a clause C of a CNF F blocks C if for every clause C’ ∈ F with -x ∈ C’, the resolvent (C \ {x}) ∪ (C’ \ {-x}) obtained from resolving C and C’ on x is a tautology. A clause is blocked if it has a literal that blocks it.

Example
Given the CNF F: (a + b)(a + -b + -c)(-a + c)
Clauses: C1 = {a, b}, C2 = {a, -b, -c}, C3 = {-a, c}.
- Literal a does not block C1, since {b} ∪ {c} is not a tautology.
- Literal b does not block C1, since {a} ∪ {a, -c} is not a tautology.
- Literal a blocks C2, since {-b, -c} ∪ {c} is a tautology.
- Literal -c blocks C2, since {a, -b} ∪ {-a} is a tautology.
- Literal c blocks C3, since {-a} ∪ {a, -b} is a tautology.

BCE (continued)
- Removal of an arbitrary blocked clause by BCE preserves satisfiability.
- A literal x cannot block any clause if the CNF contains the unit clause {-x}.
- If a clause C in CNF F is blocked, any clause C’ ≠ C that is blocked in F is also blocked in F \ {C}, so blocked clauses can be removed in any order.

Pure Literal Elimination
Given a CNF F, a literal x occurring in F is pure if -x does not occur in F.
Pure Literal Elimination (PL): while there is a pure literal x in F, remove all clauses containing x from F.
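The blocking-literal definition translates almost directly into code; a minimal sketch, with clauses represented as frozensets of integer literals (a representation of my choosing, not the paper's):

```python
def is_tautology(clause):
    """A clause is a tautology if it contains x and -x for some x."""
    return any(-lit in clause for lit in clause)

def blocks(x, c, cnf):
    """True iff literal x blocks clause c in cnf: every resolvent of c
    with a clause containing -x, resolved on x, is a tautology."""
    assert x in c
    return all(
        is_tautology((c - {x}) | (cp - {-x}))
        for cp in cnf if -x in cp
    )

def is_blocked(c, cnf):
    """A clause is blocked if some literal in it blocks it."""
    return any(blocks(x, c, cnf) for x in c)
```

On the example CNF from the slides, encoding a=1, b=2, c=3, C1 = {1, 2} is not blocked while C2 = {1, -2, -3} and C3 = {-1, 3} are, matching the literal-by-literal analysis above.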
BCE is at least as effective as PL: a pure literal has no clauses to resolve against, so by definition it (vacuously) blocks every clause that contains it.

Experiments
- Evaluated how much reduction can be achieved using BCE together with variable elimination (VE) and various circuit-encoding techniques.
- Reduction measured by the size of the CNF before and after preprocessing, and by the gain in the number of instances solved.
- 292 CNFs from the SAT 2009 application track; time limit of 900 seconds.
- Used PrecoSAT v236 and PicoSAT v918.

Results
Umm, results… inconclusive.
- Reducing the size of a CNF by preprocessing does not necessarily lead to faster running times.
- Running preprocessing until completion can take a considerable portion of the 900-second limit.

Results (legend)
S = SAT’09 competition; A = structural SAT track; H = HWMCC’08; B = bit-blasted bit-vector problems from SMT-Lib; T = Tseitin encoding; P = Plaisted-Greenbaum encoding; M = Minicirc; N = NiceDAG; U = unknown (for S); t = time in seconds spent in one encoding/preprocessing phase; V = sum of the numbers of variables (in millions); C = sum of the numbers of clauses (in millions); b = BCE; e = VE.
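As a closing sketch, BCE itself is a fixpoint loop over the blocked-clause test (clauses represented as frozensets of integer literals, an invented representation; a minimal illustration, not the paper's implementation):

```python
def bce(cnf):
    """Blocked Clause Elimination: remove blocked clauses until none
    remains. A pure literal has no clauses to resolve against, so the
    'all resolvents are tautologies' condition holds vacuously for it,
    which is why this loop subsumes pure-literal elimination."""
    def tautology(clause):
        return any(-lit in clause for lit in clause)

    def blocked(c, f):
        return any(
            all(tautology((c - {x}) | (cp - {-x})) for cp in f if -x in cp)
            for x in c
        )

    f = set(cnf)
    changed = True
    while changed:                 # removing a clause can unblock... no:
        changed = False            # it can only make more clauses blocked
        for c in list(f):
            if blocked(c, f - {c}):
                f.remove(c)
                changed = True
    return f
```

For instance, in (x1 + x2)(x1 + -x2) the literal x1 is pure, so both clauses are (vacuously) blocked and eliminated; in the unsatisfiable (x1)(-x1), neither clause is blocked and the formula is left untouched, as required for satisfiability preservation.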