Toward a Universal Inference Engine
Henry Kautz, University of Washington
With Fahiem Bacchus, Paul Beame, Toni Pitassi, Ashish Sabharwal, & Tian Sang

Universal Inference Engine
Old dream of AI:
- General Problem Solver – Newell & Simon
- Logic + Inference – McCarthy & Hayes
Reality:
- 1962 – 50 variable toy SAT problems
- 1992 – 300 variable non-trivial problems
- 1996 – 1,000 variable difficult problems
- 2002 – 1,000,000 variable real-world problems

Pieces of the Puzzle
- Good old Davis-Putnam-Logemann-Loveland (DPLL)
- Clause learning (nogood caching)
- Randomized restarts
- Component analysis
- Formula caching
- Learning domain-specific heuristics

Generality
- SAT – NP-complete
- #SAT, Bayesian networks – #P-complete
- Bounded-alternation quantified Boolean formulas
- Quantified Boolean formulas, stochastic SAT – PSPACE-complete

1. Clause Learning
with Paul Beame & Ashish Sabharwal

DPLL Algorithm
DPLL(F)
    // Perform unit propagation
    while exists unit clause (y) in F
        F ← F|y   // remove all clauses containing y; shrink all clauses containing ¬y
    if F is empty, report satisfiable and halt
    if F contains the empty clause Λ, return
    else choose a literal x
        DPLL(F|x)
        DPLL(F|¬x)

Extending DPLL: Clause Learning
When backtracking in DPLL, add new clauses corresponding to the causes of failure of the search
- EBL [Stallman & Sussman 77, de Kleer & Williams 87]
- CSP [Dechter 90]
- CL [Bayardo-Schrag 97, Marques-Silva & Sakallah 96, Zhang 97, Moskewicz et al. 01, Zhang et al. 01]
Added conflict clauses
- Capture reasons of conflicts
- Obtained via unit propagation from known clauses
- Reduce future search by producing conflicts sooner

Conflict Graphs
Known clauses: (p ∨ q ∨ a), (¬a ∨ ¬b ∨ t), (¬t ∨ x1), (¬t ∨ x2), (¬t ∨ x3), (¬x1 ∨ ¬x2 ∨ ¬x3 ∨ y), (¬x2 ∨ ¬y)
Current decisions: p = false, q = false, b = true
Unit propagation forces a, t, x1, x2, x3, and then conflicts on y.
Learned clauses under different schemes:
- Decision scheme: (p ∨ q ∨ ¬b)
- 1-UIP scheme: (¬t)
- FirstNewCut scheme: (¬x1 ∨ ¬x2 ∨ ¬x3)

CL Critical to Performance
Best current SAT algorithms rely heavily on CL for good behavior on real-world problems
- GRASP [Marques-Silva & Sakallah 96], SATO [H. Zhang 97]
- zChaff [Moskewicz et al. 01], BerkMin [Goldberg & Novikov 02]
However,
- No good understanding of the strengths and weaknesses of CL
- Not much insight on why it works well when it does

Harnessing the Power of Clause Learning (Beame, Kautz, & Sabharwal 2003)
- Mathematical framework for analyzing clause learning
- Characterization of its power in relation to well-studied topics in proof complexity theory
- Ways to improve solver performance based on formal analysis

Proofs of Unsatisfiability
When F is unsatisfiable, the trace of DPLL on F is a proof of its unsatisfiability.
A bound on the shortest proof of F gives a bound on the best possible implementation.
- Upper bound – "There is a proof no larger than K"
  Potential for finding proofs quickly, with the best possible branching heuristic, backtracking, etc.
- Lower bound – "Shortest proof is at least size K"
  Inherent limitations of the algorithm or proof system

Proof System: Resolution
Resolution repeatedly combines a clause containing a literal x with a clause containing ¬x to derive a new clause on their remaining literals, until the empty clause Λ is reached.
[Figure: resolution refutation of an unsatisfiable CNF formula F over variables a, b, c, ending in the empty clause Λ; proof size = 9 clauses.]
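To make the resolution rule concrete, here is a minimal sketch in Python; the representation and the function name are illustrative assumptions, not from the original talk. Clauses are frozensets of integer literals, with -v denoting the negation of variable v.

    def resolve(c1, c2, var):
        # The resolution rule: from a clause containing var and a clause containing
        # -var, derive the clause made of all their other literals.
        # Clauses are frozensets of integer literals; -v denotes "not v".
        assert var in c1 and -var in c2
        return frozenset((c1 - {var}) | (c2 - {-var}))

    # Example: resolving (a or b) with (not a or c) on a yields (b or c).
    a, b, c = 1, 2, 3
    print(resolve(frozenset({a, b}), frozenset({-a, c}), a))   # frozenset({2, 3})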
Special Cases of Resolution
- Tree-like resolution: the graph of inferences forms a tree; corresponds to DPLL
- Regular resolution: a variable can be resolved on only once on any path from an input clause to the empty clause
  - Directed acyclic graph analog of the DPLL tree
  - Natural to not branch on a variable once it has been eliminated
  - Used in the original DP procedure [Davis-Putnam 60]

Proof System Hierarchy
Tree-like RES ⊆ Regular RES ⊆ General RES ⊆ ... ⊆ Frege systems, ordered by the space of formulas with poly-size proofs.
Separating examples include the pigeonhole principle [Haken 85] and the formulas of [Bonet et al. 00] and [Alekhnovich et al. 02].

Thm 1. CL can beat Regular RES
- Take a formula f with a poly-size (general) RES proof but only exp-size Regular RES proofs, i.e. f separates General RES from Regular RES.
  Example formulas: GTn (ordering principle) and Peb (pebbling formulas) [Alekhnovich et al. 02]
- The proof trace extension PT(f, π) has a poly-size CL proof but only exp-size Regular RES proofs, i.e. it separates CL (and hence clause-learning DPLL) from Regular RES.

PT(f, π): Proof Trace Extension
Start with an unsatisfiable formula f that has a poly-size RES proof π.
PT(f, π) contains
- All clauses of f
- For each derived clause Q = (a ∨ b ∨ c) in π:
  - a trace variable tQ
  - new clauses (tQ ∨ ¬a), (tQ ∨ ¬b), (tQ ∨ ¬c)
The CL proof of PT(f, π) works by branching negatively on the tQ's in bottom-up order of the clauses of π.
[Figure: Q = (a ∨ b ∨ c) is derived in π from (a ∨ b ∨ x) and (c ∨ ¬x). Branching tQ = false propagates ¬a, ¬b, ¬c, which forces x both true and false; the FirstNewCut scheme then learns exactly (a ∨ b ∨ c).]

How hard is PT(f, π)?
Easy for CL: by construction
- CL branches exactly once on each trace variable
- # branches = size(π) = poly
Hard for Regular RES: reduction argument
- Fact 1: PT(f, π)|TraceVars = true  is exactly f
- Fact 2: If σ is a Regular RES proof of g, then σ|x is a Regular RES proof of g|x
- Fact 3: f does not have small Regular RES proofs!

Implications?
- DPLL algorithms without clause learning are hopeless for certain formula classes
- CL algorithms have the potential for small proofs
- Can we use such analysis to harness this potential?

Pebbling Formulas
fG = Pebbling(G) for a DAG G with sources A, B, C and target T.
A node X is "pebbled" if (x1 ∨ x2) holds.
- Source axioms: A, B, C are pebbled
- Pebbling axioms: if A and B are pebbled, then D is pebbled, i.e.
  [(a1 ∨ a2) ∧ (b1 ∨ b2)] → (d1 ∨ d2), encoded as the clauses
  (¬a1 ∨ ¬b1 ∨ d1 ∨ d2), (¬a1 ∨ ¬b2 ∨ d1 ∨ d2), (¬a2 ∨ ¬b1 ∨ d1 ∨ d2), (¬a2 ∨ ¬b2 ∨ d1 ∨ d2)
- Target axioms: T is not pebbled

Grid vs. Randomized Pebbling
[Figure: a grid pebbling graph, in which every node has exactly two variables, versus a randomized pebbling graph with varying in-degree and between one and four variables per node.]

Branching Sequence
B = (x1, x4, ¬x3, x1, ¬x8, ¬x2, ¬x4, x7, ¬x1, x2)
OLD: "Pick an unassigned variable x"
NEW: "Pick the next literal y from B and delete it from B; if y is already assigned, repeat"
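A minimal sketch of the modified branching step in Python; the function and parameter names are illustrative assumptions, not zChaff's actual code. It consumes literals from the precomputed sequence B, skipping any whose variable is already assigned, and falls back to the solver's default heuristic once B is exhausted.

    def pick_branch_literal(B, assignment, default_heuristic):
        # B: the precomputed branching sequence, a list of literals (a negative int
        # denotes a negated variable), consumed front to back.
        # assignment: dict mapping already-assigned variables to True/False.
        while B:
            lit = B.pop(0)                 # take and delete the next literal of B
            if abs(lit) not in assignment:
                return lit                 # branch on it (variable and sign)
        return default_heuristic()         # B exhausted: fall back to the usual rule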
Statement of Results
Given a pebbling graph G, we can efficiently generate a branching sequence BG such that DPLL-Learn*(fG, BG) is empirically exponentially faster than DPLL-Learn*(fG).
DPLL-Learn*: any clause learner with the 1-UIP learning scheme and fast backtracking, e.g. zChaff [Moskewicz et al. 01]
- Efficient: Θ(|fG|) time to generate BG
- Effective: Θ(|fG|) branching steps to solve fG using BG

Genseq on Grid Pebbling Graphs
[Figure: a grid pebbling graph with two variables per node, used to illustrate the branching-sequence generator.]

Results: Grid Pebbling
Max formula size solved (24 hours; 512 MB memory)

  zChaff settings                                     Unsatisfiable    Satisfiable
  Naive DPLL (Learning OFF, Branching Seq OFF)        45 vars          55 vars
  Learning OFF, Branching Seq ON                      45 vars          55 vars
  Original zChaff (Learning ON, Branching Seq OFF)    2,000 vars       4,500 vars
  Modified zChaff (Learning ON, Branching Seq ON)     2,500,000 vars   1,000,000 vars

Results: Randomized Pebbling
Max formula size solved (24 hours; 512 MB memory)

  zChaff settings                                     Unsatisfiable    Satisfiable
  Naive DPLL (Learning OFF, Branching Seq OFF)        35 vars          35 vars
  Learning OFF, Branching Seq ON                      45 vars          45 vars
  Original zChaff (Learning ON, Branching Seq OFF)    350 vars         350 vars
  Modified zChaff (Learning ON, Branching Seq ON)     45,000 vars      20,000 vars

2. Randomized Restarts

Restarts
The run-time distribution typically has high variance across instances or random seeds (tie-breaking in the branching heuristic), and is often heavy-tailed, with infinite mean and variance!
Leverage this with restart strategies.
[Figure: run-time distributions from short to long runs, heavy-tailed vs. exponential.]

Generalized Restarts
At a conflict, backtrack to an arbitrary point in the search tree:
- Lowest conflict decision variable = backjumping
- Root = restart
- Other = partial restart
Adding clause learning makes almost any restart scheme complete (J. Marques-Silva 2002)

Aggressive Backtracking
zChaff: at a conflict, backtrack to above the highest conflict variable
- Not traditional backjumping!
- Wasteful? The learned clause saves "most" of the work
- The learned clause provides new evidence about the best branching variable and value!

4. Component Analysis

#SAT – Model Counting
Why #SAT?
- Prototypical #P-complete problem
- Can encode probabilistic inference
- Natural encoding for counting problems

Bayesian Nets to Weighted Counting
Introduce new variables so that all internal variables are deterministic.
Example: P(A) = 0.1, P(B|A) = 0.2, P(B|¬A) = 0.6.
Introduce chance variables P with weight 0.2 and Q with weight 0.6 (A has weight 0.1), and define
  B ≡ (A ∧ P) ∨ (¬A ∧ Q)
- The weight of a model is the product of its variable weights.
- The weight of a formula is the sum of the weights of its models.
Let F be the formula defining all internal variables; then Pr(query) = weight(F ∧ query).

Bayesian Nets to Counting
- Unweighted counting is the case where all non-defined variables have weight 0.5
- Introduce sets of variables to define other probabilities to the desired accuracy
- In practice: just modify the #SAT algorithm to weighted #SAT

Component Analysis
Can use DPLL to count models: just don't stop when the first assignment is found.
If the formula breaks into separate components (no shared variables), count each separately and multiply the results:
  #SAT(C1 ∧ C2) = #SAT(C1) × #SAT(C2)
RelSat (Bayardo): CL + component analysis at each node in the search tree; 50-variable #SAT was the state of the art circa 2000.
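A minimal sketch of the component split in Python; the representation and names are illustrative assumptions. Clauses that share variables are grouped into connected components, so the model count of the residual formula is the product of the components' counts.

    def components(clauses):
        # clauses: list of frozensets of literals (a negative int denotes a negated
        # variable).  Assumes no empty clause (those are handled before splitting).
        # Returns groups of clauses sharing no variables with any other group, so
        # that the count of the whole formula is the product of the groups' counts.
        parent = {}                                   # union-find over variables

        def find(v):
            r = v
            while parent.get(r, r) != r:
                r = parent[r]
            parent[v] = r                             # path shortening
            return r

        for cl in clauses:
            vs = [abs(l) for l in cl]
            for v in vs[1:]:
                parent[find(v)] = find(vs[0])         # union the clause's variables
        groups = {}
        for cl in clauses:
            root = find(abs(next(iter(cl))))
            groups.setdefault(root, []).append(cl)
        return list(groups.values())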
5. Formula Caching
with Fahiem Bacchus, Paul Beame, Toni Pitassi, & Tian Sang

Formula Caching
New idea: cache counts of residual formulas at each node
- Bacchus, Dalmao & Pitassi 2003
- Beame, Impagliazzo, Pitassi, & Segerlind 2003
Matches the time/space tradeoffs of the best known exact probabilistic inference algorithms:
- n^O(1) · 2^O(w), where w is the tree-width of the formula
- 2^O(w log n) if only linear space is used for the cache

#SAT with Component Caching
The procedure returns the probability a that a random truth assignment satisfies F, so # models = 2^n · a, where n is the number of variables of F.

#SAT(F)
    a = 1;
    for each G ∈ to_components(F) {
        if (G == ∅) m = 1;                      // no clauses left
        else if (∅ ∈ G) m = 0;                  // G contains the empty clause
        else if (in_cache(G)) m = cache_value(G);
        else {
            select v ∈ G;
            m = ½ · #SAT(G|v) + ½ · #SAT(G|¬v);
            insert_cache(G, m);
        }
        a = a · m;
    }
    return a;

Putting It All Together
Goal: combine
- clause learning
- component analysis
- formula caching
to create a practical #SAT algorithm.
Not quite as straightforward as it looks!

Issue 1: How Much to Cache?
- Everything? Infeasible – 10^50+ nodes
- Only sub-formulas on the current branch? Linear space; with a fixed variable ordering and no clause learning this is Recursive Conditioning (Darwiche 2002)
- Surely we can do better...

Efficient Cache Management
Ideal: make maximum use of RAM, but not one bit more.
Space- & age-bounded caching:
- Separate-chaining hash table
- Lazy deletion of entries older than K when searching chains: constant amortized time
- If the sum of all chains becomes too large, do a global cleanup (rare in practice)
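A minimal sketch of space- and age-bounded caching in Python; the class, parameters, and default values are illustrative assumptions, not the actual implementation. It is a separate-chaining table whose entries are lazily deleted once they are older than max_age operations, with a rare global cleanup if the total number of entries grows past max_entries.

    class BoundedCache:
        # Keys must be hashable (e.g. a frozenset of clauses representing a component).
        def __init__(self, num_buckets=1 << 16, max_age=100_000, max_entries=1_000_000):
            self.buckets = [[] for _ in range(num_buckets)]
            self.max_age = max_age
            self.max_entries = max_entries
            self.clock = 0          # advances on every lookup/insert
            self.size = 0

        def _chain(self, key):
            return self.buckets[hash(key) % len(self.buckets)]

        def _prune(self, chain):
            # Drop entries that have not been touched for more than max_age steps.
            live = [e for e in chain if self.clock - e[2] <= self.max_age]
            self.size -= len(chain) - len(live)
            chain[:] = live

        def lookup(self, key):
            self.clock += 1
            chain = self._chain(key)
            self._prune(chain)                       # lazy deletion while searching
            for i, (k, v, _) in enumerate(chain):
                if k == key:
                    chain[i] = (k, v, self.clock)    # refresh the entry's age
                    return v
            return None

        def insert(self, key, value):
            self.clock += 1
            self._chain(key).append((key, value, self.clock))
            self.size += 1
            if self.size > self.max_entries:         # global cleanup, rare in practice
                for chain in self.buckets:
                    self._prune(chain)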
Issue 2: Interaction of Component Analysis & Clause Learning
- Without CL, sub-formulas decrease in size: F becomes F|p and F|¬p
- With CL, sub-formulas may become huge: 1,000 original clauses vs. 1,000,000 learned clauses

Why this is a Problem
- Finding connected components at each node requires linear time: way too costly over the learned clauses
- Components involving learned clauses are unlikely to reoccur: defeats the purpose of formula caching

Suggestion
- Use only clauses derived from the original formula for
  - component analysis
  - the "keys" of cached entries
- Use all the learned clauses for unit propagation
Can this possibly be sound? Almost!

Main Theorem
[Figure: a component G of the residual formula F under the current partial assignment, together with learned clauses A1, A2, A3.]
Therefore: for SAT sub-formulas it is safe to use learned clauses for unit propagation!

UNSAT Sub-formulas
But if the residual formula is unsatisfiable, all bets are off...
- Without component caching there is still no problem, because the final value is 0 in any case
- With component caching, this could cause incorrect values to be cached

Solution
Flush the siblings (and their descendants) of unsat components from the cache.

#SAT CC+CL
#SAT(F)
    a = 1;  s = ∅;
    for each G ∈ to_components(F) {
        if (in_cache(G))
            m = cache_value(G);
        else {
            m = split(G);
            insert_cache(G, m);
        }
        a = a · m;
        if (m == 0) { flush_cache(s); break; }
        else s = s ∪ {G};
    }
    return a;

#SAT CC+CL continued
split(G)
    if (G == ∅) return 1;
    if (∅ ∈ G) { learn_new_clause(); return 0; }
    select v ∈ G;
    return ½ · #SAT(G|v) + ½ · #SAT(G|¬v);

Results: Pebbling Formulas
30 layers = 930 variables, 1,771 clauses

Results: Planning Problems

Results: Circuit Synthesis

Results: Random 3-SAT
[Plot: running time in seconds (0.1 to 100,000, log scale) vs. clause/variable ratio (0.8 to 2.2) on 75-variable random 3-SAT, comparing relsat and CC+CL.]

Summary
- Dramatic progress in automating propositional inference over the last decade
- Progress due to the careful refinement of a handful of ideas: DPLL, clause learning, restarts, component analysis, formula caching
- The successful unification of these elements for #SAT gives renewed hope for a universal reasoning engine!

What's Next?
- Evaluation of the weighted-#SAT version on Bayesian networks (a minimal sketch of the weighted recursion follows below)
- Better component ordering and component-aware variable branching heuristics
- Optimal restart policies for #SAT CC+CL
- Adapt the techniques for sampling methods – approximate inference???
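As a pointer for the weighted-#SAT direction mentioned above, here is a minimal sketch in Python of how the basic counting recursion generalizes; the weight convention, names, and branching rule are illustrative assumptions, not the actual implementation. Each variable carries a weight for its positive literal, the negative literal gets one minus that weight, and unlisted variables default to 0.5, so the unweighted count is the special case where every weight is 0.5.

    def weighted_count(clauses, weight):
        # clauses: list of frozensets of literals (a negative int denotes a negated
        # variable).  weight maps a variable to the weight of its positive literal;
        # the negative literal gets 1 - weight.  With all weights 0.5 this returns
        # the probability a random assignment is a model, i.e. (# models) / 2^n.
        if not clauses:
            return 1.0
        if frozenset() in clauses:
            return 0.0
        v = abs(next(iter(next(iter(clauses)))))      # branch on some variable of F
        w = weight.get(v, 0.5)
        return (w * weighted_count(condition(clauses, v), weight) +
                (1 - w) * weighted_count(condition(clauses, -v), weight))

    def condition(clauses, lit):
        # Residual formula after asserting literal lit: drop satisfied clauses and
        # remove the falsified literal -lit from the remaining ones.
        return [cl - {-lit} for cl in clauses if lit not in cl]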