Performing Bayesian Inference by Weighted Model Counting

Tian Sang, Paul Beame, and Henry Kautz
Department of Computer Science & Engineering
University of Washington
Seattle, WA

Goal
► Extend success of "compilation to SAT" work for NP-complete problems to "compilation to #SAT" for #P-complete problems
    Leverage rapid advances in SAT technology
    Examples: computing the permanent of a 0/1 matrix; inference in Bayesian networks (Roth 1996, Dechter 1999)
► Provide practical reasoning tool
► Demonstrate relationship between #SAT and conditioning algorithms
    In particular: compilation to DNNF (Darwiche 2002, 2004)

Contributions
► Simple encoding of Bayesian networks into weighted model counting
► Techniques for extending state-of-the-art SAT algorithms for efficient weighted model counting
► Evaluation on computationally challenging domains
    Outperforms join-tree methods on problems with high tree-width
    Competitive with best conditioning methods on problems with high degree of determinism

Outline
► Model counting
► Encoding Bayesian networks
► Related Bayesian inference algorithms
► Experiments
    Grid networks
    Plan recognition
► Conclusion

SAT and #SAT
► Given a CNF formula,
    SAT: find a satisfying assignment
    #SAT: count satisfying assignments
► Example: (x ∨ y) ∧ (y ∨ ¬z)
    5 models (x,y,z): (0,1,0), (0,1,1), (1,1,0), (1,1,1), (1,0,0)
    Equivalently: satisfying probability = 5/2^3
    Probability that the formula is satisfied by a random truth assignment
► Can modify Davis-Putnam-Logemann-Loveland (DPLL) to calculate this value

DPLL for SAT
    DPLL(F)
        if F is empty, return 1
        if F contains an empty clause, return 0
        else choose a variable x to branch
            return (DPLL(F|x=1) ∨ DPLL(F|x=0))

#DPLL for #SAT
    #DPLL(F)    // computes satisfying probability of F
        if F is empty, return 1
        if F contains an empty clause, return 0
        else choose a variable x to branch
            return 0.5 * #DPLL(F|x=1) + 0.5 * #DPLL(F|x=0)

Weighted Model Counting
► Each literal has a weight
    Weight of a model = product of the weights of its literals
    Weight of a formula = sum of the weights of its models

    WMC(F)
        if F is empty, return 1
        if F contains an empty clause, return 0
        else choose a variable x to branch
            return weight(x) * WMC(F|x=1) + weight(¬x) * WMC(F|x=0)

Cachet
► State of the art model counting program (Sang, Bacchus, Beame, Kautz, & Pitassi 2004)
► Key innovation: sound integration of component caching and clause learning
    Component analysis (Bayardo & Pehoushek 2000): if formulas C1 and C2 share no variables, WMC(C1 ∧ C2) = WMC(C1) * WMC(C2)
    Caching (Majercik & Littman 1998; Darwiche 2002; Bacchus, Dalmao, & Pitassi 2003; Beame, Impagliazzo, Pitassi, & Segerlind 2003): save and reuse values of internal nodes of search tree
    Clause learning (Marques-Silva 1996; Bayardo & Schrag 1997; Zhang, Madigan, Moskewicz, & Malik 2001): analyze reason for backtracking, store as a new clause
    Naïve combination of all three techniques is unsound
    Can resolve by careful cache management (Sang, Bacchus, Beame, Kautz, & Pitassi 2004)
    New branching strategy (VSADS) optimized for counting (Sang, Beame, & Kautz SAT-2005)

Computing All Marginals
► Task: in one counting pass,
    Compute number of models in which each literal is true
    Equivalently: compute marginal satisfying probabilities
► Approach
    Each recursion computes a vector of marginals
    At branch point: compute left and right vectors, combine with vector sum
    Cache vectors, not just counts
► Reasonable counting overhead: 10% - 40% slower than
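
The #DPLL and WMC recursions from the model counting slides can be sketched in a few lines of Python. This is a minimal illustration, not Cachet: it has no component analysis, caching, or clause learning. The names `condition` and `wmc`, the integer-literal clause representation, and the `free_vars` argument are assumptions made for this sketch. One deliberate difference from the slide pseudocode: when the formula becomes empty, the sketch multiplies in weight(x) + weight(¬x) for every still-unassigned variable, so it stays correct when a variable's two weights do not sum to 1.

```python
from fractions import Fraction

def condition(clauses, lit):
    """Simplify a CNF (list of frozensets of int literals) assuming `lit` is true."""
    out = []
    for clause in clauses:
        if lit in clause:
            continue                   # clause is satisfied, drop it
        out.append(clause - {-lit})    # drop the falsified literal (may leave an empty clause)
    return out

def wmc(clauses, weight, free_vars):
    """Weighted model count by DPLL-style branching (hypothetical helper, exponential time)."""
    if any(len(clause) == 0 for clause in clauses):
        return Fraction(0)             # empty clause: no models on this branch
    if not clauses:
        total = Fraction(1)            # empty formula: every completion is a model
        for v in free_vars:
            total *= weight[v] + weight[-v]
        return total
    x = abs(next(iter(clauses[0])))    # naive branching choice, not VSADS
    rest = free_vars - {x}
    return (weight[x] * wmc(condition(clauses, x), weight, rest)
            + weight[-x] * wmc(condition(clauses, -x), weight, rest))

# Slide example (x OR y) AND (y OR NOT z), with x=1, y=2, z=3.
clauses = [frozenset({1, 2}), frozenset({2, -3})]
variables = {1, 2, 3}

# All weights 1/2 reproduces #DPLL (satisfying probability).
half = {lit: Fraction(1, 2) for v in variables for lit in (v, -v)}
prob = wmc(clauses, half, variables)

# All weights 1 gives the plain model count.
one = {lit: Fraction(1) for v in variables for lit in (v, -v)}
count = wmc(clauses, one, variables)
```

With all literal weights set to 1/2, `prob` is the satisfying probability from the SAT and #SAT slide (5/2^3), and with all weights set to 1, `count` is the number of models (5).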
Encoding Bayesian Networks to Weighted Model Counting
► Example network: A → B
    P(A) = 0.1
    P(B | A) = 0.2     P(¬B | A) = 0.8
    P(B | ¬A) = 0.6    P(¬B | ¬A) = 0.4
► Chance variable P added with weight(P) = 0.2 and weight(¬P) = 0.8
    A ∧ P ⇒ B
    A ∧ ¬P ⇒ ¬B
► Chance variable Q added with weight(Q) = 0.6 and weight(¬Q) = 0.4
    ¬A ∧ Q ⇒ B
    ¬A ∧ ¬Q ⇒ ¬B
► Resulting weights
    w(A) = 0.1    w(¬A) = 0.9
    w(P) = 0.2    w(¬P) = 0.8
    w(Q) = 0.6    w(¬Q) = 0.4
    w(B) = 1.0    w(¬B) = 1.0

Main Theorem
► Let:
    F = a weighted CNF encoding of a Bayes net
    E = an arbitrary CNF formula, the evidence
    Q = an arbitrary CNF formula, the query
► Then:
    P(Q | E) = WMC(F ∧ Q ∧ E) / WMC(F ∧ E)

Exact Bayesian Inference Algorithms
► Junction tree algorithm (Shenoy & Shafer 1990)
    Most widely used approach
    Data structure grows exponentially large in tree-width of underlying graph
► To handle high tree-width, researchers developed conditioning algorithms, e.g.:
    Recursive conditioning (Darwiche 2001)
    Value elimination (Bacchus, Dalmao, Pitassi 2003)
    Compilation to d-DNNF (Darwiche 2002; Chavira, Darwiche, Jaeger 2004; Darwiche 2004)
► These algorithms become similar to DPLL...

Techniques
    Method                    Cache index          Cache value    Branching heuristic    Clause learning?
    Weighted Model Counting   component            probability    dynamic
    Recursive Conditioning    partial assignment   probability    static
    Value Elimination         dependency set       probability    semidynamic
    Compiling to d-DNNF       residual formula     d-DNNF         semidynamic

Experiments
► Our benchmarks: Grid, Plan Recognition
    Junction tree: Netica
    Recursive conditioning: SamIam
    Value elimination: Valelim
    Weighted model counting: Cachet
► ISCAS-85 and SATLIB benchmarks
    Compilation to d-DNNF: timings from (Darwiche 2004)
    Weighted model counting: Cachet

Experiments: Grid Networks
[Figure: grid network with nodes S and T labeled]
► CPTs are set randomly.
► A fraction of the nodes are deterministic, specified as a parameter ratio.
► T is the query node

Results of ratio=0.5
    Size     Junction Tree   Recursive Conditioning   Value Elimination   Weighted Model Counting
    10*10    0.02            0.88                     2.0                 7.3
    12*12    0.55            1.6                      15.4                38
    14*14    21              7.9                      87                  419
    16*16    X               104                      >20,861             890
    18*18    X               2,126                    X                   13,111
10 problems of each size, X = memory out or time out

Results of ratio=0.75
    Size     Junction Tree   Recursive Conditioning   Value Elimination   Weighted Model Counting
    12*12    0.47            1.5                      1.4                 1.0
    14*14    2120            15                       8.3                 4.7
    16*16    >227            93                       71                  39
    18*18    X               1,751                    >1,053              81
    20*20    X               >24,026                  >94,997             248
    22*22    X               X                        X                   1,300
    24*24    X               X                        X                   4,998

Results of ratio=0.9
    Size     Junction Tree   Recursive Conditioning   Value Elimination   Weighted Model Counting
    16*16    259             102                      0.55                0.47
    18*18    X               1,151                    1.9                 1.4
    20*20    X               >44,675                  13                  1.7
    24*24    X               X                        84                  4.5
    26*26    X               X                        >8,010              14
    30*30    X               X                        X                   108

Plan Recognition
► Task:
    Given a planning domain described by STRIPS operators, initial and goal states, and time horizon
    Infer the marginal probabilities of each action
► Abstraction of strategic plan recognition: we know the enemy's capabilities and goals; what will it do?
► Modified Blackbox planning system (Kautz & Selman 1999) to create instances

    Problem   Variables   Junction Tree   Recursive Conditioning   Value Elimination   Weighted Model Counting
    4-step    165         0.16            8.3                      0.03                0.03
    5-step    177         56              36                       0.04                0.03
    tire-1    352         X               X                        0.68                0.12
    tire-2    550         X               X                        4.1                 0.09
    tire-3    577         X               X                        24                  0.23
    tire-4    812         X               X                        25                  1.1
    log-1     939         X               X                        24                  0.11
    log-2     1337        X               X                        X                   7.9
    log-3     1413        X               X                        X                   9.7
    log-4     2303        X               X                        X                   65

ISCAS/SATLIB Benchmarks
    Benchmarks reported in (Darwiche 2004)   Compiling to d-DNNF   Weighted Model Counting
    uf200 (100 instances)                    13                    7
    flat200 (100 instances)                  50                    8
    c432                                     0.1                   0.1
    c499                                     6                     85
    c880                                     80                    17,506
    c1355                                    15                    7,057
    c1908                                    187                   1,855

Summary
► Bayesian inference by translation to model counting is competitive with best known algorithms for problems with
    High tree-width
    High degree of determinism
► Recent conditioning algorithms already make use of important SAT techniques
    Most striking: compilation to d-DNNF
► Translation approach makes it possible to quickly exploit future SAT algorithms and implementations
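
As a worked illustration of the encoding and Main Theorem slides, the two-node network A → B can be written as a weighted CNF and queried by brute-force enumeration. This is only a sketch under stated assumptions: the clause forms (A ∧ P ⇒ B, A ∧ ¬P ⇒ ¬B, ¬A ∧ Q ⇒ B, ¬A ∧ ¬Q ⇒ ¬B) are a reading of the slide's example, the integer variable numbering and the helper names `wmc` and `posterior` are hypothetical, and exhaustive enumeration stands in for a real counter such as Cachet.

```python
from fractions import Fraction
from itertools import product

# Variable numbering is an assumption of this sketch.
A, B, P, Q = 1, 2, 3, 4
N_VARS = 4

# Literal weights from the encoding slide (negative int = negated literal).
weight = {
    A: Fraction(1, 10), -A: Fraction(9, 10),
    P: Fraction(2, 10), -P: Fraction(8, 10),
    Q: Fraction(6, 10), -Q: Fraction(4, 10),
    B: Fraction(1),     -B: Fraction(1),
}

# CNF for the four implications, each rewritten as a clause.
F = [
    [-A, -P, B],   # A and P imply B
    [-A, P, -B],   # A and not P imply not B
    [A, -Q, B],    # not A and Q imply B
    [A, Q, -B],    # not A and not Q imply not B
]

def wmc(clauses):
    """Sum the weights of all satisfying assignments by exhaustive enumeration."""
    total = Fraction(0)
    for bits in product([False, True], repeat=N_VARS):
        lits = [v if bits[v - 1] else -v for v in range(1, N_VARS + 1)]
        true_lits = set(lits)
        if all(any(l in true_lits for l in clause) for clause in clauses):
            w = Fraction(1)
            for l in lits:
                w *= weight[l]
            total += w
    return total

def posterior(query, evidence):
    """P(query | evidence) = WMC(F and query and evidence) / WMC(F and evidence)."""
    return wmc(F + query + evidence) / wmc(F + evidence)

p_b = posterior([[B]], [])             # prior marginal of B
p_a_given_b = posterior([[A]], [[B]])  # posterior of A given evidence B
```

Here `p_b` agrees with the direct calculation 0.1 * 0.2 + 0.9 * 0.6 = 0.56, and `p_a_given_b` equals 0.02 / 0.56. Exact `Fraction` arithmetic is used only so the results can be compared without floating-point tolerance.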