Performing Bayesian Inference by Weighted Model Counting

Tian Sang, Paul Beame, and Henry Kautz
Department of Computer Science & Engineering
University of Washington
Seattle, WA
Goal
► Extend the success of "compilation to SAT" work for NP-complete problems to "compilation to #SAT" for #P-complete problems
  • Leverage rapid advances in SAT technology
  • Example: computing the permanent of a 0/1 matrix
  • Inference in Bayesian networks (Roth 1996; Dechter 1999)
► Provide a practical reasoning tool
► Demonstrate the relationship between #SAT and conditioning algorithms
  • In particular: compilation to d-DNNF (Darwiche 2002, 2004)
Contributions
► Simple encoding of Bayesian networks into weighted model counting
► Techniques for extending state-of-the-art SAT algorithms for efficient weighted model counting
► Evaluation on computationally challenging domains
  • Outperforms join-tree methods on problems with high tree-width
  • Competitive with the best conditioning methods on problems with a high degree of determinism
Outline
► Model counting
► Encoding Bayesian networks
► Related Bayesian inference algorithms
► Experiments
  • Grid networks
  • Plan recognition
► Conclusion
SAT and #SAT
► Given a CNF formula:
  • SAT: find a satisfying assignment
  • #SAT: count satisfying assignments
► Example: (x ∨ y) ∧ (y ∨ ¬z)
  • 5 models: (0,1,0), (0,1,1), (1,1,0), (1,1,1), (1,0,0)
  • Equivalently: satisfying probability = 5/2³
► Satisfying probability = the probability that the formula is satisfied by a random truth assignment
► Can modify Davis-Putnam-Logemann-Loveland (DPLL) to calculate this value
DPLL for SAT
DPLL(F)
  if F is empty, return 1
  if F contains an empty clause, return 0
  else choose a variable x to branch
    return DPLL(F|x=1) ∨ DPLL(F|x=0)
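A minimal runnable sketch of this procedure in Python (the clause representation and the condition() helper are our own illustrative choices, not a real solver's internals; a real solver adds unit propagation and a smarter branching heuristic):

  # A formula is a list of clauses; a clause is a list of nonzero ints,
  # where a positive int is a variable and a negative int its negation.

  def condition(formula, literal):
      # Simplify F under "literal is true": drop satisfied clauses,
      # delete the falsified literal from the remaining ones.
      return [[l for l in c if l != -literal]
              for c in formula if literal not in c]

  def dpll(formula):
      if not formula:
          return True                    # no clauses left: satisfied
      if any(len(c) == 0 for c in formula):
          return False                   # empty clause: contradiction
      x = abs(formula[0][0])             # naive branching choice
      return dpll(condition(formula, x)) or dpll(condition(formula, -x))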
#DPLL for #SAT
#DPLL(F)
  // computes the satisfying probability of F
  if F is empty, return 1
  if F contains an empty clause, return 0
  else choose a variable x to branch
    return 0.5*#DPLL(F|x=1) + 0.5*#DPLL(F|x=0)
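The counting version changes only the return values; a sketch reusing the condition() helper from above:

  def sharp_dpll(formula):
      # Satisfying probability: the fraction of complete truth
      # assignments that satisfy the formula.
      if not formula:
          return 1.0
      if any(len(c) == 0 for c in formula):
          return 0.0
      x = abs(formula[0][0])
      return 0.5 * sharp_dpll(condition(formula, x)) + \
             0.5 * sharp_dpll(condition(formula, -x))

On the earlier example, sharp_dpll([[1, 2], [2, -3]]) (with x=1, y=2, z=3) returns 0.625 = 5/8.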
Weighted Model Counting
► Each literal has a weight
  • Weight of a model = product of the weights of its literals
  • Weight of a formula = sum of the weights of its models
WMC(F)
  if F is empty, return 1
  if F contains an empty clause, return 0
  else choose a variable x to branch
    return weight(x) * WMC(F|x=1) + weight(¬x) * WMC(F|x=0)
Cachet
► State-of-the-art model counting program (Sang, Bacchus, Beame, Kautz, & Pitassi 2004)
► Key innovation: sound integration of component caching and clause learning
  • Component analysis (Bayardo & Pehoushek 2000): if formulas C1 and C2 share no variables, WMC(C1 ∧ C2) = WMC(C1) × WMC(C2)
  • Caching (Majercik & Littman 1998; Darwiche 2002; Bacchus, Dalmao, & Pitassi 2003; Beame, Impagliazzo, Pitassi, & Segerlind 2003): save and reuse values of internal nodes of the search tree
  • Clause learning (Marques-Silva 1996; Bayardo & Schrag 1997; Zhang, Madigan, Moskewicz, & Malik 2001): analyze the reason for backtracking, store it as a new clause
  • Naïve combination of all three techniques is unsound
  • Can be resolved by careful cache management (Sang, Bacchus, Beame, Kautz, & Pitassi 2004)
  • New branching strategy (VSADS) optimized for counting (Sang, Beame, & Kautz SAT-2005)
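A sketch of the component-analysis idea layered on the wmc() routine above (our own naive illustration: the cache is keyed on a component's clause set, clause learning is omitted, and it is exactly the interaction between learning and caching that the careful management cited above must handle):

  def split_components(formula):
      # Partition clauses into variable-disjoint components.
      components = []
      for clause in formula:
          vars_c = {abs(l) for l in clause}
          merged, rest = [clause], []
          for comp in components:
              if vars_c & {abs(l) for cl in comp for l in cl}:
                  merged.extend(comp)    # shares variables: merge
              else:
                  rest.append(comp)
          components = rest + [merged]
      return components

  cache = {}

  def wmc_cached(formula, weight):
      # Multiply the counts of independent components, caching each one.
      total = 1.0
      for comp in split_components(formula):
          key = frozenset(frozenset(c) for c in comp)
          if key not in cache:
              cache[key] = wmc_branch(comp, weight)
          total *= cache[key]
      return total

  def wmc_branch(formula, weight):
      if not formula:
          return 1.0
      if any(len(c) == 0 for c in formula):
          return 0.0
      x = abs(formula[0][0])
      return weight[x]  * wmc_cached(condition(formula, x),  weight) + \
             weight[-x] * wmc_cached(condition(formula, -x), weight)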
Computing All Marginals
► Task: in one counting pass,
  • Compute the number of models in which each literal is true
  • Equivalently: compute marginal satisfying probabilities
► Approach (see the worked instance below):
  • Each recursion computes a vector of marginals
  • At a branch point: compute the left and right vectors, combine with a vector sum
  • Cache vectors, not just counts
► Reasonable overhead: 10%–40% slower than counting alone
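A worked instance on the earlier formula (x ∨ y) ∧ (y ∨ ¬z), branching on y: the y=1 branch is the empty formula, with satisfying probability 1 and residual marginals M(x) = M(z) = 0.5; the y=0 branch leaves (x) ∧ (¬z), with probability 1/4, M(x) = 1/4, M(z) = 0. Combining the two branches with weight 0.5 each: P = 0.5·1 + 0.5·0.25 = 5/8, M(x) = 0.5·0.5 + 0.5·0.25 = 3/8, M(y) = 0.5·1 = 4/8, M(z) = 0.5·0.5 + 0.5·0 = 2/8, which matches a direct count over the five models.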
Encoding Bayesian Networks to Weighted Model Counting

[Figure: a two-node network A → B, with CPTs P(A) = 0.1, P(B|A) = 0.2, P(B|¬A) = 0.6]

► Chance variable P added for P(B|A), with weight(P) = 0.2 and weight(¬P) = 0.8:
  A ∧ P ⇒ B
  A ∧ ¬P ⇒ ¬B
► Chance variable Q added for P(B|¬A), with weight(Q) = 0.6 and weight(¬Q) = 0.4:
  ¬A ∧ Q ⇒ B
  ¬A ∧ ¬Q ⇒ ¬B
► Resulting weights:
  w(A) = 0.1    w(¬A) = 0.9
  w(P) = 0.2    w(¬P) = 0.8
  w(Q) = 0.6    w(¬Q) = 0.4
  w(B) = 1.0    w(¬B) = 1.0
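As a concrete check, this encoding can be run through the wmc() sketch from earlier (the variable numbering A=1, B=2, P=3, Q=4 is an arbitrary illustrative choice):

  # Clauses A∧P ⇒ B, A∧¬P ⇒ ¬B, ¬A∧Q ⇒ B, ¬A∧¬Q ⇒ ¬B in CNF,
  # with A=1, B=2, P=3, Q=4.
  bn = [[-1, -3, 2], [-1, 3, -2], [1, -4, 2], [1, 4, -2]]
  w  = {1: 0.1, -1: 0.9, 3: 0.2, -3: 0.8, 4: 0.6, -4: 0.4, 2: 1.0, -2: 1.0}

  print(wmc(bn, w))          # 1.0: the weights of all models sum to 1
  print(wmc(bn + [[2]], w))  # P(B) = 0.1*0.2 + 0.9*0.6 = 0.56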
Main Theorem
► Let:
  • F = a weighted CNF encoding of a Bayes net
  • E = an arbitrary CNF formula, the evidence
  • Q = an arbitrary CNF formula, the query
► Then:
  P(Q | E) = WMC(F ∧ Q ∧ E) / WMC(F ∧ E)
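On the two-node example, with evidence B and query A: P(A | B) = WMC(F ∧ A ∧ B) / WMC(F ∧ B) = (0.1 × 0.2) / 0.56 ≈ 0.036, i.e. wmc(bn + [[1], [2]], w) / wmc(bn + [[2]], w) in the sketch code above.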
Exact Bayesian Inference Algorithms
► Junction tree algorithm (Shenoy & Shafer 1990)
  • Most widely used approach
  • Data structure grows exponentially large in the tree-width of the underlying graph
► To handle high tree-width, researchers developed conditioning algorithms, e.g.:
  • Recursive conditioning (Darwiche 2001)
  • Value elimination (Bacchus, Dalmao, & Pitassi 2003)
  • Compilation to d-DNNF (Darwiche 2002; Chavira, Darwiche, & Jaeger 2004; Darwiche 2004)
► These algorithms become similar to DPLL...
Techniques

Method                    Cache index          Cache value   Branching heuristic   Clause learning?
Weighted Model Counting   component            probability   dynamic               yes
Recursive Conditioning    partial assignment   probability   static                no
Value Elimination         dependency set       probability   semidynamic           no
Compiling to d-DNNF       residual formula     d-DNNF        semidynamic           yes
Experiments
► Our benchmarks: Grid, Plan Recognition
  • Junction tree – Netica
  • Recursive conditioning – SamIam
  • Value elimination – Valelim
  • Weighted model counting – Cachet
► ISCAS-85 and SATLIB benchmarks
  • Compilation to d-DNNF – timings from (Darwiche 2004)
  • Weighted model counting – Cachet
Experiments: Grid Networks

[Figure: grid network with nodes S and T]

► CPTs are set randomly.
► A fraction of the nodes are deterministic, specified by the parameter ratio.
► T is the query node.
Results for ratio = 0.5 (runtimes in seconds; 10 problems of each size; X = memory out or time out)

Size    Junction Tree   Recursive Conditioning   Value Elimination   Weighted Model Counting
10×10   0.02            0.88                     2.0                 7.3
12×12   0.55            1.6                      15.4                38
14×14   21              7.9                      87                  419
16×16   X               104                      >20,861             890
18×18   X               2,126                    X                   13,111
Results for ratio = 0.75

Size    Junction Tree   Recursive Conditioning   Value Elimination   Weighted Model Counting
12×12   0.47            1.5                      1.4                 1.0
14×14   2,120           15                       8.3                 4.7
16×16   >227            93                       71                  39
18×18   X               1,751                    >1,053              81
20×20   X               >24,026                  >94,997             248
22×22   X               X                        X                   1,300
24×24   X               X                        X                   4,998
Results for ratio = 0.9

Size    Junction Tree   Recursive Conditioning   Value Elimination   Weighted Model Counting
16×16   259             102                      0.55                0.47
18×18   X               1,151                    1.9                 1.4
20×20   X               >44,675                  13                  1.7
24×24   X               X                        84                  4.5
26×26   X               X                        >8,010              14
30×30   X               X                        X                   108
Plan Recognition
► Task:
  • Given a planning domain described by STRIPS operators, initial and goal states, and a time horizon
  • Infer the marginal probabilities of each action
► Abstraction of strategic plan recognition: we know the enemy's capabilities and goals; what will it do?
► Modified the Blackbox planning system (Kautz & Selman 1999) to create instances
Problem   Variables   Junction Tree   Recursive Conditioning   Value Elimination   Weighted Model Counting
4-step    165         0.16            8.3                      0.03                0.03
5-step    177         56              36                       0.04                0.03
tire-1    352         X               X                        0.68                0.12
tire-2    550         X               X                        4.1                 0.09
tire-3    577         X               X                        24                  0.23
tire-4    812         X               X                        25                  1.1
log-1     939         X               X                        24                  0.11
log-2     1337        X               X                        X                   7.9
log-3     1413        X               X                        X                   9.7
log-4     2303        X               X                        X                   65
ISCAS/SATLIB Benchmarks

Benchmark (reported in Darwiche 2004)   Compiling to d-DNNF   Weighted Model Counting
uf200 (100 instances)                   13                    7
flat200 (100 instances)                 50                    8
c432                                    0.1                   0.1
c499                                    6                     85
c880                                    80                    17,506
c1355                                   15                    7,057
c1908                                   187                   1,855
Summary
► Bayesian inference by translation to model counting is competitive with the best known algorithms for problems with
  • High tree-width
  • High degree of determinism
► Recent conditioning algorithms already make use of important SAT techniques
  • Most striking: compilation to d-DNNF
► The translation approach makes it possible to quickly exploit future SAT algorithms and implementations
Download