Two Approximate Algorithms for Belief Updating

Mini-Clustering (MC): Robert Mateescu, Rina Dechter, Kalev Kask. "Tree Approximation for Belief Updating", AAAI-2002
Iterative Join-Graph Propagation (IJGP): Rina Dechter, Kalev Kask and Robert Mateescu. "Iterative Join-Graph Propagation", UAI-2002

What is Mini-Clustering?
- Mini-Clustering (MC) is an approximate algorithm for belief updating in Bayesian networks
- MC is an anytime version of join-tree clustering
- MC applies message passing along a cluster tree
- The complexity of MC is controlled by a user-adjustable parameter, the i-bound
- Empirical evaluation shows that MC is a very effective algorithm, in many cases superior to other approximate schemes (IBP, Gibbs sampling)

Belief networks
A belief network is a quadruple BN = \langle X, D, G, P \rangle, where:
- X = \{X_1, ..., X_n\} is a set of random variables
- D = \{D_1, ..., D_n\} is the set of their domains
- G is a DAG (directed acyclic graph) over X
- P = \{p_1, ..., p_n\}, p_i = P(X_i \mid pa_i), are CPTs (conditional probability tables)

The belief updating problem is the task of computing the posterior probability P(Y|e) of query nodes Y \subseteq X given evidence e. We focus on the basic case where Y is a single variable X_i.

Tree decompositions
A tree decomposition for a belief network BN = \langle X, D, G, P \rangle is a triple \langle T, \chi, \psi \rangle, where T = (V, E) is a tree and \chi and \psi are labeling functions, associating with each vertex v \in V two sets, \chi(v) \subseteq X and \psi(v) \subseteq P, satisfying:
1. For each function p_i \in P there is exactly one vertex v such that p_i \in \psi(v) and scope(p_i) \subseteq \chi(v)
2. For each variable X_i \in X, the set \{v \in V \mid X_i \in \chi(v)\} forms a connected subtree (the running intersection property)

[Figure: a belief network over A, B, C, D, E, F, G and a tree decomposition for it, with clusters and separators:]
- Cluster 1: \chi = \{A,B,C\}, \psi = \{p(a), p(b|a), p(c|a,b)\}; separator \{B,C\}
- Cluster 2: \chi = \{B,C,D,F\}, \psi = \{p(d|b), p(f|c,d)\}; separator \{B,F\}
- Cluster 3: \chi = \{B,E,F\}, \psi = \{p(e|b,f)\}; separator \{E,F\}
- Cluster 4: \chi = \{E,F,G\}, \psi = \{p(g|e,f)\}

Cluster Tree Elimination
- Cluster Tree Elimination (CTE) is an exact algorithm
- It works by passing messages along a tree decomposition
- Basic idea:
  - Each node sends only one message to each of its neighbors
  - Node u sends a message to its neighbor v only when u has received messages from all its other neighbors

Cluster Tree Elimination
Previous work on tree clustering:
- Lauritzen, Spiegelhalter - '88 (probabilities)
- Jensen, Lauritzen, Olesen - '90 (probabilities)
- Shenoy, Shafer - '90, Shenoy - '97 (general)
- Dechter, Pearl - '89 (constraints)
- Gottlob, Leone, Scarcello - '00 (constraints)

Belief Propagation
Let cluster(u) = \psi(u) \cup \{h_{(x_1,u)}, h_{(x_2,u)}, ..., h_{(x_n,u)}, h_{(v,u)}\} be the functions of node u together with the messages received from its neighbors x_1, ..., x_n, v. The message from u to v is computed as
h_{(u,v)} = \sum_{elim(u,v)} \prod_{f \in cluster(u) \setminus \{h_{(v,u)}\}} f
where elim(u,v) = \chi(u) \setminus sep(u,v).

Cluster Tree Elimination - example
h_{(1,2)}(b,c) = \sum_a p(a) \cdot p(b|a) \cdot p(c|a,b)
h_{(2,1)}(b,c) = \sum_{d,f} p(d|b) \cdot p(f|c,d) \cdot h_{(3,2)}(b,f)
h_{(2,3)}(b,f) = \sum_{c,d} p(d|b) \cdot p(f|c,d) \cdot h_{(1,2)}(b,c)
h_{(3,2)}(b,f) = \sum_e p(e|b,f) \cdot h_{(4,3)}(e,f)
h_{(3,4)}(e,f) = \sum_b p(e|b,f) \cdot h_{(2,3)}(b,f)
h_{(4,3)}(e,f) = p(G = g_e \mid e,f)

Cluster Tree Elimination - the messages
For the edge between clusters 2 (BCDF) and 3 (BEF): sep(2,3) = \{B,F\} and elim(2,3) = \{C,D\}, so
h_{(2,3)}(b,f) = \sum_{c,d} p(d|b) \cdot p(f|c,d) \cdot h_{(1,2)}(b,c)
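To make the message computation concrete, here is a minimal Python sketch of CTE's h(u,v). It is an illustration under assumptions of ours, not the paper's implementation: the Factor class, the dictionary-based tables, and all function and parameter names are invented for this example.

```python
# Illustrative sketch only: Factor, cte_message and the table encoding are
# our assumptions; the slides define h(u,v) mathematically, not in code.
from itertools import product

class Factor:
    """A function over discrete variables: `scope` is a tuple of variable
    names; `table` maps one value-tuple per scope to a number. CPTs and
    messages are both represented this way."""
    def __init__(self, scope, table):
        self.scope, self.table = scope, table

    def value(self, assignment):
        # `assignment` is a dict variable -> value that covers self.scope
        return self.table[tuple(assignment[x] for x in self.scope)]

def cte_message(factors, sep_vars, elim_vars, domains):
    """h(u,v) = sum over elim(u,v) of the product of all functions in
    cluster(u) except h(v,u); the caller passes in `factors` the CPTs of u
    plus the messages received from u's other neighbors."""
    table = {}
    for sep_vals in product(*(domains[x] for x in sep_vars)):
        total = 0.0
        for elim_vals in product(*(domains[x] for x in elim_vars)):
            a = dict(zip(sep_vars, sep_vals))
            a.update(zip(elim_vars, elim_vals))
            p = 1.0
            for f in factors:
                p *= f.value(a)
            total += p
        table[sep_vals] = total
    return Factor(tuple(sep_vars), table)
```

For instance, h_{(1,2)}(b,c) of the running example would correspond to cte_message([p_a, p_ba, p_cab], ('B','C'), ('A',), domains), where p_a, p_ba and p_cab are hypothetical Factor objects for p(a), p(b|a) and p(c|a,b).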
Cluster Tree Elimination - properties
Correctness and completeness: algorithm CTE is correct, i.e. it computes the exact joint probability of a single variable and the evidence.
Time complexity: O(deg \cdot (n+N) \cdot d^{w*+1})
Space complexity: O(N \cdot d^{sep})
where:
- deg = the maximum degree of a node
- n = number of variables (= number of CPTs)
- N = number of nodes in the tree decomposition
- d = the maximum domain size of a variable
- w* = the induced width
- sep = the separator size

Mini-Clustering - motivation
- The time and space complexity of Cluster Tree Elimination are exponential in the induced width w* of the problem
- When the induced width w* is big, the CTE algorithm becomes infeasible

Mini-Clustering - the basic idea
- Try to reduce the size of the cluster (the exponent): partition each cluster into mini-clusters with fewer variables
- Accuracy parameter: i = the maximum number of variables in a mini-cluster
- The idea was explored for variable elimination (Mini-Buckets)

Mini-Clustering
Suppose cluster(u) is partitioned into p mini-clusters mc(1), ..., mc(p), each containing at most i variables. CTE computes the exact message
h_{(u,v)} = \sum_{elim(u,v)} \prod_{k=1}^{p} \prod_{f \in mc(k)} f
We want to process each \prod_{f \in mc(k)} f separately: approximate each product for k = 2, ..., p and take it outside the summation.
How to process the mini-clusters to obtain approximations or bounds:
- Process all mini-clusters by summation - this gives an upper bound on the joint probability
- A tighter upper bound: process one mini-cluster by summation and the others by maximization
- Can also use the mean operator (average) - this gives an approximation of the joint probability

Idea of Mini-Clustering
Split a cluster into mini-clusters to bound the complexity. For two nonnegative functions h and g, for example:
\sum_X h \cdot g \le (\sum_X h) \cdot (\max_X g)
Exponential complexity decrease: O(e^n) becomes O(e^r) + O(e^{n-r})

Mini-Clustering - example
H_{(1,2)} = \{ h^1_{(1,2)}(b,c) = \sum_a p(a) \cdot p(b|a) \cdot p(c|a,b) \}
H_{(2,1)} = \{ h^1_{(2,1)}(b) = \sum_{d,f} p(d|b) \cdot h^1_{(3,2)}(b,f), \; h^2_{(2,1)}(c) = \max_{d,f} p(f|c,d) \}
H_{(2,3)} = \{ h^1_{(2,3)}(b) = \sum_{c,d} p(d|b) \cdot h^1_{(1,2)}(b,c), \; h^2_{(2,3)}(f) = \max_{c,d} p(f|c,d) \}
H_{(3,2)} = \{ h^1_{(3,2)}(b,f) = \sum_e p(e|b,f) \cdot h^1_{(4,3)}(e,f) \}
H_{(3,4)} = \{ h^1_{(3,4)}(e,f) = \sum_b p(e|b,f) \cdot h^1_{(2,3)}(b) \cdot h^2_{(2,3)}(f) \}
H_{(4,3)} = \{ h^1_{(4,3)}(e,f) = p(G = g_e \mid e,f) \}

Mini-Clustering - the messages, i=3
Cluster 2 (BCDF) is partitioned into the mini-clusters \{B,C,D\} (holding p(d|b) and the incoming h^1_{(1,2)}(b,c)) and \{C,D,F\} (holding p(f|c,d)). With sep(2,3) = \{B,F\} and elim(2,3) = \{C,D\}:
h^1_{(1,2)}(b,c) = \sum_a p(a) \cdot p(b|a) \cdot p(c|a,b)
h^1_{(2,3)}(b) = \sum_{c,d} p(d|b) \cdot h^1_{(1,2)}(b,c)
h^2_{(2,3)}(f) = \max_{c,d} p(f|c,d)

Cluster Tree Elimination vs. Mini-Clustering
On the example tree, CTE sends one exact message per edge direction: h_{(1,2)}(b,c), h_{(2,1)}(b,c), h_{(2,3)}(b,f), h_{(3,2)}(b,f), h_{(3,4)}(e,f), h_{(4,3)}(e,f). MC(3) sends sets of smaller messages instead: H_{(1,2)} = \{h^1_{(1,2)}(b,c)\}, H_{(2,1)} = \{h^1_{(2,1)}(b), h^2_{(2,1)}(c)\}, H_{(2,3)} = \{h^1_{(2,3)}(b), h^2_{(2,3)}(f)\}, H_{(3,2)} = \{h^1_{(3,2)}(b,f)\}, H_{(3,4)} = \{h^1_{(3,4)}(e,f)\}, H_{(4,3)} = \{h^1_{(4,3)}(e,f)\}.
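The MC(i) message step can be sketched in code as well, reusing the illustrative Factor representation from the CTE sketch above. The greedy partitioning rule and all names here are our assumptions: the algorithm only requires that each mini-cluster contains at most i variables, and replacing max with an average gives the mean-operator variant.

```python
# Reuses Factor and itertools.product from the CTE sketch above.
def scope_size(group, extra):
    """Number of distinct variables a mini-cluster would have after
    adding factor `extra`."""
    vars_ = set(extra.scope)
    for g in group:
        vars_.update(g.scope)
    return len(vars_)

def partition_into_mini_clusters(factors, i):
    """Greedy partitioning (one possible choice): put each function into
    the first mini-cluster that stays within i variables, else open a
    new one."""
    mini_clusters = []
    for f in factors:
        for mc in mini_clusters:
            if scope_size(mc, f) <= i:
                mc.append(f)
                break
        else:
            mini_clusters.append([f])
    return mini_clusters

def mc_message(factors, sep_vars, elim_vars, domains, i):
    """Compute the set H(u,v) = {h^1, ..., h^p}: the first mini-cluster
    is eliminated by summation and the others by maximization, giving an
    upper bound on the exact h(u,v)."""
    messages = []
    for k, mc in enumerate(partition_into_mini_clusters(factors, i)):
        mc_vars = set().union(*(g.scope for g in mc))
        scope = tuple(x for x in sep_vars if x in mc_vars)
        elim = tuple(x for x in elim_vars if x in mc_vars)
        table = {}
        for sep_vals in product(*(domains[x] for x in scope)):
            vals = []
            for elim_vals in product(*(domains[x] for x in elim)):
                a = dict(zip(scope, sep_vals))
                a.update(zip(elim, elim_vals))
                p = 1.0
                for g in mc:
                    p *= g.value(a)
                vals.append(p)
            table[sep_vals] = sum(vals) if k == 0 else max(vals)
        messages.append(Factor(scope, table))
    return messages
```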
Mini-Clustering - properties
Correctness and completeness: algorithm MC(i) computes a bound (or an approximation) on the joint probability P(X_i, e) of each variable and each of its values.
Time and space complexity: O(n \cdot hw* \cdot d^i), where hw* = \max_u |\{f \mid f \in \psi(u)\}|

Normalization
Algorithms for the belief updating problem compute, in general, the joint probability P(X_i, e), where X_i is the query node and e is the evidence. Computing the conditional probability P(X_i \mid e):
- is easy to do if exact algorithms can be applied
- becomes an important issue for approximate algorithms

Normalization
- MC can compute an (upper) bound \bar{P}(X_i, e) on the joint probability P(X_i, e)
- Deriving a bound on the conditional probability P(X_i \mid e) is not easy when the exact P(e) is not available
- If a lower bound \underline{P}(e) were available, we could use \bar{P}(X_i, e) / \underline{P}(e) as an upper bound on the posterior
- In our experiments we normalized the results and regarded them as approximations of the posterior P(X_i \mid e)
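A minimal sketch of this normalization step, assuming the MC outputs for all values of the query variable have been collected in a dict; the function name is hypothetical.

```python
def approximate_posterior(mc_joint):
    """`mc_joint` maps each value x of the query variable X_i to the MC
    output for P(X_i = x, e). Normalizing over the values of X_i yields
    the approximation of the posterior P(X_i | e) used in the experiments;
    without the exact P(e) it is no longer a guaranteed bound."""
    z = sum(mc_joint.values())
    return {x: v / z for x, v in mc_joint.items()} if z > 0 else dict(mc_joint)
```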
Experimental results
We tested MC with the max and mean operators.
Algorithms:
- Exact
- IBP
- Gibbs sampling (GS)
- MC with normalization (approximate)
Measures:
- Normalized Hamming Distance (NHD)
- BER (Bit Error Rate)
- Absolute error
- Relative error
- Time
Networks (all variables are binary):
- Coding networks
- CPCS 54, 360, 422
- Grid networks (MxM)
- Random noisy-OR networks
- Random networks

Random networks - Absolute error
[Figure: absolute error vs. i-bound for MC, Gibbs Sampling and IBP; random networks, N=50, P=2, k=2, w*=10, 50 instances; panels for evidence=0 and evidence=10]

Coding networks - Bit Error Rate
[Figure: Bit Error Rate vs. i-bound for MC and IBP; coding networks, N=100, P=4, w*=12, 50 instances; panels for sigma=0.22 and sigma=0.51]

Noisy-OR networks - Absolute error
[Figure: absolute error (log scale) vs. i-bound for MC, IBP and Gibbs Sampling; noisy-OR networks, N=50, P=3, w*=16, 25 instances; panels for evidence=10 and evidence=20]

CPCS422 - Absolute error
[Figure: absolute error vs. i-bound for MC and IBP; CPCS 422, w*=23, 1 instance; panels for evidence=0 and evidence=10]

Grid 15x15 - 0 evidence
[Figure: NHD, absolute error, relative error and time vs. i-bound for MC and IBP; Grid 15x15, evid=0, w*=22, 10 instances]

Grid 15x15 - 10 evidence
[Figure: NHD, absolute error, relative error and time vs. i-bound for MC and IBP; Grid 15x15, evid=10, w*=22, 10 instances]

Grid 15x15 - 20 evidence
[Figure: NHD, absolute error, relative error and time vs. i-bound for MC, IBP and Gibbs Sampling; Grid 15x15, evid=20, w*=22, 10 instances]

Conclusion
- MC extends the partition-based approximation from mini-buckets to general tree decompositions for the problem of belief updating
- Empirical evaluation demonstrates its effectiveness and superiority (for certain types of problems, with respect to the measures considered) relative to other existing algorithms