Lecture 10: Junction Tree Algorithm CSCI 780 - Machine Learning Andrew Rosenberg March 4, 2010 1/1 Today Junction Tree Algorithm Efficient calculation of marginals in a graphical model 2/1 Example of Graphical Model. x0 x1 x2 θ(xi , πi ) = x3 x5 x4 m(xi , πi ) m(πi ) 3/1 Efficient Computation of Marginals b a d c e Pass messages (small tables) around the graph. The messages will be small functions that propagate potentials around an undirected graphical model. The inference technique is the Junction Tree Algorithm 4/1 Junction Tree Algorithm Junction Tree Algorithm Moralization Introduce Evidence Triangulate Construct Junction Tree Propagate Probabilities 5/1 Junction Tree Algorithm Junction Tree Algorithm Moralization Introduce Evidence Triangulate Construct Junction Tree Propagate Probabilities 6/1 Moralization Converts a directed graph to an undirected graph. Moralization “marries” the parents. Insert an undirected edge between two nodes that have a child in common. Replace all directed edges with undirected edges. x0 x1 x2 x0 x1 x2 x3 x5 x4 x3 x5 x4 7/1 Moralization x3 x1 x5 x0 x2 x4 p(x0 )p(x1 |x0 )p(x2 |x0 )p(x3 |x1 )p(x4 |x2 )p(x5 |x1 , x4 ) x3 x1 x5 x0 x2 x4 1 ψ(x0 , x1 )ψ(x0 , x2 )ψ(x1 , x3 )ψ(x2 , x4 )ψ(x1 , x4 , x5 ) Z 8/1 Another Moralization Example 9/1 Junction Tree Algorithm Junction Tree Algorithm Moralization Introduce Evidence Triangulate Construct Junction Tree Propagate Probabilities 10 / 1 Introduce Evidence Given a moral graph. Identify what is observed xE . Reduce Probability functions since we know that some values are fixed. So only keep probability functions over remaining nodes xF p(X ) = p(XF |X¯E ) ∝ ∝ 1 ψ(x1 , x2 , x3 , x4 )ψ(x4 , x5 )ψ(x4 , x6 )ψ(x4 , x7 ) Z 1 ψ(x1 , x2 , x3 = x¯3 , x4 = x¯4 )ψ(x4 = x¯4 , x5 )ψ(x4 = x¯4 , x6 )ψ(x4 = x¯4 , x7 ) Z 1 ψ̂(x1 , x2 )ψ̂(x5 )ψ̂(x6 )ψ̂(x7 ) Z Replace potential functions with slices .4 .12 .1 .15 Requires a different normalization term 11 / 1 Introduce Evidence Observing xE separates nodes. Normalization Calculation Avoid it until the end, when we want to calculate an individual marginal. 12 / 1 Junction Tree Algorithm Junction Tree Algorithm Moralization Introduce Evidence Triangulate Construct Junction Tree Propagate Probabilities 13 / 1 Junction Trees Ultimately we want to construct Junction Trees. Each node is a clique of variables in a modal graph. Edges connect cliques There is a unique path from a node to the root Between each connected clique node there is a separator node. Separators contain intersections of variables 14 / 1 Triangulation How do we construct a Junction Tree? Need to guarantee that a Junction Graph, made up of the cliques and separators of an undirected graph is a Tree To do this, we make triangles (3 node cliques) I.e. eliminate any (chordless) cycles of 4 or more nodes. 15 / 1 Triangulation There are potentially many choices for which edge to add. We’d like to keep the largest clique size small – Small ψ tables. However, Triangulation that minimizes the largest clique size is NP-complete. Suboptimal triangulation is acceptable (poly-time) and generally doesn’t introduce too many extra dimensions. 16 / 1 Triangulation There are potentially many choices for which edge to add. We’d like to keep the largest clique size small – Small ψ tables. However, Triangulation that minimizes the largest clique size is NP-complete. Suboptimal triangulation is acceptable (poly-time) and generally doesn’t introduce too many extra dimensions. 17 / 1 Triangulation Examples 18 / 1 Triangulation Examples 19 / 1 Triangulation Examples 20 / 1 Triangulation Examples 21 / 1 Triangulation Examples 22 / 1 Triangulation Examples 23 / 1 Junction Tree Algorithm Junction Tree Algorithm Moralization Introduce Evidence Triangulate Construct Junction Tree Propagate Probabilities 24 / 1 Constructing Junction Trees Multiple trees can be constructed from the same graph. Junction Trees must satisfy the Running Intersection Property On the path connecting clique node V to clique node W , all other clique nodes must include the nodes in V ∩ W . 25 / 1 Constructing Junction Trees Multiple trees can be constructed from the same graph. Junction Trees must satisfy the Running Intersection Property On the path connecting clique node V to clique node W , all other clique nodes must include the nodes in V ∩ W . Also: A Junction Tree has the largest total separator cardinality. |φ(B, C )| + |φ(C , D)| > |φ(C , D)| + |φ(D)| 26 / 1 Forming a Junction Tree Given a set of cliques, must connect the nodes, such that the Running Intersection Property holds. A valid Junction Tree maximizes the cardinality of the separators Kruskal’s algorithm 1 Initialize a tree with no edges 2 Calculate the size of separators between all pairs 3 Connect two cliques with the largest separator cardinality (that doesn’t create a loop 4 Repeat 3 until all nodes are connected. 27 / 1 Junction Tree Algorithm Junction Tree Algorithm Moralization Introduce Evidence Triangulate Construct Junction Tree Propagate Probabilities 28 / 1 Now what? We have a valid Junction Tree! What can we do with it. (or...who cares?) Probabilities in Junction Trees. 1Y ψ̂(xC ) Z C Q ψ(xC ) 1 p(X ) = QC Z S φ(xS ) p(X ) = This is equivalent to de-absorbing smaller cliques from maximal cliques. Doesn’t change anything, just a less compact description. 29 / 1 Conversion from Directed Graph Example Conversion. p(X ) = Q 1 C ψ(xC ) Q Z S φ(xS ) We can represent CPTs as clique and separator potential functions (with a normalization term). 30 / 1 Junction Tree Algorithm Need to make marginals consistent. ψ(A, B, D) → p(A, B, D) φ(B, D) → p(B, D) ψ(B, C , D) → p(B, C , D) X p(A, B, D) = ˆ D) p(B, = ˆ ˆ D) p(B, A p(B, D) X p(B, C , D) D The Junction Tree Algorithm sends messages between cliques and separators until consistency is reached. 31 / 1 Junction Tree Algorithm Send a message from each clique to its separator. The message is what the clique thinks the marginal should be Normalize the clique by each message from its separators such that it agrees. If they agree, finished! X X ψV = φS = p(S) = φS = ψW V \S W \S If not.... 32 / 1 Junction Tree Algorithm – Message Passing φ∗S = X φ∗∗ S ψV = = ψV∗ = φ∗S ψW φS ψV X ψV∗∗ ∗ ψW W \S V \S ∗ ψW X = ψV∗∗ = φ∗∗ S ψV∗ φ∗S ∗∗ ψW = ∗ ψW X φ∗∗ S ψV φ∗S V \S V \S = = X φ∗∗ S ψV ∗ φS V \S X ∗∗ ψW φ∗∗ S = W \S 33 / 1 Junction Tree algorithm When convergence is reached – clique potentials are marginals and separator potentials are submarginals p(x) never changes because of this message passing. φ∗ S ∗ 1 ψV φS ψW 1 ψV ψW 1 ψV∗ ψW = = p(x) = ∗ ∗ Z φS Z φS Z φS This implies that, so long as p(x) is correctly represented in the potential functions, the junction tree algorithm can be used to make each potential correspond to an appropriate marginal without changing the overall probability function. 34 / 1 Converting From DAG to Junction Tree Convert the Directed Graph to the Junction Tree Initialize separators to 1, and the clique tables to CPTs. p(X ) = p(x1 )p(x2 |x1 )p(x3 |x2 )p(x4 |x3 )p(x5 |x3 )p(x6 |x5 )p(x7 |x5 ) p(X ) = Q 1 C ψ(Xc ) Q Z S φ(XS ) = 1 p(x1 , x2 )p(x3 |x2 )p(x4 |x3 )p(x5 |x3 )p(x6 |x5 )p(x7 |x5 ) 1 1·1·1·1·1 Run JTA to set potential functions to marginals. 35 / 1 Evidence in Junction Tree Initialize the same way. ψAB = p(A, B) ψBC = p(C |B) φB = 1 Update with a slice instead of the whole table. φ∗B = ∗ ψBC = ∗ ψAB = Conditionals X A φ∗B ψAB δ(A = 1) = X p(A, B)δ(A = 1) = p(A = 1, B) A p(A = 1, B) ψBC = p(C |B) = p(A = 1, B, C ) φB 1 ψAB = p(A = 1, B) ∗ ψBC ∗ B,C ψBC p(B, C |A = 1) = P 36 / 1 Efficiency of The Junction Tree Algorithm All steps are efficient. 1 Construct CPTs Polynomial in # of data points 2 Moralization Polynomial in # of nodes (variables) 3 Introduce Evidence 4 Triangulate Polynomial in # of nodes (variables) Suboptimal=Polynomial, Optimal=NP 5 Construct Junction Tree Polynomial in the number of cliques Identifying Cliques = Polynomial in the number of nodes. 6 Propagate Probabilities Polynomial in number of cliques, Exponential in size of cliques 37 / 1 Bye Next Clustering Preview. 38 / 1