Lecture 10: Junction Tree Algorithm CSCI 780

advertisement
Lecture 10: Junction Tree Algorithm
CSCI 780 - Machine Learning
Andrew Rosenberg
March 4, 2010
1/1
Today
Junction Tree Algorithm
Efficient calculation of marginals in a graphical model
2/1
Example of Graphical Model.
x0
x1
x2
θ(xi , πi ) =
x3
x5
x4
m(xi , πi )
m(πi )
3/1
Efficient Computation of Marginals
b
a
d
c
e
Pass messages (small tables) around the graph.
The messages will be small functions that propagate
potentials around an undirected graphical model.
The inference technique is the Junction Tree Algorithm
4/1
Junction Tree Algorithm
Junction Tree Algorithm
Moralization
Introduce Evidence
Triangulate
Construct Junction Tree
Propagate Probabilities
5/1
Junction Tree Algorithm
Junction Tree Algorithm
Moralization
Introduce Evidence
Triangulate
Construct Junction Tree
Propagate Probabilities
6/1
Moralization
Converts a directed graph to an undirected graph.
Moralization “marries” the parents.
Insert an undirected edge between two nodes that have a child
in common.
Replace all directed edges with undirected edges.
x0
x1
x2
x0
x1
x2
x3
x5
x4
x3
x5
x4
7/1
Moralization
x3
x1
x5
x0
x2
x4
p(x0 )p(x1 |x0 )p(x2 |x0 )p(x3 |x1 )p(x4 |x2 )p(x5 |x1 , x4 )
x3
x1
x5
x0
x2
x4
1
ψ(x0 , x1 )ψ(x0 , x2 )ψ(x1 , x3 )ψ(x2 , x4 )ψ(x1 , x4 , x5 )
Z
8/1
Another Moralization Example
9/1
Junction Tree Algorithm
Junction Tree Algorithm
Moralization
Introduce Evidence
Triangulate
Construct Junction Tree
Propagate Probabilities
10 / 1
Introduce Evidence
Given a moral graph. Identify what is observed xE .
Reduce Probability functions since we know that some values
are fixed.
So only keep probability functions over remaining nodes xF
p(X )
=
p(XF |X¯E )
∝
∝
1
ψ(x1 , x2 , x3 , x4 )ψ(x4 , x5 )ψ(x4 , x6 )ψ(x4 , x7 )
Z
1
ψ(x1 , x2 , x3 = x¯3 , x4 = x¯4 )ψ(x4 = x¯4 , x5 )ψ(x4 = x¯4 , x6 )ψ(x4 = x¯4 , x7 )
Z
1
ψ̂(x1 , x2 )ψ̂(x5 )ψ̂(x6 )ψ̂(x7 )
Z
Replace potential functions with slices
.4
.12
.1
.15
Requires a different normalization term
11 / 1
Introduce Evidence
Observing xE separates nodes.
Normalization Calculation
Avoid it until the end, when we want to calculate an
individual marginal.
12 / 1
Junction Tree Algorithm
Junction Tree Algorithm
Moralization
Introduce Evidence
Triangulate
Construct Junction Tree
Propagate Probabilities
13 / 1
Junction Trees
Ultimately we want to construct Junction Trees.
Each node is a clique of variables in a modal graph.
Edges connect cliques
There is a unique path from a node to the root
Between each connected clique node there is a separator node.
Separators contain intersections of variables
14 / 1
Triangulation
How do we construct a Junction Tree?
Need to guarantee that a Junction Graph, made up of the
cliques and separators of an undirected graph is a Tree
To do this, we make triangles (3 node cliques)
I.e. eliminate any (chordless) cycles of 4 or more nodes.
15 / 1
Triangulation
There are potentially many choices for which edge to add.
We’d like to keep the largest clique size small – Small ψ
tables.
However, Triangulation that minimizes the largest clique size
is NP-complete.
Suboptimal triangulation is acceptable (poly-time) and
generally doesn’t introduce too many extra dimensions.
16 / 1
Triangulation
There are potentially many choices for which edge to add.
We’d like to keep the largest clique size small – Small ψ
tables.
However, Triangulation that minimizes the largest clique size
is NP-complete.
Suboptimal triangulation is acceptable (poly-time) and
generally doesn’t introduce too many extra dimensions.
17 / 1
Triangulation Examples
18 / 1
Triangulation Examples
19 / 1
Triangulation Examples
20 / 1
Triangulation Examples
21 / 1
Triangulation Examples
22 / 1
Triangulation Examples
23 / 1
Junction Tree Algorithm
Junction Tree Algorithm
Moralization
Introduce Evidence
Triangulate
Construct Junction Tree
Propagate Probabilities
24 / 1
Constructing Junction Trees
Multiple trees can be constructed from the same graph.
Junction Trees must satisfy the Running Intersection
Property
On the path connecting clique node V to clique node W , all
other clique nodes must include the nodes in V ∩ W .
25 / 1
Constructing Junction Trees
Multiple trees can be constructed from the same graph.
Junction Trees must satisfy the Running Intersection
Property
On the path connecting clique node V to clique node W , all
other clique nodes must include the nodes in V ∩ W .
Also: A Junction Tree has the largest total separator cardinality.
|φ(B, C )| + |φ(C , D)| > |φ(C , D)| + |φ(D)|
26 / 1
Forming a Junction Tree
Given a set of cliques, must connect the nodes, such that the
Running Intersection Property holds.
A valid Junction Tree maximizes the cardinality of the
separators
Kruskal’s algorithm
1
Initialize a tree with no edges
2
Calculate the size of separators between all pairs
3
Connect two cliques with the largest separator cardinality
(that doesn’t create a loop
4
Repeat 3 until all nodes are connected.
27 / 1
Junction Tree Algorithm
Junction Tree Algorithm
Moralization
Introduce Evidence
Triangulate
Construct Junction Tree
Propagate Probabilities
28 / 1
Now what?
We have a valid Junction Tree!
What can we do with it. (or...who cares?)
Probabilities in Junction Trees.
1Y
ψ̂(xC )
Z
C
Q
ψ(xC )
1
p(X ) = QC
Z S φ(xS )
p(X ) =
This is equivalent to de-absorbing smaller cliques from
maximal cliques.
Doesn’t change anything, just a less compact description.
29 / 1
Conversion from Directed Graph
Example Conversion.
p(X ) =
Q
1 C ψ(xC )
Q
Z S φ(xS )
We can represent CPTs as clique and separator potential functions
(with a normalization term).
30 / 1
Junction Tree Algorithm
Need to make marginals consistent.
ψ(A, B, D)
→
p(A, B, D)
φ(B, D)
→
p(B, D)
ψ(B, C , D)
→
p(B, C , D)
X
p(A, B, D)
=
ˆ D)
p(B,
=
ˆ
ˆ D)
p(B,
A
p(B, D)
X
p(B, C , D)
D
The Junction Tree Algorithm sends messages between cliques
and separators until consistency is reached.
31 / 1
Junction Tree Algorithm
Send a message from each clique to its separator.
The message is what the clique thinks the marginal should be
Normalize the clique by each message from its separators
such that it agrees.
If they agree, finished!
X
X
ψV = φS = p(S) = φS =
ψW
V \S
W \S
If not....
32 / 1
Junction Tree Algorithm – Message Passing
φ∗S
=
X
φ∗∗
S
ψV
=
=
ψV∗
=
φ∗S
ψW
φS
ψV
X
ψV∗∗
∗
ψW
W \S
V \S
∗
ψW
X
=
ψV∗∗
=
φ∗∗
S
ψV∗
φ∗S
∗∗
ψW
=
∗
ψW
X φ∗∗
S
ψV
φ∗S
V \S
V \S
=
=
X
φ∗∗
S
ψV
∗
φS
V \S
X ∗∗
ψW
φ∗∗
S =
W \S
33 / 1
Junction Tree algorithm
When convergence is reached – clique potentials are
marginals and separator potentials are submarginals
p(x) never changes because of this message passing.
φ∗
S
∗
1 ψV φS ψW
1 ψV ψW
1 ψV∗ ψW
=
=
p(x) =
∗
∗
Z φS
Z
φS
Z φS
This implies that, so long as p(x) is correctly represented in the
potential functions, the junction tree algorithm can be used to
make each potential correspond to an appropriate marginal without
changing the overall probability function.
34 / 1
Converting From DAG to Junction Tree
Convert the Directed Graph to the Junction Tree
Initialize separators to 1, and the clique tables to CPTs.
p(X ) = p(x1 )p(x2 |x1 )p(x3 |x2 )p(x4 |x3 )p(x5 |x3 )p(x6 |x5 )p(x7 |x5 )
p(X )
=
Q
1 C ψ(Xc )
Q
Z S φ(XS )
=
1 p(x1 , x2 )p(x3 |x2 )p(x4 |x3 )p(x5 |x3 )p(x6 |x5 )p(x7 |x5 )
1
1·1·1·1·1
Run JTA to set potential functions to marginals.
35 / 1
Evidence in Junction Tree
Initialize the same way.
ψAB
=
p(A, B)
ψBC
=
p(C |B)
φB
=
1
Update with a slice instead of the whole table.
φ∗B
=
∗
ψBC
=
∗
ψAB
=
Conditionals
X
A
φ∗B
ψAB δ(A = 1) =
X
p(A, B)δ(A = 1) = p(A = 1, B)
A
p(A = 1, B)
ψBC =
p(C |B) = p(A = 1, B, C )
φB
1
ψAB = p(A = 1, B)
∗
ψBC
∗
B,C ψBC
p(B, C |A = 1) = P
36 / 1
Efficiency of The Junction Tree Algorithm
All steps are efficient.
1 Construct CPTs
Polynomial in # of data points
2
Moralization
Polynomial in # of nodes (variables)
3
Introduce Evidence
4
Triangulate
Polynomial in # of nodes (variables)
Suboptimal=Polynomial, Optimal=NP
5
Construct Junction Tree
Polynomial in the number of cliques
Identifying Cliques = Polynomial in the number of nodes.
6
Propagate Probabilities
Polynomial in number of cliques, Exponential in size of cliques
37 / 1
Bye
Next
Clustering Preview.
38 / 1
Download