HW7

advertisement
CS B553 Homework 7: Parameter and Structure Learning in Chow-Liu Tree
Networks
Due date: 4/10/2012
In this assignment you will be looking at a very simple type of Bayesian network model called a Chow-Liu
tree. Chow-Liu trees are useful because they provide a class network structures that are easy to learn.
(In fact, they predate Bayesian networks.) In this model, the joint distribution over X1,…,Xn is
decomposed into first-order conditionals as follows:
𝑃(𝑋1 , … , 𝑋𝑛 ) = ∏ 𝑃(𝑋𝑟 ) ∏ 𝑃(𝑋𝑖 |𝑋𝑝𝑎𝑖 )
𝑟∈𝑅
𝑖∉𝑅
where R is the subset of root nodes, and for non-root nodes pai gives the index of i’s only parent. So, we
can encode the graph structure through the encoding G=(pa1,…,pan) with pai{0,1,…,n}, where we take
the convention that any node with pai=0 is taken to be a root. (Note that we’re allowing the encoding
to produce forests rather than constraining the network to be a tree)
We’ll assume all variables are binary. You may wish to use the notation xi0 to indicate the event that
Xi=0 and xi1 to indicate the event that Xi=1.
1. List all of the parameters in the CPTs of a Chow-Liu tree network with structure G=(pa1,…,pan),
with node Xr being the root. Gather these into a single parameter vector .
2. Given a dataset D=( x[1],…,x[M]) of fully observed samples, write down the expression l(D; ,G)
giving the log-likelihood of D given  under the structure model G. Use the textbook’s notation
M[‘event’] to indicate the number of times ‘event’ occurs in the dataset D.
3. Give a formula for the maximum-likelihood estimate * given the data D. What is the loglikelihood l(D; *, G)?
4. Consider the model G0=(0,…,0), i.e., the entirely disconnected network. Now consider adding a
new edge j->i by setting pai = j to derive the model G =(0,…,0,j,0,…,0) (j is in the i’th index). What
is the change in log-likelihood of the maximum-likelihood parameter estimates when changing
from G0 to G? That is, compute l(D; , G) – l(D; 0, G0) where  is the MLE for model G and 0 is
the MLE for model G0.
5. Now consider the graph G0 with Xi a root node, and consider adding the edge j->i to achieve a
graph G that is also a forest. What expression do you get for l(D; , G) – l(D; 0, G0), with  the
MLE for G and 0 the MLE for G0?
6. Consider an algorithm that starts with a completely disconnected network and greedily starts to
add edges j->i, ensuring that each new edge does not induce a cycle in the graph. Edges are
added in order of largest to smallest improvement in likelihood. Demonstrate that it produces a
tree rather than a forest. Give its running time in big-O notation. Furthermore, prove or argue
that it produces the Chow-Liu tree with maximum likelihood over all trees (hint: consider how
this relates to Prim’s algorithm for minimum spanning trees).
Download