CS B553 Homework 7: Parameter and Structure Learning in Chow-Liu Tree Networks
Due date: 4/10/2012

In this assignment you will be looking at a very simple type of Bayesian network model called a Chow-Liu tree. Chow-Liu trees are useful because they provide a class of network structures that are easy to learn. (In fact, they predate Bayesian networks.) In this model, the joint distribution over X1,…,Xn is decomposed into first-order conditionals as follows:

P(X1,…,Xn) = ∏_{r∈R} P(Xr) ∏_{i∉R} P(Xi | Xpai)

where R is the subset of root nodes, and for non-root nodes pai gives the index of Xi's only parent. So, we can encode the graph structure through the encoding G=(pa1,…,pan) with pai ∈ {0,1,…,n}, where we take the convention that any node with pai=0 is taken to be a root. (Note that we are allowing the encoding to produce forests rather than constraining the network to be a tree.)

We'll assume all variables are binary. You may wish to use the notation xi0 to indicate the event that Xi=0 and xi1 to indicate the event that Xi=1.

1. List all of the parameters in the CPTs of a Chow-Liu tree network with structure G=(pa1,…,pan), with node Xr being the root. Gather these into a single parameter vector θ.

2. Given a dataset D=(x[1],…,x[M]) of fully observed samples, write down the expression l(D; θ, G) giving the log-likelihood of D given θ under the structure model G. Use the textbook's notation M['event'] to indicate the number of times 'event' occurs in the dataset D.

3. Give a formula for the maximum-likelihood estimate θ* given the data D. What is the log-likelihood l(D; θ*, G)?

4. Consider the model G0=(0,…,0), i.e., the entirely disconnected network. Now consider adding a new edge j->i by setting pai = j to derive the model G=(0,…,0,j,0,…,0) (j is in the i'th index). What is the change in log-likelihood of the maximum-likelihood parameter estimates when changing from G0 to G? That is, compute l(D; θ*, G) - l(D; θ0*, G0), where θ* is the MLE for model G and θ0* is the MLE for model G0.

5. Now consider a graph G0 (a forest) in which Xi is a root node, and consider adding the edge j->i to obtain a graph G that is also a forest. What expression do you get for l(D; θ*, G) - l(D; θ0*, G0), with θ* the MLE for G and θ0* the MLE for G0?

6. Consider an algorithm that starts with a completely disconnected network and greedily adds edges j->i, ensuring that each new edge does not induce a cycle in the graph. Edges are added in order of largest to smallest improvement in likelihood. Demonstrate that it produces a tree rather than a forest. Give its running time in big-O notation. Furthermore, prove or argue that it produces the Chow-Liu tree with maximum likelihood over all trees (hint: consider how this relates to Kruskal's algorithm for minimum spanning trees).
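For concreteness, the short Python sketch below illustrates the notation used above: how a structure encoding G=(pa1,…,pan) and a CPT parameter vector θ determine the log-likelihood l(D; θ, G) of a fully observed binary dataset, by summing log P(xi | xpai) over samples and variables. It is only an illustration of the setup, not part of the assignment or its solution; the function name, data layout, and toy numbers are made up for this example.

# Illustrative sketch: the structure encoding pa (with pa[i] == 0 marking a root)
# and a dictionary of CPT entries define the log-likelihood of a dataset.
# Variable indices are 1-based to match the problem statement.
import math

def log_likelihood(data, pa, theta):
    """data:  list of samples, each a dict {i: value in {0,1}} for i = 1..n
       pa:    dict {i: parent index of X_i, or 0 if X_i is a root}
       theta: CPT entries --
              theta[(i, v, None)] = P(X_i = v)              for roots
              theta[(i, v, u)]    = P(X_i = v | X_pai = u)  for non-roots"""
    ll = 0.0
    for x in data:
        for i, j in pa.items():
            if j == 0:                      # root: contributes log P(x_i)
                ll += math.log(theta[(i, x[i], None)])
            else:                           # non-root: contributes log P(x_i | x_pai)
                ll += math.log(theta[(i, x[i], x[j])])
    return ll

# Toy example with n = 2 and structure G = (0, 1), i.e. the single edge X1 -> X2.
pa = {1: 0, 2: 1}
theta = {(1, 0, None): 0.4, (1, 1, None): 0.6,
         (2, 0, 0): 0.7, (2, 1, 0): 0.3,
         (2, 0, 1): 0.2, (2, 1, 1): 0.8}
data = [{1: 1, 2: 1}, {1: 0, 2: 0}, {1: 1, 2: 0}]
print(log_likelihood(data, pa, theta))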