CSB551 – Elements of Artificial Intelligence – Fall 2009
Homework #6 (Bayesian Inference)
Due: 11/17/09 (5:15 pm)

How to complete this HW: Either 1) type your answers in the empty spaces below each problem and print out this document, or 2) print this document and write your answers in the empty spaces on the printout. Return the homework in class, during office hours, or slip it under the door of Lindley 301F before 5:15 pm on Tuesday, 11/17/09.

Your name: ………………………………………………………………………………………

Your email address: ……………………………………………………………………………

Note on Honor Code: You must NOT look at previously published solutions of any of these problems in preparing your answers. You may discuss these problems with other students in the class (in fact, you are encouraged to do so) and/or look into other documents (books, web sites), with the exception of published solutions, without taking any written or electronic notes. If you have discussed any of the problems with other students, indicate their name(s) here:

………………………………………………………………………………………………

Any intentional transgression of these rules will be considered an honor code violation.

General information: Justify your answers, but keep explanations short and to the point. Excessive verbosity will be penalized. If you have any doubt about how to interpret a question, ask us in advance so that we can help you understand it, or state in your returned solution how you interpreted it.

Grading:

Problem #     I     II    III   IV    Total
Max. grade    25    25    25    25    100
Your grade

I. Inference in Bayesian Networks (25 points)

You are given the following Bayes net (it is the same example as used in class):

[Figure omitted: the alarm Bayes net used in class, whose variables include Earthquake, JohnCalls, and MaryCalls.]

1. Before you make any observation, how would you compute the prior probability of MaryCalls? [We don't ask you to do the numerical computation. We ask you to express the probability of MaryCalls, P(M), using quantities that are given in the belief net. To simplify notation, replace MaryCalls by M, Earthquake by E, etc., and let P(M), P(-M), P(E), … denote the probabilities of M, not M, E, etc.]

2. Now assume that you observe that JohnCalls is True (i.e., P(J) = 1). How would you compute the new (posterior) probability of MaryCalls? [Again, we don't ask you for numerical values. Explain how you would do the computation. In this explanation, you will need to use the prior probability of JohnCalls, P(J). No need to explain how you compute it, since its computation is very similar to that of P(M) in Question 1.]

II. Properties of Bayesian Networks (25 pts)

1. Here you will prove that the following two Bayes nets N1 and N2 are equivalent; that is, they can represent the same joint probability distribution P(A,B). (An optional, ungraded illustration of this equivalence appears after Question 3 below.)

N1:  A → B          N2:  B → A

Assume A and B are Boolean variables with joint probability distribution:

  A, B      P(A,B)
  A,  B       a
  A, -B       b
 -A,  B       c
 -A, -B       d

Define the conditional probability tables of N1 and N2 so that their product equals P(A,B).

2. What are all of the independence relationships that can be inferred from the following Bayesian network? Write a list of statements

X ⊥ Y | Z1, …, Zn, [W1, …, Wm]

where this means that X is independent of Y given ALL of the evidence Z1, …, Zn, and optionally any of the evidence W1, …, Wm.

[Figure omitted: a Bayes net over the variables A, B, C, D.]

3. Same question, but for the following Bayes net.

[Figure omitted: a Bayes net over the variables A, B, C, D, E.]
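Aside (not part of the homework): if it helps your intuition for Question II.1, the short Python sketch below verifies numerically, on made-up numbers, that the factorization used by N1 (P(A)·P(B|A)) and the one used by N2 (P(B)·P(A|B)) can both reproduce the same joint distribution P(A,B). All probability values in it are invented for illustration; they are not tied to the entries a, b, c, d above.

# Illustration only: a made-up joint distribution over the Booleans A and B.
# joint[(a, b)] plays the role of the entries a, b, c, d in Question II.1.
joint = {(True, True): 0.2, (True, False): 0.3,
         (False, True): 0.4, (False, False): 0.1}

def p_A(a):
    # Marginal P(A = a), obtained by summing out B.
    return sum(p for (x, _), p in joint.items() if x == a)

def p_B(b):
    # Marginal P(B = b), obtained by summing out A.
    return sum(p for (_, y), p in joint.items() if y == b)

def p_B_given_A(b, a):
    return joint[(a, b)] / p_A(a)

def p_A_given_B(a, b):
    return joint[(a, b)] / p_B(b)

# N1 factors the joint as P(A) * P(B|A); N2 factors it as P(B) * P(A|B).
for a in (True, False):
    for b in (True, False):
        n1 = p_A(a) * p_B_given_A(b, a)
        n2 = p_B(b) * p_A_given_B(a, b)
        assert abs(n1 - joint[(a, b)]) < 1e-12
        assert abs(n2 - joint[(a, b)]) < 1e-12
print("Both factorizations reproduce P(A,B) exactly.")

Running it simply confirms that both products match the joint for every assignment of A and B; your written answer to Question II.1 should express the same idea symbolically in terms of a, b, c, d.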
III. Properties of Markov Chains (25 pts)

1. Suppose you are given a time series x1, x2, …, xt of Boolean variables. The sequence is generated by the deterministic rule xi+2 = f(xi, xi+1) for all i ≤ t-2. Define an equivalent time series y1, y2, …, yt-1 such that each yi+1 is defined only in terms of the previous yi; in other words, yi+1 = g(yi). Equivalence means there is a one-to-one mapping between the sequence x1, x2, …, xt and the sequence y1, y2, …, yt-1. Your answer should simply state the domain of the y's and define g – you don't have to actually prove equivalence. (Hint: define each yi as a pair of Booleans.)

2. Suppose you are given an order-2 Markov chain X1, X2, …, Xt of random Boolean variables, with transition probabilities P(Xi+2 | Xi, Xi+1). Recall that by an order-2 Markov chain we mean that Xi+2 is independent of Xj given Xi and Xi+1, for all j < i. Define an order-1 Markov chain Y1, Y2, …, Yt-1 that is equivalent to X1, …, Xt. Equivalence in this case means there is a one-to-one mapping between the joint distribution over X1, …, Xt and the joint distribution over Y1, …, Yt-1. Your answer should simply state the domains of the Y's and define P(Yi+1 | Yi) – you don't have to prove equivalence.

IV. Statistical Parameter Learning (25 pts)

1. You are playing poker against Joe and trying to understand his pattern of bluffing. (For those of you who don't play poker, a bluff is when Joe makes a large bet when he actually has a weak hand, in order to make you think he has a strong hand. It is highly advantageous to detect when someone is bluffing!) You suspect that when he is bluffing, he is more likely to fidget or close his mouth tightly. You are estimating the variables StrongHand, LargeBet, Fidgets, and CloseMouth (abbreviated S, L, F, and C). Over 20 rounds of play, you observe the following data:

S: 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
L: 1 1 1 1 1 1 0 0 0 1 1 1 1 1 0 0 0 0 0 0
F: 0 0 1 0 0 0 1 0 0 1 1 1 1 0 1 0 0 0 0 0
C: 0 0 0 1 1 1 1 0 0 0 1 1 1 1 0 1 1 0 0 0

We say that a bluff B occurs when S=0 and L=1. Our initial hypothesis model for predicting bluffing is a conditional probability table for P(B|F,C). Fill out the following table with the maximum likelihood estimate for P(B|F,C) (to simplify your life, use fractions rather than decimal notation).

F          0     0     1     1
C          0     1     0     1
P(B|F,C)

2. Now we'd like to model how likely it is that Joe has a strong hand, given L, F, and C. Our initial hypothesis model is a conditional probability table for P(S|L,F,C). Fill out the following table with the maximum likelihood estimate for P(S|L,F,C).

L            0     0     0     0     1     1     1     1
F            0     0     1     1     0     0     1     1
C            0     1     0     1     0     1     0     1
P(S|L,F,C)

3. In an attempt to avoid overfitting, you decide that a better hypothesis model may state that C is independent of both S and F, given L. You also hypothesize that S and L both directly cause F, and that S causes L. Draw a Bayesian network that represents these assumptions.

4. Estimate the MAP parameters of the network that you drew in Question 3. Your parameters are the entries of the conditional probability tables. Assume that every parameter has a Beta prior with a = b = 1. (An optional, ungraded counting sketch appears after this problem.)
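Aside (not part of the homework): if you want to double-check the counting involved in Questions IV.1, IV.2, and IV.4, the Python sketch below shows, on made-up data (not the poker data above), that the maximum-likelihood estimate of a Boolean CPT entry is a ratio of counts k/n, and that adding one pseudocount per outcome gives the smoothed ratio (k+1)/(n+2). Whether that smoothed ratio is exactly the MAP estimate under the Beta prior with a = b = 1 in Question 4 depends on how the Beta(a, b) density is parameterized, so treat the pseudocount line purely as an illustration of counting.

from collections import defaultdict

# Illustration only: made-up observations of a Boolean child variable together
# with its Boolean parents.  Each row is (parent_values, child_value); this is
# NOT the poker data from Question IV.
rows = [((0, 0), 0), ((0, 0), 1), ((0, 1), 1), ((0, 1), 1),
        ((1, 0), 0), ((1, 1), 1), ((1, 1), 0), ((1, 1), 1)]

def estimate_cpt(rows, pseudocount=0):
    # Estimate P(child = 1 | parents) for every observed parent configuration.
    # pseudocount = 0 gives the maximum-likelihood ratio k/n;
    # pseudocount = 1 gives the smoothed ratio (k + 1)/(n + 2).
    counts = defaultdict(lambda: [0, 0])      # parents -> [# child=0, # child=1]
    for parents, child in rows:
        counts[parents][child] += 1
    return {parents: (n1 + pseudocount) / (n0 + n1 + 2 * pseudocount)
            for parents, (n0, n1) in counts.items()}

print("maximum likelihood:", estimate_cpt(rows))
print("with pseudocounts :", estimate_cpt(rows, pseudocount=1))

For the homework itself, do the counting by hand and report fractions, as asked in Questions 1 and 2.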