CSB551 – Elements of Artificial Intelligence – Fall 2009

Homework #6
(Bayesian Inference)
Due: 11/17/09 (5:15 pm)
How to complete this HW: Either 1) type your answers in the empty spaces below
each problem and print out this document, or 2) print this document and write your
answers in the empty spaces on the printout. Return the homework in class, during office
hours, or slip it under the door of Lindley 301F before 5:15 pm on Tuesday, 11/17/09.
Your name:
………………………………………………………………………………………
Your email address:
……………………………………………………………………………
Note on Honor Code: You must NOT look at previously published solutions to any
of these problems in preparing your answers. You may discuss these problems with other
students in the class (in fact, you are encouraged to do so) and/or consult other
documents (books, web sites), with the exception of published solutions, as long as you
take no written or electronic notes. If you have discussed any of the problems with other
students, indicate their name(s) here:
………………………………………………………………………………………………
Any intentional transgression of these rules will be considered an honor code violation.
General information: Justify your answers, but keep explanations short and to the
point. Excessive verbosity will be penalized. If you have any doubt about how to interpret a
question, tell us in advance so that we can help you understand it, or state in your
returned solution how you interpreted it.
Grading:
Problem#     I    II    III   IV    Total
Max. grade   25   25    25    25    100
Your grade
I. Inference in Bayesian Networks (25 points)
You are given the following Bayes net (it is the same example as used in class):
[Figure: the burglary network from class — Burglary → Alarm ← Earthquake, with Alarm → JohnCalls and Alarm → MaryCalls.]
1. Before you make any observation, how would you compute the prior probability of
MaryCalls? [We don’t ask you to do the numerical computation. We ask you to express
the probability of MaryCalls, P(M), using quantities that are given in the belief net. To
simplify notation, replace MaryCalls by M, Earthquake by E, etc., and let P(M), P(-M),
P(E), … denote the probabilities of M, not M, E, etc.]
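As a sanity check for your derivation, here is a minimal Python sketch that computes P(M) by summing the joint over the hidden variables. The CPT numbers below are assumed (they are the standard textbook values for this network); substitute the values given in class.

from itertools import product

P_B = 0.001                        # P(Burglary) -- assumed textbook value
P_E = 0.002                        # P(Earthquake) -- assumed textbook value
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm=T | B, E)
P_M = {True: 0.70, False: 0.01}    # P(MaryCalls=T | Alarm)

def prob(p, value):
    """Probability that a Boolean variable with P(True)=p takes `value`."""
    return p if value else 1.0 - p

# P(M) = sum over b, e, a of P(b) P(e) P(a|b,e) P(M=T|a)
p_m = sum(prob(P_B, b) * prob(P_E, e) * prob(P_A[(b, e)], a) * P_M[a]
          for b, e, a in product([True, False], repeat=3))
print(p_m)   # about 0.0117 with these assumed numbers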
2. Now, assume that you observe that JohnCalls is True (i.e., P(J)=1). How would you
compute the new (posterior) probability of MaryCalls? [Again, we don’t ask you for
numerical values. Explain how you would do the computation. In this explanation, you
will need to use the prior probability of JohnCalls, P(J). There is no need to explain how
you compute it, since its computation is very similar to that of P(M) in Question 1.]
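A matching sketch for the posterior: P(M|J) = P(M,J) / P(J), with both terms obtained by summing the full joint over the hidden variables B, E, A. The P(J|A) entries are again assumed textbook values.

from itertools import product

P_B, P_E = 0.001, 0.002                               # assumed priors
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}    # P(Alarm=T | B, E)
P_J = {True: 0.90, False: 0.05}                       # P(JohnCalls=T | Alarm)
P_M = {True: 0.70, False: 0.01}                       # P(MaryCalls=T | Alarm)

def prob(p, value):
    return p if value else 1.0 - p

def joint(b, e, a, j, m):
    """Full joint P(b, e, a, j, m) from the network factorization."""
    return (prob(P_B, b) * prob(P_E, e) * prob(P_A[(b, e)], a)
            * prob(P_J[a], j) * prob(P_M[a], m))

bools = [True, False]
p_j  = sum(joint(b, e, a, True, m) for b, e, a, m in product(bools, repeat=4))
p_mj = sum(joint(b, e, a, True, True) for b, e, a in product(bools, repeat=3))
print(p_mj / p_j)   # P(M | J=True) is about 0.04 with these assumed numbers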
II. Properties of Bayesian Networks (25 pts)
1. Here you will prove that the following two Bayes nets N1 and N2 are equivalent;
that is, they can represent the same joint probability distribution P(A,B).
N1: A → B        N2: B → A
Assume A and B are Boolean variables with joint probability distribution:
A, B       P(A,B)
A, B       a
A, -B      b
-A, B      c
-A, -B     d
Define the conditional probability tables of N1 and N2 so that their product equals
P(A,B).
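One way to see that such tables must exist is the chain rule, which factors the same joint in either order:

P(A,B) = P(A) · P(B|A) = P(B) · P(A|B)

N1’s tables come from the first factorization (a prior on A and a conditional on B given A); N2’s come from the second. Each conditional is well defined whenever the conditioning event has nonzero probability.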
2. What are all of the independence relationships that can be inferred from the following
Bayesian network? Write a list of statements
X ⊥ Y | Z1, …, Zn, [W1, …, Wm]
where this means X is independent of Y given ALL of the evidence Z1, …, Zn, and
optionally any of the evidence W1, …, Wm.
[Figure: a Bayesian network over A, B, C, D]
3. Same question, but for the following Bayes net.
[Figure: a Bayesian network over A, B, C, D, E]
III. Properties of Markov Chains (25 pts)
1. Suppose you are given a time series x1, x2, …, xt of Boolean variables. The
sequence is generated by the deterministic rule xi+2 = f(xi, xi+1) for all i ≤ t-2.
Define an equivalent time series y1, y2, …, yt-1 such that each yi+1 is defined only in
terms of the previous yi; in other words, yi+1 = g(yi). Equivalence means there is a
one-to-one mapping between the sequences x1, x2, …, xt and y1, y2, …, yt-1. Your
answer should simply state the domain of the y’s and define g – you don’t have to
actually prove equivalence. (Hint: define each yi as a pair of Booleans.)
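A minimal Python sketch of the hinted pair construction. The rule f used here (XOR) is purely an illustrative assumption, a stand-in for whatever rule generated the data.

def f(u, v):
    return u != v          # hypothetical example rule: xi+2 = xi XOR xi+1

def g(y):
    """First-order update on pairs: g((xi, xi+1)) = (xi+1, f(xi, xi+1))."""
    u, v = y
    return (v, f(u, v))

# Usage: unroll the y-chain from y1 = (x1, x2) and read the x's back off.
x = [True, False, True]                  # x3 = f(x1, x2) under the rule above
y = [(x[0], x[1])]
while len(y) < len(x) - 1:
    y.append(g(y[-1]))
assert [p[0] for p in y] + [y[-1][1]] == x   # one-to-one recovery of the x's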
2. Suppose you are given an order-2 Markov chain X1, X2, …, Xt of random Boolean
variables, with transition probabilities P(Xi+2|Xi,Xi+1). Recall that in an order-2
Markov chain, Xi+2 is independent of Xj given Xi and Xi+1, for all
j < i. Define an order-1 Markov chain Y1, Y2, …, Yt-1 that is equivalent to
X1, …, Xt. Equivalence in this case means there is a one-to-one mapping between
the joint distribution over X1, …, Xt and the joint distribution over Y1, …, Yt-1.
Your answer should simply state the domains of the Y’s and define P(Yi+1|Yi) – you
don’t have to prove equivalence.
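A sketch of the same pair-state idea with probabilities: take Yi = (Xi, Xi+1), so a Y-transition has nonzero probability only when the overlapping component agrees. The table p2 below stands in for P(Xi+2=1 | Xi, Xi+1); its numbers are made up for illustration.

p2 = {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.4, (1, 1): 0.9}   # assumed values

def p_y(y_next, y_cur):
    """P(Yi+1 = (b', c) | Yi = (a, b)): zero unless b' == b."""
    a, b = y_cur
    b2, c = y_next
    if b2 != b:
        return 0.0
    p_one = p2[(a, b)]              # P(Xi+2 = 1 | Xi = a, Xi+1 = b)
    return p_one if c == 1 else 1.0 - p_one

# Sanity check: outgoing probabilities from each state sum to 1.
states = [(a, b) for a in (0, 1) for b in (0, 1)]
for y in states:
    assert abs(sum(p_y(y2, y) for y2 in states) - 1.0) < 1e-9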
IV. Statistical Parameter Learning (25 pts)
1. You are playing poker against Joe and trying to understand his pattern of
bluffing. (For those of you who don’t play poker, a bluff is when Joe makes a
large bet when he actually has a weak hand, in order to make you think he has a
strong hand. It is highly advantageous to detect when someone is bluffing!)
You suspect that when he is bluffing, he is more likely to fidget or to close his
mouth tightly. You observe the Boolean variables StrongHand, LargeBet, Fidgets,
and CloseMouth (abbreviated S, L, F, and C). Over 20 rounds of play, you
observe the following data:
S  L  F  C
1  1  0  0
1  1  0  0
1  1  1  0
1  1  0  1
1  1  0  1
1  1  0  1
1  0  1  1
1  0  0  0
1  0  0  0
0  1  1  0
0  1  1  1
0  1  1  1
0  1  1  1
0  1  0  1
0  0  1  0
0  0  0  1
0  0  0  1
0  0  0  0
0  0  0  0
0  0  0  0
We say that a bluff B occurs when S=0 and L=1. Our initial hypothesis model for
predicting bluffing is a conditional probability table for P(B|F,C). Fill out the
following table with the maximum likelihood estimate for P(B|F,C) (to simplify
your life, use fractions rather than decimal notation).
F  C  P(B|F,C)
0  0
0  1
1  0
1  1
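If you want to check your counts, here is a small Python sketch of the maximum likelihood estimate: P(B=1 | F=f, C=c) is simply count(bluff, f, c) / count(f, c) over the 20 rounds, where a bluff means S=0 and L=1.

from fractions import Fraction
from itertools import product

# Rows of the data table above, as (S, L, F, C).
data = [(1,1,0,0),(1,1,0,0),(1,1,1,0),(1,1,0,1),(1,1,0,1),(1,1,0,1),
        (1,0,1,1),(1,0,0,0),(1,0,0,0),(0,1,1,0),(0,1,1,1),(0,1,1,1),
        (0,1,1,1),(0,1,0,1),(0,0,1,0),(0,0,0,1),(0,0,0,1),(0,0,0,0),
        (0,0,0,0),(0,0,0,0)]

for f, c in product((0, 1), repeat=2):
    rows = [(s, l) for s, l, ff, cc in data if (ff, cc) == (f, c)]
    bluffs = sum(1 for s, l in rows if s == 0 and l == 1)   # B := (S=0, L=1)
    print(f"F={f} C={c}: P(B|F,C) =", Fraction(bluffs, len(rows)))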
2. Now we’d like to model how likely it is that Joe has a strong hand, given L, F, and C.
Our initial hypothesis model is a conditional probability table for P(S|L,F,C). Fill out
the following table with the maximum likelihood estimate for P(S|L,F,C).
L  F  C  P(S|L,F,C)
0  0  0
0  0  1
0  1  0
0  1  1
1  0  0
1  0  1
1  1  0
1  1  1
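The same counting applies here; this short continuation reuses the data list (and imports) from the sketch after Question 1.

for l, f, c in product((0, 1), repeat=3):
    rows = [s for s, ll, ff, cc in data if (ll, ff, cc) == (l, f, c)]
    if rows:   # guard against unseen (L,F,C) combinations, though all 8 occur here
        print(f"L={l} F={f} C={c}: P(S|L,F,C) =", Fraction(sum(rows), len(rows)))
    else:
        print(f"L={l} F={f} C={c}: undefined (no data)")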
3. In an attempt to avoid overfitting, you decide that a better hypothesis model may
state that C is independent of both S and F, given L. You also hypothesize that S
and L both directly cause F, and that S causes L. Draw a Bayesian network that
represents these assumptions.
4. Estimate the MAP parameters of the network that you drew in Question 3. Your
parameters are the entries of the conditional probability tables. Assume that every
parameter has a Beta prior with a=b=1.
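A sketch of the resulting computation, under one common convention (an assumption: the Beta(a, b) prior acts as a virtual positive and b virtual negative pseudo-counts, so the MAP estimate of each CPT entry is (count + 1)/(total + 2), i.e. add-one smoothing of the ML counts; under the alternative density convention, a=b=1 is uniform and MAP coincides with ML). Shown for one entry of the network hypothesized in Question 3 (edges S→L, S→F, L→F, L→C), reusing the data list from the earlier sketch.

from fractions import Fraction

def map_estimate(hits, total):
    """Add-one smoothed estimate (hits+1)/(total+2) under the
    pseudo-count reading of the Beta(1, 1) prior (an assumption)."""
    return Fraction(hits + 1, total + 2)

# Example entry: P(L=1 | S=1) in the S -> L table.
l_given_s1 = [l for s, l, f, c in data if s == 1]
print(map_estimate(sum(l_given_s1), len(l_given_s1)))   # (6+1)/(9+2) = 7/11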