Homework 3 Solutions

COSC 6342 “Machine Learning” Assignment 3, Spring 2009
Some Solutions (Modified), Group Daffodils
13. G—Topic 14
a) Assuming that k1(u,v) and k2(u,v) are kernels, show that
k(u,v) = k1(u,v)*k2(u,v) is also a kernel!
Since k1 and k2 are kernels, mappings Φ1 and Φ2 exist such that:
k1(u,v) = <Φ1(u), Φ1(v)> and k2(u,v) = <Φ2(u), Φ2(v)>
Proof:
Define Φ by Φ(u)ij = Φ1(u)i Φ2(u)j (the outer/tensor product of Φ1(u) and Φ2(u)). Then:
k(u,v) = <Φ(u), Φ(v)>
= ∑i,j Φ(u)ij Φ(v)ij
= ∑i,j Φ1(u)i Φ2(u)j Φ1(v)i Φ2(v)j
= ∑i,j Φ1(u)i Φ1(v)i Φ2(u)j Φ2(v)j
= (∑i Φ1(u)i Φ1(v)i) (∑j Φ2(u)j Φ2(v)j)
= <Φ1(u), Φ1(v)> <Φ2(u), Φ2(v)>
= k1(u,v) * k2(u,v)
We found a mapping Φ such that: k(u,v) = <Φ(u), Φ(v)>; therefore, k is a
kernel!
Alternative proof via the Gram matrix (do we really need this part?):
Note that the Gram matrix K for k is the Hadamard product (element-by-element
product) of K1 and K2, i.e. K = K1 ∘ K2. Suppose that K1 and K2 are covariance matrices of
zero-mean random vectors (X1, . . . , Xn) and (Y1, . . . , Yn) respectively, with the X's
independent of the Y's. Then K is simply the covariance matrix of
(X1Y1, . . . , XnYn), implying that it is symmetric and positive semidefinite.
Hence, if k1(u,v) and k2(u,v) are kernels, then k(u,v) = k1(u,v)*k2(u,v) is also a kernel.
Reference: Bishop, Pattern Recognition and Machine Learning, Ch. 6 (Kernel Methods), “Techniques for constructing new kernels”.
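
As a numerical illustration of the Gram-matrix argument (not part of the original solution), the sketch below builds two Gram matrices on random data and checks that their Hadamard product has no eigenvalue below zero; the linear and RBF kernel choices are arbitrary assumptions.

import numpy as np

# Sanity check: the Hadamard product of two Gram matrices is positive
# semidefinite. Kernel choices (linear, RBF) are illustrative assumptions.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))             # 20 points in R^3

K1 = X @ X.T                             # linear kernel k1(u,v) = <u, v>
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K2 = np.exp(-0.5 * sq)                   # RBF kernel k2(u,v) = exp(-||u-v||^2 / 2)

K = K1 * K2                              # elementwise (Hadamard) product
eigs = np.linalg.eigvalsh(K)             # K is symmetric, so eigvalsh applies
print(eigs.min())                        # >= 0 up to floating-point error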
14. G—Topic 14
a) Compute P(Earthquake|Mary_Calls) for the Alarm belief network (the second
transparency in Dr. Eick’s Belief Network lecture). Justify all major steps in
your computation!
Solution (not verified; Ch. Eick):
Notation: P(Earthquake | MaryCalls) = P(E|MC), P(Alarm) = P(A), P(Burglary) = P(B).
P(E|MC) = P(MC|E) * P(E) / P(MC)     ... applying Bayes' rule.
P(MC|E) = P(MC|A,E) * P(A|E) + P(MC|~A,E) * P(~A|E)
= P(MC|A) * P(A|E) + P(MC|~A) * (1 - P(A|E))     ... MC is conditionally independent of E given its parent A.
P(A|E) = P(A|B,E) * P(B|E) + P(A|~B,E) * P(~B|E)
= P(A|B,E) * P(B) + P(A|~B,E) * P(~B)     ... B and E are independent.
= 0.95 * 0.001 + 0.29 * 0.999
= 0.29066
Substituting this value into the equation for P(MC|E), we get:
P(MC|E) = 0.7 * 0.29066 + 0.01 * (1 - 0.29066)
= 0.7 * 0.29066 + 0.01 * 0.70934
= 0.21056
P(MC) = P(MC|A) * P(A) + P(MC|~A) * P(~A)
= P(MC|A) * P(A) + P(MC|~A) * (1 - P(A))
P(A) = P(A|B,E) P(B,E) + P(A|~B,E) P(~B,E) + P(A|B,~E) P(B,~E) + P(A|~B,~E) P(~B,~E)
= P(A|B,E) P(B) P(E) + P(A|~B,E) P(~B) P(E) + P(A|B,~E) P(B) P(~E) + P(A|~B,~E) P(~B) P(~E)     ... B and E are independent.
= 0.95*0.001*0.002 + 0.29*0.999*0.002 + 0.94*0.001*0.998 + 0.001*0.999*0.998
= 0.002516
Substituting this into the equation for P(MC), we get:
P(MC) = 0.70 * 0.002516 + 0.01 * 0.997484
= 0.011736
Substituting into P(E|MC), we get:
P(E|MC) = (0.21056 * 0.002) / 0.011736
= 0.0359
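
Since the solution is flagged as unverified, a short enumeration script is a cheap cross-check. The sketch below computes P(E|MC) exactly from the CPT values used above (the standard Alarm-network numbers); it should print approximately 0.0359.

from itertools import product

# Exact inference by enumeration for the Alarm network,
# using the CPT values that appear in the computation above.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_MC = {True: 0.70, False: 0.01}        # P(MaryCalls | Alarm)

def joint(b, e, a, mc):
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    pa = P_A[(b, e)]
    p *= pa if a else 1 - pa
    pmc = P_MC[a]
    return p * (pmc if mc else 1 - pmc)

num = sum(joint(b, True, a, True) for b, a in product([True, False], repeat=2))
den = sum(joint(b, e, a, True) for b, e, a in product([True, False], repeat=3))
print(num / den)                        # approx. 0.0359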
b) Assume that the following belief network is given
[Figure: belief network over nodes A, B, C, D, E, F; edges as in the assignment.]
i. Are F and E d-separable (independent)?
ii. Are A|E and C|E d-separable?
iii. Are A,B|F,D and E|F,D d-separable?
Give reasons for your answers!
Solution:
i. F and E
F and E are not independent.
Reason: the path F-E is not blocked; none of the blocking patterns given matches it, so F and E are not d-separated.
ii. A|E and C|E
Path A-D-E-C matches pattern 3: it is blocked.
Path A-F-E-C matches pattern 3: it is blocked.
Since both paths are blocked, A and C are d-separated given E; they are independent given E.
iii. A,B|F,D and E|F,D
Path A-D-E matches pattern 1(a).
Path B-D-E matches pattern 1(a).
Path A-F-E matches pattern 1(a).
Path B-D-A-F-E matches pattern 3(b).
Since all four (all possible) paths from A and B to E are blocked, A,B and E are independent given F,D; they are d-separated.
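
For a mechanical cross-check of these d-separation claims, one could query a DAG library. Note the figure did not survive extraction, so the edge set below is an assumption, chosen only to be consistent with the paths named above, not necessarily the network from the assignment.

import networkx as nx

# ASSUMED edge set (the original figure is missing); requires networkx >= 2.8
# (in networkx >= 3.3 the function is renamed is_d_separator).
G = nx.DiGraph([("A", "D"), ("B", "D"), ("D", "E"),
                ("A", "F"), ("F", "E"), ("E", "C")])

print(nx.d_separated(G, {"F"}, {"E"}, set()))            # i.   expect False
print(nx.d_separated(G, {"A"}, {"C"}, {"E"}))            # ii.  expect True
print(nx.d_separated(G, {"A", "B"}, {"E"}, {"F", "D"}))  # iii. expect True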
16. G—Topic 18
Reinforcement Learning
a) Give the Bellman equations for states 5, 6, 7 and 8 of the GHI world, depicted
below!
Solution:
U(5) = 4 + γ * max( U(2), 0.5*U(4) + 0.4*U(6) + 0.1*U(7) )
U(6) = 2 + γ * U(9)
U(7) = -5 + γ * max( U(4), U(5) )
U(8) = 2 + γ * max( U(6), 0.9*U(9) + 0.1*U(7) )
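
These are simultaneous equations in the unknown utilities. As a hedged illustration of how they could be solved, the sketch below runs a fixed-point iteration; γ and the utilities of states 2, 4 and 9 (whose equations are not asked for) are placeholder assumptions, not values from the assignment.

# Sketch: solving the four Bellman equations above by fixed-point iteration.
# gamma and the frozen utilities of states 2, 4 and 9 are placeholder
# assumptions, not values from the original assignment.
gamma = 0.9
U = {s: 0.0 for s in (2, 4, 5, 6, 7, 8, 9)}
for _ in range(100):
    U[5] = 4 + gamma * max(U[2], 0.5 * U[4] + 0.4 * U[6] + 0.1 * U[7])
    U[6] = 2 + gamma * U[9]
    U[7] = -5 + gamma * max(U[4], U[5])
    U[8] = 2 + gamma * max(U[6], 0.9 * U[9] + 0.1 * U[7])
print(U)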
Using temporal difference learning, compute the utilities of all visited states! Do not only
give the final result but also show how you derived it, including the formulas used!
Looking at the final result, did the agent learn anything? If yes, what did she learn?
[Figure: GHI World]
Solution:
The formula used for calculating the new utility:
U(s) ← U(s) + α*[R(s) + γ*U(s') - U(s)]
where s is the current state and s' is the next state; here α = 0.5 and γ = 1.
Given: initially all utilities are zero, i.e. U(i) = 0 for every state i.
Path taken: 1-2-3-5-2-3-5-7-5-7-4-1
U(1)  U(1) + 0.5[5 + 1*U(2) – U(1)]
= 0 + 0.5[5 +0 -0] = 2.5
U(2)  U(2) + 0.5[1 + 1*U(3) – U(2)] = 0.5
U(3)  U(3) + 0.5[9 + 1*U(5) – U(3)] = 4.5
U(5)  U(5) + 0.5[4 + 1*0.5 – U(5)] = 2.25
U(2)  0.5 + 0.5[1 + 1*4.5 – 0.5] = 3
U(3)  4.5 + 0.5[9 + 1*2.25 – 4.5] = 7.875
U(5)  2.25 + 0.5[4 + 1*0 – 2.25] = 3.125
U(7)  U(7) + 0.5[-5 + 1*3.125 - 0] = -0.9375
U(5)  3.125 + 0.5[4 + 1*(-0.9375) – 3.125] = 3.09375
U(7)  -0.9375 + 0.5[-5 + 1* 0 – (-0.9375)] = -2.96875
U(4)  U(4) + 0.5[1 + 1*2.5 - 0] = 1.75
U(1)  2.5 + 0.5[5+0-2.5] = 3.75
From the above utility values we can say that the agent has learned something. It has
learned that the utility of state 7 is low, so in a test run it will avoid that state. Also,
since the utility of state 1 has increased from 2.5 to 3.75, it is good to go to that state. In
a test run the agent will prefer the states that yield high utilities.
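
As a sanity check, the whole episode can be replayed in a few lines; the rewards below are read off the updates above, and the final update of state 1 uses a successor utility of 0, exactly as in the hand computation.

from collections import defaultdict

# Replaying the episode with TD(0): U(s) <- U(s) + a*[R(s) + g*U(s') - U(s)],
# a = 0.5, g = 1. Rewards are read off the updates above; utilities start at 0.
alpha, gamma = 0.5, 1.0
R = {1: 5, 2: 1, 3: 9, 4: 1, 5: 4, 7: -5}
path = [1, 2, 3, 5, 2, 3, 5, 7, 5, 7, 4, 1]

U = defaultdict(float)
for s, s_next in zip(path, path[1:]):
    U[s] += alpha * (R[s] + gamma * U[s_next] - U[s])
U[path[-1]] += alpha * (R[path[-1]] + 0 - U[path[-1]])  # final state, as above
print(dict(U))  # U(1)=3.75, U(2)=3, U(3)=7.875, U(4)=1.75, U(5)=3.09375, U(7)=-2.96875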