COSC 6342 “Machine Learning” Assignment 3, Spring 2009
Selected Solutions (modified), Group Daffodils

13. G - Topic 14

a) Assuming that k1(u,v) and k2(u,v) are kernels, show that k(u,v) = k1(u,v)*k2(u,v) is also a kernel!

Solution:
k1 and k2 are kernels; therefore mappings Φ1 and Φ2 exist with
k1(u,v) = <Φ1(u), Φ1(v)> and k2(u,v) = <Φ2(u), Φ2(v)>.

Proof: Let Φ be the tensor product of Φ1 and Φ2, i.e. the mapping whose components are Φ(u)ij = Φ1(u)i Φ2(u)j. Then:

k(u,v) = <Φ(u), Φ(v)>
= ∑i,j Φ(u)ij Φ(v)ij
= ∑i,j Φ1(u)i Φ2(u)j Φ1(v)i Φ2(v)j
= ∑i,j Φ1(u)i Φ1(v)i Φ2(u)j Φ2(v)j
= (∑i Φ1(u)i Φ1(v)i) (∑j Φ2(u)j Φ2(v)j)
= <Φ1(u), Φ1(v)> <Φ2(u), Φ2(v)>
= k1(u,v)*k2(u,v)

We found a mapping Φ such that k(u,v) = <Φ(u), Φ(v)>; therefore, k is a kernel.

Alternative argument: Note that the Gram matrix K of k is the Hadamard (element-by-element) product of K1 and K2, K = K1∘K2. Suppose that K1 and K2 are the covariance matrices of independent zero-mean random vectors (X1, ..., Xn) and (Y1, ..., Yn), respectively. Then K is simply the covariance matrix of (X1Y1, ..., XnYn), which implies that K is symmetric and positive semidefinite. Hence, if k1(u,v) and k2(u,v) are kernels, then k(u,v) = k1(u,v)*k2(u,v) is also a kernel. (A numerical check of this claim appears after problem 14 below.)

Reference: Bishop, Pattern Recognition and Machine Learning, Ch. 6, “Techniques for Constructing New Kernels”.

14. G - Topic 14

a) Compute P(Earthquake|Mary_Calls) for the Alarm belief network (the second transparency in Dr. Eick’s Belief Network lecture). Justify all major steps in your computation!

Solution:
Notation: B = Burglary, E = Earthquake, A = Alarm, MC = Mary_Calls.

P(E|MC) = P(MC|E) * P(E) / P(MC)   ... applying Bayes’ rule.

P(MC|E) = P(MC|A,E) * P(A|E) + P(MC|~A,E) * P(~A|E)
= P(MC|A) * P(A|E) + P(MC|~A) * (1 - P(A|E))   ... MC depends on E only through A.

P(A|E) = P(A|B,E) P(B|E) + P(A|~B,E) P(~B|E)
= P(A|B,E) P(B) + P(A|~B,E) P(~B)   ... B and E are independent.
= 0.95*0.001 + 0.29*0.999 = 0.29066

Substituting this value into the equation for P(MC|E), we get:
P(MC|E) = 0.7*0.29066 + 0.01*(1 - 0.29066) = 0.7*0.29066 + 0.01*0.70934 = 0.210555

P(MC) = P(MC|A) P(A) + P(MC|~A) P(~A) = P(MC|A) P(A) + P(MC|~A) * (1 - P(A))

P(A) = P(A|B,E) P(B) P(E) + P(A|~B,E) P(~B) P(E) + P(A|B,~E) P(B) P(~E) + P(A|~B,~E) P(~B) P(~E)   ... B and E are independent.
= 0.95*0.001*0.002 + 0.29*0.999*0.002 + 0.94*0.001*0.998 + 0.001*0.999*0.998
= 0.002516

Substituting this into the equation for P(MC), we get:
P(MC) = 0.70*0.002516 + 0.01*0.997484 = 0.011736

Substituting into P(E|MC), we get:
P(E|MC) = (0.210555 * 0.002) / 0.011736 = 0.0359

(A brute-force verification of this number appears after part b) below.)

b) Assume that the following belief network is given:

[Figure: belief network over the nodes A, B, C, D, E, F]

i. Are F and E d-separable (independent)?
ii. Are A|E and C|E d-separable?
iii. Are A,B|F,D and E|F,D d-separable?
Give reasons for your answers!

Solution:

i. F and E are not independent: the path F-E is not blocked, and they are not d-separable because none of the given blocking patterns matches this path.

ii. The path A-D-E-C matches pattern 3; it is blocked. The path A-F-E-C also matches pattern 3; it is blocked. Since both paths are blocked, A and C are d-separated given E; they are independent given E.

iii. The path A-D-E matches pattern 1(a). The path B-D-E matches pattern 1(a). The path A-F-E matches pattern 1(a). The path B-D-A-F-E matches pattern 3(b). Since all four (all possible) paths from A or B to E are blocked, A,B and E are independent given F,D; they are d-separated.
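Numerical check for problem 13 (a minimal sketch, not part of the original solution; the kernel choices and random test points are our own): build Gram matrices for a linear and an RBF kernel, form their Hadamard product, and confirm that the smallest eigenvalue is non-negative, i.e. that the product Gram matrix is positive semidefinite.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))          # 20 random points in R^3

K1 = X @ X.T                          # Gram matrix of the linear kernel k1(u,v) = <u,v>
K2 = np.exp(-0.5 * np.sum((X[:, None] - X[None, :]) ** 2, axis=-1))  # RBF kernel
K = K1 * K2                           # Hadamard product = Gram matrix of k = k1*k2

# All eigenvalues are >= 0 (up to numerical tolerance), so K is PSD:
print(np.linalg.eigvalsh(K).min() >= -1e-10)   # True

Verification sketch for problem 14 a) (again not part of the original solution): enumerate the joint distribution P(B) P(E) P(A|B,E) P(MC|A) using the CPT values quoted in the solution, and compute P(E|MC) = P(E,MC) / P(MC) by brute force.

from itertools import product

P_B, P_E = 0.001, 0.002                            # priors for Burglary, Earthquake
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001} # P(A=true | B, E)
P_MC = {True: 0.70, False: 0.01}                   # P(MC=true | A)

def pr(x, p):
    """Probability that a Boolean variable takes value x, given P(true) = p."""
    return p if x else 1.0 - p

num = den = 0.0                                    # num = P(E,MC), den = P(MC)
for b, e, a in product([True, False], repeat=3):
    joint = pr(b, P_B) * pr(e, P_E) * pr(a, P_A[(b, e)]) * P_MC[a]
    den += joint
    if e:
        num += joint

print(num / den)                                   # ~0.0359, matching the hand computation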
16. G - Topic 18 Reinforcement Learning

a) Give the Bellman equations for states 5, 6, 7 and 8 of the GHI world, depicted below!

[Figure: GHI world]

Solution:
U(5) = 4 + γ * max( U(2), 0.5*U(4) + 0.4*U(6) + 0.1*U(7) )
U(6) = 2 + γ * U(9)
U(7) = -5 + γ * max( U(4), U(5) )
U(8) = 2 + γ * max( U(6), 0.9*U(9) + 0.1*U(7) )

b) Using temporal difference learning, compute the utilities of all visited states! Do not only give the final result, but also show how you derived it, including the formulas used! Looking at the final result: did the agent learn anything? If yes, what did she learn?

Solution:
The formula used for calculating the new utility is the TD update

U(s) ← U(s) + α*[R(s) + γ*U(s') - U(s)]

where s is the current state and s' is the next state.

Given: α = 0.5 and γ = 1; initially, all utilities U(i) are zero.
Path taken: 1-2-3-5-2-3-5-7-5-7-4-1

U(1) ← U(1) + 0.5[5 + 1*U(2) - U(1)] = 0 + 0.5[5 + 0 - 0] = 2.5
U(2) ← U(2) + 0.5[1 + 1*U(3) - U(2)] = 0.5
U(3) ← U(3) + 0.5[9 + 1*U(5) - U(3)] = 4.5
U(5) ← U(5) + 0.5[4 + 1*0.5 - U(5)] = 2.25
U(2) ← 0.5 + 0.5[1 + 1*4.5 - 0.5] = 3
U(3) ← 4.5 + 0.5[9 + 1*2.25 - 4.5] = 7.875
U(5) ← 2.25 + 0.5[4 + 1*0 - 2.25] = 3.125
U(7) ← U(7) + 0.5[-5 + 1*3.125 - 0] = -0.9375
U(5) ← 3.125 + 0.5[4 + 1*(-0.9375) - 3.125] = 3.09375
U(7) ← -0.9375 + 0.5[-5 + 1*0 - (-0.9375)] = -2.96875
U(4) ← U(4) + 0.5[1 + 1*2.5 - 0] = 1.75
U(1) ← 2.5 + 0.5[5 + 0 - 2.5] = 3.75   (the episode ends in state 1; the successor utility is taken as 0)

From the above utility values we can say that the agent has learned something. It has learned that the utility of state 7 is low (negative), and hence in a test run it will avoid that state. Also, since the utility of state 1 has increased from 2.5 to 3.75, it is good to go to that state. In a test run the agent will prefer the states that yield high utilities. (A short sketch that replays these updates appears below.)
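Replay sketch for the TD computation above (a minimal sketch, not part of the original solution; it assumes, as the hand computation does, that α = 0.5, γ = 1, utilities start at zero, and the last state of the episode is updated against a successor utility of 0):

alpha, gamma = 0.5, 1.0
R = {1: 5, 2: 1, 3: 9, 4: 1, 5: 4, 7: -5}    # rewards of the visited states
path = [1, 2, 3, 5, 2, 3, 5, 7, 5, 7, 4, 1]  # the observed episode
U = {s: 0.0 for s in R}                      # initial utilities

for i, s in enumerate(path):
    # successor utility; 0 at the end of the episode
    u_next = U[path[i + 1]] if i + 1 < len(path) else 0.0
    # TD(0) update: U(s) <- U(s) + alpha*(R(s) + gamma*U(s') - U(s))
    U[s] += alpha * (R[s] + gamma * u_next - U[s])
    print(f"U({s}) = {U[s]}")

The printed sequence matches the hand computation, ending with U(1) = 3.75, U(2) = 3.0, U(3) = 7.875, U(4) = 1.75, U(5) = 3.09375, U(7) = -2.96875.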