COSC 6342 “Machine Learning” Assignment 3, Spring 2009
Selected Solutions (modified), Group Daffodils

13. G - Topic 14

a) Assuming that k1(u,v) and k2(u,v) are kernels, show that k(u,v) = k1(u,v)*k2(u,v) is also a kernel!

Solution:
k1 and k2 are kernels; therefore mappings Φ1 and Φ2 exist with
k1(u,v) = <Φ1(u), Φ1(v)> and k2(u,v) = <Φ2(u), Φ2(v)>.

Proof: Let Φ be the tensor product of Φ1 and Φ2, i.e. the mapping whose components are Φ(u)ij = Φ1(u)i Φ2(u)j. Then:

k(u,v) = <Φ(u), Φ(v)>
= ∑i,j Φ(u)ij Φ(v)ij
= ∑i,j Φ1(u)i Φ2(u)j Φ1(v)i Φ2(v)j
= ∑i,j Φ1(u)i Φ1(v)i Φ2(u)j Φ2(v)j
= (∑i Φ1(u)i Φ1(v)i) (∑j Φ2(u)j Φ2(v)j)
= <Φ1(u), Φ1(v)> <Φ2(u), Φ2(v)>
= k1(u,v)*k2(u,v)

We found a mapping Φ such that k(u,v) = <Φ(u), Φ(v)>; therefore, k is a kernel.

Alternative argument: Note that the Gram matrix K of k is the Hadamard (element-by-element) product of K1 and K2, K = K1∘K2. Suppose that K1 and K2 are the covariance matrices of independent zero-mean random vectors (X1, ..., Xn) and (Y1, ..., Yn), respectively. Then K is simply the covariance matrix of (X1Y1, ..., XnYn), which implies that K is symmetric and positive semidefinite. Hence, if k1(u,v) and k2(u,v) are kernels, then k(u,v) = k1(u,v)*k2(u,v) is also a kernel. (A numerical check of this claim appears after problem 14 below.)

Reference: Bishop, Pattern Recognition and Machine Learning, Ch. 6, “Techniques for Constructing New Kernels”.

14. G - Topic 14

a) Compute P(Earthquake|Mary_Calls) for the Alarm belief network (the second transparency in Dr. Eick’s Belief Network lecture). Justify all major steps in your computation!

Solution:
Notation: B = Burglary, E = Earthquake, A = Alarm, MC = Mary_Calls.

P(E|MC) = P(MC|E) * P(E) / P(MC)   ... applying Bayes’ rule.

P(MC|E) = P(MC|A,E) * P(A|E) + P(MC|~A,E) * P(~A|E)
= P(MC|A) * P(A|E) + P(MC|~A) * (1 - P(A|E))   ... MC depends on E only through A.

P(A|E) = P(A|B,E) P(B|E) + P(A|~B,E) P(~B|E)
= P(A|B,E) P(B) + P(A|~B,E) P(~B)   ... B and E are independent.
= 0.95*0.001 + 0.29*0.999 = 0.29066

Substituting this value into the equation for P(MC|E), we get:
P(MC|E) = 0.7*0.29066 + 0.01*(1 - 0.29066) = 0.7*0.29066 + 0.01*0.70934 = 0.210555

P(MC) = P(MC|A) P(A) + P(MC|~A) P(~A) = P(MC|A) P(A) + P(MC|~A) * (1 - P(A))

P(A) = P(A|B,E) P(B) P(E) + P(A|~B,E) P(~B) P(E) + P(A|B,~E) P(B) P(~E) + P(A|~B,~E) P(~B) P(~E)   ... B and E are independent.
= 0.95*0.001*0.002 + 0.29*0.999*0.002 + 0.94*0.001*0.998 + 0.001*0.999*0.998
= 0.002516

Substituting this into the equation for P(MC), we get:
P(MC) = 0.70*0.002516 + 0.01*0.997484 = 0.011736

Substituting into P(E|MC), we get:
P(E|MC) = (0.210555 * 0.002) / 0.011736 = 0.0359

(A brute-force verification of this number appears after part b) below.)

b) Assume that the following belief network is given:

[Figure: belief network over the nodes A, B, C, D, E, F]

i. Are F and E d-separable (independent)?
ii. Are A|E and C|E d-separable?
iii. Are A,B|F,D and E|F,D d-separable?
Give reasons for your answers!

Solution:

i. F and E are not independent: the path F-E is not blocked, and they are not d-separable because none of the given blocking patterns matches this path.

ii. The path A-D-E-C matches pattern 3; it is blocked. The path A-F-E-C also matches pattern 3; it is blocked. Since both paths are blocked, A and C are d-separated given E; they are independent given E.

iii. The path A-D-E matches pattern 1(a). The path B-D-E matches pattern 1(a). The path A-F-E matches pattern 1(a). The path B-D-A-F-E matches pattern 3(b). Since all four (all possible) paths from A or B to E are blocked, A,B and E are independent given F,D; they are d-separated.
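Numerical check for problem 13 (a minimal sketch, not part of the original solution; the kernel choices and random test points are our own): build Gram matrices for a linear and an RBF kernel, form their Hadamard product, and confirm that the smallest eigenvalue is non-negative, i.e. that the product Gram matrix is positive semidefinite.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))          # 20 random points in R^3

K1 = X @ X.T                          # Gram matrix of the linear kernel k1(u,v) = <u,v>
K2 = np.exp(-0.5 * np.sum((X[:, None] - X[None, :]) ** 2, axis=-1))  # RBF kernel
K = K1 * K2                           # Hadamard product = Gram matrix of k = k1*k2

# All eigenvalues are >= 0 (up to numerical tolerance), so K is PSD:
print(np.linalg.eigvalsh(K).min() >= -1e-10)   # True

Verification sketch for problem 14 a) (again not part of the original solution): enumerate the joint distribution P(B) P(E) P(A|B,E) P(MC|A) using the CPT values quoted in the solution, and compute P(E|MC) = P(E,MC) / P(MC) by brute force.

from itertools import product

P_B, P_E = 0.001, 0.002                            # priors for Burglary, Earthquake
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001} # P(A=true | B, E)
P_MC = {True: 0.70, False: 0.01}                   # P(MC=true | A)

def pr(x, p):
    """Probability that a Boolean variable takes value x, given P(true) = p."""
    return p if x else 1.0 - p

num = den = 0.0                                    # num = P(E,MC), den = P(MC)
for b, e, a in product([True, False], repeat=3):
    joint = pr(b, P_B) * pr(e, P_E) * pr(a, P_A[(b, e)]) * P_MC[a]
    den += joint
    if e:
        num += joint

print(num / den)                                   # ~0.0359, matching the hand computation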
16. G - Topic 18 Reinforcement Learning

a) Give the Bellman equations for states 5, 6, 7 and 8 of the GHI world, depicted below!

[Figure: GHI world]

Solution:
U(5) = 4 + γ * max( U(2), 0.5*U(4) + 0.4*U(6) + 0.1*U(7) )
U(6) = 2 + γ * U(9)
U(7) = -5 + γ * max( U(4), U(5) )
U(8) = 2 + γ * max( U(6), 0.9*U(9) + 0.1*U(7) )

b) Using temporal difference learning, compute the utilities of all visited states! Do not only give the final result, but also show how you derived it, including the formulas used! Looking at the final result: did the agent learn anything? If yes, what did she learn?

Solution:
The formula used for calculating the new utility is the TD update

U(s) ← U(s) + α*[R(s) + γ*U(s') - U(s)]

where s is the current state and s' is the next state.

Given: α = 0.5 and γ = 1; initially, all utilities U(i) are zero.
Path taken: 1-2-3-5-2-3-5-7-5-7-4-1

U(1) ← U(1) + 0.5[5 + 1*U(2) - U(1)] = 0 + 0.5[5 + 0 - 0] = 2.5
U(2) ← U(2) + 0.5[1 + 1*U(3) - U(2)] = 0.5
U(3) ← U(3) + 0.5[9 + 1*U(5) - U(3)] = 4.5
U(5) ← U(5) + 0.5[4 + 1*0.5 - U(5)] = 2.25
U(2) ← 0.5 + 0.5[1 + 1*4.5 - 0.5] = 3
U(3) ← 4.5 + 0.5[9 + 1*2.25 - 4.5] = 7.875
U(5) ← 2.25 + 0.5[4 + 1*0 - 2.25] = 3.125
U(7) ← U(7) + 0.5[-5 + 1*3.125 - 0] = -0.9375
U(5) ← 3.125 + 0.5[4 + 1*(-0.9375) - 3.125] = 3.09375
U(7) ← -0.9375 + 0.5[-5 + 1*0 - (-0.9375)] = -2.96875
U(4) ← U(4) + 0.5[1 + 1*2.5 - 0] = 1.75
U(1) ← 2.5 + 0.5[5 + 0 - 2.5] = 3.75   (the episode ends in state 1; the successor utility is taken as 0)

From the above utility values we can say that the agent has learned something. It has learned that the utility of state 7 is low (negative), and hence in a test run it will avoid that state. Also, since the utility of state 1 has increased from 2.5 to 3.75, it is good to go to that state. In a test run the agent will prefer the states that yield high utilities. (A short sketch that replays these updates appears below.)
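Replay sketch for the TD computation above (a minimal sketch, not part of the original solution; it assumes, as the hand computation does, that α = 0.5, γ = 1, utilities start at zero, and the last state of the episode is updated against a successor utility of 0):

alpha, gamma = 0.5, 1.0
R = {1: 5, 2: 1, 3: 9, 4: 1, 5: 4, 7: -5}    # rewards of the visited states
path = [1, 2, 3, 5, 2, 3, 5, 7, 5, 7, 4, 1]  # the observed episode
U = {s: 0.0 for s in R}                      # initial utilities

for i, s in enumerate(path):
    # successor utility; 0 at the end of the episode
    u_next = U[path[i + 1]] if i + 1 < len(path) else 0.0
    # TD(0) update: U(s) <- U(s) + alpha*(R(s) + gamma*U(s') - U(s))
    U[s] += alpha * (R[s] + gamma * u_next - U[s])
    print(f"U({s}) = {U[s]}")

The printed sequence matches the hand computation, ending with U(1) = 3.75, U(2) = 3.0, U(3) = 7.875, U(4) = 1.75, U(5) = 3.09375, U(7) = -2.96875.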