VILLANOVA UNIVERSITY
Department of Electrical & Computer Engineering
ECE 8412 Introduction to Neural Networks
Homework 08, Chapter 9, due 15 November 2005
Name: Santhosh Kumar Kokala

Complete the following and submit your work to me as a hard copy. You can also fax your solutions to 610-519-4436. Use MATLAB to determine the results of matrix algebra.

1. E9.2 (Skip part vi.)

A. (i) Sketch a contour plot of this function.

To sketch the contour plot we first need to find the Hessian matrix. For a quadratic function we can do this by comparing it to the standard form:

$$F(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T A \mathbf{x} + \mathbf{d}^T\mathbf{x} + c = \frac{1}{2}\mathbf{x}^T \begin{bmatrix} 6 & -2 \\ -2 & 6 \end{bmatrix} \mathbf{x} + \begin{bmatrix} -1 & -1 \end{bmatrix}\mathbf{x}$$

From the above equation the Hessian matrix is

$$\nabla^2 F(\mathbf{x}) = A = \begin{bmatrix} 6 & -2 \\ -2 & 6 \end{bmatrix}$$

The eigenvalues and eigenvectors of this matrix are

$$\lambda_1 = 4,\quad \mathbf{z}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}; \qquad \lambda_2 = 8,\quad \mathbf{z}_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

We know that the function contours are elliptical. The maximum curvature of $F(\mathbf{x})$ is in the direction of $\mathbf{z}_2$, since $\lambda_2$ is larger than $\lambda_1$, and the minimum curvature is in the direction of $\mathbf{z}_1$ (the long axis of the ellipses).

Next we need to find the center of the contours (the stationary point). This occurs where the gradient is equal to zero:

$$\nabla F(\mathbf{x}) = A\mathbf{x} + \mathbf{d} = \begin{bmatrix} 6 & -2 \\ -2 & 6 \end{bmatrix}\mathbf{x} + \begin{bmatrix} -1 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

Therefore

$$\mathbf{x}^* = -A^{-1}\mathbf{d} = \begin{bmatrix} 6 & -2 \\ -2 & 6 \end{bmatrix}^{-1}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0.25 \\ 0.25 \end{bmatrix}$$

The contours will be elliptical, centered at $\mathbf{x}^*$, with the long axis in the direction of $\mathbf{z}_1$.

(ii) Sketch the trajectory of the steepest descent algorithm on the contour plot of part (i), if the initial guess is $\mathbf{x}_0 = \begin{bmatrix} 0 & 0 \end{bmatrix}^T$. Assume a very small learning rate is used.

Applying the steepest descent algorithm to the above function,

$$\nabla F(\mathbf{x}) = A\mathbf{x} + \mathbf{d} = \begin{bmatrix} 6 & -2 \\ -2 & 6 \end{bmatrix}\mathbf{x} + \begin{bmatrix} -1 \\ -1 \end{bmatrix}$$

If we evaluate the gradient at the initial guess we find

$$\mathbf{g}_0 = \nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_0} = \begin{bmatrix} -1 \\ -1 \end{bmatrix}$$

Assume that we use a fixed learning rate of $\alpha = 0.01$. The first iteration of the steepest descent algorithm is

$$\mathbf{x}_1 = \mathbf{x}_0 - \alpha\mathbf{g}_0 = \begin{bmatrix} 0 \\ 0 \end{bmatrix} - 0.01\begin{bmatrix} -1 \\ -1 \end{bmatrix} = \begin{bmatrix} 0.01 \\ 0.01 \end{bmatrix}$$

The second iteration of steepest descent produces

$$\mathbf{g}_1 = \nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_1} = \begin{bmatrix} -0.96 \\ -0.96 \end{bmatrix}, \qquad \mathbf{x}_2 = \mathbf{x}_1 - \alpha\mathbf{g}_1 = \begin{bmatrix} 0.01 \\ 0.01 \end{bmatrix} - 0.01\begin{bmatrix} -0.96 \\ -0.96 \end{bmatrix} = \begin{bmatrix} 0.0196 \\ 0.0196 \end{bmatrix}$$

If we continue the iterations we obtain a trajectory that moves in a straight line from $\mathbf{x}_0$ toward $\mathbf{x}^*$ along the $\mathbf{z}_1$ direction. (Trajectory plot omitted.)

(iii) Perform two iterations of steepest descent with learning rate $\alpha = 0.1$.

Applying steepest descent with the gradient above, at the initial guess

$$\mathbf{g}_0 = \nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_0} = \begin{bmatrix} -1 \\ -1 \end{bmatrix}$$

With the fixed learning rate of $\alpha = 0.1$, the first iteration of steepest descent is

$$\mathbf{x}_1 = \mathbf{x}_0 - \alpha\mathbf{g}_0 = \begin{bmatrix} 0 \\ 0 \end{bmatrix} - 0.1\begin{bmatrix} -1 \\ -1 \end{bmatrix} = \begin{bmatrix} 0.1 \\ 0.1 \end{bmatrix}$$

The second iteration of steepest descent produces

$$\mathbf{g}_1 = \nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_1} = \begin{bmatrix} -0.6 \\ -0.6 \end{bmatrix}, \qquad \mathbf{x}_2 = \mathbf{x}_1 - \alpha\mathbf{g}_1 = \begin{bmatrix} 0.1 \\ 0.1 \end{bmatrix} - 0.1\begin{bmatrix} -0.6 \\ -0.6 \end{bmatrix} = \begin{bmatrix} 0.16 \\ 0.16 \end{bmatrix}$$

(iv) What is the maximum stable learning rate?

The maximum stable learning rate is determined by the largest eigenvalue of the Hessian:

$$\alpha < \frac{2}{\lambda_{\max}} = \frac{2}{8} = 0.25$$

(v) What is the maximum stable learning rate for the initial guess given in part (ii)?

Along the search direction $\mathbf{p}_0 = -\mathbf{g}_0 = \begin{bmatrix} 1 & 1 \end{bmatrix}^T$ the learning rate is given by

$$\alpha_k = -\frac{\mathbf{g}_k^T \mathbf{p}_k}{\mathbf{p}_k^T A \mathbf{p}_k} = -\frac{\begin{bmatrix} -1 & -1 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix}}{\begin{bmatrix} 1 & 1 \end{bmatrix}\begin{bmatrix} 6 & -2 \\ -2 & 6 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix}} = \frac{2}{8} = 0.25$$
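As a quick numerical check of parts (iii) and (iv), the following short MATLAB sketch (my own addition, not part of the graded solution) runs two fixed-rate steepest descent steps on this quadratic and computes the stability bound; the variable names are assumed.

% Minimal steepest-descent check for E9.2 (assumed script, not from the original solution).
A = [6 -2; -2 6];        % Hessian of the quadratic
d = [-1; -1];            % linear term
x = [0; 0];              % initial guess from part (ii)
alpha = 0.1;             % learning rate from part (iii)

for k = 1:2
    g = A*x + d;         % gradient of F(x) = 0.5*x'*A*x + d'*x
    x = x - alpha*g;     % steepest-descent update
    fprintf('x_%d = [%.4f; %.4f]\n', k, x(1), x(2));
end

% Maximum stable learning rate from part (iv): 2 / lambda_max
alpha_max = 2/max(eig(A))    % = 0.25

Running it reproduces $\mathbf{x}_1 = [0.1;\, 0.1]$ and $\mathbf{x}_2 = [0.16;\, 0.16]$ from part (iii) and the bound $\alpha_{\max} = 0.25$ from part (iv).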
2. Consider the following function:

$$F(\mathbf{x}) = \left(1 + (x_1 + x_2 - 5)^2\right)\left(1 + (3x_1 - 2x_2)^2\right)$$

(i) Perform one iteration of Newton's method, starting from the initial guess $\mathbf{x}_0 = \begin{bmatrix} 10 & 10 \end{bmatrix}^T$.

The gradient and Hessian are

$$\nabla F(\mathbf{x}) = \begin{bmatrix} 6\left(1 + (x_1+x_2-5)^2\right)(3x_1-2x_2) + 2(x_1+x_2-5)\left(1 + (3x_1-2x_2)^2\right) \\ -4\left(1 + (x_1+x_2-5)^2\right)(3x_1-2x_2) + 2(x_1+x_2-5)\left(1 + (3x_1-2x_2)^2\right) \end{bmatrix}$$

$$\nabla^2 F(\mathbf{x}) = \begin{bmatrix} 18\left(1+(x_1+x_2-5)^2\right) + 24(x_1+x_2-5)(3x_1-2x_2) + 2\left(1+(3x_1-2x_2)^2\right) & -12\left(1+(x_1+x_2-5)^2\right) + 4(x_1+x_2-5)(3x_1-2x_2) + 2\left(1+(3x_1-2x_2)^2\right) \\ -12\left(1+(x_1+x_2-5)^2\right) + 4(x_1+x_2-5)(3x_1-2x_2) + 2\left(1+(3x_1-2x_2)^2\right) & 8\left(1+(x_1+x_2-5)^2\right) - 16(x_1+x_2-5)(3x_1-2x_2) + 2\left(1+(3x_1-2x_2)^2\right) \end{bmatrix}$$

If we start from the initial guess $\mathbf{x}_0 = \begin{bmatrix} 10 \\ 10 \end{bmatrix}$, then

$$\mathbf{g}_0 = \nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_0} = \begin{bmatrix} 16590 \\ -6010 \end{bmatrix} \quad\text{and}\quad A_0 = \nabla^2 F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_0} = \begin{bmatrix} 7870 & -1910 \\ -1910 & -390 \end{bmatrix}$$

Therefore the first iteration of Newton's method will be

$$\mathbf{x}_1 = \mathbf{x}_0 - A_0^{-1}\mathbf{g}_0 = \begin{bmatrix} 10 \\ 10 \end{bmatrix} - \begin{bmatrix} 7870 & -1910 \\ -1910 & -390 \end{bmatrix}^{-1}\begin{bmatrix} 16590 \\ -6010 \end{bmatrix} = \begin{bmatrix} 7.3280 \\ 7.6759 \end{bmatrix}$$

(ii) Repeat part (i), starting from the initial guess $\mathbf{x}_0 = \begin{bmatrix} 2 & 2 \end{bmatrix}^T$.

If we start from the initial guess $\mathbf{x}_0 = \begin{bmatrix} 2 \\ 2 \end{bmatrix}$, then

$$\mathbf{g}_0 = \nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_0} = \begin{bmatrix} 14 \\ -26 \end{bmatrix} \quad\text{and}\quad A_0 = \nabla^2 F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_0} = \begin{bmatrix} -2 & -22 \\ -22 & 58 \end{bmatrix}$$

Therefore the first iteration of Newton's method will be

$$\mathbf{x}_1 = \mathbf{x}_0 - A_0^{-1}\mathbf{g}_0 = \begin{bmatrix} 2 \\ 2 \end{bmatrix} - \begin{bmatrix} -2 & -22 \\ -22 & 58 \end{bmatrix}^{-1}\begin{bmatrix} 14 \\ -26 \end{bmatrix} = \begin{bmatrix} 2.4 \\ 2.6 \end{bmatrix}$$

(iii) Find the minimum of the function and compare with your results from the previous two parts.

Matlab Code:

syms x1 x2
fx1 = 6*(1+(x1+x2-5)^2)*(3*x1-2*x2) + 2*(x1+x2-5)*(1+(3*x1-2*x2)^2);
fx2 = -4*(1+(x1+x2-5)^2)*(3*x1-2*x2) + 2*(x1+x2-5)*(1+(3*x1-2*x2)^2);
[x1sol, x2sol] = solve([fx1 == 0, fx2 == 0], [x1, x2])

Output:

x1sol =
 2
 2+3/5*i
 2-3/5*i
 2+1/5*i
 2-1/5*i

x2sol =
 3
 3+2/5*i
 3-2/5*i
 3+4/5*i
 3-4/5*i

The only real stationary point is $\mathbf{x}^* = \begin{bmatrix} 2 & 3 \end{bmatrix}^T$, which is the minimum of the function. The Newton step from the initial guess $\mathbf{x}_0 = \begin{bmatrix} 2 & 2 \end{bmatrix}^T$ lands much closer to this minimum than the step from $\mathbf{x}_0 = \begin{bmatrix} 10 & 10 \end{bmatrix}^T$ does.

3. For the function given in E8.3(i), perform two iterations of the conjugate gradient algorithm. Do this for each of the three methods given in Equations 9.61, 9.62, and 9.63.

A. The function is

$$F(\mathbf{x}) = \frac{7}{2}x_1^2 - 6x_1x_2 - x_2^2$$

Let the initial guess be $\mathbf{x}_0 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$; the Hessian matrix is $A = \begin{bmatrix} 7 & -6 \\ -6 & -2 \end{bmatrix}$.

The gradient at $\mathbf{x}_0$ is

$$\mathbf{g}_0 = \nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_0} = A\mathbf{x}_0 = \begin{bmatrix} 1 \\ -8 \end{bmatrix}$$

The first search direction is then

$$\mathbf{p}_0 = -\mathbf{g}_0 = \begin{bmatrix} -1 \\ 8 \end{bmatrix}$$

To minimize along a line, for a quadratic function we can use

$$\alpha_0 = -\frac{\mathbf{g}_0^T\mathbf{p}_0}{\mathbf{p}_0^T A \mathbf{p}_0} = -\frac{\begin{bmatrix} 1 & -8 \end{bmatrix}\begin{bmatrix} -1 \\ 8 \end{bmatrix}}{\begin{bmatrix} -1 & 8 \end{bmatrix}\begin{bmatrix} 7 & -6 \\ -6 & -2 \end{bmatrix}\begin{bmatrix} -1 \\ 8 \end{bmatrix}} = -\frac{-65}{-25} = -2.6$$

Therefore the first iteration of conjugate gradient will be

$$\mathbf{x}_1 = \mathbf{x}_0 + \alpha_0\mathbf{p}_0 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} - 2.6\begin{bmatrix} -1 \\ 8 \end{bmatrix} = \begin{bmatrix} 3.6 \\ -19.8 \end{bmatrix}$$

Now we need to find the second search direction, first using Hestenes and Stiefel. This requires the gradient at $\mathbf{x}_1$:

$$\mathbf{g}_1 = \nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_1} = A\mathbf{x}_1 = \begin{bmatrix} 144 \\ 18 \end{bmatrix}, \qquad \Delta\mathbf{g}_0 = \mathbf{g}_1 - \mathbf{g}_0 = \begin{bmatrix} 144 \\ 18 \end{bmatrix} - \begin{bmatrix} 1 \\ -8 \end{bmatrix} = \begin{bmatrix} 143 \\ 26 \end{bmatrix}$$

$$\beta_1 = \frac{\Delta\mathbf{g}_0^T\mathbf{g}_1}{\Delta\mathbf{g}_0^T\mathbf{p}_0} = \frac{\begin{bmatrix} 143 & 26 \end{bmatrix}\begin{bmatrix} 144 \\ 18 \end{bmatrix}}{\begin{bmatrix} 143 & 26 \end{bmatrix}\begin{bmatrix} -1 \\ 8 \end{bmatrix}} = \frac{21060}{65} = 324$$

$$\mathbf{p}_1 = -\mathbf{g}_1 + \beta_1\mathbf{p}_0 = -\begin{bmatrix} 144 \\ 18 \end{bmatrix} + 324\begin{bmatrix} -1 \\ 8 \end{bmatrix} = \begin{bmatrix} -468 \\ 2574 \end{bmatrix}$$

$$\alpha_1 = -\frac{\mathbf{g}_1^T\mathbf{p}_1}{\mathbf{p}_1^T A \mathbf{p}_1} = -\frac{\begin{bmatrix} 144 & 18 \end{bmatrix}\begin{bmatrix} -468 \\ 2574 \end{bmatrix}}{\begin{bmatrix} -468 & 2574 \end{bmatrix}\begin{bmatrix} 7 & -6 \\ -6 & -2 \end{bmatrix}\begin{bmatrix} -468 \\ 2574 \end{bmatrix}} = \frac{21060}{2737800} = \frac{1}{130}$$

The second step of conjugate gradient is therefore

$$\mathbf{x}_2 = \mathbf{x}_1 + \alpha_1\mathbf{p}_1 = \begin{bmatrix} 3.6 \\ -19.8 \end{bmatrix} + \frac{1}{130}\begin{bmatrix} -468 \\ 2574 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

Calculating the second search direction using Fletcher and Reeves:

$$\beta_1 = \frac{\mathbf{g}_1^T\mathbf{g}_1}{\mathbf{g}_0^T\mathbf{g}_0} = \frac{21060}{65} = 324$$

This gives the same search direction $\mathbf{p}_1 = \begin{bmatrix} -468 \\ 2574 \end{bmatrix}$, the same step length $\alpha_1 = 1/130$, and therefore the same second step $\mathbf{x}_2 = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$.

Calculating the second search direction using Polak and Ribiere:

$$\beta_1 = \frac{\Delta\mathbf{g}_0^T\mathbf{g}_1}{\mathbf{g}_0^T\mathbf{g}_0} = \frac{21060}{65} = 324$$

Again this gives $\mathbf{p}_1 = \begin{bmatrix} -468 \\ 2574 \end{bmatrix}$, $\alpha_1 = 1/130$, and $\mathbf{x}_2 = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$.
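To double-check the hand calculations above, here is a small MATLAB sketch of the two conjugate gradient steps. It is my own addition rather than part of the submitted solution, and the variable names (A, x, g, p, beta_HS, beta_FR, beta_PR) are assumed.

% Conjugate gradient check for Problem 3 (assumed script, not from the original solution).
A = [7 -6; -6 -2];          % Hessian of F(x) = 3.5*x1^2 - 6*x1*x2 - x2^2
x = [1; 1];                 % initial guess
g = A*x;                    % gradient (the function has no linear term)
p = -g;                     % first search direction

for k = 1:2
    alpha = -(g'*p)/(p'*A*p);           % exact line search for a quadratic
    x = x + alpha*p;
    g_new = A*x;
    dg = g_new - g;
    % The three beta formulas (Eqs. 9.61-9.63):
    beta_HS = (dg'*g_new)/(dg'*p);      % Hestenes-Stiefel
    beta_FR = (g_new'*g_new)/(g'*g);    % Fletcher-Reeves
    beta_PR = (dg'*g_new)/(g'*g);       % Polak-Ribiere
    fprintf('x_%d = [%.4f; %.4f]  beta_HS=%g  beta_FR=%g  beta_PR=%g\n', ...
            k, x(1), x(2), beta_HS, beta_FR, beta_PR);
    p = -g_new + beta_HS*p;             % next search direction
    g = g_new;
end

The script reproduces $\mathbf{x}_1 = [3.6;\, -19.8]$ and $\mathbf{x}_2 = [0;\, 0]$ (up to round-off), and all three formulas give $\beta_1 = 324$. That agreement is expected: for a quadratic function with exact line searches the Hestenes-Stiefel, Fletcher-Reeves, and Polak-Ribiere forms are equivalent, so the three methods take identical steps.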
4. Given a set of vectors $\{\mathbf{p}_0, \mathbf{p}_1, \ldots, \mathbf{p}_{n-1}\}$ that are conjugate with respect to the Hessian matrix $A$, show that these vectors are independent. Explain each step of your development.

A. We are given that the vectors $\mathbf{p}_0, \mathbf{p}_1, \ldots, \mathbf{p}_{n-1}$ are conjugate with respect to the Hessian matrix $A$. If these vectors were dependent, then it would have to be true that

$$\sum_{j=0}^{n-1} a_j \mathbf{p}_j = \mathbf{0}$$

for some set of constants $a_0, a_1, \ldots, a_{n-1}$, at least one of which is nonzero. If we multiply both sides of this equation by $\mathbf{p}_k^T A$, we obtain

$$\mathbf{p}_k^T A \sum_{j=0}^{n-1} a_j \mathbf{p}_j = \sum_{j=0}^{n-1} a_j \mathbf{p}_k^T A \mathbf{p}_j = a_k \mathbf{p}_k^T A \mathbf{p}_k = 0,$$

where the second equality comes from the definition of conjugate vectors,

$$\mathbf{p}_k^T A \mathbf{p}_j = 0, \quad k \neq j.$$

If $A$ is positive definite (a unique strong minimum exists), then $\mathbf{p}_k^T A \mathbf{p}_k$ must be strictly positive. This implies that $a_k$ must be zero for every $k$, which contradicts the assumption that at least one of the constants is nonzero. Therefore conjugate directions must be independent.
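As a small numerical illustration of this result (my own addition; the specific matrix and vectors are assumed, with $A$ taken from Problem 1), the MATLAB lines below build a direction conjugate to a given one by a Gram-Schmidt-style conjugation step, then confirm both conjugacy and linear independence.

% Numerical illustration of Problem 4 (assumed example, not part of the original solution).
A  = [6 -2; -2 6];                 % positive definite Hessian from Problem 1
p0 = [1; 0];                       % arbitrary first direction
v  = [0; 1];                       % second trial vector
% Make p1 conjugate to p0 with respect to A:
p1 = v - (p0'*A*v)/(p0'*A*p0) * p0;

conjugacy    = p0'*A*p1            % should be (numerically) zero
independence = rank([p0 p1])       % should be 2, i.e. full rank

Here conjugacy evaluates to 0 and independence to 2 (full rank), consistent with the proof: because $A$ is positive definite, no nontrivial combination of $A$-conjugate directions can vanish.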