V I L L A N O V A U N I V E R S I T Y
Department of Electrical & Computer Engineering
ECE 8412 Introduction to Neural Networks
Homework 08 Due 15 November 2005
Chapter 9
Name: Santhosh Kumar Kokala
Complete the following and submit your work to me as a hard copy.
You can also fax your solutions to 610-519-4436.
Use MATLAB to determine the results of matrix algebra.
1. E9.2 (Skip part vi.)
A. (i) Sketch a contour plot of this function.
To sketch the contour plot we first need to find the Hessian matrix. For quadratic functions we can do
this by comparing it to the standard form.
$$F(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T A\mathbf{x} + \mathbf{d}^T\mathbf{x} + c = \frac{1}{2}\mathbf{x}^T \begin{bmatrix} 6 & -2 \\ -2 & 6 \end{bmatrix} \mathbf{x} + \begin{bmatrix} -1 & -1 \end{bmatrix} \mathbf{x}$$
From the above equation the Hessian matrix is
6 - 2
 2 F ( x)  A  

- 2 6
The eigenvalues and eigenvectors of this matrix are
$$\lambda_1 = 4,\quad \mathbf{z}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad \lambda_2 = 8,\quad \mathbf{z}_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$
We know that the function contours are elliptical. The maximum curvature of $F(\mathbf{x})$ is in the direction of $\mathbf{z}_2$, since $\lambda_2$ is larger than $\lambda_1$, and the minimum curvature is in the direction of $\mathbf{z}_1$ (the long axis of the ellipses).
Next we need to find the center of the contours (the stationary point). This occurs when the gradient is
equal to zero.
6 - 2 
 1 0
F (x)  Ax  d  
x    

- 2 6 
 1 0
Therefore
1
6 - 2  1 0.25
x  
  

- 2 6  1 0.25
*
The contours will be elliptical, centered at x* , with long axis in the direction of z1.
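As a quick check of the matrix algebra (a MATLAB sketch; the variable names are my own), the eigen-decomposition and the stationary point can be computed directly:
% F(x) = 0.5*x'*A*x + d'*x with the quadratic and linear terms below
A = [6 -2; -2 6];
d = [-1; -1];
[Z, L] = eig(A)   % columns of Z are the eigenvectors; diag(L) = [4 8]
xstar  = -A\d     % stationary point, solves A*x + d = 0, giving [0.25; 0.25]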
(ii) Sketch the trajectory of the steepest descent algorithm on the contour plot of part (i), if the initial guess is $\mathbf{x}_0 = \begin{bmatrix} 0 & 0 \end{bmatrix}^T$. Assume a very small learning rate is used.
Applying the steepest descent algorithm to this function, the gradient is
$$\nabla F(\mathbf{x}) = A\mathbf{x} + \mathbf{d} = \begin{bmatrix} 6 & -2 \\ -2 & 6 \end{bmatrix}\mathbf{x} + \begin{bmatrix} -1 \\ -1 \end{bmatrix}$$
If we evaluate the gradient at the initial guess we find
$$\mathbf{g}_0 = \nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_0} = \begin{bmatrix} -1 \\ -1 \end{bmatrix}$$
Assume a fixed learning rate of $\alpha = 0.01$. The first iteration of the steepest descent algorithm would be
$$\mathbf{x}_1 = \mathbf{x}_0 - \alpha\mathbf{g}_0 = \begin{bmatrix} 0 \\ 0 \end{bmatrix} - 0.01\begin{bmatrix} -1 \\ -1 \end{bmatrix} = \begin{bmatrix} 0.01 \\ 0.01 \end{bmatrix}$$
The second iteration of steepest descent produces
$$\mathbf{g}_1 = \nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_1} = \begin{bmatrix} -0.96 \\ -0.96 \end{bmatrix}$$
$$\mathbf{x}_2 = \mathbf{x}_1 - \alpha\mathbf{g}_1 = \begin{bmatrix} 0.01 \\ 0.01 \end{bmatrix} - 0.01\begin{bmatrix} -0.96 \\ -0.96 \end{bmatrix} = \begin{bmatrix} 0.0196 \\ 0.0196 \end{bmatrix}$$
If we continue the iterations, the trajectory moves along the $\mathbf{z}_1$ direction (the long axis of the ellipses) straight toward the minimum at $\mathbf{x}^* = \begin{bmatrix} 0.25 & 0.25 \end{bmatrix}^T$.
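A short MATLAB script (my own sketch, with an assumed small learning rate of 0.01) generates this trajectory numerically:
% Steepest descent trajectory from x0 = [0; 0] with a small learning rate
A = [6 -2; -2 6];  d = [-1; -1];
alpha = 0.01;
x = [0; 0];
traj = x;
for k = 1:500
    g = A*x + d;          % gradient of the quadratic
    x = x - alpha*g;      % steepest descent update
    traj = [traj x];      % store the iterate
end
plot(traj(1,:), traj(2,:), '-')   % runs along z1 toward x* = [0.25; 0.25]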
(iii) Perform two iterations of steepest descent with learning rate $\alpha = 0.1$.
Applying the steepest descent algorithm to this function, the gradient is
$$\nabla F(\mathbf{x}) = A\mathbf{x} + \mathbf{d} = \begin{bmatrix} 6 & -2 \\ -2 & 6 \end{bmatrix}\mathbf{x} + \begin{bmatrix} -1 \\ -1 \end{bmatrix}$$
If we evaluate the gradient at the initial guess we find
$$\mathbf{g}_0 = \nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_0} = \begin{bmatrix} -1 \\ -1 \end{bmatrix}$$
Assume a fixed learning rate of $\alpha = 0.1$. The first iteration of the steepest descent algorithm would be
$$\mathbf{x}_1 = \mathbf{x}_0 - \alpha\mathbf{g}_0 = \begin{bmatrix} 0 \\ 0 \end{bmatrix} - 0.1\begin{bmatrix} -1 \\ -1 \end{bmatrix} = \begin{bmatrix} 0.1 \\ 0.1 \end{bmatrix}$$
The second iteration of steepest descent produces
$$\mathbf{g}_1 = \nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_1} = \begin{bmatrix} -0.6 \\ -0.6 \end{bmatrix}$$
$$\mathbf{x}_2 = \mathbf{x}_1 - \alpha\mathbf{g}_1 = \begin{bmatrix} 0.1 \\ 0.1 \end{bmatrix} - 0.1\begin{bmatrix} -0.6 \\ -0.6 \end{bmatrix} = \begin{bmatrix} 0.16 \\ 0.16 \end{bmatrix}$$
(iv) What is the maximum stable learning rate?
The maximum stable learning rate is given by
$$\alpha < \frac{2}{\lambda_{max}} = \frac{2}{8} = 0.25$$
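In MATLAB this is a one-line check (assuming the Hessian A defined earlier):
A = [6 -2; -2 6];
alpha_max = 2/max(eig(A))   % 0.25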
(v) What is the maximum stable learning rate for the initial guess given in part (ii)?
The learning rate that minimizes $F(\mathbf{x})$ along the first search direction $\mathbf{p}_0 = -\mathbf{g}_0 = \begin{bmatrix} 1 & 1 \end{bmatrix}^T$ is
$$\alpha_0 = -\frac{\mathbf{g}_0^T\mathbf{p}_0}{\mathbf{p}_0^T A\mathbf{p}_0} = -\frac{\begin{bmatrix} -1 & -1 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix}}{\begin{bmatrix} 1 & 1 \end{bmatrix}\begin{bmatrix} 6 & -2 \\ -2 & 6 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix}} = \frac{2}{8} = 0.25$$
For the maximum stable learning rate, note that $\mathbf{x}_0 - \mathbf{x}^* = \begin{bmatrix} -0.25 & -0.25 \end{bmatrix}^T$ lies entirely along the eigenvector $\mathbf{z}_1$, so the trajectory stays along $\mathbf{z}_1$ and only $\lambda_1 = 4$ governs stability:
$$\alpha < \frac{2}{\lambda_1} = \frac{2}{4} = 0.5$$
2. Consider the following function
$$F(\mathbf{x}) = \left(1 + (x_1 + x_2 - 5)^2\right)\left(1 + (3x_1 - 2x_2)^2\right)$$
(i) Perform one iteration of Newton's method, starting from the initial guess $\mathbf{x}_0 = \begin{bmatrix} 10 & 10 \end{bmatrix}^T$.
The gradient and Hessian are
$$\nabla F(\mathbf{x}) = \begin{bmatrix} 6\left(1 + (x_1 + x_2 - 5)^2\right)(3x_1 - 2x_2) + 2(x_1 + x_2 - 5)\left(1 + (3x_1 - 2x_2)^2\right) \\ -4\left(1 + (x_1 + x_2 - 5)^2\right)(3x_1 - 2x_2) + 2(x_1 + x_2 - 5)\left(1 + (3x_1 - 2x_2)^2\right) \end{bmatrix}$$
$$\nabla^2 F(\mathbf{x}) = \begin{bmatrix} 18\left(1 + (x_1 + x_2 - 5)^2\right) + 24(x_1 + x_2 - 5)(3x_1 - 2x_2) + 2\left(1 + (3x_1 - 2x_2)^2\right) & -12\left(1 + (x_1 + x_2 - 5)^2\right) + 4(x_1 + x_2 - 5)(3x_1 - 2x_2) + 2\left(1 + (3x_1 - 2x_2)^2\right) \\ -12\left(1 + (x_1 + x_2 - 5)^2\right) + 4(x_1 + x_2 - 5)(3x_1 - 2x_2) + 2\left(1 + (3x_1 - 2x_2)^2\right) & 8\left(1 + (x_1 + x_2 - 5)^2\right) - 16(x_1 + x_2 - 5)(3x_1 - 2x_2) + 2\left(1 + (3x_1 - 2x_2)^2\right) \end{bmatrix}$$

10
If we start from the initial guess x 0   
10
16590 
g 0  F (x) x x  

0
 6010
and
A 0   2 F ( x)
xx0
 1910 
 390
7870

 1910
Therefore the first iteration of Newton’s method will be
1
 1910  16590  7.3280

 390  6010 7.6759
10 7870
x1  x 0  A g 0     
10  1910
1
0
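The symbolic gradient, Hessian, and Newton step can be verified in MATLAB (a sketch; the variable names g, H, A0, and xnew are my own):
% One Newton step for F(x) = (1+(x1+x2-5)^2)*(1+(3*x1-2*x2)^2) from x0 = [10; 10]
syms x1 x2
F  = (1 + (x1 + x2 - 5)^2)*(1 + (3*x1 - 2*x2)^2);
g  = gradient(F, [x1 x2]);             % symbolic gradient
H  = hessian(F, [x1 x2]);              % symbolic Hessian
x0 = [10; 10];
g0 = double(subs(g, [x1 x2], x0.'));   % [16590; -6010]
A0 = double(subs(H, [x1 x2], x0.'));   % [7870 -1910; -1910 -390]
xnew = x0 - A0\g0                      % approximately [7.3280; 7.6759]
Changing x0 to [2; 2] reproduces the step in part (ii) below, x1 = [2.4; 2.6].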
(ii) Repeat part (i), starting from the initial guess $\mathbf{x}_0 = \begin{bmatrix} 2 & 2 \end{bmatrix}^T$.
 2
If we start from the initial guess x 0   
 2
14 
g 0  F (x) xx  

0
 26
and
A 0   2 F ( x)
xx0
 2

 22
 22
58 
Therefore the first iteration of Newton’s method will be
 2   2
x1  x 0  A g 0     
2  22
1
0
-6-
1
 22 14  2.4

58   26 2.6
(iii) Find the minimum of the function and compare with your results from the previous two parts.
MATLAB code:
syms x1 x2
fx1 = 6*(1+(x1+x2-5)^2)*(3*x1-2*x2) + 2*(x1+x2-5)*(1+(3*x1-2*x2)^2);
fx2 = -4*(1+(x1+x2-5)^2)*(3*x1-2*x2) + 2*(x1+x2-5)*(1+(3*x1-2*x2)^2);
[x1, x2] = solve(fx1 == 0, fx2 == 0, x1, x2)
Output:
x1 =
2
2+3/5*i
2-3/5*i
2+1/5*i
2-1/5*i
x2 =
3
3+2/5*i
3-2/5*i
3+4/5*i
3-4/5*i
The only real stationary point is $\mathbf{x}^* = \begin{bmatrix} 2 & 3 \end{bmatrix}^T$, which is the minimum of the function. The Newton step taken from $\mathbf{x}_0 = \begin{bmatrix} 2 & 2 \end{bmatrix}^T$ (giving $\mathbf{x}_1 = \begin{bmatrix} 2.4 & 2.6 \end{bmatrix}^T$) moves toward this minimum, whereas the step taken from $\mathbf{x}_0 = \begin{bmatrix} 10 & 10 \end{bmatrix}^T$ does not.
3. For the function given in E8.3(i), perform two iterations of the conjugate gradient algorithm. Do this for each of the three methods given in Equations 9.61, 9.62, and 9.63.
A. The function is
$$F(\mathbf{x}) = \frac{7}{2}x_1^2 - 6x_1x_2 - x_2^2$$
Let the initial guess be $\mathbf{x}_0 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$; the Hessian matrix is $A = \begin{bmatrix} 7 & -6 \\ -6 & -2 \end{bmatrix}$.
The gradient at $\mathbf{x}_0$ is
$$\mathbf{g}_0 = \nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_0} = \begin{bmatrix} 1 \\ -8 \end{bmatrix}$$
The first search direction is then
$$\mathbf{p}_0 = -\mathbf{g}_0 = \begin{bmatrix} -1 \\ 8 \end{bmatrix}$$
To minimize along a line, for a quadratic function we can use
$$\alpha_0 = -\frac{\mathbf{g}_0^T\mathbf{p}_0}{\mathbf{p}_0^T A\mathbf{p}_0} = -\frac{\begin{bmatrix} 1 & -8 \end{bmatrix}\begin{bmatrix} -1 \\ 8 \end{bmatrix}}{\begin{bmatrix} -1 & 8 \end{bmatrix}\begin{bmatrix} 7 & -6 \\ -6 & -2 \end{bmatrix}\begin{bmatrix} -1 \\ 8 \end{bmatrix}} = -\frac{-65}{-25} = -2.6$$
Therefore the first iteration of conjugate gradient will be
$$\mathbf{x}_1 = \mathbf{x}_0 + \alpha_0\mathbf{p}_0 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} - 2.6\begin{bmatrix} -1 \\ 8 \end{bmatrix} = \begin{bmatrix} 3.6 \\ -19.8 \end{bmatrix}$$
Now we need to find the second search direction using the Hestenes-Stiefel method. This requires the gradient at $\mathbf{x}_1$:
$$\mathbf{g}_1 = \nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_1} = \begin{bmatrix} 144 \\ 18 \end{bmatrix}$$
$$\Delta\mathbf{g}_0 = \mathbf{g}_1 - \mathbf{g}_0 = \begin{bmatrix} 144 \\ 18 \end{bmatrix} - \begin{bmatrix} 1 \\ -8 \end{bmatrix} = \begin{bmatrix} 143 \\ 26 \end{bmatrix}$$
$$\beta_1 = \frac{\Delta\mathbf{g}_0^T\mathbf{g}_1}{\Delta\mathbf{g}_0^T\mathbf{p}_0} = \frac{\begin{bmatrix} 143 & 26 \end{bmatrix}\begin{bmatrix} 144 \\ 18 \end{bmatrix}}{\begin{bmatrix} 143 & 26 \end{bmatrix}\begin{bmatrix} -1 \\ 8 \end{bmatrix}} = \frac{21060}{65} = 324$$
$$\mathbf{p}_1 = -\mathbf{g}_1 + \beta_1\mathbf{p}_0 = -\begin{bmatrix} 144 \\ 18 \end{bmatrix} + 324\begin{bmatrix} -1 \\ 8 \end{bmatrix} = \begin{bmatrix} -468 \\ 2574 \end{bmatrix}$$
$$\alpha_1 = -\frac{\mathbf{g}_1^T\mathbf{p}_1}{\mathbf{p}_1^T A\mathbf{p}_1} = -\frac{\begin{bmatrix} 144 & 18 \end{bmatrix}\begin{bmatrix} -468 \\ 2574 \end{bmatrix}}{\begin{bmatrix} -468 & 2574 \end{bmatrix}\begin{bmatrix} 7 & -6 \\ -6 & -2 \end{bmatrix}\begin{bmatrix} -468 \\ 2574 \end{bmatrix}} = \frac{21060}{2737800} = \frac{1}{130}$$
The second step of conjugate gradient is therefore
$$\mathbf{x}_2 = \mathbf{x}_1 + \alpha_1\mathbf{p}_1 = \begin{bmatrix} 3.6 \\ -19.8 \end{bmatrix} + \frac{1}{130}\begin{bmatrix} -468 \\ 2574 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
Calculating the second search direction using Fletcher-Reeves:
$$\beta_1 = \frac{\mathbf{g}_1^T\mathbf{g}_1}{\mathbf{g}_0^T\mathbf{g}_0} = \frac{\begin{bmatrix} 144 & 18 \end{bmatrix}\begin{bmatrix} 144 \\ 18 \end{bmatrix}}{\begin{bmatrix} 1 & -8 \end{bmatrix}\begin{bmatrix} 1 \\ -8 \end{bmatrix}} = \frac{21060}{65} = 324$$
$$\mathbf{p}_1 = -\mathbf{g}_1 + \beta_1\mathbf{p}_0 = -\begin{bmatrix} 144 \\ 18 \end{bmatrix} + 324\begin{bmatrix} -1 \\ 8 \end{bmatrix} = \begin{bmatrix} -468 \\ 2574 \end{bmatrix}$$
$$\alpha_1 = -\frac{\mathbf{g}_1^T\mathbf{p}_1}{\mathbf{p}_1^T A\mathbf{p}_1} = \frac{1}{130}$$
$$\mathbf{x}_2 = \mathbf{x}_1 + \alpha_1\mathbf{p}_1 = \begin{bmatrix} 3.6 \\ -19.8 \end{bmatrix} + \frac{1}{130}\begin{bmatrix} -468 \\ 2574 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
Calculating the second search direction using Polak-Ribiére:
$$\Delta\mathbf{g}_0 = \mathbf{g}_1 - \mathbf{g}_0 = \begin{bmatrix} 144 \\ 18 \end{bmatrix} - \begin{bmatrix} 1 \\ -8 \end{bmatrix} = \begin{bmatrix} 143 \\ 26 \end{bmatrix}$$
$$\beta_1 = \frac{\Delta\mathbf{g}_0^T\mathbf{g}_1}{\mathbf{g}_0^T\mathbf{g}_0} = \frac{\begin{bmatrix} 143 & 26 \end{bmatrix}\begin{bmatrix} 144 \\ 18 \end{bmatrix}}{\begin{bmatrix} 1 & -8 \end{bmatrix}\begin{bmatrix} 1 \\ -8 \end{bmatrix}} = \frac{21060}{65} = 324$$
$$\mathbf{p}_1 = -\mathbf{g}_1 + \beta_1\mathbf{p}_0 = \begin{bmatrix} -468 \\ 2574 \end{bmatrix}$$
$$\alpha_1 = -\frac{\mathbf{g}_1^T\mathbf{p}_1}{\mathbf{p}_1^T A\mathbf{p}_1} = \frac{1}{130}$$
$$\mathbf{x}_2 = \mathbf{x}_1 + \alpha_1\mathbf{p}_1 = \begin{bmatrix} 3.6 \\ -19.8 \end{bmatrix} + \frac{1}{130}\begin{bmatrix} -468 \\ 2574 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
All three methods give the same second search direction here, and the algorithm reaches the stationary point $\mathbf{x}_2 = \begin{bmatrix} 0 & 0 \end{bmatrix}^T$ in two iterations.
4. Given a set of vectors $\mathbf{p}_0, \mathbf{p}_1, \ldots, \mathbf{p}_{n-1}$ that are conjugate with respect to the Hessian matrix $A$, show that these vectors are independent. Explain each step of your development.
A. Suppose, to the contrary, that the vectors are linearly dependent. Then
$$\sum_{j=0}^{n-1} a_j\mathbf{p}_j = \mathbf{0}$$
for some set of constants $a_0, a_1, \ldots, a_{n-1}$, at least one of which is nonzero. If we multiply both sides of this equation by $\mathbf{p}_k^T A$, we obtain
$$\mathbf{p}_k^T A\sum_{j=0}^{n-1} a_j\mathbf{p}_j = \sum_{j=0}^{n-1} a_j\mathbf{p}_k^T A\mathbf{p}_j = a_k\mathbf{p}_k^T A\mathbf{p}_k = 0,$$
where the second equality comes from the definition of conjugate vectors:
$$\mathbf{p}_k^T A\mathbf{p}_j = 0, \quad k \ne j.$$
If $A$ is positive definite (a unique strong minimum exists), then $\mathbf{p}_k^T A\mathbf{p}_k$ must be strictly positive. This implies that $a_k$ must be zero for every $k$, which contradicts the assumption that at least one of the constants is nonzero. Therefore the conjugate vectors must be independent.
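A small numerical illustration (my own construction, not part of the proof): the eigenvectors of a symmetric matrix are conjugate with respect to it, and MATLAB confirms they are independent:
A = [6 -2; -2 6];
[Z, ~] = eig(A);
p1 = Z(:,1);  p2 = Z(:,2);
p1.'*A*p2          % essentially zero, so p1 and p2 are conjugate with respect to A
rank([p1 p2])      % equals 2, so the vectors are linearly independent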