CMSC764
Homework 3: Due Mon Mar 21
1. Write a method with signature

function [x_sol, res] = grad_descent(f, gradf, x0)
The inputs f and gradf are function handles. The function f : R^N → R is an arbitrary objective
function, and gradf : R^N → R^N is its gradient. The method should minimize f using gradient
descent, and terminate when the gradient of f is small. I suggest stopping when
‖∇f(x^k)‖ < ‖∇f(x^0)‖ · tol
where x^0 is an initial guess and tol is a small tolerance parameter (a typical value would be
10^{-4}).
Use a backtracking line search to guarantee convergence. The stepsize should be monotonically
decreasing. Each iteration should begin by trying the stepsize that was used on the previous
iteration, and then backtrack until the Armijo condition holds:
f(x^{k+1}) ≤ f(x^k) + α⟨x^{k+1} − x^k, ∇f(x^k)⟩,
where α ∈ (0, 1), and α = 0.1 is suggested.
The function returns the solution vector x_sol, and a vector res containing the norm of the
residual (i.e., the gradient in this case) at each iteration.
You can obtain the initial stepsize by estimating a Lipschitz constant for f using the formula:
L = ‖∇f(x) − ∇f(y)‖ / ‖x − y‖,
where x = x0 and y is obtained by adding a small random perturbation to x. The initial stepsize
is then τ = 2/L. Use a vector of zeros as the initial guess x0.
Turn in your implementation with your homework submission.
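For reference, here is a minimal MATLAB sketch of one way to structure this method. The constants (perturbation size, backtracking factor of 1/2, iteration cap) and variable names are illustrative choices, not part of the assignment:

function [x_sol, res] = grad_descent(f, gradf, x0)
% Gradient descent with a monotone backtracking (Armijo) line search.
tol   = 1e-4;     % relative stopping tolerance
alpha = 0.1;      % Armijo parameter
maxit = 10000;    % iteration cap (illustrative)

% Estimate a Lipschitz constant from two nearby gradients
y   = x0 + 0.01*randn(size(x0));
L   = norm(gradf(x0) - gradf(y)) / norm(x0 - y);
tau = 2/L;        % initial stepsize

x    = x0;
g    = gradf(x);
res  = norm(g);   % residual history
stop = tol*norm(g);

for k = 1:maxit
    % Start from the previous stepsize, then backtrack until Armijo holds
    x_new = x - tau*g;
    while f(x_new) > f(x) + alpha*dot(x_new - x, g)
        tau   = tau/2;
        x_new = x - tau*g;
    end
    x = x_new;
    g = gradf(x);
    res(end+1) = norm(g);            %#ok<AGROW>
    if res(end) < stop, break; end
end
x_sol = x;
end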
2. Modify your solution from Question 1 to create the new function
function [x_sol, res] = grad_descent_BB(f, gradf, x0)
This method is the same as the method from problem 1. However, instead of using a monotonically decreasing stepsize, begin each line search with a Barzilai-Borwein stepsize of magnitude

τ = ⟨x^{k+1} − x^k, x^{k+1} − x^k⟩ / ⟨x^{k+1} − x^k, ∇f(x^{k+1}) − ∇f(x^k)⟩.
Turn in your implementation with your homework submission.
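Only the start of each line search changes relative to the sketch in problem 1. One way to write the update, assuming x and g hold the current iterate and gradient and x_old, g_old the previous ones:

dx  = x - x_old;
dg  = g - g_old;
tau = abs(dot(dx, dx) / dot(dx, dg));   % BB stepsize magnitude

Backtracking then proceeds from this tau exactly as before, so the accepted stepsize is no longer monotone across iterations.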
3. Modify your solution from Question 1 to create the new function
function [x_sol, res] = grad_descent_nesterov(f, gradf, x0)
This function should implement Nesterov’s method:
x^k = y^k − τ∇f(y^k)
δ^{k+1} = (1 + √(1 + 4(δ^k)^2)) / 2
y^{k+1} = x^k + ((δ^k − 1)/δ^{k+1})(x^k − x^{k−1}).
The method is initialized with x^0 = y^1 and δ^1 = 1, and the first iteration has index k = 1.
Use the monotone line search from Question 1. For the Nesterov problem, the backtracking
condition is

f(x^k) ≤ f(y^k) + α⟨x^k − y^k, ∇f(y^k)⟩.
Unlike standard gradient methods, Nesterov's method requires α ∈ [1/2, 1). I suggest using
α = 1/2.
Turn in your implementation with your homework submission.
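A possible MATLAB sketch of the full iteration, reusing the Lipschitz-based initial stepsize from problem 1 (names and constants are again illustrative):

function [x_sol, res] = grad_descent_nesterov(f, gradf, x0)
% Sketch of Nesterov's method with a monotone Armijo line search.
tol   = 1e-4;
alpha = 0.5;                        % Armijo parameter in [1/2, 1)
maxit = 10000;

z   = x0 + 0.01*randn(size(x0));    % Lipschitz estimate, as in problem 1
L   = norm(gradf(x0) - gradf(z)) / norm(x0 - z);
tau = 2/L;

x_prev = x0;  y = x0;  delta = 1;   % x^0 = y^1 and delta^1 = 1
res  = norm(gradf(x0));
stop = tol*res;

for k = 1:maxit
    gy = gradf(y);
    x  = y - tau*gy;
    % Backtrack until the Nesterov form of the Armijo condition holds
    while f(x) > f(y) + alpha*dot(x - y, gy)
        tau = tau/2;
        x   = y - tau*gy;
    end
    delta_new = (1 + sqrt(1 + 4*delta^2))/2;
    y = x + ((delta - 1)/delta_new)*(x - x_prev);
    x_prev = x;  delta = delta_new;
    res(end+1) = norm(gradf(x));     %#ok<AGROW>
    if res(end) < stop, break; end
end
x_sol = x;
end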
4. Test your three solvers using the logistic regression classification problem from homework
1. Create a sample classification problem with 200 feature vectors and 20 features per vector
using the method create_classification_problem. Then use your solvers to minimize
logreg_objective using the gradient function logreg_grad. Plot the residuals for the different
methods using a logarithmic y-axis. For example, you could call
[D, c] = create_classification_problem(200, 20, 1);
[x_sol, res] = grad_descent(@(x) logreg_objective(x, D, c), ...
                            @(x) logreg_grad(x, D, c), x0);
semilogy(res);
For this experiment, use a stopping condition with tol = 10^{-8} to verify that the
methods converge to a high degree of precision.
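One way to overlay the three residual curves, assuming the signatures sketched above and tol set to 10^{-8} inside each solver (the labels are just suggestions):

x0 = zeros(20, 1);                  % zero initial guess, 20 features
[D, c] = create_classification_problem(200, 20, 1);
f     = @(x) logreg_objective(x, D, c);
gradf = @(x) logreg_grad(x, D, c);

[~, res_gd]  = grad_descent(f, gradf, x0);
[~, res_bb]  = grad_descent_BB(f, gradf, x0);
[~, res_nes] = grad_descent_nesterov(f, gradf, x0);

semilogy(res_gd); hold on;
semilogy(res_bb);
semilogy(res_nes); hold off;
legend('gradient descent', 'Barzilai-Borwein', 'Nesterov');
xlabel('iteration'); ylabel('residual norm');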
5. Answer the following questions. Keep your answers short (but complete).
(a) Why is the stopping condition in problem 1 based on the residual? Why don't we just
terminate when the objective function gets small?
(b) Why is the stopping condition suggested in problem 1 better than just terminating when
the residual is small, i.e., when ‖∇f(x^k)‖ < tol?
(c) Which gradient method was best? Does convergence speed depend on the problem
dimension? Does it depend on how linearly separable the data are? The answers to these
questions may vary depending on how you generate data.
(d) What happens if you don't use the Armijo line search? Does the method still converge?
Include a plot to verify this.
Create a short writeup (pdf) containing your plot from problem 4 and your answers
to the questions in problem 5.