
CMSC764 Homework 3: Due Mon Mar 21

1. Write a method with the signature

    function [x_sol, res] = grad_descent(f, gradf, x0)

The inputs f and gradf are function handles. The function f: Rᴺ → R is an arbitrary objective function, and gradf: Rᴺ → Rᴺ is its gradient. The method should minimize f using gradient descent, and terminate when the gradient of f is small. I suggest stopping when

    ‖∇f(x_k)‖ < ‖∇f(x_0)‖ · tol,

where x_0 is an initial guess and tol is a small tolerance parameter (a typical value would be 10⁻⁴). Use a backtracking line search to guarantee convergence. The stepsize should be monotonically decreasing: each iteration should begin by trying the stepsize that was used on the previous iteration, and then backtrack until the Armijo condition holds:

    f(x_{k+1}) ≤ f(x_k) + α⟨x_{k+1} − x_k, ∇f(x_k)⟩,

where α ∈ (0, 1); α = 0.1 is suggested. The function returns the solution vector x_sol and a vector res containing the norm of the residual (i.e., the gradient in this case) at each iteration. You can obtain the initial stepsize by estimating a Lipschitz constant for ∇f using the formula

    L = ‖∇f(x) − ∇f(y)‖ / ‖x − y‖,

where x = x_0 and y is obtained by adding a small random perturbation to x. The initial stepsize is then τ = 2/L. Use a vector of zeros as the initial guess x_0. Turn in your implementation with your homework submission.

2. Modify your solution from Question 1 to create the new function

    function [x_sol, res] = grad_descent_bb(f, gradf, x0)

This method is the same as the method from problem 1; however, instead of using a monotonically decreasing stepsize, begin each line search with a Barzilai-Borwein stepsize of magnitude

    τ = ⟨x_{k+1} − x_k, x_{k+1} − x_k⟩ / ⟨x_{k+1} − x_k, ∇f(x_{k+1}) − ∇f(x_k)⟩.

Turn in your implementation with your homework submission.
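As a reference sketch for problems 1 and 2 — written in Python/NumPy rather than the MATLAB signatures above, with all names illustrative and not part of the assignment — the backtracking gradient descent and the Barzilai-Borwein stepsize might look like:

```python
import numpy as np

def grad_descent(f, gradf, x0, tol=1e-4, alpha=0.1, max_iter=10000):
    """Gradient descent with a monotone backtracking (Armijo) line search."""
    x = np.asarray(x0, dtype=float)
    g = gradf(x)
    res = [np.linalg.norm(g)]          # residual = gradient norm per iteration

    # Initial stepsize from a Lipschitz estimate L = |grad f(x) - grad f(y)| / |x - y|,
    # with y a small random perturbation of x; then tau = 2/L.
    y = x + 1e-3 * np.random.randn(*x.shape)
    L = np.linalg.norm(gradf(x) - gradf(y)) / np.linalg.norm(x - y)
    tau = 2.0 / L

    for _ in range(max_iter):
        # Start from the previous stepsize and shrink until the Armijo
        # condition holds, so tau is monotonically decreasing.
        while f(x - tau * g) > f(x) + alpha * np.dot(-tau * g, g):
            tau *= 0.5
        x = x - tau * g
        g = gradf(x)
        res.append(np.linalg.norm(g))
        if res[-1] < tol * res[0]:     # relative stopping condition
            break
    return x, np.array(res)

def bb_stepsize(x_new, x_old, g_new, g_old):
    """Barzilai-Borwein stepsize used to begin each line search in problem 2."""
    s = x_new - x_old
    v = g_new - g_old
    return np.dot(s, s) / np.dot(s, v)
```

For problem 2 the only change is that each line search starts from `bb_stepsize(...)` rather than the previous iteration's τ.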
3. Modify your solution from Question 1 to create the new function

    function [x_sol, res] = grad_descent_nesterov(f, gradf, x0)

This function should implement Nesterov's method:

    x^k = y^k − τ∇f(y^k)
    δ^{k+1} = (1 + √(1 + 4(δ^k)²)) / 2
    y^{k+1} = x^k + ((δ^k − 1)/δ^{k+1}) (x^k − x^{k−1}).

The method is initialized with x^0 = y^1 and δ^1 = 1, and the first iteration has index k = 1. Use the monotone line search from Question 1. For the Nesterov problem, the backtracking condition is

    f(x^k) ≤ f(y^k) + α⟨x^k − y^k, ∇f(y^k)⟩.

Unlike standard gradient methods, Nesterov's method requires α ∈ [1/2, 1); I suggest using α = 1/2. Turn in your implementation with your homework submission.

4. Test your three solvers using the logistic regression classification problem from homework 1. Create a sample classification problem with 200 feature vectors and 20 features per vector using the method create_classification_problem. Then use your solvers to minimize logreg_objective using the gradient function logreg_grad. Plot the residuals for the different methods using a logarithmic y-axis. For example, you could call

    [D, c] = create_classification_problem(200, 20, 1);
    [x_sol, res] = grad_descent(@(x) logreg_objective(x, D, c), @(x) logreg_grad(x, D, c), x0);
    semilogy(res);

For this experiment, use a stopping condition with tol = 10⁻⁸ to verify that the methods converge to a high degree of precision.

5. Answer the following questions. Keep your answers short (but complete).

(a) Why is the stopping condition in problem 1 based on the residual? Why don't we just terminate when the objective function gets small?

(b) Why is the stopping condition suggested in problem 1 better than just terminating when the residual is small, i.e., when ‖∇f(x_k)‖ < tol?

(c) Which gradient method was best? Does convergence speed depend on the problem dimension?
Does it depend on how linearly separable the data are? The answers to these questions may vary depending on how you generate the data.

(d) What happens if you don't use the Armijo line search? Does the method still converge? Include a plot to verify this.

Create a short writeup (pdf) containing your plot from problem 4 and your answers to the questions in problem 5.
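For reference, the Nesterov iteration from problem 3 can be sketched as follows — again in Python/NumPy rather than MATLAB, with illustrative names, and reusing the Lipschitz-based initial stepsize from problem 1:

```python
import numpy as np

def grad_descent_nesterov(f, gradf, x0, tol=1e-4, alpha=0.5, max_iter=10000):
    """Nesterov's accelerated method with a monotone backtracking line search."""
    x_prev = np.asarray(x0, dtype=float)
    y = x_prev.copy()                  # x^0 = y^1
    delta = 1.0                        # delta^1 = 1

    # Initial stepsize tau = 2/L from the same Lipschitz estimate as problem 1.
    z = y + 1e-3 * np.random.randn(*y.shape)
    L = np.linalg.norm(gradf(y) - gradf(z)) / np.linalg.norm(y - z)
    tau = 2.0 / L

    res = [np.linalg.norm(gradf(y))]
    for _ in range(max_iter):
        gy = gradf(y)
        # Backtrack until f(x) <= f(y) + alpha * <x - y, grad f(y)>.
        while f(y - tau * gy) > f(y) + alpha * np.dot(-tau * gy, gy):
            tau *= 0.5
        x = y - tau * gy
        delta_new = (1.0 + np.sqrt(1.0 + 4.0 * delta ** 2)) / 2.0
        y = x + ((delta - 1.0) / delta_new) * (x - x_prev)
        x_prev, delta = x, delta_new
        res.append(np.linalg.norm(gradf(x)))
        if res[-1] < tol * res[0]:     # same relative stopping condition
            break
    return x, np.array(res)
```

The momentum weight (δ^k − 1)/δ^{k+1} starts at zero (since δ^1 = 1), so the first step is an ordinary gradient step.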