Wolfe's Example and the Zigzag Phenomenon

Harvey J. Greenberg
University of Colorado at Denver
http://www.cudenver.edu/hgreenbe/ (url changed December 1, 1998)

June 18, 1996

This is a detailed analysis of Wolfe's example [1], showing how the zigzag phenomenon can cause non-convergence of a natural extension of Cauchy's steepest ascent, called the Truncated Gradient Algorithm. First the algorithm is defined; then Wolfe's example is presented.

Truncated Gradient Algorithm

We seek to maximize $f(x)$ on a box, $[a,b]$, and we assume $f \in C^1$ on $(a-\varepsilon,\, b+\varepsilon)$ for some $\varepsilon > 0$. Without the box restriction, Cauchy's steepest ascent uses the iteration
$$x^{k+1} = x^k + s_k \nabla f(x^k),$$
where $s_k$ is chosen by the usual optimal line search. Under mild assumptions this converges to a stationary point, say $x^\infty$, where $\nabla f(x^\infty) = 0$. This is a maximum if $f$ is concave.

A natural extension is to project the gradient if a coordinate is at a bound value and the sign of the associated partial derivative is such that the iterate would violate its bound for any positive step size. This is called the truncated gradient:
$$\nabla^{+} f(x)_j = \begin{cases} \max\{0,\, \partial f(x)/\partial x_j\} & \text{if } x_j = a_j, \\ \partial f(x)/\partial x_j & \text{if } a_j < x_j < b_j, \\ \min\{0,\, \partial f(x)/\partial x_j\} & \text{if } x_j = b_j. \end{cases}$$
The first-order necessary condition for $x$ to be optimal is that $\nabla^{+} f(x) = 0$, and this is sufficient if $f$ is concave. Thus, define $A(x^k) := x^k + s_k \nabla^{+} f(x^k)$, where $s_k$ is chosen by the usual optimal line search. Then we have the following.

Truncated Gradient Algorithm.
Input. Function $f$ on box $[a,b] \subseteq \mathbb{R}^n$ and initial point $x^0$.
Iteration. $x^{k+1} = A(x^k) = x^k + s_k \nabla^{+} f(x^k)$.

Note that $f(x^{k+1}) > f(x^k)$ whenever $\nabla^{+} f(x^k) \neq 0$, so it seems reasonable that this should converge to a solution.
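To make the iteration concrete, here is a minimal sketch in Python (assuming NumPy). The helper names are illustrative choices of ours, not part of this note, and the golden-section routine merely stands in for the "usual optimal line search" over the feasible step interval.

```python
import numpy as np

def truncated_gradient(g, x, a, b, tol=1e-12):
    """Zero out gradient components that point out of the box [a, b]."""
    d = g.copy()
    d[(x <= a + tol) & (d < 0.0)] = 0.0   # at lower bound: max{0, df/dx_j}
    d[(x >= b - tol) & (d > 0.0)] = 0.0   # at upper bound: min{0, df/dx_j}
    return d

def max_feasible_step(x, d, a, b):
    """Largest t >= 0 keeping x + t*d inside [a, b]."""
    with np.errstate(divide="ignore", invalid="ignore"):
        up = np.where(d > 0.0, (b - x) / d, np.inf)
        dn = np.where(d < 0.0, (a - x) / d, np.inf)
    return float(min(up.min(), dn.min()))

def golden_section_max(phi, lo, hi, iters=100):
    """Maximize a unimodal function phi on [lo, hi] by golden-section search."""
    r = (np.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        t1, t2 = hi - r * (hi - lo), lo + r * (hi - lo)
        if phi(t1) < phi(t2):
            lo = t1
        else:
            hi = t2
    return 0.5 * (lo + hi)

def A(f, grad, x, a, b):
    """One truncated-gradient step: x -> x + s * (truncated gradient at x)."""
    d = truncated_gradient(grad(x), x, a, b)
    if not np.any(d):
        return x                          # first-order point reached
    t_max = max_feasible_step(x, d, a, b)
    s = golden_section_max(lambda t: f(x + t * d), 0.0, t_max)
    return np.clip(x + s * d, a, b)

# Example: maximize a concave quadratic on [0, 1]^2; the box maximum is at
# the boundary point (1, 0.5), so the truncation actually comes into play.
c = np.array([2.0, 0.5])
f = lambda x: -np.sum((x - c) ** 2)
grad = lambda x: -2.0 * (x - c)
a, b = np.zeros(2), np.ones(2)
x = np.zeros(2)
for _ in range(20):
    x = A(f, grad, x, a, b)
print(np.round(x, 6))   # ~ [1.0, 0.5]
```

On this well-behaved example the iterates do converge; Wolfe's counterexample below shows this need not happen.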
Wolfe, however, found the following counterexample:
$$f(x) = -\tfrac{4}{3}\,(x_1^2 - x_1 x_2 + x_2^2)^{3/4} + x_3,$$
which we restrict to $0 \le x_j \le 100$ for $j = 1, 2, 3$. We shall prove that $f$ is concave on this box and that, for certain starting points, the truncated gradient algorithm converges to a non-optimal point $x^\infty = (0, 0, z)$, where $z < 100$.

Concavity

In this section we show $f$ is concave on the non-negative orthant. We have the form $f(x_1, x_2, x_3) = -\frac{1}{p}\, q(x_1, x_2)^p + x_3$, where $p = \frac{3}{4}$ and $q = x_1^2 - x_1 x_2 + x_2^2$, so it suffices to show $\frac{1}{p} q^p$ is convex on $\mathbb{R}^2_+$. We have $q = (x_1 - x_2)^2 + x_1 x_2$, so $q \ge 0$ on $\mathbb{R}^2_+$, and $q = 0$ only at $x_1 = x_2 = 0$.

To see that $\frac{1}{p} q^p$ is convex on $\mathbb{R}^2_{++}$, note its hessian is
$$(p-1)\, q^{p-2}\, [\nabla q]^T [\nabla q] + q^{p-1} H,$$
where $H$ is the hessian of $q$. Divide by $q^{p-2}$, so this becomes
$$(p-1) \begin{bmatrix} (2x_1 - x_2)^2 & (2x_1 - x_2)(2x_2 - x_1) \\ (2x_1 - x_2)(2x_2 - x_1) & (2x_2 - x_1)^2 \end{bmatrix} + q \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}.$$
For $p = \frac{3}{4}$, this becomes
$$\begin{bmatrix} q + \tfrac{3}{4} x_2^2 & -\tfrac{1}{2}\left(q + \tfrac{3}{2} x_1 x_2\right) \\ -\tfrac{1}{2}\left(q + \tfrac{3}{2} x_1 x_2\right) & q + \tfrac{3}{4} x_1^2 \end{bmatrix}.$$
The diagonals are clearly positive, and the determinant of the $2 \times 2$ matrix is $\tfrac{3}{2} q^2 > 0$, so the hessian of $\frac{1}{p} q^p$ is positive definite on $\mathbb{R}^2_{++}$. By continuity, convexity extends to the closed orthant $\mathbb{R}^2_+$, which yields the desired result.

Limit Points

Here we prove $A(0, v, w) = \left(\tfrac{1}{2} v,\, 0,\, w + \tfrac{1}{2}\sqrt{v}\right)$ and $A(v, 0, w) = \left(0,\, \tfrac{1}{2} v,\, w + \tfrac{1}{2}\sqrt{v}\right)$, so that the sequence zigzags about the $x_3$ axis and $A^k(0, v, w) \to \left(0,\, 0,\, w + \sqrt{v}/(2 - \sqrt{2})\right)$. Thus, for example, for $v = \tfrac{1}{4}$ and $w = 0$ the limit is not optimal, since $f$ could still be increased by raising $x_3$ toward 100.

Suppose $x = (0, v, w)$, so
$$\partial f(x)/\partial x_1 = -(x_1^2 - x_1 x_2 + x_2^2)^{-1/4}(2x_1 - x_2) = \sqrt{v},$$
$$\partial f(x)/\partial x_2 = -(x_1^2 - x_1 x_2 + x_2^2)^{-1/4}(2x_2 - x_1) = -2\sqrt{v},$$
$$\partial f(x)/\partial x_3 = 1.$$
Thus, $\nabla^{+} f(x) = \nabla f(x) = (\sqrt{v},\, -2\sqrt{v},\, 1)$.

For the line search, we require $t \le \tfrac{1}{2}\sqrt{v}$ since we must have $x_2 \ge 0$. We now prove the optimal value for $t$ is $\tfrac{1}{2}\sqrt{v}$, thus proving $A(0, v, w) = \left(\tfrac{1}{2} v,\, 0,\, w + \tfrac{1}{2}\sqrt{v}\right)$. We have
$$\left. \frac{d}{dt} f\!\left(x + t\, \nabla^{+} f(x)\right) \right|_{t = \frac{1}{2}\sqrt{v}} = -\frac{2v}{(1/2)^{1/2}} + 1 = 1 - 2\sqrt{2}\, v.$$
Thus, if $v$ is sufficiently small ($v < 1/(2\sqrt{2})$ suffices), $df/dt > 0$ throughout the feasible interval, so the concavity of $f$ implies the optimal step size is the greatest possible, $\tfrac{1}{2}\sqrt{v}$.

The proof that $A(v, 0, w) = \left(0,\, \tfrac{1}{2} v,\, w + \tfrac{1}{2}\sqrt{v}\right)$ is similar. Moreover, since $x_1^k$ and $x_2^k$ decrease, the "sufficiently small" condition is retained if we start (for example) at $x^0 = (0, \tfrac{1}{4}, 0)$.

References

[1] P. Wolfe. On the Convergence of Gradient Methods Under Constraint. Research Report RC1752, IBM Watson Research Center, Yorktown Heights, NY, 1967.
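The zigzag limit can be checked numerically. The following short script is a sketch that applies the closed-form action of $A$ derived above (rather than a numerical line search), starting from $x^0 = (0, \tfrac{1}{4}, 0)$; it confirms that $x_3$ approaches $\sqrt{v}/(2 - \sqrt{2}) \approx 0.8536$ rather than the bound 100.

```python
import math

def A(x):
    """Closed-form action of the truncated gradient map derived above:
    A(0, v, w) = (v/2, 0, w + sqrt(v)/2) and A(v, 0, w) = (0, v/2, w + sqrt(v)/2)."""
    x1, x2, x3 = x
    v = max(x1, x2)                      # the single nonzero coordinate among x1, x2
    step = 0.5 * math.sqrt(v)
    if x1 == 0.0:
        return (0.5 * v, 0.0, x3 + step)
    return (0.0, 0.5 * v, x3 + step)

v0, x = 0.25, (0.0, 0.25, 0.0)           # starting point from the text
for _ in range(60):
    x = A(x)

print(x)                                        # ~ (0.0, 2.2e-19, 0.853553...)
print(math.sqrt(v0) / (2.0 - math.sqrt(2.0)))   # predicted limit ~ 0.853553...
```

The iterates pin $x_1$ and $x_2$ alternately to zero while $x_3$ accumulates the geometric series $\tfrac{1}{2}\sqrt{v_0}\sum_k (1/\sqrt{2})^k$, matching the limit point claimed above.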