Wolfe's Example and the Zigzag Phenomenon
Harvey J. Greenberg
University of Colorado at Denver
http://www.cudenver.edu/hgreenbe/
(url changed December 1, 1998)
June 18, 1996
This is a detailed analysis of Wolfe's example [1] to show how the zigzag phenomenon can cause
non-convergence of a natural extension of Cauchy's steepest ascent, called the Truncated Gradient
Algorithm. First, the algorithm is defined; then Wolfe's example is presented.
Truncated Gradient Algorithm
We seek to maximize f(x) on a box, [a, b], and we assume f is in C^1 on (a − ε, b + ε) for some
ε > 0. Without the box restriction, Cauchy's steepest ascent uses the iteration:
\[
x^{k+1} = x^k + s_k \nabla f(x^k),
\]
where s_k is chosen by the usual optimal line search. Under mild assumptions this converges to a
stationary point, say x^∞, where ∇f(x^∞) = 0. This is a maximum if f is concave.
A natural extension is to project the gradient if a coordinate is at a bound value, and the sign
of the associated partial derivative is such that the iterate would violate its bound for any positive
step size. This is called the truncated gradient:
\[
\nabla^+ f(x)_j =
\begin{cases}
\max\{0,\ \partial f(x)/\partial x_j\} & \text{if } x_j = a_j, \\
\partial f(x)/\partial x_j & \text{if } a_j < x_j < b_j, \\
\min\{0,\ \partial f(x)/\partial x_j\} & \text{if } x_j = b_j.
\end{cases}
\]
The first-order necessary condition for x to be optimal is that ∇⁺f(x) = 0, and this is
sufficient if f is concave. Thus, define A(x^k) := x^k + s_k ∇⁺f(x^k), where s_k is chosen by the usual
optimal line search. Then, we have the following.
Truncated Gradient Algorithm.
Input. Function f on box [a, b] ⊆ R^n and initial point, x^0.
Iteration. x^{k+1} = A(x^k) = x^k + s_k ∇⁺f(x^k).
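The iteration just defined can be sketched in code. This is my own minimal illustration, not code from the source: the "usual optimal line search" is approximated by a ternary search along the feasible segment (which finds the maximum when f is concave along that segment, as in the example below), and the function names are my own.

```python
# Minimal sketch of the Truncated Gradient Algorithm (my own illustration,
# not from the source).  The optimal line search is approximated by ternary
# search, valid when f is concave along the search segment.

def truncated_gradient(x, g, a, b, tol=1e-12):
    """Truncate the gradient g at x so the direction respects the box [a, b]."""
    d = list(g)
    for j in range(len(x)):
        if x[j] <= a[j] + tol:
            d[j] = max(0.0, g[j])   # at lower bound: drop components pointing below a_j
        elif x[j] >= b[j] - tol:
            d[j] = min(0.0, g[j])   # at upper bound: drop components pointing above b_j
    return d

def max_feasible_step(x, d, a, b):
    """Largest t >= 0 with x + t*d still in the box [a, b]."""
    t = float("inf")
    for j in range(len(x)):
        if d[j] > 0:
            t = min(t, (b[j] - x[j]) / d[j])
        elif d[j] < 0:
            t = min(t, (a[j] - x[j]) / d[j])
    return t

def iterate(f, grad_f, x, a, b):
    """One step x -> A(x) = x + s * truncated gradient, s by approximate line search."""
    d = truncated_gradient(x, grad_f(x), a, b)
    if all(dj == 0.0 for dj in d):
        return x                     # first-order necessary condition holds
    lo, hi = 0.0, max_feasible_step(x, d, a, b)
    for _ in range(200):             # ternary search for argmax of t -> f(x + t*d)
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        f1 = f([x[j] + m1 * d[j] for j in range(len(x))])
        f2 = f([x[j] + m2 * d[j] for j in range(len(x))])
        if f1 < f2:
            lo = m1
        else:
            hi = m2
    s = (lo + hi) / 2
    return [x[j] + s * d[j] for j in range(len(x))]
```

For instance, maximizing the concave function −(x₁−3)² − (x₂+1)² on [0,2]², one step from (1,1) reaches the box optimum (2,0), where the truncated gradient vanishes.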
Note that f(x^{k+1}) > f(x^k) whenever ∇⁺f(x^k) ≠ 0, so it seems reasonable that this should
converge to a solution. Wolfe, however, found the following counterexample:
\[
f(x) = -\tfrac{4}{3}\,(x_1^2 - x_1 x_2 + x_2^2)^{3/4} + x_3,
\]
which we restrict to 0 ≤ x_j ≤ 100 for j = 1, 2, 3. We shall prove that f is concave on this box
and that the truncated gradient algorithm converges to a non-optimal point, x^∞ = (0, 0, z), where
z < 100, for certain starting points.
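It is easy to check numerically that no point (0, 0, z) with z < 100 can be optimal: since ∂f/∂x₃ ≡ 1, increasing x₃ strictly improves f. The following snippet is my own sketch of that check; the function name `f` is just a label for Wolfe's objective above.

```python
# Wolfe's objective, as defined above, and a check that a point (0, 0, z)
# with z < 100 cannot be optimal: increasing x3 strictly improves f.

def f(x):
    q = x[0]**2 - x[0]*x[1] + x[1]**2
    return -(4.0/3.0) * q**0.75 + x[2]

z = 50.0
assert f([0.0, 0.0, z + 1.0]) > f([0.0, 0.0, z])   # so (0, 0, z) is not optimal
```

Since ∂f/∂x₃ ≡ 1, the truncated gradient at (0, 0, z) is (0, 0, 1) ≠ 0; yet, as shown in the Limit Points section, the algorithm's iterates can converge to such a point.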
Concavity
In this section we show f is concave on the non-negative orthant. We have the form f(x_1, x_2, x_3) =
−(1/p) q(x_1, x_2)^p + x_3, where q = x_1^2 − x_1 x_2 + x_2^2 and p = 3/4, so it suffices to show q^p is convex on R^2_+. We have
q = (x_1 − x_2)^2 + x_1 x_2, so q ≥ 0 on R^2_+, and q = 0 only at x_1 = x_2 = 0. To see that (1/p) q^p is convex
on R^2_{++}, note its hessian is:
\[
(p-1)\, q^{p-2}\, [\nabla q]^T [\nabla q] + q^{p-1} H,
\]
where H is the hessian of q. Divide by q^{p−2}, so this becomes:
\[
(p-1)
\begin{bmatrix}
(2x_1 - x_2)^2 & (2x_1 - x_2)(2x_2 - x_1) \\
(2x_1 - x_2)(2x_2 - x_1) & (2x_2 - x_1)^2
\end{bmatrix}
+ q
\begin{bmatrix}
2 & -1 \\
-1 & 2
\end{bmatrix}.
\]
For p = 3/4, this becomes:
\[
\begin{bmatrix}
q + \tfrac{3}{4} x_2^2 & -\tfrac{1}{2}\,(q + \tfrac{3}{2} x_1 x_2) \\
-\tfrac{1}{2}\,(q + \tfrac{3}{2} x_1 x_2) & q + \tfrac{3}{4} x_1^2
\end{bmatrix}.
\]
The diagonals are clearly positive, and the determinant of this 2 × 2 matrix is (3/2) q^2, so the hessian of (1/p) q^p is
positive definite on R^2_{++}. Since (1/p) q^p is continuous on R^2_+, it is therefore convex on all of R^2_+, which yields the desired result.
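The algebra above can be spot-checked numerically. This is my own sketch: it samples random positive points and verifies that the reduced hessian has positive diagonal and determinant (3/2)q².

```python
# Spot-check (my own) of the algebra above: for p = 3/4 the reduced hessian
#   [[ q + (3/4)x2^2,           -(1/2)(q + (3/2)x1*x2) ],
#    [ -(1/2)(q + (3/2)x1*x2),   q + (3/4)x1^2         ]]
# has positive diagonal and determinant (3/2)*q^2 on the positive orthant.

import random

random.seed(1)
for _ in range(1000):
    x1, x2 = random.uniform(0.01, 10.0), random.uniform(0.01, 10.0)
    q = x1**2 - x1*x2 + x2**2
    d11 = q + 0.75 * x2**2
    d22 = q + 0.75 * x1**2
    off = -0.5 * (q + 1.5 * x1 * x2)
    det = d11 * d22 - off**2
    assert d11 > 0 and d22 > 0
    assert abs(det - 1.5 * q**2) <= 1e-9 * max(1.0, det)
```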
Limit Points
Here we prove A(0, v, w) = (v/2, 0, w + √v/2) and A(v, 0, w) = (0, v/2, w + √v/2), so that the
sequence zigzags about the x_3 axis and A^k(0, v, w) → (0, 0, w + √v/(2 − √2)). Thus, for example, for v = 1/4 and
w = 0 (small enough for the step-size argument below), the limit is (0, 0, 1/(2(2 − √2))), which is not optimal.
Suppose x = (0, v, w) with v > 0, so
\[
\begin{aligned}
\partial f(x)/\partial x_1 &= -(x_1^2 - x_1 x_2 + x_2^2)^{-1/4}\,(2x_1 - x_2) = \sqrt{v}, \\
\partial f(x)/\partial x_2 &= -(x_1^2 - x_1 x_2 + x_2^2)^{-1/4}\,(2x_2 - x_1) = -2\sqrt{v}, \\
\partial f(x)/\partial x_3 &= 1.
\end{aligned}
\]
Thus, ∇⁺f(x) = ∇f(x) = (√v, −2√v, 1).
For the line search, we require t ≤ √v/2, since we must have x_2 ≥ 0. We now prove the optimal
value for t is √v/2, thus proving A(0, v, w) = (v/2, 0, w + √v/2).
We have
\[
\left.\frac{d}{dt}\, f\big(x + t\,\nabla^+ f(x)\big)\right|_{t=\sqrt{v}/2} = -\frac{2v}{(1/2)^{1/2}} + 1 = -2\sqrt{2}\,v + 1.
\]
Thus, if v is sufficiently small (v < 1/(2√2)), df/dt > 0 at t = √v/2, so the concavity of f implies the optimal step size is the
greatest possible, √v/2.
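The derivative value just computed can be confirmed by a finite difference. This is my own numerical sketch; `phi` is just a label for t ↦ f(x + t∇⁺f(x)) at x = (0, v, w).

```python
# Finite-difference check (my own) that d/dt f(x + t*grad f(x)) at t = sqrt(v)/2
# equals 1 - 2*sqrt(2)*v for x = (0, v, w), with v small.

import math

def f(x):
    q = x[0]**2 - x[0]*x[1] + x[1]**2
    return -(4.0/3.0) * q**0.75 + x[2]

v, w = 0.01, 0.0
d = (math.sqrt(v), -2.0 * math.sqrt(v), 1.0)   # gradient at (0, v, w)
t0, h = math.sqrt(v) / 2.0, 1e-6

def phi(t):
    return f([t * d[0], v + t * d[1], w + t * d[2]])

deriv = (phi(t0 + h) - phi(t0 - h)) / (2.0 * h)   # central difference
assert abs(deriv - (1.0 - 2.0 * math.sqrt(2.0) * v)) < 1e-3
```

With v = 0.01 the derivative is about 0.97 > 0, so the optimal step is the largest feasible one, as claimed.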
The proof that A(v, 0, w) = (0, v/2, w + √v/2) is similar. Moreover, since x_1^k and x_2^k decrease,
the "sufficiently small" condition is retained if we start (for example) at x = (0, 1/4, 0).
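Iterating the closed-form map derived above makes the zigzag and its limit concrete. This is my own numerical sketch, starting from (0, 1/4, 0) as suggested; the geometric sum of the x₃ increments gives the limit w + √v/(2 − √2) ≈ 0.854, far below the optimal value of 100.

```python
# Iterate the closed-form map derived above, starting at (0, 1/4, 0):
#   A(0, v, w) = (v/2, 0, w + sqrt(v)/2),  A(v, 0, w) = (0, v/2, w + sqrt(v)/2).
# The iterates zigzag between the faces x1 = 0 and x2 = 0 while x3 converges.

import math

x = [0.0, 0.25, 0.0]
for _ in range(200):
    v = x[1] if x[0] == 0.0 else x[0]      # the nonzero zigzag coordinate
    step = math.sqrt(v) / 2.0
    if x[0] == 0.0:
        x = [v / 2.0, 0.0, x[2] + step]
    else:
        x = [0.0, v / 2.0, x[2] + step]

limit = math.sqrt(0.25) / (2.0 - math.sqrt(2.0))   # w + sqrt(v)/(2 - sqrt(2))
assert abs(x[2] - limit) < 1e-9                    # ~0.8536, although x3 = 100 is optimal
```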
References
[1] P. Wolfe. On the Convergence of Gradient Methods Under Constraint. Research Report RC1752, IBM Watson Research Center, Yorktown Heights, NY, 1967.