On the use of directions of negative curvature in a modified Newton method

Mathematical Programming 16 (1979) 1-20.
North-Holland Publishing Company
ON THE USE OF DIRECTIONS OF NEGATIVE CURVATURE IN A MODIFIED NEWTON METHOD*
Jorge J. MORÉ
Argonne National Laboratory Argonne, IL, U.S.A.
Danny C. SORENSEN
University of Kentucky Lexington, KY, U.S.A.
Received 17 October 1977
Revised manuscript received 6 June 1978
We present a modified Newton method for the unconstrained minimization problem. The
modification occurs in non-convex regions where the information contained in the negative
eigenvalues of the Hessian is taken into account by performing a line search along a path
which is initially tangent to a direction of negative curvature. We give termination criteria for
the line search and prove that the resulting iterates are guaranteed to converge, under
reasonable conditions, to a critical point at which the Hessian is positive semidefinite. We also
show how the Bunch and Parlett decomposition of a symmetric indefinite matrix can be used
to give entirely adequate directions of negative curvature.
Key words: Unconstrained Optimization, Modified Newton's Method, Descent Pairs,
Directions of Negative Curvature, Symmetric Indefinite Factorization, Steplength Algorithm.
1. Introduction
Let $f : R^n \to R$ be a continuously differentiable function in the open set $\mathcal{D}$, and consider the problem of producing a sequence $\{x_k\}$ that converges to a local minimizer $x^*$ of $f$. That is, we seek $x^*$ such that
$$f(x^*) \le f(x), \qquad x \in N \cap \mathcal{D}, \tag{1.1}$$
with $N$ some neighborhood of $x^*$.
Algorithms for the solution of (1.1) are usually descent methods. A descent method determines a direction $s_k$ at the iterate $x_k$ such that $\nabla f(x_k)^T s_k < 0$. A line search then yields a step-length $\alpha_k > 0$ such that
$$f(x_k + \alpha_k s_k) < f(x_k),$$
and thus it is sensible to let $x_{k+1} = x_k + \alpha_k s_k$. Under some additional restrictions on the choice of $\alpha_k$ one can show that
$$\lim_{k \to \infty} \frac{\nabla f(x_k)^T s_k}{\|s_k\|} = 0. \tag{1.2}$$
Moreover, the vector $s_k$ is usually related to $\nabla f(x_k)$ in such a way that (1.2)
* Work performed under the auspices of the U.S. Department of Energy.
implies that $\{\|\nabla f(x_k)\|\}$ converges to zero. Thus, every limit point $x^*$ of $\{x_k\}$ satisfies $\nabla f(x^*) = 0$.
It is desirable to produce a sequence which converges to a point $x^*$ with $\nabla^2 f(x^*)$ positive definite. This would imply that $x^*$ is an isolated local minimizer of $f$, and in particular, that $x^*$ satisfies (1.1). In general, it is not possible to produce such a sequence. However, through the use of directions of negative curvature we shall be able to produce a sequence which converges to a point $x^*$ with $\nabla^2 f(x^*)$ positive semidefinite. For practical purposes, this is a very strong assertion. For instance, if the Hessian were known to be nonsingular at all critical points, then $x^*$ would have to be a local minimizer. To see that theoretically it is not, in general, possible for $\nabla^2 f(x^*)$ to be positive definite, consider an example of Wolfe [13]. In that example, steepest descent (and actually, any reasonable descent method) converges to a saddle point at which the Hessian is singular. Since the algorithm approaches the saddle point through a region in which $f$ is strictly convex, there is no possible way to avoid this saddle point.
The idea of using directions of negative curvature appeared as early as 1968
[5, pp. 165-169], but recently there has been renewed interest [6, 7, 9, 10]. We are
particularly indebted to the paper of McCormick [9]. In that paper, McCormick
showed how a modification of the Armijo line search could be used with
directions of negative curvature. Our Theorem 3.1 is a slight modification of
McCormick's main result; its purpose is to isolate the main ingredients of
McCormick's paper. In this paper we first extend McCormick's work by
considering the practical generation of directions of negative curvature. We
discuss two methods in Section 4. One is based on Gill and Murray's [7] modified
Cholesky factorization, and the other based on Bunch and Parlett's [3]
factorization of symmetric indefinite matrices. We show that Gill and Murray's
method does not satisfy the requirements of our convergence theorems, but that
the Bunch and Parlett factorization can be used to give entirely adequate
directions of negative curvature.
In Section 5 we again extend McCormick's work by replacing the Armijo line
search by a general line search with satisfactory termination criteria. The
results of this section provide a theoretically justified alternative to Fletcher and
Freeman's [6] ad-hoc line search. Finally, in Section 6 we present our convergence results. In particular, we show how the line search can be used
together with our directions of negative curvature to provide a very effective
modified Newton method.
Assumption 1.1. Let $f : R^n \to R$ have two continuous derivatives on the open set $\mathcal{D}$, and assume that for some $x_0$ in $\mathcal{D}$, the level set
$$L(x_0) = \{x \in \mathcal{D} : f(x) \le f(x_0)\}$$
is a compact subset of $\mathcal{D}$.
Notation 1.2. In all cases $\|\cdot\|$ refers to the $l_2$ vector norm or to the induced operator norm. The gradient and Hessian of $f$ at $x$ are denoted by $\nabla f(x)$ and $\nabla^2 f(x)$, respectively, but if we have a sequence of vectors $\{x_k\}$, then $f_k$, $g_k$ and $G_k$ are used instead of $f(x_k)$, $\nabla f(x_k)$ and $\nabla^2 f(x_k)$, respectively.
2. Descent directions
The search strategy we present departs from the usual strategies discussed in the literature. Instead of using only one descent direction and searching in a line determined by that direction, we search along a curve of the form
$$C = \{x(\alpha) : x(\alpha) = x + \phi_1(\alpha)s + \phi_2(\alpha)d, \ \alpha \ge 0\}, \tag{2.1}$$
with $(s, d)$ a descent pair at $x$, and with $\phi_1(0) = \phi_2(0) = 0$.
Definition 2.1. Let $f : R^n \to R$ be twice differentiable in the open set $\mathcal{D}$.
(a) A point $x$ in $\mathcal{D}$ is an indefinite point if $\nabla^2 f(x)$ has at least one negative eigenvalue.
(b) If $x$ is an indefinite point, then $d$ is a direction of negative curvature if $d^T \nabla^2 f(x) d < 0$.
(c) A pair of vectors $(s, d)$ is a descent pair at the indefinite point $x$ if $\nabla f(x)^T s \le 0$, $\nabla f(x)^T d \le 0$, and $d^T \nabla^2 f(x) d < 0$. If $x$ is not an indefinite point, then we require $\nabla f(x)^T s < 0$, $\nabla f(x)^T d \le 0$, and $d^T \nabla^2 f(x) d = 0$.
If $x$ is an indefinite point, then an example of a descent pair is $s = -\nabla f(x)$ and $d = \pm e$ where $e$ is an eigenvector corresponding to a negative eigenvalue of $\nabla^2 f(x)$, and the sign is chosen so that $\nabla f(x)^T d \le 0$. If $x$ is not an indefinite point and $\nabla f(x) \ne 0$, then we can take $s = -\nabla f(x)$ and $d = 0$. Note that a descent pair fails to exist if and only if $\nabla f(x) = 0$ and $\nabla^2 f(x)$ is positive semidefinite.
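The example descent pair just described can be sketched in a few lines for the two-dimensional case. This is an illustration only: the function name `descent_pair_2d` and the closed-form 2x2 eigenvalue computation are assumptions made for the sketch, not part of the paper, and the caller is expected to handle the case where no descent pair exists (zero gradient and positive semidefinite Hessian).

```python
import math

def descent_pair_2d(g, H):
    """Return (s, d) for a 2-D problem with gradient g and symmetric Hessian H.

    Hypothetical helper: s = -g, and d is a signed unit eigenvector for the
    most negative eigenvalue of H (or zero if H has no negative eigenvalue).
    """
    a, b, c = H[0][0], H[0][1], H[1][1]
    # Smallest eigenvalue of the symmetric matrix [[a, b], [b, c]].
    lam = 0.5 * (a + c) - math.sqrt((0.5 * (a - c)) ** 2 + b * b)
    s = [-g[0], -g[1]]
    if lam >= 0.0:
        # Not an indefinite point: take d = 0.
        return s, [0.0, 0.0]
    # Eigenvector for lam: (lam - c, b) when b != 0, else a coordinate axis.
    if b != 0.0:
        e = [lam - c, b]
    else:
        e = [1.0, 0.0] if a <= c else [0.0, 1.0]
    norm = math.hypot(e[0], e[1])
    e = [e[0] / norm, e[1] / norm]
    # Choose the sign so that grad f(x)^T d <= 0.
    if g[0] * e[0] + g[1] * e[1] > 0.0:
        e = [-e[0], -e[1]]
    return s, e

# f(x, y) = x^2 - y^2 at the point (1, 0.5): gradient (2, -1), Hessian diag(2, -2).
s, d = descent_pair_2d([2.0, -1.0], [[2.0, 0.0], [0.0, -2.0]])
```

For this sample point the sketch returns the steepest-descent direction for `s` and a unit vector along the second coordinate axis for `d`, which together satisfy the three conditions of Definition 2.1(c).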
Given a descent pair $(s, d)$ at $x$, we want to produce an $\alpha > 0$ such that
$$f(x(\alpha)) < f(x).$$
If we let $\Phi(\alpha) = f(x(\alpha))$, we encounter a univariate minimization problem where $\Phi''$ is continuous as long as $\phi_1''$ and $\phi_2''$ are continuous. If $\Phi'(0) < 0$, or if $\Phi'(0) = 0$ and $\Phi''(0) < 0$, then it is clear that there is an $\bar\alpha > 0$ such that
$$f(x(\alpha)) < f(x), \qquad \alpha \in (0, \bar\alpha]. \tag{2.2}$$
The following lemma improves on this result.
Lemma 2.2. Let $\Phi : R \to R$ be twice continuously differentiable on the open interval $I$ which contains the origin, and assume that $\mu \in (0, 1)$. Then there is an $\bar\alpha > 0$ in $I$ such that
$$\Phi(\alpha) \le \Phi(0) + \mu\left[\Phi'(0)\alpha + \tfrac{1}{2}\Phi''(0)\alpha^2\right]$$
for all $\alpha \in [0, \bar\alpha]$ provided that either $\Phi'(0) < 0$, or $\Phi'(0) = 0$ and $\Phi''(0) < 0$.
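Proof. The mean value theorem implies that for every $\alpha > 0$ there exists $\theta \in (0, \alpha)$ such that
$$\Phi(\alpha) = \Phi(0) + \Phi'(0)\alpha + \tfrac{1}{2}\Phi''(0)\alpha^2 + \tfrac{1}{2}\left[\Phi''(\theta) - \Phi''(0)\right]\alpha^2.$$
Hence,
$$\Phi(\alpha) = \Phi(0) + \mu\left[\Phi'(0)\alpha + \tfrac{1}{2}\Phi''(0)\alpha^2\right] + r(\alpha),$$
where
$$r(\alpha) = (1 - \mu)\left[\Phi'(0)\alpha + \tfrac{1}{2}\Phi''(0)\alpha^2\right] + \tfrac{1}{2}\left[\Phi''(\theta) - \Phi''(0)\right]\alpha^2.$$
Since
$$\limsup_{\alpha \to 0^+} \frac{r(\alpha)}{\alpha^2} < 0,$$
there exists an $\bar\alpha > 0$ such that $r(\alpha) < 0$ for all $\alpha \in (0, \bar\alpha]$.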
This lemma states that the ratio of the reduction in the function to the reduction in the quadratic approximation
$$\Phi(0) + \Phi'(0)\alpha + \tfrac{1}{2}\Phi''(0)\alpha^2$$
is bounded below by $\mu$. It also tells us that (2.2) can be satisfied, and that a larger decrease is likely when $\Phi''(0) < 0$.
We want to use the simplest functions $\phi_1$ and $\phi_2$ which will guarantee that the hypothesis of Lemma 2.2 is satisfied. Observe that if $\Phi(\alpha) = f(x(\alpha))$ with $x(\alpha)$ as in (2.1) then
$$\Phi'(0) = \nabla f(x)^T\left(\phi_1'(0)s + \phi_2'(0)d\right), \tag{2.3}$$
$$\Phi''(0) = \nabla f(x)^T\left(\phi_1''(0)s + \phi_2''(0)d\right) + \left(\phi_1'(0)s + \phi_2'(0)d\right)^T \nabla^2 f(x) \left(\phi_1'(0)s + \phi_2'(0)d\right). \tag{2.4}$$
Suppose now that $\nabla f(x)^T s = 0$ at an indefinite point (this occurs, for instance, at a saddle point). Then in order to ensure that $\Phi'(0) \le 0$ and $\Phi''(0) < 0$ without imposing further conditions on $s$, we must require that $\phi_1'(0) = 0$, $\phi_2'(0) > 0$, and $\phi_2''(0) \ge 0$. Then (2.3) and (2.4) simplify to
$$\Phi'(0) = \nabla f(x)^T\left(\phi_2'(0)d\right), \tag{2.5}$$
$$\Phi''(0) = \nabla f(x)^T\left(\phi_1''(0)s + \phi_2''(0)d\right) + \left(\phi_2'(0)d\right)^T \nabla^2 f(x) \left(\phi_2'(0)d\right). \tag{2.6}$$
When $\nabla^2 f(x)$ is positive definite, then $d = 0$ must be satisfied in order for $(s, d)$ to be a descent pair. Thus $\Phi'(0) = 0$, and we must have $\phi_1''(0) > 0$ in order to ensure $\Phi''(0) < 0$. Therefore, if
$$\phi_1(\alpha) = \sum_{j=0}^{\infty} \beta_j \alpha^j, \qquad \phi_2(\alpha) = \sum_{j=0}^{\infty} \gamma_j \alpha^j,$$
then we must have $\beta_0 = \beta_1 = 0$ with $\beta_2 > 0$, and $\gamma_0 = 0$ with $\gamma_1 > 0$. The simplest functions of this type are
$$\phi_1(\alpha) = \alpha^2, \qquad \phi_2(\alpha) = \alpha.$$
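With $\phi_1(\alpha) = \alpha^2$ and $\phi_2(\alpha) = \alpha$, formulas (2.3) and (2.4) reduce to $\Phi'(0) = \nabla f(x)^T d$ and $\Phi''(0) = 2\nabla f(x)^T s + d^T \nabla^2 f(x) d$. The following sketch checks these two values against central finite differences. The test function, the sample point, and the helper names are assumptions chosen only for illustration.

```python
def curve_derivatives(g, H, s, d):
    """Phi'(0) and Phi''(0) for Phi(a) = f(x + a^2 s + a d)."""
    n = len(g)
    g_d = sum(g[i] * d[i] for i in range(n))
    g_s = sum(g[i] * s[i] for i in range(n))
    dHd = sum(d[i] * H[i][j] * d[j] for i in range(n) for j in range(n))
    return g_d, 2.0 * g_s + dHd

def f(x):
    # Illustrative indefinite quadratic (an assumption, not from the paper).
    return x[0] ** 2 - x[1] ** 2 + x[0] * x[1]

def phi(a, x, s, d):
    # Value of f along the curve x(a) = x + a^2 s + a d.
    return f([x[i] + a * a * s[i] + a * d[i] for i in range(len(x))])

x = [1.0, 2.0]
g = [2.0 * x[0] + x[1], -2.0 * x[1] + x[0]]   # gradient of f at x
H = [[2.0, 1.0], [1.0, -2.0]]                 # Hessian of f
s = [-g[0], -g[1]]
d = [0.0, 1.0]

dphi0, d2phi0 = curve_derivatives(g, H, s, d)

# Central finite-difference estimates of the same derivatives.
h = 1e-4
fd1 = (phi(h, x, s, d) - phi(-h, x, s, d)) / (2.0 * h)
fd2 = (phi(h, x, s, d) - 2.0 * phi(0.0, x, s, d) + phi(-h, x, s, d)) / (h * h)
```

Both derivatives are negative at this sample point, so the hypothesis of Lemma 2.2 holds along the curve.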
3. A modification of the Armijo steplength procedure
The arguments of Section 2 lead to iterations of the form
$$x_{k+1} = x_k + \alpha_k^2 s_k + \alpha_k d_k$$
where $\alpha_k$ is chosen so that at least $f_{k+1} < f_k$. There are several ways to choose the steplength $\alpha_k$, and in this section we present a generalization of a result of McCormick [9] in which $\alpha_k$ is chosen by a modification of the Armijo steplength procedure [2].
To describe the steplength algorithm, let $\gamma, \mu \in (0, 1)$ and $x_0 \in R^n$ be given. If $(s_k, d_k)$ is a descent pair at $x_k$, then $x_{k+1}$ is defined by choosing the smallest non-negative integer $i$ such that
$$x_{k+1} = x_k + \gamma^{2i} s_k + \gamma^{i} d_k \in \mathcal{D}, \tag{3.1}$$
$$f_{k+1} \le f_k + \mu\gamma^{2i}\left[g_k^T s_k + \tfrac{1}{2} d_k^T G_k d_k\right]. \tag{3.2}$$
Lemma 2.2 shows that the iterates are well defined, and if a descent pair does not exist at $x_k$, then we accept $x_k$ as a solution to (1.1).
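The rule (3.1)-(3.2) can be sketched as a short backtracking loop. This is a minimal illustration under stated assumptions: the domain $\mathcal{D}$ is taken to be all of $R^n$, so the membership test in (3.1) is omitted, and the function name, the default parameters, and the iteration cap are hypothetical.

```python
def modified_armijo_step(f, x, s, d, g, G, gamma=0.5, mu=0.1, max_iter=60):
    """Return (x_new, i) with i the smallest integer satisfying (3.2).

    Assumes the domain is all of R^n, so the membership test in (3.1)
    is not checked.
    """
    n = len(x)
    g_s = sum(g[j] * s[j] for j in range(n))
    dGd = sum(d[p] * G[p][q] * d[q] for p in range(n) for q in range(n))
    model = g_s + 0.5 * dGd          # bracket in (3.2); negative for a descent pair
    fx = f(x)
    for i in range(max_iter):
        a = gamma ** i
        x_new = [x[j] + a * a * s[j] + a * d[j] for j in range(n)]
        if f(x_new) <= fx + mu * a * a * model:
            return x_new, i
    raise RuntimeError("no acceptable step found within max_iter reductions")

# Illustrative function with a saddle point at the origin (an assumption).
def f(x):
    return x[1] ** 4 - x[1] ** 2 + x[0] ** 2

x0 = [0.0, 0.0]
g0 = [0.0, 0.0]                      # gradient vanishes at the saddle point
G0 = [[2.0, 0.0], [0.0, -2.0]]       # Hessian at the origin
x1, i1 = modified_armijo_step(f, x0, [0.0, 0.0], [0.0, 1.0], g0, G0)
```

In this example the gradient is zero, so a pure descent method would stall at the saddle point, while the negative curvature direction still produces a decrease in $f$.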
Theorem 3.1. Let $f : R^n \to R$ satisfy Assumption 1.1, and suppose that $\{\|s_k\|\}$ and $\{\|d_k\|\}$ are bounded. If $\{x_k\}$ satisfies (3.1) and (3.2) then
$$\lim_{k \to \infty} g_k^T s_k = 0 \tag{3.3}$$
and
$$\lim_{k \to \infty} d_k^T G_k d_k = 0. \tag{3.4}$$
Proof. The sequence $\{f_k\}$ is decreasing and bounded below due to the continuity of $f$ and compactness of $L(x_0)$. Thus $\{f_k - f_{k+1}\}$ converges to zero. If $i_k$ is the smallest non-negative integer such that (3.1) and (3.2) hold, then there are two cases to consider.
Case 1. Suppose the integers $\{i_k\}$ are bounded above by some $m \ge 0$. Then
$$f_k - f_{k+1} \ge -\mu\gamma^{2m}\left[g_k^T s_k + \tfrac{1}{2} d_k^T G_k d_k\right].$$
Since $-g_k^T s_k \ge 0$ and $-d_k^T G_k d_k \ge 0$, the conclusion follows.
Case 2. The integers $\{i_k\}$ are not bounded above. Without loss of generality, assume that $\lim_{k \to +\infty} i_k = +\infty$. If we define
$$\Phi_k(\alpha) = f(x_k + \alpha^2 s_k + \alpha d_k), \qquad \sigma_k = \gamma^{(i_k - 1)},$$
then by the definition of $i_k$,
$$\Phi_k(\sigma_k) > f_k + \mu\sigma_k^2\left[g_k^T s_k + \tfrac{1}{2} d_k^T G_k d_k\right]. \tag{3.5}$$
However, due to our assumptions on $f$ and $L(x_0)$, a Taylor series argument and the fact that $g_k^T d_k \le 0$ may be used to show that
$$\Phi_k(\sigma_k) \le f_k + \sigma_k^2\left[g_k^T s_k + \tfrac{1}{2} d_k^T G_k d_k\right] + r(x_k, s_k, d_k, \sigma_k) \tag{3.6}$$
with
$$\lim_{k \to +\infty} \frac{r(x_k, s_k, d_k, \sigma_k)}{\sigma_k^2} = 0. \tag{3.7}$$
Hence, combining (3.5) and (3.6) gives
$$\frac{r(x_k, s_k, d_k, \sigma_k)}{\sigma_k^2} \ge -(1 - \mu)\left[g_k^T s_k + \tfrac{1}{2} d_k^T G_k d_k\right]. \tag{3.8}$$
The conclusion follows from (3.7) and (3.8).
The result presented by McCormick [9] did not specify a choice of $x_{k+1}$ when $x_k$ was not an indefinite point, and only required $f_{k+1} \le f_k$. In the case that $x_k$ is an indefinite point, McCormick chooses $x_{k+1}$ according to (3.1) and (3.2) with $\gamma^2 = \tfrac{1}{2}$. In addition, the vectors $s_k$ and $d_k$ must satisfy
$$\|s_k\| \ge c_3\|g_k\|, \tag{3.9a}$$
$$d_k^T G_k d_k \le c_2 \lambda_{G_k}, \tag{3.9b}$$
$$-s_k^T g_k \ge c_1\|s_k\|\,\|g_k\|, \tag{3.9c}$$
where $\lambda_{G_k}$ is the most negative eigenvalue of $G_k$, and $c_1$, $c_2$, and $c_3$ are positive constants. More specific choices of $s_k$ and $d_k$ were not suggested. With these assumptions, McCormick is able to conclude that if infinitely many indefinite points $\{x_k\}$ were to occur in the sequence $\{x_k\}$, then any point of accumulation $x^*$ of the sequence $\{x_k\}$ must satisfy $\nabla f(x^*) = 0$, and $\nabla^2 f(x^*)$ is positive semidefinite with at least one zero eigenvalue.
Since Armijo type steplength procedures do not take into account any information about the shape of the function along the curve $x(\alpha)$, we are interested in investigating more sophisticated strategies for determining the steplength $\alpha_k$. Thus, in the rest of this paper we shall be concerned with a steplength procedure which specifies criteria for terminating a univariate search along curves $x(\alpha)$, and with specific choices for $(s_k, d_k)$. Finally, a convergence result will be given which indicates that these choices are quite reasonable.
4. Determining directions of negative curvature.
If a direction of negative curvature is to be useful, then it must satisfy at least two requirements:
$$\{\|d_k\|\} \ \text{bounded}, \tag{4.1}$$
$$d_k^T G_k d_k \to 0 \quad \text{implies} \quad \lambda_{G_k} \to 0. \tag{4.2}$$
Here $\lambda_{G_k}$ is the most negative eigenvalue of $G_k$ when $x_k$ is an indefinite point and zero otherwise. Note that McCormick's condition (3.9b) is of this type. Both of these conditions are required by our convergence theorems; intuitively, they force the iterates towards a region of convexity for $f$.
It is possible to satisfy (4.1) and (4.2) quite easily if we have the eigenvalue and eigenvector decomposition of $G_k$. However, this decomposition is quite costly, and thus we examine other matrix factorizations. In particular, we discuss in some detail Gill and Murray's modified Cholesky factorization [7], and Bunch and Parlett's factorization [3].
Given a symmetric matrix $A$ and parameters $\delta \ge 0$ and $\beta > 0$, Gill and Murray's algorithm produces a unit lower triangular matrix $L$ and diagonal matrices $D = \mathrm{diag}(d_i)$ and $E = \mathrm{diag}(\epsilon_i)$ with $d_i \ge 0$ and $\epsilon_i \ge 0$ such that $A + E = LDL^T$. The $j$th step of the algorithm sets
$$l_{jk} = c_{jk}/d_k, \qquad 1 \le k < j,$$
$$c_{ij} = a_{ij} - \sum_{k=1}^{j-1} l_{jk} c_{ik}, \qquad i \ge j,$$
$$\theta_j = \max\{|c_{ij}| : i > j\},$$
$$d_j = \max\{\delta, \ |c_{jj}|, \ \theta_j^2/\beta^2\},$$
$$\epsilon_j = d_j - c_{jj}.$$
Note that if $\delta = 0$, then it is possible that $d_j = 0$, but in this case set $l_{jk} = 0$ for $1 \le k < j$.
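The $j$th step described above can be written as a short routine. This is a sketch under stated assumptions rather than Gill and Murray's reference implementation: the function name is hypothetical, no pivoting is performed, and the note about zero pivots is interpreted as setting to zero any multiplier that would otherwise divide by a zero $d_k$.

```python
def modified_cholesky(A, beta, delta=0.0):
    """Return (L, d, eps, c) with A + diag(eps) = L diag(d) L^T.

    L is unit lower triangular; c holds the intermediate quantities c_ij.
    """
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    c = [[0.0] * n for _ in range(n)]
    d = [0.0] * n
    eps = [0.0] * n
    for j in range(n):
        # Row j of L; a zero pivot d_k gives a zero multiplier (interpretation).
        for k in range(j):
            L[j][k] = c[j][k] / d[k] if d[k] != 0.0 else 0.0
        # Column j of c, for i >= j.
        for i in range(j, n):
            c[i][j] = A[i][j] - sum(L[j][k] * c[i][k] for k in range(j))
        theta = max((abs(c[i][j]) for i in range(j + 1, n)), default=0.0)
        d[j] = max(delta, abs(c[j][j]), theta * theta / (beta * beta))
        eps[j] = d[j] - c[j][j]
    return L, d, eps, c

# Illustrative 3x3 indefinite matrix (an assumption); beta^2 = 9 > max |a_ii| = 4.
A3 = [[4.0, 2.0, 1.0], [2.0, -3.0, 0.5], [1.0, 0.5, 2.0]]
L3, d3, eps3, c3 = modified_cholesky(A3, beta=3.0)
```

For this sample matrix the only nonzero correction is the second diagonal entry of $E$, which is the one that replaces the negative pivot $c_{22} = -4$ by $d_2 = 4$.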
Gill and Murray [7] showed that if 8 = 0, then it is possible to use this
factorization to obtain a direction of negative curvature. The following lemma is
a slight modification of their results.
Lemma 4.1. Let $A \in R^{n \times n}$ be symmetric, and assume that
$$\beta^2 > \max\{|a_{ii}| : i = 1, \ldots, n\}. \tag{4.3}$$
If the integer $m$ satisfies
$$c_{mm} \le c_{jj}, \qquad j = 1, \ldots, n,$$
and $L^T p = e_m$, then $p^T A p \le c_{mm}$. Moreover, if $\delta = 0$ and $A$ has at least one negative eigenvalue, then $c_{mm} < 0$.
Proof. If $L^T p = e_m$, then $p_i = 0$ for $i > m$. Moreover, since $A + E = LDL^T$,
$$p^T A p = d_m - p^T E p = d_m - \epsilon_m - \sum_{i=1}^{m-1} \epsilon_i p_i^2,$$
and thus $p^T A p \le c_{mm}$. Now assume that $\delta = 0$. If $c_{mm} \ge 0$, then
$$a_{ii} \ge \sum_{k=1}^{i-1} l_{ik} c_{ik} = \sum_{k=1}^{i-1} l_{ik}^2 d_k$$
for $1 \le i \le n$, and thus
$$l_{ij}^2 d_j \le a_{ii} < \beta^2, \qquad j < i.$$
If $d_j \ne 0$ this implies that $c_{ij}^2 < \beta^2 d_j$ for $j < i$, and hence, $\theta_j^2 < \beta^2 d_j$. Thus
$$d_j = |c_{jj}| = c_{jj},$$
and therefore, $\epsilon_j = 0$. If $d_j = 0$ then clearly $\epsilon_j = 0$. It follows that $E = 0$, and in particular, that $A = LDL^T$. We have now shown that if $c_{mm} \ge 0$ then $A$ must be positive semidefinite, and thus Lemma 4.1 holds.
Gill and Murray [7] allow equality to hold in (4.3) and show that if $A$ is indefinite, then $p^T A p < 0$. However, if we allow equality in (4.3) then we lose a very nice consequence of Lemma 4.1; namely, that $c_{mm} \ge 0$ if and only if $A$ is positive semidefinite. In fact, if
$$A = [\text{matrix not recoverable from the source}]$$
then $A$ is indefinite, but if we allow $\beta = 1$ then $c_{11} = \tfrac{1}{2}$ and $c_{22} = 0$.
Can we use Lemma 4.1 to construct directions of negative curvature which satisfy (4.1) and (4.2)? Unfortunately, the answer is no. To see this, let $\{G_k\}$ be a sequence of matrices with at least one negative eigenvalue, and let $\{p_k\}$ be the sequence of negative curvature directions generated as in Lemma 4.1. If we choose $d_k = p_k$, then
$$d_k^T G_k d_k \le c_{mm}^{(k)}$$
implies that (4.2) holds provided that $\{G_k\}$ is bounded. However, the following example shows that $\{\|p_k\|\}$ may not be bounded.
Example 4.2. Let
$$G_k = [\text{matrix not recoverable from the source}],$$
where $\alpha_k \in (0, 1)$. We can choose $\beta = 1$, and then it is fairly easy to show that
$$p_k = \left(-\frac{1}{\alpha_k}, \ 1\right)^T.$$
Thus, if $\alpha_k \to 0$, then $\{\|p_k\|\}$ is not bounded.
Note that if we normalize $p_k$ and consider $d_k = p_k/\|p_k\|$ as our direction of negative curvature, then in this example $d_k^T G_k d_k$ is given by an expression in $\alpha_k^2$ [not recoverable from the source]. Thus, if $d_k^T G_k d_k \to 0$, then [conclusion not recoverable from the source], and hence (4.2) does not hold.
The factorization of Bunch and Parlett avoids the above difficulties. Given a
symmetric matrix A, this factorization consists of a permutation matrix Q, a
block diagonal matrix D, and a unit lower triangular matrix M such that
$QAQ^T = MDM^T$. The matrices $M$ and $D$ have the following properties:
(a) The elements of M are bounded by a fixed positive constant which is
independent of the matrix A.
(b) D is a block diagonal matrix with one-by-one or two-by-two diagonal
blocks.
(c) D has the same number of positive, negative, and zero eigenvalues as A
(Sylvester's Inertia Theorem).
(d) The number of two-by-two diagonal blocks plus the number of negative
diagonal elements which occur as one-by-one diagonal blocks of D is equal to
the number of negative eigenvalues of A. If A is positive semidefinite, D is a
diagonal matrix with nonnegative diagonal elements.
The following lemma shows how this factorization can be used to obtain
directions of negative curvature which satisfy (4.1) and (4.2).
Lemma 4.3. Let $A = WBW^T$ where $W \in R^{n \times n}$ is nonsingular, $B \in R^{n \times n}$ is symmetric, and $A$ has at least one negative eigenvalue. Let $\{z_j : j = 1, 2, \ldots, m\}$ be orthonormal eigenvectors for $B$ corresponding to eigenvalues
$$\lambda_1 \le \lambda_2 \le \cdots \le \lambda_m < 0,$$
and for some $1 \le l \le m$ define $y$ and $z$ by
$$W^T y = z = \sum_{j=1}^{l} z_j. \tag{4.4}$$
Then
$$\lambda_A \ge (y^T A y)\|W\|^2 \ge l\,[\kappa_2(W)]^2\,\frac{y^T A y}{\|y\|^2} \tag{4.5}$$
where $\lambda_A$ is the smallest eigenvalue of $A$, and $\kappa_2(W) = \|W\|\,\|W^{-1}\|$ is the Euclidean condition number of $W$.
Proof. If $x$ is a unit eigenvector for $A$ corresponding to $\lambda_A$ then $x^T A x = \lambda_A$, and if $u = W^T x$, then
$$\lambda_A = x^T A x = u^T B u \ge \lambda_1\|u\|^2.$$
Moreover, since $\|u\| \le \|W\|$ and $\lambda_1 < 0$,
$$\lambda_A \ge \lambda_1\|W\|^2. \tag{4.6}$$
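Now note that from (4.4)
$$y^T A y = \Big(\sum_{j=1}^{l} z_j\Big)^T B \Big(\sum_{j=1}^{l} z_j\Big) = \sum_{j=1}^{l} \lambda_j \le \lambda_1,$$
and therefore (4.6) implies that
$$(y^T A y)\|W\|^2 \le \lambda_1\|W\|^2 \le \lambda_A.$$
To prove the second inequality in (4.5) use the inequality
$$\|y\|^2 \le \|W^{-1}\|^2 \Big\|\sum_{j=1}^{l} z_j\Big\|^2 = l\,\|W^{-1}\|^2$$
to obtain the desired result.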
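A small numerical check of (4.5) may help fix ideas. The sketch below uses a hand-picked $2 \times 2$ example with a unit lower triangular $W$ and a diagonal $B$, takes $l = 1$, and evaluates the three quantities in (4.5). The matrices and helper names are assumptions chosen for illustration; they are not produced by an actual Bunch-Parlett factorization.

```python
import math

def sym_eigs_2x2(S):
    """Eigenvalues (smallest, largest) of a symmetric 2x2 matrix."""
    a, b, c = S[0][0], S[0][1], S[1][1]
    mid, rad = 0.5 * (a + c), math.hypot(0.5 * (a - c), b)
    return mid - rad, mid + rad

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]

def spectral_norm(X):
    # Largest singular value: square root of the top eigenvalue of X^T X.
    return math.sqrt(sym_eigs_2x2(matmul(transpose(X), X))[1])

W = [[1.0, 0.0], [0.5, 1.0]]            # unit lower triangular (assumed)
B = [[-2.0, 0.0], [0.0, 3.0]]           # lambda_1 = -2 with eigenvector e_1
A = matmul(matmul(W, B), transpose(W))  # A = W B W^T

# l = 1: z = z_1 = e_1. Solve W^T y = z by back-substitution.
z = [1.0, 0.0]
y1 = z[1]
y0 = z[0] - W[1][0] * y1
y = [y0, y1]

yAy = sum(y[i] * A[i][j] * y[j] for i in range(2) for j in range(2))
yy = y[0] ** 2 + y[1] ** 2
lam_A = sym_eigs_2x2(A)[0]

W_inv = [[1.0, 0.0], [-W[1][0], 1.0]]   # inverse of the unit lower triangular W
kappa = spectral_norm(W) * spectral_norm(W_inv)

middle = yAy * spectral_norm(W) ** 2    # (y^T A y) ||W||^2
lower = 1 * kappa ** 2 * yAy / yy       # l [kappa_2(W)]^2 y^T A y / ||y||^2
```

Here `lam_A`, `middle`, and `lower` correspond to the left, middle, and right members of (4.5), and the Rayleigh quotient `yAy / yy` is bounded away from zero by a multiple of `lam_A` that depends only on the condition number of $W$.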
If Lemma 4.3 is to be useful, then $W^T y = z$ must be easy to solve. Also, the eigensystem of $B$ must be readily available, and the factorization $A = WBW^T$ should be relatively cheap to compute. These requirements rule out a full eigensystem decomposition of $A$ and also the factorization of Aasen [1] which gives $B$ in tridiagonal form. However, the Bunch-Parlett factorization certainly satisfies all these requirements with the additional feature that $\kappa_2(W)$ has a bound that is independent of $A$.
Finally, with the Bunch-Parlett decomposition, it is very easy to satisfy (4.1) and (4.2). To see this, note that if $\{y_k\}$ is the sequence of negative curvature directions generated by Lemma 4.3, then (4.1) and (4.2) hold for $d_k = y_k$ provided $\{\|W_k\|\}$ and $\{\|W_k^{-1}\|\}$ are bounded. Fletcher and Freeman [6] have suggested a direction of negative curvature which corresponds to $l = m$ in Lemma 4.3, but (4.5) implies that $l = 1$ may be a slightly better choice.
5. A steplength algorithm
Once a descent pair $(s, d)$ has been determined at a point $x$ then we are faced with the problem of determining $\alpha$ such that
$$f(x(\alpha)) \le f(x)$$
where $x(\alpha) = x + \alpha^2 s + \alpha d$. One solution would be to determine $\alpha$ such that
$$f(x(\alpha)) = \min\{f(x(\lambda)) : \lambda \ge 0\} \tag{5.1}$$
but this is a very difficult computational problem. It is computationally more desirable to replace the problem of satisfying (5.1) exactly with the specification of criteria for terminating a univariate minimization procedure that is designed to solve (5.1).
Such an approach is motivated by the success of previous algorithms which
have been used when a single descent direction is specified. Given a descent
direction s at a point x, one such algorithm is to terminate the line search when
an $\alpha$ has been found which satisfies
$$\nabla f(x + \alpha s)^T s \ge \eta\,\nabla f(x)^T s, \tag{5.2}$$
and
$$f(x + \alpha s) \le f(x) + \alpha\mu\,\nabla f(x)^T s \tag{5.3}$$
where $0 < \mu \le \eta < 1$ are preassigned constants. If a sequence of points $\{x_k\}$ is determined where $x_{k+1} = x_k + \alpha_k s_k$ with $x = x_k$, $s = s_k$, $\alpha = \alpha_k$ satisfying (5.2) and (5.3) for each $k$, then it can be shown [8] that
$$\lim_{k \to \infty} \frac{g_k^T s_k}{\|s_k\|} = 0. \tag{5.4}$$
Usually $g_k$ and $s_k$ are related so that (5.4) implies $\|g_k\| \to 0$ which in turn implies $\|s_k\| \to 0$. Thus it is concluded that $\|x_{k+1} - x_k\| \to 0$ and $\|g_k\| \to 0$ as long as the $\alpha_k$ are bounded. This is enough to ensure that $\{x_k\}$ converges to a critical point of $f$ due to the following lemma given in [11, p. 476].
Lemma 5.1. Let $f : R^n \to R$ be continuously differentiable on the compact set $\mathcal{D}_0$, and assume that $f$ has a finite number of critical points in $\mathcal{D}_0$. If $\{x_k\} \subset \mathcal{D}_0$ is a sequence such that
$$\lim_{k \to \infty} \|x_{k+1} - x_k\| = 0, \qquad \lim_{k \to \infty} \|g_k\| = 0,$$
then $\{x_k\}$ converges to some $x^* \in \mathcal{D}_0$ with $\nabla f(x^*) = 0$.
The geometrical interpretation of (5.2) and (5.3) is depicted in Fig. 1. Here $\alpha^*$ is the smallest positive $\alpha$ which satisfies (5.2), and it is clear that (5.2) guarantees that $\alpha^*\|s\|$ is not too small unless $\nabla f(x)$ is also small along the direction $s$. Condition (5.3) forces $f(x + \alpha s)$ to lie below the top line of Fig. 1, and thus guarantees sufficient decrease of the function. Algorithms which use (5.2) and (5.3) as termination criteria for their line searches are further discussed in [8, 11].
[Fig. 1. The curve $f(x(\alpha))$ together with the lines $\alpha\mu g^T s$ and $c + \alpha\eta g^T s$.]
The termination criteria we shall give may be viewed as an extension of these ideas to the situation when an iterate $x_k$ is an indefinite point. We replace (5.2) and (5.3) with the following rule. If $(s, d)$ is a descent pair at $x$, then we terminate the search when $\alpha$ has been found which satisfies
$$\nabla f(x(\alpha))^T x'(\alpha) \ge \eta\left[\nabla f(x)^T d + 2\alpha\left(\nabla f(x)^T s + \tfrac{1}{2} d^T \nabla^2 f(x) d\right)\right], \tag{5.5}$$
$$f(x(\alpha)) \le f(x) + \mu\alpha^2\left[\nabla f(x)^T s + \tfrac{1}{2} d^T \nabla^2 f(x) d\right], \tag{5.6}$$
where $x(\alpha) = x + \alpha^2 s + \alpha d$ and $0 < \mu \le \eta < 1$. Note that when $d = 0$ these conditions reduce to those of (5.2) and (5.3).
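The two termination tests (5.5) and (5.6) are straightforward to evaluate once $f$, its gradient, and the scalar $d^T \nabla^2 f(x) d$ are available. The sketch below is illustrative only: the function name, argument order, default values of $\mu$ and $\eta$, and the test function are assumptions.

```python
def curvilinear_termination(f, grad, x, s, d, dGd, alpha, mu=0.1, eta=0.9):
    """Return (cond_55, cond_56): whether alpha satisfies (5.5) and (5.6).

    dGd is the scalar d^T (Hessian at x) d, supplied by the caller.
    """
    n = len(x)
    g = grad(x)
    g_s = sum(g[i] * s[i] for i in range(n))
    g_d = sum(g[i] * d[i] for i in range(n))
    model = g_s + 0.5 * dGd
    xa = [x[i] + alpha * alpha * s[i] + alpha * d[i] for i in range(n)]
    xprime = [2.0 * alpha * s[i] + d[i] for i in range(n)]   # x'(alpha)
    ga = grad(xa)
    slope = sum(ga[i] * xprime[i] for i in range(n))
    cond_55 = slope >= eta * (g_d + 2.0 * alpha * model)
    cond_56 = f(xa) <= f(x) + mu * alpha * alpha * model
    return cond_55, cond_56

# Illustrative function with a saddle point at the origin (an assumption).
def f(x):
    return x[1] ** 4 - x[1] ** 2 + x[0] ** 2

def grad(x):
    return [2.0 * x[0], 4.0 * x[1] ** 3 - 2.0 * x[1]]

x = [0.0, 0.0]
s = [0.0, 0.0]
d = [0.0, 1.0]
too_short = curvilinear_termination(f, grad, x, s, d, -2.0, 0.1)
accepted = curvilinear_termination(f, grad, x, s, d, -2.0, 0.5)
too_long = curvilinear_termination(f, grad, x, s, d, -2.0, 1.0)
```

In this example a very short step fails (5.5), a very long step fails (5.6), and an intermediate step passes both, which is the behaviour the two conditions are designed to enforce.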
Conditions (5.5) and (5.6) also have a geometrical interpretation as shown in Fig. 2. Note that in this case the function $f(x(\alpha))$ is concave near $\alpha = 0$ because if $(s, d)$ is a descent pair at $x$ and
$$\Phi(\alpha) = f(x + \alpha^2 s + \alpha d),$$
then $\Phi''(0) < 0$. Also note that (5.5) and (5.6) are equivalent to
$$\Phi'(\alpha) \ge \eta\left[\Phi'(0) + \Phi''(0)\alpha\right], \tag{5.7}$$
$$\Phi(\alpha) \le \Phi(0) + \tfrac{1}{2}\mu\,\Phi''(0)\alpha^2. \tag{5.8}$$
The role of the upper and lower curves in Figs. 1 and 2 is similar; the lower curves are one parameter families with the parameter $c$ chosen so that the point of tangency $\alpha^*$ between the curve and $f(x(\alpha))$ is smallest. Note that $\alpha^*$ is also the smallest $\alpha > 0$ which satisfies (5.5), and that if $\alpha^*$ is small, then (5.7) shows that $\Phi''(0)$ must also be small. On the other hand, (5.6) guarantees a sufficient decrease in the function, and if $\alpha^*$ is not small, then (5.8) shows that in this case $\Phi''(0)$ must be small. Thus, in either case $\Phi''(0)$, and hence, $\nabla f(x)^T s$ and $d^T \nabla^2 f(x) d$ are forced to zero. If $s$ and $d$ are properly chosen, then both $\|\nabla f(x)\|$ and the smallest eigenvalue of $\nabla^2 f(x)$ must go to zero and, in particular, the inflection point which occurs to the left of $\alpha^*$ must either be crossed or become "flattened out". These arguments will be made precise by the convergence theorems of Section 6.
[Fig. 2.]
We note with Fletcher and Freeman [6] that if a direction $d$ of negative curvature alone is used (taking $s = 0$), then the condition
$$|\nabla f(x + \alpha d)^T d| \le -\eta\,\nabla f(x)^T d$$
is inappropriate for termination of the linear search because $\nabla f(x)^T d$ may be close to zero even far away from a minimum. They found it necessary to give termination criteria based on an estimate of the first derivative of $f(x(\alpha))$ at the inflection point. The estimate was obtained from the value of the derivative of a related quartic polynomial at its corresponding inflection point.
The following lemma will show that conditions (5.5) and (5.6) can be satisfied whenever a descent pair exists at a point $x$.
Lemma 5.2. Let $\Phi : R \to R$ be twice continuously differentiable in an open interval $I$ which contains the origin, and suppose that
$$\{\alpha \in I : \Phi(\alpha) \le \Phi(0)\}$$
is compact. Let $\mu \in (0, 1)$ and $\eta \in [\mu, 1)$. If $\Phi'(0) < 0$, or if $\Phi'(0) \le 0$ and $\Phi''(0) < 0$, then there is an $\alpha > 0$ in $I$ such that
$$\Phi'(\alpha) \ge \eta\left[\Phi'(0) + \Phi''(0)\alpha\right], \tag{5.9}$$
and
$$\Phi(\alpha) \le \Phi(0) + \mu\left[\Phi'(0)\alpha + \tfrac{1}{2}\Phi''(0)\alpha^2\right]. \tag{5.10}$$
Proof. Let
$$\beta = \sup\{\alpha \in I : \Phi(\alpha) \le \Phi(0)\}.$$
Then $\beta > 0$ since either $\Phi'(0) < 0$, or $\Phi'(0) \le 0$ and $\Phi''(0) < 0$. Moreover, the compactness assumption and the continuity of $\Phi$ imply that $\beta$ is finite and that $\Phi(0) = \Phi(\beta)$. Thus
$$\Phi(\beta) \ge \Phi(0) + \mu\left[\Phi'(0)\beta + \tfrac{1}{2}\Phi''(0)\beta^2\right]. \tag{5.11}$$
Define $h : I \to R$ by
$$h(\alpha) = \Phi(\alpha) - \Phi(0) - \eta\left[\Phi'(0)\alpha + \tfrac{1}{2}\Phi''(0)\alpha^2\right].$$
Since $\mu \le \eta$ we have $h(\beta) \ge 0$. Note also that $h(0) = 0$ and either $h'(0) < 0$, or $h'(0) \le 0$ and $h''(0) < 0$. This, together with the continuity of $h$, implies the existence of $\beta_1 \in (0, \beta]$ such that $h(\beta_1) = 0$, and $h(\alpha) < 0$ for all $\alpha \in (0, \beta_1)$. Now Rolle's theorem shows that there is an $\alpha \in (0, \beta_1)$ with $h'(\alpha) = 0$, and thus (5.9) follows. Moreover, $h(\alpha) < 0$ and $\mu \le \eta$ imply (5.10).
6. Convergence of the modified Newton iteration
Now we turn our attention to defining a modified Newton iteration. We shall give a convergence result based on the use of descent pairs and the step-length algorithm discussed above. The proof proceeds in two parts. The first result is somewhat independent of the definition of the iterates. The second part will use the particular way in which the iterates are defined to establish convergence.
To define the iteration let $(s_k, d_k)$ be a descent pair at $x_k$, and let
$$\Phi_k(\alpha) = f(x_k + \alpha^2 s_k + \alpha d_k). \tag{6.1}$$
If $\mu \in (0, 1)$ and $\eta \in [\mu, 1)$, then $\alpha_k > 0$ is determined such that
$$x_{k+1} = x_k + \alpha_k^2 s_k + \alpha_k d_k \in \mathcal{D}, \tag{6.2}$$
$$f_{k+1} \le f_k + \tfrac{1}{2}\mu\,\Phi_k''(0)\alpha_k^2, \tag{6.3}$$
$$\Phi_k'(\alpha_k) \ge \eta\left[\Phi_k'(0) + \Phi_k''(0)\alpha_k\right]. \tag{6.4}$$
One might note that due to (5.10) in the statement of Lemma 5.2, we could require
$$f_{k+1} \le f_k + \mu\left[\Phi_k'(0)\alpha_k + \tfrac{1}{2}\Phi_k''(0)\alpha_k^2\right]$$
instead of (6.3). However, the additional term does not enhance the convergence result in any way, while it does give a more stringent requirement to be satisfied by the univariate search.
Theorem 6.1. Let $f : R^n \to R$ satisfy Assumption 1.1, and let $\{\|s_k\|\}$ and $\{\|d_k\|\}$ be bounded. If $\{x_k\}$ satisfies (6.2), (6.3), and (6.4), then
$$\lim_{k \to +\infty} g_k^T s_k = 0 \tag{6.5}$$
and
$$\lim_{k \to +\infty} d_k^T G_k d_k = 0. \tag{6.6}$$
Proof. From (6.1) we have $\Phi_k'(0) = g_k^T d_k$ and
$$\Phi_k''(0) = 2 g_k^T s_k + d_k^T G_k d_k.$$
Since $(s_k, d_k)$ is a descent pair, $\Phi_k'(0) \le 0$, and $\Phi_k''(0) < 0$. Thus (6.3) implies that $\{x_k\} \subset L(x_0)$, and by the continuity of $f$ and compactness of $L(x_0)$ we have that $\{f_k - f_{k+1}\}$ converges to zero. Since
$$f_k - f_{k+1} \ge -\tfrac{1}{2}\mu\,\Phi_k''(0)\alpha_k^2 > 0,$$
it follows that
$$\lim_{k \to \infty} \alpha_k^2\, g_k^T s_k = 0, \tag{6.7}$$
and
$$\lim_{k \to \infty} \alpha_k^2\, d_k^T G_k d_k = 0. \tag{6.8}$$
From condition (6.4) we obtain
$$\Phi_k'(\alpha_k) - \Phi_k'(0) - \alpha_k\Phi_k''(0) \ge -(1 - \eta)\left[\Phi_k'(0) + \Phi_k''(0)\alpha_k\right],$$
and hence
$$\Phi_k'(\alpha_k) - \Phi_k'(0) - \alpha_k\Phi_k''(0) \ge -(1 - \eta)\,\Phi_k''(0)\alpha_k.$$
An application of the mean value theorem now yields that for some $\theta_k \in (0, \alpha_k)$,
$$\Phi_k''(\theta_k) - \Phi_k''(0) \ge -(1 - \eta)\,\Phi_k''(0). \tag{6.9}$$
The desired result now follows readily, for if either (6.5) or (6.6) does not hold, then there is a subsequence $\{k_i\}$ and a $\sigma > 0$ such that
$$-\Phi_{k_i}''(0) \ge \sigma > 0. \tag{6.10}$$
Hence (6.9) implies that $\{\alpha_{k_i}\}$ does not converge to zero. However, if $\{\alpha_{k_i}\}$ does not converge to zero and (6.10) holds, then (6.7) and (6.8) cannot be satisfied. This contradiction establishes the theorem.
If we consider practical methods for determining $\alpha_k > 0$ which satisfies (6.3) and (6.4), then we are led to a steplength rule which can be analyzed by a combination of Theorems 3.1 and 6.1.
Steplength rule SR($\mu$, $\eta$, $\beta$). Given a fixed $\beta > 0$, apply a univariate minimization algorithm to $\Phi_k(\alpha)$ and terminate the search if an $\bar\alpha_k \in (0, \beta]$ is found such that (6.2) and (6.4) are satisfied with $\bar\alpha_k$ in place of $\alpha_k$. If these conditions cannot be satisfied, then usually
$$\Phi_k(\beta) = \min\{\Phi_k(\alpha) : \alpha \in [0, \beta]\}, \tag{6.11}$$
(note that this may not hold if $\Phi_k$ is not defined for all $\alpha \in [0, \beta]$) and thus it is reasonable to let $\bar\alpha_k = \beta$. Now, if (6.2) and (6.3) are also satisfied with $\bar\alpha_k$ in place of $\alpha_k$, we accept $\alpha_k = \bar\alpha_k$; otherwise we take $\omega_k$ to be the largest element of the set $\{2^{-i} : i = 0, 1, \ldots\}$ such that (6.2) and (6.3) are satisfied with $\bar\alpha_k\omega_k$ in place of $\alpha_k$, and then accept $\alpha_k = \bar\alpha_k\omega_k$.
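The steplength rule can be sketched as follows for a scalar function $\Phi_k$. Several simplifications are assumptions of this sketch: the univariate minimization algorithm is replaced by a crude scan over an equally spaced grid in $(0, \beta]$, the domain is taken to be the whole space so that (6.2) always holds, and the function name, grid size, and halving cap are hypothetical.

```python
def sr_steplength(phi, dphi, d2phi0, mu, eta, beta, trials=20, max_halvings=60):
    """Sketch of SR(mu, eta, beta) for a scalar function phi with derivative dphi.

    d2phi0 is phi''(0). The grid scan is a stand-in for a univariate
    minimization algorithm; (6.2) is assumed to hold everywhere.
    """
    phi0, dphi0 = phi(0.0), dphi(0.0)
    alpha_bar = beta                               # fallback suggested by (6.11)
    for j in range(1, trials + 1):
        a = beta * j / trials
        if dphi(a) >= eta * (dphi0 + d2phi0 * a):  # condition (6.4)
            alpha_bar = a
            break
    omega = 1.0
    for _ in range(max_halvings):
        a = alpha_bar * omega
        if phi(a) <= phi0 + 0.5 * mu * d2phi0 * a * a:   # condition (6.3)
            return a
        omega *= 0.5
    raise RuntimeError("no acceptable steplength found")

# Phi(a) = a^4 - a^2, as obtained along a negative curvature direction
# of an illustrative quartic (an assumption, not from the paper).
def phi(a):
    return a ** 4 - a ** 2

def dphi(a):
    return 4.0 * a ** 3 - 2.0 * a

alpha = sr_steplength(phi, dphi, d2phi0=-2.0, mu=0.1, eta=0.9, beta=2.0)
```

For this example the scan stops at the third grid point, which already satisfies (6.3), so no halving is needed.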
It should be clear that the proofs of Theorems 3.1 and 6.1 show that (6.5) and (6.6) also hold for SR($\mu$, $\eta$, $\beta$). Our next result will show that the iterates defined by this steplength rule converge to a critical point of $f$ where the Hessian is positive semidefinite. It is here that specific properties of the descent pairs $(s_k, d_k)$ are crucial.
Theorem 6.2. Let $f : R^n \to R$ satisfy Assumption 1.1, and in addition, assume that $f$ has a finite number of critical points in $L(x_0)$. Let $\{\|s_k\|\}$ and $\{\|d_k\|\}$ be bounded, and suppose that
$$g_k^T s_k \to 0 \quad \text{implies} \quad g_k \to 0 \ \text{and} \ s_k \to 0 \tag{6.12}$$
and
$$d_k^T G_k d_k \to 0 \quad \text{implies} \quad \lambda_{G_k} \to 0 \ \text{and} \ d_k \to 0. \tag{6.13}$$
If $\{x_k\}$ satisfies (6.2) where $\alpha_k$ is chosen by SR($\mu$, $\eta$, $\beta$), then $\{x_k\}$ converges to some $x^*$ in $\mathcal{D}$ where $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*)$ is positive semidefinite. Moreover, if infinitely many of the $x_k$ are indefinite points, then $\nabla^2 f(x^*)$ is singular.
Proof. Since (6.5) and (6.6) hold for SR($\mu$, $\eta$, $\beta$), from (6.12) and (6.13) we have that $\{\|s_k\|\}$ and $\{\|d_k\|\}$ converge to zero. Thus
$$\|x_{k+1} - x_k\| \le \beta^2\|s_k\| + \beta\|d_k\|$$
implies that $\{\|x_{k+1} - x_k\|\}$ converges to zero. Therefore Lemma 5.1 applies and we obtain that $\{x_k\}$ converges to some $x^*$ in $\mathcal{D}$ with $\nabla f(x^*) = 0$. Since $\lambda_{G_k} \to 0$ we also have that $\nabla^2 f(x^*)$ must be positive semidefinite. Finally, if infinitely many of the $x_k$ are indefinite points then the continuity of $\nabla^2 f$ implies that $\nabla^2 f(x^*)$ is not positive definite, and hence $\nabla^2 f(x^*)$ must be singular.
Many choices of $s_k$ are possible which satisfy (6.12). Indeed, if $A_k$ is any sequence of symmetric positive definite matrices such that $\{\|A_k\|\}$ and $\{\|A_k^{-1}\|\}$ are bounded, then choosing $s_k$ as the solution of
$$A_k s_k = -g_k$$
will satisfy (6.12). Also, in Section 4 we showed how to choose $d_k$ at an indefinite point so that (4.1) and (4.2) are satisfied. The additional requirement that $d_k$ must satisfy is obtained if we replace $d_k$ with $\pm\phi(\lambda_{G_k})d_k$ where $\phi : R \to R$ is a positive function such that $\phi(t_k) \to 0$ implies that $t_k \to 0$, and where the sign is chosen to make $g_k^T d_k \le 0$.
The iterates should also reduce naturally to Newton's iteration as soon as a
region is found where the Hessian is positive definite. Indeed, the main motivation for this strategy is to obtain the iterates using second derivative information
which is based on the true quadratic model at each xk. Of course, it is expected
that in practice very few indefinite points will be encountered during the iterative
process. In fact, Theorem 6.2 indicates that the strategy we have presented
actively seeks a region where the Hessian matrix is positive semidefinite. If, for
example, the Hessian $\nabla^2 f(x)$ is nonsingular whenever $x$ is a critical point of $f$
then only finitely many of the iterates can be indefinite points.
Finally, we shall suggest a way to obtain the descent pairs $(s_k, d_k)$ which
satisfy all of the requirements of Theorem 6.2. In our description we assume that
$G_k = M_k D_k M_k^T$ is the Bunch-Parlett factorization of the Hessian. Thus we have
omitted explicit representation of the permutations $Q_k$ which will be present in
practice. We obtain $s_k$ as the solution of
$$(M_k \bar{D}_k M_k^T) s = -g_k$$
where $\bar{D}_k = U_k \bar{\Lambda}_k U_k^T$ is obtained from $D_k$ by first computing the eigensystem
$D_k = U_k \Lambda_k U_k^T$ of $D_k$ and then replacing the diagonal elements $\lambda_j^{(k)}$ of $\Lambda_k$ with
$$\max\Bigl\{ |\lambda_j^{(k)}|, \; \epsilon \max_{1 \le i \le n} |\lambda_i^{(k)}| \Bigr\}$$
where $\epsilon$ is the relative machine precision. In the decomposition of $D_k$ we have
$U_k^T U_k = I$ and $\Lambda_k$ diagonal, and note that only $O(n)$ arithmetic operations are
required to obtain $\bar{D}_k$ from $D_k$.
The negative curvature direction $d_k$ is obtained as the solution to
$$M_k^T d_k = \pm |\lambda_k|^{1/2} z_k$$
where $\lambda_k$ is the most negative eigenvalue and $z_k$ the corresponding unit eigenvector of $D_k$. When $D_k$ does not have a negative eigenvalue, we take $d_k = 0$.
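The construction above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' implementation: the factor `M` is taken to be unit lower triangular with the permutations already applied, `blocks` lists the 1x1 and 2x2 diagonal blocks of $D_k$, $D_k$ is assumed nonzero, and all function names are hypothetical. For readability it works with dense length-$n$ vectors, so it does not attain the $O(n)$ operation count mentioned in the text.

```python
import math
import sys


def eig_block(block):
    """Eigenpairs (value, unit vector) of a 1x1 or symmetric 2x2 block."""
    if len(block) == 1:
        return [(block[0][0], [1.0])]
    a, b, c = block[0][0], block[0][1], block[1][1]
    if b == 0.0:
        return [(a, [1.0, 0.0]), (c, [0.0, 1.0])]
    mean, radius = 0.5 * (a + c), math.hypot(0.5 * (a - c), b)
    pairs = []
    for lam in (mean - radius, mean + radius):
        norm = math.hypot(b, lam - a)   # (b, lam - a) solves (block - lam*I) v = 0
        pairs.append((lam, [b / norm, (lam - a) / norm]))
    return pairs


def solve_unit_lower(M, b):
    """Solve M y = b for unit lower triangular M (forward substitution)."""
    y = list(b)
    for i in range(len(b)):
        y[i] -= sum(M[i][j] * y[j] for j in range(i))
    return y


def solve_unit_lower_transpose(M, b):
    """Solve M^T x = b for unit lower triangular M (back substitution)."""
    x = list(b)
    for i in reversed(range(len(b))):
        x[i] -= sum(M[j][i] * x[j] for j in range(i + 1, len(b)))
    return x


def descent_pair(M, blocks, g, eps=sys.float_info.epsilon):
    """Return (s, d) from G = M D M^T following the recipe in the text."""
    n = len(g)
    pairs, start = [], 0
    for block in blocks:                # embed each block eigenvector in R^n
        for lam, v in eig_block(block):
            u = [0.0] * n
            u[start:start + len(v)] = v
            pairs.append((lam, u))
        start += len(block)
    floor = eps * max(abs(lam) for lam, _ in pairs)

    # s solves (M Dbar M^T) s = -g, with eigenvalues replaced by max(|lam|, floor).
    y = solve_unit_lower(M, [-gi for gi in g])
    w = [0.0] * n
    for lam, u in pairs:
        coef = sum(ui * yi for ui, yi in zip(u, y)) / max(abs(lam), floor)
        w = [wi + coef * ui for wi, ui in zip(w, u)]
    s = solve_unit_lower_transpose(M, w)

    # d solves M^T d = +/- |lam_min|^(1/2) z, with the sign making g^T d <= 0.
    lam_min, z = min(pairs, key=lambda p: p[0])
    if lam_min >= 0.0:
        return s, [0.0] * n
    d = solve_unit_lower_transpose(M, [math.sqrt(-lam_min) * zi for zi in z])
    if sum(gi * di for gi, di in zip(g, d)) > 0.0:
        d = [-di for di in d]
    return s, d


# Small made-up example: G = M D M^T = [[2, 1], [1, -0.5]] is indefinite.
M = [[1.0, 0.0], [0.5, 1.0]]
blocks = [[[2.0]], [[-1.0]]]
g = [1.0, 1.0]
s, d = descent_pair(M, blocks, g)
```

In this example both $g^T s$ and $g^T d$ are nonpositive and $d^T G d = -1$, so $(s, d)$ has the sign properties expected of a descent pair at an indefinite point.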
If $f : R^n \to R$ and $x_0$ satisfy the assumptions of Theorem 6.2, then the
compactness of $L(x_0)$ and the continuity of $\nabla^2 f$ imply that $\{G_k\}$ and $\{g_k\}$ are
uniformly bounded. Thus $(s_k, d_k)$ satisfy the requirements of a descent pair as
well as (6.12) and (6.13).
The above choice of $(s_k, d_k)$ is somewhat ad hoc and we make no mathematical
statements concerning the desirability of this choice. However, computational
results show that this specification of $(s_k, d_k)$ works reasonably well in practice.
We wish to emphasize that many other choices are possible.
We have not addressed the problem of providing an initial step $\alpha$ to the
univariate search. Many strategies for determining the initial step are possible.
However, we have not found a strategy with enough theoretical basis to
recommend it over something very simple such as taking the initial step to be
$\alpha = 1$ each time. Note, however, that whatever strategy is chosen must eventually take $\alpha = 1$ in order to retain the local quadratic rate of convergence
enjoyed by Newton's method.
7. Concluding remarks
It is possible to generalize Theorems 3.1 and 6.1 so that they apply to more
general curves. Consider
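$$\Phi_k(\alpha) = f(x_k + \phi_1(\alpha) s_k + \phi_2(\alpha) d_k) \tag{7.1}$$
where $(s_k, d_k)$ is a descent pair, and $\phi_1$ and $\phi_2$ are such that $\Phi_k'(0) \le 0$ and
$\Phi_k''(0) < 0$. Instead of (3.1) and (3.2), we can define $x_{k+1}$ by choosing the smallest
nonnegative integer $i$ such that
$$x_{k+1} = x_k + \phi_1(\gamma^i) s_k + \phi_2(\gamma^i) d_k \in \mathcal{D},$$
$$f_{k+1} \le f_k + \mu \left[ \Phi_k'(0) \gamma^i + \tfrac{1}{2} \Phi_k''(0) \gamma^{2i} \right]. \tag{7.2}$$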
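The backtracking rule (7.2) can be sketched as follows. This is an illustration only, with several assumptions that are not taken from the paper: the curve defaults to $\phi_1(\alpha) = \alpha^2$ and $\phi_2(\alpha) = \alpha$, the domain is all of $R^n$ so the membership test is dropped, the values $\gamma = 0.5$ and $\mu = 10^{-4}$ are arbitrary, and the function name and the saddle-shaped test problem are hypothetical.

```python
def curvilinear_backtrack(f, x, s, d, dphi0, d2phi0,
                          phi1=lambda a: a * a, phi2=lambda a: a,
                          gamma=0.5, mu=1e-4, max_backtracks=60):
    """Return (x_next, i) for the smallest i >= 0 passing the test in (7.2)."""
    fx = f(x)
    for i in range(max_backtracks):
        a = gamma ** i
        trial = [xj + phi1(a) * sj + phi2(a) * dj for xj, sj, dj in zip(x, s, d)]
        if f(trial) <= fx + mu * (dphi0 * a + 0.5 * d2phi0 * a * a):
            return trial, i
    raise RuntimeError("no acceptable step found")


# Hypothetical test problem with a saddle point at the origin.
def f(x):
    return 0.5 * x[0] ** 2 - 0.5 * x[1] ** 2 + 0.25 * x[1] ** 4


x = [1.0, 0.1]
g = [x[0], -x[1] + x[1] ** 3]              # gradient at x
G = [1.0, -1.0 + 3.0 * x[1] ** 2]          # diagonal of the Hessian at x (indefinite)
s = [-g[0] / abs(G[0]), -g[1] / abs(G[1])]  # descent direction from |G|
d = [0.0, abs(G[1]) ** 0.5]                # scaled negative curvature direction
dphi0 = g[0] * d[0] + g[1] * d[1]          # Phi'(0) = g^T d for this curve
d2phi0 = G[1] * d[1] ** 2 + 2.0 * (g[0] * s[0] + g[1] * s[1])  # Phi''(0) = d^T G d + 2 g^T s
x_next, i = curvilinear_backtrack(f, x, s, d, dphi0, d2phi0)
```

Because $\Phi_k'(0) \le 0$ and $\Phi_k''(0) < 0$ here, any accepted step strictly decreases $f$.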
Theorem 3.1 generalizes, and instead of (3.3) and (3.4) we obtain
$$\lim_{k \to +\infty} \Phi_k'(0) = 0 \quad\text{and}\quad \lim_{k \to +\infty} \Phi_k''(0) = 0. \tag{7.3}$$
Similarly, to generalize Theorem 6.1 we replace (6.2) and (6.3) by
$$x_{k+1} = x_k + \phi_1(\alpha_k) s_k + \phi_2(\alpha_k) d_k \in \mathcal{D},$$
$$f_{k+1} \le f_k + \mu \left[ \Phi_k'(0) \alpha_k + \tfrac{1}{2} \Phi_k''(0) \alpha_k^2 \right], \tag{7.4}$$
respectively, but leave (6.4) unchanged except that now $\Phi_k$ is defined by (7.1).
Once again, the conclusion is that (7.3) holds.
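For reference, the derivatives at zero that appear in these conditions follow from the chain rule. The sketch below assumes that $f$ is twice continuously differentiable, that $\phi_1$ and $\phi_2$ are twice differentiable with $\phi_1(0) = \phi_2(0) = 0$, and writes $p_k = \phi_1'(0) s_k + \phi_2'(0) d_k$ (a name introduced here). The last line specializes to $\phi_1(\alpha) = \alpha^2$, $\phi_2(\alpha) = \alpha$, taken as an illustrative choice.

```latex
\begin{align*}
\Phi_k'(0)  &= \phi_1'(0)\, g_k^T s_k + \phi_2'(0)\, g_k^T d_k, \\
\Phi_k''(0) &= p_k^T G_k p_k + \phi_1''(0)\, g_k^T s_k + \phi_2''(0)\, g_k^T d_k, \\
\phi_1(\alpha) = \alpha^2,\ \phi_2(\alpha) = \alpha:\quad
\Phi_k'(0)  &= g_k^T d_k, \qquad \Phi_k''(0) = d_k^T G_k d_k + 2\, g_k^T s_k .
\end{align*}
```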
The analysis of the algorithm presented in this paper did not require the term
$\Phi_k'(0)$ in (7.2) and (7.4). However, Mukai and Polak [10] have proposed an
algorithm which requires these terms. To describe their algorithm at an indefinite
point, let $e_k$ be a unit eigenvector corresponding to the smallest eigenvalue of $G_k$
such that $g_k^T e_k \le 0$. Now choose $\phi_1(\alpha) = \phi_2(\alpha) = \alpha$ and $s_k = -\lambda_k g_k$, $d_k = \lambda_k e_k$
where if $h_k = -g_k + e_k$, then $\lambda_k = 1$ whenever $h_k^T G_k h_k \le 0$, and otherwise $\lambda_k = \beta^{i_k}$
where $\beta \in (0, 1)$ and $i_k \ge 0$ is the smallest integer such that
•
gXkhk
It is not too difficult to show that for this choice of descent pair, (7.3) implies
that $\{\|g_k\|\}$ and $\{e_k^T G_k e_k\}$ converge to zero. Thus, the results of Mukai and Polak [10]
can be obtained by applying the above generalization of Theorem 3.1 to their
algorithm, and in fact, our results also cover the much simpler variation of their
algorithm where $\lambda_k \equiv 1$.
It is also worth noting that Theorems 3.1 and 6.2 hold if, instead of
assumptions (1.3), we assume that $f : R^n \to R$ is bounded below on $L(x_0)$, and $\nabla^2 f$
is uniformly continuous and bounded on some convex set which contains $L(x_0)$.
Of course, the ultimate test of any theory is whether it yields a robust
algorithm. Sorensen [12] has implemented a version of the algorithm described at
the end of Section 6, and has obtained excellent numerical results. We will only
present here the results for one problem function.
Box's function
$$f(x) = \sum_{i=1}^{10} \left\{ e^{-x_1 \delta_i} - e^{-x_2 \delta_i} - x_3 \left( e^{-\delta_i} - e^{-10 \delta_i} \right) \right\}^2$$
where $\delta_i = i/10$.
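For readers who want to experiment, the test function can be written directly from the formula. This is a minimal sketch; the function name `box_function` and the list-based argument are choices made here, not part of the original test package.

```python
import math


def box_function(x, m=10):
    """Box's three-variable test function: a sum of m squared residuals."""
    total = 0.0
    for i in range(1, m + 1):
        delta = i / 10.0
        residual = (math.exp(-x[0] * delta) - math.exp(-x[1] * delta)
                    - x[2] * (math.exp(-delta) - math.exp(-10.0 * delta)))
        total += residual * residual
    return total


value_at_start = box_function([0.0, 20.0, 20.0])   # starting point quoted in the text
value_at_zero = box_function([1.0, 10.0, 1.0])     # every residual cancels here
```

Direct substitution shows that every residual vanishes at $x = (1, 10, 1)$, so the function value there is zero, which gives a quick sanity check on an implementation.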
The results obtained for this function are fairly typical. When the algorithm
was used with the standard starting point x0 = (0, 20, 20), only two indefinite
points were encountered. In general, the use of standard starting points does not
fully reveal the performance of this algorithm, because some of the standard
starts are in regions such that little or no negative curvature is encountered
during the iteration. However, when started from a set of ten random starting
points, many more indefinite points were encountered. These results are summarized in Table 1 where NEGCNT is the number of indefinite points encountered during the iteration. Note that for each starting point there are two entries.
The first is from Sorensen's algorithm, while the second is the result of a version
Table 1
#
1
2
3
4
5
6
7
NITER
NFEV
FINAL $\|g\|_2$
NEGCNT
25
36
1 x 10-25
21
24
140
4 × 10-8
16
17
2×
36
97
1 x 10-24
14
15
3 x 10-31
27
70
5 X 10 -27
20
26
1 × 10-20
20
20
6 x 10-33
26
42
0.0
37
118
5 x 10-25
22
41
1×
10 -32
17
26
64
1 x 10-2z
18
20
33
1 × 10-22
18
19
26
1×
18
22
4 x 10-2s
9
9
2 × 10TM
16
20
5 × 10-25
14
14
1 × 10-28
12
16
8X
33
95
2 × 10-25
10 TM
3
3
2
22
10 -27
3
8
1
9
10 -32
7
lO
of Gill and Murray's [7] modified Newton method. Sorensen's implementation
used the Bunch-Kaufman partial pivoting algorithm [4] rather than the
Bunch-Parlett complete pivoting algorithm. With this factorization we do not, in theory,
have boundedness on $\kappa_2(M_k)$. However, in practice we do not expect to encounter growth in the elements of $M_k$.
Finally, we note that the computations were done on the IBM 370/195 at
Argonne National Laboratory in double precision (14 hexadecimal digits or
about 15 decimal digits) under the FORTRAN H (opt = 2) compiler. Also, both
algorithms reported positive definite Hessians at the solution, and comparable
convergence criteria were specified for both algorithms.
Acknowledgment
We would like to thank Ken Hillstrom for allowing us to use his testing package,
and Judy Beumer for her super-duper typing of the manuscript.
References
[1] J.O. Aasen, "On the reduction of a symmetric matrix to tridiagonal form", BIT 11 (1971) 233-242.
[2] L. Armijo, "Minimization of functions having Lipschitz continuous first partial derivatives", Pacific Journal of Mathematics 16 (1966) 1-3.
[3] J.R. Bunch and B.N. Parlett, "Direct methods for solving symmetric indefinite systems of linear equations", SIAM Journal on Numerical Analysis 8 (1971) 639-655.
[4] J.R. Bunch and L. Kaufman, "Some stable methods for calculating inertia and solving symmetric linear equations", Mathematics of Computation 31 (1977) 163-179.
[5] A.V. Fiacco and G.P. McCormick, Nonlinear programming: sequential unconstrained minimization techniques (Wiley, New York, 1968).
[6] R. Fletcher and T.L. Freeman, "A modified Newton method for minimization", Journal of Optimization Theory and Applications 23 (1977) 357-372.
[7] P.E. Gill and W. Murray, "Newton type methods for unconstrained and linearly constrained optimization", Mathematical Programming 7 (1974) 311-350.
[8] P.E. Gill and W. Murray, "Safeguarded steplength algorithms for optimization using descent methods", National Physical Laboratory, Rep. NAC37 (1974).
[9] G. McCormick, "A modification of Armijo's step-size rule for negative curvature", Mathematical Programming 13 (1977) 111-115.
[10] H. Mukai and E. Polak, "A second order method for unconstrained optimization", Journal of Optimization Theory and Applications (1978), to appear.
[11] J.M. Ortega and W.C. Rheinboldt, Iterative solution of nonlinear equations in several variables (Academic Press, New York, 1970).
[12] D.C. Sorensen, "Updating the symmetric indefinite factorization with applications in a modified Newton's method", Ph.D. thesis, University of California at San Diego, Argonne National Laboratory Rep. ANL-77-49 (1977).
[13] P. Wolfe, "Convergence conditions for ascent methods II: Some corrections", SIAM Review 13 (1971) 185-188.