On the use of directions of negative curvature in a modified Newton method

Mathematical Programming 16 (1979) 1-20. North-Holland Publishing Company

ON THE USE OF DIRECTIONS OF NEGATIVE CURVATURE IN A MODIFIED NEWTON METHOD*

Jorge J. MORÉ
Argonne National Laboratory, Argonne, IL, U.S.A.

Danny C. SORENSEN
University of Kentucky, Lexington, KY, U.S.A.

Received 17 October 1977
Revised manuscript received 6 June 1978

We present a modified Newton method for the unconstrained minimization problem. The modification occurs in non-convex regions where the information contained in the negative eigenvalues of the Hessian is taken into account by performing a line search along a path which is initially tangent to a direction of negative curvature. We give termination criteria for the line search and prove that the resulting iterates are guaranteed to converge, under reasonable conditions, to a critical point at which the Hessian is positive semidefinite. We also show how the Bunch and Parlett decomposition of a symmetric indefinite matrix can be used to give entirely adequate directions of negative curvature.

Key words: Unconstrained Optimization, Modified Newton's Method, Descent Pairs, Directions of Negative Curvature, Symmetric Indefinite Factorization, Steplength Algorithm.

1. Introduction

Let $f: \mathbb{R}^n \to \mathbb{R}$ be a continuously differentiable function in the open set $\mathcal{D}$, and consider the problem of producing a sequence $\{x_k\}$ that converges to a local minimizer $x^*$ of $f$. That is, we seek $x^*$ such that

$$ f(x^*) \le f(x), \qquad x \in N \cap \mathcal{D}, \tag{1.1} $$

with $N$ some neighborhood of $x^*$. Algorithms for the solution of (1.1) are usually descent methods. A descent method determines a direction $s_k$ at the iterate $x_k$ such that $\nabla f(x_k)^T s_k < 0$. A line search then yields a steplength $\alpha_k > 0$ such that $f(x_k + \alpha_k s_k) < f(x_k)$, and thus it is sensible to let $x_{k+1} = x_k + \alpha_k s_k$.
Under some additional restrictions on the choice of $\alpha_k$ one can show that

$$ \lim_{k \to \infty} \frac{\nabla f(x_k)^T s_k}{\|s_k\|} = 0. \tag{1.2} $$

Moreover, the vector $s_k$ is usually related to $\nabla f(x_k)$ in such a way that (1.2) implies that $\{\|\nabla f(x_k)\|\}$ converges to zero. Thus, every limit point $x^*$ of $\{x_k\}$ satisfies $\nabla f(x^*) = 0$.

It is desirable to produce a sequence which converges to a point $x^*$ with $\nabla^2 f(x^*)$ positive definite. This would imply that $x^*$ is an isolated local minimizer of $f$, and in particular, that $x^*$ satisfies (1.1). In general, it is not possible to produce such a sequence. However, through the use of directions of negative curvature we shall be able to produce a sequence which converges to a point $x^*$ with $\nabla^2 f(x^*)$ positive semidefinite. For practical purposes, this is a very strong assertion. For instance, if the Hessian were known to be nonsingular at all critical points, then $x^*$ would have to be a local minimizer.

To see that it is not, in general, theoretically possible to guarantee that $\nabla^2 f(x^*)$ is positive definite, consider an example of Wolfe [13]. In that example, steepest descent (and, in fact, any reasonable descent method) converges to a saddle point at which the Hessian is singular. Since the algorithm approaches the saddle point through a region in which $f$ is strictly convex, there is no possible way to avoid this saddle point.

The idea of using directions of negative curvature appeared as early as 1968 [5, pp. 165-169], but recently there has been renewed interest [6, 7, 9, 10]. We are particularly indebted to the paper of McCormick [9]. In that paper, McCormick showed how a modification of the Armijo line search could be used with directions of negative curvature.

* Work performed under the auspices of the U.S. Department of Energy.
Our Theorem 3.1 is a slight modification of McCormick's main result; its purpose is to isolate the main ingredients of McCormick's paper. In this paper we first extend McCormick's work by considering the practical generation of directions of negative curvature. We discuss two methods in Section 4. One is based on Gill and Murray's [7] modified Cholesky factorization, and the other on Bunch and Parlett's [3] factorization of symmetric indefinite matrices. We show that Gill and Murray's method does not satisfy the requirements of our convergence theorems, but that the Bunch and Parlett factorization can be used to give entirely adequate directions of negative curvature. In Section 5 we again extend McCormick's work by replacing the Armijo line search by a general line search with satisfactory termination criteria. The results of this section provide a theoretically justified alternative to Fletcher and Freeman's [6] ad hoc line search. Finally, in Section 6 we present our convergence results. In particular, we show how the line search can be used together with our directions of negative curvature to provide a very effective modified Newton method.

Assumption 1.1. Let $f: \mathbb{R}^n \to \mathbb{R}$ have two continuous derivatives on the open set $\mathcal{D}$, and assume that for some $x_0$ in $\mathcal{D}$, the level set $L(x_0) = \{x \in \mathcal{D} : f(x) \le f(x_0)\}$ is a compact subset of $\mathcal{D}$.

Notation 1.2. In all cases $\|\cdot\|$ refers to the $l_2$ vector norm or to the induced operator norm. The gradient and Hessian of $f$ at $x$ are denoted by $\nabla f(x)$ and $\nabla^2 f(x)$, respectively, but if we have a sequence of vectors $\{x_k\}$, then $f_k$, $g_k$ and $G_k$ are used instead of $f(x_k)$, $\nabla f(x_k)$ and $\nabla^2 f(x_k)$, respectively.

2. Descent directions

The search strategy we present departs from the usual strategies discussed in the literature.
Instead of using only one descent direction and searching in a line determined by that direction, we search along a curve of the form

$$ C = \{x(\alpha) : x(\alpha) = x + \phi_1(\alpha)s + \phi_2(\alpha)d,\ \alpha \ge 0\}, \tag{2.1} $$

with $(s, d)$ a descent pair at $x$, and with $\phi_1(0) = \phi_2(0) = 0$.

Definition 2.1. Let $f: \mathbb{R}^n \to \mathbb{R}$ be twice differentiable in the open set $\mathcal{D}$.
(a) A point $x$ in $\mathcal{D}$ is an indefinite point if $\nabla^2 f(x)$ has at least one negative eigenvalue.
(b) If $x$ is an indefinite point, then $d$ is a direction of negative curvature if $d^T \nabla^2 f(x) d < 0$.
(c) A pair of vectors $(s, d)$ is a descent pair at the indefinite point $x$ if $\nabla f(x)^T s \le 0$, $\nabla f(x)^T d \le 0$, and $d^T \nabla^2 f(x) d < 0$. If $x$ is not an indefinite point, then we require $\nabla f(x)^T s < 0$, $\nabla f(x)^T d \le 0$, and $d^T \nabla^2 f(x) d = 0$.

If $x$ is an indefinite point, then an example of a descent pair is $s = -\nabla f(x)$ and $d = \pm e$, where $e$ is an eigenvector corresponding to a negative eigenvalue of $\nabla^2 f(x)$ and the sign is chosen so that $\nabla f(x)^T d \le 0$. If $x$ is not an indefinite point and $\nabla f(x) \ne 0$, then we can take $s = -\nabla f(x)$ and $d = 0$. Note that a descent pair fails to exist if and only if $\nabla f(x) = 0$ and $\nabla^2 f(x)$ is positive semidefinite.

Given a descent pair $(s, d)$ at $x$, we want to produce an $\alpha > 0$ such that $f(x(\alpha)) < f(x)$. If we let $\Phi(\alpha) = f(x(\alpha))$, we encounter a univariate minimization problem where $\Phi''$ is continuous as long as $\phi_1''$ and $\phi_2''$ are continuous. If $\Phi'(0) < 0$, or if $\Phi'(0) = 0$ and $\Phi''(0) < 0$, then it is clear that there is an $\bar\alpha > 0$ such that

$$ f(x(\alpha)) < f(x), \qquad \alpha \in (0, \bar\alpha]. \tag{2.2} $$

The following lemma improves on this result.

Lemma 2.2. Let $\Phi: \mathbb{R} \to \mathbb{R}$ be twice continuously differentiable on the open interval $I$ which contains the origin, and assume that $\mu \in (0, 1)$.
Then there is an $\bar\alpha > 0$ in $I$ such that

$$ \Phi(\alpha) \le \Phi(0) + \mu\left[\Phi'(0)\alpha + \tfrac12\Phi''(0)\alpha^2\right] $$

for all $\alpha \in [0, \bar\alpha]$, provided that either $\Phi'(0) < 0$, or $\Phi'(0) = 0$ and $\Phi''(0) < 0$.

Proof. The mean value theorem implies that for every $\alpha > 0$ there exists $\theta \in (0, \alpha)$ such that

$$ \Phi(\alpha) = \Phi(0) + \Phi'(0)\alpha + \tfrac12\Phi''(0)\alpha^2 + \tfrac12\left[\Phi''(\theta) - \Phi''(0)\right]\alpha^2. $$

Hence, $\Phi(\alpha) = \Phi(0) + \mu[\Phi'(0)\alpha + \frac12\Phi''(0)\alpha^2] + r(\alpha)$, where

$$ r(\alpha) = (1 - \mu)\left[\Phi'(0)\alpha + \tfrac12\Phi''(0)\alpha^2\right] + \tfrac12\left[\Phi''(\theta) - \Phi''(0)\right]\alpha^2. $$

Since $\Phi''$ is continuous, the last term is $o(\alpha^2)$, so that

$$ \limsup_{\alpha \to 0^+} \frac{r(\alpha)}{\alpha^2} < 0, $$

and therefore there exists an $\bar\alpha > 0$ such that $r(\alpha) < 0$ for all $\alpha \in (0, \bar\alpha]$.

This lemma states that the ratio of the reduction in the function to the reduction in the quadratic approximation $\Phi(0) + \Phi'(0)\alpha + \frac12\Phi''(0)\alpha^2$ is bounded below by $\mu$. It also tells us that (2.2) can be satisfied, and that a larger decrease is likely when $\Phi''(0) < 0$.

We want to use the simplest functions $\phi_1$ and $\phi_2$ which will guarantee that the hypothesis of Lemma 2.2 is satisfied. Observe that if $\Phi(\alpha) = f(x(\alpha))$ with $x(\alpha)$ as in (2.1), then

$$ \Phi'(0) = \nabla f(x)^T\left(\phi_1'(0)s + \phi_2'(0)d\right), \tag{2.3} $$

$$ \Phi''(0) = \nabla f(x)^T\left(\phi_1''(0)s + \phi_2''(0)d\right) + \left(\phi_1'(0)s + \phi_2'(0)d\right)^T \nabla^2 f(x)\left(\phi_1'(0)s + \phi_2'(0)d\right). \tag{2.4} $$

Suppose now that $\nabla f(x)^T s = 0$ at an indefinite point (this occurs, for instance, at a saddle point). Then in order to ensure that $\Phi'(0) \le 0$ and $\Phi''(0) < 0$ without imposing further conditions on $s$, we must require that $\phi_1'(0) = 0$, $\phi_2'(0) > 0$, and $\phi_2''(0) \ge 0$. Then (2.3) and (2.4) simplify to

$$ \Phi'(0) = \nabla f(x)^T\left(\phi_2'(0)d\right), \tag{2.5} $$

$$ \Phi''(0) = \nabla f(x)^T\left(\phi_1''(0)s + \phi_2''(0)d\right) + \left(\phi_2'(0)d\right)^T \nabla^2 f(x)\left(\phi_2'(0)d\right). \tag{2.6} $$

When $\nabla^2 f(x)$ is positive definite, $d = 0$ must hold in order for $(s, d)$ to be a descent pair. Thus $\Phi'(0) = 0$, and we must have $\phi_1''(0) > 0$ in order to ensure $\Phi''(0) < 0$. Therefore, if

$$ \phi_1(\alpha) = \sum_{i=0}^{\infty} \beta_i\alpha^i, \qquad \phi_2(\alpha) = \sum_{i=0}^{\infty} \gamma_i\alpha^i, $$

then we must have $\beta_0 = \beta_1 = 0$ with $\beta_2 > 0$, and $\gamma_0 = 0$ with $\gamma_1 > 0$. The simplest functions of this type are $\phi_1(\alpha) = \alpha^2$ and $\phi_2(\alpha) = \alpha$.
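The choice $\phi_1(\alpha) = \alpha^2$, $\phi_2(\alpha) = \alpha$ and formulas (2.3)-(2.4) can be checked numerically. The following is a minimal Python/NumPy sketch, not code from the paper: it builds the example descent pair given after Definition 2.1 from a full eigendecomposition (a costly choice, used here only for illustration), evaluates $\Phi'(0)$ and $\Phi''(0)$, and compares $\Phi''(0)$ with a finite difference. The function names `eigen_descent_pair` and `curve_derivatives` and the test function $f(x) = \tfrac12(x_1^2 - x_2^2)$ are hypothetical choices made for the example.

```python
import numpy as np


def eigen_descent_pair(g, H):
    """Descent pair (s, d) from the example after Definition 2.1.

    s = -g and, at an indefinite point, d = +/- a unit eigenvector for the most
    negative eigenvalue of H, signed so that g^T d <= 0. Otherwise d = 0, which
    gives a descent pair only when g != 0.
    """
    lam, V = np.linalg.eigh(H)        # eigenvalues in ascending order
    s = -g
    if lam[0] < 0.0:                  # indefinite point
        d = V[:, 0].copy()
        if g @ d > 0.0:
            d = -d
    else:
        d = np.zeros_like(g)
    return s, d


def curve_derivatives(g, H, s, d):
    """Phi'(0) and Phi''(0) from (2.3)-(2.4) with phi1(a) = a**2, phi2(a) = a."""
    dphi0 = g @ d                          # phi1'(0) = 0, phi2'(0) = 1
    d2phi0 = 2.0 * (g @ s) + d @ H @ d     # phi1''(0) = 2, phi2''(0) = 0
    return dphi0, d2phi0


# Hypothetical test problem: f(x) = (x1**2 - x2**2) / 2 at the indefinite point x = (1, 0).
x = np.array([1.0, 0.0])
g = np.array([x[0], -x[1]])           # gradient of f at x
H = np.diag([1.0, -1.0])              # Hessian of f (constant)
s, d = eigen_descent_pair(g, H)
print(curve_derivatives(g, H, s, d))
```

At $x = (1, 0)^T$ we have $\nabla f(x)^T d = 0$, so $\Phi'(0) = 0$, yet $\Phi''(0) = 2\nabla f(x)^T s + d^T \nabla^2 f(x) d = -3 < 0$: the curve still yields decrease, which is exactly the second case covered by Lemma 2.2.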
3. A modification of the Armijo steplength procedure

The arguments of Section 2 lead to iterations of the form $x_{k+1} = x_k + \alpha_k^2 s_k + \alpha_k d_k$, where $\alpha_k$ is chosen so that at least $f_{k+1} < f_k$. There are several ways to choose the steplength $\alpha_k$, and in this section we present a generalization of a result of McCormick [9] in which $\alpha_k$ is chosen by a modification of the Armijo steplength procedure [2].

To describe the steplength algorithm, let $\gamma, \mu \in (0, 1)$ and $x_0 \in \mathbb{R}^n$ be given. If $(s_k, d_k)$ is a descent pair at $x_k$, then $x_{k+1}$ is defined by choosing the smallest non-negative integer $i$ such that

$$ x_{k+1} = x_k + \gamma^{2i}s_k + \gamma^i d_k \in \mathcal{D}, \tag{3.1} $$

$$ f_{k+1} \le f_k + \mu\gamma^{2i}\left[g_k^T s_k + \tfrac12 d_k^T G_k d_k\right]. \tag{3.2} $$

Lemma 2.2 shows that the iterates are well defined, and if a descent pair does not exist at $x_k$, then we accept $x_k$ as a solution to (1.1).

Theorem 3.1. Let $f: \mathbb{R}^n \to \mathbb{R}$ satisfy Assumption 1.1, and suppose that $\{\|s_k\|\}$ and $\{\|d_k\|\}$ are bounded. If $\{x_k\}$ satisfies (3.1) and (3.2), then

$$ \lim_{k \to \infty} g_k^T s_k = 0 \tag{3.3} $$

and

$$ \lim_{k \to \infty} d_k^T G_k d_k = 0. \tag{3.4} $$

Proof. The sequence $\{f_k\}$ is decreasing and bounded below due to the continuity of $f$ and compactness of $L(x_0)$. Thus $\{f_k - f_{k+1}\}$ converges to zero. If $i_k$ is the smallest non-negative integer such that (3.1) and (3.2) hold, then there are two cases to consider.

Case 1. Suppose the integers $\{i_k\}$ are bounded above by some $m \ge 0$. Then $f_k - f_{k+1} \ge -\mu\gamma^{2m}[g_k^T s_k + \frac12 d_k^T G_k d_k]$. Since $-g_k^T s_k \ge 0$ and $-d_k^T G_k d_k \ge 0$, the conclusion follows.

Case 2. The integers $\{i_k\}$ are not bounded above. Without loss of generality, assume that $\lim_{k \to +\infty} i_k = +\infty$. If we define $\Phi_k(\alpha) = f(x_k + \alpha^2 s_k + \alpha d_k)$ and $\sigma_k = \gamma^{i_k - 1}$, then by the definition of $i_k$,

$$ \Phi_k(\sigma_k) > f_k + \mu\sigma_k^2\left[g_k^T s_k + \tfrac12 d_k^T G_k d_k\right]. \tag{3.5} $$
However, due to our assumptions on $f$ and $L(x_0)$, a Taylor series argument and the fact that $g_k^T d_k \le 0$ may be used to show that

$$ \Phi_k(\sigma_k) \le f_k + \sigma_k^2\left[g_k^T s_k + \tfrac12 d_k^T G_k d_k\right] + r(x_k, s_k, d_k, \sigma_k) \tag{3.6} $$

with

$$ \lim_{k \to +\infty} \frac{r(x_k, s_k, d_k, \sigma_k)}{\sigma_k^2} = 0. \tag{3.7} $$

Hence, combining (3.5) and (3.6) gives

$$ \frac{r(x_k, s_k, d_k, \sigma_k)}{\sigma_k^2} > -(1 - \mu)\left[g_k^T s_k + \tfrac12 d_k^T G_k d_k\right]. \tag{3.8} $$

The conclusion follows from (3.7) and (3.8), since both terms in the bracket are nonpositive.

The result presented by McCormick [9] did not specify a choice of $x_{k+1}$ when $x_k$ was not an indefinite point, and only required $f_{k+1} \le f_k$. In the case that $x_k$ is an indefinite point, McCormick chooses $x_{k+1}$ according to (3.1) and (3.2) with $\gamma^2 = \frac12$. In addition, the vectors $s_k$ and $d_k$ must satisfy

$$ \|s_k\| \ge c_3\|g_k\|, \tag{3.9a} $$

$$ d_k^T G_k d_k \le c_2\lambda_{G_k}, \tag{3.9b} $$

$$ -s_k^T g_k \ge c_1\|s_k\|\,\|g_k\|, \tag{3.9c} $$

where $\lambda_{G_k}$ is the most negative eigenvalue of $G_k$, and $c_1$, $c_2$, and $c_3$ are positive constants. More specific choices of $s_k$ and $d_k$ were not suggested. With these assumptions, McCormick is able to conclude that if infinitely many indefinite points were to occur in the sequence $\{x_k\}$, then any point of accumulation $x^*$ of the sequence must satisfy $\nabla f(x^*) = 0$, and $\nabla^2 f(x^*)$ is positive semidefinite with at least one zero eigenvalue.

Since Armijo-type steplength procedures do not take into account any information about the shape of the function along the curve $x(\alpha)$, we are interested in investigating more sophisticated strategies for determining the steplength $\alpha_k$. Thus, in the rest of this paper we shall be concerned with a steplength procedure which specifies criteria for terminating a univariate search along curves $x(\alpha)$, and with specific choices for $(s_k, d_k)$. Finally, a convergence result will be given which indicates that these choices are quite reasonable.

4. Determining directions of negative curvature

If a direction of negative curvature is to be useful, then it must satisfy at least two requirements:

$$ \{\|d_k\|\} \text{ bounded}, \tag{4.1} $$

$$ d_k^T G_k d_k \to 0 \text{ implies } \lambda_{G_k} \to 0. \tag{4.2} $$
Here $\lambda_{G_k}$ is the most negative eigenvalue of $G_k$ when $x_k$ is an indefinite point, and zero otherwise. Note that McCormick's condition (3.9b) is of this type. Both of these conditions are required by our convergence theorems; intuitively, they force the iterates towards a region of convexity for $f$. It is possible to satisfy (4.1) and (4.2) quite easily if we have the eigenvalue and eigenvector decomposition of $G_k$. However, this decomposition is quite costly, and thus we examine other matrix factorizations. In particular, we discuss in some detail Gill and Murray's modified Cholesky factorization [7], and Bunch and Parlett's factorization [3].

Given a symmetric matrix $A$ and parameters $\delta \ge 0$ and $\beta > 0$, Gill and Murray's algorithm produces a unit lower triangular matrix $L$ and diagonal matrices $D = \mathrm{diag}(d_i)$ and $E = \mathrm{diag}(e_i)$ with $d_i \ge 0$ and $e_i \ge 0$ such that $A + E = LDL^T$. The $j$th step of the algorithm sets

$$ l_{jk} = c_{jk}/d_k, \qquad 1 \le k < j, $$

$$ c_{ij} = a_{ij} - \sum_{k=1}^{j-1} l_{jk}c_{ik}, \qquad i \ge j, $$

$$ \theta_j = \max\{|c_{ij}| : i > j\}, $$

$$ d_j = \max\{\delta,\ |c_{jj}|,\ \theta_j^2/\beta^2\}, $$

$$ e_j = d_j - c_{jj}. $$

Note that if $\delta = 0$, then it is possible that $d_k = 0$; in this case we set $l_{jk} = 0$.

Gill and Murray [7] showed that if $\delta = 0$, then it is possible to use this factorization to obtain a direction of negative curvature. The following lemma is a slight modification of their results.

Lemma 4.1. Let $A \in \mathbb{R}^{n \times n}$ be symmetric, and assume that

$$ \beta^2 > \max\{|a_{ii}| : i = 1, \ldots, n\}. \tag{4.3} $$

If the integer $m$ satisfies $c_{mm} \le c_{jj}$, $j = 1, \ldots, n$, and $L^T p = e_m$, then $p^T A p \le c_{mm}$. Moreover, if $\delta = 0$ and $A$ has at least one negative eigenvalue, then $c_{mm} < 0$.

Proof. If $L^T p = e_m$, then $p_i = 0$ for $i > m$ and $p_m = 1$. Moreover, since $A + E = LDL^T$,

$$ p^T A p = d_m - p^T E p = d_m - e_m - \sum_{i=1}^{m-1} e_i p_i^2, $$

and thus $p^T A p \le d_m - e_m = c_{mm}$. Now assume that $\delta = 0$. If $c_{mm} \ge 0$, then $c_{ii} \ge 0$ for all $i$, so that

$$ a_{ii} \ge \sum_{k=1}^{i-1} l_{ik}c_{ik} = \sum_{k=1}^{i-1} l_{ik}^2 d_k $$
for $1 \le i \le n$, and thus $l_{ij}^2 d_j \le a_{ii} < \beta^2$ for $j < i$. If $d_j \ne 0$, this implies that $c_{ij}^2 = l_{ij}^2 d_j^2 < \beta^2 d_j$ for $j < i$, and hence $\theta_j^2 < \beta^2 d_j$. Thus $d_j = |c_{jj}| = c_{jj}$, and therefore $e_j = 0$. If $d_j = 0$, then clearly $e_j = 0$. It follows that $E = 0$, and in particular, that $A = LDL^T$. We have now shown that if $c_{mm} \ge 0$ then $A$ must be positive semidefinite, and thus Lemma 4.1 holds.

Gill and Murray [7] allow equality to hold in (4.3) and show that if $A$ is indefinite, then $p^T A p < 0$. However, if we allow equality in (4.3), then we lose a very nice consequence of Lemma 4.1; namely, that $c_{mm} \ge 0$ if and only if $A$ is positive semidefinite. In fact, if

$$ A = \begin{pmatrix} \tfrac12 & 1 \\ 1 & 1 \end{pmatrix}, $$

then $A$ is indefinite, but if we allow $\beta = 1$, then $c_{11} = \frac12$ and $c_{22} = 0$.

Can we use Lemma 4.1 to construct directions of negative curvature which satisfy (4.1) and (4.2)? Unfortunately, the answer is no. To see this, let $\{G_k\}$ be a sequence of matrices with at least one negative eigenvalue, and let $\{p_k\}$ be the sequence of negative curvature directions generated as in Lemma 4.1. If we choose $d_k = p_k$, then $d_k^T G_k d_k \le c_{mm}^{(k)}$, and this implies that (4.2) holds provided that $\{G_k\}$ is bounded. However, the following example shows that $\{\|p_k\|\}$ may not be bounded.

Example 4.2. Let

$$ G_k = \begin{pmatrix} \alpha_k^2 & \alpha_k \\ \alpha_k & -\tfrac12 \end{pmatrix}, $$

where $\alpha_k \in (0, 1)$. We can choose $\beta = 1$, and then it is fairly easy to show that

$$ l_{21} = \alpha_k^{-1}, \qquad p_k = (-\alpha_k^{-1}, 1)^T. $$

Thus, if $\alpha_k \to 0$, then $\{\|p_k\|\}$ is not bounded. Note that if we normalize $p_k$ and consider $d_k = p_k/\|p_k\|$ as our direction of negative curvature, then in this example

$$ d_k^T G_k d_k = -\frac32\,\frac{\alpha_k^2}{1 + \alpha_k^2}. $$

Thus, if $\alpha_k \to 0$, then $d_k^T G_k d_k \to 0$ while $\lambda_{G_k} \to -\frac12$, and hence (4.2) does not hold.

The factorization of Bunch and Parlett avoids the above difficulties. Given a symmetric matrix $A$, this factorization consists of a permutation matrix $Q$, a block diagonal matrix $D$, and a unit lower triangular matrix $M$ such that $QAQ^T = MDM^T$.
The matrices $M$ and $D$ have the following properties:
(a) The elements of $M$ are bounded by a fixed positive constant which is independent of the matrix $A$.
(b) $D$ is a block diagonal matrix with one-by-one or two-by-two diagonal blocks.
(c) $D$ has the same number of positive, negative, and zero eigenvalues as $A$ (Sylvester's Inertia Theorem).
(d) The number of two-by-two diagonal blocks plus the number of negative diagonal elements which occur as one-by-one diagonal blocks of $D$ is equal to the number of negative eigenvalues of $A$.

If $A$ is positive semidefinite, $D$ is a diagonal matrix with nonnegative diagonal elements. The following lemma shows how this factorization can be used to obtain directions of negative curvature which satisfy (4.1) and (4.2).

Lemma 4.3. Let $A = WBW^T$, where $W \in \mathbb{R}^{n \times n}$ is nonsingular, $B \in \mathbb{R}^{n \times n}$ is symmetric, and $A$ has at least one negative eigenvalue. Let $\{z_j : j = 1, 2, \ldots, m\}$ be orthonormal eigenvectors for $B$ corresponding to eigenvalues $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_m < 0$, and for some $1 \le l \le m$ define $y$ and $z$ by

$$ W^T y = z = \sum_{j=1}^{l} z_j. \tag{4.4} $$

Then

$$ \lambda_A \ge (y^T A y)\|W\|^2 \ge l\,[\kappa_2(W)]^2\,\frac{y^T A y}{\|y\|^2}, \tag{4.5} $$

where $\lambda_A$ is the smallest eigenvalue of $A$, and $\kappa_2(W) = \|W\|\,\|W^{-1}\|$ is the Euclidean condition number of $W$.

Proof. If $x$ is a unit eigenvector for $A$ corresponding to $\lambda_A$, then $x^T A x = \lambda_A$, and if $u = W^T x$, then

$$ \lambda_A = x^T A x = u^T B u \ge \lambda_1\|u\|^2. $$

Moreover, since $\|u\| \le \|W\|$ and $\lambda_1 < 0$,

$$ \lambda_A \ge \lambda_1\|W\|^2. \tag{4.6} $$

Now note that from (4.4)

$$ y^T A y = z^T B z = \sum_{j=1}^{l} \lambda_j \le \lambda_1, $$

and therefore (4.6) implies that $(y^T A y)\|W\|^2 \le \lambda_1\|W\|^2 \le \lambda_A$. To prove the second inequality in (4.5), use the inequality

$$ \|y\|^2 \le \|W^{-1}\|^2\,\Big\|\sum_{j=1}^{l} z_j\Big\|^2 = l\,\|W^{-1}\|^2, $$

together with $y^T A y < 0$, to obtain the desired result.

If Lemma 4.3 is to be useful, then $W^T y = z$ must be easy to solve. Also, the eigensystem of $B$ must be readily available, and the factorization $A = WBW^T$ should be relatively cheap to compute.
These requirements rule out a full eigensystem decomposition of $A$ and also the factorization of Aasen [1], which gives $B$ in tridiagonal form. However, the Bunch-Parlett factorization certainly satisfies all these requirements, with the additional feature that $\kappa_2(W)$ has a bound that is independent of $A$. Finally, with the Bunch-Parlett decomposition, it is very easy to satisfy (4.1) and (4.2). To see this, note that if $\{y_k\}$ is the sequence of negative curvature directions generated by Lemma 4.3, then (4.1) and (4.2) hold for $d_k = y_k$ provided $\{\|W_k\|\}$ and $\{\|W_k^{-1}\|\}$ are bounded. Fletcher and Freeman [6] have suggested a direction of negative curvature which corresponds to $l = m$ in Lemma 4.3, but (4.5) implies that $l = 1$ may be a slightly better choice.

5. A steplength algorithm

Once a descent pair $(s, d)$ has been determined at a point $x$, we are faced with the problem of determining $\alpha$ such that $f(x(\alpha)) \le f(x)$, where $x(\alpha) = x + \alpha^2 s + \alpha d$. One solution would be to determine $\alpha$ such that

$$ f(x(\alpha)) = \min\{f(x(\lambda)) : \lambda \ge 0\}, \tag{5.1} $$

but this is a very difficult computational problem. It is computationally more desirable to replace the problem of satisfying (5.1) exactly with the specification of criteria for terminating a univariate minimization procedure that is designed to solve (5.1). Such an approach is motivated by the success of previous algorithms which have been used when a single descent direction is specified. Given a descent direction $s$ at a point $x$, one such algorithm is to terminate the line search when an $\alpha$ has been found which satisfies

$$ \nabla f(x + \alpha s)^T s \ge \eta\nabla f(x)^T s \tag{5.2} $$

and

$$ f(x + \alpha s) \le f(x) + \alpha\mu\nabla f(x)^T s, \tag{5.3} $$

where $0 < \mu \le \eta < 1$ are preassigned constants. If a sequence of points $\{x_k\}$ is determined where $x_{k+1} = x_k + \alpha_k s_k$, with $x = x_k$, $s = s_k$, $\alpha = \alpha_k$ satisfying (5.2) and (5.3) for each $k$, then it can be shown [8] that

$$ \lim_{k \to \infty} \frac{g_k^T s_k}{\|s_k\|} = 0. \tag{5.4} $$
Usually $g_k$ and $s_k$ are related so that (5.4) implies $\|g_k\| \to 0$, which in turn implies $\|s_k\| \to 0$. Thus it is concluded that $\|x_{k+1} - x_k\| \to 0$ and $\|g_k\| \to 0$ as long as the $\alpha_k$ are bounded. This is enough to ensure that $\{x_k\}$ converges to a critical point of $f$, due to the following lemma given in [11, p. 476].

Lemma 5.1. Let $f: \mathbb{R}^n \to \mathbb{R}$ be continuously differentiable on the compact set $\mathcal{D}_0$, and assume that $f$ has a finite number of critical points in $\mathcal{D}_0$. If $\{x_k\} \subset \mathcal{D}_0$ is a sequence such that

$$ \lim_{k \to \infty} \|x_{k+1} - x_k\| = 0, \qquad \lim_{k \to \infty} \|g_k\| = 0, $$

then $\{x_k\}$ converges to some $x^* \in \mathcal{D}_0$ with $\nabla f(x^*) = 0$.

The geometrical interpretation of (5.2) and (5.3) is depicted in Fig. 1. Here $\alpha^*$ is the smallest positive $\alpha$ which satisfies (5.2), and it is clear that (5.2) guarantees that $\alpha^*\|s\|$ is not too small unless $\nabla f(x)$ is also small along the direction $s$. Condition (5.3) forces $f(x + \alpha s)$ to lie below the top line of Fig. 1, and thus guarantees sufficient decrease of the function.

Fig. 1. Interpretation of (5.2) and (5.3): the graph of $f(x + \alpha s)$ together with the upper line $f(x) + \alpha\mu g^T s$ and the lower line $c + \alpha\eta g^T s$, where $g = \nabla f(x)$.

Algorithms which use (5.2) and (5.3) as termination criteria for their line searches are further discussed in [8, 11]. The termination criteria we shall give may be viewed as an extension of these ideas to the situation when an iterate $x_k$ is an indefinite point. We replace (5.2) and (5.3) with the following rule. If $(s, d)$ is a descent pair at $x$, then we terminate the search when an $\alpha$ has been found which satisfies

$$ \nabla f(x(\alpha))^T x'(\alpha) \ge \eta\left[\nabla f(x)^T d + 2\alpha\left(\nabla f(x)^T s + \tfrac12 d^T\nabla^2 f(x)d\right)\right], \tag{5.5} $$

$$ f(x(\alpha)) \le f(x) + \mu\alpha^2\left[\nabla f(x)^T s + \tfrac12 d^T\nabla^2 f(x)d\right], \tag{5.6} $$

where $x(\alpha) = x + \alpha^2 s + \alpha d$ and $0 < \mu \le \eta < 1$. Note that when $d = 0$ these conditions reduce to (5.2) and (5.3), with $\alpha^2$ playing the role of the steplength. Conditions (5.5) and (5.6) also have a geometrical interpretation, as shown in Fig. 2.
Note that in this case the function $f(x(\alpha))$ is concave near $\alpha = 0$, because if $(s, d)$ is a descent pair at $x$ and $\Phi(\alpha) = f(x + \alpha^2 s + \alpha d)$, then $\Phi''(0) < 0$. Also note that (5.5) and (5.6) are equivalent to

$$ \Phi'(\alpha) \ge \eta\left[\Phi'(0) + \Phi''(0)\alpha\right], \tag{5.7} $$

$$ \Phi(\alpha) \le \Phi(0) + \tfrac12\mu\Phi''(0)\alpha^2. \tag{5.8} $$

The role of the upper and lower curves in Figs. 1 and 2 is similar; the lower curves are one-parameter families, with the parameter $c$ chosen so that the point of tangency $\alpha^*$ between the curve and $f(x(\alpha))$ is smallest. Note that $\alpha^*$ is also the smallest $\alpha > 0$ which satisfies (5.5), and that if $\alpha^*$ is small, then (5.7) shows that $\Phi''(0)$ must also be small. On the other hand, (5.6) guarantees a sufficient decrease in the function, and if $\alpha^*$ is not small, then (5.8) shows that in this case $\Phi''(0)$ must be small. Thus, in either case $\Phi''(0)$, and hence $\nabla f(x)^T s$ and $d^T\nabla^2 f(x)d$, are forced to zero. If $s$ and $d$ are properly chosen, then both $\|\nabla f(x)\|$ and the smallest eigenvalue of $\nabla^2 f(x)$ must go to zero and, in particular, the inflection point which occurs to the left of $\alpha^*$ must either be crossed or become "flattened out". These arguments will be made precise by the convergence theorems of Section 6.

Fig. 2. Interpretation of (5.5) and (5.6) along the curve $x(\alpha) = x + \alpha^2 s + \alpha d$.

We note with Fletcher and Freeman [6] that if a direction $d$ of negative curvature alone is used (taking $s = 0$), then the condition

$$ |\nabla f(x + \alpha d)^T d| \le -\eta\nabla f(x)^T d $$

is inappropriate for termination of the line search, because $\nabla f(x)^T d$ may be close to zero even far away from a minimum. They found it necessary to give termination criteria based on an estimate of the first derivative of $f(x(\alpha))$ at the inflection point. The estimate was obtained from the value of the derivative of a related quartic polynomial at its corresponding inflection point. The following lemma shows that conditions (5.5) and (5.6) can be satisfied whenever a descent pair exists at a point $x$.

Lemma 5.2.
Let $\Phi: \mathbb{R} \to \mathbb{R}$ be twice continuously differentiable in an open interval $I$ which contains the origin, and suppose that $\{\alpha \in I : \Phi(\alpha) \le \Phi(0)\}$ is compact. Let $\mu \in (0, 1)$ and $\eta \in [\mu, 1)$. If $\Phi'(0) < 0$, or if $\Phi'(0) \le 0$ and $\Phi''(0) < 0$, then there is an $\alpha > 0$ in $I$ such that

$$ \Phi'(\alpha) \ge \eta\left[\Phi'(0) + \Phi''(0)\alpha\right] \tag{5.9} $$

and

$$ \Phi(\alpha) \le \Phi(0) + \mu\left[\Phi'(0)\alpha + \tfrac12\Phi''(0)\alpha^2\right]. \tag{5.10} $$

Proof. Let $\beta = \sup\{\alpha \in I : \Phi(\alpha) \le \Phi(0)\}$. Then $\beta > 0$ since either $\Phi'(0) < 0$, or $\Phi'(0) \le 0$ and $\Phi''(0) < 0$. Moreover, the compactness assumption and the continuity of $\Phi$ imply that $\beta$ is finite and that $\Phi(0) = \Phi(\beta)$. Thus

$$ \Phi(\beta) \ge \Phi(0) + \mu\left[\Phi'(0)\beta + \tfrac12\Phi''(0)\beta^2\right]. \tag{5.11} $$

Define $h: I \to \mathbb{R}$ by

$$ h(\alpha) = \Phi(\alpha) - \Phi(0) - \eta\left[\Phi'(0)\alpha + \tfrac12\Phi''(0)\alpha^2\right]. $$

Since $\mu \le \eta$, we have $h(\beta) \ge 0$. Note also that $h(0) = 0$ and either $h'(0) < 0$, or $h'(0) \le 0$ and $h''(0) < 0$. This, together with the continuity of $h$, implies the existence of $\beta_1 \in (0, \beta]$ such that $h(\beta_1) = 0$ and $h(\alpha) < 0$ for all $\alpha \in (0, \beta_1)$. Now Rolle's theorem shows that there is an $\alpha \in (0, \beta_1)$ with $h'(\alpha) = 0$, and thus (5.9) follows. Moreover, $h(\alpha) < 0$ and $\mu \le \eta$ imply (5.10).

6. Convergence of the modified Newton iteration

Now we turn our attention to defining a modified Newton iteration. We shall give a convergence result based on the use of descent pairs and the steplength algorithm discussed above. The proof proceeds in two parts. The first result is somewhat independent of the definition of the iterates. The second part will use the particular way in which the iterates are defined to establish convergence.

To define the iteration, let $(s_k, d_k)$ be a descent pair at $x_k$, and let

$$ \Phi_k(\alpha) = f(x_k + \alpha^2 s_k + \alpha d_k). \tag{6.1} $$

If $\mu \in (0, 1)$ and $\eta \in [\mu, 1)$, then $\alpha_k > 0$ is determined such that

$$ x_{k+1} = x_k + \alpha_k^2 s_k + \alpha_k d_k \in \mathcal{D}, \tag{6.2} $$

$$ f_{k+1} \le f_k + \tfrac12\mu\Phi_k''(0)\alpha_k^2, \tag{6.3} $$

$$ \Phi_k'(\alpha_k) \ge \eta\left[\Phi_k'(0) + \Phi_k''(0)\alpha_k\right]. \tag{6.4} $$
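Conditions (6.3) and (6.4) can be tested using only $\Phi_k$, $\Phi_k'$ and the two numbers $\Phi_k'(0)$, $\Phi_k''(0)$. The paper leaves the univariate minimization algorithm unspecified; the sketch below swaps in a simple expand-then-bisect search (our choice, not the authors'), which keeps a lower bound where (6.4) fails and an upper bound where (6.3) fails. It assumes $0 < \mu < \eta < 1$ (strict, slightly stronger than $\eta \in [\mu, 1)$), that $\mathcal{D} = \mathbb{R}^n$ so (6.2) is automatic, and that $f$ is bounded below along the curve; `curvilinear_search` and the test function $f(x) = \tfrac12 x_1^2 + \tfrac14 x_2^4 - \tfrac12 x_2^2$ are hypothetical.

```python
import numpy as np


def curvilinear_search(f, grad, x, s, d, H, mu=1e-4, eta=0.9, alpha=1.0, max_iter=60):
    """Return alpha > 0 satisfying (6.3) and (6.4) along x(a) = x + a**2 s + a d.

    H is the Hessian at x (only d^T H d is used). Assumes 0 < mu < eta < 1 and
    that f is bounded below along the curve.
    """
    if not 0.0 < mu < eta < 1.0:
        raise ValueError("this sketch assumes 0 < mu < eta < 1")
    g0 = grad(x)
    phi0 = f(x)
    dphi0 = g0 @ d                           # Phi'(0)
    d2phi0 = 2.0 * (g0 @ s) + d @ H @ d      # Phi''(0)
    if dphi0 > 0.0 or d2phi0 >= 0.0:
        raise ValueError("need Phi'(0) <= 0 and Phi''(0) < 0 (a descent pair)")
    lo, hi = 0.0, np.inf
    for _ in range(max_iter):
        xa = x + alpha**2 * s + alpha * d
        if f(xa) > phi0 + 0.5 * mu * d2phi0 * alpha**2:
            hi = alpha                       # (6.3) fails: step too long
        elif grad(xa) @ (2.0 * alpha * s + d) < eta * (dphi0 + d2phi0 * alpha):
            lo = alpha                       # (6.4) fails: step too short
        else:
            return alpha                     # both conditions hold
        alpha = 2.0 * alpha if np.isinf(hi) else 0.5 * (lo + hi)
    raise RuntimeError("no acceptable step found within max_iter")


# Hypothetical test problem with minimizers (0, +/-1) and a saddle point at the origin.
f = lambda v: 0.5 * v[0]**2 + 0.25 * v[1]**4 - 0.5 * v[1]**2
grad = lambda v: np.array([v[0], v[1]**3 - v[1]])
x = np.array([1.0, 0.0])
H = np.diag([1.0, -1.0])                     # Hessian of f at x
s, d = -grad(x), np.array([0.0, 1.0])        # a descent pair at x
alpha = curvilinear_search(f, grad, x, s, d, H, alpha=3.0)
```

Starting from $\alpha = 3$, the first two trials violate (6.3), the bracket is halved twice, and $\alpha = 0.75$ is accepted; with the default $\alpha = 1$ the first trial is accepted, because $\Phi'(1) = 0$ for this curve.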
One might note that, due to (5.10) in the statement of Lemma 5.2, we could require

$$ f_{k+1} \le f_k + \mu\left[\Phi_k'(0)\alpha_k + \tfrac12\Phi_k''(0)\alpha_k^2\right] $$

instead of (6.3). However, the additional term does not enhance the convergence result in any way, while it does give a more stringent requirement to be satisfied by the univariate search.

Theorem 6.1. Let $f: \mathbb{R}^n \to \mathbb{R}$ satisfy Assumption 1.1, and let $\{\|s_k\|\}$ and $\{\|d_k\|\}$ be bounded. If $\{x_k\}$ satisfies (6.2), (6.3), and (6.4), then

$$ \lim_{k \to +\infty} g_k^T s_k = 0 \tag{6.5} $$

and

$$ \lim_{k \to +\infty} d_k^T G_k d_k = 0. \tag{6.6} $$

Proof. From (6.1) we have $\Phi_k'(0) = g_k^T d_k$ and $\Phi_k''(0) = 2g_k^T s_k + d_k^T G_k d_k$. Since $(s_k, d_k)$ is a descent pair, $\Phi_k'(0) \le 0$ and $\Phi_k''(0) < 0$. Thus (6.3) implies that $\{x_k\} \subset L(x_0)$, and by the continuity of $f$ and compactness of $L(x_0)$ we have that $\{f_k - f_{k+1}\}$ converges to zero. Since

$$ f_k - f_{k+1} \ge -\tfrac12\mu\Phi_k''(0)\alpha_k^2 > 0 $$

and both $g_k^T s_k$ and $d_k^T G_k d_k$ are nonpositive, it follows that

$$ \lim_{k \to \infty} \alpha_k^2 g_k^T s_k = 0 \tag{6.7} $$

and

$$ \lim_{k \to \infty} \alpha_k^2 d_k^T G_k d_k = 0. \tag{6.8} $$

From condition (6.4) we obtain

$$ \Phi_k'(\alpha_k) - \Phi_k'(0) - \alpha_k\Phi_k''(0) \ge -(1 - \eta)\left[\Phi_k'(0) + \Phi_k''(0)\alpha_k\right], $$

and hence, since $\Phi_k'(0) \le 0$,

$$ \Phi_k'(\alpha_k) - \Phi_k'(0) - \alpha_k\Phi_k''(0) \ge -(1 - \eta)\Phi_k''(0)\alpha_k. $$

An application of the mean value theorem now yields that for some $\theta_k \in (0, \alpha_k)$,

$$ \Phi_k''(\theta_k) - \Phi_k''(0) \ge -(1 - \eta)\Phi_k''(0). \tag{6.9} $$

The desired result now follows readily, for if either (6.5) or (6.6) does not hold, then there is a subsequence $\{k_i\}$ and a $\sigma > 0$ such that

$$ -\Phi_{k_i}''(0) \ge \sigma > 0. \tag{6.10} $$

Hence (6.9) implies that $\{\alpha_{k_i}\}$ does not converge to zero. However, if $\{\alpha_{k_i}\}$ does not converge to zero and (6.10) holds, then (6.7) and (6.8) cannot be satisfied. This contradiction establishes the theorem.

If we consider practical methods for determining an $\alpha_k > 0$ which satisfies (6.3) and (6.4), then we are led to a steplength rule which can be analyzed by a combination of Theorems 3.1 and 6.1.

Steplength rule SR($\mu, \eta, \beta$).
Given a fixed $\beta > 0$, apply a univariate minimization algorithm to $\Phi_k(\alpha)$ and terminate the search if an $\bar\alpha_k \in (0, \beta]$ is found such that (6.2) and (6.4) are satisfied with $\bar\alpha_k$ in place of $\alpha_k$. If these conditions cannot be satisfied, then usually

$$ \Phi_k(\beta) = \min\{\Phi_k(\alpha) : \alpha \in [0, \beta]\} \tag{6.11} $$

(note that this may not hold if $\Phi_k$ is not defined for all $\alpha \in [0, \beta]$), and thus it is reasonable to let $\bar\alpha_k = \beta$. Now, if (6.2) and (6.3) are also satisfied with $\bar\alpha_k$ in place of $\alpha_k$, we accept $\alpha_k = \bar\alpha_k$; otherwise we take $\theta_k$ to be the largest element of the set $\{2^{-i} : i = 0, 1, \ldots\}$ such that (6.2) and (6.3) are satisfied with $\bar\alpha_k\theta_k$ in place of $\alpha_k$, and then accept $\alpha_k = \bar\alpha_k\theta_k$.

It should be clear that the proofs of Theorems 3.1 and 6.1 show that (6.5) and (6.6) also hold for SR($\mu, \eta, \beta$). Our next result will show that the iterates defined by this steplength rule converge to a critical point of $f$ where the Hessian is positive semidefinite. It is here that specific properties of the descent pairs $(s_k, d_k)$ are crucial.

Theorem 6.2. Let $f: \mathbb{R}^n \to \mathbb{R}$ satisfy Assumption 1.1, and in addition, assume that $f$ has a finite number of critical points in $L(x_0)$. Let $\{\|s_k\|\}$ and $\{\|d_k\|\}$ be bounded, and suppose that

$$ g_k^T s_k \to 0 \text{ implies } g_k \to 0 \text{ and } s_k \to 0 \tag{6.12} $$

and

$$ d_k^T G_k d_k \to 0 \text{ implies } \lambda_{G_k} \to 0 \text{ and } d_k \to 0. \tag{6.13} $$

If $\{x_k\}$ satisfies (6.2), where $\alpha_k$ is chosen by SR($\mu, \eta, \beta$), then $\{x_k\}$ converges to some $x^*$ in $\mathcal{D}$ where $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*)$ is positive semidefinite. Moreover, if infinitely many of the $x_k$ are indefinite points, then $\nabla^2 f(x^*)$ is singular.

Proof. Since (6.5) and (6.6) hold for SR($\mu, \eta, \beta$), conditions (6.12) and (6.13) imply that $\{\|g_k\|\}$, $\{\|s_k\|\}$, $\{\|d_k\|\}$ and $\{\lambda_{G_k}\}$ converge to zero. Thus

$$ \|x_{k+1} - x_k\| \le \beta^2\|s_k\| + \beta\|d_k\| $$

implies that $\{\|x_{k+1} - x_k\|\}$ converges to zero. Therefore Lemma 5.1 applies, and we obtain that $\{x_k\}$ converges to some $x^*$ in $\mathcal{D}$ with $\nabla f(x^*) = 0$. Since $\lambda_{G_k} \to 0$, we also have that $\nabla^2 f(x^*)$ must be positive semidefinite.
Finally, if infinitely many of the $x_k$ are indefinite points, then the continuity of $\nabla^2 f$ implies that $\nabla^2 f(x^*)$ is not positive definite, and hence $\nabla^2 f(x^*)$ must be singular.

Many choices of $s_k$ are possible which satisfy (6.12). Indeed, if $\{A_k\}$ is any sequence of symmetric positive definite matrices such that $\{\|A_k\|\}$ and $\{\|A_k^{-1}\|\}$ are bounded, then choosing $s_k$ as the solution of $A_k s_k = -g_k$ will satisfy (6.12). Also, in Section 4 we showed how to choose $d_k$ at an indefinite point so that (4.1) and (4.2) are satisfied. The additional requirement in (6.13) is obtained if we replace $d_k$ with $\pm\phi(\lambda_{G_k})d_k$, where $\phi: \mathbb{R} \to \mathbb{R}$ is a positive function such that $\phi(t_k) \to 0$ implies that $t_k \to 0$, and where the sign is chosen to make $g_k^T d_k \le 0$.

The iterates should also reduce naturally to Newton's iteration as soon as a region is found where the Hessian is positive definite. Indeed, the main motivation for this strategy is to obtain the iterates using second derivative information which is based on the true quadratic model at each $x_k$. Of course, it is expected that in practice very few indefinite points will be encountered during the iterative process. In fact, Theorem 6.2 indicates that the strategy we have presented actively seeks a region where the Hessian matrix is positive semidefinite. If, for example, the Hessian $\nabla^2 f(x)$ is nonsingular whenever $x$ is a critical point of $f$, then only finitely many of the iterates can be indefinite points.

Finally, we shall suggest a way to obtain descent pairs $(s_k, d_k)$ which satisfy all of the requirements of Theorem 6.2. In our description we assume that $G_k = M_k D_k M_k^T$ is the Bunch-Parlett factorization of the Hessian; thus we have omitted explicit representation of the permutations $Q_k$ which will be present in practice. We obtain $s_k$ as the solution of

$$ (M_k\bar D_k M_k^T)s = -g_k, $$

where $\bar D_k = U_k\bar\Lambda_k U_k^T$ is obtained from $D_k$ by first computing the eigensystem
Sorensen[Directions of negative curvature 17 Dk = UkAgU~ ,of Dk and then replacing the diagonal elements k~k) of Ak with max{IX1% l~j---n maxlX!% where ~ is the relative machine precision. In the decomposition of Dk we have U~Uk = ! and Ak diagonal, and note that only O(n) arithmetic operations are required to obtain/3k from Dk. The negatiw~ curvature direction dk is obtained as the solution to M~'dk = +-Ikokll/2sk where kok is tlhe most negative eigenvalue and Zk the corresponding unit eigenvector of Dk. When Dk does not have a negative eigenvalue, we take dk = O. If f:R"--*R and x0 satisfy the assumptions of Theorem 6.2, then the compactness of L(xo) and the continuity of Vff imply that {Gk} and {gk} are uniformly bounded. Thus (Sk, dk) satisfy the requirements of a descent pair as well as (6.12) '.and (6.13). The above choice of (Sk, dk) is somewhat ad hoc and we make no mathematical statements concerning the desirability of this choice. However, computational results show that this specification of (Sk, dk) works reasonably well in practice. We wish to emphasize that many other choices are possible. We have not addressed the problem of providing an initial step a to the univariate search. Many strategies for determining the initial step are possible. However, we have not found a strategy with enough theoretical basis to recommend it over something very simple such as taking the initial step to be a = 1 each time. Note, however, that whatever strategy is chosen must eventually take a = 1 in order to retain the local quadratic rate of convergence enjoyed by Newton's method. 7. Concluding remarks It is possible to generalize Theorems 3.1 and 6.1 so that they apply to more general curves. Consider ~k(a) = [(Xk + qbl(a)Sk + &2(ct)dk) (7.1) where (sk, dk) is a descent pair, and &l and &2 are such that ~ ( 0 ) - < 0 and ~ ( 0 ) < 0. 
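A short worked derivation may help in reading the conditions that follow. Assuming (as an added assumption, not stated in the text) that $\phi_1$ and $\phi_2$ are twice differentiable with $\phi_1(0) = \phi_2(0) = 0$, the chain rule applied to (7.1) gives, with $g_k = \nabla f(x_k)$, $G_k = \nabla^2 f(x_k)$ and $v_k = \phi_1'(0)s_k + \phi_2'(0)d_k$,

$$\begin{aligned}
\Phi_k'(0) &= g_k^T v_k,\\
\Phi_k''(0) &= v_k^T G_k v_k + g_k^T\big(\phi_1''(0)s_k + \phi_2''(0)d_k\big).
\end{aligned}$$

For the curve $\phi_1(\alpha) = \alpha^2$, $\phi_2(\alpha) = \alpha$ this reduces to $\Phi_k'(0) = g_k^T d_k$ and $\Phi_k''(0) = 2g_k^T s_k + d_k^T G_k d_k$, so that $\tfrac12\Phi_k''(0)\alpha^2 = \alpha^2\big[g_k^T s_k + \tfrac12 d_k^T G_k d_k\big]$. For the straight line $\phi_1(\alpha) = \phi_2(\alpha) = \alpha$ it gives $\Phi_k'(0) = g_k^T(s_k + d_k)$ and $\Phi_k''(0) = (s_k + d_k)^T G_k (s_k + d_k)$.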
Instead of (3.1) and (3.2), we can define $x_{k+1}$ by choosing the smallest nonnegative integer $i$ such that

$$x_{k+1} = x_k + \phi_1(\gamma^i)s_k + \phi_2(\gamma^i)d_k \in \mathcal{D}, \qquad f_{k+1} \le f_k + \mu\big[\Phi_k'(0)\gamma^i + \tfrac12\Phi_k''(0)\gamma^{2i}\big]. \tag{7.2}$$

Theorem 3.1 generalizes, and instead of (3.3) and (3.4) we obtain

$$\lim_{k\to+\infty} \Phi_k'(0) = 0 \quad\text{and}\quad \lim_{k\to+\infty} \Phi_k''(0) = 0. \tag{7.3}$$

Similarly, to generalize Theorem 6.1 we replace (6.2) and (6.3) by

$$x_{k+1} = x_k + \phi_1(\alpha_k)s_k + \phi_2(\alpha_k)d_k \in \mathcal{D}, \qquad f_{k+1} \le f_k + \mu\big[\Phi_k'(0)\alpha_k + \tfrac12\Phi_k''(0)\alpha_k^2\big], \tag{7.4}$$

respectively, but leave (6.4) unchanged except that now $\Phi_k$ is defined by (7.1). Once again, the conclusion is that (7.3) holds.

The analysis of the algorithm presented in this paper did not require the term $\Phi_k'(0)$ in (7.2) and (7.4). However, Mukai and Polak [10] have proposed an algorithm which requires these terms. To describe their algorithm at an indefinite point, let $e_k$ be a unit eigenvector corresponding to the smallest eigenvalue of $G_k$ such that $g_k^T e_k \le 0$. Now choose $\phi_1(\alpha) = \phi_2(\alpha) = \alpha$ and $s_k = -\lambda_k g_k$, $d_k = \lambda_k e_k$, where, with $h_k = -g_k + e_k$, we set $\lambda_k = 1$ whenever $h_k^T G_k h_k$ satisfies the sign condition of [10], and otherwise $\lambda_k = \beta^{i_k}$, where $\beta \in (0, 1)$ and $i_k \ge 0$ is the smallest integer satisfying the inequality of [10] involving $g_k$, $\lambda_k$ and $h_k$. It is not too difficult to show that for this choice of descent pair, (7.3) implies that $\{\|g_k\|\}$ and $\{e_k^T G_k e_k\}$ converge to zero. Thus, the results of Mukai and Polak [10] can be obtained by applying the above generalization of Theorem 3.1 to their algorithm, and in fact, our results also cover the much simpler variation of their algorithm where $\lambda_k = 1$.

It is also worth noting that Theorems 3.1 and 6.2 hold if, instead of assumptions (1.3), we assume that $f : \mathbb{R}^n \to \mathbb{R}$ is bounded below on $L(x_0)$, and $\nabla^2 f$ is uniformly continuous and bounded on some convex set which contains $L(x_0)$.

Of course, the ultimate test of any theory is whether it yields a robust algorithm.
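The choice of descent pair described at the end of Section 6, combined with a curvilinear Armijo-type test, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation of [12]: it assumes NumPy, takes the factorization $G_k = M_k D_k M_k^T$ as given (permutations omitted) instead of computing a Bunch-Parlett or Bunch-Kaufman factorization, and uses a dense eigendecomposition of $D_k$, which costs more than the $O(n)$ available from its block structure. The step is accepted by test (7.2) with $\phi_1(\alpha) = \alpha^2$, $\phi_2(\alpha) = \alpha$, a unit initial step and $\gamma = 1/2$, rather than by the full rule $SR(\mu, \eta, \beta)$. The function names `descent_pair` and `curvilinear_step` and their default parameters are hypothetical.

```python
import numpy as np


def descent_pair(M, D, g, eps=np.finfo(float).eps):
    """Sketch of (s_k, d_k) from G_k = M D M^T (permutations omitted).

    Assumes D is symmetric and not identically zero.
    """
    lam, U = np.linalg.eigh(D)                # D = U diag(lam) U^T, U^T U = I
    floor = eps * np.max(np.abs(lam))         # eps * max_i |lambda_i|
    lam_bar = np.maximum(np.abs(lam), floor)  # modified (positive) eigenvalues
    D_bar = (U * lam_bar) @ U.T               # U diag(lam_bar) U^T
    s = np.linalg.solve(M @ D_bar @ M.T, -g)  # (M D_bar M^T) s = -g
    d = np.zeros_like(g)
    if lam[0] < 0:                            # eigh returns ascending eigenvalues
        z = U[:, 0]                           # unit eigenvector, most negative eigenvalue
        d = np.linalg.solve(M.T, np.sqrt(-lam[0]) * z)  # M^T d = |lam|^(1/2) z
        if g @ d > 0:                         # sign chosen so that g^T d <= 0
            d = -d
    return s, d


def curvilinear_step(f, x, g, G, s, d, mu=0.1, gamma=0.5, max_trials=60):
    """Test (7.2) along x + a^2 s + a d for a = 1, gamma, gamma^2, ..."""
    fx = f(x)
    dphi = g @ d                              # Phi_k'(0)
    d2phi = 2.0 * (g @ s) + d @ G @ d         # Phi_k''(0)
    a = 1.0
    for _ in range(max_trials):
        x_new = x + a * a * s + a * d
        f_new = f(x_new)
        # A non-finite value is treated as leaving the domain, so the trial is rejected.
        if np.isfinite(f_new) and f_new <= fx + mu * (dphi * a + 0.5 * d2phi * a * a):
            return a, x_new
        a *= gamma
    raise RuntimeError("no acceptable step found along the curve")


def saddle(x):
    return x[0] * x[1] + x[0] ** 4 + x[1] ** 4


# Usage: at the origin, saddle has g = 0 and G = [[0, 1], [1, 0]];
# this G is its own factorization with M = I and D = G (one 2x2 block).
x0, g0 = np.zeros(2), np.zeros(2)
G0 = np.array([[0.0, 1.0], [1.0, 0.0]])
s0, d0 = descent_pair(np.eye(2), G0, g0)
alpha, x1 = curvilinear_step(saddle, x0, g0, G0, s0, d0)
print(alpha, saddle(x1))  # 0.5 and about -0.09375
```

At the saddle, $s_k = 0$ and $d_k$ is a unit eigenvector for the eigenvalue $-1$, so the unit step gives no decrease and is rejected, while $\alpha = 1/2$ is accepted. When $D_k$ is positive definite, $\bar D_k = D_k$, $d_k = 0$ and the first trial is the Newton step, consistent with the remark that $\alpha = 1$ must eventually be taken.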
Sorensen [12] has implemented a version of the algorithm described at the end of Section 6, and has obtained excellent numerical results. We present here the results for only one problem function, Box's function

$$f(x) = \sum_{i=1}^{10} \big\{e^{-x_1 t_i} - e^{-x_2 t_i} - x_3\big(e^{-t_i} - e^{-10 t_i}\big)\big\}^2,$$

where $t_i = i/10$. The results obtained for this function are fairly typical. When the algorithm was used with the standard starting point $x_0 = (0, 20, 20)$, only two indefinite points were encountered. In general, standard starting points do not fully reveal the performance of this algorithm, because some of the standard starts lie in regions where little or no negative curvature is encountered during the iteration. However, when the algorithm was started from a set of ten random starting points, many more indefinite points were encountered. These results are summarized in Table 1, where NEGCNT is the number of indefinite points encountered during the iteration. Note that for each starting point there are two entries: the first is from Sorensen's algorithm, while the second is the result of a version of Gill and Murray's [7] modified Newton method.

Table 1. Box's function from ten random starting points. Columns: #, NITER, NFEV, FINAL $\|g\|_2$, NEGCNT.

Sorensen's implementation used the Bunch-Kaufman partial pivoting algorithm [4] rather than the Bunch-Parlett complete pivoting algorithm. With this factorization we do not, in theory, have a bound on $\kappa_2(M_k)$.
However, in practice we do not expect to encounter growth in the elements of $M_k$. Finally, we note that the computations were done on the IBM 370/195 at Argonne National Laboratory in double precision (14 hexadecimal digits, or about 15 decimal digits) under the FORTRAN H (opt = 2) compiler. Also, both algorithms reported positive definite Hessians at the solution, and comparable convergence criteria were specified for both algorithms.

Acknowledgment

We would like to thank Ken Hillstrom for allowing us to use his testing package, and Judy Beumer for her super-duper typing of the manuscript.

References

[1] J.O. Aasen, "On the reduction of a symmetric matrix to tridiagonal form", BIT 11 (1971) 233-242.
[2] L. Armijo, "Minimization of functions having Lipschitz continuous first partial derivatives", Pacific Journal of Mathematics 16 (1966) 1-3.
[3] J.R. Bunch and B.N. Parlett, "Direct methods for solving symmetric indefinite systems of linear equations", SIAM Journal on Numerical Analysis 8 (1971) 639-655.
[4] J.R. Bunch and L. Kaufman, "Some stable methods for calculating inertia and solving symmetric linear equations", Mathematics of Computation 31 (1977) 163-179.
[5] A.V. Fiacco and G.P. McCormick, Nonlinear programming: sequential unconstrained minimization techniques (Wiley, New York, 1968).
[6] R. Fletcher and T.L. Freeman, "A modified Newton method for minimization", Journal of Optimization Theory and Applications 23 (1977) 357-372.
[7] P.E. Gill and W. Murray, "Newton type methods for unconstrained and linearly constrained optimization", Mathematical Programming 7 (1974) 311-350.
[8] P.E. Gill and W. Murray, "Safeguarded steplength algorithms for optimization using descent methods", National Physical Laboratory, Rep. NAC37 (1974).
[9] G. McCormick, "A modification of Armijo's step-size rule for negative curvature", Mathematical Programming 13 (1977) 111-115.
[10] H. Mukai and E. Polak, "A second order method for unconstrained optimization", Journal of Optimization Theory and Applications (1978), to appear.
[11] J.M. Ortega and W.C. Rheinboldt, Iterative solution of nonlinear equations in several variables (Academic Press, New York, 1970).
[12] D.C. Sorensen, "Updating the symmetric indefinite factorization with applications in a modified Newton's method", Ph.D. thesis, University of California at San Diego, Argonne National Laboratory Rep. ANL-77-49 (1977).
[13] P. Wolfe, "Convergence conditions for ascent methods II: Some corrections", SIAM Review 13 (1971) 185-188.
