MITLibraries Document Services Room 14-0551 77 Massachusetts Avenue Cambridge, MA 02139 Ph: 617.253.5668 Fax: 617.253.1690 Email: docs@mit.edu http: //libraries. mit. edu/docs DISCLAIMER OF QUALITY Due to the condition of the original material, there are unavoidable flaws in this reproduction. We have made every effort possible to provide you with the best copy available. If you are dissatisfied with this product and find it unusable, please contact Document Services as soon as possible. Thank you. Due to the poor quality of the original document, there is some spotting or background shading in this document. LIDS-P-1124 August 1981 F FINAL. SIZE 8.F X 11 MO(.,Dl Pl'P R i77% Rt:). _. r . _ _ _--_- PROJECTED NEWTON METI-IODS FOR OPTIMIZATION: PROBLEMIS IITH SIMPLE CONSTRAINTS* . _. ^ ...... follow apply also with fiinor nodifica.t i :.... :'.Le...:, .........t~:~ i;oJ-, , ..:i .................. , . . i ~ Dimitri P. Bertsekas '~j Department of Electrical Engineering and Computer Sciencel Massachusetts Institute of Technology . ....... Cambridge, Massachusetts 02139 . .... -- -_--------.......... [ ,8milP.:'..w .:t al Abstr iact" |sl).-e ,d .. '..._."-- ................. ..... 1..._.1 .. .____.......................... ,C'J~i~, t1 -o : - HERE [":F Irv P 'U i !-"') : i l11 /,T ri_-, lems with rectangle ..... We consider the problem min {f(x) Ix > 0} and [x k - kDkVf(xk) + algorithms of the form Xk+l constraints tins to probT such as i (2 x minimi ze- ----------subject to b < x.< b 2 where [.]+ denotes projection on the positive .... . orthant, ok is a stepsize chosen by an Armijo-like Problems (1) :~where b and b are given vectors. ~~~k . . ' . :2 rule, and Dk is a positive definite symmetric maand (2) are referred to as simply constrained probWeshowth-a-! We show that Dtrixcan trix which is partly diagonal. ems, diagonal. ande their showha algorithmic Dk solution is the primary which n is partly subject of this paper. be calculated simply on the basis of second deri.vatives of f so that the resulting Newton-like algoIn view of the simplicity of the constraints, rithm has a typically superlinear rate of converone would expect that.solution of problem (1) is With other choices of Dk convergence at a gence. almost as easy as unconstrained minimization of f. typi.cally linear rate is obtained. The algorithmrs This expectation is partly justified in that the order necessary condition for a vector x = first are almost as simple as their unconstrained counter-n -1 parts. They are well suited for problems of large ) to be a local minimum of problem (1) (x ,...,x dimension such as those arising in optimal control. takes the simple form while being competitive with existing methods for Bf(x) low-dimensional problems. (3a) -1, ... ,n i -> 0 ax). 0 Df(x) 0 1. >0, i = 1,...,n (3b) Furthermore the direct analog of the method of steepest descent takes the simple form (1) f(x) subject to x ,> x Introduction We consider the problem minimize if = k [xk - Vf(xk)] k = 0,1,... .. (4) 0 where f : R - R is a continuously differentiable function and the vector inequality x > 0 is meant to be componentwise [i.e. for x = (,x ..... xn),, > 0 for all i = l,....n]. This we write x > 0 if x type of problem arises very often in applications; for example Vwhen f is a dual. functional relative to an original inequality constrained primal problem, and x represents a vector of nonnegative lagrange multipliers corresponding to the inequality constraints, and when f represents an Augmented Lagrangian or exact penalty function taking into account other possibly nonlinear equality and inequalThe analysis and algorithms that ity constraints. Work supported by Grant NSF ECS-79-20834 and by Alphatech, Inc. through DOE Contract No. DE-AC0179ET29243. To be presented at' the 20th December 1981. where ok is a positive scalar stepsize and for any n...,z6R R we denote vector z = (z, [zl : [ ax {0,z X {O,z The stepsize ak may be chosen i.na number of ways. In the original proposal of Goldstein [1], and Levitin and Poljak [2], ak is taken to be a constant E C for all k) and a convergence result a (i.e. is shown under the assumption that a is sufficientIn generly small and Vf is Lipschitz continuous. al a proper value for a can be found only through experimentations. ,0i alternative suggested by INMc- An alternative suggested by cxperimetatio. Connick [3] is co choose ak by function minimization along the arc of points xk(l), a > 0 where IEEE Conference on Decision and Control, San Diego, CA. iUT-Ori 77%c RED. NU F R PAG ES i i R , MO)DEL PAPER F INAL SIZE Thus1,-s-hen _ .ta---------Thuscl~iks-cosen so-that-- of-the inverse H'eSsian V f() ½ X 1ii lastaong te ~ _suitable subspace. At thispoint wefind.that the algorithms available at present are far more comr TIT-LE ON! W-IUi PAGE 'plicated than' their unconstrained counterparts, f~~ min f[xk(C)].(6) f[xk(] =__ m, mm xk(d) particularly when the problem has large..dimension. a>O'~~Thus the most straightforward..extension of. Newton's Unfortunately the minimization above is very difmethod is given by ficult to carry out, particularly for problems of:H.lO'-+i _ large dimension, since f[xk(c)] need not-be difk+1- --=- Xk-+ Xk l (12)l _ L I_ ferentiable, convex, or-unimodal as a function of hre isa a solution ion of ofte ad program where xis the quadratic program a even if f is convex. For most problems we prefer,;,-,, . k the Armijo-like stepsize rule first proposed in + 1 _.1m_ V~f(x _(x o k ).-E~k)(xx i Betsek[4] [4 whereby', wheeby c;iPK 'is Bertsekas given byk by,-- ----isgiven -,xk - --,--r--- ,nimimze ----,-I --f (xk) (x xk) ---2k 2 -xjd)V ins t k- i;, A!{ I2 , Mt! 5" s3 .......- , - --- -- -. where m is the first nonnegative integer m satisfyig . k fying>,-~~~~~~~~ 5in f(xin ffx f mA > ' )Im x Os. (xk) -[Xk( 5)] > IvfXkj ' Xk - xk($ s)] * (7b) Here scalars the s, , and are fixed and are Here the scalars s , and a are fixed and are .I v1 -dent chosen so that s > 0, B6(0,1), and oc(O, 2 -). In addition to being easily implementable and convergent, the algorithm (4), (7), has the advantage that when ' it converges to a local minimum x satisfying the standard second order sufficiency conditions for optimality (including strict complementarity) it ativecontrantsat te iden fiteatifies theidenifis active constraints at xin x in a fiite finite number of iterations in the sense that there exists k such that B(x)- B=x ) Bx=B(xk V kk > k ->T() (8) where, for every x6R , B(x) denotes the set of indices of binding constraints at indices constraints of binding at x x B(x) -{i l 0 show that the results stated above hold also for sothtthe algorithm [xk ~~~~~the - algorithm ~performance kf(X)+(10) ~k+l ~kVf(~x~~k)]~ = ~k - (lens where D is a diagonal positive definite matrix and k ak is chosen by (7) where now xk((a) is given by xk(i) [Xk - cDkVf(xk)] For this it i elements dk, dldC i d - dd k < -< 0 (13) (11) is necessary to assume that the diagonal = ,...,n _.[10] d of the matrices Dk satisfy = 0,1 0,1 .... ii = 11,...,n , ,n .direction kk = -hred sm pstv n jar sar.the where d and d are some positive scalars. -here psivhas ' and ----and okI-----is a stepsize parameter.- There are- conver- :gence and superlinear rate of convergence., results the literature regarding this type of method (Levitin and Poljak [2], Dunn [5]) and its QuasiNewton versions (Garcia-Palomares and Mangasarian [6]), however its effectiveness is strongly depenupon the computational requirements of solving the quadratic program (13). For problems of small dimension, problem (13) can be solved rather quickdimension, problem (13) can be solved rather quickly by standard pivoting or manifold suboptimization methods but when the problem is of large dimension methods but whien the problem is of large dimension solution of the quadratic program (13) by standard methods can be very time consuming. Indeed, there are large-scale quadratic programming problems aare large-scale problems arising in optimalquadratic control programming the solution of which by pivoting methods is unthinkable. In any case the facility or lack thereoff of solving the quadratic program (13) must be accounted for when comparing method (12) against other alternatives. Another possible approach for constructing superlinearly convergent algorithms for solving 1 =l =-I 1,...n}. =, (9) Minor of te . modif n Minor modifications of the proofs given in [4] x subject to x > (7a) ar soe problem (1) stems from the original gradient projection proposal of Rosen [7], and is based on manifold suboptimization and active set strategies as in Gill and Murray [8] and Goldfarb [8], 'urray Goldfarb [9] [9], Luenberger Luenberger [10] and [10] and other sources, (see Lenard [11] for an up-to-date evaluation of various alternatives). Methods of this type are quite efficient for probof relatively small dimension, but are typically large-scale problems a ly unattractive unattractive for for large-scale problems smith with a large number of constraints binding at a solution. The main reason is that typically at most one constraint can be added to the active set at each iteration, so if, for example, 1,000 constraints are binding at the point of convergence and an interior starting point is selected, then the method will require at least 1,000 iterations (and possibly many more) to converge. While several authors [8], have alluded to the possibility of bending the ...... ..... of search along the constraint boundary, a only specific proposal known to the author that often possible to achieve s'ubstan- been made in the context of the manifold suboptimization approach is the one of McCormick [12] tial computational savings by proper diagonal scaling of VE as in (10), the resulting algorithm is typically characterized by linear linear convergence rate. typically characterized by convergence rate. proposed by Brayton and Cullu [13 incorporate proposed by Brayton and Cullum [131 incorporate bending but simultaneously require the solution of While it is Any attempt to construct a superlinearly algorithm must by necessity involve a convergent nondiagonal h t b scaling matrix Dk which is an adequate approximation and it does not seem particularlyr attractive methods for large-scale problems. (The Quasi-Newton quadratic programming subproblens). Manifold sub- .. .i. s optimization methods require also additional computation overhead in deciding which constraint to FINAL SIZE 8¥½ X 11 `1MODEL PAPER 77% RIi)D. a ur rithtake i ticurrent'it For the apdrop- from ,the: currenitly-:active. set... suitably chosen rectangle' (i.e. a set described by I parently most successful strategies (Lenard [11]) t which attempt to drop as many constraints as pos--- ------.upper and lower-bounds on the -variables) --as"a "lo.cal universe" instead of a manifold. sible, !this overhead can be significant and must be,-_ ' taken into account when comparing the manifold subThroughout- the- paper-we emphasize Newton-like - -optimization-- approach with other alternatives. -.-. ---'methods as prototypes. for'broad-classes of-super- ----------....-.....------------r.... linearly converging algorithms that fit the frameThe algorithms proposed in this paper attempt work of the paper. We often make positive definite-. to combine the basic simplicity of the steepest descent iteration-(4), (7) with the sophistication -I-- ness assumptions on-the Hessian-matrix of-f-in order --.to avoid getting bogged domm--in--techni.cal details and fast convergence of the.constrained Newton's They do not involve solution of. i ,relating to modifications of Newton's method such imethod (12), (13). as those employed in unconstrained minimization la quadratic program thereby avoiding the associated :''[14]-[16] to account for the possibility that V2f computational overhead, and there is no bound to is not positive definite., Quasi-Newton versions of !the number of constraints that can be added to the the Newton-like methods presented are possible but !currently active set thereby bypassing a' serious the discussion of specific implementations is be; inherent limitation of manifold suboptimization yond the scope of the paper. More generally it may The'basic form of the method is methods. ibe said that the nature ! Xk+l where xka) Xk.(ak) = = . .. .......... '.'. [x - aD kVf(x)] , ::'' a > 0. :'" of the algorithms proposed 'is such that almost every-useful idea from uncon'strained minimization can be fruitfully adapted :-within the constrained minimization framework con= here, however, the precise details of how '-:sidered : ''' this should be'done may involve considerable furIther research and experimentation. (14) - '' (15) Dk is a positive definite symmetric matrix which is employed throughout the paper is e Th notation as follows. All vectors are considered to be col-, The umn vectors, A prime denotes transposition. i.e. for standard norm in Rn is denoted by n i21 (x . .,x ) we write x = [ x = (x i=l n The gradient and Hessian of a function f; R + R 2 are denoted by Vf and V f respectively. Proofs of all results stated may be found in reference [17]. partly diagonal, and ak is a stepsize determined by 'k an Armijo-like rule similar to (7) that will be described later. The convergence and rate of convergence properties of this method are discussed in A key property of the method is that Section 2. under mild assumptions it identifies the manifold of binding constraints at a solution in a finite This number of iterations in the sense of (8). means that eventually the method is reduced to an unconstrained method on this manifold and brings to bear the extensive methodology and analysis relating to unconstrained minimization algorithms. In reference [17] we discuss how the method (14), (15) can form the basis for constructing algorithms for general linearly constrained problems of the form minimize Algorithms for Minimization Subject to Simple Constraints 2. We consider first the problem min f(x) x > 0} of (1). Any vector x satisfying the first order necessary condition (3) will be referred to as a We critical point with respect to problem (1)o focus attention at iterations of the form f(x) (16) + [xk 0 - k DkVf(Xk)] subject to bl < Ax < b2. Xk+l The main idea here is to view problem (16) locally as a simply constrained problem via a transformation of variables. For example, if the matrix A is square and invertible, problem (16) is equivalent to the problem r where Dk is a positive definite symmetric matrix minimize h(y) = and f(A-ly) [xk - aDkVf(xk)] , a >0 The stepsize a (i.e. f[xk(a)] > f(xk), V a > 0). following proposition identifies a class of matrices n objective function reduction is posDk for which .. Define for all. x > 0 sible. Ax. A similar approach based on an active set strategy is employed when A is not square and invertible. The ideas are similar to those involved in manifold suboptimization methods where a linear manifold is selected as a "local universe" for the purposes of = chosen by search along the arc of points It is easy to construct examples where an arbitrary positive definite choice of the matrix Dk leads to situations where it is impossible to reduce the value of the objective by suitable choice of the via the transformation = k is xk(a) subject to b1 < y < b2 y t - . . .. v' AurHOFD 77'. R ID. J!NMD F FP ; PA GES HE RE ] MODE L PAPER FINAL_ SIZE 8'1/ )X11 {11Xs(X)dxkt,>tO..Dl.pdteznsecld O;-;i'.i)I%; h i0, i o0}. > of e i'F-' (17) (17) ! __- s^l- acedeiq p;i*s bare l We say that a symmetric nxn matrix Djwithrelements' , ;_xi-_[- - I MVf(xk] d min{,w}. is'diagonal with respect-.to a subset-.of indices-------- kt.................--..... --..... .-----........ _ IC !2...n}-if ...----.... I (k+l)st Iteration of the Algorithm l |.. j 'd1 J ~ 0, Vi 6 I, j =,2;...,n, j i. (18) i .. _ atrix definite symmetric diagonal with.respect to the set IK .......... __ We select a _positive .... ....... ............ Dk which is Proposition 1: Let x > 0 and D be a positive def- . linite symmetric matrix which is diagonal with reigie n by tspect to I + (x) and denote. i I Jc IX "i talf(x)] Ix(c -; ' ! , b) If ea > 0. x is - not a critical point with respect to (1) there exists a scalar a problem f[x(a)] < f(x), Vae (O,--]. = [Xk -akDk (20) Vf(x k )+] should be chosen diagonal with respect to a subset of indices that contains I 0 Denote iPk DVf(x22) k) k Ixk(a) = Xk+l Xk(c) The algorithm that *we describe utilizes a sca- (24) where °k O (k 2 (25) = and mk is the first nonnegative integer m such that f(x) k ff( - fx (m) f[xk( af(k (x ______i discontinuity at the boundary of the constraint set whereby given a sequence {Xk} of interior points kthat convergs to I+(x) a k that converges to a boundary point x the set I+(x ) +_ k This may be strictly smaller than the set I (x). causes difficulties in proving convergence of the algorithm and may have an adverse effect on its rate of convergence. (This phenomenon is quite common in feasible direction algorithms and is referFor this reason we red to zigzagging or jamming). will employ certain enlargements of the sets I+(xk) with the aim of bypassing these difficulties. (23) Vkc > 0. k+l af(xk) , ax ) Unfortunately the set I+(Xk exhibits an undesirable k(xk) [xk cP is given-by Then x >0 such that Based on Proposition 1 we are led to the conclusion that the matrix Dk in the iteration k k+ 1 .(21) * a) The vector x is a critical point with respect to problem (1) if and only if x - x(a), . i (19) >0 'j:~~ af(x) < .{iJ]o- ......-<-L n JI+ I i 6 k) > om Pi + x fk axkii (26) kx' The stepsize rule (25), (26) may be viewed as a combination of the Armijo-like rule (7) and the Armijo rule usually employed in unconstrained minimization (see e.g. Polak [18]). When I+ is empty k the right-hand side of (26) becomes aomVf(xk) 'Pk and is identical to the corresponding expression of the Armijo rule in unconstrained optimization, (26) is +equality while if Ik = {1,2,....,n then inequality (26) is identical with (7). Note that, for all k; we have I (27) xk lar c > 0 (typically small), a fixedt diagonal positive definite matrix M (for example the identity), so the matrix Dk is diagonal with respect to I+(Xk). It is possible to show that, for all in > 0, the and two parameters B3c(0,1) and ae(O,8) that will be an tw c01armtr )tawilb used in connection with an Armijo-like stepsize rule. An initial vector x0 > 0 is chosen and at right-hand side of (26) positive if and only if the kth iteration of the algorithm we have a vector decreases the value of the objective function at each iteration k for which xk is not a critical. Actually the results that follow can be shown also for the case where MNis changed from one iteration to the next in a way that its diagonal elements are bounded above and away from zero. point, and We proceed convergence use of the is nonnegative, and it is x is not a critical point. k In conclusion the algorithm is well defined, ~~l~-~~T~~--_-~~--~~~~~-~~-us the---_I-~ of--I~~ essentially terminates if xk is critical. to analyze its convergence and rate of properties To this end we will make folloiing two assumptions: folio-~ring tw asuptos AUTHO'R PAGES HE RE ]iNMBER .- NMODE L PAPER 77% RfED. FINAL_ SIZE 81½ X 11 (A).;KiThe _gradient:.Vf-is -Lipschitz. continuous on each bounded set of R , i.e. given any bounded set chosen in a way-that approxiiates the'invers oif i'the portion of the Hessian of f corresponding to S c Ri there exists a scalar L (depending on S) such that IH tha-t (28) -- . .... J -- -s-1 11vztxj vrkpj ~ - "/ IVf(x) V . - l, - yI-. - ~o) ' ·· -....' VX,. y-y. ..............-l--.'" these -same-indices------------------------------I 'Proposition 3Let x* be a local minimum of, problem Assume -also--that (1) satisfying-Assumption (C).---. ' (B) holds in the stronger form whereby,-in-addition (B) There exist-positive:scalars XA, X2 and nonneg ative scalars q, q--such th I1 2 1 .. wk[x lx of Note that when q the form zlk 2 IzI < z'DkZ 2 < st a Mfxk ~~ksome 2AzI = q2 k_ _ _2n X (30) 0, relation (30) i , ', VzRn, k = 0,1 -hee. -. . {x I converges I to * and we have k 1 =- B(xk):= B(x*) , V k > k + 1-. -:- -then --... . ..... (31) ----- . are uni-f I problem ( No1teatwhnqlT conditions. For all x > 0 we denote by B(x) the set of indices of binding constraints at x, i.e. I+ .k..... {ix -O}, V x > . == 0, = {xx iten ie = (37) th Xk Under the assumptions of Proposition 3 we see that if the algorithm converges to a.local minimum x* satisfying Assumption (C) then it reduces evenby iternimization meth(24) and fore- We now focus attention at a local minimum satZisfying'D~I < hZ/ZI .I.~jlE~n~.X D.1..: (x B(x) (24) and for index k we have takes Proposition 2: Under Assumptions (A) and (B) above, every limit point of a sequence {xk} generated - (35)1 isf-o a soequence generated by iteration Dk-s -- and simply says that the eigenvalues of formly bounded above and away from zero. problem (1). ements d-to of41), the matrices satisfy- for some scalar. X- >0O----- ------------- {x k vicB(x*)} . (38) onverges wlavehave to x* and } = B(x*), =(x*), Vk> V >k (39) (37) +. (32) This shows that if the portion of the matrix D (C) The local minimuand away from (1) is such that for some 6 > 0, f is twice continously differentiable in the open sphere {x x-x*j < 6} and there exist positive scalars problem(A)sat(1). and(B) above, {xlx 2 is chosen to w see e k the inverse of the Hessian of f with respect to i indices then the algorithm eventually reduces x*=' these T = 0, . VizB(x*)} (38) to Newton's method restricted on the subspace T. mlz12 < zV More specifically, by rearranging indices if necessary, assume without loss of generality that 2 f(x)z < m2 |zj 2 , V x such that <We m i - x*i < 6cand z / 0 such that z i0, ix and icB(x*) correspondinger o the indices iofI (33) Ik = {r ... ,n}, (40) Furthere is some integer. index rk k we w ill have > UBx)= 0, i VicB(x*). 0}, V x 0. : (34) (32) The following proposition demonstrates an important property of the algorithm namely that under mild conditions it is attracted of oblem by a local (1) is minimum such x* satisfying Assumtion (C) anId identifies the set of active constraints atx* in a finite numbdiffer of iterations. Thus if the algorithm converges to x* Then D has the form -(39) Dix > k k ' - (41) Dk ° n dk where d' > 0, i =r. n and D can be an arbik rk+l k equivalent to an unconstrained ontimization method trary positive definite matrix. Suppose we choose restrictds oan tle sunl~~ sace of at [bindings)~; ) to No be he inverse of the Hessian of f ith torespect ____drestricted binding constraintsc on the subspao . at x*. ThiMs property is instrumetcall in proving spct to the indices i = ... rk i.e. the elements -suporlinear the - -- 1at Furthermore conavergenlce of the algorityhm wlhe portion ofDw to the indices of are fo ~ corresponding kx .. . . . .. then after a finite number of iterations it is : NU AU--THOR 77% RED. Rc,!fin text ob 2 f--..~a k= -I ktk [ ,o'R . rC IIEHE ES FINAL SIZE 8./ X 11 MODEL PAPERt s!.ccedi-,, Cd C f.ap.S.s ihee Vi!+t I -----t !- G::P.'. :McCormick, '"The: Variable. RedUction Meth 2 od for Nonlinear Programming", Management --- Science;--Vol-. --17,-- 1970, pp. 146-_160 . [12]1 - (42) 1 ` A | 2By Assption is positive (C), Vf(x*)definite on 113'i R.K. 'Brayton and J. Cullum, "An Algorithm for By Asstdmption (C), F f(x*) is positive definite on T so it follows from (39) 'that .--..this choice is well .. Minimizing a Differentiable Function Subject oLI Box Constraints and Errors";--Journal of defined and satisfies the assumption of Proposition' - to to Box Costraints and Errors"---Journal of 13 forlk sufficiently large. Since the conclusion Optimization Theory and Applications, Vol. Oimi-t 29, ,of this proposition asserts that the method even29 1979, 1979, pp. pp. 521-558. 521-558. . . .... .. . ..... ... 'tually reduces- to Newton's method restricted on the:- .. 'subspace T a superlinear convergence rate result -[14] IN. Murray, "Second Derivative Methods", in :follows. This type of argument can be used to -con--; , iNumerical Methods for Unconstrained Optimi!struct a number of Newton-like and Quasi-Newton zation, W. Murray (ed.), Academic Press, N.Y., 57-71.. ........... 1972, methods and prove corresponding' convergence and1972 pp. p -irate of'convergence'results , , .rat.e.of,., iconvergence'r '!,> ,. >+esui.ts.,...| j )[15]R. Fletcher, and T.L. Freeman, "A Modified Newton Method for Minimization", Journal of References . .'.'.. . ... .. Optimization Theory and Applications, 23, [1] [2] A.A. Goldstein, "Convex Programming in Hilbert Space", Bull. Amer. Math. Soc., Vol. 70, 1964, pp. 709-710. E.S. Levitin and B.T. Poljak, "Constrained Minimization Problems", U.S.S.R. Comput. Math. Phys., Vol. 6, 1966, pp. 1-50. [3] G.P. McCormick, "Anti-Zig-Zagging by Bending", Management Science, Vol. 15, 1969, pp. 315319. [4] D.P. Bertsekas, "On the Goldstein-LevitinPoljak Gradient Projection Method", IEEE Transactions on Aut. Control, Vol. AC-21, 1976, pp. 174-184. [5] J.C. Dunn, "Newton's Method and the Goldstein Step Length Rule for Constrained Minimization Problems", SIAM J. on Control and Optimization, Vol. 6, 1980, pp. b59-6/4. [6] U.M. Garcia-Palomares, and O.L. Mangasarian, "Superlinearly Convergc;nt Quasi-Newton Algo-rithms for Nonlinearly Constrained Optimization Problems", Mathematical Programming, Vol. 11, 1976, pp. 1-13. [7] J.B. Rosen, "The Gradient Projection Method for Nonlinear Programming, Part I: Linear Constraints", J. SIAM, Vol. 8, 1960, pp. 181217. [8] P.E. Gill. and M. Murray (eds.), Numerical Methods for Constrained Optimization, Academic Press; N.Y., 1974, Ch. 2 an 3. - [9] D. Goldfarb, "Extension of Davidon's Variable Metric Algorithm to Maximization Under Linear Inequality and Equality Constraints", SIAM J. of Applied Mlath., Vol. 17, 1969, pp. 739- [10] D.G. Luenberger, Introduction to Linear and Nonlinear Programming, Addison-lWesley, Reading, Mass. 1973, Ch. 11. [11] M.L. Lenard, "A Computational Study of Active Set Strategies in Nonlinear Progranmning with Linear Constraints", Mathematical Programming, Vol. 16, 1979, pp. 81-97. ' [16] I Vol. 1977, pp. 357-372. J.J. More and D.C. Sorensen, "On the Use of Directions of Negative Curvature in a Modified Newton Method", Mathematical Programming, Vol. 16, 1979, pp. 1-20. [17] D.P. Bertsekas, "Projected Newton Methods for Optimization Problems with Simple Constraints", SIAM J. on Control and Optimization, (to appear). [18] E. Polak, Computational Methods in Optimization: A Unified Approach, Academic Press, 1971.