4C EfNT-e 0R::-: ::iRS:tAR G NS I~~1:;~:I·.: *- Paprer-,. ; C'k ~ ~~~~~~~~~~~~~~:,, : :-'-'--; '--~~~~~~~: '"'-' ,- ' '','.-"''' ' -$ t :t~~~~h : : .; '..-.,.-..,! i- : . ' '- ,', : ';' ' ""- : ,t fdS000;0 ::ae woki0 0 t~~~~~:: 040-00:X ; : :=.- : : 1. ··. :..·:· ;; ";· -' r·- : ·- -·-·· :;· ;·-·. .;.·.. ---- ·. - ·- - · ···-. · -.. i I-. i i.· ·: - .·i:. -·1· :· '· :· :j r : . i INST.ITUTE " MASCTTS. . OFT.ECHNOLOGY...,. .A ~ Generalized Descent Methods For Asymmetric Systems of Equations and Variational Inequalities by Janice H. Hammond Thomas L. Magnanti OR 137-85 August 1985 GENERALIZED DESCENT METHODS FOR ASYMMETRIC SYSTEMS OF EQUATIONS AND VARIATIONAL INEQUALITIES by Janice H. Hammond Thomas L. Magnanti ABSTRACT We consider generalizations of the steepest descent algorithm for solving asymmetric systems of equations. We first show that if the system is linear and is defined by a matrix M, then the method converges if M2 is positive definite. We also establish easy to verify conditions on the matrix M that ensure that M is positive definite, and develop a scaling procedure that extends the class of matrices that satisfy the convergence conditions. In addition, we establish a local convergence result for nonlinear systems defined by uniformly monotone maps, and discuss a class of general descent methods. Finally, we show that a variant of the Frank-Wolfe method will solve a certain class of variational inequality problems. All of the methods that we consider reduce to standard nonlinear programming algorithms for equivalent optimization problems when the Jacobian of the underlying problem map is symmetric. We interpret the convergence conditions for the generalized steepest descent algorithms as restricting the degree of asymmetry of the problem map. KEYWORDS: variational inequalities, linear and nonlinear systems, steepest descent, Frank-Wolfe Algorithm. This research was partially supported by Grant # ECS-8316224 from the National Science Foundation's Program in Systems Theory and Operations Research. 1. INTRODUCTION Historically, systems of equations and inequalities have been closely linked with optimization problems. Theory and methods developed in one of these problem contexts have often complemented and stimulated new results in the other. In particular, equation and inequality systems are often viewed as the optimality conditions for an auxiliary nonlinear program. the physical sciences, variational principles For example, in (see Kaempffer [1967], for example) identify equilibrium conditions for systems such as electrical networks, chemical mixtures, and mechanical structures with equivalent optimization problems (minimizing power losses, Gibbs free energy, or potential energy). Similar identifications (e.g., identifying spatial price equilibria and urban traffic equilibria with minimizing consumer plus producer surplus (Samuelson [1952], Beckmann, McGuire and Winsten [1956])) have also proved to be quite useful in studying social and economic systems. Once a system of equations and inequalities has been posed as an equivalent optimization problem, nonlinear programming algorithms can be used to solve the system. Indeed, many noted nonlinear programming methods have been adopted and used with considerable success to solve systems of equations (e.g., Hestenes and Stiefel [1952], Ortega and Rheinboldt [1970]). 
However, viewing a system of equations or inequalities as the optimality conditions of an equivalent optimization problem requires that some form of symmetry condition be imposed on the system. For example, consider the general finite dimensional variational inequality problem mapping f:RnORn find If f x C C and a set VI(f,C): given a C C Rn, satisfying (x-x ) f(x ) is continuously differentiable, then 0 f(x) for every VF(x) x c C. for some map (1) 2 F:C C Rn+R if and only if Vf(x) is symmetric for all x c C. In this case the variational inequality system can be viewed as the optimality conditions for the optimization problem min {F(x):x c C. (2) Therefore, nonlinear programming methods can be applied to the optimization problem (2) in order to solve the variational inequality (1). Suppose now that Vf(x) is not symmetric, so that the variational inequality has no equivalent optimization problem such as (2). Can analogues of the nonlinear programming algorithms for (2) be applied directly to (1)? If so, when will they converge and at what convergence rate? This paper provides partial answers to these questions. we focus on solving systems of asymmetric equations f(x ) In particular, O0, which for purposes of interpretation we view as variational inequalities with in (1). In Sections 2-5 we introduce and study a generalized steepest descent algorithm for solving the asymmetric system simplified problem setting in which mapping. mappings. C = Rn f f(x ) O. Section 3 considers a is an affine, strictly monotone Section 4 extends these results to nonlinear, uniformly monotone Section 5 shows that the convergence conditions for the generalized steepest descent method can be weakened considerably if the problem mapping is scaled in an appropriate manner. Section 6 extends the results for the generalized steepest descent method to more general gradient methods. Finally, in Section 7 we consider a generalization of the Frank-Wolfe algorithm that is applicable to constrained variational inequalities. To conclude this section, we briefly outline the notational conventions and terminology to be used in this paper. be introduced in the text as needed. Other definitions and notation will 3 of be a real M Let nxn In general, we define the definiteness matrix. M e.g., without regard to symmetry: M if and x Mxonly > 0 if for nonzero M, part xofC Rn. symmetric the every if is positive definite if and only T Recall that is (Mpositive + M ), definite by MM = is defined positive definite. nxn An defines an inner product G positive definite symmetric matrix n R: on (x,y)G where denotes definition and = G product defined by T := xT Gy, denotes transposition. induces a norm on Rn: IXIIG = (XX) G which in turn induces a norm on the IA I := X) nxn matrix sup IIAx A: IG IxIIGl-1 IIAI IG' we implicitly assume that (By writing The inner has the same dimensions as A G.) The mapping f:C for every x C, y C; for every x C, y C C x if for some scalar C, y C, where is monotone on RnR strictly monotone on with k > 0, I1 11 y; x x if (x-y)T(f(x) - f(y)) (x-y)T(f(x) - f(y)) > 0 if C and uniformly (or strongly) monotone on (x-y)T(f(x) - f(y)) kx-yj|2 for every denotes the Euclidean norm. d in Rn , in the direction d; i.e., Finally, for any two points the ray emanating from C x and [x;d] = {y : y - x + d, e 2 0}- we let [x;d] denote 4 2. 
BACKGROUND AND MOTIVATION In this section, we introduce a generalized steepest descent algorithm for asymmetric systems of equations and show that the algorithm's convergence requires some restriction on the degree of asymmetry of the problem map. Consider the unconstrained variational inequality problem where f is continuously differentiable and uniformly monotone. unconstrained problem seeks a zero of the mapping for every If x Rn Vf(x) if and only if f(x ) is symmetric for every some uniformly convex functional satisfying C = Rn. VI(f,Rn), f(x ) 0 = since *T * (x-x ) f(x ) 0 0. x c Rn, F:Rn R, f, This then f is the gradient of and the unique solution x solves the convex minimization problem (2) with In this case, the solution to the unconstrained variational inequality problem can be found by using the steepest descent method to find the point x at which F achieves its minimum over Rn. Steepest Descent Algorithm for Unconstrained Minimization Problems x0 E R . Step 0: Select Step 1: Direction Choice. then stop: Step 2: k xk Set k 0. Compute * x . -VF(xk). If k VF(x )O 0, Otherwise, go to Step 2. One-Dimensional Minimization. Find xk1 = arg min{F(x) : x c [xk-VF(xk)]} Go to Step 1 with k k + 1. E Curry [1944] and Courant [1943] have given early expositions on this classical method. Curry attributes the method to Cauchy [1847], while Courant attributes it to Hadamard [1907]. [1971], or Bertsekas [1982]), if As is well-known (see, for example, Polak F is continuously differentiable and the 5 level set F(x0 )) {x : F(x) is bounded, then the steepest descent algorithm either terminates finitely with a point infinite, and every limit point exists) satisfies x x satisfying of the sequence VF(xN) {xk } 0 or it is (at least one VF(x ) = 0. Local rate of convergence results can be obtained by approximating by a quadratic function. F(x) = In particular, suppose that some positive definite symmetric nxn matrix Q. Then if respectively, the largest and smallest eigenvalues of r = k {x (A-a)/(A+a), the sequence } Q A Q x-x *l and F(x) a for are, and generated by the steepest descent algorithm satisfies k+l When f r 2 F(xk+l x*2 = F(k+ - - IIQ -Q r2 IIxk - is a gradient mapping, we can reformulate x IIQ (3) VI(f,Rn) as the equivalent minimization problem (2) and use the steepest descent algorithm to solve the minimization problem; equivalently, we can restate the steepest descent algorithm in a form that can be applied directly to the variational inequality problem. To do so, we eliminate any reference to algorithm and refer only to f(x) VF(x). = Since F(x) F(x) is convex, in the k+l x solves the one-dimensional optimization problem in Step 2 if and only if the directional derivative of directions. F k+l at x is nonnegative in all feasible Therefore, the algorithm can be restated in the following equivalent form: Generalized Steepest Descent Algorithm for the Unconstrained Variational Inequality Problem x0 Step O: Select Step 1: Direction Choice. E R . Set k Compute Otherwise, go to Step 2. 0. -f(x k). If f(x) 0, stop; x = x 6 Step 2: One-Dimensional Variational Inequality. Find E [x ;-f(xk)] x satisfying (x-xk+)T f(xk + ) Go to Step 1 with 0 = k for every x [x k;-f(xk)]. a k + 1. As stated, the algorithm is applicable to any unconstrained variational inequality problem. It can be viewed as a method that moves through the "vector field" defined by f by solving a sequence of one-dimensional variational inequalities. 
The generalized steepest descent algorithm will not solve every unconstrained variational inequality problem, even if the underlying map is uniformly monotone. If f is not a gradient mapping, the iterates generated by the algorithm can cycle or diverge. The following example illustrates this type of behavior. Example 1 Let f(x) = Mx, x e R2 where definite (because = I), M gradient map (since Vf(x) f = M and M Vf(x) - M Since . is uniformly monotone. I If the = [], x If is not symmetric. is orthogonal to this example, f is a p > 0, f is not a gradient Let = [:] and consider the x p - 1. As long the one-dimensional variational inequality subproblem on thxtheonek+l kth iteration will solve at the point f(xk + ) is positive p = O, progress of the generalized steepest descent algorithm when as M is symmetric) and the generalized steepest descent algorithm will converge. mapping, since P1 = -f(xO) u_ f(x(xk), xk+1 at which the vector the direction of movement. [], which implies that xl [ ] since In 7 f][2] is orthogonal to [ 2 1 [] x4 1[i] xO. [1 ], x x22 Similarly, xx 3 1] [ - ], and Thus, in this case, the algorithm cycles about the four points [ ] and [1]. Figure 1 illustrates this cyclic behavior. (In the figure, the mapping has been scaled to emphasize the orientation of the XI vector field.) = Mx* f(X f(x') = M,x' X f(X3 ) = Mx' X / I I I I I = X4 movement direction -f(x') draws away from solution as p Increases from 0 to 1 - x xl f(x') = M,x' -…__-------. X f(X) = MX' Figure 1: The Steepest Descent Iterates Need Not Converge If M is Asymmetric The iterates produced by the generalized steepest descent algorithm do not converge when p 1 because the matrix M p is "too asymmetric": the off-diagonal entries are too large in absolute value in comparison to the diagonal entries. Geometrically, if points directly away from the solution p = x 0, the vector field defined by O; as p f increases, the vector field begins to twist, which causes the movement direction to draw away from the solution until, when p fact, x x and only if P[ IPI O < 1. 1, , the algorithm no longer converges. In so the iterates converge to the solution if 8 In the following analysis, we investigate conditions on that ensure f that the generalized steepest descent algorithm solves an unconstrained variational inequality problem that cannot necessarily be reformulated as an equivalent minimization problem. 3. THE GENERALIZED STEEPEST DESCENT ALGORITHM FOR UNCONSTRAINED PROBLEMS WITH AFFINE MAPS In this section, we consider the unconstrained variational inequality problem VI(f,Rn), f(x) assume that = is a uniformly monotone affine map. where f Mx-b, where M is an nxn Thus, we real positive definite matrix and b Rn. 3.1 Convergence of the Generalized Steepest Descent Method is affine, we can easily find a closed form expression for the f When steplength ek on the th k-t h iteration. Lemma 1 Assume that f(x) = Mx-b. Let k x be the th k- h iterate generated by the generalized steepest descent method and assume that th th k- steplength determined on the k x * j x . Then the iteration is kb)T (Mxk b) (Mx k~. ~k k_ )T (4) (Mxk-b) M(Mx -b) Proof Step 2 of the algorithm determines (x-xk+)Tf(x k+ ) 0 for every xk+1 £ [xk; -f(xk)] x c [xk; -f(xk)]. satisfying (5) 9 k+l If x -fT(xk)f(xk) k k , inequality (5) with 2 0. But then f(x ) becomes O, so the algorithm would have terminated in Step 1. Hence, we assume that ek (x) k - f(x ) x - k+l k x ~ x , and therefore that 0. 
Substituting x x - ef(xk) x + and x - f(xk) into (5) gives k (ek-e)f T(x )f(xk-Okf(xk)) 2 0. Since this inequality is valid for all fT(xk)f(xk-kf(xk)) 0 must hold. = expression and solving for When f ek 6 0, and ek > F satisfying f(x) VF(x), When the Jacobian of VF(x) = f(x) convergence does not apply. into this yields the expression (4). steepest descent algorithm follows from the fact that function the condition f(x) - Mx-b Substituting is a gradient mapping, i.e. function for the algorithm. , o convergence of the F(x) f(x) is a descent is not symmetric, no exists, so in general this proof of Instead, we will establish convergence of the generalized steepest descent method by showing that the iterates produced by the algorithm contract to the solution with respect to the that M = (M + MT)). The M M norm (recall norm is a natural choice for establishing convergence because it corresponds directly to the descent function when f(x) while - Mx-b 1x-x* 12 and M is symmetric. (x-x*)TM(x-x* ) = In this case, 2F(x) + x*TMx , function for the algorithm if and only if I x-x I| so F(x) F(x) F(x) (1/2)xTMx-bTx, is a descent is a descent function for the algorithm. The following theorem states necessary and sufficient conditions on the matrix M for the steepest descent method to contract from any starting point with respect to the M norm. 10 Theorem 1 Let M f(x) = Mx-b. be a positive definite matrix, and Then the sequence of iterates produced by the generalized steepest descent method is VI(f,Rn) if and only if the matrix M2 of the problem x norm to the solution M guaranteed to contract in is positive definite. Furthermore, the contraction constant is given by r ((Mx)T(Mx) xTM2x xO x Mx (Mx) M(Mx) 1/2 - inf L (6) Proof *k x j x For ease of notation, let e = ,ek algorithm, let x x - (Mx- b), where | r l|IM T(x) := x*Il. S rnix x - x I r := sup* T(x). r iterate generated by the be the (k+l) s t iterate; then wil 1 show that there (Mx-b)(Mx-b) (Mx-b) M(Mx-b) r c [0,1) exists a real number x and let th kth be the that is independent of x r Because /I Ix - x ll and satisfies must satisfy for every x is clearly nonnegative, since x , we define T(x) > 0 for every xfx x x . We now show that IIzil 2M Because I Ix-xI |M zTMz r < 1 if and only if for every - [(x-x* ) TM(x-x *)] , and =- )] )-(M-b)[(-(Mx-b)-x*)TM(x-(Mx-b)-x x-x *)TM(x-*) is positive definite. we have that z c R, = [(x-x) TM(x-x*)-e(Mx-b)TM( M2 x-x*)-(x-x )TM(Mx-b)+82 (Mx-b)TM(Mx-b)] [M(x-x )] [M(x-x ) [(x-x*)TM2 (x-x *) [M(x-x )] M[M(x-x )] ] 11 e where the last equality follows after substituting for Mx - b - M(x - M * y x-x # 0 b) - Thus, T(x) T2 T R(y) := [(My) (My)][y My] [yTMy][(My) M(My) ] and sup [1-R(y)]l yO M(x - x ). with [1-inf R(y)]½ y/O where r - sup*T(x) = xx inf R(y) > 0. y#O M2 Then is positive definite. M Suppose that [1-R(y)], Note that if and only if < 1 and replacing is positive definite, and M2y T inf R(y) = inf y#O yO0 T yTMy T y y (My)TM(My) (My)T(My) yTM2y inf T my0 sup (My) TM(My) sup yO0 yy y~O X (M2 ) min [X max (Max( where X in(A) mi n ( A max eigenvalues of the real symmetric matrix A. symmetric, has positive real eigenvalues. definiteness of Consequently, M (My) ) (7) (M) ) denote, respectively, the minimum and maximum Ama (A) and (My) M , being positive definite and Similarly, the positive ensures that the eigenvalues of and hence inf R(y) > 0, M are real and positive. r < 1. yfO M2 Conversely, if nonzero vector yTMy >O. y. Because y # 0 Moreover, which implies that is not positive definite, then r 1. 
M is positive definite, ensures that yTM2y 0 for some (My)TM(My) > 0 (My)T(My) > 0. Thus, R(y) and 0, 12 The expression defining the contraction constant follows from the convergence proof. O Corollary is bounded from above by r The contraction constant [ [i min(M r = [ L - -T^ 2 ^ -1I ) 1 -2 JI max(M) max(M) Proof r is defined by . r = [1 - inf R(y where yfO yTM2y inf R(y) - inf y#O yfO *(My) T inf (My) (My) TM(M)y) y My y#O yTM2y yTMy My sup y~o (8) T (My) (My) (y)T(my) The numerator of the last expression can be rewritten as follows: TM2 inf T yO0 yTMy = = yTM2y inf = inf yfO = inf ziO Y ( yTy Y z z Xmin(A), mmn where A Now for any (M I)- TM2 (MJ)-1 nxn matrix M½ and G, X is any matrix satisfying is an eigenvalue of A (Mi)T (M) if and only if M. is 13 an eigenvalue of if G -M, , G AG, G then (see, for example, Strang [1976]). ,, AG = M In particular, and hence hence-1 and (MN min(A) - X (9) ). Finally, sup (My) yOf M(My) (My) The result follows from (8), Mz z z (My) sup (10) max D (9) and (10). Note that a different upper bound on r can be derived from inequality (7) in the proof of Theorem 1: i r r := I [1- 1 - In general, this bound is not as tight as the bound corollary. To see this, let that has positive real eigenvalues. S-IT S and X min(T) = min T r given in the be symmetric matrices, and assume min Ilxlil Then xTTx 'T -1 S x SS Tx (S-lT)xTSx = 'iA min (S- T)· iX(S min where Let max ||XI||- xTSx T)X (S), max is the eigenvector corresponding to the minimum eigenvalue of S - M and T = M2 . Then S-1T T S M-1M M2 M S- T. has has positive real eigenvalues 14 because it has the same set of eigenvalues as the positive definite symmetric Thus, (M)-TM (M)l. matrix M) min(M . max which implies that r (M) X (M A2Q, min = ( 2 i' - max) 3.2 r. 2 in max)] Discussion of M 2 The theorem indicates that the key to convergence of the generalized steepest descent method is the matrix M2 . If the positive definite matrix M is symmetric, the convergence of the steepest descent algorithm for M unconstrained convex minimization problems follows immediately: positive definite because M, MTM = positive definite imposes a restriction on the degree to which from M . To see this, note that M2 M M can differ for every x O0. is positive definite if and only if for every nonzero vector the angle between the vectors MTx The positive definiteness of on the quantity 1 1M-MT 11 be is positive definite if and only if M xTM2x = (MTx)T(Mx) > 0 Thus, In being positive definite, is nonsingular. general, the condition that the square of the positive definite matrix is and Mx M2 is acute. does not imply an absolute upper bound for any norm increase this quantity by multiplying x, M ||-|, because we can always by a constant. However, if M2 is 15 jIM-MTI I/ M+MTII positive definite, then the normalized quantity less than 1. This result follows from the following proposition. (Anstreicher [1984]) Proposition 1 M Let be an T hen, real matrix. nxn positive definite if and only if x must be for any norm M2 l(M+MT)xll < l(M-M )x 11., is for every 0. Proof T M2 + (M2) Thus, ' i[(M+MT) xT(M2 )x > 0 2 + (M-MT)2] xT(M2 - + (M2) T)x > xT(M+MT)2x + xT(M-MT)2x > 0 T xT(M+MT ) (M+MT)x > -xT(M-M I (M+MT)xll > In particular, if T, , .T ,, I IM- M I M M xT(M ) 2x T ) (M-MT )x (M-MT)xl. O i s positive definite, then max ,- , -T, ,, IItm-m xll -'I ,, Im -T% - II -M - x I I < I .... , T%- I I I I tMtm )X I I I x 1=1 max II (M+MT)x I = IIM+MT I, where x - argmax I1xl11 I xlll Consequently, I I- MT I1l/ IM+MI C(M-MT )x I I. < 1. 
For related results on the positive definiteness of M2 , see Johnson's [1972] study of complex matrices whose hermitian part is positive definite (see also Ballantine and Johnson [1975]). This thesis and subsequent paper describe conditions under which the hermitian part of the square of such a matrix is positive definite. 16 3.3 Discussion of the Bound on the Contraction Constant Let us return to the problem defined in Example 1. f(x) M x definite. is affine and strictly monotone, since MP ' For this example, 2p is positive definite if and only if M M -M 12/IM M 12 < 1 if 11iN M1-0 pMT2 P P pp2 2 M 212p 2 ~0 -P and Ipi < and only if .[1 M ,1-p The mapping . is positive 2 , Moreover, is positive definite, since 0 2 22 21 For this example, the upper bound on the contraction constant given in the corollary is tight. To see this, first note that k+1 p[ 0 -1 k Jx 0- |IXk k1 and |M x * O. 0 I, so the Thus, IIXk+ | 112 (P2(Xk)TIX k) l1I p112 olx * 12 i = p Recall from the example that norm is equivalent to the Euclidean norm. x M ' p -pI Ix X I 2 hence the contraction constant for the problem is 2 1P|- The bound given by the corollary is also Ipl, because min(M n) ami ) - p and Xmx(M) max0 , giving - [1-(l-p2 )] - ipi M p 17 and applying the Kantorovich M found by diagonalizing r upper bound on is symmetric, a tighter M If may be quite loose. r contraction constant on the r For affine problems defined by symmetric matrices, the bound inequality (see, for example, Luenberger [1973]) is rmax r k X = - Xmin(M) + nM) for example, if r r - 0.82 and r in the same way if then 1, then k = 3, 0.58; if - k - while (k-l)/(k+l), in (M), r (M)/ In terms of the condition number . r = S rs = 0.5 0; r and = k is not symmetric. then 1.5, and if r = 0.82; A matrix Thus, [(k-l)/k]. M r s k - 0.2 10, and then cannot be derived r This tighter upper bound on 0.95. M if r can be decomposed into its spectral decomposition (and hence is unitarily equivalent to a diagonal matrix) if and only if and only if T T M M = MM . diagonalize M r s 3.4 is normal, which is true for a real matrix M M Thus, if if is not symmetric, we cannot necessarily and use the Kantorovich inequality to obtain the upper bound on the contraction constant Sufficient Conditions for r. M2 to be Positive Definite We now seek easy to verify conditions on the matrix that the matrix M M is positive definite. M that will ensure The following example shows that the double (i.e., row and column) diagonal dominance condition, a necessary and sufficient condition for convergence for the problem in Example 1, is not in general a sufficiently strong condition for M2 to be positive definite. 18 Example 2 2 0 0 4 2 Let .99 0 SinceM det(2M .99 ) 01 -3.2836, 1 . Then M 0 0 8 2.97 2.97 Mis not - _2.97 2.97 0 and 2M 2.97 2 positive 01 definite, and - therefore M2 1 _2.97 0M2 D is not positive definite. Results by Ahn and Hogan [1982], (see also Dafermos [1983] and Florian and Spiess [1982]) imply that the norm condition D - diag(M) and B - M-D, ID BD I112 , < where ensures that the Jacobi method will solve an unconstrained variational inequality problem defined by an affine map. (I ID BD- 112 1 < implies the usual condition for convergence of the Jacobi method for linear equations, radius of the matrix A, p(D-1B) < 1, and Chan [1981] show that if I ID-BD- 112 < 1. 
ID- BD < 1 M2 II2 p(D-1 B) because M where p(A) is the spectral D-1 BI ID < ID½BD- 112 ) Pang is doubly diagonally dominant, then Example 2, therefore, also demonstrates that is not a sufficiently strong condition on M to ensure that is positive definite. The following theorem shows that stronger double diagonal dominance conditions imposed on M guarantee that which in turn implies that M2 M2 is doubly diagonally dominant, is positive definite. Theorem 2 Let M - (Mij) for every i be an nxn matrix with positive diagonal entries. 1,2,...,n, IMijI ji1 < ct InMjil and joi < ct, If 02 _ 19 min{(Mii) : il,...,n} where t = and max{Mii..: il,...,n} c = / - 1, then both M and M2 are doubly diagonally dominant, and therefore positive definite, matrices. Proof Let M.. be the element of (i,j)th M. Then the (i,j)th element of 13is is (Mii) 2 (M )ij = + r kfi n = MikMkj k=1 iiMij + MijMJj M2 To show that (M2)ii i,j + (Mii) > Z I(M 2 ) i I and (M2)ii > > (Mii) > - - I(M 2 )jil Z Jr. k(i and if ij. MikMkj is doubly diagonally dominant, we must show that JT'' i.e., that if i=j MikMki E k~i Mik ki + Mik M ki + i~i Mii.M. MijMjj + + MikMkj I ki,j z IMiiM i + MjiMjj + I M jkki joi ki,j (11) (12) To show that (11) holds, it is enough (by Cauchy's Inequality and the triangle inequality) to show that (Mi2 > k i IMikllMkil + iij i IMij I + Mi I ij jiJ I+ z IMikl MkIjl jii ki,j Because the last term in the righthand side of the above expression is equal to ZE kji IMikMllkjl, j3i,k the righthand side is the sum of the first and last terms in 20 | kl['M il + kri E jSi,k 1\kj] E kii 'Mik [ £ 'kjl ]. jok Consequently, to show (11) is true, we show that IMik Mii [ Ikjl] +Mii Mij I + E ik 1f kJi E Mi IMij i jji (13) t To establish (13) (and hence (11)), we introduce the quantity in the statement of the theorem. (i) t Mii for every defined Note that for (Mii) /Mii = M..ii t (since iil,...,n every i), and (ii) Max{Mii: i-=l,...,n} t (Mii) The bounds on the off-diagonal elements of for every M i=l,...,n. assumed in the statement of the theorem ensure that the righthand side of (13) is bounded from above by IMik(ct) + Mii(ct) + Max{Mii: i=l,...,n}(ct) k#i < c2 t2 + Mii(ct) + (ct) Max{Mii: il,...,n} 2 2 + 2c(Mii) 2 (Mui)if + (Mii) c(MII) (i)and since(ii). (Mii) (c + 2c) (Mii) 2 by or, holds Thus, (13)- c c c + 2c - 1 /-1, 0, which holds if and only if (13), and therefore (11), must hold. then (12) must hold. c e [-/'-1, /-1]. Similarly, if M2 These two results establish that > 0, if Thus, if c - -1, is doubly diagonally dominant whenever the hypotheses of the theorem are satisfied. The double diagonal dominance of diagonally dominant. Because M1 M2 M2 ensures that is doubly is symmetric and row diagonally dominant, by the Gershgorin Circle Theorem (Gershgorin [1931]), M has real, positive 21 eigenvalues. Since M is symmetric and has positive eigenvalues, positive definite, and hence M M is is positive definite. The conditions that the theorem imposes on the off-diagonal elements of M also ensure that M is doubly diagonally dominant, and hence that positive definite. stronger than the assumption that that M2 M is ° Because the assumption that imposed on M M M is doubly diagonally dominant is is positive definite, the conditions in Theorem 2 are likely to be stronger than necessary to show is positive definite. 
In a number of numerical examples, we have compared the following three conditions: (1) the conditions of Theorem 2; (2) necessary and sufficient conditions for the matrix M2 to be doubly M2 to be diagonally dominant; and (3) necessary and sufficient conditions for the matrix positive definite. From the proof of Theorem 2, we know that conditions (1) imply conditions (2), and conditions (2) imply conditions (3). The examples suggest that there is a much larger "gap" between conditions (2) and (3) than between conditions (1) and (2). Thus, it seems that we cannot find conditions much less restrictive than the conditions of the theorem as long as we look for conditions that imply that M2 M2 is doubly diagonally dominant instead of showing directly that is positive definite. The conditions that Theorem 2 imposes on the off-diagonal elements of are the least restrictive when the diagonal elements of M are all equal. Section 5, we show that by scaling the rows or the columns of M M In so that the 22 scaled matrix has equal diagonal entries, we may be able to weaken considerably the conditions imposed on M. We close this section by noting that the positive definiteness of the matrices M M2 and consequence, if M is preserved under unitary transformations. and M2 As a are positive definite, then the generalized steepest descent method will solve any unconstrained variational inequality problem defined by a mapping to 4. f(x) Mx-b, where M is unitarily equivalent M. THE GENERALIZED STEEPEST DESCENT ALGORITHM FOR UNCONSTRAINED PROBLEMS WITH NONLINEAR MAPS If f:R n_ Rn is not affine, strict monotonicity is not a sufficiently strong condition to ensure that a solution to the unconstrained problem n VI(f,R ) exists. If, for example, Because the ground set no solution. x n - 1 and f(x) - e , R then VI(f,R 1) has over which the problem is formulated is not compact, some type of coercivity condition must be imposed on the mapping (See, for example, to ensure the existence of a solution. f Auslender [1976] and Kinderlehrer and Stampacchia [1980].) solution to VI(f,Rn) hemicontinuous. is ensured if The existence of a f is strongly coercive and Therefore, because uniform monotonicity implies strong coercivity, in this section we restrict our attention to problems defined by uniformly monotone mappings. The following theorem establishes conditions under which the generalized steepest descent method will solve an unconstrained variational inequality problem with a nonlinear mapping f. In this case, the key to convergence is the definiteness of the square of the Jacobian of solution * x . f evaluated at the 23 Theorem 3 Let M - Vf(x ), Let that M2 close in be uniformly monotone and twice Gateaux-differentiable. Rn f: Rn is positive definite. and assume Then, if the initial iterate is sufficiently the sequence of iterates produced by x , norm to the solution M VI(f,R ), is the unique solution to x where M the generalized steepest descent algorithm contracts to the solution in norm. Proof I To simplify notation, we let E := IIX-X*11 be the initial iterate. x x Let this proof. norm throughout denote the M |' We will show that if is sufficiently small, then the iterates generated by the > 0 x . algorithm contract to the solution the first iterate generated by the x, By Step 2 of the algorithm, < 1. 
We assume that algorithm, solves the one-dimensional variational inequality problem on the ray [x; -f(x)] emanating from in the direction x Lemma 1 demonstrates that the solution satisfies x - f(x), fT (x)f(x) = The proof of to this one-dimensional problem x as Thus, if we parameterize the ray [x; -f(x)] O. x - x - then -f(x). f(x), where the steplength e is defined by the equation fT(x)f(x - f(x)) By Lemma 2, which follows, the value to determine an expression for we approximate f about x (14) 0. satisfying (14) is unique. 0, for any x S :" {x : with a linear mapping and let IX - X |I v error in this linear approximation, i.e., f(x) - f(x ) + Vf(x )(x - x ) In order + v ' M(x - x ) + vx X x = E} denote the 24 Substituting M(x - expression for f(x) - x ) + v- for x f(x) in (14) yields the following : fT(x)M(x - x ) + fT(x)vfT(x)Mf( x fT (x)Mf(x) To show that the iterates generated by the algorithm contract t, o the solution in M norm, we show that there exists a real number is independent of must satisfy x S , x lix - x l/llx T(x) := r we define IIx - x i < rllx - xlI. and satisfies r sup T(x). r - x*l that 0,1) r Beca use r for every T(x) > 0 is clearly nonnegative, since xcS £ for every x S . We now show that have that T(x) R(x) := = r < 1. Proceeding as in the proof of Theorem 1, we [1 - R(x)]½ , -T where fT(x)M(x - x ) + - 2fM* 2 *T (x - x )TMf(x) fT(x)Mf(x) - (15) (x - x )TM(x - x ) Substituting for in (15) and replacing f(x) M(x - x ) + vx with we have that R(x) - *)TMTMx* * *TM2 * ) TMT(x - x )][(x - x ) (x - x )] - E(x) [(x - x I (16) v-, and is [(M(x - x ) + vx] M[M(x - x ) + v )][(x - x* ) M(x - x )] x x where the error term E(x) contains all terms involving v and x given by Tv* *T T *T -{v M(x - x* ) + vT[M(x - x ) + v x] }{(x - x ) Mv - v[M(x - x ) + v x] x x x x x x + (x - x ) M M(x - x ).{(x - x )TMv x + (x - x*2(x - x ){v x - x) - v-[M(x - x ) + v ]} x v[M(x X x - x ) + vx x ) 25 E(x) can be bounded from above using the triangle inequality, Cauchy's inequality, and the fact that the matrix norm IIAII'[xjI E(x) IE(x) IlAx I I A I satisfies for every vector x: 1 11 Ivxl I M l - X| + IIvx |.[MI l| I - X | + lvx |11]2 l M2 11)llx - x*1I 12 + {( I IMTMI I + {I IxlI IMIIIXI II x I+ IIIv < {kllix - x*113 + k 2 iix - X* 12 [k + k1X - x 12 3 - x I I + I I }1 x - x*11 + k 4 11X - x*112]}2 {kll x - x*113 + k 2 llx - x [k311x - x i kx MI'I '[ I |l + k 4 lIx - x *12]} - x 115 where the second inequality follows from Lemma 3, which follows, and the third x - x inequality holds since k. 1 1 < 1 for every x S. k since each 0 O0. Since r -I I sup [1 - R(x)]' = [1 xcS C sup T(x) xESC and only if inf R(x) > 0. We therefore show 1that Dividing the numerator and denominator (16) by (recall that I:II denotes the M norm) and - *) inf R(x)] , xeS Ilx - x*)TM M(x - x )] x || [(x using inf R(x) xeS c inf XESE (x - x )TM(x - x) kl I if inf R(x) > 0. 
xES -E(x) - kllx - x 115, we obtain (x - x*)TM2 (x r < 1 - x*113 (x - x )TMM(x - x ) [M(x - x*) + v ]TM[M(x - x *) + - x*TT(x - x* (x - x ) M M(x - x ) x] 26 inf xS XcS (x - x*)TM2 (x - * ) sup kl|x - x T )TM(x - x*) xS XE * TMT ( * - SUP [M(x - x ) + vX] X sup XES * )TMTM(x - ) * T * L (X - 13 M[M(x - x ) + vX] (x - x )TM TM (x - x ) Considering each of these three expressions separately, we have: (X - X ) M (X - x ) inf xES (x - x )TM(x - x ) g inf xcS (x-x*)TM2(x- sup (x- *T (x - x )(x x*) X (M2 ) min * - x ) x )TM(x - x*) kmax ( > 0 ) xcS (X - x )T(x - x ) M2 since and M are positive definite; (x-x )TM(x-x*) kl Ix - x I13 sup xcS = kllx - x II (x-x *)T(x-x* ) sup (x - x ) MTM(x - x ) ** (X - x ) (X - x ) E C ) x (M) kcm maxin = be , Xm2.n (MM) where sup x ES b : k.e ax(M)/ X xS ;S M) 2 0; [M(x - x ) + v x ] TM[M(x -x and ) + v x] (x - x* )TMT M MM(x - x* ) £ sup in (MT [(M(x - x ) ]TM[M(x - x)] + [M(x - x )].[M(x - x )] ~max (M) + up I xESc I1 · I IM2 I IX - sup xES vTM (x - x ) + (X x )TMTMv (x - x )TMTM(x E I+ I x1- x*1-I I IMTMI1- I I (x- x ) TMTM(x - x*) x + VTMv x x x - x*) I1+ I I I IMI I I IVx I 27 sup(M)+ ( I -x x II cI )TTM(x - x ) x max(M) + sup <X max (M) + max XESc (-x x - *TT * (x - x )TM M(x - x ) inf xcS =X (x-x ) TM(x-x*) ) T(x-x ) sup xESE clix * T (X - X ) ( ACE C max(M) ma (M) + - ) = X max (M) + a, (MM) min where c 0, and hence a :- cX (M)/X in(MTM) a 0. Combining these inequalities gives X (M2 ) mbin - bc x inf X£S R(x) a Xmax (M) + aE c which is greater than zero if is sufficiently small, since the denominator X (M2)/X (M) > 0, min max is positive, constant (M) max and b 0. Hence, the contraction is less than 1. r Lemma 2 If unique f is uniformly monotone, then, for a given > 0 satisfying fT(x)f(x - Bf(x)) - 0. is the modulus of monotonicity of f. x there exists a x , Moreover 7 9 -, where a 28 Proof solves the one-dimensional variational inequality problem e a 0 [x - f(x) - (x - i.e., e satisfies Thus, e solves f(x)) f(x))] f(x - where VI(g,R 1), 2 )-g(e x, 1 )] (e 2 e 0, 0. The existence is uniformly g allf(x) 112: monotone with modulus of monotonicity (e02-1)[g(e for every -fT(x)f(x - ef(x)). g(e) : 6 follow, because, for a given and uniqueness of e for every f(x)) a 0 (x)f(x - -(8 - )f 0 1 )fT(x)[f(x-elf(x)) - f(x-6 2 f(x))] = [(x-elf(x)) - (x-62f(x))] [f(x-el )_f))-f(x-2f(x))] lIx - elf(x) - x + e2f(x) 112 - [Illf(x) l2] ( 2 * Moreover, Moreover, e is positive, because x o< To show that f (x)f(x - ef(x)) Since - implies that g(O) 02 = e we set -, x a0 1. Therefore, since 81 '0 11 f(x)j2 2 < 0. in expression (17). -2 T ae2fT(x)f(x). fT(x)f(x) e >0 and - this substitution gives 0, -T Using the fact that (17) 1)2 and f(x) # (because 0 x $ x ), we see that a > 0, e <- Lemma 3 Assume that S : { : II fT(x)f(x) - O. - let 1.11 f has a second Gateaux derivative on the open set 11 < 1}. Let Let x x - ef(x), v - f(x) - M(x - x ), denote M norm. the denote the M norm. let where e is chosen so that v- - f(x) - M(x - x ), and 29 Then, for there exist constants 1, 2 and 3, i x the following conditions for any (i) (ii) (iii) - S: clIx - x*112; Ilvxll Ix that satisfy 0 Ci *1 lIv-11 c3 lx and I x - x*I; c2 x*112. - Proof (i) IlVxll = I - f(x) - M(x - x ) I X*) 11 If(x) - f(x ) - Vf(x )(x - I sup i ,x I IV2 f[x * + t(x - x ) -* 2x*12 0;5t;1 <-sup xcES sup 0 O<t-1 IjV 2 * f[x + t - )]II1}1Ix - x* 12 - clllx - X*11 2 . where the first inequality follows from an extended mean value theorem stated as Theorem 3.3.6 in Ortega and Rheinboldt [1970]. 
(ii) II - X*11 Ix - [M(x - x ) + v xx] - x < IIx - x IMi · IX - + c Clearly, 0. II x*II + IIx - x*112 . c211x-x II. where the first inequality follows from Lemma 2 and (i), and 2 := 1 + 2 IMI + a~~~~c c1 a >O because IIMII > 0, C1 k 0, and a > 0. 30 -= f(a) - f(x*) - Vf(x*)(x - x )1l S sup tV2f[x + t(x - x )]!l~x - x 112 up* {x:llx-x 11c C311 - - * c 3 c21Ix - x 1 = 311x- x 5. k and 12 where the last inequality follows from (ii) and 2 x)]*112 112 C3 - 2 c2 I- sup II2f[x* + t(x - X*)1i 2} o0tl c3 O because c3 0 >0. o SCALING THE MAPPING OF AN UNCONSTRAINED VARIATIONAL INEQUALITY PROBLEM In this section, we consider a procedure for scaling the mapping of an unconstrained variational inequality problem that is to be solved by the generalized steepest descent algorithm. We first consider the problem VI(f,R n ) f(x) defined by the affine mapping either the rows or the columns of M Mx-b. We show that by scaling in an appropriate manner before applying the generalized steepest descent algorithm, we can weaken, perhaps considerably, the convergence conditions that Theorem 2 imposes on M. When f(x) - Mx-b, the unconstrained problem VI(f,R n ) Mx - b. the problem of finding a solution to the linear equation nonsingular nxn equivalent. We can, therefore, find the solution to equivalent problem matrix, then the linear systems VI(Af,Rn), where steepest descent method will solve Mx Af(x) - AMx-Ab. VI(Af,R n ) if both = is equivalent to b If A is a and (AM)x - Ab VI(f,Rn) by solving the The generalized AM and are (AM) 2 are 31 In particular, suppose that positive definite matrices. diagonal entries and let D = diag(M). Then D if (D-1M) and are both positive definite matrices, which is true, by Theorem 2, if i = 1,2,...,n, for every Z J~i l(D M)ij where Since has positive is nonsingular, and the VI(D-lf,R n ) generalized steepest descent method will solve (D 1M) 2 M = c / - 1 is diagonal, and D and < ct )ii (D- M) (Note that all diagonal entries of M) I< (18) ct, ji min{[(D- 1M)ii]2: i = 1,...,n} t =ii : i = l,...,n} max{(D- M) and (D Z j(D joi (Mii) = = /M M ('ij D M , then for each i and j, ij/Mii' are equal.) Conditions (18) can therefore be simplified, establishing the following result. Theorem 4 Let and let M D = (Mij) be an = [diag(M)] nxn 1. matrix with positive diagonal entries, If for every and |MilI < cM JSi i ii where c matrices. - 1, then (D-1M) and i joi 2 (D 1M) = 1,2,...,n, j M < c, are positive definite (19) 32 The conditions that Theorem 4 imposes on can be considerably less M restrictive than the analogous conditions that Theorem 2 imposes on M; namely, for every 1,2,...,n, i IMij joi I < ct (20) IMjil < ct, and joi min{(Mii)2: i = 1,...,n} where c = / - 1 t = and The conditions on the row sums of as those in (19), because t Mii true for the column sum conditions: in (20) are at least as restrictive M < E n t because Mii min{Mii: This is also i = 1,...,n},- the < c min{Mii: i Z IMji jfi MJ i = 1,2,...,n. for every column conditions in (20) imply that hence i = 1,...,n} max{Mii: 1,...,n}, and j -I- < c. The conditions specified in (20) are equivalent to those given in (19) if, and only if, all of the diagonal entries of M are identical. The following example allows us to compare conditions (19) and (20) for the problem VI(f,R n ) f(x) - Mx-b. defined by an affine map Example 3 Consider the effect of scaling the matrix N M - 00 He a b 0 1 0 1/N 0 , 1 0 0 M defined as 0 and D M - 1 Conditions (19) for the scaled problem reduce to the inequality 1 0 0 a 1 0 b 0 1 33 la + bi < - 1. 
In contrast, conditions (20) for the unscaled problem are 1) N2 (/2Ja| + Jb[ < The upper bound on the upperbound N = 1, from (/ - I( I+ a - 1) bi 0 N if N 1 imposed on the conditions imposed on becomes increasingly stringent. lal and IbI 1 . imposed by the conditions (20) is tighter than la + bI for the scaled problem unless in which case the bounds are the same. 1, drive 1 1) N if (As a and N 0 As the value of IbI or N moves away for the unscaled problem N + , the conditions (19) o to zero). Analogous results can be obtained by column-scaling the matrix M. An argument similar to our discussion of row-scaling shows that if for every i = 1,2,...,n, IM M. then MD - 1 and < c and (MD-1) 2 Z Mji < cMii, where c (21) (21) - are positive definite. For a given problem, either the rows or the columns of M could be scaled in order to satisfy one of the sets of conditions that ensure convergence of the generalized steepest descent method. For a given matrix, one of these scaling procedures could define a matrix for which the algorithm will work, even if the other does not. If we column-scale the matrix of Example 3, then conditions (21) reduce to 34 lal bI ' (r - 1)N. + Thus, in order to obtain the least restrictive conditions on to column scale for all positive values of M, it is better N. By using a row- or column-scaling procedure, it might be possible to transform a variational inequality problem defined by a nonmonotone affine map into a problem defined by a strictly monotone affine map. MD -1 might be positive definite, even if M is not. That is, D M or The following example illustrates such a situation. Example 4 Let -1 L M 8 0.51 2 . Neither (-1) (D M) nor and det(M ) = D/~M are positive definite, since det(D M) f(x) Mx-b, M is positive definite, since 10 det(M) - -8.0625 < 0 2 M 0.27 > 0. -1665.5625 < 0. However, both M) - 0.5775 > 0 det(D D M and and Consequently, an unconstrained problem defined by and this choice of the matrix M can be transformed, by row-scaling, into an equivalent problem that can be solved by the generalized steepest descent method, even though neither definite. nor M is positive Note that column-scaling will not produce a matrix satisfying the steepest descent convergence properties: since M det(D ) MD is not positive definite, -15.2 < 0. O These scaling procedures can also be used to transform a nonlinear mapping into one that satisfies the convergence conditions given in Theorem 3 for the generalized steepest descent algorithm. algorithm will converge in a neighborhood of the solution x if The 35 ) (or -fD1 Df (i) is uniformly monotone and twice Gateaux differentiable; and D where 6. (or [Vf(x )D- 12 ) [D- Vf(x )]2 (ii) = is positive definite, diag[Vf(x )]. GENERALIZED DESCENT ALGORITHMS The steepest descent algorithm for the unconstrained minimization problem Rn Min {F(x):x {xk } generates a sequence of iterates that minimizes in the direction F } by determining a point k -VF(x ) Rn xk+1 x . from the previous iterate In contrast, general descent (or gradient) methods generate a sequence of {xk iterates dkVF(xk) < 0. i.e., k by determining a point from dk direction x } x * k x , where dk x C R is any descent direction from The set of descent directions for in the F that minimizes k x * x , from the point F is given by D(xk):- {-AkVF(xk): Ak is an nxn positive definite matrix}. This general descent method reduces to the steepest descent method when Ak = I for k 0,1,2,.... 
If Ak = [V2F(xk)] 1 for k = 0,1,2,..., this method becomes a "damped" or "limited-step" Newton method. If F then is uniformly convex and twice-continuously differentiable, then this modification of Newton's method (i.e., Newton's method with a minimizing steplength) will produce iterates converging to the unique critical point of F. (See, for example, Ortega and Rheinboldt [1970].) In this section, we analyze the convergence of gradient algorithms adapted to solve unconstrained variational inequality problems. 36 VI(f,Rn) The generalized descent algorithm for the unconstrained problem can be stated as follows: Generalized Descent Algorithm : Step Select x0 Rn. k = 0. Compute the scaling matrix Scaling Choice. Step 1: Set Ak = A(x 0 ,xl,...,x ,k). Direction Choice. Step 2: x k * x . -Akf(xk). Compute = 0, stop: Otherwise, go to Step 2. Find One-Dimensional Variational Inequality. Step 3: Akf(xk) If xk+1 C [xk; -Akf(xk)] satisfying k+l T k+l ) )Tf(x (x - x Go to Step 1 with 0 for every k k x c [xk; -Akf(xk)]. O k - k + 1. The following result summarizes the convergence properties for this algorithm when f is a strictly monotone affine mapping. Theorem 5 Let M be a positive definite matrix, and f(x) - Mx-b. the sequence of positive definite symmetric matrices and {xk Let } {Ak } be be the sequence of iterates generated by the generalized descent algorithm applied to VI(f,C). (a) Then, the steplength ek determined on the kt iteration of the algorithm is ek (Mx b) Ak(Mxk-b) k T k (Mx-b) AkMAk(Mx -b) (22) 37 (b) the sequence of iterates generated by the algorithm are guaranteed to contract to the solution constant r < 1 x in norm by a fixed contraction if inf [X min(MAM)] > 0, A (i) M and inf [Xmin(A)] > 0; A (ii) where the infimum is taken over all positive definite symmetric matrices (c) A; and the contraction constant r is bounded from above by M hT r = inf Xmi[(M s A A 1 rt sup A Hi_ (MAM)(M -1 1/2 T max[A M(A) Proof (a) For ease of notation, let scaling matrix, e the kth x be the steplength k iterate generated by the algorithm. iterate, and x - x - Af(x) x x. k th solves the unconstrained one-dimensional subproblem, fT(x)Af(x) (Mx-b)TA[M(x - Solving the last equality for kth be the (Otherwise, and the algorithm would have terminated in Step 2 of the x be the (k+l)St As in the proof of the generalized steepest descent method, we can assume that Since A A(Mx-b)) - b] - 0. gives expression (22). f(x) 0, iteration.) 38 (b) The iterates generated by the algorithm are guaranteed to contract in norm to the solution r [0,1) M b x M if and only if there exists a real number that is independent of I1 - x and satisfies IIM S rlx - x 1.M. Thus, we define r : sup *TA(), A,xix where I lx - x* 11 M TA (x) := for x x . As in the proof of Theorem 1, we obtain a simplified expression for TA(x): TA(X) where y = [1 - RA(y)] , x - x , and RA(Y) T T := [(My)TA(My)][(y) MAMy] [(My) AMA(My) ][(y) My] Therefore, r if and only if inf RA (y) >0 A,y#O If sup * TA(X) A,x#x inf RA(y) A,y#O if inf [A in(A)] > 0 minA A sup [1 - RA ( y ) ] A,y#O > 0. [1 - inf RA (y)] A A,yO0 Hence, to prove (b), we show that inf [Xmin (A)] > 0 A min and - and inf [ mi n( M AM ) A min(A)l A inf [X (MAM)] > 0. A min , >0, then then < 1 39 inf [(My)TA(My)]l A,y#O (My)T(My) inf AY [(y)TMAMy] (y)Ty 12 Ivow A,yO0 sup [(My) AM A(My)l A,y#O (My)T(My)y inf [X mi n (A)]. A sup [(y)TMy] A.yS (y)Ty inf [Xmi n MA M)] A m . sup [Amax(AMA)] A because of (c) M. sup [ ma(AMA)] > 0 A Thus, and Ama x X max (M) > 0 u, ( M) by the positive definiteness r < 1. 
By an argument analogous to the argument in the proof of the corollary to Theorem 1, inf (Y)TMAMy Ay6O0 (y)TMy inf RA(y) A,y#O sup (My) AMA(My)( AYO 0(My)TA(My)) I H -T Me -1 inf Amin [(M ) (MAM)(M ) ] A sup AX[A M(A) A T] max The result follows from this inequality and the fact that If the sequence of iterates x0,...,x k ) {xk } {Ak} r - [1 - inf RA(y)]i A,y#O of scaling matrices is chosen before the sequence is generated, (that is, if A is independent of then the statement in part (b) of the theorem can be strengthened. In this case, a proof that generalizes the proof of Theorem 1 shows that the 40 {x } sequence is guaranteed to contract to the solution in M norm if and only if inf [A min(MAkM)l > 0 k=0,1,... (i') (ii') When inf [Imin(Ak)] > 0. k=0,1,... is independent of Ak and algorithm if conditions (i") x,...,x , (i') and (ii') are replaced by the conditions: lim inf[Xmi ( n MAM)] > 0, (ii") we can also ensure convergence of the lim inf[ min(A )] k+0 and > O. In this case, the iterates do not necessarily contract to the solution. 7. THE FRANK-WOLFE ALGORITHM Consider the constrained variational inequality problem f:C C Rn+Rn VI(f,C), is continuously differentiable and strictly monotone and bounded polyhedron. where C is a In this constrained problem setting, how might we generalize the descent methods that we have discussed for unconstrained problems? The class of feasible direction algorithms are natural candidates to consider. In this section, we study one of these methods: the Frank-Wolfe algorithm. If Vf(x) is symmetric for every some strictly convex functional VI(f,C) F:C-R 1, x C, f(x) - [VF(x)]T then and the unique solution solves the minimization problem (2). Thus, when f x for to is a gradient mapping, the solution to the variational inequality problem may be found by using the Frank-Wolfe method to find the minimum of F over C. 41 Recall that Frank-Wolfe algorithm (Frank and Wolfe [1956]) is a linear approximation method that iteratively approximates VF(xk)(x - xk). solution v On the kt h F(x) by F(xk ) + Fk(x) : iteration, the algorithm determines a vertex to the linear program Min Fk(x), xeC k+l x and then chooses as the next iterate the point the line segment that minimizes F on [x , vk] . Frank-Wolfe Algorithm for Linearly Constrained Convex Minimization Problems x 0 c C. Set Step 0: Find Step 1: Direction Choice. k = 0. Given Step 2: k x let Min xTVF(xk). xcC the linear program then stop: xk, * = x . v If be a vertex solution to (vk)T k) = (xk)T F(xk), Otherwise, go to Step 2. One-Dimensional Minimization. Let wk solve the one dimensional minimization problem: Min Go to Step 1 with k F((1-w)x k+1 x - + wvk ). (l-wk)xk + wvk and k = k+ . O This algorithm has been effective for solving large-scale traffic equilibrium problems (see, for example, Bruynooghe et al. [1968], LeBlanc et al. [1975], and Golden [1975].) In this context, the linear program in Step 1 decomposes into a set of shortest path problems, one for each 42 origin-destination pair. Therefore, the algorithm alternately solves shortest path problems and one-dimensional minimization problems. If is pseudoconvex and continuously differentiable on the bounded F C, polyhedron then (see Martos [1975], for example) the Frank-Wolfe algorithm produces a sequence of feasible points that is either finite, {x k terminating with an optimal solution, or it is infinite, and has some accumulation points, any of which is an optimal solution. 
When f(x) = VF(x) x for every C, we can solve VI(f,C) by reformulating the problem as the equivalent minimization problem and applying the Frank-Wolfe method. Equivalently, we can adapt the Frank-Wolfe method to solve the variational inequality problem directly by substituting f for VF in Step 1 and replacing the minimization problem in Step 2 with its optimality conditions, which are necessary and sufficient because F is convex. (For other modifications of the Frank-Wolfe algorithm applicable to variational inequalities, see Lawphongpanich and Hearn [1982] and Marcotte [1983].) Generalized Frank-Wolfe Method for the Linearly Constrained Variational Inequality Problem x 0 c C. k = 0. Set Step 0: Find Step 1: Direction Choice. Given the linear program then stop: k x , Min xTf(xk) XEC is a solution to x k let v If be a vertex solution to (xk)Tf(xk) = (vk)Tf(xk Otherwise, go to VI(f,C). Step 2. Step 2: One-Dimensional Variational Inequality. Let wk solve the following one-dimensional variational inequality problem on the line segment [x k , v k : Find wk [0,1] satisfying 43 {[ (1-w)x k+wvk ] - [(1-Wk)X WkVk ]Tf [ (1-Wk)XkkVk ] 0 w E [0,1]. for every Go to Step 1 with k+l x (-wk)x k + Wvk k k = k+l. and This generalization of the Frank-Wolfe method is applicable to any linearly constrained variational inequality problem. f If, however, is not a gradient mapping, the algorithm need not converge to the solution of the problem. The following two examples illustrate situations for which the sequence of iterates produced by the algorithms cycle among the extreme points The first is a simple two-dimensional example; the of the feasible region. second could model delay time in a traffic equilibrium problem with one origin-destination pair and three parallel arcs. is affine f The mapping and strictly monotone in each of these examples. Example 5 Let f(x) - where Mx-b, J M C = {x = (xl,X2 ): x2 s 1/2, V/ x 1 + x2 a -1, The solution to Let x0 VI(f,C) [ . is x - and , b - and x 1 + x2 a -1}. i The linear program of Step 1 of the L 1/2 generalized Frank-Wolfe algorithm solves at vO = [ ], variational inequality subproblem of Step 2 solves at Continuing in this manner, the algorithm then generates xl and the [ v1 - ] / Li/2 44 x2 1 [ v2 = - 1/2 12 1/2 , and x3 12 1/2 x0 , xl and iterates cycle about the three points this cyclic behavior. - = x. x2 . Hence, the Figure 2 illustrates (In the figure, the mapping has been scaled to emphasize the orientation of the vector field.) X I X f(x') ',/ X, XI f(x') Figure 2: The Generalized Frank-Wolfe Algorithm Cycles Example 6 Let and let f(x) C - {x Mx-b, where 1 1 0 0 1 1 0 1 LI M and b, - 0 0 (Xl,x2 ,x3 ): x 1 a 0, x2 2 O, x3 2 O, x 1 + x2 + x 2 - 1. 1/3 The solution to O VI(f,C) is x = 1/3 1/3 , since 45 (x - x ) f(x) Let v2 [ x Then [ 0. x 2 2/3(x 1 + x2 + x3 - 1) v0 - 1 and 0= for every , , x3 [ ] Hence, the iterates cycle about the 3 points 1 x v1 C. - [ O x1 x0 , x x2 . and D The generalized Frank-Wolfe method does not converge in the above examples because the matrix M is in some sense "too asymmetric." In all of the examples we have analyzed, the algorithm cycles only when the Jacobian of f is very asymmetric. 
Because the generalized Frank-Wolfe algorithm reduces to the generalized steepest descent algorithm when the problem to which it is being applied is unconstrained, it is likely that the conditions required for the generalized Frank-Wolfe to converge are at least as strong as the conditions required for the generalized steepest descent method to converge. That is, it is likely that at least, M2 must be positive definite. condition is satisfied in neither of the previous examples. M2 = is clearly not positive definite. -2V3- M2 In Example 5, M In Example 6, -2 is not positive definite because the determinant of the first minor of This 2x2 principle is negative. Several difficulties arise when trying to prove convergence of the generalized Frank-Wolfe method. First, the iterates generated by the algorithm do not contract toward the constrained solution with respect to 46 either the Euclidean norm or the M norm, where M = Vf(x ), even if M is (An example in Hammond [1984] illustrates this behavior.) symmetric. The proof of convergence of the Frank-Wolfe method for convex minimization problems demonstrates convergence by showing that When descent function. Mx-b f(x) F(xk) is a is a gradient mapping, instead of using the usual descent argument, we can prove convergence of the generalized Frank-Wolfe method with an argument that relies on the fact that the solution of the constrained problem is the projection with respect to the the unconstrained solution onto the feasible region. M norm of This is not true if M is asymmetric, so the argument cannot be generalized to the asymmetric case. Although the Frank-Wolfe algorithm itself does not converge for either of Examples 5 or 6, the method will converge for these examples if in each In particular, consider a modified iteration the step length is reduced. version of the Frank-Wolfe method, where the step length on the iteration is /k; that is, the algorithm generates iterates by the recursion k+l x where v k =x + k) k 1 -x), (v k+1 is the solution to the linear programming subproblem in Step 1 of the Frank-Wolfe method. This procedure can also be interpreted as an we can iteratively substitute for extreme-point averaging scheme: i - k,k-l,...,1 xi for to obtain k+l x Thus, xk kth 1 k+l k Ev i vi is the average of the extreme points generated by the linear programming subproblems on the first k iterations. This variant of the 47 In the Frank-Wolfe method will solve the problems given in Examples 5 and 6. next subsection we show that it generalizes the "fictitious play" algorithm for zero-sum two-person games. 7.1 Fictitious Play Algorithm to a (x , y ) 1951] shows that an equilibrium solution Robinson finite, two-person zero-sum game can be found using the iterative method of fictitious play. A = The game can be represented by its pay-off matrix Each play consists of a row player choosing one of the (aij). m rows of th the and matrix column are player chooses one of player the n pays columns. If player the ith the row the thewhile jth acolumn chosen, the column row amount aij, -aij. 
7.1 Fictitious Play Algorithm

Robinson [1951] shows that an equilibrium solution (x*, y*) to a finite, two-person zero-sum game can be found using the iterative method of fictitious play. The game can be represented by its pay-off matrix A = (a_ij). Each play consists of the row player choosing one of the m rows of the matrix and the column player choosing one of the n columns. If the ith row and the jth column are chosen, the column player pays the row player the amount a_ij; i.e., the row player receives a_ij while the column player receives -a_ij. An equilibrium solution to the game is a pair (x*, y*), x* ∈ S_m, y* ∈ S_n, satisfying

    x^T A y* ≤ (x*)^T A y* ≤ (x*)^T A y for every x ∈ S_m, y ∈ S_n,

where S_k is the unit simplex in R^k.

The fictitious play method determines (x̄^k, ȳ^k), the kth play of the game, by determining for each player the best pure strategy (i.e., the single best row or column) against the accumulated strategies of the other player. Hence, at iteration k, the row player chooses the pure strategy x̄^k that is the best reply to the average

    y^k = (1/k) Σ_{j=0}^{k-1} ȳ^j

of the first k plays by the column player. If x̄^k is the best response to y^k, x̄^k must satisfy

    (x̄^k)^T A y^k ≥ x^T A y^k for every x ∈ S_m.

That is, x̄^k solves the (trivial) linear program

    max {x^T A y^k : x ∈ S_m}.

Similarly, ȳ^k solves the linear program

    min {(x^k)^T A y : y ∈ S_n},   where x^k = (1/k) Σ_{i=0}^{k-1} x̄^i.

Robinson shows that the iterates (x^k, y^k) generated by this method converge to the equilibrium solution (x*, y*) of the game.

To show that the Frank-Wolfe method "with averaging" is a generalization of the fictitious play algorithm, we first reformulate the matrix game as the variational inequality problem VI(f,C), with z = (x, y), C = S_m x S_n, and

    f(z) = Mz = [ 0    -A ] [ x ]  =  [ -Ay   ]
                [ A^T   0 ] [ y ]     [ A^T x ].

Then z* = (x*, y*) is a solution to VI(f,C), that is,

    (z - z*)^T f(z*) = (x - x*)^T (-Ay*) + (y - y*)^T A^T x* ≥ 0 for every x ∈ S_m, y ∈ S_n,

if and only if x^T A y* ≤ (x*)^T A y* ≤ (x*)^T A y for every x ∈ S_m, y ∈ S_n; that is, if and only if (x*, y*) is a solution to the game.

The Frank-Wolfe method with averaging determines an extreme point z̄^k = (x̄^k, ȳ^k) of C on the kth iteration by solving the linear programming subproblem

    min {z^T f(z^k) : z ∈ C}.

(This subproblem determines z̄^k if and only if x̄^k maximizes x^T A y^k over S_m and ȳ^k minimizes (x^k)^T A y over S_n, since z^T f(z^k) = -x^T A y^k + y^T A^T x^k is minimized over C = S_m x S_n by minimizing each term separately.) The algorithm then determines the next iterate

    x^{k+1} = [1/(k+1)] Σ_{i=0}^{k} x̄^i   and   y^{k+1} = [1/(k+1)] Σ_{j=0}^{k} ȳ^j.

Thus we can view the Frank-Wolfe algorithm with averaging as a generalized fictitious play algorithm.
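In code, the fictitious play recursion is just a pair of best responses to running averages. The Python sketch below (our illustration; the game, starting strategies, and iteration count are arbitrary choices) runs the method on the matching-pennies matrix, whose unique equilibrium mixes each pure strategy with probability 1/2.

    import numpy as np

    def fictitious_play(A, iters=20000):
        m, n = A.shape
        x_avg, y_avg = np.ones(m) / m, np.ones(n) / n   # assumed uniform starting mixes
        for k in range(iters):
            i = np.argmax(A @ y_avg)       # row player's best pure reply to y^k
            j = np.argmin(x_avg @ A)       # column player's best pure reply to x^k
            x_avg += (np.eye(m)[i] - x_avg) / (k + 1)   # update the running averages
            y_avg += (np.eye(n)[j] - y_avg) / (k + 1)
        return x_avg, y_avg

    A = np.array([[1., -1.], [-1., 1.]])   # matching pennies
    print(fictitious_play(A))              # both averages approach (1/2, 1/2)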
The following theorem shows that this generalized fictitious play algorithm will solve a certain class of variational inequality problems. (Shapley [1964] has devised an example showing that the method of fictitious play need not solve general bimatrix games, and hence general variational inequality problems; however, the mapping in his example is not monotone.)

Theorem 6

The fictitious play algorithm will produce a sequence of iterates that converges to the solution of the variational inequality problem VI(f,C) if

(i) f is continuously differentiable and monotone;
(ii) C is compact and strongly convex; and
(iii) no point x in the ground set C satisfies f(x) = 0.

Proof

The algorithm fits into the framework of Auslender's [1976] descent algorithm procedure, because v^k solves the subproblem min {x^T f(x^k) : x ∈ C} at the kth iteration and the stepsize w_k = 1/(k+1) satisfies w_k > 0, Σ_k w_k = +∞, and lim_{k→∞} w_k = 0. □

Two of the conditions specified by the theorem are more restrictive than we might wish. First, if C is strongly convex, then C cannot be polyhedral. This framework, therefore, does not show that the algorithm converges for the many problem settings that cast the variational inequality problem over a polyhedral ground set. Since an important feature of this algorithm is that the subproblem is a linear program when the ground set C is polyhedral, this restriction eliminates many of the applications for which the algorithm is most attractive. Secondly, the condition that f(x) ≠ 0 for every x ∈ C may be too restrictive in some problem settings. One setting for which this condition is not overly restrictive is the traffic equilibrium problem: if we assume that the demand between at least one origin-destination pair is positive, then we can assume that the cost of any feasible flow on the network is nonzero.

Powell and Sheffi [1982] show that iterative methods with "fixed step sizes" such as this one will solve convex minimization problems under certain conditions. Their proof does not extend to variational inequality problems defined by maps that have asymmetric Jacobians. Although we do not currently have a convergence proof for the fictitious play algorithm for solving variational inequality problems, we believe that it is likely that the algorithm will converge. We therefore end this section with the following conjecture:

Conjecture

If f is uniformly monotone and C is a bounded polyhedron, then the fictitious play algorithm will solve the variational inequality problem VI(f,C).

8. CONCLUSION

In general, when nonlinear programming algorithms are adapted to variational inequality problems, their convergence requires a restriction on the degree of asymmetry of the Jacobian of the problem map. Analyzing the effect that an asymmetric Jacobian has on the vector field defined by the problem map suggests why this restriction is required. Consider the difference between the vector fields defined by two monotone affine maps, one having a symmetric Jacobian matrix and one having an asymmetric Jacobian matrix. Let f(x) = Mx - b. When M is a symmetric positive definite matrix, the equation

    (x - x*)^T M (x - x*) = c

describes an ellipsoid whose axes are in the direction of the eigenvectors of M. The set of equations (x - x*)^T M (x - x*) = c, therefore, describes concentric ellipsoids about the solution. For any point x on the boundary of one of these ellipsoidal level sets, the vector f(x) = Mx - b is normal to the hyperplane supporting the set at x, since, using Mx* = b,

    ∂/∂x [(x - x*)^T M (x - x*)] = (M + M^T)(x - x*) = 2M(x - x*) = 2(Mx - b) = 2f(x).

If M is not symmetric, the set of equations (x - x*)^T M (x - x*) = c again describes concentric ellipsoids about the solution; the axes are in the direction of the eigenvectors of the symmetric part (1/2)(M + M^T) of M. In this instance, though, the vector f(x) = Mx - b is not normal to the hyperplane supporting the ellipsoidal level set at the point x. Figure 3 illustrates the vector fields and ellipsoidal level sets for a symmetric matrix and an asymmetric matrix. In general, the more asymmetric the matrix M, the more the vector field "twists" about the origin.

Nonlinear programming algorithms are designed to solve problems defined by maps that have symmetric Jacobians. In general, these algorithms move iteratively in "good" feasible descent directions. That is, for the minimization problem (2), on the kth iteration the algorithm determines a feasible direction d^k satisfying (d^k)^T ∇F(x^k) < 0. Many algorithms attempt to choose d^k "sufficiently close" to the steepest descent direction -∇F(x^k). When these algorithms are adapted to solve variational inequality problems, they determine a direction d^k satisfying (d^k)^T f(x^k) < 0, with d^k close to the direction -f(x^k).
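The normality computation above is easy to verify numerically. The short Python sketch below (our illustration) measures the angle between f(x) = M(x - x*) and the level-set normal (M + M^T)(x - x*) for a symmetric matrix and for the matrix of Example 5; in the latter case f(x) sits at a 60-degree angle to the normal, which is precisely the "twist" of the vector field.

    import numpy as np

    def angle_to_normal(M, x, x_star):
        # Angle between f(x) = M(x - x*) and the gradient (M + M^T)(x - x*)
        # of the level function (x - x*)^T M (x - x*).
        d = x - x_star
        f, normal = M @ d, (M + M.T) @ d
        cos = f @ normal / (np.linalg.norm(f) * np.linalg.norm(normal))
        return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

    x, x_star = np.array([1.0, 0.0]), np.zeros(2)
    M_sym = np.array([[2., 1.], [1., 2.]])
    M_asym = np.array([[1., np.sqrt(3)], [-np.sqrt(3), 1.]])    # Example 5's matrix
    print(angle_to_normal(M_sym, x, x_star))    # 0.0: f(x) is normal to the level set
    print(angle_to_normal(M_asym, x, x_star))   # 60.0: the field "twists"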
As long as the Jacobian of f is nearly symmetric, a direction d^k of this kind is a "good" direction for the problem VI(f,C), because a move in the direction d^k is a move toward the solution. If, however, the Jacobian of f is very asymmetric, a move in the direction -f(x^k) may be a move away from the solution. Figure 3 illustrates the set of "descent" directions d^k for both the symmetric and asymmetric cases. The illustrations show that the direction that a nonlinear programming algorithm considers to be a "good" direction can be a poor direction if the matrix is very asymmetric.

Figure 3: f(x) = Mx - b is Normal to the Tangent Plane to the Ellipsoidal Level Set if and only if M is Symmetric

Projection algorithms are widely used to solve variational inequality problems. A fundamental difference between the nonlinear programming algorithms that we consider in this paper and projection methods is that the algorithms we consider use a "full" steplength; in contrast, projection methods use a small fixed steplength, or a steplength defined by a convergent sequence of real numbers. Although full steplength algorithms such as the generalized steepest descent and Frank-Wolfe algorithms require more work per iteration than those using a constant or convergent-sequence step size, they move fairly quickly to a neighborhood of the solution. Taking a full steplength poses a problem, however, when the Jacobian of the mapping is very asymmetric. In this case the "twisting" vector field may not only cause the algorithm to choose a less than ideal direction of movement but, having done so, will cause the algorithm to take a much longer step than it would choose if the mapping were nearly symmetric. The asymmetry is not as much of a problem if the step size is small, because the algorithm will not pull as far away from the solution even if the direction of movement is poor. Figure 4 illustrates the effect of asymmetry on the full steplength.

Figure 4: A Full Steplength Pulls the Iterate Further from the Solution when the Map is Very Asymmetric

By our previous observations, algorithms that take a full step size will converge only if a bound is imposed on the degree of asymmetry of the Jacobian. Projection methods do not require this type of condition. The algorithms that we consider in this paper will converge even if the problem mapping is very asymmetric, as long as the full steplength is replaced by a sufficiently small steplength. Indeed, the steepest descent algorithm for unconstrained variational inequality problems becomes a projection algorithm if the stepsize is sufficiently small, and Theorem 6 shows that the Frank-Wolfe method will converge for a class of variational inequality problems if the stepsize is defined by a convergent sequence.
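To illustrate the benefit of short steps, the Python sketch below (our illustration; the step length ρ = 1/4 is an arbitrary "sufficiently small" choice) applies the fixed-short-step iteration x^{k+1} = x^k - ρ f(x^k) to the unconstrained system f(x) = Mx = 0 underlying Example 5; since the solution x* = 0 lies in the interior of C, it is also the solution of the constrained problem. The iteration matrix I - ρM has spectral radius √3/2 < 1, so, in contrast to the cycling full-step method, the short-step iterates spiral in to the solution.

    import numpy as np

    M = np.array([[1., np.sqrt(3)], [-np.sqrt(3), 1.]])   # the map of Example 5, b = 0
    x = np.array([np.sqrt(3) / 2, 0.5])                   # start at the vertex x^0
    rho = 0.25                                            # a small fixed steplength
    for k in range(100):
        x = x - rho * (M @ x)          # short step in the direction -f(x^k)
    print(np.linalg.norm(x))           # ~5e-7: the norm shrinks by sqrt(3)/2 per step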
REFERENCES

Ahn, B. and W. Hogan [1982]. "On Convergence of the PIES Algorithm for Computing Equilibria," Operations Research 30:2, 281-300.

Anstreicher, K. [1984]. Private communication.

Auslender, A. [1976]. Optimisation: Méthodes Numériques, Masson, Paris.

Ballantine, C. S. and C. R. Johnson [1975]. "Accretive Matrix Products," Linear and Multilinear Algebra 3, 169-185.

Beckmann, M. J., C. B. McGuire and C. B. Winsten [1956]. Studies in the Economics of Transportation, Yale University Press, New Haven, CT.

Bertsekas, D. P. [1982]. Constrained Optimization and Lagrange Multiplier Methods, Academic Press, New York, NY.

Bruynooghe, M., A. Gibert, and M. Sakarovitch [1968]. "Une Méthode d'Affectation du Trafic," in Fourth Symposium on the Theory of Traffic Flow, Karlsruhe.

Cauchy, A. L. [1847]. "Méthode générale pour la résolution des systèmes d'équations simultanées," Comptes Rendus, Acad. Sci. Paris 25, 536-538.

Courant, R. [1943]. "Variational Methods for the Solution of Problems of Equilibrium and Vibrations," Bull. Amer. Math. Soc. 49, 1-23.

Curry, H. [1944]. "The Method of Steepest Descent for Nonlinear Minimization Problems," Quart. Appl. Math. 2, 258-261.

Dafermos, S. C. [1983]. "An Iterative Scheme for Variational Inequalities," Mathematical Programming 26:1, 40-47.

Florian, M. and H. Spiess [1982]. "The Convergence of Diagonalization Algorithms for Asymmetric Network Equilibrium Problems," Transportation Research B 16B, 477-483.

Frank, M. and P. Wolfe [1956]. "An Algorithm for Quadratic Programming," Naval Research Logistics Quarterly 3, 95-110.

Gershgorin, S. [1931]. "Über die Abgrenzung der Eigenwerte einer Matrix," Izv. Akad. Nauk SSSR Ser. Mat. 7, 749-754.

Golden, B. [1975]. "A Minimum Cost Multi-Commodity Network Flow Problem Concerning Imports and Exports," Networks 5, 331-356.

Hadamard, J. [1908]. "Mémoire sur le problème d'analyse relatif à l'équilibre des plaques élastiques encastrées," Mémoires présentés par divers savants étrangers à l'Académie des Sciences de l'Institut de France (2) 33:4.

Hammond, J. [1984]. "Solving Asymmetric Variational Inequality Problems and Systems of Equations with Generalized Nonlinear Programming Algorithms," Ph.D. Thesis, Department of Mathematics, M.I.T., Cambridge, MA.

Hestenes, M. and E. Stiefel [1952]. "Methods of Conjugate Gradients for Solving Linear Systems," J. Res. Nat. Bur. Standards 49, 409-436.

Johnson, C. R. [1972]. "Matrices Whose Hermitian Part is Positive Definite," Ph.D. Thesis, California Institute of Technology, Pasadena, CA.

Kaempffer, F. A. [1967]. The Elements of Physics, Blaisdell Publishing Co., Waltham, MA.

Kinderlehrer, D. and G. Stampacchia [1980]. An Introduction to Variational Inequalities and Their Applications, Academic Press, New York, NY.

Lawphongpanich, S. and D. Hearn [1982]. "Simplicial Decomposition of the Asymmetric Traffic Assignment Problem," Research Report No. 82-12, Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL.

LeBlanc, L. J., E. K. Morlok, and W. P. Pierskalla [1975]. "An Efficient Approach to Solving the Road Network Equilibrium Traffic Assignment Problem," Transportation Research 5, 309-318.

Luenberger, D. G. [1973]. Introduction to Linear and Nonlinear Programming, Addison-Wesley, Reading, MA.

Marcotte, P. [1983]. "A New Algorithm for Solving Variational Inequalities over Polyhedra, with Application to the Traffic Assignment Problem," Cahier du GERAD #8324, École des Hautes Études Commerciales, Université de Montréal, Montréal, Québec.

Martos, B. [1975]. Nonlinear Programming: Theory and Methods, American Elsevier Publishing Co., Inc., New York, NY.

Ortega, J. M. and W. C. Rheinboldt [1970]. Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, NY.

Pang, J. S. and D. Chan [1981]. "Iterative Methods for Variational and Complementarity Problems," Mathematical Programming 24, 284-313.

Polak, E. [1971]. Computational Methods in Optimization, Academic Press, New York, NY.
"The Convergence of Equilibrium Algorithms with Predetermined Step Sizes," Transportation Science 16:1, 45-55. Robinson, J. [1951]. "An Iterative Method of Solving a Game," Annals of Mathematics 54:2, 296-301. 57 Samulson, P. A. [1952]. "Spatial Price Equilibrium and Linear Programming," American Economic Review 42, 283-303. Shapley, L. S. [1964]. "Some Topics in Two Person Games," Advances in Game Theory, Annals of Mathematics Studies No. 52, M. Dresher, L.S. Shapley, A.W. Tucker eds., Princeton University Press, Princeton, NJ, 1-28. Strang, G. [1976]. York, N.Y. Linear Algebra and Its Applications. Academic Press, New