::iRS:tAR :-: 4C EfNT-e 0R:-K I~~1:;~:I·.:

advertisement
4C EfNT-e 0R::-:
::iRS:tAR
G NS
I~~1:;~:I·.:
*-
Paprer-,.
; C'k
~
~~~~~~~~~~~~~~:,,
:
:-'-'--;
'--~~~~~~~:
'"'-'
,-
'
'','.-"'''
'
-$
t
:t~~~~h
: :
.;
'..-.,.-..,!
i- :
. '
'- ,',
:
';'
'
""-
:
,t
fdS000;0
::ae
woki0 0
t~~~~~:: 040-00:X
;
: :=.-
:
:
1.
··.
:..·:·
;;
";· -'
r·-
:
·-
-·-··
:;·
;·-·.
.;.·..
----
·.
- ·- - ·
···-. · -..
i
I-. i
i.·
·: -
.·i:.
-·1·
:·
'· :·
:j
r
:
.
i
INST.ITUTE "
MASCTTS.
. OFT.ECHNOLOGY...,.
.A
~
Generalized Descent Methods For
Asymmetric Systems of Equations
and Variational Inequalities
by
Janice H. Hammond
Thomas L. Magnanti
OR 137-85
August 1985
GENERALIZED DESCENT METHODS FOR
ASYMMETRIC SYSTEMS OF EQUATIONS
AND VARIATIONAL INEQUALITIES
by
Janice H. Hammond
Thomas L. Magnanti
ABSTRACT
We consider generalizations of the steepest descent algorithm for
solving asymmetric systems of equations.
We first show that if the
system is linear and is defined by a matrix M, then the method converges
if M2
is positive definite.
We also establish easy to verify conditions
on the matrix M that ensure that M
is positive definite, and develop a
scaling procedure that extends the class of matrices that satisfy the
convergence conditions.
In addition, we establish a local convergence
result for nonlinear systems defined by uniformly monotone maps, and
discuss a class of general descent methods.
Finally, we show that a
variant of the Frank-Wolfe method will solve a certain class of variational inequality problems.
All of the methods that we consider reduce to standard nonlinear
programming algorithms for equivalent optimization problems when the
Jacobian of the underlying problem map is symmetric.
We interpret the
convergence conditions for the generalized steepest descent algorithms
as restricting the degree of asymmetry of the problem map.
KEYWORDS:
variational inequalities, linear and nonlinear systems,
steepest descent, Frank-Wolfe Algorithm.
This research was partially supported by Grant # ECS-8316224 from the
National Science Foundation's Program in Systems Theory and Operations
Research.
1.
INTRODUCTION
Historically, systems of equations and inequalities have been closely
linked with optimization problems.
Theory and methods developed in one of
these problem contexts have often complemented and stimulated new results in
the other.
In particular, equation and inequality systems are often viewed as
the optimality conditions for an auxiliary nonlinear program.
the physical sciences, variational principles
For example, in
(see Kaempffer [1967], for
example) identify equilibrium conditions for systems such as electrical
networks, chemical mixtures, and mechanical structures with equivalent
optimization problems (minimizing power losses, Gibbs free energy, or
potential energy).
Similar identifications (e.g., identifying spatial price
equilibria and urban traffic equilibria with minimizing consumer plus producer
surplus (Samuelson [1952], Beckmann, McGuire and Winsten [1956])) have also
proved to be quite useful in studying social and economic systems.
Once a system of equations and inequalities has been posed as an
equivalent optimization problem, nonlinear programming algorithms can be used
to solve the system.
Indeed, many noted nonlinear programming methods have
been adopted and used with considerable success to solve systems of equations
(e.g., Hestenes and Stiefel [1952], Ortega and Rheinboldt [1970]).
However, viewing a system of equations or inequalities as the optimality
conditions of an equivalent optimization problem requires that some form of
symmetry condition be imposed on the system.
For example, consider the
general finite dimensional variational inequality problem
mapping f:RnORn
find
If
f
x
C C
and a set
VI(f,C):
given a
C C Rn,
satisfying
(x-x ) f(x )
is continuously differentiable, then
0
f(x)
for every
VF(x)
x c C.
for some map
(1)
2
F:C C Rn+R
if and only if
Vf(x)
is symmetric for all
x c C.
In this case
the variational inequality system can be viewed as the optimality conditions
for the optimization problem
min {F(x):x c C.
(2)
Therefore, nonlinear programming methods can be applied to the optimization
problem (2) in order to solve the variational inequality (1).
Suppose now that
Vf(x)
is not symmetric, so that the variational
inequality has no equivalent optimization problem such as (2).
Can analogues
of the nonlinear programming algorithms for (2) be applied directly to (1)?
If so, when will they converge and at what convergence rate?
This paper provides partial answers to these questions.
we focus on solving systems of asymmetric equations
f(x )
In particular,
O0, which for
purposes of interpretation we view as variational inequalities with
in (1).
In Sections 2-5 we introduce and study a generalized steepest descent
algorithm for solving the asymmetric system
simplified problem setting in which
mapping.
mappings.
C = Rn
f
f(x )
O.
Section 3 considers a
is an affine, strictly monotone
Section 4 extends these results to nonlinear, uniformly monotone
Section 5 shows that the convergence conditions for the generalized
steepest descent method can be weakened considerably if the problem mapping is
scaled in an appropriate manner.
Section 6 extends the results for the
generalized steepest descent method to more general gradient methods.
Finally, in Section 7 we consider a generalization of the Frank-Wolfe
algorithm that is applicable to constrained variational inequalities.
To conclude this section, we briefly outline the notational conventions
and terminology to be used in this paper.
be introduced in the text as needed.
Other definitions and notation will
3
of
be a real
M
Let
nxn
In general, we define the definiteness
matrix.
M
e.g.,
without regard to symmetry:
M
if and
x Mxonly
> 0 if for
nonzero
M,
part xofC Rn.
symmetric
the every
if
is positive definite if and only
T
Recall that
is
(Mpositive
+ M ), definite
by MM = is
defined
positive definite.
nxn
An
defines an inner product
G
positive definite symmetric matrix
n
R:
on
(x,y)G
where
denotes definition and
=
G
product defined by
T
:= xT Gy,
denotes transposition.
induces a norm on
Rn:
IXIIG = (XX) G
which in turn induces a norm on the
IA I
:=
X)
nxn
matrix
sup
IIAx
A:
IG
IxIIGl-1
IIAI IG' we implicitly assume that
(By writing
The inner
has the same dimensions as
A
G.)
The mapping
f:C
for every x
C, y
C;
for every x
C, y
C
C
x
if
for some scalar
C, y
C, where
is monotone on
RnR
strictly monotone on
with
k > 0,
I1 11
y;
x
x
if
(x-y)T(f(x) - f(y))
(x-y)T(f(x) - f(y)) > 0
if
C
and uniformly (or strongly) monotone on
(x-y)T(f(x) - f(y))
kx-yj|2
for every
denotes the Euclidean norm.
d
in
Rn ,
in the direction
d;
i.e.,
Finally, for any two points
the ray emanating from
C
x
and
[x;d] = {y : y - x +
d, e 2 0}-
we let
[x;d]
denote
4
2.
BACKGROUND AND MOTIVATION
In this section, we introduce a generalized steepest descent algorithm
for asymmetric systems of equations and show that the algorithm's convergence
requires some restriction on the degree of asymmetry of the problem map.
Consider the unconstrained variational inequality problem
where
f
is continuously differentiable and uniformly monotone.
unconstrained problem seeks a zero of the mapping
for every
If
x
Rn
Vf(x)
if and only if
f(x )
is symmetric for every
some uniformly convex functional
satisfying
C = Rn.
VI(f,Rn),
f(x )
0
=
since
*T
*
(x-x ) f(x )
0
0.
x c Rn,
F:Rn R,
f,
This
then
f
is the gradient of
and the unique solution
x
solves the convex minimization problem (2) with
In this case, the solution to the unconstrained variational
inequality problem can be found by using the steepest descent method to find
the point
x
at which
F
achieves its minimum over
Rn.
Steepest Descent Algorithm for Unconstrained Minimization Problems
x0 E R .
Step 0:
Select
Step 1:
Direction Choice.
then stop:
Step 2:
k
xk
Set k
0.
Compute
*
x .
-VF(xk).
If
k
VF(x
)O
0,
Otherwise, go to Step 2.
One-Dimensional Minimization.
Find
xk1 = arg min{F(x) : x c [xk-VF(xk)]}
Go to Step 1 with
k
k + 1.
E
Curry [1944] and Courant [1943] have given early expositions on this
classical method.
Curry attributes the method to Cauchy [1847], while Courant
attributes it to Hadamard [1907].
[1971], or Bertsekas [1982]), if
As is well-known (see, for example, Polak
F
is continuously differentiable and the
5
level set
F(x0 ))
{x : F(x)
is bounded, then the steepest descent algorithm
either terminates finitely with a point
infinite, and every limit point
exists) satisfies
x
x
satisfying
of the sequence
VF(xN)
{xk }
0
or it is
(at least one
VF(x ) = 0.
Local rate of convergence results can be obtained by approximating
by a quadratic function.
F(x) =
In particular, suppose that
some positive definite symmetric
nxn
matrix
Q.
Then if
respectively, the largest and smallest eigenvalues of
r
=
k
{x
(A-a)/(A+a), the sequence
}
Q
A
Q
x-x *l
and
F(x)
a
for
are,
and
generated by the steepest descent
algorithm satisfies
k+l
When
f
r 2 F(xk+l
x*2 = F(k+
-
- IIQ
-Q
r2 IIxk
-
is a gradient mapping, we can reformulate
x IIQ
(3)
VI(f,Rn)
as the
equivalent minimization problem (2) and use the steepest descent algorithm to
solve the minimization problem; equivalently, we can restate the steepest
descent algorithm in a form that can be applied directly to the variational
inequality problem.
To do so, we eliminate any reference to
algorithm and refer only to
f(x)
VF(x).
=
Since
F(x)
F(x)
is convex,
in the
k+l
x
solves the one-dimensional optimization problem in Step 2 if and only if the
directional derivative of
directions.
F
k+l
at
x
is nonnegative in all feasible
Therefore, the algorithm can be restated in the following
equivalent form:
Generalized Steepest Descent Algorithm for the Unconstrained Variational
Inequality Problem
x0
Step O:
Select
Step 1:
Direction Choice.
E
R .
Set
k
Compute
Otherwise, go to Step 2.
0.
-f(x k).
If
f(x)
0,
stop;
x
= x
6
Step 2:
One-Dimensional Variational Inequality.
Find
E [x ;-f(xk)]
x
satisfying
(x-xk+)T f(xk + )
Go to Step 1 with
0
=
k
for every
x
[x k;-f(xk)].
a
k + 1.
As stated, the algorithm is applicable to any unconstrained variational
inequality problem.
It can be viewed as a method that moves through the
"vector field" defined by
f
by solving a sequence of one-dimensional
variational inequalities.
The generalized steepest descent algorithm will not solve every
unconstrained variational inequality problem, even if the underlying map is
uniformly monotone.
If
f
is not a gradient mapping, the iterates generated
by the algorithm can cycle or diverge.
The following example illustrates this
type of behavior.
Example 1
Let
f(x)
=
Mx,
x e R2
where
definite (because
= I),
M
gradient map (since
Vf(x)
f
=
M
and
M
Vf(x) - M
Since
.
is uniformly monotone.
I
If
the
= [],
x
If
is not symmetric.
is orthogonal to
this example,
f
is a
p > 0,
f
is not a gradient
Let
=
[:] and consider the
x
p - 1.
As long
the one-dimensional variational inequality subproblem on
thxtheonek+l
kth iteration will solve at the point
f(xk + )
is positive
p = O,
progress of the generalized steepest descent algorithm when
as
M
is symmetric) and the generalized
steepest descent algorithm will converge.
mapping, since
P1
=
-f(xO)
u_
f(x(xk),
xk+1
at which the vector
the direction of movement.
[], which implies that
xl
[ ]
since
In
7
f][2] is orthogonal to [ 2
1
[]
x4
1[i]
xO.
[1
],
x
x22
Similarly,
xx 3
1]
[ - ],
and
Thus, in this case, the algorithm cycles about the four points
[
]
and [1].
Figure 1 illustrates this cyclic behavior.
(In
the figure, the mapping has been scaled to emphasize the orientation of the
XI
vector field.)
= Mx*
f(X
f(x') = M,x'
X
f(X3 ) = Mx'
X
/
I
I
I
I
I
=
X4
movement direction -f(x') draws away
from solution as p Increases
from 0 to 1
-
x
xl
f(x') = M,x'
-…__-------.
X
f(X) = MX'
Figure 1: The Steepest Descent Iterates
Need Not Converge If M is Asymmetric
The iterates produced by the generalized steepest descent algorithm do
not converge when
p
1
because the matrix
M
p
is "too asymmetric":
the
off-diagonal entries are too large in absolute value in comparison to the
diagonal entries.
Geometrically, if
points directly away from the solution
p
=
x
0,
the vector field defined by
O;
as
p
f
increases, the vector
field begins to twist, which causes the movement direction to draw away from
the solution until, when
p
fact,
x
x
and only if
P[
IPI
O
< 1.
1,
,
the algorithm no longer converges.
In
so the iterates converge to the solution if
8
In the following analysis, we investigate conditions on
that ensure
f
that the generalized steepest descent algorithm solves an unconstrained
variational inequality problem that cannot necessarily be reformulated as an
equivalent minimization problem.
3.
THE GENERALIZED STEEPEST DESCENT ALGORITHM FOR UNCONSTRAINED
PROBLEMS WITH AFFINE MAPS
In this section, we consider the unconstrained variational inequality
problem
VI(f,Rn),
f(x)
assume that
=
is a uniformly monotone affine map.
where
f
Mx-b,
where
M
is an
nxn
Thus, we
real positive definite matrix
and
b
Rn.
3.1
Convergence of the Generalized Steepest Descent Method
is affine, we can easily find a closed form expression for the
f
When
steplength
ek
on the
th
k-t h
iteration.
Lemma 1
Assume that
f(x)
=
Mx-b.
Let
k
x
be the
th
k- h
iterate generated by the
generalized steepest descent method and assume that
th
th
k-
steplength determined on the
k
x
*
j x .
Then the
iteration is
kb)T (Mxk b)
(Mx
k~.
~k
k_ )T
(4)
(Mxk-b) M(Mx -b)
Proof
Step 2 of the algorithm determines
(x-xk+)Tf(x k+ )
0
for every
xk+1 £ [xk; -f(xk)]
x c [xk; -f(xk)].
satisfying
(5)
9
k+l
If
x
-fT(xk)f(xk)
k
k
,
inequality (5) with
2 0. But then
f(x )
becomes
O, so the algorithm would have
terminated in Step 1. Hence, we assume that
ek
(x) k
- f(x )
x -
k+l
k
x
~ x , and therefore that
0.
Substituting
x
x
- ef(xk)
x +
and
x
-
f(xk)
into (5) gives
k
(ek-e)f T(x )f(xk-Okf(xk))
2 0.
Since this inequality is valid for all
fT(xk)f(xk-kf(xk))
0 must hold.
=
expression and solving for
When
f
ek
6
0, and
ek >
F
satisfying
f(x)
VF(x),
When the Jacobian of
VF(x) = f(x)
convergence does not apply.
into this
yields the expression (4).
steepest descent algorithm follows from the fact that
function
the condition
f(x) - Mx-b
Substituting
is a gradient mapping, i.e.
function for the algorithm.
,
o
convergence of the
F(x)
f(x)
is a descent
is not symmetric, no
exists, so in general this proof of
Instead, we will establish convergence of the
generalized steepest descent method by showing that the iterates produced by
the algorithm contract to the solution with respect to the
that
M =
(M + MT)).
The
M
M
norm (recall
norm is a natural choice for establishing
convergence because it corresponds directly to the descent function
when
f(x)
while
-
Mx-b
1x-x* 12
and
M
is symmetric.
(x-x*)TM(x-x*
)
=
In this case,
2F(x) + x*TMx ,
function for the algorithm if and only if
I x-x
I|
so
F(x)
F(x)
F(x)
(1/2)xTMx-bTx,
is a descent
is a descent function
for the algorithm.
The following theorem states necessary and sufficient conditions on the
matrix
M
for the steepest descent method to contract from any starting point
with respect to the
M
norm.
10
Theorem 1
Let
M
f(x) = Mx-b.
be a positive definite matrix, and
Then the
sequence of iterates produced by the generalized steepest descent method is
VI(f,Rn)
if and only if the matrix
M2
of the problem
x
norm to the solution
M
guaranteed to contract in
is positive definite.
Furthermore, the contraction constant is given by
r
((Mx)T(Mx) xTM2x
xO x Mx (Mx) M(Mx)
1/2
- inf
L
(6)
Proof
*k
x j x
For ease of notation, let
e = ,ek
algorithm, let
x
x - (Mx- b),
where
| r
l|IM
T(x) :=
x*Il.
S rnix x - x I
r := sup* T(x).
r
iterate generated by the
be the (k+l) s t
iterate; then
wil 1 show that there
(Mx-b)(Mx-b)
(Mx-b) M(Mx-b)
r c [0,1)
exists a real number
x
and let
th
kth
be the
that is independent of x
r
Because
/I Ix - x ll
and satisfies
must satisfy
for every
x
is clearly nonnegative, since
x , we define
T(x) > 0
for every
xfx
x
x
.
We now show that
IIzil 2M
Because
I Ix-xI
|M
zTMz
r < 1
if and only if
for every
-
[(x-x* ) TM(x-x *)] , and
=-
)]
)-(M-b)[(-(Mx-b)-x*)TM(x-(Mx-b)-x
x-x *)TM(x-*)
is positive definite.
we have that
z c R,
= [(x-x) TM(x-x*)-e(Mx-b)TM(
M2
x-x*)-(x-x )TM(Mx-b)+82 (Mx-b)TM(Mx-b)]
[M(x-x )] [M(x-x
)
[(x-x*)TM2 (x-x
*)
[M(x-x )] M[M(x-x )]
]
11
e
where the last equality follows after substituting for
Mx - b - M(x - M
*
y
x-x
# 0
b)
-
Thus,
T(x)
T2
T
R(y) := [(My) (My)][y My]
[yTMy][(My) M(My) ]
and
sup [1-R(y)]l
yO
M(x - x ).
with
[1-inf R(y)]½
y/O
where
r - sup*T(x) =
xx
inf R(y) > 0.
y#O
M2
Then
is positive definite.
M
Suppose that
[1-R(y)],
Note that
if and only if
< 1
and replacing
is positive definite,
and
M2y
T
inf R(y) = inf
y#O
yO0
T
yTMy
T
y y
(My)TM(My)
(My)T(My)
yTM2y
inf
T
my0
sup
(My) TM(My)
sup
yO0
yy
y~O
X
(M2 )
min
[X max (Max(
where
X in(A)
mi n (
A
max
eigenvalues of the real symmetric matrix A.
symmetric, has positive real eigenvalues.
definiteness of
Consequently,
M
(My)
)
(7)
(M) )
denote, respectively, the minimum and maximum
Ama (A)
and
(My)
M ,
being positive definite and
Similarly, the positive
ensures that the eigenvalues of
and hence
inf R(y) > 0,
M
are real and positive.
r < 1.
yfO
M2
Conversely, if
nonzero vector
yTMy >O.
y.
Because
y # 0
Moreover,
which implies that
is not positive definite, then
r
1.
M
is positive definite,
ensures that
yTM2y
0
for some
(My)TM(My) > 0
(My)T(My) > 0.
Thus, R(y)
and
0,
12
The expression defining the contraction constant follows from the
convergence proof.
O
Corollary
is bounded from above by
r
The contraction constant
[ [i
min(M
r =
[
L
-
-T^ 2 ^ -1I
)
1 -2
JI
max(M)
max(M)
Proof
r
is defined by
.
r = [1 - inf R(y
where
yfO
yTM2y
inf R(y) - inf
y#O
yfO
*(My)
T
inf
(My)
(My) TM(M)y)
y My
y#O
yTM2y
yTMy
My
sup
y~o
(8)
T
(My) (My)
(y)T(my)
The numerator of the last expression can be rewritten as follows:
TM2
inf
T
yO0
yTMy
=
=
yTM2y
inf
=
inf
yfO
=
inf
ziO
Y
( yTy
Y
z z
Xmin(A),
mmn
where
A
Now for any
(M
I)- TM2 (MJ)-1
nxn
matrix
M½
and
G,
X
is any matrix satisfying
is an eigenvalue of
A
(Mi)T (M)
if and only if
M.
is
13
an eigenvalue of
if
G -M, ,
G
AG,
G
then
(see, for example, Strang [1976]).
,,
AG = M
In particular,
and hence
hence-1
and
(MN
min(A) - X
(9)
).
Finally,
sup
(My)
yOf
M(My)
(My)
The result follows from (8),
Mz
z
z
(My)
sup
(10)
max
D
(9) and (10).
Note that a different upper bound on
r
can be derived from inequality
(7) in the proof of Theorem 1:
i
r r
:=
I
[1-
1 -
In general, this bound is not as tight as the bound
corollary.
To see this, let
that
has positive real eigenvalues.
S-IT
S
and
X min(T) =
min
T
r
given in the
be symmetric matrices, and assume
min
Ilxlil
Then
xTTx
'T -1
S x SS Tx
(S-lT)xTSx
=
'iA
min
(S- T)·
iX(S
min
where
Let
max
||XI||-
xTSx
T)X
(S),
max
is the eigenvector corresponding to the minimum eigenvalue of
S - M
and
T = M2 .
Then
S-1T
T
S
M-1M
M2
M
S- T.
has
has positive real eigenvalues
14
because it has the same set of eigenvalues as the positive definite symmetric
Thus,
(M)-TM (M)l.
matrix
M)
min(M
.
max
which implies that
r
(M)
X
(M
A2Q,
min
=
( 2
i'
-
max)
3.2
r.
2
in
max)]
Discussion of M 2
The theorem indicates that the key to convergence of the generalized
steepest descent method is the matrix
M2 .
If the positive definite matrix
M
is symmetric, the convergence of the steepest descent algorithm for
M
unconstrained convex minimization problems follows immediately:
positive definite because
M,
MTM
=
positive definite imposes a restriction on the degree to which
from
M
.
To see this, note that
M2
M
M
can differ
for every
x
O0.
is positive definite if and only if for every nonzero vector
the angle between the vectors
MTx
The positive definiteness of
on the quantity
1 1M-MT 11
be
is positive definite if and only if
M
xTM2x = (MTx)T(Mx) > 0
Thus,
In
being positive definite, is nonsingular.
general, the condition that the square of the positive definite matrix
is
and
Mx
M2
is acute.
does not imply an absolute upper bound
for any norm
increase this quantity by multiplying
x,
M
||-|,
because we can always
by a constant.
However, if
M2
is
15
jIM-MTI I/ M+MTII
positive definite, then the normalized quantity
less than
1.
This result follows from the following proposition.
(Anstreicher [1984])
Proposition 1
M
Let
be an
T hen,
real matrix.
nxn
positive definite if and only if
x
must be
for any norm
M2
l(M+MT)xll
<
l(M-M )x
11.,
is
for every
0.
Proof
T
M2 + (M2)
Thus,
' i[(M+MT)
xT(M2 )x >
0
2
+ (M-MT)2]
xT(M2
-
+ (M2) T)x >
xT(M+MT)2x + xT(M-MT)2x > 0
T
xT(M+MT ) (M+MT)x > -xT(M-M
I (M+MT)xll >
In particular, if
T,
, .T ,,
I IM- M I
M
M
xT(M
) 2x
T
) (M-MT )x
(M-MT)xl.
O
i s positive definite, then
max
,- , -T,
,,
IItm-m xll
-'I
,,
Im
-T% - II
-M
-
x I I <
I .... , T%- I I
I I tMtm
)X I I
I x 1=1
max
II (M+MT)x I = IIM+MT I,
where
x - argmax
I1xl11
I xlll
Consequently,
I I- MT I1l/ IM+MI
C(M-MT
)x I I.
< 1.
For related results on the positive definiteness of
M2 ,
see Johnson's
[1972] study of complex matrices whose hermitian part is positive definite
(see also Ballantine and Johnson [1975]).
This thesis and subsequent paper
describe conditions under which the hermitian part of the square of such a
matrix is positive definite.
16
3.3
Discussion of the Bound on the Contraction Constant
Let us return to the problem defined in Example 1.
f(x)
M x
definite.
is affine and strictly monotone, since
MP '
For this example,
2p
is positive definite if and only if
M
M -M
12/IM M
12
< 1
if
11iN M1-0
pMT2
P
P pp2
2
M
212p
2
~0
-P
and
Ipi <
and only if
.[1
M
,1-p
The mapping
.
is positive
2
,
Moreover,
is positive definite, since
0
2
22
21
For this example, the upper bound on the contraction constant given in
the corollary is tight.
To see this, first note that
k+1
p[
0
-1
k
Jx
0-
|IXk
k1
and
|M
x
*
O.
0
I, so the
Thus,
IIXk+
|
112
(P2(Xk)TIX k)
l1I
p112
olx
*
12
i
=
p
Recall from the example that
norm is equivalent to the Euclidean norm.
x
M
'
p
-pI
Ix
X
I
2
hence the contraction constant for the problem is 2 1P|- The bound given
by the corollary is also
Ipl,
because
min(M n)
ami
) - p
and
Xmx(M)
max0
,
giving
-
[1-(l-p2 )]
- ipi
M
p
17
and applying the Kantorovich
M
found by diagonalizing
r
upper bound on
is symmetric, a tighter
M
If
may be quite loose.
r
contraction constant
on the
r
For affine problems defined by symmetric matrices, the bound
inequality (see, for example, Luenberger [1973]) is
rmax
r
k
X
=
- Xmin(M)
+
nM)
for example, if
r
r
-
0.82
and
r
in the same way if
then
1,
then
k = 3,
0.58; if
-
k
-
while
(k-l)/(k+l),
in (M), r
(M)/
In terms of the condition number
.
r
=
S
rs = 0.5
0;
r
and
=
k
is not symmetric.
then
1.5,
and if
r = 0.82;
A matrix
Thus,
[(k-l)/k].
M
r
s
k
-
0.2
10,
and
then
cannot be derived
r
This tighter upper bound on
0.95.
M
if
r
can be decomposed into
its spectral decomposition (and hence is unitarily equivalent to a diagonal
matrix) if and only if
and only if
T
T
M M = MM .
diagonalize
M
r
s
3.4
is normal, which is true for a real matrix
M
M
Thus, if
if
is not symmetric, we cannot necessarily
and use the Kantorovich inequality to obtain the upper bound
on the contraction constant
Sufficient Conditions for
r.
M2
to be Positive Definite
We now seek easy to verify conditions on the matrix
that the matrix
M
M
is positive definite.
M
that will ensure
The following example shows that
the double (i.e., row and column) diagonal dominance condition, a necessary
and sufficient condition for convergence for the problem in Example 1, is not
in general a sufficiently strong condition for
M2
to be positive definite.
18
Example 2
2
0
0
4
2
Let
.99
0
SinceM det(2M
.99 ) 01 -3.2836,
1 . Then
M
0
0
8
2.97
2.97
Mis not
- _2.97
2.97
0
and 2M
2.97
2
positive
01
definite,
and - therefore
M2
1
_2.97
0M2
D
is not positive definite.
Results by Ahn and Hogan [1982], (see also Dafermos [1983] and Florian
and Spiess [1982]) imply that the norm condition
D - diag(M)
and B - M-D,
ID
BD
I112
,
<
where
ensures that the Jacobi method will solve an
unconstrained variational inequality problem defined by an affine map.
(I ID BD- 112
1
<
implies the usual condition for convergence of the Jacobi
method for linear equations,
radius of the matrix
A,
p(D-1B) < 1,
and Chan [1981] show that if
I ID-BD- 112
< 1.
ID- BD
< 1
M2
II2
p(D-1 B)
because
M
where
p(A)
is the spectral
D-1 BI ID
<
ID½BD- 112
)
Pang
is doubly diagonally dominant, then
Example 2, therefore, also demonstrates that
is not a sufficiently strong condition on
M
to ensure that
is positive definite.
The following theorem shows that stronger double diagonal dominance
conditions imposed on
M
guarantee that
which in turn implies that
M2
M2
is doubly diagonally dominant,
is positive definite.
Theorem 2
Let M -
(Mij)
for every
i
be an
nxn
matrix with positive diagonal entries.
1,2,...,n,
IMijI
ji1
<
ct
InMjil
and
joi
<
ct,
If
02 _
19
min{(Mii) : il,...,n}
where
t =
and
max{Mii..: il,...,n}
c = /
- 1,
then both
M
and
M2
are doubly diagonally dominant, and therefore positive definite, matrices.
Proof
Let
M..
be the
element of
(i,j)th
M.
Then the
(i,j)th
element of
13is
is
(Mii)
2
(M )ij =
+
r
kfi
n
=
MikMkj
k=1
iiMij + MijMJj
M2
To show that
(M2)ii
i,j
+
(Mii)
>
Z
I(M 2 ) i
I
and
(M2)ii
>
>
(Mii)
>
-
-
I(M 2 )jil
Z
Jr.
k(i
and
if ij.
MikMkj
is doubly diagonally dominant, we must show that
JT''
i.e., that
if i=j
MikMki
E
k~i
Mik ki +
Mik M ki +
i~i
Mii.M.
MijMjj +
+
MikMkj I
ki,j
z IMiiM i + MjiMjj +
I
M jkki
joi
ki,j
(11)
(12)
To show that (11) holds, it is enough (by Cauchy's Inequality and the triangle
inequality) to show that
(Mi2 > k
i
IMikllMkil
+
iij
i
IMij I +
Mi
I ij
jiJ
I+
z
IMikl MkIjl
jii ki,j
Because the last term in the righthand side of the above expression is equal
to
ZE
kji
IMikMllkjl,
j3i,k
the righthand side is
the sum of the first and last terms in
20
|
kl['M il +
kri
E
jSi,k
1\kj]
E
kii
'Mik
[ £ 'kjl
].
jok
Consequently, to show (11) is true, we show that
IMik
Mii
[
Ikjl] +Mii
Mij I +
E
ik
1f
kJi
E Mi IMij i
jji
(13)
t
To establish (13) (and hence (11)), we introduce the quantity
in the statement of the theorem.
(i)
t
Mii
for every
defined
Note that
for
(Mii) /Mii = M..ii
t
(since
iil,...,n
every i), and
(ii)
Max{Mii: i-=l,...,n}
t
(Mii)
The bounds on the off-diagonal elements of
for every
M
i=l,...,n.
assumed in the statement
of the theorem ensure that the righthand side of (13) is bounded from above by
IMik(ct) + Mii(ct) + Max{Mii: i=l,...,n}(ct)
k#i
< c2 t2 + Mii(ct) + (ct) Max{Mii: il,...,n}
2 2 + 2c(Mii) 2
(Mui)if + (Mii)
c(MII)
(i)and
since(ii).
(Mii)
(c + 2c)
(Mii)
2 by or,
holds
Thus, (13)- c
c
c
+ 2c - 1
/-1,
0,
which holds if and only if
(13), and therefore (11), must hold.
then (12) must hold.
c
e
[-/'-1, /-1].
Similarly, if
M2
These two results establish that
> 0,
if
Thus, if
c -
-1,
is doubly
diagonally dominant whenever the hypotheses of the theorem are satisfied.
The double diagonal dominance of
diagonally dominant.
Because
M1
M2
M2
ensures that
is doubly
is symmetric and row diagonally dominant,
by the Gershgorin Circle Theorem (Gershgorin [1931]),
M
has real, positive
21
eigenvalues.
Since
M
is symmetric and has positive eigenvalues,
positive definite, and hence
M
M
is
is positive definite.
The conditions that the theorem imposes on the off-diagonal elements of
M
also ensure that
M
is doubly diagonally dominant, and hence that
positive definite.
stronger than the assumption that
that
M2
M
is
°
Because the assumption that
imposed on
M
M
M
is doubly diagonally dominant is
is positive definite, the conditions
in Theorem 2 are likely to be stronger than necessary to show
is positive definite.
In a number of numerical examples, we have
compared the following three conditions:
(1) the conditions of Theorem 2;
(2) necessary and sufficient conditions for the matrix
M2
to be doubly
M2
to be
diagonally dominant; and
(3) necessary and sufficient conditions for the matrix
positive definite.
From the proof of Theorem 2, we know that conditions (1) imply conditions (2),
and conditions (2) imply conditions (3).
The examples suggest that there is a
much larger "gap" between conditions (2) and (3) than between conditions (1)
and (2).
Thus, it seems that we cannot find conditions much less restrictive
than the conditions of the theorem as long as we look for conditions that
imply that
M2
M2
is doubly diagonally dominant instead of showing directly that
is positive definite.
The conditions that Theorem 2 imposes on the off-diagonal elements of
are the least restrictive when the diagonal elements of
M
are all equal.
Section 5, we show that by scaling the rows or the columns of
M
M
In
so that the
22
scaled matrix has equal diagonal entries, we may be able to weaken
considerably the conditions imposed on
M.
We close this section by noting that the positive definiteness of the
matrices
M
M2
and
consequence, if M
is preserved under unitary transformations.
and
M2
As a
are positive definite, then the generalized
steepest descent method will solve any unconstrained variational inequality
problem defined by a mapping
to
4.
f(x)
Mx-b,
where
M
is unitarily equivalent
M.
THE GENERALIZED STEEPEST DESCENT ALGORITHM FOR UNCONSTRAINED
PROBLEMS WITH NONLINEAR MAPS
If
f:R n_
Rn
is not affine, strict monotonicity is not a sufficiently
strong condition to ensure that a solution to the unconstrained problem
n
VI(f,R )
exists.
If, for example,
Because the ground set
no solution.
x
n - 1 and f(x) - e ,
R
then VI(f,R 1)
has
over which the problem is formulated
is not compact, some type of coercivity condition must be imposed on the
mapping
(See, for example,
to ensure the existence of a solution.
f
Auslender [1976] and Kinderlehrer and Stampacchia [1980].)
solution to
VI(f,Rn)
hemicontinuous.
is ensured if
The existence of a
f is strongly coercive and
Therefore, because uniform monotonicity implies strong
coercivity, in this section we restrict our attention to problems defined by
uniformly monotone mappings.
The following theorem establishes conditions under which the generalized
steepest descent method will solve an unconstrained variational inequality
problem with a nonlinear mapping
f. In this case, the key to convergence is
the definiteness of the square of the Jacobian of
solution
*
x .
f
evaluated at the
23
Theorem 3
Let
M - Vf(x ),
Let
that
M2
close in
be uniformly monotone and twice Gateaux-differentiable.
Rn
f: Rn
is positive definite.
and assume
Then, if the initial iterate is sufficiently
the sequence of iterates produced by
x ,
norm to the solution
M
VI(f,R ),
is the unique solution to
x
where
M
the generalized steepest descent algorithm contracts to the solution in
norm.
Proof
I
To simplify notation, we let
E := IIX-X*11
be the initial iterate.
x
x
Let
this proof.
norm throughout
denote the M
|'
We will show that if
is sufficiently small, then the iterates generated by the
> 0
x .
algorithm contract to the solution
the first iterate generated by the
x,
By Step 2 of the algorithm,
< 1.
We assume that
algorithm, solves the one-dimensional variational inequality problem on the
ray
[x; -f(x)]
emanating from
in the direction
x
Lemma 1 demonstrates that the solution
satisfies
x -
f(x),
fT (x)f(x)
=
The proof of
to this one-dimensional problem
x
as
Thus, if we parameterize the ray [x; -f(x)]
O.
x - x -
then
-f(x).
f(x),
where the steplength
e
is defined by the
equation
fT(x)f(x -
f(x))
By Lemma 2, which follows, the value
to determine an expression for
we approximate
f
about
x
(14)
0.
satisfying (14) is unique.
0, for any
x
S
:" {x :
with a linear mapping and let
IX - X |I
v
error in this linear approximation, i.e.,
f(x) - f(x ) + Vf(x )(x - x )
In order
+ v ' M(x - x ) + vx
X
x
=
E}
denote the
24
Substituting
M(x -
expression for
f(x) - x ) + v-
for
x
f(x)
in (14) yields the following
:
fT(x)M(x - x ) + fT(x)vfT(x)Mf(
x
fT (x)Mf(x)
To show that the iterates generated by the algorithm contract t,
o the
solution in
M
norm, we show that there exists a real number
is independent of
must satisfy
x
S ,
x
lix - x l/llx
T(x) :=
r
we define
IIx - x i < rllx - xlI.
and satisfies
r
sup T(x).
r
- x*l
that
0,1)
r
Beca use
r
for every
T(x) > 0
is clearly nonnegative, since
xcS
£
for every
x
S .
We now show that
have that
T(x)
R(x) :=
=
r < 1.
Proceeding as in the proof of Theorem 1, we
[1 - R(x)]½ ,
-T
where
fT(x)M(x - x ) +
-
2fM*
2
*T
(x - x )TMf(x)
fT(x)Mf(x)
-
(15)
(x - x )TM(x - x )
Substituting for
in (15) and replacing
f(x)
M(x - x ) + vx
with
we
have that
R(x)
-
*)TMTMx*
*
*TM2
*
) TMT(x - x )][(x - x )
(x - x )] - E(x)
[(x - x
I
(16)
v-,
and is
[(M(x - x ) + vx] M[M(x - x ) + v )][(x - x* ) M(x - x )]
x
x
where the error term
E(x)
contains all terms involving
v
and
x
given by
Tv*
*T
T
*T
-{v M(x - x* ) + vT[M(x
- x ) + v x] }{(x - x ) Mv - v[M(x - x ) + v x]
x
x
x
x
x
x
+ (x - x ) M M(x - x ).{(x - x )TMv
x
+ (x - x*2(x
-
x
){v
x
-
x)
- v-[M(x - x ) + v ]}
x
v[M(x
X
x
- x ) + vx
x
)
25
E(x)
can be bounded from above using the triangle inequality, Cauchy's
inequality, and the fact that the matrix norm
IIAII'[xjI
E(x)
IE(x)
IlAx I
I A I satisfies
for every vector x:
1 11 Ivxl I M l - X| + IIvx |.[MI l| I - X | + lvx |11]2
l M2 11)llx - x*1I 12
+ {( I IMTMI I +
{I IxlI
IMIIIXI II
x
I+ IIIv
< {kllix - x*113 + k 2 iix - X* 12 [k
+ k1X - x
12
3
- x
I
I
+
I
I
}1
x - x*11 + k 4 11X - x*112]}2
{kll x - x*113 + k 2 llx - x
[k311x - x i
kx
MI'I
'[ I
|l
+ k 4 lIx - x *12]}
- x 115
where the second inequality follows from Lemma 3, which follows, and the third
x - x
inequality holds since
k.
1
1 < 1
for every x
S.
k
since each
0
O0.
Since
r
-I
I sup [1 - R(x)]' = [1
xcS C
sup T(x)
xESC
and only if inf R(x) > 0.
We therefore show 1that
Dividing the numerator and denominator (16) by
(recall that
I:II denotes
the
M
norm) and
-
*)
inf R(x)] ,
xeS
Ilx -
x*)TM M(x - x )]
x
|| [(x
using
inf R(x)
xeS c
inf
XESE
(x - x )TM(x
- x)
kl I
if
inf R(x) > 0.
xES
-E(x)
- kllx - x 115,
we obtain
(x - x*)TM2 (x
r < 1
-
x*113
(x - x )TMM(x - x )
[M(x - x*) + v ]TM[M(x - x *) +
- x*TT(x - x*
(x - x ) M M(x - x )
x]
26
inf
xS
XcS
(x - x*)TM2 (x - * )
sup
kl|x - x
T
)TM(x
- x*)
xS
XE
* TMT
(
*
-
SUP
[M(x - x ) + vX]
X
sup
XES
*
)TMTM(x -
)
*
T
*
L
(X -
13
M[M(x - x ) + vX]
(x - x )TM
TM (x - x )
Considering each of these three expressions separately, we have:
(X - X ) M (X - x )
inf
xES
(x - x )TM(x - x )
g
inf
xcS
(x-x*)TM2(x-
sup
(x-
*T
(x - x )(x
x*)
X (M2 )
min
*
- x )
x )TM(x - x*)
kmax (
> 0
)
xcS (X - x )T(x - x )
M2
since
and
M
are positive definite;
(x-x )TM(x-x*)
kl Ix - x I13
sup
xcS
=
kllx - x II (x-x *)T(x-x* )
sup
(x - x ) MTM(x - x )
**
(X - x ) (X - x )
E
C
)
x (M)
kcm
maxin
=
be ,
Xm2.n (MM)
where
sup
x ES
b :
k.e ax(M)/
X
xS
;S
M) 2 0;
[M(x - x ) + v x ] TM[M(x -x
and
) + v x]
(x - x* )TMT
M MM(x - x* )
£
sup
in
(MT
[(M(x - x )
]TM[M(x
- x)] +
[M(x - x )].[M(x - x )]
~max (M) +
up I
xESc
I1 · I IM2
I IX -
sup
xES
vTM (x - x ) + (X x
)TMTMv
(x - x )TMTM(x
E
I+ I x1- x*1-I I IMTMI1- I I
(x- x
) TMTM(x
- x*)
x
+ VTMv
x x
x
- x*)
I1+ I
I
I IMI I
I IVx I
27
sup(M)+
(
I
-x x II
cI
)TTM(x - x )
x
max(M)
+
sup
<X
max
(M) +
max
XESc (-x
x
-
*TT
*
(x - x )TM M(x - x )
inf
xcS
=X
(x-x ) TM(x-x*)
) T(x-x )
sup
xESE clix
* T
(X - X ) (
ACE
C max(M)
ma
(M) +
-
)
= X
max (M) + a,
(MM)
min
where
c
0,
and hence
a :- cX
(M)/X in(MTM) a 0.
Combining these inequalities gives
X
(M2 )
mbin
- bc
x
inf
X£S
R(x)
a
Xmax (M) + aE
c
which is greater than zero if
is sufficiently small, since the denominator
X
(M2)/X
(M) > 0,
min
max
is positive,
constant
(M)
max
and
b
0.
Hence, the contraction
is less than 1.
r
Lemma 2
If
unique
f
is uniformly monotone, then, for a given
> 0
satisfying
fT(x)f(x - Bf(x)) - 0.
is the modulus of monotonicity of
f.
x
there exists a
x ,
Moreover
7 9
-,
where
a
28
Proof
solves the one-dimensional variational inequality problem
e a 0
[x -
f(x) - (x -
i.e.,
e
satisfies
Thus,
e
solves
f(x))
f(x))] f(x -
where
VI(g,R 1),
2
)-g(e
x,
1 )]
(e
2
e
0,
0.
The existence
is uniformly
g
allf(x) 112:
monotone with modulus of monotonicity
(e02-1)[g(e
for every
-fT(x)f(x - ef(x)).
g(e) :
6 follow, because, for a given
and uniqueness of
e
for every
f(x)) a 0
(x)f(x -
-(8 - )f
0
1 )fT(x)[f(x-elf(x))
- f(x-6 2 f(x))]
= [(x-elf(x)) - (x-62f(x))] [f(x-el
)_f))-f(x-2f(x))]
lIx - elf(x) - x + e2f(x) 112
-
[Illf(x) l2] ( 2 *
Moreover,
Moreover, e is positive, because x
o<
To show that
f (x)f(x - ef(x))
Since
-
implies that g(O)
02 = e
we set
-,
x
a0
1.
Therefore, since
81 '0
11
f(x)j2
2
< 0.
in expression (17).
-2 T
ae2fT(x)f(x).
fT(x)f(x)
e >0
and
-
this substitution gives
0,
-T
Using the fact that
(17)
1)2
and
f(x)
#
(because
0
x $ x ),
we see that
a > 0, e <-
Lemma 3
Assume that
S :
{ : II
fT(x)f(x) - O.
-
let
1.11
f
has a second Gateaux derivative on the open set
11 < 1}.
Let
Let
x
x - ef(x),
v - f(x) - M(x - x ),
denote
M norm. the
denote the
M norm.
let
where
e is chosen so that
v- - f(x) - M(x - x ),
and
29
Then, for
there exist constants
1, 2 and 3,
i
x
the following conditions for any
(i)
(ii)
(iii)
-
S:
clIx - x*112;
Ilvxll
Ix
that satisfy
0
Ci
*1
lIv-11
c3 lx
and
I x - x*I;
c2
x*112.
-
Proof
(i)
IlVxll = I
-
f(x) - M(x - x )
I
X*) 11
If(x) - f(x ) - Vf(x )(x -
I sup
i ,x
I IV2 f[x * + t(x - x )
-* 2x*12
0;5t;1
<-sup
xcES
sup
0
O<t-1
IjV
2
*
f[x
+
t
-
)]II1}1Ix -
x* 12
- clllx - X*11 2 .
where the first inequality follows from an extended mean value theorem stated
as Theorem 3.3.6 in Ortega and Rheinboldt [1970].
(ii) II - X*11
Ix -
[M(x - x ) + v xx] - x
< IIx - x
IMi · IX -
+
c
Clearly,
0.
II
x*II +
IIx - x*112
. c211x-x II.
where the first inequality follows from Lemma 2 and (i), and
2
:= 1 +
2
IMI +
a~~~~c
c1
a
>O
because
IIMII > 0, C1 k 0,
and
a > 0.
30
-=
f(a) - f(x*) - Vf(x*)(x - x )1l
S sup
tV2f[x + t(x - x )]!l~x - x 112
up*
{x:llx-x 11c
C311 -
-
*
c 3 c21Ix - x 1
=
311x- x
5.
k
and
12
where the last inequality follows from (ii) and
2
x)]*112
112
C3
- 2
c2
I-
sup II2f[x* + t(x - X*)1i
2} o0tl
c3
O
because
c3
0
>0.
o
SCALING THE MAPPING OF AN UNCONSTRAINED VARIATIONAL INEQUALITY
PROBLEM
In this section, we consider a procedure for scaling the mapping of an
unconstrained variational inequality problem that is to be solved by the
generalized steepest descent algorithm.
We first consider the problem
VI(f,R n )
f(x)
defined by the affine mapping
either the rows or the columns of
M
Mx-b.
We show that by scaling
in an appropriate manner before applying
the generalized steepest descent algorithm, we can weaken, perhaps
considerably, the convergence conditions that Theorem 2 imposes on M.
When
f(x) - Mx-b,
the unconstrained problem
VI(f,R n )
Mx - b.
the problem of finding a solution to the linear equation
nonsingular
nxn
equivalent.
We can, therefore, find the solution to
equivalent problem
matrix, then the linear systems
VI(Af,Rn),
where
steepest descent method will solve
Mx
Af(x) - AMx-Ab.
VI(Af,R n )
if both
=
is equivalent to
b
If
A
is a
and (AM)x - Ab
VI(f,Rn)
by solving the
The generalized
AM
and
are
(AM) 2
are
31
In particular, suppose that
positive definite matrices.
diagonal entries and let
D
=
diag(M).
Then
D
if
(D-1M)
and
are both positive definite matrices, which is true, by Theorem 2, if
i = 1,2,...,n,
for every
Z
J~i
l(D M)ij
where
Since
has positive
is nonsingular, and the
VI(D-lf,R n )
generalized steepest descent method will solve
(D 1M) 2
M
=
c
/
- 1
is diagonal, and
D
and
< ct
)ii
(D- M)
(Note that all diagonal entries of
M)
I<
(18)
ct,
ji
min{[(D- 1M)ii]2: i = 1,...,n}
t =ii
: i = l,...,n}
max{(D- M)
and
(D
Z j(D
joi
(Mii)
=
=
/M
M
('ij
D
M
,
then for each
i
and j,
ij/Mii'
are equal.)
Conditions (18) can
therefore be simplified, establishing the following result.
Theorem 4
Let
and let
M
D
=
(Mij)
be an
= [diag(M)]
nxn
1.
matrix with positive diagonal entries,
If for every
and
|MilI < cM
JSi i
ii
where
c
matrices.
-
1,
then
(D-1M)
and
i
joi
2
(D 1M)
=
1,2,...,n,
j
M
< c,
are positive definite
(19)
32
The conditions that Theorem 4 imposes on
can be considerably less
M
restrictive than the analogous conditions that Theorem 2 imposes on M; namely,
for every
1,2,...,n,
i
IMij
joi
I
< ct
(20)
IMjil < ct,
and
joi
min{(Mii)2: i = 1,...,n}
where
c = /
- 1
t =
and
The conditions on the row sums of
as those in (19), because
t
Mii
true for the column sum conditions:
in (20) are at least as restrictive
M
<
E
n
t
because
Mii
min{Mii:
This is also
i = 1,...,n},- the
< c min{Mii: i
Z IMji
jfi
MJ
i = 1,2,...,n.
for every
column conditions in (20) imply that
hence
i = 1,...,n}
max{Mii:
1,...,n},
and
j
-I-
< c.
The conditions specified in
(20) are equivalent to those given in (19) if, and only if, all of the
diagonal entries of
M
are identical.
The following example allows us to compare conditions (19) and (20) for
the problem
VI(f,R n )
f(x) - Mx-b.
defined by an affine map
Example 3
Consider the effect of scaling the matrix
N
M
-
00
He
a
b
0
1
0
1/N
0
,
1
0
0
M
defined as
0
and
D
M -
1
Conditions (19) for the scaled problem reduce to the inequality
1
0
0
a
1
0
b
0
1
33
la + bi <
- 1.
In contrast, conditions (20) for the unscaled problem are
1) N2
(/2Ja| + Jb[ <
The upper bound on
the upperbound
N = 1,
from
(/
-
I( I+
a
- 1)
bi
0
N
if
N 1
imposed on
the conditions imposed on
becomes increasingly stringent.
lal
and
IbI
1
.
imposed by the conditions (20) is tighter than
la
+
bI
for the scaled problem unless
in which case the bounds are the same.
1,
drive
1
1) N
if
(As
a
and
N
0
As the value of
IbI
or
N
moves away
for the unscaled problem
N +
,
the conditions
(19)
o
to zero).
Analogous results can be obtained by column-scaling the matrix
M.
An
argument similar to our discussion of row-scaling shows that if for every
i
=
1,2,...,n,
IM
M.
then
MD - 1
and
< c
and
(MD-1)
2
Z
Mji
< cMii,
where
c
(21)
(21)
-
are positive definite.
For a given problem, either the rows or the columns of
M
could be
scaled in order to satisfy one of the sets of conditions that ensure
convergence of the generalized steepest descent method.
For a given
matrix, one of these scaling procedures could define a matrix for which the
algorithm will work, even if the other does not.
If we column-scale the
matrix of Example 3, then conditions (21) reduce to
34
lal
bI ' (r - 1)N.
+
Thus, in order to obtain the least restrictive conditions on
to column scale for all positive values of
M,
it is better
N.
By using a row- or column-scaling procedure, it might be possible to
transform a variational inequality problem defined by a nonmonotone affine map
into a problem defined by a strictly monotone affine map.
MD
-1
might be positive definite, even if
M
is not.
That is,
D
M
or
The following example
illustrates such a situation.
Example 4
Let
-1
L
M
8
0.51
2
.
Neither
(-1)
(D M)
nor
and
det(M )
=
D/~M
are positive definite, since
det(D
M)
f(x)
Mx-b,
M
is positive definite, since
10
det(M) - -8.0625 < 0
2
M
0.27 > 0.
-1665.5625 < 0.
However, both
M) - 0.5775 > 0
det(D
D
M
and
and
Consequently, an unconstrained problem defined by
and this choice of the matrix
M
can be transformed, by
row-scaling, into an equivalent problem that can be solved by the generalized
steepest descent method, even though neither
definite.
nor
M
is positive
Note that column-scaling will not produce a matrix satisfying the
steepest descent convergence properties:
since
M
det(D
)
MD
is not positive definite,
-15.2 < 0.
O
These scaling procedures can also be used to transform a nonlinear
mapping into one that satisfies the convergence conditions
given in Theorem 3 for the generalized steepest descent algorithm.
algorithm will converge in a neighborhood of the solution
x
if
The
35
)
(or -fD1
Df
(i)
is uniformly monotone and twice Gateaux
differentiable; and
D
where
6.
(or [Vf(x )D- 12 )
[D- Vf(x )]2
(ii)
=
is positive definite,
diag[Vf(x )].
GENERALIZED DESCENT ALGORITHMS
The steepest descent algorithm for the unconstrained minimization problem
Rn
Min {F(x):x
{xk }
generates a sequence of iterates
that minimizes
in the direction
F
}
by determining a point
k
-VF(x
)
Rn
xk+1
x .
from the previous iterate
In contrast, general descent (or gradient) methods generate a sequence of
{xk
iterates
dkVF(xk) < 0.
i.e.,
k
by determining a point
from
dk
direction
x
}
x
*
k
x , where
dk
x
C R
is any descent direction from
The set of descent directions for
in the
F
that minimizes
k
x
*
x ,
from the point
F
is given by
D(xk):- {-AkVF(xk): Ak
is an
nxn
positive definite matrix}.
This general descent method reduces to the steepest descent method when
Ak = I
for
k
0,1,2,....
If
Ak = [V2F(xk)]
1
for
k
=
0,1,2,...,
this method becomes a "damped" or "limited-step" Newton method.
If
F
then
is
uniformly convex and twice-continuously differentiable, then this modification
of Newton's method (i.e., Newton's method with a minimizing steplength) will
produce iterates converging to the unique critical point of
F.
(See, for
example, Ortega and Rheinboldt [1970].)
In this section, we analyze the convergence of gradient algorithms
adapted to solve unconstrained variational inequality problems.
36
VI(f,Rn)
The generalized descent algorithm for the unconstrained problem
can be stated as follows:
Generalized Descent Algorithm
:
Step
Select
x0
Rn.
k = 0.
Compute the scaling matrix
Scaling Choice.
Step 1:
Set
Ak = A(x 0 ,xl,...,x ,k).
Direction Choice.
Step 2:
x
k
*
x .
-Akf(xk).
Compute
=
0,
stop:
Otherwise, go to Step 2.
Find
One-Dimensional Variational Inequality.
Step 3:
Akf(xk)
If
xk+1 C [xk; -Akf(xk)]
satisfying
k+l T
k+l
)
)Tf(x
(x - x
Go to Step 1 with
0
for every
k
k
x c [xk; -Akf(xk)].
O
k - k + 1.
The following result summarizes the convergence properties for this
algorithm when
f
is a strictly monotone affine mapping.
Theorem 5
Let
M
be a positive definite matrix, and
f(x) - Mx-b.
the sequence of positive definite symmetric matrices and
{xk
Let
}
{Ak
}
be
be the
sequence of iterates generated by the generalized descent algorithm applied to
VI(f,C).
(a)
Then,
the steplength
ek
determined on the
kt
iteration of the
algorithm is
ek
(Mx b) Ak(Mxk-b)
k
T
k
(Mx-b) AkMAk(Mx -b)
(22)
37
(b)
the sequence of iterates generated by the algorithm are guaranteed
to contract to the solution
constant
r < 1
x
in
norm by a fixed contraction
if
inf [X min(MAM)] > 0,
A
(i)
M
and
inf [Xmin(A)] > 0;
A
(ii)
where the infimum is taken over all positive definite symmetric
matrices
(c)
A;
and
the contraction constant
r
is bounded from above by
M
hT
r =
inf Xmi[(M
s
A
A
1
rt
sup
A
Hi_
(MAM)(M
-1
1/2
T
max[A M(A)
Proof
(a)
For ease of notation, let
scaling matrix,
e
the
kth
x
be the
steplength
k
iterate generated by the algorithm.
iterate,
and
x - x -
Af(x)
x
x.
k th
solves the unconstrained one-dimensional subproblem,
fT(x)Af(x)
(Mx-b)TA[M(x -
Solving the last equality for
kth
be the
(Otherwise,
and the algorithm would have terminated in Step 2 of the
x
be the
(k+l)St
As in the proof of the generalized
steepest descent method, we can assume that
Since
A
A(Mx-b)) - b] - 0.
gives expression (22).
f(x)
0,
iteration.)
38
(b)
The iterates generated by the algorithm are guaranteed to contract in
norm to the solution
r
[0,1)
M b
x
M
if and only if there exists a real number
that is independent of
I1 -
x
and satisfies
IIM S rlx - x 1.M.
Thus, we define
r :
sup *TA(),
A,xix
where
I lx - x* 11
M
TA (x) :=
for
x
x .
As in the proof of Theorem 1, we obtain a simplified expression for
TA(x):
TA(X)
where
y
= [1 - RA(y)]
,
x - x , and
RA(Y)
T
T
:= [(My)TA(My)][(y) MAMy]
[(My) AMA(My) ][(y) My]
Therefore,
r
if and only if
inf RA (y) >0
A,y#O
If
sup * TA(X)
A,x#x
inf RA(y)
A,y#O
if
inf [A in(A)] > 0
minA
A
sup
[1 - RA ( y ) ]
A,y#O
>
0.
[1 -
inf RA (y)]
A
A,yO0
Hence, to prove (b), we show that
inf [Xmin (A)] > 0
A
min
and
-
and
inf [ mi n( M AM )
A
min(A)l
A
inf [X
(MAM)] > 0.
A
min
,
>0,
then
then
< 1
39
inf [(My)TA(My)]l
A,y#O
(My)T(My)
inf
AY
[(y)TMAMy]
(y)Ty
12
Ivow
A,yO0
sup
[(My) AM A(My)l
A,y#O
(My)T(My)y
inf [X mi n (A)].
A
sup
[(y)TMy]
A.yS
(y)Ty
inf [Xmi n MA
M)]
A
m
.
sup [Amax(AMA)]
A
because
of
(c)
M.
sup [ ma(AMA)] > 0
A
Thus,
and
Ama
x
X
max (M) > 0
u,
( M)
by the positive definiteness
r < 1.
By an argument analogous to the argument in the proof of the corollary to
Theorem 1,
inf
(Y)TMAMy
Ay6O0 (y)TMy
inf RA(y)
A,y#O
sup
(My) AMA(My)(
AYO
0(My)TA(My)) I
H -T
Me -1
inf Amin [(M ) (MAM)(M ) ]
A
sup AX[A M(A)
A
T]
max
The result follows from this inequality and the fact that
If the sequence
of iterates
x0,...,x k )
{xk
}
{Ak}
r -
[1 -
inf RA(y)]i
A,y#O
of scaling matrices is chosen before the sequence
is generated, (that is, if
A
is independent of
then the statement in part (b) of the theorem can be strengthened.
In this case, a proof that generalizes the proof of Theorem 1 shows that the
40
{x }
sequence
is guaranteed to contract to the solution in
M
norm if and
only if
inf
[A min(MAkM)l > 0
k=0,1,...
(i')
(ii')
When
inf
[Imin(Ak)] > 0.
k=0,1,...
is independent of
Ak
and
algorithm if conditions
(i")
x,...,x ,
(i') and (ii') are replaced by the conditions:
lim inf[Xmi
( n MAM)] > 0,
(ii")
we can also ensure convergence of the
lim inf[ min(A )]
k+0
and
> O.
In this case, the iterates do not necessarily contract to the solution.
7.
THE FRANK-WOLFE ALGORITHM
Consider the constrained variational inequality problem
f:C C Rn+Rn
VI(f,C),
is continuously differentiable and strictly monotone and
bounded polyhedron.
where
C
is a
In this constrained problem setting, how might we
generalize the descent methods that we have discussed for unconstrained
problems?
The class of feasible direction algorithms are natural candidates
to consider.
In this section, we study one of these methods:
the Frank-Wolfe
algorithm.
If
Vf(x)
is symmetric for every
some strictly convex functional
VI(f,C)
F:C-R 1,
x
C,
f(x) - [VF(x)]T
then
and the unique solution
solves the minimization problem (2).
Thus, when
f
x
for
to
is a gradient
mapping, the solution to the variational inequality problem may be found by
using the Frank-Wolfe method to find the minimum of
F
over
C.
41
Recall that Frank-Wolfe algorithm (Frank and Wolfe [1956]) is a linear
approximation method that iteratively approximates
VF(xk)(x - xk).
solution
v
On the
kt h
F(x)
by
F(xk ) +
Fk(x) :
iteration, the algorithm determines a vertex
to the linear program
Min Fk(x),
xeC
k+l
x
and then chooses as the next iterate the point
the line segment
that minimizes
F
on
[x , vk] .
Frank-Wolfe Algorithm for Linearly Constrained Convex Minimization
Problems
x 0 c C.
Set
Step 0:
Find
Step 1:
Direction Choice.
k = 0.
Given
Step 2:
k
x
let
Min xTVF(xk).
xcC
the linear program
then stop:
xk,
*
= x .
v
If
be a vertex solution to
(vk)T
k)
= (xk)T F(xk),
Otherwise, go to Step 2.
One-Dimensional Minimization.
Let
wk
solve the one
dimensional minimization problem:
Min
Go to Step 1 with
k
F((1-w)x
k+1
x
-
+ wvk ).
(l-wk)xk + wvk and
k = k+ .
O
This algorithm has been effective for solving large-scale traffic
equilibrium problems (see, for example, Bruynooghe et al. [1968], LeBlanc
et al. [1975], and Golden [1975].)
In this context, the linear program in
Step 1 decomposes into a set of shortest path problems, one for each
42
origin-destination pair.
Therefore, the algorithm alternately solves shortest
path problems and one-dimensional minimization problems.
If
is pseudoconvex and continuously differentiable on the bounded
F
C,
polyhedron
then (see Martos [1975], for example) the Frank-Wolfe
algorithm produces a sequence
of feasible points that is either finite,
{x k
terminating with an optimal solution, or it is infinite, and has some
accumulation points, any of which is an optimal solution.
When
f(x) = VF(x)
x
for every
C,
we can solve
VI(f,C)
by
reformulating the problem as the equivalent minimization problem and applying
the Frank-Wolfe method.
Equivalently, we can adapt the Frank-Wolfe method to
solve the variational inequality problem directly by substituting
f
for
VF
in Step 1 and replacing the minimization problem in Step 2 with its optimality
conditions, which are necessary and sufficient because
F
is convex.
(For
other modifications of the Frank-Wolfe algorithm applicable to variational
inequalities, see Lawphongpanich and Hearn [1982] and Marcotte [1983].)
Generalized Frank-Wolfe Method for the Linearly Constrained Variational
Inequality Problem
x 0 c C.
k = 0.
Set
Step 0:
Find
Step 1:
Direction Choice.
Given
the linear program
then stop:
k
x ,
Min xTf(xk)
XEC
is a solution to
x
k
let
v
If
be a vertex solution to
(xk)Tf(xk)
=
(vk)Tf(xk
Otherwise, go to
VI(f,C).
Step 2.
Step 2:
One-Dimensional Variational Inequality.
Let
wk
solve the
following one-dimensional variational inequality problem on the
line segment
[x
k
,
v
k
:
Find
wk
[0,1]
satisfying
43
{[ (1-w)x k+wvk ] - [(1-Wk)X
WkVk ]Tf [ (1-Wk)XkkVk ]
0
w E [0,1].
for every
Go to Step 1 with
k+l
x
(-wk)x
k
+ Wvk
k
k = k+l.
and
This generalization of the Frank-Wolfe method is applicable to any
linearly constrained variational inequality problem.
f
If, however,
is not
a gradient mapping, the algorithm need not converge to the solution of the
problem.
The following two examples illustrate situations for which the
sequence of iterates produced by the algorithms cycle among the extreme points
The first is a simple two-dimensional example; the
of the feasible region.
second could model delay time in a traffic equilibrium problem with one
origin-destination pair and three parallel arcs.
is affine
f
The mapping
and strictly monotone in each of these examples.
Example 5
Let
f(x)
-
where
Mx-b,
J
M
C = {x = (xl,X2 ): x2 s 1/2, V/ x 1 + x2 a -1, The solution to
Let
x0
VI(f,C)
[
.
is
x
-
and
,
b -
and
x 1 + x2 a -1}.
i
The linear program of Step 1 of the
L 1/2
generalized Frank-Wolfe algorithm solves at
vO = [
],
variational inequality subproblem of Step 2 solves at
Continuing in this manner, the algorithm then generates
xl
and the
[
v1 -
]
/
Li/2
44
x2
1
[
v2
=
-
1/2
12
1/2
, and
x3
12
1/2
x0 , xl and
iterates cycle about the three points
this cyclic behavior.
-
= x.
x2 .
Hence, the
Figure 2 illustrates
(In the figure, the mapping has been scaled to
emphasize the orientation of the vector field.)
X
I
X
f(x')
',/
X,
XI
f(x')
Figure 2:
The Generalized Frank-Wolfe Algorithm Cycles
Example 6
Let
and let
f(x)
C -
{x
Mx-b,
where
1
1
0
0
1
1
0
1
LI
M
and
b, -
0
0
(Xl,x2 ,x3 ): x 1 a 0, x2 2 O, x3 2 O, x 1 + x2 + x 2 - 1.
1/3
The solution to
O
VI(f,C)
is
x
=
1/3
1/3
,
since
45
(x - x ) f(x)
Let
v2
[
x
Then
[ 0.
x
2
2/3(x
1
+ x2 + x3 - 1)
v0
-
1
and
0= for every
,
,
x3
[
]
Hence, the iterates cycle about the 3 points
1
x
v1
C.
- [ O
x1
x0 , x
x2 .
and
D
The generalized Frank-Wolfe method does not converge in the above
examples because the matrix
M
is in some sense "too asymmetric."
In all of
the examples we have analyzed, the algorithm cycles only when the Jacobian of
f
is very asymmetric.
Because the generalized Frank-Wolfe algorithm reduces
to the generalized steepest descent algorithm when the problem to which it is
being applied is unconstrained, it is likely that the conditions required for
the generalized Frank-Wolfe to converge are at least as strong as the
conditions required for the generalized steepest descent method to converge.
That is, it is likely that at least,
M2
must be positive definite.
condition is satisfied in neither of the previous examples.
M2
=
is clearly not positive definite.
-2V3-
M2
In Example 5,
M
In Example 6,
-2
is not positive definite because the determinant of the first
minor of
This
2x2
principle
is negative.
Several difficulties arise when trying to prove convergence of the
generalized Frank-Wolfe method.
First, the iterates generated by the
algorithm do not contract toward the constrained solution with respect to
46
either the Euclidean norm or the
M
norm, where
M
=
Vf(x ),
even if
M
is
(An example in Hammond [1984] illustrates this behavior.)
symmetric.
The proof of convergence of the Frank-Wolfe method for convex
minimization problems demonstrates convergence by showing that
When
descent function.
Mx-b
f(x)
F(xk)
is a
is a gradient mapping, instead of using
the usual descent argument, we can prove convergence of the generalized
Frank-Wolfe method with an argument that relies on the fact that the solution
of the constrained problem is the projection with respect to the
the unconstrained solution onto the feasible region.
M
norm of
This is not true if
M
is asymmetric, so the argument cannot be generalized to the asymmetric case.
Although the Frank-Wolfe algorithm itself does not converge for either of Examples 5 or 6, the method will converge for these examples if the step length is reduced in each iteration. In particular, consider a modified version of the Frank-Wolfe method, where the step length on the kth iteration is 1/(k+1); that is, the algorithm generates iterates by the recursion

    x^{k+1} = x^k + (1/(k+1)) (v^k - x^k),

where v^k is the solution to the linear programming subproblem in Step 1 of the Frank-Wolfe method. This procedure can also be interpreted as an extreme-point averaging scheme: we can iteratively substitute for x^i for i = k, k-1, ..., 1 to obtain

    x^{k+1} = (1/(k+1)) Σ_{i=0}^{k} v^i.

Thus, x^k is the average of the extreme points generated by the linear programming subproblems on the first k iterations. This variant of the Frank-Wolfe method will solve the problems given in Examples 5 and 6. In the next subsection we show that it generalizes the "fictitious play" algorithm for zero-sum two-person games.
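As a concrete illustration of the recursion, here is a minimal sketch of our own (not the authors' implementation). It assumes the ground set is the unit simplex, so the Step 1 subproblem min{v^T f(x^k): v ∈ C} is solved by the vertex whose cost component is smallest, and it applies the scheme to the map of Example 6:

    import numpy as np

    def frank_wolfe_averaging(f, n, iters=5000):
        """Frank-Wolfe with step length 1/(k+1) over the unit simplex.

        The Step 1 subproblem min{v^T f(x^k) : v in the simplex} is
        solved by an extreme point: the unit vector e_i minimizing f(x^k)_i.
        """
        x = np.full(n, 1.0 / n)          # start at the barycenter
        for k in range(iters):
            v = np.zeros(n)
            v[np.argmin(f(x))] = 1.0     # extreme point of the simplex
            x = x + (v - x) / (k + 1.0)  # so x^{k+1} = average of v^0..v^k
        return x

    # Example 6: f(x) = Mx - b with b = 0
    M = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]])
    print(frank_wolfe_averaging(lambda x: M @ x, 3))
    # per the claim in the text, the iterates approach (1/3, 1/3, 1/3)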
7.1 Fictitious Play Algorithm

Robinson [1951] shows that an equilibrium solution (x*, y*) to a finite, two-person zero-sum game can be found using the iterative method of fictitious play. The game can be represented by its pay-off matrix A = (a_ij). Each play consists of the row player choosing one of the m rows of the matrix while the column player chooses one of the n columns. If the ith row and the jth column are chosen, the column player pays the row player the amount a_ij; i.e., the row player receives +a_ij points and the column player receives -a_ij. An equilibrium solution to the game is a pair of vectors (x*, y*), x* ∈ S^m, y* ∈ S^n, satisfying

    x^T A y* ≤ (x*)^T A y* ≤ (x*)^T A y    for every x ∈ S^m, y ∈ S^n,

where S^k is the unit simplex in R^k.

The fictitious play method determines (x̄^k, ȳ^k), the kth play of the game, by determining for each player the best pure strategy (i.e., the single best row or column) against the accumulated strategies of the other player. Hence, at iteration k, the row player chooses the pure strategy x̄^k that is the best reply to the average

    y^k = (1/k) Σ_{j=0}^{k-1} ȳ^j

of the first k plays by the column player. If x̄^k is the best response to y^k, x̄^k must satisfy

    x^T A y^k ≤ (x̄^k)^T A y^k    for every x ∈ S^m.

That is, x̄^k solves the (trivial) linear program

    Max { x^T A y^k : x ∈ S^m }.

Similarly, ȳ^k solves the linear program

    Min { (x^k)^T A y : y ∈ S^n },

where x^k = (1/k) Σ_{i=0}^{k-1} x̄^i. Robinson shows that the iterates (x^k, y^k) generated by this method converge to the equilibrium solution (x*, y*) of the game.
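In code, each play of fictitious play reduces to an argmax and an argmin against the opponent's empirical average. The sketch below is our own illustration; the payoff matrix (matching pennies) is a hypothetical example, not one from the text:

    import numpy as np

    def fictitious_play(A, iters=5000):
        """Fictitious play for the zero-sum game with payoff matrix A.

        At play k, each player picks the pure strategy that is the best
        reply to the opponent's average mixed strategy so far.
        """
        m, n = A.shape
        row_counts = np.zeros(m)   # how often each row was played
        col_counts = np.zeros(n)   # how often each column was played
        row_counts[0] += 1         # arbitrary first pure plays
        col_counts[0] += 1
        for k in range(1, iters):
            y_bar = col_counts / k                 # column player's average
            x_bar = row_counts / k                 # row player's average
            row_counts[np.argmax(A @ y_bar)] += 1  # best row vs y_bar
            col_counts[np.argmin(x_bar @ A)] += 1  # best column vs x_bar
        return row_counts / iters, col_counts / iters

    # matching pennies: equilibrium mixed strategies are (1/2, 1/2)
    A = np.array([[1.0, -1.0], [-1.0, 1.0]])
    x, y = fictitious_play(A)
    print(x, y)   # both empirical averages approach (0.5, 0.5)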
To show that the Frank-Wolfe method "with averaging" is a generalization of the fictitious play algorithm, we first reformulate the matrix game as the variational inequality problem VI(f,C), where C = S^m x S^n, z = (x, y), and

    f(z) = Mz = [ 0    -A ] [ x ]  =  [ -Ay   ]
                [ A^T   0 ] [ y ]     [ A^T x ] .

Then z* = (x*, y*) solves VI(f,C) if and only if

    (z - z*)^T f(z*) = (x - x*)^T (-Ay*) + (y - y*)^T A^T x* ≥ 0
                                   for every x ∈ S^m, y ∈ S^n;

that is, if and only if (x*, y*) is a solution to the game.

The Frank-Wolfe method with averaging determines an extreme point z̄^k = (x̄^k, ȳ^k) of C on the kth iteration by solving the linear programming subproblem

    Min { z^T f(z^k) : z ∈ C }.

(This subproblem separates: since z^T f(z^k) = -x^T A y^k + y^T A^T x^k, the component x̄^k maximizes x^T A y^k over S^m and ȳ^k minimizes (x^k)^T A y over S^n.) The algorithm then determines the next iterate z^{k+1} = (x^{k+1}, y^{k+1}):

    x^{k+1} = (1/(k+1)) Σ_{i=0}^{k} x̄^i    and    y^{k+1} = (1/(k+1)) Σ_{j=0}^{k} ȳ^j.
Thus we can view the Frank-Wolfe algorithm with averaging as a generalized
fictitious play algorithm.
The following theorem shows that this generalized fictitious play
algorithm will solve a certain class of variational inequality problems.
Shapley [1964] has devised an example that shows that the method of fictitious
play need not solve general bimatrix games (and hence general variational
inequality problems).
However, the mapping in his example is not monotone.
Theorem 6

The fictitious play algorithm will produce a sequence of iterates that converge to the solution of the variational inequality problem VI(f,C) if

(i) f is continuously differentiable and monotone;

(ii) C is compact and strongly convex; and

(iii) no point x in the ground set C satisfies f(x) = 0.

Proof

The algorithm fits into the framework of Auslender's [1976] descent algorithm procedure, because v^k solves the subproblem min{x^T f(x^k): x ∈ C} at the kth iteration, and the stepsize w_k = 1/k satisfies w_k > 0, Σ_{k=1}^∞ w_k = +∞, and lim_{k→∞} w_k = 0. □
Two of the conditions specified by the theorem are more restrictive than we might wish. First, if C is strongly convex, then C cannot be polyhedral. This framework, therefore, does not show that the algorithm converges for the many problem settings that cast the variational inequality problem over a polyhedral ground set. Since an important feature of this algorithm is that the subproblem is a linear program when the ground set C is polyhedral, this restriction eliminates many of the applications for which the algorithm is most attractive. Secondly, the condition that f(x) ≠ 0 for x ∈ C may be too restrictive in some problem settings. One setting for which this condition is not overly restrictive is the traffic equilibrium problem. If we assume that the demand between at least one OD pair is positive, then we can assume that the cost of any feasible flow on the network is nonzero.
Powell and Sheffi [1982] show that iterative methods with "fixed step
sizes" such as this one will solve convex minimization problems under certain
conditions.
Their proof does not extend to variational inequality problems
defined by maps that have asymmetric Jacobians.
Although we do not currently
have a convergence proof for the fictitious play algorithm for solving
variational inequality problems, we believe that it is likely that the
algorithm will converge.
We therefore end this section with the following
conjecture:
Conjecture

If f is uniformly monotone and C is a bounded polyhedron, then the fictitious play algorithm will solve the variational inequality problem VI(f,C).
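Although we have no proof, the conjecture is easy to probe numerically. The following sketch is entirely our own construction: it builds a random uniformly monotone affine map (symmetric positive definite part plus a large skew-symmetric part), runs the averaging scheme over the unit simplex, and reports the variational inequality gap, which is zero exactly at the solution:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 4
    B = rng.standard_normal((n, n))
    # symmetric part B B^T + n I is positive definite, so f is
    # uniformly monotone; the skew term makes the Jacobian asymmetric
    M = B @ B.T + n * np.eye(n) + 5.0 * (B - B.T)
    b = rng.standard_normal(n)
    f = lambda x: M @ x - b

    x = np.full(n, 1.0 / n)
    for k in range(20000):
        v = np.zeros(n)
        v[np.argmin(f(x))] = 1.0       # LP subproblem over the simplex
        x += (v - x) / (k + 1.0)       # fictitious play / averaging step

    # VI gap: max over vertices v of (x - v)^T f(x)
    gap = max((x - np.eye(n)[i]) @ f(x) for i in range(n))
    print(x, gap)   # the gap should shrink toward 0 if the conjecture holds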
8.
CONCLUSION
In general, when nonlinear programming algorithms are adapted to
variational inequality problems, their convergence requires a restriction on
the degree of asymmetry of the Jacobian of the problem map.
Analyzing the
effect that an asymmetric Jacobian has on the vector field defined by the
problem map suggests why this restriction is required.
Consider the
difference between the vector fields defined by two monotone affine maps, one
having a symmetric Jacobian matrix and one having an asymmetric Jacobian
matrix.
Let f(x) = Mx - b. When M is a symmetric positive definite matrix, the equation (x - x*)^T M (x - x*) = c describes an ellipsoid whose axes are in the direction of the eigenvectors of M. The set of equations (x - x*)^T M (x - x*) = c, therefore, describes concentric ellipsoids about the solution. For any point x on the boundary of one of these ellipsoidal level sets, the vector f(x) = Mx - b is normal to the hyperplane supporting the set at the point x, since

    d/dx [ (x - x*)^T M (x - x*) ] = (M + M^T)(x - x*) = 2M(x - M^{-1} b) = 2(Mx - b) = 2f(x).

If M is not symmetric, the set of equations (x - x*)^T M (x - x*) = c again describes concentric ellipsoids about the solution; the axes are in the direction of the eigenvectors of the symmetric part of M. In this instance, though, the vector f(x) = Mx - b is not normal to the hyperplane supporting the ellipsoidal level set at the point x. Figure 3 illustrates the vector fields and ellipsoidal level sets for a symmetric matrix and an asymmetric matrix. In general, the more asymmetric the matrix M, the more the vector field "twists" about the origin.
Nonlinear programming algorithms are designed to solve problems defined by maps that have symmetric Jacobians. In general, these algorithms move iteratively in "good" feasible descent directions. That is, for the minimization problem (2), on the kth iteration, the algorithm determines a feasible direction d^k satisfying (d^k)^T ∇F(x^k) < 0. Many algorithms attempt to choose d^k "sufficiently close" to the steepest descent direction -∇F(x^k). When these algorithms are adapted to solve variational inequality problems, they determine a direction d^k satisfying (d^k)^T f(x^k) < 0, with d^k close to the direction -f(x^k). As long as the Jacobian of f is nearly symmetric, such a direction is a "good" direction for the problem VI(f,C), because a move in the direction d^k is a move towards the solution. If, however, the Jacobian of f is very asymmetric, a move in the direction d^k may be a move away from the solution. Figure 3 illustrates the set of "descent" directions for both the symmetric and asymmetric cases. The illustrations show that the direction -f(x^k), the direction that a nonlinear programming algorithm considers to be a "good" direction, can be a poor direction if the matrix is very asymmetric.

[Figure 3: f(x) = Mx - b is Normal to the Tangent Plane to the Ellipsoidal Level Set if and only if M is Symmetric]
Projection algorithms are widely used to solve variational inequality
problems.
A fundamental difference between the nonlinear programming
algorithms that we consider in this paper and projection methods is that the algorithms we consider use a "full" steplength; in contrast, projection methods use a small fixed steplength, or a steplength defined by a convergent sequence of real numbers. Although full steplength algorithms such as the generalized steepest descent and Frank-Wolfe algorithms require more work per iteration than those using a constant or convergent sequence step size, they move fairly quickly to a neighborhood of the solution. Taking a full steplength poses a problem, however, when the Jacobian of the mapping is very asymmetric. In this case the "twisting" vector field may not only cause the algorithm to choose a less than ideal direction of movement, but, having done so, will cause the algorithm to determine a much longer step than it would choose if the mapping were nearly symmetric. The asymmetry is not as much of a problem if the step size is small, because the algorithm will not pull as far away from the solution even if the direction of movement is poor. Figure 4 illustrates the effect of asymmetry on the full steplength. By our previous observations, algorithms that take a full step size will converge only if a bound is imposed on the degree of asymmetry of the Jacobian. Projection methods do not require this type of condition. The algorithms that we consider in this paper will converge even if the problem mapping is very asymmetric, as long as the full steplength is replaced by a sufficiently small steplength. The steepest descent algorithm for unconstrained variational inequality problems becomes a projection algorithm if the stepsize is sufficiently small. Theorem 6 shows that the Frank-Wolfe method will converge for a class of variational inequality problems if the stepsize is defined by a convergent sequence.
[Figure 4: A Full Steplength Pulls the Iterate Further from the Solution when the Map is Very Asymmetric]
REFERENCES
Ahn, B. and W. Hogan [1982]. "On Convergence of the PIES Algorithm for
Computing Equilibria," Operations Research 30:2, 281-300.
Anstreicher, K. [1984]. Private communication.

Auslender, A. [1976]. Optimisation: Méthodes Numériques, Masson, Paris.
Ballantine, C. S. and C. R. Johnson [1975]. "Accretive Matrix Products,"
Linear and Multilinear Algebra 3, 169-185.
Beckmann, M. J., C. B. McGuire and C.B. Winsten [1956]. Studies in the
Economics of Transportation, Yale University Press, New Haven, CT.
Bertsekas, D. P. [1982]. Constrained Optimization and Lagrange Multiplier
Methods, Academic Press, New York, NY.
Bruynooghe, M., A. Gibert, and M. Sakarovitch [1968]. "Une Méthode d'Affectation du Trafic," in Fourth Symposium on the Theory of Traffic Flow, Karlsruhe.
Cauchy, A. L. [1847]. "Méthode générale pour la résolution des systèmes d'équations simultanées," Comptes rendus, Ac. Sci. Paris 25, 536-538.
Courant, R. [1943]. "Variational Methods for the Solution of Problems of
Equilibrium and Vibrations," Bull. Amer. Math. Soc. 49, 1-23.
Curry, H. [1944]. "The Method of Steepest Descent for Nonlinear Minimization
Problems," Quar. Appl. Math. 2, 258-261.
Dafermos, S. C. [1983]. "An Iterative Scheme for Variational Inequalities," Mathematical Programming 26:1, 40-47.
Florian, M. and H. Spiess [1982]. "The Convergence of Diagonalization
Algorithms for Asymmetric Network Equilibrium Problems," Transportation
Research B 16B, 477-483.
Frank, M. and P. Wolfe [1956]. "An Algorithm for Quadratic Programming,"
Naval Research Logistic Quarterly 3, 95-110.
Gershgorin, S. [1931]. "Über die Abgrenzung der Eigenwerte einer Matrix," Izv. Akad. Nauk SSSR Ser. Mat. 7, 749-754.
Golden, B. [1975]. "A Minimum Cost Multi-Commodity Network Flow Problem Concerning Imports and Exports," Networks 5, 331-356.
Hadamard, J. [1908]. "Mémoire sur le problème d'analyse relatif à l'équilibre des plaques élastiques encastrées," Mémoires présentés par divers savants étrangers à l'Académie des Sciences de l'Institut de France (2) 33:4.
Hammond, J. [1984]. "Solving Asymmetric Variational Inequality Problems and
Systems of Equations with Generalized Nonlinear Programming Algorithms,"
Ph.D. Thesis, Department of Mathematics, M.I.T., Cambridge, MA.
Hestenes, M., and E. Stiefel [1952]. "Methods of Conjugate Gradients for
Solving Linear Systems," J. Res. Nat. Bur. Standards 49, 409-436.
Johnson, C. R. [1972]. "Matrices Whose Hermitian Part is Positive Definite,"
Ph.D. Thesis, California Institute of Technology, Pasadena, CA.
Kaempffer, F. A. [1967]. The Elements of Physics, Blaisdell Publishing Co., Waltham, MA.
Kinderlehrer, D. and G. Stampacchia [1980]. An Introduction to Variational Inequalities and Their Applications, Academic Press, New York, NY.
Lawphongpanich, S. and D. Hearn [1982]. "Simplicial Decomposition of the Asymmetric Traffic Assignment Problem," Research Report No. 82-12, Department of Industrial and Systems Engineering, University of Florida, Gainesville, Florida.
LeBlanc, L. J., E. K. Morlok, and W. P. Pierskalla [1975]. "An Efficient
Approach to Solving the Road Network Equilibrium Traffic Assignment
Problem," Transportation Research 5, 309-318.
Luenberger, D. G. [1973]. Introduction to Linear and Nonlinear Programming.
Addison-Wesley, Reading, MA.
Marcotte, P. [1983]. "A New Algorithm for Solving Variational Inequalities Over Polyhedra, with Application to the Traffic Assignment Problem," Cahier du GERAD #8324, École des Hautes Études Commerciales, Université de Montréal, Montréal, Québec.
Martos, B. [1975]. Nonlinear Programming Theory and Methods. American Elsevier Publishing Co., Inc., New York, New York.
Ortega, J. M. and W. C. Rheinboldt [1970]. Iterative Solution of Nonlinear
Equations in Several Variables, Academic Press, New York, NY.
Pang, J. S. and D. Chan [1981]. "Iterative Methods for Variational and
Complementarity Problems," Mathematical Programming 24, 284-313.
Polak, E. [1971]. Computational Methods in Optimization, Academic Press, New York, NY.
Powell, W. B. and Y. Sheffi [1982]. "The Convergence of Equilibrium
Algorithms with Predetermined Step Sizes," Transportation Science 16:1,
45-55.
Robinson, J. [1951]. "An Iterative Method of Solving a Game," Annals of
Mathematics 54:2, 296-301.
Samuelson, P. A. [1952]. "Spatial Price Equilibrium and Linear Programming," American Economic Review 42, 283-303.
Shapley, L. S. [1964]. "Some Topics in Two Person Games," Advances in Game
Theory, Annals of Mathematics Studies No. 52, M. Dresher, L.S. Shapley,
A.W. Tucker eds., Princeton University Press, Princeton, NJ, 1-28.
Strang, G. [1976]. Linear Algebra and Its Applications. Academic Press, New York, NY.