PROJECTION METHODS FOR LINEAR AND NONLINEAR EQUATIONS Dissertation submitted to the

advertisement
PROJECTION METHODS FOR LINEAR AND
NONLINEAR EQUATIONS
Dissertation submitted to the
Hungarian Academy of Sciences
for the degree ”MTA Doktora”
Aurél Galántai
University of Miskolc
2003
2
Contents
1 Preface
5
2 Triangular decompositions
2.1 Monotonicity of LU and LDU factorizations . . . . . . . . . . . . . . . .
2.2 The geometry of LU decomposability . . . . . . . . . . . . . . . . . . . . .
2.3 Perturbations of triangular matrix factorizations . . . . . . . . . . . . . .
2.3.1 Exact perturbation terms for the LU factorization . . . . . . . . .
2.3.2 Exact perturbation terms for the LDU and Cholesky factorizations
2.3.3 Bounds for the projection Pk (B) . . . . . . . . . . . . . . . . . . .
2.3.4 Norm bounds for the perturbations of LU and LDU factorizations
2.3.5 Norm bounds for the Cholesky factorizations . . . . . . . . . . . .
2.3.6 Componentwise perturbation bounds . . . . . . . . . . . . . . . . .
2.3.7 Iterations for upper bounds . . . . . . . . . . . . . . . . . . . . . .
7
8
10
12
15
17
19
23
24
27
32
3 The
3.1
3.2
3.3
3.4
3.5
.
.
.
.
.
37
39
41
44
47
49
.
.
.
.
.
53
53
56
59
65
68
rank reduction procedure of Egerváry
The rank reduction operation . . . . . . . .
The rank reduction algorithm . . . . . . . .
Rank reduction and factorizations . . . . .
Rank reduction and conjugation . . . . . .
Inertia and rank reduction . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
4 Finite projection methods for linear systems
4.1 The Galerkin-Petrov projection method . . . .
4.2 The conjugate direction methods . . . . . . . .
4.3 The ABS projection methods . . . . . . . . . .
4.4 The stability of conjugate direction methods . .
4.5 The stability of the rank reduction conjugation
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
5 Projection methods for nonlinear algebraic systems
71
5.1 Extensions of the Kaczmarz method . . . . . . . . . . . . . . . . . . . . . 73
5.2 Nonlinear conjugate direction methods . . . . . . . . . . . . . . . . . . . . 78
5.3 Particular methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.3.1 Methods with Þxed direction matrices . . . . . . . . . . . . . . . . 83
5.3.2 The nonlinear ABS methods . . . . . . . . . . . . . . . . . . . . . 85
5.3.3 Quasi-Newton ABS methods . . . . . . . . . . . . . . . . . . . . . 89
5.4 Monotone convergence in partial ordering . . . . . . . . . . . . . . . . . . 97
5.5 Special applications of the implicit LU ABS method . . . . . . . . . . . . 104
5.5.1 The block implicit LU ABS method on linear and nonlinear systems
with block arrowhead structure . . . . . . . . . . . . . . . . . . . . 105
5.5.2 Constrained minimization with implicit LU ABS methods . . . . . 110
CONTENTS
4
6 Convergence and error estimates
6.1 A posteriori error estimates for linear and nonlinear equations . . . . .
6.1.1 Derivation and geometry of the Auchmuty estimate . . . . . .
6.1.2 Comparison of estimates . . . . . . . . . . . . . . . . . . . . . .
6.1.3 Probabilistic analysis . . . . . . . . . . . . . . . . . . . . . . . .
6.1.4 The extension of Auchmuty’s estimate to nonlinear systems . .
6.1.5 Numerical testing . . . . . . . . . . . . . . . . . . . . . . . . . .
6.2 Bounds for the convergence rate of the alternating projection method
6.2.1 The special case of the alternating projection method . . . . .
6.2.2 A new estimate for the convergence speed . . . . . . . . . . . .
6.2.3 An extension of the new estimate . . . . . . . . . . . . . . . . .
6.2.4 A simple computational experiment . . . . . . . . . . . . . . .
6.2.5 Final remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
115
115
116
118
119
120
121
125
126
127
131
132
134
7 Appendices
7.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . .
7.2 Unitarily invariant matrix norms and projector norms
7.3 Variable size test problems . . . . . . . . . . . . . . . .
7.4 A FORTRAN program of Algorithm QNABS3 . . . .
.
.
.
.
.
.
.
.
135
135
136
142
148
8 References
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
165
Chapter 1
PREFACE
This thesis contains the author’s work concerning the numerical solution of systems of
linear and nonlinear algebraic equations. This activity includes the development, application and testing of new and efficient algorithms (ABS methods), general algorithmic
frameworks, a special projection based local convergence theory, monotone (and global)
convergence theory in partial ordering and a perturbation theory of the linear conjugate
direction methods and rank reduction conjugation.
The investigated algorithms are essentially projection methods that are also featured as conjugate direction methods. The conjugate directions are generated using
Egerváry’s rank reduction procedure [76]. This fact and the requirements of the monotone
convergence theory led to the investigation of the triangular decompositions and the rank
reduction procedure. Concerning the triangular decompositions we obtained a monotonicity theorem in partial ordering, a geometric characterization and a complete perturbation
theory. Concerning the rank reduction algorithm we obtained several basic results that include the necessary and sufficient condition for the breakdown free computation, canonical
forms, the characterization of the related full rank factorization, characterizations of different obtainable factorizations and the inherent conjugation properties. In fact, it turned
out that all full-rank factorizations are related to triangular factorizations in a speciÞed
way and so the conjugation procedure is triangular factorization based. Our triangular
factorization and rank reduction results are the basis for many results presented in the
thesis. These results however are useful in other subjects of numerical linear algebra as
well.
For any numerical method it is important to know the quality of the obtained
numerical solution. Here we investigate and show the practical value of the a posteriori
estimate of Auchmuty [17]. It is also important
to estimate the speed of iterative al¡ ¢
gorithms. Here we give a computable (O n3 ) bound for the convergence speed of the
alternating projection method whose convergence was proved by von Neumann, Halperin
and others [70].
The unscaled scalar nonlinear ABS method was developed by Abaffy, Galántai
and Spedicato [8]. The block linear and nonlinear ABS method was developed by Abaffy
and Galántai [6]. Except for the algorithm none of the joint results are included in the
thesis. The development and testing of the quasi-Newton ABS method of Section 5.3.3
is a joint work with A. Jeney. The presented convergence results are the author’s own
results.
The Appendices also contain some of the author’s results necessary in earlier
sections.
6
Preface
Chapter 2
TRIANGULAR DECOMPOSITIONS
The importance and usefulness of the Gaussian elimination (GE) became apparent in
the 1940’s, when the activity of Hotelling, von Neumann and Goldstine, Turing and Fox,
Huskey and Wilkinson resulted in the observation that Gaussian elimination is the right
method for digital computers for solving linear equations of the form
¢
¡
Ax = b
A ∈ Rm×m
(for details see [143], [233]). In fact, von Neumann and Goldstine [196] and Turing [239]
discovered that GE computes an LU (LDU ) factorization for general A and an LDLT
factorization for positive deÞnite symmetric A. They also made the Þrst error analysis
of the method and the involved triangular factorizations. The Cholesky factorization
and method were discovered and used Þrst in geodesy around the 1920’s. Since then
the triangular factorizations and the triangular factorization based algorithms (GE and
Cholesky and variants, LR method for eigenvalue computations) became important in
many Þelds of applied mathematics and the number of related problems and papers is
ever increasing (see, e.g., [143], [233]).
Here we make three contributions to the theory of triangular factorizations. We
prove the monotonicity of the LU and LDU factorizations of M matrices in partial ordering. We also give a geometric characterization of the LU decomposability in terms of
subspace angles and gap. Finally we give a complete perturbation theory for all known
triangular decompositions. We also mention that all three subjects were originally required by the development of nonlinear ABS methods. These results however are useful
elsewhere on their own.
DeÞnition 1 The matrix A ∈ Fn×n is said to be lower triangular if aij = 0 for all i < j.
If aij = 0 for all i ≤ j, then A is strictly lower triangular. If aij = 0 for i < j and aii = 1
(i = 1, . . . , n), then A is unit lower triangular.
DeÞnition 2 The matrix A ∈ Fn×n is said to be upper triangular if aij = 0 for all i > j.
If aij = 0 for all i ≥ j, then A is strictly upper triangular. If aij = 0 for i > j and aii = 1
(i = 1, . . . , n), then A is unit upper triangular.
The LU decomposition of a matrix A ∈ Fn×n is deÞned by A = LU , where L
is lower triangular and U is upper triangular. The LU decomposition, if it
¢ is not
¡ exists,
unique. For any nonsingular diagonal matrix D, decomposition A = (LD) D−1 U is also
an LU factorization of the matrix. The sufficient part of the following result was proved
by Turing (see, e.g., [143]).
Theorem 3 Let A ∈ Fn×n be a nonsingular matrix. The matrix A has a unique LU
decomposition
A = LU,
(2.1)
Triangular decompositions
8
where L is unit lower triangular and U is upper triangular, if and only if the leading
principal submatrices are all nonsingular.
DeÞnition 4 The block LU decomposition of the partitioned matrix
k
A = [Aij ]i,j=1 ∈ Fn×n
(Aij ∈ Fli ×lj )
is deÞned by A = LU , where L and U are block triangular matrices of the form
k
(Lij ∈ Fli ×lj ; Lij = 0, i < j)
k
(Uij ∈ Fli ×lj ; Uij = 0, i > j).
L = [Lij ]i,j=1
and
U = [Uij ]i,j=1
Theorem 5 The matrix A has a block LU decomposition if and only if the Þrst k − 1
leading principal block submatrices of A are nonsingular.
For proofs of the above theorems, see e.g., [122], [144], [143].
A block triangular matrix is unit block triangular if its diagonal elements are unit
matrices. If L or U is unit block triangular, then the block LU decomposition is unique.
DeÞnition 6 A partitioned nonsingular matrix A ∈ Fn×n is said to be (block) strongly
nonsingular, if A has a (block) LU decomposition.
The conditions of strong nonsingularity are clearly the most restrictive in the
case when all blocks are scalars. Such a case implies block strong nonsingularity for any
allowed partition. A geometric characterization of the strong nonsingularity will be given
in Section 2.2.
Occasionally we denote the unique LU factorization of A by A = L1 U, where
L1 stands for the unit lower triangular component. The unique LDU factorization of A
is deÞned by A = L1 DU1 , where L1 is unit lower triangular, D is diagonal and U1 is
unit upper triangular (U = DU1 ). A special case of the triangular factorizations is the
Cholesky factorization.
Theorem 7 (Cholesky decomposition). Let A ∈ Cn×n be Hermitian and positive deÞnite.
Then A can be written in the form
A = LLH ,
(2.2)
where L is lower triangular with positive diagonal entries. If A is real, L may be taken to
be real.
2.1
Monotonicity of LU and LDU factorizations
The following result establishes the monotonicity of the triangular decompositions of Mmatrices [103], [104].
DeÞnition 8 A matrix A is reducible if there is a permutation matrix Π such that
·
¸
X 0
ΠT AΠ =
Y Z
with X and Z both square blocks. Otherwise, the matrix is called irreducible.
Monotonicity of LU and LDU factorizations
9
The following result is related to the fact ([23], [84], [219], [241]) that a nonsingular
matrix A is an M-matrix if and only if there exist lower and upper triangular M-matrices
R and S, respectively, such that A = RS.
Theorem 9 Let A and B be two M-matrices such that A ≤ B and assume that A is
nonsingular or irreducible singular and B is nonsingular. Let A = LA VA and B = LB VB
be the LU factorizations of A and B such that both LA and LB are unit lower triangular.
Then
LA ≤ LB ,
VA ≤ VB .
(2.3)
In addition, if VA = DA UA and VB = DB UB , where UA and UB are unit upper triangular,
DA and DB are diagonal matrices, then
DA ≤ DB ,
UA ≤ UB .
(2.4)
Proof. We prove the result by induction. Assume that k × k matrices A and B
are nonsingular such that LA ≤ LB and VA ≤ VB hold. Let
·
·
¸
¸
¡
¢
A c
B p
0
0
c, r, p, q ∈ Rk .
A =
, B =
rT a
qT b
Assume that A0 and B 0 are also nonsingular M -matrices. They have the LU -factorizations



LA
0
VA
L−1
A c

,
A0 = 
rT VA−1 1
0 a − rT A−1 c

B0 = 
LB
q
T
VB−1
0
1


VB
L−1
B p
0
T
b−q B
−1
p

.
By assumption LA ≤ LB , VA ≤ VB , c ≤ p ≤ 0, r ≤ q ≤ 0 and 0 < a ≤ b. Relation L−1
A ≥
−1
−1
−1
−1
T −1
T −1
≥
0
implies
L
c
≤
L
p
≤
0.
Inequality
V
≥
V
≥
0
implies
r
V
≤
q
V
L−1
B
A
B
A
B
A
B ≤
0. Notice that a − rT A−1 c = det (A0 ) / det (A) > 0 and b − q T B −1 p = det (B 0 ) / det (B) >
0. Finally 0 < a − rT A−1 c ≤ b − q T B −1 p follows from A−1 ≥ B −1 ≥ 0. Thus we proved
the Þrst part of the theorem for the nonsingular case. If A0 is an irreducible singular
matrix of order m, then by Theorem 4.16 of Berman and Plemmons [23] (p.156) A0 has
rank m− 1, each principal submatrix of A0 other than A0 itself is a nonsingular M -matrix,
a = rT A−1 c and A0 has the LU -factorization



LA
0
VA L−1
A c


A0 = 
0
0
rT VA−1 1
with singular upper triangular matrix VA0 . As 0 ≤ b − q T B −1 p the theorem also holds in
this case. Let VA = DA UA and VB = DB UB . As AT and B T are also M-matrices that
satisfy AT ≤ B T , we have the LU -factorizations AT = UAT (DA LTA ) and B T = UBT (DB LTB )
with UAT ≤ UBT and DA LTA ≤ DB LTB . This implies UA ≤ UB . The inequality 0 ≤ DA ≤
DB follows from the relations VA ≤ VB , diag (VA ) = DA and diag (VB ) = DB . The same
reasoning applies to A0 and B 0 if they are nonsingular. If A0 is irreducible singular, then



−1 −1
DA 0
UA DA
LA c
,

DA0 UA0 = 
0 0
0
0
Triangular decompositions
10

DB0 UB0 = 
DB
0
0
b − q T B −1 p


UB
−1 −1
DB
LB p
0
1

,
−1 −1
−1 −1
from which 0 ≤ DA0 ≤ DB0 immediately follows. As DA
LA c ≤ DB
LB p ≤ 0 the rest
of the theorem also follows.
Remark 10 From Theorem 4.16 of Berman and Plemmons [23] (p. 156) it follows that
Theorem 9 does not hold if B is irreducible singular and A 6= B.
Theorem 9 is not true for general matrices. Using the results of Jain and Snyder
[162] we deÞne matrices A and B as follows:




1 −4
9 −12
2 −4
9 −12
 −4

17 −40
57 
17 −40
57 
 , B =  −4
.
A=



9 −40
98 −148
9 −40
98 −148 
−12
57 −148
242
−12
57 −148
242
These matrices are monotone, A ≤ B and



1
0
0 0
1
0
0
 −4
 −2

1
0
0
1
0
 , LB = 
LA = 

 9/2 −22/9
9 −4
1 0 
1
−12
9 −4 1
−6 11/3 −240/67
LA and LB are not comparable, implying that Theorem 9 does not
matrices. The reverse of Theorem 9 is also not true. Let



1
0
0 0
1
0
0
 −4
 −4

1
0
0
1
0
 , LB = 
LA = 
 −9 −4
 0 −4
1 0 
1
−12 −9 −4 1
0
0 −4

0
0 
.
0 
1
hold for monotone

0
0 
.
0 
1
Matrices LA and LB are M-matrices and LA ≤ LB . The product matrix LA LTA is not an
M-matrix. Furthermore LA LTA and LB LTB are not comparable. We remark however that
if L and R are lower triangular M-matrices such that LLT ≤ RRT holds, then L ≤ R
also holds. This result is due to Schmidt and Patzke [216].
A symmetric nonsingular M -matrix is called a Stieltjes matrix. The Stieltjes
matrices are positive deÞnite. An easy consequence of Theorem 9 is the following.
eL
eT are Stieltjes matrices with their corresponding
Corollary 11 If A = LLT and B = L
e
Cholesky factorizations and A ≤ B, then L ≤ L.
Theorem 9 will be used in the monotone convergence proof of nonlinear conjugate
direction methods in Section 5.4.
2.2
The geometry of LU decomposability
We give a geometric characterization of the LU decomposability or strong nonsingularity
[110], which plays a key role in many cases and especially in our investigations. The result
is rather different from the algebraic characterization given by Theorem 3.
We recall that for any nonsingular matrix A ∈ Rn×n there is a unique QRfactorization A = QA RA such that the main diagonal of RA contains only positive entries.
If the Cholesky decomposition AT A = RT R is known, then QA = AR−1 and RA = R.
The geometry of LU decomposability
11
Proposition 12 A nonsingular matrix A ∈ Rn×n has an LU factorization if and only if
its orthogonal factor QA has an LU factorization.
Proof.
If¢the LU factorization A = LA UA exists, then LA UA = QA RA , from
¡
−1
= QA follows. As UA R−1
which LA UA RA
A is upper triangular, this proves the only if
part. In turn, if QA = LU , then A = QA RA = L (URA ) proving the LU decomposability
of matrix A.
As κ2 (A) = κ2 (RA ) we can say that the orthogonal part QA of matrix A is
responsible for the LU decomposability, while the upper triangular part RA of the QRfactorization determines the spectral condition number of A (for the use of this fact see
[52] or [143]).
We now recall the following known results on angles between subspaces (see, e.g.,
[128]).
DeÞnition 13 Let M, N ⊆ Rn be two subspaces such that p = dim (M) ≥ dim (N ) = q.
The principal angles θ1 , . . . , θq ∈ [0, π/2] between M and N are deÞned recursively for
k = 1, . . . , q by
cos (θk ) = max max uT v = uTk vk ,
u∈M v∈N
(2.5)
subject to the constraints:
kuk2 = 1,
kvk2 = 1,
uTi u = 0,
viT v = 0,
i = 1, . . . , k − 1.
(2.6)
Note that 0 ≤ θ1 ≤ θ2 ≤ . . . ≤ θq ≤ π/2. Let U = [u1 , . . . , un ] ∈ Rn×n be
orthogonal such that U1 = [u1 , . . . , up ] is a basis for M, and U2 = [up+1 , . . . , un ] is a
basis for M ⊥ . Similarly, let V = [v1 , . . . , vn ] ∈ Rn×n be an orthogonal matrix such that
V1 = [v1 , . . . , vq ] is a basis for N , and V2 = [vq+1 , . . . , vn ] is a basis for N ⊥ . Let us
consider the partitioned orthogonal matrix
· T
¸
¡ T
¢
U1 V1 U1T V2
U1 V1 ∈ Rp×q .
(2.7)
UT V =
T
T
U2 V1 U2 V2
Let θi = θi (M, N) be the ith principal angle between the subspaces M and N . It follows
from a result of Björck and Golub [26] (see also [128]) that
¡
¢
σi U1T V1 = cos (θi (M, N )) (i = 1, . . . , q) .
Let us consider now the orthogonal factor QA of matrix A ∈ Rn×n in 2 × 2
partitioned form:
"
#
(k)
(k)
³
´
Q11 Q12
(k)
k×k
Q
.
(2.8)
QA =
∈
R
11
(k)
(k)
Q21 Q22
(k)
The matrix QA has an LU factorization, if and only if Q11 is nonsingular for all k =
(k)
1, . . . , n − 1. The matrix Q11 is nonsingular, if and only if its singular values are positive.
¡ ¢|k
¡ ¢n−k|
Letting U = QTA , V = I, U1 = QTA , U2 = QTA
, V1 = I |k and V2 = I n−k|
in partition (2.7) we obtain partition (2.8). Thus
³
´
³ ³ ³¡ ¢|k ´
³ ´´´
(k)
, R I |k
, i = 1, . . . , k.
(2.9)
σi Q11 = cos θi R QT
Triangular decompositions
12
(k)
Hence the matrix Q11 is nonsingular, if cos (θi ) > 0 holds for all i. This happens, if and
only
³ if ´θi < π/2 for all i = ¡1, . .¢. , k. In other words, the angles between the subspaces
R QkA (row space) and R I |k must be less than π/2.
This observation can be expressed in terms of subspace distance or gap, which is
deÞned for subspaces M, N ⊂ Rn by
d (M, N) = kPM − PN k2 ,
(2.10)
where PM and PN are the orthogonal projectors onto M and N , respectively. It is also
known (see, e.g., [234], [109]) that
½
sin (θq ) , dim (M) = dim (N) = q
kPM − PN k2 =
(2.11)
1, dim (M ) 6= dim (N )
Thus if all θi < π/2, then sin (θi ) < 1 and kPM − PN k2 < 1.
Proposition
orthogonal
matrix QA ∈ Rn×n has an LU factorization, if and only
³ ³ ´14 The
´
¡
¢
if θk R QkA , R I |k < π/2 holds for all k = 1, . . . , n − 1. The equivalent condition is
for all k = 1, . . . , n − 1.
³ ´´
³ ³ ´
<1
d R QkA , R I |k
(2.12)
Theorem 15 A nonsingular
∈ Rn×n has an LU factorization if and only if
³ ³ matrix
´ A
¡ |k ¢´
k
< π/2 holds for all k = 1, . . . , n − 1. The
for QA the condition θk R QA , R I
equivalent condition is
³ ´´
³ ³ ´
<1
(2.13)
d R QkA , R I |k
for all k = 1, . . . , n − 1.
If a nonsingular matrix A has no LU factorization, then a permutation matrix
P exists such that P A has an LU factorization. As P A = (P QA ) RA is also a QRfactorization, the multiplication of A by P (change of rows) improves on the orthogonal
part QA of matrix. In other words, the change of row vectors keeps out π/2 as a principal
angle.
2.3
Perturbations of triangular matrix factorizations
It was recognized very early that the perturbation of the triangular factorizations affects
the stability of the Gaussian elimination type methods. Von Neumann and Goldstine
[196], [197], and Turing [239] gave the Þrst perturbation analysis, although in implicit
form. In fact, they investigated the inversion of matrices via the Gaussian elimination
and the effect of rounding errors. For example, von Neumann and Goldstine [196] show
e
certain° conditions and L
that if the positive deÞnite and symmetric A ∈ Rn×n satisÞes
°
°
°
e L
eT ° ≤ 0.42n2 u holds,
e are the computed LDLT factorization of A, then °A − LD
and D
where u is the machine unit. Most of the error analysis works on Gaussian elimination
type methods is related to some kind of ßoating point arithmetic and essentially backward
error analysis. This is due to Wilkinson, who proved the following important and famous
result.
Perturbations of triangular matrix factorizations
13
Theorem 16 (Wilkinson). Let A ∈ Rn×n and suppose GE with partial pivoting produces
a computed solution x
b to Ax = b. Then
(A + ∆A) x
b = b,
k∆Ak∞ ≤ 8n3 ρn kAk∞ u + O(u2 ),
where ρn is the growth factor of the pivot elements.
Wilkinson also proved that the triangular factors L and U found when performing
GE are the exact triangular factors of A + F , where the error matrix F is bounded
essentially in terms of machine word-length and the increase in magnitude of the elements
of A as the calculation proceeded.
The true meaning of the Wilkinson theorem is that the computed solution exactly
satisÞes a perturbed equation. It does not give any answer however, if one is interested
in the solution error x − x
b, or the error inßuencing factors.
The study of the perturbations of triangular factorizations started relatively lately.
Broyden [35] was perhaps the Þrst researcher, who showed that the triangular factors
can be badly conditioned even for well-conditioned matrices. Since then many authors
investigated the perturbations of triangular matrix factorizations [19], [25], [46], [47], [48],
[74], [112], [231], [232], [235], [236], [237] (see also [143]).
We note that the stability of triangular factorizations is also important in rank
testing [126], [204] (for rank testing see Stewart [233]).
The earlier perturbation results are true upper estimates for the solution of certain
nonlinear perturbation equations or approximate bounds based on linearization. Here we
solve these perturbation equations and derive the exact perturbation expressions for the
LU , LDU , LDLT and Cholesky factorizations. The exact perturbation terms are valid
for all matrix perturbations keeping nonsingularity and factorability. The results are then
used to give upper bounds that improve the known bounds in most of the cases. Certain
componentwise upper bounds can be obtained by monotone iterations as well. The results
of this section were published in papers [112] and [117].
Assume that A ∈ Rn×n has an LU factorization. We consider the unique LU
and LDU factorizations A = L1 U and A = L1 DU1 , respectively, where L1 is unit lower
triangular, D is diagonal and U1 is unit upper triangular (U = DU1 ). If A ∈ Rn×n is a
symmetric, positive deÞnite matrix, then it has a unique LDLT factorization A = L1 DLT1 ,
where L1 is unit lower triangular and D is diagonal with positive diagonal entries. The
Cholesky factorization of this A is denoted by A = RRT , where R = D1/2 LT1 is upper
triangular.
Let δA ∈ Rn×n be a perturbation such that A + δA also has LU factorization.
Then
A + δA = (L1 + δL1 ) (U + δU )
(2.14)
A + δA = (L1 + δL1 ) (D + δD ) (U1 + δU1 )
(2.15)
and
are the corresponding unique LU and LDU factorizations. The perturbation matrices δL1 ,
δU , δD and δU1 are strict lower triangular, upper triangular, diagonal and strict upper
triangular, respectively. For technical reasons we also use the unique LU factorization
A = LU1 , where L = L1 D is lower triangular and U1 is unit upper triangular. In this
case the LU factorization of the perturbed matrix will be given by
A + δA = (L + δL ) (U1 + δU1 ) ,
(2.16)
Triangular decompositions
14
where δL is lower triangular. We refer to this particular LU factorization as LU1 factorization, while the Þrst one will be called as L1 U factorization.
If A is symmetric and positive deÞnite and the perturbation δA is symmetric such
A + δA remains positive deÞnite, then the perturbed LDLT factorization is given by
A + δA = (L1 + δL1 ) (D + δD ) (L1 + δL1 )T ,
(2.17)
where δL1 is strict lower triangular and δD is diagonal. The LDLT factorization is a
T
). Hence any statement on the
special case of the LDU factorization (U1 = LT1 , δU1 = δL
1
perturbed LDU factorization is also a statement on the perturbed LDLT factorization
provided that A is symmetric and positive deÞnite. Therefore we do not formulate separate
theorems on the perturbation of the LDLT factorization.
The perturbed Cholesky decomposition is given by
T
A + δA = (R + δR ) (R + δR ) ,
(2.18)
where δR is upper triangular.
n
We use the following notation. Let A = [aij ]i,j=1 . Then
½
aij , i ≥ j − l
tril (A, l) = [αij ]ni,j=1 , αij =
, (0 ≤ |l| < n)
0, i < j − l
and
triu (A, l) = [βij ]ni,j=1 ,
βij =
½
aij , i ≤ j − l
,
0, i > j − l
(0 ≤ |l| < n) .
(2.19)
(2.20)
Related special notations are tril (A) = tril (A, 0), tril∗ (A) = tril (A, −1), triu (A) =
triu (A, 0) and triu∗ (A) = triu (A, 1). Let ei ∈ Rn denote the ith unit vector and let
P
P
I (k) = ki=1 ei eTi for 1 ≤ k ≤ n and I (k) = 0 for k ≤ 0. Similarly, let I(k) = ni=k+1 ei eTi
for 0 ≤ k < n and I(k) = 0 for k ≥ n. Note that I (n) = I = I(0) and I(k) + I (k) = I. We
also use the notation Ik for the k × k unit matrix.
Lemma 17 Let C, B ∈ Rn×n and assume that I − B is nonsingular and has LU factorization. Then the solution of the equation
W = C + Btriu (W, l)
(l ≥ 0)
(2.21)
(k = 1, . . . , n) .
(2.22)
is given by
´−1
³
Cek
W ek = I − BI (k−l)
Relations
W ek = Cek + Btriu (W, l) ek and triu (W, l) ek = I (k−l) W ek
¢
¡ Proof.(k−l)
W ek = Cek which gives the result. The nonsingularity of I − BI (k−l)
imply I − BI
follows from the assumptions that I − B is nonsingular and has LU factorization.
Note that I (j) = 0 (j ≤ 0) implies that W ek = Cek for k ≤ l. Equation (2.21)
can be transformed into one single system by using the vec operation:
´
³
(2.23)
diag I − BI (1−l) , . . . , I − BI (n−l) vec (W ) = vec (C) .
Corollary 18 Let C, B ∈ Rn×n and assume that I − B is nonsingular and has LU
factorization. Then the solution of the equation
W = C + tril (W, −l) B
(l ≥ 0)
(2.24)
is given by
³
´−1
eTk W = eTk C I − I (k−l) B
(k = 1, . . . , n) .
(2.25)
Perturbations of triangular matrix factorizations
15
¡
¢
Proof. As W T = C T + B T tril (W, −l)T = C T + B T triu W T , l Lemma 17
implies the requested result.
We Þrst give the exact perturbations in terms of A, the original factors and the
perturbation matrix δA . The perturbations will be derived from the perturbations of
certain lower triangular factors. It will be shown that a certain projection plays a key
role in the perturbation of the components. Then we derive and analyze corresponding
perturbation bounds which are compared with the existing ones.
2.3.1 Exact perturbation terms for the LU factorization
We Þrst derive the perturbations of the lower triangular factors in the LU factorizations
A = L1 U and A = LU1 , respectively. We assume that A and A + δA are nonsingular.
−1
and write
Let X1 = L−1
1 δL1 , Y = δU U
−1
−1
= L−1
L−1
1 (A + δA ) U
1 (L1 + δL1 ) (U + δU ) U
(2.26)
−1
in the form I + B = (I + X1 ) (I + Y ) = I + X1 + Y + X1 Y , where B = L−1
.
1 δA U
Observe that X1 is strict lower triangular and Y is upper triangular. Hence I + X1 is unit
lower triangular and I + Y is upper triangular and provide the unique L1 U factorization
of I + B. Note that I + B and I + Y are nonsingular by the initial assumptions. From
the identity
B = X1 + Y + X1 Y
(2.27)
−1
−1
−1
it follows that F := B (I + Y ) = Y (I + Y ) + X1 , where Y (I + Y ) is upper tri−1
∗
angular. Hence tril (F ) = X1 and triu (F ) = Y (I + Y ) . The relation (I + Y )−1 =
I − Y (I + Y )−1 implies B (I + Y )−1 = B − BY (I + Y )−1 which can be written as
F = B − Btriu(F ).
(2.28)
Hence the exact perturbation term is given by δL1 = L1 X1 = L1 tril∗ (F ). Lemma 17 and
relation tril∗ (F ) ek = I(k) F ek yield
·³
¸
´−1
´−1
³
(1)
(n)
Be1 , . . . , I + BI
Ben
F = I + BI
(2.29)
and
·
¸
³
´−1
³
´−1
Be1 , . . . , I(n) I + BI (n)
Ben .
X1 = I(1) I + BI (1)
(2.30)
Hence the kth column of δL1 is given by
δL1 ek = L1 X1 ek = L1 Pk (B) Bek
(k = 1, . . . , n),
(2.31)
where
³
´−1
Pk (B) = I(k) I + BI (k)
is a projection of rank n − k.
If B is partitioned in the form
¸
· 1
Bk Bk4
B=
Bk2 Bk3
(k = 1, . . . , n)
¡ 1
¢
Bk ∈ Rk×k ,
(2.32)
(2.33)
Triangular decompositions
16
then
Pk (B) =
·
0
¡
¢−1
2
−Bk I + Bk1
0
In−k
¸
.
(2.34)
We do now a similar calculation for the LU1 factorization. Let X = L−1 δL ,
Y1 = δU1 U1−1 and write
L−1 (A + δA ) U1−1 = L−1 (L + δL ) (U1 + δU1 ) U1−1
(2.35)
e = (I + X) (I + Y1 ) = I + X + Y1 + XY1 , where B
e = L−1 δA U −1 . Here
in the form I + B
1
X is lower triangular and Y1 is strict upper triangular. Hence I + X is lower triangular
e
and I + Y1 is unit upper triangular and provide the unique LU1 factorization of I + B.
e
Note that I + B is nonsingular. Again, the identity
e = X + Y1 + XY1
B
(2.36)
∗
e − Btriu
e
(G).
G=B
(2.37)
e (I + Y1 )−1 = Y1 (I + Y1 )−1 + X, where Y1 (I + Y1 )−1 is strict upper triimplies G := B
e (I + Y1 )−1 =
angular. Hence tril (G) = X, triu∗ (G) = Y1 (I + Y1 )−1 and we can write B
e − BY
e 1 (I + Y1 )−1 in the form
B
Hence the exact perturbation term δL is given by δL = LX = Ltril (G). Lemma 17 and
tril (G) ek = I(k−1) Gek imply
·³
¸
´−1
´−1
³
(0)
(n−1)
e
e
e
e
G = I + BI
Be1 , . . . , I + BI
Ben
(2.38)
and
¸
·
³
´−1
³
´−1
e (0)
e (n−1)
e 1 , . . . , I(n−1) I + BI
e n .
Be
Be
X = I(0) I + BI
(2.39)
Thus the kth column of δL is given by
³ ´
e Be
e k
δL ek = LXek = LPk−1 B
(k = 1, . . . , n) .
(2.40)
This relation is showing the structural differences between δL1 and δL . A close inspection
on δL also reveals that it includes a Schur complement, while δL1 does not.
We can now derive perturbation δU of the upper triangular factor in the LU
factorization A = L1 U . By transposing A and A + δA = (L1 + δL1 ) (U + δU ) we obtain
the perturbed LU1 factorization
¢¡ T
¢
¡
T
T
T
AT + δA
L1 + δL
,
(2.41)
= U T + δU
1
¡
¢
e = U −T δ T L−T = B T and δ T ek = U T Pk−1 B T B T ek (k = 1, . . . , n). Hence
where B
A 1
U
¡ ¢¤T
£
U (k = 1, . . . , n) .
(2.42)
eTk δU = eTk B Pk−1 B T
Theorem 19 Assume that A and A + δA are nonsingular and have LU factorizations
−1
. The exact
A = L1 U and A + δA = (L1 + δL1 ) (U + δU ), respectively. Let B = L−1
1 δA U
perturbation terms δL1 and δU are then given by δL1 = L1 X1 and δU = Y U , where
X1 ek = Pk (B) Bek
(k = 1, . . . , n)
(2.43)
and
¡ ¢¤T
£
eTk Y = eTk B Pk−1 B T
(k = 1, . . . , n) .
(2.44)
Perturbations of triangular matrix factorizations
17
Remark 20 It can be easily seen that Y = triu (G), where G is the unique solution of
the equation
G = B − tril∗ (G) B.
(2.45)
Remark 21 The known perturbation bounds for the LU factorization are derived under
the condition kBk < 1 or ρ (|B|) < 1 ([19], [25], [46], [47], [231], [232], [236]). Here we
only assumed that I + B is nonsingular and has LU factorization.
Remark 22 The exact perturbation terms can be made formally independent of A by setting δA = L1 ∆A U . In this case B = ∆A , X1 and Y are independent of A. Consequently,
the condition numbers introduced in [231], [143] and [46] are not justiÞed in this case.
Remark 23 The exact perturbation terms δL1 and δU are related to certain block LU
factorizations, from which they can also be derived.
Remark 24 If A and A + δA are nonsingular M-matrices such that A ≤ A + δA , i.e.
δA ≥ 0, then Theorem 9 implies that δL1 , δU ≥ 0, B ≥ 0, X1 ≥ 0 and Y ≥ 0.
2.3.2 Exact perturbation terms for the LDU and Cholesky factorizations
Theorem 25 Assume that A and A + δA are nonsingular and have LDU factorizations A = L1 DU1 and A + δA = (L1 + δL1 ) (D + δD ) (U1 + δU1 ), respectively. Let B =
−1
. The exact perturbation terms δL1 , δD and δU1 are then given by δL1 = L1 X1 ,
L−1
1 δA U
δD = ΓD and δU1 = Y1 U1 ,where
X1 ek = Pk (B) Bek
and
(k = 1, . . . , n) ,
¡ ¢¤T
£
ek eTk
eTk Γ = eTk B Pk−1 B T
£ ¡ ¢¤T
D
eTk Y1 = eTk D−1 B Pk B T
(2.46)
(k = 1, . . . , n)
(2.47)
(k = 1, . . . , n) .
(2.48)
Proof. We have to derive only δU1 and δD . If A + δA is decomposed in the LU1
form
A + δA = (L + δL ) (U1 + δU1 ) ,
¢¡ T
¢
¡
T
T
T
e = L−1 δA U −1 .
L + δL
is an L1 U factorization. Let B
then AT + δA
= U1T + δU
1
1
³ ´
T
T T
T
T
T
e
e
Theorem 19 implies δ = U Y , where Y ek = Pk B B ek . Hence δU = Y1 U1 ,
U1
1
1
1
1
where
h ³ ´iT
eT
e Pk B
eTk Y1 = eTk B
(k = 1, . . . , n) .
³ ´
Also Y1 = triu∗ Fe holds, where Fe is the unique solution of the equation
³ ´
e − tril Fe B.
e
Fe = B
e = D−1 BD we obtain
Substituting B
£ ¡ ¢¤T
eTk Y1 = eTk D−1 B Pk B T
D
(k = 1, . . . , n) .
(2.49)
Triangular decompositions
18
Relation
A + δA = (L1 + δL1 ) (U + δU ) = (L1 + δL1 ) (D + δD ) (U1 + δU1 )
implies
δU = (D + δD ) (U1 + δU1 ) − DU1 = δD U1 + (D + δD ) δU1 .
Here (D + δD ) δU1 is strict upper triangular and U1 is unit upper triangular. Hence
δD = diag (δD U1 ) = diag (δU ). The relation δU = triu (G) U , where U is upper triangular,
implies
diag (δU ) = diag (G) diag (U ) = diag (G) D.
Let diag (G) = Γ = diag (γk ). By deÞnition
γk = eTk Gek = eTk GI(k−1) ek = eTk triu (G) ek = eTk Y ek ,
where Y is given by Theorem 19.
We remind that in the case of the LDLT factorization the term δU1 can be
dropped.
Theorem 26 Let A ∈ Rn×n and A + δA ∈ Rn×n be symmetric positive deÞnite and have
T
the Cholesky factorizations A = RT R and A + δA = (R + δR ) (R + δR ), respectively. Let
−T
−1
b
B = R δA R . The exact perturbation term δR is then given by δR = ΩR, where
and
³ ´T
³
´
1/2
1/2
b
b k B
eTk Ω = (1 + γk ) − 1 eTk + (1 + γk ) eTk BP
³ ´T
b ek
b k−1 B
γk = eTk BP
(k = 1, . . . , n)
(k = 1, . . . , n) .
(2.50)
(2.51)
Proof. The perturbation of the Cholesky factorization is derived from the identity
A + δA = (R + δR )T (R + δR ) = (L1 + δL1 ) (D + δD ) (L1 + δL1 )T .
As D +δD has only positive entries on the main diagonal we can deÞne (D + δD )1/2 . Thus
by the unicity of the Cholesky factorization
1/2
1/2 T
.
R + δR = (D + δD ) LT1 + (D + δD ) δL
1
Hence
´
³
1/2
1/2 T
.
δR = (D + δD ) − D1/2 LT1 + (D + δD ) δL
1
(2.52)
b = R−T δA R−1 and substitute B =
Theorem 25 gives δL1 = L1 X1 and δD = ΓD. Let B
³ ´
³ ´T
b D−1/2 and γk = eT BP
b ek .
b −1/2 . It follows that X1 (B) = D1/2 X1 B
b k−1 B
D1/2 BD
k
³ ´
b . The positivity conditions
Hence δL1 = RT ΛD−1/2 and δD = ΓD, where Λ = X1 B
eTk Dek > 0 and eTk (D + δD ) ek > 0 imply that γk > −1 for all k. Thus (D + δD )1/2 =
1/2
1/2
1/2
(I + Γ) D1/2 and δR = ΩR, where Ω = (I + Γ) − I + (I + Γ) ΛT .
Perturbations of triangular matrix factorizations
19
2.3.3 Bounds for the projection Pk (B)
In the previous sections we have seen that projections Pk (B) play the key role in perturbation errors. Next we give bounds for these projections. A crude upper bound is given
by
Lemma 27 If kBk2 < 1, then
kPk (B)k2 ≤ 1/ (1 − kBk2 ) .
°
°
°
−1 °
Proof. Using °(I + A) ° ≤ 1/ (1 − kAk) (kAk < 1) we can write
(2.53)
°
°
°I(k) °
1
1
2 °
°
° =
°
≤
.
1 − kBk2
1 − °BI (k) °2
1 − kBk2 °I (k) °2
°
³
´−1 °
°
°
°I(k) I + BI (k)
° ≤
°
°
2
A sharper bound can be obtained by using the following result.
Lemma 28 Let
F =
·
0
Z
0
I
¸
¡
∈ R(p+q)×(p+q)
¢
Z ∈ Rq×p .
If σ1 denotes the maximal singular value of Z, then
q
q
kF k2 = 1 + σ12 = 1 + kZk22 .
(2.54)
(2.55)
Proof. Let Z have the singular value decomposition Z = P ΣQT , where P ∈ Rq×q
and Q ∈ Rp×p are orthogonal,
Σ = diag (σ1 , . . . , σr ) ∈ Rq×p
and σ1 ≥ . . . ≥ σr ≥ 0. As
°· T
¸·
° Q
0
0
°
° 0 PT
Z
0
I
¸·
Q 0
0 P
(r = min {p, q}) ,
°·
¸°
¸°
°
°
°
° =° 0 0 °
° Σ I °
°
2
2
we have to determine the spectral radius of the matrix
·
¸T ·
¸
¸ · T
0 0
0 0
Σ Σ ΣT
.
=
Σ I
Σ
I
Σ I
If p ≤ q, this matrix has the 3 × 3 block
 2
D
 D
0
form
D
Ip
0

0
,
0
Iq−p
where D = diag (σ1 , . . . , σp ) ∈ Rp×p . Hence 1 is an eigenvalue of the matrix with multiplicity q − p. The rest of eigenvalues are those of matrix
¸
· 2
D D
.
D Ip
Consider
·
D2 − λI
D
D
Ip − λIp
¸
.
Triangular decompositions
20
¡
¢
¡
¢
As D2 − λI D = D D2 − λI we can write that
µ· 2
¸¶
¢
¢
¡
¡
D − λI
D
det
= det (1 − λ) D2 − λI − D2
D
Ip − λIp
p
Y
£
¡
¤
¢
=
(1 − λ) σi2 − λ − σi2
=
i=1
p
Y
i=1
£ ¡
¢¤
λ λ − 1 − σi2 = 0.
This implies that λ = 0 is an eigenvalue with multiplicity p, and λi = 1 + σi2 (i = 1, . . . , p)
are the remaining eigenvalues. Thus the spectral radius is 1 + σ12 . If p > q, the matrix
above has the following 3 × 3 block form

 2
D 0 D
 0
0 0 ,
D 0 I
where D = diag (σ1 , . . . , σq ) ∈ Rq×q . This matrix is permutationally similar to the matrix
 2

D D 0
 D I 0 .
0
0 0
Hence the eigenvalues are 0 with multiplicity p and λi = 1 + σi2 (i = 1, . . . , q). Thus we
proved that in any case
°·
¸°
q
q
° 0 0 °
° = 1 + σ2 = 1 + kZk2 .
°
(2.56)
1
2
° Z I °
2
Lemma 29 For k = 0, . . . , n,
kPk (B)k2 ≤
s
1+
kBk22
(1 − kBk2 )2
(kBk2 < 1) .
(2.57)
Proof. We recall that Pk (B) has the form (2.34) and we can apply Lemma 28
° ° ° °
¡
¢−1
. The inequality °Bk1 °2 , °Bk2 °2 ≤ kBk2 and kBk2 < 1 imply
with Z = −Bk2 I + Bk1
° ° °
¢−1 °
kBk2
kBk2
°¡
°
≤
kZk2 ≤ °Bk2 °2 ° I + Bk1
.
° ≤
1 − kBk1 k
1 − kBk2
2
As
s
1+
x2
2
(1 − x)
≤
1
1−x
(0 ≤ x < 1)
this upper estimate
of kPk (B)k2 is sharper than (2.53). Figure 1 shows the ratio function
q
2
x
[1/ (1 − x)] / 1 + (1−x)
2.
We introduce the quantity
p (B) = max kPk (B)k2
0≤k≤n
(2.58)
which we use in upper estimates to follow. This quantity can be replaced by any of the
following bounds.
Perturbations of triangular matrix factorizations
21
q
Figure 1 Function [1/ (1 − x)] / 1 +
x2
(1−x)2
Corollary 30
p (B) ≤
s
1+
kBk22
(1 − kBk2 )2
(kBk2 < 1) .
(2.59)
Remark 31 Bound (2.53) gives the weaker estimate
p (B) ≤
1
1 − kBk2
(kBk2 < 1) .
(2.60)
We now give an ”all”¯ B bound.
The Bauer-Skeel condition number of a matrix
¯
A is deÞned by κSkeel (A) = ¯A−1 ¯ |A|.
Lemma 32 Assume that I+B is nonsingular and has LU factorization I+B = LI+B UI+B ,
where LI+B is lower triangular and UI+B is upper triangular. Then
¢¡
¢
¡
(2.61)
Pk (B) = I(k) LI+B I(k) L−1
I+B ,
° °
°
°
°
kPk (B)k2 ≤ °I(k) LI+B °2 °I(k) L−1
I+B ≤ κ2 (LI+B )
and
¯¯
¯
¯
¯ −1 ¯
¡ −1 ¢
¯
¯
¯
|Pk (B)| ≤ ¯I(k) LI+B ¯ ¯I(k) L−1
I+B ≤ |LI+B | LI+B = κSkeel LI+B .
Proof. Let
I +B =
·
I + Bk1
Bk2
Bk4
Bk3
¸
= LI+B UI+B ,
where
LI+B =
·
L11
L21
0
L22
¸
,
UI+B =
·
U11
0
U12
U22
¸
¡
¢
L11 , U11 ∈ Rk×k ,
(2.62)
(2.63)
Triangular decompositions
22
LI+B is lower triangular and UI+B is upper triangular. Then
·
¸
0
0
¡
¢
Pk (B) =
,
−1
I
−Bk2 I + Bk1
¡
¢−1
−1
= (L21 U11 ) (L11 U11 ) = L21 L−1
where Bk2 I + Bk1
11 . We can write
·
¸ ·
¸·
¸
0
0
0
0
0
0
=
−1
L21 L22
−L21 L−1
I
−L−1
L−1
11
22 L21 L11
22
¡
¢¡
¢
−1
= I(k) LI+B I(k) LI+B .
Hence
¢¡
¢
¡
Pk (B) = I(k) LI+B I(k) L−1
I+B .
(2.64)
Corollary 33 Under the conditions of the lemma
¡ ¢
p (B) ≤ κ2 (LI+B ) , p B T ≤ κ2 (UI+B ) .
(2.65)
Remark 34 Here we can use optimal diagonal scaling to decrease κ2 (LI+B ) or κ2 (UI+B )
(see, e.g., [143]).
Finally we recall a result of Demmel [60] (see, also Higham [143]). Let
·
¸
¡
¢
A11 A12
A11 ∈ Rk×k
A=
∈ Rn×n
A21 A22
be symmetric and positive deÞnite. Then
´
³
°
°
°A21 A−1 ° ≤ κ2 (A)1/2 − κ2 (A)−1/2 /2.
11 2
b such that I + B
b is symmetric and positive deÞnite,
Lemma 35 For any B
Proof. Let
´1/2
³ ´ 1 1 ³
b
b ≤ + κ2 I + B
.
p B
2 2
b=
A=I +B
"
b1
I +B
k
2
b
Bk
b4
B
k
b3
I +B
k
#
(2.66)
(2.67)
(2.68)
.
Demmel’s result implies
° ³
µ
´−1 °
´i1/2 h ³
´i−1/2 ¶
°
° 2
1 h ³
b I +B
b1
b
b
°B
°
− κ2 I + B
,
k
° k
° ≤ 2 κ2 I + B
2
which yields
° ³ ´°
´1/2
1 1 ³
°
b °
b
.
°Pk B
° ≤ + κ2 I + B
2 2
2
(2.69)
Perturbations of triangular matrix factorizations
23
2.3.4 Norm bounds for the perturbations of LU and LDU factorizations
Theorem 36 Assume that A and A + δA are nonsingular and have LU factorizations
−1
. Then
A = L1 U and A + δA = (L1 + δL1 ) (U + δU ), respectively. Let B = L−1
1 δA U
kδL1 kF ≤¡ kL1¢k2 kX1 kF and kδU kF ≤ kUk2 kY kF , where kX1 kF ≤ p (B) kBkF and
kY kF ≤ p B T kBkF . Also we have
kδL1 − L1 tril∗ (B)kF ≤ kL1 k2 p (B) kBk2 ktriu (B)kF
(2.70)
and
¡ ¢
kδU − triu (B) U kF ≤ kU k2 p B T kBk2 ktril∗ (B)kF .
(2.71)
Proof. By deÞnition kX1 ek kF ≤ kPk (B)k2 kBek kF ≤ p (B) kBek kF and kX1 kF ≤
p (B) kBkF . Similarly,
° T °
°
° °
°
¡ ¢°
¡ ¢°
°e Y ° ≤ °eT B ° °Pk−1 B T ° ≤ p B T °eT B °
k
k
k
F
F
2
F
¡ T¢
and kY kF ≤ p B kBkF . Relations Pk (B) = I(k) − Pk (B) BI (k) , I(k) Bek = tril∗ (B) ek
∗
(k)
b
b
and °I (k) Be
° k = triu (B) ek imply X1 = tril (B) − X1 , where X1 ek = Pk (B) BI Bek
°b °
and °X1 ° ≤ p (B) kBk2 ktriu (B)kF . The relations
F
¡ ¢¤T
¡ ¢¤T
£
£
Pk−1 B T
= I(k−1) − I (k−1) B Pk−1 B T
,
eTk BI(k−1) = eTk triu (B)
and eTk BI (k−1) = eTk tril∗ (B) also yield Y = triu (B) − Yb , where
¡ ¢¤T
£
eTk Yb = eTk BI (k−1) B Pk−1 B T
° °
¡ ¢
° °
and °Yb ° ≤ p B T kBk2 ktril∗ (B)kF . Relations δL1 = L1 X1 and δU = Y U imply the
F
last two bounds of the theorem.
We can make the following observations.¡ ¢
1. If kBk2 < 1, then both p (B) and p B T are bounded by (2.59). Thus our
result is better than that of Barrlund [19], which corresponds to the bound (2.60). For
general B, we have the bounds (2.65).
e ), then δL1 = 0. Hence we overestie is upper triangular (δA = L1 UU
2. If B = U
mate δL1 . A similar argument applies to δU . Thus we need more sensitive and asymptotically correct estimates.
3. It is clear that δL1 ∼ L1 tril∗ (B) and δU ∼ triu (B) U hold for B → 0 in
agreement with Stewart [231].
Theorem 37 Assume that A and A + δA are nonsingular and have LDU factorizations A = L1 DU1 and A + δA = (L1 + δL1 ) (D + δD ) (U1 + δU1 ), respectively. Let B =
−1
. Then
L−1
1 δA U
kδL1 kF ≤ kL1 k2 kX1 kF , kδD kF ≤ kDk2 kΓkF
and
where kX1 kF
Also we have
kδU1 kF ≤ kU1 k2 kY1 kF ,
¡ ¢
¡ ¢
≤ p (B) kBkF , kΓkF ≤ p B T kBkF and kY1 kF ≤ κ2 (D) p B T kBkF .
kδL1 − L1 tril∗ (B)kF ≤ kL1 k2 p (B) kBk2 ktriu (B)kF ,
(2.72)
Triangular decompositions
24
and
¡ ¢
kδD − diag (B) DkF ≤ kDk2 p B T kBk2 ktril∗ (B)kF
°
°
¡
¡
¢ °
¡ ¢
¢°
°δU1 − triu∗ D−1 BD U1 ° ≤ kU1 k kDk p B T kBk °tril D−1 B ° .
2
2
2
F
F
(2.73)
(2.74)
Proof. The Þrst three bounds are direct consequences of Theorem 25. The
fourth bound is included in Theorem 36. So we prove only the last two bounds. Simple
b where
calculations and eTk BI (k−1) = eTk tril∗ (B) lead to Γ = diag (B) − Γ,
¡ ¢¤
£
b = eTk BI (k−1) B Pk−1 B T T ek eTk
eTk Γ
° °
¡ ¢
°b°
and °Γ
° ≤ p B T kBk2 ktril∗ (B)kF . Using similar arguments, eTk ZI(k) = eTk triu∗ (Z)
F
¡
¢
and eT ZI (k) = eT tril (Z) we have Y1 = triu∗ D−1 BD − Yb1 , where
k
and
k
£ ¡ ¢¤T
D
eTk Yb1 = eTk D−1 BI (k) B Pk B T
° °
°
¡
¡ ¢
¢°
°b °
°Y1 ° ≤ kDk2 p B T kBk2 °tril D−1 B °F .
F
Relations δD = ΓD and δU1 = Y1 U1 imply the bounds of the theorem.
T
Barrlund [19] derived
° following LDL perturbation bounds. If A is symmet° −1the
°
°
kδA k2 < 1, then
ric, positive deÞnite and A
2
°3/2
1/2 °
1 kAk2 °A−1 °2 kδA kF
kδL1 kF ≤ √
= ∆B
(2.75)
L1
1 − kA−1 k2 kδA k2
2
and
where
°
¢
¤
£
¡°
kδD kF ≤ (κ2 (A) + 1) ω °A−1 °2 kδA k2 − 1 kδA kF = ∆B
D,
(2.76)
1
1
ln
,
x 1−x
0 < x < 1.
(2.77)
°
°
These perturbation bounds are the functions of °A−1 °2 kδA k2 , while our bounds
depend on B. The inequality
³
´°
°
kδA kp
¡
¢
≤ kBkp ≤ κ2 D1/2 °A−1 °2 kδA kp (p = 2, F )
(2.78)
1/2
kAk2
κ2 D
ω (x) =
indicates that a direct comparison of the estimates
¡ is not
¢ easy. In many cases our estimates
are better. For example, if kAk2 ≥ 2 kDk2 κ2 D1/2 , δA 6= 0 and kBk2 ≤ 1/2, then our
estimate for δD is better than Barrlund’s estimate (2.76).
2.3.5 Norm bounds for the Cholesky factorizations
We give two theorems.
Theorem 38 Let A ∈ Rn×n and A + δA ∈ Rn×n be symmetric positive deÞnite and have
the Cholesky factorizations A = RT R and A + δA = (R + δR )T (R + δR ), respectively. Let
b = R−T δA R−1 . Then
B
³ ³ ´° ° ´ ³ ´° °
b °
b°
b °
b°
(2.79)
kδR kF ≤ θ p B
° kRk2 ,
° p B
°B
°B
2
where
θ (x) =
√
1+x−1 √
+ 1+x
x
F
(x ≥ 0) .
(2.80)
Perturbations of triangular matrix factorizations
25
Proof. Theorem 26 implies that δR = ΩR, where
¯
¯
°
° ° ³ ´°
° T °
°
°ek Ω° ≤ ¯¯(1 + γk )1/2 − 1¯¯ + (1 + γk )1/2 °
b k°
b °
Be
°
° °Pk B
°
F
2
and
2
°
°
° °
° ³ ´
³ ´°
°b ° °
°b °
b °
b
|γk | ≤ °Be
k ° °Pk−1 B ° ≤ °Bek ° p B .
2
2
2
¯√
¯
√
Let φδ (x) = ¯ 1 + x − 1¯ + δ 1 + x, x > −1 and |x| ≤ δ (δ ≥ 0). Then
max φδ (x) =
τ ≤x≤δ
√
√
1+δ−1+δ 1+δ
(τ = max {−1, −δ}) .
(2.81)
Now we can write
p
p
° T °
°ek Ω° ≤ 1 + δk − 1 + δk 1 + δk ,
F
°
³ ´°
b °
b k°
where δk = p B
° . As θ (x) (θ (0) = 3/2) is strictly monotone increasing in x ≥ 0,
°Be
2
we can write
µ√
¶
° T °
1 + δ∗ − 1 √
∗
°e Ω° ≤
+ 1 + δ δk ,
k
F
δ∗
³ ´° °
b °
b°
where δ ∗ = p B
° . Hence
°B
2
µ√
¶ ³ ´° °
1 + δ∗ − 1 √
∗ p B
b °
b°
kΩkF ≤
+
1
+
δ
° ,
°B
δ∗
F
which is the requested result.
Remark 39 Inequality
3
2
≤ θ (x) ≤
kδR kF ≤
µ
3
2
+ 12 x (x ≥ 0) implies the bound
° ¶ ³ ´° °
3 1 ³ b´ °
° b°
b °
b°
+ p B °B
° p B
° .
°B
2 2
2
F
(2.82)
Remark 40 The bound of the theorem can be weakened to
r
³ ´° °
³ ´° ° r
³ ´° °
b °
b°
b °
b°
b °
b°
kΩkF ≤ 1 + p B
1+p B
° −1+p B
°
° .
°B
°B
°B
(2.83)
Proposition 41 Under the assumptions of Theorem 26
r
° °
° b°
kΩkp ≥ 1 + °B
° − 1 (p = 2, F ) .
(2.84)
F
F
F
We give now a sharp lower estimate for kΩk.
p
b Hence
Proof. The matrix Ω satisÞes the relation Ω + ΩT + ΩT Ω = B.
° °
° b°
2
°B ° ≤ 2 kΩkp + kΩkp ,
p
which implies the statement.
Triangular decompositions
26
This lower bound is sharp. DeÞne
·
¸
0 0
Ω=
0 x
(x ≥ 0) .
Then
T
T
Ω+Ω +Ω Ω=
·
0
0
0 2x + x2
¸
b
= B,
r
° °
° °
√
° b°
° b°
2
=
2x
+
x
.
As
1 + °B
kΩkF = x and °B
°
° − 1 = 1 + 2x + x2 − 1 = x = kΩkF ,
F
F
the assertion is proved. The proof is also valid in the spectral norm.
We can establish thatr
our upper estimate for Ω exceeds the lower bound essentially
³ ´° °
³ ´° °
°
°
b °
b°
b °B
b°
1+p B
by the quantity p B
° .
°B
F
F
Following Stewart [232] we deÞne Up (Z) = triu∗ (Z) + pdiag (Z) for 0 ≤ p ≤ 1.
Theorem 42 Let A ∈ Rn×n and A + δA ∈ Rn×n be symmetric positive deÞnite and have
T
the Cholesky factorizations A = RT R and A + δA = (R + δR ) (R + δR ), respectively. Let
b = R−T δA R−1 . Then
B
°
³ ´ °
³° °
³ ´´ ³ ´ ° °2
°
° b°
b R°
b p B
b °
b°
(2.85)
° ≤ kRk2 φ °B
° ,p B
° ,
°δR − U1/2 B
°B
F
F
F
where
µ
¶
1
1
φ (x, y) = 1 + √
+ y + xy 2 .
2
2 2
(2.86)
Proof. Relation
√
x
1 + x = 1 + + Rx ,
2
|Rx | ≤
x2
2
(x ≥ −1)
implies that
1
(I + D)1/2 = I + D + RD ,
2
kRD kp ≤
1
kDk2p
2
(p = 2, F ) ,
where D = diag (di ) is such that di ≥ −1 (i = 1, . . . , n). We can write that Γ =
³ ´iT
h
³ ´
b
b = eT BI
b (k−1) B
b Pk−1 B
b − Γ,
b where eT Γ
ek eTk and
diag B
k
k
° °
³ ´°
³ ´° ° °
°
°b°
b °
b °
b°
° °tril∗ B
°Γ° ≤ p B
° .
°B
F
2
F
° °
³ ´
³ ´T
°bT °
b −Λ
b T , where eT Λ
b
b T = eT BI
b (k) BP
b k B
and
Similarly, ΛT = triu∗ B
°Λ ° ≤
k
k
F
³ ´°
³ ´° ° °
°
°
°
°
b ° . Hence Ω can be written as
b °B
b ° °tril B
p B
2
F
µ
¶³
³ ´
´
³ ´
´
1
1³
b
b
b −Λ
bT
triu∗ B
diag B − Γ + RΓ + I + Γ + RΓ
Ω=
2
2
µ
¶
³ ´ 1
1
T
b
b
b
= U1/2 B − Γ + RΓ − Λ +
Γ + RΓ ΛT .
2
2
Perturbations of triangular matrix factorizations
27
and
µ
¶
°
° °
°
³ ´°
° °
1°
1
°bT °
°
°
°b°
b
kΓkF + kRΓ kF °ΛT °F .
°Ω − U1/2 B ° ≤ °Γ° + kRΓ kF + °Λ ° +
2
2
F
F
F
°
° °
³ ´°
³ ´° °
° °
°
° b°
2
b °
b °
b°
By noting that °ΛT °F ≤ p B
°
° , kRΓ kF ≤ 12 kΓkF and °tril∗ B
° ≤ √12 °B
°B
F
F
F
we can easily obtain
µ
¶° ° ³ ´ ° °
°
³ ´°
³ ´ 1 ° °3
³ ´
2
1
° b °2
°
°
b°
b
b + °
b .
b°
b +°
° p2 B
° p3 B
°B
°B ° p B
°Ω − U1/2 B ° ≤ 1 + √
°B
2
F
F
F
F
2 2
This° bound
° is asymptotically correct. For large kBk it is worse than our Þrst
bound. For °A−1 °2 kδA k2 < 1 Sun [235] proved that
°
°
1 kRk2 °A−1 °2 kδA kF
kδR kF ≤ √
= ∆SR .
(2.87)
2 1 − kA−1 k2 kδA k2
° °
° °
°
°
° b°
° b°
It is easy to prove that Sun’s estimate is better, if °B
° = °A−1 °2 kδA k2 and °B
° ≤ 1/2.
2
° °2
° °
√
° b°
° b°
Our estimate is better, if °B
° ≤ 1/2 and κ2 (A) > 4 2n.
° = kδA k2 / kAk2 , °B
2
2
Drmaÿc, Omladiÿc and Veseliÿc ([74]) proved that for kBkF ≤ 1/2,
°
√ °
° b°
r
µ
¶
°
°
B
2
°
°
1
° b°
F
r
kΩkF ≤ √ 1 − 1 − 2 °B
=
°
° ° .
F
2
° b°
1 + 1 − 2 °B °
(2.88)
F
For small perturbations this estimate is better than any of the previous estimates. However
it is valid only for small perturbations unlike our estimates which are valid for all allowed
perturbations.
2.3.6 Componentwise perturbation bounds
We use the following simple observations. If ρ (|B|) < 1, then I − |B| is an M -matrix and
¯
¯
¯
−1 ¯
−1
(2.89)
¯(I + B) ¯ ≤ (I − |B|) .
¯
¯
¯
¯
The gap (I − |B|)−1 − ¯(I + B)−1 ¯ is estimated as follows:
¯
¯
¯
¯
0 ≤ (I − |B|)−1 − ¯(I + B)−1 ¯ ≤ (I − |B|)−1 (|B| + B) (I − |B|)−1 .
(2.90)
Note that |B| + B ≥ 0. If B ≤ 0, ¡then |B|¢+ B =
(I ¢+ B)−1 = (I − |B|)−1 .
¡ 0 and
(k)
(k+1)
≤ ρ |B| I
≤ ρ (|B|) similar statements
As ρ (B) ≤ ρ (|B|) and ρ |B| I
hold for the matrices BI (k) . We also exploit that for ρ (|B|) < 1, I − |B| I (k) is an
M-matrix (k = 0, 1, . . . , n) and
´−1 ³
´−1
³
I ≤ I − |B| I (k)
≤ I − |B| I (k+1)
≤ (I − |B|)−1 .
(2.91)
Theorem 43 Assume that A and A + δA are nonsingular and have LU factorizations
−1
. Then
A = L1 U and A + δA = (L1 + δL1 ) (U + δU ), respectively. Let B = L−1
1 δA U
|δL1 | ≤ |L1 | tril∗ (|F |) ,
|δU | ≤ triu (|G|) |U | ,
(2.92)
Triangular decompositions
28
where F and G are given by equations (2.28) and (2.45), respectively. If ρ (|B|) < 1, then
¡
¢
¢
¡
|δL1 | ≤ |L1 | tril∗ F b,1 , |δU | ≤ triu Gb,1 |U| ,
(2.93)
where F b,1 and Gb,1 are the unique solutions of the equations
F = |B| + |B| triu (F ) ,
G = |B| + tril∗ (G) |B|
(2.94)
respectively.
Proof. Theorem 19 implies that |δL1 | ≤ |L1 | |X1 |, |δU | ≤ |Y | |U |, where X1 =
¢−1
¡
Bek
tril∗ (F ) and Y = triu (G). Matrices F and G have the forms F ek = I + BI (k)
¡
¢−1
T
T
(k−1)
and ek G = ek B I + I
B
(k = 1, . . . , n). Condition ρ (|B|) < 1 implies
¯³
¯
´−1 ¯
´−1
³
¯
¯ |B| ek ≤ I − |B| I (k)
|B| ek = F b,1 ek
(2.95)
|F | ek ≤ ¯¯ I + BI (k)
¯
and
¯³
´−1 ¯¯
³
´−1
¯
¯ ≤ eTk |B| I − I (k−1) |B|
= eTk Gb,1 ,
eTk |G| ≤ eTk |B| ¯¯ I + I (k−1) B
¯
(2.96)
where the equalities follow from Lemma 17.
Remark 44 Estimates F b,1 and Gb,1 are sharp. If B ≤ 0, then |F | = F b,1 and |G| =
Gb,1 . Such situation (B ≤ 0) occurs, if A is an M -matrix and δA ≤ 0.
Inequality (2.91) implies
−1
F b,1 ≤ (I − |B|) |B| = F b,2 ,
Gb,1 ≤ |B| (I − |B|)−1 = Gb,2 .
The resulting weaker estimates
¡
¢
|δL1 | ≤ |L1 | tril∗ F b,2 ,
are due to Sun [236]. Note that F b,2 = Gb,2 .
¯
¢¯
¡
|δU | ≤ ¯triu Gb,2 ¯ |U |
(2.97)
Theorem 45 Assume that A and A + δA are nonsingular and have LU factorizations
−1
. Then
A = L1 U and A + δA = (L1 + δL1 ) (U + δU ), respectively. Let B = L−1
1 δA U
¡
¡
¢
¢
|δU | ≤ triu (|B| κSkeel (UI+B )) |U| ,
|δL1 | ≤ |L1 | tril∗ κSkeel L−1
(2.98)
I+B |B| ,
where I + B = LI+B UI+B .
¡
¢
Proof. From Lemma 32 we obtain |Pk (B)| ≤ κSkeel L−1
I+B ,
¡
¢
|X1 | ek ≤ κSkeel L−1
I+B |B| ek ,
¡
¡
¢
¢
and |X1 | ≤ tril∗ κSkeel L−1
I+B |B| . Also, Lemma 32 implies
¡ ¢ ¡
¢¡
¢
−T
T
I(k) UI+B
Pk B T = I(k) UI+B
and
¯£ ¡ ¢¤ ¯ ¯
¯¯
¯
T¯
¯
−1
I(k) ¯ ¯UI+B I(k) ¯ ≤ κSkeel (UI+B ) .
¯ Pk B T
¯ ≤ ¯UI+B
Hence |Y | ≤ |B| κSkeel (UI+B ) and |Y | ≤ triu (|B| κSkeel (UI+B )).
(2.99)
Perturbations of triangular matrix factorizations
29
Theorem 46 Assume that A and A + δA are nonsingular and have LDU factorizations A = L1 DU1 and A + δA = (L1 + δL1 ) (D + δD ) (U1 + δU1 ), respectively. Let B =
−1
e = L−1 δA U −1 . Then
L−1
and B
1 δA U
1
³¯ ¯´
¯ ¯
(2.100)
|δL1 | ≤ |L1 | tril∗ (|F |) , |δD | ≤ diag (|G|) |D| , |δU1 | ≤ triu∗ ¯Fe¯ |U1 | ,
where F , G and ³¯
Fe are
¯´ given by equations (2.28), (2.45) and (2.49), respectively. If
¯ e¯
ρ (|B|) < 1 and ρ ¯B ¯ < 1, then
³
´
¡
¢
¢
¡
|δL1 | ≤ |L1 | tril∗ F b,1 , |δD | ≤ diag Gb,1 |D| , |δU1 | ≤ triu∗ Feb,1 |U1 | ,
(2.101)
where F b,1 , Gb,1 are the unique solutions of equations (2.94), and Feb,1 is the unique
solution of equation
¯ ¯
³ ´¯ ¯
¯ e¯
¯ e¯
(2.102)
Fe = ¯B
¯.
¯ + tril Fe ¯B
Proof. Theorem 25 implies that |δL1 | ≤ |L1 | |X1 |, |δD³| ≤
´ |Γ| |D| and |δU1 | ≤
∗
∗ e
|Y1 | |U1 |, where X1 = tril (F ), Γ = diag (G) and Y1 = triu F . We have to prove
³
´−1
e I + I (k) B
e
only the last bound. Matrix Fe has the form eTk Fe = eTk B
(k = 1, . . . , n).
³¯ ¯´
¯ e¯
Condition ρ ¯B
¯ < 1 implies
eTk
¯ ¯
¯ e¯
¯F ¯ ≤ eTk
¯ ¯ ¯¯³
´−1 ¯¯
¯ e¯ ¯
(k) e
¯ ≤ eTk
¯B ¯ ¯ I + I B
¯
¯ ¯³
¯ ¯´−1
¯ e¯
¯ e¯
= eTk Feb,1 .
¯
¯B ¯ I − I (k) ¯B
(2.103)
b,1
eb,1 are sharp. If B ≤ 0 and B
e ≤ 0, then |F | =
Remark 47 Estimates F b,1 , G
¯ ¯ and F
¯ e¯
b,1
b,1
b,1
e
e
= ¯F ¯. Such situation (B ≤ 0, B ≤ 0) occurs, if A is an
F , |G| = G and F
M-matrix and δA ≤ 0.
Parts of Theorems 43 and 46 were obtained in [112] in a different form.
Theorem 48 Assume that A and A + δA are nonsingular and have LDU factorizations A = L1 DU1 and A + δA = (L1 + δL1 ) (D + δD ) (U1 + δU1 ), respectively. Let B =
−1
e = L−1 δA U −1 . Then
L−1
and B
1 δA U
1
¡
¡
¢
¢
|δL1 | ≤ |L1 | tril∗ κSkeel L−1
(2.104)
I+B |B| ,
|δD | ≤ diag (|B| κSkeel (UI+B )) |D| ,
³¯ ¯
³
´´
¯ e¯
|δU1 | ≤ triu∗ ¯B
¯ κSkeel UI+Be |U1 |
e = L eU e.
where I + B = LI+B UI+B and I + B
I+B I+B
(2.105)
(2.106)
Proof. Estimates for δL1 and δU follow from Theorem 45 and the relation Γ =
h ³ ´iT
eT
e Pk B
. Lemma 32 implies
diag (D). We recall that δU1 = Y1 U1 , where eTk Y1 = eTk B
³ ´ ³
´³
´
−T
e T = I(k) U T
I(k) UI+
that Pk B
e
e . Hence
I+B
B
´³
´
h ³ ´iT ³
−1
eT
U
Pk B
= UI+
I
I
e
(k)
(k)
e
I+B
B
Triangular decompositions
30
¯ ¯
³
´
¯
¯
¯ e¯
and ¯eTk Y1 ¯ ≤ eTk ¯B
¯ κSkeel UI+Be . This implies |Y1 | ≤
³¯ ¯
³
´´
¯ e¯
triu∗ ¯B
¯ κSkeel UI+Be |U1 | .
¯ ¯
³
´
¯ e¯
¯B ¯ κSkeel UI+Be and |δU1 | ≤
Assume that A is symmetric and positive deÞnite. Replace F b,1 and Gb,1 in
Theorem 46 by F b,2 and Gb,2 , respectively. We then have the weaker estimates
¡
¢
(2.107)
|δL1 | ≤ |L1 | tril∗ F b,2
and
¢
¡
|δD | ≤ diag Gb,2 D.
(2.108)
We recall that Sun [236] for symmetric
positive
deÞnite matrices proved that
¢
¡
under the assumptions ρ (|B|) < 1 and diag D−1 Eld < I,
³
¡
¡
¢¢−1 −1 ´
|δL1 | ≤ |L1 | tril∗ Eld I − diag D−1 Eld
,
D
(2.109)
|δD | ≤ diag (Eld )
(2.110)
with
¯
¯
¯
¢ ¯
¡
−T ¯ −1 −1 ¯ −1
¯.
D
L1 δA L−T
Eld = I − ¯L−1
1 δA L1
1
(2.111)
We compare now estimates (2.107)-(2.108) and (2.109)-(2.110), respectively. We
exploit the fact that for any diagonal matrix D, |AD| = |A| |D| and diag (AD) =
diag (A) D hold. We can write
Gb,2 = F b,2 = (I − |B|)−1 |B|
¯
¯
¯ −1
¡
¢ ¯
−T ¯ −1 −1 ¯ −1
¯ D = Eld D−1
= I − ¯L−1
D
L1 δA L−T
1 δA L1
1
and then estimate (2.108) yields
¢
¡
|δD | ≤ diag Gb,2 D = diag (Eld ) .
(2.112)
¢¢−1
¡
¡
¢¢−1 −1
¡
¡
As I − diag D−1 Eld
≥ I and Eld I − diag D−1 Eld
D ≥ Eld D−1 , the bound
(2.109) satisÞes
³
¡
¡
¢¢−1 −1 ´
¡
¢
≥ |L1 | tril∗ Eld D−1
(2.113)
D
|L1 | tril∗ Eld I − diag D−1 Eld
¡
¢
(2.114)
= |L1 | tril∗ F b,2 .
Thus it follows that Theorem 46 improves the LDLT perturbation result of Sun [236].
Theorem 49 Let A ∈ Rn×n and A + δA ∈ Rn×n be symmetric and positive deÞnite.
T
T
−T
−1
b
Assume
³¯ ¯´ that A = R R and A + δA = (R + δR ) (R + δR ) and let B = R δA R . If
¯ b¯
ρ ¯B
¯ < 1, then
|δR | ≤ Ωb,1 |R| ,
¯ ¯³
¯ ¯´−1
¯ b¯
¯ b¯
where eTk Ωb,1 = eTk ¯B
I(k−1) .
¯ I − I (k) ¯B
¯
(2.115)
Perturbations of triangular matrix factorizations
31
Proof. Note that R and R + δ_R have only positive diagonal entries. Let X = δ_R R^{-1}. Then

R^{-T}(A + δ_A)R^{-1} = R^{-T}(R + δ_R)^T(R + δ_R)R^{-1}   (2.116)

can be written in the form B̂ = X + X^T + X^T X. It follows that

W := B̂(I + X)^{-1} = X(I + X)^{-1} + X^T,

where X(I + X)^{-1} is upper triangular and X^T is lower triangular. Hence triu(W) = X(I + X)^{-1} + diag(X^T) and tril(W) = diag(X(I + X)^{-1}) + X^T. The relation

(I + X)^{-1} = I − X(I + X)^{-1}

implies

B̂(I + X)^{-1} = B̂ − B̂X(I + X)^{-1}.

It is easy to see that

diag( X(I + X)^{-1} ) = diag( x_{ii}/(1 + x_{ii}) ),   diag(X^T) = diag(x_{ii}),

and

sign( x_{ii}/(1 + x_{ii}) ) = sign(x_{ii})   (i = 1, ..., n).

Hence

|X(I + X)^{-1}| ≤ triu(|W|),   |X^T| ≤ tril(|W|)

and

|W| ≤ |B̂| + |B̂| triu(|W|).

It is clear that |W| ≤ F^{b,1}, where F^{b,1} is the unique solution of the equation F = |B̂| + |B̂| triu(F). In componentwise form F^{b,1} e_k = ( I − |B̂| I_{(k)} )^{-1} |B̂| e_k (k = 1, ..., n). As |X| ≤ [tril(|W|)]^T, we have

|δ_R| = |δ_R R^{-1} R| ≤ |δ_R R^{-1}| |R| ≤ [tril(|W|)]^T |R| ≤ [tril(F^{b,1})]^T |R| = Ω^{b,1} |R|.

This result is an improvement over Sun's estimate [236], which has the form

|δ_R| ≤ triu( |B̂| ( I − |B̂| )^{-1} ) |R|.   (2.117)

This claim follows from the inequality

e_k^T triu( |B̂| (I − |B̂|)^{-1} ) = e_k^T |B̂| (I − |B̂|)^{-1} I_{(k−1)} ≥ e_k^T |B̂| ( I − I_{(k)} |B̂| )^{-1} I_{(k−1)}.
2.3.7 Iterations for upper bounds
The componentwise upper bounds can be obtained by fixed point iterations which are monotone in certain cases. The following concept and result are extensions of Ortega and Rheinboldt ([202]).

Definition 50 Let G : R^{n×n} → R^{n×n} be a given map and let P ∈ R^{n×n} be such that P ≥ 0 and ρ(P) < 1. The map G is said to be a P-contraction on R^{n×n} if

|G(X) − G(Y)| ≤ P |X − Y|   (X, Y ∈ R^{n×n}).   (2.118)

Theorem 51 If G is a P-contraction on R^{n×n}, then for every X_0 ∈ R^{n×n} the sequence

X_{k+1} = G(X_k)   (k = 0, 1, ...)   (2.119)

converges to the unique fixed point X* of G and

|X_k − X*| ≤ (I − P)^{-1} P |X_k − X_{k−1}|   (k = 1, 2, ...),   (2.120)

|X_k − X*| ≤ (I − P)^{-1} P^k |X_1 − X_0|   (k = 1, 2, ...).   (2.121)
Assume that X ≤ Y implies G (X) ≤ G (Y ). If W0 ≤ W1 = G (W0 ), then Wi+1 =
G (Wi ) (i = 0, 1, . . . ) is monotone increasing. If V0 ≥ G (V0 ) = V1 , then Vi+1 = G (Vi )
(i = 0, 1, . . . ) is monotone decreasing and Wi ≤ Wi+1 ≤ X ∗ ≤ Vi+1 ≤ Vi (i = 0, 1, . . . ).
Theorem 51 implies the following results.
Theorem 52 Consider the equation W = C + B triu(W, l), where B, C, W ∈ R^{n×n} and l ≥ 0. If ρ(|B|) < 1, then for every W_0 ∈ R^{n×n} the sequence W_{k+1} = C + B triu(W_k, l) converges to W and

|W_k − W| ≤ (I − |B|)^{-1} |B|^k |W_1 − W_0|   (k = 1, 2, ...).   (2.122)

Furthermore, if B ≥ 0 and C ≥ 0, then by setting X_0 = 0 and Y_0 = (I − B)^{-1} C the iterates X_{k+1} = C + B triu(X_k, l) and Y_{k+1} = C + B triu(Y_k, l) (k ≥ 0) are monotone and satisfy

0 ≤ X_k ≤ X_{k+1} ≤ W ≤ Y_{k+1} ≤ Y_k ≤ (I − B)^{-1} C.   (2.123)

If B ≥ 0 and C ≤ 0, then by setting X_0 = (I − B)^{-1} C and Y_0 = 0 the iterates {X_k} and {Y_k} satisfy

(I − B)^{-1} C ≤ X_k ≤ X_{k+1} ≤ W ≤ Y_{k+1} ≤ Y_k ≤ 0   (k ≥ 0).   (2.124)
Proof. For ρ(|B|) < 1 the mapping φ(X) = C + B triu(X, l) is a |B|-contraction. For B ≥ 0 the inequality X ≤ Y implies φ(X) ≤ φ(Y). If, in addition, C ≥ 0, then X_0 = 0 ≤ X_1 = φ(0) = C. Hence X_k ↗ W. For Y_0 = (I − B)^{-1} C (Y_0 ≥ 0) we have the inequality

Y_1 = C + B triu(Y_0, l) ≤ C + B Y_0 = Y_0,

implying Y_k ↘ W. Similarly, if B ≥ 0 and C ≤ 0, then X_0 = (I − B)^{-1} C (X_0 ≤ 0) satisfies the inequality

X_1 = C + B triu(X_0, l) ≥ C + B X_0 = X_0.

Hence X_k ↗ W. Also, Y_0 = 0 ≥ Y_1 = φ(0) = C leads to Y_k ↘ W.
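For illustration, a small NumPy sketch of the two-sided monotone iteration of Theorem 52 (for the case B ≥ 0, C ≥ 0); the function names are ours, and triu(·, l) is taken as the part of the matrix on and above the l-th superdiagonal, which is an assumption about the notation.

import numpy as np

def triu_l(M, l=0):
    # triu(M, l): entries of M on and above the l-th superdiagonal (notation assumed)
    return np.triu(M, l)

def monotone_bounds(B, C, l=0, iters=50):
    # Two-sided iterates of Theorem 52 for W = C + B*triu(W, l),
    # assuming B >= 0, C >= 0 and rho(B) < 1.
    n = B.shape[0]
    X = np.zeros((n, n))                       # X_0 = 0, increases to W
    Y = np.linalg.solve(np.eye(n) - B, C)      # Y_0 = (I - B)^{-1} C, decreases to W
    for _ in range(iters):
        X = C + B @ triu_l(X, l)
        Y = C + B @ triu_l(Y, l)
    return X, Y                                # X <= W <= Y componentwise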
Theorem 53 Consider the equation W = C + tril(W, −l) B, where B, C, W ∈ R^{n×n} and l ≥ 0. If ρ(|B|) < 1, then for every W_0 ∈ R^{n×n} the sequence W_{k+1} = C + tril(W_k, −l) B converges to W and

|W_k − W| ≤ |W_1 − W_0| |B|^k (I − |B|)^{-1}   (k = 1, 2, ...).   (2.125)

Furthermore, if B ≥ 0 and C ≥ 0, then by setting X_0 = 0 and Y_0 = C (I − B)^{-1} the iterates X_{k+1} = C + tril(X_k, −l) B and Y_{k+1} = C + tril(Y_k, −l) B (k ≥ 0) are monotone and satisfy

0 ≤ X_k ≤ X_{k+1} ≤ W ≤ Y_{k+1} ≤ Y_k ≤ C (I − B)^{-1}.   (2.126)

If B ≥ 0 and C ≤ 0, then by setting X_0 = C (I − B)^{-1} and Y_0 = 0 the iterates {X_k} and {Y_k} satisfy

C (I − B)^{-1} ≤ X_k ≤ X_{k+1} ≤ W ≤ Y_{k+1} ≤ Y_k ≤ 0   (k ≥ 0).   (2.127)

Proof. The equation W = C + tril(W, −l) B is equivalent to

W^T = C^T + B^T triu(W^T, l).
We can observe that the right-hand sides of equations (2.28), (2.45), (2.94), (2.49) and (2.102) are |B|-contractions. Hence F, G, F^{b,1}, G^{b,1}, F̃ and F̃^{b,1} can be obtained by the fixed point iterations given in Theorems 52 and 53. In the case of F^{b,1}, G^{b,1} and F̃^{b,1} we always have monotone convergence for the given initial matrices. In the case of F, G and F̃ we may have monotone convergence if B ≤ 0. We also note that F^{b,2} and G^{b,2} are the initial vectors for the monotone iterations to F^{b,1} and G^{b,1}.

Let φ_0 = (I − |B|)^{-1} |B| = |B| (I − |B|)^{-1} = ψ_0, φ_{i+1} = |B| + |B| triu(φ_i) (i ≥ 0) and ψ_{i+1} = |B| + tril∗(ψ_i) |B| (i ≥ 0). We then have the sequence of estimates

|δ_{L_1}| ≤ |L_1| tril∗( F^{b,1} ) ≤ |L_1| tril∗(φ_{i+1}) ≤ |L_1| tril∗(φ_i)   (i ≥ 0)   (2.128)

and

|δ_U| ≤ triu( G^{b,1} ) |U| ≤ triu(ψ_{i+1}) |U| ≤ triu(ψ_i) |U|   (i ≥ 0).   (2.129)

Similar estimates hold for the LDU case. Finally we give two examples calculated in MATLAB.
Example 54 (LU perturbation). Let

A = [ 2  1  1  2 ; 1  2  1  1.5 ; 1  1  2  1.5 ; 1  1  1  2.5 ]

and

δ_A = [ 0.0107  −0.3193  −0.0176  −0.0738 ; −0.2032  0.3638  −0.0637  −0.0823 ; −0.0002  0.0420  −0.0630  0.0236 ; −0.0403  0.0148  −0.0351  −0.0301 ].

In this case ρ(|L_1^{-1} δ_A U^{-1}|) = 0.5001, ‖δ_A‖_F = 0.55,

δ_{L_1} = [ 0  0  0  0 ; −0.1037  0  0  0 ; −0.0028  0.0027  0  0 ; −0.0227  −0.0039  −0.003  0 ]   (‖δ_{L_1}‖_F = 0.1063)

and

δ_U = [ 0.0107  −0.3193  −0.0176  −0.0738 ; 0  0.5940  0.0470  0.1544 ; 0  0  −0.0686  0.0126 ; 0  0  0  −0.0014 ]   (‖δ_U‖_F = 0.7011).

The error estimates (2.97) of Sun (case i = 0) are given by

δ_{L_1}^{Sun} = [ 0  0  0  0 ; 0.1958  0  0  0 ; 0.1014  0.0195  0  0 ; 0.0798  0.0193  0.0027  0 ]   (‖δ_{L_1}^{Sun}‖_F = 0.2361),

δ_U^{Sun} = [ 0.1007  0.6379  0.4064  0.3451 ; 0  1.1909  0.7164  0.4955 ; 0  0  0.0763  0.0509 ; 0  0  0  0.0016 ]   (‖δ_U^{Sun}‖_F = 1.6990).

The error estimates given by the first iterate (φ_1, ψ_1) are the following:

δ_{L_1}^{(1)} = [ 0  0  0  0 ; 0.1095  0  0  0 ; 0.0701  0.0182  0  0 ; 0.0490  0.0190  0.0027  0 ]   (‖δ_{L_1}^{(1)}‖_F = 0.1414),

δ_U^{(1)} = [ 0.0107  0.3301  0.1989  0.1427 ; 0  0.6913  0.4070  0.2812 ; 0  0  0.0725  0.0482 ; 0  0  0  0.0016 ]   (‖δ_U^{(1)}‖_F = 0.9482).

The error estimate (2.93) (case i → ∞) yields

δ_{L_1}^{best} = [ 0  0  0  0 ; 0.1048  0  0  0 ; 0.0671  0.0181  0  0 ; 0.0469  0.0189  0.0027  0 ]   (‖δ_{L_1}^{best}‖_F = 0.1356),

δ_U^{best} = [ 0.0107  0.3301  0.1989  0.1427 ; 0  0.6617  0.3894  0.2692 ; 0  0  0.0721  0.0481 ; 0  0  0  0.0016 ]   (‖δ_U^{best}‖_F = 0.9157).

Notice that the bounds satisfy

‖δ_{L_1}^{Sun}‖_F / ‖δ_{L_1}‖_F = 2.2211 > ‖δ_{L_1}^{best}‖_F / ‖δ_{L_1}‖_F = 1.2753
Figure 2: The norms of the perturbation bounds for the LDL^T factorization.
and

‖δ_U^{Sun}‖_F / ‖δ_U‖_F = 2.4232 > ‖δ_U^{best}‖_F / ‖δ_U‖_F = 1.3059.
Hence our estimate is an improvement over Sun's. We can also observe that the first iterate gives estimates almost as good as the best estimates.

Computer experiments on symmetric positive definite MATLAB test matrices also indicate that the iterative estimate (i = 1) is often as good as the optimal estimate itself. We observed a significant difference between the estimates when B and B̃ were relatively large. A typical result is shown in Figure 2.
Example 55 (LDL^T perturbation). Here 20 random symmetric matrices with elements of magnitude 5 × 10^{-3} are added to the symmetric positive definite matrix

A = [ 1  −1  −1  −1  −1 ; −1  2  0  0  0 ; −1  0  3  1  1 ; −1  0  1  4  2 ; −1  0  1  2  5 ]

(Example 6.1 of Sun [236]) and the Frobenius norms of the estimates and of the true lower triangular error matrix are displayed. Here the line marked with + denotes the estimate (2.109)-(2.111) of Sun, the line with triangles denotes the estimate (2.107)-(2.108), the solid line denotes the first iterative estimate (i = 1), the line with circles denotes the best estimate (i → ∞), while the line with pentagrams stands for the true lower triangular error matrix.
Chapter 3
THE RANK REDUCTION PROCEDURE OF EGERVÁRY
The rank reduction procedure is a simple but very general technique. It determines the rank of a matrix and provides a related full rank factorization. The rank reduction algorithm first appeared in Wedderburn [247] concerning the reduction of quadratic forms. Egerváry developed the rank reduction algorithm in a sequence of papers from 1953 to 1956. The rank reduction is used in eigenvalue problems, factorizations, the solution of linear algebraic systems, optimization and many other areas. Most of the history of rank reduction is given in Hubert, Meulman and Heiser [160]. Corrections on the role of Egerváry are given in [114] and [116].
The conjugation algorithm of the ABS methods is based on the rank reduction
algorithm of Egerváry (see, e.g. [6], [9]). The investigations of Section 5.4 necessarily
lead to the study of the rank reduction algorithm. The obtained results [108], [114], [113],
[95] are presented in this chapter. We give an exact characterization of the full rank
factorization produced by the rank reduction algorithm and exploit this result concerning
matrix decompositions and conjugation procedures. We also give an application of the
rank reduction algorithm concerning the inertia of matrices. The ABS applications of
these results will be shown in Sections 4.3, 4.5, 5.3, 5.4 and 5.5.
Definition 56 (Frazer, Duncan, Collar). Assume that A ∈ F^{m×n} has rank r ≥ 1. The decomposition A = XY is called a full rank factorization if X ∈ F^{m×r}, Y ∈ F^{r×n} and rank(X) = rank(Y) = r.

Every nonzero matrix has a full rank factorization. The full rank factorization is not unique, for if A = XY is a full rank factorization, then A = (XM)(M^{-1}Y) is also a full rank factorization of A whenever M ∈ F^{r×r} is nonsingular. If A = XY and A = X_1 Y_1 are full rank factorizations, then a nonsingular matrix M exists such that X_1 = XM^{-1} and Y_1 = MY. Full rank factorizations are frequently given in the form A = XZY, where Z ∈ F^{r×r} is nonsingular. Other notations of full rank factorizations are A = XY^H (X ∈ F^{m×r}, Y ∈ F^{n×r}) and A = XZY^H, respectively.

Let A = XY^H, where X = [x_1, ..., x_r] and Y = [y_1, ..., y_r]. We can write A as the sum of outer products:

A = Σ_{i=1}^r x_i y_i^H.   (3.1)

This representation is minimal in the sense that A cannot be decomposed as the sum of fewer than r outer products. Hence the number of outer products in the above minimal representation defines the rank of the matrix A (see, e.g., Egerváry [76]). Any partition of the full rank factorization

A = XY^H = [X_1, ..., X_k][Y_1, ..., Y_k]^H   (X_j ∈ F^{m×l_j}, Y_j ∈ F^{n×l_j})
gives the rank additive decomposition
A = Σ_{j=1}^k X_j Y_j^H   (3.2)

with

rank(A) = Σ_{j=1}^k rank(X_j Y_j^H) = Σ_{j=1}^k l_j.   (3.3)

The rank additive decompositions have special properties (see Marsaglia and Styan [177], Carlson [42]).
Lemma 57 (Guttman [134]). Let

A = [ E  F ; G  H ] ∈ F^{m×n}   (E ∈ F^{k×k}).   (3.4)

If E is nonsingular, then

rank(A) = rank(E) + rank(H − GE^{-1}F).   (3.5)

If H is nonsingular, then

rank(A) = rank(H) + rank(E − FH^{-1}G).   (3.6)

Definition 58 The Schur complement of E in A is defined by

S = (A/E) = H − GE^{-1}F.   (3.7)

Similarly, the Schur complement of H in A is given by

T = (A/H) = E − FH^{-1}G.   (3.8)

Theorem 59 (Banachiewicz, Frazer, Duncan, Collar). Let

A = [ E  F ; G  H ] ∈ F^{n×n}   (E ∈ F^{k×k})   (3.9)

and E be nonsingular. Then A^{-1} can be expressed in the partitioned form

A^{-1} = [ E^{-1} + E^{-1}FS^{-1}GE^{-1}   −E^{-1}FS^{-1} ; −S^{-1}GE^{-1}   S^{-1} ].   (3.10)

Formula (3.10) is the basis for the bordered inversion method of Frazer, Duncan and Collar [87] (see also [134], [203]). Formula (3.10) can be written in the alternative form

[ E  F ; G  H ]^{-1} = [ E^{-1}  0 ; 0  0 ] + [ E^{-1}F ; −I ] S^{-1} [ GE^{-1}, −I ].   (3.11)
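As an illustration of formula (3.11), a minimal NumPy sketch of one bordering step follows; it assumes that E and the Schur complement S = H − GE^{-1}F are nonsingular, and the function name is ours.

import numpy as np

def bordered_inverse(Einv, F, G, H):
    # One step of bordered inversion, formula (3.11): given E^{-1} and the blocks
    # F, G, H of A = [[E, F], [G, H]], return A^{-1}.
    k, m = Einv.shape[0], H.shape[0]
    S = H - G @ Einv @ F                      # Schur complement (A/E)
    Sinv = np.linalg.inv(S)
    top = np.block([[Einv, np.zeros((k, m))],
                    [np.zeros((m, k)), np.zeros((m, m))]])
    col = np.vstack([Einv @ F, -np.eye(m)])
    row = np.hstack([G @ Einv, -np.eye(m)])
    return top + col @ Sinv @ row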
Let A ∈ Rm×n be an arbitrary matrix. The matrix A− is said to be a (1)-inverse
or g-inverse of A, if AA− A = A. The set of (1)-inverses of A will be denoted by A{1} . The
matrix A− is said to be a reflexive or (1, 2)-inverse of A, if AA−A = A and A−AA− = A−.
A particular (1, 2)-inverse will be denoted by A(1,2) , while the set of (1, 2)-inverses will be
denoted by A{1,2} . The Moore-Penrose inverse of a matrix A will be denoted by A+ .
3.1 The rank reduction operation
The following result gives a unique characterization of the rank reduction operation. It is
the basis for computing rank, full rank factorizations, inertia and conjugate directions.
Theorem 60 (The rank reduction theorem). Let UR^{-1}V^T (U ∈ R^{m×k}, R ∈ R^{k×k}, V ∈ R^{n×k}) be a full rank decomposition of S. Then the necessary and sufficient conditions for

rank(H − S) = rank(H) − rank(S)   (3.12)

are

U = HX,   V^T = Y^T H,   Y^T HX = R   (3.13)

for some matrices X ∈ R^{n×k} and Y ∈ R^{m×k}.

Proof. Let

B = [ Y^T HX  Y^T H ; HX  H ] = [ Y^T  I_k ; I_m  0 ] [ H  0 ; 0  0 ] [ X  I_n ; I_k  0 ].   (3.14)

The first and last matrices of the right-hand side are nonsingular. Hence rank(H) = rank(B). The Guttman lemma then implies

rank( H − HX (Y^T HX)^{-1} Y^T H ) = rank(H) − rank(S).

Conversely, assume that rank(H − S) = rank(H) − k and define

B = [ R  V^T ; U  H ].

The Guttman lemma implies

rank(H) ≤ rank(B) = rank(R) + rank( H − UR^{-1}V^T ) = rank(H).

Hence rank(H) = rank(B) and there exist matrices X and Y such that

[ R, V^T ] = Y^T [ U, H ],   [ R ; U ] = [ V^T ; H ] X.

Thus we obtain V^T = Y^T H, U = HX and R = V^T X = Y^T U = Y^T HX.

The rank reduction theorem is constructive. If rank(H) ≥ k, then H has a k × k nonsingular submatrix H[α, β]. Letting Y = Π_α and X = Π_β, S = HΠ_β (Π_α^T HΠ_β)^{-1} Π_α^T H and rank(H − S) = rank(H) − k. The rank reduction operation is defined by

Ĥ = H − HX (Y^T HX)^{-1} Y^T H,   (3.15)

where H ∈ R^{m×n}, X ∈ R^{n×k} and Y ∈ R^{m×k}.
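A minimal NumPy sketch of the rank reduction operation (3.15); it assumes that Y^T HX is nonsingular, and the function name is ours.

import numpy as np

def rank_reduction_step(H, X, Y):
    # H_hat = H - H X (Y^T H X)^{-1} Y^T H; if Y^T H X is nonsingular
    # then rank(H_hat) = rank(H) - k, where k is the number of columns of X.
    M = Y.T @ H @ X                           # k x k, must be nonsingular
    return H - H @ X @ np.linalg.solve(M, Y.T @ H)

# Example use: with X = Pi_beta and Y = Pi_alpha (columns of the identity)
# the step leaves exactly the generalized Schur complement (3.19).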
The sufficient part of the theorem was first proved by Wedderburn for k = 1 ([247], pp. 68-69) and then extended by Guttman [133] for k > 1. Later Guttman [135] proved the whole theorem just after Egerváry [77], [76], who formulated and proved it for k = 1, independently of Wedderburn. Other proofs can be found in Cline and Funderlic [53], and Elsner and Rózsa [79]. The presented proof exploits Ouellette [203] (p. 200).

Theorem 60 is not the only characterization of rank subtractivity. Marsaglia and Styan [177], and Cline and Funderlic [53] gave a number of equivalent conditions for having rank(A − S) = rank(A) − rank(S). A generalization of the rank reduction operation to g-inverses is given by Ouellette [203] (see also Galántai [95]).
Proposition 61 The rank reduction operation has the following properties:

(i) Ĥ can be written in the forms

Ĥ = (I − R)H = H(I − S),   (3.16)

where R = HX (Y^T HX)^{-1} Y^T and S = X (Y^T HX)^{-1} Y^T H are oblique projectors with

R(R) = R(HX),   N(R) = R^⊥(Y)   (3.17)

and

R(S) = R(X),   N(S) = R^⊥(H^T Y);   (3.18)

(ii) ĤX = 0 and Y^T Ĥ = 0;

(iii) Let M = X (Y^T HX)^{-1} Y^T. Then H = M^−. If k = rank(M) < rank(H), then M ≠ H^−. If k = rank(H), then M ∈ H^{{1,2}}.

(iv) Let H = FG^T be a full rank factorization. If X = G(G^T G)^{-1} = (G^+)^T and Y = F(F^T F)^{-1} = (F^+)^T (k = rank(H)), then M = H^+.

(v) If k < rank(H), X = G(G^T G)^{-1} I^{|k} and Y = F(F^T F)^{-1} I^{|k}, then Ĥ has the form

Ĥ = F( I − I^{|k} I^{|kT} )G^T.

Proof. Properties (i) and (iii) are obvious. (i) implies (ii) as ĤX = H(I − S)X = 0 and Y^T Ĥ = Y^T(I − R)H = ( H^T (I − R)^T Y )^T = 0. If H = FG^T, X = G(G^T G)^{-1} and Y = F(F^T F)^{-1}, then by substitution M = G(G^T G)^{-1}(F^T F)^{-1}F^T, which is the Moore-Penrose inverse of H. In the case X = G(G^T G)^{-1} I^{|k} and Y = F(F^T F)^{-1} I^{|k} the matrix Ĥ has the form Ĥ = FG^T − F I^{|k} I^{|kT} G^T.

We note that for any H^−, ĤH^−Ĥ = Ĥ, that is H^− ∈ Ĥ^{{1}} (H^− = Ĥ^−).
Proposition 62 The rank reduction operation does not change the zero columns or rows. If X or Y contains unit vectors, the rank reduction operation makes the corresponding columns or rows zero.

Proof. Assume that He_j = 0 or e_i^T H = 0. Then by definition Ĥe_j = 0 or e_i^T Ĥ = 0 also holds. If e_i = Ye_j for some 1 ≤ j ≤ k, then by the identity e_i^T Ĥ = e_j^T Y^T Ĥ = 0 the i-th row of Ĥ is zero. Similarly, if e_p = Xe_q for some 1 ≤ q ≤ k, then the relation Ĥe_p = ĤXe_q = 0 implies that the p-th column of Ĥ is zero.
Lemma 63 Let X = Π_β and Y = Π_α. Then

Ĥ[α′, β′] = H[α′, β′] − H[α′, β] (H[α, β])^{-1} H[α, β′],   (3.19)

while the other elements of Ĥ are zero.

Proof. By definition ĤΠ_β = 0, Π_α^T Ĥ = 0 and

Ĥ[α′, β′] = Π_{α′}^T ĤΠ_{β′} = Π_{α′}^T HΠ_{β′} − Π_{α′}^T HΠ_β (Π_α^T HΠ_β)^{-1} Π_α^T HΠ_{β′}.

Expression (3.19) is also called the generalized Schur complement. For particular choices we have the following consequences.
Corollary 64 Assume that H ∈ R^{n×n} is partitioned according to

H = [ H_{11}  H_{12} ; H_{21}  H_{22} ]   (H_{11} ∈ R^{k×k})   (3.20)

(cf. (3.9)), and let J = I^{n−k|} ∈ R^{n×(n−k)}, Ĵ = I^{|k} ∈ R^{n×k}. If H_{22} is nonsingular, then

H − HJ (J^T HJ)^{-1} J^T H = [ (H/H_{22})  0 ; 0  0 ].   (3.21)

If H_{11} is nonsingular, then

H − HĴ (Ĵ^T HĴ)^{-1} Ĵ^T H = [ 0  0 ; 0  (H/H_{11}) ].   (3.22)

Remark 65 If H is replaced by H^{-1}, then (H^{-1}/S^{-1}) = H_{11}^{-1} and

H^{-1} − H^{-1}J (J^T H^{-1}J)^{-1} J^T H^{-1} = [ H_{11}^{-1}  0 ; 0  0 ].   (3.23)

Thus one rank reduction step on the matrix H^{-1} results in the inverse of the leading principal submatrix H_{11}. Relation (3.23) gives a direct connection between the rank reduction and the bordered inversion method based on formula (3.10). This connection was first exploited by Egerváry [76], [78] and rediscovered in a different form by Brezinski et al. [32] (see also Galántai [113]).
The rank reduction is called symmetric if both H and S are symmetric. Chu, Funderlic and Golub [51] proved the following results.

Lemma 66 (Chu-Funderlic-Golub [51]). Let H and S be symmetric matrices. Then rank(H − S) = rank(H) − rank(S) if and only if there is a matrix X such that S = HX (X^T HX)^{-1} X^T H.

Theorem 67 (Chu-Funderlic-Golub [51]). Suppose that H is symmetric positive definite, S is symmetric, and rank(H − S) = rank(H) − rank(S). Then H − S is positive semidefinite.
3.2 The rank reduction algorithm
The rank reduction algorithm is based on the repeated use of the rank reduction operation.
Let H_1 = H, X_i ∈ R^{n×l_i}, Y_i ∈ R^{m×l_i}, l_i ≥ 1 and let Y_i^T H_i X_i be nonsingular for i = 1, 2, ..., k.

The rank reduction procedure

H_{i+1} = H_i − H_i X_i (Y_i^T H_i X_i)^{-1} Y_i^T H_i   (i = 1, 2, ..., k),   (3.24)

where rank(H) ≥ Σ_{i=1}^k l_i.

It is clear that

rank(H_{i+1}) = rank(H_i) − l_i = rank(H) − Σ_{j=1}^i l_j.

The algorithm stops if H_{k+1} = 0. If H_{k+1} = 0, then rank(H) = Σ_{i=1}^k l_i.
We can write H_{k+1} in the forms

H_{k+1} = H_1 − Σ_{i=1}^k H_i X_i (Y_i^T H_i X_i)^{-1} Y_i^T H_i

and

H_{k+1} = H_1 − QD^{-1}P^T,   (3.25)

where

P = [H_1^T Y_1, ..., H_k^T Y_k],   Q = [H_1 X_1, ..., H_k X_k]

and

D = diag( Y_1^T H_1 X_1, ..., Y_k^T H_k X_k ).   (3.26)

If H_{k+1} = 0, then we obtain the rank additive decomposition

H = Σ_{i=1}^k H_i X_i (Y_i^T H_i X_i)^{-1} Y_i^T H_i   (3.27)

and the related full rank factorization

H = QD^{-1}P^T.   (3.28)

For the general properties and characterizations of rank additive decompositions we refer to Carlson [42], and Marsaglia and Styan [177].
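For illustration, a NumPy sketch of the procedure (3.24) that accumulates the factors (3.25)-(3.26); the function name is ours, and the sketch assumes that the procedure is breakdown free and that the final H_{k+1} is (numerically) zero.

import numpy as np
from scipy.linalg import block_diag

def rank_reduction_factorization(H, X_blocks, Y_blocks):
    # Repeated rank reduction steps; returns Q, D, P with H = Q inv(D) P^T
    # when the sum of the block sizes equals rank(H).
    Hi = H.astype(float).copy()
    Qs, Ps, Ds = [], [], []
    for Xi, Yi in zip(X_blocks, Y_blocks):
        Di = Yi.T @ Hi @ Xi                    # pivot block, must be nonsingular
        Qs.append(Hi @ Xi)
        Ps.append(Hi.T @ Yi)
        Ds.append(Di)
        Hi = Hi - Hi @ Xi @ np.linalg.solve(Di, Yi.T @ Hi)
    Q, P, D = np.hstack(Qs), np.hstack(Ps), block_diag(*Ds)
    return Q, D, P                             # H ~ Q @ inv(D) @ P.T, Hi ~ 0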
Definition 68 The rank reduction procedure is said to be breakdown free if Y_i^T H_i X_i is nonsingular for all i = 1, 2, ..., k.
Theorem 69 Let X = [X_1, ..., X_k] and Y = [Y_1, ..., Y_k]. Then the rank reduction procedure can be carried out breakdown free if and only if Y^T HX = [Y_i^T HX_j]_{i,j=1}^k is block strongly nonsingular. In this event the rank reduction procedure has the canonical form

H_{k+1} = H − HX (Y^T HX)^{-1} Y^T H.   (3.29)

Proof. We first assume that k successive steps were performed, which means that Y_i^T H_i X_i is nonsingular for i = 1, ..., k. Then

H_{k+1} = H_1 − QD^{-1}P^T,

where

P = [H_1^T Y_1, ..., H_k^T Y_k],   Q = [H_1 X_1, ..., H_k X_k]   (3.30)

and

D = diag( Y_1^T H_1 X_1, ..., Y_k^T H_k X_k ).   (3.31)

Let us observe that Y_i^T H_j = 0 (i < j) and H_i X_j = 0 (i > j). Therefore Y^T Q = [Y_i^T H_j X_j]_{i,j=1}^k = LD and P^T X = [Y_i^T H_i X_j]_{i,j=1}^k = DU, where the nonsingular L and U are unit block lower and upper triangular, respectively. We can also observe that

H_1 X = (H_1 − H_{k+1}) X = QD^{-1}P^T X = QU.

This implies that Q = H_1 XU^{-1}. Similarly we obtain

Y^T H_1 = Y^T (H_1 − H_{k+1}) = Y^T QD^{-1}P^T = LP^T

and P^T = L^{-1} Y^T H_1. Hence H_{k+1} = H_1 − QD^{-1}P^T implies

H_{k+1} = H_1 − H_1 XU^{-1}D^{-1}L^{-1} Y^T H_1 = H − HX (LDU)^{-1} Y^T H.

As

Y^T H_1 X = Y^T QU = LDU   (3.32)

we proved that Y^T HX is block strongly nonsingular. By substitution we obtain formula (3.29). Let us now suppose that Y^T HX is block strongly nonsingular. We must prove that Y_i^T H_i X_i is nonsingular for i = 1, ..., k. By the initial assumption Y_1^T H_1 X_1 is nonsingular. Hence the matrix H_2 = H_1 − H_1 X_1 (Y_1^T H_1 X_1)^{-1} Y_1^T H_1 is defined. Let us assume that H_i is defined for i ≤ k, X^{|i} = [X_1, ..., X_i] and Y^{|i} = [Y_1, ..., Y_i]. Then

H_i = H − HX^{|i−1} ( (Y^{|i−1})^T HX^{|i−1} )^{-1} (Y^{|i−1})^T H

and

Y_i^T H_i X_i = Y_i^T HX_i − Y_i^T HX^{|i−1} ( (Y^{|i−1})^T HX^{|i−1} )^{-1} (Y^{|i−1})^T HX_i.

As Y_i^T H_i X_i is the Schur complement of the block bordered matrix

Y^{|iT} HX^{|i} = [ (Y^{|i−1})^T HX^{|i−1}   (Y^{|i−1})^T HX_i ; Y_i^T HX^{|i−1}   Y_i^T HX_i ],

rank( Y^{|iT} HX^{|i} ) = Σ_{j=1}^i l_j and rank( (Y^{|i−1})^T HX^{|i−1} ) = Σ_{j=1}^{i−1} l_j, the Guttman lemma implies that rank( Y_i^T H_i X_i ) = l_i. Hence Y_i^T H_i X_i is nonsingular and H_{i+1} is defined.
We emphasize that k successive rank reduction operations can be replaced by one
block rank reduction. Thus the characterizations of one rank reduction remain valid for
the rank reduction algorithm as well. In addition we give the following
Proposition 70 Let H^− be any (1)-inverse of H. Then

H_k H^− H_j = H_j H^− H_k = H_k   (j ≤ k),   (3.33)

provided that Y^T HX is block strongly nonsingular.

Observe that the H^− H_k's are projectors. If H_1 = I ∈ R^{n×n}, then all H_k's are projectors. Another proof of this fact can be obtained as follows.

The rank subtractivity partial ordering A ≤_rs B of Hartwig [139] and Nambooripad [192] is defined by

A ≤_rs B ⇔ rank(B − A) = rank(B) − rank(A).

It is clear that the rank reduction algorithm produces the following partial orderings

H_{i+1} ≤_rs H_i,   H_i X_i (Y_i^T H_i X_i)^{-1} Y_i^T H_i ≤_rs H_i.   (3.34)
Hartwig and Styan [140] proved that Π is a projector if and only if Π ≤_rs Ω for some projector Ω. Hence H_1 = I implies that H_i and Δ_i = H_i X_i (Y_i^T H_i X_i)^{-1} Y_i^T H_i are projectors for i = 1, ..., k. For H_{k+1} = 0 we obtain the rank additive decomposition Σ_{i=1}^k Δ_i = I, where all Δ_i's are oblique projectors. This special case of decomposition (3.27) is related to generalizations of the Cochran theorem (cf. Carlson [42], Marsaglia and Styan [177]).

We assume that H_{k+1} = 0 (rank(H) = Σ_{i=1}^k l_i) and Y^T HX is block strongly nonsingular. Let X^{(i)} = [X_1, ..., X_i] and Y^{(i)} = [Y_1, ..., Y_i].
Theorem 71 If Y^T HX is block strongly nonsingular and H_{k+1} = 0, then H_{i+1} has the canonical form

H_{i+1} = HX ( (Y^T HX)^{-1} − [ (Y^{(i)T} HX^{(i)})^{-1}  0 ; 0  0 ] ) Y^T H   (i = 1, ..., k).   (3.35)

Proof. Formula (3.29) gives

H_{i+1} = H − HX^{(i)} ( Y^{(i)T} HX^{(i)} )^{-1} Y^{(i)T} H   (i = 1, ..., k).

Let ω = Σ_{j=1}^i l_j. The relations H = HX (Y^T HX)^{-1} Y^T H, X^{(i)} = X I^{|ω} and Y^{(i)} = Y I^{|ω} imply

H_{i+1} = HX ( (Y^T HX)^{-1} − I^{|ω} ( Y^{(i)T} HX^{(i)} )^{-1} (I^{|ω})^T ) Y^T H,

which clearly has the form in question.

The canonical form (3.35) indicates that the rank reduction procedure implicitly approximates the inverse of Y^T HX by the inverses of the leading principal submatrices bordered with zero matrices of appropriate size. Thus another connection with the bordered inversion method is clarified. Two other canonical forms are given in [113].
3.3 Rank reduction and factorizations
For the rest of this chapter we assume that Σ_{i=1}^k l_i = rank(H). Let B = L_B D_B U_B be the unique block LDU-decomposition of a matrix B ∈ R^{m×m} with unit block lower triangular L_B, block diagonal D_B and unit block upper triangular U_B.

The following consequence of Theorem 69 gives the components of the full rank factorization (3.28) in terms of the parameters X and Y. The result will play a key role in the conjugation via the rank reduction.
Theorem 72 Let Y^T HX be block strongly nonsingular. The components of the full rank factorization (3.28) are

P = H^T Y L_{Y^T HX}^{-T},   Q = HX U_{Y^T HX}^{-1},   D = D_{Y^T HX}.   (3.36)

Proof. From the proof of Theorem 69 (equation (3.32)) it follows that L = L_{Y^T HX}, D = D_{Y^T HX}, U = U_{Y^T HX}, P^T = L_{Y^T HX}^{-1} Y^T H and Q = HX U_{Y^T HX}^{-1}.

Here Z^{-T} stands for (Z^{-1})^T. The matrices P and Q can also be written in the form

P = H^T Y U_{X^T H^T Y}^{-1}   (3.37)
and

Q = HX L_{X^T H^T Y}^{-T}.   (3.38)

The roles of P and Q can be changed by transposing the sequence (3.24). Hence the results on P can be given for Q as well. It follows from Theorem 72 that all full rank factorizations can be generated by the rank reduction procedure.

Proposition 73 Let H = FG be any full rank factorization and let X = G^T (GG^T)^{-1} and Y = F (F^T F)^{-1}. Then Q = F, D = I and P^T = G.

Proof. As Y^T HX = I is strongly nonsingular, the rank reduction procedure is breakdown free. Formula (3.36) yields Q = F and P = G^T.
In the rest of this section we assume that all the matrices H, X, Y ∈ R^{m×m} are nonsingular and Y^T HX = [Y_i^T HX_j]_{i,j=1}^k is block strongly nonsingular. Then Q and P are also nonsingular and can be written in the form

P^T = D_{Y^T HX} U_{Y^T HX} X^{-1},   Q = Y^{-T} L_{Y^T HX} D_{Y^T HX}.   (3.39)

Using (3.39) one can easily prove the following results.

Proposition 74 P (Q) is block lower triangular if and only if X (Y) is block upper triangular.

Proof. The matrix P^T = (D_{Y^T HX} U_{Y^T HX}) X^{-1} = Ũ is block upper triangular if and only if X^{-1} = U_{Y^T HX}^{-1} D_{Y^T HX}^{-1} Ũ is also block upper triangular. Similarly, Q = Y^{-T} L_{Y^T HX} D_{Y^T HX} = L̃ is block lower triangular if and only if Y^T is lower triangular.

Corollary 75 Q is block lower triangular and P^T is block upper triangular if and only if X and Y are block upper triangular.

If both Q and P are block lower triangular, then factorization (3.28) simplifies to the block LDU-type factorization

H = (L_H D_H D_X)(D_X^{-1} D_H^{-1} D_{Y^T}^{-1})(D_{Y^T} D_H U_H).

Proposition 76 Let B ∈ R^{m×m} be a symmetric and positive definite matrix. P is B-orthogonal if and only if X = BH^T Y U holds with a nonsingular Y and a nonsingular upper triangular U.

Proof. P is B-orthogonal if and only if

P^T BP = L_{Y^T HX}^{-1} Y^T HBH^T Y L_{Y^T HX}^{-T} = D,

where D is diagonal. This holds exactly if Y^T HBH^T Y = L_{Y^T HX} D L_{Y^T HX}^T. This implies L_{Y^T HBH^T Y} = L_{Y^T HX}. Hence a nonsingular upper triangular U exists such that Y^T HBH^T Y U = Y^T HX, from which X = BH^T Y U follows. As B is symmetric and positive definite, the nonsingularity of Y and U is the condition for the strong nonsingularity of Y^T HX.

Corollary 77 P is orthogonal up to a diagonal scaling and Q is lower triangular, if and only if X = H^T Y U holds with any nonsingular upper triangular U and nonsingular Y. In such a case, factorization (3.28) is the LQ-factorization of H (the QR factorization of H^T).
Proposition 78 Let B ∈ R^{m×m} be a symmetric and positive definite matrix. Q is B-orthogonal if and only if Y = BHXL^T holds with a nonsingular X and a nonsingular lower triangular L.

Proof. Q is B-orthogonal if and only if

Q^T BQ = U_{Y^T HX}^{-T} X^T H^T BHX U_{Y^T HX}^{-1} = D,

where D is diagonal. This holds exactly if X^T H^T BHX = U_{Y^T HX}^T D U_{Y^T HX}, from which U_{X^T H^T BHX} = U_{Y^T HX} follows. Hence a nonsingular lower triangular matrix L exists such that L X^T H^T BHX = Y^T HX. This implies that Y = BHXL^T. As B is symmetric and positive definite, the nonsingularity of X and L is the condition for the strong nonsingularity of Y^T HX.

Corollary 79 Q is orthogonal up to a diagonal scaling and P^T is upper triangular, if and only if Y = HXL^T holds with any nonsingular lower triangular L and upper triangular X. In such a case, factorization (3.28) is the QR-factorization of H.
Proposition 80 P is block upper Hessenberg if and only if H^T Y is block upper Hessenberg. Q is block upper Hessenberg if and only if HX is block upper Hessenberg.

Proof. The block upper Hessenberg form is invariant under multiplication by block upper triangular matrices from both sides. By definition P = H^T Y L_{Y^T HX}^{-T} = F is block upper Hessenberg if and only if H^T Y = F L_{Y^T HX}^T is block upper Hessenberg. Similarly, Q = HX U_{Y^T HX}^{-1} = F is upper Hessenberg if and only if HX = F U_{Y^T HX} is block upper Hessenberg.

If H = I, then we have the reciprocal relation I = QD^{-1}P^T, so that

Q^T = (PD^{-1})^{-1}.   (3.40)

In this case the rank reduction algorithm produces both P and its inverse P^{-1} apart from a diagonal scaling. We exploit this property in the following observation on the Lánczos reduction to tridiagonal form (see, e.g., [146]).
Theorem 81 Let Y_1 ∈ R^m be such that Y = [Y_1, BY_1, ..., B^{m−1}Y_1] is nonsingular. If H = I and X is such that Y^T X is strongly nonsingular, then Q^T B (PD^{-1}) is an upper Hessenberg matrix similar to B. If, in addition, X = [X_1, B^T X_1, ..., (B^T)^{m−1} X_1], then Q^T B (PD^{-1}) is a tridiagonal matrix, which is similar to B.

Proof. As B^m Y_1 = Σ_{i=1}^m γ_i B^{i−1} Y_1, the relation BY = YF holds, where the companion matrix F = [f_{ij}]_{i,j=1}^m is defined by

f_{ij} = 1 (i = j + 1),   f_{ij} = γ_i (j = m),   f_{ij} = 0 otherwise.

By the reciprocal relation (3.40) the matrix

Q^T B (PD^{-1}) = D_{Y^T X} L_{Y^T X}^T Y^{-1} B Y L_{Y^T X}^{-T} D_{Y^T X}^{-1} = D_{Y^T X} L_{Y^T X}^T F L_{Y^T X}^{-T} D_{Y^T X}^{-1}   (3.41)

is a similarity transformation of B. As any upper Hessenberg matrix F multiplied by two upper triangular matrices keeps the upper Hessenberg form, the rank reduction algorithm defines a similarity transformation of the matrix B to upper Hessenberg form. If X = [X_1, B^T X_1, ..., (B^T)^{m−1} X_1] for a suitable vector X_1 ∈ R^m, then B^T X = X F_1 holds with another companion matrix F_1. So relation (3.41) becomes

Q^T B (PD^{-1}) = U_{Y^T X}^{-T} (X^T B X^{-T}) U_{Y^T X}^T = U_{Y^T X}^{-T} F_1^T U_{Y^T X}^T,

where the lower Hessenberg matrix F_1^T is multiplied by two lower triangular matrices. This multiplication keeps the lower Hessenberg form. Thus Q^T B (PD^{-1}) is of both upper and lower Hessenberg form. Hence the matrix must be tridiagonal.

The rank reduction algorithm defines a similarity transformation of the matrix B to tridiagonal form giving the transformation matrices Q^T and Q^{-T} simultaneously.
3.4 Rank reduction and conjugation
We use the following concepts of conjugate directions [229], [50].

Definition 82 Let A ∈ R^{m×n}, let V^T AU be nonsingular, U = [U_1, U_2, ..., U_r] (U_i ∈ R^{n×l_i}) and V = [V_1, V_2, ..., V_r] (V_i ∈ R^{m×l_i}). The pair (U, V) is said to be block A-conjugate (with respect to the partition {l_1, l_2, ..., l_r}) if the matrix L = V^T AU = [V_i^T AU_j]_{i,j=1}^r is block lower triangular. The pair (U, V) is said to be block A-biorthogonal (biconjugate) if the matrix L = V^T AU is nonsingular block diagonal.

If l_i = 1 for all i, we simply speak of A-conjugacy and A-biorthogonality (A-biconjugacy). We also say that P is A-orthogonal if P^T AP = D is diagonal and nonsingular. If A = I, then P is orthogonal.

Let us assume that H ∈ R^{m×n}, and that X and Y are such that Y^T HX is block strongly nonsingular. The following observations show the inherent conjugation properties of the rank reduction algorithm (3.24).
Proposition 83 If X = A^T V and Y = B^T W, then the pair (P, V) is block A-conjugate and the pair (Q, W) is block B-conjugate.

Proof. It follows from Theorem 72 that

V^T AP = X^T H^T Y U_{X^T H^T Y}^{-1} = L_{X^T H^T Y} D_{X^T H^T Y}

and

W^T BQ = Y^T HX U_{Y^T HX}^{-1} = L_{Y^T HX} D_{Y^T HX}.

In view of Proposition 83 we can say that the full rank factorization by the rank reduction algorithm is essentially a conjugation algorithm. The next result shows that all possible conjugate pairs can be generated in this way.

Proposition 84 For any block A-conjugate pair (P, V) there exists a matrix Y with orthogonal columns such that the rank reduction algorithm (3.24) with H = I, X = A^T V and Y generates P.

Proof. Let P = Q̃DR_1 be a QR-factorization of P with diagonal D and unit upper triangular R_1. As V^T AP = X^T P = X^T Q̃DR_1 = L implies X^T Q̃D = L R_1^{-1}, the matrix X^T Q̃D is strongly nonsingular. Let H = I, X = A^T V and Y = Q̃D. Hence Theorem 72 implies P = Q̃D U_{X^T Q̃D}^{-1} = Q̃DR_1.
Propositions 83 and 84 indicate how the rank reduction process can be used directly for A-conjugation. We can write the rank reduction process in the following form:

Conjugation via rank reduction
H_1 = H
for i = 1, ..., k
  P_i = H_i^T Y_i
  H_{i+1} = H_i − H_i A^T V_i (Y_i^T H_i A^T V_i)^{-1} Y_i^T H_i
end

In general, we have

P = [P_1, ..., P_r] = H^T Y L_{Y^T HA^T V}^{-T} = H^T Y U_{V^T AH^T Y}^{-1}.   (3.42)
By Proposition 84 we can suppose that H = I.
Proposition 85 Let H^− denote any (1)-inverse of H. Then the pair (Q, P) is block H^−-biconjugate.

Proof. By definition

P^T H^− Q = L_{Y^T HX}^{-1} Y^T (HH^−H) X U_{Y^T HX}^{-1} = D_{Y^T HX}.

Using Proposition 85 we can specialize the rank reduction algorithm to biconjugation. If H is a reflexive inverse, that is H = B^{(1,2)} for some B, then the pair (Q, P) is block B-biconjugate. Let B ∈ R^{m×n}, and let X and Y be such that Y^T BX is block strongly nonsingular. Let

H = X (Y^T BX)^{-1} Y^T   (3.43)

be the reflexive inverse of B,

H_1 = H,   H_{i+1} = H_i − H_i F_i (G_i^T H_i F_i)^{-1} G_i^T H_i   (i = 1, ..., k)   (3.44)

and

P_i = H_i^T G_i,   Q_i = H_i F_i   (i = 1, ..., k).   (3.45)

As H = B^{(1,2)} holds, the pair (Q, P) is block B-biconjugate provided that G^T HF is block strongly nonsingular, where F = [F_1, ..., F_k] and G = [G_1, ..., G_k]. Theorem 72 implies P = H^T G L_{G^T HF}^{-T} and Q = HF U_{G^T HF}^{-1}. If F = BX and G = B^T Y, then G^T HF = Y^T BX is block strongly nonsingular and

P = Y L_{Y^T BX}^{-T},   Q = X U_{Y^T BX}^{-1}.

Hence (3.44) implies

H_{i+1} = H_1 − Σ_{j=1}^i H_j F_j (G_j^T H_j F_j)^{-1} G_j^T H_j,

from which

P_{i+1} = H_1^T G_{i+1} − Σ_{j=1}^i P_j (F_j^T H_j^T G_j)^{-1} Q_j^T G_{i+1}   (3.46)

and

Q_{i+1} = H_1 F_{i+1} − Σ_{j=1}^i Q_j (G_j^T H_j F_j)^{-1} P_j^T F_{i+1}

follow. As H_j = H_j H^− H_j = H_j B H_j we can write G_j^T H_j F_j = P_j^T B Q_j. Also, by the special choice of H, we have H_1^T G_i = Y_i and H_1 F_i = X_i. Thus we obtained the following algorithm.

The block two-sided Gram-Schmidt algorithm
P_1 = Y_1, Q_1 = X_1
for i = 1, ..., k − 1
  P_{i+1} = Y_{i+1} − Σ_{j=1}^i P_j (Q_j^T B^T P_j)^{-1} Q_j^T B^T Y_{i+1}
  Q_{i+1} = X_{i+1} − Σ_{j=1}^i Q_j (P_j^T B Q_j)^{-1} P_j^T B X_{i+1}
end

Proposition 86 The pair (Q, P) is block B-biconjugate with P and Q having the form (3.46).

For more on the Gram-Schmidt method we refer to [141], [50] and [39]. The two-sided Gram-Schmidt algorithm can be derived from two coupling rank reduction procedures without using any reflexive inverse [108]. It is also noted that many results of Chu, Funderlic and Golub [50] can be proved more easily with our technique [108].
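For illustration, a NumPy sketch of the scalar (l_i = 1) case of the block two-sided Gram-Schmidt algorithm above; the function name is ours, and the sketch assumes that Y^T BX is strongly nonsingular so that no denominator vanishes.

import numpy as np

def two_sided_gram_schmidt(B, X, Y):
    # Produces P, Q with P^T B Q diagonal, i.e. a B-biconjugate pair,
    # from the columns of Y (for P) and X (for Q).
    n = X.shape[1]
    P = Y.astype(float).copy()
    Q = X.astype(float).copy()
    for i in range(1, n):
        for j in range(i):
            d = P[:, j] @ B @ Q[:, j]
            P[:, i] -= P[:, j] * (Q[:, j] @ B.T @ Y[:, i]) / d
            Q[:, i] -= Q[:, j] * (P[:, j] @ B @ X[:, i]) / d
    return P, Q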
3.5 Inertia and rank reduction
Throughout this section we assume that H ∈ R^{n×n} is symmetric and let in(H) = (i_+, i_−, i_0) be the inertia of H, where i_+, i_− and i_0 are the numbers of the positive, negative and zero eigenvalues of the matrix H, respectively. Haynsworth proved that if H is partitioned as

H = [ E  F ; F^T  G ] ∈ R^{n×n}   (E ∈ R^{k×k})

and E is nonsingular, then

in(H) = in(E) + in(H/E).   (3.47)

Sylvester's law of inertia reads as

in(H) = in(CHC^T)   (det(C) ≠ 0).   (3.48)

We prove the following property of the symmetric rank reduction

Ĥ = H − HX (X^T HX)^{-1} X^T H.

Theorem 87 If H is symmetric, then

in( H − HX (X^T HX)^{-1} X^T H ) = in(H) − in(X^T HX) + (0, 0, k).   (3.49)

Proof. Let

B = [ X^T HX  X^T H ; HX  H ].

The result of Haynsworth implies

in(B) = in(X^T HX) + in(B/X^T HX) = in(X^T HX) + in(Ĥ).

Another expression

in(B) = in( [ H  0 ; 0  0 ] ) = in(H) + (0, 0, k)

follows from the decomposition

B = [ X^T  I_k ; I_n  0 ] [ H  0 ; 0  0 ] [ X  I_n ; I_k  0 ]

and Sylvester's law of inertia. A simple comparison gives the statement.
Consider the symmetric rank reduction procedure: let H_1 = H,

H_{i+1} = H_i − H_i X_i (X_i^T H_i X_i)^{-1} X_i^T H_i   (X_i ∈ R^{n×l_i}, i = 1, ..., k)   (3.50)

and rank(H) = Σ_{i=1}^k l_i = r. Then

in(H_{i+1}) = in(H_i) − in(X_i^T H_i X_i) + (0, 0, l_i)   (i = 1, ..., k)

implies

(0, 0, n) = in(H_{k+1}) = in(H) − Σ_{i=1}^k in(X_i^T H_i X_i) + (0, 0, Σ_{i=1}^k l_i).

Hence we have

Theorem 88 For any symmetric H ∈ R^{n×n},

in(H) = Σ_{i=1}^k in(X_i^T H_i X_i) + (0, 0, n − r).   (3.51)

Thus the inertia of a symmetric matrix can be obtained from the symmetric rank reduction algorithm. Theorems 87 and 88 are improvements of the results due to Zhang Liwei [252], [253], who proved them only for nonsingular matrices and the scalar rank reduction algorithm (l_i = 1, ∀i).
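For illustration, a NumPy sketch computing the inertia by the symmetric rank reduction (3.50) with partial permutation parameters, using scalar or 2 × 2 pivots (cf. the two cases of Cottle discussed later in this section); the function name, the tolerance and the pivot selection rule are our choices.

import numpy as np

def inertia_by_rank_reduction(H, tol=1e-12):
    # Returns (i_plus, i_minus, i_zero) of a symmetric H via (3.50)-(3.51).
    H = H.astype(float).copy()
    n = H.shape[0]
    pos = neg = r = 0
    for _ in range(n):
        if np.max(np.abs(H)) <= tol:
            break
        d = np.diag(H)
        if np.max(np.abs(d)) > tol:                  # case (i): nonzero diagonal pivot
            j = int(np.argmax(np.abs(d)))
            X = np.eye(n)[:, [j]]
        else:                                        # case (ii): 2x2 pivot [e_p, e_q]
            p, q = np.unravel_index(np.argmax(np.abs(H)), H.shape)
            X = np.eye(n)[:, [p, q]]
        M = X.T @ H @ X                              # pivot block X_i^T H_i X_i
        ev = np.linalg.eigvalsh(M)
        pos += int(np.sum(ev > 0))
        neg += int(np.sum(ev < 0))
        r += X.shape[1]
        H = H - H @ X @ np.linalg.solve(M, X.T @ H)  # symmetric rank reduction step
    return pos, neg, n - r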
In the symmetric case the factors (3.36) become

P = Q = HX U_{X^T HX}^{-1},   D = D_{X^T HX}.   (3.52)

Thus the obtained full rank factorization H = QD^{-1}Q^T is a kind of LDL^T factorization, although Q is not necessarily lower triangular.

If X is a partial permutation matrix, then the algorithm is equivalent to that of Cottle [55]. The only difference is that the Cottle algorithm discards the zero rows and columns (all elements outside the generalized Schur complement (3.19)), while the symmetric rank reduction algorithm (unnecessarily) keeps them.
It is interesting to know whether a partial permutation matrix X = Π_α exists such that Π_α^T HΠ_α is strongly nonsingular. There is no such Π_α if H = [ 0  1 ; 1  0 ]. Consider now H_i = [a_{lj}^{(i)}]_{l,j=1}^n ≠ 0. As Cottle [55] points out, there are exactly two possible cases:

(i) ∃j : e_j^T H_i e_j ≠ 0;

(ii) e_j^T H_i e_j = 0 (∀j), ∃p, q : a_{pq}^{(i)} ≠ 0.

We can select X_i = e_j in the first case and X_i = [e_p, e_q] in the second case. Accordingly, X_i^T H_i X_i is nonsingular in both cases. Thus the symmetric rank reduction process can be continued until H_i becomes zero. Hence a partial permutation matrix X = Π_α always exists such that Π_α^T HΠ_α is block strongly nonsingular. The submatrix X_i^T H_i X_i of H_i is called the pivot block. For the best pivoting strategies we refer to Cottle [55], and Bunch and Parlett [41].

For the use of inertia in optimization we refer to [129], [127] and [253].
Chapter 4
FINITE PROJECTION METHODS FOR LINEAR SYSTEMS
We derive the class of conjugate direction methods as a recursive Galerkin-Petrov projection method. We also summarize certain known results on conjugate direction methods, to which we add some elementary but apparently new results. We introduce the ABS methods as conjugate direction methods with a special rank reduction based conjugation technique. We give a necessary and sufficient characterization for the ABS conjugation algorithm to be breakdown free. Introducing a special subclass of the ABS methods called generalized implicit LU ABS (GILUABS) methods [104], [100], we give a full theoretical characterization of such ABS methods. The ABS conjugation algorithm of the GILUABS subclass is identical with the rank reduction conjugation. Finally we show the stability of the conjugate direction methods [97] and of the rank reduction based (or GILUABS) conjugation algorithm [93].
4.1 The Galerkin-Petrov projection method
Here we derive the conjugate direction methods as recursive Galerkin-Petrov projection methods. The projection methods can be defined in several ways. Consider the equation

Ax = y,   (4.1)

where A ∈ R^{m×n}. The projection method is defined as follows [173], [30]. Let {X_k} ⊂ R^n and {Y_k} ⊂ R^m be sequences of linear subspaces, and let P_k : R^m → Y_k be projections. Now replace the equation Ax = y by the approximation

P_k(Ax_k − y) = 0   (x_k ∈ X_k).   (4.2)

If m = n and X_k = Y_k (k = 1, 2, ...), the projection method is called the Galerkin method. Assume that {φ_j}_{j=1}^n ⊂ R^n and {ψ_j}_{j=1}^m ⊂ R^m are such that R(φ_1, ..., φ_n) = R^n and R(ψ_1, ..., ψ_m) = R^m. Let X_k = R(φ_1, ..., φ_{i_k}) and Y_k = R(ψ_1, ..., ψ_{i_k}), i_k < i_{k+1}. Assume that P_k : R^m → Y_k is the orthogonal projection. Then P_k(Ax_k − y) = 0 holds if and only if

Ax_k − y ⊥ ψ_j   (j = 1, ..., i_k).   (4.3)

This version is called the Galerkin-Petrov projection method.

We now consider linear systems of the form

Ax = b   (A ∈ R^{m×m}),   (4.4)

where A is nonsingular and the exact solution is ω = A^{-1}b. Let φ_k = U_k ∈ R^{m×m_k} and ψ_k = V_k ∈ R^{m×m_k} (k = 1, ..., r) be such that U = [U_1, ..., U_r] ∈ R^{m×m} and V = [V_1, ..., V_r] ∈ R^{m×m} are nonsingular. Furthermore let

X_k = R(U^{|k}) = R([U_1, ..., U_k]),   (4.5)
Y_k = R(V^{|k}) = R([V_1, ..., V_k])   (4.6)

and dim(X_k) = dim(Y_k) = n_k. If x_k ∈ X_k, then there exists a unique a_k ∈ R^{n_k} such that x_k = U^{|k} a_k. From the orthogonality condition r_k = Ax_k − b ⊥ Y_k = R(V^{|k}) it follows that

V^{|kT}(Ax_k − b) = 0.

Thus we obtain V^{|kT} A U^{|k} a_k = V^{|kT} b. If V^{|kT} A U^{|k} is nonsingular, then

a_k = ( V^{|kT} A U^{|k} )^{-1} V^{|kT} b.   (4.7)

Hence

x_k = U^{|k} ( V^{|kT} A U^{|k} )^{-1} V^{|kT} b = U^{|k} ( V^{|kT} A U^{|k} )^{-1} V^{|kT} A ω,   (4.8)

where the matrix U^{|k} ( V^{|kT} A U^{|k} )^{-1} V^{|kT} A is a projector onto R(U^{|k}) along R^⊥(A^T V^{|k}). Any of these expressions yields

Proposition 89 For k = r the Galerkin-Petrov projection method gives the exact solution, that is, x_r = ω.
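For illustration, a NumPy sketch of the direct (non-recursive) Galerkin-Petrov iterate (4.8)-(4.9); the function name is ours, and the sketch assumes that V^{|kT} A U^{|k} is nonsingular.

import numpy as np

def galerkin_petrov_iterate(A, b, U_blocks, V_blocks, k, x0=None):
    # x_k = x_0 - U|k (V|k^T A U|k)^{-1} V|k^T r_0 with r_0 = A x_0 - b;
    # for k = r (all blocks) this is the exact solution (Proposition 89).
    m = A.shape[0]
    x0 = np.zeros(m) if x0 is None else x0
    Uk = np.hstack(U_blocks[:k])
    Vk = np.hstack(V_blocks[:k])
    r0 = A @ x0 - b
    return x0 - Uk @ np.linalg.solve(Vk.T @ A @ Uk, Vk.T @ r0)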
We seek a class of projection methods where x_k is recursively calculated from x_{k−1} and the initial point x_0 is arbitrary. Let x_k = Δ_k + x_0 and consider the solution of

P_k( A(Δ_k + x_0) − b ) = P_k( AΔ_k − (b − Ax_0) ) = 0.

Here Δ_k plays the role of x_k and −r_0 = b − Ax_0 plays the role of b. Hence the above formula becomes

x_k = x_0 − U^{|k} ( V^{|kT} A U^{|k} )^{-1} V^{|kT} r_0.   (4.9)

The residual error r_k = Ax_k − b has the form

r_k = ( I − AU^{|k} ( V^{|kT} A U^{|k} )^{-1} V^{|kT} ) r_0,   (4.10)

where the matrix I − AU^{|k} ( V^{|kT} A U^{|k} )^{-1} V^{|kT} is a projector onto R^⊥(V^{|k}) along R(AU^{|k}). The absolute error s_k = x_k − ω can also be written as

s_k = ( I − U^{|k} ( V^{|kT} A U^{|k} )^{-1} V^{|kT} A ) s_0,   (4.11)

where I − U^{|k} ( V^{|kT} A U^{|k} )^{-1} V^{|kT} A is a projector onto R^⊥(A^T V^{|k}) along R(U^{|k}).

Let us consider the difference between x_{k−1} and x_k. By definition

x_k − x_{k−1} = −(Z_k − Z_{k−1}) s_0,   (4.12)

where the matrix Z_k = U^{|k} ( V^{|kT} A U^{|k} )^{-1} V^{|kT} A is a projector onto R(U^{|k}) along R^⊥(A^T V^{|k}) for k = 1, ..., r.
Lemma 90 Z_k − Z_{k−1} is a projector onto R(U^{|k}) ∩ R^⊥(A^T V^{|k−1}) along R^⊥(A^T V^{|k}) ⊕ R(U^{|k−1}).

Proof. We exploit the Banachiewicz inversion formula (3.11). Let B_k = V^{|kT} A U^{|k}. Then

B_k = [ B_{k−1}   (V^{|k−1})^T A U_k ; V_k^T A U^{|k−1}   V_k^T A U_k ]

and

B_k^{-1} = [ B_{k−1}^{-1}  0 ; 0  0 ] + [ B_{k−1}^{-1} (V^{|k−1})^T A U_k ; −I ] S_k^{-1} [ V_k^T A U^{|k−1} B_{k−1}^{-1}, −I ].

By simple calculations we can easily demonstrate that Z_k Z_{k−1} = Z_{k−1} Z_k = Z_{k−1}. We also exploit the following known result. Suppose that P_1 is a projection onto M_1 along N_1 and P_2 is a projection onto M_2 along N_2. Then P = P_1 − P_2 is a projection if and only if P_1 P_2 = P_2 P_1 = P_2. If P_1 P_2 = P_2 P_1 = P_2, then P is the projection onto M = M_1 ∩ N_2 along N = N_1 ⊕ M_2. Hence Z_k − Z_{k−1} is a projector onto R(U^{|k}) ∩ R^⊥(A^T V^{|k−1}) along R^⊥(A^T V^{|k}) ⊕ R(U^{|k−1}).

We restrict the projectors Z_k − Z_{k−1} such that the equality

R(U^{|k}) ∩ R^⊥(A^T V^{|k−1}) = R(U_k)   (4.13)

should hold. This means that x_k − x_{k−1} ∈ R(U_k). The elements of the left-hand side can be written as U^{|k} y, where y ∈ R^{n_k} satisfies (V^{|k−1})^T A U^{|k} y = 0. The elements of R(U_k) are of the form U_k z (z ∈ R^{m_k}). Relation (4.13) holds if and only if for any U^{|k} y there is a corresponding z such that U^{|k} y = U_k z and vice versa. However, (V^{|k−1})^T A U^{|k} y = (V^{|k−1})^T A U_k z = 0 (∀z ∈ R^{m_k}) implies that (V^{|k−1})^T A U_k = 0. This holds for all k = 2, ..., r if and only if V^T AU is block lower triangular.

Assume that V^T AU = L is a nonsingular block lower triangular matrix and let V^{|kT} A U^{|k} = L_k for k = 1, ..., r. Then

( V^{|kT} A U^{|k} )^{-1} = [ L_{k−1}  0 ; V_k^T A U^{|k−1}  V_k^T A U_k ]^{-1} = [ L_{k−1}^{-1}  0 ; −(V_k^T A U_k)^{-1} V_k^T A U^{|k−1} L_{k−1}^{-1}  (V_k^T A U_k)^{-1} ]

and

Z_k − Z_{k−1} = U_k (V_k^T A U_k)^{-1} V_k^T A ( I − U^{|k−1} L_{k−1}^{-1} (V^{|k−1})^T A ).

As A s_k = r_k we obtain that

(Z_k − Z_{k−1}) s_0 = U_k (V_k^T A U_k)^{-1} V_k^T A s_{k−1} = U_k (V_k^T A U_k)^{-1} V_k^T r_{k−1}

and

x_k − x_{k−1} = −U_k (V_k^T A U_k)^{-1} V_k^T r_{k−1}   (k = 1, ..., r).   (4.14)

Theorem 91 If V^T AU is a nonsingular block lower triangular matrix, the Galerkin-Petrov method (4.9) has the finitely terminating recursive form (4.14).

In the next section we show that the obtained subclass of the Galerkin-Petrov method is identical with the conjugate direction methods of Stewart [229].
4.2 The conjugate direction methods
The concept of conjugate direction methods is due to Stewart [229], who obtained it as a
common generalization of different conjugate gradient methods known at that time.
Assume that A, U, V ∈ R^{m×m} are nonsingular, and the pair (U, V) is block A-conjugate with respect to the partition {m_1, m_2, ..., m_r} (1 ≤ r ≤ m).
Algorithm CDM (Conjugate Direction Method)
x0 ∈ Rm
for k = 1, . . . , r
rk−1 = Axk−1 − b
dk = (VkT AUk )−1 VkT rk−1
xk = xk−1 − Uk dk
end
The conjugate direction methods are exactly the finitely terminating recursive projection methods (4.14) specified just before. If V equals a permutation matrix, then the algorithm becomes a row-action method [43], which is very useful in many applications [44], [45].
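For illustration, a NumPy sketch of Algorithm CDM; the function name is ours, and the block parameters are passed as lists of matrices.

import numpy as np

def cdm(A, b, U_blocks, V_blocks, x0):
    # x_k = x_{k-1} - U_k (V_k^T A U_k)^{-1} V_k^T r_{k-1};
    # if (U, V) is block A-conjugate, x_r solves Ax = b (Theorem 92).
    x = x0.astype(float).copy()
    for Uk, Vk in zip(U_blocks, V_blocks):
        r = A @ x - b
        d = np.linalg.solve(Vk.T @ A @ Uk, Vk.T @ r)
        x = x - Uk @ d
    return x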
Theorem 92 (Stewart). For any initial point x0 , Algorithm CDM terminates in at most
r iterations, that is
Axr = b.
(4.15)
Another interesting characterization of algorithm CDM or (4.14) is given by Broyden [37].
Theorem 93 (Broyden). Assume that r steps are performed with Algorithm CDM, where
the nonsingular matrices
V = [V1 , . . . , Vr ] ,
U = [U1 , . . . , Ur ]
are such that ViT AUi is nonsingular for i = 1, . . . , r. Then a necessary and sufficient
condition for the vector xr to solve Ax = b for any starting point x0 is that the matrix
L = V T AU be nonsingular block lower triangular.
Hence A-conjugacy is not only a sufficient but also a necessary condition for the finite termination property. Since the recursive Galerkin-Petrov algorithms are identified with the class of conjugate direction methods, A-conjugacy is the key to the finite termination property and to recursivity.

Although we have demonstrated the finite termination property, we also prove Theorem 92 for its value and the intrinsic characterization of the algorithm it provides.
Let s_k = x_k − ω. Then

s_k = x_{k−1} − U_k d_k − ω = s_{k−1} − U_k (V_k^T A U_k)^{-1} V_k^T r_{k−1}.

Now r_k = A s_k implies

s_k = (I − R_k) s_{k−1},   k = 1, ..., r,

where

R_k = U_k (V_k^T A U_k)^{-1} V_k^T A

is a projector of rank m_k with

R(R_k) = R(U_k),   N(R_k) = R^⊥(A^T V_k).
Generally we have

s_k = (I − R_k)(I − R_{k−1}) ... (I − R_1) s_0,   k = 1, ..., r.   (4.16)

As the pair (U, V) is block A-conjugate, the projectors R_1, ..., R_r satisfy R_i R_j = 0 for i < j. Hence it is natural to introduce the concept of conjugate projectors [229].

Definition 94 Let R_1, ..., R_k be projectors. Then R_1, ..., R_k are conjugate projectors if

i < j ⇒ R_i R_j = 0.   (4.17)

We note that

R(R_i) ∩ R(R_j) = {0},   i ≠ j

holds for conjugate projectors {R_i}_{i=1}^k. Assume that there is a vector x ≠ 0 such that x ∈ R(R_i) ∩ R(R_j) holds for some i < j. Definition 94 then implies that x = R_i x = R_i R_j x = 0, which is a contradiction. The following result of Stewart [229] will be exploited in several ways.
Theorem 95 If R_1, ..., R_k are conjugate projectors, then

Q_k = (I − R_k)(I − R_{k−1}) ... (I − R_1)   (4.18)

is a projector with

R(Q_k) = ∩_{j=1}^k N(R_j)   (4.19)

and

N(Q_k) = Σ_{j=1}^k R(R_j).   (4.20)

Proof. For i ≤ k it follows from Definition 94 that R_i (I − R_k) ... (I − R_1) = 0 and Q_k^2 = Q_k, proving that Q_k is a projector. We recall that R(H) = {x | Hx = x} holds for any projector H. If x ∈ R(Q_k), then R_j x = R_j (Q_k x) = (R_j Q_k) x = 0 x = 0 implies that x ∈ N(R_j) for all 1 ≤ j ≤ k. Hence R(Q_k) ⊆ ∩_{j=1}^k N(R_j). If x ∈ ∩_{j=1}^k N(R_j), then

Q_k x = (I − R_k) ... (I − R_1) x = x ∈ R(Q_k),

which proves the relation R(Q_k) = ∩_{j=1}^k N(R_j). As the transposed projectors R_k^T, ..., R_1^T are also conjugate, R(Q_k^T) = ∩_{j=1}^k N(R_j^T). The relation N(R_j^T) = R^⊥(R_j) implies R(Q_k^T) = [ Σ_{j=1}^k R(R_j) ]^⊥, from which N(Q_k) = Σ_{j=1}^k R(R_j) readily follows.
Remark 96 N (Qk ) is the direct sum of subspaces R (Rj ).
The proof of Theorem 92. If we apply Theorem 95 to (4.16) we obtain that

R(Q_k) = ∩_{j=1}^k R^⊥(A^T V_j) = [ Σ_{j=1}^k R(A^T V_j) ]^⊥ = R^⊥(A^T V^{|k}) = R([U_{k+1}, ..., U_r]) = R(U^{r−k|})

and

N(Q_k) = Σ_{j=1}^k R(U_j) = R([U_1, ..., U_k]) = R(U^{|k}).

For k = r the matrix Q_r becomes the zero projector with

N(Q_r) = R([U_1, ..., U_r]) = R^m,   R(Q_r) = {0}.

Hence s_r = Q_r s_0 = 0, proving Theorem 92.

It is clear that

Q_k = P_{R(U^{r−k|}), R(U^{|k})},

where the Householder notation is used blockwise.
We prove the following simple results on the absolute and residual error of the
iterates xk .
Proposition 97 The absolute errors s_k of Algorithm CDM satisfy the inequality

‖s_k‖ ≤ ‖Q_k‖ ‖s_0‖   (k = 1, ..., r).   (4.21)

In any unitarily invariant matrix norm, ‖Q_k‖ is minimal for all k if and only if U is orthogonal. In any submultiplicative unitarily invariant matrix norm

‖Q_k‖ ≤ cond(U)   (k = 1, ..., r).   (4.22)

Proof. The claim follows from Proposition 177 and Lemma 182. The projectors Q_k have minimal norm if Q_k = Q_k^T for all k. The matrix Q_k is symmetric if and only if

R(U^{|k}) = R^⊥(U^{r−k|})   (k = 1, ..., r).

In other words, all Q_k are symmetric if and only if U is orthogonal.
Proposition 98 The residual errors r_k of Algorithm CDM satisfy

r_k = Q̃_k r_0,   (4.23)

where

R(Q̃_k) = R((AU)^{r−k|}),   N(Q̃_k) = R((AU)^{|k}).   (4.24)

Hence

‖r_k‖ ≤ ‖Q̃_k‖ ‖r_0‖   (k = 1, ..., r).

In any unitarily invariant matrix norm, ‖Q̃_k‖ is minimal for all k if and only if U is A^TA-orthogonal. In any submultiplicative unitarily invariant matrix norm

‖Q̃_k‖ ≤ cond(AU)   (k = 1, ..., r).   (4.25)
Proof. Since r_k = A s_k, we have

r_k = A s_k = A Q_k s_0 = Q̃_k r_0,   (4.26)

where

Q̃_k = A Q_k A^{-1} = A { U^{r−k|} (U^{-1})_{r−k|} } A^{-1} = (AU)^{r−k|} { (U^{-1}A^{-1})_{r−k|} }   (4.27)

is a projector with R(Q̃_k) = R((AU)^{r−k|}) = R(AU^{r−k|}) and N(Q̃_k) = R((AU)^{|k}) = R(AU^{|k}). From the definition of block A-conjugacy it follows that R(Q̃_k) = R^⊥(V^{|k}). The projector Q̃_k is symmetric if and only if

R(AU^{r−k|}) = R^⊥(AU^{|k})   (k = 1, ..., r).

All Q̃_k are symmetric if and only if U^T A^T A U = I.
Corollary 99 The iterates of Algorithm CDM satisfy

‖x_k‖ ≤ ‖I − Q_k‖ ‖ω‖ + ‖Q_k‖ ‖x_0‖   (k = 1, ..., r).   (4.28)

If the matrix norm is unitarily invariant and submultiplicative, then

‖x_k‖ ≤ cond(U)(‖ω‖ + ‖x_0‖)   (k = 1, ..., r).   (4.29)

Proof. It is obvious that

x_k = (I − Q_k) ω + Q_k x_0,   k = 1, ..., r,

which implies the first bound. Lemma 182 implies

‖Q_k‖ ≤ cond(U),   ‖I − Q_k‖ ≤ cond(U)

and the second inequality.

Remark 100 If all Q_k are symmetric, that is, U is orthogonal, then

‖x_k‖_2 ≤ ‖ω‖_2 + ‖x_0‖_2.
The conjugate direction methods differ from each other in the way they generate
the conjugate directions. There are several conjugation procedures, especially for biconjugation. Most of them are based on Krylov sequences and go with the calculation of
the iterates {xk }. Concerning these we refer to the cited literature (see, e.g., [246], [141],
[212]). General conjugation algorithms are given by Stewart [229], Voyevodin [246] and
Abaffy, Broyden and Spedicato [5], [9]. The latter is based on the rank reduction procedure of Egerváry. In Section 3.4 we showed that the rank reduction procedure itself is a
general conjugation algorithm.
4.3 The ABS projection methods
The development of ABS methods started when Abaffy [1] generalized the Huang method
[149] to a class of methods with rank two updates and three sequences of parameters.
Abaffy’s method was then modified and the final version is known as the ABS method of
Abaffy, Broyden and Spedicato [5], [9], [10], [11], [12]. The ABS method is a combination
of the conjugate direction method (Algorithm CDM) and the rank reduction based ABS conjugation procedure.

ABS method
x_1 ∈ R^m, H_1 = I
for k = 1, ..., m
  p_k = H_k^T z_k   (z_k ∈ R^m)
  x_{k+1} = x_k − p_k (v_k^T A p_k)^{-1} v_k^T (A x_k − b)
  H_{k+1} = H_k − H_k A^T v_k (w_k^T H_k A^T v_k)^{-1} w_k^T H_k   (w_k ∈ R^m)
end
The block version of this method [6] is as follows.

Block ABS method
x_1 ∈ R^m, H_1 = I
for k = 1, ..., r
  P_k = H_k^T Z_k   (Z_k ∈ R^{m×m_k})
  x_{k+1} = x_k − P_k (V_k^T A P_k)^{-1} V_k^T (A x_k − b)
  H_{k+1} = H_k − H_k A^T V_k (W_k^T H_k A^T V_k)^{-1} W_k^T H_k   (W_k ∈ R^{m×m_k})
end

A particular ABS method is defined by the parameter matrices V = [V_1, ..., V_r], Z = [Z_1, ..., Z_r] and W = [W_1, ..., W_r] (and the partition {m_1, m_2, ..., m_r}). The conjugation procedure of the ABS algorithm is the following.
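For illustration, a NumPy sketch of the scalar ABS method above; the function name and the dense storage of the matrices H_k are our choices.

import numpy as np

def abs_method(A, b, V, Z, W, x1):
    # Scalar ABS method: p_k = H_k^T z_k, and the H_k update is one rank
    # reduction step with X_k = A^T v_k and Y_k = w_k.
    m = A.shape[0]
    x = x1.astype(float).copy()
    H = np.eye(m)
    for k in range(m):
        zk, vk, wk = Z[:, k], V[:, k], W[:, k]
        pk = H.T @ zk
        x = x - pk * (vk @ (A @ x - b)) / (vk @ A @ pk)
        u = H @ A.T @ vk                      # H_k A^T v_k
        H = H - np.outer(u, wk @ H) / (wk @ u)
    return x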
The ABS conjugation algorithm
H_1 ∈ R^{m×m}, det(H_1) ≠ 0
for k = 1, ..., r
  P_k = H_k^T Z_k
  H_{k+1} = H_k − H_k A^T V_k (W_k^T H_k A^T V_k)^{-1} W_k^T H_k
end

The calculation of the matrices H_k is just the rank reduction procedure with the choice X_k = A^T V_k and Y_k = W_k (k = 1, ..., r).

Proposition 101 The ABS conjugation algorithm produces A-conjugate pairs (P, V).

Proof. Let X = A^T V. For i < j the relation X_i ∈ N(H_j) implies X_i^T H_j^T Z_j = X_i^T P_j = V_i^T A P_j = 0. Hence the matrix L = X^T P = V^T AP is block lower triangular.

If Z_k = W_k for all k = 1, ..., r, then the ABS conjugation algorithm is the same as the rank reduction conjugation algorithm with X_k = A^T V_k.
Definition 102 The case Z_k = W_k (k = 1, ..., r) is called the generalized implicit LU ABS (GILUABS) method [100], [102].

In this case the ABS conjugation is identical with the rank reduction conjugation of Section 3.4. Hence Propositions 83, 84, 85 and 86 are valid for the GILUABS methods. The GILUABS methods are breakdown free if and only if the matrix V^T A H_1^T W is strongly nonsingular, in which case the direction matrix is given by

P = [P_1, ..., P_r] = H_1^T W U_{V^T A H_1^T W}^{-1}.   (4.30)
We seek conditions under which the general ABS conjugation algorithm produces nonsingular A-conjugate pairs (P, V). This happens if the matrices V_k^T A P_k and W_k^T H_k A^T V_k ≠ 0 are all nonsingular for k = 1, ..., r. For simplicity, we use the notation X = A^T V. Hence we require X_k^T P_k and W_k^T H_k X_k to be nonsingular for all k. It follows from Theorem 69 that the latter condition holds if and only if W^T H_1 X is strongly nonsingular. By definition X_1^T P_1 = X_1^T H_1^T Z_1 and

X_{k+1}^T P_{k+1} = X_{k+1}^T H_1^T Z_{k+1} − X_{k+1}^T H_1^T W^{|k} ( X^{|kT} H_1^T W^{|k} )^{-1} X^{|kT} H_1^T Z_{k+1},

which is the Schur complement of the bordered matrix

(X^{|k+1})^T [ H_1^T W^{|k}, Z_{k+1} ] = [ X^{|kT} H_1^T W^{|k}   X^{|kT} H_1^T Z_{k+1} ; X_{k+1}^T H_1^T W^{|k}   X_{k+1}^T H_1^T Z_{k+1} ].

By the Guttman lemma

rank( X_{k+1}^T P_{k+1} ) = rank( (X^{|k+1})^T [ H_1^T W^{|k}, Z_{k+1} ] ) − rank( X^{|kT} H_1^T W^{|k} ).

The matrix X_{k+1}^T P_{k+1} is nonsingular if its rank is equal to its size m_{k+1}. This happens exactly if the matrix (X^{|k+1})^T [ H_1^T W^{|k}, Z_{k+1} ] is nonsingular. Hence we obtained the following result [108].

Theorem 103 The ABS conjugation algorithm produces a nonsingular A-conjugate pair (P, V) if and only if the matrix W^T H_1 A^T V is block strongly nonsingular, and the matrices V_1^T A H_1^T Z_1 ≠ 0, (V^{|k+1})^T [ A H_1^T W^{|k}, Z_{k+1} ] (k = 1, ..., r − 1) are nonsingular.
If Wk = Zk (k = 1, . . . , r), then we obtain the former GILUABS result.
There are many special cases and properties of the ABS algorithms that can
be found in Abaffy and Spedicato [9], [11], [12]. Optimization applications are given in
Abaffy, Galántai, Spedicato [7], Spedicato et al. [225], [226], [227], Feng, Wang, Wang
[83] and Zhang, Xia, Feng [253]. For the extent of the research on ABS methods we refer
to the surveys and bibliographies [221], [243], and [198].
Since by Proposition 84 even a restricted set of parameters Y = Z = Q is enough
to generate all conjugate pairs (direction methods), we can restrict our studies to the
GILUABS or rank reduction conjugation case.
We now identify three special cases of the algorithm. In all cases wk = zk = qk
holds for k = 1, . . . , m.
Case A: The implicit LU algorithm
The implicit LU algorithm is defined by the parameters W = Z = I, when P = U_A^{-1}. Thus the algorithm works with the inverse of the upper triangular factor of the LU decomposition of the matrix A. It is clear from Theorem 69 that A must be strongly nonsingular (see also [5], [9]). For the special properties of the implicit LU method we refer to Abaffy and Spedicato [9] and Fletcher [85].

Fletcher [85] gives a historical survey of the implicit LU method and points out that the seemingly different methods of Fox, Huskey, Wilkinson (1948), Hestenes, Stiefel (1952), Purcell (1953, [80]), Householder (1955), Pietrzykowski ([207], 1960), Faddeev, Faddeeva (1963), Stewart (1973), Enderson, Wassyng (1978), Sloboda (1978), Wassyng (1982), Abaffy, Broyden, Spedicato (1984), Hegedűs (1986), and Benzi and Meyer ([22], 1995) are essentially the implicit LU method. Fletcher also points out the advantages of the implicit LU method in linear programming and related calculations. Other optimization applications of the implicit LU algorithm are given in Zhang, Xia and Feng [253].

Numerical tests of the block LU algorithm are given by Bertocchi, Spedicato [24] and Bodon [28]. Generalizations of the original implicit LU ABS algorithm are given by Spedicato and Zhu [228].
Case B: The scaled Huang method
The scaled Huang method is deÞned by the parameters wi = zi = AT vi (i =
1, . . . , m). Hence P = AT V UV−1T AAT V is an orthogonal matrix. The matrix V T AAT V
is strongly nonsingular if V is nonsingular. The very useful special properties of the
Huang method are given in [9]. The Huang method is related to the Gram-Schmidt
orthogonalization procedure. This and other connections are studied in [18].
Case C: Symmetrized conjugate direction methods [100], [102]
DeÞnition 104 The symmetrized conjugate direction (SCD) ABS methods are deÞned
by the parameters W = Z = Q and V = F P , where the matrix G = F T A is symmetric
and positive deÞnite.
For a given A it is always possible to Þnd an orthogonal matrix F such that
G = F T A is symmetric and positive deÞnite. The idea of symmetrized conjugate direction
methods came from the subclass S3 of the ABS methods [9]. It can be easily seen [100],
[102] that the subclasses S2-S5 of the ABS methods [9] are special cases of the symmetrized
conjugate direction methods. We prove the following result [100], [102].
Theorem 105 The symmetrized conjugate direction methods can be performed without
breakdown if and only if Q is nonsingular. Furthermore
−1
P = QUQ
T GQ ,
−1
V = F QUQ
T GQ .
(4.31)
Proof. Theorem 69 implies that the conjugation (and the algorithm) can be
performed
without
if V T AQ has the LU factorization V T AQ =
¢ breakdown if and only
¡
−1
LV T AQ DV T AQ UV T AQ . Then P = QUV T AQ and
−1
T T
T
AQUV−1T AQ = UV−T
V T AQ = UV−T
T AQ Q F
T AQ Q GQUV T AQ = LV T AQ DV T AQ .
Since QT GQ is symmetric and positive deÞnite, it follows that
LV T AQ DV T AQ = D
and D is diagonal with positive main diagonal entries.
Thus we obtain QT GQ =
UVT T AQ DUV T AQ which means that QT GQ is strongly nonsingular. As G is positive deÞnite this holds if and only if Q is nonsingular.
Remark 106 The symmetrized conjugate direction methods produce A-biorthogonal pairs
(P, V ) such that V T AP = P T GP = D, where D is a diagonal matrix.
We must emphasize that the SCD ABS methods still within the rank reduction
frame provide a way for biconjugation other than the two-sided Gram-Schmidt algorithm
of Section 3.4.
In principle, each conjugate direction method can be given in equivalent ABS or
other forms. For particular cases, see [9]. Different representations of the same algorithm
may have different performance on computers. It is a difficult task to Þnd the ”best”
representation of a given algorithm. Here we show one example by deriving two ABS
versions of the preconditioned conjugate gradient method.
We Þrst give two equivalent formulations of the SCD ABS methods. These algorithms are solving the equivalent system Gx = F T b. Version B is obtained by applying
the update algorithm with the parameters V = W = Z = Q and G = F T A instead of A.
Symmetrized conjugate direction ABS method (Version A)
y1 ∈ Rm , H1 = I
The ABS projection methods
63
for k = 1, . . . , m
pk = HkT qk
yk+1 = yk − pk (pTk Gpk )−1 pTk rek (e
rk = Gyk − F T b)
T
T
Hk+1 = Hk − Hk Gpk pk /pk Gpk
end
Symmetrized conjugate direction ABS method (Version B)
y1 ∈ Rm , H1 = I
for k = 1, . . . , m
pk = HkT qk
yk+1 = yk − pk (pTk Gpk )−1 pTk rek (e
rk = Gyk − F T b)
T
T
Hk+1 = Hk − Hk Gqk pk /pk Gqk
end
For the rest of section we assume that the coefficient matrix A is symmetric and
positive deÞnite. The classical Hestenes-Stiefel conjugate gradient method is then given
by the following procedure.
Conjugate gradient method
x0 = 0, r0 = b
for k = 1, . . . , m
if rk−1 = 0 then
x = xk−1 and quit
else
T
T
T
rk−1 /rk−2
rk−2
(β1 = 0)
βk = rk−1
pk = rk−1 + βk pk−1 (p1 = r0 )
T
rk−1 /pTk Apk
αk = rk−1
xk = xk−1 + αk pk
rk = rk−1 − αk Apk (rk = b − Axk )
end
end
Abaffy and Spedicato [9] showed that the ABS version the Hestenes-Stiefel conjugate gradient method is given by the parameters W = Z = Q and qk = rk (rk = Axk − b,
k = 1, . . . , m).
The preconditioned conjugate gradient method is derived in the following way
(see e.g., [128]). We consider the linear system
C −1 AC −1 x̃ = C −1 b,
where C is chosen such that the spectral condition number κ2 (C −1 AC −1 ) is small, C is
symmetric and M = C 2 is positive deÞnite. It is also supposed that M is sparse and
easy to solve any linear system of the form M z = d. The conjugate gradient method is
Þrst applied to the system C −1 AC −1 x̃ = C −1 b and then transformed into the following
equivalent form.
Preconditioned conjugate gradient method
x0 = 0, r0 = b
for k = 1, . . . , m
if rk−1 = 0 then
x = xk−1 and quit
else
Solve M zk−1 = rk−1 for zk−1
T
T
βk = zk−1
rk−1 /zk−2
rk−2 (β1 = 0)
pk = zk−1 + βk pk−1 (p1 = z0 )
T
rk−1 /pTk Apk
αk = zk−1
Finite projection methods for linear systems
64
end
xk = xk−1 + αk pk
rk = rk−1 − αk Apk
(rk = b − Axk )
end
We now apply the SCD ABS method to the system C −1 AC −1 x̃ = C −1 b with
the parameters G = C −1 AC −1 , b̃ = C −1 b, qi = r̃i (r̃i = Gỹi − b̃ = C −1 AC −1 ỹi − C −1 b,
i = 1, . . . , m), which correspond to the classical Hestenes-Stiefel method. We then have
Version A
ỹ1 ∈ Rm , H̃1 = I
for k = 1, . . . , m
p̃k = H̃kT r̃k
ỹk+1 = ỹk − p̃k (p̃Tk Gp̃k )−1 p̃Tk r̃k
H̃k+1 = H̃k − H̃k Gp̃k p̃Tk /p̃Tk Gp̃k
end
and
Version B
ỹ1 ∈ Rm , H̃1 = I
for k = 1, . . . , m
p̃k = H̃kT r̃k
ỹk+1 = ỹk − p̃k (p̃Tk Gp̃k )−1 p̃Tk r̃k
H̃k+1 = H̃k − H̃k Gr̃k p̃Tk /p̃Tk Gr̃k
end
We now make appropriate substitutions in both algorithms. Let p = C −1 p̃k ,
−1
yk = C ỹk , rk = C r̃k = Ayk − b and zk = C −1 r̃k . Then Mzk = rk . Furthermore let
Hk = C H̃k C −1 . After substitution we obtain two ABS versions of the preconditioned
conjugate gradient method.
Preconditioned conjugate gradient ABS method (Version A)
y1 ∈ Rm , H1 = I
for k = 1, . . . , m
Solve M zk = rk for zk (rk = Ayk − b)
pk = HkT zk
yk+1 = yk − pk (pTk Apk )−1 pTk rk
Hk+1 = Hk − Hk Apk pTk /pTk Apk
end
Preconditioned conjugate gradient ABS method (Version B)
y1 ∈ Rm , H1 = I
for k = 1, . . . , m
Solve M zk = rk for zk (rk = Ayk − b)
pk = HkT zk
yk+1 = yk − pk (pTk Apk )−1 pTk rk
Hk+1 = Hk − Hk Azk pTk /pTk Azk
end
Version A was Þrst obtained by Vespucci [244] in a different way. Preliminary
numerical testing done in MATLAB showed that the ABS versions of the preconditioned
conjugate gradient method outperformed the original preconditioned conjugate gradient
method on randomly chosen full systems. On sparse systems the original method was
better. In both cases we used the optimal band preconditioners [130].
The selection of the ”numerically best” representation of conjugate direction algorithms is not yet solved. Gáti Attila [123] renewed and signiÞcantly updated an almost
forgotten program of Miller [189], which makes an automatic stability analysis of matrix
algorithms given in a FORTRAN like program form. The Þrst testing of some known
The stability of conjugate direction methods
65
representations of the implicit LU and the Huang methods conÞrmed the usefulness of
Miller’s approach.
4.4
The stability of conjugate direction methods
We study the stability of the Þnitely terminating Galerkin-Petrov projection (or conjugate
direction) methods. Here we follow Broyden’s backward error analysis technique [36], [37],
[38]. The material of this section is based on [97] (see also [9], [11], [12]). The basic idea
of the stability analysis is the following. For the solution of some problem we consider
any Þnite algorithm of the form
Xk = Ψk (Xk−1 ) (k = 1, . . . , r),
(4.32)
where Xr is the solution of the problem. Assume that an error εj occurs at step j and
this error propagates further. It is also assumed that no other source of error occurs. The
exact solution Xr is given by
Xr = Ψr ◦ Ψr−1 ◦ . . . ◦ Ψj (Xj ) = Ωrj (Xj ) ,
(4.33)
Xr0 = Ωrj (Xj + εj ).
(4.34)
while the perturbed solution Xr0 is given by
kXr − Xr0 k
If the quantity
is large, algorithm (4.32) is considered as unstable.
We use the projector technique of Stewart [229] shown in Section 4.2. Let again
¡
¢−1 T
Vk A,
(4.35)
Rk = Uk VkT AUk
where the matrix Rk is a projector of rank mk with R(Rk ) = R(Uk ) and N (Rk ) =
R⊥ (AT Vk ). Using notation (4.35) we can rewrite Algorithm CDM in the form
¡
¢−1 T
xk = (I − Rk )xk−1 + dk (dk = Uk VkT AUk
Vk b, k = 1, . . . , r).
(4.36)
Furthermore let
Qk,j =
½
(I − Rk ) · · · (I − Rj )
I (k < j) .
(k ≥ j),
(4.37)
The solution of recursion (4.36) is then given by
xr = Qr,j+1 xj +
r
X
Qr,i+1 di .
(4.38)
i=j+1
Suppose that an error occurs at step j (0 ≤ j ≤ r − 1) resulting in x0j instead of xj . If
this error propagates further, the modiÞed sequence
x0k = (I − Rk ) x0k−1 + dk
(k = j + 1, . . . , r)
(4.39)
is calculated instead of (4.36). Hence we obtain
x0r = Qk,j+1 x0j +
r
X
Qk,i+1 di
(k = j + 1, . . . , r) .
(4.40)
i=j+1
The error occurring in the Þnal step is thus given by
ω − x0r = xr − x0r = Qr,j+1 (xj − x0j ).
The matrix Qr,j+1 can be considered as the error matrix. Hence we have the error bound
°
°
(4.41)
kω − x0r k ≤ kQr,j+1 k °xj − x0j ° .
Finite projection methods for linear systems
66
DeÞnition 107 (Broyden). A particular method of Algorithm CDM is said to be optimally stable, if kQr,j+1 k is minimal for all j.
Here we assume submultiplicative unitarily invariant matrix norm. As the projectors Rk are conjugate in the sense of DeÞnition 94, Theorem 95 of Stewart implies that
Qr,j+1 is a projector with
R(Qr,j+1 ) = ∩ri=j+1 N (Ri ),
N (Qr,j+1 ) =
r
X
i=j+1
R(Ri ).
(4.42)
It is easy to see that
R(Qr,j+1 ) = ∩ri=j+1 R⊥ (AT Vi ) = R⊥ (AT V r−j| )
(4.43)
and
N (Qr,j+1 ) =
r
X
i=j+1
R(Ui ) = R(U r−j| ) = R⊥ (AT V |j ).
(4.44)
Remark 108 Qr,j+1 = 0 for j = 0 in agreement with Theorem 95. Thus the error
propagation can inßuence the Þnal result only for j ≥ 1.
The projector Qr,j+1 has minimal norm if and only if it is symmetric. A projector
P is symmetric, if and only if R(P ) = N ⊥ (P ). Thus Qr,j+1 is symmetric, if and only if
R(AT V |j ) = R⊥ (AT V r−j| ).
(4.45)
A method is optimally stable, if and only if (4.45) is satisÞed for all j. The latter condition
is equivalent to the orthogonality condition
AT Vi ⊥ AT Vj
(i 6= j) .
(4.46)
In matrix formulation it means that V T AAT V = D holds, where D is block diagonal.
Thus we proved
Theorem 109 A method of the Algorithm CDM class is optimally stable, if and only if
(4.46), or equivalently V T AAT V = D holds with a block diagonal matrix D.
The result was originally obtained by Broyden [37] in a different way. The projector technique however gives the structure of the error matrix Qr,j+1 .
P T
An optimally stable method always exists for a given A. Let A = V
U be the
P
singular value decomposition of A. Then V T AAT V = 2 . However the requirement for
the optimal stability is too strong from a practical point of view.
¡ ¢
If P is a¡projection
with R (P ) and N (P ), then P T is a projection with R P T =
¢
N ⊥ (P ) and N P T = R⊥ (P ). Thus we have
QTr,j+1 = PR(AT V |j ),R(AT V r−j| ) .
Hence by Lemma 181 it can be represented in the form
¡
¢|j ³¡ T ¢−1 ´j
QTr,j+1 = AT V
A V
,
The stability of conjugate direction methods
67
which clearly gives the representation
Qr,j+1 = (A−1 V −T )|j (V T A)j .
(4.47)
By Lemma 182 the bound
°°
°
°
kQr,j+1 k ≤ °A−1 V −T ° °V T A° = cond(V T A) ≤ cond(V )cond(A)
(4.48)
Theorem 110 For the error propagation model (4.39) the bound
°
°
°
°
kω − x0r k ≤ cond(V T A) °xj − x0j ° ≤ cond(V )cond(A) °xj − x0j °
(4.49)
holds in any submultiplicative unitarily invariant matrix norm.
Using the inequality (4.41) we can establish
holds in any submultiplicative unitarily invariant matrix norm.
Remark 111 If V is a unitary matrix, then 1 ≤cond(V ) ≤ m holds for unitarily invariant
matrix norms generated by normalized symmetric gauge functions (cf. inequality (7.7).
Thus the error bound (4.49) is proportional to cond(A). Particularly, cond(V ) = 1 in the
spectral norm.
Next we deÞne the residual perturbation as rk0 = A(xk − x0k ). Then for the error
propagation model (4.39) we have
0
= AQr,j+1 A−1 rj0 .
rr+1
´
³
Using the relation (AB)|k (CD)k = A B |k C k D and (4.47) we can show that
¢|j ¡ T ¢j
¡
AQr,j+1 A−1 = V −T
V
(4.50)
(4.51)
is a projector
onto°R((V −T )|j ) along R((V −T )r−j| ) (cf. Lemma 181). Again by Lemma
°
°
) holds
182, AQr,j+1 A−1 ° ≤cond(V
°
° in any submultiplicative unitarily invariant matrix
norm. The quantity °AQr,j+1 A−1 ° is minimal, if and only if AQr,j+1 A−1 is symmetric,
that is
³¡
³¡
¢|j ´
¢r−j| ´
= R⊥ V −T
.
(4.52)
R V −T
Relation (4.52) holds for all j if and only if V T V = D holds with a block diagonal matrix
D. So we have
Theorem 112 For the residual error the inequality
°° °
° °
°
(4.53)
krr0 k ≤ °AQr,j+1 A−1 ° °rj0 ° ≤ cond (V ) °rj0 °
°
°
holds for all j. The error constant °AQr,j+1 A−1 ° is minimal for all j, if and only if
V T V = D holds with a suitable block diagonal matrix D.
The structure of Algorithm CDM yields the following simple extension of the
error propagation model (4.39). Assume that an εk error occurs at each step k and the
perturbed recursion (4.36) can be written in the form
¢
¡
(4.54)
x0k = (I − Rk ) x0k−1 + εk−1 + dk (k = 1, . . . , r).
Finite projection methods for linear systems
68
Here we assume that the errors εk occur independently of each other. Writing (4.54) in
the form
x0k = (I − Rk ) x0k−1 + [(I − Rk ) εk−1 + dk ]
(k = 1, . . . , r)
we get the solution
x0r = Qr,1 x00 +
r
X
i=1
Qr,i+1 [(I − Ri ) εi−1 + di ] .
A comparison with the solution of recursion (4.36) immediately gives the error term
ω − x0r =
r
X
Qr,i εi−1
(4.55)
i=1
from which the bound
kω
− x0r k
r
r
X
¡ T ¢X
≤ cond V A
kεi−1 k ≤ cond (V ) cond (A)
kεi−1 k
i=1
(4.56)
i=1
follows by (4.48).
Theorem 113 For the extended error propagation model (4.54) the inequality (4.56)
holds.
For the optimally stable method
kω − x0r k ≤
r
X
i=1
kεi−1 k2
(4.57)
holds.
Finally we note that Abaffy, Galántai and Spedicato gave a forward error analysis
for the linear ABS methods in ßoating point arithmetic (see [9], Section 12.3).
4.5
The stability of the rank reduction conjugation
We investigate the stability of conjugation via the rank reduction. This, in fact, means
the stability of the full rank factorization (3.28). The results are based on [112], [114],
[117] and [93].
bB UB be the unique L1 U and LU1 factorizations of
bB and B = L
Let B = LB U
bY T HX X −1 and
B, respectively. Then the components (3.39) can be written as P T = U
bY T HX .
Q = Y −T L
We assume that H, X, Y ∈ Rm×m and Y T HX is strongly nonsingular. If Y , H
and X are subject of perturbations δeY , δeH and δeX , respectively, then we can write
³
´³
´³
´
Y T + δeYT
H1 + δeH X + δeX = Y T (H + δH ) X.
Hence formally we can assume that only H1 is subject to perturbation and the parameter
matrices X and Y are exact or unchanged. Let δH be the perturbation of H. Thus
e = H + δH ,
H
e = Y T HX + Y T δH X = Y T HX + δY T HX
Y T HX
The stability of the rank reduction conjugation
69
e =Q
eD
e −1 PeT reads as
and the perturbed full rank factorization H
b Te .
b T e X −1 , D
e =D Te , Q
e = Y −T L
PeT = U
Y HX
Y HX
Y HX
The initial assumption implies that only the triangular and the diagonal factors of the
LU and LDU decompositions of Y T H1 X are subject to change in the above expression.
e = Q−1 δH P −T D. If the perturbation δH is
Theorem 114 Let B = DQ−1 δH P −T and B
T
T
T
such that both Y HX and Y HX +Y δH X are nonsingular and have LU factorizations,
then
³ ´
e ,
δP T = triu (G) P T , δD = diag (G) D, δQ = Qtril G
(4.58)
∗
e
e
where G and
³ G
´ are the unique solutions of equations G = B − tril (G) B and G =
∗ e
e − Btriu
e
G , respectively. Hence
B
¯ ¯
¯ T¯
¯δP ¯ ≤ triu (|G|) ¯P T ¯ ,
|δD| ≤ diag (|G|) |D| ,
³¯ ¯´
¯ e¯
If ρ (|B|) < 1 and ρ ¯B
¯ < 1, then
³¯ ¯´
¯ e¯
|δQ| ≤ |Q| tril ¯G
¯ .
´
³
¯ T¯
¡
¡
¢¯ ¯
¢
¯δP ¯ ≤ triu Gb,1 ¯P T ¯ , |δD| ≤ diag Gb,1 |D| , |δQ| ≤ |Q| tril G
eb,1 ,
eb,1 are the unique solutions of equations
where Gb,1 and G
¯ ¯ ¯ ¯
³ ´
e ¯¯ triu∗ G
e ¯¯ + ¯¯B
e ,
e = ¯¯B
G = |B| + tril∗ (G) |B| and G
(4.59)
(4.60)
(4.61)
respectively.
bZ and
Proof. Let Z = Y T HX, δZ = Y T δH X, Z = LZ U
´
³
bZ + δU .
Z + δZ = (LZ + δL1 ) U
´
³
bZ X −1 and PeT = U
bZ + δU X −1 , where by Theorem 19, δU = triu (G) U
bZ ,
Then P T = U
¡
¢
b −1 . Hence
Y T δH X U
G = B − tril∗ (G) B and B = L−1
Z
Z
bZ X −1 = triu (G) P T .
δP T = PeT − P T = δU X −1 = triu (G) U
Matrix B can also be written as
³
´
¡ −1 −1 T ¢
T
b −1 .
b −1
B = L−1
δH X U
Z Y δH X UZ = DZ DZ LZ Y
Z
Let Z = LZ DZ UZ and Z + δZ = (LZ + δL1 ) (DZ + δD ) (UZ + δU1 ) be the unique LDU
e = DZ + δD and δD = δD . Hence by
factorizations. Then by deÞnition D = DZ , D
Theorem 25
δD = diag (G) D,
¡ T
¢ −1
b .
Y δH X U
where G is the unique solution of equation G = B−tril∗ (G) B and B = L−1
Z
Z
b
Finally, let Z = LZ UZ and
´
³
bZ + δL (UZ + δU1 )
Z + δZ = L
Finite projection methods for linear systems
70
³
´
bZ + δL .
bZ and Q
e = Y −T L
be the unique LU1 factorizations. By deÞnition Q = Y −T L
T
T bT
T
By noticing ³that
³ ´ that
´ Z = UZ LZ is an L1 U factorization of Z we can easily obtain
∗ e
b
e
e
e
e
e
δL = LZ tril G , where G is the unique solution of equation G = B − Btriu G with
¡
¢
e=L
b−1 Y T δH X U −1 . Hence
B
Z
Z
bZ tril (G) = Qtril (G) .
e − Q = Y −T δL = Y −T L
δQ = Q
e in the form
We can also write B
´
³
¡
¢
b−1 Y T δH XU −1 D−1 DZ .
e=L
b−1 Y T δH XU −1 = L
B
Z
Z
Z
Z
Z
The rest of the proof follows from Theorems 43 and 46.
³¯ ¯´
¯ e¯
−1
Remark 115 For ρ (|B|) < 1 and ρ ¯B
¯ < 1 the upper estimates Gb,1 ≤ |B| (I − |B|)
¯ ¯´−1 ¯ ¯
³
¯ e¯
e ¯¯
eb,1 ≤ I − ¯¯B
and G
¯B ¯ hold.
In view of Theorem 72 it is natural to have the strong similarity between the
perturbations of full rank factorization (3.28) and the triangular factorizations. We can
conclude that the full rank factorization and conjugation algorithm of Egerváry is stable,
whenever the corresponding triangular factorization is stable. Finally we note that norm
estimates of δP can be easily obtained using the results of Section 2.3.
Chapter 5
PROJECTION METHODS FOR NONLINEAR ALGEBRAIC SYSTEMS
We give a nonlinear generalization of the Householder-Bauer class of iterative projection
methods [146] from which we derive the nonlinear Kaczmarz method and a class of nonlinear conjugate direction methods. The class of nonlinear conjugate direction methods
is a common generalization of the nonlinear ABS and ABS type methods and some other
methods. We also developed a local convergence theorem with a special proof technique that applies to all investigated methods and is simpler than the earlier convergence
proofs. Beyond that it leads to the characterization of the behavior of these methods in
the minor iteration loop. Using certain elements of this proof technique we give a new
convergence proof for the nonlinear Kaczmarz method and show the reason for the very
different convergence behavior of the two subclasses.
We apply our convergence theorem to several classes of nonlinear conjugate direction methods and especially to the nonlinear ABS methods developed by Abaffy, Galántai
and Spedicato [8] and Abaffy and Galántai [6]. For the quasi-Newton ABS methods of
Galántai and Jeney [118], [119] we present numerical testing as well. For a special class
of the nonlinear ABS methods we prove monotone and essentially global convergence in
the natural partial ordering of Rm . We also show that one special method of the class is
faster than the Newton method in the partial ordering.
Finally we give two special applications of the block implicit LU ABS method.
The Þrst application is related to nonlinear algebraic systems with special block arrowhead
Jacobian matrices, while the second one is related to constrained optimization through
the KT equations.
We investigate nonlinear algebraic equations of the form
F (x) = 0
(F : Rm → Rm ) ,
(5.1)
where
T
F (x) = [f1 (x) , . . . , fm (x)] .
(5.2)
For the Jacobian matrix we use the notation
m
F 0 (x) = [∂fi (x) /∂xj ]i,j=1 = A (x)
(5.3)
to invoke the similarities with the linear case. Vector ω ∈ Rm denotes any solution of the
equation. We use the following assumptions:
−1
∃A (x)
(x ∈ S (ω, δ0 )) ,
kF (x) − F (y)k ≤ K0 kx − yk
(5.4)
(x, y ∈ S (ω, δ0 ))
(5.5)
(x, y ∈ S (ω, δ0 )) ,
(5.6)
and
α
kA (x) − A (y)k ≤ K1 kx − yk
Projection methods for nonlinear algebraic systems
72
where K0 , K1 ≥ 0, 0 < α ≤ 1 and δ0 > 0. Condition (5.4) implies that ω is an isolated
solution of F (x) = 0. We need the following simple technical result.
Lemma 116 Assume that F : Rm → Rm satisÞes condition (5.6) and let Z ∈ Rm×m be
b ∈ Rm×m such that
arbitrary. Then for any x, y ∈ S (ω, δ0 ) there is a matrix A
³
´
b − Z (y − x)
F (y) = F (x) + Z (y − x) + A
(5.7)
°
° √
°b
°
α
and °A
− A (x)° ≤ mK1 kx − yk . If kZ − A (x)k ≤ Γ, then
°
°
√
°b
°
α
°A − Z ° ≤ Γ + mK1 kx − yk .
(5.8)
Proof. Since for any i,
fi (y) = fi (x) + ∇fi (x + ϑi (y − x))T (y − x)
(0 < ϑi < 1) ,
b (y − x) with
we have F (y) = F (x) + A
b = [∇f1 (x + ϑi (y − x)) , . . . , ∇fm (x + ϑi (y − x))]T .
A
This clearly implies the equality. The rest follows from condition (5.6) and the triangle
inequality.
Let us consider the following iteration method for solving F (x) = 0:
¡
¢−1 T
xk+1 = xk − Pk VkT Ak Pk
Vk F (xk )
(k = 0, 1, . . . ) ,
(5.9)
where Ak ∈ Rm×m , Pk , Vk ∈ Rm×mk and mk ≤ m.
When F (x) = Ax − b and Ak = A the iteration changes to
¡
¢−1 T
xk+1 = xk − Pk VkT APk
Vk r k
(k ≥ 0) ,
which has the form of the Householder-Bauer class of projection methods [146]. If mk ≡ m,
and the matrices Ak , Vk and Pk are invertible, then iteration (5.9) becomes the modiÞed
Newton method
xk+1 = xk − A−1
k F (xk )
(k ≥ 0) .
(5.10)
For the choice Ak = A (xk ) we obtain the Newton method.
Assume that mk < m and substitute
´
³
bk − Ak (xk − ω)
F (xk ) = F (ω) + Ak (xk − ω) + A
into (5.9). Then we obtain the recursion
xk+1 − ω = Bk (xk − ω) + Ck (xk − ω) ,
(5.11)
where
is a projection and
¡
¢−1 T
Bk = I − Pk VkT Ak Pk
Vk Ak
´
¡
¢−1 T ³
bk − Ak .
Ck = PkT VkT Ak Pk
Vk A
(5.12)
(5.13)
Extensions of the Kaczmarz method
73
Assuming that
and
°
¢−1 T °
° T¡ T
°
Vk ° ≤ K2
°Pk Vk Ak Pk
(k ≥ 0)
(5.14)
kAk − A (ω)k ≤ γ kxk − ωkα
(k ≥ 0)
(5.15)
we easily derive the bounds
¡
¢
√
kCk k ≤ K2 γ + mK1 kxk − ωkα = K3 kxk − ωkα
(k ≥ 0)
(5.16)
and
kxk+1 − ωk ≤ kBk k kxk − ωk + K3 kxk − ωk1+α
(k ≥ 0) .
Since kBk k ≥ 1, we cannot prove convergence in the standard way (see, e.g., [202]).
Next we investigate two subclasses of algorithm (5.9). In the Þrst section we deal
with the nonlinear Kaczmarz algorithms. These methods have local convergence with
linear speed. If the fi ’s are convex, then we can rewrite the nonlinear system F (x) = 0
as a convex feasibility problem that can be solved by other variants of the Kaczmarz
method [94]. In the second section we develop a nonlinear generalization of the conjugate
direction methods of Section 4.2. The nonlinear conjugate direction methods have local
convergence of order 1 + α. We also point out the reason for the different convergence
behavior of the two subclasses.
5.1
Extensions of the Kaczmarz method
We investigate nonlinear versions of the Kaczmarz method, which is a classical iterative
method for solving linear equations.
Tompkins [238] suggested Þrst a nonlinear version of the Kaczmarz method for
solving F (x) = 0. The Tompkins, Kaczmarz algorithm has the form
xk+1 = xk −
fi (xk )
k∇fi (xk )k2
∇fi (xk )
(i ≡ k (mod m) + 1) .
(5.17)
Tompkins did not prove the convergence of the method. It was McCormick [184], [185],
who Þrst showed the local linear convergence of the modiÞed Kaczmarz method
fi(k) (xk )
xk+1 = xk − °
(xk )
° ∇f
°∇fi(k) (xk )°2 i(k)
(k ≥ 0)
(5.18)
for various selection strategies i = i (k) which include the cyclic selection
i ≡ k (mod m) + 1
and the optimal (maximum error) selection
¯
¯
¯fi(k) (xk )¯ = max |fi (xk )| .
1≤i≤m
(5.19)
(5.20)
For other strategies we refer to [185] and [43].
Meyn [188] proved the local convergence of the relaxed nonlinear Kaczmarz method
xk+1 = xk − µ
fi (xk )
∇fi (xk )
k∇fi (xk )k2
(0 < µ < 2, i ≡ k (mod m) + 1) .
(5.21)
Projection methods for nonlinear algebraic systems
74
Martinez [180], [181], [182], [73] investigated several versions of the nonlinear block
Kaczmarz method with relaxation. Assume that F (x), F 0 (x) and Im are partitioned in
the following way:


F1 (x)


..
m
m
(5.22)
F (x) = 
 (Fi : R → R i , i = 1, . . . , r) ,
.
Fr (x)


A1 (x)


..
A (x) = 

.
Ar (x)
and
Im = [E1 , . . . , Er ]
¡
¢
Ai ∈ Rmi ×m , i = 1, . . . , r
¡
¢
Ei ∈ Rm×mi , i = 1, . . . , r .
(5.23)
(5.24)
Hence Fi (x) = EiT F (x) and Ai (x) = EiT A (x). Then we can deÞne the following relaxed
block Kaczmarz method
h
i−1
Fi(k) (xk ) (k ≥ 0) ,
(5.25)
xk+1 = xk − µk ATi(k) (xk ) Ai(k) (xk ) ATi(k) (xk )
where 0 < ε1 ≤ µk ≤ 2 − ε1 . The algorithm can also be written in the form
h
i−1
T
T
A (xk ) AT (xk ) Ei(k)
Ei(k)
F (xk ) ,
xk+1 = xk − µk AT (xk ) Ei(k) Ei(k)
(5.26)
which corresponds to (5.9). Martinez proved the local convergence for the cyclic selection
i (k) and for the selection i = i (k) deÞned by
°
°
°Fi(k) (xk )° ≥ θ max kFi (xk )k .
(5.27)
1≤i≤r
Liu [174] developed acceleration schemes for the Kaczmarz algorithm.
Next we prove the convergence of the relaxed nonlinear block Kaczmarz method
in the case of constant relaxation and cyclic selection. Thus we investigate the method
where k ≥ 0,
£
¤−1 T
Ei F (xk ) ,
xk+1 = xk − µAT (xk ) Ei EiT A (xk ) AT (xk ) Ei
0 < µ < 2,
i ≡ k (mod m) + 1.
(5.28)
(5.29)
The presented new proof, which is different from those of Martinez, enables us to compare
the nonlinear Kaczmarz type methods with the nonlinear conjugate direction methods of
next section.
In the case of algorithm (5.28) iteration (5.11) changes to
sk+1 = Bk sk + Ck sk ,
(5.30)
where sk = xk − ω, Bk = I − µR(k) ,
£
¤−1 T
Ei A (xk )
R(k) = Ri (xk ) = AT (xk ) Ei EiT A (xk ) AT (xk ) Ei
(5.31)
Extensions of the Kaczmarz method
75
¢
¡
is an orthogonal projection on R AT (xk ) Ei and
´
£
¤−1 T ³
bk − A (xk ) .
Ck = AT (xk ) Ei EiT A (xk ) AT (xk ) Ei
Ei A
(5.32)
°
°
°
°
For 0 < µ < 2, °I − µR(k) °2 ≤ 1. If °A−1 (x)° ≤ K2 for all x ∈ S (ω, δ0 ) and xk ∈
S (ω, δ0 ), then
° ¡√
°
¢
°
°b
m + 1 K1 kxk − ωkα ,
°Ak − A (xk )° ≤
and
°
°
£
¤−1 T °
° °
° T
Ei ° ≤ °A−1 (xk )° ≤ K2
°A (xk ) Ei EiT A (xk ) AT (xk ) Ei
kCk k ≤
¡√
¢
m + 1 K1 K2 kxk − ωkα = K3 kxk − ωkα .
(5.33)
We need the following technical result.
Lemma 117 Consider the sequence sk+1 = Bk sk + Ck sk (k = 1, 2, . . . ), where Bk , Ck ∈
Rm×m , kBk k ≤ KB , kCk k ≤ KC (k = 1, 2, . . . ). Then for k ≥ 1,
sk+1 = Bk Bk−1 · · · B1 s1 + Dk s1 ,
k−1
where kDk k ≤ kKC (KB + KC )
(5.34)
.
Proof. For k = 1, s2 = B1 s1 + C1 s1 . For k := k + 1,
sk+1
= (Bk + Ck ) (Bk−1 · · · B2 B1 s1 + Dk−1 s1 )
= Bk Bk−1 · · · B1 s1 + Ck Bk−1 · · · B2 B1 s1 + (Bk + Ck ) Dk−1 s1
= Bk Bk−1 · · · B1 s1 + Dk s1 .
Using elementary operations we have
k−1
k−1
+ (k − 1) KC (KB + KC )
kDk k ≤ KC KB
≤ kKC (KB + KC )k−1 .
In our case kBi k ≤ 1 for any i and 0 < µ < 2. If xi ∈ S (ω, δo ), then kCi k ≤
α
∞
K3 ksi k ≤ K3 δ0α . Consider the subsequence {xnr }n=0 and set k = nr. The solution of
recursion (5.30) is given by
sk+r = Bk+r−1 · · · Bk+1 Bk sk + Dk+r,k sk ,
(5.35)
where kDk+r,k k ≤ rKC (1 + KC )r−1 . The indices of Dk+r,k denote the fact that the
summation started from sk .
Substituting Bk+j = I − µRj+1 (xk+j ) (j = 0, . . . , r − 1) we have
sk+r = (I − µRr (xk+r−1 )) · · · (I − µR1 (xk )) sk + Dk+r,k sk ,
¢
¡
where Ri (x) is the orthogonal projection on R AT (x) Ei . We show that
kBk+r−1 · · · Bk+1 Bk k ≤ q1 < 1
holds under certain conditions.
(5.36)
Projection methods for nonlinear algebraic systems
76
¡ ¢
Lemma 118 Let R£i ∈ Rm×m be¤ the orthogonal projection on R ATi for i = 1, . . . , r and
assume that AT = AT1 , . . . , ATr ∈ Rm×m is nonsingular. Then for any Þxed 0 < µ < 2,
k(I − µRr ) · · · (I − µR1 )k2 < 1.
(5.37)
Proof. We exploit Whitney and Meany [248]. For 0 < µ < 2, kI − µRi k2 ≤ 1
for all i. Hence
¡ k(I
¢ − µRi ) yk2 ≤ kyk2 . The equality holds if and only if Ri y = 0, that
we prove that k(I − µRr ) · · · (I − µR1 ) yk2 = kyk2
is if y ∈ R⊥ ATi . Using induction
¡ ¢
holds if and only if y ∈ R⊥ ATi (i = 1, . . . , r). For r = 1, the claim
¡ Tis¢ true. Assume
⊥
that k(I − µRj ) · · · (I − µR1 ) yk2 = kyk2 holds if and only if y ∈ R Ai (i = 1, . . . , j).
Since
kyk2 = k(I − µRj+1 ) [(I − µRj ) · · · (I − µR1 ) y]k2
≤ k(I − µRj ) · · · (I − µR1 ) yk2 ≤ kyk2 ,
¡ ¢
the assumption implies y ∈ R⊥ ATi for i = 1, . . . , j. But for such y,
(I − µRj ) · · · (I − µR1 ) y = y
¡
¢
and we obtain the relation k(I − µRj+1 ) yk2 = kyk2 which implies y ∈ R⊥ ATj+1 . Thus
we showed
k(I − µRr ) · · · (I − µR1 ) yk2 ≤ kyk2 holds with equality if and only if
¡ that
¢
y ∈ R⊥ AT ={0}. Hence for any 0 6= y ∈ Rm ,
k(I − µRr ) · · · (I − µR1 ) yk2 < kyk2
and
k(I − µRr ) · · · (I − µR1 )k2 = max k(I − µRr ) · · · (I − µR1 ) yk2 < 1.
kyk2 ≤1
Taking A = A (ω) we obtain that
k(I − µRr (ω)) · · · (I − µR1 (ω))k2 < 1.
The continuity of the Jacobian matrix A (x) at x = ω implies the existence of numbers
0 < δ1 < δ0 and 0 < q1 < 1 such that
£ T
¤
A (xk+r−1 ) Er , . . . , AT (xk ) E1
is nonsingular and
k(I − µRr (xk+r−1 )) · · · (I − µR1 (xk ))k2 ≤ q1
holds for all xk , . . . , xk+r−1 ∈ S (ω, δ1 ).
(5.38)
¢
¡
Lemma 119 If xk ∈ S (ω, δ2 ) and 0 < δ2 ≤ δ21 (1 + 2K0 K2 )−r , then xk+j ∈ S ω, δ21
j
and ksk+j k ≤ (1 + 2K0 K2 ) ksk k for j = 1, . . . , r.
Proof. Since
and
£
¤−1 T
xk+1 = xk − µAT (xk ) Ei EiT A (xk ) AT (xk ) Ei
Ei F (xk )
£
¤−1 T
ω = ω − µAT (xk ) Ei EiT A (xk ) AT (xk ) Ei
Ei F (ω)
Extensions of the Kaczmarz method
77
we obtain the recursion
£
¤−1 T
Ei [F (xk ) − F (ω)] .
sk+1 = sk − µAT (xk ) Ei EiT A (xk ) AT (xk ) Ei
This implies the inequality
ksk+1 k ≤ ksk k + µK2 kF (xk ) − F (ω)k ≤ (1 + 2K0 K2 ) ksk k .
Hence ksk+j k ≤ (1 + 2K0 K2 )j ksk k ≤ δ1 /2 for j = 1, . . . , r.
We now give two estimates for KC ≥ maxk≤j≤k+r kCj k. If xk , . . . , xk+r ∈
S (ω, δ0 ), then for 0 ≤ j ≤ r,
α
rα
α
α
kCk+j k ≤ K3 ksk+j k ≤ K3 (1 + 2K0 K2 ) ksk k = K4 ksk k ≤ K4 δ0α = K5 .
¢
¡
If xk ∈ S (ω, δ2 ), then xk+j ∈ S ω, δ21 and
ksk+r k ≤ q1 ksk k + kDr+k,k k ksk k
where
r−1
kDr+k,k k ≤ rKC (1 + KC )
≤ rK4 (1 + K5 )r−1 ksk kα = K6 ksk kα .
Hence
1+α
ksk+r k ≤ q1 ksk k + K6 ksk k
.
For any q1 < q < 1 there is a number δ3 > 0 such that q1 + K6 δ3α = q. Then ksk k ≤ δ3
implies
α
ksk+r k ≤ (q1 + K6 ksk k ) ksk k ≤ q ksk k < δ3 .
Let δ ∗ = min {δ2 , δ3 }. Then x0 ∈ S (ω, δ ∗ ) implies that xr ∈ S (ω, δ ∗ ) and ksr k ≤ q ks0 k <
δ ∗ . Consequently, xnr ∈ S (ω, δ ∗ ) and ksnr k ≤ q n ks0 k hold for n ≥ 0. Hence xnr → ω
linearly as n → ∞.
Theorem 120 Suppose that conditions (5.4)-(5.6) are satisÞed and 0 < µ < 2. Then
there exists a number 0 < δ ∗ < δ0 such that for any x0 ∈ S (ω, δ ∗ ) the relaxed Kaczmarz
method (5.28) converges to ω linearly.
Proof. For x0 ∈ S (ω, δ ∗ ), kxnr − ωk ≤ q n kx0 − ωk holds (n ≥ 0). For any
i = nr + j, 1 ≤ j < r, we have
r−1
kxnr − ωk = βq n
kxnr+j − ωk ≤ (1 + 2K0 K2 )


³ 1 ´nr+j
³ ´
qr
β
 1 nr+j

≤ β ³ ´j ≤  ³ ´r−1  q r
.
1
1
r
r
q
q
If we restrict µ such that 0 < ε1 ≤ µ ≤ 2 − ε1 , then bound (5.38) remains true
with another q1 < 1 for any µ ∈ [ε1 , 2 − ε1 ]. Hence the method
£
¤−1 T
Ei F (xk )
xk+1 = xk − µk AT (xk ) Ei EiT A (xk ) AT (xk ) Ei
(5.39)
Projection methods for nonlinear algebraic systems
78
with
0 < ε1 ≤ µk ≤ 2 − ε1 ,
i ≡ k (mod m) + 1
(5.40)
is also linearly convergent under the assumptions (5.4)-(5.6).
The convergence of the Kaczmarz method depends on the product matrix
(I − µRr ) · · · (I − µR1 ) ,
which is not 0, in general. Hence the Kaczmarz method has only linear convergence
with rate ≈ k(I − µRr ) · · · (I − µR1 )k. We give computable estimates for this quantity in
Section 6.2.
5.2
Nonlinear conjugate direction methods
We generalize the Þnitely terminating Galerkin-Petrov or conjugate direction methods of
Pr(i) (i)
Section 4.2 as follows. Assume that 1 ≤ r (i) ≤ m, k=1 mk = m,
(i)
(i)
Pk , Vk
(i)T
Vj
and
(i)
∈ Rm×mk
(i)
(k = 1, . . . , r (i)),
(i)
Aj Pk = 0
´
³
(i)T (i) (i)
6= 0
det Vk Ak Pk
(j < k) ,
(k = 1, . . . , r (i)) .
The last two conditions imply that
h
ir(i)
(i)T (i) (i)
Vj Aj Pk
j,k=1
(5.41)
(5.42)
(5.43)
(5.44)
is a nonsingular block lower triangular matrix. Hence we impose a kind of A-conjugacy.
Algorithm 1 (Nonlinear Conjugate Direction Method)
x1 ∈ Rm
for i = 1, 2, . . .
y1 = xi
for k = 1, . . . , r (i)
³
´−1
(i)
(i)T (i) (i)
(i)T
Vk F (yk )
yk+1 = yk − Pk Vk Ak Pk
end
xi+1 = yr(i)+1
end
(1)
If F (x) = Ax − b and Aj = A (j = 1, . . . , r (1)) the algorithm coincides with
(i)
Algorithm CDM and Þnitely terminates. If Vk ’s are partial permutation matrices, then
we have a row-action method (see, e.g., [43]).
o
n
(i)
(i)
(i)
(i)
Note that r (i), the matrices Pk , Vk and the partition m1 , . . . , mr(i) can
change with the major iteration i. For simplicity we drop the upper index i whenever
possible and use only r (i) as a reminder.
Theorem 121 Assume that conditions (5.4)-(5.6) hold and let the matrices Vk , P k be
such that
° ¡
¢−1 T °
°
°
Vk ° ≤ K2 (k = 1, . . . , r (i) ; i ≥ 1)
(5.45)
°Pk VkT Ak Pk
Nonlinear conjugate direction methods
79
holds. There exist Γ∗ , δ ∗ > 0 such that if
kAk − A (ω)k ≤ Γ∗
(k = 1, . . . , r (i) ; i ≥ 1)
(5.46)
and x1 ∈ S (ω, δ ∗ ), then Algorithm 1 converges to ω with a linear speed. If kAk − A (ω)k ≤
α
K6 kxi − ωk (k = 1, . . . , r (i), i ≥ 1), the order of convergence is at least 1 + α.
Proof. Let K3 = (1 + K0 K2 )m and δ1 ≤ δ0 / (2K3 ). Consider the identity
¡
¢−1 T
Vk [F (yk ) − F (ω)] ,
sk+1 = sk − Pk VkT Ak Pk
(5.47)
where sk = yk − ω (k = 1, . . . , r(i)). If y1 ∈ S (ω, δ1 ) then conditions (5.5) and (5.45)
imply
k
ksk+1 k ≤ (1 + K0 K2 ) ksk k ≤ (1 + K0 K2 ) ks1 k ≤ K3 ks1 k ≤
δ0
2
and yk+1 ∈ S (ω, δ0 /2) for k = 1, . . . , r(i). Let γk = kAk − A (ω)k and consider the
expansion
F (yk ) = F (ω) + Ak (yk − ω) + (Âk − Ak )(yk − ω),
(5.48)
where
°
°
√
°
°
α
°Ak − Âk ° ≤ γk + mK1 ksk k .
By substituting expansion (5.48) into (5.47) we have the nonlinear recursion
sk+1 = Bk sk + Ck sk
(5.49)
with Bk = Im − Rk , Rk = Pk (VkT Ak Pk )−1 VkT Ak and
Ck = Pk (VkT Ak Pk )−1 VkT (Ak − Âk ).
Observe that Rk is an oblique projection of rank mk , N (Rk ) = R⊥ (ATk Vk ) and R(Rk ) =
R(Pk ). We can establish the following two bounds:
√
kIm k + kRk k ≤ m + K2 kAk k
kBk k ≤ √
m + K2 (kA (ω)k + maxk γk ) = KB
≤
and
°
°
°
°
kCk k ≤ K2 °Ak − Âk °
√
α
≤ K2 (γ
kk )
³ k + mK1 ks
¡ ¢α ´
√
= KC .
≤ K2 (maxk γk ) + mK1 δ20
The solution of recursion (5.49) is given by Lemma 117:
sk+1 = Bk · · · B1 s1 + Dk s1 ,
k−1
where kDk k ≤ kKC (KB + KC )
(5.50)
. Thus we have the bound
ksk+1 k ≤ kBk · · · B1 s1 k + kKC (KB + KC )k−1 ks1 k .
(5.51)
Projection methods for nonlinear algebraic systems
80
The Þrst part of the bound is of O (ks1 k) provided that Bk · · · B1 6= 0. If K
´
³C is small
α
enough, say O (ks1 k ), then the second part of the bound is a quantity of O ks1 k1+α .
Consequently, the error of the last minor iterate depends on the properties of the matrix
Qk = Bk · · · B1 . As Rj Rk = 0 for j < k, Theorem 95 implies that Qk is also a projection
with
R(Qk ) = ∩kj=1 N (Rj ) N (Qk ) = ∪kj=1 R(Rj ).
Properties R(Rj ) = R(Pj ) and rank (P ) = m imply
N (Qk ) = R([P1 , . . . , Pk ]).
(5.52)
The relation
(5.53)
R(Qk ) = [∪kj=1 R(ATj Vj )]⊥ = R([Pk+1 , . . . , Pr(i) ])
¡
¢
follows from N (Rj ) = R⊥ ATj Vj (j = 1, . . . , r(i)) and the orthogonality relation
ATj Vj ⊥ Pt (j < t) (see conditions (5.42)-(5.43)). For k = r(i) the matrix Qk is the
zero projection with N (Qk ) = R(P ) and R(Qk ) = {0}. Hence Qr(i) = 0 and we have the
estimate
°
°
°sr(i)+1 ° ≤ r (i) KC (KB + KC )r(i)−1 ks1 k .
Let γk ≤ Γ for all i and k. As
µ
µ
¶
µ ¶α ¶
√
√
δ0
KB + KC ≤ m + K2 kA (ω)k + 2 max γk + mK1
k
2
µ
µ ¶α ¶
√
√
δ0
≤ m + K2 kA (ω)k + 2Γ + mK1
= K4
2
we can write
µ
¶
r (i) KC (KB + KC )r(i)−1 ≤ K5 max γk + ks1 kα ,
k
√
where K5 = mK2 max {1, mK1 K3α } K4m−1 . Hence
µ
¶
kxi+1 − ωk ≤ K5 max γk + kxi − x∗ kα kxi − ωk .
k
Let 0 < Γ∗ ≤ Γ and δ ∗ ≤ δ1 be so small that K5 (Γ∗ + (δ ∗ )α ) ≤ 1/2. Then
kxi+1 − ωk ≤
1
kxi − ωk .
2
If x1 ∈ S (ω, δ ∗ ) and γk ≤ Γ∗ for all i and k, then
1
kxi+1 − ωk ≤ kxi − ωk ≤ · · · ≤
2
µ ¶i
1
kx1 − ωk .
2
Thus we proved the linear convergence. If maxk γk ≤ K6 ks1 kα , then
µ
¶
α
α
K5 max γk + ks1 k
≤ K7 ks1 k
k
(5.54)
Nonlinear conjugate direction methods
with K7 = K5 max {1, K6 } and
81
°
°
°sr(i)+1 ° ≤ K7 ks1 k1+α .
Assume now that 0 < δ ∗ ≤ δ1 satisÞes K7 (δ ∗ )α ≤ 1/2. Then for y1 = xi ∈ S (ω, δ ∗ )
the inequality kxi+1 − ωk ≤ (1/2) kxi − ωk holds. If x1 ∈ S (ω, δ ∗ ) then we have xi → ω
(i → +∞). The order of convergence is given by the much sharper estimation
−1/α
kxi+1 − ωk ≤ K7
1/α ∗
where K7
³
³
´(1+α)i
´1+α
1/α
−1/α
1/α
K7 kxi − ωk
K7 δ ∗
≤ K7
δ ≤ (1/2)1/α < 1.
Remark 122 If maxk γk ≤ K6 ks1 kβ with 0 < β ≤ α, then the order of convergence is
1 + β.
Theorem 121 is a generalization of the convergence results [96], [98], [102] developed for the nonlinear ABS methods. In the subsequent sections we present several
applications of the above theorem which indicate its generality. It is worth noting that
the original proofs of some special cases are more complicated than those we present here.
The next remarks reveal essential properties of the minor iteration structure of the nonlinear conjugate direction methods. This is also due to the proof technique of the previous
theorem.
Assume now that
max γk ≤ K6 ks1 kβ ,
k
(i ≥ 1, 0 < β ≤ α) .
(5.55)
The proof of Theorem 121 reveals that in the minor iteration steps the error
vector sk+1 has a nonvanishing Þrst order component Bk sk that dominates the (1 + β)order component Ck sk . The quantity kBk k is minimal if and only if R(Pk ) = R(ATk Vk ).
Consider the estimate
k−1
ks1 k .
(5.56)
ksk+1 k ≤ kBk · · · B1 s1 k + kKC (KB + KC )
³
´
The second part of the bound is of O ks1 k1+β . Hence the dominating part of the error
sk+1 is given by Bk · · · B1 s1 in the¡£term of s1 . The¤¢projection Qk = Bk · · · B1 maps the
error vector s1 = xi − ω onto R Pk+1 , . . . , Pr(i) along R ([P1 , . . . , Pk ]). Therefore
kBk · · · B1 s1 k depends on s1 and the choice of P1 , . . . , Pk .
This observation has some consequences for the inexact Newton methods of
Dembo, Eisenstat and Steihaug [59] (for details, see [102]).
We point out that the property Br(i) · · · B1 = 0 makes the difference between
the convergence behavior of the nonlinear conjugate direction methods and the Kaczmarz
method, where the corresponding matrix
(I − µRr ) · · · (I − µR1 )
is nonzero.
The upper estimation kBk · · · B1 s1 k ≤ kBk · · · B1 k ks1 k has a minimal error constant if and only if the projection Bk · · · B1 is symmetric. The matrix Qk is symmetric
if and only if R(Qk ) = N ⊥ (Qk ). From (5.52) and (5.53) it follows that Qk is symmetric for all k = 1, . . . , r(i) if and only if P T P is a diagonal matrix. It is noted that
kBk · · · B1 k ≤cond(P ) holds in any submultiplicative unitarily invariant matrix norm.
Projection methods for nonlinear algebraic systems
82
Finally we note that condition (5.45) is equivalent with the condition
° ¡
¢−1 T °
°
°
kRk k = °Pk VkT Ak Pk
Vk Ak ° ≤ Γ2 (k = 1, . . . , r(i); i ≥ 1)
(5.57)
if kAk − A (ω)k ≤ Γ∗ and Γ∗ is small enough. Therefore condition (5.45) which is the
key of the convergence is satisÞed if and only if the oblique projections Rk are uniformly
bounded.
Next we show that condition (5.45) is an essential requirement for the convergence
under the conditions (5.4)-(5.6).
Example 123 Consider the problem
f1 (x1 , x2 ) = x21 − x1 = 0
f2 (x1 , x2 ) = x21 − x2 = 0
the solutions of which are [0, 0]T and [1, 1]T . The problem clearly satisÞes conditions
(5.4)-(5.6). DeÞne the following nonlinear conjugate direction method:
·
¸
1 ai
r (i) = 2, P = I, Ak = A (xi ) (k = 1, 2) , V =
0 1
for i = 1, 2, . . . . Let xi = [µi , τi ]T be the ith major iterate and let the initial approximation
x1 = [µ1 , τ1 ]T be selected such that µ1 > 1 and τ1 > 0. Using straightforward calculations
one can show that the new major iterate xi+1 is given by


µ2
i
xi+1

=
2µi −1
(1+a)µ4i
(2µi −1)2
−
aµ2i
2µi −1

.
T
Observe that xi+1 = [µi+1 , τi+1 ] does not depend on τi . Since for µ1 > 1 the iteration
sequence µi+1 = µ2i /(2µi −1) (i = 1, 2, . . . ) is monotone decreasing and converges to 1, the
Þrst components of the major iterates xi converge to the Þrst component of the solution
T
[1, 1] . In the ith major iteration we select the value
ai =
(2µi − 1)2
,
µ2i (µi − 1)2
which depends only on the Þrst component of xi . Then it can be shown that
ai µ2i
(1 + ai ) µ4i
2 − 2µ − 1 > 2
i
(2µi − 1)
(µi > 1).
Consequently xi can not converge to any solution. Furthermore we have
kP k2 = 1,
kV kF → ∞
° ¡
¢−1 T °
°
°
V1 ° =
°P1 V1T A1 P1
F
(i → ∞) ,
1
→1
2µi − 1
° ¡
¢−1 T °
¢1/2
¡
°
°
V2 ° = 1 + a2i
→∞
°P2 V2T A2 P2
F
Obviously condition (5.45) does not hold.
(i → ∞) ,
(i → ∞) .
Particular methods
83
Observe that kP k is bounded, while kV k is not. The following simple result shows
that a similar behavior may not be expected when condition (5.45) holds.
Proposition 124 Condition (5.45) implies
°
°
°P D−1 V T ° ≤ mK2 ,
°
°
°
°
¢
¡
(5.45) and either °DP −1 ° ≤ Γ1 or °V −1 ° ≤ Γ2
where D = diag VkT Ak Pk . If °condition
°
hold, then either kV k ≤ γ1 or °P D−1 ° ≤ γ2 also holds, respectively.
Proof. Since
P D−1 V T =
m
X
Pk (VkT Ak Pk )−1 VkT
k=1
°
°
°
°
holds the inequality °P D−1 V T ° ≤ mK2 follows from (5.45). Assume that °DP −1 ° ≤ Γ1
holds. Then
° T° °
¡
¢°
°V ° = °DP −1 P D−1 V T ° ≤ Γ1 mK2 .
°
° °¡
°
°
°
¢
Similarly, if °V −1 ° ≤ Γ2 holds then °P D−1 ° = ° P D−1 V T V −T ° ≤ mK2 Γ2 .
In the case of nonlinear ABS methods condition (5.45) can be relaxed at the price
of losing convergence order [102], [65].
5.3
Particular methods
(i)
In practice Ak is an approximation to A (ω). Before discussing any particular method we
have to remark that the rank reduction procedure and the other conjugation procedures
mentioned in Section 4.2 can be used to produce directions Vk and Pk that satisfy (5.42)(5.43).
5.3.1 Methods with Þxed direction matrices
(i)
(i)
Assume that Ak = A, Vk
(i)
= Vk , Pk = Pk , r (i) = r and
o
n
(i)
(i)
m1 , . . . , mr(i) = {m1 , . . . , mr }
for all i ≥ 1. Then Algorithm 1 has the form
Algorithm 2
x1 ∈ Rm
for i = 1, 2, . . .
y1 = xi
for k = 1, . . . , r
¡
¢−1 T
yk+1 = yk − Pk VkT APk
Vk F (yk )
end
xi+1 = yr+1
end
By setting
° ¡
¢−1 T °
°
°
Vk °
K2 = max °Pk VkT Ak Pk
1≤k≤r
(5.58)
we satisfy condition (5.45). Hence Algorithm 2 has linear convergence rate provided that
approximations x1 ≈ ω and A ≈ A (ω) are good enough.
Projection methods for nonlinear algebraic systems
84
The multistep version of Algorithm 2 uses Vk , Pk and Ak for a Þxed number of
iterations and then recalculates them. The form of the algorithm is the following
Algorithm 3 (Multistep version of Algorithm 2)
x1 ∈ Rm
for i = 1, 2, . . .
z1 = xi
for j = 1, . . . , t + 1
(j)
y1 = zj
for k = 1, . . . , r
³
´−1
´
³
(j)
(j)
(i)
(i)T (i) (i)
(i)T
(j)
Vk F yk
yk+1 = yk − Pk Vk Ak Pk
end
(j)
zj+1 = yr+1
end
xi+1 = zt+2
end
Theorem 125 Assume that conditions (5.4)-(5.6) hold with α = 1 and the matrices V, P
are such that
° ¡
¢−1 T °
°
°
Vk ° ≤ K2 (k = 1, . . . , r; i ≥ 1)
(5.59)
°Pk VkT Ak Pk
holds. There exist Γ∗ , δ ∗ > 0 such that if
kAk − A (ω)k ≤ Γ∗
(k = 1, . . . , r; i ≥ 1)
(5.60)
and x1 ∈ S (ω, δ ∗ ), then Algorithm 3 converges to ω with a linear speed. If kAk − A (ω)k ≤
K6 kxi − ωk (k = 1, . . . , r, i ≥ 1), the order of convergence is at least t + 2.
Proof. By repeating the arguments of the proof of Theorem 121 we obtain the
inequality
¶
µ
kzj+1 − ωk ≤ K5 max γk + kzj − ωk kzj − ωk
k
(j = 1, . . . , t + 1) .
If 0 < Γ∗ ≤ Γ and δ ∗ ≤ δ1 are such that K5 (Γ∗ + (δ ∗ )α ) ≤ 1/2, then kzj+1 − ωk ≤
(1/2) kzj − ωk and
kxi+1 − ωk ≤
1
kxi − ωk .
2t+1
Thus we have the linear convergence speed. If kAk − A (ω)k ≤ K6 kxi − ωk, then by
induction we can prove that kzj − ωk ≤ K (j) kz1 − ωkj (j ≥ 2). Hence we have
kxi+1 − ωk ≤ K (t+2) kxi − ωkt+2
which is one order better than before. Thus the convergence order is not less than t + 2.
It is noted that Γ∗ and δ ∗ must be signiÞcantly smaller than those of Theorem 121.
The multistep version improves the speed of Algorithm 2. Similar results on
multistep versions of classical methods can be found in Ortega and Rheinboldt [202].
Particular methods
85
5.3.2 The nonlinear ABS methods
The Þrst (unscaled) version of the nonlinear ABS algorithms was developed by Abaffy,
Galántai and Spedicato [8]. The following generalization of that class was given by Abaffy
and Galántai [6].
Algorithm 5 (The block nonlinear ABS method)
x1 ∈ Rm
for i = 1, 2, . . .
y1 = xi , H1 = I
for k = 1, . . . , r (i)
³
´
Pk
Pk
τjk ≥ 0,
τ
=
1
uk = j=1 τjk yj
jk
j=1
Pk = HkT Zk
¡
¢−1 T
yk+1 = yk − Pk VkT A (uk ) Pk
V F (yk )
¡ T k T
¢−1 T
T
Wk Hk
Hk+1 = Hk − Hk A (uk ) Vk Wk Hk A (uk ) Vk
end
xi+1 = yr(i)+1
end
The parameters Zk , Wk ∈ Rm×mk are subject to the conditions
¡
¢
¡
¢
det PkT AT (uk ) Vk 6= 0 and det WkT Hk AT (uk ) Vk 6= 0.
(5.61)
The generation of direction matrices Vk , Pk is done through the rank reduction algorithm.
The algorithm coincides with the linear block ABS method if F (x) = Ax − b.
A particular nonlinear block ABS method is given by the parameter matrices
V = [V1 , . . . , Vr(i) ], W = [W1 , . . . , Wr(i) ], Z = [Z1 , . . . , Zr(i) ] (V, W, Z ∈ Rm×m ) and
r(i)
T = [τij ]i,j=1 , where τ11 = 1 and τij = 0 for i > j. By deÞnition uk = [y1 , . . . , yr(i) ]T ek ,
where ek ∈ Rr(i) is the kth unit vector. Note that the partition and the parameters may
vary with the major iterates. The unscaled nonlinear ABS class is a subset of the block
nonlinear ABS class (V = I, r(i) = m). The block nonlinear ABS methods coincide with
the linear block ABS method on linear systems of the form F (x) = Ax − b = 0.
The weight matrix T may provide different strategies for choosing the ”stepsize”
(VkT AT (uk )Pk )−1 VkT F (yk ).
For example the choice
uk = yk
(k = 1, . . . , r(i); T = Ir(i) )
corresponds to the Seidel principle and reevaluates the Jacobian matrix ”row” by ”row”.
The choice
uk = y1
(k = 1, . . . , r(i); T = [e1 , . . . , e1 ] ∈ Rr(i)×r(i) )
which keeps the Jacobian matrix Þxed was suggested by Stewart [229].
The nonlinear ABS methods contain the continuous Brown and Brent methods
and the Gay-Brown class of methods [124]. For other connections see [8], [6], [9], [102],
[164], and [163].
Using the properties of the rank reduction algorithm one can show that matrices
Ak = A (uk ), Pk and Vk satisfy conditions (5.42) and (5.43). It is also easy to see that
√
kAk − A (ω)k = kA (uk ) − A (ω)k ≤ mK1 max ksj kα
1≤j≤k
√
≤ mK1 K3α kxi − ωkα .
Projection methods for nonlinear algebraic systems
86
Hence by Theorem 121, Algorithm 5 has local convergence of order 1 + α.
Theorem 121 is a generalization of the convergence results of Galántai [96], [98]
and [102]. Other local convergence results were proved by Abaffy, Galántai, Spedicato [8],
Abaffy, Galántai [6], Abaffy [2], Deng, Zhu [65], [66], Huang [153], [155], Deng, Spedicato,
Zhu [64], Spedicato, Huang [224] and Zhu [256].
There are several modiÞcations of the nonlinear ABS methods to obtain greater
efficiency at least in principle. These modiÞcations include the multistep version of the
nonlinear ABS methods by Huang [151], the truncated nonlinear ABS methods by Deng,
Chen [62] and Abaffy [4], various discretizations by Huang [150], [154], Jeney [164], [163],
[165], [166], [168], Spedicato, Chen, Deng [222], [49], Deng, Chen [63], Zhu [255] and the
quasi-Newton ABS methods discussed in the next section. These modiÞed nonlinear ABS
methods also have their own local convergence proofs. All these convergence theorems
however are generally complicated even for special cases (see, e.g. [183]). The reason for
this is the very complicated structure of the iterations. In contrast to these results, the
proof of our theorem is relatively simple and structural, although its application is not
always easy. In fact, all earlier convergence results follow from Theorems 121 and 125.
We demonstrate this fact in the cases we study here.
We can reformulate Algorithm 5 by incorporating the scaling matrix V into F
as follows. Let Fe(x) = V T F (x), and Ãi = [AT (u1 )V1 , . . . , AT (ur )Vr ]T . Let again Im =
[E1 , . . . , Er ] (Ek ∈ Rm×mk , k = 1, . . . , r). Then step i of Algorithm 5 has the equivalent
form
y1
yk+1
xi+1
= xi
³
´−1
= yk − Pk EkT Ãi Pk
EkT Fe (yk )
= yr+1
(k = 1, . . . , r)
(5.62)
with the update procedure
H1
Hk+1
Pk
= I
)
³
´−1
= Hk − Hk ÃTi Ek WkT Hk ÃTi Ek
WkT Hk
= HkT Zk
(k = 1, . . . , r)
(5.63)
Note that relation (5.42) holds provided that no breakdown occurs.
The next result gives the condition of well-deÞniteness for the nonlinear GILUABS
methods.
Proposition 126 If W = Z = Q then step i of Algorithm 5 is breakdown free if and only
ei Q is block strongly nonsingular, then
if Ãi Q is block strongly nonsingular. If A
P = QUA−1
e Q.
(5.64)
i
Proof. The algorithm is breakdown free iff PkT AT (uk )Vk = WkT Hk AT (uk ) Vk =
is nonsingular for k = 1, ..., r. By Theorem 69 this holds if and only if
I Ãi Q = Ãi Q is block strongly nonsingular. Theorem 72 implies the rest.
Using Theorem 121 we study the following special cases of the nonlinear GILUABS methods. In all cases r (i) ≡ m.
(i) The implicit LU or Brown method:
WkT Hk ÃTi Ek
vk = wk = zk = ek
(k = 1, . . . , m);
(5.65)
(ii) The scaled Huang method:
zk = wk = A (uk ) vk
(k = 1, . . . , m);
(5.66)
Particular methods
87
(iii) Symmetrized conjugate direction ABS methods:
W = Z = Q,
vk = C (uk ) pk
(k = 1, . . . , m),
(5.67)
where C (x) ∈ Rm×m is continuous and G (x) = C T (x) A (x) is symmetric and positive
deÞnite in S (ω, δ0 ).
We prove the following results.
Theorem 127 (i) The implicit LU method is breakdown free in a suitable ball S(ω, δ 0 )
(0 < δ 0 ≤ δ0 ), if A(ω) is strongly nonsingular.
(ii) The scaled Huang method is breakdown free in a suitable ball S(ω, δ 0 ) (0 < δ 0 ≤ δ0 ),
if the scaling matrices V are nonsingular with cond(V ) ≤ Γ (Γ > 0).
(iii) Let T = [e1 , . . . , e1 ] ∈ Rm×m . The SCD ABS methods are breakdown free in the ball
S(ω, δ0 ), if Qi = Q is nonsingular (i ≥ 1).
Proof. (i) For the implicit LU method Q = I and the strong nonsingularity of
Ãi is the necessary and sufficient condition of the well-deÞniteness. If A(ω) is strongly
nonsingular, then A(x) is also strongly nonsingular in a suitable ball S(ω, δ 0 ) (0 < δ 0 ≤ δ0 ).
If δ ∗ is chosen such that δ 0 ≥ δ ∗ holds, where δ ∗ > 0 is deÞned in Theorem 121, then
every Ãi is strongly nonsingular (i ≥ 1) provided that x1 ∈ S(ω, δ ∗ ). (ii) For the scaled
Huang method Z = W = ÃTi . Hence the condition of well-deÞniteness is the strong
nonsingularity of Ãi ÃTi , which is satisÞed if and only if Ãi is nonsingular for every i ≥ 1.
Since A(ω) is nonsingular by assumption (5.4) A(x) is also nonsingular in a° suitable° ball
S(ω, δ 0 ) (0 < δ 0 ≤ δ ). Consequently V must be nonsingular. If δ 0 < 1/[K1 Γ °A(ω)−1 °]1/α
and δ ∗ ≤ δ 0 then every Ãi is nonsingular (i ≥ 1) provided that x1 ∈ S(ω, δ ∗ ) and V is
nonsingular with cond(V ) ≤ Γ (see [98], Lemma 1). (iii) If T = [e1 , . . . , e1 ] ∈ Rm×m ,
then Ãi Q = V T A(y1 )Q. Applying Theorem 105 we obtain the requested result.
If T 6= [e1 , ..., e1 ] in the SCD ABS subclass, then the biorthogonality property
V T A(y1 )P = D (D is diagonal) is generally lost.
Example 128 Consider again the problem
f1 (x1 , x2 ) = x21 − x1 = 0
f2 (x1 , x2 ) = x21 − x2 = 0
and deÞne the following SCD ABS method:
·
¸
1 0
H1 = E2 , W = Z = E2 , T =
,
0 1
vi = A(yi )pi
(i = 1, 2).
A straightforward calculation shows that v2T A(y2 )p1 6= 0.
For any m × m matrix B we denote the unique unit lower triangular (block) LU
factorization by B = LB VB . We exploit Theorem 72 in proving
Theorem 129 Assume that W = Z = Qi for i ≥ 1. Condition (5.45) is satisÞed if
°° °
°
°° °
°
(5.68)
°Qi VÃ−1Q ° °Ãi ° ≤ Γ2 (i ≥ 1)
i
holds.
i
Projection methods for nonlinear algebraic systems
88
Proof. We use the fact that condition (5.45) is equivalent with condition (5.57).
Using the formulation (5.62)-(5.63) of Algorithm 5 we change condition (5.57) to
° ³
°
´−1
°
°
T
°Pk EkT Ãi Pk
°
Ã
E
(k = 1, . . . , r (i) ; i ≥ 1) .
(5.69)
k i ° ≤ Γ2
°
Let D = diag(E1T Ãi P1 , . . . , ErT Ãi Pr ). Then Pk (EkT Ãi Pk )−1 = P D−1 Ek and P D−1 =
Qi VÃ−1Q imply that
i
i
° °
° ³
°
° °
°° °
´−1
° °
°° °
°
° °
° °
°
−1 ° °
T
−1
T
−1 ° °
°
°Pk EkT Ãi Pk
D
Ã
=
≤
P
D
=
E
Ã
E
E
Ã
V
°P
°Q
°
°
°
k k i
i
i à Q ° °Ãi ° .
k i°
°
i
i
°
°° °
°
°° °
If °Qi VÃ−1Q ° °Ãi ° ≤ Γ2 then (5.57) also holds, which was to be proven.
i i
The result gives the possibility of checking condition (5.45) in terms of the free
parameters, in advance.
Theorem 130 (i) The Brown method is convergent, if A(ω) is strongly nonsingular.
(ii) The
Huang method is convergent, if the scaling matrices V satisfy kV − V0 k ≤
°
° scaled
1/(2 °V0−1 °) (det (V0 ) 6= 0).
°
°° °
°
°° °
Proof. (i) For Qi = Im condition (5.68) has the form °VÃ−1 ° °Ãi ° ≤ Γ2 (i ≥ 1).
i
nonsingular.
This obviously holds in some ball S(ω, δ 0 ) (0 < δ 0 ≤ δ0 ), if A(ω) is strongly
°
°° °
° T −1 ° ° °
T
(ii) For the Huang subclass Q = Ãi implying that condition (5.68) is °Ãi VÃ ÃT ° °Ãi ° ≤
¡ i° i °¢
Γ2 (i ≥ 1). This is satisÞed in S(ω, δ 0 ) (0 < δ 0 ≤ δ0 ), if kV − V0 k ≤ 1/ 2 °V0−1 ° and
det (V0 ) 6= 0.
Note that the sufficient conditions of the local convergence are essentially the
same as the conditions of being breakdown free. For the SCD ABS methods we can give
a much better result.
Theorem 131 Any breakdown free SCD ABS method is convergent.
Proof. We give two proofs. As yT G(x)y ≥ µ1 (x)yT y we have the estimation
°
° °
°
°pk (v T A(uk )pk )−1 vT ° = °pk (pT G(uk )pk )−1 pT F T (uk )° ≤ kF (uk )k ≤ K2
k
k
k
k
µ1 (uk )
n
o
with K2 = max kF (x)k /µ1 (x) |x∈S (ω, δ 0 ) , where µ1 (x) > 0 is the smallest eigenvalue
of G(x) and 0 < δ 0 < δ0 . As condition (5.45) holds, the local convergence is proven.
For the second proof let G(x) = L(x)L(x)T be the LLT -factorization of G(x) and let
w = L(uk )T pk . Then we have
°
°
°
° ¡
°° ¡
¢−1 T °
¢−1 T °
°
°
°
°
°°
°°
vk ° ≤ °L (uk )−T ° °w wT w
w ° °L (uk )−1 F T (uk )° ≤
°pk vkT A (uk ) pk
°
°
°°
°
°
°°
≤ °L (uk )−T ° °L (uk )−1 F T (uk )°
which is clearly bounded in S(ω, δ 0 ). Hence the convergence is proved again.
The second proof exploits ideas of Abaffy [2]. The Theorem covers a special case
investigated in [66]. For the SCD ABS methods we improve Proposition 124.
Proposition 132 There is a ball S(ω, δ 0 ) (0 < δ 0 < δ0 ) such that kV k is bounded if and
only if kP k is bounded.
Particular methods
89
n
o
√
Proof. If kP k ≤ γ then kV k ≤ mγ max kF (x)k |x ∈ S (ω, δ 0 ) . If kV k ≤ τ
n°
o
°
√
then kP k ≤ mτ max °C −1 (x)° |x ∈ S (ω, δ 0 ) .
Applications of the nonlinear ABS methods are given in [152], [158], [210], [81]
[209], [208].
5.3.3 Quasi-Newton ABS methods
The quasi-Newton methods are very efficient methods. They save on the Jacobian calculation and the cost of linear solver. Although their convergence rate is only linear or
superlinear they often outperform the Newton method. Huang was the Þrst who combined the ideas of the quasi-Newton methods and the nonlinear ABS methods [159], [156],
[157]. His
save on the Jacobian but loose on the convergence rate and still re¢
¡ methods
quire O m3 ßops per step. A similar ABS type method was developed by Ge Rendong
[125]. Galántai and Jeney [118], [119] derived some signiÞcantly better quasi-Newton
ABS methods which are competitive with the Broyden method considered as the best
quasi-Newton method. Here we present a case when the quasi-Newton approaches outperform the Newton method. The section contains our quasi-Newton ABS methods and
the related numerical experiments. The presented local convergence theorems are due to
Galántai [105], [106], [107] and follow from Theorem 121.
For the nonlinear equation F (x) = 0 (F : Rm → Rm ) the Newton method
−1
xk+1 = xk − [F 0 (xk )]
F (xk )
k = 0, 1, . . .
(5.70)
has local quadratic convergence (α = 1). One step
¡ of¢ the Newton method costs one
Jacobian and one function evaluation and c1 m3 + O m2 (c1 > 0) ßops in arithmetic operations. It is known
practice that linearly convergent iteration methods with com¢
¡ from
putational cost O m2 ßops per step may signiÞcantly outperform the Newton method
in total execution time. Using the mesh-independence principle [15] we can easily prove
this observation.
Consider the solution of the abstract nonlinear operator equation
F (x) = 0
(F : X → Y ) ,
(5.71)
where X and Y are Banach spaces and x+ denotes a solution. Assume that equation
(5.71) is solved through a family of discretized problems
Fj (x) = 0
(Fj : Xj → Yj , j > 1) ,
(5.72)
where Xj and Yj are Þnite-dimensional spaces. For simplicity j denotes the number of
unknowns and ωj+ is the solution. The Newton-method on problem (5.71) has the form
£ ¡ ¢¤−1 ¡ k ¢
xk+1 = xk − F 0 xk
F x
(k = 0, 1, . . . ) .
The Newton method on discretization level j (problem (5.72)) has the form
£ ¡ ¢¤−1 ¡ k ¢
(k = 0, 1, . . . ) .
= xkj − Fj0 xkj
Fj xj
xk+1
j
(5.73)
(5.74)
Under appropriate conditions [15] it can be shown that for any fixed ε > 0 there exists a
constant j_0 = j_0(ε) such that
| min{ k ≥ 0 | ‖x^k − ω‖ < ε } − min{ k ≥ 0 | ‖x_j^k − ω_j^+‖ < ε } | ≤ 1   (5.75)
holds for all j ≥ j_0(ε). In other words, the behavior of the abstract Newton method
determines the necessary iteration number for large j. Let
k_1 = min{ k ≥ 0 | ‖x^k − ω‖ < ε }.   (5.76)
For linearly convergent iteration methods of the form
y_j^{k+1} = y_j^k − (A_j^k)^{-1} F_j(y_j^k)   (A_j^k ≈ F_j'(ω_j^+), k = 0, 1, . . .)   (5.77)
we have a similar behavior [172], that is,
k_2 = min{ k ≥ 0 | ‖y_j^k − ω_j^+‖ < ε }   (5.78)
is independent of the level j. If method (5.77) costs O(j^2) flops per step, then the total cost
of an approximate solution with ε precision is O(k_2 j^2) flops. For the Newton method an
approximate solution with the same ε precision costs c_1 k_1 j^3 + O(k_1 j^2) flops. It is obvious
that for j large enough the inequality
c_1 k_1 j^3 > O(k_2 j^2)   (5.79)
holds. Thus the total cost (execution time) of the Newton method is larger than that of
method (5.77).
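The dominance expressed by (5.79) is easy to see numerically. The following small sketch is an illustration only and is not part of the dissertation; the constants c_1, k_1, k_2 and the O(j^2) cost factor are assumed values chosen for the example.

    # Illustrative flop counts for (5.79); c1, k1, k2 and the factor 10 are assumptions.
    c1, k1, k2 = 2.0 / 3.0, 8, 8
    for j in (100, 1000, 10000):
        newton_total = c1 * k1 * j**3 + 10.0 * k1 * j**2   # k1 Newton steps
        cheap_total = 10.0 * k2 * j**2                     # k2 linearly convergent steps
        print(j, newton_total / cheap_total)
    # The ratio grows roughly linearly in j: the cheaper iteration wins on large problems.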
The most important methods of the form (5.77) with O(m^2) flops iteration cost
are the quasi-Newton methods [34], which have the following form [67].
Quasi-Newton method (iteration k)
Solve Ak sk = −F (xk ) for sk
xk+1 = xk + sk
yk = F (xk+1 ) − F (xk )
Ak+1 = φ (Ak , sk , yk )
The matrix A_0 ≈ F'(x^+) is given and A_k is updated such that
A_{k+1} s_k = y_k   (k ≥ 0).   (5.80)
The rank one updates have the form
A_{k+1} = A_k + (y_k − A_k s_k) z_k^T / (z_k^T s_k),   (5.81)
where z_k ∈ R^m is chosen properly.
The convergence speed of the quasi-Newton methods is superlinear [40], [67]. The
linear system A_k s_k = −F(x_k) can be solved in O(m^2) flops either using the Sherman-Morrison-Woodbury formula or a fast QR update [67], [128]. Thus the computational
cost of one iteration is O(m^2) flops plus one function evaluation. In practice, the quasi-Newton methods with the Broyden updates are widely used for solving large nonlinear
systems and unconstrained optimization problems.
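As an illustration only (not the dissertation's code), the following sketch implements the generic quasi-Newton iteration with the Broyden choice z_k = s_k in (5.81); for clarity it re-solves the linear system directly instead of using the O(m^2) Sherman-Morrison-Woodbury or QR-update bookkeeping, and the toy test system and starting data are assumptions.

    import numpy as np

    def broyden(F, x0, A0, tol=1e-10, maxit=100):
        """Quasi-Newton iteration with the Broyden rank-one update (z_k = s_k)."""
        x, A = np.asarray(x0, float), np.array(A0, float)
        for _ in range(maxit):
            Fx = F(x)
            if np.linalg.norm(Fx) <= tol:
                break
            s = np.linalg.solve(A, -Fx)                  # A_k s_k = -F(x_k)
            x = x + s
            y = F(x) - Fx                                # y_k = F(x_{k+1}) - F(x_k)
            A = A + np.outer(y - A @ s, s) / (s @ s)     # update (5.81)
        return x

    # toy system with root (1, 2); A0 is the Jacobian at the starting point
    F = lambda x: np.array([x[0]**2 + x[1] - 3.0, x[0] + x[1]**2 - 5.0])
    x = broyden(F, x0=[1.2, 1.8], A0=[[2.4, 1.0], [1.0, 3.6]])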
The quasi-Newton ABS methods of Galántai and Jeney [118], [119] are derived
in the following way. We keep A(u_k) fixed during the minor iteration loop and calculate
it outside the minor iteration loop.
Quasi-Newton ABS method 1 (iteration k of QNABS1)
y_1 = x_k, H_1 = I
for i = 1, . . . , m
   p_i = H_i^T z_i
   y_{i+1} = y_i − (v_i^T F(y_i)) p_i / (v_i^T A_k p_i)
   H_{i+1} = H_i − H_i A_k^T v_i w_i^T H_i / (w_i^T H_i A_k^T v_i)
end
x_{k+1} = y_{m+1}
s_k = x_{k+1} − x_k
Y_k = F(x_{k+1}) − F(x_k)
A_{k+1} = φ(A_k, s_k, Y_k)
The computation cost of the directions p_i is O(m^3) flops per iteration. This cost can
be reduced to O(m^2) flops in the following cases:
(i) P is the inverse of V^T A_k;
(ii) P is orthogonal up to a diagonal scaling.
In these cases we can take advantage of the Sherman-Morrison-Woodbury formula or the
fast QR-update algorithm [67], [128].
For the rest of the section we assume that W = Z and V^T A_k Z is strongly nonsingular.
Then by Theorem 72
P = Z U_{V^T A_k Z}^{-1}.   (5.82)
Proposition 133 P = (V^T A_k)^{-1} holds if and only if Z = A_k^{-1} V^{-T} U, where U is some
unit upper triangular matrix.
Proof. If P = (V^T A_k)^{-1}, then P = Z U_{V^T A_k Z}^{-1} = A_k^{-1} V^{-T}, implying that Z =
A_k^{-1} V^{-T} U_{V^T A_k Z}. In turn, assume that Z = A_k^{-1} V^{-T} Ũ, where Ũ is some unit upper
triangular matrix. The relations V^T A_k Z = Ũ and P = Z U_{V^T A_k Z}^{-1} imply that P = Z Ũ^{-1} =
A_k^{-1} V^{-T}.
By observing that v_i^T A_k p_i = 1 for i = 1, . . . , m in the case P = (V^T A_k)^{-1}, we
can define the following special case of Algorithm QNABS1.
Quasi-Newton ABS method 2 (iteration k of QNABS2)
y_1 = x_k
Calculate P = A_k^{-1} V^{-T}
for i = 1, . . . , m
   y_{i+1} = y_i − (v_i^T F(y_i)) p_i
end
x_{k+1} = y_{m+1}
s_k = x_{k+1} − x_k
Y_k = F(x_{k+1}) − F(x_k)
A_{k+1} = φ(A_k, s_k, Y_k)
The following result can also be derived from Propositions 76 and 83.
Proposition 134 P is orthogonal up to a diagonal scaling if and only if Z = ATk V L−T ,
where L is some lower triangular matrix.
Proof. If P is orthogonal up to a diagonal scaling, then a unique diagonal matrix
D_1 exists such that P D_1 is orthogonal. The orthogonality condition
(P D_1)^T (P D_1) = D_1 U_{V^T A_k Z}^{-T} Z^T Z U_{V^T A_k Z}^{-1} D_1 = E_n
implies that Z^T Z = U_{V^T A_k Z}^T D_1^{-2} U_{V^T A_k Z}. The symmetric and positive definite matrix
Z^T Z has a unique LDL^T factorization
Z^T Z = U_{Z^T Z}^T D_{Z^T Z} U_{Z^T Z},
from which U_{V^T A_k Z} = U_{Z^T Z} follows. If two nonsingular matrices B and C have LU
factorizations B = L_1 U and C = L_2 U with the same unit upper triangular matrix U, then
a unique lower triangular matrix L exists such that B = LC. Thus there is a lower
triangular matrix L such that V^T A_k Z = L Z^T Z. This implies that Z = A_k^T V L^{-T}. We
prove the reverse statement in two steps. We first select L = E and Z = A_k^T V. Then
V^T A_k Z = V^T A_k A_k^T V and P = A_k^T V U_{V^T A_k A_k^T V}^{-1}. The matrix P D_1 = A_k^T V U_{V^T A_k A_k^T V}^{-1} D_1
is clearly orthogonal, if D_1^2 = D_{V^T A_k A_k^T V}^{-1}. Secondly we choose Z̃ = A_k^T V L^{-T} with some
lower triangular matrix L. Then the unique LDU factorization of V^T A_k Z̃ is given by
V^T A_k A_k^T V L^{-T} = L_{V^T A_k A_k^T V} (D_{V^T A_k A_k^T V} D̃)(D̃^{-1} U_{V^T A_k A_k^T V L^{-T}}),
where D̃ is the only diagonal matrix for which D̃^{-1} U_{V^T A_k A_k^T V L^{-T}} is unit upper triangular.
Hence for the corresponding direction matrix we have P̃ = Z̃ U_{V^T A_k Z̃}^{-1} = A_k^T V U_{V^T A_k A_k^T V}^{-1} D̃.
From P̃ = P D̃ it follows that P̃ is also an orthogonal matrix up to a diagonal scaling. Thus
we proved the statement.
We exploit the following two observations. First, the minor iteration step y_{i+1} =
y_i − p_i v_i^T F(y_i) / (v_i^T A_k p_i) is invariant under the transformation P → P D, where D is
a diagonal matrix. The second observation is that in the case Z = A_k^T V the decomposition
A_k^T V = P U_{V^T A_k A_k^T V} = (P D_1)(D_1^{-1} U_{V^T A_k A_k^T V})
defines a QR factorization of the matrix A_k^T V, where the matrix P D_1 is the orthogonal
component. As the QR factorization is unique up to a diagonal scaling and the first
observation holds, we can choose the direction matrix P as the orthogonal component of
an arbitrary QR factorization of the matrix A_k^T V. We also note that v_i^T A_k p_i = e_i^T R^T e_i = r_{ii}
for i = 1, . . . , m. Thus we can define the following special case of Algorithm QNABS1.
Quasi-Newton ABS method 3 (iteration k of QNABS3)
y_1 = x_k
Calculate QR = A_k^T V and set P = Q
for i = 1, . . . , m
   y_{i+1} = y_i − v_i^T F(y_i) p_i / r_{ii}
end
x_{k+1} = y_{m+1}
s_k = x_{k+1} − x_k
Y_k = F(x_{k+1}) − F(x_k)
A_{k+1} = φ(A_k, s_k, Y_k)
Algorithms QNABS2 and QNABS3 require O(m^2) flops and two function evaluations per step, assuming that the scaling matrix V is diagonal.
We prove the local convergence of the quasi-Newton ABS methods for the Broyden update [34]
φ(A_k, s_k, Y_k) = A_k + (Y_k − A_k s_k) s_k^T / (s_k^T s_k).   (5.83)
Theorem 135 Assume that conditions (5.4)-(5.6) hold with α = 1 and the directions {p_i}
satisfy (5.45). There exist ε, δ_1 > 0 such that if ‖A_1 − A(ω)‖ ≤ δ_1 and ‖x_1 − ω‖ ≤ ε,
then Algorithm QNABS1 converges to ω linearly.
Proof. Algorithm QNABS1 is a special nonlinear conjugate direction method and
so Theorem 121 applies here. We just have to prove that A_k remains a sufficiently good
approximation to A(ω). The lemma of bounded deterioration (see [40] or [67], Lemma
8.2.1, pp. 175-176) says that if x_{k+1}, x_k ∈ S(ω, δ_0) and x_k ≠ ω, then
‖A_{k+1} − A(ω)‖ ≤ ‖A_k − A(ω)‖ + (K_1/2)(‖x_{k+1} − ω‖ + ‖x_k − ω‖)   (5.84)
holds for the Broyden update. Suppose that Γ^* = 2δ^* and K_5(Γ^* + δ^*) = 3K_5 δ^* ≤ 1/2.
Let ε = min{ δ^*, 2δ^*/(3K_1) }, ‖x_1 − ω‖ ≤ ε, ‖A_1 − A(ω)‖ ≤ δ^*. We prove by induction that
‖A_k − A(ω)‖ ≤ Γ^*   (k ≥ 1).
By the lemma of bounded deterioration we have
‖A_2 − A(ω)‖ ≤ ‖A_1 − A(ω)‖ + (3K_1/4) ε ≤ δ^* + δ^*/2 < Γ^*.
Assume that ‖A_k − A(ω)‖ ≤ δ^* + (Σ_{j=1}^{k-1} 2^{-j}) δ^*. Again by the lemma we get
‖A_{k+1} − A(ω)‖ ≤ ‖A_k − A(ω)‖ + (3K_1/2^{k+1}) ε ≤ δ^* + (Σ_{j=1}^{k} 2^{-j}) δ^* < Γ^*.
Hence we proved that the quasi-Newton update matrices A_k are bounded and are sufficiently close to A(ω).
Corollary 136 If δ1 > 0 is small enough and V is nonsingular, then Algorithm QNABS2
is linearly convergent.
Proof. For δ_1 small enough A_k is nonsingular. Hence the algorithm is breakdown
free. Since
‖p_i (v_i^T A_k p_i)^{-1} v_i^T‖ ≤ ‖P‖ ‖V‖ ≤ ‖A_k^{-1}‖ cond(V),
condition (5.45) also holds, if A_k is sufficiently close to A(ω).
Corollary 137 If δ1 > 0 is small enough and V is nonsingular, then Algorithm QNABS3
is linearly convergent.
Proof. Algorithm QNABS3 is breakdown free, if V^T A_k Z = V^T A_k A_k^T V is
strongly nonsingular. This is the case, if A_k and V are nonsingular. Theorem 129
implies that condition (5.45) is fulfilled if W = Z = Q and
‖Q U_{V^T A_k Q}^{-1} D_{V^T A_k Q}^{-1}‖ ‖V^T A_k‖ ≤ Γ
holds for some Γ > 0. For Q = A_k^T V this obviously holds, if A_k is sufficiently close
to A(ω). The observation also follows from the continuity of the QR decomposition as
‖p_i (v_i^T A_k p_i)^{-1} v_i^T‖ ≤ ‖V‖ / |r_{ii}|.
Since the quasi-Newton methods have local superlinear convergence, we may ask
whether or not the quasi-Newton ABS methods have such a property. We need the following
two facts [40], [67]. The sequence {x_k} is superlinearly convergent to ω if and only
if ‖F(x_{k+1})‖ / ‖s_k‖ → 0. If x_k → ω and the sequence {x_k} satisfies the inequality
‖x_{i+1} − ω‖ ≤ (1/2) ‖x_i − ω‖, then
‖(A_k − A(ω)) s_k‖ / ‖s_k‖ → 0
holds for the Broyden update, provided that x_0 is sufficiently close to ω.
Using the inequality | ‖a‖ − ‖b‖ | ≤ ‖a − b‖ we readily obtain that
| ‖F(x_{k+1})‖/‖s_k‖ − ‖F(x_k) + A_k s_k‖/‖s_k‖ | ≤ ‖Y_k − A_k s_k‖/‖s_k‖
≤ ‖Y_k − A(ω) s_k‖/‖s_k‖ + ‖(A_k − A(ω)) s_k‖/‖s_k‖.
As the inequality
‖Y_k − A(ω) s_k‖ ≤ (K_1/2) ‖s_k‖ (‖x_{k+1} − ω‖ + ‖x_k − ω‖)
also holds (see, e.g., [67]), we conclude that
‖F(x_{k+1})‖/‖s_k‖ → 0  ⇔  ‖F(x_k) + A_k s_k‖/‖s_k‖ → 0.
Thus Algorithm 4 converges superlinearly to x^+, if and only if
‖F(x_k) + A_k s_k‖ / ‖s_k‖ → 0.   (5.85)
For the quasi-Newton methods F(x_k) + A_k s_k = 0 holds, implying the superlinear convergence. Such an equality relation is not true for the ABS and quasi-Newton ABS methods,
indicating one significant difference from the Newton-like methods. Thus the relation
‖F(x_k) + A_k s_k‖ / ‖s_k‖ → 0 has to be proven in order to get a superlinear convergence
result for Algorithm QNABS1.
The multistep versions of Algorithms QNABS2 and QNABS3, which save on the
calculation of P, are defined as follows.
Multistep Quasi-Newton ABS method 2 (iteration k of mQNABS2)
y_1^{(1)} = x_k
Calculate P = A_k^{-1} V^{-T}
for j = 1, . . . , t
   for i = 1, . . . , m
      y_{i+1}^{(j)} = y_i^{(j)} − v_i^T F(y_i^{(j)}) p_i
   end
   y_1^{(j+1)} = y_{m+1}^{(j)}
end
x_{k+1} = y_{m+1}^{(t)}
s_k = x_{k+1} − x_k
Y_k = F(x_{k+1}) − F(x_k)
A_{k+1} = φ(A_k, s_k, Y_k)
Multistep Quasi-Newton ABS method 3 (iteration k of mQNABS3)
y_1^{(1)} = x_k
Calculate QR = A_k^T V and set P = Q
for j = 1, . . . , t
   for i = 1, . . . , m
      y_{i+1}^{(j)} = y_i^{(j)} − v_i^T F(y_i^{(j)}) p_i / r_{ii}
   end
   y_1^{(j+1)} = y_{m+1}^{(j)}
end
x_{k+1} = y_{m+1}^{(t)}
s_k = x_{k+1} − x_k
Y_k = F(x_{k+1}) − F(x_k)
A_{k+1} = φ(A_k, s_k, Y_k)
The cost of one iteration for both algorithms is O(tm^2) flops plus t + 1 function
evaluations, if V is a diagonal matrix. For the local convergence of the t-step quasi-Newton
ABS methods we can prove the following result.
Theorem 138 There exist numbers ε, δ_1 > 0 such that if ‖A_0 − A(x^+)‖ ≤ δ_1 and
‖x_0 − x^+‖ ≤ ε hold, then Algorithms mQNABS2 and mQNABS3 converge to x^+ with a linear
speed.
Proof. Let z_i^{(j)} = y_i^{(j)} − ω. Similarly to the proof of Theorems 121 and 125 we
can obtain the estimate
‖z_{m+1}^{(j)}‖ ≤ γ_j (‖A_k − A(ω)‖ + ‖z_1^{(j)}‖) ‖z_1^{(j)}‖   (j = 1, . . . , t),
from which our claim clearly follows.
It is noted that ‖x_{k+1} − ω‖ / ‖x_k − ω‖ = O(δ^t), provided that ‖A_k − A(ω)‖ ≤ δ
and ‖x_k − ω‖ ≤ δ. This may explain the observed good behavior of Algorithms mQNABS2
and mQNABS3.
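The multistep structure is easy to prototype. The sketch below is illustrative only (V = I and the Broyden update assumed, as in the earlier sketch): the QR factorization of A_k^T V is computed once per major iteration and reused for t cheap sweeps.

    import numpy as np

    def mqnabs3_step(F, x, A, t=3):
        """One major iteration of mQNABS3 with V = I (sketch)."""
        Q, R = np.linalg.qr(A.T)          # computed once, amortized over t sweeps
        y = x.copy()
        for _ in range(t):                # t minor sweeps reuse Q and R
            for i in range(x.size):
                y = y - F(y)[i] * Q[:, i] / R[i, i]
        s, Y = y - x, F(y) - F(x)
        return y, A + np.outer(Y - A @ s, s) / (s @ s)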
Derivation and variants of the quasi-Newton ABS algorithms and related results
can be found in the papers [118], [119], [106], [107]. The convergence of the methods was
proved in [106], [107]. Numerical testing of the algorithms is given in [118], [119], [121],
[167]. A FORTRAN program of Algorithm QNABS3 is given in Section 7.4 [120].
It is worth noting that O'Leary [200] showed that the Broyden method itself, when
applied to linear systems, is a kind of projection method.
Experimental investigations
There have been four comparative numerical tests of the quasi-Newton ABS methods
under various circumstances [118], [119], [121], [167]. The first three tests used the standard
fast QR update technique for the quasi-Newton updates [67], [128].
The first test was done in PASCAL single precision [118] on the 100 test
problems of the Estonian collection [211]. The comparison of the Huang ABS method,
the row update ABS method of Z. Huang [157], [156], [159], the Broyden method and the
quasi-Newton ABS method QNABS3 with the Broyden update showed the superiority
of the Broyden method and Algorithm QNABS3 over the other two. In fact, Algorithm
QNABS3 was slightly better than Broyden's algorithm.
The second and third comparative tests were done in FORTRAN 77 double
precision and both used the same set of test problems. This set consists of 32 variable-size
nonlinear systems. The first 22 problems, which contain the variable-size Argonne test
problems [191], were taken from the Estonian collection of test equations [211]. The rest
of the test problems were selected from the numerical ODE and PDE literature. The full
description of the test problems can be found in [121]. The stopping criterion was
‖F(x_k)‖ ≤ 10^{-10}   or   k = 120.
If ‖F(x_k)‖ ≤ 10^{-10} was satisfied in less than 120 iteration steps, the test problem was
considered solved. Ranking of the methods was done with respect to the average iteration
number, the average execution time (in seconds) and the number of solved problems. The
averages were calculated on those test problems which were solved by all methods.
Table 5.1 shows some of the numerical results from [121]. The computations were
carried out on an IBM RISC/6000 workstation. Algorithm QNABS3 used the Broyden
update. The code of the Brent method was written by Moré and Cosnard [190] and can
be found as algorithm TOMS 554 in Netlib. It can be seen from Table 5.1 that Broyden’s
method is the best in average CPU time closely followed by Algorithm QNABS3. They
both outperform the third best method (Brent) by factors varying from 2 to 6.
The third numerical test [119] compared five quasi-Newton methods (five updates) with the corresponding versions of Algorithm QNABS3. The five updates, selected on the basis of Spedicato and Greenstadt [223], were the following: the Broyden first
and second formulas, the Thomas formula, and the Pearson and Fletcher formulas.

Method       m = 50          m = 100         m = 200
ABS Huang    6.31/2.39/21    6.38/18.94/20   6.53/162.27/20
row update   8.94/3.82/19    8.94/29.68/16   9.40/256.43/16
BRENT        2.81/1.36/23    2.81/10.44/21   2.87/80.55/20
BROYDEN      10.06/0.55/21   10.19/2.42/19   10.40/11.78/19
QNABS3       10.31/0.64/18   10.56/2.74/17   11.27/13.20/17

Table 5.1 average iteration number / average execution time (s) / number of solved problems

Method        m = 200     m = 400
Broyden1      31.7/19     174.4/16
Broyden3      28.8/17     164.1/16
Broyden5      28.6/15     164.1/14
BroydenD      31.9/12     164.6/13
mQNABS3(1)    35.6/17     190.5/16
mQNABS3(3)    28.2/17     155.3/16
mQNABS3(5)    26.6/17     149.0/16
mQNABS3(D)    26.8/17     142.5/16

Table 5.2 average execution time (s) / number of solved problems

The best methods among the quasi-Newton methods were the Broyden method in average CPU time and the
Thomas method in average iteration number, in agreement with the results of Spedicato
and Greenstadt on a different set of test problems [223]. The quasi-Newton ABS methods (Algorithm QNABS3) were better in average iteration number but they were slightly
(≈ 6%) worse in average CPU time. Our first test [118] indicates that this situation
may change with the implementation (language, precision, etc.).
We tested the multistep quasi-Newton ABS method mQNABS3 with the Broyden
update against multistep quasi-Newton methods of the following form.
Multistep quasi-Newton method (iteration k)
y1 = xk
for j = 1, . . . , t
Solve Ak ∆j = −F (yj ) for ∆j
yj+1 = yj + ∆j
end
xk+1 = yt+1
Sk = xk+1 − xk
Yk = F (xk+1 ) − F (xk )
Ak+1 = φ (Ak , Sk , Yk )
Here we used the Broyden update. We selected steps t = 3, 5 and a dynamic
choice of the step number t with respect to
‖y_{m+1}^{(t)} − y_{m+1}^{(t-1)}‖ ≤ 10^{-15}   or   t = 20.
The notations mQNABS3(1), mQNABS3(3), mQNABS3(5), mQNABS3(D) stand for the
multistep quasi-Newton ABS method mQNABS3 with step number t = 1, 3, 5 and dynamic selection, respectively. The multistep versions of the Broyden method are denoted
by Broyden1, Broyden3, Broyden5 and BroydenD for step number t = 1, 3, 5 and dynamic
selection, respectively. We obtained the following results (Table 5.2).
The computations were done on a 40 MHz IBM386AT compatible computer with
the Lahey F77-EM/32 compiler and the optimized IMSL library. From Table 5.2 it can be concluded
that the multistep quasi-Newton ABS method is better than the Broyden method in
average CPU time.
The fourth test was made by Jeney [167], who succeeded in making a stable
implementation of Algorithms QNABS2 and mQNABS2 and carried out a full comparison of the five algorithms. We quote the most important experimental results from his
paper [167]. The notation mQNABSx(t) stands for the quasi-Newton ABS algorithm
x with step number t. The selected values are t = 1, 3, 5, D. INV and QR denote the inversion and
QR-factorization based updates, respectively. The computations were done on a 40 MHz
IBM386AT compatible computer with the Lahey F77-EM/32 compiler and the optimized IMSL
library. The machine epsilon was 2.2D−16. The following tables contain the average
execution time (in seconds) versus the number of solved problems and the size of the test
problems.
Method          m = 100          m = 400
                ex.time/eqs      ex.time/eqs
Broyden-QR1     7.8/20           309.3/17
Broyden-QR3     6.7/18           293.2/17
Broyden-QR5     6.4/18           286.7/17
Broyden-QRD     6.3/18           282.4/17
mQNABS3(1)      9.3/18           324.3/17
mQNABS3(3)      6.7/18           278.2/17
mQNABS3(5)      6.3/18           269.0/17
mQNABS3(D)      5.8/18           260.8/17

We can see that the Broyden method (Broyden-QR1) is better in average execution time
than Algorithm QNABS3. However, the best algorithm mQNABS3(D) is better than the
best multistep Broyden method (Broyden-QRD).

Method          m = 100          m = 400
                ex.time/eqs      ex.time/eqs
Broyden-INV1    3.5/20           99.0/17
Broyden-INV3    3.3/18           99.0/17
Broyden-INV5    3.4/18           101.0/17
Broyden-INVD    3.6/18           105.3/17
mQNABS2(1)      4.0/18           107.7/17
mQNABS2(3)      3.4/18           97.0/17
mQNABS2(5)      3.5/18           96.0/17
mQNABS2(D)      3.6/18           100.2/17

We can see that the inversion based Broyden method (Broyden-INV1) is better in average
execution time than Algorithm QNABS2. For m = 100 the best Broyden method is
slightly better than the best inversion based quasi-Newton ABS method, while the situation is
quite the opposite for m = 400. So we cannot decide which method is better. We can
conclude, however, that Algorithms mQNABS2 and mQNABS3 are competitive with the
Broyden method. It can also be observed that the inversion based quasi-Newton and
quasi-Newton ABS methods are definitely faster than the QR-factorization based ones,
while the number of solved problems is essentially the same for both approaches.

5.4 Monotone convergence in partial ordering
In certain special cases the nonlinear conjugate direction methods have monotone and
essentially global convergence in the natural partial ordering. We also show that a particular conjugate direction method is faster than Newton’s method also in the partial
ordering. The results of the section were published in [98], [104].
We require the following concepts and results (see, e.g., [202], [201]). The function
F : R^m → R^m is said to be convex on a convex set D ⊆ R^m if
F(αx + (1 − α)y) ≤ αF(x) + (1 − α)F(y)   (5.86)
holds for all x, y ∈ D and α ∈ [0, 1]. We recall the following basic result [202], [201].
Assume that F : R^m → R^m is differentiable on the convex set D. Then F is convex on D
if and only if
F(y) − F(x) ≥ F'(x)(y − x)   (5.87)
holds for all x, y ∈ D. The function F : R^m → R^k is said to be isotone, if x ≤ y implies
F(x) ≤ F(y).
The monotone convergence will be proven under the following conditions:
(a) F : R^m → R^m is continuously differentiable and convex on R^m;
(b) A(x) = F'(x) is a nonsingular M-matrix for all x ∈ R^m and A : R^m → R^{m×m} is
isotone.
If there is a solution ω of the equation F (x) = 0, then under the condition (a)
F (z) ≥ 0 implies z ≥ ω, and F (z) ≤ 0 implies z ≤ ω. Furthermore the solution ω is
unique if it exists (see, e.g., [201]). Throughout the section we assume that the solution ω
exists. Characterizations of functions satisfying (a) and (b) can be found in [202], [201],
[219], [91], [92] and the references therein.
Consider the following special (GILUABS) case of Algorithm 5 written in the form
   y_1 = x_i
   y_{k+1} = y_k − p_k (e_k^T Ã_i p_k)^{-1} e_k^T F(y_k)   (k = 1, . . . , m)      (i ≥ 1),   (5.88)
   x_{i+1} = y_{m+1}
and
   H_1 = I
   H_{k+1} = H_k − H_k Ã_i^T e_k (z_k^T H_k Ã_i^T e_k)^{-1} z_k^T H_k      (k = 1, . . . , m),   (5.89)
   p_k = H_k^T z_k
where Ã_i = [A^T(u_1)e_1, . . . , A^T(u_m)e_m]^T. If no breakdown occurs, then
P = Z U_{Ã_i Z}^{-1}.
We remind the reader that the scaling matrix V can be incorporated in F without loss of generality.
For simplicity we use the notation
ψ_k = ψ_k(u_k) = p_k (e_k^T A(u_k) p_k)^{-1} e_k^T = p_k (e_k^T Ã_i p_k)^{-1} e_k^T ∈ R^{m×m}.   (5.90)
We need the following observations.
Lemma 139 If Q ≥ 0, AQ is a nonsingular M-matrix and P = Q U_{AQ}^{-1}, then
(i) e_k^T A p_k > 0 (k = 1, . . . , m);
(ii) P ≥ 0.
Proof. We recall the fact that a nonsingular matrix A is an M-matrix if and
only if there exist lower and upper triangular M-matrices R and S, respectively, such that
A = RS ([84], [23], [219], [249]). Since R and S are also nonsingular, it follows that the
factors of the LDU factorization A = L_A D_A U_A are all nonsingular M-matrices. Hence if
AQ is a nonsingular M-matrix, then U_{AQ}^{-1} ≥ 0 and P = Q U_{AQ}^{-1} ≥ 0. Also, we have
e_k^T A p_k = e_k^T A Q U_{AQ}^{-1} e_k = e_k^T L_{AQ} D_{AQ} e_k = e_k^T D_{AQ} e_k > 0.
Lemma 140 Assume that AQ and ÂQ̂ are strongly nonsingular, P = Q U_{AQ}^{-1} and P̂ =
Q̂ U_{ÂQ̂}^{-1}. If Q e_j = Q̂ e_j (j = 1, . . . , k+1) and e_j^T A = e_j^T Â (j = 1, . . . , k), then P e_j = P̂ e_j
holds for j = 1, . . . , k+1.
Proof. The direction matrices P and P̂ can be generated by the rank reduction (or
GILUABS) conjugation algorithms
H_1 = I,   H_{i+1} = H_i − H_i A^T e_i (q_i^T H_i A^T e_i)^{-1} q_i^T H_i,   p_i = H_i^T q_i   (i = 1, . . . , m)
and
Ĥ_1 = I,   Ĥ_{i+1} = Ĥ_i − Ĥ_i Â^T e_i (q̂_i^T Ĥ_i Â^T e_i)^{-1} q̂_i^T Ĥ_i,   p̂_i = Ĥ_i^T q̂_i   (i = 1, . . . , m),
respectively. Since H_1 = Ĥ_1, the given assumptions imply that H_j = Ĥ_j for j = 2, . . . , k+1
and p_j = H_j^T q_j = Ĥ_j^T q̂_j = p̂_j for j = 1, . . . , k+1.
Theorem 141 Assume that
(i) F satisfies conditions (a) and (b);
(ii) Z = Q_i (i ≥ 1);
(iii) the matrices Q_i are nonnegative and such that Q_i^{-1} ≥ D_i A(x_i) holds for some nonsingular diagonal matrix D_i ≥ 0 (i ≥ 1);
(iv) there exist two nonsingular matrices Q_∞ and D_∞ such that Q_i ≥ Q_∞ ≥ 0 and
D_i ≥ D_∞ ≥ 0 (i ≥ 1).
If x_1 ∈ R^m is an arbitrary point such that F(x_1) ≥ 0, then algorithm (5.88)-(5.89) satisfies
ω ≤ x_{i+1} = y_{m+1} ≤ y_m ≤ · · · ≤ y_1 = x_i   (i ≥ 1)   (5.91)
and x_i → ω as i → +∞.
Proof. We will use the observation that if A^{-1} ≥ 0, B ≥ 0 and B^{-1} ≥ DA holds
with a nonsingular diagonal matrix D ≥ 0, then AB is an M-matrix. If, in addition,
C^{-1} ≥ 0 and C ≤ A, then CB is also an M-matrix that satisfies CB ≤ AB. Thus under
condition (iii) the matrices A(x) Q_i and A(x_i) Q_i are M-matrices and satisfy the inequality
A(x) Q_i ≤ A(x_i) Q_i for x ≤ x_i. We prove the monotone convergence in several steps.
1. Assume that F(x_i) ≥ 0. Then x_i ≥ ω. Let k = 1 and y_1 = x_i. By assumption
u_1 = y_1, p_1 = q_1 ≥ 0 and e_1^T Ã_i p_1 = e_1^T A(x_i) q_1 = e_1^T A(x_i) Q e_1 > 0. Consequently
ψ_1 = ψ_1(u_1) = p_1 e_1^T / (e_1^T Ã_i p_1) ≥ 0 and y_2 = y_1 − ψ_1 F(y_1) ≤ y_1. We show that y_2 ≥ ω.
Relation (5.87) yields
F(y_2) − F(y_1) ≥ A(y_1)(y_2 − y_1) = −A(y_1) ψ_1 F(y_1).
Since A(y_1) Q_i = A(x_i) Q_i is an M-matrix, we have
A(y_1) ψ_1 = A(y_1) p_1 e_1^T / (e_1^T Ã_i p_1) = A(y_1) q_1 e_1^T / (e_1^T A(y_1) q_1) ≤ e_1 e_1^T,
from which A(y_1) ψ_1 F(y_1) ≤ e_1 e_1^T F(y_1) and F(y_2) ≥ (I − e_1 e_1^T) F(y_1) ≥ 0 follow. The
latter condition implies that y_2 ≥ ω.
2. Assume that y_k ≤ y_{k-1} ≤ · · · ≤ y_1, F(y_k) ≥ 0, p_1, . . . , p_k ≥ 0 and e_j^T Ã_i p_j > 0
(j = 1, . . . , k). Then ψ_j(u_j) ≥ 0 (j = 1, . . . , k) and y_{k+1} = y_k − ψ_k(u_k) F(y_k) ≤ y_k.
Define the matrix
A^{[k]} = [A^T(u_1)e_1, . . . , A^T(u_k)e_k, A^T(u_k)e_{k+1}, . . . , A^T(u_k)e_m]^T.
The assumption of isotonicity and y_k ≤ u_k ≤ y_1 imply A(y_k) ≤ A^{[k]} ≤ A(y_1). Hence
A^{[k]} is also an M-matrix. Since Q_i ≥ 0, A(y_k) Q_i ≤ A^{[k]} Q_i ≤ A(y_1) Q_i. Condition (iii)
implies that A^{[k]} Q_i is also an M-matrix. Let P^{[k]} = Q_i U_{A^{[k]} Q_i}^{-1}. Then e_j^T A^{[k]} = e_j^T Ã_i
(j = 1, . . . , k) implies that P^{[k]} e_j = p_j (j = 1, . . . , k+1) and e_j^T A^{[k]} P^{[k]} e_j = e_j^T Ã_i p_j
(j = 1, . . . , k). Relation (5.87) implies again that
F(y_{k+1}) − F(y_k) ≥ A(y_k)(y_{k+1} − y_k) = −A(y_k) ψ_k F(y_k).
Since A(y_k) ≤ A^{[k]} and ψ_k ≥ 0, we have
A(y_k) ψ_k(u_k) ≤ A^{[k]} p_k e_k^T / (e_k^T Ã_i p_k) = A^{[k]} P^{[k]} e_k e_k^T / (e_k^T A^{[k]} P^{[k]} e_k)
= L_{A^{[k]} Q_i} D_{A^{[k]} Q_i} e_k e_k^T / (e_k^T L_{A^{[k]} Q_i} D_{A^{[k]} Q_i} e_k) ≤ e_k e_k^T.
Hence F(y_{k+1}) ≥ (I − e_k e_k^T) F(y_k) and y_{k+1} ≥ ω. Relation p_{k+1} = P^{[k]} e_{k+1} implies that
p_{k+1} ≥ 0. By the definitions of Ã_i and A^{[k+1]} we have
e_{k+1}^T Ã_i p_{k+1} = e_{k+1}^T A^{[k+1]} p_{k+1} > 0.
3. We have proven that
ω ≤ x_{i+1} = y_{m+1} ≤ y_m ≤ · · · ≤ y_1 = x_i   (i ≥ 1).
Hence there exists x̂ ∈ R^m such that x_i → x̂ as i → +∞. We must prove that x̂ = ω.
Introduce the notation y_j^{(i)} = y_j (j = 1, . . . , m+1; i ≥ 1). Then lim_{i→+∞} y_j^{(i)} = x̂ holds
for j = 1, . . . , m+1. Let
γ_i = Σ_{j=1}^m ψ_j [F(y_1^{(i)}) − F(y_j^{(i)})]   (i ≥ 1).
As 0 ≤ ψ_j ≤ Σ_{j=1}^m ψ_j, we use a monotone norm for which 0 ≤ X ≤ Y implies ‖X‖ ≤ ‖Y‖.
Then
‖γ_i‖ ≤ ‖Σ_{j=1}^m ψ_j‖ Σ_{j=1}^m ‖F(y_1^{(i)}) − F(y_j^{(i)})‖   (i ≥ 1).
The continuity assumption implies that
lim_{i→+∞} Σ_{j=1}^m ‖F(y_1^{(i)}) − F(y_j^{(i)})‖ = 0.
By definition diag(e_k^T Ã_i p_k) = D_{Ã_i Q_i} and so
Σ_{j=1}^m ψ_j = Σ_{k=1}^m p_k e_k^T / (e_k^T Ã_i p_k) = P D_{Ã_i Q_i}^{-1} = Q_i U_{Ã_i Q_i}^{-1} D_{Ã_i Q_i}^{-1}.
If B is a nonsingular M-matrix and B = L_B D_B U_B is its LDU factorization, then 0 ≤
D_B^{-1} ≤ U_B^{-1} D_B^{-1} ≤ B^{-1}. Substituting B with Ã_i Q_i we obtain that
0 ≤ Q_i D_{Ã_i Q_i}^{-1} ≤ Σ_{j=1}^m ψ_j = Q_i U_{Ã_i Q_i}^{-1} D_{Ã_i Q_i}^{-1} ≤ Ã_i^{-1}.
Assumption (iii) implies Ã_i Q_i ≤ D_i^{-1}, while Theorem 9 implies D_{Ã_i Q_i} ≤ D_i^{-1}. Since
Ã_i ≥ A(x_{i+1}) ≥ A(ω), we obtain
0 ≤ Q_∞ D_∞ ≤ Q_i D_i ≤ Σ_{j=1}^m ψ_j ≤ A^{-1}(ω).
Consequently γ_i → 0 for i → +∞. Assume that F(x̂) ≥ 0 and F(x̂) ≠ 0. Then
(Σ_{j=1}^m ψ_j) F(y_1^{(i)}) ≥ Q_∞ D_∞ F(x̂) ≥ 0 and Q_∞ D_∞ F(x̂) ≠ 0. On the other hand
(Σ_{j=1}^m ψ_j) F(y_1^{(i)}) = y_1^{(i)} − y_{m+1}^{(i)} + γ_i → 0 (i → +∞), which is a contradiction. Thus
x̂ = ω, which was to be proved.
Since A(x) is an M-matrix and isotone for all x ∈ R^m, Ã_i is also an M-matrix
and satisfies Ã_i ≥ Ã_{i+1} for i ≥ 1. The decrease of F within the minor iteration loop is
characterized by the inequality
F(y_{k+1}) ≥ (I − e_k e_k^T) F(y_k) ≥ 0.   (5.92)
If x_0 ∈ R^m is arbitrary, then x_1 = x_0 − [F'(x_0)]^{-1} F(x_0) satisfies the condition F(x_1) ≥ 0
(see [201]). Hence Theorem 141 is essentially a global convergence theorem.
The parameter matrices Q_i can be chosen so that the Q_i^{-1} are M-matrices satisfying
Q_i^{-1} ≥ D_i A(x_i). For the choice Q_1 ≥ 0, Q_1^{-1} ≥ A(x_1) D^{-1}, the matrix Q_i = Q_1 satisfies
condition (iii) provided that x_i ≤ x_1. For the updated Q_i we mention the following
possible choices (see also [249]):
(a) Q_i^{-1} = A_i + u v^T (u, v ∈ R^m);
(b) Q_i^{-1} is a bidiagonal M-matrix obtained from A_i by setting the other elements of A_i
to 0;
(c) Q_i^{-1} is a tridiagonal M-matrix obtained from A_i by setting the other elements of A_i
to 0.
The last two cases are related to optimal preconditioning (see, e.g., [130]).
Assume that a point z_1 ∈ R^m exists such that F(z_1) ≤ 0. Then we can define the
algorithm
   w_1 = z_i
   w_{k+1} = w_k − p_k (e_k^T Ã_i p_k)^{-1} e_k^T F(w_k)   (k = 1, . . . , m)      (i ≥ 1)   (5.93)
   z_{i+1} = w_{m+1}
as parallel subordinate to algorithm (5.88)-(5.89). Note that
ψ_k = ψ_k(u_k) = p_k (e_k^T Ã_i p_k)^{-1} e_k^T   (k = 1, . . . , m)
is defined by algorithm (5.88)-(5.89).
The following result can be proved similarly to Theorem 141.
Theorem 142 Assume that the conditions of Theorem 141 hold. In addition, assume
the existence of a point z1 such that F (z1 ) ≤ 0. Then for the subordinate algorithm (5.93)
we have
ω ≥ zi+1 = wm+1 ≥ wm ≥ · · · ≥ w1 = zi
(i ≥ 1)
(5.94)
and zi → ω as i → +∞.
Proof. We exploit the proof of Theorem 141.
1. We show that if w_k ≤ ω ≤ y_k are such that F(w_k) ≤ 0 ≤ F(y_k), then
y_{k+1} = y_k − ψ_k(u_k) F(y_k) ≥ y_N = y_k − A^{-1}(y_k) F(y_k) ≥ ω   (5.95)
and
w_{k+1} = w_k − ψ_k(u_k) F(w_k) ≤ w_N = w_k − A^{-1}(y_k) F(w_k) ≤ ω   (5.96)
hold. Since A(y_k) ψ_k ≤ A^{[k]} ψ_k ≤ e_k e_k^T ≤ I, we have the inequality A(y_k) ψ_k F(y_k) ≤
F (yk ). Multiplying this by A−1 (yk ) ≥ 0 on the left we obtain ψk F (yk ) ≤ A−1 (yk ) F (yk )
and yk − ψk (uk ) F (yk ) ≥ yk − A−1 (yk ) F (yk ) = yN . Multiply 0 ≥ F (ω) − F (yk ) ≥
A (yk ) (ω − yk ) by A−1 (yk ) ≥ 0 on the left. Then we have −A−1 (yk )F (yk ) ≥ ω − yk
and yN = yk − A−1 (yk )F (yk ) ≥ ω. In case of inequality (5.96) F (wk ) ≤ 0 implies
A(yk )ψk F (wk ) ≥ F (wk ) and ψk F (wk ) ≥ A−1 (yk )F (wk ). From this wk − ψk (uk )F (wk ) ≤
wk − A−1 (yk )F (wk ) = wN follows. We show that yk ≥ wN . Simple calculations yield
that
yk ≥ yk − A−1 (yk )F (yk ) = wN + yk − wk + A−1 (yk )[F (wk ) − F (yk )] ≥ wN
because of F (wk ) − F (yk ) ≥ A(yk )(wk − yk ). We prove that wN ≤ ω. Since wk ≤
wk+1 ≤ wN and A(wN ) ≤ A(yk ) we have F (wN ) − F (wk ) ≤ A(wN )(wN − wk ) ≤
A(yk )[−A−1 (yk )F (wk )] = −F (wk ), which implies that F (wN ) ≤ 0 and wN ≤ ω.
2. Let w1 = zi , F (w1 ) ≤ 0. Since P ≥ 0 and ψj ≥ 0 (j = 1, . . . , m), we have
w2 = w1 − ψ1 F (w1 ) ≥ w1 . Inequality (5.96) implies that w2 ≤ wN ≤ ω. Assume
now that w1 ≤ w2 ≤ · · · ≤ wk ≤ ω. If F (wk ) ≤ 0, then wk ≤ wk+1 ≤ ω. We prove that
F (wk+1 ) ≤ 0. Since
F (wk ) − F (wk+1 ) ≥ A(wk+1 )(wk − wk+1 ) = A(wk+1 )ψk F (wk )
and A(wk+1 )ψk F (wk ) ≥ A[k] ψk F (wk ) ≥ ek eTk F (wk ), we have
F (wk+1 ) ≤ (I − ek eTk )F (wk ) ≤ 0.
(5.97)
3. We obtained that
ω ≥ z_{i+1} = w_{m+1} ≥ w_m ≥ · · · ≥ w_1 = z_i   (i ≥ 1),
and similarly to part 3 of the proof of Theorem 141 we can show that z_i → ω as i → +∞. Thus
Theorem 142 is proved.
The increase of F is characterized by the inequality
F (wk+1 ) ≤ (I − ek eTk )F (wk ) ≤ 0.
(5.98)
Theorems 141 and 142 together imply that algorithms (5.88)-(5.89) and (5.93) produce
two-sided approximations to the solution ω in the form of inclusion intervals
z_i ≤ z_{i+1} ≤ ω ≤ x_{i+1} ≤ x_i   (i = 1, 2, . . .)   (5.99)
with x_i − z_i → 0 (i → +∞).
Next we give an estimation for the inclusion interval [z_{i+1}, x_{i+1}].
Theorem 143 Under the conditions of Theorems 141 and 142 the interval
[ z_i − Q_i U_{Ã_i Q_i}^{-1} D_{Ã_i Q_i}^{-1} F(z_i),  x_i − Q_i U_{Ã_i Q_i}^{-1} D_{Ã_i Q_i}^{-1} F(x_i) ]   (5.100)
covers the inclusion interval [z_{i+1}, x_{i+1}].
Proof. The inequalities
F(y_{k+1}) ≥ (I − e_k e_k^T) F(y_k)   and   F(w_{k+1}) ≤ (I − e_k e_k^T) F(w_k)
imply
F(y_{k+1}) ≥ (I − Σ_{j=1}^k e_j e_j^T) F(y_1)   and   F(w_{k+1}) ≤ (I − Σ_{j=1}^k e_j e_j^T) F(w_1),
respectively. Since ψ_k e_j = 0 (k ≠ j), we have ψ_k F(y_k) ≥ ψ_k F(y_1) and ψ_k F(w_k) ≤
ψ_k F(w_1). From the inequalities
Σ_{j=1}^m ψ_j F(y_j) ≥ (Σ_{j=1}^m ψ_j) F(y_1) = Q_i U_{Ã_i Q_i}^{-1} D_{Ã_i Q_i}^{-1} F(y_1)
and
Σ_{j=1}^m ψ_j F(w_j) ≤ (Σ_{j=1}^m ψ_j) F(w_1) = Q_i U_{Ã_i Q_i}^{-1} D_{Ã_i Q_i}^{-1} F(w_1)
the statement follows.
Finally, we show that a particular case of algorithm (5.88)-(5.89) is faster than
Newton's method in the partial ordering.
Theorem 144 Assume that the conditions of Theorem 141 are satisfied, T = [e_1, . . . , e_1] and
Q_i = A^{-1}(x_i) D_1, where D_1 ≥ 0 is diagonal. Then the corresponding algorithm (5.88)-(5.89)
is faster than Newton's method in the partial ordering provided that they start from
the same initial point x_1.
Proof. Assume for a moment that x_i is the starting point for both methods.
Then Ã_i = A(x_i) and Q_i U_{Ã_i Q_i}^{-1} D_{Ã_i Q_i}^{-1} = A^{-1}(x_i) by definition. Since
x_{i+1} ≤ x_i − Q_i U_{Ã_i Q_i}^{-1} D_{Ã_i Q_i}^{-1} F(x_i) = x_i − A^{-1}(y_1) F(x_i)
and x_{Newton} = x_i − A^{-1}(x_i) F(x_i), we proved the statement for one iteration. We need
the observation that if x ≤ y and F(x) ≥ 0, then x − A^{-1}(x) F(x) ≤ y − A^{-1}(y) F(y). If
we multiply the inequality F(x) − F(y) ≥ F'(y)(x − y) by A^{-1}(y) on the left, then we
obtain y − A^{-1}(y) F(y) ≥ x − A^{-1}(y) F(x). As A(y) is isotone and an M-matrix, we
have A^{-1}(x) ≥ A^{-1}(y), which implies A^{-1}(x) F(x) ≥ A^{-1}(y) F(x) and x − A^{-1}(x) F(x) ≤
x − A^{-1}(y) F(x), from which the observation follows. We now complete the proof as
follows. We start from x_1. Algorithm (5.88)-(5.89) generates the sequence x_i^{ncd}, while the
Newton method generates x_i^{Newton}. Assume that x_i^{ncd} ≤ x_i^{Newton}. Then
x_{i+1}^{ncd} ≤ x_i^{ncd} − A^{-1}(x_i^{ncd}) F(x_i^{ncd}) ≤ x_i^{Newton} − A^{-1}(x_i^{Newton}) F(x_i^{Newton}) = x_{i+1}^{Newton}
implies the theorem.
Theorems 141-144 were proved in [104]. The monotone convergence of the Newton
method was first proved by Baluev (see, e.g., [202], [201]). The special nonlinear conjugate
direction method of Theorem 144 is essentially a modified Newton method belonging to
Algorithm 5. Frommer [89], [88] proved a similar result for the Brown method.
In the particular case u_k = y_1 (k = 1, . . . , m), when Ã_i = A(y_1) and P =
Q_i U_{A(x_i) Q_i}^{-1}, we can prove the statements of Theorems 141-143 without the conjugation
procedure (see, e.g., [94]).
5.5 Special applications of the implicit LU ABS method

The implicit LU ABS method has the following form in the linear case Ax = b (A ∈ R^{m×m}).
Implicit LU ABS method
x_1 ∈ R^m, H_1 = I
for k = 1, . . . , m
   p_k = H_k^T e_k
   x_{k+1} = x_k − p_k (e_k^T A p_k)^{-1} e_k^T (A x_k − b)
   H_{k+1} = H_k − H_k A^T e_k (e_k^T H_k A^T e_k)^{-1} e_k^T H_k
end
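A minimal sketch of the listing above (illustrative only; it assumes A is strongly nonsingular, i.e. all leading principal submatrices are nonsingular) together with a spot-check that the method reproduces the solution of Ax = b after the m-th step:

    import numpy as np

    def implicit_lu_abs(A, b, x1=None):
        """Implicit LU ABS method for Ax = b (sketch for strongly nonsingular A)."""
        A, b = np.asarray(A, float), np.asarray(b, float)
        m = A.shape[0]
        x = np.zeros(m) if x1 is None else np.asarray(x1, float)
        H = np.eye(m)
        for k in range(m):
            p = H[k].copy()                        # p_k = H_k^T e_k (k-th row of H_k)
            x = x - p * (A @ x - b)[k] / (A[k] @ p)
            Ha = H @ A[k]                          # H_k A^T e_k
            H = H - np.outer(Ha, H[k]) / Ha[k]     # rank-one update of H_k
        return x

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5)) + 5.0 * np.eye(5)   # well-conditioned test matrix
    b = rng.standard_normal(5)
    assert np.allclose(implicit_lu_abs(A, b), np.linalg.solve(A, b))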
If
A = [ A_k  C_k ; B_k  D_k ] ∈ R^{m×m}   (A_k ∈ R^{k×k}),   (5.101)
then the update matrices H_k have the form
H_{k+1} = [ 0  0 ; −C_k^T A_k^{-T}  I ].   (5.102)
The very special structure of the H_k's can be exploited in many ways. Abaffy [3] and Phua
[206] investigated the implicit LU ABS method on linear systems with structured sparsity
and showed that it is particularly well fitted to sparse banded matrices both in memory
space and arithmetic operations (see also [9]). Similar results were obtained by Galántai
[99], [101] on large nonlinear systems with banded Jacobian structure.
Frommer [90] showed that the discretized Brown method, which is the discretized
nonlinear implicit LU method (see [164], [163]), can be implemented on nonlinear equations with banded Jacobians in a very efficient way that requires the same arithmetic work
as the Newton method with banded Gaussian elimination.
In this section we show two recent applications of the block implicit LU ABS
method to sparse nonlinear problems [111]. The common feature of both cases is the
block arrowhead structure of the Jacobian matrix.
5.5.1 The block implicit LU ABS method on linear and nonlinear systems with block arrowhead structure

Consider the numerical solution of block bordered nonlinear systems of the form
f_i(x_i, x_{q+1}) = 0   (i = 1, . . . , q),
f_{q+1}(x_1, . . . , x_{q+1}) = 0,   (5.103)
where x_i ∈ R^{n_i}, f_i ∈ R^{n_i} (i = 1, . . . , q+1), and Σ_{i=1}^{q+1} n_i = n. Such systems of nonlinear
equations occur in VLSI design and other application areas (see [148], [215], [254] and the
references therein). Let x = [x_1^T, . . . , x_q^T, x_{q+1}^T]^T ∈ R^n and
F(x) = [f_1^T(x), . . . , f_q^T(x), f_{q+1}^T(x)]^T ∈ R^n.   (5.104)
Then the Jacobian matrix of system (5.103) has the block bordered or arrowhead structure
        [ A_1                      B_1 ]
        [      A_2                 B_2 ]
J(x) =  [            . . .         ... ]   (5.105)
        [                 A_q      B_q ]
        [ C_1   C_2   · · ·   C_q   D  ],
where
A_i = ∂f_i(x)/∂x_i ∈ R^{n_i×n_i},   B_i = ∂f_i(x)/∂x_{q+1} ∈ R^{n_i×n_{q+1}}   (i = 1, . . . , q),
D = ∂f_{q+1}(x)/∂x_{q+1} ∈ R^{n_{q+1}×n_{q+1}},   (5.106)
and
C_i = ∂f_{q+1}(x)/∂x_i ∈ R^{n_{q+1}×n_i}   (i = 1, . . . , q).   (5.107)
Linear systems with similar coefficient matrices arise in the domain decomposition
method [27], [240] and also in the least-squares method, if the observation matrix is block
angular [56]. The special sparsity pattern of the Jacobian or the coefficient matrix (5.105)
offers advantages for a specialized solver. Several authors investigated the efficient solution
of such nonlinear systems for reasons of applications and algorithmic developments (see
e.g., [148], [215], [214], [147], [254], [82], [217], [218], [250]).
Here we specialize the block implicit LU ABS algorithm to block bordered systems
and compare it to the other existing methods. We show that Method 2 of Hoyer and
Schmidt [148], which is equivalent with the basic corrected implicit method of Zhang,
Byrd and Schnabel [254], is a special case of the block implicit LU ABS method. For block
arrowhead linear systems we demonstrate that the implicit LU ABS method also contains
the capacitance matrix method of Bjørstad and Widlund [27]. The results indicate the
usefulness of block ABS methods on systems with structured sparsity.
The basic idea of the known methods is the use of the implicit function theorem.
The idea goes back to Brown [33] (see also Ortega and Rheinboldt [202], or Schmidt [214]).
Let
S(x) = D(x) − Σ_{i=1}^q C_i(x) A_i^{-1}(x) B_i(x).   (5.108)
Hoyer and Schmidt [148] suggested the following general algorithm to solve problems of
the form (5.103).
Hoyer-Schmidt algorithm
Step 1: Solve f_i(x_i, x_{q+1}) + A_i(x)(x̃_i − x_i) = 0 for x̃_i (i = 1, . . . , q).
Step 2: Solve f_{q+1}(x̃_1, . . . , x̃_q, x_{q+1}) + S(x)(x_{q+1}^+ − x_{q+1}) = 0 for x_{q+1}^+.
Step 3: x_i^+ = Ψ_i(x, x̃_1, . . . , x̃_q, x_{q+1}^+)   (i = 1, . . . , q)   (correction step).
Hoyer and Schmidt suggested the following three corrections:
x_i^+ = x̃_i,   i = 1, . . . , q   (Method 1),   (5.109)
x_i^+ = x̃_i − A_i(x)^{-1} B_i(x)(x_{q+1}^+ − x_{q+1}),   i = 1, . . . , q   (Method 2),   (5.110)
f_i(x̃_i, x_{q+1}^+) + A_i(x)(x_i^+ − x̃_i) = 0,   i = 1, . . . , q   (Method 3).   (5.111)
They proved that the local convergence rate of Method 1 is 2-step Q-quadratic, while
Methods 2 and 3 have local convergence of Q-order 2.
Next we show that Methods 1 and 2 are identical with the explicit and the basic
corrected implicit methods of Zhang, Byrd and Schnabel [254], respectively. Let
x^k = [ (x_1^k)^T, . . . , (x_q^k)^T, (x_{q+1}^k)^T ]^T   (5.112)
denote the kth iteration. Zhang, Byrd and Schnabel [254] suggested the following method.
Corrected Implicit Method (iteration k)
Step 1:
for i = 1, . . . , q
   x_i^{k,0} = x_i^k
   for j = 1, . . . , j_i
      Solve A_i(x^k) Δx_i^{k,j-1} = −f_i(x_i^{k,j-1}, x_{q+1}^k) for Δx_i^{k,j-1}.
      x_i^{k,j} = x_i^{k,j-1} + Δx_i^{k,j-1}
   end
end
Step 2:
Calculate S(x^k) = D(x^k) − Σ_{i=1}^q C_i(x^k) A_i^{-1}(x^k) B_i(x^k).
Solve S(x^k) Δx_{q+1} = −f_{q+1}(x_1^{k,j_1}, . . . , x_q^{k,j_q}, x_{q+1}^k) for Δx_{q+1}.
x_{q+1}^{k+1} = x_{q+1}^k + Δx_{q+1}
Step 3:
for i = 1, . . . , q
   x_i^{k+1} = x_i^{k,j_i} − A_i^{-1}(x^k) B_i(x^k) Δx_{q+1}
end
Steps 1 and 3 of the corrected implicit method provide parallelism for calculating the
x̃_i's and the x_i^+'s. The basic corrected implicit method is defined by j_i = 1 (i = 1, . . . , q).
The class of corrected implicit methods is a modification of the basic corrected implicit
method such that it allows the repetition of Step 1 "equationwise". The reason for this is the
parallel bottleneck in Step 2 of the Hoyer-Schmidt algorithm or the corrected implicit
method [254]. Globalized versions of the latter and their convergence analysis are given
in Feng and Schnabel [82] and Zanghirati [250].
Let the n × n unit matrix I_n be partitioned into r blocks as follows
I_n = [E^{(1)}, . . . , E^{(r)}]   (E^{(i)} ∈ R^{n×n_i})   (5.113)
and consider the block implicit LU method on the general nonlinear system F(x) = 0
(F : R^n → R^n).
Block implicit LU ABS method (iteration k)
u^1 = x_k, H_1 = I_n
for i = 1, . . . , r
   P_i = H_i^T E^{(i)}
   η_i = Σ_{j=1}^i τ_{ji} u^j   (τ_{ji} ≥ 0, Σ_{j=1}^i τ_{ji} = 1)
   u^{i+1} = u^i − P_i (E^{(i)T} J(η_i) P_i)^{-1} E^{(i)T} F(u^i)
   H_{i+1} = H_i − H_i J(η_i)^T E^{(i)} (E^{(i)T} H_i J(η_i)^T E^{(i)})^{-1} E^{(i)T} H_i
end
x_{k+1} = u^{r+1}
Consider one step of the implicit LU method on the coupled nonlinear system
f(x, y) = 0,   g(x, y) = 0,   (5.114)
where x = [x_1^T, . . . , x_q^T]^T, y = x_{q+1},
f(x, y) = [f_1(x_1, x_{q+1})^T, . . . , f_q(x_q, x_{q+1})^T]^T : R^n → R^{n−n_{q+1}}
and
g(x, y) = f_{q+1}(x_1, . . . , x_{q+1}) : R^n → R^{n_{q+1}}.
Observe that
f_x'(x, y) = diag(A_1, A_2, . . . , A_q)   and   f_y'(x, y) = [B_1^T, B_2^T, . . . , B_q^T]^T.
Let us partition u^i as
u^i = [ u_1^i ; u_2^i ],   u^1 = [ x ; y ].
Then by direct calculation we have
u^2 = [ u_1^2 ; u_2^2 ] = [ x − [f_x'(x, y)]^{-1} f(x, y) ; y ],
S = g_y'(η_2) − g_x'(η_2) [f_x'(x, y)]^{-1} f_y'(x, y),
and
u^3 = [ u_1^2 + [f_x'(x, y)]^{-1} f_y'(x, y) S^{-1} g(u_1^2, y) ; y − S^{-1} g(u_1^2, y) ].
Using the notations
u_1^2 = [x̃_1^T, . . . , x̃_q^T]^T,   u_1^3 = [(x_1^+)^T, . . . , (x_q^+)^T]^T,   u_2^3 = x_{q+1}^+
and choosing η_2 = x we obtain Method 2 of the Hoyer-Schmidt algorithm [148]. In the
notation of the corrected implicit method, the block implicit LU ABS method takes the
following form.
Block implicit LU ABS method (Special version 1, iteration k)
Step 1:
for i = 1, . . . , q
   Solve A_i(x^k) Δx_i = −f_i(x_i^k, x_{q+1}^k) for Δx_i.
   x_i^{k,1} = x_i^k + Δx_i
end
Step 2:
Calculate S(x^k) = D(x^k) − Σ_{i=1}^q C_i(x^k) A_i^{-1}(x^k) B_i(x^k).
Solve S(x^k) Δx_{q+1} = −f_{q+1}(x_1^{k,1}, . . . , x_q^{k,1}, x_{q+1}^k) for Δx_{q+1}.
x_{q+1}^{k+1} = x_{q+1}^k + Δx_{q+1}
Step 3:
for i = 1, . . . , q
   x_i^{k+1} = x_i^{k,1} − A_i^{-1}(x^k) B_i(x^k) Δx_{q+1}
end
This form of the implicit LU ABS method is obviously identical with the basic
corrected implicit method of Zhang, Byrd and Schnabel [254]. It is noted that a direct
application of the block implicit LU ABS method to the system (5.103) also leads to this
special form due to the block diagonal structure of the block q × q principal submatrix of
the Jacobian matrix (5.105).
Proposition 145 The block implicit LU ABS method contains Method 2 of the Hoyer-Schmidt
algorithm, or the corrected implicit method of Zhang, Byrd and Schnabel for j_i = 1
(i = 1, . . . , q).
In addition to the convergence results mentioned earlier we can also apply the
local convergence analysis of Section 5.2.
All nonlinear block ABS methods terminate in r + 1 steps on linear systems of
the form F(x) = Ax − b = 0 for an arbitrary initial vector u^1. Consider the following linear
system
[ A_1                      B_1 ] [ x_1     ]   [ b_1     ]
[      A_2                 B_2 ] [ x_2     ]   [ b_2     ]
[            . . .         ... ] [ ...     ] = [ ...     ]
[                 A_q      B_q ] [ x_q     ]   [ b_q     ]
[ C_1   C_2   · · ·   C_q   D  ] [ x_{q+1} ]   [ b_{q+1} ].
The block implicit LU ABS method (Special version 1) then has the form
Step 1:
for i = 1, . . . , q
   Solve A_i Δu_i = −(A_i u_i^1 + B_i u_{q+1}^1 − b_i) for Δu_i.
   u_i^{1,1} = u_i^1 + Δu_i
end
Step 2:
Calculate S = D − Σ_{i=1}^q C_i A_i^{-1} B_i.
Solve S Δu_{q+1} = −(Σ_{i=1}^q C_i u_i^{1,1} + D u_{q+1}^1 − b_{q+1}) for Δu_{q+1}.
u_{q+1}^2 = u_{q+1}^1 + Δu_{q+1}
Step 3:
for i = 1, . . . , q
   u_i^2 = u_i^{1,1} − A_i^{-1} B_i Δu_{q+1}
end
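The three steps above are easy to prototype for dense blocks. The following sketch is illustrative only (it corresponds to Special version 1 started from u^1 = 0 and forms the Schur complement S explicitly); it is not the dissertation's code.

    import numpy as np

    def arrowhead_solve(A_blocks, B_blocks, C_blocks, D, b_blocks, b_last):
        """Solve a block arrowhead linear system via the Schur complement
        S = D - sum_i C_i A_i^{-1} B_i (Steps 1-3 with u^1 = 0)."""
        # Step 1: independent local solves A_i w_i = b_i (parallelizable)
        w = [np.linalg.solve(Ai, bi) for Ai, bi in zip(A_blocks, b_blocks)]
        # Step 2: Schur complement system for the coupling variable
        AinvB = [np.linalg.solve(Ai, Bi) for Ai, Bi in zip(A_blocks, B_blocks)]
        S = D - sum(Ci @ AiB for Ci, AiB in zip(C_blocks, AinvB))
        x_last = np.linalg.solve(S, b_last - sum(Ci @ wi for Ci, wi in zip(C_blocks, w)))
        # Step 3: back-substitute the coupling correction into the local blocks
        x_blocks = [wi - AiB @ x_last for wi, AiB in zip(w, AinvB)]
        return x_blocks, x_last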
The domain decomposition method of Bjørstad and Widlund [27] for solving the
Poisson problem
−Δu = f(x, y),   (x, y) ∈ Ω,
u(x, y) = g(x, y),   (x, y) ∈ ∂Ω,
leads to a linear system of the form
[ A_{1,1}                                          A_{1,q+1}   ] [ u_1     ]   [ b_1     ]
[            A_{2,2}                               A_{2,q+1}   ] [ u_2     ]   [ b_2     ]
[                      A_{3,3}                     A_{3,q+1}   ] [ u_3     ] = [ b_3     ]   (5.115)
[                                . . .             ...         ] [ ...     ]   [ ...     ]
[ A_{1,q+1}^T  A_{2,q+1}^T  A_{3,q+1}^T   . . .    A_{q+1,q+1} ] [ u_{q+1} ]   [ b_{q+1} ].
Assuming that u_{q+1} is known, the solution of the system reduces to the solution
of the subsystems
A_{i,i} u_i = b_i − A_{i,q+1} u_{q+1},   i = 1, . . . , q.   (5.116)
A reduced system of equations in u_{q+1} is obtained from equation (5.115) by eliminating the
unknown vectors u_1 through u_q. Substitute
u_i = A_{i,i}^{-1} (b_i − A_{i,q+1} u_{q+1})
into block number q + 1 of equation (5.115) to obtain the capacitance system
C u_{q+1} = s_{q+1},   (5.117)
with the capacitance matrix
C = A_{q+1,q+1} − Σ_{i=1}^q A_{i,q+1}^T A_{i,i}^{-1} A_{i,q+1}   (5.118)
as coefficient matrix and with the right-hand side vector given by
s_{q+1} = b_{q+1} − Σ_{i=1}^q A_{i,q+1}^T v_i,   (5.119)
where
A_{i,i} v_i = b_i,   i = 1, . . . , q.   (5.120)
The domain decomposition method consists of three steps in order: the solution of systems
(5.120), the solution of the capacitance system (5.117) and finally the solution of systems
(5.116). The subsystems can be solved by any available Poisson solver.
It is easy to verify the following statement.
Proposition 146 The block implicit LU ABS method contains the capacitance matrix
method of Bjørstad and Widlund, if u1 = 0, Ai,q+1 = Bi and Ci = ATi,q+1 .
The experimental results of Zhang, Byrd and Schnabel [254] show that the corrected implicit methods, or equivalently the Hoyer-Schmidt or the implicit LU ABS
method are very effective on nonlinear systems of the form (5.103). Van de Velde [240]
gives a very detailed implementation and performance analysis of the domain decomposition method of Bjørstad and Widlund on parallel computers, which also applies here.
5.5.2 Constrained minimization with implicit LU ABS methods
We seek the Kuhn-Tucker points of equality and inequality constrained optimization problems by solving an equivalent nonlinear system. For the solution of this structured nonlinear system we suggest a special version of the implicit LU ABS method which seems
to be useful in certain cases.
We investigate nonlinear optimization problems of the form
f(x) → min
h_j(x) = 0,   j ∈ J = {1, 2, . . . , p},
g_i(x) ≤ 0,   i ∈ I = {1, 2, . . . , m},   (5.121)
where f, g_i, h_j : R^n → R (i ∈ I, j ∈ J) are smooth enough. Let
L(x, μ, λ) = f(x) + Σ_{j∈J} μ_j h_j(x) + Σ_{i∈I} λ_i g_i(x)
and
z = [x^T, μ^T, λ^T]^T.
A point z^* = [x^{*T}, μ^{*T}, λ^{*T}]^T is said to be a Kuhn-Tucker point (KTP) if it satisfies
∇_x L(x, μ, λ) = 0,
h_j(x) = 0 (j ∈ J),
g_i(x) ≤ 0 (i ∈ I),
λ_i g_i(x) = 0 (i ∈ I),
λ_i ≥ 0 (i ∈ I).   (5.122)
Under a regularity assumption, conditions (5.122) are necessary for the optimality of x^*
in the optimization problem (5.121). There are several methods to solve (5.122), especially in
the case I = ∅. For details and references we refer to [170].
Definition 147 We call a function φ : R^2 → R an NCP-function if it satisfies the complementarity condition
φ(a, b) = 0  ⇔  a ≥ 0, ab = 0, b ≤ 0.
Mangasarian [175] gives a method to construct such functions. Everywhere differentiable NCP-functions are
φ(a, b) = ab + (1/2)(max{0, b − a})^2,   (Evtushenko-Purtov)
φ(a, b) = ab + (max{0, a})^2 + (min{0, b})^2,   (Evtushenko-Purtov)
φ(a, b) = (a + b)^2 + b|b| − a|a|.   (Mangasarian)
Using any complementarity function we can rewrite the Kuhn-Tucker conditions
(5.122) as an equivalent nonlinear system F_φ(z) = 0, where F_φ : R^{n+p+m} → R^{n+p+m} is defined by
F_φ(z) = [ ∇_x L(x, μ, λ) ; h_1(x) ; . . . ; h_p(x) ; φ(λ_1, g_1(x)) ; . . . ; φ(λ_m, g_m(x)) ] = 0.   (5.123)
This kind of equivalence was first given by Mangasarian [175], who also gave the first
technique to construct NCP-functions.
Under the following assumptions Kanzow and Kleinmichel [170] defined a class of
Newton-type methods for solving the nonlinear system
F_φ(z) = 0.   (5.124)
Let z^* = [x^{*T}, μ^{*T}, λ^{*T}]^T be a KTP of problem (5.121).
(A.1) The functions f, g_i and h_j (i ∈ I, j ∈ J) are twice differentiable with
Lipschitz-continuous second derivatives in a neighborhood of x^*.
(A.2) The gradients ∇g_i(x^*) (i ∈ I^{**}) and ∇h_j(x^*) (j ∈ J) are linearly independent,
where I^{**} = {i ∈ I | λ_i^* > 0}.
(A.3) y^T ∇^2_{xx} L(x^*, μ^*, λ^*) y > 0 for all y ∈ R^n, y ≠ 0, with y^T ∇g_i(x^*) = 0 (i ∈ I^{**})
and y^T ∇h_j(x^*) = 0 (j ∈ J).
(A.4) I^* = I^{**}, where I^* = {i ∈ I | g_i(x^*) = 0}.
(A.5) The NCP-function φ satisfies
∂φ/∂a (λ_i^*, g_i(x^*)) = 0   (i ∈ I^{**}),      ∂φ/∂b (λ_i^*, g_i(x^*)) ≠ 0   (i ∈ I^{**}),
∂φ/∂a (λ_i^*, g_i(x^*)) ≠ 0   (i ∉ I^{**}),      ∂φ/∂b (λ_i^*, g_i(x^*)) = 0   (i ∉ I^{**}).   (5.125)
It is noted that if the NCP-function φ is continuously differentiable, then the first
and the last relations of condition (5.125) follow from this fact (see Lemma 2.4 in [170]).
Kanzow and Kleinmichel [170] proved the following result.
Theorem 148 (Kanzow and Kleinmichel). Let z^* = [x^{*T}, μ^{*T}, λ^{*T}]^T be a KTP of problem (5.121). Suppose that the assumptions (A.1)-(A.5) hold at z^*. Let φ : R^2 → R be a
continuously differentiable NCP-function. Then the Jacobian matrix F_φ'(z^*) is nonsingular.
Theorem 148 can be weakened to locally differentiable NCP-functions φ, if they otherwise
satisfy assumption (A.5). Such an NCP-function is the Fischer function
φ(a, b) = √(a^2 + b^2) + b − a,   (5.126)
which is differentiable for all vectors (a, b) ≠ (0, 0).
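A short, purely illustrative check of (5.126) against the complementarity characterization of Definition 147 (the sample values are arbitrary):

    import numpy as np

    def fischer(a, b):
        """Fischer NCP-function: phi(a, b) = sqrt(a^2 + b^2) + b - a, see (5.126)."""
        return np.hypot(a, b) + b - a

    # phi(a, b) = 0 exactly when a >= 0, b <= 0 and a*b = 0
    print(fischer(2.0, 0.0), fischer(0.0, -3.0))    # both 0.0: complementary pairs
    print(fischer(1.0, -2.0), fischer(-1.0, 0.0))   # both nonzero: conditions violated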
The Jacobian matrix F_φ'(z) is given by
          [ ∇^2_{xx} L(x, μ, λ)                        ∇h_1(x) · · · ∇h_p(x)      ∇g_1(x) · · · ∇g_m(x)                 ]
F_φ'(z) = [ ∇h_j(x)^T  (j = 1, . . . , p)                       0                           0                           ]
          [ (∂φ/∂b)_i ∇g_i(x)^T  (i = 1, . . . , m)             0            diag((∂φ/∂a)_1, . . . , (∂φ/∂a)_m)         ],
where
(∂φ/∂a)_i = ∂φ/∂a (λ_i, g_i(x))   (i ∈ I),      (∂φ/∂b)_i = ∂φ/∂b (λ_i, g_i(x))   (i ∈ I).   (5.127)
This Jacobian matrix has the block structure
          [ A    C   B ]
F_φ'(z) = [ C^T  0   0 ],   (5.128)
          [ D    0   E ]
where the diagonal matrix E may become singular. The Newton-type methods suggested
by Kanzow and Kleinmichel [170] take special care of this singularity by first solving
a reduced linear system and then making a correction step. It is noted that in [170]
the inequality constraints precede the equations h_i(x) = 0 and the Jacobian F_φ'(z) is
permutationally similar to (5.128).
Here we suggest another approach by using the block implicit LU ABS method
that can avoid the handling of the zero diagonals in E.
Lemma 149 If F_φ'(z^*), A and C^T A^{-1} C are nonsingular, then F_φ'(z^*) is block strongly
nonsingular and its block LU decomposition is given by
[ I            0                                0 ] [ A    C                 B               ]
[ C^T A^{-1}   I                                0 ] [ 0   −C^T A^{-1} C     −C^T A^{-1} B    ]
[ D A^{-1}     D A^{-1} C (C^T A^{-1} C)^{-1}   I ] [ 0    0                 W               ],
where W = E − D A^{-1} B + D A^{-1} C (C^T A^{-1} C)^{-1} C^T A^{-1} B is also nonsingular.
The proof of the Lemma is easy by direct calculation. Coleman and Fenyes [54] point
out that many authors assume that A = ∇^2_{xx} L(z^*) is positive definite. In such a case,
C^T A^{-1} C is also positive definite, provided that C has maximum column rank. For other
assumptions, see [199], [54] and [170].
Consider now one step of the block implicit LU ABS method on the partitioned
system
F_φ(z) = [ F_1(x, μ, λ) ; F_2(x) ; F_3(x, λ) ],
where F_1(z) = F_1(x, μ, λ) = ∇_x L(x, μ, λ),
F_2(x) = [ h_1(x), . . . , h_p(x) ]^T,   F_3(x, λ) = [ φ(λ_1, g_1(x)), . . . , φ(λ_m, g_m(x)) ]^T,
and r = 3 (n_1 = n, n_2 = p, n_3 = m). We also partition u^i accordingly, that is,
u^i = [ u_1^i ; u_2^i ; u_3^i ],
where u_1^i ∈ R^n, u_2^i ∈ R^p and u_3^i ∈ R^m. By direct calculation and the choice τ_{ji} = 0
(j > 1) we have the following method.
Block implicit LU ABS method (Special version 1, iteration k)
Step 1:
u^1 = z^k,
[ A    C   B ]
[ C^T  0   0 ] = F_φ'(u^1);
[ D    0   E ]
Step 2:
u_1^2 = u_1^1 − A^{-1} F_1(u^1),
u_2^2 = u_2^1,
u_3^2 = u_3^1;
Step 3:
u_1^3 = u_1^2 − A^{-1} C (C^T A^{-1} C)^{-1} F_2(u_1^2),
u_2^3 = u_2^2 + (C^T A^{-1} C)^{-1} F_2(u_1^2),
u_3^3 = u_3^2;
Step 4:
W = E − D A^{-1} B + D A^{-1} C (C^T A^{-1} C)^{-1} C^T A^{-1} B,
u_1^4 = u_1^3 − (−A^{-1} B + A^{-1} C (C^T A^{-1} C)^{-1} C^T A^{-1} B) W^{-1} F_3(u_1^3, u_3^3),
u_2^4 = u_2^3 + (C^T A^{-1} C)^{-1} C^T A^{-1} B W^{-1} F_3(u_1^3, u_3^3),
u_3^4 = u_3^3 − W^{-1} F_3(u_1^3, u_3^3);
Step 5:
z^{k+1} = u^4
It is clear that we can avoid inversions when implementing the algorithm. A
possible way of doing this is the following:
Step 2:
AΔ_1 = −F_1(u^1),   u_1^2 = u_1^1 + Δ_1;
Step 3:
AΔ_1 = C,   (C^T Δ_1) Δ_2 = F_2(u_1^2),   AΔ_3 = CΔ_2,
u_1^3 = u_1^2 − Δ_3,
u_2^3 = u_2^2 + Δ_2;
Step 4:
AΔ_1 = C,   AΔ_2 = B,   (C^T Δ_1) Δ_3 = C^T Δ_2,   AΔ_4 = CΔ_3,   W = E − DΔ_2 + DΔ_4,
WΔ_5 = F_3(u_1^3, u_3^3),
u_1^4 = u_1^3 − (−Δ_2 + Δ_4) Δ_5,
u_2^4 = u_2^3 + Δ_3 Δ_5,
u_3^4 = u_3^3 − Δ_5.
Preliminary numerical testing in MATLAB indicates that the algorithm works
well if C T A−1 C is well-conditioned.
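A sketch of this inversion-free organization in dense linear algebra (illustrative only; the function and block names are assumptions, and in practice one would factor A and C^T Δ_1 once and reuse the factors):

    import numpy as np

    def kkt_abs_step(F1, F2, F3, jacobian_blocks, u1, u2, u3):
        """One major iteration of the block implicit LU ABS method on (5.123),
        following Steps 2-4 of the inversion-free variant (sketch)."""
        A, C, B, D, E = jacobian_blocks(u1, u2, u3)    # blocks of F_phi'(u^1), cf. (5.128)
        # Step 2
        u1 = u1 + np.linalg.solve(A, -F1(u1, u2, u3))
        # Step 3
        D1 = np.linalg.solve(A, C)                     # A Delta_1 = C
        D2 = np.linalg.solve(C.T @ D1, F2(u1))         # (C^T Delta_1) Delta_2 = F_2(u_1^2)
        u1, u2 = u1 - np.linalg.solve(A, C @ D2), u2 + D2
        # Step 4
        D2b = np.linalg.solve(A, B)                    # A Delta_2 = B
        D3 = np.linalg.solve(C.T @ D1, C.T @ D2b)      # (C^T Delta_1) Delta_3 = C^T Delta_2
        D4 = np.linalg.solve(A, C @ D3)                # A Delta_4 = C Delta_3
        W = E - D @ D2b + D @ D4
        D5 = np.linalg.solve(W, F3(u1, u3))
        return u1 - (-D2b + D4) @ D5, u2 + D3 @ D5, u3 - D5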
Chapter 6
CONVERGENCE AND ERROR ESTIMATES
Here we study the a posteriori error estimate of Auchmuty for the approximate solution of
linear equations [115] and derive computable convergence estimates for the von Neumann-Halperin and Nelson-Neumann theorems in finite-dimensional spaces. The latter
result can be used to estimate the convergence speed of the Kaczmarz method of Section
5.1 and other similar methods.

6.1 A posteriori error estimates for linear and nonlinear equations
We analyze the absolute error estimate of Auchmuty [17] developed for linear systems. In
the euclidean norm this estimate and its geometrical interpretation are derived from the
Kantorovich inequality. The estimate is then compared with other estimates known in
the literature. A probabilistic analysis and extension of the estimate to nonlinear systems
are also given. The computational test results indicate that Auchmuty’s estimate is an
appropriate tool for practice.
Let ω be the exact solution of the linear system
Ax = b   (A ∈ R^{n×n}, det(A) ≠ 0)
and let r(x) = Ax − b denote the residual error for any approximate solution x. There
are several a posteriori error estimates which exploit the residual information [57], [14],
[257], [31]. Here we recall the following estimates
‖r(x)‖ / ‖A‖ ≤ ‖x − ω‖ ≤ ‖A^{-1}‖ ‖r(x)‖,   (6.1)
‖Br(x)‖ / (1 + ‖BA − I‖) ≤ ‖x − ω‖ ≤ ‖Br(x)‖ / (1 − ‖BA − I‖),   (6.2)
where I stands for the unit matrix, B is an approximation to A^{-1} satisfying ‖BA − I‖ < 1,
the matrix norms are multiplicative and the vector norms are consistent.
Estimate (6.1) requires the knowledge of ‖A‖ and ‖A^{-1}‖, while estimate (6.2),
which is due to Aird and Lynch [14], [257], requires an approximate inverse B of the matrix
A. Auchmuty's estimate [17] requires neither piece of information. Let x ∈ R^n be an arbitrary
approximate solution (r(x) ≠ 0). Then
‖x − ω‖_p = c ‖r(x)‖_2^2 / ‖A^T r(x)‖_q   (1 ≤ p ≤ ∞, 1/p + 1/q = 1)   (6.3)
holds with 1 ≤ c ≤ C_p(A), where
C_p(A) = sup_{y≠0} ‖A^T y‖_q ‖A^{-1} y‖_p / ‖y‖_2^2.   (6.4)
Auchmuty's estimate seems to have gone unnoticed, although computational experiments indicate that
the error constant c is usually less than 10 in practice. Such a ratio between the estimate
and the estimated quantity is usually acceptable (see, e.g., Higham [143], p. 294).
In the sequel we investigate the Auchmuty estimate for the Euclidean norm, which
has the form
‖r(x)‖_2^2 / ‖A^T r(x)‖_2 ≤ ‖x − ω‖_2 ≤ C_2(A) ‖r(x)‖_2^2 / ‖A^T r(x)‖_2,   (6.5)
with
C_2(A) = sup_{y≠0} ‖A^T y‖_2 ‖A^{-1} y‖_2 / ‖y‖_2^2   (C_2(A) ≤ κ_2(A) = ‖A‖_2 ‖A^{-1}‖_2).   (6.6)
We first show that the error estimate is a consequence of the Kantorovich inequality. This approach leads to the exact value of C_2(A) and the characterization of all
cases when equality appears in the upper bound of (6.5). Using the Greub-Rheinboldt
formulation of the Kantorovich inequality we derive the geometric interpretation of the
estimate. This shows that Auchmuty's lower estimate orthogonally projects the error vector
x − ω onto the subspace R(A^T r(x)). We also make some probabilistic reasoning about
the possible values of c and C_2(A), giving a better background for the numerical testing.
The Auchmuty estimate is then extended to nonlinear systems of the form F(x) = 0.
This result can be used in conjunction with the Newton and Newton-like methods. We
carried out intensive computational testing for linear systems. The results, which indicate
the usefulness of the estimate, are evaluated in Section 6.1.5, where a practical version
(formula (6.28)) is also suggested.
6.1.1 Derivation and geometry of the Auchmuty estimate
We Þrst show that Auchmuty’s estimate is a consequence of the Kantorovich inequality
given in the following form (see e.g., [131], [144], [176]). If B ∈ Rn×n is a symmetric
positive deÞnite matrix with eigenvalues λ1 ≥ λ2 ≥ . . . ≥ λn > 0 and x ∈ Rn is an
arbitrary vector, then
¡
¢¡
¢ 1 (λ1 + λn )2
4
kxk42 .
kxk2 ≤ xT Bx xT B −1 x ≤
4 λ1 λn
(6.7)
The Kantorovich inequality is sharp. Let B = U ΣU T with orthogonal U = [u1 , . . . , un ]
and Σ = diag (λ1 , . . . , λn ). Let the multiplicity of λ1 and λn be k and l, respectively. It
follows from Henrici [142], that equality holds for x 6= 0 in the upper bound, if and only if
x = τU y, where τ ∈ R (τ 6= 0) and y = [y1 , . . . , yk , 0, . . . , 0, yn−l+1 , . . . , yn ]T is such that
k
X
i=1
yi2 =
n
X
i=n−l+1
1
yi2 = .
2
(6.8)
In particular, for x = (u_1 + u_n)/√2 equality is achieved in the upper bound. Notice that for x = u_i (i = 1, ..., n) equality holds in the lower bound.
Let A = UΣ_A V^T (Σ_A = diag(σ_1, ..., σ_n)) be the singular value decomposition of A such that σ_1 ≥ σ_2 ≥ ... ≥ σ_n > 0 and let B = AA^T. As x^T B x = ‖A^T x‖_2^2, x^T B^{-1} x = ‖A^{-1} x‖_2^2 and λ_i = λ_i(B) = σ_i^2(A) = σ_i^2, where σ_i is the ith singular value of A, we can write
\[
\|x\|_2^4 \le \|A^T x\|_2^2\,\|A^{-1} x\|_2^2 \le \frac{1}{4}\,\frac{(\sigma_1^2+\sigma_n^2)^2}{\sigma_1^2\sigma_n^2}\,\|x\|_2^4,
\]
from which
\[
\|x\|_2^2 \le \|A^T x\|_2\,\|A^{-1} x\|_2 \le \frac{1}{2}\,\frac{\sigma_1^2+\sigma_n^2}{\sigma_1\sigma_n}\,\|x\|_2^2 \tag{6.9}
\]
follows. Substituting x by r(x) = Ax − b = A(x − ω) we have
\[
\|r(x)\|_2^2 \le \|A^T r(x)\|_2\,\|x-\omega\|_2 \le \frac{1}{2}\,\frac{\sigma_1^2+\sigma_n^2}{\sigma_1\sigma_n}\,\|r(x)\|_2^2,
\]
which implies the Auchmuty estimate
\[
\frac{\|r(x)\|_2^2}{\|A^T r(x)\|_2} \le \|x-\omega\|_2 \le \frac{1}{2}\,\frac{\sigma_1^2+\sigma_n^2}{\sigma_1\sigma_n}\,\frac{\|r(x)\|_2^2}{\|A^T r(x)\|_2}. \tag{6.10}
\]
The upper bound is sharp for x = ω + τVΣ_A^{-1}y, where τ and y are defined at (6.8). We may conclude that
\[
C_2(A) = \frac{1}{2}\,\frac{\sigma_1^2+\sigma_n^2}{\sigma_1\sigma_n} = \frac{1}{2}\left(\kappa_2(A)+\frac{1}{\kappa_2(A)}\right). \tag{6.11}
\]
Auchmuty [17] mentions that for p = 2 a weaker form of the upper bound in (6.5) can be obtained from the Kantorovich inequality. Here we point out that exactly the same inequality can be derived from the Kantorovich inequality and that C_2(A) is equal to (σ_1^2 + σ_n^2)/(2σ_1σ_n). Observe that
\[
C_2(A) \approx \frac{1}{2}\,\kappa_2(A), \tag{6.12}
\]
if κ_2(A) is large enough.
As r(x) = Ae (e = x − ω) we can write the error constant c in the form
\[
c^2 = \frac{\left(e^T (A^T A)^2 e\right)\left(e^T e\right)}{\left(e^T A^T A e\right)^2}. \tag{6.13}
\]
Observe that c is invariant under the transformation e → γe. So the error constant
c depends only on the direction of the error vector e. For later use we introduce the
notation c = c (A, e).
For the geometrical interpretation of the estimate we need the Greub-Rheinboldt
reformulation of the Kantorovich inequality ([131], [144]).
Let D, E ∈ Rn×n be two positive deÞnite, symmetric and commuting matrices.
Denote by λ1 and λn the largest and the smallest eigenvalue of D, respectively. Similarly,
denote by µ1 and µn the largest and the smallest eigenvalue of E, respectively. Then
\[
\left(x^T D^2 x\right)\left(x^T E^2 x\right) \le \frac{(\lambda_1\mu_1+\lambda_n\mu_n)^2}{4\lambda_1\lambda_n\mu_1\mu_n}\left(x^T D E x\right)^2 \tag{6.14}
\]
for all x ∈ R^n.
Let cos(x, y) denote the cosine between the vectors x and y. If D is positive definite and symmetric, then
\[
\cos(Dx, x) \ge \frac{2\sqrt{\kappa_2(D)}}{1+\kappa_2(D)} \qquad (x \ne 0). \tag{6.15}
\]
The definition of the cosine and the Greub-Rheinboldt inequality (6.14) with E = I imply that
\[
\cos^2(Dx, x) = \frac{\left(x^T D x\right)^2}{\left(x^T D^2 x\right)\left(x^T x\right)} \ge \frac{4\lambda_1\lambda_n}{(\lambda_1+\lambda_n)^2} = \frac{4\kappa_2(D)}{(1+\kappa_2(D))^2}.
\]
Inequality (6.15) is sharp. Let D = A^T A and let A = UΣ_A V^T be again the singular value decomposition of A. The lower bound is then achieved for x = τVΣ_A^{-1}y, where τ and y are defined at (6.8). We note that the quantity 2√(κ_2(D))/(1 + κ_2(D)) is equal to cos D, which is the cosine of the operator D (see Gustafson, Rao [132]). In general,
\[
\cos(A) = \inf_{x\ne 0,\,Ax\ne 0} \frac{x^T A x}{\|Ax\|\,\|x\|} \qquad \left(A \in R^{n\times n}\right). \tag{6.16}
\]
We can easily recognize that the error constant c = c(A, e) can be expressed as
\[
c = c(A, e) = \frac{1}{\cos\left(A^T A e,\ e\right)}, \tag{6.17}
\]
where the angle α = (A^T Ae, e)∠ can vary in [0, cos^{-1}(2σ_1σ_n/(σ_1^2+σ_n^2))]. It is clear that c is maximal if α is also maximal.
We can now express Auchmuty’s estimate as follows.
Theorem 150 For the absolute error, the relation
\[
\frac{\|r(x)\|_2^2}{\|A^T r(x)\|_2} = \cos\left(A^T A e,\ e\right)\,\|e\|_2 \tag{6.18}
\]
holds with
\[
1 \ge \cos\left(A^T A e,\ e\right) \ge \frac{1}{C_2(A)} = \cos A^T A. \tag{6.19}
\]
So we can think that Auchmuty's lower estimate orthogonally projects the error vector e into the subspace R(A^T Ae) = R(A^T r). The smaller the angle (A^T Ae, e)∠, the better the estimate.
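The quantities above are straightforward to evaluate numerically. The following minimal sketch is for illustration only (it is written in Python/NumPy, whereas the experiments of Section 6.1.5 used MATLAB, and the function names are ours, not part of the original text); it computes the inclusion interval (6.5) via (6.11) and the error constant c(A, e) of (6.13) and (6.17).

import numpy as np

def auchmuty_interval(A, x, b):
    # Lower and upper bounds of (6.5) for the error ||x - omega||_2.
    r = A @ x - b                                      # residual r(x) = Ax - b
    lower = np.dot(r, r) / np.linalg.norm(A.T @ r)     # ||r||_2^2 / ||A^T r||_2
    s = np.linalg.svd(A, compute_uv=False)             # singular values, descending
    C2 = (s[0]**2 + s[-1]**2) / (2.0 * s[0] * s[-1])   # C_2(A) by (6.11)
    return lower, C2 * lower

def error_constant(A, e):
    # c(A, e) = 1 / cos(A^T A e, e), cf. (6.13) and (6.17).
    w = A.T @ (A @ e)
    return np.linalg.norm(w) * np.linalg.norm(e) / np.dot(w, e)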
6.1.2 Comparison of estimates
We compare estimates (6.1) and (6.5). These estimates give the inclusion intervals
\[
\left[\frac{\|r(x)\|_2}{\|A\|_2},\ \|A^{-1}\|_2\,\|r(x)\|_2\right], \qquad \left[\frac{\|r(x)\|_2^2}{\|A^T r(x)\|_2},\ C_2(A)\,\frac{\|r(x)\|_2^2}{\|A^T r(x)\|_2}\right]
\]
for ‖e‖_2, respectively. The ratios of the upper and lower interval bounds are κ_2(A) and C_2(A), respectively. As C_2(A) ≈ κ_2(A)/2 for large κ_2(A), this ratio is smaller for the Auchmuty estimate. The lower bounds satisfy
\[
\frac{\|r(x)\|_2}{\|A\|_2} \le \frac{\|r(x)\|_2^2}{\|A^T r(x)\|_2} \le \|x-\omega\|_2.
\]
Thus Auchmuty's lower estimate is a better approximation to ‖e‖_2 than the lower bound of estimate (6.1). For the upper bounds of the inclusion intervals the relation
\[
\frac{1}{2}\,\|A^{-1}\|_2\,\|r(x)\|_2 \le C_2(A)\,\frac{\|r(x)\|_2^2}{\|A^T r(x)\|_2} \le C_2(A)\,\|A^{-1}\|_2\,\|r(x)\|_2
\]
holds. The relative position of the corresponding upper bounds depends on the value of ‖A^T r(x)‖_2, which may lie in [σ_n‖r(x)‖_2, σ_1‖r(x)‖_2]. One can easily prove that
\[
C_2(A)\,\frac{\|r(x)\|_2^2}{\|A^T r(x)\|_2} \ge \|A^{-1}\|_2\,\|r(x)\|_2
\]
for ‖A^T r(x)‖_2 = σ_n‖r(x)‖_2, and
\[
C_2(A)\,\frac{\|r(x)\|_2^2}{\|A^T r(x)\|_2} < \|A^{-1}\|_2\,\|r(x)\|_2
\]
for ‖A^T r(x)‖_2 = σ_1‖r(x)‖_2.
Brezinski gave five error estimates using the theory of moments and interpolation [31]. The closest one of these estimates is e_3 = ‖r(x)‖_2^2 / ‖Ar(x)‖_2, for which he proved that e_3/κ_2(A) ≤ ‖e‖ ≤ κ_2(A)e_3. For symmetric A the estimate e_3 is identical with Auchmuty's lower bound. In general, e_3 can be less or greater than the lower Auchmuty estimate. It is easy to prove that
\[
e_3/\kappa_2(A) \le \frac{\|r(x)\|_2^2}{\|A^T r(x)\|_2} \le \kappa_2(A)\,e_3.
\]

6.1.3 Probabilistic analysis
We investigate the behavior of c and C_2(A) for random values of e and A, respectively. We can assume that ‖e‖_2 = 1 without loss of generality. Let us assume first that A is fixed and e is random on the surface of the n-dimensional unit sphere S_n = {x ∈ R^n | x^T x = 1}. As the random variable c(A, e) is bounded, that is 1 ≤ c(A, e) ≤ C_2(A), its expected value and variance must satisfy the inequalities
\[
1 \le E\left(c(A,e)\right) \le C_2(A), \qquad \mathrm{Var}\left(c(A,e)\right) \le \left(\frac{C_2(A)-1}{2}\right)^2, \tag{6.20}
\]
respectively. Considering the fact that the extremum of c(A, e) is achieved only on a special subset of S_n, we may hope that for a relatively small positive ξ the inequality c(A, e) ≤ ξ (or cos(A^T Ae, e) ≥ ξ^{-1}) holds with a high probability. In such a case the expected values and variances can be significantly smaller than the corresponding upper bounds in (6.20). The results of numerical testing, in which e was uniformly distributed on S_n, strongly support this expectation.
If the matrix A is assumed to be random, we can use the special relationship between C_2(A) and κ_2(A) and known results on the condition number distribution of random matrices (Demmel [61], Edelman [75]). The matrix A ∈ R^{n×n} is called Gaussian if its elements are independent standard normal random variables. For the condition number κ_D(A) = ‖A‖_F ‖A^{-1}‖_2 Demmel proved that
\[
P\left(\kappa_D(A) \ge t\right) \le 2\left[\left(1+\frac{2n}{t}\right)^{n^2} - 1\right], \tag{6.21}
\]
if A ∈ R^{n×n} is a Gaussian matrix (see Demmel [61], Thm. 5.2 and also Edelman [75]).
This tail probability bound is asymptotically proportional to 4n^3/t for large t. It is less than 1 if t exceeds about 5n^3. So for Gaussian matrices of a given order n, it is very unlikely that κ_D(A) exceeds a rather large value of t.
As C_2(A) ≤ κ_2(A) ≤ κ_D(A) one can easily obtain
\[
P(c \ge t) \le P\left(C_2(A) \ge t\right) \le P\left(\kappa_D(A) \ge t\right) \le 2\left[\left(1+\frac{2n}{t}\right)^{n^2} - 1\right], \tag{6.22}
\]
if A ∈ R^{n×n} is a Gaussian matrix.
Edelman [75] proved that for Gaussian matrices A_n ∈ R^{n×n}
\[
E\left(\log\left(\kappa_2(A_n)\right)\right) \approx \log n + 1.537 \tag{6.23}
\]
as n → ∞. This result indicates that κ_2(A) is unlikely to be large for such random matrices. From (6.23) we can derive, with a reasonable heuristic, that E(log(C_2(A_n))) ≈ log n + 0.844 as n → ∞. Consequently, C_2(A_n) is likely to be under αn, where α is an appropriate constant.
Denote by L_n the lower triangular part of a Gaussian matrix A_n. Viswanath and Trefethen [245] recently proved that
\[
\sqrt[n]{\kappa_2(L_n)} \to 2 \qquad \text{almost surely} \tag{6.24}
\]
as n → ∞. This result gives a rather large value for C_2(L_n) (≈ κ_2(L_n)/2). Numerical testing up to the size n = 300 indicates that E(c(A, e)) is likely to be small (≤ 2) for both A_n and L_n.
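The kind of sampling experiment behind these expectations, and behind the testing of Section 6.1.5, is easy to reproduce. The sketch below is illustrative only (Python/NumPy, not the dissertation's MATLAB code): it draws e uniformly on S_n by normalizing a standard normal vector and records the sample mean and variance of c(A, e).

import numpy as np

def sample_error_constant(A, trials=2000, seed=0):
    # Monte Carlo estimate of E(c(A,e)) and Var(c(A,e)) for e uniform on S_n.
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    cs = np.empty(trials)
    for t in range(trials):
        e = rng.standard_normal(n)
        e /= np.linalg.norm(e)                     # uniform direction on the unit sphere
        w = A.T @ (A @ e)
        cs[t] = np.linalg.norm(w) / np.dot(w, e)   # ||e||_2 = 1, so c = ||A^T A e|| / (e^T A^T A e)
    return cs.mean(), cs.var()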
6.1.4 The extension of Auchmuty’s estimate to nonlinear systems
We consider nonlinear algebraic systems of the form
\[
F(x) = 0 \qquad (F: R^n \to R^n)
\]
and assume that the Jacobian matrix F'(ω) is invertible, F' ∈ C^1(S(ω, δ)) and
\[
\|F'(x) - F'(y)\|_2 \le L\,\|x-y\|_2 \qquad \left(\forall x, y \in S(\omega, \delta)\right).
\]
Here S(ω, δ) = {x | ‖ω − x‖_2 ≤ δ} and δ > 0. Assume that x is close enough to ω. Let B = F'(x)F'(x)^T and apply the Kantorovich inequality (6.7). We obtain
\[
\|z\|_2^2 \le \|F'(x)^T z\|_2\,\|F'(x)^{-1} z\|_2 \le \frac{1}{2}\,\frac{\sigma_1^2+\sigma_n^2}{\sigma_1\sigma_n}\,\|z\|_2^2 \qquad (z \in R^n),
\]
where σ_i = σ_i(F'(x)). Let z = F(x). From the Lipschitz continuity it follows that F(x) = F'(x)(x − ω) + O(‖e‖_2^2) and F'(x)^{-1}F(x) = x − ω + O(‖e‖_2^2). Hence
\[
\|F(x)\|_2^2 \le \|F'(x)^T F(x)\|_2\left(\|x-\omega\|_2 + O\left(\|e\|_2^2\right)\right) \le C_2\left(F'(x)\right)\|F(x)\|_2^2
\]
and
\[
\frac{\|F(x)\|_2^2}{\|F'(x)^T F(x)\|_2} \le \|x-\omega\|_2 + O\left(\|e\|_2^2\right) \le C_2\left(F'(x)\right)\frac{\|F(x)\|_2^2}{\|F'(x)^T F(x)\|_2}.
\]
Thus we obtained the approximate absolute error estimate
\[
\|x-\omega\|_2 = c\,\frac{\|F(x)\|_2^2}{\|F'(x)^T F(x)\|_2}, \tag{6.25}
\]
where 1 ⪅ c ⪅ C_2(F'(x)).
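A hedged sketch of how (6.25) can accompany a Newton iteration is given below (Python/NumPy, for illustration only; F and its Jacobian J are user-supplied, and the stopping rule based on the approximate upper bound is our own illustrative choice, not a prescription from the text).

import numpy as np

def newton_with_estimate(F, J, x, tol=1e-10, maxit=50):
    # Newton's method monitored by the approximate a posteriori bound of (6.25).
    for _ in range(maxit):
        Fx, Jx = F(x), J(x)
        lower = np.dot(Fx, Fx) / np.linalg.norm(Jx.T @ Fx)   # lower part of (6.25)
        s = np.linalg.svd(Jx, compute_uv=False)
        C2 = (s[0]**2 + s[-1]**2) / (2.0 * s[0] * s[-1])     # C_2(F'(x)) by (6.11)
        if C2 * lower < tol:                                 # approximate error bound small enough
            break
        x = x - np.linalg.solve(Jx, Fx)                      # Newton step
    return x, lower, C2 * lower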
6.1.5 Numerical testing
For linear systems we investigated the value of c(A, e) when e is a uniformly distributed random vector on the surface of the n-dimensional unit sphere S_n. This means that the computed solution x̂ satisfies the perturbed equation Ax̂ = b + Ae, where e ∈ S_n is uniformly distributed.
The test matrices were mainly taken from the Higham collection [143] (gallery in MATLAB 5.1). We selected two groups of test problems. Group 1 and Group 2 consist of 42 and 8 variable size test problems (matrix families), respectively. In Group 1 the size of the matrices was chosen as n = 10, 20, ..., 300. This choice gives 1260 matrices in Group 1. This group consists of two subgroups, namely, matrices with relatively small and matrices with relatively high condition numbers. In Group 2 the size of the matrices was chosen as n = 5, 10, 15, ..., 50. Thus we have 80 matrices in Group 2. The maximum size in Group 2 was limited by MATLAB's built-in cond function.
The testing sequence was carried out as follows. For each matrix we generated 2000 uniformly distributed random vectors e on S_n and calculated the values of c(A, e) by formula (6.13). The sample estimates of the expected value c(A) = E(c(A, e)) and the variance σ^2(A) = Var(c(A, e)) are denoted by c̄(A) and s^2(A), respectively. For each dimension n we calculated the averages of the c̄(A)'s and the κ_2(A)'s; these averages are denoted by c̄(n) and κ̄(n), respectively. The reliability of the test results is about
\[
P\left(|\bar{c}(A) - c(A)| < 0.044\,\sigma(A)\right) \approx 0.95
\]
for 2000 sample elements.
The following results were obtained.
                     Group 1            Group 2
  c̄(A)_min           1.0015             1.0804
  c̄(A)_max           128.20             35.573
  c̄(n)_min           3.4698             2.8304
  c̄(n)_max           13.873             11.856
  κ_2(A)_min         1.4142             2.3915
  κ_2(A)_max         1.3051 × 10^23     9.5911 × 10^145
  κ̄(n)_min           2.2393 × 10^16     3.8835 × 10^7
  κ̄(n)_max           3.1824 × 10^21     1.1989 × 10^145
The results of the Group 1 testing are shown in Figures 1, 2 and 3. In Figure 1 we can see that the average of the c̄(A)'s (i.e., c̄(n)) tends to increase with n. This tendency is similar to the Edelman result given by (6.23). Graphic presentations of the c̄(A)'s and κ_2(A)'s versus test matrix families and dimension are given in Figures 2 and 3. These two pictures show that for several test problems the c̄(A)'s are relatively small, while the condition numbers are quite high. The weak dependence on κ_2(A) is also indicated by the following multiple linear regression result:
\[
\bar{c}(A) = 5.7164 \times 10^{-2}\,\dim(A) + 9.0520 \times 10^{-23}\,\kappa_2(A), \tag{6.26}
\]
where the coefficient of κ_2(A) is not significantly different from 0 at the 95% confidence level.
In Group 1 the 90th percentile of the c̄(A)'s is 23.128, which indicates that c̄(A) is likely to remain small. Those cases for which c̄(A) exceeded 23.128 were the cauchy, krylov, lotkin, minij, moler, pei, randsvd and magic matrices.
The results of the Group 2 testing are shown in Figures 4, 5 and 6. The average of the c̄(A)'s again tends to increase with n, as shown by Figure 4. Graphic presentations of the c̄(A)'s and κ_2(A)'s versus test matrix families and dimension are given in Figures 5 and 6.

[Figure 1: The average of the c̄(A)'s versus dimension]
[Figure 2: The values of the c̄(A)'s versus matrices and dimension]
[Figure 3: Condition numbers versus matrices and dimension]
[Figure 4: The average of the c̄(A)'s versus dimension]
[Figure 5: The values of the c̄(A)'s versus matrices and dimension]
[Figure 6: Condition numbers versus matrices and dimension]

The multiple regression result is
\[
\bar{c}(A) = 2.6845 \times 10^{-1}\,\dim(A) + 1.0648 \times 10^{-145}\,\kappa_2(A), \tag{6.27}
\]
where the coefficient of κ_2(A) is not significantly different from 0 at the 95% confidence level. So we can conclude again that c̄(A) depends on dim(A) rather than cond(A).
In Group 2 the 90th percentile of the c̄(A)'s is 22.482, which indicates that c̄(A) is likely to remain small. Those matrices for which c̄(A) exceeded 22.482 were invol and ipjfact.
In most of the Group 1 and Group 2 cases in which c̄(A) exceeded the 90th percentile, the singular values are concentrated roughly in two clusters, the members of each cluster being of roughly equal size. Usually the first cluster contains a few large singular values, while the remaining singular values, which belong to the second cluster, are small. In the case of the moler matrix the situation is the opposite: it has only a few small singular values of the same size, while the remaining ones are large and approximately equal. So we can think that the above singular value distribution is at least partially responsible for c̄(A) being high.
We can now make the following conclusions. The average of the error constant c in Auchmuty's estimate increases slowly with n, and it depends on n rather than on cond(A). On the basis of the observed trend of c̄(n) and the regression results (6.26), (6.27), the following estimate holds with a high degree of probability:
\[
\|x-\omega\|_2 \lesssim 0.5\,\dim(A)\,\frac{\|r(x)\|_2^2}{\|A^T r(x)\|_2}. \tag{6.28}
\]
6.2 Bounds for the convergence rate of the alternating projection method
The alternating projection method is a very general approximation algorithm that has many forms and applications [70], [94]. Let M_1, M_2, ..., M_k be closed subspaces of the real Hilbert space H, M = ∩_{i=1}^k M_i, and P_i = P_{M_i} (i = 1, ..., k). The orthogonal projection P_M is called the intersection projection. We seek P_M x, the best approximation in the intersection M to any x ∈ H, by projecting a point cyclically through the individual subspaces. Thus the method of alternating projection is defined by
\[
x_0 = x, \qquad x_{j+1} = P_i x_j \qquad (i \equiv j\ (\mathrm{mod}\ k) + 1). \tag{6.29}
\]
Halperin [136] proved the following convergence theorem, which is a generalization of von Neumann's result [195], who proved the case k = 2.

Theorem 151 (Halperin). Let M_1, M_2, ..., M_k be closed subspaces of the real Hilbert space H, M = ∩_{i=1}^k M_i, and P_i = P_{M_i} (i = 1, ..., k). For each x ∈ H,
\[
\lim_{n\to\infty}\left(P_k P_{k-1}\cdots P_1\right)^n x = P_M x. \tag{6.30}
\]
For other convergence theorems we refer to [70] and [94]. Estimates for the
convergence speed are given in [16], [69], [86], [220], [171] and [72]. The estimates use the
following concepts of the angle between subspaces.
Definition 152 (Friedrichs). The angle between the subspaces M and N of a Hilbert space H is the angle α(M, N) in [0, π/2] whose cosine is given by
\[
c(M, N) = \sup\left\{\,|\langle x, y\rangle|\ \middle|\ x \in M \cap (M\cap N)^{\perp},\ \|x\| \le 1,\ y \in N \cap (M\cap N)^{\perp},\ \|y\| \le 1\,\right\}. \tag{6.31}
\]
Definition 153 (Dixmier). The minimal angle between the subspaces M and N is the angle α_0(M, N) in [0, π/2] whose cosine is defined by
\[
c_0(M, N) = \sup\left\{\,|\langle x, y\rangle|\ \middle|\ x \in M,\ \|x\| \le 1,\ y \in N,\ \|y\| \le 1\,\right\}. \tag{6.32}
\]
The two definitions are different except for the case M ∩ N = {0}, when they
clearly agree. For the properties of these angle concepts we refer to Deutsch [71].
We only deal with the estimate of Smith, Solmon and Wagner [220], which is
perhaps the most practical one for k ≥ 2.
Theorem 154 (Smith, Solmon, Wagner). For j = 1, ..., k, let P_j be the orthogonal projection on M_j, where M_j is a closed subspace of H. Define M = ∩_{i=1}^k M_i and let P_M be the orthogonal projection on M. If θ_j = α(M_j, ∩_{i=j+1}^k M_i), then for any x ∈ H and integer n ≥ 1,
\[
\left\|\left(P_k \cdots P_2 P_1\right)^n x - P_M x\right\| \le c_{SSW}^n\,\|x - P_M x\| \tag{6.33}
\]
with
\[
c_{SSW} = \left(1 - \prod_{j=1}^{k-1}\sin^2\theta_j\right)^{1/2}.
\]
This estimate is very useful in computer tomography and other applications (see, e.g., [220], [137]). The difficulty of applying the estimate is the computation of the angles θ_j, even in finite-dimensional cases. For an important special case that includes the Kaczmarz method of Section 5.1 we develop an estimate of c_SSW which is easy to compute.
6.2.1 The special case of the alternating projection method
Consider the linear system
\[
Ax = b \qquad \left(A \in R^{m\times m},\ \det(A) \ne 0\right) \tag{6.34}
\]
and set
\[
I_m = [E_1, \ldots, E_r] \qquad \left(E_i \in R^{m\times m_i},\ i = 1, \ldots, r\right). \tag{6.35}
\]
(6.35)
The block Kaczmarz algorithm [169], [242] starts from an arbitrary point x0 ∈ Rm and
has the form
¡
¢−1 T
Ei r (xj ) (i ≡ j (mod r) + 1) ,
(6.36)
xj+1 = xj − AT Ei EiT AAT Ei
where r (x) = Ax − b. For m = r, the method coincides with the original Kaczmarz
method. It is easy to see that
³
¡
¢−1 T ´
¢
¡
Ei A (xj − ω) = I − PR(AT Ei ) (xj − ω)
xj+1 − ω = I − AT Ei EiT AAT Ei
and for n ≥ 1,
where
¡
¢
xnr − ω = Q x(n−1)r − ω = Qn (x0 − ω) ,
¢ ¡
¢
¡
Q = I − PR(AT Er ) · · · I − PR(AT E1 )
(6.37)
(6.38)
is a product of orthogonal projectors.¡
¢
¡
¢
It is clear that Mi = R⊥ AT Ei (i = 1, . . . , r), M = ∩ri=1 R⊥ AT Ei =
¡
¢
R⊥ AT , Q = PR⊥ (AT Er ) . . . PR⊥ (AT E1 ) and PM = PR⊥ (AT ) . If the matrix A is nonsin¡ ¢
gular, then R⊥ AT = {0} and PM = 0. Hence Qn → 0 and xj → ω follow from the von
Neumann-Halperin theorem.
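For concreteness, one sweep of the block Kaczmarz step (6.36) can be sketched as follows (Python/NumPy, illustrative only; blocks is assumed to be a list of row-index sets defining the columns selected by the E_i).

import numpy as np

def block_kaczmarz_sweep(A, b, x, blocks):
    # One cycle i = 1,...,r of (6.36); blocks[i] holds the row indices selected by E_i.
    for idx in blocks:
        Ai = A[idx, :]                                    # E_i^T A
        ri = Ai @ x - b[idx]                              # E_i^T r(x)
        x = x - Ai.T @ np.linalg.solve(Ai @ Ai.T, ri)     # x - A^T E_i (E_i^T A A^T E_i)^{-1} E_i^T r(x)
    return x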
The situation is similar for other iterative projection methods such as the Altman method or the Householder-Bauer class of projection methods (see, e.g., [146], [94]). Therefore we investigate the following special case of the alternating projection method, where
\[
M_j = R^{\perp}(X_j), \qquad X_j \in R^{m\times n_j}, \qquad X_j^T X_j = I \qquad (j = 1, \ldots, k). \tag{6.39}
\]
Then P_j can be written in the form P_j = I − X_j X_j^T (j = 1, ..., k),
\[
Q = P_k \cdots P_2 P_1 = \left(I - X_k X_k^T\right)\cdots\left(I - X_1 X_1^T\right),
\]
\[
M = \bigcap_{j=1}^{k} R^{\perp}(X_j) = R^{\perp}([X_1, \ldots, X_k]) = R^{\perp}(X),
\]
and P_M = P_{R^⊥(X)} = I − P_{R(X)}. Since all orthogonal projections can be represented in the above form, this special case is the finite-dimensional case of the alternating projection method. In this special case the Smith-Solmon-Wagner theorem has the form
\[
\left\|\left[\left(I - X_k X_k^T\right)\cdots\left(I - X_1 X_1^T\right)\right]^n y - P_{R^{\perp}(X)}\,y\right\| \le c_{SSW}^n\,\left\|P_{R(X)}\,y\right\| \qquad (y \in R^m) \tag{6.40}
\]
with
\[
c_{SSW} = \left(1 - \prod_{j=1}^{k-1}\sin^2\theta_j\right)^{1/2} \tag{6.41}
\]
and
\[
\theta_j = \alpha\left(M_j,\ \bigcap_{i=j+1}^{k} M_i\right) = \alpha\left(R^{\perp}(X_j),\ R^{\perp}([X_{j+1}, \ldots, X_k])\right). \tag{6.42}
\]
In the next section we derive an estimate for c_SSW.
6.2.2 A new estimate for the convergence speed
We need to determine the angle
\[
\theta_j = \alpha\left(M_j,\ \bigcap_{i=j+1}^{k} M_i\right) = \alpha\left(R^{\perp}(X_j),\ R^{\perp}([X_{j+1}, \ldots, X_k])\right).
\]
Note that
\[
M_j \cap \left(\bigcap_{i=j+1}^{k} M_i\right) = R^{\perp}([X_j, X_{j+1}, \ldots, X_k]) \ne \{0\},
\]
if R([X_j, X_{j+1}, ..., X_k]) ≠ R^m. Using Theorem 16 of Deutsch [71], which says that
\[
c(M, N) = c\left(M^{\perp}, N^{\perp}\right) \qquad \left(\cos(\theta(M, N)) = \cos\left(\theta\left(M^{\perp}, N^{\perp}\right)\right)\right),
\]
and that 0 ≤ θ ≤ π/2, we obtain that
\[
\theta_j = \alpha\left(R(X_j),\ R([X_{j+1}, \ldots, X_k])\right).
\]
Assuming that X = [X_1, ..., X_k] is of maximum column rank, we obtain that R(X_j) ∩ R([X_{j+1}, ..., X_k]) = {0}. Hence
\[
\theta_j = \alpha_0\left(R(X_j),\ R([X_{j+1}, \ldots, X_k])\right),
\]
where α_0 is the minimal angle (the first principal angle).
The principal angles can be determined by computing singular value decompositions (see Björck and Golub [26]). The error constant c_SSW would thus require the calculation of k − 1 SVDs. Instead, we exploit the following observations of Zassenhaus [251] and Ben-Israel [21].
Let M, N ⊂ R^m be subspaces with dim(M) = p and dim(N) = q. Let the columns of U ∈ R^{m×p} and V ∈ R^{m×q} form a basis of M and N, respectively. For p ≥ q the eigenvalues of
\[
V^T U\left(U^T U\right)^{-1} U^T V\left(V^T V\right)^{-1} \in R^{q\times q}
\]
are the squares of the cosines of the principal angles between M and N. If p < q, then the first p eigenvalues of V^T U (U^T U)^{-1} U^T V (V^T V)^{-1} ∈ R^{q×q} are the squares of the cosines of the principal angles between M and N, provided that the eigenvalues are given in descending order.
In our case V = X_j and U = [X_{j+1}, ..., X_k] = X^{k−j|} ∈ R^{m×\sum_{i=j+1}^{k} n_i} (X = [X_1, ..., X_k]), and
\[
V^T U\left(U^T U\right)^{-1} U^T V\left(V^T V\right)^{-1} = X_j^T P_{R(X^{k-j|})} X_j
\]
is a positive semidefinite matrix of size n_j × n_j. Let us denote by λ_i^{(j)} (i = 1, ..., n_j) and ϑ_i^{(j)} (i = 1, ..., ñ_j, ñ_j = min{n_j, \sum_{i=j+1}^{k} n_i}) the eigenvalues (in decreasing order) and the corresponding principal angles, respectively. We assume that ϑ_1^{(j)} ≤ ϑ_2^{(j)} ≤ ... ≤ ϑ_{ñ_j}^{(j)}. Then by the above Zassenhaus-Ben-Israel result we have that
\[
\cos^2\big(\vartheta_i^{(j)}\big) = \lambda_i^{(j)} \quad (i = 1, \ldots, \tilde{n}_j), \qquad \lambda_i^{(j)} = 0 \quad (\tilde{n}_j + 1 \le i \le n_j).
\]
Since by definition θ_j = ϑ_1^{(j)} and det(X_j^T P_{R(X^{k-j|})} X_j) = \prod_{i=1}^{n_j} \lambda_i^{(j)}, we obtain that
\[
0 \le \det\left(X_j^T P_{R(X^{k-j|})} X_j\right) = \cos^2(\theta_j)\prod_{i=2}^{n_j}\lambda_i^{(j)} \le \cos^2(\theta_j).
\]
The eigenvalues of the positive semidefinite Hermitian matrix
\[
I - X_j^T P_{R(X^{k-j|})} X_j = X_j^T\left(I - P_{R(X^{k-j|})}\right) X_j
\]
are 1 − λ_i^{(j)} (0 ≤ 1 − λ_i^{(j)} ≤ 1, i = 1, ..., n_j). Hence
\[
0 \le \det\left(I - X_j^T P_{R(X^{k-j|})} X_j\right) = \det\left(X_j^T\left(I - P_{R(X^{k-j|})}\right) X_j\right)
= \sin^2(\theta_j)\prod_{i=2}^{n_j}\big(1-\lambda_i^{(j)}\big) = \sin^2(\theta_j)\prod_{i=2}^{\tilde{n}_j}\big(1-\lambda_i^{(j)}\big) \le \sin^2(\theta_j).
\]
Thus we have
\[
\prod_{j=1}^{k-1}\det\left(X_j^T\left(I - P_{R(X^{k-j|})}\right) X_j\right) \le \prod_{j=1}^{k-1}\sin^2(\theta_j) \tag{6.43}
\]
and
\[
c_{GM}^2 = 1 - \prod_{j=1}^{k-1}\det\left(X_j^T\left(I - P_{R(X^{k-j|})}\right) X_j\right) \ge 1 - \prod_{j=1}^{k-1}\sin^2(\theta_j) = c_{SSW}^2. \tag{6.44}
\]
There is equality if all n_j = 1 for j = 1, ..., k.
We prove the following Gram-Hadamard type result.
Theorem 155 Let X ∈ R^{m×p} and Y ∈ R^{m×q} (p + q ≤ m). Then
\[
\det\left([X, Y]^T [X, Y]\right) = \det\left(X^T X\right)\det\left(Y^T\left(I - P_{R(X)}\right) Y\right). \tag{6.45}
\]
Proof. We decompose Y in the form Y = Y_S + Y_N, where R(Y_S) ⊂ R(X) and R(Y_N) ⊂ R^⊥(X). It is easy to see that Y_S = P_{R(X)}Y and Y_N = (I − P_{R(X)})Y. Hence Y_S^T Y_N = 0, X^T Y_N = 0 and
\[
[X, Y]^T [X, Y] =
\begin{bmatrix} X^T X & X^T Y \\ Y^T X & Y^T Y \end{bmatrix} =
\begin{bmatrix} X^T X & X^T Y_S \\ Y_S^T X & Y_S^T Y_S + Y_N^T Y_N \end{bmatrix}.
\]
Since Y_S = XC for some C ∈ R^{p×q} we can write
\[
[X, Y]^T [X, Y] =
\begin{bmatrix} I_p & 0 \\ C^T & I_q \end{bmatrix}
\begin{bmatrix} X^T X & 0 \\ 0 & Y_N^T Y_N \end{bmatrix}
\begin{bmatrix} I_p & C \\ 0 & I_q \end{bmatrix}.
\]
Hence
\[
\det\left([X, Y]^T [X, Y]\right) = \det\left(X^T X\right)\det\left(Y_N^T Y_N\right),
\]
which clearly gives the requested result.
Corollary 156 The equality
\[
\det\left([X, Y]^T [X, Y]\right) = \det\left(Y^T Y\right)\det\left(X^T\left(I - P_{R(Y)}\right) X\right) \tag{6.46}
\]
also holds.
Proof. The statement follows from the identity
\[
\det\left([X, Y]^T [X, Y]\right) = \det\left([Y, X]^T [Y, X]\right).
\]
Corollary 157 If det(X^T X) > 0, then
\[
\det\left(Y_N^T Y_N\right) = \det\left(Y^T\left(I - P_{R(X)}\right) Y\right) \le \det\left(Y^T Y\right).
\]
Proof. The statement follows from the inequality
\[
\det\left([X, Y]^T [X, Y]\right) \le \det\left(X^T X\right)\det\left(Y^T Y\right)
\]
(see, e.g., [187], [58]).
Achieser and Glasmann [13] include the following result. Let X = [x_1, ..., x_k], x_i ∈ R^m (i = 1, ..., k). Then
\[
\det\left(X^T X\right) = \det\left(\left(X^{k-1|}\right)^T X^{k-1|}\right)\left\|x_1 - P_{R(X^{k-1|})}\,x_1\right\|_2^2,
\]
which can be written in the form
\[
\det\left(X^T X\right) = \det\left(\left(X^{k-1|}\right)^T X^{k-1|}\right) x_1^T\left(I - P_{R(X^{k-1|})}\right) x_1
= \det\left(\left(X^{k-1|}\right)^T X^{k-1|}\right)\det\left(x_1^T\left(I - P_{R(X^{k-1|})}\right) x_1\right).
\]
Hence the previous theorem is a generalization of this result.
Next we show that
\[
\prod_{j=1}^{k-1}\det\left(X_j^T\left(I - P_{R(X^{k-j|})}\right) X_j\right) = \det\left(X^T X\right). \tag{6.47}
\]
Theorem 155 yields the recursion
\[
\det\left(\left(X^{k-j+1|}\right)^T X^{k-j+1|}\right) = \det\left(X_j^T\left(I - P_{R(X^{k-j|})}\right) X_j\right)\det\left(\left(X^{k-j|}\right)^T X^{k-j|}\right)
\]
and the expression
\[
\det\left(X^T X\right) = \left[\prod_{j=1}^{k-1}\det\left(X_j^T\left(I - P_{R(X^{k-j|})}\right) X_j\right)\right]\det\left(\left(X^{1|}\right)^T X^{1|}\right).
\]
Since det((X^{1|})^T X^{1|}) = det(X_k^T X_k) = 1, we obtain that
\[
\det\left(X^T X\right) = \prod_{j=1}^{k-1}\det\left(X_j^T\left(I - P_{R(X^{k-j|})}\right) X_j\right). \tag{6.48}
\]
Hence
\[
c_{GM}^2 = 1 - \det\left(X^T X\right).
\]
We can summarize our findings as follows.
Theorem 158 Assume that X_j ∈ R^{m×n_j}, X_j^T X_j = I for j = 1, ..., k and X = [X_1, ..., X_k] is of maximum column rank. Then for any l ≥ 1,
\[
\left\|\left[\left(I - X_k X_k^T\right)\cdots\left(I - X_1 X_1^T\right)\right]^l y - P_{R^{\perp}(X)}\,y\right\| \le c_{GM}^l\,\left\|P_{R(X)}\,y\right\| \qquad (y \in R^m), \tag{6.49}
\]
where c_GM = (1 − det(X^T X))^{1/2}, c_GM ≥ c_SSW, and c_GM = c_SSW if n_j = 1 for j = 1, ..., k.
It is known [58] that for the Gram matrix X^T X ∈ R^{n×n} the inequality
\[
0 \le \det\left(X^T X\right) \le \|x_1\|^2\,\|x_2\|^2\cdots\|x_n\|^2 \tag{6.50}
\]
holds. The lower extreme occurs if and only if the vectors x_i are dependent. The upper extreme occurs if and only if the vectors are orthogonal. Since X has maximum column rank and all vectors are normalized, we have 0 < det(X^T X) ≤ 1 and
\[
0 \le c_{GM} < 1. \tag{6.51}
\]
The estimate c_GM is increasing with the column dimension of X.
Our estimate, i.e. the Gram determinant, can easily be calculated by Gaussian elimination in O(n^3) arithmetic operations (n = \sum_{j=1}^{k} n_j ≤ m). For other techniques for determinants and special details we refer to Pan, Yu and Stewart [205].
When applying the result to the block Kaczmarz (or similar) method we have A^T E_i (E_i^T A A^T E_i)^{-1} E_i^T A instead of X_i X_i^T with X_i^T X_i = I. If we take the Cholesky decomposition E_i^T A A^T E_i = LL^T, then X_i = A^T E_i L^{-T} will satisfy the requirements.
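A small sketch of how c_GM can be evaluated in the block Kaczmarz setting is given below (Python/NumPy, illustrative only; a practical implementation would compute the determinant via a stable triangular factorization, as noted above).

import numpy as np

def gram_bound(A, blocks):
    # Builds X_i = A^T E_i L^{-T} from E_i^T A A^T E_i = L L^T and returns
    # c_GM = (1 - det(X^T X))^{1/2} of Theorem 158.
    cols = []
    for idx in blocks:
        Ai = A[idx, :]                            # E_i^T A
        L = np.linalg.cholesky(Ai @ Ai.T)         # E_i^T A A^T E_i = L L^T
        cols.append(np.linalg.solve(L, Ai).T)     # X_i = A^T E_i L^{-T}
    X = np.hstack(cols)                           # X = [X_1, ..., X_k]
    d = np.linalg.det(X.T @ X)                    # Gram determinant, in (0, 1]
    return np.sqrt(max(0.0, 1.0 - d))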
6.2.3 An extension of the new estimate
For the relaxed block Kaczmarz algorithm
\[
x_{j+1} = x_j - \mu_i\,A^T E_i\left(E_i^T A A^T E_i\right)^{-1} E_i^T r(x_j) \qquad (i \equiv j\ (\mathrm{mod}\ r) + 1) \tag{6.52}
\]
with 0 < μ_i < 2 (i = 1, ..., r) we have
\[
x_{nr} - \omega = Q\left(x_{(n-1)r} - \omega\right) = Q^n(x_0 - \omega), \tag{6.53}
\]
where
\[
Q = \left(I - \mu_r P_{R(A^T E_r)}\right)\cdots\left(I - \mu_1 P_{R(A^T E_1)}\right). \tag{6.54}
\]
If P is an orthogonal projector, then I − μP is not a projector for μ ≠ 0, 1. Hence we need an extension of the von Neumann-Halperin theorem and of the convergence estimates. Such extensions are given for contractive or nonexpansive mappings of Hilbert spaces (see, e.g., [20] or [94]). Here we quote the result of Nelson and Neumann [193], which is the only known speed estimate for the contractive case.
For A ∈ C^{m×m} let
\[
\gamma(A) = \max\left\{\,|\lambda|\ \middle|\ \lambda \in \{0\} \cup \left(\sigma(A)\setminus\{1\}\right)\right\}. \tag{6.55}
\]
Definition 159 A matrix B ∈ C^{m×m} is called paracontracting if its spectral norm is bounded by unity and if
\[
0 \ne x \in N^{\perp}(I - B) \ \Rightarrow\ \|Bx\|_2 < \|x\|_2.
\]
If a matrix B is paracontracting, then the contraction constant
\[
c(B) = \inf\left\{\,c \in [0, \infty)\ \middle|\ \forall x \in N^{\perp}(I - B),\ \|Bx\|_2 \le c\,\|x\|_2\right\} \tag{6.56}
\]
satisfies 0 ≤ c(B) < 1. If B is Hermitian, then c(B) = γ(B). For any orthogonal projection P, c(P) = 0. If B = I − ωP, 0 < ω < 2 and P ≠ 0 is an orthogonal projection, then c(B) = |1 − ω|.
Theorem 160 (Nelson-Neumann). Let B = B_k ⋯ B_1 be the product of k paracontracting matrices B_i ∈ C^{m×m}. Then B is paracontracting,
\[
\gamma(B) \le c(B) < 1 \tag{6.57}
\]
and
\[
\left\|B^l x - P_{N(I-B)}\,x\right\|_2 \le c^l(B)\left\|x - P_{N(I-B)}\,x\right\|_2 \tag{6.58}
\]
hold. Furthermore
\[
N(I - B) = \bigcap_{i=1}^{k} N(I - B_i) \tag{6.59}
\]
and
\[
c(B) \le \left\{1 - \prod_{i=1}^{k}\left[1 - c(B_i)^2\right]\prod_{i=1}^{k-1}\sin^2\theta_i\right\}^{1/2}, \tag{6.60}
\]
where θ_i denotes the angle between the subspaces
\[
N(I - B_i) \qquad \text{and} \qquad \bigcap_{j=i+1}^{k} N(I - B_j). \tag{6.61}
\]
In particular, if γ(B_i) = c(B_i) for i = 1, ..., k, then
\[
\gamma(B) \le \left\{1 - \prod_{i=1}^{k}\left[1 - \gamma(B_i)^2\right]\prod_{i=1}^{k-1}\sin^2\theta_i\right\}^{1/2}. \tag{6.62}
\]
If B_i = I − μ_i P_i, where P_i is an orthogonal projection and 0 < μ_i < 2 for i = 1, ..., k, then I − B_i = μ_i P_i, N(I − B) = ∩_{i=1}^k N(P_i) and θ_i = α(N(P_i), ∩_{j=i+1}^k N(P_j)). Hence
\[
\left\|\left[(I - \mu_k P_k)\cdots(I - \mu_1 P_1)\right]^l x - P_{\cap_{i=1}^k N(P_i)}\,x\right\|_2 \le c^l(B)\left\|x - P_{\cap_{i=1}^k N(P_i)}\,x\right\|_2
\]
holds with
\[
c(B) \le \left\{1 - \prod_{i=1}^{k}\left[1 - (1-\mu_i)^2\right]\prod_{i=1}^{k-1}\sin^2\theta_i\right\}^{1/2}.
\]
Since in our special case P_i = X_i X_i^T (X_i^T X_i = I) and N(P_i) = R^⊥(X_i) for i = 1, ..., k, we have ∩_{i=1}^k N(P_i) = ∩_{i=1}^k R^⊥(X_i) = R^⊥(X) and \prod_{i=1}^{k-1}\sin^2\theta_i \ge \det(X^T X). Thus we have the following extension of Theorem 158.

Theorem 161 Assume that X_j ∈ R^{m×n_j}, X_j^T X_j = I, 0 < μ_j < 2 for j = 1, ..., k and X = [X_1, ..., X_k] is of maximum column rank. Then for any l ≥ 1,
\[
\left\|\left[\left(I - \mu_k X_k X_k^T\right)\cdots\left(I - \mu_1 X_1 X_1^T\right)\right]^l y - P_{R^{\perp}(X)}\,y\right\| \le c_{AGM}^l\,\left\|P_{R(X)}\,y\right\| \qquad (y \in R^m), \tag{6.63}
\]
where
\[
c_{AGM} = \left(1 - \det\left(X^T X\right)\prod_{i=1}^{k}\left[1 - (1-\mu_i)^2\right]\right)^{1/2}. \tag{6.64}
\]
Thus we obtained an easily computable bound for the convergence speed of the relaxed Kaczmarz method. We can observe that c_AGM ≥ c_GM, and c_AGM = c_GM if μ_i = 1 for i = 1, ..., k. This is in contrast with the expectation that relaxation methods are faster, which is not always the case (see, e.g., [94]).
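Given the Gram determinant, the relaxed bound (6.64) is equally cheap to evaluate. The following short sketch (Python, illustrative only) assumes that X has been assembled as in the previous sketch and that mus holds the relaxation parameters μ_i ∈ (0, 2).

import numpy as np

def relaxed_gram_bound(X, mus):
    # c_AGM of (6.64): det(X^T X) multiplied by prod_i [1 - (1 - mu_i)^2].
    d = np.linalg.det(X.T @ X)
    factor = np.prod([1.0 - (1.0 - mu)**2 for mu in mus])
    return np.sqrt(max(0.0, 1.0 - d * factor))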
6.2.4 A simple computational experiment
The error constant c_GM of the estimate (6.49) is largest when X has rank m. Then R(X) = R^m and the estimates (6.33) and (6.49) become essentially the norm estimates
\[
\left\|\left(I - X_k X_k^T\right)\cdots\left(I - X_1 X_1^T\right)\right\| \le c_{SSW} \le c_{GM}. \tag{6.65}
\]
Setting the parameters k = 4, n_1 = 1, n_2 = n_3 = n_4 = 3 and m = 10, we calculated in MATLAB the true norm (solid line with circles), the error constant c_SSW (solid line with triangles) and the error constant c_GM (solid line with stars) for 25 random 10 × 10 matrices with uniformly distributed entries in (0, 1) (Figure 7). We computed the same for Gaussian matrices (Figure 8). The corresponding relative errors are also given in the bottom half of the figures.
We can see that in the first case the two estimates are almost overlapping. In the second case there are observable differences, with very small differences in the relative errors. In both cases c_SSW overestimates the true error. This problem has been studied in [72] and [171], but it is practically unsolved.

[Figure 7: Estimates for rand(10) matrices]
[Figure 8: Estimates for randn(10) matrices]
6.2.5 Final remarks
We show that a special case of Theorem 158 is equivalent to Meany's inequality [186], which inspired us to develop our results. Meany's inequality has the form
\[
\|Qy\|_2 \le \left(1 - \det\left(X^T X\right)\right)^{1/2}\|y\|_2 \qquad (y \in R(X)), \tag{6.66}
\]
where X = [x_1, ..., x_k], x_i ∈ R^m, ‖x_i‖_2 = 1 (i = 1, ..., k) and
\[
Q = \left(I - x_k x_k^T\right)\cdots\left(I - x_1 x_1^T\right). \tag{6.67}
\]
In the above case Theorem 158 gives the bound
\[
\left\|Q^n z - P_{R^{\perp}(X)}\,z\right\|_2 \le c_{GM}^n\left\|P_{R(X)}\,z\right\|_2 \qquad (z \in R^m) \tag{6.68}
\]
with c_GM = (1 − det(X^T X))^{1/2}. We can prove the following result.

Theorem 162 The Meany inequality and the inequality
\[
\left\|Q^n z - P_{R^{\perp}(X)}\,z\right\|_2 \le c_{GM}^n\left\|P_{R(X)}\,z\right\|_2 \qquad (z \in R^m) \tag{6.69}
\]
are equivalent.
Proof. Assume Þrst that Meany’s inequality holds. Let z = z1 + z2 , where
z1 ∈ R (X) and z2 ∈ R⊥ (X). Then kz1 k2 , kz2 k ≤ kzk2 . By deÞnition
Qn z − PR⊥ (X) z = Qn z1 + Qn z2 − z2 .
¢
¡
Since z2 ⊥ xi for all i and I − xi xTi z2 = z2 , we have Qn z2 = z2 . Hence
Qn z − PR⊥ (X) z = Qn z1 .
¢
¡
¡
¢
We prove now that Ql z1 ∈ R (X). Let y ∈ R (X). Then I − xi xTi y = y − xi xTi y ∈
n
R (X) for any i. Hence Qy ∈ R
° Thus if z1 ∈ R (X), then for any n, Q z1 ∈ R (X)
° (X).
l
also holds. It is also clear that °Q z1 °2 ≤ kz1 k2 ≤ 1. Hence by repeated use of the Meany
inequality we obtain that
°
°
° ¡
¢°
kQn z1 k2 = °Q Qn−1 z1 °2 ≤ cGM °Qn−1 z1 °2
≤ cnGM kz1 k2 .
Conversely, let y ∈ R (X). Then
°
°
kQn yk2 = °Qn y − PR⊥ (X) y °2 = cnGM kyk2 ,
which gives just the Meany inequality for n = 1.
We note that Meany used an elementary geometrical reasoning, which can not be
extended to general cases. Meany’s inequality can be used directly to prove the convergence of several classical iterative projection methods (see [94]).
Chapter 7
APPENDICES
7.1 Notation
R denotes the set of real numbers, while R+ is the set of non-negative real numbers. C
denotes the set of complex numbers. F denotes a Þeld (here R or C). Fn is the vector
space of n-tuples of elements over F. Similarly, Fm×n is the vector space of m×n matrices
over F. The range and null spaces of a matrix A ∈ Fm×n will be denoted by R (A) and
N (A), respectively.
Let A ∈ F^{m×n}, α = {i_1, ..., i_k} ⊆ {1, ..., m}, β = {j_1, ..., j_k} ⊆ {1, ..., n}, α′ = {1, ..., m}\α and β′ = {1, ..., n}\β. Then A[α, β] denotes the submatrix of A lying in the rows indicated by α and the columns indicated by β. Furthermore let Π_α = [e_{i_1}, ..., e_{i_k}] denote the partial permutation matrix. Thus A[α, β] = Π_α^T A Π_β.
Given any matrix A, the matrices Ak, A|k, Ak and Ak| will denote the submatrices consisting of the first k rows, the first k columns, the last k rows and the last k columns, respectively. Thus A|k is the leading principal submatrix of order k. This notation is due to Householder (see [229]).
Let A = [a_{ij}]_{i,j=1}^{n}. Then
\[
\mathrm{diag}(A) = \mathrm{diag}(a_{11}, a_{22}, \ldots, a_{nn}).
\]
Let M and N be linear subspaces of any vector space V . Then PM,N denotes the
oblique projection onto M along N . For the orthogonal projections when N = M ⊥ we
use the notation PM .
The Hölder p-norm of vectors is denoted by ‖·‖_p (p ≥ 1). The corresponding induced matrix norm is also denoted by ‖·‖_p. The Frobenius norm of vectors and matrices will be denoted by ‖·‖_2 and ‖·‖_F, respectively.
The standard condition number of a matrix A is denoted by either cond(A) = ‖A‖ ‖A^{-1}‖ or κ(A) = ‖A‖ ‖A^{-1}‖. For p-norms we also use the notation κ_p(A) = ‖A‖_p ‖A^{-1}‖_p.
Definition 163 For any A ∈ F^{m×n} let
\[
|A| = \left[\,|a_{ij}|\,\right]_{i,j=1}^{m,n} \in R^{m\times n}.
\]
We define the natural partial ordering of real vectors and matrices as follows.
Definition 164 Let A, B ∈ R^{m×n}. Then A ≤ B if and only if a_{ij} ≤ b_{ij} for all i = 1, ..., m and j = 1, ..., n.
The absolute value |A| of a matrix A satisfies the following properties:
(i) |A| ≥ 0 (A ∈ Fm×n ), |A| = 0 ⇔ A = 0;
(ii) |λA| = |λ| |A| (λ ∈ F);
(iii) |A + B| ≤ |A| + |B| (A, B ∈ Fm×n );
(iv) |AB| ≤ |A| |B| (A ∈ Fm×k , B ∈ Fk×n ).
|A| is sometimes called a matricial norm (see, e.g., [68]).
Appendices
136
Definition 165 A matrix A ∈ R^{m×m} is said to be an M-matrix if A = sI − B, where s > 0, B ≥ 0 and s ≥ ρ(B), with ρ(B) denoting the spectral radius of B.
The M-matrix A = sI − B is nonsingular if s > ρ(B). An equivalent definition is given by
Definition 166 A matrix A ∈ R^{n×n} is said to be a nonsingular M-matrix if a_{ij} ≤ 0 for all i ≠ j and A^{-1} ≥ 0.
If A ∈ R^{n×n} is a nonsingular M-matrix, then a_{ii} > 0 for all i = 1, ..., n (see, e.g., [23]).
Definition 167 A ∈ R^{n×n} is a Z-matrix if a_{ij} ≤ 0 (i ≠ j).
The n × n Z-matrices are denoted by Z^{n×n}.
Lemma 168 Assume that B ∈ Z^{n×n} and A is an M-matrix with A ≤ B. Then B is also an M-matrix and 0 ≤ B^{-1} ≤ A^{-1}.
Theorem 169 Let A, B ∈ R^{n×n}. If |A| ≤ B, then ρ(A) ≤ ρ(|A|) ≤ ρ(B).
Corollary 170 Let A, B ∈ R^{n×n}. If 0 ≤ A ≤ B, then ρ(A) ≤ ρ(B).
For more on M -matrices the reader is referred to [23], [219], [144], [249], [145].
7.2 Unitarily invariant matrix norms and projector norms
The present section is based on [109].
Definition 171 A matrix norm ‖·‖ : F^{n×n} → R_+ is called unitarily invariant if ‖A‖ = ‖UAV‖ holds for all unitary matrices U and V.
Let A = UΣV^H be the singular value decomposition of A ∈ F^{n×n}, where U, V ∈ F^{n×n} are unitary matrices and
\[
\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_n) \in R^{n\times n}
\]
with σ_1 ≥ ... ≥ σ_n ≥ 0. Then ‖A‖ = ‖Σ‖ holds for all A ∈ F^{n×n} in any unitarily invariant matrix norm. Hence ‖A‖ = ‖A^H‖ also holds. Let us denote by σ(A) = [σ_1, ..., σ_n] the decreasingly ordered vector of singular values.
Definition 172 A function φ : R^n → R is called a symmetric gauge function if
(i) φ(u) > 0 for all 0 ≠ u ∈ R^n;
(ii) φ(γu) = |γ| φ(u) for all γ ∈ R and u ∈ R^n;
(iii) φ(u + v) ≤ φ(u) + φ(v) for all u, v ∈ R^n;
(iv) φ(u) = φ(diag(ε_1, ..., ε_n) Π u), whenever Π is a permutation matrix and ε_i = ±1 (i = 1, ..., n).
Von Neumann [194] proved the following result (see also Schatten [213], or Horn and Johnson [144]).
Theorem 173 (von Neumann). If φ is a symmetric gauge function and A ∈ F^{n×n}, then the matrix function defined by φ(σ_1(A), ..., σ_n(A)) is a unitarily invariant matrix norm. Conversely, every unitarily invariant matrix norm ‖·‖ has a representation of the form
\[
\|A\| = \varphi\left(\sigma_1(A), \ldots, \sigma_n(A)\right), \tag{7.1}
\]
where φ is a symmetric gauge function.
The following examples of symmetric gauge functions are well known. The function
\[
\varphi(u) = \max_{1\le i\le n} |u_i| \tag{7.2}
\]
corresponds to the spectral norm. The function
\[
\varphi(u) = \left(\sum_{i=1}^{n} |u_i|^p\right)^{1/p}, \qquad p \ge 1, \tag{7.3}
\]
generates Schatten's p-norm. For p = 1 and p = 2 it corresponds to the trace and the Frobenius norms, respectively. Finally, the function
\[
\varphi_k(u) = \max_{i_1 < i_2 < \ldots < i_k}\left(|u_{i_1}| + \ldots + |u_{i_k}|\right) \tag{7.4}
\]
generates the so-called Ky Fan k-norm. Function (7.2) is Ky Fan's k-norm for k = 1.
The proof of the following result can be found in Schatten [213].
Theorem 174 (Schatten). For symmetric gauge functions φ, the relation 0 ≤ u ≤ v (u, v ∈ R^n) implies
\[
\varphi(u_1, \ldots, u_n) \le \varphi(v_1, \ldots, v_n). \tag{7.5}
\]
If the symmetric gauge function φ is normalized such that
\[
\varphi(1, 0, \ldots, 0) = 1, \tag{7.6}
\]
then
\[
\max_{1\le i\le n} |u_i| \le \varphi(u) \le \sum_{i=1}^{n} |u_i| \qquad (u \in R^n) \tag{7.7}
\]
holds.
Definition 175 A matrix norm ‖·‖ is submultiplicative or consistent if it satisfies the inequality ‖AB‖ ≤ ‖A‖ ‖B‖.
A unitarily invariant matrix norm is submultiplicative if and only if ‖A‖ ≥ σ_1(A) for all A ∈ F^{n×n}. In particular, the norms generated by the symmetric gauge functions (7.2)-(7.4) are all submultiplicative (see Horn-Johnson [144], p. 450).
We use the following generalization of the unitarily invariant norms to rectangular matrices. Let A ∈ F^{r×s} be an arbitrary rectangular matrix and 1 ≤ r, s ≤ n. Let A be augmented by zeros such that
\[
\hat{A} = \begin{bmatrix} A & 0 \\ 0 & 0 \end{bmatrix} \in F^{n\times n}. \tag{7.8}
\]
If ‖·‖ denotes any unitarily invariant matrix norm on F^{n×n}, then the quantity
\[
\|A\| = \big\|\hat{A}\big\| \tag{7.9}
\]
defines the unitarily invariant matrix norm of the rectangular matrix A (see, e.g., [178], [179] or [144]). The corresponding condition number of A ∈ F^{r×s} (1 ≤ r, s ≤ n) is then defined as
\[
\mathrm{cond}(A) = \|A\|\,\|A^{+}\|, \tag{7.10}
\]
where A^+ ∈ F^{s×r} is the Moore-Penrose inverse of the matrix A (see [178], [179] and the next section). Assuming that φ is given for any n ≥ 1, Stewart [230] defines the unitarily invariant norm of A ∈ F^{r×s} (1 ≤ r, s ≤ n) by
\[
\|A\| = \varphi\left(\sigma_1(A), \ldots, \sigma_p(A)\right) \qquad (p = \min\{r, s\}). \tag{7.11}
\]
We use the singular value decomposition and the principal angles between subspaces to determine the unitarily invariant norm of oblique projections. The approach taken is based on [109].
Let M, N ⊂ R^n be subspaces with p = dim(M) ≥ dim(N) = q. Denote the principal angles between M and N by θ_i (i = 1, ..., q). Let U = [U_1, U_2] ∈ R^{n×n} be orthogonal such that U_1 ∈ R^{n×p} is a basis of M, and U_2 ∈ R^{n×(n−p)} is a basis of M^⊥. Similarly, let V = [V_1, V_2] ∈ R^{n×n} be an orthogonal matrix such that V_1 ∈ R^{n×q} is a basis of N, and V_2 ∈ R^{n×(n−q)} is a basis of N^⊥.
Assume now that M and N are complementary. Then p + q = n and 0 < θ_i ≤ π/2 for all i = 1, ..., q. It is easy to verify that
\[
P_{M,N} = U\begin{bmatrix} 0 & \left(V_2^T U_1\right)^{-1} \\ 0 & 0 \end{bmatrix} V^T
\qquad\text{and}\qquad
P_{N,M} = V\begin{bmatrix} 0 & \left(U_2^T V_1\right)^{-1} \\ 0 & 0 \end{bmatrix} U^T.
\]
Let V_2^T U_1 = XΣY^T be the singular value decomposition of V_2^T U_1 with orthogonal matrices X, Y ∈ R^{p×p}. The Björck-Golub theorem (see, e.g., [26], [128] or Section 2.2) implies that
\[
\Sigma = \mathrm{diag}(\underbrace{1, \ldots, 1}_{p-q}, \sin(\theta_q), \ldots, \sin(\theta_1)) \in R^{p\times p}.
\]
Then the singular value decomposition of the projection P_{M,N} has the form
\[
P_{M,N} = [U_1 Y, U_2]\begin{bmatrix} \Sigma^{-1} & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} X^T V_2^T \\ V_1^T \end{bmatrix}. \tag{7.12}
\]
The Björck-Golub theorem also implies that U_1 Y and V_2 X are the principal vectors related to the principal angles between M and N^⊥. Similarly, let U_2^T V_1 = \tilde{X}\tilde{\Sigma}\tilde{Y}^T be the singular value decomposition of U_2^T V_1, where \tilde{X}, \tilde{Y} ∈ R^{q×q} are orthogonal matrices and
\[
\tilde{\Sigma} = \mathrm{diag}(\sin(\theta_q), \ldots, \sin(\theta_1)) \in R^{q\times q}.
\]
Then the singular value decomposition of P_{N,M} is given by
\[
P_{N,M} = \big[V_1\tilde{Y}, V_2\big]\begin{bmatrix} \tilde{\Sigma}^{-1} & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} \tilde{X}^T U_2^T \\ U_1^T \end{bmatrix}, \tag{7.13}
\]
where V_1\tilde{Y} and U_2\tilde{X} are the principal vectors related to the angles between M^⊥ and N. The singular value decompositions of P_{M,N} and P_{N,M} and Theorem 173 of von Neumann lead to the following result.
Proposition 176 For any projection 0 ≠ P_{M,N} ≠ I,
\[
\|P_{M,N}\| = \varphi\Big(\frac{1}{\sin(\theta_1)}, \ldots, \frac{1}{\sin(\theta_q)}, \underbrace{1, \ldots, 1}_{p-q}, 0, \ldots, 0\Big) \tag{7.14}
\]
and
\[
\|P_{N,M}\| = \varphi\Big(\frac{1}{\sin(\theta_1)}, \ldots, \frac{1}{\sin(\theta_q)}, 0, \ldots, 0\Big), \tag{7.15}
\]
where φ is the symmetric gauge function.
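As an illustration of Proposition 176 in the spectral norm (φ = max), the following sketch (Python/NumPy; assumptions: U1 and V1 have orthonormal columns, R(U1) and R(V1) are complementary, and all names are ours, not part of the original) compares ‖P_{M,N}‖_2 with 1/sin(θ_1), where θ_1 is the smallest principal angle between M and N.

import numpy as np

def oblique_norm_check(U1, V1):
    # P_{M,N} projects onto M = R(U1) along N = R(V1); in the basis [U1, V1] it is diag(I, 0).
    m, p = U1.shape
    B = np.hstack([U1, V1])                        # assumed nonsingular (complementary subspaces)
    P = B @ np.diag([1.0] * p + [0.0] * (m - p)) @ np.linalg.inv(B)
    cosines = np.linalg.svd(U1.T @ V1, compute_uv=False)   # cosines of the principal angles
    theta1 = np.arccos(min(cosines[0], 1.0))               # smallest principal angle
    return np.linalg.norm(P, 2), 1.0 / np.sin(theta1)      # the two values should agree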
Proposition 177 For any projection P_{M,N} we have
\[
\|P_{M,N}\| \ge \|P_M\| \tag{7.16}
\]
in any unitarily invariant matrix norm.
Proof. Since 1/sin(θ_i) ≥ 1, the monotonicity of φ implies
\[
\|P_{M,N}\| \ge \varphi(\underbrace{1, \ldots, 1}_{p}, 0, \ldots, 0).
\]
The norm of P_{M,N} is minimal if θ_i = π/2 for all i = 1, ..., q. This means that M ⊥ N (or M = N^⊥). The minimal value depends only on the dimension p of the subspace M.
The result is well known for the spectral norm.
We now give bounds for projections and projected vectors. We need the following result.
Lemma 178 Let A ∈ R^{r×s} and B ∈ R^{r×k} (1 ≤ k, r, s ≤ n, k + s ≤ n) be arbitrary matrices. Then, in any unitarily invariant matrix norm of R^{n×n},
\[
\|A\| \le \|[A, B]\|. \tag{7.17}
\]
Proof. The singular values of the augmented n × n matrices
\[
\hat{A} = \begin{bmatrix} A & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \qquad
F = \begin{bmatrix} A & B & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\]
are defined by the eigenvalues of
\[
C = \hat{A}^T\hat{A} = \begin{bmatrix} A^T A & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\qquad\text{and}\qquad
D = F^T F = \begin{bmatrix} A^T A & A^T B & 0 \\ B^T A & B^T B & 0 \\ 0 & 0 & 0 \end{bmatrix},
\]
respectively. The eigenvalues of C are the s eigenvalues of A^T A and n − s zero eigenvalues. The eigenvalues of D are the k + s eigenvalues of
\[
\tilde{D} = \begin{bmatrix} A^T A & A^T B \\ B^T A & B^T B \end{bmatrix}
\]
and n − (k + s) zeros. Since D̃ is symmetric, the Poincaré separation theorem [161] implies that
\[
\sigma_{s-i+1}^2\big(\hat{A}\big) = \lambda_i\left(A^T A\right) \le \lambda_{i+k}\big(\tilde{D}\big) = \sigma_{s-i+1}^2(F), \qquad i = 1, \ldots, s.
\]
As
\[
\sigma\big(\hat{A}\big) = [\sigma_1(A), \ldots, \sigma_s(A), 0, \ldots, 0]^T, \qquad
\sigma(F) = [\sigma_1(F), \ldots, \sigma_s(F), \ldots, \sigma_{k+s}(F), 0, \ldots, 0]^T,
\]
we obtain
\[
0 \le \sigma\big(\hat{A}\big) \le \sigma(F).
\]
Hence
\[
\varphi\big(\sigma\big(\hat{A}\big)\big) \le \varphi(\sigma(F))
\]
follows from the monotonicity of the symmetric gauge function, proving the statement of the lemma.
Lemma 178 was proved in the spectral norm by Hanson and Lawson [138] in a different way. It is obvious that
\[
\|A^T\| \le \left\|\begin{bmatrix} A^T \\ B^T \end{bmatrix}\right\|
\]
also holds. Lemma 178 does not remain true for general norms. Bosznay and Garay [29] investigated the induced norms of projections P : X → X in n-dimensional real vector spaces X. Let N(X) be the set of vector norms defined on X. In any induced operator norm ‖I‖ = 1 and ‖P‖ ≥ 1 for P ≠ 0. Denote by N_1(X) the set of those vector norms for which
\[
P : X \to X,\ P^2 = P,\ P \ne I,\ \dim(R(P)) > 1\ \Rightarrow\ \|P\| > 1
\]
in the induced operator norm. Bosznay and Garay proved that for n ≥ 3 the set N_1(X) is open and dense in N(X). Taking such a norm from N_1(X), X = R^n and
\[
P = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix},
\]
we can see that ‖P‖ > 1 = ‖I‖.
Theorem 179 (Egerváry). Let P ∈ R^{n×n} be a projection matrix of rank r and let
\[
P = V W^T \qquad \left(V, W \in R^{n\times r}\right) \tag{7.18}
\]
be any full rank factorization. Then the column vectors of V and W are biorthogonal, that is, W^T V = I_r.
Lemma 180 Let P ∈ R^{n×n} be a projection of rank r ≥ 1, and let P = X_1 Y_1^T and I − P = X_2 Y_2^T be full rank factorizations. Then X = [X_1, X_2] and Y = [Y_1, Y_2] are biorthogonal, that is, Y^T X = I.
Proof. Theorem 179 implies that Y_1^T X_1 = I_r and Y_2^T X_2 = I_{n−r}. By definition
\[
P(I - P) = X_1 Y_1^T X_2 Y_2^T = 0.
\]
The matrices X_1 and Y_2^T have left and right inverses, respectively. Multiplying the equation by these left and right inverses we obtain that Y_1^T X_2 = 0. Similarly, we get
\[
(I - P)P = X_2 Y_2^T X_1 Y_1^T = 0
\]
and Y_2^T X_1 = 0. Thus we have
\[
\begin{bmatrix} Y_1^T \\ Y_2^T \end{bmatrix}[X_1, X_2] = \begin{bmatrix} I_r & 0 \\ 0 & I_{n-r} \end{bmatrix},
\]
which is the stated biorthogonality relation.
Observe that X = [X_1, X_2] ∈ R^{n×n} is nonsingular, R(P) = R(X_1) and N(P) = R(I − P) = R(X_2). In Householder notation we have just obtained that
\[
P = P_{R(X^{|r}),\,R(X^{n-r|})}, \qquad I - P = P_{R(X^{n-r|}),\,R(X^{|r})}. \tag{7.19}
\]
Lemma 181 If P = P_{R(X^{|r}),R(X^{n-r|})} for some X ∈ R^{n×n}, then P and I − P can be written in the form
\[
P = X^{|r}\left(X^{-1}\right)^{r}, \qquad I - P = X^{n-r|}\left(X^{-1}\right)^{n-r}. \tag{7.20}
\]
Proof. Since
\[
\begin{bmatrix} Y_1^T \\ Y_2^T \end{bmatrix} = [X_1, X_2]^{-1} = X^{-1},
\]
the relations Y_1^T = (X^{-1})^{r} and Y_2^T = (X^{-1})^{n-r} follow.
Representation (7.20) is taken from [229]. Using Lemmas 178 and 181 we can prove the following result [97].
Lemma 182 Let A ∈ R^{n×n} be an arbitrary nonsingular matrix. Then
\[
\left\|P_{R(A^{|k}),\,R(A^{n-k|})}\right\| \le \mathrm{cond}(A), \qquad \left\|P_{R(A^{n-k|}),\,R(A^{|k})}\right\| \le \mathrm{cond}(A) \tag{7.21}
\]
holds in any submultiplicative unitarily invariant matrix norm.
Proof. Lemma 181 implies that
\[
P_{R(A^{|k}),\,R(A^{n-k|})} = A^{|k}\left(A^{-1}\right)^{k} = \left[A^{|k},\ 0\right]\begin{bmatrix}\left(A^{-1}\right)^{k} \\ 0\end{bmatrix},
\]
from which
\[
\left\|P_{R(A^{|k}),\,R(A^{n-k|})}\right\| \le \left\|\left[A^{|k},\ 0\right]\right\|\,\left\|\begin{bmatrix}\left(A^{-1}\right)^{k} \\ 0\end{bmatrix}\right\| \le \|A\|\,\|A^{-1}\| = \mathrm{cond}(A)
\]
follows. The proof of the other statement is similar.
7.3 Variable size test problems
Here we list the 32 variable size test problems used for testing the quasi-Newton ABS methods. The first 22 problems are taken from the Estonian collection of test equations [211]. These problems contain the variable size Argonne test problems [191] as well. The rest of the test problems are selected from the numerical ODE and PDE literature. The full description of the test problems can be found in [121]. All test problems are given in the form F(x) = [f_1(x), ..., f_n(x)]^T = 0.
No. 1 (Schmidt)
f_1(x) = 1 − x_1,
f_i(x) = 10(i − 1)(x_i − x_{i−1})^2   (i = 2, ..., m),
x_initial = [−1.2, ..., −1.2, −1]^T.
No. 2 (Price)
f_i(x) = x_i − 0.1 x_{i+1}^2   (i = 1, ..., n − 1),
f_n(x) = x_n − 0.1 x_1^2,
x_initial = [2, ..., 2]^T.
No. 3 (Brown)
f_i(x) = −(n + 1) + x_i + \sum_{j=1}^{n} x_j   (i = 1, ..., n − 1),
f_n(x) = −1 + \prod_{j=1}^{n} x_j,
x_initial = [0.5, ..., 0.5]^T.
No. 4 (Moré-Garbow-Hillstrom)
f_i(x) = 1 − x_i   (i = 1, 3, ..., n − 1),
f_i(x) = 10(x_i − x_{i−1})^2   (i = 2, 4, ..., n; n = 2k),
x_initial = [−1.2, 1, −1.2, 1, ..., −1.2, 1]^T.
No. 5 (Kearfott)
f_i(x) = x_i − (1/(2n)) ( i + \sum_{j=1}^{n} x_j^3 )   (i = 1, ..., n),
x_initial = [1.5, 1.5, ..., 1.5]^T.
No. 6 (Burmeister)
f_i(x) = x_{i−1} − 2x_i + x_{i+1} − h^2 exp(x_i)   (i = 1, ..., n),
x_0 = x_{n+1} = 0,  h = 1/(n + 1),
x_initial = [0, ..., 0]^T.
No. 7 (Broyden)
f_i(x) = (3 − k x_i) x_i + 1 − x_{i−1} − 2 x_{i+1}   (i = 1, ..., n),
x_0 = x_{n+1} = 0,  k = 0.1,
x_initial = [−1, −1, ..., −1]^T.
No. 8 (Broyden)
f_i(x) = (k_1 + k_2 x_i^2) x_i + 1 − k_3 \sum_{j ∈ I_i} (x_j + x_j^2)   (i = 1, ..., n),
I_i = {j | j ≠ i, max{1, i − r_1} ≤ j ≤ min{n, i + r_2}},
k_1 = k_2 = k_3 = 1,  r_1 = r_2 = 3,
x_initial = [−1, −1, ..., −1]^T.
No. 9 (Maruster)
f_1(x) = x_1^2 − 1,
f_i(x) = x_{i−1}^2 + ln(x_i) − 1   (i = 2, ..., n),
x_initial = [0.5, 0.5, ..., 0.5]^T.
No. 10 (Kearfott)
f_i(x) = exp( cos( i \sum_{k=1}^{n} x_k ) )   (i = 1, ..., n),
x_initial = [0, ..., 0]^T.
No. 11 (Kearfott)
f_i(x) = (1/(2n)) ( \sum_{j=1}^{n} x_j^3 + i )   (i = 1, ..., n),
x_initial = [0, ..., 0]^T.
No. 12 (Maruster)
f_1(x) = x_1,
f_i(x) = cos(x_{i−1}) + x_i − 1   (i = 2, ..., n),
x_initial = [0.5, 0.5, ..., 0.5]^T.
No. 13 (Alefeld-Platzöder)
f_i(x) = −x_{i−1} + 2x_i − x_{i+1} + h^2 (x_i + sin(x_i))   (i = 1, ..., n),
x_0 = 0,  x_{n+1} = 1,  h = 1/(n + 1),
x_initial = [1, 1, ..., 1]^T.
No. 14 (No. 8 with k_1 = 2, k_2 = 5, k_3 = 1, r_1 = 5, r_2 = 1)
f_i(x) = (2 + 5 x_i^2) x_i + 1 − \sum_{j ∈ J_i} x_j (1 + x_j)   (i = 1, ..., n),
J_i = {j | j ≠ i, max{1, i − 5} ≤ j ≤ min{n, i + 1}},
x_initial = [−1, −1, ..., −1]^T.
No. 15 (Moré-Garbow-Hillstrom)
f1 (x) =
f2 (x) =
P29 ¡ k ¢−1 n³
Pn ¡ k ¢j−1 ´
0 − 2k
xj
k=1 29
j=1 29
29
¸¾
·
³
¡ k ¢j−2
Pn ¡ k ¢j−1 ´2
Pn
xj −
xj − 1
∗
j=2 (j − 1) 29
j=1 29
¡
¡
¢¢
2
+x1 1 − 2 x2 − x1 − 1 ,
P29 ¡ k ¢0 n³
Pn ¡ k ¢j−1 ´
1 − 2k
xj
k=1 29
j=1 29
29
¸¾
·
³P
¡ k ¢j−1 ´2
¡ k ¢j−2
Pn
n
xj −
xj − 1
∗
j=2 (j − 1) 29
j=1 29
+x2 − x21 − 1,
fi (x) =
¡ k ¢j−1 ´
P29 ¡ k ¢i−2 n³
2k Pn
i
−
1
−
xj
k=1 29
j=1 29
29
¸¾
·
³
¢
¡
Pn ¡ k ¢j−1 ´2
Pn
k j−2
(j
−
1)
x
−
x
−
1
∗
j
j
j=2
j=1 29
29
(i = 3, . . . , n) ,
xinitial
T
= [0, . . . , 0] .
No. 16 (Chebyquad)
n
f2i−1 (x) =
1X
Y2i−1 (xj )
n j=1
(i ≥ 1),
n
f2i (x) =
1X
1
Y2i (xj ) +
n j=1
(2i)2 − 1
(i ≥ 1) ,
Y1 (x) = 2x − 1,
2
Y2 (x) = 2 [Y1 (x)] − 1,
Yi (x) = 2Y1 (x) Yi−1 (x) − Yi−2 (x) (i = 3, . . . , n) ,
·
¸T
2
n
1
,
,... ,
.
xinitial =
n+1 n+1
n+1
No. 17 (Rump)
fi (x) = 3xi (xi+1 − 2xi + xi−1 ) +
1
(xi+1 − xi−1 )2
4
(i = 1, . . . , n) ,
x0 = 0, xn+1 = 20,
T
xinitial = [10, . . . , 10] .
No. 18 (Moré-Garbow-Hillstrom)
fi (x) = 2xi − xi−1 − xi+1 + h2
x0 = xn+1 = 0,
1
,
n+1
∈ Rn .
h=
xinitial = [tj (tj − 1)]nj=1
(xi + ti + 1)3
2
ti = ih,
(i = 1, . . . , n) ,
No. 19 (Moré-Garbow-Hillstrom)

i
X
h
(1 − ti )
tj (xj + tj + 1)3
fi (x) = xi +
2
j=1

n
X
3
+ti
(1 − tj ) (xj + tj + 1)
(i = 1, . . . , n) ,
j=i+1
1
,
n+1
= [tj (tj − 1)]nj=1 ∈ Rn .
ti = ih,
xinitial
h=
No. 20 (Moré-Garbow-Hillstrom)
fi (x) = n −
xinitial =
·
n
X
j=1
cos (xj ) + i (1 − cos (xi )) − sin (xi )
1
1
,... ,
n
n
¸T
(i = 1, . . . , n) ,
.
No. 21 (Moré-Garbow-Hillstrom)


2 

n
n
X
X


fi (x) = xi − 1 + i 
j (xj − 1) 1 + 2 
j (xj − 1) 
j=1
(i = 1, . . . , n) ,
j=1
¸T
·
2
n−1
1
,0 .
xinitial = 1 − , 1 − , . . . , 1 −
n
n
n
No. 22 (No. 7 with k = 2)
fi (x) = (3 − 2xi ) xi + 1 − xi−1 − 2xi+1
x0 = xn+1 = 0,
(i = 1, . . . , n) ,
T
xinitial = [−1, . . . , −1] .
No. 23 (Gheri-Mancino)
fi (x) =
n
X
j=1, j6=i
Aij [sinα (log (Aij )) + cosα (log (Aij ))]
³
n ´γ
+ βnxi + i −
(i = 1, . . . , n) ,
2
µ
¶1/2
i
Aij (x) = x2j +
(i, j = 1, . . . , n, i 6= j) ,
j
α = 5, β = 14, γ = 3,
βn
xinitial = −
F (0) .
β 2 n2 − (α + 1)2 (n − 1)2
No. 24 (Discretized H-equation)
µ
¶
n
X
ch
fi (x) = −1 + 1 −
aij xi xj
xi −
4
j=1
(
cih
if j = 1, . . . , n − 1,
2(i+j) ,
aij =
cih
,
if j = n,
4(i+n)
(i = 1, . . . , n) ,
c = 1/2, h = 1/n,
xinitial = [1, . . . , 1]T .
No. 25 (Trigexp1)
f1 (x) = φ1 (x1 , x2 ) ,
fi (x) = φ2 (xi−1 , xi ) + φ1 (xi , xi+1 )
fn (x) = φ2 (xn−1 , xn ) ,
(i = 2, . . . , n − 1) ,
φ1 (t, s) = 3t3 + 2s − 5 + sin (t − s) sin (t + s) ,
φ2 (t, s) = 4s − 3 − t exp (t − s) ,
T
xinitial = [0, . . . , 0] .
No. 26 (Troesch)
fi (x) = −xi−1 + 2xi − xi+1 + µh2 sinh (µxi )
x0 = 0, xn+1 = 1, h = 1/ (n + 1) , µ = 10,
(i = 1, . . . , n) ,
T
xinitial = [1, . . . , 1] .
No. 27 (Maier)
µ
¶
h2
xi+1 − xi−1
x2i +
ε
2h
= 0.5, h = 1/ (n + 1) ,
fi (x) = −xi−1 + 2xi − xi+1 −
x0 = 0, xn+1
(i = 1, . . . , n) ,
xinitial = [1, . . . , 1]T .
No. 28 (Maier)
µ
¶µ
¶
i
h2 X
i
i
xj+1 − xj+1
2
−
j 1−
fi (x) = xi −
xj +
2n + 2
ε j=1
n+1
2h
µ
¶µ
¶
n
h2 X
j
xj+1 − xj−1
−
i 1−
x2j +
(i = 1, . . . , n) ,
ε j=i+1
n+1
2h
x0 = 0, xn+1 = 0.5, h = 1/ (n + 1) ,
T
xinitial = [1, . . . , 1] .
No. 29 (Rheinboldt)
fi (x) = −xi−1 + 2xi − xi+1
µ
µ
2
+ h f ti , α 1 −
i
n+1
¶
i
+ xi
+β
n+1
f (t, x) = cb eγ(x−β) − ca eγ(α−x) + d (t) ,
½
ca , if t ≤ 0,
d (t) =
cb , if t > 0,
¶
(i = 1, . . . , n) ,
a = −9e − 3, b = 1e − 3, α = 0, β = 25, γ = 20, ca = 1e + 6, cb = 1e + 7,
b−a
x0 = α, xn+1 = β, ti = a + ih, h =
,
n+1
T
xinitial = [1, . . . , 1] .
No. 30 (Potra-Rheinboldt)
¢
¡
x ∈ R2n ,
F (x) = Bx + h2 Φ (x) − b
ti = ih, h = 1/ (n + 1) ,
T
xinitial = [t1 (1 − t1 ) , . . . , tn (1 − tn ) , t1 (1 − t1 ) , . . . , tn (1 − tn )] ,
where

B=
·
Φ (x) =
and
A 0
0 A
·
¸
φ1 (x)
φ2 (x)







A=






,
¸
,
2
−1
0
−1
2
..
.
1
..
.
..
..
0
..
.
..
.
..
.
0
.
..
···
.
.
···
···
..
.
..
.
..
.
..
.
..
.
···
···
···
..
.
..
.
..
.
.
..
.
..
−1
0
2
−1
n
φi (x) = [gi (tj , xj , xn+j )]j=1 ∈ Rn

α1 , if i = 1,




 β1 , if i = n,
α2 , if i = n + 1,
bi =


 β2 , if i = 2n,


0, otherwise
g1 (t, u1 , u2 ) = u21 + u1 + 0.1u22 − 1.2,
g2 (t, u1 , u2 ) = 0.2u21 + u22 + 2u2 − 0.6,
α1 = α2 = β1 = β2 = 0.
0
..
.
..
.
..
.








,


0 


−1 
2
(i = 1, 2) ,
No. 31 (Glowinski-Keller)
´
³
2
x ∈ Rn ,
F (x) = Ax + h2 ψ (x) = 0
¢
¡
i = 1, . . . , n2 ,
ψi (x) = λ exp (xi )
h = 1/ (n + 1) , λ = 1,
2
xinitial = [1, . . . , 1]T ∈ Rn ,
where A is deÞned in problem No. 30.
No. 32 (Ishihara)
´
³
2
x ∈ Rn ,
F (x) = Ax + h2 ψ (x) − b
¡
¢
i = 1, . . . , n2 ,
ψi (x) = x2i
h = 1/ (n + 1) ,
2
xinitial = [1, . . . , 1]T ∈ Rn ,
where A is deÞned in problem No. 30,

φ (h, 0) + φ (0, h) , if i = 1,




φ (ih, 0) , if 1 < i < n,




φ (1, h) + φ (nh, 0) , if i = n,




 φ (0, jh) , if i = (j − 1) n + 1 and 1 < j < n,
0, if (j − 1) n + 1 < i < jn and 1 < j < n,
bi =


φ (1, jh) , if i = jn and 1 < j < n,




φ (h, 1) + φ (0, nh) , if i = n2 − n + 1,




φ (lh, 1) , if i = n2 − n + l and 1 < l < n,



φ (1, nh) + φ (nh, 1) , if i = n2
and
\[
\varphi(t, s) = \frac{12}{(t + s + 1)^2}.
\]

7.4 A FORTRAN program of Algorithm QNABS3
Here we give the list of the FORTRAN 77 program of the quasi-Newton ABS method
QNABS3. The program was developed by Galántai, Jeney and Spedicato [120] and used
in the numerical testing [121].
C**********************************************************************C
C
SUBROUTINE QNABS.FOR
C
C
PURPOSE: SOLVING NONLINEAR ALGEBRAIC SYSTEMS OF THE FORM
C
C
F(X)=0 (X=(X(1),...,X(N))
C
C
OR IN COMPONENT WISE FORM
C
C
F_1(X)=0,....,F_N(X)=0
C
C
BY THE QUASI-NEWTON ABS METHOD OF GALANTAI AND JENEY
C
C
WITH THE HUANG PARAMETERS
C
C
THIS VERSON OF THE PROGRAM WAS WRITTEN IN APRIL-MAY, 1993.
C
C
REFERENCES:
C
C
[1] GALANTAI A.,JENEY A.: QUASI-NEWTON ABS METHODS, microCAD- C
C
System’ 93 Conference, Section Modern Numerical Methods,
C
C
MISKOLC, 1993, PP. 63-68
C
C
[2] GALANTAI A., JENEY A., SPEDICATO, E.: TESTING OF ABSC
C
HUANG METHODS ON MODERATELY LARGE NONLINEAR SYSTEMS,
C
C
QUADERNO DMSIA 93/5, UNIVERSITY OF BERGAMO, 1993
C
C
[3] ABAFFY J., SPEDICATO, E.: ABS PROJECTION ALGORITHMS:
C
C
MATHEMATICAL TECHNIQUES FOR LINEAR AND NONLINEAR
C
C
ALGEBRAIC EQUATIONS, ELLIS HORWOOD, 1989
C
C----------------------------------------------------------------------C
C
INPUT PARAMETERS:
C
C
N - DIMENSION OF THE SYSTEM
C
C
MAX - MAXIMUM ALLOWED MAJOR ITERATION NUMBER
C
C
EPSF - TOLERANCE VALUE FOR THE MAXIMUM NORM OF F(X),
C
C
OUTPUT PARAMETERS:
C
C
FAM - MAXIMUM NORM OF F(X)
C
C
XAM - MAXIMUM NORM OF THE DIFFERENCE BETWEEN THE LAST TWO
C
C
MAJOR ITERATES
C
C
ITNO - THE NUMBER OF MAJOR ITERATIONS
C
C
K1 - INFORMATION PARAMETER WITH VALUES
C
C
K1=1, IF NORM(F(X))<=EPSF IS SATISFIED ON OUPUT
C
C
K1=-1, IF DENOMINATOR IN THE UPDATE MATRIX BECOMES
C
C
LESS THAN 1D-30
C
C
K1=-2, IF DENOMINATOR BECOMES LESS THAN 1D-30 IN THE
C
C
MINOR ITERATIONS
C
C
INPUT/OUTPUT PARAMETERS:
C
C
X - INITIAL ESTIMATE OF THE ZERO ON INPUT,
C
C
- FINAL ESTIMATE OF THE ZERO ON OUTPUT
C
C
WORKING PARAMETERS:
C
C
Y - ARRAY OF MINOR ITERATES
C
C
FX - ARRAY OF F(X) AT POINT X
C
C
W - N DIMENSIONAL ARRAY
C
C
JR - ACTUAL ROW OF THE JACOBIAN MATRIX
C
C
A - TRANSPOSE OF THE INITIAL JACOBIAN MATRIX
C
C
Q - Q-FACTOR OF THE QR-DECOMPOSITION OF THE TRANSPOSED
C
C
JACOBIAN
C
C
R - R-FACTOR OF THE QR-DECOMPOSITION OF THE TRANSPOSED
C
C
JACOBIAN IN VECTOR FORM. ITS SIZE IS AT LEAST N*(N+1)/2
C
C
WW - N DIMENSIONAL ARRAY
C
C
WRK1,WRK2,WRK3,WRK4 - N DIMENSIONAL ARRAYS
C
C----------------------------------------------------------------------C
C
REQUIRED ROUTINES:
C
C
DOUBLE PRECISION FUNCTION F(N,I,X) - USER SUPPLIED.
C
C
THE ROUTINE F(N,I,X) CALCULATES THE ITH COORDINATE OF C
C
FUNCTION F(X) AT POINT X AND RETURNS ITS VALUE AS F
C
C
THE SUBROUTINE MUST NOT CHANGE THE VALUE OF N,X AND I! C
C
SUBROUTINE JACOBI(N,I,X,JR) - USER SUPPLIED.
C
C
THE ROUTINE JACOBI(N,I,X,JR) CALCULATES THE ITH ROW OF C
C
THE JACOBIAN MATRIX AT POINT X AND RETURNS ITS VALUE
C
C
IN THE VECTOR JR. THE SUBROUTINE MUST NOT CHANGE THE
C
C
VALUE OF N,X AND I!
C
C----------------------------------------------------------------------C
C
REQUIRED BLAS ROUTINES:
C
C
DAXPY.FOR
C
C
DCOPY.FOR
C
C
DDOT.FOR
C
C
DGER.FOR
C
C
DGEMV.FOR
C
C
DNRM2.FOR
C
C
DSCAL.FOR
C
C
DTPMV.FOR
C
C
DTPSV.FOR
C
C
IDAMAX.FOR
C
C
LSAME.FOR
C
C
XERBLA.FOR
C
C----------------------------------------------------------------------C
C
USED OTHER ROUTINES (INCLUDED IN THIS CODE)
C
C
QRFACT
- SUBROUTINE FROM PROGRAM TOMS 580
C
C
RANK1
- SUBROUTINE FROM PROGRAM TOMS 580
C
C
INSCOL
- SUBROUTINE FROM PROGRAM TOMS 580
C
C
ORTCOL
- SUBROUTINE FROM PROGRAM TOMS 580
C
C
CRFLCT
- SUBROUTINE FROM PROGRAM TOMS 580
C
C
ARFLCT
- SUBROUTINE FROM PROGRAM TOMS 580
C
C
NOTE: THE QRUP PROGRAM (ALGORITHM TOMS 580) WAS WRITTEN BY
C
C
A. BUCKLEY AND IT WAS PUBLISHED IN
C
C
C
C
"ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, DECEMBER 1981."
C
C
C
C
THESE ROUTINES ARE A TRANSLATION INTO FORTRAN OF THE
C
C
ALGOL ROUTINES PUBLISHED IN
C
C
C
C
"REORTHOGONALIZATION AND STABLE ALGORITHMS FOR UPDATING
C
C
THE GRAM SCHMIDT QR FACTORIZATION"
C
C
C
C
PUBLISHED BY J.W. DANIEL, W.B. GRAGG, L. KAUFMANN AND
C
C
G.W. STEWART IN MATHEMATICS OF COMPUTATION, VOLUME 30,
C
C
NUMBER 136, OCTOBER 1976, PAGES 772-795.
C
C**********************************************************************C
subroutine QNABS(N,MAX,EPSF,FAM,XAM,ITNO,K1,X,Y,FX,W,JR,A,Q,R,WW,
&
WRK1,WRK2,WRK3,WRK4)
integer N,MAX,ITNO,K1
double precision EPSF,FAM,XAM
double precision X(N),Y(N),FX(N),W(N),JR(N),A(N,N),Q(N,N),R(*),
&
WW(N),WRK1(N),WRK2(N),WRK3(N),WRK4(N)
integer I, IFAIL,IRA,IRQ,K,LDA,NC,NR
double precision
T,DDOT,F
C
C     EVALUATE F(X) AT THE INITIAL POINT
C
      do 3020 I=1,N
      FX(I)=F(N,I,X)
 3020 continue
      FAM=DABS(FX(IDAMAX(N,FX,1)))
      K1=1
      ITNO=0
C
C     EXIT CRITERION CHECK FOR THE INITIAL POINT
C
      if(FAM.LT.EPSF) then
      XAM=0.0D0
      return
      endif
C
      IRQ = N
      IRA = N
      NR  = N
      NC  = N
C
C     CALCULATE TRANSPOSE OF THE INITIAL JACOBIAN
C
      do 3010 I=1,N
      call JACOBI(N,I,X,JR)
      call DCOPY(N,JR,1,A(1,I),1)
 3010 continue
C
C     QR-DECOMPOSITION OF THE TRANSPOSE OF THE INITIAL JACOBIAN MATRIX A
C
      call QRFACT(NR,NC,A,IRA,Q,IRQ,R,WRK1,WRK2,WRK3,WRK4,IFAIL)
C
C     SETS INITIAL MINOR ITERATE AS Y=X, WHERE X IS THE INITIAL VECTOR
C
      call DCOPY(N,X,1,Y,1)
C
C     START OF THE MAJOR ITERATIONS
C
 3525 ITNO=ITNO+1
C
C     SETS DIMENSION FOR TOMS ROUTINES
C
      LDA=N
C
C     START OF THE MINOR ITERATIONS
C
      do 3030 K=1,N
      T=R(K*(K+1)/2)
      if(DABS(T) .LT. 1.0E-30) then
      K1=-2
      return
      endif
      T=-F(N,K,Y)/T
C
C     UPDATE OF Y:
C     Y=Y+T*P (THE SEARCH VECTOR IS THE K-TH COLUMN OF Q)
C
      call DAXPY(N,T,Q(1,K),1,Y,1)
 3030 continue
C
C     END OF THE MINOR ITERATIONS
C
      do 3040 I=1,N
      W(I)=F(N,I,Y)-FX(I)
 3040 continue
      call DAXPY(N,1.0D0,W,1,FX,1)
      FAM=DABS(FX(IDAMAX(N,FX,1)))
      call DAXPY(N,-1.0D0,Y,1,X,1)
      XAM=DABS(X(IDAMAX(N,X,1)))
      if(FAM.LT.EPSF) then
      call DCOPY(N,Y,1,X,1)
      return
      endif
      T=DDOT(N,X,1,X,1)
      if(T .LT. 1.0E-30) then
      K1=-1
      call DCOPY(N,Y,1,X,1)
      return
      endif
      T=-1.0D0/T
      call DGEMV('T',N,N,1.0D0,Q,LDA,X,1,0.0D0,WW,1)
      call DTPMV('U','T','N',N,R,WW,1)
      call DAXPY(N,1.0D0,WW,1,W,1)
      call DSCAL(N,T,X,1)
C
C--   UPDATE OF Q-R DECOMPOSITION
C
      call RANK1 (NR,NC,Q,IRQ,R,W,X,WRK1,WRK2,WRK3,WRK4,IFAIL)
C
      call DCOPY(N,Y,1,X,1)
      if(ITNO .GE. MAX) return
      goto 3525
C
C     END OF THE MAJOR ITERATIONS
C
      end
C
C**********************************************************************
C
      subroutine QRFACT(M,N,A,IRA,Q,IRQ,R,WRK1,WRK2,WRK3,WRK4,IFAIL)
C
C======================== D E S C R I P T I O N ========================
C
C     THIS COMPUTES A GRAM-SCHMIDT QR FACTORIZATION OF A.
C
C     IT IS ASSUMED THAT A IS M BY N AND THAT M >= N.
C     THUS Q MUST BE M BY N AND R WILL BE N BY N,
C     ALTHOUGH R WILL BE STORED ONLY AS THE UPPER TRIANGULAR
C     HALF, STORED BY COLUMNS, AS DESCRIBED IN THE ROUTINE
C     "DESCRB".
C
C     WRK4 IS A TEMPORARY WORK VECTOR OF LENGTH M, NAMELY
C     V OF THE ALGOL ROUTINE.
C     WRK1, WRK2 AND WRK3 ARE USED IN "INSCOL" AND ARE OF LENGTHS
C     M, N AND N RESPECTIVELY.
C
C     IRA AND IRQ ARE THE ACTUAL DECLARED FIRST DIMENSIONS OF THE
C     ARRAYS A AND Q RESPECTIVELY.
C
C     IFAIL IS DEFINED IN "ORTCOL", WHICH IS INDIRECTLY CALLED BY
C     "QRFACT".
C
C     FOR FURTHER DETAILS, PLEASE SEE THE ALGOL ROUTINE "QRFACTOR"
C     BY DANIEL ET AL.
C
C======================= D E C L A R A T I O N S =======================
C
      double precision A, Q, R, WRK1, WRK2
      double precision WRK3, WRK4
C
      integer          I, IFAIL, IRA, IRQ, K,
     &                 M, N
C
      dimension A(IRA,N), Q(IRQ,N), R(1)
      dimension WRK1(M), WRK2(N), WRK3(N), WRK4(M)
C
C======================== E X E C U T I O N ============================
C
      do 2000 K = 1 , N
      do 1000 I = 1 , M
 1000 WRK4(I) = A( I,K )
      call INSCOL(M,K,Q,IRQ,R,K,WRK4,WRK1,WRK2,WRK3,IFAIL)
      if (IFAIL .EQ. 1) goto 90000
 2000 continue
C
C========================= E X I T =====================================
C
90000 return
C
      end
C
C***********************************************************************
C
      subroutine RANK1(M,N,Q,IRQ,R,U,V,WRK1,WRK2,WRK3,WRK4,IFAIL)
C
C======================== D E S C R I P T I O N ========================
C
C     THIS SUBROUTINE UPDATES THE FACTORIZATION A = Q R WHEN THE
C     OUTER PRODUCT OF THE M-VECTOR V AND THE N-VECTOR U IS
C     ADDED TO A. ON ENTRY Q IS M BY N AND R IS N BY N.
C     THE USER SHOULD ENSURE THAT M >= N > 0.
C
C     IRQ IS DESCRIBED IN "QRFACT".
C
C     WRK1 AND WRK2 ARE TEMPORARY VECTORS PASSED AS WORKING STORAGE
C     TO THE ROUTINE "ORTCOL".
C
C     WRK3 IS A TEMPORARY WORK VECTOR OF LENGTH N CORRESPONDING TO
C     THE VECTOR T DECLARED IN THE ALGOL PROCEDURE.
C
C     NOTICE ALSO THAT, AS MENTIONED IN "DESCRB", THE TRIANGULAR
C     MATRIX R IS NOT STORED IN FULL, BUT ONLY ITS NONZERO
C     UPPER HALF IS AVAILABLE. THUS THERE IS NO STORAGE AVAILABLE
C     FOR THE ZERO ELEMENTS IN THE LOWER PART. HOWEVER, THE ALGOL
C     PROCEDURE USES THE STORAGE SPACE ALONG THE FIRST SUBDIAGONAL OF
C     R. THUS WE NEED TO PROVIDE SOME TEMPORARY STORAGE TO ALLOW
C     FOR THE INFORMATION STORED THERE. THIS IS THE USE OF THE
C     WORKING VECTOR WRK4.
C
C======================= D E C L A R A T I O N S =======================
C
      double precision C, ONE, Q, R, RHO
      double precision RHOV, S, T1, U, V
      double precision WRK1, WRK2, WRK3, WRK4, ZERO
C
      integer          I, IFAIL, IRQ, ITEMP1, K,
     &                 KP1, M, N, NM1, NP1
C
      dimension Q(IRQ,N), R(1), U(N), V(M), RHOV(1)
      dimension WRK1(M), WRK2(N), WRK3(N), WRK4(N)
C
      equivalence (RHO,RHOV(1))
C
      data ZERO/0.D0/, ONE/1.D0/
C
C======================== E X E C U T I O N ============================
C
      NM1 = N - 1
      NP1 = N + 1
C
      call ORTCOL(M,N,Q,IRQ,V,WRK3,RHO,WRK1,WRK2,IFAIL)
      if (IFAIL .EQ. 1) goto 90000
      call CRFLCT(WRK3(N),RHO,C,S)
      ITEMP1 = ( N*NP1) / 2
      call ARFLCT(C,S,1,R(ITEMP1),0,1,RHOV,0,1)
      call ARFLCT(C,S,M,Q(1,N),0,1,V,0,1)
C
      if ( N .LE. 1) goto 2000
      do 1000 I = 1,NM1
      K = N-I
      KP1 = K + 1
      call CRFLCT(WRK3(K),WRK3(KP1), C,S)
      call ARFLCT(C,S,I,R(ITEMP1-1),1,KP1,R(ITEMP1),1,KP1)
      WRK4(KP1) = ZERO
      ITEMP1 = ITEMP1 - KP1
      call ARFLCT(C,S,1,R(ITEMP1),0,1,WRK4(KP1),0,1)
      call ARFLCT(C,S,M,Q(1,K),0,1,Q(1,KP1),0,1)
 1000 continue
C
 2000 K = 1
      T1 = WRK3(1)
      do 2500 I = 1,N
      R(K) = ONE * R(K) + T1 * U(I)
      K = K + I
 2500 continue
      ITEMP1 = 1
      if ( N .LE. 1) goto 4000
      do 3000 K = 1,NM1
      KP1 = K + 1
      call CRFLCT(R(ITEMP1), WRK4(KP1), C,S)
      ITEMP1 = ITEMP1 + KP1
      call ARFLCT(C,S,N-K,R(ITEMP1-1),1,KP1,R(ITEMP1),1,KP1)
      call ARFLCT(C,S,M,Q(1,K),0,1,Q(1,KP1),0,1)
 3000 continue
C
 4000 call CRFLCT(R(ITEMP1),RHO,C,S)
      call ARFLCT(C,S,M,Q(1,N),0,1,V,0,1)
C
C========================= E X I T =====================================
C
90000 return
C
      end
C
C***********************************************************************
C
      subroutine INSCOL (M,N,Q,IRQ,R,K,V,WRK1,WRK2,WRK3,IFAIL)
C
C======================== D E S C R I P T I O N ========================
C
C     THIS SUBROUTINE UPDATES THE FACTORIZATION A = Q R WHEN THE
C     M-VECTOR V IS INSERTED BETWEEN COLUMNS K - 1 AND K OF A.
C
C     IT ASSUMES Q IS INITIALLY M BY N-1 AND THAT
C     R IS INITIALLY N-1 BY N-1.
C
C     THE USER SHOULD ENSURE THAT M >= N > 0 AND THAT 0 < K <= N.
C     NOTICE THAT A CALL WITH K = N JUST AMOUNTS TO A CALL
C     TO "ORTCOL".
C
C     WRK1 AND WRK2 ARE TEMPORARY VECTORS PASSED TO "ORTCOL".
C     WRK3 IS FOR TEMPORARY STORAGE OF THE WORK VECTOR U OF THE
C     ALGOL ROUTINE.
C
C     R IS STORED IN TRIANGULAR FORM, AS DESCRIBED IN "DESCRB".
C
C     IRQ IS EXPLAINED IN "QRFACT".
C
C     IFAIL IS EXPLAINED IN "ORTCOL".
C
C======================= D E C L A R A T I O N S =======================
C
      double precision C, Q, R, RHO, S
      double precision V, WRK1, WRK2, WRK3, ZERO
C
      integer          I, IFAIL, IRQ, ITEMP1, ITEMP2,
     &                 IT1, IT2, J, JJ, K,
     &                 L, LL, LP1, M, N,
     &                 NK, N1
C
      dimension Q(IRQ,N), R(1), V(M)
      dimension WRK1(M), WRK2(N), WRK3(N)
C
      data ZERO /0.D0/
C
C======================== E X E C U T I O N ============================
C
      N1 = N - 1
      if ( K .GE. N) goto 3500
      NK = N1 + K
      ITEMP1 = (N*N1) / 2
      ITEMP2 = ITEMP1 + N
      do 2000 JJ = K,N1
      R(ITEMP2) = ZERO
      ITEMP2 = ITEMP1
      J = NK - JJ
      ITEMP1 = ITEMP1 - J
      do 1000 I = 1, J
      IT1 = ITEMP1 + I
      IT2 = ITEMP2 + I
 1000 R(IT2) = R(IT1)
 2000 continue
C
 3500 call ORTCOL(M,N1,Q,IRQ,V,WRK3,RHO,WRK1,WRK2,IFAIL)
      if (IFAIL .EQ. 1) goto 90000
      WRK3(N) = RHO
C
      do 4000 I = 1, M
 4000 Q(I,N) = V(I)
C
      if ( K .GE. N) goto 5500
      ITEMP1 = (N*N1) /2 + N1
      do 5000 LL = K, N1
      L = NK - LL
      LP1 = L + 1
      call CRFLCT(WRK3(L),WRK3(LP1),C,S)
      call ARFLCT(C,S,N-L,R(ITEMP1),1,LP1,R(ITEMP1+1),1,LP1)
      call ARFLCT(C,S,M,Q(1,L),0,1,Q(1,LP1),0,1)
      ITEMP1 = ITEMP1 - LP1
 5000 continue
C
 5500 ITEMP1 = (K*(K-1)) / 2
      do 6000 I = 1, K
      IT1 = ITEMP1 + I
 6000 R(IT1) = WRK3(I)
C
C========================= E X I T =====================================
C
90000 return
C
      end
C
C***********************************************************************
C
      subroutine ORTCOL(M,N,Q,IRQ,V,SMALLR,RHO,WRK1,WRK2,IFAIL)
C
C======================== D E S C R I P T I O N ========================
C
C     ASSUMING THE M BY N MATRIX Q HAS (NEARLY) ORTHONORMAL COLUMNS,
C     THIS SUBROUTINE ORTHOGONALIZES THE M-VECTOR V TO THE COLUMNS
C     OF Q. IT NORMALIZES THE RESULT IF M > N. THE N-VECTOR
C     SMALLR IS THE ARRAY OF "FOURIER COEFFICIENTS", AND RHO
C     IS THE DISTANCE FROM V TO THE RANGE OF Q. SMALLR AND
C     ITS CORRECTIONS ARE COMPUTED IN DOUBLE PRECISION. FOR
C     MORE DETAIL, SEE SECTIONS 2 AND 4 OF THE PAPER BY DANIEL ET AL.
C
C     NOTES : 1. INNER PRODUCTS ARE DONE USING THE ROUTINE SDOT
C                FROM THE BLAS (DDOT IN DOUBLE PRECISION) AND ARE
C                ACCUMULATED IN DOUBLE PRECISION.
C
C             2. WE DO NOT CHECK THAT M > 0. THE USER MUST ENSURE THIS.
C                N MAY BE 0. IF N < 0, IT IS TREATED AS 0.
C
C             3. THE VECTORS U AND S FROM THE ALGOL PROGRAM ARE
C                PASSED TO THE ROUTINE AS WORK VECTORS WRK1 AND WRK2.
C
C             4. THE GLOBAL VARIABLES THETA, OMEGA AND SIGMA ARE
C                EXPLAINED IN DESCRB. NORMALLY SIGMA SHOULD BE OF THE
C                ORDER OF ONE TENTH OF THE RELATIVE MACHINE PRECISION,
C                OMEGA MAY BE SET TO 0 AND THETA MAY BE 1.4. THESE
C                SPECIFIC RECOMMENDATIONS ARE BASED ON THE PRESENTATION
C                OF EXPERIMENT 1 IN THE LAST SECTION OF THE DANIEL
C                ET AL PAPER. FOR COMPLETE INFORMATION, SEE THE PAPER.
C
C             5. EXIT TO THE GLOBAL EXIT "FAIL" IN ALGOL IS
C                IMPLEMENTED BY SETTING IFAIL = 1 ON EXIT.
C                OTHERWISE, IFAIL = 0 .
C
C             6. SEE "QRFACT" FOR A DESCRIPTION OF IRQ.
C
C======================= D E C L A R A T I O N S =======================
C
      double precision DDOT, DNRM2, OMEGA, ONE, ONENEG, Q, RHO
      double precision RHO0, RHO1, SIGMA, SMALLR
      double precision T, THETA, TWO, V, WRK1
      double precision WRK2, ZERO
C
      integer          I, IFAIL, IRQ, J, K,
     &                 M, N
C
      dimension Q(IRQ, 1), V(M), SMALLR(1)
      dimension WRK1(M), WRK2(1)
C
      logical RESTAR, NULL
C
      common /MGREEK/ THETA,OMEGA,SIGMA
C
      data ZERO /0.D0/, ONE /1.D0/, TWO /2.D0/, ONENEG /-1.D0/
C========================= E X E C U T I O N ===========================
C
      THETA  = 1.4D0
C
      OMEGA  = 0.0D0
C
      SIGMA  = 1.11D-17
C
      RESTAR = .FALSE.
      NULL   = .FALSE.
      IFAIL  = 0
C
      if ( N .LE. 0 ) goto 2000
C
      do 1000 J = 1, N
      SMALLR(J) = ZERO
 1000 continue
C
 2000 continue
      RHO    = DNRM2(M,V,1)
      RHO0   = RHO
      K      = 0
C
C=======================================================================
C-----TAKE A GRAM-SCHMIDT ITERATION, IGNORING R ON LATER STEPS
C-----IF PREVIOUS V WAS NULL.
C=======================================================================
C
 3000 do 3100 I = 1, M
      WRK1(I) = ZERO
 3100 continue
C
      if ( N .LE. 0 ) goto 3400
C
      do 3300 J = 1, N
      T = DDOT(M,Q(1,J),1,V,1)
      WRK2(J) = T
      call DAXPY(M,T,Q(1,J),1,WRK1,1)
 3300 continue
C
 3400 continue
      if (.NOT. NULL .AND. N .GT. 0 ) call DAXPY(N,ONE,WRK2,1,SMALLR,1)
C
      call DAXPY(M,ONENEG,WRK1,1,V,1)
      RHO1   = DNRM2(M,V,1)
      T      = DNRM2(N,WRK2,1)
      K      = K + 1
C
      if ( M .NE. N ) goto 5000
C
C=======================================================================
C-----A SPECIAL CASE WHEN M = N.
C=======================================================================
C
      do 4100 I = 1, M
      V(I) = ZERO
 4100 continue
C
      RHO = ZERO
      goto 90000
C
C=======================================================================
C----TEST FOR NONTERMINATION.
C=======================================================================
C
 5000 if ( RHO0 + OMEGA * T .LT. THETA * RHO1 ) goto 6000
C
C-----EXIT IF TOO MANY ITERATIONS.
C
      if ( K .LE. 4 ) goto 5100
      IFAIL = 1
      goto 90000
C
C-----RESTART IF NECESSARY.
C
 5100 if ( RESTAR .OR. RHO1 .GT. RHO * SIGMA ) goto 5900
      RESTAR = .TRUE.
C
C-----FIND FIRST ROW OF MINIMAL LENGTH OF Q.
C
      do 5300 I = 1, M
      WRK1(I) = DDOT(N,Q(I,1),IRQ,Q(I,1),IRQ)
 5300 continue
C
      T = TWO
C
      do 5500 I = 1, M
      if ( WRK1(I) .GE. T ) goto 5500
      K = I
      T = WRK1(K)
 5500 continue
C
C-----TAKE CORRECT ACTION IF V IS NULL.
C
      if ( RHO1 .NE. ZERO ) goto 5700
      NULL = .TRUE.
      RHO1 = ONE
C
C-----REINITIALIZE V AND K.
C
 5700 do 5800 I = 1, M
 5800 V(I) = ZERO
C
      V(K) = RHO1
      K    = 0
C
C-----TAKE ANOTHER ITERATION.
C
 5900 RHO0 = RHO1
      goto 3000
C
C======================================================================
C-----NORMALIZE V AND TAKE THE STANDARD EXIT
C======================================================================
C
 6000 do 6100 I = 1, M
 6100 V(I) = V(I) / RHO1
C
      RHO = ZERO
      if ( .NOT. NULL ) RHO = RHO1
C
C=============================== E X I T ===============================
C
90000 return
C
      end
C
C***********************************************************************
C
      subroutine CRFLCT(X, Y, C, S)
C
C======================== D E S C R I P T I O N ========================
C
C     THIS SUBROUTINE COMPUTES PARAMETERS FOR THE GIVENS MATRIX G FOR
C     WHICH (X,Y)G = (Z,0). IT REPLACES (X,Y) BY (Z,0).
C
C======================= D E C L A R A T I O N S =======================
C
      double precision ARG, C, ONE, S, T
      double precision U, UDUM, UM, V, VDUM
      double precision X, Y, ZERO
C
      data ZERO /0.D0/, ONE /1.D0/
C
C========================= E X E C U T I O N ===========================
C
      U = X
      V = Y
C
      if ( V .NE. ZERO ) goto 1000
      C = ONE
      S = ZERO
      goto 90000
C
 1000 continue
      UM   = DMAX1(DABS(U), DABS(V))
      UDUM = U / UM
      VDUM = V / UM
      ARG  = UDUM * UDUM + VDUM * VDUM
      T    = UM * DSQRT(ARG)
C
      if ( U .LT. ZERO ) T = -T
C
      C = U / T
      S = V / T
      X = T
      Y = ZERO
C
C=============================== E X I T ===============================
C
90000 return
C
      end
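C
C     ADDED ILLUSTRATION (NOT PART OF THE ORIGINAL TOMS 580 CODE):
C     FOR THE INPUT PAIR (X,Y) = (3,4) THE ROUTINE RETURNS C = 0.6,
C     S = 0.8 AND REPLACES (X,Y) BY (5,0), SINCE
C
C        ( 3  4 ) * ( 0.6   0.8 ) = ( 5  0 ).
C                   ( 0.8  -0.6 )
C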
C
C***********************************************************************
C
      subroutine ARFLCT (C,S,IP,X,INCX,IDISX,Y,INCY,IDISY)
C
C======================== D E S C R I P T I O N ========================
C
C     THIS IS A FORTRAN IMPLEMENTATION OF THE ALGOL ROUTINE
C     "APPLYREFLECTOR" WRITTEN BY DANIEL ET AL.
C
C     THE CALLING SEQUENCE IS DIFFERENT, BUT THAT IS UNAVOIDABLE DUE
C     TO FUNDAMENTAL DIFFERENCES IN THE HANDLING OF PARAMETER
C     LISTS IN FORTRAN AND ALGOL. (SEE THE FOLLOWING PARAGRAPHS.)
C
C     THIS ROUTINE TAKES 2 VECTORS, CALLED X AND Y, AND REPLACES
C     THEM BY LINEAR COMBINATIONS
C           C * X + S * Y
C           S * X - C * Y.
C     THAT IS, IT APPLIES A GIVEN'S REFLECTION TO VECTORS X
C     AND Y. C AND S ARE COMPUTED IN "CRFLCT". THE NUMBER
C     OF ELEMENTS IN EACH OF X AND Y IS IP.
C
C     THE JENSEN DEVICE USED IN THE ALGOL PROCEDURE IS NO LONGER
C     RELEVANT. INSTEAD IT IS ASSUMED THAT ANY CALL WITH AN ACTUAL
C     PARAMETER WHICH IS AN ARRAY OR ARRAY ELEMENT WILL BE DONE BY
C     PASSING THE ADDRESS OF THE FIRST ELEMENT OF THE ARRAY OR
C     THE ADDRESS OF THE ARRAY ELEMENT.
C
C     IN "APPLYREFLECTOR" X AND Y WERE IN EFFECT ROWS OR COLUMNS
C     OF A SQUARE MATRIX. THE SAME WILL BE TRUE HERE, BUT THEY
C     MAY BE FROM THE TRIANGULAR MATRIX R AS DISCUSSED
C     IN THE ROUTINE "DESCRB".
C
C     THE PARAMETERS INCX AND IDISX ARE USED IN THE FOLLOWING WAY
C     (WITH SIMILAR USAGE FOR INCY AND IDISY):
C
C     THE PARAMETER X IS ASSUMED TO BE EQUIVALENT TO X(1).
C     THE SUBSCRIPT REFERENCE IS INITIALIZED TO I = 1 AND THE FIRST
C     SUBSCRIPT REFERENCE IS TO X(I) = X(1).
C     THE NEXT LOCATION REFERENCED IN THE ARRAY X IS X(I + IDISX).
C     THUS IDISX IS THE DISTANCE TO THE NEXT SUBSCRIPT NEEDED.
C     THEN I IS REPLACED BY I + IDISX.
C     THEN IDISX IS INCREMENTED BY INCX SO THAT THE DISTANCE TO
C     THE NEXT SUBSCRIPT NEEDED MAY BE DIFFERENT.
C     THE CYCLE THEN REPEATS, SO THAT THE CALL "...X,1,1,..." WILL
C     GET X(1),X(2),X(4),X(7),X(11),... AND THE CALL WITH
C     "...X,0,2,..." WILL GET X(1),X(3),X(5),... .
C     THIS IS EXACTLY WHAT IS NEEDED TO HANDLE THE TRIANGULAR ARRAYS.
C
C======================= D E C L A R A T I O N S =======================
C
      double precision C, ONE, S, T, U
      double precision UN, V, X, Y
C
      integer          IDISX, IDISY, INCVXT, INCVYT, INCX,
     &                 INCY, IP, JX, JY, K
C
      dimension X(1), Y(1)
C
      data ONE /1.D0/
C
C========================= E X E C U T I O N ===========================
C
      if ( IP .LE. 0 ) goto 90000
      UN = S / ( ONE + C )
      JX = 1
      JY = 1
      INCVXT = IDISX
      INCVYT = IDISY
C
      do 1000 K = 1, IP
      U = X(JX)
      V = Y(JY)
      T = U * C + V * S
      X(JX) = T
      Y(JY) = ( T + U ) * UN - V
      JX = JX + INCVXT
      JY = JY + INCVYT
      INCVXT = INCVXT + INCX
      INCVYT = INCVYT + INCY
 1000 continue
C
C=============================== E X I T ===============================
C
90000 return
C
      end
C
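The listing is complete apart from the user-supplied routines F and JACOBI. As an illustration only (this sketch is not part of the original code), these routines and a small driver might be written as follows for the hypothetical two-dimensional test system f1(x) = x1^2 + x2^2 - 1, f2(x) = x1 - x2; the starting point, the iteration limit MAX and the tolerance EPSF are chosen arbitrarily.

C     EXAMPLE (ADDED FOR ILLUSTRATION, NOT PART OF THE ORIGINAL CODE)
C     USER-SUPPLIED FUNCTION F FOR THE HYPOTHETICAL TEST SYSTEM
      double precision function F(N,I,X)
      integer N,I
      double precision X(N)
      if (I .EQ. 1) then
         F = X(1)**2 + X(2)**2 - 1.0D0
      else
         F = X(1) - X(2)
      endif
      return
      end
C
C     USER-SUPPLIED ROUTINE RETURNING THE ITH ROW OF THE JACOBIAN
      subroutine JACOBI(N,I,X,JR)
      integer N,I
      double precision X(N),JR(N)
      if (I .EQ. 1) then
         JR(1) = 2.0D0*X(1)
         JR(2) = 2.0D0*X(2)
      else
         JR(1) =  1.0D0
         JR(2) = -1.0D0
      endif
      return
      end
C
C     DRIVER: ARRAY SIZES FOLLOW THE HEADER OF QNABS (R NEEDS AT
C     LEAST N*(N+1)/2 ELEMENTS); STARTING POINT AND TOLERANCE ARE
C     ARBITRARY
      program DRIVER
      integer N,MAX,ITNO,K1
      parameter (N=2)
      double precision EPSF,FAM,XAM
      double precision X(N),Y(N),FX(N),W(N),JR(N),A(N,N),Q(N,N),
     &                 R(N*(N+1)/2),WW(N),WRK1(N),WRK2(N),WRK3(N),
     &                 WRK4(N)
      X(1) = 0.8D0
      X(2) = 0.5D0
      MAX  = 50
      EPSF = 1.0D-10
      call QNABS(N,MAX,EPSF,FAM,XAM,ITNO,K1,X,Y,FX,W,JR,A,Q,R,WW,
     &           WRK1,WRK2,WRK3,WRK4)
      write(*,*) 'ITNO =',ITNO,'  K1 =',K1,'  FAM =',FAM
      write(*,*) 'X =',X(1),X(2)
      end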
Chapter 8
REFERENCES
1. Abaffy, J.: 1979, ‘A Lineáris Egyenletrendszerek Általános Megoldásának Egy Direkt Módszerosztálya’. Alkalmazott Matematikai Lapok 5, 233—240.
2. Abaffy, J.: 1988, ‘A Superlinear Convergency Theorem in the ABSg Class of Algorithms
for Nonlinear Algebraic Equations’. JOTA 59(1), 39—43.
3. Abaffy, J.: 1991, ‘ABS Algorithms for Sparse Linear Systems’. In: E. Spedicato (ed.):
Computer Algorithms for Solving Linear Algebraic Equations: The State of the Art. pp.
111—131.
4. Abaffy, J.: 1992, ‘Superlinear Convergence Theorems for Newton-Type Methods for Nonlinear Systems of Equations’. JOTA 73(2), 269—277.
5. Abaffy, J., C. Broyden, and E. Spedicato: 1984, ‘A class of direct methods for linear
equations’. Numerische Mathematik 45, 361—376.
6. Abaffy, J. and A. Galántai: 1987, ‘Conjugate Direction Methods for Linear and Nonlinear
Systems of Algebraic Equations’. In: P. Rózsa (ed.): Colloquia Mathematica Soc. János
Bolyai, 50. Numerical Methods, Miskolc (Hungary) 1986. Amsterdam, pp. 481—502.
7. Abaffy, J., A. Galántai, and E. Spedicato: 1987a, ‘Applications of ABS Class to Unconstrained Function Minimization’. Technical report, Istituto Universitario di Bergamo.
Quaderni del Dipartimento di Matematica, Statistica e Informazioni e Applicazioni, 198714.
8. Abaffy, J., A. Galántai, and E. Spedicato: 1987b, ‘The local convergence of ABS methods
for nonlinear algebraic equations’. Numerische Mathematik 51, 429—439.
9. Abaffy, J. and E. Spedicato: 1989, ABS-Projection Algorithms: Mathematical Techniques
for Linear and Nonlinear Algebraic Equations. Chichester: Ellis Horwood.
10. Abaffy, J. and E. Spedicato: 1990, ‘A Class of Scaled Direct Methods for Linear Systems’.
Ann. Inst. Stat. 42(1), 187—201.
11. Abaffy, J. and E. Spedicato: 1991, ABS Projection Algorithms: Mathematical Techniques
for Linear and Nonlinear Equations. Beijing Polytechnic University Press. Chinese.
12. Abaffy, J. and E. Spedicato: 1996, Matematicheskie Metodi Dlja Lineinikh I Nelineinikh
Uravnenii; Proekcionnie ABS-Algoritmi. Moscow: Mir. Russian.
13. Achieser, N. and I. Glasmann: 1954, Theorie der Linearen Operatoren im Hilbert Raum.
Berlin: Akademie-Verlag.
14. Aird, T. and R. Lynch: 1975, ‘Computable Accurate Upper and Lower Error Bounds for
Approximate Solutions of Linear Algebraic Systems’. ACM TOMS 1, 217—231.
15. Allgower, E., K. Böhmer, F. Potra, and W. Rheinboldt: 1986, ‘A Mesh-Independence Principle
for Operator Equations and Their Discretizations’. SIAM J. Numer. Anal. 23, 160—169.
16. Aronszajn, N.: 1950, ‘Theory of reproducing kernels’. Trans. Amer. Math. Soc. 68, 337—404.
17. Auchmuty, G.: 1992, ‘A Posteriori Error Estimates for Linear Equations’. Numerische Mathematik 61, 1—6.
18. Ballalij, M. and H. Sadok: 1998, ‘New Interpretation of Related Huang’s Methods’. Linear Algebra and its Applications 269, 183—195.
19. Barrlund, A.: 1991, ‘Perturbation Bounds for the LDLH and LU Decompositions’. BIT
31, 358—363.
20. Bauschke, H. and J. Borwein: 1996, ‘On projection algorithms for solving convex feasibility
problems’. SIAM Review 38(3), 367—426.
21. Ben-Israel, A.: 1967, ‘On the geometry of subspaces in Euclidean n-spaces’. SIAM J. Appl. Math. 15(5), 1184—1198.
22. Benzi, M. and C. Meyer: 1995, ‘A direct projection method for sparse linear systems’.
SIAM J. Sci. Comput. 16(5), 1159—1176.
23. Berman, A. and R. Plemmons: 1994, Nonnegative Matrices in the Mathematical Sciences.
SIAM.
24. Bertocchi, M. and M. Spedicato: 1990, ‘Block ABS Algorithms for Dense Linear Systems in
a Vector Processor Environment’. In: D. Laforenza and R. Perego (eds.): Supercomputing
Tools for Science and Engineering. Milano, pp. 39—46.
25. Bhatia, R.: 1994, ‘Matrix Factorizations and their Perturbations’. Linear Algebra and its
Applications 197-198, 245—276.
26. Björck, A. and G. Golub: 1973, ‘Numerical methods for computing angles between linear
subspaces’. Mathematics of Computation 27(123), 579—594.
27. Bjørstad, P. and O. Widlund: 1986, ‘Iterative Methods for the Solution of Elliptic Problems
on Regions Partitioned Into Substructures’. SIAM J. Numer. Anal. 23, 1097—1120.
28. Bodon, E.: 2001, ‘On the Block Implicit LU Algorithm for Linear Systems of Equations’.
Mathematical Notes, Miskolc 2(1), 11—29.
29. Bosznay, A. and B. Garay: 1986, ‘On norms of projections’. Acta Sci. Math. 50(1—2),
87—92.
30. Brezinski, C.: 1997, Projection Methods for Systems of Equations. Elsevier.
31. Brezinski, C.: 1999, ‘Error Estimates in the Solution of Linear Systems’. SIAM J. Sci.
Comput. 21, 764—781.
32. Brezinski, C., M. Morandi Cecchi, and M. Redivo-Zaglia: 1994, ‘The Reverse Bordering
Method’. SIAM J. Matrix. Anal. Appl. 15, 922—937.
33. Brown, K.: 1969, ‘A Quadratically Convergent Newton-Like Method Based Upon
Gaussian-Elimination’. SIAM J. Numer. Anal. 6, 560—569.
34. Broyden, C.: 1965, ‘A Class of Methods for Solving Nonlinear Simultaneous Equations’.
Mathematics of Computation 19, 557—593.
35. Broyden, C.: 1973, ‘Some Condition-Number Bounds for the Gaussian Elimination Process’. J. Inst. Maths. Applics 12, 273—286.
36. Broyden, C.: 1974, ‘Error Propagation in Numerical Processes’. Journal of the Institute
of Mathematics and Its Applications 14, 131—140.
37. Broyden, C.: 1985, ‘On the numerical stability of Huang’s and related methods’. JOTA
47, 401—412.
38. Broyden, C.: 1989, ‘On the Numerical Stability of Huang’s Update’. Calcolo 26, 303—311.
39. Broyden, C.: 1997, ‘The Gram-Schmidt Method - a Hierarchy of Algorithms’. In: A. Sydow
(ed.): 15th IMACS World Congress on Scientific Computation, Modelling and Applied
Mathematics, Vol. 2, Numerical Mathematics. Berlin, pp. 545—550.
40. Broyden, C., J. Dennis, and J. Moré: 1973, ‘On the Local and Superlinear Convergence
of Quasi-Newton Methods’. Journal of the Institute of Mathematics and Applications 12,
223—245.
41. Bunch, J. and B. Parlett: 1971, ‘Direct Methods for Solving Symmetric Indefinite Systems
of Linear Equations’. SIAM J. Numer. Anal. 8, 639—655.
42. Carlson, D.: 1975, ‘Matrix Decompositions Involving the Schur Complement’. SIAM J. Appl. Math. 28(3), 577—587.
43. Censor, Y.: 1981, ‘Row-action methods for huge and sparse systems and their applications’.
SIAM Review 23(4), 444—466.
44. Censor, Y.: 1982, ‘Cyclic subgradient projections’. Mathematical Programming 24, 233—
235.
45. Censor, Y. and S. Zenios: 1997, Parallel Optimization: Theory, Algorithms and Applications. Oxford University Press.
46. Chang, X.-W. and C. Paige: 1998a, ‘On the Sensitivity of the LU Factorization’. BIT 38,
486—501.
47. Chang, X.-W. and C. Paige: 1998b, ‘Sensitivity Analyses for Factorizations of Sparse or
Structured Matrices’. Linear Algebra and its Applications 284, 53—71.
48. Chang, X.-W., C. Paige, and G. Stewart: 1996, ‘New Perturbation Analyses for the
Cholesky Factorization’. IMA J. Numer. Anal. 16, 457—484.
49. Chen, Z., N. Deng, and E. Spedicato: 2000, ‘Truncated Difference ABS-Type Methods for
Nonlinear Systems of Equations’. Optimization Methods and Software 13, 263—274.
50. Chu, M., R. Funderlic, and G. Golub: 1995, ‘A Rank-One Reduction Formula and its
Applications to Matrix Factorizations’. SIAM Review 37, 512—530.
51. Chu, M., R. Funderlic, and G. Golub: 1998, ‘Rank Modifications of Semidefinite Matrices
Associated with a Secant Update Formula’. SIAM J. Matrix. Anal. Appl. 20, 428—436.
52. Cline, A., C. Moler, G. Stewart, and J. Wilkinson: 1979, ‘An Estimate for the Condition
Number of a Matrix’. SIAM J. Numer. Anal. 16, 368—375.
53. Cline, R. and R. Funderlic: 1979, ‘The rank of a difference of matrices and associated
generalized inverses’. Linear Algebra and its Applications 24, 185—215.
54. Coleman, T. and P. Fenyes: 1992, ‘Partitioned Quasi-Newton Methods for Nonlinear Equality Constrained Optimization’. Mathematical Programming 53, 17—44.
55. Cottle, R.: 1974, ‘Manifestations of the Schur Complement’. Lin. Alg. Appl. 8, 189—211.
56. Cox, M.: 1990, ‘The Least-Squares Solution of Linear Equations with Block-Angular Observation Matrix’. In: M. Cox and S. Hammarling (eds.): Reliable Numerical Computation.
Oxford, pp. 227—240.
57. Dahlquist, G., S. Eisenstat, and G. Golub: 1972, ‘Bounds for the Error of Linear Systems
of Equations Using the Theory of Moments’. J. Math. Anal. Appl. 37, 151—166.
58. Davis, P.: 1975, Interpolation and Approximation. New York: Dover.
59. Dembo, R., S. Eisenstat, and T. Steihaug: 1982, ‘Inexact Newton Methods’. SIAM J. Num. Anal. 19, 400—408.
60. Demmel, J.: 1983, ‘The Condition Number of Equivalence Transformations That Diagonalize Matrix Pencils’. SIAM J. Numer. Anal. 20, 599—610.
61. Demmel, J.: 1988, ‘The Probability That a Numerical Analysis Problem is Difficult’. Math.
Comp. 50, 449—480.
62. Deng, N. and Z. Chen: 1990, ‘Truncated Nonlinear ABS Algorithm and Its Convergence
Property’. Computing 45, 169—173.
63. Deng, N. and Z. Chen: 1998, ‘A New Nonlinear ABS-Type Algorithm and its Efficiency
Analysis’. Optimization Methods and Software 10, 71—85.
64. Deng, N., E. Spedicato, and M. Zhu: 1994, ‘A Local Convergence Theorem for Nonlinear
ABS Algorithms’. Comp. Appl. Math. 13(1), 49—59.
65. Deng, N. and M. Zhu: 1992a, ‘Further Results on the Local Convergence of the Nonlinear
ABS Algorithms’. Optimization Methods and Software 1, 211—221.
66. Deng, N. and M. Zhu: 1992b, ‘The Local Convergence Property of the Nonlinear VoevodinABS Algorithm’. Optimization Methods and Software 1, 223—231.
67. Dennis, J. and R. Schnabel: 1983, Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Englewood Cliffs, New Jersey: Prentice-Hall, Inc.
68. Deutsch, E.: 1970, ‘Matricial Norms’. Numerische Mathematik 16, 73—84.
69. Deutsch, F.: 1984, ‘Rate of convergence of the method of alternating projections’. In: International Series of Numerical Mathematics, Vol. 72. Basel, pp. 96—107.
70. Deutsch, F.: 1992, ‘The Method of Alternating Orthogonal Projections’. In: S. P. Singh
(ed.): Approximation Theory, Spline Functions and Applications. pp. 105—121.
71. Deutsch, F.: 1995, ‘The angle between subspaces of a Hilbert space’. In: S. Singh (ed.):
Approximation Theory, Wavelets and Applications. pp. 107—130.
72. Deutsch, F. and H. Hundal: 1997, ‘The rate of convergence for the method of alternating
projections. II’. Journal of Mathematical Analysis and Applications 205, 381—405.
73. Diniz-Ehrhardt, M. and J. Martínez: 1996, ‘Successive projection methods for the solution
of overdetermined nonlinear systems’. In: G. Di Pillo and F. Gianessi (eds.): Nonlinear
Optimization and Applications. New York, pp. 75—84.
74. Drmač, Z., M. Omladič, and K. Veselić: 1994, ‘On the Perturbation of the Cholesky
Factorization’. SIAM J. Matrix. Anal. Appl 15, 1319—1332.
75. Edelman, A.: 1988, ‘Eigenvalues and Condition Numbers of Random Matrices’. SIAM J.
Matrix. Anal. Appl. 9, 543—560.
76. Egerváry, E.: 1960a, ‘On Rank-Diminishing Operations and their Applications to the
Solution of Linear Equations’. ZAMP 11, 376—386.
77. Egerváry, J.: 1956, ‘Régi És Új Módszerek Lineáris Egyenletrendszerek Megoldására’. A
Magyar Tudományos Akadémia Matematikai Kutató Intézetének Közleményei 1, 109—123.
In Hungarian.
78. Egerváry, J.: 1960b, ‘Über eine Methode Zur Numerischen Lösung der Poissonschen Differenzengleichung Für Beliebige Gebiete’. Acta Mathematica Academiae Scientiarium
Hungaricae 11, 341—361.
79. Elsner, L. and P. Rózsa: 1981, ‘On Eigenvectors and Adjoints of ModiÞed Matrices’. Linear
and Multilinear Algebra 10, 235—247.
80. Faddejew, D. and W. Faddejewa: 1973, Numerischen Methoden der Linearen Algebra.
Berlin: VEB Deutscher Verlag der Wissenschaften.
81. Fegyverneki, S.: 1995, ‘Some New Estimators for Robust Nonlinear Regression’. Publ. Univ. of Miskolc, Series D. Natural Sciences, Mathematics 36(1), 23—32.
82. Feng, D. and R. Schnabel: 1993, ‘Globally Convergent Parallel Algorithms for Solving
Block Bordered Systems of Nonlinear Equations’. Optimization Methods and Software 2,
269—295.
83. Feng, E., X. Wang, and L. Wang: 1997, ‘On the Application of the ABS Algorithm to
Linear Programming and Linear Complementarity’. Optimization Methods and Software
8, 133—142.
84. Fiedler, M. and V. Pták: 1962, ‘On Matrices with Nonpositive Off-Diagonal Elements and
Positive Principal Minors’. Czech. Math. J. 12, 619—633.
85. Fletcher, R.: 1997, ‘Dense Factors of Sparse Matrices’. In: M. Buhman and A. Iserles
(eds.): Approximation Theory and Optimization. pp. 145—166.
86. Franchetti, C. and W. Light: 1986, ‘On the von Neumann alternating algorithm in Hilbert
space’. Journal of Mathematical Analysis and Applications 114, 305—314.
87. Frazer, A., J. Duncan, and R. Collar: 1938, Elementary Matrices. Cambridge University
Press.
88. Frommer, A.: 1986a, ‘Comparison of Brown’s and Newton’s Method in the Monotone
Case’. Numerische Mathematik 52, 511—521.
89. Frommer, A.: 1986b, ‘Monotonie und Einschließung Beim Brown-Verfahren’. Ph.D. thesis,
Inst. für Angewandte Mathematik, University of Karlsruhe.
90. Frommer, A.: 1989, ‘Implementing Brown’s Method for Systems of Nonlinear Equations
with Dense Banded Jacobian’. Computing 41, 123—129.
91. Frommer, A.: 1992a, ‘Verallgemeinerte Diagonaldominanz Bei Nichtlinearen Funktionen.’.
ZAMM 72(9), 431—444.
92. Frommer, A.: 1992b, ‘Verallgemeinerte Diagonaldominanz Bei Nichtlinearen Funktionen
II: Anwendung Auf Asynchrone Iterationsverahren und Beispiele’. ZAMM 72(11), 591—
605.
93. Galántai, A., ‘Perturbation Bounds for Triangular and Full Rank Factorizations’. Computers and Mathematics with Applications. submitted.
94. Galántai, A., Projectors and Projection Methods. Kluwer. under publication.
95. Galántai, A., ‘Rank Reduction: Theory and Applications’. Int. Journal of Mathematics,
Game Theory and Algebra. to appear.
96. Galántai, A.: 1990, ‘Block ABS Methods for Nonlinear Systems of Algebraic Equations’.
In: E. Allgower and K. Georg (eds.): Computational Solution of Nonlinear Systems of
Equations (AMS-SIAM Summer Seminar on the Computational Solution of Nonlinear Systems of Equations, Fort Collins, Colorado, 1988), Lectures in Applied Mathematics, 26.
Providence, RI, pp. 181—187.
97. Galántai, A.: 1991a, ‘Analysis of error propagation in the ABS class for linear systems’.
Ann. Inst. Statist. Math. 43, 597—603.
98. Galántai, A.: 1991b, ‘Convergence Theorems for the Nonlinear ABS-Methods’. In: D.
Greenspan and P. Rózsa (eds.): Colloquia Mathematica Soc. János Bolyai, 59. Numerical
Methods, Miskolc (Hungary) 1990. Amsterdam, pp. 65—79.
99. Galántai, A.: 1993a, ‘ABS Methods on Large Nonlinear Systems with Banded Jacobian
Structure’. Technical report, University of Bergamo. Quaderni DMSIA, 1993/14.
100. Galántai, A.: 1993b, ‘Generalized Implicit LU Algorithms in the Class of ABS Methods
for Linear and Nonlinear Systems of Algebraic Equations’. Technical report, University of
Bergamo. Quaderni DMSIA, 1993/5,.
101. Galántai, A.: 1993c, ‘Testing of Implicit LU ABS Methods on Large Nonlinear Systems
with Banded Jacobian’. Technical report, University of Bergamo. Quaderni DMSIA,
1993/19.
102. Galántai, A.: 1994a, ‘ABS Methods for the Solution of Linear and Nonlinear Systems of
Algebraic Equations’. Habilitation dissertation, Technical University of Budapest.
103. Galántai, A.: 1994b, ‘A Monotonicity Property of M -Matrices’. Publ. Math. Debrecen
44(3-4), 265—268.
104. Galántai, A.: 1995, ‘The Global Convergence of the ABS Methods for a Class of Nonlinear
Problems’. Optimization Methods and Software 4, 283—295.
105. Galántai, A.: 1997a, ‘Hybrid Quasi-Newton ABS Methods for Nonlinear Equations’.
Technical report, University of Bergamo. Quaderni DMSIA, 1997/11.
106. Galántai, A.: 1997b, ‘Hybrid Quasi-Newton ABS Methods for Nonlinear Equations’. In:
F. Cobos, J. Gómez, and F. Mateos (eds.): EAMA97 Meeting on Matrix Analysis and
Applications, Sevilla, September 10-12, 1997. pp. 221—228.
107. Galántai, A.: 1997c, ‘The Local Convergence of Quasi-Newton ABS Methods’. Publ. Univ. of Miskolc, Series D. Natural Sciences 37, Mathematics, 31—38.
108. Galántai, A.: 1997-99, ‘Rank Reduction and Conjugation’. Acta Technica Acad. Sci.
Hung. 108(1—2), 107—130. appeared also in Mathematical Notes, Miskolc, Vol.1, No.1,
(2000) 11—33.
109. Galántai, A.: 1998, ‘A Note on Projector Norms’. Publ. Univ. of Miskolc, Series D.
Natural Sciences 38., Mathematics, 41—49.
110. Galántai, A.: 1999a, ‘The Geometry of LU Decomposability’. Publ. Univ. of Miskolc,
Series D. Natural Sciences 40, Mathematics, 21—24.
111. Galántai, A.: 1999b, ‘Parallel ABS Projection Methods for Linear and Nonlinear Systems
with Block Arrowhead Structure’. Computers and Mathematics with Applications 38,
11—17.
112. Galántai, A.: 2000, ‘Componentwise Perturbation Bounds for the LU , LDU and LDLT
Decompositions’. Mathematical Notes, Miskolc 1, 109—118.
113. Galántai, A.: 2001a, ‘Rank Reduction and Bordered Inversion’. Mathematical Notes, Miskolc 2(2), 117—126.
114. Galántai, A.: 2001b, ‘Rank Reduction, Factorization and Conjugation’. Linear and Multilinear Algebra 49, 195—207.
115. Galántai, A.: 2001c, ‘A Study of Auchmuty’s Error Estimate’. Computers and Mathematics with Applications 42, 1093—1102.
116. Galántai, A.: 2002, ‘Egerváry Rangszámcsökkentő Algoritmusa És Alkalmazásai’. Szigma
33(1—2), 45—55.
117. Galántai, A.: 2003, ‘Perturbations of Triangular Matrix Factorizations’. Linear and Multilinear Algebra 51, 175—198.
118. Galántai, A. and A. Jeney: 1993, ‘Quasi-Newton ABS Methods’. In: microCAD 93SYSTEM’93, Nemzetközi Számitástechnikai Találkozó, Miskolc, 1993, Március 2-6, M
Szekció: Modern Numerikus Módszerek. pp. 63—68.
119. Galántai, A. and A. Jeney: 1996, ‘Quasi-Newton ABS Methods for Solving Nonlinear
Algebraic Systems of Equations’. JOTA 89(3), 561—573.
120. Galántai, A., A. Jeney, and E. Spedicato: 1993a, ‘A FORTRAN Code of a Quasi-Newton
Algorithm’. Technical report, University of Bergamo. Quaderni DMSIA, 1993/7.
121. Galántai, A., A. Jeney, and E. Spedicato: 1993b, ‘Testing of ABS-Huang Methods on
Moderately Large Nonlinear Systems’. Technical report, University of Bergamo. Quaderni
DMSIA, 1993/6.
122. Gantmacher, F.: 1959, The Theory of Matrices, Vol. I-II. New York: Chelsea.
123. Gáti, A.: 2003, ‘Algoritmusok Automatikus Hibaanalízise’. University of Miskolc. thesis,
Hungarian.
124. Gay, D.: 1975, ‘Brown’s Method and some Generalizations with Application to Minimization Problems’. Technical report, Cornell University. Comput. Sci. Techn. Rep.,
75-225.
125. Ge, R.: 1997, ‘A Family of Broyden-ABS Type Methods for Solving Nonlinear Systems’.
Technical report, University of Bergamo. Quaderni DMSIA, 1997/1.
126. Gill, L. and A. Lewbel: 1992, ‘Testing the Rank and Definiteness of Estimated Matrices
with Applications to Factor, State-Space and ARMA Models’. Journal of the American
Statistical Association 87(419), 766—776.
127. Gill, P., W. Murray, M. Saunders, and M. Wright: 1991, ‘Inertia-Controlling Methods for
General Quadratic Programming’. SIAM Review 33, 1—36.
128. Golub, G. and C. Van Loan: 1983, Matrix Computations. Baltimore: The Johns Hopkins
University Press.
129. Gould, N.: 1985, ‘On Practical Conditions for the Existence and Uniqueness of Solutions
to the General Equality Quadratic Programming Problem’. Mathematical Programming
32, 90—99.
130. Greenbaum, A. and H. Rodrigue: 1989, ‘Optimal Preconditioners of a Given Sparsity
Pattern’. BIT 29, 610—634.
131. Greub, W. and W. Rheinboldt: 1959, ‘On a Generalization of an Inequality of L.V.
Kantorovich’. Proc. Amer. Math. Soc. 10, 407—415.
132. Gustafson, K. and D. Rao: 1996, Numerical Range: The Field of Values of Linear Operators and Matrices. Springer.
133. Guttman, L.: 1944, ‘General Theory and Methods for Matric Factoring’. Psychometrika 9, 1—16.
134. Guttman, L.: 1946, ‘Enlargement methods for computing the inverse matrix’. Ann. Math. Statist. 17, 336—343.
135. Guttman, L.: 1957, ‘A Necessary and Sufficient Formula for Matrix Factoring’. Psychometrika 22, 79—91.
136. Halperin, I.: 1962, ‘The product of projection operators’. Acta Sci. Math. 23, 96—99.
137. Hamaker, C. and D. Solmon: 1978, ‘The angles between the null spaces of X rays’. Journal
of Mathematical Analysis and Applications 62, 1—23.
138. Hanson, R. and C. Lawson: 1969, ‘Extensions and applications of the Householder algorithm for solving linear least squares problems’. Mathematics of Computation 23, 787—812.
139. Hartwig, R.: 1980, ‘How to Partially Order Regular Elements?’. Mathematica Japonica
25, 1—13.
140. Hartwig, R. and G. Styan: 1987, ‘Partially ordered idempotent matrices’. In: T. Pukkila
and S. Puntanen (eds.): Proc. Second International Tampere Conference in Statistics. pp.
361—383.
141. Hegedűs, C.: 1991, ‘Generating Conjugate Directions for Arbitrary Matrices by Matrix
Equations, I-II.’. Comp. Math. Applic. 21, 71—94.
142. Henrici, P.: 1961, ‘Two Remarks on the Kantorovich Inequality’. Am. Math. Monthly
68, 904—906.
143. Higham, N.: 1996, Accuracy and Stability of Numerical Algorithms. Philadelphia: SIAM.
144. Horn, R. and C. Johnson: 1985, Matrix Analysis. Cambridge University Press.
145. Horn, R. and C. Johnson: 1991, Topics in Matrix Analysis. Cambridge University Press.
146. Householder, A.: 1964, The Theory of Matrices in Numerical Analysis. Blaisdell Publishing Company.
147. Hoyer, W.: 1987, ‘Quadratically Convergent Decomposition Algorithms for Nonlinear
Systems of Equations with Special Structure’. Technical Report 07-23-87, Sektion Mathematik, Technical Universitat, Dresden.
148. Hoyer, W. and J. Schmidt: 1981, ‘Newton-Type Decomposition Methods for Equations
Arising in Network Analysis’. ZAMM 64(9), 397—405.
149. Huang, H.: 1975, ‘A Direct Method for the General Solution of a System of Linear
Equations’. JOTA 16, 429—445.
150. Huang, Z.: 1991a, ‘Modified ABS Methods for Nonlinear Systems Without Evaluating
Jacobian Matrices’. Numer. Math. J. Chinese Univ. 13(1), 60—71.
151. Huang, Z.: 1991b, ‘Multi-Step Nonlinear ABS Methods and Their Efficiency Analysis’.
Computing 46, 143—153.
152. Huang, Z.: 1992a, ‘ABS Methods for Solving Nonlinear Least Squares Problems’. Communications on Applied Mathematics and Computation 6, 75—85.
153. Huang, Z.: 1992b, ‘Convergence Analysis of the Nonlinear Block Scaled ABS Methods’.
JOTA 75(2), 331—344.
154. Huang, Z.: 1992c, ‘A Family of Discrete ABS Type Algorithms for Solving Systems of
Nonlinear Equations’. Numerical Mathematics, a Journal of Chinese Universities 14,
130—143.
155. Huang, Z.: 1993a, ‘On the Convergence Analysis of the Nonlinear ABS Methods’. Chinese
Annals of Mathematics 14B, 213—224.
156. Huang, Z.: 1993b, ‘Restart Row Update ABS Methods for Systems of Nonlinear Equations’. Computing 50, 229—239.
157. Huang, Z.: 1993c, ‘Row Update ABS Methods for Solving Sparse Nonlinear Systems of
Equations’. Optimization Methods and Software 2, 297—309.
158. Huang, Z.: 1994a, ‘A New Method for Solving Nonlinear Underdetermined Systems’. Computational and Applied Mathematics 1, 33—48.
159. Huang, Z.: 1994b, ‘Row Update ABS Methods for Systems of Nonlinear Equations’.
Optimization Methods and Software 3, 41—60.
160. Hubert, L., J. Meulman, and W. Heiser: 2000, ‘Two Purposes for Matrix Factorization:
A Historical Appraisal’. SIAM Review 42, 68—82.
161. Ikebe, Y., T. Inagaki, and S. Miyamoto: 1987, ‘The monotonicity theorem, Cauchy’s
interlace theorem, and the Courant-Fischer theorem’. Am. Math. Monthly 94, 352—354.
162. Jain, S. and L. Snyder: 1987, ‘New Classes of Monotone Matrices’. In: F. Uhlig and R.
Grone (eds.): Current Trends in Matrix Theory. New York, pp. 155—166.
163. Jeney, A.: 1991a, ‘Nemlineáris Egyenletrendszerek Megoldására Szolgáló ABS-Módszerek
Egy Részosztályának Diszkretizációja’. Alkalmazott Matematikai Lapok 15 (1990-91)(3—
4), 365—379.
164. Jeney, A.: 1991b, ‘Nemlineáris Egyenletrendszerek Megoldására Szolgáló ABS-Módszerek
Numerikus Vizsgálata’. Alkalmazott Matematikai Lapok 15 (1990-91)(3—4), 353—364.
165. Jeney, A.: 1992, ‘Diszkrét ABS-Módszerek Nemlineáris Egyenletrendszerekre’.
In:
Micro-CAD-SYSTEM’ 92 Nemzetközi Számitástechnikai Találkozó, 1992. Február 25-29,
Számítástechnika Műszaki Alkalmazása Konferencia Előadásai, II. Miskolc, pp. 107—110.
166. Jeney, A.: 1995, ‘The Convergence of Discrete ABS Methods and Numerical Investigations’. In: microCAD’95 International Computer Science Conference, February 23,1995,
Miskolc, Section K: Modern Numerical Methods. pp. 22—25.
167. Jeney, A.: 1996, ‘Numerical Comparison of Multistep Quasi-Newton ABS Methods with
the Multistep Broyden Method for Solving Systems of Nonlinear Equations’. Publ. Univ.
of Miskolc, Series D, Natural Sciences 36(2, Mathematics), 47—54.
168. Jeney, A.: 1997, ‘Multistep Discrete ABS Methods for Solving Systems of Nonlinear
Equations’. Publ. Univ. of Miskolc, Series D. Natural Sciences 37, Mathematics(39—
46).
169. Kaczmarz, S.: 1937, ‘Angenäherte Auflösung von Systemen linearer Gleichungen’. Bulletin International de l’Académie Polonaise des Sciences et des Lettres. Classe des Sciences
Mathématiques et Naturelles. Série A. Sciences Mathématiques pp. 355—357.
170. Kanzow, C. and H. Kleinmichel: 1995, ‘A Class of Newton-Type Methods for Equality and
Inequality Constrained Optimization’. Optimization Methods and Software 5, 173—198.
171. Kayalar, S. and H. Weinert: 1988, ‘Error bounds for the method of alternating projections’. Math. Control Signals Systems 1, 43—59.
172. Kelley, C.: 1990, ‘Operator Prolongation Methods for Nonlinear Equations’. In: E.
Allgower and K. Georg (eds.): Computational Solution of Nonlinear Systems of Equations,
Lectures in Applied Mathematics 26. Providence, Rhode Island, pp. 359—388.
173. Krasnosel’skii, M., G. Vainikko, P. Zabreiko, Y. Rutitskii, and V. Stetsenko: 1972, Approximate Solution of Operator Equations. Groningen: Wolters-Nordhoff.
174. Liu, C.: 1995, ‘An acceleration scheme for row projection methods’. Journal of Computational and Applied Mathematics 57, 363—391.
175. Mangasarian, O.: 1976, ‘Equivalence of the Complementarity Problem to a System of
Nonlinear Equations’. SIAM J. Appl. Math. 31, 89—92.
176. Marcus, M. and M. Minc: 1992, A Survey of Matrix Theory and Matrix Inequalities. New
York: Dover Publications, Inc.
177. Marsaglia, G. and G. Styan: 1974, ‘Equalities and Inequalities for Ranks of Matrices’.
Linear and Multilinear Algebra 2, 269—292.
178. Marshall, A. and I. Olkin: 1973, ‘Norms and inequalities for condition numbers, III’.
Linear Algebra and its Applications 7, 291—300.
179. Marshall, A. and I. Olkin: 1979, Inequalities: Theory of Majorization and Its Applications.
New York: Academic Press.
180. Martínez, J.: 1985, ‘The projection method for solving nonlinear systems of equations
under the "most violated constraint" control’. Comp. and Maths. with Appls. 11(10),
987—993.
181. Martínez, J.: 1986a, ‘The method of successive orthogonal projections for solving nonlinear simultaneous equations’. Calcolo 23, 93—104.
182. Martínez, J.: 1986b, ‘Solution of nonlinear systems of equations by an optimal projection
method’. Computing 37, 59—70.
183. Martinez, J.: 1994, ‘Algorithms for Solving Nonlinear Systems of Equations’. In: E.
Spedicato (ed.): Algorithms for Continuous Optimization: The State of the Art. pp. 81—
108.
184. McCormick, S.: 1975, ‘An iterative procedure for the solution of constrained nonlinear
equations with application to optimization problems’. Numerische Mathematik 23, 371—
385.
185. McCormick, S.: 1977, ‘The methods of Kaczmarz and row orthogonalization for solving linear equations and least squares problems in Hilbert space’. Indiana University
Mathematics Journal 26(6), 1137—1150.
186. Meany, R.: 1969, ‘A matrix inequality’. SIAM J. Numer. Anal. 6(1), 104—107.
187. Metcalf, F.: 1965, ‘A Bessel-Schwarz Inequality for Gramians and Related Bounds for
Determinants’. Ann. Mat. Pura Appl. 68(4), 201—232.
188. Meyn, K.: 1983, ‘Solution of underdetermined nonlinear equations by stationary iteration
methods’. Numerische Mathematik 42, 161—172.
189. Miller, W. and C. Wrathall: 1980, Software for Roundoff Analysis of Matrix Algorithms.
Academic Press.
190. Moré, J. and M. Cosnard: 1979, ‘Numerical Solution of Nonlinear Equations’. TOMS 5,
64—85.
191. Moré, J., B. Garbow, and K. Hillstrom: 1981, ‘Testing Unconstrained Optimization Software’. TOMS 7, 17—41.
192. Nambooripad, K.: 1980, ‘The Natural Partial Order on a Regular Semigroup’. Proceedings
of the Edinburgh Mathematical Society 23, 249—260.
193. Nelson, S. and M. Neumann: 1987, ‘Generalizations of the Projection Method with Application to SOR Theory for Hermitian Positive Semidefinite Linear Systems’. Numer.
Math. 51, 123—141.
194. Neumann, J. v.: 1937, ‘Some matrix-inequalities and metrization of matric-space’. Tomsk
Univ. Rev. 1, 286—300. in Collected Works, Vol. IV., Pergamon Press, 1962, pp. 205-219.
195. Neumann, J. v.: 1950, ‘On Rings of Operators, Reduction Theory’. Annals of Math. 50,
401—485.
196. Neumann, J. V. and H. Goldstine: 1947, ‘Numerical Inverting of Matrices of High Order’.
Bull. Amer. Math. Soc. 53, 1021—1099.
197. Neumann, J. V. and H. Goldstine: 1951, ‘Numerical Inverting of Matrices of High Order.
II’. Proc. Amer. Math. Soc. 2, 188—202.
198. Nicolai, S. and E. Spedicato: 1997, ‘A Bibliography of the ABS Methods’. Optimization
Methods and Software 8, 171—183.
199. Nocedal, J. and M. Overton: 1985, ‘Projected Hessian Updating Algorithms for Nonlinearly Constrained Optimization’. SIAM J. Numer. Anal. 22, 821—850.
200. O’Leary, D.: 1995, ‘Why Broyden’s nonsymmetric method terminates on linear equations’.
SIAM J. Optimization 5(2), 231—235.
201. Ortega, J.: 1972, Numerical Analysis: A Second Course. New York: Academic Press.
202. Ortega, J. and W. Rheinboldt: 1970, Iterative Solution of Nonlinear Equations in Several
Variables. Academic Press.
203. Ouellette, D.: 1981, ‘Schur complements and statistics’. Linear Algebra and its Applications 36, 187—295.
204. Pan, C.-T.: 2000, ‘On the Existence and Computation of Rank-Revealing LU Factorizations’. Linear Algebra and its Applications 316, 199—222.
205. Pan, V., Y. Yu, and C. Stewart: 1997, ‘Algebraic and Numerical Techniques for the
Computation of Matrix Determinants’. Computers Math. Applic. 34(1), 43—70.
206. Phua, K.: 1988, ‘Solving Sparse Linear Systems by an ABS- Method That Corresponds
to LU-Decomposition’. BIT 28, 709—718.
207. Pietrzykowski, T.: 1960, ‘Projection method’. Technical Report A 8, Polskiej Akademii
Nauk, Warsaw. Zaklad Aparatow Matematycznych PAN.
208. Ronto, M. and A. Samoilenko: 2000, Numerical-Analytic Methods in the Theory of
Boundary-Value Problems. World Scientific.
209. Rontó, M. and Á. Tuzson: 1999, ‘Modification of Trigonometric Collocation Method for
Impulsive Periodic BVP’. Computers and Mathematics with Applications 38, 117—123.
210. Ronto, N. and Á. Tuzson: 1994, ‘Construction of Periodic Solutions of Differential Equations with Impulse Effect’. Publ. Math. Debrecen 44(3—4), 335—357.
211. Roose, A., V. Kulla, M. Lomp, and T. Meressoo: 1990, Test Examples of Systems of
Nonlinear Equations, Version 3-90. Estonian Software and Computer Service Company,
Tallin.
212. Saad, Y.: 1996, Iterative Methods for Sparse Linear Systems. PWS Publishing Company.
213. Schatten, R.: 1960, Norm Ideals of Completely Continuous Operators. Springer-Verlag.
214. Schmidt, J.: 1987, ‘A Class of Superlinear Decomposition Methods in Nonlinear Equations’. Numer. Funct. Anal. and Optimiz. 9(5—6), 629—645.
215. Schmidt, J., W. Hoyer, and C. Haufe: 1985, ‘Consistent Approximations in Newton-Type
Decomposition Methods’. Numer. Math. 47, 413—425.
216. Schmidt, J. and U. Patzke: 1981, ‘Iterative Nachverbesserung mit Fehlereingrenzung
der Cholesky-Faktoren Von Stieltjes-Matrizen’. Journ. für die Reine und Angewandte
Mathematik 327, 81—92.
217. Schnabel, R.: 1994, ‘Parallel Nonlinear Optimization: Limitations, Opportunities, and
Challenges’. In: E. Spedicato (ed.): Algorithms for Continuous Optimization:The State of
the Art. pp. 531—559.
218. Schnabel, R.: 1995, ‘A View of the Limitations, Opportunities, and Challenges in Parallel
Optimization’. Parallel Computing 21, 875—905.
219. Schröder, J.: 1980, Operator Inequalities. New York: Academic Press.
220. Smith, K., D. Solmon, and S. Wagner: 1977, ‘Practical and mathematical aspects of the
problem of reconstructing objects from radiographs’. Bull. Am. Math. Soc. 83(6),
1227—1270.
221. Spedicato, E.: 1993, ‘Ten Years of ABS Methods: A Review of Theoretical Results and
Computational Achievements’. Surveys on Mathematics for Industry 3, 217—232.
222. Spedicato, E., Z. Chen, and N. Deng: 1994, ‘A Class of Difference ABS-Type Algorithms
for a Nonlinear System of Equations’. Numerical Linear Algebra with Applications 1(3),
313—329.
223. Spedicato, E. and J. Greenstadt: 1978, ‘On some Classes of Variationally Derived QuasiNewton Methods for Systems of Nonlinear Algebraic Equations’. Numerische Mathematik
29, 363—380.
224. Spedicato, E. and Z. Huang: 1995, ‘Optimally Stable ABS Methods for Nonlinear Underdetermined Systems’. Optimization Methods and Software 5, 17—26.
225. Spedicato, E. and Z. Xia: 1994, ‘ABS Methods for Nonlinear Optimization’. In: E.
Spedicato (ed.): Algorithms for Continuous Optimization: The State of the Art. pp. 333—
356.
226. Spedicato, E., Z. Xia, and L. Zhang: 2000, ‘ABS Algorithms for Linear Equations and
Optimization’. J. Comp. Appl. Math. 124, 155—170.
227. Spedicato, E., Z. Xia, L. Zhang, and K. Mirnia: 1998, ‘ABS Algorithms for Linear Equations and Applications to Optimization’. In: G. Winter Althaus and E. Spedicato (eds.):
Algorithms for Large Scale Linear Algebraic Systems: Applications in Science and Engineering. pp. 291—319.
228. Spedicato, E. and M. Zhu: 1999, ‘A Generalization of the Implicit LU Algorithm to an
Arbitrary Initial Matrix’. Numerical Algorithms 20, 343—351.
229. Stewart, G.: 1973, ‘Conjugate direction methods for solving systems of linear equations’.
Numerische Mathematik 21, 285—297.
230. Stewart, G.: 1977, ‘On the perturbation of pseudo-inverses, projections and linear least
squares problems’. SIAM Review 19(4), 634—662.
231. Stewart, G.: 1993, ‘On the Perturbation of LU, Cholesky and QR Factorizations’. SIAM
J. Matrix. Anal. Appl. 14, 1141—1145.
232. Stewart, G.: 1997, ‘On the Perturbation of LU and Cholesky Factors’. IMA J. Numer.
Anal. 17, 1—6.
233. Stewart, G.: 1998, Matrix Algorithms, Vol. I: Basic Decompositions. Philadelphia: SIAM.
234. Stewart, G. and J. Sun: 1990, Matrix Perturbation Theory. Academic Press.
235. Sun, J.-G.: 1991, ‘Perturbation Bounds for the Cholesky and QR Factorizations’. BIT
31, 341—352.
236. Sun, J.-G.: 1992a, ‘Componentwise Perturbation Bounds for some Matrix Decompositions’. BIT 32, 702—714.
237. Sun, J.-G.: 1992b, ‘Rounding-Error and Perturbation Bounds for the Cholesky and LDLT
Factorizations’. Linear Algebra and its Applications 173, 77—97.
238. Tompkins, C.: 1955, ‘Projection methods in calculation’. In: H. Antosiewicz (ed.): Proc.
Second Symposium on Linear Programming. Washington, D.C., pp. 425—448.
239. Turing, A.: 1948, ‘Rounding-Off Errors in Matrix Processes’. Quart. J. Mech. Appl.
Math. 1, 287—308.
240. Van de Velde, E.: 1994, Concurrent Scientific Computing. Springer-Verlag.
241. Varga, R. and D.-Y. Cai: 1981, ‘On the LU Factorization of M -Matrices’. Numerische
Mathematik 38, 179—192.
242. Vasil’chenko, G. and A. Svetlanov: 1980, ‘Proektsionnyj algoritm resheniya sistem linejnykh algebraicheskikh uravnenij bol’shoj razmernosti’. Zh. Vychisl. Mat. Mat. Fiz.
20(1), 3—10. in Russian.
243. Vespucci, M., Z. Yang, E. Feng, H. Yu, and X. Wang: 1992, ‘A Bibliography on the ABS
Methods’. Optimization Methods and Software 1, 273—281.
244. Vespucci, T.: 1992, ‘The ABS Preconditioned Conjugate Gradient Algorithm’. Technical
report, University of Bergamo. Quaderno DMSIA, 92/6.
245. Viswanath, D. and L. Trefethen: 1998, ‘Condition Numbers of Random Triangular Matrices’. SIAM J. Matrix Anal. Appl. 19, 564—581.
246. Voyevodin, V.: 1983, Linear Algebra. Moscow: Mir.
247. Wedderburn, J.: 1934, Lectures on Matrices, Amer. Math. Soc. Colloquium Publications,
Vol. XVII. AMS.
248. Whitney, T. and R. Meany: 1967, ‘Two algorithms related to the method of steepest
descent’. SIAM J. Numer. Anal. 4(1), 109—118.
249. Windisch, G.: 1989, M-Matrices in Numerical Analysis. Leipzig: Teubner.
250. Zanghirati, G.: 1999, ‘Some Theoretical Properties of Feng-Schnabel Algorithm for Block
Bordered Nonlinear Systems’. Optimization Methods and Software 10, 783—801.
251. Zassenhaus, H.: 1964, ‘Angles of inclination in correlation theory’. Am. Math. Monthly
71, 218—219.
252. Zhang, L.: 1998, ‘Computing Inertias of KKT Matrix and Reduced Hessian Via the ABS
Algorithm’. Department of Applied Mathematics, Dalian University of Technology, Dalian,
China, pp. 1-14.
253. Zhang, L., Z. Xia, and E. Feng: 1999, Introduction to ABS Methods in Optimization.
Dalian University of Technology Press. Chinese.
254. Zhang, X., R. Byrd, and R. Schnabel: 1992, ‘Parallel Methods for Solving Nonlinear
Block Bordered Systems of Equations’. SIAM J. Scientific and Statistical Comput. 13,
841—859.
255. Zhu, M.: 1989, ‘A Generalization of Nonlinear ABS Algorithms and Its Convergence’.
Journal of Beijing Polytechnic University 15(4), 21—26.
256. Zhu, M.: 1991, ‘A Note on the Condition for Convergence of Nonlinear ABS Method’.
Calcolo 28, 307—314.
257. Zielke, G.: 1988, ‘Some Remarks on Matrix Norms, Condition Numbers and Error Estimates for Linear Equations’. Lin. Alg. Appl. 110, 29—41.