ETNA
Electronic Transactions on Numerical Analysis.
Volume 8, 1999, pp. 26-35.
Copyright © 1997, Kent State University.
ISSN 1068-9613.
Kent State University
etna@mcs.kent.edu
PRECONDITIONERS FOR LEAST SQUARES PROBLEMS BY LU
FACTORIZATION
A. BJÖRCK† AND J. Y. YUAN‡
Abstract. Iterative methods are often suitable for solving least-squares problems $\min_x \|Ax - b\|_2$, where $A \in \mathbb{R}^{m\times n}$ is large and sparse. The use of the conjugate gradient method with a nonsingular square submatrix $A_1 \in \mathbb{R}^{n\times n}$ of $A$ as preconditioner was first suggested by Läuchli in 1961. This conjugate gradient method has recently been extended by Yuan to generalized least-squares problems.
In this paper we consider the problem of finding a suitable submatrix $A_1$ and its LU factorization for a sparse rectangular matrix $A$. We give three algorithms based on the sparse LU factorization algorithm by Gilbert and Peierls.
Numerical results are given, which indicate that our preconditioners can be effective.
Key words. Linear least squares, preconditioner, conjugate gradient method, LU factorization.
AMS subject classifications. 65F10, 65F20.
1. Introduction. Consider the linear least-squares problem

(1.1)    \min_{x \in \mathbb{R}^n} \|Ax - b\|_2, \qquad A \in \mathbb{R}^{m\times n},\ b \in \mathbb{R}^m,\ m \ge n,

where $\|\cdot\|_2$ denotes the Euclidean vector norm. For large scale sparse problems iterative solution methods are often preferable to direct methods. To improve the rate of convergence of an iterative method it can be applied to the preconditioned problem

(1.2)    \min_{y \in \mathbb{R}^n} \|AS^{-1}y - b\|_2, \qquad y = Sx,

where the nonsingular matrix $S \in \mathbb{R}^{n\times n}$ is a preconditioner.
Here we will consider a class of preconditioners based on a partitioning of A,
(1.3)    PA = \begin{pmatrix} A_1 \\ A_2 \end{pmatrix},

where $P$ is a permutation matrix and $A_1 \in \mathbb{R}^{n\times n}$ is a nonsingular submatrix. Such preconditioners first seem to have been suggested by Läuchli [11], and later investigated by Chen [4].
For applications see also [6, 7, 8, 12, 13, 15], and the survey in [2]. The detailed construction
of preconditioners of this form for sparse problems seems to have been little studied in the
literature.
A powerful class of iterative methods is obtained by applying the conjugate gradient
method to the normal equations of the preconditioned problem (1.2),
(1.4)    S^{-T}A^{T}AS^{-1}y = S^{-T}A^{T}b.
Note that for numerical stability care is needed in the implementation of this method. The
stability in finite precision of the conjugate gradient and Lanczos methods for least squares
problems is discussed in [3].
Received November 30, 1997. Accepted for publication December 11, 1998. Recommended by D. Calvetti December 11, 1998. The work of the second author was partially supported by CNPq, Brazil.
† Department of Mathematics, Linköping University, S-581 83, Linköping, Sweden (akbjo@math.liu.se)
‡ Department of Mathematics, Universidade Federal do Paraná, Curitiba, Paraná, Brazil (jin@gauss.mat.ufpr.br)
Yuan [18] recently presented block iterative methods, which use $A_1$ as preconditioner, for the generalized least-squares problem

(1.5)    \min_{x \in \mathbb{R}^n} (Ax - b)^{T}W^{-1}(Ax - b),

where $W$ is symmetric and positive definite. The solution to this problem satisfies the normal equations $A^{T}W^{-1}Ax = A^{T}W^{-1}b$. Introducing the scaled residual vector $r = W^{-1}(b - Ax)$, these can be written in augmented form as

(1.6)    Wr + Ax = b, \qquad A^{T}r = 0.
An outline of the paper is as follows. In Section 2 we give two basic conjugate gradient
methods using A1 in (1.3) as preconditioner, and give bounds for the rate of convergence.
We also show how one of these can be adapted to solve the generalized least squares problem (1.5). An algorithm for selecting $n$ linearly independent rows from $A$ to form $A_1$, and performing a sparse LU factorization of $A_1$, is outlined in Section 3. In order to save multiplications and storage, we use a modification of the algorithm of Gilbert and Peierls [9]. To ensure the numerical stability of this algorithm, we consider using partial pivoting.
In Section 4 the LU factorization algorithms are tested on some sparse rectangular matrices from the Harwell-Boeing sparse matrix collection [5]. Numerical comparisons between the conjugate gradient method and the preconditioned conjugate gradient method with the preconditioner $A_1$ given by our algorithms are also made for the generalized least squares problem (1.5), where $A$ and $W$ are Hilbert matrices. These comparisons show that the preconditioned conjugate gradient method is much better than the conjugate gradient method.
2. CG methods preconditioned by LU factorizations. Iterative methods using $A_1$ in (1.3) as preconditioner often have very good convergence properties. However, the row permutation matrix $P$ must be chosen so that the $n$ rows of $A_1$ are linearly independent and $AA_1^{-1}$ is well conditioned.
In this section we assume for simplicity that the permutation $P$ in (1.3) has been carried out in advance, and that we have computed the matrix $C = A_2A_1^{-1} \in \mathbb{R}^{(m-n)\times n}$, where

(2.1)    AA_1^{-1} = \begin{pmatrix} A_1 \\ A_2 \end{pmatrix} A_1^{-1} = \begin{pmatrix} I_n \\ C \end{pmatrix}.
Then the preconditioned problem can be written in the form

(2.2)    \min_{y} \left\| \begin{pmatrix} I_n \\ C \end{pmatrix} y - \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} \right\|_2, \qquad y = A_1x, \quad C = A_2A_1^{-1},

where $b$ has been partitioned conformingly with $A$. The normal equations for this problem are

(2.3)    (I_n + C^{T}C)y = b_1 + C^{T}b_2,

and $x$ can then be retrieved from $A_1x = y$. Läuchli [11] proposed to apply the conjugate gradient algorithm to this system of equations. This leads to the following algorithm:
LU PRECONDITIONED CGLSI. Set

    r_1^{(0)} = b_1, \quad r_2^{(0)} = b_2, \quad p^{(0)} = s^{(0)} = b_1 + C^{T}b_2, \quad \gamma_0 = \|s^{(0)}\|_2^2,

and for $k = 0, 1, 2, \ldots$ while $\gamma_k > \mathrm{tol}$ compute

    q^{(k)} = Cp^{(k)},
    \alpha_k = \gamma_k/(\|p^{(k)}\|_2^2 + \|q^{(k)}\|_2^2),
    r_1^{(k+1)} = r_1^{(k)} - \alpha_k p^{(k)},
    r_2^{(k+1)} = r_2^{(k)} - \alpha_k q^{(k)},
    s^{(k+1)} = r_1^{(k+1)} + C^{T}r_2^{(k+1)},
    \gamma_{k+1} = \|s^{(k+1)}\|_2^2,
    \beta_k = \gamma_{k+1}/\gamma_k,
    p^{(k+1)} = s^{(k+1)} + \beta_k p^{(k)}.

Solve $x$ from $A_1x = b_1 - r_1$.
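The iteration requires only matrix-vector products with $C$ and $C^{T}$. As an illustration, the following is a minimal dense NumPy sketch of CGLSI (our own, not the authors' FORTRAN implementation; the function name and default tolerance are arbitrary choices):

import numpy as np

def cglsi(C, b1, b2, tol=1e-12, maxit=500):
    """Sketch of LU-preconditioned CGLSI: CG applied to (I + C^T C) y = b1 + C^T b2.
    Returns y; the least squares solution x then satisfies A1 x = y."""
    r1, r2 = b1.astype(float).copy(), b2.astype(float).copy()
    s = r1 + C.T @ r2                    # residual of the normal equations (2.3)
    p = s.copy()
    gamma = s @ s
    for _ in range(maxit):
        if gamma <= tol:
            break
        q = C @ p
        alpha = gamma / (p @ p + q @ q)
        r1 -= alpha * p
        r2 -= alpha * q
        s = r1 + C.T @ r2
        gamma_new = s @ s
        beta = gamma_new / gamma
        gamma = gamma_new
        p = s + beta * p
    return b1 - r1                       # y = b1 - r1; solve A1 x = y for x

In exact arithmetic this is CGLS applied to the preconditioned problem (2.2), and the monitored quantity $\gamma_k = \|s^{(k)}\|_2^2$ is the squared norm of the residual of the normal equations (2.3).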
It is well known (see Björck [2, Chap. 7.4]) that for the conjugate gradient method applied to the preconditioned normal equations (1.4) the error is reduced according to

    \|A(x - x_k)\|_2 \le 2\left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^{k}\|A(x - x_0)\|_2,

where $x_0$ is the initial solution and $\kappa = \kappa(S^{-T}A^{T}AS^{-1})$ is the condition number of the preconditioned normal matrix.
The convergence of the conjugate gradient method applied to (2.3) has been studied by Freund [8]. The eigenvalues of $(I_n + C^{T}C)$ are

(2.4)    \lambda_i = 1 + \sigma_i^2(C), \qquad i = 1, \ldots, n,

where $\sigma_i(C)$ are the singular values of $C$. It follows that

    \kappa(AA_1^{-1}) = \frac{\sqrt{1 + \sigma_1^2(C)}}{\sqrt{1 + \sigma_n^2(C)}} \le \sqrt{1 + \rho^2},

where

(2.5)    \rho = \sigma_1(C) = \|C\|_2 = \|A_2A_1^{-1}\|_2 \le \frac{\sigma_{\max}(A_2)}{\sigma_{\min}(A_1)}.
Hence, the standard error bound based on Chebyshev polynomials for this CG method is (see Freund [8])

(2.6)    \frac{\|A(x - x_k)\|_2}{\|A(x - x_0)\|_2} < \frac{2\theta^{2k}}{1 + \theta^{4k}}, \qquad \theta = \frac{\rho}{1 + \sqrt{1 + \rho^2}}.
To get fast convergence we want $\rho$ to be small. According to (2.5) this is achieved if the blocking (1.3) can be chosen so that $\|A_2\|_2$ is small and $A_1$ is well-conditioned. Since $C$ has at most $p = \min\{m - n, n\}$ distinct singular values, the matrix $(I + C^{T}C)$ will have at most $\min\{p + 1, n\}$ distinct eigenvalues. Hence, in exact arithmetic, CGLSI will converge
in at most $\min\{p + 1, n\}$ steps. Therefore, in particular, we can expect rapid convergence if $p \ll n$.
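For example, if the blocking gives $\rho = \|A_2A_1^{-1}\|_2 = 0.1$, then $\theta = 0.1/(1 + \sqrt{1.01}) \approx 0.05$, and the bound (2.6) already guarantees an error reduction below $2\theta^2 \approx 5\cdot 10^{-3}$ after one step and below $10^{-7}$ after three.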
An alternative is obtained by eliminating $r_1$ from the relations

    r_1 = b_1 - y = -C^{T}r_2, \qquad r_2 = b_2 - Cy.

Then we get a system of dimension $(m - n)\times(m - n)$ for $r_2$,

(2.7)    (CC^{T} + I_{m-n})r_2 = b_2 - Cb_1,

and $x$ can be retrieved from

(2.8)    A_1x = b_1 + C^{T}r_2.
The system (2.7) can be interpreted as the normal equations for

(2.9)    \min_{r_2} \left\| \begin{pmatrix} -C^{T} \\ I_{m-n} \end{pmatrix} r_2 - \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} \right\|_2.

If $(m - n)$ is small, the system (2.7) can be cheaply solved by Cholesky factorization. A QR decomposition of the matrix $(-C \;\; I_{m-n})^{T}$ of size $m\times(m - n)$ can also be used to solve for $r_2$. Note that the special form of this matrix allows for further economizing in the QR decomposition; see [2, Sec. 2.7.2].
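When $m - n$ is small, forming and factorizing $CC^{T} + I_{m-n}$ explicitly is cheap. A minimal dense sketch of this option, using SciPy's Cholesky routines (our own illustration, with a hypothetical function name):

import numpy as np
from scipy.linalg import cho_factor, cho_solve

def solve_r2_cholesky(C, b1, b2):
    """Solve (C C^T + I) r2 = b2 - C b1, cf. (2.7), by dense Cholesky.
    Practical only when m - n, the order of the system, is small,
    since C C^T + I is formed explicitly."""
    k = C.shape[0]                            # k = m - n
    M = C @ C.T + np.eye(k)
    r2 = cho_solve(cho_factor(M), b2 - C @ b1)
    return r2                                 # then solve A1 x = b1 + C^T r2, cf. (2.8)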
The conjugate gradient method can be adapted to solve the system (2.7). The resulting
algorithm is as follows:
LU PRECONDITIONED CGLSII. Set

    v_1^{(0)} = b_1, \quad v_2^{(0)} = b_2, \quad p^{(0)} = s^{(0)} = b_2 - Cb_1, \quad \gamma_0 = \|s^{(0)}\|_2^2,

and for $k = 0, 1, 2, \ldots$ while $\gamma_k > \mathrm{tol}$ compute

    q^{(k)} = -C^{T}p^{(k)},
    \alpha_k = \gamma_k/(\|p^{(k)}\|_2^2 + \|q^{(k)}\|_2^2),
    v_1^{(k+1)} = v_1^{(k)} - \alpha_k q^{(k)},
    v_2^{(k+1)} = v_2^{(k)} - \alpha_k p^{(k)},
    s^{(k+1)} = v_2^{(k+1)} - Cv_1^{(k+1)},
    \gamma_{k+1} = \|s^{(k+1)}\|_2^2,
    \beta_k = \gamma_{k+1}/\gamma_k,
    p^{(k+1)} = s^{(k+1)} + \beta_k p^{(k)}.

Solve $A_1x = b_1 + C^{T}(b_2 - v_2)$ for $x$.
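A matching NumPy sketch of CGLSII (again our own illustration, not the paper's code); it iterates on vectors of dimension $m$ but implicitly solves the $(m-n)\times(m-n)$ system (2.7):

import numpy as np

def cglsii(C, b1, b2, tol=1e-12, maxit=500):
    """Sketch of LU-preconditioned CGLSII; equivalent to CG on (2.7).
    Returns the right-hand side of A1 x = b1 + C^T (b2 - v2)."""
    v1, v2 = b1.astype(float).copy(), b2.astype(float).copy()
    s = b2 - C @ b1
    p = s.copy()
    gamma = s @ s
    for _ in range(maxit):
        if gamma <= tol:
            break
        q = -(C.T @ p)
        alpha = gamma / (p @ p + q @ q)
        v1 -= alpha * q
        v2 -= alpha * p
        s = v2 - C @ v1
        gamma_new = s @ s
        beta = gamma_new / gamma
        gamma = gamma_new
        p = s + beta * p
    return b1 + C.T @ (b2 - v2)        # solve A1 x = this vector for x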
Algorithm CGLSII requires about the same storage and work as CGLSI. Since the eigenvalues of $(I + CC^{T})$ are the same as those of $(I + C^{T}C)$, the convergence rate will also be the same. Yuan [18] has shown that an advantage of CGLSII is that it can more easily be adapted to solve generalized least squares problems. For the generalized problem (1.5) the normal equations (2.3) become
(2.10)    (I_n \;\; C^{T})\,W^{-1}\begin{pmatrix} I_n \\ C \end{pmatrix} y = (I_n \;\; C^{T})\,W^{-1}b, \qquad A_1x = y.
On the other hand, the equations (2.7) and (2.8) generalize to

(2.11)    (-C \;\; I_{m-n})\,W\begin{pmatrix} -C^{T} \\ I_{m-n} \end{pmatrix} r_2 = b_2 - Cb_1,

(2.12)    A_1x = b_1 - (I_n \;\; 0)\,W\begin{pmatrix} -C^{T} \\ I_{m-n} \end{pmatrix} r_2.
Here the symmetric system (2.11) can be solved by the conjugate gradient method without the need of inverting or factorizing the full covariance matrix $W$.
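As an illustration of this point, the following SciPy sketch (our own; the helper name is hypothetical) applies the coefficient matrix of (2.11) only through products with $W$, $C$ and $C^{T}$:

import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def solve_generalized_r2(C, W, b1, b2):
    """Solve (2.11) by CG using only products with W, C and C^T,
    i.e. without inverting or factorizing W.  Returns the right-hand
    side of (2.12); x then solves A1 x = rhs."""
    k, n = C.shape                                 # k = m - n
    def apply_Bt(r2):                              # (-C  I)^T r2, an m-vector
        return np.concatenate([-(C.T @ r2), r2])
    def matvec(r2):                                # (-C  I) W (-C  I)^T r2
        w = W @ apply_Bt(r2)
        return -(C @ w[:n]) + w[n:]
    B = LinearOperator((k, k), matvec=matvec, dtype=float)
    r2, _ = cg(B, b2 - C @ b1)
    rhs = b1 - (W @ apply_Bt(r2))[:n]              # cf. (2.12)
    return rhs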
Yuan shows in [18] that the standard error bound based on Chebyshev polynomials for this CG method is

(2.13)    \frac{\|A(x - x_k)\|_{W^{-1}}}{\|A(x - x_0)\|_{W^{-1}}} < \frac{2\theta_2^{2k}}{1 + \theta_2^{4k}}, \qquad \theta_2 = \left(\frac{\sqrt{\mu(1 + \rho^2)} - 1}{\sqrt{\mu(1 + \rho^2)} + 1}\right)^{1/2},

where $\mu = \kappa(W)$ is the spectral condition number of $W$, and $\|x\|_{W^{-1}} = (x^{T}W^{-1}x)^{1/2}$. (For $\mu = 1$ this estimate reduces to (2.6).)
THEOREM 2.1. Let $\lambda_n$ and $\mu_n$ be the smallest eigenvalues of $A^{T}A$ and $A_1^{T}A_1$, respectively. Then

    \kappa(I + C^{T}C) \le \kappa(A^{T}A)\,\frac{\lambda_n}{\mu_n}.

Proof. Let $\lambda_1$ be the largest eigenvalue of $A^{T}A$, and let $A_1x_1$, with $\|x_1\|_2 = 1$, be an eigenvector of $I + C^{T}C$ associated with $\nu_1$, the largest eigenvalue of $I + C^{T}C$, i.e., $\lambda_1 = \max_{\|x\|_2=1} x^{T}A^{T}Ax$ and $\nu_1 = \max_{\|x\|_2=1} x^{T}(I + C^{T}C)x$. Then, since $A^{T}A = A_1^{T}(I + C^{T}C)A_1$,

    \lambda_1 \ge x_1^{T}A_1^{T}(I + C^{T}C)A_1x_1 = \nu_1\,x_1^{T}A_1^{T}A_1x_1 \ge \nu_1\mu_n.

Furthermore, $\nu_n = \min_{\|x\|_2=1} x^{T}(I + C^{T}C)x$, the smallest eigenvalue of $I + C^{T}C$, satisfies $\nu_n \ge 1$. It follows that

    \kappa(I + C^{T}C) \le \nu_1 \le \frac{\lambda_1}{\mu_n} = \kappa(A^{T}A)\,\frac{\lambda_n}{\mu_n}.

REMARK 1. If $x_1$ is not an eigenvector of $A^{T}A$ associated with $\lambda_1$, then

    \kappa(I + C^{T}C) < \kappa(A^{T}A)\,\frac{\lambda_n}{\mu_n}.

Thus we must choose the preconditioner $A_1$ as well as possible, so that $\mu_n$ is close to $\lambda_n$, to guarantee $\kappa(I + C^{T}C) \le \kappa(A^{T}A)$.
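As a quick numerical sanity check of this bound (not part of the original paper), one can verify it on a random example; the row split below is arbitrary and simply assumes the leading $n$ rows form a nonsingular $A_1$:

import numpy as np

rng = np.random.default_rng(0)
m, n = 60, 20
A = rng.standard_normal((m, n))
A1, A2 = A[:n, :], A[n:, :]              # arbitrary split; A1 assumed nonsingular
C = A2 @ np.linalg.inv(A1)

lam = np.linalg.eigvalsh(A.T @ A)        # eigenvalues of A^T A (ascending)
mu = np.linalg.eigvalsh(A1.T @ A1)       # eigenvalues of A1^T A1 (ascending)
lhs = np.linalg.cond(np.eye(n) + C.T @ C)
rhs = (lam[-1] / lam[0]) * (lam[0] / mu[0])   # kappa(A^T A) * lambda_n / mu_n
print(lhs <= rhs)                        # True: the bound of Theorem 2.1 holds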
3. LU-decomposition algorithms. Läuchli [11] used Gauss-Jordan elimination to explicitly compute the matrix $C = A_2A_1^{-1}$ in (2.1). Then at each iteration step in the preconditioned conjugate gradient methods matrix-vector products with $C$ and $C^{T}$ are computed. This approach is suitable only when $(m - n)$ is not too large compared to $n$.
When $A$ has many more rows than columns it is preferable to factorize just $A_1$ rather than $A$, and not to form the matrix $C = A_2A_1^{-1}$ explicitly. If $C$ is not formed, we need to
perform in each iterative step two matrix-vector multiplications, with $A_2$ and $A_2^{T}$, and solve two linear systems of the form

(3.1)    A_1y = c \qquad \text{and} \qquad A_1^{T}z = d.

To do this efficiently we need to compute the LU factorization of $A_1$. We could compute the LU factorization of the complete matrix $A$ and then extract the LU factorization of $A_1$. However, this would require extra computation and storage. Further, we would need an auxiliary array to save a copy of $A_2$.
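For illustration, a SciPy sketch of this idea (our own; the helper name is hypothetical) applies $C$ and $C^{T}$ through a sparse LU factorization of $A_1$ alone, together with products involving $A_2$:

from scipy.sparse import csc_matrix
from scipy.sparse.linalg import splu

def make_C_operators(A1, A2):
    """Return functions applying C = A2 A1^{-1} and C^T without forming C,
    using only a sparse LU factorization of the n x n block A1."""
    lu = splu(csc_matrix(A1))                        # factorize A1 only
    C_mv = lambda v: A2 @ lu.solve(v)                # C v   = A2 (A1^{-1} v)
    Ct_mv = lambda v: lu.solve(A2.T @ v, trans='T')  # C^T v = A1^{-T} (A2^T v)
    return C_mv, Ct_mv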
We will use the sparse LU-decomposition algorithm of Gilbert and Peierls [9], modified so that we get an LU decomposition of $A_1$ only. This algorithm performs Gaussian elimination in column-wise order, and breaks the computation of each column into a symbolic and a numerical stage. The $j$th column of $L$ and $U$ is computed by solving a sparse triangular system. This approach allows partial pivoting by rows to be performed in time proportional to the arithmetic operations.
The algorithm of Gilbert and Peierls can be modified to work for an $n\times m$ matrix with $m \ge n$. For our least squares problems the matrix $A$ is $m\times n$, with $m \ge n$. Therefore we apply Gilbert and Peierls' algorithm to the transposed matrix $A^{T}$ to obtain a factorization $A_1 = U^{T}L^{T}$ of an $n\times n$ nonsingular submatrix. To reduce fill-in we first sort the rows of $A$ by increasing nonzero count.
We use notation similar to that of Gilbert and Peierls [9] in describing the basic algorithm. Thus $j$ is the index of the row of $L$ and $U$ being computed, and we set $a_j = (a_{j1}, \ldots, a_{j,j-1})^{T}$, $a_j' = (a_{jj}, \ldots, a_{jn})^{T}$,

    L_j = \begin{pmatrix} l_{11} & & 0 \\ \vdots & \ddots & \\ l_{j-1,1} & \cdots & l_{j-1,j-1} \end{pmatrix}, \qquad
    L_j' = \begin{pmatrix} l_{j,1} & \cdots & l_{j,j-1} \\ \vdots & & \vdots \\ l_{n,1} & \cdots & l_{n,j-1} \end{pmatrix},

and $l_j' = (l_{jj}, \ldots, l_{nj})^{T}$, $u_j = (u_{1j}, \ldots, u_{j-1,j})^{T}$, where $l_{jj} = 1$. Also, we use $c_j' = (c_{jj}, \ldots, c_{nj})^{T}$ as an intermediate result.
In the basic algorithm SP1, if $|u_{ii}| < \epsilon$, where $\epsilon$ is a small tolerance, we consider the $i$th column of $A^{T}$ as linearly dependent on the first $i - 1$ columns. Such a column is excluded from all further computations in the factorization, and interchanged with a later column. We repeat this process until $i = n$ or more than $m - n$ linearly dependent columns have been found. The output of Algorithm SP1 is $A_1 = U^{T}L^{T}$, where $L$ is unit lower triangular.
ALGORITHM SP1
1. Set initial values $l = m$, $\mathrm{index}(i) = i$, $i = 1, \ldots, m$;
2. For $j = 1$ to $n$ do
   Compute column $j$ of $U$ and $L$:
   2.1 solve $L_ju_j = a_j$ for $u_j$; ($a_j$ is taken from the $j$th row of $A$)
   2.2 $c_j' = a_j' - L_j'u_j$;
   2.3 if $|c_{jj}| \ge \epsilon$ then go to 2.5
       else
         interchange the $j$th column and the $l$th column of $A^{T}$,
         and update $\mathrm{index}$ and $l$;
   2.4 if $l \le j$ then $\mathrm{rank}(A) < n$ and stop, else go to 2.1;
   2.5 $u_{jj} = c_{jj}$; $l_j' = c_j'/u_{jj}$.
The sparse triangular solve in step 2.1 proceeds as follows (see [9]): for each $k$ with $u_{kj} \ne 0$ (in the topological order determined in the previous step) do

    u_j = u_j - u_{kj}\,(l_{1k}, \ldots, l_{j-1,k})^{T}.
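To make the row-selection idea concrete, here is a simplified dense analogue of SP1 (our own sketch, not the authors' sparse FORTRAN code): it scans the rows of $A$, accepts a row whenever the resulting pivot exceeds the tolerance, and builds the factorization $A_1^{T} = LU$ column by column.

import numpy as np
from scipy.linalg import solve_triangular

def select_rows_sp1(A, eps=0.0):
    """Simplified dense analogue of Algorithm SP1 (illustrative only).
    Rows of A are scanned in their given order; a row is accepted as the
    next row of A1 whenever the pivot satisfies |c_jj| > eps, and rejected
    rows are simply skipped (the sparse algorithm instead swaps them towards
    the end).  Returns the accepted row indices and dense factors L, U with
    A1^T = L U, L unit lower triangular."""
    m, n = A.shape
    L = np.zeros((n, n))
    U = np.zeros((n, n))
    idx = []
    j = 0
    for row in range(m):
        if j == n:
            break
        a = np.asarray(A[row, :], dtype=float)     # candidate column j of A1^T
        if j > 0:
            u = solve_triangular(L[:j, :j], a[:j], lower=True, unit_diagonal=True)
            c = a[j:] - L[j:, :j] @ u
        else:
            u = np.zeros(0)
            c = a.copy()
        if abs(c[0]) <= eps:
            continue                               # row treated as dependent; try the next one
        U[:j, j] = u
        U[j, j] = c[0]
        L[j:, j] = c / c[0]                        # unit diagonal: L[j, j] = 1
        idx.append(row)
        j += 1
    if j < n:
        raise np.linalg.LinAlgError("rank(A) < n for the given tolerance")
    return idx, L, U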
Algorithm SP1 computes the elements in the LU decomposition of $A_1$ in a column-wise fashion. Now we shall consider the LU decomposition of $A_1$ as $A_1 = U^{T}L^{T}$, where $U$ is unit upper triangular.
ALGORITHM SP2
1. Set initial values $l = m$, $\mathrm{index}(i) = i$, $i = 1, \ldots, m$;
2. For $j = 1, \ldots, n$ do
   2.1 solve $U_jl_j = a_j$ for $l_j$;
   2.2 $c_j' = a_j' - U_j'l_j$;
   2.3 if $|c_{jj}| \ge \epsilon$ then go to 2.5
       else
         interchange the $j$th row and the $l$th row of $A$,
         and update $\mathrm{index}$ and $l$;
   2.4 if $l \le j$ then $\mathrm{rank}(A) < n$ and stop, else go to 2.1;
   2.5 $l_{jj} = c_{jj}$; $u_j' = c_j'/l_{jj}$;
   2.6 $l = m$.
Algorithm SP3 performs partial pivoting, and is obtained by making the following changes in Algorithm SP2:
ALGORITHM SP3
Before step 2.3 insert
   2.3a $k = \arg\max_{j \le i \le n} |c_{ji}|$ and $s_1 = c_{jk}$;
Change 2.6 to:
   2.6 if $k = j$ go to 2.7, else
         exchange $c_{jj}$, $\mathrm{index1}(j)$ and $u_{ij}'$ $(i = 1, \ldots, j - 1)$
         respectively with $c_{jk}$, $\mathrm{index1}(k)$ and $u_{ik}'$ $(i = 1, \ldots, j - 1)$;
   2.7 $l_{jj} = c_{jj}$; $u_j' = c_j'/l_{jj}$.
Finally, we mention an alternative approach, suggested by Saunders [16]: use Gaussian elimination with row interchanges to compute a stable, sparse factorization $PA = LU$, where $L \in \mathbb{R}^{m\times n}$ is unit lower trapezoidal and $U$ is upper triangular, and use $U$ as preconditioner. The rationale for this choice is that any ill-conditioning in $A$ is usually reflected in $U$, while $L$ tends to be well conditioned. A preliminary pass through the rows of $A$ can be made to select a triangular subset with maximal diagonal elements. The matrix $L$ is not saved, and subsequent use of the operator $AU^{-1}$ involves back-substitution with $U$ and multiplication with $A$. This approach has the advantage that often there is very little fill-in in $U$, and hence $U$ is likely to contain fewer nonzeros than $A$.
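A small dense SciPy sketch of this idea (our own; it uses LSQR as a numerically stable CG-type solver on the preconditioned operator, and assumes $A$ has full column rank so that $U$ is nonsingular):

import numpy as np
from scipy.linalg import lu, solve_triangular
from scipy.sparse.linalg import LinearOperator, lsqr

def u_preconditioned_lsqr(A, b):
    """Sketch of the approach attributed to Saunders [16]: factor PA = LU by
    Gaussian elimination with row interchanges (dense here for brevity),
    keep only the upper triangular factor U, and use it as a right
    preconditioner: solve min ||A U^{-1} y - b||_2, then x = U^{-1} y."""
    m, n = A.shape
    _, _, U = lu(A)                     # U is the n x n upper triangular factor
    M = LinearOperator((m, n), dtype=float,
                       matvec=lambda y: A @ solve_triangular(U, y),
                       rmatvec=lambda w: solve_triangular(U, A.T @ w, trans='T'))
    y = lsqr(M, b)[0]                   # iterate on the operator A U^{-1}
    return solve_triangular(U, y)       # x = U^{-1} y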
4. Numerical experiments. All programs were written in FORTRAN 77 and executed
in double precision. The computations were performed on a SUN Workstation running UNIX
at Information Laboratory, the Federal University of Paraná, Curitiba, Brazil.
The three LU factorization algorithms in Section 3 were tested on some sparse rectangular matrices from the Harwell-Boeing sparse matrix collection by Duff, Grimes and Lewis
[5]. The numerical results are given in Tables 4.1-4.3 for the case $\epsilon = 0$. The CPU time is the time in seconds for the factorization, and $m$ and $n$ are the numbers of rows and columns of the sparse matrix $A$. NZ is the number of nonzero elements in the matrix $A$ and NZ2 the number of nonzero elements in the submatrix $A_2$. NZL and NZU are the numbers of nonzero elements of the triangular factors $L$ and $U$ in the LU decomposition of $A_1$.
We also made numerical comparisons between the conjugate gradient method and the preconditioned conjugate gradient method, with the preconditioner $A_1$ selected by the algorithms in the previous section, for the generalized least squares problem (GLSP) (1.5), where $A$ and $W$ are Hilbert matrices and $b$ is generated randomly, with convergence tolerance $\mathrm{tol} = 10^{-7}$. The comparison results are given in Table 4.4.
Algorithm SP1 gives almost the same number of nonzero elements in the LU decomposition as SP2, but needs much more CPU time. Algorithms SP2 and SP3 are faster than SP1 by a factor of 5-10, but have more fill-in. We have also considered $\epsilon \ne 0$, e.g., $\epsilon = 10^{-10}$ and $10^{-8}$. All three algorithms worked for the matrices tested, and gave an LU decomposition of a nonsingular submatrix $A_1$. All results have shown that, for all algorithms, the bigger the tolerance $\epsilon$ we use, the less CPU time is needed and the more nonzero elements appear in the LU decomposition of $A_1$.
It follows from Table 4.4 that the preconditioned conjugate gradient method with the selected preconditioner $A_1$ is much better than the conjugate gradient method in terms of number of iterations, CPU time, and accuracy of the computed solutions.
REFERENCES

[1] J. L. Barlow, N. K. Nichols and R. J. Plemmons, Iterative methods for equality-constrained least squares problems, SIAM J. Sci. Statist. Comput., 9 (1988), 892-906.
[2] A. Björck, Numerical Methods for Least Squares Problems, in Frontiers in Applied Mathematics, SIAM, 1996.
[3] A. Björck, T. Elfving and Z. Strakoš, Stability of conjugate gradient and Lanczos methods for linear least squares problems, SIAM J. Matrix Anal. Appl., 19 (1998), 720-736.
[4] Y. T. Chen, Iterative methods for linear least squares problems, Technical Report CS-75-04, University of Waterloo, Canada, 1975.
[5] I. S. Duff, R. G. Grimes and J. G. Lewis, User's guide for Harwell-Boeing sparse matrix test problems collection, Technical Report RAL-92-086, Rutherford Appleton Laboratory, 1992.
[6] D. J. Evans and C. Li, Numerical aspects of the generalized cg-method applied to least squares problems, Computing, 41 (1989), 171-178.
[7] D. J. Evans and C. Li, The theoretical aspects of the gcg-method applied to least-squares problems, Inter. J. Comput. Math., 35 (1990), 207-229.
[8] R. Freund, A note on two block SOR methods for sparse least squares problems, Linear Algebra Appl., 88/89 (1987), 211-221.
[9] J. R. Gilbert and T. Peierls, Sparse partial pivoting in time proportional to arithmetic operations, SIAM J. Sci. Statist. Comput., 9 (1988), 862-874.
[10] A. Jennings and M. A. Ajiz, Incomplete methods for solving $A^{T}Ax = b$, SIAM J. Sci. Statist. Comput., 5 (1984), 978-987.
[11] P. Läuchli, Jordan-Elimination und Ausgleichung nach kleinsten Quadraten, Numer. Math., 3 (1961), 226-240.
[12] T. L. Markham, M. Neumann and R. J. Plemmons, Convergence of a direct-iterative method for large-scale least-squares problems, Linear Algebra Appl., 69 (1985), 155-167.
[13] W. Niethammer, J. de Pillis and R. S. Varga, Convergence of block iterative methods applied to sparse least-squares problems, Linear Algebra Appl., 58 (1984), 327-341.
[14] C. C. Paige, Fast numerically stable computations for generalized least squares problems, SIAM J. Numer. Anal., 16 (1979), 165-171.
[15] E. P. Papadopoulou, Y. G. Saridakis and T. S. Papatheodorou, Block AOR iterative schemes for large-scale least-squares problems, SIAM J. Numer. Anal., 26 (1989), 637-660.
[16] M. A. Saunders, Sparse least squares by conjugate gradients: a comparison of preconditioning methods, in Proceedings of Computer Science and Statistics: Twelfth Annual Conference on the Interface, Waterloo, Canada, 1979.
[17] J.-Y. Yuan, The convergence of the 2-block SAOR method for the least-squares problem, Appl. Numer. Math., 11 (1993), 429-441.
[18] J.-Y. Yuan, Iterative Methods for the Generalized Least-Squares Problem, Ph.D. thesis, Instituto de Matemática Pura e Aplicada, Rio de Janeiro, Brazil, 1993.
[19] J.-Y. Yuan, Numerical methods for generalized least-squares problems, J. Comp. Appl. Math., 66 (1996), 571-584.
[20] J.-Y. Yuan and A.-N. Iusem, Preconditioned conjugate gradient method for generalized least squares problems, J. Comp. Appl. Math., 71 (1996), 287-297.
TABLE 4.1
Algorithm SP1 with $\epsilon = 0$

Matrix       m      n     NZ     NZ2    NZL    NZU      CPU
ILLC1033     1033   320   4732   3254   1100   1774     8.96875
WELL1033     1033   320   4732   3257   1243   1752     8.17578
ILLC1850     1850   712   8758   5378   3603   4927   144.891
WELL1850     1850   712   8758   5388   4152   5301   153.930
TABLE 4.2
Algorithm SP2 with $\epsilon = 0$

Matrix       m      n     NZ     NZ2    NZL    NZU      CPU
ILLC1033     1033   320   4732   3214   1623    654     1.82031
WELL1033     1033   320   4732   3214   1802    844     1.02344
ILLC1850     1850   712   8758   5377   8652   2834    25.9297
WELL1850     1850   712   8758   5377   9073   2627    11.6280
TABLE 4.3
Algorithm SP3 with $\epsilon = 0$

Matrix       m      n     NZ     NZ2    NZL     NZU      CPU
ILLC1033     1033   320   4732   3253    2865    2450    1.92188
WELL1033     1033   320   4732   3256    3100    2976    1.80078
ILLC1850     1850   712   8758   5379   13054   12579   11.4609
WELL1850     1850   712   8758   5388   13058   12581   12.7108
TABLE 4.4
Comparison between CG and PCG for GLSP

             CG                                PCG
m   n    IT    CPU        e              IT    CPU       e
9   8    21    0.18662    0.2794D-08      1    0.0055    0.1434D-07
9   7    14    0.19877    0.6519D-08      4    0.0055    0.5560D-07
9   6    10    0.22156    0.6519D-08      7    0.0396    0.3817D-07
9   5     7    0.15960    0.3725D-08      1    0.0055    0.1368D-06
8   8    20    0.21428    0.1397D-07      0    0.0055    0.1760D-08
8   7    14    0.22700    0.2352D-07      1    0.0055    0.1499D-07
8   6    11    0.14108    0.6985D-09      4    0.0391    0.7903D-08
8   5     7    0.21004    0.9313D-08      6    0.0395    0.7047D-08