An Algorithm for the Parallel Computation of Subsets of Eigenvalues and
Associated Eigenvectors of Large Symmetric Matrices
using an Array Processor
E. J. Stuart and J. S. Weston
Department of Computing Science, University of Ulster, Coleraine, Northern Ireland.
Abstract
In this paper the parallel implementation on an array processor and the mathematical basis of the POTS algorithm
for the computation of subsets of eigenpairs of a real symmetric matrix of order n, $8 \le n \le 256$, are discussed. An adaptation
of the algorithm incorporating an acceleration technique is presented and contrasted with the original. Finally, the
execution time efficiency of the algorithms for the computation of partial eigensolutions of a variety of matrices is
presented, analysed and compared with that of a parallel Lanczos algorithm for the computation of subsets of eigenpairs of
real symmetric matrices.
Keywords: Eigenvalues, symmetric matrices, array processors, orthogonal transformations.
1: Introduction
The eigensolution of a real symmetric $(n \times n)$ matrix A is a real $(n \times n)$ diagonal matrix D and an $(n \times n)$ orthogonal matrix
Q which satisfy the equation

$$Q^T A \, Q = D \tag{1}$$
where the eigenvalues of A are the diagonal elements of D and the eigenvectors of A are the columns of Q. However, in
many scientific and engineering applications, only a few eigenvalues and their corresponding eigenvectors need be
evaluated. Consequently, a significant amount of research is now being expended in the development and parallel
implementation of algorithms for the computation of subsets of eigenvalues and eigenvectors [1],[2]. It is to be expected
that the use of such algorithms, in those instances where only a few eigenvalues and their corresponding eigenvectors are
required, would be more efficient and more cost effective than the use of any standard algorithm to compute the complete
eigensolution from which the required information could be extracted.
In this paper a detailed discussion of the POTS algorithm, which permits the computation of a subset of
eigenvalues and the corresponding subset of eigenvectors, is presented. The computed eigenvalues may correspond to the
numerically largest, the numerically smallest, or those closest to a given numerical value. The algorithm is based on the
simultaneous iteration method of Clint and Jennings [3] and has been developed, sequentially implemented, and evaluated
for subsets of eigenpairs of matrices of order n, $3 \le n \le 21$, by Weston [4]. Partial eigensolutions for a variety of matrices were computed
using this algorithm and a selection of the results obtained are presented in tabular form. POTS is also compared to a
parallel Lanczos algorithm [5] for the computation of subsets of eigenpairs of real symmetric matrices and finally some
conclusions are drawn.
2: Parallel Orthogonal Transformations for the Computation of Subsets of Eigenvalues and
Associated Eigenvectors of Real Symmetric Matrices (POTS)
When used to compute the ‘m’ numerically largest eigenvalues, and their associated eigenvectors, of a real
symmetric matrix A of order n, POTS takes the following form:
Let $D_m$ be the real diagonal matrix, of order m, whose diagonal components yield the required subset of eigenvalues, let
$Q_m$ be the $(n \times m)$ orthonormal matrix whose columns represent the associated eigenvectors, and let $U_0$ be an $(n \times m)$
arbitrary orthonormal matrix whose columns represent approximations to these eigenvectors. Construct the sequence of
eigenvector approximations $\{U_k\}$ as follows:

$$U_{k+1} = \operatorname{ortho}\bigl(A \, U_k \, \operatorname{transform}(B_k)\bigr), \qquad 0 \le k < x \tag{2}$$

$$U_{k+1} = \operatorname{ortho}(A \, U_k), \qquad x \le k \tag{3}$$

where x is the minimum value such that $B_k$ is diagonal. $B_k$ is defined according to

$$B_k = U_k^T (A \, U_k), \qquad 0 \le k < x \tag{4}$$

It follows that

$$\lim_{k \to \infty} U_k = Q_m \qquad \text{and} \qquad Q_m^T A \, Q_m = D_m \tag{5}$$
The function ortho is an extension of the Gram-Schmidt orthogonalisation process [6] and the function transform returns
a non-orthogonal matrix, of order m, the columns of which represent approximations to the eigenvectors of $B_k$. These
approximations are constructed as follows:
The matrix $B_k$ is visualised as a set of $m(m-1)/2$ overlapping $2 \times 2$ symmetric submatrices, each of which contains
two diagonal components of the matrix $B_k$. If $b_{ij}$ denotes the $(i,j)$th component of the matrix $B_k$, then a typical
submatrix, $B_k(i,j)$, is defined by

$$B_k(i,j) = \begin{pmatrix} b_{ii} & b_{ij} \\ b_{ij} & b_{jj} \end{pmatrix}, \qquad i < j \tag{6}$$

It follows that the eigenvectors of the submatrix $B_k(i,j)$ may be expressed as the columns of

$$T_k(i,j) = \begin{pmatrix} 1 & t_{ij} \\ -t_{ij} & 1 \end{pmatrix} \tag{7}$$

where

$$t_{ij} = \frac{2 b_{ij}}{d_{ji} + \operatorname{sign}(d_{ji})\sqrt{d_{ji}^2 + 4 b_{ij}^2}}, \qquad i < j \tag{8}$$

$$d_{ji} = b_{jj} - b_{ii} \tag{9}$$

$$\operatorname{sign}(0) = -1 \tag{10}$$

Thus, the matrix $\operatorname{transform}(B_k)$ is formed from the corresponding overlapping of the $m(m-1)/2$
eigenvector matrices $T_k(i,j)$ of the submatrices. Observe that the value whose square root is determined in equation (8) is real, and that the
sign of the square root is chosen so as to maximise the magnitude of the denominator in the expression for $t_{ij}$.
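As an illustration, the construction of $\operatorname{transform}(B_k)$ from equations (6)-(10) may be sketched in NumPy as follows. This is a serial sketch rather than the Fortran Plus Enhanced implementation used on the DAP; the placement of $t_{ij}$ at position $(i,j)$ and $-t_{ij}$ at position $(j,i)$ is our reading of the overlapping construction, and the guard for the case $b_{ij} = 0$ is an added assumption.

    import numpy as np

    def transform(B):
        # Approximate eigenvector matrix of B, assembled by overlapping the
        # m(m-1)/2 two-by-two eigenvector matrices T_k(i,j) of equation (7).
        m = B.shape[0]
        T = np.eye(m)
        for i in range(m):
            for j in range(i + 1, m):
                b = B[i, j]
                if b == 0.0:
                    continue                            # submatrix already diagonal
                d = B[j, j] - B[i, i]                   # d_ji, equation (9)
                s = -1.0 if d == 0.0 else np.sign(d)    # sign(0) = -1, equation (10)
                t = 2.0 * b / (d + s * np.sqrt(d * d + 4.0 * b * b))   # equation (8)
                T[i, j] = t
                T[j, i] = -t
        return T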
Clearly, the algorithm consists of two distinct iterative cycles, where the termination of the primary cycle and the
beginning of the secondary cycle is signalled by the diagonalisation of the matrix $B_k$, since from this point onwards the
function transform returns the identity matrix of order m. The secondary cycle is essentially an extension of Bauer’s
method [7]. Thus, at each stage of this cycle, a set of orthonormal eigenvector approximations, $U_k$, is premultiplied by the
matrix A and then reorthonormalised, thereby producing a new set of orthonormal eigenvector approximations, $U_{k+1}$.
These approximations eventually converge onto the ‘m’ eigenvectors represented by the matrix $Q_m$, provided that the set
$U_k$ is ordered such that the eigenvector approximations $u_i$, $1 \le i \le m$, correspond to the eigenvalue approximations taken in
descending order of magnitude.
If $\lambda_i$, $1 \le i \le n$, are the eigenvalues of A where

$$|\lambda_1| \ge |\lambda_2| \ge \dots \ge |\lambda_{m-1}| \ge |\lambda_m| \ge \dots \ge |\lambda_n| \tag{11}$$

then the overall rate of convergence of the algorithm is determined by the ratio $|\lambda_{m+1}| / |\lambda_m|$.
In the case where m = n, the secondary cycle is not required and the method reduces to the standard POT algorithm [8],
[9], which permits the computation of the complete set of eigenvalues and their associated eigenvectors. It has been
established that all of the operations of the POT algorithm are highly efficient when implemented on an array processor
[10]. It follows that the operations of POTS exhibit the same characteristics.
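For illustration, the two-cycle structure of POTS may be rendered in NumPy as below, building on the transform sketch given above. This is a minimal serial sketch of equations (2)-(5), not the parallel DAP implementation: ortho is realised here by a QR factorisation rather than the extended Gram-Schmidt process of [6], the iteration limit is an assumption, and the column sign-fixing is added so that successive approximations can be compared directly.

    def ortho(W):
        # Orthonormalise the columns of W; a QR factorisation yields an
        # orthonormal basis equivalent to that of the Gram-Schmidt process [6].
        Q, R = np.linalg.qr(W)
        return Q * np.sign(np.diag(R))   # fix column signs for the U_{k+1} - U_k test

    def pots(A, m, eps1=0.5e-14, eps2=0.5e-8, max_iter=5000):
        # Sketch of POTS for the m numerically largest eigenpairs of a real
        # symmetric matrix A, following equations (2)-(5).
        n = A.shape[0]
        U = np.eye(n)[:, :m]             # U_0: first m columns of the identity
        diagonal = False                 # signals the start of the secondary cycle
        for _ in range(max_iter):
            V = A @ U
            if not diagonal:             # primary cycle, equations (2) and (4)
                B = U.T @ V
                off = B - np.diag(np.diag(B))
                diagonal = np.max(np.abs(off)) < eps1
                if not diagonal:
                    V = V @ transform(B)
            U_new = ortho(V)             # equation (2) or (3)
            converged = diagonal and np.max(np.abs(U_new - U)) < eps2
            U = U_new
            if converged:
                break
        return np.diag(U.T @ A @ U), U   # eigenvalue approximations and Q_m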
2.1: The 'm' numerically smallest eigenvalues
It can be easily shown that the eigenvalues of the matrix $A^{-1}$ may be expressed in the form

$$\{\, 1/\lambda_1,\ 1/\lambda_2,\ \dots,\ 1/\lambda_n \,\} \tag{12}$$

where in this case the ‘m’ numerically largest eigenvalues are contained in the subset

$$\{\, 1/\lambda_n,\ 1/\lambda_{n-1},\ \dots,\ 1/\lambda_{n-m+1} \,\} \tag{13}$$

Clearly, the reciprocals of the elements of this subset yield the ‘m’ numerically smallest eigenvalues of the matrix A.
Consequently, the ‘m’ numerically smallest eigenvalues of A may be evaluated by first computing the ‘m’ numerically
largest eigenvalues of $A^{-1}$ and then inverting each of these values.
The algorithm described in §2.0 may be deployed in this process where the matrix A is replaced by the matrix $A^{-1}$. In this
case the overall rate of convergence is determined by the ratio $|\lambda_{n-m+1}| / |\lambda_{n-m}|$.
In the implementation of this revised algorithm the product

$$V_k = A^{-1} U_k \tag{14}$$

must be evaluated at each iteration. Since matrix inversion is a computationally expensive process, the expression in (14)
may be rearranged to yield the system of linear equations

$$A \, V_k = U_k \tag{15}$$

which may be solved using LU decomposition followed by the use of forward and backward substitution techniques.
Observe that the decomposition of A, the major computational overhead in this approach, is performed only once.
Consequently, in each iteration of both cycles of the revised algorithm, the computation of $V_k$, $k > 0$, amounts to one
forward and one backward substitution.
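In a sketch built on SciPy’s LAPACK wrappers, the one-off factorisation and the per-iteration substitutions of equations (14) and (15) might read as follows; the routine names are SciPy’s, not those of the DAP implementation.

    from scipy.linalg import lu_factor, lu_solve

    lu_piv = lu_factor(A)        # LU decomposition of A: performed only once
    # ... inside each iteration of either cycle:
    V = lu_solve(lu_piv, U)      # one forward and one backward substitution,
                                 # solving A V_k = U_k, equation (15)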
2.2: The 'm' eigenvalues closest to a scalar ‘p’
If ‘p’ is any scalar then it is easily shown that the elements of the sets

$$\{\, \lambda_1 - p,\ \lambda_2 - p,\ \dots,\ \lambda_n - p \,\} \tag{16}$$

and

$$\{\, 1/(\lambda_1 - p),\ 1/(\lambda_2 - p),\ \dots,\ 1/(\lambda_n - p) \,\} \tag{17}$$

are the eigenvalues of the matrices $(A - pI)$ and $(A - pI)^{-1}$, respectively, where I is the unit matrix of order n. It follows that
the ‘m’ numerically largest eigenvalues of the matrix $(A - pI)^{-1}$ are identical to the reciprocals of the ‘m’ numerically
smallest eigenvalues of the matrix $(A - pI)$, which in turn correspond to the ‘m’ eigenvalues of A which are closest to
the scalar ‘p’. Consequently, the required eigenvalues may be evaluated by first computing the reciprocal value of each of
the ‘m’ numerically largest eigenvalues of the matrix $(A - pI)^{-1}$ and then adding the value of ‘p’ to each. Clearly, the
algorithm described in §2.0 may be deployed in this process where the matrix A is replaced by the matrix $(A - pI)^{-1}$.
Observe that when p = 0 this reduces to the case discussed in §2.1 above. Furthermore, when $(A - pI)$ is singular the
system of linear equations

$$(A - pI) \, V_k = U_k \tag{18}$$

is either inconsistent, or consistent with infinitely many solutions. This degeneracy can be alleviated by increasing or
decreasing the modulus of ‘p’ by a small amount, effectively yielding a shift of origin.
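Combining §2.1 and §2.2, the recovery of the ‘m’ eigenvalues closest to ‘p’ may be sketched as follows; pots_shift_invert is a hypothetical driver that runs the POTS iteration of §2.0 with the LU-based solves of §2.1 applied to $(A - pI)$.

    def eigenvalues_closest_to(A, p, m):
        # Sketch: the m eigenvalues of A closest to the scalar p, obtained from
        # the m numerically largest eigenvalues mu_i of (A - p I)^{-1}.
        n = A.shape[0]
        mu, U = pots_shift_invert(A - p * np.eye(n), m)   # hypothetical driver
        return 1.0 / mu + p, U                            # lambda_i = 1/mu_i + p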
2.3: The acceleration process
Equation (2) shows that, at each stage of the construction of the sequence of eigenvector approximations $\{U_k\}$, $U_k$ is
premultiplied by either $A$, $A^{-1}$, or $(A - pI)^{-1}$, according to whether the required subset of eigenvalues corresponds to the
numerically largest, the numerically smallest, or those which are closest to a given numerical value ‘p’. These matrix
multiplications help to accentuate the dominance of the eigenvectors onto which the eigenvector approximations are
converging. It may be anticipated that a further premultiplication by the appropriate matrix at this stage would further
accentuate this eigenvector dominance, thereby enabling a faster overall rate of convergence. Hence, another version of
the POTS algorithm, POTS2, which incorporated such a premultiplication, was implemented and compared to the
original.
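In terms of the NumPy sketch of §2.0, POTS2 amounts to replacing the single premultiplication with a double one; whether the extra multiplication is retained in the secondary cycle is not spelled out here, so the snippet shows only the obvious change.

    V = A @ (A @ U)    # POTS2: U_k premultiplied twice by the iteration matrix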
Observe that a version of POTS which does not incorporate any premultiplications whatsoever could be constructed and
compared to the versions discussed in this paper.
3: The Mathematical Basis of POTS
A brief analysis of the mathematical basis of POTS, as defined in §2.0, is presented below.
Essentially, POTS is an iterative algorithm where, during the course of each iteration in either cycle, new and better sets of
eigenvalue and eigenvector approximations are constructed. Consider first the process of constructing the $(k+1)$th sets of
approximations in an iteration of the first cycle:
Since A is a real symmetric matrix its eigenvectors are linearly independent. Consequently, each of the eigenvector
approximations represented by the columns of the matrix $U_k$ may be expressed as a linear combination of these
eigenvectors. Thus, $U_k$ may be expressed in the form:
$$U_k = Q_m C_m + Q_{n-m} C_{n-m} \tag{19}$$

where $Q_m$ is the $(n \times m)$ matrix whose columns represent the m dominant eigenvectors of A, $Q_{n-m}$ is the $(n \times (n-m))$ matrix
whose columns represent the $(n-m)$ subdominant eigenvectors of A, and $C_m$ and $C_{n-m}$ are matrices of unknown coefficients
of order $(m \times m)$ and $((n-m) \times m)$, respectively. Since $U_k$ is an orthonormal set, it follows that

$$U_k^T U_k = (C_m^T Q_m^T + C_{n-m}^T Q_{n-m}^T)(Q_m C_m + Q_{n-m} C_{n-m}) = I_m \tag{20}$$

where $I_m$ is the identity matrix of order m. Using the orthonormal properties of the eigenvectors of A, this reduces to

$$C_m^T C_m + C_{n-m}^T C_{n-m} = I_m \tag{21}$$
The matrix $B_k$ is defined according to

$$B_k = U_k^T A \, U_k = (C_m^T Q_m^T + C_{n-m}^T Q_{n-m}^T) \, A \, (Q_m C_m + Q_{n-m} C_{n-m}) \tag{22}$$

If $D_m$ and $D_{n-m}$ are diagonal matrices of order m and $(n-m)$, respectively, whose non-zero elements are the m dominant
and $(n-m)$ subdominant eigenvalues of A, respectively, then

$$A \, Q_m = Q_m D_m \tag{23}$$

and

$$A \, Q_{n-m} = Q_{n-m} D_{n-m} \tag{24}$$

It follows that equation (22) simplifies to

$$B_k = C_m^T D_m C_m + C_{n-m}^T D_{n-m} C_{n-m} \tag{25}$$
If we assume that the initial set of eigenvector approximations, $U_0$, is only slightly contaminated by the $(n-m)$
subdominant eigenvectors represented by $Q_{n-m}$, and in addition, that the Euclidean norm of A is less than unity, then
the terms involving $C_{n-m}$ may be neglected and equations (21) and (25), respectively, may be replaced by the approximate equalities

$$C_m^T C_m = I_m \tag{26}$$

and

$$B_k = C_m^T D_m C_m \tag{27}$$
The new set of eigenvector approximations $U_{k+1}$ is constructed by orthonormalising the columns of the matrix $W_k$,
where

$$W_k = A \, U_k \operatorname{transform}(B_k) = A \, (Q_m C_m + Q_{n-m} C_{n-m}) \operatorname{transform}(B_k) = (Q_m D_m C_m + Q_{n-m} D_{n-m} C_{n-m}) \operatorname{transform}(B_k) \tag{28}$$

Let $\operatorname{transform}(B_k)$ be defined according to

$$\operatorname{transform}(B_k) = C_m^{-1} \tag{29}$$

It follows that equation (28) reduces to

$$W_k = Q_m D_m + Q_{n-m} D_{n-m} C_{n-m} C_m^{-1} \tag{30}$$

which yields a new, improved, but non-orthogonal set of approximations to the m dominant eigenvectors of the matrix A.
The function ortho, when applied to this set, produces the next set of orthonormal eigenvector approximations, $U_{k+1}$.
It can easily be deduced from equations (26) and (27) that the matrix of right eigenvectors of the matrix $B_k$
corresponds to the matrix $C_m^{-1}$: equation (26) implies that $C_m^{-1} = C_m^T$, so equation (27) yields $B_k C_m^{-1} = C_m^T D_m = C_m^{-1} D_m$, and the columns of $C_m^{-1}$ are therefore eigenvectors of $B_k$. The construction of approximations to these eigenvectors has already been described in
§2.0.
If the set $U_{k+1}$ is a sufficiently good approximation to the dominant set of eigenvectors $Q_m$ of A, then it follows from
equation (22) that $B_{k+1}$ must be diagonal. Hence a necessary condition for convergence is the diagonality of $B_k$. However,
since it is possible to find an orthonormal set of m (< n) vectors, R, where R does not consist of the eigenvectors of A,
which satisfies the property that $R^T A \, R$ is diagonal, the diagonality of $B_k$ is not a sufficient condition for the convergence
of the algorithm. Whenever the matrix $B_k$ is diagonal it follows that $\operatorname{transform}(B_k)$ is a unit matrix of order m, the diagonal
elements of $B_{k+1}$ yield approximations to the m dominant eigenvalues of A, and

$$U_{k+1} = Q_m + Q_{n-m} C_{n-m} \tag{31}$$

It may be deduced from equation (31) that, at this stage of the iteration process, each of the m dominant eigenvector
approximations of A is uncontaminated by any other vector belonging to the dominant subset $Q_m$, but may contain a
component of each of the unrequired subdominant eigenvectors in the set $Q_{n-m}$. Further, the construction of the matrix
$\operatorname{transform}(B_k)$ becomes redundant in subsequent iterations. Thus, the diagonality of $B_k$ signifies the completion of the first
cycle of the algorithm and the commencement of the second.
In the second cycle an extension of Bauer’s algorithm [7] is used to eliminate the vector components of the unwanted
subdominant eigenvectors. Essentially, in each iteration, the set of premultiplied eigenvector approximations $A \, U_k$ is
orthonormalised to produce a new set of eigenvector approximations, $U_{k+1}$. If the set $U_k$ is ordered so that the vectors $u_i$,
$1 \le i \le m$, correspond to the eigenvalue approximations taken in descending order of magnitude, then it can be shown that
this iterative process eventually yields the set $Q_m$ to some predetermined degree of accuracy. It can also be concluded that
the convergence rate of the algorithm is ultimately determined by the ratio $|\lambda_{m+1}| / |\lambda_m|$.
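The convergence behaviour described above may be checked empirically with the earlier sketch on a synthetic matrix whose spectrum is known; the spectrum below is an assumption chosen so that the ratio $|\lambda_4| / |\lambda_3|$ is small.

    # Synthetic test matrix: |lambda_4| / |lambda_3| = 0.1/6.0, so the
    # convergence ratio is small and POTS should converge rapidly.
    rng = np.random.default_rng(1)
    Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
    A = Q @ np.diag([10.0, 8.0, 6.0, 0.1, 0.05, 0.01]) @ Q.T
    vals, vecs = pots(A, m=3)
    print(np.sort(vals)[::-1])    # expect approximately [10.0, 8.0, 6.0]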
4: Implementation and numerical experience
Each of the algorithms was implemented on the AMT DAP 510, an array processor with edgesize 32. The language
used was Fortran Plus Enhanced [11], a language which permits the programmer to disregard the edgesize of the machine.
Real variables are declared to be of type REAL*8 and, for simplicity, the columns of the matrix $U_0$ are equated to the first
m columns of the identity matrix of order n. The matrix $B_k$ is deemed to be diagonal when the modulus of each of its
off-diagonal components is less than a predetermined value, $\varepsilon_1$, which is usually taken to be $0.5 \times 10^{-14}$. Termination of the
secondary cycle is deemed to have occurred when the modulus of each component of the difference between two
successive eigenvector approximations, $U_{k+1}$ and $U_k$, is less than another predetermined value, $\varepsilon_2$, usually taken to be
$0.5 \times 10^{-8}$. The Lanczos algorithm used for comparative purposes is one of the computational variants discussed by
Paige [5], the implementation of which incorporates the QR algorithm. The algorithms were used to compute partial
eigensolutions for a collection of matrices and a selection of the results obtained are presented in Tables 1-4.
Table 1 presents the results which were obtained when POTS was used to compute partial eigensolutions for three
matrices taken from Gregory and Karney [12]. The first matrix is the well-known Rosser matrix, which has eigenvalues
$\{1020.0490184, -1020.0490184, 1020, 1019.9019514, 1000, 1000, 0.0980486, 0.0\}$, the second matrix has well
separated eigenvalues, and the third matrix has one zero eigenvalue and ten pairs of eigenvalues, where the eigenvalues of
each pair are equal in modulus but opposite in sign.
Table 2 presents the results which were obtained when POTS was used to compute a selection of partial eigensolutions
for large matrices, the elements of which were selected at random to lie in the range (-250,250). Subsequently, the results
presented in Table 3 provide data from which a comparison between POTS and POTS2 may be made, the examples used
being identical to those used for Table 2. Finally, Table 4 presents the results which were obtained when both POTS and
Lanczos were used to compute the ‘m’ numerically largest eigenvalues of the given matrices.
In all tables the subset of eigenvalues closest to $|\lambda_1|$ is deemed to represent the set of ‘m’ numerically largest eigenvalues.
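For readers wishing to reproduce experiments of the kind summarised in Tables 2-4, symmetric matrices with randomly chosen elements in the range (-250, 250) may be generated as below; the seed and the symmetrisation via the upper triangle are assumptions, since the paper does not specify its generator.

    def random_symmetric(n, seed=0):
        # Symmetric matrix whose entries are drawn uniformly from (-250, 250).
        rng = np.random.default_rng(seed)
        M = rng.uniform(-250.0, 250.0, size=(n, n))
        return np.triu(M) + np.triu(M, 1).T    # mirror the upper triangle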
5: Conclusions
It is clear from Table 1 that POTS was successfully used to compute partial eigensolutions of small matrices where the
eigenvalue subsets contained well separated eigenvalues, a zero eigenvalue, close and equal eigenvalues, and eigenvalues
equal in modulus but opposite in sign, respectively. Observe that the smaller the convergence ratio, the better the
convergence characteristics and that as this ratio approaches unity the algorithm becomes very inefficient.
The results for partial eigensolutions of large matrices, which are presented in Table 2, confirm the observations made
above for the computation of partial eigensolutions of small matrices. Furthermore, partial eigensolutions of matrices whose
order exceeded the edgesize of the machine were also efficiently computed. For each of the examples in this table it was
observed that the time taken to compute the complete eigensolution using the standard POT algorithm [8],[9] proved to
be considerably greater than the time taken to compute the corresponding partial eigensolution. This was not the case for
each of the smaller matrices.
It may be observed from Table 3 that POTS2 is marginally more efficient than POTS. For the examples chosen, the
extra premultiplication which occurs in POTS2 produces a significant reduction in the number of iterations taken in the
first cycle. However, the expected gain in efficiency is considerably reduced by the time taken to compute these extra
premultiplications. Since one of the purposes of the first cycle is to produce a relatively good set of eigenvector
approximations which is then used as the initial set of approximations in the second cycle, it is to be expected that both
methods should differ only slightly in the number of iterations required in the second cycle. Table 3 confirms this
expectation.
It appears from Table 4 that, in general, POTS is more efficient than Lanczos for the computation of a few of the
numerically largest eigenvalues and their corresponding eigenvectors. However, when this is not the case, it has been
observed that the numerically largest eigenvalues of the given matrix tend to be closely clustered and the eigenvalue
spread tends to be quite large. These factors favour the rapid convergence of the Lanczos algorithm. Further, it should be
noted that the use of a different implementation of the Lanczos variant used (e.g. use of the method of bisection instead of
the QR method), or indeed the use of an entirely different variant, may prove to be more efficient than POTS in all cases.
In general, the accuracy of the partial eigensolutions depended on the position of the required eigenvalues within the eigenvalue
spectrum. Thus, provided that the required eigenvalues were not close to $|\lambda_n|$, it was observed that the eigenvalues were
correct to at least ten significant figures and in most of these cases they were correct to at least ten decimal places.
Otherwise, the eigenvalues were found to be correct to at least eight significant figures. In the latter cases accurate
solutions were obtained when the values of $\varepsilon_1$ and $\varepsilon_2$ were increased, the upper bounds being $0.5 \times 10^{-7}$ and $0.5 \times 10^{-5}$,
respectively. Note that the above accuracy was achieved despite the fact that the initial eigenvector approximations, $U_0$,
were not necessarily close to the required eigenvectors $Q_m$.
In conclusion, provided that the ratio of convergence is not close to unity, POTS appears to be an efficient and useful
algorithm for the computation of partial eigensolutions of large symmetric matrices. However, since the ratio of
convergence cannot be determined in advance, further research is required to produce a version of POTS which could be
efficiently used under all circumstances.
Acknowledgement
The research described in this paper was funded by the Department of Education for Northern Ireland and the work
was carried out using the facilities of the Parallel Computer Centre at the Queen’s University of Belfast.
References
[1] Pini, G., A parallel algorithm for the partial eigensolution of sparse symmetric matrices on the Cray Y-MP, SIAM Journal on Matrix Analysis and Applications, 28, 1752-1775, (1991).
[2] Kuczynski, J., and H. Wozniakowski, Estimating the Largest Eigenvalue by the Power and Lanczos Algorithms with a Random Start, SIAM Journal on Matrix Analysis and Applications, 13, 1094-1122, (1992).
[3] Clint, M., and A. Jennings, The Evaluation of Eigenvalues and Eigenvectors of Real Symmetric Matrices by Simultaneous Iteration, Comp. Jnl., 13, 76-80, (1970).
[4] Weston, J. S., An Iterative Method for the Computation of Subsets of Eigenvalues and Associated Eigenvectors of Real Symmetric Matrices, M.Sc. Dissertation, The Queen’s University of Belfast, (1975).
[5] Paige, C. C., Computational Variants of the Lanczos Method for the Eigenproblem, J. Inst. Maths Applics., 10, 373-381, (1972).
[6] Stewart, G. W., Introduction to Matrix Computations, Academic Press, New York, (1973).
[7] Bauer, F. L., Das Verfahren der Treppeniteration und verwandte Verfahren zur Lösung algebraischer Eigenwertprobleme, Z. Angew. Math. Phys., 8, 214-235, (1957).
[8] Clint, M., Holt, C., Perrott, R. and A. Stewart, A Comparison of Two Parallel Algorithms for the Symmetric Eigenproblem, International Journal of Comp. Math., 15, 291-302, (1984).
[9] Weston, J. S. and M. Clint, Two algorithms for the parallel computation of Eigenvalues and Eigenvectors of large symmetric
matrices using the ICL DAP, Parallel Computing, 13, 281-288, (1990).
[10] Clint, M., Weston, J. S., and C. W. Bleakney, A Comparison of Two Fortran Dialects for Expressing Solutions for a Problem in
Linear Algebra, to appear in Parallel Computing.
[11] Fortran-Plus Enhanced, man 102.01, AMT (1990).
[12] Gregory, R. T. and D. L. Karney, A collection of Matrices for Testing Computational Algorithms, Wiley-Interscience, (1969).