An Algorithm for the Parallel Computation of Subsets of Eigenvalues and Associated Eigenvectors of Large Symmetric Matrices using an Array Processor

E. J. Stuart and J. S. Weston
Department of Computing Science, University of Ulster, Coleraine, Northern Ireland.

Abstract

In this paper the parallel implementation on an array processor and the mathematical basis of the POTS algorithm for the computation of subsets of eigenpairs of a real symmetric matrix of order $n$, $8 \le n \le 256$, are discussed. An adaptation of the algorithm incorporating an acceleration technique is presented and contrasted with the original. Finally, the execution time efficiency of the algorithms for the computation of partial eigensolutions of a variety of matrices is presented, analysed and compared with that of a parallel Lanczos algorithm for the computation of subsets of eigenpairs of real symmetric matrices.

Keywords: Eigenvalues, symmetric matrices, array processors, orthogonal transformations.

1: Introduction

The eigensolution of a real symmetric $(n \times n)$ matrix $A$ is a real $(n \times n)$ diagonal matrix $D$ and an $(n \times n)$ orthogonal matrix $Q$ which satisfy the equation

$Q^T A Q = D$   (1)

where the eigenvalues of $A$ are the diagonal elements of $D$ and the eigenvectors of $A$ are the columns of $Q$. However, in many scientific and engineering applications, only a few eigenvalues and their corresponding eigenvectors need be evaluated. Consequently, a significant amount of research is now being expended in the development and parallel implementation of algorithms for the computation of subsets of eigenvalues and eigenvectors [1], [2]. It is to be expected that the use of such algorithms, in those instances where only a few eigenvalues and their corresponding eigenvectors are required, would be more efficient and more cost effective than the use of any standard algorithm to compute the complete eigensolution from which the required information could be extracted.

In this paper a detailed discussion of the POTS algorithm, which permits the computation of a subset of eigenvalues and the corresponding subset of eigenvectors, is presented. The computed eigenvalues may correspond to the numerically largest, the numerically smallest, or those closest to a given numerical value. The algorithm is based on the simultaneous iteration method of Clint and Jennings [3] and has been developed, sequentially implemented, and evaluated for subsets of matrices of order $n$, $3 \le n \le 21$, by Weston [4]. Partial eigensolutions for a variety of matrices were computed using this algorithm and a selection of the results obtained are presented in tabular form. POTS is also compared with a parallel Lanczos algorithm [5] for the computation of subsets of eigenpairs of real symmetric matrices and finally some conclusions are drawn.

2: Parallel Orthogonal Transformations for the computation of Subsets of eigenvalues and associated eigenvectors of real symmetric matrices (POTS)

When used to compute the $m$ numerically largest eigenvalues, and their associated eigenvectors, of a real symmetric matrix $A$ of order $n$, POTS takes the following form:

Let $D_m$ be the real diagonal matrix, of order $m$, whose diagonal components yield the required subset of eigenvalues, let $Q_m$ be the $(n \times m)$ orthonormal matrix whose columns represent the associated eigenvectors, and let $U_0$ be an $(n \times m)$ arbitrary orthonormal matrix whose columns represent approximations to these eigenvectors. Construct the sequence of eigenvector approximations $\{U_k\}$ as follows:
$U_{k+1} = \text{ortho}(A \, U_k \, \text{transform}(B_k)), \quad 0 \le k < x$   (2)

$U_{k+1} = \text{ortho}(A \, U_k), \quad x \le k$   (3)

where $x$ is the minimum value such that $B_k$ is diagonal. $B_k$ is defined according to

$B_k = U_k^T (A \, U_k), \quad 0 \le k < x$   (4)

It follows that

$\lim_{k \to \infty} U_k = Q_m$ and $Q_m^T A Q_m = D_m$   (5)

The function ortho is an extension of the Gram-Schmidt orthogonalisation process [6] and the function transform returns a non-orthogonal matrix, of order $m$, the columns of which represent approximations to the eigenvectors of $B_k$. These approximations are constructed as follows:

The matrix $B_k$ is visualised as a set of $m(m-1)/2$ overlapping $2 \times 2$ symmetric submatrices, each of which contains two diagonal components of the matrix $B_k$. If $b_{ij}$ denotes the $(i,j)$th component of the matrix $B_k$, then a typical submatrix, $B_k(i,j)$, is defined by

$B_k(i,j) = \begin{pmatrix} b_{ii} & b_{ij} \\ b_{ij} & b_{jj} \end{pmatrix}$   (6)

It follows that the eigenvectors of the submatrix $B_k(i,j)$ may be expressed as the columns of

$T_k(i,j) = \begin{pmatrix} 1 & -t_{ij} \\ t_{ij} & 1 \end{pmatrix}$   (7)

where

$t_{ij} = 2 b_{ij} \big/ \left( d_{ji} + \text{sign}(d_{ji}) \sqrt{d_{ji}^2 + 4 b_{ij}^2} \right), \quad i < j$   (8)

$d_{ji} = (b_{jj} - b_{ii})$ and   (9)

$\text{sign}(0) = -1$   (10)

Thus, the matrix transform($B_k$) is formed from the corresponding overlapping of the $m(m-1)/2$ submatrices of the eigenvector matrices $T_k(i,j)$. Observe that the value whose square root is determined in equation (8) is real, and that the sign of the square root is chosen so as to maximise the magnitude of the denominator in the expression for $t_{ij}$.

Clearly, the algorithm consists of two distinct iterative cycles, where the termination of the primary cycle and the beginning of the secondary cycle is signalled by the diagonalisation of the matrix $B_k$, since from this point onwards the function transform returns the identity matrix of order $m$. The secondary cycle is essentially an extension of Bauer's method [7]. Thus, at each stage of this cycle, a set of orthonormal eigenvector approximations, $U_k$, is premultiplied by the matrix $A$ and then reorthonormalised, thereby producing a new set of orthonormal eigenvector approximations, $U_{k+1}$. These approximations eventually converge onto the $m$ eigenvectors represented by the matrix $Q_m$, provided that the set $U_k$ is ordered such that the eigenvector approximations $u_i$, $1 \le i \le m$, correspond to the eigenvalue approximations taken in descending order of magnitude. If $\lambda_i$, $1 \le i \le n$, are the eigenvalues of $A$, where

$|\lambda_1| \ge |\lambda_2| \ge \ldots \ge |\lambda_{m-1}| \ge |\lambda_m| \ge \ldots \ge |\lambda_n|$   (11)

then the overall rate of convergence of the algorithm is determined by the ratio $|\lambda_{m+1}| / |\lambda_m|$. In the case where $m = n$, the secondary cycle is not required and the method reduces to the standard POT algorithm [8], [9], which permits the computation of the complete set of eigenvalues and their associated eigenvectors. It has been established that all of the operations of the POT algorithm are highly efficient when implemented on an array processor [10]. It follows that the operations of POTS exhibit the same characteristics.
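The structure of equations (2)-(10) may be made concrete with a short sketch. The following is a minimal dense NumPy illustration of the two-cycle iteration, not the DAP implementation: QR factorisation stands in for the paper's extended Gram-Schmidt ortho, the initial set $U_0$ is the one described in §4, and the function names and iteration limit are illustrative.

import numpy as np

def transform(B):
    # Approximate eigenvector matrix of B built from the m(m-1)/2
    # overlapping 2x2 submatrices, equations (6)-(10).
    m = B.shape[0]
    T = np.eye(m)
    for i in range(m):
        for j in range(i + 1, m):
            if B[i, j] == 0.0:
                continue                                 # t_ij = 0
            d = B[j, j] - B[i, i]                        # d_ji, eq (9)
            s = -1.0 if d == 0.0 else np.sign(d)         # sign(0) = -1, eq (10)
            t = 2.0 * B[i, j] / (d + s * np.hypot(d, 2.0 * B[i, j]))  # eq (8)
            T[i, j], T[j, i] = -t, t                     # overlap of T_k(i,j), eq (7)
    return T

def ortho(W):
    # Column orthonormalisation; QR stands in for the extended
    # Gram-Schmidt process of [6].  Column signs are fixed so that
    # successive iterates can be compared component by component.
    Q, R = np.linalg.qr(W)
    return Q * np.sign(np.diag(R))

def pots_largest(A, m, eps1=0.5e-14, eps2=0.5e-8, max_iter=5000):
    # Dense sketch of POTS for the m numerically largest eigenpairs.
    n = A.shape[0]
    U = np.eye(n)[:, :m]                  # U0: first m columns of I (see section 4)
    diagonal = False                      # primary cycle runs until Bk is diagonal
    for _ in range(max_iter):
        B = U.T @ (A @ U)                 # eq (4)
        if not diagonal and np.max(np.abs(B - np.diag(np.diag(B)))) < eps1:
            diagonal = True
            U = U[:, np.argsort(-np.abs(np.diag(B)))]    # order for Bauer's cycle
        W = A @ U if diagonal else A @ U @ transform(B)  # eqs (2)-(3)
        U_new = ortho(W)
        if diagonal and np.max(np.abs(U_new - U)) < eps2:
            U = U_new
            break                         # secondary cycle has converged
        U = U_new
    return np.diag(U.T @ A @ U), U        # eigenvalue and eigenvector estimates

For example, pots_largest(A, 4) returns approximations to the four numerically largest eigenvalues of a symmetric array A together with their eigenvectors; the ratio $|\lambda_5| / |\lambda_4|$ then governs how quickly the loop terminates.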
2.1: The 'm' numerically smallest eigenvalues

It can easily be shown that the eigenvalues of the matrix $A^{-1}$ may be expressed in the form

$\{\, 1/\lambda_1,\ 1/\lambda_2,\ \ldots,\ 1/\lambda_n \,\}$   (12)

where in this case the $m$ numerically largest eigenvalues are contained in the subset

$\{\, 1/\lambda_n,\ 1/\lambda_{n-1},\ \ldots,\ 1/\lambda_{n-m+1} \,\}$   (13)

Clearly, the reciprocals of the elements of this subset yield the $m$ numerically smallest eigenvalues of the matrix $A$. Consequently, the $m$ numerically smallest eigenvalues of $A$ may be evaluated by first computing the $m$ numerically largest eigenvalues of $A^{-1}$ and then inverting each of these values. The algorithm described in §2.0 may be deployed in this process where the matrix $A$ is replaced by the matrix $A^{-1}$. In this case the overall rate of convergence is determined by the ratio $|\lambda_{n-m+1}| / |\lambda_{n-m}|$.

In the implementation of this revised algorithm the product

$V_k = A^{-1} U_k$   (14)

must be evaluated at each iteration. Since matrix inversion is a computationally expensive process, the expression in (14) may be rearranged to yield the system of linear equations

$A \, V_k = U_k$   (15)

which may be solved using LU decomposition followed by the use of forward and backward substitution techniques. Observe that the decomposition of $A$, the major computational overhead in this approach, is performed only once. Consequently, in each iteration of both cycles of the revised algorithm, the computation of $V_k$, $k > 0$, amounts to one forward and one backward substitution.

2.2: The 'm' eigenvalues closest to a scalar 'p'

If $p$ is any scalar then it is easily shown that the elements of the sets

$\{\, \lambda_1 - p,\ \lambda_2 - p,\ \ldots,\ \lambda_n - p \,\}$   (16)

and

$\{\, 1/(\lambda_1 - p),\ 1/(\lambda_2 - p),\ \ldots,\ 1/(\lambda_n - p) \,\}$   (17)

are the eigenvalues of the matrices $(A - pI)$ and $(A - pI)^{-1}$, respectively, where $I$ is the unit matrix of order $n$. It follows that the $m$ numerically largest eigenvalues of the matrix $(A - pI)^{-1}$ are identical to the reciprocals of the $m$ numerically smallest eigenvalues of the matrix $(A - pI)$, which in turn correspond to the $m$ eigenvalues of $A$ which are closest to the scalar $p$. Consequently, the required eigenvalues may be evaluated by first computing the reciprocal value of each of the $m$ numerically largest eigenvalues of the matrix $(A - pI)^{-1}$ and then adding the value of $p$ to each. Clearly, the algorithm described in §2.0 may be deployed in this process where the matrix $A$ is replaced by the matrix $(A - pI)^{-1}$. Observe that when $p = 0$ this reduces to the case discussed in §2.1 above. Furthermore, when $(A - pI)$ is singular the solution of the set of linear equations

$(A - pI) \, V_k = U_k$   (18)

is either inconsistent, or consistent with infinitely many solutions. This degeneracy can be alleviated by increasing or decreasing the modulus of $p$ by a small amount, effectively yielding a shift of origin. A brief sketch of this factor-once, solve-per-iteration scheme is given at the end of §2.3 below.

2.3: The acceleration process

Equation (2) shows that, at each stage of the construction of the sequence of eigenvector approximations $\{U_k\}$, $U_k$ is premultiplied by either $A$, $A^{-1}$, or $(A - pI)^{-1}$, according to whether the required subset of eigenvalues corresponds to the numerically largest, the numerically smallest, or those which are closest to a given numerical value $p$. These matrix multiplications help to accentuate the dominance of the eigenvectors onto which the eigenvector approximations are converging. It may be anticipated that a further premultiplication by the appropriate matrix at this stage would further accentuate this eigenvector dominance, thereby enabling a faster overall rate of convergence. Hence, another version of the POTS algorithm, POTS2, which incorporates such a premultiplication, was implemented and compared with the original. Observe that a version of POTS which does not incorporate any premultiplications whatsoever could be constructed and compared with the versions discussed in this paper.
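The following minimal sketch, in the same dense NumPy setting as before, illustrates the shift-and-invert scheme of §2.1 and §2.2: $(A - pI)$ is factorised once, each iteration then costs one forward and one backward substitution, and the computed eigenvalues are mapped back by inverting and adding $p$. For brevity it keeps only the secondary (Bauer) cycle of the algorithm; the function name and fixed iteration count are illustrative.

import numpy as np
from scipy.linalg import lu_factor, lu_solve

def closest_eigenvalues(A, p, m, iters=500):
    # The m eigenvalues of a symmetric matrix A closest to the shift p,
    # obtained as the numerically largest eigenvalues of (A - pI)^{-1}.
    n = A.shape[0]
    lu, piv = lu_factor(A - p * np.eye(n))   # single O(n^3) decomposition
    U = np.eye(n)[:, :m]
    for _ in range(iters):
        V = lu_solve((lu, piv), U)           # solves (A - pI) V = U, eq (18)
        U, _ = np.linalg.qr(V)               # reorthonormalise the approximations
    # Rayleigh quotients give eigenvalues mu of (A - pI)^{-1};
    # since mu = 1/(lambda - p), it follows that lambda = 1/mu + p.
    mu = np.diag(U.T @ lu_solve((lu, piv), U))
    return 1.0 / mu + p

# Setting p = 0 yields the m numerically smallest eigenvalues of A, as
# in section 2.1; perturbing p slightly avoids a singular (A - pI).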
3: The Mathematical Basis of POTS

A brief analysis of the mathematical basis of POTS, as defined in §2.0, is presented below. Essentially POTS is an iterative algorithm where, during the course of each iteration in either cycle, new and better sets of eigenvalue and eigenvector approximations are constructed.

Consider first the process of constructing the $(k+1)$th sets of approximations in an iteration of the first cycle. Since $A$ is a real symmetric matrix its eigenvectors are linearly independent. Consequently, each of the eigenvector approximations represented by the columns of the matrix $U_k$ may be expressed as a linear combination of these eigenvectors. Thus, $U_k$ may be expressed in the form

$U_k = Q_m C_m + Q_{n-m} C_{n-m}$   (19)

where $Q_m$ is the $(n \times m)$ matrix whose columns represent the $m$ dominant eigenvectors of $A$, $Q_{n-m}$ is the $(n \times (n-m))$ matrix whose columns represent the $(n-m)$ subdominant eigenvectors of $A$, and $C_m$ and $C_{n-m}$ are matrices of unknown coefficients of order $(m \times m)$ and $((n-m) \times m)$, respectively. Since $U_k$ is an orthonormal set, it follows that

$U_k^T U_k = (C_m^T Q_m^T + C_{n-m}^T Q_{n-m}^T)(Q_m C_m + Q_{n-m} C_{n-m}) = I_m$   (20)

where $I_m$ is the identity matrix of order $m$. Using the orthonormal properties of the eigenvectors of $A$, this reduces to

$C_m^T C_m + C_{n-m}^T C_{n-m} = I_m$   (21)

The matrix $B_k$ is defined according to

$B_k = U_k^T A \, U_k = (C_m^T Q_m^T + C_{n-m}^T Q_{n-m}^T) \, A \, (Q_m C_m + Q_{n-m} C_{n-m})$   (22)

If $D_m$ and $D_{n-m}$ are diagonal matrices of order $m$ and $(n-m)$, respectively, whose non-zero elements are the $m$ dominant and $(n-m)$ subdominant eigenvalues of $A$, respectively, then

$A Q_m = Q_m D_m$ and   (23)

$A Q_{n-m} = Q_{n-m} D_{n-m}$   (24)

It follows that equation (22) simplifies to

$B_k = C_m^T D_m C_m + C_{n-m}^T D_{n-m} C_{n-m}$   (25)

If we assume that the initial set of eigenvector approximations, $U_0$, is only slightly contaminated by the $(n-m)$ subdominant eigenvectors represented by $Q_{n-m}$, and in addition, that the Euclidean norm of $A$ is less than unity, then equations (21) and (25), respectively, may be replaced by the approximate equalities

$C_m^T C_m = I_m$ and   (26)

$B_k = C_m^T D_m C_m$   (27)

The new set of eigenvector approximations $U_{k+1}$ is constructed by orthonormalising the columns of the matrix $W_k$, where

$W_k = A \, U_k \, \text{transform}(B_k) = A (Q_m C_m + Q_{n-m} C_{n-m}) \, \text{transform}(B_k) = (Q_m D_m C_m + Q_{n-m} D_{n-m} C_{n-m}) \, \text{transform}(B_k)$   (28)

Let transform($B_k$) be defined according to

$\text{transform}(B_k) = C_m^{-1}$   (29)

It follows that equation (28) reduces to

$W_k = Q_m D_m + Q_{n-m} D_{n-m} C_{n-m} C_m^{-1}$   (30)

which yields a new, improved, but non-orthogonal set of approximations to the $m$ dominant eigenvectors of the matrix $A$. The function ortho, when applied to this set, produces the next set of orthonormal eigenvector approximations, $U_{k+1}$. It can easily be deduced from equations (26) and (27) that the matrix of right eigenvectors of the matrix $B_k$ corresponds to the matrix $C_m^{-1}$. The construction of approximations to these eigenvectors has already been described in §2.0.

If the set $U_{k+1}$ is a sufficiently good approximation to the dominant set of eigenvectors $Q_m$ of $A$, then it follows from equation (22) that $B_{k+1}$ must be diagonal. Hence a necessary condition for convergence is the diagonality of $B_k$. However, since it is possible to find an orthonormal set of $m$ ($< n$) vectors, $R$, where $R$ does not consist of the eigenvectors of $A$, which satisfies the property that $R^T A R$ be diagonal, the diagonality of $B_k$ is not a sufficient condition for the convergence of the algorithm.
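The pivotal step in this derivation is that, under the approximate equalities (26) and (27), the matrix of right eigenvectors of $B_k$ is $C_m^{-1}$ (which equals $C_m^T$ for orthogonal $C_m$), and this is precisely what transform($B_k$) approximates. The following short numerical check, with illustrative values, confirms the claim.

import numpy as np

rng = np.random.default_rng(0)
m = 4

# An orthogonal coefficient matrix Cm (so Cm^T Cm = Im, eq (26)) and a
# diagonal Dm of distinct dominant eigenvalues.
Cm, _ = np.linalg.qr(rng.standard_normal((m, m)))
Dm = np.diag([9.0, 7.0, 4.0, 2.0])

Bk = Cm.T @ Dm @ Cm          # eq (27)

# The right eigenvectors of Bk should be the columns of Cm^{-1} = Cm^T:
# Bk (Cm^T) = Cm^T Dm, so each column of Cm^T is an eigenvector of Bk.
_, V = np.linalg.eigh(Bk)

# eigh orders its eigenvalues ascendingly and fixes column signs
# arbitrarily, so compare the two bases up to order and sign:
# |V^T Cm^T| should be the reversal permutation matrix.
print(np.allclose(np.abs(V.T @ Cm.T), np.eye(m)[::-1]))   # True

Premultiplying $A U_k$ by this matrix in equation (28) is what strips the mixing $C_m$ out of the dominant term, leaving the clean $Q_m D_m$ in equation (30).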
Whenever the matrix $B_k$ is diagonal it follows that transform($B_k$) is a unit matrix of order $m$, the diagonal elements of $B_{k+1}$ yield approximations to the $m$ dominant eigenvalues of $A$, and

$U_{k+1} = Q_m + Q_{n-m} C_{n-m}$   (31)

It may be deduced from equation (31) that, at this stage of the iteration process, each of the $m$ dominant eigenvector approximations of $A$ is uncontaminated by any other vector belonging to the dominant subset $Q_m$, but may contain a component of each of the unrequired subdominant eigenvectors in the set $Q_{n-m}$. Further, the construction of the matrix transform($B_k$) becomes redundant in subsequent iterations. Thus, the diagonality of $B_k$ signifies the completion of the first cycle of the algorithm and the commencement of the second.

In the second cycle an extension of Bauer's algorithm [7] is used to eliminate the vector components of the unwanted subdominant eigenvectors. Essentially, in each iteration, the set of premultiplied eigenvector approximations $A U_k$ is orthonormalised to produce a new set of eigenvector approximations, $U_{k+1}$. If the set $U_k$ is ordered so that the vectors $u_i$, $1 \le i \le m$, correspond to the eigenvalue approximations taken in descending order of magnitude, then it can be shown that this iterative process eventually yields the set $Q_m$ to some predetermined degree of accuracy. It can also be concluded that the convergence rate of the algorithm is ultimately determined by the ratio $|\lambda_{m+1}| / |\lambda_m|$.

4: Implementation and numerical experience

Each of the algorithms was implemented on the AMT DAP 510, an array processor with edgesize 32. The language used was Fortran Plus Enhanced [11], a language which permits the programmer to disregard the edgesize of the machine. Real variables are declared to be of type REAL*8 and, for simplicity, the columns of the matrix $U_0$ are equated to the first $m$ columns of the identity matrix of order $n$. The matrix $B_k$ is deemed to be diagonal when the modulus of each of its off-diagonal components is less than a predetermined value, $\epsilon_1$, which is usually taken to be $0.5 \times 10^{-14}$. Termination of the secondary cycle is deemed to have occurred when the modulus of each component of the difference between two successive eigenvector approximations, $U_{k+1}$ and $U_k$, is less than another predetermined value, $\epsilon_2$, usually taken to be $0.5 \times 10^{-8}$.
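In the dense NumPy setting of the earlier sketches, these two stopping tests might be expressed as follows; the names are illustrative and the defaults are the values quoted above.

import numpy as np

EPS1 = 0.5e-14   # diagonality threshold for Bk (ends the primary cycle)
EPS2 = 0.5e-8    # successive-approximation threshold (ends the secondary cycle)

def b_is_diagonal(B, eps1=EPS1):
    # Bk counts as diagonal when every off-diagonal component is
    # smaller in modulus than eps1.
    return np.max(np.abs(B - np.diag(np.diag(B)))) < eps1

def vectors_converged(U_next, U, eps2=EPS2):
    # The secondary cycle terminates when every component of
    # U_{k+1} - U_k is smaller in modulus than eps2.
    return np.max(np.abs(U_next - U)) < eps2

As noted in the conclusions below, looser values of $\epsilon_1$ and $\epsilon_2$ may be needed when the required eigenvalues lie close to $|\lambda_n|$.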
The Lanczos algorithm used for comparative purposes is one of the computational variants discussed by Paige [5], the implementation of which incorporates the QR algorithm.

The algorithms were used to compute partial eigensolutions for a collection of matrices and a selection of the results obtained are presented in Tables 1-4. Table 1 presents the results which were obtained when POTS was used to compute partial eigensolutions for three matrices taken from Gregory and Karney [12]. The first matrix is the well-known Rosser matrix, which has eigenvalues {1020.0490184, -1020.0490184, 1020, 1019.9019514, 1000, 1000, 0.0980486, 0.0}; the second matrix has well separated eigenvalues; and the third matrix has one zero eigenvalue and ten pairs of eigenvalues, where the eigenvalues of each pair are equal in modulus but opposite in sign. Table 2 presents the results which were obtained when POTS was used to compute a selection of partial eigensolutions for large matrices, the elements of which were selected at random to lie in the range (-250, 250). Subsequently, the results presented in Table 3 provide data from which a comparison between POTS and POTS2 may be made, the examples used being identical to those used for Table 2. Finally, Table 4 presents the results which were obtained when both POTS and Lanczos were used to compute the $m$ numerically largest eigenvalues of the given matrices. In all tables the subset of eigenvalues closest to $|\lambda_1|$ is deemed to represent the set of $m$ numerically largest eigenvalues.

5: Conclusions

It is clear from Table 1 that POTS was successfully used to compute partial eigensolutions of small matrices where the eigenvalue subsets contained well separated eigenvalues, a zero eigenvalue, close and equal eigenvalues, and eigenvalues equal in modulus but opposite in sign, respectively. Observe that the smaller the convergence ratio, the better the convergence characteristics, and that as this ratio approaches unity the algorithm becomes very inefficient.

The results for partial eigensolutions of large matrices, which are presented in Table 2, confirm the observations made above for the computation of partial eigensolutions of small matrices. Furthermore, partial eigensolutions whose order exceeded the edgesize of the machine were also efficiently computed. For each of the examples in this table it was observed that the time taken to compute the complete eigensolution using the standard POT algorithm [8], [9] proved to be considerably greater than the time taken to compute the corresponding partial eigensolution. This was not the case for each of the smaller matrices.

It may be observed from Table 3 that POTS2 is marginally more efficient than POTS. For the examples chosen, the extra premultiplication which occurs in POTS2 produces a significant reduction in the number of iterations taken in the first cycle. However, the expected gain in efficiency is considerably reduced by the time taken to compute these extra premultiplications. Since one of the purposes of the first cycle is to produce a relatively good set of eigenvector approximations which is then used as the initial set of approximations in the second cycle, it is to be expected that both methods should differ only slightly in the number of iterations required in the second cycle. Table 3 confirms this expectation.

It appears from Table 4 that, in general, POTS is more efficient than Lanczos for the computation of a few of the numerically largest eigenvalues and their corresponding eigenvectors. However, when this is not the case, it has been observed that the numerically largest eigenvalues of the given matrix tend to be closely clustered and the eigenvalue spread tends to be quite large. These factors favour the rapid convergence of the Lanczos algorithm. Further, it should be noted that a different implementation of the Lanczos variant used (e.g. use of the method of bisection instead of the QR method), or indeed an entirely different variant, may prove to be more efficient than POTS in all cases.

In general, the accuracy of the partial eigensolutions depended on the position of the required eigenvalues within the eigenvalue spectrum. Thus, provided that the required eigenvalues were not close to $|\lambda_n|$, it was observed that the eigenvalues were correct to at least ten significant figures and in most of these cases they were correct to at least ten decimal places. Otherwise, the eigenvalues were found to be correct to at least eight significant figures. In the latter cases accurate solutions were obtained when the values of $\epsilon_1$ and $\epsilon_2$ were increased, the upper bounds being $0.5 \times 10^{-7}$ and $0.5 \times 10^{-5}$, respectively.
Note that the above accuracy was achieved despite the fact that the initial eigenvector approximations, $U_0$, were not necessarily close to the required eigenvectors $Q_m$.

In conclusion, provided that the ratio of convergence is not close to unity, POTS appears to be an efficient and useful algorithm for the computation of partial eigensolutions of large symmetric matrices. However, since the ratio of convergence cannot be determined in advance, further research is required to produce a version of POTS which could be used efficiently under all circumstances.

Acknowledgement

The research described in this paper was funded by the Department of Education for Northern Ireland and the work was carried out using the facilities of the Parallel Computer Centre at the Queen's University of Belfast.

References

[1] Pini, G., A parallel algorithm for the partial eigensolution of sparse symmetric matrices on the Cray Y-MP, SIAM Journal on Matrix Analysis and Applications, 28, 1752-1775, (1991).
[2] Kuczynski, J. and H. Wozniakowski, Estimating the largest eigenvalue by the power and Lanczos algorithms with a random start, SIAM Journal on Matrix Analysis and Applications, 13, 1094-1122, (1992).
[3] Clint, M. and A. Jennings, The evaluation of eigenvalues and eigenvectors of real symmetric matrices by simultaneous iteration, Comp. Jnl., 13, 76-80, (1970).
[4] Weston, J. S., An Iterative Method for the Computation of Subsets of Eigenvalues and Associated Eigenvectors of Real Symmetric Matrices, M.Sc. Dissertation, The Queen's University of Belfast, (1975).
[5] Paige, C. C., Computational variants of the Lanczos method for the eigenproblem, J. Inst. Maths Applics., 10, 373-381, (1972).
[6] Stewart, G. W., Introduction to Matrix Computations, Academic Press, New York, (1973).
[7] Bauer, F. L., Das Verfahren der Treppeniteration und verwandte Verfahren zur Lösung algebraischer Eigenwertprobleme, Z. Angew. Math. Phys., 8, 214-235, (1957).
[8] Clint, M., Holt, C., Perrott, R. and A. Stewart, A comparison of two parallel algorithms for the symmetric eigenproblem, International Journal of Comp. Math., 15, 291-302, (1984).
[9] Weston, J. S. and M. Clint, Two algorithms for the parallel computation of eigenvalues and eigenvectors of large symmetric matrices using the ICL DAP, Parallel Computing, 13, 281-288, (1990).
[10] Clint, M., Weston, J. S. and C. W. Bleakney, A comparison of two Fortran dialects for expressing solutions for a problem in linear algebra, to appear in Parallel Computing.
[11] Fortran-Plus Enhanced, man 102.01, AMT, (1990).
[12] Gregory, R. T. and D. L. Karney, A Collection of Matrices for Testing Computational Algorithms, Wiley-Interscience, (1969).