The Enhancement of Parallel Numerical Linear Algorithms using a Visualisation Approach

E. J. Stuart and J. S. Weston
Department of Computing Science, University of Ulster, Coleraine, Northern Ireland.

Abstract

Visualisation techniques may be used to determine how the components of data structures change throughout the execution of an iterative algorithm. The information presented by the application of such techniques provides the basis for an efficient analysis of the characteristics of the data structures, thereby yielding insights into the overall behaviour of the algorithm. As a result of these insights, the development of an enhanced algorithm becomes a distinct possibility. In this paper, two different approaches to the visualisation of vectors are identified and used in the analysis of an algorithm for the partial solution of the eigenproblem. Consequently, a modified algorithm is developed which is particularly efficient when implemented on an array processor. Finally, the efficiency of the algorithms for the computation of partial eigensolutions of a variety of matrices on an array processor is presented and analysed.

1: Introduction

Sommerville [1] introduces the law of continuing change by paraphrasing Lehman and Belady [2]: "A program that is used in a real-world environment necessarily must change or become less and less useful in that environment". This concept of change has become particularly applicable to numerical algorithms since the recent advent of new parallel architectures, the commercial availability of which has provided a stimulus for the further development and enhancement of such algorithms. Visualisation may be considered to play an important role in this process.

Treinish [3] states that a variety of different visualisation techniques must be available to examine either a single set of parameters from one source or a number of different parameters from disparate sources. Essentially, such techniques transform numerical data into a geometric form which may be represented by two-dimensional graphics, three-dimensional graphics, or animations. Subsequently, an analysis of the transformed data may yield a deeper understanding of the algorithm, thereby enriching scientific knowledge. However, the greatest potential of visualisation is not the entrancing movies produced by animations, but the knowledge gained and the concepts explained by the perception of existing anomalies [4].

Since the human mind has a considerable capability to capture and understand change in visualised data [5], this paper considers two approaches to the visual representation of the change which occurs in a single vector during the course of an iterative process. These are then applied to a vector in the context of POTS [6], an iterative algorithm for the computation of subsets of eigenvalues and their associated eigenvectors; the vector under discussion represents approximations to the required eigenvalue subset. An analysis of the visualised data explores the convergence characteristics of the eigenvalue subset and, consequently, a modified POTS algorithm is derived. Partial eigensolutions for a variety of matrices are computed using these algorithms and a selection of the results obtained are presented in tabular form. Finally, some conclusions are drawn.
2: The POTS Algorithm

When used to compute the m numerically largest eigenvalues, and their associated eigenvectors, of a real symmetric matrix A of order n, POTS takes the following form. Let D_m be the real diagonal matrix, of order m, whose diagonal components yield the required subset of eigenvalues, let Q_m be the (n × m) orthonormal matrix whose columns represent the associated eigenvectors, and let U_0 be an (n × m) arbitrary orthonormal matrix whose columns represent approximations to these eigenvectors. Construct the sequence of eigenvector approximations {U_k} as follows:

  U_{k+1} = ortho(A · U_k · transform(B_k)),   0 ≤ k < x   (1)

  U_{k+1} = ortho(A · U_k),   x ≤ k   (2)

where x is the minimum value such that B_k is diagonal, and B_k, the interaction matrix of order m × m, is defined by

  B_k = (U_k)^T · (A · U_k),   0 ≤ k < x.   (3)

It follows that

  lim_{k→∞} U_k = Q_m   and   (Q_m)^T · A · Q_m = D_m.   (4)

The function ortho is an extension of the Gram-Schmidt orthogonalisation process [7]. It orthogonalises each of the eigenvector approximations according to the descending order of modulus of their corresponding eigenvalue approximations. The function transform returns a non-orthogonal matrix T_k, of order m, the columns of which represent approximations to the eigenvectors of B_k. The (i,j)th component of T_k, t_ij, is defined by

  t_ij = 2 b_ij / (d_ji + sign(d_ji) · √(d_ji² + 4 b_ij²)),   i < j   (5)

where

  d_ji = (b_jj − b_ii) and sign(0) = −1,   (6)

  t_ji = −t_ij.   (7)

Observe that the value whose square root is determined in equation (5) is real, and that the sign of the square root is chosen so as to maximise the magnitude of the denominator in the expression for t_ij.

Clearly, the algorithm consists of two distinct iterative cycles, where the termination of the primary cycle and the beginning of the secondary cycle is signalled by the diagonalisation of the matrix B_k. One of the main objectives of the primary cycle is the creation of a good set of eigenvector approximations which is subsequently used as input to the secondary cycle, the convergence of this cycle being dependent upon the accuracy of its input. Essentially, the secondary cycle, whose output is the required eigenvector subset, is an extension of Bauer's method [8]; this subset may then be used in the computation of the required eigenvalue subset. Convergence of the secondary cycle is deemed to have occurred when the difference between the modulus of successive eigenvector approximations is less than a predefined tolerance level.

If λ_i, 1 ≤ i ≤ n, are the eigenvalues of A, where

  |λ_1| ≥ |λ_2| ≥ ... ≥ |λ_{m−1}| ≥ |λ_m| ≥ ... ≥ |λ_n|,   (8)

then the overall rate of convergence of the algorithm is determined by the ratio |λ_{m+1}| / |λ_m|. Observe that the smaller this convergence ratio, the better the overall convergence characteristics of the algorithm, and that as the ratio approaches unity the algorithm becomes very inefficient. Also, in the case where m = n, the secondary cycle is not required and the algorithm reduces to the standard POT algorithm [9], [10], which permits the computation of the complete set of eigenvalues and their associated eigenvectors.
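The following is a minimal serial sketch of the two cycles in Python/NumPy, intended only to make equations (1)-(7) concrete; it is not the authors' Fortran Plus Enhanced implementation. The unit diagonal of T_k, the skipping of zero off-diagonal components, and the use of Rayleigh quotients to order the columns within ortho are assumptions made where the paper leaves details open.

```python
import numpy as np

def transform(B):
    """Return T_k, whose columns approximate the eigenvectors of B_k (equations (5)-(7))."""
    m = B.shape[0]
    T = np.eye(m)                                 # unit diagonal: an assumption
    for i in range(m):
        for j in range(i + 1, m):
            if B[i, j] == 0.0:
                continue                          # already decoupled; avoids 0/0
            d = B[j, j] - B[i, i]                 # d_ji, equation (6)
            s = np.sign(d) if d != 0.0 else -1.0  # sign(0) = -1
            T[i, j] = 2.0 * B[i, j] / (d + s * np.sqrt(d * d + 4.0 * B[i, j] ** 2))
            T[j, i] = -T[i, j]                    # equation (7)
    return T

def ortho(V, A):
    """Orthonormalise the columns of V, ordered by descending modulus of the
    corresponding eigenvalue approximations (Rayleigh quotients stand in here)."""
    ray = np.einsum('ij,ij->j', V, A @ V) / np.einsum('ij,ij->j', V, V)
    Q, _ = np.linalg.qr(V[:, np.argsort(-np.abs(ray))])
    return Q

def pots(A, m, eps1=0.5e-14, eps2=0.5e-8, max_iter=1000):
    """Approximate the m numerically largest eigenvalues of symmetric A."""
    n = A.shape[0]
    U = np.eye(n)[:, :m]                          # U_0: identity columns (see section 5)
    for _ in range(max_iter):                     # primary cycle, equations (1) and (3)
        B = U.T @ (A @ U)
        if np.max(np.abs(B - np.diag(np.diag(B)))) < eps1:
            break                                 # B_k diagonal: enter secondary cycle
        U = ortho(A @ U @ transform(B), A)
    for _ in range(max_iter):                     # secondary cycle, equation (2)
        U_next = ortho(A @ U, A)
        converged = np.max(np.abs(np.abs(U_next) - np.abs(U))) < eps2
        U = U_next
        if converged:
            break
    return np.diag(U.T @ (A @ U)), U
```

For a symmetric A whose ratio |λ_{m+1}|/|λ_m| is well below unity, pots(A, m) returns approximations to the diagonal of D_m and to Q_m; the absolute values in the convergence test follow the paper's comparison of the moduli of successive approximations.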
2.1: The Data Structures

Many data structures which exhibit an informative convergent nature exist within POTS. Three of these are identified below:

(i) The diagonal of the matrix B_k is a dynamic vector which represents the eigenvalue approximations. This vector converges onto the required eigenvalue subset.

(ii) The real symmetric matrix B_k is a dynamic matrix which converges to diagonal form.

(iii) The set of column vectors u_i, 1 ≤ i ≤ m, forms a dynamic matrix whose columns converge onto the required set of eigenvectors.

In order to obtain a better understanding of POTS a thorough investigation of the convergence characteristics of these data structures is required. During the course of such an investigation, questions such as the following would need to be addressed:

(i) Do the eigenvalue approximations always converge monotonically or do they oscillate?

(ii) Do the eigenvalue approximations of largest modulus in the required subset always converge more quickly than those of smaller modulus?

(iii) In those cases where the diagonalisation of the matrix B_k requires a large number of iterations, is this always due to a few particularly dominant off-diagonal components?

(iv) Is the matrix B_k always diagonalised systematically, i.e. do the off-diagonal elements of B_k become zero row by row or all at once?

Clearly, these answers could be determined mathematically. However, due to the abundance of data involved this would be a complicated and tedious task. Consequently, since this paper is concerned with the changes which occur in data structures during the course of an iterative process, visualisation techniques have been employed to simplify the interpretation and analysis of the changes which occur in the diagonal of the matrix B_k.

3: The Representations

Techniques such as depth shading [11], 2D contours and surfaces [12], [13], 3D contours, lines and surfaces [14], [15], [16], and animation [17] are only a subset of the gallery of techniques currently used in data visualisation. In this paper 2D line, 3D line and 2D surface techniques are used to provide several different representations of the changes which occur in the components of a given vector, thereby enabling two distinct approaches to the visualisation of change to be adopted.

The first representation is the 'area' representation, which uses 2D surface techniques to show the relative importance of each of the vector components as they change from iteration to iteration. Figure 1 illustrates this representation for a vector (s1,s2,s3,s4,s5), where the x-axis denotes the iteration number and the y-axis denotes the modulus of the vector component. This type of representation could be applied to the diagonal of the matrix B_k in POTS, thereby enabling question (ii) of §2.1 to be addressed in a meaningful manner.

[Figure 1: 'area' representation of the components s1-s5; x-axis iteration number, y-axis modulus of component.]

The second representation is the '2D-line' representation, which uses a 2D line technique. Figure 2 illustrates this representation for a vector (s1,s2,s3,s4,s5), where the x-axis denotes the iteration number and the y-axis denotes the numerical value of the vector component. The graphs are superimposed to aid the perception of anomalies; colour or patterns are used to differentiate between the graphs, depending upon the output medium.

[Figure 2: superimposed 2D-line graphs of the components s1-s5; x-axis iteration number, y-axis value of component.]

An alternative to the 2D-line representation discussed above is the '3D-line' representation, which depicts each of the 2D graphs as a two-dimensional ribbon. This is illustrated in Figure 3. This type of representation enhances the perception of the 2D-line representation, and it can be further enhanced by varying both the viewpoint and the perspective.

[Figure 3: 3D-line (ribbon) representation of the components s1-s5 against iteration number.]
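As a rough illustration of how such pictures can be produced, the sketch below renders an 'area'-style view and a 2D-line view of a recorded trace using Python and matplotlib; the random-walk trace, the use of stackplot for the area representation, and all names are illustrative assumptions, since the paper does not describe its plotting toolchain.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for a recorded trace: one row per iteration, one column
# per vector component (e.g. the diagonal of B_k at each primary-cycle step).
rng = np.random.default_rng(1)
history = np.cumsum(rng.normal(0.0, 1.0, size=(20, 5)), axis=0)
its = np.arange(1, history.shape[0] + 1)
labels = [f's{j + 1}' for j in range(history.shape[1])]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# 'Area'-style representation (cf. Figure 1): relative importance of the moduli.
ax1.stackplot(its, np.abs(history).T, labels=labels)
ax1.set(xlabel='iteration', ylabel='modulus of component')
ax1.legend()

# '2D-line' representation (cf. Figure 2): superimposed graphs of the raw values.
for j, lab in enumerate(labels):
    ax2.plot(its, history[:, j], label=lab)
ax2.set(xlabel='iteration', ylabel='value of component')
ax2.legend()
plt.show()
```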
An important consideration of the above representations is that they may be used to view other aspects of the changing data. Thus, for example, one of the axes in an area representation could denote the iteration number whereas the other axis could represent the numerical differences in the vector components in consecutive iterations of the algorithm.

4: Two Visualisation Approaches

Two approaches to the use of the 2D-line and 3D-line representations were adopted in order to investigate the behaviour of the diagonal of B_k. In the first approach the use of these representations, as illustrated in Figures 2 and 3, enabled the following trends to be detected:

(i) The graph for each vector component exhibits a unique two-stage composition.

(ii) The first stage is characterised by large amounts of activity during the course of a relatively small number of iterations.

(iii) The second stage is characterised by little or no activity during the course of a relatively large number of iterations.

(iv) The primary cycle of the algorithm itself exhibits a two-stage composition, where the transition from the first stage to the second occurs when all of the vector components are in the second stage of their composition. The boundary point between the two stages is defined to be the transition point.

Figure 4 illustrates these trends for a particular example where the matrices A and B_k are of orders 100 and 8, respectively.

[Figure 4: 2D-line representation of the diagonal of B_k (components s1, s3, s5 and s7 shown) for a matrix A of order 100 with m = 8; the x-axis denotes the iteration number.]

Since the 2D-line and 3D-line representations used in this paper superimpose many independent graphs within a single illustration, the y-axis scale may not be the most appropriate for each individual graph. This may lead to the suppression of activity in some cases. Thus, for example, the fact that the second stage of the primary cycle of POTS exhibits little or no change over a relatively large number of iterations raises the following questions: What is the purpose of this stage of the primary cycle if no significant change occurs in any of the components of the given vector? Is the representation used to examine the change in the vector components appropriate in this instance?

If it is assumed that the representation used is inappropriate then, clearly, it is necessary to examine a different aspect of the changing data. This may be accomplished by using a 2D-line representation in which one of the axes represents the numerical differences in the vector components in consecutive iterations of the algorithm. The use of such a representation in the second stage of the primary cycle of POTS constitutes the second approach to visualisation. This approach enabled the following additional trends to be detected:

(v) The magnitude of the activity masked by the primary analysis is relatively significant.

(vi) The graphs of the components of lesser modulus continue to exhibit a dynamic behaviour for a considerable period of time after the components of greater modulus have ceased to change.

Figure 5 illustrates these trends for the previous example, where the horizontal axis denotes the iteration number and the vertical axis denotes the numerical difference between two successive values of a vector component.

[Figure 5: 2D-line representation of the differences between successive values of the diagonal components of B_k over iterations 16-50, for the example of Figure 4.]

Observe that in Figure 4 all change appears to have ceased after fifteen iterations, yet in Figure 5 change continues to occur in the four components of smallest modulus for at least a further thirty-five iterations. Clearly, in the case of POTS, it is necessary to visualise the two stages of the primary cycle from two different viewpoints in order to investigate fully the behaviour of the diagonal of B_k.
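The essence of the second approach is simply to plot first differences rather than raw values. A small sketch, reusing the synthetic trace of the previous example (again an assumption; the paper's own traces came from POTS runs):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
history = np.cumsum(rng.normal(0.0, 1.0, size=(50, 5)), axis=0)
diffs = np.diff(history, axis=0)      # change between consecutive iterations

# Superimposed 2D-line graphs of the differences (cf. Figure 5): activity that
# is invisible on the scale of the raw values becomes apparent here.
for j in range(diffs.shape[1]):
    plt.plot(np.arange(2, history.shape[0] + 1), diffs[:, j], label=f's{j + 1}')
plt.xlabel('iteration')
plt.ylabel('difference between successive values')
plt.legend()
plt.show()
```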
4.1: Algorithm Development and Enhancement

Recall that POTS consists of two cycles. In the primary cycle a good set of eigenvector approximations, U_k, is constructed which is then used as input to the secondary cycle. Observe that the transition from one cycle to the next occurs whenever the matrix B_k becomes diagonal. This corresponds to the case where the diagonal components of B_k (which represent the eigenvalue approximations) have ceased to change. However, it is apparent from observations (i)-(iv) of §4 that the major changes in the modulus of the eigenvalue approximations take place in the first stage of the primary cycle. Consequently, it was hypothesised that sufficiently accurate eigenvector approximations might be available for use as input to the secondary cycle whenever the first stage of the primary cycle is complete. Thus, experiments were carried out to investigate the efficiency of the algorithm when the primary cycle was terminated at the transition point.

The trends established in (v) and (vi) of §4 led to a further hypothesis, namely, that a more efficient algorithm might be constructed in which:

(i) Each of the orthonormal matrices U_k generated in the primary cycle is of order n × l, l > m.

(ii) Each of the interaction matrices B_k generated in the primary cycle is of order l × l.

(iii) Termination of the primary cycle is deemed to have occurred whenever the m components of largest modulus in the diagonal of B_k have ceased to change. The point at which this occurs is defined to be the critical point.

(iv) The input to the secondary cycle is the orthonormal matrix U_k, of order n × m, whose columns represent the eigenvector approximations corresponding to the m eigenvalue approximations of largest modulus in the diagonal of the final matrix B_k in the primary cycle.

Thus, experiments were carried out to test this hypothesis. The modified algorithm is known as GPOTS (Greedy POTS). The principal differences between GPOTS and POTS occur in the primary cycle of each: in the former the matrices U_k and B_k are of order n × l and l × l, respectively, whereas in the latter they are of order n × m and m × m, respectively, l > m. These differences arise as a result of using l rather than m eigenvector approximations throughout the primary cycle of the developed algorithm. It is anticipated that the number of iterations required in the first cycle of GPOTS will be considerably less than that required in the case of POTS and, further, that the accuracy of the input to the secondary cycle in both cases will be of the same order.

The matrix B_k is deemed to be diagonal when the modulus of each of its off-diagonal components is less than a predetermined value, ε_1, usually taken to be 0.5 × 10^-14. Termination of the secondary cycle is deemed to have occurred when the modulus of each component of the difference between two successive eigenvector approximations, U_{k+1} and U_k, is less than another predetermined value, ε_2, usually taken to be 0.5 × 10^-8. The algorithms are used to compute partial eigensolutions for a collection of matrices and a selection of the results obtained are presented in Tables 1-4.
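Criterion (iii), the critical-point test, can be expressed compactly; the function below is a hypothetical helper, not taken from the paper, and it assumes that the positions of the m dominant diagonal entries are stable between consecutive iterations.

```python
import numpy as np

def at_critical_point(diag_prev, diag_curr, m, eps=5.0e-10):
    """True when the m diagonal entries of B_k of largest modulus have ceased
    to change, i.e. none differs from its previous value by more than eps.
    diag_prev and diag_curr are the length-l diagonals at successive iterations."""
    idx = np.argsort(-np.abs(diag_curr))[:m]   # the m approximations of largest modulus
    return bool(np.all(np.abs(diag_curr[idx] - diag_prev[idx]) <= eps))
```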
5: Numerical Experience

Each of the algorithms is implemented on the AMT DAP 510, an array processor with edgesize 32. The language used is Fortran Plus Enhanced [18], a language which permits the programmer to disregard the edgesize of the machine. Real variables are declared to be of type REAL*8 and, for simplicity, the columns of the matrix U_0 are equated to the first m (l) columns of the identity matrix of order n in the case of POTS (GPOTS).

In many instances the subsets of eigenvalues evaluated contained close and equal eigenvalues, eigenvalues equal in modulus but opposite in sign, and well separated eigenvalues. The components of the matrices were generated randomly to lie in the range (-100, 100).

Initially, the experiments focused upon the efficiency of the algorithm when the primary cycle was terminated at the transition point. The analysis of the results obtained from these experiments demonstrated that, at the transition point, the eigenvector approximations are not sufficiently accurate for use in the secondary cycle. Subsequently, the efficiency of the algorithm was investigated when the primary cycle was terminated at the critical point within stage 2 of the primary cycle.
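Test matrices of the kind described above can be generated in a few lines; the symmetrisation step below is an assumption, since the paper states only the range of the components, and the ratio printed is the convergence ratio quoted throughout the tables.

```python
import numpy as np

def random_symmetric(n, rng):
    """Random symmetric test matrix with components in (-100, 100)."""
    M = rng.uniform(-100.0, 100.0, size=(n, n))
    return (M + M.T) / 2.0           # symmetrise; components remain in (-100, 100)

rng = np.random.default_rng(0)
A = random_symmetric(100, rng)
lam = np.sort(np.abs(np.linalg.eigvalsh(A)))[::-1]   # eigenvalue moduli, descending
m = 8
print('|lambda_(m+1)| / |lambda_m| =', lam[m] / lam[m - 1])
```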
  n    m   |λ_{m+1}|/|λ_m|   POTS                               GPOTS                                    Speedup
                             Iters     Iters     Total          l    Iters     Iters     Total           (%)
                             Cycle 1   Cycle 2   Time (s)            Cycle 1   Cycle 2   Time (s)
  75    6       0.847          16        76       25.705        20     24        13       17.995         30.0
  90   12       0.800          44        23       29.843        30     24        12       23.470         21.4
 125   15       0.746          31        15       35.398        25     24         8       29.935         15.4
 150   10       0.700          39        10       49.130        25     26        12       42.583         13.3

Table 1

Table 1 presents the results which are obtained when POTS and GPOTS are used to compute partial eigensolutions of a selection of matrices. For each of these matrices the critical point is that point at which the difference between each of the m components of largest modulus in the diagonal of B_k in two successive iterations is less than or equal to 5.0 × 10^-7. The ratios given in column 3 indicate that POTS should exhibit good convergence characteristics in all of the test cases; the times in columns 6 and 10 are quoted in seconds.

  n    m   |λ_{m+1}|/|λ_m|    l   Total time (s)   Speedup (%)
 100   14        0.991        60       67.885          71.0
 100    8        1.000        55       46.735          74.3
 150    7        0.993        20      186.100          67.3
 175    5        0.994        30      168.662          84.8

Table 2

Table 2 presents the results obtained when GPOTS is used to compute the partial eigensolutions of another selection of matrices. For each of these matrices the critical point corresponds to a value of 5.0 × 10^-10. The ratios given in column 3 indicate that POTS has very poor convergence characteristics in all of the examples shown; in fact, in these cases it is more efficient to first solve the complete eigenproblem and then extract the required subset. Note that the complete eigensolution may be obtained using POT (see §2). Consequently, the percentage speedup presented in the last column of this table is in relation to solving the complete eigenproblem using POT and then extracting the required subset.

The effect of varying the value of l for a given partial eigenproblem is demonstrated in Table 3. In this case the order of the matrix is 100, m equals 8, and the critical point corresponds to a value of 5.0 × 10^-10. This table includes the scenario where the value of l is greater than the edgesize of the array processor. Finally, Table 4 presents the case where the critical point corresponds to values chosen to lie between 2.25 × 10^-7 and 5.00 × 10^-10 and the values of l and m are 12 and 8, respectively. The example used is identical to that used in Table 3.

  l   Iters Cycle 1   Iters Cycle 2   Total Time (s)
 12        278             233           292.033
 16        105              72           112.218
 20         74              53            87.139
 30         41              26            55.616
 32         38              25            53.523
 55         15               4            46.735

Table 3

 Critical Point    Iters Cycle 1   Iters Cycle 2   Total Time (s)
 5.00 × 10^-10          278             233           292.033
 5.00 × 10^-7            14              22            20.733
 5.00 × 10^-8            17              13            18.263
 5.00 × 10^-9            19              11            18.563
 2.25 × 10^-7            15              15            17.962
 8.75 × 10^-8            16              14            18.112
 3.13 × 10^-7            15              15            17.962

Table 4

6: Conclusions

An analysis of the results presented in Tables 1 and 2 suggests that GPOTS is significantly more efficient than POTS in all cases and that, in general, each cycle of GPOTS requires fewer iterations than the corresponding cycle of POTS. Observe that the time taken for each iteration of the primary cycle of GPOTS is greater than that taken for each iteration of the primary cycle of POTS; this is attributed to the extra computation required for the orthogonalisation process in the first cycle of GPOTS. Further, provided that l and m are less than the edgesize of the array processor used, the time required for all other computations in each iteration remains approximately the same for both algorithms. However, the reduction in the number of iterations taken in the primary cycle of GPOTS significantly outweighs the additional computation time. This may not be the case in a sequential environment.

In general, as the number of eigenvector approximations used in the first cycle of GPOTS increases, the execution time of the complete algorithm decreases. This also appears to be true in many cases where this number exceeds the edgesize of the array processor used; an example which illustrates this point is given in Table 3. Presently, this behaviour is not fully understood and further research is required to determine the optimal number of eigenvector approximations, possibly using a visualisation approach. Also, the example presented in Table 4 suggests that, for each partial eigenproblem, a numerical value exists which corresponds to an optimal critical point.

Finally, it may be concluded that the use of a visualisation approach in the analysis of parallel numerical algorithms provides extensive insight and knowledge. In the case study presented in this paper, the knowledge gained from analyses of the visualisation of the chosen data structure has not only provoked insight but has also resulted in the development of a significantly enhanced algorithm. Clearly, data structures from a variety of numerical algorithms, both iterative and non-iterative, need to be identified and investigated in order to further exploit the capabilities of visualising data in parallel numerical linear algorithms.
Acknowledgements

This research was funded by the Department of Education for Northern Ireland. The software development was carried out using the facilities of the Parallel Computer Centre at the Queen's University of Belfast.

References

[1] Sommerville, I., Software Engineering (Third Edition), Addison-Wesley, (1989).
[2] Lehman, M. M. and L. Belady, Program Evolution: Processes of Software Change, Academic Press, London, (1985).
[3] Treinish, L. A., The Visualisation Software Needs of Scientists, NCGA '90, 1, 6-15, (1990).
[4] McCormick, B. H. et al., Visualisation in Scientific Computing (Special Issue), Computer Graphics, 21(6), (1987).
[5] Sekuler, R. and R. Blake, Perception, Alfred A. Knopf, New York, (1985).
[6] Stuart, E. J. and J. S. Weston, An Algorithm for the Parallel Computation of Subsets of Eigenvalues and Associated Eigenvectors of Large Symmetric Matrices using an Array Processor, in Proceedings of the Euromicro Workshop on Parallel and Distributed Processing, (1992).
[7] Stewart, G. W., Introduction to Matrix Computations, Academic Press, New York, (1973).
[8] Bauer, F. L., Das Verfahren der Treppeniteration und verwandte Verfahren zur Lösung algebraischer Eigenwertprobleme, Z. Angew. Math. Phys., 8, 214-235, (1957).
[9] Clint, M. et al., A Comparison of Two Parallel Algorithms for the Symmetric Eigenproblem, International Journal of Computer Mathematics, 15, 291-302, (1984).
[10] Weston, J. S. and M. Clint, Two Algorithms for the Parallel Computation of Eigenvalues and Eigenvectors of Large Symmetric Matrices using the ICL DAP, Parallel Computing, 13, 281-288, (1990).
[11] Ohashi, T. et al., A Three-Dimensional Shaded Display Method for Voxel-Based Representation, Eurographics '85, 221-232, (1985).
[12] Boissonnat, J. D., Shape Reconstruction from Planar Cross Sections, CVGIP, 44(1), 1-19, (1988).
[13] Cabral, B. and C. L. Hunter, Visualisation Tools at Lawrence Livermore National Laboratory, IEEE Computer, 22(8), 77-84, (1989).
[14] Veen, A. and L. D. Peachey, TROTS: A Computer Graphics System for Three-Dimensional Reconstruction from Serial Sections, Computers and Graphics, 2, 135-150, (1977).
[15] Yamashita, H. et al., Interactive Visualisation of Three-Dimensional Magnetic Fields, The Journal of Visualization and Computer Animation, 2(1), 34-40, (1991).
[16] Frederick, C. and E. L. Schwartz, Brain Peeling: Viewing the Inside of a Laminar Three-Dimensional Solid, The Visual Computer, 6(1), 37-49, (1990).
[17] Thompson, W. R. and C. Sagan, Computer Visualization in Spacecraft Exploration of the Solar System, CGI '91, Springer-Verlag, 37-44, (1991).
[18] Fortran Plus Enhanced, man 102.01, AMT, (1990).