The Enhancement of Parallel Numerical Linear Algorithms
using a Visualisation Approach
E. J. Stuart and J. S. Weston
Department of Computing Science, University of Ulster,
Coleraine, Northern Ireland.
Abstract
Visualisation techniques may be used to determine how
the components of data structures change throughout the
execution of an iterative algorithm. The information
presented by the application of such techniques provides
the basis for an efficient analysis of the characteristics of
the data structures, thereby yielding insights into the
overall behaviour of the algorithm. As a result of these
insights, the development of an enhanced algorithm
becomes a distinct possibility. In this paper, two different
approaches to the visualisation of vectors are identified
and used in the analysis of an algorithm for the partial
solution of the eigenproblem. Consequently, a modified
algorithm is developed which is particularly efficient when
implemented on an array processor. Finally, the efficiency
of the algorithms for the computation of partial
eigensolutions of a variety of matrices on an array
processor is presented and analysed.
1: Introduction
Sommerville [1] introduces the law of continuing change by paraphrasing Lehman and Belady [2]: "A program that is used in a real-world environment necessarily must change or become less and less useful in that environment".
This concept of change has become particularly
applicable to numerical algorithms since the
recent advent of new parallel architectures, the
commercial availability of which has provided a
stimulus for the further development and
enhancement of such algorithms. Visualisation
may be considered to play an important role in
this process.
Treinish [3] states that a variety of different
visualisation techniques must be available to
either examine a single set of parameters from
one source or a number of different parameters
from disparate sources. Essentially, such
techniques transform numerical data into a
geometric form which may be represented by
two-dimensional graphics, three-dimensional
graphics, or animations. Subsequently, an
analysis of the transformed data may yield a
deeper understanding of the algorithm, thereby
enriching scientific knowledge. However, the
greatest potential of visualisation is not the
entrancing movies produced by animations, but
the knowledge gained and the concepts
explained by perception of existing anomalies
[4].
Since the human mind has considerable
capability to capture and understand change in
visualised data [5], this paper considers two
approaches to the visual representation of the
change which occurs in a single vector during
the course of an iterative process. These are then
applied to a vector in the context of POTS [6],
an iterative algorithm for the computation of
subsets of eigenvalues and their associated
eigenvectors. The vector under discussion
represents approximations to the required
eigenvalue subset. An analysis of the visualised
data explores the convergence characteristics of
the eigenvalue subset and, consequently, a
modified POTS algorithm is derived. Partial
eigensolutions for a variety of matrices are
computed using these algorithms and a selection
of the results obtained are presented in tabular
form. Finally, some conclusions are drawn.
2: The POTS Algorithm
When used to compute the ‘m’ numerically
largest eigenvalues and their associated
eigenvectors, of a real symmetric matrix A, of
order n, POTS takes the following form:
Let Dm be the real diagonal matrix, of order m, whose diagonal components yield the required subset of eigenvalues, let Qm be the (n × m) orthonormal matrix whose columns represent the associated eigenvectors, and let U0 be an (n × m) arbitrary orthonormal matrix whose columns represent approximations to these eigenvectors. Construct the sequence of eigenvector approximations {Uk} as follows:
Uk+1 = ortho( A . Uk . transform(Bk) ), 0 ≤ k < x    (1)
Uk+1 = ortho( A . Uk ), x ≤ k    (2)
where x is the minimum value such that Bk is diagonal. Bk, the interaction matrix of order m × m, is defined according to
Bk = (Uk)T . ( A . Uk ) ; 0 ≤ k < x    (3)
It follows that
lim k→∞ Uk = Qm and (Qm)T . A . Qm = Dm.    (4)
The function ortho is an extension of the Gram-Schmidt orthogonalisation process [7]. It orthogonalises each of the eigenvector approximations according to the descending order of modulus of its corresponding eigenvalue approximation. The function transform returns a non-orthogonal matrix Tk, of order m, the columns of which represent approximations to the eigenvectors of Bk. The (i,j)th component of Tk, tij, is defined by the expression
tij = 2bij / (dji + sign(dji) . √(dji^2 + 4bij^2)), i ≠ j    (5)
where
dji = (bjj - bii) and sign(0) = -1    (6)
tji = -tij    (7)
Observe that the value whose square root is
determined in equation (5) is real, and that the
sign of the square root is chosen so as to
maximise the magnitude of the denominator in
the expression for tij.
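By way of illustration, equations (5)-(7) might be realised as follows. This is a Python/NumPy sketch written for this paper, not the authors' Fortran Plus Enhanced code; in particular, the diagonal of Tk is not specified in the text, so the usual Jacobi-style convention t_ii = 1 is assumed here:

import numpy as np

def transform(B):
    # Sketch of POTS's `transform': returns the non-orthogonal matrix T
    # whose columns approximate the eigenvectors of the small symmetric
    # matrix B, using equations (5)-(7).
    m = B.shape[0]
    T = np.eye(m)                               # t_ii = 1: assumed, see above
    for i in range(m):
        for j in range(i + 1, m):
            if B[i, j] == 0.0:
                continue                        # t_ij stays 0
            d = B[j, j] - B[i, i]               # d_ji = b_jj - b_ii,  eq. (6)
            s = -1.0 if d == 0 else np.sign(d)  # convention sign(0) = -1
            T[i, j] = 2 * B[i, j] / (d + s * np.sqrt(d * d + 4 * B[i, j] ** 2))
            T[j, i] = -T[i, j]                  # t_ji = -t_ij,        eq. (7)
    return T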
Clearly, the algorithm consists of two distinct iterative cycles, where the termination of the primary cycle and the beginning of the secondary cycle are signalled by the diagonalisation of the matrix Bk. One of the main objectives of the primary cycle is the creation of a good set of eigenvector approximations which is subsequently used as input to the secondary cycle, the convergence of this cycle being dependent upon the accuracy of its input.
Essentially, the secondary cycle, whose output is
the required eigenvector subset, is an extension
of Bauer’s method [8]. This subset may then be
used in the computation of the required
eigenvalue subset. Convergence of the secondary
cycle is deemed to have occurred when the
difference between the modulus of successive
eigenvector approximations is less than a
predefined tolerance level.
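Assembling the pieces, the two-cycle structure of equations (1)-(4) might be sketched as below. This is our own Python/NumPy rendering: ortho is approximated by a QR factorisation applied after ordering the columns by descending modulus of their eigenvalue approximations, a stand-in for the extended Gram-Schmidt process of [7], and U0 is taken, as in §5, to be the first m columns of the identity matrix:

import numpy as np

def ortho(V, evals):
    # Stand-in for POTS's `ortho': orthonormalise the columns of V in
    # descending order of |eigenvalue approximation| via QR, then restore
    # the original column positions.
    order = np.argsort(-np.abs(evals))
    Q, _ = np.linalg.qr(V[:, order])
    return Q[:, np.argsort(order)]

def pots(A, m, tol_diag=0.5e-14, tol_vec=0.5e-8, max_iter=1000):
    # Sketch of POTS for the m numerically largest eigenpairs of the real
    # symmetric matrix A; uses transform() from the sketch above.
    n = A.shape[0]
    U = np.eye(n)[:, :m]                    # U0: first m identity columns
    for _ in range(max_iter):               # primary cycle, equation (1)
        B = U.T @ (A @ U)                   # interaction matrix, eq. (3)
        if np.max(np.abs(B - np.diag(np.diag(B)))) < tol_diag:
            break                           # Bk diagonal: primary cycle ends
        U = ortho(A @ U @ transform(B), np.diag(B))
    for _ in range(max_iter):               # secondary cycle, equation (2)
        U_new = ortho(A @ U, np.diag(U.T @ (A @ U)))
        done = np.max(np.abs(np.abs(U_new) - np.abs(U))) < tol_vec
        U = U_new
        if done:
            break
    evals = np.diag(U.T @ (A @ U))          # required eigenvalue subset
    return evals, U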
If i, 1in, are the eigenvalues of A where
|1|  | 2| ...  | m-1|  | m| ...  | n|
(8)
then the overall rate of convergence of the
algorithm is determined by the ratio |m+1| / |m|.
Observe that the smaller the convergence ratio,
the better the overall convergence characteristics
of the algorithm and that as this ratio approaches
unity the algorithm becomes very inefficient.
Also, in the case where m = n, the secondary cycle
is not required and the algorithm reduces to the
standard POT algorithm [9], [10], which permits
the computation of the complete set of
eigenvalues and their associated eigenvectors.
2.1: The Data Structures
Many data structures which exhibit an
informative convergent nature exist within
POTS. Three of these are identified below:
(i) The diagonal of the matrix Bk is a dynamic vector which represents the eigenvalue approximations. This vector converges onto the required eigenvalue subset.
(ii) The real symmetric matrix Bk is a dynamic matrix which converges to diagonal form.
(iii) The set of column vectors ui, 1 ≤ i ≤ m, forms a dynamic matrix whose columns converge onto the required set of eigenvectors.
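For instrumentation purposes, the per-iteration history of data structure (i) might be captured along the following lines (a hypothetical sketch reusing the ortho and transform stand-ins above; POTS itself does not record this history):

import numpy as np

def diag_history(A, m, iters=50):
    # Run the primary cycle of POTS for a fixed number of iterations and
    # record the diagonal of Bk (the eigenvalue approximations) each time.
    n = A.shape[0]
    U = np.eye(n)[:, :m]
    history = []
    for _ in range(iters):
        B = U.T @ (A @ U)
        history.append(np.diag(B).copy())
        U = ortho(A @ U @ transform(B), np.diag(B))
    return np.array(history)                # row k holds diag(Bk)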
In order to obtain a better understanding of
POTS a thorough investigation of the
convergence characteristics of these data
structures is required. During the course of such
an investigation, questions such as the following
would need to be addressed:
(i) Do the eigenvalue approximations always converge monotonically or do they oscillate?
(ii) Do the eigenvalue approximations of largest modulus in the required subset always converge more quickly than those with smaller modulus?
(iii) In those cases where the diagonalisation of the matrix Bk requires a large number of iterations, is this always due to a few particularly dominant off-diagonal components?
(iv) Is the matrix Bk always diagonalised systematically, i.e. do the off-diagonal elements of Bk become zero row by row or all at once?
Clearly these answers could be determined
mathematically. However, due to the abundance
of data involved this would be a complicated and
tedious task. Consequently, since this paper is
concerned with the changes which occur in data
structures during the course of an iterative
process, visualisation techniques have been
employed to simplify the interpretation and
analysis of the changes which occur in the
diagonal of the matrix Bk.
3: The Representations
Techniques such as depth shading [11], 2D contours and surfaces [12], [13], 3D contours, lines and surfaces [14], [15], [16], and animation [17] are only a subset of the gallery of visualisation techniques currently used in data visualisation. In this paper 2D-line, 3D-line and 2D-surface techniques are used to provide several different representations of the changes which occur in the components of a given vector, thereby enabling two distinct approaches to the visualisation of change to be adopted.
The first representation is the 'area' representation, which uses 2D surface techniques to show the relative importance of each of the vector components as they change from iteration to iteration. Figure 1 illustrates this representation for a vector (s1,s2,s3,s4,s5), where the x-axis denotes the iteration number and the y-axis denotes the modulus of the vector component. This type of representation could be applied to the diagonal of the matrix Bk in POTS, thereby enabling question (ii) of §2.1 to be addressed in a meaningful manner.
[Figure 1: area representation of the vector (s1,...,s5); x-axis: iteration number, y-axis: modulus of each component.]
The second representation utilised is the '2D-line' representation of a vector, which uses a 2D line technique. Figure 2 illustrates this representation for a vector (s1,s2,s3,s4,s5), where the x-axis denotes the iteration number and the y-axis denotes the numerical value of the vector component. The graphs are superimposed to aid the perception of anomalies. In this representation colour or patterns are used to differentiate between each graph, depending upon the output media.
[Figure 2: superimposed 2D-line graphs of the components s1,...,s5; x-axis: iteration number, y-axis: component value.]
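An illustration of this kind might be generated as follows; the sketch below is our own matplotlib code (the function name and the (iterations x components) layout of `history' are assumptions), not the tooling used by the authors:

import matplotlib.pyplot as plt

def plot_2d_line(history):
    # 2D-line representation: one superimposed graph per vector component,
    # x-axis = iteration number, y-axis = component value.
    iterations = range(1, history.shape[0] + 1)
    for j in range(history.shape[1]):
        plt.plot(iterations, history[:, j], label=f"s{j + 1}")  # colour/pattern per graph
    plt.xlabel("iteration number")
    plt.ylabel("component value")
    plt.legend()
    plt.show()

Used together with the diag_history sketch of §2.1, plot_2d_line(diag_history(A, m)) would reproduce a Figure 2 style plot for the diagonal of Bk.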
An alternative to the 2D-line representation discussed above is the '3D-line' representation, which depicts each of the 2D graphs as a two-dimensional ribbon. This is illustrated in Figure 3. This type of representation enhances the perception of the 2D-line representation. It can be further enhanced by varying both the viewpoint and the perspective.
[Figure 3: 3D-line (ribbon) representation of the components s1,...,s5; x-axis: iteration number, y-axis: component value.]
An important consideration of the above representations is that they may be used to view other aspects of the changing data. Thus, for example, one of the axes in an area representation could denote the iteration number whereas the other axis could represent the numerical differences in the vector components in consecutive iterations of the algorithm.
4: Two Visualisation Approaches
Two approaches to the use of the 2D-line and 3D-line representations were adopted in order to investigate the behaviour of the diagonal of Bk. In the first approach the use of these representations, as illustrated in Figures 2 and 3, enabled the following trends to be detected:
(i) The graph for each vector component exhibits a unique two-stage composition.
(ii) The first stage is characterised by large amounts of activity during the course of a relatively small number of iterations.
(iii) The second stage is characterised by little or no activity during the course of a relatively large number of iterations.
(iv) The primary cycle of the algorithm exhibits a unique two-stage composition where the transition from the first stage to the second stage occurs when all of the vector components are in the second stage of their composition. The boundary point between the two stages is defined to be the transition point.
Figure 4 illustrates these trends for a particular example where the matrices A and Bk are of orders 100 and 8, respectively.
[Figure 4: superimposed 2D-line graphs of selected eigenvalue approximations (s1, s3, s5, s7) for this example over iterations 1-17; x-axis: iteration number, y-axis: component value.]
Since the 2D-line and 3D-line representations used in this paper superimpose many independent graphs within a single illustration, it follows that the y-axis scale may not be the most appropriate for each individual graph. This may lead to the suppression of activity in some cases. Thus, for example, the fact that the second stage of the primary cycle of POTS exhibits little or no change over a relatively large number of iterations begs the following questions:
What is the purpose of this stage of the primary cycle if no significant change occurs in any of the components of the given vector?
Is the representation used to examine the change in the vector components appropriate in this instance?
If it is assumed that the representation used is inappropriate then, clearly, it is necessary to examine a different aspect of the changing data. This may be accomplished by using a 2D-line representation in which one of the axes represents the numerical differences in the vector components in consecutive iterations of the algorithm. The use of such a representation in the second stage of the primary cycle of POTS constitutes the second approach to visualisation. This approach enabled the following additional trends to be detected:
(v) The magnitude of the activity masked by the primary analysis is relatively significant.
(vi) The graphs of the components of lesser modulus continue to exhibit a dynamic behaviour for a considerable period of time after the components of greater modulus have ceased to change.
Figure 5 illustrates these trends for the previous example, where the horizontal axis denotes the iteration number and the vertical axis denotes the numerical difference between two successive values of each vector component.
[Figure 5: numerical differences between successive values of each component over iterations 16-50.]
Observe that in Figure 4 all change appears to have ceased after fifteen iterations, yet in Figure 5 change continues to occur in the four components of smallest modulus for at least a further thirty-five iterations. Clearly, in the case of POTS, it is necessary to visualise the two stages of the primary cycle from two different viewpoints in order to investigate fully the behaviour of the diagonal of Bk.
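The second approach might be rendered as follows (again a hypothetical matplotlib sketch; `history' is the (iterations x components) record produced by the diag_history sketch of §2.1):

import numpy as np
import matplotlib.pyplot as plt

def plot_differences(history):
    # Second approach: plot, for each component, the numerical difference
    # between its values in consecutive iterations, exposing activity that
    # the raw 2D-line representation suppresses.
    diffs = np.diff(history, axis=0)        # row k holds diag(B_{k+1}) - diag(B_k)
    iterations = range(2, diffs.shape[0] + 2)
    for j in range(diffs.shape[1]):
        plt.plot(iterations, diffs[:, j], label=f"s{j + 1}")
    plt.xlabel("iteration number")
    plt.ylabel("difference between successive iterations")
    plt.legend()
    plt.show()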
4.1: Algorithm Development and Enhancement
Recall that POTS consists of two cycles. In
the primary cycle a good set of eigenvector
approximations, Uk, is constructed which is then
used as input to the secondary cycle. Observe
that the transition from one cycle to the next
occurs whenever the matrix Bk becomes
diagonal. This corresponds to the case where the
diagonal components of Bk (which represent the
eigenvalue approximations) have ceased to
change. However, it is apparent from the
observations (i)-(iv) of §4 that the major changes
in the modulus of the eigenvalue approximations
take place in the first stage of the primary cycle.
Consequently, it was hypothesised that
sufficiently accurate eigenvector approximations
might be available for use as input to the
secondary cycle whenever the first stage of the
primary cycle is complete. Thus, experiments
were carried out to investigate the efficiency of
the algorithm when the primary cycle was
terminated at the transition point.
The trends established in (v) and (vi) of §4 led
to a further hypothesis, namely, that a more
efficient algorithm might be constructed in
which
(i) Each of the orthonormal matrices Uk generated in the primary cycle is of order n × l, l > m.
(ii) Each of the interaction matrices Bk generated in the primary cycle is of order l × l.
(iii) Termination of the primary cycle is deemed to have occurred whenever the m components of largest modulus in the diagonal of Bk have ceased to change. The point at which this occurs is defined to be the critical point.
(iv) The input to the secondary cycle is the orthonormal matrix Uk, of order n × m, whose columns represent the eigenvector approximations corresponding to the m eigenvalue approximations of largest modulus in the diagonal of the final matrix Bk in the primary cycle.
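A minimal sketch of such a primary cycle, under the same assumptions as the earlier POTS sketch, is given below (our own code; the critical-point test compares the m largest moduli of the diagonal between successive iterations, a simplification of criterion (iii)):

import numpy as np

def gpots_primary(A, m, l, crit=5.0e-10, max_iter=1000):
    # Modified (GPOTS) primary cycle: iterate with l > m approximations
    # and stop at the critical point, returning the n x m matrix whose
    # columns correspond to the m eigenvalue approximations of largest
    # modulus; uses the ortho and transform stand-ins defined earlier.
    n = A.shape[0]
    U = np.eye(n)[:, :l]                    # U0 of order n x l
    prev = None
    for _ in range(max_iter):
        B = U.T @ (A @ U)                   # interaction matrix of order l x l
        d = np.diag(B)
        top = np.sort(np.abs(d))[-m:]       # m largest moduli, sorted
        if prev is not None and np.max(np.abs(top - prev)) <= crit:
            break                           # critical point reached
        prev = top
        U = ortho(A @ U @ transform(B), d)
    keep = np.sort(np.argsort(-np.abs(np.diag(U.T @ (A @ U))))[:m])
    return U[:, keep]                       # input to the secondary cycle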
Thus, experiments were carried out to test this
hypothesis. In this case the modified algorithm is
known as GPOTS (Greedy POTS).
The principal differences between GPOTS and POTS occur in the primary cycle of each - in the former the matrices Uk and Bk are of order n × l and l × l, respectively, whereas in the latter they are of order n × m and m × m, respectively, l > m.
These differences occur as a result of using l
rather than m eigenvector approximations
throughout the primary cycle of the developed
algorithm. It is anticipated that the number of
iterations required in the first cycle of GPOTS
will be considerably less than that required in the
case of POTS, and further, that the accuracy of
the input to the secondary cycle in both cases
will be of the same order.
The diagonality of the matrix Bk is determined when the modulus of all of its off-diagonal components is less than a predetermined value, m1, usually taken to be 0.5 × 10^-14. Termination of the secondary cycle is deemed to have occurred when the modulus of each component of the difference between two successive eigenvector approximations, Uk+1 and Uk, is less than another predetermined value, m2, usually taken to be 0.5 × 10^-8. The algorithms are used to compute partial eigensolutions for a collection of matrices and a selection of the results obtained are presented in Tables 1-4. In many instances the subsets of eigenvalues evaluated contained close and equal eigenvalues, eigenvalues equal in modulus but opposite in sign, and well separated eigenvalues. The components of the matrices were generated randomly to lie in the range (-100, 100).
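For reference, the test-matrix construction and the two convergence tests just described might be sketched as follows (our own Python; the generator details beyond the stated range, and a consistent sign convention for the eigenvector columns, are assumptions):

import numpy as np

def random_symmetric(n, rng=None):
    # Real symmetric test matrix with components in (-100, 100); the
    # symmetrising average keeps every entry within the stated range.
    if rng is None:
        rng = np.random.default_rng(0)
    M = rng.uniform(-100.0, 100.0, size=(n, n))
    return (M + M.T) / 2

def is_diagonal(B, m1=0.5e-14):
    # Diagonality test for Bk: every off-diagonal modulus below m1.
    return np.max(np.abs(B - np.diag(np.diag(B)))) < m1

def secondary_converged(U_new, U_old, m2=0.5e-8):
    # Secondary-cycle test: the modulus of each component of the difference
    # between successive eigenvector approximations is below m2 (assumes
    # the columns carry a consistent sign convention).
    return np.max(np.abs(U_new - U_old)) < m2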
Initially, the experiments focused upon the
efficiency of the algorithm when the primary
cycle was terminated at the transition point. The
analysis of the results obtained from these
experiments demonstrated that, at the transition
point, the eigenvector approximations are not
sufficiently accurate for use in the secondary
cycle. Subsequently, the efficiency of the
algorithm was investigated when the primary
cycle was terminated at the critical point within
stage 2 of the primary cycle.
5: Numerical Experience
Each of the algorithms is implemented on the
AMT DAP 510, an array processor with edgesize
32. The language used is Fortran Plus Enhanced [18], a language which permits the programmer to disregard the edgesize of the machine. Real variables are declared to be of type REAL*8 and, for simplicity, the columns of the matrix U0 are equated to the first m (l) columns of the identity matrix of order n in the case of POTS (GPOTS).
n   | m  | |λm+1|/|λm| | POTS Iters Cycle 1 | POTS Iters Cycle 2 | POTS Total Time (s) | l  | GPOTS Iters Cycle 1 | GPOTS Iters Cycle 2 | GPOTS Total Time (s) | Speedup (%)
75  | 6  | 0.847       | 16                 | 76                 | 25.705              | 20 | 24                  | 13                  | 17.995               | 30.0
90  | 12 | 0.800       | 44                 | 23                 | 29.843              | 30 | 24                  | 12                  | 23.470               | 21.4
125 | 15 | 0.746       | 31                 | 15                 | 35.398              | 25 | 24                  | 8                   | 29.935               | 15.4
150 | 10 | 0.700       | 39                 | 10                 | 49.130              | 25 | 26                  | 12                  | 42.583               | 13.3
Table 1
Table 1 presents the results which are obtained when POTS and GPOTS are used to compute partial eigensolutions of a selection of matrices. For each of these matrices the critical point is that point at which the difference between each of the m components of largest modulus in the diagonal of Bk in two successive iterations is less than or equal to 5.0 × 10^-7. The ratios given in column 3 indicate that POTS should exhibit good convergence characteristics in all of the test cases and the times in columns 6 and 10 are quoted in seconds. The speedup quoted in the final column is the percentage reduction in total time achieved by GPOTS relative to POTS.
n   | m  | |λm+1|/|λm| | l  | Total time (s) | Speedup (%)
100 | 14 | 0.991       | 60 | 67.885         | 71.0
100 | 8  | 1.000       | 55 | 46.735         | 74.3
150 | 7  | 0.993       | 20 | 186.100        | 67.3
175 | 5  | 0.994       | 30 | 168.662        | 84.8
Table 2
Table 2 presents the results obtained when GPOTS is used to compute the partial eigensolutions of another selection of matrices. For each of these matrices the critical point corresponds to a value of 5.0 × 10^-10. The ratios given in column 3 indicate that POTS has very poor convergence characteristics in all of the examples shown. In fact, in these cases it is more efficient to first solve the complete eigenproblem and then extract the required subset. Note that the complete eigensolution may be obtained using POT (see §2). Consequently, the percentage speedup presented in the last column of this table is in relation to solving the complete eigenproblem using POT and then extracting the required subset.
The effect of varying the value of l for a given partial eigenproblem is demonstrated in Table 3. In this case the order of the matrix is 100, m equals 8, and the critical point corresponds to a value of 5.0 × 10^-10. This table includes the scenario where the value of l is greater than the edgesize of the array processor.
l  | Iters Cycle 1 | Iters Cycle 2 | Total time (s)
12 | 278           | 233           | 292.033
16 | 105           | 72            | 112.218
20 | 74            | 53            | 87.139
30 | 41            | 26            | 55.616
32 | 38            | 25            | 53.523
55 | 15            | 4             | 46.735
Table 3
Finally, Table 4 presents the case where the critical point corresponds to values chosen to lie between 2.25 × 10^-7 and 5.00 × 10^-10 and the values of l and m are 12 and 8, respectively. The example used is identical to that used in Table 3.
Critical point | Iters Cycle 1 | Iters Cycle 2 | Total time (s)
5.00 × 10^-10  | 278           | 233           | 292.033
5.00 × 10^-7   | 14            | 22            | 20.733
5.00 × 10^-8   | 17            | 13            | 18.263
5.00 × 10^-9   | 19            | 11            | 18.563
2.25 × 10^-7   | 15            | 15            | 17.962
8.75 × 10^-8   | 16            | 14            | 18.112
3.13 × 10^-7   | 15            | 15            | 17.962
Table 4
6: Conclusions
An analysis of the results presented in Tables
1 and 2 suggests that GPOTS is significantly
more efficient than POTS in all cases and that in
general each cycle of GPOTS requires fewer
iterations than the corresponding cycle of POTS.
Observe that the time taken for each iteration of
the primary cycle of GPOTS is greater than that
taken for each iteration of the primary cycle of
POTS. This is attributed to the extra
computation required for the orthogonalisation
process in the first cycle of GPOTS. Further,
provided that l and m are less than the edgesize
of the array processor used, the time required for
all other computations in each iteration remains
approximately the same for both algorithms.
However, the reduction of the number of iterations taken in the primary cycle of GPOTS significantly outweighs the additional computation time. This may not be the case in a sequential environment.
In general, as the number of eigenvector
approximations used in the first cycle of GPOTS
increases the execution time of the complete
algorithm decreases. This also appears to be true
in many cases where this number exceeds the
edgesize of the array processor used. An
example which illustrates this point is given in
Table 3. Presently, this behaviour is not fully
understood and further research is required to
determine the optimal number of eigenvector
approximations required, possibly using a
visualisation approach. Also, the example
presented in Table 4 suggests that, for each
partial eigenproblem, a numerical value exists
which corresponds to an optimal critical point.
Finally, it may be concluded that the use of a
visualisation approach in the analysis of parallel
numerical algorithms provides extensive insight
and knowledge. In the case study presented in
this paper, the knowledge gained from analyses
of the visualisation of the chosen data structure
has not only provoked insight but has also
resulted in the development of a significantly
enhanced algorithm. Clearly, data structures
from a variety of numerical algorithms, both
iterative and non-iterative, need to be identified
and investigated in order to further exploit the
capabilities of visualising data in parallel
numerical linear algorithms.
Acknowledgements
This research was funded by the Department of Education for Northern Ireland. The software development was carried out using the facilities of the Parallel Computer Centre at the Queen's University of Belfast.
References
[1] Sommerville, I., Software Engineering (Third Edition), Addison-Wesley Publishing Company, (1989).
[2] Lehman, M. M. and L. Belady, Program Evolution, Processes of Software Change, London: Academic Press, (1985).
[3] Treinish, L. A., The Visualisation Software Needs of Scientists, NCGA '90, 1, 6-15, (1990).
[4] McCormick, B. H. et al., Visualisation in Scientific Computing (Special Issue), Computer Graphics, 21(6), (1987).
[5] Sekuler, R. and R. Blake, Perception, Alfred A. Knopf Series, New York, (1985).
[6] Stuart, E. J. and J. S. Weston, An Algorithm for the Parallel Computation of Subsets of Eigenvalues and Associated Eigenvectors of Large Symmetric Matrices using an Array Processor, in Proceedings of the Euromicro Workshop on Parallel and Distributed Processing, (1992).
[7] Stewart, G. W., Introduction to Matrix Computations, Academic Press, New York, (1973).
[8] Bauer, F. L., Das Verfahren der Treppeniteration und verwandte Verfahren zur Lösung algebraischer Eigenwertprobleme, Z. Angew. Math. Phys., 8, 214-235, (1957).
[9] Clint, M. et al., A Comparison of Two Parallel Algorithms for the Symmetric Eigenproblem, International Journal of Computer Mathematics, 15, 291-302, (1984).
[10] Weston, J. S. and M. Clint, Two Algorithms for the Parallel Computation of Eigenvalues and Eigenvectors of Large Symmetric Matrices using the ICL DAP, Parallel Computing, 13, 281-288, (1990).
[11] Ohashi, T. et al., A Three-Dimensional Shaded Display Method for Voxel-based Representation, Eurographics '85, 221-232, (1985).
[12] Boissonnat, J. D., Shape Reconstruction from Planar Cross Sections, CVGIP, 44(1), 1-19, (1988).
[13] Cabral, B. and C. L. Hunter, Visualisation Tools at Lawrence Livermore National Laboratory, IEEE Computer, 22(8), 77-84, (1989).
[14] Veen, A. and L. D. Peachey, TROTS: A Computer Graphics System for Three-Dimensional Reconstruction from Serial Sections, Computers and Graphics, 2, 135-150, (1977).
[15] Yamashita, H. et al., Interactive Visualisation of Three-Dimensional Magnetic Fields, Journal of Visualization and Computer Animation, 2(1), 34-40, (1991).
[16] Frederick, C. and E. L. Schwartz, Brain Peeling: Viewing the Inside of a Laminar Three-Dimensional Solid, The Visual Computer, 6(1), 37-49, (1990).
[17] Thompson, W. R. and C. Sagan, Computer Visualization in Spacecraft Exploration of the Solar System, CGI '91, Springer-Verlag, 37-44, (1991).
[18] Fortran Plus Enhanced, man 102.01, AMT, (1990).