MVMULT(CMU)          UNIX Programmer's Manual          MVMULT(CMU)
NAME
mvmult - multiply sparse matrix by dense vector on the Cray
SYNOPSIS
There are several versions available:
mvmult   -- fast CAL single processor implementation
mmvmult  -- fast CAL multi-processor implementation
fmvmult  -- fast pure fortran implementation (not as fast as mvmult)
cmvmult  -- clean fortran implementation (slower but more readable code)
c2mvmult -- clean fortran implementation (slightly faster than cmvmult)
Setup:
call makedesc (descriptor,pntr,nrows,scratch)
call mmakedesc (descriptor,pntr,nrows,scratch)
call fmakedesc (descriptor,pntr,nrows,scratch)
call cmakedesc (descriptor,pntr,nrows,scratch)
call c2makedesc(descriptor,pntr,nrows,scratch)
Sparse-matrix vector multiply:
call mvmult  (result,matrix,vector,indx,pntr,descriptor,nrows,scratch)
call mmvmult (result,matrix,vector,indx,pntr,descriptor,nrows,scratch)
call fmvmult (result,matrix,vector,indx,pntr,descriptor,nrows,scratch)
call cmvmult (result,matrix,vector,indx,pntr,descriptor,nrows,scratch)
call c2mvmult(result,matrix,vector,indx,pntr,descriptor,nrows,scratch)
Optional multi-processor tuning function:
call setnthreads(nthreads)
Optional function for allowing zero-length rows in the CAL versions:
call checkzero(1)
DESCRIPTION
mvmult multiplies a sparse matrix stored in compressed row
format by a dense vector.
Storage format:
        | 11  21   0   0  51 |
        | 21  22  32   0  52 |
    A = |  0  32  33   0   0 |
        |  0   0   0  44   0 |
        | 51  52   0   0  55 |
The matrix is stored using three one-dimensional arrays
MATRIX, INDX, PNTR where
MATRIX = ( 11 21 51 21 22 32 52 32 33 44 51 52 55 )
INDX   = (  1  2  5  1  2  3  5  2  3  4  1  2  5 )
PNTR   = (  1  4  8 10 11 14 )
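As an illustration of the layout, the following Python sketch builds the three arrays from a small dense matrix. It is not part of the mvmult package: the function name to_csr is hypothetical, and the dense 5-by-5 matrix is written out in the code so the sketch is self-contained. The values stored in indx and pntr are 1-based, as the Fortran-callable routines expect.

```python
def to_csr(dense):
    """Return (matrix, indx, pntr) for a dense list-of-rows matrix.

    indx and pntr hold 1-based values, matching the Fortran arrays.
    to_csr is a hypothetical helper, not a routine from the package.
    """
    matrix, indx, pntr = [], [], [1]
    for row in dense:
        for col, value in enumerate(row, start=1):
            if value != 0:
                matrix.append(value)   # nonzero value, stored row by row
                indx.append(col)       # its column index
        # The next row starts one past the last value stored so far.
        pntr.append(len(matrix) + 1)
    return matrix, indx, pntr


A = [
    [11, 21, 0, 0, 51],
    [21, 22, 32, 0, 52],
    [0, 32, 33, 0, 0],
    [0, 0, 0, 44, 0],
    [51, 52, 0, 0, 55],
]
matrix, indx, pntr = to_csr(A)
print("MATRIX =", matrix)
print("INDX   =", indx)
print("PNTR   =", pntr)
```

Note that the last entry of pntr is one more than the number of stored values, so row I always occupies positions PNTR(I) through PNTR(I+1)-1.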
The arguments are as follows:
descriptor
Holds additional information about the internal
representation of matrix. This array is created by
makedesc and used by mvmult. For the mvmult,
mmvmult, and fmvmult implementations, using a matrix
with m rows and n nonzero values, this array must be
at least 6000 + n/64 + m/64 words. For the cmvmult
and c2mvmult implementations, using a matrix with m
rows and n nonzero values, this array must be at
least 6000 + n words.
nrows
The number of rows in the matrix.
scratch
A scratch array that must be at least as big as
matrix.
result
A dense vector of length nrows which will contain
the product of matrix and vector.
vector
A dense vector of length nrows.
matrix
The nonzero values of the matrix stored row-by-row
(in compressed row format).
indx
The column indices of the corresponding elements of
the matrix.
pntr
An array of nrows+1 indices. PNTR(I) points to the
location in MATRIX of the first element of row I. By
convention, PNTR(nrows+1) is one more than the total
number of nonzero elements (in the example above,
PNTR(6) is 14). For the fortran versions, these
indices must be strictly increasing, i.e., zero-length
rows are not allowed. For the CAL versions,
zero-length rows are allowed only if the checkzero
function is called with a nonzero integer value.
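The constraints on the arguments can be checked before calling the library. The Python sketch below is a hedged illustration only: the helper names (min_descriptor_words, check_pntr, reference_mvmult) are hypothetical, rounding n/64 and m/64 upward is an assumption made to stay on the safe side, and the plain row-by-row product is meant for verifying results on small inputs, not as a description of how mvmult works.

```python
import math


def min_descriptor_words(nrows, nnz, clean=False):
    """Lower bound on the descriptor length, in words.

    clean=False: mvmult, mmvmult, fmvmult (6000 + n/64 + m/64).
    clean=True:  cmvmult, c2mvmult        (6000 + n).
    Rounding the divisions up is an assumption, not from the manual.
    """
    if clean:
        return 6000 + nnz
    return 6000 + math.ceil(nnz / 64) + math.ceil(nrows / 64)


def check_pntr(pntr, nrows, allow_zero_rows=False):
    """Validate a 1-based pntr array and return the nonzero count."""
    if len(pntr) != nrows + 1 or pntr[0] != 1:
        raise ValueError("pntr must hold nrows+1 indices starting at 1")
    for start, stop in zip(pntr, pntr[1:]):
        if stop < start:
            raise ValueError("pntr must not decrease")
        if stop == start and not allow_zero_rows:
            raise ValueError("zero-length row not allowed")
    return pntr[-1] - 1


def reference_mvmult(matrix, vector, indx, pntr, nrows):
    """Plain row-by-row product, for checking results only."""
    result = [0] * nrows
    for i in range(nrows):
        # Convert the 1-based range PNTR(I) .. PNTR(I+1)-1 to 0-based.
        for k in range(pntr[i] - 1, pntr[i + 1] - 1):
            result[i] += matrix[k] * vector[indx[k] - 1]
    return result


matrix = [11, 21, 51, 21, 22, 32, 52, 32, 33, 44, 51, 52, 55]
indx = [1, 2, 5, 1, 2, 3, 5, 2, 3, 4, 1, 2, 5]
pntr = [1, 4, 8, 10, 11, 14]
nrows = 5

nnz = check_pntr(pntr, nrows)
print(nnz, min_descriptor_words(nrows, nnz))
print(reference_mvmult(matrix, [1, 1, 1, 1, 1], indx, pntr, nrows))
```

With a vector of all ones the result is simply the sum of each row, which makes the example easy to check by hand.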
METHOD
The implementation is based on segmented scans.
MORE HERE.
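As a rough illustration of the idea (not the CAL or Fortran implementation, whose details are not given here), the product can be split into an elementwise multiply followed by a segmented sum in which each row forms one segment. The names segmented_plus_scan and scan_mvmult are hypothetical, and the sketch assumes there are no zero-length rows.

```python
def segmented_plus_scan(values, flags):
    """Inclusive plus-scan that restarts wherever flags[k] is True."""
    out = []
    running = 0
    for value, flag in zip(values, flags):
        running = value if flag else running + value
        out.append(running)
    return out


def scan_mvmult(matrix, vector, indx, pntr, nrows):
    """Sparse matrix-vector product via a segmented scan (sketch)."""
    nnz = pntr[nrows] - 1
    # Step 1: one product per stored nonzero (a gather plus a multiply).
    products = [matrix[k] * vector[indx[k] - 1] for k in range(nnz)]
    # Step 2: flag the first element of every row (segment).
    flags = [False] * nnz
    for i in range(nrows):
        flags[pntr[i] - 1] = True
    # Step 3: segmented scan; the last entry of each segment is the row sum.
    sums = segmented_plus_scan(products, flags)
    return [sums[pntr[i + 1] - 2] for i in range(nrows)]


matrix = [11, 21, 51, 21, 22, 32, 52, 32, 33, 44, 51, 52, 55]
indx = [1, 2, 5, 1, 2, 3, 5, 2, 3, 4, 1, 2, 5]
pntr = [1, 4, 8, 10, 11, 14]
result = scan_mvmult(matrix, [1, 2, 3, 4, 5], indx, pntr, 5)
print(result)
```

Steps 1 and 3 each run over all nonzero values as one long vector, regardless of how the values are divided among rows.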
PERFORMANCE
On the Y-MP (6 nsec clock period), for a matrix with m rows
and n nonzero values, the time for mvmult is approximately
2.7n + 2.5m clock periods. The time is independent of the
number of elements in each row. The setup time (for makedesc)
is approximately 3n + 4m clock periods, i.e., the time for one
to two calls to mvmult. On the C90 (4.2 nsec clock period),
the time for mvmult is approximately 1.7n + 1.6m clock
periods. The fortran versions are approximately 1.5 to 2.2
times slower than the CAL versions for large problems, and
they have a higher startup overhead.
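To turn these formulas into wall-clock estimates, multiply the clock-period count by the clock period. The Python sketch below does this for a hypothetical matrix with m = 10000 rows and n = 100000 nonzero values; the function name estimated_seconds and the problem size are illustrative assumptions, and the coefficients are the approximate ones quoted above.

```python
def estimated_seconds(nrows, nnz, per_nonzero, per_row, clock_ns):
    """Clock-period model from this section, converted to seconds."""
    clocks = per_nonzero * nnz + per_row * nrows
    return clocks * clock_ns * 1e-9


m, n = 10_000, 100_000  # hypothetical problem size

ymp = estimated_seconds(m, n, 2.7, 2.5, 6.0)    # mvmult on the Y-MP
c90 = estimated_seconds(m, n, 1.7, 1.6, 4.2)    # mvmult on the C90
setup = estimated_seconds(m, n, 3.0, 4.0, 6.0)  # makedesc on the Y-MP

print(f"Y-MP mvmult:   {ymp * 1e3:.2f} ms")
print(f"C90 mvmult:    {c90 * 1e3:.2f} ms")
print(f"Y-MP makedesc: {setup * 1e3:.2f} ms ({setup / ymp:.2f} calls to mvmult)")
```

For this size the model gives about 1.8 ms per multiply on the Y-MP and a setup cost slightly above that of one multiply, so the setup is amortized quickly when the same matrix is multiplied many times.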
MULTIPROCESSOR VERSIONS
The routines mmakedesc and mmvmult take the same arguments
as the single-processor routines but take advantage of
multi-processing using microtasking. The number of CPUs
used is determined by the environment variable 'NCPUS'.
Since the microtasking interface is still under development,
there are some tuning functions in the current version that
users can experiment with to improve the microtasking
performance. The multiprocessor functions break up the work
into a number of "threads". The number of threads used
defaults to 16, but it can be modified using the setnthreads
function. If the number of CPUs is much less than 16,
lowering the number of threads will probably improve
performance.
Depending on the scheduling environment (batch, dedicated,
etc.) it might be helpful to use the tsktune function to
keep the operating system from reclaiming processors during
uniprocessor portions of the code. For example, the call
call tsktune('HOLDTIME'L,100000)
will hold an idle cpu for 100000 clocks before releasing it.
COMPILING AND LINKING
To compile the mvmult routines, type 'make mvmult.o
smvmult.o puremv.o'. To link with the routines, add
'mvmult.o smvmult.o puremv.o' to the end of the link command.
SEE ALSO
SPARSE(3SCI), TSKTUNE(3F)
BUGS
Earlier versions have only been tested on a Cray Y-MP. The
current CAL versions will only work on the Cray Y-MP/C90.
Future versions should work on all types of Y-MPs.
If you would like to be notified of future improvements to
the code, send e-mail to marco.zagha@cs.cmu.edu.
FUTURE WORK
Someday, the routines may be generalized to support sparse
matrix by dense matrix multiply and block entries. Support
for zero-length rows might be added to the fortran versions.
REFERENCE
The technique used is an extension of the segmented scan
algorithm described in:
Siddhartha Chatterjee, Guy E. Blelloch, and Marco Zagha,
"Scan Primitives for Vector Computers", in Proceedings of
Supercomputing '90, 666-675.
The sparse-matrix vector multiplication algorithm is
described in:
Guy E. Blelloch, Michael A. Heroux, and Marco Zagha,
"Segmented Operations for Sparse Matrix Computations on
Vector Multiprocessors", in preparation.
DISTRIBUTION
Copyright (c) 1992 Carnegie Mellon University, Marco Zagha,
and Guy Blelloch. All Rights Reserved.
Permission to use, copy, modify and distribute this software
and its documentation is hereby granted, provided that both
the copyright notice and this permission notice appear in
all copies of the software, derivative works or modified
versions, and any portions thereof, and that both notices
appear in supporting documentation.
CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS
IS" CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF
ANY KIND FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE
OF THIS SOFTWARE.
We request users of this software to return to
Marco Zagha and Guy Blelloch
School of Computer Science
Carnegie Mellon University
5000 Forbes Ave.
Pittsburgh PA 15213-3890
marco.zagha@cs.cmu.edu
guy.blelloch@cs.cmu.edu
any improvements or extensions that they make and grant
Carnegie Mellon the rights to redistribute these changes.