MVMULT(CMU)          UNIX Programmer's Manual          MVMULT(CMU)
NAME
mvmult - multiply sparse matrix by dense vector on the Y-MP
SYNOPSIS
Single processor versions:
Setup:
call makedesc(descriptor, pntr, nrows, scratch)
Matrix-vector multiply:
call mvmult(result, matrix, vector, indx, pntr, descriptor,
nrows, scratch)
Multi-processor versions:
Setup:
call mmakedesc(descriptor, pntr, nrows, scratch)
Matrix-vector multiply:
call mmvmult(result, matrix, vector, indx, pntr, descriptor,
nrows, scratch)
Optional tuning function:
call setnthreads(nthreads)
DESCRIPTION
mvmult multiplies a sparse matrix stored in compressed row
format by a dense vector.
Storage format:
    | 11  21   0   0  51 |
    | 21  22  32  52   0 |
A = |  0  32  33   0   0 |
    |  0   0   0  44   0 |
    | 51  52   0   0  55 |

The matrix is stored using three one-dimensional arrays
MATRIX, INDX, PNTR where

MATRIX = ( 11 21 51 21 22 32 52 32 33 44 51 52 55 )
INDX   = (  1  2  5  1  2  3  4  2  3  4  1  2  5 )
PNTR   = (  1  4  8 10 11 14 )
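
As an illustration of the storage format, the following Python
sketch builds the three arrays from a dense copy of the example
matrix. It is not part of the mvmult package: the function name
dense_to_csr is hypothetical, and the indices are kept 1-based
to match the Fortran arrays described above.

```python
def dense_to_csr(a):
    """Return (matrix, indx, pntr) for a dense list-of-rows matrix `a`.

    All indices are 1-based, as in the Fortran arrays.
    """
    matrix, indx, pntr = [], [], []
    for row in a:
        # PNTR(I): 1-based position in MATRIX of the first element of row I
        pntr.append(len(matrix) + 1)
        for col, value in enumerate(row, start=1):
            if value != 0:
                matrix.append(value)  # nonzero values, row by row
                indx.append(col)      # column index of each stored value
    # Closing entry: one past the position of the last stored value
    pntr.append(len(matrix) + 1)
    return matrix, indx, pntr


A = [
    [11, 21,  0,  0, 51],
    [21, 22, 32, 52,  0],
    [ 0, 32, 33,  0,  0],
    [ 0,  0,  0, 44,  0],
    [51, 52,  0,  0, 55],
]

matrix, indx, pntr = dense_to_csr(A)
print(matrix)  # 13 nonzero values
print(indx)    # 13 column indices
print(pntr)    # 5 rows + 1 closing entry
```

Row I of the matrix occupies positions PNTR(I) through
PNTR(I+1)-1 of MATRIX and INDX.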
The arguments are as follows:
descriptor
Holds additional information about the internal
representation of matrix. This array is created by
makedesc and used by mvmult. For a matrix with m
rows and n nonzero values, this array must be at
least 6000 + n/64 + m/64 words.
nrows
The number of rows in the matrix.
scratch
A scratch array that must be at least as big as
matrix.
result
A dense vector of length nrows which will contain
the product of matrix and vector.
vector
A dense vector of length nrows.
matrix
The nonzero values of the matrix stored row-by-row
(in compressed row format).
indx
The column indices of the corresponding elements of
the matrix.
pntr
An array of nrows+1 indices. PNTR(I) points to the
location in MATRIX of the first element of row I. By
convention, PNTR(nrows+1) is one more than the total
number of nonzero elements, so that row I occupies
locations PNTR(I) through PNTR(I+1)-1 of MATRIX.
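
To make the argument conventions concrete, here is a minimal
Python sketch of the product that mvmult is documented to
compute, together with the minimum descriptor size. It is a
plain reference loop for checking results, not the Cray
implementation. The names reference_mvmult and
min_descriptor_words are hypothetical, and rounding each
quotient in the size formula upward is an assumption made here
to stay on the safe side.

```python
import math


def min_descriptor_words(m, n):
    """Smallest descriptor length for m rows and n nonzero values."""
    # Assumption: each quotient in 6000 + n/64 + m/64 is rounded up.
    return 6000 + math.ceil(n / 64) + math.ceil(m / 64)


def reference_mvmult(matrix, vector, indx, pntr, nrows):
    """Plain row-by-row product; indx and pntr hold 1-based values."""
    result = [0.0] * nrows
    for i in range(nrows):
        # Row i+1 occupies MATRIX(PNTR(i+1)) .. MATRIX(PNTR(i+2)-1).
        for k in range(pntr[i] - 1, pntr[i + 1] - 1):
            result[i] += matrix[k] * vector[indx[k] - 1]
    return result


# A 5-row example with 13 nonzero values in compressed row format.
matrix = [11, 21, 51, 21, 22, 32, 52, 32, 33, 44, 51, 52, 55]
indx = [1, 2, 5, 1, 2, 3, 4, 2, 3, 4, 1, 2, 5]
pntr = [1, 4, 8, 10, 11, 14]
vector = [1, 2, 3, 4, 5]

result = reference_mvmult(matrix, vector, indx, pntr, 5)
print(result)                                 # [308.0, 369.0, 163.0, 176.0, 430.0]
print(min_descriptor_words(5, len(matrix)))   # 6002
```

The scratch array needs at least len(matrix) words, since it
must be at least as big as matrix.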
METHOD
The implementation is based on segmented scans.
MORE HERE.
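
As a rough illustration of how a segmented scan can express
the product, the Python sketch below multiplies every stored
value by the matching vector element, runs one segmented
plus-scan over the products with a segment per row, and reads
each row's result from the last position of its segment. This
is a serial sketch of the general idea only. It is not the
vectorized Cray code, and the function names are hypothetical.

```python
def segmented_plus_scan(values, flags):
    """Inclusive running sum that restarts wherever flags[k] is True."""
    out = []
    running = 0
    for value, starts_segment in zip(values, flags):
        running = value if starts_segment else running + value
        out.append(running)
    return out


def mvmult_by_segments(matrix, vector, indx, pntr, nrows):
    """Sparse matrix times dense vector; indx and pntr are 1-based."""
    # Step 1: one product per stored nonzero (gather, then multiply).
    products = [a * vector[j - 1] for a, j in zip(matrix, indx)]

    # Step 2: flag the first stored element of every non-empty row.
    flags = [False] * len(matrix)
    for i in range(nrows):
        if pntr[i] < pntr[i + 1]:
            flags[pntr[i] - 1] = True

    # Step 3: a single segmented scan covers all rows at once.
    scanned = segmented_plus_scan(products, flags)

    # Step 4: the last element of each segment is that row's sum;
    # empty rows contribute zero.
    return [
        scanned[pntr[i + 1] - 2] if pntr[i] < pntr[i + 1] else 0
        for i in range(nrows)
    ]


matrix = [11, 21, 51, 21, 22, 32, 52, 32, 33, 44, 51, 52, 55]
indx = [1, 2, 5, 1, 2, 3, 4, 2, 3, 4, 1, 2, 5]
pntr = [1, 4, 8, 10, 11, 14]

result = mvmult_by_segments(matrix, [1, 2, 3, 4, 5], indx, pntr, 5)
print(result)  # [308, 369, 163, 176, 430]
```

Because the scan runs over all nonzero values as one long
vector, the work does not depend on how the values are
distributed among the rows.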
PERFORMANCE
On the Y-MP, for a matrix with m rows and n nonzero values,
the time for mvmult is approximately 2.7n + 2.5m clock
periods (6 nsec). The time is independent of the number of
elements in each row. The setup time (for makedesc) is
approximately 3n + 4m clock periods, i.e., the time for one
to two calls to mvmult. On the C90, the time for mvmult is
approximately 1.8n + 1.7m clock periods (4.2 nsec).
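
The estimates above can be turned into rough wall-clock
figures. The Python sketch below simply evaluates the quoted
formulas; the function names are hypothetical, and applying
the 6 nsec Y-MP clock period to the makedesc estimate is an
assumption based on where that figure appears in the text.

```python
# (clock periods per nonzero, clock periods per row, clock period in seconds)
MVMULT_MODEL = {
    "ymp": (2.7, 2.5, 6.0e-9),
    "c90": (1.8, 1.7, 4.2e-9),
}


def mvmult_seconds(n, m, machine="ymp"):
    """Approximate time of one mvmult call for n nonzeros and m rows."""
    per_nonzero, per_row, clock = MVMULT_MODEL[machine]
    return (per_nonzero * n + per_row * m) * clock


def makedesc_seconds(n, m):
    """Approximate setup time, assuming the Y-MP clock period."""
    return (3.0 * n + 4.0 * m) * 6.0e-9


n, m = 1_000_000, 100_000
print(f"Y-MP mvmult:   {mvmult_seconds(n, m, 'ymp') * 1e3:.3f} ms")
print(f"C90 mvmult:    {mvmult_seconds(n, m, 'c90') * 1e3:.3f} ms")
print(f"Y-MP makedesc: {makedesc_seconds(n, m) * 1e3:.3f} ms")
```

For a matrix with one million nonzero values in one hundred
thousand rows, this gives about 17.7 msec per multiply on the
Y-MP, about 8.3 msec on the C90, and about 20.4 msec for setup.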
MULTIPROCESSOR VERSIONS
The routines mmakedesc and mmvmult take the same arguments
as the single-processor routines but take advantage of
multi-processing using microtasking. The number of CPUs
used is determined by the environment variable 'NCPUS'.
Since the microtasking interface is still under development,
there are some tuning functions in the current version that
users can experiment with to improve the microtasking
performance. The multiprocessor functions break up the work
into a number of "threads". The number of threads used
defaults to 16, but it can be modified using the setnthreads
function. If the number of CPUs is much less than 16, it
will probably improve performance to lower the number of
threads.
Depending on the scheduling environment (batch, dedicated,
etc.) it might be helpful to use the tsktune function to
keep the operating system from reclaiming processors during
uniprocessor portions of the code. For example, the call
call tsktune('HOLDTIME'L,100000)
will hold an idle CPU for 100000 clocks before releasing it.
COMPILING AND LINKING
To compile the routines, type 'make mvmult.o
smvmult.o'. To link with the routines, add 'mvmult.o
smvmult.o' to the end of the link command.
SEE ALSO
SPARSE(3SCI), TSKTUNE(3F)
BUGS
Earlier versions have only been tested on a Cray Y-MP.
This version will only work on the Cray Y-MP/C90. Future
versions should work on all types of Y-MP's.
If you would like to be notified of future improvements to
the code, send e-mail to marco.zagha@cs.cmu.edu.
FUTURE WORK
Someday, the routines may be generalized to support sparse
matrix by dense matrix multiply.
REFERENCE
The technique used is an extension of the segmented scan
algorithm in: Siddhartha Chatterjee, Guy E. Blelloch, and
Marco Zagha, "Scan Primitives for Vector Computers", in
Proceedings of Supercomputing '90, 666-675.
DISTRIBUTION
Copyright (c) 1992 Carnegie Mellon University, Marco Zagha,
and Guy Blelloch. All Rights Reserved.
Permission to use, copy, modify and distribute this software
and its documentation is hereby granted, provided that both
the copyright notice and this permission notice appear in
all copies of the software, derivative works or modified
versions, and any portions thereof, and that both notices
appear in supporting documentation.
CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS
IS" CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF
ANY KIND FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE
OF THIS SOFTWARE.
We request users of this software to return to
Marco Zagha and Guy Blelloch
School of Computer Science
Carnegie Mellon University
5000 Forbes Ave.
Pittsburgh PA 15213-3890
marco.zagha@cs.cmu.edu
guy.blelloch@cs.cmu.edu
any improvements or extensions that they make and grant
Carnegie Mellon the rights to redistribute these changes.