MVMULT(CMU)            UNIX Programmer's Manual            MVMULT(CMU)

NAME
     mvmult - multiply sparse matrix by dense vector on the Cray

SYNOPSIS
     There are several versions available:

       mvmult   -- fast CAL single-processor implementation
       mmvmult  -- fast CAL multi-processor implementation
       fmvmult  -- fast pure Fortran implementation (not as fast as
                   mvmult)
       cmvmult  -- clean Fortran implementation (slower but more
                   readable code)
       c2mvmult -- clean Fortran implementation (slightly faster than
                   cmvmult)

     Setup:

       call makedesc  (descriptor,pntr,nrows,scratch)
       call mmakedesc (descriptor,pntr,nrows,scratch)
       call fmakedesc (descriptor,pntr,nrows,scratch)
       call cmakedesc (descriptor,pntr,nrows,scratch)
       call c2makedesc(descriptor,pntr,nrows,scratch)

     Sparse-matrix vector multiply:

       call mvmult  (result,matrix,vector,indx,pntr,descriptor,nrows,scratch)
       call mmvmult (result,matrix,vector,indx,pntr,descriptor,nrows,scratch)
       call fmvmult (result,matrix,vector,indx,pntr,descriptor,nrows,scratch)
       call cmvmult (result,matrix,vector,indx,pntr,descriptor,nrows,scratch)
       call c2mvmult(result,matrix,vector,indx,pntr,descriptor,nrows,scratch)

     Optional multi-processor tuning function:

       call setnthreads(nthreads)

     Optional function for allowing zero-length rows in CAL versions:

       call checkzero(1)

DESCRIPTION
     mvmult multiplies a sparse matrix stored in compressed row format
     by a dense vector.

     Storage format:

           | 11  21   0   0  51 |
           | 21  22  32   0  52 |
       A = |  0  32  33   0   0 |
           |  0   0   0  44   0 |
           | 51  52   0   0  55 |

     The matrix is stored using three one-dimensional arrays MATRIX,
     INDX, PNTR where

       MATRIX = ( 11 21 51 21 22 32 52 32 33 44 51 52 55 )
       INDX   = (  1  2  5  1  2  3  5  2  3  4  1  2  5 )
       PNTR   = (  1  4  8 10 11 14 )

     The arguments are as follows:

     descriptor
          Holds additional information about the internal
          representation of matrix. This array is created by makedesc
          and used by mvmult. For the mvmult, mmvmult, and fmvmult
          implementations, using a matrix with m rows and n nonzero
          values, this array must be at least 6000 + n/64 + m/64
          words.
          For the cmvmult and c2mvmult implementations, using a matrix
          with m rows and n nonzero values, this array must be at
          least 6000 + n words.

     nrows
          The number of rows in the matrix.

     scratch
          A scratch array that must be at least as big as matrix.

     result
          A dense vector of length nrows which will contain the
          product of matrix and vector.

     vector
          A dense vector of length nrows.

     matrix
          The nonzero values of the matrix stored row-by-row (in
          compressed row format).

     indx
          The column indices of the corresponding elements of matrix.

     pntr
          An array of nrows+1 indices. PNTR(I) points to the location
          in MATRIX of the first element of row I. By convention,
          PNTR(NROWS+1) points one past the last nonzero element,
          i.e., it holds the total number of nonzero elements plus
          one (14 in the example above, which has 13 nonzeros). For
          the Fortran versions, these indices must be strictly
          increasing, i.e., zero-length rows are not allowed. For the
          CAL versions, zero-length rows are allowed only if the
          checkzero function is called with a nonzero integer value.

METHOD
     The implementation is based on segmented scans; see REFERENCE.

PERFORMANCE
     On the Y-MP, for a matrix with m rows and n nonzero values, the
     time for mvmult is approximately 2.7n + 2.5m clock periods (one
     clock period is 6 nsec). The time is independent of the number
     of elements in each row. The setup time (for makedesc) is
     approximately 3n + 4m clock periods, i.e., the time for one to
     two calls to mvmult.

     On the C90, the time for mvmult is approximately 1.7n + 1.6m
     clock periods (one clock period is 4.2 nsec).

     The Fortran versions are approximately 1.5 to 2.2 times slower
     than the CAL versions for large problems. The Fortran versions
     also have a higher startup overhead.

MULTIPROCESSOR VERSIONS
     The routines mmakedesc and mmvmult take the same arguments as the
     single-processor routines but take advantage of multi-processing
     using microtasking. The number of CPUs used is determined by the
     environment variable 'NCPUS'.

     Since the microtasking interface is still under development,
     there are some tuning functions in the current version that users
     can experiment with to improve the microtasking performance.

     The multiprocessor functions break up the work into a number of
     "threads". The number of threads used defaults to 16, but it can
     be modified using the setnthreads function. If the number of
     CPUs is much less than 16, lowering the number of threads will
     probably improve performance.

     Depending on the scheduling environment (batch, dedicated, etc.)
     it might be helpful to use the tsktune function to keep the
     operating system from reclaiming processors during uniprocessor
     portions of the code. For example, the call

       call tsktune('HOLDTIME'L,100000)

     will hold an idle CPU for 100000 clocks before releasing it.

COMPILING AND LINKING
     To compile the routines, type 'make mvmult.o smvmult.o puremv.o'.
     To link with the routines, add 'mvmult.o smvmult.o puremv.o' to
     the end of the link command.

SEE ALSO
     SPARSE(3SCI), TSKTUNE(3F)

BUGS
     Earlier versions have only been tested on a Cray Y-MP. The
     current CAL versions will only work on the Cray Y-MP/C90. Future
     versions should work on all types of Y-MPs.

     If you would like to be notified of future improvements to the
     code, send e-mail to marco.zagha@cs.cmu.edu.

FUTURE WORK
     Someday, the routines may be generalized to support sparse matrix
     by dense matrix multiply and block entries. Support for
     zero-length rows might be added to the Fortran versions.

REFERENCE
     The technique used is an extension of the segmented scan
     algorithm described in:

          Siddhartha Chatterjee, Guy E. Blelloch, and Marco Zagha,
          "Scan Primitives for Vector Computers", in Proceedings of
          Supercomputing '90, 666-675.

     The sparse-matrix vector multiplication algorithm is described
     in:

          Guy E. Blelloch, Michael A. Heroux, and Marco Zagha,
          "Segmented Operations for Sparse Matrix Computations on
          Vector Multiprocessors", in preparation.
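
EXAMPLE
     The storage format and the result of the multiply can be checked
     against a small reference model. The Python sketch below is an
     illustration only and is not part of this package. The function
     names csr_matvec and csr_matvec_segmented are hypothetical. The
     second function only mimics the segmented-sum idea named under
     METHOD (form all products, then sum within row segments); it is
     not the CAL or Fortran code. The arrays follow the 5x5 matrix A
     under DESCRIPTION, with INDX assumed to hold the column of each
     stored value as read off A, and with 1-based INDX and PNTR values
     kept as they would be in Fortran.

```python
# 5x5 example matrix from DESCRIPTION, in compressed row format.
MATRIX = [11, 21, 51, 21, 22, 32, 52, 32, 33, 44, 51, 52, 55]
INDX = [1, 2, 5, 1, 2, 3, 5, 2, 3, 4, 1, 2, 5]   # 1-based column indices
PNTR = [1, 4, 8, 10, 11, 14]                      # 1-based row starts


def csr_matvec(matrix, indx, pntr, vector):
    """Row-by-row reference: result(i) = sum of matrix(k)*vector(indx(k))."""
    nrows = len(pntr) - 1
    result = [0.0] * nrows
    for i in range(nrows):
        # Row i occupies 1-based positions pntr[i] .. pntr[i+1]-1.
        for k in range(pntr[i] - 1, pntr[i + 1] - 1):
            result[i] += matrix[k] * vector[indx[k] - 1]
    return result


def csr_matvec_segmented(matrix, indx, pntr, vector):
    """Same product, organized as elementwise multiply + segmented sum."""
    nrows = len(pntr) - 1
    # Step 1: one product per stored nonzero, regardless of row lengths.
    products = [a * vector[j - 1] for a, j in zip(matrix, indx)]
    # Step 2: flag the first element of each nonempty row (segment start).
    flags = [False] * len(matrix)
    for i in range(nrows):
        if pntr[i] < pntr[i + 1]:
            flags[pntr[i] - 1] = True
    # Step 3: running sum that restarts at every flag (a segmented scan).
    scan = []
    running = 0.0
    for p, f in zip(products, flags):
        running = p if f else running + p
        scan.append(running)
    # Step 4: the last scan value in each segment is that row's result.
    result = [0.0] * nrows
    for i in range(nrows):
        if pntr[i] < pntr[i + 1]:
            result[i] = scan[pntr[i + 1] - 2]
    return result
```

     With vector = (1, 2, 3, 4, 5), both functions return
     (308, 421, 163, 176, 430). In the segmented form, steps 1 and 3
     each make a single pass over all nonzeros, which is one way to
     see why the running time can be independent of the number of
     elements in each row.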

DISTRIBUTION
     Copyright (c) 1992 Carnegie Mellon University, Marco Zagha, and
     Guy Blelloch. All Rights Reserved.

     Permission to use, copy, modify and distribute this software and
     its documentation is hereby granted, provided that both the
     copyright notice and this permission notice appear in all copies
     of the software, derivative works or modified versions, and any
     portions thereof, and that both notices appear in supporting
     documentation.

     CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
     CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
     FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS
     SOFTWARE.

     We request users of this software to return to

          Marco Zagha and Guy Blelloch
          School of Computer Science
          Carnegie Mellon University
          5000 Forbes Ave.
          Pittsburgh PA 15213-3890

          marco.zagha@cs.cmu.edu
          guy.blelloch@cs.cmu.edu

     any improvements or extensions that they make and grant Carnegie
     Mellon the rights to redistribute these changes.