MVMULT(CMU)          UNIX Programmer's Manual          MVMULT(CMU)

NAME
     mvmult - multiply sparse matrix by dense vector on the Y-MP

SYNOPSIS
     Single-processor versions:
          Setup:
               call makedesc(descriptor, pntr, nrows, scratch)
          Matrix-vector multiply:
               call mvmult(result, matrix, vector, indx, pntr, descriptor, nrows, scratch)

     Multiprocessor versions:
          Setup:
               call mmakedesc(descriptor, pntr, nrows, scratch)
          Matrix-vector multiply:
               call mmvmult(result, matrix, vector, indx, pntr, descriptor, nrows, scratch)
          Optional tuning function:
               call setnthreads(nthreads)

DESCRIPTION
     mvmult multiplies a sparse matrix stored in compressed row format by a dense vector.

     Storage format:

               | 11  21   0   0  51 |
               | 21  22  32   0  52 |
          A =  |  0  32  33   0   0 |
               |  0   0   0  44   0 |
               | 51  52   0   0  55 |

     The matrix is stored using three one-dimensional arrays, MATRIX, INDX and PNTR, where

          MATRIX = ( 11 21 51 21 22 32 52 32 33 44 51 52 55 )
          INDX   = (  1  2  5  1  2  3  5  2  3  4  1  2  5 )
          PNTR   = (  1  4  8 10 11 14 )

     The arguments are as follows:

     descriptor  Holds additional information about the internal representation of matrix. This array is created by makedesc and used by mvmult. For a matrix with m rows and n nonzero values, this array must be at least 6000 + n/64 + m/64 words.

     nrows       The number of rows in the matrix.

     scratch     A scratch array that must be at least as big as matrix.

     result      A dense vector of length nrows which will contain the product of matrix and vector.

     vector      A dense vector of length nrows.

     matrix      The nonzero values of the matrix, stored row by row (in compressed row format).

     indx        The column indices of the corresponding elements of matrix.

     pntr        An array of nrows+1 indices. PNTR(I) points to the location in MATRIX of the first element of row I. By convention, PNTR(nrows+1) contains one more than the total number of nonzero elements (14 in the example above), so that row I occupies locations PNTR(I) through PNTR(I+1)-1 of MATRIX and INDX.

METHOD
     The implementation is based on segmented scans.

PERFORMANCE
     On the Y-MP, for a matrix with m rows and n nonzero values, the time for mvmult is approximately 2.7n + 2.5m clock periods (one clock period is 6 nsec).
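     To make the Y-MP estimate concrete, the hypothetical helper below (it is not part of the mvmult library) simply evaluates 2.7n + 2.5m and converts clock periods to milliseconds using the 6 nsec clock. It reproduces only the approximate model quoted above; it is not a measurement of the actual routines.

```python
# Hypothetical helper, not part of the mvmult library: evaluates the
# approximate Y-MP timing model t = 2.7n + 2.5m clock periods, where
# n is the number of nonzeros and m is the number of rows.
YMP_CLOCK_NS = 6.0  # Y-MP clock period in nanoseconds


def ymp_mvmult_estimate(n_nonzeros: int, n_rows: int) -> tuple[float, float]:
    """Return (clock periods, milliseconds) predicted by the model."""
    clocks = 2.7 * n_nonzeros + 2.5 * n_rows
    millis = clocks * YMP_CLOCK_NS / 1.0e6  # nanoseconds -> milliseconds
    return clocks, millis


# A matrix with 100,000 rows and 1,000,000 nonzeros (10 per row).
clocks, ms = ymp_mvmult_estimate(1_000_000, 100_000)
print(f"{clocks:.0f} clock periods, about {ms:.1f} ms")
# prints: 2950000 clock periods, about 17.7 ms
```

     For this shape the 2.7n term accounts for 2.7 million of the 2.95 million clock periods; the per-row term 2.5m matters mainly when rows hold very few nonzeros.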
     The time is independent of the number of elements in each row. The setup time (for makedesc) is approximately 3n + 4m clock periods, i.e., the time for one to two calls to mvmult.

     On the C90, the time for mvmult is approximately 1.8n + 1.7m clock periods (one clock period is 4.2 nsec).

MULTIPROCESSOR VERSIONS
     The routines mmakedesc and mmvmult take the same arguments as the single-processor routines but take advantage of multiprocessing using microtasking. The number of CPUs used is determined by the environment variable NCPUS.

     Since the microtasking interface is still under development, the current version provides some tuning functions that users can experiment with to improve microtasking performance. The multiprocessor functions break up the work into a number of "threads". The number of threads defaults to 16, but it can be changed with the setnthreads function. If the number of CPUs is much less than 16, lowering the number of threads will probably improve performance.

     Depending on the scheduling environment (batch, dedicated, etc.), it might help to use the tsktune function to keep the operating system from reclaiming processors during uniprocessor portions of the code. For example, the call

          call tsktune('HOLDTIME'L,100000)

     will hold an idle CPU for 100000 clocks before releasing it.

COMPILING AND LINKING
     To compile the matrix-vector multiply routines, type 'make mvmult.o smvmult.o'. To link with the routines, add 'mvmult.o smvmult.o' to the end of the link command.

SEE ALSO
     SPARSE(3SCI), TSKTUNE(3F)

BUGS
     Earlier versions have only been tested on a Cray Y-MP. This version will only work on the Cray Y-MP/C90. Future versions should work on all types of Y-MPs.

     If you would like to be notified of future improvements to the code, send e-mail to marco.zagha@cs.cmu.edu.

FUTURE WORK
     Someday, the routines may be generalized to support sparse-matrix by dense-matrix multiply.
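     The METHOD section states only that mvmult is based on segmented scans. As a rough, sequential illustration of that idea (a Python sketch under our own assumptions, not the vectorized Cray code), the product can be formed in four steps: multiply each nonzero by the vector entry in its column, flag the first nonzero of every row, run one segmented +-scan over the products, and read each row's sum at its last nonzero. The function names are hypothetical. The arrays keep the 1-based INDX and PNTR values of the 5x5 example matrix in DESCRIPTION, with INDX taken as the column index of each entry of MATRIX.

```python
# Sequential sketch of sparse matrix-vector multiply via a segmented
# +-scan. Not the library's code; names here are hypothetical. Python
# lists are 0-based, but INDX and PNTR hold 1-based values as in the
# manual page.

def segmented_plus_scan(values, flags):
    """Inclusive +-scan that restarts wherever flags[k] is True."""
    out = []
    running = 0
    for v, start in zip(values, flags):
        running = v if start else running + v
        out.append(running)
    return out


def csr_matvec_segscan(matrix, vector, indx, pntr):
    nrows = len(pntr) - 1
    # 1. Multiply each nonzero by the vector entry in its column.
    products = [a * vector[j - 1] for a, j in zip(matrix, indx)]
    # 2. Flag the first nonzero of every nonempty row as a segment start.
    flags = [False] * len(matrix)
    for i in range(nrows):
        if pntr[i] < pntr[i + 1]:
            flags[pntr[i] - 1] = True
    # 3. One segmented scan sums the products within each row.
    sums = segmented_plus_scan(products, flags)
    # 4. Row i's total is the scan value at its last nonzero,
    #    1-based location PNTR(i+1)-1; empty rows contribute 0.
    result = []
    for i in range(nrows):
        if pntr[i] < pntr[i + 1]:
            result.append(sums[pntr[i + 1] - 2])
        else:
            result.append(0)
    return result


MATRIX = [11, 21, 51, 21, 22, 32, 52, 32, 33, 44, 51, 52, 55]
INDX = [1, 2, 5, 1, 2, 3, 5, 2, 3, 4, 1, 2, 5]
PNTR = [1, 4, 8, 10, 11, 14]

print(csr_matvec_segscan(MATRIX, [1, 2, 3, 4, 5], INDX, PNTR))
# prints: [308, 421, 163, 176, 430]
```

     Every step above is a single pass over all n nonzeros, so the amount of work does not depend on how the nonzeros are distributed among rows. The loop-based scan here is only a reference; on a vector machine each step would instead be expressed with vector operations.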
REFERENCE
     The technique used is an extension of the segmented scan algorithm in:

          Siddhartha Chatterjee, Guy E. Blelloch, and Marco Zagha, "Scan Primitives for Vector Computers", in Proceedings of Supercomputing '90, pp. 666-675.

DISTRIBUTION
     Copyright (c) 1992 Carnegie Mellon University, Marco Zagha, and Guy Blelloch. All Rights Reserved.

     Permission to use, copy, modify and distribute this software and its documentation is hereby granted, provided that both the copyright notice and this permission notice appear in all copies of the software, derivative works or modified versions, and any portions thereof, and that both notices appear in supporting documentation.

     CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS" CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.

     We request users of this software to return to

          Marco Zagha and Guy Blelloch
          School of Computer Science
          Carnegie Mellon University
          5000 Forbes Ave.
          Pittsburgh PA 15213-3890

          marco.zagha@cs.cmu.edu
          guy.blelloch@cs.cmu.edu

     any improvements or extensions that they make and grant Carnegie Mellon the rights to redistribute these changes.