● Fundamental inner kernels
  – Inner product
  – Outer product
  – Matrix-Vector multiplication
  – Matrix-Matrix multiplication
● Matrix formats
  – Dense matrices
  – Sparse matrices
    ● Coordinate Storage (COO)
    ● Compressed Row Storage (CRS)
    ● Compressed Column Storage (CCS)
    ● ELLPACK/ITPACK Storage
    ● Hybrid Storage (HYB)
    ● Block Compressed Row Storage (BCRS)
We are led to identify three levels of linear algebra operations: vector-vector operations, matrix-vector operations and matrix-matrix operations.
Inner product of two n-vectors x and y is given by $x^T y = \sum_{i=1}^{n} x_i y_i$.
Computation of the inner product requires n multiplications and n − 1 additions. For simplicity, model serial time as $T_1 = t_c n$, where $t_c$ is the time for one scalar multiply-add operation.
Outer product of two n-vectors x and y is the n × n matrix $Z = x y^T$ whose (i,j) entry is $z_{ij} = x_i y_j$. Computation of the outer product requires $n^2$ multiplications, so model serial time as $T_1 = t_c n^2$.
Consider the matrix-vector product y = A*x, where A is an n × n matrix and x and y are n-vectors. Components of vector y are given by $y_i = \sum_{j=1}^{n} a_{ij} x_j$, i = 1,...,n. Each of the n components requires n multiply-add operations, so model serial time as $T_1 = t_c n^2$.
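The three kernels described so far can be sketched in plain Python to make the operation counts concrete. This is only an illustrative sketch: the function names and the `serial_time` helper are assumptions introduced here, and `tc` stands for the multiply-add time $t_c$ of the cost model.

```python
def inner_product(x, y):
    """x^T y: n multiplications and n - 1 additions."""
    return sum(xi * yi for xi, yi in zip(x, y))


def outer_product(x, y):
    """Z = x y^T: entry (i, j) is x_i * y_j, n^2 multiplications in total."""
    return [[xi * yj for yj in y] for xi in x]


def matvec(A, x):
    """y = A x: each component y_i is one inner product of row i with x."""
    return [inner_product(row, x) for row in A]


# Exponent of n in the serial-time model T1 = tc * n**p for each kernel.
EXPONENT = {"inner": 1, "outer": 2, "matvec": 2, "matmat": 3}


def serial_time(kernel, n, tc=1.0):
    """Modelled serial time T1 for the given kernel (hypothetical helper)."""
    return tc * n ** EXPONENT[kernel]


print(inner_product([1, 2, 3], [4, 5, 6]))
print(outer_product([1, 2], [3, 4]))
print(matvec([[1, 2], [3, 4]], [1, 1]))
print(serial_time("matvec", 10, tc=2.0))
```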
Consider the matrix-matrix product C = A*B, where A, B, and the result C are n × n matrices. Entries of matrix C are given by $c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$, i,j = 1,...,n. Each of the $n^2$ entries of C requires n multiply-add operations, so model serial time as $T_1 = t_c n^3$. The matrix-matrix product can be viewed as:
– $n^2$ inner products, or
– sum of n outer products, or
– n matrix-vector products
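The three views of the matrix-matrix product can be written as three loop orderings that give the same result. The sketch below uses plain Python lists and assumes square n × n inputs; the function names are illustrative, not taken from any library.

```python
def matmul_inner(A, B):
    """n^2 inner products: c_ij = (row i of A) . (column j of B)."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matmul_outer(A, B):
    """Sum of n outer products: C += (column k of A) * (row k of B)."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C


def matmul_matvec(A, B):
    """n matrix-vector products: column j of C is A times column j of B."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for j in range(n):
        b_col = [B[k][j] for k in range(n)]
        for i in range(n):
            C[i][j] = sum(A[i][k] * b_col[k] for k in range(n))
    return C


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_inner(A, B))
print(matmul_outer(A, B) == matmul_inner(A, B))
print(matmul_matvec(A, B) == matmul_inner(A, B))
```

All three variants perform the same $n^3$ multiply-adds; they differ only in the order in which the entries of A, B and C are visited.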
Level  Work      Examples  Function
1      O(n)      saxpy     Scalar*Vector + Vector
                 sdot      Inner product
                 snrm2     Euclidean vector norm
2      O(n^2)    sgemv     Matrix*Vector product
                 strsv     Triangular solution
                 sger      Rank-one update
3      O(n^3)    sgemm     Matrix*Matrix product
                 strsm     Multiple triangular solutions
                 ssyrk     Rank-k update
Dense systems of linear equations are found in numerous applications, such as the following:
– Airplane wing design
– Radar cross-section studies (a signal of fixed frequency bounces off an object; the goal is to determine the intensity of the reflected signal in all possible directions)
– Flow around ships and other offshore constructions
– Diffusion of solid bodies in a liquid
– Noise reduction
– Diffusion of light through small particles
Note: The electromagnetics community is a major user of dense linear-system solvers. Of particular interest to this community is the solution of the so-called radar cross-section problem.
From J. Dongarra, I. Foster and G. Fox, Sourcebook of Parallel Computing, Chapter 20.
[Figure: dense matrix layouts, with panels labelled Column, Row and Blocks]
Huge sparse matrices often appear in science and engineering when solving problems that arise from partial differential equations, chemical engineering, air traffic control, astrophysics, circuit simulation, economic modelling, nuclear reactor core modelling, optimal power flow problems, stochastic modelling, oil reservoir modelling and oceanography, among others.
Examples:
– Chemical engineering: hydrocarbon separation problem
– Economic modelling: Australian economic models
– Air traffic control model
From Matrix Market: http://math.nist.gov/MatrixMarket/
Sparse matrices are often defined as matrices in which fewer than 10% of the entries are nonzero.
This definition varies with the author and the order of the matrix. Brayton, Gustavson and Willoughby say that a typical sparse matrix has between 2 and 10 nonzero elements per row.
Livesley indicates that an average of 3 or 4 nonzero elements per row in a matrix arising from a structural problem is a good estimate for considering the matrix sparse.
To take advantage of the large number of zero elements, special schemes are required to store sparse matrices.
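The two rules of thumb above (fraction of nonzero entries, and nonzeros per row) are easy to check numerically. The sketch below is a minimal illustration on a dense list-of-lists matrix; the `tridiagonal` test matrix and all function names are assumptions introduced here, not part of the cited definitions.

```python
def nonzeros_per_row(A):
    """Number of nonzero entries in each row of a dense list-of-lists matrix."""
    return [sum(1 for v in row if v != 0) for row in A]


def density(A):
    """Fraction of entries that are nonzero."""
    total_entries = sum(len(row) for row in A)
    return sum(nonzeros_per_row(A)) / total_entries


def tridiagonal(n):
    """Hypothetical test matrix: 2 on the diagonal, -1 on the two neighbours."""
    return [[2 if i == j else -1 if abs(i - j) == 1 else 0 for j in range(n)]
            for i in range(n)]


T = tridiagonal(40)
print(density(T))                 # 118 nonzeros out of 1600 entries
print(density(T) < 0.10)          # below the 10% threshold
print(max(nonzeros_per_row(T)))   # at most 3 nonzeros per row
```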
R.E. Ginna Nuclear Power Station (matrix bcsstk18 of Matrix Market).
Size: 11948 x 11948, 80519 entries
Average nonzero elements per column/row: 12
Buckling model of a Boeing 767 rear pressure bulkhead (matrix bcsstk29 of Matrix Market).
Size: 13992 x 13992, 316740 entries
Average nonzero elements per column/row: 44
The Coordinate Storage (COO or COOR) format, also called the triplet format or IJV format, is a particularly simple storage scheme.
The COO format stores both the row index and the column index of each nonzero element. The storage order of the elements is not specified: they can be stored by rows, columns, diagonals, etc.
This format uses three one-dimensional arrays: the arrays row_ind and col_ind store the row and column indices, respectively, and the array val stores all nonzero elements.
The COO format for the example matrix, with the elements ordered by column, is given below:

$$
A = \begin{pmatrix}
a_{11} & a_{12} & 0 & 0 & 0 & 0 \\
a_{21} & a_{22} & a_{23} & 0 & 0 & 0 \\
0 & a_{32} & a_{33} & a_{34} & 0 & 0 \\
0 & 0 & a_{43} & a_{44} & a_{45} & 0 \\
0 & 0 & 0 & a_{54} & a_{55} & a_{56}
\end{pmatrix}
$$

row_ind = (1 2 1 2 3 2 3 4 3 4 5 4 5 5)
col_ind = (1 1 2 2 2 3 3 3 4 4 4 5 5 6)
val = $(a_{11}\ a_{21}\ a_{12}\ a_{22}\ a_{32}\ a_{23}\ a_{33}\ a_{43}\ a_{34}\ a_{44}\ a_{54}\ a_{45}\ a_{55}\ a_{56})$
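The COO arrays for the 5 × 6 example can be reproduced with a short sketch. The numeric stand-in values (entry $a_{ij}$ is replaced by the number 10·i + j, so $a_{23}$ becomes 23) and the function names are assumptions made for illustration; indices are kept 1-based to match the arrays shown above.

```python
def dense_to_coo(A):
    """Build column-ordered COO arrays with 1-based indices."""
    row_ind, col_ind, val = [], [], []
    n_rows, n_cols = len(A), len(A[0])
    for j in range(n_cols):          # visit columns first: column ordering
        for i in range(n_rows):
            if A[i][j] != 0:
                row_ind.append(i + 1)
                col_ind.append(j + 1)
                val.append(A[i][j])
    return row_ind, col_ind, val


def coo_matvec(n_rows, row_ind, col_ind, val, x):
    """y = A x using the triplets; the order of the triplets does not matter."""
    y = [0] * n_rows
    for i, j, v in zip(row_ind, col_ind, val):
        y[i - 1] += v * x[j - 1]
    return y


# Stand-in for the example matrix: a_ij is represented by 10*i + j.
A = [
    [11, 12, 0, 0, 0, 0],
    [21, 22, 23, 0, 0, 0],
    [0, 32, 33, 34, 0, 0],
    [0, 0, 43, 44, 45, 0],
    [0, 0, 0, 54, 55, 56],
]
row_ind, col_ind, val = dense_to_coo(A)
print(row_ind)
print(col_ind)
print(val)
print(coo_matvec(5, row_ind, col_ind, val, [1] * 6))
```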
The Compressed Storage Row (CSR or CRS) format, also called the Yale Sparse Matrix format, is a popular general-purpose sparse matrix representation in which the nonzero elements of the matrix are stored row by row.
The CSR format requires three arrays to store the matrix $A \in \mathbb{R}^{M \times N}$:
– row_ptr: integer array of length M + 1, where row_ptr(i) contains the pointer into the val and col_ind arrays of the first element in row i. Row i extends from row_ptr(i) to row_ptr(i + 1) − 1.
– col_ind: integer array holding the column indices of the nonzero values. Its length is the number of nonzero elements (nnz).
– val: float array storing the nonzero values of the matrix. Its length is nnz.
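$$
A = \begin{pmatrix}
a_{11} & a_{12} & 0 & 0 & 0 & 0 \\
a_{21} & a_{22} & a_{23} & 0 & 0 & 0 \\
0 & a_{32} & a_{33} & a_{34} & 0 & 0 \\
0 & 0 & a_{43} & a_{44} & a_{45} & 0 \\
0 & 0 & 0 & a_{54} & a_{55} & a_{56}
\end{pmatrix}
$$

row_ptr = (1 3 6 9 12 15)
col_ind = (1 2 1 2 3 2 3 4 3 4 5 4 5 6)
val = $(a_{11}\ a_{12}\ a_{21}\ a_{22}\ a_{23}\ a_{32}\ a_{33}\ a_{34}\ a_{43}\ a_{44}\ a_{45}\ a_{54}\ a_{55}\ a_{56})$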
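A minimal sketch of building the CSR arrays and using them in a sparse matrix-vector product follows. It keeps the 1-based pointers and indices of the example above; the stand-in values (10·i + j for $a_{ij}$) and the function names are assumptions for illustration.

```python
def dense_to_csr(A):
    """Build CSR arrays (1-based row_ptr and col_ind) from a dense matrix."""
    row_ptr, col_ind, val = [1], [], []
    for row in A:
        for j, v in enumerate(row):
            if v != 0:
                col_ind.append(j + 1)
                val.append(v)
        row_ptr.append(len(val) + 1)   # start of the next row
    return row_ptr, col_ind, val


def csr_matvec(row_ptr, col_ind, val, x):
    """y = A x: row i uses entries row_ptr(i) .. row_ptr(i + 1) - 1."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for p in range(row_ptr[i] - 1, row_ptr[i + 1] - 1):
            acc += val[p] * x[col_ind[p] - 1]
        y.append(acc)
    return y


# Stand-in for the example matrix: a_ij is represented by 10*i + j.
A = [
    [11, 12, 0, 0, 0, 0],
    [21, 22, 23, 0, 0, 0],
    [0, 32, 33, 34, 0, 0],
    [0, 0, 43, 44, 45, 0],
    [0, 0, 0, 54, 55, 56],
]
row_ptr, col_ind, val = dense_to_csr(A)
print(row_ptr)
print(col_ind)
print(csr_matvec(row_ptr, col_ind, val, [1] * 6))
```

Because each output component is computed from one contiguous slice of val and col_ind, the rows can be processed independently of each other.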
Compressed Storage Column (CSC or CCS) is also called the Harwell-Boeing Sparse Matrix Storage format.
The CSC format can be regarded as the transpose of CSR: the only difference is that it works with columns rather than rows. The CSC format replaces two arrays of the CSR format:
– the array row_ptr is replaced by col_ptr, which contains the pointer into the val and row_ind arrays of the first element in each column, and
– the array col_ind is replaced by row_ind, which stores the row indices of the nonzero elements of the matrix.

$$
A = \begin{pmatrix}
a_{11} & a_{12} & 0 & 0 & 0 & 0 \\
a_{21} & a_{22} & a_{23} & 0 & 0 & 0 \\
0 & a_{32} & a_{33} & a_{34} & 0 & 0 \\
0 & 0 & a_{43} & a_{44} & a_{45} & 0 \\
0 & 0 & 0 & a_{54} & a_{55} & a_{56}
\end{pmatrix}
$$

col_ptr = (1 3 6 9 12 14 15)
row_ind = (1 2 1 2 3 2 3 4 3 4 5 4 5 5)
val = $(a_{11}\ a_{21}\ a_{12}\ a_{22}\ a_{32}\ a_{23}\ a_{33}\ a_{43}\ a_{34}\ a_{44}\ a_{54}\ a_{45}\ a_{55}\ a_{56})$
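The CSC arrays can be built by scanning the matrix column by column, as in the sketch below. It uses 1-based pointers and indices to match the example; the stand-in values (10·i + j for $a_{ij}$) and the function names are assumptions for illustration.

```python
def dense_to_csc(A):
    """Build CSC arrays (1-based col_ptr and row_ind) from a dense matrix."""
    n_rows, n_cols = len(A), len(A[0])
    col_ptr, row_ind, val = [1], [], []
    for j in range(n_cols):
        for i in range(n_rows):
            if A[i][j] != 0:
                row_ind.append(i + 1)
                val.append(A[i][j])
        col_ptr.append(len(val) + 1)   # start of the next column
    return col_ptr, row_ind, val


def csc_matvec(n_rows, col_ptr, row_ind, val, x):
    """y = A x: column j contributes x_j times its entries, scattered into y."""
    y = [0] * n_rows
    for j in range(len(col_ptr) - 1):
        for p in range(col_ptr[j] - 1, col_ptr[j + 1] - 1):
            y[row_ind[p] - 1] += val[p] * x[j]
    return y


# Stand-in for the example matrix: a_ij is represented by 10*i + j.
A = [
    [11, 12, 0, 0, 0, 0],
    [21, 22, 23, 0, 0, 0],
    [0, 32, 33, 34, 0, 0],
    [0, 0, 43, 44, 45, 0],
    [0, 0, 0, 54, 55, 56],
]
col_ptr, row_ind, val = dense_to_csc(A)
print(col_ptr)
print(row_ind)
print(csc_matvec(5, col_ptr, row_ind, val, [1] * 6))
```

Unlike the row-wise CSR loop, this product scatters updates into y, so different columns may write to the same output component.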
The ITPACK storage format, also called ELLPACK format or Purdue Storage, is best suited to matrices in which most rows have the same number of nonzero elements.
The ITPACK format uses two two-dimensional arrays, val and col_ind, each of size M x F, where M is the number of rows in the sparse matrix and F is the maximum number of nonzero elements per row. These arrays store all nonzero values and their column indices, respectively.
The elements of each row are packed consecutively in the corresponding row of val, and padding is used for shorter rows: if the number of elements in a row is lower than F, the corresponding rows of both val and col_ind are padded with zeros.
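This format best supports matrices in which the number of nonzeros in every row is close to F. With F = 3, the ITPACK format of the example matrix is specified by the following arrays:

$$
A = \begin{pmatrix}
a_{11} & a_{12} & 0 & 0 & 0 & 0 \\
a_{21} & a_{22} & a_{23} & 0 & 0 & 0 \\
0 & a_{32} & a_{33} & a_{34} & 0 & 0 \\
0 & 0 & a_{43} & a_{44} & a_{45} & 0 \\
0 & 0 & 0 & a_{54} & a_{55} & a_{56}
\end{pmatrix}
$$

$$
\text{col\_ind} = \begin{pmatrix}
1 & 2 & 0 \\
1 & 2 & 3 \\
2 & 3 & 4 \\
3 & 4 & 5 \\
4 & 5 & 6
\end{pmatrix}
\qquad
\text{val} = \begin{pmatrix}
a_{11} & a_{12} & 0 \\
a_{21} & a_{22} & a_{23} \\
a_{32} & a_{33} & a_{34} \\
a_{43} & a_{44} & a_{45} \\
a_{54} & a_{55} & a_{56}
\end{pmatrix}
$$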
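The packing and padding rule can be sketched as follows. Column indices are 1-based, so a padded slot is recognisable by the index 0; the stand-in values (10·i + j for $a_{ij}$) and the function names are assumptions for illustration.

```python
def dense_to_ell(A):
    """Build M x F ELLPACK arrays; short rows are padded with zeros."""
    packed = [[(j + 1, v) for j, v in enumerate(row) if v != 0] for row in A]
    F = max(len(r) for r in packed)          # maximum nonzeros per row
    col_ind = [[j for j, _ in r] + [0] * (F - len(r)) for r in packed]
    val = [[v for _, v in r] + [0] * (F - len(r)) for r in packed]
    return col_ind, val


def ell_matvec(col_ind, val, x):
    """y = A x: every row does exactly F steps; padded slots are skipped."""
    y = []
    for cols, vals in zip(col_ind, val):
        acc = 0
        for j, v in zip(cols, vals):
            if j != 0:                        # index 0 marks padding
                acc += v * x[j - 1]
        y.append(acc)
    return y


# Stand-in for the example matrix: a_ij is represented by 10*i + j.
A = [
    [11, 12, 0, 0, 0, 0],
    [21, 22, 23, 0, 0, 0],
    [0, 32, 33, 34, 0, 0],
    [0, 0, 43, 44, 45, 0],
    [0, 0, 0, 54, 55, 56],
]
col_ind, val = dense_to_ell(A)
print(col_ind)
print(val)
print(ell_matvec(col_ind, val, [1] * 6))
```

For this matrix only one of the 15 slots is padding; a single long row in an otherwise short-rowed matrix would force every row to be padded to that length.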
The Hybrid (HYB) format is a combination of different storage formats, for example a hybrid format composed of the ELLPACK and COO formats. In this example, the HYB format stores the first F nonzeros of each row in the ELLPACK format, where F is the minimum number of nonzeros per row, and the remaining nonzero values of longer rows in the COO format.
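As an example, with F = 2 the HYB format of the example matrix is as follows:

$$
A = \begin{pmatrix}
a_{11} & a_{12} & 0 & 0 & 0 & 0 \\
a_{21} & a_{22} & a_{23} & 0 & 0 & 0 \\
0 & a_{32} & a_{33} & a_{34} & 0 & 0 \\
0 & 0 & a_{43} & a_{44} & a_{45} & 0 \\
0 & 0 & 0 & a_{54} & a_{55} & a_{56}
\end{pmatrix}
$$

$$
\text{col\_ind\_ELL} = \begin{pmatrix}
1 & 2 \\
1 & 2 \\
2 & 3 \\
3 & 4 \\
4 & 5
\end{pmatrix}
\qquad
\text{val\_ELL} = \begin{pmatrix}
a_{11} & a_{12} \\
a_{21} & a_{22} \\
a_{32} & a_{33} \\
a_{43} & a_{44} \\
a_{54} & a_{55}
\end{pmatrix}
$$

row_ind_coo = (2 3 4 5)
col_ind_coo = (3 4 5 6)
val_coo = $(a_{23}\ a_{34}\ a_{45}\ a_{56})$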
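The split between the ELLPACK part and the COO part can be sketched as below: the first F nonzeros of each row go to the ELLPACK arrays and any further nonzeros overflow into COO triplets. Indices are 1-based; the stand-in values (10·i + j for $a_{ij}$) and the function names are assumptions for illustration.

```python
def dense_to_hyb(A, F):
    """Split a dense matrix into an M x F ELLPACK part and a COO overflow."""
    col_ind_ell, val_ell = [], []
    row_ind_coo, col_ind_coo, val_coo = [], [], []
    for i, row in enumerate(A):
        nz = [(j + 1, v) for j, v in enumerate(row) if v != 0]
        head, tail = nz[:F], nz[F:]
        pad = F - len(head)                   # only needed for short rows
        col_ind_ell.append([j for j, _ in head] + [0] * pad)
        val_ell.append([v for _, v in head] + [0] * pad)
        for j, v in tail:
            row_ind_coo.append(i + 1)
            col_ind_coo.append(j)
            val_coo.append(v)
    return col_ind_ell, val_ell, row_ind_coo, col_ind_coo, val_coo


def hyb_matvec(col_ind_ell, val_ell, row_ind_coo, col_ind_coo, val_coo, x):
    """y = A x: regular ELLPACK pass followed by the COO overflow."""
    y = [sum(v * x[j - 1] for j, v in zip(cols, vals) if j != 0)
         for cols, vals in zip(col_ind_ell, val_ell)]
    for i, j, v in zip(row_ind_coo, col_ind_coo, val_coo):
        y[i - 1] += v * x[j - 1]
    return y


# Stand-in for the example matrix: a_ij is represented by 10*i + j.
A = [
    [11, 12, 0, 0, 0, 0],
    [21, 22, 23, 0, 0, 0],
    [0, 32, 33, 34, 0, 0],
    [0, 0, 43, 44, 45, 0],
    [0, 0, 0, 54, 55, 56],
]
hyb = dense_to_hyb(A, F=2)
print(hyb[0])
print(hyb[2], hyb[3], hyb[4])
print(hyb_matvec(*hyb, [1] * 6))
```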
The Block Compressed Row Storage (BCSR, BSR or SPBCSR) format divides a matrix into small blocks (square or rectangular) of size nb = r x c, where r and c denote the number of rows and columns of each block, respectively. This format is an extension of the CSR format in which the elements of the val array are dense blocks, so a few zero elements may be stored inside each block.
The number of nonzero blocks in the matrix is nnzbl, and the blocks are stored consecutively by row, as in the CSR format. As in the CSR format, three arrays are needed:
– row_blk: integer array of length nnzbl, where row_blk(i) contains the pointer into the val array of the first element of nonzero block i.
– col_ind_blk: integer array of length nnzbl holding the column index, in the original matrix, of the first element of each nonzero block.
– val: float array of length nb*nnzbl that stores the values of each block by row.
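For nb = 2x3 and nnzbl = 4:

$$
A = \begin{pmatrix}
a_{11} & a_{12} & 0 & 0 & 0 & 0 \\
a_{21} & a_{22} & a_{23} & 0 & 0 & 0 \\
0 & a_{32} & a_{33} & a_{34} & 0 & 0 \\
0 & 0 & a_{43} & a_{44} & a_{45} & 0 \\
0 & 0 & 0 & a_{54} & a_{55} & a_{56} \\
0 & 0 & 0 & 0 & 0 & a_{66}
\end{pmatrix}
$$

row_blk = (1 7 13 19)
col_ind_blk = (1 1 4 4)
val = $(a_{11}\ a_{12}\ 0\ a_{21}\ a_{22}\ a_{23}\ \ 0\ a_{32}\ a_{33}\ 0\ 0\ a_{43}\ \ a_{34}\ 0\ 0\ a_{44}\ a_{45}\ 0\ \ a_{54}\ a_{55}\ a_{56}\ 0\ 0\ a_{66})$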
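The block arrays of the 6 × 6 example can be reproduced with the sketch below. It assumes the matrix dimensions are multiples of r and c. Note that the three arrays described above do not record which block row a block belongs to, so the sketch also returns an extra array, `row_ind_blk` (a hypothetical name introduced here, not part of the format as described), to make the matrix-vector product possible. The stand-in values use 10·i + j for $a_{ij}$.

```python
def dense_to_bcsr(A, r, c):
    """Build block arrays for r x c blocks (1-based pointers and indices).

    Assumes len(A) is a multiple of r and len(A[0]) is a multiple of c.
    row_ind_blk is an extra, hypothetical array: first row of each block.
    """
    n_rows, n_cols = len(A), len(A[0])
    row_blk, col_ind_blk, row_ind_blk, val = [], [], [], []
    for i0 in range(0, n_rows, r):
        for j0 in range(0, n_cols, c):
            block = [A[i][j] for i in range(i0, i0 + r)
                     for j in range(j0, j0 + c)]
            if any(v != 0 for v in block):    # keep only nonzero blocks
                row_blk.append(len(val) + 1)
                col_ind_blk.append(j0 + 1)
                row_ind_blk.append(i0 + 1)
                val.extend(block)             # block stored by row
    return row_blk, col_ind_blk, row_ind_blk, val


def bcsr_matvec(n_rows, r, c, row_blk, col_ind_blk, row_ind_blk, val, x):
    """y = A x: each block is a small dense r x c matrix-vector product."""
    y = [0] * n_rows
    for start, j0, i0 in zip(row_blk, col_ind_blk, row_ind_blk):
        for bi in range(r):
            for bj in range(c):
                v = val[start - 1 + bi * c + bj]
                y[i0 - 1 + bi] += v * x[j0 - 1 + bj]
    return y


# Stand-in for the example matrix: a_ij is represented by 10*i + j.
A = [
    [11, 12, 0, 0, 0, 0],
    [21, 22, 23, 0, 0, 0],
    [0, 32, 33, 34, 0, 0],
    [0, 0, 43, 44, 45, 0],
    [0, 0, 0, 54, 55, 56],
    [0, 0, 0, 0, 0, 66],
]
row_blk, col_ind_blk, row_ind_blk, val = dense_to_bcsr(A, r=2, c=3)
print(row_blk)
print(col_ind_blk)
print(val)
print(bcsr_matvec(6, 2, 3, row_blk, col_ind_blk, row_ind_blk, val, [1] * 6))
```

Here 24 values are stored for 15 nonzeros: the explicit zeros inside blocks are the price paid for having one column index per block instead of one per element.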
[Figure: SpMxV Serial. Serial executions of 100 iterations of SpMxV on an Intel Core i7 for the matrices bcsstk29, e40r0100, garon2, memplus, msc10848, Na5 and ncvxbqp1. Formats compared: CSR, CSRP, CSRPELL, BCSR 3x3, IBCSR 3x3, COO, JDS, TJDS, BJDS, ELL, LadderELL, HYB, FEBA and CSR-FEBA (BCRS block size 3x3; F = 16).]
[Figure: 100 iterations of SpMxV on an Intel Core i7 for the matrices bcsstk29, garon2, msc10848, ncvxbqp1, psmigr_1, s3dkq4m2 and tandem_vtx. Kernels compared: csr_scalar, csr_scalar_tex, csr_vector and csr_vector_tex. Plot annotation: "Only 240 used".]