

Fundamental inner kernels and matrix formats

Beatriz Otero and José Ramón Herrero

Computer Architecture Department

Technical University of Catalonia

2012

Outline

Fundamental inner kernels

Inner product

Outer product

Matrix-Vector multiplication

Matrix-Matrix multiplication

Matrix formats

Dense matrices

Sparse matrices

Coordinate Storage (COO)

Compressed Row Storage (CRS)

Compressed Column Storage (CCS)

ELLPACK/ITPACK Storage

Hybrid Storage (HYB)

Block Compressed Row Storage (BCRS)


Fundamental inner kernels

We are led to identify three levels of linear algebra operations: vector-vector operations, matrix-vector operations and matrix-matrix operations.

Inner product of two n-vectors x and y is given by $x^T y = \sum_{i=1}^{n} x_i y_i$.

Computation of the inner product requires n multiplications and n − 1 additions. For simplicity, model serial time as $T_1 = t_c n$, where $t_c$ is the time for one scalar multiply-add operation.

Outer product of two n-vectors x and y is the n × n matrix $Z = x y^T$ whose (i,j) entry is $z_{ij} = x_i y_j$. Computation of the outer product requires $n^2$ multiplications, so model serial time as $T_1 = t_c n^2$.

Consider the matrix-vector product $y = A x$, where A is an n × n matrix and x and y are n-vectors. Components of vector y are given by $y_i = \sum_{j=1}^{n} a_{ij} x_j$, $i = 1, \dots, n$. Each of the n components requires n multiply-add operations, so model serial time as $T_1 = t_c n^2$.

Consider the matrix-matrix product $C = A B$, where A, B and the result C are n × n matrices. Entries of matrix C are given by $c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$, $i, j = 1, \dots, n$. Each of the $n^2$ entries of C requires n multiply-add operations, so model serial time as $T_1 = t_c n^3$.

Matrix-matrix product can be viewed as $n^2$ inner products, or as a sum of n outer products, or as n matrix-vector products.
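The four kernels, and the three views of the matrix-matrix product, can be sketched in plain Python. This is an illustrative sketch only: the function names and the list-of-lists representation are assumptions made for the example, and the loops are written for clarity rather than speed.

```python
def inner(x, y):
    """Inner product x^T y: n multiply-add operations."""
    s = 0.0
    for xi, yi in zip(x, y):
        s += xi * yi
    return s


def outer(x, y):
    """Outer product x y^T: n^2 multiplications, entry (i, j) is x_i * y_j."""
    return [[xi * yj for yj in y] for xi in x]


def matvec(A, x):
    """y = A x: one inner product per row, n^2 multiply-adds in total."""
    return [inner(row, x) for row in A]


def column(B, j):
    """Column j of a matrix stored as a list of rows."""
    return [row[j] for row in B]


def matmul_inner(A, B):
    """C = A B as n^2 inner products (row i of A with column j of B)."""
    n = len(A)
    return [[inner(A[i], column(B, j)) for j in range(n)] for i in range(n)]


def matmul_outer(A, B):
    """C = A B as a sum of n outer products (column k of A with row k of B)."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for k in range(n):
        Z = outer(column(A, k), B[k])
        for i in range(n):
            for j in range(n):
                C[i][j] += Z[i][j]
    return C


def matmul_matvec(A, B):
    """C = A B as n matrix-vector products (A times each column of B)."""
    n = len(A)
    cols = [matvec(A, column(B, j)) for j in range(n)]
    # cols[j] is column j of C; transpose back to a list of rows.
    return [[cols[j][i] for j in range(n)] for i in range(n)]


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = matmul_inner(A, B)
print(C)
print(matmul_outer(A, B) == C, matmul_matvec(A, B) == C)
```

All three variants perform the same n^3 multiply-adds; they differ only in the order in which the operands are traversed.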


BLAS

Level   Work      Example   Function
1       O(n)      saxpy     Scalar*Vector + Vector
                  sdot      Inner product
                  snrm2     Euclidean vector norm
2       O(n^2)    sgemv     Matrix*Vector product
                  strsv     Triangular solution
                  sger      Rank-one update
3       O(n^3)    sgemm     Matrix*Matrix product
                  strsm     Multiple triangular solutions
                  ssyrk     Rank-k update


Dense Matrices

Dense systems of linear equations are found in numerous applications, such as the following:

Airplane wing design

Radar cross-section studies (A signal of fixed frequency bounces off an object; the goal is to determine the intensity of the reflected signal in all possible directions)

Flow around ships and other offshore constructions

Diffusion of solid bodies in a liquid

Noise reduction

Diffusion of light through small particles

Note: The electromagnetics community is a major user of dense linear-system solvers. Of particular interest to this community is the solution of the so-called radar cross-section problem.

From J. Dongarra, I. Foster and G. Fox. Sourcebook of Parallel Computing. Chapter 20.


Dense Matrices

[Figure: dense matrix layouts, with panels labelled Column, Row and Blocks]


Sparse Matrices

Huge sparse matrices often appear in science and engineering when solving problems that arise in partial differential equations, chemical engineering, air traffic control, astrophysics, circuit simulation, economic modelling, nuclear reactor core modelling, optimal power flow problems, stochastic modelling, oil reservoir modelling and oceanography, among others.

[Figure: example sparse matrices. Chemical engineering: hydrocarbon separation problem. Economic modelling: Australian economic models. Air traffic control model.]


From Matrix Market: http://math.nist.gov/MatrixMarket/

Sparse Matrices

Sparse matrices are those in which the nonzero elements make up less than 10% of the total number of entries in the matrix.

This definition varies with the author and with the order of the matrix. Brayton, Gustavson and Willoughby say that a typical sparse matrix has between 2 and 10 nonzero elements per row.

Livesley indicates that an average of 3 or 4 nonzero elements per row, in a matrix arising from a structural problem, is a good estimate for considering the matrix sparse.

To take advantage of the large number of zero elements, special schemes are required to store sparse matrices.

R.E. Ginna Nuclear Power Station (matrix bcsstk18 of Matrix Market). Size: 11948 x 11948, 80519 entries. Average nonzero elements per column/row: 12.

Buckling model of a Boeing 767 rear pressure bulkhead (matrix bcsstk29 of Matrix Market). Size: 13992 x 13992, 316740 entries. Average nonzero elements per column/row: 44.

From Matrix Market: http://math.nist.gov/MatrixMarket/
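The averages quoted for the two matrices can be checked with a little arithmetic. The sketch below assumes that the "entries" counts refer to one stored triangle of a symmetric matrix (diagonal included, with every diagonal entry present); that assumption is not stated on the slide, and the function names are chosen only for this example.

```python
def full_nnz(n, stored_entries):
    """Nonzeros of a symmetric n x n matrix when only one triangle is stored.

    Assumption: the stored triangle includes all n diagonal entries, so the
    off-diagonal entries are counted twice and the diagonal once.
    """
    return 2 * stored_entries - n


def avg_per_row(n, nnz):
    """Average number of nonzero elements per row (or column)."""
    return nnz / n


def density(n, nnz):
    """Fraction of the n * n entries that are nonzero."""
    return nnz / (n * n)


# (order n, stored entries) as quoted for the two Matrix Market examples.
matrices = {"bcsstk18": (11948, 80519), "bcsstk29": (13992, 316740)}

stats = {}
for name, (n, stored) in matrices.items():
    nnz = full_nnz(n, stored)
    stats[name] = (nnz, avg_per_row(n, nnz), density(n, nnz))
    print(f"{name}: nnz={nnz}, avg/row={stats[name][1]:.2f}, "
          f"density={100 * stats[name][2]:.3f}%")
```

Under that assumption both matrices have well under 1% of their entries nonzero, far below the 10% threshold mentioned above, even though their averages per row exceed the 2 to 10 range cited from Brayton, Gustavson and Willoughby.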


Coordinate Storage

The Coordinate Storage (COO or COOR) format, also called the triplet format or IJV format, is a particularly simple storage scheme.

The COO format stores both the row index and the column index of each nonzero element. The order in which the elements are stored is not specified: they can be stored by rows, columns, diagonals, etc.

This format uses three one-dimensional arrays: the arrays row_ind and col_ind store the row and column indices, respectively, and the array val stores all nonzero elements.

The COO format for the example matrix, ordering the elements by column, is given below:

    a11 a12  0   0   0   0
    a21 a22 a23  0   0   0
     0  a32 a33 a34  0   0
     0   0  a43 a44 a45  0
     0   0   0  a54 a55 a56

row_ind = (1 2 1 2 3 2 3 4 3 4 5 4 5 5)

col_ind = (1 1 2 2 2 3 3 3 4 4 4 5 5 6)

val = (a11 a21 a12 a22 a32 a23 a33 a43 a34 a44 a54 a45 a55 a56)
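A minimal Python sketch of the COO arrays for this example follows. It is an illustration under stated assumptions: the value 10*i + j stands in for a_ij, indices are 0-based (so every index is one less than on the slide), and the function names are invented for the example.

```python
# Example 5 x 6 matrix; the value 10*i + j stands in for a_ij (1-based i, j).
A = [
    [11.0, 12.0,  0.0,  0.0,  0.0,  0.0],
    [21.0, 22.0, 23.0,  0.0,  0.0,  0.0],
    [ 0.0, 32.0, 33.0, 34.0,  0.0,  0.0],
    [ 0.0,  0.0, 43.0, 44.0, 45.0,  0.0],
    [ 0.0,  0.0,  0.0, 54.0, 55.0, 56.0],
]


def dense_to_coo_by_column(A):
    """Collect (row, column, value) triplets, visiting the matrix by columns."""
    row_ind, col_ind, val = [], [], []
    for j in range(len(A[0])):
        for i in range(len(A)):
            if A[i][j] != 0.0:
                row_ind.append(i)
                col_ind.append(j)
                val.append(A[i][j])
    return row_ind, col_ind, val


def coo_spmv(n_rows, row_ind, col_ind, val, x):
    """y = A x from COO triplets; the triplet order does not matter."""
    y = [0.0] * n_rows
    for i, j, a in zip(row_ind, col_ind, val):
        y[i] += a * x[j]
    return y


row_ind, col_ind, val = dense_to_coo_by_column(A)
y = coo_spmv(len(A), row_ind, col_ind, val, [1.0] * 6)
print(row_ind)
print(col_ind)
print(val)
print(y)
```

With x set to all ones, y simply contains the row sums, which makes the result easy to check by hand.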


Compressed Row Storage (CRS)

The Compressed Row Storage (CSR or CRS) format is also called the Yale sparse matrix format. It is a popular, general-purpose sparse matrix representation in which the nonzero elements of the matrix are stored row by row.

The CSR format requires three arrays to store the matrix $A \in \mathbb{R}^{M \times N}$:

row_ptr: integer array of length M + 1, where row_ptr(i) contains the position in the val and col_ind arrays of the first element of row i. Row i extends from row_ptr(i) to row_ptr(i + 1) - 1.

col_ind: integer array holding the column indices of the nonzero values. Its length is the number of nonzero elements (nnz).

val: floating-point array storing the nonzero values of the matrix. Its length is nnz.

    a11 a12  0   0   0   0
    a21 a22 a23  0   0   0
     0  a32 a33 a34  0   0
     0   0  a43 a44 a45  0
     0   0   0  a54 a55 a56

row_ptr = (1 3 6 9 12 15)

col_ind = (1 2 1 2 3 2 3 4 3 4 5 4 5 6)

val = (a11 a12 a21 a22 a23 a32 a33 a34 a43 a44 a45 a54 a55 a56)
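The CSR arrays and the row-oriented sparse matrix-vector product can be sketched in Python as follows. The sketch assumes 0-based indices (each pointer and index is one less than on the slide) and uses 10*i + j as a stand-in for a_ij; the function names are illustrative only.

```python
# Example 5 x 6 matrix; the value 10*i + j stands in for a_ij (1-based i, j).
A = [
    [11.0, 12.0,  0.0,  0.0,  0.0,  0.0],
    [21.0, 22.0, 23.0,  0.0,  0.0,  0.0],
    [ 0.0, 32.0, 33.0, 34.0,  0.0,  0.0],
    [ 0.0,  0.0, 43.0, 44.0, 45.0,  0.0],
    [ 0.0,  0.0,  0.0, 54.0, 55.0, 56.0],
]


def dense_to_csr(A):
    """Build row_ptr (length M + 1), col_ind and val (length nnz)."""
    row_ptr, col_ind, val = [0], [], []
    for row in A:
        for j, a in enumerate(row):
            if a != 0.0:
                col_ind.append(j)
                val.append(a)
        # Position just past the last element of this row.
        row_ptr.append(len(val))
    return row_ptr, col_ind, val


def csr_spmv(row_ptr, col_ind, val, x):
    """y = A x: row i uses entries row_ptr[i] .. row_ptr[i + 1] - 1."""
    y = []
    for i in range(len(row_ptr) - 1):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += val[k] * x[col_ind[k]]
        y.append(s)
    return y


row_ptr, col_ind, val = dense_to_csr(A)
y = csr_spmv(row_ptr, col_ind, val, [1.0] * 6)
print(row_ptr)
print(col_ind)
print(y)
```

Compared with COO, the row indices are replaced by M + 1 pointers, and each component of y is accumulated in a single pass over one contiguous slice of val.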


Compressed Column Storage (CCS)

Compressed Column Storage (CSC or CCS) is also called the Harwell-Boeing sparse matrix storage format.

The CSC format can be seen as the transpose of CSR, because the only difference is that it works with columns rather than rows. The CSC format replaces two arrays of the CSR format:

the array row_ptr with col_ptr, which contains the position in the val and row_ind arrays of the first element of each column, and

the array col_ind with row_ind, which stores the row indices of the nonzero elements of the matrix.

    a11 a12  0   0   0   0
    a21 a22 a23  0   0   0
     0  a32 a33 a34  0   0
     0   0  a43 a44 a45  0
     0   0   0  a54 a55 a56

col_ptr = (1 3 6 9 12 14 15)

row_ind = (1 2 1 2 3 2 3 4 3 4 5 4 5 5)

val = (a11 a21 a12 a22 a32 a23 a33 a43 a34 a44 a54 a45 a55 a56)
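A column-oriented counterpart can be sketched in the same way. As before, this is only an illustration: indices are 0-based, 10*i + j stands in for a_ij, and the function names are assumptions made for the example.

```python
# Example 5 x 6 matrix; the value 10*i + j stands in for a_ij (1-based i, j).
A = [
    [11.0, 12.0,  0.0,  0.0,  0.0,  0.0],
    [21.0, 22.0, 23.0,  0.0,  0.0,  0.0],
    [ 0.0, 32.0, 33.0, 34.0,  0.0,  0.0],
    [ 0.0,  0.0, 43.0, 44.0, 45.0,  0.0],
    [ 0.0,  0.0,  0.0, 54.0, 55.0, 56.0],
]


def dense_to_csc(A):
    """Build col_ptr (length N + 1), row_ind and val (length nnz)."""
    col_ptr, row_ind, val = [0], [], []
    for j in range(len(A[0])):
        for i in range(len(A)):
            if A[i][j] != 0.0:
                row_ind.append(i)
                val.append(A[i][j])
        # Position just past the last element of this column.
        col_ptr.append(len(val))
    return col_ptr, row_ind, val


def csc_spmv(n_rows, col_ptr, row_ind, val, x):
    """y = A x: column j scaled by x[j] is scattered into y."""
    y = [0.0] * n_rows
    for j in range(len(col_ptr) - 1):
        for k in range(col_ptr[j], col_ptr[j + 1]):
            y[row_ind[k]] += val[k] * x[j]
    return y


col_ptr, row_ind, val = dense_to_csc(A)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = csc_spmv(len(A), col_ptr, row_ind, val, x)
print(col_ptr)
print(row_ind)
print(y)
```

Note the different access pattern: CSR gathers from x and writes each y[i] once, whereas CSC reads each x[j] once and scatters updates into y.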


ELLPACK/ITPACK Storage (ELL)

The ITPACK storage format, also called the ELLPACK format or Purdue Storage, is best suited to matrices in which most rows have the same number of nonzero elements.

The ITPACK format uses two two-dimensional arrays, val and col_ind, each of size M x F, where M is the number of rows in the sparse matrix and F is the maximum number of nonzero elements per row. The two arrays store all nonzero values and their column indices, respectively.

The elements of each row are packed consecutively in the corresponding row of val, and padding is used for shorter rows: if the number of elements in a row is lower than F, the corresponding rows of both val and col_ind are padded with zeros.

This format best supports matrices in which the number of nonzeros in all rows is close to F. If we consider F = 3, then the ITPACK format is specified by the following arrays:

    a11 a12  0   0   0   0
    a21 a22 a23  0   0   0
     0  a32 a33 a34  0   0
     0   0  a43 a44 a45  0
     0   0   0  a54 a55 a56

col_ind =
    1 2 0
    1 2 3
    2 3 4
    3 4 5
    4 5 6

val =
    a11 a12  0
    a21 a22 a23
    a32 a33 a34
    a43 a44 a45
    a54 a55 a56
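The padding scheme can be sketched in Python as follows. Because the sketch uses 0-based indices, where 0 is a valid column, it marks padded slots with -1 instead of the 0 used on the slide; that marker, the stand-in values 10*i + j and the function names are all assumptions made for the example.

```python
# Example 5 x 6 matrix; the value 10*i + j stands in for a_ij (1-based i, j).
A = [
    [11.0, 12.0,  0.0,  0.0,  0.0,  0.0],
    [21.0, 22.0, 23.0,  0.0,  0.0,  0.0],
    [ 0.0, 32.0, 33.0, 34.0,  0.0,  0.0],
    [ 0.0,  0.0, 43.0, 44.0, 45.0,  0.0],
    [ 0.0,  0.0,  0.0, 54.0, 55.0, 56.0],
]

PAD = -1  # marks an unused slot in col_ind (the slide uses 0 with 1-based indices)


def dense_to_ell(A):
    """Build M x F arrays col_ind and val, F = max nonzeros in any row."""
    F = max(sum(1 for a in row if a != 0.0) for row in A)
    col_ind, val = [], []
    for row in A:
        cols = [j for j, a in enumerate(row) if a != 0.0]
        pad = F - len(cols)
        col_ind.append(cols + [PAD] * pad)
        val.append([row[j] for j in cols] + [0.0] * pad)
    return F, col_ind, val


def ell_spmv(col_ind, val, x):
    """y = A x: every row performs at most F multiply-adds."""
    y = []
    for cols, vals in zip(col_ind, val):
        s = 0.0
        for j, a in zip(cols, vals):
            if j != PAD:  # skip padded slots
                s += a * x[j]
        y.append(s)
    return y


F, col_ind, val = dense_to_ell(A)
y = ell_spmv(col_ind, val, [1.0] * 6)
print(F)
print(col_ind)
print(val)
print(y)
```

The storage cost is always M * F entries per array, so a single long row inflates F and wastes space in every other row; this is why the format suits matrices whose rows have similar lengths.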


Hybrid Storage (HYB)

The Hybrid (HYB) format is a combination of different storage formats, for example the ELLPACK and COO formats. In this example, the HYB format stores the first F nonzeros of each row in the ELLPACK format, where F is the minimum number of nonzeros per row, and the remaining nonzero values of longer rows in the COO format.

As an example, we consider F = 2; then the HYB format is as follows:

    a11 a12  0   0   0   0
    a21 a22 a23  0   0   0
     0  a32 a33 a34  0   0
     0   0  a43 a44 a45  0
     0   0   0  a54 a55 a56

col_ind_ELL =
    1 2
    1 2
    2 3
    3 4
    4 5

val_ELL =
    a11 a12
    a21 a22
    a32 a33
    a43 a44
    a54 a55

row_ind_coo = (2 3 4 5)

col_ind_coo = (3 4 5 6)

val_coo = (a23 a34 a45 a56)
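The ELLPACK/COO split can be sketched in Python as follows. This is a hedged illustration, not a reference implementation: indices are 0-based, 10*i + j stands in for a_ij, -1 marks padded ELL slots (none occur when F is the minimum row length), and the function names are invented for the example.

```python
# Example 5 x 6 matrix; the value 10*i + j stands in for a_ij (1-based i, j).
A = [
    [11.0, 12.0,  0.0,  0.0,  0.0,  0.0],
    [21.0, 22.0, 23.0,  0.0,  0.0,  0.0],
    [ 0.0, 32.0, 33.0, 34.0,  0.0,  0.0],
    [ 0.0,  0.0, 43.0, 44.0, 45.0,  0.0],
    [ 0.0,  0.0,  0.0, 54.0, 55.0, 56.0],
]

PAD = -1  # marks an unused ELL slot; only needed if a row has fewer than F nonzeros


def dense_to_hyb(A, F):
    """First F nonzeros of each row go to ELL, the rest to COO."""
    col_ind_ell, val_ell = [], []
    row_ind_coo, col_ind_coo, val_coo = [], [], []
    for i, row in enumerate(A):
        nz = [(j, a) for j, a in enumerate(row) if a != 0.0]
        head, tail = nz[:F], nz[F:]
        pad = F - len(head)
        col_ind_ell.append([j for j, _ in head] + [PAD] * pad)
        val_ell.append([a for _, a in head] + [0.0] * pad)
        for j, a in tail:
            row_ind_coo.append(i)
            col_ind_coo.append(j)
            val_coo.append(a)
    return col_ind_ell, val_ell, row_ind_coo, col_ind_coo, val_coo


def hyb_spmv(col_ind_ell, val_ell, row_ind_coo, col_ind_coo, val_coo, x):
    """y = A x: regular ELL part first, then the COO overflow entries."""
    y = []
    for cols, vals in zip(col_ind_ell, val_ell):
        y.append(sum(a * x[j] for j, a in zip(cols, vals) if j != PAD))
    for i, j, a in zip(row_ind_coo, col_ind_coo, val_coo):
        y[i] += a * x[j]
    return y


hyb = dense_to_hyb(A, 2)
col_ind_ell, val_ell, row_ind_coo, col_ind_coo, val_coo = hyb
y = hyb_spmv(*hyb, [1.0] * 6)
print(col_ind_ell)
print(row_ind_coo, col_ind_coo, val_coo)
print(y)
```

Choosing F as the minimum row length means the ELL part needs no padding at all; a larger F moves more entries into the regular ELL part at the cost of some padding.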


Block Compressed Row Storage (BCRS)

The Block Compressed Row Storage (BCSR, BSR or SPBCSR) format divides a matrix into small blocks (square or rectangular) of size nb = r x c, where r and c denote the number of rows and columns of each block, respectively. This format is an extension of the CSR format, with the difference that the entries of the val array are dense blocks, which may contain a few zero elements.

The number of nonzero blocks in the matrix is nnzbl, and the blocks are stored consecutively by row, as in the CSR format. Like the CSR format, we need three arrays:

row_blk: integer array of length nnzbl. row_blk(i) contains the position in the val array of the first element of nonzero block i.

col_ind_blk: integer array holding the column index, in the original matrix, of the first element of each nonzero block. Its length is nnzbl.

val: floating-point array of length nb*nnzbl. It stores the values of each block row by row.

    a11 a12  0   0   0   0
    a21 a22 a23  0   0   0
     0  a32 a33 a34  0   0
     0   0  a43 a44 a45  0
     0   0   0  a54 a55 a56
     0   0   0   0   0  a66

For nb = 2x3 and nnzbl = 4:

col_ind_blk = (1 1 4 4)

row_blk = (1 7 13 19)

val = (a11 a12 0 a21 a22 a23 0 a32 a33 0 0 a43 a34 0 0 a44 a45 0 a54 a55 a56 0 0 a66)
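The block layout can be sketched in Python as follows. The arrays row_blk, col_ind_blk and val follow the description above with 0-based indices. To recover which block row each block belongs to, the sketch also builds a conventional block-row pointer array, here called blk_row_ptr; that array and its name do not appear on the slide and are an assumption of this example, as are the stand-in values 10*i + j and the requirement that the matrix dimensions be multiples of r and c.

```python
# Example 6 x 6 matrix; the value 10*i + j stands in for a_ij (1-based i, j).
A = [
    [11.0, 12.0,  0.0,  0.0,  0.0,  0.0],
    [21.0, 22.0, 23.0,  0.0,  0.0,  0.0],
    [ 0.0, 32.0, 33.0, 34.0,  0.0,  0.0],
    [ 0.0,  0.0, 43.0, 44.0, 45.0,  0.0],
    [ 0.0,  0.0,  0.0, 54.0, 55.0, 56.0],
    [ 0.0,  0.0,  0.0,  0.0,  0.0, 66.0],
]


def dense_to_bcsr(A, r, c):
    """Store every r x c block that holds at least one nonzero, row by row.

    Assumes len(A) is a multiple of r and len(A[0]) is a multiple of c.
    """
    blk_row_ptr, col_ind_blk, row_blk, val = [0], [], [], []
    for bi in range(0, len(A), r):
        for bj in range(0, len(A[0]), c):
            block = [A[i][j] for i in range(bi, bi + r)
                     for j in range(bj, bj + c)]
            if any(a != 0.0 for a in block):
                row_blk.append(len(val))   # start of this block in val
                col_ind_blk.append(bj)     # first column of this block
                val.extend(block)          # block values, row by row
        # Number of blocks stored up to the end of this block row.
        blk_row_ptr.append(len(col_ind_blk))
    return blk_row_ptr, col_ind_blk, row_blk, val


def bcsr_spmv(r, c, blk_row_ptr, col_ind_blk, row_blk, val, x):
    """y = A x: each stored block is a small dense r x c matrix-vector product."""
    y = [0.0] * (r * (len(blk_row_ptr) - 1))
    for bi in range(len(blk_row_ptr) - 1):
        for b in range(blk_row_ptr[bi], blk_row_ptr[bi + 1]):
            base, j0 = row_blk[b], col_ind_blk[b]
            for di in range(r):
                for dj in range(c):
                    y[bi * r + di] += val[base + di * c + dj] * x[j0 + dj]
    return y


blk_row_ptr, col_ind_blk, row_blk, val = dense_to_bcsr(A, 2, 3)
y = bcsr_spmv(2, 3, blk_row_ptr, col_ind_blk, row_blk, val, [1.0] * 6)
print(row_blk)
print(col_ind_blk)
print(blk_row_ptr)
print(y)
```

Only one column index is stored per block instead of one per element, at the price of storing the explicit zeros inside each block (9 of the 24 values in this example).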

SpMxV (serial executions)

[Figure: SpMxV Serial. Serial executions of 100 iterations of SpMxV on an Intel Core i7. Horizontal axis: matrices bcsstk29, e40r0100, garon2, memplus, msc10848, Na5, ncvxbqp1. Vertical axis: 0 to 600 (quantity and units not recoverable). Formats compared: CSR, CSRP, CSRPELL, BCSR 3x3, IBCSR 3x3, COO, JDS, TJDS, BJDS, ELL, LadderELL, HYB, FEBA, CSR-FEBA. BCRS block size: 3x3. F = 16.]

SpMxV (parallel executions using GPUs)

[Figure: SpMxV Nvidia test, CRS format. 100 iterations of SpMxV. Hardware labels: Intel Core i7; NVIDIA GTX 295 (480 cores, only 240 used). Horizontal axis: matrices bcsstk29, garon2, msc10848, ncvxbqp1, psmigr_1, s3dkq4m2, tandem_vtx. Vertical axis: 0 to 12 (quantity and units not recoverable). Kernels compared: csr_scalar, csr_scalar_tex, csr_vector, csr_vector_tex.]
