Lecture 11


Linear Triangular System

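Lx = b

L – lower triangular matrix, nonsingular

$$\begin{pmatrix} a_{11} & 0 \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} \quad\Longrightarrow\quad x_1 = b_1 / a_{11}, \qquad x_2 = (b_2 - a_{21} x_1) / a_{22}$$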

Lx = b

L: nxn nonsingular lower triangular; b: known vector

b(1) = b(1)/L(1,1)
for i = 2:n
    b(i) = (b(i) - L(i,1:i-1)*b(1:i-1))/L(i,i)
end

Forward substitution, row version
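A minimal Python sketch of the row version, assuming a dense list-of-lists L and 0-based indices (the slides use 1-based, MATLAB-style indexing). As in the pseudocode, b is overwritten with the solution. The 3x3 matrix is a made-up example.

```python
def forward_sub_row(L, b):
    """Solve L x = b (L nonsingular, lower triangular), row version.

    b is overwritten with x, as in the pseudocode; indices are 0-based.
    """
    n = len(b)
    b[0] = b[0] / L[0][0]
    for i in range(1, n):
        # inner product of row i of L (left of the diagonal) with the solved x values
        s = sum(L[i][j] * b[j] for j in range(i))
        b[i] = (b[i] - s) / L[i][i]
    return b


L = [[2.0, 0.0, 0.0],
     [1.0, 4.0, 0.0],
     [3.0, -1.0, 5.0]]
print(forward_sub_row(L, [2.0, 9.0, 16.0]))  # [1.0, 2.0, 3.0]
```

Each step needs an inner product with all previously solved values, which is why this version is inherently sequential.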


Triangular System

Column version (column sweep method):

As soon as a variable is solved, its effect can be subtracted from subsequent equations

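[Worked example: a 3x3 lower triangular system solved by the column sweep (solve x_1, subtract its effect from the remaining equations, then repeat on the smaller system); the matrix entries were lost in extraction.]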

Lx = b

for j = 1:n-1
    b(j) = b(j)/L(j,j)
    b(j+1:n) = b(j+1:n) - b(j)*L(j+1:n,j)
end
b(n) = b(n)/L(n,n)

Forward substitution, column version

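The column version is more amenable to parallel computing: the update b(j+1:n) = b(j+1:n) - b(j)*L(j+1:n,j) consists of n-j independent operations that can be distributed across processors.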
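The column sweep in Python, as a sketch under the same assumptions as before (list-of-lists L, 0-based indices, b overwritten with x). The inner loop over i is the set of independent updates that a parallel version would distribute.

```python
def forward_sub_col(L, b):
    """Solve L x = b by the column sweep; b is overwritten with x (0-based)."""
    n = len(b)
    for j in range(n - 1):
        b[j] = b[j] / L[j][j]
        # subtract the effect of x_j from every later equation;
        # these n-1-j updates do not depend on one another
        for i in range(j + 1, n):
            b[i] = b[i] - b[j] * L[i][j]
    b[n - 1] = b[n - 1] / L[n - 1][n - 1]
    return b


L = [[2.0, 0.0, 0.0],
     [1.0, 4.0, 0.0],
     [3.0, -1.0, 5.0]]
print(forward_sub_col(L, [2.0, 9.0, 16.0]))  # [1.0, 2.0, 3.0]
```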


Triangular System: Parallel

[Figure: rows of L and b distributed over the CPUs, block vs. block cyclic]

As soon as x_i (or a few x_i values) is computed, it is passed down to the neighboring CPU.

As soon as a CPU receives an x_i value, it passes the value down to its neighboring CPU, then updates its local part of the b vector.

Disadvantage: load imbalance. With a block distribution, a CPU whose rows are all solved has no work left, so on average only about 50% of the CPUs are active.

Remedy: cyclic or block cyclic distribution of rows.
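To make the 50% figure concrete, here is a small counting model in Python. It is a simplifying assumption, not a performance simulation: a CPU counts as active in step j if it still owns at least one row below j. The sizes n = 1024 rows and p = 8 CPUs are arbitrary.

```python
def active_fraction(n, p, owner):
    """Average fraction of CPUs with rows left to update in the column sweep.

    owner(i) is the CPU holding row i (0-based). Simplified model: in step j
    a CPU counts as active if it owns some row i > j.
    """
    total = 0
    for j in range(n - 1):
        total += len({owner(i) for i in range(j + 1, n)})
    return total / ((n - 1) * p)


def block_owner(i, n=1024, p=8):
    return i // (n // p)      # contiguous chunks of n/p rows


def cyclic_owner(i, p=8):
    return i % p              # rows dealt out round-robin


print(active_fraction(1024, 8, block_owner))   # about 0.56
print(active_fraction(1024, 8, cyclic_owner))  # about 0.997
```

Under this model the block layout averages about (p+1)/(2p), which tends to 50% as the number of CPUs grows, while the cyclic layout keeps nearly every CPU busy until the last few rows.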


Triangular System: Inversion

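$$A = \begin{pmatrix} A_1 & 0 \\ A_3 & A_2 \end{pmatrix}, \qquad A^{-1} = \begin{pmatrix} A_1^{-1} & 0 \\ X & A_2^{-1} \end{pmatrix}$$

A – NxN lower triangular

Divide A into equal blocks: A_1 and A_2 are (N/2)x(N/2) lower triangular matrices.

From the (2,1) block of A A^{-1} = I, A_3 A_1^{-1} + A_2 X = 0, so

$$X = -A_2^{-1} A_3 A_1^{-1}$$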

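A can be inverted recursively:

Invert A_1;

Invert A_2 (independent of A_1, so the two inversions can proceed in parallel);

Compute X by matrix multiplication.

The work therefore reduces essentially to matrix multiplication.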
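A minimal recursive sketch in Python, assuming list-of-lists matrices and exact Fraction arithmetic; the helper names matmul and tri_inv are hypothetical. When N is odd the two halves differ in size by one, which the recursion handles unchanged.

```python
from fractions import Fraction


def matmul(X, Y):
    """Plain triple-loop product of list-of-lists matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def tri_inv(A):
    """Inverse of a nonsingular lower triangular A by recursive halving.

    A = [[A1, 0], [A3, A2]]  ->  inv(A) = [[inv(A1), 0], [X, inv(A2)]]
    with X = -inv(A2) * A3 * inv(A1).
    """
    n = len(A)
    if n == 1:
        return [[1 / A[0][0]]]
    h = n // 2
    A1 = [row[:h] for row in A[:h]]
    A3 = [row[:h] for row in A[h:]]
    A2 = [row[h:] for row in A[h:]]
    B1 = tri_inv(A1)            # the two recursive calls are independent
    B2 = tri_inv(A2)
    X = [[-v for v in row] for row in matmul(matmul(B2, A3), B1)]
    top = [B1[i] + [0] * (n - h) for i in range(h)]
    bottom = [X[i] + B2[i] for i in range(n - h)]
    return top + bottom


A = [[Fraction(v) for v in row] for row in [[2, 0, 0], [1, 4, 0], [3, -1, 5]]]
Ainv = tri_inv(A)
```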

Triangular System: Inversion

Example: an 8x8 lower triangular A

1
4 2
-2 3 1
7 2 -3 3
-1 4 3 0 -1
0 1 2 5 -2 1
6 -1 3 0 0 1 -2
4 1 5 -3 2 0 0 1

[Figure: A after the first phase, with the diagonal entries replaced by their reciprocals 1, 1/2, 1, 1/3, -1, 1, -1/2, 1, and after later phases; the remaining entries were lost in extraction.]

First phase: invert the diagonal elements of A.

2nd phase: compute the 2x2 diagonal blocks of A^(-1).

…

k-th phase: compute the 2^(k-1) x 2^(k-1) diagonal blocks of A^(-1).

Each phase consists essentially of matrix multiplications:

k-th phase: N/2^(k-1) pairs of 2^(k-2) x 2^(k-2) matrix multiplications (one pair per diagonal block, to form X = -A_2^{-1} A_3 A_1^{-1}).

The multiplications within a phase are independent and can be done in parallel on P = N^3 processors.
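A sketch of the phase-by-phase scheme in Python, applied to the 8x8 example matrix, assuming N is a power of two and using exact Fraction arithmetic. Phase 1 stores the reciprocals of the diagonal; each later phase builds the off-diagonal block X = -A_2^{-1} A_3 A_1^{-1} of every diagonal block from the two half-size inverses produced by the previous phase. The loop over s is the part that could be distributed across processors; here it simply runs sequentially.

```python
from fractions import Fraction

# the 8x8 lower triangular example matrix
A = [[1, 0, 0, 0, 0, 0, 0, 0],
     [4, 2, 0, 0, 0, 0, 0, 0],
     [-2, 3, 1, 0, 0, 0, 0, 0],
     [7, 2, -3, 3, 0, 0, 0, 0],
     [-1, 4, 3, 0, -1, 0, 0, 0],
     [0, 1, 2, 5, -2, 1, 0, 0],
     [6, -1, 3, 0, 0, 1, -2, 0],
     [4, 1, 5, -3, 2, 0, 0, 1]]


def phased_inverse(A):
    """Invert lower triangular A (N a power of two) phase by phase."""
    N = len(A)
    B = [[Fraction(0)] * N for _ in range(N)]
    for i in range(N):                      # phase 1: diagonal elements
        B[i][i] = 1 / Fraction(A[i][i])
    size = 2
    while size <= N:                        # later phases: size x size diagonal blocks
        h = size // 2
        for s in range(0, N, size):         # N/size independent diagonal blocks
            m = s + h
            # T = A3 * inv(A1), where A3 = A(m:m+h, s:s+h) (0-based, exclusive ends)
            T = [[sum(A[m + i][s + k] * B[s + k][s + j] for k in range(h))
                  for j in range(h)] for i in range(h)]
            # X = -inv(A2) * T is the off-diagonal block of this diagonal block
            for i in range(h):
                for j in range(h):
                    B[m + i][s + j] = -sum(B[m + i][m + k] * T[k][j]
                                           for k in range(h))
        size *= 2
    return B


B = phased_inverse(A)
```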


Gaussian Elimination

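$$\begin{aligned} 3x_1 + 5x_2 &= 9 \\ 6x_1 + 7x_2 &= 4 \end{aligned} \quad\xrightarrow{\text{row 2} \,-\, 2 \times \text{row 1}}\quad \begin{aligned} 3x_1 + 5x_2 &= 9 \\ -3x_2 &= -14 \end{aligned}$$

$$\begin{pmatrix} 3 & 5 \\ 6 & 7 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 3 & 5 \\ 0 & -3 \end{pmatrix}$$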

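Ax = b

A = LU, where

L – unit lower triangular

U – upper triangular

Ax = b becomes LUx = b, which is solved as Ly = b, then Ux = y.

This is especially useful with multiple right-hand sides, or when the same equations (same coefficient matrix) are solved many times: A is factorized once, and each new b costs only two triangular solves.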
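A sketch of "factor once, solve many times" in Python, assuming the factors of the 2x2 example A = [3 5; 6 7] = [1 0; 2 1][3 5; 0 -3]; the function name solve_lu and the second right-hand side are made up for illustration. Fractions keep the arithmetic exact.

```python
from fractions import Fraction as F


def solve_lu(L, U, b):
    """Solve LUx = b: forward substitution (Ly = b), then back substitution (Ux = y)."""
    n = len(b)
    y = []
    for i in range(n):                      # L has a unit diagonal
        y.append(b[i] - sum(L[i][j] * y[j] for j in range(i)))
    x = [F(0)] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x


# factors of A = [3 5; 6 7], computed once ...
L = [[F(1), F(0)], [F(2), F(1)]]
U = [[F(3), F(5)], [F(0), F(-3)]]

# ... then reused for several right-hand sides
x1 = solve_lu(L, U, [F(9), F(4)])    # x = (-43/9, 14/3)
x2 = solve_lu(L, U, [F(8), F(13)])   # x = (1, 1)
```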


LU Factorization

A = LU

A – nxn matrix, with A(1:k,1:k) nonsingular for k=1:n-1

After factorization, L is in the strictly lower triangular part of A (its unit diagonal is not stored) and U is in the upper triangular part of A (including the diagonal).

(kij) version:

for k = 1:n-1
    for i = k+1:n
        A(i,k) = A(i,k)/A(k,k)
        for j = k+1:n
            A(i,j) = A(i,j) - A(i,k)*A(k,j)
        end
    end
end

A(k,k) is the pivot.

Or, in vector form:

for k = 1:n-1
    A(k+1:n,k) = A(k+1:n,k)/A(k,k)
    A(k+1:n,k+1:n) = A(k+1:n,k+1:n) - A(k+1:n,k)*A(k,k+1:n)
end
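The (kij) version in Python, as a minimal sketch assuming list-of-lists storage and 0-based indices. For the made-up 3x3 example the result encodes L = [1 0 0; 2 1 0; 4 3 1] and U = [2 1 1; 0 1 1; 0 0 2].

```python
def lu_kij(A):
    """In-place LU factorization without pivoting, kij loop order.

    Assumes A(1:k,1:k) is nonsingular for k = 1..n-1, so no pivot is zero.
    On return the strictly lower part holds L (unit diagonal implied) and
    the upper part holds U.
    """
    n = len(A)
    for k in range(n - 1):
        for i in range(k + 1, n):
            A[i][k] = A[i][k] / A[k][k]        # multiplier; A[k][k] is the pivot
            for j in range(k + 1, n):
                A[i][j] = A[i][j] - A[i][k] * A[k][j]
    return A


print(lu_kij([[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]))
# [[2.0, 1.0, 1.0], [2.0, 1.0, 1.0], [4.0, 3.0, 2.0]]
```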


Factorization Breakdown

If A(k,k) = 0 at any stage, the algorithm breaks down: the LU factorization may not exist even if A is nonsingular (e.g. A = [0 1; 1 0]).

Theorem:

Assume A is nxn matrix.

(1) A has an LU factorization if A(1:k,1:k) is non-singular for all k=1:n-1.

(2) If the LU factorization exists and A is non-singular, then the LU factorization is unique.

To avoid breakdown, use pivoting.

Pivoting is also necessary to improve accuracy: a small pivot produces large multipliers and hence increased rounding errors.

Make sure no large entries appear in L or U: use large pivots.
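The effect of a tiny pivot can be seen on the classic 2x2 system with a leading entry of 1e-20 (the numbers are illustrative, not from the slides; the exact solution is very close to x1 = x2 = 1). This sketch uses ordinary double-precision floats; the helper names are hypothetical.

```python
def solve2_no_pivot(A, b):
    """Gaussian elimination on a 2x2 system without pivoting."""
    l21 = A[1][0] / A[0][0]              # huge multiplier if A[0][0] is tiny
    u22 = A[1][1] - l21 * A[0][1]
    y2 = b[1] - l21 * b[0]
    x2 = y2 / u22
    x1 = (b[0] - A[0][1] * x2) / A[0][0]
    return x1, x2


def solve2_pivot(A, b):
    """Partial pivoting: move the larger |entry| of column 1 to the top."""
    if abs(A[1][0]) > abs(A[0][0]):
        A, b = [A[1], A[0]], [b[1], b[0]]
    return solve2_no_pivot(A, b)


A = [[1e-20, 1.0], [1.0, 1.0]]
b = [1.0, 2.0]
print(solve2_no_pivot(A, b))  # (0.0, 1.0): x1 is completely wrong
print(solve2_pivot(A, b))     # (1.0, 1.0)
```

Without pivoting, the multiplier 1e20 swamps the original entries 1 and 2, and that information is lost to rounding; with the large pivot the multiplier is 1e-20 and nothing significant is lost.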


Block LU Factorization

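$$\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = \begin{pmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{pmatrix} \begin{pmatrix} U_{11} & U_{12} \\ 0 & U_{22} \end{pmatrix}$$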

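A – nxn matrix, n = r*N

A11 – rxr matrix, A22 – (n-r)x(n-r) matrix, A12 – rx(n-r) matrix, A21 – (n-r)xr matrix

A11 = L11*U11 (LU-factorize the rxr block)

A12 = L11*U12 (solve the triangular systems for U12)

A21 = L21*U11 (solve the triangular systems for L21)

A22 = L21*U12 + L22*U22, so A' = A22 - L21*U12 = L22*U22

Apply the same steps to A' to continue the LU factorization iteratively.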


Block LU Factorization

A – nxn matrix

A(1:k,1:k) is nonsingular for k=1:n-1

1 <= r <= n (block size)

Upon completion, A(i,j) is overwritten by L(i,j) for i>j and by U(i,j) for i<=j.

s = 1
while s <= n
    q = min(n, s+r-1)
    use the scalar algorithm to LU-factorize A(s:q,s:q) into L and U
    solve LZ = A(s:q,q+1:n) for Z and overwrite A(s:q,q+1:n) with Z
    solve WU = A(q+1:n,s:q) for W and overwrite A(q+1:n,s:q) with W
    A(q+1:n,q+1:n) = A(q+1:n,q+1:n) - WZ
    s = q+1
end

The update A(q+1:n,q+1:n) - WZ is a matrix multiplication, and it accounts for a significant fraction of the operations.
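A sketch of the block algorithm in Python, assuming no pivoting, list-of-lists storage, 0-based indices with exclusive upper bounds (so q here is one past the slide's q), and the (kij) version as the scalar algorithm. The function names and the 5x5 test matrix (strictly diagonally dominant, so every leading block is nonsingular) are hypothetical. Fractions make it easy to confirm the block result equals the unblocked one exactly.

```python
from fractions import Fraction


def lu_scalar(A, s, q):
    """kij LU of the diagonal block A[s:q, s:q] in place (0-based, q exclusive)."""
    for k in range(s, q - 1):
        for i in range(k + 1, q):
            A[i][k] /= A[k][k]
            for j in range(k + 1, q):
                A[i][j] -= A[i][k] * A[k][j]


def block_lu(A, r):
    """Block LU without pivoting, block size r; A is overwritten by L and U."""
    n = len(A)
    s = 0
    while s < n:
        q = min(n, s + r)                  # current block: rows/cols s..q-1
        lu_scalar(A, s, q)
        # L Z = A[s:q, q:n]: forward substitution with the block's unit lower L
        for j in range(q, n):
            for i in range(s, q):
                A[i][j] -= sum(A[i][k] * A[k][j] for k in range(s, i))
        # W U = A[q:n, s:q]: substitution with the block's upper U, row by row
        for i in range(q, n):
            for j in range(s, q):
                A[i][j] -= sum(A[i][k] * A[k][j] for k in range(s, j))
                A[i][j] /= A[j][j]
        # A[q:n, q:n] -= W Z: the matrix-multiplication update
        for i in range(q, n):
            for j in range(q, n):
                A[i][j] -= sum(A[i][k] * A[k][j] for k in range(s, q))
        s = q
    return A


M = [[10, 1, 2, 0, 1], [2, 12, 1, 3, 0], [1, 0, 9, 2, 2],
     [3, 1, 0, 11, 1], [0, 2, 1, 1, 8]]
A = [[Fraction(v) for v in row] for row in M]
block_lu(A, 2)      # blocks of size 2, 2, 1
```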


Permutation Matrix

Permutation matrix: identity matrix with its rows re-ordered.

$$P = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}$$

p = [4 1 3 2] encodes the permutation matrix P: p(k) is the column index of the "1" in the k-th row.

PA: row-permuted version of A

AP: column-permuted version of A

Interchange permutation matrix: identity matrix with two rows swapped

$$E = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}$$

Rows 1 and 4 swapped

EA: swap rows 1 and 4 of A

AE: swap columns 1 and 4 of A
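A small Python check of these statements, using the P encoded by p = [4 1 3 2] and the interchange E for rows 1 and 4. perm_matrix and matmul are hypothetical helpers, and A is an arbitrary 4x4 matrix whose (i,j) entry is 10i + j, so it is easy to see where rows and columns end up.

```python
def perm_matrix(p):
    """Permutation matrix from p (1-based): row k has its 1 in column p(k)."""
    n = len(p)
    return [[1 if j + 1 == p[k] else 0 for j in range(n)] for k in range(n)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


A = [[11, 12, 13, 14],
     [21, 22, 23, 24],
     [31, 32, 33, 34],
     [41, 42, 43, 44]]
P = perm_matrix([4, 1, 3, 2])
E = perm_matrix([4, 2, 3, 1])   # identity with rows 1 and 4 swapped

PA = matmul(P, A)   # row k of PA is row p(k) of A: rows 4, 1, 3, 2
AP = matmul(A, P)   # columns of A rearranged
EA = matmul(E, A)   # rows 1 and 4 of A swapped
print(PA[0], AP[0], EA[0])  # [41, 42, 43, 44] [12, 14, 13, 11] [41, 42, 43, 44]
```

In practice PA is formed by indexing rows with p rather than by an actual multiplication.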


Permutation Matrix

A permutation matrix can be expressed as a series of row interchanges:

$$P = E_n \cdots E_2 E_1$$

If E_k is the interchange permutation matrix with rows k and p(k) interchanged, then P can be encoded by the vector p(1:n). (This interchange encoding is different from the column-index encoding p = [4 1 3 2] used above.)

If x(1:n) is a vector, then Px can be computed using p(1:n):

for k = 1:n
    swap x(k) and x(p(k))
end

The p(1:n) vector is useful for pivoting.
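A minimal Python sketch of computing Px from the interchange encoding; apply_interchanges is a hypothetical name and p is 1-based as in the slides. The permutation that the column-index encoding writes as [4 1 3 2] has the interchange encoding [4 4 3 4]: first swap entries 1 and 4, then entries 2 and 4.

```python
def apply_interchanges(p, x):
    """Compute P x in place, where P = E_n ... E_2 E_1 and E_k swaps entries k and p(k)."""
    for k in range(len(p)):          # E_1 is applied first
        j = p[k] - 1                 # convert 1-based p(k) to a 0-based index
        x[k], x[j] = x[j], x[k]
    return x


print(apply_interchanges([4, 4, 3, 4], [10, 20, 30, 40]))  # [40, 10, 30, 20]
```

This is the same result as picking entries 4, 1, 3, 2 of x, i.e. the column-index encoding of the same P.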


Partial Pivoting

Pivoting is crucial for preventing breakdown and improving accuracy.

Partial pivoting: choose the element of largest magnitude in a column (or row) and interchange rows (or columns) to make it the pivot.

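$$A = \begin{pmatrix} 3 & 17 & 10 \\ 2 & 4 & -2 \\ 6 & 18 & -12 \end{pmatrix} \xrightarrow{\text{swap rows 1 and 3}} \begin{pmatrix} 6 & 18 & -12 \\ 2 & 4 & -2 \\ 3 & 17 & 10 \end{pmatrix} \xrightarrow{\text{eliminate}} \begin{pmatrix} 6 & 18 & -12 \\ 0 & -2 & 2 \\ 0 & 8 & 16 \end{pmatrix}$$

$$\xrightarrow{\text{swap rows 2 and 3}} \begin{pmatrix} 6 & 18 & -12 \\ 0 & 8 & 16 \\ 0 & -2 & 2 \end{pmatrix} \xrightarrow{\text{eliminate}} \begin{pmatrix} 6 & 18 & -12 \\ 0 & 8 & 16 \\ 0 & 0 & 6 \end{pmatrix}$$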


LU Factorization with Row Partial Pivoting

A – nxn matrix

After factorization, the strictly lower triangular part of A contains L (its unit diagonal is not stored); the upper triangular part contains U; the vector p(1:n-1) records the row interchanges of partial pivoting.

Algorithm F2:

for k = 1:n-1
    determine s with k <= s <= n such that |A(s,k)| is the largest among |A(k:n,k)|
    swap A(k,1:n) and A(s,1:n)
    p(k) = s
    if A(k,k) != 0
        A(k+1:n,k) = A(k+1:n,k)/A(k,k)
        A(k+1:n,k+1:n) = A(k+1:n,k+1:n) - A(k+1:n,k)*A(k,k+1:n)
    end
end
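A minimal Python sketch of Algorithm F2, assuming list-of-lists storage and exact Fraction arithmetic; loops are 0-based while p is stored 1-based as in the slides. Exchanging two elements of the outer Python list swaps whole rows, which is exactly the A(k,1:n) swap of F2. The example is the matrix A = [3 17 10; 2 4 -2; 6 18 -12].

```python
from fractions import Fraction as F


def lu_partial_pivot_f2(A):
    """Algorithm F2: LU with row partial pivoting, swapping whole rows.

    A (n x n, list of lists) is overwritten: strictly lower part = L,
    upper part = U. Returns p(1:n-1) as 1-based row indices.
    """
    n = len(A)
    p = []
    for k in range(n - 1):
        # s: row with the largest |A(s,k)| among rows k..n-1
        s = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[s] = A[s], A[k]          # swap A(k,1:n) and A(s,1:n)
        p.append(s + 1)
        if A[k][k] != 0:
            for i in range(k + 1, n):
                A[i][k] = A[i][k] / A[k][k]
                for j in range(k + 1, n):
                    A[i][j] -= A[i][k] * A[k][j]
    return p


A = [[F(v) for v in row] for row in [[3, 17, 10], [2, 4, -2], [6, 18, -12]]]
p = lu_partial_pivot_f2(A)
print(p)  # [3, 3]
# A now holds U = [6 18 -12; 0 8 16; 0 0 6] and the multipliers 1/2, 1/3, -1/4
```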


How to Use Factorized A

Solve Ax = b

Using the LU factorization with row partial pivoting:

The elements of b must be swapped according to the pivoting information in p(1:n-1).

Assume A is LU-factorized with row partial pivoting using algorithm F2:

for k = 1:n-1
    swap b(k) and b(p(k))
end
solve Ly = b
solve Ux = y

L – unit lower triangular matrix whose strictly lower triangular part is that of A; U – upper triangular part of A (including the diagonal)
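A sketch of the solve step in Python, assuming LU and p are the F2 output for A = [3 17 10; 2 4 -2; 6 18 -12] (computed by hand and hard-coded here); solve_f2 is a hypothetical name. The right-hand side b = A*(1, 2, 3) is chosen so the answer is easy to check.

```python
from fractions import Fraction as F


def solve_f2(LU, p, b):
    """Solve Ax = b from the F2 output (overwritten array LU, 1-based p)."""
    n = len(b)
    x = list(b)
    for k in range(n - 1):                  # swap b(k) and b(p(k))
        j = p[k] - 1
        x[k], x[j] = x[j], x[k]
    for i in range(n):                      # Ly = b, L unit lower triangular
        x[i] -= sum(LU[i][j] * x[j] for j in range(i))
    for i in reversed(range(n)):            # Ux = y
        x[i] = (x[i] - sum(LU[i][j] * x[j] for j in range(i + 1, n))) / LU[i][i]
    return x


# F2 output for A = [3 17 10; 2 4 -2; 6 18 -12]
LU = [[F(6), F(18), F(-12)],
      [F(1, 2), F(8), F(16)],
      [F(1, 3), F(-1, 4), F(6)]]
p = [3, 3]
x = solve_f2(LU, p, [F(67), F(4), F(6)])   # x == [1, 2, 3]
```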


LU Factorization With Row Partial Pivoting

A – nxn matrix

After factorization, the strictly lower triangular part of A contains the multipliers (not L itself, because multipliers computed in earlier steps are not swapped later); the upper triangular part contains U; the vector p(1:n-1) records the row interchanges of partial pivoting.

Algorithm F1:

for k = 1:n-1
    determine s with k <= s <= n such that |A(s,k)| is the largest among |A(k:n,k)|
    swap A(k,k:n) and A(s,k:n)    // the only difference from F2
    p(k) = s
    if A(k,k) != 0
        A(k+1:n,k) = A(k+1:n,k)/A(k,k)
        A(k+1:n,k+1:n) = A(k+1:n,k+1:n) - A(k+1:n,k)*A(k,k+1:n)
    end
end
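Algorithm F1 in Python, as a sketch under the same assumptions as the F2 version (list-of-lists, Fractions, 1-based p). Only the slice swap differs; on A = [3 17 10; 2 4 -2; 6 18 -12] it produces the same U and p as F2, but the multipliers in column 1 stay in the order in which they were computed.

```python
from fractions import Fraction as F


def lu_partial_pivot_f1(A):
    """Algorithm F1: like F2, but only A(k,k:n) and A(s,k:n) are swapped."""
    n = len(A)
    p = []
    for k in range(n - 1):
        s = max(range(k, n), key=lambda i: abs(A[i][k]))
        # swap only the trailing parts; multipliers in columns < k stay put
        A[k][k:], A[s][k:] = A[s][k:], A[k][k:]
        p.append(s + 1)
        if A[k][k] != 0:
            for i in range(k + 1, n):
                A[i][k] = A[i][k] / A[k][k]
                for j in range(k + 1, n):
                    A[i][j] -= A[i][k] * A[k][j]
    return p


A = [[F(v) for v in row] for row in [[3, 17, 10], [2, 4, -2], [6, 18, -12]]]
p = lu_partial_pivot_f1(A)
# same U and p = [3, 3] as F2, but A[1][0] == 1/3 and A[2][0] == 1/2
```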


How to Use Factorized A

Solve Ax = b

Using the LU factorization with partial pivoting:

Swap the elements of b according to the pivoting information in p(1:n-1).

Apply the multipliers stored in the strictly lower triangular part of A.

Assume A is LU-factorized with partial pivoting using algorithm F1:

for k = 1:n-1
    swap b(k) and b(p(k))
    b(k+1:n) = b(k+1:n) - b(k)*A(k+1:n,k)
end
solve Ux = b

U – upper triangular part of A (including the diagonal)
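The F1 solve step in Python, assuming LU and p are the F1 output for A = [3 17 10; 2 4 -2; 6 18 -12] (computed by hand and hard-coded); solve_f1 is a hypothetical name. The interchanges and the multipliers are applied together, one column at a time, so no separate L solve is needed.

```python
from fractions import Fraction as F


def solve_f1(LU, p, b):
    """Solve Ax = b from the F1 output (multipliers below the diagonal, 1-based p)."""
    n = len(b)
    x = list(b)
    for k in range(n - 1):
        j = p[k] - 1
        x[k], x[j] = x[j], x[k]             # swap b(k) and b(p(k))
        for i in range(k + 1, n):           # b(k+1:n) -= b(k) * A(k+1:n,k)
            x[i] -= x[k] * LU[i][k]
    for i in reversed(range(n)):            # Ux = b
        x[i] = (x[i] - sum(LU[i][j] * x[j] for j in range(i + 1, n))) / LU[i][i]
    return x


# F1 output for A = [3 17 10; 2 4 -2; 6 18 -12]
LU = [[F(6), F(18), F(-12)],
      [F(1, 3), F(8), F(16)],
      [F(1, 2), F(-1, 4), F(6)]]
p = [3, 3]
x = solve_f1(LU, p, [F(67), F(4), F(6)])   # x == [1, 2, 3]
```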


Column Partial Pivoting

Column partial pivoting: search row k for the element of largest magnitude and exchange its column with column k.

A – nxn matrix

After factorization, the strictly lower triangular part of A contains L; the upper triangular part contains U; the vector p(1:n-1) records the column interchanges of partial pivoting.

Algorithm G:

for k = 1:n-1
    determine s with k <= s <= n such that |A(k,s)| is the largest among |A(k,k:n)|
    swap A(1:n,k) and A(1:n,s)
    p(k) = s
    if A(k,k) != 0
        A(k+1:n,k) = A(k+1:n,k)/A(k,k)
        A(k+1:n,k+1:n) = A(k+1:n,k+1:n) - A(k+1:n,k)*A(k,k+1:n)
    end
end


How to Use Factorized A

Solve Ax = b

Using the LU factorization with column partial pivoting:

The elements of x must be swapped according to the pivoting information in p(1:n-1).

Assume A is LU-factorized with column partial pivoting using algorithm G:

solve Ly = b
solve Ux = y
for k = n-1:-1:1
    swap x(k) and x(p(k))
end

L – unit lower triangular matrix whose strictly lower triangular part is that of A; U – upper triangular part of A (including the diagonal)
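A Python sketch of Algorithm G together with its solve step, assuming list-of-lists storage, Fractions, and 1-based p; the function names are hypothetical. Column swaps mean the factorization is of A with its columns permuted, so the triangular solves give a permuted x, and the interchanges are undone in reverse order at the end.

```python
from fractions import Fraction as F


def lu_col_pivot_g(A):
    """Algorithm G: LU with column partial pivoting; returns 1-based p."""
    n = len(A)
    p = []
    for k in range(n - 1):
        s = max(range(k, n), key=lambda j: abs(A[k][j]))   # search row k
        for row in A:                                        # swap columns k and s
            row[k], row[s] = row[s], row[k]
        p.append(s + 1)
        if A[k][k] != 0:
            for i in range(k + 1, n):
                A[i][k] = A[i][k] / A[k][k]
                for j in range(k + 1, n):
                    A[i][j] -= A[i][k] * A[k][j]
    return p


def solve_g(LU, p, b):
    """Solve Ax = b from the output of lu_col_pivot_g."""
    n = len(b)
    x = list(b)
    for i in range(n):                      # Ly = b
        x[i] -= sum(LU[i][j] * x[j] for j in range(i))
    for i in reversed(range(n)):            # Ux = y
        x[i] = (x[i] - sum(LU[i][j] * x[j] for j in range(i + 1, n))) / LU[i][i]
    for k in reversed(range(n - 1)):        # undo column interchanges, last first
        j = p[k] - 1
        x[k], x[j] = x[j], x[k]
    return x


A = [[F(v) for v in row] for row in [[3, 17, 10], [2, 4, -2], [6, 18, -12]]]
p = lu_col_pivot_g(A)                       # p == [2, 3]
x = solve_g(A, p, [F(67), F(4), F(6)])      # x == [1, 2, 3]
```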


Complete Pivoting

Complete pivoting: the element of largest magnitude in the submatrix A(k:n,k:n) is permuted into position (k,k) as the pivot.

This needs a row interchange and a column interchange.

A – nxn matrix; p(1:n-1) – vector encoding row interchanges; q(1:n-1) – vector encoding column interchanges

After factorization, the strictly lower triangular part of A contains L, and the upper triangular part of A contains U (including the diagonal).


LU Factorization with Complete Pivoting

for k = 1:n-1
    determine s (k <= s <= n) and t (k <= t <= n) such that |A(s,t)| is the largest among |A(i,j)| for i=k:n, j=k:n
    swap A(k,1:n) and A(s,1:n)
    swap A(1:n,k) and A(1:n,t)
    p(k) = s
    q(k) = t
    if A(k,k) != 0
        A(k+1:n,k) = A(k+1:n,k)/A(k,k)
        A(k+1:n,k+1:n) = A(k+1:n,k+1:n) - A(k+1:n,k)*A(k,k+1:n)
    end
end


How to Use Factorized A

Solve Ax = b

By LU factorization with complete pivoting:

Suppose A is LU-factorized with complete pivoting, and p(1:n-1) and q(1:n-1) are the permutation encoding vectors.

for k = 1:n-1
    swap b(k) and b(p(k))
end
solve Ly = b for y
solve Ux = y for x
for k = n-1:-1:1
    swap x(k) and x(q(k))
end

L (unit lower triangular) and U are the strictly lower and upper triangular parts of the factorized A.
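A Python sketch of complete pivoting and the corresponding solve, assuming list-of-lists storage, Fractions, and 1-based p and q; the function names are hypothetical. Rows are swapped in full (as in F2) and columns in full (as in G), so the result satisfies PAQ = LU, which is why b is permuted by p before the solves and x by q afterwards.

```python
from fractions import Fraction as F


def lu_complete_pivot(A):
    """LU with complete pivoting; returns 1-based row (p) and column (q) interchanges."""
    n = len(A)
    p, q = [], []
    for k in range(n - 1):
        # largest |A(i,j)| over the trailing submatrix A(k:n,k:n)
        s, t = max(((i, j) for i in range(k, n) for j in range(k, n)),
                   key=lambda ij: abs(A[ij[0]][ij[1]]))
        A[k], A[s] = A[s], A[k]                 # swap rows k and s
        for row in A:                           # swap columns k and t
            row[k], row[t] = row[t], row[k]
        p.append(s + 1)
        q.append(t + 1)
        if A[k][k] != 0:
            for i in range(k + 1, n):
                A[i][k] = A[i][k] / A[k][k]
                for j in range(k + 1, n):
                    A[i][j] -= A[i][k] * A[k][j]
    return p, q


def solve_complete(LU, p, q, b):
    n = len(b)
    x = list(b)
    for k in range(n - 1):                      # swap b(k) and b(p(k))
        j = p[k] - 1
        x[k], x[j] = x[j], x[k]
    for i in range(n):                          # Ly = b
        x[i] -= sum(LU[i][j] * x[j] for j in range(i))
    for i in reversed(range(n)):                # Ux = y
        x[i] = (x[i] - sum(LU[i][j] * x[j] for j in range(i + 1, n))) / LU[i][i]
    for k in reversed(range(n - 1)):            # swap x(k) and x(q(k)), last first
        j = q[k] - 1
        x[k], x[j] = x[j], x[k]
    return x


A = [[F(v) for v in row] for row in [[3, 17, 10], [2, 4, -2], [6, 18, -12]]]
p, q = lu_complete_pivot(A)                     # p == [3, 3], q == [2, 3]
x = solve_complete(A, p, q, [F(67), F(4), F(6)])   # x == [1, 2, 3]
```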


Parallelization of Gaussian Elimination

Row-wise 1D block decomposition (figure: pivot element A(k,k))

At step k, the processor holding the pivot sends row k, A(k,k:n), to its bottom neighbor.

Each processor, upon receiving data from its top neighbor, immediately forwards it to its bottom neighbor, then updates its own data, then waits for the next data from its top neighbor.

Disadvantage: load imbalance (a processor whose rows all lie above row k has no work left after step k).

Remedy: row-wise block cyclic distribution

Parallelization with Partial Pivoting

Row-wise block/block-cyclic decomposition

Gaussian elimination with column partial pivoting

Row partial pivoting is more difficult, because the pivot search in column k spans rows held by many processors.

The pivot search is done on the processor holding row k, with no communication among processors.

The column index of the new pivot element, together with row k, A(k,k:n), must be sent out.

On each processor: upon receiving data from the top neighbor, immediately forward it to the bottom neighbor and swap column k with the new pivot column in its own data; update its own data; wait for data from the top neighbor.
