Lx
b
L – lower triangular matrix, nonsingular
a a
11
21 x
1
x
2
0 a
22
x x
2
1
b
1
( b
2
/ a
11
a
21 x
1
)
/
a
22
b b
2
1
Lx=b
L: nxn nonsingular lower triangular b: known vector b(1) = b(1)/L(1,1)
For i=2:n b(i) = (b(i)-L(i,1:i-1)b(1:i-1))/L(i,i) end
Forward substitution, row version
1
Column version (column sweep method):
As soon as a variable is solved, its effect can be subtracted from subsequent equations
2
1
7 x
1
0
5
9
3
5
9
0
0
8
x x
2 x
3
1
6
2
5
0
8
x
2 x
3
2
5
3
1
7
1
16
Lx = b for j=1:n-1 b(j) = b(j)/L(j,j) b(j+1:n) = b(j+1:n)-b(j)L(j+1:n,j) end b(n) = b(n)/L(n,n)
Forward substitution, column version
Column version is more amenable to parallel computing
2
L L b b block Block cyclic
As soon as x_i (or a few x_i variables) is computed, the value is passed downward to neighboring cpus;
As soon as a cpu receives x_i value, it passes the value downward to neighboring cpus;
Then local b vector is updated.
Disadvantage: load imbalance, about 50% cpus are active on average
Remedy: cyclic or block cyclic distribution of rows.
3
A
A
1
X
A
A
3
1
0
A
2
A – NxN lower triangular
Divide A into equal blocks
A
1
1
X
0
A
2
1
A
2
1
A
3
A
1
1
Can inverse A recursively:
Inverse A1;
Inverse A2;
Compute X by matrix multiplication
Matrix multiplication
4
1
4 2
-2 3 1
7 2 -3 3
-1 4 3 0 -1
0 1 2 5 -2 1
6 -1 3 0 0 1 -2
4 1 5 -3 2 0 0 1
-1
0
-2
7
1
4
6
1/2
4
1
3
2
-1
3
2
1
-3
3
1/3
0
5
0
-1
-2
0
1
1
4 1 5 -3 2 0
-
1/2
0 1
-2
7
1
-2
-1
0
6
2
4
1
-1
1/2
3
3
2
3
1
1
4 1 5
1/3
0
5
0
-1
-2
0
1
1
-3 2 0
-1/2
0 1
First phase: invert diagonal elements of A
2 nd
… phase: compute 2x2 diagonal blocks of A^(-1)
K-th phase: compute diagonal 2^(k-1) x 2^(k-1) blocks of
A^(-1)
Essentially matrix multiplications;
K-th phase: N/2^(k-1) pairs of 2^(k-2)x2^(k-2) matrix multiplications
Can do in parallel on P=K^3 processors
5
3 x
6 x
1
1
5 x
2
7 x
2
9
4
3 x
1
3
x
2
5 x
2
14
9
3
6
5
7
1
2
0
1
3
0
5
3
Ax = b
A = LU,
L – unit lower triangular
U – upper triangular
Ax = b
LUx = b
Ly = b, Ux = y
Especially with multiple rhs or solve same equations (same coefficient matrix) many times
6
A = LU
A – nxn matrix
A(1:k,1:k) nonsingular for k=1:n-1
After factorization, L is in strictly lower triangular part of A, U is in upper triangular part of A (including
(kij) version
For k=1:n-1 for i=k+1:n diagonal)
A(i,k) = A(i,k)/A(k,k) for j=k+1:n
A(k,k) is the pivot
A(i,j) = A(i,j) – A(i,k)A(k,j) end end end or
For k=1:n-1
A(k+1:n,k) = A(k+1:n,k)/A(k,k)
A(k+1:n,k+1:n) = A(k+1:n,k+1:n)- A(k+1:n,k)A(k,k+1:n) end
7
If A(k,k)=0 at any stage
breakdown, LU factorization may not exist even if A is nonsingular
Theorem:
Assume A is nxn matrix.
(1) A has an LU factorization if A(1:k,1:k) is non-singular for all k=1:n-1.
(2) If the LU factorization exists and A is non-singular, then the LU factorization is unique.
Avoid method breakdown
pivoting
Pivoting is also necessary to improve accuracy. Small pivot
increased errors
Make sure no large entries appear in L or U. Use large pivots.
8
A
A
11
A
21
A
12
A
22
L
11
L
21
0
L
22
U
0
11
U
12
U
22
A – nxn matrix, n = r*N
A11 – rxr matrix, A22 – (n-r)x(n-r) matrix, A12 – rx(n-r) matrix, A21 – (n-r)xr
A11 = L11*U11
A12 = L11*U12
U12
A21 = L21*U11
L21
A22 = L21*U12+L22*U22
A22-L21*U12 = A’ = L22*U22
LU factorization iteratively
9
A – nxn matrix
A(1:k,1:k) is non-singular for k=1:n-1
1<= r <= n
Upon completion, A(i,j) overwritten by L(i,j) for i>j; A(i,j) overwritten by U(i,j) for i<=j s = 1
While s <= n q = min(n,s+r-1)
Use scalar algorithm to LU-factorize A(s:q,s:q) into L and U
Solve LZ = A(s:q,q+1:n) for Z and overwrite A(s:q,q+1:n) with Z
Solve WU = A(q+1:n,s:q) for W and overwrite A(q+1:n,s:q) with W
A(q+1:n,q+1:n) = A(q+1:n,q+1:n) – WZ s = q+1
End
Matrix multiplication accounts for significant fraction of operations
10
Permutation matrix: identity matrix with its rows re-ordered.
P
0
1
0
0
0
0
0
1
0
0
1
0
1
0
0
0 p = [4 1 3 2] encodes permutation matrix P p(k) is the column index of the “1” in k-th row
PA: row-permuted version of A
AP: column-permuted version of A
Interchange permutation matrix: identity matrix with two rows swapped
E
0
0
0
1
0
1
0
0
0
0
1
0
1
0
0
0
Row 1 and 4 swapped
EA: swap rows 1 and 4 of A
AE: swap columns 1 and 4 of A
11
A permutation matrix can be expressed as a series of row interchanges:
P
E n
E
2
E
1
If E_{k} is the interchange permutation matrix with rows k and p(k) interchanged,
Then P can be encoded by vector p(1:n).
If x(1:n) is a vector, then Px can be computed using p(1:n)
For k=1:n swap x(k) and x(p(k))
End p(1:n) vector is useful for pivoting
12
Pivoting is crucial to preventing breakdown and improving accuracy
Partial pivoting: choose largest element in a column (or row) and interchange rows
(columns)
A
3
2
6
17
4
18
10
2
12
Swap rows 1 and 3
6
2
3
18
4
17
12
10
2
6
0
0
18
8
2
12
2
16
6
0
0
8
8
0
12
16
6
Swap rows
2 and 3
6
0
0
18
8
2
12
16
2
13
A – nxn matrix
After factorization, strictly lower triangular part of A contains L; upper triangular part contains U; vector p(1:n-1) contains permutation operations in partial pivoting
Algorithm F2:
For k=1:n-1
Determine s with k<=s<=n s.t. |A(s,k)| is largest among |A(k:n,k)| swap A(k,1:n) and A(s,1:n) p(k) = s if A(k,k) != 0
A(k+1:n,k) = A(k+1:n,k)/A(k,k)
A(k+1:n,k+1:n) = A(k+1:n,k+1:n)-A(k+1:n,k)A(k,k+1:n) end end
14
Solve Ax = b
Using LU factorization of row partial pivoting
Need to swap elements of b according to partial pivoting information in p(1:n-1)
Assume A is LU factorized with row partial pivoting using algorithm F2:
For k=1:n-1 swap b(k) and b(p(k))
End
Solve Ly = b
Solve Ux = y
L - unit lower triangular matrix whose lower triangular part is the same as that of A; U
- upper triangular part of A (including diagonal)
15
A – nxn matrix
After factorization, strictly lower triangular part of A contains multipliers; upper triangular part contains U; vector p(1:n-1) contains permutation operations in partial pivoting
Algorithm F1:
For k=1:n-1
Determine s with k<=s<=n s.t. |A(s,k)| is largest among |A(k:n,k)| swap A(k,k:n) and A(s,k:n) // only difference with F2 p(k) = s if A(k,k) != 0
A(k+1:n,k) = A(k+1:n,k)/A(k,k)
A(k+1:n,k+1:n) = A(k+1:n,k+1:n)-A(k+1:n,k)A(k,k+1:n) end end
16
Solve Ax = b
Using LU factorization of partial pivoting
Need to swap elements of b according to partial pivoting
information in p(1:n-1)
Need to multiply appropriate coefficients
information in lower triangular part of A
Assume A is LU factorized with partial pivoting using algorithm F1 :
For k=1:n-1 swap b(k) and b(p(k)) b(k+1:n) = b(k+1:n) – b(k)A(k+1:n,k)
End
Solve Ux = b
U - upper triangular part of A (including diagonal)
17
Column partial pivoting: search row k for the largest element, exchange that column with column k.
A – nxn matrix
After factorization, strictly lower triangular part of A contains L; upper triangular part contains U; vector p(1:n-1) contains permutation operations in partial pivoting
Algorithm G:
For k=1:n-1
Determine s with k<=s<=n s.t. |A(k,s)| is largest among |A(k,k:n)| swap A(1:n,k) and A(1:n,s) p(k) = s if A(k,k) != 0
A(k+1:n,k) = A(k+1:n,k)/A(k,k)
A(k+1:n,k+1:n) = A(k+1:n,k+1:n)-A(k+1:n,k)A(k,k+1:n) end end
18
Solve Ax = b
Using LU factorization with column partial pivoting
Need to swap elements of x according to partial pivoting information in p(1:n-1)
Assume A is LU factorized with column partial pivoting using algorithm G:
Solve Ly = b
Solve Ux = y
For k=n-1:-1:1 swap x(k) and x(p(k)) end
L - unit lower triangular matrix whose lower triangular part is the same as that of A; U
- upper triangular part of A (including diagonal)
19
Complete pivoting: the largest element in submatrix A(k:n,k:n) is permuted into
(k,k) as the pivot
Need a row interchange and a column interchange
A – nxn matrix p(1:n-1) – vector encoding row interchanges q(1:n-1) – vector encoding column interchanges
After factorization, lower triangular part of A contains L, upper triangular part of
A contains U (including diagonal)
20
LU factorization with complete pivoting
For k=1:n-1
Determine s (k<=s<=n) and t (k<=t<=n) s.t. |A(s,t)| is largest among |A(i,j)| for i=k:n, j=k:n swap A(k,1:n) and A(s,1:n) swap A(1:n,k) and A(1:n,t) p(k) = s q(k) = t if A(k,k) != 0 end end
A(k+1:n,k) = A(k+1:n)/A(k,k)
A(k+1:n,k+1:n) = A(k+1:n,k+1:n)-A(k+1:n,k)A(k,k+1:n)
21
Solve Ax = b
By LU factorization with complete pivoting
Suppose A is LU factorized with complete pivoting, p(1:n-1) and q(1:n-1) are permutation encoding vectors for k=1:n-1 swap b(k) and b(p(k))
End
Solve Ly = b for y
Solve Ux = y for x
For k=n-1:-1:1 swap x(k) and x(q(k))
End
L and U are lower and upper triangular parts of factorized A
22
A(k,k)
Row-wise 1D block decomposition
At step k, the processor holding the pivot sends row k: A(k,k:n) to bottom neighboring processor;
At each processor, forward data immediately to bottom neighbor upon receiving data from top processor; then update its own data; then wait for data from top neighbor
Disadvantage: load imbalance
Remedy: row-wise block cyclic distribution 23
Row-wise block/block-cyclic decomposition
Gaussian elimination with column partial pivoting
More difficult with row partial pivoting
Pivoting search on the processor holding row k, no communication among processors;
Column index of the new pivot element together with row k: A(k:n) need to be sent out;
On each processor, upon receiving data from top neighbor, forward immediately to bottom neighbor, and swap column k and new pivot column of own data; update own data; wait data from top neighbor;
24