Solving linear systems

Overview
Chapter 12 from Michael J. Quinn, Parallel Programming in C with MPI
and OpenMP
We want to find vector x = (x0 , x1 , . . . , xn−1 ) as solution of a system
of linear equations: Ax = b, where A is an n × n matrix, vector b is of
length n
Three topics for today:
Back substitution
Gaussian elimination
Iterative methods for sparse linear systems
A system of linear equations
Example:
a0,0 x0   + a0,1 x1   + . . . + a0,n−1 xn−1   = b0
a1,0 x0   + a1,1 x1   + . . . + a1,n−1 xn−1   = b1
...
an−1,0 x0 + an−1,1 x1 + . . . + an−1,n−1 xn−1 = bn−1
where ai,j and bi are constants, xj are the unknown values to be found
Back substitution
An algorithm for solving Ax = b when A is upper triangular
i > j ⇒ ai,j = 0
We shall first look at its serial implementation
Two possible parallelizations
Example of back substitution
Starting point:
1x0 + 1x1 − 1x2 + 4x3 = 8
    − 2x1 − 3x2 + 1x3 = 5
            2x2 − 3x3 = 0
                  2x3 = 4
Example of back substitution (cont’d)
After step 1:
1x0 + 1x1 − 1x2       = 0
    − 2x1 − 3x2       = 3
            2x2       = 6
                  2x3 = 4
Example of back substitution (cont’d)
After step 2:
1x0 + 1x1             = 3
    − 2x1             = 12
            2x2       = 6
                  2x3 = 4
Example of back substitution (cont’d)
After step 3:
1x0                   = 9
    −2x1              = 12
            x2        = 3
                  x3  = 2
Pseudo-code for back substitution
a[0..n − 1, 0..n − 1] — coefficient matrix
b[0..n − 1] — constant vector
x[0..n − 1] — solution vector
for i ← n − 1 down to 0 do
    x[i] ← b[i]/a[i, i]
    for j ← 0 to i − 1 do
        b[j] ← b[j] − x[i] × a[j, i]
        a[j, i] ← 0
    endfor
endfor
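In C, the pseudo-code translates almost line for line (a minimal sketch using C99 variable-length array parameters; the function and parameter names are our own):

/* Back substitution for an upper triangular system Ax = b.
 * Destroys b and the strictly upper part of a; on return,
 * x[0..n-1] holds the solution. */
void back_substitution(int n, double a[n][n], double b[n], double x[n])
{
    for (int i = n - 1; i >= 0; i--) {
        x[i] = b[i] / a[i][i];          /* row i now has a single unknown */
        for (int j = 0; j < i; j++) {
            b[j] -= x[i] * a[j][i];     /* move the x[i] term to the right side */
            a[j][i] = 0.0;
        }
    }
}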
Observations about back substitution
In each i iteration, x[i] must be computed first as b[i]/a[i, i]
However, b[i] depends on updates made in previous i iterations
Therefore, the i for-loop cannot be executed in parallel
The j for-loop inside each i iteration can be executed in parallel
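On a shared-memory machine this observation leads directly to an OpenMP version: keep the i loop sequential and parallelize the j loop (a sketch reusing the serial layout above; the iterations of the j loop touch disjoint b[j] and a[j][i], so no synchronization beyond the implicit barrier is needed):

void back_substitution_omp(int n, double a[n][n], double b[n], double x[n])
{
    for (int i = n - 1; i >= 0; i--) {  /* inherently sequential */
        x[i] = b[i] / a[i][i];
        #pragma omp parallel for        /* independent updates of rows 0..i-1 */
        for (int j = 0; j < i; j++) {
            b[j] -= x[i] * a[j][i];
            a[j][i] = 0.0;
        }
    }
}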
Row-oriented parallel back substitution
The rows of A are distributed among p processes, in an interleaved
striped decomposition:
If i mod p = k, then row i is assigned to process k
The b and x vectors are distributed in the same way
In each i iteration, the process responsible for row i computes
xi = bi/ai,i
Then, the newly computed xi value is broadcast to all processes
Thereafter, each process updates all of its responsible bj values as
bj = bj − aj,i xi (see the MPI sketch below)
Complexity:
Average number of iterations of the j loop per process: n/(2p)
Therefore, computational complexity: O(n²/p)
Complexity of communication latency time: O(n log p)
Complexity of communication data transmission time: O(n log p)
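A sketch of this algorithm in C with MPI follows. The storage conventions are assumptions of ours, not prescribed by the slides: each process packs its owned rows contiguously (global row j lives at local index j/p), each packed row holds n doubles, and the full x vector is replicated on every process:

#include <mpi.h>

/* Row-oriented back substitution; process id owns global row j iff j % p == id.
 * a_loc: owned rows, packed, n doubles per row; b_loc: matching entries of b;
 * x: replicated solution vector of length n. */
void back_sub_rows(int n, int p, int id, double *a_loc, double *b_loc, double *x)
{
    for (int i = n - 1; i >= 0; i--) {
        int owner = i % p;
        if (id == owner)                  /* only the owner can form x[i] */
            x[i] = b_loc[i / p] / a_loc[(i / p) * n + i];
        MPI_Bcast(&x[i], 1, MPI_DOUBLE, owner, MPI_COMM_WORLD);
        for (int j = id; j < i; j += p)   /* update my rows with index < i */
            b_loc[j / p] -= a_loc[(j / p) * n + i] * x[i];
    }
}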
Column-oriented parallel back substitution
Alternatively, we can distribute the columns of A by an interleaved
striped decomposition
During iteration i, the responsible process computes xi and updates
the entire b vector
Then, the newly updated b vector must be sent to the successor
process before the next i iteration
Therefore, the column-oriented parallel back substitution is actually
not a parallel algorithm, because there is no computational concurrency!
Complexity:
Computational complexity: O(n²)
Complexity of communication latency time: O(n)
Complexity of communication data transmission time: O(n²)
Comparison
The column-oriented parallel back substitution is always slower than
sequential back substitution
The row-oriented parallel back substitution can be faster than
sequential back substitution
depending on the values of n, p and communication speeds
The row-oriented parallel back substitution can also be slower than
the column-oriented parallel back substitution
especially when n is relatively small and p is relatively large
Gaussian elimination
A well-known algorithm for solving dense linear systems
The original system Ax = b is reduced by Gaussian elimination to an
upper triangular system T x = c
Then, back substitution can be used to find x
Example of Gaussian elimination
Starting point:
 4x0 + 6x1  + 2x2 − 2x3 = 8
 2x0        + 5x2 − 2x3 = 4
−4x0 − 3x1  − 5x2 + 4x3 = 1
 8x0 + 18x1 − 2x2 + 3x3 = 40
Example of Gaussian elimination (cont’d)
After step 1:
4x0 + 6x1 + 2x2 − 2x3 = 8
    − 3x1 + 4x2 − 1x3 = 0
    + 3x1 − 3x2 + 2x3 = 9
    + 6x1 − 6x2 + 7x3 = 24
Example of Gaussian elimination (cont’d)
After step 2:
4x0 + 6x1 + 2x2 − 2x3 = 8
    − 3x1 + 4x2 − 1x3 = 0
          + 1x2 + 1x3 = 9
          + 2x2 + 5x3 = 24
Example of Gaussian elimination (cont’d)
After step 3:
4x0 + 6x1 + 2x2 − 2x3 = 8
    − 3x1 + 4x2 − 1x3 = 0
          + 1x2 + 1x3 = 9
                + 3x3 = 6
Sequential algorithm of Gaussian elimination
A total of n − 1 steps is needed for a linear system with an n × n
matrix A and an n × 1 vector b
During step i,
The nonzero elements of A below the diagonal in column i are
eliminated by replacing each row j, where i + 1 ≤ j < n, with the
sum of row j and −aj,i /ai,i times row i
Partial pivoting
During step i of Gaussian elimination, row i is called the pivot row,
that is, the row used to drive to zero all nonzero elements below the
diagonal in column i
However, if ai,i is zero or very close to zero, the division will cause numerical trouble
Gaussian elimination with partial pivoting:
In step i, rows i through n − 1 are searched for the row whose
column i element has the largest absolute value
Then, this row is swapped (pivoted) with row i
Pseudo-code for Gaussian elimination (row pivoting)
for i ← 0 to n − 1
    magnitude ← 0
    for j ← i to n − 1
        if |a[loc[j], i]| > magnitude
            magnitude ← |a[loc[j], i]|
            picked ← j
        endif
    endfor
    swap loc[i] and loc[picked]
    for j ← i + 1 to n − 1
        t ← a[loc[j], i]/a[loc[i], i]
        for k ← i + 1 to n − 1
            a[loc[j], k] ← a[loc[j], k] − a[loc[i], k] × t
        endfor
        b[loc[j]] ← b[loc[j]] − b[loc[i]] × t
    endfor
endfor
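A C version of the same procedure might look as follows (a sketch with names of our own choosing; as in the pseudo-code, rows are never moved physically, and the stale entries below the diagonal are simply never referenced again instead of being zeroed):

#include <math.h>

/* Gaussian elimination with partial pivoting via the loc[] indirection.
 * On return, rows loc[0], ..., loc[n-1] of a, together with b, form an
 * upper triangular system ready for back substitution. */
void gaussian_elimination(int n, double a[n][n], double b[n], int loc[n])
{
    for (int i = 0; i < n; i++)
        loc[i] = i;

    for (int i = 0; i < n; i++) {
        /* partial pivoting: find the row with the largest
         * absolute value in column i among rows i..n-1 */
        double magnitude = 0.0;
        int picked = i;
        for (int j = i; j < n; j++)
            if (fabs(a[loc[j]][i]) > magnitude) {
                magnitude = fabs(a[loc[j]][i]);
                picked = j;
            }
        int tmp = loc[i]; loc[i] = loc[picked]; loc[picked] = tmp;

        /* eliminate column i from all rows below the pivot row */
        for (int j = i + 1; j < n; j++) {
            double t = a[loc[j]][i] / a[loc[i]][i];
            for (int k = i + 1; k < n; k++)
                a[loc[j]][k] -= a[loc[i]][k] * t;
            b[loc[j]] -= b[loc[i]] * t;
        }
    }
}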
Parallel algorithms for Gaussian elimination
The outermost i loop cannot be parallelized
Both the innermost k loop and the middle j loop can be executed in
parallel
Two parallel algorithms
Two data decompositions
Row-oriented parallel Gaussian elimination
Row-wise block striped decomposition of A
Use of partial pivoting tends to scatter the pivot rows among the
processes, which keeps the load balanced as the outermost i loop proceeds
Determining the pivot row (the value of picked) requires cooperation
among the processes:
Each process first finds its local candidate for picked, together with
the magnitude |a[loc[picked], i]|
Then, MPI_Allreduce is used with operation MPI_MAXLOC and
datatype MPI_DOUBLE_INT (see the sketch below)
More communication is needed per i iteration:
Once picked is decided, the process in charge of row picked
broadcasts a[loc[picked], i], a[loc[picked], i+1], . . . , a[loc[picked], n−1]
to all other processes
Then, each process carries out, concurrently, a segment of the
middle j loop
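The reduction step above can be written with a small value/index pair matching the MPI_DOUBLE_INT layout (a sketch; pick_pivot_row is a hypothetical helper name, not from the textbook):

#include <mpi.h>
#include <math.h>

/* Each process passes the entry of largest magnitude among its own rows in
 * column i (candidate_value) and that row's global number (candidate_row).
 * All processes return with the same winning row number. */
int pick_pivot_row(double candidate_value, int candidate_row)
{
    struct { double value; int index; } local, global;
    local.value = fabs(candidate_value);
    local.index = candidate_row;
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE_INT,
                  MPI_MAXLOC, MPI_COMM_WORLD);
    return global.index;   /* the value of 'picked', identical everywhere */
}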
Column-oriented parallel Gaussian elimination
Column-wise interleaved striped decomposition of A
During iteration i
The process controlling column i is alone responsible for finding
the pivot row
Once the pivot row is identified, the controlling process has to
broadcast the value of picked and the column i elements of the
unmarked rows
The remaining computations are carried out in parallel by all
processes
Another parallel Gaussian elimination algorithm
Use of column pivoting
Broadcast is replaced by a series of point-to-point message
send/receive
The flow of messages is pipelined
Possibility of overlap between computation and communication
See Section 12.4.6 in the textbook for the details
Linear systems with a sparse matrix
Gaussian elimination followed by back substitution is an example of
direct method for solving linear systems
Gaussian elimination works well for dense matrix A
When A is sparse, that is, most of its elements are zero, Gaussian
elimination is not a good choice: elimination fills in zero entries,
destroying the sparsity
Iterative methods are better choices for sparse matrices
Iterative methods
A sequence of approximation vectors x(0), x(1), . . . that converges toward the solution
Simple iterative methods (such as Jacobi method)
Advanced iterative methods (such as conjugate gradient method)
Data decomposition
Row-wise block striped decomposition of matrix A (only nonzero
elements are stored)
Matching block striped decomposition of vectors b and x
Each process needs to store a few “ghost values” of x, in addition
to its segment of vector x
A parallel matrix-vector multiplication = local sequential
matrix-vector multiplication + communication afterward
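As a concrete example of "only nonzero elements are stored", here is the local multiplication step with the rows kept in compressed sparse row (CSR) form. CSR is our choice for illustration (the slides do not prescribe a format), and x_full is assumed to hold this process's segment of x followed by the received ghost values, with col_idx already remapped accordingly:

/* y = A_local * x: multiply this process's rows (CSR storage) by the
 * locally available x entries (own segment + ghost values). */
void csr_matvec_local(int local_rows, const int *row_ptr, const int *col_idx,
                      const double *val, const double *x_full, double *y)
{
    for (int i = 0; i < local_rows; i++) {
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += val[k] * x_full[col_idx[k]];   /* skip all zero entries */
        y[i] = sum;
    }
}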