Introduction to Simulation - Lecture 21 Karen Veroy

advertisement
Introduction to Simulation - Lecture 21
Boundary Value Problems - Solving 3-D
Finite-Difference problems
Jacob White
Thanks to Deepak Ramaswamy, Michal Rewienski, and
Karen Veroy
Outline
• Reminder about FEM and F-D
– 1-D Example
• Finite Difference Matrices in 1, 2 and 3D
– Gaussian Elimination Costs
• Krylov Method
– Communication Lower bound
– Preconditioners based on improving communication
Heat Flow
1-D
Example
Normalized 1-D Equation
Normalized Poisson Equation
∂ ∂T ( x )
∂ u ( x)
= − hs ⇒ −
= f ( x)
κ
2
∂x
∂x
∂x
2
− u xx ( x ) = f ( x )
SMA-HPC ©2002 MIT
SMA-HPC ©2002 MIT
FD Matrix
properties
x0 x1 x2
2
 −1
1 
0
2
∆x 
M
 0
SMA-HPC ©2002 MIT
−uxx = f
1-D Poisson
Finite Differences
−
xn xn +1
−1
0
2 −1
O O
O O
L 0
A
L
O
O
O
−1
uˆ j +1 − 2uˆ j + uˆ j
∆x
2
= f (x j )
 uˆ1 
 f ( x1 ) 
 M 
0   M 


 M 
M   M 
 


0  M  =  M 
 
 M 
−1 M
 


2   M 
 M 
uˆ 
 f ( x )
n 
 n

Residual Equation
Using Basis
Functions
Partial Differential Equation form
∂u
− 2 = f
∂x
2
u (0) = 0 u (1) = 0
Basis Function Representation
n
u ( x ) ≈ uh ( x ) = ∑ ω i ϕ i ( x )
{
i =1
Basis Functions
Plug Basis Function Representation into the Equation
d ϕi ( x )
R ( x ) = ∑ ωi
+ f ( x)
2
dx
i =1
n
SMA-HPC ©2002 MIT
2
Using Basis functions
Basis Weights
Galerkin Scheme
Force the residual to be “orthogonal” to the basis functions
1
∫ ϕ ( x ) R ( x ) dt = 0
l
0
Generates n equations in n unknowns


d ϕi ( x )
ωi
+ f ( x )  dx = 0 l ∈ {1,..., n}
2
∫0 ϕl ( x ) ∑
dx
i =1

1
n
SMA-HPC ©2002 MIT
2
Basis Weights
Using Basis
Functions
Galerkin with integration by
parts
Only first derivatives of basis functions
n
d ϕl ( x )
∫0 dx
1
d ∑ ωiϕ i ( x )
i =1
dx
1
dx − ∫ ϕi ( x ) f ( x ) dx = 0
0
l ∈ {1,..., n}
SMA-HPC ©2002 MIT
Structural Analysis of
Automobiles
• Equations
– Force-displacement relationships for
mechanical elements (plates, beams, shells)
and sum of forces = 0.
– Partial Differential Equations of Continuum
Mechanics
Drag Force Analysis of
Aircraft
• Equations
– Navier-Stokes Partial Differential Equations.
Engine Thermal
Analysis
• Equations
– The Poisson Partial Differential Equation.
2-D Discretized Problem
FD Matrix
properties
Discretized Poisson
m
x1
x2
x2m
xm +1
m
u j +1 − 2u j + u j −1
∆244
x
144
3
u
xx
2
SMA-HPC ©2002 MIT
xm
+
u j + m − 2u j + u j − m
∆y
1442443
u
yy
2
= f (x j )
FD Matrix
properties
SMA-HPC ©2002 MIT
2-D Discretized Problem
Matrix Nonzeros, 5x5 example
3-D Discretization
FD Matrix
properties
Discretized Poisson
x j −m
m
m
x j −1
x j − m2
xj
x j +1
m
x j + m2
uˆ j +1 − 2uˆ j + uˆ j −1
( ∆x )
1442443
u xx
2
SMA-HPC ©2002 MIT
+
uˆ j + m − 2uˆ j + uˆ j − m
( ∆y )
1442443
u yy
2
+
x j +m
uˆ j + m 2 − 2uˆ j + uˆ j − m 2
( ∆z )
144
42444
3
u zz
2
= f (xj )
FD Matrix
properties
SMA-HPC ©2002 MIT
3-D Discretization
Matrix nonzeros, m = 4 example
FD Matrix
properties
Summary
Numerical Properties
Matrix is Irreducibly Diagonally Dominant


 | Aii | ≥ ∑ | Aij | 
j ≠i


Each row is strictly diagonally dominant, or path
connected to a strictly diagonally dominant row
Matrix is symmetric positive definite
Assuming uniform discretization, diagonal is
1
1 − D → 2 2,
∆
SMA-HPC ©2002 MIT
1
1
2 − D → 2 4, 3 − D → 2 6,
∆
∆
Summary
FD Matrix
properties
Structural Properties
Matrices in 3-D are LARGE
1 − D → m × m,
2 − D → m 2 × m 2 , 3 − D → m3 × m3
100x100x100 grid in 3-D = 1 million x 1 million matrix
Matrices are very sparse
Nonzeros per row 1 − D
Matrices are banded
1− D
Aij = 0
|i− j | > 1
2−D
Aij = 0
|i− j| > m
3− D
Aij = 0
| i − j | > m2
SMA-HPC ©2002 MIT
3,
2−D
5, 3 − D
7
Basics of GE
 A11
A
0
21

A
031

041
A
SMA-HPC ©2002 MIT
Triangularizing
Picture
A12 A13 A14 

A
A2323 A
A24
A22
22 A
24


A03232 A33
A
34
34
33

A04242 A0A4343 A4444 
Triangularizing
GE Basics
Algorithm
For i = 1 to n-1 {
“For each Row”
For j = i+1 to n {
“For each Row below pivot”
For k = i+1 to n { “For each element beyond Pivot”
Ajk ⇐ Ajk −
}
Multiplie
r
}
}
Aji
Aii
Pivot
Aik
Form n-1 reciprocals (pivots)
Form
n −1
n2
(n − i ) =
∑
2
i =1
n −1
Perform
SMA-HPC ©2002 MIT
multipliers
∑ (n − i)2 ≈
i =1
2 3
n
3
Multiply-adds
Complexity of GE
1− D
O ( n3 ) = O ( m3 )
← 100 pt grid O(106 ) ops
2−D
O (n3 ) = O (m6 ) ← 100 ×100 grid O(1012 ) ops
3− D
O (n3 ) = O (m9 ) ← 100 ×100 × 100 grid O(1018 ) ops
For 2- D and 3-D problems Need a Faster Solver !
SMA-HPC ©2002 MIT
Triangularizing
Banded GE
Algorithm
b
b
0
For i = 1 to n-1 {
For j = i+1 to ni+b-1
{ {
{
For k = i+1 to n i+b-1
{
Ajk ⇐ Ajk −
NONZEROS
0
Aii
Aik
}
}
}
n −1
Perform
∑ (min(b − 1, n − i))
i =1
SMA-HPC ©2002 MIT
Aji
2
≈ O (b2n )
Multiply-adds
Complexity of
Banded GE
1− D
O (b 2 n) = O (m)
← 100 pt grid O(100) ops
2−D
O (b 2 n) = O (m 4 ) ← 100 ×100 grid O(108 ) ops
3− D
O (b 2 n) = O (m7 ) ← 100 ×100 × 100 grid O (1014 ) ops
For 3 - D problems Need a Faster Solver !
SMA-HPC ©2002 MIT
The World According to
Krylov
Preconditioning
Start with Ax = b, Form PAx = Pb
Determine the Krylov Subspace r 0 = Pb − PAx
k 0
0
0
Krylov Subspace ≡ span r , PAr ,..., ( PA ) r
{
}
Select Solution from the Krylov Subspace
k 0
k +1
k
k
0
0
0
x = x + y , y ∈ span r , PAr ,..., ( PA ) r
{
k
GCR picks a residual minimizing y .
SMA-HPC ©2002 MIT
}
Preconditioning
Krylov Methods
Diagonal Preconditioners
Let A = D + And
(
)
(
)
Apply GCR to D −1 A x = I + D −1 And x = D −1b
• The Inverse of a diagonal is cheap to compute
• Usually improves convergence
SMA-HPC ©2002 MIT
Krylov Methods
Convergence Analysis
Optimality of GCR poly
GCR Optimality Property
% k+1 ( PA) r 0 where ℘
% k+1 is any k th order
r k +1 ≤ ℘
% k+1 ( 0 ) =1
polynomial such that ℘
Therefore
Any polynomial which satisfies the
constraints can be used to get an
upper bound on
SMA-HPC ©2002 MIT
r k +1
r
0
Residual Poly Picture for Heat Conducting Bar Matrix
No loss to air (n=10)
Keep ℘k ( λi ) as small as possible:
Easier if eigenvalues are clustered!
SMA-HPC ©2002 MIT
Krylov Vectors for diagonal
Preconditioned A
The World According to
Krylov
xexact (1 digit)
0.9 -0.7 0.6 -0.5 0.4 -0.3 0.1
b = r0
1
0
0
0
0
0
0
1
-.5
0
0
0
0
0
1.25 -1 -.5
0
0
0
0
−1
D A
D −1 A
1-D Discretized
PDE
−0.5
0
0   1  
 1
 −0.5
  
1
0
O
0 

  −0.5
 0
O
O −0.5   M  

  
0
0
−0.5
1   0 
14444
4244444
3
SMA-HPC ©2002 MIT
D−1A
=
1 
1.25

−0.5

−1 



 −M0.5
 0 
 0 
The World According to
Krylov
xexact (1 digit)
b = r0
−1
D A
−1
D A
Krylov Vectors for Diagonal
Preconditioned A
Communication Lower Bound
0.9 -0.7 0.6 -0.5 0.4 -0.3 0.1
1
0
0
0
0
0
0
1
-.5
0
0
0
0
0
1.25 -1 -.5
0
0
0
0
Communication Lower Bound for m gridpoints
(
−1
)
k
D A r 0 is nonzero in mth entry after k = m iters
k


0
k +1
j
=  x + ∑α j r  ≠ 0
Need at least m iterations for x
m
j =1

m
(
SMA-HPC ©2002 MIT
)
Krylov Vectors for Diagonal
Preconditioned A
The World According to
Krylov
Two Dimensional Case
m
For an mxm Grid
1
1 
0
If r 0 =  
M 
 
0
m
Takes ≈ 2m = O ( m ) iters
m2
SMA-HPC ©2002 MIT
(
for x k +1
)m
2
≠ 0
Convergence for GCR
The World According to
Krylov
Eigenanalysis
−0.5
0
0   1  
 1
 −0.5
  
1
0
O
0 

  −0.5
 0
O
O −0.5   M  

  
0
0
−0.5
1   0 
14444
3
4244444
=
1 
1.25
−0.5

−
1


 −M0.5
 0 
 0 
D−1A
 kπ 
Recall Eigenvalues of D A = 1 − cos 

 m +1 
−1
SMA-HPC ©2002 MIT
Convergence for GCR
The World According to
Krylov
Eigenanalysis
For D −1 A,
κ ≡ λλ
max
min

 mπ   
 π 
= 1 − cos 
−
1
cos



+
+
m
1
m
1





−1
# Iters for GCR to achieve convergence
rk
r
0
k
 κ −1 
≤ 2 
 ≤
 κ +1 
k =
γ
log
 convergence 


tolerance


γ
2
≅
 κ −1  m → ∞
log 

 κ +1 
O (m)
GCR achieves Communication lower bound O(m)!
SMA-HPC ©2002 MIT
The World According to
Krylov
Work for Banded Gaussian
Elimination, Relaxation and GCR
Dimension Banded GE Sparse GE
1
O (m)
O (m)
2
O (m
O (m
3
)
O (m )
4
7
)
O (m )
3
6
GCR
O (m
)
O (m )
O (m )
GCR faster than banded GE in 2 and 3 dimensions
Could be faster, 3-D matrix only m3 nonzeros.
Must defeat the communication lower bound!
SMA-HPC ©2002 MIT
2
3
4
The World According to
Krylov
How to get Faster
Converging GCR
Preconditioning is the Only Hope
GCR already achieves Communication Lower
bound for a diagonally preconditioned A
Preconditioner must accelerate communication
Multiplying by PA must move values by
more than one grid point.
SMA-HPC ©2002 MIT
Gauss-Seidel Preconditioning
Preconditioning
Approaches
Physical View
1-D Discretized PDE
(new)
u0 û1
û2(old)
û1(new)
û
(new)
2
Gauss Seidel
û3(old)
uˆn(new)
−1
uˆn(new) uˆ
n +1
Each Iteration of Gauss-Seidel Relaxation
moves data across the grid
SMA-HPC ©2002 MIT
Gauss-Seidel Preconditioning
Preconditioning
Approaches
xexact (1 digit)
b =−1 r 0
( D + L) A
( D + L)
−1
A
b =−1 r
( D + L) A
0
( D + L)
−1
A
Krylov Vectors
0.9 -0.7 0.6 -0.5 0.4 -0.3 0.1
1
0
0
0
0
0
0
x
x
x
x
x
x
x
x
x
x
x
x
x
x
0
0
0
0
0
0
1
0
0
0
0
0
x
x
0
0
0
0
x
x
x
1-D Discretized
PDE
X=nonzero
Gauss-Seidel communicates quickly
in only one direction
SMA-HPC ©2002 MIT
Gauss-Seidel Preconditioning
Preconditioning
Approaches
Symmetric Gauss-Seidel
(new)
û
u0 1
û2(old)
û2(new)
(new)
û1
û3(old)
uˆn(new)
−1
uˆ
(newer)
û
u 1
0
(new)
n−2
uˆn(newer)
−1
uˆn(new) uˆ
n +1
uˆn( new)
û2(newer)
This symmetric Gauss-Seidel Preconditioner
Communicates both directions
SMA-HPC ©2002 MIT
Gauss-Seidel Preconditioning
Preconditioning
Approaches
Symmetric Gauss-Seidel
Derivation of the SGS Iteration Equations
Forward Sweep ( half step ) : ( D + L ) x k +1/ 2 + Ux k = b
Backward Sweep ( half step ) : ( D + U ) x k + Lx k +1/ 2 = b
⇒ x
k +1
= ( D + U ) L ( D + L ) Ux k
−1
−1
−1
+ ( D +U ) b − ( D +U ) L ( D + L) b
⇒ x
k +1
= x − ( D + U ) D ( D + L ) Ax k
−1
k
−1
−1
−1
+ ( D +U ) D ( D + L) b
−1
SMA-HPC ©2002 MIT
−1
Block Diagonal
Preconditioners
Preconditioning
Approaches
Line Schemes
Grid
Matrix
1
m2
Tridiagonal Matrices factor quickly
SMA-HPC ©2002 MIT
Block Diagonal
Preconditioners
Preconditioning
Approaches
Line Schemes
Grid
Problem
1
Lines preconditioners
communicate rapidly in
only one direction
Solution
m2
Do lines first in x,
then in y.
The Preconditioner is now two Tridiagonal solves, with
variable reordering in between.
SMA-HPC ©2002 MIT
Block Diagonal
Preconditioners
Preconditioning
Approaches
Domain Decomposition
Approach
1
Break the domain into
small blocks each with the
same # of grid points
The trade-off
m2
SMA-HPC ©2002 MIT
Fewer blocks means
faster convergence, but
more costly iterates
Block Diagonal
Preconditioners
Preconditioning
Approaches
Domain Decomposition
m points
1
2
Block
index
l
m
l2
m
points
l
2
m
Block cost: factoring   l × l grids, O ( m 2l ) , sparse GE
l 
m
GCR iters: Communication bound gives O   iterations.
 l3 
Suggests insensitivity to l: Algorithm is O ( m ) .
Do you have to refactor every GCR iteration?
SMA-HPC ©2002 MIT
Siedelerized Block Diagonal
Preconditioners
Preconditioning
Approaches
Line Schemes
Grid
Matrix
1
m2
SMA-HPC ©2002 MIT
Overlapping Domain
Preconditioners
Preconditioning
Approaches
Line based Schemes
Grid
Matrix
1
m2
Bigger systems to solve, but can have faster convergence on
tougher problems (not just Poisson).
SMA-HPC ©2002 MIT
Preconditioning
i
Approaches
Incomplete Factorization
Schemes
Outline
Reminder about Gaussian Elimination
Computational Steps
Fill-in for Sparse Matrices
Greatly increases factorization cost
Fill-in in a 2-D grid
Incomplete Factorization Idea
Sparse Matrices
Fill-In
Example
Matrix Non zero structure
Matrix after one GE step
X X X
X X 0 


 X 0 X 
X X X
X X X

0


0 X 
 X X
X= Non zero
Fill-ins
SMA-HPC ©2002 MIT
Fill-In
Sparse Matrices
Second Example
Fill-ins Propagate
X
X

0

0
X
X
X
X
0
X
X
X
X
0
X

X
0

X
0

X
0
Fill-ins from Step 1 result in Fill-ins in step 2
SMA-HPC ©2002 MIT
Sparse Matrices
Fill-In
Pattern of a Filled-in Matrix
























SMA-HPC ©2002 MIT
Very Sparse
Very Sparse
Dense
























Sparse Matrices
SMA-HPC ©2002 MIT
Fill-In
Unfactored Random Matrix
Sparse Matrices
SMA-HPC ©2002 MIT
Fill-In
Factored Random Matrix
Factoring 2-D Finite-Difference matrices
Generated Fill-in Makes Factorization Expensive
SMA-HPC ©2002 MIT
FD Matrix
properties
SMA-HPC ©2002 MIT
3-D Discretization
Matrix nonzeros, m = 4 example
Preconditioning
i
Approaches
Incomplete Factorization
Schemes
Key idea
THROW AWAY FILL-INS!
Throw away all fill-ins
Throw away only fill-ins with small values
Throw away fill-ins produced by other fill-ins
Throw away fill-ins produced by fill-ins of
other fill-ins, etc.
Summary
• 3-D BVP Examples
– Aerodynamics, Continuum Mechanics, Heat-Flow
• Finite Difference Matrices in 1, 2 and 3D
– Gaussian Elimination Costs
• Krylov Method
– Communication Lower bound
– Preconditioners based on improving communication
Download