Introduction to Simulation: Lecture 21
Boundary Value Problems: Solving 3-D Finite-Difference Problems
Jacob White
Thanks to Deepak Ramaswamy, Michal Rewienski, and Karen Veroy

Outline
• Reminder about FEM and finite differences
  – 1-D example
• Finite-difference matrices in 1, 2 and 3 dimensions
  – Gaussian elimination costs
• Krylov methods
  – Communication lower bound
  – Preconditioners based on improving communication

Heat Flow: 1-D Example
The normalized 1-D heat equation reduces to the normalized Poisson equation:
$$\frac{\partial}{\partial x}\left(\kappa\,\frac{\partial T(x)}{\partial x}\right) = -h_s \quad\Longrightarrow\quad -\frac{\partial^2 u(x)}{\partial x^2} = f(x),$$
written compactly as $-u_{xx}(x) = f(x)$.

FD Matrix Properties: 1-D Poisson Finite Differences
On a uniform grid $x_0, x_1, x_2, \dots, x_n, x_{n+1}$, the centered-difference approximation of $-u_{xx} = f$ at interior point $x_j$ is
$$-\frac{\hat u_{j+1} - 2\hat u_j + \hat u_{j-1}}{\Delta x^2} = f(x_j),$$
which in matrix form is
$$\frac{1}{\Delta x^2}
\begin{bmatrix}
2 & -1 & & & \\
-1 & 2 & -1 & & \\
& \ddots & \ddots & \ddots & \\
& & -1 & 2 & -1 \\
& & & -1 & 2
\end{bmatrix}
\begin{bmatrix} \hat u_1 \\ \vdots \\ \hat u_n \end{bmatrix}
=
\begin{bmatrix} f(x_1) \\ \vdots \\ f(x_n) \end{bmatrix}.$$

Using Basis Functions: Residual Equation
Partial differential equation form:
$$-\frac{\partial^2 u}{\partial x^2} = f, \qquad u(0) = 0, \quad u(1) = 0.$$
Basis function representation:
$$u(x) \approx u_h(x) = \sum_{i=1}^{n} \omega_i\,\varphi_i(x), \qquad \varphi_i: \text{basis functions}.$$
Plugging the basis function representation into the equation gives the residual
$$R(x) = \sum_{i=1}^{n} \omega_i\,\frac{d^2\varphi_i(x)}{dx^2} + f(x).$$

Using Basis Functions: Basis Weights (Galerkin Scheme)
Force the residual to be "orthogonal" to the basis functions:
$$\int_0^1 \varphi_l(x)\,R(x)\,dx = 0.$$
This generates n equations in n unknowns:
$$\int_0^1 \varphi_l(x)\left[\sum_{i=1}^{n} \omega_i\,\frac{d^2\varphi_i(x)}{dx^2} + f(x)\right] dx = 0, \qquad l \in \{1, \dots, n\}.$$

Using Basis Functions: Galerkin with Integration by Parts
Integration by parts leaves only first derivatives of the basis functions:
$$\int_0^1 \frac{d\varphi_l(x)}{dx}\left(\sum_{i=1}^{n} \omega_i\,\frac{d\varphi_i(x)}{dx}\right) dx - \int_0^1 \varphi_l(x)\,f(x)\,dx = 0, \qquad l \in \{1, \dots, n\}.$$

Structural Analysis of Automobiles
• Equations
  – Force-displacement relationships for mechanical elements (plates, beams, shells) and sum of forces = 0.
  – Partial differential equations of continuum mechanics.

Drag Force Analysis of Aircraft
• Equations
  – The Navier-Stokes partial differential equations.

Engine Thermal Analysis
• Equations
  – The Poisson partial differential equation.
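As a concrete check on the 1-D finite-difference system above, here is a minimal sketch in Python (NumPy assumed; this example is illustrative, not part of the original slides). It builds the $\frac{1}{\Delta x^2}\,\mathrm{tridiag}(-1,2,-1)$ matrix and solves $-u_{xx} = f$ with a right-hand side chosen so the exact solution is known:

```python
import numpy as np

# 1-D Poisson: -u''(x) = f(x) on (0,1), with u(0) = u(1) = 0.
# Interior points x_1..x_n with uniform spacing dx = 1/(n+1).
n = 100
dx = 1.0 / (n + 1)
x = np.linspace(dx, 1.0 - dx, n)

# The (1/dx^2) * tridiag(-1, 2, -1) matrix from the slide.
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / dx**2

# f is chosen so that the exact solution is u(x) = sin(pi x).
f = np.pi**2 * np.sin(np.pi * x)
u = np.linalg.solve(A, f)

print("max error:", np.abs(u - np.sin(np.pi * x)).max())  # O(dx^2)
```

Doubling n shrinks the reported error by about a factor of four, consistent with the second-order accuracy of the centered difference.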
FD Matrix Properties: 2-D Discretized Problem
Discretized Poisson equation on an $m \times m$ grid, with unknowns numbered $x_1, \dots, x_m$ along the first row, $x_{m+1}, \dots, x_{2m}$ along the second, and so on up to $x_{m^2}$. The five-point stencil at grid point $x_j$ is
$$-\underbrace{\frac{\hat u_{j+1} - 2\hat u_j + \hat u_{j-1}}{(\Delta x)^2}}_{u_{xx}} \;-\; \underbrace{\frac{\hat u_{j+m} - 2\hat u_j + \hat u_{j-m}}{(\Delta y)^2}}_{u_{yy}} = f(x_j).$$

[Figure: matrix nonzero pattern for the 2-D problem, 5 x 5 grid example.]

FD Matrix Properties: 3-D Discretization
In 3-D the neighbors of $x_j$ are $x_{j\pm 1}$ (x-direction), $x_{j\pm m}$ (y-direction) and $x_{j\pm m^2}$ (z-direction), giving the seven-point stencil
$$-\underbrace{\frac{\hat u_{j+1} - 2\hat u_j + \hat u_{j-1}}{(\Delta x)^2}}_{u_{xx}} \;-\; \underbrace{\frac{\hat u_{j+m} - 2\hat u_j + \hat u_{j-m}}{(\Delta y)^2}}_{u_{yy}} \;-\; \underbrace{\frac{\hat u_{j+m^2} - 2\hat u_j + \hat u_{j-m^2}}{(\Delta z)^2}}_{u_{zz}} = f(x_j).$$

[Figure: matrix nonzero pattern for the 3-D problem, m = 4 example.]

FD Matrix Properties Summary: Numerical Properties
• The matrix is irreducibly diagonally dominant:
$$|A_{ii}| \ge \sum_{j \ne i} |A_{ij}|,$$
with each row either strictly diagonally dominant or path-connected to a strictly diagonally dominant row.
• The matrix is symmetric positive definite.
• Assuming a uniform discretization with spacing $\Delta$, the diagonal entries are
$$1\text{-D} \to \frac{2}{\Delta^2}, \qquad 2\text{-D} \to \frac{4}{\Delta^2}, \qquad 3\text{-D} \to \frac{6}{\Delta^2}.$$

FD Matrix Properties Summary: Structural Properties
• Matrices in 3-D are LARGE:
$$1\text{-D} \to m \times m, \qquad 2\text{-D} \to m^2 \times m^2, \qquad 3\text{-D} \to m^3 \times m^3.$$
A 100 x 100 x 100 grid in 3-D gives a 1 million x 1 million matrix.
• Matrices are very sparse: nonzeros per row are 3 in 1-D, 5 in 2-D, and 7 in 3-D.
• Matrices are banded:
$$1\text{-D: } A_{ij} = 0 \text{ for } |i-j| > 1, \qquad 2\text{-D: } A_{ij} = 0 \text{ for } |i-j| > m, \qquad 3\text{-D: } A_{ij} = 0 \text{ for } |i-j| > m^2.$$

Basics of GE: Triangularizing Picture
After the first elimination step, the first column below the pivot has been zeroed and the trailing block updated:
$$\begin{bmatrix}
A_{11} & A_{12} & A_{13} & A_{14} \\
0 & A_{22} & A_{23} & A_{24} \\
0 & A_{32} & A_{33} & A_{34} \\
0 & A_{42} & A_{43} & A_{44}
\end{bmatrix}$$

GE Basics: Triangularizing Algorithm

For i = 1 to n-1 {               "For each row"
    For j = i+1 to n {           "For each row below the pivot"
        For k = i+1 to n {       "For each element beyond the pivot"
            A_jk <- A_jk - (A_ji / A_ii) * A_ik
        }
    }
}

Here $A_{ji}/A_{ii}$ is the multiplier and $A_{ii}$ the pivot. Cost: form n-1 reciprocals (pivots); form
$$\sum_{i=1}^{n-1} (n-i) \approx \frac{n^2}{2} \text{ multipliers};$$
perform
$$\sum_{i=1}^{n-1} (n-i)^2 \approx \frac{n^3}{3} \text{ multiply-adds}.$$

Complexity of GE
$$1\text{-D: } O(n^3) = O(m^3) \;\leftarrow\; 100\text{-point grid: } O(10^6) \text{ ops}$$
$$2\text{-D: } O(n^3) = O(m^6) \;\leftarrow\; 100 \times 100 \text{ grid: } O(10^{12}) \text{ ops}$$
$$3\text{-D: } O(n^3) = O(m^9) \;\leftarrow\; 100 \times 100 \times 100 \text{ grid: } O(10^{18}) \text{ ops}$$
For 2-D and 3-D problems we need a faster solver!

Banded GE: Triangularizing Algorithm
For a matrix of bandwidth b, the nonzeros lie within the band, so the elimination loops only need to reach its edge:

For i = 1 to n-1 {
    For j = i+1 to min(i+b-1, n) {
        For k = i+1 to min(i+b-1, n) {
            A_jk <- A_jk - (A_ji / A_ii) * A_ik
        }
    }
}

Cost: perform
$$\sum_{i=1}^{n-1} \bigl(\min(b-1,\, n-i)\bigr)^2 \approx O(b^2 n) \text{ multiply-adds}.$$

Complexity of Banded GE
$$1\text{-D: } O(b^2 n) = O(m) \;\leftarrow\; 100\text{-point grid: } O(100) \text{ ops}$$
$$2\text{-D: } O(b^2 n) = O(m^4) \;\leftarrow\; 100 \times 100 \text{ grid: } O(10^8) \text{ ops}$$
$$3\text{-D: } O(b^2 n) = O(m^7) \;\leftarrow\; 100 \times 100 \times 100 \text{ grid: } O(10^{14}) \text{ ops}$$
For 3-D problems we still need a faster solver!

The World According to Krylov: Preconditioning
Start with $Ax = b$ and form the preconditioned system $PAx = Pb$.
Determine the Krylov subspace from the initial residual $r^0 = Pb - PAx^0$:
$$\text{Krylov subspace} \equiv \operatorname{span}\bigl\{r^0, PAr^0, \dots, (PA)^k r^0\bigr\}.$$
Select the solution from the Krylov subspace:
$$x^{k+1} = x^0 + y^k, \qquad y^k \in \operatorname{span}\bigl\{r^0, PAr^0, \dots, (PA)^k r^0\bigr\}.$$
GCR picks the residual-minimizing $y^k$.
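The slides do not spell out GCR in code, but a minimal sketch of the residual-minimizing iteration just described might look as follows (Python with NumPy assumed; `apply_P` is an illustrative hook for the preconditioner, defaulting to the identity):

```python
import numpy as np

def gcr(A, b, apply_P=lambda v: v, tol=1e-10, max_iter=200):
    """Sketch of GCR: minimize ||r^k|| over span{r0, PA r0, ..., (PA)^k r0}."""
    x = np.zeros_like(b, dtype=float)
    r = apply_P(b - A @ x)               # r0 = Pb - PA x0
    ps, Aps = [], []                     # search directions p_j and images PA p_j
    for k in range(max_iter):
        if np.linalg.norm(r) <= tol:     # preconditioned residual small enough
            break
        p, Ap = r.copy(), apply_P(A @ r)
        for pj, Apj in zip(ps, Aps):     # orthogonalize PA p against earlier images
            beta = Ap @ Apj
            p -= beta * pj
            Ap -= beta * Apj
        nrm = np.linalg.norm(Ap)
        p /= nrm
        Ap /= nrm                        # now the images PA p_j are orthonormal
        alpha = r @ Ap                   # residual-minimizing step length
        x += alpha * p
        r -= alpha * Ap
        ps.append(p)
        Aps.append(Ap)
    return x, k
```

Each iteration adds one vector to the subspace, orthogonalizes its image $PAp$ against the earlier images, and takes the step that minimizes $\|r^{k+1}\|$. For the 1-D system built earlier, a diagonal preconditioner is one line: `gcr(A, f, apply_P=lambda v: v / np.diag(A))`.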
Krylov Methods: Diagonal Preconditioners
Let $A = D + A_{nd}$, where $D$ is the diagonal of $A$ and $A_{nd}$ the off-diagonal part. Apply GCR to
$$\bigl(D^{-1}A\bigr)\,x = \bigl(I + D^{-1}A_{nd}\bigr)\,x = D^{-1}b.$$
• The inverse of a diagonal is cheap to compute.
• It usually improves convergence.

Krylov Methods: Convergence Analysis (GCR Optimality Property)
$$\|r^{k+1}\| \le \bigl\|\tilde\wp_{k+1}(PA)\,r^0\bigr\|,$$
where $\tilde\wp_{k+1}$ is any degree-(k+1) polynomial such that $\tilde\wp_{k+1}(0) = 1$. Therefore any polynomial that satisfies the constraints can be used to get an upper bound on $\|r^{k+1}\| / \|r^0\|$.

[Figure: residual polynomial for the heat-conducting-bar matrix with no loss to air, n = 10.]
Keep $\wp_k(\lambda_i)$ as small as possible at the eigenvalues: this is easier if the eigenvalues are clustered!

The World According to Krylov: Krylov Vectors for a Diagonally Preconditioned A
For the 1-D discretized PDE,
$$D^{-1}A = \begin{bmatrix}
1 & -0.5 & 0 & \cdots & 0 \\
-0.5 & 1 & -0.5 & & \vdots \\
0 & \ddots & \ddots & \ddots & 0 \\
\vdots & & -0.5 & 1 & -0.5 \\
0 & \cdots & 0 & -0.5 & 1
\end{bmatrix}.$$
Take $b = r^0 = e_1$ (the exact solution, to 1 digit, is 0.9, -0.7, 0.6, -0.5, 0.4, -0.3, 0.1):
$$r^0 = (1,\; 0,\; 0,\; 0,\; 0,\; 0,\; 0)$$
$$D^{-1}A\,r^0 = (1,\; -0.5,\; 0,\; 0,\; 0,\; 0,\; 0)$$
$$\bigl(D^{-1}A\bigr)^2 r^0 = (1.25,\; -1,\; 0.25,\; 0,\; 0,\; 0,\; 0)$$
Each multiplication by $D^{-1}A$ moves information only one grid point.

The World According to Krylov: Communication Lower Bound
For m grid points, $(D^{-1}A)^k r^0$ first becomes nonzero in the m-th entry at $k = m - 1$. Since
$$x^{k+1} = x^0 + \sum_{j=0}^{k} \alpha_j \bigl(D^{-1}A\bigr)^j r^0,$$
at least m iterations are needed before $\bigl(x^{k+1}\bigr)_m \ne 0$.

The World According to Krylov: Two-Dimensional Case
For an $m \times m$ grid with
$$r^0 = (1, 0, \dots, 0)^T$$
(nonzero only at one corner), it takes about $2m = O(m)$ iterations before $\bigl(x^{k+1}\bigr)_{m^2} \ne 0$: information must travel across the grid to the opposite corner.

Convergence for GCR: Eigenanalysis
Recall the eigenvalues of $D^{-1}A$:
$$\lambda_k = 1 - \cos\left(\frac{k\pi}{m+1}\right), \qquad k = 1, \dots, m,$$
so the condition number is
$$\kappa \equiv \frac{\lambda_{\max}}{\lambda_{\min}} = \frac{1 - \cos\left(\frac{m\pi}{m+1}\right)}{1 - \cos\left(\frac{\pi}{m+1}\right)}.$$
The number of iterations for GCR to achieve convergence follows from
$$\frac{\|r^k\|}{\|r^0\|} \le 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^k
\quad\Longrightarrow\quad
k = \gamma\,\bigl|\log(\text{convergence tolerance})\bigr|, \qquad
\gamma \cong \frac{2}{\log\frac{\sqrt{\kappa}+1}{\sqrt{\kappa}-1}} = O(m) \text{ as } m \to \infty.$$
GCR achieves the communication lower bound O(m)!

The World According to Krylov: Work for Banded Gaussian Elimination, Sparse GE and GCR

Dimension | Banded GE | Sparse GE | GCR
----------|-----------|-----------|-------
1         | O(m)      | O(m)      | O(m^2)
2         | O(m^4)    | O(m^3)    | O(m^3)
3         | O(m^7)    | O(m^6)    | O(m^4)

GCR is faster than banded GE in 2 and 3 dimensions. It could be faster still: the 3-D matrix has only $m^3$ nonzeros. One must defeat the communication lower bound!

The World According to Krylov: How to Get Faster-Converging GCR
Preconditioning is the only hope: GCR already achieves the communication lower bound for a diagonally preconditioned A. The preconditioner must accelerate communication: multiplying by PA must move values by more than one grid point.
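To make the communication bound concrete, this short Python sketch (NumPy assumed; illustrative, not from the slides) applies $D^{-1}A$ repeatedly to $r^0 = e_1$ and prints how far the nonzeros have spread, reproducing the Krylov-vector table above:

```python
import numpy as np

# 1-D diagonally preconditioned matrix: D^{-1}A = I - 0.5*(shift up + shift down).
n = 7
DinvA = np.eye(n) - 0.5 * (np.eye(n, k=1) + np.eye(n, k=-1))

v = np.zeros(n)
v[0] = 1.0                                       # r0 = b = e_1
for k in range(4):
    nz = np.flatnonzero(np.abs(v) > 1e-14) + 1   # 1-based indices of nonzeros
    print(f"(D^-1 A)^{k} r0 = {np.round(v, 2)}, nonzero entries {list(nz)}")
    v = DinvA @ v
```

After k multiplications only the first k+1 entries are nonzero, so at least m iterations are needed before the m-th grid point is touched.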
Preconditioning Approaches: Gauss-Seidel Preconditioning (Physical View)
For the 1-D discretized PDE, a Gauss-Seidel sweep computes $\hat u_1^{(new)}$ from $u_0$ and $\hat u_2^{(old)}$, then $\hat u_2^{(new)}$ from $\hat u_1^{(new)}$ and $\hat u_3^{(old)}$, and so on through $\hat u_n^{(new)}$ at the boundary $u_{n+1}$. Each iteration of Gauss-Seidel relaxation moves data across the entire grid, in the direction of the sweep.

Preconditioning Approaches: Gauss-Seidel Krylov Vectors
For the 1-D discretized PDE (exact solution, to 1 digit: 0.9, -0.7, 0.6, -0.5, 0.4, -0.3, 0.1), with X marking a nonzero:

$b = r^0 = (1, 0, 0, 0, 0, 0, 0)$:
  $(D+L)^{-1}A\,r^0$:            X X X X X X X
  $\bigl((D+L)^{-1}A\bigr)^2 r^0$:  X X X X X X X

$b = r^0 = (0, 0, 0, 0, 0, 0, 1)$:
  $(D+L)^{-1}A\,r^0$:            0 0 0 0 0 X X
  $\bigl((D+L)^{-1}A\bigr)^2 r^0$:  0 0 0 0 X X X

A forward sweep carries information all the way down the grid in a single multiply, but only one point per multiply in the reverse direction: Gauss-Seidel communicates quickly in only one direction.

Preconditioning Approaches: Symmetric Gauss-Seidel (Physical View)
A forward sweep (from $u_0$ toward $\hat u_{n+1}$) produces $\hat u_1^{(new)}, \dots, \hat u_n^{(new)}$; a backward sweep then produces $\hat u_n^{(newer)}, \dots, \hat u_1^{(newer)}$. This symmetric Gauss-Seidel preconditioner communicates in both directions.

Preconditioning Approaches: Symmetric Gauss-Seidel (Derivation of the SGS Iteration Equations)
Forward sweep (half step): $(D + L)\,x^{k+1/2} + U x^k = b$
Backward sweep (half step): $(D + U)\,x^{k+1} + L x^{k+1/2} = b$
$$\Rightarrow\; x^{k+1} = (D+U)^{-1} L (D+L)^{-1} U\,x^k + (D+U)^{-1} b - (D+U)^{-1} L (D+L)^{-1} b$$
$$\Rightarrow\; x^{k+1} = x^k - (D+U)^{-1} D (D+L)^{-1} A\,x^k + (D+U)^{-1} D (D+L)^{-1} b$$
That is, the iteration is $x^{k+1} = x^k + P(b - Ax^k)$ with the SGS preconditioner $P = (D+U)^{-1} D (D+L)^{-1}$; a sketch of how to apply it appears after this group of slides.

Preconditioning Approaches: Block Diagonal Preconditioners (Line Schemes)
[Figure: m x m grid numbered 1 to m^2 and the corresponding matrix; the preconditioner keeps the tridiagonal blocks along grid lines.]
Tridiagonal matrices factor quickly.

Problem: line preconditioners communicate rapidly in only one direction.
Solution: do lines first in x, then in y. The preconditioner is then two tridiagonal solves, with variable reordering in between.

Preconditioning Approaches: Domain Decomposition
Break the domain into small blocks, each with the same number of grid points. The trade-off: fewer blocks mean faster convergence, but more costly iterates.

For an $m \times m$ grid split into blocks of $l \times l$ grid points, $(m/l)^2$ blocks in all:
• Block cost: factoring the $l \times l$ grids with sparse GE costs about $O(l^3)$ per block, i.e. $O(m^2 l)$ in total.
• GCR iterations: the communication bound gives $O(m/l)$ iterations, since each iteration moves information one block.
• Each iteration's block back-solves cost roughly $O(l^3)$ per block, i.e. $O(m^2 l)$ per iteration, so the total work is about $O(m/l) \times O(m^2 l) = O(m^3)$.
This suggests insensitivity to $l$: the algorithm is $O(m^3)$. Do you have to refactor every GCR iteration? (No: factor the blocks once and reuse the factors in every iteration.)

Preconditioning Approaches: Seidelized Block Diagonal Preconditioners (Line Schemes)
[Figure: grid and matrix for the Seidelized line scheme.]

Preconditioning Approaches: Overlapping Domain Preconditioners (Line-Based Schemes)
[Figure: grid and matrix for overlapping line-based domains.]
Bigger systems to solve, but can give faster convergence on tougher problems (not just Poisson).
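Following the SGS derivation above, the preconditioner $P = (D+U)^{-1} D (D+L)^{-1}$ can be applied to a vector with two triangular solves. A minimal sketch (Python with NumPy/SciPy assumed; `gcr` refers to the earlier sketch, and the glue line is illustrative):

```python
import numpy as np
from scipy.linalg import solve_triangular

def sgs_apply(A, v):
    """Symmetric Gauss-Seidel preconditioner from the derivation above:
    returns (D+U)^{-1} D (D+L)^{-1} v via two triangular solves."""
    D = np.diag(np.diag(A))
    y = solve_triangular(np.tril(A), v, lower=True)           # forward sweep
    return solve_triangular(np.triu(A), D @ y, lower=False)   # backward sweep

# Illustrative usage with the GCR sketch from earlier:
# x, iters = gcr(A, b, apply_P=lambda v: sgs_apply(A, v))
```

Because the sweeps run in both directions, one application of $PA$ moves information across the whole grid both ways, unlike plain Gauss-Seidel.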
Preconditioning Approaches: Incomplete Factorization Schemes (Outline)
• Reminder about Gaussian elimination computational steps
• Fill-in for sparse matrices: it greatly increases factorization cost
• Fill-in in a 2-D grid
• The incomplete factorization idea

Sparse Matrices: Fill-In Example
Eliminating the first column of a sparse matrix creates new nonzeros ("fill-ins") where rows and columns that both touch the pivot meet. For example (X = original nonzero, F = fill-in):

Matrix nonzero structure:     After one GE step:
X X X                         X X X
X X 0                         0 X F
X 0 X                         0 F X

Sparse Matrices: Fill-In (Second Example)
Fill-ins propagate: fill-ins created in step 1 result in further fill-ins in step 2.
[Figure: nonzero pattern of a small matrix in which eliminating column 1 creates fill-ins that in turn create new fill-ins when column 2 is eliminated.]

Sparse Matrices: Pattern of a Filled-In Matrix
[Figure: nonzero pattern of a filled-in 2-D grid matrix: the leading rows and columns remain very sparse, while the trailing block becomes dense.]

[Figures: nonzero patterns of a random matrix, unfactored and factored.]

Factoring 2-D Finite-Difference Matrices
Generated fill-in makes factorization expensive.
[Figure: 3-D discretization matrix nonzeros, m = 4 example, repeated for reference.]

Preconditioning Approaches: Incomplete Factorization Schemes (Key Idea)
THROW AWAY FILL-INS! Several variants (see the sketch after the summary below):
• Throw away all fill-ins.
• Throw away only fill-ins with small values.
• Throw away fill-ins produced by other fill-ins.
• Throw away fill-ins produced by fill-ins of other fill-ins, etc.

Summary
• 3-D BVP examples
  – Aerodynamics, continuum mechanics, heat flow
• Finite-difference matrices in 1, 2 and 3 dimensions
  – Gaussian elimination costs
• Krylov methods
  – Communication lower bound
  – Preconditioners based on improving communication
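To make the simplest variant above, "throw away all fill-ins", concrete, here is a minimal Python/NumPy sketch (dense storage for clarity; illustrative, not from the slides) of an incomplete factorization that applies elimination updates only where A is already nonzero, so no fill-in is ever stored:

```python
import numpy as np

def ilu0(A):
    """Incomplete LU that throws away ALL fill-ins: elimination updates are
    applied only where A is already nonzero, so L and U inherit A's pattern."""
    n = A.shape[0]
    LU = A.astype(float)
    pattern = A != 0                       # fixed sparsity pattern of A
    for i in range(n - 1):                 # for each pivot
        for j in range(i + 1, n):          # for each row below the pivot
            if not pattern[j, i]:
                continue                   # structurally zero: no multiplier
            LU[j, i] /= LU[i, i]           # store the multiplier in place (L part)
            for k in range(i + 1, n):
                if pattern[j, k]:          # update only existing nonzeros...
                    LU[j, k] -= LU[j, i] * LU[i, k]
                # ...a nonzero that would appear here is a fill-in: discarded
    return LU                              # unit-lower L and U packed together
```

The unit-lower triangle of `LU` (multipliers) and its upper triangle define the preconditioner $P = (LU)^{-1}$, applied with two triangular solves just like the SGS preconditioner sketched earlier.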