An Algebraic Multigrid Tutorial

Van Emden Henson
Center for Applied Scientific Computing
Lawrence Livermore National Laboratory
vhenson@llnl.gov
http://www.casc.gov/CASC/people/henson
April 10, 1999

AMG: What is Algebraic Multigrid?

Any multilevel method where geometry is not used (and may not be available) to build coarse grids, interpolation and restriction, or coarse-grid operators. "Classical" AMG was introduced by Brandt, McCormick and Ruge in 1982. It was explored early on by Stüben in 1983, and popularized by Ruge and Stüben in 1987. This tutorial will describe only the classical AMG algorithm.

AMG: What is Algebraic Multigrid? (continued)

Many other algorithms qualify under the definition given, and are important, or novel, or historical, or downright weird:
• Multigraph methods (Bank & Smith)
• Aggregation methods (Braess; Chan & Zikatanov & Xu)
• Smoothed Aggregation methods (Mandel & Brezina & Vanek)
• Black Box Multigrid (Dendy, Dendy & Bandy)
• Algebraic Multilevel Recursive Solver (Saad)
• Element based algebraic multigrid (Chartier; Cleary et al.)
• MultiCoarse correction with Suboptimal Operators (Sokol)
• Multilevel block ILU methods (Jang & Saad; Bank & Smith & Wagner; Reusken)
• AMG based on Element Agglomeration (Jones & Vassilevski)
• Sparse Approximate Inverse Smoothers (Tang & Wan)
• Algebraic Schur-Complement approaches (Axelsson & Vassilevski & Neytcheva)

Highlights of Multigrid: The 1-d Model Problem

Poisson's equation: $-u'' = f$ in $[0,1]$, with boundary conditions $u(0) = u(1) = 0$.

Discretized as:
$$\frac{-u_{i-1} + 2u_i - u_{i+1}}{h^2} = f_i, \qquad u_0 = u_N = 0.$$

This leads to the matrix equation $Au = f$, where
$$A = \frac{1}{h^2}\begin{pmatrix} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 2 \end{pmatrix}, \qquad u = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_{N-2} \\ u_{N-1} \end{pmatrix}, \qquad f = \begin{pmatrix} f_1 \\ f_2 \\ \vdots \\ f_{N-2} \\ f_{N-1} \end{pmatrix}.$$

Highlights of Multigrid: Weighted Jacobi Relaxation

Consider the iteration:
$$u_i^{new} = (1-\omega)\,u_i^{old} + \frac{\omega}{2}\left(u_{i-1}^{old} + u_{i+1}^{old} + h^2 f_i\right).$$

Letting $A = D + L + U$, the matrix form is:
$$u^{new} = \left[(1-\omega)I - \omega D^{-1}(L+U)\right]u^{old} + \omega D^{-1}f \;=\; G_\omega\, u^{old} + \omega D^{-1}f.$$

It is easy to see that if $e = u^{exact} - u^{approx}$, then $e^{new} = G_\omega\, e^{old}$.

Highlights of Multigrid: Relaxation Typically Stalls

The eigenvectors of $G_\omega$ are the same as those of $A$, and are Fourier modes:
$$v_i^{(k)} = \sin\!\left(\frac{ik\pi}{N}\right), \qquad k = 1, 2, \ldots, N-1.$$
The eigenvalues of $G_\omega$ are
$$\lambda_k = 1 - 2\omega \sin^2\!\left(\frac{k\pi}{2N}\right),$$
so the effect of relaxation on the modes can be read off directly.

[Plot: $\lambda_k$ versus $k$ for $N = 400$ and $\omega = 1/3,\ 1/2,\ 2/3$.]

Note: no value of $\omega$ will damp out the low-frequency waves, since $\lambda_k \to 1$ as $k/N \to 0$.

Highlights of Multigrid: Relaxation Smooths the Error

[Plots: the initial error, and the error after 35 iteration sweeps.]

Many relaxation schemes have the smoothing property, where oscillatory modes of the error are eliminated effectively, but smooth modes are damped very slowly.

Highlights of Multigrid: Smooth Error Can Be Represented on a Coarse Grid

[Plots: a smooth function, and its representation by linear interpolation from a coarser grid.]

On the coarse grid, the smooth error appears to be relatively higher in frequency: in the example it is the 4-mode out of a possible 16 on the fine grid, 1/4 of the way up the spectrum. On the coarse grid, it is the 4-mode out of a possible 8, hence 1/2 of the way up the spectrum. Relaxation will be more effective on this mode if done on the coarser grid!
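The smoothing property is easy to verify numerically. Below is a small illustrative sketch (mine, not from the tutorial) of weighted Jacobi on the 1-d model problem with a zero right-hand side, so the iterate is the error itself; the values of N, the mode numbers, and ω = 2/3 are arbitrary choices. The smooth k = 1 mode barely decays after 35 sweeps, while the oscillatory mode is crushed, matching the eigenvalue formula above.

```python
import numpy as np

# Weighted Jacobi on the 1-d model problem -u'' = f with f = 0, so the
# iterate *is* the error. N and omega are arbitrary illustrative choices.
N = 64
h = 1.0 / N
omega = 2.0 / 3.0

def weighted_jacobi_sweep(u, f):
    """One sweep of u_i <- (1-w) u_i + (w/2)(u_{i-1} + u_{i+1} + h^2 f_i)."""
    u_new = u.copy()
    u_new[1:-1] = ((1.0 - omega) * u[1:-1]
                   + 0.5 * omega * (u[:-2] + u[2:] + h * h * f[1:-1]))
    return u_new  # u[0], u[N] stay fixed (Dirichlet boundaries)

x = np.linspace(0.0, 1.0, N + 1)
f = np.zeros(N + 1)
for k in (1, 48):                     # one smooth mode, one oscillatory mode
    e = np.sin(k * np.pi * x)         # Fourier mode v_i = sin(i k pi / N)
    for _ in range(35):
        e = weighted_jacobi_sweep(e, f)
    # predicted damping factor per sweep: 1 - 2 w sin^2(k pi / 2N)
    print(f"k = {k:2d}: max |error| after 35 sweeps = {np.abs(e).max():.4f}")
```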
Highlights of Multigrid: Coarse-Grid Correction

Perform relaxation on $A^h u^h = f^h$ on the fine grid until the error is smooth. Compute the residual $r^h = f^h - A^h u^h$ and transfer it to the coarse grid: $r^{2h} = I_h^{2h} r^h$. Solve the coarse-grid residual equation to obtain the error:
$$A^{2h} e^{2h} = r^{2h}, \qquad e^{2h} = \left(A^{2h}\right)^{-1} r^{2h}.$$
Interpolate the error to the fine grid and correct the fine-grid solution:
$$u^h \leftarrow u^h + I_{2h}^h e^{2h}.$$

Highlights of Multigrid: Coarse-Grid Correction (schematic)

— Relax on $A^h u^h = f^h$
— Compute $r^h = f^h - A^h u^h$
— Restrict $r^{2h} = I_h^{2h} r^h$
— Solve $A^{2h} e^{2h} = r^{2h}$, i.e., $e^{2h} = (A^{2h})^{-1} r^{2h}$
— Interpolate $e^h = I_{2h}^h e^{2h}$
— Correct $u^h \leftarrow u^h + e^h$

Highlights of Multigrid: Tools Needed

Interpolation and restriction operators. In 1-d, interpolation $I_{2h}^h$ is linear interpolation: each coarse value is injected to the coincident fine point and averaged into its two fine neighbors (column stencil $\frac{1}{2}[\,1\ \ 2\ \ 1\,]^T$). Restriction $I_h^{2h}$ is either injection (each coarse value taken directly from the coincident fine point) or full weighting (row stencil $\frac{1}{4}[\,1\ \ 2\ \ 1\,]$).

Coarse-grid operator $A^{2h}$. Two methods:
(1) Discretize the equation at the larger spacing.
(2) Use the Galerkin formula: $A^{2h} = I_h^{2h} A^h I_{2h}^h$.

Highlights of Multigrid: Recursion: the $(\nu, 0)$ V-cycle

Major question: how do we "solve" the coarse-grid residual equation? Answer: recursion!

Down the cycle:
— Relax $\nu$ times on $A^h u^h = f^h$
— $f^{2h} = I_h^{2h}\left(f^h - A^h u^h\right)$
— Relax $\nu$ times on $A^{2h} u^{2h} = f^{2h}$
— $f^{4h} = I_{2h}^{4h}\left(f^{2h} - A^{2h} u^{2h}\right)$
— Relax $\nu$ times on $A^{4h} u^{4h} = f^{4h}$
— $f^{8h} = I_{4h}^{8h}\left(f^{4h} - A^{4h} u^{4h}\right)$
— ... continue to the coarsest grid, where we solve $e^H = \left(A^H\right)^{-1} f^H$

Back up the cycle:
— $u^{4h} \leftarrow u^{4h} + I_{8h}^{4h} u^{8h}$
— $u^{2h} \leftarrow u^{2h} + I_{4h}^{2h} u^{4h}$
— $u^h \leftarrow u^h + I_{2h}^h u^{2h}$

Algebraic Multigrid: for Unstructured Grids

AMG automatically defines the coarse "grid." It has two distinct phases:
— setup phase: define the MG components
— solution phase: perform MG cycles

The AMG approach is the opposite of geometric MG:
— fix relaxation (point Gauss-Seidel)
— choose coarse "grids" and prolongation, P, so that error not reduced by relaxation is in range(P)
— define the other MG components so that coarse-grid correction eliminates error in range(P) (i.e., use the Galerkin principle)

(In contrast, geometric MG fixes the coarse grids, then defines suitable operators and smoothers.)

AMG Has Two Phases

Setup phase:
— Select coarse "grids" $\Omega^{m+1}$, $m = 1, 2, \ldots$
— Define interpolation, $I_{m+1}^m$
— Define restriction and coarse-grid operators:
$$I_m^{m+1} = \left(I_{m+1}^m\right)^T, \qquad A^{m+1} = I_m^{m+1} A^m I_{m+1}^m.$$

Solve phase:
— Standard multigrid operations, e.g., V-cycle, W-cycle, FMG, etc.

In AMG, We Choose Relaxation First

Typically, pointwise Gauss-Seidel is used. With $A = D + L + U$, the iteration is developed:
$$Ax = b \;\Rightarrow\; (D+L)\,x = b - Ux \;\Rightarrow\; x^{new} = (D+L)^{-1} b - (D+L)^{-1} U x^{old}.$$
Adding and subtracting $(D+L)^{-1}(D+L)\,x^{old}$ gives
$$x^{new} = x^{old} + (D+L)^{-1} r^{old}.$$

Gauss-Seidel Relaxation Error Propagation

The iteration is $x^{new} = x^{old} + (D+L)^{-1} r^{old}$. Subtracting both sides from the exact solution:
$$x^{exact} - x^{new} = x^{exact} - x^{old} - (D+L)^{-1} r^{old}, \qquad e^{new} = e^{old} - (D+L)^{-1} r^{old}.$$
Using $r = Ae$, this can be written as
$$e^{new} = \left[I - (D+L)^{-1} A\right] e^{old}.$$

An Observation: Error That Is Slow to Converge Has "Small" Residuals

Consider an iterative method with the error recurrence
$$e^{k+1} = \left(I - Q^{-1}A\right) e^k.$$
Error that is slow to converge satisfies
$$\left(I - Q^{-1}A\right) e \approx e, \qquad \text{that is,} \qquad Q^{-1} r \approx 0.$$
Perhaps a better viewpoint is
$$\left(I - Q^{-1}A\right)e \approx e \quad\Longleftrightarrow\quad \left\langle Q^{-1}Ae,\, Ae\right\rangle \ll \left\langle e,\, Ae\right\rangle.$$
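The "slow convergence means small residuals" observation can be checked directly. The following sketch (my illustration, with arbitrary sizes) sweeps Gauss-Seidel on $Ae = 0$ for the 1-d model operator starting from a random error, then prints the error and residual norms; after many sweeps $\|r\| \ll \|e\|$ even though the error is far from zero.

```python
import numpy as np

# Gauss-Seidel error propagation e <- (I - Q^{-1} A) e with Q = D + L,
# applied to the 1-d model operator (the h^2 scaling is irrelevant here).
n = 100
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
Q = np.tril(A)                                   # Q = D + L

e = np.random.default_rng(0).standard_normal(n)  # arbitrary initial error
for _ in range(50):
    e = e - np.linalg.solve(Q, A @ e)            # one Gauss-Seidel sweep
print(f"||e|| = {np.linalg.norm(e):.3e}")
print(f"||r|| = {np.linalg.norm(A @ e):.3e}")    # r = Ae: much smaller than e
```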
Some Implications of Slow Convergence

For most iterations (e.g., Jacobi or Gauss-Seidel) this last holds if
$$\left\langle D^{-1}Ae,\, Ae\right\rangle \ll \left\langle e,\, Ae\right\rangle. \tag{1}$$
Hence
$$\sum_{i=1}^N \frac{r_i^2}{a_{ii}} \;\ll\; \sum_{i=1}^N r_i\, e_i,$$
implying that, on average,
$$|r_i| \ll a_{ii}\,|e_i|.$$
An implication is that, if $e$ is an error slow to converge, then locally at least, $e_i$ can be well approximated by an average of its neighbors.

In Multigrid, Error That Is Slow to Converge Is Geometrically Smooth

Combining the algebraic property that slow convergence implies "small residuals" with the observation above, in AMG we DEFINE smooth error: smooth error is that error which is slow to converge under relaxation, that is,
$$\left(I - Q^{-1}A\right) e \approx e, \qquad \text{or, more precisely,} \qquad \left\langle Q^{-1}Ae,\, Ae\right\rangle \ll \left\langle e,\, Ae\right\rangle.$$

AMG Uses Dependence (Influence) to Determine MG Components

We need to choose a subset of the gridpoints (the coarse grid) that can be used 1) to represent smooth errors, and 2) to interpolate these errors to the fine grid.

Intuitively, a point $u_j$ is a good candidate for a C-point if its value is important in determining the value of another point $u_i$ in the $i$th equation. If the coefficient $a_{ij}$ is "large" compared to the other off-diagonal coefficients in the $i$th equation, then $u_j$ influences $u_i$ (or $u_i$ depends on $u_j$).

Dependence and Smooth Error

For M-matrices, we define "$i$ depends on $j$" by
$$-a_{ij} \;\ge\; \theta \max_{k \ne i}\{-a_{ik}\}, \qquad 0 < \theta \le 1;$$
alternatively, "$j$ influences $i$." It is easy to show from (1) that smooth error satisfies
$$\left\langle Ae,\, e\right\rangle \ll \left\langle De,\, e\right\rangle. \tag{2}$$

Dependence and Smooth Error (continued)

For M-matrices, we have from (2) that, on average,
$$\frac{1}{2} \sum_{j \ne i} \frac{-a_{ij}}{a_{ii}} \left(\frac{e_i - e_j}{e_i}\right)^2 \ll 1.$$
— If $e_i$ does not depend on $e_j$, then the inequality may be satisfied because $-a_{ij}$ is "small."
— If $e_i$ does depend on $e_j$, then $a_{ij}$ need not be small, and the inequality must be satisfied by $\left(\frac{e_i - e_j}{e_i}\right)^2$ being small.
This implies that smooth error varies slowly in the direction of dependence.

Some Useful Definitions

The set of dependencies of a variable $u_i$, that is, the variables upon whose values the value of $u_i$ depends, is defined as
$$S_i = \left\{\, j \;:\; -a_{ij} \ge \theta \max_{k \ne i}\{-a_{ik}\} \,\right\}.$$
The set of points that $u_i$ influences is denoted
$$S_i^T = \left\{\, j \;:\; i \in S_j \,\right\}.$$

More Useful Definitions

The set of coarse-grid variables is denoted $C$. The set of fine-grid variables is denoted $F$. The set of coarse-grid variables used to interpolate the value of the fine-grid variable $u_i$, called the coarse interpolatory set for $i$, is denoted $C_i$.

Two Criteria for Choosing the Coarse-Grid Points

First criterion: F-F dependence.
— (C1) For each $i \in F$, each point $j \in S_i$ should either be in $C$ itself or should depend on at least one point in $C_i$.
Since the value of $u_i$ depends on the value of $u_j$, the value of $u_j$ must be represented on the coarse grid for good interpolation. If $j$ isn't a C-point, it should depend on a point in $C_i$ so its value is "represented" in the interpolation.

Second criterion: maximal subset.
— (C2) $C$ should be a maximal subset with the property that no C-point depends on another.
— (C1) tends to increase the number of C-points. In general, the more C-points on the coarse grid, the better the h-level convergence. But more C-points means more work for relaxation and interpolation. (C2) is designed to limit the size (and work) of the coarse grid.
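These definitions translate directly into code. Here is a minimal sketch (assuming a dense numpy-style matrix A with M-matrix sign structure; the function names and the default θ are mine):

```python
def dependencies(A, theta=0.25):
    """S[i] = set of j with -a_ij >= theta * max_{k != i} (-a_ik)."""
    n = A.shape[0]
    S = [set() for _ in range(n)]
    for i in range(n):
        cutoff = theta * max(-A[i, k] for k in range(n) if k != i)
        if cutoff <= 0.0:
            continue                  # row has no negative off-diagonals
        for j in range(n):
            if j != i and -A[i, j] >= cutoff:
                S[i].add(j)
    return S

def influences(S):
    """S_T[j] = set of i that j influences (the transpose of S)."""
    S_T = [set() for _ in range(len(S))]
    for i, deps in enumerate(S):
        for j in deps:
            S_T[j].add(i)
    return S_T
```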
Two Criteria for Choosing the Coarse-Grid Points (continued)

It is sometimes not possible to satisfy both criteria simultaneously (an example will be seen shortly). In those cases, we choose to satisfy (C1), the requirement that F-F dependencies be represented in the coarse interpolatory set, while using (C2) as a guide. This choice leads to somewhat larger coarse grids, but tends to preserve good convergence properties.

Choosing the Coarse-Grid Points

— Assign to each gridpoint $k$ a "measure" equal to the number of points that depend on $k$.
— Choose a point with globally maximal measure as the first C-point.
— The new C-point can be used to interpolate the values of the points it influences; assign them all as F-points.
— Other points influencing these new F-points can be used in their interpolation; increment their measures.
— Repeat until all points are C- or F-points.
(A code sketch of this pass appears after the stencil examples below.)

Ruge AMG Coarsening: Walkthrough

[Sequence of slides: the first coarsening pass applied to a 7x7 grid with the 9-point stencil. Initial measures are 8 at interior points, 5 on edges, and 3 at corners. At each step the algorithm selects an unassigned point of maximal measure as a C-point, selects its unassigned neighbors as F-points, and updates (increments) the measures of the unassigned neighbors of the new F-points; values such as 9, 10, 11, and 13 appear as the updates propagate. After nine selections, every point is a C- or F-point.]

Examples: Laplacian Operator

5-pt FD, 9-pt FE (quads), and 9-pt FE (stretched quads). The first two stencils:

5-pt FD:
$$\begin{bmatrix} & -1 & \\ -1 & 4 & -1 \\ & -1 & \end{bmatrix}$$

9-pt FE (quads):
$$\begin{bmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{bmatrix}$$
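Here is the promised sketch of the first coarsening pass, built on the dependence/influence sets S and S_T from the earlier sketch. The names are mine, and ties in the measure are broken arbitrarily:

```python
def first_pass_coloring(S, S_T):
    """Ruge-style first pass: returns label[i] in {'C', 'F'} for every point."""
    n = len(S)
    measure = [len(S_T[i]) for i in range(n)]   # number of points depending on i
    label = [None] * n
    while any(l is None for l in label):
        # next C-point: an unassigned point of maximal measure
        i = max((p for p in range(n) if label[p] is None),
                key=lambda p: measure[p])
        label[i] = 'C'
        for j in S_T[i]:                        # points the new C-point influences
            if label[j] is None:
                label[j] = 'F'
                for k in S[j]:                  # k could interpolate new F-point j
                    if label[k] is None:
                        measure[k] += 1
    return label
```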
Coarse-Grid Selection: 9-pt Laplacian, Periodic Boundary Conditions

[Sequence of slides: the same first-pass coarsening applied to the 9-point Laplacian on a grid with periodic boundary conditions. Every point starts with measure 8; C-points of maximal measure are selected, their neighbors become F-points, and the measures of the new F-points' neighbors are updated exactly as before, until all points are assigned.]

Modulo periodicity, it is the same coarsening as in the Dirichlet case. However, it has many F-F connections that do not share a common C-point. A second pass is therefore made in which some F-points are made into C-points to enforce (C1). Goals of the second pass include minimizing C-C connections, and minimizing the number of F-points converted into C-points.

Prolongation

$$\left(I_{2h}^h\, e\right)_i \;=\; \begin{cases} e_i & \text{if } i \in C, \\[4pt] \displaystyle\sum_{k \in C_i} \omega_{ik}\, e_k & \text{if } i \in F. \end{cases}$$

The interpolated value at point $i$ is just $e_i$ if $i$ is a C-point. If $i$ is an F-point, the value is a weighted sum of the values of the points in the coarse interpolatory set $C_i$. (A sketch of applying this operator follows below; the weights $\omega_{ik}$ are derived on the next slides.)

To define prolongation at $i$, we must examine the types of connections of $u_i$. Sets of connection types:
— $C_i$: the coarse interpolatory C-points on which $i$ depends.
— $D_i^s$: the F-points on which $i$ depends.
— $D_i^w$: the "weakly connected" points, on which $i$ does not depend; these may be C- or F-points.
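Applying this operator is straightforward once the weights exist. A sketch (assuming a `label` array from the coarsening pass, a `coarse_index` map from C-points to coarse unknowns, and per-row weight dictionaries `weights[i] = {j: w_ij}` over $C_i$; all names are mine):

```python
def prolong(e_coarse, label, coarse_index, weights):
    """Apply I_{2h}^h: inject at C-points, weighted sum over C_i at F-points."""
    e_fine = [0.0] * len(label)
    for i in range(len(label)):
        if label[i] == 'C':
            e_fine[i] = e_coarse[coarse_index[i]]
        else:
            e_fine[i] = sum(w * e_coarse[coarse_index[j]]
                            for j, w in weights[i].items())
    return e_fine
```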
Prolongation Is Based on Smooth Error and Dependencies (from M-matrices)

Recall that smooth error is characterized by "small" residuals:
$$r_i = a_{ii} e_i + \sum_{j \in N_i} a_{ij} e_j \approx 0,$$
which we can rewrite as
$$a_{ii} e_i \approx -\sum_{j \ne i} a_{ij} e_j.$$
We base prolongation on this formula by "solving" for $e_i$ and making some approximating substitutions.

We begin by writing the smooth-error relation and identifying its component sums:
$$a_{ii} e_i \;\approx\; -\underbrace{\sum_{j \in C_i} a_{ij} e_j}_{\text{coarse interpolatory set}} \;-\; \underbrace{\sum_{j \in D_i^s} a_{ij} e_j}_{\text{F-point dependencies}} \;-\; \underbrace{\sum_{j \in D_i^w} a_{ij} e_j}_{\text{weak connections}}.$$
We must approximate $e_j$ in each of the last two sums in terms of $e_i$ or of $e_j$ for $j \in C_i$.

For the weak connections: let $e_j \approx e_i$. Effectively, this throws the weak connections onto the diagonal:
$$\left(a_{ii} + \sum_{j \in D_i^w} a_{ij}\right) e_i \;\approx\; -\sum_{j \in C_i} a_{ij} e_j \;-\; \sum_{j \in D_i^s} a_{ij} e_j.$$
This approximation can't hurt too much:
• the connection is weak, so the terms are small;
• even if $i$ actually depended on points in $D_i^w$, smooth error varies slowly in the direction of dependence, so $e_j \approx e_i$.

For the F-point dependencies: approximate $e_j$ by a weighted average of the errors $e_k$ in the coarse interpolatory set $C_i \cap C_j$:
$$e_j \;\approx\; \frac{\displaystyle\sum_{k \in C_i} a_{jk}\, e_k}{\displaystyle\sum_{k \in C_i} a_{jk}}.$$
It is for this reason that the intersection of the coarse interpolatory sets of two F-points with a dependence relationship must be nonempty (criterion (C1)).

Finally, the prolongation weights are defined. Making the previous substitution, and with a bit of messy algebra, the smooth-error relation can be "solved" for $e_i$ to yield the interpolation formula
$$e_i = \sum_{j \in C_i} \omega_{ij}\, e_j,$$
where the prolongation weights are given by
$$\omega_{ij} \;=\; -\,\frac{a_{ij} + \displaystyle\sum_{m \in D_i^s} \left(\frac{a_{im}\, a_{mj}}{\sum_{k \in C_i} a_{mk}}\right)}{a_{ii} + \displaystyle\sum_{n \in D_i^w} a_{in}}.$$
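The weight formula maps directly to a few lines of code. A sketch for a single row i (assuming a dense matrix A and the sets C_i, D_s, D_w already produced by the dependence test; names are mine):

```python
def interpolation_weights(A, i, C_i, D_s, D_w):
    """Classical AMG weights for row i:
    w_ij = -(a_ij + sum_{m in D_s} a_im a_mj / sum_{k in C_i} a_mk)
           / (a_ii + sum_{n in D_w} a_in)
    """
    denom = A[i, i] + sum(A[i, n] for n in D_w)   # weak connections on diagonal
    weights = {}
    for j in C_i:
        num = A[i, j]
        for m in D_s:
            # distribute the strong F-connection a_im over C_i; the row sum
            # is nonzero because (C1) guarantees C_i intersects C_m
            num += A[i, m] * A[m, j] / sum(A[m, k] for k in C_i)
        weights[j] = -num / denom
    return weights
```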
Highlights of Multigrid: Storage

$f^h$ and $u^h$ must be stored on each level. In 1-d, each coarse grid has about half the number of points of the finer grid. In 2-d, each coarse grid has about one-fourth the number of points of the finer grid. In d dimensions, each coarse grid has about $2^{-d}$ times the number of points of the finer grid.

Storage cost:
$$2N^d \left(1 + 2^{-d} + 2^{-2d} + 2^{-3d} + \cdots + 2^{-Md}\right) \;<\; \frac{2N^d}{1 - 2^{-d}},$$
less than 2, 4/3, and 8/7 times the cost of storage on the fine grid for 1-, 2-, and 3-d problems, respectively.

AMG Storage: Grid Complexity

For AMG there is no simple predictor for total storage costs: $u^m$, $f^m$, and $A^{m+1} = I_m^{m+1} A^m I_{m+1}^m$ must be stored on all levels. Define $\sigma$, the grid complexity, as the total number of unknowns (gridpoints) on all levels, divided by the number of unknowns on the finest level. Total storage of the vectors $u$ and $f$ occupies $2\sigma N$ storage locations, where $N$ is the number of fine-level unknowns.

AMG Storage: Operator Complexity

Define $\sigma_A$, the operator complexity, as the total number of nonzero coefficients of all operators $A^m$, divided by the number of nonzero coefficients in the fine-level operator $A^0$. Total storage of the operators occupies $\sigma_A$ times the storage of the fine-level operator.

AMG Storage: Interpolation

We could define $\sigma_I$, an interpolation complexity, as the total number of nonzero coefficients of all operators $I_{m+1}^m$, divided by the number of nonzero coefficients in the operator $I_1^0$. This measure is not generally cited, however (like most multigridders, the AMG crowd tends to ignore the cost of intergrid transfers). Two measures that occasionally appear are the average "stencil size" of the operators and the average number of interpolation points per F-point.

AMG Setup Costs: Flops

Flops in the setup phase are only a small portion of the work, which includes sorting, maintaining linked lists, keeping counters, storage manipulation, and garbage collection. Estimates of the total flop counts to define the interpolation weights and the coarse-grid operators are proportional to the number of fine-grid points, with constants involving the average stencil size and the average number of interpolation points per F-point.

AMG Setup Costs: A Bad Rap

Many geometric MG methods also need to compute prolongation and coarse-grid operators; the only additional expense in the AMG setup phase is the coarse-grid selection algorithm. The AMG setup phase is only 10-25% more expensive than in geometric MG, and may be considerably less than that!

Highlights of Multigrid: Computation Costs

Let 1 Work Unit (WU) be the cost of one relaxation sweep on the fine grid. Ignore the cost of restriction and interpolation (typically about 20% of the total cost). Consider a V-cycle with 1 pre-coarse-grid-correction relaxation sweep and 1 post-coarse-grid-correction relaxation sweep. Cost of the V-cycle (in WU):
$$2\left(1 + 2^{-d} + 2^{-2d} + 2^{-3d} + \cdots + 2^{-Md}\right) \;<\; \frac{2}{1 - 2^{-d}}.$$
The cost is about 4, 8/3, and 16/7 WU per V-cycle in 1, 2, and 3 dimensions.

AMG Solve Costs: Flops per Cycle

The approximate numbers of flops on level $m$ for one relaxation sweep, one residual transfer, and one interpolation are each proportional to the number of nonzero coefficients in $A^m$ and to the numbers $N_m^C$, $N_m^F$ of C- and F-points on the level. Summing over levels, the total flop count for a (1,1) V-cycle is proportional to $N$, with a constant determined by the operator and interpolation complexities.

AMG Solve Costs: Flops per Cycle, Again

All that is very well, but in practice we find the solve phase is generally dominated by the cost of relaxation and computing the residual. Both of those operations are proportional to the number of nonzero entries in the operator matrix on any given level. Thus the best measure of the ratio of work done on all levels to the work done on the finest level is the operator complexity $\sigma_A$.

Highlights of Multigrid: Difficulties: Anisotropic Operators and Grids

Consider the operator
$$-\epsilon\, \frac{\partial^2 u}{\partial x^2} - \frac{\partial^2 u}{\partial y^2} = f(x, y).$$
If $\epsilon \ll 1$, then the GS-smoothing factors in the x- and y-directions differ sharply [plot: smoothing factors versus frequency]: GS relaxation does not damp oscillatory components in the x-direction. The same phenomenon occurs for grids with much larger spacing in one direction than the other.

Highlights of Multigrid: Difficulties: Discontinuous or Anisotropic Coefficients

Consider the operator $-\nabla \cdot \left(D(x,y)\, \nabla u\right)$, where
$$D(x,y) = \begin{pmatrix} d_{11}(x,y) & d_{12}(x,y) \\ d_{21}(x,y) & d_{22}(x,y) \end{pmatrix}.$$
Again, the GS-smoothing factors in the x- and y-directions can be highly variable, and very often GS relaxation does not damp oscillatory components in one or both directions. Solutions: line relaxation (where whole gridlines of values are found simultaneously), and/or semicoarsening (coarsening only in the strongly coupled direction).

AMG Does Semicoarsening Automatically!

Consider the operator
$$-\epsilon\, \frac{\partial^2 u}{\partial x^2} - \frac{\partial^2 u}{\partial y^2} = f(x, y).$$
In the limit $\epsilon \to 0$, the stencil becomes
$$\begin{bmatrix} -1 & -4 & -1 \\ 2 & 8 & 2 \\ -1 & -4 & -1 \end{bmatrix},$$
and AMG automatically produces a semicoarsened grid!
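This is just the dependence test at work. A tiny check (my illustration; θ = 0.5 is an arbitrary choice) applied to one row of the limiting stencil shows that only the north and south neighbors register as dependencies, so the first coarsening pass coarsens only in the y-direction:

```python
# Stencil entries keyed by offset (dx, dy); this is the eps -> 0 limit above.
stencil = {(-1, -1): -1, (0, -1): -4, (1, -1): -1,
           (-1,  0):  2, (0,  0):  8, (1,  0):  2,
           (-1,  1): -1, (0,  1): -4, (1,  1): -1}
theta = 0.5
cutoff = theta * max(-a for off, a in stencil.items() if off != (0, 0))
deps = [off for off, a in stencil.items() if off != (0, 0) and -a >= cutoff]
print(deps)   # [(0, -1), (0, 1)]: dependence only in the y-direction
```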
How's It Perform (Vol. I)? Regular Grids, Plain Vanilla Problems

The Laplace operator:

Stencil            Convergence/cycle   Setup time   Time/cycle   Complexity
5-pt               0.054               2.21         0.29         1.63
5-pt skew          0.067               2.12         0.27         1.52
9-pt (-1,8)        0.078               1.30         0.26         1.83
9-pt (-1,-4,20)    0.109               1.30         0.26         1.83

Anisotropic Laplacian, $-\epsilon u_{xx} - u_{yy}$:

Epsilon              0.001   0.01    0.1     0.5     1       2       10      100     1000
Convergence/cycle    0.084   0.093   0.058   0.069   0.056   0.079   0.087   0.093   0.083

How's It Perform (Vol. II)? Structured Meshes, Rectangular Domains

5-point Laplacian on regular rectangular grids. [Plot: convergence factor versus number of nodes, up to about 10^6; the observed factors are 0.055, 0.082, 0.133, 0.135, 0.142, and 0.146.]

How's It Perform (Vol. III)? Unstructured Meshes, Rectangular Domains

Laplacian on random unstructured grids (regular triangulations with 15-20% of the nodes randomly collapsed into neighboring nodes). [Plot: convergence factor versus number of nodes, up to 60,000; the observed factors are 0.097, 0.111, 0.175, 0.223, and 0.253.]

How's It Perform (Vol. IV)? Isotropic Diffusion, Structured/Unstructured Grids

$-\nabla \cdot \left(d(x,y)\, \nabla u\right)$ on structured and unstructured grids. [Bar chart: convergence factors, roughly 0.1 to 0.4, for four test problems (6-9), each using a discontinuous coefficient $d(x,y)$ taking the values 1.0 and $c$ in different geometric configurations about the point (0.5, 0.5); "a" means parameter $c = 10$, "b" means $c = 1000$. Grids: N = 16642 (structured), N = 66049 (structured), N = 13755 (unstructured), N = 54518 (unstructured).]

How's It Perform (Vol. IVa)? Isotropic Diffusion, Structured/Unstructured Grids

$-\nabla \cdot \left(d(x,y)\, \nabla u\right)$ with a "checkerboard" of coefficients 1.0 and $c$, squares sized $1/n$:
$$d(x,y) = \begin{cases} 1.0 & \frac{i-1}{n} \le x < \frac{i}{n},\ \frac{j-1}{n} \le y < \frac{j}{n},\ i + j \text{ even}, \\[4pt] c & \frac{i-1}{n} \le x < \frac{i}{n},\ \frac{j-1}{n} \le y < \frac{j}{n},\ i + j \text{ odd}. \end{cases}$$
[Bar chart: convergence factors, roughly 0.1 to 0.45, for several checkerboard sizes $n$, with "a" meaning $c = 10$ and "b" meaning $c = 1000$, on grids N = 16642 (structured), N = 66049 (structured), N = 13755 (unstructured), N = 54518 (unstructured).]

How's It Perform (Vol. V)? Laplacian Operator, Unstructured Grids

[Plot: convergence factor versus number of gridpoints, up to 350,000; the observed factors are 0.1002, 0.1715, 0.1888, 0.2198, 0.2237, and 0.2754.]

So, What Could Go Wrong? Strong F-F Connections: Weights Are Dependent on Each Other

For point $i$, the value $e_j$ is interpolated from $k_1, k_2 \in C_i \cap C_j$, and is needed to form the interpolation weights for approximating $e_i$. For point $j$, the value $e_i$ is interpolated from $k_1, k_2$, and is needed to form the interpolation weights for approximating $e_j$. It's an implicit system!
[Diagram: F-points $i$ and $j$ depend on each other; $k_1, k_2$ lie in both $C_i$ and $C_j$, $k_3 \in C_i$ only, $k_4 \in C_j$ only.]

Is There a Fix?

A Gauss-Seidel-like iterative approach to weight definition is implemented. Usually two passes suffice. But does it work? Frequently, it does. Convergence factors for the Laplacian on stretched quadrilaterals:

Stretching                  theta   Standard   Iterative
dx = 10 dy                  0.25    0.47       0.14
                            0.5     0.24       0.14
dx = 100 dy                 0.25    0.83       0.82
                            0.5     0.53       0.23

AMG for Systems

How can we do AMG on systems?
$$\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} f \\ g \end{pmatrix}$$
Naive approach: the "block" approach (block Gauss-Seidel, using scalar AMG to "solve" at each cycle):
$$u \leftarrow A_{11}^{-1}\left(f - A_{12} v\right), \qquad v \leftarrow A_{22}^{-1}\left(g - A_{21} u\right).$$
Great idea! Except that it doesn't work! (Relaxation does not evenly smooth errors in both unknowns.)

AMG for Systems: A Solution

To solve the system problem, allow interaction between the unknowns at all levels:
$$I_{k+1}^k = \begin{pmatrix} I_{k+1}^{k,u} & 0 \\ 0 & I_{k+1}^{k,v} \end{pmatrix} \qquad \text{and} \qquad A^k = \begin{pmatrix} A_{11}^k & A_{12}^k \\ A_{21}^k & A_{22}^k \end{pmatrix}.$$
This is called the "unknown" approach. Results for 2-D elasticity on a uniform quadrilateral mesh:

Mesh spacing         0.125   0.0625   0.03125   0.015625
Convergence factor   0.22    0.35     0.42      0.44
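Structurally, the "unknown" approach keeps interpolation block-diagonal by unknown, while the Galerkin coarse operator retains the cross-coupling. A sketch (assuming P_u and P_v were built by scalar AMG applied to the A11 and A22 blocks; names are mine):

```python
import numpy as np

def unknown_approach_interpolation(P_u, P_v):
    """Block-diagonal system interpolation: u and v interpolate separately."""
    n_u, m_u = P_u.shape
    n_v, m_v = P_v.shape
    P = np.zeros((n_u + n_v, m_u + m_v))
    P[:n_u, :m_u] = P_u          # u-unknowns interpolate from u C-points only
    P[n_u:, m_u:] = P_v          # v-unknowns interpolate from v C-points only
    return P

# The Galerkin coarse operator P.T @ A @ P keeps the A12, A21 coupling,
# which is what distinguishes this from the naive block Gauss-Seidel idea.
```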
So, What Else Can Go Wrong? Ouch! Thin-Body Elasticity!

Elasticity, 3-d, thin bodies:
$$u_{xx} + \tfrac{1}{2}\left(u_{yy} + u_{zz}\right) + \tfrac{1}{2}\left(v_{xy} + w_{xz}\right) = f_1,$$
$$v_{yy} + \tfrac{1}{2}\left(v_{xx} + v_{zz}\right) + \tfrac{1}{2}\left(u_{xy} + w_{yz}\right) = f_2,$$
$$w_{zz} + \tfrac{1}{2}\left(w_{xx} + w_{yy}\right) + \tfrac{1}{2}\left(u_{xz} + v_{yz}\right) = f_3.$$
Slide surfaces, Lagrange multipliers, force-balance constraints:
$$\begin{pmatrix} S & T \\ U & V \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} F_1 \\ F_2 \end{pmatrix},$$
where $S$ is "generally" positive definite, $V$ can be zero, and $U = T^T$. Wanted: a good solution method for this problem. REWARD!

Needed: More Robust Methods for Characterizing Smooth Error

Consider quadrilateral finite elements on a stretched 2-D Cartesian grid ($dx \to \infty$):
$$A = \begin{bmatrix} -1 & -4 & -1 \\ 2 & 8 & 2 \\ -1 & -4 & -1 \end{bmatrix}$$
— The direction of dependence is not apparent here.
— Iterative weight interpolation will sometimes compensate for mis-identified dependence.
— Elasticity problems are still problematic.

Scalability Is Central for Large-Scale Parallel Computing

A code is scalable if it can effectively use additional computational resources to solve larger problems. Many specific factors contribute to scalability:
— architecture of the parallel computer
— parallel implementation of the algorithms
— convergence rates of iterative linear solvers
Linear solver convergence can be discussed independently of parallel computing, and is often overlooked as a key scalability issue.

In Conclusion, AMG Rules!

Interest in AMG methods is high, and probably still rising, because of the increasing importance of terascale simulations on unstructured grids. AMG has been shown to be a robust, efficient solver on a wide variety of problems of real-world interest. Much research is underway to find effective ways of parallelizing AMG, which is essential to large-scale computing.

This work was performed under the auspices of the U.S. Department of Energy under contract number W-7405-Eng-48. Release number UCRL-MI-133749.