Notes on Multigrid Methods

Qinghai Zhang

September 30, 2013

Motivation of multigrid.

• The convergence rates of classical iterative methods depend on the grid spacing, i.e. on the problem size. In contrast, the convergence rates of multigrid methods do not.
• The complexity is O(n).

Book: Briggs, Henson, and McCormick (2000), A Multigrid Tutorial, 2nd ed., SIAM.
Encyclopedic website: www.mgnet.org

1 The model problem: the 1D Poisson equation

On the unit 1D domain x ∈ [0, 1], we numerically solve the Poisson equation with homogeneous Dirichlet boundary conditions

    −Δu = f,   u(0) = u(1) = 0.    (1)

Discretize the domain with n cells and locate the knowns f_j and the unknowns u_j at the nodes x_j = j/n = jh, j = 0, 1, . . . , n. We would like to approximate the second derivative of u using the discrete values at the nodes. Using Taylor expansion, we have

    (∂²u/∂x²)|_{jh} = (u_{j+1} + u_{j−1} − 2u_j)/h² + O(h²).    (2)

Definition 1. The one-dimensional second-order discrete Laplacian is the Toeplitz matrix A ∈ R^{(n−1)×(n−1)} with entries

    a_{ik} = 2 if i = k;  −1 if i − k = ±1;  0 otherwise.    (3)

Then we solve the linear system

    Au = f,    (4)

where f_j = h² f(x_j).

Proposition 2.

    (1/h²)(Au)_j + (Δu)|_{x_j} = O(h²),   ∀ j = 1, . . . , n − 1.

Proposition 3. The eigenvalues λ_k and eigenvectors w_k of A are

    λ_k(A) = 4 sin²(kπ/(2n)),    (5)
    w_{k,j} = sin(jkπ/n),        (6)

where j, k = 1, 2, . . . , n − 1.

Proof. Use the trigonometric identity

    sin α + sin β = 2 sin((α + β)/2) cos((α − β)/2).    (8)

Remark 1. The 2D counterpart of A is A ⊗ I + I ⊗ A. Note that this is not true for the variable-coefficient Poisson equation.

2 The residual equation

Definition 4. For an approximate solution ũ ≈ u, the error is e = u − ũ and the residual is r = f − Aũ. Then

    Ae = r    (9)

holds; it is called the residual equation. As one advantage, the residual equation lets us focus on the homogeneous Dirichlet condition WLOG.

Question 1. For inexact arithmetic, does a small residual imply a small error?

Definition 5. The condition number of a matrix A is cond(A) = ‖A‖₂ ‖A⁻¹‖₂.
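Definition 1, Proposition 3, and Definition 5 are easy to check numerically. A minimal NumPy sketch (the helper names are my own):

```python
import numpy as np

def discrete_laplacian(n):
    """The (n-1)x(n-1) 1D second-order discrete Laplacian of Definition 1."""
    return 2.0 * np.eye(n - 1) - np.eye(n - 1, k=1) - np.eye(n - 1, k=-1)

n = 8
A = discrete_laplacian(n)

# Proposition 3: lambda_k = 4 sin^2(k pi / (2n)), k = 1, ..., n-1.
k = np.arange(1, n)
predicted = np.sort(4.0 * np.sin(k * np.pi / (2 * n)) ** 2)
computed = np.sort(np.linalg.eigvalsh(A))
assert np.allclose(predicted, computed)

# The eigenvector w_k has entries sin(j k pi / n); check k = 1.
j = np.arange(1, n)
w1 = np.sin(j * np.pi / n)
assert np.allclose(A @ w1, predicted[0] * w1)

# Definition 5: since A is symmetric positive definite,
# cond(A) = lambda_max / lambda_min.
assert np.isclose(np.linalg.cond(A), computed[-1] / computed[0])
```

Rerunning the sketch with larger n shows cond(A) growing like O(n²), which is the point of the magnitudes quoted below.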
The condition number indicates how well the residual measures the error. We have

    ‖A‖₂ = sup_{x≠0} ‖Ax‖₂/‖x‖₂ = sup_{x≠0} √((Ax, Ax)/(x, x)) = sup_{x≠0} √((x, AᵀAx)/(x, x)) = √(λ_max(AᵀA)).    (10)

Since A is symmetric, ‖A‖₂ = λ_max(A), and ‖A⁻¹‖₂ = λ_max(A⁻¹) = λ_min(A)⁻¹. To give you an idea about the magnitude of cond(A): for n = 8, cond(A) ≈ 32; for n = 1024, cond(A) ≈ 4.3 × 10⁵.

Theorem 6.

    (1/cond(A)) ‖r‖₂/‖f‖₂ ≤ ‖e‖₂/‖u‖₂ ≤ cond(A) ‖r‖₂/‖f‖₂.    (11)

Proof. Use ‖Ax‖ ≤ ‖A‖ ‖x‖ and the equations Ae = r, A⁻¹f = u.

3 Fourier modes and aliasing

Hereafter Ω^h denotes both the uniform grid with n intervals and the corresponding vector space. Wavelength refers to the distance of one sinusoidal period.

Proposition 7. The kth Fourier mode w_{k,j} = sin(x_j kπ) has wavelength L_w = 2/k.

Proof. sin(x_j kπ) = sin((x_j + L_w)kπ) implies L_w kπ = 2π. Hence k = 2/L_w.

The wavenumber k counts the crests and troughs in the unit domain.

Question 2. What is the range of representable wavenumbers on Ω^h? For n = 8, consider k = 1, 2, 8. The representable range is k ∈ [1, n). What happens to modes with k > n? E.g. the mode with k = 3n/2 is represented by k = n/2. Plot the case of n = 4.

Proposition 8. On Ω^h, a Fourier mode w_k = sin(x_j kπ) with n < k < 2n is actually represented (up to sign) as the mode w_{k'}, where k' = 2n − k.

Proof. −sin(x_j kπ) = sin(2jπ − x_j kπ) = sin(x_j (2n − k)π) = sin(x_j k'π) = w_{k',j}.

Definition 9. On Ω^h, the Fourier modes with wavenumbers k ∈ [1, n/2) are called low-frequency (LF) or smooth modes; those with k ∈ [n/2, n) are called high-frequency (HF) or oscillatory modes.

4 The spectral property of weighted Jacobi

The scalar fixed-point iteration converts the problem of finding a root of f(x) = 0 to the problem of finding a fixed point of g(x) = x, where f(x) = c(g(x) − x), c ≠ 0. Classical iterative methods split A as A = M − N and convert (4) to a fixed-point (FP) problem u = M⁻¹N u + M⁻¹f. Let T = M⁻¹N and c = M⁻¹f. Then fixed-point iteration yields

    u^(ℓ+1) = T u^(ℓ) + c.    (12)

After ℓ iterations,

    e^(ℓ) = T^ℓ e^(0).
    (13)

Obviously, the FP iteration converges iff ρ(T) = |λ(T)|_max < 1.

Decompose A as A = D + L + U. The Jacobi iteration has M = D, N = −(L + U), T = −D⁻¹(L + U), i.e.

    t_{ij} = 1/2 if i − j = ±1;  0 otherwise.    (14)

Here ρ(T) = 1 − O(h²). As h → 0, ρ(T) → 1, and Jacobi converges slowly. Consider a generalization of the Jacobi iteration.

Definition 10. The weighted Jacobi iteration is given by

    u* = −D⁻¹(L + U)u^(ℓ) + D⁻¹f,    (15)
    u^(ℓ+1) = (1 − ω)u^(ℓ) + ωu*.    (16)

Setting ω = 1 yields Jacobi.

Proposition 11. Weighted Jacobi has the iteration matrix

    T_ω = (1 − ω)I − ωD⁻¹(L + U) = I − (ω/2)A,    (17)

whose eigenvectors are the same as those of A, with the corresponding eigenvalues

    λ_k(T_ω) = 1 − 2ω sin²(kπ/(2n)),    (18)

where k = 1, 2, . . . , n − 1.

See Fig. 2.7. For n = 64 and ω ∈ [0, 1], ρ(T_ω) ≥ 0.9986. Not a great iterative method either. Why?

Look more under the hood to consider how weighted Jacobi damps different modes. Write e^(0) = Σ_k c_k w_k; then

    e^(ℓ) = T_ω^ℓ e^(0) = Σ_k c_k λ_k^ℓ(T_ω) w_k.    (19)

No value of ω will reduce the smooth components of the error effectively:

    λ₁(T_ω) = 1 − 2ω sin²(π/(2n)) ≈ 1 − ωπ²h²/2.    (20)

Having accepted that no value of ω damps the smooth components satisfactorily, we ask what value of ω provides the best damping of the oscillatory modes.

Definition 12. The smoothing factor μ is the largest factor by which an HF mode is damped per iteration, i.e. μ = max_{k∈[n/2,n)} |λ_k|. An iterative method is said to have the smoothing property if μ is small and independent of the grid size.

For weighted Jacobi, this leads to the optimization problem

    μ = min_{ω∈(0,1]} max_{k∈[n/2,n)} |λ_k(T_ω)|.    (21)

λ_k(T_ω) is a monotonically decreasing function of k, so the minimum is obtained by equating the endpoint magnitudes:

    λ_{n/2}(T_ω) = −λ_n(T_ω)  ⟹  ω = 2/3.    (22)

Exercise:

    ω = 2/3  ⟹  |λ_k| ≤ μ = 1/3  for k ∈ [n/2, n).    (23)

See Figures 2.8 and 2.9. Regular Jacobi is only good for the modes 16 ≤ k ≤ 48. For ω = 2/3, the modes 16 ≤ k < 64 are all damped out quickly.

5 Two-grid correction

Proposition 13.
The kth mode on Ω^h becomes the kth mode on Ω^{2h}:

    w^h_{k,2j} = w^{2h}_{k,j}.    (24)

However, the LF modes with k ∈ [n/4, n/2) on Ω^h become HF modes on Ω^{2h}.

Proof.

    w^h_{k,2j} = sin(2jkπ/n) = sin(jkπ/(n/2)) = w^{2h}_{k,j},    (25)

where k ∈ [1, n/2). Because of the smaller range of k on Ω^{2h}, the modes with k ∈ [n/4, n/2) are HF by definition, since the highest wavenumber on Ω^{2h} is n/2.

Definition 14. The restriction operator I_h^{2h} : R^{n−1} → R^{n/2−1} maps a vector on the fine grid Ω^h to its counterpart on the coarse grid Ω^{2h}:

    I_h^{2h} v^h = v^{2h}.    (26)

A common restriction operator is the full-weighting operator

    v_j^{2h} = (1/4)(v^h_{2j−1} + 2v^h_{2j} + v^h_{2j+1}),    (27)

where j = 1, 2, . . . , n/2 − 1.

Definition 15. The prolongation or interpolation operator I_{2h}^h : R^{n/2−1} → R^{n−1} maps a vector on the coarse grid Ω^{2h} to its counterpart on the fine grid Ω^h:

    I_{2h}^h v^{2h} = v^h.    (28)

A common prolongation is the linear-interpolation operator

    v^h_{2j} = v_j^{2h},
    v^h_{2j+1} = (1/2)(v_j^{2h} + v_{j+1}^{2h}).    (29)

The key idea: since weighted Jacobi with ω = 2/3 damps HF modes effectively, we can exploit it on a series of successively coarsened grids to eliminate the HF modes of every scale.

Definition 16. For Au = f, the two-grid correction scheme

    v^h ← MG(v^h, f^h, ν₁, ν₂)    (30)

consists of the following steps:
1) Relax ν₁ times on A^h u^h = f^h on Ω^h with initial guess v^h: v^h ← T_ω^{ν₁} v^h + c₀(f).
2) Compute the fine-grid residual r^h = f^h − A^h v^h and restrict it to the coarse grid by r^{2h} = I_h^{2h} r^h: r^{2h} ← I_h^{2h}(f^h − A^h v^h).
3) Solve A^{2h} e^{2h} = r^{2h} on Ω^{2h}: e^{2h} ← (A^{2h})⁻¹ r^{2h}.
4) Interpolate the coarse-grid error to the fine grid by e^h = I_{2h}^h e^{2h} and correct the fine-grid approximation: v^h ← v^h + I_{2h}^h e^{2h}.
5) Relax ν₂ times on A^h u^h = f^h on Ω^h with initial guess v^h: v^h ← T_ω^{ν₂} v^h + c₀(f).

Proposition 17. Let TG denote the iteration matrix of the two-grid correction scheme. Then

    TG = T_ω^{ν₂} [I − I_{2h}^h (A^{2h})⁻¹ I_h^{2h} A^h] T_ω^{ν₁}.    (31)

Proof.
By definition, steps 1)–4) of the two-grid correction scheme replace the initial guess with

    v^h ← T_ω^{ν₁} v^h + c₀(f) + I_{2h}^h (A^{2h})⁻¹ I_h^{2h} [f^h − A^h (T_ω^{ν₁} v^h + c₀(f))],    (32)

which also holds for the exact solution u^h. Subtracting the two equations and appending the ν₂ post-smoothings yields (31).

5.1 The spectral picture

Our objective is to show that ‖TG‖ ≈ 0.1 for ν₁ = 2, ν₂ = 0. For this purpose, we need to examine the intergrid transfer operators.

Definition 18. w_k^h (k ∈ [1, n/2)) and w_{k'}^h (k' = n − k) are called complementary modes on Ω^h.

Proposition 19. For a pair of complementary modes on Ω^h, we have

    w^h_{k',j} = (−1)^{j+1} w^h_{k,j}.    (33)

Proof. w^h_{k',j} = sin((n − k)jπ/n) = sin(jπ − kjπ/n) = (−1)^{j+1} w^h_{k,j}.

Lemma 20. Write c_k = cos²(kπ/(2n)) and s_k = sin²(kπ/(2n)). The action of the full-weighting operator on a pair of complementary modes is

    I_h^{2h} w_k^h = cos²(kπ/(2n)) w_k^{2h} = c_k w_k^{2h},       (34a)
    I_h^{2h} w_{k'}^h = −sin²(kπ/(2n)) w_k^{2h} = −s_k w_k^{2h},  (34b)

where k ∈ [1, n/2), k' = n − k. In addition, I_h^{2h} w_{n/2}^h = 0.

Proof. For the smooth mode,

    (I_h^{2h} w_k^h)_j = (1/4) sin((2j−1)kπ/n) + (1/2) sin(2jkπ/n) + (1/4) sin((2j+1)kπ/n)
                       = (1/2)(1 + cos(kπ/n)) sin(2jkπ/n) = cos²(kπ/(2n)) w^{2h}_{k,j},

where the last step uses Proposition 13. As for the HF mode, follow the same procedure, but replace k with n − k, use Proposition 8 for aliasing, and notice that the fine-grid index 2j is even.

The full-weighting operator thus maps a pair of complementary modes to multiples of the same coarse-grid smooth mode, which might be an HF mode on the coarse grid.

Lemma 21. The action of the interpolation operator on Ω^{2h} is

    I_{2h}^h w_k^{2h} = c_k w_k^h − s_k w_{k'}^h,    (35)

where k' = n − k.

Proof. Proposition 19 and trigonometric identities yield

    (c_k w_k^h − s_k w_{k'}^h)_j = [cos²(kπ/(2n)) + (−1)^j sin²(kπ/(2n))] w^h_{k,j}
        = w^h_{k,j} for j even;  cos(kπ/n) w^h_{k,j} for j odd.

On the other hand, by Definition 15 we have

    (I_{2h}^h w_k^{2h})_j = w^{2h}_{k,j/2} = w^h_{k,j} for j even;
    (I_{2h}^h w_k^{2h})_j = (1/2)[sin((j−1)kπ/n) + sin((j+1)kπ/n)] = cos(kπ/n) w^h_{k,j} for j odd.

Hence the two sides of (35) agree. The range of the interpolation operator therefore contains both smooth and oscillatory modes. In other words, it excites oscillatory modes on the fine grid.
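Lemmas 20 and 21 can be checked numerically. A NumPy sketch (the operator implementations and index conventions are my own, following (27) and (29) with homogeneous Dirichlet ends):

```python
import numpy as np

n = 64
k = 5
kp = n - k                      # complementary wavenumber k' = n - k
j = np.arange(1, n)             # interior fine-grid nodes x_j = j/n
wk = np.sin(j * k * np.pi / n)
wkp = np.sin(j * kp * np.pi / n)

ck = np.cos(k * np.pi / (2 * n)) ** 2
sk = np.sin(k * np.pi / (2 * n)) ** 2

def restrict(v):
    """Full weighting (27): v2h_j = (v_{2j-1} + 2 v_{2j} + v_{2j+1}) / 4."""
    # v stores interior values v_1..v_{n-1}, so fine node 2j-1 sits at index 2j-2.
    return 0.25 * (v[0:-2:2] + 2 * v[1:-1:2] + v[2::2])

def prolong(e2):
    """Linear interpolation (29) from n/2 - 1 coarse to n - 1 fine values."""
    e = np.zeros(2 * (e2.size + 1) - 1)
    e[1:-1:2] = e2                               # even fine nodes copy coarse values
    padded = np.concatenate(([0.0], e2, [0.0]))  # homogeneous Dirichlet ends
    e[0::2] = 0.5 * (padded[:-1] + padded[1:])   # odd fine nodes average neighbors
    return e

jc = np.arange(1, n // 2)
wk2h = np.sin(jc * k * np.pi / (n // 2))

# Lemma 20: restriction maps the complementary pair onto the same coarse mode.
assert np.allclose(restrict(wk), ck * wk2h)
assert np.allclose(restrict(wkp), -sk * wk2h)

# Lemma 21: interpolation excites the oscillatory partner with amplitude s_k.
assert np.allclose(prolong(wk2h), ck * wk - sk * wkp)
```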
However, if k ≪ n, the amplitudes of these excited HF modes are small: s_k ∼ O(k²/n²).

Theorem 22. The two-grid correction operator is invariant on the subspaces W_k^h = span{w_k^h, w_{k'}^h}:

    TG w_k = λ_k^{ν₁+ν₂} s_k w_k + λ_k^{ν₁} λ_{k'}^{ν₂} s_k w_{k'},       (36a)
    TG w_{k'} = λ_{k'}^{ν₁} λ_k^{ν₂} c_k w_k + λ_{k'}^{ν₁+ν₂} c_k w_{k'},  (36b)

where λ_k denotes the eigenvalue λ_k(T_ω).

Proof. Consider first the case ν₁ = ν₂ = 0. Then

    A^h w_k^h = 4 s_k w_k^h    (37a)
    ⟹ I_h^{2h} A^h w_k^h = 16 c_k s_k w_k^{2h}    (37b)
    ⟹ (A^{2h})⁻¹ I_h^{2h} A^h w_k^h = 16 c_k s_k w_k^{2h} / (4 sin²(kπ/n)) = w_k^{2h}    (37c)
    ⟹ −I_{2h}^h (A^{2h})⁻¹ I_h^{2h} A^h w_k^h = −c_k w_k^h + s_k w_{k'}^h    (37d)
    ⟹ [I − I_{2h}^h (A^{2h})⁻¹ I_h^{2h} A^h] w_k^h = s_k w_k^h + s_k w_{k'}^h,    (37e)

where the additional factor of 4 in (37b) comes from the fact that the residual is scaled by h², and the trigonometric identity sin 2θ = 2 sin θ cos θ is applied in (37c). Similarly,

    A^h w_{k'}^h = 4 s_{k'} w_{k'}^h = 4 c_k w_{k'}^h    (38a)
    ⟹ I_h^{2h} A^h w_{k'}^h = −16 c_k s_k w_k^{2h}    (38b)
    ⟹ (A^{2h})⁻¹ I_h^{2h} A^h w_{k'}^h = −16 c_k s_k w_k^{2h} / (4 sin²(kπ/n)) = −w_k^{2h}    (38c)
    ⟹ −I_{2h}^h (A^{2h})⁻¹ I_h^{2h} A^h w_{k'}^h = c_k w_k^h − s_k w_{k'}^h    (38d)
    ⟹ [I − I_{2h}^h (A^{2h})⁻¹ I_h^{2h} A^h] w_{k'}^h = c_k w_k^h + c_k w_{k'}^h.    (38e)

Note that in (38a) we have used c_k = s_{k'}. Adding pre-smoothing incurs a scaling of λ_k^{ν₁} for (37e) and λ_{k'}^{ν₁} for (38e). In contrast, adding post-smoothing incurs a scaling of λ_k^{ν₂} for w_k^h and a scaling of λ_{k'}^{ν₂} for w_{k'}^h in both (37e) and (38e). Hence (36) holds.

(36) can be rewritten as

    TG (w_k, w_{k'})ᵀ = ( c₁  c₂ ; c₃  c₄ ) (w_k, w_{k'})ᵀ,    (39)

where c₁ = λ_k^{ν₁+ν₂} s_k, c₂ = λ_k^{ν₁} λ_{k'}^{ν₂} s_k, c₃ = λ_{k'}^{ν₁} λ_k^{ν₂} c_k, c₄ = λ_{k'}^{ν₁+ν₂} c_k. For k ≪ n, although λ_k^{ν₁+ν₂} ≈ 1, s_k ∼ k²/n², hence c₁ ≪ 1. Also λ_{k'}^{ν₁} ≪ 1 and s_k ≪ 1, hence c₂, c₃, c₄ ≪ 1. See Figure 1 for examples.
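The coefficients c₁–c₄ of (39) can be tabulated directly. A small Python sketch (the function name is mine; ω = 2/3 throughout); for ν₁ = 2, ν₂ = 0 the worst coefficient over all complementary pairs comes out near 0.1, consistent with the objective ‖TG‖ ≈ 0.1 stated in Section 5.1:

```python
import numpy as np

def damping_coeffs(k, n, nu1, nu2, omega=2.0 / 3.0):
    """Entries c1..c4 of the 2x2 block (39) of TG on span{w_k, w_k'}, k' = n - k."""
    kp = n - k
    lam = 1 - 2 * omega * np.sin(k * np.pi / (2 * n)) ** 2    # lambda_k(T_omega)
    lamp = 1 - 2 * omega * np.sin(kp * np.pi / (2 * n)) ** 2  # lambda_k'(T_omega)
    sk = np.sin(k * np.pi / (2 * n)) ** 2
    ck = np.cos(k * np.pi / (2 * n)) ** 2
    c1 = lam ** (nu1 + nu2) * sk
    c2 = lam ** nu1 * lamp ** nu2 * sk
    c3 = lamp ** nu1 * lam ** nu2 * ck
    c4 = lamp ** (nu1 + nu2) * ck
    return c1, c2, c3, c4

n = 64
# With nu1 = 2 pre-smoothings, every coefficient is small even for a smooth mode:
print(damping_coeffs(4, n, nu1=2, nu2=0))
# The worst case over all pairs stays bounded well below 1:
worst = max(abs(c) for k in range(1, n // 2)
                   for c in damping_coeffs(k, n, 2, 0))
print(worst)
```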
[Figure 1 (six panels): (a) ν₁ = 0, ν₂ = 0; (b) ν₁ = 0, ν₂ = 2; (c) ν₁ = 1, ν₂ = 1; (d) ν₁ = 2, ν₂ = 0; (e) ν₁ = 2, ν₂ = 2; (f) ν₁ = 4, ν₂ = 0. Each panel plots c₁, c₂, c₃, c₄ against k.]

Figure 1: The damping coefficients of two-grid correction with weighted Jacobi for n = 64. The x-axis is k.

5.2 The algebraic picture

Lemma 23. The full-weighting operator and the linear-interpolation operator satisfy the variational properties

    I_{2h}^h = c (I_h^{2h})ᵀ,  c ∈ R,    (40a)
    I_h^{2h} A^h I_{2h}^h = A^{2h}.      (40b)

(40b) is also called the Galerkin condition.

Proposition 24. A basis for the range of the interpolation operator R(I_{2h}^h) is given by its columns, hence dim R(I_{2h}^h) = n/2 − 1 and N(I_{2h}^h) = {0}.

Proof. R(I_{2h}^h) = {I_{2h}^h v^{2h} : v^{2h} ∈ Ω^{2h}}, so the maximum dimension of R(I_{2h}^h) is n/2 − 1. Any v^{2h} can be expressed as v^{2h} = Σ_j v_j^{2h} e_j^{2h}, and it is obvious that the columns of I_{2h}^h are linearly independent.

Corollary 25. For the full-weighting operator,

    dim R(I_h^{2h}) = n/2 − 1,   dim N(I_h^{2h}) = n/2.    (41)

Proof. The fundamental theorem of linear algebra states that for a matrix A : R^n → R^m,

    R^m = R(A) ⊕ N(Aᵀ),    (42)
    R^n = R(Aᵀ) ⊕ N(A),    (43)

where R(A) ⊥ N(Aᵀ) and R(Aᵀ) ⊥ N(A). If A has rank r, then from the singular value decomposition A = UΣVᵀ we have

    R(A) = span{U₁, U₂, . . . , U_r},              (44)
    N(A) = span{V_{r+1}, V_{r+2}, . . . , V_n},    (45)
    R(Aᵀ) = span{V₁, V₂, . . . , V_r},             (46)
    N(Aᵀ) = span{U_{r+1}, U_{r+2}, . . . , U_m}.   (47)

See Figure 5.7 on page 85. The rest of the proof follows from (40).

Proposition 26.
A basis for the null space of the full-weighting operator is given by

    N(I_h^{2h}) = span{A^h e_j^h : j is odd},    (48)

where e_j^h is the jth unit vector on Ω^h.

Proof. Consider I_h^{2h} A^h. The jth row of I_h^{2h} has 2(j − 1) leading zeros, and the next three nonzero entries are 1/4, 1/2, 1/4. Since A^h has bandwidth 3, it suffices to consider only three columns of A^h for a potentially nonzero inner product Σ_ℓ (I_h^{2h})_{jℓ} (A^h)_{ℓk}. For k = 2j ± 1, these inner products are zero; for k = 2j, the inner product is 1/2. Hence every odd column of A^h lies in N(I_h^{2h}), and since A^h is invertible these n/2 columns are linearly independent. The basis vectors of N(I_h^{2h}) are thus of the form (0, 0, . . . , −1, 2, −1, . . . , 0, 0)ᵀ. Hence N(I_h^{2h}) consists of both smooth and oscillatory modes.

Theorem 27. The null space of the two-grid correction operator is the range of interpolation:

    N(TG) = R(I_{2h}^h).    (49)

Proof. If s^h ∈ R(I_{2h}^h), then s^h = I_{2h}^h q^{2h} and

    TG s^h = [I − I_{2h}^h (A^{2h})⁻¹ I_h^{2h} A^h] I_{2h}^h q^{2h} = 0,

where the last step comes from (40b). Now consider t^h ∈ N(I_h^{2h} A^h); by Proposition 26 and the invertibility of A^h, N(I_h^{2h} A^h) = span{e_j^h : j is odd}. Then

    TG t^h = [I − I_{2h}^h (A^{2h})⁻¹ I_h^{2h} A^h] t^h = t^h,

i.e., TG is the identity operator when acting on N(I_h^{2h} A^h). This subspace has dimension n/2 and intersects N(TG) only at 0, so dim N(TG) ≤ (n − 1) − n/2 = dim R(I_{2h}^h), which completes the proof.

Now that we have both the spectral decomposition Ω^h = L ⊕ H and the subspace decomposition Ω^h = R(I_{2h}^h) ⊕ N(I_h^{2h}), the combination of relaxation with TG correction is equivalent to projecting the initial error vector to the L axis and then to the N axis. Repeating this process reduces the error vector to the origin; see Figures 5.8–5.11 for an illustration.

6 Multigrid cycles

Definition 28. The V-cycle scheme is an algorithm

    v^h ← V^h(v^h, f^h, ν₁, ν₂)    (50)

with the following steps:
1) Relax ν₁ times on A^h u^h = f^h with a given initial guess v^h.
2) If Ω^h is the coarsest grid, go to step 4); otherwise set f^{2h} ← I_h^{2h}(f^h − A^h v^h), v^{2h} ← 0, and recurse: v^{2h} ← V^{2h}(v^{2h}, f^{2h}, ν₁, ν₂).
3) Interpolate the error back and correct the solution: v^h ← v^h + I_{2h}^h v^{2h}.
4) Relax ν₂ times on A^h u^h = f^h with the initial guess v^h.
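Definition 28 can be sketched compactly in NumPy. In this sketch (all names mine) the smoother is weighted Jacobi (15)–(16) with ω = 2/3, the transfer operators are (27) and (29), and the coarsest grid is solved directly, which is my choice rather than part of the definition; the factor of 4 when restricting the residual reflects the h² scaling of f (cf. the remark below (37)):

```python
import numpy as np

def relax(v, f, nu, omega=2.0 / 3.0):
    """nu sweeps of weighted Jacobi (15)-(16) for the scaled system (4)."""
    for _ in range(nu):
        left = np.concatenate(([0.0], v[:-1]))   # v_{j-1}, Dirichlet zeros at ends
        right = np.concatenate((v[1:], [0.0]))   # v_{j+1}
        v = (1 - omega) * v + omega * 0.5 * (left + right + f)
    return v

def residual(v, f):
    """r = f - A v for the tridiagonal A of Definition 1."""
    left = np.concatenate(([0.0], v[:-1]))
    right = np.concatenate((v[1:], [0.0]))
    return f - (2 * v - left - right)

def restrict(r):
    """Full weighting (27)."""
    return 0.25 * (r[0:-2:2] + 2 * r[1:-1:2] + r[2::2])

def prolong(e2):
    """Linear interpolation (29) with homogeneous Dirichlet boundaries."""
    e = np.zeros(2 * (e2.size + 1) - 1)
    e[1:-1:2] = e2
    padded = np.concatenate(([0.0], e2, [0.0]))
    e[0::2] = 0.5 * (padded[:-1] + padded[1:])
    return e

def v_cycle(v, f, nu1, nu2):
    """Definition 28; the direct coarsest-grid solve is my choice."""
    if v.size == 1:
        return f / 2.0                       # A^h = [2] on the coarsest grid
    v = relax(v, f, nu1)
    f2 = 4.0 * restrict(residual(v, f))      # factor 4: h^2 -> (2h)^2 rescaling
    v = v + prolong(v_cycle(np.zeros(f2.size), f2, nu1, nu2))
    return relax(v, f, nu2)

# The residual reduction per V(2,1)-cycle is roughly independent of n
# (motivation bullet 1):
for n in (64, 256):
    x = np.arange(1, n) / n
    f = (np.pi ** 2 / n ** 2) * np.sin(np.pi * x)   # h^2-scaled right-hand side
    v = np.zeros(n - 1)
    norms = []
    for _ in range(5):
        v = v_cycle(v, f, nu1=2, nu2=1)
        norms.append(np.max(np.abs(residual(v, f))))
    print(n, norms[-1] / norms[-2])   # a small constant for both n
```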
Definition 29. The full multigrid V-cycle (FMG) is an algorithm

    v^h ← FMG^h(f^h, ν₁, ν₂)    (51)

with the following steps:
1) If Ω^h is the coarsest grid, set v^h ← 0 and go to step 3); otherwise set f^{2h} ← I_h^{2h} f^h and recurse: v^{2h} ← FMG^{2h}(f^{2h}, ν₁, ν₂).
2) Correct v^h ← I_{2h}^h v^{2h}.
3) Perform a V-cycle with initial guess v^h: v^h ← V^h(v^h, f^h, ν₁, ν₂).

See Figure 3.6 for the above two methods. Note that in Figure 3.6(c) the initial descent to the coarsest grid is missing.
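Definition 29 admits an equally short sketch. The block below is self-contained (all names mine; weighted Jacobi smoothing, full weighting (27), linear interpolation (29), a direct coarsest-grid solve, and a factor of 4 on restricted right-hand sides to account for the h² scaling are all my implementation choices); a single FMG call should reach an error comparable to the discretization error:

```python
import numpy as np

def relax(v, f, nu, omega=2.0 / 3.0):
    """Weighted Jacobi (15)-(16) with homogeneous Dirichlet ends."""
    for _ in range(nu):
        left = np.concatenate(([0.0], v[:-1]))
        right = np.concatenate((v[1:], [0.0]))
        v = (1 - omega) * v + omega * 0.5 * (left + right + f)
    return v

def residual(v, f):
    left = np.concatenate(([0.0], v[:-1]))
    right = np.concatenate((v[1:], [0.0]))
    return f - (2 * v - left - right)

def restrict(r):                             # full weighting (27)
    return 0.25 * (r[0:-2:2] + 2 * r[1:-1:2] + r[2::2])

def prolong(e2):                             # linear interpolation (29)
    e = np.zeros(2 * (e2.size + 1) - 1)
    e[1:-1:2] = e2
    padded = np.concatenate(([0.0], e2, [0.0]))
    e[0::2] = 0.5 * (padded[:-1] + padded[1:])
    return e

def v_cycle(v, f, nu1, nu2):                 # Definition 28
    if v.size == 1:
        return f / 2.0                       # direct solve: A = [2]
    v = relax(v, f, nu1)
    f2 = 4.0 * restrict(residual(v, f))      # h^2 -> (2h)^2 rescaling
    v = v + prolong(v_cycle(np.zeros(f2.size), f2, nu1, nu2))
    return relax(v, f, nu2)

def fmg(f, nu1, nu2):                        # Definition 29
    if f.size == 1:
        v = np.zeros(1)                      # step 1), coarsest grid
    else:
        v = prolong(fmg(4.0 * restrict(f), nu1, nu2))   # steps 1) and 2)
    return v_cycle(v, f, nu1, nu2)           # step 3)

n = 256
x = np.arange(1, n) / n
f = (np.pi ** 2 / n ** 2) * np.sin(np.pi * x)   # h^2-scaled right-hand side
v = fmg(f, nu1=2, nu2=1)
err = np.max(np.abs(v - np.sin(np.pi * x)))     # error against u = sin(pi x)
print(err)
```

Because each level costs O(number of unknowns on that level) and the level sizes form a geometric series, the total work of one FMG call is O(n), the complexity claimed in the motivation.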