15: GMRES and preconditioned GMRES
Math 639 (updated: January 2, 2012)

The following theorem follows from the proposition of the previous class and the fact that GMRES is a minimizer.

Theorem 1. Assume that $A$ is an $n \times n$ real matrix which satisfies

(15.1)  $\alpha \|x\|^2 \le \langle Ax, x\rangle$ for all $x \in \mathbb{R}^n$,  and  $\langle Ax, y\rangle \le \beta \|x\|\,\|y\|$ for all $x, y \in \mathbb{R}^n$.

Suppose that $e_i = x - x_i$, where $x_i$ is the $i$'th iterate in the GMRES algorithm (with starting iterate $x_0$) and $x$ is the solution of $Ax = b$. Then

$\|Ae_i\| \le \rho^{i/2} \|Ae_0\|$,  where  $\rho = 1 - \dfrac{\alpha^2}{\beta^2}$.

Proof. We consider the inner product $\langle\langle x, y\rangle\rangle = \langle Ax, Ay\rangle$ for all $x, y \in \mathbb{R}^n$. We denote its corresponding norm by $\|x\|_* = \langle\langle x, x\rangle\rangle^{1/2}$. It follows from (15.1) that

(15.2)  $\langle\langle Ax, x\rangle\rangle = \langle A^2 x, Ax\rangle \ge \alpha \|Ax\|^2 = \alpha \|x\|_*^2$

and

(15.3)  $\langle\langle Ax, y\rangle\rangle = \langle A^2 x, Ay\rangle \le \beta \|Ax\|\,\|Ay\| = \beta \|x\|_*\,\|y\|_*$.

We consider the Richardson method

(15.4)  $\tilde x_{i+1} = \tilde x_i + \tau (b - A\tilde x_i)$

using the same initial vector as in the GMRES iteration, i.e., $\tilde x_0 = x_0$. Let $\tilde e_i = x - \tilde x_i$. Then (15.2), (15.3) and the proposition of last class imply that

$\|\tilde e_{i+1}\|_* \le \sqrt{\rho}\, \|\tilde e_i\|_*$.

Repeatedly applying the above inequality gives

$\|\tilde e_i\|_* \le \rho^{i/2} \|e_0\|_*$.

Now,

$\tilde e_i = (I - \tau A)^i e_0$,

so

$A \tilde e_i = (I - \tau A)^i r_0 = r_0 - A\zeta$  for some $\zeta \in K_i$.

By the minimization property of GMRES,

$\|Ae_i\| = \|r(\theta)\| = \min_{\zeta \in K_i(A)} \|r(\zeta)\| = \min_{\zeta \in K_i(A)} \|r_0 - A\zeta\| \le \|A\tilde e_i\| = \|\tilde e_i\|_* \le \rho^{i/2} \|e_0\|_* = \rho^{i/2} \|Ae_0\|$.

This completes the proof of the theorem.

An alternative to applying GMRES is to apply CG to the normal equations. Some argue against this approach since it generally squares the condition number. The hope is that GMRES might produce CG-like acceleration without squaring the condition number. This is actually true when $A$ is SPD; however, in that case one should obviously apply CG since it involves much less computation. Even in the case when $A$ satisfies (15.1), at least theoretically, GMRES may not lead to any computational advantage. Indeed, consider applying CG to the normal equations,

(15.5)  $A^* A x = A^* b$.

Here $A^*$ denotes the adjoint of $A$ with respect to $\langle\cdot,\cdot\rangle$. By the definition of $A^*$, the matrix $A^* A$ is self-adjoint with respect to $\langle\cdot,\cdot\rangle$. To estimate the convergence associated with conjugate gradient applied to (15.5), we need to estimate the condition number of $A^* A$. We first observe that

$\|Ax\|^2 = \langle Ax, Ax\rangle \le \beta \|x\|\,\|Ax\|$,

from which it follows that

$\langle A^* A x, x\rangle = \|Ax\|^2 \le \beta^2 \|x\|^2$.

Thus, the largest eigenvalue $\lambda_n$ of $A^* A$ satisfies

$\lambda_n = \sup_{x \in \mathbb{R}^n,\, x \ne 0} \dfrac{\langle A^* A x, x\rangle}{\langle x, x\rangle} = \sup_{x \in \mathbb{R}^n,\, x \ne 0} \dfrac{\langle Ax, Ax\rangle}{\|x\|^2} \le \beta^2$.

For the smallest eigenvalue $\lambda_1$, we start with the following fact:

(15.6)  $\|z\| = \sup_{y \in \mathbb{R}^n,\, y \ne 0} \dfrac{\langle z, y\rangle}{\|y\|}$.

This identity is an easy consequence of the Schwarz inequality. Taking $z = Ax$ gives

$\|Ax\| = \sup_{y \in \mathbb{R}^n,\, y \ne 0} \dfrac{\langle Ax, y\rangle}{\|y\|} \ge \dfrac{\langle Ax, x\rangle}{\|x\|} \ge \alpha \|x\|$.

Thus,

$\lambda_1 = \inf_{x \in \mathbb{R}^n,\, x \ne 0} \dfrac{\langle Ax, Ax\rangle}{\|x\|^2} \ge \alpha^2$.

It follows that the condition number of $A^* A$ is bounded by $\beta^2/\alpha^2$. Thus, the error $e_i$ resulting from CG applied to the normal equations satisfies

$\|Ae_i\| \le 2 \left( \dfrac{1 - \alpha/\beta}{1 + \alpha/\beta} \right)^{i} \|Ae_0\|$.

We see that CG provides an acceleration over GMRES (at least in theory), converging in $O(\beta/\alpha)$ iterations as opposed to the $O(\beta^2/\alpha^2)$ suggested by the above theorem.
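As a small illustration of this comparison (not part of the lecture), the following sketch computes the two theoretical rates for an arbitrary test matrix: $\alpha$ is taken as the smallest eigenvalue of the symmetric part of $A$ and $\beta$ as the spectral norm of $A$, so that (15.1) holds. The matrix, the random seed, and the iteration counts are made up purely for illustration.

```python
import numpy as np

# Illustrative sketch: compare the GMRES bound of Theorem 1 with the bound for
# CG applied to the normal equations, for a small nonsymmetric test matrix.
n = 50
rng = np.random.default_rng(0)
A = np.eye(n) + 0.2 * rng.standard_normal((n, n)) / np.sqrt(n)  # nonsymmetric, near identity

# alpha: smallest eigenvalue of the symmetric part, so alpha*||x||^2 <= <Ax, x>.
alpha = np.linalg.eigvalsh(0.5 * (A + A.T)).min()
# beta: spectral norm of A, so <Ax, y> <= beta*||x||*||y||.
beta = np.linalg.norm(A, 2)
assert alpha > 0, "Theorem 1 requires a positive definite symmetric part"

rho = 1.0 - (alpha / beta) ** 2                      # GMRES: ||Ae_i|| <= rho^(i/2) ||Ae_0||
cg_rate = (1 - alpha / beta) / (1 + alpha / beta)    # CG on A*A: ||Ae_i|| <= 2 cg_rate^i ||Ae_0||

for i in (10, 50, 100):
    print(f"i={i:4d}  GMRES bound: {rho ** (i / 2):.3e}   "
          f"CG (normal eq.) bound: {2 * cg_rate ** i:.3e}")
```

When $\beta/\alpha$ is large, the second bound decays much faster, which is the point of the comparison above.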
, j, < vj+1 , vl > = h−1 j+1,j < Avj − j X hi,j vi , vl > i=1 = h−1 j+1,j (< Avj , vl > −hl,j ) = 0. This is the key computation in the induction showing that v1 , , . . . , vl+1 are orthogonal. Hence v1 , . . . , vl+1 are orthonormal and span an l+1 dimensional space. A simple induction shows that vi ∈ Ki . As Ki has at most dimension i, v1 , . . . , vi is an orthonormal basis for Ki , for i = 1, 2, . . . , l + 1. Lemma 2. Assume that the Arnoldi algorithm does not stop before the l’th step. For i = 1, . . . , l, i+1 X (H̃l )k,i vk . Avi = k=1 4 Proof. From (2) of the Arnoli algorithm, wj = hj+1,j vj = Avj − j X hi,j vi . i=1 The lemma follows by rearrangement and the definition of H̃. Example 1. We consider a finite element or finite difference method for the boundary value problem: −∆u + v(x) · ∇u + q(x)u = f for x ∈ Ω, u = 0 for x ∈ ∂Ω. Here v is a vector field (a velocity field). The first term above involves higher order derivatives (order 2) while the remaining terms involve lower order derivatives (order 1 and 0, respectively). As we have already seen, discretization of the first term, e.g., with finite differences, leads to a symmetric and positive definite matrix A while the discretization of the full equation e Often, it gives a non-symmetric matrix (and possibly no longer definite) A. e satisfying is possible to develop an efficient preconditioner for A e αkxk2A ≤ (BAx, x)A , for all x ∈ Rn , (15.7) (BAx, y)A ≤ βkxkA kykA , for all x, y ∈ Rn , with constants α, β independent of h and hence n. The techniques for deriving such preconditioners and why they lead to the above inequalities is beyond the scope of this class. Note that Theorem 1 can be applied to the preconditioned equations and guarantees a uniform (independent of h) rate of iterative convergence. This result holds for the preconditioned GMRES method defined using < ·, · >= (·, ·)A . This convergence rate is not guaranteed for the preconditioned GMRES algorithm based on the standard (dot) inner product.
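To make the $( \cdot,\cdot)_A$ formulation concrete, here is an illustrative sketch (not from the lecture) of the Arnoldi process carried out in a user-supplied inner product, applied to a preconditioned operator $B\tilde A$. The function name, the matrices $A$, $\tilde A = A + N$, and the choice $B = A^{-1}$ are all hypothetical and chosen only so that the checks of Lemmas 1 and 2 can be run; they are not the preconditioners discussed in the lecture.

```python
import numpy as np

def arnoldi_in_inner_product(M_apply, inner, v, l):
    """Arnoldi process for the operator M_apply in the inner product `inner`.

    Returns vectors v_1, ..., v_{l+1} (columns of V), orthonormal in `inner`
    (Lemma 1), and the (l+1) x l Hessenberg matrix H_tilde with
    M V[:, :l] = V @ H_tilde (Lemma 2).  Truncated if the process breaks down.
    """
    n = v.shape[0]
    V = np.zeros((n, l + 1))
    H = np.zeros((l + 1, l))
    V[:, 0] = v / np.sqrt(inner(v, v))
    for j in range(l):
        w = M_apply(V[:, j])
        for i in range(j + 1):                  # Gram-Schmidt in the given inner product
            H[i, j] = inner(w, V[:, i])
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.sqrt(inner(w, w))
        if H[j + 1, j] < 1e-14:                 # (near) breakdown: Krylov space is invariant
            return V[:, :j + 1], H[:j + 1, :j + 1]
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

# Hypothetical data in the spirit of Example 1: A is SPD (second-order term),
# A_tilde = A + N is its nonsymmetric perturbation (lower-order terms), and
# B = A^{-1} plays the role of the preconditioner.
n = 30
rng = np.random.default_rng(1)
A = np.eye(n) + 0.1 * np.diag(np.arange(n))          # SPD
N = 0.05 * rng.standard_normal((n, n))               # nonsymmetric lower-order part
A_tilde = A + N
B = np.linalg.inv(A)

inner_A = lambda x, y: x @ (A @ y)                   # the (x, y)_A inner product
M_apply = lambda x: B @ (A_tilde @ x)                # preconditioned operator B * A_tilde

V, H = arnoldi_in_inner_product(M_apply, inner_A, rng.standard_normal(n), l=8)
# Lemma 1: the Arnoldi vectors are orthonormal in the A-inner product.
print(np.allclose(V.T @ A @ V, np.eye(V.shape[1]), atol=1e-10))
# Lemma 2: B * A_tilde * V_l = V_{l+1} * H_tilde.
MV = np.column_stack([M_apply(V[:, i]) for i in range(H.shape[1])])
print(np.allclose(MV, V @ H, atol=1e-10))
```

Running GMRES on top of this Arnoldi process with $\langle\cdot,\cdot\rangle = (\cdot,\cdot)_A$ is exactly the preconditioned GMRES method to which Theorem 1 applies under (15.7).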