14: GMRES
Math 639 (updated: January 2, 2012)

In this class, we consider the GMRES method. This, again, is a minimization method based on the Krylov space $K_m(A)$, where $A$ is a general $n \times n$ real matrix. Although this method is often developed using the Euclidean inner product and norm, we shall consider a slightly more general method utilizing an arbitrary inner product $\langle \cdot, \cdot \rangle$ defined on $\mathbb{R}^n \times \mathbb{R}^n$. We shall let $\|\cdot\|$ denote the corresponding norm.

Mathematically, the GMRES method computes $x_i = x_0 + \theta$ with $\theta \in K_i(A)$ chosen so that

(14.1)    $\|r(\theta)\| = \min_{\zeta \in K_i(A)} \|r(\zeta)\|.$

Here, as in the previous class, $r(\zeta) = b - A(x_0 + \zeta) = r_0 - A\zeta$. The GMRES method is usually applied to general matrix problems ($A$ is not symmetric or self-adjoint with respect to $\langle \cdot, \cdot \rangle$). There are several algorithms which can be developed to solve the GMRES minimization problem (14.1), and there are various claims in the literature as to which ones are more numerically stable. We shall develop a basic GMRES implementation.

We start by introducing the following lemma.

Lemma 1. There is a unique solution $x_i = x_0 + \theta$ with $\theta \in K_i$ solving (14.1). It is characterized as the unique vector of this form satisfying

(14.2)    $\langle b - Ax_i, A\zeta \rangle = 0$ for all $\zeta \in K_i$.

This lemma and its proof are completely analogous to Lemma 1 and its proof in Class 12. Given a basis $\{v_1, \ldots, v_l\}$ for $K_i$ (as usual, we take the dimension of $K_i$ to be $l$, which may be less than $i$), the proof sets up a matrix problem for computing $\theta$. Specifically,

(14.3)    $\theta = \sum_{j=1}^{l} c_j v_j.$

The condition (14.2) is equivalent to

$\langle A(x - x_0 - \sum_{j=1}^{l} c_j v_j), A v_m \rangle = 0$ for $m = 1, \ldots, l$

or

$\sum_{j=1}^{l} c_j \langle A v_j, A v_m \rangle = \langle b - A x_0, A v_m \rangle$ for $m = 1, \ldots, l$.

This is the same as the matrix problem

(14.4)    $N c = F$

where $N_{j,m} = \langle A v_m, A v_j \rangle$ and $F_m = \langle b - A x_0, A v_m \rangle$, $j, m = 1, \ldots, l$. That $N$ is nonsingular follows as in the proof of Lemma 1 of Class 12, but using the fact that $\langle A w, A w \rangle \neq 0$ when $w \neq 0$.

Remark 1. We get one implementation of GMRES by taking $\{v_1, \ldots, v_l\} \equiv \{r_0, A r_0, \ldots, A^{l-1} r_0\}$ and solving the system (14.4). This is not the typical implementation of GMRES. In fact, if the condition number of $A$ is large, then the above method tends to be unstable and can lead to blowup. For example, if $A$ has $10^4$ as an eigenvalue, then $A^{50}$ has $10^{200}$ as an eigenvalue.

To make the algorithm more stable, one uses an orthonormal basis for $K_l$. A typical implementation starts from the Arnoldi algorithm. This algorithm builds an orthonormal basis for the Krylov space.

Algorithm 1. (Arnoldi) Given $v_1 \equiv r_0 / \|r_0\|$, we compute as follows:
For $j = 1, 2, \ldots, l$ do:
(1) Compute $h_{i,j} = \langle A v_j, v_i \rangle$ for $i = 1, \ldots, j$.
(2) Compute $w_j = A v_j - \sum_{i=1}^{j} h_{i,j} v_i$.
(3) Set $h_{j+1,j} = \|w_j\|$.
(4) If $h_{j+1,j} = 0$, then STOP.
(5) Set $v_{j+1} = h_{j+1,j}^{-1} w_j$.
Enddo

Assuming that the above process does not stop before the $l$'th step, it defines $l + 1$ vectors $v_1, v_2, \ldots, v_{l+1}$ and an $(l+1) \times l$ (Hessenberg) matrix $\tilde{H}_l$ given by

$(\tilde{H}_l)_{i,j} = \begin{cases} h_{i,j} & \text{if } i \le j + 1, \\ 0 & \text{otherwise.} \end{cases}$

An (upper) Hessenberg matrix is a matrix for which $h_{i,j} = 0$ when $i > j + 1$. We shall use the following two lemmas to define our GMRES implementation.

Lemma 2. Assume that the above algorithm does not stop before the $l$'th step. Then the vectors $v_1, v_2, \ldots, v_l$ form a $\langle \cdot, \cdot \rangle$-orthonormal basis for the Krylov space $K_l(A)$.

Lemma 3. Assume that the above algorithm does not stop before the $l$'th step. For $i = 1, \ldots, l$,

$A v_i = \sum_{k=1}^{i+1} (\tilde{H}_l)_{k,i} v_k.$
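For concreteness, here is a minimal Python sketch of Algorithm 1, assuming the Euclidean inner product and norm; the use of NumPy and the function name `arnoldi` are illustrative choices, not part of the notes.

```python
import numpy as np

def arnoldi(A, r0, l, tol=1e-14):
    """Sketch of Algorithm 1 (Arnoldi) with the Euclidean inner product.

    Returns the basis vectors v_1, ..., v_{l+1} as columns of V and the
    (l+1) x l Hessenberg matrix H_tilde; stops early if h_{j+1,j} = 0
    (tested against a small tolerance).
    """
    n = len(r0)
    V = np.zeros((n, l + 1))
    H = np.zeros((l + 1, l))
    V[:, 0] = r0 / np.linalg.norm(r0)            # v_1 = r_0 / ||r_0||
    for j in range(l):
        Av = A @ V[:, j]
        H[: j + 1, j] = V[:, : j + 1].T @ Av     # h_{i,j} = <A v_j, v_i>, i = 1,...,j
        w = Av - V[:, : j + 1] @ H[: j + 1, j]   # w_j = A v_j - sum_i h_{i,j} v_i
        H[j + 1, j] = np.linalg.norm(w)          # h_{j+1,j} = ||w_j||
        if H[j + 1, j] < tol:                    # STOP
            return V[:, : j + 1], H[: j + 2, : j + 1]
        V[:, j + 1] = w / H[j + 1, j]            # v_{j+1} = h_{j+1,j}^{-1} w_j
    return V, H
```

With this output one can check Lemmas 2 and 3 numerically: `V.T @ V` should be (nearly) the identity, and `A @ V[:, :l]` should equal `V @ H`.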
We shall eventually prove these lemmas, but first we shall show how they enable us to compute the solution of the GMRES minimization. Expanding our unknown $\theta \in K_l$ as in (14.3) and using Lemma 3 gives (with $\beta = \|r_0\|$)

$b - A(x_0 + \theta) = \beta v_1 - A \sum_{i=1}^{l} c_i v_i = \beta v_1 - \sum_{i=1}^{l} \sum_{k=1}^{i+1} (\tilde{H}_l)_{k,i} c_i v_k.$

It follows from Lemma 2 that (Why?)

$\|b - A(x_0 + \theta)\|^2 = \|r_0 - A\theta\|^2 = \|\beta e_1 - \tilde{H}_l c\|_{\ell^2}^2$

where $c = (c_1, \ldots, c_l)^t$ and $e_1 = (1, 0, 0, \ldots, 0)^t$. The desired coefficients satisfy the matrix minimization problem

$\|\beta e_1 - \tilde{H}_l c\|_{\ell^2}^2 = \min_{d \in \mathbb{R}^l} \|\beta e_1 - \tilde{H}_l d\|_{\ell^2}^2.$

As in the proof of Lemma 1 of Class 12, its solution satisfies

$(\beta e_1 - \tilde{H}_l c, \tilde{H}_l d) = 0$ for all $d \in \mathbb{R}^l$.

This can be rewritten

(14.5)    $(\tilde{H}_l)^t \tilde{H}_l c = \beta (\tilde{H}_l)^t e_1.$

Our GMRES algorithm can be restated as follows:

Algorithm 2. (GMRES) Compute the approximate solution $x_l$ by the following steps.
(1) Run the Arnoldi algorithm for $l$ steps, defining the Hessenberg matrix $\tilde{H}_l$ and the orthonormal basis $\{v_1, v_2, \ldots, v_l\}$.
(2) Compute the coefficients of $\theta \in K_l$ in $\{v_1, v_2, \ldots, v_l\}$ by solving the matrix (minimization) problem (14.5).
(3) Compute $x_l$ by
$x_l = x_0 + \theta = x_0 + \sum_{i=1}^{l} c_i v_i.$

The above algorithm involves significantly more work and memory than the conjugate gradient algorithm. Note that the second item in the $j$'th step of the Arnoldi algorithm involves a linear combination of the $j$ vectors $v_1, \ldots, v_j$. In addition, all of the vectors in $\{v_1, v_2, \ldots, v_l\}$ must be stored for use in Step 3 above.

It is natural to ask if GMRES converges and when it can be used as part of an iterative strategy. A simple answer to the first question is that for a nonsingular matrix $A$, GMRES converges in at most $n$ iterations, where $n$ is the dimension of the problem. Indeed, the Cayley-Hamilton Theorem implies that $P(A) = 0$ where $P$ is the characteristic polynomial. Let $c_i$ be the coefficient of $A^i$ in $P$; since $A$ is nonsingular, $c_0 = P(0) = \pm\det(A) \neq 0$. Then a simple manipulation gives

$A^{-1} = -\frac{1}{c_0}\left(c_1 + c_2 A + \cdots + c_n A^{n-1}\right)$

and hence

$e_0 = A^{-1} r_0 = -\frac{1}{c_0}\left(c_1 + c_2 A + \cdots + c_n A^{n-1}\right) r_0 \in K_n.$

This means that there is a $\theta \in K_n$ with $e_0 = \theta$, i.e., $x = x_0 + \theta$ and

$r(\theta) = b - A(x_0 + \theta) = A(x - x_0 - \theta) = 0$

solves the minimization problem (14.1). It follows that $x_n = x_0 + \theta$.

Example 1. Unfortunately, the above result is sharp unless further assumptions are made on the matrix $A$. Indeed, consider the right circular shift operator on $\mathbb{R}^n$ defined by

$(x_1, x_2, \ldots, x_n) \to (x_n, x_1, x_2, \ldots, x_{n-1}).$

This is a linear mapping (what is its matrix?) of $\mathbb{R}^n$ onto itself. It is clearly nonsingular. Consider applying GMRES with inner product $(\cdot, \cdot)$ to a problem with an initial residual of $(1, 0, 0, \ldots, 0)$. For $l < n$,

$K_l = \{(c_1, c_2, \ldots, c_l, 0, 0, \ldots, 0) : c_i \in \mathbb{R}\}.$

Thus, the vectors appearing in the minimization on the right-hand side of (14.1) are of the form

$r(\zeta) = (1, -c_1, -c_2, -c_3, \ldots, -c_l, 0, 0, \ldots, 0).$

The solution of (14.1) occurs when $c_1 = c_2 = \cdots = c_l = 0$. Thus, GMRES does nothing for this problem until the $n$'th step, when it produces the solution. This example illustrates that the Krylov space may not provide a good approximation to the error in general.

It is interesting to note that for this example, the normal equations immediately produce the solution. In fact, the transpose of the right circular shift $A$ is a left circular shift (check this!), so that their product is the identity. Thus, the normal equations $A^t A x = A^t b$ corresponding to $Ax = b$ reduce to $x = A^t b$.
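To make Algorithm 2 and Example 1 concrete, here is a short Python sketch, again assuming the Euclidean inner product and reusing the `arnoldi` function sketched above; the function name `gmres` and the choice $n = 8$ are illustrative only. It computes $x_l$ by solving the small least-squares problem for $\beta e_1 - \tilde{H}_l c$, which is equivalent to the normal equations (14.5), and then reproduces the stagnation described in Example 1.

```python
import numpy as np

def gmres(A, b, x0, l):
    """Sketch of Algorithm 2: l Arnoldi steps, then solve (14.5).

    The normal equations (14.5) are solved here as the equivalent
    least-squares problem  min_c || beta*e_1 - H_tilde c ||.
    """
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    V, H = arnoldi(A, r0, l)
    m = H.shape[1]                                # may be < l if Arnoldi stopped early
    rhs = np.zeros(H.shape[0])
    rhs[0] = beta                                 # beta * e_1
    c, *_ = np.linalg.lstsq(H, rhs, rcond=None)   # coefficients c_1, ..., c_m
    return x0 + V[:, :m] @ c                      # x_l = x_0 + sum_i c_i v_i

# Example 1: right circular shift on R^n, with b = (1, 0, ..., 0) and x_0 = 0.
n = 8
A = np.roll(np.eye(n), 1, axis=0)                 # (x_1,...,x_n) -> (x_n, x_1,...,x_{n-1})
b = np.zeros(n)
b[0] = 1.0
for l in range(1, n + 1):
    x_l = gmres(A, b, np.zeros(n), l)
    print(l, np.linalg.norm(b - A @ x_l))         # stays at 1 until l = n, then drops to 0
```

Solving the small least-squares problem with `np.linalg.lstsq` avoids forming $(\tilde{H}_l)^t \tilde{H}_l$ explicitly; production GMRES codes typically update a QR factorization of $\tilde{H}_l$ with Givens rotations instead.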
A common occurrence in practice involves positive definite matrices, i.e., a matrix $A$ which, although possibly not symmetric, satisfies

(14.6)    $\langle Ax, x \rangle > 0$ when $x \neq 0$.

We shall investigate the convergence of GMRES in this case in terms of positive constants $\alpha, \beta$ appearing in the following two inequalities:

(14.7)    $\alpha \|x\|^2 \le \langle Ax, x \rangle$ for all $x \in \mathbb{R}^n$,
          $\langle Ax, y \rangle \le \beta \|x\| \, \|y\|$ for all $x, y \in \mathbb{R}^n$.

The first inequality above is often called "coercivity". When $A$ satisfies (14.6), there exist constants $\alpha, \beta > 0$ satisfying (14.7), and vice versa. It is immediate from the two inequalities that $\alpha \|x\|^2 \le \beta \|x\|^2$, so $\alpha \le \beta$.

Recall that the key to the analysis of CG and MINRES was to show that a linear method with a proper parameter choice converged. Then, we used the minimization property to conclude the desired result. We shall do the same for GMRES. We start by considering the Richardson method

(14.8)    $x_{i+1} = x_i + \tau (b - A x_i).$

Let $e_i$ denote the error $x - x_i$, where $x$ is the solution of $Ax = b$. Then we have the following proposition.

Proposition 1. Assume that $A$ satisfies (14.7). Let

$\tau = \frac{\alpha}{\beta^2}$ and $\rho = 1 - \frac{\alpha^2}{\beta^2}.$

Then the error for (14.8) satisfies

$\|e_{i+1}\| \le \sqrt{\rho} \, \|e_i\|.$

Proof. We clearly have

$e_{i+1} = e_i - \tau A e_i$

so

$\|e_{i+1}\|^2 = \langle e_i - \tau A e_i, e_i - \tau A e_i \rangle = \|e_i\|^2 - 2\tau \langle A e_i, e_i \rangle + \tau^2 \|A e_i\|^2.$

Applying (14.7) gives

$\|A e_i\|^2 = \langle A e_i, A e_i \rangle \le \beta \|e_i\| \, \|A e_i\|.$

It follows that

$\|A e_i\| \le \beta \|e_i\|.$

Combining the above inequalities and using the other inequality in (14.7) gives

$\|e_{i+1}\|^2 \le \left(1 - 2\tau\alpha + \tau^2 \beta^2\right) \|e_i\|^2 = \left(1 - \frac{\alpha^2}{\beta^2}\right) \|e_i\|^2.$

The proposition immediately follows.
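As a quick numerical illustration of Proposition 1, the sketch below runs the Richardson iteration (14.8) with $\tau = \alpha/\beta^2$ and compares the observed error reduction with the guaranteed factor $\sqrt{\rho}$; the specific nonsymmetric positive definite matrix, the choice of admissible $\alpha$ and $\beta$, and the manufactured solution are assumptions made only for this example.

```python
import numpy as np

# A nonsymmetric but positive definite test matrix (an assumption for this
# example): identity plus a small tridiagonal part plus a small skew part,
# so <Ax, x> > 0 for x != 0.
n = 50
rng = np.random.default_rng(0)
S = rng.standard_normal((n, n))
S = (S - S.T) / 10.0                                   # skew-symmetric part
A = np.eye(n) + 0.3 * (np.eye(n, k=1) + np.eye(n, k=-1)) + S

# Admissible constants for (14.7) in the Euclidean inner product:
alpha = np.linalg.eigvalsh((A + A.T) / 2).min()        # coercivity constant
beta = np.linalg.norm(A, 2)                            # boundedness constant
assert alpha > 0

tau = alpha / beta**2
rho = 1.0 - alpha**2 / beta**2

x = rng.standard_normal(n)                             # manufactured exact solution
b = A @ x
xi = np.zeros(n)
for i in range(20):
    e_old = np.linalg.norm(x - xi)
    xi = xi + tau * (b - A @ xi)                       # Richardson step (14.8)
    e_new = np.linalg.norm(x - xi)
    print(i, e_new / e_old, "<=", np.sqrt(rho))        # observed rate vs. sqrt(rho)
```

Each printed ratio should fall below $\sqrt{\rho}$, as the proposition guarantees.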