13: CG-eigenvalues and MINRES
Math 639 (updated: January 2, 2012)

We start this class by investigating the use of CG to get bounds on the spectrum of A. As before, A is an SPD n × n real matrix. We start by considering the Rayleigh-Ritz eigenvalue approximation. For a symmetric matrix A,

(13.1)    λ_n = max_{x ∈ R^n, x ≠ 0} (Ax, x)/(x, x)   and   λ_1 = min_{x ∈ R^n, x ≠ 0} (Ax, x)/(x, x).

Given a subspace K ⊆ R^n, we can approximate λ_n and λ_1 by replacing R^n above by K, i.e.,

(13.2)    λ̃_n = max_{x ∈ K, x ≠ 0} (Ax, x)/(x, x)   and   λ̃_1 = min_{x ∈ K, x ≠ 0} (Ax, x)/(x, x).

This is the so-called Rayleigh-Ritz approximation. We can get eigenvalue approximations from our CG algorithm by computing λ̃_n and λ̃_1 satisfying (13.2) using K = K_m (the m'th Krylov space). The numbers λ̃_n and λ̃_1 are, respectively, the largest and smallest eigenvalues of the generalized eigenvalue problem: find pairs {λ, φ} with φ ∈ K_m and λ ∈ R satisfying

(13.3)    (Aφ, θ) = λ(φ, θ)   for all θ ∈ K_m.

It is always possible to solve eigenvalue problems of the form (13.3) by introducing a basis {v_1, ..., v_l}. Here l denotes the dimension of K_m, which may be less than m. The obvious basis is r_0, Ar_0, ..., A^{l-1} r_0, but this turns out not to be the most computationally effective. Given a basis {v_1, v_2, ..., v_l}, we then solve the (generalized) matrix eigenvalue problem

(13.4)    M x = λ N x   for x ∈ R^l.

Here M_{i,j} = (A v_i, v_j) and N_{i,j} = (v_i, v_j), i, j = 1, ..., l. Note that both M and N are SPD l × l matrices. Although there are routines available for solving this problem, we can use some of the properties of the CG algorithm to develop a much more efficient approach. Specifically, we use {r_0, r_1, ..., r_{l-1}} for our basis. Note that by (I.1) in the proof of Theorem (CG-equivalence), {p_0, p_1, ..., p_{l-1}} is a basis for K_l, and so it follows from p_i = r_i − β_{i-1} p_{i-1} that {r_0, r_1, ..., r_{l-1}} is also a basis for K_l.

We first note that if j > i, (I.2) (again, in the proof of Theorem (CG-equivalence)) implies

(r_j, r_i) = (e_j, r_i)_A = 0,

so the N matrix is diagonal. Its diagonal entries are given by

N_{i,i} = (r_i, r_i) = (r_i, p_i) + β_{i-1} (e_i, p_{i-1})_A = (r_i, p_i).

Note that the quantity (r_i, p_i) appears as the numerator in α_i.

We now consider the matrix M. First we observe that it is tridiagonal since if j > i + 1, (I.2) implies

(A r_j, r_i) = (e_j, A r_i)_A = 0.

We next compute its diagonal entries. Clearly, M_{1,1} = (A p_0, p_0). This is the denominator in α_0. In addition, r_i = p_i + β_{i-1} p_{i-1} is an A-orthogonal decomposition. The Pythagorean theorem holds (check it!) for arbitrary inner products, so

‖r_i‖_A^2 = ‖p_i‖_A^2 + β_{i-1}^2 ‖p_{i-1}‖_A^2.

This can be rewritten

M_{i,i} = (A r_i, r_i) = (A p_i, p_i) + β_{i-1}^2 (A p_{i-1}, p_{i-1}).

Thus, for i > 1, the value of M_{i,i} can be computed from β_{i-1} and the denominators for α_i and α_{i-1}. Next, applying (I.1) gives the off-diagonal entries:

M_{j,j-1} = (A r_j, r_{j-1}) = (p_j, r_{j-1})_A + β_{j-1} (p_{j-1}, r_{j-1})_A = β_{j-1} (p_{j-1}, r_{j-1})_A = β_{j-1} (A p_{j-1}, p_{j-1}).

We see the denominator of α_{j-1} appearing here as well. Finally, x satisfies (13.4) with eigenvalue λ if and only if y = N^{1/2} x satisfies

M̃ y = λ y,   where M̃ = N^{-1/2} M N^{-1/2}.

This is a standard eigenvalue problem involving a symmetric tridiagonal matrix. The nonzero entries of M and N can be gathered during the CG iteration, and the matrix M̃ and its eigenvalues can be computed as part of a post-processing step. Efficient procedures for computing the eigenvalues of symmetric tridiagonal matrices are available in standard software libraries such as LAPACK, see www.netlib.org/lapack/.
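To make the bookkeeping above concrete, here is a minimal Python/NumPy sketch (not part of the notes) of a CG iteration that gathers the scalars needed for M and N, assembles the symmetric tridiagonal matrix M̃, and computes its eigenvalues with SciPy's eigh_tridiagonal (a LAPACK symmetric tridiagonal eigensolver). The function name, the stopping check, and the test matrix at the end are illustrative choices only.

```python
import numpy as np
from scipy.linalg import eigh_tridiagonal

def cg_with_eigenvalue_estimates(A, b, x0, m):
    """Run at most m CG steps; return (x, Ritz values of A on the Krylov space)."""
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    nums, dens, betas = [], [], []          # (r_i, r_i), (A p_i, p_i), beta_i
    for i in range(m):
        Ap = A @ p
        num, den = r @ r, p @ Ap
        if num == 0.0:                      # exact solution reached; Krylov space stops growing
            break
        nums.append(num); dens.append(den)
        alpha = num / den
        x += alpha * p
        r -= alpha * Ap
        beta = -(r @ r) / num               # sign convention of the notes: p_{i+1} = r_{i+1} - beta_i p_i
        betas.append(beta)
        p = r - beta * p
    l = len(nums)
    # Assemble Mtilde = N^{-1/2} M N^{-1/2} from the gathered scalars, using
    # N_{ii} = (r_i, r_i), M_{ii} = (A p_i, p_i) + beta_{i-1}^2 (A p_{i-1}, p_{i-1}),
    # and M_{i,i-1} = beta_{i-1} (A p_{i-1}, p_{i-1}).
    diag = np.empty(l)
    off = np.empty(max(l - 1, 0))
    diag[0] = dens[0] / nums[0]
    for i in range(1, l):
        diag[i] = (dens[i] + betas[i - 1] ** 2 * dens[i - 1]) / nums[i]
        off[i - 1] = betas[i - 1] * dens[i - 1] / np.sqrt(nums[i] * nums[i - 1])
    ritz = eigh_tridiagonal(diag, off, eigvals_only=True) if l > 1 else diag
    return x, ritz

# Tiny illustration: the extreme Ritz values approach the extreme eigenvalues of A.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((100, 100)))
A = Q @ np.diag(np.linspace(1.0, 100.0, 100)) @ Q.T   # SPD with spectrum in [1, 100]
b = rng.standard_normal(100)
_, ritz = cg_with_eigenvalue_estimates(A, b, np.zeros(100), 20)
print(ritz.min(), ritz.max())   # rough estimates of lambda_1 = 1 and lambda_n = 100
```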
This technique goes over to the preconditioned case as well and produces estimates for the condition number of the preconditioned system. These estimates are useful in that they provide information on the quality of the preconditioner. Specifically, a small condition number indicates a well designed preconditioner.

Another interesting property of the conjugate gradient method is that it performs well when there are a few outlying eigenvalues with most of the spectrum restricted to a small interval. We illustrate this by a simple example where there is only one outlying eigenvalue. We consider an SPD matrix A with eigenvalues 0 < λ_0 << λ_1 ≤ λ_2 ≤ ... ≤ λ_n. Here we assume that K_1 = λ_n/λ_1 is much smaller than K_0 = λ_n/λ_0. Applying the convergence theorem for CG gives

(13.5)    ‖e_i‖_A ≤ ((√K_0 − 1)/(√K_0 + 1))^i ‖e_0‖_A.

This estimate assumes nothing about the distribution of the spectrum of A. Alternatively, as conjugate gradient is the minimizer, we can bound the error by approximating it by any element of the Krylov space. In this case, we use

‖e_i‖_A ≤ ‖(I − λ_0^{-1} A) ∏_{j=1}^{i-1} (I − τ_j A) e_0‖_A.

Here the parameters τ_1, ..., τ_{i-1} are chosen using the Chebyshev technique applied to the interval [λ_1, λ_n]. We estimate the operator norm on the right hand side above by

‖(I − λ_0^{-1} A) ∏_{j=1}^{i-1} (I − τ_j A)‖_A ≤ max_{k=1,...,n} |1 − λ_0^{-1} λ_k| ∏_{j=1}^{i-1} |1 − τ_j λ_k| ≤ max_{λ ∈ [λ_1, λ_n]} |1 − λ_0^{-1} λ| ∏_{j=1}^{i-1} |1 − τ_j λ|.

Note that the factor (1 − λ_0^{-1} λ) produces 0 when λ = λ_0 and hence we can exclude λ_0 from the middle maximum above. Applying the multi-parameter theorem and the bound

|1 − λ_0^{-1} λ| ≤ K_0   for λ ∈ [λ_1, λ_n]

gives

(13.6)    ‖e_i‖_A ≤ 2 K_0 ((√K_1 − 1)/(√K_1 + 1))^{i-1} ‖e_0‖_A.

Comparing (13.5) and (13.6), we see that although (13.6) has a larger constant (2K_0), it produces a much better asymptotic reduction (as i becomes large) provided that λ_1/λ_0 is large.

MINRES: There are many other Krylov based methods around. For example, we consider the case when A is nonsingular and symmetric (but not positive definite). In this case, A^2 is SPD and we consider the minimization problem: x_i = x_0 + θ for θ ∈ K_i ≡ K_i(A, r_0) satisfying

(13.7)    ‖e_i‖_{A^2} = min_{ζ ∈ K_i} ‖x − (x_0 + ζ)‖_{A^2}.

Defining the residual r(ζ) = b − A(x_0 + ζ), (13.7) can be rewritten

(13.8)    ‖r_i‖_{ℓ^2} = ‖r(θ)‖_{ℓ^2} = min_{ζ ∈ K_i} ‖r(ζ)‖_{ℓ^2},

i.e., x_i is the vector in x_0 + K_i which minimizes the residual over all vectors in x_0 + K_i. This problem could be solved by setting up an i × i matrix system as discussed in the lemma of the previous class, but it can be solved much more efficiently by the following conjugate gradient like algorithm.

Algorithm 1 (MINRES). Let A be a symmetric nonsingular n × n matrix and let x_0 ∈ R^n (the initial iterate) and b ∈ R^n (the right hand side) be given. Start by setting p_0 = r_0 = b − A x_0. Then for i = 0, 1, ..., define

(13.9)    x_{i+1} = x_i + α_i p_i,   where α_i = (r_i, A p_i)/(A p_i, A p_i),
          r_{i+1} = r_i − α_i A p_i,
          p_{i+1} = r_{i+1} − β_i p_i,   where β_i = (A r_{i+1}, A p_i)/(A p_i, A p_i).

Exercise 1. The above algorithm requires two evaluations of A per step, namely A p_i and A r_{i+1}. By possibly introducing extra vectors, find an implementation which only requires one A evaluation per step after startup.

That the MINRES algorithm solves the minimization problem (13.8) can be seen by repeating the proof of Theorem (CG-equivalence), replacing the A-inner product by the A^2-inner product (go through the proof and see!).
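The following is a direct transcription of Algorithm 1 into Python/NumPy, kept deliberately naive: it performs the two applications of A per step (A p_i and A r_{i+1}) noted in Exercise 1 rather than the more economical single-multiply variant asked for there. The function name, tolerance, and stopping test are illustrative choices, not part of the notes.

```python
import numpy as np

def minres(A, b, x0, max_iter=200, tol=1e-10):
    x = x0.copy()
    r = b - A @ x                          # r_0
    p = r.copy()                           # p_0 = r_0
    for _ in range(max_iter):
        if np.linalg.norm(r) < tol:        # the residual norm is what (13.8) minimizes
            break
        Ap = A @ p
        alpha = (r @ Ap) / (Ap @ Ap)       # alpha_i = (r_i, A p_i) / (A p_i, A p_i)
        x = x + alpha * p                  # x_{i+1} = x_i + alpha_i p_i
        r = r - alpha * Ap                 # r_{i+1} = r_i - alpha_i A p_i
        Ar = A @ r                         # second application of A (cf. Exercise 1)
        beta = (Ar @ Ap) / (Ap @ Ap)       # beta_i = (A r_{i+1}, A p_i) / (A p_i, A p_i)
        p = r - beta * p                   # p_{i+1} = r_{i+1} - beta_i p_i
    return x
```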
Application: We consider a finite difference approximation to a Helmholtz equation:

−Δu(x) − k^2 u(x) = f(x)   in Ω,        u(x) = 0   on ∂Ω.

This equation models, for example, time harmonic acoustic waves in a bounded waveguide (Ω contained in R^2 or R^3). The finite difference equations have two contributions, one from the Laplacian and the other from the lower order term (the k^2-term), i.e., A = A_1 + A_2. We considered the finite difference approximation to the Laplacian earlier in this course and saw that it resulted in a symmetric and positive definite matrix A_1. In contrast, the second term results in A_2 = −k^2 I, where I is the identity matrix. Let 0 < λ_1 ≤ λ_2 ≤ ··· ≤ λ_n be the eigenvalues of A_1. We then have

σ(A) = {λ_i − k^2 : i = 1, ..., n}.

When k^2 is greater than λ_1, A is symmetric but not positive definite. For this problem, A is nonsingular only if k^2 is not an eigenvalue of A_1. The resulting matrix A is a natural candidate for the MINRES algorithm.

We next provide an analysis of the above algorithm. As MINRES produces the solution of the minimization problem (13.7), its analysis consists of coming up with a good approximation in the Krylov space. Recall that for the analysis of CG, we obtained a good approximation in the Krylov space by using the approximation based on the Chebyshev polynomials. We shall use the Chebyshev polynomials again, but on an interval which results from the SPD operator A^2. Our assumptions on A imply that there are eigenvalues of A, namely −a, −b, c, d with a, b, c, d positive, satisfying

σ(A) ⊂ [−a, −b] ∪ [c, d].

The spectrum of A^2 is then contained in the interval [λ_1, λ_n], where λ_1 = min(c^2, b^2) and λ_n = max(a^2, d^2). After 2m + 1 iterations,

‖e_{2m+1}‖_{A^2} ≤ ‖∏_{i=1}^{m} (I − τ_i A^2)‖_{A^2} ‖e_0‖_{A^2},

where {τ_1, τ_2, ..., τ_m} are the optimal Chebyshev iteration parameters based on the interval [λ_1, λ_n]. Note that λ_1 and λ_n are, respectively, the smallest and largest eigenvalues of A^2 (why?). Applying the multi-parameter theorem gives the following result.

Theorem 1. Suppose that A is a symmetric and nonsingular real matrix. Then the errors corresponding to the MINRES iteration satisfy

(13.10)    ‖e_{2m+1}‖_{A^2} ≤ 2 ((√K − 1)/(√K + 1))^m ‖e_0‖_{A^2}.

Here K is the spectral condition number of A^2.

Remark 1. Alternatively, one could apply the conjugate gradient method directly to the system A^2 x = A b. These are the normal equations in this case. The resulting iteration requires two matrix-vector evaluations per step (so m steps involve more or less the same number of matrix evaluations as 2m steps of MINRES). The resulting CG errors satisfy

‖e_m‖_{A^2} ≤ 2 ((√K − 1)/(√K + 1))^m ‖e_0‖_{A^2},

which is essentially the same as (13.10). Thus, at least theoretically, MINRES is no better than the application of conjugate gradient to the normal equations. Note that we only used the terms in the Krylov space having even powers of A in the analysis of MINRES. The odd powers might give improved convergence in some examples. However, it is possible to construct problems (how would you do this?) for which MINRES and the normal equation approach give rise to exactly the same sequence of iterates, i.e., the result at step 2i + 1 for MINRES equals the result at step i for CG on the normal equations.
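As a small, self-contained illustration of the Helmholtz application (not part of the notes), the sketch below builds the one dimensional analogue A = A_1 − k^2 I with A_1 the standard tridiagonal finite difference Laplacian, checks that σ(A) = {λ_i − k^2} contains negative values (so A is indefinite), and applies SciPy's minres, an independent Lanczos-based solver for the same residual minimization (13.8), in place of Algorithm 1. The grid size and the value of k^2 are arbitrary choices.

```python
import numpy as np
from scipy.sparse import diags, identity
from scipy.sparse.linalg import minres as sp_minres

n = 100
h = 1.0 / (n + 1)
A1 = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)) / h**2   # SPD 1-D discrete Laplacian
k2 = 50.0                                                         # assumed value of k^2
A = (A1 - k2 * identity(n)).tocsr()                               # A = A1 - k^2 I

lam = np.linalg.eigvalsh(A1.toarray())                            # eigenvalues lambda_i of A1
assert np.allclose(lam - k2, np.linalg.eigvalsh(A.toarray()))     # sigma(A) = {lambda_i - k^2}
print("negative eigenvalues of A:", np.sum(lam - k2 < 0))         # > 0, so A is indefinite

b = np.ones(n)
x, info = sp_minres(A, b)                                         # info == 0 signals convergence
print("MINRES residual norm:", np.linalg.norm(b - A @ x))
```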