13: CG-eigenvalues and MINRES

Math 639
(updated: January 2, 2012)
We start this class by investigating the use of CG to obtain bounds on the
spectrum of A. As before, A is an SPD n × n real matrix. We begin by
considering the Rayleigh-Ritz eigenvalue approximation. For a symmetric
matrix A,
(13.1)
\[
\lambda_n = \max_{x \in \mathbb{R}^n,\, x \neq 0} \frac{(Ax, x)}{(x, x)}
\quad \text{and} \quad
\lambda_1 = \min_{x \in \mathbb{R}^n,\, x \neq 0} \frac{(Ax, x)}{(x, x)}.
\]
Given a subspace K ⊆ Rn , we can approximate λn and λ1 by replacing Rn
above by K, i.e.,
(13.2)
\[
\tilde\lambda_n = \max_{x \in K,\, x \neq 0} \frac{(Ax, x)}{(x, x)}
\quad \text{and} \quad
\tilde\lambda_1 = \min_{x \in K,\, x \neq 0} \frac{(Ax, x)}{(x, x)}.
\]
This is the so-called Rayleigh-Ritz approximation.
We can get eigenvalue approximations from our CG algorithm by computing λ̃n and λ̃1 satisfying (13.2) using K = Km (the m’th Krylov space). The
numbers λ̃n and λ̃1 are, respectively, the largest and smallest eigenvalues
of the generalized eigenvalue problem: Find pairs {λ, φ} with φ ∈ Km and
λ ∈ R satisfying
(13.3)
(Aφ, θ) = λ(φ, θ), for all θ ∈ Km .
It is always possible to solve eigenvalue problems of the form (13.3) by introducing a basis {v1, . . . , vl}. Here l denotes the dimension of Km, which may
be less than m. The obvious basis is r0, Ar0, . . . , A^{l−1} r0, but this turns out
not to be the most computationally effective. Given a basis {v1, v2, . . . , vl},
we then solve the (generalized) matrix eigenvalue problem
(13.4)
\[
M x = \lambda N x \quad \text{for } x \in \mathbb{R}^l.
\]
Here
\[
M_{i,j} = (A v_i, v_j) \quad \text{and} \quad N_{i,j} = (v_i, v_j), \qquad i, j = 1, \dots, l.
\]
Note that both M and N are SPD l×l matrices. Although there are routines
available for solving this problem, we can use some of the properties of the
CG algorithm to develop a much more efficient approach.
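As an aside, the brute-force route via a library routine might look like the following sketch; this is only an illustration (the function name, the monomial basis, and the use of scipy.linalg.eigh are my choices, not part of the notes).

import numpy as np
from scipy.linalg import eigh

def krylov_ritz_bruteforce(A, r0, m):
    # Assemble a basis of K_m; the monomial basis r0, A r0, ..., A^{m-1} r0 is
    # used here even though, as noted above, it is numerically poor.
    V = [r0]
    for _ in range(m - 1):
        V.append(A @ V[-1])
    V = np.column_stack(V)                  # columns span K_m
    M = V.T @ (A @ V)                       # M_{i,j} = (A v_i, v_j)
    N = V.T @ V                             # N_{i,j} = (v_i, v_j)
    lam = eigh(M, N, eigvals_only=True)     # generalized problem M x = lambda N x
    return lam[0], lam[-1]                  # approximations to lambda_1 and lambda_n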
Specifically, for the more efficient approach we use {r0, r1, . . . , rl−1} as our basis. Note that by (I.1) in
the proof of Theorem (CG-equivalence) {p0 , p1 , . . . , pl−1 } is a basis for Kl
and so it follows from pi = ri − βi−1 pi−1 that {r0 , r1 , . . . , rl−1 } is also a
basis for Kl. We first note that if j > i, (I.2) (again, from the proof of Theorem
(CG-equivalence)) implies
\[
(r_j, r_i) = (e_j, r_i)_A = 0,
\]
so the matrix N is diagonal. Its diagonal entries are given by
\[
N_{i,i} = (r_i, r_i) = (r_i, p_i) + \beta_{i-1} (e_i, p_{i-1})_A = (r_i, p_i).
\]
Note that the quantity (r_i, p_i) appears as the numerator in αi.
We now consider the matrix M. First we observe that it is tridiagonal,
since if j > i + 1, (I.2) implies
\[
(A r_j, r_i) = (e_j, A r_i)_A = 0.
\]
We next compute its diagonal entries. Clearly, M_{1,1} = (Ap_0, p_0). This is
the denominator in α0. In addition, r_i = p_i + β_{i−1} p_{i−1} is an A-orthogonal
decomposition. The Pythagorean Theorem holds (check it!) for arbitrary
inner products, so
\[
\|r_i\|_A^2 = \|p_i\|_A^2 + \beta_{i-1}^2 \|p_{i-1}\|_A^2.
\]
This can be rewritten as
\[
M_{i,i} = (A r_i, r_i) = (A p_i, p_i) + \beta_{i-1}^2 (A p_{i-1}, p_{i-1}).
\]
Thus, for i > 1, the value of M_{i,i} can be computed from β_{i−1} and the denominators for α_i and α_{i−1}. Finally, applying (I.1) gives
\[
M_{j,j-1} = (A r_j, r_{j-1}) = (p_j, r_{j-1})_A + \beta_{j-1} (p_{j-1}, r_{j-1})_A
= \beta_{j-1} (p_{j-1}, r_{j-1})_A = \beta_{j-1} (A p_{j-1}, p_{j-1}).
\]
We see the denominator of α_{j−1} appearing here as well.
Finally, x satisfies (13.4) with eigenvalue λ if and only if y = N^{1/2} x
satisfies
\[
\widetilde M y = \lambda y \quad \text{where } \widetilde M = N^{-1/2} M N^{-1/2}.
\]
This is a standard eigenvalue problem involving a symmetric tridiagonal
matrix. The nonzero elements of M and N can be gathered during the CG
iteration, and the eigenvalues of \widetilde M can be computed as a
post-processing step. Efficient procedures for computing the eigenvalues of
symmetric tridiagonal matrices are available in standard software libraries
such as LAPACK, see www.netlib.org/lapack/.
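To make the bookkeeping concrete, here is a minimal NumPy sketch of the whole procedure. The function name and the dense-matrix setting are illustrative; the β formula below is the standard CG coefficient, whose sign convention may differ from the one used in these notes, but flipping the sign of the off-diagonal entries of a symmetric tridiagonal matrix does not change its eigenvalues, so the computed Ritz values are unaffected.

import numpy as np

def cg_ritz_values(A, b, x0, m):
    # Run m CG steps, storing the scalars identified above:
    #   nums[i] = (r_i, r_i) = (r_i, p_i),  dens[i] = (A p_i, p_i),  betas[i].
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    nums, dens, betas = [], [], []
    for _ in range(m):
        Ap = A @ p
        num, den = r @ r, p @ Ap
        nums.append(num)
        dens.append(den)
        alpha = num / den
        x = x + alpha * p
        r = r - alpha * Ap
        beta = (r @ r) / num
        betas.append(beta)
        p = r + beta * p
    # Assemble the symmetric tridiagonal M and the diagonal N from these scalars:
    #   M_{ii}    = dens[i] + betas[i-1]^2 * dens[i-1]
    #   M_{i,i-1} = betas[i-1] * dens[i-1]
    #   N_{ii}    = nums[i]
    M = np.zeros((m, m))
    M[0, 0] = dens[0]
    for i in range(1, m):
        M[i, i] = dens[i] + betas[i - 1] ** 2 * dens[i - 1]
        M[i, i - 1] = M[i - 1, i] = betas[i - 1] * dens[i - 1]
    D = np.diag(1.0 / np.sqrt(np.array(nums)))
    Mtilde = D @ M @ D                    # N^{-1/2} M N^{-1/2}, symmetric tridiagonal
    # A LAPACK tridiagonal eigensolver would be the efficient choice here; for
    # small m the dense symmetric solver below is adequate.
    return np.linalg.eigvalsh(Mtilde)     # sorted; the extremes approximate lambda_1, lambda_n

The first and last entries of the returned array are the λ̃1 and λ̃n of (13.2) with K = Km (assuming the iteration does not terminate early).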
This technique carries over to the preconditioned case as well and produces
estimates for the condition number of the preconditioned system. These
estimates are useful in that they provide information on the quality of the
preconditioner. Specifically, a small condition number indicates a well-designed preconditioner.
Another interesting property of the conjugate gradient method is that
it performs well when there are a few outlying eigenvalues with most of
the spectrum restricted to a small interval. We illustrate this by a simple
example where there is only one outlying eigenvalue. We consider an SPD
matrix A with eigenvalues 0 < λ0 ≪ λ1 ≤ λ2 ≤ · · · ≤ λn. Here we
assume that K1 = λn/λ1 is much smaller than K0 = λn/λ0. Applying the
convergence theorem for CG gives
(13.5)
\[
\|e_i\|_A \le \left( \frac{\sqrt{K_0} - 1}{\sqrt{K_0} + 1} \right)^{i} \|e_0\|_A.
\]
This estimate assumes nothing about the distribution of the spectrum of A.
Alternatively, since conjugate gradient minimizes the A-norm of the error over
x0 plus the Krylov space, we can bound the CG error by that of any other
element of x0 plus the Krylov space. In this case, we use
\[
\|e_i\|_A \le \Big\| (I - \lambda_0^{-1} A) \prod_{j=1}^{i-1} (I - \tau_j A)\, e_0 \Big\|_A.
\]
Here the parameters τ1, . . . , τi−1 are chosen using the Chebyshev technique
applied to the interval [λ1, λn]. We estimate the operator norm on the right-hand
side above by
\[
\Big\| (I - \lambda_0^{-1} A) \prod_{j=1}^{i-1} (I - \tau_j A) \Big\|_A
\le \max_{k=1,\dots,n} \Big| (1 - \lambda_0^{-1} \lambda_k) \prod_{j=1}^{i-1} (1 - \tau_j \lambda_k) \Big|
\le \max_{\lambda \in [\lambda_1, \lambda_n]} \Big| (1 - \lambda_0^{-1} \lambda) \prod_{j=1}^{i-1} (1 - \tau_j \lambda) \Big|.
\]
Note that the factor 1 − λ_0^{-1}λ produces 0 when λ = λ0, and hence λ0 can be
excluded from the middle maximum above. Applying the multi-parameter
theorem and the bound
\[
|1 - \lambda_0^{-1} \lambda| \le K_0 \quad \text{for } \lambda \in [\lambda_1, \lambda_n]
\]
gives
(13.6)
\[
\|e_i\|_A \le 2 K_0 \left( \frac{\sqrt{K_1} - 1}{\sqrt{K_1} + 1} \right)^{i-1} \|e_0\|_A.
\]
Comparing (13.5) and (13.6), we see that although (13.6) has a larger constant (2K0), it produces a much better asymptotic (as i becomes large) reduction
rate, provided that λ1/λ0 is large.
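A quick numerical illustration of this effect (the matrix, sizes, and iteration count below are my own choices, not from the notes): take A diagonal with a single tiny eigenvalue λ0 and the rest of the spectrum in [1, 10], and watch the A-norm of the CG error.

import numpy as np

rng = np.random.default_rng(0)
n = 200
# one outlying eigenvalue lambda_0 = 1e-3; the rest of the spectrum lies in [1, 10]
lams = np.concatenate(([1e-3], np.linspace(1.0, 10.0, n - 1)))
A = np.diag(lams)
x_true = rng.standard_normal(n)
b = A @ x_true

x = np.zeros(n)
r = b.copy()
p = r.copy()
for i in range(30):
    Ap = A @ p
    alpha = (r @ r) / (p @ Ap)
    x = x + alpha * p
    r_new = r - alpha * Ap
    beta = (r_new @ r_new) / (r @ r)
    p = r_new + beta * p
    r = r_new
    e = x_true - x
    print(i + 1, np.sqrt(e @ (A @ e)))     # A-norm of the error
# After the first few steps the error decreases at a rate governed by
# K1 = 10, not by K0 = 10 / 1e-3, in line with (13.6).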
MINRES: There are many other Krylov-based methods around. For
example, we consider the case when A is nonsingular and symmetric (but not
positive definite). In this case, A² is SPD and we consider the minimization
problem: x_i = x_0 + θ with θ ∈ K_i ≡ K_i(A, r_0) satisfying
(13.7)
\[
\|e_i\|_{A^2} = \min_{\zeta \in K_i} \|x - (x_0 + \zeta)\|_{A^2}.
\]
Defining the residual, r(ζ) = b − A(x0 + ζ), (13.7) can be rewritten
(13.8)
\[
\|r_i\|_{\ell^2} = \|r(\theta)\|_{\ell^2} = \min_{\zeta \in K_i} \|r(\zeta)\|_{\ell^2},
\]
i.e., xi is the vector in x0 + Ki which minimizes the residual over all vectors
in x0 + Ki .
This problem could be solved by setting up an i × i matrix as discussed
in the lemma of the previous class, but it can be solved much more efficiently
by the following conjugate-gradient-like algorithm.
Algorithm 1. (MINRES). Let A be a symmetric nonsingular n × n matrix
and x0 ∈ Rn (the initial iterate) and b ∈ Rn (the right hand side) be given.
Start by setting p0 = r0 = b − Ax0 . Then for i = 0, 1, . . ., define
(13.9)
\[
\begin{aligned}
x_{i+1} &= x_i + \alpha_i p_i, && \text{where } \alpha_i = \frac{(r_i, A p_i)}{(A p_i, A p_i)},\\
r_{i+1} &= r_i - \alpha_i A p_i,\\
p_{i+1} &= r_{i+1} - \beta_i p_i, && \text{where } \beta_i = \frac{(A r_{i+1}, A p_i)}{(A p_i, A p_i)}.
\end{aligned}
\]
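A direct transcription of (13.9) as a sketch (the function name is mine); it performs the two products with A per step exactly as written, which is what Exercise 1 below asks you to improve.

import numpy as np

def minres_simple(A, b, x0, nsteps):
    # Algorithm 1 verbatim: two applications of A per step (A p_i and A r_{i+1}).
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    for _ in range(nsteps):
        Ap = A @ p
        denom = Ap @ Ap                     # (A p_i, A p_i)
        alpha = (r @ Ap) / denom
        x = x + alpha * p
        r = r - alpha * Ap
        beta = ((A @ r) @ Ap) / denom       # (A r_{i+1}, A p_i) / (A p_i, A p_i)
        p = r - beta * p
    return x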
Exercise 1. The above algorithm requires two evaluations of A per step,
namely Api and Ari+1 . By possibly introducing extra vectors, find an implementation which only requires one A evaluation per step after startup.
That the MINRES algorithm solves the minimization problem (13.8) can
be seen by repeating the proof of Theorem (CG-equivalence) replacing the
A inner product by the A2 -inner product (go through the proof and see!).
Application: We consider a finite difference approximation to a Helmholtz
equation:
\[
-\Delta u(x) - k^2 u(x) = f(x) \quad \text{in } \Omega, \qquad
u(x) = 0 \quad \text{on } \partial\Omega.
\]
This equation models, for example, time-harmonic acoustic waves in a bounded
waveguide (Ω contained in R² or R³). The finite difference equations have
two contributions, one from the Laplacian and the other from the lower
order term (the k²-term), i.e. A = A1 + A2. We considered the finite
difference approximation to the Laplacian earlier in this course and saw that
it resulted in a symmetric and positive definite matrix A1 . In contrast,
the second term results in A2 = −k²I where I is the identity matrix. Let
0 < λ1 ≤ λ2 ≤ · · · ≤ λn be the eigenvalues of A1. We then have
\[
\sigma(A) = \{\lambda_i - k^2 : i = 1, \dots, n\}.
\]
When k² is greater than λ1, A is symmetric but not positive definite. For
this problem, A is nonsingular only if k² is not an eigenvalue of A1. The
resulting matrix A is a natural candidate for the MINRES algorithm.
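For concreteness, a small sketch of this setup (a one-dimensional analogue with grid size and k chosen arbitrarily; none of these numbers come from the notes): assemble A = A1 − k²I and check that it is indefinite.

import numpy as np

n = 50
h = 1.0 / (n + 1)
k = 12.0
# standard three-point finite difference Laplacian with Dirichlet boundary conditions
A1 = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
A = A1 - k**2 * np.eye(n)                   # Helmholtz shift: A2 = -k^2 I
lam = np.linalg.eigvalsh(A)
print(lam.min(), lam.max())                 # one of each sign: A is symmetric indefinite
# Provided k^2 is not an eigenvalue of A1, A is nonsingular, and MINRES
# (Algorithm 1) applies even though plain CG does not.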
We next provide an analysis of the above algorithm. As MINRES produces
the solution of the minimization (13.7), its analysis consists of coming up
with a good approximation in the Krylov space. Recall that for the analysis
of CG, we obtained a good approximation in the Krylov space by using
the approximation based on the Chebyshev polynomials. We shall use the
Chebyshev polynomials again, but on an interval which results from the
SPD operator A². Our assumptions on A imply that there are eigenvalues
of A, namely −a, −b, c, d, with a, b, c, d positive and satisfying
\[
\sigma(A) \subset [-a, -b] \cup [c, d].
\]
The spectrum of A² is contained in the interval [λ1, λn], where λ1 = min(c², b²)
and λn = max(a², d²). After 2m + 1 iterations,
\[
\|e_{2m+1}\|_{A^2} \le \Big\| \prod_{i=1}^{m} (I - \tau_i A^2) \Big\|_{A^2} \|e_0\|_{A^2},
\]
where {τ1, τ2, . . . , τm} are the optimal Chebyshev iteration parameters based
on the interval [λ1, λn]. Note that λ1 and λn are, respectively, the smallest
and largest eigenvalues of A² (why?). Applying the multi-parameter theorem
gives the following result.
Theorem 1. Suppose that A is a symmetric and nonsingular real matrix.
Then, the errors corresponding to the MINRES iteration satisfy
(13.10)
\[
\|e_{2m+1}\|_{A^2} \le 2 \left( \frac{\sqrt{K} - 1}{\sqrt{K} + 1} \right)^{m} \|e_0\|_{A^2}.
\]
Here K is the spectral condition number of A².
Remark 1. Alternatively, one could apply the conjugate gradient method
directly to the system A²x = Ab. These are the normal equations in this
case. The resulting iteration requires 2 matrix-vector evaluations per step
(so m steps involve more or less the same number of matrix evaluations as
2m steps of MINRES). The resulting CG errors satisfy
\[
\|e_m\|_{A^2} \le 2 \left( \frac{\sqrt{K} - 1}{\sqrt{K} + 1} \right)^{m} \|e_0\|_{A^2},
\]
which is essentially the same as (13.10). Thus, at least theoretically, MINRES is no better than the application of conjugate gradient to the normal
equations. Note that we only used the terms in the Krylov space having even
powers of A in the analysis of MINRES. The odd powers might give improved
convergence in some examples. However, it is possible to construct problems
(how would you do this?) for which MINRES and the normal equation approach give rise to exactly the same sequence of iterates, i.e., the result at
2i + 1 for MINRES equals the result at i for CG-normal.
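A sketch of the normal-equations alternative described in this remark (the function name is mine; forming A @ A densely is only for illustration, since in practice one would simply apply A twice per step):

import numpy as np

def cg_normal_equations(A, b, x0, nsteps):
    # Plain CG applied to A^2 x = A b; A^2 is SPD when A is symmetric and nonsingular.
    B = A @ A                                # illustration only; normally apply A twice
    x = x0.copy()
    r = A @ b - B @ x
    p = r.copy()
    for _ in range(nsteps):
        Bp = B @ p
        alpha = (r @ r) / (p @ Bp)
        x = x + alpha * p
        r_new = r - alpha * Bp
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return x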