Class Notes 7: CONVERGENCE IN NATURAL NORMS FOR THE SPD CASE

Math 639d
Due Date: Oct. 16
(updated: October 9, 2014)
In earlier classes, we saw that provided that the spectral radius of the
reduction matrix G was less than one, there was an abstractly defined norm
‖ · ‖∗ for which
(7.1)    ‖G‖∗ < 1.
This, in turn, implied that the method ultimately converged for any initial iterate and right-hand side. We also saw that there were examples where, even though ρ(G) was zero, no convergence in the L∞ norm was observed until the n'th step, where n was the number of unknowns (see Example 1 of Class Notes 3). This somewhat annoying behavior was the result of a Jordan block of very large order. Even when large-dimensional Jordan blocks are not present, the abstract norm ‖ · ‖∗ may be far from the natural norms of interest.
For this class, we shall assume that A is a (real) symmetric and positive
definite (SPD) n × n matrix. Our goal will be to study the convergence of
iterative methods in two natural norms, the first being the ℓ2 norm and the
second being the so-called “operator norm.” The operator norm or A-norm
(for x ∈ R^n) is defined by
(7.2)    ‖x‖_A = (Ax, x)^{1/2}
where (·, ·) is defined by the dot product,
(7.3)    (x, y) = Σ_{j=1}^{n} x_j y_j.
In your linear algebra course, you should have studied inner products.
We shall start with inner products defined on R^n. An inner product is a mapping from R^n × R^n into R. One often uses the notation < ·, · > so that < x, y > is a real number when x, y are vectors in R^n. The inner product is required to satisfy the following axioms:
(1) < x, x > ≥ 0 for all x ∈ R^n and equals 0 only if x = 0 (positive definite).
(2) < x, y > = < y, x > for all x, y ∈ R^n (symmetry).
(3) For α, β ∈ R and x, y, z ∈ R^n,
< αx + βy, z > = α < x, z > + β < y, z >
and
< x, αy + βz > = α < x, y > + β < x, z >
(bilinearity).
The simplest inner product on R^n is the ℓ2-inner product defined by (7.3).
Exercise 1. Show that (7.3) satisfies the inner product axioms. Show that (·, ·)_A defined by
(7.4)    (x, y)_A = (Ax, y) for all x, y ∈ R^n
satisfies the inner product axioms (this only holds for symmetric and positive definite A).
Inner products have a number of interesting properties. The first is that
for any inner product < ·, · >, there is a corresponding norm given by
(7.5)    ‖x‖ = < x, x >^{1/2} for all x ∈ R^n.
In the case of the dot product, ‖x‖ is just the length of x. It is obvious that the first two axioms for being a norm are satisfied by (7.5). That the triangle inequality holds is a consequence of the Cauchy-Schwarz (or Schwarz) inequality:
(7.6)    | < x, y > | ≤ ‖x‖ ‖y‖ for all x, y ∈ R^n.
Indeed,
‖x + y‖^2 = < x, x > + 2 < x, y > + < y, y >
          ≤ ‖x‖^2 + 2 ‖x‖ ‖y‖ + ‖y‖^2 = (‖x‖ + ‖y‖)^2
and the triangle inequality follows by taking square roots.
The inequality (7.6) is obvious for the dot product from the identity
(7.7)    (x, y) = ‖x‖ ‖y‖ cos(θ)
where θ is the angle between x and y. Although we shall not prove this inequality in general, the Cauchy-Schwarz inequality holds for any inner product, e.g.,
|(x, y)_A| ≤ ‖x‖_A ‖y‖_A.
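For example, this inequality for the A-inner product is easy to check numerically. The following is a minimal NumPy sketch (an illustration only; the SPD matrix A and the sampled vectors below are arbitrary choices, not part of the notes):

import numpy as np

rng = np.random.default_rng(0)

# An arbitrary SPD matrix for illustration (M^t M + I is always SPD).
M = rng.standard_normal((5, 5))
A = M.T @ M + np.eye(5)

def ip_A(x, y):
    # A-inner product (x, y)_A = (Ax, y).
    return (A @ x) @ y

def norm_A(x):
    # A-norm ||x||_A = (Ax, x)^{1/2}.
    return np.sqrt(ip_A(x, x))

# Check |(x, y)_A| <= ||x||_A ||y||_A on a few random pairs.
for _ in range(5):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    print(abs(ip_A(x, y)) <= norm_A(x) * norm_A(y))   # True each time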
Whenever we deal with an inner product, we shall almost always use the corresponding norm, and we shall usually indicate which inner product the norm corresponds to. In any event, we shall always point out explicitly when we use a norm that does not correspond to the inner product.
Remark 1. Note that the A-norm is the norm corresponding to the (·, ·)_A inner product (the A-inner product). Consequently, the expression given by the right-hand side of (7.2) is indeed a norm.
As we have already noticed, different norms assign different “lengths” to
the same vector. The norms coming from different inner products have the
same property. Besides defining a length (norm), inner products give rise
to a notion of angle by generalizing (7.7), i.e., the angle between nonzero
vectors x and y with respect to the inner product < ·, · > is θ, where
cos(θ) = < x, y > / (‖x‖ ‖y‖).
Two vectors x and y are orthogonal with respect to the inner product < ·, · > if θ = π/2, i.e., < x, y > = 0. The angles between two vectors corresponding to different inner products are often not the same, as seen in the following example.
Example 1. Let
x = (1, 1)^t and y = (1, −1)^t;
then x and y are orthogonal in (·, ·), i.e., (x, y) = 0. The matrix
A = [ 1  0
      0  2 ]
is symmetric and positive definite and (x, y)_A = −1 (compute it!). Thus, x and y are not orthogonal in (·, ·)_A.
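The computation in Example 1 can be reproduced in a couple of lines (a minimal NumPy sketch using exactly the vectors and matrix of the example):

import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 2.0]])
x = np.array([1.0, 1.0])
y = np.array([1.0, -1.0])

print(np.dot(x, y))        #  0.0 -> x and y are orthogonal in (·, ·)
print(np.dot(A @ x, y))    # -1.0 -> (x, y)_A = -1, so not orthogonal in (·, ·)_A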
You should recall from your linear algebra class that a set of vectors {v_1, v_2, . . . , v_k} is (mutually) orthogonal if < v_i, v_j > = 0 when i ≠ j. This set of vectors is called orthonormal if, in addition, ‖v_i‖ = 1 for i = 1, . . . , k. Of course, ‖ · ‖ is the norm corresponding to < ·, · >.
Definition 1. We say that an n × n matrix B is “self-adjoint” with respect to the inner product < ·, · > on R^n if
< Bx, y > = < x, By > for all x, y ∈ R^n.
It is easy to check (do it!) that an n × n matrix A is self-adjoint with respect to (·, ·) if and only if A is symmetric. We also generalize the notion of positive definite and say that A is positive definite with respect to < ·, · > if
< Ax, x > ≥ 0 for all x ∈ R^n and < Ax, x > = 0 only if x = 0.
Example 2. Let A and B be n × n matrices with A symmetric and positive
definite and B symmetric. In general, BA is not symmetric (unless B and
A commute). However, BA is always self-adjoint with respect to the A-inner product since
(BAx, y)_A = (ABAx, y) = (BAx, Ay) = (Ax, BAy) = (x, BAy)_A.
Note that if B is positive definite then BA is positive definite with respect to
the A-inner product since
(BAx, x)_A = (BAx, Ax),
which is non-negative (and zero only if x = 0, since B is positive definite and A is non-singular).
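These identities are easy to test numerically. Below is a minimal sketch (the matrices built here are arbitrary choices for illustration, with A SPD and B symmetric; B_spd is a separate SPD matrix used for the positive definiteness check):

import numpy as np

rng = np.random.default_rng(1)
n = 4

M = rng.standard_normal((n, n))
A = M.T @ M + n * np.eye(n)        # SPD
S = rng.standard_normal((n, n))
B = 0.5 * (S + S.T)                # symmetric (not necessarily positive definite)

def ip_A(x, y):
    # A-inner product (x, y)_A = (Ax, y).
    return (A @ x) @ y

x, y = rng.standard_normal(n), rng.standard_normal(n)

# BA is self-adjoint in the A-inner product: (BAx, y)_A = (x, BAy)_A.
print(np.isclose(ip_A(B @ A @ x, y), ip_A(x, B @ A @ y)))   # True

# If B is SPD, then BA is positive definite with respect to (·, ·)_A.
B_spd = B @ B.T + np.eye(n)
print(ip_A(B_spd @ A @ x, x) > 0)                            # True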
Exercise 2. Assume that A and B are as in the previous example with B
SPD. Show that B^{-1} is also symmetric and positive definite. Show that BA is self-adjoint with respect to the B^{-1}-inner product.
The following theorem will be important for the analysis of iterative methods. Its proof is available in any reasonably good book on linear algebra.
Theorem 1. (Spectral Theorem). Let A be an n × n symmetric real matrix.
Then there is an orthonormal basis of (real) eigenvectors for A, i.e., there is a set of n vectors {φ_1, . . . , φ_n} satisfying
Aφ_i = λ_i φ_i, for i = 1, . . . , n
and
(φ_i, φ_j) = δ_{ij} ≡ 1 if i = j, 0 otherwise.
We shall also use a more general version given in the following theorem.
The result in the generality stated can be easily derived from the simpler
case and is an exercise (below).
Theorem 2. (A More General Spectral Theorem). Let A and B be n × n
matrices with B symmetric and positive definite and A self-adjoint with respect to the B-inner product. Then there is a B-orthonormal basis of (real) eigenvectors for A with real eigenvalues, i.e., there is a set of n vectors {φ_1, . . . , φ_n} satisfying
Aφ_i = λ_i φ_i, for i = 1, . . . , n
and
(φ_i, φ_j)_B = δ_{ij} ≡ 1 if i = j, 0 otherwise.
As an example of the application of the first theorem, we consider the
definition of the square root of a symmetric and positive definite n × n matrix A. Let {φ_1, . . . , φ_n} be the orthonormal basis of eigenvectors. We expand x ∈ R^n as
x = Σ_{i=1}^{n} c_i φ_i
and define
(7.8)    A^{1/2} x = Σ_{i=1}^{n} λ_i^{1/2} c_i φ_i.
We already observed in an earlier class that the eigenvalues of a positive definite matrix A are all positive, and we set λ_i^{1/2} to be the positive square root. The above definition describes A^{1/2} through its action on vectors. A more explicit (equivalent) representation is given by
(7.9)    A^{1/2} = Σ_{i=1}^{n} λ_i^{1/2} φ_i [φ_i]^t.
This is a sum of n × n matrices since the matrix product φ_i [φ_i]^t is an n × n matrix for each i. It is not hard to check that:
(1) The matrix product A^{1/2} A^{1/2} equals A.
(2) A^{1/2} is invertible and its inverse is given by A^{-1/2} (defined as in (7.8) or (7.9) with λ_i^{1/2} replaced by λ_i^{-1/2}).
(3) The matrix A^{1/2} is symmetric and positive definite.
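A quick numerical check of this construction (a sketch only; the SPD test matrix is an arbitrary choice, and the eigendecomposition is obtained from np.linalg.eigh) builds A^{1/2} as in (7.9) and verifies properties (1)-(3):

import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((4, 4))
A = M.T @ M + np.eye(4)                    # an SPD test matrix

lam, Phi = np.linalg.eigh(A)               # eigenvalues lam[i], orthonormal columns Phi[:, i]

# (7.9): A^{1/2} = sum_i lam_i^{1/2} phi_i [phi_i]^t
A_half = sum(np.sqrt(lam[i]) * np.outer(Phi[:, i], Phi[:, i]) for i in range(len(lam)))
A_minus_half = sum(lam[i] ** -0.5 * np.outer(Phi[:, i], Phi[:, i]) for i in range(len(lam)))

print(np.allclose(A_half @ A_half, A))                  # (1): A^{1/2} A^{1/2} = A
print(np.allclose(A_half @ A_minus_half, np.eye(4)))    # (2): A^{-1/2} inverts A^{1/2}
print(np.allclose(A_half, A_half.T),
      np.all(np.linalg.eigvalsh(A_half) > 0))           # (3): A^{1/2} is SPD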
Exercise 3. Use Theorem 1 to prove the general version of the spectral theorem. First show that C = B^{1/2} A B^{-1/2} is a symmetric matrix. Apply Theorem 1 to obtain an orthonormal basis {φ̃_1, . . . , φ̃_n} of eigenvectors for C. Then show that {φ_1, φ_2, . . . , φ_n}, where φ_j = B^{-1/2} φ̃_j, are B-orthonormal eigenvectors for A. Hint: You will need to use properties (1), (2), (3) above with B replacing A.
Corollary 1. Let A and B be as in the above theorem. Then
‖A‖_B = ρ(A).
Here ‖A‖_B denotes the operator norm of A induced from the vector norm ‖ · ‖_B.
Proof. Let λ_i and φ_i, i = 1, . . . , n, be as in the general spectral theorem. Let x ∈ R^n be nonzero with expansion
x = Σ_{i=1}^{n} c_i φ_i.
Then, by bilinearity,
(x, x)_B = ( Σ_{i=1}^{n} c_i φ_i , Σ_{j=1}^{n} c_j φ_j )_B = Σ_{i,j=1}^{n} c_i c_j (φ_i, φ_j)_B = Σ_{i=1}^{n} c_i^2.
A similar computation using the identity
Ax = Σ_{i=1}^{n} c_i Aφ_i = Σ_{i=1}^{n} c_i λ_i φ_i
gives
(Ax, Ax)_B = Σ_{i=1}^{n} c_i^2 λ_i^2 ≤ ρ(A)^2 Σ_{i=1}^{n} c_i^2 = ρ(A)^2 (x, x)_B.
Thus,
‖Ax‖_B / ‖x‖_B = (Ax, Ax)_B^{1/2} / (x, x)_B^{1/2} ≤ ρ(A).
Taking the supremum over x above gives ‖A‖_B ≤ ρ(A). The opposite inequality was proved in an earlier class, i.e., ‖A‖ ≥ ρ(A) holds for any norm ‖ · ‖.
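The corollary can also be illustrated numerically. In the sketch below (an illustration only), B is an arbitrary SPD matrix and A = B^{-1}S with S symmetric, which is one convenient way to manufacture a matrix that is self-adjoint with respect to the B-inner product; the norm ‖A‖_B is computed as the ordinary 2-norm of the similar matrix B^{1/2} A B^{-1/2}.

import numpy as np

rng = np.random.default_rng(3)
n = 5
M = rng.standard_normal((n, n))
B = M.T @ M + np.eye(n)                        # SPD
S = rng.standard_normal((n, n)); S = 0.5 * (S + S.T)   # symmetric
A = np.linalg.solve(B, S)                      # A = B^{-1} S is self-adjoint w.r.t. (·, ·)_B

# B^{1/2} and B^{-1/2} via the spectral theorem, as in (7.9).
w, V = np.linalg.eigh(B)
B_half = V @ np.diag(np.sqrt(w)) @ V.T
B_minus_half = V @ np.diag(1.0 / np.sqrt(w)) @ V.T

op_norm_B = np.linalg.norm(B_half @ A @ B_minus_half, 2)   # ||A||_B
spectral_radius = max(abs(np.linalg.eigvals(A)))           # rho(A)

print(np.isclose(op_norm_B, spectral_radius))              # True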
Application: Let A be a symmetric and positive definite n × n matrix
and consider the linear iterative method (the Richardson Method)
x_{i+1} = x_i + τ (b − A x_i).
Its reduction matrix is given by G = (I − τ A). We clearly have that
ρ(G) = max_{i=1,...,n} |1 − τ λ_i|
where λ_i, i = 1, . . . , n, are the eigenvalues of A. Here we used the fact that the eigenvalues of G = (I − τ A) are 1 − τ λ_i, i = 1, . . . , n. Since A is symmetric with respect to the usual inner product, G is also. It is also obvious that G is self-adjoint with respect to the A-inner product (check it!). It follows from the corollary that ‖G‖ = ‖G‖_A = ρ(G). This immediately implies that
‖e_{i+1}‖ ≤ ρ(G) ‖e_i‖ and ‖e_{i+1}‖_A ≤ ρ(G) ‖e_i‖_A.
This means that each step of the iteration is guaranteed to reduce the error by at least a factor of ρ(G) in either of the natural norms ‖ · ‖ or ‖ · ‖_A.
If λ̃ is any upper bound for ρ(A), then taking τ = 1/λ̃ leads to
ρ(G) = 1 − λ_1/λ̃.
Here we have assumed, w.l.o.g., that the eigenvalues of A are ordered so that 0 < λ_1 ≤ λ_2 ≤ · · · ≤ λ_n. It is natural to look for the value of τ which results in the smallest value of ρ(G), i.e., the fastest reduction rate.
To find this τ, we consider the graph of f(λ) = 1 − τλ and its behavior as we vary τ. It is clear that the spectral radius of G for a given value of τ is bounded from above by the maximum of the absolute value of f(λ) over the interval [λ_1, λ_n]. Figure 1 shows two values of τ; the plot on the left corresponds to the smaller τ. For the left-hand plot, it is clear that the spectral radius of G is taken on at λ_1 and is given by 1 − τλ_1. The second picture is more interesting. From the picture, it is clear that the spectral radius again comes from λ_1. As we make τ larger, the line rotates in a clockwise direction (it always passes through (0, 1)). It is clear that the value at λ_1 decreases as τ becomes larger. Moreover, the absolute value of f(λ_n) increases (once τ is large enough so that f(λ_n) ≤ 0). It is clear that we get the smallest spectral radius when f(λ_1) = −f(λ_n), i.e.,
1 − τλ_1 = τλ_n − 1
or
(7.10)    τ = 2/(λ_1 + λ_n).
For this choice of τ, we get (check it!)
ρ(G) = (λ_n − λ_1)/(λ_n + λ_1).
This is the optimal choice of τ and results in the smallest value of ρ(G_τ) = ρ(I − τA), i.e., the fastest convergence. Unfortunately, though, the use of this parameter in practice requires knowledge of λ_1 and λ_n, quantities which may not be readily available.
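As an illustration of these estimates, here is a minimal NumPy sketch of the Richardson method with the optimal parameter (7.10); the test matrix, right-hand side, and number of steps are arbitrary choices made for the example, and the exact solution is computed directly only so that the error can be monitored:

import numpy as np

rng = np.random.default_rng(4)
n = 50
M = rng.standard_normal((n, n))
A = M.T @ M + np.eye(n)                  # an SPD test matrix
b = rng.standard_normal(n)
x_exact = np.linalg.solve(A, b)          # reference solution, so the error is observable

lam = np.linalg.eigvalsh(A)
tau = 2.0 / (lam[0] + lam[-1])           # optimal parameter (7.10)
rho = (lam[-1] - lam[0]) / (lam[-1] + lam[0])   # predicted contraction factor

def norm_A(v):
    # A-norm of a vector.
    return np.sqrt((A @ v) @ v)

x = np.zeros(n)
for i in range(5):
    e_old = x_exact - x
    x = x + tau * (b - A @ x)            # one Richardson step
    e_new = x_exact - x
    # Both error ratios should be bounded by rho (and are typically close to it).
    print(np.linalg.norm(e_new) / np.linalg.norm(e_old),
          norm_A(e_new) / norm_A(e_old), "bound:", rho)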
[Figure 1. f(λ) for two different values of τ.]
The errors obtained using the Richardson method can be further characterized by expansion in the eigenvectors. Indeed, consider expanding
e_i = Σ_{j=1}^{n} c_j^i φ_j.
Then
e_{i+1} = Σ_{j=1}^{n} c_j^{i+1} φ_j = (I − τ A) Σ_{j=1}^{n} c_j^i φ_j = Σ_{j=1}^{n} (1 − τ λ_j) c_j^i φ_j.
The above gives two different expansions for the same vector in a basis. It follows that the coefficients must be equal, i.e.,
c_j^{i+1} = (1 − τ λ_j) c_j^i.
Thus, each step of the iterative method reduces the absolute value of the j'th coefficient in the expansion by a factor of |1 − τ λ_j|. We gather some of these results into the following proposition.
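This coefficient relation is easy to observe directly (a self-contained sketch; the SPD matrix and the initial error are arbitrary, and the expansion coefficients are read off from the orthonormal eigenvector basis returned by np.linalg.eigh):

import numpy as np

rng = np.random.default_rng(5)
n = 6
M = rng.standard_normal((n, n))
A = M.T @ M + np.eye(n)                  # SPD test matrix
lam, Phi = np.linalg.eigh(A)             # A Phi[:, j] = lam[j] Phi[:, j], columns orthonormal
tau = 2.0 / (lam[0] + lam[-1])

e = rng.standard_normal(n)               # current error e_i
c_old = Phi.T @ e                        # coefficients c_j^i (Phi is orthonormal)
e_next = (np.eye(n) - tau * A) @ e       # e_{i+1} = (I - tau A) e_i
c_new = Phi.T @ e_next                   # coefficients c_j^{i+1}

print(np.allclose(c_new, (1.0 - tau * lam) * c_old))   # c_j^{i+1} = (1 - tau lam_j) c_j^i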
Proposition 1. Let A be a symmetric and positive definite matrix with maximum and minimum eigenvalues λ_n and λ_1, respectively. Then, for τ given by (7.10),
‖I − τA‖_{ℓ2} = ‖I − τA‖_A = (λ_n − λ_1)/(λ_n + λ_1).
The results of this class carry over without change when A is a complex
matrix provided that it is conjugate symmetric and positive definite. We
also refer to conjugate symmetric matrices as Hermitian.
The changes to the development are as follows.
(1) We use Hermitian inner products < ·, · > defined on C^n × C^n which produce complex numbers satisfying:
(a) < x, y > is the complex conjugate of < y, x > for all x, y ∈ C^n (conjugate symmetry).
(b) < x, x > ≥ 0 for all x ∈ C^n and equals 0 only if x = 0 (positive definite).
(c) For α, β ∈ C and x, y, z ∈ C^n,
< αx + βy, z > = α < x, z > + β < y, z >
and
< x, αy + βz > = ᾱ < x, y > + β̄ < x, z >.
(2) The ℓ2-Hermitian inner product is given by
(x, y) := Σ_{i=1}^{n} x_i ȳ_i.
(3) For a Hermitian, positive definite matrix A, the A-inner product is also a Hermitian inner product,
(x, y)_A := (Ax, y) for all x, y ∈ C^n,
where (·, ·) is the ℓ2-Hermitian inner product above.
(4) A matrix A is self-adjoint with respect to a Hermitian inner product < ·, · > if
< Ax, y > = < x, Ay > for all x, y ∈ C^n.
(5) Matrices which are self-adjoint with respect to a Hermitian inner product have real eigenvalues, and the spectral theorems above hold provided that one replaces inner products with Hermitian inner products, symmetric matrices with conjugate symmetric matrices, and R^n with C^n. The eigenvalues remain real, although the eigenvectors are, in general, complex. The corollary also holds, i.e.,
‖A‖_B = ρ(A)
when A is self-adjoint with respect to the B-inner product.
If A is a complex Hermitian positive definite matrix, then the analysis above for the Richardson method carries over without change. This includes the choice of iterative parameter (τ = 2/(λ_1 + λ_n) is the best choice) and the behavior of the coefficients in the eigenvector expansion of the error, i.e.,
c_j^{i+1} = (1 − τ λ_j) c_j^i.