Class Notes 7: CONVERGENCE IN NATURAL NORMS FOR THE SPD CASE
Math 639d
Due Date: Oct. 16 (updated: October 9, 2014)

In earlier classes, we saw that, provided the spectral radius of the reduction matrix $G$ was less than one, there was an abstractly defined norm $\|\cdot\|_*$ for which

(7.1)   $\|G\|_* < 1$.

This, in turn, implied that the method ultimately converged for any initial iterate and right hand side. We also saw that there were examples where, even though $\rho(G)$ was zero, no convergence in the $L^\infty$ norm was observed until the $n$'th step, where $n$ was the number of unknowns (see Example 1 of Class Notes 3). This somewhat annoying behavior was the result of a Jordan block of very large order. Even when large dimensional Jordan blocks are not present, the abstract norm $\|\cdot\|_*$ may be far from any natural norm.

For this class, we shall assume that $A$ is a (real) symmetric and positive definite (SPD) $n \times n$ matrix. Our goal will be to study the convergence of iterative methods in two natural norms, the first being the $\ell_2$ norm and the second being the so-called "$A$-norm" (often called the energy norm). The $A$-norm (for $x \in \mathbb{R}^n$) is defined by

(7.2)   $\|x\|_A = (Ax, x)^{1/2}$

where $(\cdot,\cdot)$ is the dot product,

(7.3)   $(x, y) = \sum_{j=1}^n x_j y_j$.

In your linear algebra course, you should have studied inner products. We shall start with inner products defined on $\mathbb{R}^n$. An inner product is a mapping from $\mathbb{R}^n \times \mathbb{R}^n$ into $\mathbb{R}$. One often uses the notation $\langle \cdot, \cdot \rangle$, so that $\langle x, y \rangle$ is a real number when $x, y$ are vectors in $\mathbb{R}^n$. The inner product is required to satisfy the following axioms:

(1) $\langle x, x \rangle \ge 0$ for all $x \in \mathbb{R}^n$, and $\langle x, x \rangle = 0$ only if $x = 0$ (positive definiteness).
(2) $\langle x, y \rangle = \langle y, x \rangle$ for all $x, y \in \mathbb{R}^n$ (symmetry).
(3) For $\alpha, \beta \in \mathbb{R}$ and $x, y, z \in \mathbb{R}^n$,
    $\langle \alpha x + \beta y, z \rangle = \alpha \langle x, z \rangle + \beta \langle y, z \rangle$
    and
    $\langle x, \alpha y + \beta z \rangle = \alpha \langle x, y \rangle + \beta \langle x, z \rangle$ (bilinearity).

The simplest inner product on $\mathbb{R}^n$ is the $\ell_2$-inner product defined by (7.3).

Exercise 1. Show that (7.3) satisfies the inner product axioms. Show that $(\cdot,\cdot)_A$ defined by

(7.4)   $(x, y)_A = (Ax, y)$ for all $x, y \in \mathbb{R}^n$

satisfies the inner product axioms (this only holds for symmetric and positive definite $A$).

Inner products have a number of interesting properties. The first is that for any inner product $\langle \cdot, \cdot \rangle$, there is a corresponding norm given by

(7.5)   $\|x\| = \langle x, x \rangle^{1/2}$ for all $x \in \mathbb{R}^n$.

In the case of the dot product, $\|x\|$ is just the length of $x$. It is obvious that the first two axioms for being a norm are satisfied by (7.5). That the triangle inequality holds is a consequence of the Cauchy-Schwarz (or Schwarz) inequality:

(7.6)   $|\langle x, y \rangle| \le \|x\| \, \|y\|$ for all $x, y \in \mathbb{R}^n$.

Indeed,

$\|x + y\|^2 = \langle x, x \rangle + 2\langle x, y \rangle + \langle y, y \rangle \le \|x\|^2 + 2\|x\|\,\|y\| + \|y\|^2 = (\|x\| + \|y\|)^2$

and the triangle inequality follows by taking square roots. The inequality (7.6) is obvious for the dot product from the identity

(7.7)   $(x, y) = \|x\| \, \|y\| \cos(\theta)$

where $\theta$ is the angle between $x$ and $y$. Although we shall not prove this inequality in general, the Cauchy-Schwarz inequality holds for any inner product, e.g.,

$|(x, y)_A| \le \|x\|_A \, \|y\|_A$.

Whenever we deal with an inner product, we shall almost always be using the corresponding norm. Usually, we shall indicate that the norm corresponds to the inner product. In any event, we shall always explicitly point out if we are using a norm which does not correspond to the inner product norm.

Remark 1. Note that the $A$-norm is the norm corresponding to the $(\cdot,\cdot)_A$ inner product (the $A$-inner product). Consequently, the expression on the right hand side of (7.2) is indeed a norm.
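For readers who like to experiment, here is a minimal numerical sketch of the two inner products above. It is not part of the notes: it assumes NumPy and uses a small, arbitrarily chosen SPD matrix $A$ to check the symmetry of $(\cdot,\cdot)_A$ and the Cauchy-Schwarz inequality (7.6) in the $A$-norm.

\begin{verbatim}
import numpy as np

# An arbitrarily chosen SPD test matrix (any symmetric positive definite A would do).
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

def ip_A(x, y):
    """A-inner product (x, y)_A = (Ax, y), cf. (7.4)."""
    return np.dot(A @ x, y)

def norm_A(x):
    """A-norm ||x||_A = (Ax, x)^{1/2}, cf. (7.2)."""
    return np.sqrt(ip_A(x, x))

rng = np.random.default_rng(0)
x, y = rng.standard_normal(3), rng.standard_normal(3)

# Symmetry: (x, y)_A = (y, x)_A (uses A = A^t).
print(np.isclose(ip_A(x, y), ip_A(y, x)))

# Cauchy-Schwarz for the A-inner product: |(x, y)_A| <= ||x||_A ||y||_A, cf. (7.6).
print(abs(ip_A(x, y)) <= norm_A(x) * norm_A(y))
\end{verbatim}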
As we have already noticed, different norms assign different "lengths" to the same vector. The norms coming from different inner products have the same property. Besides defining a length (norm), inner products give rise to a notion of angle by generalizing (7.7), i.e., the angle between nonzero vectors $x$ and $y$ with respect to the inner product $\langle \cdot, \cdot \rangle$ is the $\theta$ satisfying

$\cos(\theta) = \dfrac{\langle x, y \rangle}{\|x\|\,\|y\|}$.

Two vectors $x$ and $y$ are orthogonal with respect to the inner product $\langle \cdot, \cdot \rangle$ if $\theta = \pi/2$, i.e., $\langle x, y \rangle = 0$. The angles between two vectors corresponding to different inner products are often not the same, as seen in the following example.

Example 1. Let $x = (1, 1)^t$ and $y = (1, -1)^t$. Then $x$ and $y$ are orthogonal in $(\cdot,\cdot)$, i.e., $(x, y) = 0$. The matrix

$A = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$

is symmetric and positive definite and $(x, y)_A = -1$ (compute it!). Thus, $x$ and $y$ are not orthogonal in $(\cdot,\cdot)_A$.

You should recall from your linear algebra class that a set of vectors $\{v_1, v_2, \dots, v_k\}$ is (mutually) orthogonal if $\langle v_i, v_j \rangle = 0$ when $i \ne j$. This set of vectors is called orthonormal if, in addition, $\|v_i\| = 1$ for $i = 1, \dots, k$. Of course, $\|\cdot\|$ is the norm corresponding to $\langle \cdot, \cdot \rangle$.

Definition 1. We say that an $n \times n$ matrix $B$ is "self-adjoint" with respect to the inner product $\langle \cdot, \cdot \rangle$ on $\mathbb{R}^n$ if $\langle Bx, y \rangle = \langle x, By \rangle$ for all $x, y \in \mathbb{R}^n$.

It is easy to check (do it!) that an $n \times n$ matrix $A$ is self-adjoint with respect to $(\cdot,\cdot)$ if and only if $A$ is symmetric. We also generalize the notion of positive definite and say that $A$ is positive definite with respect to $\langle \cdot, \cdot \rangle$ if $\langle Ax, x \rangle \ge 0$ for all $x \in \mathbb{R}^n$ and $\langle Ax, x \rangle = 0$ only if $x = 0$.

Example 2. Let $A$ and $B$ be $n \times n$ matrices with $A$ symmetric and positive definite and $B$ symmetric. In general, $BA$ is not symmetric (unless $B$ and $A$ commute). However, $BA$ is always self-adjoint with respect to the $A$-inner product since

$(BAx, y)_A = (ABAx, y) = (BAx, Ay) = (Ax, BAy) = (x, BAy)_A$.

Note that if $B$ is positive definite then $BA$ is positive definite with respect to the $A$-inner product since

$(BAx, x)_A = (BAx, Ax)$,

which is non-negative (and zero only if $x = 0$, since $A$ is non-singular).

Exercise 2. Assume that $A$ and $B$ are as in the previous example with $B$ SPD. Show that $B^{-1}$ is also symmetric and positive definite. Show that $BA$ is self-adjoint with respect to the $B^{-1}$-inner product.

The following theorem will be important for the analysis of iterative methods. Its proof is available in any reasonably good book on linear algebra.

Theorem 1 (Spectral Theorem). Let $A$ be an $n \times n$ symmetric real matrix. Then there is an orthonormal basis of (real) eigenvectors for $A$, i.e., there is a set of $n$ vectors $\{\phi_1, \dots, \phi_n\}$ satisfying

$A\phi_i = \lambda_i \phi_i$, for $i = 1, \dots, n$,

and

$(\phi_i, \phi_j) = \delta_{ij} \equiv \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{otherwise}. \end{cases}$

We shall also use a more general version given in the following theorem. The result in the generality stated can be easily derived from the simpler case and is an exercise (below).

Theorem 2 (A More General Spectral Theorem). Let $A$ and $B$ be $n \times n$ matrices with $B$ symmetric and positive definite and $A$ self-adjoint with respect to the $B$-inner product. Then there is a $B$-orthonormal basis of (real) eigenvectors with real eigenvalues for $A$, i.e., there is a set of $n$ vectors $\{\phi_1, \dots, \phi_n\}$ satisfying

$A\phi_i = \lambda_i \phi_i$, for $i = 1, \dots, n$,

and

$(\phi_i, \phi_j)_B = \delta_{ij}$.
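Before using these theorems, it may help to see them in action numerically. The following sketch is illustrative only (it assumes NumPy and generates a random SPD pair $A$, $B$, which is an assumption, not data from the notes): it checks that $BA$ is typically not symmetric, that it is nevertheless self-adjoint in the $A$-inner product as in Example 2, and that its eigenvalues come out real and positive, as Theorem 2 predicts.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
n = 4

def random_spd(n):
    """A randomly generated SPD test matrix: M M^t plus a shift is SPD."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

A, B = random_spd(n), random_spd(n)
x, y = rng.standard_normal(n), rng.standard_normal(n)

ip_A = lambda u, v: np.dot(A @ u, v)   # (u, v)_A = (Au, v)

# Example 2: BA is generally not symmetric ...
print(np.allclose(B @ A, (B @ A).T))                         # typically False
# ... but it is self-adjoint in the A-inner product: (BAx, y)_A = (x, BAy)_A.
print(np.isclose(ip_A(B @ A @ x, y), ip_A(x, B @ A @ y)))    # True

# Theorem 2 (with the roles as in Example 2): BA has real eigenvalues,
# and here they are positive since B is also SPD.
eigvals = np.linalg.eigvals(B @ A)
print(np.allclose(eigvals.imag, 0), np.all(eigvals.real > 0))
\end{verbatim}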
As an example of the application of the first theorem, we consider the definition of the square root of a symmetric and positive definite $n \times n$ matrix $A$. Let $\{\phi_1, \dots, \phi_n\}$ be the orthonormal basis of eigenvectors. We expand $x \in \mathbb{R}^n$ as

$x = \sum_{i=1}^n c_i \phi_i$

and define

(7.8)   $A^{1/2} x = \sum_{i=1}^n \lambda_i^{1/2} c_i \phi_i$.

We already observed in an earlier class that the eigenvalues of a positive definite matrix $A$ are all positive, and we take $\lambda_i^{1/2}$ to be the positive square root. The above definition is more of a transformation representation of the matrix $A^{1/2}$. A more explicit (equivalent) representation is given by

(7.9)   $A^{1/2} = \sum_{i=1}^n \lambda_i^{1/2} \phi_i [\phi_i]^t$.

This is a sum of $n \times n$ matrices since the matrix product $\phi_i [\phi_i]^t$ is an $n \times n$ matrix for each $i$. It is not hard to check that:

(1) The matrix product $A^{1/2} A^{1/2}$ equals $A$.
(2) $A^{1/2}$ is invertible and its inverse is given by $A^{-1/2}$ (defined as in (7.8) or (7.9) with $\lambda_i^{1/2}$ replaced by $\lambda_i^{-1/2}$).
(3) The matrix $A^{1/2}$ is symmetric and positive definite.

Exercise 3. Use Theorem 1 to prove the general version of the spectral theorem. First show that $C = B^{1/2} A B^{-1/2}$ is a symmetric matrix. Apply Theorem 1 to define an orthonormal basis $\{\tilde\phi_1, \dots, \tilde\phi_n\}$ of eigenvectors for $C$. Then show that $\{\phi_1, \phi_2, \dots, \phi_n\}$, where $\phi_j = B^{-1/2} \tilde\phi_j$, are $B$-orthonormal eigenvectors for $A$. Hint: You will need to use properties (1), (2), (3) above with $B$ replacing $A$.

Corollary 1. Let $A$ and $B$ be as in the above theorem. Then $\|A\|_B = \rho(A)$. Here $\|A\|_B$ denotes the operator norm of $A$ induced by the vector norm $\|\cdot\|_B$.

Proof. Let $\lambda_i$ and $\phi_i$, $i = 1, \dots, n$, be as in the general spectral theorem. Let $x \in \mathbb{R}^n$ be nonzero with expansion

$x = \sum_{i=1}^n c_i \phi_i$.

Then, by bilinearity,

$(x, x)_B = \Bigl(\sum_{i=1}^n c_i \phi_i, \sum_{j=1}^n c_j \phi_j\Bigr)_B = \sum_{i,j=1}^n c_i c_j (\phi_i, \phi_j)_B = \sum_{i=1}^n c_i^2$.

A similar computation using the identity

$Ax = \sum_{i=1}^n c_i A\phi_i = \sum_{i=1}^n c_i \lambda_i \phi_i$

gives

$(Ax, Ax)_B = \sum_{i=1}^n c_i^2 \lambda_i^2 \le \rho(A)^2 \sum_{i=1}^n c_i^2 = \rho(A)^2 (x, x)_B$.

Thus,

$\dfrac{\|Ax\|_B}{\|x\|_B} = \dfrac{(Ax, Ax)_B^{1/2}}{(x, x)_B^{1/2}} \le \rho(A)$.

Taking the supremum over $x$ above gives $\|A\|_B \le \rho(A)$. The opposite inequality was proved in an earlier class, i.e., $\|A\| \ge \rho(A)$ holds for any operator norm $\|\cdot\|$.

Application: Let $A$ be a symmetric and positive definite $n \times n$ matrix and consider the linear iterative method (the Richardson method)

$x_{i+1} = x_i + \tau (b - A x_i)$.

Its reduction matrix is given by $G = (I - \tau A)$. We clearly have that

$\rho(G) = \max_{i=1,\dots,n} |1 - \tau \lambda_i|$

where $\lambda_i$, $i = 1, \dots, n$, are the eigenvalues of $A$. Here we used the fact that the eigenvalues of $G = (I - \tau A)$ are $(1 - \tau \lambda_i)$, $i = 1, \dots, n$. Since $A$ is symmetric with respect to the usual inner product, $G$ is also. It is also obvious that $G$ is self-adjoint with respect to the $A$-inner product (check it!). It follows from the corollary that

$\|G\| = \|G\|_A = \rho(G)$.

This immediately implies that

$\|e_{i+1}\| \le \rho(G)\|e_i\|$ and $\|e_{i+1}\|_A \le \rho(G)\|e_i\|_A$.

This means that each step of the iteration is guaranteed to reduce the error by a factor of $\rho(G)$ in either of the natural norms $\|\cdot\|$ or $\|\cdot\|_A$.
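To see these bounds concretely, here is a small numerical sketch (not part of the notes; it assumes NumPy and a randomly generated SPD system, which is an assumption made only for illustration). It runs a few Richardson steps and checks that the $\ell_2$ and $A$-norm errors each drop by at least the factor $\rho(G) = \max_i |1 - \tau\lambda_i|$ per step.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
n = 6

# A randomly generated SPD system A x* = b (assumed data, for illustration only).
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
x_star = rng.standard_normal(n)
b = A @ x_star

lam = np.linalg.eigvalsh(A)               # eigenvalues of A, sorted ascending
tau = 1.0 / lam[-1]                       # an admissible parameter, e.g. 1/lambda_n
rho_G = np.max(np.abs(1.0 - tau * lam))   # spectral radius of G = I - tau*A

norm_A = lambda e: np.sqrt(np.dot(A @ e, e))

x = np.zeros(n)                           # initial iterate
for i in range(5):
    e_old, eA_old = np.linalg.norm(x - x_star), norm_A(x - x_star)
    x = x + tau * (b - A @ x)             # Richardson step
    e_new, eA_new = np.linalg.norm(x - x_star), norm_A(x - x_star)
    # Each step contracts the error by at least rho(G) in both norms.
    print(e_new <= rho_G * e_old + 1e-12, eA_new <= rho_G * eA_old + 1e-12)
\end{verbatim}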
If $\tilde\lambda$ is any upper bound for $\rho(A)$, then taking $\tau = 1/\tilde\lambda$ leads to $\rho(G) = 1 - \lambda_1/\tilde\lambda$. Here we have assumed, w.l.o.g., that the eigenvalues of $A$ are ordered so that

$0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$.

It is natural to look for the value of $\tau$ which results in the smallest value of $\rho(G)$, i.e., the fastest reduction rate. To find this $\tau$, we consider the graph of $f(\lambda) = 1 - \tau\lambda$ and its behavior as we vary $\tau$. It is clear that the spectral radius of $G$ for a given value of $\tau$ is bounded from above by the maximum of the absolute value of $f(\lambda)$ over the interval $[\lambda_1, \lambda_n]$. Figure 1 shows two values of $\tau$; the one on the left corresponds to a smaller $\tau$. For the left hand plot, it is clear that the spectral radius of $G$ is taken on at $\lambda_1$ and is given by $1 - \tau\lambda_1$.

The second picture is more interesting. From the picture, it is clear that the spectral radius again comes from $\lambda_1$. As we make $\tau$ larger, the line will rotate in a clockwise direction (it always passes through $(0, 1)$). It is clear that the value at $\lambda_1$ will decrease as $\tau$ becomes larger. Moreover, the absolute value of $f(\lambda_n)$ will be increasing (once $\tau$ is large enough so that $f(\lambda_n) \le 0$). It is clear that we get the smallest spectral radius when $f(\lambda_1) = -f(\lambda_n)$, i.e., $1 - \tau\lambda_1 = \tau\lambda_n - 1$, or

(7.10)   $\tau = \dfrac{2}{\lambda_1 + \lambda_n}$.

For this choice of $\tau$, we get (check it!)

$\rho(G) = \dfrac{\lambda_n - \lambda_1}{\lambda_n + \lambda_1}$.

This is the optimal choice of $\tau$ and results in the smallest value of $\rho(G_\tau)$, i.e., the fastest convergence. Unfortunately, though, the use of this parameter in practice requires knowledge of $\lambda_1$ and $\lambda_n$, quantities which may not be readily available.

[Figure 1. $f(\lambda)$ for two different values of $\tau$.]

The errors obtained using the Richardson method can be further characterized by expansion in the eigenvectors. Indeed, consider expanding

$e_i = \sum_{j=1}^n c_j^i \phi_j$.

Then

$e_{i+1} = \sum_{j=1}^n c_j^{i+1} \phi_j = (I - \tau A) \sum_{j=1}^n c_j^i \phi_j = \sum_{j=1}^n (1 - \tau\lambda_j) c_j^i \phi_j$.

The above gives two different expansions for the same vector in a basis. It follows that the coefficients must be equal, i.e.,

$c_j^{i+1} = (1 - \tau\lambda_j) c_j^i$.

Thus, each step of the iterative method reduces the absolute value of the $j$'th coefficient in the expansion by a factor of $|1 - \tau\lambda_j|$. We gather some of these results into the following proposition.

Proposition 1. Let $A$ be a symmetric and positive definite matrix with maximum and minimum eigenvalues $\lambda_n$ and $\lambda_1$, respectively. Then, for $\tau$ given by (7.10),

$\|I - \tau A\|_{\ell_2} = \|I - \tau A\|_A = \dfrac{\lambda_n - \lambda_1}{\lambda_n + \lambda_1}$.

The results of this class go over without change when $A$ is a complex matrix, provided that it is conjugate symmetric and positive definite. We also refer to conjugate symmetric matrices as Hermitian. The changes to the development are as follows.

(1) We use Hermitian inner products $\langle \cdot, \cdot \rangle$ defined on $\mathbb{C}^n \times \mathbb{C}^n$ which produce complex numbers satisfying:
    (a) $\langle x, y \rangle = \overline{\langle y, x \rangle}$ for all $x, y \in \mathbb{C}^n$.
    (b) $\langle x, x \rangle \ge 0$ for all $x \in \mathbb{C}^n$ and equals $0$ only if $x = 0$ (positive definiteness).
    (c) For $\alpha, \beta \in \mathbb{C}$ and $x, y, z \in \mathbb{C}^n$,
        $\langle \alpha x + \beta y, z \rangle = \alpha \langle x, z \rangle + \beta \langle y, z \rangle$
        and
        $\langle x, \alpha y + \beta z \rangle = \bar\alpha \langle x, y \rangle + \bar\beta \langle x, z \rangle$.
(2) The $\ell_2$-Hermitian inner product is given by
    $(x, y) := \sum_{i=1}^n x_i \bar{y}_i$.
(3) For a Hermitian, positive definite matrix $A$, the $A$-inner product is also a Hermitian inner product,
    $(x, y)_A := (Ax, y)$ for all $x, y \in \mathbb{C}^n$,
    where $(\cdot,\cdot)$ is the $\ell_2$ Hermitian inner product above.
(4) A matrix $A$ is self-adjoint with respect to a Hermitian inner product $\langle \cdot, \cdot \rangle$ if $\langle Ax, y \rangle = \langle x, Ay \rangle$ for all $x, y \in \mathbb{C}^n$.
(5) Matrices which are self-adjoint with respect to a Hermitian inner product have real eigenvalues, and the spectral theorems above hold provided that one replaces inner products with Hermitian inner products, symmetric matrices with conjugate symmetric (Hermitian) matrices, and $\mathbb{R}^n$ with $\mathbb{C}^n$. The eigenvalues remain real, although the eigenvectors are, in general, complex.

The corollary also holds, i.e., $\|A\|_B = \rho(A)$ when $A$ is self-adjoint with respect to the $B$-inner product. If $A$ is a complex Hermitian positive definite matrix, then the analysis above for the Richardson method carries over without change. This includes the choice of iterative parameter ($\tau = 2/(\lambda_1 + \lambda_n)$ is the best choice) and the behavior of the coefficients in the eigenvector expansion of the error, i.e., $c_j^{i+1} = (1 - \tau\lambda_j) c_j^i$.
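As a closing sanity check, the sketch below (illustrative only; it assumes NumPy and a randomly generated Hermitian positive definite matrix, which is an assumption made just for this example) verifies Proposition 1 for the optimal $\tau$ of (7.10) and the coefficient recursion $c_j^{i+1} = (1 - \tau\lambda_j) c_j^i$ in the complex Hermitian setting.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(3)
n = 5

# A randomly generated Hermitian positive definite matrix (illustration only).
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = M @ M.conj().T + n * np.eye(n)

lam, phi = np.linalg.eigh(A)         # real eigenvalues, unitary eigenvector matrix
lam1, lamn = lam[0], lam[-1]
tau = 2.0 / (lam1 + lamn)            # optimal parameter (7.10)

G = np.eye(n) - tau * A
expected = (lamn - lam1) / (lamn + lam1)

# Proposition 1: the l2 operator norm of G equals (lambda_n - lambda_1)/(lambda_n + lambda_1);
# since G is Hermitian, this is also rho(G) and hence the A-norm of G.
print(np.isclose(np.linalg.norm(G, 2), expected))

# Coefficient recursion: if e has eigenvector coefficients c_j,
# then Ge has coefficients (1 - tau*lambda_j) c_j.
e = rng.standard_normal(n) + 1j * rng.standard_normal(n)
c = phi.conj().T @ e                 # coefficients of e in the eigenbasis
c_next = phi.conj().T @ (G @ e)
print(np.allclose(c_next, (1.0 - tau * lam) * c))
\end{verbatim}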