7.2 Positive Definite Matrices and the SVD

This chapter about applications of $A^TA$ depends on two important ideas in linear algebra. These ideas have big parts to play, and we focus on them now.

1. Positive definite symmetric matrices (both $A^TA$ and $A^TCA$ are positive definite)
2. Singular Value Decomposition ($A = U\Sigma V^T$ gives perfect bases for the 4 subspaces)

Those factors $U$ and $V$ in the SVD are orthogonal matrices. Their columns are orthonormal eigenvectors of $AA^T$ and $A^TA$. The entries in the diagonal matrix $\Sigma$ are the square roots of the eigenvalues. The matrices $AA^T$ and $A^TA$ have the same nonzero eigenvalues. Section 6.5 showed that the eigenvectors of these symmetric matrices are orthogonal. I will show now that the eigenvalues of $A^TA$ are positive, if $A$ has independent columns.

Start with $A^TAx = \lambda x$. Then $x^TA^TAx = \lambda x^Tx$. Therefore

$$\lambda = \frac{\|Ax\|^2}{\|x\|^2} > 0.$$

I separated $x^TA^TAx$ into $(Ax)^T(Ax) = \|Ax\|^2$. We don't have $\lambda = 0$ because $A^TA$ is invertible (since $A$ has independent columns). The eigenvalues must be positive.

Those are the key steps to understanding positive definite matrices. They give us three tests to recognize when a symmetric matrix $S$ is positive definite:

Positive definite symmetric
1. All the eigenvalues of $S$ are positive.
2. The "energy" $x^TSx$ is positive for all nonzero vectors $x$.
3. $S$ has the form $S = A^TA$ with independent columns in $A$.

There is also a test on the pivots (all $> 0$) and a test on $n$ determinants (all $> 0$).

Example 1 Are these matrices positive definite? When their eigenvalues are positive, construct matrices $A$ with $S = A^TA$ and find the positive energy $x^TSx$.

$$\text{(a) } S = \begin{bmatrix} 4 & 0 \\ 0 & 1 \end{bmatrix} \qquad \text{(b) } S = \begin{bmatrix} 5 & 4 \\ 4 & 5 \end{bmatrix} \qquad \text{(c) } S = \begin{bmatrix} 4 & 5 \\ 5 & 4 \end{bmatrix}$$

Solution The answers are yes, yes, and no. The eigenvalues of those matrices $S$ are

(a) $4$ and $1$: positive   (b) $9$ and $1$: positive   (c) $9$ and $-1$: not positive.

A quicker test than eigenvalues uses two determinants: the 1 by 1 determinant $S_{11}$ and the 2 by 2 determinant of $S$. Example (b) has $S_{11} = 5$ and $\det S = 25 - 16 = 9$ (pass). Example (c) has $S_{11} = 4$ but $\det S = 16 - 25 = -9$ (fail the test).

Positive energy is equivalent to positive eigenvalues, when $S$ is symmetric. Let me test the energy $x^TSx$ in all three examples. Two examples pass and the third fails:

$$\begin{bmatrix} x_1 & x_2 \end{bmatrix}\begin{bmatrix} 4 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = 4x_1^2 + x_2^2 > 0 \qquad \text{Positive energy when } x \neq 0$$

$$\begin{bmatrix} x_1 & x_2 \end{bmatrix}\begin{bmatrix} 5 & 4 \\ 4 & 5 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = 5x_1^2 + 8x_1x_2 + 5x_2^2 \qquad \text{Positive energy when } x \neq 0$$

$$\begin{bmatrix} x_1 & x_2 \end{bmatrix}\begin{bmatrix} 4 & 5 \\ 5 & 4 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = 4x_1^2 + 10x_1x_2 + 4x_2^2 \qquad \text{Energy } -2 \text{ when } x = (1, -1)$$

Positive energy is a fundamental property. This is the best definition of positive definiteness.

When the eigenvalues are positive, there will be many matrices $A$ that give $A^TA = S$. One choice of $A$ is symmetric and positive definite! Then $A^TA$ is $A^2$, and this choice $A = \sqrt{S}$ is a true square root of $S$. The successful examples (a) and (b) have $S = A^2$:

$$\begin{bmatrix} 4 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} \qquad \text{and} \qquad \begin{bmatrix} 5 & 4 \\ 4 & 5 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$$

We know that all symmetric matrices have the form $S = V\Lambda V^T$ with orthonormal eigenvectors in $V$. The diagonal matrix $\Lambda$ has a square root $\sqrt{\Lambda}$, when all eigenvalues are positive. In this case $A = \sqrt{S} = V\sqrt{\Lambda}V^T$ is the symmetric positive definite square root:

$$\sqrt{S}\sqrt{S} = (V\sqrt{\Lambda}V^T)(V\sqrt{\Lambda}V^T) = V\sqrt{\Lambda}\sqrt{\Lambda}V^T = S \quad \text{because } V^TV = I.$$

Starting from this unique square root $\sqrt{S}$, other choices of $A$ come easily. Multiply $\sqrt{S}$ by any matrix $Q$ that has orthonormal columns (so that $Q^TQ = I$). Then $Q\sqrt{S}$ is another choice for $A$ (not a symmetric choice).
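These three tests are easy to try numerically. Here is a minimal NumPy sketch (the helper names `is_positive_definite` and `sqrt_spd` are our own, not from the book): test 1 through the eigenvalues, test 2 at the failing vector $x = (1, -1)$, and the symmetric square root $A = V\sqrt{\Lambda}V^T$ for Example 1(b).

```python
import numpy as np

def is_positive_definite(S, tol=1e-12):
    """Test 1: all eigenvalues of the symmetric matrix S are positive."""
    return bool(np.all(np.linalg.eigvalsh(S) > tol))

def sqrt_spd(S):
    """Symmetric square root A = V sqrt(Lambda) V^T, so A^T A = A^2 = S."""
    lam, V = np.linalg.eigh(S)
    return V @ np.diag(np.sqrt(lam)) @ V.T

Sa = np.array([[4., 0.], [0., 1.]])   # (a) eigenvalues 4 and 1
Sb = np.array([[5., 4.], [4., 5.]])   # (b) eigenvalues 9 and 1
Sc = np.array([[4., 5.], [5., 4.]])   # (c) eigenvalues 9 and -1

print([is_positive_definite(S) for S in (Sa, Sb, Sc)])  # [True, True, False]

A = sqrt_spd(Sb)                       # the true square root [[2, 1], [1, 2]]
print(np.allclose(A @ A, Sb))          # True

x = np.array([1., -1.])
print(x @ Sc @ x)                      # -2.0: the energy test fails for (c)
```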
In fact all choices come this way:

$$A^TA = (Q\sqrt{S})^T(Q\sqrt{S}) = \sqrt{S}\,Q^TQ\,\sqrt{S} = S. \tag{1}$$

I will choose a particular $Q$ in Example 1, to get particular choices of $A = Q\sqrt{S}$.

Example 1 (continued) Choose $Q = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ to multiply $\sqrt{S}$. Then $A = Q\sqrt{S}$:

$$A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 2 & 0 \end{bmatrix} \text{ has } S = A^TA = \begin{bmatrix} 4 & 0 \\ 0 & 1 \end{bmatrix}$$

$$A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix} \text{ has } S = A^TA = \begin{bmatrix} 5 & 4 \\ 4 & 5 \end{bmatrix}.$$

Positive Semidefinite Matrices

Positive semidefinite matrices include positive definite matrices, and more. Eigenvalues of $S$ can be zero. Columns of $A$ can be dependent. The energy $x^TSx$ can be zero, but not negative. This gives new equivalent conditions on a (possibly singular) matrix $S = S^T$.

1′. All eigenvalues of $S$ satisfy $\lambda \geq 0$ (semidefinite allows zero eigenvalues).
2′. The energy is nonnegative for every $x$: $x^TSx \geq 0$ (zero energy is allowed).
3′. $S$ has the form $A^TA$ (every $A$ is allowed; its columns can be dependent).

Example 2 The first two matrices are singular and positive semidefinite, but not the third:

$$\text{(d) } S = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \qquad \text{(e) } S = \begin{bmatrix} 4 & 4 \\ 4 & 4 \end{bmatrix} \qquad \text{(f) } S = \begin{bmatrix} -4 & 4 \\ 4 & -4 \end{bmatrix}.$$

The eigenvalues are $1, 0$ and $8, 0$ and $0, -8$. The energies $x^TSx$ are $x_2^2$ and $4(x_1 + x_2)^2$ and $-4(x_1 - x_2)^2$. So the third matrix is actually negative semidefinite.

Singular Value Decomposition

Now we start with $A$, square or rectangular. Applications also start this way: the matrix comes from the model. The SVD splits any matrix into orthogonal $U$ times diagonal $\Sigma$ times orthogonal $V^T$. Those orthogonal factors will give orthogonal bases for the four fundamental subspaces associated with $A$.

Let me describe the goal for any $m$ by $n$ matrix, and then how to achieve that goal.

Find orthonormal bases $v_1, \ldots, v_n$ for $R^n$ and $u_1, \ldots, u_m$ for $R^m$ so that

$$Av_1 = \sigma_1 u_1 \quad \ldots \quad Av_r = \sigma_r u_r \qquad Av_{r+1} = 0 \quad \ldots \quad Av_n = 0 \tag{2}$$

The rank of $A$ is $r$. Those requirements in (2) are expressed by a multiplication $AV = U\Sigma$. The $r$ nonzero singular values $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$ are on the diagonal of $\Sigma$:

$$AV = U\Sigma \qquad A\begin{bmatrix} v_1 \ldots v_r \ldots v_n \end{bmatrix} = \begin{bmatrix} u_1 \ldots u_r \ldots u_m \end{bmatrix}\begin{bmatrix} \sigma_1 & & & \\ & \ddots & & \\ & & \sigma_r & \\ & & & 0 \end{bmatrix} \tag{3}$$

The last $n - r$ vectors in $V$ are a basis for the nullspace of $A$. The last $m - r$ vectors in $U$ are a basis for the nullspace of $A^T$. The diagonal matrix $\Sigma$ is $m$ by $n$, with $r$ nonzeros. Remember that $V^{-1} = V^T$, because the columns $v_1, \ldots, v_n$ are orthonormal in $R^n$:

$$\text{Singular Value Decomposition} \qquad AV = U\Sigma \text{ becomes } A = U\Sigma V^T. \tag{4}$$

The SVD has orthogonal matrices $U$ and $V$, containing eigenvectors of $AA^T$ and $A^TA$.

Comment. A square matrix is diagonalized by its eigenvectors: $Ax_i = \lambda_i x_i$ is like $Av_i = \sigma_i u_i$. But even if $A$ has $n$ eigenvectors, they may not be orthogonal. We need two bases: an input basis of $v$'s in $R^n$ and an output basis of $u$'s in $R^m$. With two bases, any $m$ by $n$ matrix can be diagonalized. The beauty of those bases is that they can be chosen orthonormal. Then $U^TU = I$ and $V^TV = I$.

The $v$'s are eigenvectors of the symmetric matrix $S = A^TA$. We can guarantee their orthogonality, so that $v_j^Tv_i = 0$ for $j \neq i$. That matrix $S$ is positive semidefinite, so its eigenvalues are $\sigma_i^2 \geq 0$. The key to the SVD is that $Av_j$ is orthogonal to $Av_i$:

$$\text{Orthogonal } u\text{'s} \qquad (Av_j)^T(Av_i) = v_j^T(A^TAv_i) = v_j^T(\sigma_i^2 v_i) = \begin{cases} \sigma_i^2 & \text{if } j = i \\ 0 & \text{if } j \neq i \end{cases} \tag{5}$$

This says that the vectors $u_i = Av_i/\sigma_i$ are orthonormal for $i = 1, \ldots, r$. They are a basis for the column space of $A$. And the $u$'s are eigenvectors of the symmetric matrix $AA^T$, which is usually different from $S = A^TA$ (but the eigenvalues $\sigma_1^2, \ldots, \sigma_r^2$ are the same).
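Equation (5) invites a numerical check. A small sketch (the random 4 by 3 matrix is our own test case, not from the text): build the $v$'s as eigenvectors of $S = A^TA$, form $u_i = Av_i/\sigma_i$, and confirm that the $u$'s are orthonormal eigenvectors of $AA^T$ with the same nonzero eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))              # any m by n matrix (here rank 3)

# v's: orthonormal eigenvectors of S = A^T A, eigenvalues sigma_i^2 >= 0
sigma2, V = np.linalg.eigh(A.T @ A)
sigma2, V = sigma2[::-1], V[:, ::-1]         # sort so sigma_1 >= sigma_2 >= ...
sigma = np.sqrt(sigma2)

U = (A @ V) / sigma                          # u_i = A v_i / sigma_i, equation (5)
print(np.allclose(U.T @ U, np.eye(3)))       # True: the u's are orthonormal
print(np.allclose(A @ A.T @ U, U * sigma2))  # u's are eigenvectors of A A^T
```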
Example 3 Find the input and output eigenvectors $v$ and $u$ for the rectangular matrix $A$:

$$A = \begin{bmatrix} 2 & 2 & 0 \\ 1 & -1 & 0 \end{bmatrix} = U\Sigma V^T.$$

Solution Compute $S = A^TA$ and its unit eigenvectors $v_1, v_2, v_3$. The eigenvalues $\sigma^2$ are $8, 2, 0$, so the positive singular values are $\sigma_1 = \sqrt{8}$ and $\sigma_2 = \sqrt{2}$:

$$A^TA = \begin{bmatrix} 5 & 3 & 0 \\ 3 & 5 & 0 \\ 0 & 0 & 0 \end{bmatrix} \text{ has } v_1 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \quad v_2 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}, \quad v_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.$$

The outputs $u_1 = Av_1/\sigma_1$ and $u_2 = Av_2/\sigma_2$ are also orthonormal, with $\sigma_1 = \sqrt{8}$ and $\sigma_2 = \sqrt{2}$. Those vectors $u_1$ and $u_2$ are in the column space of $A$:

$$u_1 = \begin{bmatrix} 2 & 2 & 0 \\ 1 & -1 & 0 \end{bmatrix}\frac{v_1}{\sqrt{8}} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \qquad \text{and} \qquad u_2 = \begin{bmatrix} 2 & 2 & 0 \\ 1 & -1 & 0 \end{bmatrix}\frac{v_2}{\sqrt{2}} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$

Then $U = I$ and the Singular Value Decomposition for this 2 by 3 matrix is $U\Sigma V^T$:

$$A = \begin{bmatrix} 2 & 2 & 0 \\ 1 & -1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} \sqrt{8} & 0 & 0 \\ 0 & \sqrt{2} & 0 \end{bmatrix}\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 & 0 \\ 1 & -1 & 0 \\ 0 & 0 & \sqrt{2} \end{bmatrix}.$$

The Fundamental Theorem of Linear Algebra

I think of the SVD as the final step in the Fundamental Theorem. First come the dimensions of the four subspaces in Figure 7.3. Then comes the orthogonality of those pairs of subspaces. Now come the orthonormal bases of $v$'s and $u$'s that diagonalize $A$:

$$\text{SVD} \qquad Av_j = \sigma_j u_j \text{ for } j \leq r \qquad Av_j = 0 \text{ for } j > r$$
$$\phantom{\text{SVD}} \qquad A^Tu_j = \sigma_j v_j \text{ for } j \leq r \qquad A^Tu_j = 0 \text{ for } j > r$$

Multiplying $Av_j = \sigma_j u_j$ by $A^T$ and dividing by $\sigma_j$ gives that equation $A^Tu_j = \sigma_j v_j$.

Figure 7.3: Orthonormal bases of $v$'s and $u$'s that diagonalize $A$: $m$ by $n$ with rank $r$. (The figure shows $v_1, \ldots, v_r$ spanning the row space and $v_{r+1}, \ldots, v_n$ spanning the nullspace of $A$, dimensions $r$ and $n - r$ in $R^n$; $u_1, \ldots, u_r$ spanning the column space and $u_{r+1}, \ldots, u_m$ spanning the nullspace of $A^T$, dimensions $r$ and $m - r$ in $R^m$; with $Av_1 = \sigma_1 u_1, \ldots, Av_r = \sigma_r u_r$ and $Av_n = 0$.)

The "norm" of $A$ is its largest singular value: $\|A\| = \sigma_1$. This measures the largest possible ratio of $\|Av\|$ to $\|v\|$. That ratio of lengths is a maximum when $v = v_1$ and $Av = \sigma_1 u_1$. This singular value $\sigma_1$ is a much better measure for the size of a matrix than the largest eigenvalue. An extreme case can have zero eigenvalues and just one eigenvector $(1, -1)$ for $A$. But $A^TA$ can still be large: if $v = (1, 1)$ then $Av$ is 200 times larger.

$$A = \begin{bmatrix} 100 & 100 \\ -100 & -100 \end{bmatrix} \text{ has } \lambda_{\max} = 0. \text{ But } \sigma_{\max} = \text{norm of } A = 200. \tag{6}$$

The Condition Number

A valuable property of $A = U\Sigma V^T$ is that it puts the pieces of $A$ in order of importance. Multiplying a column $u_i$ times a row $\sigma_i v_i^T$ produces one piece of the matrix. There will be $r$ nonzero pieces from $r$ nonzero $\sigma$'s, when $A$ has rank $r$. The pieces add up to $A$, when we multiply columns of $U$ times rows of $\Sigma V^T$:

$$\text{The pieces have rank 1} \qquad A = \begin{bmatrix} u_1 \ldots u_r \end{bmatrix}\begin{bmatrix} \sigma_1 v_1^T \\ \vdots \\ \sigma_r v_r^T \end{bmatrix} = u_1(\sigma_1 v_1^T) + \cdots + u_r(\sigma_r v_r^T). \tag{7}$$

The first piece gives the norm of $A$, which is $\sigma_1$. The last piece gives the norm of $A^{-1}$, which is $1/\sigma_n$ when $A$ is invertible. The condition number is $\sigma_1$ times $1/\sigma_n$:

$$\text{Condition number of } A \qquad c(A) = \|A\|\,\|A^{-1}\| = \frac{\sigma_1}{\sigma_n}. \tag{8}$$

This number $c(A)$ is the key to numerical stability in solving $Av = b$. When $A$ is an orthogonal matrix, the symmetric $S = A^TA$ is the identity matrix. So all singular values of an orthogonal matrix are $\sigma = 1$. At the other extreme, a singular matrix has $\sigma_n = 0$. In that case $c = \infty$. Orthogonal matrices have the best condition number $c = 1$.

Data Matrices: Application of the SVD

"Big data" is the linear algebra problem of this century (and we won't solve it here). Sensors and scanners and imaging devices produce enormous volumes of information. Making decisive sense of that data is the problem for a world of analysts (mathematicians and statisticians of a new type). Most often the data comes in the form of a matrix. The usual approach is by PCA, Principal Component Analysis. That is essentially the SVD.
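That connection is easy to see in a few lines. A minimal sketch (the toy data matrix is our own invention): center the columns of the data, take the SVD, and the rank-one pieces of equation (7) add back to the data, with $v_1$ as the first principal direction and $\sigma_1/\sigma_n$ as the condition number from equation (8).

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2)) @ np.array([[3., 1.], [0., 0.5]])  # toy data

Xc = X - X.mean(axis=0)                  # center each column (PCA convention)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

print(Vt[0])                             # first principal direction v_1
print(s[0] / s[-1])                      # condition number sigma_1 / sigma_n

# The rank-one pieces u_i (sigma_i v_i^T) add up to the matrix: equation (7)
pieces = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s)))
print(np.allclose(pieces, Xc))           # True
```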
The first piece $\sigma_1 u_1 v_1^T$ holds the most information (in statistics this piece has the greatest variance). It tells us the most. The Chapter 7 Notes include references.

REVIEW OF THE KEY IDEAS

1. Positive definite symmetric matrices have positive eigenvalues and pivots and energy.
2. $S = A^TA$ is positive definite if and only if $A$ has independent columns.
3. $x^TA^TAx = (Ax)^T(Ax)$ is zero when $Ax = 0$. $A^TA$ can be positive semidefinite.
4. The SVD is a factorization $A = U\Sigma V^T$ = (orthogonal)(diagonal)(orthogonal).
5. The columns of $V$ and $U$ are eigenvectors of $A^TA$ and $AA^T$ (singular vectors of $A$).
6. Those orthonormal bases achieve $Av_i = \sigma_i u_i$ and $A$ is diagonalized.
7. The largest piece of $A = \sigma_1 u_1 v_1^T + \cdots + \sigma_r u_r v_r^T$ gives the norm $\|A\| = \sigma_1$.

Problem Set 7.2

1. For a 2 by 2 matrix, suppose the 1 by 1 and 2 by 2 determinants $a$ and $ac - b^2$ are positive. Then $c > b^2/a$ is also positive.
(i) $\lambda_1$ and $\lambda_2$ have the same sign because their product $\lambda_1\lambda_2$ equals ___.
(ii) That sign is positive because $\lambda_1 + \lambda_2$ equals ___.
Conclusion: The tests $a > 0$, $ac - b^2 > 0$ guarantee positive eigenvalues $\lambda_1, \lambda_2$.

2. Which of $S_1, S_2, S_3, S_4$ has two positive eigenvalues? Use the tests $a > 0$ and $ac - b^2 > 0$, don't compute the $\lambda$'s. Find an $x$ with $x^TS_1x < 0$, confirming that $S_1$ fails the test.

$$S_1 = \begin{bmatrix} 5 & 6 \\ 6 & 7 \end{bmatrix} \quad S_2 = \begin{bmatrix} -1 & -2 \\ -2 & -5 \end{bmatrix} \quad S_3 = \begin{bmatrix} 1 & 10 \\ 10 & 100 \end{bmatrix} \quad S_4 = \begin{bmatrix} 1 & 10 \\ 10 & 101 \end{bmatrix}.$$

3. For which numbers $b$ and $c$ are these matrices positive definite?

$$S = \begin{bmatrix} 1 & b \\ b & 9 \end{bmatrix} \qquad S = \begin{bmatrix} 2 & 4 \\ 4 & c \end{bmatrix} \qquad S = \begin{bmatrix} c & b \\ b & c \end{bmatrix}.$$

4. What is the energy $q = ax^2 + 2bxy + cy^2 = x^TSx$ for each of these matrices? Complete the square to write $q$ as a sum of squares $d_1(\;\;)^2 + d_2(\;\;)^2$.

$$S = \begin{bmatrix} 1 & 2 \\ 2 & 9 \end{bmatrix} \qquad \text{and} \qquad S = \begin{bmatrix} 1 & 3 \\ 3 & 9 \end{bmatrix}.$$

5. $x^TSx = 2x_1x_2$ certainly has a saddle point and not a minimum at $(0, 0)$. What symmetric matrix $S$ produces this energy? What are its eigenvalues?

6. Test to see if $A^TA$ is positive definite in each case:

$$A = \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix} \qquad \text{and} \qquad A = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 2 & 1 \end{bmatrix} \qquad \text{and} \qquad A = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 2 & 1 \end{bmatrix}.$$

7. Which 3 by 3 symmetric matrices $S$ and $T$ produce these quadratic energies?
$x^TSx = 2(x_1^2 + x_2^2 + x_3^2 - x_1x_2 - x_2x_3)$. Why is $S$ positive definite?
$x^TTx = 2(x_1^2 + x_2^2 + x_3^2 - x_1x_2 - x_1x_3 - x_2x_3)$. Why is $T$ semidefinite?

8. Compute the three upper left determinants of $S$ to establish positive definiteness. (The first is 2.) Verify that their ratios give the second and third pivots.

$$\text{Pivots} = \text{ratios of determinants} \qquad S = \begin{bmatrix} 2 & 2 & 0 \\ 2 & 5 & 3 \\ 0 & 3 & 8 \end{bmatrix}.$$

9. For what numbers $c$ and $d$ are $S$ and $T$ positive definite? Test the 3 determinants:

$$S = \begin{bmatrix} c & 1 & 1 \\ 1 & c & 1 \\ 1 & 1 & c \end{bmatrix} \qquad \text{and} \qquad T = \begin{bmatrix} 1 & 2 & 3 \\ 2 & d & 4 \\ 3 & 4 & 5 \end{bmatrix}.$$

10. If $S$ is positive definite then $S^{-1}$ is positive definite. Best proof: The eigenvalues of $S^{-1}$ are positive because ___. Second proof (only for 2 by 2): The entries of

$$S^{-1} = \frac{1}{ac - b^2}\begin{bmatrix} c & -b \\ -b & a \end{bmatrix} \text{ pass the determinant tests: ___.}$$

11. If $S$ and $T$ are positive definite, their sum $S + T$ is positive definite. Pivots and eigenvalues are not convenient for $S + T$. Better to prove $x^T(S + T)x > 0$.

12. A positive definite matrix cannot have a zero (or even worse, a negative number) on its diagonal. Show that this matrix fails to have $x^TSx > 0$:

$$\begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}\begin{bmatrix} 4 & 1 & 1 \\ 1 & 0 & 2 \\ 1 & 2 & 5 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \text{ is not positive when } (x_1, x_2, x_3) = (\;\,,\;\,,\;\,).$$

13. A diagonal entry $a_{jj}$ of a symmetric matrix cannot be smaller than all the $\lambda$'s. If it were, then $A - a_{jj}I$ would have ___ eigenvalues and would be positive definite. But $A - a_{jj}I$ has a ___ on the main diagonal.

14. Show that if all $\lambda > 0$ then $x^TSx > 0$. We must do this for every nonzero $x$, not just the eigenvectors.
So write $x$ as a combination $c_1x_1 + \cdots + c_nx_n$ of the eigenvectors, and explain why all "cross terms" are $x_i^Tx_j = 0$. Then $x^TSx$ is

$$(c_1x_1 + \cdots + c_nx_n)^T(c_1\lambda_1x_1 + \cdots + c_n\lambda_nx_n) = c_1^2\lambda_1x_1^Tx_1 + \cdots + c_n^2\lambda_nx_n^Tx_n > 0.$$

15. Give a quick reason why each of these statements is true:
(a) Every positive definite matrix is invertible.
(b) The only positive definite projection matrix is $P = I$.
(c) A diagonal matrix with positive diagonal entries is positive definite.
(d) A symmetric matrix with a positive determinant might not be positive definite!

16. With positive pivots in $D$, the factorization $S = LDL^T$ becomes $(L\sqrt{D})(\sqrt{D}L^T)$. (Square roots of the pivots give $D = \sqrt{D}\sqrt{D}$.) Then $A = \sqrt{D}L^T$ yields the Cholesky factorization $S = A^TA$, which is "symmetrized $LU$":

$$\text{From } A = \begin{bmatrix} 3 & 1 \\ 0 & 2 \end{bmatrix} \text{ find } S. \qquad \text{From } S = \begin{bmatrix} 4 & 8 \\ 8 & 25 \end{bmatrix} \text{ find } A = \text{chol}(S).$$

17. Without multiplying $S = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} 2 & 0 \\ 0 & 5 \end{bmatrix}\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}$, find
(a) the determinant of $S$   (b) the eigenvalues of $S$
(c) the eigenvectors of $S$   (d) a reason why $S$ is symmetric positive definite.

18. For $F_1(x, y) = \frac{1}{4}x^4 + x^2y + y^2$ and $F_2(x, y) = x^3 + xy - x$, find the second derivative matrices $H_1$ and $H_2$:

$$\text{Test for minimum:} \qquad H = \begin{bmatrix} \partial^2F/\partial x^2 & \partial^2F/\partial x\partial y \\ \partial^2F/\partial y\partial x & \partial^2F/\partial y^2 \end{bmatrix} \text{ is positive definite.}$$

$H_1$ is positive definite so $F_1$ is concave up (= convex). Find the minimum point of $F_1$ and the saddle point of $F_2$ (look only where first derivatives are zero).

19. The graph of $z = x^2 + y^2$ is a bowl opening upward. The graph of $z = x^2 - y^2$ is a saddle. The graph of $z = -x^2 - y^2$ is a bowl opening downward. What is a test on $a, b, c$ for $z = ax^2 + 2bxy + cy^2$ to have a saddle point at $(0, 0)$?

20. Which values of $c$ give a bowl and which $c$ give a saddle point for the graph of $z = 4x^2 + 12xy + cy^2$? Describe this graph at the borderline value of $c$.

21. When $S$ and $T$ are symmetric positive definite, $ST$ might not even be symmetric. But its eigenvalues are still positive. Start from $STx = \lambda x$ and take dot products with $Tx$. Then prove $\lambda > 0$.

22. Suppose $C$ is positive definite (so $y^TCy > 0$ whenever $y \neq 0$) and $A$ has independent columns (so $Ax \neq 0$ whenever $x \neq 0$). Apply the energy test to $x^TA^TCAx$ to show that $A^TCA$ is positive definite: the crucial matrix in engineering.

23. Find the eigenvalues and unit eigenvectors $v_1, v_2$ of $A^TA$. Then find $u_1 = Av_1/\sigma_1$:

$$A = \begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix} \quad \text{and} \quad A^TA = \begin{bmatrix} 10 & 20 \\ 20 & 40 \end{bmatrix} \quad \text{and} \quad AA^T = \begin{bmatrix} 5 & 15 \\ 15 & 45 \end{bmatrix}.$$

Verify that $u_1$ is a unit eigenvector of $AA^T$. Complete the matrices $U, \Sigma, V$:

$$\text{SVD} \qquad \begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix} = \begin{bmatrix} u_1 & u_2 \end{bmatrix}\begin{bmatrix} \sigma_1 & \\ & 0 \end{bmatrix}\begin{bmatrix} v_1 & v_2 \end{bmatrix}^T.$$

24. Write down orthonormal bases for the four fundamental subspaces of this $A$.

25. (a) Why is the trace of $A^TA$ equal to the sum of all $a_{ij}^2$?
(b) For every rank-one matrix, why is $\sigma_1^2$ = sum of all $a_{ij}^2$?

26. Find the eigenvalues and unit eigenvectors of $A^TA$ and $AA^T$. Keep each $Av = \sigma u$:

$$\text{Fibonacci matrix} \qquad A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}$$

Construct the singular value decomposition and verify that $A$ equals $U\Sigma V^T$.

27. Compute $A^TA$ and $AA^T$ and their eigenvalues and unit eigenvectors for $V$ and $U$:

$$\text{Rectangular matrix} \qquad A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}.$$

Check $AV = U\Sigma$ (this will decide the $\pm$ signs in $U$). $\Sigma$ has the same shape as $A$.

28. Construct the matrix with rank one that has $Av = 12u$ for $v = \frac{1}{2}(1, 1, 1, 1)$ and $u = \frac{1}{3}(2, 2, 1)$. Its only singular value is $\sigma_1 = $ ___.

29. Suppose $A$ is invertible (with $\sigma_1 > \sigma_2 > 0$). Change $A$ by as small a matrix as possible to produce a singular matrix $A_0$. Hint: $U$ and $V$ do not change.

$$\text{From } A = \begin{bmatrix} u_1 & u_2 \end{bmatrix}\begin{bmatrix} \sigma_1 & \\ & \sigma_2 \end{bmatrix}\begin{bmatrix} v_1 & v_2 \end{bmatrix}^T \text{ find the nearest } A_0.$$

30. The SVD for $A + I$ doesn't use $\Sigma + I$. Why is $\sigma(A + I)$ not just $\sigma(A) + 1$?
31. Multiply $A^TAv = \sigma^2v$ by $A$. Put in parentheses to show that $Av$ is an eigenvector of $AA^T$. We divide by its length $\|Av\| = \sigma$ to get the unit eigenvector $u$.

32. My favorite example of the SVD is when $Av(x) = dv/dx$, with the endpoint conditions $v(0) = 0$ and $v(1) = 0$. We are looking for orthogonal functions $v(x)$ so that their derivatives $Av = dv/dx$ are also orthogonal. The perfect choice is $v_1 = \sin\pi x$ and $v_2 = \sin 2\pi x$ and $v_k = \sin k\pi x$. Then each $u_k$ is a cosine.

The derivative of $v_1$ is $Av_1 = \pi\cos\pi x = \pi u_1$. The singular values are $\sigma_1 = \pi$ and $\sigma_k = k\pi$. Orthogonality of the sines (and orthogonality of the cosines) is the foundation for Fourier series.

You may object to $AV = U\Sigma$. The derivative $A = d/dx$ is not a matrix! The orthogonal factor $V$ has functions $\sin k\pi x$ in its columns, not vectors. The matrix $U$ has cosine functions $\cos k\pi x$. Since when is this allowed? One answer is to refer you to the chebfun package on the web. This extends linear algebra to matrices whose columns are functions, not vectors. Another answer is to replace $d/dx$ by a first difference matrix $A$. Its shape will be $N + 1$ by $N$. $A$ has 1's down the diagonal and $-1$'s on the diagonal below. Then $AV = U\Sigma$ has discrete sines in $V$ and discrete cosines in $U$. For $N = 2$ those will be sines and cosines of $30^\circ$ and $60^\circ$ in $v_1$ and $u_1$.

Can you construct the difference matrix $A$ (3 by 2) and $A^TA$ (2 by 2)? The discrete sines are $v_1 = (\sqrt{3}/2, \sqrt{3}/2)$ and $v_2 = (\sqrt{3}/2, -\sqrt{3}/2)$. Test that $Av_1$ is orthogonal to $Av_2$. What are the singular values $\sigma_1$ and $\sigma_2$ in $\Sigma$?
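For the end of Problem 32, a short NumPy check may help confirm a hand computation (this 3 by 2 construction is our own reading of the problem): build the difference matrix, test that $Av_1$ is orthogonal to $Av_2$, and compare the computed singular values with yours.

```python
import numpy as np

N = 2
# First difference matrix, (N+1) by N: 1's on the diagonal, -1's below it
A = np.eye(N + 1, N) - np.eye(N + 1, N, k=-1)
print(A)                                   # [[1, 0], [-1, 1], [0, -1]]
print(A.T @ A)                             # [[2, -1], [-1, 2]]

v1 = np.array([np.sqrt(3) / 2,  np.sqrt(3) / 2])   # discrete sines
v2 = np.array([np.sqrt(3) / 2, -np.sqrt(3) / 2])
print((A @ v1) @ (A @ v2))                 # 0.0: Av1 is orthogonal to Av2
print(np.linalg.svd(A, compute_uv=False))  # the singular values in Sigma
```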