7.2 Positive Definite Matrices and the SVD
This chapter about applications of AᵀA depends on two important ideas in linear algebra. These ideas have big parts to play, and we focus on them now.

1. Positive definite symmetric matrices (both AᵀA and AᵀCA are positive definite)
2. Singular Value Decomposition (A = UΣVᵀ gives perfect bases for the 4 subspaces)
Those are orthogonal matrices U and V in the SVD. Their columns are orthonormal eigenvectors of AAᵀ and AᵀA. The entries in the diagonal matrix Σ are the square roots of the eigenvalues. The matrices AAᵀ and AᵀA have the same nonzero eigenvalues. Section 6.5 showed that the eigenvectors of these symmetric matrices are orthogonal. I will show now that the eigenvalues of AᵀA are positive, if A has independent columns.

Start with AᵀAx = λx. Then xᵀAᵀAx = λxᵀx. Therefore λ = ‖Ax‖²/‖x‖² > 0.

I separated xᵀAᵀAx into (Ax)ᵀ(Ax) = ‖Ax‖². We don't have λ = 0 because AᵀA is invertible (since A has independent columns). The eigenvalues λ must be positive.
Those are the key steps to understanding positive definite matrices. They give us three tests on S—three ways to recognize when a symmetric matrix S is positive definite:
Positive definite symmetric
1. All the eigenvalues of S are positive.
2. The “energy” xᵀSx is positive for all nonzero vectors x.
3. S has the form S = AᵀA with independent columns in A.
There is also a test on the pivots (all > 0) and a test on n determinants (all > 0).
Example 1  Are these matrices positive definite? When their eigenvalues are positive, construct matrices A with S = AᵀA and find the positive energy xᵀSx.

(a) S = [4 0; 0 1]    (b) S = [5 4; 4 5]    (c) S = [4 5; 5 4]

Solution  The answers are yes, yes, and no. The eigenvalues of those matrices S are

(a) 4 and 1 : positive    (b) 9 and 1 : positive    (c) 9 and −1 : not positive.

A quicker test than eigenvalues uses two determinants: the 1 by 1 determinant S₁₁ and the 2 by 2 determinant of S. Example (b) has S₁₁ = 5 and det S = 25 − 16 = 9 (pass). Example (c) has S₁₁ = 4 but det S = 16 − 25 = −9 (fail the test).
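Both tests are easy to run by machine. Here is a minimal NumPy sketch (mine, not the book's) that applies the eigenvalue test and the two-determinant test to the matrices of Example 1:

```python
import numpy as np

# The three matrices of Example 1
tests = {
    "(a)": np.array([[4.0, 0.0], [0.0, 1.0]]),
    "(b)": np.array([[5.0, 4.0], [4.0, 5.0]]),
    "(c)": np.array([[4.0, 5.0], [5.0, 4.0]]),
}

for name, S in tests.items():
    eigs = np.linalg.eigvalsh(S)                     # eigenvalues of symmetric S, ascending
    two_dets = S[0, 0] > 0 and np.linalg.det(S) > 0  # S11 > 0 and det S > 0
    print(name, "eigenvalues", eigs, "| passes determinant test:", two_dets)

# (a) [1. 4.] True    (b) [1. 9.] True    (c) [-1. 9.] False
```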
Positive energy is equivalent to positive eigenvalues, when S is symmetric. Let me test the energy xᵀSx in all three examples. Two examples pass and the third fails:

[x₁ x₂] [4 0; 0 1] [x₁; x₂] = 4x₁² + x₂² > 0            Positive energy when x ≠ 0

[x₁ x₂] [5 4; 4 5] [x₁; x₂] = 5x₁² + 8x₁x₂ + 5x₂²        Positive energy when x ≠ 0

[x₁ x₂] [4 5; 5 4] [x₁; x₂] = 4x₁² + 10x₁x₂ + 4x₂²       Energy −2 when x = (1, −1)
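The energy test is one line of code. A quick numerical check (a sketch, not from the text) confirms the negative energy of matrix (c) at x = (1, −1):

```python
import numpy as np

def energy(S, x):
    """The quadratic form x^T S x."""
    x = np.asarray(x, dtype=float)
    return x @ S @ x

S_c = np.array([[4.0, 5.0], [5.0, 4.0]])   # matrix (c)
print(energy(S_c, [1, -1]))                # -2.0, so (c) is not positive definite
```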
Positive energy is a fundamental property. This is the best definition of positive definiteness.
When the eigenvalues are positive, there will be many matrices A that give AᵀA = S. One choice of A is symmetric and positive definite! Then AᵀA is A², and this choice A = √S is a true square root of S. The successful examples (a) and (b) have S = A²:
[4 0; 0 1] = [2 0; 0 1] [2 0; 0 1]    and    [5 4; 4 5] = [2 1; 1 2] [2 1; 1 2]
We know that all symmetric matrices have the form S = VΛVᵀ with orthonormal eigenvectors in V. The diagonal matrix Λ has a square root √Λ, when all eigenvalues are positive. In this case A = √S = V√ΛVᵀ is the symmetric positive definite square root:

√S √S = (V√ΛVᵀ)(V√ΛVᵀ) = V√Λ√ΛVᵀ = S   because VᵀV = I.
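This construction translates directly into code. A minimal sketch, assuming example (b), that builds √S = V√ΛVᵀ from np.linalg.eigh and verifies √S√S = S:

```python
import numpy as np

S = np.array([[5.0, 4.0], [4.0, 5.0]])      # example (b): eigenvalues 9 and 1

lam, V = np.linalg.eigh(S)                  # S = V diag(lam) V^T, orthonormal columns in V
sqrtS = V @ np.diag(np.sqrt(lam)) @ V.T     # the symmetric square root V sqrt(Lambda) V^T

print(sqrtS)                                # [[2. 1.], [1. 2.]]
print(np.allclose(sqrtS @ sqrtS, S))        # True: sqrt(S) sqrt(S) = S
```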
Starting from this unique square root √S, other choices of A come easily. Multiply √S by any matrix Q that has orthonormal columns (so that QᵀQ = I). Then Q√S is another choice for A (not a symmetric choice). In fact all choices come this way:

AᵀA = (Q√S)ᵀ(Q√S) = √S QᵀQ √S = S.        (1)

I will choose a particular Q in Example 1, to get particular choices of A = Q√S.
Example 1 (continued)  Choose Q = [0 1; 1 0] to multiply √S. Then A = Q√S:

A = [0 1; 1 0] [2 0; 0 1] = [0 1; 2 0]   has   S = AᵀA = [4 0; 0 1]

A = [0 1; 1 0] [2 1; 1 2] = [1 2; 2 1]   has   S = AᵀA = [5 4; 4 5].
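A two-line check of equation (1), assuming the matrices above: multiplying √S by an orthogonal Q changes A but not AᵀA.

```python
import numpy as np

Q = np.array([[0.0, 1.0], [1.0, 0.0]])       # orthogonal: Q^T Q = I
sqrtS = np.array([[2.0, 1.0], [1.0, 2.0]])   # square root of S = [5 4; 4 5]

A = Q @ sqrtS                                # A = Q sqrt(S) = [[1, 2], [2, 1]]
print(A.T @ A)                               # recovers S = [[5, 4], [4, 5]]
```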
Positive Semidefinite Matrices
Positive semidefinite matrices include positive definite matrices, and more. Eigenvalues of S can be zero. Columns of A can be dependent. The energy xᵀSx can be zero—but not negative. This gives new equivalent conditions on a (possibly singular) matrix S = Sᵀ.
1′ All eigenvalues of S satisfy λ ≥ 0 (semidefinite allows zero eigenvalues).
2′ The energy is nonnegative for every x : xᵀSx ≥ 0 (zero energy is allowed).
3′ S has the form AᵀA (every A is allowed; its columns can be dependent).

Example 2  The first two matrices are singular and positive semidefinite—but not the third:
(d) S = [0 0; 0 1]    (e) S = [4 4; 4 4]    (f) S = [−4 4; 4 −4]

The eigenvalues are 1, 0 and 8, 0 and −8, 0. The energies xᵀSx are x₂² and 4(x₁ + x₂)² and −4(x₁ − x₂)². So the third matrix is actually negative semidefinite.
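The same numerical tests extend to the semidefinite case. A short sketch (not the book's code) checks the eigenvalues and evaluates the energy at x = (1, −1), which exposes matrix (f):

```python
import numpy as np

for name, S in [("(d)", [[0, 0], [0, 1]]),
                ("(e)", [[4, 4], [4, 4]]),
                ("(f)", [[-4, 4], [4, -4]])]:
    S = np.array(S, dtype=float)
    x = np.array([1.0, -1.0])
    print(name, "eigenvalues", np.linalg.eigvalsh(S), "| energy at (1,-1):", x @ S @ x)

# (d) [0. 1.] energy 1.0   (e) [0. 8.] energy 0.0   (f) [-8. 0.] energy -16.0
```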
Singular Value Decomposition
Now we start with A, square or rectangular. Applications also start this way—the matrix
comes from the model. The SVD splits any matrix into orthogonal U times diagonal Σ times orthogonal Vᵀ. Those orthogonal factors will give orthogonal bases for the four
fundamental subspaces associated with A.
Let me describe the goal for any m by n matrix, and then how to achieve that goal.
Find orthonormal bases v₁, …, vₙ for Rⁿ and u₁, …, uₘ for Rᵐ so that

Av₁ = σ₁u₁   …   Avᵣ = σᵣuᵣ        Avᵣ₊₁ = 0   …   Avₙ = 0        (2)

The rank of A is r. Those requirements in (2) are expressed by a multiplication AV = UΣ. The r nonzero singular values σ₁ ≥ σ₂ ≥ ⋯ ≥ σᵣ > 0 are on the diagonal of Σ:
AV = UΣ        A [v₁ … vᵣ … vₙ] = [u₁ … uᵣ … uₘ] diag(σ₁, …, σᵣ, 0, …)        (3)
The last n − r vectors in V are a basis for the nullspace of A. The last m − r vectors in U are a basis for the nullspace of Aᵀ. The diagonal matrix Σ is m by n, with r nonzeros. Remember that V⁻¹ = Vᵀ, because the columns v₁, …, vₙ are orthonormal in Rⁿ:

Singular Value Decomposition        AV = UΣ   becomes   A = UΣVᵀ.        (4)
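In NumPy one call produces all three factors. A minimal sketch (mine, not the book's): np.linalg.svd returns the singular values as a vector and returns Vᵀ rather than V, so the m by n matrix Σ must be rebuilt:

```python
import numpy as np

A = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0]])            # any m by n matrix (here 2 by 3)

U, sigma, VT = np.linalg.svd(A)            # sigma is a vector, VT is V transpose
Sigma = np.zeros_like(A)                   # rebuild the m by n diagonal matrix
np.fill_diagonal(Sigma, sigma)

print(np.allclose(A, U @ Sigma @ VT))      # True: A = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(2)))     # True: U is orthogonal
print(np.allclose(VT @ VT.T, np.eye(3)))   # True: V is orthogonal
```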
The SVD has orthogonal matrices U and V, containing eigenvectors of AAᵀ and AᵀA.
Comment. A square matrix is diagonalized by its eigenvectors: Axᵢ = λᵢxᵢ is like Avᵢ = σᵢuᵢ. But even if A has n eigenvectors, they may not be orthogonal. We need two bases—an input basis of v's in Rⁿ and an output basis of u's in Rᵐ. With two bases, any m by n matrix can be diagonalized. The beauty of those bases is that they can be chosen orthonormal. Then UᵀU = I and VᵀV = I.
The v's are eigenvectors of the symmetric matrix S = AᵀA. We can guarantee their orthogonality, so that vⱼᵀvᵢ = 0 for j ≠ i. That matrix S is positive semidefinite, so its eigenvalues are σᵢ² ≥ 0. The key to the SVD is that Avⱼ is orthogonal to Avᵢ:

Orthogonal u's        (Avⱼ)ᵀ(Avᵢ) = vⱼᵀ(AᵀAvᵢ) = vⱼᵀ(σᵢ²vᵢ) = σᵢ² if j = i, 0 if j ≠ i        (5)

This says that the vectors uᵢ = Avᵢ/σᵢ are orthonormal for i = 1, …, r. They are a basis for the column space of A. And the u's are eigenvectors of the symmetric matrix AAᵀ, which is usually different from S = AᵀA (but the eigenvalues σ₁², …, σᵣ² are the same).
Example 3  Find the input and output eigenvectors v and u for the rectangular matrix A:

A = [2 2 0; −1 1 0] = UΣVᵀ.

Solution  Compute S = AᵀA and its unit eigenvectors v₁, v₂, v₃. The eigenvalues σ² are 8, 2, 0 so the positive singular values are σ₁ = √8 and σ₂ = √2:
AᵀA = [5 3 0; 3 5 0; 0 0 0]   has   v₁ = ½(√2, √2, 0),   v₂ = ½(−√2, √2, 0),   v₃ = (0, 0, 1).
The outputs u₁ = Av₁/σ₁ and u₂ = Av₂/σ₂ are also orthonormal, with σ₁ = √8 and σ₂ = √2. Those vectors u₁ and u₂ are in the column space of A:

u₁ = [2 2 0; −1 1 0] v₁ / √8 = (1, 0)    and    u₂ = [2 2 0; −1 1 0] v₂ / √2 = (0, 1).
Then U = I and the Singular Value Decomposition for this 2 by 3 matrix is UΣVᵀ:

A = [2 2 0; −1 1 0] = [1 0; 0 1] [√8 0 0; 0 √2 0] ( ½ [√2 −√2 0; √2 √2 0; 0 0 2] )ᵀ.
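Example 3 can be checked numerically in a few lines (a sketch; a numerical SVD is free to flip the signs of a pair vᵢ, uᵢ, so compare up to sign):

```python
import numpy as np

A = np.array([[2.0, 2.0, 0.0],
              [-1.0, 1.0, 0.0]])
U, sigma, VT = np.linalg.svd(A)

print(sigma)                     # [2.8284..., 1.4142...] = [sqrt(8), sqrt(2)]
print(U)                         # the identity matrix, up to signs

for i in range(2):               # check A v_i = sigma_i u_i for the two singular pairs
    print(np.allclose(A @ VT[i], sigma[i] * U[:, i]))   # True, True
```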
The Fundamental Theorem of Linear Algebra
I think of the SVD as the final step in the Fundamental Theorem. First come the dimensions of the four subspaces in Figure 7.3. Then comes the orthogonality of those pairs of subspaces. Now come the orthonormal bases of v's and u's that diagonalize A:
SVD        Avⱼ = σⱼuⱼ for j ≤ r        Avⱼ = 0 for j > r
           Aᵀuⱼ = σⱼvⱼ for j ≤ r       Aᵀuⱼ = 0 for j > r

Multiplying Avⱼ = σⱼuⱼ by Aᵀ and dividing by σⱼ gives that equation Aᵀuⱼ = σⱼvⱼ.
[Figure: the big picture of the SVD. In Rⁿ, the row space of A (dimension r, basis v₁, …, vᵣ) and the nullspace of A (dimension n − r, basis vᵣ₊₁, …, vₙ with Avₙ = 0). In Rᵐ, the column space of A (dimension r, basis u₁, …, uᵣ with Av₁ = σ₁u₁, …, Avᵣ = σᵣuᵣ) and the nullspace of Aᵀ (dimension m − r, basis uᵣ₊₁, …, uₘ).]

Figure 7.3: Orthonormal bases of v's and u's that diagonalize A : m by n with rank r.
The “norm” of A is its largest singular value: ‖A‖ = σ₁. This measures the largest possible ratio of ‖Av‖ to ‖v‖. That ratio of lengths is a maximum when v = v₁ and Av = σ₁u₁. This singular value σ₁ is a much better measure for the size of a matrix than the largest eigenvalue. An extreme case can have zero eigenvalues and just one eigenvector (1, −1) for A. But AᵀA can still be large: if v = (1, 1) then Av is 200 times larger.

A = [100 100; −100 −100]   has   λmax = 0.   But   σmax = norm of A = 200.        (6)
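Equation (6) is worth reproducing (a quick sketch, not from the text): the eigenvalues of this A are both zero, but the matrix norm, the largest singular value, is 200.

```python
import numpy as np

A = np.array([[100.0, 100.0], [-100.0, -100.0]])

print(np.linalg.eigvals(A))      # both eigenvalues are 0 (up to roundoff)
print(np.linalg.norm(A, 2))      # 200.0 = sigma_1 = ||A||

v = np.array([1.0, 1.0])
print(np.linalg.norm(A @ v) / np.linalg.norm(v))   # 200.0: the ratio ||Av||/||v|| at v = (1, 1)
```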
The Condition Number
A valuable property of A = UΣVᵀ is that it puts the pieces of A in order of importance. Multiplying a column uᵢ times a row σᵢvᵢᵀ produces one piece of the matrix. There will be r nonzero pieces from r nonzero σ's, when A has rank r. The pieces add up to A, when we multiply columns of U times rows of ΣVᵀ:
The pieces have rank 1        A = [u₁ … uᵣ] [σ₁v₁ᵀ; … ; σᵣvᵣᵀ] = u₁(σ₁v₁ᵀ) + ⋯ + uᵣ(σᵣvᵣᵀ).        (7)

The first piece gives the norm of A which is σ₁. The last piece gives the norm of A⁻¹ which is 1/σₙ when A is invertible. The condition number is σ₁ times 1/σₙ:

Condition number of A        c(A) = ‖A‖ ‖A⁻¹‖ = σ₁/σₙ.        (8)
This number c(A) is the key to numerical stability in solving Av = b. When A is an orthogonal matrix, the symmetric S = AᵀA is the identity matrix. So all singular values of an orthogonal matrix are σ = 1. At the other extreme, a singular matrix has σₙ = 0. In that case c = ∞. Orthogonal matrices have the best condition number c = 1.
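NumPy computes exactly this ratio. A minimal sketch (np.linalg.cond uses the 2-norm, that is σ₁/σₙ, by default):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
sigma = np.linalg.svd(A, compute_uv=False)   # singular values, largest first

print(sigma[0] / sigma[-1])                  # sigma_1 / sigma_n
print(np.linalg.cond(A))                     # the same number c(A)

Q = np.array([[0.0, 1.0], [1.0, 0.0]])       # an orthogonal matrix
print(np.linalg.cond(Q))                     # 1.0: the best condition number
```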
Data Matrices: Application of the SVD
“Big data” is the linear algebra problem of this century (and we won’t solve it here).
Sensors and scanners and imaging devices produce enormous volumes of information.
Making decisive sense of that data is the problem for a world of analysts (mathematicians
and statisticians of a new type). Most often the data comes in the form of a matrix.
The usual approach is by PCA—Principal Component Analysis. That is essentially the SVD. The first piece σ₁u₁v₁ᵀ holds the most information (in statistics this piece has the greatest variance). It tells us the most. The Chapter 7 Notes include references.
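To make that concrete, here is a hedged sketch of the idea (the function name truncated_svd and the random stand-in data are mine, not the book's): keeping the first k pieces σᵢuᵢvᵢᵀ of a centered data matrix keeps the directions with the greatest variance.

```python
import numpy as np

def truncated_svd(A, k):
    """Sum of the first k rank-one pieces sigma_i u_i v_i^T."""
    U, sigma, VT = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(sigma[:k]) @ VT[:k]

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 6))      # a stand-in data matrix: 100 samples, 6 variables
data -= data.mean(axis=0)             # center each column, as PCA does
piece1 = truncated_svd(data, 1)       # the dominant piece sigma_1 u_1 v_1^T
print(np.linalg.matrix_rank(piece1))  # 1
```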
REVIEW OF THE KEY IDEAS
1. Positive definite symmetric matrices have positive eigenvalues and pivots and energy.
2. S = AᵀA is positive definite if and only if A has independent columns.
3. xᵀAᵀAx = (Ax)ᵀ(Ax) is zero when Ax = 0. AᵀA can be positive semidefinite.
4. The SVD is a factorization A = UΣVᵀ = (orthogonal)(diagonal)(orthogonal).
5. The columns of V and U are eigenvectors of AᵀA and AAᵀ (singular vectors of A).
6. Those orthonormal bases achieve Avᵢ = σᵢuᵢ, and A is diagonalized.
7. The largest piece of A = σ₁u₁v₁ᵀ + ⋯ + σᵣuᵣvᵣᵀ gives the norm ‖A‖ = σ₁.
Problem Set 7.2
1  For a 2 by 2 matrix S = [a b; b c], suppose the 1 by 1 and 2 by 2 determinants a and ac − b² are positive. Then c > b²/a is also positive.

(i) λ₁ and λ₂ have the same sign because their product λ₁λ₂ equals ______.
(ii) That sign is positive because λ₁ + λ₂ equals ______.

Conclusion: The tests a > 0, ac − b² > 0 guarantee positive eigenvalues λ₁, λ₂.
2  Which of S₁, S₂, S₃, S₄ has two positive eigenvalues? Use a and ac − b², don't compute the λ's. Find an x with xᵀS₁x < 0, confirming that S₁ fails the test.

S₁ = [5 6; 6 7]    S₂ = [−1 −2; −2 −5]    S₃ = [1 10; 10 100]    S₄ = [1 10; 10 101].
3  For which numbers b and c are these matrices positive definite?

S = [1 b; b 9]    S = [2 4; 4 c]    S = [c b; b c].
4  What is the energy q = ax² + 2bxy + cy² = xᵀSx for each of these matrices? Complete the square to write q as a sum of squares d₁( )² + d₂( )².

S = [1 2; 2 9]    and    S = [1 3; 3 9].
5  xᵀSx = 2x₁x₂ certainly has a saddle point and not a minimum at (0, 0). What symmetric matrix S produces this energy? What are its eigenvalues?
6  Test to see if AᵀA is positive definite in each case:

A = [1 2; 0 3]    and    A = [1 1; 1 2; 2 1]    and    A = [1 1 2; 1 2 1].

7  Which 3 by 3 symmetric matrices S and T produce these quadratic energies?

xᵀSx = 2(x₁² + x₂² + x₃² − x₁x₂ − x₂x₃). Why is S positive definite?
xᵀTx = 2(x₁² + x₂² + x₃² − x₁x₂ − x₁x₃ − x₂x₃). Why is T semidefinite?
8  Compute the three upper left determinants of S to establish positive definiteness. (The first is 2.) Verify that their ratios give the second and third pivots.

Pivots = ratios of determinants        S = [2 2 0; 2 5 3; 0 3 8].
9  For what numbers c and d are S and T positive definite? Test the 3 determinants:

S = [c 1 1; 1 c 1; 1 1 c]    and    T = [1 2 3; 2 d 4; 3 4 5].

10  If S is positive definite then S⁻¹ is positive definite. Best proof: The eigenvalues of S⁻¹ are positive because ______. Second proof (only for 2 by 2): The entries of

S⁻¹ = 1/(ac − b²) [c −b; −b a]   pass the determinant tests ______.
11  If S and T are positive definite, their sum S + T is positive definite. Pivots and eigenvalues are not convenient for S + T. Better to prove xᵀ(S + T)x > 0.
12  A positive definite matrix cannot have a zero (or even worse, a negative number) on its diagonal. Show that this matrix fails to have xᵀSx > 0:

[x₁ x₂ x₃] [4 1 1; 1 0 2; 1 2 5] [x₁; x₂; x₃]   is not positive when (x₁, x₂, x₃) = ( , , ).
13  A diagonal entry aⱼⱼ of a symmetric matrix cannot be smaller than all the λ's. If it were, then A − aⱼⱼI would have ______ eigenvalues and would be positive definite. But A − aⱼⱼI has a ______ on the main diagonal.
14  Show that if all λ > 0 then xᵀSx > 0. We must do this for every nonzero x, not just the eigenvectors. So write x as a combination of the eigenvectors and explain why all “cross terms” are xᵢᵀxⱼ = 0. Then xᵀSx is

(c₁x₁ + ⋯ + cₙxₙ)ᵀ(c₁λ₁x₁ + ⋯ + cₙλₙxₙ) = c₁²λ₁x₁ᵀx₁ + ⋯ + cₙ²λₙxₙᵀxₙ > 0.
15  Give a quick reason why each of these statements is true:

(a) Every positive definite matrix is invertible.
(b) The only positive definite projection matrix is P = I.
(c) A diagonal matrix with positive diagonal entries is positive definite.
(d) A symmetric matrix with a positive determinant might not be positive definite!
16  With positive pivots in D, the factorization S = LDLᵀ becomes L√D√DLᵀ. (Square roots of the pivots give D = √D√D.) Then A = √DLᵀ yields the Cholesky factorization S = AᵀA, which is “symmetrized L U”:

From A = [3 1; 0 2] find S.    From S = [4 8; 8 25] find A = chol(S).
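For checking answers by machine: NumPy's np.linalg.cholesky returns a lower triangular L with S = LLᵀ, so A = Lᵀ gives S = AᵀA (a sketch on example (b) from this section, not on this problem's matrices):

```python
import numpy as np

S = np.array([[5.0, 4.0], [4.0, 5.0]])   # example (b) from this section

L = np.linalg.cholesky(S)                # lower triangular, S = L L^T
A = L.T                                  # upper triangular, so S = A^T A

print(A)
print(np.allclose(A.T @ A, S))           # True: the Cholesky factorization S = A^T A
```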
17  Without multiplying S = [cos θ −sin θ; sin θ cos θ] [2 0; 0 5] [cos θ sin θ; −sin θ cos θ], find

(a) the determinant of S        (b) the eigenvalues of S
(c) the eigenvectors of S       (d) a reason why S is symmetric positive definite.
18  For F₁(x, y) = ¼x⁴ + x²y + y² and F₂(x, y) = x³ + xy − x, find the second derivative matrices H₁ and H₂:

Test for minimum        H = [∂²F/∂x² ∂²F/∂x∂y; ∂²F/∂y∂x ∂²F/∂y²]   is positive definite

H₁ is positive definite so F₁ is concave up (= convex). Find the minimum point of F₁ and the saddle point of F₂ (look only where first derivatives are zero).
19  The graph of z = x² + y² is a bowl opening upward. The graph of z = x² − y² is a saddle. The graph of z = −x² − y² is a bowl opening downward. What is a test on a, b, c for z = ax² + 2bxy + cy² to have a saddle point at (0, 0)?
20  Which values of c give a bowl and which c give a saddle point for the graph of z = 4x² + 12xy + cy²? Describe this graph at the borderline value of c.
21  When S and T are symmetric positive definite, ST might not even be symmetric. But its eigenvalues are still positive. Start from STx = λx and take dot products with Tx. Then prove λ > 0.
22  Suppose C is positive definite (so yᵀCy > 0 whenever y ≠ 0) and A has independent columns (so Ax ≠ 0 whenever x ≠ 0). Apply the energy test to xᵀAᵀCAx to show that AᵀCA is positive definite: the crucial matrix in engineering.
23  Find the eigenvalues and unit eigenvectors v₁, v₂ of AᵀA. Then find u₁ = Av₁/σ₁:

A = [1 2; 3 6]    and    AᵀA = [10 20; 20 40]    and    AAᵀ = [5 15; 15 45].

Verify that u₁ is a unit eigenvector of AAᵀ. Complete the matrices U, Σ, V:

SVD        [1 2; 3 6] = [u₁ u₂] [σ₁ 0; 0 0] [v₁ v₂]ᵀ.
24  Write down orthonormal bases for the four fundamental subspaces of this A.

25  (a) Why is the trace of AᵀA equal to the sum of all aᵢⱼ²?
    (b) For every rank-one matrix, why is σ₁² = sum of all aᵢⱼ²?
?
405
7.2. Positive Definite Matrices and the SVD
26  Find the eigenvalues and unit eigenvectors of AᵀA and AAᵀ. Keep each Av = σu:

Fibonacci matrix        A = [1 1; 1 0]

Construct the singular value decomposition and verify that A equals UΣVᵀ.
27  Compute AᵀA and AAᵀ and their eigenvalues and unit eigenvectors for V and U:

Rectangular matrix        A = [1 1 0; 0 1 1].

Check AV = UΣ (this will decide ± signs in U). Σ has the same shape as A.
28  Construct the matrix with rank one that has Av = 12u for v = ½(1, 1, 1, 1) and u = ⅓(2, 2, 1). Its only singular value is σ₁ = ______.
29  Suppose A is invertible (with σ₁ > σ₂ > 0). Change A by as small a matrix as possible to produce a singular matrix A₀. Hint: U and V do not change. From

A = [u₁ u₂] [σ₁ 0; 0 σ₂] [v₁ v₂]ᵀ    find the nearest A₀.
30  The SVD for A + I doesn't use Σ + I. Why is σ(A + I) not just σ(A) + 1?
31  Multiply AᵀAv = σ²v by A. Put in parentheses to show that Av is an eigenvector of AAᵀ. We divide by its length ‖Av‖ = σ to get the unit eigenvector u.
32  My favorite example of the SVD is when Av(x) = dv/dx, with the endpoint conditions v(0) = 0 and v(1) = 0. We are looking for orthogonal functions v(x) so that their derivatives Av = dv/dx are also orthogonal. The perfect choice is v₁ = sin πx and v₂ = sin 2πx and vₖ = sin kπx. Then each uₖ is a cosine.

The derivative of v₁ is Av₁ = π cos πx = πu₁. The singular values are σ₁ = π and σₖ = kπ. Orthogonality of the sines (and orthogonality of the cosines) is the foundation for Fourier series.

You may object to AV = UΣ. The derivative A = d/dx is not a matrix! The orthogonal factor V has functions sin kπx in its columns, not vectors. The matrix U has cosine functions cos kπx. Since when is this allowed? One answer is to refer you to the chebfun package on the web. This extends linear algebra to matrices whose columns are functions—not vectors.

Another answer is to replace d/dx by a first difference matrix A. Its shape will be N + 1 by N. A has 1's down the diagonal and −1's on the diagonal below. Then AV = UΣ has discrete sines in V and discrete cosines in U. For N = 2 those will be sines and cosines of 30° and 60° in v₁ and u₁.

Can you construct the difference matrix A (3 by 2) and AᵀA (2 by 2)? The discrete sines are v₁ = (√3/2, √3/2) and v₂ = (√3/2, −√3/2). Test that Av₁ is orthogonal to Av₂. What are the singular values σ₁ and σ₂ in Σ?