Eigen Decomposition and Singular Value Decomposition
Mani Thomas
CISC 489/689

Introduction
- Eigenvalue decomposition
- Physical interpretation of eigenvalues/eigenvectors
- Singular Value Decomposition
- Importance of SVD
  - Spectral decomposition theorem
  - Matrix inversion
  - Solution to a linear system of equations
  - Solution to a homogeneous system of equations
- SVD applications

What are eigenvalues?
- Given a square matrix A, x is an eigenvector and λ the corresponding eigenvalue if Ax = λx
- Ax − λx = 0 ⇒ (A − λI)x = 0
  - The trivial solution is x = 0
  - A non-trivial solution exists only when det(A − λI) = 0
- Are eigenvectors unique?
  - No: if x is an eigenvector, then αx is also an eigenvector with the same eigenvalue λ, since A(αx) = α(Ax) = α(λx) = λ(αx)

Calculating the eigenvectors/values
- Expand det(A − λI) = 0 for a 2×2 matrix:
  det(A − λI) = det([a11 a12; a21 a22] − λ[1 0; 0 1]) = det([a11−λ a12; a21 a22−λ]) = 0
  (a11 − λ)(a22 − λ) − a12·a21 = 0
  λ^2 − (a11 + a22)λ + (a11·a22 − a12·a21) = 0
- For a 2×2 matrix this is a simple quadratic equation with two (possibly complex) solutions:
  λ = [ (a11 + a22) ± √( (a11 + a22)^2 − 4(a11·a22 − a12·a21) ) ] / 2
- This "characteristic equation" is solved for the eigenvalues λ; substituting each λ back into (A − λI)x = 0 gives the eigenvectors x

Eigenvalue example
- Consider A = [1 2; 2 4]:
  λ^2 − (a11 + a22)λ + (a11·a22 − a12·a21) = λ^2 − (1 + 4)λ + (1·4 − 2·2) = λ^2 − 5λ = 0 ⇒ λ = 0, λ = 5
- The corresponding eigenvectors are computed from (A − λI)x = 0:
  For λ = 0: [1 2; 2 4][x; y] = [0; 0] ⇒ x + 2y = 0 and 2x + 4y = 0
  For λ = 5: [1−5 2; 2 4−5][x; y] = [0; 0] ⇒ −4x + 2y = 0 and 2x − y = 0
- For λ = 0, one possible solution is x = (2, −1)
- For λ = 5, one possible solution is x = (1, 2)
- For more information: demos in Linear Algebra by G. Strang, http://web.mit.edu/18.06/www/

Physical interpretation
- Consider a correlation matrix A = [1 0.75; 0.75 1], with λ1 = 1.75 and λ2 = 0.25
- The error ellipse has its major axis along the eigenvector of the larger eigenvalue and its minor axis along the eigenvector of the smaller eigenvalue
[Figure: data scatter over Original Variable A vs. Original Variable B, with orthogonal principal-component axes PC 1 and PC 2]
- The eigenvectors give the orthogonal directions of greatest variance in the data
- Projections along PC 1 (the first principal component) discriminate the data most along any one axis

Physical interpretation (contd.)
- The first principal component is the direction of greatest variability (covariance) in the data
- The second is the next orthogonal (uncorrelated) direction of greatest variability
  - Remove all the variability along the first component, then find the next direction of greatest variability
- And so on ...
- Thus the eigenvectors provide the directions of data variance, in decreasing order of eigenvalue
- For more information: see Gram-Schmidt orthogonalization in G. Strang's lectures

Spectral decomposition theorem
- If A is a symmetric, positive definite k×k matrix (x^T A x > 0) with eigenvalue-eigenvector pairs (λi, ei), λi > 0, i = 1, ..., k, then
  A = sum_{i=1}^{k} λi ei ei^T = λ1 e1 e1^T + λ2 e2 e2^T + ... + λk ek ek^T = P Λ P^T
  where P = [e1 e2 ... ek] and Λ = diag(λ1, λ2, ..., λk)
- This is also called the eigen decomposition theorem
- Any symmetric matrix can be reconstructed from its eigenvalues and eigenvectors
- Any similarity to what has been discussed before?

Example for spectral decomposition
- Let A be a symmetric, positive definite matrix:
  A = [2.2 0.4; 0.4 2.8]
  det(A − λI) = λ^2 − 5λ + (6.16 − 0.16) = (λ − 3)(λ − 2) = 0 ⇒ λ1 = 3, λ2 = 2
- The eigenvectors for the corresponding eigenvalues are
  e1 = (1/√5, 2/√5)^T, e2 = (2/√5, −1/√5)^T
- Consequently,
  A = λ1 e1 e1^T + λ2 e2 e2^T = 3 (1/√5, 2/√5)^T (1/√5, 2/√5) + 2 (2/√5, −1/√5)^T (2/√5, −1/√5)
    = [0.6 1.2; 1.2 2.4] + [1.6 −0.8; −0.8 0.4] = [2.2 0.4; 0.4 2.8]
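The eigenvalue example and the spectral reconstruction above can be checked numerically. The following is a minimal NumPy sketch (not part of the original slides); it assumes NumPy is available and uses np.linalg.eig / np.linalg.eigh for the decompositions.

```python
import numpy as np

# Eigenvalue example from the slides: A = [1 2; 2 4] has eigenvalues 0 and 5
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])
vals, vecs = np.linalg.eig(A)
print("eigenvalues:", vals)               # approximately [0, 5] (order may differ)
print("eigenvectors (columns):\n", vecs)  # columns proportional to (2, -1) and (1, 2)

# Spectral decomposition example: symmetric positive definite matrix
S = np.array([[2.2, 0.4],
              [0.4, 2.8]])
lam, E = np.linalg.eigh(S)                # eigh: symmetric case, eigenvalues in ascending order
# Reconstruct S as sum_i lambda_i * e_i e_i^T (equivalently P @ diag(lam) @ P^T)
S_rec = sum(lam[i] * np.outer(E[:, i], E[:, i]) for i in range(len(lam)))
print("eigenvalues of S:", lam)           # [2, 3]
print("max reconstruction error:", np.max(np.abs(S - S_rec)))  # ~0
```

np.linalg.eigh is used for the symmetric matrix because it returns real eigenvalues and orthonormal eigenvectors, which is exactly the P Λ P^T form of the spectral decomposition theorem.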
Singular Value Decomposition
- If A is a rectangular m×k matrix of real numbers, then there exist an m×m orthogonal matrix U and a k×k orthogonal matrix V such that
  A = U Σ V^T   (A: m×k, U: m×m, Σ: m×k, V^T: k×k), with U U^T = I and V V^T = I
- Σ is an m×k matrix whose (i, i)-th entries are σi ≥ 0, i = 1, ..., min(m, k), and whose other entries are zero
- The positive constants σi are the singular values of A
- If A has rank r, then there exist r positive constants σ1, σ2, ..., σr, r orthogonal m×1 unit vectors u1, u2, ..., ur and r orthogonal k×1 unit vectors v1, v2, ..., vr such that
  A = sum_{i=1}^{r} σi ui vi^T
- Similar to the spectral decomposition theorem

Singular Value Decomposition (contd.)
- If A is symmetric and positive definite, the SVD coincides with the eigen decomposition; in general, the eigenvalues of A A^T (and of A^T A) are the squared singular values of A: λi = σi^2
- A A^T has the eigenvalue-eigenvector pairs (σi^2, ui):
  A A^T = (U Σ V^T)(U Σ V^T)^T = U Σ V^T V Σ^T U^T = U Σ^2 U^T
- Alternatively, the vi are the eigenvectors of A^T A with the same non-zero eigenvalues σi^2:
  A^T A = V Σ^2 V^T

Example for SVD
- Consider the 2×3 matrix A = [3 1 1; −1 3 1]
- U can be computed from A A^T:
  A A^T = [3 1 1; −1 3 1][3 −1; 1 3; 1 1] = [11 1; 1 11]
  det(A A^T − λI) = 0 ⇒ λ1 = 12, λ2 = 10
  u1^T = (1/√2, 1/√2), u2^T = (1/√2, −1/√2)
- V can be computed from A^T A:
  A^T A = [3 −1; 1 3; 1 1][3 1 1; −1 3 1] = [10 0 2; 0 10 4; 2 4 2]
  det(A^T A − λI) = 0 ⇒ λ1 = 12, λ2 = 10, λ3 = 0
  v1^T = (1/√6, 2/√6, 1/√6), v2^T = (2/√5, −1/√5, 0), v3^T = (1/√30, 2/√30, −5/√30)

Example for SVD (contd.)
- Taking σ1^2 = 12 and σ2^2 = 10, the singular value decomposition of A is
  A = [3 1 1; −1 3 1] = √12 (1/√2, 1/√2)^T (1/√6, 2/√6, 1/√6) + √10 (1/√2, −1/√2)^T (2/√5, −1/√5, 0)
- Thus U, V and Σ are computed by performing eigen decompositions of A A^T and A^T A
- Any matrix has a singular value decomposition, but only symmetric, positive definite matrices have the eigen (spectral) decomposition described earlier

Applications of SVD in Linear Algebra
- Inverse of an n×n square matrix A
  - If A is non-singular, then A^-1 = (U Σ V^T)^-1 = V Σ^-1 U^T, where Σ^-1 = diag(1/σ1, 1/σ2, ..., 1/σn)
  - If A is singular, then A^-1 ≈ (U Σ V^T)^-1 ≈ V Σ0^-1 U^T, where Σ0^-1 = diag(1/σ1, 1/σ2, ..., 1/σi, 0, 0, ..., 0)
- Least squares solution of an m×n system Ax = b (A is m×n, m ≥ n)
  - (A^T A)x = A^T b ⇒ x = (A^T A)^-1 A^T b = A+ b
  - If A^T A is singular, x = A+ b ≈ (V Σ0^-1 U^T) b, where Σ0^-1 = diag(1/σ1, 1/σ2, ..., 1/σi, 0, 0, ..., 0)
- Condition of a matrix
  - The condition number measures the degree of singularity of A
  - The larger the ratio σ1/σn, the closer A is to being singular
- Reference: http://www.cse.unr.edu/~bebis/MathMethods/SVD/lecture.pdf

Applications of SVD in Linear Algebra (contd.)
- Homogeneous equations, Ax = 0
  - The minimum-norm solution is x = 0 (the trivial solution)
  - Impose the constraint ||x|| = 1
  - "Constrained" optimization problem: minimize ||Ax|| subject to ||x|| = 1
  - Special case: if rank(A) = n − 1 (m ≥ n − 1, σn = 0), then x = α vn (α a constant)
  - General case: if rank(A) = n − k (m ≥ n − k, σ_{n−k+1} = ... = σn = 0), then x = α1 v_{n−k+1} + ... + αk vn with α1^2 + ... + αk^2 = 1
- This has appeared before:
  - Homogeneous solution of a linear system of equations
  - Computation of the homography using DLT
  - Estimation of the fundamental matrix
- For proof: Johnson and Wichern, "Applied Multivariate Statistical Analysis", pg. 79
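The SVD example and the linear-algebra recipes above can be reproduced numerically. The sketch below (not part of the original slides) uses NumPy; the small least-squares system B, b is made-up data purely to illustrate the pseudo-inverse recipe.

```python
import numpy as np

# SVD example from the slides: 2x3 matrix with singular values sqrt(12) and sqrt(10)
A = np.array([[ 3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])
U, s, Vt = np.linalg.svd(A)
print("singular values:", s)             # [sqrt(12), sqrt(10)]
print("squared:", s**2)                  # [12, 10] = non-zero eigenvalues of A A^T and A^T A

# Rank-r expansion: A = sum_i sigma_i u_i v_i^T
A_rec = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
print("max reconstruction error:", np.max(np.abs(A - A_rec)))

# Least squares via the pseudo-inverse A+ (np.linalg.pinv is SVD-based);
# B and b are made-up data just to illustrate x = A+ b
B = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])
x_ls = np.linalg.pinv(B) @ b
print("least-squares solution:", x_ls)

# Homogeneous system A x = 0 with ||x|| = 1: the minimizer of ||A x|| is the
# right singular vector of the smallest singular value (last row of Vt)
x_h = Vt[-1, :]
print("||A x_h|| =", np.linalg.norm(A @ x_h))   # ~0, since rank(A) = 2 < 3
```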
What is the use of SVD?
- SVD can be used to compute optimal low-rank approximations of arbitrary matrices
- Face recognition
  - Represent the face images as eigenfaces and compute the distance between the query face image and the stored images in the principal component space
- Data mining
  - Latent Semantic Indexing for document retrieval
- Image compression
  - The Karhunen-Loeve (KL) transform gives the best image compression
  - In MPEG, the Discrete Cosine Transform (DCT) is used as the closest approximation to the KL transform in terms of PSNR

Image Compression using SVD
- The image is stored as a 256×264 matrix M with entries between 0 and 1; M has rank 256
- Select r ≪ 256 and approximate M by its rank-r truncated SVD
- As r is increased from 1 all the way to 256, the reconstruction of M improves, i.e. the approximation error decreases
- Advantage
  - To send the matrix M, one needs to send 256×264 = 67,584 numbers
  - To send an r = 36 approximation of M, one needs to send only 36 + 36×256 + 36×264 = 18,756 numbers:
    - 36 singular values
    - 36 left singular vectors, each with 256 entries
    - 36 right singular vectors, each with 264 entries
- Courtesy: http://www.uwlax.edu/faculty/will/svd/compression/index.html
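A minimal sketch of the rank-r compression described above, assuming NumPy; a random matrix stands in for the 256×264 image (in practice M would be loaded from an image file), and the storage count mirrors the 36 + 36×256 + 36×264 arithmetic on the slide.

```python
import numpy as np

def svd_compress(M, r):
    """Return the best rank-r approximation of M (truncated SVD)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

# Stand-in for the 256 x 264 grayscale image with entries in [0, 1]
rng = np.random.default_rng(0)
M = rng.random((256, 264))

r = 36
M_r = svd_compress(M, r)
stored = r + r * M.shape[0] + r * M.shape[1]    # 36 + 36*256 + 36*264 = 18756
print("numbers to send:", stored, "instead of", M.size)   # 18756 vs 67584
print("relative error:", np.linalg.norm(M - M_r) / np.linalg.norm(M))
```

A real photograph has rapidly decaying singular values, so a small r already gives a visually good reconstruction; the random stand-in used here compresses poorly and only demonstrates the mechanics and the storage savings.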