Eigenvectors and SVD

1 Eigenvectors of a square matrix
• Definition: Ax = λx, x ≠ 0.
• Intuition: x is unchanged by A (except for scaling).
• Examples: axis of rotation, stationary distribution of a Markov chain.

2 Diagonalization
• Stack up the evec equations to get AX = XΛ,
  where X ∈ R^{n×n} has the evecs as its columns, X = [x_1 x_2 ··· x_n], and Λ = diag(λ_1, ..., λ_n).
• If the evecs are linearly independent, X is invertible, so A = X Λ X^{-1}.

3 Evecs of a symmetric matrix
• All evals are real (not complex).
• Evecs are orthonormal: u_i^T u_j = 0 if i ≠ j, and u_i^T u_i = 1.
• So U = [u_1 ··· u_n] is an orthogonal matrix: U^T U = U U^T = I.

4 Diagonalizing a symmetric matrix
• We have
  A = U Λ U^T = sum_{i=1}^n λ_i u_i u_i^T
    = [u_1 ··· u_n] diag(λ_1, ..., λ_n) [u_1^T; ...; u_n^T]
    = λ_1 u_1 u_1^T + ··· + λ_n u_n u_n^T

5 Transformation by an orthogonal matrix
• Consider a vector x transformed by the orthogonal matrix U to give x̃ = Ux.
• The length of the vector is preserved, since
  ||x̃||^2 = x̃^T x̃ = x^T U^T U x = x^T x = ||x||^2
• The angle between vectors is preserved:
  x̃^T ỹ = x^T U^T U y = x^T y
• Thus multiplication by U can be interpreted as a rigid rotation of the coordinate system.

6 Geometry of diagonalization
• Let A be a symmetric linear transformation. We can always decompose it into a rotation U, a scaling Λ, and a reverse rotation U^T = U^{-1}.
• Hence A = U Λ U^T = sum_{i=1}^n λ_i u_i u_i^T.
• The inverse mapping is given by
  A^{-1} = U Λ^{-1} U^T = sum_{i=1}^n (1/λ_i) u_i u_i^T

7 Matlab example
• Given
  A = [ 1.5  -0.5   0
       -0.5   1.5   0
          0     0   3 ]
• Diagonalize: [U,D] = eig(A)
  U =
     -0.7071   -0.7071         0
     -0.7071    0.7071         0
           0         0    1.0000
  D =
       1     0     0
       0     2     0
       0     0     3
• Check:
  >> U*D*U'
  ans =
      1.5000   -0.5000         0
     -0.5000    1.5000         0
           0         0    3.0000
• (figure: the action of A visualized as Rot(45) and Scale(1,2,3))

8 Positive definite matrices
• A matrix A is pd if x^T A x > 0 for any non-zero vector x.
• Hence all the evals of a pd matrix are positive:
  A u_i = λ_i u_i  ⟹  u_i^T A u_i = λ_i u_i^T u_i = λ_i > 0
• A matrix is positive semidefinite (psd) if λ_i ≥ 0.
• A matrix of all positive entries is not necessarily pd; conversely, a pd matrix can have negative entries:
  >> [u,v] = eig([1 2; 3 4])
  u =
     -0.8246   -0.4160
      0.5658   -0.9094
  v =
     -0.3723         0
           0    5.3723
  >> [u,v] = eig([2 -1; -1 2])
  u =
     -0.7071   -0.7071
     -0.7071    0.7071
  v =
       1     0
       0     3

9 Multivariate Gaussian
• Multivariate Normal (MVN) definition:
  N(x|µ, Σ) = (2π)^{-p/2} |Σ|^{-1/2} exp[ -1/2 (x − µ)^T Σ^{-1} (x − µ) ]
• The exponent contains the squared Mahalanobis distance between x and µ:
  ∆^2 = (x − µ)^T Σ^{-1} (x − µ)
• Σ is the covariance matrix (symmetric positive definite): x^T Σ x > 0 for all x ≠ 0.

10 Bivariate Gaussian
• The covariance matrix is
  Σ = [ σ_x^2      ρ σ_x σ_y
        ρ σ_x σ_y  σ_y^2     ]
  where the correlation coefficient
  ρ = Cov(X, Y) / sqrt(Var(X) Var(Y))
  satisfies −1 ≤ ρ ≤ 1.
• The (zero-mean) density is
  p(x, y) = 1 / (2π σ_x σ_y sqrt(1 − ρ^2)) · exp[ −1/(2(1 − ρ^2)) ( x^2/σ_x^2 + y^2/σ_y^2 − 2ρxy/(σ_x σ_y) ) ]

11 Spherical, diagonal, full covariance
• Spherical: Σ = [ σ^2 0; 0 σ^2 ]
• Diagonal: Σ = [ σ_x^2 0; 0 σ_y^2 ]
• Full: Σ = [ σ_x^2 ρσ_xσ_y; ρσ_xσ_y σ_y^2 ]

12 Surface plots
(figure: surface plots of the corresponding densities)

13 Matlab plotting code
  % mu (2x1 mean) and S (2x2 covariance) are assumed to be defined
  stepSize = 0.5;
  [x,y] = meshgrid(-5:stepSize:5, -5:stepSize:5);
  [r,c] = size(x);
  data = [x(:) y(:)];
  p = mvnpdf(data, mu', S);
  p = reshape(p, r, c);
  surfc(x,y,p);      % 3D plot
  contour(x,y,p);    % Plot contours

14 Visualizing a covariance matrix
• Let Σ = U Λ U^T. Hence
  Σ^{-1} = U^{-T} Λ^{-1} U^{-1} = U Λ^{-1} U^T = sum_{i=1}^p (1/λ_i) u_i u_i^T
• Let y = U^T (x − µ) be a transformed coordinate system, translated by µ and rotated by U. Then
  (x − µ)^T Σ^{-1} (x − µ) = sum_{i=1}^p (1/λ_i) (x − µ)^T u_i u_i^T (x − µ) = sum_{i=1}^p y_i^2 / λ_i

15 Visualizing a covariance matrix
• From the previous slide,
  (x − µ)^T Σ^{-1} (x − µ) = sum_{i=1}^p y_i^2 / λ_i
• Recall that the equation for an ellipse in 2D is
  y_1^2/λ_1 + y_2^2/λ_2 = 1
• Hence the contours of equiprobability are elliptical, with axes given by the evecs and scales given by (the square roots of) the evals of Σ.

16 Visualizing a covariance matrix
• Let X ~ N(0, I) be points on a 2d circle.
• If Y = U Λ^{1/2} X, where Λ^{1/2} = diag(sqrt(Λ_ii)),
• then
  Cov[Y] = U Λ^{1/2} Cov[X] Λ^{1/2} U^T = U Λ^{1/2} Λ^{1/2} U^T = U Λ U^T = Σ
• So we can map a set of points on a circle to points on an ellipse.

17 Implementation in Matlab
• Plot the ellipse by transforming points on the unit circle: Y = U Λ^{1/2} X, Λ^{1/2} = diag(sqrt(Λ_ii)).
  function h = gaussPlot2d(mu, Sigma, color)
  [U, D] = eig(Sigma);
  n = 100;
  t = linspace(0, 2*pi, n);
  xy = [cos(t); sin(t)];
  k = 6; %k = sqrt(chi2inv(0.95, 2));
  w = (k * U * sqrt(D)) * xy;
  z = repmat(mu, [1 n]) + w;
  h = plot(z(1, :), z(2, :), color);
  axis('equal');

18 Standardizing the data
• We can subtract off the mean and divide by the standard deviation of each dimension to get the following (for case i = 1:n and dimension j = 1:d):
  y_ij = (x_ij − x̄_j) / σ_j
• Then E[Y] = 0 and Var[Y_j] = 1.
• However, Cov[Y] might still be elliptical due to correlation amongst the dimensions.

19 Whitening the data
• Let X ~ N(µ, Σ) and Σ = U Λ U^T.
• To remove any correlation, we can apply the following linear transformation:
  Y = Λ^{-1/2} U^T X, where Λ^{-1/2} = diag(1/sqrt(Λ_ii))
• In Matlab (with the data stored as an n-by-d matrix X, one case per row):
  [U,D] = eig(cov(X));
  Y = sqrt(inv(D)) * U' * X';   % whitened data, d-by-n

20 Whitening: example
(figure: scatter plots of the data before and after whitening)

21 Whitening: proof
• Let Y = Λ^{-1/2} U^T X, with Λ^{-1/2} = diag(1/sqrt(Λ_ii)).
• Using Cov[AX] = A Cov[X] A^T, we have
  Cov[Y] = Λ^{-1/2} U^T Σ U Λ^{-1/2}
         = Λ^{-1/2} U^T (U Λ U^T) U Λ^{-1/2}
         = Λ^{-1/2} Λ Λ^{-1/2}
         = I
• and E[Y] = Λ^{-1/2} U^T E[X], so the whitened data also has zero mean if we first subtract off E[X].
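This proof is easy to check numerically. Below is a minimal sketch (not from the original slides) that samples correlated Gaussian data, applies the transformation from slide 19, and verifies that the sample covariance of Y is close to the identity; it uses only core Matlab, and the covariance values are made up for illustration.

  % Sample correlated Gaussian data, whiten it, and check Cov[Y] ~ I.
  n = 5000;
  Sigma = [2 1.5; 1.5 2];             % example pd covariance (made up)
  X = randn(n, 2) * chol(Sigma);      % n-by-d samples, one case per row
  Xc = X - repmat(mean(X), n, 1);     % subtract off the mean
  [U, D] = eig(cov(Xc));
  Y = sqrt(inv(D)) * U' * Xc';        % whitened data, d-by-n
  disp(cov(Y'))                       % should be close to eye(2)

With n = 5000 samples, the printed matrix should match the 2x2 identity to within sampling error.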
22 Singular Value Decomposition
• Any matrix can be decomposed as
  A = U Σ V^T = σ_1 u_1 v_1^T + ··· + σ_r u_r v_r^T
  where r is the rank, the σ_i are the singular values, and the u_i and v_i are the left and right singular vectors.

23 Right svectors are evecs of A^T A
• For any matrix A,
  A^T A = (U Σ V^T)^T (U Σ V^T) = V Σ^T U^T U Σ V^T = V (Σ^T Σ) V^T
• Hence
  (A^T A) V = V (Σ^T Σ) = V D
  so the columns of V are the evecs of A^T A, with evals D = Σ^T Σ = diag(σ_i^2).

24 Left svectors are evecs of A A^T
• Similarly,
  A A^T = U Σ V^T V Σ^T U^T = U (Σ Σ^T) U^T
  (A A^T) U = U (Σ Σ^T) = U D
  so the columns of U are the evecs of A A^T.

25 Truncated SVD
• Keeping only the k largest singular values gives a rank-k approximation to the matrix:
  A ≈ U_{:,1:k} Σ_{1:k,1:k} V_{:,1:k}^T = σ_1 u_1 v_1^T + ··· + σ_k u_k v_k^T
(figure: spectrum of singular values)

26 SVD on images
• Run demo:
  load clown
  [U,S,V] = svd(X,0);
  ranks = [1 2 5 10 20 rank(X)];
  for k = ranks(:)'
      Xhat = U(:,1:k) * S(1:k,1:k) * V(:,1:k)';
      image(Xhat);
  end

27 Clown example
(figure: rank-k reconstructions of the clown image for k = 1, 2, 5, 10, 20, 200)

28 Space savings
• Storing the truncated SVD instead of the full matrix reduces the cost from m × n numbers to
  (m × k) + k + (n × k) = (m + n + 1)k
• For the clown image:
  200 × 320 = 64,000 → (200 + 320 + 1) × 20 = 10,420
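The space savings trade off against reconstruction quality. Here is a short sketch extending the demo of slide 26 (not from the original slides) that prints the storage cost and a quality measure for each rank; the relative Frobenius-norm error is my choice of measure, and the code assumes the classic clown demo image that ships with Matlab.

  % Storage and relative error of rank-k approximations to the clown image.
  load clown                            % defines X (200-by-320) and map
  [U, S, V] = svd(X, 0);
  [m, n] = size(X);
  for k = [1 2 5 10 20 200]
      Xhat = U(:,1:k) * S(1:k,1:k) * V(:,1:k)';
      storage = (m + n + 1) * k;        % numbers stored, vs. m*n = 64000
      relerr = norm(X - Xhat, 'fro') / norm(X, 'fro');
      fprintf('k = %3d   storage = %6d   relative error = %.3f\n', ...
              k, storage, relerr);
  end

As slide 28 notes, k = 20 stores 10,420 numbers instead of 64,000 (about a sixth), and the printed error shows how much quality that costs.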