Eigenvectors and SVD

Eigenvectors of a square matrix
• Definition
Ax = λx,  x ≠ 0
• Intuition: x is unchanged by A (except for scaling)
• Examples: axis of rotation, stationary distribution of
a Markov chain
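• For instance, the stationary distribution of a Markov chain is the evec of the transposed transition matrix with eval 1. A minimal Matlab sketch, using a made-up 2-state chain:
P = [0.9 0.1; 0.2 0.8];        % assumed row-stochastic transition matrix
[V, D] = eig(P');              % stationary dist satisfies pi = pi*P, i.e. P'*pi' = pi'
[~, i] = max(diag(D));         % pick the eigenvalue 1 (largest for a stochastic matrix)
piVec = V(:,i) / sum(V(:,i));  % normalize to a probability vector; ~[2/3; 1/3] here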
Diagonalization
• Stack up evec equation to get
AX = XΛ
• Where
X = [x₁  x₂  ···  xₙ] ∈ Rⁿˣⁿ,  Λ = diag(λ₁, …, λₙ)
• If evecs are linearly indep, X is invertible, so
A = XΛX⁻¹
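• A quick numerical check of this identity in Matlab, on an arbitrary (assumed) non-symmetric example:
A = [2 1; 0 3];        % example matrix with distinct evals, so evecs are indep
[X, L] = eig(A);       % columns of X are evecs, L = diag(evals)
norm(A - X*L/X)        % X*L/X computes X*L*inv(X); result is ~0 up to round-off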
Evecs of symmetric matrix
• All evals are real (not complex)
• Evecs are orthonormal
uᵢᵀuⱼ = 0 if i ≠ j,  uᵢᵀuᵢ = 1
• So U is an orthogonal matrix
UᵀU = UUᵀ = I
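• A numerical illustration (sketch, with a random symmetric matrix):
S = randn(4); S = S + S';   % random symmetric matrix
[U, L] = eig(S);
norm(U'*U - eye(4))         % ~0: the evecs returned by eig are orthonormal
isreal(diag(L))             % true: the evals of a symmetric matrix are real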
Diagonalizing a symmetric matrix
• We have
A = UΛUᵀ = ∑_{i=1}^n λᵢ uᵢuᵢᵀ
= [u₁  u₂  ···  uₙ] diag(λ₁, …, λₙ) [−u₁ᵀ−; −u₂ᵀ−; … ; −uₙᵀ−]
= λ₁ u₁u₁ᵀ + ··· + λₙ uₙuₙᵀ
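• The outer-product form can be checked numerically; a sketch with a random symmetric matrix:
S = randn(3); S = S + S';       % random symmetric matrix
[U, L] = eig(S);
Shat = zeros(3);
for i = 1:3
    Shat = Shat + L(i,i) * U(:,i) * U(:,i)';  % accumulate lambda_i u_i u_i'
end
norm(S - Shat)                  % ~0: the sum of outer products reconstructs S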
Transformation by an orthogonal matrix
• Consider a vector x transformed by the orthogonal matrix U to give
x̃ = Ux
• The length of the vector is preserved since
||x̃||² = x̃ᵀx̃ = xᵀUᵀUx = xᵀx = ||x||²
• The angle between vectors is preserved since
x̃ᵀỹ = xᵀUᵀUy = xᵀy
• Thus multiplication by U can be interpreted as a
rigid rotation of the coordinate system.
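• A quick check in Matlab, using qr to produce an arbitrary orthogonal matrix (an assumption; any orthogonal U works):
[U, ~] = qr(randn(3));   % random orthogonal matrix
x = randn(3,1); y = randn(3,1);
norm(U*x) - norm(x)      % ~0: lengths are preserved
(U*x)'*(U*y) - x'*y      % ~0: inner products (hence angles) are preserved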
Geometry of diagonalization
• Let A be a linear transformation. We can always decompose this into a rotation U, a scaling Λ, and a reverse rotation Uᵀ = U⁻¹.
• Hence A = UΛUᵀ.
• The inverse mapping is given by A⁻¹ = UΛ⁻¹Uᵀ
A = ∑_{i=1}^m λᵢ uᵢuᵢᵀ
A⁻¹ = ∑_{i=1}^m (1/λᵢ) uᵢuᵢᵀ
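• A minimal Matlab check of the inverse formula, reusing the pd matrix from a later slide:
A = [2 -1; -1 2];                    % symmetric pd example
[U, L] = eig(A);
Ainv = U * diag(1./diag(L)) * U';    % sum of (1/lambda_i) u_i u_i'
norm(Ainv - inv(A))                  % ~0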
Matlab example
• Given
A = [  1.5  −0.5     0
      −0.5   1.5     0
         0     0     3 ]
• Diagonalize
[U,D] = eig(A)
U =
   -0.7071   -0.7071         0
   -0.7071    0.7071         0
         0         0    1.0000
D =
     1     0     0
     0     2     0
     0     0     3
• Check
>> U*D*U'
ans =
    1.5000   -0.5000         0
   -0.5000    1.5000         0
         0         0    3.0000
[Figure: A decomposed into Rot(45) and Scale(1,2,3)]
Positive definite matrices
• A matrix A is pd if xT A x > 0 for any non-zero
vector x.
• Hence all the evals of a pd matrix are positive
Auᵢ = λᵢuᵢ
uᵢᵀAuᵢ = λᵢuᵢᵀuᵢ = λᵢ > 0
• A matrix is positive semi-definite (psd) if λᵢ ≥ 0.
• A matrix of all positive entries is not necessarily pd;
conversely, a pd matrix can have negative entries
>> [u,v] = eig([1 2; 3 4])
u =
   -0.8246   -0.4160
    0.5658   -0.9094
v =
   -0.3723         0
         0    5.3723
>> [u,v] = eig([2 -1; -1 2])
u =
   -0.7071   -0.7071
   -0.7071    0.7071
v =
     1     0
     0     3
Multivariate Gaussian
• Multivariate Normal (MVN)
N(x|µ, Σ) = 1/((2π)^{p/2}|Σ|^{1/2}) exp[−½(x − µ)ᵀΣ⁻¹(x − µ)]
• Exponent is the Mahalanobis distance between x and µ
Δ = (x − µ)ᵀΣ⁻¹(x − µ)
• Σ is the covariance matrix (symmetric positive definite)
xᵀΣx > 0 ∀x ≠ 0
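• A sanity check of this formula against Matlab's mvnpdf (example values assumed):
mu = [0; 0]; Sigma = [2 1; 1 2]; x = [1; 0.5];  % assumed example values
p = 2;                                          % dimension
md = (x-mu)' * inv(Sigma) * (x-mu);             % Mahalanobis distance
pdf1 = exp(-md/2) / ((2*pi)^(p/2) * sqrt(det(Sigma)))
pdf2 = mvnpdf(x', mu', Sigma)                   % agrees with pdf1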
Bivariate Gaussian
• Covariance matrix is
Σ = [ σx²    ρσxσy
      ρσxσy  σy²   ]
where the correlation coefficient is
ρ = Cov(X, Y) / √(Var(X) Var(Y))
and satisfies −1 ≤ ρ ≤ 1
• Density is
p(x, y) = 1/(2πσxσy√(1 − ρ²)) exp{ −1/(2(1 − ρ²)) [ x²/σx² + y²/σy² − 2ρxy/(σxσy) ] }
Spherical, diagonal, full covariance
• Spherical: Σ = [ σ² 0; 0 σ² ]
• Diagonal: Σ = [ σx² 0; 0 σy² ]
• Full: Σ = [ σx² ρσxσy; ρσxσy σy² ]
Surface plots
Matlab plotting code
mu = [0; 0]; S = [2 1; 1 2];  % assumed example parameters; any mean and pd covariance work
stepSize = 0.5;
[x,y] = meshgrid(-5:stepSize:5, -5:stepSize:5);
[r,c] = size(x);
data = [x(:) y(:)];
p = mvnpdf(data, mu', S);
p = reshape(p, r, c);
surfc(x,y,p);    % 3D surface plot
contour(x,y,p);  % plot contours
Visualizing a covariance matrix
• Let Σ = UΛUᵀ. Hence
Σ⁻¹ = U⁻ᵀΛ⁻¹U⁻¹ = UΛ⁻¹Uᵀ = ∑_{i=1}^p (1/λᵢ) uᵢuᵢᵀ
• Let y = Uᵀ(x − µ) be a transformed coordinate system, translated by µ and rotated by Uᵀ. Then
(x − µ)ᵀΣ⁻¹(x − µ) = ∑_{i=1}^p (1/λᵢ) (x − µ)ᵀuᵢuᵢᵀ(x − µ) = ∑_{i=1}^p yᵢ²/λᵢ
Visualizing a covariance matrix
• From the previous slide
(x − µ)ᵀΣ⁻¹(x − µ) = ∑_{i=1}^p yᵢ²/λᵢ
• Recall that the equation for an ellipse in 2D is
y₁²/λ₁ + y₂²/λ₂ = 1
• Hence the contours of equiprobability are elliptical,
with axes given by the evecs and scales given by
the evals of Σ
Visualizing a covariance matrix
• Let X ~ N(0, I) be points on a 2d circle.
• If
Y = UΛ^{1/2}X,  where Λ^{1/2} = diag(√Λᵢᵢ)
• Then
Cov[Y] = UΛ^{1/2} Cov[X] Λ^{1/2}Uᵀ = UΛ^{1/2}Λ^{1/2}Uᵀ = Σ
• So we can map a set of points on a circle to points on an ellipse
Implementation in Matlab
Y = UΛ^{1/2}X,  where Λ^{1/2} = diag(√Λᵢᵢ)
function h = gaussPlot2d(mu, Sigma, color)
[U, D] = eig(Sigma);
n = 100;
t = linspace(0, 2*pi, n);
xy = [cos(t); sin(t)];               % points on the unit circle
k = 6; %k = sqrt(chi2inv(0.95, 2)); % scale of the plotted contour
w = (k * U * sqrt(D)) * xy;          % rotate and stretch circle into ellipse
z = repmat(mu, [1 n]) + w;           % translate by the mean
h = plot(z(1, :), z(2, :), color); axis('equal');
Standardizing the data
• We can subtract off the mean and divide by the standard deviation of each dimension to get the following (for case i=1:n and dimension j=1:d)
yᵢⱼ = (xᵢⱼ − x̄ⱼ)/σⱼ
• Then E[Y]=0 and Var[Yj]=1.
• However, Cov[Y] might still be elliptical due to
correlation amongst the dimensions.
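• A minimal Matlab sketch, assuming X stores one case per row:
X = randn(100,3) * [2 0 0; 1 1 0; 0 0 5];  % made-up correlated data (n x d)
n = size(X,1);
Y = (X - repmat(mean(X), n, 1)) ./ repmat(std(X), n, 1);
mean(Y)   % ~[0 0 0]
std(Y)    % ~[1 1 1], but cov(Y) is generally still not diagonal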
Whitening the data
• Let X ~ N(µ, Σ) and Σ = UΛUᵀ.
• To remove any correlation, we can apply the following linear transformation
Y = Λ^{−1/2}UᵀX,  where Λ^{−1/2} = diag(1/√Λᵢᵢ)
• In Matlab
[U,D] = eig(cov(X'));       % columns of X are data vectors, so cov needs X'
Y = sqrt(inv(D)) * U' * X;  % decorrelate and rescale
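• A quick verification that the result really is white (made-up data; columns of X are data vectors as above):
X = chol([2 1; 1 2])' * randn(2, 1000);  % correlated data with cov ~ [2 1; 1 2]
[U,D] = eig(cov(X'));
Y = sqrt(inv(D)) * U' * X;
cov(Y')                                  % approximately the identity matrix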
Whitening: example
Whitening: proof
• Let
Y = Λ^{−1/2}UᵀX,  where Λ^{−1/2} = diag(1/√Λᵢᵢ)
• Using
Cov[AX] = A Cov[X] Aᵀ
we have
Cov[Y] = Λ^{−1/2}Uᵀ Σ U Λ^{−1/2}
= Λ^{−1/2}Uᵀ(UΛUᵀ)U Λ^{−1/2}
= Λ^{−1/2} Λ Λ^{−1/2} = I
and
E[Y] = Λ^{−1/2}Uᵀ E[X]
Singular Value Decomposition
A = UΣVᵀ = σ₁ u₁v₁ᵀ + ··· + σᵣ uᵣvᵣᵀ
where the σᵢ are the singular values and r = rank(A)
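• This rank-one expansion is easy to confirm in Matlab (sketch with a random matrix):
A = randn(4,3);
[U, S, V] = svd(A);
Ahat = zeros(size(A));
for i = 1:rank(A)
    Ahat = Ahat + S(i,i) * U(:,i) * V(:,i)';  % accumulate sigma_i u_i v_i'
end
norm(A - Ahat)                                % ~0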
Right svecs are evecs of AᵀA
• For any matrix A
AᵀA = VΣᵀUᵀUΣVᵀ = V(ΣᵀΣ)Vᵀ
(AᵀA)V = V(ΣᵀΣ) = VD
Left svecs are evecs of AAᵀ
AAᵀ = UΣVᵀVΣᵀUᵀ = U(ΣΣᵀ)Uᵀ
(AAᵀ)U = U(ΣΣᵀ) = UD
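• Both this identity and the previous slide's can be confirmed numerically (sketch):
A = randn(4,3);
[U, S, V] = svd(A);
norm(A'*A*V - V*(S'*S))   % ~0: right svecs are evecs of A'A, with D = S'S
norm(A*A'*U - U*(S*S'))   % ~0: left svecs are evecs of AA', with D = SS'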
Truncated SVD
A ≈ U_{:,1:k} Σ_{1:k,1:k} Vᵀ_{:,1:k} = σ₁ u₁v₁ᵀ + ··· + σₖ uₖvₖᵀ
Rank k approximation to matrix
[Figure: spectrum of singular values]
SVD on images
• Run demo
load clown                       % built-in Matlab test image (defines X)
[U,S,V] = svd(X,0);              % economy-size SVD
ranks = [1 2 5 10 20 rank(X)];
for k=ranks(:)'
    Xhat = U(:,1:k)*S(1:k,1:k)*V(:,1:k)';  % rank-k reconstruction
    image(Xhat);
end
Clown example
[Figure: clown image reconstructed at ranks k = 1, 2, 5, 10, 20, 200]
Space savings
A ≈ U_{:,1:k} Σ_{1:k,1:k} Vᵀ_{:,1:k}
m × n numbers → (m × k) + (k) + (n × k) = (m + n + 1)k numbers
200 × 320 = 64,000 → (200 + 320 + 1) · 20 = 10,420
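• The clown numbers, checked in Matlab (m = 200, n = 320, k = 20 from the demo above):
m = 200; n = 320; k = 20;
fullCost = m * n               % 64000 numbers for the full matrix
truncCost = (m + n + 1) * k    % 10420 numbers for the rank-20 factors
fullCost / truncCost           % ~6.1x saving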