Structure from Motion

Eigen Decomposition and
Singular Value Decomposition
Based on the slides by Mani Thomas
Modified and extended by Longin Jan Latecki
Introduction

- Eigenvalue decomposition
  - Physical interpretation of eigenvalues/eigenvectors
- Singular Value Decomposition
  - Importance of SVD
    - Spectral decomposition theorem
    - Matrix inversion
    - Solution to a linear system of equations
    - Solution to a homogeneous system of equations
  - SVD applications
What are eigenvalues?

Given a matrix A, x is an eigenvector and λ is the corresponding eigenvalue if Ax = λx.

A must be square, and the determinant of A − λI must equal zero:

$$Ax - \lambda x = 0 \;\Rightarrow\; (A - \lambda I)x = 0$$

- The trivial solution is x = 0.
- A nontrivial solution exists only when det(A − λI) = 0.
Are eigenvectors unique?

If x is an eigenvector, then cx (for any nonzero scalar c) is also an eigenvector, and λ is its eigenvalue:

$$A(cx) = c(Ax) = c(\lambda x) = \lambda(cx)$$
Calculating the Eigenvectors/values

Expand det(A − λI) = 0 for a 2 × 2 matrix:

$$\det(A - \lambda I)
= \det\!\left(\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
  - \lambda \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\right)
= \det\begin{bmatrix} a_{11}-\lambda & a_{12} \\ a_{21} & a_{22}-\lambda \end{bmatrix}
= (a_{11}-\lambda)(a_{22}-\lambda) - a_{12}a_{21} = 0$$

$$\lambda^{2} - (a_{11} + a_{22})\lambda + (a_{11}a_{22} - a_{12}a_{21}) = 0$$

For a 2 × 2 matrix this is a simple quadratic equation with two (possibly complex) solutions:

$$\lambda = \frac{(a_{11} + a_{22}) \pm \sqrt{(a_{11} + a_{22})^{2} - 4(a_{11}a_{22} - a_{12}a_{21})}}{2}$$

This "characteristic equation" is solved for λ; substituting each λ back into (A − λI)x = 0 gives the corresponding eigenvector x, as in the sketch below.
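As a quick numerical check, here is a minimal sketch (assuming NumPy is available; the helper name eigvals_2x2 is ours, not from any library) that solves the 2 × 2 characteristic quadratic directly and compares the roots with numpy.linalg.eigvals:

```python
import numpy as np

def eigvals_2x2(A):
    """Eigenvalues of a 2x2 matrix from the characteristic quadratic
    lambda^2 - trace(A)*lambda + det(A) = 0."""
    tr = A[0, 0] + A[1, 1]                        # a11 + a22
    det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]   # a11*a22 - a12*a21
    disc = np.lib.scimath.sqrt(tr**2 - 4 * det)   # allows complex roots
    return (tr + disc) / 2, (tr - disc) / 2

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])
print(eigvals_2x2(A))          # (5.0, 0.0)
print(np.linalg.eigvals(A))    # library result for comparison (order may vary)
```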
Eigenvalue example

Consider

$$A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}$$

The eigenvalues follow from the characteristic equation $\lambda^{2} - (a_{11}+a_{22})\lambda + (a_{11}a_{22} - a_{12}a_{21}) = 0$:

$$\lambda^{2} - (1 + 4)\lambda + (1\cdot 4 - 2\cdot 2) = \lambda^{2} - 5\lambda = 0 \;\Rightarrow\; \lambda = 0,\;\; \lambda = 5$$

The corresponding eigenvectors are computed from (A − λI)x = 0:

$$\lambda = 0:\quad
\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\;\Rightarrow\; x + 2y = 0,\;\; 2x + 4y = 0$$

$$\lambda = 5:\quad
\begin{bmatrix} 1-5 & 2 \\ 2 & 4-5 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} -4 & 2 \\ 2 & -1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\;\Rightarrow\; -4x + 2y = 0,\;\; 2x - y = 0$$

For λ = 0, one possible solution is x = (2, −1).
For λ = 5, one possible solution is x = (1, 2).
For more information: Demos in Linear algebra by G. Strang, http://web.mit.edu/18.06/www/
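A hedged numerical check of this worked example (NumPy assumed): numpy.linalg.eig returns unit-length eigenvectors, so they should be scalar multiples of (2, −1) and (1, 2):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

# eig returns eigenvalues and unit-length eigenvectors (as columns)
vals, vecs = np.linalg.eig(A)
print(vals)                        # approximately [0., 5.] (order may vary)

for lam, v in zip(vals, vecs.T):
    # each pair should satisfy A v = lambda v (up to floating-point error)
    assert np.allclose(A @ v, lam * v)
    print(lam, v)                  # v is parallel to (2,-1) for 0 and (1,2) for 5
```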
Physical interpretation

Consider a covariance matrix A, i.e., A = (1/n) S Sᵀ for some data matrix S:

$$A = \begin{bmatrix} 1 & 0.75 \\ 0.75 & 1 \end{bmatrix}
\;\Rightarrow\; \lambda_{1} = 1.75,\quad \lambda_{2} = 0.25$$

The error ellipse has its major axis along the eigenvector of the larger eigenvalue and its minor axis along the eigenvector of the smaller eigenvalue.
Physical interpretation

[Figure: data scatter over Original Variable A and Original Variable B, with the principal component directions PC 1 and PC 2 overlaid.]

- PC 1 and PC 2 are orthogonal directions of greatest variance in the data.
- Projections along PC 1 (the first Principal Component) discriminate the data most along any single axis.
Physical interpretation


- The first principal component is the direction of greatest variability (covariance) in the data.
- The second is the next orthogonal (uncorrelated) direction of greatest variability:
  - first remove all the variability along the first component, then find the next direction of greatest variability,
  - and so on …
- Thus the eigenvectors give the directions of data variance, in decreasing order of their eigenvalues (see the PCA sketch after this slide).
For more information: See Gram-Schmidt Orthogonalization in G. Strang’s lectures
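A minimal PCA sketch along these lines (NumPy assumed, with synthetic data drawn from the covariance [[1, .75],[.75, 1]] used above): the principal components are just the eigenvectors of the sample covariance, sorted by decreasing eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic correlated 2-D data with covariance approx [[1, .75], [.75, 1]]
cov = np.array([[1.0, 0.75],
                [0.75, 1.0]])
X = rng.multivariate_normal(mean=[0, 0], cov=cov, size=2000)

# sample covariance of the centered data
Xc = X - X.mean(axis=0)
C = Xc.T @ Xc / len(Xc)

# eigh: eigen decomposition for symmetric matrices (eigenvalues ascending)
evals, evecs = np.linalg.eigh(C)
order = np.argsort(evals)[::-1]        # sort by decreasing variance
evals, evecs = evals[order], evecs[:, order]

print(evals)        # roughly [1.75, 0.25]
print(evecs[:, 0])  # PC 1: direction of greatest variance, about (1,1)/sqrt(2)
```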
Multivariate Gaussian

[Figures: multivariate and bivariate Gaussian densities with spherical, diagonal, and full covariance matrices.]
Eigen/diagonal Decomposition


Let S be a square matrix with m linearly independent eigenvectors (a "non-defective" matrix).

Theorem (cf. the matrix diagonalization theorem): there exists an eigen decomposition

$$S = U \Lambda U^{-1}$$

where Λ is diagonal; the decomposition is unique for distinct eigenvalues.

- The columns of U are the eigenvectors of S.
- The diagonal elements of Λ are the eigenvalues of S.
Diagonal decomposition: why/how


Let U have the eigenvectors as columns: $U = \begin{bmatrix} v_{1} & \dots & v_{n} \end{bmatrix}$.

Then SU can be written

$$SU = S\begin{bmatrix} v_{1} & \dots & v_{n} \end{bmatrix}
    = \begin{bmatrix} \lambda_{1}v_{1} & \dots & \lambda_{n}v_{n} \end{bmatrix}
    = \begin{bmatrix} v_{1} & \dots & v_{n} \end{bmatrix}
      \begin{bmatrix} \lambda_{1} & & \\ & \ddots & \\ & & \lambda_{n} \end{bmatrix}$$

Thus SU = UΛ, or U⁻¹SU = Λ, and S = UΛU⁻¹.
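A small numerical sketch of the same steps (NumPy assumed), using the 2 × 2 matrix from the next slide: build U from the eigenvectors, form Λ, and confirm S = UΛU⁻¹.

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])

lam, U = np.linalg.eig(S)    # columns of U are eigenvectors of S
Lam = np.diag(lam)           # diagonal matrix of eigenvalues

# S U = U Lambda, hence S = U Lambda U^{-1}
assert np.allclose(S @ U, U @ Lam)
assert np.allclose(S, U @ Lam @ np.linalg.inv(U))
print(lam)                   # [3., 1.] (order may vary)
```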
Diagonal decomposition - example
2 1 
S
; 1  1, 2  3.
Recall

1 2 
1
 1 1
1
The eigenvectors 
and   form U  


 1
1

1
1
 


 
Inverting, we have
Then,
U
1
S=UU–1 =
1 / 2  1 / 2


1
/
2
1
/
2


Recall
UU–1 =1.
 1 1 1 0 1 / 2  1 / 2
 1 1 0 3 1 / 2 1 / 2 




Example continued

Let's divide U (and multiply U⁻¹) by √2. Then,

$$S = \underbrace{\begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ -1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}}_{Q}
\begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix}
\begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}
\qquad (Q^{-1} = Q^{T})$$

Why? Stay tuned …
Symmetric Eigen Decomposition

If S is a symmetric matrix:

Theorem: there exists a (unique) eigen decomposition

$$S = Q \Lambda Q^{T}$$

where Q is orthogonal:

- Q⁻¹ = Qᵀ
- the columns of Q are normalized eigenvectors,
- the columns are orthogonal,
- and everything is real.
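A short sketch of the symmetric case (NumPy assumed): numpy.linalg.eigh returns orthonormal eigenvectors, so Q⁻¹ = Qᵀ holds numerically.

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])         # symmetric

lam, Q = np.linalg.eigh(S)         # eigh is specialized for symmetric/Hermitian matrices
assert np.allclose(Q.T @ Q, np.eye(2))         # orthonormal columns: Q^{-1} = Q^T
assert np.allclose(S, Q @ np.diag(lam) @ Q.T)  # S = Q Lambda Q^T
print(lam)
print(Q)
```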
Spectral Decomposition theorem

If A is a symmetric, positive definite k × k matrix (xᵀAx > 0) with λᵢ (λᵢ > 0) and eᵢ, i = 1 … k, being the k eigenvalue/eigenvector pairs, then

$$A = \lambda_{1} e_{1} e_{1}^{T} + \lambda_{2} e_{2} e_{2}^{T} + \dots + \lambda_{k} e_{k} e_{k}^{T}
   = \sum_{i=1}^{k} \lambda_{i} e_{i} e_{i}^{T}
   = P \Lambda P^{T}$$

where

$$P = \begin{bmatrix} e_{1} & e_{2} & \dots & e_{k} \end{bmatrix}\;(k \times k), \qquad
\Lambda = \begin{bmatrix} \lambda_{1} & 0 & \dots & 0 \\ 0 & \lambda_{2} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \lambda_{k} \end{bmatrix}\;(k \times k)$$

- This is also called the eigen decomposition theorem.
- Any symmetric matrix can be reconstructed from its eigenvalues and eigenvectors, as in the sketch below.
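A sketch of this rank-1 reconstruction (NumPy assumed), summing λᵢ eᵢ eᵢᵀ for the positive definite matrix used in the next slide's example:

```python
import numpy as np

A = np.array([[2.2, 0.4],
              [0.4, 2.8]])     # symmetric, positive definite

lam, E = np.linalg.eigh(A)     # lam ascending; columns of E are orthonormal e_i

# reconstruct A as the sum of rank-1 outer products lambda_i * e_i e_i^T
A_rebuilt = sum(l * np.outer(e, e) for l, e in zip(lam, E.T))
assert np.allclose(A, A_rebuilt)
print(lam)                     # [2., 3.]
```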
Example for spectral decomposition

Let A be a symmetric, positive definite matrix:

$$A = \begin{bmatrix} 2.2 & 0.4 \\ 0.4 & 2.8 \end{bmatrix}, \qquad \det(A - \lambda I) = 0$$

$$\lambda^{2} - 5\lambda + 6.16 - 0.16 = (\lambda - 3)(\lambda - 2) = 0
\;\Rightarrow\; \lambda_{1} = 3,\;\; \lambda_{2} = 2$$

The eigenvectors for the corresponding eigenvalues are

$$e_{1} = \left(\tfrac{1}{\sqrt{5}}, \tfrac{2}{\sqrt{5}}\right)^{T}, \qquad
  e_{2} = \left(\tfrac{2}{\sqrt{5}}, -\tfrac{1}{\sqrt{5}}\right)^{T}$$

Consequently,

$$A = \begin{bmatrix} 2.2 & 0.4 \\ 0.4 & 2.8 \end{bmatrix}
   = \lambda_{1} e_{1} e_{1}^{T} + \lambda_{2} e_{2} e_{2}^{T}
   = 3 \begin{bmatrix} 1/\sqrt{5} \\ 2/\sqrt{5} \end{bmatrix}
       \begin{bmatrix} 1/\sqrt{5} & 2/\sqrt{5} \end{bmatrix}
   + 2 \begin{bmatrix} 2/\sqrt{5} \\ -1/\sqrt{5} \end{bmatrix}
       \begin{bmatrix} 2/\sqrt{5} & -1/\sqrt{5} \end{bmatrix}$$

$$= \begin{bmatrix} 0.6 & 1.2 \\ 1.2 & 2.4 \end{bmatrix}
 + \begin{bmatrix} 1.6 & -0.8 \\ -0.8 & 0.4 \end{bmatrix}$$
Singular Value Decomposition

If A is a rectangular m × k matrix of real numbers, then there exist an m × m orthogonal matrix U and a k × k orthogonal matrix V such that

$$\underset{(m \times k)}{A} = \underset{(m \times m)}{U}\;\underset{(m \times k)}{\Sigma}\;\underset{(k \times k)}{V^{T}}, \qquad UU^{T} = VV^{T} = I$$

Σ is an m × k matrix whose (i, i)th entries σᵢ ≥ 0, i = 1 … min(m, k), are its only nonzero entries.

- The positive constants σᵢ are the singular values of A.

If A has rank r, then there exist r positive constants σ₁, σ₂, …, σᵣ, r orthogonal m × 1 unit vectors u₁, u₂, …, uᵣ, and r orthogonal k × 1 unit vectors v₁, v₂, …, vᵣ such that

$$A = \sum_{i=1}^{r} \sigma_{i} u_{i} v_{i}^{T}$$

- This is similar to the spectral decomposition theorem (see the numerical sketch below).
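A minimal sketch of the definition (NumPy assumed): numpy.linalg.svd returns U, the singular values, and Vᵀ, and A can be rebuilt either as UΣVᵀ or as the rank-1 sum Σ σᵢ uᵢ vᵢᵀ.

```python
import numpy as np

A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])        # m = 2, k = 3

U, s, Vt = np.linalg.svd(A)             # full SVD: U is 2x2, Vt is 3x3, s holds sigma_i

# rebuild A = U Sigma V^T with a rectangular Sigma
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
assert np.allclose(A, U @ Sigma @ Vt)

# equivalently, a sum of rank-1 terms sigma_i * u_i v_i^T
A_sum = sum(si * np.outer(U[:, i], Vt[i]) for i, si in enumerate(s))
assert np.allclose(A, A_sum)
print(s)                                # [sqrt(12), sqrt(10)]
```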
Singular Value Decomposition (contd.)

If A is symmetric and positive definite, then

- the SVD coincides with the eigen decomposition: EIG(λᵢ) = SVD(σᵢ).

More generally, AAᵀ has eigenvalue-eigenvector pairs (σᵢ², uᵢ):

$$AA^{T} = (U\Sigma V^{T})(U\Sigma V^{T})^{T} = U\Sigma V^{T} V \Sigma U^{T} = U \Sigma^{2} U^{T}$$

Alternatively, the vᵢ are the eigenvectors of AᵀA with the same nonzero eigenvalues σᵢ²:

$$A^{T}A = V \Sigma^{2} V^{T}$$
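A sketch confirming the relation (NumPy assumed): the eigenvalues of AAᵀ and the nonzero eigenvalues of AᵀA equal the squared singular values of A.

```python
import numpy as np

A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

U, s, Vt = np.linalg.svd(A)

# eigenvalues of A A^T equal sigma_i^2
w_left, _ = np.linalg.eigh(A @ A.T)
print(np.sort(w_left)[::-1], s**2)   # both [12., 10.]

# nonzero eigenvalues of A^T A are the same sigma_i^2 (plus a zero)
w_right, _ = np.linalg.eigh(A.T @ A)
print(np.sort(w_right)[::-1])        # [12., 10., ~0.]
```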
Example for SVD

Let A be the 2 × 3 matrix

$$A = \begin{bmatrix} 3 & 1 & 1 \\ -1 & 3 & 1 \end{bmatrix}$$

U can be computed from AAᵀ:

$$AA^{T} = \begin{bmatrix} 3 & 1 & 1 \\ -1 & 3 & 1 \end{bmatrix}
\begin{bmatrix} 3 & -1 \\ 1 & 3 \\ 1 & 1 \end{bmatrix}
= \begin{bmatrix} 11 & 1 \\ 1 & 11 \end{bmatrix}$$

$$\det(AA^{T} - \lambda I) = 0 \;\Rightarrow\; \lambda_{1} = 12,\;\; \lambda_{2} = 10,
\qquad u_{1}^{T} = \left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right),\;\;
u_{2}^{T} = \left(\tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}}\right)$$

V can be computed from AᵀA:

$$A^{T}A = \begin{bmatrix} 3 & -1 \\ 1 & 3 \\ 1 & 1 \end{bmatrix}
\begin{bmatrix} 3 & 1 & 1 \\ -1 & 3 & 1 \end{bmatrix}
= \begin{bmatrix} 10 & 0 & 2 \\ 0 & 10 & 4 \\ 2 & 4 & 2 \end{bmatrix}$$

$$\det(A^{T}A - \lambda I) = 0 \;\Rightarrow\; \lambda_{1} = 12,\;\; \lambda_{2} = 10,\;\; \lambda_{3} = 0$$

$$v_{1}^{T} = \left(\tfrac{1}{\sqrt{6}}, \tfrac{2}{\sqrt{6}}, \tfrac{1}{\sqrt{6}}\right), \quad
v_{2}^{T} = \left(\tfrac{2}{\sqrt{5}}, -\tfrac{1}{\sqrt{5}}, 0\right), \quad
v_{3}^{T} = \left(\tfrac{1}{\sqrt{30}}, \tfrac{2}{\sqrt{30}}, -\tfrac{5}{\sqrt{30}}\right)$$
Example for SVD

Taking 21=12 and 22=10, the singular value
decomposition of A is
 3 1 1
A


1
3
1


1 
 1 
2   1 , 2 , 1   10 
2   2 ,  1 ,0 
 12 
 1  
 1 
6
6
6 
5
5 


2 
2 


Thus the U, V and  are computed by performing eigen
decomposition of AAT and ATA
Any matrix has a singular value decomposition but only
symmetric, positive definite matrices have an eigen
decomposition
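The worked example can be checked numerically; a sketch (NumPy assumed) comparing the hand-computed rank-1 terms √12 u₁v₁ᵀ + √10 u₂v₂ᵀ against the library SVD:

```python
import numpy as np

A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

# hand-computed singular vectors from the example
u1 = np.array([1, 1]) / np.sqrt(2)
u2 = np.array([1, -1]) / np.sqrt(2)
v1 = np.array([1, 2, 1]) / np.sqrt(6)
v2 = np.array([2, -1, 0]) / np.sqrt(5)

A_rebuilt = np.sqrt(12) * np.outer(u1, v1) + np.sqrt(10) * np.outer(u2, v2)
assert np.allclose(A, A_rebuilt)           # matches the original matrix

print(np.linalg.svd(A, compute_uv=False))  # [sqrt(12), sqrt(10)] from the library
```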
Applications of SVD in Linear Algebra

Inverse of an n × n square matrix A

- If A is non-singular, then A⁻¹ = (UΣVᵀ)⁻¹ = VΣ⁻¹Uᵀ, where Σ⁻¹ = diag(1/σ₁, 1/σ₂, …, 1/σₙ).
- If A is singular, then A⁻¹ = (UΣVᵀ)⁻¹ ≈ VΣ₀⁻¹Uᵀ, where Σ₀⁻¹ = diag(1/σ₁, 1/σ₂, …, 1/σᵢ, 0, 0, …, 0).

Least squares solution of an m × n system

- Ax = b (A is m × n, m ≥ n) ⇒ (AᵀA)x = Aᵀb ⇒ x = (AᵀA)⁻¹Aᵀb = A⁺b.
- If AᵀA is singular, x = A⁺b ≈ (VΣ₀⁻¹Uᵀ)b, where Σ₀⁻¹ = diag(1/σ₁, 1/σ₂, …, 1/σᵢ, 0, 0, …, 0).

Condition of a matrix

- The condition number measures the degree of singularity of A.
- The larger the value of σ₁/σₙ, the closer A is to being singular.
http://www.cse.unr.edu/~bebis/MathMethods/SVD/lecture.pdf
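A sketch of these uses (NumPy assumed, with a small synthetic full-rank system): the pseudoinverse built from the SVD, inverting only sufficiently large singular values, gives the least-squares solution and matches numpy.linalg.lstsq.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))   # m x n, m >= n, assumed full rank here
b = rng.standard_normal(6)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# pseudoinverse A+ = V Sigma0^{-1} U^T, inverting only the "large enough" singular values
tol = max(A.shape) * np.finfo(float).eps * s[0]
s_inv = np.where(s > tol, 1.0 / s, 0.0)
A_pinv = Vt.T @ np.diag(s_inv) @ U.T

x_svd = A_pinv @ b
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_svd, x_ref)

print("condition number sigma_1/sigma_n:", s[0] / s[-1])
```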
Applications of SVD in Linear Algebra

Homogeneous equations, Ax = 0

- The minimum-norm solution is x = 0 (the trivial solution).
- Impose the constraint ‖x‖ = 1, giving the "constrained" optimization problem min_{‖x‖ = 1} ‖Ax‖.
- Special case: if rank(A) = n − 1 (m ≥ n − 1, σₙ = 0), then x = α vₙ (α is a constant).
- General case: if rank(A) = n − k (m ≥ n − k, σ_{n−k+1} = … = σₙ = 0), then x = α₁ v_{n−k+1} + … + α_k vₙ with α₁² + … + α_k² = 1.

This has appeared before in:

- the homogeneous solution of a linear system of equations,
- the computation of a homography using DLT,
- the estimation of the fundamental matrix.
For proof: Johnson and Wichern, “Applied Multivariate Statistical Analysis”, pg 79
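A sketch of the constrained problem min_{‖x‖=1} ‖Ax‖ for a rank-deficient matrix (NumPy assumed, synthetic data): the minimizer is the right singular vector associated with the smallest singular value, which is how DLT-style estimates are typically extracted.

```python
import numpy as np

rng = np.random.default_rng(2)
# build a 6x4 matrix of rank 3, so Ax = 0 has a one-dimensional solution space
B = rng.standard_normal((6, 3))
C = rng.standard_normal((3, 4))
A = B @ C

U, s, Vt = np.linalg.svd(A)
x = Vt[-1]                     # right singular vector of the smallest singular value

print(s)                       # last singular value is (numerically) zero
print(np.linalg.norm(A @ x))   # ~0: x solves Ax = 0 with ||x|| = 1
```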
What is the use of SVD?


- SVD can be used to compute optimal low-rank approximations of arbitrary matrices.
- Face recognition: represent face images as eigenfaces and compute the distance between the query face image and the stored faces in the principal component space.
- Data mining: Latent Semantic Indexing for document retrieval.
- Image compression: the Karhunen-Loeve (KL) transform gives the best image compression; in MPEG, the Discrete Cosine Transform (DCT) is the closest approximation to the KL transform in terms of PSNR.
Singular Value Decomposition

[Figure: illustration of SVD dimensions and sparseness.]
SVD example
Let

$$A = \begin{bmatrix} 1 & -1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}$$

Thus m = 3, n = 2. Its SVD is

$$A = \begin{bmatrix} 0 & 2/\sqrt{6} & 1/\sqrt{3} \\ 1/\sqrt{2} & -1/\sqrt{6} & 1/\sqrt{3} \\ 1/\sqrt{2} & 1/\sqrt{6} & -1/\sqrt{3} \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 0 & \sqrt{3} \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{bmatrix}$$

Typically, the singular values are arranged in decreasing order.
Low-rank Approximation


- SVD can be used to compute optimal low-rank approximations.
- Approximation problem: find A_k of rank k such that

$$A_{k} = \arg\min_{X:\,\mathrm{rank}(X) = k} \| A - X \|_{F} \qquad \text{(Frobenius norm)}$$

- A_k and X are both m × n matrices.
- Typically, we want k << r.
Low-rank Approximation

Solution via SVD:

$$A_{k} = U\, \mathrm{diag}(\sigma_{1}, \dots, \sigma_{k}, 0, \dots, 0)\, V^{T}
\qquad \text{(set the smallest } r - k \text{ singular values to zero)}$$

$$A_{k} = \sum_{i=1}^{k} \sigma_{i} u_{i} v_{i}^{T}
\qquad \text{(column notation: a sum of } k \text{ rank-1 matrices)}$$
Approximation error


- How good (bad) is this approximation?
- It is the best possible, measured by the Frobenius norm of the error:

$$\min_{X:\,\mathrm{rank}(X) = k} \| A - X \|_{F} = \| A - A_{k} \|_{F} = \sqrt{\sum_{i=k+1}^{r} \sigma_{i}^{2}}$$

where the σᵢ are ordered such that σᵢ ≥ σᵢ₊₁.

- This suggests why the Frobenius error drops as k is increased (checked numerically in the sketch below).
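A sketch of the truncated-SVD approximation and its Frobenius error (NumPy assumed, random test matrix): zeroing all but the k largest singular values gives A_k, and the error equals the square root of the sum of the discarded σᵢ².

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((8, 5))
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# keep only the k largest singular values (svd returns them in decreasing order)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]

err = np.linalg.norm(A - A_k, "fro")
assert np.isclose(err, np.sqrt(np.sum(s[k:] ** 2)))  # Eckart-Young, Frobenius norm
print("rank-%d error:" % k, err)
```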