Introduction to Machine Learning
CSE474/574: Principal Component Analysis
Varun Chandola <chandola@buffalo.edu>
Outline
1. Recap
2. Principal Components Analysis
   - Introduction to PCA
   - Principle of Maximal Variance
   - Defining Principal Components
   - Dimensionality Reduction Using PCA
   - PCA Algorithm
   - Recovering Original Data
   - Eigen Faces
What have we seen so far?
Factor Analysis Models
- Assumption: x_i is a multivariate Gaussian random variable
  - Mean is a function of z_i
  - Covariance matrix is fixed
  - p(x_i | z_i, θ) = N(W z_i + µ, Ψ)
- W is a D × L matrix (the loading matrix)
- Ψ is a D × D diagonal covariance matrix
- Extensions: Independent Component Analysis
- If Ψ = σ²I and W is orthonormal, FA is equivalent to Probabilistic Principal Components Analysis (PPCA)
- If, in addition, σ² → 0, FA is equivalent to PCA
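To make the generative view concrete, here is a minimal NumPy sketch (not from the slides; all dimensions and names are illustrative assumptions) that samples data from the factor analysis model p(x_i | z_i, θ) = N(W z_i + µ, Ψ); replacing Ψ by σ²I gives the PPCA special case.

```python
import numpy as np

rng = np.random.default_rng(0)
D, L, N = 5, 2, 1000                       # observed dim, latent dim, sample count (assumed)

W = rng.normal(size=(D, L))                # loading matrix (D x L)
mu = rng.normal(size=D)                    # mean offset
Psi = np.diag(rng.uniform(0.1, 0.5, D))    # D x D diagonal noise covariance

# Generative process: z_i ~ N(0, I_L), x_i | z_i ~ N(W z_i + mu, Psi)
Z = rng.normal(size=(N, L))
noise = rng.multivariate_normal(np.zeros(D), Psi, size=N)
X = Z @ W.T + mu + noise

# PPCA special case: Psi = sigma^2 * I; letting sigma^2 -> 0 recovers PCA.
```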
Introduction to PCA
Consider the following data points
[Figure: scatter plot of the 2-D data points]
Embed these points in 1 dimension
What is the best way?
Along the direction of the maximum variance
Why?
Why Maximal Variance?
Least loss of information
Best capture the “spread”
What is the direction of maximal variance?
Given any direction û, the projection of x_i onto û is given by x_i^T û
The direction of maximal variance can be obtained by maximizing
  (1/N) Σ_{i=1}^N (x_i^T û)² = (1/N) Σ_{i=1}^N û^T x_i x_i^T û = û^T ( (1/N) Σ_{i=1}^N x_i x_i^T ) û
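A quick numerical check of this identity, as a sketch with assumed toy data: for mean-centered data, the empirical variance of the projections x_i^T û equals û^T S û.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
X = X - X.mean(axis=0)                  # mean-center the data

u = rng.normal(size=3)
u = u / np.linalg.norm(u)               # unit-length direction û

proj_var = np.mean((X @ u) ** 2)        # (1/N) Σ (x_i^T û)²
S = (X.T @ X) / X.shape[0]              # (1/N) Σ x_i x_i^T
print(np.isclose(proj_var, u @ S @ u))  # True: û^T S û equals the projected variance
```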
Finding Direction of Maximal Variance
Find:
  max_{û : û^T û = 1} û^T S û
where:
  S = (1/N) Σ_{i=1}^N x_i x_i^T
S is the sample (empirical) covariance matrix of the mean-centered data
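This constrained maximization is solved by the eigenvector of S with the largest eigenvalue. A small sketch with assumed toy data, comparing that eigenvector against randomly drawn unit directions:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 4)) @ rng.normal(size=(4, 4))   # correlated toy data
X = X - X.mean(axis=0)
S = (X.T @ X) / X.shape[0]

eigvals, eigvecs = np.linalg.eigh(S)     # symmetric eigen-decomposition, ascending order
u_star = eigvecs[:, -1]                  # eigenvector with the largest eigenvalue

best_random = 0.0
for _ in range(10000):                   # compare against random unit directions
    v = rng.normal(size=4)
    v = v / np.linalg.norm(v)
    best_random = max(best_random, v @ S @ v)

print(u_star @ S @ u_star >= best_random)   # expected: True
```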
Defining Principal Components
- First PC: eigenvector of the (sample) covariance matrix with the largest eigenvalue
- Second PC? Eigenvector with the next largest eigenvalue
- What are eigenvectors and eigenvalues?
  - Av = λv, where v is an eigenvector and λ is the corresponding eigenvalue of the square matrix A
  - Geometric interpretation?
- Variance of each PC is given by λ_i
- Variance captured by the first L PCs (1 ≤ L ≤ D):
  ( Σ_{i=1}^L λ_i / Σ_{i=1}^D λ_i ) × 100
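As a minimal sketch (illustrative helper and toy data, not course code), the captured-variance percentage can be computed directly from the eigenvalues of S:

```python
import numpy as np

def variance_captured(S, L):
    """Percentage of variance captured by the first L principal components of S."""
    eigvals = np.sort(np.linalg.eigvalsh(S))[::-1]   # eigenvalues in decreasing order
    return 100.0 * eigvals[:L].sum() / eigvals.sum()

# Example with assumed data: inspect how the captured variance grows with L
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6))
X = X - X.mean(axis=0)
S = (X.T @ X) / (X.shape[0] - 1)
print([round(variance_captured(S, L), 1) for L in range(1, 7)])
```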
Dimensionality Reduction Using PCA
- Consider the first L eigenvalues and eigenvectors
- Let W denote the D × L matrix with the first L eigenvectors in its columns (sorted by decreasing λ)
- PC score matrix: Z = XW
- Each input vector (D × 1) is replaced by a shorter L × 1 vector
PCA Algorithm
1. Center X: X = X − µ̂
2. Compute the sample covariance matrix: S = (1/(N−1)) X^T X
3. Find the eigenvectors and eigenvalues of S
4. W consists of the first L eigenvectors as columns, ordered by decreasing eigenvalues (W is D × L)
5. Let Z = XW
6. Each row of Z (i.e., z_i^T) is the lower dimensional embedding of x_i
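A direct NumPy transcription of the six steps, as a sketch (the `pca` helper and the toy data are assumptions, not the course's reference implementation):

```python
import numpy as np

def pca(X, L):
    """Steps 1-6 above: returns the mean, the D x L projection W, and the scores Z."""
    mu = X.mean(axis=0)                       # step 1: center X
    Xc = X - mu
    S = (Xc.T @ Xc) / (X.shape[0] - 1)        # step 2: sample covariance (D x D)
    eigvals, eigvecs = np.linalg.eigh(S)      # step 3: eigenvalues/vectors (ascending)
    order = np.argsort(eigvals)[::-1]         # step 4: sort by decreasing eigenvalue
    W = eigvecs[:, order[:L]]                 #         first L eigenvectors as columns
    Z = Xc @ W                                # step 5: PC scores, Z = XW
    return mu, W, Z                           # step 6: row i of Z embeds x_i

# Usage with assumed data: reduce 10-dimensional points to 2 dimensions
rng = np.random.default_rng(4)
X = rng.normal(size=(500, 10))
mu, W, Z = pca(X, L=2)
print(W.shape, Z.shape)                       # (10, 2) (500, 2)
```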
Recovering Original Data
- Using W and z_i: x̂_i = W z_i
- Average Reconstruction Error:
  J(W, Z) = (1/N) Σ_{i=1}^N ||x_i − x̂_i||²
Theorem (Classical PCA Theorem). Among all possible orthonormal sets of L basis vectors, PCA gives the solution which has the minimum reconstruction error.
- Optimal "embedding" in L-dimensional space is given by z_i = W^T x_i
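Continuing the earlier `pca()` sketch (an assumed helper), the reconstruction and the average error J(W, Z) can be computed as follows; the mean is added back since the slides work with centered x_i:

```python
import numpy as np

# Assumes X, mu, W, Z from the pca() sketch above (centered-data convention).
X_hat = Z @ W.T + mu                            # x̂_i = W z_i, with the mean added back
J = np.mean(np.sum((X - X_hat) ** 2, axis=1))   # J(W, Z) = (1/N) Σ ||x_i − x̂_i||²
print(J)                                        # shrinks toward 0 as L approaches D
```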
Using PCA for Face Recognition
EigenFaces [1]
Input: A set of images (of faces)
Task: Identify if a new image is a face or not.
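A minimal sketch of the eigenfaces idea, simplified relative to [1]: flatten face images into row vectors, run PCA on them, and flag a new image as a non-face when its reconstruction error in the face subspace exceeds a threshold. The helper names, the threshold, and the direct D × D eigen-decomposition are assumptions (practical implementations use the smaller N × N Gram-matrix trick when D is much larger than N).

```python
import numpy as np

def fit_eigenfaces(faces, L):
    """faces: N x D matrix, each row a flattened face image; returns mean and top-L basis."""
    mu = faces.mean(axis=0)
    Xc = faces - mu
    S = (Xc.T @ Xc) / (faces.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(S)
    W = eigvecs[:, np.argsort(eigvals)[::-1][:L]]     # top-L "eigenfaces" as columns
    return mu, W

def looks_like_face(image, mu, W, threshold):
    """Flag an image as a face if it reconstructs well from the face subspace."""
    x = image.ravel() - mu
    x_hat = W @ (W.T @ x)                             # project onto the eigenface subspace
    return np.linalg.norm(x - x_hat) < threshold
```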
References
[1] M. Turk and A. Pentland. Face recognition using eigenfaces. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '91), pages 586–591, June 1991.