Introduction to Machine Learning
CSE474/574: Principal Component Analysis
Varun Chandola <chandola@buffalo.edu>

Outline

1 Recap
2 Principal Components Analysis
  Introduction to PCA
  Principle of Maximal Variance
  Defining Principal Components
  Dimensionality Reduction Using PCA
  PCA Algorithm
  Recovering Original Data
  Eigen Faces

What have we seen so far?

Factor Analysis Models
- Assumption: x_i is a multivariate Gaussian random variable
  - The mean is a function of z_i
  - The covariance matrix is fixed
    p(x_i | z_i, θ) = N(W z_i + µ, Ψ)
  - W is a D × L matrix (the loading matrix)
  - Ψ is a D × D diagonal covariance matrix
- Extensions: Independent Component Analysis
- If Ψ = σ²I and W is orthonormal, FA is equivalent to Probabilistic Principal Components Analysis (PPCA)
- If, in addition, σ² → 0, FA is equivalent to PCA

Introduction to PCA

[Figure: scatter plot of two-dimensional data points]

- Consider the data points above
- Embed these points in 1 dimension
- What is the best way? Along the direction of maximum variance
- Why?

Why Maximal Variance?

- Least loss of information
- Best captures the "spread" of the data
- What is the direction of maximal variance?
- Given any unit direction û, the projection of x_i onto û is x_i^T û
- The direction of maximal variance is obtained by maximizing

  \frac{1}{N}\sum_{i=1}^{N} (\mathbf{x}_i^\top \hat{\mathbf{u}})^2
  = \frac{1}{N}\sum_{i=1}^{N} \hat{\mathbf{u}}^\top \mathbf{x}_i \mathbf{x}_i^\top \hat{\mathbf{u}}
  = \hat{\mathbf{u}}^\top \left( \frac{1}{N}\sum_{i=1}^{N} \mathbf{x}_i \mathbf{x}_i^\top \right) \hat{\mathbf{u}}
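To make this objective concrete, here is a minimal NumPy sketch (the data matrix and all variable names are illustrative, not from the lecture) that mean-centers a small data set and evaluates the projected variance û^T S û for candidate unit directions:

```python
import numpy as np

# Illustrative 2-D data matrix X (N rows, D columns); not the data from the slides.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) @ np.array([[0.9, 0.4], [0.0, 0.2]])

# Mean-center the data so that S = (1/N) X^T X is the sample covariance matrix.
X = X - X.mean(axis=0)
N = X.shape[0]
S = (X.T @ X) / N

def projected_variance(u):
    """Variance of the data projected onto the direction u: u^T S u."""
    u = u / np.linalg.norm(u)      # enforce the unit-length constraint
    return float(u @ S @ u)

# Variance along two candidate directions; the maximizer over all unit
# vectors is the first principal component.
print(projected_variance(np.array([1.0, 0.0])))
print(projected_variance(np.array([1.0, 1.0])))
```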
Finding the Direction of Maximal Variance

- Find
    \max_{\hat{\mathbf{u}} : \hat{\mathbf{u}}^\top \hat{\mathbf{u}} = 1} \hat{\mathbf{u}}^\top S \hat{\mathbf{u}}
  where
    S = \frac{1}{N}\sum_{i=1}^{N} \mathbf{x}_i \mathbf{x}_i^\top
- S is the sample (empirical) covariance matrix of the mean-centered data

Defining Principal Components

- First PC: eigenvector of the (sample) covariance matrix with the largest eigenvalue
- What are eigenvectors and eigenvalues?
    Av = λv
  where v is an eigenvector and λ the corresponding eigenvalue of the square matrix A
- Geometric interpretation?
- Second PC: eigenvector with the next largest eigenvalue
- The variance along each PC is its eigenvalue λ_i
- Percentage of variance captured by the first L PCs (1 ≤ L ≤ D):
    \frac{\sum_{i=1}^{L} \lambda_i}{\sum_{i=1}^{D} \lambda_i} \times 100
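A minimal sketch, assuming NumPy, of how the eigendecomposition of S and the percentage of variance captured by the first L PCs could be computed; the data matrix and the choice L = 2 are placeholders:

```python
import numpy as np

# Hypothetical data matrix X (N x D); any real-valued data works here.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X = X - X.mean(axis=0)              # mean-center the data
S = (X.T @ X) / X.shape[0]          # sample covariance matrix (1/N version)

# Eigendecomposition of the symmetric matrix S: S v = lambda v.
eigvals, eigvecs = np.linalg.eigh(S)

# eigh returns eigenvalues in ascending order; re-sort in decreasing order.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# The variance of each PC is its eigenvalue; fraction captured by the first L PCs.
L = 2
captured = eigvals[:L].sum() / eigvals.sum() * 100
print(f"First {L} PCs capture {captured:.1f}% of the variance")
```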
Dimensionality Reduction Using PCA

- Consider the first L eigenvalues and eigenvectors
- Let W denote the D × L matrix with the first L eigenvectors (sorted by decreasing λ) as its columns
- PC score matrix: Z = XW
- Each input vector (D × 1) is replaced by a shorter L × 1 vector

PCA Algorithm

1 Center X: X = X − µ̂
2 Compute the sample covariance matrix:
    S = \frac{1}{N-1} X^\top X
  (the 1/(N−1) factor gives the unbiased estimate; the 1/N version used earlier differs only by a constant scale and has the same eigenvectors)
3 Find the eigenvectors and eigenvalues of S
4 W consists of the first L eigenvectors as columns, ordered by decreasing eigenvalue; W is D × L
5 Let Z = XW
6 Each row z_i^T of Z is the lower-dimensional embedding of x_i
(A minimal end-to-end sketch of these steps appears after the references.)

Recovering Original Data

- Using W and z_i: x̂_i = W z_i
- Average reconstruction error:
    J(W, Z) = \frac{1}{N}\sum_{i=1}^{N} \| \mathbf{x}_i - \hat{\mathbf{x}}_i \|^2

Theorem (Classical PCA Theorem)
Among all possible orthonormal sets of L basis vectors, PCA gives the solution with the minimum reconstruction error. The optimal "embedding" in the L-dimensional space is given by z_i = W^T x_i.

Using PCA for Face Recognition

- EigenFaces [1]
- Input: a set of images (of faces)
- Task: identify whether a new image is a face or not

References

[1] M. Turk and A. Pentland. Face recognition using eigenfaces. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '91), pages 586-591, June 1991.
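For completeness, the end-to-end sketch referenced in the PCA Algorithm section above: a minimal NumPy implementation of the six steps plus the reconstruction error. The data matrix, the choice L = 2, and the function name pca are illustrative assumptions, not part of the lecture material.

```python
import numpy as np

def pca(X, L):
    """Project the N x D data matrix X onto its first L principal components.

    Returns the data mean, the D x L loading matrix W, and the N x L
    score matrix Z = XW.
    """
    mu = X.mean(axis=0)
    Xc = X - mu                                   # 1. center the data
    S = (Xc.T @ Xc) / (X.shape[0] - 1)            # 2. sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)          # 3. eigenvectors / eigenvalues of S
    order = np.argsort(eigvals)[::-1]             # 4. sort by decreasing eigenvalue
    W = eigvecs[:, order[:L]]                     #    D x L matrix of the top L eigenvectors
    Z = Xc @ W                                    # 5. PC scores (lower-dimensional embedding)
    return mu, W, Z                               # 6. each row of Z embeds one x_i

# Illustrative data: 100 points in D = 5 dimensions, reduced to L = 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
mu, W, Z = pca(X, L=2)

# Reconstruct x_hat_i = W z_i (adding back the mean removed in step 1)
# and compute the average reconstruction error J(W, Z).
X_hat = Z @ W.T + mu
J = np.mean(np.sum((X - X_hat) ** 2, axis=1))
print("average reconstruction error:", J)
```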