Principal Component Analysis
Jing Gao
SUNY Buffalo
Why Dimensionality Reduction?
• We have too many dimensions
  – To reason about or obtain insights from
  – To visualize
  – Too much noise in the data
  – Need to “reduce” them to a smaller set of factors
  – Better representation of data without losing much information
  – Can build more effective data analyses on the reduced-dimensional space: classification, clustering, pattern recognition
Component Analysis
• Discover a new set of factors/dimensions/axes
against which to represent, describe or evaluate
the data
• Factors are combinations of observed variables
– May be more effective bases for insights
– Observed data are described in terms of these factors
rather than in terms of original variables/dimensions
Basic Concept
• Directions of high variance in the data are where items can best be discriminated and where key underlying phenomena can be observed
  – These are the areas of greatest “signal” in the data
• If two items or dimensions are highly correlated or
dependent
– They are likely to represent highly related phenomena
– If they tell us about the same underlying variance in the
data, combining them to form a single measure is
reasonable
Basic Concept
• So we want to combine related variables, and focus
on uncorrelated or independent ones, especially
those along which the observations have high
variance
• We want a smaller set of variables that explain most
of the variance in the original data, in more compact
and insightful form
• These variables are called “factors” or “principal
components”
Principal Component Analysis
• Most common form of factor analysis
• The new variables/dimensions
– Are linear combinations of the original ones
– Are uncorrelated with one another
• Orthogonal in dimension space
– Capture as much of the original variance in the
data as possible
– Are called Principal Components
What are the new axes?
[Figure: scatter of data plotted on Original Variable A vs. Original Variable B, with the new axes PC 1 and PC 2 overlaid]
• Orthogonal directions of greatest variance in the data
• Projections along PC 1 discriminate the data most along any one axis
Principal Components
• The first principal component is the direction of greatest variability (variance) in the data
• Second is the next orthogonal (uncorrelated)
direction of greatest variability
– So first remove all the variability along the first
component, and then find the next direction of greatest
variability
• And so on …
Principal Components Analysis (PCA)
• Principle
– Linear projection method to reduce the number of
parameters
– Transform a set of correlated variables into a new set of uncorrelated variables
– Map the data into a space of lower dimensionality
• Properties
– It can be viewed as a rotation of the existing axes to new
positions in the space defined by original variables
– New axes are orthogonal and represent the directions with
maximum variability
Algebraic definition of PCs
Given a sample of n observations on a vector of p variables

    x_1, x_2, \dots, x_n \in \mathbb{R}^p,

define the first principal component of the sample by the linear transformation

    z_1 = a_1^T x_j = \sum_{i=1}^{p} a_{i1} x_{ij}, \qquad j = 1, 2, \dots, n,

where

    a_1 = (a_{11}, a_{21}, \dots, a_{p1})^T, \qquad x_j = (x_{1j}, x_{2j}, \dots, x_{pj})^T,

and a_1 is chosen such that var[z_1] is maximized.
Algebraic derivation of PCs
To find a_1, first note that

    var[z_1] = E[(z_1 - \bar{z}_1)^2]
             = \frac{1}{n} \sum_{i=1}^{n} (a_1^T x_i - a_1^T \bar{x})^2
             = \frac{1}{n} \sum_{i=1}^{n} a_1^T (x_i - \bar{x})(x_i - \bar{x})^T a_1
             = a_1^T S a_1,

where

    S = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T

is the covariance matrix and

    \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i

is the mean. In the following, we assume the data is centered: \bar{x} = 0.
Algebraic derivation of PCs
Assume \bar{x} = 0. Form the matrix

    X = [x_1, x_2, \dots, x_n] \in \mathbb{R}^{p \times n},

then

    S = \frac{1}{n} X X^T.
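As an illustration of this step, here is a minimal NumPy sketch (variable names and data sizes are my own, not from the slides) that centers a p × n data matrix and forms S = (1/n) X Xᵀ, cross-checked against NumPy's own covariance routine:

import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 100                      # p variables, n observations (arbitrary sizes)
X = rng.normal(size=(p, n))        # data matrix, one observation per column

X_centered = X - X.mean(axis=1, keepdims=True)   # subtract the mean of each variable
S = (X_centered @ X_centered.T) / n              # S = (1/n) X X^T, as in the slide

# np.cov with bias=True uses the same 1/n normalization
assert np.allclose(S, np.cov(X, bias=True))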
Algebraic derivation of PCs
To find a_1 that maximizes var[z_1] subject to a_1^T a_1 = 1, let λ be a Lagrange multiplier:

    L = a_1^T S a_1 - \lambda (a_1^T a_1 - 1)

    \frac{\partial L}{\partial a_1} = S a_1 - \lambda a_1 = 0
    \;\Rightarrow\; S a_1 = \lambda a_1
    \;\Rightarrow\; a_1^T S a_1 = \lambda.

Therefore a_1 is an eigenvector of S corresponding to the largest eigenvalue, λ = λ_1.
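A quick numerical check of this result (illustrative only; the data below are placeholders): among unit vectors, the eigenvector of S with the largest eigenvalue attains the largest value of aᵀSa.

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 200))
Xc = X - X.mean(axis=1, keepdims=True)
S = (Xc @ Xc.T) / Xc.shape[1]

# eigh returns eigenvalues of a symmetric matrix in ascending order
eigvals, eigvecs = np.linalg.eigh(S)
a1 = eigvecs[:, -1]                 # eigenvector of the largest eigenvalue

# a1^T S a1 equals the largest eigenvalue ...
assert np.isclose(a1 @ S @ a1, eigvals[-1])

# ... and no random unit vector achieves a larger value of a^T S a
for _ in range(1000):
    a = rng.normal(size=3)
    a /= np.linalg.norm(a)
    assert a @ S @ a <= eigvals[-1] + 1e-12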
Algebraic derivation of PCs
To find the next coefficient vector a_2 maximizing var[z_2], subject to z_2 being uncorrelated with z_1,

    cov[z_2, z_1] = 0,

and to a_2^T a_2 = 1. Note that

    cov[z_2, z_1] = a_1^T S a_2 = \lambda_1 a_1^T a_2,

so the constraint is equivalent to a_1^T a_2 = 0. Then let λ and φ be Lagrange multipliers, and maximize

    L = a_2^T S a_2 - \lambda (a_2^T a_2 - 1) - \phi \, a_2^T a_1.
Algebraic derivation of PCs
We find that a_2 is also an eigenvector of S, whose eigenvalue λ = λ_2 is the second largest. In general,

    var[z_k] = a_k^T S a_k = \lambda_k.

• The kth largest eigenvalue of S is the variance of the kth PC.
• The kth PC z_k retains the kth greatest fraction of the variation in the sample.
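To make the “fraction of the variation” statement concrete, here is a small sketch (my own illustration) that turns the eigenvalues of S into the proportion of total variance each PC retains; the two numbers are borrowed from the worked example later in the slides:

import numpy as np

eigvals = np.array([560.2, 51.8])          # eigenvalues, e.g. from the example later in the slides
eigvals = np.sort(eigvals)[::-1]           # sort in decreasing order

explained = eigvals / eigvals.sum()        # fraction of total variance per PC
print(explained)                           # roughly [0.915, 0.085]: PC1 alone keeps about 92% of the variance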
Algebraic derivation of PCs
• Main steps for computing PCs
  – Form the covariance matrix S.
  – Compute its eigenvectors {a_i}_{i=1}^{p}.
  – Use the first d eigenvectors {a_i}_{i=1}^{d} to form the d PCs.
  – The transformation G is given by

        G = [a_1, a_2, \dots, a_d].

    A test point x \in \mathbb{R}^p is mapped to G^T x \in \mathbb{R}^d (see the sketch below).
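A sketch of these steps on synthetic data (illustrative code, not from the slides): the eigenvectors with the d largest eigenvalues form G, and a centered test point is mapped to Gᵀx.

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 300))                       # p = 5 variables, n = 300 observations
Xc = X - X.mean(axis=1, keepdims=True)
S = (Xc @ Xc.T) / Xc.shape[1]                       # covariance matrix

eigvals, eigvecs = np.linalg.eigh(S)                # ascending eigenvalue order
order = np.argsort(eigvals)[::-1]                   # indices of eigenvalues, largest first
d = 2
G = eigvecs[:, order[:d]]                           # G = [a_1, ..., a_d], shape (p, d)

x_test = rng.normal(size=5)
x_test_centered = x_test - X.mean(axis=1)
z = G.T @ x_test_centered                           # d-dimensional representation of the test point
print(z.shape)                                      # (2,)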
Dimensionality Reduction
Original data X \in \mathbb{R}^{p}  →  reduced data Y \in \mathbb{R}^{d}

Linear transformation G \in \mathbb{R}^{p \times d} (so G^T \in \mathbb{R}^{d \times p}):

    G^T : X \in \mathbb{R}^{p} \;\mapsto\; Y = G^T X \in \mathbb{R}^{d}
Steps of PCA
• Let \bar{X} be the mean vector (taking the mean of all rows)
• Adjust the original data by the mean: X' = X - \bar{X}
• Compute the covariance matrix S of the adjusted data X'
• Find the eigenvectors and eigenvalues of S (sketched below)
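A minimal end-to-end sketch of these steps in NumPy (the function and variable names are my own, not from the slides):

import numpy as np

def pca(X, d):
    """PCA following the steps above.
    X: (p, n) data matrix, one observation per column.
    d: number of principal components to keep.
    Returns (mean, components, eigenvalues), components of shape (p, d)."""
    mean = X.mean(axis=1, keepdims=True)        # step 1: mean vector
    X_adj = X - mean                            # step 2: adjust the data by the mean
    S = (X_adj @ X_adj.T) / X.shape[1]          # step 3: covariance matrix (1/n convention)
    eigvals, eigvecs = np.linalg.eigh(S)        # step 4: eigenvalues / eigenvectors
    order = np.argsort(eigvals)[::-1]           # largest eigenvalue first
    return mean, eigvecs[:, order[:d]], eigvals[order]

# usage on random data
rng = np.random.default_rng(3)
X = rng.normal(size=(4, 50))
mean, G, eigvals = pca(X, d=2)
Y = G.T @ (X - mean)                            # reduced (d, n) representation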
Principal components - Variance

[Bar chart: percentage of variance (%) explained by each of PC1 through PC10; y-axis from 0 to 25%]
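A chart of this kind can be produced with a few lines of matplotlib; the eigenvalues below are placeholders, not the values behind the original figure:

import numpy as np
import matplotlib.pyplot as plt

eigvals = np.array([24., 19., 14., 11., 9., 7., 6., 4., 3., 3.])   # placeholder eigenvalues
variance_pct = 100 * eigvals / eigvals.sum()                        # percent of variance per PC

labels = [f"PC{k}" for k in range(1, len(eigvals) + 1)]
plt.bar(labels, variance_pct)
plt.ylabel("Variance (%)")
plt.show()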
Transformed Data
• Eigenvalue λ_j corresponds to the variance along component j
• Thus, sort the eigenvalues λ_j in decreasing order
• Take the first d eigenvectors a_j, where d is the number of top eigenvalues kept
• These are the directions with the largest variances

    \begin{pmatrix} y_{i1} \\ y_{i2} \\ \vdots \\ y_{id} \end{pmatrix}
    =
    \begin{pmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_d^T \end{pmatrix}
    \begin{pmatrix} x_{i1} - \bar{x}_1 \\ x_{i2} - \bar{x}_2 \\ \vdots \\ x_{ip} - \bar{x}_p \end{pmatrix}
An Example

    X1    X2     X1'      X2'
    19    63    -5.1     9.25
    39    74    14.9    20.25
    30    87     5.9    33.25
    30    23     5.9   -30.75
    15    35    -9.1   -18.75
    15    43    -9.1   -10.75
    15    32    -9.1   -21.75
    30    73     5.9    19.25

Mean1 = 24.1, Mean2 = 53.8; X1' = X1 - Mean1, X2' = X2 - Mean2

[Scatter plots: the original data (X1 vs. X2) and the mean-adjusted data (X1' vs. X2')]
Covariance Matrix
• C =
      75   106
     106   482

• We find:
  – Eigenvectors and eigenvalues:
  – a1 = (0.21, -0.98),  λ1 = 560.2
  – a2 = (-0.98, -0.21), λ2 = 51.8
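The decomposition can be reproduced numerically from the example data; a short sketch follows (note that, because of the 1/n normalization and the rounding above, the eigenvalues and eigenvectors NumPy returns need not coincide exactly with the rounded figures on this slide):

import numpy as np

X1 = np.array([19, 39, 30, 30, 15, 15, 15, 30], dtype=float)
X2 = np.array([63, 74, 87, 23, 35, 43, 32, 73], dtype=float)
X = np.vstack([X1, X2])

Xc = X - X.mean(axis=1, keepdims=True)          # mean-adjusted data
C = (Xc @ Xc.T) / X.shape[1]                    # covariance with the 1/n convention
print(np.round(C))                              # roughly [[75, 106], [106, 482]]

eigvals, eigvecs = np.linalg.eigh(C)            # eigenvalues in ascending order
print(eigvals[::-1])                            # eigenvalues, largest first
print(eigvecs[:, ::-1])                         # corresponding eigenvectors as columns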
Transform to One-dimension

• We keep only the direction a1 = (0.21, -0.98)
• We obtain the final data as

    y_i = (0.21 \;\; -0.98) \begin{pmatrix} x_{i1} \\ x_{i2} \end{pmatrix} = 0.21 \, x_{i1} - 0.98 \, x_{i2},

applied to the mean-adjusted values (X1', X2'), giving

    y_i:  -10.14, -16.72, -31.35, 31.374, 16.464, 8.624, 19.404, -17.63

[Plot: the eight projected values y_i shown along a single axis]
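These projections follow directly from the formula applied to the mean-adjusted values; a short check (variable names are mine):

import numpy as np

# mean-adjusted example data (X1', X2') from the earlier slide
X1p = np.array([-5.1, 14.9, 5.9, 5.9, -9.1, -9.1, -9.1, 5.9])
X2p = np.array([9.25, 20.25, 33.25, -30.75, -18.75, -10.75, -21.75, 19.25])

y = 0.21 * X1p - 0.98 * X2p          # project every point onto a1 = (0.21, -0.98)
print(np.round(y, 3))
# approximately: [-10.136 -16.716 -31.346  31.374  16.464   8.624  19.404 -17.626]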