Determinant of a matrix |A|

Examples:
$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc, \qquad \begin{vmatrix} 2-\lambda & 2 \\ 1 & 3-\lambda \end{vmatrix} = (2-\lambda)(3-\lambda) - 2 = \lambda^2 - 5\lambda + 4 = (\lambda - 1)(\lambda - 4)$$

Eigenvalues of A: all $\lambda$ for which $|A - \lambda I| = 0$.

Example: $A = \begin{pmatrix} 2 & 2 \\ 1 & 3 \end{pmatrix}$ has eigenvalues 1 and 4 (see above).

Facts: Determinants and eigenvalues (which may be complex) exist for all square matrices. If |A| = 0 then 0 is an eigenvalue and A has no inverse.

Eigenvectors: For each eigenvalue $\lambda$, there exists a column vector $X = (x_1, x_2)'$ for which $X'X = 1$ and $AX = \lambda X$.

Example:
$$\begin{pmatrix} 2 & 2 \\ 1 & 3 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 4 \\ 4 \end{pmatrix} = 4\begin{pmatrix} 1 \\ 1 \end{pmatrix}$$
so $\lambda = 4$ and $(1, 1)'$ satisfies $AX = \lambda X$, as does any multiple of it. Normalizing, we see that $(1/\sqrt{2},\ 1/\sqrt{2})'$ is one eigenvector. Similarly
$$\begin{pmatrix} 2 & 2 \\ 1 & 3 \end{pmatrix}\begin{pmatrix} 1 \\ -0.5 \end{pmatrix} = \begin{pmatrix} 1 \\ -0.5 \end{pmatrix}$$
so $\lambda = 1$ and the normalized vector $(2/\sqrt{5},\ -1/\sqrt{5})'$ is the other eigenvector. Note that any multiple of an eigenvector will also satisfy $AX = \lambda X$.

Example: symmetric correlation matrix
$$\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = (1+\rho)\begin{pmatrix} 1 \\ 1 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\begin{pmatrix} 1 \\ -1 \end{pmatrix} = (1-\rho)\begin{pmatrix} 1 \\ -1 \end{pmatrix}$$
so the eigenvectors are $(1/\sqrt{2},\ 1/\sqrt{2})'$ and $(1/\sqrt{2},\ -1/\sqrt{2})'$ for the special case of a 2x2 correlation matrix. What are the eigenvalues? ___ ____

Now collect the eigenvectors in a matrix
$$V = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.$$
Note that $V'V = I$, and we have shown that
$$\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} V = V \begin{pmatrix} 1+\rho & 0 \\ 0 & 1-\rho \end{pmatrix}.$$
Multiplying through on the left by the transpose of V, we see that
$$V'\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} V = \begin{pmatrix} 1+\rho & 0 \\ 0 & 1-\rho \end{pmatrix},$$
so if $X_1$ and $X_2$ are centered and scaled (mean 0, variance 1) random variables with correlation $\rho$, then if
$$\begin{pmatrix} P_1 \\ P_2 \end{pmatrix} = V'\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} (x_1 + x_2)/\sqrt{2} \\ (x_1 - x_2)/\sqrt{2} \end{pmatrix},$$
the variance matrix of $(P_1, P_2)'$ is the expected value of
$$V'\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}\begin{pmatrix} x_1 & x_2 \end{pmatrix} V, \quad\text{which is}\quad V'\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} V = \begin{pmatrix} 1+\rho & 0 \\ 0 & 1-\rho \end{pmatrix}.$$
This is a big deal, since it means that we have converted the correlated variables $x_1$ and $x_2$ into uncorrelated variables $P_1$ and $P_2$ by taking linear combinations of $x_1$ and $x_2$ based on the eigenvectors of the $(x_1, x_2)$ correlation matrix. The variance of $P_1$ is seen to be $1+\rho$ and that of $P_2$ is seen to be $1-\rho$, which are the eigenvalues of the correlation matrix. By convention the eigenvalues are listed in descending order, which imposes an order on the eigenvectors as well. The linear combinations $P_1$ and $P_2$, when applied to observed data, are the principal components corresponding to variables $X_1$ and $X_2$. In general, a kxk correlation matrix has k eigenvalues, which are the variances of the principal components, and k associated eigenvectors, which describe the directions in the k-dimensional data space in which the principal component axes point.

Suppose we have an nxk data matrix X of n observations on k random variables. Suppose also that we have centered and scaled the data matrix so that the sample variance-covariance matrix S has all 1s on the diagonal (in other words, S is the correlation matrix of the centered and scaled data). This means that
$$S = \frac{1}{n-1}X'X = Z'Z \quad\text{where}\quad Z = \frac{1}{\sqrt{n-1}}X, \text{ i.e. } X = \sqrt{n-1}\,Z.$$
The diagonal elements of $S = Z'Z$ are all 1. We can find matrices L (nxk) and R (kxk) such that $Z = LDR'$ where D (kxk) is a diagonal matrix, $L'L = I$ and $R'R = I$. This is called the singular value decomposition of Z, and the elements of D are called singular values. This implies that $ZR = LD$ where D is a diagonal matrix. Note that ZR consists of k linear combinations of the columns of Z and hence of X; the ith linear combination uses the ith column of R to provide the weights. Note also that LD consists of linear combinations of the columns of L, but since D is diagonal these are just multiples of the columns of L, which are orthogonal to each other because $L'L = I_k$. Notice that $R'SR = R'Z'ZR = D'L'LD = D'D = D^2$. Now $D^2$ is diagonal, implying that the columns of ZR are orthogonal to each other and that the sums of squares of the columns of ZR are the diagonal elements of $D^2$.
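Before continuing the algebra, here is a quick numerical check of the 2x2 examples above. This is an illustrative sketch in Python with NumPy, not one of the SAS programs referenced at the end of this handout; the correlation value rho = 0.6, the random seed, and the sample size of 10,000 are made-up choices for the demonstration.

# Illustrative check of the 2x2 eigenvalue examples (Python/NumPy sketch, not SAS).
import numpy as np

# Eigenvalues and eigenvectors of A = [[2, 2], [1, 3]]
A = np.array([[2.0, 2.0], [1.0, 3.0]])
vals, vecs = np.linalg.eig(A)
print("eigenvalues of A:", vals)             # 1 and 4 (order not guaranteed)
print("eigenvectors (as columns):\n", vecs)  # multiples of (2, -1)'/sqrt(5) and (1, 1)'/sqrt(2)

# 2x2 correlation matrix with an assumed rho = 0.6
rho = 0.6
C = np.array([[1.0, rho], [rho, 1.0]])
V = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)  # eigenvectors of C as columns
print("V'CV:\n", V.T @ C @ V)                # diagonal matrix with 1+rho and 1-rho

# Simulate standardized correlated variables and decorrelate them with V
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=C, size=10_000)  # columns are x1, x2
P = X @ V                                    # P1 = (x1+x2)/sqrt(2), P2 = (x1-x2)/sqrt(2)
print("sample covariance of (P1, P2):\n", np.cov(P, rowvar=False))

The sample covariance of (P1, P2) comes out approximately diagonal with entries near 1 + rho and 1 - rho, matching the eigenvalues derived above.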
Multiplying both sides of $R'Z'ZR = D^2$ on the left by R, and using $RR' = I$, gives $Z'ZR = SR = RD^2$, so it is no surprise that R is the matrix of eigenvectors of the sample correlation matrix S. Multiplying the left and right sides of this by $(n-1)$, we see that similarly $X'XR = R(n-1)D^2 = RD_0^2$, where $D_0^2 = (n-1)D^2$ is diagonal and consists of the eigenvalues associated with the eigenvectors of $X'X$ (the columns of R), which we see are also the eigenvectors of S. Since the eigenvalues of S (the diagonal elements of $D^2$) are the variances of the centered and scaled data along the principal component axes, the elements of D are the corresponding standard deviations, and the elements of $D_0 = \sqrt{n-1}\,D$ are the corresponding singular values of X. Note that $X = LD_0R'$ is the singular value decomposition of X, just as $LDR'$ is the singular value decomposition of Z.

The principal components are the columns of P, where $P = LD_0$, a matrix whose columns are scalar multiples (the elements of $D_0$) of orthogonal columns (the columns of L). We have shown that these same principal components can be computed as linear combinations of the columns of X, namely $P = XR$. The elements of $D_0$ are called the singular values of X; their squares, the diagonal elements of $D_0^2$, are the sums of squares of the principal components, and dividing these by $(n-1)$ gives the variances $D^2$ along the principal component axes.

Now replace L by its first few columns, say the first $k' < k$, keeping correspondingly only the first k' singular values in $D_0$ and the first k' columns of R. Then the new $LD_0$ has just k' columns (k' principal components). These principal components are also the new XR, where the new R is just the first k' columns of the old R. The new version of $LD_0R'$ is the best rank-k' approximation to the elements of X in the sense of least squares. That is, the sum of the squared elements of X minus the new $LD_0R'$ is smaller than for any other approximation to X that you can get by taking linear combinations of the columns of an nxk' matrix.

In a regression on k input variables we might get close to the same predictions by regressing on just k' principal components. Because these are orthogonal, we might then get some nice mathematical properties for the regression calculations, but note that to get the principal components in the first place we still need to measure all k of the input variables. If we have 5 X variables, we might get an idea of what the 5-dimensional data scatterplot looks like by plotting the first 2 or 3 principal components using graphical methods we have already seen. Furthermore, if we want to cluster the 5-dimensional vectors, we might want to cluster based on just the first few principal components.

Here is a small example in which we have two wraps, A and B, for some packages of frozen food we are transporting. Upon arrival at the destination we measure the temperature of each box at the top and at the bottom, giving a two-dimensional plot. You will see the directions of the principal component axes in the (x1, x2) space and the differences in variance along the two principal component axes. The second program shows the math worked out above.

Run princomp.sas
Run princomp2.sas
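To complement the SAS programs, here is an illustrative Python/NumPy sketch of the singular value decomposition algebra worked out above. It is not a reproduction of princomp.sas or princomp2.sas; the simulated data, the random seed, and the choice of k' = 2 retained components are assumptions made for the illustration.

# Illustrative walk-through of the SVD/principal component algebra (Python/NumPy sketch, not SAS).
import numpy as np

rng = np.random.default_rng(1)
n, k = 50, 3
X_raw = rng.normal(size=(n, k)) @ rng.normal(size=(k, k))  # made-up correlated data

# Center and scale so that the sample covariance matrix of X is the correlation matrix S
X = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0, ddof=1)
S = X.T @ X / (n - 1)                  # all 1s on the diagonal
Z = X / np.sqrt(n - 1)                 # so that Z'Z = S

# Singular value decomposition Z = L D R'
L, d, Rt = np.linalg.svd(Z, full_matrices=False)
R = Rt.T
D = np.diag(d)

print(np.allclose(Z @ R, L @ D))       # ZR = LD
print(np.allclose(S @ R, R @ D**2))    # SR = R D^2: columns of R are eigenvectors of S
d0 = np.sqrt(n - 1) * d                # singular values of X
P = X @ R                              # principal components, also equal to L @ diag(d0)
print(np.allclose(P, L @ np.diag(d0)))
print(np.allclose(P.var(axis=0, ddof=1), d**2))  # variances = eigenvalues of S

# Best least squares rank-k' approximation to X from the first k' = 2 components
k_prime = 2
X_approx = L[:, :k_prime] @ np.diag(d0[:k_prime]) @ R[:, :k_prime].T
print("sum of squared errors of the rank-2 approximation:", np.sum((X - X_approx) ** 2))

NumPy returns the singular values in descending order, so keeping the first k' columns of L, D_0, and R matches the convention of listing eigenvalues in descending order and gives the best rank-k' least squares approximation described above.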