Factor Analysis and Principal Components

Factor analysis, with principal components presented as one member of the family of factor-extraction techniques (which it is).

Principal Components (PC)

Principal components analysis is about explaining the variance-covariance structure, $\Sigma$, of a set of variables through a few linear combinations of those variables. In general, PC is used for either:
1. Data reduction, or
2. Interpretation.

If you have $p$ variables $x_1, \dots, x_p$, you need $p$ components to capture all of the variability, but often a smaller number, $k$, of principal components can capture most of it. So the original data set of $n$ measurements on $p$ variables can be reduced to a data set of $n$ measurements on $k$ principal components. PC tends to be a means to an end, not the end itself; that is, PC is often not the final step. The principal components may go on to be used in multiple regression, cluster analysis, etc.

Let $x' = (x_1, \dots, x_p)$ have covariance matrix $\Sigma$, and consider the linear combinations

$Y_1 = a_1'x = a_{11}x_1 + a_{12}x_2 + \dots + a_{1p}x_p$
$Y_2 = a_2'x = \sum_{i=1}^{p} a_{2i}x_i$
$\;\vdots$
$Y_p = a_p'x = \sum_{i=1}^{p} a_{pi}x_i$

Then

$\mathrm{Var}(Y_i) = a_i'\Sigma a_i, \quad i = 1, 2, \dots, p$
$\mathrm{Cov}(Y_i, Y_k) = a_i'\Sigma a_k, \quad i, k = 1, 2, \dots, p$

1. First principal component: the linear combination $a_1'x$ that maximizes $\mathrm{Var}(a_1'x)$ subject to $a_1'a_1 = 1$.
2. Second principal component: the linear combination $a_2'x$ that maximizes $\mathrm{Var}(a_2'x)$ subject to $a_2'a_2 = 1$ and $\mathrm{Cov}(a_1'x, a_2'x) = 0$.
$i$th principal component: the linear combination $a_i'x$ that maximizes $\mathrm{Var}(a_i'x)$ subject to $a_i'a_i = 1$ and $\mathrm{Cov}(a_i'x, a_k'x) = 0$ for $k < i$.

Exercise: Find the principal components, and the proportion of the total population variance explained by each, when the covariance matrix is

$\Sigma = \begin{pmatrix} \sigma^2 & \sigma^2\rho & 0 \\ \sigma^2\rho & \sigma^2 & \sigma^2\rho \\ 0 & \sigma^2\rho & \sigma^2 \end{pmatrix}$

To solve this you will have to go through your notes, but you can do it even though the formula was not handed to you.

Hint. Recall: maximization of quadratic forms for points on the unit sphere. Let $B$ be a positive definite matrix with eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p$ and associated normalized eigenvectors $e_1, e_2, \dots, e_p$. Then

$\max_{x \ne 0} \dfrac{x'Bx}{x'x} = \lambda_1$, attained when $x = e_1$;

$\min_{x \ne 0} \dfrac{x'Bx}{x'x} = \lambda_p$, attained when $x = e_p$.

Moreover,

$\max_{x \perp e_1, \dots, e_k} \dfrac{x'Bx}{x'x} = \lambda_{k+1}$, attained when $x = e_{k+1}$, for $k = 1, 2, \dots, p - 1$.

Answer: Setting $|\Sigma - \lambda I| = 0$ and solving gives

$\lambda_1 = \sigma^2(1 + \sqrt{2}\,\rho)$, with $e_1' = \left(\tfrac{1}{2}, \tfrac{1}{\sqrt{2}}, \tfrac{1}{2}\right)$
$\lambda_2 = \sigma^2$, with $e_2' = \left(\tfrac{1}{\sqrt{2}}, 0, -\tfrac{1}{\sqrt{2}}\right)$
$\lambda_3 = \sigma^2(1 - \sqrt{2}\,\rho)$, with $e_3' = \left(\tfrac{1}{2}, -\tfrac{1}{\sqrt{2}}, \tfrac{1}{2}\right)$

Principal components:

$Y_1 = \tfrac{1}{2}X_1 + \tfrac{1}{\sqrt{2}}X_2 + \tfrac{1}{2}X_3$, with $\mathrm{Var}(Y_1) = \sigma^2(1 + \sqrt{2}\rho)$; % of total variance $= \dfrac{1 + \sqrt{2}\rho}{3}$
$Y_2 = \tfrac{1}{\sqrt{2}}X_1 - \tfrac{1}{\sqrt{2}}X_3$, with $\mathrm{Var}(Y_2) = \sigma^2$; % of total variance $= \dfrac{1}{3}$
$Y_3 = \tfrac{1}{2}X_1 - \tfrac{1}{\sqrt{2}}X_2 + \tfrac{1}{2}X_3$, with $\mathrm{Var}(Y_3) = \sigma^2(1 - \sqrt{2}\rho)$; % of total variance $= \dfrac{1 - \sqrt{2}\rho}{3}$

Result. Let $\Sigma$ be the covariance matrix associated with the random vector $x' = (x_1, x_2, \dots, x_p)$, and let $\Sigma$ have the eigenvalue-eigenvector pairs $(\lambda_1, e_1), (\lambda_2, e_2), \dots, (\lambda_p, e_p)$ where $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0$. Then the $i$th principal component is given by

$Y_i = e_i'x = e_{i1}x_1 + e_{i2}x_2 + \dots + e_{ip}x_p, \quad i = 1, 2, \dots, p$

with

$\mathrm{Var}(Y_i) = e_i'\Sigma e_i = \lambda_i, \quad i = 1, 2, \dots, p$
$\mathrm{Cov}(Y_i, Y_k) = e_i'\Sigma e_k = 0, \quad i \ne k$

Result. $\sigma_{11} + \sigma_{22} + \dots + \sigma_{pp} = \sum_{i=1}^{p}\mathrm{Var}(x_i) = \lambda_1 + \lambda_2 + \dots + \lambda_p = \sum_{i=1}^{p}\mathrm{Var}(Y_i)$.

Proof: By definition, $\mathrm{tr}(\Sigma) = \sigma_{11} + \sigma_{22} + \dots + \sigma_{pp}$. We can write $\Sigma = P\Lambda P'$, where $\Lambda$ is the diagonal matrix of eigenvalues and $P = (e_1, e_2, \dots, e_p)$, so that $PP' = P'P = I$. Then

$\mathrm{tr}(\Sigma) = \mathrm{tr}(P\Lambda P') = \mathrm{tr}(\Lambda P'P) = \mathrm{tr}(\Lambda) = \lambda_1 + \lambda_2 + \dots + \lambda_p$

Thus total population variance $= \sum_{i=1}^{p}\sigma_{ii} = \sum_{i=1}^{p}\lambda_i$, and the proportion of total population variance due to the $k$th principal component is

$\dfrac{\lambda_k}{\lambda_1 + \lambda_2 + \dots + \lambda_p}, \quad k = 1, 2, \dots, p.$
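As a numerical check of the exercise above, the eigenvalue-eigenvector pairs can be computed directly in SAS/IML. This is a minimal sketch assuming the illustrative values $\sigma^2 = 1$ and $\rho = 0.5$ (these particular numbers are not from the notes, and eigenvectors are unique only up to sign):

    proc iml;
      /* Covariance matrix from the exercise, with assumed values sigma^2 = 1, rho = 0.5 */
      sigma2 = 1;
      rho    = 0.5;
      A      = {0 1 0, 1 0 1, 0 1 0};   /* pattern of the off-diagonal rho entries */
      Sigma  = sigma2 * (I(3) + rho * A);
      call eigen(lambda, E, Sigma);     /* eigenvalues (descending) and eigenvectors (columns) */
      print lambda;   /* expect sigma2*(1+sqrt(2)*rho) = 1.7071, sigma2 = 1, sigma2*(1-sqrt(2)*rho) = 0.2929 */
      print E;        /* expect columns (.5, .7071, .5), (.7071, 0, -.7071), (.5, -.7071, .5), up to sign */
      prop = lambda / trace(Sigma);     /* proportion of total variance for each component */
      print prop;
    quit;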
Result. If $Y_1 = e_1'x$, $Y_2 = e_2'x$, ..., $Y_p = e_p'x$ are the principal components obtained from the covariance matrix $\Sigma$, then

$\rho_{Y_i, x_k} = \dfrac{e_{ik}\sqrt{\lambda_i}}{\sqrt{\sigma_{kk}}}, \quad i, k = 1, 2, \dots, p$

are the correlation coefficients between the components $Y_i$ and the variables $x_k$. Here $(\lambda_1, e_1), (\lambda_2, e_2), \dots, (\lambda_p, e_p)$ are the eigenvalue-eigenvector pairs for $\Sigma$.

Proof: Set $a_k' = (0, 0, \dots, 1, \dots, 0)$, with the 1 in the $k$th position, so that $x_k = a_k'x$. Then, using $\Sigma e_i = \lambda_i e_i$,

$\mathrm{Cov}(x_k, Y_i) = \mathrm{Cov}(a_k'x, e_i'x) = a_k'\Sigma e_i = a_k'(\lambda_i e_i) = \lambda_i e_{ik}$

We showed earlier that $\mathrm{Var}(Y_i) = \lambda_i$, and $\mathrm{Var}(x_k) = \sigma_{kk}$, so

$\rho_{Y_i, x_k} = \dfrac{\mathrm{Cov}(Y_i, x_k)}{\sqrt{\mathrm{Var}(Y_i)}\sqrt{\mathrm{Var}(x_k)}} = \dfrac{\lambda_i e_{ik}}{\sqrt{\lambda_i}\sqrt{\sigma_{kk}}} = \dfrac{e_{ik}\sqrt{\lambda_i}}{\sqrt{\sigma_{kk}}}, \quad i, k = 1, 2, \dots, p.$

Principal Components from Standardized Variables

$Z_1 = \dfrac{X_1 - \mu_1}{\sqrt{\sigma_{11}}}, \quad Z_2 = \dfrac{X_2 - \mu_2}{\sqrt{\sigma_{22}}}, \quad \dots, \quad Z_p = \dfrac{X_p - \mu_p}{\sqrt{\sigma_{pp}}}$

In matrix notation, $Z = (V^{1/2})^{-1}(X - \mu)$, where

$V^{1/2} = \begin{pmatrix} \sqrt{\sigma_{11}} & 0 & \cdots & 0 \\ 0 & \sqrt{\sigma_{22}} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{\sigma_{pp}} \end{pmatrix}$

Then $E(Z) = 0$ and $\mathrm{Cov}(Z) = (V^{1/2})^{-1}\Sigma(V^{1/2})^{-1} = \rho$, the correlation matrix of $X$. The PCs of $Z$ can therefore be obtained from the eigenvectors of the correlation matrix $\rho$ of $X$. For notation we shall continue to use $Y_i$ to refer to the $i$th PC and $(\lambda_i, e_i)$ for the eigenvalue-eigenvector pairs, from either $\Sigma$ or $\rho$. However, the $(\lambda_i, e_i)$ derived from $\Sigma$ are, in general, not the same as the ones derived from $\rho$.

Result. The $i$th PC of the standardized variables $Z' = (Z_1, Z_2, \dots, Z_p)$, with $\mathrm{Cov}(Z) = \rho$, is given by

$Y_i = e_i'Z = e_i'(V^{1/2})^{-1}(X - \mu), \quad i = 1, \dots, p$

Moreover,

$\sum_{i=1}^{p}\mathrm{Var}(Y_i) = \sum_{i=1}^{p}\mathrm{Var}(Z_i) = p$ (the number of random variables, not the correlation matrix $\rho$)

and

$\rho_{Y_i, Z_k} = e_{ik}\sqrt{\lambda_i}, \quad i, k = 1, 2, \dots, p$

where the $(\lambda_i, e_i)$ are the eigenvalue-eigenvector pairs for $\rho$, with $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0$.

Sample principal components. If $S = (s_{ik})$ is the $p \times p$ sample covariance matrix with eigenvalue-eigenvector pairs $(\hat\lambda_1, \hat e_1), (\hat\lambda_2, \hat e_2), \dots, (\hat\lambda_p, \hat e_p)$, the $i$th sample principal component is given by

$\hat y_i = \hat e_i'x = \hat e_{i1}x_1 + \hat e_{i2}x_2 + \dots + \hat e_{ip}x_p, \quad i = 1, 2, \dots, p$

where $\hat\lambda_1 \ge \hat\lambda_2 \ge \dots \ge \hat\lambda_p \ge 0$ and $x$ is any observation on the variables $X_1, X_2, \dots, X_p$. Also:

sample variance of $\hat y_k = \hat\lambda_k$, for $k = 1, 2, \dots, p$
sample covariance of $(\hat y_i, \hat y_k) = 0$, for $i \ne k$
$r_{\hat y_i, x_k} = \dfrac{\hat e_{ik}\sqrt{\hat\lambda_i}}{\sqrt{s_{kk}}}$, for $i, k = 1, 2, \dots, p$

Factor Analysis

The main purpose of factor analysis is to describe the covariance relationships among many variables in terms of a few underlying, but unobservable, random quantities called factors.

The Orthogonal Factor Model: The observable random vector $X$, with $p$ components, has mean $\mu$ and covariance matrix $\Sigma$. The factor model proposes that $X$ is linearly dependent upon a few unobservable random variables $F_1, F_2, \dots, F_m$, called common factors, and $p$ additional sources of variation $\varepsilon_1, \varepsilon_2, \dots, \varepsilon_p$, called errors, or specific factors:

$X_1 - \mu_1 = \ell_{11}F_1 + \ell_{12}F_2 + \dots + \ell_{1m}F_m + \varepsilon_1$
$X_2 - \mu_2 = \ell_{21}F_1 + \ell_{22}F_2 + \dots + \ell_{2m}F_m + \varepsilon_2$
$\;\vdots$
$X_p - \mu_p = \ell_{p1}F_1 + \ell_{p2}F_2 + \dots + \ell_{pm}F_m + \varepsilon_p$

or, in matrix notation,

$\underset{(p\times1)}{X - \mu} = \underset{(p\times m)}{L}\;\underset{(m\times1)}{F} + \underset{(p\times1)}{\varepsilon}$

$\ell_{ij}$ is called the loading of the $i$th variable on the $j$th factor, and $L$ is called the matrix of factor loadings. The $p$ deviations $X_1 - \mu_1, X_2 - \mu_2, \dots, X_p - \mu_p$ are expressed in terms of the $p + m$ random variables $F_1, \dots, F_m, \varepsilon_1, \dots, \varepsilon_p$.

Assumptions:

$E(F) = \underset{(m\times1)}{0}, \quad \mathrm{Cov}(F) = E(FF') = \underset{(m\times m)}{I}$

$E(\varepsilon) = \underset{(p\times1)}{0}, \quad \mathrm{Cov}(\varepsilon) = E(\varepsilon\varepsilon') = \Psi = \begin{pmatrix} \psi_1 & 0 & \cdots & 0 \\ 0 & \psi_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \psi_p \end{pmatrix}$

$F$ and $\varepsilon$ are independent, so $\mathrm{Cov}(\varepsilon, F) = E(\varepsilon F') = \underset{(p\times m)}{0}$.

Solve for $\Sigma$ in terms of $L$ and $\Psi$. Since $X - \mu = LF + \varepsilon$,

$(X - \mu)(X - \mu)' = (LF + \varepsilon)(LF + \varepsilon)' = LFF'L' + \varepsilon F'L' + LF\varepsilon' + \varepsilon\varepsilon'$

so

$\Sigma = \mathrm{Cov}(X) = E(X - \mu)(X - \mu)' = L\,E(FF')\,L' + E(\varepsilon F')L' + L\,E(F\varepsilon') + E(\varepsilon\varepsilon') = LIL' + 0 + 0 + \Psi = LL' + \Psi$

where the cross terms vanish because $F$ and $\varepsilon$ are independent. Also, $(X - \mu)F' = LFF' + \varepsilon F'$, so

$\mathrm{Cov}(X, F) = E[(X - \mu)F'] = L\,E(FF') + E(\varepsilon F') = L.$

Therefore

$\mathrm{Var}(X_i) = \ell_{i1}^2 + \dots + \ell_{im}^2 + \psi_i$
$\mathrm{Cov}(X_i, X_k) = \ell_{i1}\ell_{k1} + \dots + \ell_{im}\ell_{km}, \quad i \ne k$
$\mathrm{Cov}(X_i, F_j) = \ell_{ij}$

The Principal Component (Principal Factor) Method: one method for factor analysis.

Recall the spectral decomposition: let the covariance matrix $\Sigma$ have eigenvalue-eigenvector pairs $(\lambda_i, e_i)$ with $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0$. Then

$\Sigma = \lambda_1 e_1 e_1' + \lambda_2 e_2 e_2' + \dots + \lambda_p e_p e_p' = \left(\sqrt{\lambda_1}\,e_1 \,\middle|\, \sqrt{\lambda_2}\,e_2 \,\middle|\, \cdots \,\middle|\, \sqrt{\lambda_p}\,e_p\right)\begin{pmatrix} \sqrt{\lambda_1}\,e_1' \\ \sqrt{\lambda_2}\,e_2' \\ \vdots \\ \sqrt{\lambda_p}\,e_p' \end{pmatrix}$

Thus this is a factor analysis with $m = p$ (#factors = #variables) and $\psi_i = 0$ for all $i$:

$\underset{(p\times p)}{\Sigma} = \underset{(p\times p)}{L}\;\underset{(p\times p)}{L'} + \underset{(p\times p)}{0} = LL'$

Since we almost always want fewer factors than original variables ($m < p$), one approach, when the last $p - m$ eigenvalues are small, is to neglect the contribution of $\lambda_{m+1}e_{m+1}e_{m+1}' + \dots + \lambda_p e_p e_p'$ to $\Sigma$. We then use the approximation

$\Sigma \approx \underset{(p\times m)}{L}\;\underset{(m\times p)}{L'}, \qquad L = \left(\sqrt{\lambda_1}\,e_1 \,\middle|\, \sqrt{\lambda_2}\,e_2 \,\middle|\, \cdots \,\middle|\, \sqrt{\lambda_m}\,e_m\right)$

removing the last $p - m$ components. This assumes the error, $\varepsilon$, can be ignored. If we wish to allow for the specific variances, $\Psi$ can be included:

$\Sigma \approx LL' + \Psi = \left(\sqrt{\lambda_1}\,e_1 \,\middle|\, \cdots \,\middle|\, \sqrt{\lambda_m}\,e_m\right)\begin{pmatrix} \sqrt{\lambda_1}\,e_1' \\ \vdots \\ \sqrt{\lambda_m}\,e_m' \end{pmatrix} + \begin{pmatrix} \psi_1 & 0 & \cdots & 0 \\ 0 & \psi_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \psi_p \end{pmatrix}$

where $\psi_i = \sigma_{ii} - \sum_{j=1}^{m}\ell_{ij}^2$ for $i = 1, \dots, p$.
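To make the extraction step concrete, here is a small SAS/IML sketch of the principal component method applied to a correlation matrix. The 3x3 matrix R and the choice m = 1 are made-up values for illustration only:

    proc iml;
      /* Hypothetical sample correlation matrix (illustrative values only) */
      R = {1.0 0.6 0.5,
           0.6 1.0 0.4,
           0.5 0.4 1.0};
      m = 1;                                    /* number of factors retained */
      call eigen(lambda, E, R);
      L = E[, 1:m] * diag(sqrt(lambda[1:m]));   /* loadings: sqrt(lambda_j) * e_j */
      psi = vecdiag(R - L*L`);                  /* specific variances: psi_i = r_ii - sum_j l_ij**2 */
      print L psi;
      resid = R - (L*L` + diag(psi));           /* residual matrix: zero diagonal by construction; */
      print resid;                              /* off-diagonals show the quality of the approximation */
    quit;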
When applying this approach it is typical to center the observations, $x_j - \bar x$. In the case where the units of the variables are not the same (e.g., kg and cm, weight and height), it is usually desirable to work with the standardized variables

$z_j = \left(\dfrac{x_{j1} - \bar x_1}{\sqrt{s_{11}}}, \; \dfrac{x_{j2} - \bar x_2}{\sqrt{s_{22}}}, \; \dots, \; \dfrac{x_{jp} - \bar x_p}{\sqrt{s_{pp}}}\right)', \quad j = 1, 2, \dots, n$

Maximum Likelihood Method for Factor Analysis

If the common factors, $F$, and the errors, $\varepsilon$, can be assumed to be normally distributed, then maximum likelihood estimates of the factor loadings and specific variances may be obtained. When the $F_j$ and $\varepsilon_j$ are jointly normal, the observations $X_j - \mu = LF_j + \varepsilon_j$ are normal, and the likelihood is

$L(\mu, \Sigma) = (2\pi)^{-\frac{np}{2}}\,|\Sigma|^{-\frac{n}{2}}\exp\!\left\{-\tfrac{1}{2}\,\mathrm{tr}\!\left[\Sigma^{-1}\!\left(\sum_{j=1}^{n}(x_j - \bar x)(x_j - \bar x)' + n(\bar x - \mu)(\bar x - \mu)'\right)\right]\right\}$

$= (2\pi)^{-\frac{(n-1)p}{2}}\,|\Sigma|^{-\frac{n-1}{2}}\exp\!\left\{-\tfrac{1}{2}\,\mathrm{tr}\!\left[\Sigma^{-1}\sum_{j=1}^{n}(x_j - \bar x)(x_j - \bar x)'\right]\right\} \times (2\pi)^{-\frac{p}{2}}\,|\Sigma|^{-\frac{1}{2}}\exp\!\left\{-\tfrac{n}{2}(\bar x - \mu)'\Sigma^{-1}(\bar x - \mu)\right\}$

which depends on $L$ and $\Psi$ through $\Sigma = LL' + \Psi$. To make $L$ well defined (a unique solution), impose the condition that $L'\Psi^{-1}L$ be a diagonal matrix. Then

proportion of total sample variance due to the $j$th factor $= \dfrac{\hat\ell_{1j}^2 + \hat\ell_{2j}^2 + \dots + \hat\ell_{pj}^2}{s_{11} + s_{22} + \dots + s_{pp}}$

Factor Rotation

All factor loadings obtained from the initial loadings by an orthogonal transformation have the same ability to reproduce the covariance (or correlation) matrix. Let

$\hat L^* = \hat L T, \quad \text{where } TT' = T'T = I.$

Then $\hat L^*\hat L^{*\prime} + \hat\Psi = \hat L TT'\hat L' + \hat\Psi = \hat L\hat L' + \hat\Psi$, so the reproduced matrix $\hat L\hat L' + \hat\Psi$ remains unchanged, and $\hat\Psi$ remains unchanged also.

Imagine there were only $m = 2$ factors: $\underset{(p\times2)}{\hat L^*} = \underset{(p\times2)}{\hat L}\;\underset{(2\times2)}{T}$, where

$T = \begin{pmatrix} \cos\phi & \sin\phi \\ -\sin\phi & \cos\phi \end{pmatrix}$ (clockwise rotation) $\quad$ or $\quad$ $T = \begin{pmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{pmatrix}$ (counterclockwise rotation)

and $\phi$ is the angle through which the factor loadings are rotated.

A Reference: The following 13 slides come from "Multivariate Data Analysis Using SPSS" by John Zhang, ARL, IUP.

Factor Analysis-1
The main goal of factor analysis is data reduction. A typical use of factor analysis is in survey research, where a researcher wishes to represent a number of questions with a smaller number of factors.
Two questions in factor analysis: how many factors are there, and what do they represent (interpretation)?
Two technical aids: eigenvalues, and the percentage of variance accounted for.

Factor Analysis-2
Two types of factor analysis:
Exploratory: introduced here.
Confirmatory: SPSS AMOS.
Theoretical basis: correlations among variables are explained by underlying factors. An example of a mathematical one-factor model for two variables:
V1 = L1*F1 + E1
V2 = L2*F1 + E2

Factor Analysis-3
Each variable is composed of a common factor (F1) multiplied by a loading coefficient (L1, L2, the lambdas or factor loadings) plus a random component. V1 and V2 correlate because of the common factor, and that correlation relates to the factor loadings; thus the factor loadings can be estimated from the correlations. A given set of correlations can be reproduced by different sets of factor loadings (i.e., the solutions are not unique), and one should pick the simplest solution. A quick simulation of this two-variable model appears below.
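As a hedged illustration of the one-factor model on the slide above, the following SAS data step simulates V1 = L1*F1 + E1 and V2 = L2*F1 + E2 with made-up loadings L1 = 0.8 and L2 = 0.6. With Var(F1) = 1 and error variances 1 - Li**2, each Vi has variance 1, so the population correlation between V1 and V2 is L1*L2 = 0.48, which PROC CORR should approximately recover:

    data sim;
      call streaminit(2024);          /* arbitrary seed */
      L1 = 0.8;  L2 = 0.6;            /* hypothetical factor loadings */
      do i = 1 to 10000;
        F1 = rand('normal');                           /* common factor, variance 1 */
        V1 = L1*F1 + sqrt(1 - L1**2)*rand('normal');   /* unique variance 1 - L1**2 */
        V2 = L2*F1 + sqrt(1 - L2**2)*rand('normal');   /* unique variance 1 - L2**2 */
        output;
      end;
      keep V1 V2;
    run;

    proc corr data=sim;   /* sample correlation should be near L1*L2 = 0.48 */
      var V1 V2;
    run;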
Factor Analysis-4
That is, the findings should not differ by the methodology of analysis nor by sample. A factor solution needs to be confirmed by a different factor method and by a different sample.
More terminology:
Factor loading: interpreted as the Pearson correlation between the variable and the factor.
Communality: the proportion of variability for a given variable that is explained by the factors.
Extraction: the process by which the factors are determined from a large set of variables.

Factor Analysis-5 (Principal Components)
Principal components is one of the extraction methods. A principal component is a linear combination of the observed variables that is independent (orthogonal) of the other components. The first component accounts for the largest amount of variance in the input data; the second component accounts for the largest amount of the remaining variance; and so on. That the components are orthogonal means they are uncorrelated.

Factor Analysis-6 (Principal Components)
Possible application of principal components: in survey research, for example, it is common to have many questions addressing one issue (e.g., customer service). These questions are likely to be highly correlated, which makes it problematic to use them directly in some statistical procedures (e.g., regression). One can instead use factor scores, computed from the factor loadings on each orthogonal component.

Factor Analysis-7 (Principal Components)
Principal components vs. other extraction methods:
Principal components focuses on accounting for the maximum amount of variance (the diagonal of the correlation matrix), while other extraction methods (e.g., principal axis factoring) focus more on accounting for the correlations between variables (the off-diagonal correlations).
A principal component can be defined as a unique combination of the variables; the other factor methods cannot.
Principal components are used for data reduction but are more difficult to interpret.

Factor Analysis-8
Number of factors: eigenvalues are often used to determine how many factors to retain; take as many factors as there are eigenvalues greater than 1. An eigenvalue represents the amount of standardized variance in the variables accounted for by a factor, and the amount of standardized variance in a single variable is 1. Each eigenvalue divided by the total (the number of variables) gives the percentage of variance accounted for by that factor.

Factor Analysis-9
Rotation. The objective is to facilitate interpretation.
Orthogonal rotation: done when data reduction is the objective and the factors need to remain orthogonal.
Varimax: attempts to simplify interpretation by maximizing the variances of the variable loadings on each factor.
Quartimax: simplifies the solution by finding a rotation that produces high and low loadings across factors for each variable.
Oblique rotation: used when there is reason to allow factors to be correlated. Oblimin and Promax (Promax runs fast).

Factor Analysis-10
Factor scores: if you are satisfied with a factor solution,
you can request that a new set of variables be created representing the scores of each observation on the factors (difficult to interpret), or
you can use the lambda coefficients to judge which variables are highly related to each factor, then compute the sum or the mean of those variables for further analysis (easy to interpret).

Factor Analysis-11
Sample size: the sample size should be about 10 to 15 times the number of variables (as with other multivariate procedures).
Number of methods: there are 8 factoring methods, including principal components.
Principal axis: accounts for the correlations between the variables.
Unweighted least squares: minimizes the residual between the observed and the reproduced correlation matrix.
Factor Analysis-12
Generalized least squares: similar to unweighted least squares, but gives more weight to the variables with stronger correlations.
Maximum likelihood: generates the solution that is most likely to have produced the correlation matrix.
Alpha factoring: considers the variables as a sample; does not use factor loadings.
Image factoring: decomposes each variable into a common part and a unique part, then works with the common parts.

Factor Analysis-13
Recommendations:
Principal components and principal axis are the most commonly used methods.
When there is multicollinearity, use principal components.
Rotations are often done; try Varimax.

Reference: Factor Analysis from SPSS. Much of the following wording comes from the SPSS help and tutorial.

Factor Analysis
Factor Analysis is primarily used for data reduction or structure detection. The purpose of data reduction is to remove redundant (highly correlated) variables from the data file, perhaps replacing the entire data file with a smaller number of uncorrelated variables. The purpose of structure detection is to examine the underlying (or latent) relationships between the variables.

Factor Analysis
The Factor Analysis procedure has several extraction methods for constructing a solution.
For data reduction: the principal components method of extraction begins by finding a linear combination of variables (a component) that accounts for as much variation in the original variables as possible. It then finds another component that accounts for as much of the remaining variation as possible and is uncorrelated with the previous component, continuing in this way until there are as many components as original variables. Usually, a few components will account for most of the variation, and these components can be used to replace the original variables. This method is most often used to reduce the number of variables in the data file.
For structure detection: other Factor Analysis extraction methods go one step further by adding the assumption that some of the variability in the data cannot be explained by the components (usually called factors in other extraction methods). As a result, the total variance explained by the solution is smaller; however, the addition of this structure to the factor model makes these methods ideal for examining relationships between the variables.
With any extraction method, the two questions that a good solution should try to answer are "How many components (factors) are needed to represent the variables?" and "What do these components represent?"

Factor Analysis: Data Reduction
An industry analyst would like to predict automobile sales from a set of predictors. However, many of the predictors are correlated, and the analyst fears that this might adversely affect her results. This information is contained in the file car_sales.sav. Use Factor Analysis with principal components extraction to focus the analysis on a manageable subset of the predictors.

Factor Analysis: Structure Detection
A telecommunications provider wants to better understand service usage patterns in its customer database. If services can be clustered by usage, the company can offer more attractive packages to its customers. A random sample from the customer database is contained in telco.sav. Use Factor Analysis to determine the underlying structure in service usage.
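The walkthrough that follows uses the SPSS dialogs. For readers who prefer SAS (the package used elsewhere in these notes), a roughly equivalent call is sketched below. The data set name TELCO and the choice of three factors are assumptions for illustration; principal axis factoring corresponds to METHOD=PRINCIPAL combined with PRIORS=SMC in PROC FACTOR:

    /* Sketch of the telco analysis in SAS, assuming telco.sav has been
       imported as a SAS data set named TELCO */
    proc factor data=telco
                method=principal priors=smc   /* principal axis factoring */
                nfactors=3                    /* assumed, matching the 3 eigenvalues > 1 seen below */
                rotate=varimax                /* orthogonal rotation, as in the SPSS example */
                scree;                        /* scree plot of the eigenvalues */
    run;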
Use: Principal Axis Factoring

Example of Factor Analysis: Structure Detection
A telecommunications provider wants to better understand service usage patterns in its customer database, with a view to selecting service offerings.

Example of Factor Analysis: Descriptives
Click Descriptives. Recommended: keep Initial Solution checked (the default); in addition, check "Anti-image" and "KMO and Bartlett's test of sphericity".

Example of Factor Analysis: Extraction
Click Extraction. Select Method "Principal axis factoring". Recommended: keep the defaults, but also check "Scree plot".

Example of Factor Analysis: Rotation
Click Rotation. Select "Varimax" and "Loading plot(s)".

Understanding the Output
The Kaiser-Meyer-Olkin Measure of Sampling Adequacy is a statistic that indicates the proportion of variance in your variables that might be caused by underlying factors. If it is below 0.5, factor analysis is probably not usable.

KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy: .888
Bartlett's Test of Sphericity: Approx. Chi-Square 6230.901, df 91, Sig. .000

Bartlett's test of sphericity tests the hypothesis that your correlation matrix is an identity matrix, which would indicate that your variables are unrelated and therefore unsuitable for structure detection. If Sig. < 0.05, factor analysis may be helpful.

Understanding the Output

Communalities (Extraction Method: Principal Axis Factoring)

Variable                  Initial  Extraction
Long distance last month   .297     .748
Toll free last month       .510     .564
Equipment last month       .579     .697
Calling card last month    .266     .307
Wireless last month        .660     .708
Multiple lines             .276     .340
Voice mail                 .471     .501
Paging service             .527     .541
Internet                   .455     .525
Caller ID                  .552     .623
Call waiting               .545     .610
Call forwarding            .532     .596
3-way calling              .506     .561
Electronic billing         .416     .488

Extraction communalities are estimates of the variance in each variable accounted for by the factors in the factor solution. Small values indicate variables that do not fit well with the factor solution and should possibly be dropped from the analysis. The lower values of Multiple lines and Calling card show that they don't fit as well as the others.

Understanding the Output (before rotation)
Only three factors in the initial solution have eigenvalues greater than 1. Together, they account for almost 65% of the variability in the original variables. This suggests that three latent influences are associated with service usage, but there remains room for a lot of unexplained variation.

Understanding the Output (after rotation)
After extraction and rotation, approximately 56% of the variation is explained, about a 10% loss in explained variation. In general, a lot of services have correlations greater than 0.2 with multiple factors, which muddies the picture; the rotated factor matrix should clear this up.

Understanding the Output (before rotation)

Factor Matrix (Extraction Method: Principal Axis Factoring. Note: attempted to extract 3 factors; more than 25 iterations required (convergence = .002); extraction was terminated.)

Variable                  Factor 1  Factor 2  Factor 3
Long distance last month    .146     -.254      .814
Toll free last month        .652     -.373      .020
Equipment last month        .494      .671      .054
Calling card last month     .364     -.243      .339
Wireless last month         .799      .261      .037
Multiple lines              .257      .280      .442
Voice mail                  .669      .228     -.038
Paging service              .692      .246     -.050
Internet                    .323      .648     -.014
Caller ID                   .689     -.345     -.172
Call waiting                .678     -.366     -.126
Call forwarding             .684     -.336     -.128
3-way calling               .662     -.338     -.093
Electronic billing          .250      .652     -.035

The relationships in the unrotated factor matrix are somewhat clear.
The third factor is associated with Long distance last month. The second corresponds most strongly to Equipment last month, Internet, and Electronic billing. The first factor is associated with Toll free last month, Wireless last month, Voice mail, Paging service, Caller ID, Call waiting, Call forwarding, and 3-way calling.

Understanding the Output (after rotation)

Rotated Factor Matrix (Extraction Method: Principal Axis Factoring. Rotation Method: Varimax with Kaiser Normalization. Rotation converged in 4 iterations.)

Variable                  Factor 1  Factor 2  Factor 3
Long distance last month    .062     -.121      .854
Toll free last month        .726      .018      .191
Equipment last month        .067      .831      .049
Calling card last month     .348     -.012      .431
Wireless last month         .530      .637      .146
Multiple lines             -.025      .384      .438
Voice mail                  .455      .539      .054
Paging service              .468      .566      .044
Internet                   -.049      .722     -.045
Caller ID                   .787      .056      .008
Call waiting                .779      .033      .054
Call forwarding             .768      .062      .048
3-way calling               .743      .050      .078
Electronic billing         -.107      .686     -.080

The first rotated factor is most highly correlated with Toll free last month, Caller ID, Call waiting, Call forwarding, and 3-way calling. These variables are not particularly correlated with the other two factors. The second factor is most highly correlated with Equipment last month, Internet, and Electronic billing. The third factor is largely unaffected by the rotation. Thus, there are three major groupings of services, as defined by the services that are most highly correlated with the three factors. Given these groupings, you can make the following observations about the remaining services (referring again to the rotated factor matrix above):

Because of their moderately large correlations with both the first and second factors, Wireless last month, Voice mail, and Paging service bridge the "Extras" and "Tech" groups. Calling card last month is moderately correlated with the first and third factors, so it bridges the "Extras" and "Long Distance" groups. Multiple lines is moderately correlated with the second and third factors, so it bridges the "Tech" and "Long Distance" groups. This suggests avenues for cross-selling. For example, customers who subscribe to extra services may be more predisposed to accepting special offers on wireless services than on Internet services.

Summary: What Was Learned
Using a principal axis factoring extraction, you have uncovered three latent factors that describe relationships between your variables. These factors suggest various patterns of service usage, which you can use to more efficiently increase cross-sales.

Using Principal Components
Principal components can aid in clustering. What is principal components? Principal components is a statistical technique that creates new variables that are linear functions of the old variables. The main goal of principal components is to reduce the number of variables needed for analysis.

Principal Components Analysis (PCA)
What it is and when it should be used.
Introduction to PCA
What does principal components analysis do? It takes a set of correlated variables and creates a smaller set of uncorrelated variables. These newly created variables are called principal components.

There are two main objectives for using PCA:
1. Reduce the dimensionality of the data.
   - In simple English: turn p variables into fewer than p variables.
   - While reducing the number of variables, we attempt to keep as much of the information in the original variables as possible.
   - Thus we try to reduce the number of variables without loss of information. This is often not possible.
2. Identify new meaningful underlying variables.
   - The principal components created are linear combinations of the original variables and often don't lend themselves to any meaning beyond that.
   - There are several reasons why, and situations where, PCA is useful.

Introduction to PCA
There are several reasons why PCA is useful:
1. PCA is helpful in discovering whether abnormalities exist in a multivariate dataset.
2. Clustering (which will be covered later): PCA is helpful when it is desirable to classify units into groups with similar attributes. For example, in marketing you may want to classify your customers into groups (or clusters) with similar attributes for marketing purposes. It can also be helpful for verifying the clusters created when clustering.
3. Discriminant analysis: in some cases there may be more response variables than independent variables, in which case discriminant analysis cannot be used. Principal components can help reduce the number of response variables to a number less than the number of independent variables.
4. Regression: PCA can help address the issue of multicollinearity in the independent variables.

Introduction to PCA
Formation of principal components:
1. They are uncorrelated.
2. The 1st principal component accounts for as much of the variability in the data as possible.
3. The 2nd principal component accounts for as much of the remaining variability as possible.
4. The 3rd ...
5. Etc.

Principal Components and Least Squares
Think of the least squares model $Y = XB + E$, where
$Y$ is an $n \times p$ matrix of the centered observed variables;
$X$ is an $n \times j$ matrix of the scores on the first $j$ principal components;
$B$ is a $j \times p$ matrix of the eigenvectors;
$E$ is an $n \times p$ matrix of the residuals.

Eigenvector (mathematics): a vector which, when acted on by a particular linear transformation, produces a scalar multiple of the original vector. The scalar in question is called the eigenvalue corresponding to this eigenvector. (www.dictionary.com)

Calculation of the PCA
There are two options:
1. Correlation matrix.
2. Covariance matrix.
Using the covariance matrix will cause variables with large variances to be more strongly associated with components with large eigenvalues, and the opposite is true of variables with small variances. For this reason, you should use the correlation matrix unless the variables are comparable or have been standardized. (A PROC PRINCOMP illustration of this choice appears after the variable list below.)

Limitations of Principal Components
PCA converts a set of correlated variables into a smaller set of uncorrelated variables. If the variables are already uncorrelated, then PCA has nothing to add. Also, it is often difficult or impossible to explain a principal component; that is, principal components often do not lend themselves to any meaning.

SAS Example of PCA
We will analyze data on crime: crime rates per 100,000 population by state. The variables are:
1. MURDER
2. RAPE
3. ROBBERY
4. ASSAULT
5. BURGLARY
6. LARCENY
7. AUTO
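One practical note before running the example: the correlation-vs-covariance choice discussed above is a single option in PROC PRINCOMP. By default the procedure analyzes the correlation matrix; adding the COV option (standard PROC PRINCOMP syntax) switches it to the covariance matrix. A sketch, using the CRIME data set from the example below:

    /* Principal components from the covariance matrix instead of the
       default correlation matrix; appropriate only when the variables
       are on comparable scales or have been standardized */
    proc princomp data=crime cov;
    run;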
SAS Command for PCA
SAS CODE:
PROC PRINCOMP DATA=CRIME OUT=CRIMCOMP;
run;
The input dataset is CRIME, and the results (including the component scores) will be saved to CRIMCOMP.

SAS Output of Crime Example
Observations: 50. Variables: 7.

Simple Statistics

Variable    Mean          StD
MURDER      7.444000000   3.866768941
RAPE        25.73400000   10.75962995
ROBBERY     124.0920000   88.3485672
ASSAULT     211.3000000   100.2530492
BURGLARY    1291.904000   432.455711
LARCENY     2671.288000   725.908707
AUTO        377.5260000   193.3944175

Correlation Matrix

           MURDER  RAPE    ROBBERY ASSAULT BURGLARY LARCENY AUTO
MURDER     1.0000  0.6012  0.4837  0.6486  0.3858   0.1019  0.0688
RAPE       0.6012  1.0000  0.5919  0.7403  0.7121   0.6140  0.3489
ROBBERY    0.4837  0.5919  1.0000  0.5571  0.6372   0.4467  0.5907
ASSAULT    0.6486  0.7403  0.5571  1.0000  0.6229   0.4044  0.2758
BURGLARY   0.3858  0.7121  0.6372  0.6229  1.0000   0.7921  0.5580
LARCENY    0.1019  0.6140  0.4467  0.4044  0.7921   1.0000  0.4442
AUTO       0.0688  0.3489  0.5907  0.2758  0.5580   0.4442  1.0000

More SAS Output of Crime Example

Eigenvalues of the Correlation Matrix

     Eigenvalue    Difference    Proportion   Cumulative
1    4.11495951    2.87623768    0.5879       0.5879
2    1.23872183    0.51290521    0.1770       0.7648
3    0.72581663    0.40938458    0.1037       0.8685
4    0.31643205    0.05845759    0.0452       0.9137
5    0.25797446    0.03593499    0.0369       0.9506
6    0.22203947    0.09798342    0.0317       0.9823
7    0.12405606                  0.0177       1.0000

(The Difference column is the gap to the next eigenvalue; for example, 0.09798342 = 0.22203947 - 0.12405606.) The Proportion column is the proportion of variability explained by each principal component individually; it equals eigenvalue/(sum of the eigenvalues). The first two principal components capture 76.48% of the variation. If you include 6 of the 7 principal components you capture 98.23% of the variability; the 7th component captures only 1.77%.

More SAS Output of Crime Example

Eigenvectors

          Prin1     Prin2     Prin3     Prin4     Prin5     Prin6     Prin7
MURDER    0.300279  -.629174  0.178245  -.232114  0.538123  0.259117  0.267593
RAPE      0.431759  -.169435  -.244198  0.062216  0.188471  -.773271  -.296485
ROBBERY   0.396875  0.042247  0.495861  -.557989  -.519977  -.114385  -.003903
ASSAULT   0.396652  -.343528  -.069510  0.629804  -.506651  0.172363  0.191745
BURGLARY  0.440157  0.203341  -.209895  -.057555  0.101033  0.535987  -.648117
LARCENY   0.357360  0.402319  -.539231  -.234890  0.030099  0.039406  0.601690
AUTO      0.295177  0.502421  0.568384  0.419238  0.369753  -.057298  0.147046

Prin1 has all positive values; this variable can be used as a proxy for the overall crime rate. Prin2 has both positive and negative values: Murder, Rape, and Assault are all negative (violent crimes), while Robbery, Burglary, Larceny, and Auto are all positive (property crimes). This variable can be used to understand property vs. violent crime.

Crime rates per 100,000 population by state, with states listed in order of overall crime rate as determined by the first principal component (lowest 10 states and then the top 10 states). [Table not reproduced here.]

Crime rates per 100,000 population by state, with states listed in order of property vs. violent crime as determined by the second principal component (lowest 10 states and then the top 10 states). [Table not reproduced here.]

Correlation from SAS
First the descriptive statistics (part of the output from PROC CORR). [Output not reproduced here.]
Correlation matrix, just the variables: note that there is correlation among the crime rates.
Correlation matrix, just the principal components: note that there is no correlation among the principal components.
Correlation matrix, variables against principal components: note the high to very high correlations with the first few principal components, decreasing as you move toward the last principal component.

What If We Told SAS to Produce Only 2 Principal Components?
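The notes do not show the command used for this run; in PROC PRINCOMP the number of components is controlled by the N= option (standard syntax), so presumably something like:

    proc princomp data=crime out=crimcomp2 n=2;   /* N=2 requests only the first two components */
    run;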
Eigenvalues of the Correlation Matrix

     Eigenvalue    Difference    Proportion   Cumulative
1    4.11495951    2.87623768    0.5879       0.5879
2    1.23872183                  0.1770       0.7648

The two principal components produced when SAS is asked for only two are exactly the same as when it produced all seven.

Eigenvectors

          Prin1     Prin2
MURDER    0.300279  -.629174
RAPE      0.431759  -.169435
ROBBERY   0.396875  0.042247
ASSAULT   0.396652  -.343528
BURGLARY  0.440157  0.203341
LARCENY   0.357360  0.402319
AUTO      0.295177  0.502421
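As a closing check that echoes the correlation matrices shown earlier, the component scores saved by the OUT= option (PROC PRINCOMP names them Prin1, Prin2, ...) can be passed to PROC CORR to confirm that the components are uncorrelated; a minimal sketch:

    proc corr data=crimcomp;
      var Prin1 Prin2;    /* score variables created by PROC PRINCOMP's OUT= option */
    run;
    /* The off-diagonal correlation should be 0, up to rounding. */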