Principal Components Analysis with SAS
Karl L. Wuensch, Dept. of Psychology, East Carolina University

When to Use PCA
• You have a set of p continuous variables.
• You want to repackage their variance into m components.
• You will usually want m < p, but not always.

Components and Variables
• Each component is a weighted linear combination of the variables:
  Ci = Wi1X1 + Wi2X2 + ... + WipXp
• Each variable is a weighted linear combination of the components:
  Xj = A1jC1 + A2jC2 + ... + AmjCm

Factors and Variables
• In factor analysis, we exclude from the solution any variance that is unique, not shared by the variables:
  Xj = A1jF1 + A2jF2 + ... + AmjFm + Uj
• Uj is the unique variance for Xj.

Goals of PCA and FA
• Data reduction.
• Discover and summarize the pattern of intercorrelations among variables.
• Test theory about the latent variables underlying a set of measurement variables.
• Construct a test instrument.
• There are many other uses of PCA and FA.

Data Reduction
• Ossenkopp and Mazmanian (Physiology and Behavior, 34: 935-941).
• 19 behavioral and physiological variables.
• A single criterion variable: physiological response to four hours of cold-restraint.
• Extracted five factors.
• Used multiple regression to develop a model for predicting the criterion from the five factors.

Exploratory Factor Analysis
• Want to discover the pattern of intercorrelations among variables.
• Wilt et al., 2005 (thesis).
• Variables are items on the SOIS at ECU.
• Found two factors: one evaluative, one on the difficulty of the course.
• Compared FTF students to DE students on structure and means.

Confirmatory Factor Analysis
• Have a theory regarding the factor structure for a set of variables.
• Want to confirm that the theory describes the observed intercorrelations well.
• Thurstone: intelligence consists of seven independent factors rather than one global factor.

Construct Test Instrument
• Write a large set of items designed to test the constructs of interest.
• Administer the survey to a sample of persons from the target population.
• Use FA to help select those items that will be used to measure each of the constructs of interest.
• Use Cronbach’s alpha to check the reliability of the resulting scales.

An Unusual Use of PCA
• Poulson, Braithwaite, Brondino, and Wuensch (1997, Journal of Social Behavior and Personality, 12, 743-758).
• Simulated jury trial; a seemingly insane defendant killed a man.
• Criterion variable = recommended verdict:
  – Guilty
  – Guilty But Mentally Ill
  – Not Guilty By Reason of Insanity
• Predictor variables = jurors’ scores on 8 scales.
• Discriminant function analysis.
• Problem with multicollinearity.
• Used PCA to extract eight orthogonal components.
• Predicted recommended verdict from these 8 components.
• Transformed results back to the original scales.

A Simple, Contrived Example
• Consumers rate the importance of seven characteristics of beer:
  – low Cost
  – high Size of bottle
  – high Alcohol content
  – Reputation of brand
  – Color
  – Aroma
  – Taste

PCA-Beer.sas
• Download PCA-Beer.sas from http://core.ecu.edu/psyc/wuenschk/SAS/SAS-Programs.htm .
• Bring it into SAS.
• Run the program. Look at the output.

Checking for Unique Variables 1
• Check the correlation matrix (page 1 of output).
• If there are any variables not well correlated with some others, you might as well delete them.
• Or add more variables expected to be correlated with them.
• Deleted variables can still be included in post-PCA analysis.

Checking for Unique Variables 2
Correlation Matrix
            cost    size  alcohol reputat   color   aroma   taste
cost       1.000    .832    .767   -.406    .018    -.046   -.064
size        .832   1.000    .904   -.392    .179     .098    .026
alcohol     .767    .904   1.000   -.463    .072     .044    .012
reputat    -.406   -.392   -.463   1.000   -.372    -.443   -.443
color       .018    .179    .072   -.372   1.000     .909    .903
aroma      -.046    .098    .044   -.443    .909    1.000    .870
taste      -.064    .026    .012   -.443    .903     .870   1.000

Checking for Unique Variables 3
• For each variable, check the R² between it and the remaining variables.
• You will see these when we cover factor analysis.
• Look at the partial correlations: variables with large partial correlations share variance with one another but not with the remaining variables; this is problematic.
• See page 2 of the output.

Checking for Unique Variables 4
• Kaiser’s MSA will tell you, for each variable, how much of this problem exists.
• The smaller the MSA, the greater the problem.
• An MSA of .9 is marvelous; .5 is miserable.
• See page 2 of the output.
• Typically we would have more than seven variables, and the MSA would likely be larger.

Extracting Principal Components 1
• From p variables we can extract p components.
• Each of the p eigenvalues represents the amount of standardized variance that has been captured by one component.
• The first component accounts for the largest possible amount of variance.
• The second captures as much as possible of what is left over, and so on.
• Each component is orthogonal to the others.

Extracting Principal Components 2
• Each variable has standardized variance = 1.
• The total standardized variance in the p variables = p.
• The sum of the m = p eigenvalues = p.
• All of the variance is extracted.
• For each component, the proportion of variance extracted = eigenvalue / p.

Extracting Principal Components 3
• For our beer data, here are the eigenvalues and proportions of variance for the seven components:
[Table of eigenvalues and proportions shown on the slide.]

How Many Components to Retain
• From p variables we can extract p components.
• We probably want fewer than p.
• Simple rule: keep as many components as have eigenvalues ≥ 1.
• A component with eigenvalue < 1 captured less than one variable’s worth of variance.
• Visual aid: use a scree plot.
• Scree is the rubble at the base of a cliff.
• See page 3 of the output.

Scree Plot
[Scree plot: eigenvalues (0.0 to 3.5) plotted against component number (1 to 7).]
• Only the first two components have eigenvalues greater than 1.
• There is a big drop in eigenvalue between component 2 and component 3.
• Components 3-7 are scree.
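The checks above (squared multiple correlations for near-unique variables, eigenvalues summing to p, the eigenvalue-at-least-1 retention rule) can be sketched numerically. The slides do everything in SAS; the following is a Python/NumPy illustration of the same arithmetic, using the beer correlation matrix transcribed from page 1 of the output:

```python
import numpy as np

# Correlation matrix for the beer data (page 1 of the SAS output);
# variable order: cost, size, alcohol, reputat, color, aroma, taste.
R = np.array([
    [ 1.000,  0.832,  0.767, -0.406,  0.018, -0.046, -0.064],
    [ 0.832,  1.000,  0.904, -0.392,  0.179,  0.098,  0.026],
    [ 0.767,  0.904,  1.000, -0.463,  0.072,  0.044,  0.012],
    [-0.406, -0.392, -0.463,  1.000, -0.372, -0.443, -0.443],
    [ 0.018,  0.179,  0.072, -0.372,  1.000,  0.909,  0.903],
    [-0.046,  0.098,  0.044, -0.443,  0.909,  1.000,  0.870],
    [-0.064,  0.026,  0.012, -0.443,  0.903,  0.870,  1.000],
])

# Squared multiple correlation of each variable with the remaining
# variables, from the inverse correlation matrix: SMC_j = 1 - 1/(R^-1)_jj.
# A variable with a very low SMC is nearly unique.
smc = 1 - 1 / np.diag(np.linalg.inv(R))

# Eigenvalues of R: each is the standardized variance captured by one
# component; they sum to p = 7, and proportion extracted = eigenvalue / p.
eigenvalues = np.linalg.eigvalsh(R)[::-1]   # sorted descending
proportions = eigenvalues / R.shape[0]

# Eigenvalue-at-least-1 rule: keep a component only if it captures at
# least one variable's worth of variance.
n_retained = int((eigenvalues > 1).sum())
```

For these data, `n_retained` is 2, matching the scree plot: only the first two components have eigenvalues greater than 1.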
• By default, SAS will retain all components with eigenvalues of 1 or more.
• You should also look at a solution with one fewer component and one with one more component.

Loadings, Unrotated and Rotated
• Loading matrix = factor pattern matrix = component matrix.
• Each loading is the Pearson r between one variable and one component.
• Since the components are orthogonal, each loading is also a β weight for predicting X from the components.
• Here are the unrotated loadings for our 2-component solution:

Factor Pattern Matrix
[Table of unrotated loadings shown on the slide.]

Pre-Rotation Loadings
• All variables load well on the first component: economy and quality versus reputation.
• The second component is more interesting: economy versus quality.
• See page 4 of the output.
• See the preplot on page 5 of the output.

Rotate the Axes
• Rotate these axes so that the two dimensions pass more nearly through the two major clusters (COST, SIZE, ALCH and COLOR, AROMA, TASTE).
• The number of degrees by which the axes are rotated is the angle PSI. For these data, rotating the axes -40.63 degrees has the desired effect.

Loadings After Rotation
[Table of rotated loadings shown on the slide.]

Components After Rotation
• Component 1 = quality versus reputation.
• Component 2 = economy (or “cheap drunk”) versus reputation.
• Page 6 of the output.
• See the postplot on page 7 of the output.

Number of Components in the Rotated Solution
• Try extracting one fewer component; try one more component.
• Which produces the more sensible solution?
• Error = the difference between the obtained structure and the true structure.
• Overextraction (too many components) produces less error than underextraction.
• If there is only one true factor and no unique variables, you can get “factor splitting.”
• In this case, the first unrotated factor ≈ the true factor.
• But rotation splits the factor, producing an imaginary second factor and corrupting the first.
• You can avoid this problem by including a garbage variable that will be removed prior to the final solution.

Explained Variance
• Square the loadings and then sum them across variables.
• Get, for each component, the amount of variance explained.
• Prior to rotation, these sums are the eigenvalues.
• Our SAS output shows the SSL for each component on page 6, just below the rotated factor pattern.
• After rotation, the two components together account for (3.02 + 2.91) / 7 = 85% of the total variance.
• If the last component has a small SSL, one should consider dropping it.
• If SSL = 1, the component has extracted one variable’s worth of variance.
• If only one variable loads well on a component, the component is not well defined.
• If only two load well, it may be reliable, if the two variables are highly correlated with one another but not with the other variables.

Naming Components
• For each component, look at how it is correlated with the variables.
• Try to name the construct represented by that factor.
• If you cannot, perhaps you should try a different solution.
• I have named our components “aesthetic quality” and “cheap drunk.”

Communalities
• For each variable, sum the squared loadings across components.
• This gives you the R² for predicting the variable from the components,
• which is the proportion of the variable’s variance that has been extracted by the components.
• See page 4 of the output.

Orthogonal Rotations
• Varimax -- minimizes the complexity of the components by making the large loadings larger and the small loadings smaller within each component.
• Quartimax -- makes large loadings larger and small loadings smaller within each variable.
• Equamax -- a compromise between these two.

Oblique Rotations
• Axes drawn through the two clusters in the upper right quadrant would not be perpendicular.
• The data may be better fit with axes that are not perpendicular, but at the cost of having components that are correlated with one another.
• More on this later.
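The rotation, SSL, and communality arithmetic described above can be sketched numerically. This Python/NumPy illustration is not part of the original SAS-based slides, and the loading matrix below is made up (shaped like the beer example, not the values from the output); only the angle -40.63 degrees and the rotated SSLs 3.02 and 2.91 come from the slides:

```python
import numpy as np

# Hypothetical unrotated loadings for seven variables on two components
# (illustrative values only).
A = np.array([
    [ 0.70,  0.60],
    [ 0.65,  0.62],
    [ 0.67,  0.58],
    [-0.62,  0.05],
    [ 0.73, -0.58],
    [ 0.70, -0.60],
    [ 0.69, -0.62],
])

# Rotate the axes through the angle psi; the slides use -40.63 degrees
# for the beer data.
psi = np.radians(-40.63)
T = np.array([[np.cos(psi), -np.sin(psi)],
              [np.sin(psi),  np.cos(psi)]])
A_rot = A @ T

# Communality of each variable = sum of squared loadings across its row.
# An orthogonal rotation moves the individual loadings but leaves every
# communality unchanged.
h2 = (A_rot ** 2).sum(axis=1)

# SSL for each component = sum of squared loadings down its column.
# Total explained variance is the same whether summed by variable (h2)
# or by component (ssl).
ssl = (A_rot ** 2).sum(axis=0)

# With the slides' rotated SSLs of 3.02 and 2.91 on p = 7 variables:
proportion = (3.02 + 2.91) / 7   # about 0.85 of the total variance
```

Because `T` is orthogonal, `h2` is identical whether computed from `A` or from `A_rot`; this is why communalities are reported once, regardless of rotation.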