Week 12 Factor analysis

READING

Text reading
Dancey and Reidy: Chapter 12 in the 3rd edition, or Chapter 14 in the 4th edition.

CHAPTER OVERVIEW

In social science, we very often need to measure traits through questionnaires. We usually need to aggregate items in some coherent way, since a single question (i.e. item) is rarely sufficient to capture the nature of the underlying trait (called the latent variable). In questionnaire research in particular, we need tools that help us describe and define the range of items that relate meaningfully to the underlying latent trait. This chapter looks at factor analysis, an especially valuable technique for identifying groups or clusters of variables (items).

In this chapter, we will:
1- give a conceptual understanding of factor analysis
2- show how to enter a dataset into SPSS and analyse it by factor analysis
3- show how to interpret the statistical output from factor analysis

KEY TERMS

1- Factor (latent variable)
2- Principal component analysis
3- Method of Least Squares (MLS)
4- Factor loadings
5- Communality
6- Structure matrix
7- Pattern matrix
8- Kaiser criterion
9- Scree plot
10- Rotation
11- Oblique rotation
12- Orthogonal rotation

KEY POINTS

1- Factor analysis is used to uncover the latent structure (dimensions) of a set of variables (items).
2- Exploratory factor analysis (EFA) seeks to uncover the underlying structure of a relatively large set of variables (items). The researcher’s a priori assumption is that any indicator may be associated with any factor. This is the most common form of factor analysis: there is no prior theory, and one uses factor loadings to intuit the factor structure of the dataset.
3- Factors or components are the dimensions (or latent variables) identified with clusters of variables (items), as computed using factor analysis.
4- Principal components analysis (PCA) is the most common extraction method used in factor analysis.
PCA seeks a linear combination of variables (items) such that the maximum variance is extracted from the variables. It then removes this variance and seeks a second linear combination that explains the maximum proportion of the remaining variance, and so on. This is called the principal axis method (not to be confused with principal axis factoring, a different extraction method that analyses only common variance) and results in orthogonal (uncorrelated) factors. PCA analyses total (common and unique) variance.
5- Factor analysis generates a table in which the rows are the observed raw indicator variables (items) and the columns are the factors or latent variables that explain as much of the variance in these variables as possible. The cells in this table are factor loadings, and the meaning of each factor must be induced from seeing which variables load most heavily on it. This inferential labelling process naturally introduces a level of interpretation or subjectivity, as different researchers could employ different labels. This is normal and expected.
6- The factor loadings are the correlation coefficients between the variables (rows) and factors (columns). Analogous to Pearson's r, the squared factor loading is the proportion of variance in that variable explained by the factor.
7- To get the percent of variance in all the variables accounted for by each factor, sum the squared factor loadings for that factor (column), divide by the number of variables, and multiply by 100.
8- The structure matrix is simply the factor loading matrix as in orthogonal rotation: it represents the variance in a measured variable explained by a factor on the basis of both unique and common contributions.
9- The pattern matrix, in contrast, contains coefficients that represent unique contributions only. For oblique rotation, the researcher looks at both the structure and pattern coefficients when attributing a label to a factor.
10- The sum of the squared factor loadings for all factors for a given variable (row) is the variance in that variable accounted for by all the factors, and this is called the communality.
11- The ratio of the squared factor loadings for a given variable shows the relative importance of the different factors in explaining the variance of that variable. Factor loadings are the basis for imputing a label to the different factors.
12- Communality is the squared multiple correlation for the variable as dependent, using the factors as predictors. The communality measures the percent of variance in a given variable explained by all the factors jointly and may be interpreted as the reliability of the indicator.
13- The eigenvalue for a given factor measures the variance in all the variables (items) that is accounted for by that factor.
14- The Kaiser criterion is a common rule of thumb: drop all factors with eigenvalues under 1.0.
15- Scree plot: the Cattell scree test plots the components on the X axis and the corresponding eigenvalues on the Y axis. As one moves to the right, toward later components, the eigenvalues drop. When the drop ceases and the curve makes an elbow toward a less steep decline, Cattell's scree test says to drop all further components after the one starting the elbow (i.e. some factors may still have eigenvalues greater than 1 but be essentially meaningless).
16- Rotation serves to make the output more understandable and is usually necessary to facilitate the interpretation of factors. Unrotated solutions are hard to interpret because variables (items) tend to load on multiple factors.
17- Oblique rotations allow the factors to be correlated.
18- When an orthogonal rotation such as varimax is selected, no factor correlation matrix is produced, because the correlation of any factor with another is zero.
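
Key points 6, 7, 10, 13 and 14 can be checked numerically. The sketch below is only an illustration in Python with NumPy, not the SPSS procedure used in this course. The data are simulated (500 hypothetical respondents, six items driven by two latent traits), and the names `items`, `weights`, `loadings` and so on are assumptions introduced for the example. It extracts unrotated principal components from the correlation matrix, applies the Kaiser criterion, and computes the percent of variance per factor and the communality per item.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated (hypothetical) data: 500 respondents, 6 items, 2 latent traits.
n_respondents = 500
latent = rng.normal(size=(n_respondents, 2))
weights = np.array([
    [0.8, 0.0],
    [0.7, 0.0],
    [0.9, 0.0],
    [0.0, 0.8],
    [0.0, 0.7],
    [0.0, 0.9],
])
# Each item = weighted latent trait + item-specific noise.
items = latent @ weights.T + 0.5 * rng.normal(size=(n_respondents, 6))

# PCA on standardised items is an eigendecomposition of the correlation matrix.
R = np.corrcoef(items, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(R)

# eigh returns ascending order; sort from largest to smallest eigenvalue.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Kaiser criterion (key point 14): keep components with eigenvalue > 1.0.
n_keep = int(np.sum(eigenvalues > 1.0))

# Unrotated loadings (key point 6): eigenvector scaled by sqrt(eigenvalue).
# Rows are items, columns are retained components.
loadings = eigenvectors[:, :n_keep] * np.sqrt(eigenvalues[:n_keep])

# Key points 7 and 13: the column sum of squared loadings equals the
# eigenvalue; dividing by the number of items gives the share of variance.
ss_loadings = (loadings ** 2).sum(axis=0)
pct_variance = 100 * ss_loadings / R.shape[0]

# Key point 10: the row sum of squared loadings is the communality.
communalities = (loadings ** 2).sum(axis=1)

print("Eigenvalues:", np.round(eigenvalues, 2))
print("Components kept (Kaiser):", n_keep)
print("Percent of variance per component:", np.round(pct_variance, 1))
print("Communalities:", np.round(communalities, 2))
```

Plotting `eigenvalues` against the component number (1, 2, 3, ...) would give the scree plot described in key point 15. The loadings here are unrotated; a rotation step would normally follow before the factors are labelled.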

Further reading

1- Factor analysis
http://www.psych.cornell.edu/Darlington/factor.htm
2- Best practices in factor analysis: an academic article detailing some of the debates around factor analysis. Not for beginners, and it states views that are not necessarily shared by others.
http://pareonline.net/pdf/v10n7.pdf
3- Factor analysis using SPSS
http://www.sussex.ac.uk/Users/andyf/factor.pdf

SPSS activities

1- Factor analysis
http://calcnet.mth.cmich.edu/org/spss/Clips/24FACT~1.mov

ACTIVE LEARNING AND OPPORTUNITIES

1- In order to name factors that have been extracted, researchers look at:
A- The rotated factor loadings
B- The unrotated factor loadings
C- The table of the eigenvalues
D- None of the above

2- A factor is thought of as an underlying latent variable:
A- That is influenced by observed variables
B- That is unexplained by unobserved variables
C- Along which individuals differ
D- Along which individuals are homogeneous

3- Factor analysis requires that variables:
A- Are not related to each other
B- Are related to each other
C- Have only a weak relationship with each other
D- Are measured in the same units

4- The decision on how many factors to keep is based on:
A- Statistical criteria
B- Theoretical criteria
C- Both A and B
D- Neither A nor B

5- The original unrotated matrix is usually rotated so that:
A- The factors are more significant
B- The mathematical calculations are easier
C- Interpretation is easier
D- All of these

6- A scree plot is a number of:
A- Variables plotted against variance accounted for
B- Variables plotted against factor loadings
C- Factors plotted against correlation coefficients
D- None of the above

Final note: Factor analysis is an unusual method in that its most effective use hinges on making several clear decisions about the analysis as you employ it. That is, there are often certain assumptions and interpretations about “what goes with what”.
For example, when items cross-load (i.e. load on more than one factor), there are several steps one can take, and sometimes this may even involve eliminating items and rerunning the analysis. This is often a normal and responsible course of action. Sometimes results lack a clear focus simply because too much variance is inherent in the analysis, and a far clearer picture may emerge once irrelevant items are taken out. In short, in many ‘real life’ situations it helps researchers to seek assistance from experienced people accustomed to using this procedure. This is simply responsible behaviour, as it is pointless to advance knowledge claims based on incomplete or superficial analyses. A casual observer may suspect that the method is somehow ‘suspicious’, in that an experienced person seems to ‘steer’ or ‘shape’ the results he or she wants. Such a perception would be quite incorrect. An experienced researcher cannot create a variable using factor analysis, but he or she can use this tool to locate and define a latent trait that genuinely does contribute to the variance.
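
The cross-loading check mentioned in the final note can be sketched in a few lines of Python. This is an illustration only: the function name `flag_cross_loadings`, the item names, the loading values, and the cut-off of 0.4 are all assumptions made for the example, and the cut-off in particular is a convention that varies between researchers, not a fixed rule.

```python
import numpy as np

def flag_cross_loadings(loadings, item_names, threshold=0.4):
    """Return the items whose absolute loading reaches the threshold
    on two or more factors (hypothetical helper, threshold is assumed)."""
    loadings = np.asarray(loadings, dtype=float)
    salient = np.abs(loadings) >= threshold  # True where a loading is "large"
    return [name for name, row in zip(item_names, salient) if row.sum() >= 2]

# Made-up rotated loadings: rows are items, columns are two factors.
example_loadings = [
    [0.78, 0.10],  # loads on factor 1 only
    [0.65, 0.45],  # loads on both factors
    [0.05, 0.81],  # loads on factor 2 only
    [0.30, 0.25],  # loads on neither factor
]
example_items = ["q1", "q2", "q3", "q4"]

cross_loaded = flag_cross_loadings(example_loadings, example_items)
print("Items to review:", cross_loaded)
```

A flagged item is a candidate for review, not for automatic removal: whether to drop it and rerun the analysis remains a judgement for the researcher.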