DATA ANALYSIS
MARKUS BRAUER

Factor Analysis (FA)

Goal: in a set of variables, identify which variables form coherent subsets that are relatively independent of one another.
Example: the "characteristics of graduate students": personality characteristics, motivation, intellectual ability, scholastic history, familial history, health, physical characteristics, etc.
Major use: developing objective tests for the measurement of personality, intelligence, and other individual differences.

Steps in FA:
1) Select and measure a set of variables
2) Extract the factors (perform an FA)
3) Determine the number of factors
4) Rotate the factors
5) Interpret the results

Final test of an FA: its interpretability! A good FA "makes sense"; a bad one does not!

Problems with factor analysis:
1) There is no criterion beyond interpretability against which to test the solution. There is no definitive statistical test!
2) Steps 3, 4, and 5 (see above) involve subjective judgments by the researcher; different researchers may arrive at different solutions.
3) What you get is what you put in!
4) It is a data-driven, exploratory statistical procedure (no theory).
5) FA is frequently used to "save" poorly conducted research; sometimes FA creates apparent order out of real chaos.

How to do a factor analysis

1) Select and measure a set of variables
- include a sufficient number of variables: about 5 to 6 for each hypothesized factor
- if possible, include one "marker variable" per hypothesized factor
- select a sample expected to vary on the variables and factors
- select a sufficiently large sample (at least five cases for each observed variable)
- collect the data
- check for normality, linearity, and outliers
- inspect the correlation matrix (if there are only a few correlations above .30, reconsider using FA; identify "outlying variables")

2) Extract the factors (perform an FA)
- decide on the factor extraction technique:
a) Principal factors (= principal axis factoring = factor analysis = FA; French: AF)
b) Principal components (= PCA; French: ACP)
c) Image Factor Extraction
d) Maximum Likelihood Factor Extraction
e) Unweighted Least Squares Factoring
f) Generalized Least Squares Factoring
g) Alpha Factoring

Principal Factors (FA):
- analyzes only covariance (shared variance), not unique variance or error variance
- produces "factors"
- a linear combination of all factors approximates, but does not duplicate, the observed correlation matrix
- its purpose is to reproduce the correlation matrix (with a few orthogonal factors)
- FA is your choice if you are interested in a theoretical solution uncontaminated by unique and error variance

Principal Components (PCA):
- analyzes all variance (covariance, unique variance, and error variance)
- produces "components"
- a linear combination of all components duplicates the observed correlation matrix
- its purpose is to extract a maximum of variance (with a few orthogonal components)
- PCA is your choice if you want an empirical summary of the data set

3) Determine the number of factors to retain
Two criteria:
1) Eigenvalues: retain all factors with an eigenvalue (EV) > 1
2) Scree plot: retain all factors "before the elbow"
The number of retained factors usually lies between the number of variables divided by five and the number of variables divided by three (e.g., 20 variables: 4 to 7 factors).
After rotation, look at the factor loadings of all variables. If only one variable loads highly on a factor, the factor is poorly defined. If only two variables load highly on a factor, the factor may be reliable if (a) the two variables are highly correlated with each other (r > .7) and (b) they are relatively uncorrelated with the other variables.
The ultimate criterion for the "right" number of factors is the interpretability of the solution!

4) Rotate the factors
When more than one factor is retained, unrotated factors cannot be interpreted in most cases. Rotation does not affect the mathematical fit of the solution!

Two types of rotation:
a) Orthogonal rotation: the factors are uncorrelated (= orthogonal)
b) Oblique rotation: the factors may (or may not) be correlated

Orthogonal rotations:
a) Varimax (simplifies factors)
b) Quartimax (simplifies variables)
c) Equamax

Oblique rotations:
δ (delta), also written γ (gamma), controls the maximum amount of correlation permitted between factors:
δ = 1: the correlation among factors may be very high
δ = 0: the correlation among factors may be fairly high
δ = -4: the factors are orthogonal
a) Direct oblimin (simplifies factors)
b) Direct quartimin (like direct oblimin, but with δ = 0)
c) Promax

For many research questions, an oblique rotation seems more appropriate. Often, different rotations produce similar results. The ultimate criterion for the "right" rotation is the interpretability of the solution!

5) Interpret the results
If rotation is orthogonal, the data are interpreted from the "loading matrix" (SPSS: "rotated factor matrix"). The values in this matrix are bivariate correlations between the variables and the factors. If rotation is oblique, the data are interpreted from the "pattern matrix".
The values in this matrix are comparable to partial correlations between the variables and the factors (strictly, they are standardized regression coefficients). In both cases, the values are called "factor loadings". If rotation is oblique, the "structure matrix" contains the bivariate correlations between variables and factors (usually ignored in interpretation).
Basic rule: interpret only factor loadings above .30!
Ideally, each variable loads on only one factor, and each factor has at least three variables that load highly on it. And: the factors are interpretable!

Other things to interpret/report:
a) Factor correlation matrix (oblique rotation only): Extremely high correlations between two factors suggest that these two factors may be combined into a single factor. Extremely low correlations among all factors suggest that an orthogonal rotation would probably have produced the same result.
b) Communalities: Communality values represent the proportion of the variance in a variable that is predictable from the factors underlying it. If communality values equal or exceed 1, there is a problem (too few data, wrong number of factors extracted). A very low communality value indicates that the variable is an "outlier variable".
c) Proportion of variance accounted for by each factor: an indicator of the importance of the factor. The values change with rotation, but the total amount of variance accounted for by all factors does not change after an orthogonal rotation. With an oblique rotation, one cannot specify the exact proportion of variance accounted for by each factor, because correlated factors share variance.

Factor analysis: Related issues

1) Factor scores
Def.: estimates of the scores participants would have received on each of the factors had the factors been measured directly. Factor scores are highly, but not perfectly, correlated with the factors; they should therefore be considered estimates.
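To make concrete why factor scores are only estimates, here is a minimal Python/NumPy sketch of the regression approach, which weights the standardized variables by W = R⁻¹Λ (R: correlation matrix of the variables, Λ: loading matrix). It assumes orthogonal factors and, for illustration only, uses a made-up loading matrix and simulated data in place of loadings from a real FA; the function name regression_factor_scores is hypothetical.

```python
import numpy as np

def regression_factor_scores(X, loadings):
    """Regression-approach factor scores (orthogonal factors assumed).

    X: n x p data matrix; loadings: p x m loading matrix.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize each variable
    R = np.corrcoef(Z, rowvar=False)                  # p x p correlation matrix
    W = np.linalg.solve(R, loadings)                  # score weights W = R^-1 * loadings
    return Z @ W                                      # n x m estimated factor scores

rng = np.random.default_rng(0)
n = 2000
# Hypothetical loadings: 6 variables, 2 orthogonal factors, 3 variables per factor
L = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
              [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
F = rng.standard_normal((n, 2))              # simulated "true" factor scores
uniq = np.sqrt(1 - (L ** 2).sum(axis=1))     # unique SDs so each variable has variance 1
X = F @ L.T + rng.standard_normal((n, 6)) * uniq

F_hat = regression_factor_scores(X, L)
for k in range(2):
    r = np.corrcoef(F[:, k], F_hat[:, k])[0, 1]
    print(f"factor {k + 1}: r(true, estimated) = {r:.2f}")
```

With these hypothetical loadings, each estimated factor should correlate in the high .80s with its simulated true factor: high, but clearly below 1. Stronger loadings or more marker variables per factor push this correlation closer to 1.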
If factors are orthogonal, factor scores are nearly uncorrelated.
Different methods to calculate factor scores:
1) average the standardized variables that load highly on a factor
2) regression approach
3) Bartlett method
4) Anderson-Rubin approach

2) Comparison between samples/groups
Do we find the same factor structure in two different samples/groups?
Important: use comparable procedures at every step of the analysis (handling of outliers, transformations, extraction technique, criteria for deciding on the number of factors, type of rotation, computation of factor scores, etc.).
First step: compare the loading matrices (orthogonal rotation) or the pattern matrices (oblique rotation):
a) Did both groups generate the same number of factors?
b) Do the same variables load highly on the factors in both groups?
c) Could you reasonably use the same labels to name the factors in both groups?
Formal procedures:
a) Cattell's salient similarity index (s): compares the patterns of loadings
b) Pearson's product-moment correlation coefficient (r): compares both the pattern and the magnitude of loadings

Different types of factor analyses