Factor Analysis (FA)

DATA ANALYSIS
MARKUS BRAUER
Goal: in a set of variables, identify which variables form coherent subsets that are relatively
independent of each other.
Example: the "characteristics of graduate students" – personality characteristics, motivation,
intellectual ability, scholastic history, familial history, health, physical characteristics, etc.
Major use: develop objective tests for the measurement of personality, intelligence, and other
individual differences.
Steps in FA:
1) Select and measure a set of variables
2) Extract the factors (perform a FA)
3) Determine the number of factors
4) Rotate the factors
5) Interpret the results
Final test of a FA: its interpretability!!!!!! A good FA "makes sense", a bad one does not!!!!
Problems with factor analysis:
1) There is no criterion beyond interpretability against which to test the solution! There
is no definite statistical test!!
2) Steps 3, 4, and 5 (see above) involve subjective judgments by the researcher; different
researchers may arrive at different solutions.
3) What you get is what you put in!
4) It's a data-driven, exploratory statistical procedure (no theory).
5) FA is frequently used to "save" poorly conducted research; sometimes, FA creates
apparent order from real chaos.
How to do a factor analysis
1) Select and measure a set of variables
- include a sufficient number of variables, about 5 to 6 for each hypothesized factor
- if possible, include one "marker variable" per hypothesized factor
- select a sample expected to vary on the variables and factors
- select a sufficiently large sample (at least five cases for each observed variable)
- collect the data
- check for normality, linearity, and outliers
- inspect the correlation matrix (if there are only a few correlations above .30, reconsider using
FA; identify "outlying variables")
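The screening steps above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the data are simulated (two latent factors, three variables each), and the variable names are hypothetical, not part of the course material.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (an assumption for illustration): 200 cases, 6 variables,
# generated from two uncorrelated latent factors with three variables each.
n_cases, n_vars = 200, 6
loadings = np.array([[.8, 0], [.7, 0], [.6, 0],
                     [0, .8], [0, .7], [0, .6]])
factors = rng.normal(size=(n_cases, 2))
unique_sd = np.sqrt(1 - (loadings ** 2).sum(axis=1))
X = factors @ loadings.T + rng.normal(size=(n_cases, n_vars)) * unique_sd

# Sample-size rule of thumb from the list above: at least 5 cases per variable.
cases_per_variable = n_cases / n_vars

# Correlation matrix and the number of off-diagonal correlations above .30.
R = np.corrcoef(X, rowvar=False)
upper = np.abs(R[np.triu_indices(n_vars, k=1)])
n_above = int((upper > 0.30).sum())

# Candidate "outlying variables": a variable whose largest absolute
# correlation with any other variable is small.
max_r_per_variable = np.abs(R - np.eye(n_vars)).max(axis=1)

print(f"cases per variable: {cases_per_variable:.1f}")
print(f"{n_above} of {upper.size} correlations exceed .30")
print("largest |r| per variable:", np.round(max_r_per_variable, 2))
```

With real data, X would simply be the cases-by-variables data matrix; the checks for normality, linearity, and outliers are not covered by this sketch.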
2) Extract the factors (perform a FA)
- decide on the factor extraction technique:
a) Principal factors (= principal axis factoring = factor analysis = FA = AF)
b) Principal components (= PCA = ACP)
c) Image Factor Extraction
d) Maximum Likelihood Factor Extraction
e) Unweighted Least Squares Factoring
f) Generalized Least Squares Factoring
g) Alpha Factoring
Principal Factors (FA):
- analyzes covariance (but not unique variance and error variance)
- produces "factors"
- a linear combination of all factors approximates, but does not duplicate, the observed correlation matrix
- its purpose is to reproduce the correlation matrix (with a few orthogonal factors)
- FA is your choice if you are interested in a theoretical solution uncontaminated by unique and error variance
Principal Components (PCA):
- analyzes variance (including covariance, unique variance and error variance)
- produces "components"
- a linear combination of all components duplicates the observed correlation matrix
- its purpose is to extract a maximum of variance (with a few orthogonal components)
- PCA is your choice if you want an empirical summary of the data set
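The contrast between the two techniques can be illustrated numerically. The sketch below uses a made-up 4 x 4 correlation matrix (an assumption, not course data). For PCA it shows that all components together duplicate the correlation matrix while two components only approximate it. For principal factors it shows a single, non-iterated pass in which the 1s on the diagonal are replaced by squared multiple correlations as communality estimates; statistical packages typically iterate this step.

```python
import numpy as np

# Hypothetical correlation matrix for four variables (illustration only).
R = np.array([[1.0, 0.6, 0.5, 0.1],
              [0.6, 1.0, 0.4, 0.1],
              [0.5, 0.4, 1.0, 0.2],
              [0.1, 0.1, 0.2, 1.0]])


def top_loadings(M, k):
    """Loadings on the k largest eigenvalue/eigenvector pairs of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(M)
    order = np.argsort(vals)[::-1][:k]
    # Clip guards against tiny or negative eigenvalues of a reduced matrix.
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0, None))


# PCA: the 1s stay on the diagonal, so all variance is analyzed.
A_all = top_loadings(R, 4)
A_two = top_loadings(R, 2)
R_from_all = A_all @ A_all.T   # duplicates R
R_from_two = A_two @ A_two.T   # only approximates R

# Principal factors, one non-iterated pass: the diagonal is replaced by
# communality estimates (squared multiple correlations), so unique and
# error variance are left out of the analysis.
smc = 1 - 1 / np.diag(np.linalg.inv(R))
R_reduced = R.copy()
np.fill_diagonal(R_reduced, smc)
F_two = top_loadings(R_reduced, 2)

print("all components duplicate R:", np.allclose(R_from_all, R))
print("two components duplicate R:", np.allclose(R_from_two, R))
print("initial communality estimates:", np.round(smc, 3))
```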
3) Determine the number of factors to retain
Two criteria:
1) Eigenvalues: retain all factors with EV > 1
2) Scree plot: retain all factors "before the elbow"
The number of retained factors is usually somewhere between the number of variables divided by
three and the number of variables divided by five. (Ex: 20 variables → 4 to 7 factors)
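As a rough sketch of these criteria, the code below computes the eigenvalues of a correlation matrix, counts those above 1, and lists the successive drops that a scree plot would display. The correlation matrix is constructed from hypothetical loadings (two factors, three variables each), and taking the largest drop as the "elbow" is a crude stand-in for visual inspection, not an established rule.

```python
import numpy as np

# Hypothetical population loadings used only to build a correlation matrix.
loadings = np.array([[.8, 0], [.7, 0], [.6, 0],
                     [0, .8], [0, .7], [0, .6]])
R = loadings @ loadings.T
np.fill_diagonal(R, 1.0)

# Eigenvalues of the correlation matrix, largest first.
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]

# Criterion 1: retain all factors with eigenvalue > 1.
n_kaiser = int((eigvals > 1).sum())

# Criterion 2: a scree plot shows eigvals against their rank; here the
# largest drop between neighbors is used as a crude proxy for the elbow.
drops = -np.diff(eigvals)
n_before_elbow = int(np.argmax(drops)) + 1

# Rule of thumb: between (number of variables / 5) and (number of variables / 3).
n_vars = R.shape[0]
rule_low, rule_high = n_vars / 5, n_vars / 3

print("eigenvalues:", np.round(eigvals, 2))
print("retained by EV > 1:", n_kaiser)
print("factors before the largest drop:", n_before_elbow)
print(f"rule of thumb: {rule_low:.1f} to {rule_high:.1f} factors")
```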
After rotation, look at the factor loadings of all variables. If only one variable loads highly on a
factor, the factor is poorly defined. If only two variables load highly on a factor, the factor may
be reliable if (a) the two variables are highly correlated with each other (r > .7) and (b) relatively
uncorrelated with the other variables.
The ultimate criterion for the "right" number of factors is the interpretability of the solution!
4) Rotate the factors:
When more than one factor is retained, unrotated factors cannot be interpreted in most cases.
Rotation does not affect the mathematical fit of the solution!!!!!!
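A quick numerical check of this statement, for the orthogonal case: multiplying a loading matrix by any orthogonal rotation matrix leaves the reproduced correlations and the communalities unchanged. The loadings and the rotation angle below are arbitrary, made-up values.

```python
import numpy as np

# Hypothetical unrotated loadings for six variables on two factors.
L = np.array([[.70, .40], [.65, .35], [.60, .30],
              [.55, -.45], [.50, -.50], [.45, -.40]])

# An arbitrary orthogonal rotation (35 degrees).
theta = np.deg2rad(35)
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
L_rot = L @ T

# Reproduced correlation matrices before and after rotation.
R_hat = L @ L.T
R_hat_rot = L_rot @ L_rot.T

# Communalities (row sums of squared loadings) before and after rotation.
h2 = (L ** 2).sum(axis=1)
h2_rot = (L_rot ** 2).sum(axis=1)

print("same reproduced correlations:", np.allclose(R_hat, R_hat_rot))
print("same communalities:", np.allclose(h2, h2_rot))
```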
Two types of rotations:
a) Orthogonal rotation: The factors are uncorrelated (= orthogonal)
b) Oblique rotation: The factors may (or may not) be correlated
Orthogonal rotations:
a)
b) Varimax (simplifies factors)
c) Quartimax (simplifies variables)
d) Equamax
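To make "simplifies factors" concrete, here is a minimal sketch of a raw varimax rotation using a commonly used SVD-based iteration. It is an illustration under assumptions, not the routine of any particular statistics package: no Kaiser normalization is applied, and the input loadings are a made-up simple structure that has been deliberately mixed by a 30-degree rotation.

```python
import numpy as np


def varimax_criterion(L):
    """Sum over factors of the variance of the squared loadings."""
    return float(np.var(L ** 2, axis=0).sum())


def varimax(L, max_iter=100, tol=1e-8):
    """Raw varimax rotation; returns rotated loadings and the rotation matrix."""
    p, k = L.shape
    T = np.eye(k)
    monitor = 0.0
    for _ in range(max_iter):
        Lr = L @ T
        grad = L.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        T = u @ vt                      # closest orthogonal matrix to grad
        new_monitor = s.sum()           # convergence monitor
        if new_monitor < monitor * (1 + tol):
            break
        monitor = new_monitor
    return L @ T, T


# Hypothetical simple structure, mixed by 30 degrees to mimic unrotated factors.
simple = np.array([[.8, 0], [.7, 0], [.6, 0],
                   [0, .8], [0, .7], [0, .6]])
a = np.deg2rad(30)
mix = np.array([[np.cos(a), -np.sin(a)],
                [np.sin(a),  np.cos(a)]])
L_unrotated = simple @ mix

L_varimax, T = varimax(L_unrotated)

print("criterion before:", round(varimax_criterion(L_unrotated), 4))
print("criterion after: ", round(varimax_criterion(L_varimax), 4))
print(np.round(L_varimax, 2))
```

The rotated columns may come out in a different order or with flipped signs relative to the original simple structure; this does not change the interpretation.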
Oblique rotations:
δ (delta) = γ (gamma) = the maximum amount of correlation permitted between factors
δ = 1 → the correlation among factors may be very high
δ = 0 → the correlation among factors may be fairly high
δ = -4 → the factors are orthogonal
a) Direct oblimin (simplifies factors)
b) Direct quartimin (like direct oblimin but δ = 0)
c) Promax
For many research questions, an oblique rotation seems more appropriate. Often, different
rotations produce similar results.
The ultimate criterion for the "right" rotation is the interpretability of the solution!!!!
5) Interpret the results:
If rotation is orthogonal, the data are interpreted from the "loading matrix" (SPSS: "rotated factor
matrix"). The values in this matrix are bivariate correlations between the variables and the
factors. If rotation is oblique, the data are interpreted from the "pattern matrix". The values in this
matrix are regression-like weights that express the unique contribution of each factor to each
variable (comparable to partial regression coefficients rather than correlations). In both cases, the
values are called "factor loadings". If rotation is oblique, the "structure matrix" contains the bivariate
correlations between variables and factors (to be ignored).
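The relation between the two matrices of an oblique solution can be written down directly: the structure matrix equals the pattern matrix multiplied by the factor correlation matrix. The sketch below uses made-up pattern loadings and an assumed factor correlation of .40; with uncorrelated factors the two matrices coincide, which is why only one loading matrix is reported after an orthogonal rotation.

```python
import numpy as np

# Hypothetical pattern matrix (six variables, two oblique factors).
pattern = np.array([[.70, .10], [.80, .00], [.60, .05],
                    [.05, .75], [.00, .65], [.10, .70]])

# Assumed factor correlation matrix: the two factors correlate .40.
phi = np.array([[1.0, 0.4],
                [0.4, 1.0]])

# Structure matrix: bivariate correlations between variables and factors.
structure = pattern @ phi

# With uncorrelated factors, pattern and structure are identical.
structure_orthogonal = pattern @ np.eye(2)

print(np.round(structure, 2))
```

Variable 2 has a pattern loading of .00 on factor 2 but a structure value of .32, purely because the two factors are correlated.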
Basic rule: interpret only factor loadings above .30 !!
Ideally, each variable loads only on one factor, and each factor has at least three variables that
load highly on it. And: the factors are interpretable!!!!
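The basic rule and the "ideal" pattern can be checked mechanically. The loading matrix below is invented for illustration; the sketch flags variables that load above .30 on more than one factor and factors with fewer than three salient variables.

```python
import numpy as np

# Hypothetical rotated loading matrix (six variables, two factors).
loadings = np.array([[.78, .12], [.71, .05], [.64, .22],
                     [.08, .81], [.15, .69], [.35, .58]])

# Basic rule: interpret only loadings above .30 (in absolute value).
salient = np.abs(loadings) > 0.30

factors_per_variable = salient.sum(axis=1)
variables_per_factor = salient.sum(axis=0)

# Variables loading on more than one factor, and factors defined by fewer
# than three variables (0-based indices).
cross_loading = np.where(factors_per_variable > 1)[0]
weak_factors = np.where(variables_per_factor < 3)[0]

print("salient loadings per factor:", variables_per_factor)
print("cross-loading variables:", cross_loading)
print("poorly defined factors:", weak_factors)
```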
Other things to interpret/report:
a) Factor correlation matrix (in oblique rotation only):
Extremely high correlations between two factors suggest that these two factors may be
combined into a single factor. Extremely low correlations among all factors suggest that
an orthogonal rotation may have produced the same result.
b) Communalities:
Communality values represent the proportion of the variance in a variable that is
predictable from the factors underlying it. If communality values equal or exceed 1, there
is a problem (too few data, wrong number of factors extracted). A very low communality
value for a variable indicates that this variable is an "outlying variable".
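Communalities can be computed from the loadings. For an orthogonal solution they are the row sums of the squared loadings; for an oblique solution the factor correlations enter as well. The sketch below uses invented loadings, and the cut-off of .20 for a "very low" communality is an assumption made here for illustration only.

```python
import numpy as np


def communalities(loadings, phi=None):
    """Diagonal of loadings @ phi @ loadings.T (phi = identity if omitted)."""
    if phi is None:
        phi = np.eye(loadings.shape[1])
    return np.einsum("ij,jk,ik->i", loadings, phi, loadings)


# Hypothetical loadings after an orthogonal rotation (five variables).
loadings = np.array([[.80, .10], [.70, .20], [.10, .75],
                     [.20, .60], [.15, .10]])

h2 = communalities(loadings)

# Communalities of 1 or more signal a problem with the solution.
problem = np.where(h2 >= 1)[0]

# Assumed cut-off (.20) for flagging candidate outlying variables.
low = np.where(h2 < 0.20)[0]

print("communalities:", np.round(h2, 3))
print("communality >= 1:", problem)
print("very low communality:", low)
```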
c) Proportion of variance accounted for by each factor:
Def.: an indicator of the importance of the factor. The values change with rotation. The
total amount of variance accounted for by all factors does not change after an orthogonal
rotation. With an oblique rotation, one cannot specify the exact proportions of variance
accounted for by the factors.
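For an orthogonal solution, the proportion of variance accounted for by a factor is the sum of its squared loadings divided by the number of variables. The sketch below, with made-up loadings and an arbitrary rotation angle, shows that the individual proportions change with rotation while their total does not.

```python
import numpy as np

# Hypothetical unrotated loadings for six variables on two factors.
L = np.array([[.70, .40], [.65, .35], [.60, .30],
              [.55, -.45], [.50, -.50], [.45, -.40]])
n_vars = L.shape[0]

# Arbitrary orthogonal rotation (35 degrees).
theta = np.deg2rad(35)
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
L_rot = L @ T

# Sum of squared loadings per factor, divided by the number of variables.
prop = (L ** 2).sum(axis=0) / n_vars
prop_rot = (L_rot ** 2).sum(axis=0) / n_vars

print("before rotation:", np.round(prop, 3), "total:", round(prop.sum(), 3))
print("after rotation: ", np.round(prop_rot, 3), "total:", round(prop_rot.sum(), 3))
```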
d)
Factor analysis – Related issues
1) Factor scores:
Def.: estimates of the scores participants would have received on each of the factors had
the factors been measured directly.
Factor scores are highly, but not perfectly, correlated with the factors; factor scores should
be considered estimates. If factors are orthogonal, factor scores are nearly uncorrelated.
Different methods to calculate factor scores:
1) average the standardized variables that load highly on a factor
2) regression approach
3) Bartlett method
4) Anderson-Rubin approach
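The first two methods can be sketched as follows. The data are simulated so that the true factor values are known, and the loading matrix is assumed to be given (here, the values used to generate the data) rather than estimated. For the regression approach, the score weights are taken as the inverse of the correlation matrix multiplied by the loadings, which applies to an orthogonal solution. The Bartlett and Anderson-Rubin methods are not shown.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data (assumption): 500 cases, two orthogonal factors.
n = 500
loadings = np.array([[.8, 0], [.7, 0], [.6, 0],
                     [0, .8], [0, .7], [0, .6]])
true_factors = rng.normal(size=(n, 2))
unique_sd = np.sqrt(1 - (loadings ** 2).sum(axis=1))
X = true_factors @ loadings.T + rng.normal(size=(n, 6)) * unique_sd

# Standardize the observed variables.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(Z, rowvar=False)

# Method 1: average the standardized variables that load highly on a factor.
salient = np.abs(loadings) > 0.30
scores_unit = np.column_stack(
    [Z[:, salient[:, j]].mean(axis=1) for j in range(loadings.shape[1])]
)

# Method 2: regression approach, weights = R^{-1} @ loadings.
weights = np.linalg.solve(R, loadings)
scores_reg = Z @ weights

# Scores are estimates: highly, but not perfectly, correlated with the factors.
r_with_factor = np.corrcoef(scores_reg[:, 0], true_factors[:, 0])[0, 1]
r_between_scores = np.corrcoef(scores_reg[:, 0], scores_reg[:, 1])[0, 1]
r_unit_vs_reg = np.corrcoef(scores_unit[:, 0], scores_reg[:, 0])[0, 1]

print("score 1 with true factor 1:", round(r_with_factor, 2))
print("score 1 with score 2:", round(r_between_scores, 2))
print("unit-weighted vs regression score:", round(r_unit_vs_reg, 2))
```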
2) Comparison between samples/groups:
Do we find the same factor structure in two different samples/groups?
Important: employ similar procedures at different steps of the procedure (handling of
outliers, transformations, extraction technique, criteria for deciding on the number of
factors, type of rotation, computation of factor scores, etc.)
First step: comparison of the loading matrices (orthogonal rotation) or of the pattern
matrices (oblique rotation)
a) Did both groups generate the same number of factors?
b) Do the same variables load highly on the factors for the two groups?
c) Could you reasonably use the same labels to name factors for both groups?
Formal procedures:
a) Cattell's salient similarity index (s): compares patterns of loadings
b) Pearson's product-moment correlation coefficient (r): compares both pattern
and magnitude of loadings
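A minimal sketch of the informal comparison and of procedure b), using invented loading matrices for two groups. It assumes that the factors have already been matched across groups (same column order and sign), which has to be checked by hand in practice. Cattell's salient similarity index is not implemented here.

```python
import numpy as np

# Hypothetical loading matrices for the same six variables in two groups.
load_a = np.array([[.80, .05], [.70, .10], [.60, .00],
                   [.05, .75], [.10, .70], [.00, .65]])
load_b = np.array([[.75, .10], [.72, .05], [.55, .05],
                   [.10, .80], [.05, .68], [.05, .60]])

# a) Same number of factors in both groups?
same_n_factors = load_a.shape[1] == load_b.shape[1]

# b) Do the same variables load highly (above .30) on the same factors?
same_salient = bool(np.array_equal(np.abs(load_a) > 0.30,
                                   np.abs(load_b) > 0.30))

# Pearson r between matched columns of loadings: sensitive to both the
# pattern and the magnitude of the loadings.
r_per_factor = [float(np.corrcoef(load_a[:, j], load_b[:, j])[0, 1])
                for j in range(load_a.shape[1])]

print("same number of factors:", same_n_factors)
print("same salient loadings:", same_salient)
print("r per factor:", np.round(r_per_factor, 3))
```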
Different types of factor analyses