716 Spr 2013 HW 3 - Factor Analysis v1

Multivariate Statistics (Psych 716), Spring 2013
Homework #3 – Factor Analysis and more Factor Analysis
For the following questions, please provide information about specific results relevant to your interpretations. That is,
be clear about which tables, plots, specific values, etc., are the basis of your interpretations. You may provide values in
the text of your answer, copy and paste specific pieces of output from SPSS, or print the key results (I don’t need
everything) and clearly refer to facets of the output. In fact, it wouldn’t hurt to even mark on the output, so that I
know exactly what you might be referring to. Your goal should be to convince me that you have a firm grasp on the
issues in factor analysis, that you know what the results mean and how to interpret them.
Download the data set “Ch12BData - EFA.sav” from the class web site.
1. Begin by running a Principal Axis Factor analysis (PFA), with default settings for the “Eigenvalue greater than 1”
option. Don’t worry about rotation for now.
a) Considering at least two rules-of-thumb that we discussed, what do the eigenvalues and scree plot suggest
about the number of factors underlying the variables? Explain your answer.
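As a rough illustration of the two rules-of-thumb in 1a, the Python sketch below computes the eigenvalues of an item correlation matrix, counts how many exceed 1, and lists the successive drops between eigenvalues (a numeric stand-in for eyeballing the elbow of a scree plot). The data are simulated and hypothetical, not the class data set, and the function names are illustrative only. It assumes the eigenvalues come from the unreduced correlation matrix; your SPSS output remains the basis for your answer.

```python
import numpy as np

def eigenvalues_of_correlation(data):
    """Eigenvalues of the item correlation matrix, largest first."""
    corr = np.corrcoef(data, rowvar=False)
    return np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order

def kaiser_count(eigvals):
    """Number of eigenvalues greater than 1 (the 'eigenvalue > 1' rule)."""
    return int(np.sum(eigvals > 1.0))

# Hypothetical data: 200 respondents, 6 items driven by 2 uncorrelated factors.
rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 2))
loadings = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                     [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
items = factors @ loadings.T + rng.normal(scale=0.6, size=(200, 6))

eigvals = eigenvalues_of_correlation(items)
n_kaiser = kaiser_count(eigvals)
drops = eigvals[:-1] - eigvals[1:]  # a large drop followed by small ones suggests an elbow

print("Eigenvalues:", np.round(eigvals, 3))
print("Eigenvalues > 1:", n_kaiser)
print("Successive drops:", np.round(drops, 3))
```

The two rules need not agree, which is why the question asks you to weigh at least two of them.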
2. Based on your evaluation of the eigenvalues and scree plot, run the appropriate analysis (i.e., extracting however
many factors you think appropriate), using a promax rotation. What do the results indicate about the factor structure?
a) Does this solution seem to achieve simple structure (i.e., a “clean” structure)? Explain – what seems more
and less clean about the structure?
b) How would you characterize the factors, psychologically speaking? Of course, this is a subjective issue, so
be sure to offer clear statistical and logical support for your interpretation.
c) To what degree are the factors correlated?
3. Now let’s think about different rotation methods, to give you a deeper sense of what they do, what their
implications are, and their connection to factor loadings. Re-run the PFA from step 2, now using a varimax rotation.
a) Comment on the similarity/dissimilarity between the varimax’s rotated matrix and the promax’s pattern
matrix. Is one “cleaner” in terms of simple structure? Explain with specific examples from the matrices.
b) To what degree are the factors correlated?
c) Note that varimax (a rotation that creates orthogonal factors) gives only one rotated matrix of factor loadings,
whereas promax (a rotation that creates non-orthogonal/oblique factors) generates two rotated matrices – a
pattern matrix and a structure matrix. Why is this? That is, why is the distinction between pattern
coefficients and structure coefficients NOT relevant in varimax rotations (i.e., the pattern coefficients in a
varimax rotation would be the same exact values as the corresponding structure coefficients in the varimax
rotation)? To answer this question, please work through the following series of questions:
i. In regression, why is a predictor’s standardized slope unequal to its zero-order correlation with the
outcome (and conversely, when do standardized weights equal zero-order correlations)?
ii. In terms of a PFA, what’s the underlying model – i.e., what is “causing/predicting” what? (What is
treated as a predictor and what are treated as outcomes? See the book and question 4c, below.)
iii. In PFA, what can determine whether the “predictors” are correlated versus uncorrelated?
iv. What is the difference between pattern coefficients and structure coefficients (as discussed in class)?
v. And finally, for varimax rotation, why can the (potential) difference between pattern coefficients and
structure coefficients be ignored?
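One way to explore the relationship behind questions i through v is numerically. The sketch below assumes the standard identity that the structure matrix equals the pattern matrix multiplied by the factor correlation matrix. The pattern values and factor correlations are hypothetical, chosen only for illustration, and the function name is not from any package.

```python
import numpy as np

def structure_from_pattern(pattern, phi):
    """Structure coefficients (item-factor correlations) implied by
    pattern coefficients and the factor correlation matrix phi."""
    return pattern @ phi

# Hypothetical pattern coefficients for two items on two factors.
pattern = np.array([[0.70, 0.10],
                    [0.05, 0.80]])

phi_oblique = np.array([[1.0, 0.4],
                        [0.4, 1.0]])   # factors correlated at .40 (assumed)
phi_orthogonal = np.eye(2)             # uncorrelated factors

structure_oblique = structure_from_pattern(pattern, phi_oblique)
structure_orthogonal = structure_from_pattern(pattern, phi_orthogonal)

print("Oblique structure matrix:\n", np.round(structure_oblique, 3))
print("Orthogonal structure matrix:\n", np.round(structure_orthogonal, 3))
```

Comparing the two printed matrices with the pattern matrix may help you frame your written answer in regression terms.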
4. In class, we brushed off the “Communalities” output, and indeed, you don’t really need to examine these values in
order to get the basic information from a factor analysis. However, a sophisticated understanding of factor analysis
includes knowledge of what these values represent, and they can deepen your understanding of factor analysis in general.
a) Run a regression analysis predicting the first item from the eight other items. Present and interpret the R-squared
value from this equation (don’t worry about whether it’s significant, just think about the R-squared value itself). Do the
same thing with the second item, predicting it from the other eight items.
b) Run a principal axis factor analysis of the data, extract two factors, use a promax rotation, and examine the
“Initial Communality” values of the first two items/variables. Considering your results from 4a…
In what sense does the term “communality” accurately describe these values? In what way does it reflect
something that the items have in “common”? What would a low communality imply about an item? What
would a high communality imply? (Note that this interpretation applies only to the communality estimate
used for principal factor analysis (PFA). There are other types of estimates for other methods of conducting a
factor analysis; however, this is a pretty common approach and conveys the general idea.)
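If you want to check the comparison in 4a and 4b outside SPSS, the sketch below computes each item's R-squared from a regression on all the other items, and also the squared multiple correlation obtained from the inverse of the correlation matrix, 1 - 1/diag(R^-1). It assumes, as is conventional for principal axis factoring, that the initial communality is this squared multiple correlation. The data are simulated and hypothetical, and the function names are illustrative.

```python
import numpy as np

def r_squared_from_regression(data, target):
    """R-squared from regressing one item on all remaining items."""
    y = data[:, target]
    X = np.delete(data, target, axis=1)
    X = np.column_stack([np.ones(len(X)), X])      # add an intercept column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - resid.var() / y.var()

def squared_multiple_correlations(data):
    """Squared multiple correlation of every item with all the others."""
    corr = np.corrcoef(data, rowvar=False)
    return 1.0 - 1.0 / np.diag(np.linalg.inv(corr))

# Hypothetical data: 300 respondents, 9 items sharing one common factor.
rng = np.random.default_rng(1)
factor = rng.normal(size=(300, 1))
items = 0.6 * factor + rng.normal(size=(300, 9))

r2_by_regression = np.array([r_squared_from_regression(items, k) for k in range(9)])
smc = squared_multiple_correlations(items)

print("R-squared by regression:", np.round(r2_by_regression, 3))
print("Squared multiple corr.: ", np.round(smc, 3))
```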
c) Now, let’s think about the “Extraction Communality” values. Imagine that people have scores on each factor
and that their responses to the items are caused by their levels on the underlying factors – that is, the factors
are seen as latent psychological variables that affect peoples’ responses to the measured/observed variables.
Thus, you can imagine a multiple regression equation (for a two-factor structure):
V_ki = a + b_1k(F_1i) + b_2k(F_2i)
In this equation, V_ki is individual i’s response to (observed) variable/item k, F_1i and F_2i are the individual’s
scores/levels on the underlying factors/predictors, and b_1k and b_2k are the slopes predicting the observed
responses on variable k from Factor 1 and Factor 2 respectively. To underscore this, go back to your
notes/handouts from the first third of the semester and find the equation to compute the R-squared value for a
DV on the basis of predictor-outcome correlations and the predictor-predictor correlation in a two-predictor model.
i. Then, (using the PFA with promax rotation and a two-factor extraction) find the relevant values to
enter into this equation for the item “Often I don't succeed to take the time necessary to study for this
course in a thorough way” – i.e., find both predictor/outcome correlations and the predictor-predictor
correlation. Enter these values into the equation and compute the R-squared (please show your
work and be clear about where you got the values from in the output). Present the resulting value,
match it to the Extraction Communality value for the first item, and note the similarity/discrepancy.
ii. Interpret the result, and explain the way in which the term “communality” accurately describes this
value. In what way does it reflect something that the items have in “common”? What would a low
communality imply about an item? What would a high communality imply?
iii. Note that the extraction communality for item 1 is lower than the one for item 5. Why is this? That
is, note those items’ values in the pattern matrix – in what way do these pattern coefficients reveal
why item 1 has a lower extraction communality value than item 5? When answering this question,
please be clear about what the pattern coefficients mean (think regression, as explained above).
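For checking your hand calculation in 4c, the sketch below implements the standard two-predictor formulas for R-squared and for the standardized slopes, written in terms of the two predictor-outcome correlations and the predictor-predictor correlation. Confirm against your notes that this is the equation intended. The input values are hypothetical, not taken from the homework data, and the function names are illustrative.

```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """R-squared for an outcome predicted from two (possibly correlated) predictors."""
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)

def standardized_slopes(r_y1, r_y2, r_12):
    """Standardized regression weights for the same two-predictor model."""
    b1 = (r_y1 - r_y2 * r_12) / (1 - r_12**2)
    b2 = (r_y2 - r_y1 * r_12) / (1 - r_12**2)
    return b1, b2

# Hypothetical values: item-factor correlations of .74 and .38,
# and a factor-factor correlation of .40.
r2 = r_squared_two_predictors(0.74, 0.38, 0.40)
b1, b2 = standardized_slopes(0.74, 0.38, 0.40)

print("R-squared:", round(r2, 3))
print("Standardized slopes:", round(b1, 3), round(b2, 3))
```

With uncorrelated predictors (r_12 = 0) the R-squared formula reduces to the sum of the two squared correlations.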
5. Analyze the data set “Factor Analysis data set HW 3.sav” on the class web site. These are real data taken from a
personality questionnaire. Responses to the items were made on a 5-point Likert scale, so that larger values (e.g., 5)
indicate greater agreement with the item. The big goal – understand the factor structure underlying the set of items.
This information would then help us understand how best to create subscales (if any) from these items.
a) Conduct a factor analysis and describe your interpretation of the most plausible factor structure.
b) Be sure to address the key questions in factor analysis, as discussed in class.
c) Again, please be sure to provide information about specific results relevant to your interpretation.
d) Also, for the purposes of our homework assignment, briefly discuss one or two reasonable alternative
factor structures that merit consideration – what the structures were (e.g., 17 factors, 2 factors, etc.), why you
considered them, and why you rejected them in favor of the one you finally settled on. Hint – looking at the
results, there are several factor structures that I’d consider on the basis of various “rules-of-thumb” that we’ve
discussed.
FYI, items with (R) in the label have been reverse-scored. So for example, item 2's label is "Tends to find fault with
others (R)" - this means that the original item was phrased "Tends to find fault with others" where endorsements (i.e.,
responses of 4 or 5 on the 5-point scale) indicated agreement with the item. However, responses to this item have
been reverse-scored (5 changed to 1, 4 changed to 2, etc.), so in the current data a 4 or 5 indicates the opposite of the
apparent item content: a tendency to not find fault with others.
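The reverse-scoring described above can be sketched in Python as follows. This is only an illustration of the recoding rule for a 1-to-5 scale; the function name is hypothetical, and the items marked (R) in the data set have already been recoded, so you do not need to do this yourself.

```python
def reverse_score(response, scale_min=1, scale_max=5):
    """Reverse-score a Likert response: 5 becomes 1, 4 becomes 2, and so on."""
    return scale_min + scale_max - response

original = [1, 2, 3, 4, 5]
reversed_scores = [reverse_score(x) for x in original]
print(reversed_scores)
```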