Multivariate Stats
Psych 716
Spring 2015
Homework #3
For the following questions, please refer to specific results relevant to your interpretations. You may provide values in the text of your answer, copy and paste specific pieces of output from SPSS, or print the key results (I don’t need everything) and clearly refer to facets of the output. In fact, it wouldn’t hurt to mark on the output, so that I know exactly what you are referring to. Your goal should be to convince me that you have a firm grasp of the issues in factor analysis, and that you know what the results mean and how to interpret them. For any answers that require you to conduct analyses by hand, please show your work. Each question/subquestion is worth 1 pt, unless otherwise noted.
1. Guided Exploratory Factor Analysis: Practice with conducting and interpreting EFA (4 pts)
Download the data set “Ch12BData - EFA.sav” from the class web site.
a) Begin by running a Principal Axis Factor analysis (PFA), using the default “Eigenvalue greater than 1” extraction option. Don’t worry about rotation for now. Considering at least two of the rules-of-thumb that we discussed, what do the eigenvalues and scree plot suggest about the number of factors underlying the variables? Explain your answer.
b) Based on your evaluation of the eigenvalues and scree plot, run the appropriate analysis (i.e., extracting however many factors you think appropriate), using a promax rotation. What do the results indicate about the factor structure?
i. Does this solution seem to achieve simple structure (i.e., a “clean” structure)? Explain: what seems pretty good about the structure, and what seems not as good? (Please refer to at least two specific values to explain/illustrate your thoughts.)
ii. How would you characterize the factors, psychologically speaking? Of course, this is a subjective issue, so be sure to offer clear statistical and logical support for your interpretation.
iii. To what degree are the factors correlated?
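The “eigenvalue greater than 1” rule in part (a) amounts to counting how many eigenvalues of the item correlation matrix exceed 1, and the scree plot is just those same eigenvalues plotted from largest to smallest. A minimal Python/NumPy sketch, assuming you already have a correlation matrix as an array; the function name `kaiser_count` and the toy two-cluster matrix are hypothetical illustrations, not the class data.

```python
import numpy as np


def kaiser_count(corr):
    """Return the eigenvalues of a correlation matrix (largest first) and
    how many of them exceed 1 (the 'eigenvalue greater than 1' rule)."""
    corr = np.asarray(corr, dtype=float)
    # eigvalsh is meant for symmetric matrices; it returns ascending order,
    # so reverse to get the largest-first order used in a scree plot
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]
    return eigenvalues, int(np.sum(eigenvalues > 1.0))


# Hypothetical 6-item correlation matrix: two clusters of three items,
# r = .60 within a cluster and r = 0 between clusters.
block = np.full((3, 3), 0.6)
np.fill_diagonal(block, 1.0)
zeros = np.zeros((3, 3))
corr = np.block([[block, zeros], [zeros, block]])

eigenvalues, n_factors = kaiser_count(corr)
print(np.round(eigenvalues, 2))  # the values a scree plot would display, in order
print(n_factors)
```

For this toy matrix the eigenvalues are 2.2, 2.2, and four values of 0.4 (they always sum to the number of items), so the rule suggests two factors, and the scree plot would show a sharp “elbow” after the second eigenvalue.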
2. Depth in EFA: Understanding “Communalities” (2 pts)
In class, we brushed off the “Communalities” output, and indeed, you don’t really need to examine these values in order to get the basic information from a factor analysis. However, a sophisticated understanding of factor analysis includes understanding these values, and they can deepen your understanding of factor analysis in general. (.5 pts each)
a) Go back to the data set you used above. Run a regression analysis predicting the first item from the other eight items.
Present and interpret the R² value from this equation (don’t worry about whether it’s significant; just think about the R² value itself). Do the same thing with the second item, predicting it from the other eight items.
(1 pt) b) Run a principal axis factor analysis of the data, extract two factors, use a promax rotation, and examine the “Initial Communality” values of the first two items/variables. Considering your results from 2a…
In what sense does the term “communality” accurately describe these values? In what way does it reflect something that the items have in “common”? What would a low communality imply about an item? What would a high communality imply? (Note that this interpretation applies only to the communality estimate used for principal factor analysis (PFA); other methods of conducting a factor analysis use other types of estimates. However, this is a pretty common approach and conveys the general idea.)
c) Now, let’s think about the “Extraction Communality” values. Imagine that people have scores on each factor and that their responses to the items are caused by their levels on the underlying factors; that is, the factors are seen as latent psychological variables that affect people’s responses to the measured/observed variables. Thus, you can imagine a multiple regression model (for a two-factor structure):
$$V_{ki} = a + b_{1k}F_{1i} + b_{2k}F_{2i}$$
In this model, $V_{ki}$ is individual $i$’s response to (observed) variable/item $k$, $F_{1i}$ and $F_{2i}$ are the individual’s scores/levels on the underlying factors/predictors, and $b_{1k}$ and $b_{2k}$ are the slopes predicting the observed responses on variable $k$ from Factor 1 and Factor 2, respectively. To underscore this, go back to your notes/handouts from the first third of the semester and find the equation for computing the R-squared value of a DV from the predictor-outcome correlations and the predictor-predictor correlation in a two-predictor model.
i. Then, using the PFA with promax rotation and a two-factor extraction, find the relevant values to enter into this equation for the item “Often I don't succeed to take the time necessary to study for this course in a thorough way”; i.e., find both predictor-outcome correlations and the predictor-predictor correlation. Enter these values into the equation and compute the R-squared (please show your work and be clear about where in the output you got the values). Present the resulting value, match it to the Extraction Communality value for the first item, and note any similarity/discrepancy.
ii. Interpret the result, and explain the way in which the term “communality” accurately describes this value. In what way does it reflect something that the items have in “common”? What would a low communality imply about an item? What would a high communality imply?
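Both communality columns in this question can be read as R² values. A minimal Python/NumPy sketch, assuming a correlation matrix is available: `smc` computes each variable’s squared multiple correlation with all the other variables through the standard identity SMC_k = 1 − 1/(R⁻¹)_kk (the same R² you get from the part (a) regressions), and `r2_two_predictors` is the textbook two-predictor R² formula built from the predictor-outcome correlations and the predictor-predictor correlation. The function names and the toy matrix are hypothetical; treat this as a check on your hand calculations, and compare the formula against the one in your class notes.

```python
import numpy as np


def smc(corr):
    """Squared multiple correlation of each variable with all the others:
    SMC_k = 1 - 1 / (R^-1)_kk, where R is the correlation matrix."""
    inv = np.linalg.inv(np.asarray(corr, dtype=float))
    return 1.0 - 1.0 / np.diag(inv)


def r2_two_predictors(r1y, r2y, r12):
    """R-squared for an outcome y predicted from two predictors, given the
    predictor-outcome correlations (r1y, r2y) and the predictor-predictor
    correlation (r12)."""
    return (r1y**2 + r2y**2 - 2 * r1y * r2y * r12) / (1 - r12**2)


# Hypothetical 3-variable correlation matrix (NOT the homework data).
corr = np.array([
    [1.0, 0.5, 0.4],
    [0.5, 1.0, 0.3],
    [0.4, 0.3, 1.0],
])

# Variable 0 predicted from variables 1 and 2, computed two ways.
print(smc(corr)[0])
print(r2_two_predictors(corr[0, 1], corr[0, 2], corr[1, 2]))
```

Both lines print the same value (0.29 / 0.91, about .319). When r12 = 0 the formula reduces to r1y² + r2y²; a nonzero r12 removes the overlap between the two predictors, which is why the predictor-predictor correlation matters once the factors are allowed to correlate.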
3. Self-directed EFA: Practice with conducting and interpreting EFA (4 pts)
The first question on this assignment led you, step by step, in addressing key questions in factor analysis. Of course, in real life, you might not have anyone telling you exactly what to do. You’ll need to figure out what analyses to do, what results to examine, and what it all means. With that in mind, this question leaves all that to you, more or less. Let’s see what you come up with!
Analyze the data set “Factor Analysis data set HW 3.sav” on the class web site. These are real data taken from a personality questionnaire. Responses to the items were made on a 5-point Likert scale, so that larger values (e.g., 5) indicate greater agreement with the item. The big goal: understand the factor structure underlying the set of items. This information would then help us understand how best to create subscales (if any) from these items.
So, please conduct a factor analysis and describe your interpretation of the most plausible factor structure.
Also, for the purposes of our homework assignment, briefly discuss one or two reasonable alternative factor structures that merit consideration: what the structures were (e.g., 17 factors, 2 factors, etc.), why you considered them, and why you rejected them in favor of the one you finally settled on. Hint: looking at the results, there are several factor structures that I’d consider on the basis of various “rules-of-thumb” that we’ve discussed.
Hints: a) Be sure to address the key questions in factor analysis, as discussed in class. b) Again, please be sure to provide information about specific results relevant to your interpretation.
FYI, items with (R) in the label have been reverse-scored. So, for example, item 2’s label is “Tends to find fault with others (R)”. This means that the original item was phrased “Tends to find fault with others”, where endorsements (i.e., responses of 4 or 5 on the 5-point scale) indicated agreement with the item. However, responses to this item have been reverse-scored (5 changed to 1, 4 changed to 2, etc.), so in the current data a 4 or 5 indicates the opposite of the apparent item content: a tendency not to find fault with others.
4. The following problem is intended to give you insight into: (3 pts)
More complete integrations of regression-based procedures and variance-decomposition ideas.
The ANOVA output that accompanies regression analysis
The meaning of prediction error
The meaning of explained variance in regression. In an earlier homework, you learned what R² meant in terms of the variate (i.e., the squared correlation between the variate and actual scores on the outcome). The current question gives you another perspective.
Go way back to the small example data set from the “correlation” class handout (the one with IQ and GPA; let me know if you can’t find this). Recall, we’re interested in understanding differences in GPA, and whether those differences are “explained by” (or at least associated with) differences in IQ. (.5 pts each)
a) Enter the IQ and GPA data into SPSS, and compute/report the total Sum of Squares for the GPA scores (you may need to do this by hand, unless you get SPSS to do it, perhaps through an ANOVA procedure). What does this value represent?
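The total Sum of Squares in part (a) is the sum of squared deviations of each GPA from the mean GPA. A minimal Python sketch of the definitional formula and the equivalent “computational” formula often used by hand, SS = Σx² − (Σx)²/n; the GPA values are hypothetical placeholders, not the handout data.

```python
def ss_deviation(scores):
    """Definitional formula: sum of squared deviations from the mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)


def ss_computational(scores):
    """Equivalent hand-calculation formula: sum(x^2) - (sum(x))^2 / n."""
    n = len(scores)
    return sum(x * x for x in scores) - sum(scores) ** 2 / n


gpa = [2.0, 3.0, 3.5, 2.5, 4.0]  # hypothetical GPAs, NOT the handout data
print(ss_deviation(gpa), ss_computational(gpa))
```

Both formulas give 2.5 for these placeholder values (mean 3.0; squared deviations 1, 0, .25, .25, 1).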
b) Run a basic regression analysis predicting GPA from IQ (note that your results should match the regression equation that we obtained in class, way back when; if not, you might want to revisit your data and/or your analysis). Based on the regression parameters you get, compute a predicted GPA score for each individual (either by hand or, better yet, by using the “compute” command in SPSS syntax). Next, compute/report the Sum of Squares for the “predicted GPA” scores (again, you may need to do this by hand, unless you get SPSS to do it).
c) Now, go back to the predicted GPA scores, and compute the difference between individual 1’s predicted GPA score and his/her actual GPA (such differences are often called “residuals”). Then, square this difference and report it.
What does this difference represent, for individual 1?
d) Repeat the previous step (i.e., computing squared residuals) for the other four individuals, then sum the five squared residuals and report this value. (Again, you can do this either by hand or via SPSS “compute” commands that create new “Residual” variables.) What does this sum reflect?
e) Sum your two final values from b and d, and report the result. Compare this sum to the value you computed in a. Then divide the value from b by the value from a, and report the ratio. Finally, compare all four values (a, b, d, and the ratio of b to a) to the ANOVA table and the Model Summary table from your original SPSS regression analysis. Which values do they match?
f) So, interpret R² in terms of explained vs. unexplained (i.e., error) variance.
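The bookkeeping in parts (a) through (f) can be checked with a short script. A minimal Python sketch, assuming ordinary least-squares regression of GPA on IQ; the five IQ/GPA pairs are hypothetical placeholders (substitute the handout values), and the helper names are made up for illustration. It checks that SS_total = SS_regression + SS_residual and that SS_regression / SS_total equals R².

```python
def fit_simple_regression(x, y):
    """Least-squares intercept and slope for predicting y from x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def ss_about_mean(values, mean):
    """Sum of squared deviations of values around a given mean."""
    return sum((v - mean) ** 2 for v in values)


# Hypothetical data, NOT the handout values.
iq = [95, 100, 105, 110, 120]
gpa = [2.4, 2.9, 2.8, 3.4, 3.6]

intercept, slope = fit_simple_regression(iq, gpa)
predicted = [intercept + slope * xi for xi in iq]
mean_gpa = sum(gpa) / len(gpa)

ss_total = ss_about_mean(gpa, mean_gpa)              # part (a)
# part (b): with an intercept, predicted scores have the same mean as actual GPA
ss_regression = ss_about_mean(predicted, mean_gpa)
residuals = [yi - yhat for yi, yhat in zip(gpa, predicted)]  # actual minus predicted
ss_residual = sum(e ** 2 for e in residuals)         # part (d)
r_squared = ss_regression / ss_total                 # part (e)

print(f"SS_total = {ss_total:.4f}")
print(f"SS_regression + SS_residual = {ss_regression + ss_residual:.4f}")
print(f"R^2 = {r_squared:.4f}")
```

Whatever data you plug in, the two SS lines should agree, and R² is the share of the total GPA variability carried by the predicted scores; the remainder is the residual (error) share.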