Homework 1: Correlation and regression
Psych 716, Spring 2015

These problems are intended to do a number of things, including giving you some practical experience and helping you develop your critical thinking about the application and interpretation of regression. In addition, they're intended to underscore some fundamental conceptual facets of regression analysis (and multivariate analyses more generally) that we don't have time to delve into in class.

For all calculations, please show your work. For SPSS analyses, please turn in the relevant output (and only the relevant output). You can copy and paste SPSS output into a Word doc, though you might want to play around with the “paste special” options in Word to get the pasted output to look right.

1. The following problem is intended to solidify your understanding of the significance testing process with correlations, including confidence intervals.

Dr. Cleese is interested in the effect of interpersonal rejection on self-esteem, hypothesizing that rejection will decrease self-esteem. He conducts an experiment by manipulating rejection (30 participants are ostensibly rejected by a confederate after a 5-minute dyadic interaction, and 30 are not rejected) and then measuring participants’ state self-esteem. He finds that the association is r = -.268.

a) Conduct the significance test of this correlation (H0: ρ = 0). Be sure to state the statistical conclusion and make a clear psychological inference about the effect of rejection on self-esteem (i.e., does it seem to lower self-esteem, raise self-esteem, have no effect on self-esteem, etc.?). (1 pt)

b) Compute and interpret a confidence interval around the parameter estimate (the correlation). (1 pt)

2. This is to make sure you’re clear on what correlations tell us and what they do not tell us. It might seem simple and straightforward, but I often see confusions and common misinterpretations.
This question is intended to help you identify whether you’re tempted by one or more common misinterpretations.

Let’s say you’re interested in working memory capacity (WMC), anxiety (Anx), and gender. You run an analysis separately by gender, and you find that the correlation between measures of WMC and Anx is r = .50 among males and r = .00 among females. Putting aside issues of statistical significance, checkmark any of the following that are valid interpretations of these correlations. Note that a statement might be invalid either because it’s simply wrong or because we don’t have the relevant information. Also note, to avoid unnecessary complications, let’s assume that WMC and Anx are normally distributed and that NMale = NFemale. (2 pts)

____ WMC is more strongly related to Anx among males than among females.
____ Males with high WMC are likely to have higher levels of anxiety than females with high WMC.
____ In general, males have higher levels of WMC and Anx than females (i.e., mean levels are related to sex).
____ If we know a male’s WMC, we have a .50 probability of being able to predict his Anxiety level.
____ The difference between females’ mean WMC score and mean Anx score is smaller than the difference for males.
____ There is less variability in females’ scores than in males’ scores.
____ If a male is above average on WMC, then he’s about 75% likely to also be above average on Anx.
____ Males who have relatively low levels of WMC tend to have relatively low levels of Anx.

Now let’s say that you code Gender so that 0 = Male and 1 = Female, then you correlate Gender with WMC, finding r = .40. Again, putting aside issues of statistical significance, checkmark any of the following that are valid interpretations of this correlation. (2 pts)

____ For females, the correlation is .40.
____ The correlation between WMC and Anx among females is .40 larger than it is among males.
____ Gender explains 40% of the variance in WMC.
____ Females have a stronger connection between WMC and Anx (as compared to males).
____ Males tend to have higher WMC scores than females.
____ Females tend to have higher WMC scores than males.
____ Females tend to have higher Anx scores than males.
____ Females’ mean level of WMC is .40 points greater than males’ mean level of WMC.

3. This question is intended to solidify your understanding of the implications that psychometric quality has for statistical analyses and research conclusions.

A local school board is interested in enhancing students’ academic achievement and is considering two possible programs to implement. One is based on a theory that self-esteem affects academic achievement: students who feel good about themselves will perform better in school. Therefore, one program would be designed to increase students’ self-esteem, which, in turn, could have beneficial effects on their academic achievement. The second potential program is based on a theory that academic motivation affects academic achievement: students who are properly motivated will perform better in school. This program would be designed to increase students’ academic motivation, which could have beneficial effects on their achievement. Unfortunately, the school district has enough money to fund only one program, and the board wants to fund the program that might make the biggest impact on the students’ achievement.

Your advisor is asked to determine which program might be most effective. Your advisor recruits a sample of students and measures all three constructs: academic achievement, self-esteem, and academic motivation. Keeping things simple, your advisor computes two correlations: (1) the correlation between self-esteem and academic achievement and (2) the correlation between academic motivation and academic achievement.
The school board will fund the program for the variable that is most strongly associated with achievement, based on the assumption that it will have the larger impact on achievement. Therefore, if self-esteem is more strongly correlated with achievement, then the school board will fund the self-esteem program. However, if motivation is more strongly associated with achievement, then the school board will fund the motivation program.

Your advisor finds that the correlation between self-esteem and achievement (r = .33) is somewhat higher than the correlation between motivation and achievement (r = .26). Consequently, the school board tentatively decides to fund the self-esteem program. However, you ask your advisor about the reliability of the three measures. He/she tells you that the measure of achievement had a reliability of .86, the measure of self-esteem had an estimated reliability of .80, and the measure of motivation had an estimated reliability of .40.

Considering this psychometric information, what do you think of the school board’s decision? Considering the observed correlations, the reliabilities, and anything else you can glean or compute on the basis of this psychometric information (hint hint), which program would you recommend and why?

4. This problem is intended to:
- give you experience using SPSS to run multiple regression analysis
- give you experience interpreting multiple regression output
- help you understand the “variate” in a multivariate procedure
- help you understand where R (the “multiple correlation”) and R² (the squared multiple correlation) come from in regression analysis.

Go to the class web site and get the “Functionality” SPSS data file. These data are from a study of “functionality after surgery.” Researchers were interested in psychosocial factors associated with patients’ recovery after heart surgery.
They measured patients’: a) SES (higher scores reflect higher socioeconomic status), b) age, c) pre-surgery optimism (higher scores reflect greater belief that recovery will be swift and complete), and d) physical functionality (a medically-assessed judgment by a physician blind to the study’s purposes, made 1 year after surgery).

Please answer the following questions (1 pt each):

a) Conduct a multiple regression analysis, entering SES, age, and optimism into a single equation predicting functionality. Interpret the unstandardized regression coefficient for age (forgetting about its statistical significance, for a moment). Please be precise here; you might want to refer back to a class handout on interpreting regression slopes.

b) I might interpret the results of the multiple regression as suggesting that, if I want to predict someone’s functionality and I know that person’s optimism, then I don’t gain anything by also knowing their age or SES. Please comment on the validity of this interpretation, referring to specific elements of the results to support your comment.

c) Create the scores on the variate for each individual. As described early in your textbook, the “variate” is the variable that’s obtained when each individual’s predictor scores are entered into the regression equation. You can either use the menu-driven strategy for computing a new variable (Transform > Compute Variable…), or you can use the “COMPUTE” statement in a syntax window. E.g.:

    COMPUTE variate = 3.773 + -.073*Age + .236*ses + .175*optimism.
    EXECUTE.

Note that this is simply a statement of the regression equation, with the regression coefficients (intercept and unstandardized slopes) taken from the SPSS output. Check the SPSS “data view” to make sure that the new variable has been added to the data set. Next, compute the correlation between the variate and the criterion variable, then square it. Conceptually speaking, what does the correlation represent? The association between what and what?
Check back on the original SPSS regression output for these data. Does your correlation (and its square) look familiar? What are they?

5. The following questions are intended to give you practice thinking critically about the application and interpretation of hierarchical regression, using a real-life example of regression analyses and their (mis)interpretation.

This table is from Geiser & Studley’s (2001) study of the SAT-I as a predictor of college academic performance. The researchers obtained data from more than 80,000 students in the University of California system, focusing on 5 variables: a) undergrad college performance, as indicated by GPA (UCGPA), b) SAT-I, the traditional SAT “reasoning” test (which was the main test being debated), c) SAT-II, which is a different set of SAT tests (not the traditional SAT that’s being debated), d) high school GPA, and e) family SES, as reflected in family income and parental education. Note that all the slopes in EQ1 are significant, but the SAT-I slope in EQ2 is nonsignificant.

These results were cited as partial justification for Wake Forest’s move away from the SAT-I as a required part of the admissions process. Specifically, the results were interpreted as evidence that the SAT-I is predictively biased because its predictive power is too substantially confounded with SES.

a) Explain why the “predictively biased” interpretation is tempting, based on Table 6. That is, what is it about the pattern of results that might lead one to make the interpretation? (1 pt)

b) Explain why the “predictively biased” interpretation is actually not justified on the basis of the results in the table above. That is, based on the two models/equations that were tested, why do these results NOT clearly justify the “predictive bias” interpretation? Considering the models tested, what is a “non-bias” alternative explanation for the pattern of results that might seem to indicate bias? (1 pt)

c) To correctly and directly address the hypothesis that the SAT-I’s predictive power is too entangled with SES, what two-step hierarchical regression analysis would you recommend, and why? Explain what results of your proposed analyses would support the “predictively biased” interpretation. (1 pt)
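
For reference on part 5c, the mechanics of a two-step hierarchical regression in SPSS syntax might look something like the sketch below. This is only an illustration under stated assumptions: the variable names (outcome, block1_a, block1_b, block2_a) are hypothetical placeholders and do not come from any data file for this class. Deciding which predictors belong in which step, and why, is the substance of the question.

```spss
* Hypothetical variable names: outcome, block1_a, block1_b, block2_a.
* Step 1 enters the first block of predictors; step 2 adds the second block.
* CHANGE requests the R-square change and its F test for each step.
REGRESSION
  /STATISTICS COEFF R ANOVA CHANGE
  /DEPENDENT outcome
  /METHOD=ENTER block1_a block1_b
  /METHOD=ENTER block2_a.
```

With the CHANGE keyword, the Model Summary table should report “R Square Change” and “Sig. F Change” for each step, which is the information a hierarchical analysis turns on.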