716 HW 1 Sp 2015 v1

advertisement
Homework 1 – Correlation and regression
Psych 716 , Spring, 2015
These problems are intended to do a number of things, including give you some practical experience and to help develop
your critical thinking about the application and interpretation of regression. In addition, it’s intended to underscore
some fundamental conceptual facets of regression analysis (and multivariate analyses more generally) that we don’t have
time to delve into in class. For all calculations, please show your work; and for SPSS analyses, please turn in the relevant
output (please include only the relevant output) – you can copy and paste SPSS output into a Word doc, though you
might want to play around with the “paste special” options in Word to get the pasted output to look right.
1. The following problem is intended to solidify your understanding of the significance testing process with correlations,
including confidence intervals.
Dr. Cleese is interested in the effect of interpersonal rejection on self-esteem, hypothesizing that rejection will decrease selfesteem. He conducts an experiment by manipulating rejection (30 participants are ostensibly rejected by a confederate after a 5minute dyadic interaction, and 30 are not rejected) and then measuring participants’ state self-esteem. He finds that the
association is r = -.268.
a)
Conduct the significance test of this correlation (H0: ρ = 0), be sure to state the statistical conclusion and make a clear
psychological inference about the effect of rejection on self-esteem (i.e., does it seem to lower self-esteem, raise selfesteem, have no effect on self-esteem, etc.?). (1 pt)
b) Compute and interpret a confidence interval around the parameter estimate (correlation) (1 pt)
2. This is to make sure you’re clear on what correlations tell us and what they do not tell us. It might seem simple and
straightforward, but I often see confusions and common misinterpretations. This questions it intended to help you
identify whether you’re tempted by one or more common misinterpretations (
Let’s say you’re interested in working memory capacity, anxiety, and gender. You run an analysis separately by gender, and you
find that the correlation between measures of WMC and Anx is r = .50 among males and r = .00 among females. Putting aside
issues of statistical significance, checkmark any of the following that are valid interpretations of these correlations. Note that a
statement might be invalid either because it’s simply wrong or because we don’t have the relevant information. Also note, to
avoid unnecessary complications, let’s assume that WMC and Anx are normally distributed and that NMale = NFemale. (2 pts)
____ WMC is more strongly related to Anx among males than among females.
____ Males with high WMC are likely to have a higher levels of anxiety than females with high WMC.
____ In general, males have higher levels of WMC and Anx than females (i.e., mean levels are related to sex)
____ If we know a male’s WMC, we have a .50 probably of being able to predict his Anxiety level.
____ The difference between females’ mean WMC score and mean Anx score is smaller than the difference for males
____ There is less variability in females’ scores than in males’ scores
____ If a male is above average on WMC, then he’s about 75% likely to also be above average on Anx
____ Males who have relatively low levels of WMC tend to have relatively low levels of Anx
Now let’s say that you code Gender so that 0 = Male and 1= Female, then you correlate Gender with WMC, finding r = .40.
Again, putting aside issues of statistical significance, Putting aside issues of statistical significance, checkmark any of the
following that are valid interpretations of these correlations. (2 pts)
____ For females, the correlation is .40.
____ The correlation between WMC and Anx among females is .40. larger than it is among males
____ Gender explains 40% of the variance in WMC
____ Females have a stronger connection between WMC and Anx (as compared to males)
____ Males tend to have higher WMC scores than females
____ Females tend to have higher WMC scores than males
____ Females tend to have higher Anx scores than males
____ Females’ mean level of WMC is .40 points greater than males’ mean level of WMC
3. This question is intended to solidify your understanding of the implications that psychometric quality has for statistical
analyses and research conclusions.
A local school board is interested in enhancing students’ academic achievement, and considering two possible programs to
implement. One is based on a theory that self-esteem affects academic achievement— students who feel good about themselves
will perform better in school. Therefore, one program would be designed to increase students’ self-esteem, which, in turn, could
have beneficial effects on their academic achievement. The second potential program is based on a theory that academic
motivation affects academic achievement—students who are properly motivated will perform better in school. This program
would be designed to increase students’ academic motivation, which could have beneficial effects on their achievement.
Unfortunately, the school district has enough money to fund only one program, and the board wants to fund the program that
might make the biggest impact on the students’ achievement. Your advisor is asked to determine which program might be most
effective.
Your advisor recruits a sample of students and measure all three constructs—academic achievement, self-esteem, and academic
motivation. Keeping things simple, your advisor computes two correlations: (1) the correlation between self-esteem and
academic achievement and (2) the correlation between academic motivation and academic achievement. The school board will
fund the program for the variable that is most strongly associated with achievement, based on the assumption that it will have the
larger impact on achievement. Therefore, if self-esteem is more strongly correlated with achievement, then the school board will
fund the self-esteem program. However, if motivation is more strongly associated with achievement, then the school board will
fund the motivation program.
Your advisor finds that the correlation between self-esteem and achievement (r = .33) is somewhat higher than the correlation
between motivation and achievement (r = .26). Consequently, the school board begins to decide to fund the self-esteem program.
However, you ask your advisor about the reliability of the three measures. He/she tells you that the measure of achievement had
a reliability of .86, the measure of self-esteem had an estimated reliability of .80, and the measure of motivation had an estimated
reliability of .40.
Considering this psychometric information, what do you think of the school board’s decision? Considering the observed
correlations, the reliabilities, and anything else you can glean or compute on the basis of this psychometric information (hint
hint), which program would you recommend and why?
4. This problem is intended to
 give you experience using SPSS to run multiple regression analysis
 give you experience interpreting multiple regression output
 help you understand the “variate” in a multivariate procedure
 help you understand where R (the “multiple correlation”) and R 2 (the squared multiple correlation) come from
in regression analysis.
Go to the class web site and get the “Functionality” SPSS data file. These data are from a study of “functionality after surgery.”
Researchers were interested in psychosocial factors associated with patients’ recovery after heart surgery. They measured
patients’: a) SES (higher scores reflect higher socioeconomic status), b) age, c) pre-surgery optimism (higher scores reflect
greater belief that recovery will be swift and complete), and d) physical functionality (a medically-assessed judgment by a
physician blind to the study’s purposes, made 1 year after surgery). Please answer the following questions (1 pt each):
a)
Conduct a multiple regression analyses, entering SES, age, and optimism into a single equation predicting functionality.
Interpret the unstandardized regression coefficient for age (forgetting about its statistical significance, for a moment) –
please be precise here, you might want to refer back to a class handout on interpreting regression slopes.
b) I might interpret the results of the multiple regression as suggesting that – if I want to predict someone’s functionality
and I know someone’s optimism, then I don’t gain anything by also knowing their age or SES. Please comment on the
validity of this interpretation, referring to specific elements of the results to support your comment.
c)
Create the scores on the variate for each individual. As described early in your textbook, the “variate” is the variable
that’s obtained when each individual’s predictor scores are entered into the regression equation. You can either use the
menu-driven strategy for computing a new variable (Transform  Compute…), or you can use the “COMPUTE”
statement in a syntax window. Eg….
COMPUTE variate = 3.773 + -.073*Age + .236*ses + .175*optimism .
execute.
Note that this is simply a statement of the regression equation, getting the regression coefficients (intercept and
unstandardized slopes) from SPSS output. Check the SPSS “data view” to make sure that the new variable has been
added to the data set. Next, compute the correlation between the variate and the criterion variable, then square it.

Conceptually speaking, what does the correlation represent? The association between what and what?

Check back on the original SPSS regression output for these data. Does your correlation (and its square) look
familiar? What are they?
5. The following questions are intended to give you practice thinking critically about the application and interpretation
of hierarchical regression, using a real-life example of regression analyses and their (mis)interpretation.
This Table is from Geiser & Studley’s (2001) study of
the SAT-I as a predictor of college academic
performance. The researchers obtained data from
>80,000 students in the U of California system,
focusing on 4 variables: a) undergrad college
performance, as indicated by GPA (UCGPA), b) SATI, the traditional SAT “reasoning” test (which was the
main test being debated), c) SAT-II, which is a
different set of SAT tests (not the traditional SAT
that’s being debated), d) High school GPA, and e)
family SES, as reflected in family income and parental
education. Note that all the slopes in EQ1 are
significant, but the SAT-I slope in EQ2 is nonsignificant.
These results were cited as partial justification for
Wake Forest’s move away from the SAT-I as a required part of the admissions process. Specifically, the results were interpreted
as evidence that the SAT-I is predictively biased because its predictive power is too substantially confounded with SES.
a)
Explain why the “predictively biased” interpretation is tempting, based on Table 6. That is, what is it about the pattern
of results that might lead one to make the interpretation? (1 pt)
b) Explain why the “predictively biased” interpretation is actually not justified on the basis of the results in the Table
above. That is, based on the two models/equations that were tested, why do these results NOT clearly justify the
“predictive bias” interpretation? Considering the models tested, what is a “non-bias” alternative explanation for the
pattern of results that might seem to indicate bias? (1 pt)
c)
To address the hypothesis that the SAT-I’s predictive power is too entangled with SES correctly and directly, what twostep hierarchical regression analysis would you recommend, and why? Explain what results of your proposed analyses
would support the “predictively biased” interpretation. (1 pt)
Download