Psychology 522/622
Lab Lecture 2
Multiple Linear Regression
We continue our exploration of multiple linear regression analysis.
The goal of this lab is to illustrate basic multiple regression. The simplest multiple
regression is a 2 predictor multiple regression problem. Let’s work through an example of
this situation. We will run a couple of multiple regression models and orient ourselves to
the relevant output and correct interpretations of the parameters.
DATAFILE: injury.sav
This file contains the data for 100 women on the following variables:
injury: Overall injury index based on the records kept by the participants
gluts: A measure of strength of the muscles in the upper part of the back of the leg and the buttocks
age: The age of the women when the study began.
Injury will be our dependent variable for all of the models.
INITIAL SCATTERPLOTS
First, create scatter plots for all combinations of the variables: Injury by gluts, Injury by
age, Age by gluts. We can do this by creating a scatterplot matrix.
GraphsScatterMatrixDefineMove Injury, gluts, and age to the Matrix Variables
boxOK.
GRAPH
/SCATTERPLOT(MATRIX)=injury gluts age
/MISSING=LISTWISE .
We’re checking for anything fishy (non-linearity, heteroscedasticity, outliers).
[Scatterplot matrix of injury, gluts, and age appears here.]
I don’t see anything fishy. Do you?
What can you tell by looking at the scatterplots here? Do you have any inkling of the
direction/magnitude of the linear associations between these variables? My guess is that
injury and age have a positive relationship, gluts and age likely have a very weak positive
relationship, and gluts and injury have a negative relationship.
Making a 3-D Interactive Scatterplot!!!
GraphsInteractiveScatterplotClick on the 2-D coordinate box and select 3-D
coordinateMove injury to the Y axis, age to the x axis, and gluts to the diagonalOK.
Double click on the plot in the output and play with your data in 3-D!
CORRELATION ANALYSIS
Let’s create a correlation matrix for these three variables. This is a really easy analysis, so
test yourself and see if you can remember how to do it using the dropdown commands.
CORRELATIONS
/VARIABLES=injury gluts age
/PRINT=TWOTAIL NOSIG
/MISSING=PAIRWISE .
Correlations

                             injury     gluts      age
injury  Pearson Correlation    1        -.393**    .290**
        Sig. (2-tailed)        .         .000      .003
        N                    100        100       100
gluts   Pearson Correlation   -.393**    1         .158
        Sig. (2-tailed)        .000      .         .115
        N                    100        100       100
age     Pearson Correlation    .290**    .158      1
        Sig. (2-tailed)        .003      .115      .
        N                    100        100       100
**. Correlation is significant at the 0.01 level (2-tailed).
Were our predictions based on the bivariate scatterplots correct? Pretty much! Let’s
focus on the gluts × injury relationship. Note that in the scatterplot the majority of the
datapoints are just swirling around the center, but we’ve got maybe 3 hanging out toward
the bottom right corner and maybe 3 hanging out in the top left. Although they don’t
appear to be true “outliers” they are influential data points. If we dropped them from our
dataset, it would definitely affect (i.e., weaken) the correlation between these two
variables.
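To see how much a handful of influential points can matter, here is a small illustrative sketch. It uses simulated data (not injury.sav), so the exact numbers are hypothetical; the point is that a few extreme corner points can noticeably strengthen a correlation:

```python
import numpy as np

rng = np.random.default_rng(0)

# A weakly negative cloud of 94 points...
x = rng.normal(31, 5, 94)
y = -1.0 * x + rng.normal(146, 45, 94)

# ...plus 6 extreme points in the bottom-right and top-left corners,
# which pull the correlation toward a stronger negative value.
x_full = np.concatenate([x, [45, 46, 47, 16, 15, 14]])
y_full = np.concatenate([y, [40, 35, 30, 250, 255, 260]])

r_without = np.corrcoef(x, y)[0, 1]
r_with = np.corrcoef(x_full, y_full)[0, 1]
print(f"r without extreme points: {r_without:.3f}")
print(f"r with extreme points:    {r_with:.3f}")
```

Dropping those six points weakens (here, shrinks toward zero) the correlation, just as described above.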
SIMPLE REGRESSION OF INJURY ON GLUTS
Let’s quickly run a simple regression predicting injury from gluts. In this simple
regression, injury is the dependent variable and gluts is the independent variable.
AnalyzeRegressionLinearMove injury to the DV box, move gluts to the IV box
StatisticsSelect Descriptive (Estimates and Model Fit should already be selected)
PlotsMove ZResid to the Y axis, move ZPred to the X axis, Select Histogram, Select
Normal Probability Plot (this is the P-P plot)
SaveUnder Predicted Values, select Standardized and Unstandardized
Click OK.
REGRESSION
/DESCRIPTIVES CORR
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT injury
/METHOD=ENTER gluts
/SCATTERPLOT=(*ZRESID ,*ZPRED )
/RESIDUALS HIST(ZRESID) NORM(ZRESID) .
Regression

Variables Entered/Removed(b)

Model   Variables Entered   Variables Removed   Method
1       gluts(a)            .                   Enter
a. All requested variables entered.
b. Dependent Variable: injury
Model Summary(b)

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .393(a)    .154        .146                48.243
a. Predictors: (Constant), gluts
b. Dependent Variable: injury
When we have only one predictor, what do we know to be true about R (i.e., Multiple R)
and r (i.e., the Pearson correlation)? They’re the same, right? So what’s different here,
and why?? Multiple R is positive and r(gluts,injury) is negative. Why? Because R is
always positive. Why? Well, R=rY,Y’. Would you expect the correlation between the
actual values of Y and the predicted values of Y to be negative or positive? Positive, of
course. Remember, when we’re talking about R we’re not talking about the relationship
between X and Y (in this case, gluts and injury). We’re talking about the relationship
between the actual and predicted values of Y (injury). Now, if it turns out that our
predictor(s) do a really terrible job of predicting the values of our DV, R will just be much
closer to zero.
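This can be checked numerically. The sketch below uses simulated data (assuming nothing about injury.sav): it fits a one-predictor regression with a clearly negative slope and confirms that R, computed as the correlation between the actual and predicted values of Y, comes out positive and equal to |r|:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(0, 1, 200)
y = -2.0 * x + rng.normal(0, 1, 200)   # a clearly negative X-Y relationship

# Ordinary least squares fit: y-hat = b0 + b1*x
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

r = np.corrcoef(x, y)[0, 1]          # Pearson r (negative here)
R = np.corrcoef(y, y_hat)[0, 1]      # Multiple R = r(Y, Y-hat)

print(f"r = {r:.3f}")   # negative
print(f"R = {R:.3f}")   # positive, equal to |r|
```

With one predictor, the predicted values are just a linear function of X, so correlating Y with Ŷ can only flip the sign, never change the magnitude.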
ANOVA(b)

Model           Sum of Squares   df   Mean Square   F        Sig.
1  Regression     41625.493       1    41625.493    17.885   .000(a)
   Residual      228088.5        98     2327.434
   Total         269714.0        99
a. Predictors: (Constant), gluts
b. Dependent Variable: injury
Coefficients(a)

Model           B         Std. Error   Beta    t        Sig.
1  (Constant)   255.994   26.499                9.660    .000
   gluts         -3.545     .838      -.393   -4.229    .000
a. Dependent Variable: injury

Let’s write out this regression equation for practice:
Ŷ = 255.99 – 3.55(gluts)
Residuals Statistics(a)

                       Minimum    Maximum   Mean     Std. Deviation   N
Predicted Value          89.36     202.81   145.80    20.505         100
Residual               -115.356    125.825    .000    47.999         100
Std. Predicted Value     -2.753      2.780    .000     1.000         100
Std. Residual            -2.391      2.608    .000      .995         100
a. Dependent Variable: injury
We want these residual values to be as small as possible: the smaller the residuals, the better our data fit the regression line.
Charts

[Histogram of the regression standardized residuals (Dependent Variable: injury; Mean = -1.42E-16, Std. Dev. = 0.995, N = 100) and Normal P-P plot of the regression standardized residuals appear here.]
Our histogram looks pretty good: our residuals are approximately normally distributed.
What about the P-P plot? Although there are a few hiccups, the data approximate a line.
Scatterplot

[Scatterplot of the regression standardized residuals against the regression standardized predicted values (Dependent Variable: injury) appears here.]
What about the scatterplot of our standardized residual values? We don’t want to see any
shape here. Blobs = good. And for the most part, that’s what we see.
So, on the basis of these three graphs, we’ll decide that we’re comfortable that we
haven’t seriously violated any of our assumptions.
STANDARD MULTIPLE REGRESSION OF INJURY ON
GLUTS AND AGE
In this regression, injury again is the dependent variable and now both gluts and age are
predictors. What we’ll run here is what T&F refer to as “standard multiple regression.”
AnalyzeRegressionLinearMove injury to the dependent variable box, Move gluts
and age to the IV box
Click Statistics  select Collinearity diagnostics (you can also select Part and partial
correlations if you’re interested in seeing those)
We’ll want the same information that we just asked for when running the simple
regression, so everything should already be set up for us.
REGRESSION
/DESCRIPTIVES CORR
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA TOL
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT injury
/METHOD=ENTER gluts age
/SCATTERPLOT=(*ZRESID ,*ZPRED )
/RESIDUALS HIST(ZRESID) NORM(ZRESID) .
Regression
Descriptive Statistics

          Mean     Std. Deviation   N
injury    145.80    52.196          100
age        67.32     3.235          100
gluts      31.08     5.783          100
Correlations

                              injury    age      gluts
Pearson Correlation  injury    1.000     .290    -.393
                     age        .290    1.000     .158
                     gluts     -.393     .158    1.000
Sig. (1-tailed)      injury     .        .002     .000
                     age        .002     .        .058
                     gluts      .000     .058     .
N                    injury    100      100      100
                     age       100      100      100
                     gluts     100      100      100
Variables Entered/Removed(b)

Model   Variables Entered   Variables Removed   Method
1       age, gluts(a)       .                   Enter
a. All requested variables entered.
b. Dependent Variable: injury
Model Summary(b)

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .531(a)    .282        .267                44.685
a. Predictors: (Constant), age, gluts
b. Dependent Variable: injury
R=.53 and R2=.28, indicating that 28% of the variance in injury is accounted for by age
and glut strength.
ANOVA(b)

Model           Sum of Squares   df   Mean Square   F        Sig.
1  Regression     76026.610       2    38013.305    19.037   .000(a)
   Residual      193687.4        97     1996.777
   Total         269714.0        99
a. Predictors: (Constant), age, gluts
b. Dependent Variable: injury
This is the F ratio that we want to report in our results section. We’d report it just the way
we would for an ANOVA, F(2, 97) = 19.04, p < .01, R2 = .28.
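As a quick arithmetic check on the numbers we report, the F ratio is just the ratio of mean squares, and R² is SS(regression) / SS(total). A short sketch, plugging in the sums of squares from the ANOVA table above:

```python
# Values taken from the ANOVA summary table above.
ss_regression = 76026.610
ss_residual = 193687.4
ss_total = ss_regression + ss_residual   # 269714.0 (within rounding)
df1, df2 = 2, 97

ms_regression = ss_regression / df1   # 38013.305
ms_residual = ss_residual / df2       # ~1996.777
f_ratio = ms_regression / ms_residual
r_squared = ss_regression / ss_total

print(f"F({df1}, {df2}) = {f_ratio:.3f}")   # ~19.037
print(f"R squared = {r_squared:.3f}")       # ~.282
```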
Coefficients(a)

Model            B          Std. Error   Beta    t        Sig.   Tolerance   VIF
1  (Constant)   -120.867    94.054               -1.285    .202
   gluts          -4.063      .786      -.450   -5.166    .000    .975      1.026
   age             5.837     1.406       .362    4.151    .000    .975      1.026
a. Dependent Variable: injury
It looks like both of our IVs are significant predictors of injury.
How do we interpret these partial regression coefficients?
Gluts: Holding age constant, for every unit increase in glut strength there is an expected
decrease of 4.06 in injury. Colloquially stated: controlling for age, higher levels of glut
strength are related to fewer injuries.
Age: Holding gluts constant, for every unit (i.e., year) increase in age, injury is expected
to increase by 5.84 units. Colloquially stated: controlling for glut strength, injuries
increase with age.
Also, our collinearity statistics look good. Remember, we want high tolerance values.
Let’s write out the regression equation:
Ŷ = -120.87 - 4.06(gluts) + 5.84(age)
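One way to see what "holding constant" means is to plug values into the equation: change one predictor by a unit while the other stays fixed, and the prediction changes by exactly that predictor's coefficient. A small sketch (the predictor values are hypothetical; the coefficients come from the equation above):

```python
def predict_injury(gluts, age):
    """Prediction equation from the two-predictor model above."""
    return -120.87 - 4.06 * gluts + 5.84 * age

# Two hypothetical women of the same age, differing by one unit of glut strength:
same_age_diff = predict_injury(gluts=31, age=67) - predict_injury(gluts=30, age=67)
print(same_age_diff)    # ~ -4.06, the partial coefficient for gluts

# Two hypothetical women with the same glut strength, one year apart in age:
same_gluts_diff = predict_injury(gluts=31, age=68) - predict_injury(gluts=31, age=67)
print(same_gluts_diff)  # ~ 5.84, the partial coefficient for age
```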
Residuals Statistics(a)

                       Minimum    Maximum   Mean     Std. Deviation   N
Predicted Value          55.91     203.42   145.80    27.712         100
Residual               -105.107     97.433    .000    44.232         100
Std. Predicted Value     -3.244      2.079    .000     1.000         100
Std. Residual            -2.352      2.180    .000      .990         100
a. Dependent Variable: injury
A rule of thumb is to have these standardized residuals not exceed |2|. They look OK
here.
Charts

[Histogram of the regression standardized residuals (Dependent Variable: injury; Mean = -5.63E-16, Std. Dev. = 0.99, N = 100) and Normal P-P plot of the regression standardized residuals appear here.]
Hmm, the histogram isn’t looking quite as good here, but it still isn’t too scary.
P-P plot: again, we’re seeing a few hiccups, especially at the very top of the graph. Let’s
look at the scatterplot of the residuals to see if it gives us any cause for concern.
Scatterplot

[Scatterplot of the regression standardized residuals against the regression standardized predicted values (Dependent Variable: injury) appears here.]
For the most part, we’re seeing the desired blob. The point that’s sort of hanging out on
its own to the left of the graph is potentially problematic.
NOTE: for the sake of time, we won’t go through all of the diagnostics, but recall from
last week that you could ask for Mahalanobis distance, Cook’s distance, and leverage
values for additional information about potential outliers. Also, since this is a regression
with two predictors, you could look at the data in 3-D like we discussed earlier!!
SEQUENTIAL (HIERARCHICAL) REGRESSION
This next model conducts a hierarchical regression analysis. Here gluts will be entered as
a predictor variable in the first step and age in the second step.
AnalyzeRegressionLinearMove injury to the DV box
Move gluts to the IV boxClick Next
Move Age into the IV box in Block 2 of 2
Click Statisticsselect R squared change
Everything else we want should already be selected from the previous analyses we ran, so
Click OK.
Regression
Variables Entered/Removed(b)

Model   Variables Entered   Variables Removed   Method
1       gluts(a)            .                   Enter
2       age(a)              .                   Enter
a. All requested variables entered.
b. Dependent Variable: injury
Model Summary

Model  R        R Square  Adj. R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .393(a)  .154      .146           48.243                      .154             17.885    1    98   .000
2      .531(b)  .282      .267           44.685                      .128             17.228    1    97   .000
a. Predictors: (Constant), gluts
b. Predictors: (Constant), gluts, age
Already we see that our Model Summary table looks different than it did before. We see
two rows, Model 1 and Model 2. Model 1 refers to “Block 1,” in this case, just gluts
predicting injury. Model 2 refers to “Blocks 1 & 2,” in this case, gluts and age predicting
injury. So what kind of information are we getting here?
MODEL 1
This is the same information we got when we ran a simple regression predicting injury
from gluts. The model summary table for that analysis is below so that you can compare
the two. Nothing new and exciting in the first 5 columns.
Model Summary(b)

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .393(a)    .154        .146                48.243
a. Predictors: (Constant), gluts
b. Dependent Variable: injury
In examining the Change Statistics box, we’ll get the same information that’s provided to
us in the ANOVA summary table (reproduced below) for the simple regression predicting
injury from gluts.
Model Summary

Model  R        R Square  Adj. R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .393(a)  .154      .146           48.243                      .154             17.885    1    98   .000
2      .531(b)  .282      .267           44.685                      .128             17.228    1    97   .000
a. Predictors: (Constant), gluts
b. Predictors: (Constant), gluts, age
ANOVA(b)

Model           Sum of Squares   df   Mean Square   F        Sig.
1  Regression     41625.493       1    41625.493    17.885   .000(a)
   Residual      228088.5        98     2327.434
   Total         269714.0        99
a. Predictors: (Constant), gluts
b. Dependent Variable: injury
In the change statistics box, we see columns titled R Square Change, F Change, df1, df2,
and Sig. F Change.
In this case, R Square Change = R Square. Why? Because here SPSS is comparing our
Model 1 to a model with no predictors, where R² = 0.
F Change = the F ratio we got when running our simple regression.
df1 = df(regression), and df2 = df(residual).
Sig. F Change = the p-value for the F ratio we got when running our simple regression.
MODEL 2
Below you see the model summary table for our hierarchical regression, followed by the
model summary table for the standard multiple regression predicting injury from gluts
and age.
Model Summary

Model  R        R Square  Adj. R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .393(a)  .154      .146           48.243                      .154             17.885    1    98   .000
2      .531(b)  .282      .267           44.685                      .128             17.228    1    97   .000
a. Predictors: (Constant), gluts
b. Predictors: (Constant), gluts, age
Model Summary(b)

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .531(a)    .282        .267                44.685
a. Predictors: (Constant), age, gluts
b. Dependent Variable: injury
R = R for standard multiple regression; R² = R² for standard multiple regression.
So far, we haven’t gotten any new information. The cool stuff comes in the change
statistics box.
R² change = Model 2 R² – Model 1 R². So in this case, ΔR² = .282 – .154 = .128. We
easily could have calculated this ourselves, but SPSS gives it to us.
F change: this is an F ratio that allows you to test whether the change in R² between
Models 1 and 2 is significant. It is NOT the difference in the F values between Models 1
and 2 (i.e., you don’t calculate it via simple subtraction as you do with R² change).
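Instead, F change is computed from the R² values and the residual degrees of freedom of the larger model. A sketch of that calculation, using the rounded values from the Model Summary table above (so the result matches SPSS's 17.228 only approximately):

```python
# F change = (delta R^2 / df1) / ((1 - R^2 of the larger model) / df2)
r2_model1 = 0.154
r2_model2 = 0.282
df1 = 1            # number of predictors added in the new block
df2 = 97           # residual df of the larger model

delta_r2 = r2_model2 - r2_model1
f_change = (delta_r2 / df1) / ((1 - r2_model2) / df2)
print(f"F change({df1}, {df2}) = {f_change:.2f}")   # ~17.3 from rounded inputs
```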
So we now have an F ratio and want to see if it is significant. The corresponding p-value
for this F ratio shows up in the column titled Sig. F Change.
Sig. F Change: This is the p-value for the F change ratio. Now that we have a p-value,
what are the two questions we ask? 1) What’s the null hypothesis that we’re evaluating?
Here, we’re evaluating the null hypothesis that Model 1 R = Model 2 R, or that Model 1
R² = Model 2 R², or that ΔR² = 0. The null hypothesis assumes that there will not be a
significant change in R/R² when we add more predictors. Alternatively phrased to suit our
example, the null hypothesis assumes that the prediction of injury is not enhanced by
adding age as a predictor when gluts is already in the regression. In other words, the null
hypothesis assumes that age does not explain incremental variance in injury scores above
and beyond the variance explained by glut strength. So we’ve answered the first
question. The second question is: 2) Is it significant? Yes, it is, so we reject
the null hypothesis and conclude that age does explain incremental variance in injury
scores beyond the variance explained by glut strength.
Below I’ve reproduced the ANOVA summary table for each regression that we’ve run
today so that you can compare the results. This is what you’ll see:
Hierarchical regression’s model 1 = simple regression model 1
Hierarchical regression model 2 = standard multiple regression model 2
HIERARCHICAL REGRESSION
ANOVA(c)

Model           Sum of Squares   df   Mean Square   F        Sig.
1  Regression     41625.493       1    41625.493    17.885   .000(a)
   Residual      228088.5        98     2327.434
   Total         269714.0        99
2  Regression     76026.610       2    38013.305    19.037   .000(b)
   Residual      193687.4        97     1996.777
   Total         269714.0        99
a. Predictors: (Constant), gluts
b. Predictors: (Constant), gluts, age
c. Dependent Variable: injury
SIMPLE REGRESSION PREDICTING INJURY FROM GLUT STRENGTH
ANOVA(b)

Model           Sum of Squares   df   Mean Square   F        Sig.
1  Regression     41625.493       1    41625.493    17.885   .000(a)
   Residual      228088.5        98     2327.434
   Total         269714.0        99
a. Predictors: (Constant), gluts
b. Dependent Variable: injury
STANDARD MULTIPLE REGRESSION
ANOVA(b)

Model           Sum of Squares   df   Mean Square   F        Sig.
1  Regression     76026.610       2    38013.305    19.037   .000(a)
   Residual      193687.4        97     1996.777
   Total         269714.0        99
a. Predictors: (Constant), age, gluts
b. Dependent Variable: injury
MODEL 1 COEFFICIENTS
What do you notice below about the coefficient for gluts? It’s the same as it was when we
ran a simple regression predicting injury from gluts. That’s because Model 1 IS the
simple regression predicting injury from gluts.
HIERARCHICAL REGRESSION
Coefficients(a)

Model            B          Std. Error   Beta    t        Sig.   Tolerance   VIF
1  (Constant)    255.994    26.499                9.660    .000
   gluts          -3.545      .838      -.393   -4.229    .000   1.000      1.000
2  (Constant)   -120.867    94.054               -1.285    .202
   gluts          -4.063      .786      -.450   -5.166    .000    .975      1.026
   age             5.837     1.406       .362    4.151    .000    .975      1.026
a. Dependent Variable: injury
SIMPLE REGRESSION
Coefficients(a)

Model           B         Std. Error   Beta    t        Sig.
1  (Constant)   255.994   26.499                9.660    .000
   gluts         -3.545     .838      -.393   -4.229    .000
a. Dependent Variable: injury
MODEL 2 COEFFICIENTS
What do you notice here? Hmm, these look familiar, too. They are the same coefficients
that we saw when we conducted a standard multiple regression, i.e., put all the
predictors into one step instead of separating them out into two steps.
HIERARCHICAL REGRESSION
Coefficients(a)

Model            B          Std. Error   Beta    t        Sig.   Tolerance   VIF
1  (Constant)    255.994    26.499                9.660    .000
   gluts          -3.545      .838      -.393   -4.229    .000   1.000      1.000
2  (Constant)   -120.867    94.054               -1.285    .202
   gluts          -4.063      .786      -.450   -5.166    .000    .975      1.026
   age             5.837     1.406       .362    4.151    .000    .975      1.026
a. Dependent Variable: injury
STANDARD MULTIPLE REGRESSION
Coefficients(a)

Model            B          Std. Error   Beta    t        Sig.   Tolerance   VIF
1  (Constant)   -120.867    94.054               -1.285    .202
   gluts          -4.063      .786      -.450   -5.166    .000    .975      1.026
   age             5.837     1.406       .362    4.151    .000    .975      1.026
a. Dependent Variable: injury
Let’s focus on the coefficients in the hierarchical regression. You can see that the
coefficient for gluts changes between Model 1 (B = -3.55) and Model 2 (B = -4.06).
We’ve seen in our examples in class that, due to multicollinearity and other reasons,
adding predictors can often reduce the size of a coefficient, but here the partial regression
coefficient for gluts actually gets larger. Why do you think that is? Based on the
correlation matrix, we saw that injury shared a pretty strong relationship with age. In the
first model, we’re ignoring age and only including gluts, so there’s a lot of noise (i.e., error)
in our prediction. However, once we include age and control for it, some of that
error/noise is reduced. In addition, age and glut strength are not highly correlated (recall
that when we ran the bivariate correlations, r(age,gluts) = .16). Here we’ve got a situation
where including an additional predictor (age) helps to reduce additional noise (i.e., error;
i.e., variance) in the DV, so our partial regression coefficient for gluts improves.
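You can see the same thing in the standardized coefficients, which for two predictors can be computed directly from the three bivariate correlations. The sketch below plugs in the correlations from our matrix (r(injury,gluts) = -.393, r(injury,age) = .290, r(age,gluts) = .158) and recovers, to rounding, the betas and tolerances SPSS reports:

```python
# Two-predictor standardized solution from the bivariate correlations:
# beta_1 = (r_y1 - r_y2 * r_12) / (1 - r_12**2)
r_yg = -0.393   # injury with gluts
r_ya = 0.290    # injury with age
r_ga = 0.158    # age with gluts

beta_gluts = (r_yg - r_ya * r_ga) / (1 - r_ga**2)
beta_age = (r_ya - r_yg * r_ga) / (1 - r_ga**2)
tolerance = 1 - r_ga**2   # each predictor's tolerance when there are two predictors

print(f"beta for gluts: {beta_gluts:.3f}")   # ~ -.450
print(f"beta for age:   {beta_age:.3f}")     # ~ .36 (SPSS reports .362)
print(f"tolerance:      {tolerance:.3f}")    # ~ .975
```

Notice that |beta for gluts| (about .45) is larger than the bivariate |r| of .393, which is exactly the "coefficient gets larger" effect discussed above.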
Below we can check out our collinearity diagnostics, and they look good. We would
expect this because we decided when we ran the standard multiple regression (which is
the same as model 2 in the hierarchical regression) that multicollinearity was not a
problem.
Excluded Variables(b)

                                     Partial       Collinearity Statistics
Model     Beta In   t       Sig.     Correlation   Tolerance   VIF     Minimum Tolerance
1   age   .362(a)   4.151   .000     .388          .975        1.026   .975
a. Predictors in the Model: (Constant), gluts
b. Dependent Variable: injury
So what have we learned??
In our example, Standard multiple regression results = Hierarchical regression
results for Model 2. Although hierarchical regression is nice because it gives you change
statistics, saving you the hassle of calculating them on your own, your last model in
hierarchical regression will ultimately yield the same results as throwing all of the
variables into one step. So, if you’re just interested in how well a particular group of
variables predicts scores on a given DV, standard multiple regression is completely
appropriate and well suited. However, if you are especially interested in determining
which variable(s) contribute incrementally to the prediction of the DV, hierarchical
regression is a great option.
NOTE: Just to be explicit, you could run a “hierarchical” regression by conducting a
series of standard multiple regressions. In our example, you’d run a regression predicting
injury from gluts. You’d click OK and get your output. This information would be the
same as what we got in Model 1 of our hierarchical regression. Then you’d run another
regression and include gluts and age as predictors in one step. The information here
would be the same as what we got in Model 2 of our hierarchical regression, and the
same as what we saw when running the standard multiple regression. However, this
process takes longer than just clicking “Next” to get you to the next block of predictors in
SPSS, and you don’t get the change statistics. I’m just explaining this to drive home the
point that the results you get will be the same, whether you focus on the last model in
your hierarchical regression or a standard multiple regression where all variables were
entered in one step.
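The "series of standard regressions" route can be sketched numerically. Below, simulated data stand in for injury.sav (so the numbers will not match our output); the point is that R² from the one-predictor fit and R² from the two-predictor fit give you ΔR² by hand, exactly what the hierarchical procedure reports:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100
gluts = rng.normal(31, 6, n)
age = rng.normal(67, 3, n)
injury = 150 - 4 * gluts + 6 * age + rng.normal(0, 45, n)

def r_squared(X, y):
    """R^2 from an OLS fit with an intercept column."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

r2_step1 = r_squared(gluts, injury)                          # Model 1: gluts only
r2_step2 = r_squared(np.column_stack([gluts, age]), injury)  # Model 2: gluts + age
print(f"delta R^2 = {r2_step2 - r2_step1:.3f}")
```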
Sample APA results section:
Table 1 [I didn’t actually include this, but you should!] displays correlations
among all study variables. Both predictors shared moderate to strong relationships with
the outcome and were only weakly correlated with one another. Inspection of P-P plots
and a scatterplot of the standardized residual values indicated that the assumptions were
met. Examination of bivariate scatter plots did not reveal any extreme data values.
A hierarchical multiple regression was conducted to determine whether age
contributed incrementally to the prediction of injury scores above and beyond that
accounted for by glut strength. Glut strength was entered in step 1, and age was entered in
step 2. Results indicated that glut strength explained 15% of the variance in injury, F(1,
98) = 17.89, p < .01. Furthermore, age explained an incremental 13% of the variance in
injury scores, F(1, 97) = 17.23, p < .01, above and beyond the variance accounted for
by glut strength. Partial regression coefficients are reported in Table 2. Results suggest
that women of the same age are less likely to be injured if they have more strength in
their gluts. Conversely, women who have the same level of glut strength are more likely
to be injured if they are older.
Variable          B       SE B     β
Step 1
  Glut strength   -4.06    .79    -.45*
Step 2
  Age              5.84   1.41     .36*

Note. R² = .15 for Step 1; ΔR² = .13 for Step 2 (ps < .05).
* p < .05
NOTE: The partial regression coefficients that you report here come from MODEL 2
(i.e., the FINAL model). You do NOT report the partial regression coefficient for glut
strength from Model 1.
NOTE 2: I did not report the t statistic that accompanies the partial regression coefficient
anywhere. This is primarily because we entered only one variable in each step, so the ΔR²
statistic is sufficient. However, if we had entered several variables in step 2, we would
likely report the t statistics associated with each partial regression coefficient. In this way
we’d let our readers know which variables were significant predictors of our DV.
APPENDIX
Here I want to show you how the order in which you enter your predictors has NO
EFFECT on the regression coefficients that you see in the final model when conducting a
hierarchical regression, or in Model 1 (the only model) in standard multiple regression.
This will be brief, but hopefully it is also helpful.
PREDICTING INJURY FROM GLUTS AND AGE
Standard Multiple Regression:
Model Summary(b)

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .531(a)    .282        .267                44.685
a. Predictors: (Constant), age, gluts
b. Dependent Variable: injury
ANOVA(b)

Model           Sum of Squares   df   Mean Square   F        Sig.
1  Regression     76026.610       2    38013.305    19.037   .000(a)
   Residual      193687.4        97     1996.777
   Total         269714.0        99
a. Predictors: (Constant), age, gluts
b. Dependent Variable: injury
Coefficients(a)

Model            B          Std. Error   Beta    t        Sig.   Tolerance   VIF
1  (Constant)   -120.867    94.054               -1.285    .202
   gluts          -4.063      .786      -.450   -5.166    .000    .975      1.026
   age             5.837     1.406       .362    4.151    .000    .975      1.026
a. Dependent Variable: injury

Hierarchical Regression, Step 1 = gluts, Step 2 = age:
Model Summary

Model  R        R Square  Adj. R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .393(a)  .154      .146           48.243                      .154             17.885    1    98   .000
2      .531(b)  .282      .267           44.685                      .128             17.228    1    97   .000
a. Predictors: (Constant), gluts
b. Predictors: (Constant), gluts, age
ANOVA(c)

Model           Sum of Squares   df   Mean Square   F        Sig.
1  Regression     41625.493       1    41625.493    17.885   .000(a)
   Residual      228088.5        98     2327.434
   Total         269714.0        99
2  Regression     76026.610       2    38013.305    19.037   .000(b)
   Residual      193687.4        97     1996.777
   Total         269714.0        99
a. Predictors: (Constant), gluts
b. Predictors: (Constant), gluts, age
c. Dependent Variable: injury
Coefficients(a)

Model            B          Std. Error   Beta    t        Sig.   Tolerance   VIF
1  (Constant)    255.994    26.499                9.660    .000
   gluts          -3.545      .838      -.393   -4.229    .000   1.000      1.000
2  (Constant)   -120.867    94.054               -1.285    .202
   gluts          -4.063      .786      -.450   -5.166    .000    .975      1.026
   age             5.837     1.406       .362    4.151    .000    .975      1.026
a. Dependent Variable: injury
Hierarchical Regression, Step 1 = age, Step 2 = gluts:
Model Summary

Model  R        R Square  Adj. R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .290(a)  .084      .075           50.201                      .084              9.024    1    98   .003
2      .531(b)  .282      .267           44.685                      .198             26.685    1    97   .000
a. Predictors: (Constant), age
b. Predictors: (Constant), age, gluts
ANOVA(c)

Model           Sum of Squares   df   Mean Square   F        Sig.
1  Regression     22742.229       1    22742.229     9.024   .003(a)
   Residual      246971.8        98     2520.120
   Total         269714.0        99
2  Regression     76026.610       2    38013.305    19.037   .000(b)
   Residual      193687.4        97     1996.777
   Total         269714.0        99
a. Predictors: (Constant), age
b. Predictors: (Constant), age, gluts
c. Dependent Variable: injury
Coefficients(a)

Model            B          Std. Error   Beta    t        Sig.   Tolerance   VIF
1  (Constant)   -169.650   105.129               -1.614    .110
   age             4.686     1.560       .290    3.004    .003   1.000      1.000
2  (Constant)   -120.867    94.054               -1.285    .202
   age             5.837     1.406       .362    4.151    .000    .975      1.026
   gluts          -4.063      .786      -.450   -5.166    .000    .975      1.026
a. Dependent Variable: injury
What’s the same between the two hierarchical regressions???
Model 2 R, Model 2 R2, and the Model 2 partial regression coefficients.
What’s different between the two hierarchical regressions???
Everything in Model 1, R square change, and F change.
For the things that are different, WHY are they different???
Because Model 1 has changed!! In the first hierarchical regression, we entered gluts in
step 1, age in step 2. Recall that by doing this, the results you get for model 1 are equal to
the results you get by running a simple regression predicting injury from gluts. Now that
we’ve changed the order in which we’ve entered the variables (i.e., we entered age in step
1 and gluts in step 2), Model 1 becomes equivalent to the simple regression predicting
injury from age. See below for proof.
Simple regression predicting injury from age:
Model Summary

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .290(a)    .084        .075                50.201
a. Predictors: (Constant), age
ANOVA(b)

Model           Sum of Squares   df   Mean Square   F       Sig.
1  Regression     22742.229       1    22742.229    9.024   .003(a)
   Residual      246971.8        98     2520.120
   Total         269714.0        99
a. Predictors: (Constant), age
b. Dependent Variable: injury
Coefficients(a)

Model            B          Std. Error   Beta    t        Sig.
1  (Constant)   -169.650   105.129               -1.614    .110
   age             4.686     1.560       .290    3.004    .003
a. Dependent Variable: injury
Remember that in hierarchical regression, you are always comparing models. Model 1 to
a model with no predictors, model 2 to model 1, and if we had a model 3 we’d be
comparing it to model 2. So we’re comparing any new variables we enter to all the other
variables that have already been put into the regression.
R square change and F change will also take on different values in the hierarchical
regression (step 1 = age, step 2 = gluts) than they did in the hierarchical regression (step 1
= gluts, step 2 = age). Again, this is because Model 1 has changed! So although Model 2
in both hierarchical regressions is the same model, the model to which we compare it
(Model 1) has changed. This is why we see the same R and R² values for Model 2 in the
two hierarchical regressions, but different values for ΔR².
Partial Regression Coefficients across the 3 Regressions (Standard, Hierarchical 1
and Hierarchical 2)
Standard Multiple Regression:
Coefficients(a)

Model            B          Std. Error   Beta    t        Sig.   Tolerance   VIF
1  (Constant)   -120.867    94.054               -1.285    .202
   gluts          -4.063      .786      -.450   -5.166    .000    .975      1.026
   age             5.837     1.406       .362    4.151    .000    .975      1.026
a. Dependent Variable: injury
Hierarchical Regression (step 1 = gluts, step 2 = age): focus on Model 2
Coefficients(a)

Model            B          Std. Error   Beta    t        Sig.   Tolerance   VIF
1  (Constant)    255.994    26.499                9.660    .000
   gluts          -3.545      .838      -.393   -4.229    .000   1.000      1.000
2  (Constant)   -120.867    94.054               -1.285    .202
   gluts          -4.063      .786      -.450   -5.166    .000    .975      1.026
   age             5.837     1.406       .362    4.151    .000    .975      1.026
a. Dependent Variable: injury
Hierarchical Regression (step 1 = age, step 2 = gluts): focus on Model 2
Coefficients(a)

Model            B          Std. Error   Beta    t        Sig.   Tolerance   VIF
1  (Constant)   -169.650   105.129               -1.614    .110
   age             4.686     1.560       .290    3.004    .003   1.000      1.000
2  (Constant)   -120.867    94.054               -1.285    .202
   age             5.837     1.406       .362    4.151    .000    .975      1.026
   gluts          -4.063      .786      -.450   -5.166    .000    .975      1.026
a. Dependent Variable: injury
The partial regression coefficients in the final models ARE ALL THE SAME. The Venn
diagrams in Tabachnick & Fidell may mistakenly lead you to believe that the order in
which you enter the variables will affect the partial regression coefficients that you see in
the final model. No. In hierarchical regression, the order in which you enter the variables
will affect ΔR² (again, because the comparison model is being changed), but order will not
affect the values of B or β in the final model. And, of course, the values of B and β in the
final model of a hierarchical regression will be the same values of B and β that you
would have obtained by putting all variables into the same step using standard multiple
regression.
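This order-invariance is easy to verify numerically: in ordinary least squares, the coefficients depend only on which predictors are in the model, not on how (or in what order) they got there. A sketch with simulated data (not injury.sav):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)           # stand-in for gluts
x2 = rng.normal(size=n)           # stand-in for age
y = 2 * x1 - 3 * x2 + rng.normal(size=n)

def fit(X, y):
    """OLS coefficients (intercept first) for a list of predictor columns."""
    X = np.column_stack([np.ones(len(y))] + list(X))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

b_order1 = fit([x1, x2], y)   # "enter x1, then x2"
b_order2 = fit([x2, x1], y)   # "enter x2, then x1"

# Same coefficients either way, just listed in a different column order:
print(b_order1)   # [intercept, coef for x1, coef for x2]
print(b_order2)   # [intercept, coef for x2, coef for x1]
```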