Multiple Regression
Dr. Tom Pierce
Department of Psychology
Radford University
In the previous chapter we talked about regression as a technique for using a person’s
score on one variable to make a best guess about that person’s score on another variable.
The regression equation, Y’ = a + bX tells you what to do with a person’s score for X to
generate the best guess you could make about a person’s score for Y. But why should the
researcher have to base their best guess on just one piece of information? Why can’t you
take what you know about a person’s scores on two variables, or three, or four, to help
you to make that best guess? The answer is that you can. Multiple regression is a set of
techniques for generating a predicted score for one variable from two or more
predictor variables. And the nice thing about multiple regression is that it’s just an
extension of regression with one predictor variable. All of the basic principles we covered
in the last chapter still hold true in this chapter.
Let’s say that a researcher works with Alzheimer’s caregivers. We know that persons in
this situation are more likely to suffer from many of the common markers of chronic
stress. For example, they are more likely to be depressed, to display impaired immune
function, and to report problems with sleep, high blood pressure, and ulcers. The
researcher is interested in predicting the quality of life of caregivers two years after a
diagnosis of Alzheimer’s disease in the person they’re caring for. The researcher would
need to think about the types of information they could collect now that might help them
to make an accurate prediction about a score on a measure of quality of life two years
from now.
One variable that might be helpful is the amount of social support the caregiver has
access to. A second potential predictor variable might be the age of the caregiver. A third
predictor variable might be the available financial assets to which the family has access.
Obviously, a study of this type would attempt to collect data from dozens, if not hundreds
of families affected by Alzheimer’s disease. I’m going to present hypothetical data for 35
caregivers, just so that we have something to talk about. So, imagine a study where
measures of social support, caregiver age, and caregiver financial assets are obtained at
the time a spouse is diagnosed with Alzheimer’s disease and then a measure of quality of
life is obtained two years later.
©Thomas W. Pierce 2005
Table X.1. Made-up data for the predictors of scores for quality of life.
Participant   Social    Caregiver   Financial    Quality   Year
              Support   Age         Assets       of Life   Born
-----------   -------   ---------   ----------   -------   ----
     1          16         56          275,000      12     1950
     2          26         44          325,000       8     1962
     3          17         75        1,500,000      12     1931
     4          27         59        2,100,000      12     1947
     5          40         58          560,000      10     1948
     6          20         78          790,000       9     1928
     7          28         63        1,100,000      12     1943
     8          38         44          973,000      18     1962
     9          35         59          372,000      11     1947
    10          21         76           70,000       8     1930
    11          41         50          210,000      10     1956
    12          10         82           65,000       5     1924
    13          26         79        1,150,000       7     1927
    14          38         69           15,000      10     1937
    15          29         76           36,000       9     1930
    16          36         73           72,000      15     1933
    17          35         68          221,000      11     1938
    18          15         71           14,000       8     1935
    19          23         71          115,000       9     1935
    20          29         75           28,000       8     1931
    21          45         63          550,000      16     1943
    22          23         79           79,000      10     1927
    23          11         75           35,000      14     1931
    24          15         67          110,000       8     1939
    25          33         67          270,000      12     1939
    26          16         54          250,000      11     1952
    27          25         41          285,000       9     1965
    28          16         75          120,000      13     1931
    29          29         61          210,000      13     1945
    30          42         56          560,000      11     1950
    31          19         79          650,000       8     1927
    32          27         65          130,000      11     1941
    33          36         67          945,000      19     1939
    34          34         57          272,000      10     1949
    35          23         75           50,000       8     1931
Value of each predictor variable entered separately
One good place to start when working with a criterion variable and a bunch of potential
predictor variables is to look at the correlations of these variables with each other. This
gives you a sense of how good a job each predictor variable would do if it were used all
by itself to predict the criterion. It will also give you a sense of the degree to which the
predictor variables are correlated with each other. Here’s the correlation matrix for the
four variables in our study.
Correlations

                                         Quality    Social     Financial
                                         of Life    Support    Assets       Age
Pearson Correlation   Quality of Life      1.000      .417       .359      -.282
                      Social Support        .417     1.000       .062      -.417
                      Financial Assets      .359      .062      1.000      -.118
                      Age                  -.282     -.417      -.118      1.000
Sig. (1-tailed)       Quality of Life         .       .006       .017       .050
                      Social Support        .006        .        .362       .006
                      Financial Assets      .017      .362         .        .250
                      Age                   .050      .006       .250         .
N                     (all variables)         35        35         35         35
You can see that the strongest correlation of a predictor variable with quality of life is
.417 for social support. Squaring this correlation tells us that scores for social support
account for 17.4% of the variability in scores for quality of life. The weakest correlation
of a predictor variable with quality of life is the value of -.282 for age. Squaring this
correlation tells us that scores for age account for 7.95% of the variability in scores for
quality of life.
If someone were to tell you that you were only allowed to use one predictor variable,
which one would you choose? Obviously, you’d pick the predictor variable that would
give you the most accurate predicted scores you could get. This means you’d pick the
predictor variable that has the strongest relationship with Quality of Life, which is Social
Support.
Okay. You enter Social Support in the regression routine of a program like SPSS to
predict scores for Quality of Life. The output tells you that Social Support accounts for
17.4% of the variability in Quality of Life and that the Standard Error of Estimate is 2.79.
The output from SPSS is shown below.
Model Summary

                          Adjusted     Std. Error of
Model     R    R Square   R Square     the Estimate
1       .417a    .174       .149         2.78703

a. Predictors: (Constant), Social Support
Coefficients(a)

                        Unstandardized            Standardized
                        Coefficients              Coefficients
Model                   B          Std. Error     Beta            t        Sig.
1    (Constant)         7.184        1.442                        4.982    .000
     Social Support      .133         .051         .417           2.633    .013

a. Dependent Variable: Quality of Life
SPSS’s Coefficients table above shows you that the y-intercept (the Constant) is 7.184
and the slope of the regression line is .133. This means that the regression equation for
predicting scores for Quality of Life from scores for Social Support is:
Predicted score for Quality of Life = 7.184 + .133(Score for Social Support)
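
If you wanted to apply that equation yourself, here's a minimal sketch in Python. The intercept and slope are the ones reported in the SPSS output above; the Social Support score of 25 is just a made-up example.

# Applying the reported simple regression equation (the score of 25 is hypothetical)
a, b = 7.184, 0.133                          # constant and slope from the output above
social_support = 25                          # a hypothetical caregiver's score
predicted_quality_of_life = a + b * social_support
print(round(predicted_quality_of_life, 2))   # 10.51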
The ANOVA table provided by SPSS shows us that scores for Social Support account for
a significant amount of variability in scores for Quality of Life.
ANOVA(b)

                     Sum of
Model                Squares     df     Mean Square      F       Sig.
1    Regression       53.842      1        53.842       6.932    .013a
     Residual        256.329     33         7.768
     Total           310.171     34

a. Predictors: (Constant), Social Support
b. Dependent Variable: Quality of Life
The F-test for the regression equation is significant, so we know that we’re not wasting
our time using regression. Predicted scores for Quality of Life based on scores for social support
are significantly more accurate than just using the mean score for Quality of Life. We
also know that social support, all by itself, accounts for 17.4 percent of the variability in
scores for quality of life.
Using two predictors to predict Quality of Life
Okay. Regression with one predictor variable; you’ve done that before. Now, let’s say
you ask SPSS to use two predictor variables to help you to make a best guess for
someone’s score for Quality of life. It’s just as easy to ask SPSS to use two predictors as
to ask it to use one. Just move the two predictor variables you want to use into the
Independent(s) box, make sure the Method selected is Enter, and then hit the OK button.
No big deal.
Let’s say we include the two predictor variables that are the most strongly correlated with
quality of life. This means that we’re going to use Social Support (r = .417) and Financial
Assets (r = .359). Here’s the Model Summary using Social Support and Financial Assets
as predictors:
Model Summary

                          Adjusted     Std. Error of
Model     R    R Square   R Square     the Estimate
1       .534a    .285       .240         2.63271

a. Predictors: (Constant), Financial Assets, Social Support
There are a couple of things to notice right off the bat. You’ll see the symbol R at the top
of the second column. The value for R is .534. This is considerably higher than the value
of .417 that we got when Social Support was the only predictor. This value for R is
known as a multiple correlation. It represents the correlation between the set of two
predictor variables and the one variable being predicted. Because correlations between
individual predictors and the criterion can be either positive or negative the multiple
correlation is always reported as a positive number. Another way of thinking about it is to
say that it represents the correlation between predicted scores for Y (which are based on
information from the two predictors) and actual scores for Y. It’s a little more obvious
that R is always going to be a positive number because the correlation between predicted
scores and actual scores is always going to end up as a positive number – higher
predicted scores for Y are going to be associated with higher actual scores for Y.
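
Here's a minimal sketch that demonstrates this property. The arrays are made up purely for illustration (they are not the study data); the point is that the square root of R Square from a fitted model matches the correlation between predicted and actual scores for Y.

# Multiple R equals the correlation between predicted and actual scores (illustrative data)
import numpy as np
import statsmodels.api as sm

y = np.array([12.0, 8.0, 15.0, 10.0, 9.0, 16.0, 11.0])          # made-up criterion scores
X = np.column_stack([
    [16, 26, 38, 21, 15, 45, 23],                               # made-up predictor 1
    [2.75, 3.25, 9.73, 0.70, 0.36, 5.50, 0.79],                 # made-up predictor 2
])

model = sm.OLS(y, sm.add_constant(X)).fit()
multiple_r = np.sqrt(model.rsquared)                             # R from the model summary
r_pred_actual = np.corrcoef(model.fittedvalues, y)[0, 1]         # correlation of Y' with Y
print(round(multiple_r, 3), round(r_pred_actual, 3))             # the two values match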
The column to the right of R is labeled “R Square”. The value for R Square of .285
represents the squared multiple correlation between the multiple (two) predictors and
the one criterion variable. This number indicates that the combination of our two
predictors is able to account for 28.5 percent of the variability in scores for Quality of
life. This is quite a bit more than the percentage of variability accounted for by Social
Support alone (17.4). So, adding in Financial Assets as a predictor variable gives us an
additional 11.1 percent of variability accounted for (i.e., 28.5 – 17.4 = 11.1). That makes
it sound like it was worth adding Financial Assets to the equation.
Another metric of whether adding a second predictor variable is really worth our while is
to look at the standard error of estimate displayed in the last column on the right. A
standard error of estimate in multiple regression means exactly the same thing as it meant
in simple regression. It’s the average amount that our predicted scores are off by -- it’s
just that here we’re basing our best guesses on two pieces of information, rather than one
just one piece of information. The value for the standard error of estimate displayed in the
Model Summary is 2.63. That means that predicted scores for Quality of life (based on
using Social Support and Financial Assets as predictors) are going to differ from actual
scores for Quality of Life by an average of 2.63 points. When we only used Social
Support the standard error of estimate was 2.79. Adding that second predictor variable
doesn’t seem that it improved the standard error of estimate by all that much, but, in fact,
it went down by 5.5%.
Okay, the next thing SPSS gives us is an F-test of whether this combination of two
predictor variables accounts for a significant amount of variability in scores on the
criterion. Here it is:
ANOVA(b)

                     Sum of
Model                Squares     df     Mean Square      F       Sig.
1    Regression       88.374      2        44.187       6.375    .005a
     Residual        221.798     32         6.931
     Total           310.171     34

a. Predictors: (Constant), Financial Assets, Social Support
b. Dependent Variable: Quality of Life
You’ll notice that the format for the ANOVA table is exactly the same as when we had
one predictor variable, SS Regression, SS Residual, etc. One place where things do look a
little different is in the degrees of freedom column. Take a look at the number of degrees
of freedom for the Regression row. It’s “2”. That’s because the number of degrees of
freedom for regression is equal to the number of predictor variables and we’ve got two
predictor variables (k = 2). The number of degrees of freedom for the Residual row is 32.
That comes from starting with 35 participants (N = 35) and then subtracting the number
of predictor variables (k = 2) and then subtracting one more degree of freedom. The
equation for the number of degrees of freedom residual is N – k – 1. So, we get 35 – 2 – 1
= 32. The total number of degrees of freedom stays the same at 34 (N – 1).
Another thing to notice about the ANOVA table is that you can use it to calculate the
multiple squared correlation. The squared multiple correlation represents the proportion
of variability accounted for by the two predictor variables. This value is equal to the sum
of squares accounted for by the regression equation (SS Regression = 88.374) divided by
the sum of squares total (310.171). 88.374 divided by 310.171 equals .285, the same
number that the Model Summary gave us before.
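
Here's the same arithmetic as a short sketch, using the sums of squares and the counts reported above.

# Recovering R Square and the degrees of freedom from the ANOVA table above
ss_regression, ss_total = 88.374, 310.171
n, k = 35, 2                          # participants and predictors

r_square = ss_regression / ss_total   # about .285
df_regression = k                     # 2
df_residual = n - k - 1               # 32
print(round(r_square, 3), df_regression, df_residual)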
The F-ratio from the ANOVA table is 6.375 and statistically significant so we can say
that the regression model using Social Support and Financial Assets as predictors
accounts for a significant amount of variability in scores for Quality of life. Now let’s
look at the Coefficients section of the output.
Coefficients(a)

                          Unstandardized            Standardized
                          Coefficients              Coefficients
Model                     B            Std. Error    Beta            t        Sig.
1    (Constant)           6.431          1.403                       4.583    .000
     Social Support        .126           .048        .396           2.644    .013
     Financial Assets     1.75E-006       .000        .334           2.232    .033

a. Dependent Variable: Quality of Life
In raw score (unstandardized) units, the regression equation has a y-intercept (constant)
of 6.431, the weighting applied to Social Support is .126, and the weighting applied to
Financial Assets is 1.75 x 10^-6 (this means move the decimal place over to the left by 6
places) giving us .00000175. Collecting all of this information in one equation we get…
Quality of Life’ = 6.431 + .126(Social Support) + .00000175(Financial Assets)
When you add another predictor variable, all that happens is that you add an additional
component to the equation. You can add as many predictor variables as you want. Each
predictor in that model will get a weighting or coefficient that states what you would
need to multiply a score for the predictor by in order to maximize the predictive power of
the equation. By the way, the reason the coefficient applied to the Financial Assets
predictor is such a small number is that the units of measurement for this variable are so
large (tens or hundreds of thousands of dollars).
In general, the format for a regression equation looks like this…
Y’ = a + b1X1 + b2X2 + b3X3 + … + bkXk
… where k equals the number of predictor variables.
Once you get the regression equation using SPSS, the process of getting predicted scores
is the same as with simple regression. Plug the raw scores for the predictor variables into
the equation and there you are. Let’s say that a person has a score on the measure of
Social Support of 20 and they have 50,000 in financial assets. When we plug these
numbers into the regression equation we get…
Quality of Life’ = 6.431 + .126*20 + .00000175*50,000
= 6.431 + 2.52 + .0875 = 9.04
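
The same calculation as a sketch, using the coefficients from the SPSS output above.

# Predicted Quality of Life from two predictors (coefficients from the output above)
intercept = 6.431
b_social_support = 0.126
b_financial_assets = 0.00000175
social_support, financial_assets = 20, 50000

predicted_quality_of_life = (intercept
                             + b_social_support * social_support
                             + b_financial_assets * financial_assets)
print(round(predicted_quality_of_life, 2))   # about 9.04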
The criterion for selecting regression coefficients in multiple regression
When there are two predictor variables it takes three regression coefficients to write the
regression equation (y-intercept, coefficient for predictor 1, coefficient for predictor 2).
SPSS gave us values of 6.431, .126, and .00000175 for these three coefficients,
respectively. So why these numbers? What makes them so special? The answer at the
surface is that this combination of numbers allows us to write the regression equation that
gives us the most accurate predicted values possible for Quality of Life. So how do we
know that these numbers do this? What’s the criterion for knowing that we’re getting
the most accurate predicted scores possible?
It turns out that we’ve already talked about the answer to this question. It’s the same
answer that we talked about when using one predictor variable in Chapter XX. Remember, the
whole point of regression is to obtain predicted scores for Y that are as close to the actual
scores for Y as we can get. Another way of putting this is that we want to make the
differences between predicted scores for Y (values for Y’) and actual scores for Y (values
for Y) as small as possible. In other words we want the average error of prediction (Y –
Y’) to be as small as we can make it. A statistician will know that this is happening when
a regression equation produces values for Y’ that make the sum of squared errors of
prediction as small as it can be. The criterion for knowing that we’re using the best
regression equation possible is that it results in the fact that…
Σ(Y – Y’)² is a minimum.
This is the same thing we said when we were talking about simple regression! The equations
for a, b1, and b2 are arranged the way they are in order to make this happen. If we’re
using a program like SPSS, we don’t really need to know what the equations are, but
when SPSS uses them that’s what they do.
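
To make the criterion concrete, here's a minimal sketch that computes Σ(Y – Y’)² for one candidate set of coefficients. The data arrays are made up for illustration (they are not the values from Table X.1); least-squares regression simply finds the a, b1, and b2 that make this quantity as small as it can possibly be.

# Sum of squared errors of prediction for one candidate regression equation
import numpy as np

y  = np.array([12.0, 8.0, 15.0, 10.0, 9.0])     # made-up actual scores for Y
x1 = np.array([16.0, 26.0, 38.0, 21.0, 15.0])   # made-up scores for predictor 1
x2 = np.array([2.7, 3.2, 9.7, 0.7, 0.4])        # made-up scores for predictor 2

def sum_of_squared_errors(a, b1, b2):
    y_prime = a + b1 * x1 + b2 * x2             # predicted scores, Y'
    return np.sum((y - y_prime) ** 2)           # sum of (Y - Y') squared

print(sum_of_squared_errors(6.4, 0.13, 0.3))    # the error for one candidate equation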
The regression equation as the equation for a straight line
Another way of thinking about the coefficients in multiple regression goes back to the
original way we talked about regression, in the context of using one predictor variable. If
you remember, we had two variables, X and Y, and we plotted people’s scores for both
variables in a scatterplot. The scatterplot showed the pattern of the relationship between
the two variables. We then said that we wanted to capture this relationship by running a
straight line as close to these points in the scatterplot as we could. The regression
equation was the equation for this straight line. Well, the regression equation we wrote
above for the situation when we had two predictor variables is also the equation for a
straight line. This probably seems like a stretch. I mean, to draw a straight line you just
need a y-intercept and a slope, right? True, if you’re drawing that straight line on a flat
surface.
When there are two variables to deal with there are two axes for the scatterplot: up-down
(Y) and left-right (X). These two dimensions define a flat surface. When there are two
predictor variables there are three variables involved. This means if you want to show
someone a picture of where that person is in terms of their scores on all three variables
you’re going to have to show them a drawing with three dimensions – we’re going to
have to add a third axis (back to front) for our second predictor variable. The graph below
shows the location of one subject within this three dimensional space. Instead of a point
set within a 2-D surface, we now have, essentially, a bubble, floating in this 3-D space.
The line going from the bubble to the floor of the graph is there to help us to get a sense
of exactly where the bubble is floating.
Figure X.1. A 3-D scatterplot with axes for Social Support, Financial Assets, and Quality of Life, showing the location of one participant.
OK, the graph above shows where one subject is located. Now let’s look at the 3-D
scatterplot of where all 35 participants are located.
Figure X.2. The 3-D scatterplot of Social Support, Financial Assets, and Quality of Life for all 35 participants.
When there are two predictor variables, the regression equation is really the equation
for how to draw a straight line in three dimensions. It’s the line that runs as close to
those bubbles floating in that 3-D space as we can get. When there are more than two
predictor variables you need more than three axes and three dimensions to capture all of
the information you have about each subject. That means you have to try to picture a
straight line in more than three dimensions, which is, like, impossible, unless you’re
Stephen Hawking or something. Fortunately, the math stuff works out so that no matter
how many predictor variables we have, SPSS can give us the regression coefficients we
need.
Maximizing the proportion of variability accounted for
OK, I give. Enough about the fourth or fifth dimensions. Just tell me what I’m supposed
to do. Alright. Here’s a question. You want to maximize the proportion of variability
accounted for. You’ve got three predictor variables available to you. Which ones should
you use? What do you do?
If you want to maximize the proportion of variability accounted for – if you want to
obtain predicted values for Y that are as accurate as you can get – use all the
predictor variables you’ve got! We’ve got three predictor variables available to us. Let’s
see how much variability we can account for when we include all three in one regression
model.
Ok, we go back to the Linear Regression window and enter Social Support, Financial
Assets, and Age. Here’s the output…
Model Summary

                          Adjusted     Std. Error of
Model     R    R Square   R Square     the Estimate
1       .541a    .292       .224         2.66092

a. Predictors: (Constant), Age, Financial Assets, Social Support

ANOVA(b)

                     Sum of
Model                Squares     df     Mean Square      F       Sig.
1    Regression       90.675      3        30.225       4.269    .012a
     Residual        219.496     31         7.081
     Total           310.171     34

a. Predictors: (Constant), Age, Financial Assets, Social Support
b. Dependent Variable: Quality of Life

Coefficients(a)

                          Unstandardized            Standardized
                          Coefficients              Coefficients
Model                     B            Std. Error    Beta            t        Sig.
1    (Constant)           8.518          3.926                       2.170    .038
     Social Support        .114           .053        .357           2.147    .040
     Financial Assets     1.70E-006       .000        .325           2.139    .040
     Age                  -.026           .046       -.095           -.570    .573

a. Dependent Variable: Quality of Life
When all of the predictor variables are entered the multiple squared correlation goes up to
.292 from the value of .285 we got when only two predictors were used. We didn’t gain
all that much – only an additional .7 percent – but the question was how to make the
proportion of variability accounted for as large as it could be, and that’s it. Accounting
for 29.2 percent of the variability is as good as we can do. If you look at the t-tests on the
right side of the Coefficients table, you’ll notice that only the regression coefficients
for Social Support and Financial Assets are significant. The beta weight for Age is only -.095, which carries a significance level of .573. This tells us that only Social Support
and Financial Assets contribute significantly to the regression equation.
Automated strategies for selecting predictor variables
Ok, to maximize the proportion of variability accounted for, use every predictor variable.
However, using every predictor variable may not always be very practical. For example,
in the analysis with three predictors described above, Age contributed almost nothing to
the regression equation. It probably doesn’t make sense to include a predictor in a
regression equation if we’re unable to reject the null hypothesis that the variable
contributes nothing! In another research context a given variable might contribute an
extra half a percent to the predictive power of a regression model, but
maybe it costs $50 to obtain a score on that variable for each person. The expense of
obtaining scores for that variable may not be worth the ability to get predicted scores that
are trivially more accurate.
So, if we’ve got data from a bunch of potential predictor variables how do we decide
which predictors to use and which ones to leave out? How do we account for the most
variability using the fewest number of predictor variables? In other words, in most
situations we’re not going to want to use every possible predictor. We’re going to want to
use the most efficient regression equation possible. We want a regression equation that is
lean and mean!
We just said that we want a regression equation that is efficient. To generate a rule for
selecting predictor variables we need a definition for what it means for a regression
equation to be efficient. SPSS provides a number of automated procedures for selecting
predictor variables. These procedures differ from each other in terms of the definitions
they use for what it means for an equation to be efficient. These definitions translate into
rules or algorithms for selecting some variables for inclusion and leaving out others.
I’m going to describe three of these algorithms for selecting predictor variables, the
Forward, Backward, and Stepwise methods. To explain how they work I’m going to use a
type of diagram known as a Venn Diagram. The Venn diagram is cool because it shows
the degree to which the predictor variables overlap with the criterion and, in addition, the
degree to which the predictor variables overlap with each other. Here’s a sample Venn
Diagram. I’ve labeled the variables Y, X1, X2, and X3.
Figure X.3
Before we talk about the Forward method, let’s look at the Venn diagram above. The area
of the largest circle represents the total amount of variability in the criterion variable Y.
The other three circles represent the amounts of variability in the predictor variables X1,
X2, and X3, respectively. The numbers represent proportions of variability. So the number
.65 tells us that 65% of the variability in scores for Y is not overlapped or accounted for
by any of the three predictors. The other numbers correspond to proportions of variability
in regions in Y that are overlapped by one or more predictor variables. In other words,
these numbers can be interpreted in squared correlation units.
Let’s say we want to determine the proportion of overlap between X1 and Y. There are
three regions in the diagram that contribute to the overlap of X1 with Y. These are the
regions with the numbers .08, .07, and .03. They add up to a total of .18 which tells us
that X1 by itself accounts for 18% of the variability in Y. The diagram also shows us the
degree to which any two or three predictor variables overlap with each other. For
example, there are two regions that comprise the overlap between X1 and X2. These are
the regions with the proportions of variability of .07 and .03.
Now let’s use a Venn diagram to illustrate the logic that SPSS goes through in executing
the Forward method.
The Forward method for selecting predictor variables
The goal of all three algorithms for selecting predictors is to account for the most
variability in the criterion (Y) with the fewest predictor variables. That means we want to
pick individual predictors that have the most overlap with Y and, at the same time, have
the least amount of overlap with each other. We don’t want overlap among the predictors
because accounting for the same variability twice (redundant predictors) doesn’t help the
equation at all.
The Forward method starts when the researcher selects a set of variables for SPSS to pick
from. All the researcher has to do is pick the variables and then click OK. SPSS does the
rest.
The first thing SPSS will do is to identify the predictor variable that would do the best job
all by itself in predicting scores for Y. In other words, SPSS will find the predictor
variable that has highest squared correlation with Y. Go back and take a look at our Venn
diagram. Which of the three predictors has the highest squared correlation with Y? This
is the same thing as asking which of the predictors has the greatest amount of overlap
with Y. By adding up the various regions that correspond to each predictor’s overlap with
Y we get:
R²(Y*X1) = .18
R²(Y*X2) = .21
R²(Y*X3) = .17
This tells us that if the researcher were only allowed to use one predictor variable they
ought to use X2. This is because it accounts for a larger percentage of the variability in Y
than either of the other two predictors (21%). So, at this point SPSS will test this
proportion of variability to see if it’s significant. If it’s not, the whole process ends and
SPSS concludes that none of the predictors are worth anything. After all, if the very best
predictor of the bunch fails to account for a significant amount of variability then none of
the other predictors will either. If this happens the researcher should probably make the L
sign over their forehead and go home and sulk for a while. When they come back, the only
thing the researcher can do then is to go find a different set of predictors that are more
strongly correlated with the criterion. But hopefully it won’t come to this and SPSS will
report that the contribution of this first variable is significant.
Now, when SPSS tests this squared correlation of .21 the question of whether .21 is good
enough to be significant will be determined by the sample size of the study. With a
decent sample size, .21 will be good enough. With a small sample size, it won’t. Because
we want to be able to talk about the logic of the procedure let’s say that, hypothetically,
just for the sake of argument, a variable has to make a contribution of 4% to the equation
to consider it statistically significant. In the example from the Venn diagram X2 by itself
accounts for 21 percent of the variability in Y. We’re saying that it only has to account
for 4 percent to consider that contribution significant. Twenty-one percent is clearly
greater than four percent so let’s say that the contribution of X2 is significant.
In the Forward method, once a variable has entered the regression equation it never
comes out. So, X2 is in. But the procedure isn’t over. Now SPSS will decide whether it
wants to add any more predictors to the equation. X2 already covers a fair amount of the
territory accounted for by the other predictors. In order to add to the predictive power of
the equation we need to think about which of the two remaining predictors would
contribute the most to the equation. We need to calculate the unique contributions of
both X1 and X3. The unique contribution of a predictor variable represents the
proportion of variability that the predictor adds to the regression equation above and
beyond the proportion of variability already accounted for by the
predictor or predictors already entered into the equation.
So what’s the unique contribution of X1? That’s the proportion of variability that X1
accounts for in Y that X2 doesn’t already account for. From the Venn diagram this is
going to be a proportion of variability of .08 or 8%. OK, so what’s the unique
contribution of X3? That’s the proportion of variability that X3 accounts for in Y that X2
doesn’t already account for. From the Venn diagram it looks the unique contribution of
X3 is a proportion of variability of .06 or 6%.
In the second step of the Forward method, SPSS determines which of the remaining
predictors makes the largest unique contribution to the equation and then it tests this
amount of variability to see if it’s significant. X1 makes the largest unique contribution
(8%). This percentage of variability is larger than the hypothetical minimum of 4% we
said that it takes for a contribution to be significant. Therefore SPSS will accept X1 into
the equation to join X2 because it is adding something to the equation that X2 doesn’t
already do. At this point, if the unique contribution of X1 had not been significant, the
process would have stopped and SPSS would have decided that the final regression equation
would only contain X2. But, the contribution of X1 is significant so now the equation has
two predictors: X1 and X2.
So now what does the program do? It goes through the same process all over again. It
determines which of the remaining predictors makes the largest unique contribution to the
equation and it tests this unique contribution to see if it’s significant. If it’s not, the
process stops and SPSS only uses the predictors it’s entered up to that point. If the unique
contribution is significant then it adds that predictor variable into the equation and it goes
on to identify the remaining predictor that makes the largest unique contribution to the
equation. Eventually, the overlaps among the predictors become so great that none of the
remaining predictors contribute anything significant above and beyond what predictors
already in the equation are able to do. This is the point at which the Forward method
stops and the point where no more predictors are entered into the equation.
At this point the only remaining predictor is X3. Its unique contribution is 6%. SPSS
tests this unique contribution and finds that it is significant so it adds X3 into the model,
as well. Therefore, using the Forward method, the final regression equation will include
all three predictor variables.
The strategy of the Forward method is that it starts with no predictors and it adds them
one at a time in the order of the variables that make the largest unique contributions to the
equation. The Forward method stops when the largest unique contribution of predictors
not already in the equation fails to reach significance. In the Forward method, once a
variable enters the equation it can never be removed.
In a sense, the Forward method is like tenure. Once a faculty member is tenured, you
can’t get rid of them, no matter if they contribute practically nothing to anyone.
The Backward method
The Forward method started with no predictors in the model and then added predictors
one at a time in the order in which they could account for additional portions of
variability in Y. The Forward method stopped when the largest unique contribution
among the remaining predictors was not statistically significant. The Backward method
starts from the opposite point.
The Backward method starts with all of the predictors in the model. In the first step SPSS
determines which predictor variable makes the smallest unique contribution in predicting
scores for Y. Using the example of our Venn diagram, when all of the variables have
been entered, the unique contribution of X1 is a proportion of .08 (8%), the unique
contribution of X2 is a proportion of .03 (3%), and the unique contribution of X3 is a
proportion of .06 (6%). So, X2 has the smallest unique contribution because it only adds
three percent above and beyond what the other two predictors already account for. SPSS
tests this weakest unique contribution to see if it’s significant. If the weakest unique
contribution is significant SPSS keeps this variable in the equation and the process stops.
The reasoning is that if the weakest contribution is significant then all of the variables
must be pulling their own weight. If a unique contribution is not significant then that
predictor is removed.
We said that just for the sake of illustration, a predictor has to make a contribution of four
percent to consider it significant. The three percent contribution of X2 is not significant,
so X2 is removed from the equation.
OK, X2 gets kicked out because it isn’t contributing anything significant. Now we’ve
only got X1 and X3 left in the equation. The next step in the Backward method is to
determine the unique contributions of both remaining variables. It has to re-calculate the
unique contributions of X1 and X3 because X2 is no longer overlapping with regions
covered by X1 or X3. The only overlap we need to take into account is the overlap
between X1 and X3.
With X2 out of there, the unique contribution of X1 is a proportion of .15 (.08 + .07) or
15%. The unique contribution of X3 is .14 (.06 + .08) or 14%. SPSS would determine that
X3 makes the smaller unique contribution (14%). It will test this smallest unique
contribution and find that it is significant. A 14% contribution is greater than our
hypothetical minimum to reach significance, so let’s say that SPSS would find that X3
makes a significant unique contribution.
At this point, the weakest unique contribution is significant so SPSS would stop
removing predictors and the final equation would be one that retains X1 and X3 as
predictor variables.
In summary, the strategy for the Backward method is that it starts with all of the
predictors and it then removes predictors one at a time when they fail to contribute a
significant amount of unique variability. The Backward method stops when the weakest
unique contribution of the variables in the equation is significant.
In this respect, the Backward method is analogous to a company that downsizes to try to
get the most work done using the fewest number of employees.
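
And here's a matching sketch of the Backward method, under the same assumptions as the forward sketch above (a pandas DataFrame df with hypothetical column names). Each predictor's unique contribution is judged by the p-value of its coefficient in the current model, which amounts to the same test of its unique contribution.

# A sketch of backward elimination based on each predictor's unique contribution
import statsmodels.api as sm

def backward_eliminate(df, criterion, candidates, alpha=0.05):
    selected = list(candidates)                  # start with every predictor in the model
    while selected:
        X = sm.add_constant(df[selected])
        model = sm.OLS(df[criterion], X).fit()
        p_values = model.pvalues.drop("const")   # one p-value per predictor
        weakest = p_values.idxmax()              # the smallest unique contribution
        if p_values[weakest] < alpha:
            break                                # everyone is pulling their own weight
        selected.remove(weakest)                 # drop the non-significant predictor
    return selected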
The Stepwise method
All right. One more. This last strategy for selecting predictor variables is perhaps the
most widely used. It’s referred to as the Stepwise method. The Stepwise method is
similar to the Forward method, except in one important respect. The Stepwise method
starts out the same as the Forward method – with no predictors in the model. In a first
step, SPSS identifies the predictor variable that has the largest squared correlation with
Y. It then tests this first predictor to see if it makes a significant contribution in predicting
scores for Y. In our example, SPSS would again select X2 and enter it into the equation
because it’s contribution of .21 is significant.
In Step 2, just like in the Forward method, SPSS will identify the remaining predictor that
would make the largest unique contribution to the equation. Just like the Forward
method, the Stepwise method will select X1 because its unique contribution (8%) is larger
than the unique contribution of X3 (6%). SPSS tests X1’s unique contribution of 8% to
see if it is significant. If it’s not, the process stops and the Stepwise method determines
that only X2 stays in the equation. If the unique contribution of the new variable, X1, is
significant, then the Stepwise method adds X1 to the equation. At this point SPSS will
ALSO go back and test the unique contribution of the variable already entered, X2. The
thing that makes the Stepwise method different from the Forward method is the fact that
the Stepwise method tests the unique contribution of every variable in the equation at
every step. This way, there’s a way of getting rid of predictor variables that, at some
point, are overlapped by the contributions of enough other predictors that they no longer
make a significant contribution to the equation.
At this point in step 2, SPSS tests the unique contributions of both X1 (8%) and X2 (11%)
and finds that they are both significant. So, for the moment, they’re both in.
In Step 3 the Stepwise method evaluates all of the remaining predictors to see which one
makes the largest unique contribution to the equation. At this point there is only one
remaining predictor, X3. The unique contribution of X3 is 6%. SPSS tests this unique
contribution and finds that it’s significant. Now SPSS goes back and tests the unique
contributions of both of the predictors it’s already entered, X1 and X2. The unique
contribution of X1 is still significant at 8%, so SPSS keeps X1. However, when both X1
and X3 are in the model the unique contribution of X2 (3%) is no longer significant, so
SPSS removes it from the equation. It may seem strange, but the single best predictor is
eventually removed because the other predictors together account for almost all of the
variability that it does.
In summary, the Stepwise method is similar to the Forward method, in that it starts with
no predictors and then adds predictor variables one at a time if they make significant
unique contributions to the equation. The stepwise method also provides a way to remove
predictor variables from a regression model after they have been added in an earlier step.
In this respect, the Stepwise method is analogous to Post-Tenure Review at a university.
A professor might be tenured, but if they screw up badly enough you can get rid of them!
Hierarchical regression: testing theories about relationships among variables
The Forward, Backward, and Stepwise methods are automated methods for generating
regression equations that are efficient. They’re great for developing prediction equations
that account for the most variability while using the fewest number of predictors. If the
goal of your work is prediction, then we’ve already talked about a great deal of what you
need to know.
However, many researchers are interested in answering questions that are more
theoretical in nature. The questions often concern an idea about how a set of variables are
related to each other. In other words, a researcher has an idea about what the Venn diagram for
a set of variables looks like – and they want to use multiple regression to determine what
the Venn diagram actually looks like.
For example, a researcher might think that Social Support will still account for a
significant amount of variability in scores for Quality of Life after Financial Assets has
already been entered into the equation. In other words, the investigator is predicting that
the relationship between Social Support and Quality of Life is not something that can be
fully accounted for by the Financial Assets of the family. In order to
answer that type of question we need to have control over the order in which each
variable enters a regression equation. SPSS makes it very easy to do this.
When the researcher specifies the order in which variables enter a regression equation
this is referred to as hierarchical regression. This process is very different from the
automated strategies we talked about above. In the Forward, Backward, and Stepwise
methods SPSS determined the order in which variables were entered. In hierarchical
regression, the researcher determines the order of entry into the equation. Again, in
hierarchical regression the goal is not to develop an effective prediction equation. The
goal is to test hypothesized relationships among variables.
If you’re using SPSS, the method that corresponds to hierarchical regression is the Enter
method. In the question outlined above, the researcher wants to test the idea that Social
Support still makes a significant unique contribution in predicting scores for Quality of
Life, even after the predictor variable Financial Assets has already been entered into the
equation. To answer this question the researcher needs to enter Financial Assets in a
regression equation first and see how good a job it does all by itself. One statistic that
comes in very handy in conducting a hierarchical regression analysis is the R Square
Change option. Here, we would tell SPSS to enter Financial Assets in a first block of
predictors. Then the researcher would tell SPSS to enter Social Support in a second block
of predictors.
In its output, SPSS will report two sets of results. It will tell you how good a job Financial
Assets does all by itself and it will tell you how the regression equation did when it
used both predictors. It will also show you the degree to which the predictive power of
the equation improves when that second predictor is added.
OK, here’s what the output looks like…
Variables Entered/Removed(b)

Model   Variables Entered      Variables Removed   Method
1       Financial Assets(a)            .            Enter
2       Social Support(a)              .            Enter

a. All requested variables entered.
b. Dependent Variable: Quality of Life

Model Summary

                          Adjusted     Std. Error of     --------------- Change Statistics ---------------
Model     R    R Square   R Square     the Estimate      R Square Change   F Change   df1   df2   Sig. F Change
1       .359a    .129       .102         2.86176               .129          4.874      1    33       .034
2       .534b    .285       .240         2.63271               .156          6.992      1    32       .013

a. Predictors: (Constant), Financial Assets
b. Predictors: (Constant), Financial Assets, Social Support

ANOVA(c)

                     Sum of
Model                Squares     df     Mean Square      F       Sig.
1    Regression       39.913      1        39.913       4.874    .034a
     Residual        270.258     33         8.190
     Total           310.171     34
2    Regression       88.374      2        44.187       6.375    .005b
     Residual        221.798     32         6.931
     Total           310.171     34

a. Predictors: (Constant), Financial Assets
b. Predictors: (Constant), Financial Assets, Social Support
c. Dependent Variable: Quality of Life

Coefficients(a)

                          Unstandardized            Standardized
                          Coefficients              Coefficients
Model                     B            Std. Error    Beta            t        Sig.
1    (Constant)           9.773           .662                      14.760    .000
     Financial Assets     1.87E-006       .000        .359           2.208    .034
2    (Constant)           6.431          1.403                       4.583    .000
     Financial Assets     1.75E-006       .000        .334           2.232    .033
     Social Support        .126           .048        .396           2.644    .013

a. Dependent Variable: Quality of Life
Most of this output is already familiar to you. One thing that’s different is that we’ve got two
sets of results. The first one, corresponding to the first “Model”, contains information
about the regression equation when only Financial Assets is used as a predictor. The second
“Model” reports results for the regression equation that uses both Financial Assets and
Social Support as predictors.
The second thing that’s new is contained in the Model Summary table. You can see in the
R Square column that the proportion of variability accounted for when the only predictor
variable is Financial Assets is .129. The value for R Square goes up to .285 when Social
Support is added to the equation. If you subtracted .129 from .285 you would arrive at a
value of .156. This means that Social Support accounts for an additional 15.6% of the
variability that scores for Financial Assets could not account for.
This last piece of information is also provided in the R Square Change column of the
Model Summary. An R Square Change indicates the amount of change in the proportion
of variability accounted for when an additional block of predictors (in this case one
predictor variable) is added to the equation. The R Square Change for the second model
is the same number we just came up with, .156. In this case, the R Square Change
represents the unique contribution made by Social Support when Financial Assets has
already been entered into the equation.
The F-ratio displayed just to the right of the R Square Change is testing this proportion of
variability – this unique contribution – to see if it is statistically significant. In this case
the F-ratio is 6.992 and its significance level is .013, indicating that Social Support
accounts for a significant proportion of variability in scores for Quality of Life, above and
beyond the variability accounted for by Financial Assets.
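
If you wanted to check that F-ratio by hand, the R Square Change F-test works out like this (a sketch of the formula, plugging in the values reported in the Model Summary above).

# F-test of the R Square Change, using the values from the Model Summary above
r2_reduced = 0.129           # Financial Assets alone
r2_full = 0.285              # Financial Assets plus Social Support
n, k_full, m = 35, 2, 1      # sample size, predictors in the full model, predictors added

r2_change = r2_full - r2_reduced
f_change = (r2_change / m) / ((1 - r2_full) / (n - k_full - 1))
print(round(r2_change, 3), round(f_change, 2))   # about .156 and 6.98 (the output above
                                                 # shows 6.992, computed from unrounded values)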
Generating a Venn diagram with two predictor variables
Ok, we now know that the unique contribution of Social Support, above and beyond that
of Financial Assets, is 15.6%. In all, when both Social Support and Financial Assets are
used as predictors the equation accounts for 28.5% of the variability in Quality of Life.
What more would we need to do to generate a Venn diagram that shows us the
relationships of each predictor variable with the criterion and the degree to which the
predictor variables overlap with each other? It turns out that there’s only one more
analysis we need to do. Let’s draw a sample Venn diagram that shows us the information
we have and the information we need.
Figure X.4
This diagram shows us that we need to know the unique contributions for the two
variables and the degree of overlap of the two predictors in accounting for the criterion.
We already know the unique contribution for Social Support. It’s a proportion of .156 or
a percentage of 15.6%. I’ve added this in below…
Figure X.5
Now we need to go get the unique contribution of the other predictor variable, Financial
Assets. To do this, we just ask SPSS to do the same thing we did before – we just need to
reverse the order of entry of the variables. In other words, enter Social Support first and
Financial Assets second. Here’s what the output from SPSS looks like when we do this…
Variables Entered/Removed(b)

Model   Variables Entered      Variables Removed   Method
1       Social Support(a)              .            Enter
2       Financial Assets(a)            .            Enter

a. All requested variables entered.
b. Dependent Variable: Quality of Life

Model Summary

                          Adjusted     Std. Error of     --------------- Change Statistics ---------------
Model     R    R Square   R Square     the Estimate      R Square Change   F Change   df1   df2   Sig. F Change
1       .417a    .174       .149         2.78703               .174          6.932      1    33       .013
2       .534b    .285       .240         2.63271               .111          4.982      1    32       .033

a. Predictors: (Constant), Social Support
b. Predictors: (Constant), Social Support, Financial Assets

ANOVA(c)

                     Sum of
Model                Squares     df     Mean Square      F       Sig.
1    Regression       53.842      1        53.842       6.932    .013a
     Residual        256.329     33         7.768
     Total           310.171     34
2    Regression       88.374      2        44.187       6.375    .005b
     Residual        221.798     32         6.931
     Total           310.171     34

a. Predictors: (Constant), Social Support
b. Predictors: (Constant), Social Support, Financial Assets
c. Dependent Variable: Quality of Life

Coefficients(a)

                          Unstandardized            Standardized
                          Coefficients              Coefficients
Model                     B            Std. Error    Beta            t        Sig.
1    (Constant)           7.184          1.442                       4.982    .000
     Social Support        .133           .051        .417           2.633    .013
2    (Constant)           6.431          1.403                       4.583    .000
     Social Support        .126           .048        .396           2.644    .013
     Financial Assets     1.75E-006       .000        .334           2.232    .033

a. Dependent Variable: Quality of Life
When the variable Financial Assets is entered second the R Square Change is .111. This
means that Financial Assets accounts for 11.1% of the variability in scores for Quality of
Life, above and beyond the variability accounted for by Social Support. In other words,
the unique contribution of Financial Assets is 11.1%. Let’s put the unique contribution
for Financial Assets into our Venn diagram.
Figure X.6
Now the only thing left is the percentage of variability that corresponds to the overlap.
This is easy. We know that when both predictors are used we account for 28.5 percent of
the variability. The overlap must be the percentage of variability that’s left over when we
subtract the unique contributions of both variables from this total percentage of
variability accounted for. Doing the math we get…
Overlap = Total percentage of variability accounted for
          – Unique for Social Support – Unique for Financial Assets

Overlap = 28.5% – 15.6% – 11.1% = 1.8%
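
The same bookkeeping as a short sketch, using the R Square values reported in the two sets of output above.

# Unique contributions and overlap from the reported R Square values
r2_both = 0.285      # Social Support and Financial Assets together
r2_social = 0.174    # Social Support alone
r2_assets = 0.129    # Financial Assets alone

unique_social = r2_both - r2_assets                  # .156
unique_assets = r2_both - r2_social                  # .111
overlap = r2_both - unique_social - unique_assets    # .018
print(round(unique_social, 3), round(unique_assets, 3), round(overlap, 3))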
So the percentage of overlap of the two predictors in accounting for scores
for Quality of Life is 1.8%. Now we can fill in the last thing we need in the Venn
diagram.
Figure X.7
And that’s it! One way of thinking about hierarchical regression is that it’s like flying a
plane on manual. You get to say which variables enter the equation and when. The
Forward, Backward, and Stepwise methods are like flying the plane on autopilot. The
computer knows what the rules are and it will follow those rules without the pilot having
to think about what’s really going on. Both hierarchical regression and the automated
strategies address important goals and objectives – they’re just not the same goals and
objectives.