Multiple Regression
Dr. Tom Pierce
Department of Psychology
Radford University

In the previous chapter we talked about regression as a technique for using a person's score on one variable to make a best guess about that person's score on another variable. The regression equation, Y' = a + bX, tells you what to do with a person's score for X to generate the best guess you could make about a person's score for Y. But why should the researcher have to base their best guess on just one piece of information? Why can't you take what you know about a person's scores on two variables, or three, or four, to help you make that best guess? The answer is that you can. Multiple regression is a set of techniques for generating a predicted score for one variable from two or more predictor variables. And the nice thing about multiple regression is that it's just an extension of regression with one predictor variable. All of the basic principles we covered in the last chapter still hold true in this chapter.

Let's say that a researcher works with Alzheimer's caregivers. We know that persons in this situation are more likely to suffer from many of the common markers of chronic stress. For example, they are more likely to be depressed, to display impaired immune function, and to report problems with sleep, high blood pressure, and ulcers. The researcher is interested in predicting the quality of life of caregivers two years after a diagnosis of Alzheimer's disease in the person they're caring for. The researcher would need to think about the types of information they could collect now that might help them to make an accurate prediction about a score on a measure of quality of life two years from now. One variable that might be helpful is the amount of social support the caregiver has access to. A second potential predictor variable might be the age of the caregiver. A third predictor variable might be the financial assets to which the family has access.

Obviously, a study of this type would attempt to collect data from dozens, if not hundreds, of families affected by Alzheimer's disease. I'm going to present hypothetical data for 35 caregivers, just so that we have something to talk about. So, imagine a study where measures of social support, caregiver age, and caregiver financial assets are obtained at the time a spouse is diagnosed with Alzheimer's disease and then a measure of quality of life is obtained two years later.

©Thomas W. Pierce 2005

Table X.1. Made-up data for the predictors of scores for quality of life.
Participant   Social     Caregiver   Financial    Quality    Year
              Support    Age         Assets       of Life    Born
-----------   --------   ---------   ----------   --------   -----
1             16         56             275,000   12         1950
2             26         44             325,000    8         1962
3             17         75           1,500,000   12         1931
4             27         59           2,100,000   12         1947
5             40         58             560,000   10         1948
6             20         78             790,000    9         1928
7             28         63           1,100,000   12         1943
8             38         44             973,000   18         1962
9             35         59             372,000   11         1947
10            21         76              70,000    8         1930
11            41         50             210,000   10         1956
12            10         82              65,000    5         1924
13            26         79           1,150,000    7         1927
14            38         69              15,000   10         1937
15            29         76              36,000    9         1930
16            36         73              72,000   15         1933
17            35         68             221,000   11         1938
18            15         71              14,000    8         1935
19            23         71             115,000    9         1935
20            29         75              28,000    8         1931
21            45         63             550,000   16         1943
22            23         79              79,000   10         1927
23            11         75              35,000   14         1931
24            15         67             110,000    8         1939
25            33         67             270,000   12         1939
26            16         54             250,000   11         1952
27            25         41             285,000    9         1965
28            16         75             120,000   13         1931
29            29         61             210,000   13         1945
30            42         56             560,000   11         1950
31            19         79             650,000    8         1927
32            27         65             130,000   11         1941
33            36         67             945,000   19         1939
34            34         57             272,000   10         1949
35            23         75              50,000    8         1931

Value of each predictor variable entered separately

One good place to start when working with a criterion variable and a bunch of potential predictor variables is to look at the correlations of these variables with each other. This gives you a sense of how good a job each predictor variable would do if it were used all by itself to predict the criterion. It will also give you a sense of the degree to which the predictor variables are correlated with each other. Here's the correlation matrix for the four variables in our study.

Correlations (N = 35 for every pair)

Pearson Correlation     Quality of Life   Social Support   Financial Assets    Age
  Quality of Life            1.000             .417              .359         -.282
  Social Support              .417            1.000              .062         -.417
  Financial Assets            .359             .062             1.000         -.118
  Age                        -.282            -.417             -.118         1.000

Sig. (1-tailed)
  Quality of Life              .                .006              .017          .050
  Social Support              .006               .                .362          .006
  Financial Assets            .017              .362               .            .250
  Age                         .050              .006              .250           .

You can see that the strongest correlation of a predictor variable with quality of life is .417 for social support. Squaring this correlation tells us that scores for social support account for 17.4% of the variability in scores for quality of life. The weakest correlation of a predictor variable with quality of life is the value of -.282 for age. Squaring this correlation tells us that scores for age account for 7.95% of the variability in scores for quality of life.

If someone were to tell you that you were only allowed to use one predictor variable, which one would you choose? Obviously, you'd pick the predictor variable that would give you the most accurate predicted scores you could get. This means you'd pick the predictor variable that has the strongest relationship with Quality of Life, which is Social Support.

Okay. You enter Social Support in the regression routine of a program like SPSS to predict scores for Quality of Life. The output tells you that Social Support accounts for 17.4% of the variability in Quality of Life and that the Standard Error of Estimate is 2.79. The output from SPSS is shown below.

Model Summary

Model    R       R Square   Adjusted R Square   Std. Error of the Estimate
1        .417a   .174       .149                2.78703
a. Predictors: (Constant), Social Support

Coefficients(a)

                      Unstandardized Coefficients   Standardized Coefficients
Model                 B        Std. Error            Beta      t       Sig.
1   (Constant)        7.184    1.442                           4.982   .000
    Social Support    .133     .051                  .417      2.633   .013
a. Dependent Variable: Quality of Life
SPSS's Coefficients table above shows you that the y-intercept (the Constant) is 7.184 and the slope of the regression line is .133. This means that the regression equation for predicting scores for Quality of Life from scores for Social Support is:

Predicted score for Quality of Life = 7.184 + .133(Score for Social Support)

The ANOVA table provided by SPSS shows us that scores for Social Support account for a significant amount of variability in scores for Quality of Life.

ANOVA(b)

Model           Sum of Squares   df   Mean Square   F       Sig.
1  Regression       53.842        1      53.842     6.932   .013a
   Residual        256.329       33       7.768
   Total           310.171       34
a. Predictors: (Constant), Social Support
b. Dependent Variable: Quality of Life

The F-test for the regression equation is significant, so we know that we're not wasting our time using regression. Predicted scores for Quality of Life based on scores for Social Support are significantly more accurate than just using the mean score for Quality of Life. We also know that social support, all by itself, accounts for 17.4 percent of the variability in scores for quality of life.

Using two predictors to predict Quality of Life

Okay. Regression with one predictor variable; you've done that before. Now, let's say you ask SPSS to use two predictor variables to help you to make a best guess for someone's score for Quality of Life. It's just as easy to ask SPSS to use two predictors as to ask it to use one. Just move the two predictor variables you want to use into the Independent(s) box, make sure the Method selected is Enter, and then hit the OK button. No big deal. Let's say we include the two predictor variables that are the most strongly correlated with quality of life. This means that we're going to use Social Support (r = .417) and Financial Assets (r = .359). Here's the Model Summary using Social Support and Financial Assets as predictors:

Model Summary

Model    R       R Square   Adjusted R Square   Std. Error of the Estimate
1        .534a   .285       .240                2.63271
a. Predictors: (Constant), Financial Assets, Social Support

There are a couple of things to notice right off the bat. You'll see the symbol R at the top of the second column. The value for R is .534. This is considerably higher than the value of .417 that we got when Social Support was the only predictor. This value for R is known as a multiple correlation. It represents the correlation between the set of two predictor variables and the one variable being predicted. Even though correlations between individual predictors and the criterion can be either positive or negative, the multiple correlation is always reported as a positive number. Another way of thinking about it is to say that it represents the correlation between predicted scores for Y (which are based on information from the two predictors) and actual scores for Y. It's a little more obvious from this angle that R is always going to be a positive number, because the correlation between predicted scores and actual scores is always going to end up positive – higher predicted scores for Y are going to be associated with higher actual scores for Y.

The column to the right of it is labeled "R Square". The value for R Square of .285 represents the squared multiple correlation between the multiple (two) predictors and the one criterion variable. This number indicates that the combination of our two predictors is able to account for 28.5 percent of the variability in scores for Quality of Life.
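If you like to check output like this with a few lines of code, the Model Summary can be reproduced with ordinary least squares. Here is a minimal sketch in Python, assuming the 35 scores from Table X.1 have been typed into numpy arrays; the variable names are just illustrative, and only the first five cases are shown as placeholders. With all 35 cases filled in, the printed values should match the SPSS output (R = .534, R Square = .285) up to rounding.

import numpy as np

# Placeholder arrays: the first five cases from Table X.1.
# Fill in the remaining 30 cases to reproduce the SPSS results.
social_support   = np.array([16, 26, 17, 27, 40], dtype=float)
financial_assets = np.array([275000, 325000, 1500000, 2100000, 560000], dtype=float)
quality_of_life  = np.array([12, 8, 12, 12, 10], dtype=float)

# Design matrix: a column of 1s for the constant plus one column per predictor.
X = np.column_stack([np.ones(len(quality_of_life)), social_support, financial_assets])

# Ordinary least squares picks a, b1, b2 so that the sum of squared errors of
# prediction, the sum of (Y - Y')^2, is as small as possible.
coefs, *_ = np.linalg.lstsq(X, quality_of_life, rcond=None)
predicted = X @ coefs

# The multiple correlation R is the Pearson correlation between predicted and
# actual scores for Y; squaring it gives R Square.
R = np.corrcoef(predicted, quality_of_life)[0, 1]
print("a, b1, b2:", coefs)
print("R =", round(R, 3), " R Square =", round(R ** 2, 3))

The same design-matrix idea extends to any number of predictors; adding a third predictor like Age would just mean adding one more column.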
This is quite a bit more than the percentage of variability accounted for by Social Support alone (17.4). So, adding in Financial Assets as a predictor variable gives us an additional 11.1 percent of variability accounted for (i.e., 28.5 – 17.4 = 11.1). That makes it sound like it was worth adding Financial Assets to the equation.

Another metric of whether adding a second predictor variable is really worth our while is the standard error of estimate displayed in the last column on the right. A standard error of estimate in multiple regression means exactly the same thing as it meant in simple regression. It's the average amount that our predicted scores are off by – it's just that here we're basing our best guesses on two pieces of information, rather than just one piece of information. The value for the standard error of estimate displayed in the Model Summary is 2.63. That means that predicted scores for Quality of Life (based on using Social Support and Financial Assets as predictors) are going to differ from actual scores for Quality of Life by an average of 2.63 points. When we only used Social Support the standard error of estimate was 2.79. It might not seem that adding the second predictor variable improved the standard error of estimate by all that much but, in fact, it went down by 5.5%.

Okay, the next thing SPSS gives us is an F-test of whether this combination of two predictor variables accounts for a significant amount of variability in scores on the criterion. Here it is:

ANOVA(b)

Model           Sum of Squares   df   Mean Square   F       Sig.
1  Regression       88.374        2      44.187     6.375   .005a
   Residual        221.798       32       6.931
   Total           310.171       34
a. Predictors: (Constant), Financial Assets, Social Support
b. Dependent Variable: Quality of Life

You'll notice that the format for the ANOVA table is exactly the same as when we had one predictor variable: SS Regression, SS Residual, etc. One place where things do look a little different is in the degrees of freedom column. Take a look at the number of degrees of freedom for the Regression row. It's "2". That's because the number of degrees of freedom for regression is equal to the number of predictor variables and we've got two predictor variables (k = 2). The number of degrees of freedom for the Residual row is 32. That comes from starting with 35 participants (N = 35), subtracting the number of predictor variables (k = 2), and then subtracting one more degree of freedom. The equation for the number of degrees of freedom residual is N – k – 1. So, we get 35 – 2 – 1 = 32. The total number of degrees of freedom stays the same at 34 (N – 1).

Another thing to notice about the ANOVA table is that you can use it to calculate the squared multiple correlation. The squared multiple correlation represents the proportion of variability accounted for by the two predictor variables. This value is equal to the sum of squares accounted for by the regression equation (SS Regression = 88.374) divided by the sum of squares total (310.171). 88.374 divided by 310.171 equals .285, the same number that the Model Summary gave us before. The F-ratio from the ANOVA table is 6.375 and statistically significant, so we can say that the regression model using Social Support and Financial Assets as predictors accounts for a significant amount of variability in scores for Quality of Life.

Now let's look at the Coefficients section of the output.
Coefficients(a)

                        Unstandardized Coefficients   Standardized Coefficients
Model                   B           Std. Error         Beta      t       Sig.
1   (Constant)          6.431       1.403                        4.583   .000
    Social Support      .126        .048                .396     2.644   .013
    Financial Assets    1.75E-006   .000                .334     2.232   .033
a. Dependent Variable: Quality of Life

In raw score (unstandardized) units, the regression equation has a y-intercept (constant) of 6.431, the weighting applied to Social Support is .126, and the weighting applied to Financial Assets is 1.75 X 10^-6 (this means move the decimal point over to the left by 6 places), giving us .00000175. Collecting all of this information in one equation we get…

Quality of Life' = 6.431 + .126(Social Support) + .00000175(Financial Assets)

When you add another predictor variable, all that happens is that you add an additional component to the equation. You can add as many predictor variables as you want. Each predictor in the model will get a weighting or coefficient that states what you would need to multiply a score for that predictor by in order to maximize the predictive power of the equation. By the way, the reason the coefficient applied to the Financial Assets predictor is such a small number is that the units of measurement for this variable are so large (tens or hundreds of thousands of dollars). In general, the format for a regression equation looks like this…

Y' = a + b1X1 + b2X2 + b3X3 + … + bkXk

… where k equals the number of predictor variables.

Once you get the regression equation using SPSS, the process of getting predicted scores is the same as with simple regression. Plug the raw scores for the predictor variables into the equation and there you are. Let's say that a person has a score on the measure of Social Support of 23 and they have 50,000 in financial assets. When we plug these numbers into the regression equation we get…

Quality of Life' = 6.431 + .126(23) + .00000175(50,000) = 6.431 + 2.898 + .0875 = 9.42

The criterion for selecting regression coefficients in multiple regression

When there are two predictor variables it takes three regression coefficients to write the regression equation (y-intercept, coefficient for predictor 1, coefficient for predictor 2). SPSS gave us values of 6.431, .126, and .00000175 for these three coefficients, respectively. So why these numbers? What makes them so special? The answer at the surface is that this combination of numbers allows us to write the regression equation that gives us the most accurate predicted values possible for Quality of Life. So how do we know that these numbers do this? What's the criterion for knowing that we're getting the most accurate predicted scores possible?

It turns out that we've already talked about the answer to this question. It's the same answer we arrived at when using one predictor variable in Chapter XX. Remember, the whole point of regression is to obtain predicted scores for Y that are as close to the actual scores for Y as we can get. Another way of putting this is that we want to make the differences between predicted scores for Y (values for Y') and actual scores for Y (values for Y) as small as possible. In other words, we want the average error of prediction (Y – Y') to be as small as we can make it. A statistician will know that this is happening when a regression equation produces values for Y' that make the sum of squared errors of prediction as small as it can be. The criterion for knowing that we're using the best regression equation possible is that it results in the fact that…

Σ(Y – Y')² is a minimum.

This is the same thing we said when we were talking about simple regression!
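To make the prediction step and that criterion concrete, here is a small sketch in the same spirit as the earlier one (the function names are just illustrative, not anything SPSS produces). It plugs scores into the two-predictor equation reported by SPSS and then spells out the quantity, Σ(Y – Y')², that the fitted coefficients are chosen to minimize.

# The two-predictor equation reported by SPSS: Y' = a + b1*X1 + b2*X2.
def predict_quality_of_life(social_support, financial_assets):
    return 6.431 + 0.126 * social_support + 0.00000175 * financial_assets

# The worked example from the text: Social Support = 23, assets = 50,000.
print(round(predict_quality_of_life(23, 50000), 2))   # prints 9.42

# The least-squares criterion: among every possible choice of a, b1, and b2,
# the fitted coefficients are the ones that make this sum of squared errors
# of prediction as small as possible for the sample.
def sum_of_squared_errors(actual_scores, predicted_scores):
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual_scores, predicted_scores))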
The equations for a, b1, and b2 are arranged the way they are in order to make this happen. If we're using a program like SPSS, we don't really need to know what the equations are, but when SPSS uses them that's what they do.

The regression equation as the equation for a straight line

Another way of thinking about the coefficients in multiple regression goes back to the original way we talked about regression, in the context of using one predictor variable. If you remember, we had two variables, X and Y, and we plotted people's scores for both variables in a scatterplot. The scatterplot showed the pattern of the relationship between the two variables. We then said that we wanted to capture this relationship by running a straight line as close to these points in the scatterplot as we could. The regression equation was the equation for this straight line.

Well, the regression equation we wrote above for the situation when we had two predictor variables is also the equation for a straight line. This probably seems like a stretch. I mean, to draw a straight line you just need a y-intercept and a slope, right? True, if you're drawing that straight line on a flat surface. When there are two variables to deal with there are two axes for the scatterplot: up-down (Y) and left-right (X). These two dimensions define a flat surface. When there are two predictor variables there are three variables involved. This means if you want to show someone a picture of where a person is in terms of their scores on all three variables you're going to have to show them a drawing with three dimensions – we're going to have to add a third axis (back to front) for our second predictor variable. The graph below shows the location of one subject within this three-dimensional space. Instead of a point set within a 2-D surface, we now have, essentially, a bubble floating in this 3-D space. The line going from the bubble to the floor of the graph is there to help us get a sense of exactly where the bubble is floating.

Figure X.1. A 3-D scatterplot (Quality of Life by Social Support by Financial Assets) showing the location of one participant.

OK, the graph above shows where one subject is located. Now let's look at the 3-D scatterplot of where all 35 participants are located.

Figure X.2. A 3-D scatterplot showing all 35 participants on Quality of Life, Social Support, and Financial Assets.

When there are two predictor variables, the regression equation is really the equation for how to draw a straight line in three dimensions. It's the line that runs as close to those bubbles floating in that 3-D space as we can get. When there are more than two predictor variables you need more than three axes and three dimensions to capture all of the information you have about each subject. That means you have to try to picture a straight line in more than three dimensions, which is, like, impossible, unless you're Stephen Hawking or something. Fortunately, the math stuff works out so that no matter how many predictor variables we have, SPSS can give us the regression coefficients we need.

Maximizing the proportion of variability accounted for

OK, I give. Enough about the fourth or fifth dimensions. Just tell me what I'm supposed to do. Alright. Here's a question. You want to maximize the proportion of variability accounted for. You've got three predictor variables available to you.
Which ones should you use? What do you do? If you want to maximize the proportion of variability accounted for – if you want to obtain predicted values for Y that are as accurate as you can get – use all the predictor variables you've got!

We've got three predictor variables available to us. Let's see how much variability we can account for when we include all three in one regression model. Ok, we go back to the Linear Regression window and enter Social Support, Financial Assets, and Age. Here's the output…

Model Summary

Model    R       R Square   Adjusted R Square   Std. Error of the Estimate
1        .541a   .292       .224                2.66092
a. Predictors: (Constant), Age, Financial Assets, Social Support

ANOVA(b)

Model           Sum of Squares   df   Mean Square   F       Sig.
1  Regression       90.675        3      30.225     4.269   .012a
   Residual        219.496       31       7.081
   Total           310.171       34
a. Predictors: (Constant), Age, Financial Assets, Social Support
b. Dependent Variable: Quality of Life

Coefficients(a)

                        Unstandardized Coefficients   Standardized Coefficients
Model                   B           Std. Error         Beta      t        Sig.
1   (Constant)          8.518       3.926                        2.170    .038
    Social Support      .114        .053                .357     2.147    .040
    Financial Assets    1.70E-006   .000                .325     2.139    .040
    Age                 -.026       .046               -.095     -.570    .573
a. Dependent Variable: Quality of Life

When all of the predictor variables are entered the squared multiple correlation goes up to .292 from the value of .285 we got when only two predictors were used. We didn't gain all that much – only an additional .7 percent – but the question was how to make the proportion of variability accounted for as large as it could be, and that's it. Accounting for 29.2 percent of the variability is as good as we can do.

If you look at the t-tests on the right side of the Coefficients table, you'll notice that only the regression coefficients for Social Support and Financial Assets are significant. The beta weight for Age is only -.095, which carries a significance level of .573. This tells us that only Social Support and Financial Assets contribute significantly to the regression equation.

Automated strategies for selecting predictor variables

Ok, to maximize the proportion of variability accounted for, use every predictor variable. However, using every predictor variable may not always be very practical. For example, in the analysis with three predictors described above, Age contributed almost nothing to the regression equation. It probably doesn't make sense to include a predictor in a regression equation if we're unable to reject the null hypothesis that the variable contributes nothing! In another research context it might be the case that a given variable contributes an extra half a percent to the predictive power of a regression model, but maybe it costs $50 to obtain a score on that variable for each person. The expense of obtaining scores for that variable may not be worth the ability to get predicted scores that are trivially more accurate.

So, if we've got data from a bunch of potential predictor variables how do we decide which predictors to use and which ones to leave out? How do we account for the most variability using the fewest number of predictor variables? In other words, in most situations we're not going to want to use every possible predictor. We're going to want to use the most efficient regression equation possible. We want a regression equation that is lean and mean! We just said that we want a regression equation that is efficient.
To generate a rule for selecting predictor variables we need a definition for what it means for a regression equation to be efficient. SPSS provides a number of automated procedures for selecting predictor variables. These procedures differ from each other in terms of the definitions they use for what it means for an equation to be efficient. These definitions translate into rules or algorithms for selecting some variables for inclusion and leaving out others. I'm going to describe three of these algorithms for selecting predictor variables: the Forward, Backward, and Stepwise methods.

To explain how they work I'm going to use a type of diagram known as a Venn diagram. The Venn diagram is cool because it shows the degree to which the predictor variables overlap with the criterion and, in addition, the degree to which the predictor variables overlap with each other. Here's a sample Venn diagram. I've labeled the variables Y, X1, X2, and X3.

Figure X.3

Before we talk about the Forward method, let's look at the Venn diagram above. The area of the largest circle represents the total amount of variability in the criterion variable Y. The other three circles represent the amounts of variability in the predictor variables X1, X2, and X3, respectively. The numbers represent proportions of variability. So the number .65 tells us that 65% of the variability in scores for Y is not overlapped or accounted for by any of the three predictors. The other numbers correspond to proportions of variability in regions of Y that are overlapped by one or more predictor variables. In other words, these numbers can be interpreted in squared correlation units.

Let's say we want to determine the proportion of overlap between X1 and Y. There are three regions in the diagram that contribute to the overlap of X1 with Y. These are the regions with the numbers .08, .07, and .03. They add up to a total of .18, which tells us that X1 by itself accounts for 18% of the variability in Y. The diagram also shows us the degree to which any two or three predictor variables overlap with each other. For example, there are two regions that comprise the overlap between X1 and X2. These are the regions with the proportions of variability of .07 and .03.

Now let's use a Venn diagram to illustrate the logic that SPSS goes through in executing the Forward method.

The Forward method for selecting predictor variables

The goal of all three algorithms for selecting predictors is to account for the most variability in the criterion (Y) with the fewest predictor variables. That means we want to pick individual predictors that have the most overlap with Y and, at the same time, have the least amount of overlap with each other. We don't want overlap among the predictors because accounting for the same variability twice (redundant predictors) doesn't help the equation at all.

The Forward method starts when the researcher selects a set of variables for SPSS to pick from. All the researcher has to do is pick the variables and then click OK. SPSS does the rest. The first thing SPSS will do is identify the predictor variable that would do the best job all by itself in predicting scores for Y. In other words, SPSS will find the predictor variable that has the highest squared correlation with Y. Go back and take a look at our Venn diagram. Which of the three predictors has the highest squared correlation with Y? This is the same thing as asking which of the predictors has the greatest amount of overlap with Y.
By adding up the various regions that correspond to each predictor's overlap with Y we get:

R²(Y*X1) = .18
R²(Y*X2) = .21
R²(Y*X3) = .17

This tells us that if the researcher were only allowed to use one predictor variable they ought to use X2. This is because it accounts for a larger percentage of the variability in Y than either of the other two predictors (21%). So, at this point SPSS will test this proportion of variability to see if it's significant. If it's not, the whole process ends and SPSS concludes that none of the predictors are worth anything. After all, if the very best predictor of the bunch fails to account for a significant amount of variability then none of the other predictors will either. If this happens the researcher should probably make the L sign over their forehead and go home and sulk for a while. When they come back, the only thing the researcher can do is go find a different set of predictors that are more strongly correlated with the criterion. But hopefully it won't come to this and SPSS will report that the contribution of this first variable is significant.

Now, when SPSS tests this squared correlation of .21, the question of whether .21 is good enough to be significant will be determined by the sample size of the study. With a decent sample size, .21 will be good enough. With a small sample size, it won't. Because we want to be able to talk about the logic of the procedure let's say that, hypothetically, just for the sake of argument, a variable has to make a contribution of 4% to the equation to consider it statistically significant. In the example from the Venn diagram X2 by itself accounts for 21 percent of the variability in Y. We're saying that it only has to account for 4 percent to consider that contribution significant. Twenty-one percent is clearly greater than four percent so let's say that the contribution of X2 is significant.

In the Forward method, once a variable has entered the regression equation it never comes out. So, X2 is in. But the procedure isn't over. Now SPSS will decide whether it wants to add any more predictors to the equation. X2 already covers a fair amount of the territory accounted for by the other predictors. In order to add to the predictive power of the equation we need to think about which of the two remaining predictors would contribute the most to the equation. We need to calculate the unique contributions of both X1 and X3. The unique contribution of a predictor variable represents the proportion of variability that the predictor adds to the regression equation above and beyond the proportion of variability already accounted for by the predictor or predictors already entered into the equation.

So what's the unique contribution of X1? That's the proportion of variability that X1 accounts for in Y that X2 doesn't already account for. From the Venn diagram this is going to be a proportion of variability of .08 or 8%. OK, so what's the unique contribution of X3? That's the proportion of variability that X3 accounts for in Y that X2 doesn't already account for. From the Venn diagram it looks like the unique contribution of X3 is a proportion of variability of .06 or 6%.

In the second step of the Forward method, SPSS determines which of the remaining predictors makes the largest unique contribution to the equation and then it tests this amount of variability to see if it's significant. X1 makes the largest unique contribution (8%).
This percentage of variability is larger than the hypothetical minimum of 4% we said it takes for a contribution to be significant. Therefore SPSS will accept X1 into the equation to join X2 because it is adding something to the equation that X2 doesn't already do. At this point, if the unique contribution of X1 had not been significant, the process would have stopped and SPSS would have decided that the final regression equation should only contain X2. But, the contribution of X1 is significant so now the equation has two predictors: X1 and X2.

So now what does the program do? It goes through the same process all over again. It determines which of the remaining predictors makes the largest unique contribution to the equation and it tests this unique contribution to see if it's significant. If it's not, the process stops and SPSS only uses the predictors it's entered up to that point. If the unique contribution is significant then it adds that predictor variable into the equation and it goes on to identify the remaining predictor that makes the largest unique contribution to the equation. Eventually, the overlaps among the predictors become so great that none of the remaining predictors contribute anything significant above and beyond what predictors already in the equation are able to do. This is the point at which the Forward method stops and the point where no more predictors are entered into the equation.

At this point the only remaining predictor is X3. Its unique contribution is 6%. SPSS tests this unique contribution and finds that it is significant, so it adds X3 into the model as well. Therefore, using the Forward method, the final regression equation will include all three predictor variables.

The strategy of the Forward method is that it starts with no predictors and it adds them one at a time in the order of the variables that make the largest unique contributions to the equation. The Forward method stops when the largest unique contribution of predictors not already in the equation fails to reach significance. In the Forward method, once a variable enters the equation it can never be removed. In a sense, the Forward method is like tenure. Once a faculty member is tenured, you can't get rid of them, no matter if they contribute practically nothing to anyone.

The Backward method

The Forward method started with no predictors in the model and then added predictors one at a time in the order in which they could account for additional portions of variability in Y. The Forward method stopped when the largest unique contribution among the remaining predictors was not statistically significant. The Backward method starts from the opposite point. The Backward method starts with all of the predictors in the model. In the first step SPSS determines which predictor variable makes the smallest unique contribution in predicting scores for Y. Using the example of our Venn diagram, when all of the variables have been entered, the unique contribution of X1 is a proportion of .08 (8%), the unique contribution of X2 is a proportion of .03 (3%), and the unique contribution of X3 is a proportion of .06 (6%). So, X2 has the smallest unique contribution because it only adds three percent above and beyond what the other two predictors already account for. SPSS tests this weakest unique contribution to see if it's significant. If the weakest unique contribution is significant SPSS keeps this variable in the equation and the process stops.
The reasoning is that if the weakest contribution is significant then all of the variables must be pulling their own weight. If a unique contribution is not significant then that predictor is removed. We said that, just for the sake of illustration, a predictor has to make a contribution of four percent to consider it significant. The three percent contribution of X2 is not significant, so X2 is removed from the equation.

OK, X2 gets kicked out because it isn't contributing anything significant. Now we've only got X1 and X3 left in the equation. The next step in the Backward method is to determine the unique contributions of both remaining variables. It has to re-calculate the unique contributions of X1 and X3 because X2 is no longer overlapping with the regions covered by X1 or X3. The only overlap we need to take into account is the overlap between X1 and X3. With X2 out of there, the unique contribution of X1 is a proportion of .15 (.08 + .07) or 15%. The unique contribution of X3 is .14 (.06 + .08) or 14%. SPSS would determine that X3 makes the smaller unique contribution (14%). It will test this smallest unique contribution to see if it's significant. A 14% contribution is greater than our hypothetical minimum to reach significance, so let's say that SPSS would find that X3 makes a significant unique contribution. At this point, the weakest unique contribution is significant so SPSS would stop removing predictors and the final equation would be one that retains X1 and X3 as predictor variables.

In summary, the strategy for the Backward method is that it starts with all of the predictors and then removes predictors one at a time when they fail to contribute a significant amount of unique variability. The Backward method stops when the weakest unique contribution of the variables in the equation is significant. In this respect, the Backward method is analogous to a company that downsizes to try to get the most work done using the fewest number of employees.

The Stepwise method

All right. One more. This last strategy for selecting predictor variables is perhaps the most widely used. It's referred to as the Stepwise method. The Stepwise method is similar to the Forward method, except in one important respect. The Stepwise method starts out the same as the Forward method – with no predictors in the model. In a first step, SPSS identifies the predictor variable that has the largest squared correlation with Y. It then tests this first predictor to see if it makes a significant contribution in predicting scores for Y. In our example, SPSS would again select X2 and enter it into the equation because its contribution of .21 is significant.

In Step 2, just like in the Forward method, SPSS will identify the remaining predictor that would make the largest unique contribution to the equation. Just like the Forward method, the Stepwise method will select X1 because its unique contribution (8%) is larger than the unique contribution of X3 (6%). SPSS tests X1's unique contribution of 8% to see if it is significant. If it's not, the process stops and the Stepwise method determines that only X2 stays in the equation. If the unique contribution of the new variable, X1, is significant then the Stepwise method adds X1 to the equation. At this point SPSS will ALSO go back and test the unique contribution of the variable already entered, X2.
The thing that makes the Stepwise method different from the Forward method is the fact that the Stepwise method tests the unique contribution of every variable in the equation at every step. This way, there's a way of getting rid of predictor variables that, at some point, are overlapped by the contributions of enough other predictors that they no longer make a significant contribution to the equation. At this point in Step 2, SPSS tests the unique contributions of both X1 (8%) and X2 (11%) and finds that they are both significant. So, for the moment, they're both in.

In Step 3 the Stepwise method evaluates all of the remaining predictors to see which one makes the largest unique contribution to the equation. At this point there is only one remaining predictor, X3. The unique contribution of X3 is 6%. SPSS tests this unique contribution and finds that it's significant. Now SPSS goes back and tests the unique contributions of both of the predictors it's already entered, X1 and X2. The unique contribution of X1 is still significant at 8%, so SPSS keeps X1. However, when both X1 and X3 are in the model the unique contribution of X2 (3%) is no longer significant, so SPSS removes it from the equation. It may seem strange, but the single best predictor is eventually removed because the other predictors together account for almost all of the variability that it does.

In summary, the Stepwise method is similar to the Forward method, in that it starts with no predictors and then adds predictor variables one at a time if they make significant unique contributions to the equation. The Stepwise method also provides a way to remove predictor variables from a regression model after they have been added in an earlier step. In this respect, the Stepwise method is analogous to Post-Tenure Review at a university. A professor might be tenured, but if they screw up badly enough you can get rid of them!

Hierarchical regression: testing theories about relationships among variables

The Forward, Backward, and Stepwise methods are automated methods for generating regression equations that are efficient. They're great for developing prediction equations that account for the most variability while using the fewest number of predictors. If the goal of your work is prediction, then we've already talked about a great deal of what you need to know. However, many researchers are interested in answering questions that are more theoretical in nature. These questions often concern an idea about how a set of variables are related to each other. In other words, a researcher has an idea about what the Venn diagram for a set of variables looks like – and they want to use multiple regression to determine what the Venn diagram actually looks like.

For example, a researcher might think that Social Support will still account for a significant amount of variability in scores for Quality of Life after Financial Assets has already been entered into the equation. In other words, the investigator is predicting that the relationship between Social Support and Quality of Life is not something that can be accounted for by the Financial Assets of the family. In order to answer that type of question we need to have control over the order in which each variable enters a regression equation. SPSS makes it very easy to do this.

When the researcher specifies the order in which variables enter a regression equation this is referred to as hierarchical regression.
This process is very different from the automated strategies we talked about above. In the Forward, Backward, and Stepwise methods SPSS determined the order in which variables were entered. In hierarchical regression, the researcher determines the order of entry into the equation. Again, in hierarchical regression the goal is not to develop an efficient prediction equation. The goal is to test hypothesized relationships among variables. If you're using SPSS, the method that corresponds to hierarchical regression is the Enter method.

In the question outlined above, the researcher wants to test the idea that Social Support still makes a significant unique contribution in predicting scores for Quality of Life, even after the predictor variable Financial Assets has already been entered into the equation. To answer this question the researcher needs to enter Financial Assets into a regression equation first and see how good a job it does all by itself. One option that comes in very handy in conducting a hierarchical regression analysis is the R Square Change statistic. Here, we would tell SPSS to enter Financial Assets in a first block of predictors. Then the researcher would tell SPSS to enter Social Support in a second block of predictors. In its output, SPSS will report two sets of results. It will tell you how good a job Financial Assets does all by itself and how the regression equation did when it used both predictors. It will also show you the degree to which the predictive power of the equation improves when that second predictor is added. OK, here's what the output looks like…

Variables Entered/Removed(b)

Model   Variables Entered      Variables Removed   Method
1       Financial Assets(a)    .                   Enter
2       Social Support(a)      .                   Enter
a. All requested variables entered.
b. Dependent Variable: Quality of Life

Model Summary

Model  R      R Square  Adjusted    Std. Error of   R Square   F Change   df1   df2   Sig. F
                        R Square    the Estimate    Change                            Change
1      .359a  .129      .102        2.86176         .129       4.874      1     33    .034
2      .534b  .285      .240        2.63271         .156       6.992      1     32    .013
a. Predictors: (Constant), Financial Assets
b. Predictors: (Constant), Financial Assets, Social Support

ANOVA(c)

Model           Sum of Squares   df   Mean Square   F       Sig.
1  Regression       39.913        1      39.913     4.874   .034a
   Residual        270.258       33       8.190
   Total           310.171       34
2  Regression       88.374        2      44.187     6.375   .005b
   Residual        221.798       32       6.931
   Total           310.171       34
a. Predictors: (Constant), Financial Assets
b. Predictors: (Constant), Financial Assets, Social Support
c. Dependent Variable: Quality of Life

Coefficients(a)

                        Unstandardized Coefficients   Standardized Coefficients
Model                   B           Std. Error         Beta      t        Sig.
1   (Constant)          9.773       .662                         14.760   .000
    Financial Assets    1.87E-006   .000                .359     2.208    .034
2   (Constant)          6.431       1.403                        4.583    .000
    Financial Assets    1.75E-006   .000                .334     2.232    .033
    Social Support      .126        .048                .396     2.644    .013
a. Dependent Variable: Quality of Life

Most of this output is already familiar to you. One thing that's different is that we've got two sets of results. The first one, corresponding to the first "Model", contains information about the regression equation when only Financial Assets is used as a predictor. The second "Model" reports results for the regression equation that uses both Financial Assets and Social Support as predictors.

The second thing that's new is contained in the Model Summary table. You can see in the R Square column that the proportion of variability accounted for when the only predictor variable is Financial Assets is .129.
The value for R Square goes up to .285 when Social Support is added to the equation. If you subtracted .129 from .285 you would arrive at a value of .156. This means that Social Support accounts for an additional 15.6% of the variability that scores for Financial Assets could not account for.

This last piece of information is also provided in the R Square Change column of the Model Summary. An R Square Change indicates the amount of change in the proportion of variability accounted for when an additional block of predictors (in this case one predictor variable) is added to the equation. The R Square Change for the second model is the same number we just came up with, .156. In this case, the R Square Change represents the unique contribution made by Social Support when Financial Assets has already been entered into the equation. The F-ratio displayed just to the right of the R Square Change is testing this proportion of variability – this unique contribution – to see if it is statistically significant. In this case the F-ratio is 6.992 and its significance level is .013, indicating that Social Support accounts for a significant proportion of variability in scores for Quality of Life, above and beyond the variability accounted for by Financial Assets.

Generating a Venn diagram with two predictor variables

Ok, we now know that the unique contribution of Social Support, above and beyond that of Financial Assets, is 15.6%. In all, when both Social Support and Financial Assets are used as predictors the equation accounts for 28.5% of the variability. What more would we need to do to generate a Venn diagram that shows us the relationships of each predictor variable with the criterion and the degree to which the predictor variables overlap with each other? It turns out that there's only one more analysis we need to do. Let's draw a sample Venn diagram that shows us the information we have and the information we need.

Figure X.4

This diagram shows us that we need to know the unique contributions of the two predictor variables and the degree to which the two predictors overlap in accounting for the criterion. We already know the unique contribution for Social Support. It's a proportion of .156 or a percentage of 15.6%. I've added this in below…

Figure X.5

Now we need to go get the unique contribution of the other predictor variable, Financial Assets. To do this, we just ask SPSS to do the same thing we did before – we just need to reverse the order of entry of the variables. In other words, enter Social Support first and Financial Assets second. Here's what the output from SPSS looks like when we do this…

Variables Entered/Removed(b)

Model   Variables Entered      Variables Removed   Method
1       Social Support(a)      .                   Enter
2       Financial Assets(a)    .                   Enter
a. All requested variables entered.
b. Dependent Variable: Quality of Life

Model Summary

Model  R      R Square  Adjusted    Std. Error of   R Square   F Change   df1   df2   Sig. F
                        R Square    the Estimate    Change                            Change
1      .417a  .174      .149        2.78703         .174       6.932      1     33    .013
2      .534b  .285      .240        2.63271         .111       4.982      1     32    .033
a. Predictors: (Constant), Social Support
b. Predictors: (Constant), Social Support, Financial Assets

ANOVA(c)

Model           Sum of Squares   df   Mean Square   F       Sig.
1  Regression       53.842        1      53.842     6.932   .013a
   Residual        256.329       33       7.768
   Total           310.171       34
2  Regression       88.374        2      44.187     6.375   .005b
   Residual        221.798       32       6.931
   Total           310.171       34
a. Predictors: (Constant), Social Support
b. Predictors: (Constant), Social Support, Financial Assets
c. Dependent Variable: Quality of Life
Coefficients(a)

                        Unstandardized Coefficients   Standardized Coefficients
Model                   B           Std. Error         Beta      t        Sig.
1   (Constant)          7.184       1.442                        4.982    .000
    Social Support      .133        .051                .417     2.633    .013
2   (Constant)          6.431       1.403                        4.583    .000
    Social Support      .126        .048                .396     2.644    .013
    Financial Assets    1.75E-006   .000                .334     2.232    .033
a. Dependent Variable: Quality of Life

When the variable Financial Assets is entered second the R Square Change is .111. This means that Financial Assets accounts for 11.1% of the variability in scores for Quality of Life, above and beyond the variability accounted for by Social Support. In other words, the unique contribution of Financial Assets is 11.1%. Let's put the unique contribution for Financial Assets into our Venn diagram.

Figure X.6

Now the only thing left is the percentage of variability that corresponds to the overlap. This is easy. We know that when both predictors are used we account for 28.5 percent of the variability. The overlap must be the percentage of variability that's left over when we subtract the unique contributions of both variables from this total percentage of variability accounted for. Doing the math we get…

Overlap = Total percentage of variability accounted for – Unique contribution of Social Support – Unique contribution of Financial Assets

Overlap = 28.5% – 15.6% – 11.1% = 1.8%

This tells us that the percentage of overlap of the two predictors in accounting for scores for Quality of Life is 1.8%. Now we can fill in the last thing we need in the Venn diagram.

Figure X.7

And that's it! One way of thinking about hierarchical regression is that it's like flying a plane on manual. You get to say which variables enter the equation and when. The Forward, Backward, and Stepwise methods are like flying the plane on autopilot. The computer knows what the rules are and it will follow those rules without the pilot having to think about what's really going on. Both hierarchical regression and the automated strategies address important goals and objectives – they're just not the same goals and objectives.
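For readers who would rather check the hierarchical steps by code than by hand, here is a minimal sketch that mirrors them: fit each block with ordinary least squares, take the difference in R Square between blocks as the unique contribution, and recover the overlap by subtraction. It reuses the numpy arrays from the earlier sketch (social_support, financial_assets, and quality_of_life, filled in with all 35 cases from Table X.1), and the function name is just illustrative. With the full data the printed proportions should come out near .156, .111, and .018.

import numpy as np

def r_square(y, predictor_columns):
    # Proportion of variability in y accounted for by the listed predictors:
    # 1 minus (sum of squared errors of prediction / total sum of squares).
    X = np.column_stack([np.ones(len(y))] + list(predictor_columns))
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    errors = y - X @ coefs
    return 1 - np.sum(errors ** 2) / np.sum((y - y.mean()) ** 2)

# Block 1: Financial Assets alone; Block 2: add Social Support.
r2_assets_only = r_square(quality_of_life, [financial_assets])                  # about .129
r2_both        = r_square(quality_of_life, [financial_assets, social_support])  # about .285

# R Square Change = unique contribution of the variable entered second.
unique_social_support = r2_both - r2_assets_only                                # about .156

# Reverse the order of entry to get the unique contribution of Financial Assets.
r2_support_only   = r_square(quality_of_life, [social_support])                 # about .174
unique_fin_assets = r2_both - r2_support_only                                   # about .111

# Whatever is left over is the overlap of the two predictors in accounting for Y.
overlap = r2_both - unique_social_support - unique_fin_assets                   # about .018
print(unique_social_support, unique_fin_assets, overlap)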