Psychology 522/622
Lab Lecture 2: Multiple Linear Regression

We continue our exploration of multiple linear regression analysis. The goal of this lab is to illustrate basic multiple regression. The simplest multiple regression is a 2-predictor multiple regression problem. Let's work through an example of this situation. We will run a couple of multiple regression models and orient ourselves to the relevant output and the correct interpretations of the parameters.

DATAFILE: injury.sav

This file contains the data for 100 women on the following variables:

injury: Overall injury index based on the records kept by the participants
gluts: A measure of strength of the muscles in the upper part of the back of the leg and the buttocks
age: The age of the women when the study began

Injury will be our dependent variable for all of the models.

INITIAL SCATTERPLOTS

First, create scatter plots for all combinations of the variables: injury by gluts, injury by age, age by gluts. We can do this by creating a scatterplot matrix.

Graphs → Scatter → Matrix → Define → Move injury, gluts, and age to the Matrix Variables box → OK.

GRAPH
  /SCATTERPLOT(MATRIX)=injury gluts age
  /MISSING=LISTWISE .

We're checking for anything fishy (non-linearity, heteroscedasticity, outliers).

[Scatterplot matrix of injury, gluts, and age.]

I don't see anything fishy. Do you? What can you tell by looking at the scatterplots here? Do you have any inkling of the direction/magnitude of the linear associations between these variables? My guess is that injury and age have a positive relationship, gluts and age likely have a very weak positive relationship, and gluts and injury have a negative relationship.

Making a 3-D Interactive Scatterplot!!!

Graphs → Interactive → Scatterplot → Click on the 2-D coordinate box and select 3-D coordinate → Move injury to the Y axis, age to the X axis, and gluts to the third (diagonal Z) axis → OK. Double click on the plot in the output and play with your data in 3-D!

CORRELATION ANALYSIS

Let's create a correlation matrix for these three variables. This is a really easy analysis, so test yourself and see if you can remember how to do it using the dropdown commands.

CORRELATIONS
  /VARIABLES=injury gluts age
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE .

Correlations

                             injury     gluts      age
injury  Pearson Correlation  1          -.393**    .290**
        Sig. (2-tailed)      .          .000       .003
        N                    100        100        100
gluts   Pearson Correlation  -.393**    1          .158
        Sig. (2-tailed)      .000       .          .115
        N                    100        100        100
age     Pearson Correlation  .290**     .158       1
        Sig. (2-tailed)      .003       .115       .
        N                    100        100        100
**. Correlation is significant at the 0.01 level (2-tailed).

Were our predictions based on the bivariate scatterplots correct? Pretty much.

Let's focus on the gluts × injury relationship. Note that in the scatterplot the majority of the data points are just swirling around the center, but we've got maybe 3 hanging out toward the bottom right corner and maybe 3 hanging out in the top left. Although they don't appear to be true "outliers," they are influential data points. If we dropped them from our dataset, it would definitely affect (i.e., weaken) the correlation between these two variables.

SIMPLE REGRESSION OF INJURY ON GLUTS

Let's quickly run a simple regression predicting injury from gluts. In this simple regression, injury is the dependent variable and gluts is the independent variable.
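In equation form (using the same notation we'll use when we write out the fitted equation below), the model we're estimating is the usual simple regression model:

Ŷ = b0 + b1(gluts)

where b0 is the intercept (the expected injury score when glut strength is 0) and b1 is the slope (the expected change in injury for a one-unit increase in glut strength).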
Analyze → Regression → Linear → Move injury to the DV box, move gluts to the IV box
Statistics → Select Descriptives (Estimates and Model Fit should already be selected)
Plots → Move ZRESID to the Y axis, move ZPRED to the X axis, select Histogram, select Normal Probability Plot (this is the P-P plot)
Save → Under Predicted Values, select Standardized and Unstandardized
Click OK.

REGRESSION
  /DESCRIPTIVES CORR
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT injury
  /METHOD=ENTER gluts
  /SCATTERPLOT=(*ZRESID ,*ZPRED )
  /RESIDUALS HIST(ZRESID) NORM(ZRESID) .

Regression

Variables Entered/Removed(b)

Model   Variables Entered   Variables Removed   Method
1       gluts(a)            .                   Enter
a. All requested variables entered.
b. Dependent Variable: injury

Model Summary(b)

Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .393(a)  .154       .146                48.243
a. Predictors: (Constant), gluts
b. Dependent Variable: injury

When we have only one predictor, what do we know to be true about R (i.e., Multiple R) and r (i.e., the Pearson correlation)? They're the same, right? So what's different here, and why?? Multiple R is positive and r(gluts, injury) is negative. Why? Because R is always positive. Why? Well, R = r(Y, Y′). Would you expect the correlation between the actual values of Y and the predicted values of Y to be negative or positive? Positive, of course. Remember, when we're talking about R we're not talking about the relationship between X and Y (in this case, gluts and injury). We're talking about the relationship between the actual and predicted values of Y (injury). Now, if it turns out that our predictor(s) do a really terrible job of predicting values of our DV, R will just be much closer to zero.

ANOVA(b)

Model 1        Sum of Squares   df   Mean Square   F        Sig.
Regression     41625.493         1   41625.493     17.885   .000(a)
Residual       228088.5         98   2327.434
Total          269714.0         99
a. Predictors: (Constant), gluts
b. Dependent Variable: injury

Coefficients(a)

Model 1      B         Std. Error   Beta    t        Sig.
(Constant)   255.994   26.499               9.660    .000
gluts        -3.545    .838         -.393   -4.229   .000
a. Dependent Variable: injury

Let's write out this regression equation for practice:

Ŷ = 255.99 – 3.55(gluts)

Residuals Statistics(a)

                       Minimum    Maximum   Mean     Std. Deviation   N
Predicted Value        89.36      202.81    145.80   20.505           100
Residual               -115.356   125.825   .000     47.999           100
Std. Predicted Value   -2.753     2.780     .000     1.000            100
Std. Residual          -2.391     2.608     .000     .995             100
a. Dependent Variable: injury

We want these residual values to be as small as possible because the smaller the residual values, the better our data fit the regression line.

Charts

[Histogram of the regression standardized residuals (Dependent Variable: injury). Mean = -1.42E-16, Std. Dev. = 0.995, N = 100.]

[Normal P-P plot of the regression standardized residual (Dependent Variable: injury): observed vs. expected cumulative probability.]

Our histogram looks pretty good: our residuals are approximately normally distributed. What about the P-P plot? Although there are a few hiccups, the data are approximating a line.

[Scatterplot of the regression standardized residuals against the regression standardized predicted values (Dependent Variable: injury).]

What about the scatterplot of our standardized residual values? We don't want to see any shape here. Blobs = good. And for the most part, that's what we see.
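Here's a quick sanity check on that equation, using the means that will show up in the Descriptive Statistics table for the next analysis (mean injury = 145.80, mean gluts = 31.08). A least-squares regression line always passes through the point of means, so plugging the mean of gluts into the equation should return the mean of injury:

Ŷ = 255.994 – 3.545(31.08) = 255.994 – 110.18 ≈ 145.8

which matches the mean of injury (any tiny discrepancy is just rounding).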
So, on the basis of those three graphs (the histogram, the P-P plot, and the residual scatterplot), we'll decide that we're comfortable that we haven't seriously violated any of our assumptions.

STANDARD MULTIPLE REGRESSION OF INJURY ON GLUTS AND AGE

In this regression, injury again is the dependent variable and now both gluts and age are predictors. What we'll run here is what T&F refer to as "standard multiple regression."

Analyze → Regression → Linear → Move injury to the dependent variable box, move gluts and age to the IV box
Click Statistics → select Collinearity diagnostics (you can also select Part and partial correlations if you're interested in seeing those)

We'll want the same information that we just asked for when running the simple regression, so everything else should already be set up for us.

REGRESSION
  /DESCRIPTIVES CORR
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA TOL
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT injury
  /METHOD=ENTER gluts age
  /SCATTERPLOT=(*ZRESID ,*ZPRED )
  /RESIDUALS HIST(ZRESID) NORM(ZRESID) .

Regression

Descriptive Statistics

         Mean     Std. Deviation   N
injury   145.80   52.196           100
age      67.32    3.235            100
gluts    31.08    5.783            100

Correlations

                             injury   age     gluts
Pearson Correlation  injury  1.000    .290    -.393
                     age     .290     1.000   .158
                     gluts   -.393    .158    1.000
Sig. (1-tailed)      injury  .        .002    .000
                     age     .002     .       .058
                     gluts   .000     .058    .
N                    injury  100      100     100
                     age     100      100     100
                     gluts   100      100     100

Variables Entered/Removed(b)

Model   Variables Entered   Variables Removed   Method
1       age, gluts(a)       .                   Enter
a. All requested variables entered.
b. Dependent Variable: injury

Model Summary(b)

Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .531(a)  .282       .267                44.685
a. Predictors: (Constant), age, gluts
b. Dependent Variable: injury

R = .53 and R² = .28, indicating that 28% of the variance in injury is accounted for by age and glut strength.

ANOVA(b)

Model 1        Sum of Squares   df   Mean Square   F        Sig.
Regression     76026.610         2   38013.305     19.037   .000(a)
Residual       193687.4         97   1996.777
Total          269714.0         99
a. Predictors: (Constant), age, gluts
b. Dependent Variable: injury

This is the F ratio that we want to report in our results section. We'd report it just the way we would for an ANOVA: F(2, 97) = 19.04, p < .01, R² = .28.

Coefficients(a)

Model 1      B          Std. Error   Beta    t        Sig.   Tolerance   VIF
(Constant)   -120.867   94.054               -1.285   .202
gluts        -4.063     .786         -.450   -5.166   .000   .975        1.026
age          5.837      1.406        .362    4.151    .000   .975        1.026
a. Dependent Variable: injury

It looks like both of our IVs are significant predictors of injury. How do we interpret these partial regression coefficients?

Gluts: Holding age constant, for every unit increase in glut strength there is an expected decrease of 4.06 in injury. Colloquially stated: controlling for age, higher levels of glut strength are related to fewer incidences of injury.

Age: Holding gluts constant, for every unit (i.e., year) increase in age, injury is expected to increase by 5.84 units. Colloquially stated: controlling for glut strength, injuries increase with age.

Also, our collinearity statistics look good. Remember, we want high tolerance values.

Let's write out the regression equation:

Ŷ = -120.87 – 4.06(gluts) + 5.84(age)
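To make the "holding constant" interpretation concrete, here's a small worked example using the equation above and the standard deviation of gluts (5.783) from the Descriptive Statistics table. For two women of the same age whose glut strength differs by one standard deviation, the expected difference in injury scores is

4.063 × 5.783 ≈ 23.5 points,

with the stronger woman expected to score about 23.5 points lower on the injury index.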
Residuals Statistics(a)

                       Minimum    Maximum   Mean     Std. Deviation   N
Predicted Value        55.91      203.42    145.80   27.712           100
Residual               -105.107   97.433    .000     44.232           100
Std. Predicted Value   -3.244     2.079     .000     1.000            100
Std. Residual          -2.352     2.180     .000     .990             100
a. Dependent Variable: injury

A rule of thumb is to have these standardized residuals not exceed |2|. They look OK here. (With N = 100 we'd expect roughly 5 standardized residuals beyond |2| just by chance, so a maximum around 2.2 to 2.4 is no cause for alarm.)

Charts

[Histogram of the regression standardized residuals (Dependent Variable: injury). Mean = -5.63E-16, Std. Dev. = 0.99, N = 100.]

[Normal P-P plot of the regression standardized residual (Dependent Variable: injury): observed vs. expected cumulative probability.]

Hmm, the histogram isn't looking quite as good here, but it still isn't too scary. P-P plot: again, we're seeing a few hiccups, especially at the very top of the graph. Let's look at the scatterplot of the residuals to see if it gives us any cause for concern.

[Scatterplot of the regression standardized residuals against the regression standardized predicted values (Dependent Variable: injury).]

For the most part, we're seeing the desired blob. The point that's sort of hanging out on its own to the left of the graph is potentially problematic.

NOTE: For the sake of time, we won't go through all of the diagnostics, but recall from last week that you could ask for Mahalanobis distance, Cook's distance, and leverage values for additional information about potential outliers. Also, since this is a regression with two predictors, you could look at the data in 3-D like we discussed earlier!!

SEQUENTIAL (HIERARCHICAL) REGRESSION

This next model conducts a hierarchical regression analysis. Here gluts will be entered as a predictor variable in the first step and age in the second step.

Analyze → Regression → Linear → Move injury to the DV box, move gluts to the IV box → Click Next → Move age into the IV box in Block 2 of 2
Click Statistics → select R squared change

Everything else we want should already be selected from the previous analyses we ran, so click OK.
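If you hit Paste instead of OK, the syntax should look something like the sketch below (assuming the same plot and residual options we've been using). The key differences from the standard regression are the two separate /METHOD=ENTER lines, one per block, plus the CHANGE keyword that requests the change statistics:

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA CHANGE
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT injury
  /METHOD=ENTER gluts
  /METHOD=ENTER age
  /SCATTERPLOT=(*ZRESID ,*ZPRED )
  /RESIDUALS HIST(ZRESID) NORM(ZRESID) .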
Regression

Variables Entered/Removed(b)

Model   Variables Entered   Variables Removed   Method
1       gluts(a)            .                   Enter
2       age(a)              .                   Enter
a. All requested variables entered.
b. Dependent Variable: injury

Model Summary

                                                        Change Statistics
Model   R        R Square   Adj. R Square   Std. Error  R Square Change   F Change   df1   df2   Sig. F Change
1       .393(a)  .154       .146            48.243      .154              17.885     1     98    .000
2       .531(b)  .282       .267            44.685      .128              17.228     1     97    .000
a. Predictors: (Constant), gluts
b. Predictors: (Constant), gluts, age

Already we see that our Model Summary table looks different than it did before. We see two rows, Model 1 and Model 2. Model 1 refers to "Block 1," in this case, just gluts predicting injury. Model 2 refers to "Blocks 1 & 2," in this case, gluts and age predicting injury. So what kind of information are we getting here?

MODEL 1

This is the same information we got when we ran a simple regression predicting injury from gluts. The model summary table for that analysis is below so that you can compare the two. Nothing new and exciting in the first 5 columns.

Model Summary(b)

Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .393(a)  .154       .146                48.243
a. Predictors: (Constant), gluts
b. Dependent Variable: injury

In examining the Change Statistics box, we'll get the same information that's provided to us in the ANOVA summary table (reproduced below) for the simple regression predicting injury from gluts.

ANOVA(b)

Model 1        Sum of Squares   df   Mean Square   F        Sig.
Regression     41625.493         1   41625.493     17.885   .000(a)
Residual       228088.5         98   2327.434
Total          269714.0         99
a. Predictors: (Constant), gluts
b. Dependent Variable: injury

In the change statistics box, we see columns titled R Square Change, F Change, df1, df2, and Sig. F Change. In this case, R Square Change = R Square. Why? Because here SPSS is comparing our Model 1 to a model with no predictors, where R² = 0. F Change = the F ratio we got when running our simple regression. df1 = df(regression) and df2 = df(residual). Sig. F Change = the p value for the F ratio we got when running our simple regression.

MODEL 2

Below you see the model summary table for our hierarchical regression, followed by the model summary table for the standard multiple regression predicting injury from gluts and age.

Model Summary

                                                        Change Statistics
Model   R        R Square   Adj. R Square   Std. Error  R Square Change   F Change   df1   df2   Sig. F Change
1       .393(a)  .154       .146            48.243      .154              17.885     1     98    .000
2       .531(b)  .282       .267            44.685      .128              17.228     1     97    .000
a. Predictors: (Constant), gluts
b. Predictors: (Constant), gluts, age

Model Summary(b)

Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .531(a)  .282       .267                44.685
a. Predictors: (Constant), age, gluts
b. Dependent Variable: injury

Model 2's R = the R for the standard multiple regression; Model 2's R² = the R² for the standard multiple regression.

So far, we haven't gotten any new information. The cool stuff comes in the change statistics box.

R Square Change = Model 2 R² – Model 1 R². So in this case, ΔR² = .282 – .154 = .128. We easily could have calculated this ourselves, but SPSS gives it to you.

F Change: this is an F ratio that allows you to test whether the change in R² between Models 1 and 2 is significant. It is NOT the difference in the F values between Models 1 and 2 (i.e., you don't calculate it via simple subtraction as you do with R Square Change). So we now have an F ratio and want to see if it is significant. The corresponding p value for this F ratio shows up in the column titled Sig. F Change.

Sig. F Change: This is a p-value for the F Change ratio. Now that we have a p value, what are the two questions we ask?

1) What's the null hypothesis that we're evaluating? Here, we're evaluating the null hypothesis that Model 1 R = Model 2 R, or that Model 1 R² = Model 2 R², or that ΔR² = 0. The null hypothesis assumes that there will not be a significant change in R/R² by adding more predictors. Alternatively phrased to suit our example, the null hypothesis assumes that the prediction of injury is not enhanced by adding age as a predictor when gluts is already in the regression. In other words, the null hypothesis assumes that age does not explain incremental variance in injury scores above and beyond the variance explained by glut strength. So we've answered the first question. The second question is:

2) Is it significant? Yes, it is significant, so we reject the null hypothesis and conclude that age does explain incremental variance in injury scores beyond the variance explained by glut strength.
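For the curious, F Change has a closed form: it is the increment in R² relative to the variance left unexplained by the larger model. With m predictors added in the new block,

F Change = (ΔR² / m) / ((1 – R²_Model2) / df2)

Plugging in our (rounded) values: (.128 / 1) / ((1 – .282) / 97) = .128 / .0074 ≈ 17.3, which matches the 17.228 that SPSS reports up to rounding error in the R² values.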
Below I've reproduced the ANOVA summary table for each regression that we've run today so that you can compare the results. This is what you'll see:

Hierarchical regression Model 1 = the simple regression's model
Hierarchical regression Model 2 = the standard multiple regression's model

HIERARCHICAL REGRESSION

ANOVA(c)

Model            Sum of Squares   df   Mean Square   F        Sig.
1  Regression    41625.493         1   41625.493     17.885   .000(a)
   Residual      228088.5         98   2327.434
   Total         269714.0         99
2  Regression    76026.610         2   38013.305     19.037   .000(b)
   Residual      193687.4         97   1996.777
   Total         269714.0         99
a. Predictors: (Constant), gluts
b. Predictors: (Constant), gluts, age
c. Dependent Variable: injury

SIMPLE REGRESSION PREDICTING INJURY FROM GLUT STRENGTH

ANOVA(b)

Model 1        Sum of Squares   df   Mean Square   F        Sig.
Regression     41625.493         1   41625.493     17.885   .000(a)
Residual       228088.5         98   2327.434
Total          269714.0         99
a. Predictors: (Constant), gluts
b. Dependent Variable: injury

STANDARD MULTIPLE REGRESSION

ANOVA(b)

Model 1        Sum of Squares   df   Mean Square   F        Sig.
Regression     76026.610         2   38013.305     19.037   .000(a)
Residual       193687.4         97   1996.777
Total          269714.0         99
a. Predictors: (Constant), age, gluts
b. Dependent Variable: injury

MODEL 1 COEFFICIENTS

What do you notice below about the coefficient for gluts? It's the same as it was when we ran a simple regression predicting injury from gluts. That's because Model 1 IS the simple regression predicting injury from gluts.

HIERARCHICAL REGRESSION

Coefficients(a)

Model             B          Std. Error   Beta    t        Sig.   Tolerance   VIF
1  (Constant)     255.994    26.499               9.660    .000
   gluts          -3.545     .838         -.393   -4.229   .000   1.000       1.000
2  (Constant)     -120.867   94.054               -1.285   .202
   gluts          -4.063     .786         -.450   -5.166   .000   .975        1.026
   age            5.837      1.406        .362    4.151    .000   .975        1.026
a. Dependent Variable: injury

SIMPLE REGRESSION

Coefficients(a)

Model 1      B         Std. Error   Beta    t        Sig.
(Constant)   255.994   26.499               9.660    .000
gluts        -3.545    .838         -.393   -4.229   .000
a. Dependent Variable: injury

MODEL 2 COEFFICIENTS

What do you notice here? Hmm, these look familiar, too. They are the same coefficients that we saw when we conducted a standard multiple regression, i.e., put all the predictors into one step instead of separating them out into two steps.

HIERARCHICAL REGRESSION

Coefficients(a)

Model             B          Std. Error   Beta    t        Sig.   Tolerance   VIF
1  (Constant)     255.994    26.499               9.660    .000
   gluts          -3.545     .838         -.393   -4.229   .000   1.000       1.000
2  (Constant)     -120.867   94.054               -1.285   .202
   gluts          -4.063     .786         -.450   -5.166   .000   .975        1.026
   age            5.837      1.406        .362    4.151    .000   .975        1.026
a. Dependent Variable: injury

STANDARD MULTIPLE REGRESSION

Coefficients(a)

Model 1      B          Std. Error   Beta    t        Sig.   Tolerance   VIF
(Constant)   -120.867   94.054               -1.285   .202
gluts        -4.063     .786         -.450   -5.166   .000   .975        1.026
age          5.837      1.406        .362    4.151    .000   .975        1.026
a. Dependent Variable: injury

Let's focus on the coefficients in the hierarchical regression. You can see that the coefficient for gluts changes between Model 1 (B = -3.55) and Model 2 (B = -4.06). We've seen in our examples in class that, due to multicollinearity and other reasons, adding predictors can often reduce the size of a coefficient, but here the partial regression coefficient for gluts actually gets larger. Why do you think that is? Based on the correlation matrix, we saw that injury shared a pretty strong relationship with age. In the first model, we're ignoring age and only including gluts, so there's a lot of noise (i.e., error) in our prediction. However, once we include age and control for it, some of that error/noise is reduced. In addition, age and glut strength are not highly correlated (recall that when we ran the bivariate correlations, r(age, gluts) = .16). Here we've got a situation where including an additional predictor (age) helps to reduce additional noise (i.e., error; i.e., variance) in the DV, so our partial regression coefficient for gluts improves.
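If you want to see exactly where that Model 2 coefficient comes from, there is a standard closed-form expression for the standardized coefficients in the two-predictor case (a textbook result, not something SPSS prints). Using the correlations from our matrix:

β(gluts) = [r(injury, gluts) – r(injury, age) × r(age, gluts)] / [1 – r(age, gluts)²]
         = [-.393 – (.290)(.158)] / [1 – .158²]
         = -.439 / .975
         ≈ -.450

which matches the Beta of -.450 in the table. Because r(age, gluts) is small and the two predictors relate to injury with opposite signs, the numerator grows in magnitude relative to the simple correlation (-.393), which is exactly why the gluts coefficient gets larger rather than smaller when age enters the model.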
Below we can check out our collinearity diagnostics, and they look good. We would expect this because we decided when we ran the standard multiple regression (which is the same as Model 2 in the hierarchical regression) that multicollinearity was not a problem.

Excluded Variables(b)

Model 1   Beta In    t       Sig.   Partial Correlation   Tolerance   VIF     Minimum Tolerance
age       .362(a)    4.151   .000   .388                  .975        1.026   .975
a. Predictors in the Model: (Constant), gluts
b. Dependent Variable: injury

So what have we learned?? In our example, standard multiple regression results = hierarchical regression results for Model 2. Although hierarchical regression is nice because it gives you change statistics, saving you the hassle of calculating them on your own, your last model in hierarchical regression will ultimately yield the same results as throwing all of the variables into one step. So, if you're just interested in how well a particular group of variables predicts scores on a given DV, standard multiple regression is completely appropriate and best suited. However, if you are especially interested in determining which variable(s) contribute incrementally to the prediction of the DV, hierarchical regression is a great option.

NOTE: Just to be explicit, you could run a "hierarchical" regression by conducting a series of standard multiple regressions. In our example, you'd run a regression predicting injury from gluts. You'd click OK and get your output. This information would be the same as what we got in Model 1 of our hierarchical regression. Then you'd run another regression and include gluts and age as predictors in one step. The information here would be the same as what we got in Model 2 of our hierarchical regression, and the same as what we saw when running the standard multiple regression. However, this process takes longer than just clicking "Next" to get you to the next block of predictors in SPSS, and you don't get the change statistics. I'm just explaining this to drive home the point that the results you get will be the same, whether you focus on the last model in your hierarchical regression or a standard multiple regression where all variables were entered in one step. (See the syntax sketch below.)
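In stripped-down syntax (keeping only the required subcommands; add the plot and save options as you like), that series of regressions would look something like:

REGRESSION
  /DEPENDENT injury
  /METHOD=ENTER gluts .

REGRESSION
  /DEPENDENT injury
  /METHOD=ENTER gluts age .

You'd then compute ΔR² by hand as the difference between the two R² values.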
Sample APA results section:

Table 1 [I didn't actually include this, but you should!] displays correlations among all study variables. Both predictors shared moderate to strong relationships with the outcome and were only weakly correlated with one another. Inspection of P-P plots and a scatterplot of the standardized residual values indicated that the assumptions were met. Examination of bivariate scatter plots did not reveal any extreme data values. A hierarchical multiple regression was conducted to determine whether age contributed incrementally to the prediction of injury scores above and beyond that accounted for by glut strength. Glut strength was entered in step 1, and age was entered in step 2. Results indicated that glut strength explained 15% of the variance in injury, F(1, 98) = 17.89, p < .01. Furthermore, age explained an incremental 13% of the variance in injury scores, F(1, 97) = 17.23, p < .01, above and beyond the variance accounted for by glut strength. Partial regression coefficients are reported in Table 2. Results suggest that women of the same age are less likely to be injured if they have more strength in their gluts. Conversely, women who have the same level of glut strength are more likely to be injured if they are older.

Table 2

Variable          B       SE B    β
Step 1
  Glut strength   -4.06   .79     -.45*
Step 2
  Age             5.84    1.41    .36*

Note. R² = .15 for Step 1; ΔR² = .13 for Step 2 (ps < .05).
* p < .05

NOTE: The partial regression coefficients that you report here come from MODEL 2 (i.e., the FINAL model). You do NOT report the partial regression coefficient for glut strength from Model 1.

NOTE 2: I did not report the t statistic that accompanies the partial regression coefficient anywhere. This is primarily because we entered only one variable in each step, so the ΔR² statistic is sufficient. However, if we had entered several variables in step 2, we would likely report the t statistics associated with each partial regression coefficient. In this way we'd let our readers know which variables were significant predictors of our DV.

APPENDIX

Here I want to show you how the order in which you enter your predictors has NO EFFECT on the regression coefficients that you see in the final model when conducting a hierarchical regression, or in Model 1 (the only model) in standard multiple regression. This will be brief, but hopefully it is also helpful.

PREDICTING INJURY FROM GLUTS AND AGE

Standard Multiple Regression:

Model Summary(b)

Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .531(a)  .282       .267                44.685
a. Predictors: (Constant), age, gluts
b. Dependent Variable: injury

ANOVA(b)

Model 1        Sum of Squares   df   Mean Square   F        Sig.
Regression     76026.610         2   38013.305     19.037   .000(a)
Residual       193687.4         97   1996.777
Total          269714.0         99
a. Predictors: (Constant), age, gluts
b. Dependent Variable: injury

Coefficients(a)

Model 1      B          Std. Error   Beta    t        Sig.   Tolerance   VIF
(Constant)   -120.867   94.054               -1.285   .202
gluts        -4.063     .786         -.450   -5.166   .000   .975        1.026
age          5.837      1.406        .362    4.151    .000   .975        1.026
a. Dependent Variable: injury

Hierarchical Regression, Step 1 = gluts, Step 2 = age:

Model Summary

                                                        Change Statistics
Model   R        R Square   Adj. R Square   Std. Error  R Square Change   F Change   df1   df2   Sig. F Change
1       .393(a)  .154       .146            48.243      .154              17.885     1     98    .000
2       .531(b)  .282       .267            44.685      .128              17.228     1     97    .000
a. Predictors: (Constant), gluts
b. Predictors: (Constant), gluts, age

ANOVA(c)

Model            Sum of Squares   df   Mean Square   F        Sig.
1  Regression    41625.493         1   41625.493     17.885   .000(a)
   Residual      228088.5         98   2327.434
   Total         269714.0         99
2  Regression    76026.610         2   38013.305     19.037   .000(b)
   Residual      193687.4         97   1996.777
   Total         269714.0         99
a. Predictors: (Constant), gluts
b. Predictors: (Constant), gluts, age
c. Dependent Variable: injury

Coefficients(a)

Model             B          Std. Error   Beta    t        Sig.   Tolerance   VIF
1  (Constant)     255.994    26.499               9.660    .000
   gluts          -3.545     .838         -.393   -4.229   .000   1.000       1.000
2  (Constant)     -120.867   94.054               -1.285   .202
   gluts          -4.063     .786         -.450   -5.166   .000   .975        1.026
   age            5.837      1.406        .362    4.151    .000   .975        1.026
a. Dependent Variable: injury
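To reverse the order of entry, only the order of the /METHOD=ENTER subcommands changes. A minimal sketch (other subcommands as before):

REGRESSION
  /STATISTICS COEFF OUTS R ANOVA CHANGE
  /DEPENDENT injury
  /METHOD=ENTER age
  /METHOD=ENTER gluts .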
Hierarchical Regression, Step 1 = age, Step 2 = gluts:

Model Summary

                                                        Change Statistics
Model   R        R Square   Adj. R Square   Std. Error  R Square Change   F Change   df1   df2   Sig. F Change
1       .290(a)  .084       .075            50.201      .084              9.024      1     98    .003
2       .531(b)  .282       .267            44.685      .198              26.685     1     97    .000
a. Predictors: (Constant), age
b. Predictors: (Constant), age, gluts

ANOVA(c)

Model            Sum of Squares   df   Mean Square   F        Sig.
1  Regression    22742.229         1   22742.229     9.024    .003(a)
   Residual      246971.8         98   2520.120
   Total         269714.0         99
2  Regression    76026.610         2   38013.305     19.037   .000(b)
   Residual      193687.4         97   1996.777
   Total         269714.0         99
a. Predictors: (Constant), age
b. Predictors: (Constant), age, gluts
c. Dependent Variable: injury

Coefficients(a)

Model             B          Std. Error   Beta    t        Sig.   Tolerance   VIF
1  (Constant)     -169.650   105.129              -1.614   .110
   age            4.686      1.560        .290    3.004    .003   1.000       1.000
2  (Constant)     -120.867   94.054               -1.285   .202
   age            5.837      1.406        .362    4.151    .000   .975        1.026
   gluts          -4.063     .786         -.450   -5.166   .000   .975        1.026
a. Dependent Variable: injury

What's the same between the two hierarchical regressions??? Model 2 R, Model 2 R², and the Model 2 partial regression coefficients.

What's different between the two hierarchical regressions??? Everything in Model 1, R Square Change, and F Change.

For the things that are different, WHY are they different??? Because Model 1 has changed!! In the first hierarchical regression, we entered gluts in step 1 and age in step 2. Recall that by doing this, the results you get for Model 1 are equal to the results you get by running a simple regression predicting injury from gluts. Now that we've changed the order in which we've entered the variables (i.e., we entered age in step 1 and gluts in step 2), Model 1 becomes equivalent to the simple regression predicting injury from age. See below for proof.

Simple regression predicting injury from age:

Model Summary

Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .290(a)  .084       .075                50.201
a. Predictors: (Constant), age

ANOVA(b)

Model 1        Sum of Squares   df   Mean Square   F       Sig.
Regression     22742.229         1   22742.229     9.024   .003(a)
Residual       246971.8         98   2520.120
Total          269714.0         99
a. Predictors: (Constant), age
b. Dependent Variable: injury

Coefficients(a)

Model 1      B          Std. Error   Beta   t        Sig.
(Constant)   -169.650   105.129             -1.614   .110
age          4.686      1.560        .290   3.004    .003
a. Dependent Variable: injury

Remember that in hierarchical regression, you are always comparing models: Model 1 to a model with no predictors, Model 2 to Model 1, and if we had a Model 3 we'd be comparing it to Model 2. So we're comparing any new variables we enter to all the other variables that have already been put into the regression. R Square Change and F Change will also take on different values in the hierarchical regression (step 1 = age, step 2 = gluts) than they did in the hierarchical regression (step 1 = gluts, step 2 = age). Again, this is because Model 1 has changed! So although Model 2 in both hierarchical regressions is the same model, the model to which we compare it (Model 1) has changed. This is why we see the same R and R² values for Model 2 in the two hierarchical regressions, but different values for ΔR².
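The same hand calculation we did earlier works for this ordering too (again using the reported, rounded values): ΔR² = .282 – .084 = .198, and

F Change = (.198 / 1) / ((1 – .282) / 97) = .198 / .0074 ≈ 26.7,

matching the 26.685 that SPSS reports up to rounding.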
Partial Regression Coefficients across the 3 Regressions (Standard, Hierarchical 1, and Hierarchical 2)

Standard Multiple Regression:

Coefficients(a)

Model 1      B          Std. Error   Beta    t        Sig.   Tolerance   VIF
(Constant)   -120.867   94.054               -1.285   .202
gluts        -4.063     .786         -.450   -5.166   .000   .975        1.026
age          5.837      1.406        .362    4.151    .000   .975        1.026
a. Dependent Variable: injury

Hierarchical Regression (step 1 = gluts, step 2 = age): focus on Model 2

Coefficients(a)

Model             B          Std. Error   Beta    t        Sig.   Tolerance   VIF
1  (Constant)     255.994    26.499               9.660    .000
   gluts          -3.545     .838         -.393   -4.229   .000   1.000       1.000
2  (Constant)     -120.867   94.054               -1.285   .202
   gluts          -4.063     .786         -.450   -5.166   .000   .975        1.026
   age            5.837      1.406        .362    4.151    .000   .975        1.026
a. Dependent Variable: injury

Hierarchical Regression (step 1 = age, step 2 = gluts): focus on Model 2

Coefficients(a)

Model             B          Std. Error   Beta    t        Sig.   Tolerance   VIF
1  (Constant)     -169.650   105.129              -1.614   .110
   age            4.686      1.560        .290    3.004    .003   1.000       1.000
2  (Constant)     -120.867   94.054               -1.285   .202
   age            5.837      1.406        .362    4.151    .000   .975        1.026
   gluts          -4.063     .786         -.450   -5.166   .000   .975        1.026
a. Dependent Variable: injury

The partial regression coefficients in the final models ARE ALL THE SAME. The Venn diagrams in Tabachnick & Fidell may mistakenly lead you to believe that the order in which you enter the variables will affect the partial regression coefficients that you see in the final model. No. In hierarchical regression, the order in which you enter the variables will affect ΔR² (again, because the comparison model is being changed), but order will not affect the values of B or β in the final model. And, of course, the values of B and β in the final model of a hierarchical regression will be the same values of B and β that you would have obtained by putting all variables into the same step using standard multiple regression.
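One last way to see this: R² for the full model depends only on which predictors are in it, not on the order they went in, so both orderings have to land on the same total. Order only changes how that total gets partitioned across the steps:

Gluts first: .154 + .128 = .282
Age first:   .084 + .198 = .282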