Stat 501 Oct. 22
Some Chapter 7 Ideas

1. If x1 and x2 are not correlated (correlation = 0), then SSR(x1|x2) = SSR(x1) and SSR(x2|x1) = SSR(x2). That is, in a multiple regression, x1 and x2 make independent contributions to reducing SSE. The sample estimate of β1 is the same in the model $E(y) = \beta_0 + \beta_1 x_1$ as it is in the model $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$. The sample estimate of β2 is the same in the model $E(y) = \beta_0 + \beta_2 x_2$ as it is in the model $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$. In other words, in the multiple regression, the coefficient multiplying x1 is estimated independently of x2, and the coefficient multiplying x2 is estimated independently of x1.

2. If x1 and x2 are correlated (correlation ≠ 0), then in general SSR(x1|x2) ≠ SSR(x1) and SSR(x2|x1) ≠ SSR(x2). That is, in a multiple regression, x1 and x2 do not make independent contributions to reducing SSE. The sample estimate of β1 is not the same in the model $E(y) = \beta_0 + \beta_1 x_1$ as it is in the model $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$. The sample estimate of β2 is not the same in the model $E(y) = \beta_0 + \beta_2 x_2$ as it is in the model $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$. In other words, in the multiple regression, the coefficient multiplying x1 is not estimated independently of x2, and the coefficient multiplying x2 is not estimated independently of x1.

3. Each t-test for an individual coefficient (other than the intercept) is essentially assessing the significance of the size of SSR(this variable | other variables in the model). More precisely, the square of the t-statistic equals the partial F-statistic SSR(this variable | other variables in the model)/MSE, with 1 and error df. This is true whether x1 and x2 are correlated or not.

4. The sequential SS given by Minitab or other programs can be used to put together a general linear F-test. Suppose the null hypothesis is that the β coefficients multiplying two particular variables are both 0. List those two variables last in the list of predictor variables when specifying the model. The sum of the sequential SS for those two variables will equal SSE(Reduced) − SSE(Full). The F-statistic is then $F = \dfrac{[\text{SSE(Reduced)} - \text{SSE(Full)}]/2}{\text{MSE(Full)}}$, with 2 and error df(Full) degrees of freedom.

5. The sequential SS is the only thing affected by the order in which the predictor variables are listed.
All other aspects of the fit will be the same regardless of order. For example, MSE, R², and the estimated coefficients don't depend on the order.

The example below illustrates point 4 (and point 2 as well).

Example: For the hospital infection risk data, y = infection risk, x1 = average length of stay, x2 = number of daily bacterial cultures done, x3 = daily number of patients, x4 = number of beds in hospital, and x5 = number of nurses employed.

Output for Full Model

The regression equation is
InfctRsk = 0.841 + 0.247 Stay + 0.0525 Cultures - 0.00054 Census - 0.00039 Beds + 0.00291 Nurses

Analysis of Variance
Source           DF       SS      MS      F      P
Regression        5   98.086  19.617  20.32  0.000
Residual Error  107  103.294   0.965
Total           112  201.380

Source    DF  Seq SS
Stay       1  57.305
Cultures   1  33.397
Census     1   4.645
Beds       1   0.057
Nurses     1   2.681

Suppose we test $H_0: \beta_4 = \beta_5 = 0$. The Seq SS column can be used to learn that
SSE(Reduced) − SSE(Full) = SSR(Beds, Nurses | Stay, Cultures, Census) = 0.057 + 2.681 = 2.738,
and the F-statistic is
$F = \dfrac{2.738/2}{0.965} = 1.42$ with 2 and 107 df. (By the way, this won't be significant.)

We can check this by estimating the reduced model that includes only the first three variables. The results are below. You'll see that SSE(Reduced) = 106.032. Above you can find that SSE(Full) = 103.294. The difference is 106.032 − 103.294 = 2.738, the same value we got using the Seq SS.

Output for Reduced Model

The regression equation is
InfctRsk = 1.07 + 0.218 Stay + 0.0568 Cultures + 0.00150 Census

Source           DF       SS      MS      F      P
Regression        3   95.347  31.782  32.67  0.000
Residual Error  109  106.032   0.973
Total           112  201.380

Point 2 also shows up here. The coefficients of Stay, Cultures, and Census change when Beds and Nurses are dropped (0.247 versus 0.218 for Stay, and the Census coefficient changes sign). This reflects the correlation of Beds and Nurses with the other predictors.

---------------------------------------------------------------------
NOTE: In the full model we could have listed Beds and Nurses in either order, so long as they appear last in the list of variables. Here's the result of listing them in the order Nurses, Beds.
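Source    DF  Seq SS
Stay       1  57.305
Cultures   1  33.397
Census     1   4.645
Nurses     1   2.718
Beds       1   0.020

Adding the last two values gives 2.718 + 0.020 = 2.738, the same value we got above. The individual Seq SS for Beds and Nurses change with the order, but their sum does not.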
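The Python sketch below is a quick arithmetic check of the partial F-test in the example. It uses only the numbers printed in the Minitab output. It recomputes the extra sum of squares three ways: from the two SSE values, and from each ordering of the Seq SS. It then computes the F-statistic.

The p-value does not come from an F table. Instead it uses the closed-form upper tail of an F distribution with 2 numerator df, $P(F > f) = (1 + 2f/\nu_2)^{-\nu_2/2}$. This formula holds only when the numerator df is 2. The helper function names are hypothetical, not anything from Minitab.

```python
# Partial F-test for H0: beta4 = beta5 = 0, using only values printed in
# the Minitab output (the raw hospital data are not needed).


def partial_f(extra_ss: float, num_df: int, mse_full: float) -> float:
    """Hypothetical helper: F = ([SSE(Reduced) - SSE(Full)] / q) / MSE(Full)."""
    return (extra_ss / num_df) / mse_full


def f_upper_tail_2df(f: float, den_df: int) -> float:
    """Hypothetical helper: P(F > f) for an F(2, den_df) random variable.

    With exactly 2 numerator df the tail has the closed form
    (1 + 2f/den_df) ** (-den_df/2), so no statistics library is needed.
    """
    return (1.0 + 2.0 * f / den_df) ** (-den_df / 2.0)


# Values copied from the full-model and reduced-model output.
sse_full, df_full = 103.294, 107
sse_reduced = 106.032
seq_ss_beds_then_nurses = [0.057, 2.681]
seq_ss_nurses_then_beds = [2.718, 0.020]

mse_full = sse_full / df_full                  # matches the 0.965 shown as MS
extra_from_sse = sse_reduced - sse_full        # SSE(Reduced) - SSE(Full)
extra_from_seq = sum(seq_ss_beds_then_nurses)  # Seq SS, order Beds, Nurses
extra_from_seq_swapped = sum(seq_ss_nurses_then_beds)  # order Nurses, Beds

f_stat = partial_f(extra_from_sse, 2, mse_full)
p_value = f_upper_tail_2df(f_stat, df_full)

print(f"extra SS: {extra_from_sse:.3f}, {extra_from_seq:.3f}, {extra_from_seq_swapped:.3f}")
print(f"F = {f_stat:.2f} on 2 and {df_full} df, p = {p_value:.3f}")
```

All three extra-SS values agree at 2.738, and F rounds to 1.42. The p-value comes out at roughly 0.25, which is consistent with the remark that the test won't be significant.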
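Points 1, 2, 4, and 5 can also be checked by simulation. The sketch below uses hypothetical simulated data, not the hospital data. It fits least squares by solving the normal equations in plain Python. It computes sequential SS as the drop in SSE when each predictor is added in the listed order. The variable names, sample size, seed, and data-generating coefficients are assumptions chosen only for illustration.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit(y, columns):
    """Least squares of y on an intercept plus `columns`; returns (coefficients, SSE)."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    coefs = solve(xtx, xty)
    sse = sum((yi - sum(c * v for c, v in zip(coefs, r))) ** 2 for r, yi in zip(rows, y))
    return coefs, sse


def sequential_ss(y, named_columns):
    """Seq SS in the listed order: the drop in SSE as each predictor is added."""
    _, prev_sse = fit(y, [])
    used, seq = [], {}
    for name, col in named_columns:
        used.append(col)
        _, sse = fit(y, used)
        seq[name] = prev_sse - sse
        prev_sse = sse
    return seq


# Hypothetical simulated data: x2 and x4 are deliberately correlated with x1.
random.seed(501)
n = 60
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.6 * a + random.gauss(0, 1) for a in x1]
x3 = [random.gauss(0, 1) for _ in range(n)]
x4 = [0.5 * a + 0.5 * c + random.gauss(0, 1) for a, c in zip(x1, x3)]
y = [1 + 2 * a + b + 0.5 * c + random.gauss(0, 1) for a, b, c in zip(x1, x2, x3)]

# Point 4: with x3 and x4 listed last, their Seq SS sum to SSE(Reduced) - SSE(Full),
# whichever of the two is listed first.
seq_34 = sequential_ss(y, [("x1", x1), ("x2", x2), ("x3", x3), ("x4", x4)])
seq_43 = sequential_ss(y, [("x1", x1), ("x2", x2), ("x4", x4), ("x3", x3)])
_, sse_reduced = fit(y, [x1, x2])
coefs_full, sse_full = fit(y, [x1, x2, x3, x4])
extra_ss = sse_reduced - sse_full
print(seq_34["x3"] + seq_34["x4"], seq_43["x4"] + seq_43["x3"], extra_ss)

# Point 5: reversing the predictor order leaves the coefficients and SSE unchanged.
coefs_rev, sse_rev = fit(y, [x4, x3, x2, x1])
print(coefs_full[1:], coefs_rev[:0:-1], sse_full, sse_rev)

# Point 2: x1 and x2 are correlated, so x1's coefficient depends on whether x2 is included.
slope_alone = fit(y, [x1])[0][1]
slope_with_x2 = fit(y, [x1, x2])[0][1]
print(slope_alone, slope_with_x2)

# Point 1: make w exactly uncorrelated with z by removing its least-squares fit on z.
z = [random.gauss(0, 1) for _ in range(n)]
w_raw = [random.gauss(0, 1) for _ in range(n)]
zbar, wbar = sum(z) / n, sum(w_raw) / n
slope_wz = sum((zi - zbar) * (wi - wbar) for zi, wi in zip(z, w_raw)) / sum(
    (zi - zbar) ** 2 for zi in z
)
w = [wi - wbar - slope_wz * (zi - zbar) for zi, wi in zip(z, w_raw)]
y2 = [3 + 1.5 * zi - 2 * wi + random.gauss(0, 1) for zi, wi in zip(z, w)]
sst2 = fit(y2, [])[1]
ssr_z_alone = sst2 - fit(y2, [z])[1]                    # SSR(z)
ssr_z_given_w = fit(y2, [w])[1] - fit(y2, [z, w])[1]    # SSR(z | w)
print(ssr_z_alone, ssr_z_given_w, fit(y2, [z])[0][1], fit(y2, [z, w])[0][1])
```

Solving the normal equations directly is adequate for a small illustration like this one. Regression software generally uses more numerically stable methods, such as a QR decomposition.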