Stat 501 Oct. 29
Here are results carried over from the Oct. 25 handout. Y = infection risk in hospitals. The variables I1,
I2, I3 are indicators of region of the U.S. There are four regions in all.
The regression equation is
InfctRsk = - 0.281 + 0.597 Stay - 1.23 I1 - 1.02 I2 - 1.28 I3
Predictor      Coef  SE Coef      T      P
Constant    -0.2813   0.6707  -0.42  0.676
Stay        0.59737  0.07710   7.75  0.000
I1          -1.2331   0.3821  -3.23  0.002
I2          -1.0244   0.3462  -2.96  0.004
I3          -1.2823   0.3248  -3.95  0.000
S = 1.01823 R-Sq = 40.7% R-Sq(adj) = 38.5%
Source           DF       SS      MS      F      P
Regression        4   74.141  18.535  17.88  0.000
Residual Error  104  107.827   1.037
Total           108  181.968
Source  DF  Seq SS
Stay     1  57.523
I1       1   0.436
I2       1   0.027
I3       1  16.155
Equations for each region:
If Region = 1 (I1 = 1, I2 = I3 = 0), the equation is −0.281 + 0.597 Stay − 1.23 = −1.511 + 0.597 Stay
If Region = 2 (I2 = 1, I1 = I3 = 0), the equation is −0.281 + 0.597 Stay − 1.02 = −1.301 + 0.597 Stay
If Region = 3 (I3 = 1, I1 = I2 = 0), the equation is −0.281 + 0.597 Stay − 1.28 = −1.561 + 0.597 Stay
If Region = 4 (I1 = I2 = I3 = 0), the equation is −0.281 + 0.597 Stay
Notice that each β multiplying an indicator shifts the intercept relative to the intercept for region 4.
The t-tests for these βs therefore test whether each of regions 1, 2, and 3 differs from region 4.
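As a quick check of the arithmetic, the region-specific intercepts can be recomputed from the rounded coefficients in the fitted equation. This is only a minimal Python sketch; the names `coef` and `region_intercept` are illustrative and are not part of the Minitab output.

```python
# Rounded coefficients copied from the fitted equation in this handout.
coef = {"Constant": -0.281, "Stay": 0.597, "I1": -1.23, "I2": -1.02, "I3": -1.28}

def region_intercept(region):
    """Intercept for a region; region 4 is the baseline (all indicators 0)."""
    shift = coef.get(f"I{region}", 0.0)  # there is no I4 term, so region 4 gets 0
    return round(coef["Constant"] + shift, 3)

for region in (1, 2, 3, 4):
    print(f"Region {region}: InfctRsk = {region_intercept(region)} + {coef['Stay']} Stay")
```

The slope on Stay is the same in every line because this model has no interaction terms; only the intercept moves.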
Overall test for Regions
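It’s clear from the t-tests that regions 1, 2, and 3 differ from region 4, so an F-test of the region effect really isn’t necessary. But to show how it’s done, we’ll test $H_0: \beta_2 = \beta_3 = \beta_4 = 0$, where $\beta_2$, $\beta_3$, $\beta_4$ are the coefficients of I1, I2, and I3.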
We can use the Seq SS to get SSR(I1,I2,I3|Stay) = 0.436 + 0.027 + 16.155 = 16.618.
So F = (16.618/3)/1.037 = 5.34 with 3 and 104 df.
We can also determine SSR(I1,I2,I3|Stay) by finding the SSE for the reduced model (Stay as the only predictor) and taking the difference SSE(Reduced) − SSE(Full). Some results for this model are below.
SSE(Stay) = 124.445, so SSE(Reduced) − SSE(Full) = 124.445 − 107.827 = 16.618.
Regression Analysis: InfctRsk versus Stay
The regression equation is
InfctRsk = - 0.291 + 0.491 Stay
Analysis of Variance
Source           DF       SS      MS      F      P
Regression        1   57.523  57.523  49.46  0.000
Residual Error  107  124.445   1.163
Total           108  181.968
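The two routes to SSR(I1,I2,I3|Stay) described above, and the resulting F statistic, can be checked with a few lines of Python. This is a sketch that simply re-enters numbers from the Minitab output; the variable names are illustrative, and the p-value (from the F distribution with 3 and 104 df) is not computed here.

```python
# Values copied from the Minitab output in this handout.
seq_ss = {"I1": 0.436, "I2": 0.027, "I3": 16.155}  # Seq SS, entered after Stay
sse_full, df_full = 107.827, 104        # model with Stay, I1, I2, I3
sse_reduced, df_reduced = 124.445, 107  # model with Stay only
mse_full = 1.037                        # MSE of the full model

# Route 1: add the sequential sums of squares for the indicators.
ssr_from_seq = sum(seq_ss.values())
# Route 2: drop in SSE when the indicators are added to the Stay-only model.
ssr_from_sse = sse_reduced - sse_full

num_df = df_reduced - df_full  # number of coefficients being tested
f_stat = (ssr_from_sse / num_df) / mse_full
print(round(ssr_from_seq, 3), round(ssr_from_sse, 3), num_df, round(f_stat, 2))
```

Both routes give 16.618, and the F statistic is about 5.34 on 3 and 104 df.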
Final Model?
Some thought about the results leads to the realization that the first three regions are about the same (see the size of the coefficients and the plotted regression equations in the Oct. 25 handout). The fourth region differs. A “final” model might include just Stay and I4, an indicator for region 4, as predictors.
Results are below. Notice that “S=” and R² are nearly the same as for the model that included I1, I2, I3 as predictors.
The regression equation is
InfctRsk = - 1.51 + 0.602 Stay + 1.19 I4
Predictor      Coef  SE Coef      T      P
Constant    -1.5070   0.7041  -2.14  0.035
Stay        0.60169  0.07161   8.40  0.000
I4           1.1911   0.3077   3.87  0.000
S = 1.01420 R-Sq = 40.1% R-Sq(adj) = 39.0%
Analysis of Variance
Source           DF       SS      MS      F      P
Regression        2   72.935  36.467  35.45  0.000
Residual Error  106  109.033   1.029
Total           108  181.968
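The claim that S and R² barely change between the two models can be verified directly from the ANOVA tables. The sketch below uses the standard definitions S = sqrt(MSE), R-Sq = 1 − SSE/SSTO, and R-Sq(adj) = 1 − MSE/(SSTO/(n − 1)); the function name `fit_summary` is an illustrative choice.

```python
import math

def fit_summary(sse, df_error, ssto, df_total):
    """Return (S, R-Sq %, R-Sq(adj) %) computed from ANOVA-table entries."""
    mse = sse / df_error
    s = math.sqrt(mse)
    r_sq = 100 * (1 - sse / ssto)
    r_sq_adj = 100 * (1 - mse / (ssto / df_total))
    return s, r_sq, r_sq_adj

# SSTO and its df are the same for both models (same 109 hospitals).
ssto, df_total = 181.968, 108
models = {
    "Stay, I1, I2, I3": (107.827, 104),
    "Stay, I4": (109.033, 106),
}
for name, (sse, df_error) in models.items():
    s, r_sq, r_sq_adj = fit_summary(sse, df_error, ssto, df_total)
    print(f"{name}: S = {s:.5f}, R-Sq = {r_sq:.1f}%, R-Sq(adj) = {r_sq_adj:.1f}%")
```

The simpler model gives up a little R-Sq but has a slightly higher R-Sq(adj) and slightly smaller S, because it spends two fewer degrees of freedom.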
Adding Interaction
To investigate the interaction between Region and Stay, start from the full model above (Stay, I1, I2, I3) and add
Stay*I1, Stay*I2, and Stay*I3 as predictors. An overall test of interaction has the null hypothesis that the coefficients multiplying these three product variables all equal 0.
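To make the added predictors concrete, here is a small Python sketch that builds the predictor values for one hospital, including the three product terms. The function name `design_row` and the Stay values are hypothetical, chosen only for illustration; they are not taken from the data set.

```python
def design_row(stay, region):
    """Predictor values for one hospital: Stay, I1, I2, I3, Stay*I1, Stay*I2, Stay*I3."""
    indicators = [1 if region == k else 0 for k in (1, 2, 3)]  # region 4 is baseline
    interactions = [stay * i for i in indicators]
    return [stay] + indicators + interactions

# Hypothetical (Stay, region) pairs, for illustration only.
for stay, region in [(9.5, 1), (11.0, 3), (8.2, 4)]:
    print(region, design_row(stay, region))
```

For a region 4 hospital every product term is 0, so each interaction coefficient measures how the Stay slope in region 1, 2, or 3 differs from the region 4 slope. With 109 hospitals and 8 parameters in the interaction model, the overall test would use 3 numerator df and 109 − 8 = 101 error df.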
Interpreting Coefficients in the presence of interaction
Suppose a model is $E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2$.
How do we interpret t-test results for each of the coefficients $\beta_1$, $\beta_2$, $\beta_3$?