Oct. 29 Handout

Stat 501 Oct. 29

Here are results carried over from the Oct. 25 handout. Y = infection risk in hospitals. The variables I1, I2, I3 are indicators of region of the U.S. There are four regions in all, so region 4 is the baseline with I1=I2=I3=0.

The regression equation is

InfctRsk = - 0.281 + 0.597 Stay - 1.23 I1 - 1.02 I2 - 1.28 I3

Predictor      Coef  SE Coef      T      P
Constant    -0.2813   0.6707  -0.42  0.676
Stay        0.59737  0.07710   7.75  0.000
I1          -1.2331   0.3821  -3.23  0.002
I2          -1.0244   0.3462  -2.96  0.004
I3          -1.2823   0.3248  -3.95  0.000

S = 1.01823 R-Sq = 40.7% R-Sq(adj) = 38.5%

Source           DF       SS      MS      F      P
Regression        4   74.141  18.535  17.88  0.000
Residual Error  104  107.827   1.037
Total           108  181.968

Source  DF  Seq SS
Stay     1  57.523
I1       1   0.436
I2       1   0.027
I3       1  16.155

Equations for each region:

If Region = 1, I1=1, I2=I3=0 equation is −0.281 +0.597 Stay − 1.23 = −1.511+0.597 Stay

If Region = 2, I2=1, I1=I3=0 equation is −0.281 +0.597 Stay − 1.02 = −1.301+0.597 Stay

If Region = 3, I3=1, I1=I2=0 equation is −0.281 +0.597 Stay − 1.28 = −1.561+0.597 Stay

If Region = 4, I1=I2=I3=0 equation is −0.281 +0.597 Stay
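The region equations above come from adding each indicator coefficient to the constant. Here is a minimal Python sketch of that arithmetic; the names `REGION_SHIFT`, `region_intercept`, and `predicted_risk` are illustrative and are not part of the Minitab output.

```python
# Fitted coefficients copied from the regression output above.
CONSTANT = -0.2813
SLOPE_STAY = 0.59737
# Shift in intercept relative to region 4, the baseline region.
REGION_SHIFT = {1: -1.2331, 2: -1.0244, 3: -1.2823, 4: 0.0}

def region_intercept(region):
    """Intercept of the fitted line for one region."""
    return CONSTANT + REGION_SHIFT[region]

def predicted_risk(stay, region):
    """Fitted infection risk for a given length of stay and region."""
    return region_intercept(region) + SLOPE_STAY * stay

for r in sorted(REGION_SHIFT):
    print(f"Region {r}: InfctRsk = {region_intercept(r):.3f} + {SLOPE_STAY:.3f} Stay")
```

Because this sketch uses the unrounded coefficients, its intercepts can differ in the third decimal from the hand calculation above, which rounds the coefficients before adding.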

Notice that each β multiplying an indicator shifts the intercept relative to the intercept for region 4, the baseline region. The slope for Stay is the same in all four regions.

The t-test for each of these βs tests whether that region (1, 2, or 3) differs from region 4.

Overall test for Regions

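The t-tests make it clear that regions 1, 2, and 3 each differ from region 4, so an F-test of the overall region effect isn't really necessary. But, to show how it's done, we'll test $H_0: \beta_2 = \beta_3 = \beta_4 = 0$, where $\beta_2$, $\beta_3$, $\beta_4$ are the coefficients multiplying I1, I2, I3 (and $\beta_1$ multiplies Stay).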

We can use the Seq SS to get SSR(I1,I2,I3|Stay) = 0.436 + 0.027 + 16.155 = 16.618.

So F = (16.618/3)/1.037 ≈ 5.34 with 3 and 104 df, where 1.037 is the MSE of the full model.
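The Seq SS calculation can be checked with a few lines of Python. This is a sketch using only the numbers printed in the output above; the variable names are illustrative. Note that summing sequential sums of squares this way works only because the indicators were entered after Stay.

```python
# Sequential sums of squares for the indicators, entered after Stay.
seq_ss = {"I1": 0.436, "I2": 0.027, "I3": 16.155}
mse_full = 1.037        # MS for Residual Error in the full model
df_num = len(seq_ss)    # three indicator coefficients are tested
df_den = 104            # error df in the full model

ssr_extra = sum(seq_ss.values())            # SSR(I1,I2,I3|Stay)
f_stat = (ssr_extra / df_num) / mse_full    # partial F statistic

print(f"SSR(I1,I2,I3|Stay) = {ssr_extra:.3f}")
print(f"F = {f_stat:.2f} on {df_num} and {df_den} df")
```

If SciPy is available, a p-value could be obtained with `scipy.stats.f.sf(f_stat, df_num, df_den)`; that step is left out here to keep the sketch dependency-free.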

We can also determine SSR(I1,I2,I3|Stay) by finding the SSE for the reduced model (Stay as the only predictor) and taking the difference SSE(Reduced) − SSE(Full). Some results for this model are below.

SSE(Stay)=124.445, so SSE(Reduced)-SSE(Full) = 124.445 – 107.827 = 16.618

Regression Analysis: InfctRsk versus Stay

The regression equation is

InfctRsk = - 0.291 + 0.491 Stay

Analysis of Variance

Source           DF       SS      MS      F      P
Regression        1   57.523  57.523  49.46  0.000
Residual Error  107  124.445   1.163
Total           108  181.968
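The SSE-difference route described above is the general linear F-test for nested models. Below is a small Python sketch of it; the function name `general_linear_f` is an illustrative choice, and the inputs are the SSE and error df values from the two ANOVA tables.

```python
def general_linear_f(sse_reduced, df_reduced, sse_full, df_full):
    """F statistic comparing a reduced model with the full model that contains it.

    df_reduced and df_full are the error (residual) degrees of freedom.
    """
    extra_ss = sse_reduced - sse_full    # SSE(Reduced) - SSE(Full)
    extra_df = df_reduced - df_full      # number of coefficients being tested
    mse_full = sse_full / df_full
    return (extra_ss / extra_df) / mse_full

# Reduced model: Stay only. Full model: Stay, I1, I2, I3.
f_stat = general_linear_f(124.445, 107, 107.827, 104)
print(f"F = {f_stat:.2f} on {107 - 104} and 104 df")
```

The result agrees with the Seq SS calculation up to rounding, since both compute the same extra sum of squares, 16.618.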

Final Model?

Some thought about the results leads to the realization that the first three regions are about the same (see the size of the coefficients and the plotted regression equations in the Oct. 25 handout). The fourth region differs. A “final” model might include just Stay and I4 as predictors, where I4 is the indicator for region 4.

Results are below. Notice that S and R-Sq are nearly the same as for the model that included I1, I2, and I3 as predictors.

The regression equation is

InfctRsk = - 1.51 + 0.602 Stay + 1.19 I4

Predictor      Coef  SE Coef      T      P
Constant    -1.5070   0.7041  -2.14  0.035
Stay        0.60169  0.07161   8.40  0.000
I4           1.1911   0.3077   3.87  0.000

S = 1.01420 R-Sq = 40.1% R-Sq(adj) = 39.0%

Analysis of Variance

Source           DF       SS      MS      F      P
Regression        2   72.935  36.467  35.45  0.000
Residual Error  106  109.033   1.029
Total           108  181.968
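To see where S, R-Sq, and R-Sq(adj) come from, and why they barely change between the two models, they can be recomputed from the ANOVA tables. The following Python sketch does that; the function name `fit_summary` is illustrative, and the inputs are the sums of squares and degrees of freedom printed above.

```python
import math

def fit_summary(ss_regression, sse, df_error, df_total):
    """Return (S, R-Sq, R-Sq(adj)) from ANOVA table entries."""
    sst = ss_regression + sse
    s = math.sqrt(sse / df_error)                        # S = sqrt(MSE)
    r_sq = ss_regression / sst
    r_sq_adj = 1 - (sse / df_error) / (sst / df_total)
    return s, r_sq, r_sq_adj

models = {
    "Stay, I1, I2, I3": fit_summary(74.141, 107.827, 104, 108),
    "Stay, I4": fit_summary(72.935, 109.033, 106, 108),
}
for name, (s, r_sq, r_sq_adj) in models.items():
    print(f"{name}: S = {s:.5f}, R-Sq = {r_sq:.1%}, R-Sq(adj) = {r_sq_adj:.1%}")
```

The simpler model gives up a little R-Sq but has a slightly smaller S and a slightly larger R-Sq(adj), because it spends two fewer degrees of freedom on region.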

Adding Interaction

To investigate the interaction between Region and Stay, start from the full model given earlier (Stay, I1, I2, I3) and add Stay*I1, Stay*I2, and Stay*I3 as predictors. An overall test of interaction would have a null hypothesis that the coefficients multiplying these three product variables all equal 0.
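The interaction predictors are just products of Stay with each indicator. Here is a minimal Python sketch of how one row of predictors could be built; the function name `design_row` and the example values are hypothetical and are not taken from the hospital data.

```python
def design_row(stay, region):
    """Predictor values for one hospital in the interaction model.

    Column order: constant, Stay, I1, I2, I3, Stay*I1, Stay*I2, Stay*I3.
    """
    i1, i2, i3 = (int(region == k) for k in (1, 2, 3))  # region 4 is the baseline
    return [1.0, stay, i1, i2, i3, stay * i1, stay * i2, stay * i3]

# Hypothetical hospitals: one in region 2 and one in region 4.
print(design_row(9.5, 2))
print(design_row(8.0, 4))
```

Each product column equals Stay for hospitals in that region and 0 otherwise, which is what lets the Stay slope differ by region. The overall interaction test would compare this eight-column model with the five-column model without the products, using 3 numerator df.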

Interpreting Coefficients in the presence of interaction

Suppose a model is $E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2$.

How do we interpret t-test results for each of the coefficients $\beta_1$, $\beta_2$, $\beta_3$?
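One way to approach this question is to regroup the model $E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2$ by collecting the terms that multiply $X_1$. The rearrangement below is a standard algebraic step, offered as a sketch rather than as the handout's own answer:

```latex
E(Y) = (\beta_0 + \beta_2 X_2) + (\beta_1 + \beta_3 X_2)\, X_1
```

Read this way, the slope for $X_1$ is $\beta_1 + \beta_3 X_2$, so $\beta_1$ is the slope for $X_1$ only when $X_2 = 0$, and $\beta_3$ is the change in that slope per unit increase in $X_2$. By symmetry, $\beta_2$ is the slope for $X_2$ only when $X_1 = 0$. The t-tests for $\beta_1$ and $\beta_2$ therefore address those specific cases rather than overall effects, while the t-test for $\beta_3$ addresses whether there is an interaction.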
