252solnJ2 11/26/07 (Open this document in 'Page Layout' view!)
J. MULTIPLE REGRESSION
1. Two explanatory variables
a. Model
b. Solution.
2. Interpretation
Text 14.1, 14.3, 14.4 [14.1, 14.3, 14.4] (14.1, 14.3, 14.4). Minitab output for 14.4 will be available on the website; you must be able to answer the problem from it.
3. Standard errors
J1, Text Problems 14.9, 14.14, 14.23, 14.26 [14.13, 14.16, 14.20, 14.23] (14.17, 14.19, 14.24, 14.27)
4. Stepwise regression
Problem J2 (J1), Text exercises 14.32, 14.34 [14.28, 14.29] (14.32, 14.33)
(Computer Problem – instructions to be given)
This document includes solutions to text exercises 14.32 and 14.34 [14.28 and 14.29 in the 9th edition] and Problem J2. Again, there are extra problems included. They are well worth looking at!
________________________________________________________________________________
Stepwise Regression Problems.
Exercise 14.32 [14.28 in 9th] (14.32 in 8th edition): Assume the following ANOVA summary, where there are 2 independent variables, the regression sum of squares for $X_1$ is 20, and the regression sum of squares for $X_2$ is 15. a) Is there a significant relationship between $Y$ and each of the independent variables at the 5% significance level? b) Compute $r_{Y1.2}^2$ and $r_{Y2.1}^2$.
SOURCE        DF     SS     MS            F     p
Regression     2     30     30/2 = 15
Error         10    120    120/10 = 12
Total         12    150
Solution: Complete the ANOVA.

Analysis of Variance
SOURCE        DF     SS
Regression     2     30
Error         10    120
Total         12    150

SOURCE   DF   SSR
X1        1    20
X2        1    15

14.28 (a)
For $X_1$: $SSR(X_1 \mid X_2) = SSR(X_1 \text{ and } X_2) - SSR(X_2) = 30 - 15 = 15$. This is the additional explanatory power from adding $X_1$ after $X_2$; we are itemizing the regression sum of squares.
The ANOVA would read

SOURCE   DF     SS     MS            F       p
X2        1     15
X1        1     15     15            1.25
Error    10    120    120/10 = 12
Total    12    150

$F = \frac{SSR(X_1 \mid X_2)}{MSE} = \frac{15}{120/10} = 1.25$. This is compared to $F_{.05}^{(1,10)} = 4.96$. Do not reject $H_0$. There is not sufficient evidence that the variable $X_1$ contributes to a model already containing $X_2$.
For $X_2$: $SSR(X_2 \mid X_1) = SSR(X_1 \text{ and } X_2) - SSR(X_1) = 30 - 20 = 10$. This is the additional explanatory power from adding $X_2$ after $X_1$.
The ANOVA would read

SOURCE   DF     SS     MS            F       p
X1        1     20
X2        1     10     10            0.833
Error    10    120    120/10 = 12
Total    12    150

$F = \frac{SSR(X_2 \mid X_1)}{MSE} = \frac{10}{120/10} = 0.833$. This is compared to $F_{.05}^{(1,10)} = 4.96$. Do not reject $H_0$. There is not sufficient evidence that the variable $X_2$ contributes to a model already containing $X_1$.
Neither independent variable, $X_1$ nor $X_2$, makes a significant contribution to the model in the presence of the other variable. Also, the overall regression equation involving both independent variables is not significant:

$F = \frac{MSR}{MSE} = \frac{30/2}{120/10} = 1.25$. This is compared to $F_{.05}^{(2,10)} = 4.10$.

Neither variable should be included in the model, and other variables should be investigated.
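As a numerical check, the partial F computations above can be reproduced in a few lines. This is a minimal Python sketch, not part of the original solution; the helper name partial_f is my own, and scipy is assumed to be available for the critical value.

from scipy.stats import f

def partial_f(ssr_both, ssr_other, sse, df_error):
    # F = SSR(added variable | other variable) / MSE, with (1, df_error) degrees of freedom
    ssr_added = ssr_both - ssr_other      # additional explanatory power
    mse = sse / df_error
    return ssr_added / mse

# Exercise 14.32: SSR(X1 and X2) = 30, SSE = 120 with 10 error degrees of freedom
print(partial_f(30, 15, 120, 10))   # 1.25  (X1 given X2)
print(partial_f(30, 20, 120, 10))   # 0.833 (X2 given X1)
print(f.ppf(0.95, 1, 10))           # about 4.96; neither F exceeds it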
(b) $r_{Y1.2}^2 = \frac{SSR(X_1 \mid X_2)}{SST - SSR(X_1 \text{ and } X_2) + SSR(X_1 \mid X_2)} = \frac{15}{150 - 30 + 15} = 0.1111$. The denominator is what is left unexplained after adding $X_2$ only. Holding constant the effect of variable $X_2$, 11.11% of the variation in $Y$ can be explained by the variation in variable $X_1$.

$r_{Y2.1}^2 = \frac{SSR(X_2 \mid X_1)}{SST - SSR(X_1 \text{ and } X_2) + SSR(X_2 \mid X_1)} = \frac{10}{150 - 30 + 10} = 0.0769$. For this one the denominator is what is left unexplained after adding $X_1$ only. Holding constant the effect of variable $X_1$, 7.69% of the variation in $Y$ can be explained by the variation in variable $X_2$.
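The coefficients of partial determination follow the same bookkeeping. The sketch below is again illustrative only, with a hypothetical helper name of my own choosing; it reproduces both values.

def partial_r2(ssr_added, sst, ssr_both):
    # r^2 = SSR(added | other) / (SST - SSR(both) + SSR(added | other));
    # the denominator is what the other variable alone leaves unexplained
    return ssr_added / (sst - ssr_both + ssr_added)

print(partial_r2(15, 150, 30))   # 0.1111, r^2 of Y and X1 holding X2 constant
print(partial_r2(10, 150, 30))   # 0.0769, r^2 of Y and X2 holding X1 constant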
Exercise 14.34 [14.29 in 9th] (14.33 in 8th edition): Recall the Warecost problem (14.4 in the 10th edition), for which Minitab produced the following output.
Analysis of Variance
Source            DF        SS        MS        F        P
Regression         2    3368.1    1684.0    74.13    0.000
Residual Error    21     477.0      22.7
Total             23    3845.1

Source    DF    Seq SS
Sales      1    2726.8
Orders     1     641.3

Note that these two add to the Regression SS in the ANOVA.
Or we could run the independent variables in the opposite sequence.

Analysis of Variance
Source            DF        SS        MS        F        P
Regression         2    3368.1    1684.0    74.13    0.000
Residual Error    21     477.0      22.7
Total             23    3845.1

Source    DF    Seq SS
Orders     1    3246.1
Sales      1     122.0
a) Determine whether the independent variables make a significant contribution to the regression model and what the most appropriate model is, using $\alpha = .05$. b) Compute $r_{Y1.2}^2$ and $r_{Y2.1}^2$.
Solution:

(a) For $X_1$: $SSR(X_1 \mid X_2) = SSR(X_1 \text{ and } X_2) - SSR(X_2) = 3368.1 - 3246.1 = 122.0$.

$F = \frac{SSR(X_1 \mid X_2)}{MSE} = \frac{122.0}{477/21} = 5.37$. Compare this with $F_{.05}^{(1,21)} = 4.32$. Reject $H_0$. There is evidence that the variable $X_1$ contributes to a model already containing $X_2$.
For $X_2$: $SSR(X_2 \mid X_1) = SSR(X_1 \text{ and } X_2) - SSR(X_1) = 3368.1 - 2726.8 = 641.3$.

$F = \frac{SSR(X_2 \mid X_1)}{MSE} = \frac{641.3}{477/21} = 28.23$. Compare this with $F_{.05}^{(1,21)} = 4.32$. Reject $H_0$. There is evidence that the variable $X_2$ contributes to a model already containing $X_1$.

Since each independent variable, $X_1$ and $X_2$, makes a significant contribution to the model in the presence of the other variable, both variables should be included in the model.
(b) $r_{Y1.2}^2 = \frac{SSR(X_1 \mid X_2)}{SST - SSR(X_1 \text{ and } X_2) + SSR(X_1 \mid X_2)} = \frac{122.0}{3845.1 - 3368.1 + 122.0} = .2037$. Holding constant the effect of the number of orders, 20.37% of the variation in $Y$ can be explained by the variation in sales.

$r_{Y2.1}^2 = \frac{SSR(X_2 \mid X_1)}{SST - SSR(X_1 \text{ and } X_2) + SSR(X_2 \mid X_1)} = \frac{641.3}{3845.1 - 3368.1 + 641.3} = .5735$. Holding constant the effect of sales, 57.35% of the variation in $Y$ can be explained by the variation in the number of orders.
More of Old text exercise 11.5:
The Minitab printout read
The regression equation is
y = 1.10 + 1.64 x - 0.160 x*x
Predictor      Coef        Stdev      t-ratio       p
Constant       1.09524     0.09135      11.99    0.000
x              1.63571     0.07131      22.94    0.000
x*x           -0.15952     0.01142     -13.97    0.000

s = 0.1047    R-sq = 99.7%    R-sq(adj) = 99.6%
Analysis of Variance
SOURCE        DF        SS        MS        F        p
Regression     2   15.0305    7.5152   686.17    0.000
Error          4    0.0438    0.0110
Total          6   15.0743

SOURCE   DF    SEQ SS
x         1   12.8929
x*x       1    2.1376
Two sections remain unexplained. First, R-squared adjusted:

$R^2$ adjusted for degrees of freedom is $R_a^2$ or $R_k^2 = \frac{(n-1)R^2 - k}{n-k-1}$, where $k$ is the number of independent variables and $n$ is the number of observations. It is intended to compensate for the fact that increasing the number of independent variables always raises $R^2$. In this version of the regression, we have $n = 7$ observations and $k = 2$ independent variables, so $R_k^2 = \frac{(7-1)(0.997) - 2}{7-2-1} = .9955$. If this does not go up as you add new independent variables, you can be rather sure that the new variables accomplish nothing.
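A one-line function makes the formula easy to check. This is an illustrative Python sketch of the adjustment above, not anything from the text or from Minitab.

def r2_adjusted(r2, n, k):
    # ((n - 1) * R^2 - k) / (n - k - 1), the degrees-of-freedom adjustment above
    return ((n - 1) * r2 - k) / (n - k - 1)

print(r2_adjusted(0.997, 7, 2))   # 0.9955, matching R-sq(adj) = 99.6% after rounding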
Second, sequential sums of squares:

The two values given, 12.8929 and 2.1376, represent an itemization of the regression sum of squares, 15.0305. This means that we could split up the ANOVA to read

SOURCE   DF        SS        MS           F        p
x         1   12.8929   12.8929     1172.08
x*x       1    2.1376    2.1376      194.32
Error     4    0.0438    0.0110
Total     6   15.0743

If we compare these Fs to $F_{.05}^{(1,4)}$, for example, we will see that both Fs are highly significant, indicating that x explained Y well, but that adding x*x definitely improved the explanation. Note that the coefficient of x*x has a t-ratio of -13.97. If this is squared, it gives us, except for rounding error, 194.32, so the test is essentially the same as a t-test on the last independent variable added.
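The sequential Fs and this squared-t relationship can be verified directly. A small Python sketch, using the rounded MSE of 0.0110 from the printout:

mse = 0.0110                       # rounded error MS from the ANOVA above
for name, seq_ss in [("x", 12.8929), ("x*x", 2.1376)]:
    print(name, seq_ss / mse)      # 1172.08 and 194.33, the sequential Fs
print((-13.97) ** 2)               # 195.16, equal to 194.33 except for rounding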
Problem J1: $n = 80$, $k = 3$, $R^2 = .95$
$n = 80$, $k = 4$, $R^2 = .99$
Use an F test to show whether the second regression is an improvement.
Solution: There are two ways to do this.

a) Fake an ANOVA. Call the first result $R_3^2 = .95$ and the second $R_4^2 = .99$. Remember that $R^2 = \frac{SSR}{SST}$, so that if $SST = 100$ and $R_3^2 = .95$, then $SSR = 95$. For the two regressions we get

Source    SS    DF          Source    SS    DF
3 Xs      95     3    and   4 Xs      99     4
Error      5    76          Error      1    75
Total    100    79          Total    100    79
If we combine these and get new values of F by dividing the MS values by 0.013333, our new error MS, we get

Source       SS    DF    MS           F           F.05
3 Xs         95     3    31.67        2375.24     F.05(3,75) = 2.78
1 more X      4     1     4            300.00     F.05(1,75) = 3.97
Error         1    75     0.013333
Total       100    79

The second F test gives us our answer. We reject the hypothesis that the 4th x does not contribute to the explanation of Y.
b) If we add $r$ independent variables so that we end with $k$ independent variables, use the formula

$F^{(r,\,n-k-1)} = \frac{n-k-1}{r} \left( \frac{R_k^2 - R_{k-r}^2}{1 - R_k^2} \right)$. Here $k = 4$ and $r = 1$, so $n - k - 1 = 80 - 4 - 1 = 75$ and

$F^{(1,75)} = 75 \left( \frac{.99 - .95}{1 - .99} \right) = 300$. The test gives the same result as in a).
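The formula in b) generalizes easily. The sketch below is a hypothetical Python helper (the name is my own), which reproduces the F of 300 found in a).

def f_added(r2_k, r2_prev, n, k, r):
    # F with (r, n-k-1) degrees of freedom for adding r variables, ending with k
    return (n - k - 1) / r * (r2_k - r2_prev) / (1 - r2_k)

print(f_added(0.99, 0.95, 80, 4, 1))   # 300.0, same as the faked-ANOVA approach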
Exercise 11.87 in James T. McClave, P. George Benson and Terry Sincich, Statistics for Business and Economics, 8th ed., Prentice Hall, 2001, last year's text:

a) Minitab was used to fit the complete model, $\hat{Y} = 14.6 + 0.611X_1 + 0.439X_2 + 0.080X_3 + 0.064X_4$, and the reduced model, $\hat{Y} = 14.0 + 0.642X_1 + 0.396X_2$, with $n = 20$. The ANOVAs follow.
Complete model:
Analysis of Variance
SOURCE        DF        SS        MS        F       p
Regression     4    831.09    207.77    20.41   0.002
Error         15    152.66     10.18
Total         19    983.75

Reduced model:
Analysis of Variance
SOURCE        DF        SS        MS        F       p
Regression     2    823.31    411.66    43.61   0.000
Error         17    160.44      9.44
Total         19    983.75
b) The Minitab printout shows that in the complete model the error sum of squares is 152.66 and in the reduced model it is 160.44. These represent the unexplained part of each model. The amount of reduction of the unexplained part was thus only 7.78 out of 160.44.
c) We have 5 parameters in the complete model and 3 in the reduced model.
d) We can investigate the null hypothesis $H_0: \beta_3 = \beta_4 = 0$ against the alternative that at least one of the betas is significant.
e) We can do this using an ANOVA or using the formula in Problem J1. Note that between the two regressions, the regression sum of squares rose from 823.31 to 831.09, an increase of 7.78.
If we combine these two ANOVA tables and get new values of F by dividing the MS values by the new MSE, we get

Source        SS       DF    MS           F          F.05
2 Xs          823.31    2    411.66       40.449     F.05(2,15) = 3.68
2 more Xs       7.78    2      3.89        0.3822    F.05(2,15) = 3.68
Error         152.66   15     10.17733
Total         983.75   19
We cannot reject the null hypothesis because our computed F is less than the table F.
Alternately, if we add $r$ independent variables so that we end with $k$ independent variables, use the formula

$F^{(r,\,n-k-1)} = \frac{n-k-1}{r} \left( \frac{R_k^2 - R_{k-r}^2}{1 - R_k^2} \right)$. Here $k = 4$ and $r = 2$, so $n - k - 1 = 20 - 4 - 1 = 15$ and

$F^{(2,15)} = \frac{15}{2} \left( \frac{.845 - .837}{1 - .845} \right) = 0.387$.
f) By comparing 0.38 with other values of $F^{(2,15)}$ on the F table, you should be able to figure out that the p-value is above 10%.
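For what it is worth, the exact p-value is easy to get by machine. A sketch, assuming scipy is available:

from scipy.stats import f

# Right-tail probability of F = 0.387 with (2, 15) degrees of freedom
print(f.sf(0.387, 2, 15))   # about 0.69, comfortably above 10%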