252solnI2 11/9/07 (Open this document in 'Page Layout' view!)
I. LINEAR REGRESSION-Confidence Intervals and Tests
1. Confidence Intervals for $b_1$.
2. Tests for $b_1$.
3. Confidence Intervals and Tests for $b_0$
   Text 13.40-13.42, 13.49 [13.35-13.37, 13.43] (13.35-13.37, 13.43) (In 13.42 [13.37] do the test for $b_0$ as well)
4. Prediction and Confidence Intervals for $y$
   Text 13.55-13.56, 13.58 [13.49-13.51] (13.47-13.49)
This document includes exercises 13.55, 13.56 and 13.58 [13.49 to 13.51 in the 9th edition].
----------------------------------------------------------------------------------------------------------------------------------
Problems involving confidence and Prediction Intervals for Y.
Answers below are heavily edited versions of the answers in the Instructor's Solution Manual. Note that your text uses $S_{YX}$ for the standard error, which I call $s_e$. The sum $S_{xy}$ and the covariance $s_{xy} = \frac{\sum xy - n\bar{x}\bar{y}}{n-1}$, which was introduced last term, are not the same thing.
Exercise 13.55 [13.49 in 9th] (13.47 in 8th edition): Assume $\alpha = .05$, $\hat{Y} = 5.00 + 3.00x$, $\bar{X} = 2$, $s_e = S_{YX} = 1.0$ and $\sum(X - \bar{X})^2 = \sum X^2 - n\bar{X}^2 = SS_x = 20$. Construct a) a confidence interval and b) a prediction interval for Y when $X_0 = 2$.
Solution: From the Outline, the Confidence Interval is $Y_0 = \hat{Y}_0 \pm t\,s_{\hat{Y}}$, where
$$s_{\hat{Y}}^2 = s_e^2\left[\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum X^2 - n\bar{X}^2}\right] = s_e^2\left[\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_x}\right],$$
and the Prediction Interval is $Y_0 = \hat{Y}_0 \pm t\,s_Y$, where
$$s_Y^2 = s_e^2\left[\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_x} + 1\right].$$
Given: $\alpha = .05$, $\hat{Y} = 5.00 + 3.00x$, $n = 20$, $df = n - 2 = 18$, $s_e = S_{YX} = 1.0$, $\bar{X} = 2$ and $\sum(X - \bar{X})^2 = \sum X^2 - n\bar{X}^2 = SS_x = 20$, we need confidence and prediction intervals for $X_0 = 2$. Note that if $X_0 = \bar{X} = 2$, then $\hat{Y}_0 = \bar{Y} = 5 + 3(2) = 11$.
a) So the confidence interval is $Y_0 = \hat{Y}_0 \pm t\,s_{\hat{Y}}$, where the t is $t_{.025}^{n-2} = t_{.025}^{18} = 2.101$ and
$$s_{\hat{Y}}^2 = s_e^2\left[\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_x}\right] = 1.0^2\left[\frac{1}{20} + \frac{(2-2)^2}{20}\right] = 1.0\left(\frac{1}{20}\right) = 0.05.$$
So the 95% confidence interval is $Y_0 = \hat{Y}_0 \pm t\,s_{\hat{Y}} = 11 \pm 2.101\sqrt{0.05} = 11 \pm 0.470$ or 10.530 to 11.470.
b) The prediction interval is the same as the confidence interval except for the addition of 1 inside the parentheses. $Y_0 = \hat{Y}_0 \pm t\,s_Y$, where
$$s_Y^2 = s_e^2\left[\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_x} + 1\right] = 1.0^2\left[\frac{1}{20} + \frac{(2-2)^2}{20} + 1\right] = 1.0\left(\frac{1}{20} + 1\right) = 1.05.$$
So the 95% prediction interval is $Y_0 = \hat{Y}_0 \pm t\,s_Y = 11 \pm 2.101\sqrt{1.05} = 11 \pm 2.153$ or 8.847 to 13.153.
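
As a numerical check, here is a short Python sketch (not part of the original handout) that applies the same two formulas; the inputs are the given values $n = 20$, $s_e = 1.0$, $\bar X = 2$, $SS_x = 20$ and $X_0 = 2$, and scipy's t quantile reproduces the 2.101.

# Sketch (not from the handout): 95% confidence and prediction intervals
# for Exercise 13.55, using the given values and Yhat = 5 + 3X.
from math import sqrt
from scipy import stats

n, s_e, Xbar, SS_x, X0 = 20, 1.0, 2.0, 20.0, 2.0
Yhat0 = 5.00 + 3.00 * X0                    # predicted value at X0 (= 11)
t = stats.t.ppf(0.975, df=n - 2)            # t(18, .025) = 2.101

s2_Yhat = s_e**2 * (1/n + (X0 - Xbar)**2 / SS_x)        # variance for the mean response
s2_Y    = s_e**2 * (1/n + (X0 - Xbar)**2 / SS_x + 1)    # variance for a single new Y

print("95% CI:", Yhat0 - t * sqrt(s2_Yhat), Yhat0 + t * sqrt(s2_Yhat))  # about 10.53 to 11.47
print("95% PI:", Yhat0 - t * sqrt(s2_Y), Yhat0 + t * sqrt(s2_Y))        # about  8.85 to 13.15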
Exercise 13.56 [13.50 in 9th] (13.48 in 8th edition): For the previous problem, construct a) a confidence interval and b) a prediction interval for Y if $X_0 = 4$. c) Compare with the previous problem.

Solution: Given: $\alpha = .05$, $\hat{Y} = 5.00 + 3.00x$, $n = 20$, $df = n - 2 = 18$, $s_e = S_{YX} = 1.0$, $\bar{X} = 2$ and $\sum(X - \bar{X})^2 = \sum X^2 - n\bar{X}^2 = SS_x = 20$, we need confidence and prediction intervals for $X_0 = 4$. Note that if $X_0 = 4$, $\hat{Y}_0 = 5 + 3(4) = 17$. The value of t is unchanged.




a) $s_{\hat{Y}}^2 = s_e^2\left[\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_x}\right] = 1.0^2\left[\frac{1}{20} + \frac{(4-2)^2}{20}\right] = 1.0\left[\frac{1}{20} + \frac{4}{20}\right] = \frac{5}{20} = 0.25$. So the 95% confidence interval is $Y_0 = \hat{Y}_0 \pm t\,s_{\hat{Y}} = 17 \pm 2.101\sqrt{0.25} = 17 \pm 1.0505$ or 15.9495 to 18.0505.
1 X X 2

 1 4  22

4
 1

b) sY2  s e2   0
 1  1.0 2  
 1  1.0 
 1  1.25 . So the 95% prediction


n

SS x 
20  
 20 20 
 20


interval is Y  Yˆ  t s  17  2.101 1.25  17  2.349 or 14.651 to 19.349.
0
0
Y

c) One of the major parts of these intervals is the term $(X_0 - \bar{X})^2$. The farther $X_0$ is from the mean of $X$, the larger this interval should be.
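
To see part c) numerically, the sketch below (not in the handout) tabulates the half-widths of both intervals at several values of $X_0$, using the same assumed quantities as in Exercises 13.55-13.56; both widths grow as $X_0$ moves away from $\bar X = 2$.

# Sketch: interval half-widths versus distance from Xbar,
# with n = 20, s_e = 1.0, Xbar = 2, SS_x = 20 as in Exercises 13.55-13.56.
from math import sqrt
from scipy import stats

n, s_e, Xbar, SS_x = 20, 1.0, 2.0, 20.0
t = stats.t.ppf(0.975, df=n - 2)   # 2.101

for X0 in (2, 4, 6, 8):
    ci_half = t * sqrt(s_e**2 * (1/n + (X0 - Xbar)**2 / SS_x))
    pi_half = t * sqrt(s_e**2 * (1/n + (X0 - Xbar)**2 / SS_x + 1))
    print(f"X0 = {X0}: CI half-width {ci_half:.3f}, PI half-width {pi_half:.3f}")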
Exercise 13.58 [13.51 in 9th] (13.49 in 8th edition): More Petfood. They want 95% confidence and prediction intervals for Y if $X_0 = 8$ and an explanation of the difference between the two intervals.
Solution: From our previous work on this problem, $\bar{x} = \frac{\sum x}{n} = \frac{150}{12} = 12.5$ and $\bar{y} = \frac{\sum y}{n} = \frac{28.5}{12} = 2.375$. We had spare parts: $S_{xy} = \sum xy - n\bar{x}\bar{y} = 384 - 12(12.5)(2.375) = 27.75$, $SS_x = \sum x^2 - n\bar{x}^2 = 2250 - 12(12.5)^2 = 375$, $SST = SS_y = \sum y^2 - n\bar{y}^2 = 70.69 - 12(2.375)^2 = 3.0025$. $n = 12$.
$b_1 = \frac{S_{xy}}{SS_x} = \frac{27.75}{375} = 0.074$ (the slope) and $b_0 = \bar{y} - b_1\bar{x} = 2.375 - 0.074(12.5) = 1.45$ (the intercept).
$\hat{Y} = b_0 + b_1 x$ became $\hat{Y} = 1.45 + 0.074x$. $SSR = b_1 S_{xy} = 0.074(27.75) = 2.0535$.

$R^2 = \frac{SSR}{SST} = \frac{2.0535}{3.0025} = .6839$. If $n = 12$, $t_{.025}^{n-2} = t_{.025}^{10} = 2.228$.
a) $SSE = SST - SSR = 3.0025 - 2.0535 = 0.9490$, so $s_e^2 = \frac{SSE}{n-2} = \frac{0.9490}{12-2} = 0.09490$. The standard error (of the estimate) is $s_e = S_{YX} = \sqrt{0.09490} = .3081$.
So $s_{b_1}^2 = \frac{s_e^2}{SS_x} = \frac{.09490}{375} = 0.0002531$ and $s_{b_1} = \sqrt{0.0002531} = .01591$.
$s_{b_0}^2 = s_e^2\left[\frac{1}{n} + \frac{\bar{X}^2}{\sum X^2 - n\bar{X}^2}\right] = .09490\left[\frac{1}{12} + \frac{(12.5)^2}{375}\right] = .04745$ and $s_{b_0} = \sqrt{0.04745} = .21783$.
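
The same arithmetic can be mirrored in a few lines of Python (a sketch, not part of the handout); the only inputs are the column totals quoted above for the petfood data ($\sum x = 150$, $\sum y = 28.5$, $\sum xy = 384$, $\sum x^2 = 2250$, $\sum y^2 = 70.69$, $n = 12$).

# Sketch: slope, intercept, R-squared and coefficient standard errors
# for the petfood problem, starting from the column sums quoted above.
from math import sqrt

n = 12
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = 150.0, 28.5, 384.0, 2250.0, 70.69

xbar, ybar = sum_x / n, sum_y / n
S_xy = sum_xy - n * xbar * ybar      # 27.75
SS_x = sum_x2 - n * xbar**2          # 375
SS_y = sum_y2 - n * ybar**2          # 3.0025 (= SST)

b1 = S_xy / SS_x                     # 0.074  (slope)
b0 = ybar - b1 * xbar                # 1.45   (intercept)
SSR = b1 * S_xy                      # 2.0535
s_e2 = (SS_y - SSR) / (n - 2)        # 0.09490
R2 = SSR / SS_y                      # 0.6839

s_b1 = sqrt(s_e2 / SS_x)                        # about .0159
s_b0 = sqrt(s_e2 * (1/n + xbar**2 / SS_x))      # about .218
print(b1, b0, R2, sqrt(s_e2), s_b1, s_b0)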



At $X_0 = 8$, $\hat{Y}_0 = 1.45 + 0.074(8) = 2.042$.
a) $s_{\hat{Y}}^2 = s_e^2\left[\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_x}\right] = 0.09490\left[\frac{1}{12} + \frac{(8 - 12.5)^2}{375}\right] = 0.0949\left[\frac{1}{12} + \frac{20.25}{375}\right] = 0.01303$. So the 95% confidence interval is $Y_0 = \hat{Y}_0 \pm t\,s_{\hat{Y}} = 2.042 \pm 2.228\sqrt{0.01303} = 2.042 \pm 0.254$ or 1.788 to 2.296.
b) $s_Y^2 = s_e^2\left[\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_x} + 1\right] = 0.0949\left[\frac{1}{12} + \frac{(8 - 12.5)^2}{375} + 1\right] = 0.0949\left[\frac{1}{12} + \frac{20.25}{375} + 1\right] = 0.10793$. So the 95% prediction interval is $Y_0 = \hat{Y}_0 \pm t\,s_Y = 2.042 \pm 2.228\sqrt{0.10793} = 2.042 \pm 0.732$ or 1.310 to 2.774.
c) Part (b) provides an estimate for an individual response and Part (a) provides an estimate for an average
predicted value.
Recall the computer output from this problem.
Obs   Space    Sales     Fit  SE Fit  Residual  St Resid
  1     5.0   1.6000  1.8200  0.1488   -0.2200     -0.82
  2     5.0   2.2000  1.8200  0.1488    0.3800      1.41
  3     5.0   1.4000  1.8200  0.1488   -0.4200     -1.56
  4    10.0   1.9000  2.1900  0.0974   -0.2900     -0.99
  5    10.0   2.4000  2.1900  0.0974    0.2100      0.72
  6    10.0   2.6000  2.1900  0.0974    0.4100      1.40
  7    15.0   2.3000  2.5600  0.0974   -0.2600     -0.89
  8    15.0   2.7000  2.5600  0.0974    0.1400      0.48
  9    15.0   2.8000  2.5600  0.0974    0.2400      0.82
 10    20.0   2.6000  2.9300  0.1488   -0.3300     -1.22
 11    20.0   2.9000  2.9300  0.1488   -0.0300     -0.11
 12    20.0   3.1000  2.9300  0.1488    0.1700      0.63

You can get the fit on line 1, for example, by computing $\hat{Y}_0 = 1.45 + 0.074(5)$, since the space for that point is 5. To get SE Fit, use $s_{\hat{Y}}^2 = s_e^2\left[\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_x}\right] = 0.09490\left[\frac{1}{12} + \frac{(5 - 12.5)^2}{375}\right]$ and take the square root. So with that and $t_{.025}^{n-2} = t_{.025}^{10} = 2.228$, you can get the confidence interval. To get the prediction interval, remember that $s_Y^2 = s_{\hat{Y}}^2 + s_e^2$.
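
The Fit and SE Fit columns can be rebuilt from the fitted equation in a short sketch (not from the handout); the inputs are the Space values 5, 10, 15, 20 and the quantities $s_e^2 = 0.0949$, $\bar X = 12.5$, $SS_x = 375$, $n = 12$ derived above.

# Sketch: reproduce the Fit and SE Fit columns of the Minitab printout
# from Yhat = 1.45 + 0.074 X with s_e^2 = 0.0949, Xbar = 12.5, SS_x = 375, n = 12.
from math import sqrt

n, s_e2, Xbar, SS_x = 12, 0.09490, 12.5, 375.0
b0, b1 = 1.45, 0.074

for space in (5.0, 10.0, 15.0, 20.0):
    fit = b0 + b1 * space                                    # the Fit column
    se_fit = sqrt(s_e2 * (1/n + (space - Xbar)**2 / SS_x))   # the SE Fit column
    print(f"Space {space:4.1f}: Fit = {fit:.4f}, SE Fit = {se_fit:.4f}")
# Space 5.0 gives Fit 1.82 and SE Fit about 0.149, matching the first line of the printout.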
James T. McClave, P. George Benson and Terry Sincich, Statistics for Business and Economics, 8th ed., Prentice Hall, 2001 (last year's text), had some more problems of this type if you want more practice. The advantage of these problems is that they provide the whole story from the initial data to the end.
Exercise 10.57: a) Scattergram.
Exercise 10.58
[Scattergram with fitted line: Y = 1.5 + 0.946429X, R-Squared = 0.900; y plotted against x]
b)
Row    y    x   x2   xy   y2
  1    0   -2    4    0    0
  2    3    0    0    0    9
  3    2    2    4    4    4
  4    3    4   16   12    9
  5    8    6   36   48   64
  6   10    8   64   80  100
  7   11   10  100  110  121
      37   28  224  254  307

$\sum x = 28$, $\sum y = 37$, $\sum x^2 = 224$, $\sum y^2 = 307$, $\sum xy = 254$, and $n = 7$. $\bar{x} = \frac{\sum x}{n} = \frac{28}{7} = 4$ and $\bar{y} = \frac{\sum y}{n} = \frac{37}{7} = 5.28571$.
Spare Parts: $S_{xy} = \sum xy - n\bar{x}\bar{y} = 254 - 7(4)(5.28571) = 106.000$.
$SS_x = \sum x^2 - n\bar{x}^2 = 224 - 7(4)^2 = 112.000$.
$SS_y = \sum y^2 - n\bar{y}^2 = 307 - 7(5.28571)^2 = 111.429$.
$b_1 = \frac{S_{xy}}{SS_x} = \frac{106.000}{112.000} = 0.94643$ and $b_0 = \bar{y} - b_1\bar{x} = 5.28571 - 0.94643(4) = 1.5000$.
So the equation is $\hat{Y} = 1.5000 + 0.9464x$. $df = n - 2 = 5$.
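
Here is a short sketch (not part of the original solution) of the same bookkeeping in Python, starting from the seven $(x, y)$ pairs in the table rather than the column sums:

# Sketch: spare parts and coefficients for Exercise 10.58,
# computed directly from the seven data points in the table above.
x = [-2, 0, 2, 4, 6, 8, 10]
y = [ 0, 3, 2, 3, 8, 10, 11]
n = len(x)

xbar = sum(x) / n                                                # 4
ybar = sum(y) / n                                                # 5.28571
S_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar    # 106.0
SS_x = sum(xi**2 for xi in x) - n * xbar**2                      # 112.0
SS_y = sum(yi**2 for yi in y) - n * ybar**2                      # 111.429

b1 = S_xy / SS_x           # 0.94643
b0 = ybar - b1 * xbar      # 1.5000
print(S_xy, SS_x, SS_y, b1, b0)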
c) $SSE = SST - SSR = SS_y - b_1 S_{xy}$. So
$$s_e^2 = \frac{SSE}{n-2} = \frac{SS_y - b_1 S_{xy}}{n-2} = \frac{111.429 - 0.9464(106.000)}{5} = 2.2215.$$
d) If $x_0 = 3$, $\hat{Y}_0 = 1.5000 + 0.9464x_0 = 1.5000 + 0.9464(3) = 4.339$. $\alpha = .10$, so $t_{.05}^{n-2} = t_{.05}^{5} = 2.015$. From the outline, the Confidence Interval is $Y_0 = \hat{Y}_0 \pm t\,s_{\hat{Y}}$, where
$$s_{\hat{Y}}^2 = s_e^2\left[\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum X^2 - n\bar{X}^2}\right] = s_e^2\left[\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_x}\right] = 2.2215\left[\frac{1}{7} + \frac{(3-4)^2}{112.00}\right] = 0.33719.$$
So $s_{\hat{Y}} = \sqrt{0.33719} = 0.5807$ and $Y_0 = \hat{Y}_0 \pm t\,s_{\hat{Y}} = 4.339 \pm 2.015(0.5807) = 4.339 \pm 1.170$ or 3.17 to 5.51.
e) The Prediction Interval is $Y_0 = \hat{Y}_0 \pm t\,s_Y$, where
$$s_Y^2 = s_e^2\left[\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum X^2 - n\bar{X}^2} + 1\right] = s_e^2\left[\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_x} + 1\right] = 2.2215\left[\frac{1}{7} + \frac{(3-4)^2}{112.00} + 1\right] = 2.55869.$$
So $s_Y = \sqrt{2.55869} = 1.5996$ and $Y_0 = \hat{Y}_0 \pm t\,s_Y = 4.339 \pm 2.015(1.5996) = 4.339 \pm 3.223$ or 1.12 to 7.56.
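
As a check on d) and e) (a sketch, not in the handout), the 90% intervals at $x_0 = 3$ follow from the quantities just computed, with scipy supplying $t_{.05}^{5} = 2.015$:

# Sketch: 90% confidence and prediction intervals at x0 = 3 for Exercise 10.58,
# using b0 = 1.5, b1 = 0.9464, s_e^2 = 2.2215, xbar = 4, SS_x = 112, n = 7.
from math import sqrt
from scipy import stats

n, s_e2, xbar, SS_x = 7, 2.2215, 4.0, 112.0
b0, b1, x0 = 1.5000, 0.9464, 3.0

yhat0 = b0 + b1 * x0                    # about 4.339
t = stats.t.ppf(0.95, df=n - 2)         # t(5, .05) = 2.015

s_yhat = sqrt(s_e2 * (1/n + (x0 - xbar)**2 / SS_x))       # 0.5807
s_y    = sqrt(s_e2 * (1/n + (x0 - xbar)**2 / SS_x + 1))   # 1.5996
print("90% CI:", yhat0 - t * s_yhat, yhat0 + t * s_yhat)  # about 3.17 to 5.51
print("90% PI:", yhat0 - t * s_y, yhat0 + t * s_y)        # about 1.12 to 7.56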


f) A Minitab plot of these intervals is shown below. Note that the prediction interval is larger than the confidence interval. This is because the confidence interval is actually only a confidence interval for the average value of Y for a given X. If we go back to our model $y = \beta_0 + \beta_1 x + \varepsilon$, where $\varepsilon$ is assumed to be a Normally distributed random variable, the confidence interval only reflects the effects of the variability of $\varepsilon$ on our estimates of the slope and the y-intercept. The prediction interval is a confidence interval for the actual value of Y for a given X. Thus, in addition to the effects of $\varepsilon$ on the coefficients, it also reflects the fact that the actual value of Y would not be on the regression line, even if our regression line were absolutely correct, because of $\varepsilon$.
[Regression Plot: Y = 1.5 + 0.946429X, R-Squared = 0.900; regression line with 90% CI and 90% PI bands, y plotted against x]
Exercise 10.61: (Use Y in thousands, actually millions)
a) Scattergram.
[Regression Plot (scattergram with fitted line): Y = 5.56613 - 0.210346X, R-Squared = 0.844; y plotted against x]
b) Y is homes sold in millions. X is the interest rate in per cent (i.e. '8.00' means 8%). I prepared a table
with the given data and got the following results using the program 252sols given in the solution to exercise
10.42:
Worksheet size: 100000 cells
MTB > RETR 'C:\MINITAB\MBS10-61.MTW'.
Retrieving worksheet from file: C:\MINITAB\MBS10-61.MTW
Worksheet was saved on 4/10/2001
MTB > Execute 'C:\MINITAB\252SOLS.MTB' 1.
Executing from file: C:\MINITAB\252SOLS.MTB
Data Display

 Row      y      x       x2       xy       y2
   1  1.990  15.82  250.272  31.4818   3.9601
   2  2.719  13.44  180.634  36.5434   7.3930
   3  2.868  13.81  190.716  39.6071   8.2254
   4  3.214  12.29  151.044  39.5001  10.3298
   5  3.565  10.09  101.808  35.9709  12.7092
   6  3.526  10.17  103.429  35.8594  12.4327
   7  3.594  10.31  106.296  37.0541  12.9168
   8  3.346  10.22  104.448  34.1961  11.1957
   9  3.211  10.08  101.606  32.3669  10.3105
  10  3.220   9.20   84.640  29.6240  10.3684
  11  3.520   8.43   71.065  29.6736  12.3904
  12  3.802   7.36   54.170  27.9827  14.4552
  13  3.946   8.59   73.788  33.8961  15.5709
  14  3.812   8.05   64.803  30.6866  14.5313
  15  4.087   8.03   64.481  32.8186  16.7036
  16  4.215   7.76   60.218  32.7084  17.7662
Data Display
K1      54.6350    (= Σy)
K2      163.650    (= Σx)
K3      1763.42    (= Σx²)
K4      539.970    (= Σxy)
K5      191.259    (= Σy²)

Data Display
K17     16.0000    (= n)
K18     89.5853    (= SS_x = Σx² - nx̄²)
K19     4.69788    (= SS_y = Σy² - nȳ²)
K20     -18.8437   (= S_xy = Σxy - nx̄ȳ)
K21     10.2281    (= x̄ = Σx/n)
K22     3.41469    (= ȳ = Σy/n)
I then used a regression command equivalent to the one in the text.
MTB > regress c1 on 1 c2;
SUBC> predict 8.
Regression Analysis

The regression equation is
y = 5.57 - 0.210 x

Predictor       Coef     Stdev   t-ratio       p
Constant      5.5661    0.2540     21.91   0.000
x           -0.21035   0.02419     -8.69   0.000

s = 0.2290     R-sq = 84.4%     R-sq(adj) = 83.3%

Analysis of Variance

SOURCE        DF        SS        MS        F       p
Regression     1    3.9637    3.9637    75.59   0.000
Error         14    0.7341    0.0524
Total         15    4.6979

Unusual Observations
Obs.      x        y      Fit  Stdev.Fit  Residual  St.Resid
  1    15.8   1.9900   2.2385     0.1469   -0.2485     -1.41 X
X denotes an obs. whose X value gives it large influence.

     Fit  Stdev.Fit       95.0% C.I.          95.0% P.I.
  3.8834     0.0786  (3.7147, 4.0521)   (3.3639, 4.4028)

c) From the table above, our formula is $\hat{Y} = 5.5661 - 0.21035x$.
To test $H_0: \beta_1 = 0$ against $H_1: \beta_1 \ne 0$, use $t = \frac{b_1}{s_{b_1}} = -8.69$. $df = n - 2 = 16 - 2 = 14$. Make a diagram. Show an almost Normal curve with a 95% 'accept' region between $-t_{.025}^{14} = -2.145$ and $t_{.025}^{14} = 2.145$. Since -8.69 is not between these two values, we reject the null hypothesis and must conclude that the slope is significant. Mortgage rates seem to affect the number of homes sold. Note also that the ANOVA, which tests the same thing, gives us a high value of F, and that all p-values are zero, indicating that the null hypothesis of insignificance would be rejected at any significance level.
Note: To test $H_0: \beta_0 = 0$ against $H_1: \beta_0 \ne 0$, use $t = \frac{b_0}{s_{b_0}} = 21.91$. Since this is in our 'reject' region, we reject the null hypothesis and conclude that the intercept is significant.
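
Both t tests can be confirmed with a few lines of Python (a sketch, not part of the handout); the coefficients and their standard deviations are read off the Minitab printout above.

# Sketch: t tests for the slope and intercept in Exercise 10.61,
# using the Coef and Stdev columns of the Minitab printout (n = 16).
from scipy import stats

n = 16
b1, s_b1 = -0.21035, 0.02419    # slope and its standard deviation
b0, s_b0 = 5.5661, 0.2540       # intercept and its standard deviation

t_crit = stats.t.ppf(0.975, df=n - 2)   # t(14, .025) = 2.145
for name, coef, se in (("slope", b1, s_b1), ("intercept", b0, s_b0)):
    t_ratio = coef / se
    print(f"{name}: t = {t_ratio:.2f}, reject H0 at the 5% level: {abs(t_ratio) > t_crit}")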
d) The coefficient of determination, $R^2$, is 84.4%. This indicates that 84.4% of the variation in Y is explained by the regression.
e), f) The last line of the regression printout gives the confidence and prediction intervals. The confidence interval tells us that there is a 95% probability that the average number of homes sold when the interest rate is 8% is between 3.71 and 4.05 million. The prediction interval tells us that there is a 95% probability that in a given year when the mortgage rate is 8%, the actual number of homes sold will be between 3.36 and 4.40 million.
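
As a rough check (not in the handout), both intervals can be rebuilt from the Fit and Stdev.Fit values on the last line of the printout together with s = 0.2290, using the relation $s_Y^2 = s_{\hat Y}^2 + s_e^2$ quoted earlier.

# Sketch: rebuild the 95% C.I. and P.I. at x = 8 for Exercise 10.61
# from Fit = 3.8834, Stdev.Fit = 0.0786 and s = 0.2290 in the printout.
from math import sqrt
from scipy import stats

n = 16
fit, se_fit, s = 3.8834, 0.0786, 0.2290
t = stats.t.ppf(0.975, df=n - 2)    # 2.145

s_pred = sqrt(se_fit**2 + s**2)     # s_Y = sqrt(s_Yhat^2 + s_e^2)
print("95% C.I.:", fit - t * se_fit, fit + t * se_fit)   # about 3.715 to 4.052
print("95% P.I.:", fit - t * s_pred, fit + t * s_pred)   # about 3.364 to 4.403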
g) See the previous problem.