252solnI2 11/9/07
I. LINEAR REGRESSION - Confidence Intervals and Tests
1. Confidence Intervals for $b_1$.
2. Tests for $b_1$.
3. Confidence Intervals and Tests for $b_0$.
Text 13.40-13.42, 13.49 [13.35-13.37, 13.43] (13.35-13.37, 13.43) (In 13.42 [13.37] do the test for $b_0$ as well.)
4. Prediction and Confidence Intervals for $y$.
Text 13.55-13.56, 13.58 [13.49-13.51] (13.47-13.49)

This document includes exercises 13.55, 13.56 and 13.58 (13.49 to 13.51 in the 9th edition).
------------------------------------------------------------------------------------------
Problems involving Confidence and Prediction Intervals for $Y$. The answers below are heavily edited versions of the answers in the Instructor's Solution Manual. Note that your text uses $S_{YX}$ for the standard error, which I call $s_e$. The sum $S_{xy} = \sum xy - n\bar{x}\bar{y}$ and the covariance $s_{xy} = \frac{\sum xy - n\bar{x}\bar{y}}{n-1}$, which was introduced last term, are not the same thing.

Exercise 13.55 [13.49 in 9th] (13.47 in 8th edition): Assume $\alpha = .05$, $\hat{Y} = 5.00 + 3.00x$, $\bar{X} = 2$, $s_e = S_{YX} = 1.0$ and $\sum (X - \bar{X})^2 = \sum X^2 - n\bar{X}^2 = SS_x = 20$. Construct a) a confidence interval and b) a prediction interval for $Y$ when $X_0 = 2$.

Solution: From the Outline, the Confidence Interval is $\mu_{Y_0} = \hat{Y}_0 \pm t\, s_{\hat{Y}}$, where
$$s_{\hat{Y}}^2 = s_e^2\left(\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum X^2 - n\bar{X}^2}\right) = s_e^2\left(\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_x}\right),$$
and the Prediction Interval is $Y_0 = \hat{Y}_0 \pm t\, s_Y$, where
$$s_Y^2 = s_e^2\left(\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_x} + 1\right).$$
Given: $\alpha = .05$, $\hat{Y} = 5.00 + 3.00x$, $n = 20$, $df = n - 2 = 18$, $s_e = S_{YX} = 1.0$, $\bar{X} = 2$ and $\sum X^2 - n\bar{X}^2 = SS_x = 20$, we need confidence and prediction intervals for $X_0 = 2$. Note that if $X_0 = 2$, $\hat{Y}_0 = 5 + 3(2) = 11$.

a) So the confidence interval is $\mu_{Y_0} = \hat{Y}_0 \pm t\, s_{\hat{Y}}$, where $t = t_{\alpha/2}^{(n-2)} = t_{.025}^{(18)} = 2.101$ and
$$s_{\hat{Y}}^2 = s_e^2\left(\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_x}\right) = 1.0^2\left(\frac{1}{20} + \frac{(2-2)^2}{20}\right) = 1.0\left(\frac{1}{20}\right) = 0.05.$$
So the 95% confidence interval is $\mu_{Y_0} = \hat{Y}_0 \pm t\, s_{\hat{Y}} = 11 \pm 2.101\sqrt{0.05} = 11 \pm 0.470$, or 10.530 to 11.470.

b) The prediction interval is the same as the confidence interval except for the addition of 1 inside the parentheses.
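$Y_0 = \hat{Y}_0 \pm t\, s_Y$, where
$$s_Y^2 = s_e^2\left(\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_x} + 1\right) = 1.0^2\left(\frac{1}{20} + \frac{(2-2)^2}{20} + 1\right) = 1.05.$$
So the 95% prediction interval is $Y_0 = \hat{Y}_0 \pm t\, s_Y = 11 \pm 2.101\sqrt{1.05} = 11 \pm 2.153$, or 8.847 to 13.153.

Exercise 13.56 [13.50 in 9th] (13.48 in 8th edition): For the previous problem, construct a) a confidence interval and b) a prediction interval for $Y$ if $X_0 = 4$. c) Compare with the previous problem.

Solution: Given: $\alpha = .05$, $\hat{Y} = 5.00 + 3.00x$, $n = 20$, $df = n - 2 = 18$, $s_e = S_{YX} = 1.0$, $\bar{X} = 2$ and $\sum X^2 - n\bar{X}^2 = SS_x = 20$, we need confidence and prediction intervals for $X_0 = 4$. Note that if $X_0 = 4$, $\hat{Y}_0 = 5 + 3(4) = 17$. The value of $t$ is unchanged.

a) $$s_{\hat{Y}}^2 = s_e^2\left(\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_x}\right) = 1.0^2\left(\frac{1}{20} + \frac{(4-2)^2}{20}\right) = 1.0\left(\frac{1}{20} + \frac{4}{20}\right) = \frac{5}{20} = 0.25.$$
So the 95% confidence interval is $\mu_{Y_0} = \hat{Y}_0 \pm t\, s_{\hat{Y}} = 17 \pm 2.101\sqrt{0.25} = 17 \pm 1.0505$, or 15.9495 to 18.0505.

b) $$s_Y^2 = s_e^2\left(\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_x} + 1\right) = 1.0^2\left(\frac{1}{20} + \frac{(4-2)^2}{20} + 1\right) = 1.0\left(\frac{1}{20} + \frac{4}{20} + 1\right) = 1.25.$$
So the 95% prediction interval is $Y_0 = \hat{Y}_0 \pm t\, s_Y = 17 \pm 2.101\sqrt{1.25} = 17 \pm 2.349$, or 14.651 to 19.349.

c) One of the major parts of these intervals is the term $(X_0 - \bar{X})^2$. The farther $X_0$ is from the mean of $X$, the larger this term and the wider the intervals. Here $X_0 = 4$ is 2 units from $\bar{X} = 2$, so the half-width of the confidence interval grows from 0.470 to 1.0505 and that of the prediction interval grows from 2.153 to 2.349.

Exercise 13.58 [13.51 in 9th] (13.49 in 8th edition): More Petfood. They want 95% confidence and prediction intervals for $Y$ if $X_0 = 8$ and an explanation of the difference between the two intervals.

Solution: From our previous work on this problem, $n = 12$, $\bar{x} = \frac{\sum x}{n} = \frac{150}{12} = 12.5$ and $\bar{y} = \frac{\sum y}{n} = \frac{28.5}{12} = 2.375$. We had the spare parts
$S_{xy} = \sum xy - n\bar{x}\bar{y} = 384 - 12(12.5)(2.375) = 27.75$,
$SS_x = \sum x^2 - n\bar{x}^2 = 2250 - 12(12.5)^2 = 375$,
$SST = SS_y = \sum y^2 - n\bar{y}^2 = 70.69 - 12(2.375)^2 = 3.0025$.
Then $b_1 = \frac{S_{xy}}{SS_x} = \frac{27.75}{375} = 0.074$ (the slope) and $b_0 = \bar{y} - b_1\bar{x} = 2.375 - 0.074(12.5) = 1.45$ (the intercept), so $\hat{Y} = b_0 + b_1 x$ became $\hat{Y} = 1.45 + 0.074x$.
$SSR = b_1 S_{xy} = 0.074(27.75) = 2.0535$ and $R^2 = \frac{SSR}{SST} = \frac{2.0535}{3.0025} = .6839$. Since $n = 12$, $t_{\alpha/2}^{(n-2)} = t_{.025}^{(10)} = 2.228$.
$SSE = SST - SSR = 3.0025 - 2.0535 = 0.9490$, so $s_e^2 = \frac{SSE}{n-2} = \frac{0.9490}{12-2} = 0.09490$. The standard error (of the estimate) is $s_e = S_{YX} = \sqrt{0.09490} = .3081$.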
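The spare-parts arithmetic for the Petfood fit can be checked with a minimal Python sketch. It assumes only the column sums quoted above ($n = 12$, $\sum x = 150$, $\sum y = 28.5$, $\sum xy = 384$, $\sum x^2 = 2250$, $\sum y^2 = 70.69$); the helper name `fit_from_sums` is hypothetical, and it also returns the coefficient standard errors $s_{b_1}$ and $s_{b_0}$ that the solution works out by hand next.

```python
import math


def fit_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Simple linear regression from column sums ("spare parts" method).

    Hypothetical helper for illustration; returns the quantities
    that the text computes by hand.
    """
    x_bar = sum_x / n
    y_bar = sum_y / n
    s_xy = sum_xy - n * x_bar * y_bar   # S_xy
    ss_x = sum_x2 - n * x_bar ** 2      # SS_x
    sst = sum_y2 - n * y_bar ** 2       # SST = SS_y
    b1 = s_xy / ss_x                    # slope
    b0 = y_bar - b1 * x_bar             # intercept
    ssr = b1 * s_xy                     # regression sum of squares
    sse = sst - ssr                     # error sum of squares
    s_e2 = sse / (n - 2)                # s_e^2 with n - 2 degrees of freedom
    return {
        "b0": b0,
        "b1": b1,
        "r2": ssr / sst,
        "s_e": math.sqrt(s_e2),
        "s_b1": math.sqrt(s_e2 / ss_x),
        "s_b0": math.sqrt(s_e2 * (1 / n + x_bar ** 2 / ss_x)),
    }


# Petfood sums quoted in the solution to Exercise 13.58
petfood = fit_from_sums(n=12, sum_x=150, sum_y=28.5, sum_xy=384, sum_x2=2250, sum_y2=70.69)
for name, value in petfood.items():
    print(f"{name} = {value:.5f}")
```

Because no intermediate result is rounded, the printed values should agree with the hand calculations ($b_1 = 0.074$, $b_0 = 1.45$, $R^2 = .6839$, $s_e = .3081$) to the digits shown in the text.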
From these spare parts, $s_{b_1}^2 = s_e^2\left(\frac{1}{SS_x}\right) = \frac{.09490}{375} = 0.0002531$, so $s_{b_1} = \sqrt{0.0002531} = .01591$, and
$$s_{b_0}^2 = s_e^2\left(\frac{1}{n} + \frac{\bar{X}^2}{\sum X^2 - n\bar{X}^2}\right) = .09490\left(\frac{1}{12} + \frac{12.5^2}{375}\right) = .04745,$$
so $s_{b_0} = \sqrt{0.04745} = .21783$.

At $X_0 = 8$, $\hat{Y}_0 = 1.45 + 0.074(8) = 2.042$.

a) $$s_{\hat{Y}}^2 = s_e^2\left(\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_x}\right) = 0.09490\left(\frac{1}{12} + \frac{(8 - 12.5)^2}{375}\right) = 0.0949\left(\frac{1}{12} + \frac{20.25}{375}\right) = 0.01303.$$
So the 95% confidence interval is $\mu_{Y_0} = \hat{Y}_0 \pm t\, s_{\hat{Y}} = 2.042 \pm 2.228\sqrt{0.01303} = 2.042 \pm 0.254$, or 1.788 to 2.296.

b) $$s_Y^2 = s_e^2\left(\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_x} + 1\right) = 0.0949\left(\frac{1}{12} + \frac{(8 - 12.5)^2}{375} + 1\right) = 0.0949\left(\frac{1}{12} + \frac{20.25}{375} + 1\right) = 0.10793.$$
So the 95% prediction interval is $Y_0 = \hat{Y}_0 \pm t\, s_Y = 2.042 \pm 2.228\sqrt{0.10793} = 2.042 \pm 0.732$, or 1.310 to 2.774.

c) Part (b) provides an interval for an individual response, and part (a) provides an interval for the average (mean) response at $X_0 = 8$. The prediction interval is wider because it must also allow for the scatter of individual observations around the regression line, which is the extra $s_e^2$ term in $s_Y^2$.

Recall the computer output from this problem.

Obs  Space   Sales     Fit  SE Fit  Residual  St Resid
  1    5.0  1.6000  1.8200  0.1488   -0.2200     -0.82
  2    5.0  2.2000  1.8200  0.1488    0.3800      1.41
  3    5.0  1.4000  1.8200  0.1488   -0.4200     -1.56
  4   10.0  1.9000  2.1900  0.0974   -0.2900     -0.99
  5   10.0  2.4000  2.1900  0.0974    0.2100      0.72
  6   10.0  2.6000  2.1900  0.0974    0.4100      1.40
  7   15.0  2.3000  2.5600  0.0974   -0.2600     -0.89
  8   15.0  2.7000  2.5600  0.0974    0.1400      0.48
  9   15.0  2.8000  2.5600  0.0974    0.2400      0.82
 10   20.0  2.6000  2.9300  0.1488   -0.3300     -1.22
 11   20.0  2.9000  2.9300  0.1488   -0.0300     -0.11
 12   20.0  3.1000  2.9300  0.1488    0.1700      0.63

You can get Fit on line 1, for example, by computing $\hat{Y}_0 = 1.45 + 0.074(5) = 1.82$, since the space for that point is 5. To get SE Fit, use
$$s_{\hat{Y}}^2 = s_e^2\left(\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_x}\right) = 0.09490\left(\frac{1}{12} + \frac{(5 - 12.5)^2}{375}\right)$$
and take the square root. With that and $t_{\alpha/2}^{(n-2)} = t_{.025}^{(10)} = 2.228$, you can get the confidence interval. To get the prediction interval, remember that $s_Y^2 = s_{\hat{Y}}^2 + s_e^2$.

James T. McClave, P. George Benson and Terry Sincich, Statistics for Business and Economics, 8th ed., Prentice Hall, 2001, last year's text, had more problems of this type if you want more practice. The advantage of these problems is that they provide the whole story from the initial data to the end.
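The interval formulas used in Exercises 13.55, 13.56 and 13.58, and the SE Fit column of the Petfood printout, can all be reproduced with one small Python function. This is a sketch under stated assumptions: `regression_intervals` is a hypothetical name, and the critical value $t$ is passed in from a t table (2.101 or 2.228, as in the text) rather than computed.

```python
import math


def regression_intervals(x0, b0, b1, n, x_bar, ss_x, s_e, t):
    """Confidence interval for the mean of Y and prediction interval for Y at x0.

    Hypothetical helper; t is the tabled t_{alpha/2} with n - 2 df.
    """
    y_hat = b0 + b1 * x0
    # The term 1/n + (X0 - X-bar)^2 / SS_x shared by both intervals
    core = 1 / n + (x0 - x_bar) ** 2 / ss_x
    s_yhat = s_e * math.sqrt(core)      # s_Y-hat, used for the confidence interval
    s_y = s_e * math.sqrt(core + 1)     # s_Y, used for the prediction interval
    ci = (y_hat - t * s_yhat, y_hat + t * s_yhat)
    pi = (y_hat - t * s_y, y_hat + t * s_y)
    return y_hat, s_yhat, ci, pi


# Exercises 13.55 and 13.56: Y-hat = 5 + 3x, n = 20, X-bar = 2, SS_x = 20, s_e = 1.0
ex55 = regression_intervals(2, 5.0, 3.0, 20, 2.0, 20.0, 1.0, t=2.101)
ex56 = regression_intervals(4, 5.0, 3.0, 20, 2.0, 20.0, 1.0, t=2.101)

# Exercise 13.58 (Petfood): Y-hat = 1.45 + 0.074x, n = 12, X-bar = 12.5, SS_x = 375
s_e_pet = math.sqrt(0.09490)
ex58 = regression_intervals(8, 1.45, 0.074, 12, 12.5, 375.0, s_e_pet, t=2.228)

# SE Fit for the printout rows with Space = 5 and Space = 10
se_fit_5 = regression_intervals(5, 1.45, 0.074, 12, 12.5, 375.0, s_e_pet, t=2.228)[1]
se_fit_10 = regression_intervals(10, 1.45, 0.074, 12, 12.5, 375.0, s_e_pet, t=2.228)[1]

for label, (y_hat, _, ci, pi) in [("13.55", ex55), ("13.56", ex56), ("13.58", ex58)]:
    print(f"{label}: Y-hat = {y_hat:.3f}, CI = ({ci[0]:.3f}, {ci[1]:.3f}), "
          f"PI = ({pi[0]:.3f}, {pi[1]:.3f})")
print(f"SE Fit at Space 5: {se_fit_5:.4f}, at Space 10: {se_fit_10:.4f}")
```

Keeping the shared term `core` in one place makes the only difference between the two intervals visible: the prediction interval adds 1 inside the square root.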
Exercise 10.57: a) Scattergram.
Exercise 10.58
[Figure: Regression plot of y against x with fitted line Y = 1.5 + 0.946429X, R-Squared = 0.900]

b)
Row    y    x    x²    xy    y²
  1    0   -2     4     0     0
  2    3    0     0     0     9
  3    2    2     4     4     4
  4    3    4    16    12     9
  5    8    6    36    48    64
  6   10    8    64    80   100
  7   11   10   100   110   121
Sum   37   28   224   254   307

$\sum x = 28$, $\sum y = 37$, $\sum x^2 = 224$, $\sum xy = 254$, $\sum y^2 = 307$ and $n = 7$, so $\bar{x} = \frac{\sum x}{n} = \frac{28}{7} = 4$ and $\bar{y} = \frac{\sum y}{n} = \frac{37}{7} = 5.28571$.
Spare Parts: $S_{xy} = \sum xy - n\bar{x}\bar{y} = 254 - 7(4)(5.28571) = 106.000$.
$SS_x = \sum x^2 - n\bar{x}^2 = 224 - 7(4^2) = 112.000$.
$SS_y = \sum y^2 - n\bar{y}^2 = 307 - 7(5.28571)^2 = 111.429$.
$b_1 = \frac{S_{xy}}{SS_x} = \frac{106.000}{112.000} = 0.94643$ and $b_0 = \bar{y} - b_1\bar{x} = 5.28571 - 0.94643(4) = 1.5000$. So the equation is $\hat{Y} = 1.5000 + 0.9464x$, with $df = n - 2 = 5$.

c) $SSE = SST - SSR = SS_y - b_1 S_{xy}$. So
$$s_e^2 = \frac{SSE}{n-2} = \frac{SS_y - b_1 S_{xy}}{n-2} = \frac{111.429 - 0.94643(106.000)}{5} = 2.2215.$$

d) If $x_0 = 3$, $\hat{Y}_0 = 1.5000 + 0.94643(3) = 4.339$. Here $\alpha = .10$, so $t_{\alpha/2}^{(n-2)} = t_{.05}^{(5)} = 2.015$. From the outline, the Confidence Interval is $\mu_{Y_0} = \hat{Y}_0 \pm t\, s_{\hat{Y}}$, where
$$s_{\hat{Y}}^2 = s_e^2\left(\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum X^2 - n\bar{X}^2}\right) = 2.2215\left(\frac{1}{7} + \frac{(3-4)^2}{112.00}\right) = 0.33719.$$
So $s_{\hat{Y}} = \sqrt{0.33719} = 0.5807$ and $\mu_{Y_0} = \hat{Y}_0 \pm t\, s_{\hat{Y}} = 4.339 \pm 2.015(0.5807) = 4.339 \pm 1.170$, or 3.17 to 5.51.

e) The Prediction Interval is $Y_0 = \hat{Y}_0 \pm t\, s_Y$, where
$$s_Y^2 = s_e^2\left(\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_x} + 1\right) = 2.2215\left(\frac{1}{7} + \frac{(3-4)^2}{112.00} + 1\right) = 2.55869.$$
So $s_Y = \sqrt{2.55869} = 1.5996$ and $Y_0 = \hat{Y}_0 \pm t\, s_Y = 4.339 \pm 2.015(1.5996) = 4.339 \pm 3.223$, or 1.12 to 7.56.

f) A Minitab plot of these intervals is shown below. Note that the prediction interval is wider than the confidence interval. This is because the confidence interval is only a confidence interval for the average value of $Y$ for a given $X$. If we go back to our model $y = \beta_0 + \beta_1 x + \varepsilon$, where $\varepsilon$ is assumed to be a Normally distributed random variable, the confidence interval only reflects the effect of the variability of $\varepsilon$ on our estimates of the slope and the y-intercept. The prediction interval is a confidence interval for the actual value of $Y$ for a given $X$. Thus, in addition to the effect of $\varepsilon$ on the coefficients, it also reflects the fact that the actual value of $Y$ would not be on the regression line, even if our regression line were absolutely correct, because of $\varepsilon$.
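As a check on parts (b) through (e), here is a minimal Python sketch that starts from the seven $(x, y)$ pairs in the table and uses the 90% critical value $t_{.05}^{(5)} = 2.015$ quoted in part (d). The helper name `fit_and_intervals` is hypothetical. Because it does not round intermediate results, $s_e^2$ comes out as about 2.2214 rather than 2.2215, but the intervals agree with the hand results to two decimals.

```python
import math

# Data from the table in part (b) of Exercise 10.57
x = [-2, 0, 2, 4, 6, 8, 10]
y = [0, 3, 2, 3, 8, 10, 11]


def fit_and_intervals(x, y, x0, t):
    """Fit Y-hat = b0 + b1 x from raw data, then build the CI and PI at x0.

    Hypothetical helper; t is the tabled t_{alpha/2} with n - 2 df.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar   # S_xy
    ss_x = sum(xi ** 2 for xi in x) - n * x_bar ** 2                 # SS_x
    ss_y = sum(yi ** 2 for yi in y) - n * y_bar ** 2                 # SS_y = SST
    b1 = s_xy / ss_x
    b0 = y_bar - b1 * x_bar
    s_e2 = (ss_y - b1 * s_xy) / (n - 2)        # SSE / (n - 2)
    y_hat = b0 + b1 * x0
    core = 1 / n + (x0 - x_bar) ** 2 / ss_x
    half_ci = t * math.sqrt(s_e2 * core)        # t * s_Y-hat
    half_pi = t * math.sqrt(s_e2 * (core + 1))  # t * s_Y
    ci = (y_hat - half_ci, y_hat + half_ci)
    pi = (y_hat - half_pi, y_hat + half_pi)
    return b0, b1, s_e2, y_hat, ci, pi


b0, b1, s_e2, y_hat, ci, pi = fit_and_intervals(x, y, x0=3, t=2.015)
print(f"Y-hat = {b0:.4f} + {b1:.5f}x, s_e^2 = {s_e2:.4f}, Y-hat_0 = {y_hat:.3f}")
print(f"90% CI: ({ci[0]:.2f}, {ci[1]:.2f}); 90% PI: ({pi[0]:.2f}, {pi[1]:.2f})")
```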
[Figure: Minitab Regression Plot of y against x with fitted line Y = 1.5 + 0.946429X, R-Squared = 0.900, showing the regression line, 90% CI and 90% PI]

Exercise 10.61: (Use $Y$ in thousands; actually millions.)
a) Scattergram.
[Figure: Regression Plot of y against x with fitted line Y = 5.56613 - 0.210346X, R-Squared = 0.844]

b) $Y$ is homes sold in millions. $X$ is the interest rate in per cent (i.e. '8.00' means 8%). I prepared a table with the given data and got the following results using the program 252sols given in the solution to exercise 10.42:

Worksheet size: 100000 cells
MTB > RETR 'C:\MINITAB\MBS10-61.MTW'.
Retrieving worksheet from file: C:\MINITAB\MBS10-61.MTW
Worksheet was saved on 4/10/2001
MTB > Execute 'C:\MINITAB\252SOLS.MTB' 1.
Executing from file: C:\MINITAB\252SOLS.MTB

Data Display
Row      y       x        x2        xy        y2
  1  1.990   15.82   250.272   31.4818    3.9601
  2  2.719   13.44   180.634   36.5434    7.3930
  3  2.868   13.81   190.716   39.6071    8.2254
  4  3.214   12.29   151.044   39.5001   10.3298
  5  3.565   10.09   101.808   35.9709   12.7092
  6  3.526   10.17   103.429   35.8594   12.4327
  7  3.594   10.31   106.296   37.0541   12.9168
  8  3.346   10.22   104.448   34.1961   11.1957
  9  3.211   10.08   101.606   32.3669   10.3105
 10  3.220    9.20    84.640   29.6240   10.3684
 11  3.520    8.43    71.065   29.6736   12.3904
 12  3.802    7.36    54.170   27.9827   14.4552
 13  3.946    8.59    73.788   33.8961   15.5709
 14  3.812    8.05    64.803   30.6866   14.5313
 15  4.087    8.03    64.481   32.8186   16.7036
 16  4.215    7.76    60.218   32.7084   17.7662

Data Display
K1    54.6350    ($\sum y$)
K2    163.650    ($\sum x$)
K3    1763.42    ($\sum x^2$)
K4    539.970    ($\sum xy$)
K5    191.259    ($\sum y^2$)

Data Display
K17   16.0000    ($n$)
K18   89.5853    ($SS_x = \sum x^2 - n\bar{x}^2$)
K19   4.69788    ($SS_y = \sum y^2 - n\bar{y}^2$)
K20   -18.8437   ($S_{xy} = \sum xy - n\bar{x}\bar{y}$)
K21   10.2281    ($\bar{x} = \sum x / n$)
K22   3.41469    ($\bar{y} = \sum y / n$)

I then used a regression command equivalent to the one in the text.
MTB > regress c1 on 1 c2;
SUBC> predict 8.
Regression Analysis

The regression equation is
y = 5.57 - 0.210 x

Predictor       Coef     Stdev   t-ratio       p
Constant      5.5661    0.2540     21.91   0.000
x           -0.21035   0.02419     -8.69   0.000

s = 0.2290      R-sq = 84.4%      R-sq(adj) = 83.3%

Analysis of Variance

SOURCE       DF        SS        MS       F       p
Regression    1    3.9637    3.9637   75.59   0.000
Error        14    0.7341    0.0524
Total        15    4.6979

Unusual Observations
Obs.     x        y      Fit   Stdev.Fit   Residual   St.Resid
  1   15.8   1.9900   2.2385      0.1469    -0.2485     -1.41 X

X denotes an obs. whose X value gives it large influence.

    Fit   Stdev.Fit        95.0% C.I.           95.0% P.I.
 3.8834      0.0786   ( 3.7147, 4.0521)   ( 3.3639, 4.4028)

c) From the table above, our formula is $\hat{Y} = 5.5661 - 0.21035x$. To test $H_0: \beta_1 = 0$ against $H_1: \beta_1 \neq 0$, use $t = \frac{b_1}{s_{b_1}} = -8.69$, with $df = n - 2 = 16 - 2 = 14$. Make a diagram. Show an almost Normal curve with a 95% 'accept' region between $-t_{.025}^{(14)} = -2.145$ and $t_{.025}^{(14)} = 2.145$. Since -8.69 is not between these two values, we reject the null hypothesis and must conclude that the slope is significant. Mortgage rates seem to affect the number of homes sold. Note also that the ANOVA, which tests the same thing, gives a high value of $F$ (75.59, which equals $t^2$ for the slope up to rounding), and that all p-values are reported as 0.000, indicating that the null hypothesis of insignificance would be rejected at any conventional significance level.
Note: To test $H_0: \beta_0 = 0$ against $H_1: \beta_0 \neq 0$, use $t = \frac{b_0}{s_{b_0}} = 21.91$. Since this is in our 'reject' region, reject the null hypothesis and conclude that the intercept is significant.

d) The coefficient of determination, $R^2$, is 84.4%. This indicates that 84.4% of the variation in $Y$ is explained by the regression.

e), f) The last line of the regression printout gives the confidence and prediction intervals. The confidence interval tells us that we can be 95% confident that the average number of homes sold when the interest rate is 8% is between 3.71 and 4.05 million. The prediction interval tells us that we can be 95% confident that, in a given year when the mortgage rate is 8%, the actual number of homes sold will be between 3.36 and 4.40 million.

g) See the previous problem.
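The hand calculations behind the Minitab printout can be reproduced from the summary constants K17 to K22 displayed by 252SOLS. The Python sketch below is only an illustration, not the program 252sols itself; because it starts from the rounded constants and the tabled $t_{.025}^{(14)} = 2.145$, its last digits may differ slightly from the printout.

```python
import math

# Summary constants from the 252SOLS Data Display for Exercise 10.61
n = 16
ss_x = 89.5853     # K18: SS_x
ss_y = 4.69788     # K19: SS_y = SST
s_xy = -18.8437    # K20: S_xy
x_bar = 10.2281    # K21
y_bar = 3.41469    # K22
t_crit = 2.145     # t_.025 with n - 2 = 14 df, from a t table

# Least-squares coefficients and ANOVA pieces
b1 = s_xy / ss_x
b0 = y_bar - b1 * x_bar
ssr = b1 * s_xy
sse = ss_y - ssr
s_e2 = sse / (n - 2)
s_e = math.sqrt(s_e2)
r2 = ssr / ss_y

# t tests of H0: beta_1 = 0 and H0: beta_0 = 0 (two-sided, alpha = .05)
s_b1 = math.sqrt(s_e2 / ss_x)
s_b0 = math.sqrt(s_e2 * (1 / n + x_bar ** 2 / ss_x))
t_b1 = b1 / s_b1
t_b0 = b0 / s_b0
reject_slope = abs(t_b1) > t_crit
reject_intercept = abs(t_b0) > t_crit

# 95% confidence and prediction intervals at an 8% mortgage rate
x0 = 8.0
y_hat = b0 + b1 * x0
core = 1 / n + (x0 - x_bar) ** 2 / ss_x
s_yhat = math.sqrt(s_e2 * core)
s_y = math.sqrt(s_e2 * (core + 1))
ci = (y_hat - t_crit * s_yhat, y_hat + t_crit * s_yhat)
pi = (y_hat - t_crit * s_y, y_hat + t_crit * s_y)

print(f"Y-hat = {b0:.4f} + ({b1:.5f})x, s = {s_e:.4f}, R-sq = {100 * r2:.1f}%")
print(f"t for b1 = {t_b1:.2f} (reject H0: {reject_slope}); "
      f"t for b0 = {t_b0:.2f} (reject H0: {reject_intercept})")
print(f"Fit = {y_hat:.4f}, Stdev.Fit = {s_yhat:.4f}")
print(f"95% C.I. = ({ci[0]:.4f}, {ci[1]:.4f}), 95% P.I. = ({pi[0]:.4f}, {pi[1]:.4f})")
```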