CHAPTER 4: Solutions to Selected Exercises

4.1
a. The plot of price versus size has a straight-line appearance, so the model \(y = \beta_0 + \beta_1 x_1 + \varepsilon\) is appropriate. The plot of price versus rating also has a straight-line appearance, so the model \(y = \beta_0 + \beta_1 x_2 + \varepsilon\) is appropriate. Combining these two models, we obtain the model \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon\).
b. \(\mu_{y|x_1=20,\,x_2=9}\) is the mean (or average) of the sales prices of all houses having 20 hundred (that is, 2000) square feet and a niceness rating of 9.
c. \(\beta_0\) = mean sales price of all houses having 0 square feet and a niceness rating of 0 (meaningless).
\(\beta_1\) = change in mean sales price associated with each increase in house size of 100 square feet, when niceness rating stays constant.
\(\beta_2\) = change in mean sales price associated with each increase in niceness rating of 1, when house size remains constant.
d. The error term represents all factors other than the square footage and the niceness rating. One such factor is the ability and effort of the real estate agent listing the house.

4.2 The plots suggest a linear relationship between each of the independent variables and hours.
\(\beta_0\) = meaningless.
\(\beta_1\) = the change in mean monthly labor hours associated with an increase of one x-ray exposure, when the other independent variables are held constant.
\(\beta_2\) = the change in mean monthly labor hours associated with an increase of one in the number of monthly bed days, when the other independent variables are held constant.
\(\beta_3\) = the change in mean monthly labor hours associated with a one-day increase in the average length of stay, when the other independent variables are held constant.
The error term represents all factors other than \(x_1\), \(x_2\), and \(x_3\). Such factors could involve patient load, size of hospital, etc.

4.3
a. \(b_0 = 29.347\), \(b_1 = 5.6128\), \(b_2 = 3.8344\)
\(b_0\) = meaningless.
\(b_1 = 5.6128\) implies that we estimate that mean sales price increases by $5,612.80 for each increase of 100 square feet in house size, when the niceness rating stays constant.
\(b_2 = 3.8344\) implies that we estimate that mean sales price increases by $3,834.40 for each increase in niceness rating of 1, when the square footage remains constant.
b. \(\hat{y} = 172.28\), from \(\hat{y} = 29.347 + 5.6128(20) + 3.8344(8)\).

4.4
a. \(b_0 = 1523.38924\), \(b_1 = .05299\), \(b_2 = .97848\), \(b_3 = -320.95083\)
b. \(\hat{y} = 16,065\) is computed by \(\hat{y} = 1523.38924 + .05299(56194) + .97848(14077.88) - 320.95083(6.89)\).
c. \(y - \hat{y} = 17,207.31 - 16,065 = 1,142.31\) hours. Actual hours exceed predicted hours by 1,142.31.

4.5
a. SSE = 73.6; \(s^2 = \dfrac{SSE}{n-(k+1)} = \dfrac{73.6}{10-(2+1)} = \dfrac{73.6}{7} = 10.5\); \(s = \sqrt{10.5} = 3.242\)
b. Total variation = 7447.5
Unexplained variation = 73.6
Explained variation = 7374
c. \(R^2 = \dfrac{7374}{7447.5} = .99\)
\(\bar{R}^2 = \left(R^2 - \dfrac{k}{n-1}\right)\left(\dfrac{n-1}{n-(k+1)}\right) = \left(.99 - \dfrac{2}{10-1}\right)\left(\dfrac{10-1}{10-(2+1)}\right) = .987\)
d. \(F(\text{model}) = \dfrac{(\text{Explained variation})/k}{(\text{Unexplained variation})/(n-(k+1))} = \dfrac{7374/2}{73.6/(10-(2+1))} = \dfrac{7374/2}{73.6/7} = 350.87\)
e. Based on 2 and 7 degrees of freedom, \(F_{.05} = 4.74\). Since F(model) = 350.87 > 4.74, we reject \(H_0: \beta_1 = \beta_2 = 0\) by setting \(\alpha = .05\).
f. Based on 2 and 7 degrees of freedom, \(F_{.01} = 9.55\). Since F(model) = 350.87 > 9.55, we reject \(H_0: \beta_1 = \beta_2 = 0\) by setting \(\alpha = .01\).
g. p-value = 0.000 (which means less than .001). Since this p-value is less than \(\alpha\) = .10, .05, .01, and .001, we have extremely strong evidence that \(H_0: \beta_1 = \beta_2 = 0\) is false. That is, we have extremely strong evidence that at least one of \(x_1\) and \(x_2\) is significantly related to y.

4.6
a. SSE = 4,913,399
\(s^2 = \dfrac{SSE}{n-(k+1)} = \dfrac{4,913,399}{17-(3+1)} = 377,954\)
\(s = \sqrt{377,954} = 614.77942\)
b. Total variation = 494,712,540
Unexplained variation = 4,913,399
Explained variation = 489,799,142
c. \(R^2 = \dfrac{489,799,142}{494,712,540} = .9901\)
\(\bar{R}^2 = \left(R^2 - \dfrac{k}{n-1}\right)\left(\dfrac{n-1}{n-(k+1)}\right) = \left(.9901 - \dfrac{3}{17-1}\right)\left(\dfrac{17-1}{17-(3+1)}\right) = .9878\)
d. \(F(\text{model}) = \dfrac{(\text{Explained variation})/k}{(\text{Unexplained variation})/(n-(k+1))} = \dfrac{489,799,142/3}{4,913,399/(17-(3+1))} = 431.97\)
e. Based on 3 and 13 degrees of freedom, \(F_{.05} = 3.41\). Since F(model) = 431.97 > \(F_{.05}\) = 3.41, reject \(H_0: \beta_1 = \beta_2 = \beta_3 = 0\) at \(\alpha = .05\).
f. Based on 3 and 13 degrees of freedom, \(F_{.01} = 5.74\). Since F(model) = 431.97 > \(F_{.01}\) = 5.74, reject \(H_0: \beta_1 = \beta_2 = \beta_3 = 0\) at \(\alpha = .01\).
g. p-value < .0001.
Reject \(H_0\) at \(\alpha = .001\). We have extremely strong evidence that at least one of \(x_1\), \(x_2\), and \(x_3\) is significantly related to y.

4.7 We first consider the intercept \(\beta_0\).
a. \(b_0 = 29.347\), \(s_{b_0} = 4.891\), \(t = 6.00\), where \(t = b_0/s_{b_0} = 29.347/4.891 = 6.00\).
b. We reject \(H_0: \beta_0 = 0\) (and conclude that the intercept is significant) with \(\alpha = .05\) if \(|t| > t^{(7)}_{.05/2} = t^{(7)}_{.025}\). Since \(t^{(7)}_{.025} = 2.365\) (with n - (k + 1) = 10 - (2 + 1) = 7 degrees of freedom), we have \(t = 6.00 > t^{(7)}_{.025} = 2.365\). We reject \(H_0: \beta_0 = 0\) with \(\alpha = .05\) and conclude that the intercept is significant at the .05 level.
c. We reject \(H_0: \beta_0 = 0\) with \(\alpha = .01\) if \(|t| > t^{(7)}_{.01/2} = t^{(7)}_{.005}\). Since \(t^{(7)}_{.005} = 3.499\), we have \(t = 6.00 > t^{(7)}_{.005} = 3.499\). We reject \(H_0: \beta_0 = 0\) with \(\alpha = .01\) and conclude that the intercept is significant at the .01 level.
d. The Minitab output tells us that the p-value for testing \(H_0: \beta_0 = 0\) is 0.000. Since this p-value is less than each given value of \(\alpha\), we reject \(H_0: \beta_0 = 0\) at each of these values of \(\alpha\). We can conclude that the intercept \(\beta_0\) is significant at the .10, .05, .01, and .001 levels of significance.
e. A 95% confidence interval for \(\beta_0\) is
\([b_0 \pm t^{(n-(k+1))}_{\alpha/2}\, s_{b_0}] = [b_0 \pm t^{(7)}_{.025}\, s_{b_0}] = [29.347 \pm 2.365(4.891)] = [17.780, 40.914]\)
This interval has no practical interpretation since \(\beta_0\) is meaningless.
f. A 99% confidence interval for \(\beta_0\) is
\([b_0 \pm t^{(7)}_{.005}\, s_{b_0}] = [29.347 \pm 3.499(4.891)] = [12.233, 46.461]\)

We next consider \(\beta_1\).
a. \(b_1 = 5.6128\), \(s_{b_1} = .2285\), \(t = 24.56\), where \(t = b_1/s_{b_1} = 5.6128/.2285 = 24.56\).
b., c., and d.: We reject \(H_0: \beta_1 = 0\) (and conclude that the independent variable \(x_1\) is significant) at level of significance \(\alpha\) if \(|t| > t^{(7)}_{\alpha/2}\) (based on n - (k + 1) = 10 - 3 = 7 d.f.).
For \(\alpha = .05\), \(t^{(7)}_{\alpha/2} = t^{(7)}_{.025} = 2.365\), and for \(\alpha = .01\), \(t^{(7)}_{\alpha/2} = t^{(7)}_{.005} = 3.499\).
Since \(t = 24.56 > t^{(7)}_{.025} = 2.365\), we reject \(H_0: \beta_1 = 0\) with \(\alpha = .05\).
Since \(t = 24.56 > t^{(7)}_{.005} = 3.499\), we reject \(H_0: \beta_1 = 0\) with \(\alpha = .01\).
Further, the Minitab output tells us that the p-value related to testing \(H_0: \beta_1 = 0\) is 0.000.
Since this p-value is less than each given value of \(\alpha\), we reject \(H_0\) at each of these values of \(\alpha\) (.10, .05, .01, and .001).
The rejection points and p-value tell us to reject \(H_0: \beta_1 = 0\) with \(\alpha\) = .10, .05, .01, and .001. We conclude that the independent variable \(x_1\) (home size) is significant at the .10, .05, .01, and .001 levels of significance.
e. and f.:
95% interval for \(\beta_1\): \([b_1 \pm t^{(7)}_{.025}\, s_{b_1}] = [5.6128 \pm 2.365(.2285)] = [5.072, 6.153]\)
99% interval for \(\beta_1\): \([b_1 \pm t^{(7)}_{.005}\, s_{b_1}] = [5.6128 \pm 3.499(.2285)] = [4.813, 6.412]\)
For instance, we are 95% confident that the mean sales price increases by between $5072 and $6153 for each increase of 100 square feet in home size, when the rating stays constant.

Last, we consider \(\beta_2\).
a. \(b_2 = 3.8344\), \(s_{b_2} = .4332\), \(t = 8.85\), where \(t = b_2/s_{b_2} = 3.8344/.4332 = 8.85\).
b., c., and d.: We reject \(H_0: \beta_2 = 0\) (and conclude that the independent variable \(x_2\) is significant) at level of significance \(\alpha\) if \(|t| > t^{(7)}_{\alpha/2}\).
For \(\alpha = .05\), \(t^{(7)}_{\alpha/2} = t^{(7)}_{.025} = 2.365\), and for \(\alpha = .01\), \(t^{(7)}_{\alpha/2} = t^{(7)}_{.005} = 3.499\).
Since \(t = 8.85 > t^{(7)}_{.025} = 2.365\), we reject \(H_0: \beta_2 = 0\) with \(\alpha = .05\).
Since \(t = 8.85 > t^{(7)}_{.005} = 3.499\), we reject \(H_0: \beta_2 = 0\) with \(\alpha = .01\).
Further, the Minitab output tells us that the p-value related to testing \(H_0: \beta_2 = 0\) is 0.000. Since this p-value is less than each given value of \(\alpha\), we reject \(H_0\) at each of these values of \(\alpha\) (.10, .05, .01, and .001).
The rejection points and p-value tell us to reject \(H_0: \beta_2 = 0\) with \(\alpha\) = .10, .05, .01, and .001. We conclude that the independent variable \(x_2\) (niceness rating) is significant at the .10, .05, .01, and .001 levels of significance.
e. and f.:
95% interval for \(\beta_2\): \([b_2 \pm t^{(7)}_{.025}\, s_{b_2}] = [3.8344 \pm 2.365(.4332)] = [2.810, 4.860]\)
99% interval for \(\beta_2\): \([b_2 \pm t^{(7)}_{.005}\, s_{b_2}] = [3.8344 \pm 3.499(.4332)] = [2.319, 5.350]\)
For instance, we are 95% confident that the mean sales price increases by between $2810 and $4860 for each increase of one rating point, when the home size stays constant.
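The t statistics and confidence intervals in Exercise 4.7 all follow the same arithmetic, so they can be checked with a short script. The following is a minimal sketch in Python, not part of the original solutions: the helper names `t_stat` and `conf_int` are hypothetical, and the critical values are the tabled t points for 7 degrees of freedom quoted above rather than values computed from a distribution.

```python
# Least squares estimates and standard errors quoted in Exercise 4.7.
ESTIMATES = {
    "b0": (29.347, 4.891),
    "b1": (5.6128, 0.2285),
    "b2": (3.8344, 0.4332),
}

# Tabled t points for n - (k + 1) = 7 degrees of freedom (from the text).
T_025 = 2.365  # used for alpha = .05 tests and 95% intervals
T_005 = 3.499  # used for alpha = .01 tests and 99% intervals


def t_stat(b, se):
    """t statistic for testing H0: beta_j = 0."""
    return b / se


def conf_int(b, se, t_crit):
    """Confidence interval [b - t*se, b + t*se]."""
    half_width = t_crit * se
    return (b - half_width, b + half_width)


for name, (b, se) in ESTIMATES.items():
    t = t_stat(b, se)
    lo95, hi95 = conf_int(b, se, T_025)
    lo99, hi99 = conf_int(b, se, T_005)
    print(f"{name}: t = {t:.2f}, "
          f"reject at .05: {abs(t) > T_025}, reject at .01: {abs(t) > T_005}, "
          f"95% CI = [{lo95:.3f}, {hi95:.3f}], 99% CI = [{lo99:.3f}, {hi99:.3f}]")
```

Small differences in the last decimal place relative to the printed intervals can arise from rounding in the quoted estimates.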
4.8 Same process as Exercise 4.7, applied to the model \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon\).
n - (k + 1) = 17 - (3 + 1) = 13; \(t^{(13)}_{.005} = 3.012\), \(t^{(13)}_{.025} = 2.160\)
\(H_0: \beta_0 = 0\): \(t = \dfrac{1523.38924}{786.89772} = 1.94\). Do not reject \(H_0\) at \(\alpha\) = .05, .01.
\(H_0: \beta_1 = 0\): \(t = \dfrac{.05299}{.02009} = 2.64\). Reject at \(\alpha = .05\), not .01.
\(H_0: \beta_2 = 0\): \(t = \dfrac{.97848}{.10515} = 9.31\). Reject at \(\alpha\) = .05, .01.
\(H_0: \beta_3 = 0\): \(t = \dfrac{-320.95083}{153.1922} = -2.10\). Do not reject at \(\alpha\) = .05, .01.
The p-value for testing \(H_0: \beta_1 = 0\) is .0205. Reject at \(\alpha = .05\).
The p-value for testing \(H_0: \beta_2 = 0\) is < .0001. Reject at \(\alpha = .001\).
The p-value for testing \(H_0: \beta_3 = 0\) is .0583. Do not reject at \(\alpha = .05\), but can reject at \(\alpha = .10\).
95% C.I.: \([b_j \pm 2.160\, s_{b_j}]\)
99% C.I.: \([b_j \pm 3.012\, s_{b_j}]\)

4.9
a. Point estimate is \(\hat{y} = 172.28\) ($172,280). 95% confidence interval is [168.56, 175.99].
b. Point prediction is \(\hat{y} = 172.28\). 95% prediction interval is [163.76, 180.80].
g. Stdev Fit = \(s\sqrt{\text{Distance value}} = 1.57\)
This implies that Distance value = \((1.57/s)^2 = (1.57/3.242)^2 = 0.2345\).
The 99% confidence interval for mean sales price is
\([\hat{y} \pm t^{(7)}_{.005}\, s\sqrt{\text{Distance value}}] = [172.28 \pm 3.499(1.57)] = [172.28 \pm 5.49] = [166.79, 177.77]\)
The 99% prediction interval for an individual sales price is
\([\hat{y} \pm t^{(7)}_{.005}\, s\sqrt{1 + \text{Distance value}}] = [172.28 \pm 3.499(3.242)\sqrt{1 + 0.2345}] = [172.28 \pm 12.60] = [159.68, 184.88]\)

4.10 y = 17,207.31 is within the prediction interval [14,511, 17,618]. There is no statistical evidence that the labor hours are unusually high or low for this hospital.

4.11 \(\hat{y} = 30,626 + 3.893(28,000) - 29,607(1.56) + 86.52(1821.7) = 251,056.564\)

4.12 For a house of a given square footage and age, each additional room adds $6321.78 to the price of the home when the number of bedrooms stays constant, while adding a bedroom deducts $11,103.16 from the price when the number of rooms stays constant.

4.13
a. The straight-line appearance of the plot of y versus \(x_1\) suggests that the model \(y = \beta_0 + \beta_1 x_1 + \varepsilon\) might appropriately relate y to \(x_1\). The possibly quadratic appearance of the plot of y versus \(x_2\) suggests that the model \(y = \beta_0 + \beta_1 x_2 + \beta_2 x_2^2 + \varepsilon\) might appropriately relate y to \(x_2\).
Combining these models, we obtain the model \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_2^2 + \varepsilon\), which might appropriately relate y to \(x_1\) and \(x_2\).
b. The p-value related to F(model) is 0.000. Since this p-value is less than .001, we have extremely strong evidence that the model is significant. The p-values related to \(x_1\), \(x_2\), and \(x_2^2\) are, respectively, 0.000, 0.000, and 0.006. Since each of these p-values is less than .01, we have very strong evidence that each of \(x_1\), \(x_2\), and \(x_2^2\) is significant (important).
c. The point prediction is \(\hat{y} = 171.222\) ($171,222). The 95% prediction interval is [166.365, 176.079]. We are 95% confident that the sales price for an individual house with 2,000 square feet and a "niceness rating" of 8 will be between $166,365 and $176,079.

4.14
a. The plots have a quadratic appearance.
b. (1) \(\hat{y} = 35.0261\); 95% C.I. = [34.4997, 35.5525]
(2) \(\hat{y} = 35.0261\); 95% P.I. = [33.5954, 36.4568]

4.15
a. The p-value for \(x_1 x_2\) is .014. Since this p-value is less than .05, we have strong evidence that \(x_1 x_2\) is important.
b. 171.751, [168.835, 174.666]
The length of this interval is 174.666 - 168.835 = 5.83.
The length of the interval for the model in Figure 4.27 is 176.079 - 166.365 = 9.71.
Hence, the interaction model gives a narrower interval, and so a more precise estimate, for the sales price of a house with 2,000 square feet and a rating of 8.

4.16
a. With \(x_2 = 2\):
\(\hat{y} = 27.438 + 5.0813x_1 + 7.2899(2) - .5311(2)^2 + .11473x_1(2)\)
\(= 27.438 + 5.0813x_1 + 14.5798 - 2.1244 + .22946x_1\)
\(= 39.8934 + 5.31076x_1\)
When \(x_1 = 13\): \(\hat{y} = 39.8934 + 5.31076(13) = 39.8934 + 69.03988 = 108.93328\)
When \(x_1 = 22\): \(\hat{y} = 39.8934 + 5.31076(22) = 39.8934 + 116.83672 = 156.73012\)
b. With \(x_2 = 8\):
\(\hat{y} = 27.438 + 5.0813x_1 + 7.2899(8) - .5311(8)^2 + .11473x_1(8)\)
\(= 27.438 + 5.0813x_1 + 58.3192 - 33.9904 + .91784x_1\)
\(= 51.7668 + 5.99914x_1\)
When \(x_1 = 13\): \(\hat{y} = 51.7668 + 5.99914(13) = 51.7668 + 77.98882 = 129.75562\)
When \(x_1 = 22\): \(\hat{y} = 51.7668 + 5.99914(22) = 51.7668 + 131.98108 = 183.74788\)
c. One can see from the two equations that the slope of 5.99914 when \(x_2 = 8\) is somewhat larger than the slope of 5.31076 when \(x_2 = 2\).
The graph also shows that the line for \(x_2 = 8\) rises a little faster. Thus we estimate that as square feet (home size) increases, the mean sales price increases faster when the rating is 8 than when the rating is 2.

4.17 \(\hat{y} = 6.0599\); 95% P.I. = [3.7578, 8.3620]. We are 95% confident that the actual profit for a future construction project with a contract size of $480,000 and a supervisor with 6 years of experience will be between $375,780 and $836,200.

4.18
a. With \(x_2 = 2\):
\(\hat{y} = 19.3050 - 1.4866x_1 - 6.3715(2) - .7522x_1^2 + 1.7171x_1(2) = 6.562 + 1.9476x_1 - .7522x_1^2\)
When \(x_1 = 3\), \(\hat{y} = 6.562 + 1.9476(3) - .7522(3)^2 = 5.635\).
When \(x_1 = 4\), \(\hat{y} = 6.562 + 1.9476(4) - .7522(4)^2 = 2.3172\).
When \(x_1 = 5\), \(\hat{y} = 6.562 + 1.9476(5) - .7522(5)^2 = -2.505\).
[Figure: Plot of \(\hat{y}\) against \(x_1\) (for \(x_1\) = 3, 4, and 5) when \(x_2 = 2\)]
b. With \(x_2 = 4\):
\(\hat{y} = 19.3050 - 1.4866x_1 - 6.3715(4) - .7522x_1^2 + 1.7171x_1(4) = -6.181 + 5.3818x_1 - .7522x_1^2\)
When \(x_1 = 3\), \(\hat{y} = -6.181 + 5.3818(3) - .7522(3)^2 = 3.1946\).
When \(x_1 = 4\), \(\hat{y} = -6.181 + 5.3818(4) - .7522(4)^2 = 3.311\).
When \(x_1 = 5\), \(\hat{y} = -6.181 + 5.3818(5) - .7522(5)^2 = 1.923\).
[Figure: Plot of \(\hat{y}\) against \(x_1\) (for \(x_1\) = 3, 4, and 5) when \(x_2 = 4\)]
c. With \(x_2 = 6\):
\(\hat{y} = 19.3050 - 1.4866x_1 - 6.3715(6) - .7522x_1^2 + 1.7171x_1(6) = -18.924 + 8.816x_1 - .7522x_1^2\)
When \(x_1 = 3\), \(\hat{y} = -18.924 + 8.816(3) - .7522(3)^2 = .7542\).
When \(x_1 = 4\), \(\hat{y} = -18.924 + 8.816(4) - .7522(4)^2 = 4.3048\).
When \(x_1 = 5\), \(\hat{y} = -18.924 + 8.816(5) - .7522(5)^2 = 6.351\).
[Figure: Plot of \(\hat{y}\) against \(x_1\) (for \(x_1\) = 3, 4, and 5) when \(x_2 = 6\)]
[Figure: Plot of \(\hat{y}\) against \(x_1\) (for \(x_1\) = 3, 4, and 5), all three curves]
These plots suggest that:
For a low level of supervisor experience (\(x_2 = 2\)), profit decreases substantially as the contract size (\(x_1\)) increases.
For a moderate level of supervisor experience (\(x_2 = 4\)), profit decreases when the contract size (\(x_1\)) becomes large.
For a high level of supervisor experience (\(x_2 = 6\)), profit increases as the contract size (\(x_1\)) increases.
These results suggest that inexperienced supervisors should be assigned to smaller contracts, while the most experienced supervisors should be assigned to the largest contracts.
Medium-sized contracts can be assigned to supervisors with medium experience levels, with slightly better results than would be obtained when an inexperienced supervisor is assigned and slightly worse results than would be obtained when a more experienced supervisor is assigned.
If we set
\(6.562 + 1.9476x_1^* - .7522(x_1^*)^2 = -6.181 + 5.3818x_1^* - .7522(x_1^*)^2\)
we find \(12.743 = 3.4342x_1^*\), or \(x_1^* = 3.71\). (See part a when \(x_2 = 2\). See part b when \(x_2 = 4\).)
If we set
\(6.562 + 1.9476x_1^* - .7522(x_1^*)^2 = -18.924 + 8.816x_1^* - .7522(x_1^*)^2\)
we find \(25.486 = 6.8684x_1^*\), or \(x_1^* = 3.71\). (See part a when \(x_2 = 2\). See part c when \(x_2 = 6\).)
Therefore, all three curves meet at a point corresponding to a contract size of \(x_1^* = 3.71\).
(i) The curve for \(x_2 = 2\) lies highest when \(x_1 < 3.71\). Supervisors with 2 years of experience should be assigned to contracts of size less than \(x_1^* = 3.71\).
(ii) The curve for \(x_2 = 6\) lies highest when \(x_1 > 3.71\). Supervisors with 6 years of experience should be assigned to contracts of size greater than \(x_1^* = 3.71\).
(iii) Contract sizes "near" \(x_1^* = 3.71\) should be assigned to supervisors with 4 years of experience.

4.19 The effect on attendance of a promotion for a day game, if all other variables remain the same, is (Promotion = 1, Daygame = 1)
4,745(1) + 5,059(1)(1) - 4,690(1)(Weekend) + 696.5(1)(Rival)
The effect on attendance of a promotion for a night game, if all other variables remain the same, is (Promotion = 1, Daygame = 0)
4,745(1) + 5,059(1)(0) - 4,690(1)(Weekend) + 696.5(1)(Rival)
Thus, the impact of promotion on attendance is greater by 5,059 for a day game. For example, a promotion for a day game on a weekday against a rival is expected to increase average attendance by 4,745 + 5,059 - 0 + 696.5 = 10,500.5. For a night game on a weekday against a rival, the increase would be 4,745 + 0 - 0 + 696.5 = 5,441.5 (i.e., 5,059 less). By a similar argument, a promotion on a weekend would reduce the gain in attendance by 4,690 compared to a promotion on a weekday.

4.20
a.
The lines relating the size of the firm and average months to adoption are parallel, indicating no interaction between size and type of firm. Also, the two lines are different.
b. \(\beta_2\) equals the difference between the mean innovation adoption times of stock companies and mutual companies.
c. The p-value is less than .001; reject \(H_0\) at both levels of \(\alpha\). \(\beta_2\) is significant at both levels of \(\alpha\).
95% C.I. for \(\beta_2\): [4.9770, 11.1339]. We are 95% confident that, for any given size of insurance firm, the mean adoption time for an insurance innovation is between 4.9770 and 11.1339 months longer if the firm is a stock company rather than a mutual company.
h. No interaction.

4.21
a. \(\mu_B = \beta_B + \beta_M(0) + \beta_T(0) = \beta_B\)
\(\mu_M = \beta_B + \beta_M(1) + \beta_T(0) = \beta_B + \beta_M\)
\(\mu_T = \beta_B + \beta_M(0) + \beta_T(1) = \beta_B + \beta_T\)
b. F = 184.57, p-value < .001: Reject \(H_0\); conclude that the means are not all equal, that is, at least one is different.
c. \(\mu_M - \mu_B = (\beta_B + \beta_M) - \beta_B = \beta_M\)
\(\mu_T - \mu_B = (\beta_B + \beta_T) - \beta_B = \beta_T\)
\(\mu_M - \mu_T = (\beta_B + \beta_M) - (\beta_B + \beta_T) = \beta_M - \beta_T\)
\(b_M = 21.4\), \(b_T = -4.3\), \(b_M - b_T = 21.4 - (-4.3) = 25.7\)
95% C.I. for \(\beta_M\): \([21.4 \pm (2.131)1.433] = [18.346, 24.454]\)
95% C.I. for \(\beta_T\): \([-4.300 \pm (2.131)1.433] = [-7.354, -1.246]\)
d. 77.20, [75.040, 79.360], [71.486, 82.914]
e. \([25.700 \pm 2.131(1.433)] = [22.646, 28.754]\)
For \(\beta_M\), p-value < .001. Hence \(\mu_M - \mu_T = \beta_M \neq 0\). Alternatively, since 0 is not in the 95% C.I. for \(\beta_M\), \(\mu_M - \mu_T = \beta_M \neq 0\).

4.22
a. \(b_5 = .21369\), \(b_6 = .38178\), \(b_6 - b_5 = .38178 - .21369 = .16809\)
95% C.I. for \(\beta_5\): \([.21369 \pm 2.069(.06215)] = [.0851, .3423]\)
95% C.I. for \(\beta_6\): \([.38178 \pm 2.069(.06125)] = [.2551, .5085]\)
Both p-values are < .01, so \(\beta_5\) and \(\beta_6\) are significant.
b. \(\hat{y} = 25.61270 + 9.05868(.20) - 6.53767(6.50) + .58444(6.50)^2 - 1.15648(.20)(6.50) + .38178(1) = 8.5005\)
95% C.I. for mean demand: [8.4037, 8.5977]
We are 95% confident that the mean demand for all sales periods when the price difference is .20, the advertising expenditure is 6.50, and campaign C is used will be between 840,370 and 859,770 bottles.
95% P.I. for individual demand: [8.2132, 8.7881]
We are 95% confident that the actual demand in a particular sales period when the price difference is .20, the advertising expenditure is 6.50, and campaign C is used will be between 821,322 bottles and 878,813 bottles.
c.
[.0363, .2999]; p-value = .0147, so \(\beta_6\) is significant at \(\alpha = .05\).
95% C.I. for \(\beta_6\): \([.16809 \pm 2.069(.06371)] = [.0363, .2999]\)

4.23
a. \(\mu_{d,a,C} - \mu_{d,a,A} = [\beta_0 + \beta_1 d + \beta_2 a + \beta_3 a^2 + \beta_4 da + \beta_5(0) + \beta_6(1) + \beta_7 a(0) + \beta_8 a(1)] - [\beta_0 + \beta_1 d + \beta_2 a + \beta_3 a^2 + \beta_4 da + \beta_5(0) + \beta_6(0) + \beta_7 a(0) + \beta_8 a(0)] = \beta_6 + \beta_8 a\)
Point estimates:
\(-.9351 + .2035(6.2) = .3266\)
\(-.9351 + .2035(6.6) = .408\)
\(\mu_{d,a,C} - \mu_{d,a,B} = [\beta_0 + \beta_1 d + \beta_2 a + \beta_3 a^2 + \beta_4 da + \beta_5(0) + \beta_6(1) + \beta_7 a(0) + \beta_8 a(1)] - [\beta_0 + \beta_1 d + \beta_2 a + \beta_3 a^2 + \beta_4 da + \beta_5(1) + \beta_6(0) + \beta_7 a(1) + \beta_8 a(0)] = \beta_6 + \beta_8 a - \beta_5 - \beta_7 a = \beta_6 - \beta_5 + \beta_8 a - \beta_7 a\)
Point estimates:
\(-.9351 - (-.4807) + .2035(6.2) - .1072(6.2) = .14266\)
\(-.9351 - (-.4807) + .2035(6.6) - .1072(6.6) = .18118\)
Both differences increase with the larger value of a.
b. \(\hat{y} = 8.5118\) (851,180 bottles)
95% P.I.: [8.2249, 8.7988]; length = 8.7988 - 8.2249 = .5739
For the P.I. in Exercise 4.22 (Figure 4.36), length = 8.7881 - 8.2132 = .5749
The intervals are essentially the same length. This is not surprising since the p-values for the two additional interaction terms both exceed \(\alpha = .10\).

4.24 To test \(H_0: \beta_5 = \beta_6 = 0\) in Model 2, we note that Model 2 is the complete model, and therefore k = 6. If \(H_0: \beta_5 = \beta_6 = 0\) is true, Model 2 reduces to Model 1, which is the reduced model, and therefore k - g = 2. It follows that
\(F = \dfrac{(SSE_R - SSE_C)/(k-g)}{SSE_C/(n-(k+1))} = \dfrac{(1.0644 - .3936)/2}{.3936/23} = \dfrac{.6708/2}{.3936/23} = \dfrac{.3354}{.017113} = 19.599\)
Based on 2 and 23 degrees of freedom, \(F_{.05} = 3.42\) and \(F_{.01} = 5.66\).
Since F = 19.599 is greater than \(F_{.05}\) and \(F_{.01}\), we reject \(H_0: \beta_5 = \beta_6 = 0\) at \(\alpha = .05\) and \(\alpha = .01\).
We have seen in Exercise 4.22 that \(\beta_5 = \mu_{d,a,B} - \mu_{d,a,A}\) and \(\beta_6 = \mu_{d,a,C} - \mu_{d,a,A}\). Therefore, if \(\beta_5 = \beta_6 = 0\), it follows that \(\mu_{d,a,B} - \mu_{d,a,A} = 0\) and \(\mu_{d,a,C} - \mu_{d,a,A} = 0\). That is, \(H_0: \beta_5 = \beta_6 = 0\) is equivalent to \(H_0: \mu_{d,a,A} = \mu_{d,a,B} = \mu_{d,a,C}\). If \(H_0\) is true, the mean demand does not differ with type of advertising. We have shown that at \(\alpha = .01\), the mean demand for at least one type of advertising is different from the other two means when price difference and advertising expenditures remain the same.

4.25 Model 3: complete. Model 1: reduced.
\(H_0: \beta_5 = \beta_6 = \beta_7 = \beta_8 = 0\)
\(F = \dfrac{(1.0644 - .3518)/4}{.3518/21} = 10.634\)
\(F_{.05} = 2.84\) based on 4 and 21 degrees of freedom.
\(F_{.01} = 4.37\) based on 4 and 21 degrees of freedom.
Since 10.634 > 4.37, reject \(H_0\) at \(\alpha\) = .05 and .01. The type of advertising has an impact on mean demand.

4.26 Model 3: complete. Model 2: reduced.
\(H_0: \beta_7 = \beta_8 = 0\)
\(F = \dfrac{(.3936 - .3518)/2}{.3518/21} = 1.248\)
\(F_{.05} = 3.47\) based on 2 and 21 degrees of freedom.
Since 1.248 < 3.47, do not reject \(H_0\) at \(\alpha\) = .05 and .01. The effect of the type of advertising on demand is not altered by the amount spent on advertising.

4.27 \(\hat{p}(4.5) = \dfrac{e^{-3.7456 + 1.1109(4.5)}}{1 + e^{-3.7456 + 1.1109(4.5)}} = \dfrac{3.5024}{4.5024} = .7779\)

4.28
a. Group 0: \(\hat{p}(85, 82) = \dfrac{e^{-56.17 + .4833(85) + .1652(82)}}{1 + e^{-56.17 + .4833(85) + .1652(82)}} = \dfrac{.2137}{1.2137} = .1761\)
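The two estimated probabilities above come from the same logistic form, \(\hat{p} = e^{z}/(1 + e^{z})\), where z is the fitted linear predictor. The following is a minimal sketch in Python for checking the arithmetic; the function name `logistic_prob` and its argument layout are illustrative assumptions, and the coefficients are the ones quoted in the solutions.

```python
import math


def logistic_prob(intercept, coefs, xs):
    """Estimated probability e^z / (1 + e^z) for z = intercept + sum(coef * x)."""
    z = intercept + sum(c * x for c, x in zip(coefs, xs))
    odds = math.exp(z)  # e^z, the estimated odds
    return odds / (1.0 + odds)


# One-predictor model evaluated at x = 4.5.
p_one = logistic_prob(-3.7456, [1.1109], [4.5])

# Two-predictor model for group 0, evaluated at (85, 82).
p_two = logistic_prob(-56.17, [0.4833, 0.1652], [85, 82])

print(f"p-hat(4.5)    = {p_one:.4f}")
print(f"p-hat(85, 82) = {p_two:.4f}")
```

Both printed values should agree with the hand calculations to about four decimal places; exact agreement in the last digit depends on how the intermediate exponentials were rounded.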