Chapter 17

17.6 a [Scatter diagram: Test scores (vertical axis) versus Lengths (horizontal axis)]

b \( b_1 = \dfrac{s_{xy}}{s_x^2} = \dfrac{51.86}{193.9} = .2675 \), \( b_0 = \bar{y} - b_1\bar{x} = 13.80 - .2675(38.00) = 3.635 \)
Regression line: ŷ = 3.635 + .2675x (Excel: ŷ = 3.636 + .2675x)

c b1 = .2675; for each additional second of commercial, the memory test score increases on average by .2675. b0 = 3.64 is the y-intercept.

17.8 a [Scatter diagram: Income (vertical axis) versus Education (horizontal axis)]

b \( b_1 = \dfrac{s_{xy}}{s_x^2} = \dfrac{46.02}{11.12} = 4.138 \), \( b_0 = \bar{y} - b_1\bar{x} = 78.13 - 4.138(13.17) = 23.63 \)
Regression line: ŷ = 23.63 + 4.138x (Excel: ŷ = 23.63 + 4.137x)

c The slope coefficient tells us that for each additional year of education, income increases on average by $4.138 thousand ($4,138). The y-intercept has no meaning.

17.16 a \( b_1 = \dfrac{s_{xy}}{s_x^2} = \dfrac{-10.78}{35.47} = -.3039 \), \( b_0 = \bar{y} - b_1\bar{x} = 17.20 - (-.3039)(11.33) = 20.64 \)
Regression line: ŷ = 20.64 – .3039x (Excel: ŷ = 20.64 – .3038x)

b The slope indicates that for each additional one percentage point increase in the vacancy rate, rents decrease on average by $.3039. The y-intercept is 20.64.

17.18 \( b_1 = \dfrac{s_{xy}}{s_x^2} = \dfrac{.8258}{16.07} = .0514 \), \( b_0 = \bar{y} - b_1\bar{x} = 93.89 - .0514(79.47) = 89.81 \)
Regression line: ŷ = 89.81 + .0514x (Excel: ŷ = 89.81 + .0514x)

17.98 a \( b_1 = \dfrac{s_{xy}}{s_x^2} = \dfrac{936.82}{378.77} = 2.47 \), \( b_0 = \bar{y} - b_1\bar{x} = 395.21 - 2.47(113.35) = 115.24 \)
Regression line: ŷ = 115.24 + 2.47x (Excel: ŷ = 114.85 + 2.47x)

b b1 = 2.47; for each additional month of age, repair costs increase on average by $2.47. b0 = 114.85 is the y-intercept.

c \( R^2 = \dfrac{s_{xy}^2}{s_x^2 s_y^2} = \dfrac{(936.82)^2}{(378.77)(4{,}094.79)} = .5659 \) (Excel: R² = .5659)
56.59% of the variation in repair costs is explained by the variation in ages.

17.104 \( H_0: \rho = 0 \)
\( H_1: \rho > 0 \)
Rejection region: \( t > t_{\alpha, n-2} = t_{.05, 428} \approx 1.645 \)
\( r = \dfrac{s_{xy}}{s_x s_y} = \dfrac{255{,}877}{\sqrt{(99.11)(2{,}152{,}602{,}614)}} = .5540 \) (Excel: .5540)
\( t = r\sqrt{\dfrac{n-2}{1-r^2}} = (.5540)\sqrt{\dfrac{430-2}{1-(.5540)^2}} = 13.77 \) (Excel: t = 13.77, p-value = 0)
There is enough evidence of a positive linear relationship. The theory appears to be valid.
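The Chapter 17 answers above all follow the same arithmetic: the slope is the covariance divided by the variance of x, the intercept follows from the two sample means, and the test of the correlation coefficient uses t = r·sqrt((n − 2)/(1 − r²)). The following is a minimal Python sketch of those calculations, using the summary statistics quoted in Exercises 17.6 and 17.104. The function names are illustrative only and are not part of the original solutions.

```python
import math


def least_squares_line(s_xy, s_x2, x_bar, y_bar):
    """Return (b0, b1) computed from summary statistics.

    s_xy  : sample covariance of x and y
    s_x2  : sample variance of x
    x_bar : sample mean of x
    y_bar : sample mean of y
    """
    b1 = s_xy / s_x2
    b0 = y_bar - b1 * x_bar
    return b0, b1


def correlation_t_stat(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Exercise 17.6: s_xy = 51.86, s_x^2 = 193.9, x-bar = 38.00, y-bar = 13.80
b0, b1 = least_squares_line(51.86, 193.9, 38.00, 13.80)
print(f"b1 = {b1:.4f}, b0 = {b0:.3f}")

# Exercise 17.104: r = .5540, n = 430
t_stat = correlation_t_stat(0.5540, 430)
print(f"t = {t_stat:.2f}")
```

Because the slope is not rounded before the intercept is computed, the intercept from this sketch is closer to the Excel value (3.636) than to the hand calculation (3.635).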
18.8
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.8415
R Square             0.7081
Adjusted R Square    0.7021
Standard Error       213.7
Observations         100

ANOVA
              df    SS            MS           F        Significance F
Regression     2    10,744,454    5,372,227    117.6    0.0000
Residual      97     4,429,664       45,667
Total         99    15,174,118

              Coefficients    Standard Error    t Stat    P-value
Intercept     576.8           514.0              1.12     0.2646
Space          90.61            6.48            13.99     0.0000
Water           9.66            2.41             4.00     0.0001

a The regression equation is ŷ = 576.8 + 90.61x1 + 9.66x2

b The coefficient of determination is R² = .7081; 70.81% of the variation in electricity consumption is explained by the model. The model fits reasonably well.

c \( H_0: \beta_1 = \beta_2 = 0 \)
\( H_1: \) At least one \( \beta_i \) is not equal to zero
F = 117.6, p-value = 0. There is enough evidence to conclude that the model is valid.

d&e
Prediction Interval

Consumption

Predicted value    8175

Prediction Interval
Lower limit        7748
Upper limit        8601

Interval Estimate of Expected Value
Lower limit        8127
Upper limit        8222

e We predict that the house will consume between 7748 and 8601 units of electricity.

f We estimate that the average house will consume between 8127 and 8222 units of electricity.

18.10 a
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.8608
R Square             0.7411
Adjusted R Square    0.7301
Standard Error       2.66
Observations         100

ANOVA
              df    SS      MS        F        Significance F
Regression     4    1930    482.38    67.97    0.0000
Residual      95     674      7.10
Total         99    2604

              Coefficients    Standard Error    t Stat    P-value
Intercept     3.24            5.42              0.60      0.5512
Mother        0.451           0.0545            8.27      0.0000
Father        0.411           0.0498            8.26      0.0000
Gmothers      0.0166          0.0661            0.25      0.8028
Gfathers      0.0869          0.0657            1.32      0.1890

b \( H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0 \)
\( H_1: \) At least one \( \beta_i \) is not equal to zero
F = 67.97, p-value = 0. There is enough evidence to conclude that the model is valid.
c b1 = .451; for each one-year increase in the mother's age, the customer's age increases on average by .451, provided the other variables are constant (which may not be possible because of the multicollinearity).
b2 = .411; for each one-year increase in the father's age, the customer's age increases on average by .411, provided the other variables are constant.
b3 = .0166; for each one-year increase in the grandmothers' mean age, the customer's age increases on average by .0166, provided the other variables are constant.
b4 = .0869; for each one-year increase in the grandfathers' mean age, the customer's age increases on average by .0869, provided the other variables are constant.

\( H_0: \beta_i = 0 \)
\( H_1: \beta_i \neq 0 \)
Mothers: t = 8.27, p-value = 0
Fathers: t = 8.26, p-value = 0
Grandmothers: t = .25, p-value = .8028
Grandfathers: t = 1.32, p-value = .1890
The ages of mothers and fathers are linearly related to the ages of their children. The other two variables are not.

d
Prediction Interval

Longvity

Predicted value    71.43

Prediction Interval
Lower limit        65.54
Upper limit        77.31

Interval Estimate of Expected Value
Lower limit        68.85
Upper limit        74.00

The man is predicted to live to an age between 65.54 and 77.31.

g
Prediction Interval

Longvity

Predicted value    71.71

Prediction Interval
Lower limit        65.65
Upper limit        77.77

Interval Estimate of Expected Value
Lower limit        68.75
Upper limit        74.66

The mean longevity is estimated to fall between 68.75 and 74.66.
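The summary measures reported in the Excel output (R Square, Adjusted R Square, Standard Error, and F) can all be recovered from the ANOVA sums of squares. The sketch below illustrates this with the figures from Exercise 18.8 (SSR = 10,744,454, SSE = 4,429,664, n = 100, k = 2). The function name `anova_summary` and its dictionary keys are hypothetical, chosen only for this illustration.

```python
import math


def anova_summary(ssr, sse, n, k):
    """Summary measures for a regression with k independent variables
    fitted to n observations.

    ssr : sum of squares for regression
    sse : sum of squares for error (residual)
    """
    sst = ssr + sse                # total sum of squares
    msr = ssr / k                  # mean square for regression
    mse = sse / (n - k - 1)        # mean square for error
    return {
        "r_square": ssr / sst,
        "adj_r_square": 1 - mse / (sst / (n - 1)),
        "std_error": math.sqrt(mse),
        "f": msr / mse,
    }


# Exercise 18.8 ANOVA table
summary = anova_summary(ssr=10_744_454, sse=4_429_664, n=100, k=2)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Applying the same function to the rounded sums of squares printed for Exercise 18.10 gives values that differ slightly from the Excel output, because those sums of squares are shown to fewer digits.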
18.12
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.8984
R Square             0.8072
Adjusted R Square    0.7990
Standard Error       7.07
Observations         50

ANOVA
              df    SS        MS       F        Significance F
Regression     2     9,832    4,916    98.37    0.0000
Residual      47     2,349    49.97
Total         49    12,181

              Coefficients    Standard Error    t Stat    P-value
Intercept     -28.43          6.89              -4.13     0.0001
Boxes           0.604         0.0557            10.85     0.0000
Weight          0.374         0.0847             4.42     0.0001

a ŷ = –28.43 + .604x1 + .374x2

b s = 7.07 and R² = .8072; the model fits well.

c b1 = .604; for each additional box, the amount of time to unload increases on average by .604 minutes, provided the weight is constant.
b2 = .374; for each additional hundred pounds, the amount of time to unload increases on average by .374 minutes, provided the number of boxes is constant.

\( H_0: \beta_i = 0 \)
\( H_1: \beta_i \neq 0 \)
Boxes: t = 10.85, p-value = 0
Weight: t = 4.42, p-value = .0001
Both variables are linearly related to time to unload.

d&e
Prediction Interval

Time

Predicted value    50.70

Prediction Interval
Lower limit        35.16
Upper limit        66.24

Interval Estimate of Expected Value
Lower limit        44.43
Upper limit        56.96

d It is predicted that the truck will be unloaded in a time between 35.16 and 66.24 minutes.

e The mean time to unload the trucks is estimated to lie between 44.43 and 56.96 minutes.

18.40
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.6882
R Square             0.4736
Adjusted R Square    0.4134
Standard Error       2,644
Observations         40

ANOVA
              df    SS             MS            F       Significance F
Regression     4    220,130,124    55,032,531    7.87    0.0001
Residual      35    244,690,939     6,991,170
Total         39    464,821,063

              Coefficients    Standard Error    t Stat    P-value
Intercept     1,433           2,093              0.68     0.4980
Size           -14.55            20.70          -0.70     0.4866
Apartments     113.0             24.01           4.70     0.0000
Age            -50.10            98.81          -0.51     0.6153
Floors        -223.8            171.1           -1.31     0.1994

b \( H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0 \)
\( H_1: \) At least one \( \beta_i \) is not equal to zero
F = 7.87, p-value = .0001.
There is enough evidence to conclude that the model is valid. The regression equation for Exercise 17.12 is ŷ = 4040 + 44.97x. The addition of the new variables changes the coefficients of the regression line in Exercise 17.12.

19.4 a First-order model: \( \text{Demand} = \beta_0 + \beta_1\,\text{Price} + \varepsilon \)
Second-order model: \( \text{Demand} = \beta_0 + \beta_1\,\text{Price} + \beta_2\,\text{Price}^2 + \varepsilon \)

First-order model:
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.9249
R Square             0.8553
Adjusted R Square    0.8473
Standard Error       13.29
Observations         20

ANOVA
              df    SS        MS        F         Significance F
Regression     1    18,798    18,798    106.44    0.0000
Residual      18     3,179     176.6
Total         19    21,977

              Coefficients    Standard Error    t Stat    P-value
Intercept     453.6           15.18              29.87    0.0000
Price         -68.91           6.68             -10.32    0.0000

Second-order model:
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.9862
R Square             0.9726
Adjusted R Square    0.9693
Standard Error       5.96
Observations         20

ANOVA
              df    SS        MS        F         Significance F
Regression     2    21,374    10,687    301.15    0.0000
Residual      17       603     35.49
Total         19    21,977

              Coefficients    Standard Error    t Stat    P-value
Intercept      766.9          37.40              20.50    0.0000
Price         -359.1          34.19             -10.50    0.0000
Price-sq        64.55          7.58               8.52    0.0000

c The second-order model fits better because its standard error of estimate is 5.96, whereas that of the first-order model is 13.29.

d \( \hat{y} = 766.9 - 359.1(2.95) + 64.55(2.95)^2 = 269.3 \)

19.8 a
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.9255
R Square             0.8566
Adjusted R Square    0.8362
Standard Error       5.20
Observations         25

ANOVA
              df    SS        MS        F        Significance F
Regression     3    3398.7    1132.9    41.83    0.0000
Residual      21     568.8     27.08
Total         24    3967.4

              Coefficients    Standard Error    t Stat    P-value
Intercept      260.7          162.3              1.61     0.1230
Temperature     -3.32           2.09            -1.59     0.1270
Currency      -164.3          667.1             -0.25     0.8078
Temp-Curr        3.64           8.54             0.43     0.6741

b
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.9312
R Square             0.8671
Adjusted R Square    0.8322
Standard Error       5.27
Observations         25

ANOVA
              df    SS        MS       F        Significance F
Regression     5    3440.3    688.1    24.80    0.0000
Residual      19     527.1    27.74
Total         24    3967.4

              Coefficients    Standard Error    t Stat    P-value
Intercept      274.8           283.8             0.97     0.3449
Temperature     -1.72            6.88           -0.25     0.8053
Currency      -828.6           888.5            -0.93     0.3627
Temp-sq         -0.0024          0.0475         -0.05     0.9608
Curr-sq       2054.0          1718.5             1.20     0.2467
Temp-Curr       -0.870          10.57           -0.08     0.9353

c Both models fit equally well. The standard errors of estimate and coefficients of determination are quite similar.

19.16 a
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.8368
R Square             0.7002
Adjusted R Square    0.6659
Standard Error       810.8
Observations         40

ANOVA
              df    SS            MS            F        Significance F
Regression     4    53,729,535    13,432,384    20.43    0.0000
Residual      35    23,007,438       657,355
Total         39    76,736,973

              Coefficients    Standard Error    t Stat    P-value
Intercept     3490            469.2              7.44     0.0000
Yest Att         0.369          0.078            4.73     0.0000
I1            1623            492.5              3.30     0.0023
I2             733.5          394.4              1.86     0.0713
I3            -765.5          484.7             -1.58     0.1232

b \( H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0 \)
\( H_1: \) At least one \( \beta_i \) is not equal to 0
F = 20.43, p-value = 0. There is enough evidence to infer that the model is valid.

c \( H_0: \beta_i = 0 \)
\( H_1: \beta_i \neq 0 \)
I2: t = 1.86, p-value = .0713
I3: t = –1.58, p-value = .1232
Weather is not a factor in attendance.

d \( H_0: \beta_2 = 0 \)
\( H_1: \beta_2 > 0 \)
t = 3.30, p-value = .0023/2 = .0012. There is sufficient evidence to infer that weekend attendance is larger than weekday attendance.
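Two of the calculations in the Chapter 19 answers above are easy to reproduce by hand: evaluating the fitted second-order model of Exercise 19.4 at a price of 2.95, and converting Excel's two-tail p-value into a one-tail p-value as in Exercise 19.16 d. The sketch below illustrates both. The function names are hypothetical, and the one-tail conversion assumes the usual rule that the two-tail value is halved when the sign of t agrees with the direction of the alternative hypothesis.

```python
def second_order_prediction(b0, b1, b2, x):
    """Predicted value from a fitted second-order (quadratic) model."""
    return b0 + b1 * x + b2 * x ** 2


def one_tail_p_value(two_tail_p, t_stat, alternative="greater"):
    """Convert a two-tail p-value to a one-tail p-value.

    alternative is "greater" (H1: beta > 0) or "less" (H1: beta < 0).
    If the sign of t_stat agrees with the alternative, the one-tail
    p-value is half the two-tail value; otherwise it is one minus that half.
    """
    half = two_tail_p / 2
    agrees = t_stat > 0 if alternative == "greater" else t_stat < 0
    return half if agrees else 1 - half


# Exercise 19.4 d: coefficients from the second-order output, Price = 2.95
demand = second_order_prediction(766.9, -359.1, 64.55, 2.95)
print(f"Predicted demand: {demand:.1f}")

# Exercise 19.16 d: t = 3.30, two-tail p-value = .0023, H1: beta2 > 0
p_one_tail = one_tail_p_value(0.0023, 3.30, alternative="greater")
print(f"One-tail p-value: {p_one_tail:.5f}")
```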
19.22 a
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.5125
R Square             0.2626
Adjusted R Square    0.2234
Standard Error       5866
Observations         100

ANOVA
              df    SS               MS             F       Significance F
Regression     5    1,151,889,624    230,377,925    6.70    0.0000
Residual      94    3,234,297,164     34,407,417
Total         99    4,386,186,788

              Coefficients    Standard Error    t Stat    P-value
Intercept     30,523          2,358             12.95     0.0000
Pct PT         -108.9           77.58           -1.40     0.1635
Pct U            63.95          33.86            1.89     0.0620
Av Shift       2591           1,287              2.01     0.0470
UM Rel        -3714           1,347             -2.76     0.0070
Absent        -1260             221.5           -5.69     0.0000

b \( H_0: \beta_4 = 0 \)
\( H_1: \beta_4 \neq 0 \)
t = 2.01, p-value = .0470. There is enough evidence to infer that the availability of shiftwork affects absenteeism.

c \( H_0: \beta_5 = 0 \)
\( H_1: \beta_5 < 0 \)
t = –2.76, p-value = .0070. There is enough evidence to infer that in organizations where the union–management relationship is good, absenteeism is lower.

19.40 a
Results of stepwise regression

Step 1 - Entering variable: Absent

Summary measures
Multiple R      0.3989
R-Square        0.1591
Adj R-Square    0.1505
StErr of Est    6134.7729

ANOVA Table
Source         df    SS                 MS                F          p-value
Explained       1     697913636.0400    697913636.0400    18.5441    0.0000
Unexplained    98    3688273152.0000     37635440.3265

Regression coefficients
            Coefficient    Std Err      t-value    p-value
Constant    28516.9941     1298.6729    21.9586    0.0000
Absent       -790.9393      183.6711    -4.3063    0.0000

Step 2 - Entering variable: UM_Rel

Summary measures                Change       % Change
Multiple R      0.4509          0.0520       13.0%
R-Square        0.2033          0.0442       27.8%
Adj R-Square    0.1869          0.0363       24.1%
StErr of Est    6002.1040       -132.6689    -2.2%

ANOVA Table
Source         df    SS                 MS                F          p-value
Explained       2     891737380.0400    445868690.0200    12.3766    0.0000
Unexplained    97    3494449408.0000     36025251.6289

Regression coefficients
            Coefficient    Std Err      t-value    p-value
Constant    31636.3125     1850.1073    17.0997    0.0000
Absent       -967.8824      195.2204    -4.9579    0.0000
UM_Rel      -3150.9519     1358.4437    -2.3195    0.0225

b In the stepwise regression equation only the number of days absent and union–management relations were statistically significant.

c The three variables that were not statistically significant and one that was borderline were excluded by the stepwise regression process.

19.48 a \( \text{Depletion} = \beta_0 + \beta_1\,\text{Temperature} + \beta_2\,\text{PH-level} + \beta_3\,\text{PH-level}^2 + \beta_4 I_1 + \beta_5 I_2 + \varepsilon \)
where
I1 = 1 if mainly cloudy
I1 = 0 otherwise
I2 = 1 if sunny
I2 = 0 otherwise

b
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.8085
R Square             0.6537
Adjusted R Square    0.6452
Standard Error       4.14
Observations         210

ANOVA
              df     SS       MS       F        Significance F
Regression      5     6596    1319     77.00    0.0000
Residual      204     3495    17.13
Total         209    10091

              Coefficients    Standard Error    t Stat    P-value
Intercept     1003            55.12              18.19    0.0000
Temperature      0.194         0.029              6.78    0.0000
PH Level      -265.6          14.75             -18.01    0.0000
PH-sq           17.76          0.983             18.07    0.0000
I1              -1.07          0.700             -1.53    0.1282
I2               1.16          0.700              1.65    0.0997

c \( H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0 \)
\( H_1: \) At least one \( \beta_i \) is not equal to 0
F = 77.00, p-value = 0. There is enough evidence to infer that the model is valid.

d \( H_0: \beta_1 = 0 \)
\( H_1: \beta_1 > 0 \)
t = 6.78, p-value = 0. There is enough evidence to infer that higher temperatures deplete chlorine more quickly.

e \( H_0: \beta_3 = 0 \)
\( H_1: \beta_3 > 0 \)
t = 18.07, p-value = 0. There is enough evidence to infer that there is a quadratic relationship between chlorine depletion and PH level.

f \( H_0: \beta_i = 0 \)
\( H_1: \beta_i \neq 0 \)
I1: t = –1.53, p-value = .1282. There is not enough evidence to infer that chlorine depletion differs between mainly cloudy days and partly sunny days.
I2: t = 1.65, p-value = .0997. There is not enough evidence to infer that chlorine depletion differs between sunny days and partly sunny days.
Weather is not a factor in chlorine depletion.
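The indicator variables in Exercise 19.48 code a three-category weather variable with two 0/1 columns, leaving partly sunny as the omitted base category against which the other two are compared. The following is a minimal sketch of that coding. The function name and the category strings are assumptions made for illustration; the original data file may label the categories differently.

```python
def weather_indicators(weather):
    """Return (I1, I2) for one observation.

    I1 = 1 if mainly cloudy, 0 otherwise
    I2 = 1 if sunny, 0 otherwise
    'partly sunny' is the base category, coded (0, 0).
    """
    i1 = 1 if weather == "mainly cloudy" else 0
    i2 = 1 if weather == "sunny" else 0
    return i1, i2


# Hypothetical observations, one per weather category
observations = ["mainly cloudy", "sunny", "partly sunny"]
coded = [weather_indicators(w) for w in observations]
for weather, (i1, i2) in zip(observations, coded):
    print(f"{weather}: I1 = {i1}, I2 = {i2}")
```

With this coding, the coefficient of I1 estimates the difference in mean depletion between mainly cloudy and partly sunny days, and the coefficient of I2 the difference between sunny and partly sunny days, which is why the t-tests in part f are read as comparisons against partly sunny days.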