190 Solutions Manual for Statistics for Business & Economics, 5th Edition

CHAPTER 11: Multiple Regression

11-1 The estimated regression slope coefficients are interpreted as follows:
b1 = .661: All else equal, an increase in the plane's top speed by one mph will increase the expected number of hours in the design effort by an estimated .661 million worker-hours.
b2 = .065: All else equal, an increase in the plane's weight by one ton will increase the expected number of hours in the design effort by an estimated .065 million worker-hours.
b3 = -.018: All else equal, an increase of one percentage point in the parts in common with other models will decrease the expected number of hours in the design effort by an estimated .018 million worker-hours.

11-2 The estimated regression slope coefficients are interpreted as follows:
b1 = .057: All else equal, a one unit increase in the change over the quarter in equity purchases by financial institutions results in an estimated .057 increase in the change over the quarter in the Financial Times stock price index.
b2 = -.065: All else equal, a one unit increase in the change over the quarter in equity sales by financial institutions results in an estimated .065 decrease in the change over the quarter in the Financial Times stock price index.

11-3 a. b1 = .052: All else equal, an increase of one hundred dollars in weekly income results in an estimated .052 quarts per week increase in milk consumption.
b2 = 1.14: All else equal, an increase in family size by one person will result in an estimated increase in milk consumption of 1.14 quarts per week.
b. The intercept term b0 of -.025 is the estimated milk consumption in quarts per week for a family with weekly income of 0 dollars and 0 members. This extrapolates beyond the observed data and is not a useful interpretation.

11-4 a.
b1 = .653: All else equal, a one unit increase in the average number of meals eaten per week will result in an estimated .653 pound weight gain during freshman year.
b2 = -1.345: All else equal, a one unit increase in the average number of hours of exercise per week will result in an estimated 1.345 pound weight loss.
b3 = .613: All else equal, a one unit increase in the average number of beers consumed per week will result in an estimated .613 pound weight gain.

11-5 a. When the correlation between X1 and X2 is 0, the slope coefficient of the X1 term simplifies to the slope coefficient of the bivariate regression. Start with equation 11.4:

b1 = (s_y / s_x1) × (r_x1y - r_x1x2 × r_x2y) / (1 - r²_x1x2)

If the correlation between X1 and X2 is zero, then the second terms in both the numerator and the denominator are zero and the formula algebraically reduces to

b1 = (s_y / s_x1) × r_x1y

which is the bivariate slope coefficient (see the box on the bottom of page 380).
b. When the correlation between X1 and X2 is 1, the denominator term (1 - r²_x1x2) goes to 0 and the slope coefficient is undefined.

11-6 a. Electricity sales as a function of number of customers and price

Regression Analysis: salesmw2 versus priclec2, numcust2
The regression equation is
salesmw2 = - 647363 + 19895 priclec2 + 2.35 numcust2

Predictor      Coef  SE Coef      T      P
Constant    -647363   291734  -2.22  0.030
priclec2      19895    22515   0.88  0.380
numcust2     2.3530   0.2233  10.54  0.000

S = 66399   R-Sq = 79.2%   R-Sq(adj) = 78.5%

Analysis of Variance
Source          DF           SS           MS       F      P
Regression       2  1.02480E+12  5.12400E+11  116.22  0.000
Residual Error  61  2.68939E+11   4408828732
Total           63  1.29374E+12

Source    DF       Seq SS
priclec2   1  5.35306E+11
numcust2   1  4.89494E+11

All else equal, for every one unit increase in the price of electricity, we estimate that sales will increase by 19895 mwh. Note that this estimated coefficient is not significantly different from zero.
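As a quick numerical check (not part of the original solution), the T and P entries in the Minitab table above can be reproduced from the Coef and SE Coef columns; the snippet below uses the priclec2 row and assumes scipy is available:

```python
from scipy import stats

# t = Coef / SE Coef; two-sided p-value from a t distribution
# with n - K - 1 = 61 error degrees of freedom.
coef, se, df = 19895, 22515, 61
t = coef / se
p = 2 * stats.t.sf(abs(t), df)
print(round(t, 2), round(p, 3))   # matches the 0.88 and 0.380 in the table
```

The same computation applied to the numcust2 row reproduces its t-ratio of 10.54.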
All else equal, for every additional residential customer who uses electricity in the heating of their home, we estimate that sales will increase by 2.353 mwh.

b. Electricity sales as a function of number of customers

Regression Analysis: salesmw2 versus numcust2
The regression equation is
salesmw2 = - 410202 + 2.20 numcust2

Predictor     Coef  SE Coef      T      P
Constant   -410202   114132  -3.59  0.001
numcust2    2.2027   0.1445  15.25  0.000

S = 66282   R-Sq = 78.9%   R-Sq(adj) = 78.6%

Analysis of Variance
Source          DF           SS           MS       F      P
Regression       1  1.02136E+12  1.02136E+12  232.48  0.000
Residual Error  62  2.72381E+11   4393240914
Total           63  1.29374E+12

All else equal, an additional residential customer will add 2.2027 mwh to electricity sales. The two models have roughly equivalent explanatory power; adding price as a variable therefore does not add a significant amount of explanatory power to the model. There appears to be a problem of multicollinearity between price and customers.

c.
Regression Analysis: salesmw2 versus priclec2, degrday2
The regression equation is
salesmw2 = 2312260 - 165275 priclec2 + 56.1 degrday2

Predictor     Coef  SE Coef      T      P
Constant   2312260   148794  15.54  0.000
priclec2   -165275    24809  -6.66  0.000
degrday2     56.06    60.37   0.93  0.357

S = 110725   R-Sq = 42.2%   R-Sq(adj) = 40.3%

Analysis of Variance
Source          DF           SS           MS      F      P
Regression       2  5.45875E+11  2.72938E+11  22.26  0.000
Residual Error  61  7.47863E+11  12260053296
Total           63  1.29374E+12

Source    DF       Seq SS
priclec2   1  5.35306E+11
degrday2   1  10569514277

All else equal, an increase in the price of electricity by one unit will reduce electricity sales by 165,275 mwh. All else equal, an increase in degree days (departure from normal weather) by one unit will increase electricity sales by 56.06 mwh. Note that the coefficient on the price variable is now negative, as expected, and it is significantly different from zero (p-value = .000).

d.
Regression Analysis: salesmw2 versus Yd872, degrday2
The regression equation is
salesmw2 = 293949 + 326 Yd872 + 58.4 degrday2

Predictor    Coef  SE Coef      T      P
Constant   293949    67939   4.33  0.000
Yd872      325.85    21.30  15.29  0.000
degrday2    58.36    35.79   1.63  0.108

S = 66187   R-Sq = 79.3%   R-Sq(adj) = 78.7%

Analysis of Variance
Source          DF           SS           MS       F      P
Regression       2  1.02652E+12  5.13259E+11  117.16  0.000
Residual Error  61  2.67221E+11   4380674677
Total           63  1.29374E+12

Source    DF       Seq SS
Yd872      1  1.01487E+12
degrday2   1  11651638615

All else equal, an increase in personal disposable income by one unit will increase electricity sales by 325.85 mwh. All else equal, an increase in degree days by one unit will increase electricity sales by 58.36 mwh.

11-7 a. mpg as a function of horsepower and weight

Regression Analysis: milpgal versus horspwr, weight
The regression equation is
milpgal = 55.8 - 0.105 horspwr - 0.00661 weight

150 cases used; 5 cases contain missing values

Predictor        Coef    SE Coef      T      P
Constant       55.769      1.448  38.51  0.000
horspwr      -0.10489    0.02233  -4.70  0.000
weight     -0.0066143  0.0009015  -7.34  0.000

S = 3.901   R-Sq = 72.3%   R-Sq(adj) = 72.0%

Analysis of Variance
Source          DF      SS      MS       F      P
Regression       2  5850.0  2925.0  192.23  0.000
Residual Error 147  2236.8    15.2
Total          149  8086.8

All else equal, a one unit increase in the horsepower of the engine will reduce fuel mileage by .10489 mpg. All else equal, an increase in the weight of the car by 100 pounds will reduce fuel mileage by .66143 mpg.

b.
Add number of cylinders:

Regression Analysis: milpgal versus horspwr, weight, cylinder
The regression equation is
milpgal = 55.9 - 0.117 horspwr - 0.00758 weight + 0.726 cylinder

150 cases used; 5 cases contain missing values

Predictor      Coef   SE Coef      T      P
Constant     55.925     1.443  38.77  0.000
horspwr    -0.11744   0.02344  -5.01  0.000
weight    -0.007576  0.001066  -7.10  0.000
cylinder     0.7260    0.4362   1.66  0.098

S = 3.878   R-Sq = 72.9%   R-Sq(adj) = 72.3%

Analysis of Variance
Source          DF      SS      MS       F      P
Regression       3  5891.6  1963.9  130.62  0.000
Residual Error 146  2195.1    15.0
Total          149  8086.8

All else equal, one additional cylinder in the engine of the auto will increase fuel mileage by .726 mpg. Note that this is not significant at the .05 level (p-value = .098). Horsepower and weight still have the expected negative signs.

c. mpg as a function of weight, number of cylinders

Regression Analysis: milpgal versus weight, cylinder
The regression equation is
milpgal = 55.9 - 0.0104 weight + 0.121 cylinder

154 cases used; 1 case contains missing values

Predictor       Coef    SE Coef       T      P
Constant      55.914      1.525   36.65  0.000
weight    -0.0103680  0.0009779  -10.60  0.000
cylinder      0.1207     0.4311    0.28  0.780

S = 4.151   R-Sq = 68.8%   R-Sq(adj) = 68.3%

Analysis of Variance
Source          DF      SS      MS       F      P
Regression       2  5725.0  2862.5  166.13  0.000
Residual Error 151  2601.8    17.2
Total          153  8326.8

All else equal, an increase in the weight of the car by 100 pounds will reduce fuel mileage by 1.0368 mpg. All else equal, an increase in the number of cylinders in the engine will increase fuel mileage by .1207 mpg. The explanatory power of the models has stayed roughly the same, with a slight drop for the latest regression model. Note that the coefficient on weight has stayed negative and significant (p-values of .000) for all of the regression models, although the value of the coefficient has changed.
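The instability of the coefficient values across specifications can be illustrated with a small sketch (synthetic data, not the automobile data set used in the exercise): when two regressors are correlated, adding or dropping one changes the fitted coefficient on the other.

```python
import numpy as np

rng = np.random.default_rng(0)
weight = rng.normal(3000, 500, 200)
# cylinder count is built to be correlated with weight (illustrative only)
cylinder = 4 + (weight - 3000) / 400 + rng.normal(0, 0.8, 200)
mpg = 50 - 0.008 * weight - 0.5 * cylinder + rng.normal(0, 2, 200)

def ols(y, *cols):
    # least-squares coefficients with an intercept column prepended
    X = np.column_stack([np.ones(len(y))] + list(cols))
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_weight_alone = ols(mpg, weight)[1]          # cylinder omitted
b_weight_both = ols(mpg, weight, cylinder)[1] # cylinder included
print(b_weight_alone, b_weight_both)          # the two slopes differ
```

Because the omitted regressor is correlated with weight, the single-regressor slope absorbs part of the cylinder effect, which is the same pattern seen across the mpg models above.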
The number of cylinders is not significantly different from zero in either of the models where it was used as an independent variable. There is likely some correlation between cylinders and the weight of the car, as well as between cylinders and the horsepower of the car.

d. mpg as a function of horsepower, weight, price

Regression Analysis: milpgal versus horspwr, weight, price
The regression equation is
milpgal = 54.4 - 0.0938 horspwr - 0.00735 weight + 0.000137 price

150 cases used; 5 cases contain missing values

Predictor        Coef     SE Coef      T      P
Constant       54.369       1.454  37.40  0.000
horspwr      -0.09381     0.02177  -4.31  0.000
weight     -0.0073518   0.0008950  -8.21  0.000
price      0.00013721  0.00003950   3.47  0.001

S = 3.762   R-Sq = 74.5%   R-Sq(adj) = 73.9%

Analysis of Variance
Source          DF      SS      MS       F      P
Regression       3  6020.7  2006.9  141.82  0.000
Residual Error 146  2066.0    14.2
Total          149  8086.8

All else equal, a one unit increase in the horsepower of the auto will reduce fuel mileage by .09381 mpg. All else equal, an increase of 100 pounds in the weight of the auto will reduce fuel mileage by .73518 mpg, and an increase in the price of the auto by one dollar will increase fuel mileage by .00013721 mpg.

e. Horsepower and weight remain significant negative independent variables throughout, whereas the number of cylinders has been insignificant. The size of the coefficients changes as the combination of independent variables changes. This is likely due to strong correlation among the independent variables.

11-8 a.
Horsepower as a function of weight, cubic inches of displacement

Regression Analysis: horspwr versus weight, displace
The regression equation is
horspwr = 23.5 + 0.0154 weight + 0.157 displace

151 cases used; 4 cases contain missing values

Predictor      Coef   SE Coef     T      P  VIF
Constant     23.496     7.341  3.20  0.002
weight     0.015432  0.004538  3.40  0.001  6.0
displace    0.15667   0.03746  4.18  0.000  6.0

S = 13.64   R-Sq = 69.2%   R-Sq(adj) = 68.8%

Analysis of Variance
Source          DF     SS     MS       F      P
Regression       2  61929  30964  166.33  0.000
Residual Error 148  27551    186
Total          150  89480

All else equal, a 100 pound increase in the weight of the car is associated with a 1.54 increase in the horsepower of the auto. All else equal, a 10 cubic inch increase in the displacement of the engine is associated with a 1.57 increase in the horsepower of the auto.

b. Horsepower as a function of weight, displacement, number of cylinders

Regression Analysis: horspwr versus weight, displace, cylinder
The regression equation is
horspwr = 16.7 + 0.0163 weight + 0.105 displace + 2.57 cylinder

151 cases used; 4 cases contain missing values

Predictor      Coef   SE Coef     T      P   VIF
Constant     16.703     9.449  1.77  0.079
weight     0.016261  0.004592  3.54  0.001   6.2
displace    0.10527   0.05859  1.80  0.074  14.8
cylinder      2.574     2.258  1.14  0.256   7.8

S = 13.63   R-Sq = 69.5%   R-Sq(adj) = 68.9%

Analysis of Variance
Source          DF     SS     MS       F      P
Regression       3  62170  20723  111.55  0.000
Residual Error 147  27310    186
Total          150  89480

All else equal, a 100 pound increase in the weight of the car is associated with a 1.63 increase in the horsepower of the auto. All else equal, a 10 cubic inch increase in the displacement of the engine is associated with a 1.05 increase in the horsepower of the auto. All else equal, one additional cylinder in the engine is associated with a 2.57 increase in the horsepower of the auto. Note that adding the number of cylinders as an independent variable has not added to the explanatory power of the model; R-square has increased only marginally.
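The VIF column in the tables above can be computed by regressing each independent variable on the remaining ones and taking VIF_j = 1/(1 - R_j²). A minimal sketch, using synthetic data rather than the automobile data set:

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.normal(size=150)
# weight and displace share a common component, so they are collinear
weight = z + 0.3 * rng.normal(size=150)
displace = z + 0.3 * rng.normal(size=150)

def vif(x, *others):
    # R^2 from regressing x on the other regressors (with intercept),
    # then VIF = 1 / (1 - R^2)
    X = np.column_stack([np.ones(len(x))] + list(others))
    b = np.linalg.lstsq(X, x, rcond=None)[0]
    resid = x - X @ b
    r2 = 1 - resid.var() / x.var()
    return 1 / (1 - r2)

print(vif(weight, displace))   # well above 1, signalling collinearity
```

A VIF near 1 indicates an independent variable nearly uncorrelated with the others; the values of 6 to 15 in the tables above are symptomatic of the multicollinearity discussed in the solution.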
Engine displacement is no longer significant at the .05 level (p-value of .074), and the estimated regression slope coefficient on the number of cylinders is not significantly different from zero. This is due to the strong correlation that exists between cubic inches of engine displacement and the number of cylinders.

c. Horsepower as a function of weight, displacement and fuel mileage

Regression Analysis: horspwr versus weight, displace, milpgal
The regression equation is
horspwr = 93.6 + 0.00203 weight + 0.165 displace - 1.24 milpgal

150 cases used; 5 cases contain missing values

Predictor      Coef   SE Coef      T      P  VIF
Constant      93.57     15.33   6.11  0.000
weight     0.002031  0.004879   0.42  0.678  8.3
displace    0.16475   0.03475   4.74  0.000  6.1
milpgal     -1.2392    0.2474  -5.01  0.000  3.1

S = 12.55   R-Sq = 74.2%   R-Sq(adj) = 73.6%

Analysis of Variance
Source          DF     SS     MS       F      P
Regression       3  66042  22014  139.77  0.000
Residual Error 146  22994    157
Total          149  89036

All else equal, a 100 pound increase in the weight of the car is associated with a .203 increase in the horsepower of the auto. All else equal, a 10 cubic inch increase in the displacement of the engine is associated with a 1.6475 increase in the horsepower of the auto. All else equal, an increase in the fuel mileage of the vehicle by 1 mile per gallon is associated with a reduction in horsepower of 1.2392. Note that the negative coefficient on fuel mileage indicates the trade-off that is expected between horsepower and fuel mileage. The displacement variable is significantly positive, as expected; however, the weight variable is no longer significant. Again, one would expect high correlation among the independent variables.

d.
Horsepower as a function of weight, displacement, mpg and price

Regression Analysis: horspwr versus weight, displace, milpgal, price
The regression equation is
horspwr = 98.1 - 0.00032 weight + 0.175 displace - 1.32 milpgal + 0.000138 price

150 cases used; 5 cases contain missing values

Predictor       Coef    SE Coef      T      P   VIF
Constant       98.14      16.05   6.11  0.000
weight     -0.000324   0.005462  -0.06  0.953  10.3
displace     0.17533    0.03647   4.81  0.000   6.8
milpgal      -1.3194     0.2613  -5.05  0.000   3.5
price      0.0001379  0.0001438   0.96  0.339   1.3

S = 12.55   R-Sq = 74.3%   R-Sq(adj) = 73.6%

Analysis of Variance
Source          DF     SS     MS       F      P
Regression       4  66187  16547  105.00  0.000
Residual Error 145  22849    158
Total          149  89036

e. Explanatory power has marginally increased from the first model to the last. The estimated coefficient on price is not significantly different from zero. Displacement and fuel mileage have the expected signs. The coefficient on weight has the wrong sign; however, it is not significantly different from zero (p-value of .953).

11-9 a. R² = 3.549/3.881 = .9145; therefore, 91.45% of the variability in work-hours of design effort can be explained by the variation in the plane's top speed, weight, and percentage of parts in common with other models.
b. SSE = 3.881 - 3.549 = .332
c. R²(adj) = 1 - [.332/(27-4)] / [3.881/26] = .9033
d. R = √.9145 = .9563. This is the sample correlation between observed and predicted values of the design effort.

11-10 a. R² = 1 - 88.2/162.1 = .4559; therefore, 45.59% of the variability in milk consumption can be explained by the variations in weekly income and family size.
b. R²(adj) = 1 - [88.2/(30-3)] / [162.1/29] = .4156
c. R = √.4559 = .6752. This is the sample correlation between observed and predicted values of milk consumption.

11-11 a. R² = 79.2/(79.2 + 45.9) = .6331; therefore, 63.31% of the variability in weight gain can be explained by the variations in the average number of meals eaten, number of hours exercised, and number of beers consumed weekly.
b. R²(adj) = 1 - [45.9/(25-4)] / [125.1/24] = .5807
c. R = √.6331 = .7957. This is the sample correlation between observed and predicted values of weight gained.

11-12 a.
Regression Analysis: Y profit versus X2 offices
The regression equation is
Y profit = 1.55 - 0.000120 X2 offices

Predictor        Coef     SE Coef      T      P
Constant       1.5460      0.1048  14.75  0.000
X2 offi   -0.00012033  0.00001434  -8.39  0.000

S = 0.07049   R-Sq = 75.4%   R-Sq(adj) = 74.3%

Analysis of Variance
Source          DF       SS       MS      F      P
Regression       1  0.34973  0.34973  70.38  0.000
Residual Error  23  0.11429  0.00497
Total           24  0.46402

b.
Regression Analysis: X1 revenue versus X2 offices
The regression equation is
X1 revenue = - 0.078 + 0.000543 X2 offices

Predictor       Coef     SE Coef      T      P
Constant     -0.0781      0.2975  -0.26  0.795
X2 offi   0.00054280  0.00004070  13.34  0.000

S = 0.2000   R-Sq = 88.5%   R-Sq(adj) = 88.1%

Analysis of Variance
Source          DF      SS      MS       F      P
Regression       1  7.1166  7.1166  177.84  0.000
Residual Error  23  0.9204  0.0400
Total           24  8.0370

c.
Regression Analysis: Y profit versus X1 revenue
The regression equation is
Y profit = 1.33 - 0.169 X1 revenue

Predictor      Coef  SE Coef      T      P
Constant     1.3262   0.1386   9.57  0.000
X1 reven   -0.16913  0.03559  -4.75  0.000

S = 0.1009   R-Sq = 49.5%   R-Sq(adj) = 47.4%

Analysis of Variance
Source          DF       SS       MS      F      P
Regression       1  0.22990  0.22990  22.59  0.000
Residual Error  23  0.23412  0.01018
Total           24  0.46402

d.
Regression Analysis: X2 offices versus X1 revenue
The regression equation is
X2 offices = 957 + 1631 X1 revenue

Predictor    Coef  SE Coef      T      P
Constant    956.9    476.5   2.01  0.057
X1 reven   1631.3    122.3  13.34  0.000

S = 346.8   R-Sq = 88.5%   R-Sq(adj) = 88.1%

Analysis of Variance
Source          DF        SS        MS       F      P
Regression       1  21388013  21388013  177.84  0.000
Residual Error  23   2766147    120267
Total           24  24154159

e. Using the residuals obtained from a and b: r = .6731
f. Using the residuals obtained from c and d: r = -.8562

11-13 a.
b1 = .661, s_b1 = .099, n = 27, t_23,.05 = 1.714, t_23,.025 = 2.069
90% CI: .661 ± 1.714(.099); .4913 up to .8307
95% CI: .661 ± 2.069(.099); .4562 up to .8658
b. b2 = .065, s_b2 = .032, t_23,.025 = 2.069, t_23,.005 = 2.807
95% CI: .065 ± 2.069(.032); -.0012 up to .1312
99% CI: .065 ± 2.807(.032); -.0248 up to .1548
c. H0: β2 = 0, H1: β2 ≠ 0
t = .065/.032 = 2.031
t_23,.05 = 1.714, t_23,.025 = 2.069
Therefore, reject H0 at the 10% level but not at the 5% level.
d. H0: β1 = β2 = 0, H1: At least one βi ≠ 0 (i = 1, 2)
F = [(3.311 - .332)/2] / [.332/23] = 103.188, where 3.311 is the error sum of squares of the restricted model.
Therefore, reject H0 at the 1% level since F = 103.188 > 5.66 = F_2,23,.01

11-14 a. H0: β1 = 0; H1: β1 > 0
t = .052/.023 = 2.26
t_27,.025 = 2.052, t_27,.01 = 2.473
Therefore, reject H0 at the 2.5% level but not at the 1% level.
b. t_27,.05 = 1.703, t_27,.025 = 2.052, t_27,.005 = 2.771
90% CI: 1.14 ± 1.703(.35); .5439 up to 1.7361
95% CI: 1.14 ± 2.052(.35); .4218 up to 1.8582
99% CI: 1.14 ± 2.771(.35); .1701 up to 2.1099

11-15 a. H0: β2 = 0; H1: β2 < 0
t = -1.345/.565 = -2.381
t_21,.025 = 2.080, t_21,.01 = 2.518
Therefore, reject H0 at the 2.5% level but not at the 1% level.
b. H0: β3 = 0; H1: β3 > 0
t = .613/.243 = 2.523
t_21,.01 = 2.518, t_21,.005 = 2.831
Therefore, reject H0 at the 1% level but not at the .5% level.
c. t_21,.05 = 1.721, t_21,.025 = 2.080, t_21,.005 = 2.831
90% CI: .653 ± 1.721(.189); .3277 up to .9783
95% CI: .653 ± 2.080(.189); .2599 up to 1.0461
99% CI: .653 ± 2.831(.189); .1179 up to 1.1881

11-16 a. H0: β3 = 0, H1: β3 < 0
t = -.000191/.000446 = -.428
t_16,.10 = -1.337
Therefore, do not reject H0 at the 20% level.
b. H0: β1 = β2 = β3 = 0, H1: At least one βi ≠ 0 (i = 1, 2, 3)
F = (16/3)[.71/(1 - .71)] = 13.057, F_3,16,.01 = 5.29
Therefore, reject H0 at the 1% level.

11-17 a. t_85,.025 = 2.000, t_85,.005 = 2.660
95% CI: 7.878 ± 2.000(1.809); 4.260 up to 11.496
99% CI: 7.878 ± 2.660(1.809); 3.0661 up to 12.6899
b. H0: β2 = 0; H1: β2 > 0, t = .003666/.001344 = 2.73, t_85,.005 = 2.660
Therefore, reject H0 at the .5% level.

11-18 a.
All else being equal, an extra $1 in mean per capita personal income leads to an expected extra $.04 of net revenue per capita from the lottery.
b. b2 = .8772, s_b2 = .3107, n = 29, t_24,.025 = 2.064
95% CI: .8772 ± 2.064(.3107); .2359 up to 1.5185
c. H0: β3 = 0, H1: β3 < 0
t = -365.01/263.88 = -1.383
t_24,.10 = -1.318, t_24,.05 = -1.711
Therefore, reject H0 at the 10% level but not at the 5% level.

11-19 a. b3 = 7.653, s_b3 = 3.082, n = 19, t_15,.025 = 2.131
95% CI: 7.653 ± 2.131(3.082); 1.0853 up to 14.2207
b. H0: β2 = 0, H1: β2 > 0, t = 19.729/8.992 = 2.194
t_15,.025 = 2.131, t_15,.01 = 2.602
Therefore, reject H0 at the 2.5% level but not at the 1% level.

11-20 a. n = 19, b1 = .2, s_b1 = .0092, t_16,.025 = 2.12
95% CI: .2 ± 2.12(.0092); .1805 up to .2195
b. H0: β2 = 0, H1: β2 < 0, t = -.1/.084 = -1.19
t_16,.10 = -1.337
Therefore, do not reject H0 at the 10% level.

11-21 a. n = 14, b1 = .101, s_b1 = .023, t_10,.05 = 1.812
90% CI: .101 ± 1.812(.023); .0593 up to .1427
b. H0: β2 = 0, H1: β2 < 0, t = -.244/.08 = -3.05
t_10,.01 = -2.764, t_10,.005 = -3.169
Therefore, reject H0 at the 1% level but not at the .5% level.
c. H0: β3 = 0, H1: β3 > 0
t = .057/.00925 = 6.162
t_10,.005 = 3.169
Therefore, reject H0 at the .5% level.

11-22 a. n = 39, b5 = .0495, s_b5 = .01172, t_30,.005 = 2.750
99% CI: .0495 ± 2.750(.01172); .0173 up to .0817
b. H0: β4 = 0, H1: β4 ≠ 0
t = .48122/.77954 = .617
t_30,.10 = 1.31
Therefore, do not reject H0 at the 20% level.
c. H0: β7 = 0, H1: β7 ≠ 0
t = .00645/.00306 = 2.108
t_30,.025 = 2.042, t_30,.01 = 2.457
Therefore, reject H0 at the 5% level but not at the 2% level.

11-23 a. R²(adj) = 1 - [SSE/(n-K-1)] / [SST/(n-1)]
= 1 - [(n-1)/(n-K-1)](1 - R²)
= [(n-K-1) - (n-1)(1 - R²)] / (n-K-1)
= [(n-1)R² - K] / (n-K-1)
b. Since R²(adj) = [(n-1)R² - K]/(n-K-1), solving for R² gives
R² = [(n-K-1)R²(adj) + K] / (n-1)
c. F = [SSR/K] / [SSE/(n-K-1)] = [(n-K-1)/K] × (SSR/SST)/(SSE/SST) = [(n-K-1)/K] × R²/(1 - R²)
Substituting the result from part b for R², and noting that 1 - R² = (n-K-1)(1 - R²(adj))/(n-1):
F = [(n-K-1)/K] × [(n-K-1)R²(adj) + K] / [(n-K-1)(1 - R²(adj))]
= [(n-K-1)R²(adj) + K] / [K(1 - R²(adj))]

11-24 a.
SST = 3.881, SSR = 3.549, SSE = .332
H0: β1 = β2 = β3 = 0, H1: At least one βi ≠ 0 (i = 1, 2, 3)
F = (3.549/3) / (.332/23) = 81.955
F_3,23,.01 = 4.76
Therefore, reject H0 at the 1% level.
b. Analysis of variance table:

Source of Variation  Sum of Squares  Degrees of Freedom  Mean Squares  F-Ratio
Regression                    3.549                   3         1.183   81.955
Error                          .332                  23       .014435
Total                         3.881                  26

11-25 H0: β1 = β2 = 0, H1: At least one βi ≠ 0 (i = 1, 2)
F = (45/2) × [.463 + (2/45)] / (1 - .463) = 21.262
F_2,45,.01 = 5.18
Therefore, reject H0 at the 1% level.

11-26 a. SST = 162.1, SSR = 73.9, SSE = 88.2
H0: β1 = β2 = 0, H1: At least one βi ≠ 0 (i = 1, 2)
F = (73.9/2) / (88.2/27) = 11.311, F_2,27,.01 = 5.49
Therefore, reject H0 at the 1% level.
b.
Source of Variation  Sum of Squares  Degrees of Freedom  Mean Squares  F-Ratio
Regression                     73.9                   2         36.95   11.311
Error                          88.2                  27      3.266667
Total                         162.1                  29

11-27 a. SST = 125.1, SSR = 79.2, SSE = 45.9
H0: β1 = β2 = β3 = 0, H1: At least one βi ≠ 0 (i = 1, 2, 3)
F = (79.2/3) / (45.9/21) = 12.078, F_3,21,.01 = 4.87
Therefore, reject H0 at the 1% level.
b.
Source of Variation  Sum of Squares  Degrees of Freedom  Mean Squares  F-Ratio
Regression                     79.2                   3          26.4   12.078
Error                          45.9                  21      2.185714
Total                         125.1                  24

11-28 a. H0: β1 = β2 = β3 = β4 = 0, H1: At least one βi ≠ 0 (i = 1, 2, 3, 4)
F = (24/4) × [.51/(1 - .51)] = 6.2449
F_4,24,.01 = 4.22
Therefore, reject H0 at the 1% level.

11-29 a. H0: β1 = β2 = β3 = 0, H1: At least one βi ≠ 0 (i = 1, 2, 3)
F = (15/3) × [.84/(1 - .84)] = 26.25
F_3,15,.01 = 5.42
Therefore, reject H0 at the 1% level.

11-30 a.
H0: β1 = β2 = 0, H1: At least one βi ≠ 0 (i = 1, 2)
F = (16/2) × [.96 + (2/16)] / (1 - .96) = 217, F_2,16,.01 = 6.23
Therefore, reject H0 at the 1% level.

11-31 H0: β1 = β2 = β3 = β4 = β5 = β6 = β7 = 0, H1: At least one βi ≠ 0 (i = 1, 2, 3, 4, 5, 6, 7)
F = (31/7) × [.3572 + (7/31)] / (1 - .3572) = 4.017
F_7,30,.01 = 3.30
Therefore, reject H0 at the 1% level.

11-32 F = [(SSE* - SSE)/k1] / [SSE/(n-K-1)]
= [(n-K-1)/k1] × [(SSE* - SSE)/SST] / [SSE/SST]
= [(n-K-1)/k1] × [(1 - R²*) - (1 - R²)] / (1 - R²)
= [(n-K-1)/k1] × (R² - R²*) / (1 - R²)

11-33 Let β3 be the coefficient on the number of preschool children in the household.
H0: β3 = 0, H1: β3 ≠ 0
F = [(88.2 - 83.7)/1] / (83.7/26) = 1.398
F_1,26,.05 = 4.23
Therefore, do not reject H0 at the 5% level.

11-34 Ŷ = 7.35 + .653(20) - 1.345(10) + .613(6) = 10.638 pounds

11-35 Ŷ = -.025 + .052(6) + 1.14(4) = 4.847 quarts of milk per week

11-36 Ŷ = .578 + .661(1) + .065(7) - .018(50) = .794 million worker-hours

11-37 a. All else equal, a one square foot increase in the lot size is expected to increase the selling price of the house by $1.468.
b. 98.43% of the variation in the selling price of homes can be explained by the variation in house size, lot size, number of bedrooms and number of bathrooms.
c. H0: β4 = 0, H1: β4 > 0
t = 2701.1/1996.2 = 1.353
t_15,.10 = 1.341, t_15,.05 = 1.753
Therefore, reject H0 at the 10% level but not at the 5% level.
d. Ŷ = 1998.5 + 22.352(1250) + 1.4686(4700) + 6767.3(3) + 2701.1(1.5) = 61194.47

11-38 There are many possible answers. Relationships that can be approximated by a non-linear quadratic model include many supply functions, production functions and cost functions, including average cost versus the number of units produced.

11-39 To estimate the function with linear least squares, solve the constraint β1 + β2 = 2 for β2. Since β2 = 2 - β1, plug this into the equation and manipulate algebraically:
Y = β0 + β1X1 + (2 - β1)X2 + β3X3
Y = β0 + β1X1 + 2X2 - β1X2 + β3X3
Y = β0 + β1[X1 - X2] + 2X2 + β3X3
Y - 2X2 = β0 + β1[X1 - X2] + β3X3
Conduct the variable transformations and estimate the model using least squares.
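The point predictions in 11-34 through 11-36 above are simple linear combinations of the estimated coefficients and the given regressor values, which can be verified directly (numpy assumed available):

```python
import numpy as np

def predict(b, x):
    # b holds the intercept followed by the slopes; x holds the regressor values
    return float(np.dot(b, [1.0, *x]))

gain = predict([7.35, 0.653, -1.345, 0.613], [20, 10, 6])    # 11-34
milk = predict([-0.025, 0.052, 1.14], [6, 4])                # 11-35
hours = predict([0.578, 0.661, 0.065, -0.018], [1, 7, 50])   # 11-36
print(round(gain, 3), round(milk, 3), round(hours, 3))       # 10.638  4.847  0.794
```

Each result matches the hand computation shown in the corresponding solution.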
11-40 Linear model:
Salary = 20544.5 + 616.113 Experience
S = 3117.89   R-Sq = 78.0%   R-Sq(adj) = 77.9%
[Regression plot of Salary versus Experience omitted]

Quadratic model:
Salary = 18683.8 + 910.807 Experience - 8.21382 Experience**2
S = 3027.17   R-Sq = 79.4%   R-Sq(adj) = 79.1%
[Regression plot omitted]

Cubic model:
Salary = 20881.1 + 344.484 Experience + 26.4323 Experience**2 - 0.582553 Experience**3
S = 2982.43   R-Sq = 80.2%   R-Sq(adj) = 79.8%
[Regression plot omitted]

All three of the models appear to fit the data well. The cubic model appears to fit the data best, as its standard error of the estimate is lowest. In addition, explanatory power is marginally higher for the cubic model than for the other models.

11-41 a. All else equal, a 1% increase in annual consumption expenditures will be associated with a 1.1556% increase in expenditures on vacation travel. All else equal, a 1% increase in the size of the household will be associated with a .4408% decrease in expenditures on vacation travel.
b. 16.8% of the variation in vacation travel expenditures can be explained by the variations in the log of total consumption expenditures and the log of the number of members in the household.
c. 1.1556 ± 1.96(.0546) = 1.049 up to 1.2626
d. H0: β2 = 0, H1: β2 < 0
t = -.4408/.0490 = -8.996
Therefore, reject H0 at the 1% level.

11-42 a. A 1% increase in median income leads to an expected .68% increase in store size.
b. H0: β1 = 0, H1: β1 > 0
t = .68/.077 = 8.831
Therefore, reject H0 at the 1% level.

11-43 a. All else equal, a 1% increase in the price of beef will be associated with a decrease of .529% in the tons of beef consumed annually in the U.S.
b.
All else equal, a 1% increase in the price of pork will be associated with an increase of .217% in the tons of beef consumed annually in the U.S.
c. H0: β4 = 0, H1: β4 > 0
t = .416/.163 = 2.552
t_25,.01 = 2.485
Therefore, reject H0 at the 1% level.
d. H0: β1 = β2 = β3 = β4 = 0, H1: At least one βi ≠ 0 (i = 1, 2, 3, 4)
F = [(n-K-1)/K] × R²/(1 - R²) = (25/4) × [.683/(1 - .683)] = 13.466
F_4,25,.01 = 4.18
Therefore, reject H0 at the 1% level.
e. If an important independent variable has been omitted, there may be specification bias. The regression coefficients produced by the misspecified model would be misleading.

11-44 Results for: GermanImports.xls

Regression Analysis: LogYt versus LogX1t, LogX2t
The regression equation is
LogYt = - 4.07 + 1.36 LogX1t + 0.101 LogX2t

Predictor     Coef  SE Coef       T      P  VIF
Constant   -4.0709   0.3100  -13.13  0.000
LogX1t     1.35935  0.03005   45.23  0.000  4.9
LogX2t     0.10094  0.05715    1.77  0.088  4.9

S = 0.04758   R-Sq = 99.7%   R-Sq(adj) = 99.7%

Analysis of Variance
Source          DF      SS      MS        F      P
Regression       2  21.345  10.673  4715.32  0.000
Residual Error  28   0.063   0.002
Total           30  21.409

Source  DF  Seq SS
LogX1t   1  21.338
LogX2t   1   0.007

11-45 Estimate a Cobb-Douglas production function with three independent variables:
Y = β0 × X1^β1 × X2^β2 × X3^β3, where X1 = capital, X2 = labor and X3 = basic research.
Taking the log of both sides of the equation yields:
log(Y) = log(β0) + β1 log(X1) + β2 log(X2) + β3 log(X3)
Using this form, regress the log of Y on the logs of the three independent variables and obtain the estimated regression slope coefficients.

11-46

11-47 The interpretation of the dummy variable is that, for a given difference between the spot price in the current year and the OPEC price in the previous year, the difference between the OPEC price in the current year and the OPEC price in the previous year is $5.22 higher in 1974, during the oil embargo, than in other years.

11-48 a.
All else being equal, expected selling price is higher by $3,219 if the house has a fireplace.
b. All else being equal, expected selling price is higher by $2,005 if the house has brick siding.
c. 95% CI: 3219 ± 1.96(947) = $1,362.88 up to $5,075.12
d. H0: β5 = 0, H1: β5 > 0
t = 2005/768 = 2.611
t_809,.005 = 2.576
Therefore, reject H0 at the .5% level.

11-49 a. All else being equal, the price-earnings ratio is higher by 1.23 for a regional bank than for a money center bank.
b. H0: β2 = 0, H1: β2 ≠ 0
t = 1.23/.496 = 2.48
t_29,.01 = 2.462, t_29,.005 = 2.756
Therefore, reject H0 at the 2% level but not at the 1% level.
c. H0: β1 = β2 = 0, H1: At least one βi ≠ 0 (i = 1, 2)
F = [(n-K-1)/K] × R²/(1 - R²) = (29/2) × [.37/(1 - .37)] = 8.516
F_2,29,.05 = 3.33
Therefore, reject H0 at the 5% level.

11-50 35.6% of the variation in overall performance in law school can be explained by the variation in undergraduate gpa, scores on the LSAT, and whether the student's letters of recommendation are unusually strong. The overall model is significant, since we can reject the null hypothesis that the model has no explanatory power in favor of the alternative hypothesis that the model has significant explanatory power. The individual regression coefficients that are significantly different from zero include the scores on the LSAT and whether the student's letters of recommendation were unusually strong. The coefficient on undergraduate gpa was not found to be significant at the 5% level.

11-51 a. All else equal, the annual salary of a chief justice who can be removed is $5,793 higher than if the chief justice cannot be removed.
b. All else equal, the annual salary of a chief justice elected on partisan ballots is $3,100 lower than if the chief justice is not elected in this manner.
c. H0: β5 = 0, H1: β5 > 0
t = 5793/2897 = 1.9996
t_43,.05 = 1.68, t_43,.025 = 2.016
Therefore, reject H0 at the 5% level but not at the 2.5% level.
d.
H0: β6 = 0, H1: β6 < 0
t = -3100/1761 = -1.76
t_43,.05 = -1.68, t_43,.025 = -2.016
Therefore, reject H0 at the 5% level but not at the 2.5% level.
e. t_43,.025 = 2.016
95% CI: 547 ± 2.016(124.3); 296.41 up to 797.59

11-52 Results for: Student Performance.xls

Regression Analysis: Y versus X1, X2, X3, X4, X5
The regression equation is
Y = 2.00 + 0.0099 X1 + 0.0763 X2 - 0.137 X3 + 0.064 X4 + 0.138 X5

Predictor      Coef  SE Coef      T      P  VIF
Constant      1.997    1.273   1.57  0.132
X1          0.00990  0.01654   0.60  0.556  1.3
X2          0.07629  0.05654   1.35  0.192  1.2
X3         -0.13652  0.06922  -1.97  0.062  1.1
X4           0.0636   0.2606   0.24  0.810  1.4
X5          0.13794  0.07521   1.83  0.081  1.1

S = 0.5416   R-Sq = 26.5%   R-Sq(adj) = 9.0%

Analysis of Variance
Source          DF      SS      MS     F      P
Regression       5  2.2165  0.4433  1.51  0.229
Residual Error  21  6.1598  0.2933
Total           26  8.3763

The model is not significant (p-value of the F-test = .229). The model explains only 26.5% of the variation in gpa with the hours spent studying, hours spent preparing for tests, hours spent in bars, whether or not students take notes or mark highlights when reading texts, and the average number of credit hours taken per semester. The only independent variables that are marginally significant (at the 10% level but not the 5% level) are the number of hours spent in bars and whether or not students take notes or mark highlights when reading texts. The other independent variables are not significant at common levels of alpha.

11-53 a. All else equal, the average rating of a course is 6.21 units higher if a visiting lecturer is brought in than otherwise.
b. H0: β4 = 0, H1: β4 > 0
t = 6.21/3.59 = 1.73
t_20,.05 = 1.725
Therefore, reject H0 at the 5% level.
c. 56.9% of the variation in the average course rating can be explained by the variation in the percentage of time spent in group discussions, the dollars spent on preparing the course materials, the dollars spent on food and drinks, and whether a guest lecturer is brought in.
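As a quick check (not part of the original solution), an overall F statistic can be computed directly from R² as F = [(n-K-1)/K] × R²/(1 - R²); with R² = .569, K = 4 regressors and 20 error degrees of freedom:

```python
def overall_f(r2, k, df_error):
    # F statistic for H0: all slope coefficients are zero,
    # computed from the coefficient of determination
    return (df_error / k) * r2 / (1 - r2)

F = overall_f(0.569, 4, 20)
print(round(F, 1))   # 6.6
```

This is the same statistic that is compared against the F_4,20 critical value in the test of the model's overall significance.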
H0: β1 = β2 = β3 = β4 = 0, H1: At least one βi ≠ 0 (i = 1, 2, 3, 4)
F = [(n − k − 1)/k][R²/(1 − R²)] = (20/4)[.569/(1 − .569)] = 6.6
F4,20,.01 = 4.43
Therefore, reject H0 at the 1% level
d. t20,.025 = 2.086
95% CI: .52 ± 2.086(.21) = .0819 up to .9581

11-54 34.4% of the variation in scores on a test of understanding of college economics can be explained by the student's GPA, the teacher that taught the course, the gender of the student, the pre-test score, the number of credit hours completed, and the age of the student. The regression model has significant explanatory power:
H0: β1 = β2 = … = β7 = 0, H1: At least one βi ≠ 0 (i = 1, 2, …, 7)
F = [(n − k − 1)/k][R²/(1 − R²)] = (342/7)[.344/(1 − .344)] = 25.62

11-55 a. Begin the analysis with the correlation matrix – identify important independent variables as well as correlations between the independent variables

Correlations: Salary, Experience, yearsenior, Gender_1F

            Salary   Experien  yearseni
Experien     0.883
             0.000
yearseni     0.777     0.674
             0.000     0.000
Gender_1    -0.429    -0.378    -0.292
             0.000     0.000     0.000

Regression Analysis: Salary versus Experience, yearsenior, Gender_1F
The regression equation is
Salary = 22644 + 437 Experience + 415 yearsenior - 1443 Gender_1F

Predictor    Coef      SE Coef   T       P       VIF
Constant     22644.1   521.8     43.40   0.000
Experien     437.10    31.41     13.92   0.000   2.0
yearseni     414.71    55.31      7.50   0.000   1.8
Gender_1    -1443.2    519.8     -2.78   0.006   1.2

S = 2603   R-Sq = 84.9%   R-Sq(adj) = 84.6%

Analysis of Variance
Source           DF    SS           MS           F        P
Regression        3    5559163505   1853054502   273.54   0.000
Residual Error  146    989063178    6774405
Total           149    6548226683

84.9% of the variation in annual salary (in dollars) can be explained by the variation in the years of experience, the years of seniority, and the gender of the employee. All of the variables are significant at the .01 level of significance (p-values of .000, .000 and .006 respectively).
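The F statistics computed in 11-53(c) and 11-54 both follow from the identity F = [(n − k − 1)/k] · [R²/(1 − R²)]. A minimal sketch (the function name is ours, not from the text):

```python
def f_from_r2(r2: float, n_minus_k_minus_1: int, k: int) -> float:
    """F statistic for H0: all k slope coefficients are zero,
    computed from the coefficient of determination:
    F = [(n - k - 1) / k] * [R^2 / (1 - R^2)]."""
    return (n_minus_k_minus_1 / k) * (r2 / (1.0 - r2))

# Exercise 11-53(c): n - k - 1 = 20, k = 4, R^2 = .569
print(round(f_from_r2(0.569, 20, 4), 1))   # 6.6
# Exercise 11-54: n - k - 1 = 342, k = 7, R^2 = .344
print(round(f_from_r2(0.344, 342, 7), 2))  # 25.62
```

The same helper reproduces the F values in the other exercises of this chapter that report only n, k, and R².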
The F-test of the significance of the overall model shows that we reject H0 that all of the slope coefficients are jointly equal to zero in favor of H1 that at least one slope coefficient is not equal to zero. The F-test yielded a p-value of .000.
b. H0: β3 = 0, H1: β3 < 0
t = -1443.2/519.8 = -2.78
t146,.01 = -2.326
Therefore, reject H0 at the 1% level and conclude that the annual salaries for females are statistically significantly lower than they are for males.
c. Add an interaction term and test for the significance of the slope coefficient on the interaction term.

11-56 Analyze the correlation matrix first:

Correlations: Sales Pizza1, Price Pizza1, Promotion Pi, Sales B2, Price B2, ...

            Sales Pi  Price Pi  Promotio  Sales B2  Price B2  Sales B3  Price B3  Sales B4
Price Pi    -0.263
             0.001
Promotio     0.570    -0.203
             0.000     0.011
Sales B2     0.136     0.170     0.031
             0.090     0.034     0.700
Price B2     0.118     0.507     0.117    -0.370
             0.143     0.000     0.146     0.000
Sales B3     0.014     0.174     0.045     0.103     0.199
             0.862     0.029     0.581     0.199     0.013
Price B3     0.179     0.579     0.034     0.162     0.446    -0.316
             0.026     0.000     0.675     0.043     0.000     0.000
Sales B4     0.248     0.102     0.123     0.310     0.136     0.232     0.081
             0.002     0.205     0.127     0.000     0.091     0.004     0.313
Price B4     0.177     0.509     0.124     0.229     0.500     0.117     0.523    -0.158
             0.027     0.000     0.124     0.004     0.000     0.147     0.000     0.049

The strongest correlation with the sales of Pizza1 is with the type of promotion. The price of Pizza1 has the expected negative association with its sales. The coefficients on the prices of the competing brands are expected to be positive since those brands are substitutes for Pizza1; however, the sales of the competing brands are expected to be negatively related to the sales of Pizza1.

Regression Analysis: Sales Pizza1 versus Price Pizza1, Promotion Pi, ...
The regression equation is Sales Pizza1 = - 6406 - 24097 Price Pizza1 + 1675 Promotion Pizza1 + 0.0737 Sales B2 + 4204 Price B2 + 0.177 Sales B3 + 18003 Price B3 + 0.345 Sales B4 + 11813 Price B4 Predictor Coef SE Coef T P VIF Constant -6406 2753 -2.33 0.021 Price Pi -24097 3360 -7.17 0.000 2.5 Promotio 1674.6 283.9 5.90 0.000 1.2 Sales B2 0.07370 0.08281 0.89 0.375 3.1 Price B2 4204 4860 0.87 0.388 4.3 Sales B3 0.17726 0.09578 1.85 0.066 1.9 Price B3 18003 4253 4.23 0.000 3.0 Sales B4 0.3453 0.1392 2.48 0.014 1.9 Price B4 11813 6151 1.92 0.057 3.0 S = 3700 R-Sq = 54.9% Analysis of Variance Source DF Regression 8 Residual Error 147 Total 155 SS 2447394873 2012350019 4459744891 R-Sq(adj) = 52.4% MS 305924359 13689456 F 22.35 P 0.000 The multiple regression with all of the independent variables indicates that 54.9% of the variation in the sales of Pizza1 can be explained by all of the independent variables. However, not all of the independent variables are significantly different from zero. It appears that neither the price of Brand 2, nor the sales of Brand 2 has a statistically significant effect on Pizza1. Eliminating those variables that are insignificant yields: Regression Analysis: Sales Pizza1 versus Price Pizza1, Promotion Pi, ... 
The regression equation is
Sales Pizza1 = - 6546 - 23294 Price Pizza1 + 1701 Promotion Pizza1 + 0.197 Sales B3 + 18922 Price B3 + 0.418 Sales B4 + 15152 Price B4

Predictor    Coef       SE Coef    T       P       VIF
Constant     -6546      2676      -2.45    0.016
Price Pi     -23294     3210      -7.26    0.000   2.3
Promotio     1701.0     279.9      6.08    0.000   1.2
Sales B3     0.19737    0.09234    2.14    0.034   1.8
Price B3     18922      4092       4.62    0.000   2.8
Sales B4     0.4183     0.1137     3.68    0.000   1.3
Price B4     15152      4978       3.04    0.003   2.0

S = 3686   R-Sq = 54.6%   R-Sq(adj) = 52.8%

Analysis of Variance
Source           DF    SS           MS          F       P
Regression        6    2435527670   405921278   29.88   0.000
Residual Error  149    2024217221   13585350
Total           155    4459744891

All of the variables are now significant at the .05 level, and all have the expected sign except the sales of brands 3 and 4.

11-57 through 11-59 Reports can be written by following the extended Case Study on the data file Cotton – see Section 11.9

11-60 Regression Analysis: y_deathrate versus x1_totmiles, x2_avgspeed
The regression equation is
y_deathrate = - 2.97 - 0.00447 x1_totmiles + 0.219 x2_avgspeed

Predictor    Coef        SE Coef     T       P       VIF
Constant     -2.969      3.437      -0.86    0.416
x1_totmi     -0.004470   0.001549   -2.89    0.023   11.7
x2_avgsp     0.21879     0.08391     2.61    0.035   11.7

S = 0.1756   R-Sq = 55.1%   R-Sq(adj) = 42.3%

Analysis of Variance
Source           DF   SS        MS        F      P
Regression        2   0.26507   0.13254   4.30   0.061
Residual Error    7   0.21593   0.03085
Total             9   0.48100

55.1% of the variation in death rates can be explained by the variation in total miles traveled and in average travel speed. The overall model is significant at the 10% level but not the 5% level since the p-value of the F-test is .061. The average speed variable has the expected sign: all else equal, as average speed increases, the death rate also increases. The coefficient on total miles traveled is negative, which indicates that the more miles traveled, the lower the death rate. Both of the independent variables are significant at the 5% level (p-values of .023 and .035 respectively).
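With only two explanatory variables, the variance inflation factor of 11.7 reported for both predictors in the 11-60 output follows directly from their pairwise correlation via VIF = 1/(1 − r²). A minimal sketch — the correlation value of about .956 is inferred from the reported VIF, not given in the output:

```python
def vif_two_predictors(r: float) -> float:
    """VIF when a predictor is regressed on exactly one other predictor:
    VIF = 1 / (1 - r^2), where r is the pairwise sample correlation."""
    return 1.0 / (1.0 - r * r)

# A correlation near .956 between total miles and average speed
# reproduces a VIF close to the 11.7 shown in the Minitab output.
print(round(vif_two_predictors(0.956), 1))
```

Values of VIF above about 10 are a common rule-of-thumb signal of troublesome multicollinearity, which is why the solution that follows switches to an alternative specification.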
There appears to be some correlation between the independent variables. Due to the high correlation between the independent variables, an alternative model using a quadratic model is as follows: Regression Analysis: y_deathrate versus x1_totmiles, x1_totsquared The regression equation is y_deathrate = - 6.54 + 0.0268 x1_totmiles -0.000015 x1_totsquared Predictor Coef Constant -6.539 x1_totmi 0.026800 x1_totsq -0.00001480 S = 0.06499 R-Sq = 93.9% Analysis of Variance Source DF Regression 2 Residual Error 7 Total 9 Source x1_totmi x1_totsq SE Coef 1.296 0.002835 0.00000153 DF 1 1 SS 0.45143 0.02957 0.48100 Seq SS 0.05534 0.39609 T -5.04 9.45 -9.68 P 0.001 0.000 0.000 VIF 285.5 285.5 R-Sq(adj) = 92.1% MS 0.22572 0.00422 F 53.44 P 0.000 214 Solutions Manual for Statistics for Business & Economics, 5 th Edition 11-61 Regression Analysis: y_FemaleLFPR versus x1_income, x2_yrsedu, ... The regression equation is y_FemaleLFPR = 0.2 +0.000406 x1_income + 4.84 x2_yrsedu - 1.55 x3_femaleun Predictor Coef SE Coef T P VIF Constant 0.16 34.91 0.00 0.996 x1_incom 0.0004060 0.0001736 2.34 0.024 1.2 x2_yrsed 4.842 2.813 1.72 0.092 1.5 x3_femal -1.5543 0.3399 -4.57 0.000 1.3 S = 3.048 R-Sq = 54.3% Analysis of Variance Source DF Regression 3 Residual Error 46 Total 49 SS 508.35 427.22 935.57 R-Sq(adj) = 51.4% MS 169.45 9.29 F 18.24 P 0.000 11.62 Regression Analysis: y_money versus x1_pcincome, x2_ir The regression equation is y_money = - 1158 + 0.253 x1_pcincome - 19.6 x2_ir Predictor Coef SE Coef T P Constant -1158.4 587.9 -1.97 0.080 x1_pcinc 0.25273 0.03453 7.32 0.000 x2_ir -19.56 21.73 -0.90 0.391 S = 84.93 R-Sq = 89.8% Analysis of Variance Source DF Regression 2 Residual Error 9 Total 11 Source x1_pcinc x2_ir DF 1 1 SS 570857 64914 635771 VIF 1.3 1.3 R-Sq(adj) = 87.5% MS 285429 7213 F 39.57 P 0.000 Seq SS 565012 5845 11-63 Regression Analysis: y_manufgrowt versus x1_aggrowth, x2_exportgro, ... 
The regression equation is y_manufgrowth = 2.15 + 0.493 x1_aggrowth + 0.270 x2_exportgrowth - 0.117 x3_inflation Predictor Coef SE Coef T P VIF Constant 2.1505 0.9695 2.22 0.032 x1_aggro 0.4934 0.2020 2.44 0.019 1.0 x2_expor 0.26991 0.06494 4.16 0.000 1.0 x3_infla -0.11709 0.05204 -2.25 0.030 1.0 S = 3.624 R-Sq = 39.3% Analysis of Variance Source DF Regression 3 Residual Error 44 Total 47 Source x1_aggro x2_expor x3_infla DF 1 1 1 SS 373.98 577.97 951.95 Seq SS 80.47 227.02 66.50 R-Sq(adj) = 35.1% MS 124.66 13.14 F 9.49 P 0.000 Chapter 11: Multiple Regression 215 11-64 The method of least squares regression yields estimators that are BLUE – Best Linear Unbiased Estimators. This result holds when the assumptions regarding the behavior of the error term are true. BLUE estimators are the most efficient (best) estimators out of the class of all unbiased estimators. The advent of computing power incorporating the method of least squares has dramatically increased its use. 11-65 The analysis of variance table identifies how the total variability of the dependent variable (SST) is split up between the portion of variability that is explained by the regression model (SSR) and the part that is unexplained (SSE). The Coefficient of Determination (R2) is derived as the ratio of SSR to SST. The analysis of variance table also computes the F statistic for the test of the significance of the overall regression – whether all of the slope coefficients are jointly equal to zero. The associated p-value is also generally reported in this table. 11-66 A. False. If the regression model does not explain a large enough portion of the variability of the dependent variable, then the error sum of squares can be larger than the regression sum of squares b. False – the sum of several simple linear regressions will not equal a multiple regression since the assumption of ‘all else equal’ will be violated in the simple linear regressions. 
The multiple regression ‘holds’ all else equal in calculating the partial effect that a change in one of the independent variables has on the dependent variable.
c. True
d. False – While the regular coefficient of determination (R²) cannot be negative, the adjusted coefficient of determination (R̄²) can become negative. If the independent variables added to a regression equation have very little explanatory power, the loss of degrees of freedom may more than offset the added explanatory power.
e. True

11-67 If one model contains more explanatory variables, then SST remains the same for both models but SSR will be higher for the model with more explanatory variables. Since SST = SSR1 + SSE1, which is equivalent to SSR2 + SSE2, and given that SSR2 > SSR1, then SSE1 > SSE2. Hence, the coefficient of determination will be higher with a greater number of explanatory variables, and the coefficient of determination must be interpreted in conjunction with whether or not the regression slope coefficients on the explanatory variables are significantly different from zero.

11-68 This is a classic example of what happens when there is a high degree of multicollinearity between the independent variables. The overall model can be shown to have significant explanatory power and yet none of the slope coefficients on the independent variables is significantly different from zero.

11-69 Σei = Σ(yi − a − b1x1i − b2x2i)
         = Σ(yi − (ȳ − b1x̄1 − b2x̄2) − b1x1i − b2x2i)
         = nȳ − nȳ + nb1x̄1 + nb2x̄2 − nb1x̄1 − nb2x̄2
         = 0

11-70 a. All else equal, a unit change in population, industry size, measure of economic quality, measure of health and social life results in a respective 4.983, 2.198, 3.816, 3.215, and .085 increase in the new business starts in the industry.
b. R̄² = .766
c. t62,.05 = 1.67; therefore, the 90% CI: 3.816 ± 1.67(2.063) = .3708 up to 7.2612
d.
H0: β5 = 0, H1: β5 < 0
t = -.886/3.055 = -.29
t62,.10 = -1.295
Therefore, do not reject H0 at the 20% level
e. H0: β6 = 0, H1: β6 ≠ 0
t = 3.215/1.568 = 2.05
t62,.025 = 1.999
Therefore, reject H0 at the 5% level
f. H0: β1 = β2 = … = β7 = 0, H1: At least one βi ≠ 0 (i = 1, 2, …, 7)
F = [(n − k − 1)/k][R²/(1 − R²)] = (62/7)[.766/(1 − .766)] = 28.99
F7,62,.01 = 2.79
Therefore, reject H0 at the 1% level

11-71 a. All else equal, an increase of one question results in a decrease of 1.834 in the expected percentage of responses received. All else equal, an increase of one word in the length of the questionnaire results in a decrease of .016 in the expected percentage of responses received.
b. 63.7% of the variability in the percentage of responses received can be explained by the variability in the number of questions asked and the number of words
c. H0: β1 = β2 = 0, H1: At least one βi ≠ 0 (i = 1, 2)
F = [(n − k − 1)/k][R²/(1 − R²)] = (27/2)[.637/(1 − .637)] = 23.69
F2,27,.01 = 5.49
Therefore, reject H0 at the 1% level
d. t27,.005 = 2.771; 99% CI: -1.8345 ± 2.771(.6349) = -3.5938 up to -.0752
e. t = -1.78; t27,.05 = -1.703, t27,.025 = -2.052. Therefore, reject H0 at the 5% level but not at the 2.5% level

11-72 a. All else equal, a 1% increase in course time spent in group discussion results in an expected increase of .3817 in the average rating of the course. All else equal, a unit increase in money spent on the course results in an expected increase of .5172 in the average rating by participants of the course. All else equal, a unit increase in expenditure on non-course activities results in an expected increase of .0753 in the average rating of the course.
b. 57.9% of the variation in the average rating can be explained by the linear relationship with the % of class time spent on discussion, money spent on class activities, and money spent on non-class activities.
c. H0: β1 = β2 = β3 = 0, H1: At least one βi ≠ 0 (i = 1, 2, 3)
F = [(n − k − 1)/k][R²/(1 − R²)] = (21/3)[.579/(1 − .579)] = 9.627
F3,21,.01 = 4.87
Therefore, reject H0 at the 1% level
d.
t21,.05 = 1.721; 90% CI: .3817 ± 1.721(.2018) = .0344 up to .7290
e. t = 2.64; t21,.01 = 2.518, t21,.005 = 2.831. Therefore, reject H0 at the 1% level but not at the .5% level
f. t = 1.09; t21,.10 = 1.323. Therefore, do not reject H0 at the 20% level

11-73 Regression Analysis: y_rating versus x1_expgrade, x2_Numstudents
The regression equation is
y_rating = - 0.200 + 1.41 x1_expgrade - 0.0158 x2_Numstudents

Predictor    Coef        SE Coef    T       P       VIF
Constant     -0.2001     0.6968    -0.29    0.777
x1_expgr     1.4117      0.1780     7.93    0.000   1.5
x2_Numst     -0.015791   0.003783  -4.17    0.001   1.5

S = 0.1866   R-Sq = 91.5%   R-Sq(adj) = 90.5%

Analysis of Variance
Source           DF   SS       MS       F       P
Regression        2   6.3375   3.1687   90.99   0.000
Residual Error   17   0.5920   0.0348
Total            19   6.9295

Source     DF   Seq SS
x1_expgr    1   5.7308
x2_Numst    1   0.6067

11-74 Use the result from Exercise 32:
H0: β4 = β5 = β6 = β7 = 0, H1: At least one βi ≠ 0 (i = 4, 5, 6, 7)
F = [(n − K − 1)/q][(R²full − R²reduced)/(1 − R²full)] = (55/4)[(.467 − .242)/(1 − .467)] = 5.804
F4,55,.01 = 3.68
Therefore, reject H0 at the 1% level

11-75 a. All else equal, each extra point in the student's expected score leads to an expected increase of .469 in the actual score
b. t103,.025 = 1.98; therefore, the 95% CI: 3.369 ± 1.98(.456) = 2.4661 up to 4.2719
c. H0: β3 = 0, H1: β3 ≠ 0
t = 3.054/1.457 = 2.096
t103,.025 = 1.98
Therefore, reject H0 at the 5% level
d. 68.6% of the variation in the exam scores is explained by their linear dependence on the student's expected score, hours per week spent working on the course, and the student's grade point average
e. H0: β1 = β2 = β3 = 0, H1: At least one βi ≠ 0 (i = 1, 2, 3)
F = [(n − k − 1)/k][R²/(1 − R²)] = (103/3)[.686/(1 − .686)] = 75.008
F3,103,.01 = 3.97
Reject H0 at any common level of alpha
f. R = √.686 = .82825
g. Ŷ = 2.178 + .469(80) + 3.369(8) + 3.054(3) = 75.812

11-76 a. t22,.005 = 2.819; therefore, the 99% CI: .0974 ± 2.819(.0215) = .0368 up to .1580
b.
H0: β2 = 0, H1: β2 ≠ 0
t = .374/.209 = 1.789
t22,.05 = 1.717, t22,.025 = 2.074; therefore, reject H0 at the 5% level but not the 2.5% level
c. R² = [(n − K − 1)R̄² + K]/(n − 1) = [22(.91) + 2]/24 = .9175
d. H0: β1 = β2 = 0, H1: At least one βi ≠ 0 (i = 1, 2)
F = [(n − k − 1)/k][R²/(1 − R²)] = (22/2)[.9175/(1 − .9175)] = 122.33
F2,22,.01 = 5.72
Reject H0 at any common level of alpha
e. R = √.9175 = .9579

11-77 a. t2669,.05 = 1.645; therefore, the 90% CI: 480.04 ± 1.645(224.9) = 110.0795 up to 850.0005
b. t2669,.005 = 2.576; therefore, the 99% CI: 1350.3 ± 2.576(212.3) = 803.4152 up to 1897.1848
c. H0: β8 = 0, H1: β8 ≠ 0
t = 891.67/180.87 = 4.9299
t2669,.005 = 2.576; therefore, reject H0 at the .5% level
d. H0: β9 = 0, H1: β9 ≠ 0
t = 722.95/110.98 = 6.5142
t2669,.005 = 2.576; therefore, reject H0 at the .5% level
e. 52.39% of the variability in minutes played can be explained by the variability in field goal percentage, free throw percentage and the number of rebounds per minute
f. R = √.5239 = .7238

11-78 a. H0: β1 = 0, H1: β1 ≠ 0
t = .052/.019 = 2.737
t60,.005 = 2.66; therefore, reject H0 at the 1% level
b. H0: β2 = 0, H1: β2 ≠ 0
t = -.005/.042 = -.119
t60,.10 = 1.296; therefore, do not reject H0 at the 20% level
c. 17% of the variation in the growth rate of GDP can be explained by the variation in real income per capita and the average tax rate, as a proportion of GNP.
d. R = √.17 = .4123

11-79 A report can be written by following the Case Study and testing the significance of the model. See Section 11.9

11-80 a.
Start with the correlation matrix: Correlations: EconGPA, SATverb, SATmath, HSPct EconGPA 0.478 0.000 SATverb SATmath 0.427 0.000 0.353 0.003 HSPct 0.362 0.000 0.201 0.121 SATverb SATmath 0.497 0.000 220 Solutions Manual for Statistics for Business & Economics, 5 th Edition Regression Analysis: EconGPA versus SATverb, SATmath, HSPct The regression equation is EconGPA = 0.612 + 0.0239 SATverb + 0.0117 SATmath + 0.00530 61 cases used 51 cases contain missing values Predictor Coef SE Coef T P Constant 0.6117 0.4713 1.30 0.200 SATverb 0.023929 0.007386 3.24 0.002 SATmath 0.011722 0.007887 1.49 0.143 HSPct 0.005303 0.004213 1.26 0.213 S = 0.4238 R-Sq = 32.9% Analysis of Variance Source DF Regression 3 Residual Error 57 Total 60 Source SATverb SATmath HSPct DF 1 1 1 SS 5.0171 10.2385 15.2556 HSPct VIF 1.2 1.5 1.3 R-Sq(adj) = 29.4% MS 1.6724 0.1796 F 9.31 P 0.000 Seq SS 3.7516 0.9809 0.2846 The regression model indicates positive coefficients, as expected, for all three independent variables. The greater the high school rank, and the higher the SAT verbal and SAT math scores, the larger the Econ GPA. The high school rank variable has the smallest t-statistic and is removed from the model: Regression Analysis: EconGPA versus SATverb, SATmath The regression equation is EconGPA = 0.755 + 0.0230 SATverb + 0.0174 SATmath 67 cases used 45 cases contain missing values Predictor Coef SE Coef T P Constant 0.7547 0.4375 1.72 0.089 SATverb 0.022951 0.006832 3.36 0.001 SATmath 0.017387 0.006558 2.65 0.010 S = 0.4196 R-Sq = 30.5% Analysis of Variance Source DF Regression 2 Residual Error 64 Total 66 Source SATverb SATmath DF 1 1 SS 4.9488 11.2693 16.2181 VIF 1.1 1.1 R-Sq(adj) = 28.3% MS 2.4744 0.1761 F 14.05 P 0.000 Seq SS 3.7109 1.2379 Both SAT variables are now statistically significant at the .05 level and appear to pick up separate influences on the dependent variable. The simple correlation coefficient between SAT math and SAT verbal is relatively low at .353. 
Thus, multicollinearity will not be dominant in this regression model. The final regression model, with conditional t-statistics in parentheses under the coefficients, is: Yˆ .755 .023( SATverbal ) .0174( SATmath) (3.36) (2.65) S = .4196 R2 = .305 n = 67 Chapter 11: Multiple Regression 221 b. Start with the correlation matrix: Correlations: EconGPA, Acteng, ACTmath, ACTss, ACTcomp, HSPct EconGPA 0.387 0.001 Acteng ACTmath 0.338 0.003 0.368 0.001 ACTss 0.442 0.000 0.448 0.000 0.439 0.000 ACTcomp 0.474 0.000 0.650 0.000 0.765 0.000 0.812 0.000 HSPct 0.362 0.000 0.173 0.150 0.290 0.014 0.224 0.060 Acteng ACTmath ACTss ACTcomp 0.230 0.053 Regression Analysis: EconGPA versus Acteng, ACTmath, ... The regression equation is EconGPA = - 0.207 + 0.0266 Acteng - 0.0023 ACTmath + 0.0212 ACTss + 0.0384 ACTcomp + 0.0128 HSPct 71 cases used 41 cases contain missing values Predictor Coef SE Coef T P VIF Constant -0.2069 0.6564 -0.32 0.754 Acteng 0.02663 0.02838 0.94 0.352 2.2 ACTmath -0.00229 0.03031 -0.08 0.940 4.2 ACTss 0.02118 0.02806 0.75 0.453 4.6 ACTcomp 0.03843 0.07287 0.53 0.600 12.7 HSPct 0.012817 0.005271 2.43 0.018 1.2 S = 0.5034 R-Sq = 31.4% Analysis of Variance Source DF Regression 5 Residual Error 65 Total 70 Source Acteng ACTmath ACTss ACTcomp HSPct DF 1 1 1 1 1 SS 7.5253 16.4691 23.9945 R-Sq(adj) = 26.1% MS 1.5051 0.2534 F 5.94 P 0.000 Seq SS 3.5362 1.0529 1.4379 0.0001 1.4983 The regression shows that only high school rank is significant at the .05 level. We may suspect multicollinearity between the variables, particularly since there is a ‘total’ ACT score (ACT composite) as well as the components that make up the ACT composite. Since conditional significance is dependent on which other independent variables are included in the regression equation, drop one variable at a time. 
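The drop-one-variable-at-a-time procedure used in part (b) removes, at each pass, the predictor with the smallest |t| — equivalently the largest p-value — and then refits. A minimal sketch of a single pass (the helper name is ours, and the model must be re-estimated after each removal because the remaining p-values change):

```python
def next_to_drop(p_values: dict, alpha: float = 0.05):
    """Return the predictor to remove next (the one with the largest
    p-value, provided it exceeds alpha), or None if every remaining
    predictor is significant at level alpha."""
    name, p = max(p_values.items(), key=lambda kv: kv[1])
    return name if p > alpha else None

# Conditional p-values from the full ACT regression in part (b)
p_full = {"Acteng": 0.352, "ACTmath": 0.940, "ACTss": 0.453,
          "ACTcomp": 0.600, "HSPct": 0.018}
print(next_to_drop(p_full))  # ACTmath, matching the first variable removed
```

Applied repeatedly to the refit models, this reproduces the removal order in the text: ACTmath, then ACTcomp, then Acteng.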
ACTmath has the lowest t-statistic and is removed: 222 Solutions Manual for Statistics for Business & Economics, 5 th Edition Regression Analysis: EconGPA versus Acteng, ACTss, ACTcomp, HSPct The regression equation is EconGPA = - 0.195 + 0.0276 Acteng + 0.0224 ACTss + 0.0339 ACTcomp + 0.0127 HSPct 71 cases used 41 cases contain missing values Predictor Coef SE Coef T P VIF Constant -0.1946 0.6313 -0.31 0.759 Acteng 0.02756 0.02534 1.09 0.281 1.8 ACTss 0.02242 0.02255 0.99 0.324 3.0 ACTcomp 0.03391 0.04133 0.82 0.415 4.2 HSPct 0.012702 0.005009 2.54 0.014 1.1 S = 0.4996 R-Sq = 31.4% Analysis of Variance Source DF Regression 4 Residual Error 66 Total 70 Source Acteng ACTss ACTcomp HSPct DF 1 1 1 1 SS 7.5239 16.4706 23.9945 R-Sq(adj) = 27.2% MS 1.8810 0.2496 F 7.54 P 0.000 Seq SS 3.5362 2.1618 0.2211 1.6048 Again, high school rank is the only conditionally significant variable. ACTcomp has the lowest t-statistic and is removed: Regression Analysis: EconGPA versus Acteng, ACTss, HSPct The regression equation is EconGPA = 0.049 + 0.0390 Acteng + 0.0364 ACTss + 0.0129 HSPct 71 cases used 41 cases contain missing values Predictor Coef SE Coef T P VIF Constant 0.0487 0.5560 0.09 0.930 Acteng 0.03897 0.02114 1.84 0.070 1.3 ACTss 0.03643 0.01470 2.48 0.016 1.3 HSPct 0.012896 0.004991 2.58 0.012 1.1 S = 0.4983 R-Sq = 30.7% Analysis of Variance Source DF Regression 3 Residual Error 67 Total 70 Source Acteng ACTss HSPct DF 1 1 1 SS 7.3558 16.6386 23.9945 R-Sq(adj) = 27.6% MS 2.4519 0.2483 F 9.87 P 0.000 Seq SS 3.5362 2.1618 1.6579 Now ACTss and high school rank are conditionally significant. 
ACTenglish has a t-statistic less than 2 and is removed: Chapter 11: Multiple Regression 223 Regression Analysis: EconGPA versus ACTss, HSPct The regression equation is EconGPA = 0.566 + 0.0479 ACTss + 0.0137 HSPct 71 cases used 41 cases contain missing values Predictor Coef SE Coef T Constant 0.5665 0.4882 1.16 ACTss 0.04790 0.01355 3.53 HSPct 0.013665 0.005061 2.70 S = 0.5070 R-Sq = 27.1% Analysis of Variance Source DF Regression 2 Residual Error 68 Total 70 Source ACTss HSPct DF 1 1 P 0.250 0.001 0.009 VIF 1.1 1.1 R-Sq(adj) = 25.0% SS 6.5123 17.4821 23.9945 MS 3.2562 0.2571 F 12.67 P 0.000 Seq SS 4.6377 1.8746 Both of the independent variables are statistically significant at the .05 level and hence, the final regression model, with conditional t-statistics in parentheses under the coefficients, is: Yˆ .567 .0479( ACTss) .0137( HSPct ) (3.53) (2.70) S = .5070 R2 = .271 n = 71 c. The regression model with the SAT variables is the better predictor because the standard error of the estimate is smaller than for the ACT model (.4196 vs. .5070). The R2 measure cannot be directly compared due to the sample size differences. 11-81 Correlations: Salary, age, Experien, yrs_asoc, yrs_full, Sex_1Fem, Market, C8 age Salary 0.749 0.000 age Experien yrs_asoc yrs_full Sex_1Fem Experien 0.883 0.000 0.877 0.000 yrs_asoc 0.698 0.000 0.712 0.000 0.803 0.000 yrs_full 0.777 0.000 0.583 0.000 0.674 0.000 0.312 0.000 Sex_1Fem -0.429 0.000 -0.234 0.004 -0.378 0.000 -0.367 0.000 -0.292 0.000 Market 0.026 0.750 -0.134 0.103 -0.150 0.067 -0.113 0.169 -0.017 0.833 0.062 0.453 -0.029 0.721 -0.189 0.020 -0.117 0.155 -0.073 0.373 -0.043 0.598 -0.094 0.254 C8 Market -0.107 0.192 The correlation matrix indicates that several of the independent variables are likely to be significant, however, multicollinearity is also a likely result. 
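The p-values printed under each entry of these correlation matrices come from the t statistic for H0: ρ = 0, t = r√(n − 2)/√(1 − r²). A minimal sketch, assuming n = 150 faculty members (the salary regressions report 149 total degrees of freedom; the sample size is our inference, not stated with the matrix):

```python
import math

def corr_t_stat(r: float, n: int) -> float:
    """t statistic for testing H0: rho = 0 against a sample
    correlation r, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

# Salary vs. experience correlation of .883 with n = 150 (assumed):
# the t statistic is so large that the p-value prints as 0.000.
print(round(corr_t_stat(0.883, 150), 1))
```

This is why nearly every entry in the matrix shows a p-value of 0.000: with n this large, even modest correlations such as -0.429 are highly significant.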
The regression model with all independent variables is: 224 Solutions Manual for Statistics for Business & Economics, 5 th Edition Regression Analysis: Salary versus age, Experien, ... The regression equation is Salary = 23725 - 40.3 age + 357 Experien + 263 yrs_asoc + 493 yrs_full - 954 Sex_1Fem + 3427 Market + 1188 C8 Predictor Coef SE Coef T P VIF Constant 23725 1524 15.57 0.000 age -40.29 44.98 -0.90 0.372 4.7 Experien 356.83 63.48 5.62 0.000 10.0 yrs_asoc 262.50 75.11 3.49 0.001 4.0 yrs_full 492.91 59.27 8.32 0.000 2.6 Sex_1Fem -954.1 487.3 -1.96 0.052 1.3 Market 3427.2 754.1 4.54 0.000 1.1 C8 1188.4 597.5 1.99 0.049 1.1 S = 2332 R-Sq = 88.2% R-Sq(adj) = 87.6% Analysis of Variance Source DF SS MS F Regression 7 5776063882 825151983 151.74 Residual Error 142 772162801 5437766 Total 149 6548226683 Source age Experien yrs_asoc yrs_full Sex_1Fem Market DF 1 1 1 1 1 1 P 0.000 Seq SS 3669210599 1459475287 1979334 500316356 22707368 100860164 Since age is insignificant and has the smallest t-statistics, it is removed from the model: Regression Analysis: Salary versus Experien, yrs_asoc, ... The regression equation is Salary = 22455 + 324 Experien + 258 yrs_asoc + 491 yrs_full + 3449 Market + 1274 C8 Predictor Coef SE Coef T P Constant 22455.2 557.7 40.26 0.000 Experien 324.24 51.99 6.24 0.000 yrs_asoc 257.88 74.88 3.44 0.001 yrs_full 490.97 59.19 8.29 0.000 Sex_1Fem -1043.4 476.7 -2.19 0.030 Market 3449.4 753.2 4.58 0.000 C8 1274.5 589.3 2.16 0.032 S = 2330 R-Sq = 88.1% R-Sq(adj) = 87.6% Analysis of Variance Source DF SS MS F Regression 6 5771700580 961950097 177.15 Residual Error 143 776526103 5430252 Total 149 6548226683 - 1043 Sex_1Fem VIF 6.7 4.0 2.6 1.2 1.1 1.1 P 0.000 The conditional F test for age is: SSRF SSRR 5, 766, 064, 000 5, 771, 700, 736 FX 2 .80 S 2Y | X (2332) 2 Which is well below any common critical value of F. Thus, age is removed from the model. 
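The conditional F test above compares the error sums of squares of the two fitted models. A minimal sketch using the ANOVA values from the two salary regressions (SSE and MSE taken directly from the output, with q = 1 variable dropped):

```python
def partial_f(sse_reduced: float, sse_full: float, q: int,
              mse_full: float) -> float:
    """Conditional (partial) F test for dropping q predictors:
    F = [(SSE_reduced - SSE_full) / q] / MSE_full,
    with q and (n - K - 1) degrees of freedom."""
    return (sse_reduced - sse_full) / q / mse_full

# Full model (with age): SSE = 772,162,801, MSE = 5,437,766
# Reduced model (age removed): SSE = 776,526,103
F = partial_f(sse_reduced=776_526_103, sse_full=772_162_801,
              q=1, mse_full=5_437_766)
print(round(F, 2))  # about 0.80, matching the test statistic in the text
```

Since 0.80 is well below any common critical value of F with 1 and 142 degrees of freedom, age is dropped.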
The remaining independent variables are all significant at the .05 level of significance and hence, becomes the final regression model. Residual analysis to determine if the assumption of linearity holds true follows: Chapter 11: Multiple Regression 10000 RESI1 5000 0 -5000 0 10 20 30 40 Experien 10000 RESI1 5000 0 -5000 0 10 yrs_asoc 20 225 Solutions Manual for Statistics for Business & Economics, 5 th Edition 10000 RESI1 5000 0 -5000 0 10 20 yrs_full 10000 5000 RESI1 226 0 -5000 0.0 0.5 Sex_1Fem 1.0 Chapter 11: Multiple Regression 227 10000 RESI1 5000 0 -5000 0.0 0.5 1.0 Market 10000 RESI1 5000 0 -5000 0.0 0.5 1.0 C8 The residual plot for Experience shows a relatively strong quadratic relationship between Experience and Salary. Therefore, a new variable, taking into account the quadratic relationship is generated and added to the model. None of the other residual plots shows strong evidence of non-linearity. 228 Solutions Manual for Statistics for Business & Economics, 5 th Edition Regression Analysis: Salary versus Experien, ExperSquared, ... 
The regression equation is Salary = 18915 + 875 Experien - 15.9 ExperSquared + 222 yrs_asoc + 612 yrs_full - 650 Sex_1Fem + 3978 Market + 1042 C8 Predictor Coef SE Coef T P VIF Constant 18915.2 583.2 32.43 0.000 Experien 875.35 72.20 12.12 0.000 20.6 ExperSqu -15.947 1.717 -9.29 0.000 16.2 yrs_asoc 221.58 59.40 3.73 0.000 4.0 yrs_full 612.10 48.63 12.59 0.000 2.8 Sex_1Fem -650.1 379.6 -1.71 0.089 1.2 Market 3978.3 598.8 6.64 0.000 1.1 C8 1042.3 467.1 2.23 0.027 1.1 S = 1844 R-Sq = 92.6% R-Sq(adj) = 92.3% Analysis of Variance Source DF SS MS F Regression 7 6065189270 866455610 254.71 Residual Error 142 483037413 3401672 Total 149 6548226683 Source Experien ExperSqu yrs_asoc yrs_full Sex_1Fem Market C8 DF 1 1 1 1 1 1 1 P 0.000 Seq SS 5109486518 91663414 15948822 678958872 12652358 139540652 16938635 The squared term for experience is statistically significant; however, the Sex_1Fem is no longer significant at the .05 level and hence is removed from the model: Regression Analysis: Salary versus Experien, ExperSquared, ... The regression equation is Salary = 18538 + 888 Experien - 16.3 ExperSquared + 237 yrs_asoc + 624 yrs_full + 3982 Market + 1145 C8 Predictor Coef SE Coef T P VIF Constant 18537.8 543.6 34.10 0.000 Experien 887.85 72.32 12.28 0.000 20.4 ExperSqu -16.275 1.718 -9.48 0.000 16.0 yrs_asoc 236.89 59.11 4.01 0.000 3.9 yrs_full 624.49 48.41 12.90 0.000 2.8 Market 3981.8 602.9 6.60 0.000 1.1 C8 1145.4 466.3 2.46 0.015 1.0 S = 1857 R-Sq = 92.5% R-Sq(adj) = 92.2% Analysis of Variance Source DF SS MS F Regression 6 6055213011 1009202168 292.72 Residual Error 143 493013673 3447648 Total 149 6548226683 P 0.000 This is the final model with all of the independent variables being conditionally significant, including the quadratic transformation of Experience. This would indicate that a non-linear relationship exists between experience and salary. 
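With the quadratic term included, the marginal effect of experience on salary is b1 + 2·b2·Exper, so predicted salary stops rising at Exper = -b1/(2·b2). A minimal sketch using the final model's estimates (the interpretation as a "peak" is ours):

```python
def quadratic_peak(b1: float, b2: float) -> float:
    """Experience level at which the fitted quadratic effect turns over:
    d(Salary)/d(Exper) = b1 + 2*b2*Exper = 0  =>  Exper = -b1 / (2*b2)."""
    return -b1 / (2.0 * b2)

# Final model: Salary = 18538 + 887.85 Exper - 16.275 Exper^2 + ...
print(round(quadratic_peak(887.85, -16.275), 1))  # about 27.3 years
```

So, holding the other variables fixed, each additional year of experience adds less and less to predicted salary, and the fitted curve flattens out at roughly 27 years.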
Chapter 11: Multiple Regression 229 11-82 Correlations: hseval, Comper, Homper, Indper, sizehse, incom72 hseval -0.335 0.001 Comper Homper 0.145 0.171 -0.499 0.000 Indper -0.086 0.419 -0.140 0.188 -0.564 0.000 sizehse 0.542 0.000 -0.278 0.008 0.274 0.009 -0.245 0.020 incom72 0.426 0.000 -0.198 0.062 -0.083 0.438 0.244 0.020 Comper Homper Indper sizehse 0.393 0.000 The correlation matrix indicates that the size of the house, income and percent homeowners have a positive relationship with house value. There is a negative relationship between the percent industrial and percent commercial and house value. Regression Analysis: hseval versus Comper, Homper, ... The regression equation is hseval = - 19.0 - 26.4 Comper - 12.1 Homper - 15.5 Indper + 7.22 sizehse + 0.00408 incom72 Predictor Coef SE Coef T P VIF Constant -19.02 13.20 -1.44 0.153 Comper -26.393 9.890 -2.67 0.009 2.2 Homper -12.123 7.508 -1.61 0.110 3.0 Indper -15.531 8.630 -1.80 0.075 2.6 sizehse 7.219 2.138 3.38 0.001 1.5 incom72 0.004081 0.001555 2.62 0.010 1.4 S = 3.949 R-Sq = 40.1% R-Sq(adj) = 36.5% Analysis of Variance Source DF SS MS F Regression 5 876.80 175.36 11.25 Residual Error 84 1309.83 15.59 Total 89 2186.63 P 0.000 All variables are conditionally significant with the exception of Indper and Homper. 
Since Homper has the smaller t-statistic, it is removed: Regression Analysis: hseval versus Comper, Indper, sizehse, incom72 The regression equation is hseval = - 30.9 - 15.2 Comper - 5.73 Indper + 7.44 sizehse + 0.00418 incom72 Predictor Coef SE Coef T P VIF Constant -30.88 11.07 -2.79 0.007 Comper -15.211 7.126 -2.13 0.036 1.1 Indper -5.735 6.194 -0.93 0.357 1.3 sizehse 7.439 2.154 3.45 0.001 1.5 incom72 0.004175 0.001569 2.66 0.009 1.4 S = 3.986 R-Sq = 38.2% R-Sq(adj) = 35.3% Analysis of Variance Source DF SS MS F Regression 4 836.15 209.04 13.16 Residual Error 85 1350.48 15.89 Total 89 2186.63 Indper is not significant and is removed: P 0.000 230 Solutions Manual for Statistics for Business & Economics, 5 th Edition Regression Analysis: hseval versus Comper, sizehse, incom72 The regression equation is hseval = - 34.2 - 13.9 Comper + 8.27 sizehse + 0.00364 incom72 Predictor Coef SE Coef T P VIF Constant -34.24 10.44 -3.28 0.002 Comper -13.881 6.974 -1.99 0.050 1.1 sizehse 8.270 1.957 4.23 0.000 1.2 incom72 0.003636 0.001456 2.50 0.014 1.2 S = 3.983 R-Sq = 37.6% Analysis of Variance Source DF Regression 3 Residual Error 86 Total 89 R-Sq(adj) = 35.4% SS 822.53 1364.10 2186.63 MS 274.18 15.86 F 17.29 P 0.000 This becomes the final regression model. The selection of a community with the objective of having larger house values would include communities where the percent of commercial property is low, the median rooms per residence is high and the per capita income is high. 11-83a. Correlation matrix: Correlations: deaths, vehwt, impcars, lghttrks, carage vehwt impcars lghttrks carage deaths 0.244 0.091 vehwt impcars lghttrks -0.284 0.048 -0.943 0.000 0.726 0.000 0.157 0.282 -0.175 0.228 -0.422 0.003 0.123 0.400 0.011 0.943 -0.329 0.021 Crash deaths are positively related to vehicle weight and percentage of light trucks and negatively related to percent imported cars and car age. 
Light trucks has the strongest linear association with crash deaths of any independent variable, followed by car age. Multicollinearity is likely to exist due to the strong correlation between impcars and vehicle weight. b. Regression Analysis: deaths versus vehwt, impcars, lghttrks, carage

The regression equation is
deaths = 2.60 + 0.000064 vehwt - 0.00121 impcars + 0.00833 lghttrks - 0.0395 carage

Predictor  Coef        SE Coef     T      P      VIF
Constant   2.597       1.247       2.08   0.043
vehwt      0.0000643   0.0001908   0.34   0.738  10.9
impcars    -0.001213   0.005249    -0.23  0.818  10.6
lghttrks   0.008332    0.001397    5.96   0.000  1.2
carage     -0.03946    0.01916     -2.06  0.045  1.4

S = 0.05334   R-Sq = 59.5%   R-Sq(adj) = 55.8%

Analysis of Variance
Source          DF   SS         MS         F       P
Regression      4    0.183634   0.045909   16.14   0.000
Residual Error  44   0.125174   0.002845
Total           48   0.308809

Light trucks is a significant positive predictor. Since impcars has the smallest t-statistic, it is removed from the model:

Regression Analysis: deaths versus vehwt, lghttrks, carage

The regression equation is
deaths = 2.55 + 0.000106 vehwt + 0.00831 lghttrks - 0.0411 carage

Predictor  Coef         SE Coef      T      P      VIF
Constant   2.555        1.220        2.09   0.042
vehwt      0.00010622   0.00005901   1.80   0.079  1.1
lghttrks   0.008312     0.001380     6.02   0.000  1.2
carage     -0.04114     0.01754      -2.34  0.024  1.2

S = 0.05277   R-Sq = 59.4%   R-Sq(adj) = 56.7%

Analysis of Variance
Source          DF   SS         MS         F       P
Regression      3    0.183482   0.061161   21.96   0.000
Residual Error  45   0.125326   0.002785
Total           48   0.308809

Vehicle weight (p = .079) is also not significant at the .05 level, so it is removed by the same argument:

Regression Analysis: deaths versus lghttrks, carage

The regression equation is
deaths = 2.51 + 0.00883 lghttrks - 0.0352 carage

Predictor  Coef       SE Coef    T      P      VIF
Constant   2.506      1.249      2.01   0.051
lghttrks   0.008835   0.001382   6.39   0.000  1.1
carage     -0.03522   0.01765    -2.00  0.052  1.1

S = 0.05404   R-Sq = 56.5%   R-Sq(adj) = 54.6%

Analysis of Variance
Source          DF   SS         MS         F       P
Regression      2    0.174458   0.087229   29.87   0.000
Residual Error  46   0.134351   0.002921
Total           48   0.308809

The model
has light trucks and car age as the significant variables. Note that car age is marginally significant (p-value of .052) and hence could also be dropped from the model. c. The regression modeling indicates that the percentage of light trucks is conditionally significant in all of the models and hence is an important predictor in the model. Car age and imported cars are marginally significant predictors when only light trucks is included in the model. 11.84 a. Correlation matrix: Correlations: deaths, Purbanpop, Ruspeed, Prsurf deaths Purbanpo Purbanpo -0.594 0.000 Ruspeed Prsurf 0.305 0.033 -0.224 0.121 -0.556 0.000 0.207 0.153 Ruspeed -0.232 0.109 232 Solutions Manual for Statistics for Business & Economics, 5 th Edition Descriptive Statistics: deaths, Purbanpop, Prsurf, Ruspeed Variable deaths Purbanpo Prsurf Ruspeed N 49 49 49 49 Mean 0.1746 0.5890 0.7980 58.186 Median 0.1780 0.6311 0.8630 58.400 TrMean 0.1675 0.5992 0.8117 58.222 Variable deaths Purbanpo Prsurf Ruspeed Minimum 0.0569 0.0000 0.2721 53.500 Maximum 0.5505 0.9689 1.0000 62.200 Q1 0.1240 0.4085 0.6563 57.050 Q3 0.2050 0.8113 0.9485 59.150 StDev 0.0802 0.2591 0.1928 1.683 SE Mean 0.0115 0.0370 0.0275 0.240 The proportion of urban population and rural roads that are surfaced are negatively related to crash deaths. Average rural speed is positively related, but the relationship is not as strong as the proportion of urban population and surfaced roads. The simple correlation coefficients among the independent variables are relatively low and hence multicollinearity should not be dominant in this model. Note the relatively narrow range for average rural speed. This would indicate that there is not much variability in this independent variable. 
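The note above about the narrow range of average rural speed is a useful screen: a predictor whose values barely vary carries little information. A minimal sketch, using the summary statistics reported for 11-84 and an illustrative coefficient-of-variation cutoff:

```python
# Flag predictors with little relative spread (std dev / mean below a
# chosen threshold). The (mean, std dev) pairs are from the 11-84 output;
# the 0.10 cutoff is an illustrative choice, not from the text.
summaries = {
    "Purbanpop": (0.5890, 0.2591),
    "Prsurf":    (0.7980, 0.1928),
    "Ruspeed":   (58.186, 1.683),
}
cv = {name: sd / mean for name, (mean, sd) in summaries.items()}
low_spread = [name for name, v in cv.items() if v < 0.10]
```

Only Ruspeed is flagged, matching the observation that average rural speed has little variability.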
Regression Analysis: deaths versus Purbanpop, Prsurf, Ruspeed

The regression equation is
deaths = 0.141 - 0.149 Purbanpop - 0.181 Prsurf + 0.00457 Ruspeed

Predictor  Coef       SE Coef    T      P      VIF
Constant   0.1408     0.2998     0.47   0.641
Purbanpo   -0.14946   0.03192    -4.68  0.000  1.1
Prsurf     -0.18058   0.04299    -4.20  0.000  1.1
Ruspeed    0.004569   0.004942   0.92   0.360  1.1

S = 0.05510   R-Sq = 55.8%   R-Sq(adj) = 52.8%

Analysis of Variance
Source          DF   SS         MS         F       P
Regression      3    0.172207   0.057402   18.91   0.000
Residual Error  45   0.136602   0.003036
Total           48   0.308809

The model has conditionally significant variables for percent urban population and percent surfaced roads. Since average rural speed is not conditionally significant, it is dropped from the model:

Regression Analysis: deaths versus Purbanpop, Prsurf

The regression equation is
deaths = 0.416 - 0.155 Purbanpop - 0.188 Prsurf

Predictor  Coef       SE Coef   T      P      VIF
Constant   0.41609    0.03569   11.66  0.000
Purbanpo   -0.15493   0.03132   -4.95  0.000  1.0
Prsurf     -0.18831   0.04210   -4.47  0.000  1.0

S = 0.05501   R-Sq = 54.9%   R-Sq(adj) = 53.0%

Analysis of Variance
Source          DF   SS         MS         F       P
Regression      2    0.169612   0.084806   28.03   0.000
Residual Error  46   0.139197   0.003026
Total           48   0.308809

This becomes the final model since both variables are conditionally significant.

d. Conclude that the proportion of urban population and the percent of rural roads that are surfaced are important independent variables in explaining crash deaths. All else equal, the higher the proportion of urban population, the lower the crash deaths. All else equal, increases in the proportion of rural roads that are surfaced will result in lower crash deaths. The average rural speed is not conditionally significant. 11-85 a.
Correlation matrix and descriptive statistics

Correlations: hseval, sizehse, Taxhse, Comper, incom72, totexp

          hseval   sizehse  Taxhse   Comper   incom72
sizehse   0.542
          0.000
Taxhse    0.248    0.289
          0.019    0.006
Comper    -0.335   -0.278   -0.114
          0.001    0.008    0.285
incom72   0.426    0.393    0.261    -0.198
          0.000    0.000    0.013    0.062
totexp    0.261    -0.022   0.228    0.269    0.376
          0.013    0.834    0.030    0.010    0.000

The correlation matrix shows that multicollinearity is not likely to be a problem in this model since all of the correlations among the independent variables are relatively low.

Descriptive Statistics: hseval, sizehse, Taxhse, Comper, incom72, totexp

Variable  N    Mean      Median    TrMean    StDev     SE Mean
hseval    90   21.031    20.301    20.687    4.957     0.522
sizehse   90   5.4778    5.4000    5.4638    0.2407    0.0254
Taxhse    90   130.13    131.67    128.31    48.89     5.15
Comper    90   0.16211   0.15930   0.16206   0.06333   0.00668
incom72   90   3360.9    3283.0    3353.2    317.0     33.4
totexp    90   1488848   1089110   1295444   1265564   133402

Variable  Minimum   Maximum   Q1        Q3
hseval    13.300    35.976    17.665    24.046
sizehse   5.0000    6.2000    5.3000    5.6000
Taxhse    35.04     399.60    98.85     155.19
Comper    0.02805   0.28427   0.11388   0.20826
incom72   2739.0    4193.0    3114.3    3585.3
totexp    361290    7062330   808771    1570275

The range for applying the regression model (variable means +/- 2 standard deviations):

Hseval:  21.03 +/- 2(4.957) = 11.11 to 30.94
Sizehse: 5.48 +/- 2(.24) = 5.0 to 5.96
Taxhse:  130.13 +/- 2(48.89) = 32.35 to 227.91
Comper:  .16 +/- 2(.063) = .034 to .286
Incom72: 3361 +/- 2(317) = 2727 to 3995
Totexp:  1488848 +/- 2(1265564) = not a good approximation (the lower bound is negative)

b. Regression models:

Regression Analysis: hseval versus sizehse, Taxhse, ...
The regression equation is
hseval = - 31.1 + 9.10 sizehse - 0.00058 Taxhse - 22.2 Comper + 0.00120 incom72 + 0.000001 totexp

Predictor  Coef         SE Coef      T      P      VIF
Constant   -31.07       10.09        -3.08  0.003
sizehse    9.105        1.927        4.72   0.000  1.3
Taxhse     -0.000584    0.008910     -0.07  0.948  1.2
Comper     -22.197      7.108        -3.12  0.002  1.3
incom72    0.001200     0.001566     0.77   0.445  1.5
totexp     0.00000125   0.00000038   3.28   0.002  1.5

S = 3.785   R-Sq = 45.0%   R-Sq(adj) = 41.7%

Analysis of Variance
Source          DF   SS        MS       F       P
Regression      5    982.98    196.60   13.72   0.000
Residual Error  84   1203.65   14.33
Total           89   2186.63

Neither Taxhse nor incom72 is conditionally significant. Dropping one variable at a time, Taxhse is eliminated first, then incom72:

Regression Analysis: hseval versus sizehse, Comper, totexp

The regression equation is
hseval = - 29.9 + 9.61 sizehse - 23.5 Comper + 0.000001 totexp

Predictor  Coef         SE Coef      T      P      VIF
Constant   -29.875      9.791        -3.05  0.003
sizehse    9.613        1.724        5.58   0.000  1.1
Comper     -23.482      6.801        -3.45  0.001  1.2
totexp     0.00000138   0.00000033   4.22   0.000  1.1

S = 3.754   R-Sq = 44.6%   R-Sq(adj) = 42.6%

Analysis of Variance
Source          DF   SS        MS       F       P
Regression      3    974.55    324.85   23.05   0.000
Residual Error  86   1212.08   14.09
Total           89   2186.63

This is the final regression model. All of the independent variables are conditionally significant. Both house size and total government expenditures enhance the market value of homes, while the percentage of commercial property tends to reduce it.

c. In the final regression model, the tax variable was not found to be conditionally significant and hence it is difficult to support the developer's claim.

11-86 a. Correlation matrix

Correlations: retsal84, Unemp84, perinc84

          retsal84   Unemp84
Unemp84   -0.370
          0.008
perinc84  0.633      -0.232
          0.000      0.101

There is a positive association between per capita income and retail sales. There is a negative association between unemployment and retail sales.
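The correlation/p-value pairs reported in these Minitab matrices can be reproduced with `scipy.stats.pearsonr`. A minimal sketch on synthetic data (not the state retail dataset); the true correlation of 0.6 is an illustrative choice:

```python
# Pearson correlation with its two-tailed p-value, as in the Minitab
# "Correlations" output (correlation on the first line, p-value below it).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(size=51)                       # 51 obs, like the state data
y = 0.6 * x + rng.normal(scale=0.8, size=51)  # built-in positive relation
r, p = stats.pearsonr(x, y)
```

A small p-value, as here, is what the manual means by a "statistically significant linear association."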
Multicollinearity does not appear to be a problem since the correlation between the independent variables is relatively low.

Descriptive Statistics: retsal84, perinc84, Unemp84

Variable  N    Mean    Median  TrMean  StDev   SE Mean
retsal84  51   5536    5336    5483    812     114
perinc84  51   12277   12314   12166   1851    259
Unemp84   51   7.335   7.000   7.196   2.216   0.310

Variable  Minimum  Maximum  Q1      Q3
retsal84  4250     8348     5059    6037
perinc84  8857     17148    10689   13218

Regression Analysis: retsal84 versus Unemp84, perinc84

The regression equation is
retsal84 = 3054 - 86.3 Unemp84 + 0.254 perinc84

Predictor  Coef      SE Coef   T      P      VIF
Constant   3054.3    724.4     4.22   0.000
Unemp84    -86.25    40.20     -2.15  0.037  1.1
perinc84   0.25368   0.04815   5.27   0.000  1.1

S = 612.9   R-Sq = 45.3%   R-Sq(adj) = 43.0%

Analysis of Variance
Source          DF   SS         MS        F       P
Regression      2    14931938   7465969   19.88   0.000
Residual Error  48   18029333   375611
Total           50   32961271

This is the final model since all of the independent variables are conditionally significant at the .05 level. The 95% confidence intervals for the regression slope coefficients, b +/- t(sb), are:

Unemp84:  -86.25 +/- 2.011(40.2) = -86.25 +/- 80.84
perinc84: .254 +/- 2.011(.04815) = .254 +/- .0968

b. All else equal, the conditional effect of a $1,000 decrease in per capita income on retail sales would be to reduce retail sales by $254. c.
Adding state population as a predictor yields the following regression results: Regression Analysis: retsal84 versus Unemp84, perinc84, Totpop84 The regression equation is retsal84 = 2828 - 71.3 Unemp84 + 0.272 perinc84 - 0.0247 Totpop84 Predictor Coef SE Coef T P VIF Constant 2828.4 737.9 3.83 0.000 Unemp84 -71.33 41.40 -1.72 0.091 1.1 perinc84 0.27249 0.04977 5.47 0.000 1.1 Totpop84 -0.02473 0.01845 -1.34 0.187 1.1 S = 607.8 R-Sq = 47.3% R-Sq(adj) = 44.0% Analysis of Variance Source DF SS MS F Regression 3 15595748 5198583 14.07 Residual Error 47 17365523 369479 Total 50 32961271 P 0.000 The population variable is not conditionally significant and adds little explanatory power, therefore, it will not improve the multiple regression model. 11-87 a. Correlations: FRH, FBPR, FFED, FM2, GDPH, GH FBPR FFED FM2 GDPH GH FRH 0.510 0.000 0.244 0.001 0.854 0.000 0.934 0.000 0.907 0.000 FBPR FFED FM2 GDPH 0.957 0.000 0.291 0.000 0.580 0.000 0.592 0.000 0.077 0.326 0.287 0.000 0.285 0.000 0.987 0.000 0.977 0.000 0.973 0.000 The correlation matrix shows that both interest rates have a significant positive impact on residential investment. The money supply, GDP and government expenditures also have a significant linear association with residential investment. Note the high correlation between the two interest rate variables, which, as expected, would create significant problems with multicollinearity if both variables are included in the regression model. Hence, the interest rates will be developed in two separate models. 
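The variance inflation factors quoted throughout these outputs (e.g., the VIFs of 46.8 and 58.1 signaling serious multicollinearity) come from VIF_j = 1/(1 - R²_j), where R²_j is from regressing predictor j on the other predictors. A self-contained sketch on illustrative data, with one pair of near-collinear predictors:

```python
# Compute VIF for each predictor column; high VIF flags multicollinearity.
import numpy as np

def vif(X):
    """VIF for each column of X (predictor columns only, no intercept)."""
    out = []
    n, k = X.shape
    for j in range(k):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        yhat = Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
        r2 = 1 - ((y - yhat) ** 2).sum() / ((y - y.mean()) ** 2).sum()
        out.append(1 / (1 - r2))
    return out

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)   # nearly collinear with x1
x3 = rng.normal(size=100)                   # independent of the others
vifs = vif(np.column_stack([x1, x2, x3]))
```

The two collinear columns produce large VIFs while the independent column stays near 1, the same pattern seen with the two interest rate series.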
Regression Analysis: FRH versus FBPR, FM2, GDPH, GH

The regression equation is
FRH = 70.0 - 3.79 FBPR - 0.0542 FM2 + 0.0932 GDPH - 0.165 GH

166 cases used; 52 cases contain missing values

Predictor  Coef        SE Coef    T      P      VIF
Constant   70.00       24.87      2.82   0.005
FBPR       -3.7871     0.6276     -6.03  0.000  1.2
FM2        -0.054210   0.009210   -5.89  0.000  46.8
GDPH       0.093223    0.007389   12.62  0.000  58.1
GH         -0.16514    0.03747    -4.41  0.000  28.7

S = 23.42   R-Sq = 86.7%   R-Sq(adj) = 86.3%

Analysis of Variance
Source          DF    SS       MS       F        P
Regression      4     573700   143425   261.42   0.000
Residual Error  161   88331    549
Total           165   662030

This will be the final model with the prime rate as the interest rate variable since all of the independent variables are conditionally significant. Note the significant multicollinearity that exists among the independent variables.

Regression Analysis: FRH versus FFED, FM2, GDPH, GH

The regression equation is
FRH = 55.0 - 2.76 FFED - 0.0558 FM2 + 0.0904 GDPH - 0.148 GH

166 cases used; 52 cases contain missing values

Predictor  Coef       SE Coef    T      P      VIF
Constant   55.00      26.26      2.09   0.038
FFED       -2.7640    0.6548     -4.22  0.000  1.2
FM2        -0.05578   0.01007    -5.54  0.000  50.7
GDPH       0.090402   0.007862   11.50  0.000  59.6
GH         -0.14752   0.03922    -3.76  0.000  28.5

S = 24.61   R-Sq = 85.3%   R-Sq(adj) = 84.9%

Analysis of Variance
Source          DF    SS       MS       F        P
Regression      4     564511   141128   233.00   0.000
Residual Error  161   97519    606
Total           165   662030

The model with the federal funds rate as the interest rate variable is also a final model, with all of the independent variables conditionally significant. Again, multicollinearity will be a problem with this regression model.

b. 95% confidence intervals for the slope coefficient on the interest rate term, b1 +/- t(sb1):

Bank prime rate as the interest rate variable:
-3.7871 +/- 1.96(.6276) = -3.7871 +/- 1.23

Federal funds rate as the interest rate variable:
-2.764 +/- 1.96(.6548) = -2.764 +/- 1.2834

11-88 a.
Correlations: Infmrt82, Phys82, Perinc84, Perhosp

          Infmrt82   Phys82   Perinc84
Phys82    0.434
          0.001
Perinc84  0.094      0.614
          0.511      0.000
Perhosp   0.411      0.285    0.267
          0.003      0.042    0.058

The correlation matrix shows a positive association between infant mortality and both Phys82 and Perhosp. These variables are the number of physicians per 100,000 population and the total per capita expenditures for hospitals. One would expect a negative association; therefore, examine the scatter diagram of infant mortality vs. Phys82:

[Scatterplot of Infmrt82 (vertical axis) vs. Phys82 (horizontal axis)]

The graph shows an obvious outlier which, upon further investigation, is the District of Columbia. Due to its outlier status, this row is dropped from the analysis and the correlation matrix is recalculated:

Correlations: Infmrt82, Phys82, Perinc84, Perhosp

          Infmrt82   Phys82   Perinc84
Phys82    -0.147
          0.309
Perinc84  -0.192     0.574
          0.181      0.000
Perhosp   0.199      -0.065   0.140
          0.166      0.654    0.331

The physicians per 100,000 population now has the correct sign; however, none of the independent variables has a statistically significant linear association with the dependent variable. Per capita expenditures for hospitals still has an unexpected positive sign; however, it is not conditionally significant.
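The sign change seen when the District of Columbia is dropped is a classic outlier effect: one extreme point can reverse the sign of a correlation. A toy illustration (synthetic data, not the state dataset):

```python
# One extreme observation at (10, 10) flips a negative correlation positive.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0, 1, 50)
y = -0.5 * x + rng.normal(scale=0.5, size=50)   # true negative relation
x_out = np.append(x, 10.0)                      # add a single outlier
y_out = np.append(y, 10.0)

r_with = np.corrcoef(x_out, y_out)[0, 1]        # outlier included
r_without = np.corrcoef(x, y)[0, 1]             # outlier removed
```

With the outlier, the sample correlation is positive; without it, the underlying negative relation reappears, just as with the DC observation.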
The multiple regression results are likely to yield low explanatory power with insignificant independent variables:

Regression Analysis: Infmrt82 versus Phys82, Perinc84, Perhosp

The regression equation is
Infmrt82 = 12.7 - 0.00017 Phys82 - 0.000206 Perinc84 + 6.30 Perhosp

Predictor  Coef         SE Coef     T      P      VIF
Constant   12.701       1.676       7.58   0.000
Phys82     -0.000167    0.006647    -0.03  0.980  1.5
Perinc84   -0.0002064   0.0001637   -1.26  0.214  1.6
Perhosp    6.297        3.958       1.59   0.118  1.1

S = 1.602   R-Sq = 8.9%   R-Sq(adj) = 3.0%

Analysis of Variance
Source          DF   SS        MS      F      P
Regression      3    11.546    3.849   1.50   0.227
Residual Error  46   118.029   2.566
Total           49   129.575

As expected, the model explains less than 9% of the variability in infant mortality. None of the independent variables is conditionally significant, and multicollinearity does not appear to be a significant problem. The standard error of the estimate is very large (1.602) relative to the size of the infant mortality rates, and hence the model would not be a good predictor. Sequentially dropping the independent variable with the lowest t-statistic confirms the conclusion that none of the independent variables is conditionally significant. The search is on for better independent variables.

b. The two variables to include are per capita spending on education (PerEduc) and per capita spending on public welfare (PerPbwel). Since the conditional significance of the independent variables is a function of the other independent variables in the model, we will include the original set of variables:

Regression Analysis: Infmrt82 versus Phys82, Perinc84, ...
The regression equation is Infmrt82 = 12.2 - 0.00122 Phys82 +0.000015 Perinc84 + 8.87 Perhosp - 1.96 PerEduc - 4.56 PerPbwel Predictor Coef SE Coef T P VIF Constant 12.167 1.654 7.35 0.000 Phys82 -0.001219 0.008028 -0.15 0.880 2.4 Perinc84 0.0000152 0.0001942 0.08 0.938 2.4 Perhosp 8.872 4.010 2.21 0.032 1.2 PerEduc -1.960 1.315 -1.49 0.143 1.8 PerPbwel -4.555 3.360 -1.36 0.182 1.7 S = 1.550 R-Sq = 18.4% R-Sq(adj) = 9.1% Analysis of Variance Source DF SS MS F Regression 5 23.848 4.770 1.98 Residual Error 44 105.727 2.403 Total 49 129.575 P 0.100 The model shows low explanatory power and only one independent variable that is conditionally significant (Perhosp). Dropping sequentially the independent variable with the lowest t-statistic yields a model with no conditionally significant independent variables. This problem illustrates that in some applications, the variables that have been identified as theoretically important predictors do not meet the statistical test. 11-89 a. Correlations: Salary, age, yrs_asoc, yrs_full, Sex_1Fem, Market, C8 Salary 0.749 0.000 yrs_asoc 0.698 0.000 yrs_full 0.777 0.000 Sex_1Fem -0.429 0.000 Market 0.026 0.750 C8 -0.029 0.721 age yrs_asoc yrs_full Sex_1Fem Market age 0.712 0.000 0.583 0.000 -0.234 0.004 -0.134 0.103 -0.189 0.020 0.312 0.000 -0.367 0.000 -0.113 0.169 -0.073 0.373 -0.292 0.000 -0.017 0.833 -0.043 0.598 0.062 0.453 -0.094 0.254 -0.107 0.192 The correlation matrix indicates several independent variables that should provide good explanatory power in the regression model. We would expect that age, years at Associate professor and years at full professor are likely to be conditionally significant: 240 Solutions Manual for Statistics for Business & Economics, 5 th Edition Regression Analysis: Salary versus age, yrs_asoc, ... 
The regression equation is
Salary = 21107 + 105 age + 532 yrs_asoc + 690 yrs_full - 1312 Sex_1Fem + 2854 Market + 1101 C8

Predictor  Coef      SE Coef   T      P      VIF
Constant   21107     1599      13.20  0.000
age        104.59    40.62     2.58   0.011  3.1
yrs_asoc   532.27    63.66     8.36   0.000  2.4
yrs_full   689.93    52.66     13.10  0.000  1.7
Sex_1Fem   -1311.8   532.3     -2.46  0.015  1.3
Market     2853.9    823.3     3.47   0.001  1.0
C8         1101.0    658.1     1.67   0.097  1.1

S = 2569   R-Sq = 85.6%   R-Sq(adj) = 85.0%

Analysis of Variance
Source          DF    SS           MS          F        P
Regression      6     5604244075   934040679   141.49   0.000
Residual Error  143   943982608    6601277
Total           149   6548226683

Dropping the C8 variable yields:

Regression Analysis: Salary versus age, yrs_asoc, ...

The regression equation is
Salary = 21887 + 90.0 age + 539 yrs_asoc + 697 yrs_full - 1397 Sex_1Fem + 2662 Market

Predictor  Coef      SE Coef   T      P      VIF
Constant   21887     1539      14.22  0.000
age        90.02     39.92     2.26   0.026  3.0
yrs_asoc   539.48    63.91     8.44   0.000  2.4
yrs_full   697.35    52.80     13.21  0.000  1.7
Sex_1Fem   -1397.2   533.2     -2.62  0.010  1.2
Market     2662.3    820.3     3.25   0.001  1.0

S = 2585   R-Sq = 85.3%   R-Sq(adj) = 84.8%

Analysis of Variance
Source          DF    SS           MS           F        P
Regression      5     5585766862   1117153372   167.14   0.000
Residual Error  144   962459821    6683749
Total           149   6548226683

This is the final model. All of the independent variables are conditionally significant and the model explains a sizeable portion of the variability in salary.

b. To test the hypothesis that the rate of change in female salaries as a function of age is less than the rate of change in male salaries as a function of age, the dummy variable Sex_1Fem is used to see if the slope coefficient for age (X1) is different for males and females. The following model is used:

Y = b0 + (b1 + b6 X4) X1 + b2 X2 + b3 X3 + b4 X4 + b5 X5
  = b0 + b1 X1 + b6 (X4 X1) + b2 X2 + b3 X3 + b4 X4 + b5 X5

Create the variable X4X1 and then test for conditional significance in the regression model.
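The interaction test described above can be sketched directly: multiply the dummy by age to get a femage-style variable, fit, and read off its t-statistic. Synthetic data below (not the faculty dataset), generated with no true interaction, so the term should test as insignificant, as it did in the solution; the coefficient values in the data-generating line are illustrative:

```python
# Dummy-by-continuous interaction: a significant coefficient on
# female * age would mean the age slope differs by sex.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 150
age = rng.uniform(30, 65, n)
female = rng.integers(0, 2, n).astype(float)
salary = 20000 + 100 * age - 1300 * female + rng.normal(0, 2000, n)

# Columns: intercept, age, female, female*age (the interaction term)
X = np.column_stack([np.ones(n), age, female, female * age])
beta, *_ = np.linalg.lstsq(X, salary, rcond=None)
resid = salary - X @ beta
df = n - X.shape[1]
s2 = resid @ resid / df
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
t_interact = beta[3] / se[3]
p_interact = 2 * stats.t.sf(abs(t_interact), df)
```

A large p-value on the interaction, as with femage in the text, means there is no evidence the salary-age slope differs between groups.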
If it proves to be a significant predictor of salaries then there is strong evidence to conclude that the rate of change in female salaries as a function of age is different than for males: Regression Analysis: Salary versus age, femage, ... The regression equation is Salary = 22082 + 85.1 age + 11.7 femage + 543 yrs_asoc + 701 yrs_full - 1878 Sex_1Fem + 2673 Market Predictor Coef SE Coef T P VIF Constant 22082 1877 11.77 0.000 age 85.07 48.36 1.76 0.081 4.4 femage 11.66 63.89 0.18 0.855 32.2 yrs_asoc 542.85 66.73 8.13 0.000 2.6 yrs_full 701.35 57.35 12.23 0.000 2.0 Sex_1Fem -1878 2687 -0.70 0.486 31.5 Market 2672.8 825.1 3.24 0.001 1.0 S = 2594 R-Sq = 85.3% Analysis of Variance Source DF Regression 6 Residual Error 143 Total 149 SS 5585990999 962235684 6548226683 R-Sq(adj) = 84.7% MS 930998500 6728921 F 138.36 P 0.000 The regression shows that the newly created variable of femage is not conditionally significant. Thus, we cannot conclude that the rate of change in female salaries as a function of age differs from that of male salaries. 11-90 Regression Analysis: hseval versus sizehse, taxrate, incom72, Homper The regression equation is hseval = - 32.7 + 6.74 sizehse - 223 taxrate + 0.00464 incom72 + 11.2 Homper Predictor Coef SE Coef T P VIF Constant -32.694 8.972 -3.64 0.000 sizehse 6.740 1.880 3.58 0.001 1.4 taxrate -222.96 45.39 -4.91 0.000 1.2 incom72 0.004642 0.001349 3.44 0.001 1.2 Homper 11.215 4.592 2.44 0.017 1.3 S = 3.610 R-Sq = 49.3% Analysis of Variance Source DF Regression 4 Residual Error 85 Total 89 SS 1079.08 1107.55 2186.63 R-Sq(adj) = 47.0% MS 269.77 13.03 F 20.70 P 0.000 All of the independent variables are conditionally significant. Now add the percent of commercial property to the model to see if it is significant: 242 Solutions Manual for Statistics for Business & Economics, 5 th Edition Regression Analysis: hseval versus sizehse, taxrate, ... 
The regression equation is
hseval = - 31.6 + 6.76 sizehse - 218 taxrate + 0.00453 incom72 + 10.3 Homper - 2.18 Comper

Predictor  Coef       SE Coef    T      P      VIF
Constant   -31.615    9.839      -3.21  0.002
sizehse    6.757      1.892      3.57   0.001  1.4
taxrate    -217.63    49.58      -4.39  0.000  1.4
incom72    0.004534   0.001412   3.21   0.002  1.4
Homper     10.287     5.721      1.80   0.076  2.0
Comper     -2.182     7.940      -0.27  0.784  1.7

S = 3.630   R-Sq = 49.4%   R-Sq(adj) = 46.4%

Analysis of Variance
Source          DF   SS        MS       F       P
Regression      5    1080.07   216.01   16.40   0.000
Residual Error  84   1106.56   13.17
Total           89   2186.63

With a t-statistic of -.27, we have not found strong enough evidence to reject H0 that the slope coefficient on percent commercial property is zero. The conditional F test:

F_Comper = (SSR_full - SSR_reduced) / s^2(Y|X) = (1080.07 - 1079.08) / (3.63)^2 = .075

Thus, at any common level of alpha, do not reject H0 that the percent commercial property has no effect on house values. Add percent industrial property to the base model:

Regression Analysis: hseval versus sizehse, taxrate, ...

The regression equation is
hseval = - 28.6 + 6.10 sizehse - 232 taxrate + 0.00521 incom72 + 8.68 Homper - 7.50 Indper

Predictor  Coef       SE Coef    T      P      VIF
Constant   -28.643    9.602      -2.98  0.004
sizehse    6.096      1.956      3.12   0.003  1.5
taxrate    -232.34    46.00      -5.05  0.000  1.2
incom72    0.005208   0.001431   3.64   0.000  1.4
Homper     8.681      5.070      1.71   0.091  1.6
Indper     -7.505     6.427      -1.17  0.246  1.7

S = 3.602   R-Sq = 50.2%   R-Sq(adj) = 47.2%

Analysis of Variance
Source          DF   SS        MS       F       P
Regression      5    1096.77   219.35   16.91   0.000
Residual Error  84   1089.86   12.97
Total           89   2186.63

Likewise, the slope coefficient on percent industrial property is not significantly different from zero. The conditional F test:

F_Indper = (SSR_full - SSR_reduced) / s^2(Y|X) = (1096.77 - 1079.08) / (3.602)^2 = 1.36

Again, this is lower than the critical value of F at common levels of alpha; therefore, do not reject H0 that the percent industrial property has no effect on house values.
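The conditional F test used above can be written out directly. The numbers below are the ones from the Comper test (regression sums of squares 1080.07 vs. 1079.08, standard error of the estimate 3.63), with scipy supplying the exact critical value:

```python
# Conditional (partial) F test for one added variable:
# F = (SSR_full - SSR_reduced) / s^2, with s the standard error of the
# estimate from the full model.
from scipy import stats

ssr_full, ssr_reduced = 1080.07, 1079.08
s = 3.63                                     # full-model standard error
f_comper = (ssr_full - ssr_reduced) / s ** 2
f_crit = stats.f.ppf(0.95, dfn=1, dfd=84)    # 90 obs - 5 slopes - 1
reject = f_comper > f_crit
```

Since F = .075 falls far below the critical value, the null hypothesis that Comper has no effect is not rejected, matching the conclusion in the text.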
Chapter 11: Multiple Regression 243 Tax rate models: Regression Analysis: taxrate versus taxbase, expercap, Homper The regression equation is taxrate = - 0.0174 -0.000000 taxbase +0.000162 expercap + 0.0424 Homper Predictor Coef SE Coef T P VIF Constant -0.017399 0.007852 -2.22 0.029 taxbase -0.00000000 0.00000000 -0.80 0.426 1.2 expercap 0.00016204 0.00003160 5.13 0.000 1.1 Homper 0.042361 0.009378 4.52 0.000 1.2 S = 0.007692 R-Sq = 31.9% Analysis of Variance Source DF Regression 3 Residual Error 86 Total 89 SS 0.00237926 0.00508785 0.00746711 R-Sq(adj) = 29.5% MS 0.00079309 0.00005916 F 13.41 P 0.000 Since taxbase is not significant, it is dropped from the model: Regression Analysis: taxrate versus expercap, Homper The regression equation is taxrate = - 0.0192 +0.000158 expercap + 0.0448 Homper Predictor Coef SE Coef T P Constant -0.019188 0.007511 -2.55 0.012 expercap 0.00015767 0.00003106 5.08 0.000 Homper 0.044777 0.008860 5.05 0.000 S = 0.007676 R-Sq = 31.4% Analysis of Variance Source DF Regression 2 Residual Error 87 Total 89 SS 0.0023414 0.0051257 0.0074671 VIF 1.1 1.1 R-Sq(adj) = 29.8% MS 0.0011707 0.0000589 F 19.87 P 0.000 Both of the independent variables are significant. 
This becomes the base model, to which we now add percent commercial property and percent industrial property sequentially:

Regression Analysis: taxrate versus expercap, Homper, Comper

The regression equation is
taxrate = - 0.0413 + 0.000157 expercap + 0.0643 Homper + 0.0596 Comper

Predictor  Coef         SE Coef      T      P      VIF
Constant   -0.041343    0.008455     -4.89  0.000
expercap   0.00015660   0.00002819   5.55   0.000  1.1
Homper     0.064320     0.009172     7.01   0.000  1.4
Comper     0.05960      0.01346      4.43   0.000  1.3

S = 0.006966   R-Sq = 44.1%   R-Sq(adj) = 42.2%

Analysis of Variance
Source          DF   SS          MS          F       P
Regression      3    0.0032936   0.0010979   22.62   0.000
Residual Error  86   0.0041735   0.0000485
Total           89   0.0074671

Percent commercial property is conditionally significant and an important independent variable, as shown by the conditional F test:

F_Comper = (SSR_full - SSR_reduced) / s^2(Y|X) = (.0032936 - .0023414) / (.006966)^2 = 19.62

With 1 degree of freedom in the numerator and (90 - 3 - 1) = 86 degrees of freedom in the denominator, the critical value of F at the .05 level is 3.95. Hence we would conclude that the percentage of commercial property has a statistically significant positive impact on tax rate. We now add industrial property to test the effect on tax rate:

Regression Analysis: taxrate versus expercap, Homper, Indper

The regression equation is
taxrate = - 0.0150 + 0.000156 expercap + 0.0398 Homper - 0.0105 Indper

Predictor  Coef         SE Coef      T      P      VIF
Constant   -0.015038    0.009047     -1.66  0.100
expercap   0.00015586   0.00003120   5.00   0.000  1.1
Homper     0.03982      0.01071      3.72   0.000  1.6
Indper     -0.01052     0.01273      -0.83  0.411  1.5

S = 0.007690   R-Sq = 31.9%   R-Sq(adj) = 29.5%

Analysis of Variance
Source          DF   SS           MS           F       P
Regression      3    0.00238178   0.00079393   13.43   0.000
Residual Error  86   0.00508533   0.00005913
Total           89   0.00746711

The percent industrial property is insignificant with a t-statistic of only -.83.
The F test confirms that the variable does not have a significant impact on tax rate:

F_Indper = (SSR_full - SSR_reduced) / s^2(Y|X) = (.0023818 - .0023414) / (.00769)^2 = .683

With 1 degree of freedom in the numerator and (90 - 3 - 1) = 86 degrees of freedom in the denominator, the critical value of F at the .05 level is 3.95. Hence we would conclude that the percentage of industrial property has no statistically significant impact on tax rate.

In conclusion, we found no evidence to back three of the activists' claims and strong evidence to reject one of them. We concluded that commercial development will have no effect on house value, while it will actually increase the tax rate. In addition, we concluded that industrial development will have no effect on house value or tax rate. It was important to include all of the other independent variables in the regression models because the conditional significance of any one variable is influenced by which other independent variables are in the regression model. Therefore, it is important to test whether direct relationships can be 'explained' by the relationships with other predictor variables.

11-91 a. Correlation matrix:

Correlations: EconGPA, sex, Acteng, ACTmath, ACTss, ACTcomp, HSPct

         EconGPA   sex      Acteng   ACTmath   ACTss   ACTcomp
sex      0.187
         0.049
Acteng   0.387     0.270
         0.001     0.021
ACTmath  0.338     -0.170   0.368
         0.003     0.151    0.001
ACTss    0.442     -0.105   0.448    0.439
         0.000     0.375    0.000    0.000
ACTcomp  0.474     -0.084   0.650    0.765     0.812
         0.000     0.478    0.000    0.000     0.000
HSPct    0.362     0.216    0.173    0.290     0.224   0.230
         0.000     0.026    0.150    0.014     0.060   0.053

There exists a positive relationship between EconGPA and all of the independent variables, which is expected. Note that there is a high correlation between the composite ACT score and the individual components, which is again as expected. Thus, multicollinearity is likely to be a serious concern in this regression model.

Regression Analysis: EconGPA versus sex, Acteng, ...
The regression equation is EconGPA = - 0.050 + 0.261 sex + 0.0099 Acteng + 0.0064 ACTmath + 0.0270 ACTss + 0.0419 ACTcomp + 0.00898 HSPct 71 cases used 41 cases contain missing values Predictor Coef SE Coef T P VIF Constant -0.0504 0.6554 -0.08 0.939 sex 0.2611 0.1607 1.62 0.109 1.5 Acteng 0.00991 0.02986 0.33 0.741 2.5 ACTmath 0.00643 0.03041 0.21 0.833 4.3 ACTss 0.02696 0.02794 0.96 0.338 4.7 ACTcomp 0.04188 0.07200 0.58 0.563 12.8 HSPct 0.008978 0.005716 1.57 0.121 1.4 S = 0.4971 R-Sq = 34.1% Analysis of Variance Source DF Regression 6 Residual Error 64 Total 70 SS 8.1778 15.8166 23.9945 R-Sq(adj) = 27.9% MS 1.3630 0.2471 F 5.52 P 0.000 As expected, multicollinearity is affecting the results. A strategy of dropping the variable with the lowest t-statistic with each successive model causes the dropping of the following variables (in order): 1) ACTmath, 2) ACTeng, 3) ACTss, 4) HSPct. The two variables that remain are the final model of gender and ACTcomp: Regression Analysis: EconGPA versus sex, ACTcomp The regression equation is EconGPA = 0.322 + 0.335 sex + 0.0978 ACTcomp 73 cases used 39 cases contain missing values Predictor Coef SE Coef T Constant 0.3216 0.5201 0.62 sex 0.3350 0.1279 2.62 ACTcomp 0.09782 0.01989 4.92 S = 0.4931 R-Sq = 29.4% Analysis of Variance Source DF Regression 2 Residual Error 70 Total 72 SS 7.0705 17.0192 24.0897 P 0.538 0.011 0.000 VIF 1.0 1.0 R-Sq(adj) = 27.3% MS 3.5352 0.2431 F 14.54 P 0.000 Both independent variables are conditionally significant. b. The model could be used in college admission decisions by creating a predicted GPA in economics based on sex and ACT comp scores. This predicted GPA could then be used with other factors in deciding admission. Note that this model predicts that females will outperform males with equal test scores. Using this model as the only source of information may lead to charges of unequal treatment. 
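Part b's suggested use of the model is a plain prediction: plug sex and ACT composite into the final fitted equation. A minimal sketch, using the coefficients from the final model above; the function name is illustrative:

```python
# Predicted economics GPA from the final 11-91a model:
# EconGPA = 0.322 + 0.335 * sex + 0.0978 * ACTcomp
def predict_econ_gpa(sex, actcomp):
    """sex: 1 for female, 0 for male; actcomp: ACT composite score."""
    return 0.322 + 0.335 * sex + 0.0978 * actcomp

female_25 = predict_econ_gpa(1, 25)   # female applicant, ACT composite 25
male_25 = predict_econ_gpa(0, 25)     # male applicant, same score
```

The gap between the two predictions is exactly the 0.335 coefficient on sex, which is the source of the unequal-treatment concern noted above.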
11-92 92.7% of the total variation in gross revenue from a medical practice can be explained by the variations in the number of hours worked, the number of physicians, the number of allied health personnel and the number of rooms used in the practice. The regression model indicates that gross revenue from a medical practice is positively associated with the number of hours worked, the number of physicians in the practice, the number of allied health personnel and the number of rooms used in the practice. All of these variables are expected to have a positive effect on gross revenue. With df = (50 - 4 - 1) = 45, the critical value of t for a two-tailed test at an alpha of .05 is approximately 2.021. Therefore, one can conclude that all of the independent variables are conditionally significant with the sole exception of the number of rooms used in the practice. This variable, while positive, has an estimated coefficient that is not significantly different from zero. The number of physicians in the practice has a large effect on gross revenue since, all else equal, for a 1% increase in the number of physicians in the practice, we estimate an increase in the gross revenue from the medical practice of .673%. Note, however, that before additional physicians are hired, the analysis should be extended to include not only gross revenue but also profits.

11-93 82.5% of the total variation in the industry 4-firm concentration ratio can be explained by the variation in the amount of capital required for entry and economies of scale. All else equal, for a one percent increase in the amount of capital required for entry into the industry, the 4-firm concentration ratio is estimated to increase by .234%. All else equal, for a one percent increase in the economies of scale of the industry, the 4-firm concentration ratio is estimated to increase by .267%.
Both of the independent variables are conditionally significant, as their t-statistics are larger than the critical value of t for a two-tailed test at an alpha of .05 with df = (35 - 2 - 1) = 32 (t-critical approximately 2.042). Therefore, one can conclude that both capital requirements and economies of scale are conditionally significant. Both variables have the expected positive sign: the amount of capital required to enter the industry acts as a 'barrier to entry', so higher concentration is likely to result. Likewise, the more significant the economies of scale, the larger the cost advantage a large firm has over a smaller firm. This is an industry where it is more difficult to 'start small' and grow. Therefore, again, we would expect higher concentration in the industry.
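The critical t values quoted in 11-92 and 11-93 (approximately 2.021 and 2.042) are nearest-row values from a printed t table; exact two-tailed 5% critical values can be obtained with scipy:

```python
# Exact two-tailed .05 critical t values for the degrees of freedom
# used in 11-92 and 11-93 (the printed-table values are approximations).
from scipy import stats

t_crit_45 = stats.t.ppf(0.975, df=45)   # 11-92: 50 - 4 - 1 = 45 df
t_crit_32 = stats.t.ppf(0.975, df=32)   # 11-93: 35 - 2 - 1 = 32 df
```

The exact values are slightly below the quoted table values, which does not change any of the significance conclusions drawn above.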