CHAPTER 11: Multiple Regression
11-1
The estimated regression slope coefficients are interpreted as follows:
b1 = .661: All else equal, an increase in the plane’s top speed by one mph will
increase the expected number of hours in the design effort by an estimated .661
million worker-hours.
b2 = .065: All else equal, an increase in the plane's weight by one ton will
increase the expected number of hours in the design effort by an estimated .065
million worker-hours.
b3 = -.018: All else equal, an increase of one percentage point in the parts in common with
other models will result in a decrease in the expected number of hours in the
design effort by an estimated .018 million worker-hours.
11-2
The estimated regression slope coefficients are interpreted as follows:
b1 = .057: All else equal, an increase of one unit in the change over the quarter in
equity purchases by financial institutions results in an estimated .057 increase in
the change over the quarter in the Financial Times stock price index
b2 = -.065: All else equal, an increase of one unit in the change over the quarter in
equity sales by financial institutions results in an estimated .065 decrease in the
change over the quarter in the Financial Times stock price index
11-3
a. b1 = .052: All else equal, an increase of one hundred dollars in weekly income
results in an estimated .052 quarts per week increase in milk consumption.
b2 = 1.14: All else equal, an increase in family size by one person will result in an
estimated increase in milk consumption of 1.14 quarts per week.
b. The intercept term b0 of -.025 is the estimated milk consumption in quarts of
milk per week for a family with a weekly income of 0 dollars and 0 members.
This is likely extrapolating beyond the observed data and is not a useful interpretation.
11-4 a. b1 = .653: All else equal, a one unit increase in the average number of meals
eaten per week will result in an estimated .653 pounds gained during freshman year.
b2 = -1.345: All else equal, a one unit increase in the average number of hours of
exercise per week will result in an estimated 1.345 pound weight loss.
b3 = .613: All else equal, a one unit increase in the average number of beers
consumed per week will result in an estimated .613 pound weight gain.
11-5 a. When the correlation between X1 and X2 is 0, the slope coefficient of the X1 term
simplifies to the slope coefficient of the bivariate regression. Start with equation 11.4:

b1 = sy(rx1y − rx1x2 rx2y) / [sx1(1 − r²x1x2)]

Note that if the correlation between X1 and X2 is zero, then the second terms in both the
numerator and denominator are zero and the formula algebraically reduces to

b1 = sy rx1y / sx1

which is the equivalent of the bivariate slope coefficient (see box on bottom of page 380).
b. When the correlation between X1 and X2 is 1, the factor (1 − r²x1x2) in the
denominator goes to 0, so the denominator is zero and the slope coefficient is undefined.
11-6
a. Electricity sales as a function of number of customers and price
Regression Analysis: salesmw2 versus priclec2, numcust2
The regression equation is
salesmw2 = - 647363 + 19895 priclec2 + 2.35 numcust2
Predictor   Coef      SE Coef   T        P
Constant    -647363   291734    -2.22    0.030
priclec2    19895     22515      0.88    0.380
numcust2    2.3530    0.2233    10.54    0.000

S = 66399   R-Sq = 79.2%   R-Sq(adj) = 78.5%

Analysis of Variance
Source           DF   SS            MS            F         P
Regression        2   1.02480E+12   5.12400E+11   116.22    0.000
Residual Error   61   2.68939E+11   4408828732
Total            63   1.29374E+12

Source     DF   Seq SS
priclec2    1   5.35306E+11
numcust2    1   4.89494E+11
All else equal, for every one unit increase in the price of electricity, we estimate that sales will
increase by 19895 mwh. Note that this estimated coefficient is not significantly different from
zero.
All else equal, for every additional residential customer who uses electricity in the heating of
their home, we estimate that sales will increase by 2.353 mwh.
b. Electricity sales as a function of number of customers
Regression Analysis: salesmw2 versus numcust2
The regression equation is
salesmw2 = - 410202 + 2.20 numcust2
Predictor
Coef
SE Coef
Constant
-410202
114132
numcust2
2.2027
0.1445
S = 66282
R-Sq = 78.9%
T
-3.59
15.25
P
0.001
0.000
R-Sq(adj) = 78.6%
Analysis of Variance
Source
DF
SS
MS
Regression
1 1.02136E+12 1.02136E+12
Residual Error
62 2.72381E+11 4393240914
Total
63 1.29374E+12
F
232.48
P
0.000
All else equal, an additional residential customer will add 2.2027 mwh to electricity sales.
The two models have roughly equivalent explanatory power; therefore, adding price as a variable
does not add a significant amount of explanatory power to the model. There appears to be a
problem of multicollinearity between price and customers.
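The comparison in parts a and b can be reproduced directly; the sketch below is a minimal illustration, assuming the data sit in a CSV file (the file name electricity.csv is hypothetical) with the column names shown in the output:

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("electricity.csv")  # hypothetical file name

full = smf.ols("salesmw2 ~ priclec2 + numcust2", data=df).fit()
reduced = smf.ols("salesmw2 ~ numcust2", data=df).fit()

# Nearly equal R-squared values show that price adds little explanatory
# power once the number of customers is in the model.
print(full.rsquared, reduced.rsquared)

# A high correlation between the two regressors signals multicollinearity.
print(df[["priclec2", "numcust2"]].corr())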
c.
Regression Analysis: salesmw2 versus priclec2, degrday2
The regression equation is
salesmw2 = 2312260 - 165275 priclec2 + 56.1 degrday2
Predictor   Coef      SE Coef   T        P
Constant    2312260   148794    15.54    0.000
priclec2    -165275   24809     -6.66    0.000
degrday2    56.06     60.37      0.93    0.357

S = 110725   R-Sq = 42.2%   R-Sq(adj) = 40.3%

Analysis of Variance
Source           DF   SS            MS            F        P
Regression        2   5.45875E+11   2.72938E+11   22.26    0.000
Residual Error   61   7.47863E+11   12260053296
Total            63   1.29374E+12

Source     DF   Seq SS
priclec2    1   5.35306E+11
degrday2    1   10569514277
All else equal, a one unit increase in the price of electricity will reduce electricity sales by 165,275 mwh.
All else equal, an increase in degree days (departure from normal weather) by one unit will
increase electricity sales by 56.06 mwh.
Note that the coefficient on the price variable is now negative, as expected, and it is significantly
different from zero (p-value = .000).
d.
Regression Analysis: salesmw2 versus Yd872, degrday2
The regression equation is
salesmw2 = 293949 + 326 Yd872 + 58.4 degrday2
Predictor   Coef     SE Coef   T        P
Constant    293949   67939      4.33    0.000
Yd872       325.85   21.30     15.29    0.000
degrday2    58.36    35.79      1.63    0.108

S = 66187   R-Sq = 79.3%   R-Sq(adj) = 78.7%

Analysis of Variance
Source           DF   SS            MS            F         P
Regression        2   1.02652E+12   5.13259E+11   117.16    0.000
Residual Error   61   2.67221E+11   4380674677
Total            63   1.29374E+12

Source     DF   Seq SS
Yd872       1   1.01487E+12
degrday2    1   11651638615
All else equal, an increase in personal disposable income by one unit will increase electricity
sales by 325.85 mwh.
All else equal, an increase in degree days by one unit will increase electricity sales by 58.36
mwh.
11-7
a. mpg as a function of horsepower and weight
Regression Analysis: milpgal versus horspwr, weight
The regression equation is
milpgal = 55.8 - 0.105 horspwr - 0.00661 weight
150 cases used 5 cases contain missing values
Predictor   Coef        SE Coef     T        P
Constant    55.769      1.448       38.51    0.000
horspwr     -0.10489    0.02233     -4.70    0.000
weight      -0.0066143  0.0009015   -7.34    0.000

S = 3.901   R-Sq = 72.3%   R-Sq(adj) = 72.0%

Analysis of Variance
Source           DF    SS       MS       F         P
Regression         2   5850.0   2925.0   192.23    0.000
Residual Error   147   2236.8   15.2
Total            149   8086.8
All else equal, a one unit increase in the horsepower of the engine will reduce fuel mileage by
.10489 mpg. All else equal, an increase in the weight of the car by 100 pounds will reduce fuel
mileage by .66143 mpg.
b. Add number of cylinders
Regression Analysis: milpgal versus horspwr, weight, cylinder
The regression equation is
milpgal = 55.9 - 0.117 horspwr - 0.00758 weight + 0.726 cylinder
150 cases used 5 cases contain missing values

Predictor   Coef       SE Coef    T       P
Constant    55.925     1.443      38.77   0.000
horspwr     -0.11744   0.02344    -5.01   0.000
weight      -0.007576  0.001066   -7.10   0.000
cylinder    0.7260     0.4362      1.66   0.098

S = 3.878   R-Sq = 72.9%   R-Sq(adj) = 72.3%

Analysis of Variance
Source           DF    SS       MS       F         P
Regression         3   5891.6   1963.9   130.62    0.000
Residual Error   146   2195.1   15.0
Total            149   8086.8
All else equal, one additional cylinder in the engine of the auto will increase fuel mileage by .726
mpg. Note that this is not significant at the .05 level (p-value = .098).
Horsepower and weight still have the expected negative signs.
c. mpg as a function of weight, number of cylinders
Regression Analysis: milpgal versus weight, cylinder
The regression equation is
milpgal = 55.9 - 0.0104 weight + 0.121 cylinder
154 cases used 1 cases contain missing values
Predictor   Coef        SE Coef     T        P
Constant    55.914      1.525       36.65    0.000
weight      -0.0103680  0.0009779   -10.60   0.000
cylinder    0.1207      0.4311       0.28    0.780

S = 4.151   R-Sq = 68.8%   R-Sq(adj) = 68.3%

Analysis of Variance
Source           DF    SS       MS       F         P
Regression         2   5725.0   2862.5   166.13    0.000
Residual Error   151   2601.8   17.2
Total            153   8326.8
All else equal, an increase in the weight of the car by 100 pounds will reduce fuel mileage by
1.0368 mpg. All else equal, an increase in the number of cylinders in the engine will increase
mpg by .1207 mpg.
The explanatory power of the models has stayed roughly the same, with a slight drop for the
latest regression model.
Note that the coefficient on weight has stayed negative and significant (p-values of .000) in all
of the regression models, although the value of the coefficient has changed. The number of
cylinders is not significantly different from zero in either of the models where it was used as an
independent variable. There is likely some correlation between cylinders and the weight of the
car as well as between cylinders and the horsepower of the car.
d. mpg as a function of horsepower, weight, price
Regression Analysis: milpgal versus horspwr, weight, price
The regression equation is
milpgal = 54.4 - 0.0938 horspwr - 0.00735 weight + 0.000137 price
150 cases used 5 cases contain missing values

Predictor   Coef         SE Coef      T       P
Constant    54.369       1.454        37.40   0.000
horspwr     -0.09381     0.02177      -4.31   0.000
weight      -0.0073518   0.0008950    -8.21   0.000
price       0.00013721   0.00003950    3.47   0.001

S = 3.762   R-Sq = 74.5%   R-Sq(adj) = 73.9%

Analysis of Variance
Source           DF    SS       MS       F         P
Regression         3   6020.7   2006.9   141.82    0.000
Residual Error   146   2066.0   14.2
Total            149   8086.8
All else equal, an increase by one unit in the horsepower of the auto will reduce fuel mileage by
.09381 mpg. All else equal, an increase by 100 pounds in the weight of the auto will reduce fuel
mileage by .73518 mpg and an increase in the price of the auto by one dollar will increase fuel
mileage by .00013721 mpg.
e. Horsepower and weight remain significant negative independent variables throughout,
whereas the number of cylinders has been insignificant. The size of the coefficients changes as
the combination of independent variables changes. This is likely due to strong correlation
among the independent variables.
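The suspected correlation among the regressors can be quantified with variance inflation factors, which is what the VIF column in the outputs for 11-8 reports. A minimal sketch, assuming a hypothetical autos.csv holding the columns used above:

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("autos.csv").dropna()   # hypothetical file name
X = sm.add_constant(df[["horspwr", "weight", "cylinder", "price"]])

# VIF above roughly 5-10 signals that a regressor is largely explained
# by the other regressors, i.e., multicollinearity.
for i, name in enumerate(X.columns):
    if name != "const":
        print(name, variance_inflation_factor(X.values, i))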
11-8
a. Horsepower as a function of weight, cubic inches of displacement
Regression Analysis: horspwr versus weight, displace
The regression equation is
horspwr = 23.5 + 0.0154 weight + 0.157 displace
151 cases used 4 cases contain missing values
Predictor   Coef       SE Coef    T       P       VIF
Constant    23.496     7.341      3.20    0.002
weight      0.015432   0.004538   3.40    0.001   6.0
displace    0.15667    0.03746    4.18    0.000   6.0

S = 13.64   R-Sq = 69.2%   R-Sq(adj) = 68.8%

Analysis of Variance
Source           DF    SS      MS      F         P
Regression         2   61929   30964   166.33    0.000
Residual Error   148   27551   186
Total            150   89480
All else equal, a 100 pound increase in the weight of the car is associated with a 1.54 increase in
horsepower of the auto.
All else equal, a 10 cubic inch increase in the displacement of the engine is associated with a
1.57 increase in the horsepower of the auto.
b. Horsepower as a function of weight, displacement, number of cylinders
Regression Analysis: horspwr versus weight, displace, cylinder
The regression equation is
horspwr = 16.7 + 0.0163 weight + 0.105 displace + 2.57 cylinder
151 cases used 4 cases contain missing values
Predictor   Coef       SE Coef    T       P       VIF
Constant    16.703     9.449      1.77    0.079
weight      0.016261   0.004592   3.54    0.001    6.2
displace    0.10527    0.05859    1.80    0.074   14.8
cylinder    2.574      2.258      1.14    0.256    7.8

S = 13.63   R-Sq = 69.5%   R-Sq(adj) = 68.9%

Analysis of Variance
Source           DF    SS      MS      F         P
Regression         3   62170   20723   111.55    0.000
Residual Error   147   27310   186
Total            150   89480
All else equal, a 100 pound increase in the weight of the car is associated with a 1.63 increase in
horsepower of the auto.
All else equal, a 10 cubic inch increase in the displacement of the engine is associated with a
1.05 increase in the horsepower of the auto.
All else equal, one additional cylinder in the engine is associated with a 2.57 increase in the
horsepower of the auto.
Note that adding the number of cylinders as an independent variable has added little to the
explanatory power of the model; R² has increased only marginally. Engine displacement is no
longer significant at the .05 level (p-value of .074), and the estimated regression slope coefficient
on the number of cylinders is not significantly different from zero. This is due to the strong
correlation that exists between cubic inches of engine displacement and the number of cylinders.
c. Horsepower as a function of weight, displacement and fuel mileage
Regression Analysis: horspwr versus weight, displace, milpgal
The regression equation is
horspwr = 93.6 + 0.00203 weight + 0.165 displace - 1.24 milpgal
150 cases used 5 cases contain missing values
Predictor   Coef       SE Coef    T       P       VIF
Constant    93.57      15.33       6.11   0.000
weight      0.002031   0.004879    0.42   0.678   8.3
displace    0.16475    0.03475     4.74   0.000   6.1
milpgal     -1.2392    0.2474     -5.01   0.000   3.1

S = 12.55   R-Sq = 74.2%   R-Sq(adj) = 73.6%

Analysis of Variance
Source           DF    SS      MS      F         P
Regression         3   66042   22014   139.77    0.000
Residual Error   146   22994   157
Total            149   89036
All else equal, a 100 pound increase in the weight of the car is associated with a .203 increase in
horsepower of the auto.
All else equal, a 10 cubic inch increase in the displacement of the engine is associated with a
1.6475 increase in the horsepower of the auto.
All else equal, an increase in the fuel mileage of the vehicle by 1 mile per gallon is associated
with a reduction in horsepower of 1.2392.
Note that the negative coefficient on fuel mileage indicates the trade-off that is expected between
horsepower and fuel mileage. The displacement variable is significantly positive, as expected;
however, the weight variable is no longer significant. Again, one would expect high correlation
among the independent variables.
d. Horsepower as a function of weight, displacement, mpg and price
Regression Analysis: horspwr versus weight, displace, milpgal, price
The regression equation is
horspwr = 98.1 - 0.00032 weight + 0.175 displace - 1.32 milpgal +0.000138 price
150 cases used 5 cases contain missing values
Predictor   Coef        SE Coef     T       P       VIF
Constant    98.14       16.05        6.11   0.000
weight      -0.000324   0.005462    -0.06   0.953   10.3
displace    0.17533     0.03647      4.81   0.000    6.8
milpgal     -1.3194     0.2613      -5.05   0.000    3.5
price       0.0001379   0.0001438    0.96   0.339    1.3

S = 12.55   R-Sq = 74.3%   R-Sq(adj) = 73.6%

Analysis of Variance
Source           DF    SS      MS      F         P
Regression         4   66187   16547   105.00    0.000
Residual Error   145   22849   158
Total            149   89036
e. Explanatory power has marginally increased from the first model to the last. The
estimated coefficient on price is not significantly different from zero. Displacement and fuel
mileage have the expected signs. The coefficient on weight has the wrong sign; however, it is
not significantly different from zero (p-value of .953).
11-9
a. R² = 3.549/3.881 = .9145; therefore, 91.45% of the variability in work-hours of design
effort can be explained by the variation in the plane's top speed, weight and percentage
of parts in common with other models.
b. SSE = 3.881 - 3.549 = .332
c. R̄² = 1 - [.332/(27 - 4)] / [3.881/26] = .9033
d. R = √.9145 = .9563. This is the sample correlation between observed and predicted
values of the design effort.

11-10 a. R² = 1 - 88.2/162.1 = .4559; therefore, 45.59% of the variability in milk consumption
can be explained by the variations in weekly income and family size.
b. R̄² = 1 - [88.2/(30 - 3)] / [162.1/29] = .4156
c. R = √.4559 = .6752. This is the sample correlation between observed and predicted
values of milk consumption.

11-11 a. R² = 79.2/(79.2 + 45.9) = .6331; therefore, 63.31% of the variability in weight gain can
be explained by the variations in the average number of meals eaten, number of hours
exercised and number of beers consumed weekly.
b. R̄² = 1 - [45.9/(25 - 4)] / [125.1/24] = .5807
c. R = √.6331 = .7957. This is the sample correlation between observed and predicted
values of weight gained.
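These three quantities follow mechanically from the sums of squares; a small sketch makes the arithmetic explicit (the numbers plugged in reproduce 11-9):

import math

def fit_summaries(sst, ssr, n, K):
    """R-squared, adjusted R-squared, and R from SST, SSR, n, and K regressors."""
    sse = sst - ssr
    r2 = ssr / sst
    r2_adj = 1 - (sse / (n - K - 1)) / (sst / (n - 1))
    return r2, r2_adj, math.sqrt(r2)

print(fit_summaries(3.881, 3.549, n=27, K=3))   # -> (.9145, .9033, .9563)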
11-12 a.
Regression Analysis: Y profit versus X2 offices
The regression equation is
Y profit = 1.55 - 0.000120 X2 offices

Predictor   Coef          SE Coef      T       P
Constant    1.5460        0.1048       14.75   0.000
X2 offi     -0.00012033   0.00001434   -8.39   0.000

S = 0.07049   R-Sq = 75.4%   R-Sq(adj) = 74.3%

Analysis of Variance
Source           DF   SS        MS        F       P
Regression        1   0.34973   0.34973   70.38   0.000
Residual Error   23   0.11429   0.00497
Total            24   0.46402
b.
Regression Analysis: X1 revenue versus X2 offices
The regression equation is
X1 revenue = - 0.078 + 0.000543 X2 offices

Predictor   Coef         SE Coef      T       P
Constant    -0.0781      0.2975       -0.26   0.795
X2 offi     0.00054280   0.00004070   13.34   0.000

S = 0.2000   R-Sq = 88.5%   R-Sq(adj) = 88.1%

Analysis of Variance
Source           DF   SS       MS       F        P
Regression        1   7.1166   7.1166   177.84   0.000
Residual Error   23   0.9204   0.0400
Total            24   8.0370
c.
Regression Analysis: Y profit versus X1 revenue
The regression equation is
Y profit = 1.33 - 0.169 X1 revenue
Predictor   Coef       SE Coef   T       P
Constant    1.3262     0.1386     9.57   0.000
X1 reven    -0.16913   0.03559   -4.75   0.000

S = 0.1009   R-Sq = 49.5%   R-Sq(adj) = 47.4%

Analysis of Variance
Source           DF   SS        MS        F       P
Regression        1   0.22990   0.22990   22.59   0.000
Residual Error   23   0.23412   0.01018
Total            24   0.46402
d.
Regression Analysis: X2 offices versus X1 revenue
The regression equation is
X2 offices = 957 + 1631 X1 revenue
Predictor   Coef     SE Coef   T       P
Constant    956.9    476.5      2.01   0.057
X1 reven    1631.3   122.3     13.34   0.000

S = 346.8   R-Sq = 88.5%   R-Sq(adj) = 88.1%

Analysis of Variance
Source           DF   SS         MS         F        P
Regression        1   21388013   21388013   177.84   0.000
Residual Error   23   2766147    120267
Total            24   24154159
e. Using the residuals obtained from a, b: r = .6731
f. Using the residuals obtained from c, d: r = -.8562
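Parts e and f correlate the residual series from the earlier regressions, which yields the partial correlations. A sketch, assuming a hypothetical CSV with columns profit, revenue, and offices:

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("savings_and_loan.csv")   # hypothetical file name

e_a = smf.ols("profit ~ offices", data=df).fit().resid    # part a residuals
e_b = smf.ols("revenue ~ offices", data=df).fit().resid   # part b residuals
e_c = smf.ols("profit ~ revenue", data=df).fit().resid    # part c residuals
e_d = smf.ols("offices ~ revenue", data=df).fit().resid   # part d residuals

print(e_a.corr(e_b))   # partial correlation of profit and revenue, given offices
print(e_c.corr(e_d))   # partial correlation of profit and offices, given revenue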
11-13 a. b1 = .661, sb1 = .099, n = 27, t23,.05/.025 = 1.714, 2.069
90% CI: .661 ± 1.714(.099); .4913 up to .8307
95% CI: .661 ± 2.069(.099); .4562 up to .8658
b. b2 = .065, sb2 = .032, t23,.025/.005 = 2.069, 2.807
95% CI: .065 ± 2.069(.032); -.0012 up to .1312
99% CI: .065 ± 2.807(.032); -.0248 up to .1548
c. H0: β2 = 0, H1: β2 ≠ 0
t = .065/.032 = 2.031
t23,.05/.025 = 1.714, 2.069
Therefore, reject H0 at the 10% level but not at the 5% level
d. H0: β1 = β2 = 0, H1: At least one βi ≠ 0 (i = 1, 2)
F = [(3.311 - .332)/2] / (.332/23) = 103.188
Therefore, reject H0 at the 1% level since F = 103.188 > 5.66 = F2,23,.01
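The interval endpoints and test decisions above come from t-table lookups; scipy can supply the same fractiles. A minimal sketch reproducing part a:

from scipy import stats

b1, s_b1, df_error = .661, .099, 23

for conf in (.90, .95):
    # two-sided critical value t_{df, (1-conf)/2}
    t_crit = stats.t.ppf(1 - (1 - conf) / 2, df_error)
    print(conf, b1 - t_crit * s_b1, b1 + t_crit * s_b1)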
11-14 a. H0: β1 = 0; H1: β1 > 0
t = .052/.023 = 2.26
t27,.025/.01 = 2.052, 2.473
Therefore, reject H0 at the 2.5% level but not at the 1% level
b. t27,.05/.025/.005 = 1.703, 2.052, 2.771
90% CI: 1.14 ± 1.703(.35); .5439 up to 1.7361
95% CI: 1.14 ± 2.052(.35); .4218 up to 1.8582
99% CI: 1.14 ± 2.771(.35); .1701 up to 2.1099
11-15 a. H0: β2 = 0; H1: β2 < 0
t = -1.345/.565 = -2.381
t21,.025/.01 = 2.080, 2.518
Therefore, reject H0 at the 2.5% level but not at the 1% level
b. H0: β3 = 0; H1: β3 > 0
t = .613/.243 = 2.523
t21,.01/.005 = 2.518, 2.831
Therefore, reject H0 at the 1% level but not at the .5% level
c. t21,.05/.025/.005 = 1.721, 2.080, 2.831
90% CI: .653 ± 1.721(.189); .3277 up to .9783
95% CI: .653 ± 2.080(.189); .2599 up to 1.0461
99% CI: .653 ± 2.831(.189); .1179 up to 1.1881
11-16 a. H0: β3 = 0, H1: β3 < 0
t = -.000191/.000446 = -.428
t16,.10 = -1.337
Therefore, do not reject H0 at the 20% level
b. H0: β1 = β2 = β3 = 0, H1: At least one βi ≠ 0 (i = 1, 2, 3)
F = (16/3)[.71/(1 - .71)] = 13.057, F3,16,.01 = 5.29
Therefore, reject H0 at the 1% level
11-17 a. t85,.025/.005 = 2.000, 2.660
95% CI: 7.878 ± 2.000(1.809); 4.260 up to 11.496
99% CI: 7.878 ± 2.660(1.809); 3.0661 up to 12.6899
b. H0: β2 = 0; H1: β2 > 0, t = .003666/.001344 = 2.73, t85,.005 = 2.660
Therefore, reject H0 at the .5% level
11-18 a. All else being equal, an extra $1 in mean per capita personal income leads to an
expected extra $.04 of net revenue per capita from the lottery.
b. b2 = .8772, sb2 = .3107, n = 29, t24,.025 = 2.064
95% CI: .8772 ± 2.064(.3107); .2359 up to 1.5185
c. H0: β3 = 0, H1: β3 < 0
t = -365.01/263.88 = -1.383
t24,.10/.05 = -1.318, -1.711
Therefore, reject H0 at the 10% level but not at the 5% level
11-19 a. b3 = 7.653, sb3 = 3.082, n = 19, t15,.025 = 2.131
95% CI: 7.653 ± 2.131(3.082); 1.0853 up to 14.2207
b. H0: β2 = 0, H1: β2 > 0, t = 19.729/8.992 = 2.194
t15,.025/.01 = 2.131, 2.602
Therefore, reject H0 at the 2.5% level but not at the 1% level
11-20 a. n = 19, b1 = .2, sb1 = .0092, t16,.025 = 2.12
95% CI: .2 ± 2.12(.0092); .1805 up to .2195
b. H0: β2 = 0, H1: β2 < 0, t = -.1/.084 = -1.19
t16,.10 = -1.337. Therefore, do not reject H0 at the 10% level
11-21 a. n = 14, b1 = .101, sb1 = .023, t10,.05 = 1.812
90% CI: .101 ± 1.812(.023); .0593 up to .1427
b. H0: β2 = 0, H1: β2 < 0, t = -.244/.08 = -3.05
t10,.01/.005 = -2.764, -3.169
Therefore, reject H0 at the 1% level but not at the .5% level
c. H0: β3 = 0, H1: β3 > 0
t = .057/.00925 = 6.162
t10,.005 = 3.169
Therefore, reject H0 at the .5% level
11-22 a. n = 39, b5 = .0495, sb5 = .01172, t30,.005 = 2.750
99% CI: .0495 ± 2.750(.01172); .0173 up to .0817
b. H0: β4 = 0, H1: β4 ≠ 0
t = .48122/.77954 = .617
t30,.10 = 1.31
Therefore, do not reject H0 at the 20% level
c. H0: β7 = 0, H1: β7 ≠ 0
t = .00645/.00306 = 2.108
t30,.025/.01 = 2.042, 2.457
Therefore, reject H0 at the 5% level but not at the 2% level
SSE /(n  k  1)
SST /(n  1)
n 1
(1  R 2 )
= 1
n  k 1
n 1
k
R2 
=
n  k 1
n  k 1
2
(n  1) R  k
=
n  k 1
(n  1) R 2  k
(n  k  1) R 2  k
b. Since R 2 
, then R 2 
n  k 1
n 1
SSR / k
n  k  1 SSR / SST
c.

SSE /(n  k  1)
k
SSE / SST
11-23 a. R 2  1 
=
n  k 1 R2
k
1  R2
201
202
Solutions Manual for Statistics for Business & Economics, 5 th Edition
n  k  1 [(n  k  1) R 2  k ]/(n 1)
k
[n  1  (n  k  1) R 2  k ]/(n 1)
n  k  1 (n  k  1) R 2  k
=
k
(n  k  1)(1  R 2 )
n  k 1 R 2  k
=
k
(1  R 2 )
=
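The identities in parts a through c are easy to verify numerically for arbitrary n, K, and R²; a short sketch:

n, K, r2 = 30, 4, .62

r2_adj = ((n - 1) * r2 - K) / (n - K - 1)           # part a
r2_back = ((n - K - 1) * r2_adj + K) / (n - 1)      # part b inverts part a
F_from_r2 = (n - K - 1) / K * r2 / (1 - r2)         # part c, from R-squared
F_from_adj = ((n - K - 1) * r2_adj + K) / (K * (1 - r2_adj))

print(abs(r2_back - r2) < 1e-12)        # True: b recovers R-squared
print(F_from_r2, F_from_adj)            # identical F values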
11-24 a. SST = 3.881, SSR = 3.549, SSE = .332
H0: β1 = β2 = β3 = 0, H1: At least one βi ≠ 0 (i = 1, 2, 3)
F = (3.549/3) / (.332/23) = 81.955
F3,23,.01 = 4.76
Therefore, reject H0 at the 1% level
b. Analysis of Variance table:

Source of variation   Sum of Squares   Degrees of Freedom   Mean Squares   F-Ratio
Regression            3.549            3                    1.183          81.955
Error                 .332             23                   .014435
Total                 3.881            26
11-25 H0: β1 = β2 = 0, H1: At least one βi ≠ 0 (i = 1, 2)
F = 45[.463 + (2/45)] / {2[1 - .463]} = 21.262
F2,45,.01 = 5.18
Therefore, reject H0 at the 1% level
11-26 a. SST = 162.1, SSR = 73.9, SSE = 88.2
H0: β1 = β2 = 0, H1: At least one βi ≠ 0 (i = 1, 2)
F = (73.9/2) / (88.2/27) = 11.311, F2,27,.01 = 5.49
Therefore, reject H0 at the 1% level
b.

Source of variation   Sum of Squares   Degrees of Freedom   Mean Squares   F-Ratio
Regression            73.9             2                    36.95          11.311
Error                 88.2             27                   3.266667
Total                 162.1            29
11-27 a. SST = 125.1, SSR = 79.2, SSE = 45.9
H0: β1 = β2 = β3 = 0, H1: At least one βi ≠ 0 (i = 1, 2, 3)
F = (79.2/3) / (45.9/21) = 12.078, F3,21,.01 = 4.87
Therefore, reject H0 at the 1% level
b.

Source of variation   Sum of Squares   Degrees of Freedom   Mean Squares   F-Ratio
Regression            79.2             3                    26.4           12.078
Error                 45.9             21                   2.185714
Total                 125.1            24
11-28 a. H0: β1 = β2 = β3 = β4 = 0, H1: At least one βi ≠ 0 (i = 1, 2, 3, 4)
F = (24/4)[.51/(1 - .51)] = 6.2449
F4,24,.01 = 4.22
Therefore, reject H0 at the 1% level
11-29 a. H0: β1 = β2 = β3 = 0, H1: At least one βi ≠ 0 (i = 1, 2, 3)
F = (15/3)[.84/(1 - .84)] = 26.25
F3,15,.01 = 5.42
Therefore, reject H0 at the 1% level
11-30 a. H0: β1 = β2 = 0, H1: At least one βi ≠ 0 (i = 1, 2)
F = 16[.96 + (2/16)] / {2[1 - .96]} = 217, F2,16,.01 = 6.23
Therefore, reject H0 at the 1% level
11-31 H0: β1 = β2 = β3 = β4 = β5 = β6 = β7 = 0, H1: At least one βi ≠ 0 (i = 1, ..., 7)
F = 31[.3572 + (7/31)] / {7[1 - .3572]} = 4.017
F7,30,.01 = 3.30
Therefore, reject H0 at the 1% level
11-32 [(SSE* - SSE)/K1] / [SSE/(n - K - 1)] = [(n - K - 1)/K1] · [(SSE* - SSE)/SST] / (SSE/SST)
= [(n - K - 1)/K1] · [(1 - R²*) - (1 - R²)] / (1 - R²)
= [(n - K - 1)/K1] · (R² - R²*) / (1 - R²)
11-33 Let β3 be the coefficient on the number of preschool children in the household
H0: β3 = 0, H1: β3 ≠ 0
F = [(88.2 - 83.7)/1] / (83.7/26) = 1.398
F1,26,.05 = 4.23
Therefore, do not reject H0 at the 5% level
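The restricted-versus-unrestricted F statistic used here can be computed directly from the two error sums of squares; a sketch with the quoted numbers:

from scipy import stats

sse_restricted, sse_full = 88.2, 83.7
k1, df_error = 1, 26        # restrictions tested; error df of the full model

F = ((sse_restricted - sse_full) / k1) / (sse_full / df_error)
print(F)                                 # ~1.398
print(stats.f.ppf(.95, k1, df_error))    # ~4.23, the 5% critical value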
11-34 Ŷ = 7.35 + .653(20) - 1.345(10) + .613(6) = 10.638 pounds
11-35 Ŷ = -.025 + .052(6) + 1.14(4) = 4.847 quarts of milk per week
11-36 Ŷ = .578 + .661(1) + .065(7) - .018(50) = .794 million worker-hours
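Each of these point predictions is just the intercept plus a dot product of coefficients and regressor values; a minimal sketch reproducing all three:

def predict(b0, coefs, xs):
    """Point prediction: intercept plus sum of coefficient-times-value terms."""
    return b0 + sum(b * x for b, x in zip(coefs, xs))

print(predict(7.35,  [.653, -1.345, .613], [20, 10, 6]))   # 10.638 pounds
print(predict(-.025, [.052, 1.14],         [6, 4]))        # 4.847 quarts
print(predict(.578,  [.661, .065, -.018],  [1, 7, 50]))    # .794 million hours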
11-37 a. All else equal, a one square foot increase in the lot size is expected to increase the
selling price of the house by $1.468
b. 98.43% of the variation in the selling price of homes can be explained by the variation
in house size, lot size, number of bedrooms and number of bathrooms
c. H 0 :  4  0, H1 : 4  0
2701.1
t
 1.353
1996.2
t15,.10 /.05 = 1.341, 1.753
Therefore, reject H 0 at the 10% level but not at the 5% level
d. Yˆ  1998.5  22.352(1250)  1.4686(4700)  6767.3(3)  2701.1(1.5)  61194.47
11-38 There are many possible answers. Relationships that can be approximated by a non-linear
quadratic model include many supply functions, production functions and cost functions
including average cost versus the number of units produced.
11-39 To estimate the function with linear least squares, solve the constraint β1 + β2 = 2 for β2.
Since β2 = 2 - β1, substitute into the equation and rearrange algebraically:
Y = β0 + β1X1 + (2 - β1)X2 + β3X3
Y = β0 + β1X1 + 2X2 - β1X2 + β3X3
Y = β0 + β1[X1 - X2] + 2X2 + β3X3
Y - 2X2 = β0 + β1[X1 - X2] + β3X3
Conduct the variable transformations and estimate the model using least squares.
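A sketch of the transformation in code, assuming a hypothetical data file with columns y, x1, x2, x3; β2 is recovered afterward from the constraint:

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("constrained.csv")        # hypothetical file name
df["y_star"] = df["y"] - 2 * df["x2"]      # Y* = Y - 2 X2
df["x_star"] = df["x1"] - df["x2"]         # X* = X1 - X2

fit = smf.ols("y_star ~ x_star + x3", data=df).fit()
b1 = fit.params["x_star"]
print(b1, 2 - b1)                          # estimates of beta1 and beta2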
11-40 Linear model:
Regression Plot: Salary = 20544.5 + 616.113 Experience
S = 3117.89, R-Sq = 78.0%, R-Sq(adj) = 77.9%
[Scatterplot of Salary (20000-50000) versus Experience (0-40) with the fitted line]
Quadratic model:
Regression Plot: Salary = 18683.8 + 910.807 Experience - 8.21382 Experience**2
S = 3027.17, R-Sq = 79.4%, R-Sq(adj) = 79.1%
[Scatterplot of Salary versus Experience with the fitted quadratic curve]
Cubic model:
Regression Plot: Salary = 20881.1 + 344.484 Experience + 26.4323 Experience**2 - 0.582553 Experience**3
S = 2982.43, R-Sq = 80.2%, R-Sq(adj) = 79.8%
[Scatterplot of Salary versus Experience with the fitted cubic curve]
All three of the models appear to fit the data well. The cubic model appears to fit the data the
best as the standard error of the estimate is lowest. In addition, explanatory power is marginally
higher for the cubic model than the other models.
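The three fits can be reproduced with one loop, assuming a hypothetical salary.csv with columns Salary and Experience; I(...) builds the quadratic and cubic terms inside the formula:

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("salary.csv")   # hypothetical file name

for formula in (
    "Salary ~ Experience",
    "Salary ~ Experience + I(Experience**2)",
    "Salary ~ Experience + I(Experience**2) + I(Experience**3)",
):
    fit = smf.ols(formula, data=df).fit()
    # compare models on the standard error of the estimate and adjusted R-squared
    print(formula, fit.mse_resid**.5, fit.rsquared_adj)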
11-41 a. All else equal, a 1% increase in annual consumption expenditures will be associated
with a 1.1556% increase in expenditures on vacation travel.
All else equal, a 1% increase in the size of the household will be associated with a
.4408% decrease in expenditures on vacation travel.
b. 16.8% of the variation in vacation travel expenditures can be explained by the
variations in the log of total consumption expenditures and log of the number of
members in the household
c. 1.1556 ± 1.96(.0546) = 1.049 up to 1.2626
d. H0: β2 = 0, H1: β2 < 0
t = -.4408/.0490 = -8.996
Therefore, reject H0 at the 1% level
11-42 a. A 1% increase in median income leads to an expected .68% increase in store size.
b. H 0 : 1  0, H1 : 1  0
.68
t
 8.831
.077
Therefore, reject H 0 at the 1% level
11-43 a. All else equal, a 1% increase in the price of beef will be associated with a decrease of
.529% in the tons of beef consumed annually in the U.S.
b. All else equal, a 1% increase in the price of pork will be associated with an increase
of .217% in the tons of beef consumed annually in the U.S.
c. H 0 :  4  0, H1 : 4  0
.416
t
 2.552
.163
t25,.01 = 2.485
Therefore, reject H 0 at the 1% level
d. H 0 : 1  2  3  4  0, H1 : At least one i  0, (i  1, 2,3, 4)
n  k 1 R2
25 .683

 13.466
2
k
1 R
4 1  .683
F 4,25,.01 = 4.18
Therefore, reject H 0 at the 1% level
e. If an important independent variable has been omitted, there may be specification bias.
The regression coefficients produced for the misspecified model would be misleading.
F
11-44
Results for: GermanImports.xls
Regression Analysis: LogYt versus LogX1t, LogX2t
The regression equation is
LogYt = - 4.07 + 1.36 LogX1t + 0.101 LogX2t
Predictor   Coef      SE Coef   T        P       VIF
Constant    -4.0709   0.3100    -13.13   0.000
LogX1t      1.35935   0.03005    45.23   0.000   4.9
LogX2t      0.10094   0.05715     1.77   0.088   4.9

S = 0.04758   R-Sq = 99.7%   R-Sq(adj) = 99.7%

Analysis of Variance
Source           DF   SS       MS       F         P
Regression        2   21.345   10.673   4715.32   0.000
Residual Error   28   0.063    0.002
Total            30   21.409

Source   DF   Seq SS
LogX1t    1   21.338
LogX2t    1   0.007
11-45 Estimate a Cobb-Douglas production function with three independent variables:
Y = β0 · X1^β1 · X2^β2 · X3^β3 · ε, where X1 = capital, X2 = labor and X3 = basic research
Taking the log of both sides of the equation yields:
log(Y) = log(β0) + β1 log(X1) + β2 log(X2) + β3 log(X3) + log(ε)
Using this form, now regress the log of Y on the logs of the three independent
variables and obtain the estimated regression slope coefficients.
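A sketch of the estimation, assuming a hypothetical production.csv with columns output, capital, labor, and research; the log transform linearizes the Cobb-Douglas form so ordinary least squares applies:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("production.csv")   # hypothetical file name
fit = smf.ols(
    "np.log(output) ~ np.log(capital) + np.log(labor) + np.log(research)",
    data=df,
).fit()
print(fit.params)   # slopes estimate the elasticities beta1, beta2, beta3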
11-46
11-47 The interpretation of the dummy variable is that, for a given difference between the spot
price in the current year and the OPEC price in the previous year, the difference between the
OPEC price in the current year and the OPEC price in the previous year is an estimated $5.22
higher in 1974, during the oil embargo, than in other years.
11-48 a. All else being equal, expected selling price is higher by $3,219 if house has a fireplace.
b. All else being equal, expected selling price is higher by $2,005 if house has brick
siding.
c. 95% CI: 3219 ± 1.96(947) = $1,362.88 up to $5,075.12
d. H0: β5 = 0, H1: β5 > 0
t = 2005/768 = 2.611
t809,.005 = 2.576
Therefore, reject H0 at the .5% level
11-49 a. All else being equal, the price-earnings ratio is higher by 1.23 for a regional bank than
a money center bank
b. H0: β2 = 0, H1: β2 ≠ 0
t = 1.23/.496 = 2.48
t29,.01/.005 = 2.462, 2.756
Therefore, reject H0 at the 2% level but not at the 1% level
c. H0: β1 = β2 = 0, H1: At least one βi ≠ 0 (i = 1, 2)
F = [(n - K - 1)/K] · R²/(1 - R²) = (29/2)[.37/(1 - .37)] = 8.516
F2,29,.05 = 3.33
Therefore, reject H0 at the 5% level
11-50 35.6% of the variation in overall performance in law school can be explained by the
variation in undergraduate gpa, scores on the LSAT and whether the student's letters of
recommendation are unusually strong. The overall model is significant since we can reject the
null hypothesis that the model has no explanatory power in favor of the alternative hypothesis
that the model has significant explanatory power. The individual regression coefficients that are
significantly different from zero include the scores on the LSAT and whether the student's letters
of recommendation were unusually strong. The coefficient on undergraduate gpa was not found
to be significant at the 5% level.
11-51 a. All else equal, the annual salary of a chief justice who can be removed is $5,793
higher than if the chief justice cannot be removed
b. All else equal, the annual salary of a chief justice elected on partisan ballots is $3,100
lower than if the chief justice is not elected in this manner
c. H 0 : 5  0, H1 : 5  0
5793
t
 1.9996
2897
t43,.05/.025 = 1.68, 2.016
Therefore, reject H 0 at the 5% level but not at the 2.5% level
d. H 0 : 6  0, H1 : 6  0
3100
t
 1.76
1761
t43,.05/.025 = -1.68, -20.16
Therefore, reject H 0 at the 5% level but not at the 2.5% level
e. t43,.05/.025 = 2.016
95% CI: 547  2.016(124.3), 296.41 up to 797.59
11-52
Results for: Student Performance.xls
Regression Analysis: Y versus X1, X2, X3, X4, X5
The regression equation is
Y = 2.00 + 0.0099 X1 + 0.0763 X2 - 0.137 X3 + 0.064 X4 + 0.138 X5
Predictor   Coef       SE Coef   T       P       VIF
Constant    1.997      1.273      1.57   0.132
X1          0.00990    0.01654    0.60   0.556   1.3
X2          0.07629    0.05654    1.35   0.192   1.2
X3          -0.13652   0.06922   -1.97   0.062   1.1
X4          0.0636     0.2606     0.24   0.810   1.4
X5          0.13794    0.07521    1.83   0.081   1.1

S = 0.5416   R-Sq = 26.5%   R-Sq(adj) = 9.0%

Analysis of Variance
Source           DF   SS       MS       F      P
Regression        5   2.2165   0.4433   1.51   0.229
Residual Error   21   6.1598   0.2933
Total            26   8.3763
The model is not significant (p-value of the F-test = .229). The model only explains 26.5% of
the variation in gpa with the hours spent studying, hours spent preparing for tests, hours spent in
bars, whether or not students take notes or mark highlights when reading texts, and the average
number of credit hours taken per semester. The only independent variables that are marginally
significant (10% level but not the 5% level) are the number of hours spent in bars and whether
or not students take notes or mark highlights when reading texts. The other independent
variables are not significant at common levels of alpha.
11-53 a. All else equal, the average rating of a course is 6.21 units higher if a visiting lecturer is
brought in than if otherwise.
b. H 0 :  4  0, H1 : 4  0
6.21
t
 1.73
3.59
t20,.05 = 1.725
Therefore, reject H 0 at the 5% level
c.56.9% of the variation in the average course rating can be explained by the variation in
the percentage of time spent in group discussions, the dollars spent on preparing the
course materials, the dollars spent on food and drinks, and whether a guest lecturer is
brought in.
H 0 : 1  2  3  4  0, H1 : At least one i  0, (i  1, 2,3, 4)
n  k 1 R2
20 .569
F

 6.6
2
k
1 R
4 1  .569
F 4,20,.01 = 4.43
Therefore, reject H 0 at the 1% level
d. t20,.025 = 2.086
95% CI: .52  2.086(.21), .0819 up to .9581
11-54 34.4% of the variation in a test on understanding college economics can be explained by
the student’s gpa, the teacher that taught the course, the gender of the student, the pre-test
score, the number of credit hours completed and the age of the student. The regression
model has significant explanatory power:
H 0 : 1  2  3  4  5  6  7  0, H1 : At least one i  0, (i  1, 2,3, 4,5, 6, 7)
F
n  k 1 R2
342 .344

 25.62
2
k
1 R
7 1  .344
11-55 a. Begin the analysis with the correlation matrix – identify important independent
variables as well as correlations between the independent variables
Correlations: Salary, Experience, yearsenior, Gender_1F

             Salary   Experien  yearseni
Experien      0.883
              0.000
yearseni      0.777    0.674
              0.000    0.000
Gender_1F    -0.429   -0.378    -0.292
              0.000    0.000     0.000
(first entry: correlation; second entry: p-value)
Regression Analysis: Salary versus Experience, yearsenior, Gender_1F
The regression equation is
Salary = 22644 + 437 Experience + 415 yearsenior - 1443 Gender_1F
Predictor   Coef      SE Coef   T       P       VIF
Constant    22644.1   521.8     43.40   0.000
Experien    437.10    31.41     13.92   0.000   2.0
yearseni    414.71    55.31      7.50   0.000   1.8
Gender_1    -1443.2   519.8     -2.78   0.006   1.2

S = 2603   R-Sq = 84.9%   R-Sq(adj) = 84.6%

Analysis of Variance
Source           DF    SS           MS           F        P
Regression         3   5559163505   1853054502   273.54   0.000
Residual Error   146   989063178    6774405
Total            149   6548226683
84.9% of the variation in annual salary (in dollars) can be explained by the variation in the years
of experience, the years of seniority and the gender of the employee. All of the variables are
significant at the .01 level of significance (p-values of .000, .000 and .006 respectively). The
F-test of the significance of the overall model shows that we reject H0 that all of the slope
coefficients are jointly equal to zero in favor of H1 that at least one slope coefficient is not equal
to zero. The F-test yielded a p-value of .000.
b. H 0 : 3  0, H1 : 3  0
1443.2
t
 2.78 , t146,.01 = 2.326
519.8
Therefore, reject H 0 at the 1% level. And conclude that the annual salaries for females
are statistically significantly lower than they are for males.
c. Add an interaction term and test for the significance of the slope coefficient on the
interaction term.
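A sketch of part c, assuming the same hypothetical data file and column names as above; the Experience:Gender_1F term is the interaction whose coefficient would be tested:

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("employees.csv")   # hypothetical file name
fit = smf.ols(
    "Salary ~ Experience + yearsenior + Gender_1F + Experience:Gender_1F",
    data=df,
).fit()
# A significant interaction coefficient would indicate that the returns to
# experience differ between female and male employees.
print(fit.summary())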
11-56 Analyze the correlation matrix first:
Correlations: Sales Pizza1, Price Pizza1, Promotion Pi, Sales B2, Price B2, Sale
            Sales Pi  Price Pi  Promotio  Sales B2  Price B2  Sales B3  Price B3  Sales B4
Price Pi    -0.263
             0.001
Promotio     0.570    -0.203
             0.000     0.011
Sales B2     0.136     0.170     0.031
             0.090     0.034     0.700
Price B2     0.118     0.507     0.117    -0.370
             0.143     0.000     0.146     0.000
Sales B3     0.014     0.174     0.045     0.103     0.199
             0.862     0.029     0.581     0.199     0.013
Price B3     0.179     0.579     0.034     0.162     0.446    -0.316
             0.026     0.000     0.675     0.043     0.000     0.000
Sales B4     0.248     0.102     0.123     0.310     0.136     0.232     0.081
             0.002     0.205     0.127     0.000     0.091     0.004     0.313
Price B4     0.177     0.509     0.124     0.229     0.500     0.117     0.523    -0.158
             0.027     0.000     0.124     0.004     0.000     0.147     0.000     0.049
(first entry: correlation; second entry: p-value)
Strongest correlation with sales of Pizza1 is the type of promotion. Price of Pizza1 has the
correct ‘negative’ association with sales. Prices of the competing brands are expected to be
positive since they are substitutes with Pizza1; however, the sales of the competing brands are
expected to be negatively related to the sales of Pizza1.
Regression Analysis: Sales Pizza1 versus Price Pizza1, Promotion Pi, ...
The regression equation is
Sales Pizza1 = - 6406 - 24097 Price Pizza1 + 1675 Promotion Pizza1
+ 0.0737 Sales B2 + 4204 Price B2 + 0.177 Sales B3 + 18003 Price B3
+ 0.345 Sales B4 + 11813 Price B4
Predictor   Coef      SE Coef   T       P       VIF
Constant    -6406     2753      -2.33   0.021
Price Pi    -24097    3360      -7.17   0.000   2.5
Promotio    1674.6    283.9      5.90   0.000   1.2
Sales B2    0.07370   0.08281    0.89   0.375   3.1
Price B2    4204      4860       0.87   0.388   4.3
Sales B3    0.17726   0.09578    1.85   0.066   1.9
Price B3    18003     4253       4.23   0.000   3.0
Sales B4    0.3453    0.1392     2.48   0.014   1.9
Price B4    11813     6151       1.92   0.057   3.0

S = 3700   R-Sq = 54.9%   R-Sq(adj) = 52.4%

Analysis of Variance
Source           DF    SS           MS          F       P
Regression         8   2447394873   305924359   22.35   0.000
Residual Error   147   2012350019   13689456
Total            155   4459744891
The multiple regression with all of the independent variables indicates that 54.9% of the
variation in the sales of Pizza1 can be explained by all of the independent variables. However,
not all of the independent variables are significantly different from zero. It appears that neither
the price of Brand 2, nor the sales of Brand 2 has a statistically significant effect on Pizza1.
Eliminating those variables that are insignificant yields:
Regression Analysis: Sales Pizza1 versus Price Pizza1, Promotion Pi, ...
The regression equation is
Sales Pizza1 = - 6546 - 23294 Price Pizza1 + 1701 Promotion Pizza1
+ 0.197 Sales B3 + 18922 Price B3 + 0.418 Sales B4 + 15152 Price B4
Predictor   Coef      SE Coef   T       P       VIF
Constant    -6546     2676      -2.45   0.016
Price Pi    -23294    3210      -7.26   0.000   2.3
Promotio    1701.0    279.9      6.08   0.000   1.2
Sales B3    0.19737   0.09234    2.14   0.034   1.8
Price B3    18922     4092       4.62   0.000   2.8
Sales B4    0.4183    0.1137     3.68   0.000   1.3
Price B4    15152     4978       3.04   0.003   2.0

S = 3686   R-Sq = 54.6%   R-Sq(adj) = 52.8%

Analysis of Variance
Source           DF    SS           MS          F       P
Regression         6   2435527670   405921278   29.88   0.000
Residual Error   149   2024217221   13585350
Total            155   4459744891
All of the variables are now significant at the .05 level, and all have the correct sign except the
sales of brands 3 and 4.
11-57 through 11-59 Reports can be written by following the extended Case Study on the data
file Cotton – see Section 11.9
11-60
Regression Analysis: y_deathrate versus x1_totmiles, x2_avgspeed
The regression equation is
y_deathrate = - 2.97 - 0.00447 x1_totmiles + 0.219 x2_avgspeed
Predictor   Coef        SE Coef    T       P       VIF
Constant    -2.969      3.437      -0.86   0.416
x1_totmi    -0.004470   0.001549   -2.89   0.023   11.7
x2_avgsp    0.21879     0.08391     2.61   0.035   11.7

S = 0.1756   R-Sq = 55.1%   R-Sq(adj) = 42.3%

Analysis of Variance
Source           DF   SS        MS        F      P
Regression        2   0.26507   0.13254   4.30   0.061
Residual Error    7   0.21593   0.03085
Total             9   0.48100
55.1% of the variation in death rates can be explained by the variation in total miles traveled and
in average travel speed. The overall model is significant at the 10% but not the 5% level since
the p-value of the F-test is .061.
All else equal, the average speed variable has the expected sign since as average speed increases,
the death rate also increases. The total miles traveled variable is negative which indicates that
the more miles traveled, the lower the death rate. Both of the independent variables are
significant at the 5% level (p-values of .023 and .035 respectively). There appears to be some
correlation between the independent variables.
Due to the high correlation between the independent variables, an alternative model using a
quadratic model is as follows:
Regression Analysis: y_deathrate versus x1_totmiles, x1_totsquared
The regression equation is
y_deathrate = - 6.54 + 0.0268 x1_totmiles -0.000015 x1_totsquared
Predictor   Coef          SE Coef      T       P       VIF
Constant    -6.539        1.296        -5.04   0.001
x1_totmi    0.026800      0.002835      9.45   0.000   285.5
x1_totsq    -0.00001480   0.00000153   -9.68   0.000   285.5

S = 0.06499   R-Sq = 93.9%   R-Sq(adj) = 92.1%

Analysis of Variance
Source           DF   SS        MS        F       P
Regression        2   0.45143   0.22572   53.44   0.000
Residual Error    7   0.02957   0.00422
Total             9   0.48100

Source     DF   Seq SS
x1_totmi    1   0.05534
x1_totsq    1   0.39609
11-61
Regression Analysis: y_FemaleLFPR versus x1_income, x2_yrsedu, ...
The regression equation is
y_FemaleLFPR = 0.2 +0.000406 x1_income + 4.84 x2_yrsedu - 1.55 x3_femaleun
Predictor   Coef        SE Coef     T       P       VIF
Constant    0.16        34.91        0.00   0.996
x1_incom    0.0004060   0.0001736    2.34   0.024   1.2
x2_yrsed    4.842       2.813        1.72   0.092   1.5
x3_femal    -1.5543     0.3399      -4.57   0.000   1.3

S = 3.048   R-Sq = 54.3%   R-Sq(adj) = 51.4%

Analysis of Variance
Source           DF   SS       MS       F       P
Regression        3   508.35   169.45   18.24   0.000
Residual Error   46   427.22   9.29
Total            49   935.57
11-62
Regression Analysis: y_money versus x1_pcincome, x2_ir
The regression equation is
y_money = - 1158 + 0.253 x1_pcincome - 19.6 x2_ir
Predictor   Coef      SE Coef   T       P       VIF
Constant    -1158.4   587.9     -1.97   0.080
x1_pcinc    0.25273   0.03453    7.32   0.000   1.3
x2_ir       -19.56    21.73     -0.90   0.391   1.3

S = 84.93   R-Sq = 89.8%   R-Sq(adj) = 87.5%

Analysis of Variance
Source           DF   SS       MS       F       P
Regression        2   570857   285429   39.57   0.000
Residual Error    9   64914    7213
Total            11   635771

Source     DF   Seq SS
x1_pcinc    1   565012
x2_ir       1   5845
11-63
Regression Analysis: y_manufgrowt versus x1_aggrowth, x2_exportgro, ...
The regression equation is
y_manufgrowth = 2.15 + 0.493 x1_aggrowth + 0.270 x2_exportgrowth
- 0.117 x3_inflation
Predictor   Coef       SE Coef   T       P       VIF
Constant    2.1505     0.9695     2.22   0.032
x1_aggro    0.4934     0.2020     2.44   0.019   1.0
x2_expor    0.26991    0.06494    4.16   0.000   1.0
x3_infla    -0.11709   0.05204   -2.25   0.030   1.0

S = 3.624   R-Sq = 39.3%   R-Sq(adj) = 35.1%

Analysis of Variance
Source           DF   SS       MS       F      P
Regression        3   373.98   124.66   9.49   0.000
Residual Error   44   577.97   13.14
Total            47   951.95

Source     DF   Seq SS
x1_aggro    1   80.47
x2_expor    1   227.02
x3_infla    1   66.50
11-64 The method of least squares regression yields estimators that are BLUE – Best Linear
Unbiased Estimators. This result holds when the assumptions regarding the behavior of the
error term are true. BLUE estimators are the most efficient (best) estimators out of the class
of all unbiased estimators. The advent of cheap computing power has dramatically increased
the use of the method of least squares.
11-65 The analysis of variance table identifies how the total variability of the dependent
variable (SST) is split up between the portion of variability that is explained by the
regression model (SSR) and the part that is unexplained (SSE). The Coefficient of
Determination (R2) is derived as the ratio of SSR to SST. The analysis of variance table
also computes the F statistic for the test of the significance of the overall regression –
whether all of the slope coefficients are jointly equal to zero. The associated p-value is also
generally reported in this table.
11-66 a. False. If the regression model does not explain a large enough portion of the
variability of the dependent variable, then the error sum of squares can be larger than the
regression sum of squares.
b. False – the sum of several simple linear regressions will not equal a multiple regression
since the assumption of ‘all else equal’ will be violated in the simple linear regressions.
The multiple regression ‘holds’ all else equal in calculating the partial effect that a
change in one of the independent variables has on the dependent variable.
c. True
d. False – While the regular coefficient of determination (R2) cannot be negative, the
adjusted coefficient of determination R 2 can become negative. If the independent
variables added into a regression equation have very little explanatory power, the loss
of degrees of freedom may more than offset the added explanatory power.
e. True
11-67 If one model contains more explanatory variables, then SST remains the same for both
models but SSR will be higher for the model with more explanatory variables. Since SST
= SSR1 + SSE1 which is equivalent to SSR2 + SSE2 and given that SSR2 > SSR1, then SSE1
> SSE2. Hence, the coefficient of determination will be higher with a greater number of
explanatory variables and the coefficient of determination must be interpreted in
conjunction with whether or not the regression slope coefficients on the explanatory
variables are significantly different from zero.
11-68 This is a classic example of what happens when there is a high degree of multicollinearity
between the independent variables. The overall model can be shown to have significant
explanatory power and yet none of the slope coefficients on the independent variables are
significantly different from zero.
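A small simulation makes the point concrete: two nearly identical regressors produce a clearly significant overall F while neither slope is individually significant. This sketch is illustrative only and is not from the text:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 40
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.03, size=n)   # nearly identical to x1
y = 1 + x1 + x2 + rng.normal(size=n)

fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
print(fit.f_pvalue)     # overall model clearly significant
print(fit.pvalues[1:])  # each slope insignificant on its own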
11-69 Since a = ȳ - b1x̄1 - b2x̄2,
Σei = Σ(yi - a - b1x1i - b2x2i)
= Σ(yi - ȳ + b1x̄1 + b2x̄2 - b1x1i - b2x2i)
= nȳ - nȳ + nb1x̄1 + nb2x̄2 - nb1x̄1 - nb2x̄2
= 0
11-70 a. All else equal, a one-unit change in population, industry size, the measure of economic
quality, the measure of health, and the measure of social life results in an estimated increase in
new business starts in the industry of 4.983, 2.198, 3.816, 3.215, and .085, respectively.
b. R² = .766
c. t62,.05 = 1.67; therefore, the 90% CI: 3.816 ± 1.67(2.063) = .3708 up to 7.2612
d. H0: β5 = 0, H1: β5 < 0
t = -.886/3.055 = -.29
t62,.10 = -1.295
Therefore, do not reject H0 at the 20% level
e. H0: β6 = 0, H1: β6 ≠ 0
t = 3.215/1.568 = 2.05, t62,.025 = 1.999
Therefore, reject H0 at the 5% level
f. H0: β1 = β2 = β3 = β4 = β5 = β6 = β7 = 0, H1: At least one βi ≠ 0 (i = 1, ..., 7)
F = [(n - K - 1)/K] · R²/(1 - R²) = (62/7)[.766/(1 - .766)] = 28.99
F7,62,.01 = 2.79. Therefore, reject H0 at the 1% level
11-71 a. All else equal, an increase of one question results in a decrease of 1.834 in expected
percentage of responses received. All else equal, an increase in one word in length of the
questionnaire results in a decrease of .016 in expected percentage of responses received.
b. 63.7% of the variability in the percentage of responses received can be explained by the
variability in the number of questions asked and the number of words
c. H 0 : 1  2  0, H1 : At least one i  0,(i  1, 2)
n  k 1 R2
27 .637

 23.69
2
k
1 R
2 1  .637
F 2,27,.10 = 5.49, Therefore, reject H 0 at the 1% level
d. t27,.005 = 2.771, 99% CI: -1.8345  2.771(.6349). –3.5938 up to -.0752
F
e. t = -1.78, t27,.05/.025 = -1.703, -2.052.
Therefore, reject H 0 at the 5% level but not at the 2.5% level
11-72 a. All else equal, a 1% increase in course time spent in group discussion results in an
expected increase of .3817 in the average rating of the course. All else equal, a unit
increase in money spent on the course results in an expected increase of .5172 in the
average rating by participants of the course. All else equal, a unit increase in expenditure
on non-course activities results in an expected increase of .0753 in the average rating of the
course.
b. 57.9% of the variation in the average rating can be explained by the linear relationship
with % of class time spent on discussion, money spent on class activities and money
spent on non-class activities.
c. H 0 : 1  2  3  0, H1 : At least one i  0, (i  1, 2,3)
n  k 1 R2
21 .579
F

 9.627
2
k
1 R
3 1  .579
F 2,21,.01 = 4.87
Therefore, reject H 0 at the 1% level
d. t21,.05 = 1.721, 90% CI: .3817  1.721(.2018) .0344 up to .729
e. t = 2.64, t21,.01/.005 = 2.518, 2.831
Therefore, reject H 0 at the 1% level but not at the .5% level
f. t = 1.09, t21,.10 = 1.323
Therefore, do not reject H 0 at the 20% level
11-73
Regression Analysis: y_rating versus x1_expgrade, x2_Numstudents
The regression equation is
y_rating = - 0.200 + 1.41 x1_expgrade - 0.0158 x2_Numstudents
Predictor   Coef        SE Coef    T       P       VIF
Constant    -0.2001     0.6968     -0.29   0.777
x1_expgr    1.4117      0.1780      7.93   0.000   1.5
x2_Numst    -0.015791   0.003783   -4.17   0.001   1.5

S = 0.1866   R-Sq = 91.5%   R-Sq(adj) = 90.5%

Analysis of Variance
Source           DF   SS       MS       F       P
Regression        2   6.3375   3.1687   90.99   0.000
Residual Error   17   0.5920   0.0348
Total            19   6.9295

Source     DF   Seq SS
x1_expgr    1   5.7308
x2_Numst    1   0.6067
11-74 Use the result from exercise 11-32:
H0: β4 = β5 = β6 = β7 = 0, H1: At least one βi ≠ 0 (i = 4, 5, 6, 7)
F = [(n - K - 1)/K1] · (R² - R²*)/(1 - R²) = (55/4)[(.467 - .242)/(1 - .467)] = 5.804
F4,55,.01 = 3.68
Therefore, reject H0 at the 1% level
11-75 a. All else equal, each extra point in the student’s expected score leads to an expected
increase of .469 in the actual score
b. t103,.025 = 1.98; therefore, the 95% CI: 3.369 ± 1.98(.456) = 2.4661 up to 4.2719
c. H0: β3 = 0, H1: β3 ≠ 0
t = 3.054/1.457 = 2.096
t103,.025 = 1.98
Therefore, reject H0 at the 5% level
d. 68.6% of the variation in the exam scores is explained by their linear dependence on
the student's expected score, hours per week spent working on the course and the
student's grade point average
e. H0: β1 = β2 = β3 = 0, H1: At least one βi ≠ 0 (i = 1, 2, 3)
F = [(n - K - 1)/K] · R²/(1 - R²) = (103/3)[.686/(1 - .686)] = 75.008
F3,103,.01 = 3.97
Reject H0 at any common level of alpha
f. R = √.686 = .82825
g. Ŷ = 2.178 + .469(80) + 3.369(8) + 3.054(3) = 75.812
11-76 a. t22,.005 = 2.819; therefore, the 99% CI: .0974 ± 2.819(.0215) = .0368 up to .1580
b. H0: β2 = 0, H1: β2 > 0
t = .374/.209 = 1.789
t22,.05/.025 = 1.717, 2.074; therefore, reject H0 at the 5% level but not the 2.5% level
c. R² = [22(.91) + 2]/24 = .9175
d. H0: β1 = β2 = 0, H1: At least one βi ≠ 0 (i = 1, 2)
F = [(n - K - 1)/K] · R²/(1 - R²) = (22/2)[.9175/(1 - .9175)] = 122.33
F2,22,.01 = 5.72
Reject H0 at any common level of alpha
e. R = √.9175 = .9579
11-77 a. t2669,.05 = 1.645; therefore, the 90% CI: 480.04 ± 1.645(224.9) = 110.0795 up to
850.0005
b. t2669,.005 = 2.576; therefore, the 99% CI: 1350.3 ± 2.576(212.3) = 803.4152 up to
1897.1848
c. H0: β8 = 0, H1: β8 > 0
t = 891.67/180.87 = 4.9299
t2669,.005 = 2.576; therefore, reject H0 at the .5% level
d. H0: β9 = 0, H1: β9 > 0
t = 722.95/110.98 = 6.5142
t2669,.005 = 2.576; therefore, reject H0 at the .5% level
e. 52.39% of the variability in minutes played can be explained by the variability in field
goal percentage, free throw percentage and the number of rebounds per minute
f. R = √.5239 = .7238
11-78 a. H0: β1 = 0, H1: β1 ≠ 0
t = .052/.019 = 2.737
t60,.005 = 2.66; therefore, reject H0 at the 1% level
b. H0: β2 = 0, H1: β2 ≠ 0
t = .005/.042 = .119
t60,.10 = 1.296; therefore, do not reject H0 at the 20% level
c. 17% of the variation in the growth rate in GDP can be explained by the variations in
real income per capita and the average tax rate, as a proportion of GNP.
d. R = √.17 = .4123
11-79 A report can be written by following the Case Study and testing the significance of the
model. See section 11.9
11-80 a. Start with the correlation matrix:
Correlations: EconGPA, SATverb, SATmath, HSPct

            EconGPA  SATverb  SATmath
SATverb      0.478
             0.000
SATmath      0.427    0.353
             0.000    0.003
HSPct        0.362    0.201    0.497
             0.000    0.121    0.000
(first entry: correlation; second entry: p-value)
Regression Analysis: EconGPA versus SATverb, SATmath, HSPct
The regression equation is
EconGPA = 0.612 + 0.0239 SATverb + 0.0117 SATmath + 0.00530 HSPct
61 cases used 51 cases contain missing values

Predictor   Coef       SE Coef    T      P       VIF
Constant    0.6117     0.4713     1.30   0.200
SATverb     0.023929   0.007386   3.24   0.002   1.2
SATmath     0.011722   0.007887   1.49   0.143   1.5
HSPct       0.005303   0.004213   1.26   0.213   1.3

S = 0.4238   R-Sq = 32.9%   R-Sq(adj) = 29.4%

Analysis of Variance
Source           DF   SS        MS       F      P
Regression        3   5.0171    1.6724   9.31   0.000
Residual Error   57   10.2385   0.1796
Total            60   15.2556

Source    DF   Seq SS
SATverb    1   3.7516
SATmath    1   0.9809
HSPct      1   0.2846
The regression model indicates positive coefficients, as expected, for all three independent
variables. The greater the high school rank, and the higher the SAT verbal and SAT math scores,
the larger the Econ GPA. The high school rank variable has the smallest t-statistic and is
removed from the model:
Regression Analysis: EconGPA versus SATverb, SATmath
The regression equation is
EconGPA = 0.755 + 0.0230 SATverb + 0.0174 SATmath
67 cases used 45 cases contain missing values
Predictor   Coef       SE Coef    T      P       VIF
Constant    0.7547     0.4375     1.72   0.089
SATverb     0.022951   0.006832   3.36   0.001   1.1
SATmath     0.017387   0.006558   2.65   0.010   1.1

S = 0.4196   R-Sq = 30.5%   R-Sq(adj) = 28.3%

Analysis of Variance
Source           DF   SS        MS       F       P
Regression        2   4.9488    2.4744   14.05   0.000
Residual Error   64   11.2693   0.1761
Total            66   16.2181

Source    DF   Seq SS
SATverb    1   3.7109
SATmath    1   1.2379
Both SAT variables are now statistically significant at the .05 level and appear to pick up
separate influences on the dependent variable. The simple correlation coefficient between SAT
math and SAT verbal is relatively low at .353. Thus, multicollinearity will not be dominant in
this regression model.
The final regression model, with conditional t-statistics in parentheses under the coefficients, is:
Yˆ  .755  .023( SATverbal )  .0174( SATmath)
(3.36)
(2.65)
S = .4196 R2 = .305 n = 67
b. Start with the correlation matrix:
Correlations: EconGPA, Acteng, ACTmath, ACTss, ACTcomp, HSPct

            EconGPA  Acteng  ACTmath  ACTss  ACTcomp
Acteng       0.387
             0.001
ACTmath      0.338   0.368
             0.003   0.001
ACTss        0.442   0.448   0.439
             0.000   0.000   0.000
ACTcomp      0.474   0.650   0.765    0.812
             0.000   0.000   0.000    0.000
HSPct        0.362   0.173   0.290    0.224  0.230
             0.000   0.150   0.014    0.060  0.053
(first entry: correlation; second entry: p-value)
Regression Analysis: EconGPA versus Acteng, ACTmath, ...
The regression equation is
EconGPA = - 0.207 + 0.0266 Acteng - 0.0023 ACTmath + 0.0212 ACTss
+ 0.0384 ACTcomp + 0.0128 HSPct
71 cases used 41 cases contain missing values
Predictor   Coef       SE Coef    T       P       VIF
Constant    -0.2069    0.6564     -0.32   0.754
Acteng      0.02663    0.02838     0.94   0.352    2.2
ACTmath     -0.00229   0.03031    -0.08   0.940    4.2
ACTss       0.02118    0.02806     0.75   0.453    4.6
ACTcomp     0.03843    0.07287     0.53   0.600   12.7
HSPct       0.012817   0.005271    2.43   0.018    1.2

S = 0.5034   R-Sq = 31.4%   R-Sq(adj) = 26.1%

Analysis of Variance
Source           DF   SS        MS       F      P
Regression        5   7.5253    1.5051   5.94   0.000
Residual Error   65   16.4691   0.2534
Total            70   23.9945

Source    DF   Seq SS
Acteng     1   3.5362
ACTmath    1   1.0529
ACTss      1   1.4379
ACTcomp    1   0.0001
HSPct      1   1.4983
The regression shows that only high school rank is significant at the .05 level. We may suspect
multicollinearity between the variables, particularly since there is a ‘total’ ACT score (ACT
composite) as well as the components that make up the ACT composite. Since conditional
significance is dependent on which other independent variables are included in the regression
equation, drop one variable at a time. ACTmath has the lowest t-statistic and is removed:
Regression Analysis: EconGPA versus Acteng, ACTss, ACTcomp, HSPct
The regression equation is
EconGPA = - 0.195 + 0.0276 Acteng + 0.0224 ACTss + 0.0339 ACTcomp
+ 0.0127 HSPct
71 cases used 41 cases contain missing values
Predictor   Coef       SE Coef    T       P       VIF
Constant    -0.1946    0.6313     -0.31   0.759
Acteng      0.02756    0.02534     1.09   0.281   1.8
ACTss       0.02242    0.02255     0.99   0.324   3.0
ACTcomp     0.03391    0.04133     0.82   0.415   4.2
HSPct       0.012702   0.005009    2.54   0.014   1.1

S = 0.4996   R-Sq = 31.4%   R-Sq(adj) = 27.2%

Analysis of Variance
Source           DF   SS        MS       F      P
Regression        4   7.5239    1.8810   7.54   0.000
Residual Error   66   16.4706   0.2496
Total            70   23.9945

Source    DF   Seq SS
Acteng     1   3.5362
ACTss      1   2.1618
ACTcomp    1   0.2211
HSPct      1   1.6048
Again, high school rank is the only conditionally significant variable. ACTcomp has the lowest
t-statistic and is removed:
Regression Analysis: EconGPA versus Acteng, ACTss, HSPct

The regression equation is
EconGPA = 0.049 + 0.0390 Acteng + 0.0364 ACTss + 0.0129 HSPct

71 cases used 41 cases contain missing values

Predictor       Coef    SE Coef       T       P    VIF
Constant      0.0487     0.5560    0.09   0.930
Acteng       0.03897    0.02114    1.84   0.070    1.3
ACTss        0.03643    0.01470    2.48   0.016    1.3
HSPct       0.012896   0.004991    2.58   0.012    1.1

S = 0.4983     R-Sq = 30.7%     R-Sq(adj) = 27.6%

Analysis of Variance
Source           DF        SS       MS      F      P
Regression        3    7.3558   2.4519   9.87   0.000
Residual Error   67   16.6386   0.2483
Total            70   23.9945

Source     DF   Seq SS
Acteng      1   3.5362
ACTss       1   2.1618
HSPct       1   1.6579
Now ACTss and high school rank are conditionally significant. Acteng has a t-statistic below 2 (p = .070) and is removed:
Regression Analysis: EconGPA versus ACTss, HSPct

The regression equation is
EconGPA = 0.566 + 0.0479 ACTss + 0.0137 HSPct

71 cases used 41 cases contain missing values

Predictor       Coef    SE Coef       T       P    VIF
Constant      0.5665     0.4882    1.16   0.250
ACTss        0.04790    0.01355    3.53   0.001    1.1
HSPct       0.013665   0.005061    2.70   0.009    1.1

S = 0.5070     R-Sq = 27.1%     R-Sq(adj) = 25.0%

Analysis of Variance
Source           DF        SS       MS       F      P
Regression        2    6.5123   3.2562   12.67   0.000
Residual Error   68   17.4821   0.2571
Total            70   23.9945

Source     DF   Seq SS
ACTss       1   4.6377
HSPct       1   1.8746
Both of the independent variables are statistically significant at the .05 level and hence, the final
regression model, with conditional t-statistics in parentheses under the coefficients, is:
Ŷ = .567 + .0479(ACTss) + .0137(HSPct)
            (3.53)          (2.70)

S = .5070     R2 = .271     n = 71
c. The regression model with the SAT variables is the better predictor because the
standard error of the estimate is smaller than for the ACT model (.4196 vs. .5070). The R2
measure cannot be directly compared due to the sample size differences.
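As a quick arithmetic check of this comparison, the standard error of the estimate can be recomputed from each model's ANOVA table (values taken from the tables above):

import numpy as np

# s_e = sqrt(SSE / (n - K - 1)); the smaller s_e, the tighter the predictions
sse_sat, n_sat, k_sat = 11.2693, 67, 2     # SAT model
sse_act, n_act, k_act = 17.4821, 71, 2     # ACT model
print(np.sqrt(sse_sat / (n_sat - k_sat - 1)))   # ~0.4196
print(np.sqrt(sse_act / (n_act - k_act - 1)))   # ~0.5070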
11-81
Correlations: Salary, age, Experien, yrs_asoc, yrs_full, Sex_1Fem, Market, C8

            Salary      age  Experien  yrs_asoc  yrs_full  Sex_1Fem   Market
age          0.749
             0.000
Experien     0.883    0.877
             0.000    0.000
yrs_asoc     0.698    0.712     0.803
             0.000    0.000     0.000
yrs_full     0.777    0.583     0.674     0.312
             0.000    0.000     0.000     0.000
Sex_1Fem    -0.429   -0.234    -0.378    -0.367    -0.292
             0.000    0.004     0.000     0.000     0.000
Market       0.026   -0.134    -0.150    -0.113    -0.017     0.062
             0.750    0.103     0.067     0.169     0.833     0.453
C8          -0.029   -0.189    -0.117    -0.073    -0.043    -0.094   -0.107
             0.721    0.020     0.155     0.373     0.598     0.254    0.192
The correlation matrix indicates that several of the independent variables are likely to be significant; however, multicollinearity is also likely. The regression model with all independent variables is:
Regression Analysis: Salary versus age, Experien, ...

The regression equation is
Salary = 23725 - 40.3 age + 357 Experien + 263 yrs_asoc + 493 yrs_full
         - 954 Sex_1Fem + 3427 Market + 1188 C8

Predictor       Coef   SE Coef       T       P    VIF
Constant       23725      1524   15.57   0.000
age           -40.29     44.98   -0.90   0.372    4.7
Experien      356.83     63.48    5.62   0.000   10.0
yrs_asoc      262.50     75.11    3.49   0.001    4.0
yrs_full      492.91     59.27    8.32   0.000    2.6
Sex_1Fem      -954.1     487.3   -1.96   0.052    1.3
Market        3427.2     754.1    4.54   0.000    1.1
C8            1188.4     597.5    1.99   0.049    1.1

S = 2332     R-Sq = 88.2%     R-Sq(adj) = 87.6%

Analysis of Variance
Source           DF           SS          MS        F      P
Regression        7   5776063882   825151983   151.74  0.000
Residual Error  142    772162801     5437766
Total           149   6548226683

Source     DF       Seq SS
age         1   3669210599
Experien    1   1459475287
yrs_asoc    1      1979334
yrs_full    1    500316356
Sex_1Fem    1     22707368
Market      1    100860164
C8          1     21514774
Since age is insignificant and has the smallest t-statistic, it is removed from the model:
Regression Analysis: Salary versus Experien, yrs_asoc, ...

The regression equation is
Salary = 22455 + 324 Experien + 258 yrs_asoc + 491 yrs_full - 1043 Sex_1Fem
         + 3449 Market + 1274 C8

Predictor       Coef   SE Coef       T       P    VIF
Constant     22455.2     557.7   40.26   0.000
Experien      324.24     51.99    6.24   0.000    6.7
yrs_asoc      257.88     74.88    3.44   0.001    4.0
yrs_full      490.97     59.19    8.29   0.000    2.6
Sex_1Fem     -1043.4     476.7   -2.19   0.030    1.2
Market        3449.4     753.2    4.58   0.000    1.1
C8            1274.5     589.3    2.16   0.032    1.1

S = 2330     R-Sq = 88.1%     R-Sq(adj) = 87.6%

Analysis of Variance
Source           DF           SS          MS        F      P
Regression        6   5771700580   961950097   177.15  0.000
Residual Error  143    776526103     5430252
Total           149   6548226683
The conditional F test for age is:

F(age) = (SSR_F - SSR_R) / s²Y|X = (5,776,063,882 - 5,771,700,580) / (2332)² = .80

which is well below any common critical value of F. Thus, age is removed from the model. The remaining independent variables are all significant at the .05 level, so this becomes the final regression model.
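The same conditional F computation can be verified numerically. A minimal sketch using the regression sums of squares reported above (scipy is assumed available):

from scipy import stats

ssr_full, ssr_reduced = 5_776_063_882, 5_771_700_580
s2 = 2332**2                        # full model's standard error, squared
F = (ssr_full - ssr_reduced) / s2   # one restriction, so 1 numerator df
print(F, stats.f.sf(F, 1, 142))     # F ~ .80; p-value well above .05, so drop age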
Residual analysis to determine if the assumption of linearity holds true follows:

[Residual plots of RESI1 (the residuals) versus Experien, yrs_asoc, yrs_full, Sex_1Fem, Market, and C8]
The residual plot for Experience shows a relatively strong quadratic relationship between Experience and Salary. Therefore, a new variable that captures this quadratic relationship is generated and added to the model, as sketched below. None of the other residual plots shows strong evidence of non-linearity.
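A minimal sketch of adding the quadratic experience term, assuming a pandas DataFrame df holding the salary data with the column names used above:

import statsmodels.api as sm

df["ExperSquared"] = df["Experien"] ** 2   # curvature term suggested by the plot
predictors = ["Experien", "ExperSquared", "yrs_asoc", "yrs_full",
              "Sex_1Fem", "Market", "C8"]
X = sm.add_constant(df[predictors])
fit = sm.OLS(df["Salary"], X).fit()
print(fit.summary())   # the squared term should enter with a negative sign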
Regression Analysis: Salary versus Experien, ExperSquared, ...

The regression equation is
Salary = 18915 + 875 Experien - 15.9 ExperSquared + 222 yrs_asoc + 612 yrs_full
         - 650 Sex_1Fem + 3978 Market + 1042 C8

Predictor       Coef   SE Coef       T       P    VIF
Constant     18915.2     583.2   32.43   0.000
Experien      875.35     72.20   12.12   0.000   20.6
ExperSqu     -15.947     1.717   -9.29   0.000   16.2
yrs_asoc      221.58     59.40    3.73   0.000    4.0
yrs_full      612.10     48.63   12.59   0.000    2.8
Sex_1Fem      -650.1     379.6   -1.71   0.089    1.2
Market        3978.3     598.8    6.64   0.000    1.1
C8            1042.3     467.1    2.23   0.027    1.1

S = 1844     R-Sq = 92.6%     R-Sq(adj) = 92.3%

Analysis of Variance
Source           DF           SS          MS        F      P
Regression        7   6065189270   866455610   254.71  0.000
Residual Error  142    483037413     3401672
Total           149   6548226683

Source     DF       Seq SS
Experien    1   5109486518
ExperSqu    1     91663414
yrs_asoc    1     15948822
yrs_full    1    678958872
Sex_1Fem    1     12652358
Market      1    139540652
C8          1     16938635
The squared term for experience is statistically significant; however, Sex_1Fem is no longer significant at the .05 level and hence is removed from the model:
Regression Analysis: Salary versus Experien, ExperSquared, ...

The regression equation is
Salary = 18538 + 888 Experien - 16.3 ExperSquared + 237 yrs_asoc + 624 yrs_full
         + 3982 Market + 1145 C8

Predictor       Coef   SE Coef       T       P    VIF
Constant     18537.8     543.6   34.10   0.000
Experien      887.85     72.32   12.28   0.000   20.4
ExperSqu     -16.275     1.718   -9.48   0.000   16.0
yrs_asoc      236.89     59.11    4.01   0.000    3.9
yrs_full      624.49     48.41   12.90   0.000    2.8
Market        3981.8     602.9    6.60   0.000    1.1
C8            1145.4     466.3    2.46   0.015    1.0

S = 1857     R-Sq = 92.5%     R-Sq(adj) = 92.2%

Analysis of Variance
Source           DF           SS           MS        F      P
Regression        6   6055213011   1009202168   292.72  0.000
Residual Error  143    493013673      3447648
Total           149   6548226683
This is the final model with all of the independent variables being conditionally significant,
including the quadratic transformation of Experience. This would indicate that a non-linear
relationship exists between experience and salary.
11-82
Correlations: hseval, Comper, Homper, Indper, sizehse, incom72

           hseval   Comper   Homper   Indper  sizehse
Comper     -0.335
            0.001
Homper      0.145   -0.499
            0.171    0.000
Indper     -0.086   -0.140   -0.564
            0.419    0.188    0.000
sizehse     0.542   -0.278    0.274   -0.245
            0.000    0.008    0.009    0.020
incom72     0.426   -0.198   -0.083    0.244    0.393
            0.000    0.062    0.438    0.020    0.000
The correlation matrix indicates that the size of the house, income and percent homeowners have
a positive relationship with house value. There is a negative relationship between the percent
industrial and percent commercial and house value.
Regression Analysis: hseval versus Comper, Homper, ...

The regression equation is
hseval = - 19.0 - 26.4 Comper - 12.1 Homper - 15.5 Indper + 7.22 sizehse
         + 0.00408 incom72

Predictor       Coef    SE Coef       T       P    VIF
Constant      -19.02      13.20   -1.44   0.153
Comper       -26.393      9.890   -2.67   0.009    2.2
Homper       -12.123      7.508   -1.61   0.110    3.0
Indper       -15.531      8.630   -1.80   0.075    2.6
sizehse        7.219      2.138    3.38   0.001    1.5
incom72     0.004081   0.001555    2.62   0.010    1.4

S = 3.949     R-Sq = 40.1%     R-Sq(adj) = 36.5%

Analysis of Variance
Source           DF        SS       MS       F      P
Regression        5    876.80   175.36   11.25   0.000
Residual Error   84   1309.83    15.59
Total            89   2186.63
All variables are conditionally significant with the exception of Indper and Homper. Since
Homper has the smaller t-statistic, it is removed:
Regression Analysis: hseval versus Comper, Indper, sizehse, incom72

The regression equation is
hseval = - 30.9 - 15.2 Comper - 5.73 Indper + 7.44 sizehse + 0.00418 incom72

Predictor       Coef    SE Coef       T       P    VIF
Constant      -30.88      11.07   -2.79   0.007
Comper       -15.211      7.126   -2.13   0.036    1.1
Indper        -5.735      6.194   -0.93   0.357    1.3
sizehse        7.439      2.154    3.45   0.001    1.5
incom72     0.004175   0.001569    2.66   0.009    1.4

S = 3.986     R-Sq = 38.2%     R-Sq(adj) = 35.3%

Analysis of Variance
Source           DF        SS       MS       F      P
Regression        4    836.15   209.04   13.16   0.000
Residual Error   85   1350.48    15.89
Total            89   2186.63

Indper is not significant and is removed:
Regression Analysis: hseval versus Comper, sizehse, incom72

The regression equation is
hseval = - 34.2 - 13.9 Comper + 8.27 sizehse + 0.00364 incom72

Predictor       Coef    SE Coef       T       P    VIF
Constant      -34.24      10.44   -3.28   0.002
Comper       -13.881      6.974   -1.99   0.050    1.1
sizehse        8.270      1.957    4.23   0.000    1.2
incom72     0.003636   0.001456    2.50   0.014    1.2

S = 3.983     R-Sq = 37.6%     R-Sq(adj) = 35.4%

Analysis of Variance
Source           DF        SS       MS       F      P
Regression        3    822.53   274.18   17.29   0.000
Residual Error   86   1364.10    15.86
Total            89   2186.63
This becomes the final regression model. To select a community with the objective of larger house values, one would look for communities where the percent of commercial property is low, the median rooms per residence is high, and the per capita income is high.
11-83a. Correlation matrix:
Correlations: deaths, vehwt, impcars, lghttrks, carage

           deaths    vehwt  impcars  lghttrks
vehwt       0.244
            0.091
impcars    -0.284   -0.943
            0.048    0.000
lghttrks    0.726    0.157   -0.175
            0.000    0.282    0.228
carage     -0.422    0.123    0.011   -0.329
            0.003    0.400    0.943    0.021
Crash deaths are positively related to vehicle weight and the percentage of light trucks, and negatively related to the percent of imported cars and car age. Light trucks has the strongest linear association of any independent variable, followed by car age. Multicollinearity is likely to exist due to the strong correlation between impcars and vehicle weight.
b.
Regression Analysis: deaths versus vehwt, impcars, lghttrks, carage

The regression equation is
deaths = 2.60 + 0.000064 vehwt - 0.00121 impcars + 0.00833 lghttrks
         - 0.0395 carage

Predictor        Coef     SE Coef       T       P    VIF
Constant        2.597       1.247    2.08   0.043
vehwt       0.0000643   0.0001908    0.34   0.738   10.9
impcars     -0.001213    0.005249   -0.23   0.818   10.6
lghttrks     0.008332    0.001397    5.96   0.000    1.2
carage       -0.03946     0.01916   -2.06   0.045    1.4

S = 0.05334     R-Sq = 59.5%     R-Sq(adj) = 55.8%

Analysis of Variance
Source           DF         SS         MS       F      P
Regression        4   0.183634   0.045909   16.14   0.000
Residual Error   44   0.125174   0.002845
Total            48   0.308809
Light trucks is a significant positive variable. Since impcars has the smallest t-statistic, it is
removed from the model:
Regression Analysis: deaths versus vehwt, lghttrks, carage

The regression equation is
deaths = 2.55 + 0.000106 vehwt + 0.00831 lghttrks - 0.0411 carage

Predictor         Coef      SE Coef       T       P    VIF
Constant         2.555        1.220    2.09   0.042
vehwt       0.00010622   0.00005901    1.80   0.079    1.1
lghttrks      0.008312     0.001380    6.02   0.000    1.2
carage        -0.04114      0.01754   -2.34   0.024    1.2

S = 0.05277     R-Sq = 59.4%     R-Sq(adj) = 56.7%

Analysis of Variance
Source           DF         SS         MS       F      P
Regression        3   0.183482   0.061161   21.96   0.000
Residual Error   45   0.125326   0.002785
Total            48   0.308809
Also, remove vehicle weight using the same argument:
Regression Analysis: deaths versus lghttrks, carage

The regression equation is
deaths = 2.51 + 0.00883 lghttrks - 0.0352 carage

Predictor        Coef    SE Coef       T       P    VIF
Constant        2.506      1.249    2.01   0.051
lghttrks     0.008835   0.001382    6.39   0.000    1.1
carage       -0.03522    0.01765   -2.00   0.052    1.1

S = 0.05404     R-Sq = 56.5%     R-Sq(adj) = 54.6%

Analysis of Variance
Source           DF         SS         MS       F      P
Regression        2   0.174458   0.087229   29.87   0.000
Residual Error   46   0.134351   0.002921
Total            48   0.308809
The model has light trucks and car age as the significant variables. Note that car age is
marginally significant (p-value of .052) and hence could also be dropped from the model.
c. The regression modeling indicates that the percentage of light trucks is conditionally significant in all of the models and hence is an important predictor. Car age and imported cars are at best marginally significant once light trucks is included in the model.
11-84 a. Correlation matrix:
Correlations: deaths, Purbanpop, Ruspeed, Prsurf

            deaths  Purbanpo  Ruspeed
Purbanpo    -0.594
             0.000
Ruspeed      0.305    -0.224
             0.033     0.121
Prsurf      -0.556     0.207   -0.232
             0.000     0.153    0.109
Descriptive Statistics: deaths, Purbanpop, Prsurf, Ruspeed

Variable     N     Mean   Median   TrMean    StDev  SE Mean
deaths      49   0.1746   0.1780   0.1675   0.0802   0.0115
Purbanpo    49   0.5890   0.6311   0.5992   0.2591   0.0370
Prsurf      49   0.7980   0.8630   0.8117   0.1928   0.0275
Ruspeed     49   58.186   58.400   58.222    1.683    0.240

Variable   Minimum  Maximum       Q1       Q3
deaths      0.0569   0.5505   0.1240   0.2050
Purbanpo    0.0000   0.9689   0.4085   0.8113
Prsurf      0.2721   1.0000   0.6563   0.9485
Ruspeed     53.500   62.200   57.050   59.150
The proportion of urban population and rural roads that are surfaced are negatively related to
crash deaths. Average rural speed is positively related, but the relationship is not as strong as the
proportion of urban population and surfaced roads. The simple correlation coefficients among
the independent variables are relatively low and hence multicollinearity should not be dominant
in this model. Note the relatively narrow range for average rural speed. This would indicate that
there is not much variability in this independent variable.
Regression Analysis: deaths versus Purbanpop, Prsurf, Ruspeed

The regression equation is
deaths = 0.141 - 0.149 Purbanpop - 0.181 Prsurf + 0.00457 Ruspeed

Predictor        Coef    SE Coef       T       P    VIF
Constant       0.1408     0.2998    0.47   0.641
Purbanpo     -0.14946    0.03192   -4.68   0.000    1.1
Prsurf       -0.18058    0.04299   -4.20   0.000    1.1
Ruspeed      0.004569   0.004942    0.92   0.360    1.1

S = 0.05510     R-Sq = 55.8%     R-Sq(adj) = 52.8%

Analysis of Variance
Source           DF         SS         MS       F      P
Regression        3   0.172207   0.057402   18.91   0.000
Residual Error   45   0.136602   0.003036
Total            48   0.308809
The model has conditionally significant variables for percent urban population and percent
surfaced roads. Since average rural speed is not conditionally significant, it is dropped from the
model:
Regression Analysis: deaths versus Purbanpop, Prsurf

The regression equation is
deaths = 0.416 - 0.155 Purbanpop - 0.188 Prsurf

Predictor        Coef    SE Coef       T       P    VIF
Constant      0.41609    0.03569   11.66   0.000
Purbanpo     -0.15493    0.03132   -4.95   0.000    1.0
Prsurf       -0.18831    0.04210   -4.47   0.000    1.0

S = 0.05501     R-Sq = 54.9%     R-Sq(adj) = 53.0%

Analysis of Variance
Source           DF         SS         MS       F      P
Regression        2   0.169612   0.084806   28.03   0.000
Residual Error   46   0.139197   0.003026
Total            48   0.308809
This becomes the final model since both variables are conditionally significant.
d. Conclude that the proportion of urban population and the percent of rural roads that are surfaced are important independent variables in explaining crash deaths. All else equal, an increase in the proportion of urban population lowers expected crash deaths, and an increase in the proportion of rural roads that are surfaced also results in lower expected crash deaths. The average rural speed is not conditionally significant.
11-85 a. Correlation matrix and descriptive statistics
Correlations: hseval, sizehse, Taxhse, Comper, incom72, totexp

           hseval  sizehse   Taxhse   Comper  incom72
sizehse     0.542
            0.000
Taxhse      0.248    0.289
            0.019    0.006
Comper     -0.335   -0.278   -0.114
            0.001    0.008    0.285
incom72     0.426    0.393    0.261   -0.198
            0.000    0.000    0.013    0.062
totexp      0.261   -0.022    0.228    0.269    0.376
            0.013    0.834    0.030    0.010    0.000
The correlation matrix shows that multicollinearity is not likely to be a problem in this model
since all of the correlations among the independent variables are relatively low.
Descriptive Statistics: hseval, sizehse, Taxhse, Comper, incom72, totexp

Variable     N      Mean    Median    TrMean     StDev  SE Mean
hseval      90    21.031    20.301    20.687     4.957    0.522
sizehse     90    5.4778    5.4000    5.4638    0.2407   0.0254
Taxhse      90    130.13    131.67    128.31     48.89     5.15
Comper      90   0.16211   0.15930   0.16206   0.06333  0.00668
incom72     90    3360.9    3283.0    3353.2     317.0     33.4
totexp      90   1488848   1089110   1295444   1265564   133402

Variable    Minimum   Maximum        Q1        Q3
hseval       13.300    35.976    17.665    24.046
sizehse      5.0000    6.2000    5.3000    5.6000
Taxhse        35.04    399.60     98.85    155.19
Comper      0.02805   0.28427   0.11388   0.20826
incom72      2739.0    4193.0    3114.3    3585.3
totexp       361290   7062330    808771   1570275
The range for applying the regression model (variable means +/- 2 standard deviations):

Hseval:    21.03 +/- 2(4.957)     = 11.11 to 30.94
Sizehse:   5.48 +/- 2(.24)        = 5.0 to 5.96
Taxhse:    130.13 +/- 2(48.89)    = 32.35 to 227.91
Comper:    .16 +/- 2(.063)        = .034 to .286
Incom72:   3361 +/- 2(317)        = 2727 to 3995
Totexp:    1488848 +/- 2(1265564) = not a good approximation (the interval extends below zero)
b. Regression models:
Regression Analysis: hseval versus sizehse, Taxhse, ...

The regression equation is
hseval = - 31.1 + 9.10 sizehse - 0.00058 Taxhse - 22.2 Comper
         + 0.00120 incom72 + 0.000001 totexp

Predictor          Coef      SE Coef       T       P    VIF
Constant         -31.07        10.09   -3.08   0.003
sizehse           9.105        1.927    4.72   0.000    1.3
Taxhse        -0.000584     0.008910   -0.07   0.948    1.2
Comper          -22.197        7.108   -3.12   0.002    1.3
incom72        0.001200     0.001566    0.77   0.445    1.5
totexp       0.00000125   0.00000038    3.28   0.002    1.5

S = 3.785     R-Sq = 45.0%     R-Sq(adj) = 41.7%

Analysis of Variance
Source           DF        SS       MS       F      P
Regression        5    982.98   196.60   13.72   0.000
Residual Error   84   1203.65    14.33
Total            89   2186.63
Neither Taxhse nor income is conditionally significant; dropping one variable at a time, Taxhse is eliminated first and then income:
Regression Analysis: hseval versus sizehse, Comper, totexp

The regression equation is
hseval = - 29.9 + 9.61 sizehse - 23.5 Comper + 0.000001 totexp

Predictor          Coef      SE Coef       T       P    VIF
Constant        -29.875        9.791   -3.05   0.003
sizehse           9.613        1.724    5.58   0.000    1.1
Comper          -23.482        6.801   -3.45   0.001    1.2
totexp       0.00000138   0.00000033    4.22   0.000    1.1

S = 3.754     R-Sq = 44.6%     R-Sq(adj) = 42.6%

Analysis of Variance
Source           DF        SS       MS       F      P
Regression        3    974.55   324.85   23.05   0.000
Residual Error   86   1212.08    14.09
Total            89   2186.63
This is the final regression model. All of the independent variables are conditionally significant. Both the size of the house and total government expenditures enhance the market value of homes, while the percent of commercial property tends to reduce market values.
c. In the final regression model, the tax variable was not found to be conditionally
significant and hence it is difficult to support the developer’s claim.
11-86 a. Correlation matrix
Correlations: retsal84, Unemp84, perinc84

            retsal84  Unemp84
Unemp84       -0.370
               0.008
perinc84       0.633   -0.232
               0.000    0.101
There is a positive association between per capita income and retail sales. There is a negative
association between unemployment and retail sales. Multicollinearity does not appear to be a
problem since the correlation between the independent variables is relatively low.
Descriptive Statistics: retsal84, perinc84, Unemp84

Variable     N    Mean   Median   TrMean   StDev  SE Mean
retsal84    51    5536     5336     5483     812      114
perinc84    51   12277    12314    12166    1851      259
Unemp84     51   7.335    7.000    7.196   2.216    0.310

Variable   Minimum  Maximum      Q1      Q3
retsal84      4250     8348    5059    6037
perinc84      8857    17148   10689   13218
Regression Analysis: retsal84 versus Unemp84, perinc84

The regression equation is
retsal84 = 3054 - 86.3 Unemp84 + 0.254 perinc84

Predictor       Coef   SE Coef       T       P    VIF
Constant      3054.3     724.4    4.22   0.000
Unemp84       -86.25     40.20   -2.15   0.037    1.1
perinc84     0.25368   0.04815    5.27   0.000    1.1

S = 612.9     R-Sq = 45.3%     R-Sq(adj) = 43.0%

Analysis of Variance
Source           DF         SS        MS       F      P
Regression        2   14931938   7465969   19.88   0.000
Residual Error   48   18029333    375611
Total            50   32961271
This is the final model since all of the independent variables are conditionally significant at the
.05 level. The 95% confidence intervals for the regression slope coefficients:
b1 +/- t s(b1):  -86.25 +/- 2.011(40.2)  = -86.25 +/- 80.84
b2 +/- t s(b2):  .254 +/- 2.011(.04815)  = .254 +/- .0968
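These intervals can also be obtained directly from a fitted model. A minimal sketch, assuming the 51 state observations sit in a pandas DataFrame df with the column names used above:

import statsmodels.api as sm

X = sm.add_constant(df[["Unemp84", "perinc84"]])
fit = sm.OLS(df["retsal84"], X).fit()
print(fit.conf_int(alpha=0.05))   # rows: const, Unemp84, perinc84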
b. All else equal, the conditional effect of a $1,000 decrease in per capita income on retail sales would be to reduce retail sales by $254.
c. Adding state population as a predictor yields the following regression results:
Regression Analysis: retsal84 versus Unemp84, perinc84, Totpop84

The regression equation is
retsal84 = 2828 - 71.3 Unemp84 + 0.272 perinc84 - 0.0247 Totpop84

Predictor       Coef   SE Coef       T       P    VIF
Constant      2828.4     737.9    3.83   0.000
Unemp84       -71.33     41.40   -1.72   0.091    1.1
perinc84     0.27249   0.04977    5.47   0.000    1.1
Totpop84    -0.02473   0.01845   -1.34   0.187    1.1

S = 607.8     R-Sq = 47.3%     R-Sq(adj) = 44.0%

Analysis of Variance
Source           DF         SS        MS       F      P
Regression        3   15595748   5198583   14.07   0.000
Residual Error   47   17365523    369479
Total            50   32961271
The population variable is not conditionally significant and adds little explanatory power; therefore, it will not improve the multiple regression model.
11-87 a.
Correlations: FRH, FBPR, FFED, FM2, GDPH, GH

          FRH    FBPR    FFED     FM2    GDPH
FBPR    0.510
        0.000
FFED    0.244   0.957
        0.001   0.000
FM2     0.854   0.291   0.077
        0.000   0.000   0.326
GDPH    0.934   0.580   0.287   0.987
        0.000   0.000   0.000   0.000
GH      0.907   0.592   0.285   0.977   0.973
        0.000   0.000   0.000   0.000   0.000
The correlation matrix shows that both interest rates have a significant positive linear association with residential investment. The money supply, GDP, and government expenditures also have a significant linear association with residential investment. Note the high correlation between the two interest rate variables, which, as expected, would create significant problems with multicollinearity if both variables were included in the regression model. Hence, two separate models are estimated, one for each interest rate.
Regression Analysis: FRH versus FBPR, FM2, GDPH, GH

The regression equation is
FRH = 70.0 - 3.79 FBPR - 0.0542 FM2 + 0.0932 GDPH - 0.165 GH

166 cases used 52 cases contain missing values

Predictor        Coef    SE Coef       T       P    VIF
Constant        70.00      24.87    2.82   0.005
FBPR          -3.7871     0.6276   -6.03   0.000    1.2
FM2         -0.054210   0.009210   -5.89   0.000   46.8
GDPH         0.093223   0.007389   12.62   0.000   58.1
GH           -0.16514    0.03747   -4.41   0.000   28.7

S = 23.42     R-Sq = 86.7%     R-Sq(adj) = 86.3%

Analysis of Variance
Source           DF       SS       MS        F      P
Regression        4   573700   143425   261.42   0.000
Residual Error  161    88331      549
Total           165   662030
This will be the final model with prime rate as the interest rate variable since all of the
independent variables are conditionally significant. Note the significant multicollinearity that
exists between the independent variables.
Regression Analysis: FRH versus FFED, FM2, GDPH, GH

The regression equation is
FRH = 55.0 - 2.76 FFED - 0.0558 FM2 + 0.0904 GDPH - 0.148 GH

166 cases used 52 cases contain missing values

Predictor        Coef    SE Coef       T       P    VIF
Constant        55.00      26.26    2.09   0.038
FFED          -2.7640     0.6548   -4.22   0.000    1.2
FM2          -0.05578    0.01007   -5.54   0.000   50.7
GDPH         0.090402   0.007862   11.50   0.000   59.6
GH           -0.14752    0.03922   -3.76   0.000   28.5

S = 24.61     R-Sq = 85.3%     R-Sq(adj) = 84.9%

Analysis of Variance
Source           DF       SS       MS        F      P
Regression        4   564511   141128   233.00   0.000
Residual Error  161    97519      606
Total           165   662030
The model with the federal funds rate as the interest rate variable is also the final model with all
of the independent variables conditionally significant. Again, multicollinearity will be a problem
with this regression model.
b. 95% confidence intervals for the slope coefficients on the interest rate term:
Bank prime rate as the interest rate variable:
b1 +/- t s(b1):  -3.7871 +/- 1.96(.6276) = -3.7871 +/- 1.23

Federal funds rate as the interest rate variable:
b1 +/- t s(b1):  -2.764 +/- 1.96(.6548) = -2.764 +/- 1.2834
11-88 a.
Correlations: Infmrt82, Phys82, Perinc84, Perhosp

           Infmrt82   Phys82  Perinc84
Phys82        0.434
              0.001
Perinc84      0.094    0.614
              0.511    0.000
Perhosp       0.411    0.285     0.267
              0.003    0.042     0.058
The correlation matrix shows a positive association between infant mortality and both Phys82 and Perhosp. These variables are the number of physicians per 100,000 population and the total per capita expenditures for hospitals. One would expect a negative association; therefore, examine the scatter plot of infant mortality versus Phys82:
[Scatter plot of Infmrt82 versus Phys82, with one observation far to the right, above 500 physicians per 100,000]
The graph shows an obvious outlier which, upon further investigation, is the District of Columbia. Due to its outlier status, this row is dropped from the analysis (a sketch of this step follows) and the correlation matrix is recalculated:
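A minimal sketch of the outlier removal, assuming a pandas DataFrame df indexed by state abbreviation (the "DC" index label is a hypothetical stand-in for however the row is identified in the actual data file):

# Drop the outlying District of Columbia observation, then recompute
# the correlation matrix on the remaining 50 rows.
df = df.drop(index="DC")
print(df[["Infmrt82", "Phys82", "Perinc84", "Perhosp"]].corr())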
Correlations: Infmrt82, Phys82, Perinc84, Perhosp

           Infmrt82   Phys82  Perinc84
Phys82       -0.147
              0.309
Perinc84     -0.192    0.574
              0.181    0.000
Perhosp       0.199   -0.065     0.140
              0.166    0.654     0.331
The physicians per 100,000 population variable now has the correct sign; however, none of the independent variables has a statistically significant linear association with the dependent variable. Per capita expenditures for hospitals carries an unexpected positive sign, but it is not significant. The multiple regression results are likely to yield low explanatory power with insignificant independent variables:
Regression Analysis: Infmrt82 versus Phys82, Perinc84, Perhosp

The regression equation is
Infmrt82 = 12.7 - 0.00017 Phys82 - 0.000206 Perinc84 + 6.30 Perhosp

Predictor         Coef     SE Coef       T       P    VIF
Constant        12.701       1.676    7.58   0.000
Phys82       -0.000167    0.006647   -0.03   0.980    1.5
Perinc84    -0.0002064   0.0001637   -1.26   0.214    1.6
Perhosp          6.297       3.958    1.59   0.118    1.1

S = 1.602     R-Sq = 8.9%     R-Sq(adj) = 3.0%

Analysis of Variance
Source           DF        SS      MS      F      P
Regression        3    11.546   3.849   1.50   0.227
Residual Error   46   118.029   2.566
Total            49   129.575
As expected, the model explains less than 9% of the variability in infant mortality. None of the independent variables is conditionally significant, and multicollinearity does not appear to be a significant problem. The standard error of the estimate is very large (1.602) relative to the size of the infant mortality rates, and hence the model would not be a good predictor. Sequentially dropping the independent variable with the lowest t-statistic confirms the conclusion that none of the independent variables is conditionally significant. The search is on for better independent variables.
b. The two variables to include are per capita spending on education (PerEduc) and per
capita spending on public welfare (PerPbwel). Since the conditional significance of
the independent variables is a function of other independent variables in the model,
we will include the original set of variables:
Regression Analysis: Infmrt82 versus Phys82, Perinc84, ...

The regression equation is
Infmrt82 = 12.2 - 0.00122 Phys82 + 0.000015 Perinc84 + 8.87 Perhosp
           - 1.96 PerEduc - 4.56 PerPbwel

Predictor         Coef     SE Coef       T       P    VIF
Constant        12.167       1.654    7.35   0.000
Phys82       -0.001219    0.008028   -0.15   0.880    2.4
Perinc84     0.0000152   0.0001942    0.08   0.938    2.4
Perhosp          8.872       4.010    2.21   0.032    1.2
PerEduc         -1.960       1.315   -1.49   0.143    1.8
PerPbwel        -4.555       3.360   -1.36   0.182    1.7

S = 1.550     R-Sq = 18.4%     R-Sq(adj) = 9.1%

Analysis of Variance
Source           DF        SS      MS      F      P
Regression        5    23.848   4.770   1.98   0.100
Residual Error   44   105.727   2.403
Total            49   129.575
The model shows low explanatory power and only one independent variable that is conditionally
significant (Perhosp). Dropping sequentially the independent variable with the lowest t-statistic
yields a model with no conditionally significant independent variables. This problem illustrates
that in some applications, the variables that have been identified as theoretically important
predictors do not meet the statistical test.
11-89 a.
Correlations: Salary, age, yrs_asoc, yrs_full, Sex_1Fem, Market, C8

            Salary      age  yrs_asoc  yrs_full  Sex_1Fem   Market
age          0.749
             0.000
yrs_asoc     0.698    0.712
             0.000    0.000
yrs_full     0.777    0.583     0.312
             0.000    0.000     0.000
Sex_1Fem    -0.429   -0.234    -0.367    -0.292
             0.000    0.004     0.000     0.000
Market       0.026   -0.134    -0.113    -0.017     0.062
             0.750    0.103     0.169     0.833     0.453
C8          -0.029   -0.189    -0.073    -0.043    -0.094   -0.107
             0.721    0.020     0.373     0.598     0.254    0.192
The correlation matrix indicates several independent variables that should provide good
explanatory power in the regression model. We would expect that age, years at Associate
professor and years at full professor are likely to be conditionally significant:
Regression Analysis: Salary versus age, yrs_asoc, ...

The regression equation is
Salary = 21107 + 105 age + 532 yrs_asoc + 690 yrs_full - 1312 Sex_1Fem
         + 2854 Market + 1101 C8

Predictor       Coef   SE Coef       T       P    VIF
Constant       21107      1599   13.20   0.000
age           104.59     40.62    2.58   0.011    3.1
yrs_asoc      532.27     63.66    8.36   0.000    2.4
yrs_full      689.93     52.66   13.10   0.000    1.7
Sex_1Fem     -1311.8     532.3   -2.46   0.015    1.3
Market        2853.9     823.3    3.47   0.001    1.0
C8            1101.0     658.1    1.67   0.097    1.1

S = 2569     R-Sq = 85.6%     R-Sq(adj) = 85.0%

Analysis of Variance
Source           DF           SS          MS        F      P
Regression        6   5604244075   934040679   141.49  0.000
Residual Error  143    943982608     6601277
Total           149   6548226683
Dropping the C8 variable yields:
Regression Analysis: Salary versus age, yrs_asoc, ...

The regression equation is
Salary = 21887 + 90.0 age + 539 yrs_asoc + 697 yrs_full - 1397 Sex_1Fem
         + 2662 Market

Predictor       Coef   SE Coef       T       P    VIF
Constant       21887      1539   14.22   0.000
age            90.02     39.92    2.26   0.026    3.0
yrs_asoc      539.48     63.91    8.44   0.000    2.4
yrs_full      697.35     52.80   13.21   0.000    1.7
Sex_1Fem     -1397.2     533.2   -2.62   0.010    1.2
Market        2662.3     820.3    3.25   0.001    1.0

S = 2585     R-Sq = 85.3%     R-Sq(adj) = 84.8%

Analysis of Variance
Source           DF           SS           MS        F      P
Regression        5   5585766862   1117153372   167.14  0.000
Residual Error  144    962459821      6683749
Total           149   6548226683
This is the final model. All of the independent variables are conditionally significant and the
model explains a sizeable portion of the variability in salary.
b. To test the hypothesis that the rate of change in female salaries as a function of age is
less than the rate of change in male salaries as a function of age, the dummy variable
Sex_1Fem is used to see if the slope coefficient for age (X1) is different for males and
females. The following model is used:
Y = β0 + (β1 + β6X4)X1 + β2X2 + β3X3 + β4X4 + β5X5
  = β0 + β1X1 + β6X4X1 + β2X2 + β3X3 + β4X4 + β5X5
Create the variable X4X1 (femage) and then test its conditional significance in the regression model; a sketch of this construction appears below, followed by the fitted model. If femage proves to be a significant predictor of salaries, then there is strong evidence to conclude that the rate of change in female salaries as a function of age differs from that of males:
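A minimal sketch of the slope-dummy construction, assuming a pandas DataFrame df holding the salary data with the column names used above:

import statsmodels.api as sm

# femage = Sex_1Fem * age lets the age slope differ by sex
df["femage"] = df["Sex_1Fem"] * df["age"]
cols = ["age", "femage", "yrs_asoc", "yrs_full", "Sex_1Fem", "Market"]
X = sm.add_constant(df[cols])
fit = sm.OLS(df["Salary"], X).fit()
print(fit.pvalues["femage"])   # ~ .855 here, so no evidence of a separate female slope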
Regression Analysis: Salary versus age, femage, ...

The regression equation is
Salary = 22082 + 85.1 age + 11.7 femage + 543 yrs_asoc + 701 yrs_full
         - 1878 Sex_1Fem + 2673 Market

Predictor       Coef   SE Coef       T       P    VIF
Constant       22082      1877   11.77   0.000
age            85.07     48.36    1.76   0.081    4.4
femage         11.66     63.89    0.18   0.855   32.2
yrs_asoc      542.85     66.73    8.13   0.000    2.6
yrs_full      701.35     57.35   12.23   0.000    2.0
Sex_1Fem       -1878      2687   -0.70   0.486   31.5
Market        2672.8     825.1    3.24   0.001    1.0

S = 2594     R-Sq = 85.3%     R-Sq(adj) = 84.7%

Analysis of Variance
Source           DF           SS          MS        F      P
Regression        6   5585990999   930998500   138.36  0.000
Residual Error  143    962235684     6728921
Total           149   6548226683
The regression shows that the newly created variable of femage is not conditionally significant.
Thus, we cannot conclude that the rate of change in female salaries as a function of age differs
from that of male salaries.
11-90
Regression Analysis: hseval versus sizehse, taxrate, incom72, Homper

The regression equation is
hseval = - 32.7 + 6.74 sizehse - 223 taxrate + 0.00464 incom72 + 11.2 Homper

Predictor       Coef    SE Coef       T       P    VIF
Constant     -32.694      8.972   -3.64   0.000
sizehse        6.740      1.880    3.58   0.001    1.4
taxrate      -222.96      45.39   -4.91   0.000    1.2
incom72     0.004642   0.001349    3.44   0.001    1.2
Homper        11.215      4.592    2.44   0.017    1.3

S = 3.610     R-Sq = 49.3%     R-Sq(adj) = 47.0%

Analysis of Variance
Source           DF        SS       MS       F      P
Regression        4   1079.08   269.77   20.70   0.000
Residual Error   85   1107.55    13.03
Total            89   2186.63
All of the independent variables are conditionally significant. Now add the percent of
commercial property to the model to see if it is significant:
Regression Analysis: hseval versus sizehse, taxrate, ...

The regression equation is
hseval = - 31.6 + 6.76 sizehse - 218 taxrate + 0.00453 incom72 + 10.3 Homper
         - 2.18 Comper

Predictor       Coef    SE Coef       T       P    VIF
Constant     -31.615      9.839   -3.21   0.002
sizehse        6.757      1.892    3.57   0.001    1.4
taxrate      -217.63      49.58   -4.39   0.000    1.4
incom72     0.004534   0.001412    3.21   0.002    1.4
Homper        10.287      5.721    1.80   0.076    2.0
Comper        -2.182      7.940   -0.27   0.784    1.7

S = 3.630     R-Sq = 49.4%     R-Sq(adj) = 46.4%

Analysis of Variance
Source           DF        SS       MS       F      P
Regression        5   1080.07   216.01   16.40   0.000
Residual Error   84   1106.56    13.17
Total            89   2186.63
With a t-statistic of -.27 we do not have strong enough evidence to reject H0 that the slope coefficient on percent commercial property is zero. The conditional F test:

F(Comper) = (SSR_F - SSR_R) / s²Y|X = (1080.07 - 1079.08) / (3.63)² = .075

Thus, at any common level of alpha, do not reject H0 that the percent commercial property has no effect on house values.
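The same conditional F test can be run directly as a nested-model comparison. A minimal sketch, assuming a pandas DataFrame df with the column names used above:

import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

reduced = smf.ols("hseval ~ sizehse + taxrate + incom72 + Homper", data=df).fit()
full = smf.ols("hseval ~ sizehse + taxrate + incom72 + Homper + Comper",
               data=df).fit()
print(anova_lm(reduced, full))   # F ~ .075 on 1 and 84 df, matching the test above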
Add percent industrial property to the base model:
Regression Analysis: hseval versus sizehse, taxrate, ...

The regression equation is
hseval = - 28.6 + 6.10 sizehse - 232 taxrate + 0.00521 incom72 + 8.68 Homper
         - 7.50 Indper

Predictor       Coef    SE Coef       T       P    VIF
Constant     -28.643      9.602   -2.98   0.004
sizehse        6.096      1.956    3.12   0.003    1.5
taxrate      -232.34      46.00   -5.05   0.000    1.2
incom72     0.005208   0.001431    3.64   0.000    1.4
Homper         8.681      5.070    1.71   0.091    1.6
Indper        -7.505      6.427   -1.17   0.246    1.7

S = 3.602     R-Sq = 50.2%     R-Sq(adj) = 47.2%

Analysis of Variance
Source           DF        SS       MS       F      P
Regression        5   1096.77   219.35   16.91   0.000
Residual Error   84   1089.86    12.97
Total            89   2186.63
Likewise, the percent industrial property is not significantly different from zero. The conditional F test:

F(Indper) = (SSR_F - SSR_R) / s²Y|X = (1096.77 - 1079.08) / (3.602)² = 1.36

Again, this is lower than the critical value of F at common levels of alpha; therefore, do not reject H0 that the percent industrial property has no effect on house values.
Tax rate models:
Regression Analysis: taxrate versus taxbase, expercap, Homper

The regression equation is
taxrate = - 0.0174 - 0.000000 taxbase + 0.000162 expercap + 0.0424 Homper

Predictor          Coef      SE Coef       T       P    VIF
Constant      -0.017399     0.007852   -2.22   0.029
taxbase     -0.00000000   0.00000000   -0.80   0.426    1.2
expercap     0.00016204   0.00003160    5.13   0.000    1.1
Homper         0.042361     0.009378    4.52   0.000    1.2

S = 0.007692     R-Sq = 31.9%     R-Sq(adj) = 29.5%

Analysis of Variance
Source           DF           SS           MS       F      P
Regression        3   0.00237926   0.00079309   13.41   0.000
Residual Error   86   0.00508785   0.00005916
Total            89   0.00746711
Since taxbase is not significant, it is dropped from the model:
Regression Analysis: taxrate versus expercap, Homper

The regression equation is
taxrate = - 0.0192 + 0.000158 expercap + 0.0448 Homper

Predictor          Coef      SE Coef       T       P    VIF
Constant      -0.019188     0.007511   -2.55   0.012
expercap     0.00015767   0.00003106    5.08   0.000    1.1
Homper         0.044777     0.008860    5.05   0.000    1.1

S = 0.007676     R-Sq = 31.4%     R-Sq(adj) = 29.8%

Analysis of Variance
Source           DF          SS          MS       F      P
Regression        2   0.0023414   0.0011707   19.87   0.000
Residual Error   87   0.0051257   0.0000589
Total            89   0.0074671
Both of the independent variables are significant. This becomes the base model, to which we now add percent commercial property and percent industrial property sequentially:
Regression Analysis: taxrate versus expercap, Homper, Comper

The regression equation is
taxrate = - 0.0413 + 0.000157 expercap + 0.0643 Homper + 0.0596 Comper

Predictor          Coef      SE Coef       T       P    VIF
Constant      -0.041343     0.008455   -4.89   0.000
expercap     0.00015660   0.00002819    5.55   0.000    1.1
Homper         0.064320     0.009172    7.01   0.000    1.4
Comper          0.05960      0.01346    4.43   0.000    1.3

S = 0.006966     R-Sq = 44.1%     R-Sq(adj) = 42.2%

Analysis of Variance
Source           DF          SS          MS       F      P
Regression        3   0.0032936   0.0010979   22.62   0.000
Residual Error   86   0.0041735   0.0000485
Total            89   0.0074671
Percent commercial property is conditionally significant and an important independent variable
as shown by the conditional F-test:
F(Comper) = (SSR_F - SSR_R) / s²Y|X = (.003294 - .00234) / (.006966)² = 19.62
With 1 degree of freedom in the numerator and (90 - 3 - 1) = 86 degrees of freedom in the denominator, the critical value of F at the .05 level is 3.95. Hence we would conclude that the percentage of commercial property has a statistically significant positive impact on the tax rate.
We now add industrial property to test the effect on tax rate:
Regression Analysis: taxrate versus expercap, Homper, Indper

The regression equation is
taxrate = - 0.0150 + 0.000156 expercap + 0.0398 Homper - 0.0105 Indper

Predictor          Coef      SE Coef       T       P    VIF
Constant      -0.015038     0.009047   -1.66   0.100
expercap     0.00015586   0.00003120    5.00   0.000    1.1
Homper          0.03982      0.01071    3.72   0.000    1.6
Indper         -0.01052      0.01273   -0.83   0.411    1.5

S = 0.007690     R-Sq = 31.9%     R-Sq(adj) = 29.5%

Analysis of Variance
Source           DF           SS           MS       F      P
Regression        3   0.00238178   0.00079393   13.43   0.000
Residual Error   86   0.00508533   0.00005913
Total            89   0.00746711
The percent industrial property is insignificant with a t-statistic of only -.83. The conditional F test confirms that the variable does not have a significant impact on the tax rate:

F(Indper) = (SSR_F - SSR_R) / s²Y|X = (.002382 - .00234) / (.00769)² = .683

With 1 degree of freedom in the numerator and (90 - 3 - 1) = 86 degrees of freedom in the denominator, the critical value of F at the .05 level is 3.95. Hence we would conclude that the percentage of industrial property has no statistically significant impact on the tax rate.
In conclusion, we found no evidence to back three of the activists' claims and strong evidence to reject one of them. We concluded that commercial development has no effect on house value, while it actually increases the tax rate. In addition, we concluded that industrial development has no effect on house value or tax rate.
It was important to include all of the other independent variables in the regression models because the conditional significance of any one variable is influenced by which other independent variables are in the regression model. Therefore, it is important to test whether direct relationships can be 'explained' by the relationships with other predictor variables.
11-91 a. Correlation matrix:
Correlations: EconGPA, sex, Acteng, ACTmath, ACTss, ACTcomp, HSPct

          EconGPA      sex   Acteng  ACTmath    ACTss  ACTcomp
sex         0.187
            0.049
Acteng      0.387    0.270
            0.001    0.021
ACTmath     0.338   -0.170    0.368
            0.003    0.151    0.001
ACTss       0.442   -0.105    0.448    0.439
            0.000    0.375    0.000    0.000
ACTcomp     0.474   -0.084    0.650    0.765    0.812
            0.000    0.478    0.000    0.000    0.000
HSPct       0.362    0.216    0.173    0.290    0.224    0.230
            0.000    0.026    0.150    0.014    0.060    0.053
There is a positive relationship between EconGPA and all of the independent variables, as expected. Note the high correlation between the composite ACT score and its individual components, which is also as expected. Thus, multicollinearity is likely to be a serious concern in this regression model.
Regression Analysis: EconGPA versus sex, Acteng, ...

The regression equation is
EconGPA = - 0.050 + 0.261 sex + 0.0099 Acteng + 0.0064 ACTmath + 0.0270 ACTss
          + 0.0419 ACTcomp + 0.00898 HSPct

71 cases used 41 cases contain missing values

Predictor       Coef    SE Coef       T       P    VIF
Constant     -0.0504     0.6554   -0.08   0.939
sex           0.2611     0.1607    1.62   0.109    1.5
Acteng       0.00991    0.02986    0.33   0.741    2.5
ACTmath      0.00643    0.03041    0.21   0.833    4.3
ACTss        0.02696    0.02794    0.96   0.338    4.7
ACTcomp      0.04188    0.07200    0.58   0.563   12.8
HSPct       0.008978   0.005716    1.57   0.121    1.4

S = 0.4971     R-Sq = 34.1%     R-Sq(adj) = 27.9%

Analysis of Variance
Source           DF        SS       MS      F      P
Regression        6    8.1778   1.3630   5.52   0.000
Residual Error   64   15.8166   0.2471
Total            70   23.9945
As expected, multicollinearity is affecting the results. A strategy of dropping the variable with the lowest t-statistic with each successive model removes the following variables (in order): 1) ACTmath, 2) Acteng, 3) ACTss, 4) HSPct. The two variables that remain, gender and ACTcomp, form the final model:
Regression Analysis: EconGPA versus sex, ACTcomp

The regression equation is
EconGPA = 0.322 + 0.335 sex + 0.0978 ACTcomp

73 cases used 39 cases contain missing values

Predictor       Coef   SE Coef       T       P    VIF
Constant      0.3216    0.5201    0.62   0.538
sex           0.3350    0.1279    2.62   0.011    1.0
ACTcomp      0.09782   0.01989    4.92   0.000    1.0

S = 0.4931     R-Sq = 29.4%     R-Sq(adj) = 27.3%

Analysis of Variance
Source           DF        SS       MS       F      P
Regression        2    7.0705   3.5352   14.54   0.000
Residual Error   70   17.0192   0.2431
Total            72   24.0897
Both independent variables are conditionally significant.
b. The model could be used in college admission decisions by creating a predicted GPA
in economics based on sex and ACT comp scores. This predicted GPA could then be used with
other factors in deciding admission. Note that this model predicts that females will outperform
males with equal test scores. Using this model as the only source of information may lead to
charges of unequal treatment.
11-92
92.7% of the total variation in gross revenue from a medical practice can be explained by the variations in the number of hours worked, the number of physicians, the number of allied health personnel, and the number of rooms used in the practice. The regression model indicates that gross revenue from a medical practice is positively associated with the number of hours worked, the number of physicians in the practice, the number of allied health personnel, and the number of rooms used in the practice. All of these variables are expected to have a positive effect on gross revenue. With df = (50 - 4 - 1) = 45, the critical value of t for a two-tailed test at an alpha of .05 is approximately 2.021. Therefore, one can conclude that all of the independent variables are conditionally significant with the sole exception of the number of rooms used in the practice. While this variable has a positive estimated coefficient, it is not significantly different from zero.
The number of physicians in the practice has a large effect on gross revenue: all else equal, for a 1% increase in the number of physicians in the practice, we estimate a .673% increase in gross revenue.
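Since the coefficient is interpreted in percentage terms, the model is presumably estimated in logarithms (an assumption here, since the problem's output is not reproduced). Under a log-log specification the coefficient is an elasticity:

ln(revenue) = b0 + ... + .673 ln(physicians) + ...

d ln(revenue) / d ln(physicians) = .673

so, all else equal, a 1% increase in the number of physicians implies roughly a .673% increase in estimated gross revenue.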
Note however that before additional physicians are hired, the analysis should be extended to
include not only gross revenue but also profits.
11-93
82.5% of the total variation in the industry 4-firm concentration ratio can be explained by the variation in the amount of capital required for entry and in economies of scale.
All else equal, for a one percent increase in the amount of capital required for entry into the
industry, the 4-firm concentration ratio is estimated to increase by .234%.
All else equal, for a one percent increase in the economies of scale of the industry, the 4-firm
concentration ratio is estimated to increase by .267%.
Both of the independent variables are conditionally significant since their t-statistics are larger than the critical value of t for a two-tailed test at an alpha of .05 (df = 35 - 2 - 1 = 32; t-critical approximately 2.042). Therefore, one can conclude that both capital requirements and economies of scale are conditionally significant.
Both variables have the expected positive sign: the amount of capital required to enter the industry acts as a 'barrier to entry', and hence higher concentration is likely to result. Likewise, the more significant the economies of scale, the larger the cost advantage a big firm holds over a smaller firm. This is an industry where it is more difficult to 'start small' and grow. Therefore, again, we would expect higher concentration in the industry.