Chapter 13 Multiple Regression Analysis

LEARNING OBJECTIVES
This chapter presents the potential of multiple regression analysis as a tool in
business decision making and its applications, thereby enabling you to:
1. Define multiple regression analysis.
2. Specify a multiple regression model.
3. Interpret the results of the model.
4. Understand and apply significance tests of the regression model and its coefficients.
5. Compute and interpret residuals, the standard error of the estimate, and the coefficient of determination.
CHAPTER TEACHING STRATEGY
Chapter 12, using simple regression, prepared the groundwork for chapter 13 by presenting the regression model along with mechanisms for testing the strength of the model, such as se, r2, a t test of the slope, and the residuals. In
this chapter, multiple regression is presented as an extension of the simple linear
regression case. It is initially pointed out that any model that has at least one
interaction term or a variable that represents a power of two or more is considered
a multiple regression model. Multiple regression opens up the possibilities of
predicting by multiple independent variables and nonlinear relationships. It is
emphasized in the chapter that with both simple and multiple regression models
there is only one dependent variable. Where simple regression utilizes only one
independent variable, multiple regression can utilize more than one independent
variable.
© 2010 John Wiley & Sons Canada, Ltd.
435
Presented early in chapter 13 are the simultaneous equations that need to
be solved to develop a first-order multiple regression model using two predictors.
This should help the student to see that there are three equations with three
unknowns to be solved. In addition, there are eight values that need to be
determined before solving the simultaneous equations (Σx1, Σx2, Σy, Σx1², . . .).
Suppose there are five predictors. Six simultaneous equations must be solved, and
the number of sums needed as constants in the equations becomes overwhelming.
At this point, the student will begin to realize that most researchers do not want to
take the time or the effort to solve for multiple regression models by hand. For
this reason, much of the chapter is presented using computer printouts. The
assumption is that the use of multiple regression analysis is largely from computer
analysis.
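The matrix route those computer packages take can be sketched briefly. The following is a minimal Python illustration with hypothetical data (not a dataset from the text); numpy's least-squares routine solves the same normal equations for the first-order, two-predictor model that would otherwise be worked by hand.

```python
import numpy as np

# Hypothetical data generated from y = 2 + 1.5*x1 + 3*x2 (exact, no noise),
# so the fitted coefficients should recover these values.
x1 = np.array([12.0, 15.0, 9.0, 20.0, 14.0, 18.0])
x2 = np.array([3.0, 5.0, 2.0, 7.0, 4.0, 6.0])
y = 2.0 + 1.5 * x1 + 3.0 * x2

# Design matrix with a leading column of 1s for the intercept b0.
X = np.column_stack([np.ones_like(x1), x1, x2])

# lstsq minimizes ||y - X b||^2, i.e., it solves the normal equations
# (X'X) b = X'y without forming them explicitly.
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]
```

With five predictors the by-hand approach would require six simultaneous equations; the matrix formulation above scales to any number of predictors without extra effort.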
Topics included in this chapter are similar to the ones in chapter 12
including tests of the slope, R2, and se. In addition, an adjusted R2 is introduced in
chapter 13. The adjusted R2 takes into account the degrees of freedom for error
and the total degrees of freedom, whereas R2 does not. If there is a significant
discrepancy between the adjusted R2 and R2, then the regression model may not be
as strong as it appears to be from the R2. The gap between R2 and adjusted R2
tends to increase as nonsignificant independent variables are added to the
regression model and decreases with increased sample size.
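The behaviour just described follows from the standard formula, adjusted R2 = 1 − (1 − R2)(n − 1)/(n − k − 1), where n is the number of observations and k the number of independent variables. A small sketch with hypothetical figures:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2: 1 - (1 - R^2)(n - 1)/(n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Hypothetical model with R^2 = .90 on n = 20 observations:
with_8_predictors = adjusted_r2(0.90, 20, 8)   # gap widens with more predictors
with_2_predictors = adjusted_r2(0.90, 20, 2)
with_large_sample = adjusted_r2(0.90, 200, 8)  # gap shrinks as n grows
```

At a fixed R2, piling on predictors widens the gap between R2 and adjusted R2, while a larger sample narrows it, exactly the pattern noted above.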
CHAPTER OUTLINE
13.1 The Multiple Regression Model
Multiple Regression Model with Two Independent Variables (First-Order)
Determining the Multiple Regression Equation
A Multiple Regression Model
13.2 Significance Tests of the Regression Model and Its Coefficients
Testing the Overall Model
Significance Tests of the Regression Coefficients
13.3 Residuals, Standard Error of the Estimate, and R2
Residuals
SSE and Standard Error of the Estimate
Coefficient of Multiple Determination (R2)
Adjusted R2
13.4 Interpreting Multiple Regression Computer Output
A Re-examination of the Multiple Regression Output
KEY TERMS
Adjusted R2
Coefficient of Multiple Determination (R2)
Dependent Variable
Independent Variable
Least Squares Analysis
Multiple Regression
Outliers
Partial Regression Coefficient
Residual
Response Plane
Response Surface
Response Variable
Standard Error of the Estimate
SOLUTIONS TO PROBLEMS IN CHAPTER 13
13.1
The regression model is:
ŷ = 25.0287 – 0.0497 x1 + 1.9282 x2
Predicted value of y for x1 = 200 and x2 = 7 is:
ŷ = 25.0287 – 0.0497(200) + 1.9282(7) = 28.586
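The arithmetic above can be checked directly (a trivial sketch using the coefficients given in the solution):

```python
# Coefficients from the fitted model in Problem 13.1.
b0, b1, b2 = 25.0287, -0.0497, 1.9282
x1, x2 = 200, 7
y_hat = b0 + b1 * x1 + b2 * x2  # 25.0287 - 9.94 + 13.4974
```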
13.2
The regression model is:
ŷ = 118.5595 – 0.0794 x1 – 0.8843 x2 + 0.3769 x3
Predicted value of y for x1 = 33, x2 = 29, and x3 = 13 is:
ŷ = 118.5595 – 0.0794(33) – 0.8843(29) + 0.3769(13) = 95.1943
13.3
The regression model is:
ŷ = 121.62 – 0.174 x1 + 6.02 x2 + 0.00026 x3 + 0.0041 x4
There are four independent variables. If x2, x3, and x4 are held constant,
the predicted y will decrease by 0.174 for every unit increase in x1.
Predicted y will increase by 6.02 for every unit increase in x2 as x1, x3, and
x4 are held constant. Predicted y will increase by 0.00026 for every unit
increase in x3 holding x1, x2, and x4 constant. If x4 is increased by one
unit, the predicted y will increase by 0.0041 if x1, x2, and x3 are held
constant.
13.4
The regression model is:
ŷ = 31,409.5 + 0.08425 x1 + 289.62 x2 – 0.0947 x3
For every unit increase in x1, the predicted y increases by 0.08425 if x2 and
x3 are held constant. The predicted y will increase by 289.62 for every
unit increase in x2 if x1 and x3 are held constant. The predicted y will
decrease by 0.0947 for every unit increase in x3 if x1 and x2 are held
constant.
13.5
The regression model is:
Per Capita = –7,655.99 + 116.66 Paper Consumption – 265.09 Fish Consumption + 45.63 Gasoline Consumption
For every unit increase in paper consumption, the predicted per capita
consumption increases by 116.66 if fish and gasoline consumptions are
held constant. For every unit increase in fish consumption, the predicted
per capita consumption decreases by 265.09 if paper and gasoline
consumptions are held constant. For every unit increase in gasoline
consumption, the predicted per capita consumption increases by 45.63 if
paper and fish consumptions are held constant.
13.6
The regression model is:
Insider Ownership =
17.8141 – 0.0651 Debt Ratio – 0.1286 Dividend Payout
For every unit of increase in debt ratio there is a predicted decrease of
0.0651 in insider ownership if dividend payout is held constant. If
dividend payout is increased by one unit, then there is a predicted drop of
insider ownership by 0.1286 with debt ratio held constant.
13.7
There are 9 predictors in this model. The F test for overall significance of
the model is 1.99 with a p-value of .0825. This model is not significant at
α = .05. Only one of the t values is statistically significant. Predictor x1
has a t of 2.73, which has an associated probability of .011, and this is
significant at α = .05.
13.8
This model contains three predictors. The F test is significant at α = .05
but not at α = .01. The t values indicate that only one of the three
predictors is significant. Predictor x1 yields a t value of 3.41 with an
associated probability of .005. The recommendation is to rerun the model
using only x1 and then search for other variables besides x2 and x3 to
include in future models.
13.9
The regression model is:
Per Capita = –7,655.99 + 116.66 Paper Consumption – 265.09 Fish Consumption + 45.63 Gasoline Consumption
This model yields an F = 14.32 with p-value = .0023. Thus, there is
overall significance at α = .01. One of the three predictors is significant.
Gasoline Consumption has a t = 2.66 with p-value of .033, which is
statistically significant at α = .05. The p-values of the t statistics for the
other two predictors are insignificant indicating that a model with just
Gasoline Consumption as a single predictor might be nearly as strong.
13.10 The regression model is:
Insider Ownership =
17.8141 – 0.0651 Debt Ratio – 0.1286 Dividend Payout
The overall value of F is only 0.02 with p-value of .978. This model is
not significant. Neither of the t values is significant (tDebt = – 0.21 with a
p-value of .840 and tDividend = – 0.12 with a p-value of .905).
13.11 The regression model is:
ŷ = 3.98077 + 0.07322 x1 – 0.03232 x2 – 0.00389 x3
The overall F for this model is 100.47 with p-value of .00000003. This
model is significant at α = .0000001. Only one of the predictors, x1, has
a significant t value (t = 3.50, p-value of .005). The other independent
variables have nonsignificant t values (x2: t = –1.55, p-value of .150 and
x3: t = –1.01, p-value of .332). Since x2 and x3 are nonsignificant
predictors, the researcher should consider using a simple regression
model with only x1 as a predictor. The R2 would drop somewhat, but the
model would be much more parsimonious.
13.12 The regression equation for the model using both x1 and x2 is:
ŷ = 243.4408 – 16.6079 x1 – 0.0732 x2
The overall F = 156.89 with a p-value of .000. x1 is a significant
predictor of y as indicated by t = – 16.10 and a p-value of .000.
For x2, t = – 0.39 with a p-value of .702. x2 is not a significant predictor
of y when included with x1. Since x2 is not a significant predictor, the
researcher might want to rerun the model using just x1 as a predictor.
The regression model using only x1 as a predictor is:
ŷ = 235.1429 – 16.7678 x1
There is very little change in the coefficient of x1 from the first model
(2 predictors) to this model. The overall F = 335.47 with a p-value of
.000 is highly significant. By using the one-predictor model, we get
virtually the same predictability as with the two-predictor model, and it is
more parsimonious.
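The comparison above can be illustrated on synthetic data (not the textbook's dataset): when a second predictor is pure noise, R2 for the two-predictor fit barely exceeds R2 for the one-predictor fit.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 30
x1 = rng.uniform(0.0, 10.0, n)
x2 = rng.uniform(0.0, 10.0, n)                  # noise predictor, unrelated to y
y = 240.0 - 16.6 * x1 + rng.normal(0.0, 2.0, n)

def r_squared(y, *predictors):
    """Fit by least squares and return R^2 = 1 - SSE/SST."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    sse = np.sum((y - X @ b) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - sse / sst

r2_two = r_squared(y, x1, x2)  # two-predictor model
r2_one = r_squared(y, x1)      # parsimonious one-predictor model
```

Raw R2 can never decrease when a predictor is added to a nested least-squares model, which is why the adjusted R2, not the raw R2, is the right basis for preferring the smaller model.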
13.13 There are 3 predictors in this model and 15 observations.
The regression equation is:
ŷ = 657.053 + 5.7103 x1 – 0.4169 x2 – 3.4715 x3
F = 8.96 with a p-value of .0027
x1 is significant at α = .01 (t = 3.19, p-value of .0087)
x3 is significant at α = .05 (t = – 2.41, p-value of .0349)
The model is significant overall.
13.14 The standard error of the estimate is 3.503. R2 is .408 and the adjusted R2
is only .203. This indicates that there are a lot of insignificant predictors
in the model. That is underscored by the fact that eight of the nine
predictors have nonsignificant t values.
13.15 S = 9.722, R2 = .515, but the adjusted R2 is only .404. The difference
between the two is due to the fact that two of the three predictors in the
model are nonsignificant. The model fits the data only modestly. The
adjusted R2 indicates that 40.4% of the variance of y is accounted for by
this model and 59.6% is unaccounted for by the model.
13.16 The standard error of the estimate of 14,599.85 indicates that this model
predicts Per Capita Personal Consumption to within ± 14,599.85 about
68% of the time. The entire range of Per Capita Personal Consumption for
the data is slightly less than 110,000. Relative to this range, the standard error of the
estimate is modest. R2 = .85989 and the adjusted value of R2 is .799848
indicating that there are potentially some nonsignificant variables in the
model. An examination of the t statistics reveals that two of the three
predictors are not significant. The model has relatively good
predictability.
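The standard error of the estimate used in these interpretations is se = sqrt(SSE/(n − k − 1)); roughly 68% of residuals fall within ± se when the errors are normal. A minimal sketch with hypothetical residuals (not from the textbook's data):

```python
import math

def standard_error_of_estimate(residuals, k):
    """s_e = sqrt(SSE / (n - k - 1)) for k independent variables."""
    n = len(residuals)
    sse = sum(e * e for e in residuals)  # sum of squared errors
    return math.sqrt(sse / (n - k - 1))

# Hypothetical residuals from a 3-predictor fit on n = 8 observations.
resid = [1.2, -0.8, 0.5, -1.4, 0.9, 0.3, -0.6, -0.1]
s_e = standard_error_of_estimate(resid, k=3)
```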
13.17 S = 6.490. R2 = .0056. R2 (adj.) = – .243. This model has no
predictability.
The value of S = se = 0.2331, R2 = .965, and adjusted R2 = .955. This is a
very strong regression model. However, since x2 and x3 are not significant
predictors, the researcher should consider using a simple regression
model with only x1 as a predictor. The R2 would drop somewhat, but the
model would be much more parsimonious.
13.19 For the regression equation for the model using both x1 and x2, S = se =
6.333, R2 = .963 and adjusted R2 = .957. Overall, this is a very strong
model. For the regression model using only x1 as a predictor, the standard
error of the estimate is 6.124, R2 = .963 and the adjusted R2 = .960. The
value of R2 is the same as it was with the two predictors. However, the
adjusted R2 is slightly higher with the one-predictor model because the
non-significant variable has been removed. In conclusion, by using the
one-predictor model, we get virtually the same predictability as with the
two-predictor model, and it is more parsimonious.
13.20 R2 = .710, adjusted R2 = .630, S = se = 109.43. The model is overall
significant. A comparison of R2 with the adjusted R2 shows that the
adjusted R2 reduces the overall proportion of variation of the dependent
variable accounted for by the independent variables by a factor of 0.08, or
8%. The model is moderately strong.
13.21 The Histogram indicates that there may be some problem with the error
terms being normally distributed as does the Normal Probability Plot of
the Residuals in which the plotted points are not completely lined up on
the line. The Residuals vs. Fits plot reveals that there may be some lack of
homogeneity of error variance.
13.22 There are four predictors. The equation of the regression model is:
ŷ = –55.93 + 0.01049 x1 – 0.1072 x2 + 0.57922 x3 – 0.8695 x4
The test for overall significance yields an F = 55.52 with a p-value of .000,
which is significant at α = .001. Three of the t tests for the regression
coefficients are significant at α = .01, including the coefficients for
x2, x3, and x4. The R2 value of 80.2% indicates strong predictability for the
model. The value of the adjusted R2 (78.7%) is close to R2 and S = se is
9.025.
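The overall F reported in these problems relates to R2 by F = (R2/k) / ((1 − R2)/(n − k − 1)). Since n is not stated in the problem, the figures below are illustrative only, chosen near the values in 13.22:

```python
def overall_f(r2, n, k):
    """Overall regression F: (R^2/k) / ((1 - R^2)/(n - k - 1))."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

# Illustrative values: R^2 = .80 with k = 4 predictors and n = 60 observations.
f_stat = overall_f(0.80, 60, 4)  # 55.0
```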
13.23 There are two predictors in this model. The equation of the regression
model is:
ŷ = 203.3937 + 1.1151 x1 – 2.2115 x2
The F test for overall significance yields a value of 24.55 with an
associated p-value of .0000013, which is significant at α = .00001. Both
variables yield t values that are significant at the 5% level of significance.
x2 is significant at α = .001. The R2 is a rather modest 66.3% and the
standard error of the estimate is 51.761.
13.24 The regression model is:
ŷ = 137.268 + 0.002515 x1 + 29.2061 x2
F = 10.89 with p = .005, S = se = 9.401, R2 = .731, adjusted R2 = .664. For
x1, t = 0.01 with p = .99 and for x2, t = 4.47 with p = .002. This model has
good predictability. The gap between R2 and adjusted R2 indicates that
there may be a non-significant predictor in the model. The t values show
x1 has virtually no predictability and x2 is a significant predictor of y.
13.25 The regression model is:
ŷ = 362.3054 – 4.74552 x1 – 13.8997 x2 + 1.874297 x3
F = 16.05 with p = .001, S = se = 37.07, R2 = .858, adjusted R2 = .804. For
x1, t = – 4.35 with p = .002; for x2, t = – 0.73 with p = .483, for x3, t = 1.96
with p = .086. Thus, only one of the three predictors, x1, is a significant
predictor in this model. This model has very good predictability (R2 =
.858). The gap between R2 and adjusted R2 underscores the fact that there
are two non-significant predictors in this model.
13.26 The regression model is:
Gold = – 51.5749 + 0.0696 Copper + 18.7835 Silver + 3.5378 Aluminum
The overall F for this model is 12.19 with a p-value of .002, which is
significant at α = .01. The t test for Silver is significant at α = .01 (t =
4.94, p = .001). The t test for Aluminum yields a t = 3.03 with a p-value of
.016, which is significant at α = .05. The t test for Copper is insignificant
with a p-value of .939. The value of R2 was 82.1% compared
to an adjusted R2 of 75.3%. The gap between the two indicates the
presence of some insignificant predictors (Copper). The standard error of
the estimate is 53.44.
13.27 The regression model was:
Treasury Rate = – 1.3128 + 0 Bank Rate + 0.9015 Prime Rate
= – 1.3128 + 0.9015 Prime Rate
F = 689.8266 with p = .00000836 (significant)
R2 = 0.993 and adjusted R2 = 0.991
The high value of adjusted R2 indicates that the model has very strong
predictability. The t test for Prime Rate is significant (t = 26.26,
p = .0000015). For the regression model using only Prime Rate as a
predictor, the standard error of the estimate is 0.073, R2 = .993 and the
adjusted R2 = .991. The value of R2 is the same as it was with the two
predictors. However, the adjusted R2 is higher with the one-predictor
model because the non-significant variable has been removed.
13.28 The regression model was:
Total Goods = – 0.5272 + 0.2575 Durable Goods + 0.1610 Semi-durable Goods + 0.5827 Non-durable Goods
F = 21,767.09 with a p-value of .000
S = se = 0.1985, R2 = 0.99972 and adjusted R2 = 0.99968. The high value
of adjusted R2 indicates that the model has very strong predictability.
All variables are significant (Durable Goods t = 12.79, p-value of .000;
Semi-durable Goods t = 8.67, p-value of .000; and Non-durable Goods t =
125.14, p-value of .000).
13.29 Exchange Rate (C$ per US$) = 1.949 + 0.0000 (Price Index) – 1.0406
(Relative Unit Labour Cost) + 0.0000 (PPI-Mfg) + 0.5442 (CPI).
The results show that two of the initial predictors – Price Index and
PPI-Mfg – have no predictive value at all. Relative Unit Labour Cost and
CPI are strong predictors. Relative Unit Labour Cost is inversely related
to the exchange rate.
13.30 The regression model was:
New Car Dealers = – 6854.01 + 3.957893 Used Vehicles and Parts + 0.248643 Total Excluding Used Vehicles and Parts – 1.43775 Gas Stations
F = 348.6134 with p = .000.
S = se = 1838.944, R2 = 0.987722, and adjusted R2 = 0.984889.
The high value of adjusted R2 indicates that the model has very strong
predictability. The t test for Used Vehicles and Parts is significant
(t = 5.37, p = 0.000128). The t test for Total Excluding Used Vehicles and
Parts is significant at α = .05 (t = 2.85, p = 0.013687). The t test for Gas
Stations yields a t = – 4.69 with a p-value of 0.000422.
13.31 The regression equation is:
ŷ = 87.890 – 0.256 x1 – 2.714 x2 + 0.071 x3
F = 47.571 with a p-value of .000, significant at α = .001.
S = se = 0.8503, R2 = .9407, adjusted R2 = .9209.
All three predictors produced significant t tests, with two of them
(x2 and x3) significant at α = .01 and the other, x1, significant at α = .05.
This is a very strong model.
13.32 Two of the diagnostic charts indicate that there may be a problem with the
error terms being normally distributed. The histogram indicates that the
error term distribution might be skewed to the right, and the normal
probability plot is somewhat nonlinear. In addition, the residuals vs. fits
chart indicates a potential heteroscedasticity problem, with residuals for
middle values of x producing more variability than those for lower and
higher values of x.