Chapter 12
Multiple Linear Regression
Introduction
Multiple linear regression (MLR) is an extension of simple linear regression. In the
previous chapter we considered a single dependent variable, y, and a single independent
variable x. MLR is used when there are two or more independent variables where the
model using population information is
y_t = β_0 + β_1 x_1t + β_2 x_2t + ... + β_k x_kt + ε_t        (12.1)
It should not be surprising that most interesting phenomena are too complex to be
modeled using just a single independent variable. Corn yield, from the previous chapter,
might be more accurately modeled using Eq. (12.1) with
y_i = corn yield per acre
x_1i = amount of water applied per acre
x_2i = amount of fertilizer applied per acre
x_3i = type of soil the corn is planted in.
In practice, the Keynesian consumption function considered in the
previous chapter is probably too simple to be of much use. A better consumption
function might be given by Eq. (12.1) with
y_i = real household consumption
x_1i = household real income
x_2i = the real interest rate
x_3i = household real wealth.
MLR allows a much more comprehensive model than does simple LR and should provide
superior predictions.
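The examples in this chapter are worked in SPSS, but a model like Eq. (12.1) can also be fit in a few lines of code. The sketch below, using Python's statsmodels package, fits such a model to made-up consumption data; the variable names and numbers are hypothetical and are not the data used later in the chapter.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 30
income = rng.uniform(500, 1500, n)      # x1: household real income
interest = rng.uniform(1, 8, n)         # x2: real interest rate
wealth = rng.uniform(1000, 5000, n)     # x3: household real wealth

# hypothetical "population" relationship plus a normal error term
consumption = 100 + 0.9 * income - 4.0 * interest + 0.01 * wealth + rng.normal(0, 5, n)

X = sm.add_constant(np.column_stack([income, interest, wealth]))
results = sm.OLS(consumption, X).fit()
print(results.params)                   # estimates of beta_0, beta_1, beta_2, beta_3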
Figure 12.1 is a time series plot showing how Arizona state and local government
employment has changed over time. It does not seem that a single straight line will fit this
data very well. You might be curious about the periodic dips in government
employment. The dips generally represent school teachers who are not considered (using
the government definition of employment) to be employed in the summer.
Figure 12.1 Arizona state and local government employment
12.1 The assumptions of MLR
MLR uses some assumptions. These are necessary for the mathematics to work out
properly and we need not worry about these details. We do need to worry about what
happens to the MLR process when these assumptions are not met (usually we will say
"the assumptions have been violated"). Strange and awful things can happen
when the assumptions have been violated, as we will see later. The assumptions used in
MLR:
1. The error terms ε_i are normally distributed.
2. The error terms are independent of past error terms, that is, E(ε_i | ε_(i-1), ε_(i-2), ...) = E(ε_i).
3. The populations all have equal variances, σ_1 = σ_2 = σ_3 = ...
4. The independent variables are not correlated with each other.
12.2 The MLR output
In this section we will examine a MLR output from the SPSS statistical software
package. You will find there is some carryover from the previous chapters, but that
there is some new material as well. In the previous chapter we considered a regression
equation with household consumption as the dependent variable and household income as
the independent variable. It seems quite reasonable that other variables might affect
consumption as well. The classical economists would maintain that interest rates affect
consumption (as interest rates increase people save more and consume less). Other
economists believe that household wealth is a determining factor in the household
consumption decision. Let's consider a MLR equation
y_i = β_0 + β_1 x_1i + β_2 x_2i + β_3 x_3i + ... + β_k x_ki + ε_i
where
y_i = real household consumption
x_1i = household real income
x_2i = the real interest rate
x_3i = household real wealth.
The SPSS regression output is shown in Fig. 12.2. The portion of Fig. 12.2 labeled
Coefficients contains values for the β̂'s, t-statistics, and p-values. The values for the
β̂'s are to be found in the column labeled Unstandardized Coefficients. Again we will
ignore the column labeled Standardized Coefficients. The column labeled Std. Error
contains values for the s_β̂'s. The column labeled t contains t-values used in
hypothesis testing and the column labeled Sig contains p-values. As before, we want to
determine whether we believe that a particular β = 0 or not. Recall that β = 0 means
that an independent variable does not cause a change to occur in the dependent variable.
β̂_1 (income) = 0.888, p = 0.000
β̂_2 (wealth) = 9.16E-02, p = 0.676
β̂_3 (interest rate) = -4.731, p = 0.062
Now we can see if there is a statistically significant relationship between household
income and household consumption. The easiest way to do this is with the p--value. The
p-value is located in the column labeled Sig. Note that the p-value is for a two-tailed test. Suppose that the level of significance is α = 0.05; then for β_1,
p = 0.000 < α = 0.05 and we would reject the hypothesis H_0: β_1 = 0. Thus we would
conclude that a statistically significant relation exists between household income and
household consumption. You might note the large value of the test statistic, t= 27.691,
would certainly lead to the rejection of the null hypothesis for any reasonable level of
significance.
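If only the t statistic is reported, the two-tailed p-value can be recovered directly. A minimal sketch in Python (scipy), assuming t = 27.691 and n - k - 1 = 30 - 3 - 1 = 26 degrees of freedom as in this example:

from scipy import stats

t_stat = 27.691                             # t statistic for the income coefficient
df = 30 - 3 - 1                             # n - k - 1 = 26 degrees of freedom
p_value = 2 * stats.t.sf(abs(t_stat), df)   # two-tailed p-value
print(p_value)                              # essentially 0.000, so reject H0: beta_1 = 0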
Figure 12.2 The SPSS regression output for a regression function using household
consumption for the dependent variable and household income, household wealth, and
the real interest rate as explanatory variables.
The row labeled WEALTH contains information about β̂_2, the coefficient associated
with household real wealth:
β̂_2 = 9.16E-02 = 0.0916
s_β̂2 = 0.217
Consider the hypothesis
H_0: β_2 = 0
H_A: β_2 ≠ 0
and consider the p-value, p = 0.676. Here we would not reject the null hypothesis, so we
cannot claim a statistically significant relationship between household wealth and household
consumption at a 5% level of significance.
Next, let's look at real interest rates (x_3). The estimates are
β̂_3 = -4.731
s_β̂3 = 2.041
The p-value, p = 0.062, indicates that we would reject the null hypothesis at a 10% level of
significance, but not at the 5% level. The negative sign indicates that when real interest
rates increase household consumption decreases (households save more and consume
less).
You may have noticed that β̂_3 = -4.731 has a greater magnitude than β̂_1. Do not think
that this means that interest rates have a greater influence on consumption than does
household income. The size of the coefficients depends on the units used when recording
the data. We could represent a five percent rate of interest as 5 or as 0.05. If we use 0.05
the resulting coefficient will be 100 times as large as the result we would get using 5. What
we can say is that a one unit increase in household income will increase consumption by
0.888 units and that a one unit increase in interest rates will decrease consumption by
4.731 units, but it matters what the units are.
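A quick way to convince yourself of this units effect is to fit the same made-up data with the interest rate recorded two different ways. The sketch below (Python, statsmodels, hypothetical data) shows the interest-rate coefficient scaling by a factor of 100 while nothing else about the fit changes.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
income = rng.uniform(500, 1500, 30)
rate_pct = rng.uniform(1, 8, 30)        # interest rate recorded as 5 for five percent
cons = 100 + 0.9 * income - 4.0 * rate_pct + rng.normal(0, 5, 30)

fit_pct = sm.OLS(cons, sm.add_constant(np.column_stack([income, rate_pct]))).fit()
fit_dec = sm.OLS(cons, sm.add_constant(np.column_stack([income, rate_pct / 100]))).fit()
print(fit_pct.params[2], fit_dec.params[2])   # second coefficient is 100 times the first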
12.3 The ANOVA table: Testing H_0: β_1 = β_2 = β_3 = ... = β_k = 0
We can also test the regression equation as a whole rather than just the individual parts.
We can use information in the ANOVA table to test the hypothesis
H_0: β_1 = β_2 = β_3 = ... = β_k = 0
H_A: Not all of the β's = 0
We can test this using the p--value from the ANOVA table (it is in the column labeled
SIG). This is p = 0.000 and we would reject the null hypothesis for any reasonable level
of significance. Thus we believe we have sufficient evidence to conclude that at least one
of the β's is not zero. The very large value of R² = 0.945 also indicates that the
regression equation fits the data quite well. A final bit of information we will use is that
an estimate of the population standard deviation can be found in the column labeled Std.
Error of the Estimate. That value is s = 5.4551, which is a point estimate for the
population standard deviation σ.
Note that the ANOVA table contains a column labeled df (this does stand for degrees of
freedom). The value df = 3 for the REGRESSION row indicates there are 3 slope terms
in the regression equation. The number of observations can be obtained from the df for
the TOTAL row. The number of observations is n = df TOTAL + 1 = 29+1 =30.
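The overall F test behind the ANOVA table can be reconstructed from the summary numbers quoted above (R² = 0.945, k = 3 slopes, n = 30 observations). A rough sketch in Python:

from scipy import stats

r2, k, n = 0.945, 3, 30
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))   # overall F statistic
p_value = stats.f.sf(f_stat, k, n - k - 1)     # upper-tail p-value
print(f_stat, p_value)                         # p is essentially 0: reject H0 that all slopes are 0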
Usually we experiment with different regression models until we find one with only
statistically significant coefficients. In this example, let's rerun the model excluding
wealth from the equation. The new MLR results are shown in Fig. 12.3. It appears that
both β̂_1 = 0.875 (income) and β̂_2 = -3.735 (interest rates) suggest statistically
significant relationships because each has a p-value of 0.000. So now the regression
equation contains only statistically significant relationships.
Once we have decided that we have a good regression equation, we can use it to predict.
Suppose that we wish a prediction for consumption if x_1 = 1000 and x_2 = 5; then
Cons = 117.977 + 0.875 x_1 - 3.735 x_2
Cons = 117.977 + 0.875(1000) - 3.735(5)
Cons = 974.302
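The same arithmetic in a few lines of Python, using the coefficients reported in Fig. 12.3:

b0, b1, b2 = 117.977, 0.875, -3.735     # intercept, income, interest rate coefficients
x1, x2 = 1000, 5                        # income and interest rate for the prediction
cons_hat = b0 + b1 * x1 + b2 * x2
print(cons_hat)                         # 974.302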
Note: the data used in this example are data I created and do not represent real
world data at all. There is an advantage to this: I know the true values for the β's. The
true values are β_1 = 0.9 and β_2 = -4. In this case, it looks like MLR has done a pretty
good job of estimating the values of the slope parameters.
Figure 12.3 SPSS Regression output using household consumption as the dependent
variable and household income and interest rates as independent variables.
12.4 Adjusted R2
Another term that we have not yet discussed in the MLR output is Adjusted R2 or
Adjusted R Square on the printout. Anytime you add a variable to a regression equation
you will increase R2 . This is true regardless of whether or not there is a statistically
significant relationship between the dependent variable and the variable added to the
equation. So you can make R2 as large as you want by adding more and more variables.
That is because any variable will have some correlation with the dependent variable.
Adjusted R2 is a statistic that helps determine whether a variable should be included in
the regression equation or not. A statistically insignificant variable will increase R2 but
decrease Adjusted R2 . Fig. 12.4 shows the values of R2 and adjusted R2 for regression
equations using different combinations of the variables x1, x2, x3, and x4, as independent
variables. The independent variables included in the equation are shown in the left most
column. Note that when x4 is eliminated from the equation, R² decreases but adjusted
R² increases. This suggests that x4 probably does not belong in the equation. When x3
is dropped from the equation, a large drop occurs in both statistics; this suggests that x3
belongs in the equation (unless some other evidence exists for dropping it).
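A small simulated illustration of this point, using made-up data with one real predictor and one pure-noise predictor (Python, statsmodels); the variable names here are hypothetical:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x1 = rng.uniform(0, 10, 40)             # the predictor that really matters
x4 = rng.normal(size=40)                # pure noise, unrelated to y
y = 2 + 3 * x1 + rng.normal(0, 2, 40)

small = sm.OLS(y, sm.add_constant(x1)).fit()
big = sm.OLS(y, sm.add_constant(np.column_stack([x1, x4]))).fit()
print(small.rsquared, big.rsquared)           # R^2 never falls when x4 is added
print(small.rsquared_adj, big.rsquared_adj)   # adjusted R^2 typically falls (whenever |t| for x4 is below 1)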
Figure 12.4. Different models showing how dropping a variable can affect R²
12.5 Collinearity
Let's consider another example. The model is
y_i = β_0 + β_1 x_1i + β_2 x_2i + β_3 x_3i + ε_i
The regression output is shown in Fig. 12.5. A close look at this regression output
indicates that something strange has happened.
First note that the value of R² is large, indicating a statistically significant relationship
between the dependent variable and some of the independent variables. The p-value for
the ANOVA table is p = 0.000, meaning we could reject H_0: β_1 = β_2 = β_3 = 0 for almost
any level of significance. These two bits of evidence strongly suggest that at least one of
the β's does not equal zero.
This evidence is contradicted when we look at the statistics for the individual β's,
however.
β̂_0 = 3.371, p = 0.111
β̂_1 = 3.382, p = 0.159
β̂_2 = 3.560, p = 0.448
β̂_3 = 0.124, p = 0.478
The p-values for all the β̂'s are fairly large. We would not reject the hypothesis that
any slope coefficient was zero for any level of significance smaller than 15%. In short,
this is the sort of result we would expect to get if none of the independent variables had a
statistically significant relationship with the dependent variable.
Figure 12.5. A regression output of collinear data
So what went wrong? Should we conclude that MLR does not work? No, what we have
here is a violation of one of the assumptions of MLR. We violated the assumption that
says the independent variables must not be correlated with each other. The scatterplot of
x1 and x2 is shown in Fig. 12.6. It seems pretty clear that these two variables are
correlated. The scatter plot of x1 and x3 is shown in Fig 12.7. These two variables are
not correlated with each other. The violation of the assumption that the independent
variables are not correlated is called collinearity or multicollinearity. The problem that
collinearity creates is that it makes a mess of the statistics. As above, you can conclude
both that there is and that there is not a statistically significant relationship between the
independent and dependent variables. Most modern statistical packages now provide at
least some means of helping detect collinearity (though none is perfect). SPSS calculates
a variance inflation factor (VIF). If a variable's VIF is between 0 and 10,
collinearity is presumed not to exist. If the VIF is greater than 10, it is presumed to exist.
The VIF for x1 is 839.712 in Fig. 12.5 and for x2 the VIF is 839.976. This suggests that
x1 and x2 are correlated with some other independent variable. The VIF for x3 is 1.017,
indicating that it is not correlated with any other independent variable.

Figure 12.6. Correlated independent variables

Figure 12.7. Uncorrelated independent variables
Value of VIF     Collinearity?
0-10             No
>10              Yes

Table 12.1 Values of the Variance Inflation Factor
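Outside of SPSS, VIFs are easy to compute. The sketch below uses statsmodels on made-up data in which x1 and x2 are deliberately constructed to be nearly collinear while x3 is unrelated to either:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(3)
x1 = rng.uniform(0, 10, 50)
x2 = x1 + rng.normal(0, 0.05, 50)       # nearly a copy of x1, so x1 and x2 are collinear
x3 = rng.uniform(0, 10, 50)             # unrelated to x1 and x2
X = sm.add_constant(np.column_stack([x1, x2, x3]))

for i in range(1, X.shape[1]):          # skip the constant column
    print("VIF for x%d: %.1f" % (i, variance_inflation_factor(X, i)))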
Next let's eliminate one of the collinear variables (x_2) from the regression and see what
happens. The new regression output is shown in Fig. 12.8.
Figure 12.8 A regression output with no collinear variables
The VIF suggests that the two remaining variables are not collinear. Note now that there
does seem to be a statistically significant relationship between y and x1, which had been
obscured by the collinearity. Collinearity can create a number of problems. Significant
relationships can appear to be insignificant, and insignificant relationships can appear
significant. In some cases the coefficients may have the wrong sign. Suppose that a
regression equation produced a negative value for the marginal propensity to consume.
This would mean that an increase in income would cause a decrease in consumption.
Clearly this would be incorrect. But that is one of the hazards of collinearity: the chance
of reaching the wrong conclusion.
12.6 Heteroscedasticity
Heteroscedasticity is a term derived from Greek, roughly meaning "different scatter".
Homoscedasticity roughly means "same scatter". This refers to one of the assumptions
of MLR, assumption 3 from the list in Sec. 12.1. This assumption is that all
populations have the same variance (the same scatter). If they don't have the same scatter
(heteroscedasticity), then this assumption has been violated. An example of
heteroscedastic data is shown in Fig. 12.9. An example of homoscedastic data is shown
in Fig. 12.10.
Figure 12.9. Heteroscedastic data
Figure 12.10. Homoscedastic data
Figure 12.11 Regression output of the heteroscedastic data.
Figure 12.12. Regression output of the homoscedastic data
The MLR output in Fig. 12.11 should be compared with that in Fig. 12.12. The effects
are similar to those for collinearity. In Fig. 12.11 the p-value in the ANOVA table is
p = 0.000, strongly suggesting that there is a statistically significant relationship present,
but the p-value for β̂_1 is 0.117, which suggests that there is not a statistically
significant relationship.
Unfortunately, there is no particularly good way of detecting the presence of
heteroscedasticity; that is, we can't compute a number like the VIF. One popular
technique is to plot the values of the dependent variable against the values of the
residual terms. The residual (or error) terms are given by
e_i = y_i^o - ŷ_i
where y_i^o is the i-th observed value of the dependent variable and ŷ_i is the prediction for that
observation. If the variance of the populations is constant, then we should not expect to
find any relationship between the error terms and the dependent variable. Fig. 12.14 is
an example of such a graph. It uses the results from the regression example in Fig. 12.12,
the example of homoscedastic data. Fig. 12.13 uses the residuals from the regression
shown in Fig. 12.11 (the heteroscedastic case). In that case small values of y_i^o seem to
produce small values of the residuals and large values of y_i^o seem to be associated
with large values of the residuals. No such pattern is present in the homoscedastic case.
Generally, if you can detect a pattern in a plot where one of the variables is the residuals
from a regression equation, there is a problem.
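A minimal sketch of such a residual plot, using made-up data whose error spread grows with x (Python with statsmodels and matplotlib):

import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
x = rng.uniform(1, 10, 100)
y = 5 + 2 * x + rng.normal(0, x)        # error spread grows with x (heteroscedastic)

fit = sm.OLS(y, sm.add_constant(x)).fit()
plt.scatter(y, fit.resid)               # dependent variable against residuals
plt.xlabel("observed y")
plt.ylabel("residual")
plt.show()                              # a fan-shaped pattern suggests heteroscedasticity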
Figure 12.13. Heteroscedastic residuals
Figure 12.14 Homoscedastic residuals
12.7 Serial correlation
Serial correlation is a violation of the assumption that the error terms are statistically
independent. This error would occur if the error for one observation were somehow
related to the errors of other observations. This sort of error most likely occurs with time
series data. Time series data occurs when the data is observed and recorded at distinct
time periods, usually daily, weekly, monthly, quarterly or yearly. For example, with the
consumption function example, data on income, consumption and interest rates might be
collected every quarter. Another way of collecting data would be to survey several
households at roughly the same time and record each household's consumption and
income. This sort of data is called cross sectional data. Note that we would not record
interest rates in the case of cross sectional data because we would not expect interest rates
to change from family to family.
The order with which the data are recorded should not be important with cross sectional
data. It should not matter which household is recorded as observation 10 or 3 or 49 in a
household survey of income and consumption. Such is not the case with time series data.
In the case of time series data order is important. The observation for 1949 should come
before the observation for 1950. Economic conditions that prevail in one time period
may persist for a number of time periods. Recessions, for example, often extend over
more than just a couple of time periods. If economic activity is below (or above) normal
in one time period, it may be expected to be below (or above) normal in the next time
period.
Suppose that interest rates did affect household consumption and we left it out of the
regression equation. If interest rates vary randomly over time perhaps not much damage
would result from our regression model. Interest rates don't seem to vary randomly over
time. You probably have heard the expression that interest rates are “low” or that they
are “high”. What people mean by this is that they have been “low” or “high” for an
extended period of time. If interest rates do affect consumption and we leave them out of
the regression equation, then the resulting equation will systematically under or over
predict consumption depending on whether interest rates or “low” or “high” at that
particular time. This means the error terms will tend to be positive or negative during
these time periods. So you might see a period of negative error terms followed by a
period of positive error terms. MLR assumes that the error terms should be randomly
distributed without any pattern. The most common cause of serial correlation is the
omission of a variable which should be present in the regression equation.
In the following example the true model is y_t = β_0 + β_1 x_1t + β_2 x_2t + ε_t. Suppose that
we don't know the nature of the true model (who knows everything?) and we think the true
model is y_t = β_0 + β_1 x_1t + ε_t. Again we use computer generated data to emphasize a
particular point. The regression results obtained from the model with the missing
independent variable are given in Fig. 12.15. The regression results using the true model
are shown in Fig. 12.16.
Figure 12.15 Regression results with a missing variable
The results in Fig. 12.15 suggest that no statistically significant relationship exists
between y_t and x_1t. The p-value from the coefficient table is p = 0.274, which is
considered very weak evidence for a relationship. Identical results appear in the ANOVA
table. Finally, the value R² = 0.025 is very close to zero, also suggesting the lack of a
statistically significant relationship.
If x_2t is now included in the regression equation, the results are those shown in Fig. 12.16.
These results seem quite good. The p-values all suggest the existence of statistically
significant results. Note the huge increase in the value of R². There is no evidence of
collinearity. Because this is the true model, it should not be too surprising to find good
results.
Figure 12.16. Regression results with no missing variable
If you check the portion of the table that contains R Square you will see a new item at the
right end of the table, the Durbin-Watson (DW) statistic. We will use the rules of
thumb for the DW statistic given in Table 12.2.
DW statistic     Serial correlation present?
0-1              Yes
1-1.5            Don't know
1.5-2.5          No
2.5-3            Don't know
3-4              Yes

Table 12.2 The Durbin-Watson statistic
So if DW has a value of 2.2, say, we would conclude that serial correlation is not a
problem. If DW has a value of 0.9 we would say that serial correlation is a problem.
If DW = 1.3 then we simply can't say. The regression output in Fig. 12.15 has DW =
0.037, a strong indication that serial correlation exists. The regression output in Fig. 12.16
has DW = 1.885 and we would conclude that serial correlation is not a problem for that
regression equation. We can also look at a time series plot of the residuals to see if a
pattern exists. If so, then serial correlation is likely a problem. Fig. 12.17 shows a plot of
residuals that do not contain a distinct pattern (if there is a pattern there, I can't see it).
The time series residual plot for the regression in Fig. 12.15 is shown in Figure 12.19.
Another typical pattern of residuals is given in Fig. 12.18, where the residuals tend to
have positive values for a long period of time followed by negative values for a long
period of time. One would expect that the residuals would have randomly determined
positive and negative values.
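For data you are analyzing yourself, the DW statistic is a one-line computation once the residuals are in hand. A rough sketch with made-up data in which a slowly varying variable has been omitted from the model (Python, statsmodels):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(5)
t = np.arange(100)
x1 = rng.uniform(0, 10, 100)
x2 = np.sin(t / 10)                     # slowly varying variable we will leave out
y = 1 + 2 * x1 + 30 * x2 + rng.normal(0, 1, 100)

fit = sm.OLS(y, sm.add_constant(x1)).fit()    # x2 omitted from the model
print(durbin_watson(fit.resid))         # typically well below 1.5 here, signalling serial correlation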
Figure 12.17 Residuals with no apparent pattern
Figure 12.18 Residuals with a pattern (The residuals seem to be consistently positive or
negative)
Figure 12.19 Another set of residuals with a pattern
We have seen that the presence of serial correlation can produce unreliable parameter
estimates and give misleading conclusions about the statistical significance of the
regression equation. How about using the resulting regression equation for prediction?
The plot of the residuals suggests that the equation with the missing variable will, at
times, give very large residuals. Suppose we find predictions for y using the model in Fig.
12.15 (the model with the missing variable). Consider the prediction for the 50th
value of y. The values for the independent variables are x_1t = 50.00 and x_2t = 148.41.
(Don't look for these values; they are values in the data set, but I haven't given it to you.)
The observed value of y at that point is y^o = 56.09. Using the model in Fig. 12.15 the
prediction is
ŷ = β̂_0 + β̂_1 x_1 = -44.594 - 0.433(50) = -66.24.
Using the "true" model in Figure 12.16,
ŷ = β̂_0 + β̂_1 x_1 + β̂_2 x_2 = 9.494 - 4.972(50) + 1.990(148.41) = 56.23.
The prediction using the “true” model (56.23) is much closer to the observed value of y
(56.09) than is the prediction using the model with the missing variable(-66.24). Not only
is the prediction wrong, but it even has the wrong sign. Not only are the parameter
estimates drastically wrong, but the predictions are as well in this case.
12.8 Dummy variables
Often business and economic data will contain observations which cannot be easily
modeled using causal data. For example, we might want to predict output for a particular
industry, but that industry might have been subject to a strike during the period of data
collection. Such data might look like that in Fig. 12.20. The strike occurred in time
periods 50, 51, and 52. Suppose that we have a pretty good regression model (ignoring
the strike period) given by
y_t = β_0 + β_1 x_t + ε_t
We might expect that the model might have some problems if we ignore the strike period.
The regression output is shown in Fig. 12.21. The time series plot of the actual and
predicted values is shown in Fig. 12.22.
Figure 12.20 Time series data of industry output with a strike occurring in time periods
50, 51, and 52.
The results of the regression look pretty bad. The p-values from the ANOVA table and
from the coefficient table both suggest that there is no relationship between time (the
independent variable) and output. The value of R² is also quite small. Finally, the value
of the DW statistic (DW = 0.726) suggests that there is a variable missing from the regression
model. What we haven't accounted for is the strike.
Figure 12.21. The output for the strike data using a regression model without a strike
dummy variable
Figure 12.22 Plot of actual vs predicted values for industry output without using a strike
dummy variable.
A “dummy variable” can be used to model the effects of the strike in the regression
model. A dummy variable is one that has a value of zero when the phenomenon is absent
and a value of one when it is present. For this example, the strike lasts for time periods
50, 51, and 52, so the dummy is given a value of one for these time periods. The dummy is
assigned a value of zero for the preceding and subsequent time periods, as shown in Table
12.3.
Time    Strike dummy (SD)
48      0
49      0
50      1
51      1
52      1
53      0
54      0

Table 12.3 A dummy variable. Variable SD has a value of 1 while a strike is in progress
and a value of 0 when one is not in progress.
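If the data live in a spreadsheet-like table, a dummy such as SD is easy to construct. A minimal sketch with pandas, using the hypothetical time periods from Table 12.3:

import pandas as pd

df = pd.DataFrame({"time": range(48, 55)})
df["SD"] = df["time"].isin([50, 51, 52]).astype(int)   # 1 during the strike, 0 otherwise
print(df)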
Figure 12.23 Output from a regression model that includes a dummy variable to
represent a strike
Figure 12.24. A plot of the predicted vs actual values for the regression model that
includes the strike dummy variable.
The regression output which includes the dummy variable is shown in Fig. 12.23 and a
plot of the actual vs predicted values is shown in Fig. 12.24. The value of the Durbin-Watson
statistic is greatly improved, with DW = 2.170. The value of R² is also
considerably improved. The ANOVA table indicates that at least one slope term is
significantly related to the dependent variable. The coefficient table shows both
independent variables have a statistically significant relationship. The plot in Fig. 12.24
also seems better than the one in Fig. 12.22. The regression equation for Fig. 12.22
predicts that output would have a tendency to decrease with time while the equation for
Fig. 12.24 shows that it increases with time.
We can use the regression equations to predict output at time period time = 50 with
SD = 1. The regression model that does not include the strike dummy gives
ŷ = β̂_0 + β̂_1(time) = 201.5 + (-0.175)(50) = 192.75
while the model with the strike dummy gives
ŷ = β̂_0 + β̂_1(time) + β̂_2(SD) = 198.58 + (0.144)(50) - 195.9(1) = 9.88
and clearly the model with the strike dummy makes much better predictions during a
strike. The prediction the model makes in time period 53, where there is no strike, is
ŷ = β̂_0 + β̂_1(time) + β̂_2(SD) = 198.58 + (0.144)(53) - 195.9(0) = 206.21.
Another thing to note is that the regression equation with the strike dummy has a positive
value for the slope of the time variable (0.144), whereas the regression equation without
the strike dummy has a negative slope for the time variable (-0.175).
Dummy variables can be particularly useful when you can predict that something out of the ordinary,
like a strike, might occur. Other uses for dummy variables might include using a dummy
variable to predict sales of seasonal goods (Christmas trees, soft drinks in summer,
automobiles, etc.).
12.9 Applications of regression
Example 12.1. In the text we discussed some of the factors that might affect real
household consumption, that is real income, real wealth, and the real interest rate. It is
very difficult to get aggregate household wealth measures, so we will consider a slightly
different model. We will use nominal consumption ( cons ) as the dependent variable,
nominal disposable income ( dpi ), nominal interest rates ( ffr --the Federal Funds rate),
and the money supply ( m2ns --money 2 not seasonally adjusted) as explanatory
variables. A graph of cons vs dpi is shown in Fig. 12.25 and is strongly suggestive of a
relationship between the two. Note: this is real data, not hypothetically generated data.
The regression model we will use is
cons = β̂_0 + β̂_1(dpi) + β̂_2(ffr) + β̂_3(m2ns)
The regression output is shown in Fig. 12.26. This output shows that the regression
model is plagued with at least two violations of the assumptions. The Durbin-Watson
statistic (0.350) surely indicates the presence of serial correlation. The large VIFs of
DPI and M2NS indicate the presence of collinearity. This model leaves much to be
desired. A model where one of the collinear variables, M2NS, has been eliminated from
the regression equation produces the output shown in Fig. 12.27. This model does not
show any evidence of collinearity, but the DW statistic is even worse.
Figure 12.25 A plot of nominal household consumption vs. nominal disposable personal
income.
Figure 12.26. The regression output for Example 12.1.
Figure 12.27 The regression output for Example 12.1 with M2NS removed from the
model.
Example 12.2 Another way of looking at the consumption function is in the form of
rates of change. That is, we would model
Δcons = β_0 + β_1 Δ(dpi) + β_2 Δ(ffr) + β_3 Δ(confidence) + β_4 Δ(m2ns) + ε
where Δx_t = x_t - x_(t-1). That is, the Δ symbol indicates that the past value of a
variable is to be subtracted from the current value, creating a new variable called a
difference. The regression output is shown in Fig. 12.28. Note that the statistics of this
equation are much superior to those of the previous equation. There is no evidence of
either serial correlation or collinearity. The consumer confidence index does not seem
significant here, however. The interest rate variable is only moderately significant.
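First differences like Δcons and Δdpi are easy to construct in software. A minimal sketch with pandas; the column names and numbers below are placeholders, not the actual series used in this example:

import pandas as pd

data = pd.DataFrame({"cons": [100.0, 104.0, 109.0, 111.0],
                     "dpi":  [120.0, 126.0, 131.0, 135.0]})
diffs = data.diff().dropna()            # delta x_t = x_t - x_(t-1)
print(diffs)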
Problems
Panel A, Panel B, Panel C, Panel D (residual plots)
Problem 12.1 Which of the above residual plots indicate a possible violation of the
assumptions of MLR? What is the violation called (e.g., heteroscedasticity)? What is the
cause of the violation?
Problem 12.2 What is meant by the phrase "a statistically significant relationship"?
What is the difference between a statistically significant relationship and a causal
relationship?
Problem 12.3 Use the output from Fig. P12.1 to answer the questions that follow.
a) Is there any evidence of collinearity? State why or why not.
b) Is there any evidence of serial correlation? State why or why not.
c) Should the hypothesis β_1 = β_2 = ... = β_k = 0 be rejected at a 5% level of significance?
d) Should the hypothesis β_1 = β_2 = ... = β_k = 0 be rejected at a 10% level of
significance?
e) Which independent variables should be included in the regression model? Why?
Figure P12.1 Use for Problem 12.3
Problem 12.4 Use the output from Fig. P12.2 to answer the questions that follow.
a) Is there any evidence of collinearity? State why or why not.
b) Is there any evidence of serial correlation? State why or why not.
c) Should the hypothesis β_1 = β_2 = ... = β_k = 0 be rejected at a 5% level of significance?
d) Should the hypothesis β_1 = β_2 = ... = β_k = 0 be rejected at a 10% level of
significance?
e) Which independent variables should be included in the regression model? Why?
Figure P12.2 Use for Problem 12.4
Problem 12.5 Use the output from Fig. P12.3 to answer the questions that follow.
a) Is there any evidence of collinearity? State why or why not.
b) Is there any evidence of serial correlation? State why or why not.
c) Should the hypothesis β_1 = β_2 = ... = β_k = 0 be rejected at a 5% level of significance?
d) Should the hypothesis β_1 = β_2 = ... = β_k = 0 be rejected at a 10% level of
significance?
e) Which independent variables should be included in the regression model? Why?
Figure P12.3 Use for Problem 12.5