Violation of the Assumptions of the Linear Regression Model
• We study the underlying assumptions of the Linear Regression
model further, and look at:
– How to test for violations?
– What causes the violations?
– What are the consequences?
e.g. any combination of the following problems:
– the coefficient estimates are wrong
– the associated standard errors are wrong
– the distributions of the test statistics are inappropriate
– How can it be fixed?
– Change the model, so that the assumptions are no longer
violated
– Work around the problem by using alternative (econometric)
techniques which are still valid
• More specifically, we are going to study:
1. E(εt) = 0
2. var(εt) = σ² < ∞
3. cov(εi, εj) = 0
4. No perfect multicollinearity
5. Omitting/Including variables
6. Errors correlated with regressors: E(εt xk,t) ≠ 0
7. Model selection and specification checking
7.1 Model Building
7.2 Lasso, Forward Stage-wise regression and LARS
7.3 Specification checking: residual plots and non-linearity;
parameter stability; influential observations
1. Assumption: E(εt) = 0
• A1 states that the mean of the disturbances is zero.
• Disturbances can never be observed, so we use the residuals
instead.
• Hence we can only investigate the properties of the errors approximately.
• Since residuals depend on the chosen estimation method, different methods will yield different residuals with potentially different properties.
• The mean of the residuals will always be zero provided that
there is a constant term in the regression.
• Always include a constant term... but what if the
economic/finance model does not support a constant term?
• This can be a way to test the validity of the econ/finance
theory in the data: include a constant term and test whether it
is equal to zero.
• Example: the CAPM, which implies a zero intercept (“alpha”) in a regression of asset excess returns on market excess returns.
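As an illustration of this idea, here is a minimal sketch using simulated excess returns (the series and parameter values are made up for the example): include a constant in the CAPM-style regression and examine the test of the intercept (“alpha”) against zero.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Hypothetical CAPM setting: excess returns of an asset and of the market
market_excess = rng.normal(0.005, 0.04, size=250)
asset_excess = 0.0 + 1.2 * market_excess + rng.normal(0.0, 0.02, size=250)

X = sm.add_constant(market_excess)   # include a constant term (the "alpha")
res = sm.OLS(asset_excess, X).fit()

print(res.params)       # [alpha, beta]
print(res.pvalues[0])   # p-value for H0: alpha = 0; not rejecting is consistent with the CAPM
```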
2. Assumption: var(εt) = σ² < ∞
• The variance of the errors is constant, σ² – this is known as (unconditional) homoskedasticity. If the errors do not have a constant variance, we say that they are heteroskedastic.
• How can we detect heteroskedasticity?
• Graphical methods
• Formal tests:
• we will discuss the Goldfeld-Quandt test and White’s test
• both test H0 : homoskedasticity vs H1 : heteroskedasticity
Detection of Heteroskedasticity: graph
• Say we estimate a regression model, calculate the residuals, ε̂t, and plot them against one of the regressors.
[Figure: residuals ε̂t plotted against the regressor x2t]
Detection of Heterosk.: Goldfeld-Quandt (GQ) test
It is carried out as follows:
1. Split the total sample of length T into two sub-samples of
length T1 and T2 . The regression model is estimated on each
sub-sample and the two residual variances are calculated.
2. The null hypothesis is that the variances of the disturbances
are equal, H0 : σ12 = σ22
3. The test statistic, denoted GQ, is simply the ratio of the two
residual variances where the larger of the two variances must
be placed in the numerator.
GQ = s1² / s2²
Detection of Heterosk.: Goldfeld-Quandt (GQ) test
(Cont’d)
4. The test statistic is distributed as F(T1 − k, T2 − k) under the null of homoskedasticity.
• Big practical issue: where do you split the sample? It is often
arbitrary, and it may crucially affect the outcome of the test.
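A minimal sketch of the GQ test (simulated data, arbitrary 50/50 split, illustrative names) using the statsmodels implementation het_goldfeldquandt, which runs the two sub-sample regressions and returns the statistic with its p-value:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_goldfeldquandt

rng = np.random.default_rng(1)
T = 200
x2 = np.sort(rng.uniform(1, 10, T))      # sorted so the split separates low/high variance
y = 1.0 + 0.5 * x2 + rng.normal(0, x2)   # error standard deviation grows with x2
X = sm.add_constant(x2)

# Ratio of residual variances from the two sub-sample regressions
gq_stat, p_value, _ = het_goldfeldquandt(y, X, split=0.5)
print(f"GQ = {gq_stat:.2f}, p-value = {p_value:.4f}")   # small p-value -> heteroskedasticity
```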
Detection of Heterosk.: White’s test
• White’s general test for heteroskedasticity is one of the best
approaches because it makes few assumptions about the form
of the heteroskedasticity.
• The test is carried out as follows:
1. Assume that the regression we carried out is as follows:
yt = β1 + β2 x2t + β3 x3t + εt
and we want to test var(εt) = σ². We estimate the model, obtaining the residuals, ε̂t.
2. Then run the auxiliary regression
ε̂t² = α1 + α2 x2t + α3 x3t + α4 x2t² + α5 x3t² + α6 x2t x3t + vt
Detection of Heterosk.: White’s test (Cont’d)
3. Obtain R² from the auxiliary regression and multiply it by the number of observations, T. It can be shown that
T × R² ∼ χ²(m)
where m is the number of regressors in the auxiliary regression excluding the constant term.
4. If the χ2 test statistic from step 3 is greater than the
corresponding value from the statistical table then reject the
null hypothesis that the disturbances are homoskedastic.
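A minimal sketch of White’s test using statsmodels’ het_white on simulated data (the data-generating process and names are illustrative); the function runs the auxiliary regression and returns the T·R² statistic with its χ² p-value:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(2)
T = 300
x2, x3 = rng.normal(size=T), rng.normal(size=T)
y = 1 + 0.5 * x2 - 0.3 * x3 + rng.normal(0, 1 + x2**2)   # error variance depends on x2

X = sm.add_constant(np.column_stack([x2, x3]))
res = sm.OLS(y, X).fit()

# Auxiliary regression of squared residuals on levels, squares and cross-products
lm_stat, lm_pval, f_stat, f_pval = het_white(res.resid, X)
print(f"T*R^2 = {lm_stat:.2f}, p-value = {lm_pval:.4f}")
```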
Consequences of using OLS in the presence of Heterosk.
• OLS estimation still gives unbiased coefficient estimates, but
they are no longer BLUE.
• This implies that if we still use OLS in the presence of
heteroskedasticity, our standard errors could be inappropriate
and hence any inferences we make could be misleading.
• Whether the standard errors calculated using the usual
formula are too big or too small will depend upon the form of
the heteroskedasticity.
How do we Deal with Heterosk.?
• If the form (i.e. the cause) of the heteroskedasticity is known,
then we can use an estimation method which takes this into
account (called Generalised Least Squares, GLS).
• A simple illustration of GLS is as follows: Suppose that the
error variance is related to another variable zt by
var(εt) = σ² zt²
• To remove the heteroskedasticity, rescale the regression model for each observation by zt (standardization):
yt / zt = β1 (1/zt) + β2 (x2t / zt) + β3 (x3t / zt) + vt
How do we Deal with Heterosk.? (Cont’d)
where vt = εt / zt is an error term.
• Now, since var(εt) = σ² zt²,
var(vt) = var(εt / zt) = var(εt) / zt² = σ² zt² / zt² = σ², for known zt.
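A sketch of this GLS transformation with simulated data and an observed zt (all names illustrative): dividing every variable, including the constant, by zt is equivalent to weighted least squares with weights 1/zt².

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
T = 500
x2, x3 = rng.normal(size=T), rng.normal(size=T)
z = rng.uniform(0.5, 3.0, T)
u = rng.normal(0, z)                      # var(u_t) = sigma^2 * z_t^2 with sigma = 1
y = 1 + 0.5 * x2 - 0.2 * x3 + u
X = sm.add_constant(np.column_stack([x2, x3]))

# Manual GLS: divide y and every regressor (including the constant) by z_t
res_manual = sm.OLS(y / z, X / z[:, None]).fit()

# Equivalent WLS: weights proportional to the inverse error variance, 1/z_t^2
res_wls = sm.WLS(y, X, weights=1.0 / z**2).fit()

print(res_manual.params)   # same coefficient estimates (up to numerical precision)
print(res_wls.params)
```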
• Other solutions include:
1. Transforming the variables into logs or deflating by some other measure of “size”.
2. Using White’s heteroskedasticity-consistent standard error estimates.
The effect of using White’s correction is that, in general, the standard errors for the slope coefficients are increased relative to the usual OLS standard errors.
How do we Deal with Heterosk.? (Cont’d)
This makes us more “conservative” in hypothesis testing, so
that we would need more evidence against the null hypothesis
before we would reject it.
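A minimal sketch of the second option, using statsmodels’ built-in heteroskedasticity-consistent covariance option (cov_type="HC1"; simulated data, illustrative names):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
T = 300
x2 = rng.normal(size=T)
y = 1 + 0.5 * x2 + rng.normal(0, 1 + np.abs(x2))   # heteroskedastic errors
X = sm.add_constant(x2)

res_ols = sm.OLS(y, X).fit()                  # usual OLS standard errors
res_hc = sm.OLS(y, X).fit(cov_type="HC1")     # White-type heteroskedasticity-consistent SEs

print(res_ols.bse)   # conventional standard errors
print(res_hc.bse)    # robust standard errors, typically larger in this setting
```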
3. Assumption: cov(εi, εj) = 0
• This assumption essentially states that there is no pattern in
the errors.
• If there are patterns in the residuals from a model, we say
that they are autocorrelated.
• Some stereotypical patterns we may find in the residuals are
given on the next 3 slides.
Positive Autocorrelation
[Figure: residuals ût plotted against ût−1, and ût plotted over time]
Positive autocorrelation is indicated by a cyclical residual plot over time.
Negative Autocorrelation
[Figure: residuals ût plotted against ût−1, and ût plotted over time]
Negative autocorrelation is indicated by an alternating pattern where the residuals cross the time axis more frequently than if they were distributed randomly.
No pattern in residuals – No autocorrelation
[Figure: residuals ût plotted against ût−1, and ût plotted over time]
No pattern in the residuals at all: this is what we would like to see.
Detecting Autocorrelation: Durbin-Watson Test
• The Durbin-Watson (DW) test is a test for first-order autocorrelation, i.e. it assumes that the relationship is between an error and its first lag:
εt = ρ εt−1 + vt, where vt ∼ N(0, σv²)
• The test is: H0 : ρ = 0 vs H1 : ρ ≠ 0
• Full details are available in Chapter 5 of Brooks.
• Many limitations:
1. It only considers first-order autocorrelation
2. There must be a constant term in the regression
3. The regressors must be non-stochastic
4. There must be no lags of the dependent variable in the regression
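A minimal sketch of the DW statistic on simulated AR(1) errors (the 0.7 autocorrelation and all names are illustrative), using statsmodels’ durbin_watson:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(5)
T = 200
x = rng.normal(size=T)

# AR(1) errors: eps_t = 0.7 * eps_{t-1} + v_t  -> positive first-order autocorrelation
eps = np.zeros(T)
v = rng.normal(size=T)
for t in range(1, T):
    eps[t] = 0.7 * eps[t - 1] + v[t]

y = 1 + 0.5 * x + eps
res = sm.OLS(y, sm.add_constant(x)).fit()

# DW is roughly 2*(1 - rho_hat): near 2 -> no autocorrelation, near 0 -> positive, near 4 -> negative
print(durbin_watson(res.resid))
```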
Detecting Autocorrelation: Breusch-Godfrey Test
• Related to the modified Ljung-Box test (see e.g. Hayashi
chapter 2.10).
• It is a more general test for r-th order autocorrelation:
εt = ρ1 εt−1 + ρ2 εt−2 + ρ3 εt−3 + · · · + ρr εt−r + vt, where vt ∼ N(0, σv²)
• The null and alternative hypotheses are:
H0 : ρ1 = 0 and ρ2 = 0 and . . . and ρr = 0
H1 : ρ1 ≠ 0 or ρ2 ≠ 0 or . . . or ρr ≠ 0
• The test is carried out as follows:
1. Estimate the linear regression using OLS and obtain the residuals, ε̂t.
2. Regress ε̂t on all of the regressors from stage 1 (the xs) plus ε̂t−1, ε̂t−2, . . . , ε̂t−r; obtain R² from this regression.
3. It can be shown that
(T − r) R² ∼ χ²(r)
• If the test statistic exceeds the critical value from the
statistical tables, reject the null hypothesis of no
autocorrelation.
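A minimal sketch of the Breusch-Godfrey test with statsmodels’ acorr_breusch_godfrey, on simulated AR(2) errors (all names and parameter values are illustrative):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(6)
T = 300
x = rng.normal(size=T)

# AR(2) errors, so a test of order r > 1 is appropriate
eps = np.zeros(T)
v = rng.normal(size=T)
for t in range(2, T):
    eps[t] = 0.5 * eps[t - 1] + 0.3 * eps[t - 2] + v[t]

y = 1 + 0.5 * x + eps
res = sm.OLS(y, sm.add_constant(x)).fit()

# Residuals are regressed on the original regressors plus r lagged residuals;
# the LM statistic is asymptotically chi-squared with r degrees of freedom under H0
lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(res, nlags=4)
print(f"LM = {lm_stat:.2f}, p-value = {lm_pval:.4f}")
```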
Consequences of ignoring Autocorrelation if it is present
• The coefficient estimates derived using OLS are still unbiased,
but they are inefficient, i.e. they are not BLUE, even in large
sample sizes.
• Thus, if the standard error estimates are inappropriate, there
exists the possibility that we could make the wrong inferences.
• R² is likely to be inflated relative to its “correct” value for positively correlated residuals.
“Remedies” for Autocorrelation
• If the form of the autocorrelation is known, we could use a
GLS procedure – i.e. an approach that allows for
autocorrelated residuals e.g., Cochrane-Orcutt.
• But such procedures that “correct” for autocorrelation require
assumptions about the form of the autocorrelation.
• If these assumptions are invalid, the cure would be more
dangerous than the disease! - see Hendry and Mizon (1978).
• However, it is unlikely to be the case that the form of the
autocorrelation is known, and a more “modern” view is that
residual autocorrelation presents an opportunity to modify
the regression.
Dynamic Models
• All of the models we have considered so far have been static,
yt = β1 + β2 x2t + · · · + βk xkt + ut
• But we can easily extend this analysis to the case where the
current value of yt depends on previous values of y or one of
the x’s, e.g.
yt = β1 + β2 x2t + · · · + βk xkt + γ1 yt−1 + γ2 x2t−1 + · · · + γk xkt−1 + ut
• We could extend the model even further by adding extra lags,
e.g. x2t−2 , yt−3 .
• Additional motivation for including lags:
– Inertia of the dependent variable
– it may take some time for the dependent variable to react to a
news announcement, a change in policy...
– Overreaction
– initial overreaction to an announcement: e.g. if a firm announces that its profits are expected to be lower than anticipated, the market may adjust pre-emptively by lowering its share price; when the exact profits are released, the market may readjust and raise the share price again (though to a level below the original price).
• However, other problems with the regression could cause the
null hypothesis of no autocorrelation to be rejected:
– Omission of relevant (autocorrelated) variables.
– Misspecification by using an inappropriate functional form (e.g.
linear).
– Unparameterised seasonality.
• Models in first differences
– Another way to sometimes deal with the problem of
autocorrelation is to switch to a model in first differences.
– Denote the first difference of yt , i.e. yt − yt−1 as ∆yt ; similarly
for the x-variables, ∆x2t = x2t − x2t−1 etc.
– The model is now
∆yt = β1 + β2 ∆x2t + · · · + βk ∆xkt + ut
– The change in y may also depend on previous values of y or xt :
∆yt = β1 + β2 ∆x2t + β3 ∆x2t−1 + β4 yt−1 + ut
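As a mechanical illustration of both specifications above (the dynamic model with lags and the model in first differences), here is a sketch using pandas’ shift and diff on simulated data (the names are illustrative and the simulated data have no true dynamics):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(7)
T = 200
df = pd.DataFrame({"x2": rng.normal(size=T)})
df["y"] = 1 + 0.5 * df["x2"] + rng.normal(size=T)

# Dynamic specification: add lags of y and x2 as extra regressors
df["y_lag1"] = df["y"].shift(1)
df["x2_lag1"] = df["x2"].shift(1)
dyn = df.dropna()
res_dyn = sm.OLS(dyn["y"], sm.add_constant(dyn[["x2", "y_lag1", "x2_lag1"]])).fit()

# First-difference specification: regress Delta y_t on Delta x2_t
d = df[["y", "x2"]].diff().dropna()
res_diff = sm.OLS(d["y"], sm.add_constant(d["x2"])).fit()

print(res_dyn.params)
print(res_diff.params)
```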
• Other problems with the addition of lagged regressors to
”cure” autocorrelation
– Inclusion of lagged values of the dependent variable violates
the assumption that the RHS variables are non-stochastic.
– not a big deal if the asymptotic framework is reliable:
estimators are still consistent
– What does an equation with a large number of lags actually
mean?
– adding lagged variables may be motivated by ”statistical
analysis” rather than ”economic theory”: how do you interpret
the new model? how does it address the validity of the econ.
theory being tested?
! If there is still autocorrelation in the residuals of a model
including lags, then the OLS estimators will not even be
consistent. For example:
yt = β1 + β2 xt + β3 yt−1 + ut
with ut = ρut−1 + vt
It is easy to show that yt−1 is correlated with ut (via ut−1), which violates one of the key assumptions (no correlation between errors and regressors!)
4. Assumption: No perfect multicollinearity
• Perfect multicollinearity:
– Easy to detect because it is associated with identification issues
– e.g. the estimators cannot be computed and the software returns an error message
– Example: suppose x3t = 2 x2t
and the model is yt = β1 + β2 x2t + β3 x3t + β4 x4t + ut
• Real issue: near Multicollinearity
– R 2 will be high but the individual coefficients will have high
standard errors.
– confidence intervals for the parameters will be very wide, and
significance tests might therefore give inappropriate
conclusions.
– The regression becomes very sensitive to small changes in the
specification: e.g. additional observations, or additional
variables.
• Can we get a sense for ”near-multicollinearity”?
– The first step is simply to look at the matrix of correlations
between the individual variables. e.g.
corr   x2    x3    x4
x2      –    0.2   0.8
x3     0.2    –    0.3
x4     0.8   0.3    –
– But unfortunately, pairwise correlations do not inform us when three or more variables are (close to) linearly related, e.g. x2t + x3t = x4t
– Another indicator: the condition number of the matrix X′X:
– near-multicollinearity means that X′X is close to being singular;
– the condition number is the ratio of the largest to the smallest singular value (from the SVD): when it is ”large”, the matrix is ill-conditioned, i.e. close to being singular;
– how large? As a rough rule of thumb, if log(C) exceeds the precision (the number of accurate digits) of the matrix entries, the matrix is effectively singular at that precision.
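A small sketch of both diagnostics on simulated data where x4 is (almost) the sum of x2 and x3 (all names illustrative): the pairwise correlations look only moderate, while the condition number of X′X reveals the near-singularity.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(8)
T = 200
x2 = rng.normal(size=T)
x3 = rng.normal(size=T)
x4 = x2 + x3 + rng.normal(0, 0.01, T)     # x4 is almost an exact linear combination

X = pd.DataFrame({"x2": x2, "x3": x3, "x4": x4})

# Pairwise correlations are only moderate, despite the near-exact linear dependence
print(X.corr().round(2))

# Condition number of X'X: very large values indicate an ill-conditioned (near-singular) matrix
XtX = X.to_numpy().T @ X.to_numpy()
print(f"condition number of X'X: {np.linalg.cond(XtX):.1e}")
```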
• What can be done in presence of near-multicollinearity?
– Regularization methods, such as ridge regression or principal components (a brief ridge sketch follows this list). But these may bring more problems than they solve.
– Some econometricians argue that if the model is otherwise OK, just ignore the problem.
– The easiest ways to “cure” the problems are
– drop one of the collinear variables
– transform the highly correlated variables into a ratio
– go out and collect more data e.g.
– a longer run of data
– switch to a higher frequency
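A brief ridge regression sketch (one possible regularisation approach, shown here with scikit-learn on simulated near-collinear data; the penalty value alpha=1.0 is arbitrary):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(10)
T = 100
x2, x3 = rng.normal(size=T), rng.normal(size=T)
x4 = x2 + x3 + rng.normal(0, 0.05, T)             # nearly collinear regressor
X = np.column_stack([x2, x3, x4])
y = 1 + 0.5 * x2 + 0.5 * x3 + 0.5 * x4 + rng.normal(size=T)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)                # alpha controls the amount of shrinkage

print(ols.coef_)     # OLS coefficients can be erratic under near-collinearity
print(ridge.coef_)   # ridge shrinks and stabilises the coefficients, at the cost of some bias
```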
5. Omitting/Including some variables
Omission of an Important Variable
• Consequence: The estimated coefficients on all the other variables will be biased and inconsistent unless the excluded variable is uncorrelated with all the included variables (a small simulation sketch follows below).
• Even if this condition is satisfied, the estimate of the
coefficient on the constant term will be biased.
• The standard errors will also be biased.
Inclusion of an Irrelevant Variable
• Coefficient estimates will still be consistent and unbiased, but
the estimators will be inefficient.
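A small simulation sketch of the omitted-variable bias described above (the data-generating process is made up for illustration): when the omitted regressor x3 is correlated with the included regressor x2, the coefficient on x2 is biased.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 5000

# True model contains x2 and x3, and the two regressors are correlated
x2 = rng.normal(size=n)
x3 = 0.6 * x2 + rng.normal(size=n)
y = 1.0 + 1.0 * x2 + 1.0 * x3 + rng.normal(size=n)

full = sm.OLS(y, sm.add_constant(np.column_stack([x2, x3]))).fit()
omit = sm.OLS(y, sm.add_constant(x2)).fit()   # x3 wrongly omitted

print(full.params[1])   # close to the true value 1.0
print(omit.params[1])   # around 1.6 = 1.0 + 0.6, i.e. biased because cov(x2, x3) != 0
```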
Overall
• Bias-variance tradeoff in finite samples: fewer regressors lead to more precise estimates, while more regressors lead to less bias.
• Asymptotically, only the bias remains, as it is of higher order than the variance: when scaling the difference between the estimator and the true parameter value by √n, the associated bias explodes while the variance stays constant.
6. Errors correlated with regressors (endogeneity)
• We already discussed 2 examples:
– when lagged variables are introduced in the model and the
error term is still autocorrelated;
– when relevant variables are omitted.
• Solution: find at least one instrumental variable for each endogenous regressor (you can use more) and compute the 2SLS or IV estimator (a numerical sketch follows below).
– The instrument Z needs to be correlated with the endogenous variable X, but cannot be correlated with the error term (so Z is exogenous and highly correlated with X)
• Updated regularity conditions:
1. {(zi, xi, εi)} is a strictly stationary and ergodic sequence.
2. E(zi′ xi) is full rank.
3. {(zi′ εi, Fi)} is a martingale difference sequence (MDS).
4. E(xi,j⁴) < ∞; E(zi,k⁴) < ∞; E(εi²) = σ² < ∞.
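A minimal 2SLS sketch on simulated data (all names and parameter values are illustrative): the regressor x is correlated with the error, the instrument z is correlated with x but not with the error, and the two stages are computed directly with numpy.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 2000

z = rng.normal(size=n)                          # instrument: exogenous and relevant
eps = rng.normal(size=n)                        # structural error
x = 0.8 * z + 0.5 * eps + rng.normal(size=n)    # endogenous regressor: cov(x, eps) != 0
y = 1.0 + 2.0 * x + eps

X = np.column_stack([np.ones(n), x])            # constant + endogenous regressor
Z = np.column_stack([np.ones(n), z])            # constant + instrument

# OLS is inconsistent because x is correlated with the error
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# 2SLS: first stage projects X on Z, second stage regresses y on the fitted values
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
beta_2sls = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)

print("OLS: ", beta_ols)      # slope noticeably above the true value 2.0
print("2SLS:", beta_2sls)     # slope close to 2.0
```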