Notes on Violation of Gauss Markov

Econ 301.02
Econometrics
TASKIN
Notes on the violation of the Gauss Markov assumptions
Heteroskedasticity:
One of the important assumptions of the classical regression model is that the variance of
the error term is constant, i.e. $\mathrm{Var}(u_i) = \sigma^2$, and the violation of this assumption is
known as the heteroskedasticity problem in regression analysis.
When the error term does not have a constant variance, the problem can be stated
as:
$\mathrm{Var}(u_i) = \sigma_i^2$
What might be the causes of heteroskedasticity:
 In some cases the dependent variable shows a larger variability at different
levels of the explanatory variable. Examples of this phenomenon can be
observed in the analysis of consumption behavior or saving behavior. In most
cases, there is little variability in the desired consumption amounts at low
levels of disposable income but larger variability in the desired consumption
at higher disposable income levels. A similar observation may be made for
saving behavior. At high income levels the spending on necessities makes up a
lower percentage of total spending, and hence the amount of spending
beyond that level can be different for each high income family. A similar
observation can be made in the analysis of firm dividends explained by the
level of profits. There is low variability in the amounts of dividends
distributed by firms with low profits but high variability in the dividends of
high profit firms.
 Heteroscedasticity may be the result of outlier observations.
 Other forms of misspecification can also produce heteroscedasticity. For
example, if a variable is omitted, the resulting error terms may exhibit a
pattern similar to the omitted variable. For example, if you are estimating a
demand function and you omitted the price of another good that is either
a substitute or a complement, the errors in the misspecified
model will behave like the omitted price variable. As long as the
omitted variable is not linearly correlated with the explanatory variables
included in the regression, the estimates will be unbiased.
 Another form of misspecification which looks like heteroscedastic errors is
the choice of the wrong functional form. If the relationship studied is quadratic
but the squared variables are not included in the estimation,
then the model will exhibit heteroscedastic errors.
 Systematic measurement errors in one direction in the variables may also
lead to heteroscedastic errors.
Consequences of heteroscedasticity:
 The unbiasedness property of the OLS estimators does not change. OLS
estimates are still unbiased but they are not efficient. Therefore B.L.U.E. does
not hold: OLS estimators may not have the minimum variance among all the
linear unbiased estimators.
E.g. more distant observations from the true line will have a larger weight in
the determination of the slope coefficients, and hence tend to have an
unbalanced influence on the result (this is exactly the mistake that
weighted least squares estimation targets to correct).
 Standard errors or variances are biased and have a different formula than the
one that is used in OLS with well behaved error terms.
E.g. in the simple model $y_i = \beta_0 + \beta_1 x_i + u_i$, the variance of the slope estimator
under the standard assumptions is
$$\mathrm{Var}(\hat\beta_1) = \frac{\sigma^2}{\sum (x_i - \bar x)^2}$$
However, when the errors are heteroscedastic, the variance becomes
$$\mathrm{Var}(\hat\beta_1) = \frac{\sum (x_i - \bar x)^2 \sigma_i^2}{\left(\sum (x_i - \bar x)^2\right)^2}$$
Hence the reported OLS result without any correction is the first one, when in fact
the correct one is the second formula (illustrated in the sketch below).
 The t statistics that follow are therefore also incorrect, and any test that
uses these variance estimates, such as the conventionally computed t or F
statistics or confidence interval estimates, is also wrong.
The standard estimate of $\sigma^2$ is also wrong; hence $E(\hat\sigma^2) = \sigma^2$ does not hold.
Detection of the presence of the heteroscedasticity problem: Does the problem
of heteroscedasticity exist in the data that you are working with?
 Visual tests
 Goldfeld-Quandt test
 White's heteroskedasticity test
 Breusch-Pagan test (the last two are illustrated in the sketch below)
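As a minimal sketch of the last two tests (Python with statsmodels; the simulated data are hypothetical, chosen so that the error variance grows with $x$):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan, het_white

# Hypothetical data: error s.d. proportional to x, so both tests should reject
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1.0, 10.0, n)
y = 2.0 + 1.5 * x + rng.normal(0.0, 0.5 * x)

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()

# Both tests regress (functions of) the squared residuals on the regressors;
# White's version also includes squares and cross products
bp_lm, bp_pval, _, _ = het_breuschpagan(res.resid, res.model.exog)
w_lm, w_pval, _, _ = het_white(res.resid, res.model.exog)
print(f"Breusch-Pagan LM = {bp_lm:.2f}, p-value = {bp_pval:.4f}")
print(f"White LM = {w_lm:.2f}, p-value = {w_pval:.4f}")
```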
Correction of the heteroscedasticity problem:
I. Generalized Least Square estimation: (Weighted Least Square estimation)
Here the objective is to minimize the weight of the large variance
observations and maximize the weight of the small variance observations.
If you have information on $\sigma_i^2$, then the corrected estimation is one where
each observation is weighted by the inverse of this true error variance, i.e. by $1/\sigma_i$.
The corrected Weighted Least Square estimation will be:
$$\frac{y_i}{\sigma_i} = \beta_0 \frac{1}{\sigma_i} + \beta_1 \frac{x_i}{\sigma_i} + \frac{u_i}{\sigma_i}$$
However, the information on $\sigma_i^2$ is impossible to obtain. Hence, the above
GLS method can be used if there is a proxy for $\sigma_i^2$. There may be a
relationship between the error variance $\mathrm{Var}(u_i) = \sigma_i^2$ and $y_i$, or $x_{ij}$, or
any other variable $z_i$ which may not be in the model. The starting point for
finding this relationship will be the detection stage of the
heteroscedasticity analysis, with either visual methods or methods that examine the
residuals of the initial OLS estimation against the possible set of variables.
e.g. (1) If you think that the form of heteroscedasticity is as follows:
$\mathrm{Var}(u_i) = \sigma^2 x_i$,
which essentially says that there is a proportional relationship
between the variance and the variable $x_i$ by a constant factor of $\sigma^2$.
In this case the weighted relationship (dividing through by $\sqrt{x_i}$) will be:
$$\frac{y_i}{\sqrt{x_i}} = \beta_0 \frac{1}{\sqrt{x_i}} + \beta_1 \frac{x_i}{\sqrt{x_i}} + \frac{u_i}{\sqrt{x_i}}$$
e.g. (2) If your judgment indicates that the form of heteroscedasticity is as
follows:
$\mathrm{Var}(u_i) = \sigma^2 x_i^2$,
which essentially says that there is a proportional relationship
between the variance and the square of the variable $x_i$ by a constant
factor of $\sigma^2$.
In this case the weighted relationship (dividing through by $x_i$; a code sketch of
this case follows the remarks below) will be:
$$\frac{y_i}{x_i} = \beta_0 \frac{1}{x_i} + \beta_1 \frac{x_i}{x_i} + \frac{u_i}{x_i}$$
 If you had been able to know the true value of $\sigma_i^2$, then the results of the
weighted least squares would have been B.L.U.E.
 Since this is not possible, the estimation with a proxy for $\sigma_i^2$ yields
estimators that are consistent.
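A sketch of the weighted estimation under the assumed form of e.g. (2), $\mathrm{Var}(u_i) = \sigma^2 x_i^2$ (Python with statsmodels; the data are hypothetical):

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data generated with Var(u_i) = sigma^2 * x_i^2
rng = np.random.default_rng(2)
x = rng.uniform(1.0, 10.0, 200)
y = 2.0 + 1.5 * x + rng.normal(0.0, 0.7 * x)   # error s.d. = 0.7 * x_i

X = sm.add_constant(x)

# WLS with weights proportional to 1/Var(u_i) = 1/x_i^2,
# which is equivalent to dividing the whole equation through by x_i
wls = sm.WLS(y, X, weights=1.0 / x**2).fit()
print(wls.params)   # estimates of beta_0 and beta_1
print(wls.bse)      # their (corrected) standard errors
```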
Another example of generalized least square estimation is
when you have information that there are two different values of $\sigma_i^2$.
e.g. (3) You know, or you were able to observe, that for your sample
covering the period 1971-2010, the first subperiod 1971-1990 has a low
variance but the subperiod 1991-2010 has a larger variance for the error
term. It is possible to do weighted least squares here.
If you think that in the first subperiod the variance of the error term is $\sigma_1^2$
and it is lower than the variance of the error term in the second subperiod,
$\sigma_2^2$, the correction that you will do should follow these steps:
i. Estimate the equation with OLS and obtain the residuals.
ii. Compute $\hat\sigma_1^2$ and $\hat\sigma_2^2$ by using the residual values separately for each
subperiod.
iii. Estimate the GLS (WLS) with the following equation:
$$
\begin{pmatrix}
y_{1971}/\hat\sigma_1 \\
y_{1972}/\hat\sigma_1 \\
\vdots \\
y_{1991}/\hat\sigma_2 \\
\vdots \\
y_{2010}/\hat\sigma_2
\end{pmatrix}
=
\begin{pmatrix}
1/\hat\sigma_1 & x_{1971}/\hat\sigma_1 \\
1/\hat\sigma_1 & x_{1972}/\hat\sigma_1 \\
\vdots & \vdots \\
1/\hat\sigma_2 & x_{1991}/\hat\sigma_2 \\
\vdots & \vdots \\
1/\hat\sigma_2 & x_{2010}/\hat\sigma_2
\end{pmatrix}
\begin{pmatrix}
\beta_0 \\ \beta_1
\end{pmatrix}
+
\begin{pmatrix}
u_{1971}/\hat\sigma_1 \\
u_{1972}/\hat\sigma_1 \\
\vdots \\
u_{1991}/\hat\sigma_2 \\
\vdots \\
u_{2010}/\hat\sigma_2
\end{pmatrix}
$$
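A sketch of steps i-iii for this two-subperiod case (Python with statsmodels; the break year, coefficients, and variances are hypothetical):

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical annual data for 1971-2010 with a variance break in 1991
rng = np.random.default_rng(1)
years = np.arange(1971, 2011)
x = rng.uniform(0.0, 10.0, years.size)
sigma = np.where(years < 1991, 1.0, 3.0)       # true error s.d. by subperiod
y = 1.0 + 0.8 * x + rng.normal(0.0, sigma)

X = sm.add_constant(x)

# Step i: OLS and residuals
ols = sm.OLS(y, X).fit()

# Step ii: estimate the error variance separately in each subperiod
# (ddof=2 because two coefficients were estimated)
first = years < 1991
s1_hat = ols.resid[first].var(ddof=2)          # 1971-1990
s2_hat = ols.resid[~first].var(ddof=2)         # 1991-2010

# Step iii: WLS with weights 1 / sigma_hat_i^2
var_hat = np.where(first, s1_hat, s2_hat)
wls = sm.WLS(y, X, weights=1.0 / var_hat).fit()
print(wls.params)
```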
Autocorrelation (Serial Correlation)
Autocorrelation occurs when error terms from different time periods are correlated. This problem
usually occurs in time series data. Most of the time the errors in adjacent time periods are
correlated. This violates the assumption of unrelated disturbance terms belonging
to different periods, i.e. $\mathrm{Cov}(u_i, u_j) = 0$, equivalently $E(u_i u_j) = 0$.
With autocorrelation this becomes $\mathrm{Cov}(u_i, u_j) \neq 0$. Since we are using time series data, it is common to
use a t subscript rather than i, and the presence of autocorrelation can be depicted
as $\mathrm{Cov}(u_t, u_{t+s}) \neq 0$ where $s \neq 0$.
Since many different forms of correlation are possible between the error
terms under the above general form, it is customary to specify a mechanism that
generates the autocorrelated error terms. One such
mechanism is the following:
$$u_t = \rho u_{t-1} + e_t, \quad -1 < \rho < 1$$
The $\rho$ is known as the coefficient of autocorrelation. The $e_t$ is a white noise error
term with all the desired properties.
The process that creates $u_t$ is known as the first order autoregressive scheme,
AR(1). (There can be higher order processes if the error term today is related not
just to the error term of the last period but also to that of earlier periods; an
equation such as $u_t = \rho_1 u_{t-1} + \rho_2 u_{t-2} + e_t$ would describe such a phenomenon.)
The properties of such error terms will be:
$$\mathrm{Var}(u_t) = \frac{\sigma_e^2}{1 - \rho^2}, \quad \mathrm{Cov}(u_t, u_{t-s}) = \rho^s \frac{\sigma_e^2}{1 - \rho^2}, \quad \mathrm{Corr}(u_t, u_{t-s}) = \rho^s$$
(The variance follows from stationarity: $\mathrm{Var}(u_t) = \rho^2 \mathrm{Var}(u_{t-1}) + \sigma_e^2$, and
with $\mathrm{Var}(u_t) = \mathrm{Var}(u_{t-1})$ this solves to $\sigma_e^2/(1 - \rho^2)$.)
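A brief simulation (plain numpy; the values of $\rho$, $\sigma_e$, and the sample size are arbitrary) confirming these properties numerically:

```python
import numpy as np

# Simulate a long AR(1) error series u_t = rho * u_{t-1} + e_t
rng = np.random.default_rng(3)
rho, sigma_e, T = 0.7, 1.0, 200_000
e = rng.normal(0.0, sigma_e, T)
u = np.empty(T)
u[0] = e[0]
for t in range(1, T):
    u[t] = rho * u[t - 1] + e[t]

# Theory: Var(u_t) = sigma_e^2 / (1 - rho^2), Corr(u_t, u_{t-s}) = rho^s
print(u.var(), sigma_e**2 / (1 - rho**2))      # sample vs theoretical variance
for s in (1, 2, 3):
    corr = np.corrcoef(u[s:], u[:-s])[0, 1]
    print(s, corr, rho**s)                     # sample vs theoretical rho^s
```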
What might be the causes of autocorrelation:
 If there is inertia in the economic series, especially in macroeconomic
series. The sluggish adjustment in the economic series creates these purely
autocorrelated error terms.
 If there is an excluded explanatory variable in the model, the effect of this
variable will be observed as a systematic factor in the error term. Usually this
type of autocorrelation problem is corrected by including the omitted
variable.
 Incorrect functional form also gives autocorrelated error terms. Correcting
the functional form in the estimation also corrects the problem.
 A lagged dependent variable is an explanatory factor, especially in slowly
adjusting variables. If this lagged dependent variable is omitted from the model, then
the error will contain the effect of this lagged dependent variable and the
errors will be correlated.
 The nonstationarity of the series will also create correlated error terms.
 Data transformations may also create correlation in the error terms even if the
original errors are uncorrelated.
Consequences of autocorrelation:
 The unbiasedness property of the OLS estimators does not change. OLS
estimates are still unbiased but they are not efficient. Therefore B.L.U.E. does
not hold: OLS estimators may not have the minimum variance among all the
linear unbiased estimators.
 The estimators are linear and unbiased, as well as consistent and asymptotically
normally distributed, but they are not efficient.
 The formula for $\mathrm{Var}(\hat\beta_1)$ is not the standard formula
$$\mathrm{Var}(\hat\beta_1) = \frac{\sigma^2}{\sum (x_i - \bar x)^2}$$
Hence if we continue to use the standard variance reported for statistical
tests such as t or F, they will be incorrect.
 Even if the correctly calculated formula for the variance, $\mathrm{Var}(\hat\beta_1)_{AR}$, is
used, the confidence intervals will still be wider than the confidence intervals
which may be computed with an alternative estimator (such as GLS).
 The error variance $\hat\sigma^2$ estimated with the standard formula is likely to
underestimate the true value of $\sigma^2$.
 We are likely to overestimate $R^2$.
 Even if $\sigma^2$ is correctly estimated, $\mathrm{Var}(\hat\beta_1)$ will still underestimate $\mathrm{Var}(\hat\beta_1)_{AR}$.
Therefore the usual t and F tests are no longer valid and are likely to give
misleading results.
Detection of the presence of the autocorrelation problem: Does the problem
exist or not?
 Visual test: Run the OLS estimate of the initial model $y_t = \beta_0 + \beta_1 x_t + u_t$, obtain
the residuals $\hat u_t$, and plot these residuals to see if the same sign tends to
follow itself (+ residuals following the previous + residuals and -
residuals following - residuals).
 Durbin-Watson test (see the code sketch after this list)
o Conditions that are necessary for the use of Durbin-Watson:
1. DW can only test for first order autocorrelation, i.e. AR(1),
2. $u_t$ should have a normal distribution,
3. $x_t$ should be nonstochastic (fixed in repeated samples),
4. the model should have an intercept term,
5. the model should not include the lagged dependent variable as an
explanatory variable, since this creates the problem of endogeneity
if there are autocorrelated error terms.
The null hypothesis is $H_0: \rho = 0$ against $H_A: \rho > 0$, and if DW is less than the $d_{lower}$ critical
value, then we can reject the above null hypothesis.
 t test for the AR(1) term in the equation $\hat u_t = \rho \hat u_{t-1} + e_t$,
estimated by using the residuals of the OLS estimates of the original model.
 Breusch-Godfrey test: Allows for higher order autocorrelation and for
regressors such as the lagged dependent variable.
Step 1: Run the OLS estimate of the initial model $y_t = \beta_0 + \beta_1 x_t + u_t$ and obtain
the residuals $\hat u_t$.
Step 2: Run the following regression:
$$\hat u_t = \alpha_0 + \alpha_1 x_t + \rho_1 \hat u_{t-1} + \rho_2 \hat u_{t-2} + \rho_3 \hat u_{t-3} + \dots + \rho_p \hat u_{t-p} + e_t$$
And then test the joint significance of the coefficients of the p autocorrelation
terms, with the following $\chi^2$ distribution:
$$(n - p) R_{\hat u}^2 \sim \chi_p^2$$
under the null hypothesis that all p coefficients of the autocorrelation terms are zero.
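Both tests can be sketched as follows (Python with statsmodels; the AR(1) data-generating process is hypothetical):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

# Hypothetical time series whose errors follow an AR(1) with rho = 0.6
rng = np.random.default_rng(4)
T = 100
x = rng.normal(0.0, 1.0, T)
u = np.empty(T)
u[0] = rng.normal()
for t in range(1, T):
    u[t] = 0.6 * u[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()

# Durbin-Watson: DW ~ 2(1 - rho_hat), so values well below 2
# indicate positive first order autocorrelation
print(durbin_watson(res.resid))

# Breusch-Godfrey: the auxiliary regression of the residuals on x_t and
# p lagged residuals is run internally; the LM statistic is chi-square(p)
lm_stat, lm_pval, _, _ = acorr_breusch_godfrey(res, nlags=3)
print(f"BG LM = {lm_stat:.2f}, p-value = {lm_pval:.4f}")
```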
Correction of the autocorrelation problem:
I. Generalized Least Square estimation: (Weighted Least Square estimation)
When there is a pure autocorrelation problem, it is possible to transform the
model to get rid of the dependency among the error terms. This is one version of
the Generalized Least Square estimation method.
Consider a simple regression model $y_t = \beta_0 + \beta_1 x_t + u_t$ with the error term
$u_t = \rho u_{t-1} + e_t$ and $-1 < \rho < 1$.
The same equation will also hold for the time period t-1; multiplying it by $\rho$ gives:
$$\rho y_{t-1} = \rho \beta_0 + \rho \beta_1 x_{t-1} + \rho u_{t-1}$$
Then this last equation is subtracted from the first one, which gives the
following equation:
$$y_t - \rho y_{t-1} = \beta_0 (1 - \rho) + \beta_1 (x_t - \rho x_{t-1}) + (u_t - \rho u_{t-1})$$
This equation has uncorrelated error terms, which satisfies the $\mathrm{Cov}(e_t, e_{t-1}) = 0$
condition. Hence the above transformation performs the necessary
correction for the problem of first order autocorrelation. The new
equation is in transformed variables, and the application of OLS
estimation to these new variables is known as Generalized Least Square
estimation. The transformed equation is:
$$y_t^* = \beta_0 x_{0t}^* + \beta_1 x_t^* + e_t$$
where $y_t^* = y_t - \rho y_{t-1}$, $x_t^* = x_t - \rho x_{t-1}$, $x_{0t}^* = 1 - \rho$.
In order to perform this estimation, we need a value for $\rho$. However, the true
value of $\rho$ is unknown; it is only possible to come up with an estimate of $\rho$.
There are several methods we can use to estimate $\rho$: (1) use the DW
statistic and the relationship $DW \approx 2(1 - \hat\rho)$; (2) estimate the
equation $\hat u_t = \rho \hat u_{t-1} + e_t$ by using the residuals from the initial OLS
estimate and use the estimated $\hat\rho$.
The GLS estimation conducted with the estimated $\hat\rho$ values provides
consistent estimates of $\hat\beta_0^{GLS}$ and $\hat\beta_1^{GLS}$ (if the true value of $\rho$
had been known, the GLS estimates would have been unbiased). This estimation can only be done
with (n-1) observations, since one observation is lost in the
transformation. To avoid any inefficiency due to the loss of an observation,
especially in cases where n is not large, another transformation can be
conducted for the first observation. This transformation is as follows:
$y_1^* = y_1 \sqrt{1 - \hat\rho^2}$ is used for the first observation of $y$ and
$x_1^* = x_1 \sqrt{1 - \hat\rho^2}$ is used for the first observation of $x$.
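A sketch of this feasible GLS correction (Python with statsmodels; `GLSAR` iterates between estimating $\hat\rho$ from the residuals, in the spirit of method (2) above, and re-estimating the quasi-differenced equation; the data are hypothetical):

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical time series with AR(1) errors, true rho = 0.6
rng = np.random.default_rng(5)
T = 200
x = rng.normal(0.0, 1.0, T)
u = np.empty(T)
u[0] = rng.normal()
for t in range(1, T):
    u[t] = 0.6 * u[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + u

# GLSAR alternates: OLS -> estimate rho from residuals ->
# quasi-difference the data (y_t - rho*y_{t-1}, etc.) -> re-estimate
model = sm.GLSAR(y, sm.add_constant(x), rho=1)   # rho=1: one AR lag
res = model.iterative_fit(maxiter=10)
print(res.model.rho)   # the estimated rho_hat
print(res.params)      # GLS estimates of beta_0 and beta_1
```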
Endogeneity Problem:
Another important assumption about the error term is that it is uncorrelated with
the explanatory variables. Hence, when we assume that the explanatory variables
are fixed in repeated samples (the $x_{ij}$'s are not stochastic), then $\mathrm{Cov}(u_i, x_i) = 0$. With
this assumption we can state the Gauss Markov assumptions as:
$$E(u_i) = 0, \quad \mathrm{Var}(u_i) = \sigma^2, \quad \mathrm{Cov}(u_i, u_j) = 0$$
However, if the assumption of fixed explanatory variables is violated and the
explanatory variables are also random variables drawn from a distribution, then the
same Gauss Markov assumptions have to be stated as:
$$E(u_i \mid x_i) = 0, \quad \mathrm{Var}(u_i \mid x_i) = \sigma^2, \quad \mathrm{Cov}(u_i, u_j \mid x_i, x_j) = 0$$
This is to say that no matter what explanatory variable values are observed, the
assumptions given the values of the x's still hold. Then unbiasedness can be stated as
$E(\hat\beta_1 \mid x_i) = \beta_1$.
However, there can be cases where the explanatory variable is contemporaneously
correlated with the error terms. The cases that lead to this correlation are as
follows:
1) Errors in measurement of the explanatory variables,
2) Omitted variables (correlated with the included variables),
3) Jointly determined variables (simultaneity),
4) Lagged dependent variables with autocorrelated error terms.
These create the endogeneity of the explanatory variables. Under these
conditions the OLS estimator will be biased and inconsistent, which is known as
the endogeneity problem.
The proof of this inconsistency:
The OLS estimator of the slope coefficient in the simple linear regression model is
given as:
$$\hat\beta_1 = \frac{\sum (x_i - \bar x)(y_i - \bar y)}{\sum (x_i - \bar x)^2} = \beta_1 + \frac{\sum (x_i - \bar x) u_i}{\sum (x_i - \bar x)^2}$$
$$\operatorname{plim} \hat\beta_1 = \beta_1 + \frac{\operatorname{plim}\big((1/n) \sum (x_i - \bar x) u_i\big)}{\operatorname{plim}\big((1/n) \sum (x_i - \bar x)^2\big)} = \beta_1 + \frac{\mathrm{Cov}(x_i, u_i)}{\mathrm{Var}(x_i)}$$
Since $\mathrm{Cov}(x_i, u_i) \neq 0$, $\operatorname{plim} \hat\beta_1 \neq \beta_1$, and hence the estimator is inconsistent.
The solution is to use the Instrumental Variable estimation technique, i.e. the IV
estimator. You have to find another variable $z_i$ which can be used as an
instrument: it must be uncorrelated with $u_i$ and closely correlated with $x_i$. Then the
formula for the instrumental variable estimator is:
$$\hat\beta_1^{IV} = \frac{\sum (z_i - \bar z)(y_i - \bar y)}{\sum (z_i - \bar z)(x_i - \bar x)}$$
This estimator is going to be a consistent estimator.
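A minimal numerical sketch of the OLS inconsistency and the IV formula above (plain numpy; the data-generating process with a valid instrument $z$ is hypothetical):

```python
import numpy as np

# Hypothetical endogeneity: x is correlated with u; z is a valid
# instrument (correlated with x, uncorrelated with u)
rng = np.random.default_rng(6)
n = 10_000
z = rng.normal(0.0, 1.0, n)
u = rng.normal(0.0, 1.0, n)
x = 0.8 * z + 0.5 * u + rng.normal(0.0, 1.0, n)   # Cov(x, u) != 0
y = 1.0 + 2.0 * x + u                             # true slope is 2.0

# OLS slope: biased and inconsistent because Cov(x, u) != 0
b_ols = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean())**2).sum()

# IV slope: sum(z - zbar)(y - ybar) / sum(z - zbar)(x - xbar)
b_iv = ((z - z.mean()) * (y - y.mean())).sum() / \
       ((z - z.mean()) * (x - x.mean())).sum()
print(b_ols, b_iv)   # OLS drifts away from 2.0; IV stays close to 2.0
```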