The Linear Regression Model with Autocorrelated
Disturbances: Large Sample Theory
(References – Greene, Chapter 13; White, Chapter 6)
Now we will consider estimation and inference in the
linear regression model with autocorrelated
disturbances from the point of view of large sample
theory, which is the more modern way to think about
these issues.
First, consider the problem of testing for
autocorrelation. In the finite sample case we have a
single approach (the Durbin-Watson (DW) test), which relies on
strictly exogenous regressors and homoskedastic
disturbances.
Not surprisingly, evidence of autocorrelation is much
easier to find in very large samples. Consequently,
we have a variety of asymptotically valid approaches,
which allow for predetermined or even, in some
cases, simply orthogonal regressors and, in some
cases, heteroskedastic disturbances.
Testing for serially correlated disturbances in
regressions with strictly exogenous regressors and
homoskedastic disturbances –
• Durbin-Watson test
• Box-Pierce and Ljung-Box Q tests
• Fit the regression
      ε̂t = ρ ε̂t-1 + ut
  or
      ε̂t = ρ1 ε̂t-1 + … + ρp ε̂t-p + ut
  by OLS, where the ε̂'s are the OLS residuals
  from the regression of y on x, then apply a
  standard t-test or F-test to test H0: ρ = 0 or
  H0: ρ1 = … = ρp = 0.
(Heteroskedasticity-robust versions of this
test exist.)
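A minimal numpy sketch of this residual-regression test is given below (the function name and interface are illustrative choices, not part of the notes); it assumes y is a length-T vector and X is the T×k regressor matrix, constant included.

```python
import numpy as np

def residual_autocorr_test(y, X, p=1):
    """Regress OLS residuals on their own p lags and return the estimated
    rho's with conventional (homoskedastic) t-statistics."""
    # Step 1: OLS residuals from the regression of y on X
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    # Step 2: regress e_t on e_{t-1}, ..., e_{t-p} (no constant, as in the notes)
    T = len(e)
    Z = np.column_stack([e[p - j - 1 : T - j - 1] for j in range(p)])
    u = e[p:]
    rho = np.linalg.lstsq(Z, u, rcond=None)[0]
    resid = u - Z @ rho
    s2 = resid @ resid / (len(u) - p)
    se = np.sqrt(s2 * np.diag(np.linalg.inv(Z.T @ Z)))
    return rho, rho / se   # compare the t-ratios with N(0,1) under H0: rho = 0
```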
Testing for serially correlated disturbances in
regressions with predetermined (or strictly
exogenous) regressors and homoskedastic
disturbances –
• Durbin h-test
• Modified Box-Pierce Q Test
• (Durbin-) Breusch-Godfrey Test
Breusch-Godfrey Test –
Fit the regression
ε̂t = xt'b + ρ ε̂t-1 + ut
Under the null of no serial correlation in the ε’s,
the t-statistic associated with the OLS estimator
of ρ is asymptotically N(0,1). (Durbin)
Or, fit the regression
ε̂t = xt'b + ρ1 ε̂t-1 + … + ρp ε̂t-p + ut
Under the null of no serial correlation in the ε’s,
the LM statistic, (T−p)R², is asymptotically χ²(p).
(Heteroskedasticity-robust versions of this test
exist.)
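The LM form of the test is easy to compute directly. The sketch below (illustrative names, numpy/scipy only) runs the auxiliary regression of the OLS residuals on xt and p lagged residuals and returns (T−p)R² with its χ²(p) p-value; it assumes X includes the constant.

```python
import numpy as np
from scipy import stats

def breusch_godfrey_lm(y, X, p=1):
    """Breusch-Godfrey LM test: (T - p) * R^2 from the regression of the OLS
    residuals on X and p of their own lags; asymptotically chi-squared(p) under H0."""
    T = len(y)
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    # Auxiliary regression for t = p+1, ..., T
    lags = np.column_stack([e[p - j - 1 : T - j - 1] for j in range(p)])
    W = np.column_stack([X[p:], lags])
    u = e[p:]
    fit = W @ np.linalg.lstsq(W, u, rcond=None)[0]
    r2 = 1.0 - np.sum((u - fit) ** 2) / np.sum((u - u.mean()) ** 2)
    lm = (T - p) * r2
    return lm, stats.chi2.sf(lm, df=p)
```

(statsmodels also provides a ready-made version, acorr_breusch_godfrey in statsmodels.stats.diagnostic, which takes a fitted OLS results object; its small-sample scaling of the statistic may differ slightly from the (T−p)R² form above.)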
Next we turn to estimation and inference.
Suppose we conclude that the disturbances are
autocorrelated. Then we can either
• apply OLS and use a properly adjusted variance
  matrix for β̂OLS
or
• apply an asymptotically efficient procedure, i.e.,
  FGLS.
Correcting the OLS estimator for autocorrelation –
When the regressors and disturbances are
autocorrelated but meet appropriate moment and
memory conditions (e.g., orthogonality of regressors
and disturbances, stationarity, asymptotically
uncorrelated disturbances, the existence of k-th
moments for sufficiently large k), then
√T(β̂T − β) →d N(0, avar(β̂)),
where
avar(β̂) = Σxx⁻¹ S Σxx⁻¹
Σxx = E(xt xt')
S = E(εt² xt xt') + Σ_{i=1}^{∞} E[εt εt-i (xt xt-i' + xt-i xt')]
{Note that the difference between this and the
asymptotic variance matrix we derived earlier under
the i.n.i.d. assumption, i.e., the assumption that the ε's
form a conditionally heteroskedastic m.d.s., is the
second part of S.}
To apply this result we need a consistent estimator of
the asymptotic variance matrix.
Under appropriate conditions we know that we can
apply an LLN (e.g., the Ergodic Theorem) to show
that
(1/T) Σ_{t=1}^{T} xt xt' → Σxx   (a.s. or plim)
We also need a consistent estimator of S. A popular and commonly available consistent
nonparametric estimator of S is the Newey-West
heteroskedasticity- and autocorrelation-consistent (HAC)
estimator:
ŜNW = (1/T){ Σ_{t=1}^{T} ε̂t² xt xt' + Σ_{m=1}^{M} Σ_{t=m+1}^{T} wm ε̂t ε̂t-m (xt xt-m' + xt-m xt') }
where wm = 1 − m/(1+M), m = 1,…,M.
Practical issue – selecting the proper M.
(This is similar to the lag length selection problem in
fitting AR’s and VAR’s)
Note that this is a nonparametric approach – it does
not require us to specify a parametric model of the
disturbance process nor does it require exogenous
regressors. However, this approach is not
asymptotically efficient.
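Below is a minimal numpy sketch of OLS with Newey-West standard errors, written directly from the formula above (the function name and interface are illustrative); it assumes y is length T and X is the T×k regressor matrix.

```python
import numpy as np

def ols_newey_west(y, X, M):
    """OLS point estimates with Newey-West (Bartlett-kernel) HAC standard errors."""
    T, k = X.shape
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    g = X * e[:, None]                     # row t is e_t * x_t'
    S = g.T @ g / T                        # m = 0 (heteroskedasticity) term
    for m in range(1, M + 1):
        w = 1.0 - m / (M + 1.0)            # Bartlett weights w_m = 1 - m/(1+M)
        Gamma = g[m:].T @ g[:-m] / T       # (1/T) * sum_t e_t e_{t-m} x_t x_{t-m}'
        S += w * (Gamma + Gamma.T)
    Sxx_inv = np.linalg.inv(X.T @ X / T)
    avar = Sxx_inv @ S @ Sxx_inv           # sandwich: Sxx^{-1} S Sxx^{-1}
    se = np.sqrt(np.diag(avar / T))        # Var(beta-hat) is approximately avar / T
    return b, se
```

In practice the same covariance matrix is available in statsmodels via sm.OLS(y, X).fit(cov_type='HAC', cov_kwds={'maxlags': M}).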
An asymptotically efficient estimator when the
regressors are strictly exogenous is the FGLS
estimator, which relies on the (correct) specification
of a parametric model of the disturbance process.
Recall that the FGLS estimator of β is
ˆ 1 X )1 X ' 
ˆ 1Y
ˆFGLS  ( X ' 
where ̂ is any consistent estimator of Ω
(Σ=σ2 Ω = E(εε’), where σ is an arbitrary constant).
Under appropriate additional conditions on the
regressors and the disturbances, the FGLS estimator
is consistent, asymptotically normal and
asymptotically efficient with
ˆ 1 X / T )1
a var( ˆFGLS )  ˆ 2 ( X ' 
In the FGLS approach, a parametric model of the ε's
is formulated and estimated to obtain Ω̂. That is,
Ω = Ω(γ), γ an unknown parameter vector, and
Ω̂ = Ω(γ̂).
Suppose, for example, that the error process
is a stationary AR(1) process, i.e.,
εt = ρεt-1 + vt , │ρ│< 1
where vt is a white noise process with variance σv2.
Without loss of generality, let’s assume that there is a
single regressor so that:
yt = β0 + β1xt + εt
εt = ρεt-1 + vt , │ρ│< 1
vt ~ wn(0,σv2)
In this case, it can be shown that

Ω = Ω(ρ) = (1/(1−ρ²)) ·
   [ 1          ρ          ρ²         …   ρ^{T-1} ]
   [ ρ          1          ρ          …   ρ^{T-2} ]
   [ ρ²         ρ          1          …   ρ^{T-3} ]
   [ ⋮                                     ⋮       ]
   [ ρ^{T-1}    ρ^{T-2}    ρ^{T-3}    …   1       ]

(i.e., the (i,j) element of Ω is ρ^|i−j|/(1−ρ²))
and C'C = Ω⁻¹, where

C = C(ρ) =
   [ (1−ρ²)^{1/2}   0     0    …    0    0 ]
   [ −ρ             1     0    …    0    0 ]
   [ 0             −ρ     1    …    0    0 ]
   [ ⋮                                  ⋮  ]
   [ 0              0     0    …   −ρ    1 ]
Consequently, the GLS estimator of β is the OLS
estimator applied to the transformed data:
(1   2 )1/ 2 y1 


y


y
1
 2

~
y  Cy   y3  y2 





 y  y 
T 1 
 T
 (1   2 )1 / 2

 1 
~
x  Cx   1  



 1 

(1   2 )1 / 2 x1 

x 2  x1 
x3  x 2 



xT  xT 1 
That is, we transform the data matrix by "quasi-differencing" observations 2,…,T (and the first
observation is simply multiplied by (1−ρ²)^{1/2}).
To estimate the model by FGLS, we need a
consistent estimator of ρ. A consistent estimator of ρ
is found from the regression of ε̂t on ε̂t-1, where
ε̂t is the OLS residual from the regression of y on
1, x.
So, in this case, the FGLS estimator is a sequence of
three regressions –
1. Regress yt on 1, xt to obtain the residuals ε̂t
2. Regress ε̂t on ε̂t-1 to obtain ρ̂
3. Regress ỹ(ρ̂) on x̃(ρ̂) to obtain β̂
This is also sometimes referred to as the ("two-step")
Prais-Winsten estimator of β.
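A numpy sketch of this three-regression (two-step Prais-Winsten) procedure for the single-regressor model follows; the function name and interface are illustrative choices, not from the notes.

```python
import numpy as np

def prais_winsten(y, x):
    """Two-step Prais-Winsten FGLS for y_t = b0 + b1*x_t + e_t with AR(1) errors."""
    T = len(y)
    X = np.column_stack([np.ones(T), x])
    # Step 1: OLS residuals
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    # Step 2: regress e_t on e_{t-1} to estimate rho
    rho = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])
    # Step 3: OLS on the transformed (quasi-differenced) data, keeping the
    # first observation scaled by sqrt(1 - rho^2)
    s = np.sqrt(1.0 - rho ** 2)
    y_t = np.concatenate([[s * y[0]], y[1:] - rho * y[:-1]])
    X_t = np.vstack([s * X[0], X[1:] - rho * X[:-1]])
    b_fgls = np.linalg.lstsq(X_t, y_t, rcond=None)[0]
    return b_fgls, rho
```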
Notes –
1. Sometimes, the first observations of ỹ(ρ̂) and
x̃(ρ̂) are dropped in Step 3 for convenience.
The resulting estimator is called the Cochrane-Orcutt estimator. It is asymptotically
equivalent to the Prais-Winsten estimator but
may not do as well in modest samples,
especially when ρ is close to 1.
2. Iterative versions of the P-W and C-O
estimators are sometimes applied (see the
sketch after these notes):
First, follow steps 1-3 to obtain β̂. Then
use β̂ to construct new ε̂t's. Follow steps
2 and 3. … Continue until β̂ converges.
3. These ideas extend in a straightforward way to
the case where the ε’s follow a higher-order
AR process (and/or there are additional explanatory
variables).
For example, if p = 2, the second step of the
C-O estimator would be:
Regress ε̂t on ε̂t-1 and ε̂t-2 to get ρ̂1 and ρ̂2.
The third step would be:
Regress (yt − ρ̂1 yt-1 − ρ̂2 yt-2) on
(1 − ρ̂1 − ρ̂2) and (xt − ρ̂1 xt-1 − ρ̂2 xt-2) for
t = 3,…,T to get β̂0 and β̂1.
For p > 2, the C-O is usually preferred in practice
to the P-W estimator because of the increasingly
complicated form of the transformations for the
first p observations of y and x.
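The iterative Cochrane-Orcutt version mentioned in Note 2 can be sketched as follows (illustrative names; the first observation is dropped, as in C-O):

```python
import numpy as np

def cochrane_orcutt(y, x, tol=1e-8, max_iter=100):
    """Iterative Cochrane-Orcutt: alternate between estimating rho from the
    current residuals and re-estimating beta on quasi-differenced data."""
    T = len(y)
    X = np.column_stack([np.ones(T), x])
    b = np.linalg.lstsq(X, y, rcond=None)[0]      # start from OLS
    rho = 0.0
    for _ in range(max_iter):
        e = y - X @ b                             # residuals at the current beta
        rho_new = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])
        y_t = y[1:] - rho_new * y[:-1]
        X_t = X[1:] - rho_new * X[:-1]            # intercept column becomes 1 - rho
        b_new = np.linalg.lstsq(X_t, y_t, rcond=None)[0]
        done = abs(rho_new - rho) < tol and np.max(np.abs(b_new - b)) < tol
        b, rho = b_new, rho_new
        if done:
            break
    return b, rho
```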
If the regressors are predetermined but not strictly
exogenous, apparently the estimator of ρ given in
step 2 of the C-O and P-W estimators will be
inconsistent (and, therefore, the FGLS estimator of β
will be inconsistent, too).
There appear to be at least a couple of ways to
proceed in this case to construct a consistent and
asymptotically efficient estimator of β. These involve
the joint estimation of ρ and β to minimize the sum of
squared v’s.
1. Nonlinear least squares
2. Maximum likelihood
The NLS Estimator –
Let
yt = β0 + β1xt + εt, εt = ρεt-1 + vt
Then
ρyt-1 = ρβ0 + ρβ1xt-1 + ρεt-1
and so,
yt – ρyt-1 = (1-ρ)β0 + β1xt –ρβ1xt-1 + (εt – ρεt-1)
or, rearranging,
yt = (1-ρ)β0 + β1xt –ρβ1xt-1 + ρyt-1 + vt
Minimize Σ_{t=2}^{T} vt²(ρ, β0, β1) with respect to ρ, β0, β1.
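A sketch of this NLS estimator using scipy's general-purpose optimizer (the function name and the OLS starting values are choices of this sketch, not from the notes):

```python
import numpy as np
from scipy.optimize import minimize

def nls_ar1(y, x):
    """NLS for y_t = b0 + b1*x_t + e_t, e_t = rho*e_{t-1} + v_t:
    minimize sum_{t=2}^{T} v_t(rho, b0, b1)^2."""
    def ssr(theta):
        rho, b0, b1 = theta
        v = (y[1:] - (1 - rho) * b0 - b1 * x[1:]
             + rho * b1 * x[:-1] - rho * y[:-1])
        return v @ v
    X = np.column_stack([np.ones(len(y)), x])
    b0_init, b1_init = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS starting values
    res = minimize(ssr, x0=[0.0, b0_init, b1_init], method="BFGS")
    return res.x   # (rho_hat, b0_hat, b1_hat)
```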
The (quasi-) maximum likelihood estimator –
2 1
ln L = constant + 0.5ln(1-ρ ) – (T/2)lnσv
2
2
T
2
v
v
2
t
1
where
v1 = (1−ρ²)^{1/2}(y1 − β0 − β1x1)
vt = yt - (1-ρ)β0 - β1xt +ρβ1xt-1 - ρyt-1 for t > 1
and
σ̂v² = (1/T) Σ_{t=1}^{T} vt²
Maximize ln L with respect to β0, β1, and ρ.
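A sketch of the (quasi-) MLE based on the concentrated log likelihood above, again using scipy's optimizer (illustrative names; σv² is concentrated out as the mean squared v):

```python
import numpy as np
from scipy.optimize import minimize

def qmle_ar1(y, x):
    """Maximize the concentrated Gaussian log likelihood over (b0, b1, rho)."""
    T = len(y)
    def neg_loglik(theta):
        b0, b1, rho = theta
        if abs(rho) >= 1.0:                      # enforce |rho| < 1
            return np.inf
        e = y - b0 - b1 * x
        v = np.concatenate([[np.sqrt(1 - rho ** 2) * e[0]], e[1:] - rho * e[:-1]])
        s2 = v @ v / T                           # sigma_v^2 concentrated out
        ll = 0.5 * np.log(1 - rho ** 2) - (T / 2) * np.log(s2) - (v @ v) / (2 * s2)
        return -ll                               # minimize the negative log likelihood
    X = np.column_stack([np.ones(T), x])
    b0_init, b1_init = np.linalg.lstsq(X, y, rcond=None)[0]
    res = minimize(neg_loglik, x0=[b0_init, b1_init, 0.0], method="Nelder-Mead")
    return res.x   # (b0_hat, b1_hat, rho_hat)
```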