Lecture 5 – Large Sample Results for OLS in a Time Series Setting

(Reference – Sections 2.3, 2.4, 2.5, 2.6, Hayashi)
We will formulate a set of conditions under which the OLS estimator is consistent and asymptotically normal. These conditions will replace the assumption that the regressors are i.i.d. with the assumption that they are stationary and ergodic. The conditions for consistency will allow for serial correlation and conditional heteroskedasticity in the disturbances. The conditions for asymptotic normality will rule out serial correlation in the disturbances but will allow for conditional heteroskedasticity. The proofs of these theorems, which are provided in the text, rely on the Ergodic Theorem (consistency) and the Martingale Differences Central Limit Theorem (asymptotic normality).
Sufficient Conditions for the Consistency of the OLS
Estimator
Assume the univariate stochastic processes $\{y_t\}$ and $\{\varepsilon_t\}$ and the k-dimensional stochastic process $\{x_t\}$ are generated according to:

A.1. (Linearity)
$y_t = x_t'\beta + \varepsilon_t$, t = 1, 2, …

A.2. (Stationarity and Ergodicity)
$\{y_t, x_t\}$ [or, equivalently, $\{\varepsilon_t, x_t\}$] is a jointly stationary and ergodic process.

A.3. (Orthogonality Condition)
$E(x_{ti}\varepsilon_t) = 0$ for i = 1, …, k and t = 1, 2, …

A.4. (Rank Condition)
$E(x_t x_t') = \Sigma_{xx}$, a finite p.d. matrix, for t = 1, 2, …

Under these conditions, $\hat{\beta}_T \xrightarrow{a.s.} \beta$.
Some comments on these assumptions –
1. We have replaced the i.i.d. assumption with stationarity, a more time-series-friendly assumption that allows for temporal dependence in the regressors and the disturbances (although we will put some additional restrictions on the temporal dependence of the disturbances when we formulate the conditions for asymptotic normality).
2. Assumption A.3 requires that the disturbances are “contemporaneously uncorrelated” with the regressors. This “orthogonality condition” is weaker than the condition that $E(\varepsilon_t \mid x_t, x_{t-1}, \ldots) = 0$, which, in turn, is weaker than the strict exogeneity condition.
Note: As Hayashi correctly points out in footnote 10, the phrase “predetermined regressors” is used differently by different people. Some, like me, use it to mean $E(\varepsilon_t \mid x_t, x_{t-1}, \ldots) = 0$ or the weaker condition $E(x_{t-s}\varepsilon_t) = 0$ for all $s \ge 0$. Hayashi prefers to use it to mean $E(\varepsilon_t \mid x_t) = 0$ or the weaker condition $E(x_t\varepsilon_t) = 0$.
3. According to A.3, $E(\varepsilon_t) = 0$ if $x_t$ includes an intercept, and according to A.2, if the variance of $\varepsilon_t$ is finite then it is constant, i.e., $E(\varepsilon_t^2) = \sigma^2$ for all t (and, so, the disturbances are unconditionally homoskedastic).
[Let $x_{t1} = 1$ for all t. Then $E(\varepsilon_t) = 0$ is an immediate consequence of (A.3). By (A.2), $\varepsilon_t$ is stationary, so if it has a finite variance that variance is the same for all t.]
4. The assumptions do not rule out conditional heteroskedasticity in the $\varepsilon$'s. That is, the model does not restrict the dependence of $E(\varepsilon_t^2 \mid \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots, x_t, x_{t-1}, \ldots)$ on current and past x's or on past $\varepsilon$'s. So, for example, ARCH disturbances are consistent with this set of assumptions (see the simulation sketch following these comments).
5. The rank condition is essentially a no-multicollinearity (in the limit) condition.
[Since the x's form a stationary and ergodic process, $x_t x_t'$ is also stationary and ergodic. By the Ergodic Theorem,
$\frac{1}{T}\sum_{t=1}^{T} x_t x_t' \xrightarrow{a.s.} \Sigma_{xx}$,
that is, $\frac{1}{T}\sum_{t=1}^{T} x_t x_t'$ is a.s. nonsingular for sufficiently large T.]
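Here is a minimal simulation sketch of the consistency result (not from Hayashi's text; all names, coefficients, and parameter values are illustrative assumptions). The regressor is a stationary, ergodic AR(1) process and the disturbances are ARCH(1) and independent of the regressors, so A.1–A.4 hold; the OLS estimate drifts toward $\beta$ as T grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(T, beta=(1.0, 2.0)):
    """Simulate y_t = b0 + b1*x_t + eps_t with a stationary AR(1) regressor
    and ARCH(1) disturbances that are independent of the regressors."""
    v = rng.standard_normal(T)
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = 0.5 * x[t - 1] + v[t]                           # stationary, ergodic AR(1)
    u = rng.standard_normal(T)
    eps = np.zeros(T)
    for t in range(1, T):
        eps[t] = u[t] * np.sqrt(0.5 + 0.4 * eps[t - 1] ** 2)   # ARCH(1) disturbance
    X = np.column_stack([np.ones(T), x])
    y = X @ np.asarray(beta) + eps
    return X, y

# beta_hat should drift toward (1, 2) as T grows (strong consistency)
for T in (100, 1_000, 10_000, 100_000):
    X, y = simulate(T)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    print(T, beta_hat)
```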
Sufficient Conditions for the Asymptotic Normality
of the OLS Estimator
We will add the following assumption to (A.1)–(A.4) above. (Or, since the added assumption is stronger than (A.3), we could simply replace (A.3) with this assumption.)

A.5
$\{x_t\varepsilon_t\}$ is a (k-dimensional) martingale difference sequence with finite second moment matrix, S.
Then under (A.1)–(A.5)
$\sqrt{T}(\hat{\beta}_T - \beta) \xrightarrow{d} N(0, \mathrm{Avar}(\hat{\beta}))$,
where
$\mathrm{Avar}(\hat{\beta}) = \Sigma_{xx}^{-1} S \Sigma_{xx}^{-1}$.
Some comments on these assumptions –
1. The m.d.s. assumption on $\{x_t\varepsilon_t\}$ is stronger than A.3 because an m.d.s. is a zero-mean sequence.
A sufficient condition for A.3: $E(\varepsilon_t \mid x_t, x_{t-1}, \ldots) = 0$.
A sufficient condition for the m.d.s. part of A.5: $E(\varepsilon_t \mid x_t, x_{t-1}, \ldots, \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots) = 0$.
2. These assumptions imply that $\varepsilon_t$ is uncorrelated with current and past x's and with past $\varepsilon$'s. So they do not allow for serial correlation in the $\varepsilon$'s. They do allow for predetermined but not strictly exogenous regressors, and they do allow for conditionally heteroskedastic disturbances.
3. $S = E(x_t\varepsilon_t \cdot \varepsilon_t x_t') = E(\varepsilon_t^2 x_t x_t')$.
So, the assumption that S is finite is a “fourth-moment” restriction.
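As a rough numerical illustration (not from the text), the Monte Carlo sketch below simulates a model whose disturbance is an m.d.s. with conditional variance depending on $x_t$, and compares the sampling variance of $\sqrt{T}(\hat{\beta}_{1,T} - \beta_1)$ with the slope element of $\Sigma_{xx}^{-1} S \Sigma_{xx}^{-1}$, the latter approximated from one long simulated sample. All names and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
beta = np.array([1.0, 2.0])

def one_sample(n):
    """One sample with a stationary AR(1) regressor and an m.d.s. disturbance
    whose conditional variance depends on x_t (conditional heteroskedasticity)."""
    v = rng.standard_normal(n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = 0.5 * x[t - 1] + v[t]
    eps = rng.standard_normal(n) * np.sqrt(0.5 + 0.5 * x ** 2)
    X = np.column_stack([np.ones(n), x])
    return X, X @ beta + eps

T, R = 500, 2000
draws = np.empty(R)
for r in range(R):
    X, y = one_sample(T)
    b = np.linalg.solve(X.T @ X, X.T @ y)
    draws[r] = np.sqrt(T) * (b[1] - beta[1])          # sqrt(T)(b1_hat - b1)

# Approximate Sigma_xx and S = E(eps^2 x x') from one long sample
Xl, yl = one_sample(200_000)
el = yl - Xl @ beta                                    # true disturbances
Sxx = Xl.T @ Xl / len(yl)
S = (Xl * (el ** 2)[:, None]).T @ Xl / len(yl)
avar = np.linalg.inv(Sxx) @ S @ np.linalg.inv(Sxx)

print("Monte Carlo variance of sqrt(T)(b1_hat - b1):", draws.var())
print("Asymptotic variance, slope element:          ", avar[1, 1])
```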
To apply this result to hypothesis testing or confidence interval construction, we will need a consistent estimator of $\mathrm{Avar}(\hat{\beta}) = \Sigma_{xx}^{-1} S \Sigma_{xx}^{-1}$.
A consistent estimator of $\Sigma_{xx}^{-1}$ is $S_{XX}^{-1}$, where $S_{XX} = \frac{1}{T}\sum_{t=1}^{T} x_t x_t'$. This follows from the Ergodic Theorem. Suppose we have a consistent estimator of S, $\hat{S}_T$. Then $S_{XX}^{-1}\hat{S}_T S_{XX}^{-1}$ is a consistent estimator of $\mathrm{Avar}(\hat{\beta})$.
Note that if we add Hayashi’s assumption A.6,
$E[(x_{ti}x_{tj})^2] < \infty$,
then
$\hat{S}_T = \frac{1}{T}\sum_{t=1}^{T} \hat{\varepsilon}_t^2 x_t x_t' \xrightarrow{p} S$.
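A minimal sketch of this plug-in estimator (illustrative names, not from the text): compute the OLS residuals, form $S_{XX}$ and $\hat{S}_T$, and sandwich them.

```python
import numpy as np

def robust_avar(X, y):
    """Plug-in estimate of Avar(beta_hat): S_XX^{-1} S_hat S_XX^{-1},
    with S_hat = (1/T) sum_t e_t^2 x_t x_t' built from OLS residuals."""
    T = len(y)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ beta_hat                        # OLS residuals
    Sxx = X.T @ X / T                           # (1/T) sum x_t x_t'
    S_hat = (X * (e ** 2)[:, None]).T @ X / T   # (1/T) sum e_t^2 x_t x_t'
    Sxx_inv = np.linalg.inv(Sxx)
    return beta_hat, Sxx_inv @ S_hat @ Sxx_inv
```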
In the special case where the $\varepsilon$'s are conditionally homoskedastic with variance $\sigma^2$,
$S = E(\varepsilon_t^2 x_t x_t') = E[E(\varepsilon_t^2 x_t x_t' \mid x_t)] = \sigma^2 E(x_t x_t') = \sigma^2\Sigma_{xx}$
and so, in this case, $\mathrm{Avar}(\hat{\beta}) = \sigma^2\Sigma_{xx}^{-1}$.
In this case, a consistent estimator of $\mathrm{Avar}(\hat{\beta})$ will be $\hat{\sigma}^2 S_{XX}^{-1}$, where $\hat{\sigma}^2$ is any consistent estimator of $\sigma^2$.
Consistent estimators of $\sigma^2$ under these assumptions include $\frac{1}{T}SSR$ and $\frac{1}{T-k}SSR$.
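A short sketch of the homoskedastic plug-in estimator (illustrative names; either $SSR/T$ or $SSR/(T-k)$ may be used, as noted above):

```python
import numpy as np

def homoskedastic_avar(X, y, dof_adjust=False):
    """Estimate Avar(beta_hat) = sigma^2 * Sigma_xx^{-1} using
    sigma2_hat = SSR/T, or SSR/(T - k) if dof_adjust=True."""
    T, k = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    ssr = np.sum((y - X @ beta_hat) ** 2)
    sigma2_hat = ssr / (T - k) if dof_adjust else ssr / T
    return sigma2_hat * np.linalg.inv(X.T @ X / T)   # sigma2_hat * S_XX^{-1}
```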
Applications to Hypothesis Testing
1. Assume that
$\sqrt{T}(\hat{\beta}_T - \beta) \xrightarrow{d} N(0, \mathrm{Avar}(\hat{\beta}))$
and $S_{XX}^{-1}\hat{S}_T S_{XX}^{-1}$ is a consistent estimator of $\mathrm{Avar}(\hat{\beta})$, where $\hat{S}_T$ is a consistent estimator of S.
Then
$t_i = (\hat{\beta}_i - \beta_i)/se(\hat{\beta}_i) \xrightarrow{d} N(0,1)$
where
$se(\hat{\beta}_i) = \sqrt{\frac{1}{T}\left(S_{XX}^{-1}\hat{S}_T S_{XX}^{-1}\right)_{ii}}$.
Comments –
i. Recall that the t(n−k) distribution converges to the N(0,1) distribution as n goes to infinity, so that in large samples whether we use the t(n−k) distribution or the N(0,1) distribution will not matter (i.e., will not give different test outcomes).
ii. This standard error is called the heteroskedasticity-consistent standard error (heteroskedasticity-robust standard error, White’s standard error), since it is valid even if there is conditional heteroskedasticity in the disturbances.
iii. If we make the stronger assumption that the disturbances are conditionally homoskedastic, then this standard error reduces to the “usual” form:
$se(\hat{\beta}_i) = \sqrt{\hat{\sigma}^2\left[(X'X)^{-1}\right]_{ii}}$
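A sketch of this robust t-test (illustrative names; the robust covariance computation from the earlier sketch is repeated so the block stands alone):

```python
import numpy as np

def robust_t_stat(X, y, i, beta0=0.0):
    """t-statistic for H0: beta_i = beta0 using the heteroskedasticity-
    consistent (White) standard error; compare with N(0,1) critical values."""
    T = len(y)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ beta_hat
    Sxx_inv = np.linalg.inv(X.T @ X / T)
    S_hat = (X * (e ** 2)[:, None]).T @ X / T
    avar_hat = Sxx_inv @ S_hat @ Sxx_inv      # robust Avar estimate
    se_i = np.sqrt(avar_hat[i, i] / T)        # se(beta_i_hat)
    return (beta_hat[i] - beta0) / se_i
```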
2. Under A.1 to A.5 and the restriction that $R\beta = r$, where R is a q×k matrix and r is a q×1 vector of real numbers, with rank(R) = q,
$W \equiv T(R\hat{\beta} - r)'\left(R S_{XX}^{-1}\hat{S}_T S_{XX}^{-1} R'\right)^{-1}(R\hat{\beta} - r) \xrightarrow{d} \chi^2(q)$
where $\hat{S}_T$ is any consistent estimator of S.
Note that if the disturbances are conditionally homoskedastic, this statistic reduces to
$T(R\hat{\beta} - r)'\left(R\,\hat{\sigma}^2 S_{XX}^{-1} R'\right)^{-1}(R\hat{\beta} - r) = (R\hat{\beta} - r)'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat{\beta} - r)/\hat{\sigma}^2 = qF$
where F is the usual OLS F-statistic.
{Recall: $q \cdot F(q, T-k)$ converges in distribution to $\chi^2(q)$ as $T \to \infty$.}
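A self-contained sketch of this robust Wald statistic (illustrative names; the p-value line assumes scipy is available):

```python
import numpy as np
from scipy import stats

def wald_linear(X, y, R, r):
    """Robust Wald statistic for H0: R beta = r; asymptotically chi^2(q)."""
    T = len(y)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ beta_hat
    Sxx_inv = np.linalg.inv(X.T @ X / T)
    S_hat = (X * (e ** 2)[:, None]).T @ X / T
    avar_hat = Sxx_inv @ S_hat @ Sxx_inv
    diff = R @ beta_hat - r
    W = T * diff @ np.linalg.solve(R @ avar_hat @ R.T, diff)
    q = R.shape[0]
    return W, stats.chi2.sf(W, df=q)    # statistic and asymptotic p-value
```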
3. Suppose A.1 – A.5 hold and $g(\beta) = 0$, where g is a set of q nonlinear functions such that $G(\beta)$, the q×k matrix of (continuous at $\beta$) first derivatives evaluated at $\beta$, has rank q. Then
$W = T g(\hat{\beta})'\left(G(\hat{\beta}) S_{XX}^{-1}\hat{S}_T S_{XX}^{-1} G(\hat{\beta})'\right)^{-1} g(\hat{\beta}) \xrightarrow{d} \chi^2(q)$
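A sketch of this delta-method Wald statistic (illustrative names; the user supplies g and its Jacobian G, and the robust covariance computation is repeated so the block stands alone):

```python
import numpy as np

def wald_nonlinear(X, y, g, G):
    """Robust Wald statistic for H0: g(beta) = 0, where g: R^k -> R^q and
    G(beta) is the q x k Jacobian of g; asymptotically chi^2(q)."""
    T = len(y)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ beta_hat
    Sxx_inv = np.linalg.inv(X.T @ X / T)
    S_hat = (X * (e ** 2)[:, None]).T @ X / T
    avar_hat = Sxx_inv @ S_hat @ Sxx_inv
    gb, Gb = g(beta_hat), G(beta_hat)
    return T * gb @ np.linalg.solve(Gb @ avar_hat @ Gb.T, gb)

# Example restriction (k >= 2): beta_0 * beta_1 = 1
# g = lambda b: np.array([b[0] * b[1] - 1.0])
# G = lambda b: np.hstack([[b[1], b[0]], np.zeros(len(b) - 2)]).reshape(1, -1)
```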