14.384 Time Series Analysis, Fall 2007
Professor Anna Mikusheva
Paul Schrimpf, scribe
October 23, 2007
revised October 22, 2009
Lecture 17
Unit Roots
Review from last time
Let $y_t$ be a random walk
$$y_t = \rho y_{t-1} + \epsilon_t, \qquad \rho = 1$$
where $\epsilon_t$ is a martingale difference sequence ($E[\epsilon_t \mid I_{t-1}] = 0$), with $\frac{1}{T}\sum_t E[\epsilon_t^2 \mid I_{t-1}] \to \sigma^2$ a.s. and $E\epsilon_t^4 < K < \infty$. Then $\{\epsilon_t\}$ satisfies a functional central limit theorem:
$$\xi_T(\tau) = \frac{1}{\sqrt{T}} \sum_{t=1}^{[\tau T]} \epsilon_t \Rightarrow \sigma W(\cdot)$$
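As a quick illustration (a simulation sketch of mine, not part of the original notes; the sample size, $\sigma$, and replication count are arbitrary), the FCLT scaling can be checked at $\tau = 1$:

```python
import numpy as np

# Minimal sketch: xi_T(1) = T^{-1/2} * sum_{t=1}^T eps_t should be approximately
# N(0, sigma^2); here eps_t is i.i.d. N(0, sigma^2), a simple m.d.s.
rng = np.random.default_rng(0)
T, sigma, reps = 1_000, 2.0, 5_000

xi_1 = np.array([rng.normal(0, sigma, T).sum() / np.sqrt(T) for _ in range(reps)])
print("sample variance of xi_T(1):", xi_1.var())  # close to sigma^2 = 4
```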
We showed last time that:


$$\begin{pmatrix} \frac{1}{T}\sum y_{t-1}\epsilon_t \\[4pt] \frac{1}{T^{3/2}}\sum y_{t-1} \\[4pt] \frac{1}{T^2}\sum y_{t-1}^2 \end{pmatrix} \Rightarrow \begin{pmatrix} \sigma^2 \int_0^1 W(t)\,dW(t) \\[4pt] \sigma \int_0^1 W(t)\,dt \\[4pt] \sigma^2 \int_0^1 W(t)^2\,dt \end{pmatrix}$$
and our OLS estimator and t-statistic have non-standard distributions:
$$T(\hat\rho - \rho) \Rightarrow \frac{\int W\,dW}{\int W^2\,dt}, \qquad t \Rightarrow \frac{\int W\,dW}{\sqrt{\int W^2\,dt}}$$
This is quite different from the case with $|\rho| < 1$. If
$$x_t = \rho x_{t-1} + \epsilon_t, \qquad |\rho| < 1$$
then,


$$\begin{pmatrix} \frac{1}{\sqrt{T}}\sum x_{t-1}\epsilon_t \\[4pt] \frac{1}{T}\sum x_{t-1} \\[4pt] \frac{1}{T}\sum x_{t-1}^2 \end{pmatrix} \Rightarrow \begin{pmatrix} N\!\left(0, \frac{\sigma^4}{1-\rho^2}\right) \\[4pt] E x_t = 0 \\[4pt] E x_t^2 = \frac{\sigma^2}{1-\rho^2} \end{pmatrix}$$
and the OLS estimate and t-stat converge to normal distributions:
$$\sqrt{T}(\hat\rho - \rho) \Rightarrow N(0, 1 - \rho^2), \qquad t \Rightarrow N(0, 1)$$
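For contrast with the unit-root case, here is a small simulation sketch (mine; the AR coefficient, sample size, and replication count are arbitrary):

```python
import numpy as np

# For |rho| < 1, sqrt(T)*(rho_hat - rho) is approximately N(0, 1 - rho^2).
rng = np.random.default_rng(1)
T, rho, reps = 2_000, 0.5, 2_000

stats = []
for _ in range(reps):
    eps = rng.normal(0, 1, T)
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = rho * x[t - 1] + eps[t]          # stationary AR(1)
    rho_hat = (x[:-1] * x[1:]).sum() / (x[:-1] ** 2).sum()  # OLS, no constant
    stats.append(np.sqrt(T) * (rho_hat - rho))

print("sample variance:", np.var(stats))        # near 1 - rho^2 = 0.75
```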
Adding a constant
Suppose we add a constant to our OLS regression of $y_t$. This is equivalent to running OLS on demeaned $y_t$. Let $y_t^m = y_t - \bar{y}$. Then,
$$\hat\rho - 1 = \frac{\sum y_{t-1}^m \epsilon_t}{\sum (y_{t-1}^m)^2}$$
Consider the numerator:
$$\frac{1}{T}\sum y_{t-1}^m \epsilon_t = \frac{1}{T}\sum (y_{t-1} - \bar{y})\epsilon_t = \frac{1}{T}\sum y_{t-1}\epsilon_t - \bar{\bar{y}}$$
where $\bar{\bar{y}} = \frac{1}{T^{3/2}}\sum y_{t-1} \cdot \frac{1}{\sqrt{T}}\sum \epsilon_t$. We know that $\frac{1}{T^{3/2}}\sum y_{t-1} \Rightarrow \sigma \int W\,dt$, and $\frac{1}{\sqrt{T}}\sum \epsilon_t \Rightarrow \sigma W(1)$. Also, from before we know that $\frac{1}{T}\sum y_{t-1}\epsilon_t \Rightarrow \sigma^2 \int W\,dW$. Combining this, we have:
$$\frac{1}{T}\sum y_{t-1}^m \epsilon_t \Rightarrow \sigma^2 \left( \int W(s)\,dW(s) - W(1)\int W(t)\,dt \right)$$
We can think of this as the integral of a demeaned Brownian motion,
$$\int W(s)\,dW(s) - W(1)\int W(t)\,dt = \int \left( W(s) - \int W(t)\,dt \right) dW(s) = \int W^m(s)\,dW(s)$$
The limiting distributions of all statistics ($\hat\rho$, $t$, etc.) would change. In most cases, the change consists only of replacing $W$ by $W^m$. One can also include a linear trend in the regression; the limiting distribution would then depend on a detrended Brownian motion.
Limiting Distribution
Let's return to the case without a constant. The limiting distribution of the OLS estimate is
$$T(\hat\rho - 1) \Rightarrow \frac{\int W\,dW}{\int W^2\,dt}$$
This distribution is skewed and shifted to the left. If we include a constant, the distribution is shifted even more; if we also include a trend, it shifts yet more. For example, the 2.5% quantile is -10.5 without a constant, -16.9 with a constant, and -25 with a trend. The 97.5% quantiles are 1.6, 0.41, and -1.8, respectively. Thus, estimates of $\rho$ have bias of order $1/T$. This bias is quite large in small samples.
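These quantiles are easy to reproduce by Monte Carlo. The sketch below (mine; $T$ and the replication count are arbitrary) approximates the 2.5% quantile of $T(\hat\rho - 1)$ for the no-constant case; demeaning or detrending $y$ first gives the other two cases:

```python
import numpy as np

# Simulate the Dickey-Fuller coefficient distribution under a random walk.
rng = np.random.default_rng(2)
T, reps = 1_000, 5_000

stats = []
for _ in range(reps):
    y = np.cumsum(rng.normal(0, 1, T))           # driftless random walk
    rho_hat = (y[:-1] * y[1:]).sum() / (y[:-1] ** 2).sum()
    stats.append(T * (rho_hat - 1))

print("2.5% quantile:", np.quantile(stats, 0.025))  # roughly -10.5
```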
The distribution of the t-statistic is also shifted to the left and skewed, but less so than the distribution of the OLS estimate. The 2.5% quantiles are -2.23, -3.12, and -3.66, and the 97.5% quantiles are 1.6, 0.23, and -0.66, for no constant, a constant, and a trend, respectively.
Allowing Auto-correlation
So far, we have assumed that the $\epsilon_t$ are not auto-correlated. This assumption is too strong for empirical work. Let
$$y_t = y_{t-1} + v_t$$
where $v_t$ is a zero-mean stationary process with an MA representation,
$$v_t = \sum_{j=0}^{\infty} c_j \epsilon_{t-j}$$
where $\epsilon_t$ is white noise. Now we need to look at the limiting distributions of each of the terms we looked at before ($\sum v_t$, $\sum y_{t-1} v_t$, $\sum y_t^2$, etc.). $\sum v_t$ is a term we already encountered in HAC estimation:
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{T} v_t \Rightarrow N(0, \omega^2)$$
where $\omega^2$ is the long-run variance,
$$\omega^2 = \sum_{j=-\infty}^{\infty} \gamma_j = \sigma^2 c(1)^2$$
For a linearly dependent process $v_t$, we have the following functional central limit theorem:
$$\xi_T(\tau) = \frac{1}{\sqrt{T}} \sum_{t=1}^{[T\tau]} v_t \Rightarrow \omega W(\cdot)$$
This can be proven by using the Beveridge-Nelson decomposition. All the other terms converge to limits similar to the uncorrelated case, except with $\sigma$ replaced by $\omega$. For example,
$$\frac{y_t}{\sqrt{T}} = \xi_T(t/T) \Rightarrow \omega W(t/T)$$
$$\frac{1}{T^{3/2}}\sum y_{t-1} \Rightarrow \omega \int W(s)\,ds$$
An exception is:
$$\frac{1}{T}\sum y_{t-1} v_t = \frac{1}{2T} y_T^2 - \frac{1}{2T}\sum v_t^2 \Rightarrow \frac{1}{2}\left(\omega^2 W(1)^2 - \sigma_v^2\right) = \omega^2 \int W\,dW + \frac{\omega^2 - \sigma_v^2}{2}$$
This leads to an extra constant in the distribution of $\hat\rho$:
$$T(\hat\rho - 1) \Rightarrow \frac{\int W\,dW + \frac{1}{2}\frac{\omega^2 - \sigma_v^2}{\omega^2}}{\int W^2\,dt}$$
This additional constant is a “nuisance” parameter, and thus the statistic is not pivotal, so it is impossible to just look up critical values in a table. Phillips & Perron (1988) suggested corrected statistics:
$$T(\hat\rho - 1) - \frac{\frac{1}{2}\left(\hat\omega^2 - \hat\sigma_v^2\right)}{\frac{1}{T^2}\sum y_t^2} \Rightarrow \frac{\int W\,dW}{\int W^2\,dt}$$
where $\hat\omega^2$ is an estimate of the long-run variance, say Newey-West, and $\hat\sigma_v^2$ is the variance of the residuals.
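A minimal sketch of this correction (my illustration, not the authors' code), using a Bartlett-kernel Newey-West estimate of $\omega^2$ with an arbitrary bandwidth and taking $v_t = \Delta y_t$ as the residuals under the null:

```python
import numpy as np

def pp_corrected_stat(y, bandwidth=10):
    """Phillips-Perron-type corrected coefficient statistic (illustrative)."""
    T = len(y) - 1
    v = np.diff(y)                      # residuals imposing the null rho = 1
    rho_hat = (y[:-1] * y[1:]).sum() / (y[:-1] ** 2).sum()

    # Newey-West (Bartlett kernel) estimate of the long-run variance omega^2
    sigma2_v = (v * v).mean()
    omega2 = sigma2_v
    for j in range(1, bandwidth + 1):
        omega2 += 2 * (1 - j / (bandwidth + 1)) * (v[j:] * v[:-j]).mean()

    # Remove the nuisance term so the limit is the pivotal ratio of integrals
    correction = 0.5 * (omega2 - sigma2_v) / ((y[:-1] ** 2).sum() / T**2)
    return T * (rho_hat - 1) - correction
```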
Augmented Dickey Fuller
Another approach is the augmented Dickey-Fuller test. Suppose,
$$y_t = \sum_{j=1}^{p} a_j y_{t-j} + \epsilon_t$$
where z = 1 is a root,
$$1 - \sum_{j=1}^{p} a_j z^j \,\Big|_{z=1} = 0, \qquad \text{i.e.} \quad 1 = \sum_j a_j$$
Suppose we factor a(L) as
$$1 - \sum_{j=1}^{p} a_j L^j = (1 - L)\, b(L)$$
with $b(L) = 1 - \sum_{j=1}^{p-1} \beta_j L^j$. We can write the model as
$$\Delta y_t = \sum_{j=1}^{p-1} \beta_j \Delta y_{t-j} + \epsilon_t$$
This suggests estimating:
$$y_t = \rho y_{t-1} + \sum_{j=1}^{p-1} \beta_j \Delta y_{t-j} + \epsilon_t$$
and testing whether ρ = 1 to see if we have a unit root. This is another way of allowing auto-correlation
because we can write the model as:
$$a(L)\, y_t = \epsilon_t$$
$$b(L)\, \Delta y_t = \epsilon_t$$
$$\Delta y_t = b(L)^{-1} \epsilon_t$$
$$y_t = y_{t-1} + v_t$$
where $v_t = b(L)^{-1} \epsilon_t$ is auto-correlated.
The nice thing about the augmented Dickey-Fuller test is that the coefficients on $\Delta y_{t-j}$ each converge to a normal distribution at rate $\sqrt{T}$, and the coefficient on $y_{t-1}$ converges to a non-standard distribution without nuisance parameters. To see this, let $x_t = [y_{t-1}, \Delta y_{t-1}, \ldots, \Delta y_{t-p+1}]'$ and let $\theta = [\rho, \beta_1, \ldots, \beta_{p-1}]'$. Consider: $\hat\theta - \theta = (X'X)^{-1}(X'\epsilon)$. We need to normalize this so that it converges. Our normalizing matrix will be


$$Q = \begin{pmatrix} T & 0 & \cdots & 0 \\ 0 & \sqrt{T} & & \\ \vdots & & \ddots & \\ 0 & & & \sqrt{T} \end{pmatrix}$$
Multiplying by $Q$ gives:
$$Q(\hat\theta - \theta) = \left(Q^{-1} X'X\, Q^{-1}\right)^{-1} \left(Q^{-1} X'\epsilon\right)$$
The denominator is:

$$Q^{-1} X'X\, Q^{-1} = \begin{pmatrix} \frac{1}{T^2}\sum y_{t-1}^2 & \frac{1}{T^{3/2}}\sum y_{t-1}\Delta y_{t-1} & \cdots \\[4pt] \frac{1}{T^{3/2}}\sum y_{t-1}\Delta y_{t-1} & \frac{1}{T}\tilde{X}'\tilde{X} & \\ \vdots & & \ddots \end{pmatrix}$$
Note that $\frac{1}{T^{3/2}}\sum y_{t-1}\Delta y_{t-j} \xrightarrow{p} 0$ (since we proved that $\frac{1}{T}\sum y_{t-1}\Delta y_{t-j} \Rightarrow \frac{1}{2}(\omega^2 W(1)^2 - \sigma_v^2)$). Also, we know that $\frac{1}{T^2}\sum y_{t-1}^2 \Rightarrow \omega^2 \int W^2\,dt$, so
$$Q^{-1} X'X\, Q^{-1} \Rightarrow \begin{pmatrix} \omega^2 \int W^2\,dt & 0 & \cdots \\ 0 & E\tilde{X}'\tilde{X} & \\ \vdots & & \ddots \end{pmatrix}$$
Similarly,
$$Q^{-1} X'\epsilon = \begin{pmatrix} \frac{1}{T}\sum y_{t-1}\epsilon_t \\[4pt] \frac{1}{\sqrt{T}}\sum \Delta y_{t-j}\epsilon_t \end{pmatrix} \Rightarrow \begin{pmatrix} \sigma\omega \int W\,dW \\[4pt] N\!\left(0, E[\tilde{X}'\tilde{X}]\,\sigma^2\right) \end{pmatrix}$$
Thus,
$$T(\hat\rho - 1) \Rightarrow \frac{\sigma \int W\,dW}{\omega \int W^2\,dt}$$
or
$$\frac{\hat\omega}{\hat\sigma}\, T(\hat\rho - 1) = \frac{T(\hat\rho - 1)}{1 - \sum \hat\beta_j} \Rightarrow \frac{\int W\,dW}{\int W^2\,dt}$$
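In practice the ADF t-test is available in standard software; a usage sketch (assuming statsmodels is installed), with the lag length chosen by BIC as in the test comparison below:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(3)
y = np.cumsum(rng.normal(0, 1, 500))   # a random walk, so the null is true

# regression="c" includes a constant; autolag="BIC" chooses the number of lags
t_stat, pvalue, usedlag, nobs, crit, _ = adfuller(y, regression="c", autolag="BIC")
print(f"ADF t-stat = {t_stat:.2f}, p-value = {pvalue:.2f}, lags = {usedlag}")
print("critical values:", crit)        # dict with keys '1%', '5%', '10%'
```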
Other Tests
Sargan-Bhargava (1983) For a non-stationary process, the variance of $y_t$ grows with time, so we could look at:
$$\frac{\frac{1}{T^2}\sum y_{t-1}^2}{\frac{1}{T}\sum \Delta y_{t-1}^2}$$
When $\epsilon_t$ is an m.d.s., this converges to a distribution,
$$\frac{\frac{1}{T^2}\sum y_{t-1}^2}{\frac{1}{T}\sum \Delta y_{t-1}^2} \Rightarrow \int W^2\,dt.$$
For a stationary process, this converges to 0 in probability.
Range (Mandelbrot and coauthors) You could also look at the range of $y_t$; this should blow up for non-stationary processes. For a random walk, we have:
$$\frac{\frac{1}{\sqrt{T}}\left(\max_t y_t - \min_t y_t\right)}{\sqrt{\frac{1}{T}\sum \Delta y_t^2}} \Rightarrow \sup_{\lambda} W(\lambda) - \inf_{\lambda} W(\lambda)$$
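Both statistics are straightforward to compute; a sketch (mine, following the formulas above, with an arbitrary sample size):

```python
import numpy as np

rng = np.random.default_rng(4)
y = np.cumsum(rng.normal(0, 1, 2_000))   # random walk
T, dy = len(y), np.diff(y)

# Sargan-Bhargava ratio: O_p(1) for a random walk, -> 0 for a stationary series
sb = ((y[:-1] ** 2).sum() / T**2) / (dy ** 2).mean()

# Range statistic: (max - min) / sqrt(T * mean(dy^2)), matching the display above
range_stat = (y.max() - y.min()) / np.sqrt(T * (dy ** 2).mean())

print(f"SB = {sb:.3f}, range statistic = {range_stat:.3f}")
```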
Test Comparison
The Sargan-Bhargava (1983) test has optimal asymptotic power against local alternatives, but has bad size
distortions in small samples, especially if errors are negatively autocorrelated. The augmented Dickey-Fuller
test with the BIC used to choose lag-length has overall good size in finite samples.
MIT OpenCourseWare
http://ocw.mit.edu
14.384 Time Series Analysis
Fall 2013
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.