A Moving Average Sieve Bootstrap for Unit Root Tests
by
Patrick Richard
Department of Economics
McGill University
Montréal, Québec, Canada
Département d’économie
Université de Sherbrooke
Sherbrooke, Québec, Canada
ABSTRACT
This paper considers the use of bootstrap methods for the test of the unit root
hypothesis for a time series with a first difference that can be written as a general
linear process admitting an infinite moving average (MA(∞)) representation. The
standard test procedure for such cases is the augmented Dickey-Fuller (ADF) test
introduced by Said and Dickey (1984). However, it is well known that this test’s
true rejection probability under the unit root null hypothesis is often quite different from what asymptotic theory predicts. The bootstrap is a natural solution to
such error in rejection probability (ERP) problems and ADF tests are consequently
often based on block bootstrap or autoregressive (AR) sieve bootstrap distributions. In this paper, we propose the use of moving average (MA) sieve bootstrap
distributions. To justify this, we derive an invariance principle for sieve bootstrap
samples based on MA approximations. Also, we demonstrate that the ADF test
based on this MA sieve bootstrap is consistent. Similar results have been proved
for the AR sieve bootstrap by Park (2002) and Chang and Park (2003). The finite
sample performance of the MA sieve bootstrap ADF test is investigated through
simulations. Our main conclusions are that it is often, though not always, more accurate than the AR sieve bootstrap or the block bootstrap, that it requires smaller
parametric orders to achieve comparable or better accuracy than the AR sieve, and
that it is more robust than the block bootstrap to the choice of approximation
order or block length, as well as slightly more robust to the data generating process.
I am very grateful to my thesis director, Professor Russell Davidson, for suggesting this research and
for his constant guidance and support. I also thank Professors John Galbraith and Victoria Zinde-Walsh
for their very insightful comments.
April 2006.
1 Introduction
We consider the problem of testing the hypothesis of the presence of a unit root
in a time series process yt when its first difference ∆yt is stationary. The simplest
unit root test procedure is the one devised by Dickey and Fuller (1979, 1981) and
consists in testing the hypothesis $H_0: \alpha = 1$ against the alternative $H_1: |\alpha| < 1$ in the
simple regression, often referred to as the Dickey-Fuller (DF) regression:
$$ y_t = \alpha y_{t-1} + e_t \tag{1} $$
where et , which is equal to ∆yt under the null, is assumed to be white noise. Often, the DF regression is augmented with an intercept and a deterministic trend.
Under the null hypothesis, this regression is unbalanced, in the sense that it implies regressing a non-stationary variable on a stationary one. Accordingly, the
asymptotic distribution of the t-statistic of H0 against H1 , henceforth called the
DF statistic, is non-standard and is a function of Brownian motions. This distribution was first derived by Phillips (1987) and its critical values can be found in
Fuller (1996) among others. The major flaw of the DF testing procedure is that it
assumes that the error process et is white noise. In reality, this is seldom true.
Unfortunately, this happens to be a crucial assumption for the derivation of the
asymptotic distribution of the DF test, so that whenever et is not white noise, the
DF test is not appropriate.
Numerous alternative testing procedures have been proposed over the years,
among which two have come to be widely used. The first, which is both the
simplest and the most popular, consists in augmenting the DF regression with
lagged values of ∆yt. Under some fairly weak conditions, this augmented Dickey-Fuller (ADF) test has been shown to be asymptotically consistent; see Said and
Dickey (1984) and Chang and Park (2002). The second, called the Phillips and
Perron (PP) test (Phillips 1987, Phillips and Perron 1988), seeks to modify the DF
test statistic non-parametrically so that it has the DF distribution asymptotically
under the null.
The success of these two tests at providing reliable inferences in finite samples
depends on their ability to correctly model the correlation structure of ∆yt . In some
cases, such as when ∆yt has a correlation structure of infinite order, this is impossible.
It is nevertheless still possible to show that both tests are asymptotically valid,
provided that the lag length (ADF) or the lag truncation (PP) increases at an
appropriate rate as a function of the sample size and that some regularity conditions
are respected. Unfortunately, even though they are asymptotically valid, these
tests are known to suffer from important error in rejection probability (ERP) in
small samples when the series ∆yt is highly correlated.¹
The bootstrap is a natural solution to ERP problems. It does however entail
the estimation of the model under the null. This is a necessary condition for the
bootstrap samples to display the same kind of correlation as the original sample.
This estimation must, of course, be consistent, but also as precise as possible in
small samples in order to yield accurate inferences. Indeed, the key feature of
bootstrap methods for dependent processes is to successfully replicate the original
sample’s correlation. One way to do this is to use a method called the block bootstrap (Künsch, 1989). This is carried out by first estimating the model under the
null and then resampling blocks of residuals. Another way is to approximate the
dependence structure of the residuals under the null by an increasing, finite order
model. This is called the sieve bootstrap. So far, the only sieve bootstrap procedure that has been considered by theoretical econometricians is the autoregressive
sieve bootstrap (see, among others, Bühlmann, 1997). This is mainly because it
is easy to implement and it has proven to perform better than asymptotic tests
in small samples. With modern computer technology and the advancement of estimation theory, other sieve bootstrap methods can now be considered. Among
these are MA and ARMA sieves.
In the present paper, we derive some asymptotic properties of MA sieve bootstrap unit root tests. We also propose to estimate the MA sieve bootstrap models
by the analytical indirect inference method of Galbraith and Zinde-Walsh (1994)
(henceforth, GZW (1994)). This method is computationally much faster than maximum likelihood, which is an asset for the bootstrap test if one needs to compute
several test statistics or perform simulation experiments. Moreover, as argued by
the authors, this method is more robust to under-specification and is more accurate because it minimizes the Hilbert distance between the estimated process and
the true one. We discuss these points below. Our results can however be extended
to maximum likelihood estimates without any problem since both estimators are
asymptotically consistent. In related work, we combine the results of the present
¹ Several analytical solutions have been proposed to circumvent this problem. Among these,
one of the most popular is the non-parametric correction of the PP test proposed by Perron
and Ng (1996). A more recent one is the correction to the ADF test proposed by Johansen
(2005). We do not discuss them here.
paper to those derived by Park (2002) and Chang and Park (2003) (henceforth CP
(2003)) for AR sieve bootstrap unit root tests and propose a sieve bootstrap test
based on ARMA sieves. In the next section, we derive an invariance principle for
the MA sieve bootstrap. In section 3 we establish the asymptotic validity of ADF
tests based on these sieves. Finally, we present some simulation evidence in favor
of the MA sieve bootstrap ADF test in section 4, apply the technique to a simple
test of PPP among a small set of European countries in section 5, and
we conclude and point to further research directions in section 6.
2 The invariance principle
It is a well known fact that standard central limit theory cannot be applied to
unit root tests. The standard tools for the asymptotic analysis of unit root tests
are theorems called invariance principles or functional central limit theorems. Establishing results of this sort for sieve bootstrap procedures is a relatively new
strand in the literature and only two attempts have been made to date. First, Bickel
and Bühlmann (1999) derive a bootstrap functional central limit theorem under
a bracketing condition for the AR sieve bootstrap. Second, Park (2002) derives
an invariance principle for the AR sieve bootstrap. The approach of Park (2002)
is more interesting for our present purpose because his invariance principle establishes the convergence of the bootstrap partial sum process, and most of the
theory on unit root tests is based on such processes. In this section, we derive a
similar invariance principle for processes generated by a MA sieve bootstrap. This
in turn allows us to prove, in section 3, that the MA sieve bootstrap ADF test
follows the DF distribution asymptotically under the I(1) null hypothesis.
2.1 General bootstrap invariance principle
Let $\{\varepsilon_t\}$ be a sequence of iid random variables with finite second moment. Consider
a sample of size $n$ and define the partial sum process:
$$ W_n(t) = \frac{1}{\sigma\sqrt{n}} \sum_{k=1}^{[nt]} \varepsilon_k $$
where $[y]$ denotes the largest integer smaller than or equal to $y$ and $t$ is an index such
that $(j-1)/n \le t < j/n$, $j = 1, 2, \dots, n$. Thus, $W_n(t)$ is a step function built from a
rescaled random walk and, as $n \to \infty$, its jump points become dense on the $[0,1]$
interval. By the classical Donsker's theorem, we know that
$$ W_n \xrightarrow{d} W, $$
where $W$ is the standard Brownian motion. The Skorohod representation theorem
tells us that there exists a probability space $(\Omega, \mathcal{F}, P)$, where $\Omega$ is the space containing all possible outcomes, $\mathcal{F}$ is a $\sigma$-field and $P$ a probability measure, that
supports $W$ and a process $W_n^0$ such that $W_n^0$ has the same distribution as $W_n$ and
$$ W_n^0 \xrightarrow{a.s.} W. \tag{2} $$
Indeed, as demonstrated by Sakhanenko (1980), $W_n^0$ can be chosen so that
$$ \Pr\{\|W_n^0 - W\| \ge \delta\} \le n^{1-r/2} K_r E|\varepsilon_t|^r \tag{3} $$
for any $\delta > 0$ and $r > 2$ such that $E|\varepsilon_t|^r < \infty$, where $K_r$ is a constant that depends
on $r$ only. The result (3) is often referred to as the strong approximation. Because
the invariance principle we seek to establish is a distributional result, we do not
need to distinguish $W_n$ from $W_n^0$. Consequently, because of equations (2) and
(3), we say that $W_n \xrightarrow{a.s.} W$, which is stronger than the convergence in distribution
implied by Donsker's theorem.
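As a purely numerical illustration of the partial sum construction (a minimal sketch; the code and the function name are ours, with NumPy assumed as the only dependency), one can simulate $W_n(t)$ for iid innovations and observe that its endpoint behaves like a standard normal draw, as Donsker's theorem predicts:

```python
import numpy as np

def partial_sum_process(eps, sigma):
    """Step function W_n evaluated at t = k/n, k = 0, 1, ..., n."""
    n = len(eps)
    return np.concatenate([[0.0], np.cumsum(eps)]) / (sigma * np.sqrt(n))

rng = np.random.default_rng(0)
for n in (50, 500, 5000):
    W = partial_sum_process(rng.standard_normal(n), sigma=1.0)
    # W(1) is a normalized sum of n iid terms: approximately N(0, 1).
    print(n, round(W[-1], 3))
```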
Now, suppose that we can obtain an estimate of $\{\varepsilon_t\}_{t=1}^n$, which we will denote as
$\{\hat\varepsilon_t\}_{t=1}^n$, from which we can draw bootstrap samples of size $n$, denoted as $\{\varepsilon_t^\star\}_{t=1}^n$. If
we suppose that $n \to \infty$, then we can build a bootstrap probability space $(\Omega^\star, \mathcal{F}^\star, P^\star)$
which is conditional on the realization of the set of residuals $\{\hat\varepsilon_t\}_{t=1}^\infty$ from which
the bootstrap errors are drawn. What this means is that each bootstrap drawing
$\{\varepsilon_t^\star\}_{t=1}^n$ can be seen as a realization of a random variable defined on $(\Omega^\star, \mathcal{F}^\star, P^\star)$.
In all that follows, the expectation with respect to this space (that is, with
respect to the probability measure $P^\star$) will be denoted by $E^\star$. For example, if
the bootstrap samples are drawn from $\{(\hat\varepsilon_t - \bar\varepsilon_n)\}_{t=1}^n$, then $E^\star \varepsilon_t^\star = 0$ and
$E^\star \varepsilon_t^{\star 2} = \hat\sigma_n^2 = (1/n)\sum_{t=1}^n \hat\varepsilon_t^2$. Also, $\xrightarrow{d^\star}$, $\xrightarrow{p^\star}$ and $\xrightarrow{a.s.^\star}$ will be used to denote convergence in
distribution, in probability and almost sure convergence of the functionals of the
bootstrap samples defined on $(\Omega^\star, \mathcal{F}^\star, P^\star)$. Further, following Park (2002), for any
sequence of bootstrapped statistics $\{X_n^\star\}$ we say that $X_n^\star \xrightarrow{d^\star} X$ a.s. if the conditional
distribution of $\{X_n^\star\}$ weakly converges to that of $X$ a.s. on all sets of $\{\hat\varepsilon_t\}_{t=1}^\infty$. In other
words, if the bootstrap convergence in distribution ($\xrightarrow{d^\star}$) of functionals of bootstrap
samples on $(\Omega^\star, \mathcal{F}^\star, P^\star)$ happens almost surely for all realizations of $\{\hat\varepsilon_t\}_{t=1}^\infty$, then we
write $\xrightarrow{d^\star}$ a.s.
Let $\{\varepsilon_t^\star\}_{t=1}^n$ be a realization from a bootstrap probability space. Define
$$ W_n^\star(t) = \frac{1}{\hat\sigma_n\sqrt{n}} \sum_{k=1}^{[nt]} \varepsilon_k^\star. $$
Once again, by Skorohod's theorem, there exists a probability space on which a
Brownian motion $W^\star$ is supported and on which there also exists a process $W_n^{\star 0}$
which has the same distribution as $W_n^\star$ and such that
$$ \Pr{}^\star\{\|W_n^{\star 0} - W^\star\| \ge \delta\} \le n^{1-r/2} K_r E^\star|\varepsilon_t^\star|^r \tag{4} $$
for $\delta$, $r$ and $K_r$ defined as before. Because $W_n^{\star 0}$ and $W_n^\star$ are distributionally equivalent, we will not distinguish them in what follows. Equation (4) allows us to
state the following theorem, which is also theorem 2.2 in Park (2002).
Theorem (Park 2002, theorem 2.2, p. 473). If $E^\star|\varepsilon_t^\star|^r < \infty$ a.s. and
$$ n^{1-r/2} E^\star|\varepsilon_t^\star|^r \xrightarrow{a.s.} 0 \tag{5} $$
for some $r > 2$, then $W_n^\star \xrightarrow{d^\star} W$ a.s. as $n \to \infty$.

This result comes from the fact that if condition (5) holds, then equation (4)
implies $W_n^\star \xrightarrow{d^\star} W^\star$ a.s. Since the distribution of $W^\star$ is independent of the set of
residuals $\{\hat\varepsilon_t\}_{t=1}^\infty$, we can equivalently say $W_n^\star \xrightarrow{d^\star} W$ a.s. Hence, whenever condition
(5) is met, the invariance principle follows.
2.2 Invariance principle for the MA sieve bootstrap
We now establish the invariance principle for $\hat\varepsilon_t$ obtained from a MA sieve bootstrap. We consider a general linear process:
$$ u_t = \pi(L)\varepsilon_t, \qquad \pi(z) = \sum_{k=0}^{\infty} \pi_k z^k \tag{6} $$
and the $\varepsilon_t$ are iid random variables. Moreover, let $\pi(z)$ and $\varepsilon_t$ satisfy the following
assumptions:

Assumption 1.
(a) The $\varepsilon_t$ are iid random variables such that $E(\varepsilon_t) = 0$ and $E(|\varepsilon_t|^r) < \infty$ for some $r > 4$.
(b) $\pi(z) \neq 0$ for all $|z| \le 1$ and $\sum_{k=0}^{\infty} |k|^s |\pi_k| < \infty$ for some $s \ge 1$.
These are usual assumptions in stationary time series analysis. Notice that (a)
along with the coefficient summability condition ensures that the process is weakly
stationary. On the other hand, the assumption that $\pi(z) \neq 0$ for all $|z| \le 1$ is
necessary for the process to have an AR(∞) form. See CP (2003) for a discussion
of these assumptions.
The MA sieve bootstrap consists in approximating equation (6) by a finite
order MA(q) model:
$$ u_t = \pi_1 \varepsilon_{q,t-1} + \pi_2 \varepsilon_{q,t-2} + \dots + \pi_q \varepsilon_{q,t-q} + \varepsilon_{q,t} \tag{7} $$
where q is a function of the sample size. Our theoretical framework is built on the
assumption that the parameters of the MA sieve bootstrap DGP are estimated by
the analytical indirect inference method of GZW (1994) rather than by maximum
likelihood because we believe that it is more appropriate for the task. There are
several reasons for this. The first is computation speed. Consider that in practice,
one often uses information criteria such as the AIC and BIC to choose the order of
the MA sieve model. These criteria make use of the value of the log-likelihood at
the estimated parameters, which implies that, if we want q to be within a certain
range, say $q_1 \le q \le q_2$, then we must estimate $q_2 - q_1 + 1$ models. With maximum
likelihood, this requires us to maximize the log-likelihood $q_2 - q_1 + 1$ times. With GZW
(1994)'s method, we need only estimate one model, namely an AR(f), from which
we can deduce all at once the parameters of all the candidate MA(q) models. We then
only need to evaluate the log-likelihood function at these parameter values and
choose the best model accordingly. This is obviously much faster than maximum
likelihood.
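To make the computational argument concrete, here is a minimal sketch of the two-step route (our own illustrative code, not GZW's: the helper names fit_ar_ols and gzw_ma_from_ar are hypothetical, and NumPy is assumed). One OLS fit of a long AR(f), followed by the recursion given in the proof of lemma 1, yields every candidate MA(q) at once:

```python
import numpy as np

def fit_ar_ols(u, f):
    """OLS estimates of an AR(f) fitted to the series u."""
    n = len(u)
    X = np.column_stack([u[f - k:n - k] for k in range(1, f + 1)])
    alpha, *_ = np.linalg.lstsq(X, u[f:], rcond=None)
    return alpha

def gzw_ma_from_ar(alpha, q):
    """MA(q) coefficients deduced from AR(f) coefficients through
    pi_k = alpha_k + sum_{j=1}^{k-1} alpha_{k-j} * pi_j."""
    pi = np.zeros(q)
    for k in range(1, q + 1):
        pi[k - 1] = alpha[k - 1] + sum(alpha[k - 1 - j] * pi[j - 1]
                                       for j in range(1, k))
    return pi

# One AR(f) fit delivers all candidate MA(q) models at once.
rng = np.random.default_rng(1)
u = rng.standard_normal(200)
alpha = fit_ar_ols(u, f=12)
candidates = {q: gzw_ma_from_ar(alpha, q) for q in range(1, 8)}
```

Note that, for fixed f, moving from MA(q) to MA(q+1) simply appends one coefficient, which is the robustness property discussed next.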
Second, the simulations of GZW (1994) indicate that their estimator is more
robust to changes in q. For example, suppose that the true model is MA(∞) and
7
that we consider approximating it by either a MA(q) or a MA(q+1) model. If
we use the GZW (1994) method, for fixed f, going from a MA(q) to a MA(q+1)
specification does not alter the values of the first q coefficients. On the other
hand, these q estimates are likely to change substantially if the two models are
estimated by maximum likelihood, because this latter method depends strongly
on the specification. Therefore, bootstrap samples generated from parameters
estimated using the GZW estimator are likely to be more robust to the choice of
q than samples generated using maximum likelihood estimates.
Another reason to prefer GZW (1994)'s estimator is that, according to their
simulations, it tends to yield fewer non-invertible roots. Finally, it allows us to
determine, through simulations, which sieve bootstrap method yields more precise
inference for a given quantity of information (that is, for a given lag length).
Approximating an infinite linear process by a finite model is an old topic in
econometrics. Most of the time, finite f-order autoregressions are used, with f
increasing as a function of the sample size. The classical reference on the subject
is Berk (1974), who proposes to increase f so that $f^3/n \to 0$ as $n \to \infty$ (that is,
$f = o(n^{1/3})$). This assumption is quite restrictive because it does not allow q to
increase at the logarithmic rate, which is what happens if we use AIC or BIC.
Here, we make the following assumption about q and f:
Assumption 2.
$q \to \infty$ and $f \to \infty$ as $n \to \infty$, with $q = o\big((n/\log n)^{1/2}\big)$, $f = o\big((n/\log n)^{1/2}\big)$ and $f > q$.
The reason for this choice is closely related to lemma 3.1 in Park (2002) and the
reader is referred to the discussion following it. Here, we limit ourselves to pointing
out that this rate is consistent with both AIC and BIC, which are commonly used
in practice.
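A sketch of an order selection consistent with this setup might look as follows (again our own illustrative code: ma_residuals uses a simple innovations recursion with zero initial errors, the Gaussian BIC is written up to constants, and the hypothetical helpers from the previous sketch are reused):

```python
import numpy as np

def ma_residuals(u, pi):
    """MA(q) residuals via the innovations recursion, zero initial errors."""
    q, n = len(pi), len(u)
    eps = np.zeros(n)
    for t in range(n):
        eps[t] = u[t] - sum(pi[j] * eps[t - 1 - j] for j in range(min(q, t)))
    return eps

def select_q_bic(u, q_min, q_max, f):
    """One AR(f) fit, then a Gaussian BIC scan over the implied MA(q) models."""
    n = len(u)
    alpha = fit_ar_ols(u, f)  # hypothetical helper sketched above
    bics = {}
    for q in range(q_min, q_max + 1):
        eps = ma_residuals(u, gzw_ma_from_ar(alpha, q))
        bics[q] = n * np.log(eps @ eps / n) + q * np.log(n)
    return min(bics, key=bics.get)
```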
The bootstrap samples are generated from the DGP:
$$ u_t^\star = \hat\pi_{q,1} \varepsilon_{t-1}^\star + \dots + \hat\pi_{q,q} \varepsilon_{t-q}^\star + \varepsilon_t^\star \tag{8} $$
where the $\hat\pi_{q,i}$, $i = 1, 2, \dots, q$, are estimates of the true parameters $\pi_i$, $i = 1, 2, \dots$, and the
$\varepsilon_t^\star$ are drawn from the EDF of $(\hat\varepsilon_t - \bar\varepsilon_n)$, that is, from the EDF of the centered
residuals of the MA(q) sieve. We will now establish an invariance principle for the
partial sum process of $u_t^\star$ by considering its Beveridge-Nelson decomposition and
showing that it converges almost surely to the same limit as the corresponding
partial sum process built with the original $u_t$. First, consider the decomposition
of $u_t$:
$$ u_t = \pi(1)\varepsilon_t + (\tilde u_{t-1} - \tilde u_t) $$
where
$$ \tilde u_t = \sum_{k=0}^{\infty} \tilde\pi_k \varepsilon_{t-k} \qquad \text{and} \qquad \tilde\pi_k = \sum_{i=k+1}^{\infty} \pi_i. $$
Now, consider the partial sum process
$$ V_n(t) = \frac{1}{\sqrt{n}} \sum_{k=1}^{[nt]} u_k; $$
hence,
$$ V_n(t) = \frac{1}{\sqrt{n}} \sum_{k=1}^{[nt]} \pi(1)\varepsilon_k + \frac{1}{\sqrt{n}} \sum_{k=1}^{[nt]} (\tilde u_{k-1} - \tilde u_k) = (\sigma\pi(1)) W_n(t) + \frac{1}{\sqrt{n}} (\tilde u_0 - \tilde u_{[nt]}). $$
Under assumption 1, Phillips and Solo (1992) show that
$$ \max_{1 \le k \le n} |n^{-1/2} \tilde u_k| \xrightarrow{p} 0. $$
Therefore, applying the continuous mapping theorem, we have
$$ V_n(t) \xrightarrow{d} V = (\sigma\pi(1)) W. $$
On the other hand, from equation (8), we see that $u_t^\star$ can be decomposed as
$$ u_t^\star = \hat\pi(1)\varepsilon_t^\star + (\tilde u_{t-1}^\star - \tilde u_t^\star) $$
where
$$ \hat\pi(1) = 1 + \sum_{k=1}^{q} \hat\pi_{q,k}, \qquad \tilde u_t^\star = \sum_{k=1}^{q} \tilde{\hat\pi}_k \varepsilon_{t-k+1}^\star, \qquad \tilde{\hat\pi}_k = \sum_{i=k}^{q} \hat\pi_{q,i}. $$
It therefore follows that we can write:
$$ V_n^\star(t) = \frac{1}{\sqrt{n}} \sum_{k=1}^{[nt]} u_k^\star = (\hat\sigma_n \hat\pi_n(1)) W_n^\star + \frac{1}{\sqrt{n}} (\tilde u_0^\star - \tilde u_{[nt]}^\star). $$
In order to establish the invariance principle, we must show that $V_n^\star \xrightarrow{d^\star} V = (\sigma\pi(1))W$ a.s.
To do this, we need three lemmas. The first one shows that $\hat\sigma_n$ and $\hat\pi_n(1)$
converge almost surely to $\sigma$ and $\pi(1)$. The second demonstrates that $W_n^\star(t) \xrightarrow{d^\star} W$
a.s. Finally, the last one shows that
$$ P^\star\{\max_{1 \le t \le n} |n^{-1/2} \tilde u_t^\star| > \delta\} \xrightarrow{a.s.} 0 \tag{9} $$
for all $\delta > 0$, which is equivalent to saying that
$$ \max_{1 \le k \le n} |n^{-1/2} \tilde u_k^\star| \xrightarrow{p^\star} 0 \quad \text{a.s.} $$
and is therefore the bootstrap equivalent of the result of Phillips and Solo (1992).
These three lemmas are closely related to the results of Park (2002) and their counterparts in this paper are identified.
Lemma 1 (Park 2002, lemma 3.1, p. 476). Let assumptions 1 and 2 hold. Then,
$$ \max_{1 \le k \le q} |\hat\pi_{q,k} - \pi_k| = O\big(c (\log n/n)^{1/2}\big) + o(c q^{-s}) \quad \text{a.s.} \tag{10} $$
for large n, where c is a constant equal to the qth element of the progression
$[1, 2, 4, 8, 16, \dots]$. Also,
$$ \hat\sigma_n^2 = \sigma^2 + O\big(c (\log n/n)^{1/2}\big) + o(c q^{-s}) \quad \text{a.s.} \tag{11} $$
$$ \hat\pi_n(1) = \pi(1) + O\big(q c (\log n/n)^{1/2}\big) + o(c q^{-s}) \quad \text{a.s.} \tag{12} $$
Proof: see the appendix.
Lemma 2 (Park 2002, lemma 3.2, p. 477). Let assumptions 1 and 2 hold.
Then, $E^\star|\varepsilon_t^\star|^r < \infty$ a.s. and $n^{1-r/2} E^\star|\varepsilon_t^\star|^r \xrightarrow{a.s.} 0$.
Proof: see the appendix.
Lemma 2 proves that $W_n^\star(t) \xrightarrow{d^\star} W$ a.s. because it shows that condition (5) holds
almost surely.
Lemma 3 (Park 2002, theorem 3.3, p. 478). Let assumptions 1 and 2 hold.
Then, equation (9) holds.
Proof: see the appendix.
With these three lemmas, the MA sieve bootstrap invariance principle is established.
It is formalized in the next theorem.

Theorem 1. Let assumptions 1 and 2 hold. Then, by lemmas 1, 2 and 3,
$$ V_n^\star \xrightarrow{d^\star} V = (\sigma\pi(1)) W \quad \text{a.s.} $$
3 Consistency of the sieve bootstrap ADF tests
We will now use the results from the preceding section to show that the ADF bootstrap test based on the MA sieve is asymptotically valid. A bootstrap test is said
to be asymptotically valid or consistent if it can be shown that its large sample
distribution under the null converges to the test's asymptotic distribution. Consequently, we will seek to prove that MA sieve bootstrap ADF test statistics follow
the DF distribution asymptotically under the null. Let us begin by considering a
time series yt with the following DGP:
$$ y_t = \alpha y_{t-1} + u_t \tag{13} $$
where $u_t$ is the general linear process described in equation (6). We want to test
the unit root hypothesis against the stationarity alternative (that is, $H_0: \alpha = 1$
against $H_1: |\alpha| < 1$). This test is frequently conducted as a t-test in the so-called
ADF regression, first proposed by Said and Dickey (1984):
$$ y_t = \alpha y_{t-1} + \sum_{k=1}^{p} \alpha_k \Delta y_{t-k} + e_{p,t} \tag{14} $$
where p is chosen as a function of the sample size. A large literature has been
devoted to selecting p; see, for example, Ng and Perron (1995, 2001). CP (2003)
have shown that the test based on this regression asymptotically follows the DF
distribution when $H_0$ is true under very weak conditions, including assumptions 1
and 2. Let $y_t^\star$ denote the bootstrap process generated by the following DGP:
$$ y_t^\star = \sum_{k=1}^{t} u_k^\star $$
and the $u_k^\star$ are generated as in (8). The bootstrap ADF regression equivalent to
regression (14) is
$$ y_t^\star = \alpha y_{t-1}^\star + \sum_{k=1}^{p} \alpha_k \Delta y_{t-k}^\star + e_t. \tag{15} $$
Let us suppose for a moment that $u_t^\star$ has been generated by an AR(p) sieve
bootstrap DGP. Then, it is easy to see that the errors of regression (15) would be
identical to the bootstrap errors driving the bootstrap DGP. This is a convenient
fact which CP (2003) use to prove the consistency of the AR(p) sieve bootstrap
ADF test based on this regression. If however the $y_t^\star$ are generated by the MA(q)
sieve described above, then the errors of regression (15) are not identical to the
bootstrap errors under the null because the AR(p) approximation captures only a
part of the correlation structure present in the MA(q) process. It is nevertheless
possible to show that they will be equivalent asymptotically. This is done in lemma
A1, which can be found in the appendix.
Let $x_{p,t}^\star = (\Delta y_{t-1}^\star, \Delta y_{t-2}^\star, \dots, \Delta y_{t-p}^\star)^\top$ and define:
$$ A_n^\star = \sum_{t=1}^{n} y_{t-1}^\star \varepsilon_t^\star - \Big(\sum_{t=1}^{n} y_{t-1}^\star x_{p,t}^{\star\top}\Big) \Big(\sum_{t=1}^{n} x_{p,t}^\star x_{p,t}^{\star\top}\Big)^{-1} \Big(\sum_{t=1}^{n} x_{p,t}^\star \varepsilon_t^\star\Big) $$
$$ B_n^\star = \sum_{t=1}^{n} y_{t-1}^{\star 2} - \Big(\sum_{t=1}^{n} y_{t-1}^\star x_{p,t}^{\star\top}\Big) \Big(\sum_{t=1}^{n} x_{p,t}^\star x_{p,t}^{\star\top}\Big)^{-1} \Big(\sum_{t=1}^{n} x_{p,t}^\star y_{t-1}^\star\Big) $$
Then, it is easy to see that the t-statistic computed from regression (15) can be
written as (e.g. Davidson and MacKinnon, 2003):
$$ T_n^\star = \frac{\hat\alpha_n^\star - 1}{s(\hat\alpha_n^\star)} + o(1) \quad \text{a.s.} $$
for large n, where $\hat\alpha_n^\star - 1 = A_n^\star B_n^{\star -1}$ and $s(\hat\alpha_n^\star)^2 = \hat\sigma_n^2 B_n^{\star -1}$.
The equality is asymptotic and holds almost surely because the residuals of the
ADF regression are asymptotically equal to the bootstrap errors, as shown in
lemma A1. This also justifies the use of the estimated variance $\hat\sigma_n^2$. Note that in
small samples, it may be preferable to use the estimated variance of the residuals
from the ADF regression, which is indeed what we do in the simulations. We must
now address the issue of how fast p is to increase. For the ADF regression, Said
and Dickey (1984) require that $p = o(n^k)$ for some $0 < k \le 1/3$. As we argued earlier,
these rates do not allow the logarithmic rate. Hence, we state new assumptions
about the rates at which q (the sieve order) and p (the ADF regression order)
increase:
Assumption 2'.
$q = c_q n^k$, $p = c_p n^k$, where $c_q$ and $c_p$ are constants and $1/rs < k < 1/2$.
Assumption 2 can be fitted into this assumption for appropriate values of k.
Also, notice that assumption 2’ imposes a lower bound on the growth rate of both
p and q. This is necessary to obtain almost sure convergence. See CP (2003) for a
weaker assumption that allows for convergence in probability. Several preliminary
and quite technical results are necessary to prove that the bootstrap test based
on the statistic $T_n^\star$ is consistent. To avoid rendering the present exposition more
laborious than it needs to be, we relegate them to the appendix (lemmas A2 to
A5). For now, let it be sufficient to say that they extend to the MA sieve bootstrap
samples some results established by CP (2003) for the AR sieve bootstrap. In turn,
some of CP (2003)’s lemmas are adaptations of identical results in Berk (1974) and
An, Chen and Hannan (1982).
In order to prove that the MA sieve bootstrap ADF test is consistent, we now
prove two results on the elements of $A_n^\star$ and $B_n^\star$. These results are stated in terms
of bootstrap stochastic orders, denoted by $O_p^\star$ and $o_p^\star$, which are defined as follows.
Consider a sequence of non-constant numbers $\{c_n\}$. Then, we say that $X_n^\star = o_p^\star(c_n)$
a.s. or in p if $P^\star\{|X_n^\star/c_n| > \epsilon\} \to 0$ a.s. or in p for any $\epsilon > 0$. Similarly, we say that
$X_n^\star = O_p^\star(c_n)$ if for every $\epsilon > 0$, there exists a constant $M > 0$ such that for all large
n, $P^\star\{|X_n^\star/c_n| > M\} < \epsilon$ a.s. or in p. It follows that if $E^\star|X_n^\star| \to 0$ a.s., then $X_n^\star = o_p^\star(1)$
a.s., and that if $E^\star|X_n^\star| = O(1)$ a.s., then $X_n^\star = O_p^\star(1)$ a.s. See CP (2003), p. 7 for a
slightly more elaborate discussion.
Lemma 4. Under assumptions 1 and 2', we have
$$ \frac{1}{n}\sum_{t=1}^{n} y_{t-1}^\star \varepsilon_t^\star = \hat\pi_n(1) \frac{1}{n}\sum_{t=1}^{n} w_{t-1}^\star \varepsilon_t^\star + o_p^\star(1) \quad \text{a.s.} \tag{16} $$
$$ \frac{1}{n^2}\sum_{t=1}^{n} y_{t-1}^{\star 2} = \hat\pi_n(1)^2 \frac{1}{n^2}\sum_{t=1}^{n} w_{t-1}^{\star 2} + o_p^\star(1) \quad \text{a.s.} \tag{17} $$
Proof: see appendix.
Lemma 5. Under assumptions 1 and 2', we have
$$ \Big(\frac{1}{n}\sum_{t=1}^{n} x_{p,t}^\star x_{p,t}^{\star\top}\Big)^{-1} = O_p^\star(1) \quad \text{a.s.} \tag{18} $$
$$ \sum_{t=1}^{n} x_{p,t}^\star y_{t-1}^\star = O_p^\star(n p^{1/2}) \quad \text{a.s.} \tag{19} $$
$$ \sum_{t=1}^{n} x_{p,t}^\star \varepsilon_t^\star = O_p^\star(n^{1/2} p^{1/2}) \quad \text{a.s.} \tag{20} $$
Proof: see appendix.
Lemma 5 allows us to place an upper bound on the absolute value of the second
term of $A_n^\star$. This is:
$$ \Big| \Big(\sum_{t=1}^{n} y_{t-1}^\star x_{p,t}^{\star\top}\Big) \Big(\sum_{t=1}^{n} x_{p,t}^\star x_{p,t}^{\star\top}\Big)^{-1} \Big(\sum_{t=1}^{n} x_{p,t}^\star \varepsilon_t^\star\Big) \Big| \le \Big| \sum_{t=1}^{n} y_{t-1}^\star x_{p,t}^{\star\top} \Big|\, \Big| \Big(\sum_{t=1}^{n} x_{p,t}^\star x_{p,t}^{\star\top}\Big)^{-1} \Big|\, \Big| \sum_{t=1}^{n} x_{p,t}^\star \varepsilon_t^\star \Big|. $$
But by lemma 5, the right hand side is $O_p^\star(n^{-1}) O_p^\star(n p^{1/2}) O_p^\star(n^{1/2} p^{1/2})$, which gives
$O_p^\star(n^{1/2} p)$. Now, using the results of lemma 4, we have that:
$$ n^{-1} A_n^\star = \hat\pi_n(1) \frac{1}{n}\sum_{t=1}^{n} w_{t-1}^\star \varepsilon_t^\star + o_p^\star(1) \quad \text{a.s.} $$
because the second term of $A_n^\star$ multiplied by $n^{-1}$ is $O_p^\star(n^{-1/2} p)$. We can further
say that
$$ n^{-2} B_n^\star = \hat\pi_n(1)^2 \frac{1}{n^2}\sum_{t=1}^{n} w_{t-1}^{\star 2} + o_p^\star(1) \quad \text{a.s.} $$
because $n^{-2}$ times the second part of $B_n^\star$ is $O_p^\star(n^{-1})$. Therefore, the $T_n^\star$ statistic
can be seen to be:
$$ T_n^\star = \frac{\frac{1}{n}\sum_{t=1}^{n} w_{t-1}^\star \varepsilon_t^\star}{\sigma \big(\frac{1}{n^2}\sum_{t=1}^{n} w_{t-1}^{\star 2}\big)^{1/2}} + o_p^\star(1) \quad \text{a.s.} $$
Recalling that $w_t^\star = \sum_{k=1}^{t} \varepsilon_k^\star$, it is then easy to use the result of section 2 along with
the continuous mapping theorem to deduce that:
$$ \frac{1}{n}\sum_{t=1}^{n} w_{t-1}^\star \varepsilon_t^\star \xrightarrow{d^\star} \int_0^1 W_t \, dW_t \quad \text{a.s.} $$
$$ \frac{1}{n^2}\sum_{t=1}^{n} w_{t-1}^{\star 2} \xrightarrow{d^\star} \int_0^1 W_t^2 \, dt \quad \text{a.s.} $$
under assumptions 1 and 2’. We can therefore state the following theorem.
Theorem 2. Under assumptions 1 and 2', we have
$$ T_n^\star \xrightarrow{d^\star} \frac{\int_0^1 W_t \, dW_t}{\big(\int_0^1 W_t^2 \, dt\big)^{1/2}} \quad \text{a.s.,} $$
which establishes the asymptotic validity of the MA sieve bootstrap ADF test.
4 Simulations
We now present a set of simulations designed to illustrate the extent to which
the proposed MA-sieve bootstrap scheme improves upon the usual AR-based sieve
bootstrap. For this purpose, I(1) data series were generated from the model described by equation (13) with errors generated by a general linear model of the
class described in equation (6) with iid N(0,1) innovations. Most of the existing
literature on the rejection probability of asymptotic or sieve bootstrap unit root
tests is mainly concerned with the case where the unit root process is driven by
stationary and invertible error processes with a moving average root near the
unit circle. This typically results in large ERP and low power of the asymptotic
ADF test. Classical references on this are Schwert (1989) and Agiakoglou and
Newbold (1992). Recently, CP (2003) have shown, through Monte Carlo experiments, that the AR sieve bootstrap allows one to substantially reduce this ERP,
but not to eliminate it altogether. Their simulations however show that the AR
sieve bootstrap loses some of its accuracy as the dependence of the error process
increases.
Most of the time, simulation studies use an invertible MA(1) process to generate
ut. The reason is that it is easy to generate, and the degree of correlation of ut
is easy to control by simply changing the MA parameter. This simple
device is obviously not appropriate in the present context because the MA(1) would
correctly model the first difference process under the null and would therefore not
be a sieve anymore. We therefore used ARMA(p,q) DGPs so that neither an AR
nor an MA sieve may represent a correct specification of the first difference process.
We have also accounted for the possibility of long memory in the first difference
by using ARFIMA(1,d,1) DGPs as well.
4.1 Implementation
The bootstrap tests are computed as follows.

1. Estimate the model under the null. For the AR sieve bootstrap, regress the first difference of $y_t$ on its own lags by OLS and obtain the residuals:
$$ \hat\varepsilon_t = \Delta y_t - \sum_{k=1}^{p} \hat\alpha_{p,k} \Delta y_{t-k} $$
For the MA sieve bootstrap, fit an MA(q) model to the first difference process and obtain the residuals:
$$ \hat\varepsilon_t = \hat\Psi^\top \Delta y_t $$
where $\hat\Psi$ is the triangular GLS transformation matrix for MA(q) processes.

2. Draw bootstrap errors $\varepsilon_t^\star$ from the EDF of the recentered and rescaled residuals $\big(\frac{n}{n-p}\big)^{1/2} \big(\hat\varepsilon_t - \frac{1}{n}\sum_{t=1}^{n} \hat\varepsilon_t\big)$.

3. Generate bootstrap samples of $y_t$. For the AR sieve bootstrap:
$$ \Delta y_t^\star = \sum_{k=1}^{p} \hat\alpha_{p,k} \Delta y_{t-k}^\star + \varepsilon_t^\star. $$
For the MA sieve bootstrap:
$$ \Delta y_t^\star = \sum_{k=1}^{q} \hat\pi_{q,k} \varepsilon_{t-k}^\star + \varepsilon_t^\star. $$
Then, we generate bootstrap samples of $y_t^\star$: $y_t^\star = \sum_{j=1}^{t} \Delta y_j^\star$.

4. Compute the bootstrap ADF statistic $T_{n,i}^\star$ from the bootstrap ADF regression:
$$ y_t^\star = \alpha y_{t-1}^\star + \sum_{k=1}^{p} \alpha_k \Delta y_{t-k}^\star + e_t $$

5. Repeat steps 2 to 4 B times to obtain a set of B ADF statistics $T_{n,i}^\star$, $i = 1, 2, \dots, B$. The p-value of the test is defined as:
$$ P^\star = \frac{1}{B} \sum_{i=1}^{B} I(T_{n,i}^\star < T_n) $$
where $T_n$ is the original ADF test statistic and $I(\cdot)$ is the indicator function, equal to 1 every time the bootstrap statistic is smaller than $T_n$ and 0 otherwise. The null hypothesis is rejected at the 5 percent level whenever $P^\star$ is smaller than 0.05.
In all the simulations reported here, the AR sieve and the ADF regressions are
computed using OLS. Further, all MA parameters were estimated by the analytical
indirect method of GZW (1994). The bootstrap samples are generated recursively,
which requires some starting values for $\Delta y_t^\star$. For the AR sieve bootstrap, we have set the first p values of $\Delta y_t^\star$
equal to the first p values of $\Delta y_t$ and generated samples of n+100+p observations.
Then, we have thrown away the first 100+p values and used the last n to compute
the bootstrap tests. For the MA sieve bootstrap, we have set the first
q errors to 0, generated samples of size n+100+q and thrown away the first 100+q
realizations.
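The steps above can be summarized in a single routine. The sketch below is our own simplified rendition (Python with NumPy; it reuses the hypothetical helpers fit_ar_ols, gzw_ma_from_ar and ma_residuals sketched earlier, omits the intercept, uses the innovations recursion instead of the exact GLS transform discussed next, and takes f = 2q+1 purely for illustration):

```python
import numpy as np

def adf_tstat(y, p):
    """t-statistic for alpha = 1 in the ADF regression, no intercept."""
    dy = np.diff(y)
    X = np.column_stack([y[p:-1]] +
                        [dy[p - k:len(dy) - k] for k in range(1, p + 1)])
    z = y[p + 1:]
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ beta
    s2 = resid @ resid / (len(z) - X.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])
    return (beta[0] - 1.0) / se

def ma_sieve_bootstrap_pvalue(y, q, p, B=999, burn=100, rng=None):
    """Bootstrap p-value of the ADF test, following steps 1 to 5."""
    rng = rng or np.random.default_rng()
    dy = np.diff(y)
    n = len(dy)
    pi = gzw_ma_from_ar(fit_ar_ols(dy, f=2 * q + 1), q)  # step 1
    eps = ma_residuals(dy, pi)
    eps = np.sqrt(n / (n - p)) * (eps - eps.mean())      # recenter, rescale
    t_obs = adf_tstat(y, p)
    stats = np.empty(B)
    for b in range(B):
        e = rng.choice(eps, size=n + burn + q)           # step 2
        e[:q] = 0.0                                      # first q errors at 0
        du = e.copy()                                    # step 3: MA(q) filter
        for j in range(1, q + 1):
            du[j:] += pi[j - 1] * e[:-j]
        ystar = np.concatenate([[0.0], np.cumsum(du[burn + q:])])
        stats[b] = adf_tstat(ystar, p)                   # step 4
    return np.mean(stats < t_obs)                        # step 5
```

Up to the stated simplifications, this routine reproduces the test exactly as steps 1 to 5 prescribe.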
The GLS transformation matrix $\Psi$ is defined as the matrix that satisfies
the equation $\Psi^\top \Psi = \Sigma^{-1}$, where $\Sigma^{-1}$ is the inverse of the process's covariance matrix.
There are several ways to estimate $\Psi$. A popular one is to compute the inverse
of the covariance matrix evaluated at the parameter estimates and to obtain $\hat\Psi$
using a numerical algorithm (the Cholesky decomposition, for example). This exact
method, however, requires the inversion of the $n \times n$ covariance matrix, which is
computationally costly when n is large. A corresponding approximation method
consists in decomposing the covariance matrix of the inverse process (for example,
of a MA(∞) in the case of an AR(1) model). This has the advantage of not
requiring the inversion of a large matrix. For all the simulations reported here,
the transformation matrix was estimated using the exact method proposed by
Galbraith and Zinde-Walsh (1992), who provide exact algebraic expressions for
$\Psi$ rather than for $\Sigma$. This is computationally advantageous because it requires
neither the inversion, the decomposition, nor the storage of an $n \times n$ matrix, since all
calculations can be performed through row by row looping.
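For comparison with the row-by-row approach, the following sketch (our construction, not the Galbraith and Zinde-Walsh (1992) algebra; NumPy and SciPy are assumed) computes exact whitened MA(q) residuals by building the banded Toeplitz covariance at the estimated parameters and Cholesky-factorizing it, which is precisely the costly route the exact algebraic expressions avoid:

```python
import numpy as np
from scipy.linalg import toeplitz, cholesky, solve_triangular

def ma_residuals_exact(u, pi, sigma2=1.0):
    """Exact GLS-type residuals of an MA(q): build Sigma, factor
    Sigma = L L', and return L^{-1} u (so that Psi = L^{-1})."""
    n = len(u)
    theta = np.concatenate([[1.0], pi])        # MA polynomial (1, pi_1..pi_q)
    gamma = np.zeros(n)                        # autocovariances of the MA(q)
    for h in range(len(theta)):
        gamma[h] = sigma2 * theta[h:] @ theta[:len(theta) - h]
    L = cholesky(toeplitz(gamma), lower=True)  # O(n^3) time, O(n^2) storage
    return solve_triangular(L, u, lower=True)
```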
4.2 Error in rejection probability
We now turn to the analysis of the finite sample performance of the MA sieve
bootstrap ADF unit root test in terms of ERP relative to other commonly used
bootstrap methods. In order to do this, we use size discrepancy curves which
plot the ERP of the test with respect to nominal size. This is obviously much
more informative than simply considering the ERP at a given nominal size. See
Davidson and MacKinnon (1998) for a discussion. All the curves presented below
were generated from 2500 samples and 999 bootstrap replications per sample. We
have used four different DGPs for the first difference process, all of which have N(0,1)
errors. Their characteristics are summarized in the table below.
Table 1. Models.

Model       AR      MA      d
ARMA 1     -0.85   -0.85    0
ARMA 2     -0.85   -0.4     0
ARMA 3     -0.4    -0.85    0
ARFIMA 1   -0.85   -0.85    0.45
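As a sketch of how one point of a size-discrepancy curve is obtained (our own illustrative code, reusing the hypothetical ma_sieve_bootstrap_pvalue routine from section 4.1; the ARMA 1 parameterization and its sign convention are assumptions of this sketch):

```python
import numpy as np

def simulate_arma1_null(n, rng, burn=200):
    """I(1) series whose first difference follows the ARMA 1 model of table 1."""
    e = rng.standard_normal(n + burn + 1)
    u = np.zeros(n + burn)
    for t in range(n + burn):
        u[t] = -0.85 * u[t - 1] + e[t + 1] - 0.85 * e[t]
    return np.concatenate([[0.0], np.cumsum(u[burn:])])

def size_discrepancy(level, n=100, q=7, p=7, reps=2500, rng=None):
    """Empirical rejection rate under the null minus the nominal level."""
    rng = rng or np.random.default_rng(42)
    pvals = [ma_sieve_bootstrap_pvalue(simulate_arma1_null(n, rng), q, p,
                                       B=999, rng=rng) for _ in range(reps)]
    return np.mean(np.array(pvals) < level) - level
```

Plotting this quantity over a grid of nominal levels produces curves comparable to those in figures 1 to 6.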
Figures 1 and 2 compare size discrepancy curves of certain AR and MA sieve bootstrap
tests for sample sizes of 25 and 100 respectively, generated from model 1. Overall,
the MA sieve bootstrap tests seem to be somewhat more accurate than the AR
for similar parametric orders. For example, the MA(3) test has virtually no ERP
when N=25, while the AR(3) strongly over-rejects. Similarly, the MA(7) only
slightly over-rejects when N=100 while the AR(7) over-rejects quite severely. A
second feature is that the AR sieve bootstrap test is often able to perform as well
as the MA sieve bootstrap, but only at the cost of a much higher parametric order.
A good example of this is the fact that the AR(15) has a size discrepancy curve
similar to the MA(7). This last point may be of extreme importance in small
samples, as is evident from figure 1.
In figure 3, we compare the MA sieve bootstrap test to the overlapping block
bootstrap in a sample of 25 observations generated from model 1. It appears that
the MA sieve bootstrap procedure has two advantages over the block bootstrap. First,
it is clearly more robust to the choice of model order or block size. Indeed, for
example, choosing a block length of 3 instead of 4 when testing at 5 percent results
in the RP being around 0.202 instead of 0.045. In the same situation, choosing an
MA order of 2 instead of 3 results in the RP being around 0.100 instead of 0.052.
Since the choice of order or block size is done either arbitrarily or using data-based
methods whose reliability is uncertain, this point is of prime importance for
applied work. The second advantage of the MA sieve bootstrap is that it seems to
be more robust to the nominal level of the test than the block. For example, the
figure shows that even though the block bootstrap with block size 4 has almost no
ERP at 5 percent, it over-rejects by almost 3.5 percent at a nominal level of 10
percent, while the MA sieve bootstrap still has only negligible ERP.
Figure 4 shows the effects of the relative strength of the AR and MA parts on
the AR and MA sieve bootstrap ADF tests. We denote by strong AR the curves
obtained using model 2 and by strong MA the ones obtained from model 3. The
simulations reported there indicate that the MA sieve may be more robust than
the AR sieve, in terms of absolute ERP, to the relative strength of the
different parts of the DGP. Indeed, even though it exhibits almost no ERP when
the AR part is stronger (a little over 1 percent at 5 percent nominal level), the
AR sieve has a considerable over-rejection problem when the MA part is stronger
(6 percent ERP at 5 percent nominal level). On the other hand, the MA sieve
has almost the same absolute ERP in both cases (around 2 percent at 5 percent
nominal size).
Figure 5 compares AR and MA sieve bootstrap ADF tests’ discrepancy curves
for yet another DGP, namely the stationary ARFIMA model described in table
1. As in figure 1, we see that the MA sieve bootstrap can achieve better
accuracy than the AR sieve bootstrap. In figure 6, we compare size discrepancy
plots for the MA sieve bootstrap ADF test when the sieve model is estimated
by the analytical indirect inference method of GZW or by MLE. It appears that the
estimation method has very little effect on the small sample performance of the
test.
4.3 Power
We now briefly turn to power considerations. There is no good reason why the
MA sieve bootstrap test would have better power properties than the AR sieve
bootstrap tests. It is however possible that the accuracy gains under the null
observed in the previous subsection come at the cost of a loss of power. Fortunately,
it does not appear to be so. Figure 7 compares size-power curves of the AR and
MA sieve bootstrap ADF tests for nominal sizes from 0 to 0.5 and N=50. They are
based on model 1 and two alternatives are considered: one quite close to the null,
where the parameter α in equation (13) is 0.9, and the other where it is equal to
0.7. According to the simulations, both sieve bootstrap procedures have similar
power characteristics. Similar results were obtained for N=100.
5 Test of PPP
In order to get a feel for how the MA sieve bootstrap performs on real data, we
have carried out a very simple test of purchasing power parity (PPP) on a small
set of European countries, namely, France, Germany, Italy and the Netherlands.
Since the purpose of this section is to illustrate the finite sample performance of
the proposed bootstrap technique, we have chosen a sample containing only 50
monthly observations, from the first month of 1977 to the second month of 1981.
Our method is as simple as possible: under the PPP hypothesis, there should be a
long-term relationship between the price level ratio and the exchange rate of any
two countries. Thus, Engle-Granger cointegration tests should provide a
straightforward way to detect PPP. This implies two steps, both giving us the
opportunity to try out our MA sieve bootstrap: first, we must run unit root tests
on each time series and second, we must test for cointegration by means of unit
root tests performed on the residuals of a regression of one series on the other. The
results of the former are shown in table 2 while the results of the latter are shown
in table 3. The bootstrap tests are based on 9999 replications and are conducted
using models with a constant and no deterministic trend (the addition of such a
trend does not alter the results significantly).
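A minimal sketch of the two-step procedure for one country pair (our own illustrative code: log_fx and log_pratio are hypothetical arrays of the log exchange rate and log price level ratio, the cointegrating regression includes a constant, and the hypothetical bootstrap helper from section 4.1 is reused even though that sketch omits the constant used here):

```python
import numpy as np

def engle_granger_ma_sieve(log_fx, log_pratio, q, p, B=9999, rng=None):
    """Step 1: unit root tests on each series. Step 2: bootstrap unit
    root test on the residuals of the cointegrating regression."""
    pv_fx = ma_sieve_bootstrap_pvalue(log_fx, q, p, B=B, rng=rng)
    pv_pr = ma_sieve_bootstrap_pvalue(log_pratio, q, p, B=B, rng=rng)
    X = np.column_stack([np.ones(len(log_pratio)), log_pratio])
    beta, *_ = np.linalg.lstsq(X, log_fx, rcond=None)
    resid = log_fx - X @ beta
    pv_coint = ma_sieve_bootstrap_pvalue(resid, q, p, B=B, rng=rng)
    return pv_fx, pv_pr, pv_coint
```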
Table 2. Unit root tests.

Series     ADF lags   ADF       AR ord   p-val     MA ord   p-val    p-val diff
E (F-G)    5          -2.2405   5        0.1094    1        0.1188   -0.0094
E (F-I)    2          -1.1563   2        0.2640    1        0.2637    0.0003
E (F-N)    5          -2.4146   5        0.0986*   1        0.1459   -0.0473
E (I-N)    6          -1.5684   6        0.1026    1        0.1092   -0.0066
P (F-G)    4           0.0993   4        0.5912    1        0.6427   -0.0515
P (F-I)    6           2.2813   6        0.9898    1        0.9745    0.0153
P (F-N)    4           2.3903   4        0.9892    1        0.9743    0.0149
P (I-N)    6           3.1417   6        0.9969    4        0.9945    0.0024
Table 2 reads as follows: the first column gives the name of the time series being
tested; the second and fourth columns show how many lags were used in the ADF
regression and in the AR sieve model respectively, while the sixth gives the order of
the MA sieve used; the third reports the value of the ADF test statistic; the fifth
and seventh show the p-values of the AR and MA sieve bootstrap tests respectively;
and the last one compares these two quantities. One, two and three stars denote
rejection of the unit root hypothesis at 10%, 5% and 1%. The autoregressive orders
of the ADF test regressions and of the AR sieve, and the order of the MA sieve,
were all chosen using the AIC selection criterion.
There is little to say about this table, except that the two bootstrap procedures
give similar diagnostics in all cases but one, namely the exchange rate between
France and the Netherlands, where the AR sieve rejects very narrowly at the 10%
nominal size. This appears to be a rather insignificant result and we will not
discuss it any further. A more interesting feature of the numbers shown is that
both methods produce very similar p-values, but that the MA sieve bootstrap test
uses fewer parameters. This confirms what we observed in the simulations presented
in the preceding section.
Table 3. Cointegration tests.

Series   ADF lags   ADF          AR ord   p-val       MA ord   p-val    p-val diff
F-G      3          -4.1137***   3        0.0019***   1        0.3089   -0.3070
F-I      0          -1.3324      0        0.6008      1        0.7268   -0.1260
I-N      1          -1.6471      1        0.4530      1        0.5107   -0.0577
Table 3 reads the same way as table 2. The test of PPP between France and
Germany is particularly interesting. We see that both the asymptotic and AR sieve
bootstrap tests strongly reject the unit root hypothesis, and therefore find evidence
in support of PPP, while the MA sieve test does not even come close to rejecting
the null. This may be an example of a situation where the MA sieve bootstrap
test is capable of correcting the AR sieve’s tendency to over-reject. Notice that
imposing an MA order of either 2 or 3 yielded p-values of 0.1730 and 0.1750, so
we may consider this result quite robust.
6 Conclusions
Using results on the invariance principle of MA sieve bootstrap partial sum processes
derived in section 2, we have shown that the MA sieve bootstrap ADF test is consistent under fairly weak regularity conditions. Most of our analysis is based on
the analytical indirect inference method of Galbraith and Zinde-Walsh (1994) but
certainly also applies to maximum likelihood estimation, a supposition supported
by our simulations.
Our simulations have also shown that the MA sieve bootstrap may have some
advantages in the context of unit root testing over the AR sieve and block bootstrap. In particular, we have found that, for the DGPs and sample sizes considered,
the MA sieve bootstrap ADF tests are likely to require much more parsimonious
models than the AR sieve bootstrap tests to achieve the same, or even better,
accuracy in terms of ERP. Also, the MA sieve tests appear to be more robust to
changes in the DGP in terms of absolute ERP. Finally, the MA sieve bootstrap seems to be more robust
to order choice than the block bootstrap and also more robust with respect to the
choice of nominal level. We find no evidence that these gains are made at the cost
of a power loss.
Further ongoing research includes the generalization of these results to ARMA
sieve bootstrap methods, which naturally follows from combining our results with
those of Park (2002) and Chang and Park (2003). We also consider the use
of bias correction methods to obtain sieve models that are closer to the DGP. This
should allow us to further reduce ERP problems.
Appendix
PROOF OF LEMMA 1.
First, consider the following expression from Park (2002, equation 19):
$$ u_t = \alpha_{f,1} u_{t-1} + \alpha_{f,2} u_{t-2} + \dots + \alpha_{f,f} u_{t-f} + e_{f,t} \tag{21} $$
where the coefficients $\alpha_{f,k}$ are pseudo-true values defined so that the equality
holds and the $e_{f,t}$ are uncorrelated with the $u_{t-k}$, $k = 1, 2, \dots, f$. Using once again
the results of GZW (1994), we define
$$ \pi_{q,1} = \alpha_{f,1} $$
$$ \pi_{q,2} = \alpha_{f,2} + \alpha_{f,1} \pi_{q,1} $$
$$ \vdots $$
$$ \pi_{q,q} = \alpha_{f,q} + \alpha_{f,q-1} \pi_{q,1} + \dots + \alpha_{f,1} \pi_{q,q-1} $$
to be the moving average parameters deduced from the parameters of the AR
process (21). It is shown in Hannan and Kavalieris (1986) that
$$ \max_{1 \le k \le q} |\hat\alpha_{f,k} - \alpha_{f,k}| = O\big((\log n/n)^{1/2}\big) \quad \text{a.s.} $$
where the $\hat\alpha_{f,k}$ are OLS or Yule-Walker estimates (an equivalent result is shown to hold
in probability in Baxter (1962)). Further, they show that
$$ \sum_{k=1}^{f} |\alpha_{f,k} - \alpha_k| \le c \sum_{k=f+1}^{\infty} |\alpha_k| = o(f^{-s}) $$
where c is a constant. This yields part 1 of lemma 3.1 of Park (2002):
$$ \max_{1 \le k \le f} |\hat\alpha_{f,k} - \alpha_k| = O\big((\log n/n)^{1/2}\big) + o(f^{-s}) \quad \text{a.s.} $$
Now, it follows from the equations of GZW (1994) that:
$$ |\hat\pi_{q,k} - \pi_k| = \Big| \sum_{j=0}^{k-1} (\hat\alpha_{f,k-j} \hat\pi_{q,j} - \alpha_{k-j} \pi_j) \Big| \tag{22} $$
Of course, it is possible to express all of the $\hat\pi_{q,k}$ and $\pi_k$ as functions of the $\hat\alpha_{f,k}$ and
$\alpha_k$ respectively. For example, we have $\pi_1 = \alpha_1$, $\pi_2 = \alpha_1^2 + \alpha_2$, $\pi_3 = \alpha_1^3 + \alpha_1\alpha_2 + \alpha_1\alpha_2 + \alpha_3$,
and so forth. Note that, as is made clear from this progression, the expression of
$\pi_k$ has twice as many terms as the expression of $\pi_{k-1}$. It is therefore possible to
rewrite (22) for any k as a function of $\hat\alpha_{f,j}$ and $\alpha_j$, $j = 1, \dots, k$. There is no satisfying
general expression for this, so let us consider, as an illustration, the case of k = 3:
$$ |\hat\pi_{q,3} - \pi_3| = \big|\hat\alpha_{f,1}^3 + \hat\alpha_{f,1}\hat\alpha_{f,2} + \hat\alpha_{f,1}\hat\alpha_{f,2} + \hat\alpha_{f,3} - \alpha_1^3 - \alpha_1\alpha_2 - \alpha_1\alpha_2 - \alpha_3\big|. $$
Using the triangle inequality:
$$ |\hat\pi_{q,3} - \pi_3| \le \big|\hat\alpha_{f,1}^3 - \alpha_1^3\big| + |\hat\alpha_{f,1}\hat\alpha_{f,2} - \alpha_1\alpha_2| + |\hat\alpha_{f,1}\hat\alpha_{f,2} - \alpha_1\alpha_2| + |\hat\alpha_{f,3} - \alpha_3|. $$
Thus, using the results of lemma 3.1 of Park (2002), $|\hat\pi_{q,3} - \pi_3|$ can be at most
$O\big(4(\log n/n)^{1/2}\big) + o(4f^{-s})$ because the number of summands is proportional to a power
of 2. Similarly, $|\hat\pi_{q,4} - \pi_4|$ can be at most $O\big(8(\log n/n)^{1/2}\big) + o(8f^{-s})$, and so forth.
Generalizing this and considering that we have assumed that the order of the MA
model, q, increases at the same rate as the order of the AR approximation (f), so
that we can replace f by q in the preceding expressions, yields the stated result.
The other two results follow in a similar manner.
PROOF OF LEMMA 2.
First, note that $n^{1-r/2} E^\star|\varepsilon_t^\star|^r = n^{1-r/2} \frac{1}{n}\sum_{t=1}^{n} \big| \hat\varepsilon_{q,t} - \frac{1}{n}\sum_{t=1}^{n} \hat\varepsilon_{q,t} \big|^r$ because the
bootstrap errors are drawn from the series of recentered residuals from the MA(q)
model. Therefore, what must be shown is that
$$ n^{1-r/2} \left( \frac{1}{n}\sum_{t=1}^{n} \Big| \hat\varepsilon_{q,t} - \frac{1}{n}\sum_{t=1}^{n} \hat\varepsilon_{q,t} \Big|^r \right) \xrightarrow{a.s.} 0 $$
as $n \to \infty$. If we add and subtract $\varepsilon_t$ and $\varepsilon_{q,t}$ (which was defined in equation (7))
inside the absolute value operator, we obtain:
$$ \frac{1}{n}\sum_{t=1}^{n} \Big| \hat\varepsilon_{q,t} - \frac{1}{n}\sum_{t=1}^{n} \hat\varepsilon_{q,t} \Big|^r \le c\,(A_n + B_n + C_n + D_n) $$
where c is a constant and
$$ A_n = \frac{1}{n}\sum_{t=1}^{n} |\varepsilon_t|^r, \qquad B_n = \frac{1}{n}\sum_{t=1}^{n} |\varepsilon_{q,t} - \varepsilon_t|^r, $$
$$ C_n = \frac{1}{n}\sum_{t=1}^{n} |\hat\varepsilon_{q,t} - \varepsilon_{q,t}|^r, \qquad D_n = \Big| \frac{1}{n}\sum_{t=1}^{n} \hat\varepsilon_{q,t} \Big|^r. $$
To get the desired result, one must show that $n^{1-r/2}$ times $A_n$, $B_n$, $C_n$ and $D_n$
each go to 0 almost surely.

1. $n^{1-r/2} A_n \xrightarrow{a.s.} 0$. This holds by the strong law of large numbers,
which states that $A_n \xrightarrow{a.s.} E|\varepsilon_t|^r$, which has been assumed to be
finite. Since $r > 4$, $1 - r/2 < -1$, from which the result follows.
2. $n^{1-r/2} B_n \xrightarrow{a.s.} 0$. This is shown by proving that
$$ E|\varepsilon_{q,t} - \varepsilon_t|^r = o(q^{-rs}) \tag{23} $$
holds uniformly in t, where s is as specified in assumption 1,
part (b). We begin by recalling that from equation (7) we have
$$ \varepsilon_{q,t} = u_t - \sum_{k=1}^{q} \pi_k \varepsilon_{q,t-k}. $$
Writing this using an infinite AR form:
$$ \varepsilon_{q,t} = u_t - \sum_{k=1}^{\infty} \ddot\alpha_k u_{t-k} $$
where the parameters $\ddot\alpha_k$ are functions of the first q true parameters $\pi_k$ in the usual manner (see proof of lemma 1). We also
have:
$$ \varepsilon_t = u_t - \sum_{k=1}^{\infty} \pi_k \varepsilon_{t-k}, $$
which we also write in AR(∞) form:
$$ \varepsilon_t = u_t - \sum_{k=1}^{\infty} \alpha_k u_{t-k}. $$
Evidently, $\alpha_k = \ddot\alpha_k$ for all $k = 1, \dots, q$. Subtracting the second of
these two expressions from the first, we obtain:
$$ \varepsilon_{q,t} - \varepsilon_t = \sum_{k=q+1}^{\infty} (\alpha_k - \ddot\alpha_k) u_{t-k} \tag{24} $$
Using Minkowski's inequality, the triangle inequality and the
stationarity of $u_t$,
$$ E|\varepsilon_{q,t} - \varepsilon_t|^r \le E|u_t|^r \left( \sum_{k=q+1}^{\infty} |\alpha_k - \ddot\alpha_k| \right)^r \tag{25} $$
The second element of the right hand side can be rewritten as
$$ \sum_{k=q+1}^{\infty} \Big( \sum_{\ell=1}^{k} \pi_\ell \alpha_{k-\ell} - \pi_k \Big) - \sum_{k=q+1}^{\infty} \Big( \sum_{\ell=1}^{k-q} \pi_\ell \alpha_{k-\ell} - \pi_k \Big), $$
that is, we have a sequence of the sort $-(\pi_{q+1} + \pi_{q+2} + \pi_1\alpha_{q+1} + \pi_{q+3} + \pi_1\alpha_{q+2} + \pi_2\alpha_{q+1} + \dots)$.
Then, it follows from assumptions 1(b) and 2 that
$$ E|\varepsilon_{q,t} - \varepsilon_t|^r = o(q^{-rs}) \tag{26} $$
because $E|u_t|^r < \infty$ (see Park 2002, equation 25, p. 483). The
equality (26) together with the inequality (25) imply that
equation (23) holds. In turn, equation (23) implies that
$n^{1-r/2} B_n \xrightarrow{a.s.} 0$ is true, provided that q increases at a proper
rate, such as the one specified in assumption 2.
3. $n^{1-r/2} C_n \xrightarrow{a.s.} 0$. We start from the AR(∞) expression for the residuals to be
resampled:
$$ \hat\varepsilon_{q,t} = u_t - \sum_{k=1}^{\infty} \hat\alpha_{q,k} u_{t-k} \tag{27} $$
where $\hat\alpha_{q,k}$ denotes the parameters corresponding to the estimated MA(q) parameters $\hat\pi_{q,k}$. Then, adding and subtracting
$\sum_{k=1}^{\infty} \alpha_{q,k} u_{t-k}$, where the $\alpha_{q,k}$ correspond to the pseudo-true parameters $\pi_{q,k}$ defined in the
proof of lemma 1, and using once more the AR(∞) form of (7),
equation (27) becomes:
$$ \hat\varepsilon_{q,t} = \varepsilon_{q,t} - \sum_{k=1}^{\infty} (\hat\alpha_{q,k} - \alpha_{q,k}) u_{t-k} - \sum_{k=1}^{\infty} (\alpha_{q,k} - \ddot\alpha_k) u_{t-k} \tag{28} $$
It then follows that
$$ |\hat\varepsilon_{q,t} - \varepsilon_{q,t}|^r \le c \left( \Big| \sum_{k=1}^{\infty} (\hat\alpha_{q,k} - \alpha_{q,k}) u_{t-k} \Big|^r + \Big| \sum_{k=1}^{\infty} (\alpha_{q,k} - \ddot\alpha_k) u_{t-k} \Big|^r \right) $$
for $c = 2^{r-1}$. Let us define
$$ C_{1n} = \frac{1}{n}\sum_{t=1}^{n} \Big| \sum_{k=1}^{\infty} (\hat\alpha_{q,k} - \alpha_{q,k}) u_{t-k} \Big|^r, \qquad C_{2n} = \frac{1}{n}\sum_{t=1}^{n} \Big| \sum_{k=1}^{\infty} (\alpha_{q,k} - \ddot\alpha_k) u_{t-k} \Big|^r; $$
then showing that $n^{1-r/2} C_{1n} \xrightarrow{a.s.} 0$ and $n^{1-r/2} C_{2n} \xrightarrow{a.s.} 0$ will give us
our result. First, let us note that $C_{1n}$ is majorized by:
$$ C_{1n} \le \Big( \max_{1 \le k < \infty} |\hat\alpha_{q,k} - \alpha_{q,k}|^r \Big) \frac{1}{n}\sum_{t=1}^{n} \sum_{k=1}^{\infty} |u_{t-k}|^r \tag{29} $$
By lemma 1 and equation (20) of Park (2002), we have
$$ \max_{1 \le k < \infty} |\hat\alpha_{q,k} - \alpha_{q,k}| = O\big(c (\log n/n)^{1/2}\big) \quad \text{a.s.,} $$
that is, the estimated parameters are almost surely consistent
estimates of the pseudo-true parameters. Therefore, using yet
again a triangle inequality on (29) and truncating it, without
loss of generality because it applies to the whole sequence from
1 to ∞, at some order p which increases according to the rate
specified in assumption 2 (which is necessary if we want to
apply the result to a finite sample size),
$$ C_{1n} \le \Big( \max_{1 \le k \le p} |\hat\alpha_{q,k} - \alpha_{q,k}|^r \Big) \frac{p}{n} \left( \sum_{t=0}^{n-1} |u_t|^r + \sum_{t=1-p}^{-1} |u_t|^r \right) \tag{30} $$
which can easily be seen to go to 0 because of the results of
lemma 1 and equation 25 in Park (2002). See also Park (2002),
p. 484.
On the other hand, if we apply Minkowski's inequality to the
absolute value part of $C_{2n}$, we obtain
$$ E\Big| \sum_{k=1}^{\infty} (\alpha_{q,k} - \ddot\alpha_k) u_{t-k} \Big|^r \le E|u_t|^r \left( \sum_{k=1}^{\infty} |\alpha_{q,k} - \ddot\alpha_k| \right)^r \tag{31} $$
of which the right hand side goes to 0 by the boundedness of
$E|u_t|^r$, the definition of the $\ddot\alpha_k$, lemma 1 and equation (21) of
Park (2002), where it is shown that $\sum_{k=1}^{p} |\alpha_{p,k} - \alpha_k| = o(p^{-s})$ for
some $p \to \infty$, which implies a similar result between the $\pi_{q,k}$ and
the $\pi_k$, which in turn implies a similar result between the $\alpha_{q,k}$
and the $\ddot\alpha_k$. This proves the result.
4. $n^{1-r/2} D_n \xrightarrow{a.s.} 0$. In order to prove this result, we show that
$$ \frac{1}{n}\sum_{t=1}^{n} \hat\varepsilon_{q,t} = \frac{1}{n}\sum_{t=1}^{n} \varepsilon_{q,t} + o(1) \ \text{a.s.} = \frac{1}{n}\sum_{t=1}^{n} \varepsilon_t + o(1) \ \text{a.s.} $$
Recalling equations (24) and (28), this will hold if
$$ \frac{1}{n}\sum_{t=1}^{n} \sum_{k=q+1}^{\infty} (\alpha_k - \ddot\alpha_k) u_{t-k} \xrightarrow{a.s.} 0 \tag{32} $$
$$ \frac{1}{n}\sum_{t=1}^{n} \sum_{k=1}^{\infty} (\alpha_{q,k} - \ddot\alpha_k) u_{t-k} \xrightarrow{a.s.} 0 \tag{33} $$
$$ \frac{1}{n}\sum_{t=1}^{n} \sum_{k=1}^{\infty} (\hat\alpha_{q,k} - \alpha_{q,k}) u_{t-k} \xrightarrow{a.s.} 0 \tag{34} $$
where the first equation serves to prove the second asymptotic
equality and the other two serve to prove the first asymptotic
equality. Proving these three results requires some work. Just like
Park (2002), p. 485, let us define
$$ S_n(i,j) = \sum_{t=1}^{n} \varepsilon_{t-i-j} \qquad \text{and} \qquad T_n(i) = \sum_{t=1}^{n} u_{t-i}, $$
so that
$$ T_n(i) = \sum_{j=0}^{\infty} \pi_j S_n(i,j), $$
and remark that, by Doob's inequality,
$$ |\max_{i \le m \le n} S_m(i,j)|^r \le z |S_n|^r $$
where $z = 1/(1 - \frac{1}{r})$. Taking expectations and applying Burkholder's inequality,
$$ E(\max_{1 \le m \le n} |S_m(i,j)|^r) \le c_1 z E\Big| \sum_{t=1}^{n} \varepsilon_t^2 \Big|^{r/2} $$
where $c_1$ is a constant depending only on r. By the law of large
numbers, the right hand side is equal to $c_1 z (n\sigma^2)^{r/2} = c_1 z\, n^{r/2} \sigma^r$.
Thus, we have
$$ E(\max_{1 \le m \le n} |S_m(i,j)|^r) \le c n^{r/2} $$
uniformly over i and j, where $c = c_1 z \sigma^r$. Define
$$ L_n = \sum_{k=q+1}^{\infty} (\alpha_k - \ddot\alpha_k) \sum_{t=1}^{n} u_{t-k}. $$
It must therefore follow that
$$ [E(\max_{1 \le m \le n} |L_m|^r)]^{1/r} \le \sum_{k=q+1}^{\infty} |\alpha_k - \ddot\alpha_k|\, [E(\max_{1 \le m \le n} |T_m(k)|^r)]^{1/r} \le \sum_{k=q+1}^{\infty} |\alpha_k - \ddot\alpha_k|\, c^{1/r} n^{1/2}, $$
but
$$ \sum_{k=q+1}^{\infty} |\alpha_k - \ddot\alpha_k| = o(q^{-s}) $$
by assumption 1(b) and the construction of the $\ddot\alpha_k$ (recall part 2
of the present proof). Thus,
$$ E[\max_{1 \le m \le n} |L_m|^r] \le c q^{-rs} n^{r/2}. $$
It then follows from the result in Moricz (1976, theorem 6) that, for any $\delta > 0$,
$$ L_n = o\big( q^{-s} n^{1/2} (\log n)^{1/r} (\log\log n)^{(1+\delta)/r} \big) = o(n) \quad \text{a.s.;} $$
this last equation proves (32).
Now, if we let
$$ M_n = \sum_{k=1}^{\infty} (\alpha_{q,k} - \ddot\alpha_k) \sum_{t=1}^{n} u_{t-k}, $$
then we find by the same device that
$$ [E(\max_{1 \le m \le n} |M_m|^r)]^{1/r} \le \sum_{k=1}^{\infty} |\alpha_{q,k} - \ddot\alpha_k|\, [E(\max_{1 \le m \le n} |T_m(k)|^r)]^{1/r}, $$
the right hand side of which is smaller than or equal to $c^{1/r} q^{-s} n^{1/2}$;
see the discussion under equation (31). Consequently, using
Moricz's result once again, we have $M_n = o(n)$ a.s., and equation
(33) is demonstrated. Finally, define
$$ N_n = \sum_{k=1}^{\infty} (\hat\alpha_{q,k} - \alpha_{q,k}) \sum_{t=1}^{n} u_{t-k}; $$
further, let
$$ Q_n = \sum_{k=1}^{\infty} \sum_{t=1}^{n} u_{t-k}. $$
Then, the largest element of $N_n$ is $Q_n \max_{1 \le k < \infty} |\hat\alpha_{q,k} - \alpha_{q,k}|$. Further, using again Doob's and Burkholder's inequalities along
with assumption 1(b) and the result underneath (29), we have
$$ E[\max_{1 \le m \le n} |Q_m|^r] \le c q^r n^{r/2}. $$
Therefore, we can again deduce that, for any $\delta > 0$,
$$ Q_n = o\big( q n^{1/2} (\log n)^{1/r} (\log\log n)^{(1+\delta)/r} \big) \quad \text{a.s.} $$
and
$$ N_n = O\big( (\log n/n)^{1/2} \big) Q_n = o\big( q (\log n)^{(r+2)/2r} (\log\log n)^{(1+\delta)/r} \big) = o(n). $$
Hence, equation (34) is proved.
The proof of the lemma is complete. Consequently, the conditions required for
theorem 2.2 of Park (2002) are satisfied and we may conclude that $W_n^\star \xrightarrow{d^\star} W$ a.s.
•
PROOF OF LEMMA 3.
We begin by noting that
$$ P^\star\Big[\max_{1 \le t \le n} \big|n^{-1/2} \tilde u_t^\star\big| > \delta\Big] \le \sum_{t=1}^{n} P^\star\Big[\big|n^{-1/2} \tilde u_t^\star\big| > \delta\Big] = n P^\star\Big[\big|n^{-1/2} \tilde u_t^\star\big| > \delta\Big] \le (1/\delta^r)\, n^{1-r/2} E^\star|\tilde u_t^\star|^r $$
where the first inequality is trivial, the second equality follows from the invertibility
of $\tilde u_t^\star$ conditional on the realization of $\{\hat\varepsilon_{q,t}\}$ (which implies that the AR(∞) form
of the MA(q) sieve is stationary) and the last inequality is an application of the
Tchebyshev inequality. Recall that
$$ \tilde u_t^\star = \sum_{k=1}^{q} \Big( \sum_{i=k}^{q} \hat\pi_{q,i} \Big) \varepsilon_{t-k+1}^\star. $$
Then, by Minkowski's inequality and assumption 1, we have:
$$ E^\star|\tilde u_t^\star|^r \le \Big( \sum_{k=1}^{q} k\,|\hat\pi_{q,k}| \Big)^r E^\star|\varepsilon_t^\star|^r $$
But by lemma 1, the estimates $\hat\pi_{q,i}$ are consistent for the $\pi_i$. Hence, by assumption
1, the first factor must be bounded as $n \to \infty$. Also, we have shown in lemma 2 that
$n^{1-r/2} E^\star|\varepsilon_t^\star|^r \xrightarrow{a.s.} 0$. The result thus follows.
•
PROOF OF THEOREM 1.
The result follows directly from combining lemmas 1 to 3. Note that, had we proved
lemmas 1 to 3 in probability, the result of theorem 1 would also hold in probability.
•
Lemma A1. Let assumptions 1 and 2 hold. Then, the errors from regression (15)
are asymptotically equal to the bootstrap error terms, that is, $e_t = \varepsilon_t^\star + o(1)$ a.s.
PROOF OF LEMMA A1.
Let us first rewrite the ADF regression (15) under the null as follows:
$$ \Delta y_t^\star = \sum_{k=1}^{p} \alpha_{p,k} \Big( \sum_{j=k+1}^{k+q} \hat\pi_{q,j-k} \varepsilon_{t-j}^\star + \varepsilon_{t-k}^\star \Big) + e_t $$
where we have substituted the bootstrap DGP for $\Delta y_{t-k}^\star$. This can be rewritten as
$$ \Delta y_t^\star = \sum_{i=1}^{p} \sum_{j=0}^{q} \alpha_{p,i} \hat\pi_{q,j} \varepsilon_{t-i-j}^\star + e_t \tag{35} $$
where $\hat\pi_{q,0}$ is constrained to be equal to 1 as usual. Then, from equations (35) and
(8),
$$ e_t = \sum_{j=1}^{q} \hat\pi_{q,j} \varepsilon_{t-j}^\star + \varepsilon_t^\star - \sum_{i=1}^{p} \sum_{j=0}^{q} \alpha_{p,i} \hat\pi_{q,j} \varepsilon_{t-i-j}^\star. $$
Let us rewrite this result as follows:
$$ e_t = A_t + B_t + \varepsilon_t^\star $$
where
$$ A_t = (\hat\pi_{q,1} - \alpha_{p,1})\varepsilon_{t-1}^\star + (\hat\pi_{q,2} - \alpha_{p,1}\hat\pi_{q,1} - \alpha_{p,2})\varepsilon_{t-2}^\star + \dots + (\hat\pi_{q,q} - \alpha_{p,1}\hat\pi_{q,q-1} - \dots - \alpha_{p,q})\varepsilon_{t-q}^\star $$
and
$$ B_t = \sum_{j=q+1}^{p+q} \varepsilon_{t-j}^\star \Big( \sum_{i=0}^{q} \hat\pi_{q,i} \alpha_{p,j-i} \Big) $$
where $\hat\pi_{q,0} = 1$ and $\alpha_{p,i} = 0$ whenever $i > p$. First, we note that the coefficients appearing in $A_t$ are the formulas linking the parameters of an AR(p) regression to those
of a MA(q) process (see Galbraith and Zinde-Walsh, 1994). Hence, no matter how
the $\hat\pi_{q,j}$ are estimated, these coefficients are all equal to zero asymptotically under assumption 2. Since we assume that $q \to \infty$, we may conclude that $A_t \to 0$. Further,
by lemma 1 and lemma 3.1 of Park (2002), this convergence is almost sure.
On the other hand, taking $B_t$ and applying Minkowski's inequality to it:
$$ E^\star|B_t|^r \le E^\star|\varepsilon_t^\star|^r \Big( \sum_{j=q+1}^{p+q} \sum_{i=0}^{q} |\hat\pi_{q,i} \alpha_{p,j-i}| \Big)^r. $$
But for each j, $\sum_{i=0}^{q} \pi_{q,i} \alpha_{p,j-i}$ is equal to $-\alpha_{p,j}$, the jth parameter of the approximating
autoregression (see Galbraith and Zinde-Walsh, 1994). Hence, we can write:
$$ E^\star|B_t|^r \le E^\star|\varepsilon_t^\star|^r \Big( \sum_{j=q+1}^{p+q} |\alpha_{p,j}| \Big)^r \quad \text{a.s.} $$
by lemma 1. But, as p and q go to infinity with $p > q$, $\sum_{j=q+1}^{p+q} |\alpha_{p,j}|^r = o(q^{-rs})$ a.s.
under assumption 1.
•
Lemma A2 (CP lemma A1). Under assumptions 1 and 2', we have $\sigma_\star^2 \xrightarrow{a.s.} \sigma^2$ and
$\Gamma_0^\star \xrightarrow{a.s.} \Gamma_0$ as $n \to \infty$, where $E^\star|\varepsilon_t^\star|^2 = \sigma_\star^2$ and $E^\star|u_t^\star|^2 = \Gamma_0^\star$.
PROOF OF LEMMA A2.
Consider the bootstrap DGP (8) once more:
$$ u_t^\star = \sum_{k=1}^{q} \hat\pi_{q,k} \varepsilon_{t-k}^\star + \varepsilon_t^\star $$
Under assumption 1 and lemma 1, this process admits an AR(∞) representation.
Let this be:
$$ u_t^\star + \sum_{k=1}^{\infty} \hat\psi_{q,k} u_{t-k}^\star = \varepsilon_t^\star \tag{36} $$
where we write the $\hat\psi_{q,k}$ parameters with a hat and a subscript q to emphasize that
they come from the estimation of a finite order MA(q) model. We can rewrite
equation (36) as follows:
$$ u_t^\star = -\sum_{k=1}^{\infty} \hat\psi_{q,k} u_{t-k}^\star + \varepsilon_t^\star. \tag{37} $$
Multiplying by $u_t^\star$ and taking expectations under the bootstrap DGP, we obtain
$$ \Gamma_0^\star = -\sum_{k=1}^{\infty} \hat\psi_{q,k} \Gamma_k^\star + \sigma_\star^2. $$
Dividing both sides by $\Gamma_0^\star$ and rearranging,
$$ \Gamma_0^\star = \frac{\sigma_\star^2}{1 + \sum_{k=1}^{\infty} \hat\psi_{q,k} \rho_k^\star} \tag{38} $$
where the $\rho_k^\star$ are the autocorrelations of the bootstrap process. Note that these are
functions of the parameters $\hat\psi_{q,k}$ and that it can easily be shown that they satisfy
the homogeneous system of linear difference equations described by:
$$ \rho_h^\star + \sum_{k=1}^{\infty} \hat\psi_{q,k} \rho_{h-k}^\star = 0 $$
for all h > 0. Thus, the autocorrelations ρ?h are implicitly defined as functions of
the ψ̂q,k . On the other hand, let us now consider the model:
ut =
q
X
π̂q,k ε̂q,t−k + ε̂q,t
k=1
which is simply the result of the computation of the parameters estimates π̂q,k .
This, of course, also has an AR(∞) representation:
ut = −
∞
X
ψ̂q,k ut−k + ε̂q,t .
k=1
where the parameters ψ̂q,k are exactly the same as in equation ( 37). Applying the
same steps to this new expression, we obtain:
Γ0,n =
σ̂n2
P∞
1 + k=1 ψ̂q,k ρk,n
(39)
where Γ0,n , is the sample autocovariance of ut when we have n observations, σ̂n2 =
Pn
(1/n) i=1 ε̂2t and ρk,n is the kth autocorrelation of ut , which are functions of the
parameters π̂q,k that generated ut . Since the autocorrelation parameters are the
same in equations ( 38) and ( 39), we can write:
Γ?0 = (σ?2 /σ̂n2 )Γ0,n .
The strong law of large numbers implies that Γ_{0,n} → Γ_0 a.s. Therefore, we only need to show that σ*²/σ̂²_n → 1 a.s. to obtain the second result (that is, to show that Γ*_0 → Γ_0 a.s.). By the consistency results in lemma 1, we have that σ̂²_n → σ² a.s. Also, recall that the ε*_t are drawn from the EDF of (ε̂_{q,t} − (1/n)Σ_{t=1}^{n} ε̂_{q,t}). Therefore, σ*² is defined as:
\[
\sigma^{*2} = \frac{1}{n}\sum_{t=1}^{n}\left(\hat{\varepsilon}_{q,t} - \frac{1}{n}\sum_{t=1}^{n}\hat{\varepsilon}_{q,t}\right)^2.
\]
It follows that:
\[
\sigma^{*2} = \hat{\sigma}^2_n - \left(\frac{1}{n}\sum_{t=1}^{n}\hat{\varepsilon}_{q,t}\right)^2. \qquad (40)
\]
But we have shown in lemma 2 that ((1/n)Σ_{t=1}^{n} ε̂_{q,t})² = o(1) a.s. (to see this, take the result for n^{1−r/2}D_n with r = 2). Therefore, we have that σ*² − σ̂²_n → 0 a.s., and thus σ*²/σ̂²_n → 1 a.s. It therefore follows that Γ*_0 → Γ_0 a.s. On the other hand, σ*² − σ̂²_n → 0 a.s. together with σ̂²_n → σ² a.s. implies σ*² → σ² a.s.
•
Lemma A3 (CP lemma A2; Berk, theorem 1, p. 493) Let f and f* be the spectral densities of u_t and u*_t respectively. Then, under assumptions 1 and 2',
\[
\sup_\lambda |f^*(\lambda) - f(\lambda)| = o(1) \quad \text{a.s.}
\]
for large n. Also, letting Γ_k and Γ*_k be the autocovariance functions of u_t and u*_t respectively, we have
\[
\sum_{k=-\infty}^{\infty} \Gamma^*_k = \sum_{k=-\infty}^{\infty} \Gamma_k + o(1) \quad \text{a.s.}
\]
for large n. Notice that the results of CP (2003) and ours are almost sure whereas Berk's is only in probability.
PROOF OF LEMMA A3.
The spectral density of the bootstrap data is
\[
f^*(\lambda) = \frac{\sigma^{*2}}{2\pi}\left|1 + \sum_{k=1}^{q} \hat{\pi}_{q,k}\,e^{ik\lambda}\right|^2.
\]
Further, let us define
\[
\hat{f}(\lambda) = \frac{\hat{\sigma}^2_n}{2\pi}\left|1 + \sum_{k=1}^{q} \hat{\pi}_{q,k}\,e^{ik\lambda}\right|^2.
\]
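Numerically, these densities and their sup-distance over a frequency grid can be evaluated as follows (a sketch; the grid size and all parameter values are illustrative assumptions, not quantities from the paper):

```python
import numpy as np

def ma_spectrum(sigma2, pi, lam):
    # (sigma^2 / 2 pi) |1 + sum_k pi_k exp(i k lam)|^2 on a grid of frequencies
    k = np.arange(1, len(pi) + 1)
    poly = 1.0 + np.exp(1j * np.outer(lam, k)) @ pi
    return sigma2 / (2 * np.pi) * np.abs(poly) ** 2

lam = np.linspace(-np.pi, np.pi, 2048)
pi_hat = np.array([0.5, 0.2])              # illustrative fitted MA(2) coefficients
f_star = ma_spectrum(0.97, pi_hat, lam)    # sigma*^2 (made-up value)
f_hat = ma_spectrum(1.00, pi_hat, lam)     # sigma_hat_n^2 (made-up value)
print(np.max(np.abs(f_star - f_hat)))      # sup-norm distance on the grid
```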
Recall that
\[
\sigma^{*2} = \frac{1}{n}\sum_{t=1}^{n}\left[\hat{\varepsilon}_{q,t} - \frac{1}{n}\sum_{t=1}^{n}\hat{\varepsilon}_{q,t}\right]^2.
\]
From lemma 2 (proof of the 4th part) and lemma A2 (equation 40), we have
\[
\sigma^{*2} = \frac{1}{n}\sum_{t=1}^{n}\hat{\varepsilon}^2_{q,t} + o_p(1).
\]
Thus,
\[
f^*(\lambda) = \hat{f}(\lambda) + o_p(1).
\]
Therefore, the desired result follows if we show that
\[
\sup_\lambda |\hat{f}(\lambda) - f(\lambda)| = o(1) \quad \text{a.s.}
\]
Now, denote by f_n(λ) the spectral density function evaluated at the pseudo-true parameters introduced in the proof of lemma 1:
\[
f_n(\lambda) = \frac{\sigma^2_n}{2\pi}\left|1 + \sum_{k=1}^{q} \pi_{q,k}\,e^{ik\lambda}\right|^2
\]
where σ²_n is the minimum value of
\[
\int_{-\pi}^{\pi} f(\lambda)\left|1 + \sum_{k=1}^{q} \pi_{q,k}\,e^{ik\lambda}\right|^{-2} d\lambda
\]
and σ²_n → σ² as shown in Baxter (1962). Obviously,
\[
\sup_\lambda |\hat{f}(\lambda) - f_n(\lambda)| = o(1) \quad \text{a.s.}
\]
by lemma 1 and equation (20) of Park (2002). Also,
\[
\sup_\lambda |f_n(\lambda) - f(\lambda)| = o(1) \quad \text{a.s.}
\]
by the same argument we used at the end of part 3 of the proof of lemma 2. The first part of the present lemma therefore follows. If we consider that
\[
\sum_{k=-\infty}^{\infty} \Gamma_k = 2\pi f(0) \quad \text{and} \quad \sum_{k=-\infty}^{\infty} \Gamma^*_k = 2\pi f^*(0),
\]
the second part follows quite directly.
•
Lemma A4 Under assumptions 1 and 2, we have
\[
E^*|\varepsilon^*_t|^4 = O(1) \quad \text{a.s.}
\]
PROOF OF LEMMA A4.
From the proof of lemma 2, we have E*|ε*_t|⁴ ≤ c(A_n + B_n + C_n + D_n) where c is a constant. The relevant results are:
1. A_n = O(1) a.s.
2. E(B_n) = o(q^{−rs}) (equation 23)
3. C_n ≤ 2^{r−1}(C_{1n} + C_{2n}), where C_{1n} = o(1) a.s. (equation 30) and E(C_{2n}) = o(q^{−rs}) (equation 31)
4. D_n = o(1) a.s.
Under assumption 2', we have that B_n = o(1) a.s. and C_{2n} = o(1) a.s. because o(q^{−rs}) = o((cn^k)^{−rs}) = o(n^{−krs}) = o(n^{−1−δ}) for δ > 0. The result therefore follows.
•
Lemma A5 (CP lemma A4; Berk, proof of lemma 3) Define
\[
M^*_n(i,j) = E^*\left[\sum_{t=1}^{n}\left(u^*_{t-i}u^*_{t-j} - \Gamma^*_{i-j}\right)\right]^2.
\]
Then, under assumptions 1 and 2, we have M*_n(i,j) = O(n) a.s.
PROOF OF LEMMA A5.
For general linear models, Hannan (1960, p. 39) and Berk (1974, p. 491) have shown that
\[
M^*_n(i,j) \le n\left[2\left(\sum_{k=-\infty}^{\infty}\Gamma^*_k\right)^2 + |K^*_4|\left(\sum_{k=0}^{\infty}\hat{\pi}^2_{q,k}\right)^2\right]
\]
for all i and j, where K*_4 is the fourth cumulant of ε*_t. Since our MA sieve bootstrap certainly fits into the class of linear models (with π̂_{q,k} = 0 for all k > q and π̂_{q,0} = 1), this result applies here. But K*_4 can be written as a polynomial of degree 4 in the first 4 moments of ε*_t. Therefore, |K*_4| must be O(1) a.s. by lemma A4.
The result now follows from lemma A3 and the fact that Σ_{k=−∞}^{∞} Γ*_k = O(1) a.s.
•
Before going on, it is proper to note that the proofs of lemmas 4 and 5 are almost identical to those of lemmas 3.2 and 3.3 of CP (2003). We present them here for the sake of completeness.
PROOF OF LEMMA 4.
First, we prove equation (16). Using the Beveridge-Nelson decomposition of u*_t and the fact that y*_t = Σ_{k=1}^{t} u*_k, we can write:
\[
\frac{1}{n}\sum_{t=1}^{n} y^*_{t-1}\varepsilon^*_t = \hat{\pi}_n(1)\,\frac{1}{n}\sum_{t=1}^{n} w^*_{t-1}\varepsilon^*_t + \tilde{u}^*_0\,\frac{1}{n}\sum_{t=1}^{n}\varepsilon^*_t - \frac{1}{n}\sum_{t=1}^{n}\tilde{u}^*_{t-1}\varepsilon^*_t.
\]
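The decomposition underlying this identity can be verified numerically (a minimal sketch with an illustrative finite-order MA; the indexing conventions are ours): writing π̃_k = Σ_{j>k} π_j, one has u_t = π(1)ε_t + ũ_{t−1} − ũ_t and hence y_t = π(1)w_t + ũ_0 − ũ_t exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
pi = np.array([1.0, 0.5, 0.2])    # pi_0 = 1 plus two illustrative MA lags
q, n = len(pi) - 1, 200
eps = rng.standard_normal(n + q)  # eps[q + t - 1] holds eps_t for t = 1..n

# Beveridge-Nelson pieces: tilde_pi_k = pi_{k+1} + ... + pi_q
pi_tilde = np.array([pi[k + 1:].sum() for k in range(q)])
u = np.convolve(eps, pi)[q:q + n]   # u_t = sum_k pi_k eps_{t-k}
ct = np.convolve(eps, pi_tilde)
u_tilde = ct[q:q + n]               # tilde_u_t
u_tilde0 = ct[q - 1]                # tilde_u_0 (pre-sample term)

y = np.cumsum(u)                    # y_t = u_1 + ... + u_t
w = np.cumsum(eps[q:])              # w_t = eps_1 + ... + eps_t
print(np.max(np.abs(y - (pi.sum() * w + u_tilde0 - u_tilde))))  # ~1e-15
```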
Therefore, to prove the result, it suffices to show that
\[
E^*\left[\tilde{u}^*_0\,\frac{1}{n}\sum_{t=1}^{n}\varepsilon^*_t - \frac{1}{n}\sum_{t=1}^{n}\tilde{u}^*_{t-1}\varepsilon^*_t\right] = o(1) \quad \text{a.s.} \qquad (41)
\]
Since the ε*_t are iid by construction, we have:
\[
E^*\left(\sum_{t=1}^{n}\varepsilon^*_t\right)^2 = n\sigma^{*2} = O(n) \quad \text{a.s.} \qquad (42)
\]
and
\[
E^*\left(\sum_{t=1}^{n}\tilde{u}^*_{t-1}\varepsilon^*_t\right)^2 = n\sigma^{*2}\tilde{\Gamma}^*_0 = O(n) \quad \text{a.s.} \qquad (43)
\]
where Γ̃*_0 = E*(ũ*_t)². But the terms in equation (41) are 1/n times the square root of (42) and (43). Hence, equation (41) follows. Now, to prove equation (17), consider again from the Beveridge-Nelson decomposition of u*_t:
\[
y^{*2}_t = \hat{\pi}_n(1)^2 w^{*2}_t + (\tilde{u}^*_0 - \tilde{u}^*_t)^2 + 2\hat{\pi}_n(1)w^*_t(\tilde{u}^*_0 - \tilde{u}^*_t) = \hat{\pi}_n(1)^2 w^{*2}_t + (\tilde{u}^*_0)^2 + (\tilde{u}^*_t)^2 - 2\tilde{u}^*_t\tilde{u}^*_0 + 2\hat{\pi}_n(1)w^*_t(\tilde{u}^*_0 - \tilde{u}^*_t)
\]
thus,
\[
\frac{1}{n^2}\sum_{t=1}^{n} y^{*2}_t = \hat{\pi}_n(1)^2\frac{1}{n^2}\sum_{t=1}^{n} w^{*2}_t + \frac{1}{n}(\tilde{u}^*_0)^2 + \frac{1}{n^2}\sum_{t=1}^{n}(\tilde{u}^*_t)^2 - \frac{2\tilde{u}^*_0}{n^2}\sum_{t=1}^{n}\tilde{u}^*_t + 2\hat{\pi}_n(1)\tilde{u}^*_0\frac{1}{n^2}\sum_{t=1}^{n} w^*_t - 2\hat{\pi}_n(1)\frac{1}{n^2}\sum_{t=1}^{n} w^*_t\tilde{u}^*_t.
\]
By lemma 3, every term but the first of this expression is o(1) a.s. The result follows.
•
PROOF OF LEMMA 5.
Using the definition of bootstrap stochastic orders, we prove the results by showing that:
\[
E^*\left\|\left(\frac{1}{n}\sum_{t=1}^{n} x^*_{p,t}x^{*\top}_{p,t}\right)^{-1}\right\| = O_p(1) \qquad (44)
\]
\[
E^*\left\|\sum_{t=1}^{n} x^*_{p,t}y^*_{t-1}\right\| = O_p(np^{1/2}) \quad \text{a.s.} \qquad (45)
\]
\[
E^*\left\|\sum_{t=1}^{n} x^*_{p,t}\varepsilon^*_t\right\| = O_p(n^{1/2}p^{1/2}) \quad \text{a.s.} \qquad (46)
\]
The proofs below rely on the fact that, under the null, the ADF regression is a finite order autoregressive approximation to the bootstrap DGP, which admits an AR(∞) form.
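As a concrete sketch of what this finite-order approximation computes, the ADF regression and its t-statistic can be run on a bootstrap sample as follows (our own minimal OLS implementation, without deterministic terms; all names are assumptions):

```python
import numpy as np

def adf_t_stat(y, p):
    # OLS t-statistic on rho in
    # Delta y_t = rho y_{t-1} + alpha_1 Delta y_{t-1} + ... + alpha_p Delta y_{t-p} + e_t
    dy = np.diff(y)
    T = len(dy)
    X = np.column_stack([y[p:T]] + [dy[p - k:T - k] for k in range(1, p + 1)])
    z = dy[p:T]
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ beta
    s2 = resid @ resid / (len(z) - X.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])
    return beta[0] / se

# e.g. on a bootstrap sample built from the sketch above:
# tau_star = adf_t_stat(np.cumsum(u_star), p=4)
```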
Proof of (44): First, let us define the long run covariance of the vector x*_{p,t} as Ω*_{pp} = (Γ*_{i−j})^{p}_{i,j=1}. Then, recalling the result of lemma A5, we have that
\[
E^*\left\|\frac{1}{n}\sum_{t=1}^{n} x^*_{p,t}x^{*\top}_{p,t} - \Omega^*_{pp}\right\|^2 = O_p(n^{-1}p^2) \quad \text{a.s.} \qquad (47)
\]
This is because equation (47) is squared with a factor of 1/n and the dimension of x*_{p,t} is p. Also,
\[
\left\|\Omega^{*-1}_{pp}\right\| \le \left[2\pi\left(\inf_\lambda f^*(\lambda)\right)\right]^{-1} = O(1) \quad \text{a.s.} \qquad (48)
\]
because, under lemma A4, we can apply the result from Berk (1974, equation 2.14). In that paper, Berk considers the problem of approximating a general linear
process, of which our bootstrap DGP can be seen as being a special case, with
a finite order AR(p) model, which is what our ADF regression does under the
null hypothesis. To see how his results apply, consider assumption 1 (b) on the
parameters of the original data's DGP. Using this and the results of lemma 1, we may say that Σ_{k=0}^{∞} |π̂_{q,k}| < ∞. Therefore, as argued by Berk (1974, p. 493), the polynomial 1 + Σ_{k=1}^{q} π̂_{q,k} e^{ikλ} is continuous and nonzero over λ so that f*(λ) is also continuous and there are constant values F₁ and F₂ such that 0 < F₁ < f*(λ) < F₂.
This further implies that (Grenander and Szegö, 1958, p. 64) 2πF₁ ≤ λ₁ < ... < λ_p ≤ 2πF₂, where λ_i, i = 1,...,p, are the eigenvalues of the theoretical covariance matrix of the bootstrap DGP. To get the result, it suffices to consider the definition of the matrix norm. For a given matrix C, we have that ‖C‖ = sup ‖Cx‖ for ‖x‖ ≤ 1, where x is a vector and ‖x‖² = x⊤x. Thus, ‖C‖² ≤ Σ_{i,j} c²_{i,j}, where c_{i,j} is the element in position (i,j) of C. Therefore, the matrix norm ‖C‖ is dominated by the largest modulus of the eigenvalues of C. This in turn implies that the norm of the inverse of C is dominated by the inverse of the smallest modulus of the eigenvalues of C. Hence, equation (48) follows.
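This eigenvalue bound is easy to check numerically (a sketch with made-up MA(1) parameters; SciPy's toeplitz is used only to build the covariance matrix):

```python
import numpy as np
from scipy.linalg import toeplitz

theta, sigma2, p = 0.5, 1.0, 20
gamma = np.zeros(p)                    # autocovariances of an MA(1)
gamma[0] = sigma2 * (1 + theta ** 2)
gamma[1] = sigma2 * theta
Omega = toeplitz(gamma)                # p x p theoretical covariance matrix

lam = np.linspace(-np.pi, np.pi, 4096)
f = sigma2 / (2 * np.pi) * np.abs(1 + theta * np.exp(1j * lam)) ** 2
eig = np.linalg.eigvalsh(Omega)
# Eigenvalues lie between 2*pi*inf f and 2*pi*sup f:
print(2 * np.pi * f.min() <= eig.min(), eig.max() <= 2 * np.pi * f.max())
```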
Then, we write the following inequality:
\[
E^*\left\|\left(\frac{1}{n}\sum_{t=1}^{n} x^*_{p,t}x^{*\top}_{p,t}\right)^{-1} - \Omega^{*-1}_{pp}\right\| \le E^*\left\|\left(\frac{1}{n}\sum_{t=1}^{n} x^*_{p,t}x^{*\top}_{p,t}\right)^{-1}\right\|\left\|\Omega^{*-1}_{pp}\right\|\,E^*\left\|\frac{1}{n}\sum_{t=1}^{n} x^*_{p,t}x^{*\top}_{p,t} - \Omega^*_{pp}\right\|
\]
where we used the fact that E*Ω*_{pp} = Ω*_{pp}. By equation (47), the right hand side goes to 0 as n increases. Equation (44) then follows from equation (48).
Proof of (45): Our proof is almost exactly the same as the proof of lemma 3.2 in Chang and Park (2002), except that we consider bootstrap quantities. As they do, we let y*_t = 0 for all t ≤ 0 and, for 1 ≤ j ≤ p, we write
\[
\sum_{t=1}^{n} y^*_{t-1}u^*_{t-j} = \sum_{t=1}^{n} y^*_{t-1}u^*_t + R^*_n \qquad (49)
\]
where
\[
R^*_n = \sum_{t=1}^{n} y^*_{t-1}u^*_{t-j} - \sum_{t=1}^{n} y^*_{t-1}u^*_t
\]
and u*_t is the bootstrap first difference process. First of all, we note that Σ_{t=1}^{n} x*_{p,t} y*_{t-1} is a 1×p vector whose jth element is Σ_{t=1}^{n} y*_{t-1}u*_{t-j}. Therefore, as we will see, by the definition of the Euclidean vector norm, equation (45) will be proved if we show that R*_n = O*_p(n) a.s. uniformly in j for j from 1 to p.
We begin by noting that
\[
\sum_{t=1}^{n} y^*_{t-1}u^*_t = \sum_{t=1}^{n} y^*_{t-j-1}u^*_{t-j} + \sum_{t=n-j+1}^{n} y^*_{t-1}u^*_t
\]
for each j. This allows us to write:
\[
R^*_n = \sum_{t=1}^{n}\left(y^*_{t-1} - y^*_{t-j-1}\right)u^*_{t-j} - \sum_{t=n-j+1}^{n} y^*_{t-1}u^*_t.
\]
Let us call R*_{1n} and R*_{2n} the first and second elements of this last equation. Then, because y*_t is integrated,
\[
R^*_{1n} = \sum_{t=1}^{n}\left(y^*_{t-1} - y^*_{t-j-1}\right)u^*_{t-j} = \sum_{t=1}^{n}\left(\sum_{i=1}^{j} u^*_{t-i}\right)u^*_{t-j} = n\sum_{i=1}^{j}\Gamma^*_{i-j} + \sum_{i=1}^{j}\left[\sum_{t=1}^{n}\left(u^*_{t-i}u^*_{t-j} - \Gamma^*_{i-j}\right)\right].
\]
By lemma A3, the first part is O(n) a.s. because Σ_{k=−∞}^{∞} Γ*_k = O(1). Similarly, we can use lemma A5 to show that the second part is O*_p(n^{1/2}p) a.s., where the change to O*_p comes from the fact that the result of lemma A5 is for the expectation under the bootstrap DGP, the root-n from the fact that lemma A5 considers the square of the present term and p from the fact that j goes from 1 to p. Thus, R*_{1n} = O(n) + O*_p(n^{1/2}p) a.s.
P
Now, let us consider R?2n :
n
X
?
R2n
=
?
yt−1
u?t
t=n−j+1
t−1
X
n
X
=
i=1
t=n−j+1
=
n
X
n−j
X
u?t u?t−i +
t=n−j+1 i=1
u?t−i
!
n
X
u?t
t−1
X
u?t u?t−i
i=n−j+2 t=n−j+1
? a
? b
Letting R2n
and R2n
denote the first and second part of this last equation, we
have
#
!
"n−j
n−j
n
? a
R2n
=j
X
Γ?i
+
i=1
X
t=n−j+1
X
i=1
(u?t u?t−i − Γ?i )
= O(p) + Op? (n1/2 p) a.s.
uniformly in j, where the last line comes from lemmas A3 and A5 again. The order of the first term is obvious while the order of the second term is explained like that of the second term of R*_{1n}. Similarly, we also have
\[
R^{*b}_{2n} = (j-1)\sum_{i=n-j+1}^{n-1}\Gamma^*_i + \sum_{t=n-j+2}^{n}\left[\sum_{i=n-j+1}^{t-1}\left(u^*_t u^*_{t-i} - \Gamma^*_i\right)\right] = O(p) + O^*_p(p^{3/2}) \quad \text{a.s.}
\]
uniformly in j under lemmas A3 and A5, where the order of the first term is obvious and the order of the second term comes from the fact that j appears in the starting index of both summands. Hence, R*_n is O*_p(n) a.s. uniformly in j. Also, under assumptions 1 and 2, and by lemma 1, Σ_{t=1}^{n} y*_{t-1}u*_t = O*_p(n) a.s. uniformly in j. Therefore, equation (49) is also O*_p(n) a.s. uniformly in j, 1 ≤ j ≤ p. The result follows.
Proof of (46): We begin by noting that for all k such that 1 ≤ k ≤ p,
\[
E^*\left(\sum_{t=1}^{n} u^*_{t-k}\varepsilon^*_t\right)^2 = n\sigma^{*2}\Gamma^*_0
\]
which means that
\[
E^*\left\|\sum_{t=1}^{n} x^*_{p,t}\varepsilon^*_t\right\|^2 = np\,\sigma^{*2}\Gamma^*_0.
\]
But it has been shown in lemma A2 that σ*² and Γ*_0 are O(1) a.s. The result is then obvious.
•
PROOF OF THEOREM 2.
Follows directly from the previous results.
•
Appendix 2: Figures
Figure 1. Size discrepancy plot, ARMA model 1, N=25.
Figure 2. Size discrepancy plot, ARMA model 1, N=100.
Figure 3. Size discrepancy plot, ARMA model 1, N=25.
Figure 4. Size discrepancy plot, ARMA models 2 and 3, N=25.
Figure 5. Size discrepancy plot, ARFIMA model, N=25.
Figure 6. Size discrepancy plot, ARMA model 1, N=25, GZW and MLE.
Figure 7. Size-power curves, ARMA model 1, N=25.
References
Agiakloglou, C. and P. Newbold (1992). Empirical evidence on Dickey-Fuller type tests. Journal of Time Series Analysis 13, 471-83.
An, H.-Z., Z.-G. Chen and E.J. Hannan (1982). Autocorrelation, autoregression and autoregressive approximation. Annals of Statistics 10, 926-36.
Baxter, G. (1962). An asymptotic result for the finite predictor. Math. Scand. 10, 137-44.
Berk, K.N. (1974). Consistent autoregressive spectral estimates. Annals of Statistics 2, 485-502.
Bickel, P. and P. Bühlmann (1999). A new mixing notion and functional central limit theorems for a sieve bootstrap in time series. Bernoulli 5, 413-46.
Bühlmann, P. (1995). Moving average representation of autoregressive approximations. Stochastic Processes and their Applications 60, 331-42.
Bühlmann, P. (1997). Sieve bootstrap for time series. Bernoulli 3, 123-48.
Chang, Y. and J.Y. Park (2002). On the asymptotics of ADF tests for unit roots. Econometric Reviews 21, 431-47.
Chang, Y. and J.Y. Park (2003). A sieve bootstrap for the test of a unit root. Journal of Time Series Analysis 24, 379-400.
Davidson, R. and J. MacKinnon (1998). Graphical methods for investigating the size and power of test statistics. The Manchester School 66, 1-26.
Davidson, R. and J. MacKinnon (2003). Econometric Theory and Methods. Oxford University Press.
Dickey, D.A. and W.A. Fuller (1979). Distribution of estimators for autoregressive time series with a unit root. Journal of the American Statistical Association 74, 427-31.
Dickey, D.A. and W.A. Fuller (1981). Likelihood ratio statistics for autoregressive time series with a unit root. Econometrica 49, 1057-72.
Fuller, W.A. (1996). Introduction to Statistical Time Series, second edition. New York, John Wiley and Sons.
Galbraith, J. and V. Zinde-Walsh (1992). The GLS transformation matrix. Econometric Theory 8, 95-111.
Galbraith, J. and V. Zinde-Walsh (1994). A simple noniterative estimator for moving average models. Biometrika 81, 143-55.
Galbraith, J. and V. Zinde-Walsh (1997). On some simple, autoregression-based estimation and identification techniques for ARMA models. Biometrika 84, 685-96.
Grenander, U. and G. Szegö (1958). Toeplitz Forms and Their Applications. University of California Press, Berkeley.
Hannan, E.J. (1960). Time Series Analysis. Methuen, London.
Hannan, E.J. and L. Kavalieris (1986). Regression, autoregression models. Journal of Time Series Analysis 7, 27-49.
Johansen, S. (2005). A small sample correction of the Dickey-Fuller test. Paper presented at the 2005 World Meetings of the Econometric Society, London, England.
Kreiss, J.P. (1992). Bootstrap procedures for AR(∞) processes. In K.H. Jöckel, G. Rothe and W. Sendler (eds), Bootstrapping and Related Techniques, Lecture Notes in Economics and Mathematical Systems 376, Springer, Heidelberg.
Künsch, H.R. (1989). The jackknife and the bootstrap for general stationary observations. Annals of Statistics 17, 1217-41.
Móricz, F. (1976). Moment inequalities and the strong law of large numbers. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 35, 298-314.
Ng, S. and P. Perron (1995). Unit root tests in ARMA models with data dependent methods for selection of the truncation lag. Journal of the American Statistical Association 90, 268-81.
Ng, S. and P. Perron (2001). Lag length selection and the construction of unit root tests with good size and power. Econometrica 69, 1519-54.
Park, J.Y. (2002). An invariance principle for sieve bootstrap in time series. Econometric Theory 18, 469-90.
Perron, P. and S. Ng (1996). Useful modifications to some unit root tests with dependent errors and their local asymptotic properties. The Review of Economic Studies 63, 435-64.
Phillips, P.C.B. (1987). Time series regression with a unit root. Econometrica 55, 277-301.
Phillips, P.C.B. and P. Perron (1988). Testing for a unit root in time series regressions. Biometrika 75, 335-46.
Phillips, P.C.B. and V. Solo (1992). Asymptotics for linear processes. Annals of Statistics 20, 971-1001.
Said, S.E. and D.A. Dickey (1984). Testing for unit roots in autoregressive-moving average models of unknown order. Biometrika 71, 599-607.
Sakhanenko, A.I. (1980). On unimprovable estimates of the rate of convergence in invariance principle. In Nonparametric Statistical Inference, Colloquia Mathematica Societatis János Bolyai 32, 779-83. Budapest, Hungary.
Schwert, G.W. (1989). Tests for unit roots: a Monte Carlo investigation. Journal of Business and Economic Statistics 7, 147-59.
Stout, W.F. (1974). Almost Sure Convergence. New York, Academic Press.