Estimation of Nonlinear Error Correction
Models
Myung Hwan Seo∗
London School of Economics
Discussion paper No. EM/2007/517
March 2007
∗
The Suntory Centre
Suntory and Toyota International Centres for
Economics and Related Disciplines
London School of Economics and Political Science
Houghton Street
London WC2A 2AE
Tel: 020 7955 6679
Department of Economics, London School of Economics, Houghton Street, London WC2A 2AE, United
Kingdom. E-mail address: m.seo@lse.ac.uk. This research was supported through a grant from the Economic
and Social Science Research Council.
Abstract

Asymptotic inference in nonlinear vector error correction models (VECM) that exhibit regime-specific short-run dynamics is nonstandard and complicated. This paper contributes to the literature in several important ways. First, we establish the consistency of the least squares estimator of the cointegrating vector allowing for both smooth and discontinuous transition between regimes. This is a nonregular problem due to the presence of cointegration and nonlinearity. Second, we obtain the convergence rates of the cointegrating vector estimates. They differ depending on whether the transition is smooth or discontinuous. In particular, we find that the rate in the discontinuous threshold VECM is extremely fast, namely $n^{3/2}$, compared to the standard rate of $n$. This finding is very useful for inference on the short-run parameters. Third, we provide an alternative inference method for the threshold VECM based on smoothed least squares (SLS). The SLS estimator of the cointegrating vector and threshold parameter converges to a functional of a vector Brownian motion, and it is asymptotically independent of the estimator of the slope parameters, which is asymptotically normal.

Keywords: Threshold Cointegration, Smooth Transition Error Correction, Least Squares, Smoothed Least Squares, Consistency, Convergence Rate.
JEL No.: C32
© The author. All rights reserved. Short sections of text, not to exceed two paragraphs,
may be quoted without explicit permission provided that full credit, including © notice, is
given to the source.
1 Introduction
Nonlinear error correction models (ECM) have been studied actively in economics and there are numerous applications. To list only a few, see Michael, Nobay, and Peel (1997) for an application to purchasing power parity, Anderson (1997) for the term structure of interest rates, Escribano (2004) for money demand, Psaradakis, Sola, and Spagnolo (2004) for the relation between stock prices and dividends, and Sephton (2003) for spatial market arbitrage; see also the review by Granger (2001). The models include the smooth transition ECM of Granger and Teräsvirta (1993), the threshold cointegration of Balke and Fomby (1997), the Markov switching ECM of Psaradakis et al. (2004), and the cubic polynomial ECM of Escribano (2004).
A strand of the econometric literature focuses on testing for the presence of nonlinearity and cointegration in an attempt to disentangle nonstationarity from nonlinearity. A partial list includes Hansen and Seo (2002), Kapetanios, Shin, and Snell (2006) and Seo (2006). Time series properties of various ECMs have been established by Corradi, Swanson, and White (2000) and Saikkonen (2005, 2007), among others. However, the results on estimation are still limited. Most of all, consistency has not been proven except for special cases. It is difficult to establish due to the lack of uniformity of the convergence over the cointegrating vector space, as noted by Saikkonen (1995), who derived the consistency of the MLE of a cointegrated system that is nonlinear in parameters but otherwise linear. de Jong (2002) studied consistency of minimization estimators of smooth transition ECMs where the error correction term appears only in a bounded transition function. Another case, studied by Kristensen and Rahbek (2008), is that where the function is unbounded but becomes linear as the error correction term diverges. Next, estimation of regime-switching and/or discontinuous cases has hardly been studied; these include important classes of models such as the smooth transition ECM and threshold cointegration. Hansen and Seo (2002) proposed the MLE under normality but only conjectured its consistency. While it may be argued that the two-step approach of Engle and Granger (1987) can be adopted due to the super-consistency of the cointegrating vector estimate, the estimation error cannot be ignored in nonlinear ECMs, as shown by de Jong (2001).¹

¹ It provides an orthogonality condition, under which the two-step approach is valid.
The purpose of this paper is to develop asymptotic theory for a class of nonlinear vector error correction models (VECM). In particular, we consider regime switching VECMs, where each regime exhibits different short-run dynamics and the regime switching depends on the disequilibrium error. Examples include threshold cointegration and the smooth transition VECM. First, we establish the square root $n$ consistency of the LS estimator of $\beta$. This enables us to employ de Jong (2002) to make asymptotic inference on both short-run and long-run parameters jointly in smooth transition models. Then, we turn to discontinuous models, focusing on the threshold cointegration model, which is particularly popular in practice.

This paper shows that the convergence of the LS estimator of $\beta$ in the threshold cointegration model is extremely fast, at the rate of $n^{3/2}$. This asymptotics is based not on the diminishing threshold asymptotics of Hansen (2000) but on fixed threshold asymptotics. Two different irregularities contribute to this fast rate. First, the estimating function
lacks uniformity over the cointegrating vector space as the data become stationary at the true value, which is the reason for the super-consistency of standard cointegrating vector estimates. Second, $\beta$ takes part in the regime switching, which is discontinuous. This discontinuity of the model also boosts the convergence rate, yielding the super-consistency of the threshold estimate as in Chan (1993). While this fast convergence rate is certainly interesting and has some inferential value, e.g. when we perform sequential tests to determine the number of regimes, it makes it very challenging to obtain an asymptotic distribution of the estimator. Even in the stationary threshold autoregression, the asymptotic distribution is very complicated and cannot be tabulated (see Chan 1993). Subsampling is the only way to approximate the distribution reported in the literature (Gonzalo and Wolf 2005), although it would not work when $\beta$ is estimated, due to the nonstationarity involved. Meanwhile, Seo and Linton (2007) proposed smoothed least squares (SLS) estimation for threshold regression models, which yields asymptotic normality of the threshold estimate and is applicable to the threshold cointegration model.
We develop the asymptotic distributions of the SLS estimators of $\beta$, the threshold parameter $\gamma$, and the other short-run parameters. The estimators of $\beta$ and $\gamma$ converge jointly to a functional of Brownian motions, at rates slightly slower than those of the unsmoothed counterparts. This slow-down in the convergence rate has already been observed in Seo and Linton (2007) and is the price to pay to achieve standard inference. The remaining regression parameter estimates converge to the normal distribution as if the true values of $\beta$ and $\gamma$ were known. This is not the case if the transition function is smooth. We also show that $\beta$ can be treated as if known in the SLS estimation of the short-run parameters, including $\gamma$, if we plug in the unsmoothed cointegrating vector estimate, due to its fast convergence rate. A set of Monte Carlo experiments demonstrates that this two-step approach is more efficient in finite samples.
This paper is organized as follows. Section 2 introduces the regime switching VECMs and establishes the square root $n$ consistency of the LS estimator of $\beta$. Section 3 concentrates on the threshold cointegration model, obtaining the convergence rate of the LS estimator of $\beta$ and the asymptotic distributions of the SLS estimators of all the model parameters. It also discusses estimation of the asymptotic variances. Finite sample performance of the proposed estimators is examined in Section 4. Section 5 concludes. Proofs of theorems are collected in the Appendix.
We make the following conventions throughout the paper. The integral $\int$ is taken over $\mathbb{R}$ unless specified otherwise, and the summation $\sum_t$ with respect to $t$ is taken over all available observations for a given sample. The subscript $0$ and the hat $\hat{\ }$ on any parameter indicate the true value and an estimate of the parameter, respectively, e.g., $\beta_0$ and $\hat\beta$. For a function $g$, $\|g\|_2^2 = \int g(x)^2\,dx$ and $g^{(i)}$ indicates the $i$th derivative of $g$. And, for a random vector $x_t$ and a parameter $\theta$, we write $g_t(\theta) = g(x_t;\theta)$, $g_t = g(x_t;\theta_0)$ and $\hat g_t = g(x_t;\hat\theta)$. For example, if $z_t(\theta) = x_t'\theta$, then we write $z_t = x_t'\theta_0$ and $\hat z_t = x_t'\hat\theta$. The weak convergence of stochastic processes under the uniform metric is signified by $\Rightarrow$.
2 Regime Switching Error Correction Models
Let $x_t$ be a $p$-dimensional $I(1)$ vector that is cointegrated with a single cointegrating vector. The cointegrating vector is denoted by $(1, \beta_0')'$, normalizing the first element to 1. Define the error correction term $z_t(\beta) = x_{1t} + x_{2t}'\beta$, where $x_t = (x_{1t}, x_{2t}')'$, and let
$$X_{t-1}(\beta) = \left(1,\; z_{t-1}(\beta),\; \Delta_{t-1}'\right)',$$
where $\Delta_{t-1} = \left(\Delta x_{t-1}',\ldots,\Delta x_{t-l+1}'\right)'$ denotes the vector of the lagged first difference terms. Then, consider a two-regime vector error correction model
$$\Delta x_t = A'X_{t-1}(\beta) + D'X_{t-1}(\beta)\,d_{t-1}(\beta,\gamma) + u_t, \qquad (1)$$
where $t = l+1,\ldots,n$, and $d_t(\beta,\gamma) = d(z_t(\beta);\gamma)$ is a bounded function that controls the transition from one regime to the other regime. It need not be continuous. Typical examples of the transition function include the indicator function $1\{z_t(\beta) > \gamma\}$ and the logistic function $\left(1 + \exp\left(-\gamma_1\left(z_t(\beta) - \gamma_2\right)\right)\right)^{-1}$, where $\gamma = (\gamma_1, \gamma_2)'$.
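For concreteness, a minimal sketch of these two transition functions (Python; it assumes the error correction term for a candidate $\beta$ has already been computed and stored in a NumPy array `z`, so the names are illustrative only):

```python
import numpy as np

def indicator_transition(z, gamma):
    # d_t = 1{ z_t(beta) > gamma }: discontinuous threshold transition
    return (z > gamma).astype(float)

def logistic_transition(z, gamma1, gamma2):
    # d_t = (1 + exp(-gamma1 * (z_t(beta) - gamma2)))**(-1): smooth transition
    return 1.0 / (1.0 + np.exp(-gamma1 * (z - gamma2)))
```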
The threshold cointegration model of Balke and Fomby (1997) and the smooth transition error correction model of Granger and Teräsvirta (1993) can be viewed as special cases. As an alternative, Escribano (2004) used cubic polynomials to capture this type of regime-specific error correction behavior. While the last model is not nested in model (1), all of this literature focuses on nonlinear adjustment based on the magnitude of the disequilibrium error. In this regard, Gonzalo and Pitarakis (2006) is different, in that a stationary variable determines the regimes there. While we study a two-regime model to simplify our exposition, we expect that models with more than two regimes can be analyzed in a similar way. A symmetric three-regime model can be directly embedded in the two-regime model by replacing the threshold variable $z_t$ with its absolute value $|z_t|$. However, some of the assumptions imposed later in this section, in particular the one requiring the series to be $I(1)$, become more difficult to verify. See Saikkonen (2007).
We introduce some matrix notation. Define $X(\beta)$, $X_d(\beta,\gamma)$, $\Delta y$, and $u$ as the matrices stacking $X_{t-1}'(\beta)$, $X_{t-1}'(\beta)\,d_{t-1}(\beta,\gamma)$, $\Delta x_t'$ and $u_t'$, respectively. Let $\alpha = \mathrm{vec}(A', D')$, where vec stacks the rows of a matrix. We denote by $A_z$ and $D_z$ the columns of $A'$ and $D'$ that are associated with $z_{t-1}(\beta)$ and $z_{t-1}(\beta)\,d_{t-1}(\beta,\gamma)$, respectively, and by $\alpha_z$ the collection of $A_z$ and $D_z$. Then, we may write
$$\Delta y = \left[\left(X(\beta),\,X_d(\beta,\gamma)\right)\otimes I_p\right]\alpha + u.$$
We consider LS estimation, which minimizes
$$S_n(\theta) = \left(\Delta y - \left[\left(X(\beta),X_d(\beta,\gamma)\right)\otimes I_p\right]\alpha\right)'\left(\Delta y - \left[\left(X(\beta),X_d(\beta,\gamma)\right)\otimes I_p\right]\alpha\right), \qquad (2)$$
where $\theta = (\alpha',\beta',\gamma)'$. The LS estimator is then defined as
$$\hat\theta = \arg\min_{\theta\in\Theta}S_n(\theta),$$
where the minimum is taken over a compact parameter space $\Theta$. The concentrated LS is computationally convenient, since it reduces to simple OLS for a fixed $(\beta,\gamma)$, i.e.,
$$\hat\alpha(\beta,\gamma) = \left(\begin{pmatrix} X(\beta)'X(\beta) & X(\beta)'X_d(\beta,\gamma) \\ X_d(\beta,\gamma)'X(\beta) & X_d(\beta,\gamma)'X_d(\beta,\gamma) \end{pmatrix}^{-1}\begin{pmatrix} X(\beta)' \\ X_d(\beta,\gamma)' \end{pmatrix}\otimes I_p\right)\Delta y,$$
which is then plugged back into (2) for optimization over $(\beta,\gamma)$. In practice, a grid search over $(\beta,\gamma)$ can be applied. In particular, the grid for $\beta$ can be set up around a preliminary estimate of $\beta$, which can be obtained from the linear VECM, such as Johansen's maximum likelihood estimator or the simple OLS estimator, as described in Hansen and Seo (2002).
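The profiling step can be sketched as follows (Python; `build_X` is a hypothetical helper returning the stacked regressor matrix $(X(\beta), X_d(\beta,\gamma))$ for a given transition function, so this only illustrates the grid search logic, not the paper's own code):

```python
import numpy as np
from itertools import product

def concentrated_ls(dy, build_X, beta_grid, gamma_grid):
    """Profiled least squares: OLS on alpha for each (beta, gamma) on the grid,
    then pick the pair minimizing the residual sum of squares S_n."""
    best = {"ssr": np.inf}
    for beta, gamma in product(beta_grid, gamma_grid):
        X = build_X(beta, gamma)                        # (n, k) stacked regressors
        alpha, *_ = np.linalg.lstsq(X, dy, rcond=None)  # OLS for fixed (beta, gamma)
        ssr = np.sum((dy - X @ alpha) ** 2)
        if ssr < best["ssr"]:
            best = {"ssr": ssr, "beta": beta, "gamma": gamma, "alpha": alpha}
    return best
```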
The asymptotic properties of the estimator $\hat\theta$ are nonstandard due to the irregular feature of $S_n$, which does not obey a uniform law of large numbers. Thus, we take a two-step approach. First, it is shown that $\hat\beta = \beta_0 + O_p\!\left(n^{-1/2}\right)$ by evaluating the difference between $\inf S_n(\theta)$ and $S_n(\theta_0)$, where the infimum is taken over all $\theta\in\Theta$ such that $r_n|\beta - \beta_0| > \delta$ for a sequence $r_n$ such that $r_n\to\infty$ and $r_n/\sqrt n\to 0$. Similar approaches were taken by Wu (1981) and Saikkonen (1995), among others. The latter established the consistency of the maximum likelihood estimator of a nonlinear transformation of $\beta$ in the linear model. Second, the consistency of the short-run parameter estimates is established by a standard consistency argument using a uniform law of large numbers.
We assume the following for the consistency of the estimator $\hat\theta$.

Assumption 1 (a) $\{u_t\}$ is an independent and identically distributed sequence with $Eu_t = 0$ and $Eu_tu_t' = \Sigma$ positive definite.
(b) $\{\Delta x_t, z_t\}$ is a sequence of strictly stationary strong mixing random variables with mixing numbers $\alpha_m$, $m = 1,2,\ldots$, that satisfy $\alpha_m = o\!\left(m^{-(\delta_0+1)/(\delta_0-1)}\right)$ as $m\to\infty$ for some $\delta_0 > 1$, and for some $\varepsilon > 0$, $E|X_tX_t'|^{\delta_0+\varepsilon} < \infty$ and $E|X_{t-1}u_t'|^{\delta_0+\varepsilon} < \infty$. Furthermore, $E\Delta x_t = 0$ and the partial sum process $x_{[ns]}/\sqrt n$, $s\in[0,1]$, converges weakly to a vector Brownian motion $B^*$ with a covariance matrix $\Omega^*$, which is the long-run covariance matrix of $\Delta x_t$ and has rank $p-1$ such that $(1,\beta_0')\,\Omega^* = 0$. In particular, assume that $x_{2[ns]}/\sqrt n$ converges weakly to a vector Brownian motion $B$ with a covariance matrix $\Omega$, which is finite and positive definite.
(c) The parameter space $\Theta$ is compact and bounded away from zero for $\alpha_z$, and there is a function $\tilde d(x)$ that is monotonic, integrable, and symmetric around zero, and by which $\sup_{\gamma}|d(x;\gamma) - 1\{x > 0\}|$ is bounded.
(d) Let $u_t(\alpha,\delta,\gamma)$ be defined as in (1) replacing $z_t(\beta)$ with $z_t + \delta$, where $\delta$ belongs to a compact set in $\mathbb{R}$, and let
$$S(\alpha,\delta,\gamma) = E\left(u_t(\alpha,\delta,\gamma)'u_t(\alpha,\delta,\gamma)\right).$$
Then, assume that $\frac1n\sum_tu_t(\alpha,\delta,\gamma)'u_t(\alpha,\delta,\gamma)\overset{p}{\to}S(\alpha,\delta,\gamma)$ uniformly in $(\alpha,\delta,\gamma)$ on any compact set, and that $S(\alpha,\delta,\gamma)$ is continuous in all its arguments and uniquely minimized at $(\alpha,\delta,\gamma) = (\alpha_0, 0, \gamma_0)$.
Condition (a) is common, as in Chan (1993). It simplifies our presentation but could be relaxed. While we assume stationarity and mixing conditions for $\{\Delta x_t, z_t\}$, Saikkonen (2007) provides more primitive conditions on $\{u_t\}$ and the coefficients. These are stronger than each regime satisfying the standard conditions of the linear VECM. We also focus on processes without a linear time trend by assuming that $E\Delta x_t = 0$.
Unlike for nonlinear models with stationary variables, the consistency proof for nonlinear error correction models is difficult to establish at a general level. It depends crucially on the shape of the nonlinear transformation of the error correction term when the variable takes large values; see Park and Phillips (2001). Condition (c) identifies this shape, which is piecewise linear for large values of the error correction term. It is clear that indicator functions and logistic functions satisfy (c). This condition distinguishes the current work from previous ones: de Jong (2002) considered nonlinearity only through a bounded function, and Kristensen and Rahbek (2008) through an unbounded function that becomes linear for large values of the error correction term $z_{t-1}$. Thus, they do not capture the regime-specific behavior, which makes our consistency proof quite different from the previous ones. We do not consider the more general functional forms discussed in Saikkonen (2005) and Escribano and Mira (2002). The condition on $\alpha_z$ is not necessary but convenient for our proof and does not appear to be very restrictive. We note that the case with $\alpha_z = 0$ is similar to the model studied by de Jong (2002), as the error correction term then appears only in a bounded function. We also comment on the case where the threshold variable is $|z_{t-1}(\beta)|$. The proof of consistency goes through almost unchanged, but an additional assumption on $\alpha_z$, namely $A_z + D_z \neq 0$, facilitates the direct application of the proof, since $1\{|\xi_t| > \gamma\}$ behaves like $1\{|\xi_t| > 0\}$ for an integrated process $\xi_t$.
The conditions in (d) are a standard set of conditions imposed to ensure the consistency of nonlinear least squares estimators. More specific sets of sufficient conditions to ensure the uniform law of large numbers can be found in Andrews (1987) or Pötscher and Prucha (1991), for instance. It can easily be checked that the commonly used smooth transition functions and the indicator functions for threshold models satisfy such conditions. Condition (d) implicitly imposes that $D \neq 0$, as in the standard threshold model. The identification condition for $\beta$ here is to identify the cointegrating vector in a square-root-$n$ neighborhood, and it is also imposed in de Jong (2002). The conditions in Assumption 1 do not guarantee the existence of a measurable least squares estimator since we do not impose continuity of the function $d$. In this case, we can still establish consistency based on convergence in outer measure; see e.g. Newey and McFadden (1994). To ease the exposition, we implicitly assume measurability in the theorem below.
Theorem 1 Under Assumption 1, $\hat\theta - \theta_0$ is $o_p(1)$ and, furthermore, $\sqrt n\,(\hat\beta - \beta_0) = o_p(1)$.
When the transition function $d_{t-1}(\beta,\gamma)$ satisfies certain smoothness conditions, the asymptotic distribution of $\hat\theta$ can be derived following the standard approach using a Taylor series expansion. de Jong (2002) explored minimization estimators with nonlinear objective functions that involve the error correction term and derived the asymptotic distributions of such estimators under the assumption that $\sqrt n\,(\hat\beta - \beta_0) = O_p(1)$. Thus, we refer to de Jong (2002) for the case of a smooth $d_{t-1}(\beta,\gamma)$. It is worth noting that the asymptotic distribution of the short-run parameter estimates is in general dependent on the estimation error of the cointegrating vector, despite its super-consistency, due to the nonlinearity of the model. On the other hand, the threshold model has not been studied, due to the irregular feature of the indicator function, although the model has been adopted more widely in empirical research. We turn to the so-called threshold cointegration model and develop asymptotics for this model in the next section.
3 Threshold Cointegration Model
Balke and Fomby (1997) introduced the threshold cointegration model, which corresponds to model (1) with $d(z_t(\beta);\gamma) = 1\{z_t(\beta) > \gamma\}$, to allow for a nonlinear and/or asymmetric adjustment process toward the equilibrium. That is,
$$\Delta x_t = \begin{cases} A_0 + A_zz_{t-1}(\beta) + A_1\Delta_{t-1} + u_t, & \text{if } z_{t-1}(\beta) \le \gamma, \\ B_0 + B_zz_{t-1}(\beta) + B_1\Delta_{t-1} + u_t, & \text{if } z_{t-1}(\beta) > \gamma, \end{cases}$$
where $B = A + D$. The motivation for the model is that the magnitude and/or the sign of the disequilibrium $z_{t-1}$ plays a central role in determining the short-run dynamics (see e.g. Taylor 2001). Thus, they employed the error correction term as the threshold variable. This threshold variable makes the estimation problem highly irregular, as the cointegrating vector is subject to two different sorts of nonlinearity. Even when the cointegrating vector is prespecified, the estimation is nonstandard. We introduce a smoothed estimator and study the asymptotic properties of both smoothed and unsmoothed estimators in the following subsections.
To resolve the irregularity of the indicator function, Seo and Linton (2007) introduced a smoothed least squares estimator. To describe the estimator, define a bounded function $K(\cdot)$ satisfying
$$\lim_{s\to-\infty}K(s) = 0, \qquad \lim_{s\to+\infty}K(s) = 1.$$
A distribution function is often used for $K$. Let $K_t(\beta,\gamma) = K\!\left(\frac{z_t(\beta)-\gamma}{h}\right)$, where $h\to 0$ as $n\to\infty$. To define the smoothed objective function, we replace $d_{t-1}(\beta,\gamma)$ in (1) with $K_{t-1}(\beta,\gamma)$ and define the matrix $X_K(\beta,\gamma)$ that stacks $X_{t-1}'(\beta)K_{t-1}(\beta,\gamma)$. Then, we have the smoothed objective function
$$S_n^*(\theta) = \left(\Delta y - \left[\left(X(\beta),X_K(\beta,\gamma)\right)\otimes I_p\right]\alpha\right)'\left(\Delta y - \left[\left(X(\beta),X_K(\beta,\gamma)\right)\otimes I_p\right]\alpha\right). \qquad (3)$$
And the smoothed least squares (SLS) estimator is defined as
$$\check\theta = \arg\min_{\theta\in\Theta}S_n^*(\theta).$$
Similarly to the concentrated LS estimator, we can define
$$\check\alpha(\beta,\gamma) = \left(\begin{pmatrix} X(\beta)'X(\beta) & X(\beta)'X_K(\beta,\gamma) \\ X_K(\beta,\gamma)'X(\beta) & X_K(\beta,\gamma)'X_K(\beta,\gamma) \end{pmatrix}^{-1}\begin{pmatrix} X(\beta)' \\ X_K(\beta,\gamma)' \end{pmatrix}\otimes I_p\right)\Delta y, \qquad (4)$$
and by profiling we can minimize $S_n^*(\theta)$ with respect to $(\beta,\gamma)$.

It is worth mentioning that the true model is a threshold model and that we employ the smoothing only for estimation purposes. Since
$$K\!\left(\frac{z_t(\beta)-\gamma}{h}\right) \to 1\{z_t(\beta) > \gamma\}$$
as $h\to 0$, $S_n^*(\theta)$ converges in probability to the probability limit of $S_n(\theta)$ as $n\to\infty$.
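A minimal sketch of the smoothed criterion for a fixed $(\beta,\gamma)$, using the standard normal distribution function for $K$ (as later in the Monte Carlo section) and concentrating out the slope parameters by OLS; array and function names are illustrative assumptions, not the paper's code:

```python
import numpy as np
from scipy.stats import norm

def sls_criterion(dy, X0, z_lag, gamma, h):
    """Smoothed LS criterion for fixed (beta, gamma): the indicator 1{z > gamma}
    is replaced by K((z - gamma)/h) with K the standard normal cdf.
    dy: (n, p) matrix of Delta x_t; X0: (n, k) matrix stacking X_{t-1}(beta);
    z_lag: (n,) error correction term z_{t-1}(beta)."""
    K = norm.cdf((z_lag - gamma) / h)          # smoothed regime weight
    X = np.hstack([X0, X0 * K[:, None]])       # (X(beta), X_K(beta, gamma))
    alpha, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ alpha
    return np.sum(resid ** 2)                  # value of the smoothed objective
```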
We make the following assumptions regarding the smoothing function $K$ and the bandwidth parameter $h$.

Assumption 2 (a) $K$ is twice differentiable everywhere, $K^{(1)}$ is symmetric around zero, and $K^{(1)}$ and $K^{(2)}$ are uniformly bounded and uniformly continuous. Furthermore, $\int|K^{(1)}(s)|^4ds$, $\int|K^{(2)}(s)|^2ds$, and $\int|s^2K^{(2)}(s)|\,ds$ are finite.
(b) For some integer $\vartheta \ge 1$ and each integer $i$ ($1 \le i \le \vartheta$), $\int|s^iK^{(1)}(s)|\,ds < \infty$, and
$$\int s^i\,\mathrm{sgn}(s)K^{(1)}(s)\,ds = 0, \quad \int s^{\vartheta}\,\mathrm{sgn}(s)K^{(1)}(s)\,ds \neq 0, \quad \int s^iK^{(1)}(s)\,ds = 0, \quad \int K^{(2)}(s)\,ds = 0,$$
and $K(x) - K(0) \gtrless 0$ if $x \gtrless 0$.
(c) Furthermore, for some $\varepsilon > 0$,
$$\lim_{n\to\infty}h^{i-\vartheta}\int_{|hs|>\varepsilon}\left|s^iK^{(1)}(s)\right|ds = 0 \quad (1 \le i \le \vartheta), \qquad \lim_{n\to\infty}h^{-1}\int_{|hs|>\varepsilon}\left|K^{(2)}(s)\right|ds = 0.$$
(d) The sequence $\{h\}$ satisfies, for some sequence $m_n \to \infty$,
$$\frac{nh^3}{\log(nm_n)} \to 0, \qquad \frac{\log(nm_n)}{n^{1-6/r}h^2m_n^{-2}} \to 0, \qquad\text{and}\qquad h^{9k/2-3}\,n^{-(9k/2+2)/r+\varepsilon}\,m_n \to 0,$$
where $k$ is the dimension of $\theta$ and $r > 4$ is specified in Assumption 4.
These conditions are imposed in Seo and Linton (2007) and are common in smoothed estimation, as in Horowitz (1992) for example. Condition (b) is analogous to the condition defining the so-called $\vartheta$th order kernel and requires a kernel $K^{(1)}$ that permits negative values when $\vartheta > 1$, with $K(0) = 1/2$. We impose the condition that $K(x) - K(0) \gtrless 0$ if $x \gtrless 0$ since we need negative kernels for $\vartheta > 1$. Condition (c) is standard. The standard normal cumulative distribution function clearly satisfies these conditions; see Seo and Linton (2007) for an example with $\vartheta > 1$. Condition (d) serves to determine the rate for $h$. While this range of rates is admissible, we do not have a sharp bound and thus no optimal rate. If the data were independent and identically distributed, the conditions would simplify to $(nh^2)^{-1}\log n \to 0$ and $nh^3 \to 0$. As will be shown in the following section, a smaller $h$ implies faster convergence, whereas it may destroy the asymptotic normality if it is too small. This condition is not relevant for the consistency of $\check\theta$. On the other hand, too large an $h$ introduces correlation between the threshold estimate $\check\gamma$ and the slope estimate $\check\alpha$.
The following corollary establishes the consistency of the smoothed least squares estimator.

Corollary 2 Under Assumptions 1 and 2, $\check\theta - \theta_0$ is $o_p(1)$ and, furthermore, $\sqrt n\,(\check\beta - \beta_0)$ is $o_p(1)$.

3.1 Convergence Rates and Asymptotic Distributions
The unsmoothed LS estimator of the threshold parameter is super-consistent in the standard stationary threshold regression and has a complicated asymptotic distribution, which depends not only on certain moments but on the whole distribution of the data. On the contrary, the smoothed LS estimator of the same parameter exhibits asymptotic normality, while the smoothing slows down the convergence rate. The nonstandard nature of the estimation of threshold models becomes more complicated in threshold cointegration, since the thresholding relies on the error correction term, which is estimated simultaneously with the threshold parameter $\gamma$. We begin by developing the convergence rates of the unsmoothed estimators of the cointegrating vector and the threshold parameter, and then explore the asymptotic distribution of the smoothed estimators.

The asymptotic behavior of the threshold estimator heavily relies on whether the model is continuous or not. We focus on the discontinuous model. The following is assumed.
Assumption 3 (a) For almost every $\Delta_t$, the probability distribution of $z_t$ conditional on $\Delta_t$ has an everywhere positive density with respect to Lebesgue measure.
(b) $E\left(X_{t-1}'D_0D_0'X_{t-1}\,\middle|\,z_{t-1} = \gamma_0\right) > 0$.
Condition (a) ensures that the threshold parameter is uniquely identified and is common in threshold autoregressions. While it is more complicated to verify the condition in general, it is easy to see that the threshold VECM without any lagged term satisfies it, since it entails a threshold autoregression in $z_t$. This remark is also relevant to Assumption 4 (c). The discontinuity of the regression function is assumed in condition (b): at the threshold point $\gamma_0$, the change in the regression function is nonzero, which makes the regression function discontinuous. This enables super-efficient estimation of the threshold parameter $\gamma$. If the regression function is continuous, Gonzalo and Wolf (2005) showed that the least squares estimator of the threshold parameter is root-$n$ consistent and asymptotically normal in the context of stationary threshold autoregression, which may be used to test for condition (b).
We obtain the following rate result for the unsmoothed estimators of $\beta$ and $\gamma$.

Theorem 3 Under Assumptions 1 and 3, $\hat\beta = \beta_0 + O_p\!\left(n^{-3/2}\right)$ and $\hat\gamma = \gamma_0 + O_p\!\left(n^{-1}\right)$.
It is surprising that the cointegrating vector estimate converges faster than the standard $n$-rate. Heuristically, $\gamma - x_{2t-1}'(\hat\beta - \beta_0)$ behaves like a threshold estimate in a stationary threshold model, as
$$1\{z_{t-1}(\hat\beta) > \gamma\} = 1\{z_{t-1} > \gamma - x_{2t-1}'(\hat\beta - \beta_0)\}.$$
Since $\sup_{2\le t\le n}|x_{t-1}| = O_p(n^{1/2})$ and the threshold estimate is super-consistent, it is expected that $\hat\beta - \beta_0 = O_p(n^{-3/2})$. This fast rate of convergence has an important inferential implication for the short-run parameters, as will be discussed later.
We turn to the smoothed estimator for inference on the cointegrating vector. While subsampling has been shown to be valid for approximating the asymptotic distribution of the unsmoothed LS estimator of the threshold parameter in the stationary threshold autoregression (see Gonzalo and Wolf 2005), the extension to threshold cointegration is not trivial due to the nonstationarity involved. The smoothing of the objective function enables us to develop the asymptotic distribution based on a standard Taylor series expansion. Let $f(\cdot)$ denote the density of $z_t$ and $f(\cdot|\Delta)$ the conditional density given $\Delta_t = \Delta$. Also define
$$\tilde K_1(s) = K^{(1)}(s)\left(1\{s > 0\} - K(s)\right)$$
and
$$\sigma_v^2 = E\left[\left.\left\|K^{(1)}\right\|_2^2\left(X_{t-1}'D_0u_t\right)^2 + \left\|\tilde K_1\right\|_2^2\left(X_{t-1}'D_0D_0'X_{t-1}\right)^2\,\right|\,z_{t-1} = \gamma_0\right]f(\gamma_0), \qquad (5)$$
$$\sigma_q^2 = K^{(1)}(0)\,E\left[\left.X_{t-1}'D_0D_0'X_{t-1}\,\right|\,z_{t-1} = \gamma_0\right]f(\gamma_0). \qquad (6)$$
First, we set out assumptions that we need to derive the asymptotic distribution.

Assumption 4 (a) $E[|X_tu_t'|^r] < \infty$ and $E[|X_tX_t'|^r] < \infty$ for some $r > 4$.
(b) $\{\Delta x_t, z_t\}$ is a sequence of strictly stationary strong mixing random variables with mixing numbers $\alpha_m$, $m = 1,2,\ldots$, that satisfy $\alpha_m \le Cm^{-(2r-2)/(r-2)-\eta}$ for positive $C$ and $\eta$, as $m\to\infty$.
(c) For some integer $\vartheta \ge 2$ and each integer $i$ such that $1 \le i \le \vartheta - 1$, all $z$ in a neighborhood of $\gamma_0$, almost every $\Delta$, and some $M < \infty$, $f^{(i)}(z|\Delta)$ exists and is a continuous function of $z$ satisfying $|f^{(i)}(z|\Delta)| < M$. In addition, $f(z|\Delta) < M$ for all $z$ and almost every $\Delta$.
(d) The conditional joint density satisfies $f(z_t, z_{t-m}|\Delta_t, \Delta_{t-m}) < M$ for all $(z_t, z_{t-m})$ and almost all $(\Delta_t, \Delta_{t-m})$.
(e) $\gamma_0$ is an interior point of $\Gamma$.

These assumptions are analogous to those imposed in Seo and Linton (2007), who study the SLS estimator of the threshold regression model. Condition (a) ensures the convergence of the variance-covariance estimators but can be weakened. We need a stronger mixing condition, as set out in (b), than that required for consistency. Conditions (c)-(e) are common in smoothed estimation, as in Horowitz (1992), with only (d) being the analogue for a dependent sample of a condition for a random sample. In particular, we require a more stringent smoothness condition (condition (c)) on the conditional density $f(z|\Delta)$ for the smoothed estimation than for the unsmoothed estimation.
We present the asymptotic distribution below.

Theorem 4 Suppose Assumptions 1-4 hold. Let $W$ denote a standard Brownian motion that is independent of $B$. Then,
$$\begin{pmatrix} nh^{-1/2}\,(\check\beta - \beta_0) \\ \sqrt{nh^{-1}}\,(\check\gamma - \gamma_0) \end{pmatrix} \;\Rightarrow\; \frac{\sigma_v}{\sigma_q^2}\begin{pmatrix} \int_0^1BB' & \int_0^1B \\ \int_0^1B' & 1 \end{pmatrix}^{-1}\begin{pmatrix} \int_0^1B\,dW \\ W(1) \end{pmatrix},$$
$$\sqrt n\,(\check\alpha - \alpha_0) \;\Rightarrow\; N\!\left(0,\; E\!\left[\begin{pmatrix} 1 & d_{t-1} \\ d_{t-1} & d_{t-1} \end{pmatrix}\otimes X_{t-1}X_{t-1}'\right]^{-1}\otimes\Sigma\right),$$
and these two random vectors are asymptotically independent. The unsmoothed estimator $\hat\alpha$ has the same asymptotic distribution as $\check\alpha$.
We make some remarks on the similarities to and differences from the linear cointegration model and the stationary threshold model. First, the asymptotic distribution of $\check\beta$ and $\check\gamma$ is mixed normal, the same asymptotic distribution as that of the standard OLS estimator of the cointegrating vector and the constant in the exogenous case, up to the scaling factor $\sigma_v/\sigma_q^2$. A reading of the proof of this theorem reveals that the linear part does not contribute to the asymptotic variance of $\check\beta$, although $\beta$ appears both inside the indicator and in the linear part of the model. The factor $\sigma_v^2/\sigma_q^4$ contains the conditional expectation and density at the discontinuity point; it is the asymptotic variance of the threshold estimate if the true cointegrating vector were known. Other than the estimation of this factor, inference can be made in the same way as in the standard OLS case. Second, the cointegrating vector estimator converges faster than the usual $n$ rate but slower than the $n^{3/2}$ rate, which is obtained for the unsmoothed estimator. The same holds for the estimators $\hat\gamma$ and $\check\gamma$ of the threshold point $\gamma$. Third, as in the stationary threshold model, the slope parameter estimate $\check\alpha$ is asymptotically independent of the estimates of $\beta$ and $\gamma$.

The convergence rates of the estimators $\check\beta$ and $\check\gamma$ depend on the smoothing parameter $h$ in such a way that a smaller $h$ accelerates the convergence. This is in contrast to smoothed maximum score estimation. In the extreme case where $h = 0$, we obtain the fastest convergence, which corresponds to the unsmoothed estimator. A smaller $h$ boosts the convergence rates by reducing the bias, but too small an $h$ destroys the asymptotic normality. We do not know the exact order of $h$ at which the asymptotic normality breaks down, which requires further research.

The asymptotic independence between the estimator $\check\beta$ and the estimator $\check\alpha$ of the slope parameters, and the asymptotic normality of $\check\alpha$, contrast with the result in smooth transition cointegration models, where the asymptotic distribution of the slope estimator not only draws on the estimation of $\beta$ but is non-normal without a certain orthogonality condition (see e.g. de Jong 2001, 2002).² This is due to the slower convergence of the estimators of $\beta$ in the smooth transition models. Therefore, it should also be noted that the Engle-Granger type two-step approach, where
² In case of $Ez_t = 0$, we can still retain the asymptotic normality of the slope estimate by estimating (1) after replacing $z_{t-1}(\tilde\beta)$ with $\tilde z_{t-1} = (1,\tilde\beta')x_{t-1} - \frac1n\sum_s(1,\tilde\beta')x_{s-1}$, for any $n$-consistent $\tilde\beta$, as in de Jong (2001). It is worth noting, however, that this demeaning increases the asymptotic variance.
the cointegrating vector is estimated by a linear regression of $x_{1t}$ on $x_{2t}$ and the estimate is plugged into the error correction model, does not work in our case, in the sense that the estimation error affects the asymptotic distribution of $\check\alpha$. Therefore, the independence result above is useful for the construction of confidence intervals for the slope parameters $\alpha$.

Furthermore, we may propose a two-step approach for inference on the short-run parameters, making use of the fact that the unsmoothed estimator $\hat\beta$ converges faster than the smoothed estimator $\check\beta$. In principle, we can treat $\hat\beta$ as if it were the true value $\beta_0$. The following corollary states this.
Corollary 5 Suppose Assumptions 1-4 hold. Let $\check\gamma(\beta)$ be the smoothed estimator of $\gamma$ when $\beta$ is given. Then, $\check\gamma(\hat\beta)$ has the same asymptotic distribution as that of $\check\gamma(\beta_0)$, which is $N\!\left(0, \sigma_v^2/\sigma_q^4\right)$.

3.2 Asymptotic Variance Estimation
The construction of confidence intervals for the slope parameters is straightforward, as $\hat A$ and $\hat D$ are just OLS estimators given $(\beta,\gamma)$. We may treat the estimates $\check\beta$ and $\check\gamma$ (or $\hat\beta$ and $\hat\gamma$) as if they were $\beta_0$ and $\gamma_0$, due to Theorem 4. We may use either $1\{\hat z_{t-1} > \hat\gamma\}$ or $K_{t-1}(\hat\beta,\hat\gamma)$ for $d_{t-1}$.³ The inference for $(\beta,\gamma)$ requires estimating $\Omega$, $\sigma_v^2$, and $\sigma_q^2$. The estimation of $\Omega$ can be done by applying a standard HAC estimation method to $\Delta x_t$; see e.g. Andrews (1991). Although $\sigma_v^2$ and $\sigma_q^2$ involve nonparametric objects like a conditional expectation and a density, we do not have to carry out nonparametric estimation, as these are limits of the first and second derivatives of the objective function with respect to the threshold parameter $\gamma$.

Thus, let
$$\hat\eta_t = \frac{1}{2\sqrt h}\,X_{t-1}'(\hat\beta)\,\hat D\,K^{(1)}_{t-1}(\hat\beta,\hat\gamma)\,\hat u_t, \qquad (7)$$
where $\hat u_t$ is the residual from the regression (1), and let
$$\hat\sigma_v^2 = \frac1n\sum_t\hat\eta_t^2, \qquad\text{and}\qquad \hat\sigma_q^2 = \frac{h}{2n}\,Q_{n22}(\check\theta),$$
where $Q_{n22}$ is the diagonal element of the Hessian matrix $Q_n$ corresponding to $\gamma$; see the Appendix for the explicit formulas. Consistency of $\hat\sigma_q^2$ is straightforward from the proof of Theorem 4, and that of $\hat\sigma_v^2$ can be obtained after a slight modification of Theorem 4 of Seo and Linton (2007).
We can construct a confidence interval for $\gamma$ based on Corollary 5. The estimation of $\sigma_v^2$ and $\sigma_q^2$ can be done as above with $\beta = \hat\beta$. Due to the asymptotic normality and independence, the construction of the confidence interval is much simpler this way, without the need to estimate $\Omega$.
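To fix ideas, a small sketch of the plug-in estimation of $\sigma_v^2$ in (7), assuming the residuals, the lagged regressors and the fitted $\hat D$ have already been computed; the standard normal cdf plays the role of $K$, so $K^{(1)}$ is the normal density, and all names are illustrative:

```python
import numpy as np
from scipy.stats import norm

def sigma_v_hat(X_lag, D_hat, resid, z_lag, gamma_hat, h):
    """sigma_v^2 = (1/n) sum_t eta_t^2 with
    eta_t = (2*sqrt(h))**-1 * X_{t-1}' D_hat * K^(1)((z_{t-1}-gamma_hat)/h) * u_hat_t."""
    kprime = norm.pdf((z_lag - gamma_hat) / h)            # K^(1) for the normal cdf kernel
    xdu = np.einsum('tk,kp,tp->t', X_lag, D_hat, resid)   # scalar X'_{t-1} D_hat u_hat_t
    eta = xdu * kprime / (2.0 * np.sqrt(h))
    return np.mean(eta ** 2)
```

The companion piece $\hat\sigma_q^2$ would then be taken from the diagonal element of the Hessian of the smoothed objective corresponding to $\gamma$, scaled by $h/2n$ as in the text.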
Even though $\check\beta$ and $\check\gamma$ are asymptotically independent of $\check A$ and $\check D$ (and of $\hat A$ and $\hat D$), they are dependent in finite samples. So, we may not benefit from imposing the block diagonal structure of the asymptotic variance matrix. Corollary 5 enables the standard way of constructing confidence intervals based on the inversion of a t-statistic with a jointly estimated covariance matrix. In this case, we may define $\hat\eta_t$ in (7) using the score of $u_t(\theta)$ with respect to $(\alpha,\gamma)$ for a given $\hat\beta$. See Seo and Linton (2007) for more discussion.

³ In the MLE of the linear VECM, the likelihood ratio statistic for a hypothesis on $\beta$ converges to the chi-square distribution, as the log likelihood function under normality is approximately quadratic. It will be interesting to examine whether the same holds true for the threshold cointegration model.
4 Monte Carlo Experiments
This section investigates the finite sample performance of the estimators explored in this paper. Of particular interest are the various estimators of the cointegrating vector $\beta$ and the threshold parameter $\gamma$. We compare the unsmoothed LS estimator $\hat\beta$ of $\beta$ with the SLS estimator $\check\beta$ and Johansen's maximum likelihood estimator $\tilde\beta$, which is based on the linear VECM. For comparison purposes, we also compute the restricted estimators $\hat\beta_0$ and $\check\beta_0$, which are the unsmoothed LS and SLS estimators of $\beta$ when $\gamma$ is fixed at the true value $\gamma_0$. Similarly, $\hat\gamma_0$ and $\check\gamma_0$ denote the restricted unsmoothed LS and SLS estimators of $\gamma$ when $\beta$ is prespecified at the true value $\beta_0$. To distinguish the SLS estimator $\check\gamma$ from the two-step SLS estimator, let $\check\gamma_2$ denote the two-step estimator.
The simulation samples are generated from the following process:
$$\Delta x_t = \begin{pmatrix}-1\\0\end{pmatrix}\left(x_{1t-1} + \beta_0x_{2t-1}\right)1\{x_{1t-1} + \beta_0x_{2t-1} \le \gamma_0\} + \begin{pmatrix}-0.5\\0\end{pmatrix}\left(x_{1t-1} + \beta_0x_{2t-1}\right)1\{x_{1t-1} + \beta_0x_{2t-1} > \gamma_0\} + u_t,$$
where $u_t\sim$ iid $N(0,I_2)$, $t = 0,\ldots,n$, and $x_0 = u_0$. This process was considered in Hansen and Seo (2002), who provide the finite sample performance of the maximum likelihood estimator of $\beta$ and $\gamma$. We fix $\beta_0 = 1$ and $\gamma_0 = 0$. While the data generating process does not contain any lagged first difference term, the model is estimated with two lagged terms in addition to the error correction term. The estimation is based on a grid search, with grid sizes for $\gamma$ and $\beta$ of 100 and 500, respectively. The grid for $\beta$ is set around Johansen's maximum likelihood estimator $\tilde\beta$, as in Hansen and Seo (2002). For the smoothed estimators, we use the standard normal distribution function for $K$ and set $h = \hat\sigma\,n^{-1/2}\log n$, where $\hat\sigma^2$ is the sample variance of the error correction term.
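A compact sketch of this design (Python; the regime coefficients follow the reconstruction of the display above and the bandwidth rule $h = \hat\sigma n^{-1/2}\log n$, so treat the specific numbers as illustrative assumptions):

```python
import numpy as np

def simulate_tvecm(n, beta0=1.0, gamma0=0.0, seed=0):
    """Two-regime threshold VECM: z_{t-1} = x_{1,t-1} + beta0 * x_{2,t-1},
    adjustment (-1, 0)' below the threshold and (-0.5, 0)' above it, u_t ~ iid N(0, I_2)."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal((n + 1, 2))
    x = np.zeros((n + 1, 2))
    x[0] = u[0]                                   # x_0 = u_0
    for t in range(1, n + 1):
        z = x[t - 1, 0] + beta0 * x[t - 1, 1]     # error correction term
        coef = np.array([-1.0, 0.0]) if z <= gamma0 else np.array([-0.5, 0.0])
        x[t] = x[t - 1] + coef * z + u[t]
    return x

x = simulate_tvecm(250)
z = x[:, 0] + 1.0 * x[:, 1]
h = z.std() * np.log(len(z)) / np.sqrt(len(z))    # bandwidth h = sigma_hat * n^{-1/2} * log n
```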
Table 1 summarizes the results of our experiments for sample sizes $n = 100$, 250, and 500. We examined the finite sample distributions of the various estimators in terms of the mean, root mean squared error (RMSE), mean absolute error (MAE), and selected percentiles, from 1000 simulation replications. The RMSE and MAE are reported in logs.

The results are as we expected. Johansen's maximum likelihood estimator $\tilde\beta$ does not perform as well as the other estimators, which are obtained from estimating the correctly specified threshold cointegration model. The unsmoothed estimators and the restricted estimators outperform the smoothed and the unrestricted counterparts, respectively, in terms of RMSE and MAE. However, a careful examination of the percentiles reveals that the smoothed estimator $\check\beta$ exhibits a smaller interval between the fifth and ninety-fifth percentiles than the unsmoothed estimator $\hat\beta$.
Table 1: Distribution of Estimators

              Mean    RMSE    MAE     Percentile (%)
                      (log)   (log)     5       25      50      75      95
n = 100
 β̃ − β₀       0.001  -2.841  -3.191  -0.083  -0.028   0.000   0.028   0.098
 β̂ − β₀       0.001  -2.880  -3.293  -0.082  -0.023  -0.001   0.023   0.091
 β̂₀ − β₀     -0.001  -3.009  -3.617  -0.063  -0.015  -0.001   0.011   0.062
 β̌ − β₀      -0.001  -2.854  -3.268  -0.085  -0.025   0.001   0.025   0.084
 β̌₀ − β₀     -0.002  -2.947  -3.470  -0.068  -0.020  -0.001   0.019   0.068
 γ̂ − γ₀      -0.672   0.138  -0.284  -2.649  -1.153  -0.293  -0.032   0.258
 γ̂₀ − γ₀     -0.631   0.080  -0.408  -2.617  -1.023  -0.209  -0.038   0.123
 γ̌ − γ₀      -0.461   0.149  -0.265  -2.702  -0.945  -0.058   0.241   0.700
 γ̌₀ − γ₀     -0.444   0.090  -0.380  -2.719  -0.862  -0.021   0.192   0.504
 γ̌₂ − γ₀     -0.497   0.146  -0.272  -2.802  -1.014  -0.086   0.219   0.582
n = 250
 β̃ − β₀       0.000  -3.960  -4.241  -0.030  -0.011   0.000   0.011   0.031
 β̂ − β₀       0.001  -4.302  -4.637  -0.022  -0.007   0.001   0.007   0.023
 β̂₀ − β₀      0.000  -4.639  -5.133  -0.014  -0.003   0.000   0.003   0.014
 β̌ − β₀       0.000  -4.278  -4.626  -0.022  -0.007   0.000   0.007   0.020
 β̌₀ − β₀      0.000  -4.540  -4.866  -0.019  -0.006   0.000   0.006   0.016
 γ̂ − γ₀      -0.116  -0.928  -1.778  -0.734  -0.108  -0.031   0.015   0.156
 γ̂₀ − γ₀     -0.102  -1.007  -2.085  -0.526  -0.069  -0.021  -0.002   0.069
 γ̌ − γ₀      -0.057  -0.888  -1.677  -0.746  -0.073   0.019   0.113   0.236
 γ̌₀ − γ₀     -0.049  -0.901  -1.826  -0.622  -0.043   0.030   0.100   0.184
 γ̌₂ − γ₀     -0.054  -0.917  -1.704  -0.769  -0.071   0.025   0.105   0.236
n = 500
 β̃ − β₀       0.000  -2.006  -2.147  -0.015  -0.005   0.000   0.006   0.015
 β̂ − β₀      -0.000  -2.279  -2.425  -0.009  -0.003  -0.000   0.002   0.008
 β̂₀ − β₀     -0.000  -2.491  -2.698  -0.005  -0.001  -0.000   0.001   0.005
 β̌ − β₀      -0.000  -2.253  -2.383  -0.009  -0.003  -0.000   0.003   0.008
 β̌₀ − β₀     -0.000  -2.335  -2.476  -0.007  -0.003  -0.000   0.002   0.008
 γ̂ − γ₀      -0.006  -1.165  -1.346  -0.115  -0.035  -0.004   0.020   0.096
 γ̂₀ − γ₀     -0.010  -1.438  -1.636  -0.068  -0.022  -0.007   0.002   0.045
 γ̌ − γ₀       0.032  -1.080  -1.199  -0.092  -0.013   0.030   0.076   0.155
 γ̌₀ − γ₀      0.031  -1.213  -1.319  -0.051  -0.007   0.031   0.061   0.122
 γ̌₂ − γ₀      0.028  -1.102  -1.229  -0.086  -0.015   0.025   0.068   0.146

Notation. Johansen's maximum likelihood estimator, β̃; the unsmoothed estimators, β̂, γ̂; the restricted unsmoothed estimators, β̂₀, γ̂₀; the smoothed estimators, β̌, γ̌; the restricted smoothed estimators, β̌₀, γ̌₀; and finally the two-step estimator γ̌₂.
The percentiles of all the estimators of $\beta$ appear quite symmetric around the medians, which are mostly zero.

Similar observations can be made for the comparison among the estimators of the threshold parameter $\gamma$. The unsmoothed estimators have smaller RMSE and MAE than the smoothed ones. Knowledge of the true value of $\beta$ helps reduce the RMSE and MAE of the estimators of $\gamma$. The two-step estimator $\check\gamma_2$, which employs the more efficient estimator $\hat\beta$ rather than $\check\beta$, indeed improves upon $\check\gamma$ and even has a smaller RMSE than the restricted estimator $\check\gamma_0$ when $n = 250$. The distributions of all the estimators of $\gamma$ appear quite asymmetric and have large negative biases when $n = 100$. However, for $n = 500$ the distributions become more or less symmetric around the medians and the biases are much smaller than those for $n = 100$. We also note that the biases of the unsmoothed estimators are much bigger than those of the smoothed estimators when $n = 100$ and $n = 250$.
5 Conclusion
We have established the consistency of the LS estimator of the cointegrating vector in general regime switching VECMs, validating the application of some existing results on the joint estimation of long-run and short-run parameters for models with smooth transition. We have also provided asymptotic inference methods for threshold cointegration, establishing the convergence rates and asymptotic distributions of the LS and SLS estimators of the model parameters. In particular, the theory and the Monte Carlo experiments indicate that inference on the threshold parameter can be improved by using the two-step estimator.

While we only considered two-regime models, our results might be extended to multiple-regime models, provided that the assumptions on stationarity and the invariance principle in Assumption 1 hold true. In that case, we may consider the sequential estimation strategy discussed in Bai and Perron (1998) and Hansen (1999): a sequence of estimations and tests can determine the number of regimes and the threshold parameters. The LM test by Hansen and Seo (2002) can be employed to test for the presence of a second break without modification, due to the fast rate of convergence of the cointegrating vector estimators.

It is also possible to consider the case with more than one cointegrating relation if $p$ is greater than 2. In this case, the threshold variable can be understood as a linear combination of those cointegrating vectors. However, the models commonly used in empirical applications are bivariate, and the estimation of such a model is more demanding; we thus leave it for future research.
Proof of Theorems

A word on notation. Throughout this section, $C$ stands for a generic constant that is finite. The proof of Theorem 1 makes use of the following lemma, which might be of independent interest.

Lemma 6 Let $\{Z_n\}$ be a sequence of random variables and $\{r_n\}$ be a sequence of positive numbers such that $r_n\to\infty$. If $a_nZ_n = o_p(1)$ for any sequence $\{a_n\}$ such that $a_n/r_n\to 0$, then $r_nZ_n = O_p(1)$.
Proof of Lemma 6. Let $X_n = r_nZ_n$ and suppose $X_n \neq O_p(1)$. Then, there exists $\varepsilon > 0$ such that for any $K$,
$$\limsup_{n\to\infty}\Pr(|X_n| > K) > \varepsilon.$$
Thus, there exists $n_1$ such that $\Pr(|X_{n_1}| > 1) > \varepsilon$. Similarly, one can find $n_2 > n_1$ such that $\Pr(|X_{n_2}| > 2) > \varepsilon$, and $n_3 < n_4 < \cdots$ accordingly. Let $b_n = 1$ for $n \le n_1$, $b_n = 2$ for $n_1 < n \le n_2$, etc. Then, it is clear from the construction that $b_n\to\infty$, and that $\Pr(|X_n| > b_n) > \varepsilon$ infinitely often (i.o.), which implies that $X_n/b_n \neq o_p(1)$. However, given the condition of the lemma, $X_n/b_n = (r_n/b_n)Z_n = o_p(1)$ since $(r_n/b_n)/r_n\to 0$. This yields a contradiction. □
Proof of Theorem 1. Let $\Theta_{r_n,\delta} = \{\theta\in\Theta : r_n|\beta - \beta_0| > \delta\}$. Supremums and infimums in this proof are taken over the set $\Theta_{r_n,\delta}$, unless specified otherwise. To show that $\hat\beta - \beta_0 = o_p(r_n^{-1})$ we need to show that, for every $\delta > 0$,
$$\Pr\left(\inf S_n(\theta)/n - S_n(\theta_0)/n > 0\right) \to 1. \qquad (8)$$
Let $X_{\beta,\gamma} = \left(X(\beta), X_d(\beta,\gamma)\right)\otimes I_p$ and rewrite (2) as
$$S_n(\theta) = \Delta y'\Delta y + \alpha'X_{\beta,\gamma}'X_{\beta,\gamma}\alpha - 2\Delta y'X_{\beta,\gamma}\alpha.$$
Let $b = \sqrt n\,(\beta - \beta_0)$ and let $r_n$ be a sequence of real numbers such that $r_n\to\infty$ and $r_n/\sqrt n\to 0$ as $n\to\infty$. Then,
$$|b| = \left(\sqrt n/r_n\right)r_n|\beta - \beta_0| \ge \left(\sqrt n/r_n\right)\delta \to \infty \qquad (9)$$
for any $\theta\in\Theta_{r_n,\delta}$.
Note that
1
S ( )
n n
1
1 0
S ( 0) =
X
n n
n
0
;
X
2 0
yX
n
;
+
;
1 0
yy
n
1
S ( 0)
n n
and that y 0 y=n Sn ( 0 ) =n = y 0 y=n u0 u=n converges in probability to a positive constant
as in the standard linear regression due to Assumption 1 (c) and this term is free of the
parameter : Therefore, it is su¢ cient to show that
=
!
Pr inf
(
Pr inf
1;
1 0
X
n
1
j j
n
0
;
0
X
X
0
;
1
2 y0 X
n
;
X
2
j j
;
;
1 y0 X ;
2
n j j
0
!
)
0
(10)
as n ! 1: This follows from (9) if we show that (i)
1 y0 X ;
n j j
sup
X
0
= Op (1) ;
(11)
X
and (ii) inf n1 0 ;j j2 ; converges weakly to a random variable that is positive with probability one.
Show (i) …rst. Note that y 0 X ; =n consists of the sample means of the product of xt and
1; zt 1 ( ) ; 0t 1 and that of xt and 1; zt 1 ( ) ; 0t 1 dt 1 ( ; ) : However, as and
P
P
dt 1 ( ; ) are bounded, it is su¢ cient to observe that n1 t j x0t j ; and n1 t xt 0t 1 are
Op (1) ; and that
1 X
jzt
nj j t
1
1 X
jzt
n t
( ) x0t j
x0t j +
1
1X
n t
x02t
xt p
1
n
= Op (1) ;
p
by the law of large numbers for jzt 1 x0t j ; the weak convergence of x2t = n and the CauchySchwarz inequality.
We consider (ii) now. Let _ = = j j : Since n1 X 0; X ; is the matrix of the sample means
of the outer product of
1; zt
1
0
t 1
( );
; 1; zt
0
t 1
1
( );
n
( ; )
dt
1
0
( ; ) ;
we may write
1
2
nj j
0
X
0
;
X
0
z
=
;
(
Ip )
z
+ Rn ( ) ;
where
n
( ; )=
and
1
n2
P
1
n2
t
P
t
x02t
x02t
1_
2
1_
dt
2
1
( ; )
sup jRn ( )j = Op
by the same reasoning to show (i) : Since
it is su¢ cient to show that
n
( ; )) _
0
_
z
1
n2
1
n2
r
pn
n
P
Pt
t
2
!
x02t
1_
x02t
1_
2
2
dt
1
( ; )
d2t
1
( ; )
!
;
;
is bounded away from zero by Assumption 1 (c) ;
!
R1 2
W2
W 1 fW > 0g
0
R1
R1 2
;
W 1 fW > 0g 0 W 2 1 fW > 0g
0
R1
0
(12)
where W is the standard Brownian motion. Note that the matrix in (12) is positive de…nite
and free of parameters up to a constant multiple _ 0 _ ; which is positive and bounded away
from zero.
Now we show (12) : It follows from Assumption 1 (b) and the continuous mapping theorem
that
Z 1
Z 1
1 X 0
2
0 2
d
x
_
)
(B
_
)
=
W2 _0 _ ;
(13)
2t 1
n2 t
0
0
and that
1 X 0
x2t
n2 t
2
1_
1 x02t
1_ > 0 )
Z
1
2
(B 0 _ ) 1 fB 0 _ > 0g =d _ 0 _
0
Z
1
0
W 2 1 fW > 0g :
Then, it remains to show that
1 X 0
x2t
n2 t
sup
2
1_
1 x02t
1_
d (zt
1
( ); )
= op (1)
(14)
d2 (zt
1
( ); )
= op (1) :
(15)
>0
and that
sup
1 X 0
x2t
n2 t
1_
2
1 x02t
1_
>0
Since 1 fx > 0g g 2 (x)
(supx jg (x)j + 1) j1 fx > 0g g (x)j for any bounded function g
p 2
0
and sup _ ;t x2t 1 _ = n = Op (1), it is su¢ cient to show that
sup
1X
d (zt
n t
1
1 x02t
( ); )
1_
>0
R1n + R2n = op (1) ;
where
R1n
=
R2n
=
1X
1 fzt
n t
1X
sup
jd (zt
n t
Due to (9) and the fact that zt
1X
1 fzt
n t
1
1
( ) > 0g
1
( ); )
sup
( ) > 0g
1
1 x02t
( ) = zt
1_
1
+
1 x02t
1 fzt
x02t
p
1
_
n
>0
1X
1
n t
(
inf
x02[nr] _
p
n
inf
x02[nr] _
p
n
1
1_
>0
( ) > 0gj :
j j;
x02t 1 _
p
n
r n zt
p
n
1
:
(16)
For any mn > 0;
sup 1
j _ j=1
(
x02[nr] _
p
n
r n zt
p
n
1
)
1
1
Consider the …rst term in (17) : Since
E1
(
inf
j _ j=1
x02[nr] _
p
n
j _ j=1
(
x02[nr] _
p
n
mn
j _ j=1
r n zt
p
n
)
mn
! Pr
inf jBr0 _ j
j _ j=1
)
r n zt
p
n
+1
converges weakly on f0
)
1
0
r
1g
1
> mn (17)
:
f _ : j _ j = 1g ;
= 0;
for any r 2 [0:1] and fornany decreasing osequence mn ! 0: And for the second term in
rn zt 1
p
(17) ; we observe that E1
> mn is the same for all t and that it goes to zero as
n
p
p
mn n=rn ! 1: Consider the right hand side term of (16). Letting mn ! 0 and mn n=rn !
1; we observe that
1X
E
1
n t
x02t 1 _
p
n
r n zt
p
n
1
=
Z
1
E1
0
(
x02[nr] _
p
n
)
rn z[nr]
p
n
dr ! 0;
by the dominated convergence theorem. This shows that R1n = op (1) :
Next, it follows from Assumption 1 (c) that
sup jd (zt
1
( ); )
1 fzt
1
( ) > 0gj
x02[nr]
d~ inf z[nr] + p
n
!
;
(18)
where [nr] = t 1: Due to (9) ; for each r and for any sequence f"n g such that "n ! 0 and
p
"n n=rn ! 1 as n ! 1;
"
E d~ inf z[nr] +
"
E d~
+CE
p
n
x02[nr]
p
n
inf
rn
j _ j=1
!#
x02[nr] _
p
n
z[nr]
p rn > "n
n
1
"n
+1
!!
(
x02[nr] _
p
n
E d~
p
x02[nr] _
p
n
n
inf
rn j _ j=1
p
d~ d"n n=rn ! 0:
"n
x02[nr] _
p
n
inf
j _ j=1
n
= Op (1) ; E1 inf j _ j=1
n
o
z[nr]
p rn > "n ! 0; and
1; we have E1
n
As inf j _ j=1
z[nr]
1 sup
j j
x02[nr] _
p
n
!!
"n 1
2"n
)!
(
inf
j _ j=1
x02[nr] _
p
> 2"n
n
)#
:
o
p
< 2"n ! 0: Due to the fact that "n n=rn !
z[nr]
1 sup
j j
"n 1
(
inf
j _ j=1
x02[nr] _
p
> 2"n
n
)
Thus, it follows from the dominated convergence theorem that the integral of (18) over $r$ on the interval $[0,1]$ goes to zero. This shows $R_{2n} = o_p(1)$, thus completing the proof of (8). Then, it follows from Lemma 6 that $\sqrt n\,(\hat\beta - \beta_0) = O_p(1)$.

We turn to showing that $\hat\theta - \theta_0 = o_p(1)$ and $\sqrt n\,(\hat\beta - \beta_0) = o_p(1)$. When $d(x;\gamma)$ is continuous in all its arguments, we can resort to Theorem 1 of de Jong (2002), and it can be readily generalized to the case where the limit objective function is continuous. Recalling Assumption 1 (d) and reading the proof of Theorem 1 of de Jong (2002), we see that we only have to show his equation (A.9) with $a_n = 0$, which corresponds to
"
1X
ut
n t
x02t
p
1
n
; ;
0
ut
x02t
p
1
n
; ;
S
uniformly in ( ; ) on any compact set. However, for any
18
x02t
p
1
n
and
; ;
#
p
! 0;
(19)
and C > 0; the term in
(19) is bounded by
sup
; ;j j C
1X
0
ut ( ; ; ) ut ( ; ; )
n t
S( ; ; ) +
x0 1
1X
p
1 sup 2t
n t
n
>C ;
whose first term converges to zero in probability by assumption, and the second term can be made arbitrarily small by choosing $C$ large enough due to the weak convergence of $x_{2t-1}'/\sqrt n$. Therefore, the proof is complete. □
Proof of Corollary 2. As in the proof of Theorem 1, we need to show that
$$\Pr\left(\inf S_n^*(\theta)/n - S_n^*(\theta_0)/n > 0\right)\to 1,$$
where we follow the notational convention of the proof of that theorem that infimums and supremums are taken over $\Theta_{r_n,\delta}$ unless specified otherwise. As $S_n^*(\theta_0)/n$ does not contain any $I(1)$ variable, the result in Seo and Linton (2007) applies, so that it is sufficient to show (10) with $X_{\beta,\gamma}$ replaced by $X_{\beta,\gamma}^* = \left(X(\beta), X_K(\beta,\gamma)\right)\otimes I_p$. However, (11) is obvious as $K$ is bounded. Thus, we need to show (12), that is,
n
( ; )
=
)
1
n2
_0 _
P
1
n2
t
P
t
x02t
1_
2
1
n2
1
n2
P
t
x02t
1_
2
Kt
P
2 2
0
( ; )
Kt
t x2t 1 _
!
R1 2
R1 2
W
W 1 fW > 0g
R 1 20
R01 2
;
W 1 fW > 0g 0 W 1 fW > 0g
0
x02t
1_
2
Kt
1
1
( ; )
1
( ; )
!
which follows if we show (14) and (15) with $d$ replaced by $K$. The proof hinges on (18), which is the part where $d$ matters. However, it still holds when replacing $d$ with $K$ and taking the supremum over both $\gamma$ and $h$ on any interval including zero. This establishes that $\check\beta$ is $\sqrt n$-consistent. The remaining part of the proof follows if the conditions on $K$ satisfy Assumption 1 (d). In other words, defining $\tilde u_t(\alpha,\delta,\gamma)$ like $u_t(\alpha,\delta,\gamma)$ with the indicator replaced by $K$, we need to show the uniform convergence of its sample mean to $S(\alpha,\delta,\gamma)$, which follows from Seo and Linton (2007) as $\tilde u_t(\alpha,\delta,\gamma)$ consists of stationary variables. This completes the proof. □

A word on notation. Having proved Theorem 1 and Corollary 2, we write hereafter $\theta = (\alpha',\beta',\gamma)'$ and $\theta_1 = (\beta',\gamma)'$ with slight abuse of notation. To further simplify notation, assume $\beta_0 = 0$ and $\gamma_0 = 0$, and thus $\theta_{10}$ is a vector of zeros.
Proof of Theorem 3. Let $\Theta_c = \{\theta : |\theta - \theta_0| < c\}$. Due to the consistency shown in Theorem 1, we may restrict the parameter space to $\Theta_c$ for some $c > 0$, which will be specified below. It is sufficient to show the following claim for $\theta_1$ to be $n$-consistent, as in Chan (1993).

Claim I. For any $\varepsilon > 0$, there exist $c > 0$ and $K > 0$ such that
$$\liminf_{n\to\infty}\Pr\left\{S_n(\theta) - S_n(0,\alpha) > 0 \text{ for all }\theta\in\Theta_{c,K}\right\} > 1 - \varepsilon,$$
where $\Theta_{c,K} = \Theta_c\cap\{\theta : |\theta_1| > K/n\}$.
Proof of Claim I: Let
u1t ( )
t
= ut
u2t ( )
x0
p2t
n
=
and ut ( ) = u1t ( ) + u2t ( ) ; where
0
(A
A0 ) Xt
1
(D
D00 Xt 1
1 zt
>
t 1
=
1
Az + Dz 1 zt
Since u2t (0; ) = 0; (Sn ( )
1
>
0
D0 ) Xt
1 fzt
x02t
p
t 1
11
1
1
zt
1
>
t 1
> 0g
:
n
Sn (0; )) =n = D1n ( ) + D2n ( ) ; where
D1n ( )
=
D2n ( )
=
x2t
p
Note that ~x2n = supt<n
for 2 c : Then,
1X
0
u1t ( ) u1t ( ) u01t (0; ) u1t (0; )
n t
1X
2X
0
0
u2t ( ) u2t ( )
u1t ( ) u2t ( ) :
n t
n t
1
n
= Op (1) and thus ~ n = supt<n
1X
0
u2t ( ) u2t ( )
n t
2
2
O [~x2n ] j j
= Op (j 1 j) = Op (c)
t 1
= Op (c j j) :
(20)
Since fut g is a martingale di¤erence sequence, so is the sequence ut 1 zt
implying
x02t 1
1X
u t 1 zt 1 > t 1 p
= op (1) ;
n t
n
1
>
t 1
;
see e.g. Hansen (1992) : This implies that
1X
0
u1t ( ) u2t ( )
n t
because
1
n
P
t
Xt
11
zt
1
>
t 1
x02t
p
op (j j) + Op (c j j) + op (j j) ;
1
n
1
n
1X
X t 1 1 zt 1 > t
n t
1X
~x2n
jXt 1 j 1 jzt 1 j
n t
1
P
t
1 fzt
Xt
1
x02t 1
1 pn
(21)
= Op (1) and
> 0g 1 zt
1
>
t 1
x02t
p
1
n
t 1
= Op (1) Op (c) :
The meaning of $o_p(|\theta_1|)$ is that the term can be bounded for all large $n$ by the product of $|\theta_1|$ and an arbitrarily small constant with probability $1 - \varepsilon$ for any $\varepsilon > 0$. The same is true for $O_p(c|\theta_1|)$ since $c$ can be chosen arbitrarily small. Thus, we conclude from (20) and (21) that for any $m, \varepsilon > 0$, there is $c > 0$ such that
lim inf Pr fjD2n ( )j
m j 1 j for all
n!1
2
cg
>1
":
(22)
On the other hand, as we show below, for any $\varepsilon > 0$ and all sufficiently large $n$, there exists some constant $m_0 > 0$ such that $D_{1n}(\theta) > m_0|\theta_1|$ with probability $1 - \varepsilon$, which will complete the proof of Claim I as $m$ is arbitrary. By direct calculation as above, we may write
D1n ( )
1X
Xt 1 Xt0 1 1 jzt 1 j
n t
1X
+ (D00 + O (c))
Xt 1 ut 1 jzt 1 j
n t
(D00 + Op (c))
=
D0
t 1
+ Op c2 :
t 1
We …rst argue that the same as (22) can be said for the last two terms. It is obvious for the
last term Op c2 and it is left to show that for any "; > 0 and su¢ ciently large n
Pr
(
The other terms in Xt
b
=
n
sup
2
1
1
c;K
1X
ut 1 jzt
n t
1j
t 1
= j 1j >
= bi1 ; :::; bik
(
0
2
1j
1j
t
EFt
1
t 1
M
1X
x2t
E p
n t
n
= K=n b ; :::; b
ik 0
>
)
2
t
Eu2t 1 jzt
However, since the conditional distribution, say, Ft
is bounded by M , an expansion of it yields that
X
By the Markov inequality
t 1
0 2 2
2
K 2 j(bi1 ; :::; bik )j
b:
i1
t 1
K=n (bi1 ; :::; bik )
X
1
i1 ;:::;ik
i1 ;:::;ik
(23)
o
K=n : j 1 j < c; and i1 ; :::; ik = 1; 2; ::: ;
1X
Pr sup
ut 1 jzt
n t
b
P
X E 1
t ut 1 jzt
n
X
< ":
can be analyzed similarly. To show (23), we consider a grid
for b > 1 and …rst show (23) when the supremum is taken over
=
)
1
of zt
1j
1
t 1
given x2t
:
(24)
1
has a density, which
2
K
bi1 ; :::; bik
=O K
bi1 ; :::; bik
:
(25)
P
1
< 1: Thus, letting K large makes
Furthermore, b > 1 implies that i1 ;:::;ik bi1 ; :::; bik
the term in (24) smaller than any given " > 0:
Next, if a1 and a2 are any two adjacent points in b ; then ja1 a2 j ja1 j (b 1) K=n
c (b 1) K=n: Also, let 1t and 2t denote 0t s corresponding to any two points (say, 11 and
12 ; )
lying between a1 and a2 ; then
1 jzt
(
1
1j
1 jzt
1t 1
1t 1
t;
sup j
2t 1
2t j
1t
11 ; 12
1j
jzt
1j
1t 1
~x2n c (b
1)
and
t;
sup j
2t j
1t
11 ; 12
+ sup j
t;
11 ; 12
1t
)
2t j
K
;
n
where the supremum is taken over t < n and over 11 and 12 between a1 and a2 . Since a1
and a2 were chosen arbitrary, the supremum can be extended to the collection of all 11 and
12 that lie between any two adjacent point in
b and by the same reasoning as (25) ;
E sup
11 ; 12
X
t
ut 1 jzt
1j
1 jzt
1t 1
1j
= O (c (b
2t 1
1) K) :
Since $b$ can be chosen arbitrarily close to 1, this completes the proof of (23). Turning to the first term of $D_{1n}$, let $F$ denote the distribution function of $|z_t|$; then it follows from the standard uniform law of large numbers that
sup
jxj;j
1j
C
1X
j1 fjzt
n t
p
jx0 1 jg
1j
F (jx0 1 j)j ! 0;
for any C < 1: Since C is arbitrary and ~x2n = Op (1) ;
sup
j
1j
C
1X
1 jzt
n t
_t
1j
F
1
p
_t
! 0:
1
(26)
p
Since x2[nr] = n ) B (r) and F is continuous, it follows from (26) that
1X
1 jzt
n t
1j
_t
1
)
Z
1
0
B (r)
F
d
dr =
Z
1
F (j +
0
W (r)j) dr;
0
0
where W is a standard Brownian motion and the equality in distribution follows from the
normality of B: Then, for all large n;
(
1X
Pr
1 jzt
n t
Z 1
Pr
F (j +
1j
0
_t
m5 j 1 j
1
W (r)j) dr
0
1
m5 j 1 j
0 for all
0 for all
1
2
1
2
c
)
c
"=2
";
where the last inequality is shown below in several steps.
First, there is m1 such that F (z) m1 z for all z 2 [0; 1] and sup0 r 1 jW (r)j = Op (1)
and j j and j j are bounded by c: Thus, choosing c small we can argue that for any " > 0;
22
with probability greater than 1
Z
";
Z
1
0
F (j +
W (r)j) dr
0
1
0
m1 j +
0
W (r)j dr:
Second, since j j c; if jW (r)j c; then j + 0 W (r)j 2 0 jW (r)j 2m2 j j jW (r)j ;
for some m2 as is positive de…nite. Thus, choosing c small, Pr finf 0 r 1 jW (r)j cg > 1 "
and thus with probability greater 1 ";
Z
1
0
m1 j +
0
W (r)j dr
m3 j j
Z
1
jW (r)j dr:
0
nR
o
1
Third, for any " > 0; there is m4 such that Pr 0 jW (r)j dr > m4 > 1
conclude that for any " > 0; there exists a constant m5 > 0 such that
Z
Pr
": Thus, we can
1
0
F (j +
W (r)j) dr
m5 j 1 j
0
0
>1
"=2:
Proof of Theorem 4. To derive the limit distribution of the SLS estimator $\check\theta$, define $T_n(\theta) = \frac{\partial S_n^*(\theta)}{n\,\partial\theta}$ and $Q_n(\theta) = \frac{\partial^2S_n^*(\theta)}{n\,\partial\theta\,\partial\theta'}$. Then, by the mean value theorem,
$$\sqrt n\,D_n^{-1}\left(\check\theta - \theta_0\right) = -\left(D_nQ_n(\tilde\theta)D_n\right)^{-1}\sqrt n\,D_nT_n(\theta_0),$$
where $D_n$ is a diagonal matrix whose first $p-1$ elements are $(h/n)^{1/2}$, whose $p$-th element is $\sqrt h$, and whose other elements are 1, and $\tilde\theta$ lies between $\check\theta$ and $\theta_0$. Recall that $X_{t-1}'(\beta)D = X_{t-1}'D + x_{2t-1}'\beta\,D_z'$ and write the residuals of the SLS as
et ( )
= ut
0
(A
D 0 Xt
A0 ) Xt
(Kt
1
1
0
(D
1
( ; )
D0 ) Xt
dt
1)
1 dt 1
(27)
(Az + Dz Kt
1
( ; ))
x02t 1
p
n
:
Then,
0
@et ( )
@
0
@et ( )
@
0
@et ( )
@
=
x2t
1
Xt0
=
0
= @
"
A0z
1D
+
Dz0 Kt 1
x0
p
+ Dz0 2t
Xt0
Kt
1
1D
( ; )+
1
n
+ Dz0
( ; ) Xt0
x0 1
p
Dz0 2t
1D
n
+
Xt0 1 D
(1)
1
Kt
x02t
p
+
( ; )
h
1
n
x0
Dz0 2tpn1
(1)
1
Kt
( ; )
h
#
(28)
(29)
Ip
Ip
1
A:
Convergence of $T_n$

The asymptotic distribution of
$$\sqrt n\,D_nT_n(\theta_0)/2 = \frac{1}{\sqrt n}D_n\sum_t\frac{\partial e_t(\theta_0)}{\partial\theta}e_t(\theta_0)$$
has been developed in Seo and Linton (2007), except for the part corresponding to $\beta$. Thus, we focus on the convergence of
p
h X @et ( 0 )
et ( 0 ) =
2n t
@
1X
x2t
n t
0
p
1(
hv1t +
p
p
hv2t + v3t = h);
where
v1t
0
= A0z0 ut + Dz0
Kt
v2t
=
v3t
=
1 ut ;
0
(A0z0 + Dz0
Kt 1 ) D00 Xt 1 (Kt 1
(1)
Kt 1 Xt0 1 D0 [ut D00 Xt 1 (Kt 1
dt
dt
1) ;
1 )] ;
p
and that of covariances of nDn Tn ( 0 ) :
Since $v_{1t}$ is a martingale difference array, $\frac{1}{n}\sum_tx_{2t-1}v_{1t}=O_p(1)$ due to the convergence of stochastic integrals (see e.g. Kurtz and Protter 1991). The convergence of $n^{-1}h^{-1/2}\sum_tx_{2t-1}v_{2t}$ is similar to that of $\frac{1}{n\sqrt{h}}\sum_tx_{2t-1}v_{3t}$, which we will establish here. Then, it follows that

$$\frac{1}{n}\sum_tx_{2t-1}\left(v_{1t}+v_{2t}\right)\sqrt{h}=o_p(1),$$

as $h\to0$. Let $\bar{v}_{3t}=\left(v_{3t}-Ev_{3t}\right)/\sqrt{h}$; then $\bar{v}_{3t}$ is a zero mean strong mixing array. Seo and Linton (2007, Lemma 2) have shown that $\sqrt{n/h}\,Ev_{3t}\to0$ and $\mathrm{var}\left[(hn)^{-1/2}\sum_tv_{3t}\right]=\mathrm{var}\left[v_{3t}/\sqrt{h}\right]+o(1)\to\sigma_v^2$, which is defined in $(5)$. This implies that

$$\frac{1}{n}\sum_tx_{2t-1}\,Ev_{3t}/\sqrt{h}=o_p(1),$$

and that

$$n^{-1/2}\sum_{t=2}^{[nr]}\begin{pmatrix}\Delta x_{2t}\\ \bar{v}_{3t}\end{pmatrix}\Rightarrow\begin{pmatrix}B\\ \sigma_vW\end{pmatrix}, \qquad (30)$$

due to Assumption 4 and the invariance principle of Wooldridge and White (1988, Theorem 2.11), where $W$ is a standard Brownian motion that is independent of $B$. For the independence between $B$ and $W$, see Lemma 2 of Seo and Linton (2007), which shows that $\sum_{s,t=1}^nE\left(x_s\bar{v}_{3t}\right)=o(1)$.
For the convergence of $\frac{1}{n}\sum_tx_{2t-1}\bar{v}_{3t}$, we resort to Hansen (1992, Theorem 3.1). Checking his conditions, we observe that the moment condition for $\bar{v}_{3t}$ is not met, that is, $E|\bar{v}_{3t}|^p\to\infty$ for $p>2$. However, that condition is not necessary; it is used only to show that $\sup_{1\le t\le n}n^{-1/2}\left|\sum_{k=1}^{\infty}E\left[\bar{v}_{3t+k}\mid\mathcal{F}_t\right]\right|=o_p(1)$, where $\mathcal{F}_t$ is the natural filtration at time $t$. Using the Markov inequality, he obtains that for $p>2$

$$\Pr\left\{\sup_{1\le t\le n}n^{-1/2}\left|\sum_{k=1}^{\infty}E\left[\bar{v}_{3t+k}\mid\mathcal{F}_t\right]\right|\ge\varepsilon\right\}\le\frac{C\,E|\bar{v}_{3t}|^p}{\varepsilon^pn^{p/2-1}}\to0 \qquad (31)$$

if $E|\bar{v}_{3t}|^p<\infty$. Now we show that, while $E|\bar{v}_{3t}|^p$ is not bounded for $p>2$, it diverges at a rate slower than $n^{p/2-1}$, so that $(31)$ still holds. As $\sqrt{n/h}\,Ev_{3t}\to0$ and the part associated with $u_t$ can be handled in the same manner, we focus on

$$E\left|h^{-1/2}K^{(1)}_{t-1}X_{t-1}'D_0D_0'X_{t-1}\left(K_{t-1}-d_{t-1}\right)\right|^p$$
$$=h^{-p/2}\int\!\!\int\left|X'D_0D_0'X\right|^p\left|K^{(1)}\!\left(\frac{z-\gamma_0}{h}\right)\right|^p\left|K\!\left(\frac{z-\gamma_0}{h}\right)-1\{z>\gamma_0\}\right|^pf(z|X)\,dz\,dP_X$$
$$=h^{1-p/2}\int\!\!\int\left|X'D_0D_0'X\right|^p\left|K^{(1)}(s)\right|^p\left|K(s)-1\{s>0\}\right|^pf(hs+\gamma_0|X)\,ds\,dP_X,$$

where $P_X$ is the distribution of $X_{t-1}$ and the last equality follows by the change of variables. Note that $f(\cdot|X)$ is bounded for almost every $X$, $\left|K(s)-1\{s>0\}\right|^p$ is bounded, $\left|K^{(1)}\right|^p$ is integrable, and $E\left|X'D_0D_0'X\right|^p<\infty$. As $h^{1-p/2}/n^{p/2-1}\to0$ for $p>2$, we conclude that $(31)$ converges to zero. Therefore,

$$\frac{\sqrt{h}}{2\sigma_vn}\sum_t\frac{\partial e_t(\theta_0)}{\partial\beta_1}e_t(\theta_0)\Rightarrow\int_0^1B\,dW. \qquad (32)$$
Finally, note that an argument similar to the one that showed the asymptotic independence in $(30)$ yields the asymptotic independence between the score for $\beta_1$ and the scores for the remaining parameters. For the covariance between the scores for $\beta_1$ and $\gamma$, note that $v_{3t}=h\,\frac{\partial e_t(\theta_0)'}{\partial\gamma}e_t(\theta_0)$.
Convergence of Qn

To begin with, we claim the following lemma.

Lemma 7 Under the conditions of this theorem, $h^{-1}\left(\hat\gamma-\gamma_0\right)$ and $h^{-1}\left(\hat\beta_1-\beta_{10}\right)$ are $o_p(1)$.

Proof of Lemma 7
Recalling the formulas $(27)$ and $(29)$, we may write

$$\frac{1}{n}\sum_t\frac{\partial e_t(\theta)'}{\partial\gamma}e_t(\theta)=T_{1n}(\theta)+\cdots+T_{10n}(\theta), \qquad (33)$$

where

$$T_{1n}(\theta)=\frac{1}{nh}\sum_tK^{(1)}_{t-1}(\beta,\gamma)X_{t-1}'Du_t,$$
$$T_{2n}(\theta)=\mathrm{tr}\left[D'\,\frac{1}{nh}\sum_tK^{(1)}_{t-1}(\beta,\gamma)X_{t-1}X_{t-1}'\left(A-A_0\right)\right],$$
$$T_{3n}(\theta)=\mathrm{tr}\left[D'\,\frac{1}{nh}\sum_tX_{t-1}X_{t-1}'K^{(1)}_{t-1}(\beta,\gamma)\left(K_{t-1}(\beta,\gamma)-d_{t-1}\right)D\right],$$

and so on, until we reach

$$T_{10n}(\theta)=\frac{1}{nh}\sum_t\frac{(\beta_1-\beta_{10})'x_{2t-1}}{\sqrt{n}}K^{(1)}_{t-1}(\beta,\gamma)\,D_z'\left(A_z+D_zK_{t-1}(\beta,\gamma)\right)\frac{x_{2t-1}'(\beta_1-\beta_{10})}{\sqrt{n}}.$$
As we proved the consistency of the smoothed least squares estimator of $\theta$ in Corollary 2, we can find a sequence $r_n\to0$ such that $\Pr\left\{|\hat\theta-\theta_0|>r_n\right\}\to0$. Without loss of generality, we restrict the parameter space to $\Theta_{r_n}=\left\{\theta:|\theta-\theta_0|\le r_n\right\}$. We first show that $\sup_{\Theta_{r_n}}\left|T_{in}(\theta)\right|=o_p(1)$ for all $i\ne3$, which implies that

$$T_{3n}\left(\hat\theta\right)=o_p(1), \qquad (34)$$

since $(33)$ equals zero at $\theta=\hat\theta$ due to the first order condition of the SLS estimation.
First take $T_{1n}(\theta)$. In particular, consider

$$\frac{1}{nh}\sum_tK^{(1)}\!\left(\frac{z_{t-1}+x_{2t-1}'\delta_1/\sqrt{n}-\gamma}{h}\right)u_t,$$

and the other terms in the vector can be handled similarly. As in Seo and Linton, we may assume that $u_t$ is bounded by some constant $C<\infty$ and divide the parameter space for $(\delta_1,\gamma)$ in $\Theta_{r_n}$ into $\eta_n$ non-overlapping pieces $\Theta_{ni}$, $i=1,\ldots,\eta_n$, such that the distance between any two points in each piece is smaller than or equal to $h^{2+\epsilon}$ for some $\epsilon>0$. Then, $\eta_n=O\left(h^{-3(p-1)(1+\epsilon/2)}\right)$. An application of a Hoeffding-type inequality for martingale difference sequences (Azuma 1967) yields that, for $(\delta_{1i},\gamma_i)\in\Theta_{ni}$ and some constant $C_1$,

$$\Pr\left\{\sup_i\left|\frac{1}{nh}\sum_tu_tK^{(1)}\!\left(\frac{z_{t-1}+x_{2t-1}'\delta_{1i}/\sqrt{n}-\gamma_i}{h}\right)\right|>\varepsilon\right\}$$
$$\le\sum_i\Pr\left\{\left|\frac{1}{nh}\sum_tu_tK^{(1)}\!\left(\frac{z_{t-1}+x_{2t-1}'\delta_{1i}/\sqrt{n}-\gamma_i}{h}\right)\right|>\varepsilon\right\}\le\eta_n\exp\left(-nh^2\varepsilon^2/C_1\right)\to0,$$

as $nh^2\to\infty$.
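For reference, the martingale version of the Hoeffding bound invoked above (Azuma 1967) can be stated, in one standard form, as follows; this is a textbook statement rather than a restatement of the exact lemma used in the paper: for a martingale difference sequence $m_1,\ldots,m_n$ with $|m_t|\le c_t$ almost surely,

$$\Pr\left\{\left|\sum_{t=1}^nm_t\right|\ge\varepsilon\right\}\le2\exp\left(-\frac{\varepsilon^2}{2\sum_{t=1}^nc_t^2}\right).$$

Applied with $m_t=u_tK^{(1)}(\cdot)/(nh)$, which is bounded by a constant multiple of $(nh)^{-1}$ when $u_t$ and $K^{(1)}$ are bounded, the sum $\sum_tc_t^2$ is of order $(nh^2)^{-1}$, which is what produces the $\exp\left(-nh^2\varepsilon^2/C_1\right)$ tail above.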
And for any $(\delta_1,\gamma)$ in $\Theta_{ni}$,

$$\left|\frac{1}{nh}\sum_t\left[K^{(1)}\!\left(\frac{z_{t-1}+x_{2t-1}'\delta_1/\sqrt{n}-\gamma}{h}\right)-K^{(1)}\!\left(\frac{z_{t-1}+x_{2t-1}'\delta_{1i}/\sqrt{n}-\gamma_i}{h}\right)\right]u_t\right|$$
$$\le\frac{1}{nh}\sum_t\sup\left|K^{(2)}\right|\left(1+\left|\frac{x_{2t-1}}{\sqrt{n}}\right|\right)\frac{h^{2+\epsilon}}{h}\left|u_t\right|=O_p\left(h^{\epsilon}\right),$$

as $\sup_{1\le t\le n}\left|x_{2t-1}/\sqrt{n}\right|=O_p(1)$ and $K^{(2)}$ is bounded.
Next, we study the convergence of $T_{3n}(\theta)$. The same analysis applies to the $T_{in}$'s for $i=2,4,5,\ldots,10$, to yield that they are all $o_p(1)$. Note, for example, that $T_{2n}$ is the product of $(A-A_0)$, which is bounded by $r_n$, and of a sample mean, which is $O_p(1)$ by the same method as below.
Define

$$T_t(x,\theta_1)=X_{t-1}X_{t-1}'\,K^{(1)}\!\left(\frac{z_{t-1}+x'\delta_1-\gamma}{h}\right)\left[K\!\left(\frac{z_{t-1}+x'\delta_1-\gamma}{h}\right)-1\left\{z_{t-1}+x'\delta_1>\gamma\right\}\right]$$

and

$$\varsigma(x,\theta_1)=E\left(T_t(x,\theta_1)\right).$$

Furthermore, letting $\dot\gamma=\gamma-x'\delta_1$, define

$$\dot{T}_t(\dot\gamma)=X_{t-1}X_{t-1}'\,K^{(1)}\!\left(\frac{z_{t-1}-\dot\gamma}{h}\right)\left[K\!\left(\frac{z_{t-1}-\dot\gamma}{h}\right)-1\left\{z_{t-1}>\dot\gamma\right\}\right]$$

and

$$\dot\varsigma(\dot\gamma)=E\left(\dot{T}_t(\dot\gamma)\right).$$
Assume that $x$ belongs to a compact set $\Theta_x$, and thus $\dot\gamma$ lies within an interval $\Theta_{gn}$ whose length shrinks to zero as $|\delta_1|\le r_n\to0$. Then, it is clear that

$$\sup_{x\in\Theta_x,\ \theta_1\in\Theta_{r_n}}\left|\frac{1}{nh}\sum_t\left(T_t(x,\theta_1)-\varsigma(x,\theta_1)\right)\right|\le\sup_{\dot\gamma\in\Theta_{gn}}\left|\frac{1}{nh}\sum_t\left(\dot{T}_t(\dot\gamma)-\dot\varsigma(\dot\gamma)\right)\right|=o_p(1), \qquad (35)$$

where the last equality is due to Lemma 4 $(17)$ of Seo and Linton (2007). Since $(35)$ holds for any compact set $\Theta_x$ and $\sup_t\left|n^{-1/2}x_{2t-1}\right|=O_p(1)$, $(35)$ holds true when $x$ is replaced with $n^{-1/2}x_{2t-1}$. This implies that

$$\sup_{\Theta_{r_n}}\left|T_{3n}(\theta)-\mathrm{tr}\left[D'\left(\frac{1}{nh}\sum_t\varsigma\!\left(\frac{x_{2t-1}}{\sqrt{n}},\theta_1\right)\right)D\right]\right|=o_p(1), \qquad (36)$$

and thus

$$\frac{1}{nh}\sum_t\varsigma\!\left(\frac{x_{2t-1}}{\sqrt{n}},\hat\theta_1\right)=o_p(1), \qquad (37)$$
due to $(34)$. Furthermore,

$$\frac{1}{nh}\sum_t\varsigma\!\left(\frac{x_{2t-1}}{\sqrt{n}},\theta_{10}\right)=\frac{1}{nh}\sum_t\dot\varsigma(0)=o(1), \qquad (38)$$

where the last equality is due to $(24)$ of Seo and Linton (2007). Expanding the $(1,1)$ element of $(37)$ around $\theta_{10}$ yields that, for $\tilde\theta_1$ between $\hat\theta_1$ and $\theta_{10}$,

$$\frac{1}{nh}\sum_t\varsigma_{11}\!\left(\frac{x_{2t-1}}{\sqrt{n}},\hat\theta_1\right)=\frac{1}{nh}\sum_t\varsigma_{11}\!\left(\frac{x_{2t-1}}{\sqrt{n}},\theta_{10}\right)+\left[\frac{\partial}{\partial\theta_1}\frac{1}{nh}\sum_t\varsigma_{11}\!\left(\frac{x_{2t-1}}{\sqrt{n}},\tilde\theta_1\right)\right]\left(\hat\theta_1-\theta_{10}\right),$$

where $\varsigma_{11}$ denotes the $(1,1)$ element of $\varsigma$. Given $(37)$ and $(38)$, this implies that $h^{-1}\left(\hat\theta_1-\theta_{10}\right)=o_p(1)$ if

$$\frac{\partial}{\partial\theta_1}\frac{1}{n}\sum_t\varsigma_{11}\!\left(\frac{x_{2t-1}}{\sqrt{n}},\tilde\theta_1\right)\ne o_p(1). \qquad (39)$$
However,

$$\frac{\partial}{\partial\gamma}E\left[K^{(1)}\!\left(\frac{z_{t-1}+x'\delta_1-\gamma}{h}\right)\left(K\!\left(\frac{z_{t-1}+x'\delta_1-\gamma}{h}\right)-1\left\{z_{t-1}+x'\delta_1>\gamma\right\}\right)\right]\frac{1}{h}$$
$$=-E\left[K^{(2)}\!\left(\frac{z_{t-1}+x'\delta_1-\gamma}{h}\right)\left(K\!\left(\frac{z_{t-1}+x'\delta_1-\gamma}{h}\right)-1\left\{z_{t-1}+x'\delta_1>\gamma\right\}\right)\right]\frac{1}{h^2}$$
$$\quad+E\left[K^{(1)}\!\left(\frac{z_{t-1}+x'\delta_1-\gamma}{h}\right)^2\right]\frac{1}{h^2}-K^{(1)}(0)\,f\!\left(\gamma-x'\delta_1\right)\frac{1}{h},$$
where $f$ denotes the density of $z_t$. Using the change-of-variables formula,

$$E\left[K^{(1)}\!\left(\frac{z_{t-1}-\dot\gamma}{h}\right)^2\right]\frac{1}{h}=\frac{1}{h}\int_{-\infty}^{\infty}K^{(1)}\!\left(\frac{z-\dot\gamma}{h}\right)^2f(z)\,dz=\int_{-\infty}^{\infty}K^{(1)}(z)^2f(hz+\dot\gamma)\,dz\to\int_{-\infty}^{\infty}K^{(1)}(z)^2\,dz\ f(0),$$

by the dominated convergence theorem, as $h$ and $\dot\gamma$ go to zero. Similarly,

$$E\left[K^{(2)}\!\left(\frac{z_{t-1}-\dot\gamma}{h}\right)\left(K\!\left(\frac{z_{t-1}-\dot\gamma}{h}\right)-1\left\{z_{t-1}>\dot\gamma\right\}\right)\right]\frac{1}{h}\to\int_{-\infty}^{\infty}K^{(2)}(z)\left(K(z)-1\{z>0\}\right)dz\ f(0).$$
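As a quick numerical illustration of the first of these limits (not part of the proof), one can take $K=\Phi$, the Gaussian CDF, so that $K^{(1)}=\phi$, together with a standard normal density $f$, and check that $h^{-1}E\,K^{(1)}\!\left((z-\dot\gamma)/h\right)^2$ approaches $f(0)\int K^{(1)}(s)^2ds$ as $h$ and $\dot\gamma$ shrink. The kernel and the density are chosen here purely for the illustration and are not the ones required by the paper's assumptions.

```python
# Numerical check of (1/h) E[ K'((z - gdot)/h)^2 ] -> f(0) * \int K'(s)^2 ds,
# with K = Phi (Gaussian CDF), K' = phi, and z ~ N(0,1); illustrative choices only.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def lhs(h, gdot):
    integrand = lambda z: norm.pdf((z - gdot) / h) ** 2 * norm.pdf(z) / h
    # integrate over the window where the squared kernel derivative has its mass
    return quad(integrand, gdot - 12 * h, gdot + 12 * h)[0]

limit = norm.pdf(0) * quad(lambda s: norm.pdf(s) ** 2, -np.inf, np.inf)[0]
for h in (0.5, 0.1, 0.02):
    print(h, lhs(h, gdot=h), limit)   # lhs approaches the limit as h (and gdot) -> 0
```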
Furthermore, it follows from integration by parts that

$$\int_{-\infty}^{\infty}K^{(1)}(s)^2\,ds-\int_{-\infty}^{\infty}K^{(2)}(s)\left(1\{s>0\}-K(s)\right)ds=-\int_{-\infty}^{\infty}K^{(2)}(s)\,1\{s>0\}\,ds=K^{(1)}(0).$$

Since $\sup_{1\le t\le n,\ \theta_1\in\Theta_{r_n}}\left|x_{2t-1}'\delta_1/\sqrt{n}\right|=o_p(1)$, we obtain

$$\frac{\partial}{\partial\theta_1}\frac{1}{n}\sum_t\varsigma_{11}\!\left(\frac{x_{2t-1}}{\sqrt{n}},\tilde\theta_1\right)=\frac{1}{n}\sum_t\int_{-\infty}^{\infty}K^{(2)}(z)\left(K(z)-1\{z>0\}\right)dz\ f(0)\begin{pmatrix}x_{2t-1}/\sqrt{n}\\1\end{pmatrix}+o_p(1).$$

Thus, $(39)$ follows since $\int_{-\infty}^{\infty}K^{(2)}(z)\left(K(z)-1\{z>0\}\right)dz>0$ and $f(0)>0$.
In view of Lemma 7 and its proof, we can restrict the parameter space to

$$\Theta_n=\left\{\theta\in\Theta:\ |\theta-\theta_0|<r_n,\ h^{-1}|\gamma-\gamma_0|<r_n,\ h^{-1}|\beta_1-\beta_{10}|<r_n\right\},$$

for a sequence $r_n\to0$. Let

$$Q_n^a(\theta)=\frac{1}{n}\sum_t\frac{\partial e_t(\theta)'}{\partial\theta}\frac{\partial e_t(\theta)}{\partial\theta'},\qquad Q_n^b(\theta)=\frac{1}{n}\sum_t\sum_{i=1}^p\frac{\partial^2e_{it}(\theta)}{\partial\theta\,\partial\theta'}e_{it}(\theta),$$

where the subscript $i$ of a matrix (or a vector) indicates the $i$-th column (element) of the matrix (the vector).
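For completeness, the factor of two in the decomposition used next comes from differentiating the sum of squared residuals twice:

$$\frac{1}{n}\frac{\partial^2}{\partial\theta\,\partial\theta'}\sum_te_t(\theta)'e_t(\theta)=\frac{2}{n}\sum_t\sum_{i=1}^p\left[\frac{\partial e_{it}(\theta)}{\partial\theta}\frac{\partial e_{it}(\theta)}{\partial\theta'}+\frac{\partial^2e_{it}(\theta)}{\partial\theta\,\partial\theta'}e_{it}(\theta)\right].$$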
Then, $Q_n(\theta)=2Q_n^a(\theta)+2Q_n^b(\theta)$. Start with $Q_n^b(\theta)$; in particular, with
$$\sum_t\frac{\partial^2e_{it}(\theta)}{\partial\beta_1\,\partial\beta_1'}e_{it}(\theta)=\sum_t\frac{x_{2t-1}x_{2t-1}'}{n}\left(2D_{zi}\frac{K^{(1)}_{t-1}(\beta,\gamma)}{h}+X_{t-1}(\beta)'D_i\frac{K^{(2)}_{t-1}(\beta,\gamma)}{h^2}\right)e_{it}(\theta). \qquad (40)$$

Since $h\to0$ and $\sup_{1\le t\le n}\left|x_{2t-1}'(\beta_1-\beta_{10})\right|/(h\sqrt{n})=o_p(1)$, we have $X_{t-1}(\beta)'D_i=X_{t-1}'D_i+o_p(1)$, and the leading term in $(40)$ with the normalization is

$$\frac{h}{n^2}\sum_tx_{2t-1}x_{2t-1}'\,X_{t-1}'D_i\,\frac{K^{(2)}_{t-1}(\beta,\gamma)}{h^2}\,e_{it}(\theta)\Rightarrow\tilde\sigma^2_{qi}\int\tilde{K}_2\int_0^1BB', \qquad (41)$$

where $\tilde{K}_2(s)=K^{(2)}(s)\left(K(s)-1\{s>0\}\right)$ and $\tilde\sigma^2_{qi}=E\left(D_{0i}'X_{t-1}X_{t-1}'D_{0i}\mid z_{t-1}=\gamma_0\right)f(\gamma_0)$.
The convergence in $(41)$ can be achieved in the same way as in Lemma 7. That is, decomposing it as in the lemma, we can easily show that the $T_{in}$'s are negligible except for $i=3$. For $T_{3n}$, the same argument as $(35)$ holds with

$$\varsigma(x,\theta)=\frac{1}{hn^2}\sum_txx'\,E\left[X_{t-1}'D_i\,K^{(2)}\!\left(\frac{z_{t-1}-\dot\gamma}{h}\right)\left(K\!\left(\frac{z_{t-1}-\dot\gamma}{h}\right)-1\left\{z_{t-1}>\dot\gamma\right\}\right)\right],$$

where $\dot\gamma=\gamma-x'\delta_1$. Since $\sup_{t\le n,\ \theta\in\Theta_n}\left|x_{2t-1}'(\beta_1-\beta_{10})\right|/(h\sqrt{n})=o_p(1)$ and, by the change-of-variables technique,

$$E\left[X_{t-1}'D_i\,K^{(2)}\!\left(\frac{z_{t-1}-\dot\gamma}{h}\right)\left(K\!\left(\frac{z_{t-1}-\dot\gamma}{h}\right)-1\left\{z_{t-1}>\dot\gamma\right\}\right)\right]\frac{1}{h}\to E\left(D_i'X_{t-1}X_{t-1}'D_i\mid z_{t-1}=\gamma_0\right)f(\gamma_0)\int\tilde{K}_2,$$

the convergence in $(41)$ follows.

The remaining terms in $Q_n^b(\theta)$ are the second derivatives

$$\frac{\partial^2e_{it}(\theta)}{\partial\gamma^2}=X_{t-1}(\beta)'D_i\,\frac{K^{(2)}_{t-1}(\beta,\gamma)}{h^2}\quad\text{and}\quad\frac{\partial^2e_{it}(\theta)}{\partial\gamma\,\partial\beta_1'}=\left(D_{zi}\frac{K^{(1)}_{t-1}(\beta,\gamma)}{h}+X_{t-1}(\beta)'D_i\frac{K^{(2)}_{t-1}(\beta,\gamma)}{h^2}\right)\frac{x_{2t-1}'}{\sqrt{n}},$$

together with the cross derivatives with respect to $\mathrm{vec}(A,D)$, which involve $K^{(1)}_{t-1}(\beta,\gamma)/h$, $X_{t-1}(\beta)$, $x_{2t-1}/\sqrt{n}$, and the selection vectors $\iota_i$ and $\iota_2$,
where $\iota_2=(0,1,0,\ldots,0)'$, whose dimension is $(pl+2)$. As their convergence can be analyzed similarly as above, we omit the details and conclude that

$$D_nQ_n^bD_n\Rightarrow\tilde\sigma^2_q\int\tilde{K}_2\begin{pmatrix}\int_0^1BB'&\int_0^1B&0\\\int_0^1B'&1&0\\0&0&0\end{pmatrix},$$

where $\tilde\sigma^2_q=\sum_{i=1}^p\tilde\sigma^2_{qi}$. Similarly,

$$D_nQ_n^aD_n\Rightarrow\begin{pmatrix}\tilde\sigma^2_q\int K^{(1)}(s)^2ds\begin{pmatrix}\int_0^1BB'&\int_0^1B\\\int_0^1B'&1\end{pmatrix}&0\\[1ex]0&E\left[\begin{pmatrix}1&d_{t-1}\\d_{t-1}&d_{t-1}\end{pmatrix}\otimes X_{t-1}X_{t-1}'\right]\otimes I_p\end{pmatrix}.$$

Finally, note that $\int K^{(1)}(s)^2\,ds+\int\tilde{K}_2=K^{(1)}(0)$ by an application of integration by parts, which yields the desired result.
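As a sanity check on this integration-by-parts identity (purely illustrative, with the Gaussian CDF used as the kernel only for the example; it is not the kernel prescribed by the paper's assumptions), the two sides can be compared numerically:

```python
# Numerical check of  \int K'(s)^2 ds + \int K''(s)(K(s) - 1{s>0}) ds = K'(0)
# for a smooth CDF-type kernel; K = Phi (Gaussian CDF) is used purely as an example.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

K, K1 = norm.cdf, norm.pdf
K2 = lambda s: -s * norm.pdf(s)            # second derivative of the Gaussian CDF

term1 = quad(lambda s: K1(s) ** 2, -np.inf, np.inf)[0]
# split at 0 because K(s) - 1{s>0} has a kink there
term2 = quad(lambda s: K2(s) * (K(s) - (s > 0)), -np.inf, 0)[0] \
      + quad(lambda s: K2(s) * (K(s) - (s > 0)), 0, np.inf)[0]
print(term1 + term2, K1(0))                # both approximately 0.3989
```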
The convergence of $Q_n(\tilde\theta)$ is a direct consequence of Theorem 3, which obtains the convergence rates of $\hat\beta$ and $\hat\gamma$, and of Theorem 5 of Seo and Linton (2007).
Proof of Corollary 5

Let $\beta_n=\beta_0+O_p\left(n^{-1}\right)$ and $\theta_2=(\alpha,\gamma)$. The consistency of $\hat\theta_2$ is obvious, since we established the consistency in Corollary 2 based on the $\sqrt{n}$-consistency of $\hat\beta$. For the limit distribution, let $T_{2n}(\theta_2,\beta)$ and $Q_{2n}(\theta_2,\beta)$ denote the score and the Hessian of the sum of squares function with a fixed $\beta$. We have already derived the convergence of the Hessian $Q_n(\theta)$ for $\hat\beta=\beta_0+o_p(1)$; thus, we only have to examine the score $T_{2n}$. Let

$$\ddot{e}_t\left(\theta_2,\beta_n\right)=u_t-(A-A_0)'X_{t-1}-(D-D_0)'X_{t-1}d_{t-1}-D'X_{t-1}\left(K_{t-1}(\beta_n,\gamma)-d_{t-1}\right)-\left(A_z+D_zK_{t-1}(\beta_n,\gamma)\right)x_{2t-1}'\left(\beta_n-\beta_0\right)$$

and

$$\frac{\partial\ddot{e}_t(\theta_2)}{\partial\theta_2}=-\begin{pmatrix}X_{t-1}\otimes I_p\\X_{t-1}K_{t-1}(\beta_n,\gamma)\otimes I_p\\\left(X_{t-1}'D+(\beta_n-\beta_0)'x_{2t-1}D_z'\right)'K^{(1)}_{t-1}(\beta_n,\gamma)/h\end{pmatrix}.$$

Then, we need to show that

$$\frac{\sqrt{h}}{\sqrt{n}}\sum_{t=1}^n\left[\frac{\partial\ddot{e}_t\left(\theta_{20},\beta_n\right)}{\partial\theta_2}\ddot{e}_t\left(\theta_{20},\beta_n\right)-\frac{\partial\ddot{e}_t\left(\theta_{20},\beta_0\right)}{\partial\theta_2}\ddot{e}_t\left(\theta_{20},\beta_0\right)\right]=o_p(1).$$

As the arguments are all similar for each term, we show that

$$\frac{\sqrt{h}}{n}\sum_{t=1}^nX_{t-1}'D_0D_0'X_{t-1}\left(K_{t-1}\left(\beta_n,\gamma_0\right)-K_{t-1}\right)$$
$$\le\left|\beta_n-\beta_0\right|\frac{\sqrt{n}}{\sqrt{h}}\ \sup_{1\le t\le n}\left|\frac{x_{2t-1}'}{\sqrt{n}}\right|\ \frac{1}{n}\sum_{t=1}^n\left|X_{t-1}'D_0D_0'X_{t-1}\right|K^{(1)}_{t-1}\left(\tilde\beta,\gamma_0\right)=o_p(1),$$

where $\tilde\beta$ lies between $\beta_0$ and $\beta_n$, since $\left(\beta_n-\beta_0\right)\sqrt{n}/\sqrt{h}=o_p(1)$ and $K^{(1)}$ is bounded. This completes the proof.
References
Anderson, H. M. (1997): “Transaction Costs and Non-linear Adjustment towards Equilibrium in the US Treasury Bill Market,” Oxford Bulletin of Economics and Statistics, 59(4), 465–484.

Andrews, D. W. K. (1987): “Consistency in nonlinear econometric models: a generic uniform law of large numbers,” Econometrica, 55(6), 1465–1471.

(1991): “Heteroskedasticity and autocorrelation consistent covariance matrix estimation,” Econometrica, 59, 817–858.

Azuma, K. (1967): “Weighted sums of certain dependent random variables,” The Tohoku Mathematical Journal, Second Series, 19, 357–367.

Bai, J., and P. Perron (1998): “Estimating and testing linear models with multiple structural changes,” Econometrica, 66, 47–78.

Balke, N., and T. Fomby (1997): “Threshold cointegration,” International Economic Review, 38, 627–645.

Bec, F., and A. Rahbek (2004): “Vector equilibrium correction models with non-linear discontinuous adjustments,” The Econometrics Journal, 7(2), 628–651.

Chan, K. S. (1993): “Consistency and Limiting Distribution of the Least Squares Estimator of a Threshold Autoregressive Model,” The Annals of Statistics, 21, 520–533.

Corradi, V., N. R. Swanson, and H. White (2000): “Testing for stationarity-ergodicity and for comovements between nonlinear discrete time Markov processes,” Journal of Econometrics, 96(1), 39–73.

de Jong, R. M. (2001): “Nonlinear estimation using estimated cointegrating relations,” Journal of Econometrics, 101(1), 109–122.

(2002): “Nonlinear minimization estimators in the presence of cointegrating relations,” Journal of Econometrics, 110(2), 241–259.

Engle, R. F., and C. W. J. Granger (1987): “Co-integration and error correction: representation, estimation, and testing,” Econometrica, 55(2), 251–276.

Escribano, A. (2004): “Nonlinear Error Correction: The Case of Money Demand in the United Kingdom (1878–2000),” Macroeconomic Dynamics, 8(1), 76–116.

Escribano, A., and S. Mira (2002): “Nonlinear error correction models,” Journal of Time Series Analysis, 23(5), 509–522.

Gonzalo, J., and J.-Y. Pitarakis (2006): “Threshold Effects in Cointegrating Relationships,” Oxford Bulletin of Economics and Statistics, 68(s1), 813–833.

Gonzalo, J., and M. Wolf (2005): “Subsampling Inference in Threshold Autoregressive Models,” Journal of Econometrics, 127(2), 201–224.

Granger, C., and T. Terasvirta (1993): Modelling nonlinear economic relationships. Oxford University Press.

Granger, C. W. J. (2001): “Overview of Nonlinear Macroeconometric Empirical Models,” Macroeconomic Dynamics, 5(4), 466–481.

Hansen, B. (1992): “Convergence to Stochastic Integrals for Dependent Heterogeneous Processes,” Econometric Theory, 8, 489–500.

Hansen, B., and B. Seo (2002): “Testing for two-regime threshold cointegration in vector error correction models,” Journal of Econometrics, 110, 293–318.

Hansen, B. E. (1999): “Threshold effects in non-dynamic panels: estimation, testing, and inference,” Journal of Econometrics, 93(2), 345–368.

(2000): “Sample splitting and threshold estimation,” Econometrica, 68, 575–603.

Horowitz, J. L. (1992): “A smoothed maximum score estimator for the binary response model,” Econometrica, 60(3), 505–531.

Kapetanios, G., Y. Shin, and A. Snell (2006): “Testing for Cointegration in Nonlinear Smooth Transition Error Correction Models,” Econometric Theory, 22.

Kristensen, D., and A. Rahbek (2008): “Likelihood-Based Inference in Nonlinear Error-Correction Models,” Journal of Econometrics, forthcoming.

Kurtz, T., and P. Protter (1991): “Weak Limit Theorems for Stochastic Integrals and Stochastic Differential Equations,” Annals of Probability, 19, 1035–1070.

Lo, M., and E. Zivot (2001): “Threshold Cointegration and Nonlinear Adjustment to the Law of One Price,” Macroeconomic Dynamics, 5(4), 533–576.

Michael, P., A. R. Nobay, and D. A. Peel (1997): “Transactions Costs and Nonlinear Adjustment in Real Exchange Rates: An Empirical Investigation,” Journal of Political Economy, 105(4), 862–879.

Newey, W. K., and D. McFadden (1994): “Large sample estimation and hypothesis testing,” in Handbook of Econometrics, Vol. IV, pp. 2111–2245. North-Holland, Amsterdam.

Park, J., and P. Phillips (2001): “Nonlinear Regressions with Integrated Time Series,” Econometrica, 69, 117–161.

Pötscher, B. M., and I. R. Prucha (1991): “Basic structure of the asymptotic theory in dynamic nonlinear econometric models. I. Consistency and approximation concepts,” Econometric Reviews, 10(2), 125–216.

Psaradakis, Z., M. Sola, and F. Spagnolo (2004): “On Markov error-correction models, with an application to stock prices and dividends,” Journal of Applied Econometrics, 19(1), 69–88.

Saikkonen, P. (1995): “Problems with the asymptotic theory of maximum likelihood estimation in integrated and cointegrated systems,” Econometric Theory, 11(5), 888–911.

(2005): “Stability results for nonlinear error correction models,” Journal of Econometrics, 127, 69–81.

(2007): “Stability of Regime Switching Error Correction Models under Linear Cointegration,” Econometric Theory, 24, 294–318.

Seo, M. (2006): “Bootstrap Testing for the Null of No Cointegration in a Threshold Vector Error Correction Model,” Journal of Econometrics, 134(1), 129–150.

Seo, M., and O. Linton (2007): “A Smoothed Least Squares Estimator for the Threshold Regression,” Journal of Econometrics, 141(2), 704–735.

Sephton, P. S. (2003): “Spatial market arbitrage and threshold cointegration,” American Journal of Agricultural Economics, 85, 1041.

Taylor, A. M. (2001): “Potential Pitfalls for the Purchasing Power Parity Puzzle? Sampling and Specification Biases in Mean-Reversion Tests of the Law of One Price,” Econometrica, 69(2), 473–498.

Wooldridge, J. M., and H. White (1988): “Some invariance principles and central limit theorems for dependent heterogeneous processes,” Econometric Theory, 4(2), 210–230.

Wu, C.-F. (1981): “Asymptotic theory of nonlinear least squares estimation,” The Annals of Statistics, 9(3), 501–513.