Testing for Regime Switching in State Space Models∗
Fan Zhuo†
Boston University
November 10, 2015
Abstract
This paper develops a modified likelihood ratio (MLR) test for detecting regime switching
in state space models. I apply the filtering algorithm introduced in Gordon and Smith (1988)
to construct a modified likelihood function under the alternative hypothesis of two regimes and
extend the analysis in Qu and Zhuo (2015) to establish the asymptotic distribution of the MLR
statistic under the null hypothesis. I also present a practical application of the test using U.S.
unemployment rates. This paper is the first to develop a test for detecting regime switching in
state space models that is based on the likelihood ratio principle.
Keywords: Hypothesis testing, likelihood ratio, state space model, Markov switching.
∗ I am deeply indebted to my advisor Zhongjun Qu for his invaluable advice and support for my research. I also wish
to thank Pierre Perron, Hiroaki Kaido, Ivan Fernandez-Val, and seminar participants at Boston University for valuable
suggestions.
† Department of Economics, Boston University, 270 Bay State Rd., Boston, MA, 02215 (zhuo@bu.edu).
1 Introduction
Economists have long recognized the possibility that model parameters may not be constant through
time, and that instead there can be variations in model structure. If these variations are temporary
and recurrent, then the Markov regime switching model can offer a natural modeling choice. Hamilton (1989) makes a seminal contribution that not only introduces a framework with Markov regime
switching for describing economic growth, but also provides a general algorithm for filtering, smoothing, and maximum likelihood estimation. A survey of the literature on regime switching models can
be found in Hamilton (2008). Meanwhile, state space models are widely used in economics and finance
to study time series with latent state variables. Harvey (1981) and Meinhold and Singpurwalla (1983)
introduced economists to the use of the Kalman (1960) filter for constructing likelihood functions
through the prediction error decomposition. Latent state variables and regime switching can arise at
the same time, which poses a challenge for modeling them jointly.
However, as Hamilton (1990) pointed out, conducting formal tests for the presence of Markov
switching is challenging. There are generally three approaches for detecting regime switching. The
first approach tests for parameter homogeneity versus heterogeneity. Early contributions include
Neyman and Scott (1966), Chesher (1984), Lancaster (1984) and Davidson and MacKinnon (1991).
Recently, Carrasco, Hu and Ploberger (2014) further developed this approach and proposed a class
of optimal tests for the constancy of parameters in random coefficients models where the parameters
are weakly dependent under the alternative hypothesis. The second approach, from Hamilton (1996),
offers a series of specification tests of regime switching in time series models. These tests require only
that the model be estimated under the null hypothesis, and they have power against a wide range
of alternative models. However, their power can be lower than what is achievable if the parameters
indeed follow a finite state Markov chain. The third approach is based on the (quasi) likelihood
ratio principle. Several important advances have been made by Hansen (1992), Garcia (1998), Cho
and White (2007), and Carter and Steigerwald (2012). Qu and Zhuo (2015) is a recent development,
which analyzes likelihood ratio based tests for Markov regime switching allowing for multiple switching
parameters. The purpose of the present paper is to develop a test for detecting regime switching in state space models.
The likelihood function for a state space model with regime switching is hard to construct, as
discussed in Kim and Nelson (1999). Different approximations to the likelihood function have been
considered in the literature, such as in Gordon and Smith (1988) and Highfield (1990). This paper
uses the approximation applied in Gordon and Smith (1988). Based on this approximation, I develop
a modified likelihood ratio (MLR) test.
I extend the techniques developed in Qu and Zhuo (2015) to handle the nonstandard features
associated with the MLR test. These nonstandard features include the following: (1) Some nuisance
parameters are unidentified under the null hypothesis, which violates the standard conditions that
yield the chi-squared asymptotic distribution for the test statistic. This gives rise to the Davies
(1977) problem. (2) The null hypothesis yields a local optimum (cf. Hamilton, 1990), making the
score function identically zero at the null parameter estimates. Consequently, a second order Taylor
approximation of the likelihood ratio is insufficient to study its asymptotic properties. (3) Conditional
regime probabilities follow stochastic processes that can only be constructed recursively. Moreover,
this paper tackles an additional difficulty introduced by the latent state variables when expanding the
MLR. The asymptotic distribution of the MLR test statistic is analyzed in five steps.
1. I describe the algorithm used to construct the modified likelihood function for Markov switching
state space models introduced in Gordon and Smith (1988).
2. I characterize the conditional regime probability, the filtered latent state, the mean squared error
of the filtered latent state, and their high order derivatives with respect to the model parameters.
3. I first fix p and q and derive a fourth order Taylor approximation to the MLR. Then, I view the
MLR as an empirical process indexed by p and q, and derive its asymptotic distribution.
4. While the above limiting distributions are adequate for a broad class of models, they can lead
to over-rejections in some situations specified later. To resolve the issue of over-rejection, the
higher order terms in the likelihood expansion are incorporated into the asymptotic distribution
to safeguard against their effects.
5. I apply a unified algorithm proposed in Qu and Zhuo (2015) to simulate the above refined
asymptotic distribution.
Three Monte Carlo experiments are conducted to examine the MLR statistic. The first experiment
checks the improvement introduced by the refined asymptotic distribution. The second and third
experiments check the size and power of the MLR statistic. I also apply my method to study changes
in U.S. unemployment rates and find strong evidence favoring the regime switching specification.
This paper is the first to develop a likelihood ratio based test for detecting regime switching in
general state space models and contributes to the literature in several ways. First, I demonstrate the
construction of the modified likelihood function under a two regimes specification for general state
space models. Next, I study the Taylor expansion of the MLR when some regularity conditions fail
to hold. Finally, I apply my method to an empirical example and find comovement between the
U.S. business cycle and changes in monthly U.S. unemployment rates.
The paper is structured as follows. In Section 2, I provide the general model, the basic filter,
and the hypotheses. Section 3 introduces the test statistic. Section 4 studies asymptotic properties
of the MLR for prespecified p and q. Section 5 provides the limiting distribution of the MLR test
statistic and introduces a finite sample refinement. Section 6 examines the finite sample properties of
the test statistic. Section 7 considers an empirical application to the U.S. unemployment rate. Section
8 concludes. All proofs are collected in the appendix.
The following notation is used. $\|x\|$ is the Euclidean norm of a vector $x$. $\|X\|$ is the vector induced
norm for a matrix $X$. $x^{\otimes k}$ and $X^{\otimes k}$ denote the $k$-fold Kronecker product of $x$ and $X$, respectively.
The expression $\mathrm{vec}(A)$ stands for the vectorization of a $k$ dimensional array $A$. For example, for a
three dimensional array $A$ with $n$ elements along each dimension, $\mathrm{vec}(A)$ returns an $n^3$-vector whose
$(i + (j-1)n + (k-1)n^2)$-th element equals $A(i, j, k)$. $1\{\cdot\}$ is the indicator function. For a scalar valued
function $f(\theta)$ with $\theta \in R^p$, $\nabla_\theta f(\theta_0)$ denotes the $p \times 1$ vector of partial derivatives with respect to $\theta$,
evaluated at $\theta_0$. $\nabla_{\theta'} f(\theta_0)$ equals the transpose of $\nabla_\theta f(\theta_0)$ and $\nabla_{\theta_j} f(\theta_0)$ denotes its $j$-th element.
For a matrix function $P(\theta)$, $\nabla_{\theta_j} P(\theta)$ denotes the derivative of $P(\theta)$ with respect to the $j$-th element
in $\theta$. The symbols "$\Rightarrow$", "$\to_d$" and "$\to_p$" denote weak convergence under the Skorohod topology,
convergence in distribution and in probability, respectively. $O_p(\cdot)$ and $o_p(\cdot)$ are the usual notations
for the orders of stochastic magnitude.
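As a small illustration of the vec(·) convention stated above, the following Python sketch stacks a three dimensional array so that A(i, j, k) lands at position i + (j − 1)n + (k − 1)n². The nested-list representation and the helper name `vec3` are assumptions made for this example only.

```python
def vec3(A, n):
    """Stack an n x n x n nested list A so that, with 1-based indices,
    A(i, j, k) becomes element i + (j - 1) * n + (k - 1) * n**2."""
    out = [None] * n**3
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            for k in range(1, n + 1):
                pos = i + (j - 1) * n + (k - 1) * n**2  # 1-based position
                out[pos - 1] = A[i - 1][j - 1][k - 1]
    return out


n = 2
# Label each entry by its 1-based index triple (i, j, k).
A = [[[(i, j, k) for k in range(1, n + 1)]
      for j in range(1, n + 1)]
     for i in range(1, n + 1)]
v = vec3(A, n)
```

The first index varies fastest, so the ordering matches column-major stacking.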
2 Model and hypotheses
This section presents the model and hypotheses. The discussion consists of the following: the model,
the log likelihood function under the null hypothesis (i.e., one regime), the modified log likelihood
function under the alternative hypothesis (i.e., two regimes), and some assumptions related to these
three aspects.
2.1 The model
Consider the following state space representation of a dynamic linear model with switching in both
transition and measurement equations:
$$x_t = G_{s_t} + F_{s_t} x_{t-1} + u_t, \tag{2.1}$$
$$y_t = H_{s_t}' x_t + A_{s_t}' z_t, \tag{2.2}$$
$$u_t \sim N(0, Q_{s_t}). \tag{2.3}$$
The transition equation (2.1) describes the dynamics of the unobserved state vector xt as a function
of a J × 1 vector of shocks ut and xt−1 . The measurement equation (2.2) describes the evolution of
an observed scalar time series as a function of xt and a K × 1 vector of weakly exogenous variables
zt . The measurement error, normally included in (2.2), is treated as a latent variable in xt . Fst is of
dimension J × J, Gst is of dimension J × 1, Hst is of dimension J × 1, and Ast is of dimension K × 1.
Qst is a positive semidefinite symmetric matrix of dimension J × J.
The subscripts in Fst , Gst , Hst , Ast , and Qst imply that some of the parameters in these matrices
are dependent on an unobserved binary variable st whose value determines the regime at time t. The
regimes are Markovian, i.e., p(st = 1|st−1 = 1) = p and p(st = 2|st−1 = 2) = q. The resulting
stationary (or invariant) probability for st = 1 is given by
$$\xi_*(p, q) = \frac{1-q}{2-p-q}. \tag{2.4}$$
In the subsequent analysis, ξ∗ (p, q) is abbreviated as ξ∗ . Because this paper seeks to test regime
switching in state space models based on the likelihood ratio principle, subsections 2.2-2.3 will focus
on constructing the likelihood function under the two regimes specification, and the likelihood function
under one regime specification is a by-product of the standard Kalman filter.
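The stationary probability in (2.4) can be checked numerically. The sketch below is only an illustration: the function names and the values p = 0.9 and q = 0.8 are assumptions chosen for the example. It verifies that ξ∗ is a fixed point of the one-step transition map implied by the Markov chain and that repeated transitions converge to it.

```python
def stationary_prob(p, q):
    """Stationary probability of regime 1, equation (2.4)."""
    return (1.0 - q) / (2.0 - p - q)


def one_step(xi, p, q):
    """Probability of regime 1 next period when it is xi today."""
    return p * xi + (1.0 - q) * (1.0 - xi)


p, q = 0.9, 0.8            # illustrative values only
xi_star = stationary_prob(p, q)

# Starting away from xi_star, repeated transitions converge to it
# because |p + q - 1| < 1 whenever 0 < p, q < 1.
xi = 0.1
for _ in range(200):
    xi = one_step(xi, p, q)
```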
2.2 Modified Kalman filter
When constructing the likelihood function for a general state space model with regime switching, each
iteration of the Kalman filter produces a two-fold increase in the number of cases to consider under
a two regimes specification, as noted by Gordon and Smith (1988) and Harrison and Stevens (1976).
This means there can be more than 1000 components ($2^{10} = 1024$) in the likelihood function for a sample of size
T = 10. This makes studying this likelihood function and its expansion infeasible. Therefore, an
approximation is considered here to “collapse” the filtered states when st = 1 and st = 2 to a single
filtered state at each t, as in Gordon and Smith (1988).
Define the information set at time t − 1 as
$$\Omega_{t-1} = \sigma\text{-field}\{\ldots, z_{t-1}', y_{t-2}, z_t', y_{t-1}\}.$$
Suppose the model parameters are known. The modified Kalman filter algorithm, conditional on
st = i, is given by:
$$x^{(i)}_{t|t-1} := G_i + F_i x_{t-1|t-1}, \tag{2.5}$$
$$P^{(i)}_{t|t-1} := F_i P_{t-1|t-1} F_i' + Q_i, \tag{2.6}$$
$$\mu^{(i)}_{t|t-1} := y_t - H_i' x^{(i)}_{t|t-1} - A_i' z_t, \tag{2.7}$$
$$C^{(i)}_{t|t-1} := H_i' P^{(i)}_{t|t-1} H_i, \tag{2.8}$$
$$x^{(i)}_{t|t} := x^{(i)}_{t|t-1} + P^{(i)}_{t|t-1} H_i \big[C^{(i)}_{t|t-1}\big]^{-1} \mu^{(i)}_{t|t-1}, \tag{2.9}$$
$$P^{(i)}_{t|t} := \big(I - P^{(i)}_{t|t-1} H_i \big[C^{(i)}_{t|t-1}\big]^{-1} H_i'\big) P^{(i)}_{t|t-1}, \tag{2.10}$$
where $x_{t-1|t-1}$ is an estimate of $x_{t-1}$ based on information up to time $t-1$; $x^{(i)}_{t|t-1}$ is an estimate of $x_t$
based on information up to time $t-1$ given $s_t = i$; $P^{(i)}_{t|t-1}$ is an estimate of the mean squared error of
$x^{(i)}_{t|t-1}$; $\mu^{(i)}_{t|t-1}$ estimates the conditional forecast error of $y_t$ based on information up to time $t-1$ given
$s_t = i$; and $C^{(i)}_{t|t-1}$ estimates the conditional variance of the forecast error $\mu^{(i)}_{t|t-1}$.
Let $\xi_{t|t}$ be an estimate of $\Pr(s_t = 1|\Omega_t)$. The "collapse" step combines the two filtered states $x^{(1)}_{t|t}$
and $x^{(2)}_{t|t}$ into a single estimate of $x_t$ based on $\Omega_t$ by
$$x_{t|t} := \xi_{t|t}\, x^{(1)}_{t|t} + (1 - \xi_{t|t})\, x^{(2)}_{t|t}. \tag{2.11}$$
Then, the mean squared error of $x_{t|t}$ can be computed as:
$$P_{t|t} := \xi_{t|t} P^{(1)}_{t|t} + (1 - \xi_{t|t}) P^{(2)}_{t|t} + \xi_{t|t}(1 - \xi_{t|t}) \big(x^{(1)}_{t|t} - x^{(2)}_{t|t}\big)\big(x^{(1)}_{t|t} - x^{(2)}_{t|t}\big)'. \tag{2.12}$$
At the end of each iteration, equations (2.11) and (2.12) are employed to collapse the two filtered
states into one filtered state xt|t and calculate the mean squared error of xt|t , i.e. Pt|t .
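The collapse step in (2.11) and (2.12) is a moment-matching operation on a two-component mixture. The following Python sketch illustrates it with plain nested lists; the function name `collapse` and the numerical inputs are hypothetical and serve only as an example.

```python
def collapse(xi, x1, P1, x2, P2):
    """Combine two regime-specific filtered states into one.

    xi      : estimate of Pr(s_t = 1 | Omega_t)
    x1, x2  : filtered state vectors for regimes 1 and 2 (lists)
    P1, P2  : their mean squared error matrices (nested lists)
    Returns the collapsed state (2.11) and its MSE (2.12).
    """
    n = len(x1)
    diff = [a - b for a, b in zip(x1, x2)]
    x = [xi * a + (1.0 - xi) * b for a, b in zip(x1, x2)]
    P = [[xi * P1[r][c] + (1.0 - xi) * P2[r][c]
          + xi * (1.0 - xi) * diff[r] * diff[c]   # spread between regimes
          for c in range(n)]
         for r in range(n)]
    return x, P


# Illustrative inputs only.
x, P = collapse(0.25,
                [1.0, 0.0], [[0.5, 0.0], [0.0, 0.5]],
                [3.0, 2.0], [[1.0, 0.0], [0.0, 1.0]])
```

The last term in `P` adds the dispersion between the two regime-specific estimates, so the collapsed mean squared error is never smaller than the weighted average of the two regime-specific ones.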
2.3 Modified Markov switching filter
To complete the modified Kalman filter, we need to calculate ξt|t for t = 1, 2, ..., T . The calculation
of ξt|t is based on the Markov regime switching filter introduced in Hamilton (1989) and conducted in
three steps.
1. At the beginning of the t-th iteration, given $\xi_{t-1|t-1}$, we have
$$\xi_{t|t-1} := p\,\xi_{t-1|t-1} + (1-q)(1 - \xi_{t-1|t-1}). \tag{2.13}$$
2. An estimate of the density of $y_t$ is obtained by
$$f(y_t|\Omega_{t-1}) := \xi_{t|t-1} f(y_t|s_t = 1, \Omega_{t-1}) + (1 - \xi_{t|t-1}) f(y_t|s_t = 2, \Omega_{t-1}),$$
where the conditional density satisfies
$$f(y_t|s_t = i, \Omega_{t-1}) := \big[2\pi C^{(i)}_{t|t-1}\big]^{-1/2} \exp\left\{-\frac{\big[\mu^{(i)}_{t|t-1}\big]^2}{2 C^{(i)}_{t|t-1}}\right\}, \quad (i = 1, 2) \tag{2.14}$$
where $\mu^{(i)}_{t|t-1}$ and $C^{(i)}_{t|t-1}$ are given in (2.7) and (2.8).
3. Once $y_t$ is observed, we can update the modified conditional regime probability
$$\xi_{t|t} := \frac{\big[p\,\xi_{t-1|t-1} + (1-q)(1-\xi_{t-1|t-1})\big] f(y_t|s_t = 1, \Omega_{t-1})}{f(y_t|s_t = 2, \Omega_{t-1}) + \big[p\,\xi_{t-1|t-1} + (1-q)(1-\xi_{t-1|t-1})\big]\big[f(y_t|s_t = 1, \Omega_{t-1}) - f(y_t|s_t = 2, \Omega_{t-1})\big]}. \tag{2.15}$$
Figure 1 presents a flowchart for the filter described in subsections 2.2-2.3. The modified log likelihood
function under the two regimes specification, i.e. $\sum_{t=1}^{T} \log[f(y_t|\Omega_{t-1})]$, is given as a by-product of the
filter. The initial values $\xi_{0|0}$, $x_{0|0}$ and $P_{0|0}$ will be discussed in section 4.
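The filter in subsections 2.2-2.3 can be sketched in a few lines of Python. The version below is a minimal illustration, not the implementation used in the paper: it assumes a scalar state (J = 1) and z_t = 1 so that all matrices reduce to numbers, and the function name, the data, and the parameter values are hypothetical.

```python
import math


def modified_filter(y, p, q, regimes, x0, P0, xi0):
    """Modified log likelihood for a scalar state (J = 1) with z_t = 1.

    regimes: two tuples (G, F, H, A, Q), one per regime.
    Returns the log likelihood and the path of xi_{t|t-1}.
    """
    x, P, xi = x0, P0, xi0
    loglik, xi_pred_path = 0.0, []
    for yt in y:
        xi_pred = p * xi + (1.0 - q) * (1.0 - xi)            # (2.13)
        dens, x_filt, P_filt = [], [], []
        for G, F, H, A, Q in regimes:
            x_pred = G + F * x                                # (2.5)
            P_pred = F * P * F + Q                            # (2.6)
            mu = yt - H * x_pred - A                          # (2.7)
            C = H * P_pred * H                                # (2.8)
            gain = P_pred * H / C
            x_filt.append(x_pred + gain * mu)                 # (2.9)
            P_filt.append((1.0 - gain * H) * P_pred)          # (2.10)
            dens.append(math.exp(-mu * mu / (2.0 * C))
                        / math.sqrt(2.0 * math.pi * C))       # (2.14)
        mix = xi_pred * dens[0] + (1.0 - xi_pred) * dens[1]
        loglik += math.log(mix)
        xi = xi_pred * dens[0] / mix                          # (2.15)
        # Collapse step, (2.11)-(2.12).
        x = xi * x_filt[0] + (1.0 - xi) * x_filt[1]
        P = (xi * P_filt[0] + (1.0 - xi) * P_filt[1]
             + xi * (1.0 - xi) * (x_filt[0] - x_filt[1]) ** 2)
        xi_pred_path.append(xi_pred)
    return loglik, xi_pred_path


# Illustrative data and parameters only.
y = [0.3, -0.1, 0.8, 1.2, 0.4, -0.6, 0.1, 0.9]
same = (0.0, 0.5, 1.0, 0.2, 1.0)             # (G, F, H, A, Q)
xi_star = (1.0 - 0.8) / (2.0 - 0.9 - 0.8)
ll_a, path_a = modified_filter(y, 0.9, 0.8, [same, same], 0.0, 0.0, xi_star)
ll_b, _ = modified_filter(y, 0.6, 0.7, [same, same], 0.0, 0.0, 0.5)
```

When both regimes share the same parameters, the two calls return the same log likelihood regardless of p and q, and the predicted regime probability stays at ξ∗ when started there. In this scalar case the gain satisfies gain × H = 1, so each regime-specific mean squared error in (2.10) is zero and only the dispersion term of (2.12) survives.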
Figure 1: Flowchart for the filter: state space models with regime switching
2.4 Hypotheses
Let $\delta$ represent parameters that are affected by regime switching, taking a value of $\delta_1$ in regime 1 and
$\delta_2$ in regime 2. Let $\beta$ represent parameters that remain constant across the regimes. Then, for any
prespecified $0 < p, q < 1$, the modified log likelihood function is given by
$$L^A(p, q, \beta, \delta_1, \delta_2) = \sum_{t=1}^{T} \log\big\{ f_{1t}(p, q, \beta, \delta_1, \delta_2)\,\xi_{t|t-1}(p, q, \beta, \delta_1, \delta_2) + f_{2t}(p, q, \beta, \delta_1, \delta_2)\big(1 - \xi_{t|t-1}(p, q, \beta, \delta_1, \delta_2)\big)\big\}, \tag{2.16}$$
where
$$f_{it}(p, q, \beta, \delta_1, \delta_2) = f(y_t|s_t = i, \Omega_{t-1}),$$
which is defined in (2.14). When $\delta_1 = \delta_2 = \delta$, the modified log likelihood function reduces to
$$L^N(\beta, \delta) = \sum_{t=1}^{T} \log f_{1t}(p, q, \beta, \delta, \delta) := \sum_{t=1}^{T} \log f_t(\beta, \delta), \tag{2.17}$$
which can be computed using the standard Kalman filter. This paper studies a test statistic based
on (2.17) and (2.16) for the one regime specification versus the two regimes specification. To start, I
impose the following restrictions on the DGP and the parameter space, following Assumptions 1-3 in
Qu and Zhuo (2015).
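Assumption 1. (i) The random vector $(z_t', y_t)$ is strictly stationary, ergodic and β-mixing with the
mixing coefficient $\beta_\tau$ satisfying $\beta_\tau \le c\rho^\tau$ for some $c > 0$ and $\rho \in [0, 1)$. (ii) Under the null hypothesis,
$y_t$ is generated by $f(\cdot|\Omega_{t-1}; \beta_*, \delta_*)$ where $\beta_*$ and $\delta_*$ are interior points of $\Theta \subset R^{n_\beta}$ and $\Delta \subset R^{n_\delta}$ with
$\Theta$ and $\Delta$ being compact.
Assumption 2. Under the null hypothesis: (i) $(\beta_*, \delta_*)$ uniquely solves $\max_{(\beta,\delta)\in\Theta\times\Delta} E\big[L^N(\beta, \delta)\big]$;
(ii) for any $0 < p, q < 1$, $(\beta_*, \delta_*, \delta_*)$ uniquely solves $\max_{(\beta,\delta_1,\delta_2)\in\Theta\times\Delta\times\Delta} E\big[L^A(\beta, \delta_1, \delta_2)\big]$.
Assumption 3. Under the null hypothesis, we have: (i) $T^{-1}\big[L^N(\beta, \delta) - E L^N(\beta, \delta)\big] = o_p(1)$ holds
uniformly over $(\beta, \delta) \in \Theta \times \Delta$, with $T^{-1}\sum_{t=1}^{T} \big[\nabla_{(\beta',\delta')'} \log f_t(\beta, \delta)\big]\big[\nabla_{(\beta',\delta')} \log f_t(\beta, \delta)\big]$ being positive
definite over an open neighborhood of $(\beta_*, \delta_*)$ for sufficiently large $T$; (ii) for any $0 < p, q < 1$,
$T^{-1}\big[L^A(\beta, \delta_1, \delta_2) - E L^A(\beta, \delta_1, \delta_2)\big] = o_p(1)$ holds uniformly over $(\beta, \delta_1, \delta_2) \in \Theta \times \Delta \times \Delta$.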
Using the above notation, the null and alternative hypotheses can be more formally stated as:
$H_0$: $\delta_1 = \delta_2 = \delta_*$ for some unknown $\delta_*$,
$H_1$: $(\delta_1, \delta_2) = (\delta_1^*, \delta_2^*)$ for some unknown $\delta_1^* \neq \delta_2^*$ and $(p, q) \in (0, 1) \times (0, 1)$.
For the remainder of this paper, I use an ARMA(K, L) model to illustrate the main results.
The illustrative model. Let us consider a general ARMA(K, L) model:
$$m_t = \sum_{k=1}^{K} \phi_{k,s_t} m_{t-k} + \varepsilon_t + \sum_{l=1}^{L} \theta_{l,s_t} \varepsilon_{t-l}, \quad \varepsilon_t \sim \text{i.i.d. } N(0, \sigma^2_{s_t}), \tag{2.18}$$
$$y_t = \alpha_{s_t} + m_t, \tag{2.19}$$
where only $y_t$ is observable but not $m_t$ or $s_t$. In this illustrative model, some or all of the model
parameters can be affected by $s_t$. This ARMA model can be written in the form (2.1)-(2.3) as
follows. Define $n_r = \max\{K, L + 1\}$. Interpret $\phi_{j,s_t} = 0$ for $j > K$ and $\theta_{j,s_t} = 0$ for $j > L$. Let
$$F_{s_t} = \begin{pmatrix} \phi_{1,s_t} & \phi_{2,s_t} & \cdots & \phi_{n_r-1,s_t} & \phi_{n_r,s_t} \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix}, \quad H_{s_t} = \begin{pmatrix} 1 \\ \theta_{1,s_t} \\ \vdots \\ \theta_{n_r-1,s_t} \end{pmatrix},$$
$$u_t = \begin{pmatrix} \varepsilon_t \\ 0 \\ \vdots \\ 0 \end{pmatrix} \sim N(0, Q_{s_t}), \quad Q_{s_t} = \begin{pmatrix} \sigma^2_{s_t} & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix},$$
$$G_{s_t} = 0, \quad A_{s_t} = \alpha_{s_t}, \quad \text{and } z_t = 1.$$




3 The test statistic
This section proposes a test statistic based on the MLR. Let $\tilde\beta$ and $\tilde\delta$ denote the maximizer of the log
likelihood function under the null hypothesis:
$$(\tilde\beta, \tilde\delta) = \arg\max_{\beta,\delta} L^N(\beta, \delta). \tag{3.1}$$
The MLR evaluated at some $0 < p, q < 1$ then equals
$$MLR(p, q) = 2\Big[\max_{\beta,\delta_1,\delta_2} L^A(p, q, \beta, \delta_1, \delta_2) - L^N(\tilde\beta, \tilde\delta)\Big]. \tag{3.2}$$
It is natural to consider the following test statistic:
$$SupMLR(\Lambda_\epsilon) = \sup_{(p,q)\in\Lambda_\epsilon} MLR(p, q),$$
where $\Lambda_\epsilon$ is a compact set to be specified later. Similar test statistics have been studied by Hansen
(1992), Garcia (1998) and Qu and Zhuo (2015).
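In practice, a supremum over a compact set of (p, q) values is often approximated by a maximum over a finite grid. The Python sketch below illustrates only that search step. The grid, the helper name `sup_mlr`, and the toy function standing in for MLR(p, q) are all assumptions; the set over which the supremum is taken is specified later in the paper, and a real application would evaluate (3.2) at each grid point.

```python
def sup_mlr(mlr, grid):
    """Return the largest MLR(p, q) over a finite grid and its location."""
    best_value, best_pq = float("-inf"), None
    for p, q in grid:
        value = mlr(p, q)
        if value > best_value:
            best_value, best_pq = value, (p, q)
    return best_value, best_pq


# Hypothetical grid: p and q from 0.1 to 0.9 in steps of 0.1.
points = [round(0.1 * k, 1) for k in range(1, 10)]
grid = [(p, q) for p in points for q in points]


# Toy stand-in for MLR(p, q), used only to exercise the search.
def toy_mlr(p, q):
    return 3.0 - (p - 0.7) ** 2 - (q - 0.4) ** 2


value, location = sup_mlr(toy_mlr, grid)
```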
4 MLR under prespecified p and q
This section studies the MLR under a given $(p, q) \in \Lambda_\epsilon$. The choice of $\Lambda_\epsilon$ will be discussed in the next
section.
4.1 Conditional regime probability
Let us first study the conditional regime probability ξt+1|t (p, q, β, δ1 , δ2 ) as well as its derivatives with
respect to β, δ1 and δ2 because the results will be needed to develop the expansion of the modified
log likelihood function. Combining equations (2.13) and (2.15) gives a recursive formula to calculate
ξt+1|t (p, q, β, δ1 , δ2 ):
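$$\xi_{t+1|t}(p, q, \beta, \delta_1, \delta_2) = p + (p + q - 1)\,\frac{f_{2t}(p, q, \beta, \delta_1, \delta_2)\big(\xi_{t|t-1}(p, q, \beta, \delta_1, \delta_2) - 1\big)}{f_{1t}(p, q, \beta, \delta_1, \delta_2)\,\xi_{t|t-1}(p, q, \beta, \delta_1, \delta_2) + f_{2t}(p, q, \beta, \delta_1, \delta_2)\big(1 - \xi_{t|t-1}(p, q, \beta, \delta_1, \delta_2)\big)}, \tag{4.1}$$
where
$$f_{it}(p, q, \beta, \delta_1, \delta_2) = f(y_t|s_t = i, \Omega_{t-1}) \tag{4.2}$$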
as in (2.14). This recursive formula implies that the derivatives of ξt+1|t with respect to the model
parameters must also follow first order difference equations. Because the asymptotic expansions are
considered around the estimates under the null hypothesis, it is sufficient to analyze ξt+1|t (p, q, β, δ1 , δ2 )
and its derivatives at δ1 = δ2 = δ for an arbitrary value of δ in ∆.
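The equivalence between the one-line recursion (4.1) and the two-step route through (2.15) and (2.13) can be confirmed numerically. The sketch below uses hypothetical function names and arbitrary illustrative values for the densities and probabilities.

```python
def xi_next_direct(xi_pred, f1, f2, p, q):
    """One step of the recursion in (4.1)."""
    rho = p + q - 1.0
    denom = f1 * xi_pred + f2 * (1.0 - xi_pred)
    return p + rho * f2 * (xi_pred - 1.0) / denom


def xi_next_two_step(xi_pred, f1, f2, p, q):
    """Update with (2.15), then predict with (2.13)."""
    xi_filt = xi_pred * f1 / (f2 + xi_pred * (f1 - f2))
    return p * xi_filt + (1.0 - q) * (1.0 - xi_filt)


p, q = 0.9, 0.8                      # illustrative values only
a = xi_next_direct(0.4, 0.3, 0.1, p, q)
b = xi_next_two_step(0.4, 0.3, 0.1, p, q)

# With equal densities and a start at the stationary probability,
# the recursion stays at the stationary probability.
xi_star = (1.0 - q) / (2.0 - p - q)
c = xi_next_direct(xi_star, 0.25, 0.25, p, q)
```

Both routes give the same value, and the last line illustrates why the predicted regime probability is constant when the two regimes coincide.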
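Let $\theta = (\beta', \delta_1', \delta_2')'$ be an augmented parameter vector. We then define three sets of integers (they index
the elements in $\beta$, $\delta_1$ and $\delta_2$, respectively):
$$I_0 = \{1, \ldots, n_\beta\}, \quad I_1 = \{n_\beta + 1, \ldots, n_\beta + n_\delta\}, \quad I_2 = \{n_\beta + n_\delta + 1, \ldots, n_\beta + 2n_\delta\}.$$
Let "$\bar g$" denote that $g(\beta, \delta_1, \delta_2)$ is evaluated at some $\beta$ and $\delta_1 = \delta_2 = \delta$; for example, $\bar\xi_{t+1|t}$ and $\bar f_t$ denote
that $\xi_{t+1|t}(p, q, \beta, \delta_1, \delta_2)$ and $f_{1t}(p, q, \beta, \delta_1, \delta_2)$ (or $f_{2t}(p, q, \beta, \delta_1, \delta_2)$) are evaluated at some $\beta$ and
$\delta_1 = \delta_2 = \delta$. Let $\nabla_{\theta_{j_1}} \cdots \nabla_{\theta_{j_k}} \bar\xi_{t|t-1}$, $\nabla_{\theta_{j_1}} \cdots \nabla_{\theta_{j_k}} \bar f_{1t}$ and $\nabla_{\theta_{j_1}} \cdots \nabla_{\theta_{j_k}} \bar f_{2t}$ denote the $k$-th order derivatives of
$\xi_{t|t-1}(p, q, \beta, \delta_1, \delta_2)$, $f_{1t}(p, q, \beta, \delta_1, \delta_2)$ and $f_{2t}(p, q, \beta, \delta_1, \delta_2)$ with respect to the $(j_1, \ldots, j_k)$-th elements
of $\theta$, evaluated at some $\beta$ and $\delta_1 = \delta_2 = \delta$. Also let "$\bar F$" denote the matrix $F_i$ ($i$ = 1 or 2) evaluated
at some $\beta$ and $\delta_1 = \delta_2 = \delta$, and let $\nabla_{\theta_{j_1}} \cdots \nabla_{\theta_{j_k}} \bar F_i$ denote the $k$-th order derivatives of the parameter
matrix $F_i$ evaluated at some $\beta$ and $\delta_1 = \delta_2 = \delta$. By definition, the following relationships hold:
$\nabla_{\theta_{j_1}} \cdots \nabla_{\theta_{j_k}} \bar f_{1t} = \nabla_{\theta_{j_1}} \cdots \nabla_{\theta_{j_k}} \bar f_{2t}$ if $j_1, \ldots, j_k$ all belong to $I_0$. The next lemma, which parallels Lemma 1
in Qu and Zhuo (2015), characterizes the properties of $\xi_{t+1|t}(p, q, \beta, \delta_1, \delta_2)$ and its derivatives
when $\delta_1 = \delta_2 = \delta$.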
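Lemma 1. Let $\xi_{0|0} = \xi_*$, $\rho = p + q - 1$ and $r = \rho\,\xi_*(1 - \xi_*)$ with $\xi_*$ defined in (2.4). Then, for $t \ge 1$,
we have:
1. $\bar\xi_{t+1|t} = \xi_*$.
2. $\nabla_{\theta_j} \bar\xi_{t+1|t} = \rho\,\nabla_{\theta_j} \bar\xi_{t|t-1} + \bar E_{j,t}$, where
$$\bar E_{j,t} = r\left[\frac{\nabla_{\theta_j} \bar f_{1t}}{\bar f_t} - \frac{\nabla_{\theta_j} \bar f_{2t}}{\bar f_t}\right],$$
with $j \in \{I_0, I_1, I_2\}$.
3. $\nabla_{\theta_j}\nabla_{\theta_k} \bar\xi_{t+1|t} = \rho\,\nabla_{\theta_j}\nabla_{\theta_k} \bar\xi_{t|t-1} + \bar E_{jk,t}$, where $\bar E_{jk,t}$ are given by (let $(I_a, I_b)$ denote the situation with
$j \in I_a$ and $k \in I_b$; $a, b = 0, 1, 2$):
$(I_0, I_0)$: $0$
$(I_0, I_1)$ or $(I_0, I_2)$:
$$r\left[\frac{\nabla_{\theta_j}\nabla_{\theta_k} \bar f_{1t}}{\bar f_t} - \frac{\nabla_{\theta_j}\nabla_{\theta_k} \bar f_{2t}}{\bar f_t} + \frac{\nabla_{\theta_j} \bar f_{2t}}{\bar f_t}\frac{\nabla_{\theta_k} \bar f_{2t}}{\bar f_t} - \frac{\nabla_{\theta_j} \bar f_{2t}}{\bar f_t}\frac{\nabla_{\theta_k} \bar f_{1t}}{\bar f_t}\right]$$
$(I_1, I_1)$ or $(I_1, I_2)$ or $(I_2, I_2)$:
$$\begin{aligned}
& r\left[\frac{\nabla_{\theta_j}\nabla_{\theta_k} \bar f_{1t}}{\bar f_t} - \frac{\nabla_{\theta_j}\nabla_{\theta_k} \bar f_{2t}}{\bar f_t}\right] + \rho(1 - 2\xi_*)\left[\left(\frac{\nabla_{\theta_j} \bar f_{1t}}{\bar f_t} - \frac{\nabla_{\theta_j} \bar f_{2t}}{\bar f_t}\right)\nabla_{\theta_k} \bar\xi_{t|t-1} + \left(\frac{\nabla_{\theta_k} \bar f_{1t}}{\bar f_t} - \frac{\nabla_{\theta_k} \bar f_{2t}}{\bar f_t}\right)\nabla_{\theta_j} \bar\xi_{t|t-1}\right] \\
& - 2r\left[\xi_*\frac{\nabla_{\theta_j} \bar f_{1t}}{\bar f_t}\frac{\nabla_{\theta_k} \bar f_{1t}}{\bar f_t} - (1 - \xi_*)\frac{\nabla_{\theta_j} \bar f_{2t}}{\bar f_t}\frac{\nabla_{\theta_k} \bar f_{2t}}{\bar f_t}\right] + r(2\xi_* - 1)\left[\frac{\nabla_{\theta_j} \bar f_{1t}}{\bar f_t}\frac{\nabla_{\theta_k} \bar f_{2t}}{\bar f_t} + \frac{\nabla_{\theta_j} \bar f_{2t}}{\bar f_t}\frac{\nabla_{\theta_k} \bar f_{1t}}{\bar f_t}\right].
\end{aligned}$$
4. $\nabla_{\theta_j}\nabla_{\theta_k}\nabla_{\theta_l} \bar\xi_{t+1|t} = \rho\,\nabla_{\theta_j}\nabla_{\theta_k}\nabla_{\theta_l} \bar\xi_{t|t-1} + \bar E_{jkl,t}$, where $\bar E_{jkl,t}$ are given in the appendix with $j, k, l \in$
$\{I_a, I_b, I_c\}$ and $a, b, c = 0, 1, 2$.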
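The first order recursion in Lemma 1 can be checked against a finite-difference derivative. The Python sketch below does this in a deliberately simplified setting that is an assumption of the example, not the paper's general model: y_t = α_{s_t} + ε_t with ε_t ∼ N(0, σ²) and no latent state, so that f_it is the N(α_i, σ²) density and the innovation in part 2 of the lemma for α_1 is r(y_t − α)/σ². The function names and data are hypothetical.

```python
import math


def normal_pdf(y, mean, var):
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def xi_path(y, p, q, alpha1, alpha2, var):
    """xi_{t+1|t} from (4.1), started at xi_{1|0} = xi_star."""
    rho = p + q - 1.0
    xi = (1.0 - q) / (2.0 - p - q)
    path = []
    for yt in y:
        f1, f2 = normal_pdf(yt, alpha1, var), normal_pdf(yt, alpha2, var)
        xi = p + rho * f2 * (xi - 1.0) / (f1 * xi + f2 * (1.0 - xi))
        path.append(xi)
    return path


def lemma1_gradient(y, p, q, alpha, var):
    """Derivative of xi_{t+1|t} w.r.t. alpha_1 at alpha_1 = alpha_2 = alpha,
    computed with the recursion in part 2 of Lemma 1."""
    rho = p + q - 1.0
    xi_star = (1.0 - q) / (2.0 - p - q)
    r = rho * xi_star * (1.0 - xi_star)
    grad, path = 0.0, []
    for yt in y:
        grad = rho * grad + r * (yt - alpha) / var
        path.append(grad)
    return path


# Illustrative data and parameter values only.
y = [0.3, -0.1, 0.8, 1.2, 0.4, -0.6, 0.1, 0.9]
p, q, alpha, var, h = 0.9, 0.8, 0.2, 1.5, 1e-5
up = xi_path(y, p, q, alpha + h, alpha, var)
down = xi_path(y, p, q, alpha - h, alpha, var)
numeric = [(u - d) / (2.0 * h) for u, d in zip(up, down)]
analytic = lemma1_gradient(y, p, q, alpha, var)
```

The central differences in `numeric` agree with the recursion in `analytic` up to discretization error, which illustrates that the derivative follows a first order difference equation with coefficient ρ.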
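I now discuss the first order derivatives of $f_{it}$ appearing in the lemma. By (2.5), (2.6), (2.7), (2.8),
(2.14), and (4.2), we have:
$$f_{it} = \Big[2\pi H_i'\big(F_i P_{t-1|t-1} F_i' + Q_i\big) H_i\Big]^{-1/2} \exp\left\{-\frac{\big[y_t - H_i'\big(G_i + F_i x_{t-1|t-1}\big) - A_i' z_t\big]^2}{2 H_i'\big(F_i P_{t-1|t-1} F_i' + Q_i\big) H_i}\right\}.$$
The first order derivative of $f_{it}$ with respect to the $j$-th component in $\theta$ is as follows:
$$\begin{aligned}
\nabla_{\theta_j} f_{it} ={}& f_{it}\,\frac{y_t - H_i'\big(G_i + F_i x_{t-1|t-1}\big) - A_i' z_t}{H_i'\big(F_i P_{t-1|t-1} F_i' + Q_i\big) H_i} \\
& \times \Big\{\nabla_{\theta_j} H_i'\big(G_i + F_i x_{t-1|t-1}\big) + \nabla_{\theta_j} A_i' z_t + H_i'\big[\nabla_{\theta_j} G_i + \nabla_{\theta_j} F_i\, x_{t-1|t-1} + F_i \nabla_{\theta_j} x_{t-1|t-1}\big]\Big\} \\
& + f_{it}\,\frac{1}{2 H_i'\big(F_i P_{t-1|t-1} F_i' + Q_i\big) H_i}\left\{\frac{\big[y_t - H_i'\big(G_i + F_i x_{t-1|t-1}\big) - A_i' z_t\big]^2}{H_i'\big(F_i P_{t-1|t-1} F_i' + Q_i\big) H_i} - 1\right\} \\
& \times \Big\{\nabla_{\theta_j} H_i'\big(F_i P_{t-1|t-1} F_i' + Q_i\big) H_i + H_i'\big(F_i P_{t-1|t-1} F_i' + Q_i\big)\nabla_{\theta_j} H_i \\
& \qquad + H_i'\big[\nabla_{\theta_j} F_i\, P_{t-1|t-1} F_i' + F_i \nabla_{\theta_j} P_{t-1|t-1} F_i' + F_i P_{t-1|t-1} \nabla_{\theta_j} F_i' + \nabla_{\theta_j} Q_i\big] H_i\Big\}.
\end{aligned} \tag{4.3}$$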
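The exact expressions for the second order derivatives of $f_{it}$ are included in the appendix. The
properties of $x_{t|t}$ and $P_{t|t}$ and their derivatives will be studied in the next subsection (see Lemma 3
below). Note that, for some $\beta$ and $\delta_1 = \delta_2$,
$$\begin{aligned}
\nabla_{\theta_j} \bar f_{1t} - \nabla_{\theta_j} \bar f_{2t} ={}& \bar f_t\,\frac{y_t - \bar H'\big(\bar G + \bar F \bar x_{t-1|t-1}\big) - \bar A' z_t}{\bar H'\big(\bar F \bar P_{t-1|t-1} \bar F' + \bar Q\big) \bar H} \\
& \times \Big\{\big(\nabla_{\theta_j} \bar H_1' - \nabla_{\theta_j} \bar H_2'\big)\big(\bar G + \bar F \bar x_{t-1|t-1}\big) + \big(\nabla_{\theta_j} \bar A_1' - \nabla_{\theta_j} \bar A_2'\big) z_t \\
& \qquad + \bar H'\big[\big(\nabla_{\theta_j} \bar G_1 - \nabla_{\theta_j} \bar G_2\big) + \big(\nabla_{\theta_j} \bar F_1 - \nabla_{\theta_j} \bar F_2\big) \bar x_{t-1|t-1}\big]\Big\} \\
& + \bar f_t\,\frac{1}{2 \bar H'\big(\bar F \bar P_{t-1|t-1} \bar F' + \bar Q\big) \bar H}\left\{\frac{\big[y_t - \bar H'\big(\bar G + \bar F \bar x_{t-1|t-1}\big) - \bar A' z_t\big]^2}{\bar H'\big(\bar F \bar P_{t-1|t-1} \bar F' + \bar Q\big) \bar H} - 1\right\} \\
& \times \Big\{\big(\nabla_{\theta_j} \bar H_1' - \nabla_{\theta_j} \bar H_2'\big)\big(\bar F \bar P_{t-1|t-1} \bar F' + \bar Q\big) \bar H + \bar H'\big(\bar F \bar P_{t-1|t-1} \bar F' + \bar Q\big)\big(\nabla_{\theta_j} \bar H_1 - \nabla_{\theta_j} \bar H_2\big) \\
& \qquad + \bar H'\big[\big(\nabla_{\theta_j} \bar F_1 - \nabla_{\theta_j} \bar F_2\big) \bar P_{t-1|t-1} \bar F' + \bar F \bar P_{t-1|t-1}\big(\nabla_{\theta_j} \bar F_1' - \nabla_{\theta_j} \bar F_2'\big) + \big(\nabla_{\theta_j} \bar Q_1 - \nabla_{\theta_j} \bar Q_2\big)\big] \bar H\Big\},
\end{aligned}$$
in which the $\nabla_{\theta_j} \bar x_{t-1|t-1}$ and $\nabla_{\theta_j} \bar P_{t-1|t-1}$ terms cancel out. Consequently, these two quantities
are not needed for calculating $\nabla_{\theta_j} \bar\xi_{t+1|t}$. Similarly, $\nabla_{\theta_j}\nabla_{\theta_k} \bar x_{t-1|t-1}$ and $\nabla_{\theta_j}\nabla_{\theta_k} \bar P_{t-1|t-1}$ are not needed
for calculating $\nabla_{\theta_j}\nabla_{\theta_k} \bar f_{1t} - \nabla_{\theta_j}\nabla_{\theta_k} \bar f_{2t}$. This is also true for higher order derivatives of $(f_{1t} - f_{2t})$ when
evaluated at $\delta_1 = \delta_2$. I now use an example to illustrate Lemma 1.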
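The illustrative model (cont’d). Consider the illustrative example (2.18)-(2.19) and assume that
both $\alpha_{s_t}$ and $\sigma^2_{s_t}$ are affected by regime switching. Lemma 1 implies that
$$\nabla_{\alpha_1} \bar\xi_{t+1|t} = \rho\,\nabla_{\alpha_1} \bar\xi_{t|t-1} + r\,\frac{1}{\bar\sigma^2}\big[y_t - \bar\alpha - \bar H' \bar F \bar x_{t-1|t-1}\big],$$
$$\nabla_{\sigma_1^2} \bar\xi_{t+1|t} = \rho\,\nabla_{\sigma_1^2} \bar\xi_{t|t-1} + r\,\frac{1}{2\bar\sigma^2}\left[\frac{\big(y_t - \bar\alpha - \bar H' \bar F \bar x_{t-1|t-1}\big)^2}{\bar\sigma^2} - 1\right].$$
Because the filter described in subsections 2.2-2.3 reduces to the standard Kalman filter when $\delta_1 = \delta_2$,
$\nabla_{\alpha_1} \bar\xi_{t+1|t}$ and $\nabla_{\sigma_1^2} \bar\xi_{t+1|t}$ both reduce to stationary AR(1) processes with mean zero when evaluated at
the true parameter values under the null hypothesis. Their variances are finite and satisfy
$$E\big(\nabla_{\alpha_1} \bar\xi_{t+1|t}\big)^2 = \frac{r^2}{(1 - \rho^2)\sigma_*^2} \quad \text{and} \quad E\big(\nabla_{\sigma_1^2} \bar\xi_{t+1|t}\big)^2 = \frac{r^2}{2(1 - \rho^2)\sigma_*^4},$$
where $\sigma_*^2$ denotes the true value of $\sigma^2_{s_t}$ under the null hypothesis.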
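These two variance expressions follow from the usual stationary AR(1) calculation. The short derivation below is a sketch: it writes $e_t$ for the prediction error $y_t - \bar\alpha - \bar H'\bar F\bar x_{t-1|t-1}$ and assumes that, at the true parameter values under the null hypothesis, $e_t$ is i.i.d. $N(0, \sigma_*^2)$. The symbol $e_t$ is introduced here only for this derivation.

```latex
\begin{aligned}
\nabla_{\alpha_1}\bar\xi_{t+1|t}
  &= \rho\,\nabla_{\alpha_1}\bar\xi_{t|t-1} + \frac{r}{\sigma_*^2}\,e_t,
  &\operatorname{Var}\!\Big(\frac{r}{\sigma_*^2}\,e_t\Big) &= \frac{r^2}{\sigma_*^2}
  &&\Longrightarrow\quad
  E\big(\nabla_{\alpha_1}\bar\xi_{t+1|t}\big)^2 = \frac{r^2}{(1-\rho^2)\,\sigma_*^2}, \\[4pt]
\nabla_{\sigma_1^2}\bar\xi_{t+1|t}
  &= \rho\,\nabla_{\sigma_1^2}\bar\xi_{t|t-1} + \frac{r}{2\sigma_*^2}\Big(\frac{e_t^2}{\sigma_*^2}-1\Big),
  &\operatorname{Var}\!\Big(\frac{e_t^2}{\sigma_*^2}-1\Big) &= 2
  &&\Longrightarrow\quad
  E\big(\nabla_{\sigma_1^2}\bar\xi_{t+1|t}\big)^2 = \frac{r^2}{2(1-\rho^2)\,\sigma_*^4}.
\end{aligned}
```

Both steps use the fact that a mean zero AR(1) process with coefficient $\rho$, $|\rho| < 1$, and innovation variance $v$ has stationary variance $v/(1-\rho^2)$; here $|\rho| = |p + q - 1| < 1$ because $0 < p, q < 1$.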
4.2 Filtered state and its mean squared error
Since both the filtered state in (2.11) and its mean squared error in (2.12) are components in the
modified log likelihood function, it is important to study these functions and their derivatives with
respect to β, δ1 and δ2 . As in the previous subsection, it is sufficient to study these quantities when
δ1 = δ2 .
Under Assumptions 1-3 and when $\delta_1 = \delta_2$, $x_t$ in (2.1) is stationary. The unconditional mean
of $x_t$ can be employed as the initial value, denoted by $x_{0|0}$. The unconditional mean of $x_t$ satisfies
$E(x_t) = \bar G + \bar F E(x_{t-1})$. This implies
$$x_{0|0} = (I - \bar F)^{-1} \bar G.$$
The next lemma provides the initial value for Pt|t when δ1 = δ2 , denoted by P0|0 . Its results are also
used later to study the properties of xt|t and Pt|t .
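Lemma 2. Let $\bar F$ have all its eigenvalues inside the unit circle. Set $P_{0|0} = \bar P_*$, where $\bar P_*$ solves
$$\bar P_* = \Big\{I - \big(\bar F \bar P_* \bar F' + \bar Q\big) \bar H \big[\bar H'\big(\bar F \bar P_* \bar F' + \bar Q\big) \bar H\big]^{-1} \bar H'\Big\}\big(\bar F \bar P_* \bar F' + \bar Q\big). \tag{4.4}$$
Then, under the null hypothesis and Assumptions 1-3,
$$\bar P^{(i)}_{t|t} = \bar P_*, \quad \bar P_{t|t} = \bar P_* \quad \text{and} \quad \bar P_{t|t-1} = \bar F \bar P_* \bar F' + \bar Q,$$
for all $t = 1, \ldots, T$.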
In the subsequent analysis, $\bar P_{t|t-1}$ is abbreviated as $\bar P$. The following assumption, which is analogous to Proposition 13.2 in Hamilton (1994), ensures that $\bar P_*$ and $\bar P$ are unique.
Assumption 4. The eigenvalues of
$$\left(I - \frac{\bar P \bar H \bar H'}{\bar H' \bar P \bar H}\right) \bar F$$
are all inside the unit circle.
I use the ARMA model in (2.18)-(2.19) to illustrate this assumption.
The illustrative model (cont’d.) Consider the model in (2.18)-(2.19). Then, $\bar P_*$ and $\bar P$ are equal
to 0 and $\bar Q$, respectively. The quantity in Assumption 4 is given by:
$$\left(I - \frac{\bar P \bar H \bar H'}{\bar H' \bar P \bar H}\right) \bar F = \begin{pmatrix} -\bar\theta_1 & -\bar\theta_2 & \cdots & -\bar\theta_{n_r-1} & 0 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix},$$
where $\bar\theta_j$ denotes that the parameter $\theta_{j,s_t}$ is evaluated at some $\beta$ and $\delta_1 = \delta_2 = \delta$. For the ARMA(1, 1)
model, Assumption 4 is equivalent to $-1 < \bar\theta_1 < 1$.
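For the ARMA(1, 1) case, the matrix in Assumption 4 can be worked out by hand. The following calculation is a sketch using the state space form of the illustrative model with $n_r = 2$ and $\bar P = \bar Q$; it only spells out the intermediate steps.

```latex
\begin{aligned}
\bar F &= \begin{pmatrix} \bar\phi_1 & 0 \\ 1 & 0 \end{pmatrix}, \qquad
\bar H = \begin{pmatrix} 1 \\ \bar\theta_1 \end{pmatrix}, \qquad
\bar P = \bar Q = \begin{pmatrix} \bar\sigma^2 & 0 \\ 0 & 0 \end{pmatrix}, \\[4pt]
\bar P \bar H &= \begin{pmatrix} \bar\sigma^2 \\ 0 \end{pmatrix}, \qquad
\bar H' \bar P \bar H = \bar\sigma^2, \qquad
I - \frac{\bar P \bar H \bar H'}{\bar H' \bar P \bar H}
  = \begin{pmatrix} 0 & -\bar\theta_1 \\ 0 & 1 \end{pmatrix}, \\[4pt]
\left(I - \frac{\bar P \bar H \bar H'}{\bar H' \bar P \bar H}\right)\bar F
  &= \begin{pmatrix} 0 & -\bar\theta_1 \\ 0 & 1 \end{pmatrix}
     \begin{pmatrix} \bar\phi_1 & 0 \\ 1 & 0 \end{pmatrix}
   = \begin{pmatrix} -\bar\theta_1 & 0 \\ 1 & 0 \end{pmatrix}.
\end{aligned}
```

The resulting matrix is lower triangular with diagonal entries $-\bar\theta_1$ and $0$, so its eigenvalues are $-\bar\theta_1$ and $0$, and both lie inside the unit circle exactly when $|\bar\theta_1| < 1$.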
The next lemma contains the details on the first and second order derivatives of the filtered state
and its mean squared error when evaluated at δ1 = δ2 = δ.
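Lemma 3. Under the null hypothesis and Assumptions 1-4, we have:
1. For any $j \in I_a$, $a = 0, 1, 2$,
$$\mathrm{vec}\big(\nabla_{\theta_j} \bar P_{t|t}\big) = \left[\left(I - \frac{\bar P \bar H \bar H'}{\bar H' \bar P \bar H}\right) \bar F\right]^{\otimes 2} \mathrm{vec}\big(\nabla_{\theta_j} \bar P_{t-1|t-1}\big) + \mathrm{vec}\big(\bar P_{j,t}\big),$$
where
$$\begin{aligned}
\bar P_{j,t} ={}& \xi_*\left(I - \frac{\bar P \bar H \bar H'}{\bar H' \bar P \bar H}\right)\Big[\nabla_{\theta_j} \bar F_1 \bar P_* \bar F' + \bar F \bar P_* \nabla_{\theta_j} \bar F_1' + \nabla_{\theta_j} \bar Q_1\Big]\left(I - \frac{\bar H \bar H' \bar P}{\bar H' \bar P \bar H}\right) \\
& + (1 - \xi_*)\left(I - \frac{\bar P \bar H \bar H'}{\bar H' \bar P \bar H}\right)\Big[\nabla_{\theta_j} \bar F_2 \bar P_* \bar F' + \bar F \bar P_* \nabla_{\theta_j} \bar F_2' + \nabla_{\theta_j} \bar Q_2\Big]\left(I - \frac{\bar H \bar H' \bar P}{\bar H' \bar P \bar H}\right) \\
& - \xi_*\left(\frac{\bar P\big(\nabla_{\theta_j} \bar H_1 \bar H' + \bar H \nabla_{\theta_j} \bar H_1'\big)}{\bar H' \bar P \bar H} - \frac{\bar P \bar H \bar H'\big(\nabla_{\theta_j} \bar H_1' \bar P \bar H + \bar H' \bar P \nabla_{\theta_j} \bar H_1\big)}{\big(\bar H' \bar P \bar H\big)^2}\right)\big(\bar F \bar P_* \bar F' + \bar Q\big) \\
& - (1 - \xi_*)\left(\frac{\bar P\big(\nabla_{\theta_j} \bar H_2 \bar H' + \bar H \nabla_{\theta_j} \bar H_2'\big)}{\bar H' \bar P \bar H} - \frac{\bar P \bar H \bar H'\big(\nabla_{\theta_j} \bar H_2' \bar P \bar H + \bar H' \bar P \nabla_{\theta_j} \bar H_2\big)}{\big(\bar H' \bar P \bar H\big)^2}\right)\big(\bar F \bar P_* \bar F' + \bar Q\big).
\end{aligned}$$
2. For any $j \in I_a$, $a = 0, 1, 2$,
$$\nabla_{\theta_j} \bar x_{t|t} = \left[\left(I - \frac{\bar P \bar H \bar H'}{\bar H' \bar P \bar H}\right) \bar F\right] \nabla_{\theta_j} \bar x_{t-1|t-1} + \bar X_{j,t},$$
where the expression for $\bar X_{j,t}$ is given in the appendix.
3. For any $j \in I_a$ and $k \in I_b$, $a, b = 0, 1, 2$,
$$\mathrm{vec}\big(\nabla_{\theta_j}\nabla_{\theta_k} \bar P_{t|t}\big) = \left[\left(I - \frac{\bar P \bar H \bar H'}{\bar H' \bar P \bar H}\right) \bar F\right]^{\otimes 2} \mathrm{vec}\big(\nabla_{\theta_j}\nabla_{\theta_k} \bar P_{t-1|t-1}\big) + \mathrm{vec}\big(\bar P_{jk,t}\big),$$
where the expression for $\bar P_{jk,t}$ is given in the appendix.
4. For any $j \in I_a$ and $k \in I_b$, $a, b = 0, 1, 2$,
$$\nabla_{\theta_j}\nabla_{\theta_k} \bar x_{t|t} = \left[\left(I - \frac{\bar P \bar H \bar H'}{\bar H' \bar P \bar H}\right) \bar F\right] \nabla_{\theta_j}\nabla_{\theta_k} \bar x_{t-1|t-1} + \bar X_{jk,t},$$
where the expression for $\bar X_{jk,t}$ is given in the appendix.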
Lemma 3 shows that the first and second order derivatives of the filtered state and its mean
squared error all follow first order linear difference equations, and the lagged coefficient matrices for
them always include $\big[I - \bar P \bar H (\bar H' \bar P \bar H)^{-1} \bar H'\big] \bar F$. The recursive structures implied by Lemma 3 suggest
that $x_{t|t}$ and $P_{t|t}$ can be analyzed with the same strategy used to study the properties of the
higher order derivatives of $\xi_{t+1|t}$. The following example illustrates the results in Lemma 3 with an
ARMA(1, 1) model.
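The illustrative model (cont’d.) Consider the ARMA(1, 1) model in (2.18)-(2.19) and assume
only $\alpha_{s_t}$ switches. Lemma 3 implies that $\bar P_*$ and $\bar P$ are equal to 0 and $\bar Q$, respectively. Meanwhile,
further calculations show that $\nabla_{\theta_j} \bar P_{t|t} = 0$ for $j \in \{1, \ldots, n_\beta + 2\}$ and
$$\nabla_{\theta_j} \bar x_{t|t} = \begin{cases} \begin{pmatrix} 0 & 0 \end{pmatrix}' & j \in \{1, \ldots, n_\beta\}, \\[4pt] \begin{pmatrix} -\dfrac{\xi_*}{1 + \bar\theta_1} & 0 \end{pmatrix}' & j = n_\beta + 1, \\[8pt] \begin{pmatrix} -\dfrac{1 - \xi_*}{1 + \bar\theta_1} & 0 \end{pmatrix}' & j = n_\beta + 2. \end{cases}$$
The second order derivatives of $P_{t|t}$ and $x_{t|t}$, with respect to $\alpha_1$, satisfy
$$\nabla^2_{\alpha_1} \bar P_{t|t} = 2\xi_*(1 - \xi_*)\left(\frac{1}{1 - \bar\theta_1^2}\right)\begin{pmatrix} 1 & -\bar\theta_1 \\ -\bar\theta_1 & 1 \end{pmatrix}$$
and
$$\nabla^2_{\alpha_1} \bar x_{t|t} = \begin{pmatrix} -\bar\theta_1 & 0 \\ 1 & 0 \end{pmatrix} \nabla^2_{\alpha_1} \bar x_{t-1|t-1} - 2\nabla_{\alpha_1} \bar\xi_{t|t}\begin{pmatrix} 1 & 0 \end{pmatrix}' + \begin{pmatrix} -\bar\phi_1 \bar\theta_1 & \bar\phi_1 \end{pmatrix}'\left(\frac{y_t - \bar\alpha - \bar H' \bar F \bar x_{t-1|t-1}}{\bar\sigma^2}\right) 2\xi_*(1 - \xi_*),$$
where
$$\nabla_{\alpha_1} \bar\xi_{t|t} = \rho\,\nabla_{\alpha_1} \bar\xi_{t-1|t-1} + (1 - \xi_*)\xi_*\left[\frac{\nabla_{\alpha_1} \bar f_{1t}}{\bar f_t} - \frac{\nabla_{\alpha_1} \bar f_{2t}}{\bar f_t}\right].$$
In this case, when $\delta_1 = \delta_2$, $\nabla^2_{\alpha_1} \bar P_{t|t}$ is a constant matrix, while $\nabla^2_{\alpha_1} \bar x_{t|t}$ depends on $\nabla^2_{\alpha_1} \bar x_{t-1|t-1}$, $\nabla_{\alpha_1} \bar\xi_{t|t}$,
and the prediction error, $y_t - \bar\alpha - \bar H' \bar F \bar x_{t-1|t-1}$, at time $t$.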
4.3 Modified log likelihood function and its expansion
Because of the multiple local maxima in LA (p, q, β, δ1 , δ2 ), it is difficult to directly expand this function
around the null estimates $(\tilde\beta, \tilde\delta, \tilde\delta)$. Both Cho and White (2007) and Qu and Zhuo (2015) suggest
working with the concentrated likelihood function. To derive the concentrated likelihood function, β and
δ1 are treated as functions of δ2 and the dependence between (β, δ1 ) and δ2 is quantified using the first
order conditions that define the concentrated and modified log likelihood function (see Lemma A.3 in
the appendix). This effectively removes β and δ1 from the subsequent analysis and allows us to work
with the concentrated, modified log likelihood function, which is only a function of δ2 . Therefore, we
can expand the concentrated, modified log likelihood function around δ2 = δ̃ (see Lemma 4 below) to
obtain an approximation for M LR(p, q).
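For any $\delta_2 \in \Delta$, we can write
$$L(p, q, \delta_2) = \max_{\beta,\delta_1} L^A(p, q, \beta, \delta_1, \delta_2)$$
and
$$\big(\hat\beta(\delta_2), \hat\delta_1(\delta_2)\big) = \arg\max_{\beta,\delta_1} L^A(p, q, \beta, \delta_1, \delta_2).$$
Then,
$$MLR(p, q) = 2\max_{\delta_2}\big[L(p, q, \delta_2) - L(p, q, \tilde\delta)\big].$$
For $k \ge 1$, let $L^{(k)}_{i_1 \ldots i_k}(p, q, \delta_2)$ ($i_1, \ldots, i_k \in \{1, \ldots, n_\delta\}$) denote the $k$-th order derivative of $L(p, q, \delta_2)$
with respect to the $(i_1, \ldots, i_k)$-th elements of $\delta_2$. Let $d_j$ ($j \in \{1, \ldots, n_\delta\}$) denote the $j$-th element of
$(\delta_2 - \tilde\delta)$. Then, a fourth order Taylor expansion of $L(p, q, \delta_2)$ around $\tilde\delta$ is given by
$$\begin{aligned}
L(p, q, \delta_2) - L(p, q, \tilde\delta) ={}& \sum_{j=1}^{n_\delta} L^{(1)}_j(p, q, \tilde\delta)\, d_j + \frac{1}{2!}\sum_{j=1}^{n_\delta}\sum_{k=1}^{n_\delta} L^{(2)}_{jk}(p, q, \tilde\delta)\, d_j d_k \\
& + \frac{1}{3!}\sum_{j=1}^{n_\delta}\sum_{k=1}^{n_\delta}\sum_{l=1}^{n_\delta} L^{(3)}_{jkl}(p, q, \tilde\delta)\, d_j d_k d_l \\
& + \frac{1}{4!}\sum_{j=1}^{n_\delta}\sum_{k=1}^{n_\delta}\sum_{l=1}^{n_\delta}\sum_{m=1}^{n_\delta} L^{(4)}_{jklm}(p, q, \bar\delta)\, d_j d_k d_l d_m,
\end{aligned} \tag{4.5}$$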
where in the last term δ̄ is a value that lies between δ2 and δ̃. Two lemmas will be provided to analyze
this expansion. Here, two more assumptions, which are similar to Assumptions 4 and 5 in Qu and
Zhuo (2015), are needed.
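Assumption 5. There exists an open neighborhood of $(\beta_*, \delta_*)$, denoted by $B(\beta_*, \delta_*)$, and a sequence
of positive, strictly stationary and ergodic random variables $\{\upsilon_t\}$ satisfying $E\upsilon_t^{1+c} < L < \infty$ for some
$c > 0$, such that
$$\sup_{(\beta,\delta_1)\in B(\beta_*,\delta_*)} \left|\frac{\nabla_{\theta_{i_1}} \cdots \nabla_{\theta_{i_k}} f_t(\beta, \delta_1)}{f_t(\beta, \delta_1)}\right|^{\alpha(k)/k} < \upsilon_t$$
for all $i_1, \ldots, i_k \in \{1, \ldots, n_\beta + n_\delta\}$, where $1 \le k \le 5$; $\alpha(k) = 6$ if $k = 1, 2, 3$ and $\alpha(k) = 5$ if $k = 4, 5$.
Assumption 6. There exists $\eta > 0$, such that $\sup_{p,q\in[\epsilon,1-\epsilon]} \sup_{|\delta-\tilde\delta|<\eta} T^{-1}\big|L^{(5)}_{jklmn}(p, q, \delta)\big| = O_p(1)$
for all $j, k, l, m, n \in \{1, \ldots, n_\delta\}$, where $\epsilon$ is an arbitrary small constant satisfying $0 < \epsilon < 1/2$.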
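The next lemma characterizes the derivatives of $\hat{\beta}(\delta_2)$ and $\hat{\delta}_1(\delta_2)$ with respect to $\delta_2$ evaluated at $\delta_2 = \tilde{\delta}$. To shorten the expressions, let $\tilde{\xi}_{t+1|t}$ and $\tilde{f}_t$ denote $\xi_{t+1|t}(p, q, \beta, \delta_1, \delta_2)$ and $f_t(\beta, \delta_1)$ evaluated at $(\beta, \delta_1, \delta_2) = (\tilde{\beta}, \tilde{\delta}, \tilde{\delta})$. Also, let $\nabla_{\delta_{1i_1}} \cdots \nabla_{\delta_{1i_k}} \tilde{\xi}_{t|t-1}$ and $\nabla_{\delta_{1i_1}} \cdots \nabla_{\delta_{1i_k}} \tilde{f}_{1t}$ denote the $k$-th order derivative of $\xi_{t+1|t}(p, q, \beta, \delta_1, \delta_2)$ and $f_t(\beta, \delta_1)$ with respect to the $(i_1, \ldots, i_k)$-th elements of $\delta_1$, evaluated at $(\beta, \delta_1, \delta_2) = (\tilde{\beta}, \tilde{\delta}, \tilde{\delta})$. Finally, define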
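$$
\begin{aligned}
\tilde{U}_{jk,t} ={}& \left(\frac{1-\xi_*}{\xi_*}\right)^{2} \left[ \xi_* \frac{\nabla_{\delta_{1j}} \nabla_{\delta_{1k}} \tilde{f}_{1t}}{\tilde{f}_t} + (1-\xi_*) \frac{\nabla_{\delta_{1j}} \nabla_{\delta_{1k}} \tilde{f}_{2t}}{\tilde{f}_t} \right] + \left[ \xi_* \frac{\nabla_{\delta_{2j}} \nabla_{\delta_{2k}} \tilde{f}_{1t}}{\tilde{f}_t} + (1-\xi_*) \frac{\nabla_{\delta_{2j}} \nabla_{\delta_{2k}} \tilde{f}_{2t}}{\tilde{f}_t} \right] \\
& - \frac{1-\xi_*}{\xi_*} \left[ \xi_* \frac{\nabla_{\delta_{2j}} \nabla_{\delta_{1k}} \tilde{f}_{1t}}{\tilde{f}_t} + (1-\xi_*) \frac{\nabla_{\delta_{2j}} \nabla_{\delta_{1k}} \tilde{f}_{2t}}{\tilde{f}_t} + \xi_* \frac{\nabla_{\delta_{1j}} \nabla_{\delta_{2k}} \tilde{f}_{1t}}{\tilde{f}_t} + (1-\xi_*) \frac{\nabla_{\delta_{1j}} \nabla_{\delta_{2k}} \tilde{f}_{2t}}{\tilde{f}_t} \right] \\
& + \frac{1}{\xi_*^{2}} \left[ \nabla_{\delta_{1j}} \tilde{\xi}_{t|t-1} \left( \frac{\nabla_{\theta_k} \bar{f}_{1t}}{\bar{f}_t} - \frac{\nabla_{\theta_k} \bar{f}_{2t}}{\bar{f}_t} \right) + \left( \frac{\nabla_{\theta_j} \bar{f}_{1t}}{\bar{f}_t} - \frac{\nabla_{\theta_j} \bar{f}_{2t}}{\bar{f}_t} \right) \nabla_{\delta_{1k}} \tilde{\xi}_{t|t-1} \right], \qquad (4.6)
\end{aligned}
$$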
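and
$$\tilde{D}_{jk,t} = \frac{\xi_* \nabla_{(\beta', \delta_1')'} \tilde{f}_{1t} + (1-\xi_*) \nabla_{(\beta', \delta_2')'} \tilde{f}_{2t}}{\tilde{f}_t}\, \tilde{U}_{jk,t},$$
$$\tilde{I}_t = \left( \frac{\xi_* \nabla_{(\beta', \delta_1')'} \tilde{f}_{1t} + (1-\xi_*) \nabla_{(\beta', \delta_2')'} \tilde{f}_{2t}}{\tilde{f}_t} \right) \left( \frac{\xi_* \nabla_{(\beta', \delta_1')} \tilde{f}_{1t} + (1-\xi_*) \nabla_{(\beta', \delta_2')} \tilde{f}_{2t}}{\tilde{f}_t} \right),$$
$$\tilde{V}_{jklm} = T^{-1} \sum_{t=1}^{T} \tilde{U}_{jk,t} \tilde{U}_{lm,t}, \quad \tilde{D}_{lm} = T^{-1} \sum_{t=1}^{T} \tilde{D}_{lm,t}, \quad \tilde{I} = T^{-1} \sum_{t=1}^{T} \tilde{I}_t. \qquad (4.7)$$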
As will be seen, $\tilde{U}_{jk,t}$ is the leading term in $L^{(2)}_{jk}(p, q, \tilde{\delta})$, while $\tilde{D}_{jk,t}$ and $\tilde{I}_t$ appear when constructing the leading term of $L^{(4)}_{jklm}(p, q, \tilde{\delta})$. The next lemma, analogous to Lemma 3 in Qu and Zhuo (2015), presents the properties of $L^{(k)}_{i_1 \ldots i_k}(p, q, \delta_2)$ when $\delta_2$ is evaluated at the null estimate, i.e. $\tilde{\delta}$.
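Lemma 4. Under the null hypothesis and Assumptions 1-6, for all $j, k, l, m \in \{1, \ldots, n_\delta\}$, we have
1. $L^{(1)}_{j}(p, q, \tilde{\delta}) = 0$.
2. $T^{-1/2} L^{(2)}_{jk}(p, q, \tilde{\delta}) = T^{-1/2} \sum_{t=1}^{T} \tilde{U}_{jk,t} + o_p(1)$.
3. $T^{-3/4} L^{(3)}_{jkl}(p, q, \tilde{\delta}) = O_p(T^{-1/4})$.
4. $T^{-1} L^{(4)}_{jklm}(p, q, \tilde{\delta}) = -\{\tilde{V}_{jklm} - \tilde{D}_{jk}' \tilde{I}^{-1} \tilde{D}_{lm} + \tilde{V}_{jmkl} - \tilde{D}_{jm}' \tilde{I}^{-1} \tilde{D}_{kl} + \tilde{V}_{jlkm} - \tilde{D}_{jl}' \tilde{I}^{-1} \tilde{D}_{km}\} + o_p(1)$.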
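To see how Lemma 4 feeds into the limiting distribution, it may help to work through the scalar case $n_\delta = 1$. The sketch below is illustrative only. It assumes the local scaling $\eta = T^{1/4}(\delta_2 - \tilde{\delta})$, which is suggested by the normalizations in Lemma 4 but is introduced here as an assumption, and it uses $\tilde{\Omega}$ as shorthand for $\tilde{V}_{1111} - \tilde{D}_{11}' \tilde{I}^{-1} \tilde{D}_{11}$. It also treats the fourth order derivative at $\bar{\delta}$ as sharing the leading term given in Lemma 4.4.

```latex
% Scalar case n_\delta = 1, with d_1 = \delta_2 - \tilde\delta = T^{-1/4}\eta (assumed scaling).
\begin{align*}
2\,[L(p,q,\delta_2) - L(p,q,\tilde\delta)]
 &= T^{-1/2} L^{(2)}_{11}(p,q,\tilde\delta)\,\eta^{2}
  + \tfrac{1}{3}\,T^{-3/4} L^{(3)}_{111}(p,q,\tilde\delta)\,\eta^{3}
  + \tfrac{1}{12}\,T^{-1} L^{(4)}_{1111}(p,q,\bar\delta)\,\eta^{4} \\
 &= \Big(T^{-1/2}\sum_{t=1}^{T}\tilde U_{11,t}\Big)\eta^{2}
  + O_p(T^{-1/4})\,\eta^{3}
  - \tfrac{1}{4}\,\tilde\Omega\,\eta^{4} + o_p(1).
\end{align*}
% The first order term drops out by Lemma 4.1. With all indices equal to 1,
% the three bracketed pairs in Lemma 4.4 coincide, so
% T^{-1} L^{(4)}_{1111} = -3\tilde\Omega + o_p(1) and (1/12)(-3) = -1/4.
```

For fixed $\eta$, the quadratic and quartic terms have the same form as $W^{(2)}(p, q, \eta)$ in Proposition 1, and the coefficient $1/3$ on the cubic term is the one that reappears in the refinement term $W^{(3)}(p, q, \eta)$.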
The illustrative model (cont'd.) We use the $ARMA(1, 1)$ model to illustrate the leading terms of $T^{-1/2} L^{(2)}_{jk}(p, q, \tilde{\delta})$ and $T^{-1} L^{(4)}_{jklm}(p, q, \tilde{\delta})$ in Lemma 4. Suppose only $\alpha_{s_t}$ switches. Then, $\tilde{U}_{jk,t}$ and $\tilde{D}_{jk,t}$ equal, respectively,
1 − ξ∗
ξ∗
(
1 + φ̃21 + 2φ̃1 θ̃1
1 − θ̃12
−2(φ̃1 + θ̃1 )
t−1
X
!

(−θ̃1 )j−1 
1
σ̃ 2
t−j
X
!
µ̃2t
µ̃t
−1 +2
σ̃ 2
σ̃ 2
ρs−1
s=1
j=1
µ̃t−s
σ̃ 2
"X
t−1
s
ρ
s=1
+ φ̃1 θ̃1
µ̃t−s
σ̃ 2
#

µ̃t−j 

σ̃ 2
and
(yt−1 −α̃)µ̃t
σ̃ 2
µ̃t µ̃t−1
σ̃ 2
1
2σ̃ 2
µ̃2t
σ̃ 2
−1
− σ̃µ̃2t
ξ∗ (1−φ̃1 )
1+θ̃1
$\tilde{U}_{jk,t}$,
where $\tilde{\mu}_t$ denotes the residuals under the null hypothesis and $\tilde{\sigma}^2 = T^{-1} \sum_{t=1}^{T} \tilde{\mu}_t^2$. This makes the variance function of $T^{-1/2} \sum_{t=1}^{T} \tilde{U}_{jk,t}$, and therefore of $T^{-1/2} L^{(2)}_{jk}(p, q, \tilde{\delta})$, consistently estimable.
5 Asymptotic approximations
Let $L^{(2)}(p, q, \tilde{\delta})$ be a square matrix with its $(j, k)$-th element given by $L^{(2)}_{jk}(p, q, \tilde{\delta})$ for $j, k \in \{1, 2, \ldots, n_\delta\}$. This section includes three sets of results. (1) The weak convergence of $T^{-1/2} L^{(2)}(p, q, \tilde{\delta})$ over $\epsilon \leq p, q \leq 1 - \epsilon$. (2) The limiting distribution of $SupMLR(\Lambda_\epsilon)$. (3) A finite sample refinement that improves the asymptotic approximation.
5.1 Weak convergence of $L^{(2)}(p, q, \tilde{\delta})$
For $0 < p_r, q_r, p_s, q_s < 1$ and $j, k, l, m \in \{1, 2, \ldots, n_\delta\}$, define
$$\omega_{jklm}(p_r, q_r; p_s, q_s) = V_{jklm}(p_r, q_r; p_s, q_s) - D_{jk}'(p_r, q_r)\, I^{-1} D_{lm}(p_s, q_s), \qquad (5.1)$$
where $V_{jklm}(p_r, q_r; p_s, q_s) = E[U_{jk,t}(p_r, q_r) U_{lm,t}(p_s, q_s)]$, $D_{jk}(p_r, q_r) = E D_{jk,t}(p_r, q_r)$, and $I = E I_t$. Here, $U_{jk,t}(p_r, q_r)$, $D_{jk,t}(p_r, q_r)$ and $I_t$ have the same definitions as $\tilde{U}_{jk,t}$, $\tilde{D}_{jk,t}$ and $\tilde{I}_t$ in (4.6) and (4.7) but evaluated at $(p_r, q_r, \beta_*, \delta_*)$ instead of $(p_r, q_r, \tilde{\beta}, \tilde{\delta})$. The following lemma is parallel to Lemma 4 in Qu and Zhuo (2015).
Lemma 5. Under the null hypothesis and Assumptions 1-6, we have, over $\epsilon \leq p, q \leq 1 - \epsilon$:
$$T^{-1/2} L^{(2)}(p, q, \tilde{\delta}) \Rightarrow G(p, q),$$
where the elements of $G(p, q)$ are mean zero continuous Gaussian processes satisfying
$$\mathrm{Cov}[G_{jk}(p_r, q_r), G_{lm}(p_s, q_s)] = \omega_{jklm}(p_r, q_r; p_s, q_s)$$
for $j, k, l, m \in \{1, 2, \ldots, n_\delta\}$, where $\omega_{jklm}(p_r, q_r; p_s, q_s)$ is given by (5.1).
In the appendix, this lemma is proved by first showing the finite dimensional convergence and then
the stochastic equicontinuity.
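As a rough illustration of how a sample analogue of (5.1) might be assembled at a single fixed $(p, q)$, the sketch below forms the averages in (4.7) and returns $\tilde{V} - \tilde{D}'\tilde{I}^{-1}\tilde{D}$. It assumes the scalar case $n_\delta = 1$. The function name `omega_hat` and its inputs are hypothetical: `u` stands for the series $\tilde{U}_{11,t}$ and `score` for the $T \times n$ array whose rows are the score vectors that enter $\tilde{D}_{jk,t}$ and $\tilde{I}_t$. Computing those inputs from the filter is not shown.

```python
import numpy as np

def omega_hat(u, score):
    """Sample analogue of omega in (5.1) for n_delta = 1 at one (p, q).

    u     : shape (T,), hypothetical series of U_{11,t}
    score : shape (T, n), hypothetical per-period score vectors
    """
    u = np.asarray(u, dtype=float)
    score = np.asarray(score, dtype=float)
    T = u.shape[0]
    v = np.mean(u * u)                # V = T^{-1} sum_t U_t^2
    d = score.T @ u / T               # D = T^{-1} sum_t score_t * U_t
    info = score.T @ score / T        # I = T^{-1} sum_t score_t score_t'
    # V - D' I^{-1} D: the mean squared residual from projecting U_t on the scores
    return float(v - d @ np.linalg.solve(info, d))
```

Written this way, the quantity is the mean squared residual from a least squares projection of $\tilde{U}_{11,t}$ on the score, so it is non-negative by construction and shrinks toward zero when $\tilde{U}_{11,t}$ is nearly spanned by the score.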
5.2 Limiting distribution of $SupMLR(\Lambda_\epsilon)$
Let $E$ be a set of open balls that includes all possible values of $(p, q)$ such that $L^{(2)}_{jk}(p, q, \tilde{\delta}) \equiv 0$ for any $j, k \in \{1, 2, \ldots, n_\delta\}$. For example, if for some specific $j_1$ and $k_1$, $L^{(2)}_{j_1 k_1}(p_1, q_1, \tilde{\delta}) \equiv 0$, then $(p, q) \in E$ if $p \in (p_1 - \epsilon_1, p_1 + \epsilon_1)$ and $q \in (q_1 - \epsilon_1, q_1 + \epsilon_1)$ for any small $\epsilon_1$, say $\epsilon_1 = 0.01$. Define
$$\Lambda_\epsilon = \{(p, q) : \epsilon \leq p, q \leq 1 - \epsilon, \text{ and } (p, q) \notin E\}. \qquad (5.2)$$
Let $\Omega(p, q)$ be an $n_\delta^2$-dimensional square matrix whose $(j + (k-1)n_\delta, l + (m-1)n_\delta)$-th element is given by $\omega_{jklm}(p, q; p, q)$. Then, Lemma 5 implies $E[\mathrm{vec}\,G(p, q)\, \mathrm{vec}\,G(p, q)'] = \Omega(p, q)$. The next result, which is analogous to Proposition 2 in Qu and Zhuo (2015), gives the asymptotic distribution of $SupMLR(\Lambda_\epsilon)$.
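Proposition 1. Suppose the null hypothesis and Assumptions 1-6 hold. Then
$$SupMLR(\Lambda_\epsilon) \Rightarrow \sup_{(p,q) \in \Lambda_\epsilon}\ \sup_{\eta \in \mathbb{R}^{n_\delta}} W^{(2)}(p, q, \eta), \qquad (5.3)$$
where $\Lambda_\epsilon$ is given by (5.2) and
$$W^{(2)}(p, q, \eta) = \left(\eta^{\otimes 2}\right)' \mathrm{vec}\,G(p, q) - \frac{1}{4} \left(\eta^{\otimes 2}\right)' \Omega(p, q) \left(\eta^{\otimes 2}\right).$$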
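For intuition, the inner supremum in (5.3) has a closed form in the scalar case $n_\delta = 1$, where $\eta^{\otimes 2} = \eta^2$ and $G(p, q)$, $\Omega(p, q)$ are scalars. The following worked step is a sketch under that assumption only, and it takes $\Omega(p, q) > 0$.

```latex
\begin{align*}
W^{(2)}(p,q,\eta) &= \eta^{2}\,G(p,q) - \tfrac{1}{4}\,\eta^{4}\,\Omega(p,q), \\
\sup_{\eta\in\mathbb{R}} W^{(2)}(p,q,\eta)
 &= \frac{\big(\max\{G(p,q),\,0\}\big)^{2}}{\Omega(p,q)},
 \qquad \text{attained at } \eta^{2} = \frac{2\max\{G(p,q),\,0\}}{\Omega(p,q)}.
\end{align*}
% Maximise x G - x^2 \Omega / 4 over x = \eta^2 \ge 0:
% the first order condition gives x = 2G/\Omega when G > 0, and x = 0 otherwise.
```

Because $G(p, q)$ is mean zero Gaussian with variance $\Omega(p, q)$, at any single fixed $(p, q)$ this quantity behaves like an equal mixture of a point mass at zero and a $\chi^2_1$ variable. The outer supremum over $\Lambda_\epsilon$ is what makes the limit in (5.3) non-standard.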
Some important features of $\omega_{jklm}(p, q; p, q)$ have been shown in section 5.1 of Qu and Zhuo (2015) by simple examples. Similar features hold in the current context. In particular, this function depends on: (1) the model's dynamic properties (e.g., whether the regressors are strictly exogenous or predetermined), (2) which parameters are allowed to switch (e.g., regression coefficients or the variance of the errors), and (3) whether nuisance parameters are present.
5.3 A refinement
Qu and Zhuo (2015) provide a refinement to the asymptotic distribution when $L^{(2)}(p, q, \tilde{\delta}) \equiv 0$. In such situations, the magnitude of $T^{-1/2} \sum_{t=1}^{T} \tilde{U}_{jk,t}$ can be too small to dominate the higher order terms in the likelihood expansion when $p + q$ is close to 1. This indicates that an asymptotic distribution that relies entirely on $T^{-1/2} \sum_{t=1}^{T} \tilde{U}_{jk,t}$ can be inadequate. Motivated by this observation, I consider a refinement to the asymptotic approximation under Markov switching state space models. The following assumption is parallel to Assumption 6 in Qu and Zhuo (2015).
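Assumption 7. There exists an open neighborhood of $(\beta_*, \delta_*)$, $B(\beta_*, \delta_*)$, and a sequence of positive, strictly stationary and ergodic random variables $\{\upsilon_t\}$ satisfying $E\upsilon_t^{1+c} < \infty$ for some $c > 0$, such that the supremums of the following quantities over $B(\beta_*, \delta_*)$ are bounded from above by $\upsilon_t$:
$\left| \nabla_{\theta_{i_1}} \cdots \nabla_{\theta_{i_k}} f_t(\beta, \delta_1) / f_t(\beta, \delta_1) \right|^4$, $\left| \nabla_{\theta_{i_1}} \cdots \nabla_{\theta_{i_m}} f_t(\beta, \delta_1) / f_t(\beta, \delta_1) \right|^2$, $\left| \nabla_{\theta_{i_1}} \cdots \nabla_{\theta_{i_8}} f_t(\beta, \delta_1) / f_t(\beta, \delta_1) \right|$, $\left| \nabla_{\theta_{j_1}} \nabla_{\theta_{i_1}} \cdots \nabla_{\theta_{i_7}} f_t(\beta, \delta_1) / f_t(\beta, \delta_1) \right|$, $\left| \nabla_{\theta_{j_1}} \nabla_{\theta_{j_2}} \nabla_{\theta_{i_1}} \cdots \nabla_{\theta_{i_6}} f_t(\beta, \delta_1) / f_t(\beta, \delta_1) \right|$, where $k = 1, 2, 3, 4$, $m = 5, 6, 7$, $i_1, \ldots, i_8 \in \{1, \ldots, n_\beta + n_\delta\}$ and $j_1, j_2 \in \{1, \ldots, n_\beta\}$.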
Obtaining all the leading terms in an even higher order expansion of the modified likelihood function is very difficult in the current context. This paper considers incorporating some specific terms for the refinement. Define
$$\tilde{s}_{jkl,t}(p, q) = -\frac{(1-\xi_*)(1-2\xi_*)}{\xi_*^{2}}\, \frac{\nabla_{\delta_{1j}} \nabla_{\delta_{1k}} \nabla_{\delta_{1l}} \tilde{f}_{1t}}{\tilde{f}_t}, \qquad (5.4)$$
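where $\tilde{x}_{t|t}$ and $\tilde{P}_{t|t}$ are treated as constant when calculating $\nabla_{\delta_{1j}} \nabla_{\delta_{1k}} \nabla_{\delta_{1l}} \tilde{f}_{1t}$ here. For $j, k, l, m, n, u \in \{1, \ldots, n_\delta\}$, let $G^{(3)}_{jkl}(p, q)$ be a continuous Gaussian process with mean zero and satisfy
$$
\begin{aligned}
\omega^{(3)}_{jklmnu}(p_r, q_r; p_s, q_s) ={}& \mathrm{Cov}\left(G^{(3)}_{jkl}(p_r, q_r), G^{(3)}_{mnu}(p_s, q_s)\right) \\
={}& E\left[s_{jkl,t}(p_r, q_r)\, s_{mnu,t}(p_s, q_s)\right] \\
& - E\left[ \frac{\xi_* \nabla_{(\beta', \delta_1')} f_{1t} + (1-\xi_*) \nabla_{(\beta', \delta_2')} f_{2t}}{\tilde{f}_t}\, s_{jkl,t}(p_r, q_r) \right] I^{-1}\, E\left[ \frac{\xi_* \nabla_{(\beta', \delta_1')'} f_{1t} + (1-\xi_*) \nabla_{(\beta', \delta_2')'} f_{2t}}{\tilde{f}_t}\, s_{mnu,t}(p_s, q_s) \right],
\end{aligned}
$$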
where sjkl,t (p, q) is the same as s̃jkl,t (p, q) but is evaluated at true parameter values. The other
quantities on the right hand side are also evaluated at the true parameter values. For the fourth and
eighth order derivatives, define
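$$\tilde{k}_{jklm,t}(p, q) = (1-\xi_*) \left( 1 + \left( \frac{1-\xi_*}{\xi_*} \right)^{3} \right) \frac{\nabla_{\delta_{1j}} \nabla_{\delta_{1k}} \nabla_{\delta_{1l}} \nabla_{\delta_{1m}} \tilde{f}_{1t}}{\tilde{f}_t}, \qquad (5.5)$$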
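where $\tilde{x}_{t|t}$ and $\tilde{P}_{t|t}$ are treated as constant when calculating $\nabla_{\delta_{1j}} \nabla_{\delta_{1k}} \nabla_{\delta_{1l}} \nabla_{\delta_{1m}} \tilde{f}_{1t}$ here. For $i_1, \ldots, i_8 \in \{1, \ldots, n_\delta\}$, let $G^{(4)}_{i_1 i_2 i_3 i_4}(p, q)$ denote a continuous Gaussian process with mean zero and satisfy
$$
\begin{aligned}
\omega^{(4)}_{i_1 i_2 \ldots i_8}(p_r, q_r; p_s, q_s) ={}& \mathrm{Cov}\left(G^{(4)}_{i_1 i_2 i_3 i_4}(p_r, q_r), G^{(4)}_{i_5 i_6 i_7 i_8}(p_s, q_s)\right) \\
={}& E\left[k_{i_1 i_2 i_3 i_4,t}(p_r, q_r)\, k_{i_5 i_6 i_7 i_8,t}(p_s, q_s)\right] \\
& - E\left[ \frac{\xi_* \nabla_{(\beta', \delta_1')} f_{1t} + (1-\xi_*) \nabla_{(\beta', \delta_2')} f_{2t}}{\tilde{f}_t}\, k_{i_1 i_2 i_3 i_4,t}(p_r, q_r) \right] I^{-1}\, E\left[ \frac{\xi_* \nabla_{(\beta', \delta_1')'} f_{1t} + (1-\xi_*) \nabla_{(\beta', \delta_2')'} f_{2t}}{\tilde{f}_t}\, k_{i_5 i_6 i_7 i_8,t}(p_s, q_s) \right],
\end{aligned}
$$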
where ki1 i2 i3 i4 ,t (p, q) equals k̃i1 i2 i3 i4 ,t (p, q) but evaluated at the true parameter values. The remaining
quantities on the right hand side are also evaluated at the true parameter values. The next lemma
characterizes the asymptotic properties of s̃jkl,t (p, q) and k̃jklm,t (p, q) when j, k, l, m ∈ {1, ..., nδ }.
Lemma 6. Under the null hypothesis and Assumptions 1-7, we have
$$T^{-1/2} \sum_{t=1}^{T} \tilde{s}_{jkl,t}(p, q) \Rightarrow G^{(3)}_{jkl}(p, q)$$
and
$$T^{-1/2} \sum_{t=1}^{T} \tilde{k}_{jklm,t}(p, q) \Rightarrow G^{(4)}_{jklm}(p, q).$$
We now incorporate the corresponding terms to obtain a refined approximation. Let $G^{(3)}(p, q)$ be an $n_\delta^3$-dimensional vector whose $(j + (k-1)n_\delta + (l-1)n_\delta^2)$-th element is given by $G^{(3)}_{jkl}(p, q)$. Let $\Omega^{(3)}(p, q)$ denote an $n_\delta^3$-by-$n_\delta^3$ matrix whose $(j + (k-1)n_\delta + (l-1)n_\delta^2,\ m + (n-1)n_\delta + (r-1)n_\delta^2)$-th element is given by $\omega^{(3)}_{jklmnr}(p, q; p, q)$. Define
$$W^{(3)}(p, q, \eta) = T^{-1/4}\, \frac{1}{3} \left(\eta^{\otimes 3}\right)' \mathrm{vec}\,G^{(3)}(p, q) - T^{-1/2}\, \frac{1}{36} \left(\eta^{\otimes 3}\right)' \Omega^{(3)}(p, q)\, \eta^{\otimes 3}.$$
Let $G^{(4)}(p, q)$ be an $n_\delta^4$-dimensional vector whose $(j + (k-1)n_\delta + (l-1)n_\delta^2 + (m-1)n_\delta^3)$-th element is given by $G^{(4)}_{jklm}(p, q)$. Let $\Omega^{(4)}(p, q)$ be an $n_\delta^4$-by-$n_\delta^4$ matrix whose $(j + (k-1)n_\delta + (l-1)n_\delta^2 + (m-1)n_\delta^3,\ n + (r-1)n_\delta + (s-1)n_\delta^2 + (u-1)n_\delta^3)$-th element is given by $\omega^{(4)}_{jklmnrsu}(p, q; p, q)$. Define
$$W^{(4)}(p, q, \eta) = T^{-1/2}\, \frac{1}{12} \left(\eta^{\otimes 4}\right)' \mathrm{vec}\,G^{(4)}(p, q) - T^{-1}\, \frac{1}{576} \left(\eta^{\otimes 4}\right)' \Omega^{(4)}(p, q)\, \eta^{\otimes 4}.$$
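Then, the distribution of the $SupMLR(\Lambda_\epsilon)$ test can be approximated by:
$$S_\infty(\Lambda_\epsilon) \equiv \sup_{(p,q) \in \Lambda_\epsilon}\ \sup_{\eta \in \mathbb{R}^{n_\delta}} \left\{ W^{(2)}(p, q, \eta) + W^{(3)}(p, q, \eta) + W^{(4)}(p, q, \eta) \right\}, \qquad (5.6)$$
where $\Lambda_\epsilon$ is specified in (5.2). The following corollary is analogous to Corollary 1 in Qu and Zhuo (2015).
Corollary 1. Under Assumptions 1-7 and the null hypothesis, we have:
$$\Pr\left(SupMLR(\Lambda_\epsilon) \leq s\right) - \Pr\left(S_\infty(\Lambda_\epsilon) \leq s\right) \rightarrow 0,$$
over $\Lambda_\epsilon$ in (5.2).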
Note that the above result holds irrespective of the model. This follows because the additional
terms W (3) (p, q, η) and W (4) (p, q, η) both converge to zero as T → ∞. These terms provide refinements
in finite samples, having no effect asymptotically. The critical values can be obtained by following the
simulation procedures described in section 5.4 of Qu and Zhuo (2015).
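The inner supremum over $\eta$ in (5.6) can be approximated numerically. The sketch below does this by grid search for the scalar case $n_\delta = 1$ at one fixed $(p, q)$, given one realization of the three Gaussian terms. It is only an illustration: the function name, the grid, and the way the draws `g2`, `g3`, `g4` are produced are assumptions, and the simulation procedure of Qu and Zhuo (2015, section 5.4) is not reproduced here.

```python
import numpy as np

def refined_sup_over_eta(g2, g3, g4, omega2, omega3, omega4, T, eta_grid):
    """Grid approximation to sup_eta {W2 + W3 + W4} for n_delta = 1.

    g2, g3, g4             : one draw of G, G^(3), G^(4) at a fixed (p, q)
    omega2, omega3, omega4 : the matching variances Omega, Omega^(3), Omega^(4)
    All inputs are hypothetical placeholders for illustration.
    """
    eta = np.asarray(eta_grid, dtype=float)
    w2 = eta**2 * g2 - 0.25 * eta**4 * omega2
    w3 = T**-0.25 * eta**3 * g3 / 3.0 - T**-0.5 * eta**6 * omega3 / 36.0
    w4 = T**-0.5 * eta**4 * g4 / 12.0 - T**-1.0 * eta**8 * omega4 / 576.0
    return float(np.max(w2 + w3 + w4))
```

Since the grid can include $\eta = 0$, the returned value is then non-negative. As $T$ grows the $W^{(3)}$ and $W^{(4)}$ contributions shrink, so the result approaches the supremum of $W^{(2)}$ alone, in line with the remark that the refinement has no effect asymptotically.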
The illustrative model (cont'd). Consider the $ARMA(1, 1)$ model in (2.18)-(2.19) and assume $\alpha_{s_t}$ switches. This model can be written as:
$$m_t = \phi m_{t-1} + e_t + \theta e_{t-1}, \quad e_t \sim i.i.d.\ N(0, \sigma^2),$$
$$y_t = \alpha_{s_t} + m_t.$$
To illustrate the effects of the refined approximation, I simulate data using the $ARMA(1, 1)$ model with $T = 2000$, $\alpha_1 = \alpha_2 = 0$, $\phi = 0.9$, $\theta = -0.70$ and $\sigma = 0.2$. Then, for each simulated sample and fixed $(p, q)$, I calculate $MLR(p, q)$, the approximation to $MLR(p, q)$ using only the second and fourth order terms in the Taylor expansion, and the approximation to $MLR(p, q)$ using the second order, fourth order and refinement terms in the Taylor expansion. After simulating 500 samples, I calculate both the correlation between $MLR(p, q)$ and its approximation using only the second and fourth order terms, and the correlation between $MLR(p, q)$ and its approximation using the second order, fourth order and refinement terms.
Table 1: Comparison of correlations between the MLR(p, q) statistic and the original and refined approximations
                 Correlation between MLR(p, q) and
(p, q)           original approximation    refined approximation
(0.90, 0.90)     0.989                     0.994
(0.70, 0.90)     0.217                     0.843
(0.50, 0.80)     0.239                     0.822
I also check these correlations for different p and q. Results are summarized in Table 1. The results show that including the refinement terms brings the approximation closer to the MLR(p, q) statistic, with the largest gains at (0.70, 0.90) and (0.50, 0.80), where the original approximation correlates only weakly with the statistic.
6 Monte Carlo
This section examines the size and power properties of the SupMLR test statistic. The DGP is
$$m_t = \phi m_{t-1} + e_t + \theta e_{t-1}, \quad e_t \sim i.i.d.\ N(0, \sigma^2),$$
$$y_t = \alpha_{s_t} + m_t,$$
where $y_t$ is observable and $\alpha_{s_t}$ switches with $p(s_t = 1 | s_{t-1} = 1) = p$ and $p(s_t = 2 | s_{t-1} = 2) = q$. Assign $\phi = 0.9$, $\theta = -0.70$ and $\sigma = 0.2$. The choice of this DGP is motivated by both the model studied in Perron (1993) and the empirical application in the next section. In this section, $\Lambda_\epsilon$ is specified as in (5.2) with $\epsilon = 0.01$. The critical values and rejection frequencies are all based on 3000 replications.
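The DGP above can be simulated along the following lines. This is a minimal sketch rather than the code used in the paper. The function name, the burn-in length, the zero starting value for $m_t$, and drawing the initial regime from the stationary distribution $(1 - q)/(2 - p - q)$ are all assumptions made for illustration.

```python
import numpy as np

def simulate_ms_arma11(T, p, q, alpha1, alpha2, phi=0.9, theta=-0.7,
                       sigma=0.2, burn=200, seed=0):
    """Simulate y_t = alpha_{s_t} + m_t with m_t ARMA(1,1) and s_t a 2-state Markov chain."""
    rng = np.random.default_rng(seed)
    n = T + burn
    # Regime path: P(s_t=1 | s_{t-1}=1) = p, P(s_t=2 | s_{t-1}=2) = q.
    xi_star = (1.0 - q) / (2.0 - p - q)      # assumed stationary P(s_t = 1)
    s = np.empty(n, dtype=int)
    s[0] = 1 if rng.random() < xi_star else 2
    for t in range(1, n):
        stay = p if s[t - 1] == 1 else q
        s[t] = s[t - 1] if rng.random() < stay else 3 - s[t - 1]
    # ARMA(1,1) component: m_t = phi * m_{t-1} + e_t + theta * e_{t-1}.
    e = rng.normal(0.0, sigma, size=n + 1)
    m = np.zeros(n)
    m_prev = 0.0                              # assumed starting value
    for t in range(n):
        m[t] = phi * m_prev + e[t + 1] + theta * e[t]
        m_prev = m[t]
    y = np.where(s == 1, alpha1, alpha2) + m
    return y[burn:], s[burn:]

# Null design (alpha1 = alpha2 = 0) and one alternative (tau = 0.25), T = 500.
y_null, _ = simulate_ms_arma11(500, 0.7, 0.9, 0.0, 0.0, seed=1)
y_alt, s_alt = simulate_ms_arma11(500, 0.7, 0.9, -0.25, 0.25, seed=1)
```

Because the regime path and the innovations are drawn before the means are added, two calls with the same seed differ only through $\alpha_{s_t}$, which makes it easy to compare the null and alternative samples.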
Table 2: Rejection frequencies under the null hypothesis
Level                          1.00     2.50     5.00     7.50     10.00
T = 200   SupMLR(Λ0.01)        1.20     3.67     7.40     10.20    14.23
T = 500   SupMLR(Λ0.01)        1.17     2.63     6.33     9.60     12.70
Table 2 reports the sizes of the $SupMLR(\Lambda_\epsilon)$ test statistic at five different nominal levels. Under the null hypothesis, we set $\alpha_1 = \alpha_2 = 0$. The rejection frequencies overall are close to the nominal levels with mild over-rejections in some cases. For example, the rejection rates at the 5% and 10% levels are 7.40% and 14.23% respectively, for $\epsilon = 0.01$ and sample size $T = 200$. Similar rejection rates are observed when $T = 500$.
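One way to gauge the size of these deviations is to compare them with the simulation noise implied by 3000 replications. The sketch below computes the usual binomial standard error of a rejection frequency. It assumes independent replications and treats the critical value as fixed, which ignores the fact that the critical values are themselves simulated, so it should be read as a rough yardstick only. The function name is hypothetical.

```python
import math

def mc_standard_error(level_pct, replications):
    """Binomial standard error, in percentage points, of a rejection frequency."""
    p = level_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / replications)

levels = [1.0, 2.5, 5.0, 7.5, 10.0]
observed_T200 = [1.20, 3.67, 7.40, 10.20, 14.23]   # Table 2, T = 200
for lvl, obs in zip(levels, observed_T200):
    se = mc_standard_error(lvl, 3000)
    print(f"level {lvl:5.2f}%  observed {obs:5.2f}%  MC s.e. {se:.2f}  gap/s.e. {(obs - lvl) / se:.1f}")
```

At the 5% level the standard error is about 0.4 percentage points, so the gaps in Table 2 at T = 200 are larger than simulation noise alone would suggest, which is consistent with describing them as over-rejections rather than sampling error.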
For power properties, I set $\alpha_1 = -\tau$ and $\alpha_2 = \tau$, with $\tau$ = 0.05, 0.10, 0.15, 0.20 and 0.25, so that $|\alpha_1 - \alpha_2| = 2\tau$. The sample size is $T = 500$ and the number of replications is 3000. Three pairs of values for $(p, q)$ are considered: (0.70, 0.70), (0.70, 0.90) and (0.90, 0.95).
Table 3: Rejection frequencies under the alternative hypotheses
(p, q)         τ                 0.05     0.10     0.15     0.20     0.25
(0.70, 0.70)   SupMLR(Λ0.01)     6.33     6.67     21.00    74.33    99.67
(0.70, 0.90)   SupMLR(Λ0.01)     7.67     8.67     26.33    85.67    100
(0.90, 0.95)   SupMLR(Λ0.01)     5.47     6.13     17.70    69.57    97.13
Nominal level, 5%.
The rejection frequencies at the 5% nominal level are reported in Table 3. For each $(p, q)$ pair, the power of the SupMLR statistic increases monotonically as the magnitude of $|\alpha_1 - \alpha_2|$ increases.
7 Application
In this section, I apply the MLR test developed in the preceding sections to study the changes in
monthly U.S. unemployment rates. The data are from the labor force statistics reported in the Current
Population Survey. The full sample is from January 1960 to July 2015. A simple $ARMA(1, 1)$ model as in (2.18) and (2.19) is considered, i.e.
$$m_t = \phi m_{t-1} + e_t + \theta e_{t-1}, \quad e_t \sim i.i.d.\ N(0, \sigma^2),$$
$$y_t = \alpha_{s_t} + m_t.$$
Here, $\alpha_{s_t}$ switches and indicates the mean level of change in the unemployment rate at time $t$.
$SupMLR(\Lambda_{0.01})$ equals 36.99 for the full sample with the critical value being 8.83 at the 5% level.
The test statistic therefore provides strong evidence favoring the regime switching specification. To
provide some further evidence for the relevance of the regime switching specification, I estimate the
regimes implied by the model. Let st = 1 and st = 2 represent the tight and slack labor market
regimes, respectively. Here the tight and slack labor market regimes are defined from the perspective
of the job-seekers. Estimation results show that the regime shifts closely follow the recession and
expansion periods dated by the National Bureau of Economic Research (NBER) and shown in Figure 2. Comparing the smoothed regime probabilities and the dates of the recession and expansion
periods, I find that the labor market normally takes time to react at the beginning of a recession.
Estimation under the two-regime specification also provides detailed information on changes in the labor
market and the durations of each regime. In the tight labor market regime, the unemployment rate
increases by 0.26% per month on average and this regime lasts about 8.8 months. In the slack labor
market regime, the unemployment rate decreases by 0.03% per month and the regime lasts about
79.4 months. The model assigns low probabilities, around 20-30%, to the tight regime during the
relatively shallow recessions of July 1990 to March 1991 and March 2001 to November 2001. This
makes sense, because the increases in the unemployment rate during these two recessions are relatively
moderate when compared with other recessions, such as the recent Great Recession from December
2007 to June 2009.
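The reported durations can be mapped back to transition probabilities. The short sketch below assumes the durations are the usual expected durations of a two-state Markov chain, $1/(1 - p)$ and $1/(1 - q)$, with $s_t = 1$ the tight regime. The paper does not state the estimated $p$ and $q$ in this section, so the numbers produced here are implied values under that assumption, not reported estimates.

```python
def stay_prob_from_duration(months):
    """Invert expected duration = 1 / (1 - stay probability)."""
    return 1.0 - 1.0 / months

p_tight = stay_prob_from_duration(8.8)    # implied P(s_t = 1 | s_{t-1} = 1)
q_slack = stay_prob_from_duration(79.4)   # implied P(s_t = 2 | s_{t-1} = 2)

# Implied long-run share of months spent in the tight regime.
xi_tight = (1.0 - q_slack) / (2.0 - p_tight - q_slack)
print(round(p_tight, 3), round(q_slack, 3), round(xi_tight, 3))
```

Under this reading, the long-run share of time in the tight regime equals 8.8 / (8.8 + 79.4), roughly one month in ten.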
Figure 2. Smoothed probabilities of being in the tight labor market regime
Note. The shaded areas correspond to the NBER defined recessions. The solid line indicates the smoothed probabilities
of being in the tight labor market regime.
8 Conclusion
This paper develops a modified likelihood ratio (MLR) based test for detecting regime switching in
state space models. The asymptotic distribution of this test statistic is also established. When
applied to changes in U.S. monthly unemployment rates, the test finds strong evidence favoring the
regime switching specification. This paper is the first to develop a test that is based on the likelihood
ratio principle for detecting regime switching in state space models. The techniques developed in this
paper can have implications for hypothesis testing in more general contexts, such as testing for regime
switching in state space models with multiple observables.
References
Billingsley, P. (1999): Convergence of Probability Measures, John Wiley & Sons.
Carrasco, M., L. Hu, and W. Ploberger (2014): “Optimal Test for Markov Switching Parameters,” Econometrica, 82, 765–784.
Carter, A. V. and D. G. Steigerwald (2012): “Testing for Regime Switching: A Comment,”
Econometrica, 80, 1809–1812.
Chesher, A. D. (1984): “Testing for Neglected Heterogeneity,” Econometrica, 52, 865–872.
Cho, J. S. and H. White (2007): “Testing for Regime Switching,” Econometrica, 75, 1671–1720.
Davidson, R. and J. MacKinnon (1991): “Une nouvelle forme du test de la matrice d’information,”
Annales d’Economie et de Statistique, 171–192.
Davies, R. B. (1977): “Hypothesis Testing When a Nuisance Parameter Is Present Only under the
Alternative,” Biometrika, 64, 247–254.
Garcia, R. (1998): “Asymptotic Null Distribution of the Likelihood Ratio Test in Markov Switching
Models,” International Economic Review, 39, 763–788.
Gordon, K. and A. Smith (1988): “Modeling and Monitoring Discontinuous Changes in Time
Series,” in Bayesian Analysis of Time Series and Dynamic Linear models, ed. by J. Spall, New
York: Marcel Dekker, 359–392.
Hamilton, J. D. (1989): “A New Approach to the Economic Analysis of Nonstationary Time Series
and the Business Cycle,” Econometrica, 57, 357–384.
——— (1990): “Analysis of Time Series Subject to Changes in Regime,” Journal of Econometrics,
45, 39–70.
——— (1994): Time Series Analysis, Princeton, NJ: Princeton University Press.
——— (1996): “Specification Testing in Markov-Switching Time-Series Models,” Journal of Econometrics, 70, 127–157.
——— (2008): “Regime-Switching Models,” in New Palgrave Dictionary of Economics, ed. by
S. Durlauf and L. Blume, London: Palgrave Macmillan.
Hansen, B. E. (1992): “The Likelihood Ratio Test under Non-Standard Conditions: Testing the
Markov Switching Model of GNP,” Journal of Applied Econometrics, 7, S61–S82.
Harrison, P. J. and C. F. Stevens (1976): “Bayesian Forecasting,” Journal of the Royal Statistical
Society. Series B (Methodological), 38, 205–247.
Harvey, A. C. (1981): Time Series Models, Oxford, UK: Philip Allan and Humanities Press.
Highfield, R. A. (1990): “Bayesian Approaches to Turning Point Prediction,” in Proceedings of the
Business and Economics Section, Washington, DC: American Statistical Association, 89–98.
Kalman, R. E. (1960): “A New Approach to Linear Filtering and Prediction Problems,” Journal of
Basic Engineering, 82, 35–45.
Kim, C.-J. and C. R. Nelson (1999): State-Space Models with Regime Switching: Classical and
Gibbs-Sampling Approaches with Applications, Cambridge, MA: The MIT Press.
Lancaster, T. (1984): “The Covariance Matrix of the Information Matrix Test,” Econometrica, 52,
1051–1053.
Meinhold, R. J. and N. D. Singpurwalla (1983): “Understanding the Kalman Filter,” The
American Statistician, 37, 123–127.
Neyman, J. and E. Scott (1966): “On the Use of C(a) Optimal Tests of Composite Hypotheses,”
Bull. Inst. Int. Statist., 41, 477–497.
Perron, P. (1993): “Non-stationarities and Non-linearities in Canadian Inflation,” in Economic
Behaviour and Policy Choice under Price Stability : Proceedings of a Conference Held at the Bank
of Canada.
Qu, Z. and F. Zhuo (2015): “Likelihood Ratio Based Tests for Markov Regime Switching,” Working
Paper.
Appendix
A Derivatives of the density function
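The derivatives of $f_{it}$ in (2.16) are calculated here. By (2.5), (2.6), (2.7), (2.8), (2.14) and (4.2), we have:
$$f_{it} = \left[ 2\pi C^{(i)}_{t|t-1} \right]^{-1/2} \exp\left\{ -\frac{\left[\mu^{(i)}_{t|t-1}\right]^2}{2 C^{(i)}_{t|t-1}} \right\},$$
where
$$\mu^{(i)}_{t|t-1} = y_t - H_i' \left( G_i + F_i x_{t-1|t-1} \right) - A_i' z_t$$
and
$$C^{(i)}_{t|t-1} = H_i' \left( F_i P_{t-1|t-1} F_i' + Q_i \right) H_i.$$
Then the first order derivative of $\mu^{(i)}_{t|t-1}$ with respect to the $j$-th component in $\theta$ is
$$\nabla_{\theta_j} \mu^{(i)}_{t|t-1} = -\nabla_{\theta_j} H_i' \left( G_i + F_i x_{t-1|t-1} \right) - H_i' \left( \nabla_{\theta_j} G_i + \nabla_{\theta_j} F_i\, x_{t-1|t-1} + F_i \nabla_{\theta_j} x_{t-1|t-1} \right) - \nabla_{\theta_j} A_i' z_t, \qquad (A.1)$$
and the first order derivative of $C^{(i)}_{t|t-1}$ with respect to the $j$-th component in $\theta$ is
$$
\begin{aligned}
\nabla_{\theta_j} C^{(i)}_{t|t-1} ={}& \nabla_{\theta_j} H_i' \left( F_i P_{t-1|t-1} F_i' + Q_i \right) H_i + H_i' \left( F_i P_{t-1|t-1} F_i' + Q_i \right) \nabla_{\theta_j} H_i \\
& + H_i' \left( \nabla_{\theta_j} F_i\, P_{t-1|t-1} F_i' + F_i \nabla_{\theta_j} P_{t-1|t-1} F_i' + F_i P_{t-1|t-1} \nabla_{\theta_j} F_i' + \nabla_{\theta_j} Q_i \right) H_i.
\end{aligned}
$$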
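Then the first order derivative of $f_{it}$ with respect to the $j$-th component in $\theta$ is
$$\nabla_{\theta_j} f_{it} = -f_{it} \left( \frac{\mu^{(i)}_{t|t-1}}{C^{(i)}_{t|t-1}} \right) \nabla_{\theta_j} \mu^{(i)}_{t|t-1} + f_{it} \left( \frac{1}{2 C^{(i)}_{t|t-1}} \right) \left( \frac{\left[\mu^{(i)}_{t|t-1}\right]^2}{C^{(i)}_{t|t-1}} - 1 \right) \nabla_{\theta_j} C^{(i)}_{t|t-1}. \qquad (A.2)$$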
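Formula (A.2) can be checked numerically against a finite difference. The sketch below does this for a scalar parameter. The mappings `mu_of` and `c_of` are hypothetical stand-ins for $\mu^{(i)}_{t|t-1}(\theta)$ and $C^{(i)}_{t|t-1}(\theta)$, chosen only so that both the mean and the variance move with $\theta$. They are not the state space expressions in (A.1).

```python
import math

def density(mu, c):
    """Normal density of a prediction error mu with variance c."""
    return math.exp(-mu * mu / (2.0 * c)) / math.sqrt(2.0 * math.pi * c)

def density_grad(mu, c, dmu, dc):
    """Derivative of the density as in (A.2), given d(mu)/d(theta) and d(c)/d(theta)."""
    f = density(mu, c)
    return -f * (mu / c) * dmu + f * (1.0 / (2.0 * c)) * (mu * mu / c - 1.0) * dc

# Hypothetical scalar parameterisation, used only for this check.
def mu_of(theta):
    return 0.3 + 0.5 * theta

def c_of(theta):
    return 1.0 + theta * theta

theta, h = 0.4, 1e-6
numeric = (density(mu_of(theta + h), c_of(theta + h))
           - density(mu_of(theta - h), c_of(theta - h))) / (2.0 * h)
analytic = density_grad(mu_of(theta), c_of(theta), 0.5, 2.0 * theta)
print(numeric, analytic)
```

The same pattern, differentiating through both the prediction error and its variance, is what produces the longer second order expressions that follow.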
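Similarly, the derivative of $\nabla_{\theta_j} \mu^{(i)}_{t|t-1}$ with respect to the $k$-th component in $\theta$ is
$$
\begin{aligned}
\nabla_{\theta_j} \nabla_{\theta_k} \mu^{(i)}_{t|t-1} ={}& -\nabla_{\theta_j} \nabla_{\theta_k} H_i' \left( G_i + F_i x_{t-1|t-1} \right) - \nabla_{\theta_j} H_i' \left( \nabla_{\theta_k} G_i + \nabla_{\theta_k} F_i\, x_{t-1|t-1} + F_i \nabla_{\theta_k} x_{t-1|t-1} \right) \\
& - \nabla_{\theta_k} H_i' \left( \nabla_{\theta_j} G_i + \nabla_{\theta_j} F_i\, x_{t-1|t-1} + F_i \nabla_{\theta_j} x_{t-1|t-1} \right) \\
& - H_i' \left( \nabla_{\theta_j} \nabla_{\theta_k} G_i + \nabla_{\theta_j} \nabla_{\theta_k} F_i\, x_{t-1|t-1} + \nabla_{\theta_j} F_i \nabla_{\theta_k} x_{t-1|t-1} + \nabla_{\theta_k} F_i \nabla_{\theta_j} x_{t-1|t-1} + F_i \nabla_{\theta_j} \nabla_{\theta_k} x_{t-1|t-1} \right) \\
& - \nabla_{\theta_j} \nabla_{\theta_k} A_i' z_t. \qquad (A.3)
\end{aligned}
$$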
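The derivative of $\nabla_{\theta_j} C^{(i)}_{t|t-1}$ with respect to the $k$-th component in $\theta$ is
$$
\begin{aligned}
\nabla_{\theta_j} \nabla_{\theta_k} C^{(i)}_{t|t-1} ={}& \nabla_{\theta_j} \nabla_{\theta_k} H_i' \left( F_i P_{t-1|t-1} F_i' + Q_i \right) H_i + \nabla_{\theta_j} H_i' \left( F_i P_{t-1|t-1} F_i' + Q_i \right) \nabla_{\theta_k} H_i \\
& + \nabla_{\theta_j} H_i' \left( \nabla_{\theta_k} F_i\, P_{t-1|t-1} F_i' + F_i \nabla_{\theta_k} P_{t-1|t-1} F_i' + F_i P_{t-1|t-1} \nabla_{\theta_k} F_i' + \nabla_{\theta_k} Q_i \right) H_i \\
& + \nabla_{\theta_k} H_i' \left( F_i P_{t-1|t-1} F_i' + Q_i \right) \nabla_{\theta_j} H_i + H_i' \left( F_i P_{t-1|t-1} F_i' + Q_i \right) \nabla_{\theta_j} \nabla_{\theta_k} H_i \\
& + H_i' \left( \nabla_{\theta_k} F_i\, P_{t-1|t-1} F_i' + F_i \nabla_{\theta_k} P_{t-1|t-1} F_i' + F_i P_{t-1|t-1} \nabla_{\theta_k} F_i' + \nabla_{\theta_k} Q_i \right) \nabla_{\theta_j} H_i \\
& + \nabla_{\theta_k} H_i' \left( \nabla_{\theta_j} F_i\, P_{t-1|t-1} F_i' + F_i \nabla_{\theta_j} P_{t-1|t-1} F_i' + F_i P_{t-1|t-1} \nabla_{\theta_j} F_i' + \nabla_{\theta_j} Q_i \right) H_i \\
& + H_i' \left( \nabla_{\theta_j} F_i\, P_{t-1|t-1} F_i' + F_i \nabla_{\theta_j} P_{t-1|t-1} F_i' + F_i P_{t-1|t-1} \nabla_{\theta_j} F_i' + \nabla_{\theta_j} Q_i \right) \nabla_{\theta_k} H_i \\
& + H_i' \left( \nabla_{\theta_j} \nabla_{\theta_k} F_i\, P_{t-1|t-1} F_i' + \nabla_{\theta_j} F_i \nabla_{\theta_k} P_{t-1|t-1} F_i' + \nabla_{\theta_j} F_i\, P_{t-1|t-1} \nabla_{\theta_k} F_i' \right) H_i \\
& + H_i' \left( \nabla_{\theta_k} F_i \nabla_{\theta_j} P_{t-1|t-1} F_i' + F_i \nabla_{\theta_j} \nabla_{\theta_k} P_{t-1|t-1} F_i' + F_i \nabla_{\theta_j} P_{t-1|t-1} \nabla_{\theta_k} F_i' \right) H_i \\
& + H_i' \left( \nabla_{\theta_k} F_i\, P_{t-1|t-1} \nabla_{\theta_j} F_i' + F_i \nabla_{\theta_k} P_{t-1|t-1} \nabla_{\theta_j} F_i' + F_i P_{t-1|t-1} \nabla_{\theta_j} \nabla_{\theta_k} F_i' \right) H_i \\
& + H_i' \nabla_{\theta_j} \nabla_{\theta_k} Q_i H_i. \qquad (A.4)
\end{aligned}
$$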
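Then the derivative of $\nabla_{\theta_j} f_{it}$ with respect to the $k$-th component in $\theta$ is
$$
\begin{aligned}
\nabla_{\theta_j} \nabla_{\theta_k} f_{it} ={}& -\left( \nabla_{\theta_k} f_{it} \right) \left( \frac{\mu^{(i)}_{t|t-1}}{C^{(i)}_{t|t-1}} \right) \nabla_{\theta_j} \mu^{(i)}_{t|t-1} - f_{it} \left( \frac{\mu^{(i)}_{t|t-1}}{C^{(i)}_{t|t-1}} \right) \nabla_{\theta_j} \nabla_{\theta_k} \mu^{(i)}_{t|t-1} \\
& - f_{it} \left( \frac{\nabla_{\theta_k} \mu^{(i)}_{t|t-1}}{C^{(i)}_{t|t-1}} - \frac{\mu^{(i)}_{t|t-1} \nabla_{\theta_k} C^{(i)}_{t|t-1}}{\left[C^{(i)}_{t|t-1}\right]^2} \right) \nabla_{\theta_j} \mu^{(i)}_{t|t-1} \\
& + \left( \nabla_{\theta_k} f_{it} \right) \left( \frac{1}{2 C^{(i)}_{t|t-1}} \right) \left( \frac{\left[\mu^{(i)}_{t|t-1}\right]^2}{C^{(i)}_{t|t-1}} - 1 \right) \nabla_{\theta_j} C^{(i)}_{t|t-1} \\
& - f_{it} \left( \frac{\nabla_{\theta_k} C^{(i)}_{t|t-1}}{2 \left[C^{(i)}_{t|t-1}\right]^2} \right) \left( \frac{\left[\mu^{(i)}_{t|t-1}\right]^2}{C^{(i)}_{t|t-1}} - 1 \right) \nabla_{\theta_j} C^{(i)}_{t|t-1} \\
& + f_{it} \left( \frac{1}{2 C^{(i)}_{t|t-1}} \right) \left( \frac{2 \mu^{(i)}_{t|t-1} \nabla_{\theta_k} \mu^{(i)}_{t|t-1}}{C^{(i)}_{t|t-1}} - \frac{\left[\mu^{(i)}_{t|t-1}\right]^2 \nabla_{\theta_k} C^{(i)}_{t|t-1}}{\left[C^{(i)}_{t|t-1}\right]^2} \right) \nabla_{\theta_j} C^{(i)}_{t|t-1} \\
& + f_{it} \left( \frac{1}{2 C^{(i)}_{t|t-1}} \right) \left( \frac{\left[\mu^{(i)}_{t|t-1}\right]^2}{C^{(i)}_{t|t-1}} - 1 \right) \nabla_{\theta_j} \nabla_{\theta_k} C^{(i)}_{t|t-1}.
\end{aligned}
$$
B Proofs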
Proof of Lemma 1. The equation (4.1) can be written as
$$\xi_{t+1|t} = p + \rho \frac{A_t}{B_t}, \qquad (B.1)$$
where $A_t = f_{2t}(\xi_{t|t-1} - 1)$ and $B_t = (f_{1t} - f_{2t})\xi_{t|t-1} + f_{2t}$. Let "-" (e.g. $\bar{\xi}_{t|t-1}$) denote that the quantity is evaluated at $(\beta', \delta', \delta')$.
Consider Lemma 1.1. Because $\bar{f}_{1t} = \bar{f}_{2t} = \bar{f}_t$, it follows that
$$\bar{A}_t = \bar{f}_t(\bar{\xi}_{t|t-1} - 1) \quad \text{and} \quad \bar{B}_t = \bar{f}_t. \qquad (B.2)$$
Plugging this into (B.1), we have
$$\bar{\xi}_{t+1|t} = p + \rho(\bar{\xi}_{t|t-1} - 1).$$
This, together with (2.13), implies $\bar{\xi}_{2|1} = p + \rho(\bar{\xi}_{1|0} - 1) = p + \rho(\xi_* - 1) = \xi_*$, where the last equality follows from the definition of $\rho$ and $\xi_*$. This can be iterated forward, leading to $\bar{\xi}_{t+1|t} = \xi_*$ for all $t \geq 1$.
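The recursion (B.1) and the fixed point in Lemma 1.1 can be illustrated in a few lines. The sketch below assumes $\rho = p + q - 1$ and $\xi_* = (1 - q)/(2 - p - q)$. These definitions are given earlier in the paper and are restated here as assumptions, and they are consistent with the identity $p + \rho(\xi_* - 1) = \xi_*$ used above. The function name and the density values are hypothetical.

```python
def predict_regime_prob(xi_prev, f1, f2, p, q):
    """One step of (B.1): xi_{t+1|t} = p + rho * A_t / B_t."""
    rho = p + q - 1.0                      # assumed definition of rho
    a_t = f2 * (xi_prev - 1.0)             # A_t = f_2t (xi_{t|t-1} - 1)
    b_t = (f1 - f2) * xi_prev + f2         # B_t = (f_1t - f_2t) xi_{t|t-1} + f_2t
    return p + rho * a_t / b_t

p, q = 0.7, 0.9
xi_star = (1.0 - q) / (2.0 - p - q)        # assumed definition of xi_*

# Lemma 1.1: when f_1t = f_2t in every period, the prediction stays at xi_*.
xi = xi_star
for f in [0.3, 1.2, 0.05, 2.0]:            # arbitrary common density values
    xi = predict_regime_prob(xi, f, f, p, q)

# With f_1t != f_2t, (B.1) matches the usual two-step form:
# filter xi_{t|t} = xi f1 / B_t, then predict p xi_{t|t} + (1 - q)(1 - xi_{t|t}).
xi_prev, f1, f2 = 0.4, 0.8, 0.3
filtered = xi_prev * f1 / (xi_prev * f1 + (1.0 - xi_prev) * f2)
two_step = p * filtered + (1.0 - q) * (1.0 - filtered)
one_step = predict_regime_prob(xi_prev, f1, f2, p, q)
```

The second check shows that (B.1) is an algebraic rearrangement of the familiar filter-then-predict update, which is why only the ratio $A_t / B_t$ needs to be differentiated in the rest of the proof.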
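Consider Lemma 1.2. Differentiating (B.1) with respect to the $j$-th component in $\theta$, we have
$$\nabla_{\theta_j} \xi_{t+1|t} = \rho \left\{ \frac{\nabla_{\theta_j} A_t}{B_t} - \frac{A_t \nabla_{\theta_j} B_t}{B_t^2} \right\}, \qquad (B.3)$$
where
$$\nabla_{\theta_j} A_t = \nabla_{\theta_j} f_{2t} (\xi_{t|t-1} - 1) + f_{2t} \nabla_{\theta_j} \xi_{t|t-1}$$
and
$$\nabla_{\theta_j} B_t = (\nabla_{\theta_j} f_{1t} - \nabla_{\theta_j} f_{2t}) \xi_{t|t-1} + (f_{1t} - f_{2t}) \nabla_{\theta_j} \xi_{t|t-1} + \nabla_{\theta_j} f_{2t}.$$
Below, we evaluate the right hand side of (B.3) at $(\beta', \delta', \delta')$ for two possible situations:
(1). If $j \in I_0$, then $\nabla_{\theta_j} \bar{f}_{1t} = \nabla_{\theta_j} \bar{f}_{2t}$ and $\bar{f}_{1t} = \bar{f}_{2t} = \bar{f}_t$. Consequently
$$\nabla_{\theta_j} \bar{A}_t = \nabla_{\theta_j} \bar{f}_{2t} (\xi_* - 1) + \bar{f}_t \nabla_{\theta_j} \bar{\xi}_{t|t-1}, \qquad \nabla_{\theta_j} \bar{B}_t = \nabla_{\theta_j} \bar{f}_{2t}. \qquad (B.4)$$
Combining (B.4) with (B.2), we have $\nabla_{\theta_j} \bar{\xi}_{t+1|t} = \rho \nabla_{\theta_j} \bar{\xi}_{t|t-1}$. This implies, at $t = 1$, we have $\nabla_{\theta_j} \bar{\xi}_{2|1} = \rho \nabla_{\theta_j} \bar{\xi}_{1|0} = \rho \nabla_{\theta_j} \xi_* = 0$. This can be iterated forward, leading to $\nabla_{\theta_j} \bar{\xi}_{t|t-1} = 0$.
(2). If $j \in I_1$ or $j \in I_2$, then $\bar{f}_{1t} = \bar{f}_{2t} = \bar{f}_t$ and
$$\nabla_{\theta_j} \bar{A}_t = \nabla_{\theta_j} \bar{f}_{2t} (\xi_* - 1) + \bar{f}_t \nabla_{\theta_j} \bar{\xi}_{t|t-1}, \qquad \nabla_{\theta_j} \bar{B}_t = \xi_* \nabla_{\theta_j} \bar{f}_{1t} + (1 - \xi_*) \nabla_{\theta_j} \bar{f}_{2t}. \qquad (B.5)$$
Combining this with (B.2), we have
$$
\begin{aligned}
\nabla_{\theta_j} \bar{\xi}_{t+1|t} &= \rho \left\{ \nabla_{\theta_j} \bar{\xi}_{t|t-1} - (\xi_* - 1) \xi_* \left( \frac{\nabla_{\theta_j} \bar{f}_{1t}}{\bar{f}_t} - \frac{\nabla_{\theta_j} \bar{f}_{2t}}{\bar{f}_t} \right) \right\} \\
&= \rho \nabla_{\theta_j} \bar{\xi}_{t|t-1} + r \left( \frac{\nabla_{\theta_j} \bar{f}_{1t}}{\bar{f}_t} - \frac{\nabla_{\theta_j} \bar{f}_{2t}}{\bar{f}_t} \right),
\end{aligned}
$$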
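where $r = \rho(1 - \xi_*)\xi_*$. Note that $\nabla_{\theta_j} \bar{\xi}_{t+1|t}$ can also be written as
$$\nabla_{\theta_j} \bar{\xi}_{t+1|t} = r \sum_{s=0}^{t-1} \rho^{s} \left( \frac{\nabla_{\theta_j} \bar{f}_{1t-s}}{\bar{f}_{t-s}} - \frac{\nabla_{\theta_j} \bar{f}_{2t-s}}{\bar{f}_{t-s}} \right). \qquad (B.6)$$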
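The closed form (B.6) can be checked against a numerical derivative of the filter. The sketch below uses a deliberately simple setting that is an assumption of this example, not the state space model of the paper: $f_{1t}$ and $f_{2t}$ are unit variance normal densities with means `a1` and `a2`, and the derivative is taken with respect to `a1` at `a1 = a2`. In this toy case $\nabla f_{2t} = 0$ and $(\nabla f_{1t} - \nabla f_{2t})/f_t = y_t - a$. The definitions $\rho = p + q - 1$ and $\xi_* = (1 - q)/(2 - p - q)$ are assumed as before.

```python
import numpy as np

def predicted_probs(y, a1, a2, p, q):
    """Run recursion (B.1) with unit-variance normal densities centred at a1, a2."""
    rho = p + q - 1.0
    xi = (1.0 - q) / (2.0 - p - q)          # xi_{1|0} = xi_*
    path = []
    for yt in y:
        f1 = np.exp(-0.5 * (yt - a1) ** 2)  # normalising constants cancel in A_t / B_t
        f2 = np.exp(-0.5 * (yt - a2) ** 2)
        xi = p + rho * f2 * (xi - 1.0) / ((f1 - f2) * xi + f2)
        path.append(xi)
    return np.array(path)

def gradient_b6(y, a, p, q):
    """Derivative of xi_{t+1|t} w.r.t. a1 at a1 = a2 = a, via the recursion behind (B.6)."""
    rho = p + q - 1.0
    xi_star = (1.0 - q) / (2.0 - p - q)
    r = rho * (1.0 - xi_star) * xi_star
    grad, path = 0.0, []
    for yt in y:
        grad = rho * grad + r * (yt - a)    # unrolls to r * sum_s rho^s (y_{t-s} - a)
        path.append(grad)
    return np.array(path)

rng = np.random.default_rng(0)
y = rng.normal(size=40)
h = 1e-5
numeric = (predicted_probs(y, h, 0.0, 0.7, 0.9)
           - predicted_probs(y, -h, 0.0, 0.7, 0.9)) / (2.0 * h)
analytic = gradient_b6(y, 0.0, 0.7, 0.9)
```

The agreement of `numeric` and `analytic` reflects the geometric weighting $\rho^s$ in (B.6): the effect of a past observation on the predicted regime probability decays at rate $\rho = p + q - 1$.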
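Because
$$\left( \frac{\nabla_{\theta_j} \bar{f}_{1t}}{\bar{f}_t} - \frac{\nabla_{\theta_j} \bar{f}_{2t}}{\bar{f}_t} \right) = -\left( \frac{\nabla_{\theta_{j-n_\delta}} \bar{f}_{1t}}{\bar{f}_t} - \frac{\nabla_{\theta_{j-n_\delta}} \bar{f}_{2t}}{\bar{f}_t} \right), \qquad (B.7)$$
when $j \in I_2$, we have
$$\nabla_{\theta_j} \bar{\xi}_{t+1|t} = -\nabla_{\theta_{j-n_\delta}} \bar{\xi}_{t+1|t}.$$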
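In addition, from (2.13) and (B.6), we have
$$
\begin{aligned}
\nabla_{\theta_j} \bar{\xi}_{t|t} &= (1 - \xi_*) \xi_* \sum_{s=0}^{t-1} \rho^{s} \left( \frac{\nabla_{\theta_j} \bar{f}_{1t-s}}{\bar{f}_{t-s}} - \frac{\nabla_{\theta_j} \bar{f}_{2t-s}}{\bar{f}_{t-s}} \right) \\
&= \rho \nabla_{\theta_j} \bar{\xi}_{t-1|t-1} + (1 - \xi_*) \xi_* \left( \frac{\nabla_{\theta_j} \bar{f}_{1t}}{\bar{f}_t} - \frac{\nabla_{\theta_j} \bar{f}_{2t}}{\bar{f}_t} \right), \qquad (B.8)
\end{aligned}
$$
when $j \in I_a$, $a = 1, 2$, and
$$\nabla_{\theta_j} \bar{\xi}_{t|t} = -\nabla_{\theta_{j-n_\delta}} \bar{\xi}_{t|t}.$$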
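Consider Lemma 1.3. Differentiating (B.3) with respect to $\theta_k$:
$$\nabla_{\theta_j} \nabla_{\theta_k} \xi_{t+1|t} = \rho \left\{ \frac{\nabla_{\theta_j} \nabla_{\theta_k} A_t}{B_t} - \frac{\nabla_{\theta_j} A_t \nabla_{\theta_k} B_t}{B_t^2} - \frac{\nabla_{\theta_k} A_t \nabla_{\theta_j} B_t}{B_t^2} - \frac{A_t \nabla_{\theta_j} \nabla_{\theta_k} B_t}{B_t^2} + 2 \frac{A_t \nabla_{\theta_j} B_t \nabla_{\theta_k} B_t}{B_t^3} \right\}, \qquad (B.9)$$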
where
∇θj ∇θk At = ∇θj ∇θk f2t (ξt|t−1 − 1) + ∇θj f2t ∇θk ξt|t−1 + ∇θk f2t ∇θj ξt|t−1 + f2t ∇θj ∇θk ξt|t−1 ,
∇θj ∇θk Bt = (∇θj ∇θk f1t − ∇θj ∇θk f2t )ξt|t−1 + (∇θj f1t − ∇θj f2t )∇θk ξt|t−1
+ (∇θk f1t − ∇θk f2t )∇θj ξt|t−1 + (f1t − f2t )∇θj ∇θk ξt|t−1 + ∇θj ∇θk f2t .
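We now evaluate the right hand side of (B.9) at $\delta_1 = \delta_2 = \delta$ under three possible situations:
(1) If $j \in I_0$ and $k \in I_0$, then $\bar{f}_{1t} = \bar{f}_{2t} = \bar{f}_t$, $\nabla_{\theta_j} \bar{f}_{1t} = \nabla_{\theta_j} \bar{f}_{2t}$, $\nabla_{\theta_k} \bar{f}_{1t} = \nabla_{\theta_k} \bar{f}_{2t}$, $\nabla_{\theta_j} \nabla_{\theta_k} \bar{f}_{1t} = \nabla_{\theta_j} \nabla_{\theta_k} \bar{f}_{2t}$ and $\nabla_{\theta_j} \bar{\xi}_{t+1|t} = \nabla_{\theta_k} \bar{\xi}_{t+1|t} = 0$, implying $\nabla_{\theta_j} \nabla_{\theta_k} \bar{A}_t = \nabla_{\theta_j} \nabla_{\theta_k} \bar{f}_{2t} (\bar{\xi}_{t|t-1} - 1) + \bar{f}_t \nabla_{\theta_j} \nabla_{\theta_k} \bar{\xi}_{t|t-1}$ and $\nabla_{\theta_j} \nabla_{\theta_k} \bar{B}_t = \nabla_{\theta_j} \nabla_{\theta_k} \bar{f}_{2t}$. Combining them with (B.4) and (B.2), $\nabla_{\theta_j} \nabla_{\theta_k} \bar{\xi}_{t+1|t}$ equals
$$
\begin{aligned}
& \rho \left\{ \frac{\nabla_{\theta_j} \nabla_{\theta_k} \bar{f}_{2t} (\bar{\xi}_{t|t-1} - 1) + \bar{f}_t \nabla_{\theta_j} \nabla_{\theta_k} \bar{\xi}_{t|t-1}}{\bar{f}_t} - \frac{(\bar{\xi}_{t|t-1} - 1) \nabla_{\theta_j} \bar{f}_{2t} \nabla_{\theta_k} \bar{f}_{2t}}{\bar{f}_t^2} - \frac{(\bar{\xi}_{t|t-1} - 1) \nabla_{\theta_k} \bar{f}_{2t} \nabla_{\theta_j} \bar{f}_{2t}}{\bar{f}_t^2} \right. \\
& \qquad \left. - \frac{(\bar{\xi}_{t|t-1} - 1) \nabla_{\theta_j} \nabla_{\theta_k} \bar{f}_{2t}}{\bar{f}_t} + 2 \frac{(\bar{\xi}_{t|t-1} - 1) \nabla_{\theta_j} \bar{f}_{2t} \nabla_{\theta_k} \bar{f}_{2t}}{\bar{f}_t^2} \right\} \\
& = \rho \nabla_{\theta_j} \nabla_{\theta_k} \bar{\xi}_{t|t-1}.
\end{aligned}
$$
Starting at $t = 1$ and iterating forward, we have $\nabla_{\theta_j} \nabla_{\theta_k} \bar{\xi}_{t+1|t} = 0$ for all $t \geq 1$.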
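(2) If $j \in I_0$ and $k \in I_1$, then $\nabla_{\theta_j} \bar{f}_{1t} = \nabla_{\theta_j} \bar{f}_{2t}$ and $\nabla_{\theta_j} \bar{\xi}_{t+1|t} = 0$, which imply that
$$\nabla_{\theta_j} \nabla_{\theta_k} \bar{A}_t = \nabla_{\theta_j} \nabla_{\theta_k} \bar{f}_{2t} (\xi_* - 1) + \nabla_{\theta_j} \bar{f}_{2t} \nabla_{\theta_k} \bar{\xi}_{t|t-1} + \bar{f}_t \nabla_{\theta_j} \nabla_{\theta_k} \bar{\xi}_{t|t-1}$$
and
$$\nabla_{\theta_j} \nabla_{\theta_k} \bar{B}_t = \xi_* \nabla_{\theta_j} \nabla_{\theta_k} \bar{f}_{1t} + (1 - \xi_*) \nabla_{\theta_j} \nabla_{\theta_k} \bar{f}_{2t}.$$
Combining these two equations with (B.2), (B.4) and (B.5), $\nabla_{\theta_j} \nabla_{\theta_k} \bar{\xi}_{t+1|t}$ equals
$$
\begin{aligned}
\rho \Big\{ & \frac{1}{\bar{f}_t} \left[ \nabla_{\theta_j} \nabla_{\theta_k} \bar{f}_{2t} (\xi_* - 1) + \nabla_{\theta_j} \bar{f}_{2t} \nabla_{\theta_k} \bar{\xi}_{t|t-1} + \bar{f}_t \nabla_{\theta_j} \nabla_{\theta_k} \bar{\xi}_{t|t-1} \right] - \frac{1}{\bar{f}_t^2} \nabla_{\theta_j} \bar{f}_{2t} (\xi_* - 1) \left[ \xi_* \nabla_{\theta_k} \bar{f}_{1t} + (1 - \xi_*) \nabla_{\theta_k} \bar{f}_{2t} \right] \\
& - \frac{1}{\bar{f}_t^2} \nabla_{\theta_j} \bar{f}_{2t} \left[ \nabla_{\theta_k} \bar{f}_{2t} (\xi_* - 1) + \bar{f}_t \nabla_{\theta_k} \bar{\xi}_{t|t-1} \right] - (\xi_* - 1) \frac{1}{\bar{f}_t} \left[ \xi_* \nabla_{\theta_j} \nabla_{\theta_k} \bar{f}_{1t} + (1 - \xi_*) \nabla_{\theta_j} \nabla_{\theta_k} \bar{f}_{2t} \right] \\
& + 2 (\xi_* - 1) \frac{1}{\bar{f}_t^2} \nabla_{\theta_j} \bar{f}_{2t} \left[ \xi_* \nabla_{\theta_k} \bar{f}_{1t} + (1 - \xi_*) \nabla_{\theta_k} \bar{f}_{2t} \right] \Big\}.
\end{aligned}
$$
The result follows from rearranging the terms. For the case $j \in I_0$ and $k \in I_2$, we have the same result.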
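(3) If $j \in I_1$ and $k \in I_1$, then
$$\nabla_{\theta_j} \nabla_{\theta_k} \bar{A}_t = \nabla_{\theta_j} \nabla_{\theta_k} \bar{f}_{2t} (\xi_* - 1) + \nabla_{\theta_j} \bar{f}_{2t} \nabla_{\theta_k} \bar{\xi}_{t|t-1} + \nabla_{\theta_k} \bar{f}_{2t} \nabla_{\theta_j} \bar{\xi}_{t|t-1} + \bar{f}_t \nabla_{\theta_j} \nabla_{\theta_k} \bar{\xi}_{t|t-1}$$
and
$$\nabla_{\theta_j} \nabla_{\theta_k} \bar{B}_t = \xi_* \nabla_{\theta_j} \nabla_{\theta_k} \bar{f}_{1t} + (1 - \xi_*) \nabla_{\theta_j} \nabla_{\theta_k} \bar{f}_{2t} + (\nabla_{\theta_j} \bar{f}_{1t} - \nabla_{\theta_j} \bar{f}_{2t}) \nabla_{\theta_k} \bar{\xi}_{t|t-1} + (\nabla_{\theta_k} \bar{f}_{1t} - \nabla_{\theta_k} \bar{f}_{2t}) \nabla_{\theta_j} \bar{\xi}_{t|t-1}.$$
Applying a similar derivation to the one above, $\nabla_{\theta_j} \nabla_{\theta_k} \bar{\xi}_{t+1|t}$ equals
$$
\begin{aligned}
\rho \Big\{ & \frac{1}{\bar{f}_t} \left[ \nabla_{\theta_j} \nabla_{\theta_k} \bar{f}_{2t} (\xi_* - 1) + \nabla_{\theta_j} \bar{f}_{2t} \nabla_{\theta_k} \bar{\xi}_{t|t-1} + \nabla_{\theta_k} \bar{f}_{2t} \nabla_{\theta_j} \bar{\xi}_{t|t-1} \right] + \nabla_{\theta_j} \nabla_{\theta_k} \bar{\xi}_{t|t-1} \\
& - \frac{1}{\bar{f}_t^2} \left[ \nabla_{\theta_j} \bar{f}_{2t} (\xi_* - 1) + \bar{f}_t \nabla_{\theta_j} \bar{\xi}_{t|t-1} \right] \left[ \xi_* \nabla_{\theta_k} \bar{f}_{1t} + (1 - \xi_*) \nabla_{\theta_k} \bar{f}_{2t} \right] \\
& - \frac{1}{\bar{f}_t^2} \left[ \xi_* \nabla_{\theta_j} \bar{f}_{1t} + (1 - \xi_*) \nabla_{\theta_j} \bar{f}_{2t} \right] \left[ \nabla_{\theta_k} \bar{f}_{2t} (\xi_* - 1) + \bar{f}_t \nabla_{\theta_k} \bar{\xi}_{t|t-1} \right] \\
& - (\xi_* - 1) \frac{1}{\bar{f}_t} \left[ \xi_* \nabla_{\theta_j} \nabla_{\theta_k} \bar{f}_{1t} + (1 - \xi_*) \nabla_{\theta_j} \nabla_{\theta_k} \bar{f}_{2t} + (\nabla_{\theta_j} \bar{f}_{1t} - \nabla_{\theta_j} \bar{f}_{2t}) \nabla_{\theta_k} \bar{\xi}_{t|t-1} + (\nabla_{\theta_k} \bar{f}_{1t} - \nabla_{\theta_k} \bar{f}_{2t}) \nabla_{\theta_j} \bar{\xi}_{t|t-1} \right] \\
& + 2 (\xi_* - 1) \frac{1}{\bar{f}_t^2} \left[ \xi_* \nabla_{\theta_j} \bar{f}_{1t} + (1 - \xi_*) \nabla_{\theta_j} \bar{f}_{2t} \right] \left[ \xi_* \nabla_{\theta_k} \bar{f}_{1t} + (1 - \xi_*) \nabla_{\theta_k} \bar{f}_{2t} \right] \Big\}.
\end{aligned}
$$
The result follows from rearranging the terms. For the cases $j \in I_1$ and $k \in I_2$, and $j \in I_2$ and $k \in I_2$, we have the same result.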
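Consider Lemma 1.4. Differentiating (B.9) with respect to $\theta_l$:
$$
\begin{aligned}
\nabla_{\theta_j} \nabla_{\theta_k} \nabla_{\theta_l} \xi_{t+1|t} = \rho \Big\{ & \frac{\nabla_{\theta_j} \nabla_{\theta_k} \nabla_{\theta_l} A_t}{B_t} - \frac{\nabla_{\theta_j} \nabla_{\theta_k} A_t \nabla_{\theta_l} B_t}{B_t^2} - \frac{\nabla_{\theta_j} \nabla_{\theta_l} A_t \nabla_{\theta_k} B_t}{B_t^2} - \frac{\nabla_{\theta_k} \nabla_{\theta_l} A_t \nabla_{\theta_j} B_t}{B_t^2} \\
& - \frac{\nabla_{\theta_j} A_t \nabla_{\theta_k} \nabla_{\theta_l} B_t}{B_t^2} - \frac{\nabla_{\theta_k} A_t \nabla_{\theta_j} \nabla_{\theta_l} B_t}{B_t^2} - \frac{\nabla_{\theta_l} A_t \nabla_{\theta_j} \nabla_{\theta_k} B_t}{B_t^2} - \frac{A_t \nabla_{\theta_j} \nabla_{\theta_k} \nabla_{\theta_l} B_t}{B_t^2} \\
& + \frac{2 \nabla_{\theta_j} A_t \nabla_{\theta_k} B_t \nabla_{\theta_l} B_t}{B_t^3} + \frac{2 \nabla_{\theta_k} A_t \nabla_{\theta_j} B_t \nabla_{\theta_l} B_t}{B_t^3} + \frac{2 \nabla_{\theta_l} A_t \nabla_{\theta_j} B_t \nabla_{\theta_k} B_t}{B_t^3} \\
& + \frac{2 A_t \nabla_{\theta_j} \nabla_{\theta_k} B_t \nabla_{\theta_l} B_t}{B_t^3} + \frac{2 A_t \nabla_{\theta_j} \nabla_{\theta_l} B_t \nabla_{\theta_k} B_t}{B_t^3} + \frac{2 A_t \nabla_{\theta_j} B_t \nabla_{\theta_k} \nabla_{\theta_l} B_t}{B_t^3} \\
& - \frac{6 A_t \nabla_{\theta_j} B_t \nabla_{\theta_k} B_t \nabla_{\theta_l} B_t}{B_t^4} \Big\},
\end{aligned}
$$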
where
∇θj ∇θk ∇θl At = ∇θj ∇θk ∇θl f2t (ξt|t−1 − 1) + ∇θj ∇θl f2t ∇θk ξt|t−1 + ∇θk ∇θl f2t ∇θj ξt|t−1
+ ∇θl f2t ∇θj ∇θk ξt|t−1 + ∇θj ∇θk f2t ∇θl ξt|t−1 + ∇θj f2t ∇θk ∇θl ξt|t−1
+ ∇θk f2t ∇θj ∇θl ξt|t−1 + f2t ∇θj ∇θk ∇θl ξt|t−1
and
∇θj ∇θk ∇θl Bt = (∇θj ∇θk ∇θl f1t − ∇θj ∇θk ∇θl f2t )ξt|t−1 + (∇θj ∇θl f1t − ∇θj ∇θl f2t )∇θk ξt|t−1
+ (∇θk ∇θl f1t − ∇θk ∇θl f2t )∇θj ξt|t−1 + (∇θl f1t − ∇θl f2t )∇θj ∇θk ξt|t−1
+ ∇θj ∇θk ∇θl f2t + (∇θj ∇θk f1t − ∇θj ∇θk f2t )∇θl ξt|t−1
+ (∇θj f1t − ∇θj f2t )∇θk ∇θl ξt|t−1 + (∇θk f1t − ∇θk f2t )∇θj ∇θl ξt|t−1
+ (f1t − f2t )∇θj ∇θk ∇θl ξt|t−1 .
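We now evaluate the above terms at $\delta_1 = \delta_2 = \delta$ for 4 possible cases. We only report the values of $\bar{E}_{jkl,t}$ but omit the derivation details.
(1) If $j \in I_0$, $k \in I_0$ and $l \in I_0$, then $\bar{E}_{jkl,t} = 0$.
(2) If $j \in I_0$, $k \in I_0$ and $l \notin I_0$, then $\bar{E}_{jkl,t}$ equals
$$
\begin{aligned}
r \Big\{ & \frac{1}{\bar{f}_t} \left( \nabla_{\theta_j} \nabla_{\theta_k} \nabla_{\theta_l} \bar{f}_{1t} - \nabla_{\theta_j} \nabla_{\theta_k} \nabla_{\theta_l} \bar{f}_{2t} \right) - \frac{1}{\bar{f}_t^2} \nabla_{\theta_j} \nabla_{\theta_k} \bar{f}_{2t} \left( \nabla_{\theta_l} \bar{f}_{1t} - \nabla_{\theta_l} \bar{f}_{2t} \right) - \frac{1}{\bar{f}_t^2} \nabla_{\theta_k} \bar{f}_{2t} \left( \nabla_{\theta_j} \nabla_{\theta_l} \bar{f}_{1t} - \nabla_{\theta_j} \nabla_{\theta_l} \bar{f}_{2t} \right) \\
& - \frac{1}{\bar{f}_t^2} \nabla_{\theta_j} \bar{f}_{2t} \left( \nabla_{\theta_k} \nabla_{\theta_l} \bar{f}_{1t} - \nabla_{\theta_k} \nabla_{\theta_l} \bar{f}_{2t} \right) + 2 \frac{1}{\bar{f}_t^3} \nabla_{\theta_j} \bar{f}_{2t} \nabla_{\theta_k} \bar{f}_{2t} \left( \nabla_{\theta_l} \bar{f}_{1t} - \nabla_{\theta_l} \bar{f}_{2t} \right) \Big\}.
\end{aligned}
$$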
(3) If $j \in I_0$, $k \notin I_0$ and $l \notin I_0$, then $\bar{E}_{jkl,t}$ equals
r
∇θj ∇θl ∇θk f¯2t
∇θj ∇θl ∇θk f¯1t
−
f¯t
f¯t
"
+ ρ(1 − 2ξ∗ )
∇θj ∇θk f¯1t − ∇θj ∇θk f¯2t
∇θl f¯1t − ∇θl f¯2t
∇θj ∇θl f¯1t − ∇θj ∇θl f¯2t
∇θk ξ̄t|t−1 +
∇θl ξ̄t|t−1 +
∇θj ∇θk ξ̄t|t−1
¯
¯
ft
ft
f¯t
∇θj f¯2t ∇θl f¯1t − ∇θl f¯2t
∇θj f¯2t ∇θk f¯1t − ∇θk f¯2t
∇θk f¯1t − ∇θk f¯2t
ξ̄
−
+
∇θj ∇θl ξ̄t|t−1 −
∇
∇θl ξ̄t|t−1
θ
t|t−1
k
f¯t
f¯t2
f¯t2
"
−r
∇θj f¯2t ∇θk ∇θl f¯1t − ∇θk ∇θl f¯2t
f¯t
∇θj ∇θk f¯2t ∇θl f¯1t − ∇θl f¯2t
+
f¯t
"
− 2rξ∗
∇θj ∇θk f¯2t ∇θl f¯1t − ∇θl f¯2t
−
f¯2
t
+ 2r(1 − 2ξ∗ )
∇θk f¯2t ∇θj ∇θl f¯1t − ∇θj ∇θl f¯2t
+
f¯t
∇θj ∇θl f¯2t ∇θk f¯1t − ∇θk f¯2t
+
f¯t
∇θj ∇θk f¯1t ∇θl f¯1t − ∇θl f¯2t
f¯2
t
∇θl f¯2t ∇θj ∇θk f¯1t − ∇θj ∇θk f¯2t
+
f¯t
#
∇θj ∇θl f¯1t ∇θk f¯1t − ∇θk f¯2t
+
f¯2
t
∇θj ∇θl f¯2t ∇θk f¯1t − ∇θk f¯2t
−
f¯2
#
t
∇θj f¯2t ∇θk f¯2t ∇θl f¯1t
∇θj f¯2t ∇θk f¯1t ∇θl f¯2t
+
3
¯
f
f¯3
t
#
+ 4r
t
ξ∗ ∇θj f¯2t ∇θl f¯1t ∇θk f¯1t
(1 − ξ∗ )∇θj f¯2t ∇θl f¯2t ∇θk f¯2t
−
3
¯
f
f¯3
t
t
(4) If $j \notin I_0$, $k \notin I_0$ and $l \notin I_0$, then $\bar{E}_{jkl,t}$ equals
∇θj ∇θl ∇θk f¯2t
∇θj ∇θl ∇θk f¯1t
−
f¯t
f¯t
"
+ ρ(1 − 2ξ∗ )
∇θj ∇θk f¯1t − ∇θj ∇θk f¯2t
∇θk ∇θl f¯1t − ∇θk ∇θl f¯2t
∇θj ∇θl f¯1t − ∇θj ∇θl f¯2t
∇θk ξ̄t|t−1 +
∇θl ξ̄t|t−1 +
∇θj ξ̄t|t−1
¯
¯
ft
ft
f¯t
∇θk f¯1t − ∇θk f¯2t
∇θj f¯1t − ∇θj f¯2t
∇θl f¯1t − ∇θl f¯2t
∇θj ∇θk ξ̄t|t−1 +
∇θj ∇θl ξ̄t|t−1 +
∇θk ∇θl ξ̄t|t−1
+
f¯t
f¯t
f¯t
"
−r
∇θj f¯2t ∇θk ∇θl f¯1t − ∇θk ∇θl f¯2t
f¯t
∇θj ∇θk f¯2t ∇θl f¯1t − ∇θl f¯2t
+
f¯t
"
− 2rξ∗
t
∇θj ∇θk f¯2t ∇θl f¯1t − ∇θl f¯2t
−
f¯2
t
"
− 2ρ
∇θk f¯2t ∇θj ∇θl f¯1t − ∇θj ∇θl f¯2t
+
f¯t
∇θj ∇θl f¯2t ∇θk f¯1t − ∇θk f¯2t
+
f¯t
∇θj ∇θk f¯1t ∇θl f¯1t − ∇θl f¯2t
f¯2
∇θj ∇θl f¯2t ∇θk f¯1t − ∇θk f¯2t
−
f¯2
t
#
∇θk ∇θl f¯1t ∇θj f¯1t − ∇θj f¯2t
+
f¯2
t
∇θk ∇θl f¯2t ∇θj f¯1t − ∇θj f¯2t
−
f¯2
#
t
∇θj f¯1t ∇θl f¯1t
∇θk f¯1t ∇θl f¯1t
∇θj f¯1t ∇θk f¯1t
∇θl ξ̄t|t−1 +
∇θk ξ̄t|t−1 +
∇θj ξ̄t|t−1
2
2
f¯
f¯
f¯2
t
"
− 6ξ∗ + 1)
t
#
t
∇θj f¯1t ∇θk f¯2t
∇θj f¯1t ∇θl f¯2t
∇θk f¯1t ∇θj f¯2t
∇θl ξ̄t|t−1 +
∇θk ξ̄t|t−1 +
∇θl ξ̄t|t−1
2
2
¯
¯
f
f
f¯2
t
t
+
∇θj f¯1t − ∇θj f¯2t
∇θk f¯1t − ∇θk f¯2t
∇θl f¯1t − ∇θl f¯2t
∇θk ξ̄t|t−1 ∇θl ξ̄t|t−1 +
∇θj ξ̄t|t−1 ∇θl ξ̄t|t−1 +
∇θj ξ̄t|t−1 ∇θk ξ̄t|t−1
¯
¯
ft
ft
f¯t
+ ρ(6ξ∗2 − 4ξ∗ )
−
t
∇θl f¯2t ∇θj ∇θk f¯1t − ∇θj ∇θk f¯2t
+
f¯t
∇θk ∇θl f¯2t ∇θj f¯1t − ∇θj f¯2t
+
f¯t
∇θj ∇θl f¯1t ∇θk f¯1t − ∇θk f¯2t
+
f¯2
"
ρ(6ξ∗2
#
t
∇θk f¯1t ∇θl f¯2t
∇θl f¯1t ∇θk f¯2t
∇θl f¯1t ∇θj f¯2t
∇θj ξ̄t|t−1 +
∇θk ξ̄t|t−1 +
∇θj ξ̄t|t−1
2
2
f¯
f¯
f¯2
t
t
"
2
+ ρ[6(1 − ξ∗ ) − 4(1 − ξ∗ )]
t
t
t
t
+ r[6(1 − ξ∗ )2 − 4(1 − ξ∗ )]
∇θj f¯2t ∇θk f¯2t
∇θj f¯2t ∇θl f¯2t
∇θk f¯2t ∇θl f¯2t
∇θl ξ̄t|t−1 +
∇θk ξ̄t|t−1 +
∇θj ξ̄t|t−1
2
2
¯
¯
f
f
f¯2
∇θj f¯1t ∇θk f¯1t ∇θl f¯1t
− r(6ξ∗2 − 4ξ∗ )
+ 6rξ∗2
f¯3
#
t
∇θj f¯1t ∇θk f¯1t ∇θl f¯2t
∇θj f¯1t ∇θk f¯2t ∇θl f¯1t
∇θj f¯2t ∇θk f¯1t ∇θl f¯1t
+
+
f¯3
f¯3
f¯3
t
t
t
t
∇θj f¯1t ∇θk f¯2t ∇θl f¯2t
∇θj f¯2t ∇θk f¯2t ∇θl f¯1t
∇θj f¯2t ∇θk f¯1t ∇θl f¯2t
+
+
f¯3
f¯3
f¯3
t
#
t
∇θj f¯2t ∇θk f¯2t ∇θl f¯2t
− 6r(1 − ξ∗ )2
.
f¯3
t
Proof of Lemma 2. When $\delta_1 = \delta_2$, we can directly apply Proposition 13.1 in Hamilton (1994) to show that the sequence of predicted mean squared errors $\bar P_{t|t-1}$ converges. Then, by (2.10) and (2.12), $\bar P_{t|t}$ also converges to a positive semidefinite matrix $\bar P_*$. Let $P_{0|0} = \bar P_*$; then the results in this lemma hold, and (4.4) can be obtained by combining (2.6), (2.10) and (2.12) when $\delta_1 = \delta_2$. In addition, by (2.12) and (2.6), $\bar P^{(i)}_{t|t}$ and $\bar P_{t+1|t}$ equal $\bar P_*$ and $\bar F \bar P_* \bar F' + \bar Q$, respectively, for $t = 1, \dots, T$.
Proof of Lemma 3. To show Lemma 3.1, combine (2.8) and (2.10) to obtain
$$P^{(i)}_{t|t} = \left(I - P^{(i)}_{t|t-1} H_i \left[H_i' P^{(i)}_{t|t-1} H_i\right]^{-1} H_i'\right) P^{(i)}_{t|t-1}.$$
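The first order derivative of $P^{(i)}_{t|t}$ with respect to $\theta_j$ gives
$$
\begin{aligned}
\nabla_{\theta_j} P^{(i)}_{t|t} ={}& \left(I - \frac{P^{(i)}_{t|t-1} H_i H_i'}{H_i' P^{(i)}_{t|t-1} H_i}\right) \nabla_{\theta_j} P^{(i)}_{t|t-1} \left(I - \frac{H_i H_i' P^{(i)}_{t|t-1}}{H_i' P^{(i)}_{t|t-1} H_i}\right) \\
&- \left[\frac{P^{(i)}_{t|t-1}\left(\nabla_{\theta_j} H_i H_i' + H_i \nabla_{\theta_j} H_i'\right)}{H_i' P^{(i)}_{t|t-1} H_i} - \frac{P^{(i)}_{t|t-1} H_i H_i' \left(\nabla_{\theta_j} H_i' P^{(i)}_{t|t-1} H_i + H_i' P^{(i)}_{t|t-1} \nabla_{\theta_j} H_i\right)}{\left(H_i' P^{(i)}_{t|t-1} H_i\right)^2}\right] P^{(i)}_{t|t-1}.
\end{aligned}
\tag{B.10}
$$
The first order derivative of $P^{(i)}_{t|t-1}$ in (2.6) with respect to $\theta_j$ gives
$$\nabla_{\theta_j} P^{(i)}_{t|t-1} = F_i \nabla_{\theta_j} P_{t-1|t-1} F_i' + \nabla_{\theta_j} F_i P_{t-1|t-1} F_i' + F_i P_{t-1|t-1} \nabla_{\theta_j} F_i' + \nabla_{\theta_j} Q_i.$$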
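By the last two equations, we have
$$
\begin{aligned}
\nabla_{\theta_j} P^{(i)}_{t|t} ={}& \left[\left(I - \frac{P^{(i)}_{t|t-1} H_i H_i'}{H_i' P^{(i)}_{t|t-1} H_i}\right) F_i\right] \nabla_{\theta_j} P_{t-1|t-1} \left[F_i' \left(I - \frac{H_i H_i' P^{(i)}_{t|t-1}}{H_i' P^{(i)}_{t|t-1} H_i}\right)\right] \\
&+ \left(I - \frac{P^{(i)}_{t|t-1} H_i H_i'}{H_i' P^{(i)}_{t|t-1} H_i}\right) \left[\nabla_{\theta_j} F_i P_{t-1|t-1} F_i' + F_i P_{t-1|t-1} \nabla_{\theta_j} F_i' + \nabla_{\theta_j} Q_i\right] \left(I - \frac{H_i H_i' P^{(i)}_{t|t-1}}{H_i' P^{(i)}_{t|t-1} H_i}\right) \\
&- \left[\frac{P^{(i)}_{t|t-1}\left(\nabla_{\theta_j} H_i H_i' + H_i \nabla_{\theta_j} H_i'\right)}{H_i' P^{(i)}_{t|t-1} H_i} - \frac{P^{(i)}_{t|t-1} H_i H_i' \left(\nabla_{\theta_j} H_i' P^{(i)}_{t|t-1} H_i + H_i' P^{(i)}_{t|t-1} \nabla_{\theta_j} H_i\right)}{\left(H_i' P^{(i)}_{t|t-1} H_i\right)^2}\right] P^{(i)}_{t|t-1}.
\end{aligned}
\tag{B.11}
$$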
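The derivative of the updating equation can be sanity-checked numerically. The sketch below is not from the paper: it assumes a scalar observation (so that $H_i' P^{(i)}_{t|t-1} H_i$ is a scalar), treats $P^{(i)}_{t|t-1}$ and $H_i$ as smooth functions of a single hypothetical parameter `theta`, uses made-up 2×2 matrices, and compares an analytic expression of the form (B.10) with a central finite difference.

```python
import numpy as np

def update_cov(P, H):
    """P_{t|t} = P - P H H' P / (H' P H), with P standing for P_{t|t-1}."""
    s = (H.T @ P @ H).item()
    return P - (P @ H @ H.T @ P) / s

def update_cov_grad(P, H, dP, dH):
    """Analytic derivative in the form of (B.10); dP and dH are the
    derivatives of P_{t|t-1} and H with respect to the parameter."""
    I = np.eye(P.shape[0])
    s = (H.T @ P @ H).item()
    ds_H = (dH.T @ P @ H + H.T @ P @ dH).item()  # H-part of d(H'PH)
    left = I - (P @ H @ H.T) / s
    right = I - (H @ H.T @ P) / s
    bracket = P @ (dH @ H.T + H @ dH.T) / s - (P @ H @ H.T) * ds_H / s**2
    return left @ dP @ right - bracket @ P

# Hypothetical linear parameterisation (illustrative values only).
A = np.array([[2.0, 0.3], [0.3, 1.0]])
B = np.array([[0.5, 0.1], [0.1, 0.2]])
h0 = np.array([[1.0], [0.5]])
h1 = np.array([[0.2], [-0.4]])

def P_of(theta):
    return A + theta * B

def H_of(theta):
    return h0 + theta * h1

theta, eps = 0.3, 1e-6
analytic = update_cov_grad(P_of(theta), H_of(theta), B, h1)
numeric = (update_cov(P_of(theta + eps), H_of(theta + eps))
           - update_cov(P_of(theta - eps), H_of(theta - eps))) / (2 * eps)
max_err = np.max(np.abs(analytic - numeric))
```

The three pieces `left @ dP @ right` and `bracket @ P` correspond to the two lines of the displayed derivative; the part of the derivative of the scalar denominator that involves `dP` is absorbed into the sandwich term.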
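Let $\delta_1 = \delta_2$, then
$$
\begin{aligned}
\nabla_{\theta_j} \bar P^{(i)}_{t|t} ={}& \left[\left(I - \frac{\bar P \bar H \bar H'}{\bar H' \bar P \bar H}\right) \bar F\right] \nabla_{\theta_j} \bar P_{t-1|t-1} \left[\bar F' \left(I - \frac{\bar H \bar H' \bar P}{\bar H' \bar P \bar H}\right)\right] \\
&+ \left(I - \frac{\bar P \bar H \bar H'}{\bar H' \bar P \bar H}\right) \left[\nabla_{\theta_j} \bar F_i \bar P_* \bar F' + \bar F \bar P_* \nabla_{\theta_j} \bar F_i' + \nabla_{\theta_j} Q_i\right] \left(I - \frac{\bar H \bar H' \bar P}{\bar H' \bar P \bar H}\right) \\
&- \left[\frac{\bar P \left(\nabla_{\theta_j} \bar H_i \bar H' + \bar H \nabla_{\theta_j} \bar H_i'\right)}{\bar H' \bar P \bar H} - \frac{\bar P \bar H \bar H' \left(\nabla_{\theta_j} \bar H_i' \bar P \bar H + \bar H' \bar P \nabla_{\theta_j} \bar H_i\right)}{\left(\bar H' \bar P \bar H\right)^2}\right] \left[\bar F \bar P_* \bar F' + \bar Q\right],
\end{aligned}
\tag{B.12}
$$
where
$$\left(I - \frac{\bar P \bar H \bar H'}{\bar H' \bar P \bar H}\right) \bar F \quad \text{and} \quad \bar F' \left(I - \frac{\bar H \bar H' \bar P}{\bar H' \bar P \bar H}\right)$$
have all their eigenvalues inside the unit circle by Assumption 4.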
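Taking the first order derivative with respect to $\theta_j$ in (2.12) and letting $\delta_1 = \delta_2$, we have
$$\nabla_{\theta_j} \bar P_{t|t} = \xi_* \nabla_{\theta_j} \bar P^{(1)}_{t|t} + (1 - \xi_*) \nabla_{\theta_j} \bar P^{(2)}_{t|t}.$$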
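Together with the result in (B.12),
$$
\begin{aligned}
\nabla_{\theta_j} \bar P_{t|t} ={}& \left[\left(I - \frac{\bar P \bar H \bar H'}{\bar H' \bar P \bar H}\right) \bar F\right] \nabla_{\theta_j} \bar P_{t-1|t-1} \left[\bar F' \left(I - \frac{\bar H \bar H' \bar P}{\bar H' \bar P \bar H}\right)\right] \\
&+ \xi_* \left(I - \frac{\bar P \bar H \bar H'}{\bar H' \bar P \bar H}\right) \left[\nabla_{\theta_j} \bar F_1 \bar P_* \bar F' + \bar F \bar P_* \nabla_{\theta_j} \bar F_1' + \nabla_{\theta_j} Q_1\right] \left(I - \frac{\bar H \bar H' \bar P}{\bar H' \bar P \bar H}\right) \\
&+ (1 - \xi_*) \left(I - \frac{\bar P \bar H \bar H'}{\bar H' \bar P \bar H}\right) \left[\nabla_{\theta_j} \bar F_2 \bar P_* \bar F' + \bar F \bar P_* \nabla_{\theta_j} \bar F_2' + \nabla_{\theta_j} Q_2\right] \left(I - \frac{\bar H \bar H' \bar P}{\bar H' \bar P \bar H}\right) \\
&- \xi_* \left[\frac{\bar P \left(\nabla_{\theta_j} \bar H_1 \bar H' + \bar H \nabla_{\theta_j} \bar H_1'\right)}{\bar H' \bar P \bar H} - \frac{\bar P \bar H \bar H' \left(\nabla_{\theta_j} \bar H_1' \bar P \bar H + \bar H' \bar P \nabla_{\theta_j} \bar H_1\right)}{\left(\bar H' \bar P \bar H\right)^2}\right] \left[\bar F \bar P_* \bar F' + \bar Q\right] \\
&- (1 - \xi_*) \left[\frac{\bar P \left(\nabla_{\theta_j} \bar H_2 \bar H' + \bar H \nabla_{\theta_j} \bar H_2'\right)}{\bar H' \bar P \bar H} - \frac{\bar P \bar H \bar H' \left(\nabla_{\theta_j} \bar H_2' \bar P \bar H + \bar H' \bar P \nabla_{\theta_j} \bar H_2\right)}{\left(\bar H' \bar P \bar H\right)^2}\right] \left[\bar F \bar P_* \bar F' + \bar Q\right],
\end{aligned}
\tag{B.13}
$$
which gives the result in the lemma.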
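To show Lemma 3.2, by (2.5), (2.7), (2.8) and (2.9), we have
$$x^{(i)}_{t|t} = \left(I - \frac{P^{(i)}_{t|t-1} H_i H_i'}{H_i' P^{(i)}_{t|t-1} H_i}\right) \left(F_i x_{t-1|t-1} + G_i\right) + \frac{P^{(i)}_{t|t-1} H_i}{H_i' P^{(i)}_{t|t-1} H_i} \left(y_t - A_i' z_t\right).$$
Then the first order derivative of $x^{(i)}_{t|t}$ with respect to $\theta_j$ gives
$$
\begin{aligned}
\nabla_{\theta_j} x^{(i)}_{t|t} ={}& \left(I - \frac{P^{(i)}_{t|t-1} H_i H_i'}{H_i' P^{(i)}_{t|t-1} H_i}\right) F_i \nabla_{\theta_j} x_{t-1|t-1} + \left(I - \frac{P^{(i)}_{t|t-1} H_i H_i'}{H_i' P^{(i)}_{t|t-1} H_i}\right) \left(\nabla_{\theta_j} F_i x_{t-1|t-1} + \nabla_{\theta_j} G_i\right) \\
&- \frac{\nabla_{\theta_j} P^{(i)}_{t|t-1} H_i H_i' + P^{(i)}_{t|t-1} \nabla_{\theta_j} H_i H_i' + P^{(i)}_{t|t-1} H_i \nabla_{\theta_j} H_i'}{H_i' P^{(i)}_{t|t-1} H_i} \left(F_i x_{t-1|t-1} + G_i\right) \\
&+ \frac{P^{(i)}_{t|t-1} H_i H_i' \left(\nabla_{\theta_j} H_i' P^{(i)}_{t|t-1} H_i + H_i' \nabla_{\theta_j} P^{(i)}_{t|t-1} H_i + H_i' P^{(i)}_{t|t-1} \nabla_{\theta_j} H_i\right)}{\left(H_i' P^{(i)}_{t|t-1} H_i\right)^2} \left(F_i x_{t-1|t-1} + G_i\right) \\
&+ \frac{\nabla_{\theta_j} P^{(i)}_{t|t-1} H_i + P^{(i)}_{t|t-1} \nabla_{\theta_j} H_i}{H_i' P^{(i)}_{t|t-1} H_i} \left(y_t - A_i' z_t\right) \\
&- \frac{P^{(i)}_{t|t-1} H_i \left(\nabla_{\theta_j} H_i' P^{(i)}_{t|t-1} H_i + H_i' \nabla_{\theta_j} P^{(i)}_{t|t-1} H_i + H_i' P^{(i)}_{t|t-1} \nabla_{\theta_j} H_i\right)}{\left(H_i' P^{(i)}_{t|t-1} H_i\right)^2} \left(y_t - A_i' z_t\right) \\
&- \frac{P^{(i)}_{t|t-1} H_i}{H_i' P^{(i)}_{t|t-1} H_i} \nabla_{\theta_j} A_i' z_t.
\end{aligned}
\tag{B.14}
$$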
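The state update that is being differentiated here can be written in a few lines. The sketch below is only illustrative and not taken from the paper: it assumes a scalar $y_t$ and no measurement noise, and all numerical values (and the function name `update_state`) are hypothetical. Under these assumptions the filtered state reproduces the observation exactly, in the sense that $H_i' x^{(i)}_{t|t} = y_t - A_i' z_t$, which the final lines verify.

```python
import numpy as np

def update_state(x_prev, P_pred, F, G, H, A, y, z):
    """x_{t|t} = (I - P H H'/(H'PH)) (F x_{t-1|t-1} + G) + P H/(H'PH) (y - A'z),
    where P stands for the regime-specific P_{t|t-1}."""
    s = (H.T @ P_pred @ H).item()
    x_pred = F @ x_prev + G                  # one-step-ahead state forecast
    gain = P_pred @ H / s                    # n x 1 gain vector
    residual_target = y - (A.T @ z).item()   # y_t - A' z_t
    return (np.eye(len(x_prev)) - gain @ H.T) @ x_pred + gain * residual_target

# Hypothetical inputs (illustrative values only).
F = np.array([[0.5, 0.1], [0.0, 0.3]])
G = np.array([[0.1], [0.2]])
H = np.array([[1.0], [0.5]])
A = np.array([[0.4]])
z = np.array([[1.0]])
P_pred = np.array([[2.0, 0.3], [0.3, 1.0]])
x_prev = np.array([[1.0], [-0.5]])
y = 1.7

x_filt = update_state(x_prev, P_pred, F, G, H, A, y, z)
fit_gap = abs((H.T @ x_filt).item() - (y - (A.T @ z).item()))
```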
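When $\delta_1 = \delta_2$, we have
$$
\begin{aligned}
\nabla_{\theta_j} \bar x^{(i)}_{t|t} ={}& \left(I - \frac{\bar P \bar H \bar H'}{\bar H' \bar P \bar H}\right) \bar F \nabla_{\theta_j} \bar x_{t-1|t-1} + \left(I - \frac{\bar P \bar H \bar H'}{\bar H' \bar P \bar H}\right) \left(\nabla_{\theta_j} \bar F_i \bar x_{t-1|t-1} + \nabla_{\theta_j} \bar G_i\right) \\
&- \frac{\nabla_{\theta_j} \bar P^{(i)}_{t|t-1} \bar H \bar H' + \bar P \nabla_{\theta_j} \bar H_i \bar H' + \bar P \bar H \nabla_{\theta_j} \bar H_i'}{\bar H' \bar P \bar H} \left(\bar F \bar x_{t-1|t-1} + \bar G\right) \\
&+ \frac{\bar P \bar H \bar H' \left(\nabla_{\theta_j} \bar H_i' \bar P \bar H + \bar H' \nabla_{\theta_j} \bar P^{(i)}_{t|t-1} \bar H + \bar H' \bar P \nabla_{\theta_j} \bar H_i\right)}{\left(\bar H' \bar P \bar H\right)^2} \left(\bar F \bar x_{t-1|t-1} + \bar G\right) \\
&+ \left[\frac{\nabla_{\theta_j} \bar P^{(i)}_{t|t-1} \bar H + \bar P \nabla_{\theta_j} \bar H_i}{\bar H' \bar P \bar H} - \frac{\bar P \bar H \left(\nabla_{\theta_j} \bar H_i' \bar P \bar H + \bar H' \nabla_{\theta_j} \bar P^{(i)}_{t|t-1} \bar H + \bar H' \bar P \nabla_{\theta_j} \bar H_i\right)}{\left(\bar H' \bar P \bar H\right)^2}\right] \left(y_t - \bar A' z_t\right) \\
&- \frac{\bar P \bar H}{\bar H' \bar P \bar H} \nabla_{\theta_j} \bar A_i' z_t,
\end{aligned}
\tag{B.15}
$$
where the expressions for $\nabla_{\theta_j} \bar P^{(1)}_{t|t-1}$ and $\nabla_{\theta_j} \bar P^{(2)}_{t|t-1}$, based on (2.6), are given by
$$\nabla_{\theta_j} \bar P^{(1)}_{t|t-1} = \nabla_{\theta_j} \bar F_1 \bar P \bar F' + \bar F \nabla_{\theta_j} \bar P_{t-1|t-1} \bar F' + \bar F \bar P \nabla_{\theta_j} \bar F_1' + \nabla_{\theta_j} \bar Q_1, \tag{B.16}$$
and
$$\nabla_{\theta_j} \bar P^{(2)}_{t|t-1} = \nabla_{\theta_j} \bar F_2 \bar P \bar F' + \bar F \nabla_{\theta_j} \bar P_{t-1|t-1} \bar F' + \bar F \bar P \nabla_{\theta_j} \bar F_2' + \nabla_{\theta_j} \bar Q_2. \tag{B.17}$$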
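By (2.11), when $\delta_1 = \delta_2$,
$$\nabla_{\theta_j} \bar x_{t|t} = \xi_* \nabla_{\theta_j} \bar x^{(1)}_{t|t} + (1 - \xi_*) \nabla_{\theta_j} \bar x^{(2)}_{t|t}.$$
Therefore,
$$\nabla_{\theta_j} \bar x_{t|t} = \left(I - \frac{\bar P \bar H \bar H'}{\bar H' \bar P \bar H}\right) \bar F \nabla_{\theta_j} \bar x_{t-1|t-1} + \bar X_{j,t}, \tag{B.18}$$
where
$$
\begin{aligned}
\bar X_{j,t} ={}& \left(I - \frac{\bar P \bar H \bar H'}{\bar H' \bar P \bar H}\right) \left[\xi_* \left(\nabla_{\theta_j} \bar F_1 \bar x_{t-1|t-1} + \nabla_{\theta_j} \bar G_1\right) + (1 - \xi_*) \left(\nabla_{\theta_j} \bar F_2 \bar x_{t-1|t-1} + \nabla_{\theta_j} \bar G_2\right)\right] \\
&- \xi_* \frac{\nabla_{\theta_j} \bar P^{(1)}_{t|t-1} \bar H \bar H' + \bar P \nabla_{\theta_j} \bar H_1 \bar H' + \bar P \bar H \nabla_{\theta_j} \bar H_1'}{\bar H' \bar P \bar H} \left(\bar F \bar x_{t-1|t-1} + \bar G\right) \\
&+ \xi_* \frac{\bar P \bar H \bar H' \left(\nabla_{\theta_j} \bar H_1' \bar P \bar H + \bar H' \nabla_{\theta_j} \bar P^{(1)}_{t|t-1} \bar H + \bar H' \bar P \nabla_{\theta_j} \bar H_1\right)}{\left(\bar H' \bar P \bar H\right)^2} \left(\bar F \bar x_{t-1|t-1} + \bar G\right) \\
&- (1 - \xi_*) \frac{\nabla_{\theta_j} \bar P^{(2)}_{t|t-1} \bar H \bar H' + \bar P \nabla_{\theta_j} \bar H_2 \bar H' + \bar P \bar H \nabla_{\theta_j} \bar H_2'}{\bar H' \bar P \bar H} \left(\bar F \bar x_{t-1|t-1} + \bar G\right) \\
&+ (1 - \xi_*) \frac{\bar P \bar H \bar H' \left(\nabla_{\theta_j} \bar H_2' \bar P \bar H + \bar H' \nabla_{\theta_j} \bar P^{(2)}_{t|t-1} \bar H + \bar H' \bar P \nabla_{\theta_j} \bar H_2\right)}{\left(\bar H' \bar P \bar H\right)^2} \left(\bar F \bar x_{t-1|t-1} + \bar G\right) \\
&+ \xi_* \left[\frac{\nabla_{\theta_j} \bar P^{(1)}_{t|t-1} \bar H + \bar P \nabla_{\theta_j} \bar H_1}{\bar H' \bar P \bar H} - \frac{\bar P \bar H \left(\nabla_{\theta_j} \bar H_1' \bar P \bar H + \bar H' \nabla_{\theta_j} \bar P^{(1)}_{t|t-1} \bar H + \bar H' \bar P \nabla_{\theta_j} \bar H_1\right)}{\left(\bar H' \bar P \bar H\right)^2}\right] \left(y_t - \bar A_1' z_t\right) \\
&+ (1 - \xi_*) \left[\frac{\nabla_{\theta_j} \bar P^{(2)}_{t|t-1} \bar H + \bar P \nabla_{\theta_j} \bar H_2}{\bar H' \bar P \bar H} - \frac{\bar P \bar H \left(\nabla_{\theta_j} \bar H_2' \bar P \bar H + \bar H' \nabla_{\theta_j} \bar P^{(2)}_{t|t-1} \bar H + \bar H' \bar P \nabla_{\theta_j} \bar H_2\right)}{\left(\bar H' \bar P \bar H\right)^2}\right] \left(y_t - \bar A_2' z_t\right) \\
&- \frac{\bar P \bar H}{\bar H' \bar P \bar H} \left[\xi_* \nabla_{\theta_j} \bar A_1' + (1 - \xi_*) \nabla_{\theta_j} \bar A_2'\right] z_t.
\end{aligned}
$$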
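To show Lemma 3.3, note that by (2.12) the second order derivative of $P_{t|t}$ with respect to $\theta_j$ and $\theta_k$, when evaluated at some $\delta_1 = \delta_2$, is given by
$$
\begin{aligned}
\nabla_{\theta_j} \nabla_{\theta_k} \bar P_{t|t} ={}& \xi_* (1 - \xi_*) \left(\nabla_{\theta_j} \bar x^{(1)}_{t|t} - \nabla_{\theta_j} \bar x^{(2)}_{t|t}\right) \left(\nabla_{\theta_k} \bar x^{(1)}_{t|t} - \nabla_{\theta_k} \bar x^{(2)}_{t|t}\right)' \\
&+ \xi_* (1 - \xi_*) \left(\nabla_{\theta_k} \bar x^{(1)}_{t|t} - \nabla_{\theta_k} \bar x^{(2)}_{t|t}\right) \left(\nabla_{\theta_j} \bar x^{(1)}_{t|t} - \nabla_{\theta_j} \bar x^{(2)}_{t|t}\right)' \\
&+ \nabla_{\theta_j} \bar\xi_{t|t} \left(\nabla_{\theta_k} \bar P^{(1)}_{t|t} - \nabla_{\theta_k} \bar P^{(2)}_{t|t}\right) + \nabla_{\theta_k} \bar\xi_{t|t} \left(\nabla_{\theta_j} \bar P^{(1)}_{t|t} - \nabla_{\theta_j} \bar P^{(2)}_{t|t}\right) \\
&+ \xi_* \nabla_{\theta_j} \nabla_{\theta_k} \bar P^{(1)}_{t|t} + (1 - \xi_*) \nabla_{\theta_j} \nabla_{\theta_k} \bar P^{(2)}_{t|t}.
\end{aligned}
$$
There are three main components in this expression: $\nabla_{\theta_j} \bar x^{(1)}_{t|t} - \nabla_{\theta_j} \bar x^{(2)}_{t|t}$, $\nabla_{\theta_j} \bar P^{(1)}_{t|t} - \nabla_{\theta_j} \bar P^{(2)}_{t|t}$ and $\xi_* \nabla_{\theta_j} \nabla_{\theta_k} \bar P^{(1)}_{t|t} + (1 - \xi_*) \nabla_{\theta_j} \nabla_{\theta_k} \bar P^{(2)}_{t|t}$. The terms $\nabla_{\theta_j} \bar x^{(i)}_{t|t}$ and $\nabla_{\theta_j} \bar P^{(i)}_{t|t}$ are given in (B.15) and (B.12). To study $\nabla_{\theta_j} \nabla_{\theta_k} \bar P^{(i)}_{t|t}$, by (B.11), we have
$$\nabla_{\theta_j} \nabla_{\theta_k} P^{(i)}_{t|t} = \left[\left(I - \frac{P^{(i)}_{t|t-1} H_i H_i'}{H_i' P^{(i)}_{t|t-1} H_i}\right) F_i\right] \nabla_{\theta_j} \nabla_{\theta_k} P_{t-1|t-1} \left[F_i' \left(I - \frac{H_i H_i' P^{(i)}_{t|t-1}}{H_i' P^{(i)}_{t|t-1} H_i}\right)\right] + P^{(i)}_{jk,t},$$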
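where
$$
\begin{aligned}
P^{(i)}_{jk,t} ={}& \nabla_{\theta_k} \left[\left(I - \frac{P^{(i)}_{t|t-1} H_i H_i'}{H_i' P^{(i)}_{t|t-1} H_i}\right) F_i\right] \nabla_{\theta_j} P_{t-1|t-1} \left[F_i' \left(I - \frac{H_i H_i' P^{(i)}_{t|t-1}}{H_i' P^{(i)}_{t|t-1} H_i}\right)\right] \\
&+ \left[\left(I - \frac{P^{(i)}_{t|t-1} H_i H_i'}{H_i' P^{(i)}_{t|t-1} H_i}\right) F_i\right] \nabla_{\theta_j} P_{t-1|t-1} \nabla_{\theta_k} \left[F_i' \left(I - \frac{H_i H_i' P^{(i)}_{t|t-1}}{H_i' P^{(i)}_{t|t-1} H_i}\right)\right] \\
&+ \nabla_{\theta_k} \left(I - \frac{P^{(i)}_{t|t-1} H_i H_i'}{H_i' P^{(i)}_{t|t-1} H_i}\right) \left[\nabla_{\theta_j} F_i P_{t-1|t-1} F_i' + F_i P_{t-1|t-1} \nabla_{\theta_j} F_i' + \nabla_{\theta_j} Q_i\right] \left(I - \frac{H_i H_i' P^{(i)}_{t|t-1}}{H_i' P^{(i)}_{t|t-1} H_i}\right) \\
&+ \left(I - \frac{P^{(i)}_{t|t-1} H_i H_i'}{H_i' P^{(i)}_{t|t-1} H_i}\right) \nabla_{\theta_k} \left[\nabla_{\theta_j} F_i P_{t-1|t-1} F_i' + F_i P_{t-1|t-1} \nabla_{\theta_j} F_i' + \nabla_{\theta_j} Q_i\right] \left(I - \frac{H_i H_i' P^{(i)}_{t|t-1}}{H_i' P^{(i)}_{t|t-1} H_i}\right) \\
&+ \left(I - \frac{P^{(i)}_{t|t-1} H_i H_i'}{H_i' P^{(i)}_{t|t-1} H_i}\right) \left[\nabla_{\theta_j} F_i P_{t-1|t-1} F_i' + F_i P_{t-1|t-1} \nabla_{\theta_j} F_i' + \nabla_{\theta_j} Q_i\right] \nabla_{\theta_k} \left(I - \frac{H_i H_i' P^{(i)}_{t|t-1}}{H_i' P^{(i)}_{t|t-1} H_i}\right) \\
&- \nabla_{\theta_k} \left[\frac{P^{(i)}_{t|t-1}\left(\nabla_{\theta_j} H_i H_i' + H_i \nabla_{\theta_j} H_i'\right)}{H_i' P^{(i)}_{t|t-1} H_i} - \frac{P^{(i)}_{t|t-1} H_i H_i' \left(\nabla_{\theta_j} H_i' P^{(i)}_{t|t-1} H_i + H_i' P^{(i)}_{t|t-1} \nabla_{\theta_j} H_i\right)}{\left(H_i' P^{(i)}_{t|t-1} H_i\right)^2}\right] P^{(i)}_{t|t-1} \\
&- \left[\frac{P^{(i)}_{t|t-1}\left(\nabla_{\theta_j} H_i H_i' + H_i \nabla_{\theta_j} H_i'\right)}{H_i' P^{(i)}_{t|t-1} H_i} - \frac{P^{(i)}_{t|t-1} H_i H_i' \left(\nabla_{\theta_j} H_i' P^{(i)}_{t|t-1} H_i + H_i' P^{(i)}_{t|t-1} \nabla_{\theta_j} H_i\right)}{\left(H_i' P^{(i)}_{t|t-1} H_i\right)^2}\right] \nabla_{\theta_k} P^{(i)}_{t|t-1}.
\end{aligned}
$$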
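Therefore, we have
$$
\begin{aligned}
\xi_* \nabla_{\theta_j} \nabla_{\theta_k} \bar P^{(1)}_{t|t} + (1 - \xi_*) \nabla_{\theta_j} \nabla_{\theta_k} \bar P^{(2)}_{t|t} ={}& \left[\left(I - \frac{\bar P \bar H \bar H'}{\bar H' \bar P \bar H}\right) \bar F\right] \nabla_{\theta_j} \nabla_{\theta_k} \bar P_{t-1|t-1} \left[\bar F' \left(I - \frac{\bar H \bar H' \bar P}{\bar H' \bar P \bar H}\right)\right] \\
&+ \xi_* \bar P^{(1)}_{jk,t} + (1 - \xi_*) \bar P^{(2)}_{jk,t}
\end{aligned}
$$
and
$$\nabla_{\theta_j} \nabla_{\theta_k} \bar P_{t|t} = \left[\left(I - \frac{\bar P \bar H \bar H'}{\bar H' \bar P \bar H}\right) \bar F\right] \nabla_{\theta_j} \nabla_{\theta_k} \bar P_{t-1|t-1} \left[\bar F' \left(I - \frac{\bar H \bar H' \bar P}{\bar H' \bar P \bar H}\right)\right] + \bar P_{jk,t},$$
where
$$
\begin{aligned}
\bar P_{jk,t} ={}& \xi_* \bar P^{(1)}_{jk,t} + (1 - \xi_*) \bar P^{(2)}_{jk,t} \\
&+ \xi_* (1 - \xi_*) \left(\nabla_{\theta_j} \bar x^{(1)}_{t|t} - \nabla_{\theta_j} \bar x^{(2)}_{t|t}\right) \left(\nabla_{\theta_k} \bar x^{(1)}_{t|t} - \nabla_{\theta_k} \bar x^{(2)}_{t|t}\right)' \\
&+ \xi_* (1 - \xi_*) \left(\nabla_{\theta_k} \bar x^{(1)}_{t|t} - \nabla_{\theta_k} \bar x^{(2)}_{t|t}\right) \left(\nabla_{\theta_j} \bar x^{(1)}_{t|t} - \nabla_{\theta_j} \bar x^{(2)}_{t|t}\right)' \\
&+ \nabla_{\theta_j} \bar\xi_{t|t} \left(\nabla_{\theta_k} \bar P^{(1)}_{t|t} - \nabla_{\theta_k} \bar P^{(2)}_{t|t}\right) + \nabla_{\theta_k} \bar\xi_{t|t} \left(\nabla_{\theta_j} \bar P^{(1)}_{t|t} - \nabla_{\theta_j} \bar P^{(2)}_{t|t}\right).
\end{aligned}
$$
This gives the result in Lemma 3.3.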
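The structure of this second derivative can be checked in a scalar toy case. The sketch below is not from the paper: it assumes that the collapsing step has the form $P = \xi P^{(1)} + (1-\xi) P^{(2)} + \xi(1-\xi)(x^{(1)} - x^{(2)})^2$ (the scalar analogue of what the derivation above implies for (2.12)), and all coefficient values are made up so that $x^{(1)} = x^{(2)}$ and $P^{(1)} = P^{(2)}$ at the evaluation point, mimicking $\delta_1 = \delta_2$.

```python
def collapsed_var(tj, tk):
    """Scalar analogue of the collapsed variance as a function of two
    hypothetical parameters (tj, tk); the two regimes coincide at (0, 0)."""
    xi = 0.3 + 0.1 * tj - 0.2 * tk
    x1 = 1.0 + 0.7 * tj + 0.4 * tk
    x2 = 1.0 - 0.2 * tj + 0.9 * tk
    p1 = 2.0 + 0.5 * tj + 0.3 * tk + 0.8 * tj * tk
    p2 = 2.0 - 0.1 * tj + 0.6 * tk - 0.4 * tj * tk
    return xi * p1 + (1 - xi) * p2 + xi * (1 - xi) * (x1 - x2) ** 2

# Ingredients of the analytic formula, read off the coefficients above.
xi_star = 0.3
dxi_j, dxi_k = 0.1, -0.2
dx_j = 0.7 - (-0.2)            # d(x1 - x2)/d tj
dx_k = 0.4 - 0.9               # d(x1 - x2)/d tk
dp_j = 0.5 - (-0.1)            # d(p1 - p2)/d tj
dp_k = 0.3 - 0.6               # d(p1 - p2)/d tk
d2p1, d2p2 = 0.8, -0.4         # cross derivatives of p1 and p2

# Scalar version of the displayed expression: the two symmetric outer-product
# terms collapse to 2 * xi(1 - xi) * dx_j * dx_k.
analytic = (2 * xi_star * (1 - xi_star) * dx_j * dx_k
            + dxi_j * dp_k + dxi_k * dp_j
            + xi_star * d2p1 + (1 - xi_star) * d2p2)

h = 1e-3
numeric = (collapsed_var(h, h) - collapsed_var(h, -h)
           - collapsed_var(-h, h) + collapsed_var(-h, -h)) / (4 * h * h)
```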
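To show Lemma 3.4, note that by (2.11) we have
$$
\begin{aligned}
\nabla_{\theta_j} \nabla_{\theta_k} x_{t|t} ={}& \nabla_{\theta_j} \nabla_{\theta_k} \xi_{t|t} \left(x^{(1)}_{t|t} - x^{(2)}_{t|t}\right) + \nabla_{\theta_j} \xi_{t|t} \left(\nabla_{\theta_k} x^{(1)}_{t|t} - \nabla_{\theta_k} x^{(2)}_{t|t}\right) + \nabla_{\theta_k} \xi_{t|t} \left(\nabla_{\theta_j} x^{(1)}_{t|t} - \nabla_{\theta_j} x^{(2)}_{t|t}\right) \\
&+ \xi_{t|t} \nabla_{\theta_j} \nabla_{\theta_k} x^{(1)}_{t|t} + (1 - \xi_{t|t}) \nabla_{\theta_j} \nabla_{\theta_k} x^{(2)}_{t|t}
\end{aligned}
$$
and
$$
\begin{aligned}
\nabla_{\theta_j} \nabla_{\theta_k} \bar x_{t|t} ={}& \nabla_{\theta_j} \bar\xi_{t|t} \left(\nabla_{\theta_k} \bar x^{(1)}_{t|t} - \nabla_{\theta_k} \bar x^{(2)}_{t|t}\right) + \nabla_{\theta_k} \bar\xi_{t|t} \left(\nabla_{\theta_j} \bar x^{(1)}_{t|t} - \nabla_{\theta_j} \bar x^{(2)}_{t|t}\right) \\
&+ \xi_* \nabla_{\theta_j} \nabla_{\theta_k} \bar x^{(1)}_{t|t} + (1 - \xi_*) \nabla_{\theta_j} \nabla_{\theta_k} \bar x^{(2)}_{t|t},
\end{aligned}
$$
when $\delta_1 = \delta_2$. By the previous results, we just need to derive $\nabla_{\theta_j} \nabla_{\theta_k} \bar x^{(i)}_{t|t}$. Differentiating $\nabla_{\theta_j} x^{(i)}_{t|t}$ in (B.14) with respect to $\theta_k$, we have
$$\nabla_{\theta_j} \nabla_{\theta_k} x^{(i)}_{t|t} = \left[\left(I - \frac{P^{(i)}_{t|t-1} H_i H_i'}{H_i' P^{(i)}_{t|t-1} H_i}\right) F_i\right] \nabla_{\theta_j} \nabla_{\theta_k} x_{t-1|t-1} + X^{(i)}_{jk,t},$$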
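where
$$
\begin{aligned}
X^{(i)}_{jk,t} ={}& \nabla_{\theta_k} \left[\left(I - \frac{P^{(i)}_{t|t-1} H_i H_i'}{H_i' P^{(i)}_{t|t-1} H_i}\right) F_i\right] \nabla_{\theta_j} x_{t-1|t-1} \\
&+ \nabla_{\theta_k} \left(I - \frac{P^{(i)}_{t|t-1} H_i H_i'}{H_i' P^{(i)}_{t|t-1} H_i}\right) \left(\nabla_{\theta_j} F_i x_{t-1|t-1} + \nabla_{\theta_j} G_i\right) + \left(I - \frac{P^{(i)}_{t|t-1} H_i H_i'}{H_i' P^{(i)}_{t|t-1} H_i}\right) \nabla_{\theta_k} \left(\nabla_{\theta_j} F_i x_{t-1|t-1} + \nabla_{\theta_j} G_i\right) \\
&- \nabla_{\theta_k} \left[\frac{\nabla_{\theta_j} P^{(i)}_{t|t-1} H_i H_i' + P^{(i)}_{t|t-1} \nabla_{\theta_j} H_i H_i' + P^{(i)}_{t|t-1} H_i \nabla_{\theta_j} H_i'}{H_i' P^{(i)}_{t|t-1} H_i}\right] \left(F_i x_{t-1|t-1} + G_i\right) \\
&- \left[\frac{\nabla_{\theta_j} P^{(i)}_{t|t-1} H_i H_i' + P^{(i)}_{t|t-1} \nabla_{\theta_j} H_i H_i' + P^{(i)}_{t|t-1} H_i \nabla_{\theta_j} H_i'}{H_i' P^{(i)}_{t|t-1} H_i}\right] \nabla_{\theta_k} \left(F_i x_{t-1|t-1} + G_i\right) \\
&+ \nabla_{\theta_k} \left[\frac{P^{(i)}_{t|t-1} H_i H_i' \left(\nabla_{\theta_j} H_i' P^{(i)}_{t|t-1} H_i + H_i' \nabla_{\theta_j} P^{(i)}_{t|t-1} H_i + H_i' P^{(i)}_{t|t-1} \nabla_{\theta_j} H_i\right)}{\left(H_i' P^{(i)}_{t|t-1} H_i\right)^2}\right] \left(F_i x_{t-1|t-1} + G_i\right) \\
&+ \left[\frac{P^{(i)}_{t|t-1} H_i H_i' \left(\nabla_{\theta_j} H_i' P^{(i)}_{t|t-1} H_i + H_i' \nabla_{\theta_j} P^{(i)}_{t|t-1} H_i + H_i' P^{(i)}_{t|t-1} \nabla_{\theta_j} H_i\right)}{\left(H_i' P^{(i)}_{t|t-1} H_i\right)^2}\right] \nabla_{\theta_k} \left(F_i x_{t-1|t-1} + G_i\right) \\
&+ \nabla_{\theta_k} \left[\frac{\nabla_{\theta_j} P^{(i)}_{t|t-1} H_i + P^{(i)}_{t|t-1} \nabla_{\theta_j} H_i}{H_i' P^{(i)}_{t|t-1} H_i}\right] \left(y_t - A_i' z_t\right) + \left[\frac{\nabla_{\theta_j} P^{(i)}_{t|t-1} H_i + P^{(i)}_{t|t-1} \nabla_{\theta_j} H_i}{H_i' P^{(i)}_{t|t-1} H_i}\right] \nabla_{\theta_k} \left(y_t - A_i' z_t\right) \\
&- \nabla_{\theta_k} \left[\frac{P^{(i)}_{t|t-1} H_i \left(\nabla_{\theta_j} H_i' P^{(i)}_{t|t-1} H_i + H_i' \nabla_{\theta_j} P^{(i)}_{t|t-1} H_i + H_i' P^{(i)}_{t|t-1} \nabla_{\theta_j} H_i\right)}{\left(H_i' P^{(i)}_{t|t-1} H_i\right)^2}\right] \left(y_t - A_i' z_t\right) \\
&- \left[\frac{P^{(i)}_{t|t-1} H_i \left(\nabla_{\theta_j} H_i' P^{(i)}_{t|t-1} H_i + H_i' \nabla_{\theta_j} P^{(i)}_{t|t-1} H_i + H_i' P^{(i)}_{t|t-1} \nabla_{\theta_j} H_i\right)}{\left(H_i' P^{(i)}_{t|t-1} H_i\right)^2}\right] \nabla_{\theta_k} \left(y_t - A_i' z_t\right) \\
&- \nabla_{\theta_k} \left[\frac{P^{(i)}_{t|t-1} H_i}{H_i' P^{(i)}_{t|t-1} H_i}\right] \nabla_{\theta_j} A_i' z_t - \left[\frac{P^{(i)}_{t|t-1} H_i}{H_i' P^{(i)}_{t|t-1} H_i}\right] \nabla_{\theta_j} \nabla_{\theta_k} A_i' z_t.
\end{aligned}
$$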
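Then, when $\delta_1 = \delta_2$, $\bar X^{(i)}_{jk,t}$ can be obtained and
$$\nabla_{\theta_j} \nabla_{\theta_k} \bar x_{t|t} = \left[\left(I - \frac{\bar P_{t|t-1} \bar H \bar H'}{\bar H' \bar P_{t|t-1} \bar H}\right) \bar F\right] \nabla_{\theta_j} \nabla_{\theta_k} \bar x_{t-1|t-1} + \bar X_{jk,t},$$
where
$$\bar X_{jk,t} = \nabla_{\theta_j} \bar\xi_{t|t} \left(\nabla_{\theta_k} \bar x^{(1)}_{t|t} - \nabla_{\theta_k} \bar x^{(2)}_{t|t}\right) + \nabla_{\theta_k} \bar\xi_{t|t} \left(\nabla_{\theta_j} \bar x^{(1)}_{t|t} - \nabla_{\theta_j} \bar x^{(2)}_{t|t}\right) + \xi_* \bar X^{(1)}_{jk,t} + (1 - \xi_*) \bar X^{(2)}_{jk,t}.$$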
The next lemma provides stochastic bounds for $\bar\xi_{t+1|t}$ and its derivatives.
Lemma A.1.1 Suppose Assumption 5 holds. Then there exists an open neighborhood of $(\beta_*, \delta_*)$, denoted by $B(\beta_*, \delta_*)$, and a sequence of strictly stationary and ergodic random variables $\{\lambda_t\}$ satisfying $E\lambda_t^{1+c} < M < \infty$ for some $c, M > 0$, such that
$$\sup_{(\beta, \delta_1) \in B(\beta_*, \delta_*)} \left|\nabla_{\theta_{i_1}} \cdots \nabla_{\theta_{i_k}} \bar\xi_{t+1|t}\right|^{\alpha(k)/k} < \lambda_t \quad (t = 1, \dots, T)$$
for all $i_1, \dots, i_k \in \{1, \dots, 2n_\delta + n_\beta\}$ and $k = 1, 2, 3$ and $4$, where $\alpha(k) = 6$ if $k = 1, 2, 3$ and $\alpha(k) = 5$ if $k = 4$. The above inequalities hold uniformly over $\epsilon \le p, q \le 1 - \epsilon$, with $\epsilon$ being an arbitrary number satisfying $0 < \epsilon < 1/2$.
Proof of Lemma A.1.1 We use the difference equations in Lemma 1 to relate ∇θi1 ...∇θik ξ¯t+1|t to
the density functions f¯1t and f¯2t and their derivatives. Because the higher order derivatives depend
successively on the lower orders, we start with k = 1. Without loss of generality, suppose j ∈ I1 .
Then, apply (B.6):
6
∇θj ξ¯t+1|t ≤
t−1 X
s
rρ
s=0
∇θj f¯1(t−s) ∇θj f¯2(t−s)
−
f¯t−s
f¯t−s
!6
!!6
!6
∞
∞
X
X
s 1/6
s 1/6
(1 − ) υt−s ,
≤2
|rρ | υt−s ≤ 2
s=0
s=0
where the second inequality follows from Assumption 5 and the last inequality uses ρ = p + q − 1.
Because {υt } is stationary and ergodic, the right hand side is also stationary and ergodic (White,
2001, Theorem 3.35). Denote it by λt and apply Minkowski’s inequality for an infinite sum:
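$$
\begin{aligned}
E\lambda_t^{1+c} &= 2E\left[\sum_{s=0}^{\infty} (1 - \epsilon)^s \upsilon_{t-s}^{1/6}\right]^{6(1+c)} \le 2\left\{\sum_{s=0}^{\infty} \left[E\left((1 - \epsilon)^s \upsilon_{t-s}^{1/6}\right)^{6(1+c)}\right]^{\frac{1}{6(1+c)}}\right\}^{6(1+c)} \\
&= 2\left\{\sum_{s=0}^{\infty} (1 - \epsilon)^s \left[E\upsilon_{t-s}^{1+c}\right]^{\frac{1}{6(1+c)}}\right\}^{6(1+c)} \le 2L\left\{\sum_{s=0}^{\infty} (1 - \epsilon)^s\right\}^{6(1+c)},
\end{aligned}
$$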
where the last inequality holds because $E\upsilon_{t-s}^{1+c}$ is finite by Assumption 5. Because $\sum_{s=0}^{\infty} (1 - \epsilon)^s = 1/\epsilon < \infty$, we have $E\lambda_t^{1+c} \le 2L/\epsilon^{6(1+c)} < \infty$. This establishes the result for $k = 1$ by setting $M = 2L/\epsilon^{6(1+c)}$.
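The proof for $k > 1$ is similar. For $k = 2$, we have $|\nabla_{\theta_j} \nabla_{\theta_i} \bar\xi_{t+1|t}|^3 \le \left(\sum_{s=0}^{\infty} \rho^s \bar E_{ji,t-s}\right)^3$. I provide upper bounds for $|\bar E_{ji,t}|$ for five possible cases. Specifically, if $j \in I_0$ and $i \in I_1$, then
$$
\begin{aligned}
\left|\bar E_{ji,t}\right| &= \left| r \left(\frac{\nabla_{\theta_j} \nabla_{\theta_i} \bar f_{1t}}{\bar f_t} - \frac{\nabla_{\theta_j} \nabla_{\theta_i} \bar f_{2t}}{\bar f_t} + \frac{\nabla_{\theta_j} \bar f_{2t}}{\bar f_t} \frac{\nabla_{\theta_i} \bar f_{2t}}{\bar f_t} - \frac{\nabla_{\theta_j} \bar f_{2t}}{\bar f_t} \frac{\nabla_{\theta_i} \bar f_{1t}}{\bar f_t}\right) \right| \\
&\le |r| \left(\left|\frac{\nabla_{\theta_j} \nabla_{\theta_i} \bar f_{1t}}{\bar f_t}\right| + \left|\frac{\nabla_{\theta_j} \nabla_{\theta_i} \bar f_{2t}}{\bar f_t}\right| + \left|\frac{\nabla_{\theta_j} \bar f_{2t}}{\bar f_t} \frac{\nabla_{\theta_i} \bar f_{2t}}{\bar f_t}\right| + \left|\frac{\nabla_{\theta_j} \bar f_{2t}}{\bar f_t} \frac{\nabla_{\theta_i} \bar f_{1t}}{\bar f_t}\right|\right) \\
&\le 4 |r| \upsilon_t^{1/3}.
\end{aligned}
$$
The same bound holds if $j \in I_0$ and $i \in I_2$. If $j \in I_1$ and $i \in I_1$, then $|\bar E_{ji,t}| \le 4 |\rho (1 - 2\xi_*)| \lambda_{t-1}^{1/6} \upsilon_t^{1/6} + 2 (2|r| + |r(2\xi_* - 1)|) \upsilon_t^{1/3}$. The same bound holds if $j \in I_1$ and $i \in I_2$, and if $j \in I_2$ and $i \in I_2$. Consequently, there exists a finite constant $C_1$ such that for all the three cases we have $|\bar E_{ji,t}| \le C_1 (\lambda_{t-1}^{1/6} \upsilon_t^{1/6} + \upsilon_t^{1/3})$. This implies $|\nabla_{\theta_j} \nabla_{\theta_i} \bar\xi_{t+1|t}|^3 \le \left(\sum_{s=0}^{\infty} C_1 (1 - \epsilon)^s (\lambda_{t-1}^{1/6} \upsilon_t^{1/6} + \upsilon_t^{1/3})\right)^3$. The right side is stationary and ergodic; we continue to denote it by $\lambda_t$. By Minkowski's inequality:
$$E\lambda_t^{1+c} \le \left\{\sum_{s=0}^{\infty} \left[E\left(C_1 (1 - \epsilon)^s \left(\lambda_{t-1}^{1/6} \upsilon_t^{1/6} + \upsilon_t^{1/3}\right)\right)^{3(1+c)}\right]^{\frac{1}{3(1+c)}}\right\}^{3(1+c)}. \tag{B.19}$$
Apply Minkowski's inequality followed by the Cauchy–Schwarz inequality to the summands:
$$
\begin{aligned}
E\left(C_1 (1 - \epsilon)^s \left(\lambda_{t-1}^{1/6} \upsilon_t^{1/6} + \upsilon_t^{1/3}\right)\right)^{3(1+c)}
&\le \left(C_1 (1 - \epsilon)^s\right)^{3(1+c)} \left[\left(E\lambda_{t-1}^{(1+c)/2} \upsilon_t^{(1+c)/2}\right)^{\frac{1}{3(1+c)}} + \left(E\upsilon_t^{(1+c)}\right)^{\frac{1}{3(1+c)}}\right]^{3(1+c)} \\
&\le \left(C_1 (1 - \epsilon)^s\right)^{3(1+c)} \left[\left(E\lambda_{t-1}^{(1+c)} E\upsilon_t^{(1+c)}\right)^{\frac{1}{6(1+c)}} + \left(E\upsilon_t^{(1+c)}\right)^{\frac{1}{3(1+c)}}\right]^{3(1+c)}.
\end{aligned}
$$
Because $E\lambda_{t-1}^{(1+c)} < M$ and $E\upsilon_t^{(1+c)} < L$, the last term in the preceding display is no greater than
$$(1 - \epsilon)^{3(1+c)s} C_1^{3(1+c)} \left[(ML)^{\frac{1}{6(1+c)}} + L^{\frac{1}{3(1+c)}}\right]^{3(1+c)} \le C_2 (1 - \epsilon)^{3(1+c)s}, \tag{B.20}$$
where $C_2$ is a finite constant independent of $p$ and $q$. Plugging this into (B.19), we have $E\lambda_t^{1+c} \le C_2 \left(\sum_{s=0}^{\infty} (1 - \epsilon)^s\right)^{3(1+c)} = C_2/\epsilon^{3(1+c)} < \infty$. This proves the result for $k = 2$.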
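Now, consider $k = 3$. Inspecting the expressions of $\bar E_{jil,t}$ reported in the proof of Lemma 1 shows that they comprise the following terms ($a, b, c = 1, 2$):
$$
\begin{gathered}
\frac{\nabla_{\theta_j} \nabla_{\theta_i} \bar f_{at} \nabla_{\theta_l} \bar f_{bt}}{\bar f_t^2}, \quad \frac{\nabla_{\theta_j} \nabla_{\theta_i} \nabla_{\theta_l} \bar f_{at}}{\bar f_t}, \quad \frac{\nabla_{\theta_j} \bar f_{at} \nabla_{\theta_i} \bar f_{bt} \nabla_{\theta_l} \bar f_{ct}}{\bar f_t^3}, \quad \frac{\nabla_{\theta_l} \bar f_{at}}{\bar f_t} \nabla_{\theta_j} \nabla_{\theta_i} \bar\xi_{t|t-1}, \\
\frac{\nabla_{\theta_j} \bar f_{at} \nabla_{\theta_l} \bar f_{bt}}{\bar f_t^2} \nabla_{\theta_i} \bar\xi_{t|t-1}, \quad \frac{\nabla_{\theta_j} \nabla_{\theta_i} \bar f_{at}}{\bar f_t} \nabla_{\theta_l} \bar\xi_{t|t-1}, \quad \frac{\nabla_{\theta_j} \bar f_{at}}{\bar f_t} \nabla_{\theta_i} \bar\xi_{t|t-1} \nabla_{\theta_l} \bar\xi_{t|t-1}.
\end{gathered}
\tag{B.21}
$$
By Assumption 4 and the above results for $k = 1$ and $2$, the quantities in (B.21) are bounded, respectively, by $\upsilon_t^{1/2}$, $\upsilon_t^{1/2}$, $\upsilon_t^{1/2}$, $\upsilon_t^{1/6} \lambda_{t-1}^{1/3}$, $\upsilon_t^{1/3} \lambda_{t-1}^{1/6}$, $\upsilon_t^{1/3} \lambda_{t-1}^{1/6}$ and $\upsilon_t^{1/6} \lambda_{t-1}^{1/3}$. Therefore, the ten cases specified in Lemma 1 all satisfy $|\bar E_{jil,t}| \le C_3 (\upsilon_t^{1/2} + \upsilon_t^{1/6} \lambda_{t-1}^{1/3} + \upsilon_t^{1/3} \lambda_{t-1}^{1/6})$, where $C_3$ is a finite constant independent of $p$ and $q$. This implies
$$\left|\nabla_{\theta_j} \nabla_{\theta_i} \nabla_{\theta_l} \bar\xi_{t+1|t}\right|^2 \le \left(\sum_{s=0}^{\infty} (1 - \epsilon)^s C_3 \left(\upsilon_t^{1/2} + \upsilon_t^{1/6} \lambda_{t-1}^{1/3} + \upsilon_t^{1/3} \lambda_{t-1}^{1/6}\right)\right)^2.$$
Denote the right hand side by $\lambda_t$ and proceed along the same lines as between (B.19) and (B.20). It then follows that $E\lambda_t^{1+c} < \infty$. For $k = 4$, the expressions of $\bar E_{jilm,t}$, although omitted here, include terms as in (B.21) but with the orders of derivatives summing to 4 instead of 3. Using the same arguments as between (B.19) and (B.20), it can be shown that $E\lambda_t^{1+c} < \infty$ holds.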
The following two lemmas provide stochastic bounds for $\bar x_{t|t}$, $\bar P_{t|t}$ and their derivatives.
Lemma A.1.2 Suppose Assumption 5 holds. Then there exists an open neighborhood of $(\beta_*, \delta_*)$, denoted by $B(\beta_*, \delta_*)$, and a sequence of strictly stationary and ergodic random variables $\{\lambda_{i,t}\}$ satisfying $E\lambda_{i,t}^{1+c_i} < M_i < \infty$ for some $c_i, M_i > 0$, such that for the $i$-th entry of $\bar x_{t|t}$ and its derivatives:
$$\sup_{(\beta, \delta_1) \in B(\beta_*, \delta_*)} \left|\nabla_{\theta_{i_1}} \cdots \nabla_{\theta_{i_k}} \bar x_{t|t,i}\right|^{\alpha(k)/k} < \lambda_{i,t} \quad (t = 1, \dots, T)$$
for all $i_1, \dots, i_k \in \{1, \dots, 2n_\delta + n_\beta\}$ and $k = 1, 2, 3$ and $4$, where $\alpha(k) = 6$ if $k = 1, 2, 3$ and $\alpha(k) = 5$ if $k = 4$. The above inequalities hold uniformly over $\epsilon \le p, q \le 1 - \epsilon$, with $\epsilon$ being an arbitrary number satisfying $0 < \epsilon < 1/2$.
Lemma A.1.3 Suppose Assumption 5 holds. Then there exists an open neighborhood of $(\beta_*, \delta_*)$, denoted by $B(\beta_*, \delta_*)$, and a sequence of strictly stationary and ergodic random variables $\{\lambda_{ij,t}\}$ satisfying $E\lambda_{ij,t}^{1+c_{ij}} < M_{ij} < \infty$ for some $c_{ij}, M_{ij} > 0$, such that for the $(i,j)$-th entry of $\bar P_{t|t}$ and its derivatives:
$$\sup_{(\beta, \delta_1) \in B(\beta_*, \delta_*)} \left|\nabla_{\theta_{i_1}} \cdots \nabla_{\theta_{i_k}} \bar P_{t|t,ij}\right|^{\alpha(k)/k} < \lambda_{ij,t} \quad (t = 1, \dots, T)$$
for all $i_1, \dots, i_k \in \{1, \dots, 2n_\delta + n_\beta\}$ and $k = 1, 2, 3$ and $4$, where $\alpha(k) = 6$ if $k = 1, 2, 3$ and $\alpha(k) = 5$ if $k = 4$. The above inequalities hold uniformly over $\epsilon \le p, q \le 1 - \epsilon$, with $\epsilon$ being an arbitrary number satisfying $0 < \epsilon < 1/2$.
Proof of Lemmas A.1.2 and A.1.3. The proofs of Lemmas A.1.2 and A.1.3 are omitted here since they follow the same arguments as the proof of Lemma A.1.1.
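The next lemma establishes stochastic orders of some quantities related to $\xi_{t|t-1}$, $f_{1t}$ and $f_{2t}$. The quantities are all evaluated at $(\tilde\beta, \tilde\delta, \tilde\delta)$.
Lemma A.2 Let $i_s, j_s, l_s, m_s, n_s$ be arbitrary integers satisfying $1 \le i_s, j_s, l_s, m_s, n_s \le 2n_\delta + n_\beta$ for $s \in \{1, 2, 3, 4\}$. The following results hold uniformly over $\epsilon \le p, q \le 1 - \epsilon$, with $\epsilon$ being an arbitrary number satisfying $0 < \epsilon < 1/2$:
1. For any $a \in \{1, 2\}$, $u \in \{1, 2, 3, 4\}$ and $v \in \{0, 1, 2, 3\}$ satisfying $u + v \le 4$, we have (interpret $\nabla_{\theta_{j_1}} \cdots \nabla_{\theta_{j_v}} \tilde\xi_{t|t-1}$ as 1 when $v = 0$)
$$\frac{1}{T} \sum_{t=1}^{T} \frac{\nabla_{\theta_{i_1}} \cdots \nabla_{\theta_{i_u}} \tilde f_{at}}{\tilde f_t} \nabla_{\theta_{j_1}} \cdots \nabla_{\theta_{j_v}} \tilde\xi_{t|t-1} = o_p(1). \tag{B.22}$$
Further, if $u + v \le 3$, then the result holds with $o_p(1)$ replaced by $O_p(T^{-1/2})$.
2. For any $(a, b, c) \in \{1, 2\}$, $(u, w) \in \{1, 2, 3\}$ and $v \in \{0, 1, 2\}$ satisfying $u + v + w \le 4$:
$$\frac{1}{T} \sum_{t=1}^{T} \frac{\nabla_{\theta_{i_1}} \cdots \nabla_{\theta_{i_u}} \tilde f_{at}}{\tilde f_t} \frac{\nabla_{\theta_{j_1}} \cdots \nabla_{\theta_{j_v}} \tilde f_{bt}}{\tilde f_t} \frac{\nabla_{\theta_{l_1}} \cdots \nabla_{\theta_{l_w}} \tilde f_{ct}}{\tilde f_t} = O_p(1).$$
3. For any $(a, b, c) \in \{1, 2\}$, $(u, w) \in \{1, 2, 3\}$ and $(v, z) \in \{0, 1\}$ satisfying $u + v + w + z \le 3$:
$$\frac{1}{T} \sum_{t=1}^{T} \frac{\nabla_{\theta_{i_1}} \tilde f_{at}}{\tilde f_t} \frac{\nabla_{\theta_{j_1}} \cdots \nabla_{\theta_{j_u}} \tilde f_{bt}}{\tilde f_t} \frac{\nabla_{\theta_{l_1}} \cdots \nabla_{\theta_{l_v}} \tilde f_{ct}}{\tilde f_t} \nabla_{\theta_{m_1}} \cdots \nabla_{\theta_{m_w}} \tilde\xi_{t|t-1} \nabla_{\theta_{n_1}} \cdots \nabla_{\theta_{n_z}} \tilde\xi_{t|t-1} = O_p(1).$$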
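Proof of Lemma A.2. By the mean value theorem, the left hand side of (B.22) equals
$$T^{-1} \sum_{t=1}^{T} \frac{\nabla_{\theta_{i_1}} \cdots \nabla_{\theta_{i_u}} f^*_{at}}{f^*_t} \nabla_{\theta_{j_1}} \cdots \nabla_{\theta_{j_v}} \xi^*_{t|t-1} + \left\{T^{-3/2} \sum_{t=1}^{T} \nabla_{\theta'} \left(\frac{\nabla_{\theta_{i_1}} \cdots \nabla_{\theta_{i_u}} \bar f_{at}}{\bar f_{at}} \nabla_{\theta_{j_1}} \cdots \nabla_{\theta_{j_v}} \bar\xi_{t|t-1}\right)\right\} T^{1/2} \left(\tilde\theta - \theta_*\right), \tag{B.23}$$
where "$*$" and "$-$" denote that the relevant quantities are evaluated at the true values $\theta_*' = (\beta_*', \delta_*', \delta_*')$ and at $\bar\theta' = (\bar\beta', \bar\delta', \bar\delta')$, where $\bar\theta$ lies between $\tilde\theta' = (\tilde\beta', \tilde\delta', \tilde\delta')$ and $\theta_*$. The first summation is over terms that are stationary and ergodic, which are bounded by $\lambda_t^{v/\alpha(k)} \upsilon_t^{u/\alpha(k)}$ by Assumption 5 and Lemma A.1. Apply Hölder's inequality:
$$
\begin{aligned}
E\left(\lambda_t^{v/\alpha(k)} \upsilon_t^{u/\alpha(k)}\right)^{1+c}
&\le \left[E\left(\lambda_t^{v(1+c)/\alpha(k)}\right)^{\alpha(k)/v}\right]^{\frac{v}{\alpha(k)}} \left[E\left(\upsilon_t^{u(1+c)/\alpha(k)}\right)^{\alpha(k)/(\alpha(k)-v)}\right]^{\frac{\alpha(k)-v}{\alpha(k)}} \\
&\le \left(E\lambda_t^{1+c}\right)^{\frac{v}{\alpha(k)}} \left(E\upsilon_t^{1+c}\right)^{\frac{\alpha(k)-v}{\alpha(k)}},
\end{aligned}
$$
where the last inequality follows because $u + v < \alpha(k)$. Both terms on the right hand side are finite by Assumption 5 and Lemma A.1. Therefore, the first term in the display (B.23) is $o_p(1)$ by Theorem 3.34 in White (2001). Now turn to the second term in the display (B.23). We have, for any $k \in \{1, \dots, 2n_\delta + n_\beta\}$:
$$
\begin{aligned}
&\left|T^{-3/2} \sum_{t=1}^{T} \nabla_{\theta_k} \left(\frac{\nabla_{\theta_{i_1}} \cdots \nabla_{\theta_{i_u}} \bar f_{at}}{\bar f_{at}} \nabla_{\theta_{j_1}} \cdots \nabla_{\theta_{j_v}} \bar\xi_{t|t-1}\right)\right| \\
&\quad \le \left|T^{-3/2} \sum_{t=1}^{T} \frac{\nabla_{\theta_{i_1}} \cdots \nabla_{\theta_{i_u}} \nabla_{\theta_k} \bar f_{at}}{\bar f_{at}} \nabla_{\theta_{j_1}} \cdots \nabla_{\theta_{j_v}} \bar\xi_{t|t-1}\right| + \left|T^{-3/2} \sum_{t=1}^{T} \frac{\nabla_{\theta_{i_1}} \cdots \nabla_{\theta_{i_u}} \bar f_{at}}{\bar f_{at}} \nabla_{\theta_{j_1}} \cdots \nabla_{\theta_{j_v}} \nabla_{\theta_k} \bar\xi_{t|t-1}\right| \\
&\qquad + \left|T^{-3/2} \sum_{t=1}^{T} \frac{\nabla_{\theta_{i_1}} \cdots \nabla_{\theta_{i_u}} \bar f_{at}}{\bar f_{at}} \frac{\nabla_{\theta_k} \bar f_{at}}{\bar f_{at}} \nabla_{\theta_{j_1}} \cdots \nabla_{\theta_{j_v}} \bar\xi_{t|t-1}\right| \\
&\quad \le T^{-3/2} \sum_{t=1}^{T} \left\{2 \upsilon_t^{(u+1)/\alpha(k)} \lambda_t^{v/\alpha(k)} + \upsilon_t^{u/\alpha(k)} \lambda_t^{(v+1)/\alpha(k)}\right\} = O_p\left(T^{-1/2}\right),
\end{aligned}
$$
where the equality follows from Assumption 4, Lemma A.1 and $u + v + 1 \le 5$. Therefore, the display (B.23) is $o_p(1)$.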
Now we consider the cases with $u + v \le 3$. If $u + v < 3$, then the terms inside the first summation of (B.23) are bounded by $\lambda_t^{v/6} \upsilon_t^{u/6}$. We have
$$E\left(\lambda_t^{v/6} \upsilon_t^{u/6}\right)^{2(1+c)} \le \left[E\left(\lambda_t^{v(1+c)/3}\right)^{3/v}\right]^{v/3} \left[E\left(\upsilon_t^{u(1+c)/3}\right)^{3/(3-v)}\right]^{(3-v)/3} \le \left(E\lambda_t^{1+c}\right)^{v/3} \left(E\upsilon_t^{1+c}\right)^{(3-v)/3}.$$
The right hand side is finite. If $u + v = 3$, i.e., $u = 3$ and $v = 0$, then $E(\lambda_t^{v/6} \upsilon_t^{u/6})^{2(1+c)} = E\upsilon_t^{(1+c)} < \infty$. Apply the central limit theorem; it follows that the left hand side of (B.22) is $O_p(T^{-1/2})$.
Lemmas A.2.2 and A.2.3 can be proved using the same arguments, i.e., first applying the mean value theorem and then obtaining bounds for the two resulting terms separately. It follows that the left hand side quantity in Lemma A.2.2 is bounded by $T^{-1} \sum_{t=1}^{T} \upsilon_t^{(u+v+w)/\alpha(k)} + O_p(T^{-1/2})$, while that in Lemma A.2.3 is bounded by $T^{-1} \sum_{t=1}^{T} \upsilon_t^{(1+u+v)/\alpha(k)} \lambda_t^{(w+z)/\alpha(k)} + O_p(T^{-1/2})$. The two leading terms both satisfy a law of large numbers and are therefore $O_p(1)$.
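Lemma A.3 Under the null hypothesis and Assumptions 1-5, for all $k, l, m \in \{1, \dots, n_\delta\}$, we have:
1. Let $e_k$ be an $n_\delta$-dimensional unit vector whose $k$-th element equals 1, then
$$\begin{bmatrix} \nabla_{\delta_{2k}} \hat\beta(\tilde\delta) \\ \xi_* \nabla_{\delta_{2k}} \hat\delta_1(\tilde\delta) \end{bmatrix} = (\xi_* - 1) \begin{bmatrix} 0 \\ e_k \end{bmatrix} + O_p(T^{-1/2}).$$
2. The second order derivatives satisfy
$$\begin{bmatrix} \nabla_{\delta_{2k}} \nabla_{\delta_{2l}} \hat\beta(\tilde\delta) \\ \xi_* \nabla_{\delta_{2k}} \nabla_{\delta_{2l}} \hat\delta_1(\tilde\delta) \end{bmatrix} = -\tilde I^{-1} \frac{1}{T} \sum_{t=1}^{T} \tilde D_{kl,t} + O_p(T^{-1/2}).$$
3. The third order derivatives satisfy
$$\begin{bmatrix} \nabla_{\delta_{2k}} \nabla_{\delta_{2l}} \nabla_{\delta_{2m}} \hat\beta(\tilde\delta) \\ \xi_* \nabla_{\delta_{2k}} \nabla_{\delta_{2l}} \nabla_{\delta_{2m}} \hat\delta_1(\tilde\delta) \end{bmatrix} = O_p(1).$$
Proof of Lemma A.3. As the proof is long, we organize it into three parts, corresponding to Lemmas A.3.1, A.3.2 and A.3.3, respectively.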
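Proof of Lemma A.3.1. By construction, $\hat\theta(\delta_2)$ satisfies the first order conditions:
$$M^{(1)}_j(p, q, \delta_2) = \frac{1}{T} \sum_{t=1}^{T} \frac{\hat M_{jt}}{\hat B_t} = 0, \quad (j \in \{1, \dots, n_\beta + n_\delta\}), \tag{B.24}$$
where
$$
\begin{aligned}
\hat B_t &= \hat f_{1t} \hat\xi_{t|t-1} + \hat f_{2t} (1 - \hat\xi_{t|t-1}), \\
\hat M_{jt} &= (\nabla_{\theta_j} \hat f_{1t} - \nabla_{\theta_j} \hat f_{2t}) \hat\xi_{t|t-1} + (\hat f_{1t} - \hat f_{2t}) \nabla_{\theta_j} \hat\xi_{t|t-1} + \nabla_{\theta_j} \hat f_{2t}.
\end{aligned}
\tag{B.25}
$$
Because (B.24) holds for any $\delta_2 \in \Delta$, its derivative with respect to $\delta_2$ must equal zero. The proof exploits this fact. It consists of three steps. The first step takes first order derivatives of the $n_\beta + n_\delta$ restrictions in (B.24) with respect to $\delta_{2k}$, where $k \in \{1, \dots, n_\delta\}$, to obtain a system of $n_\beta + n_\delta$ linear equations with $\nabla_{\delta_{2k}} \hat\beta(\delta_2)$ and $\nabla_{\delta_{2k}} \hat\delta_1(\delta_2)$ being the unknowns. The second step evaluates these equations at $\delta_2 = \tilde\delta$ and obtains approximations to them. The third step solves these approximating equations to obtain explicit expressions for $\nabla_{\delta_{2k}} \hat\beta(\delta_2)$ and $\nabla_{\delta_{2k}} \hat\delta_1(\delta_2)$. These three steps are then repeated for every $k \in \{1, \dots, n_\delta\}$ to obtain the lemma.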
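Step 1 for proving the lemma. Pick an arbitrary $k \in \{1, \dots, n_\delta\}$ and an arbitrary $j \in \{1, \dots, n_\beta + n_\delta\}$. Differentiate the $j$-th equation from (B.24) with respect to the $k$-th element of $\delta_2$ to obtain
$$M^{(2)}_{jk}(p, q, \delta_2) = \frac{1}{T} \sum_{t=1}^{T} \frac{\nabla_{\delta_{2k}} \hat M_{jt}}{\hat B_t} - \frac{1}{T} \sum_{t=1}^{T} \frac{\nabla_{\delta_{2k}} \hat B_t}{\hat B_t^2} \hat M_{jt} = 0, \tag{B.26}$$
where
$$
\begin{aligned}
\nabla_{\delta_{2k}} \hat M_{jt} = \Big\{ & \hat\xi_{t|t-1} \nabla_{\theta_j} \nabla_{\theta'} \hat f_{1t} + (1 - \hat\xi_{t|t-1}) \nabla_{\theta_j} \nabla_{\theta'} \hat f_{2t} + (\nabla_{\theta_j} \hat f_{1t} - \nabla_{\theta_j} \hat f_{2t}) \nabla_{\theta'} \hat\xi_{t|t-1} \\
& + \nabla_{\theta_j} \hat\xi_{t|t-1} (\nabla_{\theta'} \hat f_{1t} - \nabla_{\theta'} \hat f_{2t}) + (\hat f_{1t} - \hat f_{2t}) \nabla_{\theta_j} \nabla_{\theta'} \hat\xi_{t|t-1} \Big\} \nabla_{\delta_{2k}} \hat\theta(\delta_2)
\end{aligned}
\tag{B.27}
$$
and
$$\nabla_{\delta_{2k}} \hat B_t = \left\{\hat\xi_{t|t-1} \nabla_{\theta'} \hat f_{1t} + (1 - \hat\xi_{t|t-1}) \nabla_{\theta'} \hat f_{2t} + (\hat f_{1t} - \hat f_{2t}) \nabla_{\theta'} \hat\xi_{t|t-1}\right\} \nabla_{\delta_{2k}} \hat\theta(\delta_2) \tag{B.28}$$
with
$$\nabla_{\delta_{2k}} \hat\theta(\delta_2) = \begin{bmatrix} \nabla_{\delta_{2k}} \hat\beta(\delta_2) \\ \nabla_{\delta_{2k}} \hat\delta_1(\delta_2) \\ e_k \end{bmatrix}, \tag{B.29}$$
where $e_k$ is an $n_\delta$-dimensional vector whose $k$-th element equals 1 and the others zero. We view (B.26) as a linear equation with the first $n_\beta + n_\delta$ elements of $\nabla_{\delta_{2k}} \hat\theta(\delta_2)$ being the unknowns.
The above differentiation can be carried out for all $j = 1, \dots, n_\beta + n_\delta$ while keeping $k$ fixed at the same value. This delivers $n_\beta + n_\delta$ equations with the same number of unknowns, contained in (B.29).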
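The logic of Step 1 is implicit differentiation of a first order condition that holds identically in $\delta_2$. The toy sketch below is not the paper's model: it uses a hypothetical scalar objective $g(\theta, d) = -(\theta - \sin d)^2 - \theta^4/2$, whose maximizer $\hat\theta(d)$ plays the role of $\hat\theta(\delta_2)$, and checks that differentiating the first order condition gives the same slope as a finite difference of the re-optimized estimate.

```python
import math

def foc(theta, d):
    """First order condition m(theta, d) = dg/dtheta of the toy objective."""
    return -2.0 * (theta - math.sin(d)) - 2.0 * theta ** 3

def foc_dtheta(theta):
    """Derivative of the first order condition with respect to theta."""
    return -2.0 - 6.0 * theta ** 2

def solve_theta(d, tol=1e-14):
    """Newton iterations for the root of m(theta, d) = 0."""
    theta = 0.0
    for _ in range(100):
        step = foc(theta, d) / foc_dtheta(theta)
        theta -= step
        if abs(step) < tol:
            break
    return theta

def dtheta_dd(d):
    """Implicit differentiation: m_theta * dtheta/dd + m_d = 0."""
    theta = solve_theta(d)
    m_d = 2.0 * math.cos(d)
    return -m_d / foc_dtheta(theta)

d0, eps = 0.4, 1e-5
implicit_slope = dtheta_dd(d0)
numeric_slope = (solve_theta(d0 + eps) - solve_theta(d0 - eps)) / (2 * eps)
```

In the proof the same idea is applied to a vector of conditions, so the single division becomes a linear system in the unknown derivatives.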
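Step 2 for proving the lemma. We evaluate the term $T^{-1} \sum_{t=1}^{T} (\nabla_{\delta_{2k}} \hat B_t / \hat B_t^2) \hat M_{jt}$ in (B.26) at $\delta_2 = \tilde\delta$ for an arbitrary $j \in \{1, \dots, n_\beta + n_\delta\}$. It equals
$$\frac{1}{T} \sum_{t=1}^{T} \frac{\xi_* \nabla_{\theta_j} \tilde f_{1t} + (1 - \xi_*) \nabla_{\theta_j} \tilde f_{2t}}{\tilde f_t^2} \left[\xi_* \nabla_{\theta'} \tilde f_{1t} + (1 - \xi_*) \nabla_{\theta'} \tilde f_{2t}\right] \nabla_{\delta_{2k}} \hat\theta(\tilde\delta).$$
Using (B.29), this can be rewritten as
$$
\begin{aligned}
& \frac{1}{T} \sum_{t=1}^{T} \frac{\xi_* \nabla_{\theta_j} \tilde f_{1t} + (1 - \xi_*) \nabla_{\theta_j} \tilde f_{2t}}{\tilde f_t^2} \left[\xi_* \nabla_{\beta'} \tilde f_{1t} + (1 - \xi_*) \nabla_{\beta'} \tilde f_{2t}\right] \nabla_{\delta_{2k}} \hat\beta(\tilde\delta) \\
&+ \frac{1}{T} \sum_{t=1}^{T} \frac{\xi_* \nabla_{\theta_j} \tilde f_{1t} + (1 - \xi_*) \nabla_{\theta_j} \tilde f_{2t}}{\tilde f_t^2} \left[\xi_* \nabla_{\delta_1'} \tilde f_{1t} + (1 - \xi_*) \nabla_{\delta_1'} \tilde f_{2t}\right] \nabla_{\delta_{2k}} \hat\delta_1(\tilde\delta) \\
&+ \frac{1}{T} \sum_{t=1}^{T} \frac{\xi_* \nabla_{\theta_j} \tilde f_{1t} + (1 - \xi_*) \nabla_{\theta_j} \tilde f_{2t}}{\tilde f_t^2} \left[\xi_* \nabla_{\delta_{2k}} \tilde f_{1t} + (1 - \xi_*) \nabla_{\delta_{2k}} \tilde f_{2t}\right].
\end{aligned}
\tag{B.30}
$$
Now, let $j$ run through the set $\{1, \dots, n_\beta + n_\delta\}$. Let
$$
\tilde I = \frac{1}{T} \sum_{t=1}^{T}
\begin{bmatrix}
\dfrac{\xi_* \nabla_{\beta} \tilde f_{1t} + (1 - \xi_*) \nabla_{\beta} \tilde f_{2t}}{\tilde f_t} \dfrac{\xi_* \nabla_{\beta'} \tilde f_{1t} + (1 - \xi_*) \nabla_{\beta'} \tilde f_{2t}}{\tilde f_t} &
\dfrac{\xi_* \nabla_{\beta} \tilde f_{1t} + (1 - \xi_*) \nabla_{\beta} \tilde f_{2t}}{\tilde f_t} \dfrac{\xi_* \nabla_{\delta_1'} \tilde f_{1t} + (1 - \xi_*) \nabla_{\delta_1'} \tilde f_{2t}}{\tilde f_t} \\[2ex]
\dfrac{\xi_* \nabla_{\delta_1} \tilde f_{1t} + (1 - \xi_*) \nabla_{\delta_1} \tilde f_{2t}}{\tilde f_t} \dfrac{\xi_* \nabla_{\beta'} \tilde f_{1t} + (1 - \xi_*) \nabla_{\beta'} \tilde f_{2t}}{\tilde f_t} &
\dfrac{\xi_* \nabla_{\delta_1} \tilde f_{1t} + (1 - \xi_*) \nabla_{\delta_1} \tilde f_{2t}}{\tilde f_t} \dfrac{\xi_* \nabla_{\delta_1'} \tilde f_{1t} + (1 - \xi_*) \nabla_{\delta_1'} \tilde f_{2t}}{\tilde f_t}
\end{bmatrix}.
$$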
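Then, (B.30) can be written as
$$
\tilde I \begin{bmatrix} \nabla_{\delta_{2k}} \hat\beta(\tilde\delta) \\ \nabla_{\delta_{2k}} \hat\delta_1(\tilde\delta) \end{bmatrix}
+ \begin{bmatrix}
\dfrac{1}{T} \sum_{t=1}^{T} \dfrac{\xi_* \nabla_{\beta} \tilde f_{1t} + (1 - \xi_*) \nabla_{\beta} \tilde f_{2t}}{\tilde f_t} \dfrac{\xi_* \nabla_{\delta_{2k}} \tilde f_{1t} + (1 - \xi_*) \nabla_{\delta_{2k}} \tilde f_{2t}}{\tilde f_t} \\[2ex]
\dfrac{1}{T} \sum_{t=1}^{T} \dfrac{\xi_* \nabla_{\delta_1} \tilde f_{1t} + (1 - \xi_*) \nabla_{\delta_1} \tilde f_{2t}}{\tilde f_t} \dfrac{\xi_* \nabla_{\delta_{2k}} \tilde f_{1t} + (1 - \xi_*) \nabla_{\delta_{2k}} \tilde f_{2t}}{\tilde f_t}
\end{bmatrix},
\tag{B.31}
$$
where $\tilde I$ is the observed information evaluated at the null estimates. Now we turn to the first term in (B.26) at $\delta_2 = \tilde\delta$. It equals
$$\left\{\frac{1}{T} \sum_{t=1}^{T} \left[\xi_* \frac{\nabla_{\theta_j} \nabla_{\theta'} \tilde f_{1t}}{\tilde f_t} + (1 - \xi_*) \frac{\nabla_{\theta_j} \nabla_{\theta'} \tilde f_{2t}}{\tilde f_t} + \frac{\nabla_{\theta_j} \tilde f_{1t} - \nabla_{\theta_j} \tilde f_{2t}}{\tilde f_t} \nabla_{\theta'} \hat\xi_{t|t-1} + \nabla_{\theta_j} \hat\xi_{t|t-1} \frac{\nabla_{\theta'} \tilde f_{1t} - \nabla_{\theta'} \tilde f_{2t}}{\tilde f_t}\right]\right\} \nabla_{\delta_{2k}} \hat\theta(\tilde\delta).$$
All the terms inside the curly brackets are $O_p(T^{-1/2})$. Their effects are dominated by $\tilde I$, which is positive definite in large samples. Combining this fact with (B.26) and (B.31), we obtain
$$
\tilde I \begin{bmatrix} \nabla_{\delta_{2k}} \hat\beta(\tilde\delta) \\ \nabla_{\delta_{2k}} \hat\delta_1(\tilde\delta) \end{bmatrix}
= - \begin{bmatrix}
\dfrac{1}{T} \sum_{t=1}^{T} \dfrac{\xi_* \nabla_{\beta} \tilde f_{1t} + (1 - \xi_*) \nabla_{\beta} \tilde f_{2t}}{\tilde f_t} \dfrac{\xi_* \nabla_{\delta_{2k}} \tilde f_{1t} + (1 - \xi_*) \nabla_{\delta_{2k}} \tilde f_{2t}}{\tilde f_t} \\[2ex]
\dfrac{1}{T} \sum_{t=1}^{T} \dfrac{\xi_* \nabla_{\delta_1} \tilde f_{1t} + (1 - \xi_*) \nabla_{\delta_1} \tilde f_{2t}}{\tilde f_t} \dfrac{\xi_* \nabla_{\delta_{2k}} \tilde f_{1t} + (1 - \xi_*) \nabla_{\delta_{2k}} \tilde f_{2t}}{\tilde f_t}
\end{bmatrix} + O_p(T^{-1/2}).
\tag{B.32}
$$
To solve this, it is important to check $\xi_* \nabla_{\delta_{1k}} \tilde f_{1t} + (1 - \xi_*) \nabla_{\delta_{1k}} \tilde f_{2t}$ and $\xi_* \nabla_{\delta_{2k}} \tilde f_{1t} + (1 - \xi_*) \nabla_{\delta_{2k}} \tilde f_{2t}$. By the results in (B.13) and (B.18), we have
$$\nabla_{\delta_{2k}} \tilde P_{t-1|t-1} = \frac{1 - \xi_*}{\xi_*} \nabla_{\delta_{1k}} \tilde P_{t-1|t-1},$$
and
$$\nabla_{\delta_{2k}} \tilde x_{t-1|t-1} = \frac{1 - \xi_*}{\xi_*} \nabla_{\delta_{1k}} \tilde x_{t-1|t-1}.$$
Applying (4.3) and the above two equalities, we have
$$\xi_* \nabla_{\delta_{2k}} \tilde f_{1t} + (1 - \xi_*) \nabla_{\delta_{2k}} \tilde f_{2t} = \frac{1 - \xi_*}{\xi_*} \left[\xi_* \nabla_{\delta_{1k}} \tilde f_{1t} + (1 - \xi_*) \nabla_{\delta_{1k}} \tilde f_{2t}\right]. \tag{B.33}$$
Then we can solve for the unknowns in (B.32) and get
$$\begin{bmatrix} \nabla_{\delta_{2k}} \hat\beta(\tilde\delta) \\ \nabla_{\delta_{2k}} \hat\delta_1(\tilde\delta) \end{bmatrix} = \frac{\xi_* - 1}{\xi_*} \begin{bmatrix} 0 \\ e_k \end{bmatrix} + O_p(T^{-1/2}).$$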
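The final solving step is pure linear algebra: by (B.33), the right hand side of (B.32) is proportional to one column of $\tilde I$, so the solution is a scaled unit vector. The sketch below illustrates this with simulated data; it is not from the paper, the array `a` is a hypothetical stand-in for the stacked terms $[\xi_* \nabla \tilde f_{1t} + (1 - \xi_*) \nabla \tilde f_{2t}]/\tilde f_t$ with respect to $(\beta, \delta_1)$, the $O_p(T^{-1/2})$ remainder is ignored, and the index `k` is zero-based.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_beta, n_delta, k, xi = 500, 2, 3, 1, 0.4

# Hypothetical score-like terms for (beta, delta_1); one row per period.
a = rng.normal(size=(T, n_beta + n_delta))
info = a.T @ a / T                      # plays the role of I-tilde

# (B.33): the delta_2k term is (1 - xi)/xi times the delta_1k term.
a_delta2k = (1 - xi) / xi * a[:, n_beta + k]

# Right hand side of (B.32) without the O_p(T^{-1/2}) remainder.
rhs = -(a * a_delta2k[:, None]).mean(axis=0)

solution = np.linalg.solve(info, rhs)

expected = np.zeros(n_beta + n_delta)
expected[n_beta + k] = (xi - 1) / xi
```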
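Proof of Lemma A.3.2. View the quantities in (B.26) as functions of $\delta_2$, $p$ and $q$, and differentiate them with respect to the $l$-th component of $\delta_2$ ($l = 1, \dots, n_\delta$):
$$
M^{(3)}_{jkl}(p, q, \delta_2) = \frac{1}{T} \sum_{t=1}^{T} \left\{\frac{\nabla_{\delta_{2k}} \nabla_{\delta_{2l}} \hat M_{jt}}{\hat B_t} - \frac{\nabla_{\delta_{2k}} \hat M_{jt} \nabla_{\delta_{2l}} \hat B_t}{\hat B_t^2} - \frac{\nabla_{\delta_{2k}} \nabla_{\delta_{2l}} \hat B_t}{\hat B_t^2} \hat M_{jt} - \frac{\nabla_{\delta_{2l}} \hat M_{jt} \nabla_{\delta_{2k}} \hat B_t}{\hat B_t^2} + 2 \frac{\nabla_{\delta_{2k}} \hat B_t \nabla_{\delta_{2l}} \hat B_t}{\hat B_t^3} \hat M_{jt}\right\} = 0,
\tag{B.34}
$$
where
$$
\begin{aligned}
\nabla_{\delta_{2k}} \nabla_{\delta_{2l}} \hat M_{jt} ={}& \sum_{s=1}^{n_\beta + 2n_\delta} \Big\{ \nabla_{\theta'} \hat\xi_{t|t-1} \nabla_{\theta_j} \nabla_{\theta_s} \hat f_{1t} + \hat\xi_{t|t-1} \nabla_{\theta_j} \nabla_{\theta_s} \nabla_{\theta'} \hat f_{1t} \\
& - \nabla_{\theta'} \hat\xi_{t|t-1} \nabla_{\theta_j} \nabla_{\theta_s} \hat f_{2t} + (1 - \hat\xi_{t|t-1}) \nabla_{\theta_j} \nabla_{\theta_s} \nabla_{\theta'} \hat f_{2t} \\
& + (\nabla_{\theta_j} \nabla_{\theta'} \hat f_{1t} - \nabla_{\theta_j} \nabla_{\theta'} \hat f_{2t}) \nabla_{\theta_s} \hat\xi_{t|t-1} + (\nabla_{\theta_j} \hat f_{1t} - \nabla_{\theta_j} \hat f_{2t}) \nabla_{\theta_s} \nabla_{\theta'} \hat\xi_{t|t-1} \\
& + (\nabla_{\theta_s} \hat f_{1t} - \nabla_{\theta_s} \hat f_{2t}) \nabla_{\theta_j} \nabla_{\theta'} \hat\xi_{t|t-1} + (\nabla_{\theta_s} \nabla_{\theta'} \hat f_{1t} - \nabla_{\theta_s} \nabla_{\theta'} \hat f_{2t}) \nabla_{\theta_j} \hat\xi_{t|t-1} \\
& + (\nabla_{\theta'} \hat f_{1t} - \nabla_{\theta'} \hat f_{2t}) \nabla_{\theta_j} \nabla_{\theta_s} \hat\xi_{t|t-1} + (\hat f_{1t} - \hat f_{2t}) \nabla_{\theta_j} \nabla_{\theta_s} \nabla_{\theta'} \hat\xi_{t|t-1} \Big\} \nabla_{\delta_{2k}} \hat\theta_s(\delta_2) \nabla_{\delta_{2l}} \hat\theta(\delta_2) \\
& + \Big\{ \hat\xi_{t|t-1} \nabla_{\theta_j} \nabla_{\theta'} \hat f_{1t} + (1 - \hat\xi_{t|t-1}) \nabla_{\theta_j} \nabla_{\theta'} \hat f_{2t} + (\nabla_{\theta_j} \hat f_{1t} - \nabla_{\theta_j} \hat f_{2t}) \nabla_{\theta'} \hat\xi_{t|t-1} \\
& + \nabla_{\theta_j} \hat\xi_{t|t-1} (\nabla_{\theta'} \hat f_{1t} - \nabla_{\theta'} \hat f_{2t}) + (\hat f_{1t} - \hat f_{2t}) \nabla_{\theta_j} \nabla_{\theta'} \hat\xi_{t|t-1} \Big\} \nabla_{\delta_{2k}} \nabla_{\delta_{2l}} \hat\theta(\delta_2)
\end{aligned}
$$
and
$$
\begin{aligned}
\nabla_{\delta_{2k}} \nabla_{\delta_{2l}} \hat B_t ={}& \sum_{s=1}^{n_\beta + 2n_\delta} \Big\{ \nabla_{\theta'} \hat\xi_{t|t-1} \nabla_{\theta_s} \hat f_{1t} + \hat\xi_{t|t-1} \nabla_{\theta_s} \nabla_{\theta'} \hat f_{1t} - \nabla_{\theta'} \hat\xi_{t|t-1} \nabla_{\theta_s} \hat f_{2t} + (1 - \hat\xi_{t|t-1}) \nabla_{\theta_s} \nabla_{\theta'} \hat f_{2t} \\
& + (\nabla_{\theta'} \hat f_{1t} - \nabla_{\theta'} \hat f_{2t}) \nabla_{\theta_s} \hat\xi_{t|t-1} + (\hat f_{1t} - \hat f_{2t}) \nabla_{\theta_s} \nabla_{\theta'} \hat\xi_{t|t-1} \Big\} \nabla_{\delta_{2k}} \hat\theta_s(\delta_2) \nabla_{\delta_{2l}} \hat\theta(\delta_2) \\
& + \left\{ \hat\xi_{t|t-1} \nabla_{\theta'} \hat f_{1t} + (1 - \hat\xi_{t|t-1}) \nabla_{\theta'} \hat f_{2t} + (\hat f_{1t} - \hat f_{2t}) \nabla_{\theta'} \hat\xi_{t|t-1} \right\} \nabla_{\delta_{2k}} \nabla_{\delta_{2l}} \hat\theta(\delta_2).
\end{aligned}
$$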
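The five-term summand in (B.34) is the mixed second derivative of the ratio $\hat M_{jt}/\hat B_t$, since (B.26) is the first derivative of the ratio in (B.24). The sketch below checks this quotient-rule pattern numerically. It is only an illustration: the functions `M` and `B` are arbitrary smooth stand-ins chosen for convenience, with `x` and `y` playing the roles of $\delta_{2k}$ and $\delta_{2l}$.

```python
import math

a, b, c, d = 0.3, -0.2, 0.5, 0.4   # arbitrary illustrative coefficients

def M(x, y):
    return math.exp(a * x + b * y)

def B(x, y):
    return 1.0 + math.exp(c * x + d * y)

def five_term_formula(x, y):
    """M_xy/B - M_x B_y/B^2 - B_xy M/B^2 - M_y B_x/B^2 + 2 B_x B_y M/B^3."""
    m, bb = M(x, y), B(x, y)
    e = bb - 1.0                                  # exp(c x + d y)
    m_x, m_y, m_xy = a * m, b * m, a * b * m      # derivatives of M
    b_x, b_y, b_xy = c * e, d * e, c * d * e      # derivatives of B
    return (m_xy / bb - m_x * b_y / bb**2 - b_xy * m / bb**2
            - m_y * b_x / bb**2 + 2.0 * b_x * b_y * m / bb**3)

def ratio(x, y):
    return M(x, y) / B(x, y)

x0, y0, h = 0.2, -0.1, 1e-4
numeric_mixed = (ratio(x0 + h, y0 + h) - ratio(x0 + h, y0 - h)
                 - ratio(x0 - h, y0 + h) + ratio(x0 - h, y0 - h)) / (4 * h * h)
formula_value = five_term_formula(x0, y0)
```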
To study (B.34), we can start with the third term T −1
nβ +2nδ nβ +2nδ (
X
X
u=1
s=1
PT
t=1
h
i
∇δ2k ∇δ2l B̂t /B̂t2 M̂jt . At θ̃, it equals
T
ξ∗ ∇θj f˜1t + (1 − ξ∗ )∇θj f˜2t
1X
[∇θu ξ˜t|t−1 ∇θs f˜1t + ξ∗ ∇θs ∇θu f˜1t − ∇θu ξ˜t|t−1 ∇θs f˜2t
T t=1
f˜t2
)
+(1 − ξ∗ )∇θs ∇θu f˜2t + (∇θu f˜1t − ∇θu f˜2t )∇θs ξ˜t|t−1 ∇δ2k θ̂s (δ̃)∇δ2l θ̂u (δ̃)
nβ +2nδ
+
X
s=1
(
T
ξ∗ ∇θj f˜1t + (1 − ξ∗ )∇θj f˜2t
1X
[ξ∗ ∇θs f˜1t + (1 − ξ∗ )∇θs f˜2t ] ∇δ2k ∇δ2l θ̂s (δ̃).
T t=1
f˜t2
)
Because ∇δ2k θ̂s (δ̃) and ∇δ2l θ̂u (δ̃) are Op (T −1/2 ) except for the following four cases s = nβ + k, s =
nβ + nδ + k, u = nβ + l and u = nβ + nδ + l, the preceding display equals
T
ξ∗ ∇θj f˜1t + (1 − ξ∗ )∇θj f˜2t
1X
∇δ1l ξ˜t|t−1 ∇δ1k f˜1t + ξ∗ ∇δ1k ∇δ1l f˜1t − ∇δ1l ξ˜t|t−1 ∇δ1k f˜2t
T t=1
f˜t2
(
)
+(1 − ξ∗ )∇δ1k ∇δ1l f˜2t + (∇δ1l f˜1t − ∇δ1l f˜2t )∇δ1k ξ˜t|t−1 ∇δ2k δ̂1k (δ̃)∇δ2l δ̂1l (δ̃)
T
ξ∗ ∇θj f˜1t + (1 − ξ∗ )∇θj f˜2t
1X
∇δ2l ξ˜t|t−1 ∇δ1k f˜1t + ξ∗ ∇δ1k ∇δ2l f˜1t − ∇δ2l ξ˜t|t−1 ∇δ1k f˜2t
T t=1
f˜t2
(
+
)
+(1 − ξ∗ )∇δ1k ∇δ2l f˜2t + (∇δ2l f˜1t − ∇δ2l f˜2t )∇δ1k ξ˜t|t−1 ∇δ2k δ̂1k (δ̃)∇δ2l δ̂2l (δ̃)
T
ξ∗ ∇θj f˜1t + (1 − ξ∗ )∇θj f˜2t
1X
∇δ1l ξ˜t|t−1 ∇δ2k f˜1t + ξ∗ ∇δ2k ∇δ1l f˜1t − ∇δ1l ξ˜t|t−1 ∇δ2k f˜2t
T t=1
f˜t2
(
+
)
+(1 − ξ∗ )∇δ2k ∇δ1l f˜2t + (∇δ1l f˜1t − ∇δ1l f˜2t )∇δ2k ξ˜t|t−1 ∇δ2k δ̂2k (δ̃)∇δ2l δ̂1l (δ̃)
T
ξ∗ ∇θj f˜1t + (1 − ξ∗ )∇θj f˜2t
1X
∇δ2l ξ˜t|t−1 ∇δ2k f˜1t + ξ∗ ∇δ2k ∇δ2l f˜1t − ∇δ2l ξ˜t|t−1 ∇δ2k f˜2t
T t=1
f˜t2
(
+
)
+(1 − ξ∗ )∇δ2k ∇δ2l f˜2t + (∇δ2l f˜1t − ∇δ2l f˜2t )∇δ2k ξ˜t|t−1 ∇δ2k δ̂2k (δ̃)∇δ2l δ̂2l (δ̃)
nβ +nδ
+
X
s=1
(
T
ξ∗ ∇θj f˜1t + (1 − ξ∗ )∇θj f˜2t
1X
[ξ∗ ∇θs f˜1t + (1 − ξ∗ )∇θs f˜2t ] ∇δ2k ∇δ2l θ̂s (δ̃) + Op (T −1/2 ).
2
˜
T t=1
ft
)
Apply ∇δ2l δ̂1l (δ̃) = (ξ∗ − 1)/ξ∗ + Op (T −1/2 ), ∇δ2k δ̂1k (δ̃) = (ξ∗ − 1)/ξ∗ + Op (T −1/2 ), ∇δ2l δ̂2l (δ̃) =
∇δ2k δ̂2k (δ̃) = 1 and rearrange terms, the preceding display further reduces to
T
1 X ξ∗ ∇θj f˜1t + (1 − ξ∗ )∇θj f˜2t
T t=1
f˜t2
(
2 1 − ξ∗
∇δ ∇δ f˜1t
∇δ ∇δ f˜2t
∇δ ∇δ f˜1t
∇δ ∇δ f˜2t
×
ξ∗ 1k 1l
+ (1 − ξ∗ ) 1k 1l
+ ξ∗ 2k 2l
+ (1 − ξ∗ ) 2k 2l
ξ∗
f˜t
f˜t
f˜t
f˜t
∇δ ∇δ f˜1t
∇δ ∇δ f˜2t
∇δ ∇δ f˜1t
∇δ ∇δ f˜2t
1 − ξ∗
ξ∗ 2k 1l
+ (1 − ξ∗ ) 2k 1l
+ ξ∗ 1k 2l
+ (1 − ξ∗ ) 1k 2l
−
ξ∗
f˜t
f˜t
f˜t
f˜t
)
1
∇θl f¯1t
∇θl f¯2t
∇θk f¯1t
∇θk f¯2t
+ 2 ∇δ1k ξ˜t|t−1
−
+
−
∇δ1l ξ˜t|t−1
¯
¯
¯
ξ∗
ft
ft
ft
f¯t
)
nβ +nδ (
T
X
1 X ξ∗ ∇θj f˜1t + (1 − ξ∗ )∇θj f˜2t
+
[ξ∗ ∇θs f˜1t + (1 − ξ∗ )∇θs f˜2t ] ∇δ2k ∇δ2l θ̂s (δ̃) + Op (T −1/2 ).
˜2
T
f
t
s=1
t=1
The above display leads to (nβ + nδ ) equations with j = 1, ..., nβ + nδ . These equations can be written
collectively as


I˜ 

∇δ2k ∇δ2l β̂(δ̃) 
∇δ2k ∇δ2l δ̂1 (δ̃)


+
ξ∗ ∇β f˜1t +(1−ξ∗ )∇β f˜2t
Ũkl,t
t=1
f˜t2
˜
˜
P
ξ∗ ∇δ1 f1t +(1−ξ∗ )∇δ1 f2t
T
1
Ũkl,t
t=1
T
f˜t2
1
T
PT


−1/2
),
 + Op (T
where
2 ∇δ1k ∇δ1l f˜2t
∇δ2k ∇δ2l f˜1t
∇δ2k ∇δ2l f˜2t
∇δ1k ∇δ1l f˜1t
Ũkl,t =
+ (1 − ξ∗ )
+ ξ∗
+ (1 − ξ∗ )
ξ∗
f˜t
f˜t
f˜t
f˜t
1 − ξ∗
∇δ2k ∇δ1l f˜1t
∇δ2k ∇δ1l f˜2t
∇δ1k ∇δ2l f˜1t
∇δ1k ∇δ2l f˜2t
−
ξ∗
+ (1 − ξ∗ )
+ ξ∗
+ (1 − ξ∗ )
ξ∗
f˜t
f˜t
f˜t
f˜t
∇θl f¯1t
1
∇θl f¯2t
∇θk f¯2t
∇θk f¯1t
+ 2 ∇δ1k ξ˜t|t−1
−
−
+
∇δ1l ξ˜t|t−1 .
ξ∗
f¯t
f¯t
f¯t
f¯t
1 − ξ∗
ξ∗
This completes the analysis of the third term in (B.34). As shown below, the other terms in (B.34)
are all asymptotically negligible. Note that at δ2 = δ̃, ∇δ2k B̂t can be rewritten as
nβ
X
nβ +2nδ
∇θs f˜1t ∇δ2k θ̂s (δ̃) +
s=1
=
s=1
h
h
i
ξ∗ ∇θs f˜1t + (1 − ξ∗ )∇θs f˜2t ∇δ2k θ̂s (δ̃)
s=nβ +1
nβ
X
X
∇θs f˜1t ∇δ2k β̂s (δ̃) +
nδ
X
h
i
ξ∗ ∇θs f˜1t + (1 − ξ∗ )∇θs f˜2t ∇δ2k δ̂1s (δ̃)
s=1,s6=k
+ ξ∗ ∇δ1k f˜1t + (1 − ξ∗ )∇δ1k f˜2t
i
∇δ2k δ̂1k (δ̃) +
1 − ξ∗
ξ∗
.
(B.35)
This representation is useful because, corresponding to the three terms on the right hand side, we have $\nabla_{\delta_{2k}} \hat\beta_s(\tilde\delta) = O_p(T^{-1/2})$, $\nabla_{\delta_{2k}} \hat\delta_{1s}(\tilde\delta) = O_p(T^{-1/2})$ when $s \ne k$, and $\nabla_{\delta_{2k}} \hat\delta_{1k}(\tilde\delta) + (1 - \xi_*)/\xi_* = O_p(T^{-1/2})$ by the previous results. So, the second, fourth and fifth terms in (B.34) are all $O_p(T^{-1/2})$ after applying (B.35) to $\nabla_{\delta_{2k}} \hat B_t$.
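Proof of Lemma A.3.3. View the quantities in (B.34) as functions of $\delta_2$, $p$ and $q$ and differentiate them with respect to the $h$-th element of $\delta_2$ ($h = 1, \dots, n_\delta$):
$$
\begin{aligned}
M^{(4)}_{jklh}(p, q, \delta_2) = \frac{1}{T} \sum_{t=1}^{T} \Bigg\{ & \frac{\nabla_{\delta_{2k}} \nabla_{\delta_{2l}} \nabla_{\delta_{2h}} \hat M_{jt}}{\hat B_t} - \frac{\nabla_{\delta_{2k}} \nabla_{\delta_{2l}} \hat M_{jt} \nabla_{\delta_{2h}} \hat B_t}{\hat B_t^2} \\
& - \frac{\nabla_{\delta_{2k}} \nabla_{\delta_{2h}} \hat M_{jt} \nabla_{\delta_{2l}} \hat B_t}{\hat B_t^2} - \frac{\nabla_{\delta_{2k}} \hat M_{jt} \nabla_{\delta_{2l}} \nabla_{\delta_{2h}} \hat B_t}{\hat B_t^2} + 2 \frac{\nabla_{\delta_{2k}} \hat M_{jt} \nabla_{\delta_{2l}} \hat B_t \nabla_{\delta_{2h}} \hat B_t}{\hat B_t^3} \\
& - \frac{\nabla_{\delta_{2k}} \nabla_{\delta_{2l}} \nabla_{\delta_{2h}} \hat B_t}{\hat B_t^2} \hat M_{jt} - \frac{\nabla_{\delta_{2k}} \nabla_{\delta_{2l}} \hat B_t}{\hat B_t^2} \nabla_{\delta_{2h}} \hat M_{jt} + 2 \frac{\nabla_{\delta_{2k}} \nabla_{\delta_{2l}} \hat B_t \nabla_{\delta_{2h}} \hat B_t}{\hat B_t^3} \hat M_{jt} \\
& - \frac{\nabla_{\delta_{2k}} \nabla_{\delta_{2h}} \hat B_t}{\hat B_t^2} \nabla_{\delta_{2l}} \hat M_{jt} - \frac{\nabla_{\delta_{2k}} \hat B_t}{\hat B_t^2} \nabla_{\delta_{2l}} \nabla_{\delta_{2h}} \hat M_{jt} + 2 \frac{\nabla_{\delta_{2k}} \hat B_t \nabla_{\delta_{2h}} \hat B_t}{\hat B_t^3} \nabla_{\delta_{2l}} \hat M_{jt} \\
& + 2 \frac{\nabla_{\delta_{2k}} \nabla_{\delta_{2h}} \hat B_t \nabla_{\delta_{2l}} \hat B_t}{\hat B_t^3} \hat M_{jt} + 2 \frac{\nabla_{\delta_{2k}} \hat B_t \nabla_{\delta_{2l}} \nabla_{\delta_{2h}} \hat B_t}{\hat B_t^3} \hat M_{jt} \\
& + 2 \frac{\nabla_{\delta_{2k}} \hat B_t \nabla_{\delta_{2l}} \hat B_t}{\hat B_t^3} \nabla_{\delta_{2h}} \hat M_{jt} - 6 \frac{\nabla_{\delta_{2k}} \hat B_t \nabla_{\delta_{2l}} \hat B_t \nabla_{\delta_{2h}} \hat B_t}{\hat B_t^4} \hat M_{jt} \Bigg\} = 0.
\end{aligned}
\tag{B.36}
$$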
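Among the fifteen terms, only the 1st and the 6th terms involve third order derivatives. They will be analyzed later. Among the remaining terms, we have the following five cases:

(1) The 4th, 7th and 9th terms involve second order derivatives of $\hat B_t$ and first order derivatives of $\hat M_{jt}$, which lead to: $T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\nabla_{\theta_{i_2}}\tilde f_{at}/\tilde f_t)(\nabla_{\theta_{j_1}}\nabla_{\theta_{j_2}}\tilde f_{bt}/\tilde f_t)$, $T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\tilde f_{at}/\tilde f_t)(\nabla_{\theta_{j_1}}\nabla_{\theta_{j_2}}\tilde f_{bt}/\tilde f_t)$, $T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\nabla_{\theta_{i_2}}\tilde f_{at}/\tilde f_t)\nabla_{\theta_{j_1}}\tilde\xi_{t|t-1}$, $T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\tilde f_{at}/\tilde f_t)(\nabla_{\theta_{i_2}}\tilde f_{bt}/\tilde f_t)\nabla_{\theta_{j_1}}\tilde\xi_{t|t-1}\nabla_{\theta_{j_2}}\tilde\xi_{t|t-1}$ and $T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\tilde f_{at}/\tilde f_t)(\nabla_{\theta_{j_1}}\tilde f_{bt}/\tilde f_t)\nabla_{\theta_{j_2}}\tilde\xi_{t|t-1}$, where $1 \le i_1, i_2, j_1, j_2 \le n_\beta + 2n_\delta$, $a = 1, 2$ and $b = 1, 2$. They are all $O_p(1)$ by Lemma A.2.

(2) The 2nd, 3rd and 10th terms consist of first order derivatives of $\hat B_t$ and second order derivatives of $\hat M_{jt}$. They lead to: $T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\tilde f_{at}/\tilde f_t)(\nabla_{\theta_{j_1}}\nabla_{\theta_{j_2}}\nabla_{\theta_{j_3}}\tilde f_{bt}/\tilde f_t)$, $T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\nabla_{\theta_{i_2}}\tilde f_{at}/\tilde f_t)(\nabla_{\theta_{j_1}}\tilde f_{bt}/\tilde f_t)\nabla_{\theta_{j_2}}\tilde\xi_{t|t-1}$, $T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\tilde f_{at}/\tilde f_t)(\nabla_{\theta_{i_2}}\tilde f_{bt}/\tilde f_t)\nabla_{\theta_{j_1}}\nabla_{\theta_{j_2}}\tilde\xi_{t|t-1}$, $T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\nabla_{\theta_{i_2}}\tilde f_{at}/\tilde f_t)(\nabla_{\theta_{j_1}}\tilde f_{bt}/\tilde f_t)$ and $T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\tilde f_{at}/\tilde f_t)(\nabla_{\theta_{j_1}}\tilde f_{bt}/\tilde f_t)\nabla_{\theta_{j_2}}\tilde\xi_{t|t-1}$, which are all $O_p(1)$. These three terms are thus $O_p(T^{-1/2})$ after applying (B.35) to the first order derivatives of $\hat B_t$.

(3) The 5th, 11th and 14th terms consist of: $T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\tilde f_{at}/\tilde f_t)(\nabla_{\theta_{i_2}}\tilde f_{bt}/\tilde f_t)(\nabla_{\theta_{i_3}}\tilde f_{ct}/\tilde f_t)\nabla_{\theta_{j_1}}\tilde\xi_{t|t-1}$ and $T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\tilde f_{at}/\tilde f_t)(\nabla_{\theta_{i_2}}\tilde f_{bt}/\tilde f_t)(\nabla_{\theta_{j_1}}\nabla_{\theta_{j_2}}\tilde f_{ct}/\tilde f_t)$, which are all $O_p(1)$. Consequently, these three terms are $O_p(T^{-1/2})$ after applying (B.35).

(4) The 8th, 12th and 13th terms lead to: $T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\tilde f_{at}/\tilde f_t)(\nabla_{\theta_{j_1}}\tilde f_{bt}/\tilde f_t)(\nabla_{\theta_{j_2}}\tilde f_{ct}/\tilde f_t)$, $T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\tilde f_{at}/\tilde f_t)(\nabla_{\theta_{i_2}}\tilde f_{bt}/\tilde f_t)(\nabla_{\theta_{i_3}}\tilde f_{ct}/\tilde f_t)\nabla_{\theta_{j_1}}\tilde\xi_{t|t-1}$ and $T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\tilde f_{at}/\tilde f_t)(\nabla_{\theta_{i_2}}\tilde f_{bt}/\tilde f_t)(\nabla_{\theta_{j_1}}\nabla_{\theta_{j_2}}\tilde f_{ct}/\tilde f_t)$, which are all $O_p(1)$.

(5) The 15th term consists of $T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\tilde f_{at}/\tilde f_t)(\nabla_{\theta_{i_2}}\tilde f_{bt}/\tilde f_t)(\nabla_{\theta_{i_3}}\tilde f_{ct}/\tilde f_t)(\nabla_{\theta_{i_4}}\tilde f_{ct}/\tilde f_t)$. This term is $O_p(T^{-1/2})$ after applying (B.35).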
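As a quick sanity check on the signs and coefficients of the fifteen terms in (B.36), the following sketch evaluates them for an illustrative pair of functions that is not part of the paper: M(x) = exp(a'x) and B(x) = exp(b'x), with arbitrary vectors a and b chosen only for this check. For this pair every derivative is available in closed form, and the third derivative of M/B equals (a_k - b_k)(a_l - b_l)(a_h - b_h) M/B, so the sum of the fifteen terms can be compared with that value. The function name and the numbers are hypothetical.

```python
import math

def fifteen_terms(a, b, x, k, l, h):
    """Fifteen quotient-rule terms of d^3 (M/B) / dx_k dx_l dx_h, in the
    order of (B.36), for the illustrative choice M = exp(a.x), B = exp(b.x)."""
    M = math.exp(sum(ai * xi for ai, xi in zip(a, x)))
    B = math.exp(sum(bi * xi for bi, xi in zip(b, x)))
    # For exponentials, each derivative multiplies the function by a coefficient.
    Mk, Ml, Mh = a[k] * M, a[l] * M, a[h] * M
    Mkl, Mkh, Mlh = a[k] * a[l] * M, a[k] * a[h] * M, a[l] * a[h] * M
    Mklh = a[k] * a[l] * a[h] * M
    Bk, Bl, Bh = b[k] * B, b[l] * B, b[h] * B
    Bkl, Bkh, Blh = b[k] * b[l] * B, b[k] * b[h] * B, b[l] * b[h] * B
    Bklh = b[k] * b[l] * b[h] * B
    return [
        Mklh / B,                        # 1st
        -Mkl * Bh / B**2,                # 2nd
        -Mkh * Bl / B**2,                # 3rd
        -Mk * Blh / B**2,                # 4th
        2 * Mk * Bl * Bh / B**3,         # 5th
        -Bklh * M / B**2,                # 6th
        -Bkl * Mh / B**2,                # 7th
        2 * Bkl * Bh * M / B**3,         # 8th
        -Bkh * Ml / B**2,                # 9th
        -Bk * Mlh / B**2,                # 10th
        2 * Bk * Bh * Ml / B**3,         # 11th
        2 * Bkh * Bl * M / B**3,         # 12th
        2 * Bk * Blh * M / B**3,         # 13th
        2 * Bk * Bl * Mh / B**3,         # 14th
        -6 * Bk * Bl * Bh * M / B**4,    # 15th
    ]

a, b, x = (0.3, -0.5, 0.8), (0.1, 0.4, -0.2), (0.2, -0.1, 0.5)
terms = fifteen_terms(a, b, x, 0, 1, 2)
ratio = math.exp(sum((ai - bi) * xi for ai, bi, xi in zip(a, b, x)))
exact = (a[0] - b[0]) * (a[1] - b[1]) * (a[2] - b[2]) * ratio
print(len(terms), sum(terms), exact)
```

Because all derivatives of an exponential factor into products, this check confirms the coefficients 1, -1, 2 and -6 and the powers of B, but it cannot distinguish between terms that differ only in which index carries a second derivative.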
To analyze the remaining two terms in (B.36), we need third order derivatives of M̂jt and B̂t :
∇δ2k ∇δ2l ∇δ2h M̂jt = Σ_{u=1}^{nβ+2nδ} Σ_{s=1}^{nβ+2nδ} {
∇θu ξˆt|t−1 ∇θj ∇θs ∇θ0 fˆ1t + ∇θu ∇θ0 ξˆt|t−1 ∇θj ∇θs fˆ1t + ξˆt|t−1 ∇θj ∇θs ∇θu ∇θ0 fˆ1t
+ ∇θ0 ξˆt|t−1 ∇θj ∇θs ∇θu fˆ1t − ∇θu ξˆt|t−1 ∇θj ∇θs ∇θ0 fˆ2t − ∇θu ∇θ0 ξˆt|t−1 ∇θj ∇θs fˆ2t
+ (1 − ξˆt|t−1 )∇θj ∇θs ∇θu ∇θ0 fˆ2t − ∇θ0 ξˆt|t−1 ∇θj ∇θs ∇θu fˆ2t
+ (∇θj ∇θu ∇θ0 fˆ1t − ∇θj ∇θu ∇θ0 fˆ2t )∇θs ξˆt|t−1 + (∇θj ∇θu fˆ1t − ∇θj ∇θu fˆ2t )∇θs ∇θ0 ξˆt|t−1
+ (∇θj ∇θ0 fˆ1t − ∇θj ∇θ0 fˆ2t )∇θs ∇θu ξˆt|t−1 + (∇θj fˆ1t − ∇θj fˆ2t )∇θs ∇θu ∇θ0 ξˆt|t−1
+ (∇θs ∇θ0 fˆ1t − ∇θs ∇θ0 fˆ2t )∇θj ∇θu ξˆt|t−1 + (∇θs fˆ1t − ∇θs fˆ2t )∇θj ∇θu ∇θ0 ξˆt|t−1
+ (∇θs ∇θu ∇θ0 fˆ1t − ∇θs ∇θu ∇θ0 fˆ2t )∇θj ξˆt|t−1 + (∇θs ∇θu fˆ1t − ∇θs ∇θu fˆ2t )∇θj ∇θ0 ξˆt|t−1
+ (∇θu ∇θ0 fˆ1t − ∇θu ∇θ0 fˆ2t )∇θj ∇θs ξˆt|t−1 + (∇θu fˆ1t − ∇θu fˆ2t )∇θj ∇θs ∇θ0 ξˆt|t−1
+ (∇θ0 fˆ1t − ∇θ0 fˆ2t )∇θj ∇θs ∇θu ξˆt|t−1
+ (fˆ1t − fˆ2t )∇θj ∇θs ∇θu ∇θ0 ξˆt|t−1 } ∇δ2k θ̂s (δ2 )∇δ2l θ̂u (δ2 )∇δ2h θ̂(δ2 )
+ Σ_{u=1}^{nβ+2nδ} Σ_{s=1}^{nβ+2nδ} {
∇θu ξˆt|t−1 ∇θj ∇θs fˆ1t + ξˆt|t−1 ∇θj ∇θs ∇θu fˆ1t − ∇θu ξˆt|t−1 ∇θj ∇θs fˆ2t
+ (1 − ξˆt|t−1 )∇θj ∇θs ∇θu fˆ2t + (∇θj ∇θu fˆ1t − ∇θj ∇θu fˆ2t )∇θs ξˆt|t−1
+ (∇θj fˆ1t − ∇θj fˆ2t )∇θs ∇θu ξˆt|t−1 + (∇θs fˆ1t − ∇θs fˆ2t )∇θj ∇θu ξˆt|t−1
+ (∇θs ∇θu fˆ1t − ∇θs ∇θu fˆ2t )∇θj ξˆt|t−1 + (∇θu fˆ1t − ∇θu fˆ2t )∇θj ∇θs ξˆt|t−1
+ (fˆ1t − fˆ2t )∇θj ∇θs ∇θu ξˆt|t−1 } [∇δ2k ∇δ2h θ̂s (δ2 )∇δ2l θ̂u (δ2 ) + ∇δ2k θ̂s (δ2 )∇δ2l ∇δ2h θ̂u (δ2 )]
+ Σ_{s=1}^{nβ+2nδ} { ξˆt|t−1 ∇θj ∇θs ∇θ0 fˆ1t + ∇θ0 ξˆt|t−1 ∇θj ∇θs fˆ1t + (1 − ξˆt|t−1 )∇θj ∇θs ∇θ0 fˆ2t
− ∇θ0 ξˆt|t−1 ∇θj ∇θs fˆ2t + (∇θj ∇θ0 fˆ1t − ∇θj ∇θ0 fˆ2t )∇θs ξˆt|t−1 + (∇θj fˆ1t − ∇θj fˆ2t )∇θs ∇θ0 ξˆt|t−1
+ (∇θs ∇θ0 fˆ1t − ∇θs ∇θ0 fˆ2t )∇θj ξˆt|t−1 + (∇θs fˆ1t − ∇θs fˆ2t )∇θj ∇θ0 ξˆt|t−1
+ (∇θ0 fˆ1t − ∇θ0 fˆ2t )∇θj ∇θs ξˆt|t−1 + (fˆ1t − fˆ2t )∇θj ∇θs ∇θ0 ξˆt|t−1 } ∇δ2h θ̂(δ2 )∇δ2k ∇δ2l θ̂s (δ2 )
+ { ξˆt|t−1 ∇θj ∇θ0 fˆ1t + (1 − ξˆt|t−1 )∇θj ∇θ0 fˆ2t + (∇θj fˆ1t − ∇θj fˆ2t )∇θ0 ξˆt|t−1
+ (∇θ0 fˆ1t − ∇θ0 fˆ2t )∇θj ξˆt|t−1 + (fˆ1t − fˆ2t )∇θj ∇θ0 ξˆt|t−1 } ∇δ2k ∇δ2l ∇δ2h θ̂(δ2 ),
and
∇δ2k ∇δ2l ∇δ2h B̂t = Σ_{s=1}^{nβ+2nδ} Σ_{u=1}^{nβ+2nδ} {
∇θu ∇θ0 ξˆt|t−1 ∇θs fˆ1t + ∇θu ξˆt|t−1 ∇θs ∇θ0 fˆ1t + ∇θ0 ξˆt|t−1 ∇θs ∇θu fˆ1t
+ ξˆt|t−1 ∇θs ∇θu ∇θ0 fˆ1t − ∇θu ∇θ0 ξˆt|t−1 ∇θs fˆ2t − ∇θu ξˆt|t−1 ∇θs ∇θ0 fˆ2t + (1 − ξˆt|t−1 )∇θs ∇θu ∇θ0 fˆ2t
− ∇θ0 ξˆt|t−1 ∇θs ∇θu fˆ2t + (∇θu ∇θ0 fˆ1t − ∇θu ∇θ0 fˆ2t )∇θs ξˆt|t−1 + (∇θu fˆ1t − ∇θu fˆ2t )∇θs ∇θ0 ξˆt|t−1
+ (∇θ0 fˆ1t − ∇θ0 fˆ2t )∇θs ∇θu ξˆt|t−1 + (fˆ1t − fˆ2t )∇θs ∇θu ∇θ0 ξˆt|t−1 } ∇δ2l θ̂u (δ2 )∇δ2k θ̂s (δ2 )∇δ2h θ̂(δ2 )
+ Σ_{s=1}^{nβ+2nδ} Σ_{u=1}^{nβ+2nδ} {
∇θu ξˆt|t−1 ∇θs fˆ1t + ξˆt|t−1 ∇θs ∇θu fˆ1t − ∇θu ξˆt|t−1 ∇θs fˆ2t + (1 − ξˆt|t−1 )∇θs ∇θu fˆ2t
+ (∇θu fˆ1t − ∇θu fˆ2t )∇θs ξˆt|t−1 + (fˆ1t − fˆ2t )∇θs ∇θu ξˆt|t−1 } ×
[∇δ2h ∇δ2l θ̂u (δ2 )∇δ2k θ̂s (δ2 ) + ∇δ2l θ̂u (δ2 )∇δ2h ∇δ2k θ̂s (δ2 )]
+ Σ_{s=1}^{nβ+2nδ} { ∇θ0 ξˆt|t−1 ∇θs fˆ1t + ξˆt|t−1 ∇θs ∇θ0 fˆ1t + (1 − ξˆt|t−1 )∇θs ∇θ0 fˆ2t − ∇θ0 ξˆt|t−1 ∇θs fˆ2t
+ (∇θ0 fˆ1t − ∇θ0 fˆ2t )∇θs ξˆt|t−1 + (fˆ1t − fˆ2t )∇θs ∇θ0 ξˆt|t−1 } ∇δ2h θ̂(δ2 )∇δ2k ∇δ2l θ̂s (δ2 )
+ [ξˆt|t−1 ∇θ0 fˆ1t + (1 − ξˆt|t−1 )∇θ0 fˆ2t + (fˆ1t − fˆ2t )∇θ0 ξˆt|t−1 ] ∇δ2k ∇δ2l ∇δ2h θ̂(δ2 ).
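Consider the 1st term in (B.36). In the expression of $\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\nabla_{\delta_{2h}}\hat M_{jt}$, only the last two lines involve third order derivatives of $\hat\theta(\delta_2)$. These derivatives are multiplied by (after division by $\tilde f_t$): $T^{-1}\sum_{t=1}^{T}\nabla_{\theta_{i_1}}\nabla_{\theta_{i_2}}\tilde f_{at}/\tilde f_t$ and $T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\tilde f_{at}/\tilde f_t)\nabla_{\theta_{j_1}}\tilde\xi_{t|t-1}$, where $a = 1, 2$. They are $O_p(T^{-1/2})$ by Lemma A.2. The remaining components of $\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\nabla_{\delta_{2h}}\hat M_{jt}$ lead to: $T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\cdots\nabla_{\theta_{i_k}}\tilde f_{at}/\tilde f_t)$ for $a = 1, 2$ and $k \le 4$, and $T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\cdots\nabla_{\theta_{i_k}}\tilde f_{at}/\tilde f_t)(\nabla_{\theta_{j_1}}\cdots\nabla_{\theta_{j_m}}\tilde\xi_{t|t-1})$ for $a = 1, 2$ and $k + m \le 4$. They are all $o_p(1)$ by Lemma A.2. Therefore the contribution of $\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\nabla_{\delta_{2h}}\hat M_{jt}$ to (B.36) is $o_p(1)$.

Finally, we turn to the 6th term in (B.36). In the expression for $\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\nabla_{\delta_{2h}}\hat B_t$, only the final line involves third order derivatives of $\hat\theta(\delta_2)$. It can be analyzed in the same way as the second term in (B.26); see Step 2 of the proof there. The remaining components, multiplied by $\hat M_{jt}$, lead to: $T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\tilde f_{at}/\tilde f_t)(\nabla_{\theta_{i_2}}\tilde f_{bt}/\tilde f_t)(\nabla_{\theta_{j_1}}\nabla_{\theta_{j_2}}\tilde\xi_{t|t-1})$, $T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\nabla_{\theta_{i_2}}\tilde f_{at}/\tilde f_t)(\nabla_{\theta_{j_1}}\tilde f_{bt}/\tilde f_t)(\nabla_{\theta_{j_2}}\tilde\xi_{t|t-1})$, $T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\tilde f_{at}/\tilde f_t)(\nabla_{\theta_{i_2}}\tilde f_{bt}/\tilde f_t)(\nabla_{\theta_{j_1}}\tilde\xi_{t|t-1})$, $T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\nabla_{\theta_{i_2}}\tilde f_{at}/\tilde f_t)(\nabla_{\theta_{j_1}}\tilde f_{bt}/\tilde f_t)$ and $T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\nabla_{\theta_{i_2}}\nabla_{\theta_{i_3}}\tilde f_{at}/\tilde f_t)(\nabla_{\theta_{j_1}}\tilde f_{bt}/\tilde f_t)$ for $a = 1, 2$ and $b = 1, 2$. They are all $O_p(1)$ by Lemma A.2. This implies the desired result.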
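Proof of Lemma 4. The first order derivative of $L(p,q,\delta_2)$ with respect to $\delta_{2j}$ gives
\[
\begin{aligned}
L^{(1)}_j(p,q,\delta_2) ={}& \nabla_{\delta_{2j}}L(p,q,\delta_2) \\
={}& \sum_{t=1}^{T}\frac{1}{\hat B_t}\left\{\nabla_{\theta'}\hat f_{1t}\,\hat\xi_{t|t-1} + \nabla_{\theta'}\hat f_{2t}\,(1-\hat\xi_{t|t-1}) + (\hat f_{1t}-\hat f_{2t})\nabla_{\theta'}\hat\xi_{t|t-1}\right\}\nabla_{\delta_{2j}}\hat\theta(\delta_2) \\
={}& \sum_{s=1}^{n_\beta+n_\delta}\left\{\sum_{t=1}^{T}\frac{1}{\hat B_t}\left[\nabla_{\theta_s}\hat f_{1t}\,\hat\xi_{t|t-1} + \nabla_{\theta_s}\hat f_{2t}\,(1-\hat\xi_{t|t-1}) + (\hat f_{1t}-\hat f_{2t})\nabla_{\theta_s}\hat\xi_{t|t-1}\right]\right\}\nabla_{\delta_{2j}}\hat\theta_s(\delta_2) \\
& + \sum_{t=1}^{T}\frac{1}{\hat B_t}\left[\nabla_{\delta_{2j}}\hat f_{1t}\,\hat\xi_{t|t-1} + \nabla_{\delta_{2j}}\hat f_{2t}\,(1-\hat\xi_{t|t-1}) + (\hat f_{1t}-\hat f_{2t})\nabla_{\delta_{2j}}\hat\xi_{t|t-1}\right],
\end{aligned}
\]
where the third equality follows from the definition of $\nabla_{\delta_{2j}}\hat\theta(\delta_2)$. The term inside the curly brackets is zero because of the first order conditions determining $\hat\beta(\delta_2)$ and $\hat\delta_1(\delta_2)$. Thus,
\[
L^{(1)}_j(p,q,\delta_2) = \sum_{t=1}^{T}\frac{\hat L_{jt}}{\hat B_t},
\]
where $\hat B_t$ is defined in (B.1) and $\hat L_{jt} = \nabla_{\delta_{2j}}\hat f_{1t}\,\hat\xi_{t|t-1} + \nabla_{\delta_{2j}}\hat f_{2t}\,(1-\hat\xi_{t|t-1}) + (\hat f_{1t}-\hat f_{2t})\nabla_{\delta_{2j}}\hat\xi_{t|t-1}$. When $\delta_2 = \tilde\delta$, by (B.33), we have
\[
\hat L_{jt} = \frac{1-\xi_*}{\xi_*}\hat M_{(n_\beta+j)t}
\tag{B.37}
\]
and
\[
L^{(1)}_j(p,q,\tilde\delta) = \frac{1-\xi_*}{\xi_*}M^{(1)}_{n_\beta+j}(p,q,\tilde\delta) = 0.
\]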
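The representation of the first derivative as a sum of L̂jt/B̂t can be illustrated numerically in a deliberately simplified setting that is not the paper's model: two unit-variance normal densities with means `mu1` and `mu2`, and a constant mixing weight `xi` (as would arise if the regimes were i.i.d., so that the derivative of the predictive probability drops out). All names and numbers below are hypothetical. The sketch compares the analytic sum with a central finite difference of the log likelihood with respect to `mu2`.

```python
import math

def normal_pdf(y, mu):
    """Density of N(mu, 1) at y."""
    return math.exp(-0.5 * (y - mu) ** 2) / math.sqrt(2.0 * math.pi)

def log_likelihood(ys, mu1, mu2, xi):
    """Sum over t of log B_t, with B_t = xi * f_1t + (1 - xi) * f_2t."""
    return sum(
        math.log(xi * normal_pdf(y, mu1) + (1.0 - xi) * normal_pdf(y, mu2))
        for y in ys
    )

def score_mu2(ys, mu1, mu2, xi):
    """Sum over t of L_t / B_t, where L_t = (1 - xi) * d f_2t / d mu2.

    With a constant weight xi and f_1t free of mu2, the other two pieces of
    L_t in the text vanish in this toy setting.
    """
    total = 0.0
    for y in ys:
        f1, f2 = normal_pdf(y, mu1), normal_pdf(y, mu2)
        b_t = xi * f1 + (1.0 - xi) * f2
        df2 = (y - mu2) * f2  # derivative of the N(mu2, 1) density in mu2
        total += (1.0 - xi) * df2 / b_t
    return total

ys = [0.3, -1.2, 0.8, 2.1, -0.4]  # made-up observations
mu1, mu2, xi, h = 0.0, 1.0, 0.6, 1e-6
numeric = (log_likelihood(ys, mu1, mu2 + h, xi)
           - log_likelihood(ys, mu1, mu2 - h, xi)) / (2.0 * h)
print(score_mu2(ys, mu1, mu2, xi), numeric)
```

The example takes a plain partial derivative with the other parameters held fixed; in the proof the additional terms coming through β̂(δ2) and δ̂1(δ2) vanish by the first order conditions.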
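Consider the second result in the lemma. Differentiate $L^{(1)}_j(p,q,\delta_2)$ with respect to the $k$-th component of $\delta_2$:
\[
L^{(2)}_{jk}(p,q,\delta_2) = \sum_{t=1}^{T}\frac{\nabla_{\delta_{2k}}\hat L_{jt}}{\hat B_t} - \sum_{t=1}^{T}\frac{\nabla_{\delta_{2k}}\hat B_t}{\hat B_t^{2}}\hat L_{jt},
\]
where
\[
\begin{aligned}
\nabla_{\delta_{2k}}\hat L_{jt} = \big\{ & \hat\xi_{t|t-1}\nabla_{\delta_{2j}}\nabla_{\theta'}\hat f_{1t} + (1-\hat\xi_{t|t-1})\nabla_{\delta_{2j}}\nabla_{\theta'}\hat f_{2t} + (\nabla_{\delta_{2j}}\hat f_{1t} - \nabla_{\delta_{2j}}\hat f_{2t})\nabla_{\theta'}\hat\xi_{t|t-1} \\
& + (\nabla_{\theta'}\hat f_{1t} - \nabla_{\theta'}\hat f_{2t})\nabla_{\delta_{2j}}\hat\xi_{t|t-1} + (\hat f_{1t} - \hat f_{2t})\nabla_{\delta_{2j}}\nabla_{\theta'}\hat\xi_{t|t-1}\big\}\nabla_{\delta_{2k}}\hat\theta(\delta_2).
\end{aligned}
\]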
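Because $M^{(2)}_{(n_\beta+j)k}(p,q,\delta_2) = 0$, we have
\[
\begin{aligned}
T^{-1/2}L^{(2)}_{jk}(p,q,\delta_2) ={}& T^{-1/2}L^{(2)}_{jk}(p,q,\delta_2) - T^{1/2}M^{(2)}_{(n_\beta+j)k}(p,q,\delta_2) \\
={}& T^{-1/2}\sum_{t=1}^{T}\left\{\frac{\nabla_{\delta_{2k}}\hat L_{jt}}{\hat B_t} - \frac{1-\xi_*}{\xi_*}\frac{\nabla_{\delta_{2k}}\hat M_{(n_\beta+j)t}}{\hat B_t}\right\} \\
& - T^{-1/2}\sum_{t=1}^{T}\frac{\nabla_{\delta_{2k}}\hat B_t}{\hat B_t^{2}}\left\{\hat L_{jt} - \frac{1-\xi_*}{\xi_*}\hat M_{(n_\beta+j)t}\right\}.
\end{aligned}
\]
The second summation on the right hand side equals 0 when evaluated at $\delta_2 = \tilde\delta$ by (B.37). Consider the first summation. At $\delta_2 = \tilde\delta$, $T^{-1/2}\sum_{t=1}^{T}\left[\nabla_{\delta_{2k}}\hat L_{jt}/\hat B_t\right]$ equals
\[
\sum_{s=1}^{n_\beta+2n_\delta}\left\{T^{-1/2}\sum_{t=1}^{T}\left[\frac{\xi_*\nabla_{\delta_{2j}}\nabla_{\theta_s}\tilde f_{1t} + (1-\xi_*)\nabla_{\delta_{2j}}\nabla_{\theta_s}\tilde f_{2t}}{\tilde f_t} + \frac{(\nabla_{\delta_{2j}}\tilde f_{1t} - \nabla_{\delta_{2j}}\tilde f_{2t})\nabla_{\theta_s}\tilde\xi_{t|t-1} + (\nabla_{\theta_s}\tilde f_{1t} - \nabla_{\theta_s}\tilde f_{2t})\nabla_{\delta_{2j}}\tilde\xi_{t|t-1}}{\tilde f_t}\right]\right\}\nabla_{\delta_{2k}}\hat\theta_s(\tilde\delta).
\]
The terms in the curly brackets are all $O_p(1)$ while $\nabla_{\delta_{2k}}\hat\theta_s(\tilde\delta) = O_p(T^{-1/2})$ unless $s = n_\beta + k$ or $s = n_\beta + n_\delta + k$. Similarly, at $\delta_2 = \tilde\delta$, $T^{-1/2}\sum_{t=1}^{T}\left[\nabla_{\delta_{2k}}\hat M_{(n_\beta+j)t}/\hat B_t\right]$ equals
\[
\sum_{s=1}^{n_\beta+2n_\delta}\left\{T^{-1/2}\sum_{t=1}^{T}\left[\frac{\xi_*\nabla_{\delta_{1j}}\nabla_{\theta_s}\tilde f_{1t} + (1-\xi_*)\nabla_{\delta_{1j}}\nabla_{\theta_s}\tilde f_{2t}}{\tilde f_t} + \frac{(\nabla_{\delta_{1j}}\tilde f_{1t} - \nabla_{\delta_{1j}}\tilde f_{2t})\nabla_{\theta_s}\tilde\xi_{t|t-1} + (\nabla_{\theta_s}\tilde f_{1t} - \nabla_{\theta_s}\tilde f_{2t})\nabla_{\delta_{1j}}\tilde\xi_{t|t-1}}{\tilde f_t}\right]\right\}\nabla_{\delta_{2k}}\hat\theta_s(\tilde\delta),
\]
and the terms in the curly brackets are all $O_p(1)$ while $\nabla_{\delta_{2k}}\hat\theta_s(\tilde\delta) = O_p(T^{-1/2})$ unless $s = n_\beta + k$ or $s = n_\beta + n_\delta + k$. Combining the two preceding displays, we have
\[
T^{-1/2}L^{(2)}_{jk}(p,q,\tilde\delta) = T^{-1/2}\sum_{t=1}^{T}\tilde U_{jk,t} + o_p(1).
\]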
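Consider the third order derivatives. We have
\[
\begin{aligned}
& T^{-3/4}L^{(3)}_{jkl}(p,q,\delta_2) - T^{1/4}\frac{1-\xi_*}{\xi_*}M^{(3)}_{(n_\beta+j)kl}(p,q,\delta_2) \\
={}& T^{-3/4}\sum_{t=1}^{T}\left\{\frac{\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\hat L_{jt}}{\hat B_t} - \frac{1-\xi_*}{\xi_*}\frac{\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\hat M_{(n_\beta+j)t}}{\hat B_t}\right\} \\
& - T^{-3/4}\sum_{t=1}^{T}\frac{\nabla_{\delta_{2l}}\hat B_t}{\hat B_t}\left\{\frac{\nabla_{\delta_{2k}}\hat L_{jt}}{\hat B_t} - \frac{1-\xi_*}{\xi_*}\frac{\nabla_{\delta_{2k}}\hat M_{(n_\beta+j)t}}{\hat B_t}\right\} \\
& - T^{-3/4}\sum_{t=1}^{T}\frac{\nabla_{\delta_{2k}}\hat B_t}{\hat B_t}\left\{\frac{\nabla_{\delta_{2l}}\hat L_{jt}}{\hat B_t} - \frac{1-\xi_*}{\xi_*}\frac{\nabla_{\delta_{2l}}\hat M_{(n_\beta+j)t}}{\hat B_t}\right\} \\
& - T^{-3/4}\sum_{t=1}^{T}\frac{\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\hat B_t}{\hat B_t^{2}}\left\{\hat L_{jt} - \frac{1-\xi_*}{\xi_*}\hat M_{(n_\beta+j)t}\right\} \\
& + 2T^{-3/4}\sum_{t=1}^{T}\frac{\nabla_{\delta_{2k}}\hat B_t\,\nabla_{\delta_{2l}}\hat B_t}{\hat B_t^{3}}\left\{\hat L_{jt} - \frac{1-\xi_*}{\xi_*}\hat M_{(n_\beta+j)t}\right\},
\end{aligned}
\tag{B.38}
\]
where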
∇δ2k ∇δ2l L̂jt = Σ_{s=1}^{nβ+2nδ} { ξˆt|t−1 ∇δ2j ∇θs ∇θ0 fˆ1t + (1 − ξˆt|t−1 )∇δ2j ∇θs ∇θ0 fˆ2t + (∇δ2j ∇θs fˆ1t − ∇δ2j ∇θs fˆ2t )∇θ0 ξˆt|t−1
+(∇δ2j ∇θ0 fˆ1t − ∇δ2j ∇θ0 fˆ2t )∇θs ξˆt|t−1 + (∇δ2j fˆ1t − ∇δ2j fˆ2t )∇θs ∇θ0 ξˆt|t−1
+(∇θs ∇θ0 fˆ1t − ∇θs ∇θ0 fˆ2t )∇δ2j ξˆt|t−1 + (∇θs fˆ1t − ∇θs fˆ2t )∇δ2j ∇θ0 ξˆt|t−1
+(∇θ0 fˆ1t − ∇θ0 fˆ2t )∇δ2j ∇θs ξˆt|t−1 + (fˆ1t − fˆ2t )∇δ2j ∇θs ∇θ0 ξˆt|t−1 } ∇δ2k θ̂s (δ2 )∇δ2l θ̂(δ2 )
+ { ξˆt|t−1 ∇δ2j ∇θ0 fˆ1t + (1 − ξˆt|t−1 )∇δ2j ∇θ0 fˆ2t + (∇δ2j fˆ1t − ∇δ2j fˆ2t )∇θ0 ξˆt|t−1
+(∇θ0 fˆ1t − ∇θ0 fˆ2t )∇δ2j ξˆt|t−1 + (fˆ1t − fˆ2t )∇δ2j ∇θ0 ξˆt|t−1 } ∇δ2k ∇δ2l θ̂(δ2 ).
The first summation in (B.38) consists of the following: $T^{-3/4}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\cdots\nabla_{\theta_{i_u}}\tilde f_{at}/\tilde f_t)\nabla_{\theta_{j_1}}\cdots\nabla_{\theta_{j_v}}\tilde\xi_{t|t-1}$ with $u + v \le 3$. They are $O_p(T^{-1/4})$ by the first result in Lemma A.2. Combining this result with Lemma A.3, it follows that this summation is $O_p(T^{-1/4})$. The remaining two summations in (B.38) have the same structure. They are both $O_p(T^{-1/4})$ after applying (B.35).
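Consider the fourth order derivatives. We have
\[
\begin{aligned}
& T^{-1}L^{(4)}_{jklm}(p,q,\delta_2) - \frac{1-\xi_*}{\xi_*}M^{(4)}_{(n_\beta+j)klm}(p,q,\delta_2) \\
={}& T^{-1}\sum_{t=1}^{T}\left\{\frac{\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\nabla_{\delta_{2m}}\hat L_{jt}}{\hat B_t} - \frac{1-\xi_*}{\xi_*}\frac{\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\nabla_{\delta_{2m}}\hat M_{(n_\beta+j)t}}{\hat B_t}\right\} \\
& - T^{-1}\sum_{t=1}^{T}\frac{\nabla_{\delta_{2m}}\hat B_t}{\hat B_t}\left\{\frac{\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\hat L_{jt}}{\hat B_t} - \frac{1-\xi_*}{\xi_*}\frac{\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\hat M_{(n_\beta+j)t}}{\hat B_t}\right\} \\
& - T^{-1}\sum_{t=1}^{T}\frac{\nabla_{\delta_{2l}}\nabla_{\delta_{2m}}\hat B_t}{\hat B_t}\left\{\frac{\nabla_{\delta_{2k}}\hat L_{jt}}{\hat B_t} - \frac{1-\xi_*}{\xi_*}\frac{\nabla_{\delta_{2k}}\hat M_{(n_\beta+j)t}}{\hat B_t}\right\} \\
& - T^{-1}\sum_{t=1}^{T}\frac{\nabla_{\delta_{2l}}\hat B_t}{\hat B_t}\left\{\frac{\nabla_{\delta_{2m}}\nabla_{\delta_{2k}}\hat L_{jt}}{\hat B_t} - \frac{1-\xi_*}{\xi_*}\frac{\nabla_{\delta_{2k}}\nabla_{\delta_{2m}}\hat M_{(n_\beta+j)t}}{\hat B_t}\right\} \\
& + 2T^{-1}\sum_{t=1}^{T}\frac{\nabla_{\delta_{2m}}\hat B_t}{\hat B_t}\left\{\frac{\nabla_{\delta_{2k}}\hat L_{jt}\,\nabla_{\delta_{2l}}\hat B_t}{\hat B_t^{2}} - \frac{1-\xi_*}{\xi_*}\frac{\nabla_{\delta_{2k}}\hat M_{(n_\beta+j)t}\,\nabla_{\delta_{2l}}\hat B_t}{\hat B_t^{2}}\right\} \\
& - T^{-1}\sum_{t=1}^{T}\frac{\nabla_{\delta_{2k}}\nabla_{\delta_{2m}}\hat B_t}{\hat B_t}\left\{\frac{\nabla_{\delta_{2l}}\hat L_{jt}}{\hat B_t} - \frac{1-\xi_*}{\xi_*}\frac{\nabla_{\delta_{2l}}\hat M_{(n_\beta+j)t}}{\hat B_t}\right\} \\
& - T^{-1}\sum_{t=1}^{T}\frac{\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\hat B_t}{\hat B_t}\left\{\frac{\nabla_{\delta_{2m}}\hat L_{jt}}{\hat B_t} - \frac{1-\xi_*}{\xi_*}\frac{\nabla_{\delta_{2m}}\hat M_{(n_\beta+j)t}}{\hat B_t}\right\} \\
& - T^{-1}\sum_{t=1}^{T}\frac{\nabla_{\delta_{2k}}\hat B_t}{\hat B_t}\left\{\frac{\nabla_{\delta_{2l}}\nabla_{\delta_{2m}}\hat L_{jt}}{\hat B_t} - \frac{1-\xi_*}{\xi_*}\frac{\nabla_{\delta_{2l}}\nabla_{\delta_{2m}}\hat M_{(n_\beta+j)t}}{\hat B_t}\right\} \\
& + 2T^{-1}\sum_{t=1}^{T}\frac{\nabla_{\delta_{2k}}\hat B_t}{\hat B_t}\left\{\frac{\nabla_{\delta_{2l}}\hat L_{jt}\,\nabla_{\delta_{2m}}\hat B_t}{\hat B_t^{2}} - \frac{1-\xi_*}{\xi_*}\frac{\nabla_{\delta_{2m}}\hat B_t\,\nabla_{\delta_{2l}}\hat M_{(n_\beta+j)t}}{\hat B_t^{2}}\right\} \\
& - T^{-1}\sum_{t=1}^{T}\frac{\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\nabla_{\delta_{2m}}\hat B_t}{\hat B_t^{2}}\left\{\hat L_{jt} - \frac{1-\xi_*}{\xi_*}\hat M_{(n_\beta+j)t}\right\},
\end{aligned}
\tag{B.39}
\]
where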
∇δ2k ∇δ2l ∇δ2m L̂jt = Σ_{u=1}^{nβ+2nδ} Σ_{s=1}^{nβ+2nδ}
{ξˆt|t−1 ∇δ2j ∇θs ∇θu ∇θ0 fˆ1t + (1 − ξˆt|t−1 )∇δ2j ∇θs ∇θu ∇θ0 fˆ2t
+ (∇δ2j ∇θs ∇θu fˆ1t − ∇δ2j ∇θs ∇θu fˆ2t )∇θ0 ξˆt|t−1
+ (∇δ2j ∇θs ∇θ0 fˆ1t − ∇δ2j ∇θs ∇θ0 fˆ2t )∇θu ξˆt|t−1 + (∇δ2j ∇θs fˆ1t − ∇δ2j ∇θs fˆ2t )∇θu ∇θ0 ξˆt|t−1
+ (∇δ2j ∇θu ∇θ0 fˆ1t − ∇δ2j ∇θu ∇θ0 fˆ2t )∇θs ξˆt|t−1 + (∇δ2j ∇θu fˆ1t − ∇δ2j ∇θu fˆ2t )∇θs ∇θ0 ξˆt|t−1
+ (∇δ2j ∇θ0 fˆ1t − ∇δ2j ∇θ0 fˆ2t )∇θs ∇θu ξˆt|t−1 + (∇δ2j fˆ1t − ∇δ2j fˆ2t )∇θs ∇θu ∇θ0 ξˆt|t−1
+ (∇θs ∇θu ∇θ0 fˆ1t − ∇θs ∇θu ∇θ0 fˆ2t )∇δ2j ξˆt|t−1 + (∇θs ∇θu fˆ1t − ∇θs ∇θu fˆ2t )∇δ2j ∇θ0 ξˆt|t−1
+ (∇θs ∇θ0 fˆ1t − ∇θs ∇θ0 fˆ2t )∇δ2j ∇θu ξˆt|t−1 + (∇θs fˆ1t − ∇θs fˆ2t )∇δ2j ∇θu ∇θ0 ξˆt|t−1
+ (∇θu ∇θ0 fˆ1t − ∇θu ∇θ0 fˆ2t )∇δ2j ∇θs ξˆt|t−1 + (∇θu fˆ1t − ∇θu fˆ2t )∇δ2j ∇θs ∇θ0 ξˆt|t−1
+ (∇θ0 fˆ1t − ∇θ0 fˆ2t )∇δ2j ∇θs ∇θu ξˆt|t−1 + (fˆ1t − fˆ2t )∇δ2j ∇θs ∇θu ∇θ0 ξˆt|t−1 }×
∇δ2k θ̂s (δ2 )∇δ2l θ̂u (δ2 )∇δ2m θ̂(δ2 )
+ Σ_{u=1}^{nβ+2nδ} Σ_{s=1}^{nβ+2nδ}
{ξˆt|t−1 ∇δ2j ∇θs ∇θu fˆ1t + (1 − ξˆt|t−1 )∇δ2j ∇θs ∇θu fˆ2t
+ (∇δ2j ∇θs fˆ1t − ∇δ2j ∇θs fˆ2t )∇θu ξˆt|t−1
+ (∇δ2j ∇θu fˆ1t − ∇δ2j ∇θu fˆ2t )∇θs ξˆt|t−1 + (∇δ2j fˆ1t − ∇δ2j fˆ2t )∇θs ∇θu ξˆt|t−1
+ (∇θs ∇θu fˆ1t − ∇θs ∇θu fˆ2t )∇δ2j ξˆt|t−1 + (∇θs fˆ1t − ∇θs fˆ2t )∇δ2j ∇θu ξˆt|t−1
+ (∇θu fˆ1t − ∇θu fˆ2t )∇δ2j ∇θs ξˆt|t−1 + (fˆ1t − fˆ2t )∇δ2j ∇θs ∇θu ξˆt|t−1 }×
[∇δ2m ∇δ2k θ̂s (δ2 )∇δ2l θ̂u (δ2 ) + ∇δ2k θ̂s (δ2 )∇δ2l ∇δ2m θ̂u (δ2 )]
+ Σ_{s=1}^{nβ+2nδ} {ξˆt|t−1 ∇δ2j ∇θs ∇θ0 fˆ1t + (1 − ξˆt|t−1 )∇δ2j ∇θs ∇θ0 fˆ2t + (∇δ2j ∇θs fˆ1t − ∇δ2j ∇θs fˆ2t )∇θ0 ξˆt|t−1
+ (∇δ2j ∇θ0 fˆ1t − ∇δ2j ∇θ0 fˆ2t )∇θs ξˆt|t−1 + (∇δ2j fˆ1t − ∇δ2j fˆ2t )∇θs ∇θ0 ξˆt|t−1
+ (∇θs ∇θ0 fˆ1t − ∇θs ∇θ0 fˆ2t )∇δ2j ξˆt|t−1 + (∇θs fˆ1t − ∇θs fˆ2t )∇δ2j ∇θ0 ξˆt|t−1
+ (∇θ0 fˆ1t − ∇θ0 fˆ2t )∇δ2j ∇θs ξˆt|t−1 + (fˆ1t − fˆ2t )∇δ2j ∇θs ∇θ0 ξˆt|t−1 }∇δ2k ∇δ2l θ̂s (δ2 )∇δ2m θ̂(δ2 )
+ Σ_{s=1}^{nβ+2nδ} {ξˆt|t−1 ∇δ2j ∇θs fˆ1t + (1 − ξˆt|t−1 )∇δ2j ∇θs fˆ2t + (∇δ2j fˆ1t − ∇δ2j fˆ2t )∇θs ξˆt|t−1
+ (∇θs fˆ1t − ∇θs fˆ2t )∇δ2j ξˆt|t−1 + (fˆ1t − fˆ2t )∇δ2j ∇θs ξˆt|t−1 }∇δ2k ∇δ2l ∇δ2m θ̂s (δ2 ).
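When $\delta_2 = \tilde\delta$, all the terms in (B.39) are $o_p(1)$ except the 3rd, 6th and 7th terms. These three terms share the same structure and it suffices to study the first of them:
\[
-T^{-1}\sum_{t=1}^{T}\frac{\nabla_{\delta_{2l}}\nabla_{\delta_{2m}}\hat B_t}{\hat B_t}\left\{\frac{\nabla_{\delta_{2k}}\hat L_{jt}}{\hat B_t} - \frac{1-\xi_*}{\xi_*}\frac{\nabla_{\delta_{2k}}\hat M_{(n_\beta+j)t}}{\hat B_t}\right\}.
\tag{B.40}
\]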
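According to the previous study, for $\nabla_{\delta_{2l}}\nabla_{\delta_{2m}}\hat B_t$ it is sufficient to consider
\[
\begin{aligned}
& \left(\frac{1-\xi_*}{\xi_*}\right)^{2}\left[\xi_*\frac{\nabla_{\delta_{1l}}\nabla_{\delta_{1m}}\tilde f_{1t}}{\tilde f_t} + (1-\xi_*)\frac{\nabla_{\delta_{1l}}\nabla_{\delta_{1m}}\tilde f_{2t}}{\tilde f_t}\right] + \left[\xi_*\frac{\nabla_{\delta_{2l}}\nabla_{\delta_{2m}}\tilde f_{1t}}{\tilde f_t} + (1-\xi_*)\frac{\nabla_{\delta_{2l}}\nabla_{\delta_{2m}}\tilde f_{2t}}{\tilde f_t}\right] \\
& - \frac{1-\xi_*}{\xi_*}\left[\xi_*\frac{\nabla_{\delta_{2l}}\nabla_{\delta_{1m}}\tilde f_{1t}}{\tilde f_t} + (1-\xi_*)\frac{\nabla_{\delta_{2l}}\nabla_{\delta_{1m}}\tilde f_{2t}}{\tilde f_t} + \xi_*\frac{\nabla_{\delta_{1l}}\nabla_{\delta_{2m}}\tilde f_{1t}}{\tilde f_t} + (1-\xi_*)\frac{\nabla_{\delta_{1l}}\nabla_{\delta_{2m}}\tilde f_{2t}}{\tilde f_t}\right] \\
& + \frac{1}{\xi_*^{2}}\left[\nabla_{\delta_{1l}}\tilde\xi_{t|t-1}\left(\frac{\nabla_{\theta_m}\bar f_{1t}}{\bar f_t} - \frac{\nabla_{\theta_m}\bar f_{2t}}{\bar f_t}\right) + \left(\frac{\nabla_{\theta_l}\bar f_{1t}}{\bar f_t} - \frac{\nabla_{\theta_l}\bar f_{2t}}{\bar f_t}\right)\nabla_{\delta_{1m}}\tilde\xi_{t|t-1}\right] \\
& + \sum_{s=1}^{n_\beta+n_\delta}\left\{\left[\xi_*\nabla_{\theta_s}\tilde f_{1t} + (1-\xi_*)\nabla_{\theta_s}\tilde f_{2t}\right]\nabla_{\delta_{2l}}\nabla_{\delta_{2m}}\hat\theta_s(\tilde\delta)\right\},
\end{aligned}
\]
and for
\[
\frac{\nabla_{\delta_{2k}}\hat L_{jt}}{\hat B_t} - \frac{1-\xi_*}{\xi_*}\frac{\nabla_{\delta_{2k}}\hat M_{(n_\beta+j)t}}{\hat B_t},
\]
it is sufficient to consider
\[
\begin{aligned}
& \left(\frac{1-\xi_*}{\xi_*}\right)^{2}\left[\xi_*\frac{\nabla_{\delta_{1j}}\nabla_{\delta_{1k}}\tilde f_{1t}}{\tilde f_t} + (1-\xi_*)\frac{\nabla_{\delta_{1j}}\nabla_{\delta_{1k}}\tilde f_{2t}}{\tilde f_t}\right] + \left[\xi_*\frac{\nabla_{\delta_{2j}}\nabla_{\delta_{2k}}\tilde f_{1t}}{\tilde f_t} + (1-\xi_*)\frac{\nabla_{\delta_{2j}}\nabla_{\delta_{2k}}\tilde f_{2t}}{\tilde f_t}\right] \\
& - \frac{1-\xi_*}{\xi_*}\left[\xi_*\frac{\nabla_{\delta_{2j}}\nabla_{\delta_{1k}}\tilde f_{1t}}{\tilde f_t} + (1-\xi_*)\frac{\nabla_{\delta_{2j}}\nabla_{\delta_{1k}}\tilde f_{2t}}{\tilde f_t} + \xi_*\frac{\nabla_{\delta_{1j}}\nabla_{\delta_{2k}}\tilde f_{1t}}{\tilde f_t} + (1-\xi_*)\frac{\nabla_{\delta_{1j}}\nabla_{\delta_{2k}}\tilde f_{2t}}{\tilde f_t}\right] \\
& + \frac{1}{\xi_*^{2}}\left[\nabla_{\delta_{1j}}\tilde\xi_{t|t-1}\left(\frac{\nabla_{\theta_k}\bar f_{1t}}{\bar f_t} - \frac{\nabla_{\theta_k}\bar f_{2t}}{\bar f_t}\right) + \left(\frac{\nabla_{\theta_j}\bar f_{1t}}{\bar f_t} - \frac{\nabla_{\theta_j}\bar f_{2t}}{\bar f_t}\right)\nabla_{\delta_{1k}}\tilde\xi_{t|t-1}\right].
\end{aligned}
\]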
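So, at $\delta_2 = \tilde\delta$, (B.40) equals
\[
\begin{aligned}
& -T^{-1}\sum_{t=1}^{T}\left[\tilde U_{lm,t} + \frac{1}{\tilde f_t}\sum_{s=1}^{n_\beta+n_\delta}\left\{\left[\xi_*\nabla_{\theta_s}\tilde f_{1t} + (1-\xi_*)\nabla_{\theta_s}\tilde f_{2t}\right]\nabla_{\delta_{2l}}\nabla_{\delta_{2m}}\hat\theta_s(\tilde\delta)\right\}\right]\tilde U_{jk,t} + o_p(1) \\
={}& -T^{-1}\sum_{t=1}^{T}\tilde U_{lm,t}\tilde U_{jk,t} - T^{-1}\sum_{t=1}^{T}\left[\frac{\xi_*\nabla_{(\beta',\delta_1')}\tilde f_{1t} + (1-\xi_*)\nabla_{(\beta',\delta_1')}\tilde f_{2t}}{\tilde f_t}\tilde U_{jk,t}\right]\tilde I^{-1}\tilde D_{lm} + o_p(1) \\
={}& -\left[\tilde V_{jklm} - \tilde D_{jk}'\tilde I^{-1}\tilde D_{lm}\right] + o_p(1).
\end{aligned}
\]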
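Consequently,
\[
\begin{aligned}
& T^{-1}L^{(4)}_{jklm}(p,q,\tilde\delta) - T^{-1}\frac{1-\xi_*}{\xi_*}M^{(4)}_{(n_\beta+j)klm}(p,q,\tilde\delta) \\
={}& -\left\{\tilde V_{jklm} - \tilde D_{jk}'\tilde I^{-1}\tilde D_{lm} + \tilde V_{jmkl} - \tilde D_{jm}'\tilde I^{-1}\tilde D_{kl} + \tilde V_{kmjl} - \tilde D_{km}'\tilde I^{-1}\tilde D_{jl}\right\} + o_p(1).
\end{aligned}
\]
This proves the last result of the lemma.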
The next lemma is the same as Lemma A.3 in Qu and Zhuo (2015) and will be used in the proof of
Lemma 5 for establishing the stochastic equicontinuity. Let "*" signify that the quantity is evaluated
at the true parameter value.
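Lemma A.4. Let Assumptions 1-6 and the null hypothesis hold. Let $z_t(\rho) = T^{-1/2}\sum_{s=1}^{t-1}\rho^{t-s}\varepsilon_{js}\varepsilon_{it}$, where $\varepsilon_{it} = \nabla_{\delta_{1i}}f^*_{1t}/f^*_t$ and $\varepsilon_{js} = \nabla_{\delta_{1j}}f^*_{1s}/f^*_s$. Then, for any $\rho$, $\rho_1$ and $\rho_2$ satisfying $\epsilon - 1 \le \rho_1 \le \rho \le \rho_2 \le 1 - \epsilon$, we have
\[
E\left\{\left(\sum_{t=1}^{T}\left[z_t(\rho) - z_t(\rho_1)\right]\right)^{2}\left(\sum_{t=1}^{T}\left[z_t(\rho_2) - z_t(\rho)\right]\right)^{2}\right\} \le C(\rho_2 - \rho_1)^{2},
\tag{B.41}
\]
where $C$ is a finite constant that depends only on $0 < \epsilon < 1/2$ and the moments of $\varepsilon_{it}$ and $\varepsilon_{js}$ up to the fourth order.
Proof of Lemma A.4. See the proof of Lemma A.3 in Qu and Zhuo (2015).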
Proof of Lemma 5. Apply the mean value theorem:
\[
T^{-1/2}\sum_{t=1}^{T}\tilde U_{jk,t} = T^{-1/2}\sum_{t=1}^{T}U_{jk,t} + \left\{T^{-1}\sum_{t=1}^{T}\nabla_{\theta'}\bar U_{jk,t}\right\}T^{1/2}(\tilde\theta - \theta_*),
\tag{B.42}
\]
where $U_{jk,t}$ and $\bar U_{jk,t}$ have the same definition as $\tilde U_{jk,t}$ but are evaluated at the true value $\theta_*$ and at some value $\bar\theta$ that lies between $\tilde\theta$ and $\theta_*$, respectively. We establish the weak convergence of the first term of (B.42) in two steps. First, for any $\epsilon \le p, q \le 1 - \epsilon$, $T^{-1/2}\sum_{t=1}^{T}U_{jk,t}$ satisfies the central limit theorem. Second, to verify its stochastic equicontinuity, it suffices to consider the following term in its definition (4.6):
\[
\begin{aligned}
& T^{-1/2}\frac{1}{\xi_*^{2}}\sum_{t=1}^{T}\nabla_{\delta_{1j}}\xi_{t|t-1}\left(\frac{\nabla_{\delta_{1k}}f_{1t}}{f_t} - \frac{\nabla_{\delta_{1k}}f_{2t}}{f_t}\right) \\
={}& \frac{1-p}{1-q}\left\{T^{-1/2}\sum_{t=1}^{T}\sum_{s=1}^{t-1}\rho^{s}\left(\frac{\nabla_{\delta_{1j}}f_{1t-s}}{f_t} - \frac{\nabla_{\delta_{1j}}f_{2t-s}}{f_t}\right)\left(\frac{\nabla_{\delta_{1k}}f_{1t}}{f_t} - \frac{\nabla_{\delta_{1k}}f_{2t}}{f_t}\right)\right\},
\end{aligned}
\]
where the quantities are all evaluated at the true value $\theta_*$, and the equality follows from (B.6) and (2.4). Denote the quantity inside the curly brackets as $W(\rho)$. Note that we have $|\rho| \le 1 - 2\epsilon$. Then, Lemma A.4 implies that, for any $\rho_1 \le \rho \le \rho_2$, we have $E[|W(\rho_1) - W(\rho)|^{2}|W(\rho) - W(\rho_2)|^{2}] \le C_2(\rho_1 - \rho_2)^{2}$, where $C_2$ is a finite constant. A similar proof applies to the third and fourth components in $T^{-1/2}\sum_{t=1}^{T}U_{jk,t}$. Then the condition required in Theorem 13.5 in Billingsley (1999; cf. display (13.14) on p. 143) is satisfied.
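The bound on ρ used above can be checked directly under the standard two-state Markov chain conventions, which are assumed here because definitions (2.4) and (B.6) are not reproduced in this appendix: `p` and `q` are the probabilities of staying in regimes 1 and 2, ρ = p + q - 1, and ξ∗ = (1 - q)/(2 - p - q) is the stationary probability of regime 1. Under those assumptions (1 - ξ∗)/ξ∗ = (1 - p)/(1 - q), which is the factor multiplying the curly brackets. The helper names are illustrative.

```python
def stationary_prob(p, q):
    """Stationary probability of regime 1 (assumed convention:
    p = P(stay in regime 1), q = P(stay in regime 2))."""
    return (1.0 - q) / (2.0 - p - q)

def persistence(p, q):
    """rho = p + q - 1, the second eigenvalue of the transition matrix."""
    return p + q - 1.0

def check_grid(eps, n=21):
    """Check the stationarity and odds identities on a grid of (p, q) in
    [eps, 1 - eps]^2 and return the largest |rho| found."""
    worst = 0.0
    for i in range(n):
        for j in range(n):
            p = eps + (1.0 - 2.0 * eps) * i / (n - 1)
            q = eps + (1.0 - 2.0 * eps) * j / (n - 1)
            xi = stationary_prob(p, q)
            # xi must be invariant under one step of the chain
            assert abs(xi - (p * xi + (1.0 - q) * (1.0 - xi))) < 1e-12
            # odds of regime 2 versus regime 1
            assert abs((1.0 - xi) / xi - (1.0 - p) / (1.0 - q)) < 1e-9
            worst = max(worst, abs(persistence(p, q)))
    return worst

worst_rho = check_grid(eps=0.05)
print(worst_rho <= 1.0 - 2.0 * 0.05 + 1e-12)
```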
The second term in (B.42) equals, by the mean value theorem,
\[
\begin{aligned}
& -\left\{T^{-1}\sum_{t=1}^{T}\frac{\xi_*\nabla_{(\beta',\delta_1')}f_{1t} + (1-\xi_*)\nabla_{(\beta',\delta_1')}f_{2t}}{f_t}U_{jk,t}\right\}I^{-1}\left\{T^{-1/2}\sum_{t=1}^{T}\frac{\xi_*\nabla_{(\beta',\delta_1')}f_{1t} + (1-\xi_*)\nabla_{(\beta',\delta_1')}f_{2t}}{f_t}\right\} + o_p(1) \\
={}& -D_{jk}I^{-1}\left\{T^{-1/2}\sum_{t=1}^{T}\frac{\xi_*\nabla_{(\beta',\delta_1')}f_{1t} + (1-\xi_*)\nabla_{(\beta',\delta_1')}f_{2t}}{f_t}\right\} + o_p(1),
\end{aligned}
\]
where the quantities are all evaluated at the true value $\theta_*$ and the equality holds because of the uniform law of large numbers. The term inside the last curly brackets is independent of $p$ and $q$ and satisfies the central limit theorem. Combining the above results for the two terms in (B.42), it follows that $T^{-1/2}\sum_{t=1}^{T}\tilde U_{jk,t}$ converges weakly over $\epsilon \le p, q \le 1 - \epsilon$. The covariance function follows immediately.
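Proof of Proposition 1. Let $\eta = T^{1/4}(\delta_2 - \tilde\delta)$. The expansion (4.5) can be equivalently represented in matrix notation as
\[
L(p,q,\delta_2) - L(p,q,\tilde\delta) = \frac{1}{2!}\left(\eta^{\otimes 2}\right)'\left[T^{-1/2}\,\mathrm{vec}\,L^{(2)}(p,q,\tilde\delta)\right] + \frac{1}{3!}\left(\eta^{\otimes 3}\right)' \times O_p(T^{-1/4}) - \frac{1}{8}\left(\eta^{\otimes 2}\right)'\left[\Omega(p,q) + o_p(1)\right]\eta^{\otimes 2}.
\]
Because $\Omega(p,q)$ is positive definite, the right hand side will be negative with probability approaching 1 unless $\eta = O_p(1)$. Thus, for any ε > 0, we can choose $M < \infty$ such that $P(\|\eta\| \le M) \ge 1 - ε$ for sufficiently large $T$. Restricting to this set, we have
\[
\begin{aligned}
& \sup_{(p,q)\in\Lambda}\ \sup_{\|\eta\|\le M} 2\left[L(p,q,\delta_2) - L(p,q,\tilde\delta)\right] \\
={}& \sup_{(p,q)\in\Lambda}\ \sup_{\|\eta\|\le M}\left\{\left(\eta^{\otimes 2}\right)'\left[T^{-1/2}\,\mathrm{vec}\,L^{(2)}(p,q,\tilde\delta)\right] - \frac{1}{4}\left(\eta^{\otimes 2}\right)'\Omega(p,q)\,\eta^{\otimes 2} + o_p(1)\right\} \\
\Rightarrow{}& \sup_{(p,q)\in\Lambda}\ \sup_{\|\eta\|\le M}\left\{\left(\eta^{\otimes 2}\right)'G(p,q) - \frac{1}{4}\left(\eta^{\otimes 2}\right)'\Omega(p,q)\,\eta^{\otimes 2}\right\},
\end{aligned}
\tag{B.43}
\]
where the convergence follows from Lemma (B.43) and the fact that the supremum operator is continuous when taken over a compact set. Finally, the result follows because ε can be made arbitrarily small.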
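To see what the limit in (B.43) looks like in the simplest case, take nδ = 1, so that η, G(p, q) and Ω(p, q) are scalars at a fixed (p, q); this scalar reduction is an illustration added here, not a result stated in the paper. Maximizing η²G - η⁴Ω/4 over η gives η² = 2G/Ω when G > 0 and η = 0 otherwise, so the inner supremum equals max(G, 0)²/Ω once M is large enough. The sketch below compares that closed form with a grid search; the function names and test values are hypothetical.

```python
def sup_by_grid(G, omega, M=5.0, n=100001):
    """Grid search for sup over |eta| <= M of eta^2 * G - eta^4 * omega / 4.

    The objective is even in eta, so only eta >= 0 is scanned.
    """
    best = 0.0  # value of the objective at eta = 0
    for i in range(n):
        eta = M * i / (n - 1)
        value = eta ** 2 * G - 0.25 * eta ** 4 * omega
        best = max(best, value)
    return best

def sup_closed_form(G, omega):
    """max(G, 0)^2 / omega, valid when M^2 >= 2 * max(G, 0) / omega."""
    return max(G, 0.0) ** 2 / omega

for G, omega in [(1.5, 2.0), (-0.7, 1.0), (0.3, 0.5)]:
    print(G, omega, sup_by_grid(G, omega), sup_closed_form(G, omega))
```

A negative draw of G contributes nothing to the supremum, which is why the scalar limit behaves like the square of the positive part of G scaled by Ω.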