Testing for Regime Switching in State Space Models∗

Fan Zhuo†
Boston University
November 10, 2015

Abstract

This paper develops a modified likelihood ratio (MLR) test for detecting regime switching in state space models. I apply the filtering algorithm introduced in Gordon and Smith (1988) to construct a modified likelihood function under the alternative hypothesis of two regimes and extend the analysis in Qu and Zhuo (2015) to establish the asymptotic distribution of the MLR statistic under the null hypothesis. I also present a practical application of the test using U.S. unemployment rates. This paper is the first to develop a test for detecting regime switching in state space models that is based on the likelihood ratio principle.

Keywords: Hypothesis testing, likelihood ratio, state space model, Markov switching.

∗ I am deeply indebted to my advisor Zhongjun Qu for his invaluable advice and support for my research. I also wish to thank Pierre Perron, Hiroaki Kaido, Ivan Fernandez-Val, and seminar participants at Boston University for valuable suggestions.
† Department of Economics, Boston University, 270 Bay State Rd., Boston, MA, 02215 (zhuo@bu.edu).

1 Introduction

Economists have long recognized the possibility that model parameters may not be constant through time, and that instead there can be variations in model structure. If these variations are temporary and recurrent, then the Markov regime switching model can offer a natural modeling choice. Hamilton (1989) makes a seminal contribution that not only introduces a framework with Markov regime switching for describing economic growth, but also provides a general algorithm for filtering, smoothing, and maximum likelihood estimation. A survey of the literature on regime switching models can be found in Hamilton (2008). Meanwhile, state space models are widely used in economics and finance to study time series with latent state variables.
Harvey (1981) and Meinhold and Singpurwalla (1983) introduced economists to the use of the Kalman (1960) filter for constructing likelihood functions through the prediction error decomposition. Latent state variables and regime switching can arise at the same time, which poses a challenge for modeling them jointly. Moreover, as Hamilton (1990) pointed out, conducting formal tests for the presence of Markov switching is challenging.

There are generally three approaches for detecting regime switching. The first approach tests for parameter homogeneity versus heterogeneity. Early contributions include Neyman and Scott (1966), Chesher (1984), Lancaster (1984) and Davidson and MacKinnon (1991). Recently, Carrasco, Hu and Ploberger (2014) further developed this approach and proposed a class of optimal tests for the constancy of parameters in random coefficients models where the parameters are weakly dependent under the alternative hypothesis. The second approach, from Hamilton (1996), offers a series of specification tests of regime switching in time series models. These tests only require researchers to estimate the model under the null hypothesis and have power against a wide range of alternative models. However, their power can be lower than what is achievable if the parameters indeed follow a finite state Markov chain. The third approach is based on the (quasi) likelihood ratio principle. Several important advances have been made by Hansen (1992), Garcia (1998), Cho and White (2007), and Carter and Steigerwald (2012). Qu and Zhuo (2015) is a recent development, which analyzes likelihood ratio based tests for Markov regime switching allowing for multiple switching parameters.

The purpose of the present paper is to detect regime switching in state space models. The likelihood function for a state space model with regime switching is hard to construct, as discussed in Kim and Nelson (1999).
Different approximations to the likelihood function have been considered in the literature, such as in Gordon and Smith (1988) and Highfield (1990). This paper uses the approximation applied in Gordon and Smith (1988). Based on this approximation, I develop a modified likelihood ratio (MLR) test. I extend the techniques developed in Qu and Zhuo (2015) to handle the nonstandard features associated with the MLR test. These nonstandard features include the following:
(1) Some nuisance parameters are unidentified under the null hypothesis, which violates the standard conditions that yield the chi-squared asymptotic distribution for the test statistic. This gives rise to the Davies (1977) problem.
(2) The null hypothesis yields a local optimum (cf. Hamilton, 1990), making the score function identically zero at the null parameter estimates. Consequently, a second order Taylor approximation of the likelihood ratio is insufficient to study its asymptotic properties.
(3) Conditional regime probabilities follow stochastic processes that can only be constructed recursively.
Moreover, this paper tackles an additional difficulty introduced by the latent state variables when expanding the MLR.

The asymptotic distribution of the MLR test statistic is analyzed in five steps.
1. I describe the algorithm used to construct the modified likelihood function for Markov switching state space models introduced in Gordon and Smith (1988).
2. I characterize the conditional regime probability, the filtered latent state, the mean squared error of the filtered latent state, and their high order derivatives with respect to the model parameters.
3. I first fix p and q and derive a fourth order Taylor approximation to the MLR. Then, I view the MLR as an empirical process indexed by p and q, and derive its asymptotic distribution.
4. While the above limiting distributions are adequate for a broad class of models, they can lead to over-rejections in some situations specified later.
To resolve the issue of over-rejection, the higher order terms in the likelihood expansion are incorporated into the asymptotic distribution to safeguard against their effects.
5. I apply a unified algorithm proposed in Qu and Zhuo (2015) to simulate the above refined asymptotic distribution.

Three Monte Carlo experiments are conducted to examine the MLR statistic. The first experiment checks the improvement introduced by the refined asymptotic distribution. The second and third experiments check the size and power of the MLR statistic. I also apply my method to study changes in U.S. unemployment rates and find strong evidence favoring the regime switching specification.

This paper is the first to develop a likelihood ratio based test for detecting regime switching in general state space models and contributes to the literature in several ways. First, I demonstrate the construction of the modified likelihood function under a two regimes specification for general state space models. Next, I study the Taylor expansion of the MLR when some regularity conditions fail to hold. Finally, I apply my method to an empirical example and find comovement between the U.S. business cycle and changes in monthly U.S. unemployment rates.

The paper is structured as follows. In Section 2, I provide the general model, the basic filter, and the hypotheses. Section 3 introduces the test statistic. Section 4 studies asymptotic properties of the MLR for prespecified p and q. Section 5 provides the limiting distribution of the MLR test statistic and introduces a finite sample refinement. Section 6 examines the finite sample properties of the test statistic. Section 7 considers an empirical application to the U.S. unemployment rate. Section 8 concludes. All proofs are collected in the appendix.

The following notation is used. $\|x\|$ is the Euclidean norm of a vector x. $\|X\|$ is the vector induced norm for a matrix X.
$x^{\otimes k}$ and $X^{\otimes k}$ denote the k-fold Kronecker product of x and X, respectively. The expression vec(A) stands for the vectorization of a k dimensional array A. For example, for a three dimensional array A with n elements along each dimension, vec(A) returns an $n^3$-vector whose $(i + (j-1)n + (k-1)n^2)$-th element equals A(i, j, k). $1\{\cdot\}$ is the indicator function. For a scalar valued function $f(\theta)$ with $\theta \in R^p$, $\nabla_\theta f(\theta_0)$ denotes the p × 1 vector of partial derivatives with respect to θ, evaluated at $\theta_0$. $\nabla_{\theta'} f(\theta_0)$ equals the transpose of $\nabla_\theta f(\theta_0)$ and $\nabla_{\theta_j} f(\theta_0)$ denotes its j-th element. For a matrix function $P(\theta)$, $\nabla_{\theta_j} P(\theta)$ denotes the derivative of $P(\theta)$ with respect to the j-th element in θ. The symbols "$\Rightarrow$", "$\to_d$" and "$\to_p$" denote weak convergence under the Skorohod topology, convergence in distribution and in probability, respectively. $O_p(\cdot)$ and $o_p(\cdot)$ are the usual notations for the orders of stochastic magnitude.

2 Model and hypotheses

This section presents the model and hypotheses. The discussion consists of the following: the model, the log likelihood function under the null hypothesis (i.e., one regime), the modified log likelihood function under the alternative hypothesis (i.e., two regimes), and some assumptions related to these three aspects.

2.1 The model

Consider the following state space representation of a dynamic linear model with switching in both transition and measurement equations:
\[ x_t = G_{s_t} + F_{s_t} x_{t-1} + u_t, \tag{2.1} \]
\[ y_t = H_{s_t}' x_t + A_{s_t}' z_t, \tag{2.2} \]
\[ u_t \sim N(0, Q_{s_t}). \tag{2.3} \]
The transition equation (2.1) describes the dynamics of the unobserved state vector $x_t$ as a function of a J × 1 vector of shocks $u_t$ and $x_{t-1}$. The measurement equation (2.2) describes the evolution of an observed scalar time series $y_t$ as a function of $x_t$ and a K × 1 vector of weakly exogenous variables $z_t$. The measurement error, normally included in (2.2), is treated as a latent variable in $x_t$.
$F_{s_t}$ is of dimension J × J, $G_{s_t}$ is of dimension J × 1, $H_{s_t}$ is of dimension J × 1, and $A_{s_t}$ is of dimension K × 1. $Q_{s_t}$ is a positive semidefinite symmetric matrix of dimension J × J. The subscripts in $F_{s_t}$, $G_{s_t}$, $H_{s_t}$, $A_{s_t}$, and $Q_{s_t}$ imply that some of the parameters in these matrices are dependent on an unobserved binary variable $s_t$ whose value determines the regime at time t. The regimes are Markovian, i.e., $p(s_t = 1 | s_{t-1} = 1) = p$ and $p(s_t = 2 | s_{t-1} = 2) = q$. The resulting stationary (or invariant) probability for $s_t = 1$ is given by
\[ \xi_*(p, q) = \frac{1-q}{2-p-q}. \tag{2.4} \]
In the subsequent analysis, $\xi_*(p, q)$ is abbreviated as $\xi_*$. Because this paper seeks to test regime switching in state space models based on the likelihood ratio principle, subsections 2.2-2.3 will focus on constructing the likelihood function under the two regimes specification, and the likelihood function under the one regime specification is a by-product of the standard Kalman filter.

2.2 Modified Kalman filter

When constructing the likelihood function for a general state space model with regime switching, each iteration of the Kalman filter produces a two-fold increase in the number of cases to consider under a two regimes specification, as noted by Gordon and Smith (1988) and Harrison and Stevens (1976). This means there can be more than 1000 components in the likelihood function for a sample of size T = 10. This makes studying this likelihood function and its expansion infeasible. Therefore, an approximation is considered here to "collapse" the filtered states when $s_t = 1$ and $s_t = 2$ to a single filtered state at each t, as in Gordon and Smith (1988). Define the information set at time t − 1 as $\Omega_{t-1} = \sigma\text{-field}\{\ldots, z_{t-1}', y_{t-2}, z_t', y_{t-1}\}$. Suppose the model parameters are known.
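The stationary probability in (2.4) and the growth in the number of regime paths can be checked with a short sketch. The function names and the values of p and q below are illustrative assumptions, not part of the paper; the one-step map simply applies the transition probabilities p and q defined above.

```python
def stationary_prob(p, q):
    """Stationary probability of regime 1, as in equation (2.4)."""
    return (1.0 - q) / (2.0 - p - q)


def one_step_prob(xi, p, q):
    """P(s_t = 1) implied by P(s_{t-1} = 1) = xi and the transition probabilities."""
    return p * xi + (1.0 - q) * (1.0 - xi)


p, q = 0.9, 0.7  # illustrative values only
xi_star = stationary_prob(p, q)

# xi_star is a fixed point of the one-step map, which is what "stationary" means.
fixed_point_gap = abs(one_step_prob(xi_star, p, q) - xi_star)

# Without collapsing, every period doubles the number of regime paths,
# so a sample of size T = 10 already involves 2**10 paths.
n_paths = 2 ** 10
```

Here `n_paths` exceeds 1000, which is the count referred to in the text for T = 10.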
The modified Kalman filter algorithm, conditional on $s_t = i$, is given by:
\[ x^{(i)}_{t|t-1} := G_i + F_i x_{t-1|t-1}, \tag{2.5} \]
\[ P^{(i)}_{t|t-1} := F_i P_{t-1|t-1} F_i' + Q_i, \tag{2.6} \]
\[ \mu^{(i)}_{t|t-1} := y_t - H_i' x^{(i)}_{t|t-1} - A_i' z_t, \tag{2.7} \]
\[ C^{(i)}_{t|t-1} := H_i' P^{(i)}_{t|t-1} H_i, \tag{2.8} \]
\[ x^{(i)}_{t|t} := x^{(i)}_{t|t-1} + P^{(i)}_{t|t-1} H_i [C^{(i)}_{t|t-1}]^{-1} \mu^{(i)}_{t|t-1}, \tag{2.9} \]
\[ P^{(i)}_{t|t} := \left(I - P^{(i)}_{t|t-1} H_i [C^{(i)}_{t|t-1}]^{-1} H_i'\right) P^{(i)}_{t|t-1}, \tag{2.10} \]
where $x_{t-1|t-1}$ is an estimate of $x_{t-1}$ based on information up to time t − 1; $x^{(i)}_{t|t-1}$ is an estimate of $x_t$ based on information up to time t − 1 given $s_t = i$; $P^{(i)}_{t|t-1}$ is an estimate of the mean squared error of $x^{(i)}_{t|t-1}$; $\mu^{(i)}_{t|t-1}$ estimates the conditional forecast error of $y_t$ based on information up to time t − 1 given $s_t = i$; and $C^{(i)}_{t|t-1}$ estimates the conditional variance of the forecast error $\mu^{(i)}_{t|t-1}$.

Let $\xi_{t|t}$ be an estimate of $Pr(s_t = 1 | \Omega_t)$. The "collapse" step combines the two filtered states $x^{(1)}_{t|t}$ and $x^{(2)}_{t|t}$ into a single estimate of $x_t$ based on $\Omega_t$ by
\[ x_{t|t} := \xi_{t|t} x^{(1)}_{t|t} + (1 - \xi_{t|t}) x^{(2)}_{t|t}. \tag{2.11} \]
Then, the mean squared error of $x_{t|t}$ can be computed as:
\[ P_{t|t} := \xi_{t|t} P^{(1)}_{t|t} + (1 - \xi_{t|t}) P^{(2)}_{t|t} + \xi_{t|t}(1 - \xi_{t|t}) \left(x^{(1)}_{t|t} - x^{(2)}_{t|t}\right)\left(x^{(1)}_{t|t} - x^{(2)}_{t|t}\right)'. \tag{2.12} \]
At the end of each iteration, equations (2.11) and (2.12) are employed to collapse the two filtered states into one filtered state $x_{t|t}$ and calculate the mean squared error of $x_{t|t}$, i.e. $P_{t|t}$.

2.3 Modified Markov switching filter

To complete the modified Kalman filter, we need to calculate $\xi_{t|t}$ for t = 1, 2, ..., T. The calculation of $\xi_{t|t}$ is based on the Markov regime switching filter introduced in Hamilton (1989) and conducted in three steps.
1. At the beginning of the t-th iteration, given $\xi_{t-1|t-1}$, we have
\[ \xi_{t|t-1} := p\,\xi_{t-1|t-1} + (1 - q)(1 - \xi_{t-1|t-1}). \tag{2.13} \]
2.
An estimate of the density of $y_t$ is obtained by
\[ f(y_t | \Omega_{t-1}) := \xi_{t|t-1} f(y_t | s_t = 1, \Omega_{t-1}) + (1 - \xi_{t|t-1}) f(y_t | s_t = 2, \Omega_{t-1}), \]
where the conditional density satisfies
\[ f(y_t | s_t = i, \Omega_{t-1}) := \left[2\pi C^{(i)}_{t|t-1}\right]^{-1/2} \exp\left\{-\frac{\left[\mu^{(i)}_{t|t-1}\right]^2}{2 C^{(i)}_{t|t-1}}\right\}, \quad (i = 1, 2) \tag{2.14} \]
where $\mu^{(i)}_{t|t-1}$ and $C^{(i)}_{t|t-1}$ are given in (2.7) and (2.8).
3. Once $y_t$ is observed, we can update the modified conditional regime probability
\[ \xi_{t|t} := \frac{\left[p\,\xi_{t-1|t-1} + (1-q)(1-\xi_{t-1|t-1})\right] f(y_t | s_t = 1, \Omega_{t-1})}{f(y_t | s_t = 2, \Omega_{t-1}) + \left[p\,\xi_{t-1|t-1} + (1-q)(1-\xi_{t-1|t-1})\right]\left[f(y_t | s_t = 1, \Omega_{t-1}) - f(y_t | s_t = 2, \Omega_{t-1})\right]}. \tag{2.15} \]

Figure 1 presents a flowchart for the filter described in subsections 2.2-2.3. The modified log likelihood function under the two regimes specification, i.e. $\sum_{t=1}^{T} \log[f(y_t | \Omega_{t-1})]$, is given as a by-product of the filter. The initial values $\xi_{0|0}$, $x_{0|0}$ and $P_{0|0}$ will be discussed in section 4.

[Figure 1: Flowchart for the filter: state space models with regime switching]

2.4 Hypotheses

Let δ represent parameters that are affected by regime switching, taking a value of $\delta_1$ in regime 1 and $\delta_2$ in regime 2. Let β represent parameters that remain constant across the regimes. Then, for any prespecified 0 < p, q < 1, the modified log likelihood function is given by
\[ L^A(p, q, \beta, \delta_1, \delta_2) = \sum_{t=1}^{T} \log\left\{ f_{1t}(p, q, \beta, \delta_1, \delta_2)\,\xi_{t|t-1}(p, q, \beta, \delta_1, \delta_2) + f_{2t}(p, q, \beta, \delta_1, \delta_2)\left(1 - \xi_{t|t-1}(p, q, \beta, \delta_1, \delta_2)\right) \right\}, \tag{2.16} \]
where $f_{it}(p, q, \beta, \delta_1, \delta_2) = f(y_t | s_t = i, \Omega_{t-1})$, which is defined in (2.14). When $\delta_1 = \delta_2 = \delta$, the modified log likelihood function reduces to
\[ L^N(\beta, \delta) = \sum_{t=1}^{T} \log f_{1t}(p, q, \beta, \delta, \delta) := \sum_{t=1}^{T} \log f_t(\beta, \delta), \tag{2.17} \]
which can be computed using the standard Kalman filter.

This paper studies a test statistic based on (2.17) and (2.16) for the one regime specification versus the two regimes specification. To start, I impose the following restrictions on the DGP and the parameter space, following Assumptions 1-3 in Qu and Zhuo (2015).

Assumption 1.
(i) The random vector $(z_t', y_t)$ is strictly stationary, ergodic and β-mixing with the mixing coefficient $\beta_\tau$ satisfying $\beta_\tau \le c\rho^\tau$ for some c > 0 and ρ ∈ [0, 1). (ii) Under the null hypothesis, $y_t$ is generated by $f(\cdot | \Omega_{t-1}; \beta_*, \delta_*)$ where $\beta_*$ and $\delta_*$ are interior points of $\Theta \subset R^{n_\beta}$ and $\Delta \subset R^{n_\delta}$ with Θ and ∆ being compact.

Assumption 2. Under the null hypothesis: (i) $(\beta_*, \delta_*)$ uniquely solves $\max_{(\beta,\delta) \in \Theta \times \Delta} E[L^N(\beta, \delta)]$; (ii) for any 0 < p, q < 1, $(\beta_*, \delta_*, \delta_*)$ uniquely solves $\max_{(\beta,\delta_1,\delta_2) \in \Theta \times \Delta \times \Delta} E[L^A(\beta, \delta_1, \delta_2)]$.

Assumption 3. Under the null hypothesis, we have: (i) $T^{-1}[L^N(\beta, \delta) - E L^N(\beta, \delta)] = o_p(1)$ holds uniformly over $(\beta, \delta) \in \Theta \times \Delta$, with $T^{-1} \sum_{t=1}^{T} \nabla_{(\beta',\delta')'} \log f_t(\beta, \delta)\, \nabla_{(\beta',\delta')} \log f_t(\beta, \delta)$ being positive definite over an open neighborhood of $(\beta_*, \delta_*)$ for sufficiently large T; (ii) for any 0 < p, q < 1, $T^{-1}[L^A(\beta, \delta_1, \delta_2) - E L^A(\beta, \delta_1, \delta_2)] = o_p(1)$ holds uniformly over $(\beta, \delta_1, \delta_2) \in \Theta \times \Delta \times \Delta$.

Using the above notation, the null and alternative hypotheses can be more formally stated as:
\[ H_0: \delta_1 = \delta_2 = \delta_* \text{ for some unknown } \delta_*, \]
\[ H_1: (\delta_1, \delta_2) = (\delta_1^*, \delta_2^*) \text{ for some unknown } \delta_1^* \ne \delta_2^* \text{ and } (p, q) \in (0, 1) \times (0, 1). \]
For the remainder of this paper, I use an ARMA(K, L) model to illustrate the main results.

The illustrative model. Let us consider a general ARMA(K, L) model:
\[ m_t = \sum_{k=1}^{K} \phi_{k,s_t} m_{t-k} + \varepsilon_t + \sum_{l=1}^{L} \theta_{l,s_t} \varepsilon_{t-l}, \quad \varepsilon_t \sim i.i.d.\ N(0, \sigma^2_{s_t}), \tag{2.18} \]
\[ y_t = \alpha_{s_t} + m_t, \tag{2.19} \]
where only $y_t$ is observable but not $m_t$ or $s_t$. In this illustrative model, some or all of the model parameters can be affected by $s_t$. This ARMA model can be cast in the form (2.1)-(2.3) as follows. Define $n_r = \max\{K, L + 1\}$. Interpret $\phi_{j,s_t} = 0$ for j > K and $\theta_{j,s_t} = 0$ for j > L. Let
\[ F_{s_t} = \begin{pmatrix} \phi_{1,s_t} & \phi_{2,s_t} & \cdots & \phi_{n_r-1,s_t} & \phi_{n_r,s_t} \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix}, \quad u_t = \begin{pmatrix} \varepsilon_t \\ 0 \\ \vdots \\ 0 \end{pmatrix} \sim N(0, Q_{s_t}), \quad Q_{s_t} = \begin{pmatrix} \sigma^2_{s_t} & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix}, \]
\[ H_{s_t} = \begin{pmatrix} 1 \\ \theta_{1,s_t} \\ \vdots \\ \theta_{n_r-1,s_t} \end{pmatrix}, \quad G_{s_t} = 0, \quad A_{s_t} = \alpha_{s_t}, \quad \text{and } z_t = 1. \]

3 The test statistic

This section proposes a test statistic based on the MLR. Let $\tilde\beta$ and $\tilde\delta$ denote the maximizers of the log likelihood function under the null hypothesis:
\[ (\tilde\beta, \tilde\delta) = \arg\max_{\beta,\delta} L^N(\beta, \delta). \tag{3.1} \]
The MLR evaluated at some 0 < p, q < 1 then equals
\[ MLR(p, q) = 2\left[\max_{\beta,\delta_1,\delta_2} L^A(p, q, \beta, \delta_1, \delta_2) - L^N(\tilde\beta, \tilde\delta)\right]. \tag{3.2} \]
It is natural to consider the following test statistic:
\[ SupMLR(\Lambda_\epsilon) = \sup_{(p,q) \in \Lambda_\epsilon} MLR(p, q), \]
where $\Lambda_\epsilon$ is a compact set to be specified later. Similar test statistics have been studied by Hansen (1992), Garcia (1998) and Qu and Zhuo (2015).

4 MLR under prespecified p and q

This section studies the MLR under a given $(p, q) \in \Lambda_\epsilon$. The choice of $\Lambda_\epsilon$ will be discussed in the next section.

4.1 Conditional regime probability

Let us first study the conditional regime probability $\xi_{t+1|t}(p, q, \beta, \delta_1, \delta_2)$ as well as its derivatives with respect to β, $\delta_1$ and $\delta_2$ because the results will be needed to develop the expansion of the modified log likelihood function. Combining equations (2.13) and (2.15) gives a recursive formula to calculate $\xi_{t+1|t}(p, q, \beta, \delta_1, \delta_2)$:
\[ \xi_{t+1|t}(p, q, \beta, \delta_1, \delta_2) = p + (p + q - 1) \frac{f_{2t}(p, q, \beta, \delta_1, \delta_2)\left(\xi_{t|t-1}(p, q, \beta, \delta_1, \delta_2) - 1\right)}{f_{1t}(p, q, \beta, \delta_1, \delta_2)\,\xi_{t|t-1}(p, q, \beta, \delta_1, \delta_2) + f_{2t}(p, q, \beta, \delta_1, \delta_2)\left(1 - \xi_{t|t-1}(p, q, \beta, \delta_1, \delta_2)\right)}, \tag{4.1} \]
where
\[ f_{it}(p, q, \beta, \delta_1, \delta_2) = f(y_t | s_t = i, \Omega_{t-1}) \tag{4.2} \]
as in (2.14). This recursive formula implies that the derivatives of $\xi_{t+1|t}$ with respect to the model parameters must also follow first order difference equations. Because the asymptotic expansions are considered around the estimates under the null hypothesis, it is sufficient to analyze $\xi_{t+1|t}(p, q, \beta, \delta_1, \delta_2)$ and its derivatives at $\delta_1 = \delta_2 = \delta$ for an arbitrary value of δ in ∆. Let $\theta = (\beta', \delta_1', \delta_2')'$ be an augmented parameter vector.
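The filter of subsections 2.2-2.3 and the recursion (4.1) can be illustrated with a minimal sketch for the special case of a scalar state (J = 1) and $z_t = 1$. This special case, the function names, and the numerical inputs are assumptions made only for illustration; with a scalar state and no separate measurement error the state is recovered exactly within each regime, so the example is deliberately simple and is not the paper's implementation.

```python
import math


def normal_pdf(mu, c):
    """Density of N(0, c) evaluated at the forecast error mu, as in (2.14)."""
    return math.exp(-mu * mu / (2.0 * c)) / math.sqrt(2.0 * math.pi * c)


def modified_filter(y, p, q, G, F, H, A, Q, x0, P0):
    """Collapsed two-regime filter for a scalar state with z_t = 1.

    G, F, H, A, Q are length-2 sequences (regime 1, regime 2).
    Returns the modified log likelihood and the path of xi_{t+1|t}.
    """
    xi = (1.0 - q) / (2.0 - p - q)   # xi_{0|0} set to the stationary value
    x, P = x0, P0
    loglik, xi_path = 0.0, []
    for yt in y:
        xi_pred = p * xi + (1.0 - q) * (1.0 - xi)       # (2.13)
        xs, Ps, fs = [], [], []
        for i in range(2):
            x_pred = G[i] + F[i] * x                    # (2.5)
            P_pred = F[i] * P * F[i] + Q[i]             # (2.6)
            mu = yt - H[i] * x_pred - A[i]              # (2.7)
            C = H[i] * P_pred * H[i]                    # (2.8)
            gain = P_pred * H[i] / C
            xs.append(x_pred + gain * mu)               # (2.9)
            Ps.append((1.0 - gain * H[i]) * P_pred)     # (2.10)
            fs.append(normal_pdf(mu, C))                # (2.14)
        f_mix = xi_pred * fs[0] + (1.0 - xi_pred) * fs[1]
        loglik += math.log(f_mix)
        xi = xi_pred * fs[0] / f_mix                    # (2.15)
        x_new = xi * xs[0] + (1.0 - xi) * xs[1]         # (2.11)
        P = (xi * Ps[0] + (1.0 - xi) * Ps[1]
             + xi * (1.0 - xi) * (xs[0] - xs[1]) ** 2)  # (2.12)
        x = x_new
        xi_path.append(p * xi + (1.0 - q) * (1.0 - xi))  # xi_{t+1|t}
    return loglik, xi_path


# Made-up data and parameters; both regimes are identical here.
y = [0.3, -0.1, 0.5, 0.2]
loglik, xi_path = modified_filter(
    y, 0.9, 0.7,
    G=[0.0, 0.0], F=[0.5, 0.5], H=[1.0, 1.0],
    A=[0.1, 0.1], Q=[1.0, 1.0], x0=0.0, P0=0.0)
```

When the two regimes share the same parameters, as in this call, the two conditional densities coincide, so the regime probability stays at its stationary value and the modified log likelihood equals the one-regime Kalman filter log likelihood.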
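We then define three sets of integers (they index the elements in β, $\delta_1$ and $\delta_2$, respectively):
\[ I_0 = \{1, \ldots, n_\beta\}, \quad I_1 = \{n_\beta + 1, \ldots, n_\beta + n_\delta\}, \quad I_2 = \{n_\beta + n_\delta + 1, \ldots, n_\beta + 2n_\delta\}. \]
Let "$\bar g$" denote that $g(\beta, \delta_1, \delta_2)$ is evaluated at some β and $\delta_1 = \delta_2 = \delta$; for example, $\bar\xi_{t+1|t}$ and $\bar f_t$ denote $\xi_{t+1|t}(p, q, \beta, \delta_1, \delta_2)$ and $f_{1t}(p, q, \beta, \delta_1, \delta_2)$ (or $f_{2t}(p, q, \beta, \delta_1, \delta_2)$) evaluated at some β and $\delta_1 = \delta_2 = \delta$. Let $\nabla_{\theta_{j_1}} \ldots \nabla_{\theta_{j_k}} \bar\xi_{t|t-1}$, $\nabla_{\theta_{j_1}} \ldots \nabla_{\theta_{j_k}} \bar f_{1t}$ and $\nabla_{\theta_{j_1}} \ldots \nabla_{\theta_{j_k}} \bar f_{2t}$ denote the k-th order derivatives of $\xi_{t|t-1}(p, q, \beta, \delta_1, \delta_2)$, $f_{1t}(p, q, \beta, \delta_1, \delta_2)$ and $f_{2t}(p, q, \beta, \delta_1, \delta_2)$ with respect to the $(j_1, \ldots, j_k)$-th elements of θ, evaluated at some β and $\delta_1 = \delta_2 = \delta$. Also let "$\bar F$" denote the matrix $F_i$ (i = 1 or 2) evaluated at some β and $\delta_1 = \delta_2 = \delta$ and $\nabla_{\theta_{j_1}} \ldots \nabla_{\theta_{j_k}} \bar F_i$ denote the k-th order derivatives of the parameter matrix $F_i$ evaluated at some β and $\delta_1 = \delta_2 = \delta$. By definition, the following relationships hold: $\nabla_{\theta_{j_1}} \ldots \nabla_{\theta_{j_k}} \bar f_{1t} = \nabla_{\theta_{j_1}} \ldots \nabla_{\theta_{j_k}} \bar f_{2t}$ if $j_1, \ldots, j_k$ all belong to $I_0$. The next lemma is parallel to Lemma 1 in Qu and Zhuo (2015), which characterizes the properties of $\xi_{t+1|t}(p, q, \beta, \delta_1, \delta_2)$ and its derivatives when $\delta_1 = \delta_2 = \delta$.

Lemma 1. Let $\xi_{0|0} = \xi_*$, $\rho = p + q - 1$ and $r = \rho\,\xi_*(1 - \xi_*)$ with $\xi_*$ defined in (2.4). Then, for t ≥ 1, we have:
1. $\bar\xi_{t+1|t} = \xi_*$.
2. $\nabla_{\theta_j} \bar\xi_{t+1|t} = \rho \nabla_{\theta_j} \bar\xi_{t|t-1} + \bar E_{j,t}$, where
\[ \bar E_{j,t} = r\left[\frac{\nabla_{\theta_j} \bar f_{1t}}{\bar f_t} - \frac{\nabla_{\theta_j} \bar f_{2t}}{\bar f_t}\right], \]
with $j \in \{I_0, I_1, I_2\}$.
3. $\nabla_{\theta_j}\nabla_{\theta_k} \bar\xi_{t+1|t} = \rho \nabla_{\theta_j}\nabla_{\theta_k} \bar\xi_{t|t-1} + \bar E_{jk,t}$, where $\bar E_{jk,t}$ are given by (let $(I_a, I_b)$ denote the situation with $j \in I_a$ and $k \in I_b$; a, b = 0, 1, 2):
$(I_0, I_0)$: 0.
$(I_0, I_1)$ or $(I_0, I_2)$:
\[ r\left\{\frac{\nabla_{\theta_j}\nabla_{\theta_k} \bar f_{1t}}{\bar f_t} - \frac{\nabla_{\theta_j}\nabla_{\theta_k} \bar f_{2t}}{\bar f_t} + \frac{\nabla_{\theta_j} \bar f_{2t}}{\bar f_t}\frac{\nabla_{\theta_k} \bar f_{2t}}{\bar f_t} - \frac{\nabla_{\theta_j} \bar f_{2t}}{\bar f_t}\frac{\nabla_{\theta_k} \bar f_{1t}}{\bar f_t}\right\}. \]
$(I_1, I_1)$ or $(I_1, I_2)$ or $(I_2, I_2)$:
\[ r\left\{\frac{\nabla_{\theta_j}\nabla_{\theta_k} \bar f_{1t}}{\bar f_t} - \frac{\nabla_{\theta_j}\nabla_{\theta_k} \bar f_{2t}}{\bar f_t}\right\} + \rho(1 - 2\xi_*)\left\{\left(\frac{\nabla_{\theta_j} \bar f_{1t}}{\bar f_t} - \frac{\nabla_{\theta_j} \bar f_{2t}}{\bar f_t}\right)\nabla_{\theta_k} \bar\xi_{t|t-1} + \left(\frac{\nabla_{\theta_k} \bar f_{1t}}{\bar f_t} - \frac{\nabla_{\theta_k} \bar f_{2t}}{\bar f_t}\right)\nabla_{\theta_j} \bar\xi_{t|t-1}\right\} \]
\[ - 2r\left\{\xi_* \frac{\nabla_{\theta_j} \bar f_{1t}}{\bar f_t}\frac{\nabla_{\theta_k} \bar f_{1t}}{\bar f_t} - (1 - \xi_*)\frac{\nabla_{\theta_j} \bar f_{2t}}{\bar f_t}\frac{\nabla_{\theta_k} \bar f_{2t}}{\bar f_t}\right\} + r(2\xi_* - 1)\left\{\frac{\nabla_{\theta_j} \bar f_{1t}}{\bar f_t}\frac{\nabla_{\theta_k} \bar f_{2t}}{\bar f_t} + \frac{\nabla_{\theta_j} \bar f_{2t}}{\bar f_t}\frac{\nabla_{\theta_k} \bar f_{1t}}{\bar f_t}\right\}. \]
4. $\nabla_{\theta_j}\nabla_{\theta_k}\nabla_{\theta_l} \bar\xi_{t+1|t} = \rho \nabla_{\theta_j}\nabla_{\theta_k}\nabla_{\theta_l} \bar\xi_{t|t-1} + \bar E_{jkl,t}$, where $\bar E_{jkl,t}$ are given in the appendix with $j, k, l \in \{I_a, I_b, I_c\}$ and a, b, c = 0, 1, 2.

I now discuss the first order derivatives of $f_{it}$ appearing in the lemma. By (2.5), (2.6), (2.7), (2.8), (2.14), and (4.2), we have:
\[ f_{it} = \left[2\pi H_i'\left(F_i P_{t-1|t-1} F_i' + Q_i\right) H_i\right]^{-1/2} \exp\left\{-\frac{\left[y_t - H_i'\left(G_i + F_i x_{t-1|t-1}\right) - A_i' z_t\right]^2}{2 H_i'\left(F_i P_{t-1|t-1} F_i' + Q_i\right) H_i}\right\}. \]
The first order derivative of $f_{it}$ with respect to the j-th component in θ is as follows:
\[ \nabla_{\theta_j} f_{it} = f_{it}\, \frac{y_t - H_i'\left(G_i + F_i x_{t-1|t-1}\right) - A_i' z_t}{H_i'\left(F_i P_{t-1|t-1} F_i' + Q_i\right) H_i} \times \left\{\nabla_{\theta_j} H_i'\left(G_i + F_i x_{t-1|t-1}\right) + \nabla_{\theta_j} A_i' z_t + H_i'\left[\nabla_{\theta_j} G_i + \nabla_{\theta_j} F_i\, x_{t-1|t-1} + F_i \nabla_{\theta_j} x_{t-1|t-1}\right]\right\} \tag{4.3} \]
\[ + f_{it}\, \frac{1}{2 H_i'\left(F_i P_{t-1|t-1} F_i' + Q_i\right) H_i}\left(\frac{\left[y_t - H_i'\left(G_i + F_i x_{t-1|t-1}\right) - A_i' z_t\right]^2}{H_i'\left(F_i P_{t-1|t-1} F_i' + Q_i\right) H_i} - 1\right) \]
\[ \times \left\{\nabla_{\theta_j} H_i'\left(F_i P_{t-1|t-1} F_i' + Q_i\right) H_i + H_i'\left(F_i P_{t-1|t-1} F_i' + Q_i\right)\nabla_{\theta_j} H_i + H_i'\left[\nabla_{\theta_j} F_i\, P_{t-1|t-1} F_i' + F_i \nabla_{\theta_j} P_{t-1|t-1} F_i' + F_i P_{t-1|t-1} \nabla_{\theta_j} F_i' + \nabla_{\theta_j} Q_i\right] H_i\right\}. \]
The exact expressions for the second order derivatives of $f_{it}$ are included in the appendix. The properties of $x_{t|t}$ and $P_{t|t}$ and their derivatives will be studied in the next subsection (see Lemma 3 below). Note that, for some β and $\delta_1 = \delta_2$,
\[ \nabla_{\theta_j} \bar f_{1t} - \nabla_{\theta_j} \bar f_{2t} = \bar f_t\, \frac{y_t - \bar H'\left(\bar G + \bar F \bar x_{t-1|t-1}\right) - \bar A' z_t}{\bar H'\left(\bar F \bar P_{t-1|t-1} \bar F' + \bar Q\right) \bar H} \]
\[ \times \left\{\left(\nabla_{\theta_j} \bar H_1' - \nabla_{\theta_j} \bar H_2'\right)\left(\bar G + \bar F \bar x_{t-1|t-1}\right) + \left(\nabla_{\theta_j} \bar A_1' - \nabla_{\theta_j} \bar A_2'\right) z_t + \bar H'\left[\left(\nabla_{\theta_j} \bar G_1 - \nabla_{\theta_j} \bar G_2\right) + \left(\nabla_{\theta_j} \bar F_1 - \nabla_{\theta_j} \bar F_2\right)\bar x_{t-1|t-1}\right]\right\} \]
\[ + \bar f_t\, \frac{1}{2 \bar H'\left(\bar F \bar P_{t-1|t-1} \bar F' + \bar Q\right) \bar H}\left(\frac{\left[y_t - \bar H'\left(\bar G + \bar F \bar x_{t-1|t-1}\right) - \bar A' z_t\right]^2}{\bar H'\left(\bar F \bar P_{t-1|t-1} \bar F' + \bar Q\right) \bar H} - 1\right) \]
\[ \times \left\{\left(\nabla_{\theta_j} \bar H_1' - \nabla_{\theta_j} \bar H_2'\right)\left(\bar F \bar P_{t-1|t-1} \bar F' + \bar Q\right)\bar H + \bar H'\left(\bar F \bar P_{t-1|t-1} \bar F' + \bar Q\right)\left(\nabla_{\theta_j} \bar H_1 - \nabla_{\theta_j} \bar H_2\right) \right. \]
\[ \left. + \bar H'\left[\left(\nabla_{\theta_j} \bar F_1 - \nabla_{\theta_j} \bar F_2\right)\bar P_{t-1|t-1} \bar F' + \bar F \bar P_{t-1|t-1}\left(\nabla_{\theta_j} \bar F_1' - \nabla_{\theta_j} \bar F_2'\right) + \left(\nabla_{\theta_j} \bar Q_1 - \nabla_{\theta_j} \bar Q_2\right)\right]\bar H\right\}, \]
in which the $\nabla_{\theta_j} \bar x_{t-1|t-1}$ and $\nabla_{\theta_j} \bar P_{t-1|t-1}$ terms are canceled out. Consequently, these two quantities are not needed for calculating $\nabla_{\theta_j} \bar\xi_{t+1|t}$.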
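Parts 1 and 2 of Lemma 1 can be checked numerically by treating the right-hand side of (4.1) as a function of $\xi_{t|t-1}$, $f_{1t}$ and $f_{2t}$ and differentiating it by central finite differences at $f_{1t} = f_{2t}$. The sketch below does this; the common density value `f` and the values of p and q are hypothetical numbers chosen only for illustration. By the chain rule, partial derivatives of ρ with respect to $\xi_{t|t-1}$, $r/\bar f_t$ with respect to $f_{1t}$, and $-r/\bar f_t$ with respect to $f_{2t}$ reproduce the recursion in part 2.

```python
def xi_next(xi, f1, f2, p, q):
    """Right-hand side of (4.1) as a function of xi_{t|t-1}, f_1t and f_2t."""
    rho = p + q - 1.0
    return p + rho * f2 * (xi - 1.0) / (f1 * xi + f2 * (1.0 - xi))


p, q = 0.9, 0.7                       # illustrative values only
rho = p + q - 1.0
xi_star = (1.0 - q) / (2.0 - p - q)   # equation (2.4)
r = rho * xi_star * (1.0 - xi_star)
f = 0.37                              # hypothetical common value of f_1t = f_2t
h = 1e-6                              # finite-difference step

# Part 1: with equal densities the recursion stays at xi_star.
gap_part1 = abs(xi_next(xi_star, f, f, p, q) - xi_star)

# Central finite differences of the one-step map at the null point.
d_xi = (xi_next(xi_star + h, f, f, p, q) - xi_next(xi_star - h, f, f, p, q)) / (2 * h)
d_f1 = (xi_next(xi_star, f + h, f, p, q) - xi_next(xi_star, f - h, f, p, q)) / (2 * h)
d_f2 = (xi_next(xi_star, f, f + h, p, q) - xi_next(xi_star, f, f - h, p, q)) / (2 * h)
```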
Similarly, $\nabla_{\theta_j}\nabla_{\theta_k} \bar x_{t-1|t-1}$ and $\nabla_{\theta_j}\nabla_{\theta_k} \bar P_{t-1|t-1}$ are not needed for calculating $\nabla_{\theta_j}\nabla_{\theta_k} \bar f_{1t} - \nabla_{\theta_j}\nabla_{\theta_k} \bar f_{2t}$. This is also true for higher order derivatives of $(f_{1t} - f_{2t})$ when evaluated at $\delta_1 = \delta_2$. I now use an example to illustrate Lemma 1.

The illustrative model (cont'd). Consider the illustrative example (2.18)-(2.19) and assume that both $\alpha_{s_t}$ and $\sigma^2_{s_t}$ are affected by regime switching. Lemma 1 implies that
\[ \nabla_{\alpha_1} \bar\xi_{t+1|t} = \rho \nabla_{\alpha_1} \bar\xi_{t|t-1} + r\, \frac{1}{\bar\sigma^2}\left[y_t - \bar\alpha - \bar H' \bar F \bar x_{t-1|t-1}\right], \]
\[ \nabla_{\sigma_1^2} \bar\xi_{t+1|t} = \rho \nabla_{\sigma_1^2} \bar\xi_{t|t-1} + r\, \frac{1}{2\bar\sigma^2}\left\{\frac{\left[y_t - \bar\alpha - \bar H' \bar F \bar x_{t-1|t-1}\right]^2}{\bar\sigma^2} - 1\right\}. \]
Because the filter described in subsections 2.2-2.3 reduces to the standard Kalman filter when $\delta_1 = \delta_2$, $\nabla_{\alpha_1} \bar\xi_{t+1|t}$ and $\nabla_{\sigma_1^2} \bar\xi_{t+1|t}$ both reduce to stationary AR(1) processes with mean zero when evaluated at the true parameter values under the null hypothesis. Their variances are finite and satisfy
\[ E\left(\nabla_{\alpha_1} \bar\xi_{t+1|t}\right)^2 = \frac{r^2}{(1 - \rho^2)\sigma_*^2} \quad \text{and} \quad E\left(\nabla_{\sigma_1^2} \bar\xi_{t+1|t}\right)^2 = \frac{r^2}{2(1 - \rho^2)\sigma_*^4}, \]
where $\sigma_*^2$ denotes the true value of $\sigma^2_{s_t}$ under the null hypothesis.

4.2 Filtered state and its mean squared error

Since both the filtered state in (2.11) and its mean squared error in (2.12) are components in the modified log likelihood function, it is important to study these functions and their derivatives with respect to β, $\delta_1$ and $\delta_2$. As in the previous subsection, it is sufficient to study these quantities when $\delta_1 = \delta_2$. Under Assumptions 1-3 and when $\delta_1 = \delta_2$, $x_t$ in (2.1) is stationary. The unconditional mean of $x_t$ can be employed as the initial value, denoted by $x_{0|0}$. The unconditional mean of $x_t$ satisfies $E(x_t) = \bar G + \bar F E(x_{t-1})$. This implies
\[ x_{0|0} = (I - \bar F)^{-1} \bar G. \]
The next lemma provides the initial value for $P_{t|t}$ when $\delta_1 = \delta_2$, denoted by $P_{0|0}$. Its results are also used later to study the properties of $x_{t|t}$ and $P_{t|t}$.

Lemma 2. Let $\bar F$ have all its eigenvalues inside the unit circle. Set $P_{0|0} = \bar P_*$, where $\bar P_*$ solves
\[ \bar P_* = \left\{I - \left(\bar F \bar P_* \bar F' + \bar Q\right)\bar H\left[\bar H'\left(\bar F \bar P_* \bar F' + \bar Q\right)\bar H\right]^{-1}\bar H'\right\}\left(\bar F \bar P_* \bar F' + \bar Q\right). \]
(4.4) Then, under the null hypothesis and Assumption 1-3, (i) P̄t|t = P̄∗ , P̄t|t = P̄∗ and P̄t|t−1 = F̄ P̄∗ F̄ 0 + Q̄, for all t = 1, ..., T . In the subsequent analysis, P̄t|t−1 is abbreviated as P̄ . The following assumption, which is analogous to Proposition 13.2 in Hamilton (1994), ensures that P̄∗ and P̄ are unique. Assumption 4. The eigenvalues of P̄ H̄ H̄ 0 I− 0 H̄ P̄ H̄ ! F̄ are all inside the unit circle. I use the ARM A model in (2.18)-(2.19) to illustrate this assumption. The illustrative model (cont’d.) Consider the model in (2.18)-(2.19). Then, P̄∗ and P̄ are equal 15 to 0 and Q̄ respectively. The quantity in Assumption 4 is given by: −θ̄1 1 ! P̄ H̄ H̄ 0 I− 0 F̄ = 0 H̄ P̄ H̄ 0 −θ̄2 · · · 0 −θ̄nr−1 0 , 0 0 ··· 0 1 ··· 0 0 .. . 0 0 .. . ··· 0 1 0 where θ̄j denotes that the parameter θj,st is evaluated at some β and δ1 = δ2 = δ. For the ARM A(1, 1) model, Assumption 4 is equivalent to −1 < θ̄1 < 1. The next lemma contains the details on the first and second order derivatives of the filtered state and its mean squared error when evaluated at δ1 = δ2 = δ. Lemma 3. Under the null hypothesis and Assumptions 1-4, we have: 1. For any j ∈ Ia , a = 0, 1, 2, " P̄ H̄ H̄ 0 I− 0 H̄ P̄ H̄ vec ∇θj P̄t|t = #⊗2 ! vec ∇θj P̄t−1|t−1 + vec P̄j,t , F̄ where P̄j,t = ξ∗ P̄ H̄ H̄ 0 I− H̄ 0 P̄ H̄ 0 ∇θj F̄1 P̄∗ F̄ + 0 P̄ H̄ H̄ + (1 − ξ∗ ) I − H̄ 0 P̄ H̄ − ξ∗ 0 ∇θj F̄2 P̄∗ F̄ + P̄ ∇θj H̄1 H̄ 0 + H̄∇θj H̄10 − (1 − ξ∗ ) F̄ P̄∗ ∇θj F̄10 H̄ 0 P̄ H̄ − P̄ ∇θj H̄2 H̄ 0 + H̄∇θj H̄20 + ∇θj Q1 F̄ P̄∗ ∇θj F̄20 P̄ H̄ H̄ 0 H̄ 0 P̄ H̄ − H̄ H̄ 0 P̄ I− H̄ 0 P̄ H̄ + ∇θj Q2 H̄ H̄ 0 P̄ I− H̄ 0 P̄ H̄ ∇θj H̄10 P̄ H̄ + H̄ 0 P̄ ∇θj H̄1 H̄ 0 P̄ H̄ P̄ H̄ H̄ 0 ! F̄ P̄∗ F̄ 0 + Q̄ 2 ∇θj H̄20 P̄ H̄ + H̄ 0 P̄ ∇θj H̄2 H̄ 0 P̄ H̄ 2 ! F̄ P̄∗ F̄ 0 + Q̄ . 2. For any j ∈ Ia , a = 0, 1, 2, " ∇θj x̄t|t = P̄ H̄ H̄ 0 I− 0 H̄ P̄ H̄ ! # F̄ ∇θj x̄t−1|t−1 + X̄j,t , where the expression of X̄j,t are given in the appendix. 3. For any j ∈ Ia and k ∈ Ib , a, b = 0, 1, 2, vec ∇θj ∇θk P̄t|t = " P̄ H̄ H̄ 0 I− 0 H̄ P̄ H̄ #⊗2 ! 
vec ∇θj ∇θk P̄t−1|t−1 + vec P̄jk,t , F̄ 16 where the expression of P̄jk,t are given in the appendix. 4. For any j ∈ Ia and k ∈ Ib , a, b = 0, 1, 2, " ! P̄ H̄ H̄ 0 I− 0 H̄ P̄ H̄ ∇θj ∇θk x̄t|t = # F̄ ∇θj ∇θk x̄t−1|t−1 + X̄jk,t , where the expression of X̄jk,t are given in the appendix. Lemma 3 shows that the first and second order derivatives of the filtered state and its mean squared error all follow first order linear difference equations and the lagged coefficient matrices for h i them always include I − P̄ H̄(H̄ 0 P̄ H̄)−1 H̄ 0 F̄ . The recursive structures implied by Lemma 3 suggest that we apply a similar strategy to analyze xt|t and Pt|t , as we are studying the properties of the higher order derivatives of ξt+1|t . The following example illustrates the results in Lemma 3 with an ARM A(1, 1) model. The illustrative model (cont’d.) Consider the ARM A(1, 1) model in (2.18)-(2.19) and assume only αst switches. Lemma 3 implies that P̄∗ and P̄ are equal to 0 and Q̄, respectively. Meanwhile, further calculations show that ∇θj P̄t|t = 0 for j ∈ {1, ..., nβ + 2} and ∇θj x̄t|t = 0 0 0 − 1+ξ∗θ̄ 1 − 1−ξ∗ 1+θ̄1 j ∈ {1, ..., nβ }, 0 j = nβ + 1, 0 0 0 j = nβ + 2. The second order derivatives of Pt|t and xt|t , with respect to α1 , satisfy ∇2α1 P̄t|t = 2ξ∗ (1 − ξ∗ ) 1 1 − θ̄12 ! −θ̄1 1 −θ̄1 1 and −θ̄1 0 ∇2α1 x̄t|t = 1 0 2 ∇α1 x̄t−1|t−1 − 2∇α1 ξ¯t|t 0 + −φ̄1 θ̄1 φ̄1 0 1 0 yt − ᾱ − H̄ 0 F̄ x̄t−1|t−1 σ̄ 2 17 ! 2ξ∗ (1 − ξ∗ ), where ∇α1 f¯1t ∇α1 f¯2t ∇α1 ξ¯t|t = ρ∇α1 ξ¯t−1|t−1 + (1 − ξ∗ )ξ∗ − . f¯t f¯t " # In this case, when δ1 = δ2 , ∇2α1 P̄t|t is a constant matrix, while ∇2α1 x̄t|t depends on ∇2α1 x̄t−1|t−1 , ∇α1 ξ¯t|t , and the prediction error, yt − ᾱ − H̄ 0 F̄ x̄t−1|t−1 , at time t. 4.3 Modified log likelihood function and its expansion Because of the multiple local maxima in LA (p, q, β, δ1 , δ2 ), it is difficult to directly expand this function around the null estimates (β̃, δ̃, δ̃). 
Both Cho and White (2007) and Qu and Zhuo (2015) suggest working with the concentrated likelihood function. To derive the concentrated likelihood function, β and $\delta_1$ are treated as functions of $\delta_2$ and the dependence between $(\beta, \delta_1)$ and $\delta_2$ is quantified using the first order conditions that define the concentrated and modified log likelihood function (see Lemma A.3 in the appendix). This effectively removes β and $\delta_1$ from the subsequent analysis and allows us to work with the concentrated, modified log likelihood function, which is only a function of $\delta_2$. Therefore, we can expand the concentrated, modified log likelihood function around $\delta_2 = \tilde\delta$ (see Lemma 4 below) to obtain an approximation for MLR(p, q). For any $\delta_2 \in \Delta$, we can write
\[ L(p, q, \delta_2) = \max_{\beta,\delta_1} L^A(p, q, \beta, \delta_1, \delta_2) \]
and
\[ \left(\hat\beta(\delta_2), \hat\delta_1(\delta_2)\right) = \arg\max_{\beta,\delta_1} L^A(p, q, \beta, \delta_1, \delta_2). \]
Then,
\[ MLR(p, q) = 2 \max_{\delta_2}\left[L(p, q, \delta_2) - L(p, q, \tilde\delta)\right]. \]
For k ≥ 1, let $L^{(k)}_{i_1 \ldots i_k}(p, q, \delta_2)$ ($i_1, \ldots, i_k \in \{1, \ldots, n_\delta\}$) denote the k-th order derivative of $L(p, q, \delta_2)$ with respect to the $(i_1, \ldots, i_k)$-th elements of $\delta_2$. Let $d_j$ ($j \in \{1, \ldots, n_\delta\}$) denote the j-th element of $(\delta_2 - \tilde\delta)$. Then, a fourth order Taylor expansion of $L(p, q, \delta_2)$ around $\tilde\delta$ is given by
\[ L(p, q, \delta_2) - L(p, q, \tilde\delta) = \sum_{j=1}^{n_\delta} L^{(1)}_j(p, q, \tilde\delta)\, d_j + \frac{1}{2!}\sum_{j=1}^{n_\delta}\sum_{k=1}^{n_\delta} L^{(2)}_{jk}(p, q, \tilde\delta)\, d_j d_k \tag{4.5} \]
\[ + \frac{1}{3!}\sum_{j=1}^{n_\delta}\sum_{k=1}^{n_\delta}\sum_{l=1}^{n_\delta} L^{(3)}_{jkl}(p, q, \tilde\delta)\, d_j d_k d_l + \frac{1}{4!}\sum_{j=1}^{n_\delta}\sum_{k=1}^{n_\delta}\sum_{l=1}^{n_\delta}\sum_{m=1}^{n_\delta} L^{(4)}_{jklm}(p, q, \bar\delta)\, d_j d_k d_l d_m, \]
where in the last term $\bar\delta$ is a value that lies between $\delta_2$ and $\tilde\delta$. Two lemmas will be provided to analyze this expansion. Here, two more assumptions, which are similar to Assumptions 4 and 5 in Qu and Zhuo (2015), are needed.

Assumption 5.
There exists an open neighborhood of (β∗ , δ∗ ), denoted by B(β∗ , δ∗ ), and a sequence of positive, strictly stationary and ergodic random variables {υt } satisfying Eυt1+c < L < ∞ for some c > 0, such that α(k) ∇θ ...∇θ ft (β, δ1 ) k ik i1 sup < υt f (β, δ ) t 1 (β,δ1 )∈B(β∗ ,δ∗ ) for all i1 , ..., ik ∈ {1, ..., nβ + nδ } , where 1 ≤ k ≤ 5; α(k) = 6 if k = 1, 2, 3 and α(k) = 5 if k = 4, 5. (5) Assumption 6. There exists η > 0, such that supp,q∈[,1−] sup|δ−δ̃|<η T −1 |Ljklmn (p, q, δ)| = Op (1) for all j, k, l, m, n ∈ {1, ..., nδ }, where is an arbitrary small constant satisfying 0 < < 1/2. The next lemma characterizes the derivatives of β̂(δ2 ) and δ̂1 (δ2 ) with respect to δ2 evaluated at δ2 = δ̃. To shorten the expressions, let ξ˜t+1|t and f˜t denote ξt+1|t (p, q, β, δ1 , δ2 ) and ft (β, δ1 ) evaluated at (β, δ1 , δ2 ) = (β̃, δ̃, δ̃). Also, let ∇δ1i1 ...∇δ1ik ξ˜t|t−1 and ∇δ1i1 ...∇δ1ik f˜1t denote the k-th order derivative of ξt+1|t (p, q, β, δ1 , δ2 ) and ft (β, δ1 ) with respect to the (i1 , ..., ik )-th elements of δ1 , evaluated at (β, δ1 , δ2 ) = (β̃, δ̃, δ̃). Finally, define # " # 2 " ∇δ1j ∇δ1k f˜1t ∇δ1j ∇δ1k f˜2t ∇δ2j ∇δ2k f˜1t ∇δ2j ∇δ2k f˜2t Ũjk,t = ξ∗ + (1 − ξ∗ ) + ξ∗ + (1 − ξ∗ ) f˜t f˜t f˜t f˜t # " ∇δ2j ∇δ1k f˜1t ∇δ2j ∇δ1k f˜2t ∇δ1j ∇δ2k f˜1t ∇δ1j ∇δ2k f˜2t 1 − ξ∗ − ξ∗ + (1 − ξ∗ ) + ξ∗ + (1 − ξ∗ ) ξ∗ f˜t f˜t f˜t f˜t " ! # ∇θj f¯1t ∇θj f¯2t 1 ∇θk f¯1t ∇θk f¯2t + 2 ∇δ1j ξ˜t|t−1 ∇δ1k ξ˜t|t−1 , − + − (4.6) ξ∗ f¯t f¯t f¯t f¯t 1 − ξ∗ ξ∗ 19 and ξ∗ ∇(β 0 ,δ0 )0 f˜1t + (1 − ξ∗ )∇(β 0 ,δ0 )0 f˜2t 2 Ũjk,t , ˜ ft ξ∗ ∇(β 0 ,δ0 )0 f˜1t + (1 − ξ∗ )∇(β 0 ,δ0 )0 f˜2t ξ∗ ∇(β 0 ,δ0 ) f˜1t + (1 − ξ∗ )∇(β 0 ,δ0 ) f˜2t 1 2 1 2 , I˜t = ˜ ˜ ft ft D̃jk,t = 1 Ṽjklm = T −1 T X Ũjk,t Ũlm,t , D̃lm = T −1 t=1 T X D̃lm,t , I˜ = T −1 t=1 T X I˜t . (4.7) t=1 (2) As will be seen, Ũjk,t is the leading term in Ljk (p, q, δ̃), while D̃jk,t and I˜t appear when constructing (4) the leading term of Ljklm (p, q, δ̃). 
The next lemma, analogous to Lemma 3 in Qu and Zhuo (2015), (k) presents the properties of Li1 ...ik (p, q, δ2 ) when δ2 is evaluated at the null estimate, i.e. δ̃. Lemma 4. Under the null hypothesis and Assumptions 1-6, for all j, k, l, m ∈ {1, ..., nδ }, we have (1) 1. Lj (p, q, δ̃) = 0. (2) PT 2. T −1/2 Ljk (p, q, δ̃) = T −1/2 t=1 Ũjk,t (3) + op (1). 3. T −3/4 Ljkl (p, q, δ̃) = Op T −1/4 . (4) 0 ˜−1 D̃ + Ṽ 0 ˜−1 D̃ 0 I˜−1 D̃ 4. T −1 Ljklm (p, q, δ̄) = −{Ṽjklm − D̃jk km } + op (1). lm + Ṽjmkl − D̃jm I kl jlkm − D̃jl I The illustrative model (cont’d.) We use the ARM A(1, 1) model to illustrate the leading terms (2) (4) of T −1/2 Ljk (p, q, δ̃) and T −1 Ljklm (p, q, δ̃) in Lemma 4. Suppose only αst switches. Then, Ũjk,t and D̃jk,t equal, respectively, 1 − ξ∗ ξ∗ ( 1 + φ̃21 + 2φ̃1 θ̃1 1 − θ̃12 −2(φ̃1 + θ̃1 ) t−1 X ! (−θ̃1 )j−1 1 σ̃ 2 t−j X ! µ̃2t µ̃t −1 +2 σ̃ 2 σ̃ 2 ρs−1 s=1 j=1 µ̃t−s σ̃ 2 "X t−1 s ρ s=1 + φ̃1 θ̃1 µ̃t−s σ̃ 2 # µ̃t−j σ̃ 2 and (yt−1 −α̃)µ̃t σ̃ 2 µ̃t µ̃t−1 σ̃ 2 1 2σ̃ 2 µ̃2t σ̃ 2 −1 − σ̃µ̃2t ξ∗ (1−φ̃1 ) 1+θ̃1 where µ̃t denotes the residuals under the null hypothesis and σ̃ 2 = T −1 variance function of T −1/2 PT t=1 Ũjk,t , (2) Ũjk,t , PT 2 t=1 µ̃t . This makes the and therefore of T −1/2 Ljk (p, q, δ̃), consistently estimable. 20 5 Asymptotic approximations (2) Let L(2) (p, q, δ̃) be a square matrix with its (j, k)-th element given by Ljk (p, q, δ̃) for j, k ∈ {1, 2, ..., nδ }. This section includes three sets of results. (1) The weak convergence of T −1/2 L(2) (p, q, δ̃) over ≤ p, q ≤ 1 − . (2) The limiting distribution of SupM LR(Λ ). (3) A finite sample refinement that improves the asymptotic approximation. 5.1 Weak convergence of L(2) (p, q, δ̃) For 0 < pr , qr , ps , qs < 1 and j, k, l, m ∈ {1, 2, ..., nδ }, define 0 ωjklm (pr , qr ; ps , qs ) = Vjklm (pr , qr ; ps , qs ) − Djk (pr , qr )I −1 Dlm (ps , qs ), (5.1) where Vjklm (pr , qr ; ps , qs ) = E [Ujk,t (pr , qr ) Ulm,t (ps , qs )] , Djk (pr , qr ) = EDjk,t (pr , qr ), and I = EIt . 
Here, $U_{jk,t}(p_r, q_r)$, $D_{jk,t}(p_r, q_r)$ and $I_t$ have the same definitions as $\tilde U_{jk,t}$, $\tilde D_{jk,t}$ and $\tilde I_t$ in (4.6) and (4.7) but are evaluated at $(p_r, q_r, \beta_*, \delta_*)$ instead of $(p_r, q_r, \tilde\beta, \tilde\delta)$. The following lemma is parallel to Lemma 4 in Qu and Zhuo (2015).

Lemma 5. Under the null hypothesis and Assumptions 1-6, we have, over $\epsilon \leq p, q \leq 1 - \epsilon$:
\[
T^{-1/2} L^{(2)}(p, q, \tilde\delta) \Rightarrow G(p, q),
\]
where the elements of $G(p, q)$ are mean zero continuous Gaussian processes satisfying $\mathrm{Cov}[G_{jk}(p_r, q_r), G_{lm}(p_s, q_s)] = \omega_{jklm}(p_r, q_r; p_s, q_s)$ for $j, k, l, m \in \{1, 2, \ldots, n_\delta\}$, where $\omega_{jklm}(p_r, q_r; p_s, q_s)$ is given by (5.1).

In the appendix, this lemma is proved by first showing the finite dimensional convergence and then the stochastic equicontinuity.

5.2 Limiting distribution of $SupMLR(\Lambda_\epsilon)$

Let $E$ be a set of open balls that includes all possible values of $(p, q)$ such that $L^{(2)}_{jk}(p, q, \tilde\delta) \equiv 0$ for any $j, k \in \{1, 2, \ldots, n_\delta\}$. For example, if for some specific $j_1$ and $k_1$, $L^{(2)}_{j_1 k_1}(p_1, q_1, \tilde\delta) \equiv 0$, then $(p, q) \in E$ if $p \in (p_1 - \epsilon_1, p_1 + \epsilon_1)$ and $q \in (q_1 - \epsilon_1, q_1 + \epsilon_1)$ for any small $\epsilon_1$, say $\epsilon_1 = 0.01$. Define
\[
\Lambda_\epsilon = \{(p, q) : \epsilon \leq p, q \leq 1 - \epsilon, \text{ and } (p, q) \notin E\}.
\tag{5.2}
\]
Let $\Omega(p, q)$ be an $n_\delta^2$-dimensional square matrix whose $(j + (k-1) n_\delta,\ l + (m-1) n_\delta)$-th element is given by $\omega_{jklm}(p, q; p, q)$. Then, Lemma 5 implies $E[\mathrm{vec}\,G(p, q)\, \mathrm{vec}\,G(p, q)'] = \Omega(p, q)$. The next result, which is analogous to Proposition 2 in Qu and Zhuo (2015), gives the asymptotic distribution of $SupMLR(\Lambda_\epsilon)$.

Proposition 1. Suppose the null hypothesis and Assumptions 1-6 hold. Then
\[
SupMLR(\Lambda_\epsilon) \Rightarrow \sup_{(p, q) \in \Lambda_\epsilon}\ \sup_{\eta \in \mathbb{R}^{n_\delta}} W^{(2)}(p, q, \eta),
\tag{5.3}
\]
where $\Lambda_\epsilon$ is given by (5.2) and
\[
W^{(2)}(p, q, \eta) = \big(\eta^{\otimes 2}\big)' \mathrm{vec}\,G(p, q) - \frac{1}{4} \big(\eta^{\otimes 2}\big)' \Omega(p, q) \big(\eta^{\otimes 2}\big).
\]
Some important features of $\omega_{jklm}(p, q; p, q)$ have been shown in section 5.1 of Qu and Zhuo (2015) by simple examples.
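To make the inner supremum in (5.3) concrete, consider the scalar case $n_\delta = 1$ at a fixed $(p, q)$. Writing $g$ for a draw of $G(p, q)$ and $\omega$ for $\Omega(p, q)$, $W^{(2)}$ reduces to $\eta^2 g - \frac{1}{4}\eta^4 \omega$, and maximizing over $\eta^2 \geq 0$ gives $\max(g, 0)^2 / \omega$. The Python sketch below checks this closed form against a grid search. The function names, the grid, and the numerical values are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def sup_w2_scalar(g: float, omega: float) -> float:
    """Closed-form sup over eta of eta^2 * g - 0.25 * eta^4 * omega (n_delta = 1).

    Hypothetical helper: g stands for one draw of G(p, q), omega for Omega(p, q).
    """
    if omega <= 0:
        raise ValueError("omega must be positive")
    # With u = eta^2 >= 0 the objective is u*g - 0.25*u^2*omega,
    # maximized at u = 2g/omega when g > 0 and at u = 0 otherwise.
    return max(g, 0.0) ** 2 / omega


def sup_w2_grid(g: float, omega: float, eta_max: float = 10.0, n: int = 200001) -> float:
    """Brute-force check of the same supremum on an evenly spaced eta grid."""
    eta = np.linspace(-eta_max, eta_max, n)
    return float(np.max(eta**2 * g - 0.25 * eta**4 * omega))


if __name__ == "__main__":
    for g in (1.5, -1.0):
        print(g, sup_w2_scalar(g, 2.0), sup_w2_grid(g, 2.0))
```

In the multivariate case no such closed form is available in general, so the inner supremum would have to be computed numerically for each $(p, q)$.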
In the current context, I observe similar features: $\omega_{jklm}(p, q; p, q)$ depends on (1) the model's dynamic properties (e.g., whether the regressors are strictly exogenous or predetermined), (2) which parameters are allowed to switch (e.g., regression coefficients or the variance of the errors), and (3) whether nuisance parameters are present.

5.3 A refinement

Qu and Zhuo (2015) provide a refinement to the asymptotic distribution when $L^{(2)}(p, q, \tilde\delta) \equiv 0$. In such situations, the magnitude of $T^{-1/2} \sum_{t=1}^{T} \tilde U_{jk,t}$ can be too small to dominate the higher order terms in the likelihood expansion when $p + q$ is close to 1. This indicates that an asymptotic distribution that relies entirely on $T^{-1/2} \sum_{t=1}^{T} \tilde U_{jk,t}$ can be inadequate. Motivated by this observation, I consider a refinement to the asymptotic approximation under Markov switching state space models. The following assumption is parallel to Assumption 6 in Qu and Zhuo (2015).

Assumption 7. There exists an open neighborhood of $(\beta_*, \delta_*)$, $B(\beta_*, \delta_*)$, and a sequence of positive, strictly stationary and ergodic random variables $\{\upsilon_t\}$ satisfying $E\upsilon_t^{1+c} < \infty$ for some $c > 0$, such that the supremums of the following quantities over $B(\beta_*, \delta_*)$ are bounded from above by $\upsilon_t$:
\[
\left| \frac{\nabla_{\theta_{i_1}} \cdots \nabla_{\theta_{i_k}} f_t(\beta, \delta_1)}{f_t(\beta, \delta_1)} \right|^4, \quad
\left| \frac{\nabla_{\theta_{i_1}} \cdots \nabla_{\theta_{i_m}} f_t(\beta, \delta_1)}{f_t(\beta, \delta_1)} \right|^2, \quad
\left| \frac{\nabla_{\theta_{i_1}} \cdots \nabla_{\theta_{i_8}} f_t(\beta, \delta_1)}{f_t(\beta, \delta_1)} \right|,
\]
\[
\left| \frac{\nabla_{\theta_{j_1}} \nabla_{\theta_{i_1}} \cdots \nabla_{\theta_{i_7}} f_t(\beta, \delta_1)}{f_t(\beta, \delta_1)} \right|, \quad
\left| \frac{\nabla_{\theta_{j_1}} \nabla_{\theta_{j_2}} \nabla_{\theta_{i_1}} \cdots \nabla_{\theta_{i_6}} f_t(\beta, \delta_1)}{f_t(\beta, \delta_1)} \right|,
\]
where $k = 1, 2, 3, 4$, $m = 5, 6, 7$, $i_1, \ldots, i_8 \in \{1, \ldots, n_\beta + n_\delta\}$ and $j_1, j_2 \in \{1, \ldots, n_\beta\}$.

Obtaining all the leading terms in an even higher order expansion of the modified likelihood function is very difficult in the current context. This paper therefore incorporates only some specific terms for the refinement. Define
\[
\tilde s_{jkl,t}(p, q) = - \frac{(1 - \xi_*)(1 - 2\xi_*)}{\xi_*^2}\, \frac{\nabla_{\delta_{1j}} \nabla_{\delta_{1k}} \nabla_{\delta_{1l}} \tilde f_{1t}}{\tilde f_t},
\tag{5.4}
\]
where $\tilde x_{t|t}$ and $\tilde P_{t|t}$ are treated as constants when calculating $\nabla_{\delta_{1j}} \nabla_{\delta_{1k}} \nabla_{\delta_{1l}} \tilde f_{1t}$. For $j, k, l, m, n, u \in$
For j, k, l, m, n, u ∈ (3) {1, ..., nδ }, let Gjkl (p, q) be a continuous Gaussian process with mean zero and satisfy (3) ωjklmnu (pr , qr ; ps , qs ) (3) = Cov(Gjkl (pr , qr ) , G(3) mnu (ps , qs )) = E [sjkl,t (pr , qr )smnu,t (ps , qs )] " −E ξ∗ ∇(β 0 ,δ0 ) f1t + (1 − ξ∗ )∇(β 0 ,δ0 ) f2t 2 1 f˜t " # sjkl,t (pr , qr ) I −1 ξ∗ ∇(β 0 ,δ0 )0 f1t + (1 − ξ∗ )∇(β 0 ,δ0 )0 f2t 2 1 f˜t # smnu,t (ps , qs ) , where sjkl,t (p, q) is the same as s̃jkl,t (p, q) but is evaluated at true parameter values. The other quantities on the right hand side are also evaluated at the true parameter values. For the fourth and eighth order derivatives, define k̃jklm,t (p, q) = (1 − ξ∗ ) 1 + 1 − ξ∗ ξ∗ 3 ! ∇δ1j ∇δ1k ∇δ1l ∇δ1m f˜1t , f˜t (5.5) where x̃t|t and P̃t|t are treated as constant when calculating ∇δ1j ∇δ1k ∇δ1l ∇δ1m f˜1t here. For i1 , ..., i8 ∈ (4) {1, ..., nδ }, let Gi1 i2 i3 i4 (p, q) denote a continuous Gaussian process with mean zero and satisfy (4) ωi1 i2 ...i8 (pr , qr ; ps , qs ) (4) (4) = Cov Gi1 i2 i3 i4 (pr , qr ) , Gi5 i6 i7 i8 (ps , qs ) = E [ki1 i2 i3 i4 ,t (pr , qr ) ki5 i6 i7 i8 ,t (ps , qs )] " −E ξ∗ ∇(β 0 ,δ0 ) f1t + (1 − ξ∗ )∇(β 0 ,δ0 ) f2t 1 2 f˜t # ki1 i2 i3 i4 ,t (pr , qr ) I 23 " −1 ξ∗ ∇(β 0 ,δ0 ) f1t + (1 − ξ∗ )∇(β 0 ,δ0 ) f2t 1 2 f˜t # ki5 i6 i7 i8 ,t (ps , qs ) , where ki1 i2 i3 i4 ,t (p, q) equals k̃i1 i2 i3 i4 ,t (p, q) but evaluated at the true parameter values. The remaining quantities on the right hand side are also evaluated at the true parameter values. The next lemma characterizes the asymptotic properties of s̃jkl,t (p, q) and k̃jklm,t (p, q) when j, k, l, m ∈ {1, ..., nδ }. Lemma 6. Under the null hypothesis and Assumptions 1-7, we have T −1/2 T X (3) s̃jkl,t (p, q) ⇒ Gjkl (p, q) t=1 and T −1/2 T X (4) k̃jklm,t (p, q) ⇒ Gjklm (p, q). t=1 We now incorporate the corresponding terms to obtain a refined approximation. Let G(3) (p, q) (3) be a n3δ - dimensional vector whose (j + (k − 1)nδ + (l − 1)n2δ )-th element is given by Gjkl (p, q). 
Let $\Omega^{(3)}(p, q)$ denote an $n_\delta^3$-by-$n_\delta^3$ matrix whose $(j + (k-1) n_\delta + (l-1) n_\delta^2,\ m + (n-1) n_\delta + (r-1) n_\delta^2)$-th element is given by $\omega^{(3)}_{jklmnr}(p, q; p, q)$. Define
\[
W^{(3)}(p, q, \eta) = T^{-1/4}\, \frac{1}{3} \big(\eta^{\otimes 3}\big)' \mathrm{vec}\,G^{(3)}(p, q) - T^{-1/2}\, \frac{1}{36} \big(\eta^{\otimes 3}\big)' \Omega^{(3)}(p, q)\, \eta^{\otimes 3}.
\]
Let $G^{(4)}(p, q)$ be an $n_\delta^4$-dimensional vector whose $(j + (k-1) n_\delta + (l-1) n_\delta^2 + (m-1) n_\delta^3)$-th element is given by $G^{(4)}_{jklm}(p, q)$. Let $\Omega^{(4)}(p, q)$ be an $n_\delta^4$-by-$n_\delta^4$ matrix whose $(j + (k-1) n_\delta + (l-1) n_\delta^2 + (m-1) n_\delta^3,\ n + (r-1) n_\delta + (s-1) n_\delta^2 + (u-1) n_\delta^3)$-th element is given by $\omega^{(4)}_{jklmnrsu}(p, q; p, q)$. Define
\[
W^{(4)}(p, q, \eta) = T^{-1/2}\, \frac{1}{12} \big(\eta^{\otimes 4}\big)' \mathrm{vec}\,G^{(4)}(p, q) - T^{-1}\, \frac{1}{576} \big(\eta^{\otimes 4}\big)' \Omega^{(4)}(p, q)\, \eta^{\otimes 4}.
\]
Then, the distribution of the $SupMLR(\Lambda_\epsilon)$ test can be approximated by
\[
S_\infty(\Lambda_\epsilon) \equiv \sup_{(p, q) \in \Lambda_\epsilon}\ \sup_{\eta \in \mathbb{R}^{n_\delta}} \Big\{ W^{(2)}(p, q, \eta) + W^{(3)}(p, q, \eta) + W^{(4)}(p, q, \eta) \Big\},
\tag{5.6}
\]
where $\Lambda_\epsilon$ is specified in (5.2). The following corollary is analogous to Corollary 1 in Qu and Zhuo (2015).

Corollary 1. Under Assumptions 1-7 and the null hypothesis, we have
\[
\Pr\big(SupMLR(\Lambda_\epsilon) \leq s\big) - \Pr\big(S_\infty(\Lambda_\epsilon) \leq s\big) \to 0,
\]
over $\Lambda_\epsilon$ in (5.2).

Note that the above result holds irrespective of the model. This follows because the additional terms $W^{(3)}(p, q, \eta)$ and $W^{(4)}(p, q, \eta)$ both converge to zero as $T \to \infty$. These terms provide refinements in finite samples and have no effect asymptotically. The critical values can be obtained by following the simulation procedures described in section 5.4 of Qu and Zhuo (2015).

The illustrative model (cont'd). Consider the $ARMA(1, 1)$ model in (2.18)-(2.19) and assume $\alpha_{s_t}$ switches. This model can be written as:
\[
\begin{aligned}
y_t &= \alpha_{s_t} + m_t, \\
m_t &= \phi m_{t-1} + e_t + \theta e_{t-1}, \qquad e_t \sim \text{i.i.d. } N(0, \sigma^2).
\end{aligned}
\]
To illustrate the effects of the refined approximation, I simulate data using the $ARMA(1, 1)$ model with $T = 2000$, $\alpha_1 = \alpha_2 = 0$, $\phi = 0.9$, $\theta = -0.70$ and $\sigma = 0.2$.
Then, for each simulated sample and fixed $(p, q)$, I calculate three quantities: $MLR(p, q)$; the approximation to $MLR(p, q)$ using only the second and fourth order terms in the Taylor expansion; and the approximation to $MLR(p, q)$ using the second order, fourth order and refinement terms. After simulating 500 samples, I calculate both the correlation between $MLR(p, q)$ and its approximation using only the second and fourth order terms, and the correlation between $MLR(p, q)$ and its approximation using the second order, fourth order and refinement terms. I also check these correlations for different $p$ and $q$. Results are summarized in Table 1. The results show that including the refinement terms brings the approximation closer to the $MLR(p, q)$ statistic.

Table 1: Comparison of correlations between the MLR(p, q) statistic and the original and refined approximations

                 Correlation between MLR(p, q) and
(p, q)           original approximation    refined approximation
(0.90, 0.90)     0.989                     0.994
(0.70, 0.90)     0.217                     0.843
(0.50, 0.80)     0.239                     0.822

6 Monte Carlo

This section examines the size and power properties of the $SupMLR$ test statistic. The DGP is
\[
\begin{aligned}
y_t &= \alpha_{s_t} + m_t, \\
m_t &= \phi m_{t-1} + e_t + \theta e_{t-1}, \qquad e_t \sim \text{i.i.d. } N(0, \sigma^2),
\end{aligned}
\]
where $y_t$ is observable and $\alpha_{s_t}$ switches with $p(s_t = 1 | s_{t-1} = 1) = p$ and $p(s_t = 2 | s_{t-1} = 2) = q$. I set $\phi = 0.9$, $\theta = -0.70$ and $\sigma = 0.2$. The choice of this DGP is motivated by both the model studied in Perron (1993) and the empirical application in the next section. In this section, $\Lambda_\epsilon$ is specified as in (5.2) with $\epsilon = 0.01$. The critical values and rejection frequencies are all based on 3000 replications.

Table 2: Rejection frequencies under the null hypothesis

Level (%)                      1.00    2.50    5.00    7.50    10.00
T = 200   SupMLR(Λ_0.01)       1.20    3.67    7.40    10.20   14.23
T = 500   SupMLR(Λ_0.01)       1.17    2.63    6.33    9.60    12.70

Table 2 reports the sizes of the $SupMLR(\Lambda_\epsilon)$ test statistic at five different nominal levels. Under the null hypothesis, we set $\alpha_1 = \alpha_2 = 0$.
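The DGP above can be simulated directly. The following Python sketch shows one way to do so. Several choices in it are illustrative assumptions not taken from the paper: the function name, the burn-in length, the zero initial condition for $m_t$, and drawing the initial regime from the stationary distribution of the Markov chain. Setting `alpha1 = alpha2 = 0` gives data under the null hypothesis.

```python
import numpy as np


def simulate_switching_arma(T, alpha1, alpha2, p, q,
                            phi=0.9, theta=-0.7, sigma=0.2,
                            burn=200, seed=0):
    """Simulate y_t = alpha_{s_t} + m_t with ARMA(1,1) m_t and a two-state Markov chain s_t.

    Hypothetical helper; burn-in and initialization are assumptions of this sketch.
    Returns (y, s), each of length T, with s taking values in {1, 2}.
    """
    rng = np.random.default_rng(seed)
    n = T + burn
    e = rng.normal(0.0, sigma, size=n + 1)  # e[0] plays the role of the pre-sample shock
    m = np.zeros(n)
    s = np.zeros(n, dtype=int)

    # Stationary probability of regime 1 for transition probabilities (p, q).
    xi_star = (1.0 - q) / (2.0 - p - q)
    s_prev = 1 if rng.random() < xi_star else 2
    m_prev = 0.0

    for t in range(n):
        stay = p if s_prev == 1 else q
        s_t = s_prev if rng.random() < stay else 3 - s_prev  # 3 - s flips 1 <-> 2
        m_t = phi * m_prev + e[t + 1] + theta * e[t]
        s[t], m[t] = s_t, m_t
        s_prev, m_prev = s_t, m_t

    y = np.where(s == 1, alpha1, alpha2) + m
    return y[burn:], s[burn:]


if __name__ == "__main__":
    # One sample under an alternative with tau = 0.25 and (p, q) = (0.70, 0.90).
    y, s = simulate_switching_arma(500, -0.25, 0.25, 0.70, 0.90, seed=1)
    print(y[:5], s[:5])
```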
The rejection frequencies overall are close to the nominal levels, with mild over-rejections in some cases. For example, the rejection rates at the 5% and 10% levels are 7.40% and 14.23% respectively, for $\epsilon = 0.01$ and sample size $T = 200$. Similar rejection rates are observed when $T = 500$.

For power properties, I set $\alpha_1 = -\tau$ and $\alpha_2 = \tau$, with $\tau = 0.05, 0.10, 0.15, 0.20$ and $0.25$. The sample size is $T = 500$ and the number of replications is 3000. Three pairs of values for $(p, q)$ are considered: $(0.70, 0.70)$, $(0.70, 0.90)$ and $(0.90, 0.95)$.

Table 3: Rejection frequencies under the alternative hypotheses

(p, q)         τ                   0.05    0.10    0.15    0.20    0.25
(0.70, 0.70)   SupMLR(Λ_0.01)      6.33    6.67    21.00   74.33   99.67
(0.70, 0.90)   SupMLR(Λ_0.01)      7.67    8.67    26.33   85.67   100
(0.90, 0.95)   SupMLR(Λ_0.01)      5.47    6.13    17.70   69.57   97.13

Nominal level, 5%.

The rejection frequencies at the 5% nominal level are reported in Table 3. The power of the $SupMLR$ statistic increases consistently as $|\alpha_1 - \alpha_2|$ increases.

7 Application

In this section, I apply the MLR test developed in the preceding sections to study the changes in monthly U.S. unemployment rates. The data are from the labor force statistics reported in the Current Population Survey. The full sample is from January 1960 to July 2015. A simple $ARMA(1, 1)$ model as in (2.18) and (2.19) is considered, i.e.
\[
\begin{aligned}
y_t &= \alpha_{s_t} + m_t, \\
m_t &= \phi m_{t-1} + e_t + \theta e_{t-1}, \qquad e_t \sim \text{i.i.d. } N(0, \sigma^2).
\end{aligned}
\]
Here, $\alpha_{s_t}$ switches and indicates the mean level of change in the unemployment rate at time $t$. $SupMLR(\Lambda_{0.01})$ equals 36.99 for the full sample, with the critical value being 8.83 at the 5% level. The test statistic therefore provides strong evidence favoring the regime switching specification.

To provide some further evidence for the relevance of the regime switching specification, I estimate the regimes implied by the model. Let $s_t = 1$ and $s_t = 2$ represent the tight and slack labor market regimes, respectively.
Here the tight and slack labor market regimes are defined from the perspective of the job-seekers. Estimation results show that the regime shifts closely follow the recession and expansion periods dated by the National Bureau of Economic Research (NBER) and shown in Figure 2. Comparing the smoothed regime probabilities with the dates of the recession and expansion periods, I find that the labor market normally takes time to react at the beginning of a recession. Estimation under the two-regime specification also provides detailed information on changes in the labor market and the duration of each regime. In the tight labor market regime, the unemployment rate increases by 0.26% per month on average and this regime lasts about 8.8 months. In the slack labor market regime, the unemployment rate decreases by 0.03% per month and the regime lasts about 79.4 months. The model assigns low probabilities, around 20-30%, to the tight regime during the relatively shallow recessions of July 1990 to March 1991 and March 2001 to November 2001. This makes sense, because the increases in the unemployment rate during these two recessions are relatively moderate when compared with other recessions, such as the recent Great Recession from December 2007 to June 2009.

Figure 2. Smoothed probabilities of being in the tight labor market regime.
Note. The shaded areas correspond to the NBER defined recessions. The solid line indicates the smoothed probabilities of being in the tight labor market regime.

8 Conclusion

This paper develops a modified likelihood ratio (MLR) based test for detecting regime switching in state space models. The asymptotic distribution of this test statistic is also established. When applied to changes in U.S. monthly unemployment rates, the test finds strong evidence favoring the regime switching specification. This paper is the first to develop a test that is based on the likelihood ratio principle for detecting regime switching in state space models.
The techniques developed in this paper can have implications for hypothesis testing in more general contexts, such as testing for regime switching in state space models with multiple observables.

References

Billingsley, P. (1999): Convergence of Probability Measures, John Wiley & Sons.
Carrasco, M., L. Hu, and W. Ploberger (2014): “Optimal Test for Markov Switching Parameters,” Econometrica, 82, 765–784.
Carter, A. V. and D. G. Steigerwald (2012): “Testing for Regime Switching: A Comment,” Econometrica, 80, 1809–1812.
Chesher, A. D. (1984): “Testing for Neglected Heterogeneity,” Econometrica, 52, 865–872.
Cho, J. S. and H. White (2007): “Testing for Regime Switching,” Econometrica, 75, 1671–1720.
Davidson, R. and J. MacKinnon (1991): “Une nouvelle forme du test de la matrice d’information,” Annales d’Economie et de Statistique, 171–192.
Davies, R. B. (1977): “Hypothesis Testing When a Nuisance Parameter Is Present Only under the Alternative,” Biometrika, 64, 247–254.
Garcia, R. (1998): “Asymptotic Null Distribution of the Likelihood Ratio Test in Markov Switching Models,” International Economic Review, 39, 763–788.
Gordon, K. and A. Smith (1988): “Modeling and Monitoring Discontinuous Changes in Time Series,” in Bayesian Analysis of Time Series and Dynamic Linear Models, ed. by J. Spall, New York: Marcel Dekker, 359–392.
Hamilton, J. D. (1989): “A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle,” Econometrica, 57, 357–384.
Hamilton, J. D. (1990): “Analysis of Time Series Subject to Changes in Regime,” Journal of Econometrics, 45, 39–70.
Hamilton, J. D. (1994): Time Series Analysis, Princeton, NJ: Princeton University Press.
Hamilton, J. D. (1996): “Specification Testing in Markov-Switching Time-Series Models,” Journal of Econometrics, 70, 127–157.
Hamilton, J. D. (2008): “Regime-Switching Models,” in New Palgrave Dictionary of Economics, ed. by S. Durlauf and L. Blume, London: Palgrave Macmillan.
Hansen, B. E.
(1992): “The Likelihood Ratio Test under Non-Standard Conditions: Testing the Markov Switching Model of GNP,” Journal of Applied Econometrics, 7, S61–S82.
Harrison, P. J. and C. F. Stevens (1976): “Bayesian Forecasting,” Journal of the Royal Statistical Society. Series B (Methodological), 38, 205–247.
Harvey, A. C. (1981): Time Series Models, Oxford, UK: Philip Allan and Humanities Press.
Highfield, R. A. (1990): “Bayesian Approaches to Turning Point Prediction,” in Proceedings of the Business and Economics Section, Washington, DC: American Statistical Association, 89–98.
Kalman, R. E. (1960): “A New Approach to Linear Filtering and Prediction Problems,” Journal of Basic Engineering, 82, 35–45.
Kim, C.-J. and C. R. Nelson (1999): State-Space Models with Regime Switching: Classical and Gibbs-Sampling Approaches with Applications, Cambridge, MA: The MIT Press.
Lancaster, T. (1984): “The Covariance Matrix of the Information Matrix Test,” Econometrica, 52, 1051–1053.
Meinhold, R. J. and N. D. Singpurwalla (1983): “Understanding the Kalman Filter,” The American Statistician, 37, 123–127.
Neyman, J. and E. Scott (1966): “On the Use of C(a) Optimal Tests of Composite Hypotheses,” Bull. Inst. Int. Statist., 41, 477–497.
Perron, P. (1993): “Non-stationarities and Non-linearities in Canadian Inflation,” in Economic Behaviour and Policy Choice under Price Stability: Proceedings of a Conference Held at the Bank of Canada.
Qu, Z. and F. Zhuo (2015): “Likelihood Ratio Based Tests for Markov Regime Switching,” Working Paper.

Appendix

A Derivatives of the density function

The derivatives of $f_{it}$ in (2.16) are calculated here. By (2.5), (2.6), (2.7), (2.8), (2.14) and (4.2), we have:
\[
f_{it} = \Big[ 2\pi C^{(i)}_{t|t-1} \Big]^{-1/2} \exp\left\{ - \frac{\big[\mu^{(i)}_{t|t-1}\big]^2}{2 C^{(i)}_{t|t-1}} \right\},
\]
where
\[
\mu^{(i)}_{t|t-1} = y_t - H_i' \big( G_i + F_i x_{t-1|t-1} \big) - A_i' z_t
\]
and
\[
C^{(i)}_{t|t-1} = H_i' \big( F_i P_{t-1|t-1} F_i' + Q_i \big) H_i .
\]
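The density defined above can be evaluated with a few lines of linear algebra. The Python sketch below computes $\mu^{(i)}_{t|t-1}$, $C^{(i)}_{t|t-1}$ and $f_{it}$ for one regime and one time period. It assumes a scalar observable $y_t$, so that $H_i$ is a vector and $C^{(i)}_{t|t-1}$ is a scalar; the function name, argument layout and numerical values are illustrative assumptions rather than anything specified in the paper.

```python
import math

import numpy as np


def regime_density(y, z, x_prev, P_prev, H, F, G, A, Q):
    """Evaluate mu, C and the Gaussian density f for one regime.

    Hypothetical helper. Assumed shapes: y scalar; z, A of shape (k,);
    x_prev, H, G of shape (n,); P_prev, F, Q of shape (n, n).
    """
    # Prediction error: mu = y - H'(G + F x_{t-1|t-1}) - A'z.
    mu = y - H @ (G + F @ x_prev) - A @ z
    # Prediction error variance: C = H'(F P_{t-1|t-1} F' + Q) H.
    C = H @ (F @ P_prev @ F.T + Q) @ H
    # Normal density with mean zero and variance C, evaluated at mu.
    f = (2.0 * math.pi * C) ** -0.5 * math.exp(-(mu**2) / (2.0 * C))
    return float(mu), float(C), float(f)


if __name__ == "__main__":
    mu, C, f = regime_density(
        y=1.5,
        z=np.array([1.0]),
        x_prev=np.array([2.0]),
        P_prev=np.array([[1.0]]),
        H=np.array([1.0]),
        F=np.array([[0.5]]),
        G=np.array([0.1]),
        A=np.array([0.2]),
        Q=np.array([[0.75]]),
    )
    print(mu, C, f)
```

Calling the function once per regime, with the regime-specific system matrices, gives the pair $(f_{1t}, f_{2t})$ that enters the filter.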
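Then the first order derivative of $\mu^{(i)}_{t|t-1}$ with respect to the $j$-th component of $\theta$ is
\[
\nabla_{\theta_j} \mu^{(i)}_{t|t-1} = - \nabla_{\theta_j} H_i' \big( G_i + F_i x_{t-1|t-1} \big) - H_i' \big( \nabla_{\theta_j} G_i + \nabla_{\theta_j} F_i\, x_{t-1|t-1} + F_i \nabla_{\theta_j} x_{t-1|t-1} \big) - \nabla_{\theta_j} A_i' z_t ,
\tag{A.1}
\]
and the first order derivative of $C^{(i)}_{t|t-1}$ with respect to the $j$-th component of $\theta$ is
\[
\begin{aligned}
\nabla_{\theta_j} C^{(i)}_{t|t-1}
&= \nabla_{\theta_j} H_i' \big( F_i P_{t-1|t-1} F_i' + Q_i \big) H_i + H_i' \big( F_i P_{t-1|t-1} F_i' + Q_i \big) \nabla_{\theta_j} H_i \\
&\quad + H_i' \big( \nabla_{\theta_j} F_i\, P_{t-1|t-1} F_i' + F_i \nabla_{\theta_j} P_{t-1|t-1} F_i' + F_i P_{t-1|t-1} \nabla_{\theta_j} F_i' + \nabla_{\theta_j} Q_i \big) H_i .
\end{aligned}
\]
Then the first order derivative of $f_{it}$ with respect to the $j$-th component of $\theta$ is
\[
\nabla_{\theta_j} f_{it} = - f_{it}\, \frac{\mu^{(i)}_{t|t-1}}{C^{(i)}_{t|t-1}}\, \nabla_{\theta_j} \mu^{(i)}_{t|t-1}
+ f_{it}\, \frac{1}{2 C^{(i)}_{t|t-1}} \left( \frac{\big[\mu^{(i)}_{t|t-1}\big]^2}{C^{(i)}_{t|t-1}} - 1 \right) \nabla_{\theta_j} C^{(i)}_{t|t-1} .
\tag{A.2}
\]
Similarly, the derivative of $\nabla_{\theta_j} \mu^{(i)}_{t|t-1}$ with respect to the $k$-th component of $\theta$ is
\[
\begin{aligned}
\nabla_{\theta_j} \nabla_{\theta_k} \mu^{(i)}_{t|t-1}
&= - \nabla_{\theta_j} \nabla_{\theta_k} H_i' \big( G_i + F_i x_{t-1|t-1} \big)
 - \nabla_{\theta_j} H_i' \big( \nabla_{\theta_k} G_i + \nabla_{\theta_k} F_i\, x_{t-1|t-1} + F_i \nabla_{\theta_k} x_{t-1|t-1} \big) \\
&\quad - \nabla_{\theta_k} H_i' \big( \nabla_{\theta_j} G_i + \nabla_{\theta_j} F_i\, x_{t-1|t-1} + F_i \nabla_{\theta_j} x_{t-1|t-1} \big) \\
&\quad - H_i' \big( \nabla_{\theta_j} \nabla_{\theta_k} G_i + \nabla_{\theta_j} \nabla_{\theta_k} F_i\, x_{t-1|t-1} + \nabla_{\theta_j} F_i \nabla_{\theta_k} x_{t-1|t-1} + \nabla_{\theta_k} F_i \nabla_{\theta_j} x_{t-1|t-1} + F_i \nabla_{\theta_j} \nabla_{\theta_k} x_{t-1|t-1} \big) \\
&\quad - \nabla_{\theta_j} \nabla_{\theta_k} A_i' z_t .
\end{aligned}
\tag{A.3}
\]
To shorten the next expression, write $P = P_{t-1|t-1}$. The derivative of $\nabla_{\theta_j} C^{(i)}_{t|t-1}$ with respect to the $k$-th component of $\theta$ is
\[
\begin{aligned}
\nabla_{\theta_j} \nabla_{\theta_k} C^{(i)}_{t|t-1}
&= \nabla_{\theta_j} \nabla_{\theta_k} H_i' \big( F_i P F_i' + Q_i \big) H_i + \nabla_{\theta_j} H_i' \big( F_i P F_i' + Q_i \big) \nabla_{\theta_k} H_i \\
&\quad + \nabla_{\theta_j} H_i' \big( \nabla_{\theta_k} F_i\, P F_i' + F_i \nabla_{\theta_k} P\, F_i' + F_i P \nabla_{\theta_k} F_i' + \nabla_{\theta_k} Q_i \big) H_i \\
&\quad + \nabla_{\theta_k} H_i' \big( F_i P F_i' + Q_i \big) \nabla_{\theta_j} H_i + H_i' \big( F_i P F_i' + Q_i \big) \nabla_{\theta_j} \nabla_{\theta_k} H_i \\
&\quad + H_i' \big( \nabla_{\theta_k} F_i\, P F_i' + F_i \nabla_{\theta_k} P\, F_i' + F_i P \nabla_{\theta_k} F_i' + \nabla_{\theta_k} Q_i \big) \nabla_{\theta_j} H_i \\
&\quad + \nabla_{\theta_k} H_i' \big( \nabla_{\theta_j} F_i\, P F_i' + F_i \nabla_{\theta_j} P\, F_i' + F_i P \nabla_{\theta_j} F_i' + \nabla_{\theta_j} Q_i \big) H_i \\
&\quad + H_i' \big( \nabla_{\theta_j} F_i\, P F_i' + F_i \nabla_{\theta_j} P\, F_i' + F_i P \nabla_{\theta_j} F_i' + \nabla_{\theta_j} Q_i \big) \nabla_{\theta_k} H_i \\
&\quad + H_i' \big( \nabla_{\theta_j} \nabla_{\theta_k} F_i\, P F_i' + \nabla_{\theta_j} F_i \nabla_{\theta_k} P\, F_i' + \nabla_{\theta_j} F_i\, P \nabla_{\theta_k} F_i' \big) H_i \\
&\quad + H_i' \big( \nabla_{\theta_k} F_i \nabla_{\theta_j} P\, F_i' + F_i \nabla_{\theta_j} \nabla_{\theta_k} P\, F_i' + F_i \nabla_{\theta_j} P \nabla_{\theta_k} F_i' \big) H_i \\
&\quad + H_i' \big( \nabla_{\theta_k} F_i\, P \nabla_{\theta_j} F_i' + F_i \nabla_{\theta_k} P \nabla_{\theta_j} F_i' + F_i P \nabla_{\theta_j} \nabla_{\theta_k} F_i' \big) H_i \\
&\quad + H_i' \nabla_{\theta_j} \nabla_{\theta_k} Q_i\, H_i .
\end{aligned}
\tag{A.4}
\]
Then, writing $\mu = \mu^{(i)}_{t|t-1}$ and $C = C^{(i)}_{t|t-1}$ for brevity, the derivative of $\nabla_{\theta_j} f_{it}$ with respect to the $k$-th component of $\theta$ is
\[
\begin{aligned}
\nabla_{\theta_j} \nabla_{\theta_k} f_{it}
&= - \big( \nabla_{\theta_k} f_{it} \big) \frac{\mu}{C} \nabla_{\theta_j} \mu
 - f_{it} \frac{\mu}{C} \nabla_{\theta_j} \nabla_{\theta_k} \mu
 - f_{it} \left( \frac{\nabla_{\theta_k} \mu}{C} - \frac{\mu \nabla_{\theta_k} C}{C^2} \right) \nabla_{\theta_j} \mu \\
&\quad + \big( \nabla_{\theta_k} f_{it} \big) \frac{1}{2C} \left( \frac{\mu^2}{C} - 1 \right) \nabla_{\theta_j} C
 - f_{it} \frac{\nabla_{\theta_k} C}{2 C^2} \left( \frac{\mu^2}{C} - 1 \right) \nabla_{\theta_j} C \\
&\quad + f_{it} \frac{1}{2C} \left( \frac{2 \mu \nabla_{\theta_k} \mu}{C} - \frac{\mu^2 \nabla_{\theta_k} C}{C^2} \right) \nabla_{\theta_j} C
 + f_{it} \frac{1}{2C} \left( \frac{\mu^2}{C} - 1 \right) \nabla_{\theta_j} \nabla_{\theta_k} C .
\end{aligned}
\]

B Proofs

Proof of Lemma 1. Equation (4.1) can be written as
\[
\xi_{t+1|t} = p + \rho \frac{A_t}{B_t},
\tag{B.1}
\]
where $A_t = f_{2t} (\xi_{t|t-1} - 1)$ and $B_t = (f_{1t} - f_{2t}) \xi_{t|t-1} + f_{2t}$. Let a bar (e.g. $\bar\xi_{t|t-1}$) denote that the quantity is evaluated at $(\beta', \delta', \delta')$.

Consider Lemma 1.1. Because $\bar f_{1t} = \bar f_{2t} = \bar f_t$, it follows that
\[
\bar A_t = \bar f_t (\bar\xi_{t|t-1} - 1) \quad \text{and} \quad \bar B_t = \bar f_t .
\tag{B.2}
\]
Plugging this into (B.1), we have $\bar\xi_{t+1|t} = p + \rho (\bar\xi_{t|t-1} - 1)$. This, together with (2.13), implies $\bar\xi_{2|1} = p + \rho (\bar\xi_{1|0} - 1) = p + \rho (\xi_* - 1) = \xi_*$, where the last equality follows from the definitions of $\rho$ and $\xi_*$. This can be iterated forward, leading to $\bar\xi_{t+1|t} = \xi_*$ for all $t \geq 1$.

Consider Lemma 1.2. Differentiating (B.1) with respect to the $j$-th component of $\theta$, we have
\[
\nabla_{\theta_j} \xi_{t+1|t} = \rho \left\{ \frac{\nabla_{\theta_j} A_t}{B_t} - \frac{A_t \nabla_{\theta_j} B_t}{B_t^2} \right\},
\tag{B.3}
\]
where
\[
\nabla_{\theta_j} A_t = \nabla_{\theta_j} f_{2t} (\xi_{t|t-1} - 1) + f_{2t} \nabla_{\theta_j} \xi_{t|t-1}
\]
and
\[
\nabla_{\theta_j} B_t = (\nabla_{\theta_j} f_{1t} - \nabla_{\theta_j} f_{2t}) \xi_{t|t-1} + (f_{1t} - f_{2t}) \nabla_{\theta_j} \xi_{t|t-1} + \nabla_{\theta_j} f_{2t} .
\]
Below, we evaluate the right hand side of (B.3) at $(\beta', \delta', \delta')$ for two possible situations.

(1) If $j \in I_0$, then $\nabla_{\theta_j} \bar f_{1t} = \nabla_{\theta_j} \bar f_{2t}$ and $\bar f_{1t} = \bar f_{2t} = \bar f_t$. Consequently
\[
\nabla_{\theta_j} \bar A_t = \nabla_{\theta_j} \bar f_{2t} (\xi_* - 1) + \bar f_t \nabla_{\theta_j} \bar\xi_{t|t-1}, \qquad \nabla_{\theta_j} \bar B_t = \nabla_{\theta_j} \bar f_{2t} .
\tag{B.4}
\]
Combining (B.4) with (B.2), we have $\nabla_{\theta_j} \bar\xi_{t+1|t} = \rho \nabla_{\theta_j} \bar\xi_{t|t-1}$. This implies that, at $t = 1$, $\nabla_{\theta_j} \bar\xi_{2|1} = \rho \nabla_{\theta_j} \bar\xi_{1|0} = \rho \nabla_{\theta_j} \xi_* = 0$.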
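This can be iterated forward, leading to $\nabla_{\theta_j} \bar\xi_{t|t-1} = 0$.

(2) If $j \in I_1$ or $j \in I_2$, then $\bar f_{1t} = \bar f_{2t} = \bar f_t$ and
\[
\nabla_{\theta_j} \bar A_t = \nabla_{\theta_j} \bar f_{2t} (\xi_* - 1) + \bar f_t \nabla_{\theta_j} \bar\xi_{t|t-1}, \qquad \nabla_{\theta_j} \bar B_t = \xi_* \nabla_{\theta_j} \bar f_{1t} + (1 - \xi_*) \nabla_{\theta_j} \bar f_{2t} .
\tag{B.5}
\]
Combining this with (B.2), we have
\[
\begin{aligned}
\nabla_{\theta_j} \bar\xi_{t+1|t}
&= \rho \left\{ \nabla_{\theta_j} \bar\xi_{t|t-1} - (\xi_* - 1) \xi_* \left( \frac{\nabla_{\theta_j} \bar f_{1t}}{\bar f_t} - \frac{\nabla_{\theta_j} \bar f_{2t}}{\bar f_t} \right) \right\} \\
&= \rho \nabla_{\theta_j} \bar\xi_{t|t-1} + r \left( \frac{\nabla_{\theta_j} \bar f_{1t}}{\bar f_t} - \frac{\nabla_{\theta_j} \bar f_{2t}}{\bar f_t} \right),
\end{aligned}
\]
where $r = \rho (1 - \xi_*) \xi_*$. Note that $\nabla_{\theta_j} \bar\xi_{t+1|t}$ can also be written as
\[
\nabla_{\theta_j} \bar\xi_{t+1|t} = r \sum_{s=0}^{t-1} \rho^s \left( \frac{\nabla_{\theta_j} \bar f_{1,t-s}}{\bar f_{t-s}} - \frac{\nabla_{\theta_j} \bar f_{2,t-s}}{\bar f_{t-s}} \right).
\tag{B.6}
\]
Because
\[
\left( \frac{\nabla_{\theta_j} \bar f_{1t}}{\bar f_t} - \frac{\nabla_{\theta_j} \bar f_{2t}}{\bar f_t} \right) = - \left( \frac{\nabla_{\theta_{j - n_\delta}} \bar f_{1t}}{\bar f_t} - \frac{\nabla_{\theta_{j - n_\delta}} \bar f_{2t}}{\bar f_t} \right)
\tag{B.7}
\]
when $j \in I_2$, we have $\nabla_{\theta_j} \bar\xi_{t+1|t} = - \nabla_{\theta_{j - n_\delta}} \bar\xi_{t+1|t}$. In addition, from (2.13) and (B.6), we have
\[
\begin{aligned}
\nabla_{\theta_j} \bar\xi_{t|t}
&= (1 - \xi_*) \xi_* \sum_{s=0}^{t-1} \rho^s \left( \frac{\nabla_{\theta_j} \bar f_{1,t-s}}{\bar f_{t-s}} - \frac{\nabla_{\theta_j} \bar f_{2,t-s}}{\bar f_{t-s}} \right) \\
&= \rho \nabla_{\theta_j} \bar\xi_{t-1|t-1} + (1 - \xi_*) \xi_* \left( \frac{\nabla_{\theta_j} \bar f_{1t}}{\bar f_t} - \frac{\nabla_{\theta_j} \bar f_{2t}}{\bar f_t} \right)
\end{aligned}
\tag{B.8}
\]
when $j \in I_a$, $a = 1, 2$, and $\nabla_{\theta_j} \bar\xi_{t|t} = - \nabla_{\theta_{j - n_\delta}} \bar\xi_{t|t}$ when $j \in I_2$.

Consider Lemma 1.3. Differentiating (B.3) with respect to $\theta_k$:
\[
\nabla_{\theta_j} \nabla_{\theta_k} \xi_{t+1|t} = \rho \left\{ \frac{\nabla_{\theta_j} \nabla_{\theta_k} A_t}{B_t} - \frac{\nabla_{\theta_j} A_t \nabla_{\theta_k} B_t}{B_t^2} - \frac{\nabla_{\theta_k} A_t \nabla_{\theta_j} B_t}{B_t^2} - \frac{A_t \nabla_{\theta_j} \nabla_{\theta_k} B_t}{B_t^2} + 2 \frac{A_t \nabla_{\theta_j} B_t \nabla_{\theta_k} B_t}{B_t^3} \right\},
\tag{B.9}
\]
where
\[
\nabla_{\theta_j} \nabla_{\theta_k} A_t = \nabla_{\theta_j} \nabla_{\theta_k} f_{2t} (\xi_{t|t-1} - 1) + \nabla_{\theta_j} f_{2t} \nabla_{\theta_k} \xi_{t|t-1} + \nabla_{\theta_k} f_{2t} \nabla_{\theta_j} \xi_{t|t-1} + f_{2t} \nabla_{\theta_j} \nabla_{\theta_k} \xi_{t|t-1},
\]
\[
\begin{aligned}
\nabla_{\theta_j} \nabla_{\theta_k} B_t
&= (\nabla_{\theta_j} \nabla_{\theta_k} f_{1t} - \nabla_{\theta_j} \nabla_{\theta_k} f_{2t}) \xi_{t|t-1} + (\nabla_{\theta_j} f_{1t} - \nabla_{\theta_j} f_{2t}) \nabla_{\theta_k} \xi_{t|t-1} \\
&\quad + (\nabla_{\theta_k} f_{1t} - \nabla_{\theta_k} f_{2t}) \nabla_{\theta_j} \xi_{t|t-1} + (f_{1t} - f_{2t}) \nabla_{\theta_j} \nabla_{\theta_k} \xi_{t|t-1} + \nabla_{\theta_j} \nabla_{\theta_k} f_{2t} .
\end{aligned}
\]
We now evaluate the right hand side of (B.9) at $\delta_1 = \delta_2 = \delta$ under three possible situations.

(1) If $j \in I_0$ and $k \in I_0$, then $\bar f_{1t} = \bar f_{2t} = \bar f_t$, $\nabla_{\theta_j} \bar f_{1t} = \nabla_{\theta_j} \bar f_{2t}$, $\nabla_{\theta_k} \bar f_{1t} = \nabla_{\theta_k} \bar f_{2t}$, $\nabla_{\theta_j} \nabla_{\theta_k} \bar f_{1t} = \nabla_{\theta_j} \nabla_{\theta_k} \bar f_{2t}$ and $\nabla_{\theta_j} \bar\xi_{t+1|t} = \nabla_{\theta_k} \bar\xi_{t+1|t} = 0$, implying
\[
\nabla_{\theta_j} \nabla_{\theta_k} \bar A_t = \nabla_{\theta_j} \nabla_{\theta_k} \bar f_{2t} (\bar\xi_{t|t-1} - 1) + \bar f_t \nabla_{\theta_j} \nabla_{\theta_k} \bar\xi_{t|t-1}
\quad \text{and} \quad
\nabla_{\theta_j} \nabla_{\theta_k} \bar B_t = \nabla_{\theta_j} \nabla_{\theta_k} \bar f_{2t} .
\]
Combining them with (B.4) and (B.2), $\nabla_{\theta_j} \nabla_{\theta_k} \bar\xi_{t+1|t}$ equals
\[
\begin{aligned}
&\rho \bigg\{ \frac{\nabla_{\theta_j} \nabla_{\theta_k} \bar f_{2t} (\bar\xi_{t|t-1} - 1) + \bar f_t \nabla_{\theta_j} \nabla_{\theta_k} \bar\xi_{t|t-1}}{\bar f_t}
 - \frac{(\bar\xi_{t|t-1} - 1) \nabla_{\theta_j} \bar f_{2t} \nabla_{\theta_k} \bar f_{2t}}{\bar f_t^2}
 - \frac{(\bar\xi_{t|t-1} - 1) \nabla_{\theta_k} \bar f_{2t} \nabla_{\theta_j} \bar f_{2t}}{\bar f_t^2} \\
&\qquad - \frac{(\bar\xi_{t|t-1} - 1) \nabla_{\theta_j} \nabla_{\theta_k} \bar f_{2t}}{\bar f_t}
 + 2 \frac{(\bar\xi_{t|t-1} - 1) \nabla_{\theta_j} \bar f_{2t} \nabla_{\theta_k} \bar f_{2t}}{\bar f_t^2} \bigg\}
 = \rho \nabla_{\theta_j} \nabla_{\theta_k} \bar\xi_{t|t-1} .
\end{aligned}
\]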
35 Starting at t = 1 and iterating forward, we have ∇θj ∇θk ξ¯t+1|t = 0 for all t ≥ 1. (2) If j ∈ I0 and k ∈ I1 , then ∇θj f¯1t = ∇θj f¯2t and ∇θj ξ¯t+1|t = 0, which imply that ∇θj ∇θk Āt = ∇θj ∇θk f¯2t (ξ∗ − 1) + ∇θj f¯2t ∇θk ξ¯t|t−1 + f¯t ∇θj ∇θk ξ¯t|t−1 and ∇θj ∇θk B̄t = ξ∗ ∇θj ∇θk f¯1t + (1 − ξ∗ )∇θj ∇θk f¯2t . Combing these two equations with (B.2), (B.4) and (B.5), ∇θj ∇θk ξ¯t+1|t equals 1 1 ∇θj ∇θk f¯2t (ξ∗ − 1) + ∇θj f¯2t ∇θk ξ¯t|t−1 + f¯t ∇θj ∇θk ξ¯t|t−1 − 2 ∇θj f¯2t (ξ∗ − 1) ξ∗ ∇θk f¯1t + (1 − ξ∗ )∇θk f¯2t f¯t f¯t 1 1 − 2 ∇θj f¯2t ∇θk f¯2t (ξ∗ − 1) + f¯t ∇θk ξ¯t|t−1 − (ξ∗ − 1) ξ∗ ∇θj ∇θk f¯1t + (1 − ξ∗ )∇θj ∇θk f¯2t ¯ ¯ ft ft ρ 1 +2(ξ∗ − 1) 2 ∇θj f¯2t ξ∗ ∇θk f¯1t + (1 − ξ∗ )∇θk f¯2t ¯ ft . The result follows from rearranging the terms. For the case j ∈ I0 and k ∈ I2 , we have the same result. (3) If j ∈ I1 and k ∈ I1 , then ∇θj ∇θk Āt = ∇θj ∇θk f¯2t (ξ∗ − 1) + ∇θj f¯2t ∇θk ξ¯t|t−1 + ∇θk f¯2t ∇θj ξ¯t|t−1 + f¯t ∇θj ∇θk ξ¯t|t−1 and ∇θj ∇θk B̄t = ξ∗ ∇θj ∇θk f¯1t + (1 − ξ∗ )∇θj ∇θk f¯2t + (∇θj f¯1t − ∇θj f¯2t )∇θk ξ¯t|t−1 + (∇θk f¯1t − ∇θk f¯2t )∇θj ξ¯t|t−1 . Applying the similar derivative above, we have ∇θj ∇θk ξ¯t+1|t equals 1 ∇θj ∇θk f¯2t (ξ∗ − 1) + ∇θj f¯2t ∇θk ξ¯t|t−1 + ∇θk f¯2t ∇θj ξ¯t|t−1 + ∇θj ∇θk ξ¯t|t−1 ¯ ft 1 − 2 ∇θj f¯2t (ξ∗ − 1) + f¯t ∇θj ξ¯t|t−1 ξ∗ ∇θk f¯1t + (1 − ξ∗ )∇θk f¯2t f¯ ρ t 1 − 2 ξ∗ ∇θj f¯1t + (1 − ξ∗ )∇θj f¯2t ∇θk f¯2t (ξ∗ − 1) + f¯t ∇θk ξ¯t|t−1 ¯ ft 1 − (ξ∗ − 1) ξ∗ ∇θj ∇θk f¯1t + (1 − ξ∗ )∇θj ∇θk f¯2t + (∇θj f¯1t − ∇θj f¯2t )∇θk ξ¯t|t−1 + (∇θk f¯1t − ∇θk f¯2t )∇θj ξ¯t|t−1 f¯t +2(ξ∗ − 1) 1 ξ∗ ∇θj f¯1t + (1 − ξ∗ )∇θj f¯2t ξ∗ ∇θk f¯1t + (1 − ξ∗ )∇θk f¯2t 2 f¯ . t The result follows from rearranging the terms. For the cases j ∈ I1 and k ∈ I2 , and j ∈ I2 and k ∈ I2 , we have the same result. 36 Consider Lemma 1.4. 
Differentiating (B.9) with respect to θl : ∇θj ∇θk ∇θl ξt+1|t =ρ − − ∇θj ∇θk ∇θl At Bt ∇θk ∇θl At ∇θj Bt Bt2 ∇θl At ∇θj ∇θk Bt + − − − Bt2 2∇θl At ∇θj Bt ∇θk Bt Bt3 ∇θj ∇θk At ∇θl Bt Bt2 ∇θk At ∇θj ∇θl Bt Bt2 At ∇θj ∇θk ∇θl Bt + − + + ∇θj ∇θl At ∇θk Bt − ∇θj At ∇θk ∇θl Bt Bt2 + 2∇θj At ∇θk Bt ∇θl Bt Bt3 2∇θk At ∇θj Bt ∇θl Bt Bt3 2At ∇θj ∇θk Bt ∇θl Bt Bt2 2At ∇θj ∇θl Bt ∇θk Bt Bt3 Bt2 + Bt3 2At ∇θj Bt ∇θk ∇θl Bt Bt3 − 6At ∇θj Bt ∇θk Bt ∇θl Bt Bt4 , where ∇θj ∇θk ∇θl At = ∇θj ∇θk ∇θl f2t (ξt|t−1 − 1) + ∇θj ∇θl f2t ∇θk ξt|t−1 + ∇θk ∇θl f2t ∇θj ξt|t−1 + ∇θl f2t ∇θj ∇θk ξt|t−1 + ∇θj ∇θk f2t ∇θl ξt|t−1 + ∇θj f2t ∇θk ∇θl ξt|t−1 + ∇θk f2t ∇θj ∇θl ξt|t−1 + f2t ∇θj ∇θk ∇θl ξt|t−1 and ∇θj ∇θk ∇θl Bt = (∇θj ∇θk ∇θl f1t − ∇θj ∇θk ∇θl f2t )ξt|t−1 + (∇θj ∇θl f1t − ∇θj ∇θl f2t )∇θk ξt|t−1 + (∇θk ∇θl f1t − ∇θk ∇θl f2t )∇θj ξt|t−1 + (∇θl f1t − ∇θl f2t )∇θj ∇θk ξt|t−1 + ∇θj ∇θk ∇θl f2t + (∇θj ∇θk f1t − ∇θj ∇θk f2t )∇θl ξt|t−1 + (∇θj f1t − ∇θj f2t )∇θk ∇θl ξt|t−1 + (∇θk f1t − ∇θk f2t )∇θj ∇θl ξt|t−1 + (f1t − f2t )∇θj ∇θk ∇θl ξt|t−1 . We now evaluate the above terms at δ1 = δ2 = δ for 4 possible cases. We only report the values of Ējkl,t but omit the derivation details. (1) If j ∈ I0 , k ∈ I0 and l ∈ I0 , then Ējkl,t = 0. (2) If j ∈ I0 , k ∈ I0 and l ∈ / I0 , then Ējkl,t equals r 1 1 1 ∇θj ∇θk ∇θl f¯1t − ∇θj ∇θk ∇θl f¯2t − 2 ∇θj ∇θk f¯2t (∇θl f¯1t − ∇θl f¯2t ) − 2 ∇θk f¯2t (∇θj ∇θl f¯1t − ∇θj ∇θl f¯2t ) f¯t f¯t f¯t 1 1 − 2 ∇θj f¯2t (∇θk ∇θl f¯1t − ∇θk ∇θl f¯2t ) + 2 3 ∇θj f¯2t ∇θk f¯2t (∇θl f¯1t − ∇θl f¯2t ) ¯ ¯ ft ft 37 . 
(3) If j ∈ I0 , k ∈ / I0 and l ∈ / I0 , then Ējkl,t equals r ∇θj ∇θl ∇θk f¯2t ∇θj ∇θl ∇θk f¯1t − f¯t f¯t " + ρ(1 − 2ξ∗ ) ∇θj ∇θk f¯1t − ∇θj ∇θk f¯2t ∇θl f¯1t − ∇θl f¯2t ∇θj ∇θl f¯1t − ∇θj ∇θl f¯2t ∇θk ξ̄t|t−1 + ∇θl ξ̄t|t−1 + ∇θj ∇θk ξ̄t|t−1 ¯ ¯ ft ft f¯t ∇θj f¯2t ∇θl f¯1t − ∇θl f¯2t ∇θj f¯2t ∇θk f¯1t − ∇θk f¯2t ∇θk f¯1t − ∇θk f¯2t ξ̄ − + ∇θj ∇θl ξ̄t|t−1 − ∇ ∇θl ξ̄t|t−1 θ t|t−1 k f¯t f¯t2 f¯t2 " −r ∇θj f¯2t ∇θk ∇θl f¯1t − ∇θk ∇θl f¯2t f¯t ∇θj ∇θk f¯2t ∇θl f¯1t − ∇θl f¯2t + f¯t " − 2rξ∗ ∇θj ∇θk f¯2t ∇θl f¯1t − ∇θl f¯2t − f¯2 t + 2r(1 − 2ξ∗ ) ∇θk f¯2t ∇θj ∇θl f¯1t − ∇θj ∇θl f¯2t + f¯t ∇θj ∇θl f¯2t ∇θk f¯1t − ∇θk f¯2t + f¯t ∇θj ∇θk f¯1t ∇θl f¯1t − ∇θl f¯2t f¯2 t ∇θl f¯2t ∇θj ∇θk f¯1t − ∇θj ∇θk f¯2t + f¯t # ∇θj ∇θl f¯1t ∇θk f¯1t − ∇θk f¯2t + f¯2 t ∇θj ∇θl f¯2t ∇θk f¯1t − ∇θk f¯2t − f¯2 # t ∇θj f¯2t ∇θk f¯2t ∇θl f¯1t ∇θj f¯2t ∇θk f¯1t ∇θl f¯2t + 3 ¯ f f¯3 t # + 4r t ξ∗ ∇θj f¯2t ∇θl f¯1t ∇θk f¯1t (1 − ξ∗ )∇θj f¯2t ∇θl f¯2t ∇θk f¯2t − 3 ¯ f f¯3 t 38 t (4) If j ∈ / I0 , k ∈ / I0 and l ∈ / I0 , then Ējkl,t equals r ∇θj ∇θl ∇θk f¯2t ∇θj ∇θl ∇θk f¯1t − f¯t f¯t " + ρ(1 − 2ξ∗ ) ∇θj ∇θk f¯1t − ∇θj ∇θk f¯2t ∇θk ∇θl f¯1t − ∇θk ∇θl f¯2t ∇θj ∇θl f¯1t − ∇θj ∇θl f¯2t ∇θk ξ̄t|t−1 + ∇θl ξ̄t|t−1 + ∇θj ξ̄t|t−1 ¯ ¯ ft ft f¯t ∇θk f¯1t − ∇θk f¯2t ∇θj f¯1t − ∇θj f¯2t ∇θl f¯1t − ∇θl f¯2t ∇θj ∇θk ξ̄t|t−1 + ∇θj ∇θl ξ̄t|t−1 + ∇θk ∇θl ξ̄t|t−1 + f¯t f¯t f¯t " −r ∇θj f¯2t ∇θk ∇θl f¯1t − ∇θk ∇θl f¯2t f¯t ∇θj ∇θk f¯2t ∇θl f¯1t − ∇θl f¯2t + f¯t " − 2rξ∗ t ∇θj ∇θk f¯2t ∇θl f¯1t − ∇θl f¯2t − f¯2 t " − 2ρ ∇θk f¯2t ∇θj ∇θl f¯1t − ∇θj ∇θl f¯2t + f¯t ∇θj ∇θl f¯2t ∇θk f¯1t − ∇θk f¯2t + f¯t ∇θj ∇θk f¯1t ∇θl f¯1t − ∇θl f¯2t f¯2 ∇θj ∇θl f¯2t ∇θk f¯1t − ∇θk f¯2t − f¯2 t # ∇θk ∇θl f¯1t ∇θj f¯1t − ∇θj f¯2t + f¯2 t ∇θk ∇θl f¯2t ∇θj f¯1t − ∇θj f¯2t − f¯2 # t ∇θj f¯1t ∇θl f¯1t ∇θk f¯1t ∇θl f¯1t ∇θj f¯1t ∇θk f¯1t ∇θl ξ̄t|t−1 + ∇θk ξ̄t|t−1 + ∇θj ξ̄t|t−1 2 2 f¯ f¯ f¯2 t " − 6ξ∗ + 1) t # t ∇θj f¯1t ∇θk f¯2t ∇θj f¯1t ∇θl f¯2t ∇θk f¯1t ∇θj f¯2t ∇θl ξ̄t|t−1 + ∇θk ξ̄t|t−1 + ∇θl ξ̄t|t−1 2 2 ¯ ¯ f f f¯2 t t + ∇θj f¯1t − ∇θj f¯2t 
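∇θk f¯1t − ∇θk f¯2t ∇θl f¯1t − ∇θl f¯2t ∇θk ξ̄t|t−1 ∇θl ξ̄t|t−1 + ∇θj ξ̄t|t−1 ∇θl ξ̄t|t−1 + ∇θj ξ̄t|t−1 ∇θk ξ̄t|t−1 ¯ ¯ ft ft f¯t + ρ(6ξ∗2 − 4ξ∗ ) − t ∇θl f¯2t ∇θj ∇θk f¯1t − ∇θj ∇θk f¯2t + f¯t ∇θk ∇θl f¯2t ∇θj f¯1t − ∇θj f¯2t + f¯t ∇θj ∇θl f¯1t ∇θk f¯1t − ∇θk f¯2t + f¯2 " ρ(6ξ∗2 # t ∇θk f¯1t ∇θl f¯2t ∇θl f¯1t ∇θk f¯2t ∇θl f¯1t ∇θj f¯2t ∇θj ξ̄t|t−1 + ∇θk ξ̄t|t−1 + ∇θj ξ̄t|t−1 2 2 f¯ f¯ f¯2 t t " 2 + ρ[6(1 − ξ∗ ) − 4(1 − ξ∗ )] t t t t + r[6(1 − ξ∗ )2 − 4(1 − ξ∗ )] ∇θj f¯2t ∇θk f¯2t ∇θj f¯2t ∇θl f¯2t ∇θk f¯2t ∇θl f¯2t ∇θl ξ̄t|t−1 + ∇θk ξ̄t|t−1 + ∇θj ξ̄t|t−1 2 2 ¯ ¯ f f f¯2 ∇θj f¯1t ∇θk f¯1t ∇θl f¯1t − r(6ξ∗2 − 4ξ∗ ) + 6rξ∗2 f¯3 # t ∇θj f¯1t ∇θk f¯1t ∇θl f¯2t ∇θj f¯1t ∇θk f¯2t ∇θl f¯1t ∇θj f¯2t ∇θk f¯1t ∇θl f¯1t + + f¯3 f¯3 f¯3 t t t t ∇θj f¯1t ∇θk f¯2t ∇θl f¯2t ∇θj f¯2t ∇θk f¯2t ∇θl f¯1t ∇θj f¯2t ∇θk f¯1t ∇θl f¯2t + + f¯3 f¯3 f¯3 t # t ∇θj f¯2t ∇θk f¯2t ∇θl f¯2t − 6r(1 − ξ∗ )2 . f¯3 t

Proof of Lemma 2. When $\delta_1 = \delta_2$, we can directly apply Proposition 13.1 in Hamilton (1994) to show that the sequence of predicted mean squared errors $\bar P_{t|t-1}$ converges. Then, by (2.10) and (2.12), $\bar P_{t|t}$ also converges to a positive semidefinite matrix $\bar P_*$. Let $P_{0|0} = \bar P_*$; then the results in this lemma hold, and (4.4) is obtained by combining (2.6) and (2.10) with (2.12) when $\delta_1 = \delta_2$. In addition, by (2.12) and (2.6), $\bar P_{t|t}$ and $\bar P^{(i)}_{t+1|t}$ equal $\bar P_*$ and $\bar F \bar P_* \bar F' + \bar Q$, respectively, for $t = 1, ..., T$.

Proof of Lemma 3. To show Lemma 3.1, combine (2.8) and (2.10) to obtain

$$
P^{(i)}_{t|t} = \left(I - P^{(i)}_{t|t-1} H_i \left[H_i' P^{(i)}_{t|t-1} H_i\right]^{-1} H_i'\right) P^{(i)}_{t|t-1}.
$$

The first order derivative of $P^{(i)}_{t|t}$ with respect to $\theta_j$ gives

$$
\begin{aligned}
\nabla_{\theta_j} P^{(i)}_{t|t} ={}& \left(I - \frac{P^{(i)}_{t|t-1} H_i H_i'}{H_i' P^{(i)}_{t|t-1} H_i}\right) \nabla_{\theta_j} P^{(i)}_{t|t-1} \left(I - \frac{H_i H_i' P^{(i)}_{t|t-1}}{H_i' P^{(i)}_{t|t-1} H_i}\right) \\
&- \left[\frac{P^{(i)}_{t|t-1}\left(\nabla_{\theta_j} H_i H_i' + H_i \nabla_{\theta_j} H_i'\right)}{H_i' P^{(i)}_{t|t-1} H_i} - \frac{P^{(i)}_{t|t-1} H_i H_i' \left(\nabla_{\theta_j} H_i' P^{(i)}_{t|t-1} H_i + H_i' P^{(i)}_{t|t-1} \nabla_{\theta_j} H_i\right)}{\left(H_i' P^{(i)}_{t|t-1} H_i\right)^2}\right] P^{(i)}_{t|t-1}.
\end{aligned}
\tag{B.10}
$$

The first order derivative of $P^{(i)}_{t|t-1}$ in (2.6) with respect to $\theta_j$ gives

$$
\nabla_{\theta_j} P^{(i)}_{t|t-1} = F_i \nabla_{\theta_j} P_{t-1|t-1} F_i' + \nabla_{\theta_j} F_i P_{t-1|t-1} F_i' + F_i P_{t-1|t-1} \nabla_{\theta_j} F_i' + \nabla_{\theta_j} Q_i .
$$

By the last two equations, we have

$$
\begin{aligned}
\nabla_{\theta_j} P^{(i)}_{t|t} ={}& \left(I - \frac{P^{(i)}_{t|t-1} H_i H_i'}{H_i' P^{(i)}_{t|t-1} H_i}\right) F_i \nabla_{\theta_j} P_{t-1|t-1} F_i' \left(I - \frac{H_i H_i' P^{(i)}_{t|t-1}}{H_i' P^{(i)}_{t|t-1} H_i}\right) \\
&+ \left(I - \frac{P^{(i)}_{t|t-1} H_i H_i'}{H_i' P^{(i)}_{t|t-1} H_i}\right) \left[\nabla_{\theta_j} F_i P_{t-1|t-1} F_i' + F_i P_{t-1|t-1} \nabla_{\theta_j} F_i' + \nabla_{\theta_j} Q_i\right] \left(I - \frac{H_i H_i' P^{(i)}_{t|t-1}}{H_i' P^{(i)}_{t|t-1} H_i}\right) \\
&- \left[\frac{P^{(i)}_{t|t-1}\left(\nabla_{\theta_j} H_i H_i' + H_i \nabla_{\theta_j} H_i'\right)}{H_i' P^{(i)}_{t|t-1} H_i} - \frac{P^{(i)}_{t|t-1} H_i H_i' \left(\nabla_{\theta_j} H_i' P^{(i)}_{t|t-1} H_i + H_i' P^{(i)}_{t|t-1} \nabla_{\theta_j} H_i\right)}{\left(H_i' P^{(i)}_{t|t-1} H_i\right)^2}\right] P^{(i)}_{t|t-1}.
\end{aligned}
\tag{B.11}
$$

Let $\delta_1 = \delta_2$; then

$$
\begin{aligned}
\nabla_{\theta_j} \bar P^{(i)}_{t|t} ={}& \left[\left(I - \frac{\bar P \bar H \bar H'}{\bar H' \bar P \bar H}\right) \bar F\right] \nabla_{\theta_j} \bar P_{t-1|t-1} \left[\bar F' \left(I - \frac{\bar H \bar H' \bar P}{\bar H' \bar P \bar H}\right)\right] \\
&+ \left(I - \frac{\bar P \bar H \bar H'}{\bar H' \bar P \bar H}\right) \left[\nabla_{\theta_j} \bar F_i \bar P_* \bar F' + \bar F \bar P_* \nabla_{\theta_j} \bar F_i' + \nabla_{\theta_j} \bar Q_i\right] \left(I - \frac{\bar H \bar H' \bar P}{\bar H' \bar P \bar H}\right) \\
&- \left[\frac{\bar P \left(\nabla_{\theta_j} \bar H_i \bar H' + \bar H \nabla_{\theta_j} \bar H_i'\right)}{\bar H' \bar P \bar H} - \frac{\bar P \bar H \bar H' \left(\nabla_{\theta_j} \bar H_i' \bar P \bar H + \bar H' \bar P \nabla_{\theta_j} \bar H_i\right)}{\left(\bar H' \bar P \bar H\right)^2}\right] \left[\bar F \bar P_* \bar F' + \bar Q\right],
\end{aligned}
\tag{B.12}
$$

where

$$
\left(I - \frac{\bar P \bar H \bar H'}{\bar H' \bar P \bar H}\right) \bar F \quad \text{and} \quad \bar F' \left(I - \frac{\bar H \bar H' \bar P}{\bar H' \bar P \bar H}\right)
$$

have all their eigenvalues inside the unit circle by Assumption 4. Taking the first order derivative with respect to $\theta_j$ in (2.12) and letting $\delta_1 = \delta_2$, we have

$$
\nabla_{\theta_j} \bar P_{t|t} = \xi_* \nabla_{\theta_j} \bar P^{(1)}_{t|t} + (1 - \xi_*) \nabla_{\theta_j} \bar P^{(2)}_{t|t} .
$$

Together with the result in (B.12),

$$
\begin{aligned}
\nabla_{\theta_j} \bar P_{t|t} ={}& \left[\left(I - \frac{\bar P \bar H \bar H'}{\bar H' \bar P \bar H}\right) \bar F\right] \nabla_{\theta_j} \bar P_{t-1|t-1} \left[\bar F' \left(I - \frac{\bar H \bar H' \bar P}{\bar H' \bar P \bar H}\right)\right] \\
&+ \xi_* \left(I - \frac{\bar P \bar H \bar H'}{\bar H' \bar P \bar H}\right) \left[\nabla_{\theta_j} \bar F_1 \bar P_* \bar F' + \bar F \bar P_* \nabla_{\theta_j} \bar F_1' + \nabla_{\theta_j} \bar Q_1\right] \left(I - \frac{\bar H \bar H' \bar P}{\bar H' \bar P \bar H}\right) \\
&+ (1 - \xi_*) \left(I - \frac{\bar P \bar H \bar H'}{\bar H' \bar P \bar H}\right) \left[\nabla_{\theta_j} \bar F_2 \bar P_* \bar F' + \bar F \bar P_* \nabla_{\theta_j} \bar F_2' + \nabla_{\theta_j} \bar Q_2\right] \left(I - \frac{\bar H \bar H' \bar P}{\bar H' \bar P \bar H}\right) \\
&- \xi_* \left[\frac{\bar P \left(\nabla_{\theta_j} \bar H_1 \bar H' + \bar H \nabla_{\theta_j} \bar H_1'\right)}{\bar H' \bar P \bar H} - \frac{\bar P \bar H \bar H' \left(\nabla_{\theta_j} \bar H_1' \bar P \bar H + \bar H' \bar P \nabla_{\theta_j} \bar H_1\right)}{\left(\bar H' \bar P \bar H\right)^2}\right] \left[\bar F \bar P_* \bar F' + \bar Q\right] \\
&- (1 - \xi_*) \left[\frac{\bar P \left(\nabla_{\theta_j} \bar H_2 \bar H' + \bar H \nabla_{\theta_j} \bar H_2'\right)}{\bar H' \bar P \bar H} - \frac{\bar P \bar H \bar H' \left(\nabla_{\theta_j} \bar H_2' \bar P \bar H + \bar H' \bar P \nabla_{\theta_j} \bar H_2\right)}{\left(\bar H' \bar P \bar H\right)^2}\right] \left[\bar F \bar P_* \bar F' + \bar Q\right],
\end{aligned}
\tag{B.13}
$$

which gives the result in the lemma.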
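As a numerical sanity check on the derivative formula for $P^{(i)}_{t|t}$ used in this proof (the projection-form expression labelled (B.10)), the following minimal sketch compares that expression with a central finite difference. The scalar parameter `theta`, the functions `predicted_cov` and `loading`, and all numerical values are hypothetical choices made only for illustration; they are not taken from the paper.

```python
import numpy as np

# Hypothetical smooth parameterisation (illustration only): a symmetric positive
# definite predicted covariance P_{t|t-1}(theta) and a loading vector H(theta).
def predicted_cov(theta):
    return np.array([[2.0 + theta**2, 0.3 * theta],
                     [0.3 * theta, 1.0 + 0.5 * theta]])

def d_predicted_cov(theta):
    return np.array([[2.0 * theta, 0.3],
                     [0.3, 0.5]])

def loading(theta):
    return np.array([[1.0], [0.5 + theta]])

def d_loading(theta):
    return np.array([[0.0], [1.0]])

def filtered_cov(theta):
    """P_{t|t} = (I - P H [H'PH]^{-1} H') P for a scalar observation."""
    P, H = predicted_cov(theta), loading(theta)
    s = (H.T @ P @ H).item()
    return P - (P @ H @ H.T @ P) / s

def d_filtered_cov(theta):
    """Analytical derivative of P_{t|t} in the projection form of (B.10)."""
    P, dP = predicted_cov(theta), d_predicted_cov(theta)
    H, dH = loading(theta), d_loading(theta)
    s = (H.T @ P @ H).item()
    eye = np.eye(P.shape[0])
    left = eye - (P @ H @ H.T) / s
    right = eye - (H @ H.T @ P) / s
    # Derivative of H'PH that comes through H only; the part that comes
    # through P is already absorbed by the two projection factors.
    ds_h = (dH.T @ P @ H + H.T @ P @ dH).item()
    bracket = (P @ (dH @ H.T + H @ dH.T)) / s - (P @ H @ H.T) * ds_h / s**2
    return left @ dP @ right - bracket @ P

def finite_difference(theta, step=1e-6):
    return (filtered_cov(theta + step) - filtered_cov(theta - step)) / (2.0 * step)

theta0 = 0.4
max_gap = np.max(np.abs(d_filtered_cov(theta0) - finite_difference(theta0)))
print(f"max |analytical - finite difference| = {max_gap:.2e}")
```

A small gap indicates that the two projection factors already carry the dependence of $H_i' P^{(i)}_{t|t-1} H_i$ on $P^{(i)}_{t|t-1}$, which is why only derivatives of $H_i$ appear in the bracketed correction term.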
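To show Lemma 3.2, by (2.5), (2.7), (2.8) and (2.9), we have

$$
x^{(i)}_{t|t} = \left(I - \frac{P^{(i)}_{t|t-1} H_i H_i'}{H_i' P^{(i)}_{t|t-1} H_i}\right) \left(F_i x_{t-1|t-1} + G_i\right) + \frac{P^{(i)}_{t|t-1} H_i}{H_i' P^{(i)}_{t|t-1} H_i} \left(y_t - A_i' z_t\right).
$$

Then the first order derivative of $x^{(i)}_{t|t}$ with respect to $\theta_j$ gives

$$
\begin{aligned}
\nabla_{\theta_j} x^{(i)}_{t|t} ={}& \left(I - \frac{P^{(i)}_{t|t-1} H_i H_i'}{H_i' P^{(i)}_{t|t-1} H_i}\right) F_i \nabla_{\theta_j} x_{t-1|t-1} \\
&+ \left(I - \frac{P^{(i)}_{t|t-1} H_i H_i'}{H_i' P^{(i)}_{t|t-1} H_i}\right) \left(\nabla_{\theta_j} F_i x_{t-1|t-1} + \nabla_{\theta_j} G_i\right) \\
&- \frac{\nabla_{\theta_j} P^{(i)}_{t|t-1} H_i H_i' + P^{(i)}_{t|t-1} \nabla_{\theta_j} H_i H_i' + P^{(i)}_{t|t-1} H_i \nabla_{\theta_j} H_i'}{H_i' P^{(i)}_{t|t-1} H_i} \left(F_i x_{t-1|t-1} + G_i\right) \\
&+ \frac{P^{(i)}_{t|t-1} H_i H_i' \left(\nabla_{\theta_j} H_i' P^{(i)}_{t|t-1} H_i + H_i' \nabla_{\theta_j} P^{(i)}_{t|t-1} H_i + H_i' P^{(i)}_{t|t-1} \nabla_{\theta_j} H_i\right)}{\left(H_i' P^{(i)}_{t|t-1} H_i\right)^2} \left(F_i x_{t-1|t-1} + G_i\right) \\
&+ \left[\frac{\nabla_{\theta_j} P^{(i)}_{t|t-1} H_i + P^{(i)}_{t|t-1} \nabla_{\theta_j} H_i}{H_i' P^{(i)}_{t|t-1} H_i} - \frac{P^{(i)}_{t|t-1} H_i \left(\nabla_{\theta_j} H_i' P^{(i)}_{t|t-1} H_i + H_i' \nabla_{\theta_j} P^{(i)}_{t|t-1} H_i + H_i' P^{(i)}_{t|t-1} \nabla_{\theta_j} H_i\right)}{\left(H_i' P^{(i)}_{t|t-1} H_i\right)^2}\right] \left(y_t - A_i' z_t\right) \\
&- \frac{P^{(i)}_{t|t-1} H_i}{H_i' P^{(i)}_{t|t-1} H_i} \nabla_{\theta_j} A_i' z_t .
\end{aligned}
\tag{B.14}
$$

When $\delta_1 = \delta_2$, we have

$$
\begin{aligned}
\nabla_{\theta_j} \bar x^{(i)}_{t|t} ={}& \left(I - \frac{\bar P \bar H \bar H'}{\bar H' \bar P \bar H}\right) \bar F \nabla_{\theta_j} \bar x_{t-1|t-1} \\
&+ \left(I - \frac{\bar P \bar H \bar H'}{\bar H' \bar P \bar H}\right) \left(\nabla_{\theta_j} \bar F_i \bar x_{t-1|t-1} + \nabla_{\theta_j} \bar G_i\right) \\
&- \frac{\nabla_{\theta_j} \bar P^{(i)}_{t|t-1} \bar H \bar H' + \bar P \nabla_{\theta_j} \bar H_i \bar H' + \bar P \bar H \nabla_{\theta_j} \bar H_i'}{\bar H' \bar P \bar H} \left(\bar F \bar x_{t-1|t-1} + \bar G\right) \\
&+ \frac{\bar P \bar H \bar H' \left(\nabla_{\theta_j} \bar H_i' \bar P \bar H + \bar H' \nabla_{\theta_j} \bar P^{(i)}_{t|t-1} \bar H + \bar H' \bar P \nabla_{\theta_j} \bar H_i\right)}{\left(\bar H' \bar P \bar H\right)^2} \left(\bar F \bar x_{t-1|t-1} + \bar G\right) \\
&+ \left[\frac{\nabla_{\theta_j} \bar P^{(i)}_{t|t-1} \bar H + \bar P \nabla_{\theta_j} \bar H_i}{\bar H' \bar P \bar H} - \frac{\bar P \bar H \left(\nabla_{\theta_j} \bar H_i' \bar P \bar H + \bar H' \nabla_{\theta_j} \bar P^{(i)}_{t|t-1} \bar H + \bar H' \bar P \nabla_{\theta_j} \bar H_i\right)}{\left(\bar H' \bar P \bar H\right)^2}\right] \left(y_t - \bar A' z_t\right) \\
&- \frac{\bar P \bar H}{\bar H' \bar P \bar H} \nabla_{\theta_j} \bar A_i' z_t ,
\end{aligned}
\tag{B.15}
$$

where the expressions for $\nabla_{\theta_j} \bar P^{(1)}_{t|t-1}$ and $\nabla_{\theta_j} \bar P^{(2)}_{t|t-1}$, based on (2.6), are given by

$$
\nabla_{\theta_j} \bar P^{(1)}_{t|t-1} = \nabla_{\theta_j} \bar F_1 \bar P \bar F' + \bar F \nabla_{\theta_j} \bar P_{t-1|t-1} \bar F' + \bar F \bar P \nabla_{\theta_j} \bar F_1' + \nabla_{\theta_j} \bar Q_1 ,
\tag{B.16}
$$

and

$$
\nabla_{\theta_j} \bar P^{(2)}_{t|t-1} = \nabla_{\theta_j} \bar F_2 \bar P \bar F' + \bar F \nabla_{\theta_j} \bar P_{t-1|t-1} \bar F' + \bar F \bar P \nabla_{\theta_j} \bar F_2' + \nabla_{\theta_j} \bar Q_2 .
\tag{B.17}
$$

By (2.11), when $\delta_1 = \delta_2$,

$$
\nabla_{\theta_j} \bar x_{t|t} = \xi_* \nabla_{\theta_j} \bar x^{(1)}_{t|t} + (1 - \xi_*) \nabla_{\theta_j} \bar x^{(2)}_{t|t} .
$$

Therefore,

$$
\nabla_{\theta_j} \bar x_{t|t} = \left[\left(I - \frac{\bar P \bar H \bar H'}{\bar H' \bar P \bar H}\right) \bar F\right] \nabla_{\theta_j} \bar x_{t-1|t-1} + \bar X_{j,t} ,
\tag{B.18}
$$

where

$$
\begin{aligned}
\bar X_{j,t} ={}& \left(I - \frac{\bar P \bar H \bar H'}{\bar H' \bar P \bar H}\right) \left[\xi_* \left(\nabla_{\theta_j} \bar F_1 \bar x_{t-1|t-1} + \nabla_{\theta_j} \bar G_1\right) + (1 - \xi_*) \left(\nabla_{\theta_j} \bar F_2 \bar x_{t-1|t-1} + \nabla_{\theta_j} \bar G_2\right)\right] \\
&- \xi_* \frac{\nabla_{\theta_j} \bar P^{(1)}_{t|t-1} \bar H \bar H' + \bar P \nabla_{\theta_j} \bar H_1 \bar H' + \bar P \bar H \nabla_{\theta_j} \bar H_1'}{\bar H' \bar P \bar H} \left(\bar F \bar x_{t-1|t-1} + \bar G\right) \\
&+ \xi_* \frac{\bar P \bar H \bar H' \left(\nabla_{\theta_j} \bar H_1' \bar P \bar H + \bar H' \nabla_{\theta_j} \bar P^{(1)}_{t|t-1} \bar H + \bar H' \bar P \nabla_{\theta_j} \bar H_1\right)}{\left(\bar H' \bar P \bar H\right)^2} \left(\bar F \bar x_{t-1|t-1} + \bar G\right) \\
&- (1 - \xi_*) \frac{\nabla_{\theta_j} \bar P^{(2)}_{t|t-1} \bar H \bar H' + \bar P \nabla_{\theta_j} \bar H_2 \bar H' + \bar P \bar H \nabla_{\theta_j} \bar H_2'}{\bar H' \bar P \bar H} \left(\bar F \bar x_{t-1|t-1} + \bar G\right) \\
&+ (1 - \xi_*) \frac{\bar P \bar H \bar H' \left(\nabla_{\theta_j} \bar H_2' \bar P \bar H + \bar H' \nabla_{\theta_j} \bar P^{(2)}_{t|t-1} \bar H + \bar H' \bar P \nabla_{\theta_j} \bar H_2\right)}{\left(\bar H' \bar P \bar H\right)^2} \left(\bar F \bar x_{t-1|t-1} + \bar G\right) \\
&+ \xi_* \left[\frac{\nabla_{\theta_j} \bar P^{(1)}_{t|t-1} \bar H + \bar P \nabla_{\theta_j} \bar H_1}{\bar H' \bar P \bar H} - \frac{\bar P \bar H \left(\nabla_{\theta_j} \bar H_1' \bar P \bar H + \bar H' \nabla_{\theta_j} \bar P^{(1)}_{t|t-1} \bar H + \bar H' \bar P \nabla_{\theta_j} \bar H_1\right)}{\left(\bar H' \bar P \bar H\right)^2}\right] \left(y_t - \bar A_1' z_t\right) \\
&+ (1 - \xi_*) \left[\frac{\nabla_{\theta_j} \bar P^{(2)}_{t|t-1} \bar H + \bar P \nabla_{\theta_j} \bar H_2}{\bar H' \bar P \bar H} - \frac{\bar P \bar H \left(\nabla_{\theta_j} \bar H_2' \bar P \bar H + \bar H' \nabla_{\theta_j} \bar P^{(2)}_{t|t-1} \bar H + \bar H' \bar P \nabla_{\theta_j} \bar H_2\right)}{\left(\bar H' \bar P \bar H\right)^2}\right] \left(y_t - \bar A_2' z_t\right) \\
&- \frac{\bar P \bar H}{\bar H' \bar P \bar H} \left[\xi_* \nabla_{\theta_j} \bar A_1' + (1 - \xi_*) \nabla_{\theta_j} \bar A_2'\right] z_t .
\end{aligned}
$$

To show Lemma 3.3, note that by (2.12) the second order derivative of $P_{t|t}$ with respect to $\theta_j$ and $\theta_k$, when evaluated at some $\delta_1 = \delta_2$, is given by

$$
\begin{aligned}
\nabla_{\theta_j} \nabla_{\theta_k} \bar P_{t|t} ={}& \xi_* (1 - \xi_*) \left(\nabla_{\theta_j} \bar x^{(1)}_{t|t} - \nabla_{\theta_j} \bar x^{(2)}_{t|t}\right) \left(\nabla_{\theta_k} \bar x^{(1)}_{t|t} - \nabla_{\theta_k} \bar x^{(2)}_{t|t}\right)' \\
&+ \xi_* (1 - \xi_*) \left(\nabla_{\theta_k} \bar x^{(1)}_{t|t} - \nabla_{\theta_k} \bar x^{(2)}_{t|t}\right) \left(\nabla_{\theta_j} \bar x^{(1)}_{t|t} - \nabla_{\theta_j} \bar x^{(2)}_{t|t}\right)' \\
&+ \nabla_{\theta_j} \bar \xi_{t|t} \left(\nabla_{\theta_k} \bar P^{(1)}_{t|t} - \nabla_{\theta_k} \bar P^{(2)}_{t|t}\right) + \nabla_{\theta_k} \bar \xi_{t|t} \left(\nabla_{\theta_j} \bar P^{(1)}_{t|t} - \nabla_{\theta_j} \bar P^{(2)}_{t|t}\right) \\
&+ \xi_* \nabla_{\theta_j} \nabla_{\theta_k} \bar P^{(1)}_{t|t} + (1 - \xi_*) \nabla_{\theta_j} \nabla_{\theta_k} \bar P^{(2)}_{t|t} .
\end{aligned}
$$

There are three main components in this expression: $\nabla_{\theta_j} \bar x^{(1)}_{t|t} - \nabla_{\theta_j} \bar x^{(2)}_{t|t}$, $\nabla_{\theta_j} \bar P^{(1)}_{t|t} - \nabla_{\theta_j} \bar P^{(2)}_{t|t}$, and $\xi_* \nabla_{\theta_j} \nabla_{\theta_k} \bar P^{(1)}_{t|t} + (1 - \xi_*) \nabla_{\theta_j} \nabla_{\theta_k} \bar P^{(2)}_{t|t}$. The terms $\nabla_{\theta_j} \bar x^{(i)}_{t|t}$ and $\nabla_{\theta_j} \bar P^{(i)}_{t|t}$ are given in (B.15) and (B.12).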
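The second derivative of the collapsed covariance stated above can be traced back to the collapsing step in a few lines. The sketch below assumes that (2.12) has the standard moment-matching form for a two-component mixture; this form is an assumption made here for illustration and is not quoted from the paper, and the shorthand $d_t$ is introduced only for this sketch.

```latex
% Assumed form of (2.12): moment-matching collapse of the two regime-specific filters.
P_{t|t} = \xi_{t|t} P^{(1)}_{t|t} + (1-\xi_{t|t}) P^{(2)}_{t|t}
          + \xi_{t|t}(1-\xi_{t|t})\, d_t d_t',
\qquad d_t \equiv x^{(1)}_{t|t} - x^{(2)}_{t|t}.

% At \delta_1 = \delta_2 the two filters coincide, so \bar d_t = 0,
% \bar P^{(1)}_{t|t} = \bar P^{(2)}_{t|t} and \bar\xi_{t|t} = \xi_*.
% Every term that keeps an undifferentiated d_t or P^{(1)} - P^{(2)} vanishes:
\nabla_{\theta_j}\nabla_{\theta_k}\bigl[\xi_{t|t}(1-\xi_{t|t})\, d_t d_t'\bigr]
  \Big|_{\delta_1=\delta_2}
  = \xi_*(1-\xi_*)\bigl(\nabla_{\theta_j}\bar d_t\,\nabla_{\theta_k}\bar d_t'
      + \nabla_{\theta_k}\bar d_t\,\nabla_{\theta_j}\bar d_t'\bigr),

\nabla_{\theta_j}\nabla_{\theta_k}\bigl[\xi_{t|t} P^{(1)}_{t|t} + (1-\xi_{t|t}) P^{(2)}_{t|t}\bigr]
  \Big|_{\delta_1=\delta_2}
  = \nabla_{\theta_j}\bar\xi_{t|t}\bigl(\nabla_{\theta_k}\bar P^{(1)}_{t|t} - \nabla_{\theta_k}\bar P^{(2)}_{t|t}\bigr)
  + \nabla_{\theta_k}\bar\xi_{t|t}\bigl(\nabla_{\theta_j}\bar P^{(1)}_{t|t} - \nabla_{\theta_j}\bar P^{(2)}_{t|t}\bigr)
  + \xi_*\nabla_{\theta_j}\nabla_{\theta_k}\bar P^{(1)}_{t|t}
  + (1-\xi_*)\nabla_{\theta_j}\nabla_{\theta_k}\bar P^{(2)}_{t|t}.
```

Adding the two pieces reproduces the six terms of the second derivative, with the term in $\nabla_{\theta_j}\nabla_{\theta_k}\bar\xi_{t|t}$ dropping out because it multiplies $\bar P^{(1)}_{t|t} - \bar P^{(2)}_{t|t} = 0$.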
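To study $\nabla_{\theta_j} \nabla_{\theta_k} \bar P^{(i)}_{t|t}$, by (B.11), we have

$$
\nabla_{\theta_j} \nabla_{\theta_k} P^{(i)}_{t|t} = \left(I - \frac{P^{(i)}_{t|t-1} H_i H_i'}{H_i' P^{(i)}_{t|t-1} H_i}\right) F_i \nabla_{\theta_j} \nabla_{\theta_k} P_{t-1|t-1} F_i' \left(I - \frac{H_i H_i' P^{(i)}_{t|t-1}}{H_i' P^{(i)}_{t|t-1} H_i}\right) + P^{(i)}_{jk,t} ,
$$

where

$$
\begin{aligned}
P^{(i)}_{jk,t} ={}& \nabla_{\theta_k} \left[\left(I - \frac{P^{(i)}_{t|t-1} H_i H_i'}{H_i' P^{(i)}_{t|t-1} H_i}\right) F_i\right] \nabla_{\theta_j} P_{t-1|t-1} F_i' \left(I - \frac{H_i H_i' P^{(i)}_{t|t-1}}{H_i' P^{(i)}_{t|t-1} H_i}\right) \\
&+ \left(I - \frac{P^{(i)}_{t|t-1} H_i H_i'}{H_i' P^{(i)}_{t|t-1} H_i}\right) F_i \nabla_{\theta_j} P_{t-1|t-1} \nabla_{\theta_k} \left[F_i' \left(I - \frac{H_i H_i' P^{(i)}_{t|t-1}}{H_i' P^{(i)}_{t|t-1} H_i}\right)\right] \\
&+ \nabla_{\theta_k} \left(I - \frac{P^{(i)}_{t|t-1} H_i H_i'}{H_i' P^{(i)}_{t|t-1} H_i}\right) \left[\nabla_{\theta_j} F_i P_{t-1|t-1} F_i' + F_i P_{t-1|t-1} \nabla_{\theta_j} F_i' + \nabla_{\theta_j} Q_i\right] \left(I - \frac{H_i H_i' P^{(i)}_{t|t-1}}{H_i' P^{(i)}_{t|t-1} H_i}\right) \\
&+ \left(I - \frac{P^{(i)}_{t|t-1} H_i H_i'}{H_i' P^{(i)}_{t|t-1} H_i}\right) \nabla_{\theta_k} \left[\nabla_{\theta_j} F_i P_{t-1|t-1} F_i' + F_i P_{t-1|t-1} \nabla_{\theta_j} F_i' + \nabla_{\theta_j} Q_i\right] \left(I - \frac{H_i H_i' P^{(i)}_{t|t-1}}{H_i' P^{(i)}_{t|t-1} H_i}\right) \\
&+ \left(I - \frac{P^{(i)}_{t|t-1} H_i H_i'}{H_i' P^{(i)}_{t|t-1} H_i}\right) \left[\nabla_{\theta_j} F_i P_{t-1|t-1} F_i' + F_i P_{t-1|t-1} \nabla_{\theta_j} F_i' + \nabla_{\theta_j} Q_i\right] \nabla_{\theta_k} \left(I - \frac{H_i H_i' P^{(i)}_{t|t-1}}{H_i' P^{(i)}_{t|t-1} H_i}\right) \\
&- \nabla_{\theta_k} \left[\frac{P^{(i)}_{t|t-1}\left(\nabla_{\theta_j} H_i H_i' + H_i \nabla_{\theta_j} H_i'\right)}{H_i' P^{(i)}_{t|t-1} H_i} - \frac{P^{(i)}_{t|t-1} H_i H_i' \left(\nabla_{\theta_j} H_i' P^{(i)}_{t|t-1} H_i + H_i' P^{(i)}_{t|t-1} \nabla_{\theta_j} H_i\right)}{\left(H_i' P^{(i)}_{t|t-1} H_i\right)^2}\right] P^{(i)}_{t|t-1} \\
&- \left[\frac{P^{(i)}_{t|t-1}\left(\nabla_{\theta_j} H_i H_i' + H_i \nabla_{\theta_j} H_i'\right)}{H_i' P^{(i)}_{t|t-1} H_i} - \frac{P^{(i)}_{t|t-1} H_i H_i' \left(\nabla_{\theta_j} H_i' P^{(i)}_{t|t-1} H_i + H_i' P^{(i)}_{t|t-1} \nabla_{\theta_j} H_i\right)}{\left(H_i' P^{(i)}_{t|t-1} H_i\right)^2}\right] \nabla_{\theta_k} P^{(i)}_{t|t-1} .
\end{aligned}
$$

Therefore, we have

$$
\xi_* \nabla_{\theta_j} \nabla_{\theta_k} \bar P^{(1)}_{t|t} + (1 - \xi_*) \nabla_{\theta_j} \nabla_{\theta_k} \bar P^{(2)}_{t|t} = \left[\left(I - \frac{\bar P \bar H \bar H'}{\bar H' \bar P \bar H}\right) \bar F\right] \nabla_{\theta_j} \nabla_{\theta_k} \bar P_{t-1|t-1} \left[\bar F' \left(I - \frac{\bar H \bar H' \bar P}{\bar H' \bar P \bar H}\right)\right] + \xi_* \bar P^{(1)}_{jk,t} + (1 - \xi_*) \bar P^{(2)}_{jk,t}
$$

and

$$
\nabla_{\theta_j} \nabla_{\theta_k} \bar P_{t|t} = \left[\left(I - \frac{\bar P \bar H \bar H'}{\bar H' \bar P \bar H}\right) \bar F\right] \nabla_{\theta_j} \nabla_{\theta_k} \bar P_{t-1|t-1} \left[\bar F' \left(I - \frac{\bar H \bar H' \bar P}{\bar H' \bar P \bar H}\right)\right] + \bar P_{jk,t} ,
$$

where

$$
\begin{aligned}
\bar P_{jk,t} ={}& \xi_* \bar P^{(1)}_{jk,t} + (1 - \xi_*) \bar P^{(2)}_{jk,t} \\
&+ \xi_* (1 - \xi_*) \left(\nabla_{\theta_j} \bar x^{(1)}_{t|t} - \nabla_{\theta_j} \bar x^{(2)}_{t|t}\right) \left(\nabla_{\theta_k} \bar x^{(1)}_{t|t} - \nabla_{\theta_k} \bar x^{(2)}_{t|t}\right)' \\
&+ \xi_* (1 - \xi_*) \left(\nabla_{\theta_k} \bar x^{(1)}_{t|t} - \nabla_{\theta_k} \bar x^{(2)}_{t|t}\right) \left(\nabla_{\theta_j} \bar x^{(1)}_{t|t} - \nabla_{\theta_j} \bar x^{(2)}_{t|t}\right)' \\
&+ \nabla_{\theta_j} \bar \xi_{t|t} \left(\nabla_{\theta_k} \bar P^{(1)}_{t|t} - \nabla_{\theta_k} \bar P^{(2)}_{t|t}\right) + \nabla_{\theta_k} \bar \xi_{t|t} \left(\nabla_{\theta_j} \bar P^{(1)}_{t|t} - \nabla_{\theta_j} \bar P^{(2)}_{t|t}\right) .
\end{aligned}
$$

This gives the result in Lemma 3.3. We now turn to Lemma 3.4.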
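By (2.11), we have

$$
\begin{aligned}
\nabla_{\theta_j} \nabla_{\theta_k} x_{t|t} ={}& \nabla_{\theta_j} \nabla_{\theta_k} \xi_{t|t} \left(x^{(1)}_{t|t} - x^{(2)}_{t|t}\right) + \nabla_{\theta_j} \xi_{t|t} \left(\nabla_{\theta_k} x^{(1)}_{t|t} - \nabla_{\theta_k} x^{(2)}_{t|t}\right) + \nabla_{\theta_k} \xi_{t|t} \left(\nabla_{\theta_j} x^{(1)}_{t|t} - \nabla_{\theta_j} x^{(2)}_{t|t}\right) \\
&+ \xi_{t|t} \nabla_{\theta_j} \nabla_{\theta_k} x^{(1)}_{t|t} + (1 - \xi_{t|t}) \nabla_{\theta_j} \nabla_{\theta_k} x^{(2)}_{t|t}
\end{aligned}
$$

and

$$
\nabla_{\theta_j} \nabla_{\theta_k} \bar x_{t|t} = \nabla_{\theta_j} \bar \xi_{t|t} \left(\nabla_{\theta_k} \bar x^{(1)}_{t|t} - \nabla_{\theta_k} \bar x^{(2)}_{t|t}\right) + \nabla_{\theta_k} \bar \xi_{t|t} \left(\nabla_{\theta_j} \bar x^{(1)}_{t|t} - \nabla_{\theta_j} \bar x^{(2)}_{t|t}\right) + \xi_* \nabla_{\theta_j} \nabla_{\theta_k} \bar x^{(1)}_{t|t} + (1 - \xi_*) \nabla_{\theta_j} \nabla_{\theta_k} \bar x^{(2)}_{t|t} ,
$$

when $\delta_1 = \delta_2$. By the previous results, we just need to derive $\nabla_{\theta_j} \nabla_{\theta_k} \bar x^{(i)}_{t|t}$. Differentiating $\nabla_{\theta_j} x^{(i)}_{t|t}$ in (B.14) with respect to $\theta_k$, we have

$$
\nabla_{\theta_j} \nabla_{\theta_k} x^{(i)}_{t|t} = \left(I - \frac{P^{(i)}_{t|t-1} H_i H_i'}{H_i' P^{(i)}_{t|t-1} H_i}\right) F_i \nabla_{\theta_j} \nabla_{\theta_k} x_{t-1|t-1} + X^{(i)}_{jk,t} ,
$$

where

$$
\begin{aligned}
X^{(i)}_{jk,t} ={}& \nabla_{\theta_k} \left[\left(I - \frac{P^{(i)}_{t|t-1} H_i H_i'}{H_i' P^{(i)}_{t|t-1} H_i}\right) F_i\right] \nabla_{\theta_j} x_{t-1|t-1} \\
&+ \nabla_{\theta_k} \left(I - \frac{P^{(i)}_{t|t-1} H_i H_i'}{H_i' P^{(i)}_{t|t-1} H_i}\right) \left(\nabla_{\theta_j} F_i x_{t-1|t-1} + \nabla_{\theta_j} G_i\right) \\
&+ \left(I - \frac{P^{(i)}_{t|t-1} H_i H_i'}{H_i' P^{(i)}_{t|t-1} H_i}\right) \nabla_{\theta_k} \left(\nabla_{\theta_j} F_i x_{t-1|t-1} + \nabla_{\theta_j} G_i\right) \\
&- \nabla_{\theta_k} \left[\frac{\nabla_{\theta_j} P^{(i)}_{t|t-1} H_i H_i' + P^{(i)}_{t|t-1} \nabla_{\theta_j} H_i H_i' + P^{(i)}_{t|t-1} H_i \nabla_{\theta_j} H_i'}{H_i' P^{(i)}_{t|t-1} H_i}\right] \left(F_i x_{t-1|t-1} + G_i\right) \\
&- \frac{\nabla_{\theta_j} P^{(i)}_{t|t-1} H_i H_i' + P^{(i)}_{t|t-1} \nabla_{\theta_j} H_i H_i' + P^{(i)}_{t|t-1} H_i \nabla_{\theta_j} H_i'}{H_i' P^{(i)}_{t|t-1} H_i} \nabla_{\theta_k} \left(F_i x_{t-1|t-1} + G_i\right) \\
&+ \nabla_{\theta_k} \left[\frac{P^{(i)}_{t|t-1} H_i H_i' \left(\nabla_{\theta_j} H_i' P^{(i)}_{t|t-1} H_i + H_i' \nabla_{\theta_j} P^{(i)}_{t|t-1} H_i + H_i' P^{(i)}_{t|t-1} \nabla_{\theta_j} H_i\right)}{\left(H_i' P^{(i)}_{t|t-1} H_i\right)^2}\right] \left(F_i x_{t-1|t-1} + G_i\right) \\
&+ \frac{P^{(i)}_{t|t-1} H_i H_i' \left(\nabla_{\theta_j} H_i' P^{(i)}_{t|t-1} H_i + H_i' \nabla_{\theta_j} P^{(i)}_{t|t-1} H_i + H_i' P^{(i)}_{t|t-1} \nabla_{\theta_j} H_i\right)}{\left(H_i' P^{(i)}_{t|t-1} H_i\right)^2} \nabla_{\theta_k} \left(F_i x_{t-1|t-1} + G_i\right) \\
&+ \nabla_{\theta_k} \left[\frac{\nabla_{\theta_j} P^{(i)}_{t|t-1} H_i + P^{(i)}_{t|t-1} \nabla_{\theta_j} H_i}{H_i' P^{(i)}_{t|t-1} H_i}\right] \left(y_t - A_i' z_t\right) \\
&+ \frac{\nabla_{\theta_j} P^{(i)}_{t|t-1} H_i + P^{(i)}_{t|t-1} \nabla_{\theta_j} H_i}{H_i' P^{(i)}_{t|t-1} H_i} \nabla_{\theta_k} \left(y_t - A_i' z_t\right) \\
&- \nabla_{\theta_k} \left[\frac{P^{(i)}_{t|t-1} H_i \left(\nabla_{\theta_j} H_i' P^{(i)}_{t|t-1} H_i + H_i' \nabla_{\theta_j} P^{(i)}_{t|t-1} H_i + H_i' P^{(i)}_{t|t-1} \nabla_{\theta_j} H_i\right)}{\left(H_i' P^{(i)}_{t|t-1} H_i\right)^2}\right] \left(y_t - A_i' z_t\right) \\
&- \frac{P^{(i)}_{t|t-1} H_i \left(\nabla_{\theta_j} H_i' P^{(i)}_{t|t-1} H_i + H_i' \nabla_{\theta_j} P^{(i)}_{t|t-1} H_i + H_i' P^{(i)}_{t|t-1} \nabla_{\theta_j} H_i\right)}{\left(H_i' P^{(i)}_{t|t-1} H_i\right)^2} \nabla_{\theta_k} \left(y_t - A_i' z_t\right) \\
&- \nabla_{\theta_k} \left[\frac{P^{(i)}_{t|t-1} H_i}{H_i' P^{(i)}_{t|t-1} H_i}\right] \nabla_{\theta_j} A_i' z_t - \frac{P^{(i)}_{t|t-1} H_i}{H_i' P^{(i)}_{t|t-1} H_i} \nabla_{\theta_j} \nabla_{\theta_k} A_i' z_t .
\end{aligned}
$$

Then, when $\delta_1 = \delta_2$, $\bar X^{(i)}_{jk,t}$ can be obtained and

$$
\nabla_{\theta_j} \nabla_{\theta_k} \bar x_{t|t} = \left[\left(I - \frac{\bar P_{t|t-1} \bar H \bar H'}{\bar H' \bar P_{t|t-1} \bar H}\right) \bar F\right] \nabla_{\theta_j} \nabla_{\theta_k} \bar x_{t-1|t-1} + \bar X_{jk,t} ,
$$

where

$$
\bar X_{jk,t} = \nabla_{\theta_j} \bar \xi_{t|t} \left(\nabla_{\theta_k} \bar x^{(1)}_{t|t} - \nabla_{\theta_k} \bar x^{(2)}_{t|t}\right) + \nabla_{\theta_k} \bar \xi_{t|t} \left(\nabla_{\theta_j} \bar x^{(1)}_{t|t} - \nabla_{\theta_j} \bar x^{(2)}_{t|t}\right) + \xi_* \bar X^{(1)}_{jk,t} + (1 - \xi_*) \bar X^{(2)}_{jk,t} .
$$

The next lemma provides stochastic bounds for $\bar \xi_{t+1|t}$ and its derivatives.

Lemma A.1.1 Suppose Assumption 5 holds. Then, there exists an open neighborhood of $(\beta_*, \delta_*)$, denoted by $B(\beta_*, \delta_*)$, and a sequence of strictly stationary and ergodic random variables $\{\lambda_t\}$ satisfying $E\lambda_t^{1+c} < M < \infty$ for some $c, M > 0$, such that:

$$
\sup_{(\beta, \delta_1) \in B(\beta_*, \delta_*)} \left|\nabla_{\theta_{i_1}} ... \nabla_{\theta_{i_k}} \bar \xi_{t+1|t}\right|^{\alpha(k)/k} < \lambda_t \qquad (t = 1, ..., T)
$$

for all $i_1, ..., i_k \in \{1, ..., 2n_\delta + n_\beta\}$ and $k = 1, 2, 3$ and $4$, where $\alpha(k) = 6$ if $k = 1, 2, 3$ and $\alpha(k) = 5$ if $k = 4$. The above inequalities hold uniformly over $\epsilon \le p, q \le 1 - \epsilon$ with $\epsilon$ being an arbitrary number satisfying $0 < \epsilon < 1/2$.

Proof of Lemma A.1.1 We use the difference equations in Lemma 1 to relate $\nabla_{\theta_{i_1}} ... \nabla_{\theta_{i_k}} \bar \xi_{t+1|t}$ to the density functions $\bar f_{1t}$ and $\bar f_{2t}$ and their derivatives. Because the higher order derivatives depend successively on the lower orders, we start with $k = 1$. Without loss of generality, suppose $j \in I_1$. Then, apply (B.6):

$$
\left|\nabla_{\theta_j} \bar \xi_{t+1|t}\right|^6 \le \left|\sum_{s=0}^{t-1} r \rho^s \left(\frac{\nabla_{\theta_j} \bar f_{1(t-s)}}{\bar f_{t-s}} - \frac{\nabla_{\theta_j} \bar f_{2(t-s)}}{\bar f_{t-s}}\right)\right|^6 \le 2 \left(\sum_{s=0}^{\infty} |r \rho^s| \upsilon_{t-s}^{1/6}\right)^6 \le 2 \left(\sum_{s=0}^{\infty} (1 - \epsilon)^s \upsilon_{t-s}^{1/6}\right)^6 ,
$$

where the second inequality follows from Assumption 5 and the last inequality uses $\rho = p + q - 1$. Because $\{\upsilon_t\}$ is stationary and ergodic, the right hand side is also stationary and ergodic (White, 2001, Theorem 3.35). Denote it by $\lambda_t$ and apply Minkowski's inequality for an infinite sum:

$$
\begin{aligned}
E\lambda_t^{1+c} &= 2 E\left[\sum_{s=0}^{\infty} (1 - \epsilon)^s \upsilon_{t-s}^{1/6}\right]^{6(1+c)} \le 2 \left\{\sum_{s=0}^{\infty} \left[E\left((1 - \epsilon)^s \upsilon_{t-s}^{1/6}\right)^{6(1+c)}\right]^{\frac{1}{6(1+c)}}\right\}^{6(1+c)} \\
&= 2 \left\{\sum_{s=0}^{\infty} (1 - \epsilon)^s \left[E\upsilon_{t-s}^{1+c}\right]^{\frac{1}{6(1+c)}}\right\}^{6(1+c)} \le 2L \left\{\sum_{s=0}^{\infty} (1 - \epsilon)^s\right\}^{6(1+c)} ,
\end{aligned}
$$

where the last inequality holds because $E\upsilon_{t-s}^{1+c}$ is finite by Assumption 5.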
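Because $\sum_{s=0}^{\infty} (1 - \epsilon)^s = 1/\epsilon < \infty$, we have $E\lambda_t^{1+c} \le 2L/\epsilon^{6(1+c)} < \infty$. This establishes the result for $k = 1$ by setting $M = 2L/\epsilon^{6(1+c)}$.

The proof for $k > 1$ is similar. For $k = 2$, we have $|\nabla_{\theta_j} \nabla_{\theta_i} \bar \xi_{t+1|t}|^3 \le \left|\sum_{s=0}^{\infty} \rho^s \bar E_{ji,t-s}\right|^3$. I provide upper bounds for $|\bar E_{ji,t}|$ for five possible cases. Specifically, if $j \in I_0$ and $i \in I_1$, then

$$
\begin{aligned}
\left|\bar E_{ji,t}\right| &= \left|r \left(\frac{\nabla_{\theta_j} \nabla_{\theta_i} \bar f_{1t}}{\bar f_t} - \frac{\nabla_{\theta_j} \nabla_{\theta_i} \bar f_{2t}}{\bar f_t} + \frac{\nabla_{\theta_j} \bar f_{2t}}{\bar f_t} \frac{\nabla_{\theta_i} \bar f_{2t}}{\bar f_t} - \frac{\nabla_{\theta_j} \bar f_{2t}}{\bar f_t} \frac{\nabla_{\theta_i} \bar f_{1t}}{\bar f_t}\right)\right| \\
&\le |r| \left(\left|\frac{\nabla_{\theta_j} \nabla_{\theta_i} \bar f_{1t}}{\bar f_t}\right| + \left|\frac{\nabla_{\theta_j} \nabla_{\theta_i} \bar f_{2t}}{\bar f_t}\right| + \left|\frac{\nabla_{\theta_j} \bar f_{2t}}{\bar f_t} \frac{\nabla_{\theta_i} \bar f_{2t}}{\bar f_t}\right| + \left|\frac{\nabla_{\theta_j} \bar f_{2t}}{\bar f_t} \frac{\nabla_{\theta_i} \bar f_{1t}}{\bar f_t}\right|\right) \\
&\le 4 |r| \upsilon_t^{1/3} .
\end{aligned}
$$

The same bound holds if $j \in I_0$ and $i \in I_2$. If $j \in I_1$ and $i \in I_1$, then $|\bar E_{ji,t}| \le 4 |\rho (1 - 2\xi_*)| \lambda_{t-1}^{1/6} \upsilon_t^{1/6} + 2 \left(2 |r| + |r (2\xi_* - 1)|\right) \upsilon_t^{1/3}$. The same bound holds if $j \in I_1$ and $i \in I_2$, and if $j \in I_2$ and $i \in I_2$. Consequently, there exists a finite constant $C_1$ such that in all five cases we have $|\bar E_{ji,t}| \le C_1 (\lambda_{t-1}^{1/6} \upsilon_t^{1/6} + \upsilon_t^{1/3})$. This implies

$$
\left|\nabla_{\theta_j} \nabla_{\theta_i} \bar \xi_{t+1|t}\right|^3 \le \left(\sum_{s=0}^{\infty} C_1 (1 - \epsilon)^s \left(\lambda_{t-s-1}^{1/6} \upsilon_{t-s}^{1/6} + \upsilon_{t-s}^{1/3}\right)\right)^3 .
$$

The right side is stationary and ergodic; we continue to denote it by $\lambda_t$. By Minkowski's inequality:

$$
E\lambda_t^{1+c} \le \left\{\sum_{s=0}^{\infty} \left[E\left(C_1 (1 - \epsilon)^s \left(\lambda_{t-s-1}^{1/6} \upsilon_{t-s}^{1/6} + \upsilon_{t-s}^{1/3}\right)\right)^{3(1+c)}\right]^{\frac{1}{3(1+c)}}\right\}^{3(1+c)} .
\tag{B.19}
$$

Apply Minkowski's inequality followed by the Cauchy–Schwarz inequality to the summands:

$$
\begin{aligned}
E\left(C_1 (1 - \epsilon)^s \left(\lambda_{t-s-1}^{1/6} \upsilon_{t-s}^{1/6} + \upsilon_{t-s}^{1/3}\right)\right)^{3(1+c)}
&\le \left(C_1 (1 - \epsilon)^s\right)^{3(1+c)} \left\{\left[E \lambda_{t-s-1}^{(1+c)/2} \upsilon_{t-s}^{(1+c)/2}\right]^{\frac{1}{3(1+c)}} + \left[E\upsilon_{t-s}^{1+c}\right]^{\frac{1}{3(1+c)}}\right\}^{3(1+c)} \\
&\le \left(C_1 (1 - \epsilon)^s\right)^{3(1+c)} \left\{\left[E\lambda_{t-s-1}^{1+c} E\upsilon_{t-s}^{1+c}\right]^{\frac{1}{6(1+c)}} + \left[E\upsilon_{t-s}^{1+c}\right]^{\frac{1}{3(1+c)}}\right\}^{3(1+c)} .
\end{aligned}
$$

Because $E\lambda_{t-s-1}^{1+c} < M$ and $E\upsilon_{t-s}^{1+c} < L$ by stationarity, the last term in the preceding display is no greater than

$$
(1 - \epsilon)^{3(1+c)s} C_1^{3(1+c)} \left\{(ML)^{\frac{1}{6(1+c)}} + L^{\frac{1}{3(1+c)}}\right\}^{3(1+c)} \le C_2 (1 - \epsilon)^{3(1+c)s} ,
\tag{B.20}
$$

where $C_2$ is a finite constant independent of $p$ and $q$. Plugging this into (B.19), we have $E\lambda_t^{1+c} \le C_2 \left(\sum_{s=0}^{\infty} (1 - \epsilon)^s\right)^{3(1+c)} = C_2/\epsilon^{3(1+c)} < \infty$. This proves the result for $k = 2$.

Now, consider $k = 3$.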
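Inspecting the expressions of $\bar E_{jil,t}$ reported in the proof of Lemma 1 shows that they comprise the following terms ($a, b, c = 1, 2$):

$$
\begin{gathered}
\frac{\nabla_{\theta_j} \nabla_{\theta_i} \bar f_{at} \nabla_{\theta_l} \bar f_{bt}}{\bar f_t^2}, \quad
\frac{\nabla_{\theta_j} \nabla_{\theta_i} \nabla_{\theta_l} \bar f_{at}}{\bar f_t}, \quad
\frac{\nabla_{\theta_j} \bar f_{at} \nabla_{\theta_i} \bar f_{bt} \nabla_{\theta_l} \bar f_{ct}}{\bar f_t^3}, \quad
\frac{\nabla_{\theta_l} \bar f_{at}}{\bar f_t} \nabla_{\theta_j} \nabla_{\theta_i} \bar \xi_{t|t-1}, \\
\frac{\nabla_{\theta_j} \bar f_{at} \nabla_{\theta_l} \bar f_{bt}}{\bar f_t^2} \nabla_{\theta_i} \bar \xi_{t|t-1}, \quad
\frac{\nabla_{\theta_j} \nabla_{\theta_i} \bar f_{at}}{\bar f_t} \nabla_{\theta_l} \bar \xi_{t|t-1}, \quad
\frac{\nabla_{\theta_j} \bar f_{at}}{\bar f_t} \nabla_{\theta_i} \bar \xi_{t|t-1} \nabla_{\theta_l} \bar \xi_{t|t-1} .
\end{gathered}
\tag{B.21}
$$

By Assumption 4 and the above results for $k = 1$ and $2$, the quantities in (B.21) are bounded, respectively, by $\upsilon_t^{1/2}$, $\upsilon_t^{1/2}$, $\upsilon_t^{1/2}$, $\upsilon_t^{1/6} \lambda_{t-1}^{1/3}$, $\upsilon_t^{1/3} \lambda_{t-1}^{1/6}$, $\upsilon_t^{1/3} \lambda_{t-1}^{1/6}$ and $\upsilon_t^{1/6} \lambda_{t-1}^{1/3}$. Therefore, the ten cases specified in Lemma 1 all satisfy $|\bar E_{jil,t}| \le C_3 (\upsilon_t^{1/2} + \upsilon_t^{1/6} \lambda_{t-1}^{1/3} + \upsilon_t^{1/3} \lambda_{t-1}^{1/6})$, where $C_3$ is a finite constant independent of $p$ and $q$. This implies

$$
\left|\nabla_{\theta_j} \nabla_{\theta_i} \nabla_{\theta_l} \bar \xi_{t+1|t}\right|^2 \le \left(\sum_{s=0}^{\infty} (1 - \epsilon)^s C_3 \left(\upsilon_{t-s}^{1/2} + \upsilon_{t-s}^{1/6} \lambda_{t-s-1}^{1/3} + \upsilon_{t-s}^{1/3} \lambda_{t-s-1}^{1/6}\right)\right)^2 .
$$

Denote the right hand side by $\lambda_t$ and proceed along the same lines as between (B.19) and (B.20). It then follows that $E\lambda_t^{1+c} < \infty$. For $k = 4$, the expressions of $\bar E_{jilm,t}$, although omitted here, include terms as in (B.21) but with the orders of the derivatives summing to 4 instead of 3. Using the same arguments as between (B.19) and (B.20), it can be shown that $E\lambda_t^{1+c} < \infty$ holds.

The following two lemmas provide stochastic bounds for $\bar x_{t|t}$, $\bar P_{t|t}$ and their derivatives.

Lemma A.1.2 Suppose Assumption 5 holds. Then, there exists an open neighborhood of $(\beta_*, \delta_*)$, denoted by $B(\beta_*, \delta_*)$, and a sequence of strictly stationary and ergodic random variables $\{\lambda_{i,t}\}$ satisfying $E\lambda_{i,t}^{1+c_i} < M_i < \infty$ for some $c_i, M_i > 0$, such that for the $i$-th entry of $\bar x_{t|t}$ and its derivatives:

$$
\sup_{(\beta, \delta_1) \in B(\beta_*, \delta_*)} \left|\nabla_{\theta_{i_1}} ... \nabla_{\theta_{i_k}} \bar x_{t|t,i}\right|^{\alpha(k)/k} < \lambda_{i,t} \qquad (t = 1, ..., T)
$$

for all $i_1, ..., i_k \in \{1, ..., 2n_\delta + n_\beta\}$ and $k = 1, 2, 3$ and $4$, where $\alpha(k) = 6$ if $k = 1, 2, 3$ and $\alpha(k) = 5$ if $k = 4$. The above inequalities hold uniformly over $\epsilon \le p, q \le 1 - \epsilon$ with $\epsilon$ being an arbitrary number satisfying $0 < \epsilon < 1/2$.

Lemma A.1.3 Suppose Assumption 5 holds.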
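Then, there exists an open neighborhood of $(\beta_*, \delta_*)$, denoted by $B(\beta_*, \delta_*)$, and a sequence of strictly stationary and ergodic random variables $\{\lambda_{ij,t}\}$ satisfying $E\lambda_{ij,t}^{1+c_{ij}} < M_{ij} < \infty$ for some $c_{ij}, M_{ij} > 0$, such that for the $(i,j)$-th entry of $\bar P_{t|t}$ and its derivatives:

$$
\sup_{(\beta, \delta_1) \in B(\beta_*, \delta_*)} \left|\nabla_{\theta_{i_1}} ... \nabla_{\theta_{i_k}} \bar P_{t|t,ij}\right|^{\alpha(k)/k} < \lambda_{ij,t} \qquad (t = 1, ..., T)
$$

for all $i_1, ..., i_k \in \{1, ..., 2n_\delta + n_\beta\}$ and $k = 1, 2, 3$ and $4$, where $\alpha(k) = 6$ if $k = 1, 2, 3$ and $\alpha(k) = 5$ if $k = 4$. The above inequalities hold uniformly over $\epsilon \le p, q \le 1 - \epsilon$ with $\epsilon$ being an arbitrary number satisfying $0 < \epsilon < 1/2$.

Proof of Lemmas A.1.2-A.1.3: The proofs of Lemmas A.1.2 and A.1.3 are omitted because they follow the same arguments as the proof of Lemma A.1.1.

The next lemma establishes stochastic orders of some quantities related to $\xi_{t|t-1}$, $f_{1t}$ and $f_{2t}$. The quantities are all evaluated at $(\tilde \beta, \tilde \delta, \tilde \delta)$.

Lemma A.2 Let $i_s, j_s, l_s, m_s, n_s$ be arbitrary integers satisfying $1 \le i_s, j_s, l_s, m_s, n_s \le 2n_\delta + n_\beta$ for $s \in \{1, 2, 3, 4\}$. The following results hold uniformly over $\epsilon \le p, q \le 1 - \epsilon$ with $\epsilon$ being an arbitrary number satisfying $0 < \epsilon < 1/2$:

1. For any $a \in \{1, 2\}$, $u \in \{1, 2, 3, 4\}$ and $v \in \{0, 1, 2, 3\}$ satisfying $u + v \le 4$, we have (interpret $\nabla_{\theta_{j_1}} ... \nabla_{\theta_{j_v}} \tilde \xi_{t|t-1}$ as 1 when $v = 0$)

$$
\frac{1}{T} \sum_{t=1}^{T} \frac{\nabla_{\theta_{i_1}} ... \nabla_{\theta_{i_u}} \tilde f_{at}}{\tilde f_t} \nabla_{\theta_{j_1}} ... \nabla_{\theta_{j_v}} \tilde \xi_{t|t-1} = o_p(1).
\tag{B.22}
$$

Further, if $u + v \le 3$, then the result holds with $o_p(1)$ replaced by $O_p(T^{-1/2})$.

2. For any $a, b, c \in \{1, 2\}$, $u, w \in \{1, 2, 3\}$ and $v \in \{0, 1, 2\}$ satisfying $u + v + w \le 4$:

$$
\frac{1}{T} \sum_{t=1}^{T} \frac{\nabla_{\theta_{i_1}} ... \nabla_{\theta_{i_u}} \tilde f_{at}}{\tilde f_t} \frac{\nabla_{\theta_{j_1}} ... \nabla_{\theta_{j_v}} \tilde f_{bt}}{\tilde f_t} \frac{\nabla_{\theta_{l_1}} ... \nabla_{\theta_{l_w}} \tilde f_{ct}}{\tilde f_t} = O_p(1).
$$

3. For any $a, b, c \in \{1, 2\}$, $u, w \in \{1, 2, 3\}$ and $v, z \in \{0, 1\}$ satisfying $u + v + w + z \le 3$:

$$
\frac{1}{T} \sum_{t=1}^{T} \frac{\nabla_{\theta_{i_1}} \tilde f_{at}}{\tilde f_t} \frac{\nabla_{\theta_{j_1}} ... \nabla_{\theta_{j_u}} \tilde f_{bt}}{\tilde f_t} \frac{\nabla_{\theta_{l_1}} ... \nabla_{\theta_{l_v}} \tilde f_{ct}}{\tilde f_t} \nabla_{\theta_{m_1}} ... \nabla_{\theta_{m_w}} \tilde \xi_{t|t-1} \nabla_{\theta_{n_1}} ... \nabla_{\theta_{n_z}} \tilde \xi_{t|t-1} = O_p(1).
$$

Proof of Lemma A.2.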
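By the mean value theorem, the left hand side of (B.22) equals

$$
T^{-1} \sum_{t=1}^{T} \frac{\nabla_{\theta_{i_1}} ... \nabla_{\theta_{i_u}} f^*_{at}}{f^*_t} \nabla_{\theta_{j_1}} ... \nabla_{\theta_{j_v}} \xi^*_{t|t-1} + \left\{T^{-3/2} \sum_{t=1}^{T} \nabla_{\theta'} \left(\frac{\nabla_{\theta_{i_1}} ... \nabla_{\theta_{i_u}} \bar f_{at}}{\bar f_{at}} \nabla_{\theta_{j_1}} ... \nabla_{\theta_{j_v}} \bar \xi_{t|t-1}\right)\right\} T^{1/2} \left(\tilde \theta - \theta_*\right),
\tag{B.23}
$$

where "$*$" and "$-$" denote that the relevant quantities are evaluated at the true values $\theta_*' = (\beta_*', \delta_*', \delta_*')$ and at $\bar \theta' = (\bar \beta', \bar \delta', \bar \delta')$, respectively, where $\bar \theta$ lies between $\tilde \theta' = (\tilde \beta', \tilde \delta', \tilde \delta')$ and $\theta_*$. The first summation is over terms that are stationary and ergodic, which are bounded by $\lambda_t^{v/\alpha(k)} \upsilon_t^{u/\alpha(k)}$ by Assumption 5 and Lemma A.1. Apply Hölder's inequality:

$$
\begin{aligned}
E\left(\lambda_t^{v/\alpha(k)} \upsilon_t^{u/\alpha(k)}\right)^{1+c}
&\le \left[E\left(\lambda_t^{v(1+c)/\alpha(k)}\right)^{\alpha(k)/v}\right]^{\frac{v}{\alpha(k)}} \left[E\left(\upsilon_t^{u(1+c)/\alpha(k)}\right)^{\alpha(k)/(\alpha(k)-v)}\right]^{\frac{\alpha(k)-v}{\alpha(k)}} \\
&\le \left(E\lambda_t^{1+c}\right)^{\frac{v}{\alpha(k)}} \left(E\upsilon_t^{1+c}\right)^{\frac{\alpha(k)-v}{\alpha(k)}},
\end{aligned}
$$

where the last inequality follows because $u + v < \alpha(k)$. Both terms on the right hand side are finite by Assumption 5 and Lemma A.1. Therefore, the first term in the display (B.23) is $o_p(1)$ by Theorem 3.34 in White (2001). Now turn to the second term in the display (B.23). We have, for any $k \in \{1, ..., 2n_\delta + n_\beta\}$:

$$
\begin{aligned}
&\left|T^{-3/2} \sum_{t=1}^{T} \nabla_{\theta_k} \left(\frac{\nabla_{\theta_{i_1}} ... \nabla_{\theta_{i_u}} \bar f_{at}}{\bar f_{at}} \nabla_{\theta_{j_1}} ... \nabla_{\theta_{j_v}} \bar \xi_{t|t-1}\right)\right| \\
&\quad \le \left|T^{-3/2} \sum_{t=1}^{T} \frac{\nabla_{\theta_{i_1}} ... \nabla_{\theta_{i_u}} \nabla_{\theta_k} \bar f_{at}}{\bar f_{at}} \nabla_{\theta_{j_1}} ... \nabla_{\theta_{j_v}} \bar \xi_{t|t-1}\right| + \left|T^{-3/2} \sum_{t=1}^{T} \frac{\nabla_{\theta_{i_1}} ... \nabla_{\theta_{i_u}} \bar f_{at}}{\bar f_{at}} \nabla_{\theta_{j_1}} ... \nabla_{\theta_{j_v}} \nabla_{\theta_k} \bar \xi_{t|t-1}\right| \\
&\qquad + \left|T^{-3/2} \sum_{t=1}^{T} \frac{\nabla_{\theta_{i_1}} ... \nabla_{\theta_{i_u}} \bar f_{at}}{\bar f_{at}} \frac{\nabla_{\theta_k} \bar f_{at}}{\bar f_{at}} \nabla_{\theta_{j_1}} ... \nabla_{\theta_{j_v}} \bar \xi_{t|t-1}\right| \\
&\quad \le T^{-3/2} \sum_{t=1}^{T} \left\{2 \upsilon_t^{(u+1)/\alpha(k)} \lambda_t^{v/\alpha(k)} + \upsilon_t^{u/\alpha(k)} \lambda_t^{(v+1)/\alpha(k)}\right\} = O_p\left(T^{-1/2}\right),
\end{aligned}
$$

where the equality follows from Assumption 4, Lemma A.1 and $u + v + 1 \le 5$. Therefore, the display (B.23) is $o_p(1)$.

Now we consider the cases with $u + v \le 3$. If $u + v < 3$, then the terms inside the first summation of (B.23) are bounded by $\lambda_t^{v/6} \upsilon_t^{u/6}$. We have

$$
E\left(\lambda_t^{v/6} \upsilon_t^{u/6}\right)^{2(1+c)} \le \left[E\left(\lambda_t^{v(1+c)/3}\right)^{3/v}\right]^{v/3} \left[E\left(\upsilon_t^{u(1+c)/3}\right)^{3/(3-v)}\right]^{(3-v)/3} \le \left(E\lambda_t^{1+c}\right)^{v/3} \left(E\upsilon_t^{1+c}\right)^{(3-v)/3} .
$$

The right hand side is finite. If $u + v = 3$, i.e., $u = 3$ and $v = 0$, then $E(\lambda_t^{v/6} \upsilon_t^{u/6})^{2(1+c)} = E\upsilon_t^{1+c} < \infty$. Apply the central limit theorem; it follows that the left hand side of (B.22) is $O_p(T^{-1/2})$.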
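For readers who want the exponents in the Hölder step spelled out, the following worked display may help. It uses only quantities from the proof ($\lambda_t$, $\upsilon_t$, $u$, $v$, $\alpha(k)$, $c$); the conjugate exponents $p_H$ and $q_H$ are labels introduced here, and the final line invokes Lyapunov's inequality as one way, assumed here rather than stated in the proof, to see that the moment of $\upsilon_t$ is finite.

```latex
% Conjugate exponents (labels introduced for this sketch), for v \ge 1:
p_H = \frac{\alpha(k)}{v}, \qquad q_H = \frac{\alpha(k)}{\alpha(k)-v},
\qquad \frac{1}{p_H} + \frac{1}{q_H} = 1.

% Hölder's inequality E|XY| \le (E|X|^{p_H})^{1/p_H} (E|Y|^{q_H})^{1/q_H} with
% X = \lambda_t^{v(1+c)/\alpha(k)} and Y = \upsilon_t^{u(1+c)/\alpha(k)}:
E\bigl(\lambda_t^{v/\alpha(k)}\upsilon_t^{u/\alpha(k)}\bigr)^{1+c}
  \le \bigl(E\lambda_t^{1+c}\bigr)^{v/\alpha(k)}
      \bigl(E\upsilon_t^{u(1+c)/(\alpha(k)-v)}\bigr)^{(\alpha(k)-v)/\alpha(k)}.

% Since u \le \alpha(k) - v, the exponent u(1+c)/(\alpha(k)-v) is at most 1+c,
% so Lyapunov's inequality gives
E\upsilon_t^{u(1+c)/(\alpha(k)-v)}
  \le \bigl(E\upsilon_t^{1+c}\bigr)^{u/(\alpha(k)-v)} < \infty.
```

When $v = 0$ no Hölder step is needed, because the bound then involves $\upsilon_t$ only.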
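Lemmas A.2.2 and A.2.3 can be proved using the same arguments, i.e., first applying the mean value theorem and then obtaining bounds for the two resulting terms separately. It follows that the left hand side quantity in Lemma A.2.2 is bounded by $T^{-1} \sum_{t=1}^{T} \upsilon_t^{(u+v+w)/\alpha(k)} + O_p(T^{-1/2})$, while that in Lemma A.2.3 is bounded by $T^{-1} \sum_{t=1}^{T} \upsilon_t^{(1+u+v)/\alpha(k)} \lambda_t^{(w+z)/\alpha(k)} + O_p(T^{-1/2})$. The two leading terms both satisfy a law of large numbers and are therefore $O_p(1)$.

Lemma A.3 Under the null hypothesis and Assumptions 1-5, for all $k, l, m \in \{1, ..., n_\delta\}$, we have:

1. Let $e_k$ be an $n_\delta$-dimensional unit vector whose $k$-th element equals 1; then

$$
\xi_* \begin{bmatrix} \nabla_{\delta_{2k}} \hat \beta(\tilde \delta) \\ \nabla_{\delta_{2k}} \hat \delta_1(\tilde \delta) \end{bmatrix} = (\xi_* - 1) \begin{bmatrix} 0 \\ e_k \end{bmatrix} + O_p(T^{-1/2}).
$$

2. The second order derivatives satisfy

$$
\xi_* \begin{bmatrix} \nabla_{\delta_{2k}} \nabla_{\delta_{2l}} \hat \beta(\tilde \delta) \\ \nabla_{\delta_{2k}} \nabla_{\delta_{2l}} \hat \delta_1(\tilde \delta) \end{bmatrix} = -\tilde I^{-1} \frac{1}{T} \sum_{t=1}^{T} \tilde D_{kl,t} + O_p(T^{-1/2}).
$$

3. The third order derivatives satisfy

$$
\xi_* \begin{bmatrix} \nabla_{\delta_{2k}} \nabla_{\delta_{2l}} \nabla_{\delta_{2m}} \hat \beta(\tilde \delta) \\ \nabla_{\delta_{2k}} \nabla_{\delta_{2l}} \nabla_{\delta_{2m}} \hat \delta_1(\tilde \delta) \end{bmatrix} = O_p(1).
$$

Proof of Lemma A.3 As the proof is long, we organize it into three parts, corresponding to Lemmas A.3.1, A.3.2 and A.3.3, respectively.

Proof of Lemma A.3.1 By construction, $\hat \theta(\delta_2)$ satisfies the first order conditions:

$$
M^{(1)}_j(p, q, \delta_2) = \frac{1}{T} \sum_{t=1}^{T} \frac{\hat M_{jt}}{\hat B_t} = 0, \qquad (j \in \{1, ..., n_\beta + n_\delta\}),
\tag{B.24}
$$

where

$$
\hat B_t = \hat f_{1t} \hat \xi_{t|t-1} + \hat f_{2t} (1 - \hat \xi_{t|t-1}), \qquad
\hat M_{jt} = (\nabla_{\theta_j} \hat f_{1t} - \nabla_{\theta_j} \hat f_{2t}) \hat \xi_{t|t-1} + (\hat f_{1t} - \hat f_{2t}) \nabla_{\theta_j} \hat \xi_{t|t-1} + \nabla_{\theta_j} \hat f_{2t} .
\tag{B.25}
$$

Because (B.24) holds for any $\delta_2 \in \Delta$, its derivative with respect to $\delta_2$ must equal zero. The proof exploits this fact. It consists of three steps. The first step takes first order derivatives of the $n_\beta + n_\delta$ restrictions in (B.24) with respect to $\delta_{2k}$, where $k \in \{1, ..., n_\delta\}$, to obtain a system of $n_\beta + n_\delta$ linear equations with $\nabla_{\delta_{2k}} \hat \beta(\delta_2)$ and $\nabla_{\delta_{2k}} \hat \delta_1(\delta_2)$ being the unknowns. The second step evaluates these equations at $\delta_2 = \tilde \delta$ and obtains approximations to them. The third step solves these approximating equations to obtain explicit expressions for $\nabla_{\delta_{2k}} \hat \beta(\delta_2)$ and $\nabla_{\delta_{2k}} \hat \delta_1(\delta_2)$.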
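These three steps are then repeated for every $k \in \{1, ..., n_\delta\}$ to obtain the lemma.

Step 1 for proving Lemma. Pick an arbitrary $k \in \{1, ..., n_\delta\}$ and an arbitrary $j \in \{1, ..., n_\beta + n_\delta\}$. Differentiate the $j$-th equation from (B.24) with respect to the $k$-th element of $\delta_2$ to obtain

$$
M^{(2)}_{jk}(p, q, \delta_2) = \frac{1}{T} \sum_{t=1}^{T} \frac{\nabla_{\delta_{2k}} \hat M_{jt}}{\hat B_t} - \frac{1}{T} \sum_{t=1}^{T} \frac{\nabla_{\delta_{2k}} \hat B_t}{\hat B_t^2} \hat M_{jt} = 0,
\tag{B.26}
$$

where

$$
\begin{aligned}
\nabla_{\delta_{2k}} \hat M_{jt} = \Big\{ & \hat \xi_{t|t-1} \nabla_{\theta_j} \nabla_{\theta'} \hat f_{1t} + \left(1 - \hat \xi_{t|t-1}\right) \nabla_{\theta_j} \nabla_{\theta'} \hat f_{2t} + \left(\nabla_{\theta_j} \hat f_{1t} - \nabla_{\theta_j} \hat f_{2t}\right) \nabla_{\theta'} \hat \xi_{t|t-1} \\
& + \nabla_{\theta_j} \hat \xi_{t|t-1} \left(\nabla_{\theta'} \hat f_{1t} - \nabla_{\theta'} \hat f_{2t}\right) + (\hat f_{1t} - \hat f_{2t}) \nabla_{\theta_j} \nabla_{\theta'} \hat \xi_{t|t-1} \Big\} \nabla_{\delta_{2k}} \hat \theta(\delta_2)
\end{aligned}
\tag{B.27}
$$

and

$$
\nabla_{\delta_{2k}} \hat B_t = \left\{ \hat \xi_{t|t-1} \nabla_{\theta'} \hat f_{1t} + \left(1 - \hat \xi_{t|t-1}\right) \nabla_{\theta'} \hat f_{2t} + \left(\hat f_{1t} - \hat f_{2t}\right) \nabla_{\theta'} \hat \xi_{t|t-1} \right\} \nabla_{\delta_{2k}} \hat \theta(\delta_2)
\tag{B.28}
$$

with

$$
\nabla_{\delta_{2k}} \hat \theta(\delta_2) = \begin{bmatrix} \nabla_{\delta_{2k}} \hat \beta(\delta_2) \\ \nabla_{\delta_{2k}} \hat \delta_1(\delta_2) \\ e_k \end{bmatrix},
\tag{B.29}
$$

where $e_k$ is an $n_\delta$-dimensional vector whose $k$-th element equals 1 and whose other elements equal zero. We view (B.26) as a linear equation with the first $n_\beta + n_\delta$ elements of $\nabla_{\delta_{2k}} \hat \theta(\delta_2)$ being the unknowns. The above differentiation can be carried out for all $j = 1, ..., n_\beta + n_\delta$ while keeping $k$ fixed at the same value. This delivers $n_\beta + n_\delta$ equations with the same number of unknowns, contained in (B.29).

Step 2 for proving Lemma. We evaluate the term $T^{-1} \sum_{t=1}^{T} (\nabla_{\delta_{2k}} \hat B_t / \hat B_t^2) \hat M_{jt}$ in (B.26) at $\delta_2 = \tilde \delta$ for an arbitrary $j \in \{1, ..., n_\beta + n_\delta\}$. It equals

$$
\frac{1}{T} \sum_{t=1}^{T} \frac{\xi_* \nabla_{\theta_j} \tilde f_{1t} + (1 - \xi_*) \nabla_{\theta_j} \tilde f_{2t}}{\tilde f_t^2} \left[\xi_* \nabla_{\theta'} \tilde f_{1t} + (1 - \xi_*) \nabla_{\theta'} \tilde f_{2t}\right] \nabla_{\delta_{2k}} \hat \theta(\tilde \delta).
$$

Using (B.29), this can be rewritten as

$$
\begin{aligned}
& \frac{1}{T} \sum_{t=1}^{T} \frac{\xi_* \nabla_{\theta_j} \tilde f_{1t} + (1 - \xi_*) \nabla_{\theta_j} \tilde f_{2t}}{\tilde f_t^2} \left[\xi_* \nabla_{\beta'} \tilde f_{1t} + (1 - \xi_*) \nabla_{\beta'} \tilde f_{2t}\right] \nabla_{\delta_{2k}} \hat \beta(\tilde \delta) \\
& + \frac{1}{T} \sum_{t=1}^{T} \frac{\xi_* \nabla_{\theta_j} \tilde f_{1t} + (1 - \xi_*) \nabla_{\theta_j} \tilde f_{2t}}{\tilde f_t^2} \left[\xi_* \nabla_{\delta_1'} \tilde f_{1t} + (1 - \xi_*) \nabla_{\delta_1'} \tilde f_{2t}\right] \nabla_{\delta_{2k}} \hat \delta_1(\tilde \delta) \\
& + \frac{1}{T} \sum_{t=1}^{T} \frac{\xi_* \nabla_{\theta_j} \tilde f_{1t} + (1 - \xi_*) \nabla_{\theta_j} \tilde f_{2t}}{\tilde f_t^2} \left[\xi_* \nabla_{\delta_{2k}} \tilde f_{1t} + (1 - \xi_*) \nabla_{\delta_{2k}} \tilde f_{2t}\right].
\end{aligned}
\tag{B.30}
$$

Now, let $j$ run through the set $\{1, ..., n_\beta + n_\delta\}$.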
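Let

$$
\tilde I = \frac{1}{T} \sum_{t=1}^{T}
\begin{bmatrix}
\dfrac{\xi_* \nabla_{\beta} \tilde f_{1t} + (1 - \xi_*) \nabla_{\beta} \tilde f_{2t}}{\tilde f_t} \dfrac{\xi_* \nabla_{\beta'} \tilde f_{1t} + (1 - \xi_*) \nabla_{\beta'} \tilde f_{2t}}{\tilde f_t} &
\dfrac{\xi_* \nabla_{\beta} \tilde f_{1t} + (1 - \xi_*) \nabla_{\beta} \tilde f_{2t}}{\tilde f_t} \dfrac{\xi_* \nabla_{\delta_1'} \tilde f_{1t} + (1 - \xi_*) \nabla_{\delta_1'} \tilde f_{2t}}{\tilde f_t} \\[2ex]
\dfrac{\xi_* \nabla_{\delta_1} \tilde f_{1t} + (1 - \xi_*) \nabla_{\delta_1} \tilde f_{2t}}{\tilde f_t} \dfrac{\xi_* \nabla_{\beta'} \tilde f_{1t} + (1 - \xi_*) \nabla_{\beta'} \tilde f_{2t}}{\tilde f_t} &
\dfrac{\xi_* \nabla_{\delta_1} \tilde f_{1t} + (1 - \xi_*) \nabla_{\delta_1} \tilde f_{2t}}{\tilde f_t} \dfrac{\xi_* \nabla_{\delta_1'} \tilde f_{1t} + (1 - \xi_*) \nabla_{\delta_1'} \tilde f_{2t}}{\tilde f_t}
\end{bmatrix}.
$$

Then, (B.30) can be written as

$$
\tilde I \begin{bmatrix} \nabla_{\delta_{2k}} \hat \beta(\tilde \delta) \\ \nabla_{\delta_{2k}} \hat \delta_1(\tilde \delta) \end{bmatrix}
+ \begin{bmatrix}
\dfrac{1}{T} \sum_{t=1}^{T} \dfrac{\xi_* \nabla_{\beta} \tilde f_{1t} + (1 - \xi_*) \nabla_{\beta} \tilde f_{2t}}{\tilde f_t} \dfrac{\xi_* \nabla_{\delta_{2k}} \tilde f_{1t} + (1 - \xi_*) \nabla_{\delta_{2k}} \tilde f_{2t}}{\tilde f_t} \\[2ex]
\dfrac{1}{T} \sum_{t=1}^{T} \dfrac{\xi_* \nabla_{\delta_1} \tilde f_{1t} + (1 - \xi_*) \nabla_{\delta_1} \tilde f_{2t}}{\tilde f_t} \dfrac{\xi_* \nabla_{\delta_{2k}} \tilde f_{1t} + (1 - \xi_*) \nabla_{\delta_{2k}} \tilde f_{2t}}{\tilde f_t}
\end{bmatrix},
\tag{B.31}
$$

where $\tilde I$ is the observed information evaluated at the null estimates. Now we turn to the first term in (B.26) at $\delta_2 = \tilde \delta$. It equals

$$
\left\{ \frac{1}{T} \sum_{t=1}^{T} \left[ \xi_* \frac{\nabla_{\theta_j} \nabla_{\theta'} \tilde f_{1t}}{\tilde f_t} + (1 - \xi_*) \frac{\nabla_{\theta_j} \nabla_{\theta'} \tilde f_{2t}}{\tilde f_t} + \frac{\nabla_{\theta_j} \tilde f_{1t} - \nabla_{\theta_j} \tilde f_{2t}}{\tilde f_t} \nabla_{\theta'} \tilde \xi_{t|t-1} + \nabla_{\theta_j} \tilde \xi_{t|t-1} \frac{\nabla_{\theta'} \tilde f_{1t} - \nabla_{\theta'} \tilde f_{2t}}{\tilde f_t} \right] \right\} \nabla_{\delta_{2k}} \hat \theta(\tilde \delta).
$$

All the terms inside the curly brackets are $O_p(T^{-1/2})$. Their effects are dominated by $\tilde I$, which is positive definite in large samples. Combining this fact with (B.26) and (B.31), we obtain

$$
\tilde I \begin{bmatrix} \nabla_{\delta_{2k}} \hat \beta(\tilde \delta) \\ \nabla_{\delta_{2k}} \hat \delta_1(\tilde \delta) \end{bmatrix}
= - \begin{bmatrix}
\dfrac{1}{T} \sum_{t=1}^{T} \dfrac{\xi_* \nabla_{\beta} \tilde f_{1t} + (1 - \xi_*) \nabla_{\beta} \tilde f_{2t}}{\tilde f_t} \dfrac{\xi_* \nabla_{\delta_{2k}} \tilde f_{1t} + (1 - \xi_*) \nabla_{\delta_{2k}} \tilde f_{2t}}{\tilde f_t} \\[2ex]
\dfrac{1}{T} \sum_{t=1}^{T} \dfrac{\xi_* \nabla_{\delta_1} \tilde f_{1t} + (1 - \xi_*) \nabla_{\delta_1} \tilde f_{2t}}{\tilde f_t} \dfrac{\xi_* \nabla_{\delta_{2k}} \tilde f_{1t} + (1 - \xi_*) \nabla_{\delta_{2k}} \tilde f_{2t}}{\tilde f_t}
\end{bmatrix} + O_p(T^{-1/2}).
\tag{B.32}
$$

To solve this, it is important to compare $\xi_* \nabla_{\delta_{1k}} \tilde f_{1t} + (1 - \xi_*) \nabla_{\delta_{1k}} \tilde f_{2t}$ with $\xi_* \nabla_{\delta_{2k}} \tilde f_{1t} + (1 - \xi_*) \nabla_{\delta_{2k}} \tilde f_{2t}$. By the results in (B.13) and (B.18), we have

$$
\nabla_{\delta_{2k}} \tilde P_{t-1|t-1} = \frac{1 - \xi_*}{\xi_*} \nabla_{\delta_{1k}} \tilde P_{t-1|t-1}
\qquad \text{and} \qquad
\nabla_{\delta_{2k}} \tilde x_{t-1|t-1} = \frac{1 - \xi_*}{\xi_*} \nabla_{\delta_{1k}} \tilde x_{t-1|t-1} .
$$

Applying (4.3) and the above two equalities, we have

$$
\xi_* \nabla_{\delta_{2k}} \tilde f_{1t} + (1 - \xi_*) \nabla_{\delta_{2k}} \tilde f_{2t} = \frac{1 - \xi_*}{\xi_*} \left[\xi_* \nabla_{\delta_{1k}} \tilde f_{1t} + (1 - \xi_*) \nabla_{\delta_{1k}} \tilde f_{2t}\right].
\tag{B.33}
$$

By (B.33), the vector on the right hand side of (B.32) equals $(1 - \xi_*)/\xi_*$ times the column of $\tilde I$ associated with $\delta_{1k}$, so premultiplying (B.32) by $\tilde I^{-1}$ solves for the unknowns and gives

$$
\begin{bmatrix} \nabla_{\delta_{2k}} \hat \beta(\tilde \delta) \\ \nabla_{\delta_{2k}} \hat \delta_1(\tilde \delta) \end{bmatrix} = \frac{\xi_* - 1}{\xi_*} \begin{bmatrix} 0 \\ e_k \end{bmatrix} + O_p(T^{-1/2}).
$$

Proof of Lemma A.3.2.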
View the quantities in (B.26) as functions of δ2 , p and q, and differentiate it with respect to the l-th component of δ2 (l = 1, ..., nδ ): (3) Mjkl (p, q, δ2 ) = T 1X T t=1 = 0, ( ) ∇δ2k ∇δ2l M̂jt ∇δ2k M̂jt ∇δ2l B̂t ∇δ2k ∇δ2l B̂t ∇δ2l M̂jt ∇δ2k B̂t ∇δ B̂t ∇δ B̂t − − M̂jt − + 2 2k 3 2l M̂jt 2 2 2 B̂t B̂t B̂t B̂t B̂t (B.34) where nβ +2nδ ( ∇δ2k ∇δ2l M̂jt = X ∇θ0 ξˆt|t−1 ∇θj ∇θs fˆ1t + ξˆt|t−1 ∇θj ∇θs ∇θ0 fˆ1t s=1 −∇θ0 ξˆt|t−1 ∇θj ∇θs fˆ2t + (1 − ξˆt|t−1 )∇θj ∇θs ∇θ0 fˆ2t +(∇θj ∇θ0 fˆ1t − ∇θj ∇θ0 fˆ2t )∇θs ξˆt|t−1 + (∇θj fˆ1t − ∇θj fˆ2t )∇θs ∇θ0 ξˆt|t−1 +(∇θs fˆ1t − ∇θs fˆ2t )∇θj ∇θ0 ξˆt|t−1 + (∇θs ∇θ0 fˆ1t − ∇θs ∇θ0 fˆ2t )∇θj ξˆt|t−1 ) +(∇θ0 fˆ1t − ∇θ0 fˆ2t )∇θj ∇θs ξˆt|t−1 + (fˆ1t − fˆ2t )∇θj ∇θs ∇θ0 ξˆt|t−1 ∇δ2k θ̂s (δ2 )∇δ2l θ̂(δ2 ) ( + ξˆt|t−1 ∇θj ∇θ0 fˆ1t + 1 − ξˆt|t−1 ∇θj ∇θ0 fˆ2t + ∇θj fˆ1t − ∇θj fˆ2t ∇θ0 ξˆt|t−1 + ) +∇θj ξˆt|t−1 ∇θ0 fˆ1t − ∇θ0 fˆ2t + (fˆ1t − fˆ2t )∇θj ∇θ0 ξˆt|t−1 ∇δ2k ∇δ2l θ̂(δ2 ) 55 and nβ +2nδ ( X ∇δ2k ∇δ2l B̂t = ∇θ0 ξˆt|t−1 ∇θs fˆ1t + ξˆt|t−1 ∇θs ∇θ0 fˆ1t − ∇θ0 ξˆt|t−1 ∇θs fˆ2t + (1 − ξˆt|t−1 )∇θs ∇θ0 fˆ2t s=1 ) +(∇θ0 fˆ1t − ∇θ0 fˆ2t )∇θs ξˆt|t−1 + (fˆ1t − fˆ2t )∇θs ∇θ0 ξˆt|t−1 ∇δ2k θ̂s (δ2 )∇δ2l θ̂(δ2 ) n o + ξˆt|t−1 ∇θ0 fˆ1t + (1 − ξˆt|t−1 )∇θ0 fˆ2t + (fˆ1t − fˆ2t )∇θ0 ξˆt|t−1 ∇δ2k ∇δ2l θ̂(δ2 ). To study (B.34), we can start with the third term T −1 nβ +2nδ nβ +2nδ ( X X u=1 s=1 PT t=1 h i ∇δ2k ∇δ2l B̂t /B̂t2 M̂jt . At θ̃, it equals T ξ∗ ∇θj f˜1t + (1 − ξ∗ )∇θj f˜2t 1X [∇θu ξ˜t|t−1 ∇θs f˜1t + ξ∗ ∇θs ∇θu f˜1t − ∇θu ξ˜t|t−1 ∇θs f˜2t T t=1 f˜t2 ) +(1 − ξ∗ )∇θs ∇θu f˜2t + (∇θu f˜1t − ∇θu f˜2t )∇θs ξ˜t|t−1 ∇δ2k θ̂s (δ̃)∇δ2l θ̂u (δ̃) nβ +2nδ + X s=1 ( T ξ∗ ∇θj f˜1t + (1 − ξ∗ )∇θj f˜2t 1X [ξ∗ ∇θs f˜1t + (1 − ξ∗ )∇θs f˜2t ] ∇δ2k ∇δ2l θ̂s (δ̃). 
T t=1 f˜t2 ) Because ∇δ2k θ̂s (δ̃) and ∇δ2l θ̂u (δ̃) are Op (T −1/2 ) except for the following four cases s = nβ + k, s = 56 nβ + nδ + k, u = nβ + l and u = nβ + nδ + l, the preceding display equals T ξ∗ ∇θj f˜1t + (1 − ξ∗ )∇θj f˜2t 1X ∇δ1l ξ˜t|t−1 ∇δ1k f˜1t + ξ∗ ∇δ1k ∇δ1l f˜1t − ∇δ1l ξ˜t|t−1 ∇δ1k f˜2t T t=1 f˜t2 ( ) +(1 − ξ∗ )∇δ1k ∇δ1l f˜2t + (∇δ1l f˜1t − ∇δ1l f˜2t )∇δ1k ξ˜t|t−1 ∇δ2k δ̂1k (δ̃)∇δ2l δ̂1l (δ̃) T ξ∗ ∇θj f˜1t + (1 − ξ∗ )∇θj f˜2t 1X ∇δ2l ξ˜t|t−1 ∇δ1k f˜1t + ξ∗ ∇δ1k ∇δ2l f˜1t − ∇δ2l ξ˜t|t−1 ∇δ1k f˜2t T t=1 f˜t2 ( + ) +(1 − ξ∗ )∇δ1k ∇δ2l f˜2t + (∇δ2l f˜1t − ∇δ2l f˜2t )∇δ1k ξ˜t|t−1 ∇δ2k δ̂1k (δ̃)∇δ2l δ̂2l (δ̃) T ξ∗ ∇θj f˜1t + (1 − ξ∗ )∇θj f˜2t 1X ∇δ1l ξ˜t|t−1 ∇δ2k f˜1t + ξ∗ ∇δ2k ∇δ1l f˜1t − ∇δ1l ξ˜t|t−1 ∇δ2k f˜2t T t=1 f˜t2 ( + ) +(1 − ξ∗ )∇δ2k ∇δ1l f˜2t + (∇δ1l f˜1t − ∇δ1l f˜2t )∇δ2k ξ˜t|t−1 ∇δ2k δ̂2k (δ̃)∇δ2l δ̂1l (δ̃) T ξ∗ ∇θj f˜1t + (1 − ξ∗ )∇θj f˜2t 1X ∇δ2l ξ˜t|t−1 ∇δ2k f˜1t + ξ∗ ∇δ2k ∇δ2l f˜1t − ∇δ2l ξ˜t|t−1 ∇δ2k f˜2t T t=1 f˜t2 ( + ) +(1 − ξ∗ )∇δ2k ∇δ2l f˜2t + (∇δ2l f˜1t − ∇δ2l f˜2t )∇δ2k ξ˜t|t−1 ∇δ2k δ̂2k (δ̃)∇δ2l δ̂2l (δ̃) nβ +nδ + X s=1 ( T ξ∗ ∇θj f˜1t + (1 − ξ∗ )∇θj f˜2t 1X [ξ∗ ∇θs f˜1t + (1 − ξ∗ )∇θs f˜2t ] ∇δ2k ∇δ2l θ̂s (δ̃) + Op (T −1/2 ). 2 ˜ T t=1 ft ) Apply ∇δ2l δ̂1l (δ̃) = (ξ∗ − 1)/ξ∗ + Op (T −1/2 ), ∇δ2k δ̂1k (δ̃) = (ξ∗ − 1)/ξ∗ + Op (T −1/2 ), ∇δ2l δ̂2l (δ̃) = ∇δ2k δ̂2k (δ̃) = 1 and rearrange terms, the preceding display further reduces to T 1 X ξ∗ ∇θj f˜1t + (1 − ξ∗ )∇θj f˜2t T t=1 f˜t2 ( 2 1 − ξ∗ ∇δ ∇δ f˜1t ∇δ ∇δ f˜2t ∇δ ∇δ f˜1t ∇δ ∇δ f˜2t × ξ∗ 1k 1l + (1 − ξ∗ ) 1k 1l + ξ∗ 2k 2l + (1 − ξ∗ ) 2k 2l ξ∗ f˜t f˜t f˜t f˜t ∇δ ∇δ f˜1t ∇δ ∇δ f˜2t ∇δ ∇δ f˜1t ∇δ ∇δ f˜2t 1 − ξ∗ ξ∗ 2k 1l + (1 − ξ∗ ) 2k 1l + ξ∗ 1k 2l + (1 − ξ∗ ) 1k 2l − ξ∗ f˜t f˜t f˜t f˜t ) 1 ∇θl f¯1t ∇θl f¯2t ∇θk f¯1t ∇θk f¯2t + 2 ∇δ1k ξ˜t|t−1 − + − ∇δ1l ξ˜t|t−1 ¯ ¯ ¯ ξ∗ ft ft ft f¯t ) nβ +nδ ( T X 1 X ξ∗ ∇θj f˜1t + (1 − ξ∗ )∇θj f˜2t + [ξ∗ ∇θs f˜1t + (1 − ξ∗ )∇θs f˜2t ] ∇δ2k ∇δ2l θ̂s (δ̃) + Op (T −1/2 ). 
˜2 T f t s=1 t=1 The above display leads to (nβ + nδ ) equations with j = 1, ..., nβ + nδ . These equations can be written 57 collectively as I˜ ∇δ2k ∇δ2l β̂(δ̃) ∇δ2k ∇δ2l δ̂1 (δ̃) + ξ∗ ∇β f˜1t +(1−ξ∗ )∇β f˜2t Ũkl,t t=1 f˜t2 ˜ ˜ P ξ∗ ∇δ1 f1t +(1−ξ∗ )∇δ1 f2t T 1 Ũkl,t t=1 T f˜t2 1 T PT −1/2 ), + Op (T where 2 ∇δ1k ∇δ1l f˜2t ∇δ2k ∇δ2l f˜1t ∇δ2k ∇δ2l f˜2t ∇δ1k ∇δ1l f˜1t Ũkl,t = + (1 − ξ∗ ) + ξ∗ + (1 − ξ∗ ) ξ∗ f˜t f˜t f˜t f˜t 1 − ξ∗ ∇δ2k ∇δ1l f˜1t ∇δ2k ∇δ1l f˜2t ∇δ1k ∇δ2l f˜1t ∇δ1k ∇δ2l f˜2t − ξ∗ + (1 − ξ∗ ) + ξ∗ + (1 − ξ∗ ) ξ∗ f˜t f˜t f˜t f˜t ∇θl f¯1t 1 ∇θl f¯2t ∇θk f¯2t ∇θk f¯1t + 2 ∇δ1k ξ˜t|t−1 − − + ∇δ1l ξ˜t|t−1 . ξ∗ f¯t f¯t f¯t f¯t 1 − ξ∗ ξ∗ This completes the analysis of the third term in (B.34). As shown below, the other terms in (B.34) are all asymptotically negligible. Note that at δ2 = δ̃, ∇δ2k B̂t can be rewritten as nβ X nβ +2nδ ∇θs f˜1t ∇δ2k θ̂s (δ̃) + s=1 = s=1 h h i ξ∗ ∇θs f˜1t + (1 − ξ∗ )∇θs f˜2t ∇δ2k θ̂s (δ̃) s=nβ +1 nβ X X ∇θs f˜1t ∇δ2k β̂s (δ̃) + nδ X h i ξ∗ ∇θs f˜1t + (1 − ξ∗ )∇θs f˜2t ∇δ2k δ̂1s (δ̃) s=1,s6=k + ξ∗ ∇δ1k f˜1t + (1 − ξ∗ )∇δ1k f˜2t i ∇δ2k δ̂1k (δ̃) + 1 − ξ∗ ξ∗ . (B.35) This representation is useful because, corresponding to the three terms on the right hand side, we have ∇δ2k β̂s (δ̃) = Op (T −1/2 ), ∇δ2k δ̂1s (δ̃) = Op (T −1/2 ) when s 6= k and ∇δ2k δ̂1k (δ̃) + (1 − ξ∗ )/ξ∗ = Op (T −1/2 ) by the previous results. So, the second, fourth and fifth terms are all Op (T −1/2 ) in (B.34) after applying (B.35) to ∇δ2k B̂t . Proof of Lemma A.3.3. 
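View the quantities in (B.34) as functions of $\delta_2$, $p$ and $q$ and differentiate them with respect to the $h$-th element of $\delta_2$ ($h = 1, ..., n_\delta$):

$$
\begin{aligned}
M^{(4)}_{jklh}(p, q, \delta_2) = \frac{1}{T} \sum_{t=1}^{T} \Bigg\{ & \frac{\nabla_{\delta_{2k}} \nabla_{\delta_{2l}} \nabla_{\delta_{2h}} \hat M_{jt}}{\hat B_t} - \frac{\nabla_{\delta_{2k}} \nabla_{\delta_{2l}} \hat M_{jt} \nabla_{\delta_{2h}} \hat B_t}{\hat B_t^2} \\
& - \frac{\nabla_{\delta_{2k}} \nabla_{\delta_{2h}} \hat M_{jt} \nabla_{\delta_{2l}} \hat B_t}{\hat B_t^2} - \frac{\nabla_{\delta_{2k}} \hat M_{jt} \nabla_{\delta_{2l}} \nabla_{\delta_{2h}} \hat B_t}{\hat B_t^2} + 2 \frac{\nabla_{\delta_{2k}} \hat M_{jt} \nabla_{\delta_{2l}} \hat B_t \nabla_{\delta_{2h}} \hat B_t}{\hat B_t^3} \\
& - \frac{\nabla_{\delta_{2k}} \nabla_{\delta_{2l}} \nabla_{\delta_{2h}} \hat B_t}{\hat B_t^2} \hat M_{jt} - \frac{\nabla_{\delta_{2k}} \nabla_{\delta_{2l}} \hat B_t}{\hat B_t^2} \nabla_{\delta_{2h}} \hat M_{jt} + 2 \frac{\nabla_{\delta_{2k}} \nabla_{\delta_{2l}} \hat B_t \nabla_{\delta_{2h}} \hat B_t}{\hat B_t^3} \hat M_{jt} \\
& - \frac{\nabla_{\delta_{2k}} \nabla_{\delta_{2h}} \hat B_t}{\hat B_t^2} \nabla_{\delta_{2l}} \hat M_{jt} - \frac{\nabla_{\delta_{2k}} \hat B_t}{\hat B_t^2} \nabla_{\delta_{2l}} \nabla_{\delta_{2h}} \hat M_{jt} + 2 \frac{\nabla_{\delta_{2k}} \hat B_t \nabla_{\delta_{2h}} \hat B_t}{\hat B_t^3} \nabla_{\delta_{2l}} \hat M_{jt} \\
& + 2 \frac{\nabla_{\delta_{2k}} \nabla_{\delta_{2h}} \hat B_t \nabla_{\delta_{2l}} \hat B_t}{\hat B_t^3} \hat M_{jt} + 2 \frac{\nabla_{\delta_{2k}} \hat B_t \nabla_{\delta_{2l}} \nabla_{\delta_{2h}} \hat B_t}{\hat B_t^3} \hat M_{jt} \\
& + 2 \frac{\nabla_{\delta_{2k}} \hat B_t \nabla_{\delta_{2l}} \hat B_t}{\hat B_t^3} \nabla_{\delta_{2h}} \hat M_{jt} - 6 \frac{\nabla_{\delta_{2k}} \hat B_t \nabla_{\delta_{2l}} \hat B_t \nabla_{\delta_{2h}} \hat B_t}{\hat B_t^4} \hat M_{jt} \Bigg\} = 0.
\end{aligned}
\tag{B.36}
$$

Among the fifteen terms, only the 1st and the 6th terms involve third order derivatives. They will be analyzed later. Among the remaining terms, we have the following five cases:

(1) The 4th, 7th and 9th terms involve second order derivatives of $\hat B_t$ and first order derivatives of $\hat M_{jt}$, which lead to: $T^{-1} \sum_{t=1}^{T} (\nabla_{\theta_{i_1}} \nabla_{\theta_{i_2}} \tilde f_{at}/\tilde f_t)(\nabla_{\theta_{j_1}} \nabla_{\theta_{j_2}} \tilde f_{bt}/\tilde f_t)$, $T^{-1} \sum_{t=1}^{T} (\nabla_{\theta_{i_1}} \tilde f_{at}/\tilde f_t)(\nabla_{\theta_{j_1}} \nabla_{\theta_{j_2}} \tilde f_{bt}/\tilde f_t)$, $T^{-1} \sum_{t=1}^{T} (\nabla_{\theta_{i_1}} \nabla_{\theta_{i_2}} \tilde f_{at}/\tilde f_t) \nabla_{\theta_{j_1}} \tilde \xi_{t|t-1}$, $T^{-1} \sum_{t=1}^{T} (\nabla_{\theta_{i_1}} \tilde f_{at}/\tilde f_t)(\nabla_{\theta_{i_2}} \tilde f_{bt}/\tilde f_t) \nabla_{\theta_{j_1}} \tilde \xi_{t|t-1} \nabla_{\theta_{j_2}} \tilde \xi_{t|t-1}$ and $T^{-1} \sum_{t=1}^{T} (\nabla_{\theta_{i_1}} \tilde f_{at}/\tilde f_t)(\nabla_{\theta_{j_1}} \tilde f_{bt}/\tilde f_t) \nabla_{\theta_{j_2}} \tilde \xi_{t|t-1}$, where $1 \le i_1, i_2, j_1, j_2 \le n_\beta + 2n_\delta$, $a = 1, 2$ and $b = 1, 2$. They are all $O_p(1)$ by Lemma A.2.

(2) The 2nd, 3rd and 10th terms consist of first order derivatives of $\hat B_t$ and second order derivatives of $\hat M_{jt}$. They lead to: $T^{-1} \sum_{t=1}^{T} (\nabla_{\theta_{i_1}} \tilde f_{at}/\tilde f_t)(\nabla_{\theta_{j_1}} \nabla_{\theta_{j_2}} \nabla_{\theta_{j_3}} \tilde f_{bt}/\tilde f_t)$, $T^{-1} \sum_{t=1}^{T} (\nabla_{\theta_{i_1}} \nabla_{\theta_{i_2}} \tilde f_{at}/\tilde f_t)(\nabla_{\theta_{j_1}} \tilde f_{bt}/\tilde f_t) \nabla_{\theta_{j_2}} \tilde \xi_{t|t-1}$, $T^{-1} \sum_{t=1}^{T} (\nabla_{\theta_{i_1}} \tilde f_{at}/\tilde f_t)(\nabla_{\theta_{i_2}} \tilde f_{bt}/\tilde f_t) \nabla_{\theta_{j_1}} \nabla_{\theta_{j_2}} \tilde \xi_{t|t-1}$, $T^{-1} \sum_{t=1}^{T} (\nabla_{\theta_{i_1}} \nabla_{\theta_{i_2}} \tilde f_{at}/\tilde f_t)(\nabla_{\theta_{j_1}} \tilde f_{bt}/\tilde f_t)$ and $T^{-1} \sum_{t=1}^{T} (\nabla_{\theta_{i_1}} \tilde f_{at}/\tilde f_t)(\nabla_{\theta_{j_1}} \tilde f_{bt}/\tilde f_t) \nabla_{\theta_{j_2}} \tilde \xi_{t|t-1}$, which are all $O_p(1)$.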
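These three terms are thus $O_p(T^{-1/2})$ after applying (B.35) to the first order derivatives of $\hat B_t$.

(3) The 5th, 11th and 14th terms consist of:
$T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\tilde f_{at}/\tilde f_t)(\nabla_{\theta_{i_2}}\tilde f_{bt}/\tilde f_t)(\nabla_{\theta_{i_3}}\tilde f_{ct}/\tilde f_t)\nabla_{\theta_{j_1}}\tilde\xi_{t|t-1}$
and
$T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\tilde f_{at}/\tilde f_t)(\nabla_{\theta_{i_2}}\tilde f_{bt}/\tilde f_t)(\nabla_{\theta_{j_1}}\nabla_{\theta_{j_2}}\tilde f_{ct}/\tilde f_t)$,
which are all $O_p(1)$. Consequently, these three terms are $O_p(T^{-1/2})$ after applying (B.35).

(4) The 8th, 12th and 13th terms lead to:
$T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\tilde f_{at}/\tilde f_t)(\nabla_{\theta_{j_1}}\tilde f_{bt}/\tilde f_t)(\nabla_{\theta_{j_2}}\tilde f_{ct}/\tilde f_t)$,
$T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\tilde f_{at}/\tilde f_t)(\nabla_{\theta_{i_2}}\tilde f_{bt}/\tilde f_t)(\nabla_{\theta_{i_3}}\tilde f_{ct}/\tilde f_t)\nabla_{\theta_{j_1}}\tilde\xi_{t|t-1}$
and
$T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\tilde f_{at}/\tilde f_t)(\nabla_{\theta_{i_2}}\tilde f_{bt}/\tilde f_t)(\nabla_{\theta_{j_1}}\nabla_{\theta_{j_2}}\tilde f_{ct}/\tilde f_t)$,
which are all $O_p(1)$.

(5) The 15th term consists of
$T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\tilde f_{at}/\tilde f_t)(\nabla_{\theta_{i_2}}\tilde f_{bt}/\tilde f_t)(\nabla_{\theta_{i_3}}\tilde f_{ct}/\tilde f_t)(\nabla_{\theta_{i_4}}\tilde f_{ct}/\tilde f_t)$.
This term is $O_p(T^{-1/2})$ after applying (B.35).

To analyze the remaining two terms in (B.36), we need third order derivatives of $\hat M_{jt}$ and $\hat B_t$:
\[
\begin{aligned}
&\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\nabla_{\delta_{2h}}\hat M_{jt} \\
&= \sum_{u=1}^{n_\beta+2n_\delta}\sum_{s=1}^{n_\beta+2n_\delta}\Big\{
\nabla_{\theta_u}\hat\xi_{t|t-1}\nabla_{\theta_j}\nabla_{\theta_s}\nabla_{\theta'}\hat f_{1t}
+\nabla_{\theta_u}\nabla_{\theta'}\hat\xi_{t|t-1}\nabla_{\theta_j}\nabla_{\theta_s}\hat f_{1t}
+\hat\xi_{t|t-1}\nabla_{\theta_j}\nabla_{\theta_s}\nabla_{\theta_u}\nabla_{\theta'}\hat f_{1t} \\
&\qquad+\nabla_{\theta'}\hat\xi_{t|t-1}\nabla_{\theta_j}\nabla_{\theta_s}\nabla_{\theta_u}\hat f_{1t}
-\nabla_{\theta_u}\hat\xi_{t|t-1}\nabla_{\theta_j}\nabla_{\theta_s}\nabla_{\theta'}\hat f_{2t}
-\nabla_{\theta_u}\nabla_{\theta'}\hat\xi_{t|t-1}\nabla_{\theta_j}\nabla_{\theta_s}\hat f_{2t} \\
&\qquad+(1-\hat\xi_{t|t-1})\nabla_{\theta_j}\nabla_{\theta_s}\nabla_{\theta_u}\nabla_{\theta'}\hat f_{2t}
-\nabla_{\theta'}\hat\xi_{t|t-1}\nabla_{\theta_j}\nabla_{\theta_s}\nabla_{\theta_u}\hat f_{2t} \\
&\qquad+(\nabla_{\theta_j}\nabla_{\theta_u}\nabla_{\theta'}\hat f_{1t}-\nabla_{\theta_j}\nabla_{\theta_u}\nabla_{\theta'}\hat f_{2t})\nabla_{\theta_s}\hat\xi_{t|t-1}
+(\nabla_{\theta_j}\nabla_{\theta_u}\hat f_{1t}-\nabla_{\theta_j}\nabla_{\theta_u}\hat f_{2t})\nabla_{\theta_s}\nabla_{\theta'}\hat\xi_{t|t-1} \\
&\qquad+(\nabla_{\theta_j}\nabla_{\theta'}\hat f_{1t}-\nabla_{\theta_j}\nabla_{\theta'}\hat f_{2t})\nabla_{\theta_s}\nabla_{\theta_u}\hat\xi_{t|t-1}
+(\nabla_{\theta_j}\hat f_{1t}-\nabla_{\theta_j}\hat f_{2t})\nabla_{\theta_s}\nabla_{\theta_u}\nabla_{\theta'}\hat\xi_{t|t-1} \\
&\qquad+(\nabla_{\theta_s}\nabla_{\theta'}\hat f_{1t}-\nabla_{\theta_s}\nabla_{\theta'}\hat f_{2t})\nabla_{\theta_j}\nabla_{\theta_u}\hat\xi_{t|t-1}
+(\nabla_{\theta_s}\hat f_{1t}-\nabla_{\theta_s}\hat f_{2t})\nabla_{\theta_j}\nabla_{\theta_u}\nabla_{\theta'}\hat\xi_{t|t-1} \\
&\qquad+(\nabla_{\theta_s}\nabla_{\theta_u}\nabla_{\theta'}\hat f_{1t}-\nabla_{\theta_s}\nabla_{\theta_u}\nabla_{\theta'}\hat f_{2t})\nabla_{\theta_j}\hat\xi_{t|t-1}
+(\nabla_{\theta_s}\nabla_{\theta_u}\hat f_{1t}-\nabla_{\theta_s}\nabla_{\theta_u}\hat f_{2t})\nabla_{\theta_j}\nabla_{\theta'}\hat\xi_{t|t-1} \\
&\qquad+(\nabla_{\theta_u}\nabla_{\theta'}\hat f_{1t}-\nabla_{\theta_u}\nabla_{\theta'}\hat f_{2t})\nabla_{\theta_j}\nabla_{\theta_s}\hat\xi_{t|t-1}
+(\nabla_{\theta_u}\hat f_{1t}-\nabla_{\theta_u}\hat f_{2t})\nabla_{\theta_j}\nabla_{\theta_s}\nabla_{\theta'}\hat\xi_{t|t-1} \\
&\qquad+(\nabla_{\theta'}\hat f_{1t}-\nabla_{\theta'}\hat f_{2t})\nabla_{\theta_j}\nabla_{\theta_s}\nabla_{\theta_u}\hat\xi_{t|t-1}
+(\hat f_{1t}-\hat f_{2t})\nabla_{\theta_j}\nabla_{\theta_s}\nabla_{\theta_u}\nabla_{\theta'}\hat\xi_{t|t-1}\Big\}
\nabla_{\delta_{2k}}\hat\theta_s(\delta_2)\nabla_{\delta_{2l}}\hat\theta_u(\delta_2)\nabla_{\delta_{2h}}\hat\theta(\delta_2) \\
&\quad+\sum_{u=1}^{n_\beta+2n_\delta}\sum_{s=1}^{n_\beta+2n_\delta}\Big\{
\nabla_{\theta_u}\hat\xi_{t|t-1}\nabla_{\theta_j}\nabla_{\theta_s}\hat f_{1t}
+\hat\xi_{t|t-1}\nabla_{\theta_j}\nabla_{\theta_s}\nabla_{\theta_u}\hat f_{1t}
-\nabla_{\theta_u}\hat\xi_{t|t-1}\nabla_{\theta_j}\nabla_{\theta_s}\hat f_{2t}
+(1-\hat\xi_{t|t-1})\nabla_{\theta_j}\nabla_{\theta_s}\nabla_{\theta_u}\hat f_{2t} \\
&\qquad+(\nabla_{\theta_j}\nabla_{\theta_u}\hat f_{1t}-\nabla_{\theta_j}\nabla_{\theta_u}\hat f_{2t})\nabla_{\theta_s}\hat\xi_{t|t-1}
+(\nabla_{\theta_j}\hat f_{1t}-\nabla_{\theta_j}\hat f_{2t})\nabla_{\theta_s}\nabla_{\theta_u}\hat\xi_{t|t-1}
+(\nabla_{\theta_s}\hat f_{1t}-\nabla_{\theta_s}\hat f_{2t})\nabla_{\theta_j}\nabla_{\theta_u}\hat\xi_{t|t-1} \\
&\qquad+(\nabla_{\theta_s}\nabla_{\theta_u}\hat f_{1t}-\nabla_{\theta_s}\nabla_{\theta_u}\hat f_{2t})\nabla_{\theta_j}\hat\xi_{t|t-1}
+(\nabla_{\theta_u}\hat f_{1t}-\nabla_{\theta_u}\hat f_{2t})\nabla_{\theta_j}\nabla_{\theta_s}\hat\xi_{t|t-1}
+(\hat f_{1t}-\hat f_{2t})\nabla_{\theta_j}\nabla_{\theta_s}\nabla_{\theta_u}\hat\xi_{t|t-1}\Big\} \\
&\qquad\times\big[\nabla_{\delta_{2k}}\nabla_{\delta_{2h}}\hat\theta_s(\delta_2)\nabla_{\delta_{2l}}\hat\theta_u(\delta_2)+\nabla_{\delta_{2k}}\hat\theta_s(\delta_2)\nabla_{\delta_{2l}}\nabla_{\delta_{2h}}\hat\theta_u(\delta_2)\big] \\
&\quad+\sum_{s=1}^{n_\beta+2n_\delta}\Big\{
\hat\xi_{t|t-1}\nabla_{\theta_j}\nabla_{\theta_s}\nabla_{\theta'}\hat f_{1t}
+\nabla_{\theta'}\hat\xi_{t|t-1}\nabla_{\theta_j}\nabla_{\theta_s}\hat f_{1t}
+(1-\hat\xi_{t|t-1})\nabla_{\theta_j}\nabla_{\theta_s}\nabla_{\theta'}\hat f_{2t}
-\nabla_{\theta'}\hat\xi_{t|t-1}\nabla_{\theta_j}\nabla_{\theta_s}\hat f_{2t} \\
&\qquad+(\nabla_{\theta_j}\nabla_{\theta'}\hat f_{1t}-\nabla_{\theta_j}\nabla_{\theta'}\hat f_{2t})\nabla_{\theta_s}\hat\xi_{t|t-1}
+(\nabla_{\theta_j}\hat f_{1t}-\nabla_{\theta_j}\hat f_{2t})\nabla_{\theta_s}\nabla_{\theta'}\hat\xi_{t|t-1}
+(\nabla_{\theta_s}\nabla_{\theta'}\hat f_{1t}-\nabla_{\theta_s}\nabla_{\theta'}\hat f_{2t})\nabla_{\theta_j}\hat\xi_{t|t-1} \\
&\qquad+(\nabla_{\theta_s}\hat f_{1t}-\nabla_{\theta_s}\hat f_{2t})\nabla_{\theta_j}\nabla_{\theta'}\hat\xi_{t|t-1}
+(\nabla_{\theta'}\hat f_{1t}-\nabla_{\theta'}\hat f_{2t})\nabla_{\theta_j}\nabla_{\theta_s}\hat\xi_{t|t-1}
+(\hat f_{1t}-\hat f_{2t})\nabla_{\theta_j}\nabla_{\theta_s}\nabla_{\theta'}\hat\xi_{t|t-1}\Big\}
\nabla_{\delta_{2h}}\hat\theta(\delta_2)\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\hat\theta_s(\delta_2) \\
&\quad+\Big\{\hat\xi_{t|t-1}\nabla_{\theta_j}\nabla_{\theta'}\hat f_{1t}
+(1-\hat\xi_{t|t-1})\nabla_{\theta_j}\nabla_{\theta'}\hat f_{2t}
+(\nabla_{\theta_j}\hat f_{1t}-\nabla_{\theta_j}\hat f_{2t})\nabla_{\theta'}\hat\xi_{t|t-1} \\
&\qquad+(\nabla_{\theta'}\hat f_{1t}-\nabla_{\theta'}\hat f_{2t})\nabla_{\theta_j}\hat\xi_{t|t-1}
+(\hat f_{1t}-\hat f_{2t})\nabla_{\theta_j}\nabla_{\theta'}\hat\xi_{t|t-1}\Big\}
\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\nabla_{\delta_{2h}}\hat\theta(\delta_2),
\end{aligned}
\]
and
\[
\begin{aligned}
&\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\nabla_{\delta_{2h}}\hat B_t \\
&= \sum_{s=1}^{n_\beta+2n_\delta}\sum_{u=1}^{n_\beta+2n_\delta}\Big\{
\nabla_{\theta_u}\nabla_{\theta'}\hat\xi_{t|t-1}\nabla_{\theta_s}\hat f_{1t}
+\nabla_{\theta_u}\hat\xi_{t|t-1}\nabla_{\theta_s}\nabla_{\theta'}\hat f_{1t}
+\nabla_{\theta'}\hat\xi_{t|t-1}\nabla_{\theta_s}\nabla_{\theta_u}\hat f_{1t}
+\hat\xi_{t|t-1}\nabla_{\theta_s}\nabla_{\theta_u}\nabla_{\theta'}\hat f_{1t} \\
&\qquad-\nabla_{\theta_u}\nabla_{\theta'}\hat\xi_{t|t-1}\nabla_{\theta_s}\hat f_{2t}
-\nabla_{\theta_u}\hat\xi_{t|t-1}\nabla_{\theta_s}\nabla_{\theta'}\hat f_{2t}
+(1-\hat\xi_{t|t-1})\nabla_{\theta_s}\nabla_{\theta_u}\nabla_{\theta'}\hat f_{2t}
-\nabla_{\theta'}\hat\xi_{t|t-1}\nabla_{\theta_s}\nabla_{\theta_u}\hat f_{2t} \\
&\qquad+(\nabla_{\theta_u}\nabla_{\theta'}\hat f_{1t}-\nabla_{\theta_u}\nabla_{\theta'}\hat f_{2t})\nabla_{\theta_s}\hat\xi_{t|t-1}
+(\nabla_{\theta_u}\hat f_{1t}-\nabla_{\theta_u}\hat f_{2t})\nabla_{\theta_s}\nabla_{\theta'}\hat\xi_{t|t-1} \\
&\qquad+(\nabla_{\theta'}\hat f_{1t}-\nabla_{\theta'}\hat f_{2t})\nabla_{\theta_s}\nabla_{\theta_u}\hat\xi_{t|t-1}
+(\hat f_{1t}-\hat f_{2t})\nabla_{\theta_s}\nabla_{\theta_u}\nabla_{\theta'}\hat\xi_{t|t-1}\Big\}
\nabla_{\delta_{2l}}\hat\theta_u(\delta_2)\nabla_{\delta_{2k}}\hat\theta_s(\delta_2)\nabla_{\delta_{2h}}\hat\theta(\delta_2) \\
&\quad+\sum_{s=1}^{n_\beta+2n_\delta}\sum_{u=1}^{n_\beta+2n_\delta}\Big\{
\nabla_{\theta_u}\hat\xi_{t|t-1}\nabla_{\theta_s}\hat f_{1t}
+\hat\xi_{t|t-1}\nabla_{\theta_s}\nabla_{\theta_u}\hat f_{1t}
-\nabla_{\theta_u}\hat\xi_{t|t-1}\nabla_{\theta_s}\hat f_{2t}
+(1-\hat\xi_{t|t-1})\nabla_{\theta_s}\nabla_{\theta_u}\hat f_{2t} \\
&\qquad+(\nabla_{\theta_u}\hat f_{1t}-\nabla_{\theta_u}\hat f_{2t})\nabla_{\theta_s}\hat\xi_{t|t-1}
+(\hat f_{1t}-\hat f_{2t})\nabla_{\theta_s}\nabla_{\theta_u}\hat\xi_{t|t-1}\Big\} \\
&\qquad\times\big[\nabla_{\delta_{2h}}\nabla_{\delta_{2l}}\hat\theta_u(\delta_2)\nabla_{\delta_{2k}}\hat\theta_s(\delta_2)+\nabla_{\delta_{2l}}\hat\theta_u(\delta_2)\nabla_{\delta_{2h}}\nabla_{\delta_{2k}}\hat\theta_s(\delta_2)\big] \\
&\quad+\sum_{s=1}^{n_\beta+2n_\delta}\Big\{
\nabla_{\theta'}\hat\xi_{t|t-1}\nabla_{\theta_s}\hat f_{1t}
+\hat\xi_{t|t-1}\nabla_{\theta_s}\nabla_{\theta'}\hat f_{1t}
+(1-\hat\xi_{t|t-1})\nabla_{\theta_s}\nabla_{\theta'}\hat f_{2t}
-\nabla_{\theta'}\hat\xi_{t|t-1}\nabla_{\theta_s}\hat f_{2t} \\
&\qquad+(\nabla_{\theta'}\hat f_{1t}-\nabla_{\theta'}\hat f_{2t})\nabla_{\theta_s}\hat\xi_{t|t-1}
+(\hat f_{1t}-\hat f_{2t})\nabla_{\theta_s}\nabla_{\theta'}\hat\xi_{t|t-1}\Big\}
\nabla_{\delta_{2h}}\hat\theta(\delta_2)\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\hat\theta_s(\delta_2) \\
&\quad+\big[\hat\xi_{t|t-1}\nabla_{\theta'}\hat f_{1t}+(1-\hat\xi_{t|t-1})\nabla_{\theta'}\hat f_{2t}+(\hat f_{1t}-\hat f_{2t})\nabla_{\theta'}\hat\xi_{t|t-1}\big]\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\nabla_{\delta_{2h}}\hat\theta(\delta_2).
\end{aligned}
\]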
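The term count and coefficients in (B.36) can be checked mechanically. The following is a minimal sketch, not part of the original proof: it applies the product rule symbolically to the ratio $M/B$, treating $M$ and $B$ as generic smooth functions, and differentiates three times with respect to distinct indices. The function names `differentiate` and `expand_ratio` and the tuple encoding of terms are hypothetical choices made only for this illustration.

```python
from collections import Counter

def differentiate(terms, h):
    """Differentiate a sum of terms c * M_S * prod_i B_{S_i} / B**n w.r.t. index h.

    Each key is (S, (S_1, ..., S_r), n), where S and the S_i are sorted tuples
    of derivative indices on M and on the numerator factors of B.
    """
    out = Counter()
    for (m, bs, n), c in terms.items():
        # Case 1: the derivative falls on M.
        out[(tuple(sorted(m + (h,))), bs, n)] += c
        # Case 2: the derivative falls on one numerator factor of B.
        for i, b in enumerate(bs):
            new_bs = bs[:i] + (tuple(sorted(b + (h,))),) + bs[i + 1:]
            out[(m, tuple(sorted(new_bs)), n)] += c
        # Case 3: the derivative falls on B**(-n).
        out[(m, tuple(sorted(bs + ((h,),))), n + 1)] += -n * c
    return {key: c for key, c in out.items() if c != 0}

def expand_ratio(indices):
    """Expand the mixed derivative of M / B with respect to the given indices."""
    terms = {((), (), 1): 1}  # the starting ratio M / B
    for h in indices:
        terms = differentiate(terms, h)
    return terms

terms = expand_ratio(["k", "l", "h"])
coefficients = Counter(terms.values())
# Terms that contain a third order derivative of either M or B.
third_order = [key for key in terms
               if len(key[0]) == 3 or any(len(b) == 3 for b in key[1])]
print(len(terms), dict(coefficients), len(third_order))
```

Under these assumptions the expansion has fifteen terms, with coefficients $+1$ once, $-1$ seven times, $+2$ six times and $-6$ once, and exactly two terms carry a third order derivative, which agrees with the structure of (B.36).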
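Consider the 1st term in (B.36). In the expression of $\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\nabla_{\delta_{2h}}\hat M_{jt}$, only the last two lines involve third order derivatives of $\hat\theta(\delta_2)$. These derivatives are multiplied by (after division by $\tilde f_t$): $T^{-1}\sum_{t=1}^{T}\nabla_{\theta_{i_1}}\nabla_{\theta_{i_2}}\tilde f_{at}/\tilde f_t$ and $T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\tilde f_{at}/\tilde f_t)\nabla_{\theta_{j_1}}\tilde\xi_{t|t-1}$, where $a = 1, 2$. They are $O_p(T^{-1/2})$ by Lemma A.2. The remaining components of $\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\nabla_{\delta_{2h}}\hat M_{jt}$ lead to: $T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\cdots\nabla_{\theta_{i_k}}\tilde f_{at}/\tilde f_t)$ for $a = 1, 2$ and $k \le 4$, and $T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\cdots\nabla_{\theta_{i_k}}\tilde f_{at}/\tilde f_t)(\nabla_{\theta_{j_1}}\cdots\nabla_{\theta_{j_m}}\tilde\xi_{t|t-1})$ for $a = 1, 2$ and $k + m \le 4$. They are all $o_p(1)$ by Lemma A.2. Therefore the contribution of $\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\nabla_{\delta_{2h}}\hat M_{jt}$ to (B.36) is $o_p(1)$.

Finally, we turn to the 6th term in (B.36). In the expression for $\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\nabla_{\delta_{2h}}\hat B_t$, only the final line involves third order derivatives of $\hat\theta(\delta_2)$. It can be analyzed in the same way as the second term in (B.26); see Step 2 of the proof there. The remaining components, multiplied by $\hat M_{jt}$, lead to:
$T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\tilde f_{at}/\tilde f_t)(\nabla_{\theta_{i_2}}\tilde f_{bt}/\tilde f_t)(\nabla_{\theta_{j_1}}\nabla_{\theta_{j_2}}\tilde\xi_{t|t-1})$,
$T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\nabla_{\theta_{i_2}}\tilde f_{at}/\tilde f_t)(\nabla_{\theta_{j_1}}\tilde f_{bt}/\tilde f_t)(\nabla_{\theta_{j_2}}\tilde\xi_{t|t-1})$,
$T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\tilde f_{at}/\tilde f_t)(\nabla_{\theta_{i_2}}\tilde f_{bt}/\tilde f_t)(\nabla_{\theta_{j_1}}\tilde\xi_{t|t-1})$,
$T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\nabla_{\theta_{i_2}}\tilde f_{at}/\tilde f_t)(\nabla_{\theta_{j_1}}\tilde f_{bt}/\tilde f_t)$
and
$T^{-1}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\nabla_{\theta_{i_2}}\nabla_{\theta_{i_3}}\tilde f_{at}/\tilde f_t)(\nabla_{\theta_{j_1}}\tilde f_{bt}/\tilde f_t)$
for $a = 1, 2$ and $b = 1, 2$. They are all $O_p(1)$ by Lemma A.2. This implies the desired result.

Proof of Lemma 4. The first order derivative of $L(p, q, \delta_2)$ with respect to $\delta_{2j}$ gives
\[
\begin{aligned}
L^{(1)}_j(p,q,\delta_2) &= \nabla_{\delta_{2j}}L(p,q,\delta_2) \\
&= \sum_{t=1}^{T}\frac{1}{\hat B_t}\Big[\nabla_{\theta'}\hat f_{1t}\,\hat\xi_{t|t-1}+\nabla_{\theta'}\hat f_{2t}(1-\hat\xi_{t|t-1})+(\hat f_{1t}-\hat f_{2t})\nabla_{\theta'}\hat\xi_{t|t-1}\Big]\nabla_{\delta_{2j}}\hat\theta(\delta_2) \\
&= \sum_{s=1}^{n_\beta+n_\delta}\Bigg\{\sum_{t=1}^{T}\frac{1}{\hat B_t}\Big[\nabla_{\theta_s}\hat f_{1t}\,\hat\xi_{t|t-1}+\nabla_{\theta_s}\hat f_{2t}(1-\hat\xi_{t|t-1})+(\hat f_{1t}-\hat f_{2t})\nabla_{\theta_s}\hat\xi_{t|t-1}\Big]\Bigg\}\nabla_{\delta_{2j}}\hat\theta_s(\delta_2) \\
&\quad+\sum_{t=1}^{T}\frac{1}{\hat B_t}\Big[\nabla_{\delta_{2j}}\hat f_{1t}\,\hat\xi_{t|t-1}+\nabla_{\delta_{2j}}\hat f_{2t}(1-\hat\xi_{t|t-1})+(\hat f_{1t}-\hat f_{2t})\nabla_{\delta_{2j}}\hat\xi_{t|t-1}\Big],
\end{aligned}
\]
where the third equality follows from the definition of $\nabla_{\delta_{2j}}\hat\theta(\delta_2)$.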
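The term inside the curly brackets is zero because of the first order conditions determining $\hat\beta(\delta_2)$ and $\hat\delta_1(\delta_2)$. Thus,
\[
L^{(1)}_j(p,q,\delta_2) = \sum_{t=1}^{T}\frac{\hat L_{jt}}{\hat B_t},
\]
where $\hat B_t$ is defined in (B.1) and
\[
\hat L_{jt} = \nabla_{\delta_{2j}}\hat f_{1t}\,\hat\xi_{t|t-1}+\nabla_{\delta_{2j}}\hat f_{2t}(1-\hat\xi_{t|t-1})+(\hat f_{1t}-\hat f_{2t})\nabla_{\delta_{2j}}\hat\xi_{t|t-1}.
\]
When $\delta_2 = \tilde\delta$, by (B.33), we have
\[
\hat L_{jt} = \frac{1-\xi_*}{\xi_*}\,\hat M_{(n_\beta+j)t}
\tag{B.37}
\]
and
\[
L^{(1)}_j(p,q,\tilde\delta) = \frac{1-\xi_*}{\xi_*}\,M^{(1)}_{n_\beta+j}(p,q,\tilde\delta) = 0.
\]
Consider the second result in the lemma. Differentiate $L^{(1)}_j(p,q,\delta_2)$ with respect to the $k$-th component of $\delta_2$:
\[
L^{(2)}_{jk}(p,q,\delta_2) = \sum_{t=1}^{T}\frac{\nabla_{\delta_{2k}}\hat L_{jt}}{\hat B_t}-\sum_{t=1}^{T}\frac{\nabla_{\delta_{2k}}\hat B_t}{\hat B_t^{2}}\,\hat L_{jt},
\]
where
\[
\begin{aligned}
\nabla_{\delta_{2k}}\hat L_{jt} = \Big\{&\hat\xi_{t|t-1}\nabla_{\delta_{2j}}\nabla_{\theta'}\hat f_{1t}+(1-\hat\xi_{t|t-1})\nabla_{\delta_{2j}}\nabla_{\theta'}\hat f_{2t}+(\nabla_{\delta_{2j}}\hat f_{1t}-\nabla_{\delta_{2j}}\hat f_{2t})\nabla_{\theta'}\hat\xi_{t|t-1} \\
&+(\nabla_{\theta'}\hat f_{1t}-\nabla_{\theta'}\hat f_{2t})\nabla_{\delta_{2j}}\hat\xi_{t|t-1}+(\hat f_{1t}-\hat f_{2t})\nabla_{\delta_{2j}}\nabla_{\theta'}\hat\xi_{t|t-1}\Big\}\nabla_{\delta_{2k}}\hat\theta(\delta_2).
\end{aligned}
\]
Because $M^{(2)}_{(n_\beta+j)k}(p,q,\delta_2) = 0$, we have
\[
\begin{aligned}
T^{-1/2}L^{(2)}_{jk}(p,q,\delta_2) &= T^{-1/2}L^{(2)}_{jk}(p,q,\delta_2)-T^{1/2}\,\frac{1-\xi_*}{\xi_*}\,M^{(2)}_{(n_\beta+j)k}(p,q,\delta_2) \\
&= T^{-1/2}\sum_{t=1}^{T}\Bigg\{\frac{\nabla_{\delta_{2k}}\hat L_{jt}}{\hat B_t}-\frac{1-\xi_*}{\xi_*}\,\frac{\nabla_{\delta_{2k}}\hat M_{(n_\beta+j)t}}{\hat B_t}\Bigg\} \\
&\quad-T^{-1/2}\sum_{t=1}^{T}\frac{\nabla_{\delta_{2k}}\hat B_t}{\hat B_t^{2}}\Bigg\{\hat L_{jt}-\frac{1-\xi_*}{\xi_*}\,\hat M_{(n_\beta+j)t}\Bigg\}.
\end{aligned}
\]
The second summation on the right hand side equals 0 when evaluated at $\delta_2 = \tilde\delta$ by (B.37). Consider the first summation. At $\delta_2 = \tilde\delta$, $T^{-1/2}\sum_{t=1}^{T}\big[\nabla_{\delta_{2k}}\hat L_{jt}/\hat B_t\big]$ equals
\[
\sum_{s=1}^{n_\beta+2n_\delta}\Bigg\{T^{-1/2}\sum_{t=1}^{T}\Bigg[\frac{\xi_*\nabla_{\delta_{2j}}\nabla_{\theta_s}\tilde f_{1t}+(1-\xi_*)\nabla_{\delta_{2j}}\nabla_{\theta_s}\tilde f_{2t}}{\tilde f_t}
+\frac{(\nabla_{\delta_{2j}}\tilde f_{1t}-\nabla_{\delta_{2j}}\tilde f_{2t})\nabla_{\theta_s}\tilde\xi_{t|t-1}+(\nabla_{\theta_s}\tilde f_{1t}-\nabla_{\theta_s}\tilde f_{2t})\nabla_{\delta_{2j}}\tilde\xi_{t|t-1}}{\tilde f_t}\Bigg]\Bigg\}\nabla_{\delta_{2k}}\hat\theta_s(\tilde\delta).
\]
The terms in the curly brackets are all $O_p(1)$ while $\nabla_{\delta_{2k}}\hat\theta_s(\tilde\delta) = O_p(T^{-1/2})$ unless $s = n_\beta + k$ or $s = n_\beta + n_\delta + k$. Similarly, at $\delta_2 = \tilde\delta$, $T^{-1/2}\sum_{t=1}^{T}\big[\nabla_{\delta_{2k}}\hat M_{(n_\beta+j)t}/\hat B_t\big]$ equals
\[
\sum_{s=1}^{n_\beta+2n_\delta}\Bigg\{T^{-1/2}\sum_{t=1}^{T}\Bigg[\frac{\xi_*\nabla_{\delta_{1j}}\nabla_{\theta_s}\tilde f_{1t}+(1-\xi_*)\nabla_{\delta_{1j}}\nabla_{\theta_s}\tilde f_{2t}}{\tilde f_t}
+\frac{(\nabla_{\delta_{1j}}\tilde f_{1t}-\nabla_{\delta_{1j}}\tilde f_{2t})\nabla_{\theta_s}\tilde\xi_{t|t-1}+(\nabla_{\theta_s}\tilde f_{1t}-\nabla_{\theta_s}\tilde f_{2t})\nabla_{\delta_{1j}}\tilde\xi_{t|t-1}}{\tilde f_t}\Bigg]\Bigg\}\nabla_{\delta_{2k}}\hat\theta_s(\tilde\delta),
\]
and the terms in the curly brackets are all $O_p(1)$ while $\nabla_{\delta_{2k}}\hat\theta_s(\tilde\delta) = O_p(T^{-1/2})$ unless $s = n_\beta + k$ or $s = n_\beta + n_\delta + k$. Combining the two preceding displays, we have
\[
T^{-1/2}L^{(2)}_{jk}(p,q,\tilde\delta) = T^{-1/2}\sum_{t=1}^{T}\tilde U_{jk,t}+o_p(1).
\]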
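The step in the proof of Lemma 4 where the curly-bracket term drops out by the first order conditions is an envelope-type argument: once the nuisance parameters are set to their maximizers, the derivative of the concentrated objective with respect to $\delta_2$ equals the partial derivative holding them fixed. The toy check below is only an illustration under stated assumptions. The quadratic `objective` is hypothetical and is not the paper's likelihood; `b` stands in for $(\beta, \delta_1)$ and `d` for $\delta_2$.

```python
def objective(b, d):
    # Hypothetical concave toy objective, not the likelihood in the paper.
    return -(b - d) ** 2 - 0.5 * b ** 2

def b_hat(d):
    # Maximizer of objective(., d): the first order condition
    # -2 * (b - d) - b = 0 gives b = 2 * d / 3.
    return 2.0 * d / 3.0

def concentrated(d):
    # Objective with the nuisance parameter concentrated out.
    return objective(b_hat(d), d)

def central_diff(f, x, step=1e-5):
    # Two-sided finite difference approximation to f'(x).
    return (f(x + step) - f(x - step)) / (2.0 * step)

d0 = 0.7
# Total derivative of the concentrated objective with respect to d.
total = central_diff(concentrated, d0)
# Partial derivative with respect to d, holding b fixed at b_hat(d0).
partial_d = central_diff(lambda d: objective(b_hat(d0), d), d0)
# Score with respect to b at the maximizer; zero by the first order condition.
partial_b = central_diff(lambda b: objective(b, d0), b_hat(d0))
print(total, partial_d, partial_b)
```

In this toy case the total and partial derivatives coincide because the score in `b` vanishes at `b_hat(d0)`, which mirrors why only $\hat L_{jt}$ survives in $L^{(1)}_j(p,q,\delta_2)$.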
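Consider the third order derivatives. We have
\[
\begin{aligned}
&T^{-3/4}L^{(3)}_{jkl}(p,q,\delta_2)-T^{1/4}\,\frac{1-\xi_*}{\xi_*}\,M^{(3)}_{(n_\beta+j)kl}(p,q,\delta_2) \\
&= T^{-3/4}\sum_{t=1}^{T}\Bigg\{\frac{\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\hat L_{jt}}{\hat B_t}-\frac{1-\xi_*}{\xi_*}\,\frac{\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\hat M_{(n_\beta+j)t}}{\hat B_t}\Bigg\} \\
&\quad-T^{-3/4}\sum_{t=1}^{T}\frac{\nabla_{\delta_{2l}}\hat B_t}{\hat B_t}\Bigg\{\frac{\nabla_{\delta_{2k}}\hat L_{jt}}{\hat B_t}-\frac{1-\xi_*}{\xi_*}\,\frac{\nabla_{\delta_{2k}}\hat M_{(n_\beta+j)t}}{\hat B_t}\Bigg\} \\
&\quad-T^{-3/4}\sum_{t=1}^{T}\frac{\nabla_{\delta_{2k}}\hat B_t}{\hat B_t}\Bigg\{\frac{\nabla_{\delta_{2l}}\hat L_{jt}}{\hat B_t}-\frac{1-\xi_*}{\xi_*}\,\frac{\nabla_{\delta_{2l}}\hat M_{(n_\beta+j)t}}{\hat B_t}\Bigg\} \\
&\quad-T^{-3/4}\sum_{t=1}^{T}\frac{\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\hat B_t}{\hat B_t^{2}}\Bigg\{\hat L_{jt}-\frac{1-\xi_*}{\xi_*}\,\hat M_{(n_\beta+j)t}\Bigg\} \\
&\quad+2T^{-3/4}\sum_{t=1}^{T}\frac{\nabla_{\delta_{2k}}\hat B_t\,\nabla_{\delta_{2l}}\hat B_t}{\hat B_t^{3}}\Bigg\{\hat L_{jt}-\frac{1-\xi_*}{\xi_*}\,\hat M_{(n_\beta+j)t}\Bigg\},
\end{aligned}
\tag{B.38}
\]
where
\[
\begin{aligned}
&\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\hat L_{jt} \\
&= \sum_{s=1}^{n_\beta+2n_\delta}\Big\{
\hat\xi_{t|t-1}\nabla_{\delta_{2j}}\nabla_{\theta_s}\nabla_{\theta'}\hat f_{1t}
+(1-\hat\xi_{t|t-1})\nabla_{\delta_{2j}}\nabla_{\theta_s}\nabla_{\theta'}\hat f_{2t}
+(\nabla_{\delta_{2j}}\nabla_{\theta_s}\hat f_{1t}-\nabla_{\delta_{2j}}\nabla_{\theta_s}\hat f_{2t})\nabla_{\theta'}\hat\xi_{t|t-1} \\
&\qquad+(\nabla_{\delta_{2j}}\nabla_{\theta'}\hat f_{1t}-\nabla_{\delta_{2j}}\nabla_{\theta'}\hat f_{2t})\nabla_{\theta_s}\hat\xi_{t|t-1}
+(\nabla_{\delta_{2j}}\hat f_{1t}-\nabla_{\delta_{2j}}\hat f_{2t})\nabla_{\theta_s}\nabla_{\theta'}\hat\xi_{t|t-1} \\
&\qquad+(\nabla_{\theta_s}\nabla_{\theta'}\hat f_{1t}-\nabla_{\theta_s}\nabla_{\theta'}\hat f_{2t})\nabla_{\delta_{2j}}\hat\xi_{t|t-1}
+(\nabla_{\theta_s}\hat f_{1t}-\nabla_{\theta_s}\hat f_{2t})\nabla_{\delta_{2j}}\nabla_{\theta'}\hat\xi_{t|t-1} \\
&\qquad+(\nabla_{\theta'}\hat f_{1t}-\nabla_{\theta'}\hat f_{2t})\nabla_{\delta_{2j}}\nabla_{\theta_s}\hat\xi_{t|t-1}
+(\hat f_{1t}-\hat f_{2t})\nabla_{\delta_{2j}}\nabla_{\theta_s}\nabla_{\theta'}\hat\xi_{t|t-1}\Big\}
\nabla_{\delta_{2k}}\hat\theta_s(\delta_2)\nabla_{\delta_{2l}}\hat\theta(\delta_2) \\
&\quad+\Big\{\hat\xi_{t|t-1}\nabla_{\delta_{2j}}\nabla_{\theta'}\hat f_{1t}
+(1-\hat\xi_{t|t-1})\nabla_{\delta_{2j}}\nabla_{\theta'}\hat f_{2t}
+(\nabla_{\delta_{2j}}\hat f_{1t}-\nabla_{\delta_{2j}}\hat f_{2t})\nabla_{\theta'}\hat\xi_{t|t-1} \\
&\qquad+(\nabla_{\theta'}\hat f_{1t}-\nabla_{\theta'}\hat f_{2t})\nabla_{\delta_{2j}}\hat\xi_{t|t-1}
+(\hat f_{1t}-\hat f_{2t})\nabla_{\delta_{2j}}\nabla_{\theta'}\hat\xi_{t|t-1}\Big\}
\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\hat\theta(\delta_2).
\end{aligned}
\]
The first summation in (B.38) consists of the following: $T^{-3/4}\sum_{t=1}^{T}(\nabla_{\theta_{i_1}}\cdots\nabla_{\theta_{i_u}}\tilde f_{at}/\tilde f_t)\nabla_{\theta_{j_1}}\cdots\nabla_{\theta_{j_v}}\tilde\xi_{t|t-1}$ with $u + v \le 3$. They are $O_p(T^{-1/4})$ by the first result in Lemma A.2. Combining this result with Lemma A.3, it follows that this summation is $O_p(T^{-1/4})$. The remaining two summations in (B.38) have the same structure. They are both $O_p(T^{-1/4})$ after applying (B.35).

Consider the fourth order derivatives.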
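We have
\[
\begin{aligned}
&T^{-1}L^{(4)}_{jklm}(p,q,\delta_2)-\frac{1-\xi_*}{\xi_*}\,M^{(4)}_{(n_\beta+j)klm}(p,q,\delta_2) \\
&= T^{-1}\sum_{t=1}^{T}\Bigg\{\frac{\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\nabla_{\delta_{2m}}\hat L_{jt}}{\hat B_t}-\frac{1-\xi_*}{\xi_*}\,\frac{\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\nabla_{\delta_{2m}}\hat M_{(n_\beta+j)t}}{\hat B_t}\Bigg\} \\
&\quad-T^{-1}\sum_{t=1}^{T}\frac{\nabla_{\delta_{2m}}\hat B_t}{\hat B_t}\Bigg\{\frac{\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\hat L_{jt}}{\hat B_t}-\frac{1-\xi_*}{\xi_*}\,\frac{\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\hat M_{(n_\beta+j)t}}{\hat B_t}\Bigg\} \\
&\quad-T^{-1}\sum_{t=1}^{T}\frac{\nabla_{\delta_{2l}}\nabla_{\delta_{2m}}\hat B_t}{\hat B_t}\Bigg\{\frac{\nabla_{\delta_{2k}}\hat L_{jt}}{\hat B_t}-\frac{1-\xi_*}{\xi_*}\,\frac{\nabla_{\delta_{2k}}\hat M_{(n_\beta+j)t}}{\hat B_t}\Bigg\} \\
&\quad-T^{-1}\sum_{t=1}^{T}\frac{\nabla_{\delta_{2l}}\hat B_t}{\hat B_t}\Bigg\{\frac{\nabla_{\delta_{2m}}\nabla_{\delta_{2k}}\hat L_{jt}}{\hat B_t}-\frac{1-\xi_*}{\xi_*}\,\frac{\nabla_{\delta_{2k}}\nabla_{\delta_{2m}}\hat M_{(n_\beta+j)t}}{\hat B_t}\Bigg\} \\
&\quad+2T^{-1}\sum_{t=1}^{T}\frac{\nabla_{\delta_{2m}}\hat B_t}{\hat B_t}\Bigg\{\frac{\nabla_{\delta_{2k}}\hat L_{jt}\,\nabla_{\delta_{2l}}\hat B_t}{\hat B_t^{2}}-\frac{1-\xi_*}{\xi_*}\,\frac{\nabla_{\delta_{2k}}\hat M_{(n_\beta+j)t}\,\nabla_{\delta_{2l}}\hat B_t}{\hat B_t^{2}}\Bigg\} \\
&\quad-T^{-1}\sum_{t=1}^{T}\frac{\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\hat B_t}{\hat B_t}\Bigg\{\frac{\nabla_{\delta_{2m}}\hat L_{jt}}{\hat B_t}-\frac{1-\xi_*}{\xi_*}\,\frac{\nabla_{\delta_{2m}}\hat M_{(n_\beta+j)t}}{\hat B_t}\Bigg\} \\
&\quad-T^{-1}\sum_{t=1}^{T}\frac{\nabla_{\delta_{2k}}\nabla_{\delta_{2m}}\hat B_t}{\hat B_t}\Bigg\{\frac{\nabla_{\delta_{2l}}\hat L_{jt}}{\hat B_t}-\frac{1-\xi_*}{\xi_*}\,\frac{\nabla_{\delta_{2l}}\hat M_{(n_\beta+j)t}}{\hat B_t}\Bigg\} \\
&\quad-T^{-1}\sum_{t=1}^{T}\frac{\nabla_{\delta_{2k}}\hat B_t}{\hat B_t}\Bigg\{\frac{\nabla_{\delta_{2l}}\nabla_{\delta_{2m}}\hat L_{jt}}{\hat B_t}-\frac{1-\xi_*}{\xi_*}\,\frac{\nabla_{\delta_{2l}}\nabla_{\delta_{2m}}\hat M_{(n_\beta+j)t}}{\hat B_t}\Bigg\} \\
&\quad+2T^{-1}\sum_{t=1}^{T}\frac{\nabla_{\delta_{2k}}\hat B_t}{\hat B_t}\Bigg\{\frac{\nabla_{\delta_{2l}}\hat L_{jt}\,\nabla_{\delta_{2m}}\hat B_t}{\hat B_t^{2}}-\frac{1-\xi_*}{\xi_*}\,\frac{\nabla_{\delta_{2m}}\hat B_t\,\nabla_{\delta_{2l}}\hat M_{(n_\beta+j)t}}{\hat B_t^{2}}\Bigg\} \\
&\quad-T^{-1}\sum_{t=1}^{T}\frac{\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\nabla_{\delta_{2m}}\hat B_t}{\hat B_t^{2}}\Bigg\{\hat L_{jt}-\frac{1-\xi_*}{\xi_*}\,\hat M_{(n_\beta+j)t}\Bigg\},
\end{aligned}
\tag{B.39}
\]
where
\[
\begin{aligned}
&\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\nabla_{\delta_{2m}}\hat L_{jt} \\
&= \sum_{u=1}^{n_\beta+2n_\delta}\sum_{s=1}^{n_\beta+2n_\delta}\Big\{
\hat\xi_{t|t-1}\nabla_{\delta_{2j}}\nabla_{\theta_s}\nabla_{\theta_u}\nabla_{\theta'}\hat f_{1t}
+(1-\hat\xi_{t|t-1})\nabla_{\delta_{2j}}\nabla_{\theta_s}\nabla_{\theta_u}\nabla_{\theta'}\hat f_{2t} \\
&\qquad+(\nabla_{\delta_{2j}}\nabla_{\theta_s}\nabla_{\theta_u}\hat f_{1t}-\nabla_{\delta_{2j}}\nabla_{\theta_s}\nabla_{\theta_u}\hat f_{2t})\nabla_{\theta'}\hat\xi_{t|t-1}
+(\nabla_{\delta_{2j}}\nabla_{\theta_s}\nabla_{\theta'}\hat f_{1t}-\nabla_{\delta_{2j}}\nabla_{\theta_s}\nabla_{\theta'}\hat f_{2t})\nabla_{\theta_u}\hat\xi_{t|t-1} \\
&\qquad+(\nabla_{\delta_{2j}}\nabla_{\theta_s}\hat f_{1t}-\nabla_{\delta_{2j}}\nabla_{\theta_s}\hat f_{2t})\nabla_{\theta_u}\nabla_{\theta'}\hat\xi_{t|t-1}
+(\nabla_{\delta_{2j}}\nabla_{\theta_u}\nabla_{\theta'}\hat f_{1t}-\nabla_{\delta_{2j}}\nabla_{\theta_u}\nabla_{\theta'}\hat f_{2t})\nabla_{\theta_s}\hat\xi_{t|t-1} \\
&\qquad+(\nabla_{\delta_{2j}}\nabla_{\theta_u}\hat f_{1t}-\nabla_{\delta_{2j}}\nabla_{\theta_u}\hat f_{2t})\nabla_{\theta_s}\nabla_{\theta'}\hat\xi_{t|t-1}
+(\nabla_{\delta_{2j}}\nabla_{\theta'}\hat f_{1t}-\nabla_{\delta_{2j}}\nabla_{\theta'}\hat f_{2t})\nabla_{\theta_s}\nabla_{\theta_u}\hat\xi_{t|t-1} \\
&\qquad+(\nabla_{\delta_{2j}}\hat f_{1t}-\nabla_{\delta_{2j}}\hat f_{2t})\nabla_{\theta_s}\nabla_{\theta_u}\nabla_{\theta'}\hat\xi_{t|t-1}
+(\nabla_{\theta_s}\nabla_{\theta_u}\nabla_{\theta'}\hat f_{1t}-\nabla_{\theta_s}\nabla_{\theta_u}\nabla_{\theta'}\hat f_{2t})\nabla_{\delta_{2j}}\hat\xi_{t|t-1} \\
&\qquad+(\nabla_{\theta_s}\nabla_{\theta_u}\hat f_{1t}-\nabla_{\theta_s}\nabla_{\theta_u}\hat f_{2t})\nabla_{\delta_{2j}}\nabla_{\theta'}\hat\xi_{t|t-1}
+(\nabla_{\theta_s}\nabla_{\theta'}\hat f_{1t}-\nabla_{\theta_s}\nabla_{\theta'}\hat f_{2t})\nabla_{\delta_{2j}}\nabla_{\theta_u}\hat\xi_{t|t-1} \\
&\qquad+(\nabla_{\theta_s}\hat f_{1t}-\nabla_{\theta_s}\hat f_{2t})\nabla_{\delta_{2j}}\nabla_{\theta_u}\nabla_{\theta'}\hat\xi_{t|t-1}
+(\nabla_{\theta_u}\nabla_{\theta'}\hat f_{1t}-\nabla_{\theta_u}\nabla_{\theta'}\hat f_{2t})\nabla_{\delta_{2j}}\nabla_{\theta_s}\hat\xi_{t|t-1} \\
&\qquad+(\nabla_{\theta_u}\hat f_{1t}-\nabla_{\theta_u}\hat f_{2t})\nabla_{\delta_{2j}}\nabla_{\theta_s}\nabla_{\theta'}\hat\xi_{t|t-1}
+(\nabla_{\theta'}\hat f_{1t}-\nabla_{\theta'}\hat f_{2t})\nabla_{\delta_{2j}}\nabla_{\theta_s}\nabla_{\theta_u}\hat\xi_{t|t-1} \\
&\qquad+(\hat f_{1t}-\hat f_{2t})\nabla_{\delta_{2j}}\nabla_{\theta_s}\nabla_{\theta_u}\nabla_{\theta'}\hat\xi_{t|t-1}\Big\}
\times\nabla_{\delta_{2k}}\hat\theta_s(\delta_2)\nabla_{\delta_{2l}}\hat\theta_u(\delta_2)\nabla_{\delta_{2m}}\hat\theta(\delta_2) \\
&\quad+\sum_{u=1}^{n_\beta+2n_\delta}\sum_{s=1}^{n_\beta+2n_\delta}\Big\{
\hat\xi_{t|t-1}\nabla_{\delta_{2j}}\nabla_{\theta_s}\nabla_{\theta_u}\hat f_{1t}
+(1-\hat\xi_{t|t-1})\nabla_{\delta_{2j}}\nabla_{\theta_s}\nabla_{\theta_u}\hat f_{2t}
+(\nabla_{\delta_{2j}}\nabla_{\theta_s}\hat f_{1t}-\nabla_{\delta_{2j}}\nabla_{\theta_s}\hat f_{2t})\nabla_{\theta_u}\hat\xi_{t|t-1} \\
&\qquad+(\nabla_{\delta_{2j}}\nabla_{\theta_u}\hat f_{1t}-\nabla_{\delta_{2j}}\nabla_{\theta_u}\hat f_{2t})\nabla_{\theta_s}\hat\xi_{t|t-1}
+(\nabla_{\delta_{2j}}\hat f_{1t}-\nabla_{\delta_{2j}}\hat f_{2t})\nabla_{\theta_s}\nabla_{\theta_u}\hat\xi_{t|t-1} \\
&\qquad+(\nabla_{\theta_s}\nabla_{\theta_u}\hat f_{1t}-\nabla_{\theta_s}\nabla_{\theta_u}\hat f_{2t})\nabla_{\delta_{2j}}\hat\xi_{t|t-1}
+(\nabla_{\theta_s}\hat f_{1t}-\nabla_{\theta_s}\hat f_{2t})\nabla_{\delta_{2j}}\nabla_{\theta_u}\hat\xi_{t|t-1} \\
&\qquad+(\nabla_{\theta_u}\hat f_{1t}-\nabla_{\theta_u}\hat f_{2t})\nabla_{\delta_{2j}}\nabla_{\theta_s}\hat\xi_{t|t-1}
+(\hat f_{1t}-\hat f_{2t})\nabla_{\delta_{2j}}\nabla_{\theta_s}\nabla_{\theta_u}\hat\xi_{t|t-1}\Big\} \\
&\qquad\times\big[\nabla_{\delta_{2m}}\nabla_{\delta_{2k}}\hat\theta_s(\delta_2)\nabla_{\delta_{2l}}\hat\theta_u(\delta_2)+\nabla_{\delta_{2k}}\hat\theta_s(\delta_2)\nabla_{\delta_{2l}}\nabla_{\delta_{2m}}\hat\theta_u(\delta_2)\big] \\
&\quad+\sum_{s=1}^{n_\beta+2n_\delta}\Big\{
\hat\xi_{t|t-1}\nabla_{\delta_{2j}}\nabla_{\theta_s}\nabla_{\theta'}\hat f_{1t}
+(1-\hat\xi_{t|t-1})\nabla_{\delta_{2j}}\nabla_{\theta_s}\nabla_{\theta'}\hat f_{2t}
+(\nabla_{\delta_{2j}}\nabla_{\theta_s}\hat f_{1t}-\nabla_{\delta_{2j}}\nabla_{\theta_s}\hat f_{2t})\nabla_{\theta'}\hat\xi_{t|t-1} \\
&\qquad+(\nabla_{\delta_{2j}}\nabla_{\theta'}\hat f_{1t}-\nabla_{\delta_{2j}}\nabla_{\theta'}\hat f_{2t})\nabla_{\theta_s}\hat\xi_{t|t-1}
+(\nabla_{\delta_{2j}}\hat f_{1t}-\nabla_{\delta_{2j}}\hat f_{2t})\nabla_{\theta_s}\nabla_{\theta'}\hat\xi_{t|t-1} \\
&\qquad+(\nabla_{\theta_s}\nabla_{\theta'}\hat f_{1t}-\nabla_{\theta_s}\nabla_{\theta'}\hat f_{2t})\nabla_{\delta_{2j}}\hat\xi_{t|t-1}
+(\nabla_{\theta_s}\hat f_{1t}-\nabla_{\theta_s}\hat f_{2t})\nabla_{\delta_{2j}}\nabla_{\theta'}\hat\xi_{t|t-1} \\
&\qquad+(\nabla_{\theta'}\hat f_{1t}-\nabla_{\theta'}\hat f_{2t})\nabla_{\delta_{2j}}\nabla_{\theta_s}\hat\xi_{t|t-1}
+(\hat f_{1t}-\hat f_{2t})\nabla_{\delta_{2j}}\nabla_{\theta_s}\nabla_{\theta'}\hat\xi_{t|t-1}\Big\}
\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\hat\theta_s(\delta_2)\nabla_{\delta_{2m}}\hat\theta(\delta_2) \\
&\quad+\sum_{s=1}^{n_\beta+2n_\delta}\Big\{
\hat\xi_{t|t-1}\nabla_{\delta_{2j}}\nabla_{\theta_s}\hat f_{1t}
+(1-\hat\xi_{t|t-1})\nabla_{\delta_{2j}}\nabla_{\theta_s}\hat f_{2t}
+(\nabla_{\delta_{2j}}\hat f_{1t}-\nabla_{\delta_{2j}}\hat f_{2t})\nabla_{\theta_s}\hat\xi_{t|t-1} \\
&\qquad+(\nabla_{\theta_s}\hat f_{1t}-\nabla_{\theta_s}\hat f_{2t})\nabla_{\delta_{2j}}\hat\xi_{t|t-1}
+(\hat f_{1t}-\hat f_{2t})\nabla_{\delta_{2j}}\nabla_{\theta_s}\hat\xi_{t|t-1}\Big\}
\nabla_{\delta_{2k}}\nabla_{\delta_{2l}}\nabla_{\delta_{2m}}\hat\theta_s(\delta_2).
\end{aligned}
\]
When $\delta_2 = \tilde\delta$, all the terms in (B.39) are $o_p(1)$ except the 3rd, 6th and 7th terms. These three terms share the same structure and it suffices to study the first of them:
\[
-T^{-1}\sum_{t=1}^{T}\frac{\nabla_{\delta_{2l}}\nabla_{\delta_{2m}}\hat B_t}{\hat B_t}\Bigg\{\frac{\nabla_{\delta_{2k}}\hat L_{jt}}{\hat B_t}-\frac{1-\xi_*}{\xi_*}\,\frac{\nabla_{\delta_{2k}}\hat M_{(n_\beta+j)t}}{\hat B_t}\Bigg\}.
\tag{B.40}
\]
According to the previous study, for $\nabla_{\delta_{2l}}\nabla_{\delta_{2m}}\hat B_t$ it is sufficient to consider
\[
\begin{aligned}
&\Big(\frac{1-\xi_*}{\xi_*}\Big)^{2}\Bigg[\xi_*\frac{\nabla_{\delta_{1l}}\nabla_{\delta_{1m}}\tilde f_{1t}}{\tilde f_t}+(1-\xi_*)\frac{\nabla_{\delta_{1l}}\nabla_{\delta_{1m}}\tilde f_{2t}}{\tilde f_t}\Bigg]
+\Bigg[\xi_*\frac{\nabla_{\delta_{2l}}\nabla_{\delta_{2m}}\tilde f_{1t}}{\tilde f_t}+(1-\xi_*)\frac{\nabla_{\delta_{2l}}\nabla_{\delta_{2m}}\tilde f_{2t}}{\tilde f_t}\Bigg] \\
&-\frac{1-\xi_*}{\xi_*}\Bigg[\xi_*\frac{\nabla_{\delta_{2l}}\nabla_{\delta_{1m}}\tilde f_{1t}}{\tilde f_t}+(1-\xi_*)\frac{\nabla_{\delta_{2l}}\nabla_{\delta_{1m}}\tilde f_{2t}}{\tilde f_t}+\xi_*\frac{\nabla_{\delta_{1l}}\nabla_{\delta_{2m}}\tilde f_{1t}}{\tilde f_t}+(1-\xi_*)\frac{\nabla_{\delta_{1l}}\nabla_{\delta_{2m}}\tilde f_{2t}}{\tilde f_t}\Bigg] \\
&+\frac{1}{\xi_*^{2}}\Bigg[\nabla_{\delta_{1l}}\tilde\xi_{t|t-1}\Bigg(\frac{\nabla_{\theta_m}\bar f_{1t}}{\bar f_t}-\frac{\nabla_{\theta_m}\bar f_{2t}}{\bar f_t}\Bigg)+\Bigg(\frac{\nabla_{\theta_l}\bar f_{1t}}{\bar f_t}-\frac{\nabla_{\theta_l}\bar f_{2t}}{\bar f_t}\Bigg)\nabla_{\delta_{1m}}\tilde\xi_{t|t-1}\Bigg] \\
&+\sum_{s=1}^{n_\beta+n_\delta}\Big\{\big[\xi_*\nabla_{\theta_s}\tilde f_{1t}+(1-\xi_*)\nabla_{\theta_s}\tilde f_{2t}\big]\nabla_{\delta_{2l}}\nabla_{\delta_{2m}}\hat\theta_s(\tilde\delta)\Big\},
\end{aligned}
\]
and for $\nabla_{\delta_{2k}}\hat L_{jt}/\hat B_t-\frac{1-\xi_*}{\xi_*}\nabla_{\delta_{2k}}\hat M_{(n_\beta+j)t}/\hat B_t$, it is sufficient to consider
\[
\begin{aligned}
&\Big(\frac{1-\xi_*}{\xi_*}\Big)^{2}\Bigg[\xi_*\frac{\nabla_{\delta_{1j}}\nabla_{\delta_{1k}}\tilde f_{1t}}{\tilde f_t}+(1-\xi_*)\frac{\nabla_{\delta_{1j}}\nabla_{\delta_{1k}}\tilde f_{2t}}{\tilde f_t}\Bigg]
+\Bigg[\xi_*\frac{\nabla_{\delta_{2j}}\nabla_{\delta_{2k}}\tilde f_{1t}}{\tilde f_t}+(1-\xi_*)\frac{\nabla_{\delta_{2j}}\nabla_{\delta_{2k}}\tilde f_{2t}}{\tilde f_t}\Bigg] \\
&-\frac{1-\xi_*}{\xi_*}\Bigg[\xi_*\frac{\nabla_{\delta_{2j}}\nabla_{\delta_{1k}}\tilde f_{1t}}{\tilde f_t}+(1-\xi_*)\frac{\nabla_{\delta_{2j}}\nabla_{\delta_{1k}}\tilde f_{2t}}{\tilde f_t}+\xi_*\frac{\nabla_{\delta_{1j}}\nabla_{\delta_{2k}}\tilde f_{1t}}{\tilde f_t}+(1-\xi_*)\frac{\nabla_{\delta_{1j}}\nabla_{\delta_{2k}}\tilde f_{2t}}{\tilde f_t}\Bigg] \\
&+\frac{1}{\xi_*^{2}}\Bigg[\nabla_{\delta_{1j}}\tilde\xi_{t|t-1}\Bigg(\frac{\nabla_{\theta_k}\bar f_{1t}}{\bar f_t}-\frac{\nabla_{\theta_k}\bar f_{2t}}{\bar f_t}\Bigg)+\Bigg(\frac{\nabla_{\theta_j}\bar f_{1t}}{\bar f_t}-\frac{\nabla_{\theta_j}\bar f_{2t}}{\bar f_t}\Bigg)\nabla_{\delta_{1k}}\tilde\xi_{t|t-1}\Bigg].
\end{aligned}
\]
So, at $\delta_2 = \tilde\delta$, (B.40) equals
\[
\begin{aligned}
&-T^{-1}\sum_{t=1}^{T}\Bigg\{\tilde U_{lm,t}+\frac{1}{\tilde f_t}\sum_{s=1}^{n_\beta+n_\delta}\Big\{\big[\xi_*\nabla_{\theta_s}\tilde f_{1t}+(1-\xi_*)\nabla_{\theta_s}\tilde f_{2t}\big]\nabla_{\delta_{2l}}\nabla_{\delta_{2m}}\hat\theta_s(\tilde\delta)\Big\}\Bigg\}\tilde U_{jk,t}+o_p(1) \\
&= -T^{-1}\sum_{t=1}^{T}\tilde U_{lm,t}\tilde U_{jk,t}-T^{-1}\sum_{t=1}^{T}\tilde U_{jk,t}\Bigg[\frac{\xi_*\nabla_{(\beta',\delta_1')}\tilde f_{1t}+(1-\xi_*)\nabla_{(\beta',\delta_1')}\tilde f_{2t}}{\tilde f_t}\Bigg]\tilde I^{-1}\tilde D_{lm}+o_p(1) \\
&= -\big[\tilde V_{jklm}-\tilde D_{jk}'\tilde I^{-1}\tilde D_{lm}\big]+o_p(1).
\end{aligned}
\]
Consequently,
\[
\begin{aligned}
&T^{-1}L^{(4)}_{jklm}(p,q,\tilde\delta)-\frac{1-\xi_*}{\xi_*}\,M^{(4)}_{(n_\beta+j)klm}(p,q,\tilde\delta) \\
&= -\Big\{\tilde V_{jklm}-\tilde D_{jk}'\tilde I^{-1}\tilde D_{lm}+\tilde V_{jmkl}-\tilde D_{jm}'\tilde I^{-1}\tilde D_{kl}+\tilde V_{kmjl}-\tilde D_{km}'\tilde I^{-1}\tilde D_{jl}\Big\}+o_p(1).
\end{aligned}
\]
This proves the last result of the lemma.

The next lemma is the same as Lemma A.3 in Qu and Zhuo (2015) and will be used in the proof of Lemma 5 for establishing the stochastic equicontinuity. Let "*" signify that the quantity is evaluated at the true parameter value.

Lemma A.4. Let Assumptions 1-6 and the null hypothesis hold. Let $z_t(\rho) = T^{-1/2}\sum_{s=1}^{t-1}\rho^{t-s}\varepsilon_{js}\varepsilon_{it}$, where $\varepsilon_{it} = \nabla_{\delta_{1i}}f^*_{1t}/f^*_t$ and $\varepsilon_{js} = \nabla_{\delta_{1j}}f^*_{1s}/f^*_s$. Then, for any $\rho$, $\rho_1$ and $\rho_2$ satisfying $\epsilon - 1 \le \rho_1 \le \rho \le \rho_2 \le 1 - \epsilon$, we have
\[
E\Bigg[\Bigg(\sum_{t=1}^{T}\big[z_t(\rho)-z_t(\rho_1)\big]\Bigg)^{2}\Bigg(\sum_{t=1}^{T}\big[z_t(\rho_2)-z_t(\rho)\big]\Bigg)^{2}\Bigg] \le C(\rho_2-\rho_1)^{2},
\tag{B.41}
\]
where $C$ is a finite constant that depends only on $0 < \epsilon < 1/2$ and the moments of $\varepsilon_{it}$ and $\varepsilon_{js}$ up to the fourth order.

Proof of Lemma A.4. See the proof of Lemma A.3 in Qu and Zhuo (2015).

Proof of Lemma 5.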
Apply the mean value theorem:
\[
T^{-1/2}\sum_{t=1}^{T}\tilde U_{jk,t} = T^{-1/2}\sum_{t=1}^{T}U_{jk,t}+\Bigg\{T^{-1}\sum_{t=1}^{T}\nabla_{\theta'}\bar U_{jk,t}\Bigg\}T^{1/2}(\tilde\theta-\theta_*),
\tag{B.42}
\]
where $U_{jk,t}$ and $\bar U_{jk,t}$ have the same definition as $\tilde U_{jk,t}$ but evaluated at the true value $\theta_*$ and some value $\bar\theta$ that lies between $\tilde\theta$ and $\theta_*$, respectively. We establish the weak convergence of the first term of (B.42) in two steps. First, for any $\epsilon \le p, q \le 1 - \epsilon$, $T^{-1/2}\sum_{t=1}^{T}U_{jk,t}$ satisfies the central limit theorem. Second, to verify its stochastic equicontinuity, it suffices to consider the following term in its definition (4.6):
\[
\begin{aligned}
&T^{-1/2}\frac{1}{\xi_*^{2}}\sum_{t=1}^{T}\nabla_{\delta_{1j}}\xi_{t|t-1}\Bigg(\frac{\nabla_{\delta_{1k}}f_{1t}}{f_t}-\frac{\nabla_{\delta_{1k}}f_{2t}}{f_t}\Bigg) \\
&= \frac{1-p}{1-q}\Bigg\{T^{-1/2}\sum_{t=1}^{T}\sum_{s=1}^{t-1}\rho^{s}\Bigg(\frac{\nabla_{\delta_{1j}}f_{1,t-s}}{f_{t-s}}-\frac{\nabla_{\delta_{1j}}f_{2,t-s}}{f_{t-s}}\Bigg)\Bigg(\frac{\nabla_{\delta_{1k}}f_{1t}}{f_t}-\frac{\nabla_{\delta_{1k}}f_{2t}}{f_t}\Bigg)\Bigg\},
\end{aligned}
\]
where the quantities are all evaluated at the true value $\theta_*$, and the equality follows from (B.6) and (2.4). Denote the quantity inside the curly brackets as $W(\rho)$. Note that we have $|\rho| \le 1 - 2\epsilon$. Then, Lemma A.4 implies that, for any $\rho_1 \le \rho \le \rho_2$, we have $E\big[|W(\rho_1)-W(\rho)|^{2}\,|W(\rho)-W(\rho_2)|^{2}\big] \le C_2(\rho_1-\rho_2)^{2}$, where $C_2$ is a finite constant. A similar proof applies to the third and fourth components in $T^{-1/2}\sum_{t=1}^{T}U_{jk,t}$. Then the condition required in Theorem 13.5 in Billingsley (1999; cf. display (13.14) on p. 143) is satisfied.

The second term in (B.42) equals, by the mean value theorem,
\[
\begin{aligned}
&-\Bigg\{T^{-1}\sum_{t=1}^{T}\frac{\xi_*\nabla_{(\beta',\delta_1')}f_{1t}+(1-\xi_*)\nabla_{(\beta',\delta_1')}f_{2t}}{f_t}\,U_{jk,t}\Bigg\}\Bigg\{I^{-1}T^{-1/2}\sum_{t=1}^{T}\frac{\xi_*\nabla_{(\beta',\delta_1')}f_{1t}+(1-\xi_*)\nabla_{(\beta',\delta_1')}f_{2t}}{f_t}\Bigg\}+o_p(1) \\
&= -D_{jk}'I^{-1}\Bigg\{T^{-1/2}\sum_{t=1}^{T}\frac{\xi_*\nabla_{(\beta',\delta_1')}f_{1t}+(1-\xi_*)\nabla_{(\beta',\delta_1')}f_{2t}}{f_t}\Bigg\}+o_p(1),
\end{aligned}
\]
where the quantities are all evaluated at the true value $\theta_*$ and the equality holds because of the uniform law of large numbers. The term inside the last curly brackets is independent of $p$ and $q$ and satisfies the central limit theorem. Combining the above results for the two terms in (B.42), it follows that $T^{-1/2}\sum_{t=1}^{T}\tilde U_{jk,t}$ converges weakly over $\epsilon \le p, q \le 1 - \epsilon$. The covariance function follows immediately.

Proof of Proposition 1. Let $\eta = T^{1/4}(\delta_2 - \tilde\delta)$. The expansion (4.5) can be equivalently represented in matrix notation as
\[
L(p,q,\delta_2)-L(p,q,\tilde\delta) = \frac{1}{2!}\big(\eta^{\otimes 2}\big)'\Big[T^{-1/2}\,\mathrm{vec}\,L^{(2)}(p,q,\tilde\delta)\Big]+\frac{1}{3!}\big(\eta^{\otimes 3}\big)'\times O_p\big(T^{-1/4}\big)-\frac{1}{8}\big(\eta^{\otimes 2}\big)'\big[\Omega(p,q)+o_p(1)\big]\eta^{\otimes 2}.
\]
Because $\Omega(p, q)$ is positive definite, the right hand side will be negative with probability approaching 1 unless $\eta = O_p(1)$. Thus, for any $\varepsilon > 0$, we can choose $M < \infty$ such that $P(\|\eta\| \le M) \ge 1 - \varepsilon$ for sufficiently large $T$. Restricting to this set, we have
\[
\begin{aligned}
&\sup_{(p,q)\in\Lambda}\ \sup_{\|\eta\|\le M}\ 2\big[L(p,q,\delta_2)-L(p,q,\tilde\delta)\big] \\
&= \sup_{(p,q)\in\Lambda}\ \sup_{\|\eta\|\le M}\Big\{\big(\eta^{\otimes 2}\big)'\Big[T^{-1/2}\,\mathrm{vec}\,L^{(2)}(p,q,\tilde\delta)\Big]-\frac{1}{4}\big(\eta^{\otimes 2}\big)'\Omega(p,q)\,\eta^{\otimes 2}\Big\}+o_p(1) \\
&\Rightarrow \sup_{(p,q)\in\Lambda}\ \sup_{\|\eta\|\le M}\Big\{\big(\eta^{\otimes 2}\big)'G(p,q)-\frac{1}{4}\big(\eta^{\otimes 2}\big)'\Omega(p,q)\,\eta^{\otimes 2}\Big\},
\end{aligned}
\tag{B.43}
\]
where the convergence follows from Lemma 5 and the fact that the supremum operator is continuous when taken over a compact set. Finally, the result follows because $\varepsilon$ can be made arbitrarily small.
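As a small worked illustration of the limit in (B.43), consider the scalar case $n_\delta = 1$. This special case is an assumption made here only for illustration, so that $\eta^{\otimes 2} = \eta^2$ and $G(p,q)$, $\Omega(p,q)$ are scalars. Fix $(p,q)$, write $G$ and $\Omega > 0$ for their values there, substitute $x = \eta^2 \ge 0$, and take $M$ large enough that the constraint $\|\eta\| \le M$ does not bind:

```latex
\[
\sup_{\eta}\Big\{\eta^{2}G-\tfrac{1}{4}\eta^{4}\Omega\Big\}
=\sup_{x\ge 0}\Big\{xG-\tfrac{1}{4}x^{2}\Omega\Big\}
=\frac{\big(\max\{G,0\}\big)^{2}}{\Omega},
\qquad
x^{*}=\eta^{2}=\frac{2\max\{G,0\}}{\Omega}.
\]
```

The first order condition $G - x\Omega/2 = 0$ gives $x^* = 2G/\Omega$ when $G > 0$, with value $2G^2/\Omega - G^2/\Omega = G^2/\Omega$; when $G \le 0$ the supremum is attained at $x = 0$. In this sketch the inner supremum contributes only through the positive part of $G(p,q)$.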