Journal of Econometrics 221 (2021) 337–367
Contents lists available at ScienceDirect
Journal of Econometrics
journal homepage: www.elsevier.com/locate/jeconom

Estimation of dynamic panel spatial vector autoregression: Stability and spatial multivariate cointegration

Kai Yang a, Lung-fei Lee b,∗
a School of Economics, Shanghai University of Finance and Economics, and Key Laboratory of Mathematical Economics (SUFE), Ministry of Education, No. 777 Guoding Road, Yangpu District, Shanghai, 200433, China
b Department of Economics, The Ohio State University, 410 Arps Hall, 1945 N. High St., Columbus, OH 43210, USA

Article info
Article history: Received 29 June 2017; Received in revised form 4 October 2019; Accepted 15 May 2020; Available online 7 August 2020
JEL classification: C31, C33, R11
Keywords: Dynamic panel; Spatial vector autoregression; Identification; Quasi-maximum likelihood; Spatial cointegration; Market integration

Abstract
This paper introduces dynamic panel spatial vector autoregressive models. We study features of dynamics and spatial interactions that an SVAR model can generate and classify the model into stable or unstable cases by partitioning parameter spaces. For stable, spatial cointegration, and mixed cointegration cases, we investigate identification and QML estimation of the models to take into account simultaneity and correlated relationships. Asymptotic properties and bias-corrected estimators are presented. To detect unknown cointegration relationships, we introduce a sequential likelihood ratio testing procedure. Simulations show the advantage of QMLEs in bias reduction and efficiency gains. The empirical application provides evidence on ancient China's market integration.
© 2020 Elsevier B.V. All rights reserved.

1. Introduction

Panel data with cross-sectional dependent variables have highlighted the need for new analytical models to model dependence patterns.
The vector autoregressive (VAR) model has proven to be useful for describing dynamic behaviors of economic variables. However, the number of dependent variables cannot be too large for statistical inference, which limits its application to data with many cross-sectional units, in particular in regional studies. Regional data often contain a much larger number of cross-sectional units (counties, metropolitan areas, prefectures or states). To overcome this cross-sectional dimension problem, researchers assume relative prior strengths of connections via the specification of a spatial weights matrix. Among various models to describe spatial dependence, spatial autoregressive (SAR) models have attracted much attention. Early works include Anselin (1988), Kelejian and Prucha (1998, 1999) and Lee (2004, 2007). Panel data models incorporating spatial autoregression are studied in Elhorst (2003), Baltagi (2006), Yu et al. (2008, 2012), Su and Yang (2015), and Li (2017), among others. Asymptotic properties of estimators for single equation spatial dynamic panel data (SDPD) models are established in Yu et al. (2008, 2012). Li (2017) studies the SDPD model with high order spatial lags and high order time lags. He investigates the estimation of the impulse response functions and studies estimation and inference of the average direct, indirect and total impacts (LeSage and Pace, 2009). Although a large number of economic theories concern interrelations among several economic variables, econometric studies regarding vector SAR (or spatial vector autoregression, SVAR) models are limited, with a few exceptions: Kelejian and Prucha (2004), Baltagi and Deng (2015), Cohen-Cole et al. (2018), Liu (2014) and Yang and Lee (2017).

∗ Corresponding author. E-mail addresses: yang.kai@mail.shufe.edu.cn (K. Yang), lee.1777@osu.edu (L.-f. Lee).
https://doi.org/10.1016/j.jeconom.2020.05.010

This paper considers dynamic panel SVAR models which are composed of multiple dependent variables across time and space. We study stable and/or unstable temporal and spatial features of those variables. In addition to time lags, a cross-sectional unit may respond to its neighbors' or peers' behavior in the current period (spatial lags) and in previous periods (space–time lags; diffusion). Therefore, we could expect that a shock to a variable in a region could propagate to other variables, across both temporal and spatial dimensions. For panel data models, overall disturbances in such a model may include time and individual fixed effects in addition to idiosyncratic shocks. Such a model can be applied to analyze equation systems with time dynamics and spatial spillover effects in regional science (de Graaff et al., 2012; Gebremariam et al., 2011), fiscal policy with government competition (Hauptmeier et al., 2012; Allers and Elhorst, 2011), macroeconomic or financial analysis with internal and external habits in consumption (Korniotis, 2010), spatial propagation of macroeconomic shocks (Dewachter et al., 2012) and spillover effects in environmental economics, housing prices, and social networks (Brown and Laschever, 2012).

The SVAR model adopts spatial and temporal dependence features of the SDPD model in Yu et al. (2008, 2012) and the nature of multivariate interactions in econometrics. However, it is not a simple extension of those models. We present the main issues of the SVAR model and how it differs from SDPD and conventional VAR models. The first issue concerns the features of data that a SVAR model can generate.
A SVAR model can generate stable and unstable time series and spatial processes in panel data:
(i) Case S (stable): all variables (time series) for each spatial unit and spatial processes are stable;
(ii) Case SC (pure spatial cointegration): variables are spatially cointegrated among all spatial units, but they are not cointegrated with each other in a spatial unit as a VAR, so the cointegration rank is determined only by the spatial weights matrix;
(iii) Case VC (pure variable cointegration): variables for each spatial unit are cointegrated as a VAR but they are not spatially cointegrated across spatial units;
(iv) Case MC (mixed case of spatial cointegration and variable cointegration): variables are both spatially cointegrated among spatial units and cointegrated with other variables in each spatial unit;
(v) Case PU (pure unit root): unstable variables are unit root processes and they are not cointegrated;
(vi) Case E (explosive): some or all variables are explosive.

A SDPD model, which is univariate, may exhibit stable, spatial cointegration, pure unit root, and explosive situations. However, for a SDPD model, "the cointegrating space is completely known (determined by the spatial weights matrix) when cointegration occurs" (Yu et al., 2012). In the conventional multivariate time series literature, the rank of cointegration is the main object of inference. In our model, the cointegrating space is "partially known": the spatial weights matrix is known, but the parameter matrices governing cointegration in the temporal dimension need to be estimated. This makes it difficult to categorize the various cases and to estimate the model without knowing the cointegration rank, and it creates a need to detect the cointegration rank when cointegration occurs.

Identification and estimation raise several issues that need to be investigated.
The first issue lies in the identification of parameters with simultaneity among equations, correlation over time, and dependence across spatial units. The second issue, which is the major difference from the univariate SDPD model, concerns estimation. One has to find a suitable estimation strategy without knowing whether the model is stable or unstable, and without knowing the cointegration ranks if the model is cointegrated. The third issue is to detect, with available real data, which case the true model belongs to. We show that this third issue is related to the investigation of the cointegration rank. Hence, a testing procedure is required due to the multivariate interactions. We therefore extend Johansen-type cointegration rank test statistics to detect the rank with sequential hypothesis testing for our spatial panels.

There are few works on VAR models with spatial features. Beenstock and Felsenstein (2007) study a SVAR model by introducing spatial lags, space–time lags, and spatial errors into the vector autoregressive model. They propose an IV estimation method and use annual data with 4 variables to illustrate their estimation. They find evidence of space–time effects and spatial errors. Mutl (2009) studies a panel VAR model with spatially dependent errors and proposes a 3-step estimation method to handle spatial errors. However, stable and unstable cases of the SVAR model have not been investigated in detail and asymptotic properties of estimators are not formally studied.

We study in detail the characteristics of the various cases of the SVAR model in Section 2. We decompose the model to show that the rank of a parametric matrix is crucial to categorize the various cases. In Section 3, we study identification and a unified quasi-maximum likelihood (QML) estimation. We focus on cases S, SC, and MC. The identification requirement for spatial lags via IVs relies on the presence of exogenous variables and/or valid predetermined time lagged variables.
For estimation, we first eliminate unstable components and time fixed effects in order to avoid the incidental parameter problem due to time effects, as we are considering samples with the number of time periods $T$ tending to infinity. The QML estimator for each common parameter of interest is consistent and asymptotically normal with a $\sqrt{nT}$ rate of convergence. Furthermore, we introduce a test which distinguishes the stable case from the SC and MC cases and detects the cointegration rank if cointegration occurs. This test relies on the rotation of variables by the spatial weights matrix, and utilizes the "known" part of the information on instability. Using the transformed model, we propose a Johansen-type sequential hypothesis testing procedure and show the asymptotic distribution of the test statistic. Our main theoretical results rely on asymptotic analysis, while Monte Carlo experiments show the robustness of these results in finite samples. Possible bias reduction and efficiency gains of systematic QML estimation over single equation estimations (IV/2SLS) and 3SLS in finite samples are also presented.

In the application, we apply the model to study grain market integration in historical China using a unique historical dataset of rice and wheat prices of 65 prefectures over 49 years in the Yangtze River Basin in the 18th century. Previous research considers rice prices only. We add the multivariate feature by considering wheat prices. The empirical results show that rice and wheat prices are spatially dependent on each other and across prefectures. The test evidence suggests a stable model. These results provide evidence of interregional and intertemporal grain market integration and a trading network in the eighteenth-century Yangtze River Basin.

Section 2 specifies the dynamic panel SVAR model. Section 3 studies model identification and quasi-maximum likelihood (QML) estimation.
Asymptotic properties of QML estimates (QMLEs) for stable and unstable models are established. Section 4 introduces a hypothesis testing procedure and its test statistic. Sections 5 and 6 present Monte Carlo experiments and an empirical application. Additional details on proofs of results are collected in an online supplementary file.

2. A dynamic panel spatial vector autoregressive model

We specify a dynamic panel spatial vector autoregressive (SVAR) model as
\[
Y_{nm,t} = W_n Y_{nm,t} \Psi_{m0} + Y_{nm,t-1} P_{m0} + W_n Y_{nm,t-1} \Phi_{m0} + X_{nk,t} \Pi_{km0} + C_{nm0} + D_{m0,t} + V_{nm,t}, \tag{2.1}
\]
for $t = 1, 2, \ldots, T$, where $Y_{nm,t}$ is an $n \times m$ matrix consisting of $m$ endogenous variables (one for each column); $t$ represents the time index and $n$ is the number of cross-sectional spatial units. Neighborhood relationships are summarized by $W_n$, an $n \times n$ matrix. $Y_{nm,t-1}$ is the time lag, and $W_n Y_{nm,t-1}$ is the space–time lag, which captures diffusion. $X_{nk,t}$ is an $n \times k$ matrix of $k$ exogenous variables of $n$ spatial units at time $t$. This model includes own-variable spatial effects represented by diagonal elements of $\Psi_{m0}$ and cross-variable spatial effects represented by off-diagonal elements of $\Psi_{m0}$. Furthermore, $P_{m0}$ and $\Phi_{m0}$ are $m \times m$ matrices, representing dynamic time effects and space–time diffusion effects. $\Pi_{km0}$ is a $k \times m$ coefficient matrix for regressors. $C_{nm0}$ is an $n \times m$ matrix of individual effects. Time effects for the $m$ dependent variables enter through $D_{m0,t} = \alpha_{m0,t}' \otimes l_n$, where $l_n$ is an $n \times 1$ vector with each entry being one and $\alpha_{m0,t}$ is the $m$-dimensional column vector of time fixed effects. Row vectors of the idiosyncratic disturbance matrix $V_{nm,t}$ are assumed to be i.i.d.$(0, \Sigma_{vm0})$ across spatial units and over time, but $v_{n,t,ih}$ and $v_{n,t,il}$ in $V_{nm,t}$ of a unit $i$ at time $t$ for different equations $h$ and $l$ are allowed to be correlated.
Correlated effects across disturbances are incorporated in off-diagonal entries of the covariance matrix $\Sigma_{vm0}$ of disturbances.

For the reduced form of the SVAR model, we may first transpose the equation,
\[
Y_{nm,t}' = \Psi_{m0}' Y_{nm,t}' W_n' + P_{m0}' Y_{nm,t-1}' + \Phi_{m0}' Y_{nm,t-1}' W_n' + \Pi_{km0}' X_{nk,t}' + C_{nm0}' + D_{m0,t}' + V_{nm,t}',
\]
and then take the vectorization to have
\[
y_{nm,t} = (W_n \otimes \Psi_{m0}') y_{nm,t} + (I_n \otimes P_{m0}') y_{nm,t-1} + (W_n \otimes \Phi_{m0}') y_{nm,t-1} + (X_{nk,t} \otimes I_m)\,\mathrm{vec}(\Pi_{km0}') + c_{nm0} + l_n \otimes \alpha_{m0,t} + v_{nm,t}, \tag{2.2}
\]
where $y_{nm,t} = \mathrm{vec}(Y_{nm,t}')$, $c_{nm0} = \mathrm{vec}(C_{nm0}')$, and $v_{nm,t} = \mathrm{vec}(V_{nm,t}')$ are (column) vectors of dimension $nm$. For this arrangement, at each time $t$, we first pack the $m$ variables together for each individual and then order the individuals. For simpler exposition, we collect terms of exogenous variables and individual fixed effects into $Q_{nm0,t} = (X_{nk,t} \otimes I_m)\,\mathrm{vec}(\Pi_{km0}') + c_{nm0}$, and define
\[
S_{nm0} = I_{nm} - W_n \otimes \Psi_{m0}' \quad\text{and}\quad H_{nm0} = S_{nm0}^{-1}(I_n \otimes P_{m0}' + W_n \otimes \Phi_{m0}').
\]
Assuming that the process has been operating for a long time, its final form across space and time is
\[
y_{nm,t} = \sum_{h=0}^{+\infty} H_{nm0}^{h} S_{nm0}^{-1}\,[Q_{nm0,t-h} + l_n \otimes \alpha_{m0,t-h} + v_{nm,t-h}]. \tag{2.3}
\]
In order to analyze the dynamics of this system, we consider the popular specification that the weights matrix $W_n$ is row-normalized and diagonalizable with real eigenvalues, i.e., $W_n = \Gamma_n \bar{\omega}_n \Gamma_n^{-1}$ with a diagonal real eigenvalue matrix $\bar{\omega}_n$ and an eigenvector matrix $\Gamma_n$.¹ Since the weights matrix is row-normalized, the largest eigenvalue is 1. Suppose that the eigenvalue matrix $\bar{\omega}_n$ has $n_1$ eigenvalues equal to one while the others are less than one in absolute value. The eigenvalues are arranged such that
\[
\bar{\omega}_n = \mathrm{diag}\{1, \ldots, 1, \omega_{n,n_1+1}, \ldots, \omega_{n,n}\} = \mathbf{1}_{n,n_1} + \mathrm{diag}\{0, \ldots, 0, \omega_{n,n_1+1}, \ldots, \omega_{n,n}\},
\]
where $\mathbf{1}_{n,n_1}$ represents the $n \times n$ diagonal matrix with the first $n_1$ diagonal elements being one and the remaining diagonal elements being zero.
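The eigenvalue arrangement above can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical block-diagonal weights matrix in which units interact equally within equal-sized groups (so the row-normalized $W_n$ happens to be symmetric and `numpy.linalg.eigh` applies). The function name `group_weights`, the variable names, and the group sizes are illustrative choices, not taken from the paper.

```python
import numpy as np

def group_weights(n_groups, size):
    """Row-normalised weights: each unit is linked equally to the other members of its group."""
    block = (np.ones((size, size)) - np.eye(size)) / (size - 1)
    return np.kron(np.eye(n_groups), block)

W = group_weights(n_groups=4, size=3)            # n = 12 units in 4 separate networks

# W is symmetric in this special design, so its spectrum is real.
eigval, Gamma = np.linalg.eigh(W)
order = np.argsort(-eigval)                      # put the unit eigenvalues first
eigval, Gamma = eigval[order], Gamma[:, order]

n1 = int(np.sum(np.isclose(eigval, 1.0)))        # number of unit eigenvalues
one_n_n1 = np.diag((np.arange(len(eigval)) < n1).astype(float))   # 1_{n,n1}

# Gamma 1_{n,n1} Gamma^{-1}: the part of W attached to its unit eigenvalues.
W_unit = Gamma @ one_n_n1 @ np.linalg.inv(Gamma)
```

In this design each group contributes exactly one unit eigenvalue, so `n1` equals the number of groups, and the remaining eigenvalues equal $-1/(\text{size}-1)$, which is inside the unit interval for groups of three or more.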
It follows that we can decompose $S_{nm0}$ and $H_{nm0}$ based on the above decomposition of $W_n$. We further let
\[
S_{m0} = I_m - \Psi_{m0}', \qquad H_{m0} = S_{m0}^{-1}(P_{m0}' + \Phi_{m0}'),
\]
\[
\tilde{B}_{nm0} = \mathrm{diag}\{0, \ldots, 0, (I_m - \omega_{n,n_1+1}\Psi_{m0}')^{-1}(P_{m0} + \omega_{n,n_1+1}\Phi_{m0})', \ldots, (I_m - \omega_{n,n}\Psi_{m0}')^{-1}(P_{m0} + \omega_{n,n}\Phi_{m0})'\},
\]
\[
B_{nm0} = (\Gamma_n \otimes I_m)\,\tilde{B}_{nm0}\,(\Gamma_n^{-1} \otimes I_m), \quad\text{and}\quad W_n^{u} = \Gamma_n \mathbf{1}_{n,n_1} \Gamma_n^{-1},
\]
where $\tilde{B}_{nm0}$ is an $nm \times nm$ block diagonal matrix and each diagonal block is an $m \times m$ submatrix. It follows that
\[
H_{nm0}^{h} = W_n^{u} \otimes H_{m0}^{h} + B_{nm0}^{h}, \quad\text{and}\quad H_{nm0}^{h} S_{nm0}^{-1} = W_n^{u} \otimes (H_{m0}^{h} S_{m0}^{-1}) + B_{nm0}^{h} S_{nm0}^{-1}.^{2}
\]
We can decompose the vector of dependent variables at $t$ into several components as
\[
y_{nm,t} = y_{nm,t}^{(\tau)} + y_{nm,t}^{(u)} + y_{nm,t}^{(s)} + y_{nm,t}^{(\alpha)},
\]
where
\[
\begin{aligned}
y_{nm,t}^{(\tau)} &= \sum_{h=0}^{t} [W_n^{u} \otimes (H_{m0}^{h} S_{m0}^{-1})]\, Q_{nm0,t-h}, \\
y_{nm,t}^{(u)} &= (W_n^{u} \otimes H_{m0}^{t+1})\, y_{nm,-1} + \sum_{h=0}^{t} [W_n^{u} \otimes (H_{m0}^{h} S_{m0}^{-1})]\, v_{nm,t-h}, \\
y_{nm,t}^{(s)} &= \sum_{h=0}^{\infty} B_{nm0}^{h} S_{nm0}^{-1}\,(Q_{nm0,t-h} + v_{nm,t-h}), \quad\text{and} \\
y_{nm,t}^{(\alpha)} &= \sum_{h=0}^{t} [l_n \otimes (H_{m0}^{h} S_{m0}^{-1})]\, \alpha_{m0,t-h}.
\end{aligned} \tag{2.4}
\]
The decomposition is appealing as it strips out the unit eigenvalues from $W_n$ to construct $W_n^{u}$, so that the potential stable and unstable components can be separated. In the above decomposition, because the eigenvalues of $W_n^{u}$ are ones and zeros, if $H_{m0}$ contains unit eigenvalues as well, $y_{nm,t}^{(\tau)}$ represents the possible time trend (deterministic trend) due to exogenous variables and individual fixed effects, and $y_{nm,t}^{(u)}$ generates a stochastic trend. Otherwise, they would be stable. $y_{nm,t}^{(s)}$ is assumed to be stable under cases S, MC, SC and E.³ $y_{nm,t}^{(\alpha)}$ contains time effects due to time fixed effects.

¹ Detailed discussion is on Assumption 2.1. In networks, a typical undirected network matrix is diagonalizable with real eigenvalues because of its symmetry. If $W_n$ is row-normalized from an original symmetric matrix as in Ord (1975), $W_n$ is diagonalizable with all eigenvalues being real. Normalized weights matrices are widely employed in empirical studies of regional economics and social network analysis.

When the subscript 0 is dropped from a vector or a matrix, it refers to the corresponding vector or matrix evaluated at a possible parameter vector instead of the true value. For example, $S_{nm} = I_{nm} - W_n \otimes \Psi_m'$, where $\Psi_m$ is a possible parameter matrix with its true value being $\Psi_{m0}$. Formal assumptions for $W_n$ and related matrices are in Assumption 2.1.

Assumption 2.1.
(i) $W_n$ is a nonstochastic, row-normalized and diagonalizable weights matrix. Row and column sums of $W_n$ in absolute value are uniformly bounded, uniformly in $n$. $S_{nm}$ is nonsingular in the parameter space of $\Psi_m$; and $S_{nm}$ and $S_{nm}^{-1}$ are uniformly bounded in row and column sum norms, uniformly in the parameter space of $\Psi_m$.⁴
(ii) $W_n = \Gamma_n \bar{\omega}_n \Gamma_n^{-1}$, all eigenvalues of $W_n$ are real, and $\sum_{h=1}^{+\infty} \mathrm{abs}(B_{nm0}^{h})$ is bounded in row and column sum norms.
(iii) In addition to (ii), for unstable cases, $W_n$ has $n_1$ unit eigenvalues in $\bar{\omega}_n$ such that $n_1/n$ tends to a positive constant as $n \to \infty$.⁵

A spatial weights matrix is a key object in a SAR model. In this study, we focus on the scenario of a nonstochastic and row-normalized spatial weights matrix, as these features are prevalent in empirical studies in regional economics and social networks (as a social norm effect, see Liu et al., 2014). Its corresponding eigenvalues and eigenvectors are well suited to dynamic analysis. We assume the row and column sum norms of the sequence of $W_n$ are bounded, which could be justified by the sparsity of a spatial weights matrix or by interactions that decrease with distance between spatial units, with no unit linked strongly to many others. We make the assumption of real eigenvalues of $W_n$ for unstable models for proper analysis.⁶ The boundedness of row and column sum norms for $S_{nm}$ and $S_{nm}^{-1}$, uniformly in $\Psi_m$ in its parameter space, is usual in spatial autoregressive models.
We always need $\sum_{h=1}^{+\infty} \mathrm{abs}(B_{nm0}^{h})$ to be bounded in row and column sum norms in (ii), even for some unstable SVAR models. There are many examples of $W_n$, in particular block diagonal matrices, with a diverging number of unit eigenvalues that satisfy Assumption 2.1(iii). For instance, when studying peer effects among teenagers, Cohen-Cole et al. (2018) use a sample consisting of 7669 students distributed over 124 schools. There are 1043 networks, which means that there are at least 1043 unit eigenvalues of $W_n$ when they use a row-normalized network matrix. Brown and Laschever (2012) study peer effects on the timing of retirement among Los Angeles teachers and define teachers in a school as forming a network. Their 7-year panel data sample consists of 31,931 person-year observations and 606 unique schools (networks). In the above examples, we can expect a diverging number of unit eigenvalues with increasing sample size. Furthermore, in regional studies, when we treat villages/counties/cities within a high level administrative area (for example, states or provinces) as being in one network, the block-diagonal pattern of a spatial weights matrix implies a diverging number of unit eigenvalues. In the empirical regional study in Section 6, one of the possible networks that we introduce is in this form.

² See the supplementary file for detailed derivations.
³ Cases VC and PU are different as their unstable components are merely generated by the time lagged term, while in other cases the unstable components are due to all three lags.
⁴ When evaluated at any value in the parameter space of $\Psi_m$, we use $S_{nm}$ instead of $S_{nm}(\Psi_m)$ for simplicity.
⁵ If $n_1/n$ goes to zero, one would expect the unit root issue to disappear asymptotically, and it might not be of interest.
⁶ For the case that $W_n$ is row-normalized from a non-negative symmetric matrix, it is known that all eigenvalues of $W_n$ are real. For the stable model, it is not necessary to assume that all eigenvalues of the weights matrix are real for consistency and asymptotic normality of the direct QMLE (without the transformation in Section 3). Instead we need $\sum_{h=1}^{+\infty} \mathrm{abs}(H_{nm0}^{h})$ to be bounded in row and column sum norms for the stable SVAR model. However, for unstable models, it is not easy to analyze their estimators with complex unit eigenvalues for $W_n$.

2.1. Parameter space and (In)stability

The dynamic panel SVAR system has two dimensions: one in space and the other in time. In this paper, we assume that the model exhibits stability in the spatial dimension at each time period.⁷ However, we consider stable and unstable cases in the temporal dimension, or mixed with both the spatial dimension and the temporal dimension. We investigate parameter values with which the model can generate the various cases.⁸

2.1.1. Categorization: the stable case and unstable cases

The stable equilibrium in the spatial dimension at each time period is presented in Assumption 2.2(i). For dynamic panel data models, we also need to consider the temporal dimension. First of all, stability of the SVAR system requires every eigenvalue $\lambda_H$ of $H_{nm}$ to satisfy $|\lambda_H| < 1$. Because $S_{nm}$ is invertible, the stability condition requires that all solutions $\lambda$ to $|(I_n \otimes P_m' + W_n \otimes \Phi_m') - \lambda S_{nm}| = 0$ are inside the unit circle, since this is equivalent to $|H_{nm} - \lambda I_{nm}| = 0$. By the spectral radius theorem, any eigenvalue of $H_{nm}$ in absolute value will be less than or equal to $\|H_{nm}\|$ for any consistent matrix norm $\|\cdot\|$.⁹ There are many sufficient conditions.
For example, simple sufficient conditions can be $\|P_m'\|_\infty + \|\Phi_m'\|_\infty + \|\Psi_m'\|_\infty < 1$, or $\|P_m'\|_1 + \|\Phi_m'\|_1 \|W_n\|_1 + \|\Psi_m'\|_1 \|W_n\|_1 < 1$.¹⁰ These sufficient conditions suggest that, in order to guarantee that the model is stable, the column (row) sums of the matrices for spatial lagged effects, time lagged effects, and diffusion effects cannot be large. Assumption 2.2(ii) states the condition for the stable case.

Assumption 2.2.
(i) (General conditions) The parameter space of the coefficient $\theta = [\mathrm{vec}(\Pi_{km}'), \mathrm{vec}(P_m'), \mathrm{vec}(\Phi_m'), \mathrm{vec}(\Psi_m'), \mathrm{vec}^{*}(\Sigma_{vm})]'$ is compact¹¹ and the true parameter $\theta_0$ is located in the interior of its parameter space. The parameter space of the coefficients $\Psi_m$ is such that $\rho(\Psi_m) < 1$, where $\rho(\Psi_m)$ represents the spectral radius of the matrix $\Psi_m$.¹² The covariance matrix $\Sigma_{vm}$ is nonsingular.
(ii) (Temporal stable case) The parameter space of the coefficients $\Psi_m$, $P_m$ and $\Phi_m$ is compact such that all solutions $\lambda$ to $|S_{nm}^{-1}(I_n \otimes P_m' + W_n \otimes \Phi_m') - \lambda I_{nm}| = 0$ are inside the unit circle.
(iii) (Temporal unstable case MC) All eigenvalues of $H_m$ are less than or equal to one in absolute value, while the largest eigenvalue equals one and the smallest eigenvalue in absolute value is less than one. $\Psi_m + \Phi_m \neq 0$, $\rho(P_m) < 1$, and $\rho((I_m - \omega_{n,j}\Psi_m')^{-1}(P_m + \omega_{n,j}\Phi_m)') < 1$ for all those eigenvalues $\omega_{n,j}$ of $W_n$ less than one in absolute value, where such eigenvalues are assumed to be bounded away from 1 as $n$ tends to infinity.
(iv) (Temporal unstable case SC) $H_m = I_m$, $\Psi_m + \Phi_m \neq 0$, and $\rho(P_m) < 1$. $\rho((I_m - \omega_{n,j}\Psi_m')^{-1}(P_m + \omega_{n,j}\Phi_m)') < 1$ for all those eigenvalues $\omega_{n,j}$ of $W_n$ less than one in absolute value, where such eigenvalues are assumed to be bounded away from 1 as $n$ tends to infinity.
(v) (Temporal unstable case VC) $\Psi_m + \Phi_m = 0$ and all eigenvalues of $P_m$ are inside or on the unit circle, while $\rho(P_m) = 1$ and $0 < \mathrm{rank}(P_m - I_m) < m$. $\rho((I_m - \omega_{n,j}\Psi_m')^{-1}(P_m + \omega_{n,j}\Phi_m)') = 1$ for all eigenvalues $\omega_{n,j}$ of $W_n$, $j = 1, 2, \ldots, n$, as $n$ tends to infinity.
(vi) (Temporal unstable case PU) $\Psi_m + \Phi_m = 0$ and $P_m = I_m$.
(vii) (An explosive case E) The largest eigenvalues in absolute value of $H_m$ are larger than one. $\rho(P_m) < 1$. $\rho((I_m - \omega_{n,j}\Psi_m')^{-1}(P_m + \omega_{n,j}\Phi_m)') < 1$ for all those eigenvalues $\omega_{n,j}$ of $W_n$ less than one in absolute value, where such eigenvalues are assumed to be bounded away from 1 as $n$ tends to infinity.

⁷ That would usually be the case for a cross sectional SAR model, as it is an equilibrium model. An equilibrium model would require $S_{m0}$ to be invertible. If unit eigenvalues existed in $\Psi_{m0}$ with a row-normalized $W_n$, $S_{nm0}$ would not be invertible. Of course, if eigenvalues are larger than one, then the cross sectional model would be unstable. We do not consider such a case here.
⁸ We analyze the stability of the model using parameters of any value in the parameter space. We assume the true value is located in the interior of its parameter space by Assumption 2.2.
⁹ Recall that a matrix norm $\|\cdot\|$ on $\mathbb{R}^{n \times m}$ is consistent for two vector norms $\|\cdot\|_a$ on $\mathbb{R}^m$ and $\|\cdot\|_b$ on $\mathbb{R}^n$ if $\|Ax\|_b \le \|A\|\,\|x\|_a$ for $A \in \mathbb{R}^{n \times m}$ and $x \in \mathbb{R}^m$. All induced norms are consistent by definition. We usually use the column sum norm $\|\cdot\|_1$ and the row sum norm $\|\cdot\|_\infty$ in this paper.
¹⁰ This is derived from $\|H_{nm}\| < 1$ for any induced matrix norm. As $\|H_{nm}\| \le (\|I_n \otimes P_m'\| + \|W_n \otimes \Phi_m'\|)\,\|S_{nm}^{-1}\| \le (\|I_n \otimes P_m'\| + \|W_n \otimes \Phi_m'\|)\,\frac{1}{1 - \|W_n \otimes \Psi_m'\|}$, a stronger sufficient condition from the preceding inequality is $\|P_m'\|_\infty + \|\Phi_m'\|_\infty \|W_n\|_\infty + \|\Psi_m'\|_\infty \|W_n\|_\infty < 1$, or $\|P_m'\|_1 + \|\Phi_m'\|_1 \|W_n\|_1 + \|\Psi_m'\|_1 \|W_n\|_1 < 1$.
¹¹ $\mathrm{vec}^{*}(\cdot)$ collects the distinct entries of a matrix and transforms them into a vector. For example, $\mathrm{vec}^{*}\begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix} = (\sigma_{11}, \sigma_{21}, \sigma_{22})'$ if $\sigma_{12} = \sigma_{21}$.
¹² A general condition with any possible $W_n$ is $\rho(\Psi_m)\rho(W_n) < 1$.
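The conditions of Assumption 2.2 can be checked numerically for given parameter values. The sketch below is only an illustration under simplifying assumptions: it labels cases S, SC, MC and E and returns "other" for everything else (VC, PU, or parameters outside the assumption), it uses a numerical tolerance `tol` in place of exact equalities, and the function names `H_block`, `spectral_radius` and `classify` as well as the example parameter values are hypothetical.

```python
import numpy as np

def H_block(omega, Psi, P, Phi):
    """(I_m - omega*Psi')^{-1} (P + omega*Phi)'; omega = 1 gives H_m."""
    m = Psi.shape[0]
    return np.linalg.solve(np.eye(m) - omega * Psi.T, (P + omega * Phi).T)

def spectral_radius(M):
    return float(np.max(np.abs(np.linalg.eigvals(M))))

def classify(Psi, P, Phi, omegas, tol=1e-8):
    """Label the case implied by (Psi, P, Phi) given the eigenvalues of W_n."""
    m = Psi.shape[0]
    if spectral_radius(Psi) >= 1 - tol:
        return "other"                        # spatial stability in 2.2(i) fails
    non_unit = [w for w in omegas if abs(w - 1.0) > tol]
    rest_stable = all(spectral_radius(H_block(w, Psi, P, Phi)) < 1 - tol
                      for w in non_unit)
    H_m = H_block(1.0, Psi, P, Phi)
    lam = np.abs(np.linalg.eigvals(H_m))      # moduli of the eigenvalues of H_m
    if rest_stable and lam.max() < 1 - tol:
        return "S"                            # every block of H_nm is stable
    if (not rest_stable or spectral_radius(P) >= 1 - tol
            or np.allclose(Psi + Phi, 0, atol=tol)):
        return "other"                        # VC, PU, or outside (iii), (iv), (vii)
    if np.allclose(H_m, np.eye(m), atol=tol):
        return "SC"
    if lam.max() > 1 + tol:
        return "E"
    if lam.min() < 1 - tol:
        return "MC"                           # largest modulus is one, smallest below one
    return "other"

# Illustrative eigenvalues of a row-normalised W_n and diagonal parameter matrices.
omegas = [1.0, 1.0, -0.5, -0.5]
I2 = np.eye(2)
labels = [classify(0.2 * I2, 0.3 * I2, Phi, omegas)
          for Phi in (0.1 * I2, 0.5 * I2, np.diag([0.5, 0.1]), np.diag([0.7, 0.1]))]
```

With these illustrative values, the four choices of $\Phi_m$ should be labelled S, SC, MC and E in that order, since $H_m$ equals $0.5 I_2$, $I_2$, $\mathrm{diag}(1, 0.5)$ and $\mathrm{diag}(1.25, 0.5)$ respectively.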
Assumption 2.2(i) characterizes the condition $\rho(\Psi_m) < 1$ under which the SVAR system provides an equilibrium outcome matrix $Y_{nm,t}$ in terms of time lagged variables, exogenous variables, and disturbances at each time period. We maintain this assumption throughout the paper, which means that any instability of the model comes from the temporal dimension. Condition (ii) is equivalent to assuming that all eigenvalues of the matrix $H_{nm}$ are inside the unit circle.

Assumption 2.2(iii) and (iv) provide general conditions under which the component $y_{nm,t}^{(s)}$ is stable while other components are unstable. Differing from a single equation model in Yu et al. (2012) and Lee and Yu (2010a), the model might generate cointegration among variables and/or spatial units. There are two relevant issues: how to determine the difference between spatial cointegration and mixed (spatial and variable) cointegration; and how to determine the cointegration rank if cointegration occurs. We analyze characteristics of these unstable models in the following subsections.

Assumption 2.2(v) characterizes an unstable case in which the cointegration relationship only exists among different variables for each spatial unit. This case is distinct from spatial cointegration and mixed cointegration. It implies that the coefficient matrix of own-variable and cross-variable spatial lagged effects and the coefficient matrix of own-variable and cross-variable diffusion cancel out, i.e., $\Psi_m + \Phi_m = 0$. Differing from the pure unit root case for a univariate SDPD model, the case VC is first introduced in this paper. For the case VC, if $\Psi_m = \Phi_m = 0$, we get a degenerate case in which both spatial lagged effects and diffusion effects disappear, so the model is the same as classic cointegration for dynamic panel data models. Assumption 2.2(vi) provides a condition under which the model exhibits pure unit root processes.
This PU case can also be viewed as an extension of the pure unit root case for a univariate SDPD model in Yu and Lee (2010). All variables are unstable and they exhibit unit root characteristics. Assumption 2.2(vii) characterizes an explosive case under which our proposed estimation method works.¹³

The important difference between cases VC/PU and cases SC/MC is that, for VC/PU, the spatial interaction does not contribute to cointegrated relationships and no linear combinations of dependent variables from different spatial units are stable. For cases SC/MC, the unstable components might be generated by the SVAR model not only in the temporal dimension by time lags, but also mixed with the spatial dimension, which is what we are concerned with in this paper. Also, PU and VC are relatively restrictive in assumptions and require a different estimation strategy with different asymptotics; therefore, in this paper we focus on cases S, SC, and MC in estimation. We propose an estimation method which can be applied to these three cases. Indeed, it can also be applied to some explosive cases.

2.1.2. Unstable SC and MC cases: Error correction, rotation and cointegration rank

When there are relevant unit eigenvalues of the process, the system may be unstable. In the decomposition of Eq. (2.4), the instability arises in all components except $y_{nm,t}^{(s)}$ for cases SC and MC. The cointegration among variables and spatial units can be represented by an error correction model (ECM). We use the ECM representation to derive the cointegration matrix and its cointegration rank. Also, the model can be represented by rotation using eigenvectors of the spatial weights matrix, which can present different views among cases SC, MC, VC and PU. In this subsection, we investigate cointegration relationships and cointegration ranks for cases SC and MC.

First, we represent the model with its parameters at the true value as an ECM and derive the cointegration matrix.
We subtract a time lagged term from both sides of (2.2) and arrive at
\[
y_{nm,t} - y_{nm,t-1} = (H_{nm0} - I_{nm})\, y_{nm,t-1} + S_{nm0}^{-1}\,[Q_{nm0,t} + \mathrm{vec}(D_{m0,t}') + v_{nm,t}].
\]
It follows that an error correction representation is
\[
y_{nm,t} - y_{nm,t-1} = S_{nm0}^{-1} A_{nm0}^{\theta}\, y_{nm,t-1} + S_{nm0}^{-1}\,[Q_{nm0,t} + \mathrm{vec}(D_{m0,t}') + v_{nm,t}], \tag{2.5}
\]
and
\[
\begin{aligned}
A_{nm0}^{\theta} &= I_n \otimes (P_{m0} - I_m)' + W_n \otimes (\Psi_{m0} + \Phi_{m0})' \\
&= (I_n - W_n) \otimes (P_{m0} - I_m)' + W_n \otimes (\Psi_{m0} + \Phi_{m0} + P_{m0} - I_m)'.
\end{aligned} \tag{2.6}
\]
By showing the stability of $A_{nm0}^{\theta} y_{nm,t-1}$, we have $A_{nm0}^{\theta}$ being a cointegration matrix. In the cointegration case, the components $y_{nm,t}^{(\tau)}$, $y_{nm,t}^{(u)}$, and $y_{nm,t}^{(\alpha)}$ of $y_{nm,t}$ in (2.4) may be unstable if some of the eigenvalues of $H_{m0}$ are ones. As $y_{nm,t}^{(\tau)}$, $y_{nm,t}^{(u)}$ and $y_{nm,t}^{(\alpha)}$ all include the summation $\sum_{h=0}^{t} H_{m0}^{h}$, there may be instability in terms of a time trend due to the summation in the presence of unit eigenvalues in $H_{m0}$. However, these unstable components of $y_{nm,t}$ become stable after taking the linear combinations given by $A_{nm0}^{\theta}$. To see this possibility, we note that $(\Psi_{m0} + \Phi_{m0} + P_{m0} - I_m)' H_{m0}^{h} = S_{m0}(H_{m0} - I_m) H_{m0}^{h}$. Suppose $\lambda_m$ is an eigenvalue of $H_{m0}$; then the corresponding eigenvalue of $(H_{m0} - I_m) H_{m0}^{h}$ is $(\lambda_m - 1)\lambda_m^{h}$, which is 0 when $\lambda_m = 1$. In addition, $(I_n - W_n) l_n = 0$ and $(I_n - W_n) W_n^{u} = 0$. Thus, as $\lambda_m^{h}$ declines geometrically in $h$ for $|\lambda_m| < 1$, $A_{nm0}^{\theta} y_{nm,t}^{(\tau)}$, $A_{nm0}^{\theta} y_{nm,t}^{(u)}$, and $A_{nm0}^{\theta} y_{nm,t}^{(\alpha)}$ become stable.¹⁴ In consequence, $A_{nm0}^{\theta} y_{nm,t}$ is stable and we have cointegration.

¹³ Note that there are various explosive cases, and the properties of estimators under these cases would not be standard.
¹⁴ $A_{nm0}^{\theta} y_{nm,t}^{(\tau)} = \sum_{h=0}^{t} [W_n^{u} \otimes (S_{m0}(H_{m0} - I_m) H_{m0}^{h} S_{m0}^{-1})]\, Q_{nm0,t-h}$, $A_{nm0}^{\theta} y_{nm,t}^{(u)} = \sum_{h=0}^{t} [W_n^{u} \otimes (S_{m0}(H_{m0} - I_m) H_{m0}^{h} S_{m0}^{-1})]\, v_{nm,t-h} + [W_n^{u} \otimes (S_{m0}(H_{m0} - I_m) H_{m0}^{t+1})]\, y_{nm,-1}$, and $A_{nm0}^{\theta} y_{nm,t}^{(\alpha)} = \sum_{h=0}^{t} [l_n \otimes (S_{m0}(H_{m0} - I_m) H_{m0}^{h} S_{m0}^{-1})]\, \alpha_{m0,t-h}$. Strictly speaking, for the claim that $A_{nm0}^{\theta} y_{nm,t}^{(\alpha)}$ is stable, we need to exclude possible explosive behavior of $\alpha_{m,t}$ over time $t$.
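The algebraic identities used in this argument can be checked numerically. The following is a minimal sketch, assuming a hypothetical block-diagonal $W_n$ with three equal-sized groups (for which the unit-eigenvalue projector is the within-group averaging matrix) and arbitrary illustrative parameter matrices; none of the numerical values come from the paper.

```python
import numpy as np

n_groups, size, m = 3, 3, 2
n = n_groups * size
W = np.kron(np.eye(n_groups), (np.ones((size, size)) - np.eye(size)) / (size - 1))
# For this equal-weight group design, Gamma 1_{n,n1} Gamma^{-1} is within-group averaging.
W_u = np.kron(np.eye(n_groups), np.ones((size, size)) / size)

# Illustrative parameter matrices (not estimates from the paper).
Psi = np.array([[0.2, 0.1], [0.0, 0.3]])
P = np.array([[0.4, 0.1], [0.2, 0.3]])
Phi = np.array([[0.1, 0.0], [0.1, 0.2]])
I_n, I_m, I_nm = np.eye(n), np.eye(m), np.eye(n * m)

S_nm = I_nm - np.kron(W, Psi.T)
H_nm = np.linalg.solve(S_nm, np.kron(I_n, P.T) + np.kron(W, Phi.T))
S_m = I_m - Psi.T
H_m = np.linalg.solve(S_m, P.T + Phi.T)

# The two expressions for A^theta in (2.6).
A_first = np.kron(I_n, (P - I_m).T) + np.kron(W, (Psi + Phi).T)
A_second = np.kron(I_n - W, (P - I_m).T) + np.kron(W, (Psi + Phi + P - I_m).T)

# Effect of A^theta on a typical term W_u (x) (H_m^h S_m^{-1}) of the unstable components.
h = 3
M = np.linalg.matrix_power(H_m, h) @ np.linalg.inv(S_m)
lhs = A_first @ np.kron(W_u, M)
rhs = np.kron(W_u, S_m @ (H_m - I_m) @ M)
```

Comparing `A_first` with `A_second`, `H_nm - I_nm` with `S_nm^{-1} A_first`, and `lhs` with `rhs` reproduces, for this example, the steps that turn (2.2) into (2.5) and (2.6) and the expressions in footnote 14.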
Yang and L.-f. Lee / Journal of Econometrics 221 (2021) 337–367 343 The cointegration rank can be derived from the rank of Aθnm0 . Actually, the cointegration relationship may be determined −1 by parameters and the spatial weights matrix. With Wn = Γn ω̄n Γn−1 , multiply by Snm0 , θ Snm0 Anm0 = (Γn ⊗ Im ) −1 ( An1 ,m 0 ) 0 An2 ,m (Γn−1 ⊗ Im ), −1 ′ −1 ′ (Pm0 + Ψm0 + Φm0 − Im )′ ], and An2 ,m = (In2 ⊗ Im − ω̄n2 ⊗ Ψm0 ) [In2 ⊗ Pm0 + ω̄n2 ⊗ (Ψm0 + Φm0 )′ − In2 ⊗ Im ], where An1 ,m = In1 ⊗[Sm0 with ω̄n2 = diag{0, . . . , 0, wn,n1 +1 , . . . , wn,n }. An2 ,m can have full rank, n2 m, as in Assumption 2.2(iii) and (iv). As −1 Sm0 (Pm0 + Ψm0 + Φm0 − Im )′ = Hm0 − Im has eigenvalues (λm − 1)’s, where λm ’s are eigenvalues of Hm0 , An1 ,m has rank n1 (m − m1 ), where m1 is the number of unit eigenvalues of Hnm0 . Therefore, cointegration rank of the system is nm − n1 m1 . For case MC, 0 < m1 < m, while the cointegration rank for case SC is (n − n1 )m because the SC case has Hm0 = Im and hence, m1 = m. We use rotations of variables to investigate the difference between spatial cointegration and mixed cointegration. We utilize the known information provided by the spatial weights matrix to rotate the dependent variables, in order to explicitly view how unit root processes are generated due to unit eigenvalues of Wn and Hm0 . The diagonalizability of Wn gives Wn = Γn ω̄n Γn −1 = (Γn,n1 , Γn,n2 ) ( 0 In 1 0 )( ω̄n2 Γn∗1 ,n Γn∗2 ,n ) , (2.7) where ω̄n has n1 unit eigenvalues in In1 and the remaining n2 = (n − n1 ) eigenvalues in ω̄n2 are strictly less than one in absolute value. Γn = (Γn,n1 , Γn,n2 ) where the first n1 columns are eigenvectors with respect to unit eigenvalues and the rest are with respect to other eigenvalues. Γn∗1 ,n represent the first n1 rows of Γn−1 while the rest rows form Γn∗2 ,n . 
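The rank formula $nm - n_1m_1$ can be checked numerically on a small example. The sketch below is illustrative only: the weights matrix, the parameter values and all function names are hypothetical choices made for this example, and it assumes $H_{m0} = (I_m - \Psi'_{m0})^{-1}(P_{m0} + \Phi_{m0})'$, which is how $H_{m0}$ appears to be used in this section.

```python
import numpy as np

def count_unit_eigenvalues(M, tol=1e-8):
    """Number of eigenvalues of M that lie within tol of 1."""
    return int(np.sum(np.abs(np.linalg.eigvals(M) - 1.0) < tol))

def cointegration_rank(W, H, tol=1e-8):
    """Return nm - n1*m1, with n1 and m1 the unit-eigenvalue counts of W and H."""
    n, m = W.shape[0], H.shape[0]
    return n * m - count_unit_eigenvalues(W, tol) * count_unit_eigenvalues(H, tol)

def error_correction_matrix(W, P, Psi, Phi):
    """A^theta = I_n (x) (P - I_m)' + W (x) (Psi + Phi)', as in (2.6)."""
    n, m = W.shape[0], P.shape[0]
    return np.kron(np.eye(n), (P - np.eye(m)).T) + np.kron(W, (Psi + Phi).T)

# Hypothetical W: two disconnected groups of three units, row-normalized.
# It has n1 = 2 unit eigenvalues; the other four eigenvalues equal -0.5.
block = (np.ones((3, 3)) - np.eye(3)) / 2.0
W = np.kron(np.eye(2), block)

# Hypothetical parameters chosen so that H = diag(1, 0.5), i.e. m1 = 1 (case MC).
Psi, Phi = 0.2 * np.eye(2), 0.1 * np.eye(2)
P = np.diag([0.7, 0.3])
H = np.linalg.solve(np.eye(2) - Psi.T, (P + Phi).T)

rank_formula = cointegration_rank(W, H)
rank_numeric = np.linalg.matrix_rank(error_correction_matrix(W, P, Psi, Phi), tol=1e-10)
rank_sc = cointegration_rank(W, np.eye(2))  # case SC: H = I_m, so m1 = m
print(rank_formula, rank_numeric, rank_sc)
```

The numerical rank of $A^{\theta}_{nm0}$ is computed directly because $S_{nm0}$ is nonsingular, so $A^{\theta}_{nm0}$ and $S_{nm0}^{-1}A^{\theta}_{nm0}$ share the same rank.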
The original system is $Y_{nm,t} = W_nY_{nm,t}\Psi_{m0} + Y_{nm,t-1}P_{m0} + W_nY_{nm,t-1}\Phi_{m0} + U_{nm,t}$, where, for simplicity, $U_{nm,t}$ represents the remaining model components (regressors, time fixed effects, individual fixed effects and idiosyncratic disturbances). Applying the transformation $\Gamma_n^{-1}$ to $Y_{nm,t}$, we arrive at $Y^{+}_{nm,t} = \bar{\omega}_nY^{+}_{nm,t}\Psi_{m0} + Y^{+}_{nm,t-1}P_{m0} + \bar{\omega}_nY^{+}_{nm,t-1}\Phi_{m0} + \Gamma_n^{-1}U_{nm,t}$, where $Y^{+}_{nm,s} = \Gamma_n^{-1}Y_{nm,s}$ for any $s$. With $Y^{+}_{nm,s} = \begin{pmatrix} Y^{+}_{n_1m,s} \\ Y^{+}_{n_2m,s} \end{pmatrix}$, we have

$$\begin{aligned} Y^{+}_{n_1m,t} &= Y^{+}_{n_1m,t}\Psi_{m0} + Y^{+}_{n_1m,t-1}P_{m0} + Y^{+}_{n_1m,t-1}\Phi_{m0} + \Gamma^{*}_{n_1,n}U_{nm,t}, \\ Y^{+}_{n_2m,t} &= \bar{\omega}_{n_2}Y^{+}_{n_2m,t}\Psi_{m0} + Y^{+}_{n_2m,t-1}P_{m0} + \bar{\omega}_{n_2}Y^{+}_{n_2m,t-1}\Phi_{m0} + \Gamma^{*}_{n_2,n}U_{nm,t}. \end{aligned} \tag{2.8}$$

Rearranging the parameters, we arrive at

$$\begin{aligned} \text{(i)}\quad & Y^{+}_{n_1m,t} = Y^{+}_{n_1m,t-1}H'_{m0} + \Gamma^{*}_{n_1,n}U_{nm,t}S'^{-1}_{m0}, \\ \text{(ii)}\quad & \mathrm{vec}(Y^{+}_{n_2m,t}) = (I_{mn_2} - \Psi'_{m0} \otimes \bar{\omega}_{n_2})^{-1}(P'_{m0} \otimes I_{n_2} + \Phi'_{m0} \otimes \bar{\omega}_{n_2})\mathrm{vec}(Y^{+}_{n_2m,t-1}) + (I_{mn_2} - \Psi'_{m0} \otimes \bar{\omega}_{n_2})^{-1}\mathrm{vec}(\Gamma^{*}_{n_2,n}U_{nm,t}). \end{aligned} \tag{2.9}$$

The subsystem (2.9)(ii) is stable under Assumption 2.2(iii) and (iv), as this subsystem captures stable components of variables. However, for the unstable subsystem, depending on different scenarios of (2.9)(i), the model can generate various cases. Transform (2.9)(i) into $\Delta Y^{+}_{n_1m,t} = Y^{+}_{n_1m,t-1}(H'_{m0} - I_m) + \Gamma^{*}_{n_1,n}U_{nm,t}S'^{-1}_{m0}$, which is similar to the error correction form for a panel VAR model.

For the case SC, the rank of $H_{m0} - I_m$ is 0, which suggests that there is no cointegration relationship among the $m$ variables $Y^{+}_{\cdot 1,n_1m,t}, Y^{+}_{\cdot 2,n_1m,t}, \ldots, Y^{+}_{\cdot m,n_1m,t}$, which are columns of $Y^{+}_{n_1m,t}$. However, since $\mathrm{rank}(A^{\theta}_{nm0}) = (n - n_1)m$ for case SC, there are $(n - n_1)m$ combinations among variables which are stable. The decomposition of system (2.9) implies that those combinations of variables that are linked by the spatial weights matrix in (2.9)(ii) are still stable.
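The step from the first block of (2.8) to (2.9)(i) and its error correction form can be written out explicitly. The derivation below is a sketch that assumes $S_{m0} = I_m - \Psi'_{m0}$ and $H_{m0} = S_{m0}^{-1}(P_{m0} + \Phi_{m0})'$, which is how these matrices appear to be used in this section. Collecting the contemporaneous terms of the first block of (2.8) on the left-hand side gives

$$\begin{aligned} Y^{+}_{n_1m,t}(I_m - \Psi_{m0}) &= Y^{+}_{n_1m,t-1}(P_{m0} + \Phi_{m0}) + \Gamma^{*}_{n_1,n}U_{nm,t}, \\ Y^{+}_{n_1m,t} &= Y^{+}_{n_1m,t-1}(P_{m0} + \Phi_{m0})(I_m - \Psi_{m0})^{-1} + \Gamma^{*}_{n_1,n}U_{nm,t}(I_m - \Psi_{m0})^{-1} \\ &= Y^{+}_{n_1m,t-1}H'_{m0} + \Gamma^{*}_{n_1,n}U_{nm,t}S'^{-1}_{m0}, \\ \Delta Y^{+}_{n_1m,t} &= Y^{+}_{n_1m,t-1}(H'_{m0} - I_m) + \Gamma^{*}_{n_1,n}U_{nm,t}S'^{-1}_{m0}. \end{aligned}$$

The spatial lag drops out of this block because the corresponding eigenvalues of $W_n$ equal one, so the instability of these $n_1$ rotated rows is governed entirely by the eigenvalues of $H_{m0}$.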
As unstable components of variables cancel out with each other across spatial units and the cointegration rank is determined only by the spatial weights matrix, the case is named pure spatial cointegration (case SC).

For the case MC, $0 < \mathrm{rank}(H_{m0} - I_m) < m$, as eigenvalues of $H_{m0}$ are between 0 and 1 in absolute values in Assumption 2.2(iii). To simplify the analysis, we assume that the parameter matrix $H_{m0}$ is diagonalizable such that $H'_{m0} = \Upsilon_{m0}\Lambda_{m0}\Upsilon_{m0}^{-1}$, where $\Upsilon_{m0}$ is a nonsingular eigenvector matrix. The decomposition in (2.9)(i) implies there are $m_1$ unit root processes among different variables for each spatial unit. There are $n_1$ rows in $Y^{+}_{n_1m,t}$, which suggests the total number of unit root processes is $n_1m_1$.

A further decomposition of the systems in (2.8) reveals the unit root processes more clearly. Recall that $H'_{m0} = \Upsilon_{m0}\Lambda_{m0}\Upsilon_{m0}^{-1}$, where $\Upsilon_{m0}$ is a nonsingular matrix and the eigenvalue matrix $\Lambda_{m0}$ has $m_1$ unit eigenvalues. The unit values can be arranged such that $\Lambda_{m0} = \begin{pmatrix} I_{m_1} & 0 \\ 0 & \Lambda_{m_20} \end{pmatrix}$, i.e., the first $m_1$ eigenvalues are 1 and the remaining ones are smaller than 1 in absolute values. Denote $Y^{*}_{n_1m,t} = Y^{+}_{n_1m,t}\Upsilon_{m0}$ and, conformably, $Y^{*}_{n_1m,t} = (Y^{*}_{n_1m_1,t}, Y^{*}_{n_1m_2,t})$ and $\Upsilon_{m0} = [\Upsilon_{m_10}, \Upsilon_{m_20}]$. Then we have the following unit root subsystem,

$$\begin{aligned} \text{(i)}\quad & Y^{*}_{n_1m_1,t} = Y^{*}_{n_1m_1,t-1} + \Gamma^{*}_{n_1,n}U_{nm,t}(I_m - \Psi_{m0})^{-1}\Upsilon_{m_10}, \\ \text{(ii)}\quad & Y^{*}_{n_1m_2,t} = Y^{*}_{n_1m_2,t-1}\Lambda_{m_20} + \Gamma^{*}_{n_1,n}U_{nm,t}(I_m - \Psi_{m0})^{-1}\Upsilon_{m_20}, \\ \text{(iii)}\quad & Y^{+}_{n_2m,t} = \bar{\omega}_{n_2}Y^{+}_{n_2m,t}\Psi_{m0} + Y^{+}_{n_2m,t-1}P_{m0} + \bar{\omega}_{n_2}Y^{+}_{n_2m,t-1}\Phi_{m0} + \Gamma^{*}_{n_2,n}U_{nm,t}, \end{aligned} \tag{2.10}$$

where (i) contains $n_1m_1$ unit root processes and (ii) consists of stable components. Recall that (2.10)(iii) implies (2.8)(ii), thus it is also a stable subsystem.

In sum, the transformed system can be divided into three subsystems, and the first one consists of unit root processes. The second subsystem reveals that for each spatial unit there are $m_2$ stable relationships among variables. The third subsystem reveals that spatially there are $n_2$ stable relationships among variables for different units. Hence, the cointegration rank should be $nm - n_1m_1$ while the unstable components cancel out across variables and across spatial units. The cointegration rank is determined not only by the number of unit eigenvalues of the spatial weights matrix, but also by the unknown parameter matrices, which differs from that in Yu et al. (2012). It can be named a mixed cointegration case (MC). In Table 2.1, we summarize the characteristics of stable and unstable cases.$^{15}$ An example of a bivariate triangular SVAR model is provided in the supplementary file.

Table 2.1
Stable and unstable cases for SVAR models.

| Name | Parameter restriction (Assumption 2.2) | Cointegration matrix | Cointegration rank | Cointegration relationship |
|---|---|---|---|---|
| Stable (S) | (ii) | n.a. | n.a. | n.a. |
| Spatial Cointegration (SC) | (iii) | $(I_n - W_n) \otimes (P_{m0} - I_m)'$ | $(n - n_1)m$ | Across spatial units |
| Mixed Cointegration (MC) | (iv) | $A^{\theta}_{nm0}$ in (2.5) | $nm - n_1m_1$ | Both |
| Variable Cointegration (VC) | (v) | $I_n \otimes (P_{m0} - I_m)'$ | $n(m - m_1)$ | Among variables in each spatial unit |
| Pure Unit Root (PU) | (vi) | 0 | n.a. | None |
| Explosive (E) | Various$^{a}$ | n.a. | n.a. | n.a. |

$^{a}$ We can define many scenarios that are "explosive". We introduce an estimation method which could be applied to one of those explosive cases. It is because the decomposition in (2.4) still works for this case.

3. QML estimation

We study the estimation strategy and asymptotic distributions of estimators for stable, spatial cointegration and mixed cointegration models. For estimation, we list assumptions of the SVAR model.

Assumption 3.1. (i) $v_{n,t,i\cdot}$, the $i$th row of $V_{nm,t}$, for all $i$ and $t$, are i.i.d. random vectors of dimension $m$ with zero mean and covariance matrix $\Sigma_{vm0}$. The elements of disturbances satisfy the moment condition $\sup_{k,l,p,q}E|v_{n,t,ik}v_{n,t,il}v_{n,t,ip}v_{n,t,iq}|^{1+\delta} < \infty$ for some constant $\delta > 0$.
(ii) Elements of $X_{nk,t}$ are exogenous constants, uniformly bounded for all $n$ and $t$.
(iii) $n$ is a nondecreasing function of $T$. Both $T$ and $n$ tend to infinity.
(iv) Parameters satisfy Assumption 2.2(i) and the data are generated under the conditions in Assumption 2.2(ii)–(iv), or (vii).

$^{15}$ The decomposition, estimation strategy and asymptotic analysis for both cases PU and VC are distinct from those for the stable, spatial cointegration and mixed cointegration cases, and they need to be analyzed separately.

According to Assumption 3.1(i), although disturbances are i.i.d. across time and individual spatial units, they are allowed to be correlated among different variables. The homogeneous disturbance assumption is restrictive, but it is common in the spatial econometrics literature with QML estimation, so we maintain this assumption in the paper.$^{16}$ We consider the model with large $T$. As we are focusing on QML estimation and the investigation of dynamics without any extra specification on the initial value, we need large $T$. For spatial econometrics, the number of spatial units $n$ is usually large, so we focus on the case with both large $n$ and $T$.

When $n$ and $T$ tend to infinity, the numbers of both individual and time fixed effects go to infinity. We eliminate time fixed effects and have individual fixed effects and common parameters of interest consistently estimated. To estimate the common parameters, it is computationally effective to work with a concentrated log likelihood function in which the individual fixed effects are concentrated out. Our asymptotic analysis of the QML estimator of common parameters is also easier with the concentrated log likelihood function. The elimination of time effects can avoid the incidental parameter problem due to many time dummies, but the incidental parameter problem remains for the many individual fixed effects.$^{17}$

The dependent variable $y_{nm,t}$ of the model can be decomposed into four components.
As all the eigenvalues of $B_{nm0}$ are inside the unit circle, $y^{(s)}_{nm,t}$ is stable. But if some eigenvalues of $H_{m0}$ are equal to or larger than 1, we have, respectively, the possible SC and MC, and an explosive case defined in Assumption 2.2(vii). A direct QML with observed data without transformation will use all the components $y^{(\tau)}_{nm,t}$, $y^{(u)}_{nm,t}$, and $y^{(s)}_{nm,t}$ in estimation. When the true data generating process is in one of those cases, asymptotic properties of the direct QMLE are not yet known, but would be non-standard. However, the robust QMLE with the proposed spatial difference transformation method would overcome such a difficulty.

In our proposed estimation, we eliminate $y^{(\tau)}_{nm,t}$, $y^{(u)}_{nm,t}$, and $y^{(\alpha)}_{nm,t}$ and estimate parameters of the model using only information in $y^{(s)}_{nm,t}$. Note that the spatial difference $I_n - W_n$ has its corresponding zero eigenvalues when $W_n$ has unit eigenvalues, but the same eigenvectors $\Gamma_n$ of $W_n$. We have $[(I_n - W_n) \otimes I_m]y^{(u)}_{nm,t} = [(I_n - W_n) \otimes I_m]y^{(\tau)}_{nm,t} = 0$ since $(I_n - W_n)\Gamma_n1_{n,n_1} = 0$, and $[(I_n - W_n) \otimes I_m]y^{(\alpha)}_{nm,t} = 0$ from $(I_n - W_n)l_n = 0$ as $W_n$ is row-normalized. Therefore, $[(I_n - W_n) \otimes I_m]y_{nm,t} = [(I_n - W_n) \otimes I_m]y^{(s)}_{nm,t}$ is stable. Moreover, $E([(I_n - W_n) \otimes I_m]v_{nm,t}v'_{nm,t}[(I_n - W_n)' \otimes I_m]) = \Sigma_n \otimes \Sigma_{vm0}$, where $\Sigma_n = (I_n - W_n)(I_n - W_n)'$ is symmetric but its rank is $n - n_1$. To eliminate the linear dependence among the transformed disturbances, we use the eigenvalues and eigenvectors decomposition.$^{18}$ Decompose $\Sigma_n = R_n\Lambda_{\Sigma n}R'_n$, where $R_n$ is the orthonormal eigenvectors matrix and $\Lambda_{\Sigma n} = \begin{pmatrix} \Lambda_{1n} & 0 \\ 0 & 0 \end{pmatrix}$ is the eigenvalue matrix. Define $R_n = [R_{1n}, R_{2n}]$, where $R_{1n}$ corresponds to the (nonzero) eigenvalues in $\Lambda_{1n}$ and $R_{2n}$ to the zero eigenvalues. Define $W^{\dagger}_n = \Lambda_{1n}^{-1/2}R'_{1n}W_nR_{1n}\Lambda_{1n}^{1/2}$, $Y^{\dagger}_{nm,t} = \Lambda_{1n}^{-1/2}R'_{1n}(I_n - W_n)Y_{nm,t}$, and similarly for other variables $X^{\dagger}_{nk,t}$, $C^{\dagger}_{nm0}$, and $V^{\dagger}_{nm,t}$.
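The spatial difference transformation described above can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: the function name, the tolerance and the example weights matrix are hypothetical choices for illustration. It builds $F_n = \Lambda_{1n}^{-1/2}R'_{1n}(I_n - W_n)$ and $W^{\dagger}_n$, and the checks afterwards confirm that $F_nF'_n = I_{n-n_1}$ (the transformed disturbances are no longer linearly dependent), $F_nl_n = 0$ (time effects are removed), and $F_nW_n = W^{\dagger}_nF_n$ (the transformed model keeps the SAR structure).

```python
import numpy as np

def spatial_difference_transform(W, tol=1e-10):
    """Return (F, W_dag) with F = Lambda1^{-1/2} R1' (I - W) and
    W_dag = Lambda1^{-1/2} R1' W R1 Lambda1^{1/2}."""
    n = W.shape[0]
    D = np.eye(n) - W                   # spatial difference I_n - W_n
    Sigma = D @ D.T                     # Sigma_n, rank n - n1
    lam, R = np.linalg.eigh(Sigma)      # symmetric eigen-decomposition
    keep = lam > tol                    # keep the nonzero eigenvalues only
    lam1, R1 = lam[keep], R[:, keep]
    left = (R1 / np.sqrt(lam1)).T       # Lambda1^{-1/2} R1'
    right = R1 * np.sqrt(lam1)          # R1 Lambda1^{1/2}
    return left @ D, left @ W @ right

# Hypothetical row-normalized W with n = 6 and n1 = 2 unit eigenvalues.
block = (np.ones((3, 3)) - np.eye(3)) / 2.0
W = np.kron(np.eye(2), block)

F, W_dag = spatial_difference_transform(W)
print(F.shape)                                   # (n - n1, n)
print(np.allclose(F @ F.T, np.eye(F.shape[0])))  # identity covariance factor
print(np.allclose(F @ np.ones(6), 0.0))          # time effects eliminated
print(np.allclose(F @ W, W_dag @ F))             # SAR structure preserved
```

Using `eigh` on $\Sigma_n$ rather than a general eigen-solver is a design choice here: $\Sigma_n$ is symmetric positive semidefinite, so its zero eigenvalues can be separated from the positive ones with a simple threshold.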
As $W_n$ is known, all those transformed vectors and matrices are known, and we have

$$Y^{\dagger}_{nm,t} = W^{\dagger}_nY^{\dagger}_{nm,t}\Psi_{m0} + Y^{\dagger}_{nm,t-1}P_{m0} + W^{\dagger}_nY^{\dagger}_{nm,t-1}\Phi_{m0} + X^{\dagger}_{nk,t}\Pi_{km0} + C^{\dagger}_{nm0} + V^{\dagger}_{nm,t}. \tag{3.1}$$

The covariance matrix of the disturbances is $E(\mathrm{vec}(V^{\dagger\prime}_{nm,t})\mathrm{vec}(V^{\dagger\prime}_{nm,t})') = (\Lambda_{1n}^{-1/2}R'_{1n}\Sigma_nR_{1n}\Lambda_{1n}^{-1/2}) \otimes \Sigma_{vm0} = I_{n-n_1} \otimes \Sigma_{vm0}$. Denote $S^{\dagger}_{nm0} = (\Lambda_{1n}^{-1/2}R'_{1n} \otimes I_m)S_{nm0}(R_{1n}\Lambda_{1n}^{1/2} \otimes I_m)$; therefore $|S^{\dagger}_{nm0}| = |(R'_{1n} \otimes I_m)S_{nm0}(R_{1n} \otimes I_m)|$. Furthermore, $|S_{nm0}| = |S^{\dagger}_{nm0}|\,|I_m - \Psi'_{m0}|^{n_1}$.$^{19}$

The partial quasi log-likelihood function for $Y^{\dagger}_{nm,t}$ for $t = 1, \ldots, T$, by regarding $Y^{\dagger}_{nm,0}$ as if it is given, is

$$\frac{1}{(n - n_1)T}\ln L_{nT,m}(\theta, C_{nm}) = -\frac{m}{2}\ln(2\pi) + \frac{1}{n - n_1}\ln|S_{nm}| - \frac{n_1}{n - n_1}\ln|I_m - \Psi'_m| - \frac{1}{2}\ln|\Sigma_{vm}| - \frac{1}{2(n - n_1)T}\sum_{t=1}^{T}v_{nm,t}(\theta)'(J^{*}_n \otimes \Sigma_{vm}^{-1})v_{nm,t}(\theta), \tag{3.2}$$

where $J^{*}_n = (I_n - W_n)'R_{1n}\Lambda_{1n}^{-1}R'_{1n}(I_n - W_n)$ and $v_{nm,t}(\theta) = (I_{mn} - W_n \otimes \Psi'_m)y_{nm,t} - (I_n \otimes P'_m + W_n \otimes \Phi'_m)y_{nm,t-1} - (X_{nk,t} \otimes I_m)\mathrm{vec}(\Pi'_{km}) - c_{nm}$. The first order condition with respect to $c_{nm}$ gives an estimate of individual effects in terms of other parameters,

$$\hat{c}_{nm} = \frac{1}{T}\sum_{t=1}^{T}[S_{nm}y_{nm,t} - (I_n \otimes P'_m + W_n \otimes \Phi'_m)y_{nm,t-1} - (X_{nk,t} \otimes I_m)\mathrm{vec}(\Pi'_{km})].$$

$^{16}$ If disturbances had heteroskedastic variances, a QML estimation might not be consistent. Other estimation methods are needed.

$^{17}$ As $T$ tends to infinity, many individual effects would not generate inconsistency for QML estimates; however, asymptotic biases would remain. One might eliminate the individual effects by time differencing before estimation. However, due to the remaining initial value problem in a dynamic model, the asymptotic bias issue remains. So there is not much advantage in eliminating individual fixed effects before estimation.

$^{18}$ The method in this section can be found in textbooks, such as Theil (1971).

$^{19}$ See the supplementary file for details of the derivation.

Define temporal means $\bar{y}_{nm,T} = \frac{1}{T}\sum_{t=1}^{T}y_{nm,t}$, and $\tilde{y}_{nm,t} = y_{nm,t} - \bar{y}_{nm,T}$, similarly for $\tilde{X}_{nk,t}$, for $t = 1, \ldots, T$ as deviations from time means. Furthermore, for time lagged explanatory variables, $\bar{\bar{y}}_{nm,T} = \frac{1}{T}\sum_{t=1}^{T}y_{nm,t-1}$ and $\tilde{\tilde{y}}_{nm,t} = y_{nm,t} - \bar{\bar{y}}_{nm,T}$, for $t = 0, \ldots, T - 1$. The concentrated log likelihood function with individual effects concentrated out is

$$\frac{1}{(n - n_1)T}\ln L_{nT,m}(\theta) = -\frac{m}{2}\ln(2\pi) + \frac{1}{n - n_1}\ln|S_{nm}| - \frac{n_1}{n - n_1}\ln|I_m - \Psi'_m| - \frac{1}{2}\ln|\Sigma_{vm}| - \frac{1}{2(n - n_1)T}\sum_{t=1}^{T}\tilde{v}_{nm,t}(\theta)'(J^{*}_n \otimes \Sigma_{vm}^{-1})\tilde{v}_{nm,t}(\theta), \tag{3.3}$$

where $\tilde{v}_{nm,t}(\theta) = S_{nm}\tilde{y}_{nm,t} - (I_n \otimes P'_m + W_n \otimes \Phi'_m)\tilde{\tilde{y}}_{nm,t-1} - (\tilde{X}_{nk,t} \otimes I_m)\mathrm{vec}(\Pi'_{km})$. With the concentrated log likelihood, we can focus on the estimation and computation of common parameters of interest. Furthermore, asymptotic analysis of the common parameters' estimators can be simplified without facing explicitly an infinite number of parameters. For the stable (S) and unstable (SC and MC) models, their identification and estimation are based on (3.3).

The log likelihood for the SVAR above needs to be maximized over many parameters. We introduce a computationally efficient method to accelerate the evaluation process by concentration, as described in the supplementary file. This method needs to evaluate only the parameter matrix $\Psi_m$ in the final concentrated log likelihood.

The SVAR model contains own-variable spatial lags and cross-variable spatial lags as endogenous explanatory variables, and the presence of endogenous variables raises an identification issue. If an optimal IV exists, we can employ the optimal IV for endogenous variables in the equation system to identify each equation. Let $e'_{m,j}$ be the $1 \times m$ row unit vector with all elements being zeros except for its $j$th entry taking the value one.
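Define $\tilde{A}_{z,t} = [\tilde{a}_{1,t}, \tilde{a}_{2,t}, \ldots, \tilde{a}_{m,t}]$, an $n \times m$ matrix where $\tilde{a}_{j,t} = (W_n \otimes e'_{m,j})S_{nm0}^{-1}[(I_n \otimes P'_{m0} + W_n \otimes \Phi'_{m0})\tilde{\tilde{y}}_{nm,t-1} + (\tilde{X}_{nk,t} \otimes I_m)\mathrm{vec}(\Pi'_{km0})]$, and $\tilde{Z}_{nm,t} = [\tilde{\tilde{Y}}_{nm,t-1}, W_n\tilde{\tilde{Y}}_{nm,t-1}, \tilde{X}_{nk,t}]$, an $n \times (2m + k)$ matrix. The matrix $\tilde{A}_{z,t}$ represents optimal IVs for the spatial lagged terms $W_n\tilde{Y}_{nm,t}$, and they can be derived from the expectation of the reduced form for $y_{nm,t}$, while $\tilde{Z}_{nm,t}$ consists of predetermined and exogenous regressors. For a finite sample, the parameters $\Psi_m$, $P_m$, $\Phi_m$, and $\Pi_{km}$ are identified if the matrix that consists of the first $2m + k$ columns $[\tilde{Z}'_{nm,1}, \tilde{Z}'_{nm,2}, \ldots, \tilde{Z}'_{nm,T}]'$ and the $m$ additional columns $[\tilde{A}'_{z,1}, \tilde{A}'_{z,2}, \ldots, \tilde{A}'_{z,T}]'$ has full column rank $3m + k$. Assumption 3.2 formally states identification conditions.

Assumption 3.2.
(i) (a) $\lim_{T\to\infty}\frac{1}{(n-1)T}\sum_{t=1}^{T}E[\tilde{Z}'_{nm,t}J^{*}_n\tilde{Z}_{nm,t}]$ exists and is nonsingular. $\lim_{T\to\infty}\frac{1}{(n-1)T}\sum_{t=1}^{T}E[\tilde{A}'_{z,t}J^{*}_n\tilde{A}_{z,t}]$ and $\lim_{T\to\infty}\frac{1}{(n-1)T}\sum_{t=1}^{T}E[\tilde{Z}'_{nm,t}J^{*}_n\tilde{A}_{z,t}]$ exist.
(b) The limit matrix $\lim_{T\to\infty}\frac{1}{(n-n_1)T}\sum_{t=1}^{T}E[P'_{zzA,t}J^{*}_nP_{zzA,t}]$ exists and is nonsingular, where $P_{zzA,t} = \tilde{A}_{z,t} - \tilde{Z}_{nm,t}\left[\frac{1}{(n-n_1)T}\sum_{t=1}^{T}\tilde{Z}'_{nm,t}J^{*}_n\tilde{Z}_{nm,t}\right]^{-1}\left[\frac{1}{(n-n_1)T}\sum_{t=1}^{T}\tilde{Z}'_{nm,t}J^{*}_n\tilde{A}_{z,t}\right]$.
(ii)
$$\lim_{n\to+\infty}\left[\left|S^{\dagger\prime -1}_{nm0}S^{\dagger\prime}_{nm}(I_{n-n_1} \otimes \Sigma^{*-1}_{vm})S^{\dagger}_{nm}S^{\dagger -1}_{nm0}(I_{n-n_1} \otimes \Sigma^{*}_{vm0})\right|^{\frac{1}{m(n-n_1)}} - \frac{1}{m(n-n_1)}\mathrm{Tr}\left[S^{\dagger\prime -1}_{nm0}S^{\dagger\prime}_{nm}(I_{n-n_1} \otimes \Sigma^{*-1}_{vm})S^{\dagger}_{nm}S^{\dagger -1}_{nm0}(I_{n-n_1} \otimes \Sigma^{*}_{vm0})\right]\right] < 0$$
unless $\Sigma^{*}_{vm} = \Sigma^{*}_{vm0}$ and $\Psi_m = \Psi_{m0}$, where $\sigma^{2}_m = \mathrm{Tr}(\Sigma_{vm})$ and $\Sigma^{*}_{vm} = \Sigma_{vm}/\sigma^{2}_m$.

Assumption 3.2(i) implies the existence and nonsingularity of the limiting matrix

$$\lim_{T\to+\infty}\frac{1}{(n-n_1)T}\sum_{t=1}^{T}E\begin{pmatrix} \tilde{Z}'_{nm,t}J^{*}_n\tilde{Z}_{nm,t} & \tilde{Z}'_{nm,t}J^{*}_n\tilde{A}_{z,t} \\ \tilde{A}'_{z,t}J^{*}_n\tilde{Z}_{nm,t} & \tilde{A}'_{z,t}J^{*}_n\tilde{A}_{z,t} \end{pmatrix},$$

which requires that optimal instruments for the endogenous spatial lags exist and that they are not linearly dependent with each other.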
Assumption 3.2(ii) exploits the information of disturbances and the structure of the spatial weights matrix, which is similar to simultaneous equations SAR models for cross-sectional data in Yang and Lee (2017). Intuitively, Assumption 3.2(ii) utilizes the i.i.d. assumption for disturbances across time and space.$^{20}$ Lemma 3.1 states that the linear independence of $I_{n-n_1}$, $W^{\dagger}_n$, $W^{\dagger\prime}_n$ and $W^{\dagger\prime}_nW^{\dagger}_n$ is sufficient to guarantee that the arithmetic–geometric inequality in Assumption 3.2(ii) holds with a finite sample.

Lemma 3.1. When $I_{n-n_1}$, $W^{\dagger}_n$, $W^{\dagger\prime}_n$ and $W^{\dagger\prime}_nW^{\dagger}_n$ are linearly independent, then

$$\left|S^{\dagger\prime -1}_{nm0}S^{\dagger\prime}_{nm}(I_{n-n_1} \otimes \Sigma^{*-1}_{vm})S^{\dagger}_{nm}S^{\dagger -1}_{nm0}(I_{n-n_1} \otimes \Sigma^{*}_{vm0})\right|^{\frac{1}{m(n-n_1)}} - \frac{1}{m(n-n_1)}\mathrm{Tr}\left[S^{\dagger\prime -1}_{nm0}S^{\dagger\prime}_{nm}(I_{n-n_1} \otimes \Sigma^{*-1}_{vm})S^{\dagger}_{nm}S^{\dagger -1}_{nm0}(I_{n-n_1} \otimes \Sigma^{*}_{vm0})\right] < 0,$$

unless $\Sigma^{*}_{vm} = \Sigma^{*}_{vm0}$ and $\Psi_m = \Psi_{m0}$.

$^{20}$ This condition can be extended for models with more complicated disturbances as long as proper assumptions can be put on the first and second moments of disturbance terms; for example, spatial moving average or spatial autoregression disturbances.

The identification and consistency of parameters of the SVAR model are established in the following proposition. Proposition 3.1(1) and (2) indicate that the two sets of conditions in Assumption 3.2 can separately identify the parameters. Therefore, as long as either one holds, the parameters can be consistently estimated.

Proposition 3.1. Under Assumptions 2.1(i)–(ii) and 3.1,
(1) with Assumption 3.2(i), $\Psi_m$, $P_m$, $\Phi_m$ and $\Pi_{km}$ can be identified;
(2) with Assumption 3.2(ii), $\Psi_m$ and $\Sigma_{vm}$ can be identified. Furthermore, if Assumption 3.2(i)(a) holds, then $P_m$, $\Phi_m$ and $\Pi_{km}$ can also be identified;
(3) as $T$ goes to infinity, $\frac{1}{(n-n_1)T}\ln L_{nT,m}(\theta) - E\left[\frac{1}{(n-n_1)T}\ln L_{nT,m}(\theta)\right] \xrightarrow{p} 0$ uniformly in $\theta$ in its compact parameter space;
(4) with Assumption 3.2, the QMLE $\hat{\theta}_{nT} = \arg\max_{\theta}\frac{1}{(n-n_1)T}\ln L_{nT,m}(\theta)$ for the SVAR model is consistent.
The asymptotic distribution of the QML estimator $\hat{\theta}_{nT}$ can be derived from the Taylor expansion of the log likelihood function

$$\hat{\theta}_{nT} - \theta_0 = -\left[\frac{\partial^{2}\ln L_{nT,m}(\tilde{\theta}_{nT})}{\partial\theta\partial\theta'}\right]^{-1}\frac{\partial\ln L_{nT,m}(\theta_0)}{\partial\theta},$$

where $\tilde{\theta}_{nT}$ lies between $\hat{\theta}_{nT}$ and $\theta_0$. At the true $\theta_0$, the first order derivatives can be decomposed as $\frac{\partial\ln L_{nT,m}(\theta_0)}{\partial\theta} \equiv \frac{\partial\ln L_{nT,m0}}{\partial\theta} = \frac{\partial\ln L^{1}_{nT,m0}}{\partial\theta} + \frac{\partial\ln L^{R}_{nT,m0}}{\partial\theta}$, where the first component captures the score with zero mean, and the second component generates possible asymptotic bias. Detailed formulas of the above decomposition are in the supplementary file (Section 3). By inspection, $\frac{1}{\sqrt{(n-n_1)T}}\frac{\partial\ln L^{1}_{nT,m0}}{\partial\theta}$ is a statistic of the general form

$$\sum_{t=1}^{T}\{U'_{nm,t-1}v_{nm,t} + v'_{nm,t}B_{1nm}v_{nm,t} + D'_{nm,t}v_{nm,t} - \mathrm{Tr}[B_{1nm}(I_n \otimes \Sigma_{vm0})]\},$$

where $U_{nm,t} = \sum_{h=1}^{+\infty}G_{nm,c}G^{h}_{nm,d}v_{nm,t-h+1}$; $B_{1nm}$, $G_{nm,c}$ and $G_{nm,d}$ are generic $nm \times nm$ matrices such that $B_{1nm}$, $G_{nm,c}$, and $\sum_{h=1}^{+\infty}\mathrm{abs}(G^{h}_{nm,d})$ are bounded in row and column sum norms; and $D_{nm,t}$ is an $nm \times 1$ generic vector with uniformly bounded entries. As the components of the disturbance vector for each $i$ at $t$ of $v_{nm,t}$ are not independently distributed, the central limit theorem for linear–quadratic forms in Yu et al. (2008) for the univariate case needs to be extended to the multivariate case.

Lemma 3.2. Suppose $Q_{nT,m} = \sum_{t=1}^{T}[U'_{nm,t-1}v_{nm,t} + v'_{nm,t}B_{1nm}v_{nm,t} + D'_{nm,t}v_{nm,t} - \mathrm{Tr}(B_{1nm}\Sigma_{vm})]$, $V_{nm,t}$ satisfies Assumption 3.1(i), its variance $\sigma^{2}_{Q_{nT,m}}$ is $O(nT)$, and $\frac{1}{nT}\sigma^{2}_{Q_{nT,m}}$ is bounded away from zero; then $\frac{Q_{nT,m}}{\sigma_{Q_{nT,m}}} \xrightarrow{d} N(0, 1)$.

With a proper decomposition, the variance matrix of the normalized score vector consists of two components: $\mathrm{Cov}\left(\frac{1}{\sqrt{(n-n_1)T}}\frac{\partial\ln L^{1}_{nT,m0}}{\partial\theta}\right) = \Omega_{\theta_0,nT} + \Xi_{\theta_0,nT} + O\left(\frac{1}{T}\right)$, where, if disturbances are normally distributed, $\Xi_{\theta_0,nT} = 0$. The detailed expressions for $\Omega_{\theta_0,nT}$ and $\Xi_{\theta_0,nT}$ are in the supplementary file (Section 3).
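By the CLT in Lemma 3.2, $\frac{1}{\sqrt{(n-n_1)T}}\frac{\partial\ln L^{1}_{nT,m0}}{\partial\theta} \xrightarrow{d} N(0, \Omega_{\theta_0} + \Xi_{\theta_0})$, where $\Omega_{\theta_0} = \lim_{T\to+\infty}\Omega_{\theta_0,nT}$ and $\Xi_{\theta_0} = \lim_{T\to+\infty}\Xi_{\theta_0,nT}$, which are assumed to exist. The other component $\frac{1}{\sqrt{(n-n_1)T}}\frac{\partial\ln L^{R}_{nT,m0}}{\partial\theta}$ has expressions in forms with either $\bar{v}'_{nm,T}B_{1nm}\bar{v}_{nm,T}$ or $\bar{\bar{U}}'_{nm,T}\bar{v}_{nm,T}$, where $\bar{\bar{U}}_{nm,T} = \frac{1}{T}\sum_{t=0}^{T-1}U_{nm,t}$, due to the concentration of individual effects. This component has $\frac{1}{\sqrt{(n-n_1)T}}\frac{\partial\ln L^{R}_{nT,m0}}{\partial\theta} = \sqrt{\frac{n-n_1}{T}}\Delta_{R,nT} + O\left(\sqrt{\frac{n}{T^{3}}}\right)$, where explicit expressions of entries in $\Delta_{R,nT}$, which capture possible asymptotic bias components, are in Appendix D. For the normalized Hessian $\frac{1}{(n-n_1)T}\frac{\partial^{2}\ln L_{nT,m}(\hat{\theta}_{nT})}{\partial\theta\partial\theta'}$, it has the following regular convergence properties.

Lemma 3.3. Under Assumptions 2.1(i)–(ii), 3.1, and 3.2,
(1) for any consistent estimate $\hat{\theta}_{nT}$ of $\theta_0$, $\frac{1}{(n-n_1)T}\frac{\partial^{2}\ln L_{nT,m}(\hat{\theta}_{nT})}{\partial\theta\partial\theta'} - \frac{1}{(n-n_1)T}\frac{\partial^{2}\ln L_{nT,m}(\theta_0)}{\partial\theta\partial\theta'} = o_p(1)$; and
(2) $\frac{1}{(n-n_1)T}\frac{\partial^{2}\ln L_{nT,m}(\theta_0)}{\partial\theta\partial\theta'} + \Omega_{\theta_0,nT} = o_p(1)$.

By defining $\Delta_{\theta_0,nT} = \Omega^{-1}_{\theta_0,nT}\Delta_{R,nT}$, the QMLE $\hat{\theta}_{nT}$ has the following asymptotic distribution:

Theorem 3.1. Under Assumptions 2.1(i)–(ii), 3.1, and 3.2, when $T \to +\infty$,

$$\sqrt{(n-n_1)T}(\hat{\theta}_{nT} - \theta_0) - \sqrt{\frac{n-n_1}{T}}\Delta_{\theta_0,nT} + O_p\left(\max\left(\sqrt{\frac{n-n_1}{T^{3}}}, \sqrt{\frac{1}{T}}\right)\right) \xrightarrow{d} N(0, \Omega^{-1}_{\theta_0}(\Omega_{\theta_0} + \Xi_{\theta_0})\Omega^{-1}_{\theta_0}).$$

Consequently,
(1) if $(n-n_1)/T \to 0$, then the bias $\sqrt{\frac{n-n_1}{T}}\Delta_{\theta_0,nT}$ vanishes, and $\sqrt{(n-n_1)T}(\hat{\theta}_{nT} - \theta_0) \xrightarrow{d} N(0, \Omega^{-1}_{\theta_0}(\Omega_{\theta_0} + \Xi_{\theta_0})\Omega^{-1}_{\theta_0})$;
(2) if $(n-n_1)/T \to M$, a finite positive constant, $\sqrt{(n-n_1)T}(\hat{\theta}_{nT} - \theta_0) - \sqrt{M}\Delta_{\theta_0,nT} \xrightarrow{d} N(0, \Omega^{-1}_{\theta_0}(\Omega_{\theta_0} + \Xi_{\theta_0})\Omega^{-1}_{\theta_0})$; and
(3) if $(n-n_1)/T \to \infty$, $T(\hat{\theta}_{nT} - \theta_0) - \Delta_{\theta_0,nT} \xrightarrow{p} 0$.
For the case that the disturbances are normally distributed, $\Xi_{\theta_0} = 0$.

The analysis of the asymptotic distribution differs from Lee and Yu (2010a) with the application of lemmas of convergence for multiple dependent variables in each spatial unit and the multivariate CLT, which are in Appendix B.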
When $\frac{n}{T} \to M > 0$, the QMLE is not asymptotically centered at zero. However, the asymptotic bias may be eliminated by considering a proper bias correction. As the asymptotic bias of $\hat{\theta}_{nT}$ is $\lim_{T\to+\infty}\frac{1}{T}\Delta_{\theta_0,nT}$, we define the bias corrected estimator $\hat{\theta}^{c}_{nT} = \hat{\theta}_{nT} - \frac{1}{T}\Delta_{\theta,nT}(\hat{\theta}_{nT})$, where $\Delta_{\theta,nT}(\hat{\theta}_{nT}) = -\left[\frac{1}{(n-n_1)T}\frac{\partial^{2}\ln L_{nT,m}(\hat{\theta}_{nT})}{\partial\theta\partial\theta'}\right]^{-1}\Delta_{R,nT}(\hat{\theta}_{nT})$. To show that $\hat{\theta}^{c}_{nT}$ can be consistent and asymptotically normal, centered at zero, an additional technical assumption is needed. Define $S^{-}_{nm} = (\Gamma_n \otimes I_m)\mathrm{diag}\{0, \ldots, 0, (I_m - \omega_{n,n_1+1}\Psi'_m)^{-1}, \ldots, (I_m - \omega_{n,n}\Psi'_m)^{-1}\}(\Gamma_n^{-1} \otimes I_m)$, $E^{w}_{nm,ij} = (\Gamma_n \otimes I_m)\mathrm{diag}\{0, \ldots, 0, \omega_{n,n_1+1}E'_{m,ij}, \ldots, \omega_{n,n}E'_{m,ij}\}(\Gamma_n^{-1} \otimes I_m)$ and $E_{nm,ij} = (\Gamma_n \otimes I_m)\mathrm{diag}\{0, \ldots, 0, E'_{m,ij}, \ldots, E'_{m,ij}\}(\Gamma_n^{-1} \otimes I_m)$, with the first $n_1$ diagonal blocks being zeros, where $E_{m,ij} = e_{m,i}e'_{m,j}$.

Assumption 3.3. The row and column sum norms of the sequences $\sum_{g=1}^{\infty}\sum_{h=0}^{g-1}B^{h}_{nm}S^{-}_{nm}E^{w}_{nm,ij}B^{g-1-h}_{nm}$ and $\sum_{g=1}^{\infty}\sum_{h=0}^{g-1}B^{h}_{nm}S^{-}_{nm}E_{nm,ij}B^{g-1-h}_{nm}$ are bounded uniformly in absolute value in the parameter space for $i, j = 1, \ldots, m$.

Theorem 3.2. Under Assumptions 2.1(i)–(ii), 3.1–3.3, when $n/T^{3} \to 0$, the bias corrected QMLE $\hat{\theta}^{c}_{nT}$ has $\sqrt{(n-n_1)T}(\hat{\theta}^{c}_{nT} - \theta_0) \xrightarrow{d} N(0, \Omega^{-1}_{\theta_0}(\Omega_{\theta_0} + \Xi_{\theta_0})\Omega^{-1}_{\theta_0})$.

4. Tests for cointegration rank

It is a crucial issue to distinguish stable and unstable cases. For cases SC and MC, the cointegration rank is also important, since an $(n - n_1)m$ rank implies that the model exhibits pure spatial cointegration, while an $nm - n_1m_1$ rank for $0 < m_1 < m$, which is relatively larger, indicates that dependent variables are mixed cointegrated among spatial units and with each other in a spatial unit. For the univariate case, $m = 1$, it is clear that only the spatial cointegration might occur.
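Therefore, we introduce a hypothesis testing procedure under Assumption 2.2(ii) or (iii) or (iv), which can distinguish the stable model, spatial cointegration model, or mixed cointegration model, generalizing Johansen's cointegration rank test (Johansen, 1988, 1991).$^{21}$ We first introduce the transformation of the model, and present the procedure and statistic for the hypothesis testing. Then we derive the asymptotic distribution of the statistic.

The transformation procedure consists of multiple steps and is notationally intensive. However, the basic idea is, first, to decompose the original system like (2.8) and only use the possible unstable subsystem with time dummies being eliminated. Second, concentrate out all regressors and individual fixed effects. Third, use the log-likelihood function to construct the likelihood ratio statistics.

Recall the decomposition of $W_n$ in (2.7): $\Gamma_{n,n_1}$ collects $n_1$ eigenvectors corresponding to unit eigenvalues of $W_n$, which are arranged as the first $n_1$ columns of $\Gamma_n$. For the starred matrices $[\Gamma^{*\prime}_{n_1,n}, \Gamma^{*\prime}_{n-n_1,n}]' = \Gamma_n^{-1}$, $\Gamma^{*}_{n_1,n}$ represents the first $n_1$ rows in $\Gamma_n^{-1}$. As $W_nl_n = l_n$, without loss of generality, let $\Gamma_{n,n_1} = (\Gamma_{n,n_0}, \frac{1}{\sqrt{n}}l_n)$ where $n_0 = n_1 - 1$. Note that $I_{n_1} = \Gamma^{*}_{n_1,n}\Gamma_{n,n_1} \equiv \begin{pmatrix} \Gamma^{*}_{n_0,n}\Gamma_{n,n_0} & 0 \\ 0 & 1 \end{pmatrix}$, where $\Gamma^{*}_{n_0,n}$ is the first $n_0$ rows of $\Gamma^{*}_{n_1,n}$. Thus, $\Gamma^{*}_{n_0,n}\Gamma_{n,n_0} = I_{n_0}$, $\Gamma^{*}_{n_0,n}l_n = 0$, and

$$\Gamma^{*}_{n_0,n}W_n = \Gamma^{*}_{n_0,n}\Gamma_n\varpi_n\Gamma_n^{-1} = \Gamma^{*}_{n_0,n}[\Gamma_{n,n_0}, \tfrac{1}{\sqrt{n}}l_n, \Gamma_{n,n_2}]\varpi_n\Gamma_n^{-1} = [I_{n_0}, 0, 0]\begin{pmatrix} I_{n_1} & 0 \\ 0 & \varpi_{n_2} \end{pmatrix}\Gamma_n^{-1} = \Gamma^{*}_{n_0,n}.$$

Multiplying the model from the left by $(\Gamma^{*}_{n_0,n}\Gamma^{*\prime}_{n_0,n})^{-1/2}\Gamma^{*}_{n_0,n}$, we can eliminate the time fixed effect and arrive at

$$Y^{\circ}_{n_0m,t} = Y^{\circ}_{n_0m,t-1}H'_{m0} + X^{\circ}_{n_0,t}\Pi^{\circ}_{km0} + C^{\circ}_{n_0m0} + V^{\circ}_{n_0m,t},$$

or

$$\Delta Y^{\circ\prime}_{n_0m,t} = \alpha_{m0}\beta'_{m0}Y^{\circ\prime}_{n_0m,t-1} + \Pi^{\circ\prime}_{km0}X^{\circ\prime}_{n_0,t} + C^{\circ\prime}_{n_0m0} + V^{\circ\prime}_{n_0m,t},$$

where $Y^{\circ}_{n_0m,t} = (\Gamma^{*}_{n_0,n}\Gamma^{*\prime}_{n_0,n})^{-1/2}\Gamma^{*}_{n_0,n}Y_{nm,t}$, $V^{\circ}_{n_0m,t} = (\Gamma^{*}_{n_0,n}\Gamma^{*\prime}_{n_0,n})^{-1/2}\Gamma^{*}_{n_0,n}V_{nm,t}(I_m - \Psi_{m0})^{-1}$, $X^{\circ}_{n_0,t} = (\Gamma^{*}_{n_0,n}\Gamma^{*\prime}_{n_0,n})^{-1/2}\Gamma^{*}_{n_0,n}X_{nk,t}$, $C^{\circ}_{n_0m0} = (\Gamma^{*}_{n_0,n}\Gamma^{*\prime}_{n_0,n})^{-1/2}\Gamma^{*}_{n_0,n}C_{nm0}(I_m - \Psi_{m0})^{-1}$, and $\Pi^{\circ}_{km0} = \Pi_{km0}(I_m - \Psi_{m0})^{-1}$. The subsystem captures all possible unstable components with time dummies eliminated by perpendicularity of $l_n$ and the rows of $\Gamma^{*}_{n_0,n}$. Here $\alpha_{m0}$ and $\beta_{m0}$ are $m \times (m - m_1)$ matrices such that $H_{m0} - I_m = \alpha_{m0}\beta'_{m0}$, as $m - m_1$ is the rank of $H_{m0} - I_m$.$^{22}$

Let $y^{\circ}_{n_0,t} = \mathrm{vec}(Y^{\circ\prime}_{n_0m,t})$, $v^{\circ}_{n_0,t} = \mathrm{vec}(V^{\circ\prime}_{n_0m,t})$, and $c^{\circ}_{n_0} = \mathrm{vec}(C^{\circ\prime}_{n_0m})$. Define $\Sigma_{um0}$ as the covariance matrix of any row of $V_{nm,t}(I_m - \Psi_{m0})^{-1}$. We have $\mathrm{cov}(v^{\circ}_{n_0,t}) = I_{n_0} \otimes \Sigma_{um0}$. The log likelihood function for the subsystem $y^{\circ}_{n_0,t}$ with parameters $\beta_m$, $\alpha_m$, $\Pi^{\circ}_{km}$, $\Sigma_{um}$ is

$$l_{n_0T}(\beta_m, \alpha_m, \Pi^{\circ}_{km}, \Sigma_{um}) = \text{constant} - \frac{n_0T}{2}\ln|\Sigma_{um}| - \frac{1}{2}\sum_{t=1}^{T}\varepsilon_{n_0,t}'(I_{n_0} \otimes \Sigma^{-1}_{um})\varepsilon_{n_0,t},$$

where $\varepsilon_{n_0,t} = \Delta y^{\circ}_{n_0,t} - (I_{n_0} \otimes (\alpha_m\beta'_m))y^{\circ}_{n_0,t-1} - c^{\circ}_{n_0} - (X^{\circ}_{n_0,t} \otimes I_m)\mathrm{vec}(\Pi^{\circ\prime}_{km})$ denotes the residual vector that appears on both sides of the quadratic form.

$^{21}$ Note that the testing procedure does not apply to cases PU or VC. If we want to test for these cases, we should first implement a t test for $\Phi_{m0} + \Psi_{m0} = 0$.

$^{22}$ The representation was introduced by Engle and Granger (1987). The matrices $\alpha_{m0}$ and $\beta_{m0}$ are not unique. Normalization is needed in order to estimate them; see Lütkepohl (2005, Chapter 7) for possible normalization. However, the likelihood ratio statistic can be derived as long as they are properly selected to maximize the log likelihood function.

Now we concentrate out parameters:

1. Concentrating $c^{\circ}_{n_0}$. The fixed effects can be concentrated out by $\hat{c}^{\circ}_{n_0} = \frac{1}{T}\sum_{t=1}^{T}\Delta y^{\circ}_{n_0,t} - (I_{n_0} \otimes (\alpha_m\beta'_m))\left(\frac{1}{T}\sum_{t=0}^{T-1}y^{\circ}_{n_0,t}\right) - \frac{1}{T}\sum_{t=1}^{T}(X^{\circ}_{n_0,t} \otimes I_m)\mathrm{vec}(\Pi^{\circ\prime}_{km})$.

2. Concentrating $\mathrm{vec}(\Pi^{\circ\prime}_{km})$. Define $\Delta\tilde{y}^{\circ}_{n_0,t} = \Delta y^{\circ}_{n_0,t} - \frac{1}{T}\sum_{t=1}^{T}\Delta y^{\circ}_{n_0,t}$, $\tilde{\tilde{y}}^{\circ}_{n_0,t-1} = y^{\circ}_{n_0,t-1} - \frac{1}{T}\sum_{t=0}^{T-1}y^{\circ}_{n_0,t}$, and $\tilde{X}^{\circ}_{n_0,t} = X^{\circ}_{n_0,t} - \frac{1}{T}\sum_{t=1}^{T}X^{\circ}_{n_0,t}$. Let $\tilde{X}^{\circ} = [\tilde{X}^{\circ\prime}_{n_0,1}, \ldots, \tilde{X}^{\circ\prime}_{n_0,T}]'$, $\tilde{\tilde{y}}^{\circ}_{-1} = [\tilde{\tilde{y}}^{\circ\prime}_{n_0,0}, \ldots, \tilde{\tilde{y}}^{\circ\prime}_{n_0,T-1}]'$, and $\Delta\tilde{y}^{\circ} = [\Delta\tilde{y}^{\circ\prime}_{n_0,1}, \ldots, \Delta\tilde{y}^{\circ\prime}_{n_0,T}]'$. We can concentrate out $\mathrm{vec}(\Pi^{\circ\prime}_{km})$ by $\widehat{\mathrm{vec}(\Pi^{\circ\prime}_{km})} = [(\tilde{X}^{\circ\prime}\tilde{X}^{\circ}) \otimes I_m]^{-1}(\tilde{X}^{\circ\prime} \otimes I_m)[\Delta\tilde{y}^{\circ} - [I_{n_0T} \otimes (\alpha_m\beta'_m)]\tilde{\tilde{y}}^{\circ}_{-1}]$.

3. Concentrating $\Sigma_{um}$. Let $M_x = I_{n_0T} - \tilde{X}^{\circ}(\tilde{X}^{\circ\prime}\tilde{X}^{\circ})^{-1}\tilde{X}^{\circ\prime}$, $\Delta\tilde{y}^{M} = (M_x \otimes I_m)\Delta\tilde{y}^{\circ}$, and $\tilde{\tilde{y}}^{M}_{-1} = (M_x \otimes I_m)\tilde{\tilde{y}}^{\circ}_{-1}$. We can reshape the $mn_0T \times 1$ vector $\Delta\tilde{y}^{M}$ as an $m \times n_0T$ matrix $\Delta\tilde{Y}^{M} = [\Delta\tilde{y}^{M}_{1,1}, \Delta\tilde{y}^{M}_{2,1}, \ldots, \Delta\tilde{y}^{M}_{n_0,1}, \ldots, \Delta\tilde{y}^{M}_{n_0,T}]$, and reshape the vector $\tilde{\tilde{y}}^{M}_{-1}$ as an $m \times n_0T$ matrix $\tilde{\tilde{Y}}^{M}_{-1} = [\tilde{\tilde{y}}^{M}_{1,0}, \tilde{\tilde{y}}^{M}_{2,0}, \ldots, \tilde{\tilde{y}}^{M}_{n_0,0}, \ldots, \tilde{\tilde{y}}^{M}_{n_0,T-1}]$. We concentrate out the entries in the covariance matrix $\Sigma_{um}$ by $\hat{\Sigma}_{um}(\alpha_m, \beta_m) = \frac{1}{n_0T}(\Delta\tilde{Y}^{M} - \alpha_m\beta'_m\tilde{\tilde{Y}}^{M}_{-1})(\Delta\tilde{Y}^{M} - \alpha_m\beta'_m\tilde{\tilde{Y}}^{M}_{-1})'$.

Therefore, the concentrated log likelihood function is

$$l_{n_0T}(\alpha_m, \beta_m) = \text{constant} - \frac{n_0T}{2}\ln|\hat{\Sigma}_{um}(\alpha_m, \beta_m)|. \tag{4.1}$$

To maximize the log likelihood function, given $\beta_m$, $\alpha_m(\beta_m) = (\Delta\tilde{Y}^{M}\tilde{\tilde{Y}}^{M\prime}_{-1}\beta_m)(\beta'_m\tilde{\tilde{Y}}^{M}_{-1}\tilde{\tilde{Y}}^{M\prime}_{-1}\beta_m)^{-1}$. Define $S_{00} = \frac{1}{n_0T}\Delta\tilde{Y}^{M}\Delta\tilde{Y}^{M\prime}$, $S_{10} = \frac{1}{n_0T}\tilde{\tilde{Y}}^{M}_{-1}\Delta\tilde{Y}^{M\prime}$, $S_{01} = S'_{10}$, and $S_{11} = \frac{1}{n_0T}\tilde{\tilde{Y}}^{M}_{-1}\tilde{\tilde{Y}}^{M\prime}_{-1}$, and let $\lambda_1, \lambda_2, \ldots, \lambda_m$ be the solutions of $|\lambda S_{11} - S_{10}S_{00}^{-1}S_{01}| = 0$ with $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_m$. Substituting $\alpha_m(\beta_m)$ into the log likelihood function, similarly to Johansen (1988, pp.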
235), we get the concentrated log likelihood function$^{23}$

$$l_{n_0T}(\beta_m) = \text{constant} - \frac{n_0T}{2}\ln|S_{00} - S_{01}\beta_m(\beta'_mS_{11}\beta_m)^{-1}\beta'_mS_{10}| = \text{constant} - \frac{n_0T}{2}[\ln|S_{00}| + \ln|\beta'_m(S_{11} - S_{10}S_{00}^{-1}S_{01})\beta_m| - \ln|\beta'_mS_{11}\beta_m|].$$

$^{23}$ The second equality is due to $\begin{vmatrix} S_{00} & S_{01}\beta_m \\ \beta'_mS_{10} & \beta'_mS_{11}\beta_m \end{vmatrix} = |S_{00}|\,|\beta'_m(S_{11} - S_{10}S_{00}^{-1}S_{01})\beta_m| = |\beta'_mS_{11}\beta_m|\,|S_{00} - S_{01}\beta_m(\beta'_mS_{11}\beta_m)^{-1}\beta'_mS_{10}|$.

Using Proposition A.7 of Lütkepohl (2005), under $H_0: \mathrm{rank}(H_{m0} - I_m) = m - m_1$, the concentrated log likelihood function $l_{n_0T}(\beta_m)$ is maximized by the choice $\tilde{\beta}_m$ being the first $m - m_1$ eigenvectors in the eigenvector matrix $\hat{V}_m$ of the equation $|\lambda S_{11} - S_{10}S_{00}^{-1}S_{01}| = 0$ with $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_m$, normed by $\hat{V}'_mS_{11}\hat{V}_m = I_m$. The maximized likelihood function under the condition that $\mathrm{rank}(H_{m0} - I_m) = m - m_1$ is

$$l_{\max} = \text{constant} - \frac{n_0T}{2}\left(\ln|S_{00}| + \sum_{p=1}^{m-m_1}\ln|1 - \lambda_p|\right).$$

Therefore, the likelihood ratio test statistic against $H_1: m - m_1 < \mathrm{rank}(H_{m0} - I_m) \leq m$, which is twice the difference between the maximized log likelihoods under $H_1$ and $H_0$, is

$$LR_{n_0T}(m - m_1, m) = -n_0T\sum_{p=m-m_1+1}^{m}\ln|1 - \lambda_p| = n_0T\sum_{p=m-m_1+1}^{m}\lambda_p + o_p(1).$$

Under $H_0$, its asymptotic distribution is in Proposition 4.1. We run a sequential hypothesis testing procedure by $m_1 = m, m - 1, \ldots, 1$ for a sequence of null hypotheses from $H_0: \mathrm{rank}(H_{m0} - I_m) = 0$ to $H_0: \mathrm{rank}(H_{m0} - I_m) = m - 1$. The inference follows:

1. If $H_0: \mathrm{rank}(H_{m0} - I_m) = 0$ cannot be rejected, we conclude that the model exhibits spatial cointegration with cointegration rank $(n - n_1)m$ (case SC);
2. If all $H_0$'s: $\mathrm{rank}(H_{m0} - I_m) = j$ for $j = 0, 1, \ldots, m - m_1 - 1$ are rejected and $H_0: \mathrm{rank}(H_{m0} - I_m) = m - m_1$ cannot be rejected, we conclude that the model exhibits mixed cointegration with cointegration rank $nm - n_1m_1$ (case MC);
3. If all $H_0$'s: $\mathrm{rank}(H_{m0} - I_m) = j$ for $j = 0, 1, \ldots, m - 1$ are rejected, we conclude that the model is stable (case S).
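The eigenvalue problem, the likelihood ratio statistics and the sequential decision rule above can be sketched in code. This is a minimal illustration, not the authors' implementation: all function names are hypothetical, the inputs `dY` and `Ylag` are assumed to be the $m \times n_0T$ matrices $\Delta\tilde{Y}^{M}$ and $\tilde{\tilde{Y}}^{M}_{-1}$ already purged of fixed effects and regressors, and the critical values must be supplied by the user because the limiting distribution is nonstandard. The toy data at the end are simulated random walks used only to exercise the functions.

```python
import numpy as np

def johansen_eigenvalues(dY, Ylag):
    """Solve |lambda*S11 - S10 S00^{-1} S01| = 0; eigenvalues sorted descending."""
    N = dY.shape[1]
    S00 = dY @ dY.T / N
    S01 = dY @ Ylag.T / N
    S11 = Ylag @ Ylag.T / N
    L = np.linalg.cholesky(S11)                 # S11 = L L'
    A = np.linalg.solve(L, S01.T)               # L^{-1} S10
    M = A @ np.linalg.solve(S00, A.T)           # L^{-1} S10 S00^{-1} S01 L^{-T}
    lam = np.linalg.eigvalsh((M + M.T) / 2.0)[::-1]
    return np.clip(lam, 0.0, 1.0 - 1e-12)       # guard against rounding error

def lr_statistics(lam, N):
    """lr[r] = -N * sum_{p > r} ln(1 - lambda_p), the statistic for H0: rank = r."""
    return np.array([-N * np.sum(np.log1p(-lam[r:])) for r in range(len(lam))])

def select_rank(lr, critical_values):
    """Sequential rule: the first rank r whose null is not rejected; m if all are."""
    for r, (stat, cv) in enumerate(zip(lr, critical_values)):
        if stat <= cv:
            return r
    return len(lr)

# Toy data (hypothetical): m = 2 independent random walks observed N times.
rng = np.random.default_rng(0)
m, N = 2, 500
Y = rng.standard_normal((m, N + 1)).cumsum(axis=1)
dY, Ylag = Y[:, 1:] - Y[:, :-1], Y[:, :-1]

lam = johansen_eigenvalues(dY, Ylag)
lr = lr_statistics(lam, N)
print(lam, lr)
```

In terms of the three conclusions listed above, a selected rank of 0 corresponds to case SC, a rank strictly between 0 and $m$ to case MC, and a rank of $m$ to case S.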
We make the following additional assumptions in order to derive the asymptotic distribution of the likelihood ratio ′ L with L being the lag operator and χ (L) = (ϑ (L) − ϑ (1))/(1 − L). Also, test statistics. Define ϑ (L) = (1 − L)Im − αm0 βm0 ′ ′ −1 ′ B = β⊥m0 (α⊥m0 χ (1)β⊥m0 ) α⊥m0 and τn0 ≡ BCn◦ m0 , where β⊥m0 and α⊥m0 are m × m1 matrices with full column rank 0 ′ ′ such that βm0 β⊥m0 = αm0 α⊥m0 = 0(m−m1 )×m1 . Assumption 4.1. limn0 →∞ 1 n0 β⊥′ m0 τn0 τn′ 0 β⊥m0 exists and has full rank. n0 is monotonically increasing with T . This assumption is crucial to derive the asymptotic distribution for our test statistics. The hypothesis test for cointegration ranks for panel data models developed in this paper differs from those in the literature. Larsson et al. (2001) consider the heterogeneous vector error correction models. Breitung (2005) studies a cointegration rank test method for panel vector autoregression, which allows individual specific αm0 and uses the cross-sectional average of statistics for all time series in inference. Both papers assume independent distributed disturbances. Our test focuses on the asymptotic distribution of the statistic with individual fixed effects and exogenous variables presented, which can be applied to SVAR models. The extension of Assumption 4.1 with exogenous regressors is in the following Assumption 4.2, which also guarantees that the time trend terms dominate the unit root terms. The only difference between Assumptions 4.1 and 4.2 is that the exogenous variables can also generate deterministic trend, in addition to individual dummies. Assumption 4.2. (1) Xn◦0 ,t = Xnc0 + Xnu0 ,t , where Xnc0 is the nonstochastic mean and xun uniformly bounded partial sums. Let τ x n0 (i) | ∑t h=1 c′ km0 Xn0 . 
= τn0 + BΠ ′ 0 ,i,t , a 1 × k vector as the ith row of matrix Xnu0 ,t has Specifically, ′ ′ C (L)xun 0 ,i,h ◦′ ◦′ ◦ ◦ , and , B(L)Πkm0 | < Dx < +∞ for a constant Dx uniformly for i, t, and C (L) = Πkm0 B1 (L)Πkm0 24 ; ∑T (ii) | h=1 Th xun ,i,h | < Dx < +∞ for a constant Dx uniformly for i, T ; 0 1 ◦ ◦′ u′ u ′ t =1 B(L)Πkm0 Xn0 ,t Xn0 ,t Πkm0 B(L) , n0 T ′ ′ ′ ′ ′ T T 1 1 1 u u x x (L)Xnu ,t −1 Xnu0 ,t , 3 3 t =1 t X̃n0 ,t −1 n0 , t =1 tB1 (L)X̃n0 ,t −1 n0 , and 0 n0 T 2 n0 T 2 n0 T T u′ u Furthermore, limT →∞ n1T t =1 Xn0 ,t Xn0 ,t exists and is nonsingular. 0 (iii) When T → ∞, the limits of 1 n0 T ∑ ∑T τ ∑T τ ∑ t =1 3 2 ′ ′ ′ (∑t ◦ Xnu0 ,t Xnu0 ,t , B(L)Πkm0 ∑T t =1 X̃nu 0 ,t −1 s=1 1 n0 T ∑T B ) t =1 1 Vn◦0 m,s exist. ∑ (2) n0 is monotonically increasing with T . limn0 →∞ [ 1 n0 ] ′ β⊥′ m0 τnx0 τnx0 β⊥m0 exists and has full rank. Assumption 4.2 explicitly assumes that the exogenous variables are composed of a nonstochastic mean and a partially bounded term, which can simplify asymptotic analysis. Assumption 4.2(i) and (ii) state that processes related to component Xnu0 ,t have bounded partial sums, which rules out time trends generated by Xnu0 ,t . Assumption 4.2(iii) regulates the limits of ‘‘second moments’’ of Xnu0 ,t . For time series literature, the VAR model in Johansen (1991) also includes seasonal dummies as time-varying variables with their partial sums remaining bounded. Here we need both n0 and T tend to infinity simultaneously. Adopting assumptions in Quah (1994) and Levin et al. (2002), we assume that n0 is monotonically increasing with T .25 The following proposition presents asymptotic properties of test statistics. Proposition 4.1. Under Assumptions 2.1, 2.2(i), and 3.1(i)–(iii), for the stable case defined in Assumption 2.2(ii) or the unstable cases defined in Assumption 2.2(iii) or (iv), ′ (1) with Assumption 4.1, for the model without exogenous variables, i.e. 
$$Y^\circ_{n_0m,t} = Y^\circ_{n_0m,t-1}H_{m0} + C^\circ_{n_0m0} + V^\circ_{n_0m,t},$$
and
(2) with Assumption 4.2, for the model with exogenous variables, i.e.
$$Y^\circ_{n_0m,t} = Y^\circ_{n_0m,t-1}H_{m0} + X^\circ_{n_0,t}\Pi^\circ_{km0} + C^\circ_{n_0m0} + V^\circ_{n_0m,t},$$
under $H_0: \operatorname{rank}(H_{m0} - I_m) = m - m_1$ for $m_1 = 1, 2, \ldots, m$, as $\lim_{T\to\infty} n_0/T = M < +\infty$, the likelihood ratio test statistic satisfies
$$LR_{n_0T}(m - m_1, m) \xrightarrow{d} \operatorname{Tr}\left[(W_{m_1} + B_{m_1})(W_{m_1} + B_{m_1})'\right],$$
where $W_{m_1}$ is an $m_1 \times m_1$ matrix whose entries are i.i.d. standard normal random variables and
$$B_{m_1} = -\sqrt{3M}\left(\alpha_{\perp m0}'\left(\lim_{n_0\to\infty}\frac{1}{n_0}\tau^x_{n_0}\tau^{x\prime}_{n_0}\right)\alpha_{\perp m0}\right)^{-1/2}\left(\alpha_{\perp m0}'\Sigma_{um0}\alpha_{\perp m0}\right)^{1/2}.^{26}$$

24 The definition and expression of $B(L)$ and $B_1(L)$ are in the Appendix.
25 With Assumption 2.1(iii), $n_0$ is proportional to $n$.
26 Note that $\tau^x_{n_0} = \tau_{n_0}$ when there are no exogenous variables or $X^c_{n_0}\Pi^\circ_{km0}B' = 0$.

The likelihood ratio test statistic has a chi-square distribution when $n/T \to 0$, while it is noncentral chi-square distributed when $n$ goes to infinity proportionally to $T$. $\operatorname{Tr}[(W_{m_1} + B_{m_1})(W_{m_1} + B_{m_1})']$ has the same distribution as $\operatorname{Tr}[(W_{m_1} + B^*_{m_1})(W_{m_1} + B^*_{m_1})']$ with $B^*_{m_1} = R_{m_1}B_{m_1}R_{m_1}'$, where $R_{m_1}$ is a generic $m_1 \times m_1$ orthonormal matrix. For inference, we need to estimate the term $B_{m_1}$ consistently. We have $\alpha_{\perp m0}'S_{00}\alpha_{\perp m0} \xrightarrow{p} \alpha_{\perp m0}'\Sigma_{um0}\alpha_{\perp m0}$ and
$$\alpha_{\perp m0}'\left(\frac{1}{n_0}\tau^x_{n_0}\tau^{x\prime}_{n_0}\right)\alpha_{\perp m0} = \alpha_{\perp m0}'\left(\frac{1}{n_0}\left(C^{\circ\prime}_{n_0m0} + \Pi^{\circ\prime}_{km0}X^{c\prime}_{n_0}\right)\left(C^{\circ\prime}_{n_0m0} + \Pi^{\circ\prime}_{km0}X^{c\prime}_{n_0}\right)'\right)\alpha_{\perp m0},$$
and the latter term can be estimated with any consistent estimators $\widehat{\alpha_m\beta_m'}$ and $\hat\Pi_{km}$.27 The consistent estimator for $\frac{1}{n_0}(C^{\circ\prime}_{n_0m0} + \Pi^{\circ\prime}_{km0}X^{c\prime}_{n_0})(C^{\circ\prime}_{n_0m0} + \Pi^{\circ\prime}_{km0}X^{c\prime}_{n_0})'$ is
$$\frac{1}{n_0}\hat C^{x,\circ\prime}_{n_0m0}\hat C^{x,\circ}_{n_0m0} = \frac{1}{n_0T^2}\left(\sum_{t=1}^{T}\left(Y^{\circ\prime}_{n_0m,t} - \widehat{\alpha_m\beta_m'}Y^{\circ\prime}_{n_0m,t-1} - \hat\Pi^{\circ\prime}_{km}\tilde X^{\circ\prime}_{n_0,t}\right)\right)\left(\sum_{t=1}^{T}\left(Y^{\circ\prime}_{n_0m,t} - \widehat{\alpha_m\beta_m'}Y^{\circ\prime}_{n_0m,t-1} - \hat\Pi^{\circ\prime}_{km}\tilde X^{\circ\prime}_{n_0,t}\right)\right)'.$$
The only task left is to estimate $\alpha_{\perp m0}$. Since $\alpha_{\perp m0}$ is not unique, with the restriction $\alpha_{\perp m0}'\alpha_{\perp m0} = I_{m_1}$ it can be identified up to transformation by an orthonormal matrix. We estimate
$$\tilde\alpha_{\perp m} = S_{00}^{-1}S_{01}S_{11}^{-1/2}\hat V_{m,m_1}\left(\hat V_{m,m_1}'S_{11}^{-1/2}S_{10}S_{00}^{-2}S_{01}S_{11}^{-1/2}\hat V_{m,m_1}\right)^{-1/2},$$
since this estimator satisfies $\tilde\alpha_m'\tilde\alpha_{\perp m} = [(S_{01}\tilde\beta_m)(\tilde\beta_m'S_{11}\tilde\beta_m)^{-1}]'\tilde\alpha_{\perp m} = 0$, where $\hat V_{m,m_1}$ represents the matrix of eigenvectors corresponding to the last $m_1$ eigenvalues $\lambda_{m-m_1+1} \ge \cdots \ge \lambda_m$ of $|\lambda I_m - S_{11}^{-1/2}S_{10}S_{00}^{-1}S_{01}S_{11}^{-1/2}| = 0$.

Lemma 4.1 (Consistency of Estimator $\tilde\alpha_{\perp m}$). Under the null hypothesis and Assumptions 2.1, 2.2(i), and 3.1(i)–(iii), in addition, for the stable case defined in Assumption 2.2(ii) or the unstable cases defined in Assumption 2.2(iii) or (iv), with Assumption 4.1 or 4.2, $\tilde\alpha_{\perp m}$ is a consistent estimator for a basis of the null space of $\alpha_{m0}'$, i.e., $\alpha_{m0}'\tilde\alpha_{\perp m} = o_p(1)$.

The above lemma allows us to use the plug-in method to estimate $B_{m_1}$ and simulate critical values for hypothesis testing. We propose an estimator of the limiting distribution of the LR statistics, i.e., of the cumulative distribution function $F_{LR}(c) = \Pr(\operatorname{Tr}[(W_{m_1} + B_{m_1})(W_{m_1} + B_{m_1})'] < c)$, as $\tilde F_{LR}(c) = \Pr(\operatorname{Tr}[(W_{m_1} + \tilde B_{m_1})(W_{m_1} + \tilde B_{m_1})'] < c)$, where
$$\tilde B_{m_1} = -\sqrt{3M}\left(\tilde\alpha_{\perp m}'\left(\frac{1}{n_0}\hat C^{x,\circ\prime}_{n_0m0}\hat C^{x,\circ}_{n_0m0}\right)\tilde\alpha_{\perp m}\right)^{-1/2}\left(\tilde\alpha_{\perp m}'S_{00}\tilde\alpha_{\perp m}\right)^{1/2}.$$
Note that we treat $M = 0$ when we assume $n_0/T \to 0$. The asymptotic distribution of the test statistic is pivotal with relatively small $n_0$, while it is non-pivotal when $n_0$ and $T$ are proportional. Hence, we define the critical region of the LR test with asymptotic size $\alpha$ as $\{LR_{n_0T}(m - m_1, m) \ge \tilde c_{1-\alpha}\}$, where $\tilde c_{1-\alpha}$ is the $(1 - \alpha) \times 100$ percentile of the cumulative distribution function $\tilde F_{LR}(\cdot)$. Corollary 1(i) points out that $\tilde F_{LR}(c) \to F_{LR}(c)$, which implies that the critical region defined by $\tilde c_{1-\alpha}$ is asymptotically equivalent to the critical region defined by $c_{1-\alpha}$. Monte Carlo experiments show that the estimated CDF and critical regions are precise in finite samples.
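The plug-in step can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the number of replications, and the seed are hypothetical. The argument `CC` stands for the $m \times m$ matrix $\frac{1}{n_0}\hat C^{x,\circ\prime}_{n_0m0}\hat C^{x,\circ}_{n_0m0}$, and the critical value is taken as the simulated $(1 - \alpha)$ quantile of $\operatorname{Tr}[(W_{m_1} + \tilde B_{m_1})(W_{m_1} + \tilde B_{m_1})']$.

```python
import numpy as np


def sym_power(A, p):
    """Power of a symmetric positive definite matrix via its eigendecomposition."""
    w, V = np.linalg.eigh((A + A.T) / 2)
    return (V * w**p) @ V.T


def plug_in_B(M, alpha_perp, CC, S00):
    """B-tilde = -sqrt(3M) (a' CC a)^{-1/2} (a' S00 a)^{1/2}; zero when M = 0."""
    a = np.asarray(alpha_perp, dtype=float)
    if M == 0:
        return np.zeros((a.shape[1], a.shape[1]))
    left = sym_power(a.T @ CC @ a, -0.5)
    right = sym_power(a.T @ S00 @ a, 0.5)
    return -np.sqrt(3.0 * M) * left @ right


def simulate_critical_value(B, size=0.05, reps=200_000, seed=0):
    """Simulated (1 - size) quantile of Tr[(W + B)(W + B)'], W with i.i.d. N(0, 1) entries."""
    B = np.atleast_2d(np.asarray(B, dtype=float))
    m1 = B.shape[0]
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((reps, m1, m1)) + B
    stats = (Z**2).sum(axis=(1, 2))   # the trace of Z Z' is the sum of squared entries
    return np.quantile(stats, 1.0 - size)
```

With $B = 0$ and $m_1 = 1$ the simulated quantile should be close to the 95% point of a central $\chi^2(1)$ distribution, and a nonzero $B$ shifts the critical value upward, which is why the plug-in estimate matters when $n_0$ and $T$ are proportional.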
Moreover, Corollary 1(ii) states that the LR test for the null hypothesis $H_0$ against the alternative hypothesis $H_1$ with critical region $\{LR_{n_0T}(m - m_1, m) \ge \tilde c_{1-\alpha}\}$ is consistent.

Corollary 1. Under Assumptions 2.1, 2.2(i), and 3.1(i)–(iii), in addition to $H_0$, for the stable case defined in Assumption 2.2(ii) or the unstable cases defined in Assumption 2.2(iii) or (iv), with Assumption 4.1 or 4.2,
(i) (Estimation of the limiting distribution of the LR statistics) $\tilde F_{LR}(c) - F_{LR}(c) = o(1)$ for each $c$;
(ii) (Consistency of the critical region of the LR statistics) the LR test for the null hypothesis $H_0: \operatorname{rank}(H_{m0} - I_m) = m - m_1$ against the alternative hypothesis $H_1: \operatorname{rank}(H_{m0} - I_m) > m - m_1$ with critical regions defined by $\tilde c_{1-\alpha}$ is consistent.

The above LR test procedure would not weakly consistently estimate the rank if the type-I error were held fixed across all sample sizes. However, letting $O(1) < \tilde c_{nT,1-\alpha_{nT}} = o(nT)$, we can consistently estimate the cointegration rank, since the type-I error vanishes asymptotically while the test with critical regions defined by $\tilde c_{nT,1-\alpha_{nT}}$ remains consistent. Consistency of estimators for matrix ranks is discussed in the literature, for instance, Cragg and Donald (1997) and Robin and Smith (2000). Monte Carlo simulations reveal the finite sample performance of the proposed LR test statistics. Derivations and proofs of this proposition are presented in the Appendix and a supplementary file.

5. Monte Carlo experiments

We conduct Monte Carlo experiments to investigate small sample properties of QMLEs and test statistics for cointegration ranks for the SVAR model with normally or non-normally distributed disturbances. We also compare the QMLE with other estimators such as 2SLS, 3SLS, and misspecified single equation QML.
Queen spatial weights matrices Wn are usually employed in the literature so we use a block diagonal matrix formed by a row-normalized queen matrix.28 27 For example, for each time series, unrestricted OLS estimators for both matrices are consistent. (see Lütkepohl, 2005). 28 Queen matrix means any pair of spatial units which share a border or a single common point are neighbors. Here, each block contains 9 spatial units that are arranged in a 3 × 3 rectangular. For example, spatial unit 1 is connected with units 2, 4 and 5, while unit 2 is connected with units 1, 3, 4, 5, and 6. With more units in a block, we need a larger sample size to increase the number of blocks. The simulation conclusions are not sensitive with the size of blocks (for example, the block size can be 9 or 16 as in Yu et al., 2012.). With a block diagonal structure of Wn , the number of unit eigenvalues increases with n. For example, there are 4 unit eigenvalues of Wn when n = 36, and 8 when n = 72. For additional simulations, other forms for the spatial weights matrices are allowed. 352 K. Yang and L.-f. Lee / Journal of Econometrics 221 (2021) 337–367 Table 5.1 QML estimation for case S. Method QML(Stable) QML(Stable) QML(Stable) QML(Stable) n, T 36, 30 36, 60 72, 30 72, 60 θ θc θ θc θ θc θ θc π11 Bias S.D. CP 0.001 (0.038) 0.940 0.000 (0.038) 0.946 0.000 (0.028) 0.940 0.000 (0.028) 0.942 0.002 (0.026) 0.962 0.001 (0.026) 0.970 0.003 (0.019) 0.926 0.003 (0.019) 0.932 π21 Bias S.D. CP 0.000 (0.041) 0.938 0.001 (0.041) 0.942 0.001 (0.029) 0.934 0.000 (0.029) 0.940 0.003 (0.028) 0.948 0.002 (0.028) 0.952 0.000 (0.019) 0.952 0.000 (0.019) 0.956 p11 Bias S.D. CP 0.026 (0.031) 0.868 0.007 (0.030) 0.956 0.013 (0.021) 0.914 0.003 (0.021) 0.942 0.026 (0.023) 0.760 0.007 (0.023) 0.924 0.013 (0.015) 0.862 0.004 (0.015) 0.948 p21 Bias S.D. 
CP −0.013 (0.029) 0.924 −0.010 (0.029) 0.948 −0.006 −0.004 −0.006 (0.020) 0.952 −0.013 (0.022) 0.890 −0.009 (0.020) 0.948 (0.022) 0.918 (0.015) 0.932 −0.004 (0.015) 0.948 Bias S.D. CP 0.000 (0.094) 0.940 −0.006 (0.096) 0.938 0.000 (0.067) 0.938 −0.002 0.002 (0.067) 0.936 −0.003 (0.068) 0.936 0.002 (0.046) 0.956 0.000 (0.046) 0.952 Bias S.D. CP −0.005 (0.087) 0.948 0.007 (0.087) 0.954 −0.003 0.002 (0.057) 0.964 −0.004 (0.061) 0.922 0.005 (0.061) 0.942 −0.001 (0.056) 0.964 0.003 (0.043) 0.942 Bias S.D. CP 0.008 (0.091) 0.954 0.007 (0.091) 0.962 0.001 (0.065) 0.934 0.001 (0.065) 0.938 0.000 (0.060) 0.954 0.000 (0.060) 0.956 −0.002 (0.045) 0.950 −0.002 (0.045) 0.950 Bias S.D. CP −0.002 (0.110) 0.934 −0.002 (0.110) 0.936 −0.004 −0.004 (0.079) 0.940 (0.080) 0.942 0.006 (0.078) 0.940 0.004 (0.078) 0.944 0.005 (0.054) 0.950 0.005 (0.054) 0.950 σ11 Bias S.D. CP 0.130 (0.041) 0.196 0.014 (0.046) 0.924 0.104 (0.030) 0.136 0.003 (0.034) 0.930 0.135 (0.029) 0.016 0.006 (0.034) 0.928 0.116 (0.021) 0.000 0.002 (0.024) 0.932 σ21 Bias S.D. CP 0.064 (0.032) 0.628 0.006 (0.036) 0.948 0.051 (0.024) 0.532 0.001 (0.027) 0.956 0.067 (0.025) 0.330 0.002 (0.029) 0.922 0.058 (0.017) 0.138 0.002 (0.019) 0.942 φ11 φ21 ψ11 ψ21 (0.068) 0.942 (0.043) 0.940 The SVAR model that we investigate is: y1,t = ψ11 Wy1,t + ψ21 Wy2,t + p11 y1,t −1 + p21 y2,t −1 + φ11 Wy1,t −1 + φ21 Wy2,t −1 + x′t Π·1 + c1 + d1,t + v1,t , y2,t = ψ12 Wy1,t + ψ22 Wy2,t + p12 y1,t −1 + p22 y2,t −1 + φ12 Wy1,t −1 + φ22 Wy2,t −1 + x′t Π·2 + c2 + d2,t + v2,t , where W is an n × n matrix and other variables are column vectors of dimension n, but for simplicity the subscript n is omitted. ( ) For this system, the disturbances v1,t and v2,t have mean 0 and, for each unit i, their covariance matrix Σ = 01.5 01.5 . 
For the non-normal distribution, we first generate two uniformly distributed independent random variables with mean zero ( and) variance one, then multiply the vector of the two uniform random variables by a constant matrix L, where LL′ = 01.5 01.5 , so that the resulted disturbances v1,it and v2,it have the same variance–covariance matrix as in the previous design. xt = [x′1,t , x′2,t ]′ , where for each i, xl,it for l = 1, 2 are also generated by U [0, 3]. cl and dl,t for l = 1, 2, are respectively individual and time fixed effects. dl,t ’s generated by uniformly distributed random variables on [0, 1]. cl = cl,a + cl,b are the sum of two parts, where the first part consists of randomly generated positive integers with 5.5 as its mean, the second part equals temporal means of x1,t for l = 1 and x2,t when l = 2. Such a design allows correlation of the individual effects with regressors. Π·1 = [1, 0.5]′ and Π·2 = [0.5, 1]′ . The 2 × 2 parameter matrices Ψ , P and Φ may be different for different models. In all the tables, we use θ to indicate the estimator of θ0 without bias correction while θ c is for the bias corrected one. ( ) ( 0.1 −0.2 ) 0.2 Finite )sample properties (of QMLEs.) For case S, Ψ = Φ = −00.2.2 − −0.2 −0.2 . For case MC, Ψ = −0.1 and P = ( 0.1. 0 .2 26 −0.28 P = Φ( = −00.2.2) − −0.28 −0.16 −0.1 . Hence( there) is 1 unit eigenvalue for H20 with m = 2 and m1 = 1. For case ( 0.2and ) SC, Ψ = 0.2 00..25 , P = −00.5.2 −00.2.2 , and Φ = 00.3 00.3 (. Hence there eigenvalues for H20 with ) are 2 (unit ) ( 0.2m −= )m1 = 2. −0.28 0.208 −0.204 0.2 We also consider an experiment for case E, Ψ = −00.26 , P = , and Φ = .28 −0.16 −0.204 −0.098 −0.2 −0.1 . Hence one eigenvalue is 1.0167, slightly larger than 1. We conduct these experiments for 500 repetitions with sample sizes (n, T ) = (36, 30), (36, 60), (72, 30), and (72, 60). 
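The block-diagonal, row-normalized queen weights matrix used in these designs can be sketched as below. This is an illustrative construction, not the authors' code: the function names are hypothetical, and the 3 × 3 block layout with units numbered row by row is an assumption taken from the description of the design (unit 1 neighbors units 2, 4 and 5; unit 2 neighbors units 1, 3, 4, 5 and 6).

```python
import numpy as np


def queen_block(rows=3, cols=3):
    """Row-normalized queen contiguity matrix for a rows x cols grid of units."""
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    # Neighbors share a border or a single common point.
                    if (dr or dc) and 0 <= rr < rows and 0 <= cc < cols:
                        W[i, rr * cols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def block_diag_queen(n, rows=3, cols=3):
    """Block-diagonal W_n built from identical row-normalized queen blocks."""
    k = rows * cols
    if n % k:
        raise ValueError("n must be a multiple of the block size")
    return np.kron(np.eye(n // k), queen_block(rows, cols))


def count_unit_eigenvalues(W, tol=1e-8):
    """Number of eigenvalues of W equal to one (up to a numerical tolerance)."""
    eig = np.linalg.eigvals(W)
    return int(np.sum(np.abs(eig - 1.0) < tol))
```

Each connected row-normalized block contributes exactly one unit eigenvalue, so the count grows with the number of blocks, in line with the 4 unit eigenvalues at $n = 36$ and 8 at $n = 72$ mentioned in the description of the design.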
Table 5.1 reports the QMLEs’ biases and standard deviations for the case S with normally distributed disturbances, while Tables 5.3 and 5.2 are for cases SC and MC. We only show numerical results for the first equation to save pages. The numerical results of the second equation have similar features. Biases of QMLEs for Π , Φ , and Ψ are small but those K. Yang and L.-f. Lee / Journal of Econometrics 221 (2021) 337–367 353 Table 5.2 QML estimation for case MC. Method QML(MC) QML(MC) QML(MC) QML(MC) n, T 36, 30 36, 60 72, 30 72, 60 θ θc θ θc θ θc θ θc π11 Bias S.D. CP 0.002 (0.038) 0.946 0.001 (0.038) 0.948 0.000 (0.028) 0.936 0.000 (0.028) 0.938 0.002 (0.026) 0.958 0.001 (0.026) 0.964 0.003 (0.020) 0.928 0.003 (0.020) 0.930 π21 Bias S.D. CP 0.000 (0.041) 0.936 0.000 (0.041) 0.942 0.001 (0.029) 0.934 0.000 (0.029) 0.938 0.003 (0.029) 0.950 0.002 (0.029) 0.952 0.000 (0.019) 0.954 0.000 (0.019) 0.956 p11 Bias S.D. CP 0.029 (0.030) 0.836 0.010 (0.030) 0.946 0.014 (0.021) 0.902 0.005 (0.021) 0.944 0.028 (0.023) 0.736 0.010 (0.023) 0.918 0.014 (0.014) 0.846 0.005 (0.014) 0.938 p21 Bias S.D. CP −0.015 (0.029) 0.920 −0.012 (0.029) 0.944 −0.006 −0.005 −0.006 (0.021) 0.952 −0.013 (0.022) 0.880 −0.011 (0.021) 0.942 (0.022) 0.906 (0.015) 0.922 −0.005 (0.015) 0.938 Bias S.D. CP −0.002 (0.094) 0.938 −0.007 (0.096) 0.940 −0.001 −0.002 (0.068) 0.928 0.000 (0.067) 0.942 −0.003 (0.068) 0.928 0.001 (0.046) 0.954 −0.001 (0.046) 0.954 Bias S.D. CP −0.005 (0.086) 0.950 0.004 (0.087) 0.942 −0.002 0.001 (0.056) 0.964 −0.004 (0.061) 0.934 0.003 (0.061) 0.944 −0.001 (0.056) 0.960 0.003 (0.042) 0.946 Bias S.D. CP 0.009 (0.090) 0.958 0.007 (0.090) 0.962 0.001 (0.064) 0.934 0.002 (0.064) 0.936 0.000 (0.059) 0.954 0.000 (0.059) 0.956 −0.002 (0.045) 0.950 −0.002 (0.045) 0.952 Bias S.D. CP −0.003 (0.108) 0.934 −0.003 (0.108) 0.934 −0.004 −0.004 (0.078) 0.940 (0.078) 0.944 0.006 (0.077) 0.934 0.004 (0.077) 0.940 0.005 (0.053) 0.948 0.004 (0.053) 0.950 σ11 Bias S.D. 
CP 0.130 (0.041) 0.204 0.014 (0.046) 0.932 0.104 (0.031) 0.142 0.003 (0.034) 0.936 0.135 (0.030) 0.022 0.006 (0.034) 0.926 0.116 (0.021) 0.000 0.003 (0.024) 0.932 σ21 Bias S.D. CP 0.064 (0.032) 0.642 0.006 (0.037) 0.946 0.051 (0.024) 0.546 0.000 (0.027) 0.954 0.067 (0.025) 0.332 0.002 (0.029) 0.920 0.058 (0.017) 0.146 0.002 (0.020) 0.946 φ11 φ21 ψ11 ψ21 (0.068) 0.940 (0.042) 0.948

estimates of other parameters are not negligible. The biases decrease with larger $n$ and/or $T$. The bias corrected estimates for $P$ and $\Sigma$ significantly reduce the biases of their QMLEs. Standard deviations of estimates decrease at the rate of $\sqrt{nT}$. The 95% coverage probabilities (CP) of QMLEs are also provided. The QMLEs for the $\sigma$'s and $p_{11}$ have low coverage probabilities, but the bias correction increases the 95% coverage probabilities for almost all the estimates. The coverage probabilities for bias-corrected estimates are around 95%. Theoretical standard deviations of all estimates, which are in the supplementary file, are similar to the empirical standard deviations. Table 5.4 reports numerical results of QMLEs with non-normal disturbances. The first three experiments are for case SC and the others are for the S, MC, and Explosive cases. Their biases, standard deviations and bias corrected estimates exhibit patterns similar to those with normally distributed disturbances.

2. Comparing the QMLE with other estimates. In addition to the QML estimation, the SVAR model can be estimated by IV-based approaches. Using case S as an example, we compare the QMLE with the two stage least squares estimator (2SLSE) and the three stage least squares estimator (3SLSE). When implementing those IV-based estimators, we use $(X, X_{-1}, WX, W^2X, Y_{-2}, WY_{-2}, W^2Y_{-1})$ as IVs, which are conventionally employed in the empirical literature. All the parameters for data generation are the same as those of case S, with the only exception that we let $\pi_{12} = \pi_{21} = 0$.
For single equation estimation method, if a researcher neglects the interaction among variables by introducing a misspecified single equation SDPD model, a single equation QML method would be applied. For that, we investigate the scenario that the model only contains the first (second) equation by treating Wy2,t (Wy1,t ) as if it were an exogenous contextual effect, which is labeled as ‘‘Mis-QML’’. Table 5.5 presents numerical results for comparing estimates. Overall, QMLEs with bias correction have advantages both in bias reduction and efficiency gains. QMLEs for P usually have larger biases than those of 2SLS and 3SLS, but bias corrected estimates can reduce those biases. The QMLEs for Ψ and Φ have much smaller biases than those of 2SLS or 3SLS. The QML estimation can reduce standard deviations of 2SLS/3SLS by 45% to 75% for some parameters in Π , Φ , and Ψ . There are also non-negligible efficiency gains for estimates of P. The simulation results of Mis-QMLEs for Ψ and Φ suggest inconsistency, since some of these estimates have very large biases. 3. Hypothesis testing. We investigate the test statistics on their size and power. For experiments, we use parameters in the case MC with m = 2 and m1 = 1. Under a correctly specified H0 : rank(H20 − I2 ) = 1 (case MC), the test statistic 354 K. Yang and L.-f. Lee / Journal of Econometrics 221 (2021) 337–367 Table 5.3 QML estimation for case SC. Method QML(SC) QML(SC) QML(SC) QML(SC) n, T 36, 30 36, 60 72, 30 72, 60 θ θc θ θc θ θc θ θc π11 Bias S.D. CP 0.004 (0.038) 0.942 0.002 (0.038) 0.948 0.000 (0.028) 0.948 0.000 (0.028) 0.950 0.003 (0.026) 0.958 0.002 (0.026) 0.958 0.003 (0.020) 0.918 0.003 (0.020) 0.920 π21 Bias S.D. CP 0.000 (0.039) 0.936 0.000 (0.039) 0.938 0.001 (0.027) 0.942 0.001 (0.027) 0.944 0.003 (0.027) 0.948 0.002 (0.027) 0.952 −0.001 (0.019) 0.944 −0.001 (0.019) 0.946 p11 Bias S.D. 
CP 0.039 (0.027) 0.682 0.023 (0.026) 0.882 0.018 (0.018) 0.818 0.011 (0.018) 0.906 0.037 (0.019) 0.476 0.022 (0.019) 0.782 0.018 (0.013) 0.730 0.011 (0.012) 0.868 p21 Bias S.D. CP −0.019 (0.027) 0.878 −0.019 (0.027) 0.886 −0.008 −0.008 −0.009 (0.019) 0.932 −0.017 (0.021) 0.834 −0.017 (0.019) 0.934 (0.021) 0.842 (0.014) 0.898 −0.009 (0.014) 0.900 φ11 Bias S.D. CP 0.006 (0.096) 0.934 0.003 (0.097) 0.936 0.003 (0.068) 0.944 0.002 (0.068) 0.944 0.007 (0.067) 0.934 0.005 (0.067) 0.936 0.004 (0.047) 0.942 0.003 (0.047) 0.944 φ21 Bias S.D. CP −0.009 (0.089) 0.932 −0.003 (0.089) 0.940 −0.003 −0.001 −0.002 (0.061) 0.948 −0.007 (0.062) 0.942 −0.002 (0.061) 0.946 (0.062) 0.956 (0.042) 0.944 0.000 (0.042) 0.940 Bias S.D. CP 0.010 (0.093) 0.948 0.008 (0.094) 0.954 0.002 (0.065) 0.938 0.002 (0.065) 0.940 0.000 (0.059) 0.958 0.000 (0.059) 0.962 −0.002 (0.045) 0.944 −0.002 (0.045) 0.944 Bias S.D. CP −0.004 (0.111) 0.936 −0.002 (0.111) 0.942 −0.004 −0.005 (0.080) 0.934 (0.080) 0.936 0.007 (0.077) 0.942 0.007 (0.077) 0.950 0.007 (0.054) 0.950 0.007 (0.054) 0.954 σ11 Bias S.D. CP 0.131 (0.042) 0.192 0.016 (0.047) 0.920 0.105 (0.031) 0.128 0.004 (0.034) 0.940 0.135 (0.029) 0.012 0.006 (0.033) 0.936 0.116 (0.021) 0.000 0.002 (0.024) 0.926 σ21 Bias S.D. CP 0.064 (0.031) 0.604 0.006 (0.035) 0.946 0.052 (0.022) 0.480 0.001 (0.025) 0.958 0.066 (0.024) 0.286 0.001 (0.027) 0.934 0.058 (0.017) 0.114 0.001 (0.019) 0.932 ψ11 ψ21 converges to a noncentral χ 2 (1) distribution. The results are in Table 5.6. For the model without exogenous variables, with smaller sample sizes (n, T ), a couple of exceptions occur with large empirical sizes. When the sample sizes are moderate, no matter with normally or non-normally distributed disturbances, empirical sizes are close to but a little higher than 0.05. For the model with exogenous variables, the empirical sizes are close to but a little higher than 0.05, even with smaller sample sizes. In Fig. 
1 we show the true asymptotic CDF (with $B_{m_1}$), the simulated asymptotic CDF (with $\tilde B_{m_1}$), and the empirical CDF of the test statistic when $(n, T) = (90, 120)$. The true asymptotic CDF is a noncentral $\chi^2(1)$ CDF with true parameters, while the simulated asymptotic CDF is a noncentral $\chi^2(1)$ CDF with estimated parameters. The figure shows that these three CDFs are very close, which implies that the proposed asymptotic CDFs provide a good approximation. We investigate the power under the alternative hypothesis $\operatorname{rank}(H_{20} - I_2) = 2$ (case S) while the null hypothesis is $\operatorname{rank}(H_{20} - I_2) = 1$. We study the power function by applying the parameters of case MC but slightly reducing the unit eigenvalue of $H_{20}$ by $c/\sqrt{nT}$. Therefore, the true model is in case S while we conduct the test as if it were in case MC. This setting allows the study of power against alternatives (case S) that are close to the null hypothesis (case MC); such alternatives are of empirical interest since it is important to distinguish these two cases. We let $c = 0.1, 0.2, \ldots, 0.9$ and see how the rejection rate evolves when $(n, T) = (54, 30)$, $(90, 30)$ and $(90, 60)$ in Table 5.7 and Fig. 2. We use the uniformly distributed disturbances as defined in the Monte Carlo experiments for estimation and run 1000 repetitions for each experiment. The results show that the power increases with $c$ for all designs. When $c$ is larger, which implies that the two cases are more clearly separable, the rejection rate is close to one. The graph shows that, with increasing $c$, the power of the test increases for all designs with and without $X$. Comparing $(n, T) = (54, 30)$ and $(n, T) = (90, 30)$, with bigger $n$, the rejection rates increase similarly at smaller $c$'s but eventually converge to 1 faster. Comparing $(n, T) = (90, 30)$ and $(n, T) = (90, 60)$, with bigger $T$, the rejection rates increase faster at smaller $c$'s.
Comparing (n, T ) = (54, 30) and (n, T ) = (90, 60), with bigger n and T , the rejection rates converge to 1 generally faster with increasing c’s. K. Yang and L.-f. Lee / Journal of Econometrics 221 (2021) 337–367 355 Table 5.4 QML estimation for S, MC, SC, and Explosive cases. Method QML(SC) QML(SC) QML(SC) QML(MC) QML(Stable) QML(Explosive) n, T 36, 30 36, 60 72, 30 36, 30 36, 30 36, 30 θ θc θ θc θ θc θ θc θ θc θ θc π11 Bias S.D. CP 0.006 (0.040) 0.934 0.004 (0.040) 0.954 0.002 (0.026) 0.944 0.003 (0.026) 0.958 0.002 (0.027) 0.940 0.000 (0.027) 0.956 0.003 (0.039) 0.944 0.002 (0.039) 0.960 0.003 (0.039) 0.942 0.002 (0.039) 0.962 0.003 (0.039) 0.944 0.002 (0.039) 0.960 π21 Bias S.D. CP 0.000 (0.040) 0.940 0.000 (0.040) 0.958 0.001 (0.027) 0.938 0.001 (0.027) 0.950 0.001 (0.029) 0.938 0.000 (0.029) 0.952 −0.001 0.000 (0.042) 0.962 −0.001 (0.042) 0.940 0.000 (0.042) 0.960 −0.001 (0.042) 0.940 (0.042) 0.940 0.000 (0.042) 0.962 p11 Bias S.D. CP 0.039 (0.028) 0.682 0.024 (0.028) 0.840 0.017 (0.019) 0.828 0.010 (0.018) 0.888 0.035 (0.019) 0.510 0.020 (0.019) 0.812 0.029 (0.032) 0.826 0.011 (0.032) 0.930 0.027 (0.033) 0.856 0.007 (0.033) 0.932 0.030 (0.032) 0.824 0.011 (0.032) 0.928 p21 Bias S.D. CP −0.018 −0.018 (0.029) 0.872 −0.008 (0.019) 0.896 −0.009 −0.016 −0.017 −0.015 −0.012 (0.019) 0.850 (0.031) 0.892 −0.014 (0.031) 0.900 −0.015 (0.020) 0.848 −0.012 (0.031) 0.910 −0.010 (0.019) 0.898 (0.031) 0.926 (0.031) 0.890 (0.031) 0.904 Bias S.D. CP −0.001 −0.004 (0.095) 0.958 0.001 (0.063) 0.954 0.000 (0.063) 0.958 0.002 (0.067) 0.936 0.000 (0.067) 0.938 −0.009 −0.014 (0.095) 0.948 −0.006 (0.094) 0.944 −0.013 −0.009 −0.014 (0.095) 0.942 (0.094) 0.950 (0.095) 0.948 Bias S.D. CP −0.005 0.000 (0.085) 0.952 −0.001 (0.059) 0.936 0.001 (0.059) 0.936 −0.003 (0.084) 0.946 0.002 (0.084) 0.948 −0.009 (0.085) 0.940 0.003 (0.085) 0.948 −0.007 (0.061) 0.936 0.002 (0.061) 0.952 −0.007 (0.085) 0.950 (0.084) 0.944 0.002 (0.084) 0.948 Bias S.D. 
CP 0.013 (0.085) 0.962 0.011 (0.085) 0.968 0.002 (0.060) 0.952 0.002 (0.060) 0.956 −0.001 −0.001 (0.065) 0.940 (0.065) 0.944 0.013 (0.085) 0.960 0.012 (0.085) 0.964 0.012 (0.086) 0.958 0.011 (0.086) 0.964 0.013 (0.085) 0.960 0.012 (0.085) 0.964 Bias S.D. CP −0.009 −0.008 (0.106) 0.954 0.000 (0.076) 0.952 −0.001 0.004 (0.077) 0.950 −0.010 (0.108) 0.944 −0.011 −0.012 (0.107) 0.946 −0.012 (0.107) 0.954 −0.011 (0.076) 0.954 0.004 (0.077) 0.948 −0.011 (0.106) 0.948 (0.109) 0.954 (0.107) 0.946 (0.107) 0.954 σ11 Bias S.D. CP 0.044 (0.034) 0.890 0.010 (0.035) 0.984 0.020 (0.023) 0.964 0.003 (0.024) 0.992 0.041 (0.025) 0.820 0.007 (0.026) 0.988 0.041 (0.036) 0.906 0.007 (0.037) 0.988 0.042 (0.035) 0.906 0.007 (0.036) 0.990 0.041 (0.036) 0.906 0.007 (0.037) 0.988 σ21 Bias S.D. CP 0.019 (0.032) 0.944 0.002 (0.034) 0.980 0.010 (0.023) 0.958 0.001 (0.024) 0.974 0.021 (0.022) 0.920 0.004 (0.023) 0.982 0.018 (0.035) 0.946 0.001 (0.037) 0.966 0.018 (0.035) 0.948 0.001 (0.036) 0.964 0.018 (0.035) 0.946 0.001 (0.037) 0.966 φ11 φ21 ψ11 ψ21 (0.030) 0.860 (0.095) 0.954 (0.094) 0.948 6. Empirical application: Grain market integration of Yangtze River Basin In this section we provide an empirical example on the grain market integration in the 18th century China as an application of the SVAR model. Interregional trade is considered as a key condition for economic development, especially for industrialization since it brings in spillover of information and technology. Keller and Shiue (2007) point out that analyzing the evolution of interregional trade in ancient China ‘‘can provide valuable insights on comparative economic development in China and elsewhere’’. Shiue (2002), Keller and Shiue (2007) and Yan and Liu (2011) indicate that the southern China, including Yangtze River Basin, has demonstrated a high level of market integration in grain market. Keller and Shiue (2007) use univariate SDPD models to show spatial features have shaped the expansion of interregional trade. 
Since in the mid-Qing dynasty (18th century) the most important trading goods are grains, all the above researches focus on rice prices in difference prefectures. However, other grains, especially wheat, is not negligible in studying the grain market integration. Rice was the primary food in Yangtze River Basin (Huang, 2009), for example, the percentage of rice consumption in 9 prefectures on the south of Yangtze River counted for about 93% of grain consumption in the year 1776. However, wheat still counted for about 7% of consumption in those prefectures. A SVAR model including both rice and wheat prices might take their substitution effects into account in a complete analysis of grain market. Therefore, we add an additional equation of wheat price to the model of Keller and Shiue (2007). The equation system describes spatial and intertemporal relationships between the wheat and rice prices in a prefecture, and rice and wheat prices of its neighbors in current and last periods, in addition to the own rice and wheat prices in the prefecture from last period. This model could capture spatial effects, diffusion and time lagged effects of prices, which may provide evidence in favor of or reject the grain market integration hypothesis. 6.1. Data The data come from a historical archive, ‘‘Gongzhong Liangjian Dan’’, translated as ‘‘Grain Price Lists in the Palace’’. We collect the data of wheat prices from the electronic version from the Database of Grain Prices in Qing Dynasty, maintained 356 K. Yang and L.-f. Lee / Journal of Econometrics 221 (2021) 337–367 Table 5.5 Estimation for case S. π11 π22 p11 p21 p12 p22 φ11 φ21 φ12 φ22 ψ11 ψ21 ψ12 ψ22 Method QMLE(S) 2SLS n, T 36, 30 36, 30 Bias S.D. Bias S.D. Bias S.D. Bias S.D. Bias S.D. Bias S.D. Bias S.D. Bias S.D. Bias S.D. Bias S.D. Bias S.D. Bias S.D. Bias S.D. Bias S.D. 
3SLS Mis-QMLE 36, 30 θ θc θ θ θ θc 0.002 (0.038) 0.000 (0.040) 0.020 (0.026) −0.003 (0.025) 0.002 (0.028) 0.012 (0.025) 0.000 (0.078) −0.003 (0.078) 0.002 (0.074) −0.002 (0.080) 0.009 (0.070) −0.002 (0.088) 0.001 (0.091) 0.000 (0.069) 0.001 (0.038) 0.000 (0.040) 0.003 (0.026) −0.007 (0.025) −0.003 (0.027) −0.004 (0.025) −0.002 (0.080) 0.006 (0.078) 0.002 (0.074) 0.007 (0.082) 0.008 (0.070) −0.002 (0.088) 0.000 (0.091) −0.002 (0.068) 0.007 (0.108) 0.003 (0.119) −0.003 (0.056) −0.003 (0.044) −0.004 (0.053) −0.004 (0.043) 0.020 (0.165) −0.035 (0.128) −0.008 (0.159) −0.034 (0.149) −0.101 (0.188) −0.025 (0.376) −0.054 (0.324) −0.089 (0.226) 0.007 (0.098) 0.003 (0.110) −0.002 (0.055) −0.003 (0.044) −0.008 (0.051) −0.004 (0.042) 0.035 (0.153) −0.038 (0.122) −0.028 (0.151) −0.030 (0.141) −0.136 (0.172) 0.008 (0.344) −0.015 (0.299) −0.121 (0.208) 0.017 (0.103) 0.008 (0.108) 0.050 (0.034) −0.008 (0.032) −0.005 (0.034) 0.027 (0.032) 0.104 (0.063) 0.042 (0.066) −0.015 (0.063) 0.039 (0.067) −0.005 (0.042) 0.285 (0.071) 0.108 (0.065) 0.030 (0.048) 0.016 (0.104) 0.006 (0.108) 0.011 (0.035) 0.008 (0.033) 0.010 (0.034) −0.001 (0.033) 0.115 (0.064) 0.024 (0.066) −0.031 (0.063) 0.048 (0.068) −0.001 (0.041) 0.274 (0.071) 0.106 (0.065) 0.031 (0.048) Table 5.6 ′ Empirical size of test with H0 : rank(H20 − I2 ) = 1. Normal disturbances Non-normal disturbances n, T 30 60 90 120 30 60 90 120 No X 36 54 72 90 0.144 0.057 0.062 0.051 0.093 0.059 0.057 0.061 0.087 0.04 0.057 0.049 0.069 0.067 0.057 0.059 0.145 0.065 0.057 0.06 0.087 0.058 0.035 0.055 0.084 0.066 0.054 0.053 0.072 0.056 0.052 0.052 With X 36 54 72 90 0.066 0.05 0.058 0.045 0.052 0.064 0.064 0.053 0.06 0.057 0.054 0.059 0.048 0.052 0.05 0.058 0.066 0.049 0.064 0.066 0.042 0.049 0.056 0.063 0.058 0.058 0.053 0.07 0.04 0.044 0.052 0.055 Table 5.7 ′ Power of test with H0 : rank(H20 − I2 ) = 1. 
(n, T)    X         c = 0.1   0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9
54, 30    No X      0.21      0.38   0.71   0.86   0.90   0.92   0.92   0.93   0.94
54, 30    With X    0.24      0.32   0.53   0.73   0.79   0.81   0.83   0.86   0.87
90, 30    No X      0.24      0.42   0.67   0.90   0.97   0.99   0.99   0.99   0.99
90, 30    With X    0.31      0.45   0.61   0.81   0.91   0.95   0.97   0.98   0.98
90, 60    No X      0.58      0.77   0.80   0.83   0.89   0.96   0.98   0.99   1.00
90, 60    With X    0.74      0.89   0.88   0.86   0.90   0.95   0.98   0.99   1.00

by Academia Sinica, while the data of rice prices are from Lee and Yu (2010b). The data cover 65 prefectures and 49 years, from 1742 to 1790. They are collected from the middle and lower Yangtze River Basin provinces: Anhui, Jiangsu, Zhejiang, Hubei, Hunan, and Jiangxi. We exclude the upper Yangtze River Basin since it is geographically far away from the others and transportation to that area could be very costly due to rugged terrain. The middle and lower basin is often considered the richest area in China, where interregional trade has been well developed. We use semi-annual data, collected from February and August of each year, so that there are 98 periods. There are 12% missing data, which are interpolated.29 We have one category of wheat and three categories of rice: high-quality, mid-quality and low-quality. We use the mid-quality rice prices following Shiue (2002), Keller and Shiue (2007) and Yan and Liu (2011). We have access to the maximum and minimum of rice and wheat prices, so we use their means to represent the prices. The data are summarized in Table 6.1.

29 Similarly to Keller and Shiue (2007), we use the TRAMO (Time Series Regression with ARIMA Noise, Missing Observations and Outliers) program to interpolate series for missing data (Gomez and Maravall, 1997).

Fig. 1. Asymptotic, Simulated, and Empirical CDFs of the test statistics.

Fig. 2. Local Power of Test with $H_0$: $\mathrm{rank}(H_{20}' - I_2) = 1$. Note: Lines illustrate the rejection rates of $H_1$: $\mathrm{rank}(H_{20}' - I_2) = 2$ with increasing $c$.

Table 6.1
Data summary.

Observations: 6370    Prefectures: 65    Provinces: 6    Periods (1742–1790, semiannual): 98

Variable              Mean     S.D.     Maximum   Minimum
Distance              487.8    260.5    1195      13.17
Rice price (low)      1.338    0.290    3.000     0.660
Rice price (high)     1.627    0.343    3.600     0.900
Wheat price (low)     0.976    0.208    2.110     0.340
Wheat price (high)    1.334    0.251    2.860     0.730
Rice price            1.483    0.305    3.250     0.780
Wheat price           1.168    0.220    2.634     0.590

Note: the distances are measured in kilometers (1 km = 0.6213 mi).

6.2. Model and empirical results

In Keller and Shiue (2007) and Lee and Yu (2010b), the rice price in a prefecture is specified to depend on its own time lagged price, neighbors' prices in the current period and neighbors' prices in the previous period:

$$y^{r}_{i,t} = p_{rr}\, y^{r}_{i,t-1} + \psi_{rr} \sum_{j=1}^{n} w_{i,j}\, y^{r}_{j,t} + \phi_{rr} \sum_{j=1}^{n} w_{i,j}\, y^{r}_{j,t-1} + c^{r}_{i} + d^{r}_{t} + v^{r}_{i,t},$$

where $p_{rr}$ represents the time lagged effect, $\psi_{rr}$ represents the effect of neighbors' prices (spatial effects), $\phi_{rr}$ represents the effect of neighbors' prices in the previous period (diffusion), $c^{r}_{i}$ is the individual fixed effect, $d^{r}_{t}$ is the time fixed effect, and $v^{r}_{i,t}$ is an idiosyncratic shock. The model we are going to estimate is an extension with the specification

$$(r)\quad y^{r}_{i,t} = p_{rr}\, y^{r}_{i,t-1} + \psi_{rr} \sum_{j=1}^{n} w_{i,j}\, y^{r}_{j,t} + \phi_{rr} \sum_{j=1}^{n} w_{i,j}\, y^{r}_{j,t-1} + p_{wr}\, y^{w}_{i,t-1} + \psi_{wr} \sum_{j=1}^{n} w_{i,j}\, y^{w}_{j,t} + \phi_{wr} \sum_{j=1}^{n} w_{i,j}\, y^{w}_{j,t-1} + c^{r}_{i} + d^{r}_{t} + v^{r}_{i,t},$$

$$(w)\quad y^{w}_{i,t} = p_{ww}\, y^{w}_{i,t-1} + \psi_{ww} \sum_{j=1}^{n} w_{i,j}\, y^{w}_{j,t} + \phi_{ww} \sum_{j=1}^{n} w_{i,j}\, y^{w}_{j,t-1} + p_{rw}\, y^{r}_{i,t-1} + \psi_{rw} \sum_{j=1}^{n} w_{i,j}\, y^{r}_{j,t} + \phi_{rw} \sum_{j=1}^{n} w_{i,j}\, y^{r}_{j,t-1} + c^{w}_{i} + d^{w}_{t} + v^{w}_{i,t},$$

where $p_{kl}$, $\psi_{kl}$ and $\phi_{kl}$ ($k \neq l$) represent the cross-price time lagged effects, spatial effects and diffusion of rice/wheat on wheat/rice. The idiosyncratic shocks in both equations are allowed to correlate with each other.
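The spatial lags in both equations are weighted averages of neighbors' prices. The following is a minimal Python sketch of how such a lag might be computed, assuming the exponential distance-decay, row-normalized weights used in this application. The function names, the zero diagonal, and the toy distances and prices are illustrative assumptions, not part of the paper.

```python
import math


def distance_decay_weights(dist, cm=1.4, province=None, limit_km=None):
    """Row-normalized weights proportional to exp(-cm * d_ij / 100).

    dist      : square list of pairwise distances in kilometers
    province  : optional list of labels; if given, weights are zero across provinces
    limit_km  : optional cutoff; weights are zero beyond this distance
    The zero diagonal (no self-neighbor) is an assumption of this sketch.
    """
    n = len(dist)
    raw = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if province is not None and province[i] != province[j]:
                continue
            if limit_km is not None and dist[i][j] > limit_km:
                continue
            raw[i][j] = math.exp(-cm * dist[i][j] / 100.0)
    weights = []
    for row in raw:
        total = sum(row)
        # A unit with no admissible neighbor keeps an all-zero row.
        weights.append([x / total if total > 0 else 0.0 for x in row])
    return weights


def spatial_lag(weights, y):
    """Return the vector whose i-th entry is sum_j w_ij * y_j."""
    return [sum(w * v for w, v in zip(row, y)) for row in weights]


if __name__ == "__main__":
    # Hypothetical distances (km) and prices for three prefectures.
    toy_dist = [[0, 100, 300], [100, 0, 200], [300, 200, 0]]
    toy_prices = [1.4, 1.5, 1.7]
    w = distance_decay_weights(toy_dist, cm=1.4)
    print(spatial_lag(w, toy_prices))
```

Because each row of the weights sums to one, the spatial lag of a constant price vector is the same constant, which is a quick sanity check on the normalization.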
Two types of spatial weights matrices are generated from geographic distances between prefectures and/or from whether two prefectures are in the same province. In the eighteenth century, the trade cost was highly related to geographic distance, and so was the transmission of information. Furthermore, for the sake of economic and social stability, local governments of the Qing dynasty adopted policies to keep grain prices stable, such as the “Chang-Ping barn” and the transport of grains across prefectures (Chen, 2004).30 Therefore, if two prefectures are in the same province, their prices are likely to be correlated. The first specification of $W_n$ assumes that $w_{i,j} = \exp(-c_m \times \mathrm{Distance}(i,j)/100)$, where distance is measured in kilometers and $c_m = 1.4$ or $1.2$. The second specification of $W_n$ is a block diagonal matrix with 6 blocks (provinces). The interactions between any two prefectures in a province decrease with distance such that $w_{i,j} = \exp(-c_m \times \mathrm{Distance}(i,j)/100) \times 1\{i \text{ and } j \text{ are in the same province}\}$. All spatial weights matrices are row-normalized in the system.

Table 6.2 presents QMLEs for the system with different $W_n$'s, while Table 6.3 reports estimates for single-equation SDPD models with rice and wheat prices estimated separately, as if each single equation were a univariate model similar to Keller and Shiue (2007) and Lee and Yu (2010b).

30 Chang-Ping barn: the local governments stored rice in case of a significant increase in rice prices due to disaster or other factors.

Table 6.2
QML estimation for the grain market integration in Yangtze River Basin.

W_n         Distance, cm = 1.2              Distance, cm = 1.4              Province, cm = 1.4              Province, cm = 1.2              Province, cm = 1.2
Limit       None                            None                            None                            None                            300 km
Method      QML                             QML                             QML                             QML                             QML
Estimates   θ               θc              θ               θc              θ               θc              θ               θc              θ               θc

Time effect
p_rr        0.540 (0.011)   0.556 (0.011)   0.540 (0.011)   0.555 (0.011)   0.565 (0.011)   0.578 (0.011)   0.564 (0.011)   0.578 (0.011)   0.564 (0.011)   0.578 (0.011)
p_wr        0.030 (0.012)   0.033 (0.012)   0.029 (0.012)   0.032 (0.012)   0.026 (0.012)   0.030 (0.012)   0.028 (0.012)   0.031 (0.012)   0.026 (0.012)   0.029 (0.012)
p_rw        0.064 (0.009)   0.063 (0.009)   0.062 (0.009)   0.061 (0.009)   0.093 (0.010)   0.091 (0.010)   0.095 (0.010)   0.093 (0.010)   0.093 (0.010)   0.092 (0.010)
p_ww        0.642 (0.010)   0.659 (0.010)   0.644 (0.010)   0.661 (0.010)   0.635 (0.010)   0.651 (0.010)   0.635 (0.010)   0.650 (0.010)   0.635 (0.010)   0.651 (0.010)

Diffusion effect
φ_rr        −0.418 (0.027)  −0.431 (0.027)  −0.378 (0.023)  −0.390 (0.023)  −0.180 (0.034)  −0.186 (0.034)  −0.202 (0.038)  −0.209 (0.038)  −0.176 (0.035)  −0.183 (0.035)
φ_wr        0.030 (0.033)   0.028 (0.033)   −0.002 (0.031)  −0.004 (0.031)  −0.193 (0.063)  −0.198 (0.063)  −0.285 (0.074)  −0.293 (0.074)  −0.211 (0.060)  −0.216 (0.060)
φ_rw        −0.101 (0.028)  −0.102 (0.028)  −0.065 (0.025)  −0.066 (0.025)  0.118 (0.040)   0.119 (0.040)   0.168 (0.047)   0.171 (0.047)   0.120 (0.039)   0.121 (0.039)
φ_ww        −0.589 (0.021)  −0.603 (0.021)  −0.562 (0.020)  −0.575 (0.020)  −0.420 (0.035)  −0.429 (0.035)  −0.497 (0.041)  −0.508 (0.041)  −0.437 (0.036)  −0.446 (0.036)

Spatial effect
ψ_rr        0.888 (0.031)   0.886 (0.031)   0.797 (0.023)   0.796 (0.023)   0.449 (0.031)   0.448 (0.031)   0.491 (0.035)   0.491 (0.035)   0.449 (0.031)   0.449 (0.031)
ψ_wr        −0.100 (0.045)  −0.096 (0.045)  −0.029 (0.041)  −0.026 (0.041)  0.316 (0.086)   0.319 (0.086)   0.457 (0.103)   0.462 (0.103)   0.339 (0.080)   0.341 (0.080)
ψ_rw        0.174 (0.034)   0.172 (0.034)   0.123 (0.031)   0.121 (0.031)   −0.031 (0.059)  −0.033 (0.059)  −0.101 (0.070)  −0.104 (0.070)  −0.037 (0.055)  −0.038 (0.055)
ψ_ww        0.801 (0.015)   0.802 (0.015)   0.759 (0.015)   0.760 (0.015)   0.483 (0.037)   0.484 (0.037)   0.584 (0.045)   0.586 (0.045)   0.512 (0.036)   0.513 (0.036)

σ_rr        0.009 (0.000)   0.009 (0.000)   0.009 (0.000)   0.009 (0.000)   0.008 (0.000)   0.008 (0.000)   0.008 (0.000)   0.008 (0.000)   0.008 (0.000)   0.008 (0.000)
σ_wr        0.002 (0.000)   0.002 (0.000)   0.002 (0.000)   0.002 (0.000)   0.002 (0.000)   0.002 (0.000)   0.002 (0.000)   0.002 (0.000)   0.002 (0.000)   0.002 (0.000)
σ_ww        0.006 (0.000)   0.006 (0.000)   0.006 (0.000)   0.006 (0.000)   0.006 (0.000)   0.006 (0.000)   0.006 (0.000)   0.006 (0.000)   0.006 (0.000)   0.006 (0.000)

Likelihood  3.8661                          3.8651                          4.0370                          4.0264                          4.0330

Note: “distance: cm = 1.2 or 1.4” means that we construct $w_{i,j} = \exp(-c_m \times \mathrm{Distance}(i,j)/100)$; “province: cm = 1.2 or 1.4” means that we construct $w_{i,j} = \exp(-c_m \times \mathrm{Distance}(i,j)/100) \times 1\{i \text{ and } j \text{ are in the same province}\}$. All of the above spatial weights matrices are row-normalized in regression. “Limit” means that if the distance between $i$ and $j$ exceeds this limit, $w_{i,j} = 0$. Thus, “none” in the row of “limit” indicates that there is no limit for $w_{i,j}$.

The likelihood information criterion suggests that the block spatial weights matrix $w_{i,j} = \exp(-c_m \times \mathrm{Distance}(i,j)/100) \times 1\{i \text{ and } j \text{ are in the same province}\}$ with $c_m = 1.4$ performs the best, and our analysis in the following is based on this specification.

We focus on three issues in the following: which case the model exhibits according to the proposed test method; whether the cross-price effects are statistically significant; and whether own-price effects in the SVAR model differ from those in the SDPD models with a univariate dependent variable. With block spatial weights matrices, and $n_1/n > 0$ for a large sample, we are able to run a diagnostic test for the model to check which case it belongs to. The model exhibits “case S” for all the specifications.
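As a rough illustration of what “case S” means numerically, the sketch below computes the eigenvalues of the $2 \times 2$ matrix $H_m = S_m^{-1}(P_m + \Phi_m)'$ with $S_m = I_m - \Psi_m'$ (the definitions listed in Appendix A) from the bias-corrected estimates for the province-block $W_n$ with $c_m = 1.4$ as read from Table 6.2. The layout of the coefficient matrices (entry $[k][l]$ holds the effect of variable $k$ on variable $l$, ordered rice then wheat) and the helper names are assumptions of this sketch. With rounded table entries the moduli come out at roughly 0.87 and 0.48, both inside the unit circle.

```python
import cmath

# Bias-corrected estimates, province-block W_n with c_m = 1.4 (Table 6.2).
# Assumed layout: M[k][l] is the effect of variable k on variable l,
# with index 0 = rice and index 1 = wheat.
P = [[0.578, 0.091], [0.030, 0.651]]       # time lagged effects
PHI = [[-0.186, 0.119], [-0.198, -0.429]]  # diffusion effects
PSI = [[0.448, -0.033], [0.319, 0.484]]    # spatial effects


def h_matrix(p, phi, psi):
    """Return H = (I - Psi')^{-1} (P + Phi)' for 2x2 inputs."""
    a = [[p[k][l] + phi[k][l] for l in range(2)] for k in range(2)]
    a_t = [[a[l][k] for l in range(2)] for k in range(2)]  # (P + Phi)'
    # S = I - Psi'
    s = [[(1.0 if k == l else 0.0) - psi[l][k] for l in range(2)] for k in range(2)]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    s_inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    return [[sum(s_inv[i][k] * a_t[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def eigenvalues_2x2(h):
    """Eigenvalues of a 2x2 matrix via the characteristic polynomial."""
    trace = h[0][0] + h[1][1]
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0


if __name__ == "__main__":
    moduli = sorted(abs(z) for z in eigenvalues_2x2(h_matrix(P, PHI, PSI)))
    print(moduli, "stable" if moduli[-1] < 1.0 else "not stable")
```

This is only a plug-in check on point estimates; the formal classification in the paper relies on the sequential likelihood ratio test.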
For the specification of $W_n$ with $c_m = 1.4$, $H_0$: {the model exhibits spatial cointegration (case SC)} is rejected with LR(0, 2) = 133.01 (critical value at 5% significance level is 9.37); $H_0$: {the model exhibits mixed cointegration (case MC) with cointegration rank 1} is rejected with LR(1, 2) = 85.17 (critical value at 5% significance level is 3.96). Therefore, we conclude that the data generating process exhibits a stable case. In fact, the estimated eigenvalues of the matrix $H_{20}$ are 0.8709 and 0.4769, which also implies that the data generating process is stable. For another specification with the spatial weights matrix being block diagonal but $c_m = 1.2$, LR(0, 2) = 132.80 and LR(1, 2) = 85.45, while the critical values at 5% significance level are respectively 9.47 and 3.72, with the estimated eigenvalues of $H_{20}$ being 0.9531 and 0.4925. For the last specification of block spatial weights matrix with the spatial interaction being restricted within 300 km, LR(0, 2) = 132.69 and LR(1, 2) = 85.30, while the critical values at 5% significance level are respectively 9.63 and 3.80, with the estimated eigenvalues of $H_{20}$ being 0.8830 and 0.4865.

The estimates for own-price spatial effects, temporal effects, and diffusion effects are similar in sign to those in the existing literature, while the cross-price effects turn out to be nonnegligible. For the specification with block diagonal spatial weights matrix and $c_m = 1.4$ in Table 6.2, all three effects (spatial effect, temporal effect, and diffusion effect) of wheat prices on neighbors' rice prices are significant. The temporal effect and diffusion effect of rice prices on neighbors' wheat prices are also statistically significant. The inclusion of cross-price effects also dampens the magnitude of own-price effects relative to the univariate SDPD models. For this specification of the SVAR model, the own-price spatial effect is $\hat\psi_{rr} = 0.448$ after bias correction, while for the SDPD models in Table 6.3, $\hat\psi_{rr} = 0.625$ with the same specification of $W_n$, which might suggest that the SDPD model overestimates the own-price spatial effects because it omits cross-price spatial effects. There are similar patterns for estimates of own-price temporal effects and diffusion effects.

Table 6.3
Estimation for the grain market integration in Yangtze River Basin.

W_n         Distance: c = 1.2               Province: c = 1.4               Province: c = 1.2
Limit       None                            None                            None
Method      SDPD                            SDPD                            SDPD
Estimates   θ               θc              θ               θc              θ               θc

Time effect
p_rr        0.550 (0.011)   0.566 (0.011)   0.569 (0.011)   0.585 (0.011)   0.571 (0.011)   0.587 (0.011)
p_ww        0.671 (0.009)   0.688 (0.009)   0.679 (0.009)   0.696 (0.009)   0.679 (0.009)   0.696 (0.009)

Diffusion effect
φ_rr        −0.401 (0.022)  −0.413 (0.022)  −0.324 (0.016)  −0.334 (0.016)  −0.335 (0.016)  −0.345 (0.016)
φ_ww        −0.599 (0.019)  −0.613 (0.019)  −0.464 (0.015)  −0.475 (0.015)  −0.473 (0.015)  −0.485 (0.015)

Spatial effect
ψ_rr        0.835 (0.014)   0.836 (0.014)   0.625 (0.010)   0.625 (0.010)   0.639 (0.011)   0.639 (0.011)
ψ_ww        0.860 (0.013)   0.860 (0.013)   0.646 (0.010)   0.646 (0.010)   0.660 (0.010)   0.660 (0.010)

σ_rr        0.009 (0.000)   0.009 (0.000)   0.009 (0.000)   0.009 (0.000)   0.009 (0.000)   0.009 (0.000)
σ_ww        0.006 (0.000)   0.006 (0.000)   0.006 (0.000)   0.006 (0.000)   0.006 (0.000)   0.006 (0.000)

The model can also be used to illustrate how shocks to one variable transmit to other variables. We adopt the definition of Koop et al. (1996) to derive the generalized impulse response function. The impulse response function shows that a unit shock to rice or wheat prices in any spatial unit propagates across both temporal and spatial dimensions. Additional analyses for impulse response functions are available in the supplementary file.

7. Concluding remarks

This paper investigates a dynamic panel SVAR model, in which behaviors of spatial units depend not only on their own temporal lags, but also respond to their neighbors' or peers' behaviors in the current period (spatial lags) and to those in the previous period (space-time lags; diffusion). The disturbances in the model are specified with time fixed effects, individual fixed effects and idiosyncratic disturbances. We mainly study three issues: features of dynamics and spatial interactions that an SVAR model can generate; the identification and estimation of parameters with simultaneity and unknown cointegration relationships; and detection of the case to which the true model belongs. For the first issue, we categorize the model into 6 cases by the division of parameter spaces, which are stable, pure spatial cointegration, pure variable cointegration, mixed cointegration, pure unit roots, and explosive time series. In identification and estimation, we use IVs and structures of disturbances to identify parameters, while a QML estimator can be consistent without knowing cointegration relationships. To detect which case the true model belongs to, we introduce a hypothesis testing method, which can distinguish cases S, SC and MC with cointegration ranks. Monte Carlo experiments demonstrate the advantage of QML estimators over other estimators in bias reduction and efficiency gains. The robustness of the estimators and test statistics is demonstrated in the simulation results. The model is applied to study possible grain market integration using a unique historical dataset of grain prices. Previous research considers solely rice prices, while we add a multivariate feature since wheat is believed to be a substitute for rice. The empirical result shows that rice and wheat prices are spatially correlated with each other across prefectures.
In future work, the identification and estimation of the SVAR model under cases VC and PU should be investigated, as in these cases the spatial interdependence of variables no longer generates stable linear combinations among variables. For example, for case VC, the vector error correction model suggests that the long-run equilibrium is among variables in the same spatial unit while variables are spatially correlated in the short run. The LR test procedure in this paper can also be regarded as a pre-test. It is of interest to explore estimation methods that use the cointegration rank determined by this test. Furthermore, asymptotic distributions of QML estimators and test statistics rely on the time trend generated by individual fixed effects and exogenous regressors, as a time trend would dominate stochastic trends. For the model with neither exogenous regressors nor individual effects, the asymptotic analysis on stochastic trends would be different, but that situation is beyond the scope of this paper.

Acknowledgments

The authors are grateful for valuable comments and suggestions from a co-editor, an associate editor, and two anonymous referees of this journal, audiences at the 2017 Asian Meeting of the Econometric Society and the 2017 China Meeting of the Econometric Society, and seminar participants at the Ohio State University, Tsinghua University, and Jinan University. Kai Yang's research is supported by NSFC (Grant No. 71703090) and the “Chenguang Program” (No. 19CG43) supported by Shanghai Education Development Foundation and Shanghai Municipal Education Commission.

Appendix A. Selected notations from the main text
1.
Parameters and their combinations

Symbol                          Explanation
$P_m$                           Time lagged effects
$\Psi_m$                        Spatial lagged effects
$\Phi_m$                        Diffusion (space–time lagged effects)
$C_{nm}$                        $n \times m$ fixed effects for spatial units
$D_{m,t}$                       $D_{m,t} = \alpha_{m,t}' \otimes l_n$, where $\alpha_{m,t}$ are time fixed effects
$\Pi_{km}$                      $k \times m$ coefficient matrix for independent variables
$\Sigma_{vm}$                   Covariance matrix for each row of $V_{nm,t}$
$S_m$                           $= I_m - \Psi_m'$
$S_{nm}$                        $= I_{nm} - W_n \otimes \Psi_m'$
$H_m$                           $= S_m^{-1}(P_m + \Phi_m)' = \Upsilon_m'^{-1} \Lambda_m \Upsilon_m'$
$H_{nm}$                        $= S_{nm}^{-1}(I_n \otimes P_m' + W_n \otimes \Phi_m')$
$\theta$                        $= [\mathrm{vec}(\Pi_{km}')', \mathrm{vec}(P_m')', \mathrm{vec}(\Phi_m')', \mathrm{vec}(\Psi_m')', \mathrm{vec}^*(\Sigma_{vm})']'$
$\Gamma_n$                      $W_n = \Gamma_n \bar\omega_n \Gamma_n^{-1}$
$\Gamma_{n,n_1}$                The first $n_1$ columns of eigenvectors of $W_n$ corresponding to unit eigenvalues
$\Gamma_{n,n_0}$                $n \times n_0$ submatrix of $\Gamma_{n,n_1} = [\Gamma_{n,n_0}, \frac{1}{\sqrt{n}} l_n]$, where $l_n$ is an $n \times 1$ vector of ones and $n_0 = n_1 - 1$
$\Gamma^*_{n_1,n}$              The first $n_1$ rows of $\Gamma_n^{-1}$
$\Gamma^*_{n_0,n}$              The first $n_0$ rows of $\Gamma_n^{-1}$
$\alpha_m, \beta_m$             $H_m - I_m = \alpha_m \beta_m'$, where both matrices are $m \times (m - m_1)$ with full column rank
$\alpha_{\perp m}, \beta_{\perp m}$   $m \times m_1$ matrices with full column rank and $\beta_m' \beta_{\perp m} = \alpha_m' \alpha_{\perp m} = 0$

2. Variables

Symbol                          Explanation
$Y_{nm,t}$                      $n \times m$ dependent variable matrix at time $t$
$X_{nk,t}$                      $n \times k$ independent variable matrix at time $t$
$V_{nm,t}$                      $n \times m$ disturbances at time $t$
$W_n$                           Spatial weights matrix
$y_{nm,t}$                      $= \mathrm{vec}(Y_{nm,t}')$ (similar for $v_{nm,t}$ and $c_{nm}$)
$\tilde Y_{nm,t}$               $= Y_{nm,t} - \bar Y_{nm,T}$ where $\bar Y_{nm,T} = \frac{1}{T}\sum_{t=1}^{T} Y_{nm,t}$, similarly for $\tilde y_{nm,t}$
$\tilde{\tilde Y}_{nm,t}$       $= Y_{nm,t} - \bar{\bar Y}_{nm,T}$ where $\bar{\bar Y}_{nm,T} = \frac{1}{T}\sum_{t=0}^{T-1} Y_{nm,t}$, similarly for $\tilde{\tilde y}_{nm,t-1}$
$Y^{\circ}_{n_1 m,t}$           $= \Gamma^*_{n_1,n} Y_{nm,t}$

3. Simplified expression

Symbol                          Explanation
$A_{nm0}$                       $= I_n \otimes (P_{m0} - I_m)' + W_n \otimes (\Psi_{m0} + \Phi_{m0})'$
$B_{1nm}$                       Generic $nm \times nm$ matrices; bounded row and column sum norms; similar for $B_{2nm}$, $G_{nm,c}$
$G_{nm,d}$                      Generic $nm \times nm$ matrix such that $\sum_{h=1}^{+\infty} \mathrm{abs}(G^h_{nm,d})$ is bounded in row and column sum norms
$U_{nm,t}$                      $= \sum_{h=1}^{+\infty} G_{nm,c} G^h_{nm,d} v_{nm,t-h+1}$
$\Omega_{\theta_0,nT}$          Information matrix with $\frac{1}{(n-n_1)T} \frac{\partial^2 \ln L_{nT,m}(\theta_0)}{\partial\theta\,\partial\theta'} + \Omega_{\theta_0,nT} = o_p(1)$
$\Xi_{\theta_0,nT}$             Additional term for $\mathrm{Cov}\left(\frac{1}{\sqrt{(n-n_1)T}} \frac{\partial \ln L^{1}_{nT,m0}}{\partial\theta}\right) = \Omega_{\theta_0,nT} + \Xi_{\theta_0,nT} + O\left(\frac{1}{T}\right)$, due to non-normality
$\Delta_{R,nT}$                 Bias term of $\frac{1}{\sqrt{(n-n_1)T}} \frac{\partial \ln L^{R}_{nT,m0}}{\partial\theta} = \sqrt{\frac{n-n_1}{T}}\, \Delta_{R,nT} + O\left(\sqrt{\frac{n}{T^3}}\right)$
$\Delta_{\theta_0,nT}$          $= \Omega^{-1}_{\theta_0,nT} \Delta_{R,nT}$
$Q_{nm,t}$                      $= (X_{nk,t} \otimes I_m)\mathrm{vec}(\Pi_{km}') + c_{nm}$
$\tilde a_{j,t}$                $= (W_n \otimes e_{j,m}') S^{-1}_{nm0} [(I_n \otimes P_{m0}' + W_n \otimes \Phi_{m0}') \tilde{\tilde y}_{nm,t-1} + (\tilde X_{nk,t} \otimes I_m)\mathrm{vec}(\Pi_{km0}')]$, where $e_{j,m}$ is an $m$-dimensional vector with a unit for the $j$th entry but zero for all other entries
$\tilde A_{z,t}$                $= [\tilde a_{1,t}, \tilde a_{2,t}, \ldots, \tilde a_{m,t}]$
$P^{A}_{zz,t}$                  $= \tilde A_{z,t} - \tilde Z_{nm,t} \left[\frac{1}{(n-n_1)T} \sum_{t=1}^{T} \tilde Z_{nm,t}' J_n \tilde Z_{nm,t}\right]^{-1} \left[\frac{1}{(n-n_1)T} \sum_{t=1}^{T} \tilde Z_{nm,t}' J_n \tilde A_{z,t}\right]$

Note: further expressions and transformations of variables appear in Section 4 on testing; they are defined step by step and are used only within that section.

Appendix B. Lemmas

The reduced form for the stable SVAR model is

$$y_{nm,t} = \sum_{h=0}^{+\infty} H^{h}_{nm0} S^{-1}_{nm0} \left[(X_{nk,t-h} \otimes I_m)\mathrm{vec}(\Pi_{km0}') + c_{nm0} + l_n \otimes \alpha_{m0,t-h} + v_{nm,t-h}\right],$$

where, as we recall, $H_{nm0} = S^{-1}_{nm0}(I_n \otimes P_{m0}' + W_n \otimes \Phi_{m0}')$. Denote $E(v_{nm,t} v_{nm,t}') = I_n \otimes \Sigma_{vm0} \equiv \Sigma_{v,nm}$. The statistic $\sum_{h=0}^{+\infty} H^{h}_{nm} S^{-1}_{nm} v_{nm,t-h}$ is crucial in our asymptotic analysis, where $H_{nm}$ and $S_{nm}$ are evaluated at any value of $\theta$ in its compact parameter space. For the estimation method proposed in the main text, we use the stable subsystem $y^{(s)}_{nm,t}$, which has properties similar to those of the stable SVAR, so the following results also apply.

Here we provide some basic lemmas on moments for relevant statistics. Define $U_{nm,t} = \sum_{h=1}^{+\infty} G_{nm,h} v_{nm,t+1-h}$ and $W_{nm,t} = \sum_{h=1}^{+\infty} H_{nm,h} v_{nm,t+1-h}$, where the matrices $G_{nm,h}$ and $H_{nm,h}$ are $nm \times nm$ generic matrices with regularity conditions in Assumption B.1. From the defined $U_{nm,t}$, $t = 1, \ldots, T$, the corresponding sample mean over time $\bar U_{nm,T} = \frac{1}{T}\sum_{t=1}^{T} U_{nm,t}$ has the expression

$$\bar U_{nm,T} = \sum_{h=1}^{+\infty} \ddot G_{nm,h} v_{nm,T+1-h}, \qquad \bar W_{nm,T} = \sum_{h=1}^{+\infty} \ddot H_{nm,h} v_{nm,T+1-h},$$

where $\ddot G_{nm,h} = \frac{1}{T}\sum_{g=1}^{h} G_{nm,g}$ for $h \le T$, and $\ddot G_{nm,h} = \frac{1}{T}\sum_{g=1}^{T} G_{nm,h-T+g}$ for $h > T$. (Similarly for $\ddot H_{nm,h}$.)

Lemma B.1.
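Under Assumption 3.1(i), when $t \ge s$,

$$E(U_{nm,t} W_{nm,s}') = \sum_{h=1}^{+\infty} G_{nm,t-s+h} \Sigma_{v,nm} H_{nm,h}', \qquad E(U_{nm,t}' W_{nm,s}) = \sum_{h=1}^{+\infty} \mathrm{Tr}[G_{nm,t-s+h}' H_{nm,h} \Sigma_{v,nm}].$$

Lemma B.2. Let $C_{nm,ts,gh} = E[(v_{nm,t}' B_{1nm} v_{nm,s})(v_{nm,g}' B_{2nm} v_{nm,h})]$. Under Assumption 3.1(i),

(1) for all $t$,
$$C_{nm,tt,tt} = \mathrm{Tr}[B_{1nm} \Sigma_{v,nm} (B_{2nm} + B_{2nm}') \Sigma_{v,nm}] + \mathrm{Tr}(B_{1nm} \Sigma_{v,nm})\, \mathrm{Tr}(B_{2nm} \Sigma_{v,nm}) + \sum_{k=1}^{m}\sum_{l=1}^{m}\sum_{p=1}^{m}\sum_{q=1}^{m} \left(u_{klpq} - \sigma^2_{kp}\sigma^2_{lq} - \sigma^2_{kq}\sigma^2_{lp} - \sigma^2_{kl}\sigma^2_{pq}\right) \sum_{i=1}^{n} [B_{1nm}]^{kl}_{ii} [B_{2nm}]^{pq}_{ii};$$

(2) for all $t \neq s$, $C_{nm,tt,ss} = \mathrm{Tr}(B_{1nm} \Sigma_{v,nm})\, \mathrm{Tr}(B_{2nm} \Sigma_{v,nm})$; $C_{nm,ts,ts} = \mathrm{Tr}[B_{1nm} \Sigma_{v,nm} B_{2nm}' \Sigma_{v,nm}]$; and $C_{nm,ts,st} = \mathrm{Tr}[B_{1nm} \Sigma_{v,nm} B_{2nm} \Sigma_{v,nm}]$;

(3) except for (1) and (2), $C_{nm,ts,gh} = 0$;

where $[A_{nm}]^{pq}_{ii}$ is the $ii$th (diagonal) entry of the $pq$th $n \times n$ block of a generic matrix $A_{nm}$, $u_{klpq} = E(v_{n,t,ik} v_{n,t,il} v_{n,t,ip} v_{n,t,iq})$ and $\sigma^2_{kq} = E(v_{n,t,ik} v_{n,t,iq})$ for any $t$ and $i$, with $k, l, p, q = 1, \ldots, m$.

Lemma B.3. Under Assumption 3.1(i),

$$\mathrm{cov}(U_{nm,t}' W_{nm,t},\, U_{nm,s}' W_{nm,s}) = \mathrm{Tr}\left[\sum_{g=1}^{+\infty}\sum_{h=1}^{+\infty} H_{nm,t-s+g} \Sigma_{v,nm} H_{nm,g}' G_{nm,h} \Sigma_{v,nm} G_{nm,t-s+h}'\right] + \mathrm{Tr}\left[\sum_{g=1}^{+\infty}\sum_{h=1}^{+\infty} H_{nm,t-s+g} \Sigma_{v,nm} G_{nm,g}' H_{nm,h} \Sigma_{v,nm} G_{nm,t-s+h}'\right] + \sum_{k=1}^{m}\sum_{l=1}^{m}\sum_{p=1}^{m}\sum_{q=1}^{m} \left(u_{klpq} - \sigma^2_{kl}\sigma^2_{pq} - \sigma^2_{kq}\sigma^2_{lp} - \sigma^2_{kp}\sigma^2_{lq}\right) \sum_{g=1}^{+\infty}\sum_{i=1}^{n} \left[G_{nm,t-s+g}' H_{nm,t-s+g}\right]^{kl}_{ii} \left[G_{nm,g}' H_{nm,g}\right]^{pq}_{ii}.$$

Assumption B.1. $U_{nm,t} = \sum_{h=1}^{+\infty} G_{nm,h} v_{nm,t+1-h}$ with $G_{nm,h} = G_{nm,c} G^{h}_{nm,d}$ and $W_{nm,t} = \sum_{h=1}^{+\infty} H_{nm,h} v_{nm,t+1-h}$ with $H_{nm,h} = H_{nm,c} H^{h}_{nm,d}$, where $G_{nm,c}$, $H_{nm,c}$, $\sum_{h=1}^{+\infty} \mathrm{abs}(G^{h}_{nm,d})$ and $\sum_{h=1}^{+\infty} \mathrm{abs}(H^{h}_{nm,d})$ are bounded in row and column sum norms.

Lemma B.4. Under Assumptions 3.1(i) and B.1, $\mathrm{var}\left(\sum_{t=1}^{T} U_{nm,t}' W_{nm,t}\right) = O(nT)$.

Lemma B.5. Under Assumptions 3.1(i) and B.1,

(1) $E\left[\frac{1}{nT}\sum_{t=1}^{T} U_{nm,t}' W_{nm,t}\right] = \frac{1}{n}\sum_{h=1}^{+\infty} \mathrm{Tr}[G_{nm,h}' H_{nm,h} \Sigma_{v,nm}] = O(1)$; $\frac{1}{nT}\sum_{t=1}^{T} U_{nm,t}' W_{nm,t} - E\left[\frac{1}{nT}\sum_{t=1}^{T} U_{nm,t}' W_{nm,t}\right] = O_p\left(\frac{1}{\sqrt{nT}}\right)$.

(2) $E\left[\frac{1}{n} \bar U_{nm,T}' \bar W_{nm,T}\right] = \frac{1}{n}\sum_{h=1}^{+\infty} \mathrm{Tr}[\ddot G_{nm,h}' \ddot H_{nm,h} \Sigma_{v,nm}] = O\left(\frac{1}{T}\right)$; $\frac{1}{n} \bar U_{nm,T}' \bar W_{nm,T} - E\left[\frac{1}{n} \bar U_{nm,T}' \bar W_{nm,T}\right] = O_p\left(\frac{1}{\sqrt{nT^2}}\right)$.

(3) $\frac{1}{nT}\sum_{t=1}^{T} \tilde U_{nm,t}' \tilde W_{nm,t} - E\left[\frac{1}{nT}\sum_{t=1}^{T} \tilde U_{nm,t}' \tilde W_{nm,t}\right] = O_p\left(\frac{1}{\sqrt{nT}}\right)$.

Lemma B.6. Suppose $D_{nm,t}$ is an $nm \times 1$ nonstochastic matrix with uniformly bounded elements. Under Assumptions 3.1(i) and B.1, $\frac{1}{nT}\sum_{t=1}^{T} \tilde D_{nm,t}' \tilde U_{nm,t} = \frac{1}{nT}\sum_{t=1}^{T} \tilde D_{nm,t}' U_{nm,t} = O_p\left(\frac{1}{\sqrt{nT}}\right)$, and $\frac{1}{nT}\sum_{t=1}^{T} \bar D_{nm,T}' \bar U_{nm,T} = O_p\left(\frac{1}{\sqrt{nT}}\right)$.

As the nature of dynamic panel model results in time lags, we define $\bar{\bar U}_{nm,T} = \frac{1}{T}\sum_{t=0}^{T-1} U_{nm,t}$ and $\tilde{\tilde U}_{nm,t} = U_{nm,t} - \bar{\bar U}_{nm,T}$.

Lemma B.7. Under Assumptions 3.1(i) and B.1,

$$E(\bar{\bar U}_{nm,T}' \bar v_{nm,T}) = \frac{1}{T}\, \mathrm{Tr}\left(\Sigma_{v,nm} G_{nm,c} \sum_{g=1}^{+\infty} G^{g}_{nm,d}\right) + O\left(\frac{n}{T^2}\right),$$

so $E(\bar{\bar U}_{nm,T}' \bar v_{nm,T}) = O\left(\frac{n}{T}\right)$. And $\bar{\bar U}_{nm,T}' \bar v_{nm,T} - E(\bar{\bar U}_{nm,T}' \bar v_{nm,T}) = O_p\left(\sqrt{\frac{n}{T^2}}\right)$.

The subsequent Lemmas B.8 and B.9 are implied results by the above basic lemmas. They are used to establish the uniform convergence of the sample average concentrated log likelihood function to its expectation.

Lemma B.8. Under Assumptions 2.1(i)–(ii), and 3.1, suppose a generic matrix $B_{1nm}$ is uniformly bounded in both row and column sum norms, then

$$\frac{1}{nT}\sum_{t=1}^{T} [(J^*_n \otimes I_m)\tilde{\tilde y}_{nm,t-1}]' B_{1nm} \tilde v_{nm,t} - E\left[\frac{1}{nT}\sum_{t=1}^{T} [(J^*_n \otimes I_m)\tilde{\tilde y}_{nm,t-1}]' B_{1nm} \tilde v_{nm,t}\right] = O_p\left(\frac{1}{\sqrt{nT}}\right);$$

and

$$\frac{1}{nT}\sum_{t=1}^{T} [(J^*_n \otimes I_m)\tilde{\tilde y}_{nm,t-1}]' B_{1nm} [(J^*_n \otimes I_m)\tilde{\tilde y}_{nm,t-1}] - E\left[\frac{1}{nT}\sum_{t=1}^{T} [(J^*_n \otimes I_m)\tilde{\tilde y}_{nm,t-1}]' B_{1nm} [(J^*_n \otimes I_m)\tilde{\tilde y}_{nm,t-1}]\right] = O_p\left(\frac{1}{\sqrt{nT}}\right).$$

Furthermore,

$$E\left\{\frac{1}{nT}\sum_{t=1}^{T} [(J^*_n \otimes I_m)\tilde{\tilde y}_{nm,t-1}]' B_{1nm} \tilde v_{nm,t}\right\} = O\left(\frac{1}{T}\right),$$

and

$$E\left\{\frac{1}{nT}\sum_{t=1}^{T} [(J^*_n \otimes I_m)\tilde{\tilde y}_{nm,t-1}]' B_{1nm} [(J^*_n \otimes I_m)\tilde{\tilde y}_{nm,t-1}]\right\} = O(1).$$

Lemma B.9.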
Under Assumptions 2.1(i)–(ii), and 3.1, if a generic matrix $B_{1nm}$ is bounded in both row and column sum norms, then

(1) $E\left[\frac{1}{nT}\sum_{t=1}^{T} v_{nm,t}' B_{1nm} v_{nm,t}\right] = \frac{1}{n}\mathrm{Tr}[B_{1nm}(I_n \otimes \Sigma_{vm0})] = O(1)$; $\frac{1}{nT}\sum_{t=1}^{T} v_{nm,t}' B_{1nm} v_{nm,t} - E\left[\frac{1}{nT}\sum_{t=1}^{T} v_{nm,t}' B_{1nm} v_{nm,t}\right] = O_p\left(\frac{1}{\sqrt{nT}}\right)$.

(2) $E\left[\frac{1}{n} \bar v_{nm,T}' B_{1nm} \bar v_{nm,T}\right] = \frac{1}{nT}\mathrm{Tr}[B_{1nm}(I_n \otimes \Sigma_{vm0})] = O\left(\frac{1}{T}\right)$; $\frac{1}{n} \bar v_{nm,T}' B_{1nm} \bar v_{nm,T} - E\left[\frac{1}{n} \bar v_{nm,T}' B_{1nm} \bar v_{nm,T}\right] = O_p\left(\frac{1}{\sqrt{nT^2}}\right)$.

(3) $\frac{1}{nT}\sum_{t=1}^{T} \tilde v_{nm,t}' B_{1nm} \tilde v_{nm,t} - E\left[\frac{1}{nT}\sum_{t=1}^{T} \tilde v_{nm,t}' B_{1nm} \tilde v_{nm,t}\right] = O_p\left(\frac{1}{\sqrt{nT}}\right)$.

Lemma B.10. Under Assumptions 3.1(i) and B.1, the fourth moment of any entry $u_{nk,t-1,i}$ of $U_{nm,t-1}$ is uniformly bounded, i.e., $E|u_{nk,t-1,i}|^4 = O(1)$ uniformly in $k$ and $i$.

Appendix C. Technique Details for Section 4

The transformed terms $V^{\circ}_{n_0 m,t}$ in

$$Y^{\circ}_{n_0 m,t} = Y^{\circ}_{n_0 m,t-1} H_{m0}' + X^{\circ}_{n_0,t} \Pi_{km0} + C^{\circ}_{n_0 m} + V^{\circ}_{n_0 m,t},$$

or

$$\Delta Y^{\circ\prime}_{n_0 m,t} = \alpha_{m0} \beta_{m0}' Y^{\circ\prime}_{n_0 m,t-1} + \Pi_{km0}' X^{\circ\prime}_{n_0,t} + C^{\circ\prime}_{n_0 m} + V^{\circ\prime}_{n_0 m,t},$$

contain rows with the same zero expectations and $\mathrm{cov}(v^{\circ}_{n_0,t}) = I_{n_0} \otimes \Sigma_{um0}$. For simplicity, we let $X^{\circ}_{n_0,t} = X^{u}_{n_0,t}$ without loss of generality and introduce simple notations

$$Z_t = H_{m0} Z_{t-1} + C_0 + \pi_0 X_t + \epsilon_t, \quad \text{or} \quad \Delta Z_t = \alpha_{m0} \beta_{m0}' Z_{t-1} + C_0 + \pi_0 X_t + \epsilon_t,$$

where $Z_t$, $C_0$, and $\epsilon_t$ are $m \times n_0$ matrices, which equal respectively $Y^{\circ\prime}_{n_0 m,t}$, $C^{\circ\prime}_{n_0 m}$, and $V^{\circ\prime}_{n_0 m,t}$; $X_t = X^{\circ\prime}_{n_0,t}$ and $\pi_0 = \Pi_{km0}'$. Also, without $X^{c}_{n_0,t}$, $\tau^{x}_{n_0} = \tau_{n_0}$. For each $t$, the columns of $\epsilon_t$ are $m \times 1$ vectors with covariance matrix $\Sigma_\epsilon$. The $S_{ij}$ for $i, j = 0, 1$, the likelihood function, and the likelihood ratio test statistic are similarly defined in the main context. Following Johansen (1991)’s ∑t π0 Xs + procedure, we have that ∆Zt = B(L)(π0 Xt + C0 + ϵt ) = B(L)π0 Xt + τn0 + B(L)ϵt and Zt = Z0 + ( s=1 B(L)) ′ αm0 −1 ϵs + τn0 t + St − S0 , where St = B1 (L)ϵt , B1 (L) = B(L)1−−B(1) (L) , and , B(L) = ( β (1 − L) , β )A m0 ⊥ m0 ′ L α⊥m0 ) ( ′ ′ ′ ′ −αm0 αm0 βm0 βm0 + αm0 χ (L)βm0 (1 − L) αm0 χ (L)β⊥m0 . B is the value of B(z) at z = 1.
This means that A(L) = ′ ′ α⊥ α⊥ m0 χ (L)βm0 (1 − L) m0 χ (L)β⊥m0 B ∑t s=1 each column of Zt is composed of a time trend (with exogenous variable Xt ), a unit root process, and stationary processes, ′ ′ ′ Stx where Stx = B1 (L)π0 Xt , is also St + βm0 Zt = βm0 while each column of ∆Zt is stationary. Moreover, each column of βm0 stationary, in addition to exogenous variables. In order to derive the distribution of likelihood ratio test statistics, we need the following lemmas. Note that both −1 Assumptions 4.1 and 4.2 imply that n0 is increasing in T . Here, we define Σjl = Σjl+ − Σjx Σxx Σxl , where j, l = 0, 1, and + Σ00 = lim T →∞ + ′ βm0 Σ10 = lim T →∞ + Σ01 βm0 = lim T →∞ + ′ βm0 Σ11 βm0 = lim T →∞ Σxx = lim T →∞ Σ0x = lim T →∞ Σx1 βm0 = lim T →∞ T 1 ∑ n0 T t =1 T 1 ∑ n0 T ∆Z̃t X̃t′ , t =1 T 1 ∑ n0 T X̃t X̃t′ , t =1 T 1 ∑ n0 T ˜ ′ ˜ βm0 Z̃t −1 Z̃t′−1 βm0 , t =1 T 1 ∑ n0 T ∆Z̃t Z̃˜t′−1 βm0 , t =1 T 1 ∑ n0 T ′ ˜ βm0 Z̃t −1 ∆Z̃t′ , t =1 T 1 ∑ n0 T ∆Z̃t ∆Z̃t′ , t =1 ˜ X̃t Z̃t′−1 βm0 . K. Yang and L.-f. Lee / Journal of Econometrics 221 (2021) 337–367 Lemma C.1. 365 Under H0 , Assumptions 2.1, 2.2(i), 3.1(i)–(iii), and 4.2, for any case defined by Assumption 2.2(ii)–(iv), p (1) As T → ∞, S00 → Σ00 . p ′ ′ (2) As T → ∞, βm0 S11 βm0 → βm0 Σ11 βm0 . p β⊥′ m0 (limn0 →∞ n1 τn0 τn′ 0 )β⊥m0 . 0 √ n0 n0 1 ′ ′ d (4) As T → ∞, (S − α β S ) + Σ B → S[01] , where vec(S[01] ) ∼ 01 m0 11 ϵ T )m0 ) 2 T ( ( 1 1 ′ N 0, 12 limn0 →∞ n τn0 τn0 ⊗ Σϵ . (3) As T → ∞, 1 T2 √ β⊥′ m0 S11 β⊥m0 → 1 12 0 ′ ′ (5) α⊥ m0 Σ00 α⊥m0 = α⊥m0 Σϵ α⊥m0 . −1 −1 −1 −1 ′ ′ ′ −1 ′ (6) α⊥m0 (α⊥m0 Σϵ α⊥m0 )−1 α⊥ βm0 Σ10 Σ00 m0 = Σ00 − Σ00 Σ01 βm0 (βm0 Σ10 Σ00 Σ01 βm0 ) Using Lemma C.1, we show the following two lemmas. Proposition 4.1 is a direct result of those lemmas. Lemma C.2. Under H0 , Assumptions 2.1, 2.2(i), 3.1(i)–(iii), and 4.2, for any case defined by Assumption 2.2(ii)–(iv), namely, −1 S01 | = 0 converge in probability to the S, MC, and SC cases, the largest m − m1 solutions {λ1 , . . . 
, λm−m1 } of |λS11 − S10 S00 −1 ′ ′ corresponding eigenvalues of |λβm0 Σ11 βm0 − βm0 Σ10 Σ00 Σ01 βm0 | = 0, while the others converge to zeros in probability. Lemma C.3. Under H0 , Assumptions 2.1, 2.2(i), 3.1(i)–(iii), and 4.2, for any case defined by Assumption 2.2(ii)–(iv), as d n0 T → M, where 0 ≤ M < +∞, the likelihood ratio test statistics LR(m − m1 , m) → Tr[(Wm1 + Bm1 )(Wm1 + Bm1 )′ ] where Wm1 is an m1 × m1 matrix with each entry being i.i.d. standard normally distributed random variable and Bm1 = ( ( ) )−1/2 √ 1 x x′ ′ ′ 1/2 − 3M α⊥ . (α⊥ m0 limn0 →0 n0 τn0 τn0 α⊥m0 m0 Σϵ α⊥m0 ) ) )−1/2 ( ( √ 1 ′ ′ 1/2 ′ Without Xnc0 ,t , τnx0 = τn0 . As Bm1 = − 3M α⊥ , we need plug-in (α⊥ m0 Σϵ α⊥m0 ) m0 limn0 →0 n τn0 τn0 α⊥m0 0 p p ′ ˆ ′ estimators for Bm1 and simulate the critical values. Our estimators, n1 Cn◦ m0 Cn◦ m0 → limn0 →0 n1 C0 C0′ and α⊥ m0 S00 α⊥m0 → 0 0 0 0 ′ ′ α⊥m0 Σϵ α⊥m0 . However, βm0 , αm0 , and α⊥m0 are not unique. To see this, with normalization βm0 βm0 = Im−m−1 , bm0 = ′ = am0 b′m0 for a generic orthonormal matrix Rm−m1 . For a βm0 Rm−m1 and am0 = αm0 Rm−m1 also satisfy Hm0 = αm0 βm0 given βm0 as the true parameter matrix, we can show that an estimator of β̃m is a consistent estimator for a basis of −1 column space of βm0 as they are identified up to transformation by an invertible matrix. The QMLEs β̃m = S112 V̂m,m−m1 ′ S11 β̃m )−1 , where V̂m,m−m1 represents the matrix of eigenvectors corresponding to the largest m − m1 and α̃m = S01 β̃m (β̃m −1 −1 −1 −1 −1 1 −1 −1 −2 eigenvalues λ1 ≥ · · · ≥ λm−m1 for |λIm − S112 S10 S00 S01 S112 | = 0. α̃⊥m = S00 S01 S112 V̂m,m1 (V̂m′ ,m1 S112 S10 S00 S01 S112 V̂m,m1 )− 2 , where V̂m,m1 represents the matrix of eigenvectors corresponding to the last eigenvalues λm−m1 +1 ≥ · · · ≥ λm for −1 −1 −1 |λIm − S112 S10 S00 S01 S112 | = 0. 
In the following lemma, we show the consistency of $\tilde\alpha_{\perp m}$ in the following way: first, we show that a transformation of $\tilde\beta_m$ by post-multiplying an invertible matrix is a consistent estimator for a given $\beta_{m0}$, which implies that $\tilde\beta_m$ is also a consistent estimator for a basis of the column space of $\beta_{m0}$; second, we show that a transformation of $\tilde\alpha_m$ is also a consistent estimator for a basis of the column space of $\alpha_{m0}$; last, we show that the QMLE $\tilde\alpha_{\perp m}$ is a consistent estimator for a basis of the column space of $\alpha_{\perp m0}$, as $\alpha_{m0}' \tilde\alpha_{\perp m} = o_p(1)$. As $\alpha_{\perp m0}$ is identified up to transformation by an orthonormal matrix with restriction $\alpha_{\perp m0}' \alpha_{\perp m0} = I_{m_1}$, and $\mathrm{Tr}[(W_{m_1} + B_{m_1})(W_{m_1} + B_{m_1})']$ has the same distribution as $\mathrm{Tr}[(W_{m_1} + B^*_{m_1})(W_{m_1} + B^*_{m_1})']$ with $B^*_{m_1} = R_{m_1} B_{m_1} R_{m_1}'$, where $R_{m_1}$ is a generic $m_1 \times m_1$ orthonormal matrix, we can plug in the estimator $\tilde\alpha_{\perp m}$ to simulate the critical values.

Lemma C.4. Under $H_0$, Assumptions 2.1, 2.2(i), 3.1(i)–(iii), and 4.2, for any case defined by Assumption 2.2(ii)–(iv), for a given $\beta_{m0}$, $\hat\beta_m = \tilde\beta_m (\bar\beta_{m0}' \tilde\beta_m)^{-1}$, where $\bar\beta_{m0} = \beta_{m0}(\beta_{m0}' \beta_{m0})^{-1}$, is a consistent estimator for $\beta_{m0}$, i.e., $\hat\beta_m - \beta_{m0} = o_p(1)$. The estimator $\tilde\alpha_{\perp m}$ is a consistent estimator for a basis of the null space of $\alpha_{m0}'$, i.e., $\alpha_{m0}' \tilde\alpha_{\perp m} = o_p(1)$.

Lemma 4.1 follows from Lemma C.4. The proof of Corollary 1 is in the supplementary file.

Appendix D. Bias terms

$\Delta_{R,\mathrm{vec}(P_m'),nT} = (\Delta_{R,P_{m,11},nT}, \Delta_{R,P_{m,12},nT}, \ldots, \Delta_{R,P_{m,1m},nT}, \ldots, \Delta_{R,P_{m,mm},nT})'$, and similarly for $\Delta_{R,\mathrm{vec}(\Phi_m'),nT}$, $\Delta_{R,\mathrm{vec}(\Sigma_{vm}'),nT}$ and $\Delta_{R,\mathrm{vec}(\Psi_m'),nT}$.31 Explicitly,

$$\Delta_{R,\mathrm{vec}(\Pi_{km}'),nT} = 0,$$

$$\Delta_{R,P_{m,ij},nT} = -\frac{1}{n - n_1} \mathrm{Tr}\left[(J^*_n \otimes E_{m,ij}') \sum_{g=0}^{+\infty} B^{g}_{nm0} S^{-1}_{nm0}\right],$$

$$\Delta_{R,\Phi_{m,ij},nT} = -\frac{1}{n - n_1} \mathrm{Tr}\left[((J^*_n W_n) \otimes E_{m,ij}') \sum_{g=0}^{+\infty} B^{g}_{nm0} S^{-1}_{nm0}\right],$$

$$\Delta_{R,\Sigma_{vm,ij},nT} = -\mathrm{Tr}[F_{m,ij} \Sigma^{-1}_{vm0}],$$

$$\Delta_{R,\Psi_{m,ij},nT} = -\frac{1}{n - n_1} \mathrm{Tr}\left[((J^*_n W_n) \otimes E_{m,ij}') S^{-1}_{nm0} (I_n \otimes P_{m0}' + W_n \otimes \Phi_{m0}') \sum_{g=0}^{+\infty} B^{g}_{nm0} S^{-1}_{nm0}\right] - \frac{1}{n - n_1} \mathrm{Tr}\left[((J^*_n W_n) \otimes E_{m,ij}') S^{-1}_{nm0}\right].$$

31 The arrangement and dimension of the vector depend on the arrangement of distinct parameters to estimate.

Appendix E. Supplementary data

Supplementary material related to this article can be found online at https://doi.org/10.1016/j.jeconom.2020.05.010.

References

Allers, Maarten A., Elhorst, J. Paul, 2011. A simultaneous equations model of fiscal policy interactions. J. Reg. Sci. 51 (2), 271–291.
Anselin, L., 1988. Spatial Econometrics: Methods and Models. Kluwer Academic Publishers, Boston.
Baltagi, Badi H., 2006. Random effects and spatial autocorrelation with equal weights. Econometric Theory 22 (5), 973–984.
Baltagi, Badi H., Deng, Ying, 2015. EC3SLS estimator for a simultaneous system of spatial autoregressive equations with random effects. Econometric Rev. 34, 658–693.
Beenstock, Michael, Felsenstein, Daniel, 2007. Spatial vector autoregressions. Spatial Economic Analysis 2 (2), 167–196.
Breitung, Jorg, 2005. A parametric approach to the estimation of cointegration vectors in panel data. Econometric Rev. 24 (2), 151–173.
Brown, Kristine M., Laschever, Ron A., 2012. When they’re sixty-four: Peer effects and the timing of retirement. Am. Econ. J.: Appl. Econ. 4 (3), 90–115.
Chen, Ye, 2004. Disaster defense and reduction policies in the Qing Dynasty. In: Studies in Qing History, Vol. 3. pp. 41–52.
Cohen-Cole, Ethan, Liu, Xiaodong, Zenou, Yves, 2018. Multivariate choices and identification of social interactions. J. Appl. Econometrics 33, 165–178.
Cragg, John G., Donald, S.G., 1997. Inferring the rank of a matrix. J. Econometrics 76, 223–250.
de Graaff, Thomas, van Oort, Frank G., Florax, Raymond J.G.M., 2012. Regional population–employment dynamics across different sectors of the economy. J. Reg. Sci. 52 (1), 60–84.
Dewachter, H., Houssa, R., Toffano, P., 2012. Spatial propagation of macroeconomic shocks in Europe. Rev. World Econ. 148 (2), 377–402.
Elhorst, J., 2003. Specification and estimation of spatial panel data models. Int. Reg. Sci. Rev. 26 (3), 244–268.
Engle, R.F., Granger, C.W.J., 1987. Co-integration and error correction: Representation, estimation and testing. Econometrica 55, 251–276.
Gebremariam, Gebremeskel H., Gebremedhin, Tesfa G., Schaeffer, Peter V., 2011. Employment, income, and migration in Appalachia: a spatial simultaneous equations approach. J. Reg. Sci. 51 (1), 102–120.
Gomez, Victor, Maravall, Agustin, 1997. Program TRAMO (Time Series Regression with ARIMA Noise, Missing Observations, and Outliers) and SEATS (Signal Extraction in ARIMA Time Series) Instructions for the User. Secretaria de Estado de Hacienda, Madrid.
Hauptmeier, Sebastian, Mittermaier, Ferdinand, Rincke, Johannes, 2012. Fiscal competition over taxes and public inputs. Reg. Sci. Urban Econ. 42 (3), 407–419.
Huang, Jingbin, 2009. Rediscussion on the grain demand supply and trade of Jiangnan in the Mid-Qing. J. Tsinghua Univ. (Philos. Soc. Sci.) 24, 39–48.
Johansen, Soren, 1988. Statistical analysis of cointegration vectors. J. Econom. Dynam. Control 12 (2–3), 231–254.
Johansen, Soren, 1991. Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models. Econometrica 59 (6), 1551–1580.
Kelejian, Harry, Prucha, Ingmar, 1998. A generalized spatial two stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. J. Real Estate Finance Econ. 17, 99–121.
Kelejian, Harry, Prucha, Ingmar, 1999. A generalized moments estimator for the autoregressive parameter in a spatial model. Internat. Econom. Rev. 40 (2), 509–533.
Kelejian, Harry, Prucha, Ingmar, 2004. Estimation of simultaneous systems of spatially interrelated cross sectional equations. J. Econometrics 118, 27–50.
Keller, Wolfgang, Shiue, Carol, 2007. The origins of spatial interaction. J. Econometrics 140 (1), 304–332.
Koop, Gary, Pesaran, M. Hashem, Potter, Simon M., 1996. Impulse response analysis in nonlinear multivariate models. J. Econometrics 74 (1), 119–147.
Korniotis, George M., 2010. Estimating panel models with internal and external habit formation. J. Bus. Econom. Statist. 28 (1), 145–158.
Larsson, Rolf, Lyhagen, Johan, Löthgren, Mickael, 2001. Likelihood-based cointegration tests in heterogeneous panels. Econom. J. 4, 109–142.
Lee, Lung-fei, 2004. Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica 72 (6), 1899–1925.
Lee, Lung-fei, 2007. GMM and 2SLS estimation of mixed regressive, spatial autoregressive models. J. Econometrics 137 (2), 489–514.
Lee, Lung-fei, Yu, Jihai, 2010a. A unified transformation approach for the estimation of spatial dynamic panel data models: stability, spatial cointegration and explosive roots. In: Ullah, A., Giles, D.E.A. (Eds.), Handbook of Empirical Economics and Finance. Chapman and Hall/CRC, pp. 395–432.
Lee, Lung-fei, Yu, Jihai, 2010b. Some recent developments in spatial panel data models. Reg. Sci. Urban Econ. 40 (5), 255–271.
LeSage, James, Pace, Robert Kelley, 2009. Introduction to Spatial Econometrics. In: Statistics: A Series of Textbooks and Monographs, Taylor and Francis Group, New York.
Levin, Andrew T., Lin, Chienfu, Chu, Chiashang James, 2002. Unit root tests in panel data: asymptotic and finite-sample properties. J. Econometrics 108 (1), 1–24.
Li, Kunpeng, 2017. Fixed-effects dynamic spatial panel data models and impulse response analysis. J. Econometrics 198 (1), 102–121.
Liu, Xiaodong, 2014. Identification and efficient estimation of simultaneous equations network models. J. Bus. Econom. Statist. 32 (4), 516–536.
Liu, Xiaodong, Patacchini, Eleonora, Zenou, Yves, 2014. Endogenous peer effects: Local aggregate or local average? J. Econ. Behav. Organ. 103, 39–59.
Lütkepohl, Helmut, 2005. New Introduction to Multiple Time Series Analysis. Springer-Verlag, Berlin.
Mutl, Jan, 2009. Panel VAR models with spatial dependence. Working paper.
Ord, J.K., 1975. Estimation methods for models of spatial interaction. J. Amer. Statist. Assoc. 70, 120–126.
Quah, Danny, 1994. Exploiting cross section variation for unit root inference in dynamic data. Econom. Lett. 44, 9–19.
Robin, Jean-Marc, Smith, Richard J., 2000. Tests of rank. Econometric Theory 16 (2), 151–175.
Shiue, Carol, 2002. Transport costs and the geography of arbitrage in eighteenth century China. Amer. Econ. Rev. 92 (5), 1406–1419.
Su, Liangjun, Yang, Zhenlin, 2015. Estimation of dynamic panel data models with spatial errors. J. Econometrics 185 (1), 230–258.
Theil, Henri, 1971. Principles of Econometrics. John Wiley & Sons, New York.
Yan, Se, Liu, Cong, 2011. A comparison of the market integration degree between north China and south China during 18th century: based on grain price data of the Qing dynasty. Econ. Res. J. 12, 124–137, (in Chinese).
Yang, Kai, Lee, Lung-fei, 2017. Identification and QML estimation of multivariate and simultaneous spatial autoregressive models. J. Econometrics 196, 196–214.
Yu, Jihai, de Jong, Robert, Lee, Lung-fei, 2008. Quasi-maximum likelihood estimators for spatial dynamic panel data with fixed effects when both n and T are large. J. Econometrics 146 (1), 119–134.
Yu, Jihai, de Jong, Robert, Lee, Lung-fei, 2012. Estimation for spatial dynamic panel data with fixed effects: The case of spatial cointegration. J. Econometrics 167 (1), 16–37.
Yu, Jihai, Lee, Lung-fei, 2010. Estimation of unit root spatial dynamic panel data models. Econometric Theory 26 (5), 1332–1362.