
Estimation of dynamic panel spatial vector autoregression

Journal of Econometrics 221 (2021) 337–367
Contents lists available at ScienceDirect
Journal of Econometrics
journal homepage: www.elsevier.com/locate/jeconom
Estimation of dynamic panel spatial vector autoregression:
Stability and spatial multivariate cointegration
Kai Yang a, Lung-fei Lee b,∗
a School of Economics, Shanghai University of Finance and Economics, and Key Laboratory of Mathematical Economics (SUFE),
Ministry of Education, No. 777 Guoding Road, Yangpu District, Shanghai, 200433, China
b Department of Economics, The Ohio State University, 410 Arps Hall, 1945 N. High St., Columbus, OH 43210, USA
Article info
Article history:
Received 29 June 2017
Received in revised form 4 October 2019
Accepted 15 May 2020
Available online 7 August 2020
JEL classification:
C31
C33
R11
Keywords:
Dynamic panel
Spatial vector autoregression
Identification
Quasi-maximum likelihood
Spatial cointegration
Market integration
Abstract
This paper introduces dynamic panel spatial vector autoregressive models. We study
features of dynamics and spatial interactions that an SVAR model can generate and
classify the model into stable or unstable cases by partitioning parameter spaces. For
stable, spatial cointegration, and mixed cointegration cases, we investigate identification
and QML estimation of the models to take into account simultaneity and correlated
relationships. Asymptotic properties and bias-corrected estimators are presented. To
detect unknown cointegration relationships, we introduce a sequential likelihood ratio
testing procedure. Simulations show the advantage of QMLEs on bias reduction and
efficiency gains. The empirical application provides evidence of ancient China’s market
integration.
© 2020 Elsevier B.V. All rights reserved.
1. Introduction
Panel data with cross-sectional dependent variables have highlighted the need for new analytical models to model
dependence patterns. The vector autoregressive (VAR) model has proven to be useful for describing dynamic behaviors
of economic variables. However, the number of dependent variables cannot be too large for statistical inference, which
limits its application to data with many cross-sectional units, in particular in regional studies. Regional data often
contain many more cross-sectional units (counties, metropolitan areas, prefectures or states). To overcome this
cross-sectional dimension problem, researchers specify the relative strengths of connections a priori through a
spatial weights matrix. Among the various models of spatial dependence, spatial autoregressive (SAR) models have
attracted much attention. Early works include Anselin (1988), Kelejian and Prucha (1998, 1999) and Lee (2004, 2007). Panel
data models incorporating spatial autoregression are studied in Elhorst (2003), Baltagi (2006), Yu et al. (2008, 2012), Su and
Yang (2015), and Li (2017) among others. Asymptotic properties of estimators for single equation spatial dynamic panel
data (SDPD) models are established in Yu et al. (2008, 2012). Li (2017) studies the SDPD model with high order spatial
lags and high order time lags. He investigates the estimation of the impulse response functions and studies estimation and
inference of the average direct, indirect and total impacts (LeSage and Pace, 2009). Although a large number of economic
∗ Corresponding author.
E-mail addresses: yang.kai@mail.shufe.edu.cn (K. Yang), lee.1777@osu.edu (L.-f. Lee).
https://doi.org/10.1016/j.jeconom.2020.05.010
0304-4076/© 2020 Elsevier B.V. All rights reserved.
K. Yang and L.-f. Lee / Journal of Econometrics 221 (2021) 337–367
theories concern interrelations among several economic variables, econometric studies of vector SAR
(or spatial vector autoregression, SVAR) models are limited with a few exceptions: Kelejian and Prucha (2004), Baltagi
and Deng (2015), Cohen-Cole et al. (2018), Liu (2014) and Yang and Lee (2017).
This paper considers dynamic panel SVAR models which are composed of multiple dependent variables across time
and space. We study stable and/or unstable temporal and spatial features of those variables. In addition to time lags,
a cross-sectional unit may respond to its neighbors’ or peers’ behavior in the current period (spatial lags) and in previous
periods (space–time lags; diffusion). Therefore, we can expect a shock to a variable in one region to propagate
to other variables, across both temporal and spatial dimensions. For panel data models, overall disturbances in such a
model may include time and individual fixed effects in addition to idiosyncratic shocks. Such a model can be applied to
analyze equation systems with time dynamics and spatial spillover effects in regional science (de Graaff et al., 2012;
Gebremariam et al., 2011), fiscal policy with government competition (Hauptmeier et al., 2012; Allers and Elhorst,
2011), macroeconomic or financial analysis with internal and external habits in consumption (Korniotis, 2010), spatial
propagation of macroeconomic shocks (Dewachter et al., 2012) and spillover effects in environmental economics, housing
prices, and social network (Brown and Laschever, 2012).
The SVAR model adopts spatial and temporal dependence features of the SDPD model in Yu et al. (2008, 2012) and the
nature of multivariate interactions in econometrics. However, it is not a simple extension of those models. We present
the main issues of the SVAR model and how it differs from SDPD and conventional VAR models.
The first issue concerns the features of data that an SVAR model can generate. An SVAR model can generate stable and unstable
time series and spatial processes in panel data: (i) Case S (stable): all variables (time series) for each spatial unit and spatial
processes are stable; (ii) Case SC (pure spatial cointegration): variables are spatially cointegrated among all spatial units,
but they are not cointegrated with each other within a spatial unit as a VAR, so the cointegration rank is determined only by
the spatial weights matrix; (iii) Case VC (pure variable cointegration): variables for each spatial unit are cointegrated as
a VAR but they are not spatially cointegrated across spatial units; (iv) Case MC (mixed case of spatial cointegration and
variable cointegration): variables are both spatially cointegrated among spatial units and cointegrated with other variables
in each spatial unit; (v) Case PU (pure unit root): unstable variables are unit root processes and they are not cointegrated;
(vi) Case E (explosive): some or all variables are explosive. An SDPD model, which is univariate, may exhibit
stable, spatial cointegration, pure unit root, and explosive situations. However, for an SDPD model, ‘‘the cointegrating space
is completely known (determined by the spatial weights matrix) when cointegration occurs’’ (Yu et al., 2012). In the
conventional multivariate time series literature, the rank of cointegration is the main object of inference. In our model,
the cointegrating space is ‘‘partially known’’, because the spatial weights matrix is known but the parameter matrices
governing cointegration in the temporal dimension need to be estimated. This makes it difficult to categorize the various
cases and to estimate the model without knowing the cointegration rank, and it creates a need to detect the cointegration rank
if cointegration occurs.
Several identification and estimation issues need to be investigated. The first issue lies in the identification
of parameters with simultaneity among equations, correlation over time, and dependence across spatial units. The second
issue, which is the major difference from the univariate SDPD model, concerns estimation. One has to find a suitable estimation
strategy without knowing whether the model is stable or unstable, or what the cointegration rank is if the model is
cointegrated. The third issue is to detect, with available data, which case the true model belongs to. We show that this third
issue is related to the investigation of the cointegration rank. Because of the multivariate
interactions, a testing procedure is required. We therefore extend Johansen-type cointegration rank test statistics to detect
the rank by sequential hypothesis testing for our spatial panels.
There are few works on VAR models with spatial features. Beenstock and Felsenstein (2007) study a SVAR model
by introducing spatial lags, space–time lags, and spatial errors into the vector autoregressive model. They propose an
IV estimation method and use annual data with 4 variables to illustrate their estimation. They find evidence of space–
time effect and spatial error. Mutl (2009) studies a panel VAR model with spatial dependent errors and proposes a
3-step estimation method to handle spatial errors. However, stable and unstable cases of the SVAR model have not been
investigated in detail and asymptotic properties of estimators are not formally studied.
We study in detail characteristics of various cases of the SVAR model in Section 2. We decompose the model to show
that the rank of a parametric matrix is crucial to categorize various cases. In Section 3, we study identification and a
unified quasi-maximum likelihood (QML) estimation. We focus on cases S, SC, and MC. The identification requirement
for spatial lags via IVs relies on the presence of exogenous variables and/or valid predetermined time lagged variables.
For estimation, we first eliminate unstable components and time fixed effects in order to avoid the incidental parameter
problem due to time effects, as we are considering samples with the number of time periods T tending to infinity. The
QML estimator for each common parameter of interest is consistent and asymptotically normal with a $\sqrt{nT}$ rate of
convergence.
Furthermore, we introduce a test that distinguishes the stable case from the SC and MC cases and detects the
cointegration rank if cointegration occurs. The test relies on the rotation of variables by the spatial weights
matrix, and utilizes the ‘‘known’’ part of information for instability. Using the transformed model, we propose a Johansen-type
sequential hypothesis testing procedure and show the asymptotic distribution of the test statistic.
Our main theoretical results rely on asymptotic analysis, while Monte Carlo experiments show the robustness of these
results in finite samples. Possible bias reduction and efficiency gains of systematic QML estimation over single-equation
estimation (IV/2SLS) and 3SLS in finite samples are also presented.
In the application, we use the model to study grain market integration in historical China using a unique historical
dataset of rice and wheat prices of 65 prefectures over 49 years in the Yangtze River Basin in the 18th century. Previous
research considers rice prices only. We add the multivariate feature by considering wheat prices. The empirical results
show that rice and wheat prices are spatially dependent on each other and across prefectures. The test evidence
suggests a stable model. These results provide evidence of interregional and intertemporal grain market integration and
a trading network in the eighteenth-century Yangtze River Basin.
Section 2 specifies the dynamic panel SVAR model. Section 3 studies model identification and quasi-maximum
likelihood (QML) estimation. Asymptotic properties of QML estimates (QMLEs) for stable and unstable models are
established. Section 4 introduces a hypothesis testing procedure and its test statistic. Sections 5 and 6 present Monte Carlo
experiments and an empirical application. Additional details on proofs of results are collected in an online supplementary
file.
2. A dynamic panel spatial vector autoregressive model
We specify a dynamic panel spatial vector autoregressive (SVAR) model as
$$Y_{nm,t} = W_n Y_{nm,t}\Psi_{m0} + Y_{nm,t-1}P_{m0} + W_n Y_{nm,t-1}\Phi_{m0} + X_{nk,t}\Pi_{km0} + C_{nm0} + D_{m0,t} + V_{nm,t}, \qquad (2.1)$$
for t = 1, 2, . . . , T , where Ynm,t is an n × m matrix consisting of m multivariate endogenous variables (one for each
column); t represents the time index and n is the number of cross-sectional spatial units. Neighborhood relationships are
summarized by Wn , an n × n matrix. Ynm,t −1 is the time lag, and Wn Ynm,t −1 is the space–time lag, which captures diffusion.
Xnk,t is an n × k matrix of k exogenous variables of n spatial units at time t. This model includes own-variable spatial effects
represented by diagonal elements of Ψm0 and cross-variable spatial effects represented by off-diagonal elements of Ψm0 .
Furthermore, Pm0 and Φm0 are m × m matrices, representing dynamic time effects and space–time diffusion effects. Πkm0
is a k × m coefficient matrix for regressors. $C_{nm0}$ is an n × m matrix of individual effects. The time effects for the m
dependent variables enter through $D_{m0,t} = \alpha_{m0,t}' \otimes l_n$, where $l_n$ is an n × 1 vector with each
entry being one and $\alpha_{m0,t}$ is the m-dimensional column vector of time fixed effects. Row vectors of the idiosyncratic disturbance
matrix $V_{nm,t}$ are assumed to be i.i.d. $(0, \Sigma_{vm0})$ across spatial units and over time, but $v_{n,t,ih}$ and $v_{n,t,il}$ in $V_{nm,t}$ of a unit i at
time t for different equations h and l are allowed to be correlated. Correlated effects across disturbances are incorporated
in the off-diagonal entries of the covariance matrix $\Sigma_{vm0}$ of disturbances.
For the reduced form of the SVAR model, we may first transpose the equation,

$$Y_{nm,t}' = \Psi_{m0}' Y_{nm,t}' W_n' + P_{m0}' Y_{nm,t-1}' + \Phi_{m0}' Y_{nm,t-1}' W_n' + \Pi_{km0}' X_{nk,t}' + C_{nm0}' + D_{m0,t}' + V_{nm,t}',$$

and then take the vectorization to have

$$y_{nm,t} = (W_n \otimes \Psi_{m0}')\, y_{nm,t} + (I_n \otimes P_{m0}')\, y_{nm,t-1} + (W_n \otimes \Phi_{m0}')\, y_{nm,t-1} + (X_{nk,t} \otimes I_m)\,\mathrm{vec}(\Pi_{km0}') + c_{nm0} + l_n \otimes \alpha_{m0,t} + v_{nm,t}, \qquad (2.2)$$
where $y_{nm,t} = \mathrm{vec}(Y_{nm,t}')$, $c_{nm0} = \mathrm{vec}(C_{nm0}')$, and $v_{nm,t} = \mathrm{vec}(V_{nm,t}')$ are (column) vectors of dimension nm. For this
arrangement, at each time t, we first pack the m multivariate variables together for each individual and then order the
individuals. For simpler exposition, we collect terms of exogenous variables and individual fixed effects into
$Q_{nm0,t} = (X_{nk,t} \otimes I_m)\,\mathrm{vec}(\Pi_{km0}') + c_{nm0}$, and define

$$S_{nm0} = I_{nm} - W_n \otimes \Psi_{m0}' \quad\text{and}\quad H_{nm0} = S_{nm0}^{-1}(I_n \otimes P_{m0}' + W_n \otimes \Phi_{m0}').$$
Assuming that the process has been operating for a long time, its final form across space and time is

$$y_{nm,t} = \sum_{h=0}^{+\infty} H_{nm0}^{h} S_{nm0}^{-1}\,[Q_{nm0,t-h} + l_n \otimes \alpha_{m0,t-h} + v_{nm,t-h}]. \qquad (2.3)$$
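The vectorization step behind (2.2) and the definitions of $S_{nm0}$ and $H_{nm0}$ can be checked numerically. The NumPy sketch below is only an illustration: the sizes, the random seed and all coefficient values are assumptions made for the example, not values from the paper, and regressors and fixed effects are omitted. It uses the fact that stacking the rows of Y gives vec(Y′).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 2  # illustrative sizes (assumption)

# Row-normalized weights matrix with zero diagonal (illustrative).
W = rng.random((n, n))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)

# Illustrative coefficient matrices (assumptions, chosen small).
Psi = np.array([[0.2, 0.1], [0.05, 0.2]])
P = np.array([[0.2, 0.1], [0.0, 0.3]])
Phi = np.array([[0.1, 0.0], [0.05, 0.1]])


def vec_t(Y):
    """Return vec(Y'), i.e. the rows of Y stacked into one vector."""
    return Y.reshape(-1)


# Identity used for (2.2): vec((W Y Psi)') = (W kron Psi') vec(Y').
Y = rng.standard_normal((n, m))
lhs = vec_t(W @ Y @ Psi)
rhs = np.kron(W, Psi.T) @ vec_t(Y)
identity_holds = bool(np.allclose(lhs, rhs))

# One step of the reduced form: y_t = H y_{t-1} + S^{-1} v_t.
S = np.eye(n * m) - np.kron(W, Psi.T)
A = np.kron(np.eye(n), P.T) + np.kron(W, Phi.T)
H = np.linalg.solve(S, A)
Y_prev = rng.standard_normal((n, m))
V = rng.standard_normal((n, m))
y_t = H @ vec_t(Y_prev) + np.linalg.solve(S, vec_t(V))
Y_t = y_t.reshape(n, m)

# The solution should satisfy the structural form (2.1).
structural = W @ Y_t @ Psi + Y_prev @ P + W @ Y_prev @ Phi + V
reduced_form_ok = bool(np.allclose(Y_t, structural))
print(identity_holds, reduced_form_ok)
```

Solving with `np.linalg.solve` rather than forming the inverse of S explicitly is a numerical convenience only; it does not change the algebra.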
In order to analyze dynamics of this system, we consider the popular specification that the weights matrix $W_n$ is
row-normalized and diagonalizable with real eigenvalues, i.e., $W_n = \Gamma_n \bar\omega_n \Gamma_n^{-1}$ with a diagonal real eigenvalue matrix $\bar\omega_n$
and an eigenvector matrix $\Gamma_n$.¹ Since the weights matrix is row-normalized, the largest eigenvalue is 1. Suppose that the
eigenvalue matrix $\bar\omega_n$ has $n_1$ eigenvalues equal to one while the others are less than one in absolute value. The eigenvalues are
arranged such that

$$\bar\omega_n = \mathrm{diag}\{1, \ldots, 1, \omega_{n,n_1+1}, \ldots, \omega_{n,n}\} = 1_{n,n_1} + \mathrm{diag}\{0, \ldots, 0, \omega_{n,n_1+1}, \ldots, \omega_{n,n}\},$$

where $1_{n,n_1}$ represents the n × n diagonal matrix with the first $n_1$ diagonal elements being one and remaining diagonal
elements being zero. It follows that we can decompose $S_{nm0}$ and $H_{nm0}$ based on the above decomposition of $W_n$. We further
let

$$S_{m0} = I_m - \Psi_{m0}', \qquad H_{m0} = S_{m0}^{-1}(P_{m0}' + \Phi_{m0}'),$$
1 Detailed discussion is on Assumption 2.1. In networks, a typical undirected network matrix is diagonalizable with real eigenvalues because of
its symmetry. If Wn is row-normalized from an original symmetric model as in Ord (1975), Wn is diagonalizable with all eigenvalues being real. And
normalized weights matrices are widely employed in empirical studies of regional economics and social network analysis.
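$$\tilde B_{nm0} = \mathrm{diag}\{0, \ldots, 0,\ (I_m - \omega_{n,n_1+1}\Psi_{m0}')^{-1}(P_{m0} + \omega_{n,n_1+1}\Phi_{m0})',\ \ldots,\ (I_m - \omega_{n,n}\Psi_{m0}')^{-1}(P_{m0} + \omega_{n,n}\Phi_{m0})'\},$$

$$B_{nm0} = (\Gamma_n \otimes I_m)\,\tilde B_{nm0}\,(\Gamma_n^{-1} \otimes I_m), \quad\text{and}\quad W_n^u = \Gamma_n 1_{n,n_1} \Gamma_n^{-1},$$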
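where $\tilde B_{nm0}$ is an nm × nm diagonal block matrix and each diagonal block is an m × m submatrix. It follows that
$H_{nm0}^h = W_n^u \otimes H_{m0}^h + B_{nm0}^h$, and $H_{nm0}^h S_{nm0}^{-1} = W_n^u \otimes (H_{m0}^h S_{m0}^{-1}) + B_{nm0}^h S_{nm0}^{-1}$.² We can decompose the vector of dependent
variables at t into several components as

$$y_{nm,t} = y_{nm,t}^{(\tau)} + y_{nm,t}^{(u)} + y_{nm,t}^{(s)} + y_{nm,t}^{(\alpha)},$$

where

$$
\begin{aligned}
y_{nm,t}^{(\tau)} &= \sum_{h=0}^{t} [W_n^u \otimes (H_{m0}^h S_{m0}^{-1})]\, Q_{nm0,t-h}, \\
y_{nm,t}^{(u)} &= (W_n^u \otimes H_{m0}^{t+1})\, y_{nm,-1} + \sum_{h=0}^{t} [W_n^u \otimes (H_{m0}^h S_{m0}^{-1})]\, v_{nm,t-h}, \\
y_{nm,t}^{(s)} &= \sum_{h=0}^{\infty} B_{nm0}^h S_{nm0}^{-1}\,(Q_{nm0,t-h} + v_{nm,t-h}), \quad\text{and} \\
y_{nm,t}^{(\alpha)} &= \sum_{h=0}^{t} [l_n \otimes (H_{m0}^h S_{m0}^{-1})]\, \alpha_{m0,t-h}.
\end{aligned} \qquad (2.4)
$$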
The decomposition is appealing as it strips out the unit eigenvalues from $W_n$ to construct $W_n^u$ so that the potentially stable
and unstable components can be separated. In the above decomposition, because the eigenvalues of $W_n^u$ are ones and zeros,
if $H_{m0}$ contains unit eigenvalues as well, $y_{nm,t}^{(\tau)}$ represents the possible time trend (deterministic trend) due to exogenous
variables and individual fixed effects and $y_{nm,t}^{(u)}$ generates a stochastic trend. Otherwise, they would be stable. $y_{nm,t}^{(s)}$ is
assumed to be stable under cases S, MC, SC and E.³ $y_{nm,t}^{(\alpha)}$ contains time effects due to time fixed effects.
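The split of $H_{nm0}^h$ into a part driven by the unit eigenvalues of $W_n$ and a remainder $B_{nm0}^h$ can be illustrated numerically. The sketch below is a minimal example under stated assumptions: the weights matrix consists of two fully connected groups of three units (so it is symmetric and `np.linalg.eigh` applies), and the coefficient matrices are made-up values. The helper `group_weights` is hypothetical and introduced only for this illustration.

```python
import numpy as np


def group_weights(sizes):
    """Block-diagonal, row-normalized W; each group is fully connected."""
    n = sum(sizes)
    W = np.zeros((n, n))
    start = 0
    for k in sizes:
        W[start:start + k, start:start + k] = (np.ones((k, k)) - np.eye(k)) / (k - 1)
        start += k
    return W


m = 2
W = group_weights([3, 3])  # symmetric in this example
n = W.shape[0]

# Illustrative coefficient matrices (assumptions).
Psi = np.array([[0.2, 0.1], [0.05, 0.2]])
P = np.array([[0.2, 0.1], [0.0, 0.3]])
Phi = np.array([[0.1, 0.0], [0.05, 0.1]])

# W = Gamma diag(omega) Gamma', with orthonormal Gamma because W is symmetric.
omega, Gamma = np.linalg.eigh(W)
unit = np.isclose(omega, 1.0)
n1 = int(unit.sum())

# W^u keeps only the unit eigenvalues of W.
Wu = Gamma[:, unit] @ Gamma[:, unit].T

S_m = np.eye(m) - Psi.T
H_m = np.linalg.solve(S_m, P.T + Phi.T)
S_nm = np.eye(n * m) - np.kron(W, Psi.T)
H_nm = np.linalg.solve(S_nm, np.kron(np.eye(n), P.T) + np.kron(W, Phi.T))

# B collects the blocks attached to the non-unit eigenvalues of W.
B = H_nm - np.kron(Wu, H_m)

h = 3
lhs = np.linalg.matrix_power(H_nm, h)
rhs = np.kron(Wu, np.linalg.matrix_power(H_m, h)) + np.linalg.matrix_power(B, h)
power_split_ok = bool(np.allclose(lhs, rhs))

# The two parts act on complementary eigenspaces, so their product vanishes.
cross_term_zero = bool(np.allclose(np.kron(Wu, H_m) @ B, 0.0))
print(n1, power_split_ok, cross_term_zero)
```

Because the two parts live on complementary eigenspaces of $W_n$, powers of $H_{nm0}$ never mix them, which is what allows the stable and potentially unstable components in (2.4) to be treated separately.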
A vector or matrix written without the subscript 0 is evaluated at a
possible parameter vector instead of the true value. For example, $S_{nm} = I_{nm} - W_n \otimes \Psi_m'$, where $\Psi_m$ is a possible parameter
matrix with its true one being $\Psi_{m0}$. Formal assumptions for $W_n$ and related matrices are in Assumption 2.1.
Assumption 2.1.
(i) $W_n$ is a nonstochastic, row-normalized and diagonalizable weights matrix. Row and column sums of $W_n$ in absolute
value are bounded uniformly in n. $S_{nm}$ is nonsingular in the parameter space of $\Psi_m$; and $S_{nm}$ and $S_{nm}^{-1}$ are
uniformly bounded in row and column sum norms, uniformly in the parameter space of $\Psi_m$.⁴
(ii) $W_n = \Gamma_n \bar\omega_n \Gamma_n^{-1}$, all eigenvalues of $W_n$ are real, and $\sum_{h=1}^{+\infty} \mathrm{abs}(B_{nm0}^h)$ is bounded in row and column sum norms.
(iii) In addition to (ii), for unstable cases, $W_n$ has $n_1$ unit eigenvalues in $\bar\omega_n$ such that $n_1/n$ tends to a positive constant
as n → ∞.⁵
A spatial weights matrix is a key object in a SAR model. In this study, we focus on the scenario of a nonstochastic
and row-normalized spatial weights matrix as these features are prevalent in empirical studies in regional economics and
social networks (as a social norm effect, see Liu et al., 2014). Its eigenvalues and eigenvectors are convenient
for dynamic analysis. We assume the row and column sum norms of the sequence of $W_n$ are bounded, which can be
justified by the sparsity of a spatial weights matrix, or by interactions that decrease with distance together with the absence of
units linked strongly by many others. We make the assumption of real eigenvalues of $W_n$ for unstable models for
proper analysis.⁶ The boundedness of row and column sum norms for $S_{nm}$ and $S_{nm}^{-1}$ uniformly in $\Psi_m$ in its parameter space
is usual in spatial autoregressive models. We always need $\sum_{h=1}^{+\infty} \mathrm{abs}(B_{nm0}^h)$ to be bounded in row and column sum norms
in (ii) even for some unstable SVAR models. There are many examples of $W_n$, in particular, block diagonal matrices, with
a diverging number of unit eigenvalues that satisfy Assumption 2.1(iii). For instance, when studying peer effects among
2 See the supplementary file for detailed derivations.
3 Cases VC and PU are different as their unstable components are merely generated by the time lagged term, while in other cases the unstable
components are due to all three lags.
4 When evaluated at any value of the parameter space of $\Psi_m$, we use $S_{nm}$ instead of $S_{nm}(\Psi_m)$ for simplicity.
5 If $n_1/n$ goes to zero, one would expect the unit root issue to disappear asymptotically, and it might not be of interest.
6 For the case that $W_n$ is row-normalized from a non-negative symmetric matrix, it is known that all eigenvalues of $W_n$ are real. For the stable
model, it is not necessary to assume that all eigenvalues of the weights matrix are real for consistency and asymptotic normality of the direct QMLE
(without transformation in Section 3). Instead we need $\sum_{h=1}^{+\infty} \mathrm{abs}(H_{nm0}^h)$ to be bounded in row and column sum norms for the stable SVAR model.
However, for unstable models, it is not easy to analyze their estimators with complex unit eigenvalues for $W_n$.
teenagers, Cohen-Cole et al. (2018) use a sample consisting of 7669 students distributed over 124 schools. There are 1043
networks, which means that there are at least 1043 unit eigenvalues of Wn when they use a row-normalized network
matrix. Brown and Laschever (2012) study peer effects on timing of retirement among Los Angeles teachers and define
teachers in a school as forming a network. Their 7-year panel data sample consists of 31,931 person-year observations and
606 unique schools (networks). In the above examples, we can expect a diverging number of unit eigenvalues with increasing
sample size. Furthermore, in regional studies, when we treat villages/counties/cities within a high-level administrative
area (for example, states or provinces) as being in one network, the block-diagonal pattern of the spatial weights matrix
implies a diverging number of unit eigenvalues. In the empirical regional study in Section 6, one of the possible networks
that we introduce is in this form.
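The link between the number of networks and the number of unit eigenvalues can be illustrated with a small NumPy sketch. It assumes, purely for illustration, that every group is fully connected before row normalization; the group sizes and the helper names `group_weights` and `count_unit_eigenvalues` are hypothetical and not taken from the studies cited above.

```python
import numpy as np


def group_weights(sizes):
    """Block-diagonal, row-normalized W; each group is fully connected."""
    n = sum(sizes)
    W = np.zeros((n, n))
    start = 0
    for k in sizes:
        W[start:start + k, start:start + k] = (np.ones((k, k)) - np.eye(k)) / (k - 1)
        start += k
    return W


def count_unit_eigenvalues(W, tol=1e-8):
    """Number of eigenvalues of W that are numerically equal to one."""
    eigenvalues = np.linalg.eigvals(W)
    return int(np.sum(np.abs(eigenvalues - 1.0) < tol))


sizes = [3, 4, 5]  # three illustrative groups (assumption)
W = group_weights(sizes)
n1 = count_unit_eigenvalues(W)
share = n1 / W.shape[0]

# Doubling the number of groups doubles n1, so n1/n stays constant.
W_big = group_weights(sizes * 2)
n1_big = count_unit_eigenvalues(W_big)
share_big = n1_big / W_big.shape[0]
print(n1, n1_big, share, share_big)
```

In this construction each group contributes exactly one unit eigenvalue, so adding groups of similar size keeps $n_1/n$ away from zero, in line with Assumption 2.1(iii).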
2.1. Parameter space and (In)stability
The dynamic panel SVAR system has two dimensions — one in space and the other in time. In this paper, we assume
that the model exhibits stability in the spatial dimension at each time period.7 However, we consider the stable and
unstable cases in the temporal dimension, or mixed with both the spatial dimension and the temporal dimension. We
investigate parameter values with which the model can generate various cases.8
2.1.1. Categorization: the stable case and unstable cases
The stable equilibrium in the spatial dimension at each time period is presented in Assumption 2.2(i). For dynamic
panel data models, we need to consider also the temporal dimension. First of all, the stability of the SVAR system requires
all eigenvalues $\lambda_H$ of $H_{nm}$ to satisfy $|\lambda_H| < 1$. Because $S_{nm}$ is invertible, the stability condition requires that
all solutions λ to $|(I_n \otimes P_m' + W_n \otimes \Phi_m') - \lambda S_{nm}| = 0$ are inside the unit circle since it is equivalent to $|H_{nm} - \lambda I_{nm}| = 0$.
By the spectral radius theorem, any eigenvalue of $H_{nm}$ in absolute value will be less than or equal to $\|H_{nm}\|$ for any
consistent matrix norm $\|\cdot\|$.⁹ There are many sufficient conditions. For example, simple sufficient conditions can be
$\|P_m'\|_\infty + \|\Phi_m'\|_\infty + \|\Psi_m'\|_\infty < 1$, or $\|P_m'\|_1 + \|\Phi_m'\|_1 \|W_n\|_1 + \|\Psi_m'\|_1 \|W_n\|_1 < 1$.¹⁰ These sufficient conditions suggest
that, in order to guarantee that the model is stable, the column (row) sums of the matrices for spatial lagged effects, time lagged
effects, and diffusion effects cannot be large. Assumption 2.2(ii) states the condition for the stable case.
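The row-sum-norm sufficient condition can be compared with the exact spectral radius of $H_{nm}$ in a small example. The sketch below is illustrative only: the ring-shaped weights matrix and the coefficient values are assumptions, and the function names are hypothetical.

```python
import numpy as np


def row_sum_norm(A):
    """Maximum absolute row sum, i.e. the induced infinity norm."""
    return float(np.linalg.norm(A, np.inf))


def sufficient_bound(Psi, P, Phi, W):
    """Left-hand side of ||P'|| + ||Phi'|| ||W|| + ||Psi'|| ||W|| < 1."""
    w = row_sum_norm(W)
    return row_sum_norm(P.T) + row_sum_norm(Phi.T) * w + row_sum_norm(Psi.T) * w


def spectral_radius_H(Psi, P, Phi, W):
    """Largest eigenvalue modulus of H_nm = S_nm^{-1}(I kron P' + W kron Phi')."""
    n, m = W.shape[0], Psi.shape[0]
    S = np.eye(n * m) - np.kron(W, Psi.T)
    H = np.linalg.solve(S, np.kron(np.eye(n), P.T) + np.kron(W, Phi.T))
    return float(np.max(np.abs(np.linalg.eigvals(H))))


# Ring network: each unit has two neighbours with weight 1/2 (assumption).
n = 5
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

# Illustrative coefficient matrices (assumptions).
Psi = np.array([[0.2, 0.1], [0.05, 0.2]])
P = np.array([[0.2, 0.1], [0.0, 0.3]])
Phi = np.array([[0.1, 0.0], [0.05, 0.1]])

bound = sufficient_bound(Psi, P, Phi, W)
condition_met = bound < 1.0
rho = spectral_radius_H(Psi, P, Phi, W)
print(bound, condition_met, rho)
```

The norm condition is only sufficient: parameter values that violate it may still give a spectral radius below one, so in practice the eigenvalue check is the sharper diagnostic.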
Assumption 2.2.
(i) (General conditions) The parameter space of the coefficient vector $\theta = [\mathrm{vec}(\Pi_{km}'), \mathrm{vec}(P_m'), \mathrm{vec}(\Phi_m'), \mathrm{vec}(\Psi_m'), \mathrm{vec}^*(\Sigma_{vm})]'$¹¹ is
compact and the true parameter $\theta_0$ is located in the interior of its parameter space. The parameter space of
coefficients $\Psi_m$ is such that $\rho(\Psi_m) < 1$, where $\rho(\Psi_m)$ represents the spectral radius of matrix $\Psi_m$.¹² The covariance
matrix $\Sigma_{vm}$ is nonsingular.
(ii) (Temporal stable case) The parameter space of coefficients $\Psi_m$, $P_m$ and $\Phi_m$ is compact such that all solutions λ to
$|S_{nm}^{-1}(I_n \otimes P_m' + W_n \otimes \Phi_m') - \lambda I_{nm}| = 0$ are inside the unit circle.
(iii) (Temporal unstable case MC) All eigenvalues of $H_m$ are less than or equal to one in absolute value, while the largest
eigenvalue equals one and the smallest eigenvalue in absolute value is less than one. $\Psi_m + \Phi_m \neq 0$, $\rho(P_m) < 1$, and
$\rho((I_m - \omega_{n,j}\Psi_m')^{-1}(P_m + \omega_{n,j}\Phi_m)') < 1$ for all those eigenvalues $\omega_{n,j}$ of $W_n$ less than one in absolute value, where
such eigenvalues are assumed to be bounded away from 1, as n tends to infinity.
(iv) (Temporal unstable case SC) $H_m = I_m$, $\Psi_m + \Phi_m \neq 0$, and $\rho(P_m) < 1$. $\rho((I_m - \omega_{n,j}\Psi_m')^{-1}(P_m + \omega_{n,j}\Phi_m)') < 1$ for all
those eigenvalues $\omega_{n,j}$ of $W_n$ less than one in absolute value, where such eigenvalues are assumed to be bounded
away from 1, as n tends to infinity.
(v) (Temporal unstable case VC) $\Psi_m + \Phi_m = 0$ and all eigenvalues of $P_m$ are inside or on the unit circle while $\rho(P_m) = 1$
and $0 < \mathrm{rank}(P_m - I_m) < m$. $\rho((I_m - \omega_{n,j}\Psi_m')^{-1}(P_m + \omega_{n,j}\Phi_m)') = 1$ for all eigenvalues $\omega_{n,j}$ of $W_n$,
j = 1, 2, . . . , n, as n tends to infinity.
(vi) (Temporal unstable case PU) $\Psi_m + \Phi_m = 0$ and $P_m = I_m$.
7 This is usual for a cross-sectional SAR model, as it is an equilibrium model. An equilibrium model would require $S_{m0}$ to be invertible.
If unit eigenvalues existed in $\Psi_{m0}$ with a row-normalized $W_n$, $S_{nm0}$ would not be invertible. Of course, if eigenvalues are larger than one, then the
cross-sectional model would be unstable. We do not consider such a case here.
8 We analyze the stability of the model using parameters of any value in the parameter space. We assume the true value is located in the interior
of its parameter space by Assumption 2.2.
9 Recall that a matrix norm $\|\cdot\|$ on $R^{n \times m}$ is consistent for two vector norms $\|\cdot\|_a$ on $R^m$ and $\|\cdot\|_b$ on $R^n$ if $\|Ax\|_b \le \|A\|\,\|x\|_a$ for $A \in R^{n \times m}$ and
$x \in R^m$. All induced norms are consistent by definition. We usually use the column sum norm $\|\cdot\|_1$ and the row sum norm $\|\cdot\|_\infty$ in this paper.
10 This is derived from $\|H_{nm}\| < 1$ for any induced matrix norm. As
$$\|H_{nm}\| \le (\|I_n \otimes P_m'\| + \|W_n \otimes \Phi_m'\|)\,\|S_{nm}^{-1}\| \le (\|I_n \otimes P_m'\| + \|W_n \otimes \Phi_m'\|)\,\frac{1}{1 - \|W_n \otimes \Psi_m'\|},$$
a stronger sufficient condition from the preceding inequality is $\|P_m'\|_\infty + \|\Phi_m'\|_\infty \|W_n\|_\infty + \|\Psi_m'\|_\infty \|W_n\|_\infty < 1$, or $\|P_m'\|_1 + \|\Phi_m'\|_1 \|W_n\|_1 + \|\Psi_m'\|_1 \|W_n\|_1 < 1$.
11 $\mathrm{vec}^*(\cdot)$ collects distinct entries of a matrix and transforms them into a vector. For example, $\mathrm{vec}^*\begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix} = (\sigma_{11}, \sigma_{21}, \sigma_{22})'$ if $\sigma_{12} = \sigma_{21}$.
12 A general condition with any possible $W_n$ is $\rho(\Psi_m)\rho(W_n) < 1$.
(vii) (An explosive case E) The largest eigenvalues in absolute value of $H_m$ are larger than one. $\rho(P_m) < 1$.
$\rho((I_m - \omega_{n,j}\Psi_m')^{-1}(P_m + \omega_{n,j}\Phi_m)') < 1$ for all those eigenvalues $\omega_{n,j}$ of $W_n$ less than one in absolute value, where such
eigenvalues are assumed to be bounded away from 1, as n tends to infinity.
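The case definitions in Assumption 2.2 can be turned into a rough classifier for given parameter matrices. The sketch below is a simplified reading, not the paper's procedure: it works with eigenvalue moduli and numerical tolerances, takes the eigenvalues of $W_n$ as given, and returns "unclassified" whenever none of the listed conditions is met. The function names and all parameter values are assumptions made for the illustration.

```python
import numpy as np


def spectral_radius(A):
    return float(np.max(np.abs(np.linalg.eigvals(A))))


def classify_case(Psi, P, Phi, omegas, tol=1e-8):
    """Return 'S', 'SC', 'MC', 'VC', 'PU', 'E' or 'unclassified'."""
    m = Psi.shape[0]
    I = np.eye(m)

    def block(w):
        # (I_m - w Psi')^{-1} (P + w Phi)' for one eigenvalue w of W_n.
        return np.linalg.solve(I - w * Psi.T, (P + w * Phi).T)

    # Cases (v) and (vi): spatial lag and diffusion effects cancel.
    if np.allclose(Psi + Phi, 0.0, atol=tol):
        if np.allclose(P, I, atol=tol):
            return "PU"
        rank = np.linalg.matrix_rank(P - I, tol=tol)
        if abs(spectral_radius(P) - 1.0) < tol and 0 < rank < m:
            return "VC"

    # Blocks attached to non-unit eigenvalues of W_n must be stable.
    rest_stable = all(
        spectral_radius(block(w)) < 1.0 - tol
        for w in omegas if abs(w - 1.0) > tol
    )
    if not rest_stable:
        return "unclassified"

    H_m = block(1.0)
    moduli = np.abs(np.linalg.eigvals(H_m))
    if moduli.max() < 1.0 - tol:
        return "S"
    if spectral_radius(P) >= 1.0 - tol:
        return "unclassified"
    if moduli.max() > 1.0 + tol:
        return "E"
    if np.allclose(H_m, I, atol=tol):
        return "SC"
    if moduli.min() < 1.0 - tol:
        return "MC"
    return "unclassified"


# Eigenvalues of a W_n made of two fully connected groups of three units.
omegas = [1.0, 1.0, -0.5, -0.5, -0.5, -0.5]
I2 = np.eye(2)
examples = {
    "S": (np.array([[0.2, 0.1], [0.05, 0.2]]),
          np.array([[0.2, 0.1], [0.0, 0.3]]),
          np.array([[0.1, 0.0], [0.05, 0.1]])),
    "SC": (0.3 * I2, 0.5 * I2, 0.2 * I2),
    "MC": (np.diag([0.3, 0.2]), np.diag([0.5, 0.4]), np.diag([0.2, 0.1])),
    "VC": (0.2 * I2, np.diag([1.0, 0.5]), -0.2 * I2),
    "PU": (0.2 * I2, I2, -0.2 * I2),
    "E": (0.3 * I2, 0.5 * I2, 0.4 * I2),
}
labels = {name: classify_case(Psi, P, Phi, omegas)
          for name, (Psi, P, Phi) in examples.items()}
print(labels)
```

Each tuple in `examples` is ordered as (Psi, P, Phi). Exact unit roots are knife-edge conditions, so with estimated parameters a tolerance-based check like this one cannot replace the sequential testing procedure developed later in the paper.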
Assumption 2.2(i) characterizes the condition ρ (Ψm ) < 1 under which the SVAR system provides an equilibrium
outcome matrix Ynm,t in terms of time lagged variables, exogenous variables, and disturbances at each time period. We
maintain this assumption in the whole paper, which means the instability of the model will come from the temporal
dimension. Condition (ii) is equivalent to assuming that all eigenvalues of matrix $H_{nm}$ are inside the unit circle.
Assumption 2.2(iii) and (iv) provide general conditions under which the component $y_{nm,t}^{(s)}$ is stable while other
components are unstable. Differing from a single equation model in Yu et al. (2012) and Lee and Yu (2010a), the model
might generate cointegration among variables and/or spatial units. There are two relevant issues: how to determine
the difference between spatial cointegration and mixed (spatial and variable) cointegration; and how to determine
the cointegration rank if cointegration occurs. We analyze characteristics of these unstable models in the following
subsections.
Assumption 2.2(v) characterizes an unstable case in which the cointegration relationship only exists among different
variables for each spatial unit. This case is distinct from spatial cointegration and mixed cointegration. It implies that
the coefficient matrix of own-variable and cross-variable spatial lagged effects and the coefficient matrix of own-variable
and cross-variable diffusion cancel out, i.e. Ψm + Φm = 0. Unlike the pure unit root case for a univariate SDPD
model, case VC is first introduced in this paper. For case VC with Ψm = Φm = 0, we get a degenerate case in which
both spatial lagged effects and diffusion effects vanish, so the model is the same as classic cointegration for
dynamic panel data models.
Assumption 2.2(vi) provides a condition under which the model exhibits pure unit root processes. This PU case can
also be viewed as an extension to the pure unit root case for a univariate SDPD model in Yu and Lee (2010). All variables
are unstable and they exhibit unit root characteristics.
Assumption 2.2(vii) characterizes an explosive case under which our proposed estimation method works.13
The important difference between cases VC/PU and cases SC/MC is that, for VC/PU, the spatial interaction does
not contribute to cointegrating relationships and no linear combinations of dependent variables from different spatial
units are stable. For cases SC/MC, the unstable components might be generated by the SVAR model not only in the
temporal dimension by time lags, but also mixed with the spatial dimension, which is what we are concerned with in this
paper. Also, PU and VC are relatively restrictive in assumptions and require different estimation strategies with different
asymptotics. Therefore, in this paper we focus on cases S, SC, and MC in estimation. We propose an estimation method
which can be applied to these three cases. Indeed, it can also be applied to some explosive cases.
2.1.2. Unstable SC and MC cases: Error correction, rotation and cointegration rank
When there are relevant unit eigenvalues of the process, the system may be unstable. In the decomposition of Eq. (2.4),
the instability arises in all components except $y_{nm,t}^{(s)}$ for cases SC and MC. The cointegration among variables and spatial
units can be represented by an error correction model (ECM). We use the ECM representation to derive the cointegration
matrix and its cointegration rank. Also, the model can be represented by rotation using eigenvectors of the spatial weights
matrix, which can present different views among cases SC, MC, VC and PU. In this subsection, we investigate cointegration
relationships and cointegration ranks for cases SC and MC.
First, we represent the model with its parameters at the true value as an ECM and derive the cointegration matrix. We
subtract a time lagged term from both sides of (2.2) and arrive at
$y_{nm,t} - y_{nm,t-1} = (H_{nm0} - I_{nm})\, y_{nm,t-1} + S_{nm0}^{-1}[Q_{nm0,t} + \mathrm{vec}(D_{m0,t}') + v_{nm,t}]$. It follows that an error correction representation is

$$y_{nm,t} - y_{nm,t-1} = S_{nm0}^{-1} A_{nm0}^{\theta}\, y_{nm,t-1} + S_{nm0}^{-1}[Q_{nm0,t} + \mathrm{vec}(D_{m0,t}') + v_{nm,t}], \qquad (2.5)$$

and

$$
\begin{aligned}
A_{nm0}^{\theta} &= I_n \otimes (P_{m0} - I_m)' + W_n \otimes (\Psi_{m0} + \Phi_{m0})' \\
&= (I_n - W_n) \otimes (P_{m0} - I_m)' + W_n \otimes (\Psi_{m0} + \Phi_{m0} + P_{m0} - I_m)'.
\end{aligned} \qquad (2.6)
$$
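The algebra behind (2.5) and (2.6) can be verified numerically. The sketch below is a minimal check under assumed inputs: a random row-normalized weights matrix and made-up coefficient matrices. It confirms that the two expressions for $A_{nm0}^{\theta}$ coincide and that $H_{nm0} - I_{nm} = S_{nm0}^{-1} A_{nm0}^{\theta}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 2  # illustrative sizes (assumption)

# Random row-normalized weights matrix with zero diagonal (illustrative).
W = rng.random((n, n))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)

# Illustrative coefficient matrices (assumptions).
Psi = np.array([[0.2, 0.1], [0.05, 0.2]])
P = np.array([[0.2, 0.1], [0.0, 0.3]])
Phi = np.array([[0.1, 0.0], [0.05, 0.1]])

I_n, I_m, I_nm = np.eye(n), np.eye(m), np.eye(n * m)
S = I_nm - np.kron(W, Psi.T)
H = np.linalg.solve(S, np.kron(I_n, P.T) + np.kron(W, Phi.T))

# First and second lines of (2.6).
A_first = np.kron(I_n, (P - I_m).T) + np.kron(W, (Psi + Phi).T)
A_second = (np.kron(I_n - W, (P - I_m).T)
            + np.kron(W, (Psi + Phi + P - I_m).T))
forms_agree = bool(np.allclose(A_first, A_second))

# Error correction form (2.5): H - I = S^{-1} A.
ecm_ok = bool(np.allclose(H - I_nm, np.linalg.solve(S, A_first)))
print(forms_agree, ecm_ok)
```

The second line of (2.6) is the useful one for the cointegration argument, because the factor $(I_n - W_n)$ annihilates both $l_n$ and $W_n^u$.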
By showing the stability of $A_{nm0}^{\theta} y_{nm,t-1}$, we have $A_{nm0}^{\theta}$ being a cointegration matrix. In the cointegration case, $y_{nm,t}^{(\tau)}$, $y_{nm,t}^{(u)}$,
and $y_{nm,t}^{(\alpha)}$ in (2.4) of $y_{nm,t}$ may be unstable if some of the eigenvalues of $H_{m0}$ are ones. As $y_{nm,t}^{(u)}$, $y_{nm,t}^{(\tau)}$ and $y_{nm,t}^{(\alpha)}$ all
include the summation $\sum_{h=0}^{t} H_{m0}^h$, there may be instability in terms of time trend due to the summation in the presence
of unit eigenvalues in $H_{m0}$. However, these unstable components of $y_{nm,t}$ can be taken by linear combinations $A_{nm0}^{\theta}$ to
become stable. To see this possibility, we note that $(\Psi_{m0} + \Phi_{m0} + P_{m0} - I_m)' H_{m0}^h = S_{m0}(H_{m0} - I_m) H_{m0}^h$. Suppose $\lambda_m$ is
an eigenvalue of $H_{m0}$, then the corresponding eigenvalue of $(H_{m0} - I_m) H_{m0}^h$ should be $(\lambda_m - 1)\lambda_m^h$, which will be 0 when
$\lambda_m = 1$. In addition, $(I_n - W_n) l_n = 0$, and $(I_n - W_n) W_n^u = 0$. Thus as $\lambda_m^h$ geometrically declines in h for $|\lambda_m| < 1$, $A_{nm0}^{\theta} y_{nm,t}^{(\tau)}$,
$A_{nm0}^{\theta} y_{nm,t}^{(u)}$, and $A_{nm0}^{\theta} y_{nm,t}^{(\alpha)}$ become stable.¹⁴ In consequence, $A_{nm0}^{\theta} y_{nm,t}$ is stable and we have cointegration.
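The way $A_{nm0}^{\theta}$ removes the unit-root directions can be seen in a small mixed cointegration (MC) example. The sketch below is illustrative only: it assumes two fully connected groups of three units and diagonal coefficient matrices chosen so that $H_{m0}$ has eigenvalues 1 and 0.625. It compares the loading of a shock from h periods ago in the unstable components, $W_n^u \otimes (H_{m0}^h S_{m0}^{-1})$, with the same loading after premultiplication by $A_{nm0}^{\theta}$.

```python
import numpy as np


def group_weights(sizes):
    """Block-diagonal, row-normalized W; each group is fully connected."""
    n = sum(sizes)
    W = np.zeros((n, n))
    start = 0
    for k in sizes:
        W[start:start + k, start:start + k] = (np.ones((k, k)) - np.eye(k)) / (k - 1)
        start += k
    return W


W = group_weights([3, 3])
n, m = W.shape[0], 2

# MC-type parameters (assumptions): H_m has eigenvalues 1 and 0.625.
Psi = np.diag([0.3, 0.2])
P = np.diag([0.5, 0.4])
Phi = np.diag([0.2, 0.1])

omega, Gamma = np.linalg.eigh(W)          # W is symmetric in this example
unit = np.isclose(omega, 1.0)
Wu = Gamma[:, unit] @ Gamma[:, unit].T    # projection on unit eigenvalues

I_n, I_m = np.eye(n), np.eye(m)
S_m = I_m - Psi.T
H_m = np.linalg.solve(S_m, P.T + Phi.T)
A_theta = np.kron(I_n, (P - I_m).T) + np.kron(W, (Psi + Phi).T)


def loading(h):
    """Weight on a shock from h periods ago in the unstable components."""
    return np.kron(Wu, np.linalg.matrix_power(H_m, h) @ np.linalg.inv(S_m))


horizons = (0, 10, 50)
raw = [float(np.linalg.norm(loading(h))) for h in horizons]
filtered = [float(np.linalg.norm(A_theta @ loading(h))) for h in horizons]

# Closed form used in the text: A (W^u kron X) = W^u kron S_m (H_m - I) X.
X = np.linalg.matrix_power(H_m, 10) @ np.linalg.inv(S_m)
closed_form_ok = bool(np.allclose(A_theta @ np.kron(Wu, X),
                                  np.kron(Wu, S_m @ (H_m - I_m) @ X)))
print(raw, filtered, closed_form_ok)
```

The raw loadings do not die out because of the unit eigenvalue of $H_{m0}$, whereas the filtered loadings shrink geometrically, which is the mechanism behind the stability of $A_{nm0}^{\theta} y_{nm,t}$.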
13 Note that there are various explosive cases and properties of estimators under these cases would not be standard.
14 $A_{nm0}^{\theta} y_{nm,t}^{(\tau)} = \sum_{h=0}^{t} [W_n^u \otimes (S_{m0}(H_{m0} - I_m) H_{m0}^h S_{m0}^{-1})]\, Q_{nm0,t-h}$,
$A_{nm0}^{\theta} y_{nm,t}^{(u)} = \sum_{h=0}^{t} [W_n^u \otimes (S_{m0}(H_{m0} - I_m) H_{m0}^h S_{m0}^{-1})]\, v_{nm,t-h} + [W_n^u \otimes (S_{m0}(H_{m0} - I_m) H_{m0}^{t+1})]\, y_{nm,-1}$,
and $A_{nm0}^{\theta} y_{nm,t}^{(\alpha)} = \sum_{h=0}^{t} [l_n \otimes (S_{m0}(H_{m0} - I_m) H_{m0}^h S_{m0}^{-1})]\, \alpha_{m0,t-h}$. Strictly speaking, for the claim that $A_{nm0}^{\theta} y_{nm,t}^{(\alpha)}$ is stable, we need to exclude possible
explosive behavior of $\alpha_{m,t}$ over time t.
The cointegration rank can be derived from the rank of $A^{\theta}_{nm0}$. Actually, the cointegration relationship may be determined by parameters and the spatial weights matrix. With $W_n = \Gamma_n\bar{\omega}_n\Gamma_n^{-1}$, multiplying by $S_{nm0}^{-1}$ gives
$$S_{nm0}^{-1}A^{\theta}_{nm0} = (\Gamma_n \otimes I_m)\begin{pmatrix} A_{n_1,m} & 0 \\ 0 & A_{n_2,m} \end{pmatrix}(\Gamma_n^{-1} \otimes I_m),$$
where $A_{n_1,m} = I_{n_1} \otimes [S_{m0}^{-1}(P_{m0} + \Psi_{m0} + \Phi_{m0} - I_m)']$, and $A_{n_2,m} = (I_{n_2} \otimes I_m - \bar{\omega}_{n_2} \otimes \Psi'_{m0})^{-1}[I_{n_2} \otimes P'_{m0} + \bar{\omega}_{n_2} \otimes (\Psi_{m0} + \Phi_{m0})' - I_{n_2} \otimes I_m]$, with $\bar{\omega}_{n_2} = \mathrm{diag}\{0, \ldots, 0, w_{n,n_1+1}, \ldots, w_{n,n}\}$. $A_{n_2,m}$ can have full rank, $n_2 m$, as in Assumption 2.2(iii) and (iv). As $S_{m0}^{-1}(P_{m0} + \Psi_{m0} + \Phi_{m0} - I_m)' = H_{m0} - I_m$ has eigenvalues $(\lambda_m - 1)$'s, where $\lambda_m$'s are eigenvalues of $H_{m0}$, $A_{n_1,m}$ has rank $n_1(m - m_1)$, where $m_1$ is the number of unit eigenvalues of $H_{m0}$. Therefore, the cointegration rank of the system is $nm - n_1 m_1$. For case MC, $0 < m_1 < m$, while the cointegration rank for case SC is $(n - n_1)m$ because the SC case has $H_{m0} = I_m$ and hence, $m_1 = m$.
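The rank formula can be checked numerically on a toy example. The sketch below is only an illustration under stated assumptions: the weights matrix (two disconnected groups of three units) and all parameter values are hypothetical, chosen so that $H = (I_m - \Psi')^{-1}(P + \Phi)'$ has one unit eigenvalue in the mixed case and equals $I_m$ in the pure spatial case.

```python
import numpy as np

def cointegration_rank(W, P, Psi, Phi):
    """Numerical rank of A = I_n (x) (P - I_m)' + W (x) (Psi + Phi)'."""
    n, m = W.shape[0], P.shape[0]
    A = np.kron(np.eye(n), (P - np.eye(m)).T) + np.kron(W, (Psi + Phi).T)
    return np.linalg.matrix_rank(A)

# Hypothetical W: two disconnected groups of three units, row-normalized.
# It has n1 = 2 unit eigenvalues; the remaining eigenvalues equal -0.5.
block = (np.ones((3, 3)) - np.eye(3)) / 2.0
W = np.kron(np.eye(2), block)
n, m, n1 = 6, 2, 2

Psi = 0.2 * np.eye(m)
Phi = np.diag([0.3, 0.2])

# Mixed cointegration: H = diag(1, 0.5), so m1 = 1.
P_mc = np.diag([0.5, 0.2])
rank_mc = cointegration_rank(W, P_mc, Psi, Phi)

# Pure spatial cointegration: H = I_m, so m1 = m.
P_sc = np.diag([0.5, 0.6])
rank_sc = cointegration_rank(W, P_sc, Psi, Phi)

print(rank_mc, n * m - n1 * 1)   # both should equal nm - n1*m1
print(rank_sc, (n - n1) * m)     # both should equal (n - n1)m
```

In this toy setting the numerical ranks agree with $nm - n_1 m_1$ and $(n - n_1)m$ respectively.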
We use rotations of variables to investigate the difference between spatial cointegration and mixed cointegration.
We utilize the known information provided by the spatial weights matrix to rotate the dependent variables, in order to
explicitly view how unit root processes are generated due to unit eigenvalues of $W_n$ and $H_{m0}$. The diagonalizability of $W_n$ gives
$$W_n = \Gamma_n\bar{\omega}_n\Gamma_n^{-1} = (\Gamma_{n,n_1}, \Gamma_{n,n_2})\begin{pmatrix} I_{n_1} & 0 \\ 0 & \bar{\omega}_{n_2} \end{pmatrix}\begin{pmatrix} \Gamma^{*}_{n_1,n} \\ \Gamma^{*}_{n_2,n} \end{pmatrix}, \qquad (2.7)$$
where $\bar{\omega}_n$ has $n_1$ unit eigenvalues in $I_{n_1}$ and the remaining $n_2 = (n - n_1)$ eigenvalues in $\bar{\omega}_{n_2}$ are strictly less than one in absolute value. $\Gamma_n = (\Gamma_{n,n_1}, \Gamma_{n,n_2})$, where the first $n_1$ columns are eigenvectors with respect to unit eigenvalues and the rest are with respect to other eigenvalues. $\Gamma^{*}_{n_1,n}$ represents the first $n_1$ rows of $\Gamma_n^{-1}$ while the remaining rows form $\Gamma^{*}_{n_2,n}$.
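As a minimal sketch of the partition in (2.7), the following code orders the eigenvalues of a toy weights matrix so that unit eigenvalues come first. It assumes a symmetric $W_n$ purely for convenience (so that $\Gamma_n^{-1} = \Gamma_n'$ and `numpy.linalg.eigh` applies); the text only requires diagonalizability, and the matrix itself is hypothetical.

```python
import numpy as np

# Hypothetical symmetric, row-normalized W: two groups of three units.
block = (np.ones((3, 3)) - np.eye(3)) / 2.0
W = np.kron(np.eye(2), block)

eigval, Gamma = np.linalg.eigh(W)       # W = Gamma diag(eigval) Gamma'
order = np.argsort(-eigval)             # put unit eigenvalues first
eigval, Gamma = eigval[order], Gamma[:, order]

n1 = int(np.sum(np.isclose(eigval, 1.0)))
Gamma_inv = Gamma.T                     # orthonormal eigenvectors
Gamma_star_n1 = Gamma_inv[:n1, :]       # rows tied to unit eigenvalues
Gamma_star_n2 = Gamma_inv[n1:, :]       # rows tied to the other eigenvalues
omega_n2 = np.diag(eigval[n1:])

# Rows in Gamma_star_n1 are left-invariant under W, which is what
# produces the unit-root block after rotation.
print(np.allclose(Gamma_star_n1 @ W, Gamma_star_n1))
print(np.allclose(Gamma_star_n2 @ W, omega_n2 @ Gamma_star_n2))
```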
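The original system is $Y_{nm,t} = W_nY_{nm,t}\Psi_{m0} + Y_{nm,t-1}P_{m0} + W_nY_{nm,t-1}\Phi_{m0} + U_{nm,t}$, where $U_{nm,t}$ represents the remaining model components (regressors, time fixed effects, individual fixed effects and idiosyncratic disturbances) for simplicity. Applying the transformation $\Gamma_n^{-1}$ to $Y_{nm,t}$, we arrive at $Y^{+}_{nm,t} = \bar{\omega}_nY^{+}_{nm,t}\Psi_{m0} + Y^{+}_{nm,t-1}P_{m0} + \bar{\omega}_nY^{+}_{nm,t-1}\Phi_{m0} + \Gamma_n^{-1}U_{nm,t}$, where $Y^{+}_{nm,s} = \Gamma_n^{-1}Y_{nm,s}$ for any $s$. With $Y^{+}_{nm,s} = \begin{pmatrix} Y^{+}_{n_1m,s} \\ Y^{+}_{n_2m,s} \end{pmatrix}$, then we have
$$\begin{aligned} Y^{+}_{n_1m,t} &= Y^{+}_{n_1m,t}\Psi_{m0} + Y^{+}_{n_1m,t-1}P_{m0} + Y^{+}_{n_1m,t-1}\Phi_{m0} + \Gamma^{*}_{n_1}U_{nm,t}, \\ Y^{+}_{n_2m,t} &= \bar{\omega}_{n_2}Y^{+}_{n_2m,t}\Psi_{m0} + Y^{+}_{n_2m,t-1}P_{m0} + \bar{\omega}_{n_2}Y^{+}_{n_2m,t-1}\Phi_{m0} + \Gamma^{*}_{n_2}U_{nm,t}. \end{aligned} \qquad (2.8)$$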
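Rearranging the parameters, we arrive at
$$\begin{aligned} &\text{(i)}\quad Y^{+}_{n_1m,t} = Y^{+}_{n_1m,t-1}H'_{m0} + \Gamma^{*}_{n_1}U_{nm,t}S'^{-1}_{m0}, \\ &\text{(ii)}\quad \mathrm{vec}(Y^{+}_{n_2m,t}) = (I_{mn_2} - \Psi'_{m0} \otimes \bar{\omega}_{n_2})^{-1}(P'_{m0} \otimes I_{n_2} + \Phi'_{m0} \otimes \bar{\omega}_{n_2})\mathrm{vec}(Y^{+}_{n_2m,t-1}) + (I_{mn_2} - \Psi'_{m0} \otimes \bar{\omega}_{n_2})^{-1}\mathrm{vec}(\Gamma^{*}_{n_2}U_{nm,t}). \end{aligned} \qquad (2.9)$$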
The subsystem (2.9)(ii) is stable under Assumption 2.2(iii) and (iv), as this subsystem captures stable components of variables. However, for the unstable subsystem, depending on different scenarios of (2.9)(i), the model can generate various cases. Transform (2.9)(i) into $\Delta Y^{+}_{n_1m,t} = Y^{+}_{n_1m,t-1}(H'_{m0} - I_m) + \Gamma^{*}_{n_1}U_{nm,t}S'^{-1}_{m0}$, which is similar to the error correction form for a panel VAR model.
For the case SC, the rank of $H_{m0} - I_m$ is 0, which suggests that there is no cointegration relationship among the $m$ variables $Y^{+}_{\cdot 1,n_1m,t}, Y^{+}_{\cdot 2,n_1m,t}, \ldots, Y^{+}_{\cdot m,n_1m,t}$, which are columns of $Y^{+}_{n_1m,t}$. However, since $\mathrm{rank}(A^{\theta}_{nm0}) = (n - n_1)m$ for case SC, there are $(n - n_1)m$ combinations among variables which are stable. The decomposition of system (2.9) implies that those combinations of variables that are linked by the spatial weights matrix in (2.9)(ii) are still stable. As unstable components of variables cancel out with each other across spatial units and the cointegration rank is determined only by the spatial weights matrix, the case is named pure spatial cointegration (case SC).
For the case MC, we have $0 < \mathrm{rank}(H_{m0} - I_m) < m$, as eigenvalues of $H_{m0}$ are between 0 and 1 in absolute values in Assumption 2.2(iii). To simplify the analysis, we assume that the parameter matrix $H_{m0}$ is diagonalizable such that
$$H'_{m0} = \Upsilon_{m0}\Lambda_{m0}\Upsilon_{m0}^{-1},$$
where $\Upsilon_{m0}$ is a nonsingular eigenvector matrix. The decomposition resulting in (2.9)(i) implies there are $m_1$ unit root processes among different variables for each spatial unit. There are $n_1$ rows in $Y^{+}_{n_1m,t}$, which suggests the total number of unit root processes is $n_1m_1$. A further decomposition of the systems in (2.8) reveals the unit root processes more clearly. Recall that $H'_{m0} = \Upsilon_{m0}\Lambda_{m0}\Upsilon_{m0}^{-1}$, where $\Upsilon_{m0}$ is a nonsingular matrix and the eigenvalue matrix $\Lambda_{m0}$ has $m_1$ unit eigenvalues. The unit values can be arranged such that $\Lambda_{m0} = \begin{pmatrix} I_{m_1} & 0 \\ 0 & \Lambda_{m_20} \end{pmatrix}$, i.e., the first $m_1$ eigenvalues are 1 and the remaining ones are smaller than 1 in absolute values. Denote $Y^{*}_{n_1m,t} = Y^{+}_{n_1m,t}\Upsilon_{m0}$ and, conformably, $Y^{*}_{n_1m,t} = (Y^{*}_{n_1m_1,t}, Y^{*}_{n_1m_2,t})$ and
Table 2.1
Stable and unstable cases for SVAR models.

| Name | Parameter restriction (Assumption 2.2) | Cointegration matrix | Cointegration rank | Cointegration relationship |
|---|---|---|---|---|
| Stable (S) | (ii) | n.a. | n.a. | n.a. |
| Spatial Cointegration (SC) | (iii) | $(I_n - W_n) \otimes (P_{m0} - I_m)'$ | $(n - n_1)m$ | Across spatial units |
| Mixed Cointegration (MC) | (iv) | $A^{\theta}_{nm0}$ in (2.5) | $nm - n_1m_1$ | Both |
| Variable Cointegration (VC) | (v) | $I_n \otimes (P_{m0} - I_m)'$ | $n(m - m_1)$ | Among variables in each spatial unit |
| Pure Unit Root (PU) | (vi) | 0 | n.a. | None |
| Explosive (E) | Various (a) | n.a. | n.a. | n.a. |

(a) We can define many scenarios that are "explosive". We introduce an estimation method, which could be applied to one of those explosive cases. It is because the decomposition in (2.4) still works for this case.
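$\Upsilon_{m0} = [\Upsilon_{m_10}, \Upsilon_{m_20}]$. Then we have the following unit root subsystem,
$$\begin{aligned} &\text{(i)}\quad Y^{*}_{n_1m_1,t} = Y^{*}_{n_1m_1,t-1} + \Gamma^{*}_{n_1}U_{nm,t}(I_m - \Psi_{m0})^{-1}\Upsilon_{m_10}, \\ &\text{(ii)}\quad Y^{*}_{n_1m_2,t} = Y^{*}_{n_1m_2,t-1}\Lambda_{m_20} + \Gamma^{*}_{n_1}U_{nm,t}(I_m - \Psi_{m0})^{-1}\Upsilon_{m_20}, \\ &\text{(iii)}\quad Y^{+}_{n_2m,t} = \bar{\omega}_{n_2}Y^{+}_{n_2m,t}\Psi_{m0} + Y^{+}_{n_2m,t-1}P_{m0} + \bar{\omega}_{n_2}Y^{+}_{n_2m,t-1}\Phi_{m0} + \Gamma^{*}_{n_2}U_{nm,t}, \end{aligned} \qquad (2.10)$$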
where (i) contains n1 m1 unit root processes and (ii) consists of stable components. Recall that (2.10)(iii) implies (2.8)(ii),
thus it is also a stable subsystem. In sum, the transformed system can be divided into three subsystems, and the first one
consists of unit root processes. The second subsystem reveals that for each spatial unit there are m2 stable relationships
among variables. The third subsystem reveals that spatially there are n2 stable relationships among variables for different
units. Hence, the cointegration rank should be nm − n1 m1 while the unstable components cancel out across variables
and across spatial units. The cointegration rank is determined not only by the number of unit eigenvalues of the spatial
weight matrix, but also the unknown parameter matrices, which differs from that in Yu et al. (2012). It can be named as
a mixed cointegration case (MC).
In Table 2.1, we summarize the characteristics of stable and unstable cases.15 An example of bivariate triangular SVAR
model is provided in the supplementary file.
3. QML estimation
We study the estimation strategy and asymptotic distributions of estimators for stable, spatial cointegration and mixed
cointegration models. For estimation, we list assumptions of the SVAR model.
Assumption 3.1.
(i) $v_{n,t,i\cdot}$, the $i$th row of $V_{nm,t}$, for all $i$ and $t$, are i.i.d. random vectors of dimension $m$ with zero mean and covariance matrix $\Sigma_{vm0}$. The elements of disturbances satisfy the moment condition $\sup_{k,l,p,q} E|v_{n,t,ik}v_{n,t,il}v_{n,t,ip}v_{n,t,iq}|^{1+\delta} < \infty$ for some constant $\delta > 0$.
(ii) Elements of Xnk,t are exogenous constants, uniformly bounded for all n and t.
15 The decomposition, estimation strategy and asymptotic analysis for both cases PU and VC are distinct from stable, spatial cointegration, mixed
cointegration cases, and they need to be analyzed separately.
(iii) n is a nondecreasing function of T . Both T and n tend to infinity.
(iv) Parameters satisfy Assumption 2.2(i) and data are generated under conditions in Assumption 2.2(ii)–(iv), or (vii).
According to Assumption 3.1(i), although disturbances are i.i.d. across time and individual spatial units, they are allowed to be correlated among different variables. The homogeneous disturbance assumption is restrictive, but it is common in the spatial econometrics literature with QML estimation, so we maintain this assumption in the paper.16
We consider the model with large T . As we are focusing on QML estimation and the investigation of dynamics without
any extra specification on the initial value, we need large T . For spatial econometrics, the number of spatial units n
is usually large, so we focus on the case with both large n and T . When n and T tend to infinity, the numbers of
both individual and time fixed effects go to infinity. We eliminate time fixed effects and have individual fixed effects
and common parameters of interest consistently estimated. To estimate the common parameters, it is computationally
effective with a concentrated log likelihood function by concentrating out individual fixed effects. Our asymptotic analysis
on the QML estimator of common parameters can also be easier with the concentrated log likelihood function. The
elimination of time effects can avoid the incidental parameter problem due to many time dummies, but the incidental
parameter problem remains for the many individual fixed effects.17
The dependent variable $y_{nm,t}$ of the model can be decomposed into four components. As all the eigenvalues of $B_{nm0}$ are inside the unit circle, $y^{(s)}_{nm,t}$ is stable. But if some eigenvalues of $H_{m0}$ are equal to or larger than 1, we have respectively, the possible SC and MC, and an explosive case defined in Assumption 2.2(vii). A direct QML with observed data without transformation will use all the components $y^{(\tau)}_{nm,t}$, $y^{(u)}_{nm,t}$, and $y^{(s)}_{nm,t}$ in estimation. When the true data generating process is in one of those cases, asymptotic properties of the direct QMLE are not yet known, but would be non-standard. However, the robust QMLE with the proposed spatial difference transformation method would overcome such a difficulty.
In our proposed estimation, we eliminate $y^{(\tau)}_{nm,t}$, $y^{(u)}_{nm,t}$, and $y^{(\alpha)}_{nm,t}$ and estimate parameters of the model using only information in $y^{(s)}_{nm,t}$. Note that the spatial difference $I_n - W_n$ has its corresponding zero eigenvalues when $W_n$ has unit eigenvalues, but the same eigenvectors $\Gamma_n$ of $W_n$. We have $[(I_n - W_n) \otimes I_m]y^{(u)}_{nm,t} = [(I_n - W_n) \otimes I_m]y^{(\tau)}_{nm,t} = 0$ since $(I_n - W_n)\Gamma_n 1_{n,n_1} = 0$ and $[(I_n - W_n) \otimes I_m]y^{(\alpha)}_{nm,t} = 0$ from $(I_n - W_n)l_n = 0$ as $W_n$ is row-normalized. Therefore, $[(I_n - W_n) \otimes I_m]y_{nm,t} = [(I_n - W_n) \otimes I_m]y^{(s)}_{nm,t}$ is stable. $E([(I_n - W_n) \otimes I_m]v_{nm,t}v'_{nm,t}[(I_n - W_n)' \otimes I_m]) = \Sigma_n \otimes \Sigma_{vm0}$, where $\Sigma_n = (I_n - W_n)(I_n - W_n)'$ is symmetric but its rank is $n - n_1$. To eliminate the linear dependence among the transformed disturbances, we use the eigenvalues and eigenvectors decomposition.18 Decompose $\Sigma_n = R_n\Lambda_{\Sigma n}R'_n$ where $R_n$ is the orthonormal eigenvectors matrix and $\Lambda_{\Sigma n} = \begin{pmatrix} \Lambda_{1n} & 0 \\ 0 & 0 \end{pmatrix}$ is the eigenvalue matrix. Define $R_n = [R_{1n}, R_{2n}]$ where $R_{1n}$ corresponds to (nonzero) eigenvalues of $\Lambda_{1n}$ and $R_{2n}$ to those zero eigenvalues. Define $W_n^{\dagger} = \Lambda_{1n}^{-1/2}R'_{1n}W_nR_{1n}\Lambda_{1n}^{1/2}$, $Y^{\dagger}_{nm,t} = \Lambda_{1n}^{-1/2}R'_{1n}(I_n - W_n)Y_{nm,t}$, and similarly for other variables $X_{nk,t}$, $C_{nm0}$, and $V_{nm,t}$. As $W_n$ is known, all those transformed vectors and matrices are known, and we have
$$Y^{\dagger}_{nm,t} = W_n^{\dagger}Y^{\dagger}_{nm,t}\Psi_{m0} + Y^{\dagger}_{nm,t-1}P_{m0} + W_n^{\dagger}Y^{\dagger}_{nm,t-1}\Phi_{m0} + X^{\dagger}_{nk,t}\Pi_{km0} + C^{\dagger}_{nm0} + V^{\dagger}_{nm,t}. \qquad (3.1)$$
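The covariance matrix of the disturbances is $E(\mathrm{vec}(V^{\dagger\prime}_{nm,t})\mathrm{vec}(V^{\dagger\prime}_{nm,t})') = (\Lambda_{1n}^{-1/2}R'_{1n}\Sigma_nR_{1n}\Lambda_{1n}^{-1/2}) \otimes \Sigma_{vm0} = I_{n-n_1} \otimes \Sigma_{vm0}$. Denote $S^{\dagger}_{nm0} = (\Lambda_{1n}^{-1/2}R'_{1n} \otimes I_m)S_{nm0}(R_{1n}\Lambda_{1n}^{1/2} \otimes I_m)$, therefore $|S^{\dagger}_{nm0}| = |(R'_{1n} \otimes I_m)S_{nm0}(R_{1n} \otimes I_m)|$. Furthermore, $|S_{nm0}| = |S^{\dagger}_{nm0}|\,|I_m - \Psi'_{m0}|^{n_1}$.19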
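The spatial difference transformation can be illustrated numerically. The sketch below is an illustration only, with a hypothetical row-normalized $W_n$ made of two disconnected groups (so $n_1 = 2$); variable names such as `F` and `T_half` are not from the paper. It builds $R_{1n}$, $\Lambda_{1n}$ and $W_n^{\dagger}$, then checks that the transformed disturbances have identity covariance structure across units and that the transformation commutes with the spatial lag in the sense needed for (3.1).

```python
import numpy as np

# Hypothetical row-normalized W with two disconnected groups of units.
W = np.zeros((6, 6))
W[:3, :3] = (np.ones((3, 3)) - np.eye(3)) / 2.0
W[3:, 3:] = np.array([[0.0, 0.7, 0.3],
                      [0.5, 0.0, 0.5],
                      [0.2, 0.8, 0.0]])
n = W.shape[0]

D = np.eye(n) - W                       # spatial difference I_n - W_n
Sigma_n = D @ D.T                       # rank n - n1

lam, R = np.linalg.eigh(Sigma_n)
keep = lam > 1e-10                      # nonzero eigenvalues only
R1, lam1 = R[:, keep], lam[keep]
n1 = n - lam1.size

T_half = np.diag(lam1 ** -0.5) @ R1.T   # Lambda_1n^{-1/2} R_1n'
F = T_half @ D                          # maps Y to the transformed Y
W_dag = T_half @ W @ R1 @ np.diag(lam1 ** 0.5)
J_star = D.T @ R1 @ np.diag(1.0 / lam1) @ R1.T @ D

print(n1)
print(np.allclose(F @ F.T, np.eye(n - n1)))   # identity covariance factor
print(np.allclose(F @ W, W_dag @ F))          # transformed spatial lag
print(np.allclose(J_star, F.T @ F))           # weighting matrix in the quadratic form
```

The second check is what allows $W_nY_{nm,t}$ to be replaced by $W_n^{\dagger}Y^{\dagger}_{nm,t}$ after transformation.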
The partial quasi log-likelihood function for $Y^{\dagger}_{nm,t}$ for $t = 1, \ldots, T$, by regarding $Y^{\dagger}_{nm,0}$ as if it is given, is
$$\frac{1}{(n - n_1)T}\ln L_{nT,m}(\theta, C_{nm}) = -\frac{m}{2}\ln(2\pi) + \frac{1}{n - n_1}\ln|S_{nm}| - \frac{n_1}{n - n_1}\ln|I_m - \Psi'_m| - \frac{1}{2}\ln|\Sigma_{vm}| - \frac{1}{2(n - n_1)T}\sum_{t=1}^{T}v_{nm,t}(\theta)'(J_n^{*} \otimes \Sigma_{vm}^{-1})v_{nm,t}(\theta), \qquad (3.2)$$
where $J_n^{*} = (I_n - W_n)'R_{1n}\Lambda_{1n}^{-1}R'_{1n}(I_n - W_n)$ and $v_{nm,t}(\theta) = (I_{mn} - W_n \otimes \Psi'_m)y_{nm,t} - (I_n \otimes P'_m + W_n \otimes \Phi'_m)y_{nm,t-1} - (X_{nk,t} \otimes I_m)\mathrm{vec}(\Pi'_{km}) - c_{nm}$.
The first order condition with respect to $c_{nm}$ gives an estimate of individual effects in terms of other parameters,
$$\hat{c}_{nm} = \frac{1}{T}\sum_{t=1}^{T}[S_{nm}y_{nm,t} - (I_n \otimes P'_m + W_n \otimes \Phi'_m)y_{nm,t-1} - (X_{nk,t} \otimes I_m)\mathrm{vec}(\Pi'_{km})].$$
16 If disturbances had heteroskedastic variances, a QML estimation might not be consistent. Other estimation methods are needed.
17 As T tends to infinity, many individual effects would not generate inconsistency for QML estimates, however, asymptotic biases would remain.
One might eliminate the individual effects by time differencing before estimation. However, due to the remaining initial value problem in a dynamic
model, asymptotic bias issue remains. So there is not much estimation advantages to eliminate individual fixed effects for estimation.
18 The method in this section can be found in textbooks, such as Theil (1971).
19 See the supplementary file for details of derivation.
Define temporal means $\bar{y}_{nm,T} = \frac{1}{T}\sum_{t=1}^{T}y_{nm,t}$, and $\tilde{y}_{nm,t} = y_{nm,t} - \bar{y}_{nm,T}$, similarly for $\tilde{X}_{nk,t}$, for $t = 1, \ldots, T$ as deviations from time means. Furthermore, for time lagged explanatory variables, $\bar{\bar{y}}_{nm,T} = \frac{1}{T}\sum_{t=1}^{T}y_{nm,t-1}$ and $\tilde{\tilde{y}}_{nm,t} = y_{nm,t} - \bar{\bar{y}}_{nm,T}$, for $t = 0, \ldots, T - 1$. The concentrated log likelihood function with individual effects concentrated out is
$$\frac{1}{(n - n_1)T}\ln L_{nT,m}(\theta) = -\frac{m}{2}\ln(2\pi) + \frac{1}{n - n_1}\ln|S_{nm}| - \frac{n_1}{n - n_1}\ln|I_m - \Psi'_m| - \frac{1}{2}\ln|\Sigma_{vm}| - \frac{1}{2(n - n_1)T}\sum_{t=1}^{T}\tilde{v}_{nm,t}(\theta)'(J_n^{*} \otimes \Sigma_{vm}^{-1})\tilde{v}_{nm,t}(\theta), \qquad (3.3)$$
where $\tilde{v}_{nm,t}(\theta) = S_{nm}\tilde{y}_{nm,t} - (I_n \otimes P'_m + W_n \otimes \Phi'_m)\tilde{\tilde{y}}_{nm,t-1} - (\tilde{X}_{nk,t} \otimes I_m)\mathrm{vec}(\Pi'_{km})$. With the concentrated log likelihood, we can focus on the estimation and computation of common parameters of interest. Furthermore, asymptotic analysis on the common parameters' estimators can be simplified without facing explicitly an infinite number of parameters.
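The equivalence between plugging in the estimated individual effects and time-demeaning all variables can be verified on simulated arrays. The following is a minimal sketch under simplifying assumptions: `S` and `A` are arbitrary stand-ins for $S_{nm}$ and $I_n \otimes P'_m + W_n \otimes \Phi'_m$, the regressor term is simplified to `X[t] @ b`, and all data are random numbers rather than draws from the model.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, k = 8, 5, 2                       # N stands for the stacked dimension nm

S = np.eye(N) - 0.3 * rng.random((N, N)) / N   # stand-in for S_nm
A = 0.4 * np.eye(N)                            # stand-in for the lag coefficient matrix
b = rng.normal(size=k)                         # stand-in for the regressor coefficients
y = rng.normal(size=(T + 1, N))                # y[0] is the initial observation
X = rng.normal(size=(T, N, k))                 # X[t-1] holds regressors of period t

# Residuals before removing individual effects, periods 1..T.
raw = np.array([S @ y[t] - A @ y[t - 1] - X[t - 1] @ b
                for t in range(1, T + 1)])
c_hat = raw.mean(axis=0)                       # concentrated individual effects
v_plugged = raw - c_hat

# Same residuals from time-demeaned data; current and lagged values
# are demeaned with their own time averages.
y_cur = y[1:] - y[1:].mean(axis=0)
y_lag = y[:-1] - y[:-1].mean(axis=0)
X_til = X - X.mean(axis=0)
v_demeaned = np.array([S @ y_cur[t] - A @ y_lag[t] - X_til[t] @ b
                       for t in range(T)])

print(np.allclose(v_plugged, v_demeaned))
```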
For the stable (S) and unstable (SC and MC) models, their identification and estimation are based on (3.3). The log likelihood for the SVAR above needs to be maximized by evaluating many parameters. We introduce a computationally efficient method to accelerate the evaluation process by concentration, as described in the supplementary file. This method needs to evaluate only the parameter matrix $\Psi_m$ in the final concentrated log likelihood.
The SVAR model contains own-variable spatial lags and cross-variable spatial lags as endogenous explanatory variables,
and the presence of endogenous variables raises an identification issue. If an optimal IV exists, we can employ the optimal
IV for endogenous variables in the equation system to identify each equation. Let $e'_{m,j}$ be the $1 \times m$ row unit vector with all elements being zeros except for its $j$th entry taking the value one. Define $\tilde{A}_{z,t} = [\tilde{a}_{1,t}, \tilde{a}_{2,t}, \ldots, \tilde{a}_{m,t}]$, an $n \times m$ matrix where $\tilde{a}_{j,t} = (W_n \otimes e'_{m,j})S_{nm0}^{-1}[(I_n \otimes P'_{m0} + W_n \otimes \Phi'_{m0})\tilde{\tilde{y}}_{nm,t-1} + (\tilde{X}_{nk,t} \otimes I_m)\mathrm{vec}(\Pi'_{km0})]$, and $\tilde{Z}_{nm,t} = [\tilde{\tilde{Y}}_{nm,t-1}, W_n\tilde{\tilde{Y}}_{nm,t-1}, \tilde{X}_{nk,t}]$, an $n \times (2m + k)$ matrix. The matrix $\tilde{A}_{z,t}$ represents optimal IVs for the spatial lagged terms of $\tilde{Y}_{nm,t}$ and they can be derived from the expectation of the reduced form for $y_{nm,t}$, while $\tilde{Z}_{nm,t}$ consists of predetermined and exogenous regressors. For a finite sample, the parameters $\Psi_m$, $P_m$, $\Phi_m$, and $\Pi_{km}$ are identified if the matrix whose first $2m + k$ columns are $[\tilde{Z}'_{nm,1}, \tilde{Z}'_{nm,2}, \ldots, \tilde{Z}'_{nm,T}]'$ and whose $m$ additional columns are $[\tilde{A}'_{z,1}, \tilde{A}'_{z,2}, \ldots, \tilde{A}'_{z,T}]'$ has full column rank $3m + k$. Assumption 3.2 formally states identification conditions.
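Assumption 3.2.
(i) (a) $\lim_{T\to\infty}\frac{1}{(n-1)T}\sum_{t=1}^{T}E[\tilde{Z}'_{nm,t}J_n^{*}\tilde{Z}_{nm,t}]$ exists and is nonsingular. The limit matrices $\lim_{T\to\infty}\frac{1}{(n-1)T}\sum_{t=1}^{T}E[\tilde{A}'_{z,t}J_n^{*}\tilde{A}_{z,t}]$ and $\lim_{T\to\infty}\frac{1}{(n-1)T}\sum_{t=1}^{T}E[\tilde{Z}'_{nm,t}J_n^{*}\tilde{A}_{z,t}]$ exist.
(b) $\lim_{T\to\infty}\frac{1}{(n-n_1)T}\sum_{t=1}^{T}E[P'_{zzA,t}J_n^{*}P_{zzA,t}]$ exists and is nonsingular, where $P_{zzA,t} = \tilde{A}_{z,t} - \tilde{Z}_{nm,t}\left[\frac{1}{(n-n_1)T}\sum_{t=1}^{T}\tilde{Z}'_{nm,t}J_n^{*}\tilde{Z}_{nm,t}\right]^{-1}\left[\frac{1}{(n-n_1)T}\sum_{t=1}^{T}\tilde{Z}'_{nm,t}J_n^{*}\tilde{A}_{z,t}\right]$.
(ii)
$$\lim_{n\to+\infty}\left[\left|S^{\dagger\prime -1}_{nm0}S^{\dagger\prime}_{nm}(I_{n-n_1} \otimes \Sigma^{*-1}_{vm})S^{\dagger}_{nm}S^{\dagger -1}_{nm0}(I_{n-n_1} \otimes \Sigma^{*}_{vm0})\right|^{\frac{1}{m(n-n_1)}} - \frac{1}{m(n-n_1)}\mathrm{Tr}\left[S^{\dagger\prime -1}_{nm0}S^{\dagger\prime}_{nm}(I_{n-n_1} \otimes \Sigma^{*-1}_{vm})S^{\dagger}_{nm}S^{\dagger -1}_{nm0}(I_{n-n_1} \otimes \Sigma^{*}_{vm0})\right]\right] < 0$$
unless $\Sigma^{*}_{vm} = \Sigma^{*}_{vm0}$ and $\Psi_m = \Psi_{m0}$, where $\sigma^2_m = \mathrm{Tr}(\Sigma_{vm})$ and $\Sigma^{*}_{vm} = \Sigma_{vm}/\sigma^2_m$.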
Assumption 3.2(i) implies the existence and nonsingularity of the limiting matrix
$$\lim_{T\to+\infty}\frac{1}{(n-n_1)T}\sum_{t=1}^{T}E\begin{pmatrix} \tilde{Z}'_{nm,t}J_n^{*}\tilde{Z}_{nm,t} & \tilde{Z}'_{nm,t}J_n^{*}\tilde{A}_{z,t} \\ \tilde{A}'_{z,t}J_n^{*}\tilde{Z}_{nm,t} & \tilde{A}'_{z,t}J_n^{*}\tilde{A}_{z,t} \end{pmatrix},$$
which requires that optimal instruments for endogenous spatial lags exist and that they are not linearly dependent with each other. Assumption 3.2(ii) exploits the information of disturbances and the structure of the spatial weights matrix, which is similar to simultaneous equations SAR models for cross-sectional data in Yang and Lee (2017). Intuitively, Assumption 3.2(ii) utilizes the i.i.d. assumption for disturbances across time and space.20 Lemma 3.1 states that the linear independence of $I_{n-n_1}$, $W_n^{\dagger}$, $W_n^{\dagger\prime}$ and $W_n^{\dagger\prime}W_n^{\dagger}$ is sufficient to guarantee that the arithmetic–geometric inequality in Assumption 3.2(ii) holds with a finite sample.
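Lemma 3.1.
When $I_{n-n_1}$, $W_n^{\dagger}$, $W_n^{\dagger\prime}$ and $W_n^{\dagger\prime}W_n^{\dagger}$ are linearly independent, then
$$\left|S^{\dagger\prime -1}_{nm0}S^{\dagger\prime}_{nm}(I_{n-n_1} \otimes \Sigma^{*-1}_{vm})S^{\dagger}_{nm}S^{\dagger -1}_{nm0}(I_{n-n_1} \otimes \Sigma^{*}_{vm0})\right|^{\frac{1}{m(n-n_1)}} - \frac{1}{m(n-n_1)}\mathrm{Tr}\left[S^{\dagger\prime -1}_{nm0}S^{\dagger\prime}_{nm}(I_{n-n_1} \otimes \Sigma^{*-1}_{vm})S^{\dagger}_{nm}S^{\dagger -1}_{nm0}(I_{n-n_1} \otimes \Sigma^{*}_{vm0})\right] < 0,$$
unless $\Sigma^{*}_{vm} = \Sigma^{*}_{vm0}$ and $\Psi_m = \Psi_{m0}$.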
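The inequality in Lemma 3.1 is an instance of the arithmetic–geometric mean inequality applied to positive eigenvalues: $|M|^{1/N} \le \mathrm{Tr}(M)/N$, with equality only when all eigenvalues coincide. The short sketch below illustrates this for a generic symmetric positive definite matrix; it is only an illustration on hypothetical random input and does not construct the matrices $S^{\dagger}_{nm}$ of the lemma.

```python
import numpy as np

def am_gm_gap(M):
    """Return det(M)**(1/N) - trace(M)/N for a symmetric positive definite M."""
    N = M.shape[0]
    _, logdet = np.linalg.slogdet(M)    # log-determinant for numerical stability
    return np.exp(logdet / N) - np.trace(M) / N

rng = np.random.default_rng(1)
B = rng.normal(size=(5, 5))
M = B @ B.T + np.eye(5)                 # generic positive definite matrix

gap_generic = am_gm_gap(M)              # strictly negative: eigenvalues differ
gap_scalar = am_gm_gap(2.5 * np.eye(5)) # zero: all eigenvalues equal

print(gap_generic < 0, abs(gap_scalar) < 1e-12)
```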
20 This condition can be extended for models with more complicated disturbances as long as proper assumptions can be put on the first and
second moments of disturbance terms; for example, spatial moving average or spatial autoregression disturbances.
The identification and consistency of the parameters of the SVAR model are established in the following proposition. Proposition 3.1(1) and (2) indicate that the two sets of conditions in Assumption 3.2 can separately identify the parameters. Therefore, as long as either one holds, the parameters can be consistently estimated.
Proposition 3.1.
Under Assumptions 2.1(i)–(ii) and 3.1,
(1) with Assumption 3.2(i), $\Psi_m$, $P_m$, $\Phi_m$ and $\Pi_{km}$ can be identified;
(2) with Assumption 3.2(ii), $\Psi_m$ and $\Sigma_{vm}$ can be identified. Furthermore, if Assumption 3.2(i)(a) holds, then $P_m$, $\Phi_m$ and $\Pi_{km}$ can also be identified;
(3) as $T$ goes to infinity, $\frac{1}{(n-n_1)T}\ln L_{nT,m}(\theta) - E[\frac{1}{(n-n_1)T}\ln L_{nT,m}(\theta)] \xrightarrow{p} 0$ uniformly in $\theta$ in its compact parameter space;
(4) with Assumption 3.2, the QMLE $\hat{\theta}_{nT} = \arg\max_{\theta}\frac{1}{(n-n_1)T}\ln L_{nT,m}(\theta)$ for the SVAR model is consistent.
The asymptotic distribution of the QML estimator $\hat{\theta}_{nT}$ can be derived from the Taylor expansion of the log likelihood function $\hat{\theta}_{nT} - \theta_0 = -\left[\frac{\partial^2\ln L_{nT,m}(\tilde{\theta}_{nT})}{\partial\theta\partial\theta'}\right]^{-1}\frac{\partial\ln L_{nT,m}(\theta_0)}{\partial\theta}$, where $\tilde{\theta}_{nT}$ lies between $\hat{\theta}_{nT}$ and $\theta_0$. At the true $\theta_0$, the first order derivatives can be decomposed as $\frac{\partial\ln L_{nT,m0}}{\partial\theta} \equiv \frac{\partial\ln L_{nT,m}(\theta_0)}{\partial\theta} = \frac{\partial\ln L^{1}_{nT,m0}}{\partial\theta} + \frac{\partial\ln L^{R}_{nT,m0}}{\partial\theta}$, where the first component captures the score with zero mean, and the second component generates possible asymptotic bias. Detailed formulas of the above decomposition are in the supplementary file (Section 3). By inspection, $\frac{1}{\sqrt{(n-n_1)T}}\frac{\partial\ln L^{1}_{nT,m0}}{\partial\theta}$ is a statistic of the general form: $\sum_{t=1}^{T}\{U'_{nm,t-1}v_{nm,t} + v'_{nm,t}B_{1nm}v_{nm,t} + D'_{nm,t}v_{nm,t} - \mathrm{Tr}[B_{1nm}(I_n \otimes \Sigma_{vm0})]\}$, where $U_{nm,t} = \sum_{h=1}^{+\infty}G_{nm,c}G^{h}_{nm,d}v_{nm,t-h+1}$, $B_{1nm}$, $G_{nm,c}$ and $G_{nm,d}$ are generic $nm \times nm$ matrices, such that $B_{1nm}$, $G_{nm,c}$, and $\sum_{h=1}^{+\infty}\mathrm{abs}(G^{h}_{nm,d})$ are bounded in row and column sum norms, and $D_{nm,t}$ is an $nm \times 1$ generic vector with uniformly bounded entries. As the components of the disturbance vector for each $i$ at $t$ of $v_{nm,t}$ are not independently distributed, the central limit theorem for linear–quadratic form in Yu et al. (2008) for the univariate case needs to be extended to the multivariate case.
Lemma 3.2.
Suppose $Q_{nT,m} = \sum_{t=1}^{T}[U'_{nm,t-1}v_{nm,t} + v'_{nm,t}B_{1nm}v_{nm,t} + D'_{nm,t}v_{nm,t} - \mathrm{Tr}(B_{1nm}\Sigma_{vm})]$, $V_{nm,t}$ satisfies Assumption 3.1(i), its variance $\sigma^2_{Q_{nT,m}}$ is $O(nT)$, and $\frac{1}{nT}\sigma^2_{Q_{nT,m}}$ is bounded away from zero, then $\frac{Q_{nT,m}}{\sigma_{Q_{nT,m}}} \xrightarrow{d} N(0, 1)$.
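With a proper decomposition, the variance matrix of the normalized score vector consists of two components: $\mathrm{Cov}\left(\frac{1}{\sqrt{(n-n_1)T}}\frac{\partial\ln L^{1}_{nT,m0}}{\partial\theta}\right) = \Omega_{\theta_0,nT} + \Xi_{\theta_0,nT} + O\left(\frac{1}{T}\right)$, where if disturbances are normally distributed, $\Xi_{\theta_0,nT} = 0$. The detailed expressions for $\Omega_{\theta_0,nT}$ and $\Xi_{\theta_0,nT}$ are in the supplementary file (Section 3). By the CLT in Lemma 3.2, $\frac{1}{\sqrt{(n-n_1)T}}\frac{\partial\ln L^{1}_{nT,m0}}{\partial\theta} \xrightarrow{d} N(0, \Omega_{\theta_0} + \Xi_{\theta_0})$, where $\Omega_{\theta_0} = \lim_{T\to+\infty}\Omega_{\theta_0,nT}$ and $\Xi_{\theta_0} = \lim_{T\to+\infty}\Xi_{\theta_0,nT}$, which are assumed to exist. The other component $\frac{1}{\sqrt{(n-n_1)T}}\frac{\partial\ln L^{R}_{nT,m0}}{\partial\theta}$ has expressions in forms with either $\bar{v}'_{nm,T}B_{1nm}\bar{v}_{nm,T}$ or $\bar{\bar{U}}'_{nm,T}\bar{v}_{nm,T}$, where $\bar{\bar{U}}_{nm,T} = \frac{1}{T}\sum_{t=0}^{T-1}U_{nm,t}$, due to the concentration of individual effects. This component has $\frac{1}{\sqrt{(n-n_1)T}}\frac{\partial\ln L^{R}_{nT,m0}}{\partial\theta} = \sqrt{\frac{n-n_1}{T}}\Delta_{R,nT} + O\left(\sqrt{\frac{n}{T^3}}\right)$, where explicit expressions of entries in $\Delta_{R,nT}$, which capture possible asymptotic bias components, are in Appendix D.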
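For the normalized $\frac{1}{(n-n_1)T}\frac{\partial^2\ln L_{nT,m}(\hat{\theta}_{nT})}{\partial\theta\partial\theta'}$, it has the following regular convergence properties.
Lemma 3.3.
Under Assumptions 2.1(i)–(ii), 3.1, and 3.2,
(1) for any consistent estimate $\hat{\theta}_{nT}$ of $\theta_0$, $\frac{1}{(n-n_1)T}\frac{\partial^2\ln L_{nT,m}(\hat{\theta}_{nT})}{\partial\theta\partial\theta'} - \frac{1}{(n-n_1)T}\frac{\partial^2\ln L_{nT,m}(\theta_0)}{\partial\theta\partial\theta'} = o_p(1)$;
and (2) $\frac{1}{(n-n_1)T}\frac{\partial^2\ln L_{nT,m}(\theta_0)}{\partial\theta\partial\theta'} + \Omega_{\theta_0,nT} = o_p(1)$.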
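By defining $\Delta_{\theta_0,nT} = \Omega^{-1}_{\theta_0,nT}\Delta_{R,nT}$, the QMLE $\hat{\theta}_{nT}$ has the following asymptotic distribution:
Theorem 3.1.
Under Assumptions 2.1(i)–(ii), 3.1, and 3.2, when $T \to +\infty$,
$$\sqrt{(n-n_1)T}(\hat{\theta}_{nT} - \theta_0) - \sqrt{\frac{n-n_1}{T}}\Delta_{\theta_0,nT} + O_p\left(\max\left(\sqrt{\frac{n-n_1}{T^3}}, \sqrt{\frac{1}{T}}\right)\right) \xrightarrow{d} N(0, \Omega^{-1}_{\theta_0}(\Omega_{\theta_0} + \Xi_{\theta_0})\Omega^{-1}_{\theta_0}).$$
Consequently, (1) if $(n - n_1)/T \to 0$, then the bias $\sqrt{\frac{n-n_1}{T}}\Delta_{\theta_0,nT}$ vanishes, and $\sqrt{(n-n_1)T}(\hat{\theta}_{nT} - \theta_0) \xrightarrow{d} N(0, \Omega^{-1}_{\theta_0}(\Omega_{\theta_0} + \Xi_{\theta_0})\Omega^{-1}_{\theta_0})$;
(2) if $(n - n_1)/T \to M$, a finite positive constant, $\sqrt{(n-n_1)T}(\hat{\theta}_{nT} - \theta_0) - \sqrt{M}\Delta_{\theta_0,nT} \xrightarrow{d} N(0, \Omega^{-1}_{\theta_0}(\Omega_{\theta_0} + \Xi_{\theta_0})\Omega^{-1}_{\theta_0})$; and
(3) if $(n - n_1)/T \to \infty$, $T(\hat{\theta}_{nT} - \theta_0) - \Delta_{\theta_0,nT} \xrightarrow{p} 0$.
For the case that the disturbances are normally distributed, $\Xi_{\theta_0} = 0$.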
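To see heuristically why the leading bias of $\hat{\theta}_{nT}$ is of order $1/T$, one can divide the expansion in Theorem 3.1 by $\sqrt{(n-n_1)T}$. The display below is only a sketch of that rescaling; $\xi_{nT}$ is a placeholder symbol (not in the original) for a term that is asymptotically $N(0, \Omega^{-1}_{\theta_0}(\Omega_{\theta_0} + \Xi_{\theta_0})\Omega^{-1}_{\theta_0})$.

```latex
\hat{\theta}_{nT} - \theta_0
  = \frac{1}{T}\,\Delta_{\theta_0,nT}
  + \frac{1}{\sqrt{(n-n_1)T}}\,\xi_{nT}
  + O_p\!\left(\max\left(\frac{1}{T^{2}},\ \frac{1}{T\sqrt{n-n_1}}\right)\right)
```

The first term is the bias component, and the second is the usual sampling noise, which is why the relative size of $n - n_1$ and $T$ decides whether the bias matters for inference.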
The analysis of the asymptotic distribution differs from Lee and Yu (2010a) with the application of lemmas of
convergence for multiple dependent variables in each spatial unit and the multivariate CLT, which are in Appendix B.
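When $\frac{n}{T} \to M > 0$, the QMLE is not asymptotically centered at zero. However, the asymptotic bias may be eliminated by considering a proper bias correction. As the asymptotic bias of $\hat{\theta}_{nT}$ is $\lim_{T\to+\infty}\frac{1}{T}\Delta_{\theta_0,nT}$, we define the bias corrected estimator $\hat{\theta}^{c}_{nT} = \hat{\theta}_{nT} - \frac{1}{T}\Delta_{\theta,nT}(\hat{\theta}_{nT})$, where $\Delta_{\theta,nT}(\hat{\theta}_{nT}) = -\left[\frac{1}{(n-n_1)T}\frac{\partial^2\ln L_{nT,m}(\hat{\theta}_{nT})}{\partial\theta\partial\theta'}\right]^{-1}\Delta_{R,nT}(\hat{\theta}_{nT})$. To show that $\hat{\theta}^{c}_{nT}$ can be $\sqrt{nT}$ consistent and asymptotically normally centered at zero, an additional technical assumption is needed. Define $S^{-}_{nm} = (\Gamma_n \otimes I_m)\mathrm{diag}\{0, \ldots, 0, (I_m - \omega_{n,n_1+1}\Psi'_m)^{-1}, \ldots, (I_m - \omega_{n,n}\Psi'_m)^{-1}\}(\Gamma_n^{-1} \otimes I_m)$, $E^{w}_{nm,ij} = (\Gamma_n \otimes I_m)\mathrm{diag}\{0, \ldots, 0, \omega_{n,n_1+1}E'_{m,ij}, \ldots, \omega_{n,n}E'_{m,ij}\}(\Gamma_n^{-1} \otimes I_m)$ and $E_{nm,ij} = (\Gamma_n \otimes I_m)\mathrm{diag}\{0, \ldots, 0, E'_{m,ij}, \ldots, E'_{m,ij}\}(\Gamma_n^{-1} \otimes I_m)$ with the first $n_1$ diagonal blocks being zeros. $E_{m,ij} = e_{m,i}e'_{m,j}$.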
Assumption 3.3. The row and column sum norms of the sequences $\sum_{g=1}^{\infty}\sum_{h=0}^{g-1}B^{h}_{nm}S^{-}_{nm}E^{w}_{nm,ij}B^{g-1-h}_{nm}$ and $\sum_{g=1}^{\infty}\sum_{h=0}^{g-1}B^{h}_{nm}S^{-}_{nm}E_{nm,ij}B^{g-1-h}_{nm}$ are bounded uniformly in absolute value in the parameter space for $i, j = 1, \ldots, m$.
Theorem 3.2.
Under Assumptions 2.1(i)–(ii), 3.1–3.3, when $n/T^3 \to 0$, the bias corrected QMLE $\hat{\theta}^{c}_{nT}$ has $\sqrt{(n-n_1)T}(\hat{\theta}^{c}_{nT} - \theta_0) \xrightarrow{d} N(0, \Omega^{-1}_{\theta_0}(\Omega_{\theta_0} + \Xi_{\theta_0})\Omega^{-1}_{\theta_0})$.
4. Tests for cointegration rank
It is a crucial issue to distinguish stable and unstable cases. For cases SC and MC, the cointegration rank is also
important since an (n − n1 )m rank implies that the model exhibits pure spatial cointegration while an nm − n1 m1 rank for
0 < m1 < m, which is relatively larger, indicates that dependent variables are mixed cointegrated among spatial units and
with each other in a spatial unit. For the univariate case, m = 1, it is clear that only the spatial cointegration might occur.
Therefore, we introduce a hypothesis testing procedure under Assumption 2.2(ii) or (iii) or (iv), which can distinguish
the stable model, spatial cointegration model, or mixed cointegration model generalizing Johansen’s cointegration rank
test (Johansen, 1988, 1991).21 We will firstly introduce the transformation of the model, and present the procedure and
statistic for the hypothesis testing. Then we derive the asymptotic distribution of the statistic.
The transformation procedure consists of multiple steps and is notationally intensive. However, the basic idea is simple. First, decompose the original system as in (2.8) and use only the possibly unstable subsystem, with time dummies eliminated. Second, concentrate out all regressors and individual fixed effects. Third, use the log-likelihood function to construct the likelihood ratio statistics.
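Recall the decomposition of $W_n$ in (2.7): $\Gamma_{n,n_1}$ collects $n_1$ eigenvectors corresponding to unit eigenvalues of $W_n$, which are arranged as the first $n_1$ columns of $\Gamma_n$. For the starred matrices $[\Gamma^{*\prime}_{n_1,n}, \Gamma^{*\prime}_{n-n_1,n}]' = \Gamma_n^{-1}$, $\Gamma^{*}_{n_1,n}$ represents the first $n_1$ rows in $\Gamma_n^{-1}$. As $W_nl_n = l_n$, without loss of generality, let $\Gamma_{n,n_1} = (\Gamma_{n,n_0}, \frac{1}{\sqrt{n}}l_n)$ where $n_0 = n_1 - 1$. Note that $I_{n_1} = \Gamma^{*}_{n_1,n}\Gamma_{n,n_1} \equiv \begin{pmatrix} \Gamma^{*}_{n_0,n}\Gamma_{n,n_0} & 0 \\ 0 & 1 \end{pmatrix}$, where $\Gamma^{*}_{n_0,n}$ is the first $n_0$ rows of $\Gamma^{*}_{n_1,n}$. Thus, $\Gamma^{*}_{n_0,n}\Gamma_{n,n_0} = I_{n_0}$, $\Gamma^{*}_{n_0,n}l_n = 0$, and $\Gamma^{*}_{n_0,n}\Gamma_n\varpi_n\Gamma_n^{-1} = \Gamma^{*}_{n_0,n}[\Gamma_{n,n_0}, \frac{1}{\sqrt{n}}l_n, \Gamma_{n,n_2}]\varpi_n\Gamma_n^{-1} = [I_{n_0}, 0, 0]\begin{pmatrix} I_{n_1} & 0 \\ 0 & \varpi_{n_2} \end{pmatrix}\Gamma_n^{-1} = \Gamma^{*}_{n_0,n}$. Multiplying $(\Gamma^{*}_{n_0,n}\Gamma^{*\prime}_{n_0,n})^{-1/2}\Gamma^{*}_{n_0,n}$ from the left to the model, we can eliminate the time fixed effect and arrive at
$$Y^{\circ}_{n_0m,t} = Y^{\circ}_{n_0m,t-1}H'_{m0} + X^{\circ}_{n_0,t}\Pi^{\circ}_{km0} + C^{\circ}_{n_0m0} + V^{\circ}_{n_0m,t}, \text{ or}$$
$$\Delta Y^{\circ\prime}_{n_0m,t} = \alpha_{m0}\beta'_{m0}Y^{\circ\prime}_{n_0m,t-1} + \Pi^{\circ\prime}_{km0}X^{\circ\prime}_{n_0,t} + C^{\circ\prime}_{n_0m0} + V^{\circ\prime}_{n_0m,t},$$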
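where $Y^{\circ}_{n_0m,t} = (\Gamma^{*}_{n_0,n}\Gamma^{*\prime}_{n_0,n})^{-1/2}\Gamma^{*}_{n_0,n}Y_{nm,t}$, $V^{\circ}_{n_0m,t} = (\Gamma^{*}_{n_0,n}\Gamma^{*\prime}_{n_0,n})^{-1/2}\Gamma^{*}_{n_0,n}V_{nm,t}(I_m - \Psi_{m0})^{-1}$, $X^{\circ}_{n_0,t} = (\Gamma^{*}_{n_0,n}\Gamma^{*\prime}_{n_0,n})^{-1/2}\Gamma^{*}_{n_0,n}X_{nk,t}$, $C^{\circ}_{n_0m0} = (\Gamma^{*}_{n_0,n}\Gamma^{*\prime}_{n_0,n})^{-1/2}\Gamma^{*}_{n_0,n}C_{nm0}(I_m - \Psi_{m0})^{-1}$, and $\Pi^{\circ}_{km0} = \Pi_{km0}(I_m - \Psi_{m0})^{-1}$. The subsystem captures all possible unstable components with time dummies eliminated by perpendicularity of $l_n$ and the rows of $\Gamma^{*}_{n_0,n}$. Here $\alpha_{m0}$ and $\beta_{m0}$ are $m \times (m - m_1)$ matrices such that $H_{m0} - I_m = \alpha_{m0}\beta'_{m0}$, as $m - m_1$ is the rank of $H_{m0} - I_m$.22
Let $y^{\circ}_{n_0,t} = \mathrm{vec}(Y^{\circ\prime}_{n_0m,t})$, $v^{\circ}_{n_0,t} = \mathrm{vec}(V^{\circ\prime}_{n_0m,t})$, and $c^{\circ}_{n_0} = \mathrm{vec}(C^{\circ\prime}_{n_0m})$. Define $\Sigma_{um0}$ as the covariance matrix of any row of $V_{nm,t}(I_m - \Psi_{m0})^{-1}$. We have $\mathrm{cov}(v^{\circ}_{n_0,t}) = I_{n_0} \otimes \Sigma_{um0}$. The log likelihood function for the subsystem $y^{\circ}_{n_0,t}$ with parameters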
21 Note that the testing procedure does not apply to cases PU or VC. If we want to test for these cases, we should first implement t test for
Φm0 + Ψm0 = 0.
22 The representation was introduced by Engle and Granger (1987). The matrices $\alpha_{m0}$ and $\beta_{m0}$ are not unique. Normalization is needed in order to estimate them; see Lütkepohl (2005, Chapter 7) for possible normalization. However, the likelihood ratio statistic can be derived as long as they are properly selected to maximize the log likelihood function.
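$\beta_m$, $\alpha_m$, $\Pi^{\circ}_{km}$, $\Sigma_{um}$ is
$$l_{n_0T}(\beta_m, \alpha_m, \Pi^{\circ}_{km}, \Sigma_{um}) = \text{constant} - \frac{n_0T}{2}\ln|\Sigma_{um}| - \frac{1}{2}\sum_{t=1}^{T}e_{n_0,t}'(I_{n_0} \otimes \Sigma_{um}^{-1})e_{n_0,t},$$
where $e_{n_0,t} = \Delta y^{\circ}_{n_0,t} - (I_{n_0} \otimes (\alpha_m\beta'_m))y^{\circ}_{n_0,t-1} - c^{\circ}_{n_0} - (X^{\circ}_{n_0,t} \otimes I_m)\mathrm{vec}(\Pi^{\circ\prime}_{km})$ abbreviates the bracketed residual that appears on both sides of the quadratic form.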
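Now we concentrate out parameters:
1. Concentrating $\hat{c}^{\circ}_{n_0}$. The fixed effects can be concentrated out by $\hat{c}^{\circ}_{n_0} = \frac{1}{T}\sum_{t=1}^{T}\Delta y^{\circ}_{n_0,t} - (I_{n_0} \otimes (\alpha_m\beta'_m))\left(\frac{1}{T}\sum_{t=0}^{T-1}y^{\circ}_{n_0,t}\right) - \frac{1}{T}\sum_{t=1}^{T}(X^{\circ}_{n_0,t} \otimes I_m)\mathrm{vec}(\Pi^{\circ\prime}_{km})$.
2. Concentrating $\mathrm{vec}(\Pi^{\circ\prime}_{km})$. Define $\Delta\tilde{y}^{\circ}_{n_0,t} = \Delta y^{\circ}_{n_0,t} - \frac{1}{T}\sum_{t=1}^{T}\Delta y^{\circ}_{n_0,t}$, $\tilde{\tilde{y}}^{\circ}_{n_0,t-1} = y^{\circ}_{n_0,t-1} - \frac{1}{T}\sum_{t=0}^{T-1}y^{\circ}_{n_0,t}$, and $\tilde{X}^{\circ}_{n_0,t} = X^{\circ}_{n_0,t} - \frac{1}{T}\sum_{t=1}^{T}X^{\circ}_{n_0,t}$. Let $\tilde{X}^{\circ} = [\tilde{X}^{\circ\prime}_{n_0,1}, \ldots, \tilde{X}^{\circ\prime}_{n_0,T}]'$, $\tilde{\tilde{y}}^{\circ}_{-1} = [\tilde{\tilde{y}}^{\circ\prime}_{n_0,0}, \ldots, \tilde{\tilde{y}}^{\circ\prime}_{n_0,T-1}]'$, and $\Delta\tilde{y}^{\circ} = [\Delta\tilde{y}^{\circ\prime}_{n_0,1}, \ldots, \Delta\tilde{y}^{\circ\prime}_{n_0,T}]'$. We can concentrate out $\mathrm{vec}(\Pi^{\circ\prime}_{km})$ by $\mathrm{vec}(\hat{\Pi}^{\circ\prime}_{km}) = [(\tilde{X}^{\circ\prime}\tilde{X}^{\circ}) \otimes I_m]^{-1}(\tilde{X}^{\circ\prime} \otimes I_m)[\Delta\tilde{y}^{\circ} - [I_{n_0T} \otimes (\alpha_m\beta'_m)]\tilde{\tilde{y}}^{\circ}_{-1}]$.
3. Concentrating $\Sigma_{um}$. Let $M_x = I_{n_0T} - \tilde{X}^{\circ}(\tilde{X}^{\circ\prime}\tilde{X}^{\circ})^{-1}\tilde{X}^{\circ\prime}$, $\Delta\tilde{y}^{M} = (M_x \otimes I_m)\Delta\tilde{y}^{\circ}$, and $\tilde{\tilde{y}}^{M}_{-1} = (M_x \otimes I_m)\tilde{\tilde{y}}^{\circ}_{-1}$. We can reshape the $mn_0T \times 1$ vector $\Delta\tilde{y}^{M}$ as an $m \times n_0T$ matrix $\Delta\tilde{Y}^{M} = [\Delta\tilde{y}^{M}_{1,1}, \Delta\tilde{y}^{M}_{2,1}, \ldots, \Delta\tilde{y}^{M}_{n_0,1}, \ldots, \Delta\tilde{y}^{M}_{n_0,T}]$, and reshape the vector $\tilde{\tilde{y}}^{M}_{-1}$ as an $m \times n_0T$ matrix $\tilde{\tilde{Y}}^{M}_{-1} = [\tilde{\tilde{y}}^{M}_{1,0}, \tilde{\tilde{y}}^{M}_{2,0}, \ldots, \tilde{\tilde{y}}^{M}_{n_0,0}, \ldots, \tilde{\tilde{y}}^{M}_{n_0,T-1}]$. We concentrate out the entries in the covariance matrix $\Sigma_{um}$ by $\hat{\Sigma}_{um}(\alpha_m, \beta_m) = \frac{1}{n_0T}(\Delta\tilde{Y}^{M} - \alpha_m\beta'_m\tilde{\tilde{Y}}^{M}_{-1})(\Delta\tilde{Y}^{M} - \alpha_m\beta'_m\tilde{\tilde{Y}}^{M}_{-1})'$.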
Therefore, the concentrated log likelihood function is
$$l_{n_0T}(\alpha_m, \beta_m) = \text{constant} - \frac{n_0T}{2}\ln|\hat{\Sigma}_{um}(\alpha_m, \beta_m)|. \qquad (4.1)$$
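To maximize the log likelihood function, given $\beta_m$, $\alpha_m(\beta_m) = (\Delta\tilde{Y}^{M}\tilde{\tilde{Y}}^{M\prime}_{-1}\beta_m)(\beta'_m\tilde{\tilde{Y}}^{M}_{-1}\tilde{\tilde{Y}}^{M\prime}_{-1}\beta_m)^{-1}$. Define
$$S_{00} = \frac{1}{n_0T}\Delta\tilde{Y}^{M}\Delta\tilde{Y}^{M\prime}, \quad S_{10} = \frac{1}{n_0T}\tilde{\tilde{Y}}^{M}_{-1}\Delta\tilde{Y}^{M\prime}, \quad \text{and} \quad S_{11} = \frac{1}{n_0T}\tilde{\tilde{Y}}^{M}_{-1}\tilde{\tilde{Y}}^{M\prime}_{-1},$$
with $S_{01} = S'_{10}$. Let $\lambda_1, \lambda_2, \ldots, \lambda_m$ be the solutions of $|\lambda S_{11} - S_{10}S_{00}^{-1}S_{01}| = 0$ with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m$. Substituting $\alpha_m(\beta_m)$ into the log likelihood function, similarly to Johansen (1988, p. 235), we get the concentrated log likelihood function23
$$\begin{aligned} l_{n_0T}(\beta_m) &= \text{constant} - \frac{n_0T}{2}\ln|S_{00} - S_{01}\beta_m(\beta'_mS_{11}\beta_m)^{-1}\beta'_mS_{10}| \\ &= \text{constant} - \frac{n_0T}{2}[\ln|S_{00}| + \ln|\beta'_m(S_{11} - S_{10}S_{00}^{-1}S_{01})\beta_m| - \ln|\beta'_mS_{11}\beta_m|]. \end{aligned}$$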
Using Proposition A.7 of Lütkepohl (2005), under $H_0: \mathrm{rank}(H_{m0} - I_m) = m - m_1$, the concentrated log likelihood function $l_{n_0T}(\beta_m)$ is maximized by the choice $\tilde{\beta}_m$ being the first $m - m_1$ eigenvectors in the eigenvector matrix $\hat{V}_m$ of the equation $|\lambda S_{11} - S_{10}S_{00}^{-1}S_{01}| = 0$ with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m$, normed by $\hat{V}'_mS_{11}\hat{V}_m = I_m$. The maximized likelihood function under the condition that $\mathrm{rank}(H_{m0} - I_m) = m - m_1$ is
$$l_{\max} = \text{constant} - \frac{n_0T}{2}\left(\ln|S_{00}| + \sum_{p=1}^{m-m_1}\ln|1 - \lambda_p|\right).$$
Therefore, the likelihood ratio test statistic against $H_1: m - m_1 < \mathrm{rank}(H_{m0} - I_m) \le m$ is
$$LR_{n_0T}(m - m_1, m) = -n_0T\sum_{p=m-m_1+1}^{m}\ln|1 - \lambda_p| = n_0T\sum_{p=m-m_1+1}^{m}\lambda_p + o_p(1).$$
Under H0 , its asymptotic distribution is in Proposition 4.1.
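The eigenvalue problem and the resulting statistic can be sketched in a few lines. The code below is a simplified illustration, not the paper's full procedure: it takes two generic $m \times N$ arrays standing in for $\Delta\tilde{Y}^{M}$ and $\tilde{\tilde{Y}}^{M}_{-1}$ (with $N$ playing the role of $n_0T$), skips the spatial transformation and the concentration steps, and uses simulated data from a hypothetical bivariate system with one unit root. Function names are placeholders.

```python
import numpy as np

def johansen_eigen(dY, Y_lag):
    """Solve |lam*S11 - S10 S00^{-1} S01| = 0; eigenvalues sorted descending.

    dY and Y_lag are m x N arrays (differences and lagged levels)."""
    N = dY.shape[1]
    S00 = dY @ dY.T / N
    S10 = Y_lag @ dY.T / N
    S11 = Y_lag @ Y_lag.T / N
    L = np.linalg.cholesky(S11)             # S11 = L L'
    C = np.linalg.solve(L, S10)             # L^{-1} S10
    M = C @ np.linalg.solve(S00, C.T)       # L^{-1} S10 S00^{-1} S01 L^{-T}
    lam = np.linalg.eigvalsh((M + M.T) / 2.0)
    return lam[::-1]

def lr_statistic(lam, rank, N):
    """LR statistic for H0: rank(H - I) = rank, using the smallest m - rank roots."""
    return -N * np.sum(np.log(1.0 - lam[rank:]))

# Hypothetical data: y_t = y_{t-1} H' + e_t with H = diag(1, 0.5).
rng = np.random.default_rng(2)
m, N = 2, 500
H = np.diag([1.0, 0.5])
y = np.zeros((N + 1, m))
for t in range(1, N + 1):
    y[t] = y[t - 1] @ H.T + rng.normal(size=m)

dY = (y[1:] - y[:-1]).T
Y_lag = y[:-1].T
lam = johansen_eigen(dY, Y_lag)
lr0 = lr_statistic(lam, 0, N)               # H0: rank = 0
lr1 = lr_statistic(lam, 1, N)               # H0: rank = 1
print(lam, lr0, lr1)
```

The Cholesky step turns the generalized eigenvalue problem into a standard symmetric one, which keeps the sketch dependent on NumPy only. Critical values are not computed here.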
We run a sequential hypothesis testing procedure by m1 = m, m − 1, . . . , 1 for a sequence of null hypothesis from
H0 : rank(Hm0 − Im ) = 0 to H0 : rank(Hm0 − Im ) = m − 1. The inference follows:
1. If H0 : rank(Hm0 − Im ) = 0 cannot be rejected, we conclude that the model exhibits spatial cointegration with
cointegration rank (n − n1 )m (case SC);
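23 The second equality is due to $\begin{vmatrix} S_{00} & S_{01}\beta_m \\ \beta'_mS_{10} & \beta'_mS_{11}\beta_m \end{vmatrix} = |S_{00}|\,|\beta'_m(S_{11} - S_{10}S_{00}^{-1}S_{01})\beta_m| = |\beta'_mS_{11}\beta_m|\,|S_{00} - S_{01}\beta_m(\beta'_mS_{11}\beta_m)^{-1}\beta'_mS_{10}|$.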
2. If all $H_0$'s: $\mathrm{rank}(H_{m0} - I_m) = j$ for $j = 0, 1, \ldots, m - m_1 - 1$ are rejected and $H_0: \mathrm{rank}(H_{m0} - I_m) = m - m_1$ cannot be rejected, we conclude that the model exhibits mixed cointegration with cointegration rank $nm - n_1m_1$ (case MC);
3. If all $H_0$'s: $\mathrm{rank}(H_{m0} - I_m) = j$ for $j = 0, 1, \ldots, m - 1$ are rejected, we conclude that the model is stable (case S).
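The sequential decision rule above can be sketched as a small function. This is an illustration only: `classify_case` is a hypothetical helper, and the critical values must be supplied by the user from the asymptotic distribution of the statistic, which is not reproduced here; the numbers in the usage lines are arbitrary placeholders.

```python
def classify_case(lr_stats, critical_values):
    """Sequential rank test.

    lr_stats[j] is the LR statistic for H0: rank(H - I) = j, j = 0, ..., m-1.
    critical_values[j] is the matching user-supplied critical value.
    Returns (case label, m1), where m1 is the implied number of unit
    eigenvalues of H."""
    m = len(lr_stats)
    for j in range(m):
        if lr_stats[j] <= critical_values[j]:   # H0: rank = j is not rejected
            if j == 0:
                return "SC", m                  # pure spatial cointegration, m1 = m
            return "MC", m - j                  # mixed cointegration, m1 = m - j
    return "S", 0                               # every null rejected: stable case

# Placeholder statistics and critical values for m = 2.
print(classify_case([1.0, 0.5], [10.0, 4.0]))
print(classify_case([20.0, 0.5], [10.0, 4.0]))
print(classify_case([20.0, 9.0], [10.0, 4.0]))
```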
We make the following additional assumptions in order to derive the asymptotic distribution of the likelihood ratio test statistics. Define $\vartheta(L) = (1-L)I_m - \alpha_{m0}\beta_{m0}' L$ with L being the lag operator and $\chi(L) = (\vartheta(L) - \vartheta(1))/(1-L)$. Also, $B = \beta_{\perp m0}(\alpha_{\perp m0}'\chi(1)\beta_{\perp m0})^{-1}\alpha_{\perp m0}'$ and $\tau_{n_0} \equiv B C_{n_0 m0}^{\circ\prime}$, where $\beta_{\perp m0}$ and $\alpha_{\perp m0}$ are m × m1 matrices with full column rank such that $\beta_{m0}'\beta_{\perp m0} = \alpha_{m0}'\alpha_{\perp m0} = 0_{(m-m_1)\times m_1}$.
Assumption 4.1. $\lim_{n_0\to\infty}\frac{1}{n_0}\beta_{\perp m0}'\tau_{n_0}\tau_{n_0}'\beta_{\perp m0}$ exists and has full rank. n0 is monotonically increasing with T.
This assumption is crucial for deriving the asymptotic distribution of our test statistics. The hypothesis test for cointegration ranks for panel data models developed in this paper differs from those in the literature. Larsson et al. (2001) consider heterogeneous vector error correction models. Breitung (2005) studies a cointegration rank test method for panel vector autoregression, which allows individual-specific αm0 and uses the cross-sectional average of statistics for all time series in inference. Both papers assume independently distributed disturbances. Our test focuses on the asymptotic distribution of the statistic in the presence of individual fixed effects and exogenous variables, which can be applied to SVAR models. The extension of Assumption 4.1 to the case with exogenous regressors is the following Assumption 4.2, which also guarantees that the time trend terms dominate the unit root terms. The only difference between Assumptions 4.1 and 4.2 is that the exogenous variables can also generate a deterministic trend, in addition to individual dummies.
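Assumption 4.2.
(1) $X^{\circ}_{n_0,t} = X^{c}_{n_0} + X^{u}_{n_0,t}$, where $X^{c}_{n_0}$ is the nonstochastic mean and $x^{u\prime}_{n_0,i,t}$, a 1 × k vector as the ith row of matrix $X^{u}_{n_0,t}$, has uniformly bounded partial sums. Let $\tau^{x}_{n_0} = \tau_{n_0} + B\Pi^{\circ\prime}_{km0}X^{c\prime}_{n_0}$. Specifically,
(i) $\left|\sum_{h=1}^{t} C(L)x^{u}_{n_0,i,h}\right| < D_x < +\infty$ for a constant Dx uniformly for i, t, and $C(L) = \Pi^{\circ\prime}_{km0}$, $B(L)\Pi^{\circ\prime}_{km0}$, and $B_1(L)\Pi^{\circ\prime}_{km0}$;24
(ii) $\left|\sum_{h=1}^{T}\frac{h}{T}x^{u}_{n_0,i,h}\right| < D_x < +\infty$ for a constant Dx uniformly for i, T;
(iii) When T → ∞, the limits of
\[
\frac{1}{n_0 T}\sum_{t=1}^{T}B(L)\Pi^{\circ\prime}_{km0}X^{u\prime}_{n_0,t}X^{u}_{n_0,t}\Pi^{\circ}_{km0}B(L)',\quad
\frac{1}{n_0 T}\sum_{t=1}^{T}B(L)\Pi^{\circ\prime}_{km0}X^{u\prime}_{n_0,t}X^{u}_{n_0,t},\quad
\frac{1}{n_0 T}\sum_{t=1}^{T}B_1(L)X^{u\prime}_{n_0,t-1}X^{u}_{n_0,t},
\]
\[
\frac{1}{n_0 T^{3/2}}\sum_{t=1}^{T}t\,\tilde X^{u\prime}_{n_0,t-1}\tau^{x\prime}_{n_0},\quad
\frac{1}{n_0 T^{3/2}}\sum_{t=1}^{T}t\,B_1(L)\tilde X^{u\prime}_{n_0,t-1}\tau^{x\prime}_{n_0},\quad\text{and}\quad
\frac{1}{n_0 T^{3/2}}\sum_{t=1}^{T}\tilde X^{u\prime}_{n_0,t-1}\Big(\sum_{s=1}^{t}V^{\circ}_{n_0 m,s}\Big)
\]
exist. Furthermore, $\lim_{T\to\infty}\frac{1}{n_0 T}\sum_{t=1}^{T}X^{u\prime}_{n_0,t}X^{u}_{n_0,t}$ exists and is nonsingular.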
(2) n0 is monotonically increasing with T. $\lim_{n_0\to\infty}\left[\frac{1}{n_0}\beta_{\perp m0}'\tau^{x}_{n_0}\tau^{x\prime}_{n_0}\beta_{\perp m0}\right]$ exists and has full rank.
Assumption 4.2 explicitly assumes that the exogenous variables are composed of a nonstochastic mean and a term with bounded partial sums, which simplifies the asymptotic analysis. Assumption 4.2(i) and (ii) state that processes related to the component $X^{u}_{n_0,t}$ have bounded partial sums, which rules out time trends generated by $X^{u}_{n_0,t}$. Assumption 4.2(iii) regulates the limits of ‘‘second moments’’ of $X^{u}_{n_0,t}$. In the time series literature, the VAR model in Johansen (1991) also includes seasonal dummies as time-varying variables with their partial sums remaining bounded. Here we need both n0 and T to tend to infinity simultaneously. Adopting assumptions in Quah (1994) and Levin et al. (2002), we assume that n0 is monotonically increasing with T.25 The following proposition presents asymptotic properties of the test statistics.
Proposition 4.1. Under Assumptions 2.1, 2.2(i), and 3.1(i)–(iii), for the stable case defined in Assumption 2.2(ii) or the unstable cases defined in Assumption 2.2(iii) or (iv),
(1) with Assumption 4.1, for the model without exogenous variables, i.e. $Y^{\circ}_{n_0 m,t} = Y^{\circ}_{n_0 m,t-1}H_{m0}' + C^{\circ}_{n_0 m0} + V^{\circ}_{n_0 m,t}$, and
(2) with Assumption 4.2, for the model with exogenous variables, i.e. $Y^{\circ}_{n_0 m,t} = Y^{\circ}_{n_0 m,t-1}H_{m0}' + X^{\circ}_{n_0,t}\Pi^{\circ}_{km0} + C^{\circ}_{n_0 m0} + V^{\circ}_{n_0 m,t}$,
under H0 : rank(Hm0 − Im) = m − m1 for m1 = 1, 2, . . . , m, as $\lim_{T\to\infty} n/T = M < +\infty$, the likelihood ratio test statistics
\[
LR_{n_0 T}(m-m_1, m) \xrightarrow{d} \mathrm{Tr}\left[(W_{m_1} + B_{m_1})(W_{m_1} + B_{m_1})'\right],
\]
where $W_{m_1}$ is an m1 × m1 matrix with each entry being an i.i.d. standard normally distributed random variable and
\[
B_{m_1} = -\sqrt{3M}\left(\alpha_{\perp m0}'\lim_{n_0\to\infty}\Big(\frac{1}{n_0}\tau^{x}_{n_0}\tau^{x\prime}_{n_0}\Big)\alpha_{\perp m0}\right)^{-1/2}\left(\alpha_{\perp m0}'\Sigma_{um0}\alpha_{\perp m0}\right)^{1/2}.^{26}
\]
24 The definitions and expressions of B(L) and B1(L) are in the Appendix.
25 With Assumption 2.1(iii), n0 is proportional to n.
26 Note that $\tau^{x}_{n_0} = \tau_{n_0}$ when there are no exogenous variables or $X^{c}_{n_0}\Pi^{\circ}_{km0}B' = 0$.
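The likelihood ratio test statistic has a chi-square distribution when n/T → 0, while it is noncentral chi-square distributed with n going to infinity proportionally to T. $\mathrm{Tr}[(W_{m_1} + B_{m_1})(W_{m_1} + B_{m_1})']$ has the same distribution as $\mathrm{Tr}[(W_{m_1} + B^{*}_{m_1})(W_{m_1} + B^{*}_{m_1})']$ with $B^{*}_{m_1} = R_{m_1}B_{m_1}R_{m_1}'$, where $R_{m_1}$ is a generic m1 × m1 orthonormal matrix. For inference, we need to estimate consistently the term $B_{m_1}$. As $\alpha_{\perp m0}'S_{00}\alpha_{\perp m0} \xrightarrow{p} \alpha_{\perp m0}'\Sigma_{um0}\alpha_{\perp m0}$ and
\[
\alpha_{\perp m0}'\Big(\frac{1}{n_0}\tau^{x}_{n_0}\tau^{x\prime}_{n_0}\Big)\alpha_{\perp m0}
= \alpha_{\perp m0}'\Big(\frac{1}{n_0}\big(C^{\circ\prime}_{n_0 m0} + \Pi^{\circ\prime}_{km0}X^{c\prime}_{n_0}\big)\big(C^{\circ\prime}_{n_0 m0} + \Pi^{\circ\prime}_{km0}X^{c\prime}_{n_0}\big)'\Big)\alpha_{\perp m0},
\]
the latter term can be estimated with any consistent estimators $\hat\alpha_m\hat\beta_m'$ and $\hat\Pi_{km}$.27 The consistent estimator for $\frac{1}{n_0}(C^{\circ\prime}_{n_0 m0} + \Pi^{\circ\prime}_{km0}X^{c\prime}_{n_0})(C^{\circ\prime}_{n_0 m0} + \Pi^{\circ\prime}_{km0}X^{c\prime}_{n_0})'$ is
\[
\frac{1}{n_0}\hat C^{x,\circ\prime}_{n_0 m0}\hat C^{x,\circ}_{n_0 m0}
= \frac{1}{n_0 T^{2}}\Big[\sum_{t=1}^{T}\big(Y^{\circ\prime}_{n_0 m,t} - \hat\alpha_m\hat\beta_m'Y^{\circ\prime}_{n_0 m,t-1} - \hat\Pi^{\circ\prime}_{km}\tilde X^{\circ\prime}_{n_0,t}\big)\Big]\Big[\sum_{t=1}^{T}\big(Y^{\circ\prime}_{n_0 m,t} - \hat\alpha_m\hat\beta_m'Y^{\circ\prime}_{n_0 m,t-1} - \hat\Pi^{\circ\prime}_{km}\tilde X^{\circ\prime}_{n_0,t}\big)\Big]'.
\]
The only task left is to estimate $\alpha_{\perp m0}$. Since $\alpha_{\perp m0}$ is not unique, with the restriction $\alpha_{\perp m0}'\alpha_{\perp m0} = I_{m_1}$, it can be identified up to transformation by an orthonormal matrix. We estimate
\[
\tilde\alpha_{\perp m} = S_{00}^{-1}S_{01}S_{11}^{-1/2}\hat V_{m,m_1}\big(\hat V_{m,m_1}'S_{11}^{-1/2}S_{10}S_{00}^{-2}S_{01}S_{11}^{-1/2}\hat V_{m,m_1}\big)^{-1/2}
\]
since this estimator satisfies $\tilde\alpha_m'\tilde\alpha_{\perp m} = [(S_{01}\tilde\beta_m)(\tilde\beta_m'S_{11}\tilde\beta_m)^{-1}]'\tilde\alpha_{\perp m} = 0$, where $\hat V_{m,m_1}$ represents the matrix of eigenvectors corresponding to the last m1 eigenvalues $\lambda_{m-m_1+1} \geq \cdots \geq \lambda_m$ for $|\lambda I_m - S_{11}^{-1/2}S_{10}S_{00}^{-1}S_{01}S_{11}^{-1/2}| = 0$.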
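The two properties claimed for this estimator (orthonormal columns and orthogonality to the estimated adjustment matrix) can be checked numerically. The sketch below builds the estimator from symmetric eigenvectors of S11^{-1/2} S10 S00^{-1} S01 S11^{-1/2}; the function names, the simulated moment matrices, and the choice m = 3, m1 = 2 are illustrative assumptions, not the authors' code or data.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse symmetric square root of a symmetric positive definite matrix."""
    w, Q = np.linalg.eigh(S)
    return Q @ np.diag(w ** -0.5) @ Q.T

def alpha_and_alpha_perp(S00, S01, S11, m1):
    """Return (alpha_tilde, alpha_perp_tilde); hypothetical helper."""
    S11_ih = inv_sqrt(S11)
    A = S11_ih @ S01.T @ np.linalg.solve(S00, S01) @ S11_ih
    lam, V = np.linalg.eigh(A)          # eigh returns ascending eigenvalues
    V = V[:, ::-1]                      # reorder so lambda_1 >= ... >= lambda_m
    m = V.shape[1]
    V_first, V_last = V[:, : m - m1], V[:, m - m1:]
    beta = S11_ih @ V_first             # normalized so that beta' S11 beta = I
    alpha = S01 @ beta @ np.linalg.inv(beta.T @ S11 @ beta)
    G = np.linalg.solve(S00, S01) @ S11_ih @ V_last   # S00^{-1} S01 S11^{-1/2} V
    alpha_perp = G @ inv_sqrt(G.T @ G)                # G'G carries the S00^{-2} term
    return alpha, alpha_perp

# Illustrative moment matrices from simulated residuals (m = 3, m1 = 2).
rng = np.random.default_rng(1)
R1 = rng.normal(size=(400, 3))
R0 = R1 @ rng.normal(size=(3, 3)) * 0.3 + rng.normal(size=(400, 3))
S00, S01, S11 = R0.T @ R0 / 400, R0.T @ R1 / 400, R1.T @ R1 / 400
alpha, alpha_perp = alpha_and_alpha_perp(S00, S01, S11, m1=2)
```

Orthogonality follows because the first m − m1 and the last m1 eigenvectors of a symmetric matrix are mutually orthogonal.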
Lemma 4.1 (Consistency of Estimator α̃⊥m). Under the null hypothesis and Assumptions 2.1, 2.2(i), and 3.1(i)–(iii), in addition, for the stable case defined in Assumption 2.2(ii) or the unstable cases defined in Assumption 2.2(iii) or (iv), with Assumptions 4.1 or 4.2, $\tilde\alpha_{\perp m}$ is a consistent estimator for a basis of the null space of $\alpha_{m0}'$, i.e., $\alpha_{m0}'\tilde\alpha_{\perp m} = o_p(1)$.
The above lemma allows us to use the plug-in method to estimate $B_{m_1}$ and simulate critical values for hypothesis testing. We propose an estimator of the limiting distribution of the LR statistics, i.e., the cumulative distribution function $F_{LR}(c) = \Pr(\mathrm{Tr}[(W_{m_1} + B_{m_1})(W_{m_1} + B_{m_1})'] < c)$, as $\tilde F_{LR}(c) = \Pr(\mathrm{Tr}[(W_{m_1} + \tilde B_{m_1})(W_{m_1} + \tilde B_{m_1})'] < c)$, where
\[
\tilde B_{m_1} = -\sqrt{3M}\left(\tilde\alpha_{\perp m}'\Big(\frac{1}{n_0}\hat C^{x,\circ\prime}_{n_0 m0}\hat C^{x,\circ}_{n_0 m0}\Big)\tilde\alpha_{\perp m}\right)^{-1/2}\left(\tilde\alpha_{\perp m}'S_{00}\tilde\alpha_{\perp m}\right)^{1/2}.
\]
Note that we treat M = 0 when we assume n0/T → 0. The asymptotic distribution of the test statistic is pivotal with relatively small n0 while it is non-pivotal with proportional n0 and T. Hence, we define the critical region of the LR test with asymptotic size α as $\{LR_{n_0 T}(m-m_1, m) \geq \tilde c_{1-\alpha}\}$, where $\tilde c_{1-\alpha}$ is the (1 − α) × 100 percentile of the cumulative distribution function $\tilde F_{LR}(\cdot)$. Corollary 1(i) points out that $\tilde F_{LR}(c) \to F_{LR}(c)$, which implies that the critical region defined by $\tilde c_{1-\alpha}$ is asymptotically equivalent to the critical region defined by $c_{1-\alpha}$. Monte Carlo experiments show that the estimated CDF and critical regions are precise in finite samples. Moreover, Corollary 1(ii) states that the LR test for the null hypothesis H0 against the alternative hypothesis H1 with critical region $\{LR_{n_0 T}(m-m_1, m) \geq \tilde c_{1-\alpha}\}$ is consistent.
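The simulation of critical values from the limiting law Tr[(W + B)(W + B)′] can be sketched as follows. This is a minimal Monte Carlo illustration, assuming the shift matrix B (or its plug-in estimate) is already available; the function name, the number of draws, and the example shift of −1 are hypothetical choices.

```python
import numpy as np

def simulated_critical_value(B, level=0.05, draws=40000, seed=0):
    """(1 - level) quantile of Tr[(W + B)(W + B)'] with W an m1 x m1
    matrix of i.i.d. N(0, 1) entries (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    m1 = B.shape[0]
    W = rng.standard_normal((draws, m1, m1))
    Z = W + B                                   # B is broadcast over all draws
    stats = np.einsum("dij,dij->d", Z, Z)       # Tr[Z Z'] = sum of squared entries
    return np.quantile(stats, 1.0 - level)

c_central = simulated_critical_value(np.zeros((1, 1)))     # B = 0: chi-square(1)
c_shifted = simulated_critical_value(np.array([[-1.0]]))   # nonzero shift
```

With B = 0 and m1 = 1 the simulated value should be close to the 95% chi-square(1) quantile of about 3.84, and a nonzero shift moves the critical value upward.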
Corollary 1. Under Assumptions 2.1, 2.2(i), and 3.1(i)–(iii), in addition to H0 , for the stable case defined in Assumption 2.2(ii)
or the unstable cases defined in Assumption 2.2(iii) or (iv), with Assumption 4.1 or 4.2,
(i) (Estimation of the limiting distribution of the LR statistics) F̃LR (c) − FLR (c) = o(1) for each c;
(ii) (Consistency of the critical region of the LR statistics) the LR test for the null hypothesis H0 : rank(Hm0 − Im ) = m − m1
against the alternative hypothesis H1 : rank(Hm0 − Im ) > m − m1 with critical regions defined by c̃1−α is consistent.
The above LR test procedure would not weakly consistently estimate the rank if the type-I error were fixed with respect to all sample sizes. However, letting $O(1) < \tilde c_{nT,1-\alpha_{nT}} = o(nT)$, we can consistently estimate the rank of cointegration since the type-I error disappears asymptotically while the test with critical regions defined by $\tilde c_{nT,1-\alpha_{nT}}$ is still consistent. Consistency of estimators for matrix ranks is discussed in the literature, for instance, in Cragg and Donald (1997) and Robin and Smith (2000). Monte Carlo simulations reveal finite sample performances of the proposed LR test statistics. Derivations and proofs of this proposition are presented in the Appendix and a supplementary file.
5. Monte Carlo experiments
We conduct Monte Carlo experiments to investigate small sample properties of QMLEs and test statistics for cointegration ranks for the SVAR model with disturbances with normal or non-normal distributions. We also compare the QMLE
with other estimators such as those of 2SLS, 3SLS, and misspecified single equation QML. Queen spatial weights matrices
Wn are usually employed in the literature so we use a block diagonal matrix formed by a row-normalized queen matrix.28
27 For example, for each time series, unrestricted OLS estimators for both matrices are consistent (see Lütkepohl, 2005).
28 Queen matrix means any pair of spatial units which share a border or a single common point are neighbors. Here, each block contains 9 spatial units that are arranged in a 3 × 3 grid. For example, spatial unit 1 is connected with units 2, 4 and 5, while unit 2 is connected with units 1, 3, 4, 5, and 6. With more units in a block, we need a larger sample size to increase the number of blocks. The simulation conclusions are not sensitive to the size of blocks (for example, the block size can be 9 or 16 as in Yu et al., 2012). With a block diagonal structure of Wn, the number of unit eigenvalues increases with n. For example, there are 4 unit eigenvalues of Wn when n = 36, and 8 when n = 72. For additional simulations, other forms for the spatial weights matrices are allowed.
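The block diagonal queen design described in footnote 28 can be reproduced in a few lines. The sketch below is an illustration under the stated 3 × 3 layout with units numbered row by row; the function names are hypothetical.

```python
import numpy as np

def queen_block(side=3):
    """Row-normalized queen contiguity matrix for a side x side grid."""
    k = side * side
    W = np.zeros((k, k))
    for r in range(side):
        for c in range(side):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    # neighbors share a border or a corner; a unit is not its own neighbor
                    if (dr or dc) and 0 <= rr < side and 0 <= cc < side:
                        W[r * side + c, rr * side + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def block_diag_queen(n, side=3):
    """Block diagonal W_n made of n / side^2 identical queen blocks."""
    k = side * side
    return np.kron(np.eye(n // k), queen_block(side))

W36 = block_diag_queen(36)
unit_eigs_36 = int(np.sum(np.isclose(np.linalg.eigvals(W36).real, 1.0)))
neighbors_of_unit_1 = (np.flatnonzero(W36[0]) + 1).tolist()   # 1-based unit labels
neighbors_of_unit_2 = (np.flatnonzero(W36[1]) + 1).tolist()
```

Each connected, row-normalized block contributes exactly one unit eigenvalue, which is why the count grows with the number of blocks.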
Table 5.1
QML estimation for case S.
Method
QML(Stable)
QML(Stable)
QML(Stable)
QML(Stable)
n, T
36, 30
36, 60
72, 30
72, 60
θ
θc
θ
θc
θ
θc
θ
θc
π11
Bias
S.D.
CP
0.001
(0.038)
0.940
0.000
(0.038)
0.946
0.000
(0.028)
0.940
0.000
(0.028)
0.942
0.002
(0.026)
0.962
0.001
(0.026)
0.970
0.003
(0.019)
0.926
0.003
(0.019)
0.932
π21
Bias
S.D.
CP
0.000
(0.041)
0.938
0.001
(0.041)
0.942
0.001
(0.029)
0.934
0.000
(0.029)
0.940
0.003
(0.028)
0.948
0.002
(0.028)
0.952
0.000
(0.019)
0.952
0.000
(0.019)
0.956
p11
Bias
S.D.
CP
0.026
(0.031)
0.868
0.007
(0.030)
0.956
0.013
(0.021)
0.914
0.003
(0.021)
0.942
0.026
(0.023)
0.760
0.007
(0.023)
0.924
0.013
(0.015)
0.862
0.004
(0.015)
0.948
p21
Bias
S.D.
CP
−0.013
(0.029)
0.924
−0.010
(0.029)
0.948
−0.006
−0.004
−0.006
(0.020)
0.952
−0.013
(0.022)
0.890
−0.009
(0.020)
0.948
(0.022)
0.918
(0.015)
0.932
−0.004
(0.015)
0.948
Bias
S.D.
CP
0.000
(0.094)
0.940
−0.006
(0.096)
0.938
0.000
(0.067)
0.938
−0.002
0.002
(0.067)
0.936
−0.003
(0.068)
0.936
0.002
(0.046)
0.956
0.000
(0.046)
0.952
Bias
S.D.
CP
−0.005
(0.087)
0.948
0.007
(0.087)
0.954
−0.003
0.002
(0.057)
0.964
−0.004
(0.061)
0.922
0.005
(0.061)
0.942
−0.001
(0.056)
0.964
0.003
(0.043)
0.942
Bias
S.D.
CP
0.008
(0.091)
0.954
0.007
(0.091)
0.962
0.001
(0.065)
0.934
0.001
(0.065)
0.938
0.000
(0.060)
0.954
0.000
(0.060)
0.956
−0.002
(0.045)
0.950
−0.002
(0.045)
0.950
Bias
S.D.
CP
−0.002
(0.110)
0.934
−0.002
(0.110)
0.936
−0.004
−0.004
(0.079)
0.940
(0.080)
0.942
0.006
(0.078)
0.940
0.004
(0.078)
0.944
0.005
(0.054)
0.950
0.005
(0.054)
0.950
σ11
Bias
S.D.
CP
0.130
(0.041)
0.196
0.014
(0.046)
0.924
0.104
(0.030)
0.136
0.003
(0.034)
0.930
0.135
(0.029)
0.016
0.006
(0.034)
0.928
0.116
(0.021)
0.000
0.002
(0.024)
0.932
σ21
Bias
S.D.
CP
0.064
(0.032)
0.628
0.006
(0.036)
0.948
0.051
(0.024)
0.532
0.001
(0.027)
0.956
0.067
(0.025)
0.330
0.002
(0.029)
0.922
0.058
(0.017)
0.138
0.002
(0.019)
0.942
φ11
φ21
ψ11
ψ21
(0.068)
0.942
(0.043)
0.940
The SVAR model that we investigate is:
y1,t = ψ11 Wy1,t + ψ21 Wy2,t + p11 y1,t −1 + p21 y2,t −1 + φ11 Wy1,t −1 + φ21 Wy2,t −1 + x′t Π·1 + c1 + d1,t + v1,t ,
y2,t = ψ12 Wy1,t + ψ22 Wy2,t + p12 y1,t −1 + p22 y2,t −1 + φ12 Wy1,t −1 + φ22 Wy2,t −1 + x′t Π·2 + c2 + d2,t + v2,t ,
where W is an n × n matrix and other variables are column vectors of dimension n, but for simplicity the subscript n is omitted. For this system, the disturbances v1,t and v2,t have mean 0 and, for each unit i, their covariance matrix is $\Sigma = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}$. For the non-normal distribution, we first generate two independent uniformly distributed random variables with mean zero and variance one, then multiply the vector of the two uniform random variables by a constant matrix L, where $LL' = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}$, so that the resulting disturbances v1,it and v2,it have the same variance–covariance matrix as in the previous design. xt = [x′1,t, x′2,t]′, where for each i, xl,it for l = 1, 2 are generated from U[0, 3]. cl and dl,t for l = 1, 2 are respectively individual and time fixed effects. The dl,t's are generated by uniformly distributed random variables on [0, 1]. cl = cl,a + cl,b is the sum of two parts, where the first part consists of randomly generated positive integers with 5.5 as its mean and the second part equals the temporal means of x1,t for l = 1 and x2,t for l = 2. Such a design allows correlation of the individual effects with regressors. Π·1 = [1, 0.5]′ and Π·2 = [0.5, 1]′. The 2 × 2 parameter matrices Ψ, P and Φ may be different for different models. In all the tables, we use θ to indicate the estimator of θ0 without bias correction while θc is for the bias-corrected one.
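The non-normal disturbance design can be sketched as below. The paper only requires LL′ = Σ; taking L to be the Cholesky factor, and drawing the uniforms on [−√3, √3] so that they have mean zero and unit variance, are assumptions made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
L = np.linalg.cholesky(Sigma)            # one valid choice with L @ L.T == Sigma

n, T = 36, 30
# U[-sqrt(3), sqrt(3)] has mean zero and variance one.
u = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(n * T, 2))
v = u @ L.T                              # each row is (v1_it, v2_it)

sample_cov = np.cov(v, rowvar=False)     # should be close to Sigma
```

Any other L with LL′ = Σ (for instance the symmetric square root) would give the same covariance but a different joint distribution of the disturbances.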
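1. Finite sample properties of QMLEs. For case S, $\Psi = \Phi = \begin{pmatrix} 0.2 & -0.2 \\ -0.2 & -0.1 \end{pmatrix}$ and $P = \begin{pmatrix} 0.1 & -0.2 \\ -0.2 & -0.2 \end{pmatrix}$. For case MC, $\Psi = \begin{pmatrix} 0.26 & -0.28 \\ -0.28 & -0.16 \end{pmatrix}$ and $P = \Phi = \begin{pmatrix} 0.2 & -0.2 \\ -0.2 & -0.1 \end{pmatrix}$. Hence there is 1 unit eigenvalue for H20 with m = 2 and m1 = 1. For case SC, $\Psi = \begin{pmatrix} 0.2 & 0.2 \\ 0.2 & 0.5 \end{pmatrix}$, $P = \begin{pmatrix} 0.5 & -0.2 \\ -0.2 & 0.2 \end{pmatrix}$, and $\Phi = \begin{pmatrix} 0.3 & 0 \\ 0 & 0.3 \end{pmatrix}$. Hence there are 2 unit eigenvalues for H20 with m = m1 = 2. We also consider an experiment for case E, $\Psi = \begin{pmatrix} 0.26 & -0.28 \\ -0.28 & -0.16 \end{pmatrix}$, $P = \begin{pmatrix} 0.208 & -0.204 \\ -0.204 & -0.098 \end{pmatrix}$, and $\Phi = \begin{pmatrix} 0.2 & -0.2 \\ -0.2 & -0.1 \end{pmatrix}$. Hence one eigenvalue is 1.0167, slightly larger than 1. We conduct these experiments for 500 repetitions with sample sizes (n, T) = (36, 30), (36, 60), (72, 30), and (72, 60).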
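The eigenvalue statements for the four designs can be checked with a short computation. The sketch assumes that, at a unit eigenvalue of W, the relevant reduced-form coefficient matrix has the same eigenvalues as (P + Φ)(I − Ψ)^{-1}; this form and the function name are assumptions made for illustration, while the parameter matrices are those of the designs for cases S, MC, SC and E.

```python
import numpy as np

def reduced_form_eigs(Psi, P, Phi, omega=1.0):
    """Eigenvalues of (P + omega*Phi)(I - omega*Psi)^{-1}, sorted descending.
    omega stands for an eigenvalue of W (assumed form, for illustration)."""
    H = (P + omega * Phi) @ np.linalg.inv(np.eye(2) - omega * Psi)
    return np.sort(np.linalg.eigvals(H).real)[::-1]

A = np.array([[0.2, -0.2], [-0.2, -0.1]])
designs = {
    "S": (A, np.array([[0.1, -0.2], [-0.2, -0.2]]), A),
    "MC": (np.array([[0.26, -0.28], [-0.28, -0.16]]), A, A),
    "SC": (np.array([[0.2, 0.2], [0.2, 0.5]]),
           np.array([[0.5, -0.2], [-0.2, 0.2]]),
           0.3 * np.eye(2)),
    "E": (np.array([[0.26, -0.28], [-0.28, -0.16]]),
          np.array([[0.208, -0.204], [-0.204, -0.098]]),
          A),
}
eigs = {name: reduced_form_eigs(*mats) for name, mats in designs.items()}
for name, values in eigs.items():
    print(name, np.round(values, 4))
```

Under this assumed form, case MC has one unit root, case SC has two, case E has a largest root of about 1.0167, and both roots of case S lie inside the unit circle.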
Table 5.1 reports the QMLEs’ biases and standard deviations for the case S with normally distributed disturbances,
while Tables 5.3 and 5.2 are for cases SC and MC. We only show numerical results for the first equation to save pages.
The numerical results of the second equation have similar features. Biases of QMLEs for Π , Φ , and Ψ are small but those
Table 5.2
QML estimation for case MC.
Method
QML(MC)
QML(MC)
QML(MC)
QML(MC)
n, T
36, 30
36, 60
72, 30
72, 60
θ
θc
θ
θc
θ
θc
θ
θc
π11
Bias
S.D.
CP
0.002
(0.038)
0.946
0.001
(0.038)
0.948
0.000
(0.028)
0.936
0.000
(0.028)
0.938
0.002
(0.026)
0.958
0.001
(0.026)
0.964
0.003
(0.020)
0.928
0.003
(0.020)
0.930
π21
Bias
S.D.
CP
0.000
(0.041)
0.936
0.000
(0.041)
0.942
0.001
(0.029)
0.934
0.000
(0.029)
0.938
0.003
(0.029)
0.950
0.002
(0.029)
0.952
0.000
(0.019)
0.954
0.000
(0.019)
0.956
p11
Bias
S.D.
CP
0.029
(0.030)
0.836
0.010
(0.030)
0.946
0.014
(0.021)
0.902
0.005
(0.021)
0.944
0.028
(0.023)
0.736
0.010
(0.023)
0.918
0.014
(0.014)
0.846
0.005
(0.014)
0.938
p21
Bias
S.D.
CP
−0.015
(0.029)
0.920
−0.012
(0.029)
0.944
−0.006
−0.005
−0.006
(0.021)
0.952
−0.013
(0.022)
0.880
−0.011
(0.021)
0.942
(0.022)
0.906
(0.015)
0.922
−0.005
(0.015)
0.938
Bias
S.D.
CP
−0.002
(0.094)
0.938
−0.007
(0.096)
0.940
−0.001
−0.002
(0.068)
0.928
0.000
(0.067)
0.942
−0.003
(0.068)
0.928
0.001
(0.046)
0.954
−0.001
(0.046)
0.954
Bias
S.D.
CP
−0.005
(0.086)
0.950
0.004
(0.087)
0.942
−0.002
0.001
(0.056)
0.964
−0.004
(0.061)
0.934
0.003
(0.061)
0.944
−0.001
(0.056)
0.960
0.003
(0.042)
0.946
Bias
S.D.
CP
0.009
(0.090)
0.958
0.007
(0.090)
0.962
0.001
(0.064)
0.934
0.002
(0.064)
0.936
0.000
(0.059)
0.954
0.000
(0.059)
0.956
−0.002
(0.045)
0.950
−0.002
(0.045)
0.952
Bias
S.D.
CP
−0.003
(0.108)
0.934
−0.003
(0.108)
0.934
−0.004
−0.004
(0.078)
0.940
(0.078)
0.944
0.006
(0.077)
0.934
0.004
(0.077)
0.940
0.005
(0.053)
0.948
0.004
(0.053)
0.950
σ11
Bias
S.D.
CP
0.130
(0.041)
0.204
0.014
(0.046)
0.932
0.104
(0.031)
0.142
0.003
(0.034)
0.936
0.135
(0.030)
0.022
0.006
(0.034)
0.926
0.116
(0.021)
0.000
0.003
(0.024)
0.932
σ21
Bias
S.D.
CP
0.064
(0.032)
0.642
0.006
(0.037)
0.946
0.051
(0.024)
0.546
0.000
(0.027)
0.954
0.067
(0.025)
0.332
0.002
(0.029)
0.920
0.058
(0.017)
0.146
0.002
(0.020)
0.946
φ11
φ21
ψ11
ψ21
(0.068)
0.940
(0.042)
0.948
of the other parameters are not negligible. The biases decrease with larger n and/or T. The bias-corrected estimates for P and Σ significantly reduce the biases of their QMLEs. Standard deviations of estimates decrease at the rate of √(nT). The 95% coverage probabilities (CP) of QMLEs are also provided. The QMLEs for the σ's and p11 have low coverage probabilities, but the bias correction increases the 95% coverage probabilities for almost all the estimates. The coverage probabilities for bias-corrected estimates are around 95%. Theoretical standard deviations of all estimates, which are in the supplementary file, are similar to the empirical standard deviations.
Table 5.4 reports numerical results of QMLEs with non-normal disturbances. The first three experiments are for case SC and the others are for the S, MC, and Explosive cases. Their biases, standard deviations and bias-corrected estimates exhibit patterns similar to those under normally distributed disturbances.
2. Comparing the QMLE with other estimates. In addition to QML estimation, the SVAR model can be estimated by IV-based approaches. Using case S as an example, we compare the QMLE with the two-stage least squares estimator (2SLSE) and the three-stage least squares estimator (3SLSE). When implementing those IV-based estimators, we use $(X, X_{-1}, WX, W^2X, Y_{-2}, WY_{-2}, W^2Y_{-1})$ as IVs, which are employed conventionally in the empirical literature. All the parameters for data generation are the same as those of case S with the only exception that we let π12 = π21 = 0. As for single-equation estimation, if a researcher neglects the interaction among variables by introducing a misspecified single-equation SDPD model, a single-equation QML method would be applied. For that, we investigate the scenario in which the model only contains the first (second) equation by treating Wy2,t (Wy1,t) as if it were an exogenous contextual effect, which is labeled as ‘‘Mis-QML’’.
Table 5.5 presents numerical results for comparing estimates. Overall, QMLEs with bias correction have advantages
both in bias reduction and efficiency gains. QMLEs for P usually have larger biases than those of 2SLS and 3SLS, but bias
corrected estimates can reduce those biases. The QMLEs for Ψ and Φ have much smaller biases than those of 2SLS or
3SLS. The QML estimation can reduce standard deviations of 2SLS/3SLS by 45% to 75% for some parameters in Π , Φ , and
Ψ . There are also non-negligible efficiency gains for estimates of P. The simulation results of Mis-QMLEs for Ψ and Φ
suggest inconsistency, since some of these estimates have very large biases.
3. Hypothesis testing. We investigate the test statistics on their size and power. For experiments, we use parameters
in the case MC with m = 2 and m1 = 1. Under a correctly specified H0 : rank(H20 − I2 ) = 1 (case MC), the test statistic
Table 5.3
QML estimation for case SC.
Method
QML(SC)
QML(SC)
QML(SC)
QML(SC)
n, T
36, 30
36, 60
72, 30
72, 60
θ
θc
θ
θc
θ
θc
θ
θc
π11
Bias
S.D.
CP
0.004
(0.038)
0.942
0.002
(0.038)
0.948
0.000
(0.028)
0.948
0.000
(0.028)
0.950
0.003
(0.026)
0.958
0.002
(0.026)
0.958
0.003
(0.020)
0.918
0.003
(0.020)
0.920
π21
Bias
S.D.
CP
0.000
(0.039)
0.936
0.000
(0.039)
0.938
0.001
(0.027)
0.942
0.001
(0.027)
0.944
0.003
(0.027)
0.948
0.002
(0.027)
0.952
−0.001
(0.019)
0.944
−0.001
(0.019)
0.946
p11
Bias
S.D.
CP
0.039
(0.027)
0.682
0.023
(0.026)
0.882
0.018
(0.018)
0.818
0.011
(0.018)
0.906
0.037
(0.019)
0.476
0.022
(0.019)
0.782
0.018
(0.013)
0.730
0.011
(0.012)
0.868
p21
Bias
S.D.
CP
−0.019
(0.027)
0.878
−0.019
(0.027)
0.886
−0.008
−0.008
−0.009
(0.019)
0.932
−0.017
(0.021)
0.834
−0.017
(0.019)
0.934
(0.021)
0.842
(0.014)
0.898
−0.009
(0.014)
0.900
φ11
Bias
S.D.
CP
0.006
(0.096)
0.934
0.003
(0.097)
0.936
0.003
(0.068)
0.944
0.002
(0.068)
0.944
0.007
(0.067)
0.934
0.005
(0.067)
0.936
0.004
(0.047)
0.942
0.003
(0.047)
0.944
φ21
Bias
S.D.
CP
−0.009
(0.089)
0.932
−0.003
(0.089)
0.940
−0.003
−0.001
−0.002
(0.061)
0.948
−0.007
(0.062)
0.942
−0.002
(0.061)
0.946
(0.062)
0.956
(0.042)
0.944
0.000
(0.042)
0.940
Bias
S.D.
CP
0.010
(0.093)
0.948
0.008
(0.094)
0.954
0.002
(0.065)
0.938
0.002
(0.065)
0.940
0.000
(0.059)
0.958
0.000
(0.059)
0.962
−0.002
(0.045)
0.944
−0.002
(0.045)
0.944
Bias
S.D.
CP
−0.004
(0.111)
0.936
−0.002
(0.111)
0.942
−0.004
−0.005
(0.080)
0.934
(0.080)
0.936
0.007
(0.077)
0.942
0.007
(0.077)
0.950
0.007
(0.054)
0.950
0.007
(0.054)
0.954
σ11
Bias
S.D.
CP
0.131
(0.042)
0.192
0.016
(0.047)
0.920
0.105
(0.031)
0.128
0.004
(0.034)
0.940
0.135
(0.029)
0.012
0.006
(0.033)
0.936
0.116
(0.021)
0.000
0.002
(0.024)
0.926
σ21
Bias
S.D.
CP
0.064
(0.031)
0.604
0.006
(0.035)
0.946
0.052
(0.022)
0.480
0.001
(0.025)
0.958
0.066
(0.024)
0.286
0.001
(0.027)
0.934
0.058
(0.017)
0.114
0.001
(0.019)
0.932
ψ11
ψ21
converges to a noncentral χ²(1) distribution. The results are in Table 5.6. For the model without exogenous variables, with smaller sample sizes (n, T), a couple of exceptions occur with large empirical sizes. When the sample sizes are moderate, whether the disturbances are normally or non-normally distributed, empirical sizes are close to but a little higher than 0.05. For the model with exogenous variables, the empirical sizes are close to but a little higher than 0.05, even with smaller sample sizes.
In Fig. 1 we show the true asymptotic CDF (with $B_{m_1}$), the simulated asymptotic CDF (with $\tilde B_{m_1}$), and the empirical CDF of the test statistic when (n, T) = (90, 120). The true asymptotic CDF is a noncentral χ²(1) CDF with true parameters while the simulated asymptotic CDF is a noncentral χ²(1) CDF with estimated parameters. The figure shows that these three CDFs are very close, which implies that the proposed asymptotic CDFs provide a good approximation.
We investigate the power under the alternative hypothesis rank(H20 − I2) = 2 (case S) while the null hypothesis is rank(H20 − I2) = 1. We study the power function by applying the parameters of case MC but slightly reducing the unit eigenvalue of H20 by c/√(nT). Therefore, the true model is in the case S while we conduct the test as if it were in the case MC. This setting allows the study of the power against alternatives (case S) which are close to the null hypothesis (case MC); they are of empirical interest since it is important to distinguish these two cases. We let c = 0.1, 0.2, . . . , 0.9 and see how the rejection rate evolves when (n, T) = (54, 30), (90, 30) and (90, 60) in Table 5.7 and Fig. 2. We use the uniformly distributed disturbances as defined in the Monte Carlo experiments for estimation and run 1000 repetitions for each experiment. The results show that the power increases with c for all designs. When c is larger, which implies that the two cases are separable, the rejection rate is close to one. The graph shows that with increasing c, the power of the test increases for all designs with and without X. Comparing (n, T) = (54, 30) and (n, T) = (90, 30), with bigger n, the rejection rates increase similarly at smaller c's but eventually converge to the value 1 faster. Comparing (n, T) = (90, 30) and (n, T) = (90, 60), with bigger T, the rejection rates increase faster at smaller c's. Comparing (n, T) = (54, 30) and (n, T) = (90, 60), with bigger n and T, the rejection rates generally converge to 1 faster with increasing c's.
Table 5.4
QML estimation for S, MC, SC, and Explosive cases.
Method
QML(SC)
QML(SC)
QML(SC)
QML(MC)
QML(Stable)
QML(Explosive)
n, T
36, 30
36, 60
72, 30
36, 30
36, 30
36, 30
θ
θc
θ
θc
θ
θc
θ
θc
θ
θc
θ
θc
π11
Bias
S.D.
CP
0.006
(0.040)
0.934
0.004
(0.040)
0.954
0.002
(0.026)
0.944
0.003
(0.026)
0.958
0.002
(0.027)
0.940
0.000
(0.027)
0.956
0.003
(0.039)
0.944
0.002
(0.039)
0.960
0.003
(0.039)
0.942
0.002
(0.039)
0.962
0.003
(0.039)
0.944
0.002
(0.039)
0.960
π21
Bias
S.D.
CP
0.000
(0.040)
0.940
0.000
(0.040)
0.958
0.001
(0.027)
0.938
0.001
(0.027)
0.950
0.001
(0.029)
0.938
0.000
(0.029)
0.952
−0.001
0.000
(0.042)
0.962
−0.001
(0.042)
0.940
0.000
(0.042)
0.960
−0.001
(0.042)
0.940
(0.042)
0.940
0.000
(0.042)
0.962
p11
Bias
S.D.
CP
0.039
(0.028)
0.682
0.024
(0.028)
0.840
0.017
(0.019)
0.828
0.010
(0.018)
0.888
0.035
(0.019)
0.510
0.020
(0.019)
0.812
0.029
(0.032)
0.826
0.011
(0.032)
0.930
0.027
(0.033)
0.856
0.007
(0.033)
0.932
0.030
(0.032)
0.824
0.011
(0.032)
0.928
p21
Bias
S.D.
CP
−0.018
−0.018
(0.029)
0.872
−0.008
(0.019)
0.896
−0.009
−0.016
−0.017
−0.015
−0.012
(0.019)
0.850
(0.031)
0.892
−0.014
(0.031)
0.900
−0.015
(0.020)
0.848
−0.012
(0.031)
0.910
−0.010
(0.019)
0.898
(0.031)
0.926
(0.031)
0.890
(0.031)
0.904
Bias
S.D.
CP
−0.001
−0.004
(0.095)
0.958
0.001
(0.063)
0.954
0.000
(0.063)
0.958
0.002
(0.067)
0.936
0.000
(0.067)
0.938
−0.009
−0.014
(0.095)
0.948
−0.006
(0.094)
0.944
−0.013
−0.009
−0.014
(0.095)
0.942
(0.094)
0.950
(0.095)
0.948
Bias
S.D.
CP
−0.005
0.000
(0.085)
0.952
−0.001
(0.059)
0.936
0.001
(0.059)
0.936
−0.003
(0.084)
0.946
0.002
(0.084)
0.948
−0.009
(0.085)
0.940
0.003
(0.085)
0.948
−0.007
(0.061)
0.936
0.002
(0.061)
0.952
−0.007
(0.085)
0.950
(0.084)
0.944
0.002
(0.084)
0.948
Bias
S.D.
CP
0.013
(0.085)
0.962
0.011
(0.085)
0.968
0.002
(0.060)
0.952
0.002
(0.060)
0.956
−0.001
−0.001
(0.065)
0.940
(0.065)
0.944
0.013
(0.085)
0.960
0.012
(0.085)
0.964
0.012
(0.086)
0.958
0.011
(0.086)
0.964
0.013
(0.085)
0.960
0.012
(0.085)
0.964
Bias
S.D.
CP
−0.009
−0.008
(0.106)
0.954
0.000
(0.076)
0.952
−0.001
0.004
(0.077)
0.950
−0.010
(0.108)
0.944
−0.011
−0.012
(0.107)
0.946
−0.012
(0.107)
0.954
−0.011
(0.076)
0.954
0.004
(0.077)
0.948
−0.011
(0.106)
0.948
(0.109)
0.954
(0.107)
0.946
(0.107)
0.954
σ11
Bias
S.D.
CP
0.044
(0.034)
0.890
0.010
(0.035)
0.984
0.020
(0.023)
0.964
0.003
(0.024)
0.992
0.041
(0.025)
0.820
0.007
(0.026)
0.988
0.041
(0.036)
0.906
0.007
(0.037)
0.988
0.042
(0.035)
0.906
0.007
(0.036)
0.990
0.041
(0.036)
0.906
0.007
(0.037)
0.988
σ21
Bias
S.D.
CP
0.019
(0.032)
0.944
0.002
(0.034)
0.980
0.010
(0.023)
0.958
0.001
(0.024)
0.974
0.021
(0.022)
0.920
0.004
(0.023)
0.982
0.018
(0.035)
0.946
0.001
(0.037)
0.966
0.018
(0.035)
0.948
0.001
(0.036)
0.964
0.018
(0.035)
0.946
0.001
(0.037)
0.966
φ11
φ21
ψ11
ψ21
(0.030)
0.860
(0.095)
0.954
(0.094)
0.948
6. Empirical application: Grain market integration of Yangtze River Basin
In this section we provide an empirical example on grain market integration in 18th century China as an application of the SVAR model. Interregional trade is considered a key condition for economic development, especially for industrialization, since it brings in spillovers of information and technology. Keller and Shiue (2007) point out that analyzing the evolution of interregional trade in ancient China ‘‘can provide valuable insights on comparative economic development in China and elsewhere’’. Shiue (2002), Keller and Shiue (2007) and Yan and Liu (2011) indicate that southern China, including the Yangtze River Basin, demonstrated a high level of market integration in the grain market. Keller and Shiue (2007) use univariate SDPD models to show that spatial features have shaped the expansion of interregional trade. Since in the mid-Qing dynasty (18th century) the most important trading goods were grains, all of the above studies focus on rice prices in different prefectures.
However, other grains, especially wheat, are not negligible in studying grain market integration. Rice was the primary food in the Yangtze River Basin (Huang, 2009); for example, rice consumption in 9 prefectures south of the Yangtze River accounted for about 93% of grain consumption in the year 1776. However, wheat still accounted for about 7% of consumption in those prefectures. An SVAR model including both rice and wheat prices might take their substitution effects into account in a complete analysis of the grain market.
Therefore, we add an additional equation for the wheat price to the model of Keller and Shiue (2007). The equation system describes spatial and intertemporal relationships between the wheat and rice prices in a prefecture and the rice and wheat prices of its neighbors in the current and last periods, in addition to the prefecture's own rice and wheat prices from the last period. This model could capture spatial effects, diffusion and time-lagged effects of prices, which may provide evidence in favor of or against the grain market integration hypothesis.
6.1. Data
The data come from a historical archive, ‘‘Gongzhong Liangjian Dan’’, translated as ‘‘Grain Price Lists in the Palace’’. We
collect the data of wheat prices from the electronic version from the Database of Grain Prices in Qing Dynasty, maintained
Table 5.5
Estimation for case S (n, T = 36, 30). Each entry reports Bias with S.D. in parentheses.

        QMLE(S)                           2SLS             3SLS             Mis-QMLE
        θ                θc               θ                θ                θ                θc
π11     0.002 (0.038)    0.001 (0.038)    0.007 (0.108)    0.007 (0.098)    0.017 (0.103)    0.016 (0.104)
π22     0.000 (0.040)    0.000 (0.040)    0.003 (0.119)    0.003 (0.110)    0.008 (0.108)    0.006 (0.108)
p11     0.020 (0.026)    0.003 (0.026)    −0.003 (0.056)   −0.002 (0.055)   0.050 (0.034)    0.011 (0.035)
p21     −0.003 (0.025)   −0.007 (0.025)   −0.003 (0.044)   −0.003 (0.044)   −0.008 (0.032)   0.008 (0.033)
p12     0.002 (0.028)    −0.003 (0.027)   −0.004 (0.053)   −0.008 (0.051)   −0.005 (0.034)   0.010 (0.034)
p22     0.012 (0.025)    −0.004 (0.025)   −0.004 (0.043)   −0.004 (0.042)   0.027 (0.032)    −0.001 (0.033)
φ11     0.000 (0.078)    −0.002 (0.080)   0.020 (0.165)    0.035 (0.153)    0.104 (0.063)    0.115 (0.064)
φ21     −0.003 (0.078)   0.006 (0.078)    −0.035 (0.128)   −0.038 (0.122)   0.042 (0.066)    0.024 (0.066)
φ12     0.002 (0.074)    0.002 (0.074)    −0.008 (0.159)   −0.028 (0.151)   −0.015 (0.063)   −0.031 (0.063)
φ22     −0.002 (0.080)   0.007 (0.082)    −0.034 (0.149)   −0.030 (0.141)   0.039 (0.067)    0.048 (0.068)
ψ11     0.009 (0.070)    0.008 (0.070)    −0.101 (0.188)   −0.136 (0.172)   −0.005 (0.042)   −0.001 (0.041)
ψ21     −0.002 (0.088)   −0.002 (0.088)   −0.025 (0.376)   0.008 (0.344)    0.285 (0.071)    0.274 (0.071)
ψ12     0.001 (0.091)    0.000 (0.091)    −0.054 (0.324)   −0.015 (0.299)   0.108 (0.065)    0.106 (0.065)
ψ22     0.000 (0.069)    −0.002 (0.068)   −0.089 (0.226)   −0.121 (0.208)   0.030 (0.048)    0.031 (0.048)
Table 5.6
Empirical size of test with H0 : rank(H′20 − I2) = 1.

                  Normal disturbances               Non-normal disturbances
         n \ T    30      60      90      120       30      60      90      120
No X     36       0.144   0.093   0.087   0.069     0.145   0.087   0.084   0.072
         54       0.057   0.059   0.04    0.067     0.065   0.058   0.066   0.056
         72       0.062   0.057   0.057   0.057     0.057   0.035   0.054   0.052
         90       0.051   0.061   0.049   0.059     0.06    0.055   0.053   0.052
With X   36       0.066   0.052   0.06    0.048     0.066   0.042   0.058   0.04
         54       0.05    0.064   0.057   0.052     0.049   0.049   0.058   0.044
         72       0.058   0.064   0.054   0.05      0.064   0.056   0.053   0.052
         90       0.045   0.053   0.059   0.058     0.066   0.063   0.07    0.055
Table 5.7
Power of test with H0 : rank(H′20 − I2) = 1.

(n, T)            c:  0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9
54,30   No X          0.21   0.38   0.71   0.86   0.90   0.92   0.92   0.93   0.94
54,30   With X        0.24   0.32   0.53   0.73   0.79   0.81   0.83   0.86   0.87
90,30   No X          0.24   0.42   0.67   0.90   0.97   0.99   0.99   0.99   0.99
90,30   With X        0.31   0.45   0.61   0.81   0.91   0.95   0.97   0.98   0.98
90,60   No X          0.58   0.77   0.80   0.83   0.89   0.96   0.98   0.99   1.00
90,60   With X        0.74   0.89   0.88   0.86   0.90   0.95   0.98   0.99   1.00
by Academia Sinica, while the rice price data are from Lee and Yu (2010b). The data cover 65 prefectures and 49 years, from 1742 to 1790. They are collected from the middle and lower Yangtze River Basin provinces: Anhui, Jiangsu, Zhejiang, Hubei, Hunan, and Jiangxi. We exclude the upper Yangtze River Basin since it is geographically far from the other provinces and transportation to that area could be very costly due to rugged terrain. The middle and lower basin is often considered the richest area in China, where interregional trade was well developed. We use semi-annual data, collected in February and August of each year, so that there are 98 periods. About 12% of the data are missing and are interpolated.29 We have one category of wheat and three categories of rice: high-quality, mid-quality and low-quality. We use the mid-quality rice prices following Shiue (2002), Keller and Shiue (2007) and Yan and Liu (2011). We have access to the maximum and minimum of rice and wheat prices, so we use their means to represent the prices. The data are summarized in Table 6.1.

29 Similarly to Keller and Shiue (2007), we use the TRAMO (Time Series Regression with ARIMA Noise, Missing Observations and Outliers) program to interpolate series for missing data (Gomez and Maravall, 1997).

Fig. 1. Asymptotic, Simulated, and Empirical CDFs of the test statistics.

Fig. 2. Local Power of Test with H0 : rank(H′20 − I2) = 1. Note: Lines illustrate the rejection rates of H1 : rank(H′20 − I2) = 2 with increasing c.

Table 6.1
Data summary.

Observations    Prefectures    Provinces    Periods (1742–1790, Semiannual)
6370            65             6            98

                      Mean      S.D.      Maximum    Minimum
Distance              487.8     260.5     1195       13.17
Rice price (low)      1.338     0.290     3.000      0.660
Rice price (high)     1.627     0.343     3.600      0.900
Wheat price (low)     0.976     0.208     2.110      0.340
Wheat price (high)    1.334     0.251     2.860      0.730
Rice price            1.483     0.305     3.250      0.780
Wheat price           1.168     0.220     2.634      0.590

Note: the distances are measured in kilometers (1 km = 0.6213 mi).
6.2. Model and empirical results
In Keller and Shiue (2007) and Lee and Yu (2010b), the rice price in a prefecture is specified to depend on its own time-lagged price, neighbors’ prices in the current period, and neighbors’ prices in the last period:
\[
y^{r}_{i,t} = p_{rr}\, y^{r}_{i,t-1} + \psi_{rr} \sum_{j=1}^{n} w_{i,j}\, y^{r}_{j,t} + \phi_{rr} \sum_{j=1}^{n} w_{i,j}\, y^{r}_{j,t-1} + c^{r}_{i} + d^{r}_{t} + v^{r}_{i,t},
\]
where $p_{rr}$ represents the time lagged effect, $\psi_{rr}$ represents the effect of neighbors’ prices (spatial effects), $\phi_{rr}$ represents the effect of neighbors’ prices in the last period (diffusion), $c^{r}_{i}$ is the individual fixed effect, $d^{r}_{t}$ is the time fixed effect, and $v^{r}_{i,t}$ is an idiosyncratic shock. The model we are going to estimate is an extension with the specification
\[
\begin{aligned}
(r)\quad y^{r}_{i,t} ={}& p_{rr}\, y^{r}_{i,t-1} + \psi_{rr} \sum_{j=1}^{n} w_{i,j}\, y^{r}_{j,t} + \phi_{rr} \sum_{j=1}^{n} w_{i,j}\, y^{r}_{j,t-1} \\
& + p_{wr}\, y^{w}_{i,t-1} + \psi_{wr} \sum_{j=1}^{n} w_{i,j}\, y^{w}_{j,t} + \phi_{wr} \sum_{j=1}^{n} w_{i,j}\, y^{w}_{j,t-1} + c^{r}_{i} + d^{r}_{t} + v^{r}_{i,t}, \\
(w)\quad y^{w}_{i,t} ={}& p_{ww}\, y^{w}_{i,t-1} + \psi_{ww} \sum_{j=1}^{n} w_{i,j}\, y^{w}_{j,t} + \phi_{ww} \sum_{j=1}^{n} w_{i,j}\, y^{w}_{j,t-1} \\
& + p_{rw}\, y^{r}_{i,t-1} + \psi_{rw} \sum_{j=1}^{n} w_{i,j}\, y^{r}_{j,t} + \phi_{rw} \sum_{j=1}^{n} w_{i,j}\, y^{r}_{j,t-1} + c^{w}_{i} + d^{w}_{t} + v^{w}_{i,t},
\end{aligned}
\]
where $p_{kl}$, $\psi_{kl}$ and $\phi_{kl}$ ($k \neq l$) represent the cross-price time lagged effects, spatial effects and diffusion of rice/wheat on
wheat/rice. The idiosyncratic shocks in both equations are allowed to correlate with each other.
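To make the two-equation system concrete, the following is a minimal simulation sketch, not the authors' code. It assumes the stacking $y_t = \mathrm{vec}(Y_t')$ (units outer, rice then wheat inner), omits the fixed effects for brevity, and uses hypothetical parameter values and a hypothetical ring-shaped weights matrix; the function names `build_reduced_form` and `simulate` are illustrative only.

```python
import numpy as np

def build_reduced_form(W, P, Psi, Phi):
    """Return (S, H) for the stacked system y_t = vec(Y_t').

    Convention assumed here: entry [k, l] of P, Psi, Phi is the effect of
    variable k on variable l, with variables ordered (rice, wheat).
    """
    n, m = W.shape[0], P.shape[0]
    S = np.eye(n * m) - np.kron(W, Psi.T)             # S = I - W (x) Psi'
    A = np.kron(np.eye(n), P.T) + np.kron(W, Phi.T)   # I (x) P' + W (x) Phi'
    H = np.linalg.solve(S, A)                         # H = S^{-1} A
    return S, H

def simulate(W, P, Psi, Phi, Sigma_v, T, seed=0):
    """Simulate y_t = H y_{t-1} + S^{-1} v_t with fixed effects set to zero."""
    rng = np.random.default_rng(seed)
    n, m = W.shape[0], P.shape[0]
    S, H = build_reduced_form(W, P, Psi, Phi)
    chol = np.linalg.cholesky(Sigma_v)
    y = np.zeros((T + 1, n * m))
    for t in range(1, T + 1):
        # Each row of the n x m shock matrix has covariance Sigma_v.
        v = (rng.standard_normal((n, m)) @ chol.T).ravel()
        y[t] = H @ y[t - 1] + np.linalg.solve(S, v)
    return y, H

# Hypothetical illustration values (not estimates from the paper).
n = 6
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5       # row-normalized ring
P = np.array([[0.50, 0.05], [0.05, 0.50]])
Psi = np.array([[0.40, 0.10], [0.10, 0.40]])
Phi = np.array([[-0.20, 0.00], [0.00, -0.20]])
Sigma_v = np.array([[0.009, 0.002], [0.002, 0.006]])

y, H = simulate(W, P, Psi, Phi, Sigma_v, T=98)
spectral_radius = np.max(np.abs(np.linalg.eigvals(H)))
print(round(float(spectral_radius), 3))
```

A spectral radius of H below one corresponds to a stable system for these illustrative values; correlated shocks across the two equations enter through the off-diagonal entry of `Sigma_v`.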
Two types of spatial weights matrices are generated, based on geographic distances between prefectures and on whether two prefectures are in the same province. In the eighteenth century, trade costs were closely related to geographic distance, and so was the transmission of information. Furthermore, for the sake of economic and social stability, local governments of the Qing dynasty adopted policies to keep grain prices stable, such as the ‘‘Chang-Ping barn’’ and the transport of grain across prefectures (Chen, 2004).30 Therefore, if two prefectures are in the same province, their prices are likely to be correlated. The first specification of Wn assumes that wi,j = exp(−cm × Distance(i, j)/100), where distance is measured in kilometers and cm = 1.4 or 1.2. The second specification of Wn is a block diagonal matrix with 6 blocks (provinces). The interactions between any two prefectures in a province decrease with distance, such that wi,j = exp(−cm × Distance(i, j)/100) × 1{i and j are in the same province}. All spatial weights matrices have been
row-normalized in the system.
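The two weighting schemes can be sketched as follows. This is an illustrative construction under stated assumptions rather than the authors' implementation: the function name `distance_decay_weights`, the toy distance matrix, and the province labels are hypothetical, and the zero diagonal (no self-neighbors) is assumed as the usual convention, since the text does not state it.

```python
import numpy as np

def distance_decay_weights(dist_km, c_m=1.4, province=None, limit_km=None):
    """Row-normalized weights w_ij = exp(-c_m * dist_ij / 100).

    province : optional labels; if given, w_ij is kept only when i and j
               share a province (block structure).
    limit_km : optional cutoff; w_ij = 0 when dist_ij exceeds it.
    """
    d = np.asarray(dist_km, dtype=float)
    w = np.exp(-c_m * d / 100.0)
    np.fill_diagonal(w, 0.0)                 # assumption: no self-neighbors
    if province is not None:
        prov = np.asarray(province)
        w = w * (prov[:, None] == prov[None, :])
    if limit_km is not None:
        w = np.where(d > limit_km, 0.0, w)
    row_sums = w.sum(axis=1, keepdims=True)
    # Rows with no neighbors are left as zeros instead of dividing by zero.
    return np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)

# Toy example: four prefectures in two hypothetical provinces.
dist = np.array([[0, 100, 400, 500],
                 [100, 0, 300, 400],
                 [400, 300, 0, 100],
                 [500, 400, 100, 0]], dtype=float)
province = np.array(["A", "A", "B", "B"])

W_dist = distance_decay_weights(dist, c_m=1.4)
W_prov = distance_decay_weights(dist, c_m=1.4, province=province)
W_lim = distance_decay_weights(dist, c_m=1.4, province=province, limit_km=300)
```

With the province indicator, each toy prefecture has a single same-province neighbor, so every nonzero weight in `W_prov` equals one after row normalization.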
Table 6.2 presents QMLEs for the system with different Wn ’s while Table 6.3 reports estimates for single-equation
SDPD models with rice and wheat prices estimated separately, as if each single equation were a univariate model similar
30 Chang-Ping barn: local governments stored rice in case of significant increases in rice prices due to disasters or other factors.
Table 6.2
QML estimation for the grain market integration in Yangtze River Basin.
Wn
Distance
cm = 1.2
Distance
cm = 1.4
Province
cm = 1.4
Limit
None
None
None
None
300 km
Method
QML
QML
QML
QML
QML
Estimates
θ
θc
θ
θc
θ
θc
θ
θc
θ
θc
prr
0.540
(0.011)
0.030
(0.012)
0.064
(0.009)
0.642
(0.010)
0.556
(0.011)
0.033
(0.012)
0.063
(0.009)
0.659
(0.010)
0.540
(0.011)
0.029
(0.012)
0.062
(0.009)
0.644
(0.010)
0.555
(0.011)
0.032
(0.012)
0.061
(0.009)
0.661
(0.010)
0.565
(0.011)
0.026
(0.012)
0.093
(0.010)
0.635
(0.010)
0.578
(0.011)
0.030
(0.012)
0.091
(0.010)
0.651
(0.010)
0.564
(0.011)
0.028
(0.012)
0.095
(0.010)
0.635
(0.010)
0.578
(0.011)
0.031
(0.012)
0.093
(0.010)
0.650
(0.010)
0.564
(0.011)
0.026
(0.012)
0.093
(0.010)
0.635
(0.010)
0.578
(0.011)
0.029
(0.012)
0.092
(0.010)
0.651
(0.010)
−0.418
−0.431
(0.027)
0.028
(0.033)
−0.102
(0.028)
−0.603
(0.021)
−0.378
(0.023)
−0.002
(0.031)
−0.065
(0.025)
−0.562
(0.020)
−0.390
(0.023)
−0.004
(0.031)
−0.066
(0.025)
−0.575
(0.020)
−0.180
−0.186
−0.202
−0.209
(0.027)
0.030
(0.033)
−0.101
(0.028)
−0.589
(0.021)
(0.034)
−0.193
(0.063)
0.118
(0.040)
−0.420
(0.035)
(0.034)
−0.198
(0.063)
0.119
(0.040)
−0.429
(0.035)
(0.038)
−0.285
(0.074)
0.168
(0.047)
−0.497
(0.041)
(0.038)
−0.293
(0.074)
0.171
(0.047)
−0.508
(0.041)
−0.176
(0.035)
−0.211
(0.060)
0.120
(0.039)
−0.437
(0.036)
−0.183
(0.035)
−0.216
(0.060)
0.121
(0.039)
−0.446
(0.036)
0.888
(0.031)
−0.100
(0.045)
0.174
(0.034)
0.801
(0.015)
0.886
(0.031)
−0.096
(0.045)
0.172
(0.034)
0.802
(0.015)
0.797
(0.023)
−0.029
(0.041)
0.123
(0.031)
0.759
(0.015)
0.796
(0.023)
−0.026
(0.041)
0.121
(0.031)
0.760
(0.015)
0.449
(0.031)
0.316
(0.086)
−0.031
(0.059)
0.483
(0.037)
0.448
(0.031)
0.319
(0.086)
−0.033
(0.059)
0.484
(0.037)
0.491
(0.035)
0.457
(0.103)
−0.101
(0.070)
0.584
(0.045)
0.491
(0.035)
0.462
(0.103)
−0.104
(0.070)
0.586
(0.045)
0.449
(0.031)
0.339
(0.080)
−0.037
(0.055)
0.512
(0.036)
0.449
(0.031)
0.341
(0.080)
−0.038
(0.055)
0.513
(0.036)
0.009
(0.000)
0.002
(0.000)
0.006
(0.000)
0.009
(0.000)
0.002
(0.000)
0.006
(0.000)
0.009
(0.000)
0.002
(0.000)
0.006
(0.000)
0.009
(0.000)
0.002
(0.000)
0.006
(0.000)
0.008
(0.000)
0.002
(0.000)
0.006
(0.000)
0.008
(0.000)
0.002
(0.000)
0.006
(0.000)
0.008
(0.000)
0.002
(0.000)
0.006
(0.000)
0.008
(0.000)
0.002
(0.000)
0.006
(0.000)
0.008
(0.000)
0.002
(0.000)
0.006
(0.000)
0.008
(0.000)
0.002
(0.000)
0.006
(0.000)
pw r
Time effect
pr w
pww
φrr
φw r
Diffusion effect
φr w
φww
ψrr
ψwr
Spatial effect
ψr w
ψww
σrr
σw r
σww
Likelihood
3.8661
3.8651
Province
cm = 1.2
4.0370
Province
cm = 1.2
4.0264
4.0330
Note: ‘‘distance: cm = 1.2 or 1.4’’ means that we construct wi,j = exp(−cm × Distance(i, j)/100); ‘‘province: cm = 1.2 or 1.4’’ means that we construct
wi,j = exp(−cm × Distance(i, j)/100) × 1{i and j are in the same province}. All above spatial weights matrices are row-normalized in regression.
‘‘Limit’’ means that if the distance between i and j exceeds this limit, wi,j = 0. Thus, ‘‘none’’ in the row of ‘‘limit’’ indicates that there is no limit for
wi,j .
to Keller and Shiue (2007) and Lee and Yu (2010b). The likelihood information criterion suggests that the block spatial weights matrix wi,j = exp(−cm × Distance(i, j)/100) × 1{i and j are in the same province} with cm = 1.4 performs best, and the analysis below is based on this specification.
We focus on three issues: which case the model exhibits according to the proposed test; whether the cross-price effects are statistically significant; and whether the own-price effects in the SVAR model differ from those in the SDPD models with a univariate dependent variable. With block spatial weights matrices, and n1/n > 0 for a large sample, we are able to run a diagnostic test to check which case the model belongs to. The model exhibits ‘‘case S’’ for all the specifications. For the specification of Wn with cm = 1.4, H0 : {the model exhibits spatial cointegration (case SC)} is rejected with LR(0, 2) = 133.01 (the critical value at the 5% significance level is 9.37); H0 : {the model exhibits mixed cointegration (case MC) with cointegration rank 1} is rejected with LR(1, 2) = 85.17 (the critical value at the 5% significance level is 3.96). Therefore, we conclude that the data generating process exhibits a stable case. In fact, the estimated eigenvalues of the matrix H20 are 0.8709 and 0.4769, which also implies that the data generating process is stable. For another specification, with the spatial weights matrix being block diagonal but cm = 1.2, LR(0, 2) = 132.80 and LR(1, 2) = 85.45, while the critical values at the 5% significance level are respectively 9.47 and 3.72, with the estimated eigenvalues of H20 being 0.9531 and 0.4925. For the last specification, a block spatial weights matrix with the spatial interaction restricted to within 300 km, LR(0, 2) = 132.69 and LR(1, 2) = 85.30, while the critical values at the 5% significance level are respectively
9.63 and 3.80, with the estimated eigenvalues of H20 being 0.8830 and 0.4865.
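The decision logic in this paragraph can be written as a short sequential rule. The sketch below is a generic reading of the text, not the paper's formal procedure (which is defined in its testing section): it tests rank 0 first, then rank 1, and declares the stable case only if both nulls are rejected. The function name `classify_case` and the label dictionary are hypothetical; the numbers are those reported above for the block diagonal specification with cm = 1.4.

```python
import numpy as np

def classify_case(lr_stats, critical_values):
    """Return the first rank r whose null H0: rank = r is not rejected.

    lr_stats[r] is LR(r, m) and critical_values[r] its critical value.
    If every null is rejected, the full rank m is returned.
    """
    m = len(lr_stats)
    for rank, (lr, cv) in enumerate(zip(lr_stats, critical_values)):
        if lr <= cv:
            return rank
    return m

# Labels for m = 2, following the case names used in the text.
CASE_LABELS = {
    0: "SC (spatial cointegration)",
    1: "MC (mixed cointegration, rank 1)",
    2: "S (stable)",
}

rank = classify_case([133.01, 85.17], [9.37, 3.96])
print(CASE_LABELS[rank])

# Cross-check with the reported eigenvalues of the estimated H matrix.
eigenvalues = np.array([0.8709, 0.4769])
all_inside_unit_circle = bool(np.all(np.abs(eigenvalues) < 1))
print(all_inside_unit_circle)
```

Both checks point to the stable case for these reported values, which matches the conclusion drawn in the text.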
The estimates for own-price spatial effects, temporal effects, and diffusion effects are similar in sign to those in the existing literature, while the cross-price effects turn out to be nonnegligible. For the specification with block diagonal spatial weights matrix and cm = 1.4 in Table 6.2, all three effects (spatial effect, temporal effect, and diffusion effect) of wheat prices on neighbors’ rice prices are significant. The temporal effect and diffusion effect of rice prices on neighbors’
Table 6.3
Estimation for the grain market integration in Yangtze River Basin.

Wn                   Distance: c = 1.2                  Province: c = 1.4                  Province: c = 1.2
Limit                None                               None                               None
Method               SDPD                               SDPD                               SDPD
                     θ                θc                θ                θc                θ                θc
Time effect
  prr                0.550 (0.011)    0.566 (0.011)     0.569 (0.011)    0.585 (0.011)     0.571 (0.011)    0.587 (0.011)
  pww                0.671 (0.009)    0.688 (0.009)     0.679 (0.009)    0.696 (0.009)     0.679 (0.009)    0.696 (0.009)
Diffusion effect
  φrr                −0.401 (0.022)   −0.413 (0.022)    −0.324 (0.016)   −0.334 (0.016)    −0.335 (0.016)   −0.345 (0.016)
  φww                −0.599 (0.019)   −0.613 (0.019)    −0.464 (0.015)   −0.475 (0.015)    −0.473 (0.015)   −0.485 (0.015)
Spatial effect
  ψrr                0.835 (0.014)    0.836 (0.014)     0.625 (0.010)    0.625 (0.010)     0.639 (0.011)    0.639 (0.011)
  ψww                0.860 (0.013)    0.860 (0.013)     0.646 (0.010)    0.646 (0.010)     0.660 (0.010)    0.660 (0.010)
σrr                  0.009 (0.000)    0.009 (0.000)     0.009 (0.000)    0.009 (0.000)     0.009 (0.000)    0.009 (0.000)
σww                  0.006 (0.000)    0.006 (0.000)     0.006 (0.000)    0.006 (0.000)     0.006 (0.000)    0.006 (0.000)
wheat prices are also statistically significant. The inclusion of cross-price effects also dampens the magnitude of own-price effects relative to the univariate SDPD models. For this specification of the SVAR model, the own-price spatial effect is ψ̂ww = 0.448 after bias correction, while for the SDPD models in Table 6.3, ψ̂ww = 0.625 with the same specification of Wn, which might suggest that the SDPD model overestimates the own-price spatial effects because it lacks cross-price spatial effects. There are similar patterns for estimates of own-price temporal effects and diffusion effects.
The model can also be used to illustrate how shocks to one variable transmit to other variables. We adopt the definition of Koop et al. (1996) to derive the generalized impulse response function. The impulse response function shows that a unit shock to rice or wheat prices in any spatial unit propagates across both temporal and spatial dimensions. Additional
analyses for impulse response functions are available in the supplementary file.
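As a rough illustration of how such responses can be traced out, the sketch below uses the closed form that the generalized impulse response takes in a linear model with Gaussian shocks, scaled to a one-standard-deviation shock. This is an assumption made for illustration; the paper's own derivation is in its supplementary file and may differ in scaling or detail. The function name `generalized_irf` and the toy inputs are hypothetical; `H` and `S` denote the reduced-form transition matrix and the contemporaneous spatial filter, and `Sigma` the covariance of the stacked shocks.

```python
import numpy as np

def generalized_irf(H, S, Sigma, j, horizons):
    """Responses of all stacked variables to a shock in element j.

    Horizon 0: S^{-1} Sigma e_j / sqrt(Sigma_jj); horizon h: H^h times that.
    Returns an array of shape (horizons + 1, dim).
    """
    scale = np.sqrt(Sigma[j, j])
    out = np.empty((horizons + 1, H.shape[0]))
    out[0] = np.linalg.solve(S, Sigma[:, j]) / scale
    for h in range(1, horizons + 1):
        out[h] = H @ out[h - 1]          # propagate one period forward
    return out

# Toy check: one spatial unit, two correlated variables, no spatial lag.
H_toy = 0.5 * np.eye(2)
S_toy = np.eye(2)
Sigma_toy = np.array([[1.0, 0.5], [0.5, 1.0]])
girf = generalized_irf(H_toy, S_toy, Sigma_toy, j=0, horizons=3)
```

In the toy case the shock to the first variable moves the second on impact through the shock correlation, and both responses then decay geometrically through `H_toy`.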
7. Concluding remarks
This paper investigates a dynamic panel SVAR model, in which behaviors of spatial units depend not only on own
temporal lags, but also respond to their neighbors’ or peers’ behaviors in current period (spatial lags), and to those in
previous period (space-time lags; diffusion). The disturbances in the model are specified with time fixed effects, individual
fixed effects and idiosyncratic disturbances.
We mainly study three issues: the features of dynamics and spatial interactions that an SVAR model can generate; the identification and estimation of parameters with simultaneity and unknown cointegration relationships; and the detection of the case to which the true model belongs. For the first issue, we categorize the model into six cases by partitioning the parameter space: stable, pure spatial cointegration, pure variable cointegration, mixed cointegration, pure unit roots, and explosive time series. For identification and estimation, we use IVs and the structure of disturbances to identify parameters, while a QML estimator can be consistent without knowledge of the cointegration relationships. To detect which case the true model belongs to, we introduce a hypothesis testing method, which can distinguish cases S, SC and MC with cointegration ranks.
Monte Carlo experiments demonstrate the advantage of QML estimators over other estimators in bias reduction and efficiency gains. The robustness of the estimators and test statistics is shown in the simulation results. The model is applied to study possible grain market integration using a unique historical dataset of grain prices. Previous research considers solely rice prices, while we add a multivariate feature since wheat is believed to be a substitute for rice. The empirical results show that rice and wheat prices are spatially correlated with each other across prefectures.
In future work, the identification and estimation of the SVAR model under cases VC and PU should be investigated, as in these cases the spatial interdependence of variables no longer generates stable linear combinations among variables. For example, for case VC, the vector error correction model suggests that the long-run equilibrium is among variables in the same spatial unit while variables are spatially correlated in the short run. The LR test procedure in this paper can also be regarded as a pre-test. It is of interest to explore estimation methods that use the cointegration rank determined by this test. Furthermore, the asymptotic distributions of QML estimators and test statistics rely on the time trend generated by individual fixed effects and exogenous regressors, as a time trend would dominate stochastic trends. For the model with neither exogenous regressors nor individual effects, the asymptotic analysis of stochastic trends would be different, but that situation is beyond the scope of this paper.
Acknowledgments
The authors are grateful for valuable comments and suggestions from a co-editor, an associate editor, and two anonymous referees of this journal, audiences at the 2017 Asian Meeting of the Econometric Society and the 2017 China Meeting of the Econometric Society, and seminar participants at the Ohio State University, Tsinghua University, and Jinan University. Kai Yang’s research is supported by NSFC (Grant No. 71703090) and the ‘‘Chenguang Program’’ (No. 19CG43) supported by Shanghai Education Development Foundation and Shanghai Municipal Education Commission.
Appendix A. Selected notations from the main text
1. Parameters and their combinations
Symbol
Explanation
Pm
Time lagged effects
Spatial lagged effects
Diffusion (space–time lagged effects)
n × m fixed effects for spatial units
′
Dm,t = αm
,t ⊗ ln where αm,t are time fixed effects
k × m coefficient matrix for independent variables
Covariance matrix for each row of Vnm,t
= Im − Ψm′
= Inm − Wn ⊗ Ψm′
′
= Sm−1 (Pm + Φm )′ = Υm−1 Λm Υm′
−1
′
= Snm (In ⊗ Pm + Wn ⊗ Φm′ )
′ ′
′ ′
′ ′
) , vec(Pm
) , vec(Φm
) , vec(Ψm′ )′ , vec∗ (Σv m )′ ]′
= [vec(Πkm
−1
Wn = Γn ω̄n Γn
The first n1 columns of eigenvectors of Wn corresponding to unit eigenvalues
n × n0 submatrix of Γn,n1 = [Γn,n0 , √1n ln ], where ln is an n × 1 vector with ones and n0 = n1 − 1
Ψm
Φm
Cnm
Dm,t
Πkm
Σv m
Sm
Snm
Hm
Hnm
θ
Γn
Γn,n1
Γn,n0
Γn∗1 ,n
Γn∗0 ,n
αm , βm
α⊥m , β⊥m
The first n1 rows of Γn−1
The first n0 rows of Γn−1
′
Hm − Im = αm βm
, where both matrices are m × (m − m1 ) with full column rank
′
m × m1 matrices with full column rank and βm
β⊥m = αm′ α⊥m = 0
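2. Variables

Symbol                              Explanation
$Y_{nm,t}$                          n × m dependent variable matrix at time t
$X_{nk,t}$                          n × k independent variable matrix at time t
$V_{nm,t}$                          n × m disturbances at time t
$W_n$                               Spatial weights matrix
$y_{nm,t}$                          $= \mathrm{vec}(Y'_{nm,t})$ (similar for $v_{nm,t}$ and $c_{nm}$)
$\tilde{Y}_{nm,t}$                  $= Y_{nm,t} - \bar{Y}_{nm,T}$ where $\bar{Y}_{nm,T} = \frac{1}{T}\sum_{t=1}^{T} Y_{nm,t}$, similarly for $\tilde{y}_{nm,t}$
$\tilde{\tilde{Y}}_{nm,t}$          $= Y_{nm,t} - \bar{\bar{Y}}_{nm,T}$ where $\bar{\bar{Y}}_{nm,T} = \frac{1}{T}\sum_{t=0}^{T-1} Y_{nm,t}$, similarly for $\tilde{\tilde{y}}_{nm,t-1}$
$Y^{\circ}_{n_1 m,t}$               $= \Gamma^{*}_{n_1,n} Y_{nm,t}$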
3. Simplified expression
Symbol
θ
Anm0
B1nm
Gnm,d
Explanation
= In ⊗ (Pm0 − Im )′ + Wn ⊗ (Ψm0 + Φm0 )′
Generic nm × nm matrices; bounded row and column sum norms; similar
for B2nm , Gnm,c
∑+∞
h
Generic nm × nm matrix such that
h=1 abs(Gnm,d ) are with bounded row
and column sum norms
Symbol
Explanation
Unm,t
=
Ωθ0 ,nT
Information matrix with
Ξθ0 ,nT
Additional term for Cov
∑+∞
h=1
Gnm,c Ghnm,d vnm,t −h+1
∂ 2 ln LnT ,m (θ0 )
+ Ωθ0 ,nT = op (1).
∂θ∂θ ′
)
∂ ln L1nT ,m0
1
√
= Ωθ0 ,nT + Ξθ0 ,nT
∂θ
(n−n1 )T
1
((n−n1 )T
+O
(1)
T
, due
to non-normality
∂ ln LRnT ,m0
1
∂θ
(n−n1 )T
√
n−n1
∆R,nT
T
(√ )
∆R,nT
Bias term of
∆θ0 ,nT
= Ωθ−01,nT ∆R,nT
′
= (Xnk,t ⊗ Im )vec(Πkm
) + cnm
−1
′
′
′
′
= (Wn ⊗ ej,m )Snm0 [(In ⊗ Pm0
+ Wn ⊗ Φm0
)ỹ˜ nm,t −1 + (X̃nk,t ⊗ Im )vec(Πkm0
)],
Qnm,t
ãj,t
√
=
+O
n
T3
Ãz ,t
where ej,m is an m-dimensional vector with a unit for the jth entry but zero
for all other entries
= [ã1,t , ã2,t , . . . , ãm,t ],
PzzA,t
= Ãz ,t − Z̃nm,t
[
∑T
1
(n−n1 )T
t =1
′
Z̃nm
,t Jn Z̃nm,t
]−1 [
1
(n−n1 )T
∑T
t =1
′
Z̃nm
,t Jn Ãz ,t
]
Note: more expressions and transformations of variables are in Section 4 of testing; however, they are defined step by
step and only useful within its own section.
Appendix B. Lemmas
The reduced form for the stable SVAR model is
\[
y_{nm,t} = \sum_{h=0}^{+\infty} H^{h}_{nm0} S^{-1}_{nm0} \left[ (X_{nk,t-h} \otimes I_m)\,\mathrm{vec}(\Pi'_{km0}) + c_{nm0} + l_n \otimes \alpha_{m0,t-h} + v_{nm,t-h} \right],
\]
where, as we recall, $H_{nm0} = S^{-1}_{nm0}(I_n \otimes P'_{m0} + W_n \otimes \Phi'_{m0})$. Denote $E(v_{nm,t} v'_{nm,t}) = I_n \otimes \Sigma_{vm0} \equiv \Sigma_{v,nm}$. The statistic $\sum_{h=0}^{+\infty} H^{h}_{nm} S^{-1}_{nm} v_{nm,t-h}$ is crucial in our asymptotic analysis, where $H_{nm}$ and $S_{nm}$ are evaluated at any value of θ in its compact parameter space. For the estimation method proposed in the main text, we use the stable subsystem $y^{(s)}_{nm,t}$, which has properties similar to those of the stable SVAR, so the following results also apply.
Here we provide some basic lemmas on moments for relevant statistics. Define $U_{nm,t} = \sum_{h=1}^{+\infty} G_{nm,h} v_{nm,t+1-h}$ and $W_{nm,t} = \sum_{h=1}^{+\infty} H_{nm,h} v_{nm,t+1-h}$, where the matrices $G_{nm,h}$ and $H_{nm,h}$ are nm × nm generic matrices with regularity conditions in Assumption B.1. From the defined $U_{nm,t}$, t = 1, . . . , T, the corresponding sample mean over time $\bar{U}_{nm,T} = \frac{1}{T}\sum_{t=1}^{T} U_{nm,t}$ has the expression
\[
\bar{U}_{nm,T} = \sum_{h=1}^{+\infty} \ddot{G}_{nm,h} v_{nm,T+1-h}, \qquad \bar{W}_{nm,T} = \sum_{h=1}^{+\infty} \ddot{H}_{nm,h} v_{nm,T+1-h},
\]
where $\ddot{G}_{nm,h} = \frac{1}{T}\sum_{g=1}^{h} G_{nm,g}$ for h ≤ T, and $\ddot{G}_{nm,h} = \frac{1}{T}\sum_{g=1}^{T} G_{nm,h-T+g}$ for h > T (similarly for $\ddot{H}_{nm,h}$).
Lemma B.1. Under Assumption 3.1(i), when t ≥ s,
\[
E(U_{nm,t} W'_{nm,s}) = \sum_{h=1}^{+\infty} G_{nm,t-s+h} \Sigma_{v,nm} H'_{nm,h}, \qquad E(U'_{nm,t} W_{nm,s}) = \sum_{h=1}^{+\infty} \mathrm{Tr}[G'_{nm,t-s+h} H_{nm,h} \Sigma_{v,nm}].
\]
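Lemma B.2. Let $C_{nm,ts,gh} = E[(v'_{nm,t} B_{1nm} v_{nm,s})(v'_{nm,g} B_{2nm} v_{nm,h})]$. Under Assumption 3.1(i),
(1) for all t,
\[
\begin{aligned}
C_{nm,tt,tt} ={}& \mathrm{Tr}[B_{1nm} \Sigma_{v,nm} (B_{2nm} + B'_{2nm}) \Sigma_{v,nm}] + \mathrm{Tr}(B_{1nm} \Sigma_{v,nm})\,\mathrm{Tr}(B_{2nm} \Sigma_{v,nm}) \\
& + \sum_{k=1}^{m}\sum_{l=1}^{m}\sum_{p=1}^{m}\sum_{q=1}^{m} \left( u_{klpq} - \sigma^{2}_{kp}\sigma^{2}_{lq} - \sigma^{2}_{kq}\sigma^{2}_{lp} - \sigma^{2}_{kl}\sigma^{2}_{pq} \right) \sum_{i=1}^{n} [B_{1nm}]^{kl}_{ii} [B_{2nm}]^{pq}_{ii};
\end{aligned}
\]
(2) for all t ≠ s, $C_{nm,tt,ss} = \mathrm{Tr}(B_{1nm} \Sigma_{v,nm})\,\mathrm{Tr}(B_{2nm} \Sigma_{v,nm})$; $C_{nm,ts,ts} = \mathrm{Tr}[B_{1nm} \Sigma_{v,nm} B'_{2nm} \Sigma_{v,nm}]$; and $C_{nm,ts,st} = \mathrm{Tr}[B_{1nm} \Sigma_{v,nm} B_{2nm} \Sigma_{v,nm}]$;
(3) in all cases other than (1) and (2), $C_{nm,ts,gh} = 0$;
where $[A_{nm}]^{pq}_{ii}$ is the ii-th (diagonal) entry of the pq-th n × n block of a generic matrix $A_{nm}$, $u_{klpq} = E(v_{n,t,ik} v_{n,t,il} v_{n,t,ip} v_{n,t,iq})$ and $\sigma^{2}_{kq} = E(v_{n,t,ik} v_{n,t,iq})$ for any t and i, with k, l, p, q = 1, . . . , m.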
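Part (1) of Lemma B.2 can be checked numerically in the special case of normal disturbances, where the fourth-cumulant sum vanishes and only the two trace terms remain. The sketch below is a Monte Carlo sanity check under that assumption; the matrices `B1`, `B2` and the covariance `Sigma_v` are arbitrary hypothetical choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 2, 2
dim = n * m

# Hypothetical covariance of each unit's m disturbances; stacked covariance
# is I_n (x) Sigma_v, matching the block structure used in the appendix.
Sigma_v = np.array([[1.0, 0.3], [0.3, 0.5]])
Sigma = np.kron(np.eye(n), Sigma_v)

# Arbitrary (non-symmetric) generic matrices.
B1 = np.eye(dim) + 0.1
B2 = np.diag([0.25, 0.5, 0.75, 1.0]) + 0.05 * np.triu(np.ones((dim, dim)), k=1)

# Trace terms of Lemma B.2(1); the cumulant term is zero under normality.
exact = (np.trace(B1 @ Sigma @ (B2 + B2.T) @ Sigma)
         + np.trace(B1 @ Sigma) * np.trace(B2 @ Sigma))

# Monte Carlo estimate of E[(v' B1 v)(v' B2 v)].
draws = 500_000
v = rng.multivariate_normal(np.zeros(dim), Sigma, size=draws)
q1 = np.einsum("ti,ij,tj->t", v, B1, v)
q2 = np.einsum("ti,ij,tj->t", v, B2, v)
monte_carlo = float(np.mean(q1 * q2))

relative_gap = abs(monte_carlo - exact) / exact
```

With non-normal disturbances the simulated moment would differ from `exact` by the cumulant term in the lemma, which is the source of the additional variance term due to non-normality listed among the notations.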
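Lemma B.3. Under Assumption 3.1(i),
\[
\begin{aligned}
\mathrm{cov}(U'_{nm,t} W_{nm,t},\, U'_{nm,s} W_{nm,s})
={}& \mathrm{Tr}\left[ \sum_{g=1}^{+\infty} H_{nm,t-s+g} \Sigma_{v,nm} H'_{nm,g} \sum_{h=1}^{+\infty} G_{nm,h} \Sigma_{v,nm} G'_{nm,t-s+h} \right] \\
& + \mathrm{Tr}\left[ \sum_{g=1}^{+\infty} H_{nm,t-s+g} \Sigma_{v,nm} G'_{nm,g} \sum_{h=1}^{+\infty} H_{nm,h} \Sigma_{v,nm} G'_{nm,t-s+h} \right] \\
& + \sum_{k=1}^{m}\sum_{l=1}^{m}\sum_{p=1}^{m}\sum_{q=1}^{m} \left( u_{klpq} - \sigma^{2}_{kl}\sigma^{2}_{pq} - \sigma^{2}_{kq}\sigma^{2}_{lp} - \sigma^{2}_{kp}\sigma^{2}_{lq} \right) \left( \sum_{g=1}^{+\infty} \sum_{i=1}^{n} \left[ G'_{nm,t-s+g} H_{nm,t-s+g} \right]^{kl}_{ii} \left[ G'_{nm,g} H_{nm,g} \right]^{pq}_{ii} \right).
\end{aligned}
\]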
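Assumption B.1. $U_{nm,t} = \sum_{h=1}^{+\infty} G_{nm,h} v_{nm,t+1-h}$ with $G_{nm,h} = G_{nm,c} G^{h}_{nm,d}$, and $W_{nm,t} = \sum_{h=1}^{+\infty} H_{nm,h} v_{nm,t+1-h}$ with $H_{nm,h} = H_{nm,c} H^{h}_{nm,d}$, where $G_{nm,c}$, $H_{nm,c}$, $\sum_{h=1}^{+\infty} \mathrm{abs}(G^{h}_{nm,d})$ and $\sum_{h=1}^{+\infty} \mathrm{abs}(H^{h}_{nm,d})$ are bounded in row and column sum norms.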
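Lemma B.4. Under Assumptions 3.1(i) and B.1, $\mathrm{var}\left( \sum_{t=1}^{T} U'_{nm,t} W_{nm,t} \right) = O(nT)$.
Lemma B.5. Under Assumptions 3.1(i) and B.1,
(1) $E\left[ \frac{1}{nT} \sum_{t=1}^{T} U'_{nm,t} W_{nm,t} \right] = \frac{1}{n} \sum_{h=1}^{+\infty} \mathrm{Tr}[G'_{nm,h} H_{nm,h} \Sigma_{v,nm}] = O(1)$;
$\frac{1}{nT} \sum_{t=1}^{T} U'_{nm,t} W_{nm,t} - E\left[ \frac{1}{nT} \sum_{t=1}^{T} U'_{nm,t} W_{nm,t} \right] = O_p\left( \frac{1}{\sqrt{nT}} \right)$.
(2) $E\left[ \frac{1}{n} \bar{U}'_{nm,T} \bar{W}_{nm,T} \right] = \frac{1}{n} \sum_{h=1}^{+\infty} \mathrm{Tr}\left[ \ddot{G}'_{nm,h} \ddot{H}_{nm,h} \Sigma_{v,nm} \right] = O\left( \frac{1}{T} \right)$;
$\frac{1}{n} \bar{U}'_{nm,T} \bar{W}_{nm,T} - E\left[ \frac{1}{n} \bar{U}'_{nm,T} \bar{W}_{nm,T} \right] = O_p\left( \frac{1}{\sqrt{nT^{2}}} \right)$.
(3) $\frac{1}{nT} \sum_{t=1}^{T} \tilde{U}'_{nm,t} \tilde{W}_{nm,t} - E\left[ \frac{1}{nT} \sum_{t=1}^{T} \tilde{U}'_{nm,t} \tilde{W}_{nm,t} \right] = O_p\left( \frac{1}{\sqrt{nT}} \right)$.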
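Lemma B.6. Suppose $D_{nm,t}$ is an nm × 1 nonstochastic matrix with uniformly bounded elements. Under Assumptions 3.1(i) and B.1, $\frac{1}{nT} \sum_{t=1}^{T} \tilde{D}'_{nm,t} \tilde{U}_{nm,t} = \frac{1}{nT} \sum_{t=1}^{T} \tilde{D}'_{nm,t} U_{nm,t} = O_p\left( \frac{1}{\sqrt{nT}} \right)$, and $\frac{1}{nT} \sum_{t=1}^{T} \bar{D}'_{nm,T} \bar{U}_{nm,T} = O_p\left( \frac{1}{\sqrt{nT}} \right)$.
As the nature of the dynamic panel model results in time lags, we define $\bar{\bar{U}}_{nm,T} = \frac{1}{T} \sum_{t=0}^{T-1} U_{nm,t}$ and $\tilde{\tilde{U}}_{nm,t} = U_{nm,t} - \bar{\bar{U}}_{nm,T}$.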
Lemma B.7. Under Assumptions 3.1(i) and B.1,
∑+∞
T
1
1
n
¯ ′ v̄
¯′
¯′
¯′
E(Ū
nm,T nm,T ) = n Tr(Σv,nm Gnm,c
g =1 Gnm,d ) + O( T ); so E(Ūnm,T v̄nm,T ) = O( T ). And Ūnm,T v̄nm,T − E(Ūnm,T v̄nm,T ) =
n
Op (
√
n
).
T2
The subsequent Lemmas B.8 and B.9 are implied results by the above basic lemmas. They are used to establish the
uniform convergence of the sample average concentrated log likelihood function to its expectation.
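Lemma B.8. Under Assumptions 2.1(i)–(ii), and 3.1, suppose a generic matrix $B_{1nm}$ is uniformly bounded in both row and column sum norms, then
\[
\frac{1}{nT} \sum_{t=1}^{T} [(J^{*}_{n} \otimes I_m)\tilde{\tilde{y}}_{nm,t-1}]' B_{1nm} \tilde{v}_{nm,t} - E\left[ \frac{1}{nT} \sum_{t=1}^{T} [(J^{*}_{n} \otimes I_m)\tilde{\tilde{y}}_{nm,t-1}]' B_{1nm} \tilde{v}_{nm,t} \right] = O_p\left( \frac{1}{\sqrt{nT}} \right);
\]
and
\[
\frac{1}{nT} \sum_{t=1}^{T} [(J^{*}_{n} \otimes I_m)\tilde{\tilde{y}}_{nm,t-1}]' B_{1nm} [(J^{*}_{n} \otimes I_m)\tilde{\tilde{y}}_{nm,t-1}] - E\left[ \frac{1}{nT} \sum_{t=1}^{T} [(J^{*}_{n} \otimes I_m)\tilde{\tilde{y}}_{nm,t-1}]' B_{1nm} [(J^{*}_{n} \otimes I_m)\tilde{\tilde{y}}_{nm,t-1}] \right] = O_p\left( \frac{1}{\sqrt{nT}} \right).
\]
Furthermore,
\[
E\left\{ \frac{1}{nT} \sum_{t=1}^{T} [(J^{*}_{n} \otimes I_m)\tilde{\tilde{y}}_{nm,t-1}]' B_{1nm} \tilde{v}_{nm,t} \right\} = O\left( \frac{1}{T} \right),
\]
and
\[
E\left\{ \frac{1}{nT} \sum_{t=1}^{T} [(J^{*}_{n} \otimes I_m)\tilde{\tilde{y}}_{nm,t-1}]' B_{1nm} [(J^{*}_{n} \otimes I_m)\tilde{\tilde{y}}_{nm,t-1}] \right\} = O(1).
\]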
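Lemma B.9. Under Assumptions 2.1(i)–(ii), and 3.1, if a generic matrix $B_{1nm}$ is bounded in both row and column sum norms, then
(1) $E\left[ \frac{1}{nT} \sum_{t=1}^{T} v'_{nm,t} B_{1nm} v_{nm,t} \right] = \frac{1}{n} \mathrm{Tr}[B_{1nm}(I_n \otimes \Sigma_{vm0})] = O(1)$;
$\frac{1}{nT} \sum_{t=1}^{T} v'_{nm,t} B_{1nm} v_{nm,t} - E\left[ \frac{1}{nT} \sum_{t=1}^{T} v'_{nm,t} B_{1nm} v_{nm,t} \right] = O_p\left( \frac{1}{\sqrt{nT}} \right)$.
(2) $E\left[ \frac{1}{n} \bar{v}'_{nm,T} B_{1nm} \bar{v}_{nm,T} \right] = \frac{1}{nT} \mathrm{Tr}[B_{1nm}(I_n \otimes \Sigma_{vm0})] = O\left( \frac{1}{T} \right)$;
$\frac{1}{n} \bar{v}'_{nm,T} B_{1nm} \bar{v}_{nm,T} - E\left[ \frac{1}{n} \bar{v}'_{nm,T} B_{1nm} \bar{v}_{nm,T} \right] = O_p\left( \frac{1}{\sqrt{nT^{2}}} \right)$.
(3) $\frac{1}{nT} \sum_{t=1}^{T} \tilde{v}'_{nm,t} B_{1nm} \tilde{v}_{nm,t} - E\left[ \frac{1}{nT} \sum_{t=1}^{T} \tilde{v}'_{nm,t} B_{1nm} \tilde{v}_{nm,t} \right] = O_p\left( \frac{1}{\sqrt{nT}} \right)$.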
Lemma B.10. Under Assumptions 3.1(i) and B.1, the fourth moment of any entry unk,t −1,i of Unm,t −1 is uniformly bounded,
i.e., E |unk,t −1,i |4 = O(1) uniformly in k and i.
Appendix C. Technical details for Section 4
The transformed terms Vn◦0 m,t in
′
◦
Yn◦0 m,t =Yn◦0 m,t Hm0
+ Xn◦0 ,t Πkm
+ Cn◦0 m + Vn◦0 m,t ,
′
′
′
′
′
′
′
◦
or ∆Yn◦0 m,t =αm0 βm0
Yn◦0 m,t + Πkm
Xn◦0 ,t + Cn◦0 m + Vn◦0 m,t ,
contain rows with the same zero expectations and cov(v◦n0 ,t ) = In0 ⊗ Σum0 . For simplicity, we let Xn◦0 ,t = Xnu0 ,t without
loss of generality and introduce simple notations
′
Zt = Hm0 Zt −1 + C0 + π0 Xt + ϵt , or ∆Zt = αm0 βm0
Zt −1 + C0 + π0 Xt + ϵt ,
′
′
′
′
′
◦
.
where Zt , C0 , and ϵt are m × n0 matrices, which equal respectively Yn◦0 m,t , Cn◦0 m , and Vn◦0 m,t . Xt = Xn◦0 ,t and π0 = Πkm0
c
x
Also, without Xn0 ,t , τn0 = τn0 .
For each t, the columns of ϵt are m × 1 vectors with covariance matrix Σϵ . The Sij for i, j = 0, 1, the likelihood
function, and the likelihood ratio test statistic are similarly defined in the main context. Following Johansen
(1991)’s
∑t
π0 Xs +
procedure, we have that ∆Zt = B(L)(π0 Xt + C0 + ϵt ) = B(L)π0 Xt + τn0 + B(L)ϵt and Zt = Z0 + ( s=1 B(L))
′
αm0
−1
ϵs + τn0 t + St − S0 , where St = B1 (L)ϵt , B1 (L) = B(L)1−−B(1)
(L)
, and
,
B(L)
=
(
β
(1
−
L)
,
β
)A
m0
⊥
m0
′
L
α⊥m0
)
(
′
′
′
′
−αm0 αm0 βm0 βm0 + αm0 χ (L)βm0 (1 − L) αm0 χ (L)β⊥m0
. B is the value of B(z) at z = 1. This means that
A(L) =
′
′
α⊥
α⊥
m0 χ (L)βm0 (1 − L)
m0 χ (L)β⊥m0
B
∑t
s=1
each column of Zt is composed of a time trend (with exogenous variable Xt ), a unit root process, and stationary processes,
′
′
′
Stx where Stx = B1 (L)π0 Xt , is also
St + βm0
Zt = βm0
while each column of ∆Zt is stationary. Moreover, each column of βm0
stationary, in addition to exogenous variables.
In order to derive the distribution of likelihood ratio test statistics, we need the following lemmas. Note that both
−1
Assumptions 4.1 and 4.2 imply that n0 is increasing in T . Here, we define Σjl = Σjl+ − Σjx Σxx
Σxl , where j, l = 0, 1, and
+
Σ00
= lim
T →∞
+
′
βm0
Σ10
= lim
T →∞
+
Σ01
βm0 = lim
T →∞
+
′
βm0
Σ11
βm0 = lim
T →∞
Σxx = lim
T →∞
Σ0x = lim
T →∞
Σx1 βm0 = lim
T →∞
T
1 ∑
n0 T
t =1
T
1 ∑
n0 T
∆Z̃t X̃t′ ,
t =1
T
1 ∑
n0 T
X̃t X̃t′ ,
t =1
T
1 ∑
n0 T
˜
′ ˜
βm0
Z̃t −1 Z̃t′−1 βm0 ,
t =1
T
1 ∑
n0 T
∆Z̃t Z̃˜t′−1 βm0 ,
t =1
T
1 ∑
n0 T
′ ˜
βm0
Z̃t −1 ∆Z̃t′ ,
t =1
T
1 ∑
n0 T
∆Z̃t ∆Z̃t′ ,
t =1
˜
X̃t Z̃t′−1 βm0 .
Lemma C.1. Under H0 , Assumptions 2.1, 2.2(i), 3.1(i)–(iii), and 4.2, for any case defined by Assumption 2.2(ii)–(iv),
p
(1) As T → ∞, S00 → Σ00 .
p
′
′
(2) As T → ∞, βm0
S11 βm0 → βm0
Σ11 βm0 .
p
β⊥′ m0 (limn0 →∞ n1 τn0 τn′ 0 )β⊥m0 .
0
√
n0
n0
1
′
′ d
(4) As T → ∞,
(S
−
α
β
S
)
+
Σ
B
→
S[01] , where vec(S[01] ) ∼
01
m0
11
ϵ
T
)m0
) 2 T
(
(
1
1
′
N 0, 12 limn0 →∞ n τn0 τn0 ⊗ Σϵ .
(3) As T → ∞,
1
T2
√
β⊥′ m0 S11 β⊥m0 →
1
12
0
′
′
(5) α⊥
m0 Σ00 α⊥m0 = α⊥m0 Σϵ α⊥m0 .
−1
−1
−1
−1
′
′
′
−1 ′
(6) α⊥m0 (α⊥m0 Σϵ α⊥m0 )−1 α⊥
βm0 Σ10 Σ00
m0 = Σ00 − Σ00 Σ01 βm0 (βm0 Σ10 Σ00 Σ01 βm0 )
Using Lemma C.1, we show the following two lemmas. Proposition 4.1 is a direct result of those lemmas.
Lemma C.2. Under H0 , Assumptions 2.1, 2.2(i), 3.1(i)–(iii), and 4.2, for any case defined by Assumption 2.2(ii)–(iv), namely,
−1
S01 | = 0 converge in probability to
the S, MC, and SC cases, the largest m − m1 solutions {λ1 , . . . , λm−m1 } of |λS11 − S10 S00
−1
′
′
corresponding eigenvalues of |λβm0 Σ11 βm0 − βm0 Σ10 Σ00 Σ01 βm0 | = 0, while the others converge to zeros in probability.
Lemma C.3.
Under H0 , Assumptions 2.1, 2.2(i), 3.1(i)–(iii), and 4.2, for any case defined by Assumption 2.2(ii)–(iv), as
d
n0
T
→ M, where 0 ≤ M < +∞, the likelihood ratio test statistics LR(m − m1 , m) → Tr[(Wm1 + Bm1 )(Wm1 + Bm1 )′ ]
where Wm1 is an m1 × m1 matrix with each entry being i.i.d. standard normally distributed random variable and Bm1 =
(
(
)
)−1/2
√
1 x x′
′
′
1/2
− 3M α⊥
.
(α⊥
m0 limn0 →0 n0 τn0 τn0 α⊥m0
m0 Σϵ α⊥m0 )
)
)−1/2
(
(
√
1
′
′
1/2
′
Without Xnc0 ,t , τnx0 = τn0 . As Bm1 = − 3M α⊥
, we need plug-in
(α⊥
m0 Σϵ α⊥m0 )
m0 limn0 →0 n τn0 τn0 α⊥m0
0
p
p
′
ˆ
′
estimators for Bm1 and simulate the critical values. Our estimators, n1 Cn◦ m0 Cn◦ m0 → limn0 →0 n1 C0 C0′ and α⊥
m0 S00 α⊥m0 →
0
0
0
0
′
′
α⊥m0 Σϵ α⊥m0 . However, βm0 , αm0 , and α⊥m0 are not unique. To see this, with normalization βm0 βm0 = Im−m−1 , bm0 =
′
= am0 b′m0 for a generic orthonormal matrix Rm−m1 . For a
βm0 Rm−m1 and am0 = αm0 Rm−m1 also satisfy Hm0 = αm0 βm0
given βm0 as the true parameter matrix, we can show that an estimator of β̃m is a consistent estimator for a basis of
−1
column space of βm0 as they are identified up to transformation by an invertible matrix. The QMLEs β̃m = S112 V̂m,m−m1
′
S11 β̃m )−1 , where V̂m,m−m1 represents the matrix of eigenvectors corresponding to the largest m − m1
and α̃m = S01 β̃m (β̃m
−1
−1
−1
−1
−1
1
−1
−1
−2
eigenvalues λ1 ≥ · · · ≥ λm−m1 for |λIm − S112 S10 S00
S01 S112 | = 0. α̃⊥m = S00
S01 S112 V̂m,m1 (V̂m′ ,m1 S112 S10 S00
S01 S112 V̂m,m1 )− 2 ,
where V̂m,m1 represents the matrix of eigenvectors corresponding to the last eigenvalues λm−m1 +1 ≥ · · · ≥ λm for
−1
−1
−1
|λIm − S112 S10 S00
S01 S112 | = 0. In the following lemma, we show the consistency of α̃⊥m in the following way: first,
we show that a transformation of β̃m by post-multiplying an invertible matrix is a consistent estimator for a given βm0 ,
which implies that β̃m is also a consistent estimator for a basis of column space of βm0 ; second, we show a transformation
of α̃m is also a consistent estimator for a basis of column space of αm0 ; last, we show that the QMLE α̃⊥m is a consistent
′
estimator for a basis of column space of α⊥m0 as αm0
α̃⊥m = op (1). As α⊥m0 is identified up to transformation by an
′
, and Tr[(Wm1 + Bm1 )(Wm1 + Bm1 )′ ] has the same distribution as
orthonormal matrix with restriction α⊥
α
=
I
⊥
m0
m
m0
1
Tr[(Wm1 + B∗m1 )(Wm1 + B∗m1 )′ ] with B∗m1 = Rm1 Bm1 R′m1 where Rm1 is a generic m1 × m1 orthonormal matrix, we can plug-in
the estimator α̃⊥m to simulate the critical values.
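Because the limiting distribution in Lemma C.3 is that of Tr[(Wm1 + Bm1)(Wm1 + Bm1)′], critical values can be obtained by simulation once a plug-in value of Bm1 is available. The sketch below shows only that simulation step; how Bm1 is estimated is described in the text and is not reproduced here. The function name `simulate_critical_value`, the number of draws, and the example drift matrix are hypothetical choices.

```python
import numpy as np

def simulate_critical_value(B, level=0.05, draws=200_000, seed=0):
    """Upper-`level` quantile of Tr[(W + B)(W + B)'] by simulation.

    W is m1 x m1 with i.i.d. standard normal entries and B is a fixed
    m1 x m1 matrix (a plug-in estimate in practice).
    """
    rng = np.random.default_rng(seed)
    B = np.atleast_2d(np.asarray(B, dtype=float))
    m1 = B.shape[0]
    W = rng.standard_normal((draws, m1, m1))
    # Tr[(W + B)(W + B)'] equals the sum of squared entries of W + B.
    stats = ((W + B) ** 2).sum(axis=(1, 2))
    return float(np.quantile(stats, 1.0 - level))

# With B = 0 the statistic is chi-square with m1^2 degrees of freedom.
cv_m1_1 = simulate_critical_value(np.zeros((1, 1)))
cv_m1_2 = simulate_critical_value(np.zeros((2, 2)))

# A nonzero (hypothetical) drift shifts the distribution to the right.
cv_shifted = simulate_critical_value(0.5 * np.eye(2))
```

The zero-drift cases serve as a check against the familiar chi-square quantiles; a nonzero drift matrix moves the simulated critical values away from those benchmarks, which is why the text simulates them rather than using tabulated values.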
Lemma C.4. Under H0 , Assumptions 2.1, 2.2(i), 3.1(i)–(iii), and 4.2, for any case defined by Assumption 2.2(ii)–(iv), for a
′
′
β̃m )−1 , where β̄m0 = βm0 (βm0
βm0 )−1 , is a consistent estimator for βm0 , i.e., β̂m − βm0 = op (1). The
given βm0 , β̂m = β̃m (β̄m0
′
′
estimator α̃⊥m is a consistent estimator for a basis of null space of αm0
, i.e., αm0
α̃⊥m = op (1).
Lemma 4.1 follows Lemma C.4. The proof of Corollary 1 is in the supplementary file.
Appendix D. Biases terms
∆R,vec(Pm′ ),nT = (∆R,Pm,11 ,nT , ∆R,Pm,12 ,nT , . . . , ∆R,Pm,1m ,nT , . . . , ∆R,Pm,mm ,nT )′ , and similarly for ∆R,vec(Φm′ ),nT , ∆R,vec(Σv′ m ),nT and
∆R,vec(Ψm′ ),nT .31 Explicitly,
′ ),nT = 0,
∆R,vec(Πkm
⎡
∆R,Pm,ij ,nT = −
1
n − n1
′
Tr ⎣(Jn∗ ⊗ Em
,ij )
+∞
∑
⎤
g
−1 ⎦
Bnm0 Snm0
,
g =0
31 The arrangement and dimension of the vector depends on the arrangement of distinct parameters to estimate.
⎡
∆R,Φm,ij ,nT = −
1
∗
n − n1
′
Tr ⎣((Jn Wn ) ⊗ Em,ij )
+∞
∑
⎤
g
−1
Bnm0 Snm0
⎦,
g =0
1
∆R,Σvm,ij ,nT = − Tr[Fm,ij Σv−m0
],
⎡
∆R,Ψm,ij ,nT = −
1
n − n1
−
−1
′
′
′
Tr ⎣((Jn∗ Wn ) ⊗ Em
,ij )Snm0 (in ⊗ Pm0 + Wn ⊗ Φm0 )
1
n − n1
+∞
∑
⎤
g
−1
Bnm0 Snm0
⎦
g =0
−1
′
Tr ((Jn∗ Wn ) ⊗ Em
,ij )Snm0 .
[
]
Appendix E. Supplementary data
Supplementary material related to this article can be found online at https://doi.org/10.1016/j.jeconom.2020.05.010.