Modeling mixed spatial processes and spatio-temporal dynamics in information-theoretic frameworks Rosa Bernardini Papalia Dipartimento di Scienze Statistiche, Università di Bologna,Via Belle Arti, 41 40100 Bologna , Italy Summary. This work aims at developing an entropy-based estimation strategy of a dynamic spatial heterogeneous panel data model where separate processes for each unit are considered. As regards traditional estimation techniques, the formulation of the constrained maximization problem in the maximum entropy view does not require: (i) the use of restrictive parametric assumptions on the model; (ii) the formulation of hypotheses regarding the form of the distribution of the objective variables. Restrictions expressed in terms of inequality can be introduced and it is possible to calibrate the precision in the estimation. Good results are produced in the case of small-sized samples, in the presence of high numbers of explanatory parameters and variables (highly correlated). Key words: Dynamic panel data, generalized maximum entropy estimation, spatial models 1 Introduction In recent years, there has been a growing interest in the estimation of econometric relationships based on panel data (Hsiao 1989). In this work we focus on dynamic models for spatial panels, a family of models for which there is an increasing interest in estimation problems (Elhorst 2003, Hadinger, Muller, Tondl 2002). A dynamic spatial panel data model takes the form of a linear equation extended with a variable intercept, a serially lagged dependent variable and either a spatially lagged dependent variable (known as spatial lag model) or a spatially autoregressive process incorporated in the error term (known as spatial error model). With respect to a standard space-time model (Space-Time Autoregressive /Integrated Moving Average model, STARMA, STARIMA (Hepple 1978), and the spatial autoregression space-time forecasting model (Griffith 1996)) which assumes that the spatial units are completely homogeneous, a panel data approach would presume that spatial heterogeneity is a feature of data and attempt to model that heterogeneity. The need to account for spatial heterogeneity is that spatial units are likely to differ in their background variables, which are usually space-specific time-invariant 1484 Rosa Bernardini Papalia variables that affect the dependent variable, but are difficult to measure or hard to obtain. Omission of these variables leads to bias in the resulting estimates. To overcome these problems, one possibility is to introduce a variable intercept representing the effect of the omitted variables that are peculiar to each spatial unit considered. More specifically, in the fixed effects model, a dummy variable is introduced for each spatial unit as a measure of the variable intercept, while, in the random effects model, the variable intercept is treated as a random variable that is independently and identically distributed with zero mean. The homogeneity assumptions that are often imposed on the coefficients of the lagged dependent variables can lead to serious biases when in practice the dynamics are heterogeneous across the cross section units. This work aims at developing an entropy- based estimation strategy of an heterogeneous dynamic spatial panel data model. The starting point is a general model specification which account for both temporal and spatial lagged effects in a panel data context. The entropy-based estimation procedure shares some of the characteristics of Stein Rule estimators and Bayesian approaches to estimation (Zellner, 1996, 1997, 1998). There is now a considerable body of work, which has given an application of the entropy criterion to a wide class of models (Marsh et al. 1988, 2003; Golan et al. 1996; Kullback 1959). As regards traditional estimation techniques, the formulation of the constrained maximization problem in the maximum entropy view does not require: (i) the use of restrictive parametric assumptions on the model; (ii) the formulation of hypotheses regarding the form of the distribution of the objective variables. Restrictions expressed in terms of inequality can be introduced and it is possible to calibrate the precision in the estimation. Good results are produced in the case of small-sized samples, in the presence of high numbers of explanatory parameters and variables (highly correlated). The paper is organized as follows. Section 2 introduces the basic model introducing the typical assumptions of an information theoretic framework. More specifically, section 2.2. develops the maximum entropy based formulation for alternative spatial error specifications. Finally, section 3 concludes and lists some potential advantages and investigations of the proposed approach. 2 The information-theoretic framework 2.1 The basic problem Following the heterogeneous panel approach proposed by Pesaran and Smith (1995), we take into account the possibility of cross-sectional correlation by treating individual relationships as a system of seemingly unrelated regression equations. In the context of this work, the general specification of the model for cross section unit i (i=1,..,N) and t=1,..,T time periods which involves specifying a different intercept coefficient for each cross-sectional unit (equation) is: Yit = ρi w1 Yit−1 + Xit βi + εit (1) We start by considering the basic model in vector form for a cross-section unit i: Yi = ρi w1 Yt−1 + Xi βi + εi Modeling mixed spatial processes and spatio-temporal dynamics εi = (I − λi w2 )−1 ui 1485 (2) where Yi is of dimension (Tx1), Xi is (TxKi ) and βi is (Ki x1). The resulting complete model, expressed as a system of seemingly unrelated system of equations, can be written as: −1 Y = ρW1 Y + 2 Xβ3+ (I −2λW2 ) U y1 X1 0 .. 6 ... 7 6 0 X2 .. 6 7 6 X≡4 where Y ≡ 4 .. .. ... 5 0 0 .. yN 2 β1 3 2 ρ1 3 2 u1 3 0 0 7 7 .. 5 XN 3 2 λ1 3 (3) 6 ... 7 6 ... 7 6 ... 7 6 ... 7 7 6 7 6 7 6 7 β≡6 4 ... 5 ρ ≡ 4 ... 5 U ≡ 4 ... 5 λ ≡ 4 ... 5 βN ρN uN λN where Y is a vector of dimension (NTx1), X is a block diagonal matrix of dimension (NTxK) with K=ΣKi , and β=(β1 , β2 ,.. βN )’, λ=( λ1 , λ2 ,.., λN )’ ρ=( ρ1 , ρ2 ,.., ρN )’ are unknown vectors of dimension (NKx1) (Nx1) and (Nx1), respectively. This model specification assumes that spatial effects are not identical across spatial units; by varying spatial residual correlation coefficients across units it takes into account jointly structural instability and differentiated spatial effects within and between spatial units. W1 is a (TNxTN) block diagonal matrix which expresses for each observation (row) those units (columns) that belong to its neighbourhood set as non-zero elements, that is: for pairs of units (i,j), wij 6=0 for ‘neighbours’ and wij =0 for others. It is common practice to derive spatial weights from the location and spatial arrangements of observation by means of a geographic information system. In this case, units are defined ‘neighbours’ when they are within a given distance of each other, ie wij =1 for dij 6 δ and i6=j, where dij is the great circle distance chosen, and δ is the critical cut-off value. More specifically, a spatial weights matrix W* is defined as follow: ∗ wij = 8 < 0 if i=j 1 if dij 6 δ, i 6= j . : 0 if dij > δ, i 6= j (4) and the elements of the row-standardized spatial weights matrix W (with elements of a row sum to one) result: wij = ∗ wij N P j=1 , i, j = 1, .., N. (5) ∗ wij W2 is the (TNxTN) block diagonal matrix with T copies of the (NxN) spatial weight matrix, and U is a (TNx1) vector of iid errors with variance σu2 . The specification in the form of a nonlinear simultaneous equation model result: Y = (ρW1 + λW2 − (λW2 )(ρW1 )) Y + (I − λW2 ) Xβ + u (6) Following Theil (1971) and Zellner (1998), from the joint spatial model (6) and from substitution of the reduced form equation: 1486 Rosa Bernardini Papalia Y = Sπ + τ (7) for Y, the resulting joint spatial model is derived: Y = (ρw1 + λw2 − (λw2 )(ρw1 )) (Sπ) + (I − λw2 ) Xβ + u∗ (8) where u* is a (NTx1) vector of appropriately transformed residuals, S is a (NxL) matrix of instrumental variables, π is a (Lx1) vector of unknown parameters, and τ is a (Nx1) vector of reduced form residuals. The basic generalized maximum entropy (GME) formulation we develop refer to a model specification where a constant spatial structure for error variance in each equation is assumed so that the disturbance vector is assumed to have a zero mean and a non-diagonal covariance matrix: Φ = Σ ⊗ IT (9) where Σ is an (NxN) positive definite symmetric matrix,⊗is the Kronecker product operator and IT is an identity matrix of dimension T. The overall variance-covariance matrix takes the form: Ψ = σu2 A′ A −1 (10) , where: ε = A−1 u, A = (IN − λW2 ) . (11) Assuming different error variances for error variance in each equation we have: ′ E uu 2 3 σu2 1 I1 .. 0 .. 5 . = Ω = 4 .. .. 0 .. σu2 N IN (12) Under the GME framework we recover simultaneously the unknown parameters,the unknown errors by defining an inverse problem, which is based only on indirect, partial or incomplete information. The objective is to recover probability distributions for unknown parameters and errors. Each parameter is treated as a discrete random variable with a compact support Z and M possible outcomes, 2 6M 6 ∞. The uncertainty about the outcome of the error process is represented by treating each error as a finite and discrete random variable with Jpossible outcomes, 26J6 ∞. To this end, we start by choosing a set of discrete points, the support space V =[v1 , v2 ,. . .,v J ]′ of dimension J>2, that are at uniform intervals and symmetric around zero. Each error term has corresponding unknown P weightsri =[ri1 , ri2 . . .,r iJ ]′ that have the properties of probabilities 06r ij 61 and j rij = 1. Re-parameterizing the set of equations (7) and (8), so that β=Zp β and π=Zp π , ρ=Zp ρ ,λ =Zp λ , u*=Vr u∗ and τ =Vr τ yields: λ λ λ λ ρ ρ π π Y = (Z ρ pρ ) w1+ Z p w2 − Z p w2 (Z p ) w1 (S (Z p )) + I − Z λ pλ w2 X Z β pβ + (Z u∗ pu∗ ) (13) Y = S (Z π pπ ) + (Z τ pτ ) (14) β π ρ λ u∗ τ where p=vec(p p p p ) and r=vec(p p ) are (KM+LM+2M)x1 and (2NJ)x1 vectors of unknown parameters and errors, respectively. Modeling mixed spatial processes and spatio-temporal dynamics 1487 2.2 Estimation issues Given the data consistency (13) and (14), and the covariance’s relationship (9) the GME objective function relative to our formulation problem may be formulated as: max H(p, r) = −p′ ln p − r ′ ln r (15) pi ,ri subject to: i) data consistency conditions: λ λ λ λ w2 (Z ρ pρ ) w1 (S (Z π pπ )) Y = (Z ρ pρ ) w1+ Z 2 − Z p p w λ λ β β u∗ u∗ + I − Z p w2 X Z p + (Z p ) (16) Y = S (Z π pπ ) + (Z τ pτ ) (17) ii) adding-up constraints: 1′ pβk = 1 ∀k; 1′ pπk = 1 ∀k; 1′ pρ = 1 ; 1′ pλ = 1; ′ τ 1′ pu∗ i = 1 ∀i; 1 pi = 1 ∀i; (18) where p=vec(p β pπ pρ pλ ) and r=vec(p u∗ pτ ) are (KM+LM+2M)x1 and (2NJ)x1 vectors of unknown parameters and errors, respectively. If the correlation between time periods varies with time, the error covariance that results is more general than the block diagonal structure we have considered. In such a cases it is possible to introduce different covariances for each period by adding a new set of constraints in the optimization problem. It is important to point out some advantages of our GME model specification. First, by assuming heterogeneity of parameters across units it is possible to analyze the spatial patterns in the estimated coefficients. Second, by using the GME procedure it is possible to derive an estimator even if the number of time periods involved, T, is less than the number of observations N and the corresponding variance-covariance matrix for errors is singular. 3 Remarks and conclusions In this paper we derived a GME estimation procedure for heterogeneous dynamic spatial panel data model. We generalized the estimator proposed by Marsh, Mittelhammer and Cardell (1988) to include first spatial autoregressive processes in either the dependent variable and residuals by taking into account jointly structural instability and differentiated spatial effects between and within spatial cross sectional units. Our model specification allows for much more flexibility of the coefficients across units than do traditional models and also contributes: (i) to avoid the simultaneity biases that arise from constraining the coefficients on the lagged dependent variable to be constant across units, and (ii) to provide a diagnostic tool to investigate the presence of some type of heterogeneity in panel data sets. Monte Carlo experiments were conducted for small and medium sized samples and estimators compared using the mean squared error loss of the regression coefficients. 1488 Rosa Bernardini Papalia Several advantages of the GME estimation procedure can be pointed out. The maximum entropy-based estimator is most efficient relative to traditional estimators in particular when data constraints for each observation are included in the maximum entropy-based problem formulations. This procedure is able to produce consistent estimates in models where the number of parameters exceeds the number of data points and in models characterized by a non-scalar identity covariance matrix. Prior information can be introduced by adding suitable constraints in the formulation without imposing strong distributional assumptions. Further investigation of GME estimators for dynamic spatial panel data models could yield useful results (i) to applied analyses of socio and economic phenomena, and (ii) to spacetime problems under complex and nonrandom sample designs. References [Cha84] Chamberlain G (1984) Panel Data. In: Griliches S, Intriligator MD (eds) Handbook of Econometrics. North-Holland, Amsterdam, pp 1247-1318 [Elh01] Elhorst JP (2001) Dynamic Models in Space and Time. Geographical Analysis 33: 119-140 [GJM96] Golan A, Judge G, Miller D (1996) Maximum entropy econometrics: robust estimation with limited data. Wiley New York [Gri96] Griffith DA (1996) Computational simplifications for Space-Time Forecasting within GIS: The Neighbourhood Spatial Forecasting Models. In: Longley P, Batty M (eds) Spatial Analysis: Modelling in a GIS Environment. Cambridge [Hep78] Hepple LW (1978) The Econometric Specification and Estimation of Spatio-Temporal Models. In: Carlstein T, Parkes D, Carlstein NT, Parkes D (eds) Thrift Time and Regional Dynamics. London, Edward Arnold [Hsi89] Hsiao C (1989) Modeling Ontario Regional Electricity System Demand Using a Mixed Fixed and Random Coefficient Approach. Regional Science and Urban Economics 19 [KR92] Keane M, Runkle D (1992) On the Estimation of Panel Data with Serial Correlation when Instruments are not Strictly Exogenous. Journal of Business and Economic Statistics 10(1): 1-9 [Kul59] Kullback (1959) Information theory and statistics. Wiley New York. [LZ86] Liang KY, Zeger SL (1986) Longitudinal Data Analysis using Generalised Linear Models, Biometrika 73(1): 13-22 [MM03] Marsh L, Mittlehammer RC (2003) Generalized maximum entropy estimation of spatial autoregressive models. Working paper, Kansan State University [MMC88] Marsh L, Mittlehammer RC, Cardell S (1988) A Structural-Equation GME Estimator. Selected Paper for the AAEA Annual Meeting, 1998, Salt Lake City [Mun78] Mundlak Y (1978) On the Pooling of Time Series and Cross-Section Data. Econometrica 36: 69-85 [PS95] Pesaran MH, Smith R (1995) Estimating long-run relationships from dynamic heterogeneous panels. Journal of Econometrics 68: 79-113 [The71] Theil H (1971) Principles of Econometrics. John Wiley &Sons New York. Modeling mixed spatial processes and spatio-temporal dynamics [Zel96] [Zel97] [Zel98] 1489 Zellner A (1996) Models, prior information, and Bayesian Analysis. Journal of Econometrics 75: 51-68 Zellner A (1997) A Bayesian Method of Moments (BMOM): Theory and Applications. In: Formby TB, Hill RC (eds) Advances in Econometrics. Applying Maximum Entropy to Econometric Problems. JAI Press, Grenwich, pp 85-105 Zellner A (1998) The finite sample properties of simultaneous equations’estimates and estimators Bayesian and Non-Bayesian Approaches. Journal of Econometrics 83: 185-212