Introduction to Applied Spatial Econometrics

Attila Varga
DIMETIC Pécs, July 3, 2009
Prerequisites
• Basic statistics (statistical testing)
• Basic econometrics (Ordinary Least
Squares and Maximum Likelihood
estimations, autocorrelation)
[Figure: EU patent applications, 2002]
Outline
• Introduction
• The nature of spatial data
• Modelling space
• Exploratory spatial data analysis
• Spatial Econometrics: the Spatial Lag and
Spatial Error models
• Specification diagnostics
• New developments in Spatial Econometrics
• Software options
Spatial Econometrics
“A collection of techniques that deal with
the peculiarities caused by space in the
statistical analysis of regional science
models”
Luc Anselin (1988)
Increasing attention
towards Spatial Econometrics in Economics
• Growing interest in agglomeration
economies/spillovers – (Geographical
Economics)
• Diffusion of GIS technology and increased
availability of geo-coded data
The nature of spatial data
• Data representation: time series (“time
line”) vs. spatial data (map)
• Spatial effects:
spatial heterogeneity
spatial dependence
Spatial heterogeneity
• Structural instability in the forms of:
– Non-constant error variances (spatial
heteroscedasticity)
– Non-constant coefficients (variable
coefficients, spatial regimes)
Spatial dependence
(spatial autocorrelation/spatial association)
• In spatial datasets “dependence is present
in all directions and becomes weaker as
data locations become more and more
dispersed” (Cressie, 1993)
• Tobler’s ‘First Law of Geography’:
“Everything is related to everything else,
but near things are more related than
distant things.” (Tobler, 1979)
Spatial dependence
(spatial autocorrelation/spatial association)
• Positive spatial autocorrelation: high or low
values of a variable cluster in space
• Negative spatial autocorrelation: locations
are surrounded by neighbors with very
dissimilar values of the same variable
[Figure: EU patent applications, 2002]
Spatial dependence
(spatial autocorrelation/spatial association)
• Dependence in time and dependence in
space:
– Time: one-directional between two
observations
– Space: two-directional among several
observations
Spatial dependence
(spatial autocorrelation/spatial association)
• Two main reasons:
– Measurement error (data aggregation)
– Spatial interaction between spatial units
Modelling space
• Spatial heterogeneity: conventional non-spatial models (random coefficients, error
component models etc.) are suitable
• Spatial dependence: need for a non-conventional approach
Modelling space
• Spatial dependence modelling requires an
appropriate representation of spatial
arrangement
• Solution: relative spatial positions are
represented by spatial weights matrices
(W)
Modelling space
1. Binary contiguity weights matrices
- spatial units as neighbors in different orders
(first, second etc. neighborhood classes)
- neighbors:
- having a common border,
or
- being situated within a
given distance band
2. Inverse distance weights matrices
Modelling space
• Binary contiguity matrices (rook, queen)
\[
W = \begin{pmatrix}
0 & 1 & 1 & 0 \\
1 & 0 & 1 & 0 \\
1 & 1 & 0 & 1 \\
0 & 0 & 1 & 0
\end{pmatrix}
\]
• $w_{i,j} = 1$ if $i$ and $j$ are neighbors, 0 otherwise
• Neighborhood classes (first, second, etc)
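A binary contiguity matrix can be assembled from neighbor lists. The sketch below is illustrative only: the neighbor lists are hypothetical and were chosen so that the result matches the 4 by 4 example matrix, with units indexed 0 to 3. Second-order neighbors are taken to be units reachable in two steps that are not already first-order neighbors.

```python
import numpy as np

def contiguity_matrix(neighbors, n):
    """Build a symmetric binary contiguity matrix from {unit: [neighboring units]}."""
    W = np.zeros((n, n), dtype=int)
    for i, js in neighbors.items():
        for j in js:
            W[i, j] = 1
            W[j, i] = 1  # contiguity is symmetric
    return W

# Hypothetical neighbor lists (each pair listed once).
neighbors = {0: [1, 2], 1: [2], 2: [3]}
n = 4
W = contiguity_matrix(neighbors, n)

# Second-order class: reachable in two steps, not a first-order neighbor, not the unit itself.
paths2 = (W @ W) > 0
W2 = (paths2 & (W == 0) & ~np.eye(n, dtype=bool)).astype(int)

print(W)
print(W2)
```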
Modelling space
• Inverse distance weights matrices
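\[
W = \begin{pmatrix}
0 & \frac{1}{(d_{1,2})^2} & \frac{1}{(d_{1,3})^2} & \frac{1}{(d_{1,4})^2} \\
\frac{1}{(d_{2,1})^2} & 0 & \frac{1}{(d_{2,3})^2} & \frac{1}{(d_{2,4})^2} \\
\frac{1}{(d_{3,1})^2} & \frac{1}{(d_{3,2})^2} & 0 & \frac{1}{(d_{3,4})^2} \\
\frac{1}{(d_{4,1})^2} & \frac{1}{(d_{4,2})^2} & \frac{1}{(d_{4,3})^2} & 0
\end{pmatrix}
\]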
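As a minimal sketch, inverse distance squared weights can be computed from point coordinates. The coordinates below are made up for illustration, Euclidean distance is an assumption, and the function assumes that no two units share the same location (otherwise a distance of zero would be inverted).

```python
import numpy as np

def inverse_distance_weights(coords, power=2.0):
    """Return W with w_ij = 1 / d_ij**power off the diagonal and 0 on it."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=2))           # pairwise Euclidean distances
    W = np.zeros_like(d)
    off_diag = ~np.eye(len(coords), dtype=bool)
    W[off_diag] = 1.0 / d[off_diag] ** power       # diagonal stays zero
    return W

# Hypothetical coordinates of four units.
coords = [(0, 0), (3, 4), (6, 8), (0, 10)]
W = inverse_distance_weights(coords)
print(W.round(4))
```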
Modelling space
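• Row-standardization: each weight is divided by the sum of its row, $w^{*}_{ij} = w_{ij} / \sum_j w_{ij}$, so that every row sums to one
• Row-standardized spatial weights matrices:
- easier interpretation of results
(averaging of values)
- ML estimation (computation)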
Modelling space
• The spatial lag operator: Wy
– is a spatially lagged value of the variable y
– In case of a row-standardized W, Wy is the
average value of the variable:
• in the neighborhood (contiguity weights)
• in the whole sample with the weight decreasing
with increasing distance (inverse distance weights)
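The averaging interpretation of Wy can be checked on a small example. This is a sketch with an assumed 4 by 4 contiguity matrix and made-up values of y; units without neighbors are given an all-zero row, which is one possible convention among several.

```python
import numpy as np

def row_standardize(W):
    """Divide each row of W by its row sum; rows without neighbors stay zero."""
    W = np.asarray(W, dtype=float)
    row_sums = W.sum(axis=1, keepdims=True)
    return np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)

W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
y = np.array([10.0, 20.0, 30.0, 40.0])   # illustrative values

Ws = row_standardize(W)
Wy = Ws @ y                               # spatial lag: average of y over each unit's neighbors
print(Wy)
```

For the first unit, whose neighbors hold the values 20 and 30, the spatial lag is their average, 25.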
Exploratory spatial data analysis
• Measuring global spatial association:
– Moran’s I statistic:
a) \[ I = \frac{N}{S_0} \cdot \frac{\sum_{i,j} w_{ij} (x_i - m)(x_j - m)}{\sum_i (x_i - m)^2} \]
normalizing factor: $S_0 = \sum_{i,j} w_{ij}$
(w is not row standardized)
b) \[ I^{*} = \frac{\sum_{i,j} w_{ij} (x_i - m)(x_j - m)}{\sum_i (x_i - m)^2} \]
(w is row standardized)
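A minimal sketch of the computation, taking m to be the sample mean of x. The weights matrix and the data are illustrative assumptions. With a row-standardized matrix S0 equals N, so the general formula reduces to version b).

```python
import numpy as np

def morans_i(x, W):
    """Moran's I = (N / S0) * z'Wz / z'z, with z the deviations from the mean."""
    x = np.asarray(x, dtype=float)
    W = np.asarray(W, dtype=float)
    z = x - x.mean()
    return (len(x) / W.sum()) * (z @ W @ z) / (z @ z)

W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
x = np.array([1.0, 2.0, 3.0, 4.0])            # illustrative values

I = morans_i(x, W)                             # binary weights, version a)
Ws = W / W.sum(axis=1, keepdims=True)          # row-standardized weights
I_star = morans_i(x, Ws)                       # N / S0 = 1, version b)
print(I, I_star)
```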
Global spatial association
• Basic principle behind all global measures:
- The Gamma index
\[ \Gamma = \sum_{i,j} w_{ij} c_{ij} \]
– Neighborhood patterns and value similarity
patterns compared
Global spatial association
• Significance of global clustering: test
statistic compared with values under H0 of
no spatial autocorrelation
- normality assumption
- permutation approach
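The permutation approach can be sketched as follows: the observed values are reshuffled over the locations many times, Moran's I is recomputed each time, and the observed statistic is compared with this reference distribution. The chain-shaped neighborhood structure, the data, the number of permutations, and the one-sided pseudo p-value are all assumptions made for illustration.

```python
import numpy as np

def morans_i(x, W):
    z = x - x.mean()
    return (len(x) / W.sum()) * (z @ W @ z) / (z @ z)

def moran_permutation_test(x, W, n_perm=999, seed=0):
    """Return observed Moran's I and a one-sided pseudo p-value (positive autocorrelation)."""
    rng = np.random.default_rng(seed)
    observed = morans_i(x, W)
    sims = np.array([morans_i(rng.permutation(x), W) for _ in range(n_perm)])
    p_value = (np.sum(sims >= observed) + 1) / (n_perm + 1)
    return observed, p_value

# Hypothetical example: 20 units on a line, each neighboring the adjacent units.
n = 20
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
x = np.arange(n, dtype=float)        # smooth trend, so similar values cluster

observed, p_value = moran_permutation_test(x, W)
print(observed, p_value)
```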
Local indicators of spatial
association (LISA)
A. The Moran scatterplot
idea: Moran’s I is a regression coefficient of a
regression of Wz on z when w is row standardized:
I=z’Wz/z’z
(where z is the variable in deviations from the mean)
- regression line: general pattern
- points on the scatterplot: local tendencies
- outliers: extreme to the central tendency (2 sigma
rule)
- leverage points: large influence on the central
tendency (2 sigma rule)
[Figure: Moran scatterplot]
Local indicators of spatial
association (LISA)
B. The Local Moran statistic
\[ I_i = z_i \sum_j w_{ij} z_j \]
– significance tests: randomization approach
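A short sketch of the Local Moran computation, with z in deviations from the mean and a row-standardized W. The quadrant labels (HH, LL, HL, LH) correspond to the four quadrants of the Moran scatterplot; the labels, the weights matrix, and the data are assumptions for illustration, and the significance test is left out.

```python
import numpy as np

def local_moran(x, W):
    """Return I_i = z_i * sum_j w_ij z_j and the Moran scatterplot quadrant of each unit."""
    z = x - x.mean()
    lag = W @ z                                   # spatial lag of the deviations
    I_local = z * lag
    quadrant = np.where(z >= 0,
                        np.where(lag >= 0, "HH", "HL"),
                        np.where(lag >= 0, "LH", "LL"))
    return I_local, quadrant

W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Ws = W / W.sum(axis=1, keepdims=True)             # row-standardized
x = np.array([1.0, 2.0, 3.0, 4.0])                # illustrative values

I_local, quadrant = local_moran(x, Ws)
print(I_local, quadrant)
```

With a row-standardized W, the local statistics sum to z'Wz, so dividing their sum by z'z gives the global Moran's I.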
Spatial Econometrics
• The spatial lag model
• The spatial error model
The spatial lag model
• Lagged values in time: $y_{t-k}$
• Lagged values in space: problem (multi-oriented, two-directional dependence)
– Serious loss of degrees of freedom
• Solution: the spatial lag operator, Wy
The spatial lag model
The general expression for the spatial lag model is
y = Wy + x +,
where y is an N by 1 vector of dependent observations, Wy is an N by 1 vector of lagged
dependent observations,  is a spatial autoregressive parameter, x is an N by K matrix of
exogenous explanatory variables,  is a K by 1 vector of respective coefficients, and  is an N by
1 vector of independent disturbance terms.
The spatial lag model
• Estimation
– Problem: endogeneity of Wy (correlated with
the error term)
– OLS is biased and inconsistent
– Maximum Likelihood (ML)
– Instrumental Variables (IV) estimation
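The endogeneity of Wy follows from the reduced form of the model. The derivation below is a sketch: it writes the spatial autoregressive parameter as $\rho$, the coefficients as $\beta$ and the disturbances as $\varepsilon$ (notation assumed here), and it assumes that $I - \rho W$ is invertible, that x is exogenous, and that the disturbances are independent with common variance $\sigma^2$.

```latex
\begin{aligned}
y &= \rho W y + x\beta + \varepsilon \\
(I - \rho W)\, y &= x\beta + \varepsilon \\
y &= (I - \rho W)^{-1} x\beta + (I - \rho W)^{-1}\varepsilon \\
Wy &= W (I - \rho W)^{-1} x\beta + W (I - \rho W)^{-1}\varepsilon \\
E\left[(Wy)'\varepsilon\right] &= \sigma^2\, \mathrm{tr}\!\left(W (I - \rho W)^{-1}\right)
\end{aligned}
```

The last expression is in general non-zero when $\rho \neq 0$, so the regressor Wy is correlated with the error term and OLS does not estimate $\rho$ consistently.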
The spatial lag model
• ML estimation: The Log-Likelihood
function
L = lnI - W- N/2 ln (2) - N/2 ln (2) - (y - Wy - x)’( y - Wy - x)/2 2
Maximizing the log likelihood with respect to , , and 2 gives the values of parameters
that provide the highest likelihood of the joint occurrence of the sample of dependent variables
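One way to maximize this log-likelihood is to concentrate it: for a given value of the spatial autoregressive parameter (called rho here), the coefficients and the error variance have closed-form solutions, so only rho needs to be searched. The sketch below uses a plain grid search on simulated data. The ring-shaped weights matrix, the sample size, the true parameter values, and the grid are all assumptions for illustration, not the procedure of any particular software package.

```python
import numpy as np

def lag_loglik(rho, y, X, W):
    """Log-likelihood of the spatial lag model at rho, with beta and sigma2 concentrated out."""
    n = len(y)
    Ay = y - rho * (W @ y)                           # (I - rho W) y
    beta = np.linalg.lstsq(X, Ay, rcond=None)[0]     # OLS of the filtered y on X
    e = Ay - X @ beta
    sigma2 = e @ e / n                               # ML estimate of the error variance
    _, logdet = np.linalg.slogdet(np.eye(n) - rho * W)
    ll = (logdet - n / 2 * np.log(2 * np.pi)
          - n / 2 * np.log(sigma2) - (e @ e) / (2 * sigma2))
    return ll, beta, sigma2

# Simulated data (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n, rho_true = 200, 0.6
W = np.zeros((n, n))
for i in range(n):                                   # ring: two neighbors, row-standardized
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = np.linalg.solve(np.eye(n) - rho_true * W,
                    X @ np.array([1.0, 2.0]) + rng.normal(size=n))

grid = np.linspace(-0.95, 0.95, 191)
rho_hat = max(grid, key=lambda r: lag_loglik(r, y, X, W)[0])
ll_hat, beta_hat, sigma2_hat = lag_loglik(rho_hat, y, X, W)
print(rho_hat, beta_hat, sigma2_hat)
```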
The Spatial Lag model
• IV estimation (2SLS)
– Suggested instruments: spatially lagged
exogenous variables
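A minimal two-stage least squares sketch with spatially lagged exogenous variables as instruments. The data are simulated and every value is an illustrative assumption. The spatially lagged constant is dropped from the instrument set because, with a row-standardized W, it coincides with the constant itself.

```python
import numpy as np

def spatial_2sls(y, X, W):
    """2SLS for y = rho*Wy + X beta + e; X must hold a constant in column 0."""
    Wy = W @ y
    Z = np.column_stack([Wy, X])                       # regressors: spatial lag plus X
    H = np.column_stack([X, W @ X[:, 1:]])             # instruments: X and WX
    Z_hat = H @ np.linalg.lstsq(H, Z, rcond=None)[0]   # first stage: fitted regressors
    return np.linalg.lstsq(Z_hat, y, rcond=None)[0]    # second stage: [rho, beta_0, beta_1, ...]

# Simulated data (illustrative assumptions).
rng = np.random.default_rng(1)
n, rho_true = 400, 0.6
W = np.zeros((n, n))
for i in range(n):                                     # ring: two neighbors, row-standardized
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = np.linalg.solve(np.eye(n) - rho_true * W,
                    X @ np.array([1.0, 2.0]) + rng.normal(size=n))

coef = spatial_2sls(y, X, W)
print(coef)
```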
The Spatial Error model
y = x + 
with
 = W  + ,
where  is the coefficient of spatially lagged autoregressive errors, W. Errors in  are
independently distributed.
The Spatial Error model
• OLS: unbiased but inefficient
• ML estimation
The log-likelihood function for the regression with spatially autocorrelated error term is
\[
L = \ln|I - \lambda W| - \frac{N}{2}\ln(2\pi) - \frac{N}{2}\ln(\sigma^2) - \frac{(y - x\beta)'(I - \lambda W)'(I - \lambda W)(y - x\beta)}{2\sigma^2}
\]
Specification tests
Test | Formulation | Distribution | Source
MORAN | $e'We / e'e$ | $N(0,1)$ | Cliff and Ord (1981)
LM-ERR | $(e'We/s^2)^2 / T$ | $\chi^2(1)$ | Burridge (1980)
LM-ERRLAG | $(e'We/s^2)^2 / [\mathrm{tr}(W'W + W^2) - \mathrm{tr}(W'W + W^2) A^{-1} \mathrm{var}(\rho)]$ | $\chi^2(1)$ | Anselin (1988/B)
LM-LAG | $(e'Wy/s^2)^2 / RJ_{\rho-\beta}$ | $\chi^2(1)$ | Anselin (1988/B)
LM-LAGERR | $(e'B'BWy)^2 / (H_{\rho} - H_{\theta\rho} \mathrm{Var}(\theta) H'_{\theta\rho})$ | $\chi^2(1)$ | Anselin et al. (1996)
Steps in estimation
• Estimate OLS
• Study the LM Error and LM Lag statistics,
ideally with more than one spatial weights
matrix
• The most significant statistic guides you to
the right model
• Run the right model (S-Err or S-Lag)
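The first two steps can be sketched as follows: run OLS, then compute the LM Error and LM Lag statistics from the OLS residuals. The sketch takes $T = \mathrm{tr}(W'W + W^2)$ and uses the standard form of the LM Lag denominator, $T + (WXb)'M(WXb)/s^2$ with $M = I - X(X'X)^{-1}X'$; this expression for the denominator is not spelled out in the slides and is stated here as an assumption. The data are simulated from a spatial lag process, and all values are illustrative.

```python
import numpy as np

def lm_tests(y, X, W):
    """Return (LM-Err, LM-Lag) computed from OLS residuals."""
    n = len(y)
    b = np.linalg.lstsq(X, y, rcond=None)[0]         # OLS coefficients
    e = y - X @ b                                    # OLS residuals
    s2 = e @ e / n
    T = np.trace(W.T @ W + W @ W)
    lm_err = (e @ W @ e / s2) ** 2 / T
    WXb = W @ X @ b
    M_WXb = WXb - X @ np.linalg.lstsq(X, WXb, rcond=None)[0]   # M (WXb)
    RJ = T + (WXb @ M_WXb) / s2
    lm_lag = (e @ (W @ y) / s2) ** 2 / RJ
    return lm_err, lm_lag

# Simulated spatial lag data (illustrative assumptions).
rng = np.random.default_rng(2)
n, rho_true = 200, 0.6
W = np.zeros((n, n))
for i in range(n):                                   # ring: two neighbors, row-standardized
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = np.linalg.solve(np.eye(n) - rho_true * W,
                    X @ np.array([1.0, 2.0]) + rng.normal(size=n))

lm_err, lm_lag = lm_tests(y, X, W)
print(lm_err, lm_lag)    # compare each with the chi-square(1) critical value 3.84 (p=0.05)
```

Repeating the call with several weights matrices and comparing the two statistics follows the steps listed above.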
Table 6.2. OLS Regression Results for Log (Innovations) at the MSA Level
(N=125, 1982)
Model
Jaffe
ML -Spatial Lag
Spatial
Extended Jaffe
Constant
-1.045
(0.146)
-1.134
(0.172)
-1.407
(0.212)
0.540
(0.054)
-1.098
(0.143)
0.125
(0.055)
0.515
(0.053)
0.112
(0.036)
0.125
(0.035)
0.504
(0.055)
0.001
(0.041)
0.132
(0.036)
0.037
(0.018)
0.599
-65.336
1.899
-62.708
0.611
-62.402
0.277
(0.057)
-0.027
(0.037)
0.093
(0.034)
0.032
(0.015)
0.652
(0.163)
0.332
(0.057)
-0.337
(0.094)
0.202
(0.101)
0.725
-36.683
9.024
37.847
0.936
2.178
1.102
0.102
0.060
0.045
1.026
1.485
0.659
0.450
1.593
0.625
W_Log(INN)
Log(RD)
Log(RD75)
Log(URD)
Log(URD50)
Log(LQ)
Log(BUS)
Log(LARGE)
RANK
R2 - adj
Log-Likelihood
Kiefer-Salmon
White
1.183
B-P
0.243
LM-Err
D50
D75
IDIS2
1.465
2.688
1.691
LM-Lag
D50
D75
IDIS2
5.620
2.968
2.039
LR-Lag
D50
0.000
0.737
0.008
5.256
Notes: Estimated standard errors are in parentheses; critical values for the White statistic with
respectively 5, 20, and 35 degrees of freedom are 11.07, 31.41, and 49.52 (p=0.05); critical
value for the Kiefer-Salmon test on normality and the Breusch-Pagan (B-P) test for
heteroskedasticity is 5.99 (p=0.05); critical values for LM-Err, LM-Lag and LR-Lag statistics
are 3.84 (p=0.05) and 2.71 (p=0.10); spatial weights matrices are row-standardized: D50 is
distance-based contiguity for 50 miles; D75 is distance-based contiguity for 75 miles; and
IDIS2 is inverse distance squared.
Example: Varga (1998)
Spatial econometrics:
New developments
• Estimation: GMM
• Spatial panel models
• Spatial Probit, Logit, Tobit
Study materials
• Introductory:
– Anselin: Spacestat tutorial (included in the
course material)
– Anselin: Geoda user’s guide (included in the
course material)
• Advanced:
– Anselin: Spatial Econometrics, Kluwer 1988
Software options
• GEODA – easiest to access and use
• SpaceStat
• R
• Matlab routines