Introduction to Spatial Econometrics using R

Tse-Chuan Yang, Ph.D.
The Geographic Information Analysis Core
Population Research Institute
Social Science Research Institute
Pennsylvania State University
March, 2013
Overview
 What are spatial data and analysis?
 Why is a spatial perspective important?
 Exploratory spatial analysis
 Explanatory spatial analysis
 Demonstration using R
 Conclusions and caveats
Goals
 To recognize why spatial analysis is needed when using
ecological data
 To understand the fundamentals of spatial
econometrics modeling
 To facilitate the use of R in your future work
Why Does Space Matter?
 Arguably, everything on earth can be spatially
referenced, and individuals’ daily lives are shaped by
spatial factors.
 The dynamics between individuals and their environment
(space) are drawing increasing attention in the social sciences.
 Demography is inherently a spatial social science
(Voss, 2007).
 Social data are special because of dependence across
space.
Types of Spatial Data
 Shapefiles
 Point
 Line
 Polygon
What is Spatial Analysis?
 Spatial analysis can be broadly divided into two types
(Weeks, 2004):
 Analysis that puts people into place
 Analysis concerned with the associations among observations
 Hierarchical modeling approach (hierarchical data
structure)
 Spatial econometrics approach (flat data structure)
Why is a Spatial Perspective Important?
 Spatial homogeneity (dependence) and
heterogeneity may bias the estimates in the
traditional analysis approach (Voss et al., 2006).
 Using a spatial perspective enhances the
understanding of how neighbors matter.
 A spatial perspective better reflects the real world as
people are not confined by administrative
boundaries.
How Do We Analyze Spatial Data?
 Exploratory spatial data analysis (ESDA):
 Visualizing key variables
 Testing spatial dependence to gain statistical evidence
 Spatial clustering patterns
 Explanatory spatial data analysis (spatial
econometrics approach):
 Spatial lag model (endogenous interaction relationships)
 Spatial error model (correlated relationships)
 Generalized spatial model (considering both spatial lag and
spatial error)
ESDA: Visualization
 Visualization is a fundamental part of ESDA and
provides a basic understanding of the data.
ESDA: Testing Spatial Dependence
 The goal is to find statistical evidence that supports
what visual inspection suggests:
 Global measures (across the entire research region):
Moran’s I; Geary’s C; Getis-Ord G statistic
 Local measures (specific to each observation):
Local Indicators of Spatial Association (LISA)
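The global and local tests listed above can be sketched in R with the spdep package. This is only an illustration under stated assumptions: the 10 x 10 grid, the simulated variable x, and the choice of queen contiguity with row-standardized weights are all invented for the example and are not the data used in the talk.

```r
library(spdep)

set.seed(1)

# Assumed toy geography: a 10 x 10 lattice with queen contiguity
nb <- cell2nb(10, 10, type = "queen")
lw <- nb2listw(nb, style = "W")   # row-standardized weights

# Simulated variable with a trend along one grid axis, so nearby
# cells have similar values (positive spatial autocorrelation)
x <- rep(1:10, each = 10) + rnorm(100)

# Global measures: one statistic for the whole region
moran_res <- moran.test(x, lw)
geary_res <- geary.test(x, lw)
print(moran_res)
print(geary_res)

# Local measure (LISA): one local Moran's I per observation
lisa <- localmoran(x, lw)
head(lisa)
```

With positive spatial autocorrelation, Moran's I lies above its expectation and Geary's C falls below 1; the rows of lisa can then be mapped to locate where the clustering occurs.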
Spatial Econometrics
 Spatial dependence and heterogeneity often, if not
always, violate the statistical assumptions used in the
traditional analysis approach (LeSage and Pace,
2009):
 Independence
 Constancy
 Spatial econometrics is arguably the most common
approach to handling spatial dependence and, to some
extent, spatial heterogeneity.
Spatial Structure
 The spatial weight matrix is specified a priori.
 Spatial contiguity approach (polygon)
 Distance-based approach (point)
 K-nearest neighbor approach (point)
 There is no agreement on which approach is the most
appropriate. The choice is made, often arbitrarily, by
researchers (Leenders, 2002; Beck et al., 2006).
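For point data, the distance-based and k-nearest neighbor approaches might be set up as in the sketch below. The random coordinates, the value k = 4, and the 0.3 distance cutoff are assumptions chosen only to make the example run; they carry no substantive meaning.

```r
library(spdep)

set.seed(42)

# Assumed toy data: 25 random point locations in the unit square
coords <- cbind(runif(25), runif(25))

# K-nearest neighbor approach: each point gets exactly k neighbors
knn_nb <- knn2nb(knearneigh(coords, k = 4))

# Distance-based approach: neighbors are all points within 0.3 units
dist_nb <- dnearneigh(coords, d1 = 0, d2 = 0.3)

# Number of neighbors per point under each definition
table(card(knn_nb))
table(card(dist_nb))
```

The k-nearest neighbor definition fixes the number of neighbors but not their distance, while the distance-based definition does the opposite, which is one reason results can differ across weight specifications.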
Spatial Weight Matrix (Contiguity)
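 Rook’s spatial weight matrix (neighbors j share an edge with i):
    j
  j i j
    j
 Queen’s spatial weight matrix (neighbors j share an edge or a corner with i):
  j j j
  j i j
  j j j
 Second-order neighbors (Rook’s case; k are the neighbors of i’s neighbors j):
      k
    k j k
  k j i j k
    k j k
      k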
 Spatial weight matrices can be quite messy in practice.
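The contiguity definitions on this slide can be checked on a regular grid. The sketch below assumes a 5 x 5 lattice built with spdep's cell2nb(), where cell 13 is the center cell i; real polygon data would normally go through poly2nb() instead.

```r
library(spdep)

# Assumed toy geography: a 5 x 5 lattice; cell 13 is the center cell i
rook_nb  <- cell2nb(5, 5, type = "rook")
queen_nb <- cell2nb(5, 5, type = "queen")

center <- 13
length(rook_nb[[center]])    # 4 rook neighbors j
length(queen_nb[[center]])   # 8 queen neighbors j

# Higher-order neighbors: element 2 holds the second-order neighbors k
rook_lags <- nblag(rook_nb, maxlag = 2)
length(rook_lags[[2]][[center]])   # 8 second-order neighbors k

# Row-standardized weight matrix: each row sums to 1
W <- nb2mat(rook_nb, style = "W")
rowSums(W)[center]
```

Cells on the edge of the grid have fewer neighbors than the center cell, a small-scale version of why weight matrices built from irregular real-world boundaries get messy.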
Spatial Regression Models
 Spatial lag model (how the dependent variable is
related across spatial units):
$M = \rho W M + X\beta + \varepsilon$
$\varepsilon \sim N(0, \sigma^2 I)$
 Spatial error model (the impact of unknown factors in
the spatial structure):
$M = X\beta + u$
$u = \lambda W u + \varepsilon$
$\varepsilon \sim N(0, \sigma^2 I)$
 Generalized spatial model (combining both lag and error):
$M = \rho W_1 M + X\beta + u$
$u = \lambda W_2 u + \varepsilon$
$\varepsilon \sim N(0, \sigma^2 I)$
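One way to get a feel for these specifications is to simulate data from a spatial lag process and fit all three models. The sketch below uses the spatialreg package; the 20 x 20 grid, the spatial lag coefficient of 0.6, and the regression coefficients (1, 2) are assumptions for illustration, and y plays the role of the dependent variable M. Here sacsarlm() is called with a single weight matrix, so W1 and W2 are the same.

```r
library(spdep)
library(spatialreg)

set.seed(123)

# Assumed toy geography: 20 x 20 lattice, rook contiguity, row-standardized W
nb <- cell2nb(20, 20, type = "rook")
lw <- nb2listw(nb, style = "W")
W  <- listw2mat(lw)
n  <- nrow(W)

# Simulate a spatial lag process with assumed rho = 0.6 and beta = (1, 2):
# y = rho W y + X beta + e  implies  y = (I - rho W)^{-1} (X beta + e)
rho <- 0.6
x   <- rnorm(n)
e   <- rnorm(n)
y   <- as.numeric(solve(diag(n) - rho * W, 1 + 2 * x + e))
dat <- data.frame(y = y, x = x)

# Fit the three specifications by maximum likelihood
fit_lag <- lagsarlm(y ~ x, data = dat, listw = lw)    # spatial lag
fit_err <- errorsarlm(y ~ x, data = dat, listw = lw)  # spatial error
fit_sac <- sacsarlm(y ~ x, data = dat, listw = lw)    # lag and error

fit_lag$rho      # estimated spatial lag coefficient
fit_err$lambda   # estimated spatial error coefficient
summary(fit_sac)
```

Because the data were generated from a lag process, the lag model's estimate should land near the simulated value, while the error model attributes the same dependence to the disturbance term.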
Demonstration
 Using R to Analyze Mortality Data
 County-level mortality data (1998-2002)
 Independent variables drawn from 2000 Census
 Tasks:
 Load necessary R packages
 Read the shapefile containing data
 Visualize the dependent variable and save it as a figure
 Generate a spatial weight matrix using the shapefile
 Test spatial dependence (both global and local)
 Examine whether a spatial perspective is better
 Implement spatial econometrics models
 Conduct model comparisons
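The task list above might look roughly like the following in current R packages. The county mortality shapefile from the talk is not reproduced here, so this sketch substitutes the Columbus neighborhood shapefile shipped with the spData package, with CRIME as the dependent variable and INC and HOVAL as covariates; the output file name and the use of AIC for model comparison are likewise assumptions.

```r
library(sf)          # reading shapefiles
library(spdep)       # weights and spatial dependence tests
library(spatialreg)  # spatial regression models

# Stand-in data: the Columbus shapefile shipped with the spData package
shp <- st_read(system.file("shapes/columbus.shp", package = "spData"),
               quiet = TRUE)
dat <- st_drop_geometry(shp)

# Visualize the dependent variable and save the map as a figure
fig <- file.path(tempdir(), "crime_map.png")
png(fig, width = 800, height = 600)
plot(shp["CRIME"], main = "CRIME")
dev.off()

# Queen contiguity weights built from the polygons
nb <- poly2nb(shp, queen = TRUE)
lw <- nb2listw(nb, style = "W")

# Global and local tests of spatial dependence
print(moran.test(dat$CRIME, lw))
lisa <- localmoran(dat$CRIME, lw)

# OLS baseline, then test its residuals for spatial dependence
ols <- lm(CRIME ~ INC + HOVAL, data = dat)
print(lm.morantest(ols, lw))

# Spatial models and a simple AIC comparison (lower is better)
lag_fit <- lagsarlm(CRIME ~ INC + HOVAL, data = dat, listw = lw)
err_fit <- errorsarlm(CRIME ~ INC + HOVAL, data = dat, listw = lw)
aic <- c(ols = AIC(ols), lag = AIC(lag_fit), error = AIC(err_fit))
print(aic)
```

Significant residual autocorrelation in the OLS model is the usual signal that a spatial specification is worth fitting; the AIC values then give one rough basis for choosing among the candidates.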
Caveats
 Modifiable areal unit problem (Openshaw and
Taylor, 1979)
 The choice of spatial weight matrix
 The link between spatial modeling and social
theories
 A lot more!
Conclusions
 Spatial modeling should become the “conventional
analysis approach” when dealing with ecological
data.
 Spatial econometrics has paid relatively little
attention to generalized linear modeling (non-continuous outcomes).
 Spatial econometrics largely deals with cross-sectional
data, though the methodological
framework for spatial panel data is available.
 R is good at statistical analysis, but for visualization,
other GIS programs may be better.