Introduction to Spatial Econometrics using R

Tse-Chuan Yang, Ph.D.
The Geographic Information Analysis Core
Population Research Institute
Social Science Research Institute
Pennsylvania State University
March, 2013

Overview
- What are spatial data and analysis?
- Why is a spatial perspective important?
- Exploratory spatial analysis
- Explanatory spatial analysis
- Demonstration using R
- Conclusions and caveats

Goals
- To realize why spatial analysis is needed when using ecological data
- To understand the fundamentals of spatial econometric modeling
- To facilitate the use of R in your future work

Why Does Space Matter?
- Arguably, everything on earth can be spatially referenced, and individuals' daily lives are shaped by spatial factors.
- The dynamics between individuals and their environment (space) are drawing increasing attention in social science.
- Demography is inherently a spatial social science (Voss, 2007).
- Social data are special because of dependence across space.

Types of Spatial Data
- Shapefiles
  - Point
  - Line
  - Polygon

What is Spatial Analysis?
Spatial analysis can be generally divided into (Weeks, 2004):
- Analysis that puts people into place
- Analysis concerned with the associations among observations
- Hierarchical modeling approach (hierarchical data structure)
- Spatial econometrics approach (flat data structure)

Why is a Spatial Perspective Important?
- Spatial homogeneity (dependence) and heterogeneity may bias the estimates obtained with traditional analysis approaches (Voss et al., 2006).
- Using a spatial perspective enhances the understanding of how neighbors matter.
- A spatial perspective better reflects the real world, as people are not confined by administrative boundaries.

How Do We Analyze Spatial Data?
- Exploratory spatial data analysis (ESDA):
  - Visualizing key variables
  - Testing spatial dependence to gain statistical evidence
  - Identifying spatial clustering patterns
- Explanatory spatial data analysis (spatial econometrics approach):
  - Spatial lag model (endogenous interaction relationships)
  - Spatial error model (correlated relationships)
  - Generalized spatial model (considering both spatial lag and spatial error)

ESDA: Visualization
- Visualization is the fundamental aspect of ESDA and allows a basic understanding of the data.

ESDA: Testing Spatial Dependence
The goal is to find statistical evidence for what visual inspection suggests:
- Global measures (across the entire research region): Moran's I; Geary's C; Getis-Ord G statistic
- Local measures (specific to each observation): Local Indicator of Spatial Association (LISA)

Spatial Econometrics
- Spatial dependence and heterogeneity often, if not always, violate the statistical assumptions used in the traditional analysis approach (LeSage and Pace, 2009):
  - Independence
  - Constancy
- Spatial econometrics is arguably the most common approach to handling spatial dependence and, to some extent, spatial heterogeneity.

Spatial Structure
- The spatial weight matrix is specified a priori.
  - Spatial contiguity approach (polygon)
  - Distance-based approach (point)
  - K-nearest neighbor approach (point)
- There is no agreement on which approach is the most appropriate. The choice is made arbitrarily by researchers (Leenders, 2002; Beck et al., 2006).

Spatial Weight Matrix (Contiguity)
(i = focal unit, j = first-order neighbor, k = second-order neighbor)

Rook's spatial weight matrix:

      j
    j i j
      j

Queen's spatial weight matrix:

    j j j
    j i j
    j j j

Second-order neighbors (Rook's case):

        k
      k j k
    k j i j k
      k j k
        k

The spatial weight matrix can be quite messy in practice.
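The contiguity definitions and the global Moran's I statistic mentioned above can be sketched in a few lines of base R. This is only an illustration on a regular grid, not the county shapefile used in the demonstration. The function names (grid_weights, row_standardize, second_order, morans_i) are hypothetical helpers written for this sketch, and Moran's I is computed from its usual textbook form, I = (n / S0) * (z'Wz) / (z'z), where z is the mean-centered variable and S0 is the sum of all weights.

```r
# Contiguity weights on a regular grid, using base R only.
# Cells are numbered column by column, as in R's matrix() layout.
grid_weights <- function(n_row, n_col, type = c("rook", "queen")) {
  type <- match.arg(type)
  ri <- rep(seq_len(n_row), times = n_col)  # row index of each cell
  ci <- rep(seq_len(n_col), each = n_row)   # column index of each cell
  dr <- abs(outer(ri, ri, "-"))             # row distance between cells
  dc <- abs(outer(ci, ci, "-"))             # column distance between cells
  if (type == "rook") {
    W <- (dr + dc == 1)                     # neighbors share an edge
  } else {
    W <- (dr <= 1 & dc <= 1 & dr + dc > 0)  # neighbors share an edge or a corner
  }
  W * 1                                     # logical -> 0/1 numeric matrix
}

# Row-standardize so that each row of weights sums to one.
row_standardize <- function(W) W / rowSums(W)

# Second-order neighbors: reachable in two steps, excluding the cell
# itself and its first-order neighbors.
second_order <- function(W) {
  two_step <- (W %*% W) > 0
  diag(two_step) <- FALSE
  (two_step & W == 0) * 1
}

# Global Moran's I: I = (n / S0) * (z'Wz) / (z'z).
morans_i <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * sum(z * (W %*% z)) / sum(z^2)
}

# Neighbor counts for the center cell of a 5 x 5 grid (cell 13).
rook   <- grid_weights(5, 5, "rook")
queen  <- grid_weights(5, 5, "queen")
center <- 13
n_rook   <- sum(rook[center, ])                # 4 first-order rook neighbors
n_queen  <- sum(queen[center, ])               # 8 first-order queen neighbors
n_second <- sum(second_order(rook)[center, ])  # 8 second-order rook neighbors

# Moran's I for two artificial patterns on a 4 x 4 grid.
W4 <- row_standardize(grid_weights(4, 4, "rook"))
ri <- rep(1:4, times = 4)
ci <- rep(1:4, each = 4)
gradient <- ci                # smooth west-to-east trend: positive I
checker  <- (ri + ci) %% 2    # checkerboard: I = -1 under rook contiguity
i_gradient <- morans_i(gradient, W4)
i_checker  <- morans_i(checker, W4)
print(c(gradient = i_gradient, checker = i_checker))
```

With real polygon data the weights would normally be built from the shapefile by a spatial package rather than by hand; the grid version is used here only because it mirrors the rook, queen, and second-order diagrams.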
Spatial Regression Models
- Spatial lag model (how the dependent variable is related across spatial units):

  $M = \rho W M + X\beta + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2 I)$

- Spatial error model (the impact of unknown factors in the spatial structure):

  $M = X\beta + u, \quad u = \lambda W u + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2 I)$

- Generalized spatial model (combining both lag and error):

  $M = \rho W_1 M + X\beta + u, \quad u = \lambda W_2 u + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2 I)$

Demonstration
Using R to Analyze Mortality Data
- County-level mortality data (1998-2002)
- Independent variables drawn from the 2000 Census
- Tasks:
  - Load the necessary R packages
  - Read the shapefile containing the data
  - Visualize the dependent variable and save it as a figure
  - Generate a spatial weight matrix using the shapefile
  - Test spatial dependence (both global and local)
  - Examine whether a spatial perspective is better
  - Implement spatial econometric models
  - Conduct model comparisons

Caveats
- Modifiable areal unit problem (Openshaw and Taylor, 1979)
- The choice of spatial weight matrix
- The link between spatial modeling and social theories
- A lot more!

Conclusions
- Spatial modeling should become the “conventional analysis approach” when dealing with ecological data.
- Spatial econometrics has paid relatively little attention to generalized linear modeling (non-continuous outcomes).
- Spatial econometrics largely deals with cross-sectional data, though the methodological framework for spatial panel data is available.
- R is good at statistical analysis, but for visualization, other GIS programs may be better.
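Supplementary sketch: the spatial lag model
The spatial lag model listed under Spatial Regression Models can be tried on simulated data with base R alone. The sketch below is not the mortality analysis from the demonstration. It assumes a 10 x 10 grid with row-standardized rook weights, made-up parameter values (rho = 0.6, beta = (1, 2)), and a hypothetical helper rook_weights. The lag parameter is estimated by maximizing the standard concentrated log-likelihood, ln|I - rho W| - (n/2) ln(e'e / n), where e = e0 - rho eL and e0, eL are the OLS residuals of M and WM on X. In applied work a dedicated spatial regression package would normally do this step.

```r
set.seed(1)

# Row-standardized rook contiguity weights on an n_side x n_side grid.
rook_weights <- function(n_side) {
  ri <- rep(seq_len(n_side), times = n_side)
  ci <- rep(seq_len(n_side), each = n_side)
  W <- (abs(outer(ri, ri, "-")) + abs(outer(ci, ci, "-")) == 1) * 1
  W / rowSums(W)
}

n_side <- 10
n <- n_side^2
W <- rook_weights(n_side)

# Simulate M = rho * W M + X beta + eps, i.e. M = (I - rho W)^{-1} (X beta + eps).
# The parameter values are arbitrary choices for illustration.
rho  <- 0.6
beta <- c(1, 2)
X    <- cbind(intercept = 1, x1 = rnorm(n))
eps  <- rnorm(n)
M    <- solve(diag(n) - rho * W, X %*% beta + eps)

# Ordinary least squares that ignores the spatial lag, for comparison.
beta_ols <- lm.fit(X, M)$coefficients

# Concentrated log-likelihood of the spatial lag model as a function of rho.
e0 <- lm.fit(X, M)$residuals        # residuals of M on X
eL <- lm.fit(X, W %*% M)$residuals  # residuals of WM on X
loglik <- function(r) {
  e <- e0 - r * eL
  log_det <- as.numeric(determinant(diag(n) - r * W, logarithm = TRUE)$modulus)
  log_det - (n / 2) * log(sum(e^2) / n)
}

# Maximize over rho, then recover beta by regressing the filtered outcome on X.
rho_hat  <- optimize(loglik, interval = c(-0.99, 0.99), maximum = TRUE)$maximum
beta_hat <- lm.fit(X, M - rho_hat * (W %*% M))$coefficients

print(rho_hat)
print(cbind(ols = as.numeric(beta_ols), spatial_lag = as.numeric(beta_hat)))
```

Comparing the two coefficient columns gives a feel for how ignoring the lag term can shift the estimates. The exact numbers depend on the random seed and on the assumed parameter values.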