Spatial Data Analysis
Why Geography is important.

What is spatial analysis?
• From Data to Information
  – beyond mapping: added value
  – transformations, manipulations and application of analytical methods to spatial (geographic) data
• Lack of locational invariance
  – analyses where the outcome changes when the locations of the objects under study change
    » median center, clusters, spatial autocorrelation
  – where matters
    » in an absolute sense (coordinates)
    » in a relative sense (spatial arrangement, distance)

Components of Spatial Analysis
• Visualization
  – Showing interesting patterns
• Exploratory Spatial Data Analysis (ESDA)
  – Finding interesting patterns
• Spatial Modeling, Regression
  – Explaining interesting patterns

Implementation of Spatial Analysis
• Beyond GIS
  – Analytical functionality not part of typical commercial GIS
    » Analytical extensions
  – Exploration requires an interactive approach
    » Training requirements
    » Software requirements
  – Spatial modeling requires specialized statistical methods
    » Explicit treatment of spatial autocorrelation
    » Space-time is not space + time
• ESDA and Spatial Econometrics

What Is Special About Spatial Data?
• Location, Location, Location
  – “where” matters
• Dependence is the rule
  – spatial interaction, contagion, externalities, spill-overs, copycatting
  – First Law of Geography (Tobler)
    » everything depends on everything else, but closer things more so
• Spatial heterogeneity
  – Lack of stationarity in first-order statistics
    » pertains to the spatial or regional differentiation observed in the value of a variable
  – Spatial drift (e.g., a trend surface)
  – Spatial association

Nature of Spatial Data
• Spatially referenced data: “georeferenced”
    » “attribute” data associated with location
    » where matters
• Example: Spatial Objects
  – points: x, y coordinates
    » cities, stores, crimes, accidents
  – lines: arcs, from node, to node
    » road network, transmission lines
  – polygons: series of connected arcs
    » provinces, cities, census tracts

GIS Data Model
• Discretization of geographical reality necessitated by the nature of computing devices (Goodchild)
  – raster (grid) vs. vector (polygon)
  – field view (regions, segments) vs. object view (objects in a plane)
• Data model implies spatial sampling and spatial errors

Three Classes of Spatial Data
• Geostatistical Data
  – points as sample locations (“field” data as opposed to “objects”)
  – continuous variation over space
• Lattice/Regional Data
  – polygons or points (centroids)
  – discrete variation over space; observations associated with regular or irregular areal units
• Point Patterns
  – points on a map (occurrences of events at locations in space)
  – observations of a variable are made at location X
  – assumption that the spatial arrangement is directly related to the interaction between units of observation

Visualization and ESDA
• Objective
  – highlighting and detecting patterns
• Visualization
  – mapping spatial distributions
  – outlier detection
  – smoothing rates
• ESDA
  – dynamically linked windows
  – linking and brushing

Mapping patterns
http://www.cdc.gov/nchs/data/gis/atmapfh.pdf

ESDA
http://www.public.iastate.edu/~arcview-xgobi/

Spatial Process
• Spatial Random Field: { Z(s): s ∈ D }
    » s ∈ R^d: generic data location (vector of coordinates)
    » D ⊂ R^d: index set (subset of potential locations)
    » Z(s): random variable at s, with realization z(s)
• Examples
  – s are the x, y coordinates of house sales; Z is the sales price at s
  – s are counties; Z is the crime rate in s

Point Pattern Analysis
• Objective
  – assessing spatial randomness
• Interest in location itself
  – complete spatial randomness
  – clustering, dispersion
• Distance-based statistics
  – nearest neighbors
  – number of events within a given radius

Point Patterns
• Spatial process
  – index set D is a point process; s is random
• Data
  – mapped pattern
    » examples: location of disease, gang shootings
• Research question
  – interest focuses on detecting the absence of spatial randomness (cluster statistics)
  – clustered points vs. dispersed points

Geostatistical Data
• Spatial Process
  – index set D is a fixed subset of R^d (continuous)
• Data
  – sample points from an underlying continuous surface
    » examples: mining, air quality, house sales price
• Research Question
  – interest focuses on modeling continuous spatial variation
  – spatial interpolation (kriging)

Variogram Modeling (Geostatistics)
• Objective
  – modeling continuous variation across space
• Variogram
  – estimating how spatial dependence varies with distance
  – modeling distance decay
• Kriging
  – optimal spatial prediction

Lattice or Regional Data
• Spatial process
  – index set D is a fixed collection of countably many points in R^d
  – finite, discrete spatial units
• Data
  – fixed points or discrete locations (regions)
    » examples: county tax rates, state unemployment
• Research question
  – interest focuses on statistical inference
  – estimation, specification tests

Spatial Autocorrelation
• Objective
  – hypothesis test on the spatial randomness of attributes (the coincidence of value similarity with locational similarity)
• Global and local autocorrelation statistics: Moran’s I, Geary’s c, G(d), LISA
• Visualization of spatial autocorrelation
  – Moran scatterplot
  – LISA maps

Spatial process models
• How is the spatial association generated?
  – Spatial autoregressive process (SAR)
    » Y = ρWY + ε
  – Spatial moving average process (SMA)
    » Y = (I + ρW)ε
  – ε: vector of independent errors
  – W: spatial (distance-based) weights matrix
  – In SAR, correlation remains fairly persistent with increasing distance, whereas in SMA it decays to zero quickly (beyond second-order neighbors).
• Spatial process: the rule governing the trajectory of the system as a chain of changes in state.
• Spatial pattern: the map of a single realization of the underlying spatial process (the data available for analysis).
• Say you conduct a regression analysis. If the residuals do not display spatial autocorrelation, then there is no need to add “space” to the model. Examine spatial autocorrelation in the residuals using Moran’s I, Geary’s c or G(d).

Perspectives on spatial process models
• Finding out how the variable Y relates to its value in surrounding locations (the spatial lag) while controlling for the influence of other explanatory variables.
• When the interest is in the relation between the explanatory variables X and the dependent variable after the spatial effect has been controlled for (this is referred to as spatial filtering or spatial screening).
• The expected value of the dependent variable at each location is a function not only of the explanatory variables at that location, but also of the explanatory variables at all other locations.
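Spatial drift, listed under spatial heterogeneity on the “What Is Special About Spatial Data?” slide, is often described by a trend surface: a low-order polynomial in the coordinates fitted by least squares. The sketch below fits a first-order (linear) surface z = b0 + b1·x + b2·y in plain Python. The function names are hypothetical, and a real analysis would more likely use a statistics package and possibly higher-order terms such as x², xy and y².

```python
from typing import List, Sequence, Tuple

# Illustrative helpers (hypothetical names); assumes a first-order (linear) trend surface.


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back-substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def linear_trend_surface(
    xs: Sequence[float], ys: Sequence[float], zs: Sequence[float]
) -> Tuple[float, float, float]:
    """Fit z = b0 + b1*x + b2*y by ordinary least squares via the normal equations."""
    rows = [(1.0, x, y) for x, y in zip(xs, ys)]  # design matrix columns: intercept, x, y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xtz = [sum(r[i] * z for r, z in zip(rows, zs)) for i in range(3)]
    b0, b1, b2 = solve(xtx, xtz)
    return b0, b1, b2
```

The residuals from such a fit are what remains once the large-scale drift is removed; spatial dependence is then usually studied in those residuals rather than in the raw values.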
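The point, line and polygon objects on the “Nature of Spatial Data” slide map naturally onto small record types. This is a minimal sketch, assuming a simple arc-node layout in which a polygon is stored as a ring of arcs that join end to end; the class names are hypothetical and do not mirror any particular GIS file format.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical arc-node record types, for illustration only.
Coord = Tuple[float, float]


@dataclass(frozen=True)
class Point:
    """Point object: one (x, y) location, e.g. a store or an accident."""
    x: float
    y: float


@dataclass(frozen=True)
class Arc:
    """Line object: an ordered chain of vertices running from a start node to an end node."""
    vertices: Tuple[Coord, ...]

    @property
    def from_node(self) -> Coord:
        return self.vertices[0]

    @property
    def to_node(self) -> Coord:
        return self.vertices[-1]


@dataclass(frozen=True)
class Polygon:
    """Polygon object: a closed ring assembled from a series of connected arcs."""
    arcs: Tuple[Arc, ...]

    def ring(self) -> List[Coord]:
        # Each arc must end where the next begins, and the last arc must return to the first.
        for current, nxt in zip(self.arcs, self.arcs[1:] + self.arcs[:1]):
            if current.to_node != nxt.from_node:
                raise ValueError("arcs do not join end to end")
        coords: List[Coord] = []
        for arc in self.arcs:
            coords.extend(arc.vertices[:-1])  # skip each arc's end node; the next arc repeats it
        return coords

    def area(self) -> float:
        # Shoelace formula over the closed ring of vertices.
        pts = self.ring()
        twice_area = 0.0
        for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
            twice_area += x1 * y2 - x2 * y1
        return abs(twice_area) / 2.0
```

For example, a unit square can be stored as two arcs, (0, 0) → (1, 0) → (1, 1) and (1, 1) → (0, 1) → (0, 0), and its area comes out as 1.0. Storing shared boundaries once as arcs is what lets neighboring polygons reference the same edge.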
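The “Point Pattern Analysis” slide mentions nearest-neighbor statistics for assessing complete spatial randomness (CSR). One classic choice, used here as an example, is the Clark-Evans ratio: the observed mean nearest-neighbor distance divided by its expectation under CSR, 0.5 / sqrt(n / A) for n events in a study area of size A. Values near 1 are consistent with randomness, values below 1 suggest clustering and values above 1 dispersion. This brute-force sketch (hypothetical function names) applies no edge correction, which biases the ratio upward in small study areas, and it omits the significance test.

```python
import math
from typing import List, Tuple

# Illustrative sketch (hypothetical names); no edge correction, brute-force O(n^2) search.
Coord = Tuple[float, float]


def mean_nearest_neighbor_distance(points: List[Coord]) -> float:
    """Average distance from each event to its nearest other event."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)


def clark_evans_ratio(points: List[Coord], area: float) -> float:
    """Observed mean nearest-neighbor distance over its CSR expectation 0.5 / sqrt(n / area)."""
    intensity = len(points) / area          # events per unit area
    expected = 0.5 / math.sqrt(intensity)   # expected mean NN distance under CSR
    return mean_nearest_neighbor_distance(points) / expected
```

A perfectly regular 10 × 10 grid in the unit square gives a ratio of 2 (every nearest neighbor is 0.1 away against an expected 0.05), while the same 100 points packed into a tiny corner of the square give a ratio far below 1.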
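For the “Variogram Modeling (Geostatistics)” slide, the classical (Matheron) estimator of the semivariogram averages half the squared difference between pairs of observations in each distance class: gamma(h) = 1 / (2 N(h)) · Σ (z(s_i) − z(s_j))², summed over the N(h) pairs whose separation falls in the class around h. The sketch below bins pairs into classes of fixed width and reports the mean pair distance in each class; the function name and binning scheme are illustrative choices, and fitting a variogram model (and kriging with it) is not shown.

```python
import math
from collections import defaultdict
from typing import List, Sequence, Tuple

# Illustrative sketch (hypothetical name); fixed-width distance classes, isotropic distances.


def empirical_semivariogram(
    coords: Sequence[Tuple[float, float]],
    values: Sequence[float],
    bin_width: float,
    max_dist: float,
) -> List[Tuple[float, float, int]]:
    """Return (mean pair distance, semivariance, pair count) for each non-empty distance class."""
    sq_sum = defaultdict(float)    # sum of squared differences per class
    dist_sum = defaultdict(float)  # sum of pair distances per class
    count = defaultdict(int)       # N(h): number of pairs per class
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair once
            h = math.dist(coords[i], coords[j])
            if h > max_dist:
                continue
            k = int(h // bin_width)
            sq_sum[k] += (values[i] - values[j]) ** 2
            dist_sum[k] += h
            count[k] += 1
    # gamma(h) = sum of squared differences / (2 * N(h))
    return [(dist_sum[k] / count[k], sq_sum[k] / (2 * count[k]), count[k]) for k in sorted(count)]
```

On a transect of ten equally spaced points whose values simply equal their position, the semivariance grows as h²/2 (0.5, 2.0, 4.5 at h = 1, 2, 3) instead of levelling off at a sill, which is one reason drift is usually removed before modeling the variogram.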
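The “Spatial Autocorrelation” slide lists Moran’s I as a global test of spatial randomness. With weights w_ij, I = (n / S0) · Σ_i Σ_j w_ij (z_i − z̄)(z_j − z̄) / Σ_i (z_i − z̄)², where S0 is the sum of all weights. The sketch below assumes binary rook contiguity on a regular grid (distance-based weights would plug in the same way) and a one-sided permutation test that randomly reassigns the observed values to locations. Function names are hypothetical; libraries such as PySAL provide tested implementations.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Illustrative sketch (hypothetical names); assumes binary rook-contiguity weights on a grid.


def rook_neighbors(nrows: int, ncols: int) -> Dict[int, List[int]]:
    """Cells sharing an edge are neighbors (w_ij = 1), all others have w_ij = 0."""
    nbrs: Dict[int, List[int]] = {}
    for r in range(nrows):
        for c in range(ncols):
            candidates = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            nbrs[r * ncols + c] = [
                rr * ncols + cc for rr, cc in candidates if 0 <= rr < nrows and 0 <= cc < ncols
            ]
    return nbrs


def morans_i(values: Sequence[float], nbrs: Dict[int, List[int]]) -> float:
    """Global Moran's I with binary weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(len(js) for js in nbrs.values())  # S0: sum of all weights
    cross = sum(dev[i] * dev[j] for i, js in nbrs.items() for j in js)
    return (n / s0) * cross / sum(d * d for d in dev)


def permutation_test(
    values: Sequence[float], nbrs: Dict[int, List[int]], n_perm: int = 999, seed: int = 0
) -> Tuple[float, float]:
    """Observed I and a one-sided pseudo p-value against positive spatial autocorrelation."""
    rng = random.Random(seed)
    observed = morans_i(values, nbrs)
    shuffled = list(values)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # randomly reassign the values to locations
        if morans_i(shuffled, nbrs) >= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_perm + 1)
```

A checkerboard of 0s and 1s on a 4 × 4 grid gives I = −1 (perfect negative autocorrelation), while splitting the grid into a block of 1s and a block of 0s gives I = 2/3 with a very small pseudo p-value. Following the slide’s advice, the same statistic can be computed on regression residuals, although the plain permutation p-value is then only approximate because residuals are not independent.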
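The claim on the “Spatial process models” slide that SAR correlation persists over distance while SMA correlation dies out quickly can be checked numerically. Assuming independent unit-variance errors, Y = (I − ρW)^(-1) ε gives Cov(Y) = A A' with A = (I − ρW)^(-1), and Y = (I + ρW) ε gives Cov(Y) = A A' with A = I + ρW. The sketch uses a ring of 20 locations with row-standardized first-order contiguity weights (an illustrative assumption in place of distance weights) and small pure-Python matrix helpers with hypothetical names.

```python
from typing import List

# Illustrative sketch (hypothetical names); assumes a ring lattice with row-standardized weights.
Matrix = List[List[float]]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def inverse(a: Matrix) -> Matrix:
    """Gauss-Jordan elimination with partial pivoting on [a | I]."""
    n = len(a)
    aug = [row[:] + e for row, e in zip(a, identity(n))]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ring_weights(n: int) -> Matrix:
    """Each location has two neighbors (left and right), each with weight 1/2."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        w[i][(i - 1) % n] = 0.5
        w[i][(i + 1) % n] = 0.5
    return w


def implied_correlation(rho: float, n: int = 20, process: str = "SAR") -> List[float]:
    """Correlation between location 0 and the location k steps away, for k = 0..n//2."""
    w, eye = ring_weights(n), identity(n)
    if process == "SAR":    # Y = rho W Y + e  =>  Y = (I - rho W)^(-1) e
        a = inverse([[eye[r][c] - rho * w[r][c] for c in range(n)] for r in range(n)])
    elif process == "SMA":  # Y = (I + rho W) e
        a = [[eye[r][c] + rho * w[r][c] for c in range(n)] for r in range(n)]
    else:
        raise ValueError(f"unknown process: {process}")
    cov = matmul(a, transpose(a))  # Cov(Y) = A A' when Var(e) = I
    return [cov[0][k] / (cov[0][0] * cov[k][k]) ** 0.5 for k in range(n // 2 + 1)]
```

With ρ = 0.9, the SMA correlation is exactly zero from three steps onward, because (I + ρW)(I + ρW)' only links locations that are neighbors or share a neighbor, whereas the SAR correlation is still clearly positive several steps away.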
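The last bullet on the “Perspectives on spatial process models” slide follows from the spatial lag model y = ρWy + Xβ + ε: solving for y gives E[y] = (I − ρW)^(-1) Xβ, and expanding the inverse as I + ρW + ρ²W² + … shows that the explanatory variable at one location feeds into the expected value everywhere. The sketch assumes a single regressor, row-standardized ring contiguity and |ρ| < 1, and sums the series by fixed-point iteration; all names are hypothetical.

```python
from typing import Dict, List, Sequence

# Illustrative sketch (hypothetical names); one regressor, row-standardized ring weights, |rho| < 1.


def ring_neighbors(n: int) -> Dict[int, List[int]]:
    """First-order contiguity on a ring: the two adjacent locations are neighbors."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def spatial_lag(values: Sequence[float], nbrs: Dict[int, List[int]]) -> List[float]:
    """Wy with row-standardized weights: the average of each location's neighbors."""
    return [sum(values[j] for j in nbrs[i]) / len(nbrs[i]) for i in range(len(values))]


def expected_y(
    x: Sequence[float], beta: float, rho: float, nbrs: Dict[int, List[int]],
    tol: float = 1e-12, max_iter: int = 10_000,
) -> List[float]:
    """E[y] = (I - rho W)^(-1) x beta, built up as x beta + rho W x beta + rho^2 W^2 x beta + ..."""
    xb = [beta * v for v in x]
    y = list(xb)
    for _ in range(max_iter):
        # Fixed-point step y <- x beta + rho W y; each pass adds one more term of the series.
        new = [a + rho * b for a, b in zip(xb, spatial_lag(y, nbrs))]
        if max(abs(a - b) for a, b in zip(new, y)) < tol:
            return new
        y = new
    raise RuntimeError("no convergence; the series needs |rho| < 1 for row-standardized W")
```

With x = 1 everywhere, β = 2 and ρ = 0.5 on a ring of 10 locations, every expected value settles at β / (1 − ρ) = 4, since row-standardized weights average a constant back to itself. Raising x at a single location then increases E[y] there by more than β (feedback through the neighbors) and also increases it, by a smaller amount, at the farthest location on the ring.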