Non-technical Overview of Geospatial Statistical Methods
GIS/Mapping and Census Data, Second Annual Census Workshop Series
Workshop 3: Spatial Statistics, Spatial Research & Confidential Census Data
New York Census Research Data Center (CRDC), Baruch College, CUNY, May 8, 2008

A Survey of Topics
1. Points (events) vs. polygons (areal units)
2. Software packages
3. Methods of point pattern analysis
   1. Centrographic description
   2. Distance analysis
   3. Spatial clusters
4. Methods of spatial data analysis
   1. Thematic mapping vs. exploratory spatial data analysis (ESDA)
   2. Spatial autocorrelation: how do we know if it is present and, if it is, why do we care?
   3. Making neighbors: spatial weights
5. Spatial regression models
   1. Spatial error vs. spatial lag
   2. Spatial heterogeneity vs. spatial dependence
6. Spatial interpolation
7. Space/time dependence
8. Spatial mixed and spatial generalized linear models

Atlanta metro region with locations of selected homicides

Where to Look for Spatial Analytic Tools
1. ESRI home page, with links to resources for digital maps, data sets, utilities, courses, etc.: http://www.esri.com/
2. CrimeStat III: A Spatial Statistics Program for the Analysis of Crime Incident Locations; software, manual, and sample data: http://www.icpsr.umich.edu/NACJD/crimestat.html/
3. SaTScan v7.0.3: software, manual, and sample data available at http://www.satscan.org
4. GeoDa home site, with links to GeoDa installation, manuals, tutorials, data sets, and other supporting materials: https://www.geoda.uiuc.edu/
5. R Spatial Projects: packages (e.g., spdep) to carry out spatial data analysis using the R language: http://sal.uiuc.edu/csiss/Rgeo/
6. SpaceStat: A Program for the Statistical Analysis of Spatial Data; SpaceStat tutorial and instructional manual available at http://www.terraseer.com/products_spacestat.php
7. Stata: tools for spatial data analysis (spat* routines)
8. SAS: spatial error covariance structures in the mixed and glimmix procedures

POINT PATTERN ANALYSIS Using CrimeStat
1. Spatial distribution
   1. projected or spherical coordinates
   2. polar coordinates
2. Distance analysis
3. Spatial clusters

Spatial distribution: spherical or projected coordinate system
1. Mean center
2. Median center
3. Center of minimum distance
4. Standard deviation of X and Y coordinates
5. Standard distance deviation
6. Standard deviational ellipse
7. Average density

Spatial distribution: polar coordinate system
1. Directional mean and variance
2. Convex hull

Centrographic statistics
1. These statistics originate in the 1920s, e.g., Lefever, D. 1926. "Measuring geographic concentration by means of the standard deviational ellipse." American Journal of Sociology 32(1): 88-94.
2. They are called centrographic in that they are two-dimensional analogs of the basic statistical moments of a univariate distribution (a code sketch of several of these measures follows at the end of this section)

Standard deviational ellipse
1. Because we are working in two dimensions, the standard distance deviation distorts dispersion by ignoring skew
2. The standard deviational ellipse gives dispersion in two dimensions
3. Derived from the bivariate distribution

Geometric and Harmonic Means
1. These statistics "hug" the center of the distribution and as such are useful measures of central tendency when the distribution is skewed
   1. The geometric mean is the anti-log of the mean of the logarithms of X and Y
   2. The harmonic mean is the inverse of the mean inverse of X and Y

Average Density
1. Number of incidents divided by area
2. Sometimes called "intensity"
3. The area can be defined on the measurement parameters page in CrimeStat; otherwise the minimum and maximum X and Y values define it
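To make the centrographic measures above concrete, here is a minimal Python/NumPy sketch (not CrimeStat's implementation) that computes the mean center, the standard distance deviation, and a standard deviational ellipse derived from the eigendecomposition of the coordinate covariance matrix. The sample data, function name, and degrees-of-freedom conventions are illustrative assumptions and may differ in detail from CrimeStat's formulas.

```python
# A minimal sketch of three centrographic measures for projected point coordinates:
# mean center, standard distance deviation, and a standard deviational ellipse.
import numpy as np

def centrographic_summary(xy):
    """xy: (n, 2) array of projected X, Y coordinates."""
    mean_center = xy.mean(axis=0)
    d = xy - mean_center                                  # deviations from the mean center
    std_distance = np.sqrt((d ** 2).sum(axis=1).mean())   # standard distance deviation
    cov = np.cov(d, rowvar=False)                         # 2 x 2 covariance of X and Y
    eigvals, eigvecs = np.linalg.eigh(cov)                # axes of the deviational ellipse
    semi_axes = np.sqrt(eigvals)                          # dispersion along each axis
    major = eigvecs[:, np.argmax(eigvals)]                # direction of greatest dispersion
    angle = np.degrees(np.arctan2(major[1], major[0]))    # orientation from the X axis
    return {"mean_center": mean_center,
            "standard_distance": std_distance,
            "ellipse_semi_axes": semi_axes,
            "ellipse_angle_deg": angle}

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    pts = rng.normal(loc=[500_000, 4_200_000], scale=[800, 300], size=(200, 2))
    print(centrographic_summary(pts))
```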
Spatial Distribution Using Polar Coordinates
1. The centrographic statistics discussed thus far have been based on spherical or projected coordinates
2. Another metric can be used, based on polar coordinates
3. Statistics are calculated using trigonometric functions
4. The input is a set of vectors defined as angular deviations from a reference vector and a distance vector
5. CrimeStat can convert X and Y coordinates into angles with a bearing from an origin

Directional Mean
1. Calculated as the intersection of the mean angle and the mean distance
2. The directional mean is dependent on the choice of origin
3. The triangulated mean is the intersection of the directional means from the lower-left and upper-right origins

Measures of Spread
1. Convex hull
   1. A boundary drawn around the distribution of points
   2. Represents the polygon that circumscribes all points in the distribution such that no point lies outside the polygon
2. Circular variance
   1. A standardized variance
   2. 0 shows no variability; 1 indicates maximum variability

Distance analysis
1. Nearest neighbor analysis
2. Ripley's K statistic

Nearest Neighbor Index (NNI)
1. NNI = d(NN) / d(ran)
2. If the observed average (nearest) distance is the same as the mean random distance, the ratio is 1.0
   1. If the ratio is < 1, points are closer together than expected on the basis of chance
   2. If the ratio is > 1, there is evidence of dispersion wider than expected on the basis of chance
   (a code sketch of the basic NNI calculation follows at the end of this section)

K-Order Nearest Neighbors
1. The NNI is only an indicator of first-order spatial randomness
2. What about the second nearest neighbor?
3. What about the Kth nearest neighbor?
4. Mean random distance to the Kth nearest neighbor
   1. It is suggested that no more than 100 nearest neighbors be calculated

Linear Nearest Neighbor Index
1. Uses indirect distances by applying a grid to the region
2. These are called Manhattan distances
3. You must supply the length of the street network
4. There is also a K-order LNNI

The Problem of Edge Effects
1. An incident occurring near the border of the study area may actually have its nearest neighbor on the other side of the border
2. Since there is no information on incidents outside the study area, the program selects another point as the nearest neighbor
3. The observed nearest neighbor distance is therefore probably greater than it should be

Ripley's K Statistic
1. An index of non-randomness for different scale values
2. It is a "super-order" nearest neighbor statistic
3. Provides a test of randomness for every distance from the smallest up to some specified limit
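As referenced in the NNI slide above, the sketch below illustrates the basic nearest neighbor index, using the expected mean nearest-neighbor distance under complete spatial randomness, 0.5 / sqrt(N / A). It assumes a user-supplied study area and omits CrimeStat's edge corrections, K-order extensions, and significance test; the data are simulated.

```python
# A minimal sketch of the nearest neighbor index (NNI) for a point pattern.
import numpy as np

def nearest_neighbor_index(xy, area):
    """Ratio of the observed mean nearest-neighbor distance to the distance
    expected under complete spatial randomness, 0.5 / sqrt(n / area)."""
    n = len(xy)
    diff = xy[:, None, :] - xy[None, :, :]        # pairwise coordinate differences
    dist = np.sqrt((diff ** 2).sum(axis=-1))      # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)                # mask the self-distances
    observed = dist.min(axis=1).mean()            # mean nearest-neighbor distance
    expected = 0.5 / np.sqrt(n / area)            # expected value under randomness
    return observed / expected                    # < 1 suggests clustering, > 1 dispersion

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    pts = rng.uniform(0, 1000, size=(300, 2))
    print(nearest_neighbor_index(pts, area=1000 * 1000))   # roughly 1 for a random pattern
```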
Cluster Analysis
1. Mode
   1. Fuzzy mode
2. Nearest neighbor hierarchical clustering (NNHC)
   1. Risk-adjusted NNHC
3. Spatial and Temporal Analysis of Crime (STAC)
4. K-means partitioning clustering

Mode and Fuzzy Mode
1. Locations (points) with the highest number of incidents are defined as "hot spots" (clusters)
2. The definition of a cluster is based on frequency
3. Usefulness depends on the degree of resolution
4. The fuzzy mode allows a search radius
   1. Caution: points are counted multiple times

Nearest Neighbor Hierarchical Clustering
1. Identifies groups of incidents that are spatially close
2. It is a hierarchical clustering routine that clusters points together on the basis of a criterion called a threshold distance
   1. determined by a random distance algorithm or user defined
   2. A minimum number of events may also be defined
3. Clustering is repeated until either all points are grouped or the clustering routine fails

Spatial and Temporal Analysis of Crime (STAC)
1. STAC is a combination of a scan statistic, counting the number of events within a circle, and the hierarchical clustering technique just described
2. The results are visualized as a standard deviational ellipse (or convex hull) computed for the points associated with each cluster

How Does STAC Work?
1. STAC lays a 20 x 20 grid structure on the plane defined by the area boundary
2. STAC places a circle on every node of the grid, with a radius equal to 1.414 (the square root of 2) times the specified search radius; this ensures that the circles overlap
   1. The user can specify different search radii
3. STAC counts the number of points falling within each circle and ranks the circles in descending order (the top 25 search areas are selected)
4. For the 25 circles (or all circles with at least 2 data points), the X and Y coordinates of any node within the search radius are recorded, along with the number of data points found for each node
5. If a point belongs to 2 different circles, the points within the circles are combined
6. The process is repeated until there are no overlapping circles
7. Using the data points in each cluster (hot spot), STAC calculates the best-fitting standard deviational ellipse (or convex hull)
8. Because the standard deviational ellipse is a statistical summary, it may not contain every point in the cluster, or it may contain points that are not in the cluster
9. The convex hull creates a polygon around all points in the cluster

K-Means Partitioning Clustering
1. K-means clustering is a partitioning procedure in which the data are grouped into K groups defined by the user
2. The routine tries to find the best positioning of the K centers and then assigns each point to the center that is nearest
3. All points are assigned to clusters
4. Useful when the user wants to control the grouping

K-means routine
1. Step 1: identification of an initial guess for the location of the K clusters
   1. A grid is overlaid on the data set and the number of points in each grid cell is counted
   2. The grid cell with the highest count is the (initial) first cluster
   3. The second cluster is the grid cell with the next most points that is separated by the separation criterion
      1. 2.358 * 0.5 * SQRT(A / N), where 2.358 is the Student's t for the 0.01 significance level
   4. The third (initial) cluster is selected, and so on
2. Step 2: local optimization assigns each point to the nearest of the K seed locations
3. The routine then iterates, minimizing the distance of points to the center of their assigned cluster

Thematic Maps Using ArcGIS
1. The default setting is natural breaks
   1. Also called Jenks' optimization
   2. Partitions the data into a user-defined number of classes by calculating breaks based on the smallest possible total error (the smallest sum of squared deviations about the class means)

Classification choices in ArcGIS
1. Other choices are:
   1. Quantiles
   2. Standard deviation
   3. Equal interval
   4. Defined interval
   5. Manual

Functionality of GeoDa

Outlier Maps in GeoDa
1. Percentile maps
2. Box maps (a code sketch of a box-map style classification follows below)
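As flagged above, here is a minimal sketch of a box-map style classification in the spirit of GeoDa's outlier maps: values are cut at the quartiles, with lower and upper outliers flagged beyond a 1.5 x IQR hinge. The hinge value, class labels, and data are illustrative assumptions, not GeoDa's exact implementation.

```python
# A minimal sketch of a box map classification (quartile classes plus IQR-hinge outliers).
import numpy as np

def box_map_classes(values, hinge=1.5):
    v = np.asarray(values, dtype=float)
    q1, q2, q3 = np.percentile(v, [25, 50, 75])
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - hinge * iqr, q3 + hinge * iqr
    classes = np.empty(v.shape, dtype=object)
    classes[v < lower_fence] = "lower outlier"
    classes[(v >= lower_fence) & (v < q1)] = "< 25%"
    classes[(v >= q1) & (v < q2)] = "25% - 50%"
    classes[(v >= q2) & (v < q3)] = "50% - 75%"
    classes[(v >= q3) & (v <= upper_fence)] = "> 75%"
    classes[v > upper_fence] = "upper outlier"
    return classes

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    homicide_rate = np.append(rng.gamma(2.0, 2.0, size=99), 60.0)   # one extreme county
    labels, counts = np.unique(box_map_classes(homicide_rate), return_counts=True)
    print(dict(zip(labels, counts)))
```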
Exploratory Spatial Data Analysis
1. ESDA is the spatial analog of Tukey's EDA
2. ESDA is the visualization of spatial non-randomness (i.e., positive or negative spatial autocorrelation). It is also used to discover:
   1. Spatial trends
   2. Spatial regimes (non-stationarity)
   3. Spatial outliers
3. ESDA may also involve the visualization of spatial covariates

Percentile maps provide an intuitively appealing display of geographic patterns of homicide rates. However, simple visual inspection of maps is potentially unreliable for detecting clusters and patterns in the data. Human perception is not sufficiently rigorous to assess "significant" clustering and tends to be biased toward finding patterns, even in spatially random data.

Moran Scatterplot

Why Do (Should) We Care?
1. First Law of Geography (Tobler's Law): everything depends on everything else, but closer things more so
2. Clusters and correlated errors
3. The classical linear regression model assumes:
   Cov(u_i, u_j) = 0 for i ≠ j
   Cov(X_i, u_i) = 0

An Example from Temporal Analysis
1. In time series analysis autocorrelation is a common concern: the first-order autoregressive scheme, AR(1)
2. This can be shown by writing the two-variable population regression function as
   Y_t = β1 + β2*X_t + u_t
   where u_t denotes the error at time t, and the AR(1) scheme is given by
   u_t = ρ*u_{t-1} + ε_t,   -1 < ρ < 1
3. ρ is known as the coefficient of autocovariance, and ε_t satisfies the usual OLS assumptions

From temporal to spatial autocorrelation
1. In the AR(1) scheme, ρ is estimated by
   ρ̂ = Σ_t û_t*û_{t-1} / Σ_t û_t²
2. Moran's I spatial autocorrelation statistic:
   I = Σ_i Σ_j w_ij*z_i*z_j / Σ_i z_i²
   where w_ij is an element of a row-standardized spatial weights matrix and z_i = y_i - m is the variable of interest centered on m, the sample mean

How do we know if autocorrelation is present?
1. The Durbin-Watson d statistic:
   d = Σ_{t=2..N} (û_t - û_{t-1})² / Σ_{t=1..N} û_t²
2. Bounds on d: 0 ≤ d ≤ 4
3. Relationship between d and ρ̂: d ≈ 2(1 - ρ̂), therefore ρ̂ ≈ 1 - d/2
   So if ρ̂ = 0, d = 2; if ρ̂ = 1, d = 0; if ρ̂ = -1, d = 4

From temporal to spatial autocorrelation
1. Geary's C:
   C = Σ_i Σ_j w_ij*(z_i - z_j)² / (2*Σ_i z_i²)

How do we represent the relationship between areal units in the W matrix?
1. Probably the most common choice is the construction of an "adjacency" (contiguity) matrix, in which spatially adjacent areal units (where area i shares a common border with area j) are assigned scores of 1, and 0 otherwise (including the main diagonal)

Spatial weights based on contiguity

Spatial weights matrix
1. From the X and Y coordinates, we can calculate distances between centroids
2. A minimum-distance threshold ensures that all units will be connected with at least one other unit (no "islands")
3. Another choice is k-nearest neighbors

Threshold-based spatial weights and visualization of connectivity

Spatial weights matrix
1. Another popular choice is the inverse of the distance between the geographic centers of the areal units (again with the main diagonal set to zero)
2. This establishes a decay function that weights the effect of events in geographically closer units more heavily than those in more distant units
3. Inverse distance matrices are particularly useful partitions of geographic space when the phenomenon of interest involves the transfer or exchange of information

Inverse Distance
            county A   county B   county C   county K
county A    0          1/25       1/30       1/150
county B    1/25       0          1/60       1/200
county C    1/30       1/60       0          1/120
county K    1/150      1/200      1/120      0

Spatial weights matrices
1. The choice of matrix representation is by no means limited to the examples given here. In fact there is an infinite number of possible weights matrices, but some representations will be more substantively and historically compelling than others.

How do we measure neighbors among areal units?
1. Sparse vs. full weights matrix?
2. Does the size of the matrix matter?
3. Row-standardization?
4. Unequal size of areal units?
5. Edge effects?
(A code sketch below ties the inverse-distance weights and the Moran's I computation together.)
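Tying the pieces of this section together, the following sketch builds a row-standardized inverse-distance weights matrix from areal-unit centroids and evaluates the global Moran's I as defined above (I = Σ_i Σ_j w_ij z_i z_j / Σ_i z_i² with row-standardized weights). The dense-matrix storage, example data, and function names are illustrative; production tools (GeoDa, spdep, Stata's spat* routines) add sparse storage and inference.

```python
# A minimal sketch: inverse-distance spatial weights and the global Moran's I.
import numpy as np

def inverse_distance_weights(centroids):
    diff = centroids[:, None, :] - centroids[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    with np.errstate(divide="ignore"):
        w = 1.0 / dist
    np.fill_diagonal(w, 0.0)                   # main diagonal set to zero
    return w / w.sum(axis=1, keepdims=True)    # row-standardization

def morans_i(y, w):
    z = y - y.mean()                           # center the variable on its mean
    return (z @ w @ z) / (z ** 2).sum()        # assumes w is row-standardized

if __name__ == "__main__":
    rng = np.random.default_rng(11)
    xy = rng.uniform(0, 100, size=(50, 2))         # county centroids
    w = inverse_distance_weights(xy)
    y = xy[:, 0] * 0.05 + rng.normal(size=50)      # a mildly trended variable
    print(round(morans_i(y, w), 3))
```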
Imagine the Moran Scatterplot as a Thematic Map: Local Moran's I
1. The formal expression of the local Moran's I statistic is:
   I_i = (z_i / Σ_i z_i²) * Σ_j w_ij*z_j
   where w_ij is an element of a row-standardized weights matrix, and the observations z_i and z_j refer to the variable in question in standardized form

Is I_i "Significant"?
1. The significance of this statistic is based on a permutation approach whereby 999 I_i statistics are generated to form a reference distribution
2. Each county's I_i statistic is compared to this reference distribution for inference purposes
3. Counties with non-significant I_i values appear white in the Moran scatterplot map

Conditional randomization or permutation approach
1. Randomization is conditional in the sense that the value at a given location is held fixed (that is, not used in the permutation) and the remaining values are randomly permuted over the locations in the data set
2. For each of these resampled data sets, the value of I_i can be computed
3. The resulting empirical distribution function provides the basis for a statement about the "extremeness" of the observed statistic relative to (and conditional on) the values computed under the null hypothesis (the randomly permuted values)

LISA map of (log) median housing value in DC metro area

LISA map of percent black in DC metro area

Stata's spatgsa

Stata's spatlsa

Other Uses of Spatial Autocorrelation
1. Logan used neighborhood clusters of high poverty to identify target areas for community development funding by NYC DYCD (Department of Youth and Community Development)

Spatial Autocorrelation of Regression Residuals Violates an OLS Assumption
1. Deane, Glenn, Steven F. Messner, Thomas Stucky, Charis Kubrin, and Kelly McGeever. "Not 'Islands Unto Themselves': Exploring the Spatial Context of City-Level Robbery Rates." Journal of Quantitative Criminology (forthcoming).

Spatial Dependence in City (log) Robbery Rates
1. OLS assumption violation: correlated errors

Choosing the Appropriate Form of Spatial Dependence
1. Lagrange multiplier tests as a diagnostic tool
2. Baller, Robert D., Luc Anselin, Steven F. Messner, Glenn Deane, and Darnell F. Hawkins. 2001. "Structural Covariates of U.S. County Homicide Rates: Incorporating Spatial Effects." Criminology 39: 561-590.

Spatial Heterogeneity vs. Spatial Dependence
1. Spatial data analysts too readily associate spatial non-randomness with spatial dependence
2. Spatial heterogeneity should be the first choice
3. A spatial Chow test for spatial regimes is an alternative to spatial regression models

Interpreting the local effect of X
1. Deane, Glenn, E. M. Beck, and Stewart E. Tolnay. 1998. "Incorporating Space into Social Histories: How Spatial Processes Operate and How We Observe Them." International Review of Social History, Supplement 6, 43: 57-80. Also reproduced in New Methods for Social History, edited by Larry J. Griffin and Marcel van der Linden (1999).

Estimation of Spatial Models
1. The mixed spatial lag model:
   y = ρ*W*y + X*β + ε
2. If the autoregressive parameter ρ is known:
   y - ρ*W*y = X*β + ε
   This is a spatial filter model (see the code sketch at the end of this section)
3. If the autoregressive parameter is not known, SpaceStat is still the most comprehensive package available:
   1. Maximum likelihood estimation
   2. Instrumental variables estimation
      1. via 2SLS
      2. via the generalized method of moments (GMM)
   3. Spatial Seemingly Unrelated Regressions (SUR)
   4. Conditional Autoregressive Model (CAR)

OLS estimation
1. A spatially lagged X can be estimated by OLS
2. Trend surface model
3. Spatial expansion model
4. Spatial regime model (spatial Chow)
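A minimal sketch of the two OLS-estimable situations noted above: the spatial filter model, in which (y - ρWy) is regressed on X when ρ is treated as known, and a spatially lagged-X specification in which WX simply joins the regressor matrix. The weights matrix, the value of ρ, and the data are illustrative assumptions.

```python
# A minimal sketch of OLS-estimable spatial specifications: the spatial filter
# (known rho) and a spatially lagged-X (WX) regression.
import numpy as np

def ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def spatial_filter_ols(y, X, W, rho):
    """Regress the spatially filtered outcome (y - rho*W*y) on X, with an intercept."""
    y_f = y - rho * (W @ y)
    return ols(np.column_stack([np.ones(len(y)), X]), y_f)

def slx_ols(y, X, W):
    """Regress y on X and its spatial lag WX."""
    Z = np.column_stack([np.ones(len(y)), X, W @ X])
    return ols(Z, y)

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    n = 100
    W = rng.random((n, n))
    np.fill_diagonal(W, 0.0)
    W = W / W.sum(axis=1, keepdims=True)                 # row-standardized weights
    X = rng.normal(size=(n, 2))
    y = 1.0 + X @ np.array([0.8, -0.5]) + rng.normal(scale=0.3, size=n)
    print(spatial_filter_ols(y, X, W, rho=0.4))
    print(slx_ols(y, X, W))
```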
Problem of ML Estimation
1. The spatial weights matrix W is n x n and is used in matrix addition and multiplication operations in the ML estimators
2. To obtain the variance-covariance matrix of these estimators, which is necessary for hypothesis testing, we must compute the inverse of the n x n matrix A

Instrumental Variables Estimation
1. Formally this is a two-stage least squares process in which the vector y* (where y* = Wy) is estimated in the first stage:
   1. y* is regressed on the complete set of fully exogenous variables
   2. the actual observations on y* are replaced by their corresponding predicted values
2. In the second stage, y is regressed on the predicted values of y* and the submatrix X

Estimation of Spatial Models Using SpaceStat
1. The models in the Regress module are organized along 2 dimensions
2. There are 4 general classes of models, each of which corresponds to a menu in the Regress module:
   1. Classic regression model
   2. Model with spatial error dependence
   3. Model with heteroscedastic errors
   4. Model with a spatially lagged dependent variable

The Second Dimension Pertains to the Form of the Model Specification
1. There are 5 distinct forms in SpaceStat:
   1. Generic regression
   2. Trend surface
   3. Spatial regimes
   4. Spatial expansion
   5. Spatial ANOVA

File Structure for Spatial Regression Models
1. Trend surface: you must select model specification 2 (only two explanatory variables: the X coordinates and the Y coordinates)
2. Spatial regimes: you must select model specification 3 (you must also specify an indicator variable to define the regimes)
3. Spatial expansion: you must select model specification 4 (you must also specify the expansion variables and the order of the expansion polynomial)
4. Spatial ANOVA: you must select model specification 5 (the explanatory variables must be categorical)

File Structure by Estimation Method
1. Spatially lagged dependent variable
   1. For spatial lag or spatial error models you may specify the spatial lag Wy explicitly
   2. However, the default is that SpaceStat computes the lag internally from the weights matrix and the dependent variable
2. Spatially lagged explanatory variables
3. Instrumental variables

Other Software Packages and Spatial Models
1. GWR: geographically weighted regression
2. Stata has spat* routines: spatwmat, spatgsa, spatcorr, spatlsa, spatdiag, spatreg
3. R has a variety of packages, including spdep
4. Spatial mixed model (e.g., SAS)
   1. Including the spatial panel model
5. Spatial generalized linear model (and mixed model)

Spatial Interpolation
1. There are many interpolation techniques
2. These methods require point locations (X and Y coordinates) and intensities
3. They include trend surface models, spatial expansion, local regression models (e.g., spatial spline models), kriging, and kernel density estimation

How Do We Model the Data Generating Process?
1. Trend surface method:
   z_i = a + b1*u_i + b2*v_i
2. Spatial expansion method:
   z_i = a_i + b_1i*X_1i + ... + b_ki*X_ki
   a_i = a0 + a1*u_i + a2*v_i
   b_1i = b10 + b11*u_i + b12*v_i
   b_ki = bk0 + bk1*u_i + bk2*v_i

Spatial Expansion
1. Spatial expansion is a mixed model:
   z_i = a0 + a1*u_i + a2*v_i + b10*x_1i + b11*u_i*x_1i + b12*v_i*x_1i + ... + bk0*x_ki + bk1*u_i*x_ki + bk2*v_i*x_ki
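The spatial expansion model just described amounts to adding the coordinates u and v and their interactions with each explanatory variable to an ordinary regression. The sketch below builds that design matrix and recovers a spatially drifting slope by OLS; the data, coordinates, and the first-order (linear) expansion are illustrative assumptions.

```python
# A minimal sketch of a first-order spatial expansion design matrix fit by OLS.
import numpy as np

def spatial_expansion_design(X, u, v):
    """Columns: 1, u, v, then for each x_k: x_k, u*x_k, v*x_k."""
    cols = [np.ones(len(u)), u, v]
    for k in range(X.shape[1]):
        cols += [X[:, k], u * X[:, k], v * X[:, k]]
    return np.column_stack(cols)

if __name__ == "__main__":
    rng = np.random.default_rng(9)
    n = 200
    u, v = rng.uniform(0, 10, n), rng.uniform(0, 10, n)   # locational coordinates
    X = rng.normal(size=(n, 1))
    # the "true" slope drifts across space: b(u, v) = 1 + 0.2*u - 0.1*v
    y = 2.0 + (1 + 0.2 * u - 0.1 * v) * X[:, 0] + rng.normal(scale=0.2, size=n)
    Z = spatial_expansion_design(X, u, v)
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    print(np.round(coef, 2))   # roughly [2, 0, 0, 1, 0.2, -0.1]
```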
Kernel Density Estimation
1. Strictly speaking, kernel density estimation is not an interpolation technique; it is the estimation of a probability surface
2. A smooth (and symmetrical) kernel function is placed over each point
3. The underlying density is estimated by summing the functions at all locations (points) on the surface to produce a smooth density surface

How Does Kernel Density Estimation Work?
1. "Kernels" are the chosen density functions placed over each point
2. There are many kernel density functions (CrimeStat supports 5: normal, uniform, quartic (spherical), triangular (conical), and negative exponential (peaked))
3. The main difference is that the normal kernel includes all points in the pattern, whereas the others have a circumscribed radius (a cut-off distance) within which points are included in the summation
4. The smoothness of the density function is a consequence of the bandwidth size (as shown in the figures)
5. Luckily, as long as the kernel function is symmetrical, the choice of kernel function generally doesn't make too much difference
6. Edge effects may also cause distortion
7. An intensity, or an exposure to risk, at each location results in 3-dimensional kernels

Geographically Weighted Regression
1. GWR is very similar to kernel density estimation in concept
2. It is also like EB (empirical Bayes) estimation of multilevel models in its extraction of information and parameter estimation
3. But it's really just WLS, with the locational coordinates doing the weighting

What is GWR?
1. OLS regression: y = B0 + B1*x1 + B2*x2 + u
2. GWR regression: y = B0(g) + B1(g)*x1 + B2(g)*x2 + u, where (g) indicates the location of the estimated parameters

GWR is weighted least squares
1. Matrix form of B:
   1. In OLS regression: (X'X)^-1 X'Y
   2. In GWR regression: (X' W(g) X)^-1 X' W(g) Y
2. W(g) weights the connection of the locational coordinates u_i, v_i to the regression point (see the code sketch following the example below)

GWR Regression Output
1. GWR will output a text file with regression estimates (both global and local)
2. It includes standard regression diagnostics (e.g., an ANOVA table) and random coefficient diagnostics

GWR Casewise Regression Output
PARM_1 ... PARM_n   Values of the estimates of the parameters at each regression point; n is one more than the number of independent variables, with PARM_1 containing the values of the intercept term
SVAL_1 ... SVAL_n   Values of the estimates of the standard errors of the parameters at each regression point; the numbering of these variables is as for the parameter estimate variables
TVAL_1 ... TVAL_n   Pseudo-t values
OBS                 Observed y variable value
PRED                Predicted y variable value
RESID               Unstandardised residual
HAT                 Leverage value
STDRES              Standardised residual
COOKSD              Cook's Distance
LOCRSQ              Pseudo-R2 values

GWR as a Diagnostic Tool for Non-Stationarity
1. Compare the normal sampling distribution of the regression parameters to their spatial distribution
   1. 2 x s.e. vs. IQR

Visualization of Non-Stationarity
1. Save an uncompressed ESRI export file (.e00)
2. Join to the attribute file and create choropleth (thematic) maps

Example: GWR Regression
Parameter                 2 x s.e.    IQR
relative deprivation      0.0489      0.14921
residential instability   0.006424    0.029074
%nh black                 0.00337     0.011177
% young male              0.017283    0.045791
(ln) pop size             0.118453    0.266346
% divorce                 0.018289    0.085312
West                      0.092001    0.549081
proactive police          0.060788    0.193305
Intercept                 0.614187    1.828351
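As flagged above, here is a minimal sketch of the GWR calibration at a single regression point: observations are weighted by a Gaussian kernel of their distance to the point and the local coefficients are (X'W(g)X)^-1 X'W(g)y. The fixed bandwidth, kernel choice, and data are illustrative assumptions; GWR software typically chooses the bandwidth by cross-validation or AIC and repeats the fit at every regression point.

```python
# A minimal sketch of geographically weighted regression at one regression point.
import numpy as np

def gwr_at_point(g, coords, X, y, bandwidth):
    d = np.sqrt(((coords - g) ** 2).sum(axis=1))          # distances to the regression point
    w = np.exp(-0.5 * (d / bandwidth) ** 2)               # Gaussian kernel weights
    Xd = np.column_stack([np.ones(len(y)), X])            # intercept plus predictors
    XtW = Xd.T * w                                        # X' W(g)
    return np.linalg.solve(XtW @ Xd, XtW @ y)             # local parameter estimates

if __name__ == "__main__":
    rng = np.random.default_rng(13)
    n = 300
    coords = rng.uniform(0, 50, size=(n, 2))
    X = rng.normal(size=(n, 1))
    slope = 0.5 + 0.03 * coords[:, 0]                     # slope drifts west to east
    y = 1.0 + slope * X[:, 0] + rng.normal(scale=0.2, size=n)
    for g in (np.array([5.0, 25.0]), np.array([45.0, 25.0])):
        print(g, np.round(gwr_at_point(g, coords, X, y, bandwidth=8.0), 2))
```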
Detecting Space-Time Clusters
1. Knox index
   1. Discrete treatment of space-time
2. Mantel index
   1. Continuous treatment of space-time
3. Space/time scans

Knox Index
1. The Knox statistic is simply a chi-square statistic
2. Each pair of points is compared in terms of distance and time interval
3. Distance is categorized as close in distance and not close in distance
4. The time interval is categorized as close in time and not close in time
5. There are N*(N-1)/2 pairs
(A code sketch of the Knox calculation appears at the end of these notes.)

Methods for Dividing Distance and Time
1. Mean distance and mean time interval
   1. This is the default in CrimeStat
2. Median distance and median time
3. User-defined criteria for distance and time separately

Observed Frequencies for Knox Index

Expected Frequencies for Knox Index

Chi-Square Based Test
1. Monte Carlo simulation of the chi-square value under spatial randomness
2. The random simulation is repeated K times (K selected by the user)
   1. where distance and time interval are selected from the range between the minimum and maximum distance and time

Mantel Index
1. The Mantel index is the continuous-measure counterpart to the Knox statistic
2. It is a correlation between distance and time interval for pairs of incidents
3. More formally, it is a general test for the correlation between two dissimilarity matrices
4. X_ij is an index of similarity between two observations, i and j, for distance, and Y_ij is an index of similarity between the same observations for time interval; T is a covariance measure
5. The covariance is then normed by dividing by the product of the standard deviations of X and Y
6. Thus the Mantel index is a correlation coefficient

Limitations of the Mantel Index
1. Like the usual correlation coefficient, the Mantel index is sensitive to distributional form and outliers
2. Because the test is a comparison of all pairs, N*(N-1)/2, the correlations tend to be very small, which makes it less intuitive for most analysts
3. The continuous treatment of time and space means that the sample size must be quite large to produce a stable estimate

Kulldorff's Scan: SaTScan
1. H0: the null spatial model is an inhomogeneous Poisson point process with an intensity mu proportional to the population at risk
2. H1: in some locations in the multidimensional space, the number of cases exceeds that predicted under the null model

How Does SaTScan Work?
1. A cylindrical window is moved systematically through the study's geographic and temporal space
2. The window is centered on an individual region centroid at a particular time and expanded to include neighboring regions and time intervals until it reaches a maximum size
3. The numbers of cases observed and expected within the window are calculated at each window size
4. The maximum size will not exceed 50% of the average population at risk for the study period and 50% of the study period span
5. The window is then centered on the next region centroid and the process is repeated
6. The hypotheses are evaluated with a maximum likelihood ratio test that examines whether the null or the alternative model better fits the data
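Finally, the Knox calculation referenced earlier, sketched with user-supplied closeness thresholds and a simple Monte Carlo reference distribution obtained by permuting the event times. CrimeStat's defaults (mean distance and mean time interval as thresholds) and its simulation scheme, which draws random distances and times rather than permuting, differ in detail; the thresholds, data, and function names here are illustrative assumptions.

```python
# A minimal sketch of a Knox-style space-time clustering test with permutation inference.
import numpy as np

def knox_close_pairs(xy, times, d_thresh, t_thresh):
    """Count the pairs of events that are close in both space and time."""
    iu = np.triu_indices(len(times), k=1)                          # the N*(N-1)/2 pairs
    sd = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(-1))[iu]
    td = np.abs(times[:, None] - times[None, :])[iu]
    return int(np.sum((sd <= d_thresh) & (td <= t_thresh)))

def knox_test(xy, times, d_thresh, t_thresh, n_sim=999, seed=0):
    rng = np.random.default_rng(seed)
    observed = knox_close_pairs(xy, times, d_thresh, t_thresh)
    sims = np.array([knox_close_pairs(xy, rng.permutation(times), d_thresh, t_thresh)
                     for _ in range(n_sim)])
    p = (1 + np.sum(sims >= observed)) / (n_sim + 1)               # pseudo p-value
    return observed, p

if __name__ == "__main__":
    rng = np.random.default_rng(17)
    xy = rng.uniform(0, 10, size=(80, 2))
    times = rng.uniform(0, 365, size=80)
    d0, t0 = 2.0, 30.0       # "close" thresholds in distance units and days (illustrative)
    print(knox_test(xy, times, d0, t0))
```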