Center for Teaching, Research and Learning
Research Support Group
American University, Washington, D.C.
Hurst Hall 203
rsg@american.edu
(202) 885-3862

Advanced ArcGIS: Spatial Analysis
______________________________________________________________________________

WORKSHOP OBJECTIVE:
This workshop will cover:
1. Analyzing global patterns across a study area using methods like the average nearest neighbor, Getis-Ord General G, Ripley’s K function, and global Moran’s I
2. Identifying local clusters or “hot spots” using Anselin local Moran’s I and Getis-Ord hot spot analysis (Gi*)
3. Basic discussion of these statistics and how to interpret results

SCENARIO:
ArcGIS is a powerful tool for processing and visualizing spatial data, and it can also be used for more advanced analysis using spatial statistics. This workshop is designed for users who wish to quantify patterns in spatial data, such as identifying general patterns across a study area or finding local clusters. A basic background in ArcGIS and statistics is helpful, although not required for this workshop.

I. STATISTICAL FOUNDATIONS

When observing a map of spatial data, there are often signs of global patterns or local clusters. Spatial statistics provide a way to quantify these patterns and determine whether they are statistically significant. Pattern analysis can be divided into two categories:

Global patterns
Tests for this often entail one statistic that describes the overall clustering tendency across a study area. Global patterns can be identified by location of points and/or attribute values associated with points or regions. These test the null hypothesis that features are randomly distributed. The null hypothesis is rejected if points are closer or more dispersed than expected under randomness.

Local clusters
Tests for this usually produce local statistics for each region on a map.
Hot spots are places with unusually high or low values of these local statistics. Local clusters can also be identified by location of points and/or attribute values associated with points or regions.

Although there is a plethora of statistical tests for detecting these patterns, ArcGIS has four built-in tools for analyzing global patterns and two for local clusters:

Global Patterns
   Average nearest neighbor
   Getis-Ord General G
   Ripley’s K function (multidistance clustering)
   Global Moran’s I (spatial autocorrelation)
Local Clusters
   Anselin local Moran’s I (cluster/outlier analysis)
   Getis-Ord hot spot analysis (Gi*)

The Average Nearest Neighbor tool measures the distance between each feature centroid and its nearest neighbor's centroid location. It then averages all these nearest neighbor distances. If the average distance is less than the average for a hypothetical random distribution, the distribution of the features being analyzed is considered clustered. If the average distance is greater than the average for a hypothetical random distribution, the features are considered dispersed.

The Getis-Ord General G tool computes an inferential statistic, which means that the results of the analysis are interpreted within the context of the null hypothesis (no spatial clustering). If the z-score is positive and statistically significant, the observed General G index is larger than the expected General G index, indicating that high values for the attribute are clustered in the study area. If the z-score is negative and statistically significant, the observed General G index is smaller than the expected index, indicating that low values are clustered in the study area.

Ripley’s K-function summarizes spatial dependence (clustering or dispersion) over a range of distances. Patterns often change as the distance and spatial scale of the analysis change, reflecting the dominance of particular spatial processes at each scale. A Distance Band or Threshold Distance is needed for the analysis.
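
The Average Nearest Neighbor calculation described above can be sketched in a few lines of Python. This is only an illustration, not the ArcGIS implementation: the function name and sample points are hypothetical, and it assumes the standard formulation in which the expected mean distance for a random pattern is 0.5 / sqrt(n / A) and the standard error is 0.26136 / sqrt(n^2 / A), where n is the number of features and A is the study area.

```python
import math

def average_nearest_neighbor(points, area):
    """Return (observed mean, expected mean, ratio, z-score) for a point set.

    points: list of (x, y) tuples in the same linear units as the area.
    area:   size of the study area (units squared); results are sensitive to it.
    Hypothetical helper for illustration, not the ArcGIS tool itself.
    """
    n = len(points)
    if n < 2:
        raise ValueError("at least two points are required")
    if area <= 0:
        raise ValueError("area must be positive")

    # Distance from each point to its nearest neighbor (brute force, O(n^2)).
    nearest = []
    for i, (xi, yi) in enumerate(points):
        nearest.append(min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points) if j != i
        ))

    observed = sum(nearest) / n                 # observed mean distance
    expected = 0.5 / math.sqrt(n / area)        # mean distance under randomness
    ratio = observed / expected                 # < 1 clustered, > 1 dispersed
    std_error = 0.26136 / math.sqrt(n * n / area)
    z_score = (observed - expected) / std_error
    return observed, expected, ratio, z_score

# A regular 4 x 4 grid in a 16-unit study area is dispersed: the ratio is 2.0.
grid = [(x, y) for x in range(4) for y in range(4)]
print(average_nearest_neighbor(grid, 16.0))
```

Because the expected distance depends directly on the area argument, the same points can appear clustered or dispersed depending on the study area supplied.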
The Global Moran’s I tool measures spatial autocorrelation based on both feature locations and feature values simultaneously. Given a set of features and an associated attribute, it evaluates whether the pattern expressed is clustered, dispersed, or random. The tool calculates the Moran’s I Index value and both a z-score and p-value to evaluate the significance of that Index.

The Anselin Local Moran’s I tool (cluster/outlier analysis) calculates a local I value for each feature. A positive value for I indicates that a feature has neighboring features with similarly high or low attribute values; this feature is part of a cluster. A negative value for I indicates that a feature has neighboring features with dissimilar values; this feature is an outlier. In either instance, the p-value for the feature must be small enough for the cluster or outlier to be considered statistically significant.

The Hot Spot Analysis tool calculates the Getis-Ord Gi* statistic for each feature in a dataset. The resultant z-scores and p-values tell you where features with either high or low values cluster spatially. This tool works by looking at each feature within the context of neighboring features. A feature with a high value is interesting but may not be a statistically significant hot spot. To be a statistically significant hot spot, a feature must have a high value and be surrounded by other features with high values as well. The local sum for a feature and its neighbors is compared proportionally to the sum of all features; when the local sum is very different from the expected local sum, and that difference is too large to be the result of random chance, a statistically significant z-score results.

II. USING AVERAGE NEAREST NEIGHBOR

We are going to examine whether EMS calls for a fire station have a tendency to cluster. If so, the fire station may consider stationing emergency intensive care units at locations near hot spots.
We will use the average nearest neighbor to decide whether or not to reject the following null hypothesis: “EMS calls for service in February 2007 are randomly distributed across the study area.”

Launch ArcMap and open an existing map. Navigate to J:/CLASSES/Workshops > GIST2 folder, then to Maps and open Tutorial 81.mxd.
Open the attribute table for the Battalion2 layer and make note of the square footage of the study area in the Shape_Area field. The Average Nearest Neighbor tool is very sensitive to the area of study, so it is important to record the measured area of Battalion2.
Open the properties of the EMS Calls-Feb07 layer and examine the definition query that has been created.

Analyze patterns with the average nearest neighbor tool

Use the Search window (or CTRL + F) to locate and open the Average Nearest Neighbor tool. Enter parameters as follows:
   Input Feature Class: EMS Calls-Feb07
   Distance Method: EUCLIDEAN_DISTANCE
   Check the Generate Report box
   Area: 520175356
Select OK.
When the process is complete, there will be a pop-up box at the bottom right of the screen. Click on the box. You can also access the results by selecting Geoprocessing > Results from the Main Menu.
Make a note of the following values from the results:
   Observed mean distance _______
   Expected mean distance _______
   Nearest neighbor ratio ________
   Z-score ___________
In the Results dialog box, double-click the Report File to open the graphic display of the results.

Under the null hypothesis of randomly distributed EMS calls, we expect the nearest neighbor ratio to be 1. A ratio greater than 1 indicates more dispersed calls than expected, while a ratio less than 1 indicates more clustered calls than expected. We observe a ratio of 0.749, which indicates clustering. The z-score is -12.12, which is significant at the 0.01 level. We therefore reject the null hypothesis: there is less than a 1 percent likelihood that this clustered pattern is the result of random chance.

III. PERFORMING CLUSTER/OUTLIER ANALYSIS

We are now going to examine where hot and cold spots are for the EMS calls. We will then overlay census data to see if there is a relationship between the two datasets.

In the Main Menu, select File > Open. Navigate to J:/CLASSES/Workshops > GIST2 folder, then to Maps and open Tutorial 9-1.mxd.

Perform cluster/outlier analysis with Local Moran’s I Analysis tool

Running the Cluster/Outlier Analysis with Rendering tool

Use the Search window (or CTRL + F) to locate and open the Cluster/Outlier Analysis with Rendering tool. Enter parameters as follows:
   Input Feature Class: Calls for Service-Feb07
   Input Field: FEE
   Output Layer File: \GIST2\MyExercises\MoransICluster
   Output Feature Class: \GIST2\MyExercises\MyData.mdb\MoransICluster
Select OK. This will create a new feature class with the results of the Local Moran’s I analysis.

In the new feature class, the red dots symbolize hot spots: features that are surrounded by other features with similar values (either high or low), that is, clusters. Blue dots are cold spots: features that are surrounded by features with dissimilar values, that is, spatial outliers.
Open the attribute table for the MoransICluster layer and note the local test statistics. Hot spots are represented by high positive z-scores while cold spots have low negative z-scores.

Combining Z-scores with attribute values

Hot/cold spots do not indicate anything about the actual value of the input field, so we need to add another field for the number of EMS calls.

Create a copy of the layer MoransICluster in the Table of Contents (right-click on Zoning to paste) and rename it ClusterTypes.
In the Table of Contents, right-click on ClusterTypes > Properties > Symbology tab.
In the left-hand menu, under Categories, select Unique Values.
On the right-hand side, under Value Field, select CO Type IDW 15706.
Select Add Values > Complete List. Highlight HH, HL, LH, LL and select OK.
Uncheck <all other values>.
Change the symbols to Circle 1 with a size of 50.
Change the color of HH to Fire Red, LL to Leaf Green, and both HL and LH to Solar Yellow.
Under the Display tab, type 40% for transparency. Select OK.
In the Table of Contents (List by Drawing Order), move the ClusterTypes layer underneath the MoransICluster layer.

Red symbols show where high values are clustering with other high values. Green symbols show where low values are clustering with other low values. Yellow symbols mark outliers, where a high value is surrounded by low values (HL) or a low value is surrounded by high values (LH).

Overlaying Census layer with EMS layer

Turn on the Census 2010 layer. This overlays population density data from the Census. Note that hot spots (red dots) occur in areas of both high and medium population density. Cold spots occur in all population ranges.

IV. STATISTICAL APPENDIX

V. ADDITIONAL RESOURCES

www.esriurl.com/spatialstats

VI. OTHER NOTES

ArcGIS is available on computers in the Hurst 202 and 203 labs and in the Anderson Computing Labs. For a full list of our other workshops, go to http://www.american.edu/ctrl/rsgevents.cfm

Assistance with ArcGIS is also available in the CTRL lab during normal business hours. For more information, go to http://www.american.edu/ctrl/lab.cfm