Center for Teaching, Research and Learning Research Support

Center for Teaching, Research and Learning
Research Support Group
American University, Washington, D.C.
Hurst Hall 203
rsg@american.edu
(202) 885-3862
Advanced ArcGIS: Spatial Analysis
______________________________________________________________________________________________________________________________
WORKSHOP OBJECTIVE:
This workshop will cover:
1. Analyzing global patterns across a study area using methods like the average nearest
neighbor, Getis-Ord General G, Ripley’s K function, and global Moran’s I
2. Identifying local clusters or “hot spots” using Anselin local Moran’s I and Getis-Ord hot spot
analysis (Gi*)
3. Basic discussion of these statistics and how to interpret results
SCENARIO:
ArcGIS is a powerful tool for processing and visualizing spatial data, and it can also be used for more
advanced analysis using spatial statistics. This workshop is designed for users who wish to quantify
patterns in spatial data, such as identifying general patterns across a study area or finding local clusters.
A basic background in ArcGIS and statistics is helpful, although not required for this workshop.
I. STATISTICAL FOUNDATIONS
When observing a map of spatial data, there are often signs of global patterns or local clusters. Spatial
statistics provide a way to quantify these patterns and determine whether they are statistically
significant. Pattern analysis can be divided into two categories:
Global patterns
Tests for global patterns typically produce a single statistic that describes the overall clustering
tendency across a study area. Global patterns can be identified from the locations of points and/or from
attribute values associated with points or regions. These tests evaluate the null hypothesis that features
are randomly distributed. The null hypothesis is rejected if features are significantly more clustered or
more dispersed than expected under randomness.
Local clusters
Tests for local clusters usually produce a local statistic for each feature or region on a map. Hot spots
are places with unusually high or low values of these local statistics. Local clusters can also be
identified from the locations of points and/or from attribute values associated with points or regions.
Although there are a plethora of statistical tests for detecting these patterns, ArcGIS has four built-in
tools for analyzing global patterns and two for local clusters:
Global Patterns
• Average nearest neighbor
• Getis-Ord General G
• Ripley’s K function (multidistance clustering)
• Global Moran’s I (spatial autocorrelation)
Local Clusters
• Anselin local Moran’s I (cluster/outlier analysis)
• Getis-Ord hot spot analysis (Gi*)
The Average Nearest Neighbor tool measures the distance between each
feature centroid and its nearest neighbor's centroid location. It then averages
all these nearest neighbor distances. If the average distance is less than the
average for a hypothetical random distribution, the distribution of the
features being analyzed is considered clustered. If the average distance is
greater than a hypothetical random distribution, the features are considered
dispersed.
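As a rough illustration of the calculation described above, the following Python sketch computes the observed and expected mean distances, the nearest neighbor ratio, and a z-score. The function name and sample points are hypothetical. The expected-distance and standard-error formulas are the usual ones for this statistic under a random pattern; the ArcGIS tool may differ in detail (for example, it measures between feature centroids).

```python
import math


def average_nearest_neighbor(points, area):
    """Return (observed_mean, expected_mean, ratio, z_score) for 2-D points."""
    n = len(points)
    if n < 2:
        raise ValueError("at least two points are required")

    # Observed mean: average distance from each point to its nearest neighbor.
    nearest = []
    for i, (xi, yi) in enumerate(points):
        nearest.append(min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points) if j != i
        ))
    observed = sum(nearest) / n

    # Expected mean distance for a random pattern with the same n and area.
    expected = 0.5 / math.sqrt(n / area)
    # Standard error of the mean nearest neighbor distance under randomness.
    std_err = 0.26136 / math.sqrt(n * n / area)

    return observed, expected, observed / expected, (observed - expected) / std_err


# Hypothetical example: a regular 3 x 3 grid of points in a 3 x 3 study area.
grid = [(float(x), float(y)) for x in range(3) for y in range(3)]
observed, expected, ratio, z_score = average_nearest_neighbor(grid, area=9.0)
print(ratio > 1, z_score > 0)  # a regular grid is more dispersed than random
```

Because the expected distance depends on the area, the same points yield a different ratio when a different study area is supplied. This is why the measured area matters so much for this tool.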
The Getis-Ord General G tool is an inferential statistic, which means that the
results of the analysis are interpreted within the context of the null
hypothesis (no spatial clustering). If the z-score value is positive, the
observed General G index is larger than the expected General G index,
indicating high values for the attribute are clustered in the study area. If the
z-score value is negative, the observed General G index is smaller than the
expected index, indicating that low values are clustered in the study area.
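The comparison between the observed and expected General G can be sketched in Python as follows. This is a minimal sketch that assumes binary fixed-distance weights (1 if two features lie within a threshold distance, otherwise 0) and non-negative attribute values. The function name and data are hypothetical, and the variance needed for the z-score is omitted.

```python
import math


def general_g(points, values, threshold):
    """Return (observed_g, expected_g) using binary fixed-distance weights."""
    n = len(points)
    weighted = 0.0    # sum of w_ij * x_i * x_j over all pairs i != j
    total = 0.0       # sum of x_i * x_j over all pairs i != j
    weight_sum = 0.0  # sum of w_ij over all pairs i != j
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dist = math.hypot(points[i][0] - points[j][0],
                              points[i][1] - points[j][1])
            w = 1.0 if dist <= threshold else 0.0
            weighted += w * values[i] * values[j]
            total += values[i] * values[j]
            weight_sum += w
    return weighted / total, weight_sum / (n * (n - 1))


# Hypothetical data: high values grouped on the left, low values on the right.
pts = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
vals = [10, 10, 10, 1, 1, 1]
observed_g, expected_g = general_g(pts, vals, threshold=1.5)
print(observed_g > expected_g)  # high values cluster, so observed exceeds expected
```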
Ripley’s K-function summarizes spatial dependence (clustering or
dispersion) over a range of distances. When exploring spatial patterns at
multiple distances and spatial scales, patterns change, often reflecting the
dominance of particular spatial processes at work. A Distance Band or
Threshold Distance is needed for the analysis.
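To show how the result can change with distance, the sketch below computes L(d), one common transformation of the K function, at two distances. It assumes no edge correction and uses hypothetical points. Under a random pattern L(d) is roughly equal to d, so values above d suggest clustering at that distance and values below d suggest dispersion.

```python
import math


def l_function(points, area, distances):
    """Return [(d, L(d)), ...] with no edge correction applied."""
    n = len(points)
    results = []
    for d in distances:
        # Count ordered pairs (i, j), i != j, that lie within distance d.
        pairs = sum(
            1
            for i in range(n)
            for j in range(n)
            if i != j and math.hypot(points[i][0] - points[j][0],
                                     points[i][1] - points[j][1]) <= d
        )
        results.append((d, math.sqrt(area * pairs / (math.pi * n * (n - 1)))))
    return results


# Hypothetical data: two tight groups of four points in a 10 x 10 study area.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (9, 9), (9, 10), (10, 9), (10, 10)]
for d, l_value in l_function(pts, area=100.0, distances=[1.5, 5.0]):
    # L(d) > d suggests clustering at that distance; L(d) < d suggests dispersion.
    print(d, round(l_value, 2), "clustered" if l_value > d else "dispersed")
```

The same points look clustered at the short distance and dispersed at the longer one, which is the scale dependence the paragraph above describes.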
The Global Moran’s I tool measures spatial autocorrelation based on both
feature locations and feature values simultaneously. Given a set of features
and an associated attribute, it evaluates whether the pattern expressed is
clustered, dispersed, or random. The tool calculates the Moran’s I Index
value and both a z-score and p-value to evaluate the significance of that
Index.
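The index itself can be sketched in a few lines of Python. The example assumes binary contiguity weights supplied as a hypothetical neighbor dictionary, and it returns only the index and its expected value under randomness, -1/(n - 1). The z-score and p-value that the tool also reports are not computed here.

```python
def global_morans_i(values, neighbors):
    """Return (I, expected_I) using binary weights from a neighbor dictionary."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]

    # Cross-products of deviations for every neighboring pair (w_ij = 1).
    cross = sum(dev[i] * dev[j] for i, js in neighbors.items() for j in js)
    s0 = sum(len(js) for js in neighbors.values())  # sum of all weights
    variance_term = sum(d * d for d in dev)

    return (n / s0) * cross / variance_term, -1.0 / (n - 1)


# Hypothetical data: six features in a row, each adjacent to the next.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
clustered_i, expected_i = global_morans_i([1, 1, 1, 5, 5, 5], chain)
alternating_i, _ = global_morans_i([1, 5, 1, 5, 1, 5], chain)
print(clustered_i > expected_i, alternating_i < expected_i)
```

An index above the expected value points toward clustering of similar values, and an index below it points toward dispersion.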
In Anselin local Moran’s I (cluster/outlier analysis), a positive value for I
indicates that a feature has neighboring features with
similarly high or low attribute values; this feature is part of a cluster. A
negative value for I indicates that a feature has neighboring features with
dissimilar values; this feature is an outlier. In either instance, the p-value for
the feature must be small enough for the cluster or outlier to be considered
statistically significant.
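The sign logic described above can be illustrated with a short Python sketch of one common formulation of the local Moran statistic. It assumes binary neighbor weights and hypothetical data, and it labels every feature as HH, LL, HL, or LH from the signs alone. It performs no significance test, whereas ArcGIS assigns a cluster type only to statistically significant features.

```python
def local_morans_i(values, neighbors):
    """Return a list of (I_i, cluster_type) using binary neighbor weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_sq = sum(d * d for d in dev)

    results = []
    for i in range(n):
        # Variance of all other features, excluding feature i itself.
        s2 = (total_sq - dev[i] ** 2) / (n - 1)
        lag = sum(dev[j] for j in neighbors[i])  # sum of neighbors' deviations
        local_i = dev[i] / s2 * lag

        if local_i > 0:    # similar neighbors: part of a cluster
            kind = "HH" if dev[i] > 0 else "LL"
        elif local_i < 0:  # dissimilar neighbors: an outlier
            kind = "HL" if dev[i] > 0 else "LH"
        else:
            kind = ""
        results.append((local_i, kind))
    return results


# Hypothetical data: seven features in a row, each adjacent to the next.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 6], 6: [5]}
results = local_morans_i([9, 9, 9, 1, 1, 1, 9], chain)
print([kind for _, kind in results])
```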
The Hot Spot Analysis tool calculates the Getis-Ord Gi* statistic for each
feature in a dataset. The resultant z-scores and p-values tell you where
features with either high or low values cluster spatially. This tool works by
looking at each feature within the context of neighboring features. A feature
with a high value is interesting but may not be a statistically significant hot
spot. To be a statistically significant hot spot, a feature will have a high
value and be surrounded by other features with high values as well. The
local sum for a feature and its neighbors is compared proportionally to the
sum of all features; when the local sum is very different from the expected
local sum, and that difference is too large to be the result of random chance,
a statistically significant z-score results.
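The comparison of the local sum with its expected value can be sketched as follows. This minimal Python example assumes binary fixed-distance weights in which each feature counts as its own neighbor, and it uses hypothetical points and values. The function returns one Gi* z-score per feature.

```python
import math


def gi_star_z_scores(points, values, threshold):
    """Return one Gi* z-score per feature using binary fixed-distance weights."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum(v * v for v in values) / n - mean * mean)

    z_scores = []
    for xi, yi in points:
        # A feature is its own neighbor (distance 0), which is what the * denotes.
        w = [1.0 if math.hypot(xi - xj, yi - yj) <= threshold else 0.0
             for xj, yj in points]
        w_sum = sum(w)
        w_sq_sum = sum(wj * wj for wj in w)
        local_sum = sum(wj * v for wj, v in zip(w, values))

        expected_local_sum = mean * w_sum
        scale = std * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        z_scores.append((local_sum - expected_local_sum) / scale)
    return z_scores


# Hypothetical data: high values grouped on the left, low values on the right.
pts = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
vals = [10, 10, 10, 1, 1, 1]
z = gi_star_z_scores(pts, vals, threshold=1.5)
print(z[1] > 0, z[4] < 0)  # positive near high values, negative near low values
```

The sketch divides by zero if every value is identical or if the threshold is so large that every feature neighbors every other feature, so it is suitable for illustration only.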
II. USING AVERAGE NEAREST NEIGHBOR
We are going to examine whether EMS calls for a fire station have a tendency to cluster. If so, the
fire station may consider stationing emergency intensive care units at locations near hot spots. We
will use the average nearest neighbor to decide whether or not to reject the following null
hypothesis:
“EMS calls for service in February 2007 are randomly distributed across the study area.”
• Launch ArcMap and open an existing map.
• Navigate to J:/CLASSES/Workshops > GIST2 folder, then to Maps and open Tutorial 81.mxd.
• Open the attribute table for the Battalion2 layer and make note of the square footage of the study
area in the Shape_Area field. The Average Nearest Neighbor tool is very sensitive to the area of
study, so it is important to record the measured area of Battalion2.
• Open the properties of the EMS Calls-Feb07 layer and examine the definition query that has
been created.
Analyze patterns with the average nearest neighbor tool
• Use the Search window (or CTRL + F) to locate and open the Average Nearest Neighbor tool.
• Enter parameters as follows:
    Input Feature Class: EMS Calls-Feb07
    Distance Method: EUCLIDEAN_DISTANCE
    Check the Generate Report box
    Area: 520175356
• Select OK.
• When the process is complete, a pop-up box will appear at the bottom right of the screen. Click
on the box. You can also access the results from the Main Menu by selecting Geoprocessing >
Results.
• Make a note of the following values from the results:
    Observed mean distance _______
    Expected mean distance _______
    Nearest neighbor ratio ________
    Z-score ___________
• In the Results dialog box, double-click the Report File to open the graphic display of the results.
• Under the null hypothesis of randomly distributed EMS calls, we expect the nearest neighbor
ratio to be 1. A ratio greater than 1 indicates that calls are more dispersed than expected, while a
ratio less than 1 indicates that calls are more clustered than expected. We observe a ratio of 0.749,
which indicates clustering.
• The z-score is -12.12, which is significant at the 0.01 level. We can therefore reject the null
hypothesis: there is less than a 1 percent likelihood that this clustered pattern is the result of
random chance.
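The significance statement above rests on converting the z-score to a p-value under the standard normal distribution. The sketch below shows that conversion and a simple decision rule. The function names and the default 0.01 threshold are illustrative assumptions, not part of the ArcGIS report.

```python
import math


def two_sided_p_value(z_score):
    """Two-sided p-value for a z-score under the standard normal distribution."""
    return math.erfc(abs(z_score) / math.sqrt(2))


def interpret_nearest_neighbor(ratio, z_score, alpha=0.01):
    """Describe the pattern implied by a nearest neighbor ratio and z-score."""
    if two_sided_p_value(z_score) >= alpha:
        return "not significantly different from random"
    return "clustered" if ratio < 1 else "dispersed"


# Values reported in the workshop results.
print(interpret_nearest_neighbor(ratio=0.749, z_score=-12.12))  # clustered
```

A z-score beyond roughly ±2.58 corresponds to a two-sided p-value below 0.01, so -12.12 is far into the rejection region.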
III. PERFORMING CLUSTER/OUTLIER ANALYSIS
We are now going to examine where hot and cold spots are for the EMS calls. We will then overlay
census data to see if there is a relationship between the two datasets.
• In the Main Menu, select File > Open. Navigate to J:/CLASSES/Workshops > GIST2 folder,
then to Maps and open Tutorial 9-1.mxd.
Perform cluster/outlier analysis with Local Moran’s I Analysis tool
Running the Cluster/Outlier Analysis with Rendering tool
• Use the Search window (or CTRL + F) to locate and open the Cluster/Outlier Analysis with
Rendering tool.
• Enter parameters as follows:
    Input Feature Class: Calls for Service-Feb07
    Input Field: FEE
    Output Layer File: \GIST2\MyExercises\MoransICluster
    Output Feature Class: \GIST2\MyExercises\MyData.mdb\MoransICluster
• Select OK. This will create a new feature class with the results of the Local Moran’s I analysis.
• In the new feature class, the red dots symbolize hot spots: features that are surrounded by other
features with similar values (either high or low), which form clusters. Blue dots are cold spots:
features that are surrounded by features with dissimilar values, which are spatial outliers.
• Open the attribute table for the MoransICluster layer and note the local test statistics. Hot spots
are represented by high positive z-scores while cold spots have low negative z-scores.
Combining Z-scores with attribute values
• Hot/cold spots do not indicate whether the underlying values of the input field are high or low.
To show this, we will symbolize a copy of the layer by its cluster type field.
• Create a copy of the layer MoransICluster in the Table of Contents (right-click on Zoning to
paste) and rename it ClusterTypes.
• In the Table of Contents, right-click on ClusterTypes > Properties > Symbology tab.
• In the left-hand menu, under Categories, select Unique Values. On the right-hand side, under
Value Field, select CO Type IDW 15706. Select Add Values > Complete List. Highlight HH,
HL, LH, LL and select OK. Uncheck <all other values>.
• Change the symbols to Circle 1 with a size of 50. Change the color of HH to Fire Red, LL to
Leaf Green, and both HL and LH to Solar Yellow. Under the Display tab, type 40% for
transparency. Select OK.
• In the Table of Contents (List by Drawing Order), move the ClusterTypes layer underneath the
MoransICluster layer.
• Red symbols show where high values are clustering with other high values (HH). Green symbols
show where low values are clustering with other low values (LL). Yellow symbols mark outliers:
a high value surrounded by low values (HL) or a low value surrounded by high values (LH).
Overlaying Census layer with EMS layer
• Turn on the Census 2010 layer. This overlays population density data from the Census.
• Note that hot spots (red dots) occur in areas of both high and medium population density.
Cold spots occur in all population ranges.
IV. STATISTICAL APPENDIX
V. ADDITIONAL RESOURCES

www.esriurl.com/spatialstats
VI. OTHER NOTES
ArcGIS is available on computers in the Hurst 202 and 203 labs and in the Anderson Computing Labs.
For a full list of our other workshops, go to http://www.american.edu/ctrl/rsgevents.cfm
Assistance with ArcGIS is also available in the CTRL lab during normal business hours. For more
information, go to http://www.american.edu/ctrl/lab.cfm