Spatial Statistics I
RESM 575
Spring 2010
Lecture 7
Today
Part A: Spatial statistics
What are spatial measurements and
statistics?
Part B:
Measuring geographic distributions
Testing statistical significance
Identifying patterns
And if we have time…
Spatial statistics (continued)
Defining spatial neighborhoods and weights
Identifying clusters
Using statistics with geographic data
Analyzing geographic relationships
Part A: Spatial Statistics
Spatial Measurements
GIS analysis
In many cases the map is the analysis
Or GIS tools and methods are used to create new data, which is then displayed to analyze and draw conclusions
Issues with traditional GIS analysis
Making a map may not be enough to get the
answers you need
Trying to draw conclusions from a map is not
always easy
Mapping questions
How to classify features?
How to symbolize features?
How to show relationships?
New spatial statistic tools
To describe the
distribution of a set of
features
To discern patterns
To measure relationships
between features
Interdisciplinary contributions
Geographers
Regional scientists
Ecologists
Geologists
Economists
Wildlife biologists
Sociologists
New spatial stat tools
Rely on statistics to cut through the map
display
Get right at the patterns and relationships in
the data
Space or location is a fundamental
component of the stats
New questions from spatial stats
How sure am I that the pattern I am seeing
isn’t due to some random occurrence?
To what extent does the value of a feature
depend on the values of surrounding
features?
How well does the value of one attribute
predict the value of another?
What are the trends in the data?
What we will use spatial stats to find
1. How features are distributed
2. What is the pattern created by the features
3. Where are the clusters
4. What are the relationships between sets of features or values
Spatial statistics
Exploratory tools
Help you measure processes, distributions,
and relationships
Spatial pattern and processes
“results are statistically significant at the .05
level”
Understanding your data
In general you can analyze features using
location alone or using location influenced by
an attribute value
The type of attribute values will influence the
statistical method you use
Geographic features
Are either:
Discrete
or
Spatially continuous
(Figure: spatially continuous surface)
Geographic features
Spatially continuous categorical data
Attribute values
Include
Nominal
Also referred to as categorical data
Ordinal
Ranked data from high to low (soil suitability scores)
Interval
Quantities, tell us relative magnitude
Ratio
Relationship between quantities
Choosing a spatial stat method
Depends on the type of data you have
For example:
You can analyze the distribution of discrete data
themselves or the distribution of an attribute
associated with the features
For spatially continuous data, you are interested
in the distribution of the values
Frame the question
In inferential stats the analysis is stated as a
hypothesis
Landslides in this area tend to occur more
frequently on slopes over 30%
To ensure impartiality, we structure the
analysis assuming the opposite of the
hypothesis is true
Landslides are equally likely to occur on any type
of slope
Test the significance of the statistic
The null hypothesis:
there is no pattern or relationship
Significance tests help us decide whether we
should or should not reject the null
hypothesis
Must also decide on a significance level between
0 and 1: 0.20 if you want to be 80% sure, 0.05 if
you want to be 95% sure that the
clusters did not occur by chance
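The link between "how sure" you want to be and the cutoff used in a significance test can be sketched in Python. This is only an illustration: it assumes a two-tailed test against a standard normal distribution, and the function name `critical_z` is made up for this example.

```python
from statistics import NormalDist

def critical_z(confidence):
    """Two-tailed critical Z value for a confidence such as 0.95.

    Assumes the test statistic follows a standard normal distribution.
    """
    alpha = 1.0 - confidence            # significance level, e.g. 0.05
    # Split alpha evenly between the two tails of the distribution.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for confidence in (0.80, 0.95, 0.99):
    alpha = 1.0 - confidence
    print(f"{confidence:.0%} sure: significance level {alpha:.2f}, "
          f"critical Z = +/-{critical_z(confidence):.2f}")
```

Being more sure means a smaller significance level and a larger critical Z, so a pattern has to be more extreme before the null hypothesis is rejected.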
Question the results!
Even if statistically significant, question them
Consider:
Scale you’re working at
Where the study boundaries fall
Type of data you are using
Quality of the data
How did you define proximity between features
(straight line vs travel time on roads)
Part B: Spatial Statistics
Measuring geographic
distributions
Testing statistical significance
Identifying patterns
Measuring geographic distributions
Identify spatial characteristics of a distribution
Where is the center?
What feature is most central?
How are features dispersed around the
center?
ArcTools
in
ArcGIS
Where is the center?
Mean Center tool
Computes the average X and Y coordinates of all
features
Creates a new point feature
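The calculation behind a mean center can be sketched in a few lines of Python. This is not the ArcGIS tool itself, only the averaging it describes; the coordinates are hypothetical.

```python
def mean_center(points):
    """Return the average X and average Y of a list of (x, y) tuples."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return mean_x, mean_y

# Hypothetical feature coordinates (map units).
points = [(0.0, 0.0), (4.0, 0.0), (4.0, 6.0), (0.0, 6.0)]
print(mean_center(points))
```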
Mean center tool
More common use:
To compare distributions of different types of
features or to find the center of features based on
an attribute value
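Finding the center based on an attribute value amounts to a weighted average of the coordinates. The Python sketch below assumes one numeric weight per feature; the store locations and sales figures are hypothetical.

```python
def weighted_mean_center(points, weights):
    """Average of (x, y) coordinates, each weighted by an attribute value."""
    total = sum(weights)
    x = sum(w * px for (px, _), w in zip(points, weights)) / total
    y = sum(w * py for (_, py), w in zip(points, weights)) / total
    return x, y

# Hypothetical store locations and their sales (the weight attribute).
stores = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
sales = [1.0, 1.0, 2.0]
print(weighted_mean_center(stores, sales))
```

The store with the largest weight pulls the center toward itself, which is why the weighted and unweighted centers of the same features can differ.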
What is the most central feature?
Central feature tool
Identifies the most centrally located feature
Feature having the lowest total distance to all other
features
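The "lowest total distance" rule can be sketched in Python as follows. It assumes straight-line (Euclidean) distance between points; the coordinates are hypothetical.

```python
from math import dist

def central_feature(points):
    """Return (index, total distance) of the point closest to all others."""
    best_index, best_total = None, float("inf")
    for i, p in enumerate(points):
        # Sum of straight-line distances from this point to every point.
        total = sum(dist(p, q) for q in points)
        if total < best_total:
            best_index, best_total = i, total
    return best_index, best_total

# Hypothetical sites along a road.
sites = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (7.0, 0.0), (20.0, 0.0)]
index, total = central_feature(sites)
print(index, total)
```

Unlike the mean center, the result is always one of the existing features, and a distant outlier such as the last site moves it much less.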
What is the most central feature?
More interesting is adding a weight to the
analysis, such as population
We are now finding not just which site is most
central but which is the most accessible to
the greatest number of people.
Measuring feature distribution
Standard distance tool
Measures distribution of features around the
mean
Result is a summary statistic representing
distance
If circle is large, incidents are widespread
If small, incidents are more localized
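A minimal Python sketch of the standard distance, assuming the usual definition (the square root of the average squared distance of each feature from the mean center). The incident coordinates are hypothetical.

```python
from math import sqrt

def standard_distance(points):
    """Return (mean center, standard distance) for a list of (x, y) tuples."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Average squared distance of each point from the mean center.
    mean_sq = sum((x - mx) ** 2 + (y - my) ** 2 for x, y in points) / n
    return (mx, my), sqrt(mean_sq)

# Hypothetical incident locations: one widespread set, one localized set.
widespread = [(0.0, 0.0), (4.0, 0.0), (4.0, 6.0), (0.0, 6.0)]
localized = [(1.9, 2.9), (2.1, 2.9), (2.1, 3.1), (1.9, 3.1)]
print(standard_distance(widespread))
print(standard_distance(localized))
```

Both sets share the same mean center, but the radius of the circle drawn around it is much smaller for the localized set.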
Distributional trends
Directional Distribution (Standard Deviational Ellipse) tool
Identify spatial trends in the distribution of
features
Uses
Compare distributions
Examine different time periods
Show compactness and orientation
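One way to get the orientation and compactness of a point distribution is from the covariance of the X and Y coordinates. The Python sketch below uses that approach as a simplification; it is an assumption of this example and may not match the ArcGIS tool's exact scaling or its convention for reporting rotation. The points are hypothetical.

```python
from math import atan2, degrees, sqrt

def deviational_ellipse(points):
    """Return (center, major axis length, minor axis length, angle).

    The angle is in degrees counterclockwise from the X axis. Axis lengths
    are one standard deviation along each principal direction.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Closed-form eigenvalues of the 2x2 covariance matrix.
    half_trace = (sxx + syy) / 2.0
    spread = sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    major = sqrt(half_trace + spread)
    minor = sqrt(max(half_trace - spread, 0.0))  # guard against rounding
    angle = degrees(0.5 * atan2(2.0 * sxy, sxx - syy))
    return (mx, my), major, minor, angle

# Hypothetical features strung out along a northeast-trending line.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
print(deviational_ellipse(points))
```

A long major axis relative to the minor axis indicates a strong directional trend; two ellipses for different time periods can be compared by center, size, and angle.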
Testing statistical significance
The tools for identifying patterns (next) and
for spatial relationships (later) let us perform
significance tests on the results before
accepting them
Using significance tests with spatial data
Spatial data violates some of the
assumptions of inferential statistics
You need to be aware of these limitations!
Assumptions
Testing a random sample
With GIS data in a database you may not know if
the data was randomly sampled
How large is the sample in relation to the
population?
Even with randomness assumed, spatial data
often violates the independence of
observations in a sample
Spatial data is rarely evenly distributed across a region
For spatial pattern analysis…
The null hypothesis is that features are
randomly distributed across the study area
Hard to imagine this being true
You have to make one of two common
sampling assumptions: randomization or
normalization
Identifying Patterns
Why study patterns?
Range from completely clustered to completely
dispersed
Identifying patterns (applications)
Forestry applications
USFS may measure the pattern of clear cuts to
ensure sufficient contiguous forest habitat remains
Agency may allow a level of clustering of clear cuts
and then make sure it is not exceeded
Wildlife studies
If a population is dispersed, the species can live in a
wide range of habitats; if clustered, it has very
specific habitat requirements
Goal for analyzing spatial patterns
Are there underlying spatial processes
influencing the locations of our features?
Are our features randomly located throughout
the study area, or are they displaying a
clustering or dispersed pattern?
Two approaches for analyzing spatial
patterns
Global calculations
Identifies overall patterns or
trends in the data
Effective for complex messy
data
Interested in broad overall
results
Work by comparing feature
locations and/or attributes to
a theoretical random
distribution to determine if
you have statistically
significant clustering or
dispersion
2nd approach
Local calculations
Identify the extent and
locations of clustering
Answer where do we
have spatial clustering
Process every feature
within the context of
its neighboring
features in order to
determine whether it
represents a spatial
outlier, or if part of a
statistically significant
spatial cluster
Are features clustered?
Can be found with the Nearest Neighbor Index
which does not require specifying an attribute
If based on an attribute, we can test for spatial
autocorrelation using the Spatial Autocorrelation (Moran’s I) tool
Things that are closer together are more alike than things that are
farther apart
Measures similarity of neighboring features
Identifies if features are clustered or dispersed
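A minimal Python sketch of the global Moran's I statistic, using its commonly published form. The spatial weights matrix is an assumption of this example (1 for adjacent features, 0 otherwise), the values are hypothetical, and the Z score the tool reports is not computed here.

```python
def morans_i(values, weights):
    """Global Moran's I for attribute values and an n x n weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)       # sum of all weights
    # Cross-products of deviations for every pair of neighboring features.
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * num / den

# Four hypothetical features in a row; each is a neighbor of the next one.
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(morans_i([1, 1, 5, 5], w))   # similar values sit side by side
print(morans_i([1, 5, 1, 5], w))   # high and low values alternate
```

Positive values indicate that neighboring features are similar (clustering of like values); negative values indicate that neighbors tend to differ (dispersion). With no spatial autocorrelation the expected value is -1/(n - 1), which is close to zero for large n.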
Output for this tool
Output gives a graphical display of results in
four different ways
Statistical #s
Pictorial of #s
Significance
Sentence
Example
Compare the Z scores for two different years of
data: if the Z score increases, clustering has
become more intense
Locate the hot spots
This is a local question that requires the Hot Spot
Analysis (Getis-Ord Gi*) tool
Indicates the extent to which each feature is
surrounded by similarly high or low values
Where do features with similar attribute values
cluster spatially together
Getis-Ord Gi* tool
Identifies where clustering occurs in both high
and low values
Calculates a Z score for each feature
High Z = hot spot (when a feature has a high
value and it is surrounded by other features with
high values)
Low Z = cold spot (when we have features with
low values surrounded by other features with low
values)
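The per-feature Z score can be sketched in Python using the commonly published form of the Gi* statistic. The weights matrix is an assumption of this example: binary, with each feature counted as a neighbor of itself and of the adjacent features. The values are hypothetical.

```python
from math import sqrt

def getis_ord_gi_star(values, weights):
    """Return a Gi* Z score for every feature.

    weights[i][j] is the weight between features i and j; weights[i][i]
    is included, which is what distinguishes Gi* from Gi.
    """
    n = len(values)
    mean = sum(values) / n
    s = sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for i in range(n):
        row = weights[i]
        sum_w = sum(row)
        sum_w2 = sum(w * w for w in row)
        # Local sum of values compared with what the global mean predicts.
        num = sum(w * x for w, x in zip(row, values)) - mean * sum_w
        den = s * sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(num / den)
    return scores

# Six hypothetical features in a row: high values at one end, low at the other.
values = [10, 9, 8, 1, 2, 1]
n = len(values)
weights = [[1 if abs(i - j) <= 1 else 0 for j in range(n)] for i in range(n)]
for value, z in zip(values, getis_ord_gi_star(values, weights)):
    print(value, round(z, 2))
```

Features in the middle of the high-value run get large positive scores and those in the low-value run get large negative scores. A feature whose neighborhood covers the whole study area would make the denominator zero, so the neighborhood has to be smaller than the dataset.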
Notes on the Z score
A Z score is measured in standard deviations from the mean
It is a reference value that’s associated with a
standard normal distribution
A very high or low Z score would be found in
the tails
More notes on Z score
A very high or low Z score means that the pattern
deviates significantly from a hypothetical random
pattern
For example, at a 95% confidence level, the critical Z scores are -1.96 and +1.96
If Z is between -1.96 and +1.96 you can’t
reject the null
You are seeing one version of a random pattern
If very high or low (e.g. -2.5 or +5.4) you have a
pattern that’s too unusual to be the result of random
chance, so we reject the null hypothesis
REMEMBER: The null hypothesis is that features are randomly
distributed across the study area
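The decision rule on this slide can be sketched in Python by converting a Z score to a two-tailed p-value. It assumes the Z score follows a standard normal distribution under the null hypothesis; the function names are made up for this example.

```python
from math import erfc, sqrt

def two_tailed_p(z):
    """Probability of a Z score at least this extreme under the null."""
    return erfc(abs(z) / sqrt(2.0))

def decide(z, alpha=0.05):
    """Apply the significance test at the given significance level."""
    if two_tailed_p(z) < alpha:
        return "reject the null"
    return "cannot reject the null"

for z in (1.0, -2.5, 5.4):
    print(f"Z = {z:+.1f}: p = {two_tailed_p(z):.4g}, {decide(z)}")
```

Z scores far out in either tail give small p-values, so both very high and very low scores lead to rejecting the null hypothesis.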
Reference
Mitchell, A. 2005. The ESRI Guide to GIS
Analysis, Volume 2: Spatial Measurements and
Statistics. ESRI Press, Redlands, CA.