Spatial statistics I - geographic distributions, identifying patterns

Spatial Statistics I
RESM 575
Spring 2010
Lecture 7
Today
Part A: Spatial statistics
  What are spatial measurements and statistics?
Part B:
  Measuring geographic distributions
  Testing statistical significance
  Identifying patterns
And if we have time…
Spatial statistics (continued)




Defining spatial neighborhoods and weights
Identifying clusters
Using statistics with geographic data
Analyzing geographic relationships
Part A: Spatial Statistics
Spatial Measurements
GIS analysis
In many cases the map is the analysis
Or, GIS tools and methods were used to create new data, which was then displayed so you could analyze it and draw conclusions
Issues with traditional GIS analysis
Making a map may not be enough to get the answers you need
Trying to draw conclusions from a map is not always easy
Mapping questions



How to classify features?
How to symbolize features?
How to show relationships?
New spatial statistics tools
To describe the distribution of a set of features
To discern patterns
To measure relationships between features
Interdisciplinary contributions







Geographers
Regional scientists
Ecologists
Geologists
Economists
Wildlife biologists
Sociologists
New spatial stat tools
Rely on statistics to cut through the map display
Get right at the patterns and relationships in the data
Space or location is a fundamental component of the stats
New questions from spatial stats
How sure am I that the pattern I am seeing isn’t due to some random occurrence?
To what extent does the value of a feature depend on the values of surrounding features?
How well does the value of one attribute predict the value of another?
What are the trends in the data?
What we will use spatial stats to find
1. How features are distributed
2. What is the pattern created by the features
3. Where are the clusters
4. What are the relationships between sets of features or values
Spatial statistics
Exploratory tools
Help you measure processes, distributions, and relationships
Spatial pattern and processes
“Results are statistically significant at the 0.05 level”
Understanding your data
In general, you can analyze features using location alone or using location influenced by an attribute value
The type of attribute values will influence the statistical method you use
Geographic features
Are either discrete or spatially continuous
[Figure: discrete features; a spatially continuous surface]
Geographic features
Spatially continuous categorical data
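Attribute values
Include:
  Nominal: also referred to as categorical data
  Ordinal: ranked data from high to low (e.g., soil suitability scores)
  Interval: quantities that tell us relative magnitude (differences between values are meaningful, but there is no true zero)
  Ratio: relationship between quantities (values have a true zero, so ratios between them are meaningful)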
Choosing a spatial stat method
Depends on the type of data you have
For example:
  For discrete features, you can analyze the distribution of the features themselves or the distribution of an attribute associated with the features
  For spatially continuous data, you are interested in the distribution of the values
Frame the question
In inferential stats, the analysis is stated as a hypothesis
  Landslides in this area tend to occur more frequently on slopes over 30%
To ensure impartiality, we structure the analysis by assuming the opposite of the hypothesis (the null hypothesis) is true
  Landslides are equally likely to occur on any type of slope
Test the significance of the statistic
The null hypothesis: there is no pattern or relationship
Significance tests help us decide whether we should or should not reject the null hypothesis
You must also choose a significance level between 0 and 1: for example, 0.20 if you want to be 80% sure, or 0.05 if you want to be 95% sure, that the clusters did not occur by chance
Question the results!
Even if the results are statistically significant, question them
Consider:
  The scale you’re working at
  Where the study boundaries fall
  The type of data you are using
  The quality of the data
  How you defined proximity between features (straight-line distance vs. travel time on roads)
Part B: Spatial Statistics
Measuring geographic distributions
Testing statistical significance
Identifying patterns
Measuring geographic distributions
Identify spatial characteristics of a distribution
  Where is the center?
  What feature is most central?
  How are features dispersed around the center?
ArcTools in ArcGIS
Where is the center?
Mean Center tool
  Computes the average X and Y coordinates of all features
  Creates a new point feature
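The Mean Center tool's calculation fits in a few lines of plain Python, sketched below. Everything here is hypothetical: the function name `mean_center`, the tuple coordinates and the sample store locations. The ArcGIS tool works on feature classes and does not accept Python lists. The optional `weights` argument shows the attribute-weighted variant this lecture describes for finding a center based on an attribute value.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]

def mean_center(points: Sequence[Point],
                weights: Optional[Sequence[float]] = None) -> Point:
    """Return the (optionally weighted) average of the X and Y coordinates."""
    if not points:
        raise ValueError("need at least one point")
    if weights is None:
        weights = [1.0] * len(points)      # unweighted: every feature counts once
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    x = sum(w * px for (px, _), w in zip(points, weights)) / total
    y = sum(w * py for (_, py), w in zip(points, weights)) / total
    return (x, y)

# Hypothetical store locations at the corners of a 4 x 2 rectangle.
stores = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
print(mean_center(stores))                        # (2.0, 1.0)
print(mean_center(stores, weights=[1, 1, 1, 5]))  # (1.0, 1.5), pulled toward (0, 2)
```

Giving one feature a heavy weight pulls the center toward it. This is why a weighted mean center can differ noticeably from the plain geographic one.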
Example
Example
Mean center tool
More common use:
  To compare distributions of different types of features or to find the center of features based on an attribute value
Example
Example
Example
What is the most central feature?
Central feature tool
  Identifies the most centrally located feature
  The feature having the lowest total distance to all other features
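Below is a brute-force Python sketch of the "lowest total distance" rule. The function name, the straight-line (Euclidean) distance and the sample sites are illustrative assumptions. An optional weight covers the population-weighted case discussed in this lecture. One assumption to flag: each distance is multiplied by the weight of the feature being reached. That is one reasonable reading of "most accessible to the greatest number of people", so check how your software actually applies weights.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]

def central_feature(points: Sequence[Point],
                    weights: Optional[Sequence[float]] = None) -> int:
    """Index of the feature with the lowest total (weighted) distance to all others."""
    if not points:
        raise ValueError("need at least one point")
    if weights is None:
        weights = [1.0] * len(points)
    best_index, best_total = 0, math.inf
    for i, p in enumerate(points):
        # Sum distances from candidate i to every feature j, each scaled by j's weight.
        # The distance to itself is zero, so including j == i changes nothing.
        total = sum(w * math.dist(p, q) for q, w in zip(points, weights))
        if total < best_total:
            best_index, best_total = i, total
    return best_index

# Hypothetical sites along a road; the last one is a large town.
sites = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0)]
print(central_feature(sites))                        # 2
print(central_feature(sites, [1, 1, 1, 1, 20]))      # 4
```

Unlike the mean center, the result is always one of the existing features. This makes it a natural choice when the answer has to be an actual site.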
Example
Example
What is the most central feature?
More interesting is adding a weight to the analysis, such as population
We are now finding not just which site is most central but which is most accessible to the greatest number of people.
Example
Measuring feature distribution
Standard distance tool
  Measures the distribution of features around the mean center
  The result is a summary statistic representing distance, drawn as a circle around the mean center
  If the circle is large, incidents are widespread
  If it is small, incidents are more localized
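Standard distance is the square root of the mean squared X deviation plus the mean squared Y deviation, both measured from the mean center. The minimal Python sketch below computes it. The function name and sample points are hypothetical. Dividing by n (not n - 1) is an assumption that matches the usual population form of the formula.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

def standard_distance(points: Sequence[Point]) -> float:
    """Radius of the one-standard-distance circle around the mean center."""
    n = len(points)
    if n == 0:
        raise ValueError("need at least one point")
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Mean squared deviation in X plus mean squared deviation in Y.
    spread = (sum((x - mx) ** 2 for x, _ in points)
              + sum((y - my) ** 2 for _, y in points)) / n
    return math.sqrt(spread)

# Hypothetical incident locations: the same square pattern at two scales.
localized = [(0, 0), (0, 2), (2, 0), (2, 2)]
widespread = [(0, 0), (0, 20), (20, 0), (20, 20)]
print(standard_distance(localized))   # about 1.414
print(standard_distance(widespread))  # about 14.14
```

Scaling the pattern by 10 scales the circle by 10. This is the "large circle = widespread, small circle = localized" reading from the slide.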
Example
Distributional trends
Directional distribution (standard deviational ellipse) tool
  Identifies spatial trends in the distribution of features
  Uses:
    Compare distributions
    Examine different time periods
    Show compactness and orientation
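One way to see where the ellipse's orientation and shape come from is through the 2 x 2 covariance matrix of the coordinates. Its larger eigenvalue gives the spread along the major axis, the smaller gives the spread along the minor axis, and the major eigenvector gives the orientation. The Python sketch below uses that principal-axes formulation. It is an assumption, not necessarily the exact ArcGIS computation: ArcGIS reports rotation clockwise from north and may scale the axis lengths differently, whereas this sketch measures the angle counterclockwise from the +X axis. The function name and sample points are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

def deviational_ellipse(points: Sequence[Point]) -> Tuple[Point, float, float, float]:
    """Return (center, major_sd, minor_sd, angle_deg) via principal axes."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    dx = [x - mx for x, _ in points]
    dy = [y - my for _, y in points]
    sxx = sum(d * d for d in dx) / n                  # variance in X
    syy = sum(d * d for d in dy) / n                  # variance in Y
    sxy = sum(a * b for a, b in zip(dx, dy)) / n      # covariance
    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    radius = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    major_var = half_trace + radius
    minor_var = max(half_trace - radius, 0.0)         # clamp rounding noise
    # Angle of the major axis, counterclockwise from +X.
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (mx, my), math.sqrt(major_var), math.sqrt(minor_var), math.degrees(angle)

# Hypothetical features lying along a diagonal trend.
center, major, minor, angle = deviational_ellipse([(0, 0), (1, 1), (2, 2), (3, 3)])
print(center, major, minor, angle)
```

For points on a perfect diagonal line, the minor axis collapses to zero and the angle is 45 degrees. That is the most elongated, most strongly oriented case.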
Example
Testing statistical significance
The tools in the next section (identifying patterns), and later the tools for analyzing spatial relationships, let us perform significance tests on the results before accepting them
Using significance tests with spatial data
Spatial data often violates some of the assumptions of inferential statistics
You need to be aware of these limitations!
Assumptions
Testing a random sample
  With GIS data in a database, you may not know whether the data were randomly sampled, or how large the sample is in relation to the population
  Even when randomness is assumed, spatial data often violates the assumption that observations in a sample are independent
  Spatial data is rarely evenly distributed across a region
For spatial pattern analysis…
The null hypothesis is that features are randomly distributed across the study area (complete spatial randomness)
Hard to imagine this being true
You have to make one of two common sampling assumptions: randomization or normalization
Identifying Patterns
Why study patterns?
Patterns range from completely clustered, through random, to completely dispersed
Identifying patterns (applications)
Forestry applications
  The USFS may measure the pattern of clear-cuts to ensure that sufficient contiguous forest habitat remains
  The agency may allow a certain level of clustering of clear-cuts and then make sure it is not exceeded
Wildlife studies
  If a population is dispersed, the species can live in a wide range of habitats; if it is clustered, the species has very specific habitat requirements
Goal for analyzing spatial patterns
Are there underlying spatial processes influencing the locations of our features?
Are our features randomly located throughout the study area, or are they displaying a clustered or dispersed pattern?
Two approaches for analyzing spatial patterns
Global calculations
  Identify overall patterns or trends in the data
  Effective for complex, messy data
  Useful when you are interested in broad, overall results
  Work by comparing feature locations and/or attributes to a theoretical random distribution to determine whether you have statistically significant clustering or dispersion
2nd approach
Local calculations
  Identify the extent and locations of clustering
  Answer the question: where do we have spatial clustering?
  Process every feature within the context of its neighboring features to determine whether it is a spatial outlier or part of a statistically significant spatial cluster
Are features clustered?
Can be answered with the Nearest Neighbor Index, which does not require specifying an attribute
If the question is based on an attribute, we can test for spatial autocorrelation using the Moran’s I tool
  Things that are closer together are more alike than things that are farther apart
  Measures the similarity of neighboring features
  Identifies whether features are clustered or dispersed
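The Nearest Neighbor Index compares two quantities: the observed average distance from each feature to its nearest neighbor, and the average expected if the same number of features were randomly placed in the study area. The Python sketch below uses the expected distance 0.5 / sqrt(n / A) and the standard error 0.26136 / sqrt(n² / A), the formulas commonly given for this index (for example, in ArcGIS's Average Nearest Neighbor documentation). The function name, the brute-force search, the lack of edge correction and the sample points are assumptions for illustration. An index below 1 suggests clustering and above 1 suggests dispersion.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

def nearest_neighbor_index(points: Sequence[Point], area: float) -> Tuple[float, float]:
    """Return (index, z_score) for observed vs. random-expected nearest-neighbor distance."""
    n = len(points)
    if n < 2 or area <= 0:
        raise ValueError("need at least two points and a positive area")
    # Distance from each feature to its closest other feature (O(n^2) brute force).
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    observed = sum(nearest) / n
    expected = 0.5 / math.sqrt(n / area)          # mean distance under randomness
    std_error = 0.26136 / math.sqrt(n * n / area)
    return observed / expected, (observed - expected) / std_error

# Hypothetical 100 x 100 study area (area = 10,000 square units).
clustered = [(0, 0), (1, 0), (0, 1), (1, 1)]
dispersed = [(25, 25), (25, 75), (75, 25), (75, 75)]
print(nearest_neighbor_index(clustered, 10_000))   # index well below 1, negative z
print(nearest_neighbor_index(dispersed, 10_000))   # index about 2, positive z
```

The study area matters a lot here: the same points look more clustered inside a larger area. This is one reason to question where the study boundaries fall.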
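Global Moran's I measures how similar neighboring values are. For each pair of neighbors, it multiplies the two features' deviations from the mean, sums those products, and scales the total by the overall variance and the sum of the weights. The minimal Python sketch below takes an explicit weights matrix. The function name, the simple "adjacent features are neighbors" weights and the sample values are hypothetical. Under the null hypothesis, the expected value is -1/(n - 1). Values above that lean toward clustering and values below lean toward dispersion. Turning I into a z score needs its variance, which this sketch leaves out.

```python
from typing import List, Sequence

def morans_i(values: Sequence[float], weights: List[List[float]]) -> float:
    """Global Moran's I for n values and an n x n spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)           # sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j]      # neighbor cross-products
                for i in range(n) for j in range(n))
    variance_term = sum(d * d for d in dev)
    return (n / s0) * (cross / variance_term)

# Four hypothetical features in a row; each neighbors only adjacent features.
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
print(morans_i([1, 1, 5, 5], chain))  # positive: similar values sit together
print(morans_i([1, 5, 1, 5], chain))  # negative: high and low values alternate
```

Here the expected value is -1/3. The first arrangement (I = 1/3) lies above it and the alternating one (I = -1) lies below it.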
Output for this tool
The output gives a graphical display of the results in four different ways:
  Statistical values
  A graphic of those values
  Significance
  A sentence summarizing the result
Example
Compare the z scores for two different years of data: if the z score increases, clustering is more intense
Locate the hot spots
This is a local question that requires the hot spot analysis (Getis-Ord Gi*) tool
Indicates the extent to which each feature is surrounded by similarly high or low values
Answers: where do features with similar attribute values cluster together spatially?
Getis-Ord Gi* tool
Identifies where clustering occurs in both high and low values
Calculates a Z score for each feature
  High positive Z = hot spot (a feature with a high value surrounded by other features with high values)
  Low (strongly negative) Z = cold spot (a feature with a low value surrounded by other features with low values)
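The Gi* statistic compares the weighted sum of values in each feature's neighborhood (the feature itself included) with the sum expected if values were spread randomly, and the result is already a Z score. The Python sketch below implements the commonly documented form of the formula. The function name, the "self plus adjacent features" weights and the sample values are hypothetical. Setting `weights[i][i] = 1` so that each feature counts itself is what distinguishes Gi* from Gi.

```python
import math
from typing import List, Sequence

def getis_ord_gi_star(values: Sequence[float], weights: List[List[float]]) -> List[float]:
    """Gi* Z score per feature; weights[i][i] should be nonzero (self included)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    if s == 0:
        raise ValueError("all values are identical; Gi* is undefined")
    z_scores = []
    for i in range(n):
        w = weights[i]
        sum_w = sum(w)
        sum_w2 = sum(wij * wij for wij in w)
        local = sum(wij * xj for wij, xj in zip(w, values))   # neighborhood sum
        numerator = local - mean * sum_w                       # minus expected sum
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        z_scores.append(numerator / denominator)
    return z_scores

# Six hypothetical features in a row: high values on the left, low on the right.
values = [10, 10, 10, 1, 1, 1]
weights = [[1 if abs(i - j) <= 1 else 0 for j in range(6)] for i in range(6)]
for i, z in enumerate(getis_ord_gi_star(values, weights)):
    print(i, round(z, 2))
```

In the output, the middle of the high run comes out as a hot spot (Z above +1.96) and the middle of the low run as a cold spot (Z below -1.96). The features at the ends have fewer neighbors and weaker scores.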
Notes on the Z score
A Z score is measured in standard deviations
It is a reference value associated with a standard normal distribution
A very high or very low Z score is found in the tails of that distribution
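For reference, a Z score expresses how many standard deviations a value x lies from the mean μ of a distribution with standard deviation σ. In the pattern tools, x is the observed statistic (for example, Moran's I), μ is its expected value under the null hypothesis, and σ is its standard deviation under that hypothesis.

```latex
z = \frac{x - \mu}{\sigma}
```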
More notes on Z score
A very high or low Z score means that the pattern deviates significantly from a hypothetical random pattern
For example, at a 95% confidence level the critical Z scores are -1.96 and +1.96
  If Z is between -1.96 and +1.96, you can’t reject the null hypothesis
  You are seeing one version of a random pattern
If Z is very high or low (e.g., -2.5 or +5.4), the pattern is too unusual to be the result of random chance, so we reject the null hypothesis
REMEMBER: The null hypothesis is that features are randomly distributed across the study area (complete spatial randomness)
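The critical values come from the standard normal distribution: a two-tailed test at confidence level c puts (1 - c)/2 of the probability in each tail. The sketch below uses Python's standard `statistics.NormalDist` to derive the critical Z and apply the reject/do-not-reject rule. The function names are hypothetical, and the test Z values are the examples from this lecture.

```python
from statistics import NormalDist

def critical_z(confidence: float) -> float:
    """Two-tailed critical Z for a confidence level such as 0.95."""
    alpha = 1 - confidence                       # significance level, e.g. 0.05
    return NormalDist().inv_cdf(1 - alpha / 2)

def reject_null(z: float, confidence: float = 0.95) -> bool:
    """True when z lies in either tail beyond the critical value."""
    return abs(z) > critical_z(confidence)

print(round(critical_z(0.95), 2))   # 1.96
print(round(critical_z(0.80), 2))   # 1.28
for z in (-2.5, 0.8, 5.4):
    print(z, reject_null(z))        # True, False, True
```

Lowering the confidence level to 80% (significance level 0.20) shrinks the critical value to about 1.28. More patterns then count as significant, but with a greater chance of being fooled by randomness.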
Reference
Mitchell, A. 2005. ESRI Guide to GIS Analysis, Volume 2. ESRI Press, Redlands, CA.