Spatial Association Lecture Notes

Spatial Association and
Spatial Statistical Techniques
Danlin Yu
Ph.D. Candidate
Dept. of Geography, UWM
Detecting Spatial Association
What is spatial association
Spatial objects tend to be related to one another
Types of spatial association
Spatial autocorrelation: similar (dissimilar)
values in space tend to cluster together
Spatial heterogeneity: spatial regimes, space is
not homogeneous
Autocorrelation and heterogeneity are closely
related
Detecting spatial association
Why study spatial association
It is inherent in geographic research
When working on spatial data, analyses based
on regular statistics are VERY likely to be
misleading or incorrect
How to detect spatial association
Power of GIS
Exploratory Spatial Data Analysis (ESDA): let
the data speak
Background
The first law of geography (Tobler):
Everything is related to everything else, but near things are
more related than distant things
Characteristics of spatial statistics
Existence of spatial association violates an
important statistical assumption: independence
Spatial patterns are the results of spatial processes:
the pattern we see is one of numerous
possible outcomes of the same spatial process
Types of spatial association
Point spatial association
Distance is critical in deciding point spatial
association
Line spatial association
Distance and path
Areal spatial association
Distance and contiguity
Today’s topic: univariate SA
Univariate: for pattern detection
Examples: per capita GDP for economic
performance pattern; surface temperature for
local climate pattern, etc.
Central question: is the pattern we see a result
of some specific processes (usually random or
normal processes – our null hypothesis)?
Multivariate: spatial regression or
geographically weighted regression (GWR)
Research methods
Hypothesis testing to answer this
question is conducted with spatial
statistics
For univariate geographic data, there are a
few indexes in the literature:
Moran’s Index (Moran’s I)
Geary’s Index (Geary’s c)
Getis and Ord’s G
Spatial statistic indexes
The purposes of the three indexes are very
similar: from the geographic data,
calculate an index, then test the index against the
null hypothesis
The most often encountered index is
Moran’s I
The discussion of Moran’s I applies to the other
indexes, subject to minor adjustments
Moran’s Index (I)
Structured like Pearson’s product-moment statistic: a measure of covariance
$$
I = \frac{n}{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}} \cdot \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (y_i - \bar{y})(y_j - \bar{y})}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
$$
Moran’s I
$w_{ij}$ is the weight: $w_{ij} = 1$ if locations i and j
are adjacent and zero otherwise ($w_{ii} = 0$; a
region is not adjacent to itself).
$y_i$ and $\bar{y}$ are the value of the variable at the ith location
and the mean of the variable, respectively
n is the total number of observations
I is used to test hypotheses concerning
similarity
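As a minimal sketch of the definition above, the index can be computed directly from the double sum. The function name `morans_i`, the four-location layout, and the data values are all made up for illustration; this is plain Python, not the spdep implementation.

```python
def morans_i(y, w):
    """Global Moran's I for values y and an n-by-n weights matrix w (list of lists)."""
    n = len(y)
    mean = sum(y) / n
    z = [v - mean for v in y]                 # deviations from the mean
    s0 = sum(sum(row) for row in w)           # sum of all weights
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    denom = sum(v * v for v in z)             # sum of squared deviations
    return (n / s0) * cross / denom

# Hypothetical layout: four locations on a line, adjacent ones are neighbours.
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
trend = [1.0, 2.0, 3.0, 4.0]         # similar values sit next to each other
alternating = [1.0, 4.0, 1.0, 4.0]   # dissimilar values sit next to each other

print(morans_i(trend, w))        # positive: similar values cluster
print(morans_i(alternating, w))  # negative: dissimilar values are adjacent
```

With these toy values the smooth trend gives a positive index and the alternating pattern a negative one, which matches the two kinds of spatial autocorrelation described earlier.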
Determining the weights
Two rules
Distance: locations within a certain distance are
considered as neighbors
Border-sharing (for areal units only): areas
sharing borders are considered as neighbors
The weights matrix can be symmetric or
asymmetric, and either binary or general
(for example, distance decaying)
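The distance rule can be sketched in a few lines. Both function names, the coordinates, and the threshold are hypothetical. The inverse-distance form is only one possible distance-decaying choice, and it assumes no two locations coincide. The border-sharing rule is not shown because it needs polygon geometry.

```python
import math

def distance_band_weights(coords, threshold):
    """Binary weights: 1 if two distinct locations lie within `threshold`, else 0."""
    n = len(coords)
    return [[1.0 if i != j and math.dist(coords[i], coords[j]) <= threshold else 0.0
             for j in range(n)] for i in range(n)]

def inverse_distance_weights(coords, power=1.0):
    """General weights that decay with distance: w_ij = d_ij ** -power, w_ii = 0.

    Assumes all coordinates are distinct, so no distance is zero.
    """
    n = len(coords)
    return [[0.0 if i == j else math.dist(coords[i], coords[j]) ** -power
             for j in range(n)] for i in range(n)]

# Hypothetical point locations on a line.
coords = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
band = distance_band_weights(coords, threshold=1.5)
decay = inverse_distance_weights(coords)

for row in band:
    print(row)
for row in decay:
    print(row)
```

Note that the third location has no neighbour within the chosen threshold, so its row in `band` is all zeros. This is one reason to try more than one distance.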
Determining the weights
Spatial weights matrix should be
constructed judiciously
Ideally, related to general concepts from spatial
interaction theory, such as the notions of
accessibility and potential etc.
Determining the weights
When used in hypothesis testing, this
requirement is less stringent
Since our purpose is to test the null – spatial
independence
Still, trying a few structures is a good idea –
border sharing, different distances
Determining the weights
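A typical symmetric weights matrix is a
binary weights matrix in which neighbors are
coded as 1 and all others as 0
Without loss of generality, it is usually row
standardized: each element is divided by its row sum, so all
elements of one row add up to 1 (the row-standardized matrix
is in general no longer symmetric)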
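Row standardization is easy to illustrate. The helper name `row_standardize` and the four-location binary matrix are made up for this sketch. Leaving a location with no neighbours as an all-zero row is an assumption made here, not a rule from the notes.

```python
def row_standardize(w):
    """Divide each row by its sum so that every non-empty row adds up to 1."""
    out = []
    for row in w:
        total = sum(row)
        # Assumption: a location without neighbours keeps an all-zero row.
        out.append([v / total if total else 0.0 for v in row])
    return out

# Hypothetical binary contiguity matrix: four locations on a line.
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
w_std = row_standardize(w)
for row in w_std:
    print(row)
```

In this toy case `w_std[0][1]` is 1.0 while `w_std[1][0]` is 0.5, so a symmetric binary matrix generally loses its symmetry once the rows are standardized.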
Hypothesis testing
The expected value and the variance of
Moran’s I are used for testing
However, it is observed that under the null
hypothesis, Moran’s I usually does not
follow a normal distribution
Alternatives
Random permutation
Saddlepoint approximation
Hypothesis testing
Monte Carlo (random) permutation for
Moran’s I
Randomly rearrange the values across the locations
and recalculate I each time (e.g., 999 times)
Compare the actual I with the 999 randomly
generated I values
If the actual I falls in the upper 5% or the
lower 5% of this reference distribution, I is said to be pseudo
significant at the 5% level (positive or negative association, respectively)
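The permutation procedure above can be sketched as follows. The function names, the ten-location line layout, and the data are hypothetical. The pseudo p-value formula, which counts the observed value as one member of the reference set, is a common convention assumed here rather than something stated in the notes.

```python
import random

def morans_i(y, w):
    """Global Moran's I for values y and an n-by-n weights matrix w."""
    n = len(y)
    mean = sum(y) / n
    z = [v - mean for v in y]
    s0 = sum(sum(row) for row in w)
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(v * v for v in z)

def permutation_test(y, w, n_perm=999, seed=0):
    """Return the observed I and one-sided pseudo p-values from random permutation."""
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    observed = morans_i(y, w)
    shuffled = list(y)
    sims = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)          # reassign the values to locations at random
        sims.append(morans_i(shuffled, w))
    # Assumed convention: the observed value counts as one of n_perm + 1 outcomes.
    p_positive = (sum(s >= observed for s in sims) + 1) / (n_perm + 1)
    p_negative = (sum(s <= observed for s in sims) + 1) / (n_perm + 1)
    return observed, p_positive, p_negative

# Hypothetical data: ten locations on a line with a smooth trend in the values.
n = 10
w = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
y = [float(v) for v in range(1, n + 1)]
observed, p_positive, p_negative = permutation_test(y, w)
print(observed, p_positive, p_negative)
```

A small `p_positive` suggests positive association (clustering of similar values) and a small `p_negative` suggests negative association. Because the result depends on the random draws, it is a pseudo significance level rather than an exact one.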
Hypothesis testing
Saddlepoint approximation (Tiefelsdorf,
2001)
The exact distribution of Moran’s I can be obtained,
but it is computationally prohibitive for even a
medium-sized data set
A saddlepoint distribution approximates the
exact distribution with reasonable accuracy
Based on the ratio of quadratic forms in normal variables
Usually, random permutation would do the job
Global and local (1)
The Moran’s I just introduced is based on
simultaneous measurements from many
locations; hence, it is a GLOBAL statistic
Global statistics provide only a limited set
of spatial association measurements
You see the overall pattern, but details are ignored: the
forest-and-trees dilemma
Global and local (2)
Recently, a number of statistics have been
developed to measure dependence in only a
portion of the study area: the local
statistics
In spatial data analysis, these are known as Local
Indicators of Spatial Association (LISA), a term introduced by
Anselin (1995)
Global and local (3)
Definition of LISA (Anselin, 1995)
The local statistic for each observation gives
an indication of the extent of significant spatial
clustering of similar values around that
observation
The sum of the local statistics for all observations is
proportional (or equal) to a corresponding
global statistic
Global and local (4)
Local statistics are well suited to
Identify existence of pockets or “hot spots”
Assess assumptions of stationarity
Identify distances beyond which no discernible
association obtains
Global and local statistics are often used
together for thorough understanding of
spatial association and processes
Global and local (5)
This discussion is based on the
decomposition of the Moran’s I to its local
version
Other indexes can be decomposed similarly; however,
there is an important aspect of Moran’s I
that will assist further understanding in
spatial analysis
It can be decomposed into its local version
AND a graphic version: Moran’s scatterplot
Local Moran’s I
Following Anselin’s (1995) definition, a
local Moran’s $I_i$ may be defined as:
$$
I_i = z_i \sum_{j=1}^{n} w_{ij} z_j
$$
The $z_i$ are the deviations of the $y_i$ from their mean
The weights are row standardized
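A minimal sketch of the local statistic, using the definition above with deviations from the mean and row-standardized weights. The function name, the four-location line layout, and the values are hypothetical. The last lines check the LISA property that the local values sum to something proportional to the global index.

```python
def local_morans_i(y, w_std):
    """Local Moran's I_i = z_i * sum_j w_ij z_j, with z the deviations from the mean."""
    n = len(y)
    mean = sum(y) / n
    z = [v - mean for v in y]
    return [z[i] * sum(w_std[i][j] * z[j] for j in range(n)) for i in range(n)]

# Hypothetical row-standardized weights: four locations on a line.
w_std = [
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
]
y = [1.0, 2.0, 3.0, 4.0]

local = local_morans_i(y, w_std)
print(local)

# With row-standardized weights, the global I equals the sum of the local
# values divided by the sum of squared deviations.
mean = sum(y) / len(y)
sum_sq = sum((v - mean) ** 2 for v in y)
global_i = sum(local) / sum_sq
print(global_i)
```

Here the proportionality constant is the sum of squared deviations, so dividing the summed local values by it recovers the global Moran’s I for the same weights.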
Local Moran’s I
The hypothesis test for local Moran’s I is more
complex
The distribution of local Moran’s I is definitely
not normal; furthermore, it is
influenced by the global pattern
Random permutation won’t work: for one
specific location, the mean and variance of the
local Moran’s I keep changing during the
permutation, which is not the case for the global one
Local Moran’s I
The exact distribution of local Moran’s I can be
obtained, but it is extremely computationally
expensive
The saddlepoint approximation is so far
one potential solution
Details can be found in Tiefelsdorf (2000;
2002)
Local Moran’s I
In addition, local Moran’s I values correlate with
one another because of overlapping neighbors
A Bonferroni correction or other multiple-testing
corrections are needed to obtain robust
test results
All of these are implemented in the spdep package in
R
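The Bonferroni idea can be sketched in a few lines: with m local tests, each p-value is compared against alpha divided by m. The function name and the p-values are made up for illustration. This is not spdep’s interface.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag the tests that stay significant after a Bonferroni correction."""
    m = len(p_values)
    threshold = alpha / m          # each test must pass the stricter level alpha / m
    return [p <= threshold for p in p_values]

# Hypothetical local pseudo p-values for four locations.
p_values = [0.001, 0.02, 0.04, 0.5]
print(bonferroni(p_values))
```

With four tests the per-test threshold is 0.0125, so only the first location remains significant here. The correction is conservative, which is one reason other correction methods are sometimes preferred.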
Moran’s scatterplot
A graphic tool for detecting local spatial
association
Derived directly from the global Moran’s I
It can be used together with the local
Moran’s I for better understanding
Moran’s scatterplot
Recall the formula of Moran’s I:
$$
I = \frac{n}{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}} \cdot \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (y_i - \bar{y})(y_j - \bar{y})}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
$$
If a row-standardized weights matrix is used, every row sums to 1,
so the weights sum to n and the first term, n divided by the sum of the weights, is 1
Moran’s scatterplot
Therefore, I could be re-written as:
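$$
I = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (y_i - \bar{y})(y_j - \bar{y})}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
$$
Or:
$$
I = \frac{\sum_{i=1}^{n} (y_i - \bar{y}) \left( \sum_{j=1}^{n} w_{ij} (y_j - \bar{y}) \right)}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
$$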
Moran’s scatterplot
Recall the coefficient of the linear
regression, b:
$$
b = \frac{\sum_{i=1}^{n} (ind_i - \overline{ind})(dep_i - \overline{dep})}{\sum_{i=1}^{n} (ind_i - \overline{ind})^2}
$$
$ind_i$ and $dep_i$ are the independent and
dependent variables; the “bar” versions are
their means, respectively; and b is the
regression coefficient
Moran’s scatterplot
Yes, there is a similarity between Moran’s I and
the regression coefficient b
Actually, $\sum_{j=1}^{n} w_{ij} (y_j - \bar{y})$ is the so-called
“spatial lag” of location i.
So, I is formally equivalent to the regression
coefficient in a regression of a location’s
spatial lag on the location’s own value
Moran’s Scatterplot
This interpretation enables us to visualize
Moran’s I in a scatterplot of a location’s
spatial lag and itself – the Moran’s
scatterplot
Moran’s I is the slope of the regression line
A lack of fit (in the scatterplot) would
indicate important local spatial processes and
associations (local pockets/non-stationarity)
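The claim that Moran’s I is the slope of the regression line can be checked numerically. The sketch below uses hypothetical helper names and toy data. It regresses the spatial lag on the deviations and compares the ordinary least-squares slope with I computed from row-standardized weights.

```python
def spatial_lag(z, w_std):
    """Weighted average of the neighbours' deviations for each location."""
    n = len(z)
    return [sum(w_std[i][j] * z[j] for j in range(n)) for i in range(n)]

def ols_slope(x, v):
    """Slope b of the least-squares line of v (dependent) on x (independent)."""
    n = len(x)
    mx, mv = sum(x) / n, sum(v) / n
    num = sum((a - mx) * (b - mv) for a, b in zip(x, v))
    den = sum((a - mx) ** 2 for a in x)
    return num / den

# Hypothetical row-standardized weights: four locations on a line.
w_std = [
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
]
y = [1.0, 2.0, 3.0, 4.0]
mean = sum(y) / len(y)
z = [v - mean for v in y]
lag = spatial_lag(z, w_std)

slope = ols_slope(z, lag)   # slope of the Moran scatterplot regression line
moran = sum(a * b for a, b in zip(z, lag)) / sum(a * a for a in z)
print(slope, moran)
```

The two numbers agree because the deviations sum to zero, so the mean of the lag drops out of the slope’s numerator.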
Moran’s scatterplot
The scatterplot is centered on the coordinate
origin
The first and third quadrants of the plot
represent positive association (high-high
and low-low, respectively), while the second and fourth
represent negative association (low-high and high-low, respectively)
The density of points in each quadrant indicates the
dominating local spatial process
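The quadrant reading can be sketched as a simple classification of each location by the sign of its deviation and the sign of its spatial lag. The function name, the layout, and the data are hypothetical. Treating a value of exactly zero as “high” is an arbitrary choice made for this sketch.

```python
def moran_quadrants(y, w_std):
    """Label each location by its Moran scatterplot quadrant (own value, then lag)."""
    n = len(y)
    mean = sum(y) / n
    z = [v - mean for v in y]
    lag = [sum(w_std[i][j] * z[j] for j in range(n)) for i in range(n)]
    labels = []
    for zi, li in zip(z, lag):
        own = "high" if zi >= 0 else "low"        # assumption: zero counts as high
        neighbours = "high" if li >= 0 else "low"
        labels.append(own + "-" + neighbours)
    return labels

# Hypothetical row-standardized weights: four locations on a line.
w_std = [
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
]
print(moran_quadrants([1.0, 2.0, 3.0, 4.0], w_std))  # clustered values
print(moran_quadrants([1.0, 5.0, 1.0, 5.0], w_std))  # alternating values
```

Counting the labels per quadrant gives a rough numerical counterpart to the visual density of the scatterplot.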
Moran’s scatterplot
A so-called LOWESS (LOcally Weighted
rEgression Scatterplot Smoothing) curve
can aid visual interpretation
Bends in the LOWESS curve usually
indicate interesting local pockets, regimes
or non-stationarity
An example: demonstration in R
More about Moran’s Scatterplot
A very important ESDA tool for spatial
data analysis
Further information can be obtained from:
Anselin, L. (1996). The Moran scatterplot as an ESDA tool to
assess local instability in spatial
association. pp. 111–125 in M. M. Fischer,
H. J. Scholten and D. Unwin (eds.) Spatial
analytical perspectives on GIS, London:
Taylor and Francis
An analytical example
Spatial pattern detection in China’s
provincial development
The variable used: per capita GDP
Dynamic patterns – global Moran’s I
Specific local spatial process – local
Moran’s I and the Moran’s scatterplot
China: per capita GDP in 1978
[Map: per capita GDP by province (yuan), with the Central, Western, and Eastern Regions labeled. Legend classes: 175 - 291, 292 - 430, 431 - 680, 681 - 1290, 1291 - 2498. Scale bars in miles and kilometers.]
China: per capita GDP in 2000
[Map: per capita GDP by province (yuan), with the Central, Western, and Eastern Regions labeled. Legend classes: 869 - 1913, 1914 - 3162, 3163 - 4532, 4533 - 8411, 8412 - 15593. Scale bars in miles and kilometers.]
An analytical example
[Figure: global Moran’s I (vertical axis, 0 to 0.25) by year (horizontal axis, 1978 to 2000).]
Dynamic change of global Moran’s I from 1978 to 2000;
all values are significant at the 5% level per random permutation
An analytical example
There is a clustering trend in China’s
provincial-level development (represented
by per capita GDP)
But the global Moran’s I cannot tell where
the clustering takes place:
do high values cluster, or do low values cluster?
[Figure: The Moran’s scatterplot in 1978. Horizontal axis: GDP per capita (standardized). Points are labeled with province abbreviations.]
[Figure: The Moran’s scatterplot in 2000. Horizontal axis: GDP per capita (standardized). Points are labeled with province abbreviations.]
Local Moran’s I in 1978
[Map: local Moran’s I by province, with the Central, Western, and Eastern Regions labeled. Legend classes: < -0.3, -0.3 - 0, 0 - 0.3, 0.3 - 1.0, > 1.0. Scale bars in miles and kilometers.]
Local Moran’s I in 2000
[Map: local Moran’s I by province, with the Central, Western, and Eastern Regions labeled. Legend classes: -0.3 - 0, 0 - 0.3, 0.3 - 1.0, > 1.0. Scale bars in miles and kilometers.]
An analytical example
First, China’s coast-interior divide persisted
Interior provinces exhibit great geographical
similarity in economic development and spatial
contributions to the global Moran’s I
Second, the municipalities (Beijing, Tianjin,
Shanghai) always contribute the most
Shanghai’s position is worth noting: its
development changed the spatial pattern the
most
An analytical example
Third, Guangdong’s contribution to the
global index corresponds with its changing
spatial behavior depicted in the Moran
scatterplot
Fourth, while most of the interior provinces
have similar patterns, coastal provinces vary
greatly
An analytical example
Fifth, Shandong fell into the low-low
quadrant, and contributed very little to the
global index
Sixth, Guizhou and Yunnan, two provinces
in southwest China, contributed relatively
highly to the global index in 2000
The poorest ones tend to form a poor cluster
Demo – with R and SPDEP
A little demonstration
The software package R: free,
powerful, open source
Packages: spdep and maptools
If you have spatial data and are interested in
using ESDA, you can approach me about
your research