Briggs Henan University 2010 1
Descriptive Spatial Statistics: Centrographic Statistics (This time)
– single, summary measures of a spatial distribution
–- Spatial equivalents of mean, standard deviation, etc..
Inferential Spatial Statistics: Point Pattern Analysis (Next time)
Analysis of point location only--no quantity or magnitude (no attribute variable)
--Quadrat Analysis
--Nearest Neighbor Analysis, Ripley’s K function
Spatial Autocorrelation (Weeks 5 and 6)
– One attribute variable with different magnitudes at each location
The Weights Matrix
Global Measures of Spatial Autocorrelation (Moran’s I, Geary’s C, Getis/Ord Global G)
Local Measures of Spatial Autocorrelation (LISA and others)
Prediction with Correlation and Regression
–Two or more attribute variables
( Week 7 )
Standard statistical models
Spatial statistical models
Briggs Henan University 2010 2
Point Pattern Analysis (PPA) and
Spatial Autocorrelation (SA) : differences and similarities
Point Pattern Analysis (last time)
--points only, and only their location
--there is no “magnitude” value
Spatial Autocorrelation: (this time)
--points and polygons, with different “magnitudes”
-- there is an attribute variable.
--income, rainfall, crime rate, etc.
Briggs Henan University 2010 3
Many ways to define it!
1. The confirmation of Tobler’s first law of geography
Everything is related to everything else, but near things are more related than distant things.
2. Using similarity
The degree to which characteristics at one location are similar (or dissimilar) to those nearby.
3. Using probability
Measure of the extent to which the occurrence of an event in one geographic area makes more probable, or less probable, the occurrence of a similar event in a neighboring geographic area.
4. Using correlation
Correlation of a variable with itself through space.
The correlation between an observation’s value on a variable and the value of near-by observations on the same variable
Lets look at these in more detail.
4
Tobler’s Law
The confirmation of Tobler’s first law of geography*:
Everything is related to everything else, but near things are more related than distant things.
The single most important concept in geography and GIS!
*Tobler W., (1970) "A computer movie simulating urban growth in the Detroit region".
Economic Geography , 46(2): 234-240
Briggs Henan University 2010
5
Positive Spatial Autocorrelation
Spatial
Autocorrelation
Spatial :
On a map
Auto :
Self
Correlation :
Degree of relative similarity
Positive : similar values cluster together on a map
Negative Spatial
Autocorrelation
Source: Dr Dan Griffith, with modification
6
Negative : dissimilar (different) values cluster together on a map
Briggs Henan University 2010
Positive spatial autocorrelation
- high values surrounded by nearby high values
- intermediate values surrounded by nearby intermediate values
- low values surrounded by nearby low values
2002 population density
Negative spatial autocorrelation
- high values surrounded by nearby low values
- intermediate values surrounded by nearby intermediate values
- low values surrounded by nearby high values competition for space
Grocery store density
2. Based on Similarity
The degree to which characteristics at one location are similar to (or different from) those nearby.
Similar to = positive spatial autocorrelation
Different from (dissimilar) = negative spatial autocorrelation
Positive spatial autocorrelation much more common than negative
Briggs Henan University 2010
9
Spatial Autocorrelation Exists Everywhere!
POLLUTION MONITORING SATELLITE IMAGE
HOUSEHOLD SAMPLING
10
3. Based on Probability
Measure of the extent to which the occurrence of an event in one geographic unit (polygon) makes more probable, or less probable, the occurrence of a similar event in a neighboring unit.
Do you recognize this from earlier discussion?
It’s the same concept as clustered, random, dispersed!
high negative spatial autocorrelation no spatial autocorrelation* high positive spatial autocorrelation
Dispersed Pattern Random Pattern Clustered Pattern
Briggs Henan University 2010 11
4. Using correlation
Correlation of a variable with itself through space.
The correlation between an observation’s value on a variable and the value of near-by observations on the same variable.
Correlation =
“similarity”, “association”, or “relationship”
Scatter diagram
Crime rate in near-by area
Crime rate in an area
Briggs Henan University 2010
12
Standard Statistics:
Spatial Autocorrelation: shows the association or relationship between two different variables shows the association or relationship between the same variable in “nearby” areas.
Each point is a geographic location income education
Education
“next door”
In a neighboring or near-b y area education
Briggs Henan University 2010
13
Two reasons
1. Spatial autocorrelation is important because it implies the existence of a spatial process
– Why are near-by areas similar to each other?
– Why do high income people live “next door” to each other?
– These are GEOGRAPHICAL questions.
• They are about location
Infer
Population
Sample
Processes
Create Pattern
2. It invalidates most traditional statistical inference tests
– If SA exists, then the results of standard statistical inference tests may be incorrect (wrong!)
– We need to use spatial statistical inference tests
Briggs Henan University 2010
14
• Statistical tests are based on the assumption that the values of observations in each sample are independent of one another
• spatial autocorrelation violates this
– samples taken from nearby areas are related to each other and are not independent
Values near each other are similar in magnitude.
Implies a relationship between nearby observations
Briggs Henan University 2010
15
Why are standard statistical tests wrong?
Example for the correlation coefficient (r)
What is the correlation coefficient (r)?
• The most common statistic in all of science
• measures the strength of the relationship (or “association”) between two variables e.g. income and education
• Varies on a scale from –1 thru 0 to +1
+1 implies a perfect positive association
•
As values go up (
) on one, they also go up (
) on the other
• income and education
( ) ( ) ( ) ( )
0 implies no association -1 0 +1
-1 implies perfect negative association
•
As values go up on one (
) , they go down (
) on the other
• price and quantity purchased
• Full name is the Pearson Product Moment correlation coefficient ,
Briggs Henan University 2010
16
Examples of Scatter Diagrams and the Correlation Coefficient
r = 0.26
r = 1 r = 0.72
perfect positive strong positive
Education weak positive
r = -0.71
r = -1 strong negative
Price perfect negative
17
Briggs Henan University 2010
Example for the correlation coefficient (r)
If Spatial Autocorrelation exists:
1. Correlation coefficients appear to be bigger than they really are, and
2. They are more likely to be found “statistically significant”
You are “fooled twice”:
--you are more likely to incorrectly conclude a relationship exists when it does not
--You believe that the relationship is stronger than it really is
Briggs Henan University 2010
18
Example for the correlation coefficient (r)
If Spatial Autocorrelation exists:
• Correlation coefficients bigger than they really are
• because income and education are similar in near by areas
• Correlation coefficient is
“biased upward”
• Also, more likely to appear “statistically significant”
– standard error is smaller because spatial autocorrelation
“artificially” reduces variability
– there is actually more variability than it appears
• “exagerated precision”
19
Briggs Henan University 2010
the problem of measuring
“nearness
or “proximity”
Briggs Henan University 2010 20
Measuring Spatial Autocorrelation: the problem of measuring “nearness
To measure spatial autocorrelation, we must know the “nearness” of our observations
• Which points or polygons are “ near” or “next to” other points or polygons?
–
Which provinces are near Henan?
Seems simple and obvious, but it is not!
Briggs Henan University 2010
21
Measuring Spatial Autocorrelation: the Spatial Weights matrix
• Wij the spatial weights matrix measures the relative location of all points i and j,
• Different methods of calculating
W ij can result in different values for autocorrelation and different conclusions from statistical significance tests!
W ij
Admin_Name Anhui Beijing
Anhui 0
Beijing
Chongqing
0
Chongq ing Fujian Gansu
0
0 Fujian
Gansu
Guangdong
Guangxi
0
Guangd ong Guangxi Guizhou Hainan Hebei
0
0
0 Guizhou
Hainan
Hebei
Heilongjiang
Henan
………
Zhejiang
0
0
Heilong jiang Henan ……. Zhejiang
?
0
0
0
Briggs Henan University 2010
22
Measuring Relative Spatial Location:
Contiguity and Distance
Two methods for measuring nearness
1. Weights based on Contiguity--binary (0,1)
– If zone j is next to zone i , it receives a weight of 1
– otherwise it receives a weight of 0,
• It is essentially excluded
– But what constitutes contiguity? Not as easy as it seems!
2. Weights based on Distance—continuous variable
– Measure the actual distance between points, or between polygon centroids
– But what measure do we use, and
– distance to what points -- All? Some?
Briggs Henan University 2010
23
• Sharing a border or boundary
– Rook: sharing a border
– Queen: sharing a border or a point
* Shares common border rook queen
Which use?
Hexagons Irregular
Briggs Henan University 2010
24
Rook
• Matrix contains a:
– 1 if share a border
– 0 if do not share a border
4 areal units
A
C
B
D associated geographic connectivity/ weights matrix
4x4 matrix
A B C D
A 0 1 1 0
W = B 1 0 0 1
Common border
C 1 0 0 1
D 0 1 1 0
Briggs Henan University 2010
25
Problem Situations for Irregular Polygons
Many!
“ Close” but no common border
– Include polygons which have a centroid within the “convex hull” for the centroids of polygons that do share a common border
X
Length of border
– Is Shanxi “as close to” Nei Mongol as to Henan?
– Base “closeness” on proportion of shared border, not just one (1) or zero (0)
– w ij
= border length ij
/border length j
)
Briggs Henan University 2010
26
1 st order
Nearest neighbor rook hexagon queen
2 nd order
Next nearest neighbor
27
Briggs Henan University 2010
• Raw versus row standardized
• Full contiguity versus sparse contiguity
Briggs Henan University 2010
28
A B C
Divide each number by the row sum D E F
Total number of neighbors
--some have more than others
A B C D E F
Row
Sum
A 0 1 0 1 0 0 2
B 1 0 1 0 1 0 3
C 0 1 0 0 0 1 2
D 1 0 0 0 1 0 2
E 0 1 0 1 0 1 3
F 0 0 1 0 1 0 2
Row standardized
--usually use this
A B C D E F
Row
Sum
A 0.0 0.5 0.0 0.5 0.0 0.0
1
B 0.3 0.0 0.3 0.0 0.3 0.0
1
C 0.0 0.5 0.0 0.0 0.0 0.5
1
D 0.5 0.0 0.0 0.0 0.5 0.0
1
E 0.0 0.3 0.0 0.3 0.0 0.3
1
F 0.0 0.0 0.5 0.0 0.5 0.0
1
Briggs Henan University 2010
29
Queens Case
Full Contiguity
Matrix for US
States
• Column headings not shown (same as rows)
• Principal diagonal has 0s
(blanks)
• other 0s omitted for simplicity
• Can be very large, thus inefficient to use.
30
Briggs Henan University 2010
Name
Sparse Contiguity Matrix for US States -- obtained from Anselin's web site (see powerpoint for link)
Fips Ncount N1 N2 N3 N4 N5 N6 N7
1
4
5
6
8
9
10
11
12
13
16
17
18
19
20
21
5
4
3
5
5
3
4
6
4
8
3
4
5
5
3
1
4
6
4
7
6
5
2
5
3
2
7
3
6
3
4
5
5
4
6
2
6
6
3
8
4
2
6
6
2
3
5
6
4
North Dakota
Ohio
Oklahoma
Oregon
Pennsylvania
Rhode Island
South Carolina
South Dakota
Tennessee
Texas
Utah
Vermont
Virginia
Washington
West Virginia
Wisconsin
Wyoming
Alabama
Arizona
Arkansas
California
Colorado
Connecticut
Delaware
District of Columbia
Florida
Georgia
Idaho
Illinois
Indiana
Iowa
Kansas
Kentucky
Louisiana
Maine
Maryland
Massachusetts
Michigan
Minnesota
Mississippi
Missouri
Montana
Nebraska
Nevada
New Hampshire
New Jersey
New Mexico
New York
North Carolina
51
53
54
55
56
47
48
49
50
42
44
45
46
38
39
40
41
34
35
36
37
30
31
32
33
26
27
28
29
22
23
24
25
28
35
22
4
35
44
24
51
13
12
32
29
26
29
40
47
28
33
51
44
18
19
22
5
16
29
6
25
10
48
34
45
46
26
5
6
24
25
13
56
5
22
4
36
47
41
51
26
49
13
8
28
32
4
36
42
24
21
31
29
29
48
1
45
41
21
10
9
39
55
5
40
56
20
4
23
36
40
9
13
27
21
35
32
54
9
37
27
28
5
8
25
37
16
21
17
16
12
49
48
41
20
25
34
37
56
18
17
17
31
18
5
54
36
55
46
1
17
38
8
49
50
42
8
42
47
30
54
48
16
10
19
1
35
35
33
24
24
19
31
47
6
47
40
1
49
55
39
55
8
39
42
50
38
47
21
46
19
16
4
50
51
42
29
53
39
31
37
40
56
54
39
27
8
32
40
31
47
30
19
27
54
11
33
47
56
41
49
25
18
20
36
38
13
32
11
42
46
29
49
53
46
51
20
46
8
34
30
51
16
21
30
56
17
19
21
N8
31
Queens Case
Sparse Contiguity
Matrix for US
States
• Ncount is the number of neighbors for each state
•Max is 8 (Missouri and Tennessee)
•Sum of Ncount is
218
•Number of common borders
29
(joins)
ncount / 2 = 109
• N1, N2… FIPS codes for neighbors
Briggs Henan University 2010
31
• Which China province has the most neighbors?
– How many does it have?
• Create contiguity matrices for the Provinces of
China
– Can be done with GeoDA or with ArcGIS
– Or you can do it “by hand”
– Use the software to see if you get it correct
Briggs Henan University 2010
32
Weights Based on Distance again, not that simple
1. Functional Form to use?
2. Distance metric to use ?
3. Which points/polygons to include?
4. How measure distance between polygons?
Briggs Henan University 2010
33
Weights Based on Distance
1. Functional Form
• We want “nearness” not distance distance nearness
• Most common choice is the inverse (reciprocal) of the distance between locations i and j (
• Other functions also used w ij
= 1/d ij
)
– inverse of squared distance ( w ij
– negative exponential ( w ij
= e d
= 1/d ij
2 or w
), or ij
= e d 2
)
Briggs Henan University 2010
34
Weights Based on Distance
2. Distance metric
• 2-D Cartesian distance via Pythagorus d ij
( X i
X j
)
2
( Y i
Y j
)
2
Use for projected data
• 3-D Spherical distance via spherical coordinates
Cos d = (sin a sin b) + (cos a cos b cos P) where: d = arc distance a = Latitude of A b = Latitude of B
P = degrees of long. A to B
Use for unprojected data
• possible distance metrics:
– Euclidean straight line/airline
– city block/manhattan metric
– distance through network
Appropriate if within a city
Briggs Henan University 2010
35
3. What points/polygons to include?
• Distances to all points/polygons?
– If use all, may make it impossible to solve necessary equations: matrix too big
– May not make theoretical sense: effects may only be ‘local’
• Is Henan influenced by Xinjiang?
• Include distance to only the “nth” nearest neighbors
– How many is n? First? Second?
• Include distances to locations only within a buffer distance
36
4. Measuring distance between polygons
– distances usually measured centroid to centroid, but
– could be measured from boundary of one polygon to centroid of others
– could be measured between the two closest boundary points
• adjustment required for contiguous polygons since distance for these would be zero
Briggs Henan University 2010
37
Briggs Henan University 2010 38
• The concept of spatial autocorrelation.
– “Near things are more similar than distant things”
• The use of the weights matrix
W ij
“nearness” to measure
• The difficulty of measuring “nearness”
– That is a surprise!
• Measures of Spatial Autocorrelation
– Join Count statistic --Geary’s C
– Moran’s I --Getis-Ord G statistic
Briggs Henan University 2010
39
• Which China province has the most neighbors?
– How many does it have?
• Create contiguity matrices for the Provinces of
China
– Can be done with GeoDA or with ArcGIS
– Or you can do it “by hand”
– Use the software to see if you get it correct
Briggs Henan University 2010
40
Appendix: A Note on Sampling Assumptions
• Another factor which influences results from these tests is the assumption made regarding the type of sampling involved:
– Free (or normality) sampling
• Analogous to sampling with replacement
• After a polygon is selected for a sample, it is returned to the population set
• The same polygon can occur more than one time in a sample
– Non-free (or randomization) sampling
• Analogous to sampling without replacement
• After a polygon is selected for a sample, it is not returned to the population set
• The same polygon can occur only one time in a sample
• The formulae used to calculate test statistics (particularly the standard error) differ depending on which assumption is made
– Generally, the formulae are substantially more complex for randomization sampling—unfortunately, it is also the more common situation!
– Usually, assuming normality sampling requires knowledge about larger trends from outside the region or access to additional information within the region in order to estimate parameters.
41
Briggs Henan University 2010
Briggs Henan University 2010 42