Spatial Autocorrelation concepts

advertisement

Spatial Autocorrelation:

The Single Most Important Concept in Geography and GIS!

Introduction to Concepts

Briggs Henan University 2010 1

Spatial Statistics

Descriptive Spatial Statistics: Centrographic Statistics (This time)

– single, summary measures of a spatial distribution

–- Spatial equivalents of mean, standard deviation, etc..

Inferential Spatial Statistics: Point Pattern Analysis (Next time)

Analysis of point location only--no quantity or magnitude (no attribute variable)

--Quadrat Analysis

--Nearest Neighbor Analysis, Ripley’s K function

Spatial Autocorrelation (Weeks 5 and 6)

– One attribute variable with different magnitudes at each location

The Weights Matrix

Global Measures of Spatial Autocorrelation (Moran’s I, Geary’s C, Getis/Ord Global G)

Local Measures of Spatial Autocorrelation (LISA and others)

Prediction with Correlation and Regression

–Two or more attribute variables

( Week 7 )

Standard statistical models

Spatial statistical models

Briggs Henan University 2010 2

Point Pattern Analysis (PPA) and

Spatial Autocorrelation (SA) : differences and similarities

Point Pattern Analysis (last time)

--points only, and only their location

--there is no “magnitude” value

Spatial Autocorrelation: (this time)

--points and polygons, with different “magnitudes”

-- there is an attribute variable.

--income, rainfall, crime rate, etc.

Briggs Henan University 2010 3

Spatial Autocorrelation

Many ways to define it!

1. The confirmation of Tobler’s first law of geography

Everything is related to everything else, but near things are more related than distant things.

2. Using similarity

The degree to which characteristics at one location are similar (or dissimilar) to those nearby.

3. Using probability

Measure of the extent to which the occurrence of an event in one geographic area makes more probable, or less probable, the occurrence of a similar event in a neighboring geographic area.

4. Using correlation

Correlation of a variable with itself through space.

The correlation between an observation’s value on a variable and the value of near-by observations on the same variable

Lets look at these in more detail.

4

Spatial Autocorrelation:

1.

Tobler’s Law

The confirmation of Tobler’s first law of geography*:

Everything is related to everything else, but near things are more related than distant things.

The single most important concept in geography and GIS!

*Tobler W., (1970) "A computer movie simulating urban growth in the Detroit region".

Economic Geography , 46(2): 234-240

Briggs Henan University 2010

5

Positive Spatial Autocorrelation

Spatial

Autocorrelation

Spatial :

On a map

Auto :

Self

Correlation :

Degree of relative similarity

Positive : similar values cluster together on a map

Negative Spatial

Autocorrelation

Source: Dr Dan Griffith, with modification

6

Negative : dissimilar (different) values cluster together on a map

Briggs Henan University 2010

Positive spatial autocorrelation

- high values surrounded by nearby high values

- intermediate values surrounded by nearby intermediate values

- low values surrounded by nearby low values

2002 population density

Negative spatial autocorrelation

- high values surrounded by nearby low values

- intermediate values surrounded by nearby intermediate values

- low values surrounded by nearby high values competition for space

Grocery store density

Spatial Autocorrelation: more ways to describe it

2. Based on Similarity

The degree to which characteristics at one location are similar to (or different from) those nearby.

Similar to = positive spatial autocorrelation

Different from (dissimilar) = negative spatial autocorrelation

Positive spatial autocorrelation much more common than negative

Briggs Henan University 2010

9

Spatial Autocorrelation Exists Everywhere!

POLLUTION MONITORING SATELLITE IMAGE

HOUSEHOLD SAMPLING

10

Spatial Autocorrelation: more ways to describe it

3. Based on Probability

Measure of the extent to which the occurrence of an event in one geographic unit (polygon) makes more probable, or less probable, the occurrence of a similar event in a neighboring unit.

Do you recognize this from earlier discussion?

It’s the same concept as clustered, random, dispersed!

high negative spatial autocorrelation no spatial autocorrelation* high positive spatial autocorrelation

Dispersed Pattern Random Pattern Clustered Pattern

Briggs Henan University 2010 11

Even More Ways to Describe SA

4. Using correlation

Correlation of a variable with itself through space.

The correlation between an observation’s value on a variable and the value of near-by observations on the same variable.

Correlation =

“similarity”, “association”, or “relationship”

Scatter diagram

Crime rate in near-by area

Crime rate in an area

Briggs Henan University 2010

12

Scatter Diagram : how is it different?

Standard Statistics:

Spatial Autocorrelation: shows the association or relationship between two different variables shows the association or relationship between the same variable in “nearby” areas.

Each point is a geographic location income education

Education

“next door”

In a neighboring or near-b y area education

Briggs Henan University 2010

13

Why is Spatial Autocorrelation Important?

Two reasons

1. Spatial autocorrelation is important because it implies the existence of a spatial process

– Why are near-by areas similar to each other?

– Why do high income people live “next door” to each other?

– These are GEOGRAPHICAL questions.

• They are about location

Infer

Population

Sample

Processes

Create Pattern

2. It invalidates most traditional statistical inference tests

– If SA exists, then the results of standard statistical inference tests may be incorrect (wrong!)

– We need to use spatial statistical inference tests

Briggs Henan University 2010

14

Why are standard statistical tests wrong?

• Statistical tests are based on the assumption that the values of observations in each sample are independent of one another

• spatial autocorrelation violates this

– samples taken from nearby areas are related to each other and are not independent

Values near each other are similar in magnitude.

Implies a relationship between nearby observations

Briggs Henan University 2010

15

Why are standard statistical tests wrong?

Example for the correlation coefficient (r)

What is the correlation coefficient (r)?

• The most common statistic in all of science

• measures the strength of the relationship (or “association”) between two variables e.g. income and education

• Varies on a scale from –1 thru 0 to +1

+1 implies a perfect positive association

As values go up (

) on one, they also go up (

) on the other

• income and education

(  ) (  ) (  ) (  )

0 implies no association -1 0 +1

-1 implies perfect negative association

As values go up on one (

) , they go down (

) on the other

• price and quantity purchased

• Full name is the Pearson Product Moment correlation coefficient ,

Briggs Henan University 2010

16

Examples of Scatter Diagrams and the Correlation Coefficient

Positive

r = 0.26

r = 1 r = 0.72

perfect positive strong positive

Education weak positive

Negative

r = -0.71

r = -1 strong negative

Price perfect negative

17

Briggs Henan University 2010

Why are standard statistical tests wrong?

Example for the correlation coefficient (r)

If Spatial Autocorrelation exists:

1. Correlation coefficients appear to be bigger than they really are, and

2. They are more likely to be found “statistically significant”

You are “fooled twice”:

--you are more likely to incorrectly conclude a relationship exists when it does not

--You believe that the relationship is stronger than it really is

Briggs Henan University 2010

18

Why are standard statistical tests wrong?

Example for the correlation coefficient (r)

If Spatial Autocorrelation exists:

• Correlation coefficients bigger than they really are

• because income and education are similar in near by areas

• Correlation coefficient is

“biased upward”

• Also, more likely to appear “statistically significant”

– standard error is smaller because spatial autocorrelation

“artificially” reduces variability

– there is actually more variability than it appears

• “exagerated precision”

19

Briggs Henan University 2010

Measuring Spatial Autocorrelation:

the problem of measuring

“nearness

or “proximity”

Briggs Henan University 2010 20

Measuring Spatial Autocorrelation: the problem of measuring “nearness

To measure spatial autocorrelation, we must know the “nearness” of our observations

• Which points or polygons are “ near” or “next to” other points or polygons?

Which provinces are near Henan?

–How measure this?

Seems simple and obvious, but it is not!

Briggs Henan University 2010

21

Measuring Spatial Autocorrelation: the Spatial Weights matrix

• Wij the spatial weights matrix measures the relative location of all points i and j,

• Different methods of calculating

W ij can result in different values for autocorrelation and different conclusions from statistical significance tests!

W ij

Admin_Name Anhui Beijing

Anhui 0

Beijing

Chongqing

0

Chongq ing Fujian Gansu

0

0 Fujian

Gansu

Guangdong

Guangxi

0

Guangd ong Guangxi Guizhou Hainan Hebei

0

0

0 Guizhou

Hainan

Hebei

Heilongjiang

Henan

………

Zhejiang

0

0

Heilong jiang Henan ……. Zhejiang

?

0

0

0

Briggs Henan University 2010

22

Measuring Relative Spatial Location:

Contiguity and Distance

Two methods for measuring nearness

1. Weights based on Contiguity--binary (0,1)

– If zone j is next to zone i , it receives a weight of 1

– otherwise it receives a weight of 0,

• It is essentially excluded

– But what constitutes contiguity? Not as easy as it seems!

2. Weights based on Distance—continuous variable

– Measure the actual distance between points, or between polygon centroids

– But what measure do we use, and

– distance to what points -- All? Some?

Briggs Henan University 2010

23

Spatial neighbors based on contiguity* (adjacency)

• Sharing a border or boundary

– Rook: sharing a border

– Queen: sharing a border or a point

* Shares common border rook queen

Which use?

Hexagons Irregular

Briggs Henan University 2010

24

Spatial weights matrix for

Rook

case

• Matrix contains a:

– 1 if share a border

– 0 if do not share a border

4 areal units

A

C

B

D associated geographic connectivity/ weights matrix

4x4 matrix

A B C D

A 0 1 1 0

W = B 1 0 0 1

Common border

C 1 0 0 1

D 0 1 1 0

Briggs Henan University 2010

25

Problem Situations for Irregular Polygons

Many!

“ Close” but no common border

– Include polygons which have a centroid within the “convex hull” for the centroids of polygons that do share a common border

X

Length of border

– Is Shanxi “as close to” Nei Mongol as to Henan?

– Base “closeness” on proportion of shared border, not just one (1) or zero (0)

– w ij

= border length ij

/border length j

)

Briggs Henan University 2010

26

Measuring Contiguity: Lagged Contiguity

Should we include second order contiguity?

1 st order

Nearest neighbor rook hexagon queen

2 nd order

Next nearest neighbor

27

Briggs Henan University 2010

Formats for Weights Matrix

• Raw versus row standardized

• Full contiguity versus sparse contiguity

Briggs Henan University 2010

28

Row-standardized geographic contiguity matrices

A B C

Divide each number by the row sum D E F

Total number of neighbors

--some have more than others

A B C D E F

Row

Sum

A 0 1 0 1 0 0 2

B 1 0 1 0 1 0 3

C 0 1 0 0 0 1 2

D 1 0 0 0 1 0 2

E 0 1 0 1 0 1 3

F 0 0 1 0 1 0 2

Row standardized

--usually use this

A B C D E F

Row

Sum

A 0.0 0.5 0.0 0.5 0.0 0.0

1

B 0.3 0.0 0.3 0.0 0.3 0.0

1

C 0.0 0.5 0.0 0.0 0.0 0.5

1

D 0.5 0.0 0.0 0.0 0.5 0.0

1

E 0.0 0.3 0.0 0.3 0.0 0.3

1

F 0.0 0.0 0.5 0.0 0.5 0.0

1

Briggs Henan University 2010

29

Queens Case

Full Contiguity

Matrix for US

States

• Column headings not shown (same as rows)

• Principal diagonal has 0s

(blanks)

• other 0s omitted for simplicity

• Can be very large, thus inefficient to use.

30

Briggs Henan University 2010

Name

Sparse Contiguity Matrix for US States -- obtained from Anselin's web site (see powerpoint for link)

Fips Ncount N1 N2 N3 N4 N5 N6 N7

1

4

5

6

8

9

10

11

12

13

16

17

18

19

20

21

5

4

3

5

5

3

4

6

4

8

3

4

5

5

3

1

4

6

4

7

6

5

2

5

3

2

7

3

6

3

4

5

5

4

6

2

6

6

3

8

4

2

6

6

2

3

5

6

4

North Dakota

Ohio

Oklahoma

Oregon

Pennsylvania

Rhode Island

South Carolina

South Dakota

Tennessee

Texas

Utah

Vermont

Virginia

Washington

West Virginia

Wisconsin

Wyoming

Alabama

Arizona

Arkansas

California

Colorado

Connecticut

Delaware

District of Columbia

Florida

Georgia

Idaho

Illinois

Indiana

Iowa

Kansas

Kentucky

Louisiana

Maine

Maryland

Massachusetts

Michigan

Minnesota

Mississippi

Missouri

Montana

Nebraska

Nevada

New Hampshire

New Jersey

New Mexico

New York

North Carolina

51

53

54

55

56

47

48

49

50

42

44

45

46

38

39

40

41

34

35

36

37

30

31

32

33

26

27

28

29

22

23

24

25

28

35

22

4

35

44

24

51

13

12

32

29

26

29

40

47

28

33

51

44

18

19

22

5

16

29

6

25

10

48

34

45

46

26

5

6

24

25

13

56

5

22

4

36

47

41

51

26

49

13

8

28

32

4

36

42

24

21

31

29

29

48

1

45

41

21

10

9

39

55

5

40

56

20

4

23

36

40

9

13

27

21

35

32

54

9

37

27

28

5

8

25

37

16

21

17

16

12

49

48

41

20

25

34

37

56

18

17

17

31

18

5

54

36

55

46

1

17

38

8

49

50

42

8

42

47

30

54

48

16

10

19

1

35

35

33

24

24

19

31

47

6

47

40

1

49

55

39

55

8

39

42

50

38

47

21

46

19

16

4

50

51

42

29

53

39

31

37

40

56

54

39

27

8

32

40

31

47

30

19

27

54

11

33

47

56

41

49

25

18

20

36

38

13

32

11

42

46

29

49

53

46

51

20

46

8

34

30

51

16

21

30

56

17

19

21

N8

31

Queens Case

Sparse Contiguity

Matrix for US

States

• Ncount is the number of neighbors for each state

•Max is 8 (Missouri and Tennessee)

•Sum of Ncount is

218

•Number of common borders

29

(joins)

 ncount / 2 = 109

• N1, N2… FIPS codes for neighbors

Briggs Henan University 2010

31

Challenge for You

• Which China province has the most neighbors?

– How many does it have?

• Create contiguity matrices for the Provinces of

China

– Can be done with GeoDA or with ArcGIS

– Or you can do it “by hand”

– Use the software to see if you get it correct

Briggs Henan University 2010

32

Weights Based on Distance again, not that simple

1. Functional Form to use?

2. Distance metric to use ?

3. Which points/polygons to include?

4. How measure distance between polygons?

Briggs Henan University 2010

33

Weights Based on Distance

1. Functional Form

• We want “nearness” not distance distance nearness

• Most common choice is the inverse (reciprocal) of the distance between locations i and j (

• Other functions also used w ij

= 1/d ij

)

– inverse of squared distance ( w ij

– negative exponential ( w ij

= e d

= 1/d ij

2 or w

), or ij

= e d 2

)

Briggs Henan University 2010

34

Weights Based on Distance

2. Distance metric

• 2-D Cartesian distance via Pythagorus d ij

( X i

X j

)

2

( Y i

Y j

)

2

Use for projected data

• 3-D Spherical distance via spherical coordinates

Cos d = (sin a sin b) + (cos a cos b cos P) where: d = arc distance a = Latitude of A b = Latitude of B

P = degrees of long. A to B

Use for unprojected data

• possible distance metrics:

– Euclidean straight line/airline

– city block/manhattan metric

– distance through network

Appropriate if within a city

Briggs Henan University 2010

35

Weights based on Distance

3. What points/polygons to include?

• Distances to all points/polygons?

– If use all, may make it impossible to solve necessary equations: matrix too big

– May not make theoretical sense: effects may only be ‘local’

• Is Henan influenced by Xinjiang?

• Include distance to only the “nth” nearest neighbors

– How many is n? First? Second?

• Include distances to locations only within a buffer distance

36

Weights based on Distance

4. Measuring distance between polygons

– distances usually measured centroid to centroid, but

– could be measured from boundary of one polygon to centroid of others

– could be measured between the two closest boundary points

• adjustment required for contiguous polygons since distance for these would be zero

Briggs Henan University 2010

37

Many decisions!

Many challenges!

That is what makes research fun!

Briggs Henan University 2010 38

What have we learned today?

• The concept of spatial autocorrelation.

– “Near things are more similar than distant things”

• The use of the weights matrix

W ij

“nearness” to measure

• The difficulty of measuring “nearness”

– That is a surprise!

Next Time

• Measures of Spatial Autocorrelation

– Join Count statistic --Geary’s C

– Moran’s I --Getis-Ord G statistic

Briggs Henan University 2010

39

Challenge for You

• Which China province has the most neighbors?

– How many does it have?

• Create contiguity matrices for the Provinces of

China

– Can be done with GeoDA or with ArcGIS

– Or you can do it “by hand”

– Use the software to see if you get it correct

Briggs Henan University 2010

40

Appendix: A Note on Sampling Assumptions

• Another factor which influences results from these tests is the assumption made regarding the type of sampling involved:

– Free (or normality) sampling

• Analogous to sampling with replacement

• After a polygon is selected for a sample, it is returned to the population set

• The same polygon can occur more than one time in a sample

– Non-free (or randomization) sampling

• Analogous to sampling without replacement

• After a polygon is selected for a sample, it is not returned to the population set

• The same polygon can occur only one time in a sample

• The formulae used to calculate test statistics (particularly the standard error) differ depending on which assumption is made

– Generally, the formulae are substantially more complex for randomization sampling—unfortunately, it is also the more common situation!

– Usually, assuming normality sampling requires knowledge about larger trends from outside the region or access to additional information within the region in order to estimate parameters.

41

Briggs Henan University 2010

Briggs Henan University 2010 42

Download