Pitfalls and Potential of Spatial Data - pantherFILE

advertisement
Geographic Information Science
Geography 625
Intermediate
Geographic Information Science
Week2: The pitfalls and potential of spatial data
Instructor: Changshan Wu
Department of Geography
The University of Wisconsin-Milwaukee
Fall 2006
University of Wisconsin-Milwaukee
Geographic Information Science
Outline
1. Introduction
2. The bad news: the pitfalls of spatial data
3. The good news: the potential of spatial
data
University of Wisconsin-Milwaukee
Geographic Information Science
1. Introduction
Why spatial data require spatial analytic techniques, distinct
from standard statistical analysis that might be applied to any
old ordinary data?
Anything special with the spatial data?
Number of
cases of
Lyme
disease
(Huxhold and Martin)
University of Wisconsin-Milwaukee
Geographic Information Science
1. Introduction
Bad news: many of the standard techniques and methods
documented in standard statistics textbooks have significant
problems when we try to apply them to the analysis of the
spatial distributions.
Good news: Geospatial referencing provides us with a number
of new ways of looking at data and the relations among them.
(e.g. distance, adjacency, interaction, and neighbor)
University of Wisconsin-Milwaukee
Geographic Information Science
2. Pitfalls of Spatial Data
Spatial data always violate the fundamental requirement of
conventional statistical analysis
Spatial autocorrelation
Modifiable areal unit problem
Ecology fallacy
Scale
Nonuniformity of space
Edge effect
University of Wisconsin-Milwaukee
Geographic Information Science
2. Pitfalls of Spatial Data
- Spatial autocorrelation
Local D indices
-2.60 - -1.01
-1.00 - -.392
-.391 - .224
.225 - .867
.868 - 1.34
1.35 - 1.91
1.92 - 3.26
Data from locations near
one another in space are
more likely to be similar
than data from locations
remote from one another.
Example:
Housing market
Elevation change
Temperature
City Boundary
Milwaukee River
±
4
2
0
Kilometers
(African American
Population Concentration)
University of Wisconsin-Milwaukee
4
Geographic Information Science
2. Pitfalls of Spatial Data
- Spatial autocorrelation
The nonrandom distribution of phenomena in space has
various consequences for conventional statistic analysis.
1) Biased parameter estimates
2) Data redundancy (affecting the calculation of confidence
intervals
y
x
University of Wisconsin-Milwaukee
Geographic Information Science
2. Pitfalls of Spatial Data
- Spatial autocorrelation
Three general possibilities
Positive autocorrelation: nearby locations are likely to be
similar to one another.
Negative autocorrelation: observations from nearby
observations are likely to be different from one another.
Zero autocorrelation: no spatial effect is discernible, and
observations seem to vary randomly through space
University of Wisconsin-Milwaukee
Geographic Information Science
2. Pitfalls of Spatial Data
- Spatial autocorrelation
Positive
Negative
Zero (Random)
University of Wisconsin-Milwaukee
Geographic Information Science
2. Pitfalls of Spatial Data
- Spatial autocorrelation
Spatial autocorrelation diagnostic measures
Joins count statistics
Moran’s I
Geary’s C
Variogram cloud
University of Wisconsin-Milwaukee
Geographic Information Science
2. Pitfalls of Spatial Data
- Spatial autocorrelation
Spatial autocorrelation structure: spatial variation across a study
area
1) First order spatial variation: occurs when observations
across a study region vary from space to space due to
changes in the underlying properties of the local
“environment”.
2) Second order: due to local interaction effects between
observations.
University of Wisconsin-Milwaukee
Geographic Information Science
2. Pitfalls of Spatial Data
- Modifiable Areal Unit Problem
Many geographic data are aggregates of data at a more
detailed level
 National census: collected at the household level but
reported for practical and privacy reasons at various
levels of aggregation (block, block group, tract, county,
state, etc.)
 Traffic Analysis Zone (TAZ)
 School district
University of Wisconsin-Milwaukee
Geographic Information Science
2. Pitfalls of Spatial Data
- Modifiable Areal Unit Problem
Modifiable Areal Unit Problem: the aggregation units used are
arbitrary with respect to the phenomena under investigation,
yet the aggregation units used will affect statistics determined
on the basis of data reported in this way.
If the spatial units in a particular study were specified
differently, we might observe very different patterns and
relationships.
University of Wisconsin-Milwaukee
Geographic Information Science
2. Pitfalls of Spatial Data
- Modifiable Areal Unit Problem
University of Wisconsin-Milwaukee
Geographic Information Science
2. Pitfalls of Spatial Data
- Modifiable Areal Unit Problem
University of Wisconsin-Milwaukee
Geographic Information Science
2. Pitfalls of Spatial Data
- Modifiable Areal Unit Problem
Openshaw and Taylor (1979) showed that with the same underlying data it is
possible to aggregate units together in ways that can produce correlations
anywhere between -1.0 to +1.0.
University of Wisconsin-Milwaukee
Geographic Information Science
2. Pitfalls of Spatial Data
- Modifiable Areal Unit Problem
Two issues
Scale issue: involves the aggregation of smaller units into
larger ones. Generally speaking, the larger the spatial units,
the stronger the relationship among variables.
Aggregation (smoothed)
University of Wisconsin-Milwaukee
Geographic Information Science
2. Pitfalls of Spatial Data
- Modifiable Areal Unit Problem
Modifiable Area: Units are arbitrary defined and different
organization of the units may create different analytical
results.
University of Wisconsin-Milwaukee
Geographic Information Science
2. Pitfalls of Spatial Data
- Modifiable Areal Unit Problem
Potential problems in almost every field that utilizes spatial data
E.g. boundaries of electoral districts
In the 2000 U.S. presidential election, Al Gore, with more of the
population vote than George Bush, but failed to become president.
A different aggregation of U.S. counties into states could have produced a
different outcome (switch just one northern Florida county to Georgia or
Alabama would have produced a different outcome)
University of Wisconsin-Milwaukee
Geographic Information Science
2. Pitfalls of Spatial Data
- Modifiable Areal Unit Problem
What are the reasons for this problem?
 Problems of data?
 Problems of spatial units?
What are the solutions for this problem?
 Using the most disaggregated data
 Produce a optimal zoning system
 Others?
University of Wisconsin-Milwaukee
Geographic Information Science
2. Pitfalls of Spatial Data
- Ecological Fallacy
The Ecological Fallacy is a situation that can occur when a
researcher or analyst makes an inference about an individual
based on aggregate data for a group.
(Reference: http://jratcliffe.net/research/ecolfallacy.htm)
University of Wisconsin-Milwaukee
Geographic Information Science
2. Pitfalls of Spatial Data
- Ecological Fallacy
Example: we might observe a strong relationship between
income and crime at the county level, with lower-income
areas being associated with higher crime rate.
Conclusion:
1) Lower-income persons are more likely to commit crime
2) Lower-income areas are associated with higher crime rates
3) Lower-income counties tend to experience higher crime rates
University of Wisconsin-Milwaukee
Geographic Information Science
2. Pitfalls of Spatial Data
- Ecological Fallacy
Issues:


Identifying associations between aggregate figures is defective ?
Inferences drawn about associations between the characteristics of an
aggregate population and the characteristics of sub-units within the
population are wrong?
What should we do?
Be aware of the process of aggregating or disaggregating data may
conceal the variations that are not visible at the larger aggregate level
University of Wisconsin-Milwaukee
Geographic Information Science
2. Pitfalls of Spatial Data
- Ecological Fallacy
Relationship between ecological fallacy and modifiable
areal unit problem?
University of Wisconsin-Milwaukee
Geographic Information Science
2. Pitfalls of Spatial Data
- Scale
The geographical scale at which we examine a phenomenon
can affect the observations we make and must always be
considered prior to spatial analysis
Problems of data representation
 Is there an optimal scale?
University of Wisconsin-Milwaukee
Geographic Information Science
2. Pitfalls of Spatial Data
- Nonuniformity of Space
Nonuniformity: space is not uniform
Area with high crime rates?
Crime locations
University of Wisconsin-Milwaukee
Geographic Information Science
2. Pitfalls of Spatial Data
- Edge Effects
Edge effects arise where an artificial boundary is imposed on a
study, often just to keep it manageable.
Spatial interpolation
University of Wisconsin-Milwaukee
Geographic Information Science
3. Potential of Spatial Data
- Introduction
Potential insight provided by examination of the locational
attributes of data
Distance
Adjacency
Interaction
Neighborhood
University of Wisconsin-Milwaukee
Geographic Information Science
3. Potential of Spatial Data
- Distance
Distance between the spatial entities of interest can be
calculated with spatial data
d ij  ( xi  x j )  ( yi  y j )
2
2
Euclidean distance
Network distance
Others (e.g. travel time)
University of Wisconsin-Milwaukee
Geographic Information Science
3. Potential of Spatial Data
- Adjacency
Adjacency can be thought of as the nominal, or binary, equivalent
of distance. Two spatial entities are either adjacent or not.
Can be defined differently
Example 1: two entities are adjacent if they share a common
boundary (e.g. Illinois and Wisconsin)
Example 2: two entities are adjacent if they are within a
specified distance
University of Wisconsin-Milwaukee
Geographic Information Science
3. Potential of Spatial Data
- Interaction
Interaction may be considered as a combination of distance
and adjacency and rests on the intuitively obvious idea that
nearer things are “more related” than distant things, a notion
often referred to as the first law of geography.
1
ij  k
d
ij 
pi p j
dk
University of Wisconsin-Milwaukee
Geographic Information Science
3. Potential of Spatial Data
- Neighborhood
Different definitions
Example 1: a particular spatial entity as the set of all other
entities adjacent to the entity we are interested in.
Example 2: a region of space associated with that entity and
defined by distance from it.
University of Wisconsin-Milwaukee
Geographic Information Science
3. Potential of Spatial Data
- Neighborhood
Adjacency
Distance
Neighborhood
Interaction
University of Wisconsin-Milwaukee
Geographic Information Science
3. Potential of Spatial Data
- Matrix representation
Distance matrix
A
A 0

B 66
C 68

D 68
E 24
F  41
B
C
D
E
F
66
68
68
24
0
51 110 99
51
0
41 
101
116

108
45 

0 
67
91
110 67
0
60
99
60
0
91
101 116 108 45
University of Wisconsin-Milwaukee
Geographic Information Science
3. Potential of Spatial Data
- Matrix representation
Adjacency d<= 50 =
 0 66 68
66 0
51

68 51 0

68 110 67
24 99 91

 41 101 116
68
24
110 99
67
91
0
60
60
0
108 45
41 
101
116

108
45 

0 
A
B
C
D
E
F
A
B
C
*
0

0

0




0
0
*
D
E
F
0
1
0
0
0
0
*
0
0
0
0
*
0
1
0

0

0


*

*
University of Wisconsin-Milwaukee
Geographic Information Science
3. Potential of Spatial Data
- Proximity Polygons
The proximity polygon of any entity is that region of the space
which is closer to the entity than it is to any other.
Applications:
Service area delineation
(e.g. schools, hospital,
supermarket, etc.)
University of Wisconsin-Milwaukee
Geographic Information Science
3. Potential of Spatial Data
- Proximity Polygons
Delaunay triangulation
Potential applications:
1. TIN model
2. Others
University of Wisconsin-Milwaukee
Download