Spatial Statistics and Spatial Knowledge Discovery

First law of geography [Tobler]: Everything is related to everything, but
nearby things are more related than distant things.
Drowning in Data yet Starving for Knowledge [Naisbitt -Rogers]
Lecture 5 : Spatial Regression
Pat Browne
Standard statistical concepts: Regression
• Regression: takes a numerical dataset
and develops a mathematical formula that
fits the data. The results can be used to
predict future behaviour. Works well with
continuous quantitative data like weight,
speed or age. Not good for categorical
data where order is not significant, like
colour, name, gender, nest/no nest.
Example: plotting snowfall against height
above sea level.
Standard statistical concepts:
Regression
Y = A + BX, where Y is the response variable and X is the
continuous explanatory variable. Parameter A is the
intercept and parameter B is the slope. The difference
between each data point and the value predicted by
the line (the model) is called a residual.
Standard statistical concepts:
Regression
The regression equation can be given as:
z_i = β0 + β1 y_i
where z_i is the predicted value for observation i,
y_i is the value of the explanatory variable,
β0 is the intercept and
β1 is the slope coefficient.
Standard statistical concepts:
Regression
Alternative notation for linear regression equation:
Y = a + bX where
•Y is the dependent variable
•a is the intercept
•b is the slope or regression coefficient
•X is the independent variable (or covariate)
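The line Y = a + bX and its residuals can be sketched in a few lines of NumPy. The snowfall and height figures below are hypothetical values invented for illustration, and the closed-form least-squares estimates are one common way of fitting the line, not necessarily the method used in the lecture.

```python
import numpy as np

# Hypothetical data: height above sea level (m) and snowfall (cm).
height = np.array([100.0, 300.0, 500.0, 700.0, 900.0])
snowfall = np.array([12.0, 19.0, 33.0, 38.0, 53.0])

# Least-squares estimates of the slope b and intercept a in Y = a + bX.
x_mean, y_mean = height.mean(), snowfall.mean()
b = np.sum((height - x_mean) * (snowfall - y_mean)) / np.sum((height - x_mean) ** 2)
a = y_mean - b * x_mean

# A residual is the observed value minus the value predicted by the line.
predicted = a + b * height
residuals = snowfall - predicted

print(f"intercept a = {a:.2f}, slope b = {b:.4f}")
print("residuals:", np.round(residuals, 2))
```

With an intercept in the model the residuals sum to zero, which is a quick sanity check on any fitted line.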
Standard statistical concepts: Null
hypothesis
• The null hypothesis, H0, represents a theory that has
been put forward because it is believed to be true
but has not been proved. For example, in a clinical trial
of a new drug, the null hypothesis might be that the new
drug is no better, on average, than the current drug. H0:
there is no difference between the two drugs on average.
• In general, the null hypothesis for spatial data is that
either the features themselves or the values
associated with those features are randomly distributed
(i.e. there is no spatial pattern or bias).
Relation of i.i.d., regression, and correlation with
spatial phenomena.
• The first law of geography according to Waldo Tobler is
"Everything is related to everything else, but near things
are more related than distant things." In statistical terms
this is called autocorrelation. The traditional i.i.d.
assumption is not valid for spatially dependent variables
(e.g. temperature or crime rate), so we need special
techniques to handle this type of data (e.g. Moran’s I).
These techniques usually involve a weight
matrix which contains location information. The non-i.i.d.
nature of spatially dependent variables carries over into
regression and correlation, which also require spatial weights.
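As an illustration of how a weight matrix brings location into a statistic, the sketch below computes global Moran’s I using the usual formula I = (n / S0) Σ_ij w_ij z_i z_j / Σ_i z_i², where z_i is the deviation of value i from the mean and S0 is the sum of all weights. The five-zone layout and both sets of values are hypothetical.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weight matrix w_ij."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()          # deviations from the mean
    s0 = w.sum()              # sum of all weights
    return (len(x) / s0) * (z @ w @ z) / (z @ z)

# Hypothetical example: five zones in a row; adjacent zones are neighbours.
w = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
])
smooth = [1.0, 2.0, 3.0, 4.0, 5.0]        # gradual change over space
alternating = [1.0, 5.0, 1.0, 5.0, 1.0]   # neighbours are dissimilar

i_smooth = morans_i(smooth, w)
i_alternating = morans_i(alternating, w)
print(i_smooth, i_alternating)
```

The smoothly changing values give a positive I (similar values are neighbours), while the alternating values give a negative I.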
Relation of i.i.d., regression, and
correlation with spatial database
• Spatial databases are used for spatial data mining (SDM),
which includes statistical techniques and more
specialised DM techniques such as association rules. In
this case the data mining algorithms need to have a
spatial context. We must explicitly include location
information where previously, with the i.i.d. assumption, it
was not required. Typical generic data mining activities
such as clustering, regression, classification and association
rules all need a spatial context. Spatial DM is used in a
broad range of scientific disciplines, such as analysis of
crime, modelling land prices, poverty mapping,
epidemiology, air pollution and health, natural and
environmental sciences, etc. The analyst must be aware
of the special techniques required for SDM.
Relation of i.i.d., regression, and
correlation with spatial database
• Spatial databases are also used for pure
statistical research (e.g. environmental
studies). Those variables that are spatially
dependent (e.g. the pH of the soil) need to
be clearly identified and special
techniques applied to take into account
their spatial bias.
Unique features of spatial data Statistics
• General statistics assumes the samples
are independently generated, which
may not be the case with spatially dependent
data.
• Like things tend to cluster together.
• Change tends to be gradual over space.
Unique features of spatial data Statistics
Spatial dependent values
• The previous maps illustrate two important
features of spatial data:
• Spatial autocorrelation (values are not independent)
– Two events A and B are independent if the probability
that they both occur is equal to the product of the
probabilities of the two individual events, i.e.
• P(AB) = P(A) × P(B)
• Spatial data is not identically distributed.
– Two events A and B are identically distributed if P(A)
= P(B), i.e. they have the same probability distribution.
Unique features of spatial data Statistics
Autocorrelation & Spatial Heterogeneity.
• Spatial autocorrelation is detected when the value
of a variable in a location is correlated with values of
the same variable in the neighbourhood (can be
measured with Moran’s I).
• Spatial heterogeneity is characterized by different
values or behaviours through space, which can be
measured by Local Indicators of Spatial Association
(LISA). It characterizes the non-stationarity of most
geographic processes, meaning that global
parameters may not accurately reflect the process
occurring at a particular location.
Spatial Heterogeneity.
• Spatial heterogeneity: is there such a thing as an
average place with respect to some property (e.g.
vegetation)? It is difficult to imagine any subset of the
Earth’s surface being a representative sample of the
whole. GWR (later) addresses the localness of
spatial data.
Neighbourhood relationship:
contiguity matrix
Spatial regression (SR)
• Spatial regression (SR) is a global spatial modeling
technique in which spatial autocorrelation among the
regression parameters is taken into account. SR is
usually performed for spatial data obtained from spatial
zones or areas. The basic aim in SR modeling is to
establish the relationship between a dependent variable
measured over a spatial zone and other attributes of the
spatial zone, for a given study area, where the spatial
zones are subsets of the study area. While SR is
known as a modeling method in the spatial data analysis
literature, in the spatial data-mining literature it is considered
to be a classification technique.
Spatial regression (SR)
• The coefficient of determination (COF) of a
linear regression model is the quotient of
the variances of the fitted values and
observed values of the dependent
variable. The COF
Geographically weighted
regression (GWR)
• Geographically weighted regression (GWR) is a powerful
exploratory method in spatial data analysis. It serves for
detecting local variations in spatial behavior and
understanding local details, which may be masked by
global regression models. Unlike SR, where a single regression
coefficient for each independent variable and a single
intercept are obtained for the whole study region, in
GWR regression coefficients are computed for every
spatial zone. Therefore, the regression coefficients can
be mapped and the appropriateness of the stationarity
assumption in conventional regression analysis can
be checked.
Geographically weighted
regression (GWR)
• GWR is an effective technique for exploring
spatial nonstationarity, which is characterized by
changes in relationships across the study region
leading to varying relations between dependent
and independent variables. The need for a
better understanding of spatial processes
has led to the emergence of local modeling
techniques. GWR has been implemented in
various disciplines such as the natural,
environmental, social and earth sciences.
Exploring spatial patterning in
spatial data values
• Two issues
– 1. How do variables change from place to
place? Is a zone similar to its neighbours?
– 2. How are variables related? For example, how does the
relationship between rainfall and altitude vary
from place to place?
Local Statistics: moving window
Geographical Weights
• Binary: rook or queen neighbours
• Distance based
• Boundary or perimeter based
• Weights can be row-normalized using the number of adjacent cells
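A minimal sketch of binary rook and queen weights on a regular grid, followed by row normalization, is given below. The 3 × 3 grid and the function names `grid_contiguity` and `row_normalize` are hypothetical and chosen only for illustration.

```python
import numpy as np

def grid_contiguity(n_rows, n_cols, rule="rook"):
    """Binary contiguity matrix for the cells of a regular grid."""
    n = n_rows * n_cols
    w = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # a cell is not its own neighbour
                    if rule == "rook" and dr != 0 and dc != 0:
                        continue  # rook ignores diagonal neighbours
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n_rows and 0 <= cc < n_cols:
                        w[r * n_cols + c, rr * n_cols + cc] = 1.0
    return w

def row_normalize(w):
    """Divide each row by its sum (for binary weights, the number of neighbours)."""
    counts = w.sum(axis=1, keepdims=True)
    return np.divide(w, counts, out=np.zeros_like(w), where=counts > 0)

rook = grid_contiguity(3, 3, "rook")
queen = grid_contiguity(3, 3, "queen")
rook_normalized = row_normalize(rook)

# The centre cell (index 4) has 4 rook neighbours and 8 queen neighbours.
print(rook[4].sum(), queen[4].sum())
```

After row normalization every row sums to 1, so multiplying the matrix by a vector of values gives the average of each cell's neighbours.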
Local Univariate measures: moving window
• Standard univariate statistics can be computed for a
moving window, supplying the degree and
nature of variation in summary statistics
across a region of interest (e.g. we could
compute the standard deviation for several
windows and assess the degree of
variability from place to place).
• Geographical weighting schemes can be
used for the calculation of local statistics.
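The moving-window idea can be sketched as follows: a square window is centred on each cell in turn and the standard deviation of the values inside it is recorded. The raster values and the 3 × 3 window size are hypothetical, and windows are simply truncated at the edge of the grid, which is only one of several possible edge treatments.

```python
import numpy as np

def moving_window_std(grid, half_width=1):
    """Standard deviation of the values in a square window centred on each cell."""
    grid = np.asarray(grid, dtype=float)
    n_rows, n_cols = grid.shape
    out = np.empty_like(grid)
    for r in range(n_rows):
        for c in range(n_cols):
            # Window bounds, truncated where the window leaves the grid.
            r0, r1 = max(r - half_width, 0), min(r + half_width + 1, n_rows)
            c0, c1 = max(c - half_width, 0), min(c + half_width + 1, n_cols)
            out[r, c] = grid[r0:r1, c0:c1].std()
    return out

# Hypothetical raster: uniform on the left, highly variable on the right.
raster = np.array([
    [5, 5, 5, 1, 9],
    [5, 5, 5, 9, 1],
    [5, 5, 5, 1, 9],
    [5, 5, 5, 9, 1],
])
local_std = moving_window_std(raster)
print(np.round(local_std, 2))
```

Mapping `local_std` shows where variability is low (left-hand columns) and where it is high (right-hand columns), which a single global standard deviation would hide.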
Local spatial autocorrelation
• Global statistics such as Moran’s I can mask
local spatial structure. The local Moran can be
used to measure local spatial autocorrelation.
Only if there is little or no variation in the local
observations do the global observations provide
any reliable information on the local areas within
the study area. As the spatial variation of the
local observations increases, the reliability of the
global observation as representative of local
conditions decreases.
Local spatial autocorrelation
The weights could be based on rook, queen, distance or perimeter, and normalized
by the number of neighbours (slide 28).
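A sketch of the local Moran statistic with row-normalized weights is shown below, using the common form I_i = (z_i / m2) Σ_j w_ij z_j, where z_i is the deviation from the mean and m2 is the mean of the squared deviations. The six zones and their values are hypothetical, and significance testing (as used for LISA maps) is left out.

```python
import numpy as np

def local_morans_i(values, weights):
    """Local Moran's I_i for each location, using row-normalized weights."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum(axis=1, keepdims=True)  # assumes every location has a neighbour
    z = x - x.mean()
    m2 = (z @ z) / len(x)
    return z * (w @ z) / m2

# Hypothetical example: six zones in a row; adjacent zones are neighbours.
w = np.zeros((6, 6))
for i in range(5):
    w[i, i + 1] = w[i + 1, i] = 1.0

values = [10, 11, 12, 30, 11, 10]   # the fourth zone differs from its neighbours
local_i = local_morans_i(values, w)
print(np.round(local_i, 3))
```

Negative values flag locations that differ from their neighbours (here the fourth zone is the most negative), and positive values flag locations that resemble them. With row-normalized weights the mean of the local values equals the global Moran’s I.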
Spatial Regression
• The assumption of i.i.d. underlying ordinary least
squares regression rarely holds for spatial data.
There are several techniques that handle the
spatial case:
– Moving window regression
– Geographically Weighted Regression (GWR)
• We will look at GWR.
• GWR has been used primarily for exploratory
data analysis, rather than hypothesis testing.
Geographically Weighted Regression (GWR)
• The steps are:
1. Go to a location.
2. Conduct a regression using the raw data and
a geographic weighting scheme.
3. Move to the next location and go back to step 2
until all locations have been visited.
• The output is a set of regression
coefficients (e.g. slope and intercept) at
each location.
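The three steps can be sketched as a loop over locations, each iteration solving a weighted least-squares problem. The Gaussian kernel, the bandwidth of 15 units, the function name `gwr` and the synthetic data (a slope that drifts from west to east) are all assumptions made for illustration, and a dedicated GWR package would add bandwidth selection and diagnostics.

```python
import numpy as np

def gwr(coords, x, y, bandwidth):
    """Fit y = intercept + slope * x separately at every observation location."""
    coords = np.asarray(coords, dtype=float)
    design = np.column_stack([np.ones(len(x)), x])   # columns: intercept, x
    coefficients = np.empty((len(coords), 2))
    for i, location in enumerate(coords):            # step 1: go to a location
        dist = np.linalg.norm(coords - location, axis=1)
        w = np.exp(-0.5 * (dist / bandwidth) ** 2)   # assumed Gaussian kernel
        xtw = design.T * w                           # step 2: weighted regression
        coefficients[i] = np.linalg.solve(xtw @ design, xtw @ y)
    return coefficients                              # step 3: all locations visited

# Synthetic data: the true slope rises from 1 in the west to 3 in the east.
rng = np.random.default_rng(0)
n = 200
coords = np.column_stack([rng.uniform(0, 100, n), rng.uniform(0, 100, n)])
x = rng.uniform(0, 10, n)
true_slope = 1.0 + coords[:, 0] / 50.0
y = 2.0 + true_slope * x

coef = gwr(coords, x, y, bandwidth=15.0)
west = coef[coords[:, 0] < 30, 1].mean()
east = coef[coords[:, 0] > 70, 1].mean()
print(f"mean local slope: west {west:.2f}, east {east:.2f}")
```

Each row of `coef` holds the local intercept and slope for one location, so the second column can be mapped to show how the relationship changes across the study area.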
Coords of observations, variables, distance from first
observation, and geographic weights
point  x   y   Var 1  Var 2  dist  Geo w
1      25  45  12     6      0     1
2      25  44  34     52     1     0.995
3      21  48  32     41     5     0.8825
4      27  52  12     25     8     0.7261
5      16  31  11     22     16    0.278
6      42  35  14     9      20    0.0889
7      9   65  56     43     26    0.034
8      29  76  75     67     32    0.006
9      61  66  43     32     42    0.0002
Location of points for previous table
Regression using previous table and locations, the geographic weighting pulls the
line towards the points with larger weights
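The pull of the weights can be checked numerically by fitting the line twice to the tabulated values, once with equal weights and once with the geographic weights. Treating Var 1 as the explanatory variable and Var 2 as the response is an assumption, since the table does not say which is which, and `fit_line` is a hypothetical helper.

```python
import numpy as np

# Values copied from the table above.
var1 = np.array([12, 34, 32, 12, 11, 14, 56, 75, 43], dtype=float)
var2 = np.array([6, 52, 41, 25, 22, 9, 43, 67, 32], dtype=float)
geo_w = np.array([1, 0.995, 0.8825, 0.7261, 0.278, 0.0889, 0.034, 0.006, 0.0002])

def fit_line(x, y, w):
    """Weighted least squares for y = intercept + slope * x."""
    design = np.column_stack([np.ones_like(x), x])
    xtw = design.T * w
    return np.linalg.solve(xtw @ design, xtw @ y)   # (intercept, slope)

unweighted = fit_line(var1, var2, np.ones_like(geo_w))   # ordinary least squares
weighted = fit_line(var1, var2, geo_w)                   # geographically weighted

print("unweighted intercept, slope:", np.round(unweighted, 3))
print("weighted intercept, slope:  ", np.round(weighted, 3))
```

Points 1 to 4 carry almost all of the weight, so the weighted line is determined mainly by them, while points 7 to 9 have very little influence.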
Summary of spatial stats
• Moran’s I measures the average correlation between
the value of a variable at one location and the value at
nearby locations.
• The local Moran statistic measures spatial dependence on a
local basis, allowing the researcher to see its variation
over space.
• Geographically Weighted Regression allows the
parameters of a regression analysis to vary spatially.
GWR helps in detecting local variations in spatial
behavior and understanding local details, which may be
masked by global regression models. In GWR, regression
coefficients are computed for every spatial zone.
© Oxford University Press, 2010. All rights reserved. Lloyd: Spatial Data Analysis
Two scatter plots and fitted lines for different aggregations of same value
Second Law of Geography
• Second law of geography: Spatial heterogeneity
[Goodchild]
• Spatial heterogeneity describes geographic variation
in the constants or parameters of relationships
• When it is present, the outcome of an analysis
depends on the area over which the analysis is made.
• Spatial heterogeneity depends on the spatial
resolution.
• Global model might be inconsistent with respect to a
regional model(s).
Second Law of Geography
• Spatial heterogeneity definitions:
– quantitative information characterizing the
ground spatial structure
– spatial variance distribution of the variable
considered, within the coarse sample
resolution (e.g. pixel or grid)
– The patterning or patchiness in important
landscape properties such as vegetation
cover.
Second Law of Geography
• Spatial heterogeneity has been quantified from
remote sensing images by using two basic
approaches:
• (a) the direct image approach, where straight
reflectance or reflectance indices of remote
sensing images are used to quantify spatial
heterogeneity, using the original pixel size of the
image
• (b) the cartographic or patch mosaic approach,
where the image is subdivided into
homogeneous mapping units through
classification.
Second Law of Geography
• Suppose there is a relationship between the number of AIDS
cases and the number of people living in an area.
• The form of this relationship will vary spatially:
– in some areas the number of cases per capita will be higher
than in others
– we could map the constant of proportionality
• Spatial heterogeneity describes this geographic variation
in the constants or parameters of relationships. When it
is present, the outcome of an analysis depends on the
area over which the analysis is made. Often this area is
arbitrarily determined by a map boundary or political
jurisdiction.
Second Law of Geography
• Second law of geography [Goodchild]
– Spatial heterogeneity
• Global model often inconsistent with
regional models (e.g. the average does
not hold anywhere).
How to decide the weight wij ?
The weight indicates the spatial interaction between entities.
1) Binary wij, also called absolute adjacency. This covers the general
case, answering the question: is a value in a region similar to or
different from its neighbours?
wij = 1 if two geographic entities are adjacent; otherwise, wij = 0.
The adjacency definition can be queen (8 neighbours) or rook (4 neighbours).
How to decide the weight wij ?
The weight indicates the spatial interaction between entities.
2) The distance between geographic entities. Often the inverse
distance is used: more distant objects get less weight and nearer objects get
more weight, e.g. around the centre of an epidemic.
wij = f(dist(i,j)), where dist(i,j) is the distance between i and j.
3) The length of the common boundary for area entities, e.g. when policing
borders, shorter borders get less weight.
wij = f(leng(i,j)), where leng(i,j) is the length of the common boundary
between i and j.
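Both schemes can be sketched with small matrices. The function f is not fixed by the definitions above, so the choices here are assumptions: inverse distance, 1 / dist(i,j), for the distance weights, and each shared boundary expressed as a share of a zone's total shared boundary for the length weights. The centroids and border lengths are hypothetical.

```python
import numpy as np

def inverse_distance_weights(coords, power=1.0):
    """w_ij = 1 / dist(i, j) ** power for i != j, and w_ii = 0."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    w = np.zeros_like(dist)
    apart = dist > 0            # skips the diagonal (and any coincident points)
    w[apart] = 1.0 / dist[apart] ** power
    return w

# Hypothetical zone centroids: zones 0 and 1 are 5 units apart, 0 and 2 are 10 apart.
centroids = [(0, 0), (3, 4), (6, 8)]
w_dist = inverse_distance_weights(centroids)

# Hypothetical shared boundary lengths (km) between three zones.
shared_length = np.array([
    [0.0, 2.0, 0.0],
    [2.0, 0.0, 6.0],
    [0.0, 6.0, 0.0],
])
# Each weight is the neighbour's share of the zone's total shared boundary.
w_boundary = shared_length / shared_length.sum(axis=1, keepdims=True)

print(w_dist)
print(w_boundary)
```

Zone 1 shares three times as much border with zone 2 as with zone 0, so zone 2 receives a weight of 0.75 and zone 0 a weight of 0.25.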
How to decide the weight wij ?
The choice of weights should ultimately be driven by a rationale for including
those areas as neighbors that have a spatial effect on a given location. This
rationale can be derived from theory or be the result of using ESDA to
experiment with different weights and connectivity orders. Since weights
matrices are used to create spatial lags that average neighboring values, the
choice of a weights matrix will determine which neighboring values will be
averaged. For instance, since rook weights will usually have fewer neighbors
than queen weights, on average, each neighboring observation has more
influence.
How to decide the weight wij ?
The question of which weights to choose is more pertinent in the context of
modeling than ESDA since modeling is based on substantive notions of
spatial effects while ESDA prioritizes the rejection of spatial randomness.
Therefore, if there are no substantive reasons to guide the choice of weights
in ESDA, using a weights file with as few neighbors as possible (such as
rook) makes sense. Especially with irregular areal units (as opposed to
grids), the difference between rook and queen weights is often minimal.
However, it is advisable to test how sensitive your results are to your weights
specifications by comparing multiple weights matrices.
Spatial Outlier Detection
• Global outliers are observations which
appear inconsistent with the remainder of
that data set.
• Global outliers deviate so much from other
observations that it may be possible that
they were generated by a different
mechanism.
• Spatial outliers are observations that
appear inconsistent with their neighbours.
Spatial Outlier Detection
• Detecting spatial outliers has important
applications in transportation, ecology,
public safety, public health, climatology
and location based services.
• Geographic objects have a spatial
(location, shape, metric & topological
properties) & non-spatial component
(house owner, sensor id., soil type).
Spatial Outlier Detection
• Spatial neighbourhoods may be defined using
spatial attributes & spatial relations.
• Comparisons between spatially referenced
objects can be based on non-spatial attributes.
• A spatial outlier is a spatially referenced object
whose non-spatial attribute values differ from
those of other spatially referenced objects in its
spatial neighbourhood.
Data for Outlier detection
In diagram on left G,P,S,Q show a big change in attribute for a small change in
location. The right hand diagram shows a normal distribution (corresponds to
attribute axis in left diagram)
Spatial Outlier Detection
• The upper left & lower
right quadrants of
figure 7.17 indicate a
spatial association of
dissimilar values; low
values surrounded by
high value neighbours
(P & Q) and high
values surrounded by
low values (S).
Spatial Outlier Detection
• A Moran outlier is a
point located in the
upper left or lower
right quadrant of a
Moran scatter plot.
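The quadrant rule can be sketched by standardizing the attribute values (z), averaging each location's neighbours (Wz) and labelling every point by the signs of the pair. The six zones, their values and the function name `moran_quadrants` are hypothetical, and no significance test is applied.

```python
import numpy as np

def moran_quadrants(values, weights):
    """Label each location HH, LL, LH or HL from its Moran scatter plot position."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum(axis=1, keepdims=True)   # assumes every location has a neighbour
    z = (x - x.mean()) / x.std()           # horizontal axis: standardized value
    wz = w @ z                             # vertical axis: average of neighbours' z
    return [("H" if zi > 0 else "L") + ("H" if wzi > 0 else "L")
            for zi, wzi in zip(z, wz)]

# Hypothetical example: six zones in a row; adjacent zones are neighbours.
w = np.zeros((6, 6))
for i in range(5):
    w[i, i + 1] = w[i + 1, i] = 1.0

values = [10, 11, 12, 30, 11, 10]
labels = moran_quadrants(values, w)

# LH (upper left) and HL (lower right) are the outlier quadrants.
outliers = [i for i, label in enumerate(labels) if label in ("LH", "HL")]
print(labels, outliers)
```

The fourth zone is labelled HL (a high value among low neighbours) and the zones on either side of it are labelled LH (low values next to a high neighbour).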
Spatial Outlier Detection
[Figure: Moran scatter plot. Horizontal axis: z (values in a given location). Vertical axis: Wz. Quadrants: Q1 = HH (upper right), Q2 = LL (lower left), Q3 = HL (lower right), Q4 = LH (upper left).]
LISA for Crime in Columbus, OH
[Figure: LISA map showing high crime clusters and low crime clusters, and a significance map; only significant values are plotted in each.]
For more detail on LISA, see:
Luc Anselin (1995), Local Indicators of Spatial Association - LISA, Geographical Analysis 27: 93-115.
Model Evaluation
• Consider the two-class classification problem
‘nest’ or ‘no-nest’. The four possible outcomes
(or predictions) are shown on the next slide. The
desired predictions are:
– 1) where the model says there should be a nest and
there is an actual nest (True Positive)
– 2) where the model says there is no nest and there is
no nest (True Negative)
• The other outcomes are not desirable and point
to a flaw in the model.
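The four outcomes can be counted directly from paired observations and predictions. The eight sites below are hypothetical, and the two undesirable outcomes are given their usual names, False Positive and False Negative.

```python
# Hypothetical field survey (actual) and model output (predicted) for eight sites.
actual    = ["nest", "nest", "no-nest", "no-nest", "nest", "no-nest", "no-nest", "nest"]
predicted = ["nest", "no-nest", "no-nest", "nest", "nest", "no-nest", "no-nest", "nest"]

pairs = list(zip(actual, predicted))
tp = sum(a == "nest" and p == "nest" for a, p in pairs)        # True Positive
tn = sum(a == "no-nest" and p == "no-nest" for a, p in pairs)  # True Negative
fp = sum(a == "no-nest" and p == "nest" for a, p in pairs)     # False Positive
fn = sum(a == "nest" and p == "no-nest" for a, p in pairs)     # False Negative

accuracy = (tp + tn) / len(pairs)
print(f"TP={tp} TN={tn} FP={fp} FN={fn} accuracy={accuracy:.2f}")
```

Accuracy is only one possible summary of these counts. For spatial data it also ignores how close a wrong prediction was to an actual nest.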
Model Evaluation
Spatial Statistical Models
• A Point Process is a model for the spatial
distribution of points in a point pattern.
Examples: the position of trees in a forest,
location of petrol stations in a city.
• Actual real world point patterns can be
compared (using distance) with a
randomly distributed point pattern.
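One distance-based comparison, sketched below in the spirit of the Clark-Evans nearest-neighbour ratio, divides the observed mean nearest-neighbour distance by the value expected for a completely random pattern of the same density, 1 / (2 √(n / area)). The point sets are hypothetical and edge effects are ignored.

```python
import numpy as np

def mean_nearest_neighbour_distance(points):
    """Average distance from each point to its closest other point."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)   # a point is not its own neighbour
    return dist.min(axis=1).mean()

def nearest_neighbour_ratio(points, area):
    """Observed mean nearest-neighbour distance divided by the random expectation."""
    density = len(points) / area
    expected = 0.5 / np.sqrt(density)
    return mean_nearest_neighbour_distance(points) / expected

# Hypothetical patterns in a 5 x 5 study area (25 points each).
regular = np.array([(i + 0.5, j + 0.5) for i in range(5) for j in range(5)])
clustered = regular * 0.1            # same layout squeezed into one corner

ratio_regular = nearest_neighbour_ratio(regular, area=25.0)
ratio_clustered = nearest_neighbour_ratio(clustered, area=25.0)
print(ratio_regular, ratio_clustered)
```

A ratio near 1 is consistent with a random pattern, values above 1 suggest a dispersed (regular) pattern such as evenly spaced petrol stations, and values below 1 suggest clustering.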