variogram

Variograms
Brian Klinkenberg
Geography
To get there, we’ll consider:
Correlation
Tobler’s first law of geography
Spatial autocorrelation
Three dimensions of scale
Units of observation
Regionalized variables
Variograms
Correlation
What is correlation? What does it measure?
- [n] a statistical relation between two or more
variables such that systematic changes in the value
of one variable are accompanied by systematic
changes in the other
- [n] a statistic representing how closely two variables
co-vary; it can vary from -1 (perfect negative
correlation) through 0 (no correlation) to +1 (perfect
positive correlation); "what is the correlation
between those two variables?"
- [n] a reciprocal relation between two or more things

[Source: http://www.hyperdictionary.com/dictionary/correlation]
Correlation

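Suppose we have two variables X and Y, with
means XBAR and YBAR respectively and
standard deviations SX and SY respectively. For n
paired observations (x_i, y_i), the sample
correlation is computed as:

r = \frac{1}{n - 1} \sum_{i=1}^{n} \frac{(x_i - \bar{X})(y_i - \bar{Y})}{S_X S_Y}

(with \bar{X} = XBAR, \bar{Y} = YBAR, S_X = SX and S_Y = SY, the
standard deviations being computed with the n - 1 divisor).
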
The correlation coefficient, r, quantifies the
direction and magnitude of correlation.
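
As a rough illustration of that computation, the following sketch computes r as the sample covariance divided by the product of the two sample standard deviations. The function name `pearson_r` and the data values are hypothetical, chosen only for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def pearson_r(x, y):
    """Sample correlation: covariance divided by the product of the standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # Sample covariance, using the n - 1 divisor
    cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)
    # Sample standard deviations, also with the n - 1 divisor (ddof=1)
    return cov_xy / (x.std(ddof=1) * y.std(ddof=1))

# Hypothetical illustrative data: y rises almost linearly with x
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(x, y)
print(f"r = {r:.4f}")
```

The sign of r gives the direction of the relationship and its absolute value the strength; the result should agree with `np.corrcoef(x, y)[0, 1]`.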
Correlation

What is variance? (Contrast with correlation)

The correlation coefficient is the covariance
divided by the product of the standard deviations
of the two variables (i.e., the covariance normalized):

r = \frac{\mathrm{cov}(X, Y)}{S_X S_Y}
Tobler’s first law of geography

“Everything is related to everything else, but near
things are more related than distant things.”

Or, more poetically
"All things by immortal power,
Near or far,
Hiddenly
To each other linkèd are,
That thou canst not stir a flower
Without troubling of a star"
Francis Thompson, "The Mistress of Vision" 1897
Spatial autocorrelation

Spatial autocorrelation extends the concept of
correlation by incorporating space. Note, however,
that the concept of spatial autocorrelation is actually
more closely related to that of variance than of
correlation, since we are examining how a variable
varies with itself through space.
Spatial autocorrelation

“Usually autocorrelation means correlation among
the data from different time periods. Spatial
autocorrelation means correlation among the data
from locations. There could be many dimensions of
spatial autocorrelation, unlike autocorrelation
between periods.”
[http://economics.about.com/od/economicsglossary/g/spatial.htm]
Spatial autocorrelation

“Is a term referring to the degree of relationship that
exists between two or more spatial variables, such
that when one changes, the other(s) also change.
This change can either be in the same direction,
which is a positive autocorrelation, or in the opposite
direction, which is a negative autocorrelation. For
example, soil type and vegetation may be highly
correlated, either positively or negatively depending
upon the type of soil and vegetation under
examination.” [http://www.geo.ed.ac.uk/agidexe/term?66]
Spatial autocorrelation



How to determine paired observations when working in space?
The question of identifying pairs of observations becomes
complex when we move from the traditional statistics world (i.e.,
aspatial) to the real (i.e., spatial) world.
Should we only consider
- nearest neighbours (1st order), or also include
- 2nd order (neighbours of neighbours) and additional higher-order
neighbours?
Should we use
- simple binary adjacency coefficients / weights (1 if immediately
adjacent / shared boundary; 0 if not), or
- distance-weighted measures of adjacency (distance could be
simple geographic, or based on the length of the shared border, or
friction-based geographic, or genetic, or …)?
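
The choices above can be made concrete with a small sketch. It builds three weight matrices for a hypothetical 3 × 3 grid of cell centroids: binary first-order adjacency, second-order adjacency (neighbours of neighbours), and inverse-distance weights. The function names, the distance threshold and the grid are all assumptions made for illustration; in practice the weights would usually come from a GIS.

```python
import numpy as np

def pairwise_distances(coords):
    """Euclidean distance between every pair of points (n x n matrix)."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def binary_adjacency(coords, threshold):
    """w_ij = 1 if i and j are distinct and within `threshold` of each other, else 0."""
    d = pairwise_distances(coords)
    return ((d > 0) & (d <= threshold)).astype(float)

def second_order(w1):
    """Neighbours of neighbours that are not themselves first-order neighbours."""
    reachable_in_two_steps = (w1 @ w1) > 0
    not_self = ~np.eye(len(w1), dtype=bool)
    return (reachable_in_two_steps & (w1 == 0) & not_self).astype(float)

def inverse_distance_weights(coords, power=1.0):
    """w_ij = 1 / d_ij**power for i != j, and 0 on the diagonal."""
    d = pairwise_distances(coords)
    w = np.zeros_like(d)
    off_diagonal = d > 0
    w[off_diagonal] = 1.0 / d[off_diagonal] ** power
    return w

# Hypothetical 3 x 3 grid of unit cells, represented by their centroids
cells = np.array([(i, j) for i in range(3) for j in range(3)], dtype=float)
w1 = binary_adjacency(cells, threshold=1.0)   # rook-style adjacency on this grid
w2 = second_order(w1)
wd = inverse_distance_weights(cells)

print("first-order neighbours of the centre cell:", int(w1[4].sum()))
print("second-order neighbours of a corner cell:", int(w2[0].sum()))
```

A distance threshold stands in for "shared boundary" here only because the cells are on a regular grid; for irregular polygons the adjacency would have to be derived from the actual boundaries.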
Space as a matrix

A matrix W, where w_ij is some measure of interaction between locations i and j




adjacency
decreasing function of distance
invariant under rotation, displacement
readily obtained from a GIS
Applications of the W matrix

Spatial regression



add spatially lagged terms weighted by W
Anselin’s SPACESTAT
Moran and Geary indices of spatial
dependence
c = \frac{(n - 1) \sum_i \sum_j w_{ij} (x_i - x_j)^2}{2 \sum_i \sum_j w_{ij} \sum_i (x_i - a)^2}

(Geary's c, where a is the mean of the x_i)
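
As a hedged sketch of how the two indices behave, the code below computes Moran's I and Geary's c in their standard textbook forms for a hypothetical six-cell transect with binary adjacency between consecutive cells. The function names and the data are illustrative assumptions, not taken from the lecture.

```python
import numpy as np

def morans_i(x, w):
    """Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = x - mean(x)."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    return (x.size / w.sum()) * (z @ w @ z) / (z @ z)

def gearys_c(x, w):
    """Geary's c: (n - 1) * sum_ij w_ij (x_i - x_j)^2 / (2 * S0 * sum_i z_i^2)."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    sq_diff = (x[:, None] - x[None, :]) ** 2
    return (x.size - 1) * (w * sq_diff).sum() / (2.0 * w.sum() * (z @ z))

# Hypothetical transect of 6 cells; consecutive cells are adjacent
n = 6
w = np.zeros((n, n))
idx = np.arange(n - 1)
w[idx, idx + 1] = 1.0
w[idx + 1, idx] = 1.0

gradient = np.arange(1.0, n + 1)                          # smooth trend
alternating = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])  # checkerboard-like

for name, x in [("gradient", gradient), ("alternating", alternating)]:
    print(f"{name}: I = {morans_i(x, w):.3f}, c = {gearys_c(x, w):.3f}")
```

Similar neighbours give a positive I and a c below 1 (the gradient); dissimilar neighbours give a negative I and a c above 1 (the alternating pattern).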
Three dimensions of scale

When we think of scale from a cartographic point of view, we
are discussing only the output scale (a map with a large scale of
1:10,000 would cover just a small area). In ecology, scale is
more often associated with the size of the study area (a
large-scale study would encompass a large area). In landscape
ecology, terms such as grain and extent are often used;
resolution is used in remote sensing; and ‘support’ is a term used in
geostatistics. There is a wide diversity of terms used across a wide
array of disciplines, and they often share similar but not identical
shades of meaning. Always be explicit in your use of a term. When
fully considered, however, we should recognize that there are
multiple dimensions of scale:
Three dimensions of scale

The phenomenon (or process) itself—the size and
phase spacing, and the range of action and extent
of the effect. (Examples: Appalachian Mountains
have an obvious size and phase spacing; clear
cutting has an obvious footprint with an effect that
can extend beyond that footprint). It is also
important to note that patches may have one set of
characteristics (e.g., mean patch size and variance)
and that within a class there will be another set of
characteristics (e.g., mean distance between
patches within a class).
Three dimensions of scale

The sampling units—the sample size and shape, the
spacing of the samples, and the extent of the study
area. (Relates to the fine-scale [high resolution]
variation that can be detected, and to the
large-scale [coarser resolution] variation that may be
detected.)
Three dimensions of scale

The analysis—the size of the analytical units, the
spacing of them and the extent of the analysis can
be different from the scales associated with the
phenomenon being studied (e.g., when computing a
variogram, the rule of thumb is that statistics should
not be calculated for lags greater than one-third to
one-half of the extent of the study domain; if the data
were a satellite image, the samples may be
randomly selected throughout the image and the
pixels may be 30 m or 1.2 km in resolution; also
consider instantaneous field of view [IFOV] issues).
Units of observation

When working in space, the choice of the units of
observation will have a significant impact on the
results. Is the unit of observation smaller
than potential objects of interest, the same size as
the objects of interest, or larger than the objects of
interest? (Think of quadrats, and of what happens when
the sampling point includes a tree; such studies
often require two or more sampling frameworks.) If
larger than the objects of study, then aspects of the
modifiable areal unit problem (MAUP) must be considered.
Regionalized variables

A variable that takes on values according to its
spatial location is known as a regionalized variable.
Considering a variable z measured at location i, we
can partition the total variability in z into three
components:

z(i) = f(i) + s(i) + ε

where f(i) is some coarse-scale forcing or trend in
the data, s(i) is local spatial dependency, and ε is
a random error term (presumed normally distributed).
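
To make the three-part decomposition tangible, here is a minimal simulation along a one-dimensional transect. Everything in it is an assumption made for illustration: the linear form of the trend, the moving-average construction of the spatially dependent component, and all parameter values.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
i = np.arange(n, dtype=float)          # locations along a transect

# f(i): coarse-scale trend (assumed linear here)
f = 10.0 + 0.05 * i

# s(i): local spatial dependency, built by smoothing white noise with a
# moving-average window so that nearby locations share random shocks
window = 15
white = rng.normal(size=n + window - 1)
s = np.convolve(white, np.ones(window) / np.sqrt(window), mode="valid")

# epsilon: spatially uncorrelated, normally distributed error
eps = rng.normal(scale=0.3, size=n)

z = f + s + eps                        # the regionalized variable

def lag1_corr(v):
    """Correlation between each value and its immediate neighbour."""
    return np.corrcoef(v[:-1], v[1:])[0, 1]

print(f"lag-1 correlation of s(i): {lag1_corr(s):.2f}")
print(f"lag-1 correlation of eps:  {lag1_corr(eps):.2f}")
```

The spatially dependent component is strongly correlated with its neighbour, while the error term is not; this difference in neighbour-to-neighbour similarity is the structure that a variogram is designed to pick up.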
Regionalized variables
blue dots represent the data
Regionalized variables
The structural component (e.g., a linear trend)
The spatially correlated component
The random noise component (non-fitted)
Regionalized variables



Regionalized variables are variables that fall
between random variables and completely
deterministic variables.
Typical regionalized variables are functions
describing variables that have geographic
distributions (e.g. elevation of ground surface).
Unlike random variables, regionalized variables
exhibit spatial continuity; however, the change in the
variable is so complex that they cannot be described
by any deterministic function.
The variogram is used to describe regionalized
variables
Variograms
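In mathematical terms, the semi-variogram:

\gamma(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left[ z(i) - z(i + h) \right]^2

Where h represents a distance vector and N(h) is the number of
pairs of observations separated by h.
Recall that the variance is expressed as:

s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} \left[ z(i) - \bar{z} \right]^2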
Variograms
The semi-variogram is
based on modelling the
(squared) differences in
the z-values as a function
of the distances between
all of the known points.
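
A minimal sketch of that idea follows: every pair of points contributes its squared difference in z, the pairs are grouped into distance classes, and half the mean squared difference is reported for each class. The function name, the bin edges and the five-point transect are hypothetical, and this is the simple classical estimator rather than what any particular package implements.

```python
import numpy as np

def empirical_semivariogram(coords, z, bin_edges):
    """Half the mean squared difference in z for point pairs in each distance class.

    Returns the mean pair distance, the semivariance and the number of pairs per class.
    """
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(z, dtype=float)
    # Each unordered pair of points once
    iu, ju = np.triu_indices(len(z), k=1)
    d = np.sqrt(((coords[iu] - coords[ju]) ** 2).sum(axis=1))
    sq_diff = (z[iu] - z[ju]) ** 2

    n_bins = len(bin_edges) - 1
    lags = np.full(n_bins, np.nan)
    gamma = np.full(n_bins, np.nan)
    counts = np.zeros(n_bins, dtype=int)
    for k in range(n_bins):
        in_bin = (d > bin_edges[k]) & (d <= bin_edges[k + 1])
        counts[k] = in_bin.sum()
        if counts[k] > 0:
            lags[k] = d[in_bin].mean()
            gamma[k] = 0.5 * sq_diff[in_bin].mean()
    return lags, gamma, counts

# Hypothetical transect: five points one unit apart, values rising by 1 each step
coords = np.array([(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)], dtype=float)
z = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
lags, gamma, counts = empirical_semivariogram(coords, z, bin_edges=[0.5, 1.5, 2.5, 3.5])
for h, g, c in zip(lags, gamma, counts):
    print(f"lag {h:.1f}: gamma = {g:.2f} from {c} pairs")
```

Because the example data are a pure trend, the semivariance keeps growing with lag and never levels off at a sill.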
Variograms
In graphical terms:
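Why ‘semi’-variogram? For a stationary variable, as the lag distance
approaches infinity and the spatial correlation dies away, the expected
squared difference between two values converges to twice the variance.
Therefore, dividing by 2 means that the sill approximates the variance.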
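
The step behind this statement can be written out briefly. The derivation assumes a second-order stationary variable Z with constant mean, variance \sigma^2 and covariance function C(h); these symbols are introduced here for illustration and do not appear in the slides.

```latex
\begin{align*}
E\left[(Z(x) - Z(x+h))^2\right]
  &= \operatorname{Var}[Z(x)] + \operatorname{Var}[Z(x+h)]
     - 2\operatorname{Cov}[Z(x), Z(x+h)] \\
  &= 2\sigma^2 - 2C(h), \\
\gamma(h) = \tfrac{1}{2}\, E\left[(Z(x) - Z(x+h))^2\right]
  &= \sigma^2 - C(h) \;\longrightarrow\; \sigma^2
  \quad \text{as } C(h) \to 0 .
\end{align*}
```

The first line uses the constant mean, so that the expected difference is zero and the expected squared difference equals the variance of the difference.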
Variograms
This is an example of
a variogram produced
using ArcGIS's
Geostatistical Analyst.
Variograms

Statistical assumptions:





Stationarity: the mean and variance are not a function of location.
Second-order (weak) stationarity is required: the mean is constant and
the covariance between two values depends only on their separation
distance, not on where they are located.
Isotropy: the spatial dependence is the same in all directions (as
contrasted with anisotropy). However, you can compute directional
variograms in order to assess directional trends in the data.
Unbounded variograms (i.e., with no sill) are evidence of non-stationary
variables.
Trend surface analysis can be used to remove global trends in the data (to
transform a non-stationary variable [mean varies across space] to a
stationary one).
Lag distances: typically we group the distance intervals into classes so
that we can have enough pairs of sample points within any one distance
class (typically 30 pairs is suggested as the minimum number).
Small-scale (high resolution) variation (at the resolution implied by the
original sampling scheme) may not be detectable as a result.
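
The trend-removal step mentioned above can be sketched as a first-order trend surface: fit a plane to the data by least squares and carry the residuals forward to the variogram. The function name, the choice of a linear (rather than higher-order) surface and the simulated data are assumptions made for illustration.

```python
import numpy as np

def fit_linear_trend_surface(coords, z):
    """Least-squares plane z ~ b0 + b1*x + b2*y; returns (coefficients, residuals)."""
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(z, dtype=float)
    design = np.column_stack([np.ones(len(z)), coords[:, 0], coords[:, 1]])
    beta, *_ = np.linalg.lstsq(design, z, rcond=None)
    residuals = z - design @ beta
    return beta, residuals

# Hypothetical data: a planar trend plus uncorrelated noise
rng = np.random.default_rng(1)
coords = rng.uniform(0, 100, size=(50, 2))
trend = 2.0 + 0.5 * coords[:, 0] - 0.3 * coords[:, 1]
z = trend + rng.normal(scale=1.0, size=50)

beta, residuals = fit_linear_trend_surface(coords, z)
print("fitted coefficients (b0, b1, b2):", np.round(beta, 3))
print("mean of residuals:", round(float(residuals.mean()), 6))
```

The residuals have a mean of zero by construction, so a variogram computed on them reflects the remaining local structure rather than the global trend.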
Variograms



The technique can provide a quantification of the scale of
variability exhibited by natural patterns of resource distributions
(although correlograms may be better for this, since you can
conduct statistical tests on the results) and an identification of the
spatial scale at which the sampled variable exhibits maximum
variance.
At larger lag distances (beyond the natural ‘scale’ of the
phenomenon) harmonic effects can be noted, in which the
variogram peaks or dips at lag distances that are multiples of the
natural scale.
Given the noise present in natural environmental data sets, it is
unlikely that you will be able to clearly identify multiple scales.
One approach might be to fit a semivariogram model to the data,
and to examine the residuals for the presence of multiple
patterns of scale.
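
One way to try the model-fitting approach is sketched below, assuming a spherical model with a nugget. For each candidate range the model is linear in the nugget and the partial sill, so those two are found by ordinary least squares and the range with the smallest squared error is kept. The function names, the candidate ranges and the synthetic semivariogram values are hypothetical; the fit is unconstrained, so with noisy data a negative nugget is possible and would need handling.

```python
import numpy as np

def spherical_shape(h, a):
    """Spherical model shape: rises from 0 at h = 0 to 1 at h = a, then stays at 1."""
    r = np.minimum(np.asarray(h, dtype=float) / a, 1.0)
    return 1.5 * r - 0.5 * r ** 3

def fit_spherical(lags, gamma, candidate_ranges):
    """Grid search over the range; least squares for nugget and partial sill."""
    lags = np.asarray(lags, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    best = None
    for a in candidate_ranges:
        design = np.column_stack([np.ones_like(lags), spherical_shape(lags, a)])
        coef, *_ = np.linalg.lstsq(design, gamma, rcond=None)
        resid = gamma - design @ coef
        sse = float(resid @ resid)
        if best is None or sse < best[0]:
            best = (sse, float(a), float(coef[0]), float(coef[1]), resid)
    _, a, nugget, partial_sill, resid = best
    return nugget, partial_sill, a, resid

# Synthetic semivariogram values generated from a known spherical model
lags = np.arange(1.0, 21.0)
gamma = 0.2 + 1.0 * spherical_shape(lags, 8.0)

nugget, partial_sill, a, resid = fit_spherical(lags, gamma, np.arange(2.0, 20.5, 0.5))
print(f"nugget = {nugget:.3f}, partial sill = {partial_sill:.3f}, range = {a:.1f}")
```

With real data, the returned residuals are what would be inspected for leftover periodic peaks or dips that a single-scale model cannot absorb.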
Variograms
http://zappa.nku.edu/~longa/geomed/modules/geostats_lite/lec/illinois.html
Kriging

Kriging is a spatial interpolation technique based on
semi-variograms. Unlike deterministic interpolation
techniques (e.g., a TIN), kriging also provides a map of
the prediction standard error, which shows you the
uncertainty associated with each prediction.
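
The link between the semi-variogram and the uncertainty map can be sketched with ordinary kriging at a single target location. The sketch assumes a spherical semivariogram with known parameters and four hypothetical data points; it solves the standard ordinary kriging system and returns both the prediction and the kriging variance. It is an illustration only, not the procedure used by any particular GIS package.

```python
import numpy as np

def spherical_gamma(h, nugget, partial_sill, a):
    """Spherical semivariogram; gamma(0) = 0 by definition."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / a, 1.0)
    g = nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)
    return np.where(h > 0, g, 0.0)

def ordinary_kriging(coords, z, target, gamma):
    """Predict z at `target`; returns (prediction, kriging variance)."""
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(z, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(z)
    # Distances among data points, and from each data point to the target
    d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
    d0 = np.sqrt(((coords - target) ** 2).sum(axis=-1))
    # Kriging system: semivariances bordered by the unbiasedness constraint
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = gamma(d)
    A[:n, n] = 1.0
    A[n, :n] = 1.0
    b = np.append(gamma(d0), 1.0)
    solution = np.linalg.solve(A, b)
    weights, lagrange = solution[:n], solution[n]
    prediction = float(weights @ z)
    variance = float(weights @ b[:n] + lagrange)
    return prediction, variance

# Hypothetical data at the corners of a unit square
coords = np.array([(0, 0), (1, 0), (0, 1), (1, 1)], dtype=float)
z = np.array([1.0, 2.0, 3.0, 4.0])

def model(h):
    # Assumed variogram parameters, for illustration only
    return spherical_gamma(h, nugget=0.0, partial_sill=1.0, a=3.0)

pred, var = ordinary_kriging(coords, z, (0.5, 0.5), model)
print(f"centre: prediction = {pred:.3f}, standard error = {np.sqrt(var):.3f}")
pred0, var0 = ordinary_kriging(coords, z, (0, 0), model)
print(f"at a data point: prediction = {pred0:.3f}, variance = {var0:.3f}")
```

Repeating the calculation over a grid of target locations yields the two surfaces usually shown side by side: the prediction map and the prediction standard error map. With a zero nugget the variance falls to zero at the data points and grows with distance from them.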
Kriging
TIN
Prediction standard error map
Prediction map
Integrating GIS and spatial
statistics

Role of space
- an organizing dimension for information
- a source of context and linkage
- an explanatory variable
- a problem

Terminology
- lattice, support, drift, topology, layer, coverage, region

Software as glue
- within what conceptual framework?