
Spatial Interpolation
Lecture 7-8
Purposes
• Estimating or interpolating values at
unsampled sites within the area covered
by existing observations
– Visiting every location is usually difficult or
expensive.
– Assumption: spatially distributed objects are
spatially correlated; in other words, things that are
close together tend to have similar
characteristics (the first law of geography, also
called spatial autocorrelation).
Example
• A few sample points are used to fill all cells
Example: point elevations to a surface
Deterministic or Geostatistical
• Deterministic interpolation is directly based on the
surrounding measured values or on specified
mathematical formulas that determine the smoothness of
the resulting surface.
• Geostatistical interpolation is based on statistical models
that include autocorrelation (statistical relationships
among the measured points).
• In general, deterministic methods are less accurate but less
computationally expensive; geostatistical methods are more
accurate but more computationally expensive.
Global interpolation
• Global interpolators determine a single
function that is mapped across the
whole region
– A change in one input value affects the entire
map
– Global algorithms tend to produce smoother
surfaces with less abrupt changes; they are used
when there is a hypothesis about the form of
the surface, e.g. a trend
Local interpolation
• Local interpolators apply an algorithm repeatedly
to a small portion of the total set of points. On
average, values at points closer in space are
more likely to be similar than values at points farther apart
(spatial autocorrelation)
– A change in an input value only affects the result
within the window
• Two important steps for local interpolation
– Define sampling neighborhood
– Find points (samples) in the neighborhood
If no directional influence
• If there are no directional influences in the data, you want to give equal weight to sample points regardless of their direction from the prediction location. This means that you probably want your neighborhood to be a circle.
If directional influence
• If there is directional influence in your data (such as might be caused by water draining downhill), then you may want to make the neighborhood an ellipse with the major axis running uphill/downhill.
Sample points
• Once a shape is specified, you can restrict which sample points within the neighborhood are used. You do this by specifying the maximum and minimum numbers of points to use and by dividing the neighborhood into sectors. If the neighborhood is sectored, the maximum and minimum constraints are applied to each sector.
1. Explore data before interpolation
• Before creating a surface, the Explore Data (ED) tools enable
you to gain a deeper understanding of the phenomena
you are investigating so that you can make better
decisions on issues relating to your data.
• This includes exploring the distribution of the data, looking for global
and local outliers, looking for global trends, examining
spatial autocorrelation, and understanding the covariation
among multiple datasets.
• Tools include Histogram, Voronoi Map, Normal QQPlot,
Trend Analysis, Semivariogram/Covariance Cloud,
General QQPlot, and Crosscovariance Cloud.
• These tools work only on point and polygon layers.
1.1 Histogram
• Provides a univariate (one-variable) description of your data: displays the frequency distribution for the dataset of interest and calculates summary statistics.
• Measures of location: mean, median, and quartiles
• Measures of spread: standard deviation, variance
• Measures of shape: skewness, kurtosis
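The Histogram tool's summary statistics can be reproduced outside ArcGIS; a minimal sketch with NumPy/SciPy, using a small hypothetical sample array:

```python
# Sketch only (not the ArcGIS tool): the same summary statistics the
# Histogram tool reports, computed with NumPy/SciPy on made-up values.
import numpy as np
from scipy import stats

values = np.array([3.1, 4.7, 5.2, 4.9, 6.3, 5.8, 4.4, 7.1, 5.0, 4.6])

# Measures of location
mean = values.mean()
median = np.median(values)
q1, q3 = np.percentile(values, [25, 75])

# Measures of spread
std = values.std(ddof=1)          # sample standard deviation
var = values.var(ddof=1)          # sample variance

# Measures of shape
skewness = stats.skew(values)
kurtosis = stats.kurtosis(values)  # excess kurtosis (normal = 0)

print(mean, median, q1, q3, std, var, skewness, kurtosis)
```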
1.2 QQPlot
The quantile-quantile (q-q) plot is a graphical technique for determining if two data sets
come from populations with a common distribution.
A q-q plot is a plot of the quantiles of the first data set against the quantiles of the second
data set. By a quantile, we mean the fraction (or percent) of points below the given
value. That is, the 0.3 quantile (or 30%) is the point at which 30% of the data fall
below and 70% fall above that value.
The 0.3 quantile is also called the 30th percentile, and the 0.4 quantile the 40th percentile.
The 25th percentile is called the first quartile.
The 50th percentile is called the second quartile (equal to the median value).
The 75th percentile is called the third quartile.
The 100th percentile is called the fourth quartile.
The q-q plot is used to answer the following questions:
• Do two data sets come from populations with a common distribution?
• Do two data sets have common location and scale?
• Do two data sets have similar distributional shapes?
• Do two data sets have similar tail behavior?
• QQPlots are graphs on which quantiles from two distributions are plotted relative to each other.
• A cumulative distribution is produced by ordering the data and producing a graph of the ordered values versus cumulative distribution values.
• Normal QQPlot: compares the quantiles of the data against the quantiles of a standard normal distribution.
• General QQPlot: compares the quantiles of two datasets.
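A minimal sketch of how q-q plot points are built, using hypothetical sample arrays (a General QQPlot pairs the quantiles of two datasets; a Normal QQPlot pairs one dataset against normal quantiles):

```python
# Sketch only: quantiles of one dataset plotted against quantiles of
# another at matching probabilities; "a" and "b" are made-up samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(10.0, 2.0, size=200)   # dataset 1
b = rng.normal(12.0, 2.0, size=150)   # dataset 2

probs = np.linspace(0.01, 0.99, 50)
qa = np.quantile(a, probs)            # quantiles of dataset 1
qb = np.quantile(b, probs)            # quantiles of dataset 2

# Points (qa, qb) falling on a straight line suggest a common
# distributional shape. For a Normal QQPlot, compare against the
# quantiles of a normal distribution instead:
q_normal = stats.norm.ppf(probs, loc=a.mean(), scale=a.std(ddof=1))
```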
1.3 Voronoi map
• Voronoi maps are constructed from a series of polygons (Thiessen polygons) formed around the location of each sample point.
• The value for each polygon can be calculated using any of these methods: simple, mean, mode, cluster, entropy, median, standard deviation, IQR (interquartile range)
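A minimal sketch of the "simple" Voronoi assignment (each grid cell takes the value of the nearest sample, i.e., of the Thiessen polygon it falls in), using SciPy's k-d tree and hypothetical sample points:

```python
# Sketch only: nearest-sample ("simple" Voronoi) assignment on a grid.
import numpy as np
from scipy.spatial import cKDTree

xy = np.array([[0.0, 0.0], [5.0, 1.0], [2.0, 4.0], [4.0, 5.0]])  # sample points
z = np.array([10.0, 14.0, 9.0, 12.0])                            # sample values

tree = cKDTree(xy)

# Regular grid covering the study area
gx, gy = np.meshgrid(np.linspace(0, 5, 50), np.linspace(0, 5, 50))
grid_xy = np.column_stack([gx.ravel(), gy.ravel()])

_, nearest = tree.query(grid_xy)          # index of nearest sample point
voronoi_surface = z[nearest].reshape(gx.shape)
```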
1.4 Trend analysis
• You may be interested in
mapping a trend, or you
may wish to remove a
trend from the dataset
before using kriging. The
Trend Analysis tool can
help identify global trends
in the input dataset.
Global trend and anisotropy
• A global trend is an overriding process that
affects all measurements and can be represented
by a mathematical formula (a polynomial).
• Anisotropy is a random process that shows
higher autocorrelation in one direction than in
another (directional autocorrelation or
directional influence). The reason for a
directional influence may not be known, but
it can be statistically quantified.
1.5 Semivariogram/covariance cloud
• The semivariogram/covariance cloud shows the empirical semivariogram (half of the squared difference) and covariance for all pairs of locations within a dataset and plots them as a function of the distance between the two locations.
• The empirical semivariogram for the (i,j)th pair is simply 0.5*(z(si) - z(sj))^2, and the empirical covariance is the cross-product (z(si) - z̄)(z(sj) - z̄), where z̄ is the average of the data.
• The semivariogram/covariance cloud can be used to examine the local characteristics of spatial autocorrelation within a dataset and look for outliers.
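A minimal sketch of computing the cloud values for hypothetical sample points (not the Geostatistical Analyst tool itself):

```python
# Sketch only: semivariogram/covariance cloud from made-up samples.
import numpy as np
from itertools import combinations

xy = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 2.0], [3.5, 1.0], [4.0, 4.0]])
z = np.array([10.0, 11.0, 9.5, 13.0, 12.0])
zbar = z.mean()

dists, semivar, cov = [], [], []
for i, j in combinations(range(len(z)), 2):
    h = np.linalg.norm(xy[i] - xy[j])             # separation distance
    dists.append(h)
    semivar.append(0.5 * (z[i] - z[j]) ** 2)      # empirical semivariogram value
    cov.append((z[i] - zbar) * (z[j] - zbar))     # covariance cross-product
```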
Creating Variography
The semivariogram depicts the spatial autocorrelation.
Understanding a semivariogram: range, sill, and nugget
• The distance where the model first flattens out is known as the range.
• The value at which the model attains the range is called the sill.
• The value at which the model intercepts the y-axis is called the nugget.
Fitting the semivariogram
• Circular, spherical, exponential,
Gaussian, and linear.
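As an illustration of one of these model forms, a minimal sketch of the spherical semivariogram written in terms of the nugget, sill, and range defined above:

```python
# Sketch only: the spherical semivariogram model as a function of
# separation distance h, parameterized by nugget, sill, and range.
import numpy as np

def spherical(h, nugget, sill, a_range):
    """Rises from the nugget (y-intercept) and flattens out at the
    sill once the distance h reaches the range."""
    h = np.asarray(h, dtype=float)
    gamma = nugget + (sill - nugget) * (1.5 * h / a_range - 0.5 * (h / a_range) ** 3)
    return np.where(h < a_range, gamma, sill)

# Near nugget at h=0, near sill at and beyond the range
print(spherical([0.0, 50.0, 200.0], nugget=0.1, sill=1.0, a_range=100.0))
```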
Making a prediction
• Prediction is based on the semivariogram
model and the measured values that are
nearby (using a search radius)
• Fixed search radius: requires a distance and a
minimum number of points
• Variable search radius: the number of points
needs to be specified. You can also specify a
maximum distance (radius) that the search
radius cannot exceed.
To explore a directional influence in the
semivariogram cloud, we use the
Search Direction tools
• The direction the pointer is facing
determines which pairs of data
locations are plotted on the
semivariogram.
• Lag size is the size of a lag
distance, used to reduce the large
number of possible pair combinations. It
is the size of the cells in the
semivariogram surface.
• The number of cells is called the number of
lags, counted as the number of
adjacent cells in a straight
horizontal or vertical line from the
center to the edge of the figure.
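A minimal sketch of how pairs are grouped into lags to form an empirical (binned) semivariogram, with hypothetical locations and values and an assumed lag size of 10 and 10 lags:

```python
# Sketch only: bin the semivariogram cloud into lags and average each bin.
import numpy as np
from scipy.spatial.distance import pdist

xy = np.random.default_rng(1).uniform(0, 100, size=(50, 2))       # made-up locations
z = np.random.default_rng(2).normal(0, 1, size=50)                # made-up values

h = pdist(xy)                                        # pairwise distances
g = 0.5 * pdist(z[:, None], metric="sqeuclidean")    # 0.5*(zi - zj)^2 per pair

lag_size, n_lags = 10.0, 10
edges = np.arange(n_lags + 1) * lag_size
which = np.digitize(h, edges) - 1                    # lag bin for each pair

# Average of the cloud values within each lag bin
emp_gamma = [g[which == k].mean() for k in range(n_lags) if np.any(which == k)]
```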
1.6 Crosscovariance cloud
• The crosscovariance cloud shows the empirical crosscovariance for all pairs of locations between two datasets and plots them as a function of the distance between the two locations.
• Let z(si) denote the value at the i-th location in dataset 1, and let y(tj) denote the value at the j-th location in dataset 2.
Relation between variogram and
covariance
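A standard statement of this relation, assuming second-order stationarity (so that the sill equals C(0)):

```latex
\gamma(h) \;=\; C(0) - C(h) \;=\; \mathrm{sill} - C(h)
```

That is, pairs of locations that are strongly correlated have small semivariogram values, and vice versa.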
2. Inverse Distance
Weighted (IDW)
• Each sample point has a local influence that
diminishes with distance.
• Weights the points closer to the processing
cell more heavily than those farther away.
• The operator controls how weighting is done.
– Power.
• A higher power gives more weight to closer points.
– Radius type.
• Controls how far away to look.
– Barrier.
• The search can be limited by polygon or polyline features.
IDW Characteristics
• Inverse Distance Weighting (IDW) is a
quick deterministic interpolator that is
exact. There are very few decisions to
make regarding model parameters. It can
be a good way to take a first look at an
interpolated surface. However, there is no
assessment of prediction errors, and IDW
can produce "bulls eyes" around data
locations. There are no assumptions
required of the data.
IDW: a local interpolator
The predicted value is a weighted average of the nearby samples, ẑ(s0) = Σ wi·z(si), with weights wi proportional to 1/di^r, where di is the distance from sample i to the prediction location and r is the power.
We usually use power functions greater than 1.
r = 2 is known as inverse distance squared weighted interpolation.
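A minimal sketch of an IDW prediction at a single location, assuming no barriers and using all sample points rather than a search neighborhood; the sample data and the idw_predict helper are hypothetical:

```python
# Sketch only: IDW prediction at one location.
import numpy as np

def idw_predict(xy, z, x0, power=2.0):
    """Inverse distance weighted prediction at location x0."""
    d = np.linalg.norm(xy - x0, axis=1)
    if np.any(d == 0):                 # exact interpolator: honor the data
        return z[np.argmin(d)]
    w = 1.0 / d ** power               # weights fall off with distance
    return np.sum(w * z) / np.sum(w)

xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
z = np.array([1.0, 2.0, 3.0, 4.0])
print(idw_predict(xy, z, np.array([0.25, 0.25]), power=2.0))
```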
Advantages and disadvantages
3. Global Polynomial interpolator
• Global Polynomial (GP) is a quick deterministic
global interpolator that is smooth (inexact).
There are very few decisions to make regarding
model parameters. It is best used for surfaces
that change slowly and gradually. However,
there is no assessment of prediction errors and it
may be too smooth. Locations at the edge of
the data can have a large effect on the surface.
There are no assumptions required of the data.
GP interpolation fits a polynomial
regression to the x,y coordinates
First order: Z = b0 + b1·x + b2·y
Second order: adds x^2, y^2, and x·y terms
Third order: adds x^3, y^3, x^2·y, and x·y^2 terms
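A minimal sketch of fitting a first-order trend surface by least squares; the coordinates and values are simulated, and higher orders would simply add x^2, y^2, x·y, ... columns to the design matrix:

```python
# Sketch only: first-order global polynomial (trend surface) fit.
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.uniform(0, 10, 30), rng.uniform(0, 10, 30)
z = 2.0 + 0.5 * x - 0.3 * y + rng.normal(0, 0.2, 30)    # sloping surface + noise

A = np.column_stack([np.ones_like(x), x, y])             # first-order design matrix
coef, *_ = np.linalg.lstsq(A, z, rcond=None)             # b0, b1, b2

# Predict on a grid: Z = b0 + b1*x + b2*y
gx, gy = np.meshgrid(np.linspace(0, 10, 50), np.linspace(0, 10, 50))
trend = coef[0] + coef[1] * gx + coef[2] * gy
```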
4. Local Polynomial interpolator
• Global Polynomial interpolation is the only method in Geostatistical
Analyst that does not use a search neighborhood. If you add the
idea of a search neighborhood to Global Polynomial interpolation,
you get Local Polynomial interpolation.
• Local Polynomial (LP) is a moderately quick deterministic
interpolator that is smooth (inexact). It is more flexible than the
global polynomial method, but there are more parameter decisions.
There is no assessment of prediction errors. The method provides
prediction surfaces that are comparable to kriging with measurement
errors. Local polynomial methods do not allow you to investigate the
autocorrelation of the data, making it less flexible and more
automatic than kriging. There are no assumptions required of the
data.
• Local polynomial interpolation
creates a surface from many
different formulas, each of which is
optimized for a neighborhood
• The neighborhood shape,
maximum and minimum number of
points, and sector configuration can
be specified. In addition, as with
IDW, the sample points in a
neighborhood can be weighted by
their distance from the prediction
location. Thus, local polynomial
interpolation produces surfaces that
better account for local variation.
5. Radial Basis Functions
• Radial Basis Functions (RBF) are moderately quick
deterministic interpolators that are exact. They are much
more flexible than IDW, but there are more parameter
decisions. There is no assessment of prediction errors.
The method provides prediction surfaces that are
comparable to the exact form of kriging. Radial Basis
Functions do not allow you to investigate the
autocorrelation of the data, making it less flexible and
more automatic than kriging. Radial Basis Functions
make no assumptions about the data.
• RBF is conceptually like a rubber membrane fitted through each
of the measured data points while minimizing the total
curvature of the surface. Because the surface must pass
through each sampled point, radial basis functions are
exact interpolators.
Radial basis functions (Spline)
• Radial basis functions (RBF)
methods are a series of exact
interpolation techniques, that
is, the surface must go through
each measured sample value.
• There are five different basis
functions: thin-plate spline,
spline with tension, completely
regularized spline,
multiquadric function, and
inverse multiquadric spline
• RBFs are conceptually similar
to fitting a rubber membrane
through the measured sample
values while minimizing the
total curvature of the surface.
http://www.math.ucla.edu/~baker/java/hoefer/Spline.htm
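A minimal sketch using SciPy's RBFInterpolator (SciPy 1.7+) as a stand-in for these spline methods; the sample locations and values are hypothetical, and smoothing=0 makes it an exact interpolator:

```python
# Sketch only: thin-plate spline RBF surface from scattered samples.
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
xy = rng.uniform(0, 10, size=(25, 2))            # made-up sample locations
z = np.sin(xy[:, 0]) + 0.5 * xy[:, 1]            # made-up sample values

rbf = RBFInterpolator(xy, z, kernel="thin_plate_spline", smoothing=0.0)

gx, gy = np.meshgrid(np.linspace(0, 10, 50), np.linspace(0, 10, 50))
grid = np.column_stack([gx.ravel(), gy.ravel()])
surface = rbf(grid).reshape(gx.shape)            # can over/undershoot the data range
```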
IDW will never predict values above
the maximum measured value or below
the minimum measured value.
Splines can.
When running Spline in Spatial Analyst
• Weight:
- regularized: 0, 0.001, 0.01, 0.1, 0.5 (the higher the weight, the smoother the surface)
- tension: 0, 1, 5, 10 (the higher the weight, the coarser the surface)
• Number of points:
- the number of sample points used in the calculation; the more points, the smoother the surface
Bias or error
6. Kriging
• Kriging is a moderately quick interpolator that can be
exact or smoothed depending on the measurement error
model. It is very flexible and allows you to investigate
graphs of spatial autocorrelation. Kriging uses statistical
models that allow a variety of map outputs including
predictions, prediction standard errors, probability, etc.
The flexibility of kriging can require a lot of decision-making. Kriging assumes the data come from a
stationary stochastic process, and some methods
assume normally-distributed data.
Kriging assumptions
• Spatially continuous data (not event or
discrete data).
– If your data are discrete, density analysis
might be a more appropriate analysis.
• Spatial autocorrelation (the closer in
distance, the closer in value, assessed using the
semivariogram)
– If your data are not spatially autocorrelated,
kriging is not appropriate. Other statistical
methods may be appropriate.
Cont’
• Stationarity (values depend on distance, not on location).
– You can test this using a Voronoi map; if high local
variation is shown in the Voronoi map, you
should use IDW or other methods.
• Normally distributed data
– Use a QQ plot to test
• Global trends (not allowed and need to be removed
first)
– Kriging assumes a constant mean across the
surface
• Spatial clustering (not allowed and needs to be removed
first)
Trend and error
• In Kriging, a predicted value depends on two factors: a trend and an additional
element of variability: Z(s) = μ(s) + ε(s).
– This is an intuitive idea with plenty of analogies in the real world. For instance, if you go from the
ocean to the top of a mountain, you have an upward trend in elevation. However, there is likely to
be variation on the way—you will go both up and down when crossing valleys, streams, knobs,
and other features.
• The trend part of a prediction is called the deterministic trend. The fluctuation
part is called spatially-autocorrelated random error. Variations on this formula
form the basis of all the different types of Kriging. Ordinary Kriging assumes a
constant unknown mean and estimates the mean in the search neighborhood,
whereas simple Kriging assumes a constant known mean.
• "Error" doesn't mean a mistake—it just means a fluctuation from the trend.
– Assumption one: the expected value of ε(s) is zero (positive errors and negative errors balance out).
– Assumption two: the autocorrelation of the error is purely spatial; it
depends only on distance and not on any other property (such as position)
of a location.
The autocorrelation between ε(s1) and ε(s1+h) is the
same as that between ε(s2) and ε(s2+h).
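A minimal sketch of this decomposition, simulating a deterministic trend plus a zero-mean, spatially autocorrelated error from an assumed exponential covariance (all values are illustrative):

```python
# Sketch only: Z(s) = mu(s) + eps(s), with eps spatially autocorrelated.
import numpy as np

rng = np.random.default_rng(0)
s = rng.uniform(0, 100, size=(50, 2))                     # locations
mu = 5.0 + 0.05 * s[:, 0]                                 # deterministic trend mu(s)

d = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)
C = 1.0 * np.exp(-d / 30.0)                               # autocorrelation depends only on distance
eps = np.linalg.cholesky(C + 1e-10 * np.eye(50)) @ rng.normal(size=50)

Z = mu + eps                                              # Z(s) = mu(s) + eps(s)
```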
Ordinary Kriging
• In many cases, however, there is
no trend in the data—or, if there
is one, it is weak enough that
your predictions are just as good
when you ignore it. Assuming that
there is no trend in the data is
mathematically equivalent to
assuming that the data have a
constant mean value.
• If the mean is a simple constant,
such as μ(s) = μ (i.e., no trend)
for all locations s, and if µ is
unknown (you do not have prior
knowledge of the mean value),
then this is the model on which
Ordinary Kriging is based.
Universal Kriging
• Sometimes, there is a trend
where the data values
change consistently with
the spatial coordinates.
Mathematically, this is
represented as a linear
regression equation on the
spatial x- and y-coordinates. Trends that
vary (do not have a
constant mean), and for
which the regression
coefficients are unknown,
form models for Universal
Kriging.
Simple Kriging and indicator Kriging
• If there is a known constant mean for the entire dataset,
then you have the model for Simple Kriging.
• Look at the left side of the equation Z(s) = μ(s) +
ε(s). You can do some useful math operations
on Z(s). For example, suppose you want to
predict the probability that Z(s) is above or
below some threshold value, such as 0.12 ppm
for ozone concentration. You can transform Z(s)
to an indicator variable, where it gets the value 0
if Z(s) is below the threshold and 1 if it is above
it. This is called Indicator Kriging.
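A minimal sketch of the indicator transform described above, using the 0.12 ppm ozone threshold from the text and hypothetical values:

```python
# Sketch only: indicator transform used by Indicator Kriging.
import numpy as np

z = np.array([0.08, 0.15, 0.11, 0.20, 0.09])    # made-up ozone values (ppm)
threshold = 0.12
indicator = (z > threshold).astype(int)          # 1 above threshold, 0 below;
                                                 # kriging these gives exceedance probabilities
```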
Ordinary Kriging
Universal Kriging
Simple Kriging
Indicator Kriging
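A minimal sketch of an Ordinary Kriging prediction at one location, assuming an already-chosen exponential semivariogram model (the nugget, sill, and range values are illustrative, not fitted):

```python
# Sketch only: Ordinary Kriging at a single prediction location.
import numpy as np

def gamma(h, nugget=0.1, sill=1.0, a=40.0):
    """Exponential semivariogram model, with gamma(0) = 0 by definition."""
    h = np.asarray(h, dtype=float)
    return np.where(h > 0, nugget + (sill - nugget) * (1.0 - np.exp(-h / a)), 0.0)

xy = np.array([[10.0, 10.0], [30.0, 15.0], [20.0, 40.0], [45.0, 35.0]])
z = np.array([12.0, 14.5, 11.0, 15.2])
x0 = np.array([25.0, 25.0])                       # prediction location

n = len(z)
d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)

# Ordinary Kriging system: [Gamma 1; 1' 0] [w; m] = [gamma0; 1]
A = np.ones((n + 1, n + 1))
A[:n, :n] = gamma(d)
A[n, n] = 0.0                                     # Lagrange multiplier row/column
b = np.append(gamma(np.linalg.norm(xy - x0, axis=1)), 1.0)

sol = np.linalg.solve(A, b)
weights, lagrange = sol[:n], sol[n]
z_hat = weights @ z                               # kriging prediction
krig_var = b[:n] @ weights + lagrange             # kriging (prediction) variance
```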
7. Cokriging
• Cokriging is a moderately quick interpolator that can be
exact or smoothed depending on the measurement error
model. Cokriging uses multiple datasets and is very
flexible, allowing you to investigate graphs of
autocorrelation (one dataset) and cross-correlation
(between two or more datasets). Cokriging uses
statistical models that allow a variety of map outputs
including predictions, prediction standard errors,
probability, etc. The flexibility of cokriging requires the
most decision-making. Cokriging assumes the data
come from a stationary stochastic process, and some
methods assume normally-distributed data.
• Read papers 3,4
8. definitions of output maps (for
Kriging methods)
• Prediction maps (interpolated maps) estimate values at
locations where measurements have not been taken.
• Standard error maps (the square root of the variance of
a prediction) show the distribution of prediction error for
a surface. Error tends to be highest in places where
there is little or no sample data.
• Quantile maps show the values for which there is a specified
probability that the true values are less than the quantile
map values.
• Probability maps show the odds that the true value at a
location is greater than a threshold value. The probability
of exceeding a threshold is determined from predicted
values, the error distribution, and the specified threshold
value
Kriging family and output maps
• Transformation and trend removal can
help justify assumptions of normality
and stationarity
Creating a prediction map
• while applying a transformation
• while using detrending
• while considering error for the nugget
9. Performing diagnostics
• Before producing a final surface, you should know how well the model predicts the
values at unknown locations. Cross-validation and validation help you make an
informed decision as to which model provides the best predictions.
• Cross-validation uses all of the data to estimate the autocorrelation model. Then
it removes each data location, one at a time, and predicts the associated data
value. The predicted value is compared with the measured value.
• Validation first removes part of the data (the test dataset) using the Create Subset tool,
and then uses the rest of the data (the training dataset) to develop the trend and
autocorrelation models to be used for prediction.
• In both methods, the graphs and summary statistics used for diagnostics are the
same: predicted values, prediction errors (predicted - measured), standardized errors
(error / estimated kriging standard error), and a normal QQPlot (standardized errors versus the
standard normal distribution).
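A minimal sketch of leave-one-out cross-validation; IDW stands in as the predictor for brevity (in Geostatistical Analyst the same procedure is applied to the fitted kriging model), and the data are simulated:

```python
# Sketch only: leave-one-out cross-validation with IDW as the predictor.
import numpy as np

def idw_predict(xy, z, x0, power=2.0):
    d = np.linalg.norm(xy - x0, axis=1)
    w = 1.0 / np.maximum(d, 1e-12) ** power
    return np.sum(w * z) / np.sum(w)

rng = np.random.default_rng(0)
xy = rng.uniform(0, 100, size=(40, 2))
z = np.sin(xy[:, 0] / 20.0) + rng.normal(0, 0.1, 40)

# Remove each sample in turn and predict it from the remaining data
predicted = np.array([
    idw_predict(np.delete(xy, i, axis=0), np.delete(z, i), xy[i])
    for i in range(len(z))
])
errors = predicted - z                  # prediction errors (predicted - measured)
```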
Create subset for validation
Demo
Basic rules for good predictions
• Mean error should be close to 0.
• RMS (root-mean-square) error, average standard error, and mean
standardized error should be as small as possible.
• Root-mean-square standardized error should be close to 1.
• Uncertainty of prediction standard errors: compare the average estimated
standard error with the RMS prediction error.
- equal: good
- larger than RMS: overestimate
- less than RMS: underestimate
or use the RMS standardized error
- = 1: good
- < 1: overestimate
- > 1: underestimate
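A minimal sketch of computing these summary statistics from cross-validation results; predicted, measured, and se (estimated kriging standard errors) are hypothetical arrays of equal length:

```python
# Sketch only: cross-validation summary statistics for the rules above.
import numpy as np

predicted = np.array([10.2, 11.8, 9.5, 13.1])
measured = np.array([10.0, 12.0, 9.9, 12.8])
se = np.array([0.5, 0.6, 0.55, 0.7])           # estimated standard errors

err = predicted - measured
mean_error = err.mean()                          # should be close to 0
rmse = np.sqrt(np.mean(err ** 2))                # as small as possible
avg_std_error = se.mean()                        # compare with RMSE
rms_standardized = np.sqrt(np.mean((err / se) ** 2))   # should be close to 1
```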
10. Compare model results
• Model surfaces can be compared using cross-validation statistics.
• The previous basic rules for good predictions still apply here.
11. Displaying geostatistical layers
12. Comparison of different methods
Here, "stochastic" refers to the geostatistical methods.
Summary
• Kriging performs statistical analysis of the error in its predictions. This allows it to
create four kinds of surfaces: prediction, standard error, quantile, and probability.
• Prediction maps estimate values at locations where measurements have not
been taken. (All interpolators make prediction maps.)
• Standard error maps show the distribution of prediction error for a surface. Error
tends to be highest in places where there is little or no sample data.
• Quantile maps show the values that the true values are unlikely to exceed.
• Probability maps show the odds that the true value at a location is greater than a
threshold value.
• The various interpolation methods (Inverse Distance Weighting, Global
Polynomial, Local Polynomial, Radial Basis Functions, and Kriging) offer trade-offs
in speed, flexibility, and their advantages and disadvantages.
• Fast interpolators produce output surfaces quickly, but are not as good at
capturing subtle surface variations.
• Exact interpolators predict values equal to the observed value at all sampled
locations. Smooth interpolators do not.
• Flexible interpolators allow users to fine-tune the output, while inflexible
interpolators allow users to avoid making lots of choices.
Main references
• ESRI book "Using ArcGIS Geostatistical Analyst"
• ESRI Virtual Campus at campus.esri.com