Statistical aspects of
spatial interpolation
Lars Harrie
Department of Physical Geography and Ecosystem Analysis
Lund University
2008-10-24
Latest update: 2014-10
Table of contents
1. Introduction
   1.1 Background
   1.2 Aim of this document
   1.3 Content
   1.4 Explanation of terms
   1.5 Further reading
2. Basic interpolation methods
   2.1 An example of annual rainfall
   2.2 General formula for interpolation
   2.3 Mean value
   2.4 Nearest neighbour
   2.5 Inverse distance weighting
   2.6 Comparison of the interpolation methods
3. Statistical models
   3.1 Spatial autocorrelation
   3.2 A statistical model for an entity's distribution
   3.3 A statistical model for annual rainfall
   3.4 Random variables
   3.5 Expectation value, variance, standard deviation, covariance, correlation coefficient and semivariance
4. Characteristics of optimal interpolation methods and kriging
   4.1 Characteristics of the optimal interpolation method
   4.2 Workflow of kriging interpolation
   4.3 Theory of kriging interpolation
5. Spatial prediction using additional information
   5.1 Basic theory of spatial prediction
   5.2 The workflow of spatial prediction
6. Selecting interpolation methods
   6.1 Additional information available
   6.2 Very small spatial autocorrelation
   6.3 Very large spatial autocorrelation
   6.4 General case
Acknowledgements
References
1. Introduction
1.1 Background
A common operation in geographical analysis is interpolation. In interpolation, the value
at a point is estimated from values measured at other points. In Figure 1.1 the annual rainfall of
the point p is estimated from the measured annual rainfall at the meteorological stations.
There are several interpolation methods to estimate the rainfall at the point p; in this
document only a few of them are described.
Figure 1.1: The rainfall at point p can be estimated from the measured annual rainfall at
the meteorological stations (triangles).
1.2 Aim of this document
The aim of this document is to describe the statistical aspects of interpolation. It describes
basic interpolation methods such as mean value, nearest neighbour and inverse distance
weighting; the document also provides the basic theory of the more advanced methods:
kriging and spatial prediction with additional information. The description of the
interpolation methods is based on statistical theory. The statistical theory is only briefly
described in this document; it is highly recommended that the reader have basic statistical
knowledge from previous studies.
Statistical knowledge is important for carrying out interpolation correctly. However,
interpolation also requires other knowledge, for example firm knowledge about the
entity to be interpolated (e.g. rainfall) and practical skills in the use of computer programs for
interpolation (e.g. GIS programs). These latter aspects are not treated in this document.
1.3 Content
The document is organised as follows:
Section 2: Describes three basic interpolation methods: mean value, nearest neighbour,
and inverse distance weighting. This section only describes how to use the interpolation
methods, not when they are suitable.
Section 3: Before we perform interpolation we need to have statistical knowledge about
the distribution of the entity. In this section we describe a statistical model often used to
characterize an entity.
Section 4: This section describes what constitutes an optimal interpolation method. A
description of the “optimal” interpolation method kriging is also provided.
Section 5: In some cases we have access to additional information that could be used to
enhance the quality of the interpolation. The aim of this section is to study a method for
spatial prediction using this additional information.
Section 6: The aim of this section is to provide guidelines about how we should select
interpolation methods based on the statistical knowledge of the entity.
1.4 Explanation of terms
In the document we will use the following definitions:
* entity - a quantity that is interpolated (e.g., rainfall, temperature, altitude, CO2).
* observation point – a point where the entity is measured (observed). This could be a
meteorological station, a place where a soil sample is collected, etc.
* interpolation point – a point where the entity value is to be interpolated (e.g. point p in
Figure 1.1).
1.5 Further reading
To understand interpolation it is important to have good knowledge in both statistics and
GIS (and of course also in the application field). Some recommended books in these
fields are:
Statistics:
Blom, G., Enger, J., Englund, G., Grandell, J., and Holst, L., 2005. Sannolikhetsteori och
statistikteori med tillämpningar. Studentlitteratur. (in Swedish)
Haining, R., 1990. Spatial data analysis in the social and environmental sciences.
Cambridge University Press.
Vännman, K., 2002. Matematisk statistik. Studentlitteratur. (in Swedish)
GIS:
Burrough, P., and McDonnell, R., 1998. Principles of Geographical Information Systems.
Oxford University Press.
Harrie, L., 2013. Geografisk informationsbehandling – teori, metoder och tillämpningar,
6th ed. (in Swedish)
Östman, A., 1995. Interpolering av geografiska data. Luleå Tekniska Universitet. (in
Swedish)
2. Basic interpolation methods
The aim of this section is to describe some basic interpolation methods. It is important to
understand these basic methods well. This knowledge is required in order to use more
advanced methods, such as kriging, in a proper way. The interpolation methods described
in this section are:
* mean value (sub-section 2.3),
* nearest neighbour (sub-section 2.4), and
* inverse distance weighting (sub-section 2.5).
The methods are illustrated by an annual rainfall example (described in sub-section 2.1).
2.1 An example of annual rainfall
Imagine that we want to interpolate the annual rainfall z of point p in Figure 2.1. To
perform the interpolation we can use the four observation points in Table 2.1.
Table 2.1: Values for the observation points (1-4) and the interpolation point p.

Observation point     x (km)   y (km)   Measured annual rainfall z (mm)
Point 1               0.0      0.0      400
Point 2               1.0      0.0      500
Point 3               0.0      1.0      600
Point 4               1.0      1.0      800

Interpolation point   x        y
Point p               0.2      0.2      To be estimated
Figure 2.1: The annual rainfall for point p is to be interpolated from the measured values
at the four observation points (see Table 2.1).
2.2 General formula for interpolation
The three interpolation methods described in this section (mean value, nearest neighbour,
and inverse distance weighting) are all special cases of the general formula of
interpolation. The general formula is stated as:
z(x_p, y_p) = \frac{\sum_{i=1}^{n} \lambda_i \, z(x_i, y_i)}{\sum_{i=1}^{n} \lambda_i}    (2.1)
where
z(xp,yp) is the interpolated value at point p,
z(xi,yi) is the measured value at the observation point i,
λi is the weight for the measured value of the observation point i, and
n is the number of observation points.
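As a sketch, the general formula can be written in a few lines of Python (the function name and the data layout below are our own illustration, not part of any particular GIS program):

```python
def interpolate(points, weights):
    """General interpolation formula (Equation 2.1): the weighted sum of
    the observed values divided by the sum of the weights.

    points  -- list of observation points as (x, y, z) tuples
    weights -- one weight (lambda_i) per observation point
    """
    numerator = sum(w * z for (x, y, z), w in zip(points, weights))
    return numerator / sum(weights)

# The four observation points from Table 2.1
obs = [(0.0, 0.0, 400), (1.0, 0.0, 500), (0.0, 1.0, 600), (1.0, 1.0, 800)]

# Equal weights give the mean value (see sub-section 2.3)
print(interpolate(obs, [1, 1, 1, 1]))  # 575.0
```

The three methods described below differ only in how the weights are chosen.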
2.3 Mean value
A simple approach is to set the same weight for all observation points (e.g. setting λi = 1
for all i in Equation 2.1). This implies that the interpolated value z(xp, yp) equals:
z_{mv}(x_p, y_p) = \frac{\sum_{i=1}^{n} z(x_i, y_i)}{n}    (2.2)
If we apply Equation 2.2 for the annual rainfall example we obtain:
zmv(xp,yp) = (400+600+500+800) / 4 = 575 mm.
2.4 Nearest neighbour
One might argue that the value at point p is most likely quite similar to the value at the
closest observation point. This is the basic assumption of the nearest neighbour
interpolation method. This assumption can be implemented by manipulating the weight
terms in Equation 2.1, setting the weight for the closest observation point to 1 (λi =1,
where i is the closest observation point to the interpolation point p) and the weight for the
other observation points to zero.
By reference to Figure 2.1, we see that observation point 1 is closest in space to
interpolation point p. Hence, we get znn(xp,yp) = 400 mm when using nearest neighbour
interpolation for the annual rainfall example.
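The nearest neighbour rule can be sketched in Python as follows (a small illustration of our own, reusing the observation points from Table 2.1):

```python
import math

def nearest_neighbour(points, p):
    """Nearest neighbour interpolation: return the z value of the
    observation point closest to the interpolation point p.
    (Equivalent to Equation 2.1 with weight 1 for the closest point
    and weight 0 for all the others.)"""
    px, py = p
    closest = min(points, key=lambda q: math.hypot(q[0] - px, q[1] - py))
    return closest[2]

obs = [(0.0, 0.0, 400), (1.0, 0.0, 500), (0.0, 1.0, 600), (1.0, 1.0, 800)]
print(nearest_neighbour(obs, (0.2, 0.2)))  # 400 (observation point 1 is closest)
```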
2.5 Inverse distance weighting
In the inverse distance weighting (IDW) method the weights are given as:

\lambda_i = \frac{1}{d^k}    (2.3)
where
d is the distance between points (xi,yi) and (xp,yp), and
k is an exponent that determines the weight’s dependence on the distance d.
If we introduce this expression for the weights to the general formula for interpolation
(Equation 2.1) we obtain:
z_{idw}(x_p, y_p) = \frac{\sum_{i=1}^{n} z(x_i, y_i) \, \frac{1}{d_i^k}}{\sum_{i=1}^{n} \frac{1}{d_i^k}}    (2.4)
The value of the exponent k describes how the weights depend on the distance d. If a low
value is used for k, all the observation points get similar weights. In the extreme case we
set k = 0, which implies that all observation points get the same weight, as in mean
value interpolation; this follows from the fact that d^0 = 1 for all values of d. If k is set to a high
value, then observation points that are close to the interpolation point get much higher
weights than observation points further away. The extreme case here is that k approaches
infinity; then only the closest observation point is considered, and the inverse distance
interpolation equals the nearest neighbour interpolation. In practice, the extreme cases
k = 0 and k → ∞ are seldom used. Normal values of k are 2 or 3.
Inverse distance weighting can easily be computed using a table. For example, if we use k=2 in
the annual rainfall example we get the values in Table 2.2. By using the values in the
table, the interpolated value for the point p is equal to:
zidw(xp,yp) = 7234.3/16.204 = 446 mm
Table 2.2: Inverse distance weighting computations

Point     z(xi,yi) (mm)   d (distance to p)   λi (weight)   z(xi,yi)·λi (mm)
Point 1   400             0.283               12.486        4994.4
Point 2   500             0.825                1.469         734.5
Point 3   600             0.825                1.469         881.4
Point 4   800             1.132                0.780         624.0
Sum                                           16.204        7234.3
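The computations in Table 2.2 can be reproduced with a short Python sketch (the function below is our own illustration; the last digit may differ slightly because the table uses rounded distances):

```python
def idw(points, p, k=2):
    """Inverse distance weighting (Equation 2.4)."""
    px, py = p
    num = den = 0.0
    for x, y, z in points:
        d = ((x - px) ** 2 + (y - py) ** 2) ** 0.5   # distance to p
        w = 1.0 / d ** k                             # weight 1/d^k (Equation 2.3)
        num += z * w
        den += w
    return num / den

obs = [(0.0, 0.0, 400), (1.0, 0.0, 500), (0.0, 1.0, 600), (1.0, 1.0, 800)]
print(round(idw(obs, (0.2, 0.2), k=2)))  # 446

# As noted above, k = 0 gives equal weights, i.e. the mean value
print(idw(obs, (0.2, 0.2), k=0))  # 575.0
```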
2.6 Comparison of the interpolation methods
For the annual rainfall example (in sub-section 2.1) we get the following interpolated
values by using the various interpolation methods:
* Mean value - zmv(xp,yp) = 575 mm
* Nearest neighbour interpolation - znn(xp,yp) = 400 mm
* Inverse distance weighting (with k=2) - zidw(xp,yp) = 446 mm
Apparently, the methods estimate quite different values for the interpolation point. An
obvious question is: Which interpolation method provides the best estimation? There are
no simple truths here, but we can make the following remarks:
* A drawback of the mean value is that points that are far away influence the result to a
great extent. In the example, the observation point 4 has a large value (z(x4,y4)=800 mm)
which causes the high value of the interpolated mean value.
* The nearest neighbour interpolation method does not take into account that the annual
rainfall values seem to rise in a north-easterly direction. From Figure 2.1, we would
probably anticipate that the annual rainfall of point p is a bit larger than 400 mm.
In the annual rainfall example the inverse distance weighting method, with a medium
value of the distance dependence of the weight (such as k=2), seems to give the most
likely value. But this is more based on intuition than something we really know; to get
more information we must have better knowledge about the distribution of the annual
rainfall. For example, we do not know, from our example, the distribution pattern for
annual rainfall in the area (Figure 2.2); firm knowledge of this pattern is important to
select the best interpolation method.
Figure 2.2: In this figure we illustrate possible distributions of annual rainfall values
between observation points 1 and 2 along the x-axis in Figure 2.1. From Table 2.1 we
know the values for x=0 (z=400) and x=1 (z=500), but we do not know anything about the
annual rainfall values between these points. The rainfall may vary slowly as for the
dashed line or it may change more rapidly (as for the solid line). Normally, the actual
variations between the observation points are unknown, and we only have access to
statistical properties of the variations.
The concept illustrated in Figure 2.2 can perhaps be described more easily using a real-
world example. Imagine that you are to interpolate annual rainfall values. How large
distances could you accept to the observation points? 100 m, 1 km, 10 km, 100 km or
even 1000 km? At some point the distances become too long (i.e., the variations between
the points become too large) to perform meaningful interpolation.
To summarize, the selection of interpolation method must be based on knowledge of the
measured entity. In the next section, we describe a statistical model often used for this
purpose.
3. Statistical models
Knowledge about an entity’s distribution can be of two kinds:
1) Exact knowledge of how the entity varies, meaning that we know the values of
the entity for all points.
2) Statistical knowledge of how the entity varies.
In practice, the first kind of knowledge does not exist. All measurements are, in principle,
samples in space. This implies that we can never have knowledge of an entity’s values at
all points (and if we really did have such knowledge, there would be no reason to
interpolate). This leaves us, in all practical cases, with the second type of knowledge:
statistical knowledge about the entity’s distribution.
This section deals with statistical properties of a measured entity. A description of spatial
autocorrelation follows in sub-section 3.1. Then, in sub-section 3.2, we describe a
statistical model of how an entity varies, which is exemplified in sub-section 3.3. This
model can be described using random variables (sub-section 3.4) and fundamental
statistical quantities associated with random variables: expectation value, variance,
standard deviation, covariance, correlation coefficient and semivariance (sub-section
3.5).
3.1 Spatial autocorrelation
Tobler’s first law of geography states:
Everything is related to everything else, but near things
are more related than distant things.
Or in other words, measured entity values at close points are more likely to be similar
than measured entity values at distant points. In geography this dependency between
entity values is denoted spatial autocorrelation. The spatial autocorrelation is
fundamental for interpolation; if there were no correlation between points, the theoretical
basis for interpolation would disappear. Furthermore, Tobler’s law also indicates that
close observation points should have higher weights than distant observation points. But
how much higher should these weights be? To answer this question we need a statistical
model of the entity’s distribution.
3.2 A statistical model for an entity’s distribution
The distribution of an entity, e.g. annual rainfall, is very complex. To perform
computations we need to use simplified statistical models. A common model is that the
value (z) of an entity in the plane (where the position in the plane is given by the
coordinates x and y) is composed of three parts (see also Figure 3.1):

z(x, y) = m(x, y) + \varepsilon'(x, y) + \varepsilon''    (3.1)

where
m(x, y) is a structural component. This component is normally a constant value or a trend
surface (e.g. a polynomial surface of low degree),
ε′(x, y) is a spatially autocorrelated variable (often called a regionalized variable); two
points that lie close together have similar values of ε′(x, y), and
ε″ is local variation that is not spatially autocorrelated. Normally it is assumed that ε″ is
normally distributed with an expectation value of zero and a standard deviation equal to σ;
that is, ε″ ∈ N(0, σ). Two points that are close together can have completely different
values of ε″.
Figure 3.1: A statistical model of the measured values (z) at points a-h of an entity (these
measured values are drawn with filled circles). The structural component (m) is a
constant value in this case. The regionalised variable ε′(x, y) changes slowly in the
plane. The size of the local variation (ε″) is illustrated by a dotted line; as you can see
there is no spatial dependence for the local variations. The horizontal axis should really
be the xy-plane, but for practical reasons we have drawn it in one dimension; one can
regard the horizontal axis as a line in the horizontal plane (Figure 3.2).
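The model in Equation 3.1 can be illustrated by simulating values along a line, in the spirit of Figure 3.1. The sketch below is our own illustration: the autocorrelated component ε′ is mimicked by a moving average of white noise (one of many possible ways to obtain spatially dependent values), and ε″ is independent noise.

```python
import random

random.seed(1)
n = 100

m = 3.0   # structural component m(x, y): a constant in this sketch

# epsilon': spatially autocorrelated component, mimicked by a moving
# average of white noise so that neighbouring points get similar values
noise = [random.gauss(0, 1) for _ in range(n + 10)]
eps1 = [sum(noise[i:i + 10]) / 10 for i in range(n)]

# epsilon'': local variation, N(0, sigma), independent from point to point
sigma = 0.3
eps2 = [random.gauss(0, sigma) for _ in range(n)]

z = [m + a + b for a, b in zip(eps1, eps2)]   # Equation 3.1

# Neighbouring eps1 values change slowly; neighbouring eps2 values do not
step1 = sum(abs(eps1[i + 1] - eps1[i]) for i in range(n - 1)) / (n - 1)
step2 = sum(abs(eps2[i + 1] - eps2[i]) for i in range(n - 1)) / (n - 1)
print(step1 < step2)  # True
```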
Figure 3.2: An illustration of how the points a-h in Figure 3.1 could lie in reality.
The line through the points is the same as the horizontal axis in Figure 3.1. The map is
from Open Street Map (http://www.openstreetmap.org/).
3.3 A statistical model for annual rainfall
The aim of this section is to illustrate the statistical model above with a practical
example. For this purpose we use annual rainfall in the region of Scania, southern
Sweden. The annual rainfall can be regarded as a random variable (Z) where all measured
annual rainfalls are observations of this random variable. Annual rainfall, measured at a
location with the coordinates x and y, can be regarded as consisting of the following
components:
z(x, y) = m(x, y) + \varepsilon'(x, y) + \varepsilon''    (3.2)
where
m(x, y) is a value that depends on the climate zone. We here regard the whole of
Scania as lying within the same climate zone; this implies that m(x, y) is a constant value.
ε′(x, y) is a component that to a high degree depends on the altitude of the meteorological
station. The higher above sea level, the larger the annual rainfall (cf. Figure 3.3); in
Scania we estimate the annual rainfall to increase by 150 mm if the altitude increases
by 100 m (Blennow et al., 1999). Altitude is a parameter with a large spatial
autocorrelation; this implies that the component of annual rainfall that depends on
the altitude is spatially autocorrelated.
ε″ is the local variation of the annual rainfall. This component can be regarded as a
purely random variable with no spatial autocorrelation. But, of course, there might be
physical explanations of these local variations, e.g. local relief, aspect, and exposure to
wind. Random measurement errors will also contribute to this component.
As given by the statistical model, the measured annual rainfall is dependent on
components that vary quite differently in space. To perform a proper interpolation of
annual rainfall, knowledge about these individual components is essential.
Figure 3.3: Annual rainfall and topography in Scania, Sweden. The isolines represent the
annual rainfall (mm). The background colour represents the altitude; the highest point in
Scania is about 210 meters above sea level.
The example of how to interpret the three components above is strongly linked to the
spatial scale of observation. At another scale, other physical explanations may contribute
to the different components. For example, in a smaller area topography might contribute
more to m(x, y) than to ε′(x, y).
3.4 Random variables
The statistical model in sub-section 3.2 can be described using the statistical quantities:
expectation value, variance, standard deviation, covariance, correlation coefficient and
semivariance. In this document these quantities are described briefly, but first we have to
introduce the concept of a random variable.
A random variable is a variable for which the exact value is unknown; the only thing that
is known is the likelihood of possible values. A random variable (below denoted Z) can
either be discrete or continuous (Figure 3.4). In the discrete case the random variable can
only have a fixed number of values; an example of a discrete random variable is the
number of points on a die (which can take the values [1, 2, 3, 4, 5, 6]). The
likelihood of each possible value is determined by the probability function, which is
denoted pZ(z). The likelihood of each possible value of a die is, of course, pZ(zi) = 1/6. A
continuous random variable can have all possible values in an interval. The likelihood of
a value is determined by the probability density function fZ(z). The likelihood that a
continuous random variable is within an interval is computed by integration. For
example, the likelihood that the random variable Z has a value between a and b in Figure
3.4 is:
\int_{z=a}^{b} f_Z(z) \, dz    (3.3)
The total probability over all possible values of a random variable is equal to one.
That is, we have the following relationship for the probability function in the discrete
random variable case:
\sum_{i=1}^{n} p_Z(z_i) = 1    (3.4)
where there are n possible values. For the probability density function in the case of
continuous random variables we have:

\int_{z=-\infty}^{\infty} f_Z(z) \, dz = 1    (3.5)
In geographical applications it is more common to work with continuous random
variables than discrete random variables. In the next sub-section, annual rainfall is used
to illustrate some statistical quantities. Annual rainfall can be regarded as a continuous
random variable.
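Equations 3.3 and 3.5 can be checked numerically. The sketch below is our own illustration; it uses the density of a standard normal distribution and simple trapezoidal integration:

```python
import math

def normal_pdf(z, mu=0.0, sigma=1.0):
    """Probability density function f_Z(z) of a normal distribution."""
    return math.exp(-((z - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def integrate(f, a, b, steps=10_000):
    """Trapezoidal approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, steps))
    return total * h

# Equation 3.5: the density integrates to one over the whole number line
# (the interval [-10, 10] captures practically all the probability mass)
print(round(integrate(normal_pdf, -10, 10), 4))  # 1.0

# Equation 3.3: the likelihood that Z lies between a = -1 and b = 1
print(round(integrate(normal_pdf, -1, 1), 3))    # 0.683
```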
Figure 3.4: Discrete and continuous random variables. In the discrete case the random
variable can only have a fixed number of values (in the figure only 6 values are possible).
The probabilities of the values are illustrated by the height of the vertical lines. For a
continuous random variable, any value along the number line can occur (e.g.
all real numbers in an interval).
3.5 Expectation value, variance, standard deviation, covariance,
correlation coefficient and semivariance
The aim of this sub-section is to briefly describe some statistical quantities. In brief the
quantities are:
* Expectation value - describes the expected (average) value of a random variable.
* Variance and standard deviation - describe the spread of the random variable
around the expected value.
* Covariance and correlation coefficient - describe the correlation between two
random variables.
* Semivariance - describes the correlation between a random variable at two
different locations.
Below we state the definitions of the quantities. After each definition follows an example
of how the quantity is estimated from a set of data.
Definition - Expectation value (of a continuous random variable Z), E(Z):
E Z  

z f
Z
( z ) dz
z -
where
z are values of the random variable, and
fZ(z) is the probability density function (cf. Figure 3.4).
14
(3.6)
Example - Expectation value
The expectation value can be estimated by the mean value (z̄) of a set of observations.
The mean value is computed by:
\bar{z} = \frac{\sum_{i=1}^{n} z_i}{n}    (3.7)
The expectation value, variance and standard deviation are exemplified with data about
annual rainfall measured at a meteorological station (Table 3.1). The annual rainfall is a
(continuous) random variable Z and the measured values for each year are observations
(zi) of this random variable.
Table 3.1: Annual rainfall at a meteorological station (MS).

Year   Annual rainfall (mm) - MS 1
2000   412
2001   430
2002   512
2003   500
2004   440
If we use the values from Table 3.1 in Equation 3.7 we get:

\bar{z} = \frac{\sum_{i=1}^{5} z_i}{5} = \frac{412 + 430 + 512 + 500 + 440}{5} \approx 459 \text{ mm}    (3.8)
The expectation value can be seen as the long-run average value of a random variable. If
we have a data set where all observations have the same “status”, the expectation value
is estimated by the mean value. However, this is not always the case. In Table 3.2 the
observations do not have the same status: some observations are made over a longer
period (a decade rather than a year) and therefore have better quality (i.e., a smaller
variance). For such a data set the plain mean value is not the best estimate of the
expectation value.
Table 3.2: Annual rainfall at a meteorological station (MS).

Period      Annual rainfall (mm) - MS 1
1970-1979   452
1980-1989   463
1990-1999   450
2000        412
2001        430
Definition - Variance, V(Z):

V(Z) = E\left[(Z - \mu)^2\right]    (3.9)

where
\mu = E(Z).
Example - Variance
In most cases it is not enough to know the expectation value of a random variable; we
also need to know how the observations are spread around the expectation value. The
variance describes this spread. The variance (s²) is estimated by:
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (z_i - \bar{z})^2    (3.10)
This implies that we obtain the following for the annual rainfall example (using zi from
Table 3.1):
s^2 = \frac{1}{5-1} \sum_{i=1}^{5} (z_i - 459)^2 \approx 1975 \text{ mm}^2    (3.11)
Be aware of the unit (mm²) of the variance of the annual rainfall. It is important that you
always use the correct unit.
One might wonder why we divide by “n-1” rather than by n. A short explanation is
that we use the same data to estimate the mean value (z̄) as we use for
estimating the variance. If the expectation value of Z (E(Z)) is known, we divide by n
instead of “n-1”. For a more rigorous explanation, you should study statistical literature
describing the difference between a sample and the population from which it is drawn.
Definition - Standard deviation, D(Z):
DZ   V Z 
(3.12)
Example – Standard deviation
Like the variance, the standard deviation is a measure of the spread of the observations.
If the variance is known (or estimated), the standard deviation can easily be computed.
For the annual rainfall example we obtain the following standard deviation (s):
s = \sqrt{s^2} \approx 44 \text{ mm}    (3.13)
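The estimates in Equations 3.8, 3.11 and 3.13 can be reproduced with a short Python sketch (the variable names are our own):

```python
rain = [412, 430, 512, 500, 440]   # annual rainfall from Table 3.1
n = len(rain)

mean = sum(rain) / n                                  # Equation 3.7
var = sum((z - mean) ** 2 for z in rain) / (n - 1)    # Equation 3.10 (note n - 1)
std = var ** 0.5                                      # Equation 3.12

print(round(mean), round(var), round(std))  # 459 1975 44
```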
Definition - Covariance, C(Z,W):
CZ ,W   E(Z   Z )  (W  W )
(3.14)
where
 Z  E Z  , and
W  EW  .
Example – Covariance
Expectation value, variance and standard deviation are all characteristics of a single
random variable. The covariance is different. Covariance describes the dependency
between two random variables. An example of a dependency is that if the observation of
one random variable is high then the corresponding observation of the other random
variable is also high.
The covariance, between the random variables Z and W, is estimated by:
c_{z,w} = \frac{1}{n-1} \sum_{i=1}^{n} (z_i - \bar{z})(w_i - \bar{w})    (3.15)
(3.15)
Here we will exemplify covariance by the random variables annual rainfall (Z) and
annual temperature (W) (Table 3.3). As seen in Figure 3.5, there is a clear negative
relationship between these random variables.
Table 3.3: Annual rainfall and annual temperature at 6 meteorological stations. All values
are measured during the same year.

Meteorological station   Annual rainfall (mm) (Z)   Annual temperature (°C) (W)
1                        454                        8.3
2                        512                        7.9
3                        812                        6.5
4                        725                        7.5
5                        556                        7.3
6                        630                        7.5
Figure 3.5: Relationship between annual rainfall and annual temperature. A scatter plot of
the values shown in Table 3.3.
The mean values for the annual rainfall (z̄) and annual temperature (w̄) are:
z̄ = 614.8 mm
w̄ = 7.50 °C
These mean values are then used to estimate the covariance (cz,w) (using zi and wi from
Table 3.3):
c_{z,w} = \frac{1}{6-1} \sum_{i=1}^{6} (z_i - 614.8)(w_i - 7.50) \approx -71.0 \text{ mm} \cdot {}^\circ\text{C}    (3.16)
The covariance is here negative. This is always the case if there is a negative relationship
between the random variables (see Figure 3.5). Pay attention to the unit of the covariance.
Definition - Correlation coefficient, ρ(Z, W):

\rho(Z, W) = \frac{C(Z, W)}{D(Z) \, D(W)}    (3.17)
Example – Correlation coefficient
In the example above we computed the covariance to be -71.0 mm·°C. Does
this indicate a low or high correspondence between annual temperature and annual
rainfall? It is not easy to answer this question; and, generally, it is not easy to judge the
correspondence between parameters based solely on covariance. If we had measured in
another unit, e.g. inches instead of mm, the covariance would have been different. It
becomes much easier if we compute the correlation coefficient. The correlation
coefficient is the covariance normalised by the standard deviations of the two random
variables (Equation 3.17). This makes the correlation coefficient unitless. Furthermore,
the correlation coefficient is equal to:
* almost 1 - if there is a positive linear dependence between the two random variables,
* almost -1 - if there is a negative linear dependence between the two random
variables, and
* around 0 - if the random variables are linearly independent.
To compute the correlation coefficient in our example above we proceed as follows. Start
by computing the standard deviations (sz, sw):
sz = 135 mm
sw = 0.607 °C
Then we obtain the correlation coefficient (r) by dividing the covariance by these
standard deviations:
r = \frac{c_{z,w}}{s_z \, s_w} \approx -0.87    (3.18)
The correlation coefficient is almost equal to -1. We can therefore conclude that there is a
strong negative linear dependency between the two random variables. Or, in other words,
the lower the temperature is, the more it rains.
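The computations above can be sketched in Python. The six measured value pairs from Table 3.3 are not repeated here, so the arrays below are hypothetical stand-in data with a negative rainfall–temperature relationship; only the formulas (Equations 3.16–3.18) follow the text.

```python
def sample_covariance(z, w):
    """Sample covariance with the 1/(n-1) normalisation used in Equation 3.16."""
    n = len(z)
    z_mean = sum(z) / n
    w_mean = sum(w) / n
    return sum((zi - z_mean) * (wi - w_mean) for zi, wi in zip(z, w)) / (n - 1)

def sample_std(z):
    """Sample standard deviation: the square root of the sample variance."""
    return sample_covariance(z, z) ** 0.5

def correlation(z, w):
    """Correlation coefficient r = c_zw / (s_z * s_w), Equation 3.18."""
    return sample_covariance(z, w) / (sample_std(z) * sample_std(w))

# Hypothetical annual rainfall (mm) and temperature (°C) at six stations.
rain = [760, 700, 650, 600, 560, 520]
temp = [6.6, 7.0, 7.3, 7.6, 7.9, 8.2]
r = correlation(rain, temp)  # strongly negative for this data
```

Since the stand-in data are almost perfectly linear with a negative slope, r comes out close to -1, matching the interpretation in the text.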
Definition – Semivariance, γ(h):

\gamma(\mathbf{h}) = \frac{1}{2} \, E\left[ \big( Z(x, y) - Z((x, y) + \mathbf{h}) \big)^2 \right]   (3.19)
where
Z – is a continuous random variable
x,y – are coordinates that determine the position in the plane (non-random)
h – a vector that describes a translation in the x,y-plane.
Example – Semivariance
In sub-section 3.1 we introduced the concept spatial autocorrelation, but no measure of
this concept was described. The aim is now to illustrate that semivariance can be used as
a measure of spatial autocorrelation. But we start with a short investigation of the term
spatial autocorrelation. The term has three major parts: spatial, auto and correlation.
Firstly, correlation indicates that it is a correlation of entity values. Secondly, auto
implies that it is a correlation between two observations of the same entity. Thirdly,
spatial means that the two entity values are separated in space. To conclude, spatial
autocorrelation is the correlation between two observations of the same entity in space. It
can, for example, be the correlation between annual rainfalls measured at two
meteorological stations.
Spatial autocorrelation can be measured by spatial covariance and the spatial correlation
coefficient. But normally semivariance is used. There is an analytical relationship
between these measures. If the spatial covariance is known, you can always compute the
semivariance. The exact relationship is not stated here, but we can conclude from the
definition (3.14 and 3.19) that if the covariance is high, the semivariance is low and vice
versa.
Semivariance is a measure of the correlation between two random variables (of the same
entity) at a distance h. Behind this definition lies the stationarity condition. The stationarity condition for a random variable Z states that:
• the expectation value of Z is the same for all points, and
• the correlation (or semivariance) between values of Z at two points depends only on the distance (and sometimes also the direction) between the points.
To utilise semivariance these two conditions must be true. This implies that if the
expectation value of Z varies we have to remove this variation before semivariance is
computed. In the statistical model in Equation 3.1 the expectation value of Z is equal to
m(x,y). To conclude, you should always start by estimating m(x,y) (e.g. by using
polynomials in the plane). If m(x,y) is not a constant value it has to be removed before
you compute the semivariance.
Normally, you assume that the correlation between two points is only dependent on the distance between the points and not on the direction. This is called isotropy (= direction independence). This implies that the semivariance in Equation 3.19 is only a function of the length of the vector h; we denote this length h.
After this rather theoretical discussion it is time to illustrate semivariance by an example.
In this example semivariance is computed for measurements of annual rainfall at 16
meteorological stations (Table 3.4 and Figure 3.6).
Table 3.4: Annual rainfall measured at 16 meteorological stations.

Meteorological station   Annual rainfall (mm) (Z)
A                        400
B                        420
C                        440
D                        450
E                        430
F                        440
G                        450
H                        460
I                        445
J                        460
K                        470
L                        480
M                        460
N                        475
O                        480
P                        490
[Figure: the 16 stations form a regular 4 × 4 grid with 1 km spacing; the columns from left to right are A–D, E–H, I–L and M–P, each ordered bottom to top; both axes are in km.]
Figure 3.6: Locations of the 16 meteorological stations in Table 3.4.
To compute the semivariance we start by creating distance intervals. In our example the point distribution is regular, and the possible distances between two arbitrary points are: 1.0000, 1.4142, 2.0000, 2.2361, 2.8284, 3.0000, 3.1623, 3.6056 or 4.2426 km. Each point pair belongs to one of these groups. E.g. the group "point-distance-2km" consists of the following point pairs: A-C, B-D, E-G, F-H, I-K, J-L, M-O, N-P, A-I, E-M, B-J, F-N, C-K, G-O, D-L and H-P (as seen, we use both vertical and horizontal point pairs – i.e., an isotropic model). For each group we estimate the semivariance by:
\gamma(h) = \frac{1}{2n} \sum_{i=1}^{n} \left[ z(x_i, y_i) - z\big((x_i, y_i) + \mathbf{h}\big) \right]^2   (3.20)
where n is the number of point pairs in the group. For the group “point-distance-2km”,
which has 16 point pairs, we have:
\gamma(2\,\mathrm{km}) = \frac{1}{2 \cdot 16} \left[ (z_A - z_C)^2 + (z_B - z_D)^2 + \ldots + (z_H - z_P)^2 \right] = 447 \text{ mm}^2
The same procedure is repeated for all distance groups and the result is provided in
Figure 3.7. We will come back to semivariance in sub-section 4.2 when kriging
interpolation is described.
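The 2 km group computation can be checked with a short Python script. The rainfall values come from Table 3.4; the station coordinates (in km) are an assumption read off the regular grid in Figure 3.6.

```python
from itertools import combinations
from math import dist, isclose

# Annual rainfall (mm) from Table 3.4.
rain = {"A": 400, "B": 420, "C": 440, "D": 450, "E": 430, "F": 440,
        "G": 450, "H": 460, "I": 445, "J": 460, "K": 470, "L": 480,
        "M": 460, "N": 475, "O": 480, "P": 490}

# Coordinates (km) assumed from Figure 3.6: columns A-D, E-H, I-L, M-P,
# each ordered bottom to top, with 1 km spacing in both directions.
coords = {s: (idx // 4, idx % 4) for idx, s in enumerate(sorted(rain))}

def semivariance(h, tol=1e-6):
    """Estimate gamma(h) (Equation 3.20) over all point pairs at distance h."""
    pairs = [(a, b) for a, b in combinations(sorted(rain), 2)
             if isclose(dist(coords[a], coords[b]), h, abs_tol=tol)]
    return sum((rain[a] - rain[b]) ** 2 for a, b in pairs) / (2 * len(pairs))

gamma_2km = semivariance(2.0)  # uses the 16 point pairs listed in the text
```

Running this reproduces the value in the example: gamma_2km is 14300 / 32 ≈ 447 mm².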
[Figure: scatter plot of semivariance (mm²), y-axis 0–2500, against distance (km), x-axis 0–4.5.]
Figure 3.7: Plotted values for the semivariance as a function of distance for the data in Table 3.4 and Figure 3.6.
4. Characteristics of optimal interpolation methods and kriging
This section starts by defining what we mean by an optimal interpolation method. Then kriging is described, an interpolation method that is optimal under certain statistical prerequisites.
4.1 Characteristics of the optimal interpolation method
In sub-section 2.6 a comparison between the interpolation methods mean value, nearest neighbour and inverse distance weighting is given. The example reveals that the interpolated values zmv, znn and zidw can be quite different. That sub-section also included
a short discussion about which interpolation method to prefer. It is now time to come
back to the issue of which interpolation method provides the best estimation. This time
the discussion is based on a statistical approach.
The interpolated value is a function of measured values (cf. Equation 3.1). Since the
measured values are (continuous) random variables, the interpolated value is also a
random variable. For example, in Section 2 we interpolated values for point p with three
methods (mean value, nearest neighbour and inverse distance weighting). The result of
each interpolation is a random variable: Zmv, Znn and Zidw. The random variables are
characterized by their expectation values and their variance. The best interpolation
method is the one that provides a random variable with the following two characteristics:
• a correct expectation value, and
• the smallest variance.
An illustration of these characteristics is given in Figure 4.1. In this figure the probability
functions are drawn for three interpolated random variables: Zmv, Znn and Zidw. All of the
three random variables have expectation values equal to μ (and this is a correct
expectation value). The interpolation method inverse distance weighting has the smallest
variance; therefore this method is the best one in this example. That the method is the best one means that the probability of getting a good value (= a value close to the expectation value) with this method is larger than with the other methods.
It is important to note that the best interpolation method (according to the statistical requirements above) does not always provide the best estimation. In Figure 4.1 the interpolation has resulted in three values: zmv, znn and zidw. These interpolated values are observations of the random variables Zmv, Znn and Zidw. As seen from the figure, the mean value interpolation (zmv) provides the best estimation (= the value closest to the expectation value). However, in a practical case the expectation value is unknown, and therefore it is not possible to know which estimated value is the best one. This means that you should select the interpolation method that provides the best statistical properties.
[Figure: three probability density curves over Z, all centred on the expectation value μ; the observed values zmv, zidw and znn are marked on the Z-axis.]
Figure 4.1: Illustration of three random variables: Zmv (dashed-dotted line), Znn (solid line) and Zidw (dashed line). The random variables are the results of the interpolation methods mean value, nearest neighbour and inverse distance weighting. zmv, znn and zidw are interpolated values; these values are observations of the random variables Zmv, Znn and Zidw. (Note that this is just an illustration of concepts. The figure should not be interpreted as if inverse distance weighting always provides a random variable with smaller variance than the other interpolation methods. Nor is it always the case that zmv is the best estimation.)
Above we stated that a good interpolation method must provide an estimate with a correct
expectation value. In fact, all interpolation methods that are based on the general formula
(Equation 2.1) provide a random variable with a correct expectation value. This fact is
based on the assumption that the expectation value is the same for all points; this
assumption is common in geographic analysis (see the discussion about stationarity in
sub-section 3.5). We will now show that the general formula for interpolation provides an
estimate with a correct expectation value. We start by stating that the expectation value is
a linear operator. That is, the following rule holds:
E aX  Y   a  E  X   E Y 
(4.1)
We now use the linear property in the general formula for interpolation (Equation 2.1)
and utilise the fact that all the points have the same expectation value (=μ):
E\left[ z(x_p, y_p) \right] = E\left[ \frac{\sum_{i=1}^{n} \lambda_i \, z(x_i, y_i)}{\sum_{i=1}^{n} \lambda_i} \right] = \frac{\sum_{i=1}^{n} \lambda_i \, E\left[ z(x_i, y_i) \right]}{\sum_{i=1}^{n} \lambda_i} = \frac{\sum_{i=1}^{n} \lambda_i \, \mu}{\sum_{i=1}^{n} \lambda_i} = \mu \, \frac{\sum_{i=1}^{n} \lambda_i}{\sum_{i=1}^{n} \lambda_i} = \mu   (4.2)
That each interpolation method that is based on the general formula provides a correct
expectation value does not imply that all interpolation methods are equally good. The
variance of the estimation varies depending on which interpolation method is used. And,
according to the second characteristic, we should select a method that minimises the
variance (cf. Figure 4.1). A method that is derived to minimise the variance is kriging,
which is the topic of the remaining part of this section.
4.2 Workflow of kriging interpolation
The aim of this sub-section is to describe the workflow of kriging interpolation. We will
go through each step of the method as a recipe; the theoretical background is left to the
next sub-section.
1) The point p is to be estimated from a set of observations (Figure 4.2). But before we perform the actual interpolation we must investigate the statistical properties of the entity. This investigation is based on the measured values at the observation points.
Figure 4.2: The value of the interpolation point p is to be estimated from the observation
points (non-filled circles).
2) Start by investigating the trend of the data. In other words, we estimate the value
m(x,y) in Equation 3.1. This investigation could e.g. be performed by fitting a
polynomial surface to the observed points. If the trend is not constant it has to be
removed before proceeding with the next step in the workflow. The reason for
removing the trend is that kriging requires that the entity has the stationarity
property (which requires that the expectation value is constant). The trend is
removed by subtracting the trend surface from the original values.
3) Compute the semivariance between point pairs as a function of distance between
the points. This procedure starts with computing the distance between each point
pair of the observation points. Then the point pairs are grouped according to the
distances between the points. It could for example be one group for point pairs
with a distance between 0-1 km, one group for point pairs with a distance between
1-2 km, etc. Compute the semivariance for each group separately by applying
Equation 3.20. The result of these computations is plotted in a graph (Figure 4.3).
4) A curve is fitted to the computed semivariances (Figure 4.3); this curve could be either a piecewise linear function or a smooth mathematical function. This curve is called a semivariogram. An ideal semivariogram looks as follows. It starts at a rather low value (equal to the semivariance for points lying very close together); this value is denoted the nugget. The semivariance then increases with the distance between the points in the point pair. When the distance equals the range, the semivariogram has reached its maximum value; from there on the semivariance is constant, and this value is denoted the sill. Far from all practical examples have a semivariogram with this ideal look. Minor deviations from the ideal picture are not a major problem, but the deviations should not be too large. For example, in Figure 3.7 some semivariances are plotted as a function of distance. If a semivariogram were fitted to these values, an increasing linear function would provide the best fit. This could be interpreted as the data material in Table 3.4 and Figure 3.6 covering such a small area that the longest distance is shorter than the range. If this happens it is not a problem to apply kriging interpolation. But if the deviations from the ideal look of the semivariogram are too large, then kriging interpolation should not be applied. This is further discussed in Section 6.
Figure 4.3: A graph that shows the estimated semivariance (γ̂) as a function of distance (h). Each dot in the graph represents a computation of the semivariance (using Equation 3.20) for a group of point pairs. A curve – the semivariogram – is fitted to the points. The semivariogram is characterised by its sill (s), range (r) and nugget (n).
5) Perform the interpolation. The points that are used in the interpolation are points that
are closer than the distance range to the interpolation point (Figure 4.2). The weights are
set depending on the distance between the observation points and the interpolation point.
The correspondence between the distance and the size of the weight is determined by the
semivariogram (Figure 4.3).
6) If a trend was removed in step 2 it has to be added back again after the interpolation.
To summarize, after the six steps above, you have completed a kriging interpolation. It is
important that you go through the steps in the correct order; not all computer programs
used for kriging interpolation will guide you through the workflow.
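In step 4 above, the curve fitted to the empirical semivariances is usually a standard model. The spherical model below is one common choice (the choice of model is an assumption here – the text does not prescribe a particular one), parameterised by the nugget, sill and range described in the workflow:

```python
def spherical_semivariogram(h, nugget, sill, rng):
    """Spherical semivariogram model: rises from the nugget at short
    distances and levels off at the sill once h reaches the range rng."""
    if h <= 0:
        return 0.0  # by convention gamma(0) = 0; the nugget is the jump just above zero
    if h >= rng:
        return sill  # beyond the range the semivariance is constant (the sill)
    ratio = h / rng
    return nugget + (sill - nugget) * (1.5 * ratio - 0.5 * ratio ** 3)

# Hypothetical parameters: nugget 100 mm^2, sill 2500 mm^2, range 5 km.
gamma_1km = spherical_semivariogram(1.0, 100.0, 2500.0, 5.0)
```

Between the nugget and the sill the curve increases monotonically with distance, which is exactly the ideal behaviour described in step 4.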
4.3 Theory of kriging interpolation
Kriging interpolation is often called the optimal interpolation method. But it is important
to note what is meant by optimal here. Kriging is optimal if:
1) Stationarity is assumed (i.e. the expectation value is the same for all points and the
correlation is only dependent on distance between the points).
2) Optimal means the same as minimising the variance of the estimation (cf. Figure 4.1).
A complete derivation of the kriging interpolation is outside the scope of this document.
Instead a short justification of the kriging method is provided. Start by rewriting the
general formula for interpolation (Equation 2.1) as:
z(x_p, y_p) = \sum_{i=1}^{n} \lambda_i \, z(x_i, y_i), \qquad \sum_{i=1}^{n} \lambda_i = 1   (4.3)
Since the weights are scaled such that their sum equals one, it is not necessary to divide
by the sum of the weights (cf. Equation 2.1). In sub-section 4.1, it was stated that the
general formula for interpolation provides estimations with correct expectation value; this
holds true also for Equation 4.3. The other requirement of the interpolation method –
minimising the variance – is the sole base for setting the weights in kriging. The weights
are derived from the following optimisation problem:
V\left[ Z(x_p, y_p) - \sum_{i=1}^{n} \lambda_i \, Z(x_i, y_i) \right] \text{ is minimised under the constraint that } \sum_{i=1}^{n} \lambda_i = 1   (4.4)
At first sight this optimisation problem seems to be impossible to solve. How could we
optimise the weights using the sought random value Z(xp,yp) in the optimisation
expression? But it is not the actual value of Z(xp,yp) that is used in the derivations, it is
rather the correlations between the value Z(xp,yp) and the values at the observation points
Z(xi,yi); these correlations are given by the semivariogram.
By solving the optimisation problem in Equation 4.4 we obtain the following weights for
the kriging interpolation:
\boldsymbol{\lambda} = \mathbf{G}^{-1} \boldsymbol{\gamma}   (4.5)

where

\boldsymbol{\lambda} = (\lambda_1, \lambda_2, \ldots, \lambda_n)^T is a vector of the weights,

\mathbf{G} = \begin{pmatrix} \gamma_{1,1} & \gamma_{1,2} & \cdots & \gamma_{1,n} \\ \gamma_{2,1} & \gamma_{2,2} & \cdots & \gamma_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_{n,1} & \gamma_{n,2} & \cdots & \gamma_{n,n} \end{pmatrix} is a matrix constructed from the semivariogram, where \gamma_{1,2} = \gamma(\text{distance between point 1 and point 2}),

\mathbf{G}^{-1} is the inverse of the matrix \mathbf{G}, and

\boldsymbol{\gamma} = (\gamma_{1,p}, \gamma_{2,p}, \ldots, \gamma_{n,p})^T is a vector that contains values from the semivariogram for the distances between the interpolation point (p) and the n observation points.
As seen from Equation 4.5, it is quite complicated to compute the weights for the kriging interpolation. Luckily, there are several computer programs that perform these computations; i.e., it is not necessary to understand all the pertinent details. But it is important to understand where kriging gets its weights from. From Equation 4.5 we can conclude that the weights are based only on the semivariogram. This implies that before we perform the actual kriging interpolation (using Equation 4.3) it is necessary to derive a semivariogram for the entity.
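A minimal numerical sketch of Equation 4.5, assuming a hypothetical linear semivariogram γ(h) = 100·h and just two observation points on a line. (Practical ordinary-kriging implementations usually also augment G and γ with an extra row for a Lagrange multiplier to enforce that the weights sum to one; that detail is omitted here to match the equation as stated.)

```python
import numpy as np

def gamma_model(h):
    # Hypothetical linear semivariogram: gamma(h) = 100 * h.
    return 100.0 * h

# Two observation points on a line at x = 0 and x = 2; interpolation point p at x = 1.
obs_x = np.array([0.0, 2.0])
obs_z = np.array([430.0, 450.0])
p_x = 1.0

# G: semivariogram values between observation points; g: to the interpolation point.
G = gamma_model(np.abs(obs_x[:, None] - obs_x[None, :]))
g = gamma_model(np.abs(obs_x - p_x))

lam = np.linalg.solve(G, g)  # Equation 4.5: lambda = G^-1 gamma
z_p = lam @ obs_z            # interpolated value at p
```

For this symmetric configuration the weights come out equal and happen to sum to one, so the interpolated value is the average of the two observed values.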
5. Spatial prediction using additional information
In some of the previous examples in this document we have interpolated annual rainfall. We have also stated that there is a relationship between annual rainfall and altitude (sub-section 3.3). This relationship is also illustrated in Figure 3.3. A sound idea would be to utilise this relationship while performing the interpolation. This is the basic idea of the method we describe as spatial prediction using additional information.
5.1 Basic theory of spatial prediction
The aim of spatial prediction using additional information is to interpolate a value for an
entity from observation points. The difference from ordinary interpolation is that we use
additional information. For example, to predict a value of annual rainfall we need the
following data (besides the measured annual rainfall at some observation points):
• data about the topography (e.g. a digital elevation model – DEM), and
• a relationship between annual rainfall and topography.
To perform spatial prediction of annual rainfall we need to define the relationship
between annual rainfall and topography. If this relationship is not known for the study
area it has to be estimated by empirical methods. The most common method to use is
linear regression. In linear regression the following model is used:
y = \alpha + \beta \cdot x   (5.1)

where
x is the independent variable,
y is the dependent variable, and
α and β are the regression parameters.
For the example of annual rainfall and topography we have the following model:
annualRainfall = \alpha + \beta \cdot altitude   (5.2)
The relationship between the annual rainfall and the topography is described by
estimating values of the regression parameters α and β. The regression parameters are
estimated from measured annual rainfall and altitude at the observation points (cf. Table
5.1 and Figure 5.1). We do not state the formulas for estimating these parameters here; they can be found in any standard textbook on statistics. Another approach to computing these parameters is to use a statistical program (e.g. SPSS) or spreadsheet software (e.g. Excel) where these formulas are implemented.
Table 5.1: Altitude and annual rainfall at the observation points.

Altitude (m)   Annual rainfall (mm)
35             610
130            760
50             600
55             630
70             670
75             640
90             705
180            815
150            780
[Figure: scatter plot of annual rainfall (mm), y-axis 500–900, against altitude (m), x-axis 0–200, with the regression line drawn through the points.]
Figure 5.1 illustrates the values of the observation points in Table 5.1. The line in the
graph is called the regression line. α is equal to the y-intercept of the regression line, and
β equals the slope.
Finally, some words of caution are required. It is important only to use the regression
model within the proper range. In Figure 5.1 we see that the regression line is determined
by altitude values between 40 and 180 m. This implies that the regression is only valid
within this range. To use values outside this range implies extrapolation; this is not a
reliable method and should generally be avoided. It is also necessary to realise that autocorrelated data may violate one of the assumptions behind regression analysis – that of independent observations. If autocorrelation among the observations is present, one must be careful when interpreting the coefficient of determination and the significance statistics.
5.2 The workflow of spatial prediction
In this sub-section we describe the required steps in spatial prediction. We illustrate the
methodology by the example given in Table 5.1 and Figure 5.1. In this example we aim
to predict annual rainfall where topography is used as the additional information. To
perform the spatial prediction the following steps are required:
1) Establish an empirical relationship between annual rainfall and topography
(Equation 5.2). To do this data from the observation points in Table 5.1 are used.
In this example we obtain:
α = 545 mm
β = 1.56 mm/m.
2) Estimate the annual rainfall for the new point p. This point lies 89 m above sea level:
annualRainfall(p) = 545 [mm] + 1.56 [mm/m] * 89 [m] = 684 mm
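The two steps can be reproduced from the data in Table 5.1 with a least-squares sketch in Python (the standard formulas the text refers to):

```python
def linear_regression(x, y):
    """Least-squares estimates of alpha (intercept) and beta (slope) in y = alpha + beta*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    beta = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) \
         / sum((xi - x_mean) ** 2 for xi in x)
    alpha = y_mean - beta * x_mean
    return alpha, beta

altitude = [35, 130, 50, 55, 70, 75, 90, 180, 150]        # m, Table 5.1
rainfall = [610, 760, 600, 630, 670, 640, 705, 815, 780]  # mm, Table 5.1

alpha, beta = linear_regression(altitude, rainfall)  # step 1: about 545 mm and 1.56 mm/m
predicted_p = alpha + beta * 89                      # step 2: about 684 mm at 89 m altitude
```

The estimates agree with the values stated in the text (α = 545 mm, β = 1.56 mm/m, prediction 684 mm).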
In the example above we have only estimated the annual rainfall for one point. But the
same methodology could be used to estimate annual rainfall for many points. If, for
example, we have a digital elevation model (DEM) we can predict the annual rainfall for
a whole area. But, again, one should be a bit cautious. Above we stated that the regression
equation is only valid for the range in altitude from which it was derived. This is also true
across a geographical area. That is, you are only allowed to predict annual rainfall in the
same area where the observation points are lying. Predicting values outside this area
implies extrapolation and should be avoided.
6. Selecting interpolation methods
Imagine that you are working on a project and need to interpolate data. How would you proceed? Which interpolation method should you select? The aim of this section is to
provide some guidelines on how to select a proper interpolation method.
An interpolation process should always start by an investigation of the entity. In our
discussion here we base this investigation on the statistical model in sub-section 3.2:
z(x, y) = m(x, y) + \varepsilon'(x, y) + \varepsilon''   (6.1)
The first thing to find out is whether there is a trend in the data (i.e., whether m(x,y) is not a constant value). This can be investigated by creating a thematic map based on the observation points or by fitting a polynomial surface to the observation points. This provides you with a general feeling for the entity values in the region.
The second thing to find out is the correlation between entity values, i.e., the spatial autocorrelation. The spatial autocorrelation is related to the terms ε′(x, y) and ε″ in Equation 6.1 such that:

ε′(x, y) << ε″  ⇒  small spatial autocorrelation
ε′(x, y) >> ε″  ⇒  large spatial autocorrelation.
A first investigation of the spatial autocorrelation could be to study a map of the
observation data. If high (and low) entity values tend to be grouped in clusters it would
seem as if the spatial autocorrelation was high. On the other hand, if the entity values
look purely random, the spatial autocorrelation is probably low. This should be further
investigated by computing semivariances and plotting them in a diagram (remember to
remove the trend before computing the semivariances, cf. sub-section 4.2).
A third thing to investigate is if any additional information for interpolation is available.
This requires good knowledge of the entity to interpolate. E.g. if we are to interpolate
annual rainfall we (hopefully) know that annual rainfall is related to topography. Then we
could create a map where the distribution of the entity and the additional data is
visualised together. For example, in Figure 3.3 we can clearly see that there is a
relationship between annual rainfall and topography. If we have found a possible
relationship we can test this relationship statistically. A way forward here is to use linear regression (Equation 5.1). Start by computing the regression parameters α and β from your data material (at the observation points). If there is a linear relationship between the variables, the value of the regression parameter β is significantly different from zero (equivalent to the regression line in Figure 5.1 not being horizontal).
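Whether β is significantly different from zero can be checked with the standard t-test for a regression slope. A sketch using the data from Table 5.1 (2.365 is the two-sided 5% critical value of the t-distribution for n − 2 = 7 degrees of freedom):

```python
altitude = [35, 130, 50, 55, 70, 75, 90, 180, 150]        # m, Table 5.1
rainfall = [610, 760, 600, 630, 670, 640, 705, 815, 780]  # mm, Table 5.1

n = len(altitude)
x_mean = sum(altitude) / n
y_mean = sum(rainfall) / n
s_xx = sum((x - x_mean) ** 2 for x in altitude)
beta = sum((x - x_mean) * (y - y_mean)
           for x, y in zip(altitude, rainfall)) / s_xx
alpha = y_mean - beta * x_mean

# Residual sum of squares and the standard error of the slope.
sse = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(altitude, rainfall))
se_beta = (sse / (n - 2) / s_xx) ** 0.5
t_value = beta / se_beta  # compare against the critical value 2.365
```

For these data the t-value is well above the critical value, so β is significantly different from zero and the linear relationship is supported.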
After these three investigations it is time to decide which interpolation method to use. A
guideline for this selection is illustrated in Figure 6.1. More information about the
different choices is provided in the remaining sub-sections. You should interpret the
cases very small spatial autocorrelation and very large spatial autocorrelation as ideal cases. These cases seldom occur in reality in their pure form. But if you have a practical case that is somewhat similar to one of them, the ideal case should give you some clue as to how to solve your interpolation problem.
Is there additional information available that has a large correlation with the entity?
• Yes → Additional information available: use spatial prediction with additional information (sub-section 6.1).
• No → Does the entity have the following properties: no or a very small trend; no or very small spatial autocorrelation; large local variations?
  • Yes → Very small spatial autocorrelation: use the mean value, with many observation points (sub-section 6.2).
  • No → Does the entity have the following properties: very large autocorrelation; small local variations?
    • Yes → Very large spatial autocorrelation: nearest neighbour or inverse distance weighting with large k would be able to model the data (sub-section 6.3).
    • No → Does the entity have the following properties: very large trend; comparatively small local variations and spatial autocorrelation?
      • Yes → Very large trend: use some type of trend surface analysis, e.g. using polynomials or splines (this type of interpolation is not described in this document).
      • No → General case: use kriging or inverse distance weighting with normal k; use all points closer than the distance range (sub-section 6.4).

Figure 6.1: A guideline for the selection of an interpolation method.
6.1 Additional information available
If you have additional information available with a high correlation to the entity, the method spatial prediction using additional information may be a good choice for interpolation. The reason is that by using the additional information you can model a large part of the variations of m(x,y) and/or ε′(x, y). But, of course, this is only true if there is a strong correlation between the sought entity and the additional information. So it is important to test this correlation (e.g. by a statistical hypothesis test of the regression parameter β).
6.2 Very small spatial autocorrelation
The case that we denote very small spatial autocorrelation is characterised by:
a) Very small spatial autocorrelation (i.e. ε′(x, y) is zero or close to zero).
b) The trend is zero or small (i.e., m(x, y) is almost constant).
For this case the semivariogram looks as in Figure 6.2 and the point value distribution is
as in Figure 6.3.
To illustrate the case very small spatial autocorrelation we can utilise the statistical model in sub-section 3.3. In this model of the annual rainfall the component ε′(x, y) was completely dependent on the altitude of the station. If this component is zero, it would imply that we have a flat landscape where the annual rainfall is only dependent on the climate zone (which determines the trend m(x, y)) and the local variations (which determine ε″). Normally, it is assumed that the local variations can be modelled with the same distribution for all observation points (V(ε″) = σ²).
[Figure: semivariance γ(h) plotted against distance h.]
Figure 6.2: Semivariance for the case very small spatial autocorrelation.
[Figure: entity values z plotted against x; the observations scatter independently (ε″) around the constant level m(x, y); the interpolation point p is marked on the x-axis.]
Figure 6.3: Interpolation of a point p in the case very small spatial autocorrelation. As can be seen, all points are independent and have the same expectation value (= m(x, y)).
For the case very small spatial autocorrelation, the best interpolation method is the mean value of all observation points. The motivation is that this is the interpolation method that minimises the variance of the interpolated value. We do not provide a complete proof of this, but just a hint for the mathematically inclined reader (this hint is difficult enough; if you do not understand the details you should not worry). The variance of the interpolated value, V[z(x_p, y_p)], is equal to (where ε″ ∈ N(0, σ)):
V\left[ z(x_p, y_p) \right] = V\left[ \frac{\sum_{i=1}^{n} z(x_i, y_i)}{n} \right] = \frac{1}{n^2} \sum_{i=1}^{n} V\left[ z(x_i, y_i) \right] = \frac{1}{n^2} \sum_{i=1}^{n} \sigma^2 = \frac{n \sigma^2}{n^2} = \frac{\sigma^2}{n}   (6.2)
As can be seen, the variance of the interpolated values is the variance of each individual
observation divided by the number of observations. This holds for the mean value, i.e., when all the points have the same weight. It can be shown that every other possible setting of the weights gives a higher variance for the interpolated value.
The discussion above is quite abstract and it is, perhaps, helpful to describe it using the
example from Figure 6.3. The ideal interpolated value for point p is m(x,y). However, this
value is unknown; the only known values are the values measured at the observation
points. These values are all random values with the expectation value m(x,y) and the
variance σ2. Then, the best estimation of point p is the mean value of all the observation
points.
What can we learn from this? The main point is that if we have independent observations (which are all random values with the same expectation value and variance), then the best estimation is obtained by using the mean value. This rule does not hold if the observations are dependent on each other (= they are spatially autocorrelated).
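The claim that equal weights minimise the variance can be illustrated numerically. For weights w_i that sum to one, the derivation in Equation 6.2 generalises to V = σ²·Σw_i², and Σw_i² is smallest when all w_i = 1/n. A small sketch:

```python
import random

def estimator_variance(weights, sigma2=1.0):
    """Variance of sum(w_i * Z_i) for independent Z_i, each with variance sigma2."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to one"
    return sigma2 * sum(w * w for w in weights)

n = 10
equal_weights = [1.0 / n] * n  # the mean value: all weights equal to 1/n

rng = random.Random(42)
raw = [rng.random() for _ in range(n)]
unequal_weights = [w / sum(raw) for w in raw]  # arbitrary weights, normalised

var_equal = estimator_variance(equal_weights)      # sigma^2 / n
var_unequal = estimator_variance(unequal_weights)  # never smaller than sigma^2 / n
```

Whatever weights are tried, the variance never falls below σ²/n, which is exactly the value reached by the mean.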
6.3 Very large spatial autocorrelation
The case that we denote as very large spatial autocorrelation is characterised by:
a) Very large to extreme spatial autocorrelation (i.e. ε′(x, y) is large).
b) No or small local variations (ε″ almost zero).
The main characteristic of this case is that the local variations are much smaller than the
spatial autocorrelation term. For this case the semivariogram looks as in Figure 6.4 and
the point value distribution is as in Figure 6.5.
To illustrate the case very large spatial autocorrelation we can utilise the statistical model in sub-section 3.3. ε″ = 0 would imply that there are no local variations of the annual rainfall. Hence, the annual rainfall would be completely determined by the climate zone and the altitude.
[Figure: semivariance γ(h) plotted against distance h.]
Figure 6.4: Semivariance for the case very large spatial autocorrelation.
The interpolation problem for the case very large spatial autocorrelation is totally different from that of the previous case. There, the main problem was to reduce the large local variations in the interpolation. For the case very large spatial autocorrelation the main problem is instead to capture the undulations of the entity. Figure 6.5 is an illustration of the distribution of an entity with a strong autocorrelation. As seen from the figure, a good estimation of point p is given by using the nearest observation point, i.e., nearest neighbour interpolation. Why is the nearest neighbour a good selection? The reason is, again, that this interpolation method provides an estimate with low variance. We can state the following: if the local variations are very small in comparison to the autocorrelation for all points (ε′(x, y) >> ε″), two neighbouring points are strongly dependent on each other (the covariance between the points is high); therefore, the nearest neighbour
interpolation is a good method from a statistical point of view. The same argument
holds if the trend undulates dramatically and the undulation is much larger than the
local variations.
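As a minimal sketch (the function name and test coordinates are made up for illustration; NumPy is assumed), nearest neighbour interpolation simply returns the value measured at the closest observation point:

```python
import numpy as np

def nearest_neighbour(p, pts, values):
    """Return the value measured at the observation point closest to p."""
    d = np.hypot(pts[:, 0] - p[0], pts[:, 1] - p[1])
    return values[np.argmin(d)]

# Three hypothetical observation points with measured values.
pts = np.array([[0.1, 0.2], [0.5, 0.5], [0.9, 0.8]])
vals = np.array([3.0, 4.0, 5.5])

print(nearest_neighbour((0.45, 0.55), pts, vals))  # closest point is (0.5, 0.5) → 4.0
```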
This document mainly concerns the statistical aspects of selecting interpolation methods.
One should be aware that there are other aspects. For example, if you interpolate a
surface you often want a smooth surface (e.g. for visualisation purposes). In such a
case the nearest neighbour interpolation method is not a good choice (even though
it is a statistically good choice), because the nearest neighbour method produces a
discontinuous surface.
[Figure: point values z plotted against x; the observations deviate by ε'(x,y) from the mean m(x,y); the interpolation point p lies between the observations]
Figure 6.5: Interpolation of a point p in the case of very large spatial autocorrelation. In this
case ε'(x,y) >> ε" and m(x,y) is constant. It could also be that m(x,y) is undulating.
6.4 General case
In most real-life situations, neither ε'(x,y) nor ε" is close to zero and we do not have
access to additional information. This implies that we have the following situation:
• The closest observation points have the highest dependency with the interpolation
point, but points a bit further away are also correlated to the interpolation point.
• All points have spatially uncorrelated local variations (ε").
• No additional information is available.
For this case the semivariogram looks as in Figure 6.6 and the point value distribution is
as in Figure 6.7.
The first point suggests that the closest points are the most important. But since there are
local variations, we should use many observation points in the interpolation. The best
interpolation method here is to assign higher weights to the closest points and lower weights
to points further away. This can be achieved by using kriging or inverse distance
weighting with a typical value of k (2-3). We have previously described how to apply
kriging interpolation (sub-section 4.2) and inverse distance weighting (sub-section 2.5)
and will not describe these methods any further.
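A minimal inverse distance weighting sketch (hypothetical coordinates and values; weights proportional to 1/dᵏ with the default k = 2) could look as follows:

```python
import numpy as np

def idw(p, pts, values, k=2):
    """Inverse distance weighting: weights proportional to 1 / d**k."""
    d = np.hypot(pts[:, 0] - p[0], pts[:, 1] - p[1])
    if np.any(d == 0.0):            # p coincides with an observation point
        return float(values[np.argmin(d)])
    w = 1.0 / d**k
    return float(np.sum(w * values) / np.sum(w))

pts = np.array([[0.1, 0.2], [0.5, 0.5], [0.9, 0.8]])
vals = np.array([3.0, 4.0, 5.5])

# The estimate is a weighted mean, so it always lies between the
# smallest and the largest observed value.
print(idw((0.4, 0.4), pts, vals))
```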
Which observation points should be used in the interpolation? Again, the main aim is to
select a proper number of points to minimise the variance of the estimate (cf. sub-section
4.1). To minimise the variance we should only use points that are correlated to the
interpolation point, i.e., points that are closer to the interpolation point than the range
(cf. Figure 6.6).
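This selection step can be sketched as a simple distance filter (the coordinates and range values are made up): only observation points within the semivariogram range are passed on to the interpolation.

```python
import numpy as np

def points_within_range(p, pts, range_dist):
    """Indices of the observation points closer to p than the range."""
    d = np.hypot(pts[:, 0] - p[0], pts[:, 1] - p[1])
    return np.flatnonzero(d < range_dist)

pts = np.array([[0.1, 0.2], [0.5, 0.5], [0.9, 0.8]])

# With a short range only the closest point qualifies;
# a longer range includes all three points.
print(points_within_range((0.45, 0.55), pts, 0.4))   # [1]
print(points_within_range((0.45, 0.55), pts, 0.6))   # [0 1 2]
```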
[Figure: semivariogram; semivariance γ(h) plotted against distance h, levelling off at the range]
Figure 6.6: Semivariance for the general case.
[Figure: point values z plotted against x, y; local variations ε" are superimposed on the autocorrelated deviations ε'(x,y) around the mean m(x,y); the interpolation point p lies between the observations]
Figure 6.7: Interpolation of a point p in the general case.
Kriging is an optimal interpolation method when there is no trend and the semivariogram
is known. It selects the best weights for the surrounding points, and also considers the
point density and spatial distribution of the points. Thereby it minimises some unwanted
effects of inverse distance interpolation, particularly the creation of local minima and
maxima in the interpolated surface model (the "bull's-eye effect"). Another advantage of
kriging is that it produces an interpolation error (the kriging variance) for each interpolated
cell. These errors can be plotted in a map, showing areas of lower and higher fidelity of the
interpolated surface. Many variations of kriging exist: block kriging, which
generates smoothed output; co-kriging, which incorporates correlated information (such as
elevation when interpolating rainfall); universal kriging, which incorporates
information about linear trends; and indicator kriging, which allows interpolation of data
that are not normally distributed.
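To make the weight selection concrete, the sketch below implements ordinary kriging for a single point, assuming a spherical semivariogram model with made-up parameters (nugget, sill, range). It solves the kriging system for the weights and the Lagrange multiplier, and also returns the kriging variance, i.e. the interpolation error mentioned above.

```python
import numpy as np

def spherical(h, nugget=0.0, sill=1.0, range_dist=1.0):
    """Spherical semivariogram model (assumed parameters, for illustration)."""
    h = np.asarray(h, dtype=float)
    g = nugget + (sill - nugget) * (1.5 * h / range_dist
                                    - 0.5 * (h / range_dist) ** 3)
    return np.where(h >= range_dist, sill, np.where(h == 0.0, 0.0, g))

def ordinary_kriging(p, pts, values, variogram):
    """Ordinary kriging estimate and kriging variance for one point p."""
    n = len(pts)
    # Semivariances between all pairs of observation points.
    d = np.hypot(pts[:, None, 0] - pts[None, :, 0],
                 pts[:, None, 1] - pts[None, :, 1])
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(d)
    A[n, n] = 0.0                       # row/column for the Lagrange multiplier
    # Semivariances between the observation points and p.
    d0 = np.hypot(pts[:, 0] - p[0], pts[:, 1] - p[1])
    b = np.append(variogram(d0), 1.0)   # last entry forces the weights to sum to 1
    sol = np.linalg.solve(A, b)
    w, mu = sol[:n], sol[n]
    estimate = w @ values
    variance = w @ variogram(d0) + mu   # kriging (error) variance
    return estimate, variance

# Four hypothetical observation points on a unit square.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
vals = np.array([1.0, 2.0, 3.0, 4.0])
vgm = lambda h: spherical(h, nugget=0.0, sill=1.0, range_dist=2.0)

est, kvar = ordinary_kriging((0.5, 0.5), pts, vals, vgm)
print(est, kvar)   # est is 2.5 by symmetry; kvar is the error estimate
```

Note that ordinary kriging assumes no trend; when a trend is present, universal kriging (mentioned above) is the appropriate variant.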
Acknowledgements
Thanks to Lars Eklundh for corrections and constructive comments.
Thanks to Maj-Lena Lindersson for help with the climatology examples.
Thanks to David Tenenbaum for improving the language.