Statistical aspects of spatial interpolation

Lars Harrie
Department of Physical Geography and Ecosystem Analysis, Lund University
2008-10-24 (latest update: 2014-10)

Table of contents

1. Introduction
   1.1 Background
   1.2 Aim of this document
   1.3 Content
   1.4 Explanation of terms
   1.5 Further reading
2. Basic interpolation methods
   2.1 An example of annual rainfall
   2.2 General formula for interpolation
   2.3 Mean value
   2.4 Nearest neighbour
   2.5 Inverse distance weighting
   2.6 Comparison of the interpolation methods
3. Statistical models
   3.1 Spatial autocorrelation
   3.2 A statistical model for an entity's distribution
   3.3 A statistical model for annual rainfall
   3.4 Random variables
   3.5 Expectation value, variance, standard deviation, covariance, correlation coefficient and semivariance
4. Characteristics of optimal interpolation methods and kriging
   4.1 Characteristics of the optimal interpolation method
   4.2 Workflow of kriging interpolation
   4.3 Theory of kriging interpolation
5. Spatial prediction using additional information
   5.1 Basic theory of spatial prediction
   5.2 The workflow of spatial prediction
6. Selecting interpolation methods
   6.1 Additional information available
   6.2 Very small spatial autocorrelation
   6.3 Very large spatial autocorrelation
   6.4 General case
Acknowledgements
References

1. Introduction

1.1 Background

A common operation in geographical analysis is interpolation. In interpolation a value for a point is estimated from values at other points. In Figure 1.1 the annual rainfall at the point p is estimated from the measured annual rainfall at the meteorological stations. There are several interpolation methods that could be used to estimate the rainfall at the point p; in this document only a few of them are described.

Figure 1.1: The rainfall at point p can be estimated from the measured annual rainfall at the meteorological stations (triangles).

1.2 Aim of this document

The aim of this document is to describe the statistical aspects of interpolation. It describes basic interpolation methods such as mean value, nearest neighbour and inverse distance weighting; the document also provides the basic theory of two more advanced methods: kriging and spatial prediction with additional information. The description of the interpolation methods is based on statistical theory. The statistical theory is only briefly described in this document; it is highly recommended that the reader has basic statistical knowledge from previous studies.

Statistical knowledge is important for carrying out interpolation correctly. However, interpolation also requires other knowledge, for example firm knowledge about the entity to interpolate (e.g. rainfall) and practical skills in the use of computer programs for interpolation (e.g. GIS programs). These latter aspects are not treated in this document.

1.3 Content

The document is organised as follows:

Section 2: Describes three basic interpolation methods: mean value, nearest neighbour, and inverse distance weighting. This section only describes how to use the interpolation methods, not when it is suitable to use them.

Section 3: Before we perform interpolation we need statistical knowledge about the distribution of the entity. In this section we describe a statistical model often used to characterize an entity.

Section 4: This section describes what constitutes an optimal interpolation method. A description of the "optimal" interpolation method kriging is also provided.

Section 5: In some cases we have access to additional information that could be used to enhance the quality of the interpolation. The aim of this section is to study a method for spatial prediction using this additional information.

Section 6: The aim of this section is to provide guidelines on how to select an interpolation method based on the statistical knowledge of the entity.

1.4 Explanation of terms

In the document we will use the following definitions:

* entity - a quantity that is interpolated (e.g., rainfall, temperature, altitude, CO2).

* observation point - a point where the entity is measured (observed). This could be a meteorological station, a place where a soil sample is collected, etc.
* interpolation point - a point where the entity value is to be interpolated (e.g. point p in Figure 1.1).

1.5 Further reading

To understand interpolation it is important to have good knowledge in both statistics and GIS (and of course also in the application field). Some recommended books in these fields are:

Statistics:
Blom, G., Enger, J., Englund, G., Grandell, J., and Holst, L., 2005. Sannolikhetsteori och statistikteori med tillämpningar. Studentlitteratur. (in Swedish)
Haining, R., 1990. Spatial data analysis in the social and environmental sciences. Cambridge University Press.
Vännman, K., 2002. Matematisk statistik. Studentlitteratur. (in Swedish)

GIS:
Burrough, P., and McDonnell, R., 1998. Principles of Geographical Information Systems. Oxford University Press.
Harrie, L., 2013. Geografisk informationsbehandling – teori, metoder och tillämpningar, 6th ed. (in Swedish)
Östman, A., 1995. Interpolering av geografiska data. Luleå Tekniska Universitet. (in Swedish)

2. Basic interpolation methods

The aim of this section is to describe some basic interpolation methods. It is important to understand these basic methods well; this knowledge is required in order to use more advanced methods, such as kriging, in a proper way. The interpolation methods described in this section are: mean value (sub-section 2.3), nearest neighbour (sub-section 2.4), and inverse distance weighting (sub-section 2.5). The methods are illustrated by an annual rainfall example (described in sub-section 2.1).

2.1 An example of annual rainfall

Imagine that we want to interpolate the annual rainfall z at point p in Figure 2.1. To perform the interpolation we can use the four observation points in Table 2.1.

Table 2.1: Values for the observation points (1-4) and the interpolation point p.

Observation point     x (km)   y (km)   Measured annual rainfall z (mm)
Point 1               0.0      0.0      400
Point 2               1.0      0.0      500
Point 3               0.0      1.0      600
Point 4               1.0      1.0      800
Interpolation point   x (km)   y (km)
Point p               0.2      0.2      To be estimated

Figure 2.1: The annual rainfall for point p is to be interpolated from the measured values at the four observation points (see Table 2.1).

2.2 General formula for interpolation

The three interpolation methods described in this section (mean value, nearest neighbour, and inverse distance weighting) are all special cases of the general formula for interpolation. The general formula is stated as:

z(x_p, y_p) = \frac{\sum_{i=1}^{n} \lambda_i \, z(x_i, y_i)}{\sum_{i=1}^{n} \lambda_i}    (2.1)

where z(x_p, y_p) is the interpolated value at point p, z(x_i, y_i) is the measured value at the observation point i, \lambda_i is the weight for the measured value at the observation point i, and n is the number of observation points.

2.3 Mean value

A simple approach is to set the same weight for all observation points (e.g. setting \lambda_i = 1 for all i in Equation 2.1). This implies that the interpolated value z(x_p, y_p) equals:

z_{mv}(x_p, y_p) = \frac{\sum_{i=1}^{n} z(x_i, y_i)}{n}    (2.2)

If we apply Equation 2.2 to the annual rainfall example we obtain:

z_mv(x_p, y_p) = (400 + 600 + 500 + 800) / 4 = 575 mm.

2.4 Nearest neighbour

One might argue that the value at point p is most likely quite close to the value at the closest observation point. This is the basic assumption of the nearest neighbour interpolation method.
This assumption can be implemented by manipulating the weight terms in Equation 2.1: set the weight for the closest observation point to 1 (\lambda_i = 1, where i is the observation point closest to the interpolation point p) and the weights for the other observation points to zero. From Figure 2.1 we see that observation point 1 is closest in space to the interpolation point p. Hence, we get z_nn(x_p, y_p) = 400 mm when using nearest neighbour interpolation for the annual rainfall example.

2.5 Inverse distance weighting

In the inverse distance weighting (IDW) method the weights are given as:

\lambda_i = \frac{1}{d_i^{\,k}}    (2.3)

where d_i is the distance between the points (x_i, y_i) and (x_p, y_p), and k is an exponent that determines the weight's dependence on the distance. If we introduce this expression for the weights into the general formula for interpolation (Equation 2.1) we obtain:

z_{idw}(x_p, y_p) = \frac{\sum_{i=1}^{n} \frac{1}{d_i^{\,k}} \, z(x_i, y_i)}{\sum_{i=1}^{n} \frac{1}{d_i^{\,k}}}    (2.4)

The value of the exponent k describes how the weights depend on the distance d. If a low value is used for k, all the observation points get similar weights. In the extreme case we set k = 0, which implies that all observation points get the same weight, as for the mean value interpolation; this follows from the fact that d^0 = 1 for all values of d. If k is set to a high value, the observation points that are close to the interpolation point get much higher weights than observation points further away. The extreme case here is that k approaches infinity; then only the closest observation point is considered, and the inverse distance interpolation becomes equal to the nearest neighbour interpolation. In practice, the extreme cases k = 0 or k → infinity are seldom used; normal values of k are 2 or 3.

Inverse distance weighting can easily be computed using a table. For example, if we use k = 2 in the annual rainfall example we get the values in Table 2.2. Using the values in the table, the interpolated value for the point p is:

z_idw(x_p, y_p) = 7234.3 / 16.204 = 446 mm

Table 2.2: Inverse distance weighting computations (k = 2).

          z(x_i, y_i) (mm)   d (distance to p, km)   \lambda_i (weight)   z(x_i, y_i) * \lambda_i (mm)
Point 1   400                0.283                   12.486               4994.4
Point 2   500                0.825                   1.469                734.5
Point 3   600                0.825                   1.469                881.4
Point 4   800                1.132                   0.780                624.0
Sum                                                  16.204               7234.3

2.6 Comparison of the interpolation methods

For the annual rainfall example (in sub-section 2.1) we get the following interpolated values with the various interpolation methods (a short code sketch reproducing these numbers follows the remarks below):

Mean value: z_mv(x_p, y_p) = 575 mm
Nearest neighbour interpolation: z_nn(x_p, y_p) = 400 mm
Inverse distance weighting (with k = 2): z_idw(x_p, y_p) = 446 mm

Apparently, the methods estimate quite different values for the interpolation point. An obvious question is: which interpolation method provides the best estimation? There are no simple truths here, but we can make the following remarks:

* A drawback of the mean value is that points far away influence the result to a great extent. In the example, observation point 4 has a large value (z(x_4, y_4) = 800 mm), which causes the high value of the interpolated mean value.

* The nearest neighbour interpolation method does not take into account that the annual rainfall values seem to rise in a north-easterly direction. From Figure 2.1, we would probably anticipate that the annual rainfall at point p is a bit larger than 400 mm.
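The three estimators above are easy to verify numerically. The following Python sketch (a minimal illustration using numpy; the variable names are our own) reproduces the hand calculations for the example in Table 2.1. Because Table 2.2 rounds the distances before squaring them, the IDW value differs marginally from the tabulated 446 mm.

```python
import numpy as np

# Observation points from Table 2.1: (x, y, z) with z = annual rainfall in mm
obs = np.array([
    [0.0, 0.0, 400.0],
    [1.0, 0.0, 500.0],
    [0.0, 1.0, 600.0],
    [1.0, 1.0, 800.0],
])
xp, yp = 0.2, 0.2  # interpolation point p

d = np.hypot(obs[:, 0] - xp, obs[:, 1] - yp)  # distances to p
z = obs[:, 2]

z_mv = z.mean()                    # mean value (all weights equal)
z_nn = z[np.argmin(d)]             # nearest neighbour (closest point gets weight 1)
w = 1.0 / d**2                     # IDW weights with k = 2 (Equation 2.3)
z_idw = np.sum(w * z) / np.sum(w)  # general formula (Equation 2.1)

print(z_mv, z_nn, z_idw)           # -> 575.0, 400.0, ~446.5
```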
In the annual rainfall example the inverse distance weighting method, with a medium value of the distance dependence of the weights (such as k = 2), seems to give the most likely value. But this is based more on intuition than on something we really know; to get further we must have better knowledge about the distribution of the annual rainfall. For example, we do not know, from our example, the distribution pattern of annual rainfall in the area (Figure 2.2); firm knowledge of this pattern is important for selecting the best interpolation method.

Figure 2.2: This figure illustrates possible distributions of annual rainfall values between observation points 1 and 2 along the x-axis in Figure 2.1. From Table 2.1 we know the values for x=0 (z=400) and x=1 (z=500), but we do not know anything about the annual rainfall values between these points. The rainfall may vary slowly, as for the dashed line, or it may change more rapidly (as for the solid line). Normally, the actual variations between the observation points are unknown, and we only have access to statistical properties of the variations.

The concept illustrated in Figure 2.2 can perhaps more easily be described using a real-world example. Imagine that you are to interpolate annual rainfall values. How large distances to the observation points could you accept? 100 m, 1 km, 10 km, 100 km or even 1000 km? At some point the distances become too long (i.e., the variations between the points become too large) to perform meaningful interpolation.

To summarize, the selection of interpolation method must be based on knowledge of the measured entity. In the next section, we describe a statistical model often used for this purpose.

3. Statistical models

Knowledge about an entity's distribution can be of two kinds:

1) Exact knowledge of how the entity varies, meaning that we know the values of the entity at all points.
2) Statistical knowledge of how the entity varies.

In practice, the first kind of knowledge does not exist. All measurements are, in principle, samples in space. This implies that we can never have knowledge of an entity's values at all points (and if we really had knowledge of the entity at all points there would be no reason to interpolate). This leaves us, in all practical cases, with the second type of knowledge: statistical knowledge about the entity's distribution.

This section deals with the statistical properties of a measured entity. A description of spatial autocorrelation follows in sub-section 3.1. Then, in sub-section 3.2, we describe a statistical model of how an entity varies, which is exemplified in sub-section 3.3. This model can be described using random variables (sub-section 3.4) and fundamental statistical quantities associated with random variables: expectation value, variance, standard deviation, covariance, correlation coefficient and semivariance (sub-section 3.5).

3.1 Spatial autocorrelation

Tobler's first law of geography states: "Everything is related to everything else, but near things are more related than distant things." In other words, measured entity values at close points are more likely to be similar than measured entity values at distant points. In geography this dependency between entity values is denoted spatial autocorrelation.
The spatial autocorrelation is fundamental for interpolation; if there were no correlation between points, the theoretical basis for interpolation would disappear. Furthermore, Tobler's law also indicates that close observation points should have higher weights than distant observation points. But how much higher should these weights be? To answer this question we need a statistical model of the entity's distribution.

3.2 A statistical model for an entity's distribution

The distribution of an entity, e.g. annual rainfall, is very complex. To perform computations we need to use simplified statistical models. A common model is that the value (z) of an entity in the plane (where the position in the plane is given by the coordinates x and y) is composed of three parts (see also Figure 3.1):

z(x, y) = m(x, y) + \varepsilon'(x, y) + \varepsilon''    (3.1)

where

m(x, y) is a structural component. This component is normally a constant value or a trend surface (e.g. a polynomial surface of low degree).

\varepsilon'(x, y) is a spatially autocorrelated variable (often called a regionalized variable). Two points that lie close together have similar values of \varepsilon'(x, y).

\varepsilon'' is local variation that is not spatially autocorrelated. Normally it is assumed that \varepsilon'' is normally distributed with an expectation value of zero and a standard deviation equal to σ; that is, \varepsilon'' \in N(0, σ). Two points that are close together can have completely different values of \varepsilon''.

Figure 3.1: A statistical model of the measured values (z) at points a-h of an entity (the measured values are drawn with filled circles). The structural component (m) is a constant value in this case. The regionalized variable \varepsilon'(x, y) changes slowly in the plane. The size of the local variation (\varepsilon'') is illustrated by a dotted line; as you can see there is no spatial dependence for the local variations. The horizontal axis should really be the xy-plane, but for practical reasons we have drawn it in one dimension only. One can regard the horizontal axis as a line in the horizontal plane (Figure 3.2).

Figure 3.2: An illustration of how the points a-h in Figure 3.1 could be located in reality. The line through the points is the same as the horizontal axis in Figure 3.1. The map is from OpenStreetMap (http://www.openstreetmap.org/).

3.3 A statistical model for annual rainfall

The aim of this sub-section is to illustrate the statistical model above with a practical example. For this purpose we use annual rainfall in the region of Scania, southern Sweden. The annual rainfall can be regarded as a random variable (Z), where all measured annual rainfalls are observations of this random variable. Annual rainfall, measured at a location with the coordinates x and y, can be regarded as consisting of the following components:

z(x, y) = m(x, y) + \varepsilon'(x, y) + \varepsilon''    (3.2)

where

m(x, y) is a value that depends on the climate zone. We here regard the whole of Scania as lying within the same climate zone; this implies that m(x, y) is a constant value.

\varepsilon'(x, y) is a component that to a high degree depends on the altitude of the meteorological station. The higher above sea level, the larger the annual rainfall (cf. Figure 3.3); in Scania the annual rainfall is estimated to increase by about 150 mm if the altitude increases by 100 m (Blennow et al., 1999).
Altitude is a parameter with a large spatial autocorrelation; this implies that the component of annual rainfall that depends on the altitude is spatially autocorrelated.

\varepsilon'' is the local variation of the annual rainfall. This component can be regarded as a purely random variable with no spatial autocorrelation. There might, of course, be physical explanations of these local variations, e.g. local relief, aspect, and exposure to wind. Random measurement errors will also contribute to this component.

As given by the statistical model, the measured annual rainfall depends on components that vary quite differently in space. To perform a proper interpolation of annual rainfall, knowledge about these individual components is essential.

Figure 3.3: Annual rainfall and topography in Scania, Sweden. The isolines represent the annual rainfall (mm). The background colour represents the altitude; the highest point in Scania is about 210 meters above sea level.

The interpretation of the three components above is strongly linked to the spatial scale of observation. At another scale, other physical explanations may contribute to the different components. For example, in a smaller area topography might contribute more to m(x, y) than to \varepsilon'(x, y).

3.4 Random variables

The statistical model in sub-section 3.2 can be described using the statistical quantities expectation value, variance, standard deviation, covariance, correlation coefficient and semivariance. In this document these quantities are described briefly, but first we have to introduce the concept of a random variable.

A random variable is a variable for which the exact value is unknown; the only thing that is known is the likelihood of the possible values. A random variable (below denoted Z) can be either discrete or continuous (Figure 3.4). In the discrete case the random variable can only take a fixed number of values; an example of a discrete random variable is the number of points on a die (which can take the values [1, 2, 3, 4, 5, 6]). The likelihood of each possible value is determined by the probability function, which is denoted p_Z(z). The likelihood of each possible value of a die is, of course, p_Z(z_i) = 1/6.

A continuous random variable can take all possible values in an interval. The likelihood of a value is determined by the probability density function f_Z(z). The likelihood that a continuous random variable lies within an interval is computed by integration. For example, the likelihood that the random variable Z has a value between a and b in Figure 3.4 is:

P(a \le Z \le b) = \int_{a}^{b} f_Z(z) \, dz    (3.3)

The total sum of the likelihoods of all possible values of a random variable is equal to one. That is, we have the following relationship for the probability function in the discrete case:

\sum_{i=1}^{n} p_Z(z_i) = 1    (3.4)

where there are n possible values. For the probability density function in the continuous case we have:

\int_{-\infty}^{\infty} f_Z(z) \, dz = 1    (3.5)

In geographical applications it is more common to work with continuous random variables than with discrete random variables. In the next sub-section, annual rainfall is used to illustrate some statistical quantities. Annual rainfall can be regarded as a continuous random variable.

Figure 3.4: Discrete and continuous random variables. In the discrete case the random variable can only take a fixed number of values (in the figure only 6 values are possible). The probabilities of the values are illustrated by the heights of the vertical lines. For a continuous random variable all values along the number line (e.g. all real numbers in an interval) are possible.
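As a small numerical check of Equations 3.3-3.5, the Python sketch below sums the probability function of a fair die and numerically integrates a normal density. The density parameters (a mean of 459 mm and a standard deviation of 44 mm, i.e. the values that will be estimated for annual rainfall in sub-section 3.5) and the interval limits are chosen only for illustration.

```python
import numpy as np

# Discrete case (Equation 3.4): the probabilities of a fair die sum to one
p = np.full(6, 1 / 6)
print(p.sum())                  # -> 1.0

# Continuous case: a normal probability density function as an example of f_Z(z)
mu, sigma = 459.0, 44.0         # assumed mean and standard deviation (mm)
z = np.linspace(mu - 8 * sigma, mu + 8 * sigma, 200_001)
dz = z[1] - z[0]
f = np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

print(np.sum(f) * dz)           # Equation 3.5: the density integrates to ~1.0

# Equation 3.3: the probability that Z lies between a = 400 and b = 500 mm
a, b = 400.0, 500.0
mask = (z >= a) & (z <= b)
print(np.sum(f[mask]) * dz)     # -> ~0.73
```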
3.5 Expectation value, variance, standard deviation, covariance, correlation coefficient and semivariance

The aim of this sub-section is to briefly describe some statistical quantities. In brief the quantities are:

Expectation value - describes the most expected value of a random variable.
Variance and standard deviation - describe the spread of the random variable around the expected value.
Covariance and correlation coefficient - describe the correlation between two random variables.
Semivariance - describes the correlation between a random variable at two different locations.

Below we state the definitions of the quantities. After each definition follows an example of how the quantity is estimated from a data material.

Definition - Expectation value (of a continuous random variable Z), E(Z):

E(Z) = \int_{-\infty}^{\infty} z \, f_Z(z) \, dz    (3.6)

where z are values of the random variable, and f_Z(z) is the probability density function (cf. Figure 3.4).

Example - Expectation value

The expectation value can be estimated by the mean value (\bar{z}) of all observations. The mean value is computed by:

\bar{z} = \frac{\sum_{i=1}^{n} z_i}{n}    (3.7)

The expectation value, variance and standard deviation are exemplified with data about annual rainfall measured at a meteorological station (Table 3.1). The annual rainfall is a (continuous) random variable Z and the measured values for each year are observations (z_i) of this random variable.

Table 3.1: Annual rainfall at a meteorological station (MS 1).

Year   Annual rainfall (mm) - MS 1
2000   412
2001   430
2002   512
2003   500
2004   440

If we use the values from Table 3.1 in Equation 3.7 we get:

\bar{z} = \frac{\sum_{i=1}^{5} z_i}{5} = \frac{412 + 430 + 512 + 500 + 440}{5} = 459 mm    (3.8)

The expectation value is the most likely value of a random variable. If we have a data material where all observations have the same "status", the expectation value is estimated by the mean value. However, this is not always the case. In Table 3.2 the observations do not have the same status: some observations are made over a longer period (a decade rather than a year) and therefore have better quality (i.e. a smaller variance). For this data material the mean value is not the best estimate of the expectation value.

Table 3.2: Annual rainfall at a meteorological station (MS 1).

Period      Annual rainfall (mm) - MS 1
1970-1979   452
1980-1989   463
1990-1999   450
2000        412
2001        430

Definition - Variance, V(Z):

V(Z) = E\left[ (Z - \mu)^2 \right]    (3.9)

where \mu = E(Z).

Example - Variance

In most cases it is not enough to know the expectation value of a random variable; we also need to know how the observations are spread around the expectation value. The variance describes this spread of the observations. The variance (s^2) is estimated by:

s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (z_i - \bar{z})^2    (3.10)

This implies that we obtain the following for the annual rainfall example (using z_i from Table 3.1):

s^2 = \frac{1}{5 - 1} \sum_{i=1}^{5} (z_i - 459)^2 = 1975 mm^2    (3.11)

Be aware of the unit (mm^2) of the variance of the annual rainfall; it is important that you always use the correct unit. One might wonder why we divide by n - 1 rather than by n.
A short explanation is that we use the same data material to estimate the mean value (\bar{z}) as we use for estimating the variance. If the expectation value of Z, E(Z), is known we divide by n instead of n - 1. For a more rigorous explanation, you should study statistical literature describing the difference between a sample and the population from which it is drawn.

Definition - Standard deviation, D(Z):

D(Z) = \sqrt{V(Z)}    (3.12)

Example - Standard deviation

The standard deviation is, like the variance, a measure of the spread of the observations. If the variance is known (or estimated) the standard deviation can easily be computed. For the annual rainfall example we obtain the following standard deviation (s):

s = \sqrt{s^2} = 44 mm    (3.13)

Definition - Covariance, C(Z, W):

C(Z, W) = E\left[ (Z - \mu_Z)(W - \mu_W) \right]    (3.14)

where \mu_Z = E(Z) and \mu_W = E(W).

Example - Covariance

Expectation value, variance and standard deviation are all characteristics of a single random variable. The covariance is different: it describes the dependency between two random variables. An example of a dependency is that if the observation of one random variable is high, then the corresponding observation of the other random variable is also high. The covariance between the random variables Z and W is estimated by:

c_{z,w} = \frac{1}{n - 1} \sum_{i=1}^{n} (z_i - \bar{z})(w_i - \bar{w})    (3.15)

Here we exemplify covariance with the random variables annual rainfall (Z) and annual temperature (W) (Table 3.3). As seen in Figure 3.5, there is a clear negative relationship between these random variables.

Table 3.3: Annual rainfall and annual temperature at 6 meteorological stations. All values are measured during the same year.

Meteorological station   Annual rainfall (mm) (Z)   Annual temperature (°C) (W)
1                        454                        8.3
2                        512                        7.9
3                        812                        6.5
4                        725                        7.5
5                        556                        7.3
6                        630                        7.5

Figure 3.5: Relationship between annual rainfall and annual temperature. A scatter plot of the values shown in Table 3.3.

The mean values for the annual rainfall (\bar{z}) and annual temperature (\bar{w}) are:

\bar{z} = 614.8 mm
\bar{w} = 7.50 °C

These mean values are then used to estimate the covariance (c_{z,w}) (using z_i and w_i from Table 3.3):

c_{z,w} = \frac{1}{6 - 1} \sum_{i=1}^{6} (z_i - 614.8)(w_i - 7.50) = -71.0 mm \cdot °C    (3.16)

The covariance is negative here. This is always the case when there is a negative relationship between the random variables (see Figure 3.5). Pay attention to the unit of the covariance.

Definition - Correlation coefficient, \rho(Z, W):

\rho(Z, W) = \frac{C(Z, W)}{D(Z) \, D(W)}    (3.17)

Example - Correlation coefficient

In the example above we computed the covariance to be -71.0 mm·°C. Does this indicate a low or a high correspondence between annual temperature and annual rainfall? It is not easy to answer this question; in general, it is not easy to judge the correspondence between parameters solely based on the covariance. If we had measured in another unit, e.g. inches instead of mm, the covariance would have been different. It becomes much easier if we compute the correlation coefficient. The correlation coefficient is the covariance normalised by the standard deviations of the two random variables (Equation 3.17). This makes the correlation coefficient unitless. Furthermore, the correlation coefficient is:

almost 1 - if there is a positive linear dependence between the two random variables,
almost -1 - if there is a negative linear dependence between the two random variables, and
around 0 - if the random variables are linearly independent.

To compute the correlation coefficient in our example we proceed as follows. Start by computing the standard deviations (s_z, s_w):

s_z = 135 mm
s_w = 0.607 °C

Then we obtain the correlation coefficient (r) by dividing the covariance by these standard deviations:

r = \frac{c_{z,w}}{s_z \, s_w} = -0.87    (3.18)

The correlation coefficient is almost equal to -1. We can therefore conclude that there is a strong negative linear dependency between the two random variables. Or, in other words, the lower the temperature is, the more it rains.
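The estimates above are easily reproduced with numpy. The sketch below uses the data from Tables 3.1 and 3.3; note that numpy's ddof=1 gives the n - 1 denominator discussed above.

```python
import numpy as np

# Annual rainfall at MS 1, years 2000-2004 (Table 3.1)
z_ms1 = np.array([412, 430, 512, 500, 440], dtype=float)

print(z_ms1.mean())            # mean value (Eq. 3.7)        -> 458.8 (~459 mm)
print(z_ms1.var(ddof=1))       # sample variance (Eq. 3.10)  -> ~1975 mm^2
print(z_ms1.std(ddof=1))       # sample standard deviation   -> ~44 mm

# Annual rainfall (z) and temperature (w) at six stations (Table 3.3)
z = np.array([454, 512, 812, 725, 556, 630], dtype=float)
w = np.array([8.3, 7.9, 6.5, 7.5, 7.3, 7.5])

print(np.cov(z, w)[0, 1])        # sample covariance (Eq. 3.15)      -> ~-71.0 mm*degC
print(np.corrcoef(z, w)[0, 1])   # correlation coefficient (Eq. 3.17) -> ~-0.87
```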
Definition - Semivariance, \gamma(\mathbf{h}):

\gamma(\mathbf{h}) = \tfrac{1}{2} E\left[ \left( Z(x, y) - Z((x, y) + \mathbf{h}) \right)^2 \right]    (3.19)

where

Z is a continuous random variable,
x, y are coordinates that determine the position in the plane (non-random), and
\mathbf{h} is a vector that describes a translation in the xy-plane.

Example - Semivariance

In sub-section 3.1 we introduced the concept of spatial autocorrelation, but no measure of this concept was described. The aim is now to show that semivariance can be used as a measure of spatial autocorrelation. But we start with a short investigation of the term spatial autocorrelation. The term has three major parts: spatial, auto and correlation. Firstly, correlation indicates that it is a correlation of entity values. Secondly, auto implies that it is a correlation between two observations of the same entity. Thirdly, spatial means that the two entity values are separated in space. To conclude, spatial autocorrelation is the correlation between two observations of the same entity in space. It can, for example, be the correlation between annual rainfall measured at two meteorological stations.

Spatial autocorrelation can be measured by the spatial covariance and the spatial correlation coefficient, but normally the semivariance is used. There is an analytical relationship between these measures: if the spatial covariance is known, you can always compute the semivariance. The exact relationship is not stated here, but we can conclude from the definitions (3.14 and 3.19) that if the covariance is high, the semivariance is low, and vice versa.

Semivariance is a measure of the correlation between two random variables (of the same entity) at a distance h. Behind this definition lies the stationarity condition. The stationarity condition for a random variable Z states that:

the expectation value of Z is the same for all points, and
the correlation (or semivariance) between values of Z at two points depends only on the distance (and sometimes also the direction) between the points.

To utilise semivariance these two conditions must be fulfilled. This implies that if the expectation value of Z varies, we have to remove this variation before the semivariance is computed. In the statistical model in Equation 3.1 the expectation value of Z is equal to m(x, y). To conclude, you should always start by estimating m(x, y) (e.g. by using polynomials in the plane); if m(x, y) is not a constant value it has to be removed before you compute the semivariance.

Normally, you assume that the correlation between two points depends only on the distance between the points and not on the direction. This is called isotropy (= direction independence).
This implies that the semivariance in Equation 3.19 is only a function of the length of the vector \mathbf{h}; this length we denote h.

After this rather theoretical discussion it is time to illustrate semivariance with an example. In this example the semivariance is computed for measurements of annual rainfall at 16 meteorological stations (Table 3.4 and Figure 3.6).

Table 3.4: Annual rainfall measured at 16 meteorological stations.

Meteorological station   Annual rainfall (mm) (Z)
A                        400
B                        420
C                        440
D                        450
E                        430
F                        440
G                        450
H                        460
I                        445
J                        460
K                        470
L                        480
M                        460
N                        475
O                        480
P                        490

Figure 3.6: Locations of the 16 meteorological stations in Table 3.4. The stations lie on a regular grid with 1 km spacing.

To compute the semivariance we start by creating distance intervals. In our example the point distribution is regular and the distances between two arbitrary points are: 1.0000, 1.4142, 2.0000, 2.2361, 2.8284, 3.0000, 3.1623, 3.6056 or 4.2426 km. Each point pair belongs to one of these groups. E.g. the group "point-distance-2km" consists of the following point pairs: A-C, B-D, E-G, F-H, I-K, J-L, M-O, N-P, A-I, E-M, B-J, F-N, C-K, G-O, D-L and H-P (as seen we use both vertical and horizontal point pairs - i.e., an isotropic model). For each group we estimate the semivariance by:

\hat{\gamma}(h) = \frac{1}{2n} \sum_{i=1}^{n} \left( z(x_i, y_i) - z((x_i, y_i) + \mathbf{h}) \right)^2    (3.20)

where n is the number of point pairs in the group. For the group "point-distance-2km", which has 16 point pairs, we have:

\hat{\gamma}(2\,km) = \frac{1}{2 \cdot 16} \left[ (z_A - z_C)^2 + (z_B - z_D)^2 + ... + (z_H - z_P)^2 \right] = 447 mm^2

The same procedure is repeated for all distance groups, and the result is provided in Figure 3.7. We will come back to semivariance in sub-section 4.2 when kriging interpolation is described.

Figure 3.7: Plotted values of the semivariance as a function of distance for the data in Table 3.4 and Figure 3.6.
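The semivariance computation above is easy to automate. The sketch below assumes the station layout of Figure 3.6 (a 4 x 4 grid with 1 km spacing, stations A-D in the first column, E-H in the second, and so on) and applies Equation 3.20 to every distance group.

```python
import numpy as np
from collections import defaultdict

# Annual rainfall (Table 3.4); coordinates assumed from Figure 3.6:
# index 0..15 corresponds to stations A..P, column by column on a 1 km grid
z = np.array([400, 420, 440, 450, 430, 440, 450, 460,
              445, 460, 470, 480, 460, 475, 480, 490], dtype=float)
xy = np.array([(col, row) for col in range(4) for row in range(4)], dtype=float)

# Group all point pairs by their (rounded) separation distance and apply Eq. 3.20
groups = defaultdict(list)
for i in range(len(z)):
    for j in range(i + 1, len(z)):
        h = round(np.hypot(*(xy[i] - xy[j])), 4)
        groups[h].append((z[i] - z[j]) ** 2)

for h in sorted(groups):
    gamma = sum(groups[h]) / (2 * len(groups[h]))   # Equation 3.20
    print(f"h = {h:.4f} km: gamma = {gamma:.1f} mm^2 ({len(groups[h])} pairs)")
# For h = 2.0000 km this gives 446.9, i.e. the ~447 mm^2 computed above.
```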
4. Characteristics of optimal interpolation methods and kriging

This section starts by defining what we mean by an optimal interpolation method. Then the interpolation method kriging is described, which under certain statistical prerequisites is optimal.

4.1 Characteristics of the optimal interpolation method

In sub-section 2.6 a comparison between the interpolation methods mean value, nearest neighbour and inverse distance weighting was given. The example reveals that the interpolated values z_mv, z_nn and z_idw can be quite different. That sub-section also included a short discussion about which interpolation method to prefer. It is now time to come back to the issue of which interpolation method provides the best estimation; this time the discussion is based on a statistical approach.

The interpolated value is a function of measured values (cf. Equation 2.1). Since the measured values are (continuous) random variables, the interpolated value is also a random variable. For example, in Section 2 we interpolated values for point p with three methods (mean value, nearest neighbour and inverse distance weighting). The result of each interpolation is a random variable: Z_mv, Z_nn and Z_idw. The random variables are characterized by their expectation values and their variances. The best interpolation method is the one that provides a random variable with the following two characteristics:

* a correct expectation value, and
* the smallest variance.

An illustration of these characteristics is given in Figure 4.1. In this figure the probability density functions are drawn for three interpolated random variables: Z_mv, Z_nn and Z_idw. All three random variables have expectation values equal to μ (and this is a correct expectation value). The interpolation method inverse distance weighting has the smallest variance; therefore this method is the best one in this example. That the method is the best one is synonymous with the fact that the probability of getting a good value (= a value close to the expectation value) with this method is larger than for the other methods.

It is important to note that the best interpolation method (according to the statistical requirements above) does not always provide the best estimation. In Figure 4.1 the interpolation has resulted in three values: z_mv, z_nn and z_idw. These interpolated values are observations of the random variables Z_mv, Z_nn and Z_idw respectively. As seen from the figure, the mean value interpolation (z_mv) provides the best estimation (= the value closest to the expectation value). However, in a practical case the expectation value is unknown and therefore it is not possible to know which estimated value is the best one. This means that we should select the interpolation method that provides the best statistical properties.

Figure 4.1: Illustration of three random variables: Z_mv (dashed-dotted line), Z_nn (solid line) and Z_idw (dashed line). The random variables are results of the interpolation methods mean value, nearest neighbour and inverse distance weighting. z_mv, z_nn and z_idw are interpolated values; these values are observations of the random variables Z_mv, Z_nn and Z_idw. (Note that this is just an illustration of concepts. The figure should not be interpreted as if inverse distance weighting always provides a random variable with smaller variance than the other interpolation methods. Nor is it always the case that z_mv is the best estimation.)

Above we stated that a good interpolation method must provide an estimate with a correct expectation value. In fact, all interpolation methods that are based on the general formula (Equation 2.1) provide a random variable with a correct expectation value. This fact is based on the assumption that the expectation value is the same for all points; this assumption is common in geographic analysis (see the discussion about stationarity in sub-section 3.5).

We will now show that the general formula for interpolation provides an estimate with a correct expectation value. We start by stating that the expectation value is a linear operator. That is, the following rule holds:

E(aX + Y) = a \, E(X) + E(Y)    (4.1)

We now use the linearity property on the general formula for interpolation (Equation 2.1) and utilise the fact that all points have the same expectation value (= μ):

E\left[ z(x_p, y_p) \right] = E\left[ \frac{\sum_{i=1}^{n} \lambda_i \, z(x_i, y_i)}{\sum_{i=1}^{n} \lambda_i} \right] = \frac{\sum_{i=1}^{n} \lambda_i \, E\left[ z(x_i, y_i) \right]}{\sum_{i=1}^{n} \lambda_i} = \frac{\mu \sum_{i=1}^{n} \lambda_i}{\sum_{i=1}^{n} \lambda_i} = \mu    (4.2)

That each interpolation method based on the general formula provides a correct expectation value does not imply that all interpolation methods are equally good. The variance of the estimate varies depending on which interpolation method is used; according to the second characteristic, we should select a method that minimises the variance (cf. Figure 4.1).
A method that is derived to minimise the variance is kriging, which is the topic of the remaining part of this section.

4.2 Workflow of kriging interpolation

The aim of this sub-section is to describe the workflow of kriging interpolation. We go through each step of the method as a recipe; the theoretical background is left to the next sub-section.

1) The point p is to be estimated from a set of observations (Figure 4.2). But before we perform the actual interpolation we must investigate the statistical properties of the entity. This investigation is based on the measured values at the observation points.

Figure 4.2: The value at the interpolation point p is to be estimated from the observation points (non-filled circles).

2) Start by investigating the trend in the data. In other words, we estimate the value m(x, y) in Equation 3.1. This investigation could e.g. be performed by fitting a polynomial surface to the observed points. If the trend is not constant it has to be removed before proceeding with the next step in the workflow. The reason for removing the trend is that kriging requires that the entity has the stationarity property (which requires that the expectation value is constant). The trend is removed by subtracting the trend surface from the original values.

3) Compute the semivariance between point pairs as a function of the distance between the points. This procedure starts with computing the distance between each pair of observation points. Then the point pairs are grouped according to the distances between the points; it could for example be one group for point pairs with a distance of 0-1 km, one group for point pairs with a distance of 1-2 km, etc. Compute the semivariance for each group separately by applying Equation 3.20. The result of these computations is plotted in a graph (Figure 4.3).

4) A curve is fitted to the computed semivariances (Figure 4.3); this curve could be either a piecewise linear function or a smooth mathematical function. This curve is called a semivariogram. Ideally a semivariogram looks as follows. It starts with a rather low value (equal to the semivariance for points that lie very close together); this value is denoted the nugget. The semivariance then increases with longer distances between the points in the point pairs. When the distance is equal to the range, the semivariogram has reached its maximum value; beyond this distance the semivariance is a constant value, denoted the sill. Far from all practical examples have a semivariogram with this ideal look. Minor deviations from the ideal picture are not a major problem, but the deviations should not be too large. For example, in Figure 3.7 some semivariances are plotted as a function of distance. If a semivariogram were fitted to these values, an increasing linear function would provide the best fit. This could be interpreted as if the data material in Table 3.4 and Figure 3.6 covers such a small area that the longest distance is shorter than the range. If this happens it is not a problem to apply kriging interpolation. But if the deviations from the ideal look of the semivariogram are too large, then kriging interpolation should not be applied. This is further discussed in Section 6. A code sketch illustrating how a semivariogram model can be fitted to computed semivariances follows below.

Figure 4.3: A graph that shows the estimated semivariance (\hat{\gamma}) as a function of distance (h). Each dot in the graph represents a computation of the semivariance (using Equation 3.20) for a group of point pairs. A curve - the semivariogram - is fitted to the points. The semivariogram is characterised by its sill (s), range (r), and nugget (n).
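As an illustration of step 4, the sketch below fits a spherical semivariogram model (one common choice of smooth function; both the model and the empirical values are our own assumptions, not data from this document) to a set of computed semivariances using scipy.

```python
import numpy as np
from scipy.optimize import curve_fit

def spherical(h, nugget, sill, rng):
    """Spherical semivariogram model: rises from the nugget and levels out at the sill."""
    h = np.asarray(h, dtype=float)
    gamma = nugget + (sill - nugget) * (1.5 * h / rng - 0.5 * (h / rng) ** 3)
    return np.where(h < rng, gamma, sill)

# Hypothetical empirical semivariances (distance in km, semivariance in mm^2),
# e.g. the group averages produced in step 3
h_emp = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
g_emp = np.array([150, 320, 480, 600, 660, 700, 710, 705], dtype=float)

(nugget, sill, rng), _ = curve_fit(
    spherical, h_emp, g_emp, p0=[100, 700, 3.0],
    bounds=([0, 0, 0.1], [np.inf, np.inf, np.inf]),
)
print(f"nugget = {nugget:.0f} mm^2, sill = {sill:.0f} mm^2, range = {rng:.2f} km")
```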
5) Perform the interpolation. The points used in the interpolation are the points that are closer to the interpolation point than the distance range (Figure 4.2). The weights are set depending on the distance between the observation points and the interpolation point. The correspondence between the distance and the size of the weight is determined by the semivariogram (Figure 4.3).

6) If a trend was removed in step 2 it has to be added back again after the interpolation.

To summarize, after the six steps above you have completed a kriging interpolation. It is important that you go through the steps in the correct order; not all computer programs used for kriging interpolation will guide you through the workflow.

4.3 Theory of kriging interpolation

Kriging interpolation is often called the optimal interpolation method. But it is important to note what is meant by optimal here. Kriging is optimal if:

1) Stationarity is assumed (i.e. the expectation value is the same for all points and the correlation depends only on the distance between the points).

2) Optimal means the same as minimising the variance of the estimate (cf. Figure 4.1).

A complete derivation of kriging interpolation is outside the scope of this document; instead a short justification of the kriging method is provided. Start by rewriting the general formula for interpolation (Equation 2.1) as:

z(x_p, y_p) = \sum_{i=1}^{n} \lambda_i \, z(x_i, y_i), \quad \text{where} \quad \sum_{i=1}^{n} \lambda_i = 1    (4.3)

Since the weights are scaled such that their sum equals one, it is not necessary to divide by the sum of the weights (cf. Equation 2.1). In sub-section 4.1 it was stated that the general formula for interpolation provides estimates with a correct expectation value; this holds true also for Equation 4.3. The other requirement of the interpolation method - minimising the variance - is the sole basis for setting the weights in kriging. The weights are derived from the following optimisation problem:

V\left( Z(x_p, y_p) - \sum_{i=1}^{n} \lambda_i \, Z(x_i, y_i) \right) \text{ is minimised under the constraint that } \sum_{i=1}^{n} \lambda_i = 1    (4.4)

At first sight this optimisation problem seems impossible to solve. How could we optimise the weights using the sought random value Z(x_p, y_p) in the optimisation expression? But it is not the actual value of Z(x_p, y_p) that is used in the derivations; it is rather the correlations between the value Z(x_p, y_p) and the values at the observation points Z(x_i, y_i), and these correlations are given by the semivariogram.

By solving the optimisation problem in Equation 4.4 we obtain the following weights for the kriging interpolation:

\boldsymbol{\lambda} = \mathbf{G}^{-1} \boldsymbol{\gamma}    (4.5)

where

\boldsymbol{\lambda} = [\lambda_1, \lambda_2, ..., \lambda_n]^T is a vector of the weights,

\mathbf{G} = \begin{pmatrix} \gamma_{1,1} & \gamma_{1,2} & ... & \gamma_{1,n} \\ \gamma_{2,1} & \gamma_{2,2} & ... & \gamma_{2,n} \\ ... & ... & ... & ... \\ \gamma_{n,1} & \gamma_{n,2} & ... & \gamma_{n,n} \end{pmatrix} is a matrix constructed from the semivariogram, where e.g. \gamma_{1,2} is the semivariogram value for the distance between observation point 1 and observation point 2,

\mathbf{G}^{-1} is the inverse of the matrix \mathbf{G}, and

\boldsymbol{\gamma} = [\gamma_{1,p}, \gamma_{2,p}, ..., \gamma_{n,p}]^T is a vector that contains values from the semivariogram for the distances between the interpolation point (p) and the n observation points.

As seen from Equation 4.5, it is quite complicated to compute the weights for the kriging interpolation. Luckily, there are several computer programs that perform these computations; i.e., it is not necessary to understand all the pertinent details. But it is important to understand where kriging gets its weights from. From Equation 4.5 we can conclude that the weights are based only on the semivariogram. This implies that before we perform the actual kriging interpolation (using Equation 4.3) it is necessary to derive a semivariogram for the entity.
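To make Equation 4.5 concrete, here is a minimal ordinary-kriging sketch in Python. In the usual textbook formulation the system is augmented with a Lagrange multiplier (an extra row and column of ones) so that the constraint from Equation 4.4, that the weights sum to one, is enforced; Equation 4.5 shows the core of that system. The observation points are reused from Table 2.1, while the spherical semivariogram and its parameters are assumed purely for illustration.

```python
import numpy as np

def semivariogram(h, nugget=0.0, sill=700.0, rng=3.0):
    """Assumed spherical semivariogram model (mm^2, distances in km)."""
    h = np.asarray(h, dtype=float)
    g = nugget + (sill - nugget) * (1.5 * h / rng - 0.5 * (h / rng) ** 3)
    return np.where(h < rng, g, sill)

# Observation points (x, y) and values from Table 2.1, and the interpolation point p
obs_xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
obs_z = np.array([400.0, 500.0, 600.0, 800.0])
p = np.array([0.2, 0.2])

n = len(obs_z)
dist_obs = np.linalg.norm(obs_xy[:, None, :] - obs_xy[None, :, :], axis=2)
dist_p = np.linalg.norm(obs_xy - p, axis=1)

# Ordinary kriging system: G (Eq. 4.5) augmented with the sum-to-one constraint
A = np.ones((n + 1, n + 1))
A[:n, :n] = semivariogram(dist_obs)
A[n, n] = 0.0
b = np.append(semivariogram(dist_p), 1.0)

weights = np.linalg.solve(A, b)[:n]   # the last unknown is the Lagrange multiplier
z_p = np.sum(weights * obs_z)
print(weights, weights.sum(), z_p)    # the weights sum to 1 by construction
```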
5. Spatial prediction using additional information

In some of the earlier examples in this document we have interpolated annual rainfall. We have also stated that there is a relationship between annual rainfall and altitude (sub-section 3.3); this relationship is also illustrated in Figure 3.3. A sound idea would be to utilise this relationship when performing the interpolation. This is the basic idea of the method we here describe: spatial prediction using additional information.

5.1 Basic theory of spatial prediction

The aim of spatial prediction using additional information is to interpolate a value for an entity from observation points. The difference from ordinary interpolation is that we use additional information. For example, to predict a value of annual rainfall we need the following data (besides the measured annual rainfall at some observation points):

data about the topography (e.g. a digital elevation model - DEM), and
a relationship between annual rainfall and topography.

To perform spatial prediction of annual rainfall we need to define the relationship between annual rainfall and topography. If this relationship is not known for the study area it has to be estimated by empirical methods. The most common method is linear regression. In linear regression the following model is used:

y = \alpha + \beta x    (5.1)

where x is the independent parameter, y is the dependent parameter, and \alpha and \beta are the regression parameters. For the example of annual rainfall and topography we have the following model:

annualRainfall = \alpha + \beta \cdot altitude    (5.2)

The relationship between the annual rainfall and the topography is described by estimating values of the regression parameters \alpha and \beta. The regression parameters are estimated from measured annual rainfall and altitude at the observation points (cf. Table 5.1 and Figure 5.1). We do not state the formulas for estimating these parameters here; they can be found in a standard textbook in statistics. Another approach is to use a statistical program (e.g. SPSS) or spreadsheet software (e.g. Excel) where these formulas are implemented.

Table 5.1: Altitude and annual rainfall at the observation points.

Altitude (m)   Annual rainfall (mm)
35             610
130            760
50             600
55             630
70             670
75             640
90             705
180            815
150            780

Figure 5.1: The values of the observation points in Table 5.1. The line in the graph is called the regression line. \alpha is equal to the y-intercept of the regression line, and \beta equals the slope.

Finally, some words of caution are required. It is important to use the regression model only within the proper range. In Figure 5.1 we see that the regression line is determined by altitude values between 40 and 180 m. This implies that the regression is only valid within this range.
To use values outside this range implies extrapolation; this is not reliable and should generally be avoided. It is also necessary to realize that autocorrelated data may violate one of the assumptions behind regression analysis - that of independent observations. If autocorrelation among the observations is present, one must be careful when interpreting the coefficient of determination and the significance statistics.

5.2 The workflow of spatial prediction

In this sub-section we describe the steps required in spatial prediction. We illustrate the methodology with the example given in Table 5.1 and Figure 5.1, where the aim is to predict annual rainfall with topography as the additional information. To perform the spatial prediction the following steps are required (a code sketch of the computations follows below):

1) Establish an empirical relationship between annual rainfall and topography (Equation 5.2). To do this, the data from the observation points in Table 5.1 are used. In this example we obtain:

α = 545 mm
β = 1.56 mm/m

2) Estimate the annual rainfall for the new point p. This point lies 89 m above sea level:

annualRainfall_p = 545 [mm] + 1.56 [mm/m] * 89 [m] = 684 mm

In the example above we have only estimated the annual rainfall for one point, but the same methodology could be used to estimate annual rainfall for many points. If, for example, we have a digital elevation model (DEM) we can predict the annual rainfall for a whole area. But, again, one should be a bit cautious. Above we stated that the regression equation is only valid for the range in altitude from which it was derived. This is also true across a geographical area. That is, you should only predict annual rainfall in the same area where the observation points lie. Predicting values outside this area implies extrapolation and should be avoided.
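The regression computations in steps 1 and 2 can be reproduced with numpy's polyfit. The sketch below uses the data in Table 5.1; the small elevation grid at the end is a made-up illustration of how the same regression line can be applied to a DEM.

```python
import numpy as np

# Altitude (m) and annual rainfall (mm) at the observation points (Table 5.1)
altitude = np.array([35, 130, 50, 55, 70, 75, 90, 180, 150], dtype=float)
rainfall = np.array([610, 760, 600, 630, 670, 640, 705, 815, 780], dtype=float)

# Step 1: least-squares estimates of the regression parameters in Equation 5.2
beta, alpha = np.polyfit(altitude, rainfall, deg=1)
print(alpha, beta)             # -> ~545 mm and ~1.56 mm/m

# Step 2: predict annual rainfall at a point lying 89 m above sea level
print(alpha + beta * 89.0)     # -> ~684 mm

# With a DEM the same line predicts rainfall for a whole raster, e.g.:
dem = np.array([[50.0, 80.0], [120.0, 160.0]])   # hypothetical 2 x 2 elevation grid
print(alpha + beta * dem)
```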
6. Selecting interpolation methods

Imagine that you are working in a project and you need to interpolate data. How would you proceed? Which interpolation method should you select? The aim of this section is to provide some guidelines on how to select a proper interpolation method.

An interpolation process should always start with an investigation of the entity. In our discussion here we base this investigation on the statistical model in sub-section 3.2:

z(x, y) = m(x, y) + \varepsilon'(x, y) + \varepsilon''    (6.1)

The first thing to find out is whether there is a trend in the data (i.e., whether m(x, y) is not a constant value). This could be investigated by creating a thematic map based on the observation points or by fitting a polynomial surface to the observation points. This provides you with a general feeling for the entity values in the region.

The second thing to find out is the correlation between entity values, i.e., the spatial autocorrelation. The spatial autocorrelation is related to the components \varepsilon'(x, y) and \varepsilon'' in Equation 6.1 such that:

\varepsilon'(x, y) << \varepsilon''  gives small spatial autocorrelation, and
\varepsilon'(x, y) >> \varepsilon''  gives large spatial autocorrelation.

A first investigation of the spatial autocorrelation could be to study a map of the observation data. If high (and low) entity values tend to be grouped in clusters, it would seem as if the spatial autocorrelation is high. On the other hand, if the entity values look purely random, the spatial autocorrelation is probably low. This should be further investigated by computing semivariances and plotting them in a diagram (remember to remove the trend before computing the semivariances, cf. sub-section 4.2).

A third thing to investigate is whether any additional information for the interpolation is available. This requires good knowledge of the entity to interpolate. E.g., if we are to interpolate annual rainfall we (hopefully) know that annual rainfall is related to topography. Then we could create a map where the distribution of the entity and the additional data are visualised together. For example, in Figure 3.3 we can clearly see that there is a relationship between annual rainfall and topography. If we have found a possible relationship we can test it statistically. A way forward here is to use linear regression (Equation 5.1). Start by computing the regression parameters \alpha and \beta from your data material (at the observation points). If there is a linear relationship between the parameters, the value of the regression parameter \beta is significantly different from zero (which is the same as the regression line in Figure 5.1 not being horizontal).

After these three investigations it is time to decide which interpolation method to use. A guideline for this selection is illustrated in Figure 6.1; more information about the different choices is provided in the remaining sub-sections. You should interpret the cases very small spatial autocorrelation and very large spatial autocorrelation as ideal cases. These cases seldom occur in reality in their pure form. But if you have a practical case that is somewhat similar to these ideal cases, they should give you some clue of how you could solve your interpolation problem.

Figure 6.1: A guideline for the selection of an interpolation method. The decision tree asks, in turn: (1) Is additional information available that has a large correlation with the entity? If yes, use spatial prediction with additional information (sub-section 6.1). (2) Does the entity have no or a very small trend, no or very small spatial autocorrelation, and large local variations? If yes, use the mean value with many observation points (sub-section 6.2). (3) Does the entity have very large spatial autocorrelation and small local variations? If yes, nearest neighbour or inverse distance weighting with a large k can model the data (sub-section 6.3). (4) Does the entity have a very large trend and comparatively small local variations and spatial autocorrelation? If yes, use some type of trend surface analysis, e.g. using polynomials or splines (this type of interpolation is not described in this document). Otherwise, in the general case, use kriging or inverse distance weighting with a normal k, using all points closer than the distance range (sub-section 6.4).

6.1 Additional information available

If you have additional information available with a high correlation to the entity, the method spatial prediction using additional information may be a good choice. The reason is that by using the additional information you can model a large part of the variation in m(x, y) and/or \varepsilon'(x, y). But, of course, this is only true if there is a strong correlation between the sought entity and the additional information, so it is important to test this correlation (e.g. by a statistical hypothesis test of the regression parameter \beta).
6.2 Very small spatial autocorrelation

The case that we denote very small spatial autocorrelation is characterised by:

a) Very small spatial autocorrelation (i.e., ε'(x, y) is zero or close to zero).
b) The trend is zero or small (i.e., m(x, y) is almost constant).

For this case the semivariogram looks as in Figure 6.2 and the point value distribution is as in Figure 6.3.

To illustrate the case very small spatial autocorrelation we can use the statistical model in sub-section 3.3. In this model of the annual rainfall the component ε'(x, y) was completely dependent on the altitude of the station. If this component is zero, it implies that we have a flat landscape where the annual rainfall depends only on the climate zone (which determines the trend m(x, y)) and on the local variations (which determine ε''). Normally, it is assumed that the local variations can be modelled with the same distribution for all observation points (V(ε'') = σ²).

Figure 6.2: Semivariogram (semivariance γ(h) against distance h) for the case very small spatial autocorrelation.

Figure 6.3: Interpolation of a point p in the case very small spatial autocorrelation. As can be seen, all points are independent and have the same expectation value (= m(x, y)).

For the case very small spatial autocorrelation, the best interpolation method is the mean value of all observation points. The motivation is that this is the interpolation method that minimizes the variance of the interpolated value. We do not provide a complete proof of this, but just a hint for the mathematically inclined reader (this hint is difficult enough; if you do not understand the details you should not worry). The variance of the interpolated value, V(z(xp, yp)), is (where the local variations ε'' have variance σ²):

V(z(xp, yp)) = V( (1/n) Σ z(xi, yi) ) = (1/n²) Σ V(z(xi, yi)) = (1/n²) · n · σ² = σ²/n   (6.2)

(both sums are taken over i = 1, …, n). As can be seen, the variance of the interpolated value is the variance of each individual observation divided by the number of observations. This holds for the mean value, i.e., when all the points have the same weight. It can be shown that any other choice of weights gives a higher variance for the interpolated value.

The discussion above is quite abstract and it is, perhaps, helpful to describe it using the example in Figure 6.3. The ideal interpolated value for point p is m(x, y). However, this value is unknown; the only known values are those measured at the observation points. These values are all random values with the expectation value m(x, y) and the variance σ². The best estimate for point p is then the mean value of all the observation points.

What can we learn from this? The main point is that if we have independent observations (which are all random values with the same expectation value and variance), then the best estimate is obtained by using the mean value. This rule does not hold if the observations are dependent on each other, i.e., if they are spatially autocorrelated.

6.3 Very large spatial autocorrelation

The case that we denote very large spatial autocorrelation is characterised by:

a) Very large to extreme spatial autocorrelation (i.e., ε'(x, y) is large).
b) No or small local variations (ε'' almost zero).

The main characteristic of this case is that the local variations are much smaller than the spatial autocorrelation term.
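Before looking at the semivariogram for this case, a small simulation sketch may help to contrast the two ideal cases: for independent observations the mean value has the smallest variance (σ²/n, cf. Equation 6.2), whereas for a strongly autocorrelated entity the nearest observation point is a much better estimate. The sinusoidal form chosen for ε'(x), the noise level and all other numbers below are arbitrary assumptions made only for this illustration.

# Minimal simulation sketch contrasting the two ideal cases (all values are arbitrary).
import numpy as np

def eps1(x, phase, amplitude=2.0):
    """A smooth, strongly autocorrelated component (arbitrary sinusoidal choice)."""
    return amplitude * np.sin(2.0 * np.pi * x + phase)

rng = np.random.default_rng(1)
n, trials, sigma, m = 10, 5000, 1.0, 5.0       # observations, trials, local std, constant trend
x_obs = np.linspace(0.0, 1.0, n)               # observation locations along a profile
x_p = 0.37                                     # interpolation point
nearest = int(np.argmin(np.abs(x_obs - x_p)))  # index of the nearest observation point

mse = {"mean_A": 0.0, "nn_A": 0.0, "mean_B": 0.0, "nn_B": 0.0}
for _ in range(trials):
    # Case A (very small spatial autocorrelation): z_i = m + eps'', independent, variance sigma^2.
    zA = m + rng.normal(0.0, sigma, n)
    mse["mean_A"] += (zA.mean() - m) ** 2      # the true value at p is m
    mse["nn_A"] += (zA[nearest] - m) ** 2
    # Case B (very large spatial autocorrelation): z(x) = m + eps'(x), no local variations.
    phase = rng.uniform(0.0, 2.0 * np.pi)
    zB = m + eps1(x_obs, phase)
    true_p = m + eps1(x_p, phase)
    mse["mean_B"] += (zB.mean() - true_p) ** 2
    mse["nn_B"] += (zB[nearest] - true_p) ** 2

for key in mse:
    print(key, round(mse[key] / trials, 3))
# Expected pattern: mean_A is about sigma^2/n and nn_A about sigma^2, while nn_B is much
# smaller than mean_B, i.e. the mean wins in case A and the nearest neighbour wins in case B.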
For this case the semivariogram looks as in Figure 6.4 and the point value distribution is as in Figure 6.5. To illustrate the case very large spatial autocorrelation we can use the statistical model in sub-section 3.3. ε'' = 0 would imply that there are no local variations of the annual rainfall; hence, the annual rainfall is completely determined by the climate zone and the altitude.

Figure 6.4: Semivariogram (semivariance γ(h) against distance h) for the case very large spatial autocorrelation.

The interpolation problem for the case very large spatial autocorrelation is totally different from that of the previous case. There, the main problem was to reduce the large local variations in the interpolation; here, the main problem is instead to capture the undulations of the entity. Figure 6.5 illustrates the distribution of an entity with a strong autocorrelation. As seen from the figure, a good estimate of point p is given by using the nearest observation point, i.e., nearest neighbour interpolation.

Why is the nearest neighbour a good selection? The reason is, again, that this interpolation method provides an estimate with low variance. We can state the following: if the local variations are very small in comparison to the autocorrelation for all points (ε'(x, y) >> ε''), two neighbouring points are strongly dependent on each other (the covariance between the points is high); therefore, nearest neighbour interpolation is a good interpolation method from a statistical point of view. The same argument holds if the trend undulates dramatically and this undulation is much larger than the local variations.

This document mainly concerns the statistical aspects of selecting interpolation methods. One should be aware that there are other aspects as well. One example is that if you interpolate a surface you often want a smooth surface (e.g. for visualisation purposes). In such a case the nearest neighbour method is not a good choice (even though it is a statistically good choice), since it produces a discontinuous surface.

Figure 6.5: Interpolation of a point p in the case very large spatial autocorrelation. In this case ε'(x, y) >> ε'' and m(x, y) is constant. It could also be that m(x, y) is undulating.

6.4 General case

In most real-life situations, neither ε'(x, y) nor ε'' is close to zero and we do not have access to additional information. This implies that we have the following situation:

* The closest observation points have the highest dependency with the interpolation point, but points a bit further away are also correlated to the interpolation point.
* All points have spatially uncorrelated local variations (ε'').
* No additional information is available.

For this case the semivariogram looks as in Figure 6.6 and the point value distribution is as in Figure 6.7. The first point suggests that the closest points are the most important. But, since there are local variations, we should use many observation points in the interpolation. The best interpolation method here is to assign higher weights to the closest points and lower weights to points further away. This could be achieved by using kriging or inverse distance weighting with a normal value of k (2-3).
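As an illustration of this guideline, the sketch below implements inverse distance weighting with an exponent k and uses only the observation points that lie within the distance range. The coordinates, the entity values and the range of 3 km are made-up assumptions, and the function idw is a simple stand-alone sketch rather than the routine of any particular GIS package.

# Minimal sketch: inverse distance weighting restricted to points within the distance range.
import numpy as np

def idw(xs, ys, zs, x_p, y_p, k=2.0, max_dist=None):
    """Inverse distance weighting at (x_p, y_p), optionally using only points within max_dist."""
    d = np.hypot(xs - x_p, ys - y_p)             # distances to the observation points
    z = zs
    if max_dist is not None:                     # keep only points closer than the distance range
        mask = d <= max_dist
        d, z = d[mask], zs[mask]
    if np.any(d == 0.0):                         # interpolation point coincides with an observation
        return float(z[d == 0.0][0])
    w = 1.0 / d ** k                             # inverse distance weights
    return float(np.sum(w * z) / np.sum(w))

# Made-up observation points (coordinates in km) and entity values
xs = np.array([0.0, 1.0, 2.0, 3.0, 6.0])
ys = np.array([0.0, 1.5, 0.5, 2.0, 5.0])
zs = np.array([600.0, 620.0, 640.0, 660.0, 710.0])

# k = 2 is a "normal" exponent; 3.0 km is an assumed semivariogram range
print(idw(xs, ys, zs, x_p=1.5, y_p=1.0, k=2.0, max_dist=3.0))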
We have previously described how to apply kriging interpolation (sub-section 4.2) and inverse distance weighting (sub-section 2.5) and will not describe these methods any further here.

Which observation points should be used in the interpolation? Again, the main aim is to select a proper number of points so as to minimise the variance of the estimate (cf. sub-section 4.1). To minimise the variance we should only use points that are correlated to the interpolation point, and these are the points that are closer than the distance range from the interpolation point (cf. Figure 6.6).

Figure 6.6: Semivariogram (semivariance γ(h) against distance h) for the general case; the distance range is marked on the distance axis.

Figure 6.7: Interpolation of a point p in the general case. Here both ε'(x, y) and ε'' contribute to the deviations from m(x, y).

Kriging is an optimal interpolation method when there is no trend and the semivariogram is known. It selects the best weights for the surrounding points, and it also considers the point density and the spatial distribution of the points. Thereby it reduces some unwanted effects of inverse distance weighting, particularly the creation of local minima and maxima in the interpolated surface model (the "bull's-eye effect"). Another advantage of kriging is that it produces an interpolation error for each interpolated cell. These errors can be presented in a map and will show areas of lower and higher fidelity of the interpolated surface. Many variations of kriging exist, for example block kriging, which generates smoothed output; co-kriging, which incorporates correlated information (such as elevation in the case of interpolating rainfall); universal kriging, which incorporates information about linear trends; and indicator kriging, which allows interpolation of data that are not normally distributed.

Acknowledgements

Thanks to Lars Eklundh for corrections and constructive comments. Thanks to Maj-Lena Lindersson for help with the climatology examples. Thanks to David Tenenbaum for improving the language.