Geostatistics Interpolation

Why:
None of the interpolation methods discussed so far provides a direct estimate of the quality of the prediction, in the form of an estimation variance for the predicted value at a given location. With all of them, the only way to judge the goodness of a prediction is to compute estimates at a set of extra checkpoints that were not used in the original interpolation. A further objection is that there is no a priori way of knowing whether the best values have been chosen for the weighting parameters, or whether the size of the search neighborhood is appropriate. Moreover, no method studied so far provides sensible information on (Davis, 1989):
- The number of points needed to compute the local average
- The size, orientation, and shape of the neighborhood from which those points are drawn
- Whether there are better ways to estimate the interpolation weights than as a simple function of distance
- The errors associated with the interpolated values

Solution: optimal methods of interpolation based on geostatistics.

Naser El-Sheimy

Why Geostatistics

[Figure: known sample points 1 to 7 scattered around the unknown point p]

Consider the following problem of local estimation: estimating the elevation of point (p) using the elevations of some nearby points. It is reasonable, as discussed before, to devise an interpolation procedure that weights point (1), the nearest point, more heavily than any of the more distant points. A whole range of methods can be produced to decide on the weight given to each sample, mostly based on the distance (and possibly direction) of the sample from the point being estimated.

Problems with this approach:
1. Which weighting factors are the best to choose (e.g. $1/d$ or $1/d^2$)?
2. How far do we go in including points?
3. How reliable is the estimated elevation?
4. Can we expect the same method to be equally valid on all types of terrain?
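The first problem in the list can be made concrete with a small sketch. It estimates the elevation at p by inverse-distance weighting, once with weights 1/d and once with 1/d^2, and the two answers differ with nothing in the method to say which is better. The function name `idw_estimate`, the coordinates, and the elevations are all hypothetical, chosen only for illustration.

```python
import math

def idw_estimate(samples, p, power):
    """Inverse-distance-weighted estimate at p from (x, y, z) samples."""
    num = 0.0
    den = 0.0
    for x, y, z in samples:
        d = math.hypot(x - p[0], y - p[1])
        if d == 0.0:
            return z  # p coincides with a sample point
        w = 1.0 / d ** power  # weight 1/d, 1/d^2, ...
        num += w * z
        den += w
    return num / den  # dividing by the weight sum makes the weights sum to 1

# Hypothetical elevations (m) at made-up coordinates (m)
samples = [(1.0, 0.0, 100.0), (0.0, 2.0, 110.0),
           (-3.0, 0.0, 130.0), (0.0, -4.0, 90.0)]
p = (0.0, 0.0)

z_power1 = idw_estimate(samples, p, 1)  # weights proportional to 1/d
z_power2 = idw_estimate(samples, p, 2)  # weights proportional to 1/d^2
print(z_power1, z_power2)
```

The two estimates are both plausible weighted averages of the same data, which is exactly why a criterion for choosing the weights is needed.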
On the other hand, the idea of weighting samples by some measure of their similarity to what is being estimated is intuitively appealing. 'Similarity' can be measured by the covariance between the samples or by their correlation (but what about stationarity?). Let us look instead at the difference in elevation between samples, describe these differences as a function of the distance between the samples, and use that to describe the similarity between them. This is the concept of the variogram and of geostatistics.

Introduction: Geostatistics

Geostatistics is a branch of applied statistics developed by Georges Matheron of the Centre de Morphologie Mathématique in Fontainebleau, France (1960). A unique aspect of geostatistics is the use of regionalized variables, which fall between random variables and completely deterministic variables. Typical regionalized variables are functions describing quantities that have a geographic distribution (e.g. the elevation of the ground surface). Unlike random variables, regionalized variables exhibit spatial continuity; however, their variation is so complex that it cannot be described by any deterministic function. Furthermore, although regionalized variables are spatially continuous, it is not always possible to sample every location. Unknown values must therefore be estimated from data taken at the specific locations that can be sampled. The size, shape, orientation, and spatial arrangement of these samples constitute the support of the regionalized variable and influence the ability to predict the unknown values.

In brief, geostatistical analysis usually has the following steps:
1. Estimating the form of the regionalized variable (i.e. the spatial similarity between the samples)
2. Estimating the surface using point or universal kriging
Hypothesis of Variograms

Regionalized variable theory uses the semivariance to express the degree of relationship (similarity) between points on a surface.

[Figure: known sample points 1 to 7 scattered around the unknown point p]

In the figure, it seems sound to expect that the elevation of point (7) will be 'very different' from that at point (p), whilst point (1) will have a value 'not very different' from that at point (p).

Let us assume that the difference in elevation between two points depends only on the distance between them and their relative orientation (e.g. S-N, NW, etc.). Suppose we take a number of point pairs that are a distance (d) apart and calculate the difference in elevation within each pair, then repeat with distance (2d), and so on. We can then build up a histogram of the differences in elevation for each distance and see how it relates to the distance between the point pairs. In this way the implicit assumption of the distance weighting technique is put into a statistical form.

Investigating histograms is a very tedious job, so let us resort to the usual trick of summarizing each histogram by its parameters: the mean and the variance (of the differences). The quantity of interest in this case is the variance of the differences in elevation between samples. Half of this variance is called the semivariance, and as a function of the lag it is called the semivariogram or variogram (because it varies with the distance and the direction of the lag 'd').

Hypothesis of Variograms (Cont.)

The variogram is a measure of how quickly things change on average. The underlying principle is that, on average, two observations close together are more similar than two observations farther apart. Because the processes underlying the data often have preferred orientations, values may change more quickly in one direction than in another. As such, the variogram is also a function of direction.

Having defined a semivariogram or variogram, what sort of behavior do we expect it to have?
- The semivariance has the same units as the variance of the elevation differences (i.e. m^2, ft^2, etc.).
- The magnitude of the semivariance between points depends on the distance and orientation between them: a smaller distance yields a smaller semivariance and a larger distance a larger semivariance.
- The semivariance is 0 at a distance d = 0. However, as points are compared to increasingly distant points, the semivariance increases.
- At some distance, called the range or span of the regionalized variable, the semivariance stops increasing and levels off at the sill. The range is the greatest distance over which the value at a point on the surface is related to the value at another point (the neighborhood).

[Figure: Semivariogram models (ideal model). Linear, spherical, and exponential curves of semivariance against lag d, with the sill and the range (span) marked]

Calculating the Semivariance

1. Regularly Spaced Points (Omnidirectional)

Consider regularly spaced points a distance (d) apart. The semivariance can be estimated for lags that are multiples of (d):

\[ \gamma(d) = \frac{1}{2N(d)} \sum_{i=1}^{N(d)} (z_i - z_{i+h})^2 \]

where
- $z_i$ is the measurement of the regionalized variable taken at location i
- $z_{i+h}$ is another measurement taken h intervals away
- $N(d)$ is the number of point pairs separated by the lag, equal to the number of points minus the number of intervals h in the lag

[Table: worked example. Column [2] holds the calculations for lag (d), column [3] for lag 2d, and column [4] for lag 3d]

2. Irregularly Spaced Data

The difference from the omnidirectional variogram is that the lag is a vector rather than a scalar. For example, if d = {d1, d2}, then each pair of compared samples should be separated by d1 m in the E-W direction and by d2 m in the S-N direction.

[Figure: lag vector with components d1 (E-W) and d2 (S-N)]

Irregularly Spaced Data (Cont.)

In practice, it is difficult to find enough sample points that are separated by exactly the same lag vector [d]. The set of all possible lag vectors is therefore usually partitioned into classes.

Directional variograms: vectors that end in the same cell are grouped into one class, and the variogram value is estimated separately for each class.
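The grouping of point pairs into lag classes can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name `empirical_semivariogram`, its parameters, the choice of distance classes centred on multiples of the lag width, and the sample values in `row` are all assumptions made for the example. Direction classes are approximated by an azimuth and an angular tolerance.

```python
import math

def empirical_semivariogram(points, lag_width, n_lags, azimuth=None, tol=22.5):
    """Semivariance per lag class from (x, y, z) points.

    Class k is centred on (k + 1) * lag_width. If azimuth (degrees clockwise
    from north) is given, only pairs within +/- tol degrees of it are used.
    Returns (mean pair distance, semivariance, pair count) for each class
    that contains at least one pair.
    """
    sq_sums = [0.0] * n_lags
    dist_sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            dx, dy = xj - xi, yj - yi
            d = math.hypot(dx, dy)
            k = int(d / lag_width + 0.5) - 1  # nearest multiple of lag_width
            if k < 0 or k >= n_lags:
                continue
            if azimuth is not None:
                # Fold directions to [0, 180): a pair has no preferred end
                ang = math.degrees(math.atan2(dx, dy)) % 180.0
                diff = abs(ang - azimuth % 180.0)
                if min(diff, 180.0 - diff) > tol:
                    continue
            sq_sums[k] += (zi - zj) ** 2
            dist_sums[k] += d
            counts[k] += 1
    return [(dist_sums[k] / counts[k], sq_sums[k] / (2 * counts[k]), counts[k])
            for k in range(n_lags) if counts[k] > 0]

# Hypothetical row of five regularly spaced points (spacing 1 along E-W)
row = [(0.0, 0.0, 1.0), (1.0, 0.0, 3.0), (2.0, 0.0, 2.0),
       (3.0, 0.0, 5.0), (4.0, 0.0, 4.0)]
gamma_all = empirical_semivariogram(row, lag_width=1.0, n_lags=3)
gamma_ew = empirical_semivariogram(row, 1.0, 3, azimuth=90.0)
gamma_ns = empirical_semivariogram(row, 1.0, 3, azimuth=0.0)
for dist, gam, n_pairs in gamma_all:
    print(dist, gam, n_pairs)
```

On the regular row, the pair counts come out as 4, 3, and 2, which is the number of points minus the number of intervals in the lag. All pairs lie east-west, so the N-S class is empty.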
The number of directions may vary (4, 8, 16, etc.). The variogram is estimated using the same equation as the omnidirectional one:

\[ \gamma(d) = \frac{1}{2N(d)} \sum_{i=1}^{N(d)} (z_i - z_{i+h})^2 \]

The only difference is that the points used in the equation are those located at the tail and head of the vector [d].

Understanding Semivariances

The semivariance is not only equal to half the average of the squared differences between pairs of points spaced a distance d apart; it is also equal to half the variance of these differences. That is, the semivariance can also be defined as:

\[ \gamma(d) = \frac{1}{2N(d)} \left[ \sum_{i=1}^{N(d)} (Z_i - Z_{i+h})^2 - \frac{\left( \sum_{i=1}^{N(d)} (Z_i - Z_{i+h}) \right)^2}{N(d)} \right] \]

Note that the mean of the regionalized variable $Z_i$ is also the mean of the regionalized variable $Z_{i+h}$, because these are the same observations, merely taken in a different order. That is:

\[ \frac{\sum Z_i}{N(d)} = \frac{\sum Z_{i+h}}{N(d)} \]

Therefore, the mean of their differences must be zero:

\[ \frac{\sum Z_i}{N(d)} - \frac{\sum Z_{i+h}}{N(d)} = \frac{\sum (Z_i - Z_{i+h})}{N(d)} = 0 \]

Note that this is strictly true only if the regionalized variable is stationary. If the data are not stationary, the mean of the sequence changes with d and the equation for $\gamma(d)$ must be modified.

[Figure: semivariance and autocorrelation plotted against distance d]

Estimation of the Surface Using Point Kriging

Kriging is named after a South African engineer, D. G. Krige. Kriging estimates are best linear unbiased estimates of the surface at the specified locations, provided the surface is stationary and the correct form of the variogram has been determined.

Principle (the use of variograms in kriging):

Given: three points of known elevation
Required: estimate the elevation of point (p), call it $Z_e(p)$, using a weighted average of the known samples:

\[ Z_e(p) = \sum_i W_i Z(p_i) \]

[Figure: known points 1, 2, and 3 around the unknown point p]

This is called a linear estimator, and if the weights sum to 1 we call it an unbiased linear estimator (remember that if all weights are the same we end up with the mean of all samples).

For simplicity, let us consider the following situation: $Z_e(p) = Z_1$, that is, all weights except $W_1$ equal zero.
This estimated value will most likely differ from the actual value at point p, $Z_a(p)$, and this difference is called the estimation error:

\[ \varepsilon_p = Z_e(p) - Z_a(p) = Z_1 - Z_a(p) \]

The variance of the estimation error is the average squared error,

\[ \sigma_\varepsilon^2 = \overline{(Z_1 - Z_a(p))^2} \]

which has the same form as the semivariance at distance $d_{1p}$ (in fact it is twice the semivariance $\gamma(d_{1p})$).

Estimation of the Surface Using Point Kriging (Cont.)

Now suppose we use the three known elevations $Z_1$, $Z_2$, and $Z_3$ to estimate the unknown elevation at point p, $Z_e(p)$. Three weights, $W_1$, $W_2$, and $W_3$, must be determined to make the estimate. Once the individual weights are known, the estimate is

\[ Z_e(p) = W_1 Z_1 + W_2 Z_2 + W_3 Z_3 \quad \text{such that} \quad W_1 + W_2 + W_3 = 1 \]

The estimate and the estimation error depend on the weights chosen. Ideally, kriging chooses the optimal weights that produce the minimum estimation error. That is, kriging minimizes the estimation variance with respect to the weights:

\[ \frac{\partial \sigma_\varepsilon^2}{\partial W_i} = 0, \quad i = 1, \ldots, n \quad (n = 3 \text{ in our example}) \]

This provides n equations in n unknowns ($W_1$, $W_2$, and $W_3$). These weights give an estimator with the minimum estimation variance. However, they will not necessarily add up to one (nothing in the above equations constrains the sum of the weights to one). We also need to satisfy the equation $\sum W_i = 1$, so we actually have (n+1) equations in (n) unknowns. To rectify this we must introduce another unknown, in the form of a Lagrange multiplier $\lambda$, to balance the system (i.e. to get n+1 equations in n+1 unknowns).

Estimation of the Surface Using Point Kriging (Cont.)
Therefore, instead of minimizing the estimation variance alone, we actually minimize

\[ \sigma_\varepsilon^2 - 2\lambda \left( \sum W_i - 1 \right) \]

with respect to $W_1, W_2, \ldots, W_n$ and $\lambda$ (the factor 2 only rescales $\lambda$ so that it enters the equations below with a unit coefficient). Differentiating with respect to $W_1$, $W_2$, $W_3$, and $\lambda$ gives the following set of equations:

\[
\begin{aligned}
W_1 \gamma(d_{11}) + W_2 \gamma(d_{12}) + W_3 \gamma(d_{13}) + \lambda &= \gamma(d_{1p}) \\
W_1 \gamma(d_{21}) + W_2 \gamma(d_{22}) + W_3 \gamma(d_{23}) + \lambda &= \gamma(d_{2p}) \\
W_1 \gamma(d_{31}) + W_2 \gamma(d_{32}) + W_3 \gamma(d_{33}) + \lambda &= \gamma(d_{3p}) \\
W_1 + W_2 + W_3 &= 1
\end{aligned}
\]

where $\gamma(d_{ij})$ is the semivariance between control points i and j, corresponding to the distance $d_{ij}$ between them. Since $d_{ij} = d_{ji}$, the left-hand matrix is symmetric, with zeros along the diagonal since the distance from a point to itself is zero. The values of the semivariances are taken from the known or estimated semivariogram.

Writing these equations in matrix form yields

\[
\begin{bmatrix}
\gamma(d_{11}) & \gamma(d_{12}) & \gamma(d_{13}) & 1 \\
\gamma(d_{21}) & \gamma(d_{22}) & \gamma(d_{23}) & 1 \\
\gamma(d_{31}) & \gamma(d_{32}) & \gamma(d_{33}) & 1 \\
1 & 1 & 1 & 0
\end{bmatrix}
\begin{bmatrix} W_1 \\ W_2 \\ W_3 \\ \lambda \end{bmatrix}
=
\begin{bmatrix} \gamma(d_{1p}) \\ \gamma(d_{2p}) \\ \gamma(d_{3p}) \\ 1 \end{bmatrix}
\]

with an estimation variance of

\[ \sigma_\varepsilon^2 = W_1 \gamma(d_{1p}) + W_2 \gamma(d_{2p}) + W_3 \gamma(d_{3p}) + \lambda \]

Universal Kriging

In punctual kriging we assumed that the regionalized variable being mapped is stationary. Unfortunately, in DTM work it may happen that the regionalized variable is not stationary, but rather exhibits changes in its average value from place to place. If we attempt to compute a semivariogram for such a variable, we will discover that it may not have the properties we have discussed. However, if we reexamine the definition in the following equation:

\[ \gamma(d) = \frac{1}{2N(d)} \left[ \sum_{i=1}^{N(d)} (Z_i - Z_{i+h})^2 - \frac{\left( \sum_{i=1}^{N(d)} (Z_i - Z_{i+h}) \right)^2}{N(d)} \right] \]

we note that it contains two parts: the first is the sum of squared differences between pairs of points, and the second is built from the average of these differences. If the regionalized variable is stationary, the second part vanishes; if it is not stationary, this average will have some nonzero value.

Therefore, the regionalized variable can be regarded as composed of two parts, the residual and the drift. The drift is the weighted average of points within the neighborhood around the unknown value.
The residual is the difference between the regionalized variable and the drift. If the drift is removed from a regionalized variable, the residual becomes a regionalized variable in itself that is stationary, which allows a semivariogram to be computed.

Universal Kriging (Cont.)

Universal kriging can thus be regarded as consisting of three steps:
1. Achieving stationarity by estimating and removing the drift
2. Kriging the stationary residuals to obtain the needed estimates
3. Combining the estimated residuals with the drift to obtain estimates of the actual surface

How it works

Method 1: Removing the drift
1. Remove the drift using one of the global interpolation methods.
2. The residuals can then be treated as a regionalized variable, so punctual kriging can be used.
3. Note that the output of the punctual kriging is residuals. To obtain the actual surface, the drift has to be added back.

Method 2: Estimating the drift simultaneously while kriging
1. Incorporate the drift into the system of simultaneous equations used to find the kriging weights, as additional constraints.
2. Solving this expanded set of equations produces a set of weights for the kriged estimate that includes the effect of the specified drift within the local neighborhood.

Universal Kriging (Cont.)

Estimating the drift requires the drift to be modeled by an arbitrary function of the coordinates of the points used in the computation, e.g. a low-order polynomial. That is, the drift D at point p might be defined as either a first-order polynomial:

\[ D_p = a_1 X_i + a_2 Y_i \]

or a second-order polynomial:

\[ D_p = a_1 X_i + a_2 Y_i + a_3 X_i Y_i + a_4 X_i^2 + a_5 Y_i^2 \]

where $X_i$, $Y_i$ are the coordinates of the ith point within the neighborhood, and the a's are the unknown drift coefficients.

The simplest example consists of kriging a point assuming the drift is linear (i.e. the number of primary unknowns is three: one elevation and two drift parameters).
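The linear drift model given above has two coefficients, so a minimum of three known points must be used in the drift estimation process, or we will run out of degrees of freedom.

The following is an example of how to estimate both the drift and the regionalized variable by universal kriging, using five known points and a linear drift:

\[
\begin{bmatrix}
\gamma(h_{11}) & \gamma(h_{12}) & \gamma(h_{13}) & \gamma(h_{14}) & \gamma(h_{15}) & 1 & X_1 & Y_1 \\
\gamma(h_{21}) & \gamma(h_{22}) & \gamma(h_{23}) & \gamma(h_{24}) & \gamma(h_{25}) & 1 & X_2 & Y_2 \\
\gamma(h_{31}) & \gamma(h_{32}) & \gamma(h_{33}) & \gamma(h_{34}) & \gamma(h_{35}) & 1 & X_3 & Y_3 \\
\gamma(h_{41}) & \gamma(h_{42}) & \gamma(h_{43}) & \gamma(h_{44}) & \gamma(h_{45}) & 1 & X_4 & Y_4 \\
\gamma(h_{51}) & \gamma(h_{52}) & \gamma(h_{53}) & \gamma(h_{54}) & \gamma(h_{55}) & 1 & X_5 & Y_5 \\
1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\
X_1 & X_2 & X_3 & X_4 & X_5 & 0 & 0 & 0 \\
Y_1 & Y_2 & Y_3 & Y_4 & Y_5 & 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix} W_1 \\ W_2 \\ W_3 \\ W_4 \\ W_5 \\ \lambda \\ \alpha_1 \\ \alpha_2 \end{bmatrix}
=
\begin{bmatrix} \gamma(h_{1p}) \\ \gamma(h_{2p}) \\ \gamma(h_{3p}) \\ \gamma(h_{4p}) \\ \gamma(h_{5p}) \\ 1 \\ X_p \\ Y_p \end{bmatrix}
\]

Here the additional unknowns $\lambda$, $\alpha_1$, and $\alpha_2$ correspond to the unbiasedness constraint and to the two linear drift terms.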
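Both kriging systems in these notes, the point kriging system and the expanded system with the extra rows 1, X, Y, can be assembled and solved with the same few lines. The following is a minimal sketch under stated assumptions, not a reference implementation: the helper names `solve` and `krige`, the linear semivariogram `gamma`, and the five sample points are hypothetical. The returned variance is the dot product of the solution vector with the right-hand side, which reduces to the point kriging estimation variance given earlier when there is no drift.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def krige(points, p, gamma, linear_drift=False):
    """Kriging at p = (x, y) from points [(x, y, z), ...].

    gamma is a semivariogram function of distance. With linear_drift=True
    the drift rows X and Y are added (universal kriging, first-order drift).
    Returns (estimate, weights, estimation variance).
    """
    n = len(points)
    extra = [[1.0] * n]  # constraint: weights sum to 1
    target = [1.0]
    if linear_drift:
        extra.append([x for x, _, _ in points])
        target.append(p[0])
        extra.append([y for _, y, _ in points])
        target.append(p[1])
    size = n + len(extra)
    a = [[0.0] * size for _ in range(size)]
    b = [0.0] * size
    for i, (xi, yi, _) in enumerate(points):
        for j, (xj, yj, _) in enumerate(points):
            a[i][j] = gamma(math.hypot(xi - xj, yi - yj))
        b[i] = gamma(math.hypot(xi - p[0], yi - p[1]))
    for k, row in enumerate(extra):  # border rows and columns
        for i in range(n):
            a[n + k][i] = a[i][n + k] = row[i]
        b[n + k] = target[k]
    sol = solve(a, b)
    weights = sol[:n]
    estimate = sum(w * z for w, (_, _, z) in zip(weights, points))
    variance = sum(s * t for s, t in zip(sol, b))
    return estimate, weights, variance

def gamma(d):
    """Hypothetical linear semivariogram (no sill), gamma(0) = 0."""
    return 0.5 * d

# Hypothetical control points (x, y, elevation)
pts = [(0.0, 0.0, 10.0), (4.0, 0.0, 14.0), (0.0, 3.0, 9.0),
       (4.0, 3.0, 15.0), (2.0, 5.0, 12.0)]
z_ok, w_ok, var_ok = krige(pts, (2.0, 1.5), gamma)
z_uk, w_uk, var_uk = krige(pts, (2.0, 1.5), gamma, linear_drift=True)
print(z_ok, var_ok)
print(z_uk, var_uk)
```

Two properties are easy to check with this sketch: kriging at a control point returns that point's elevation, and with the linear drift rows the weights reproduce any exactly planar surface.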