Robust Estimation of the Variogram
Noel Cressie & Douglas M. Hawkins
Mathematical Geology, Vol. 12, No. 2, 1980

The Problem
• Published work in geostatistics underplays the problems in estimating variograms.
• It assumes that the underlying data are normal.
• If the data are not normal, it assumes they have been transformed to normality.

The Question
• What are we trying to achieve with robust estimation of the variogram?
• We are trying to show that other estimators, whose means do not coincide with that of the conventional estimator, are not necessarily biased or undesirable.

Reality of Data
• Data are not normal.
• Data are contaminated by outliers, which results in heavy-tailed distributions.

Problem Solving Approach
• Seek estimators whose efficiency is high when the data are normal ($\epsilon = 0$).
• Seek estimators that are reasonably insensitive to departures from normality ($\epsilon > 0$).

Four approaches to estimating the variogram were delineated:
A. Define the quantities $Y_t = (Z_{t+h} - Z_t)^2$. Then $2\gamma(h) = E(Y_t)$, and the estimation of the variogram becomes a problem of estimating the expectation of the random variables $Y_t$, which, under normality of the $Z_t$, follow scaled $\chi^2$ distributions.
B. If the sequence $Z_t$ may be assumed to be stationary (or to have been transformed initially to stationarity), then $E(Z_{t+h}) = E(Z_t)$. Thus $2\gamma(h) = \mathrm{var}(Z_{t+h} - Z_t)$, and so the estimation of the variogram may be approached by estimating the variance of the symmetric random variables $Z_{t+h} - Z_t$.
C. For many purposes (including the estimation of kriging weights), it is enough to determine the set $\gamma(h)$ for all $h$ up to an unknown constant multiple. Now, provided $\mathrm{var}(Z_t)$ is finite, $2\gamma(h) = 2\,\mathrm{var}(Z_t)\,[1 - \rho(h)]$, where $\rho(h)$ is the autocorrelation at lag $h$. Thus an estimate of the set $\rho(h)$ also determines the set $\gamma(h)$ up to the unknown variance.
D. Assuming for the moment that the data form a traverse, there exists an autoregressive-moving average (ARMA) model describing them, $\phi(B) Z_t = \psi(B) a_t$, where $\phi$ and $\psi$ are power series in the backshift operator $B$ and the series $a_t$ is white noise. From $\phi$, $\psi$, and the variance of the $a_t$, one may deduce $\gamma(h)$, and any method of estimating $\phi$ and $\psi$ can yield $\gamma(h)$ up to an unknown constant multiple of the variance of the $a_t$.

Preliminary Transformation
• We have powerful tools, algorithms, and theory for robust estimation with symmetric distributions.
• Nonsymmetric distributions are not well understood.
• To help solve the problem, transform the data so that a center of symmetry can be estimated.
• The class of power transformations $Y_t = \{(Z_{t+h} - Z_t)^2\}^{\lambda}$ was chosen because it is broad and may be suitable.

Transformation
• Assume $\bar{Y}$ is normally distributed.
• The unbiased estimator for $2\gamma(h)$ is $\bar{Y}^4 / (0.457 + 0.494\,n^{-1} + 0.045\,n^{-2})$ (here $\lambda = 1/4$, so $Y_t$ is the fourth root of $(Z_{t+h} - Z_t)^2$).
• Estimators must be asymptotically normal and have high asymptotic efficiency.
• This applies to the $Y_t$ even though they are not independent.
• For problems of practical interest, the interdependence between $Y_t$ and $Y_s$ will be negligible, so there is justification in treating the $\{Y_t\}$ as if they were an independent random sample.

The Study: Testing the Theory
• To test the theory, 10 estimators were used:
– The mean of the $Y_t$ values.
– The median of the $Y_t$ values.
– The 5% trimmed mean of $Y_t$.
– The 10% trimmed mean of $Y_t$.
– The 25% trimmed mean of $Y_t$.
– The Huber M-estimator.
– The Tukey bisquare M-estimator.
– The Hampel M-estimator.
– The Andrews M-estimator.
– The conventional estimator.

The Study
• 500 traverses
• Length of 50 units
• Model used: $Z_t = 0.6\,Z_{t-1} + U_t$, with $\gamma(1)$ estimated using each estimator.
• The $U_t$ were independent, identically distributed random variables from the following distributions:
– A: N(0, 1)
– B: Standard Laplace, $f(x) = \tfrac{1}{2} \exp(-|x|)$
– C: N(0, 1), 5% contaminated with N(0, 9)
– D: N(0, 1), 10% contaminated with N(0, 9)
– E: N(0, 1), 20% contaminated with N(0, 9)
– F: N(0, 1), 5% contaminated with N(0, 100)
– G: Real data from the Hartebeestfontein gold mine

Cases
• “If one assumes the process to be normally distributed, then a normal distribution for $Z_t$ and a "classical" geostatistical estimation and kriging scheme may be deduced.”
• “If, however, the process remains normal but the $e_t$ process (white noise) is assumed to have some heavier-tailed distribution, then the $Z_t$ are not normal but exhibit occasional outliers. It is here that the use of good robust estimators of the variogram is desirable.”
• “These estimators, by reducing the sampling variability of estimates of $\gamma(h)$, can provide a more stable estimation procedure for $\gamma_w(h)$, and any downward bias will merely imply that they underestimate $\sigma_e^2$ while leaving the structure of $\gamma_w(h)$ unaffected.”
• “This determination in the nonnormal case may however have to be done in an iterative way by successively computing estimates $W_t$ and residuals $Z_t - W_t$ and using these residuals to refine our estimates of the distribution of the $e_t$, and, hence, improve our estimates $W_t$.” (page 123)

Results
• The conventional estimator is the most stable for the normal distribution.
• The conventional estimator is the least stable for distributions with a heavy tail.
• The M-estimators consistently had the smallest standard deviations for nonnormal cases.
• The median performs poorly.
• The mean of $Y_t$ performs well.
• Tukey’s bisquare M-estimator was best for the trimmed means.
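The simulation design from The Study, and the first two results above, can be illustrated with a small Monte Carlo sketch. This is not the authors' code: the function names, the burn-in length, the seed, and the reading of N(0, 9) and N(0, 100) as variances (so contaminating standard deviations of 3 and 10) are all assumptions made here. Only the conventional estimator is simulated, and its relative spread is compared between case A and case F.

```python
import numpy as np

def simulate_traverse(rng, n=50, phi=0.6, eps=0.0, scale=1.0, burn_in=100):
    """One AR(1) traverse Z_t = phi * Z_{t-1} + U_t.

    U_t is N(0, 1) with probability 1 - eps and N(0, scale**2) with
    probability eps (contaminated normal). burn_in is a hypothetical
    warm-up length used to reach approximate stationarity.
    """
    total = n + burn_in
    u = rng.standard_normal(total)
    outlier = rng.random(total) < eps
    u[outlier] *= scale  # draws from the contaminating component
    z = np.empty(total)
    z[0] = u[0]
    for t in range(1, total):
        z[t] = phi * z[t - 1] + u[t]
    return z[burn_in:]

def conventional_2gamma(z, h=1):
    """Conventional estimate of 2*gamma(h): mean squared lag-h difference."""
    d = z[h:] - z[:-h]
    return np.mean(d ** 2)

def study(eps, scale, n_traverses=500, seed=0):
    """Mean and coefficient of variation of the estimate over many traverses."""
    rng = np.random.default_rng(seed)
    est = np.array([
        conventional_2gamma(simulate_traverse(rng, eps=eps, scale=scale))
        for _ in range(n_traverses)
    ])
    return est.mean(), est.std(ddof=1) / est.mean()

mean_a, cv_a = study(eps=0.0, scale=1.0)    # case A: N(0, 1)
mean_f, cv_f = study(eps=0.05, scale=10.0)  # case F: 5% N(0, 100)
print(f"A: mean={mean_a:.3f}, cv={cv_a:.3f}")
print(f"F: mean={mean_f:.3f}, cv={cv_f:.3f}")
```

For case A the target is $2\gamma(1) = 2\sigma_U^2/(1 + 0.6) = 1.25$, which follows from relation C above with $\mathrm{var}(Z_t) = 1/(1 - 0.36)$ and $\rho(1) = 0.6$. The coefficient of variation is used rather than the raw standard deviation because contamination also changes the target value itself.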
Results (continued)
• “The M-estimators are the estimators of choice, having quite good efficiency for normal data coupled with stability for all the heavy-tailed distributions studied.”
• “Another very simple estimator, $\bar{Y}$, the mean of the $Y_t$ performs equally well for the distributions contaminated with N(0, 9) data and, which is more significant, for the real data.”
• “$\bar{Y}$ is not robust in that it goes to infinity if any $Y_t$ does, however it does seem to be robust enough to handle data which, while not normal, have outliers that deviate by not more than six or seven standard deviations.”

Conclusions
• The arithmetic mean of the fourth root of $(Z_{t+h} - Z_t)^2$ gives a reasonably robust, stable estimate of the variogram.
• The orthodox M-estimators are equally stable, and theoretically more robust.
• Trimmed means and the median do not perform well.
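The fourth-root estimator named in the Conclusions can be written in a few lines. The sketch below is an illustration under stated assumptions, not the authors' implementation: the function name `fourth_root_2gamma`, the white-noise test series, and the injected outliers (every 1000th value shifted by 50) are hypothetical choices made here. The bias-correction constants are the ones quoted on the Transformation slide, with $n$ taken to be the number of lag-$h$ pairs.

```python
import numpy as np

def fourth_root_2gamma(z, h=1):
    """Estimate 2*gamma(h) from the mean of the fourth roots of squared differences."""
    z = np.asarray(z, dtype=float)
    d = z[h:] - z[:-h]
    y = np.sqrt(np.abs(d))  # equals {(Z_{t+h} - Z_t)^2}^(1/4)
    n = y.size              # number of lag-h pairs (assumed meaning of n)
    correction = 0.457 + 0.494 / n + 0.045 / n ** 2
    return np.mean(y) ** 4 / correction

# Hypothetical check on white noise: for i.i.d. N(0, 1) data, 2*gamma(h) = 2.
rng = np.random.default_rng(1)
z = rng.standard_normal(20000)
clean_fr = fourth_root_2gamma(z)
clean_conv = np.mean(np.diff(z) ** 2)  # conventional estimate at lag 1

# Inject a few gross outliers and recompute both estimates.
z_out = z.copy()
z_out[::1000] += 50.0
out_fr = fourth_root_2gamma(z_out)
out_conv = np.mean(np.diff(z_out) ** 2)

print(f"clean:    fourth-root={clean_fr:.3f}, conventional={clean_conv:.3f}")
print(f"outliers: fourth-root={out_fr:.3f}, conventional={out_conv:.3f}")
```

In this setup the conventional estimate is inflated severalfold by the outliers, while the fourth-root estimate moves only slightly. The outliers here are deliberately extreme, so this shows the damping effect of the transformation rather than reproducing the study's contamination cases.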