Geoderma 89 Ž1999. 1–45 Geostatistics in soil science: state-of-the-art and perspectives P. Goovaerts ) Department of CiÕil and EnÕironmental Engineering, The UniÕersity of Michigan, Ann Arbor, MI 48109-2125, USA Received 4 November 1997; accepted 6 July 1998 Abstract This paper presents an overview of the most recent developments in the field of geostatistics and describes their application to soil science. Geostatistics provides descriptive tools such as semivariograms to characterize the spatial pattern of continuous and categorical soil attributes. Various interpolation Žkriging. techniques capitalize on the spatial correlation between observations to predict attribute values at unsampled locations using information related to one or several attributes. An important contribution of geostatistics is the assessment of the uncertainty about unsampled values, which usually takes the form of a map of the probability of exceeding critical values, such as regulatory thresholds in soil pollution or criteria for soil quality. This uncertainty assessment can be combined with expert knowledge for decision making such as delineation of contaminated areas where remedial measures should be taken or areas of good soil quality where specific management plans can be developed. Last, stochastic simulation allows one to generate several models Žimages. of the spatial distribution of soil attribute values, all of which are consistent with the information available. A given scenario Žremediation process, land use policy. can be applied to the set of realizations, allowing the uncertainty of the response Žremediation efficiency, soil productivity. to be assessed. q 1999 Elsevier Science B.V. All rights reserved. Keywords: geostatistics; spatial interpolation; risk assessment; decision making; stochastic simulation 1. Introduction During the last decade, the development of computational resources has fostered the use of numerical methods to process the large bodies of soil data ) Fax: q1-313-734-2275; E-mail: goovaert@engin.umich.edu 0016-7061r99r$ - see front matter q 1999 Elsevier Science B.V. All rights reserved. PII: S 0 0 1 6 - 7 0 6 1 Ž 9 8 . 0 0 0 7 8 - 0 2 P. GooÕaertsr Geoderma 89 (1999) 1–45 that are collected across the world. A key feature of soil information is that each observation relates to a particular location in space and time. Knowledge of an attribute value, say a pollutant concentration, is thus of little interest unless location or time of measurement or both are known and accounted for in the analysis. Geostatistics provides a set of statistical tools for incorporating spatial and temporal coordinates of observations in data processing. Until the late 1980s, geostatistics was essentially viewed as a means to describe spatial patterns by semivariograms and to predict the values of soil attributes at unsampled locations by kriging, e.g., see review papers by Vieira et al. Ž1983. , Trangmar et al. Ž 1985. , and Warrick et al. Ž 1986. . New tools have recently been developed to tackle advanced problems, such as the assessment of the uncertainty about soil quality or soil pollutant concentrations, the stochastic simulation of the spatial distribution of attribute values, and the modeling of space–time processes. Because of their publication in a wide variety of journals and congress proceedings, these new developments are generally barely known by soil scientists who must also struggle with different sets of notation to establish links between all these techniques. This paper aims to provide a coherent and understandable overview of the state-of-the-art in soil geostatistics, refer to recent applications of geostatistical algorithms to soil data, and point out challenges for the future. I have extracted most of the material in this paper from my recent book ŽGoovaerts, 1997a. on the application of geostatistics to natural resources evaluation. The presentation follows the usual steps of a geostatistical analysis, introducing tools for description of spatial patterns, quantitative modeling of spatial continuity, spatial prediction and uncertainty assessment. Particular attention is paid to practical issues such as the modeling of sample semivariograms, the choice of an interpolation algorithm that incorporates all the relevant information available, or the incorporation of uncertainty assessment in decision making. Some common misunderstandings regarding the modeling of cross semivariograms, the use of the kriging variance or Gaussian-based algorithms will also be reviewed. The different concepts will be illustrated using multivariate soil data related to heavy metal contamination of an area of the Swiss Jura ŽAtteia et al., 1994; Webster et al., 1994. , kindly provided by Mr. J.-P. Dubois of the Swiss Federal Institute of Technology. 2. Description of spatial patterns Analysis of spatial data typically starts with a ‘posting’ of data values. For example, Fig. 1 shows the spatial distribution of five stratigraphic classes and of the concentrations of two heavy metals recorded, respectively, at 359 and 259 locations in a 14.5 km2 area in the Swiss Jura. For both continuous and P. GooÕaertsr Geoderma 89 (1999) 1–45 3 Fig. 1. Locations of sampling sites superimposed on the geologic map, and concentrations in Cd and Ni at 259 of these sites Žunitss mg kgy1 .. categorical attributes, the spatial distribution of values is not random in that observations close to each other on the ground tend to be more alike than those further apart. The presence of such a spatial structure is a prerequisite to the application of geostatistics, and its description is a preliminary step towards spatial prediction or stochastic simulation. 2.1. Continuous attributes Consider the problem of describing the spatial pattern of a continuous attribute z, say a pollutant concentration such as cadmium or nickel. The information available consists of values of the variable z at n locations u a , z Žu a ., a s 1,2, . . . ,n. Spatial patterns are usually described using the experimental semivariogram gˆ Žh. which measures the average dissimilarity between 4 P. GooÕaertsr Geoderma 89 (1999) 1–45 data separated by a vector h. It is computed as half the average squared difference between the components of data pairs: 1 N Žh . 2 gˆ Ž h . s z Žu a . y z Žu a q h. Ž1. Ý 2 N Ž h . as1 where N Žh. is the number of data pairs within a given class of distance and direction. Webster and Oliver Ž 1993. showed that reliable estimation of semivariogram values requires at least 150 data, and larger samples are needed to describe anisotropic Ž direction-dependent. variation. This does not mean, however, that geostatistics cannot be applied to smaller data sets. It is noteworthy that geostatistics has become the reference approach for characterization of petroleum reservoirs where the information available typically reduces to a few wells. Sparse data are often supplemented by expert knowledge or ancillary information originating from a better sampled area which is deemed similar to the study area; see also latter discussion on semivariogram modeling. Fig. 2 Žtop graphs. shows the omnidirectional semivariograms computed from the heavy metal concentrations displayed in Fig. 1. These graphs point out Fig. 2. Experimental omnidirectional semivariograms for Cd and Ni: original concentrations and indicator transforms using thresholds corresponding to the second- Ž — ., fifth- Ž – – . and eighth-decile Ž- - -. of the sample histogram. To facilitate the comparison, indicator semivariogram values were rescaled by the indicator variance. P. GooÕaertsr Geoderma 89 (1999) 1–45 5 distinct spatial behaviors of the two metals: Ni concentrations appear to vary more continuously than Cd concentrations, as illustrated by the smaller nugget effect and larger range of its semivariogram. In combination with a good knowledge about the phenomenon and the study area, such a spatial description can improve our understanding of the physical underlying mechanisms controlling spatial patterns ŽOliver and Webster, 1986a; Goovaerts and Webster, 1994; Webster et al., 1994. . In the present study, the long-range structure of the semivariogram of Ni concentrations is probably related to the control asserted by rock type, while the short-range structure for cadmium suggests the impact of local man-made contamination ŽGoovaerts, 1997a, p. 38.. Description of spatial patterns is too often restricted to the spatial variation over the full range of attribute values. Spatial patterns may, however, differ depending on whether the attribute value is small, medium, or large. For example, in many environmental applications, a few random ‘hot spots’ of large concentrations coexist with a background of small values that vary more continuously in space. Depending on whether large concentrations are clustered or scattered in space, our interpretation of the physical processes controlling contamination and our decision for remediation may change. The characterization of the spatial distribution of z-values above or below a given threshold value z k requires a prior coding of each observation z Žu a . as an indicator datum iŽu a ; z k ., defined as: i Žu a ; z k . s ½ 1 0 if z Ž u a . F z k otherwise Ž2. Indicator semivariograms can then be computed by substituting indicator data iŽu a ; z k . for z-data z Žu a . in Eq. Ž 1.: gˆ I Ž h; z k . s 1 N Žh . Ý i Žu a ; z k . y i Žu a q h; z k . 2 N Žh. 2 Ž3. as1 The indicator variogram value 2gˆ I Žh; z k . measures how often two z-values separated by a vector h are on opposite sides of the threshold value z k . In other words, 2gˆ I Žh; z k . measures the transition frequency between two classes of z-values as a function of h. The greater is gˆ I Žh; z k ., the less connected in space are the small or large values. Fig. 2 Žbottom graphs. shows the omnidirectional indicator semivariograms computed for the second-, fifth- and eighth-decile of the distributions of cadmium and nickel concentrations. To facilitate the comparison, all semivariograms values were standardized by dividing them by the indicator variance. For both metals, indicator semivariograms for small concentrations have smaller nugget effect than those for larger concentrations, which suggests that homogeneous areas of small concentrations coexist within larger zones where large and P. GooÕaertsr Geoderma 89 (1999) 1–45 6 medium concentrations are intermingled. Two clusters of small concentrations are indeed apparent on the location maps of Fig. 1 and correspond to the Argovian rocks. 2.2. Categorical attributes Many soil variables such as texture or water table classes take only a limited number of states which might be ordered or not. Spatial patterns of such categorical variables can also be described using geostatistics. Let S be a categorical attribute with K possible states sk , k s 1,2, . . . , K. The K states are exhaustive and mutually exclusive in the sense that one and only one state sk occurs at each location u a . The pattern of variation of a category sk can be characterized by semivariograms of type Ž 3. defined on an indicator coding of the presence or absence of that category: i Ž u a ; sk . s ½ 1 0 if s Ž u a . s sk otherwise Ž4. The indicator variogram value 2gˆ I Žh; sk . measures how often two locations a vector h apart belong to different categories skX / sk . The smaller is 2gˆ I Žh; sk ., the more connected is category sk . The ranges and shapes of the directional indicator semivariograms reflect the geometric patterns of sk . Fig. 3 shows the indicator semivariograms of two stratigraphic classes of Fig. 1 computed in four directions with an angular tolerance of 22.58. For both classes the indicator semivariogram value equals zero at the first lag, which means that any two data locations less than 100 m apart belong to the same formation. The longer SW–NE range Žlarger dashed line. reflects the corresponding preferential orientation of these two lithologic formations. Fig. 3. Experimental indicator semivariograms of Argovian and Sequanian rocks computed in four directions Ž —: 22.58, – –: 67.58, – – –: 112.58, . . . : 157.58; angular tolerances 22.58.. P. GooÕaertsr Geoderma 89 (1999) 1–45 7 Fig. 4. Experimental omnidirectional cross semivariogram between Cd and Ni. 2.3. BiÕariate description Soil information is generally multivariate, and geostatistics is increasingly used to investigate how the correlation between two soil properties varies in space. A measure of the joint variation of two continuous attributes z i and z j is the experimental cross semivariogram: gˆ i j Ž h . s 1 N Žh . Ý z i Žu a . y z i Žu a q h. P z j Žu a . y z j Žu a q h. 2 N Ž h . as1 Ž5. In other words, one looks at the joint variation of gradients of z i- and z j-values from one location to another a vector h away. If both attributes are positively related, an increase Ždecrease. in z i from u a to u a q h tends to be associated with an increase Ždecrease. in z j , and so the cross semivariogram value is positive as for the pair cadmium–nickel in Fig. 4. Cross semivariograms can also be computed from indicator values of type Ž2. in order to characterize the spatial connection of small or large values of two soil properties ŽGoovaerts, 1997a, p. 52.. For example, a similarity in the spatial distribution of large values for two heavy metals could indicate the existence of common sources of contamination. Another application of indicator cross semivariograms is the study of the spatial architecture of categories such as soil types in Goovaerts Ž 1994a.. 3. Modeling spatial variation Description of spatial patterns is rarely a goal per se. Rather, one generally wants to capitalize on the existence of spatial dependence to predict soil properties at unsampled locations. A key step between description and prediction is the modeling of the spatial distribution of attribute values. Most of P. GooÕaertsr Geoderma 89 (1999) 1–45 8 geostatistics is based on the concept of random function, whereby the set of unknown values is regarded as a set of spatially dependent random variables. Each datum z Žu a . is then viewed as a particular realization of a random variable ZŽu a .. An important characteristic of the random function is its semivariogram which is modeled from the experimental values. 3.1. UniÕariate case Consider the problem of modeling the semivariogram of a single attribute z from the set of experimental values gˆ Žh k . computed for a finite number of lags, h k , k s 1, . . . , K, in distance and direction. Most soil scientists are now aware that semivariograms can be modeled using only functions that are conditionally negative definite, such as the exponential or spherical models Ž McBratney and Webster, 1986. . Fewer know that the Gaussian semivariogram model is generally unrealistic and leads to unstable kriging systems and artifacts in the estimated maps Ž Wackernagel, 1995, pp. 109–111. . In many situations, two or more permissible models must be combined to fit the shape of the experimental semivariogram, as illustrated for cadmium in Fig. 5. Such nested model is written as: L g Ž h . s Ý b l g l Ž h . with b l G 0 Ž6. ls0 where b l is the positive sill or slope of the corresponding basic semivariogram model g l Žh.. The way in which these permissible models are chosen and their parameters Žrange, sill. are estimated is still controversial Ž Goovaerts, 1997a, pp. 97–107.. Several methods have been proposed, ranging from full black-box procedures which involve an automatic choice and fitting of the model to visual approaches in which the model is selected so that the fit looks satisfactory graphically. An Fig. 5. Experimental omnidirectional semivariogram for Cd, and the nested model fitted which includes a nugget effect and two spherical models with ranges of 200 m and 1.3 km. P. GooÕaertsr Geoderma 89 (1999) 1–45 9 intermediate approach consists of an automatic Ž least-squares. estimation of parameters of models chosen by the user Žsemi-automatic procedure.. Black-box procedures should be avoided because they cannot take into account ancillary information that is critical when sparse or preferential sampling makes the experimental semivariogram unreliable. Differences between semi-automatic and visual procedures reside in the criterion used to assess the goodness of the fit. The user might feel more comfortable if the choice of a particular model can be based on statistical criteria such as the sum of squares of differences between experimental and model semivariogram values. Using such criteria amounts to reducing semivariogram modeling to an exercise in fitting a curve to experimental values, which I think is too restrictive. The objective of the structural analysis is to build a permissible model that captures the major spatial features of the attribute under study. Although experimental semivariogram values play an important role in this process, ancillary information such as provided by physical knowledge of the area and phenomenon may be of great interest. For example, strong prior qualitative information may lead one to adopt an anisotropic model even if data sparsity prevents seeing anisotropy from the experimental semivariograms. In all situations one should avoid overfitting semivariogram models. The more complicated model usually does not lead to more accurate estimates. 3.2. BiÕariate case Modeling the coregionalization between two variables z i and z j requires the modeling of the two direct semivariograms g ii Žh. and g j j Žh. plus the cross semivariogram g i j Žh.. A difficulty lies in the fact that the three models must be built together. More precisely, one must ensure that the matrix of semivariogram models G Žh. s w g i j Žh.x is conditionally negative semi-definite for all lags h, which requires the following inequality to be satisfied for all lags h: < g i j Ž h . < F g ii Ž h . g j j Ž h . ;h ( Ž7. The apparent complexity of condition Ž 7. has discouraged some practitioners from modeling cross semivariograms and so using multivariate interpolation. In fact, condition Ž7. is easily checked as long as the three Ž cross. semivariograms are modeled as linear combinations of the same set of basic models g l Žh.: L g i j Ž h . s Ý bil j g l Ž h . ; i , j Ž8. ls0 or, using matrix notation, L G Ž h . s Ý Bl g l Ž h . Ž9. ls0 where Bl s w bil j x is referred to as a coregionalization matrix. Sufficient conditions for the so-called linear model of coregionalization Ž LMC. to be permissible P. GooÕaertsr Geoderma 89 (1999) 1–45 10 are: Ž1. the functions g l Žh. are permissible semivariogram models, and Ž 2. each coregionalization matrix is positive semi-definite, which implies that the coefficients bil j satisfy the following constraints: biil G 0, bjlj G 0 ; l Ž 10. ( < bil j < F biil bjlj ; l Ž 11. In practice, the modeling is done in two steps: 1. both direct semivariograms are first modeled as linear combinations of selected basic structures g l Žh. , 2. the same basic structures are then fitted to the cross semivariogram under the constraint Ž11.. This approach was used to fit visually the following model to the Ž cross. semivariograms of Cd and Ni displayed in Fig. 6: g Cd Ž h . s 0.3 g 0 Ž h . q 0.3 Sph Ž hr200 m . q 0.26 Sph Ž hr1.3 km . g Ni Ž h . s 11 g 0 Ž h . q 71 Sph Ž hr1.3 km . g Cd – Ni Ž h . s 0.6 g 0 Ž h . q 3.8 Sph Ž hr1.3 km . Ž 12. The requirement that all semivariograms must share the same set of basic structures might seem a severe limitation of the linear model of coregionaliza- Fig. 6. Experimental omnidirectional direct and cross semivariograms for Cd and Ni, and the linear model of coregionalization fitted. P. GooÕaertsr Geoderma 89 (1999) 1–45 11 tion. Variables that are well cross-correlated are however likely to show similar patterns of spatial variability. In addition, there is no need for the direct and cross semivariograms to include all the basic structures; for example, the Ni semivariogram and the cross semivariogram Cd–Ni do not include the shortrange Ž200 m. spherical structure. The reader is warned against the use of alternative models Že.g., Myers, 1982; Zhang et al., 1992. that are more flexible but do not provide easy way to check their permissibility Ž Goovaerts, 1994b. . It is unfortunate that such models have been mainly used in soil science. 3.3. MultiÕariate case For more than two variables Ž NÕ ) 2., checking the permissibility of the linear model of coregionalization becomes cumbersome because, for each structure g l Žh., the NÕ = NÕ matrix of coregionalization Bl must be positive semi-definite. Fortunately, Goulard Ž 1989. has developed an iterative procedure that fits the linear model of coregionalization directly under the constraint of positive semi-definiteness of all matrices Bl . Most applications of this innovative fitting technique have been in the field of soil science Ž Goulard and Voltz, 1992; Goovaerts, 1992; Voltz and Goulard, 1994; Webster et al., 1994. . 4. Spatial prediction The main application of geostatistics to soil science has been the estimation and mapping of soil attributes in unsampled areas. Kriging is a generic name adopted by the geostatisticians for a family of generalized least-squares regression algorithms. The practitioner often gets confused in the face of the palette of kriging methods available: simple, ordinary, universal or with a trend, cokriging, kriging with an external drift . . . . This section presents a brief description of the main methods and provides references to soil applications. A detailed presentation of the mathematics can be found in textbooks such as Isaaks and Srivastava Ž1989, pp. 278–337. and Goovaerts Ž 1997a, pp. 125–258. . 4.1. UniÕariate kriging Consider the problem of estimating the value of a continuous attribute z at any unsampled location u using only z-data z Ž u a ., a s 1, . . . ,n4 . All kriging estimators are but variants of the basic linear regression estimator Z ) Žu. defined as: n Žu . Z ) Ž u . y m Ž u . s Ý la Ž u . Z Ž u a . y m Ž u a . as1 Ž 13. P. GooÕaertsr Geoderma 89 (1999) 1–45 12 where la Žu. is the weight assigned to datum z Žu a . interpreted as a realization of the random variable ZŽu a . and located within a given neighborhood W Žu. centered on u. The nŽ u. weights are chosen so as to minimize the estimation or error variance s E2 Ž u. s Var Z ) Žu. y ZŽu.4 under the constraint of unbiasedness of the estimator. These weights are obtained by solving a system of linear equations which is known as ‘kriging system.’ Differences between kriging variants reside in the model considered for the trend mŽ u. in expression Ž13.. Ž1. Simple kriging Ž SK. considers the mean mŽu. known and constant throughout the study area A: m Ž u . s m, known ; u g A Ž2. Ordinary kriging ŽOK. accounts for local fluctuations of the mean by limiting the domain of stationarity of the mean to the local neighborhood W Žu.: m Ž uX . s constant but unknown ;uX g W Ž u . Unlike simple kriging, the mean here is deemed unknown. Ž3. Kriging with a trend model ŽKT., also known as ‘universal kriging,’ considers that the unknown local mean mŽuX . smoothly varies within each local neighborhood W Žu., and the trend is modeled as a linear combination of functions f k Žu. of the coordinates: K X m Ž u . s Ý a k Ž uX . f k Ž uX . , ks0 with a k Ž uX . f a k constant but unknown ; uX g W Ž u . The coefficients a k ŽuX . are unknown and deemed constant within each local neighborhood W Ž u.. By convention, f 0 ŽuX . s 1, hence ordinary kriging is but a particular case of KT with K s 0. These differences are illustrated in the one-dimensional example of Fig. 7 where Cd concentration is estimated every 50 m using at each location the five closest data. The middle graph shows the mean implicitly used by each kriging variant: global mean for SK, constant mean within local search neighborhoods for OK, local linear function of the x-coordinates for KT. For the latter two Fig. 7. Impact of the kriging algorithm on the estimation of the trend Žmiddle graph. and of Cd concentration Žbottom graph. along a transect. The vertical dashed lines delineate the segments that are estimated using the same five Cd concentrations. For example, the first segment, 1–2.1 km, includes all estimates that are based on the data at locations u 1 to u 5. P. GooÕaertsr Geoderma 89 (1999) 1–45 13 algorithms, the parameters of the trend model are constant within each segment where the same five neighboring data are used, and these are delineated by 14 P. GooÕaertsr Geoderma 89 (1999) 1–45 vertical dashed lines. The corresponding estimates of Cd concentration are displayed at the bottom of the figure. Simple kriging is inappropriate because it does not account for the increase in Cd concentration along the transect. Ordinary kriging with local search neighborhoods provides similar results to kriging with a trend model while being easier to implement. Large differences between the two estimators arise only beyond location u 10 and are due to the arbitrary extrapolation of the trend model evaluated from the last five data. Note that in this example the linear trend model yields negative estimates of concentration around 7 km. Such aberrant results show that the blind use of complex trend models is risky and may yield worse estimates than the straightforward ordinary kriging. 4.2. Accounting for exhaustiÕe secondary information When measurements are sparse or poorly correlated in space, the estimation of the primary attribute of interest is generally improved by accounting for secondary information originating from other related categorical or continuous attributes. This secondary information is said to be exhaustively sampled when it is available at all primary data locations and at all nodes of the estimation grid, e.g., a soil map or an elevation model. In this case, secondary data can be incorporated using three variants of the aforementioned univariate methods: kriging within strata, simple kriging with varying local means, and kriging with an external drift. Where major scales of spatial variation are related to changes in land use, soil type or lithology, secondary categorical information such as land use, soil or geologic maps can be used to stratify the study area Ž e.g., see Stein et al., 1988; Voltz and Webster, 1990; Van Meirvenne et al., 1994. . Kriging is then performed within each stratum using a semivariogram model and data specific to that stratum. When data sparsity prevents computing reliable semivariogram estimates within each stratum, experimental values can be combined into a single pooled within-stratum semivariogram. Fig. 8 shows a stratification of the transect of Fig. 7 based on geology. Within-stratum semivariograms computed from the entire study area indicate that Cd concentration varies more continuously within the stratum A1. Kriging within strata yields estimates Ž bottom graph, solid line. that suddenly change at the strata boundaries, as opposed to ordinary kriging without stratification Ž dashed line. . A second approach is simple kriging where the global mean is replaced by local means derived from a calibration of the secondary information. For example, the calibration of the geologic map of Fig. 1 yields an average Cd concentration for each rock type, see Fig. 9 Žleft top graph.. Residual data can be computed by subtracting from each Cd measurement the local mean of the prevailing rock type. Residual values are then estimated at unsampled locations P. GooÕaertsr Geoderma 89 (1999) 1–45 15 Fig. 8. Kriging within strata. The transect is first split into two strata A1 and A2 , according to geology Žtop graph.. Within each stratum, the Cd semivariogram is inferred and modeled Žmiddle graph., and Cd concentrations are estimated using ordinary kriging and stratum-specific data Žbottom graph, solid line.. Vertical arrows depict discontinuities at the strata boundaries. The dashed line represents the OK estimate without stratification. 16 P. GooÕaertsr Geoderma 89 (1999) 1–45 P. GooÕaertsr Geoderma 89 (1999) 1–45 17 using simple kriging and the semivariogram of residuals, see Fig. 9 Ž third row. . The final estimates of Cd concentration are obtained by adding the local means to the residual estimates. Unlike kriging within strata, data across geologic boundaries are used in the estimation, which attenuates discontinuities at boundaries. A similar approach was used by Bierkens Ž 1997. to incorporate soil map information in the mapping of heavy metal concentrations. To account for the accuracy of the soil map information in mapping, Heuvelink and Bierkens Ž1992. proposed a weighted average of soil map predictions and kriged estimates where more weight is given to the information with the smallest prediction error variance. A case study showed this heuristic method to produce a more accurate map of mean water table than either kriging or soil map prediction as long as the soil map accuracy is correctly estimated and point observations are scarce. If the secondary attribute is continuous, the local means in simple kriging can be derived by regression procedures. For example, Odeh et al. Ž1997. used a combination of multiple linear regression and ordinary Ž instead of simple. kriging to account for landform attributes derived from a digital elevation model in the prediction of percent topsoil organic carbon. Kriging with an external drift is but a variant of kriging with a trend model where the trend mŽu. is modeled as a linear function of a smoothly varying secondary Žexternal. variable y Žu. instead of a function of the spatial coordinates: m Ž u . s a 0 Ž u . q a1Ž u . y Ž u . K as opposed to: mŽu. s a0 Žu. q Ý ks1 a k Žu. f k Žu. for kriging with a trend model. Besides the difficult inference of the residual semivariogram, this method requires that the relation between primary trend and secondary variable is linear and makes physical sense. One of the few applications of the method to soil science is a paper by Bourennane et al. Ž1996. where the slope gradient is used as external drift for predicting the thickness of a pedological horizon. Gotway and Hartford Ž1996. used a similar approach to account for corn yield measurements and soil nitrate concentration data in the prediction of the amount of nitrogen left in the soil after harvest. In Heuvelink Ž1996., the external drift takes the form of a polygonal map of mean highest water table derived from a calibration of a soil map. Fig. 9. Simple kriging with varying local means. The trend component at each location is estimated by the average Cd concentration for the rock type prevailing at that location. Residuals are interpolated by simple kriging, and the results Žthird row. are added to the trend estimates to yield the Cd estimates Žbottom graph.. P. GooÕaertsr Geoderma 89 (1999) 1–45 18 4.3. Cokriging When the secondary information is not exhaustively sampled, the estimation can be done using a multivariate extension of the kriging estimator ŽEq. Ž 13.. which is referred to as cokriging: NÕ n i Žu . Z1) Ž u . y m1Ž u . s Ý Ý la iŽ u . Zi Ž u a i . y m i Ž u a i . Ž 14. is1 a is1 where la iŽu. is the weight assigned to the primary datum z 1Žu a i . and la iŽu., i ) 1, is the weight assigned to the secondary datum z i Žu a i .. Like kriging, three cokriging variants can be distinguished according to the models adopted for the trends of primary and secondary variables: simple, ordinary, and universal or with trend models. To be complete, one must also mention a variant of ordinary cokriging, called standardized ordinary cokriging Ž Isaaks and Srivastava, 1989, p. 416., which uses a single unbiasedness constraint that calls for all primary and secondary data weights to sum to one. To be unbiased though, the method requires a prior rescaling of all secondary variables to the primary mean. Because it does not call for the secondary data weights to sum to zero, ordinary cokriging with a single unbiasedness constraint gives a larger weight to the secondary information while reducing the occurrence of negative weights ŽGoovaerts, 1998.. Numerous examples of cokriging can be found in the soil literature, e.g., McBratney and Webster Ž 1983. , Yates and Warrick Ž 1987. , Stein et al. Ž1988., Leenaers et al. Ž1990.. Most of these studies, however, do not check the permissibility of the coregionalization model which is used in prediction; recall previous remark on the linear model of coregionalization. Cokriging is much more demanding than kriging in that NÕŽ NÕ q 1.r2 direct and cross semivariograms must be inferred and jointly modeled, and a large cokriging system must be solved. As emphasized by Wackernagel Ž 1994. on the occasion of Pedometrics’92, the additional modeling and computational effort implied by cokriging is not worth doing when primary and secondary variables are recorded at the same locations Žisotopic case. and the direct semivariogram g 11Žh. of the primary variable is proportional to the cross semivariograms with the secondary variables, g 1i Žh., i ) 1. More generally, practice has shown that cokriging improves over kriging only when the primary variable is undersampled with regard to the secondary variables and those secondary data are well correlated with the primary value to be estimated Ž Journel and Huijbregts, 1978, p. 326; Goovaerts, 1998. . Cokriging can also be used to incorporate secondary information that is exhaustively sampled. Several studies Ž Asli and Marcotte, 1995; Goovaerts, 1998. have shown that secondary data that are close or even co-located with the estimated location tend to screen the influence of further away secondary data. Thus, in the presence of highly redundant secondary information, one gains little by retaining more than one secondary datum in cokriging. Besides the screening P. GooÕaertsr Geoderma 89 (1999) 1–45 19 effect, the use of redundant information can make the cokriging system unstable. Cokriging using the single secondary datum collocated with the location being estimated is referred to as collocated cokriging Ž Almeida and Journel, 1994. . Whereas kriging with an external drift uses the secondary exhaustive information only to inform on the shape of the trend of the primary variable, cokriging exploits more fully the secondary information by directly incorporating the values of the secondary variable and measuring the degree of spatial association with the primary variable through the cross semivariogram. Moreover, in some situations, it is more realistic to use the secondary information as a full covariate than just an indicator of the trend shape ŽGotway and Hartford, 1996. . For example, it is important to account for the actual values and magnitude of local fluctuations of hydrologic parameters, such as specific capacity, in the prediction of conductivity. Another application of cokriging is the combination of different measurements of the same attribute. For example, a few precise laboratory measurements of clay content can be supplemented by more numerous field data collected using cheaper measurement devices. Measurement errors are likely to be larger for field data, and their semivariogram is likely to have a larger relative nugget effect. To account for such a difference in the patterns of spatial continuity, precise laboratory measurements and less precise field data are weighted differently through cokriging. Note that the secondary information can take the form of constraint intervals indicating that the primary attribute is valued between specific bounds, e.g., reference colors on a colorimetric paper to evaluate acidity levels. Secondary data are thus coded into indicators of type Ž2. prior to their incorporation in the cokriging estimator Ž Goovaerts, 1997a, pp. 241–244.. 4.4. Smoothing effect and kriging Õariance Estimation by kriging is best in the least-squares sense because the local error variance Var Z ) Žu. y ZŽu.4 is minimum. A shortcoming of the least-squares criterion, however, is that the local variation of z-values is smoothed: estimated values are typically much less variable than actual values, which is expressed by an overestimation of small values while large values are underestimated. Another drawback of the estimation is that the smoothing depends on the local data configuration; it is small close to the data locations and increases as the location being estimated gets farther away from sampled locations. This uneven smoothing yields kriged maps that artificially appear more variable in densely sampled areas than in sparsely sampled areas. For all these reasons, interpolated maps should not be used for applications sensitive to presence of extreme values and their patterns of continuity, typically soil pollution data and physical properties Žpermeability, porosity. that control solute transport in soil. A better alternative is to use simulated maps which reproduce the spatial variability modeled from the data, see later section. 20 P. GooÕaertsr Geoderma 89 (1999) 1–45 Kriging provides not only a least-squares estimate of the attribute but also the attached error variance. The so-called kriging variance is unfortunately often misused as a measure of reliability of the kriging estimate, as reminded by several authors Ž Journel, 1993; Armstrong, 1994. . By doing so, one assumes that the variance of the errors is independent of the actual data values and depends only on the data configuration, a situation referred to as ‘homoscedasticity.’ In the example of Fig. 10, the kriging variance is similar at locations uX1 and uX2 with similar data configurations, although the potential for error is expected to be greater at location uX2 , which is surrounded by a very large value and a small one, compared with location uX1, which is surrounded by two consistently small Cd values. Homoscedasticity is rarely met in practice because the local variance of data usually changes across the study area Ž nonstationarity. . For example, the Fig. 10. Ordinary kriging estimates of Cd concentration with the associated kriging variance. P. GooÕaertsr Geoderma 89 (1999) 1–45 21 sills of the within-stratum semivariograms of Fig. 8 indicated that the variance of Cd concentrations is larger within the stratum A1. In such a case, one might consider the possibility of subdividing the study area into more homogenous zones which are then treated as different statistical populations. Such subdivision, however, is conditioned by our ability to delineate the different zones in the field, and the availability of enough data to infer statistics such as semivariograms for each zone. Stationarity is a property of the random function model that is needed for statistical inference, and in most situations the ‘stationarity decision’ is taken by the user simply because he cannot afford to split data into smaller subsets! An interesting approach consists of rescaling locally a relative semivariogram computed from all the data so that its sill equals the variance of the data within the kriging search neighborhood Ž Isaaks and Srivastava, 1989, pp. 516–524.. Whereas this local rescaling does not change the kriging weights and so the estimated value, it might provide kriging variances that better inform on the actual estimation errors. Another way to correct for the lack of homoscedasticity is to transform the data to stabilize the variance, typically when the sample histogram is asymmetric and the local variance of data is related to their local mean Ž proportional effect. . A frequent transform consists of taking the logarithm of strictly positive measurements, y Žu a . s ln z Ž u a .. The problem lies in the back-transform of the estimated value y ) Žu. to retrieve the estimate of the original variable z at u. For example, the unbiased back-transform of the simple lognormal kriging estimate is: 2 z ) Ž u . s exp y ) Ž u . q s SK Ž u . r2 Ž 15. 2 Ž . where s SK u is the simple lognormal kriging variance. Because of the exponentiation of both kriging estimate and variance, final results are very sensitive to the semivariogram model which controls the kriging variance! 4.5. Factorial kriging Semivariogram models used in earth science are frequently nested in that several structures with different ranges or slopes are combined to fit the experimental curves. Recall expression Ž6. for the nested model: L g Žh. s Ý b l g l Žh. Ž 16. ls0 The random function ZŽu. with a nested semivariogram g Žh. can be interpreted as the sum of Ž L q 1. independent random functions Z l Žu., each with zero mean and semivariogram b l g l Žh. : L Z Žu. s Ý Z l Žu. q m Žu. ls0 Ž 17. P. GooÕaertsr Geoderma 89 (1999) 1–45 22 where the trend component mŽu. is assumed locally constant as in the practice of ordinary kriging. According to Burrough Ž 1983. , the variation of soil properties appears to be consistent with the hypothesis of the nested model, and examples of sources of variation affecting soil at different spatial scales are earthworms, geology, relief, to which one can add tree-throw and man’s divisions into farms and fields ŽOliver and Webster, 1986b. . Once several spatial scales have been identified on the semivariograms, the corresponding spatial components Z l Žu. can be estimated and mapped using a variant of kriging, known as factorial kriging or kriging analysis ŽMatheron, 1982; Goovaerts, 1992.. Several authors ŽGoovaerts, 1994c; Webster et al., 1994. have used factorial kriging to separate local variation in soil properties due to field-to-field differences or local sources of pollution from regional variation related to different soil types or geological classes. In these studies, the maps of spatial components served mainly as descriptive tools to improve our understanding of the sources of spatial variation. More recently, Bourgault et al. Ž1995. proposed to use the map of the regional component of soil electromagnetic response as secondary information in the cokriging of electrical conductivity. They capitalized on the fact that these two soil properties were better correlated at a regional scale Ž r s 0.63. than at a local scale Ž r s 0.30.. Such scale-dependent relations are frequent in soil science, e.g., see Goulard and Voltz Ž1992., Goovaerts and Webster Ž1994., Dobermann et al. Ž1995, 1997.. Many soil properties of the soil are controlled by the same physical processes which operate at different spatial scales and influence these properties in different ways. Accounting for the spatial scale in the study of correlations may enhance a relation between variables that is otherwise blurred in an approach where all different sources of variation are mixed, leading to a better understanding of the physical underlying mechanisms controlling spatial patterns. Multivariate factorial kriging, also called factorial kriging analysis Ž Matheron, 1982; Wackernagel, 1988, 1995, pp. 160–165., allows one to analyse relations between variables at the spatial scales detected and modeled from experimental semivariograms. Like factorial kriging, which is based on the linear model of regionalization Ž Eq. Ž 16.., multivariate factorial kriging is based on the specific linear model of coregionalization fitted to the experimental direct and cross semivariograms: L g i j Ž h . s Ý bil j g l Ž h . ; i , j ls0 Under that particular model, each random function Zi Žu. can be interpreted as the sum of independent random functions Zil Žu.: L Zi Ž u . s Ý Zil Ž u . q m i Ž u . ls0 Ž 18. P. GooÕaertsr Geoderma 89 (1999) 1–45 23 Characterizing the linear relation between the two variables Zi and Z j at a particular scale l amounts to computing the covariance between the two corresponding spatial components Zil Žu. and Z jl Žu., which is but the contribution Žsill, slope. of the structure g l Ž h. to the cross semivariogram g i j Žh.: Cov Zil Ž u . ,Z jl Ž u . s bil j ½ Ž 19. 5 A standardized measure of the spatial relation is the structural correlation coefficient defined as: r ilj s Corr Zil Ž u . ,Z jl Ž u . s ½ 5 bil j (b P b l ii l jj Ž 20. Application of expression Ž20. to the linear model of coregionalization Ž12. yields the following structural correlation coefficients for the pair Cd–Ni: 0.33, 0.0, and 0.88 at the micro- Ž nugget effect. , local Ž 200 m. and regional Ž 1.3 km. scales, respectively. The linear correlation between the original concentrations which embody all three different scales is 0.49. Taking into account the spatial scale in the correlation analysis reveals a strong correlation of the two metals at the regional scale that matches the scale of the stratigraphy, suggesting that the source of these metals is geochemical. Whereas multivariate analysis is usually done without regard to spatial position of observations, multivariate factorial kriging accounts for the regionalized nature of variables by analyzing the Ž L q 1. matrices of structural correlation coefficients separately. Each correlation structure is thus distinguished by filtering the structures belonging to other scales of spatial variation. Factorial kriging has been increasingly used these last five years in soil science. One must be aware that the decomposition into spatial components, and hence any subsequent interpretation, depends on the somewhat arbitrary decision of whether to include a particular component in the linear model of Ž co. regionalization. Therefore, during the structural analysis, it is crucial to take into account any physical information related to the phenomenon and the study area. 5. Modeling of local uncertainty There is necessarily some uncertainty about the value of the attribute z at an unsampled location u. In geostatistics, the usual approach for modeling local uncertainty consists of computing a kriging estimate z ) Žu. and the associated error variance, which are then combined to derive a Gaussian-type confidence interval ŽIsaaks and Srivastava, 1989, pp. 517–519., see Fig. 11 Žright top 24 P. GooÕaertsr Geoderma 89 (1999) 1–45 Fig. 11. Different models of uncertainty about the Cd concentration at the unsampled location u: 95% confidence interval derived from the ordinary kriging estimate and the associated kriging variance, and local distributions of probability Žccdf. established using either a multi-Gaussian or an indicator approach Žbottom graphs.. graph. . A more rigorous approach is to model the uncertainty about the unknown z Žu. before and independently of the choice of a particular estimate for that unknown. The interpretation of the unknown as a realization of a random variable ZŽu. leads one to model the uncertainty at u through the conditional cumulative distribution function Ž ccdf. of that variable: F Ž u; z < Ž n . . s Prob Z Ž u . F z < Ž n . 4 Ž 21 . where the notation ‘ <Ž n.’ expresses conditioning to the local information, say, nŽ u. neighboring data z Žu a .. The ccdf fully models the uncertainty at u since it gives the probability that the unknown is no greater than any given threshold z, see Fig. 11 Ž bottom graphs. . This section presents the main algorithms for P. GooÕaertsr Geoderma 89 (1999) 1–45 25 modeling these ccdfs, with an emphasis on indicator methods. The use of ccdf models in spatial prediction and decision making is discussed. 5.1. The parametric approach Determination of ccdfs is straightforward if an analytical model defined by a few parameters can be adopted for such distributions. Disjunctive kriging ŽMatheron, 1976; Webster and Oliver, 1989. and multi-Gaussian kriging Ž Verly, 1983. are two parametric approaches capitalizing on the congenial properties of the Gaussian model. The theory underlying disjunctive kriging is rather complex and requires the use of orthogonal polynomial expansions with their related convergence problems. The second approach is easier to implement in that, under the multi-Gaussian model, the mean and variance of the Gaussian ccdf at an unsampled location are but the simple kriging estimate and variance, see Fig. 11 Žleft bottom graph.. The congeniality of the parametric approach is balanced by strong assumptions about the normality of the two-point cdf for disjunctive kriging and of the multi-point cdf for multi-Gaussian kriging. The normality of the one-point distribution of data Žsample histogram. can be easily checked and, if required, data can be normalized using a transform which amounts at replacing the original values by the corresponding quantiles in the standard normal distribution ŽDeutsch and Journel, 1998, p. 141.. The normality of the one-point cdf is a necessary but not sufficient condition to ensure that a random function is bivariate, and a fortiori multivariate, Gaussian. A common misunderstanding in the soil literature about disjunctive kriging is that the normal score transform ensures the normality of the two-point distribution. In fact, such a univariate transform has no impact on the bivariate properties of the random function Ž Deutsch and Journel, 1998, p. 143; Rivoirard, 1994, p. 46.. Deutsch and Journel Ž 1998, p. 142. provide a way to check the normality of the two-point distribution from the shape of indicator semivariograms computed for a series of threshold values, see also Goovaerts Ž1997a, p. 271.. Unfortunately, such checks are rarely performed, and the adoption of a Gaussian-based approach is essentially a subjective decision too often driven by the simplicity of the corresponding algorithms. Multi-Gaussian models have their own merits but they should not be adopted if critical features of the data are not reproduced. In particular, the Gaussian model does not allow for any significant spatial correlation of very large or very small values, a property known as destructuration effect. Connected strings of small or large values are very important features in particular for soil pollution, and so the Gaussian model should never be considered when the structural analysis Žindicator semivariograms. or qualitative information indicates that very large or very small values are spatially correlated. Even in the absence of information about the connectivity of extreme values, the Gaussian model is not P. GooÕaertsr Geoderma 89 (1999) 1–45 26 conservative in the sense that it leads one to understate the potential for hazard, Ž 1997. for mass transport in an aquifer section as shown by Gomez-Hernandez ´ ´ with heterogeneous hydraulic conductivity. 5.2. The nonparametric approach Unlike the Gaussian-based techniques, the nonparametric algorithms do not assume any particular shape or analytical expression for the conditional distributions. Instead, the value of the function F Ž u; z <Ž n.. is determined for a series of K threshold values z k discretizing the range of variation of z: F Ž u; z k < Ž n . . s Prob Z Ž u . F z k < Ž n . 4 k s 1, . . . , K Ž 22. The resolution of the discrete ccdf is then increased by interpolation within each class Ž z k , z kq1 x and extrapolation beyond the two extreme threshold values z 1 and z K . For example, Fig. 11 Ž right bottom graph. shows the model fitted to the nine ccdf values estimated at location u. Nonparametric geostatistical estimation of ccdf values Ž Journel, 1983. is based on the interpretation of the conditional probability Ž Eq. Ž 22.. as the conditional expectation of an indicator random variable I Žu; z k . given the information Ž n.: F Ž u; z k < Ž n . . s E I Ž u; z k . < Ž n . 4 Ž 23. with I Žu; z k . s 1 if ZŽu. F z k and zero otherwise. Ccdf values can thus be estimated by least-squares Žkriging. interpolation of indicator transforms of data. Practical implementation of the indicator approach involves the following steps ŽGoovaerts, 1997a, pp. 284–328.. Ž1. Code each observation z Ž u a . into a vector of K indicator values: 1 if z Ž u a . F z k k s 1, . . . , K Ž 24. 0 otherwise The set of K threshold values is typically chosen such that the range of z-values is split into Ž K q 1. classes of approximately equal frequency, e.g., the nine deciles of the sample cumulative distribution. Ž2. For each threshold z k , compute the experimental indicator semivariogram ŽEq. Ž3.. , and model it using a linear combination Ž Eq. Ž 6.. of permissible semivariogram models. Ž3. At each unsampled location u, the following should be done. Ø Estimate each of the K ccdf values as a linear combination of neighboring indicator data by kriging as for continuous attributes; for example, the ordinary indicator kriging estimator is: i Žu a ; z k . s ½ ) F Ž u; z k < Ž n . . O IK s ) I Ž u; z k . OK n Žu . s Ý la Žu; z k . I Žu a ; z k . as1 Ž 25. P. GooÕaertsr Geoderma 89 (1999) 1–45 27 Ø Correct the estimated probabilities w F Žu; z k <Ž n..x ) that do not meet the following constraints: F Ž u; z k < Ž n . . ) F Ž u; z k < Ž n . . ) g 0,1 ; z k F F Ž u; z kX < Ž n . . Ž 26. ) ; z kX ) z k Ž 27 . Ø Interpolate or extrapolate ccdf values to build a continuous model for the conditional cdf, which allows one to retrieve the probability of being no greater than any threshold z, in addition to the K original thresholds z k . The indicator approach appears much more demanding than the multi-Gaussian approach both in terms of semivariogram modeling and computer requirements. This additional complexity is balanced by the possibility of modeling spatial correlation patterns specific to different classes of attribute values through indicator semivariograms. In particular, the connectivity of extreme values can be accounted for. Note that in many applications the objective is not to model the whole ccdf but rather to assess the probability of exceeding a particular value, say a regulatory threshold in soil pollution Ž Leonte and Schofield, 1996. or a critical value for soil quality Ž Smith et al., 1993. . In this case, the indicator approach is as straightforward as the parametric approach since a single indicator semivariogram and kriging system need to be considered. Fig. 12 Žfirst two rows. shows the different steps of the indicator approach to map the probability of contamination by cadmium in the region Žregulatory threshold z c s 0.8 mg kgy1 .. A usual criticism of the indicator approach is that the indicator coding amounts to discarding much of the information in the data. In the example of Fig. 12, all concentrations ranging from 0.81 mg kgy1 to 5.2 mg kgy1 yield the same unit indicator value, while all concentrations no greater than 0.8 kgy1 are translated into zero indicator values. In theory, this loss of information can be compensated by accounting for indicator values defined at different thresholds that is using indicator cokriging instead of kriging. Practice has shown, however, that indicator cokriging improves little over indicator kriging Ž Goovaerts, 1994d. because cumulative indicator data carry substantial information from one threshold to the next one, and all indicator values are available at each sampled location Žisotopic or equally-sampled case. . An alternative to indicator cokriging is probability kriging Ž Journel, 1984; Goovaerts, 1997a, p. 301. where ccdf values are estimated as a linear combination of neighboring indicator and uniform transforms of the data: ) F Ž u; z k < Ž n . . PK s ) I Ž u; z k . PK n Žu . s n Žu . Ý la Žu; z k . I Žu a ; z k . q Ý na Žu; z k . X Žu a . as1 a s1 Ž 28. 28 P. GooÕaertsr Geoderma 89 (1999) 1–45 Fig. 12. Estimation of the probability of contamination by cadmium using the indicator approach. The original concentrations are transformed into indicators of exceedence of the regulatory threshold 0.8 mg kgy1 that are then kriged using an indicator semivariogram. The bottom probability map accounts for additional information in the form of soft probabilities derived from a calibration of the geologic map of Fig. 1. The standardized rank x Žu a . of observation z Žu a . is computed as r Žu a .rn, where r Žu a . g w1,n x is the rank of the datum in the sample cumulative distribution. The data ranks valued in w0,1x allow one to discriminate Cd concentrations P. GooÕaertsr Geoderma 89 (1999) 1–45 29 with similar indicator transforms 0 or 1, and so corrects for the loss of resolution caused by the use of a single threshold in indicator kriging. A limitation of the indicator approach is the a posteriori correction of estimated probabilities that do not meet constraints Ž 26. and Ž 27. , although practice has shown that order relation deviations are generally of small magnitude ŽGoovaerts, 1994d. . A more elegant solution would consist of implementing these constraints directly into the indicator kriging system, as proposed by de Gruijter et al. Ž1997. for categorical variables. A potential pitfall is the interpolation or extrapolation of the corrected probabilities to derive a continuous ccdf model. Characteristics of the ccdf such as the mean or variance may overly depend on the modeling of the upper and lower tails of the distribution ŽGoovaerts, 1997a, p. 338.. A linear model is usually adopted for interpolation within each class Ž z k , z kq1 x, whereas power or hyperbolic models are used for extrapolation beyond the two extreme threshold values z 1 and z K ŽDeutsch and Journel, 1998, pp. 135–138.. The choice of these models is fully arbitrary, and I prefer to capitalize on the higher level of discretization of the cdf Ž i.e., the cumulative histogram. to improve the within-class resolution of the ccdf ŽGoovaerts, 1997a, p. 327.. For example, the resolution of the discrete ccdf of Fig. 11 Žright bottom graph. has been increased by performing a linear interpolation between tabulated bounds provided by the histogram of 259 Cd concentrations. An alternative to the piecewise interpolationrextrapolation of the ccdf model consists of fitting a continuous parametric model to the set of estimated probabilities. In all cases, the impact of extrapolation models can be reduced by selecting more threshold values within the two tails of the distribution Ž Deutsch and Lewis, 1992; Chu, 1996.. The major advantage of the indicator approach is its ability to incorporate soft information of various types Ž e.g., soil map or qualitative field observations such as the smell or color of contaminated soil. in addition to direct measurements on the attribute of interest. The only requirement is that each soft datum must be coded into a vector of K cumulative probabilities of the type: Prob Z Ž u . F z k <specific local information at u 4 k s 1, . . . , K Ž 29. Goovaerts and Journel Ž1995. showed how probabilities ŽEq. Ž29.. can be derived from the calibration of a soil map and combined with precise measurements of metal concentration to map the risk of deficiency of these metals in the soil. Similar applications in soil science are presented in Colin et al. Ž 1996. , Goovaerts et al. Ž1997. , and Journel Ž 1997. . Fig. 12 Ž bottom graphs. illustrates the use of the geologic map as soft information in mapping the risk of contamination by cadmium. Probabilities of contamination derived from a calibration of geology are used as local means in the simple kriging of indicator data; recall similar approach for continuous attributes in Fig. 9. Taking the geology into consideration produces a more detailed map and enhances the 30 P. GooÕaertsr Geoderma 89 (1999) 1–45 contrast between Argovian rocks, on which the soil contains little cadmium, and other rocks with larger probabilities of exceeding the regulatory threshold. Another interesting feature of the indicator approach is its ability to account for a secondary continuous variable, say another metal content, that is nonlinearly related to the primary variable. The idea consists of discretizing the two variables using two series of K thresholds Že.g., deciles of the sample cumulative distributions., then combining primary and secondary indicator data at each threshold using a cokriging algorithm similar to the one introduced for continuous variables Ž Zhu and Journel, 1993; Goovaerts, 1997a, p. 308.. 5.3. Using the model of local uncertainty Once the uncertainty about an unsampled value has been modeled using either parametric or nonparametric approaches, an estimate for that unknown can be retrieved from the ccdf, say the mean or the median of the conditional distribution. For example, the map in Fig. 13 Ž left top graph. depicts the mean of the conditional cdfs of Cd concentration modeled using an indicator approach. Fig. 13. Ordinary kriging estimates of cadmium concentration and the corresponding estimation variances Žright column.. Left maps show the mean ŽE-type estimate. and variance of ccdfs modeled using an indicator approach. P. GooÕaertsr Geoderma 89 (1999) 1–45 31 This map of conditional means, also known as E-type estimates, looks similar to the map of Cd concentrations estimated using ordinary kriging Ž Fig. 13, right top graph.. The advantage of the indicator approach over ordinary kriging is that it provides a measure of uncertainty that accounts for the local data, whereas the kriging variance depends only on the data configuration and semivariogram model; recall previous discussion and Fig. 10. These differences are clear on the two maps at the bottom of Fig. 13 which depict the variance of the conditional cdfs and the kriging variance, respectively. The conditional variance is larger in the high-valued parts of the study area where the Cd measurements fluctuate the most and so the largest uncertainty is intuitively expected. The uncertainty is smaller on Argovian rocks where Cd concentrations are consistently small. In contrast, the kriging variance map indicates greater uncertainty in the extreme west corner of the study area where data are sparse, whereas the uncertainty is smallest near data locations. Elsewhere the kriging variance is about the same whatever the surrounding data values. As mentioned previously, several methods could be used to correct for the lack of stationarity of the variance, such as data transformation, stratification of the study area, or local rescaling of a relative semivariogram. Mapping metal concentrations or other soil properties is often a preliminary step towards decision making, such as the delineation of polluted areas or the identification of zones that are suitable for crop growth. For soil pollution, a straightforward approach consists of declaring contaminated all locations where the pollutant concentration estimate exceeds the regulatory threshold. Similarly, a farmer may decide to grow a given crop wherever the estimated value of the limiting factor Že.g., depth to parent material, soil acidity. exceeds some threshold. Such an approach has two drawbacks: Ž1. the estimation error is ignored: a contaminated location can be declared safe on the basis of a wrong estimate of pollutant concentration which is slightly less than the regulatory threshold, Ž2. the decision rule requires crip thresholds such as provided by environmental protection agencies for soil pollution. For land evaluation, however, the use of crisp thresholds is generally inappropriate in that the crop response is continuous from a slight reduction in crop performance through to crop failure ŽBurrough, 1989. . For environmental applications where crisp thresholds exist the uncertainty about the predicted soil attribute can be accounted for by estimating and mapping the probability of exceeding those thresholds. A common question is then above which level of risk should we decide to clean a polluted area or develop specific land use policies ŽGoovaerts, 1997b. . When the probability is very large or very small, the risk-based decision is quite straightforward. Decision making is much more difficult for locations with intermediate probabilities, say in the interval w0.3,0.7x. Depending on the resources available, complementary investigation could be made to reduce the uncertainty at these locations, which amounts to decreasing or increasing the intermediate probabilities. Never- P. GooÕaertsr Geoderma 89 (1999) 1–45 32 theless, the choice of a probability threshold is mainly subjective: a given probability of contamination may be unacceptable for residential areas, yet tolerable for industrial yards. One must also keep in mind that probabilities are but estimates and so are subject to errors as the estimates of attribute values. The practical use of probability maps in decision making faces the difficult choice of crisp thresholds for attribute values and probability thresholds. Instead of the abstract notion of risk, decision makers usually prefer dealing with a financial assessment of the different options. For soil pollution, they would like to know the financial costs that might result from a wrong decision, i.e., declaring safe a contaminated location or cleaning a safe location. The idea is to go beyond a mere assessment of the risk and provide decision makers with a set of alternative solutions and the corresponding potential costs Ž Srivastava, 1987; Journel, 1987; Goovaerts and Journel, 1995; Goovaerts et al., 1997. . This requires, however, that decision makers provide scientists with the impact or cost functions specific to the problem at hand. Such an approach supports the assertion of Bouma Ž1997. that researchers must interact with stakeholders Žfarmers, planners and politicians. to analyze problems and propose cost-effective solutions. Fig. 14 Žleft top graph. shows an example of cost functions for soil contamination by cadmium Ž regulatory threshold z c s 0.8 mg kgy1 .. Consider first the decision of taking no remedial measure at a location u. If this location is actually safe, z Ž u. F z c , then the decision is correct and there is no cost. Otherwise, the cost of misclassifying a contaminated location Ž potential ill health, legal liability. is modeled as proportional to the actual contamination w z Žu. y z c x: L1 Ž z Ž u . . s ½ 0 if z Ž u . F z c z Žu. y z c otherwise Ž 30. The second cost function expresses the potential consequences of taking the decision of cleaning a location. If this location is actually safe, there is undue application of remedial measures and the corresponding cost is here modeled as a constant value of 2.5: L2 Ž z Žu. . s ½ 0 2.5 if z Ž u . ) z c otherwise Ž 31. The actual cost attached to either type of decision cannot be computed because the actual concentration z Ž u. of pollutant is unknown. A ccdf model allows one to account for the uncertainty about the unknown value and determine the expected cost for the two alternatives: q` wi Žu. s E Li Ž Z Žu. . < Ž n . s Hy` L Ž z Žu. . d F Žu; z < Ž n . . i s 1,2 i Ž 32 . P. GooÕaertsr Geoderma 89 (1999) 1–45 33 Fig. 14. Classification of locations as contaminated by cadmium on the basis that the resulting expected cost Žunnecessary cleaning. is smaller than the cost associated with wrongly classifying a location as safe Žpotential ill health.. Expected costs are computed using ccdf models and cost functions specific to each type of decision. This integral is, in practice, approximated by the discrete sum: Kq 1 w i Ž u . , Ý L i Ž z k . F Ž u; z k < Ž n . . y F Ž u; z ky1 < Ž n . . i s 1,2 Ž 33. ks1 where z k , k s 1, . . . , K, are K threshold values discretizing the range of variation of z-values, and z k is the mean of the class Ž z ky1 , z k x, e.g., z k s Ž z ky1 q z k .r2. Fig. 14 Žbottom graphs. shows the two maps of expected costs. At each location the health cost and the remediation cost are compared, and the decision of cleaning the location is taken if one can expect the health cost to be larger than the cost of unnecessary remediation. Rautman Ž 1997. presented a more sophisticated risk-based economic-decision model to evaluate the performance of different technologies for measuring uranium activity of in-situ soils. The concept of a cost function in soil pollution is analogous to that of a membership function introduced by Burrough Ž 1989. for land evaluation problems. The membership function is here used to express the suitability of a soil 34 P. GooÕaertsr Geoderma 89 (1999) 1–45 for a given purpose, say a given crop A, as a function of the value of a soil property. For example, the following function illustrates the impact of soil acidity on the growth of crop A: °1 if z Ž u . F 5 if z Ž u . ) 7 L Ž z Ž u . . s~0 ¢Ž7 y z Žu. . r2 if 5 - z Žu. F 7 The membership ranges from 1 to 0 and reflects the possibility that soil is too acid for crop A. This gradual response of the crop to soil pH is more realistic than crop failure below a crisp pH threshold. The actual membership cannot be computed since the actual pH value at u is unknown. However, the expected membership can be computed from the ccdf model using the same expression as for the expected cost ŽEq. Ž33.. . The concept of expected membership was introduced by Lark and Bolam Ž 1997. under the name of ‘weighted membership.’ This approach allows one to combine two types of uncertainty: the uncertainty about the value of a soil property at an unsampled location and the imprecision Ž fuzziness. of the impact of that soil property on land suitability. Note that one still faces the difficult problem of choosing a membership threshold for decision making, which could be overcome by converting memberships into economic impacts. 5.4. Categorical attributes Unlike continuous variables, categorical attributes such as texture or water table classes cannot be estimated as a mere linear combination of neighboring observations. In many situations, the unsampled location is simply allocated to the same category as the nearest observation, i.e., in the same Thiessen polygon or Dirichlet tile. Such an approach has two serious weaknesses: it ignores spatial correlation and transition probabilities between categories, and it provides no measure of the reliability of the prediction. Qualitative information can be handled by indicator algorithms as long as it is coded into indicator values, say 1 if the category is present and 0 otherwise; recall expression Ž4.. Soft indicators, i.e., probabilities valued between zero and one, can also be considered to account for the uncertainty in the classification of sampled locations Žfuzzy classification.. Then, indicator kriging is used to estimate the probability for each state sk of the attribute s to occur at the unsampled location u: p Ž u; sk < Ž n . . s Prob S Ž u . s sk < Ž n . 4 k s 1, . . . , K Ž 34. Secondary information such as provided by differences in lithology, landform or drainage can also be incorporated in the estimation of conditional probabilities. The prediction process amounts to allocating u to a single category on the basis of the set of conditional probabilities Ž Eq. Ž 34.. . Bierkens and Burrough Ž1993a,b. proposed to use as predictor the category with the largest probability P. GooÕaertsr Geoderma 89 (1999) 1–45 35 of occurrence which defines the ‘map purity.’ Such a criterion typically leads one to allocate most of the locations to the most frequent categories since the probability of occurrence is likely to be larger if the corresponding global proportion is large. Conversely, the less frequent categories tend to be underrepresented. This approach is thus inadequate if one aims at reproducing sample proportions that are deemed representative of the entire area. One solution consists of preferentially allocating locations to the category with the largest probability of occurrence under the constraint of reproduction of global proportions Ž Soares, 1992. . An application to the mapping of land uses is given in Goovaerts Ž 1997a, p. 357.. 6. Stochastic simulation As illustrated by the map of Fig. 15 Ž left column. , kriging tends to smooth out local detail of the spatial variation of the soil attribute. The variance of ordinary kriging estimates is much smaller than the sample variance sˆ 2 s 0.83, and the experimental semivariogram has a much smaller relative nugget effect than the semivariogram model, which indicates the underestimation of the short-range variability of Cd values. Unlike kriging, stochastic simulation does not aim at minimizing a local error variance but focuses on the reproduction of statistics such as the sample histogram or the semivariogram model in addition to the honoring of data values. In the example of Fig. 15, the same information Ž data, semivariogram model. is used by the kriging and simulation approaches, but the simulated map looks more ‘realistic’ than the map of statistically ‘best’ estimates because it reproduces the spatial variability modeled from the sample information. Stochastic simulation is thus increasingly preferred to kriging for applications where the spatial variation of the measured field must be preserved, such as the delineation of contaminated areas Ž Desbarats, 1996; Goovaerts, 1997c. or the modeling of solute transport in the vadose zone Ž Vanderborght et al., 1997. . 6.1. Modeling the spatial uncertainty One may generate many realizations that all honor the same data and match reasonably well the same statistics. For example, Fig. 16 shows three realizations of the spatial distribution of Cd values that all honor the 259 measurements of Cd concentration displayed in Fig. 1 and reproduce approximately the semivariogram model. The three images are consistent with the sample information, and their differences provide a measure of spatial uncertainty. Features such as zones of large values are deemed certain if seen on most of the realizations, and their probability of occurrence Ž i.e., probability that a given threshold is jointly exceeded at a series of locations. can be computed as long as 36 P. GooÕaertsr Geoderma 89 (1999) 1–45 Fig. 15. Ordinary kriging estimates and simulated values of Cd concentration over the study area. Bottom graphs show the corresponding histograms and standardized experimental and model Žsolid line. semivariograms. The smoothing effect of kriging leads to underestimation of the short-range variability of Cd values. the realizations are equiprobable. Such joint probabilities cannot be derived from the ccdfs introduced in the previous section: each ccdf is specific to a single location and thus provides only a measure of local uncertainty. In many situations, decision making concerns areas or blocks that are much larger than the measurement support, such as the delineation of 1-ha remediation units from soil core measurements. Provided the soil variable averages linearly, P. GooÕaertsr Geoderma 89 (1999) 1–45 37 Fig. 16. Three realizations of the spatial distribution of Cd values over the study area and the corresponding standardized semivariograms. Differences between realizations provide a model for the uncertainty about the distribution in space of Cd values. the average value over the large support can be estimated by block kriging or, indirectly, by the arithmetical mean of kriging estimates at a series of locations discretizing this support. This linear averaging, however, does not apply to probability values: the probability that the average attribute value exceeds a given threshold Ž block probability. is not equal to the linear average of local probabilities of exceedence defined at a series of discretizing locations! Such an upscaling can be done using disjunctive kriging under the stringent assumption of bivariate normality ŽWebster, 1991. . A nonparametric alternative consists of approximating numerically the block probability by generating many simulated block values as linear averages of simulated point values, and counting the proportion of block values that exceed the critical threshold Ž Isaaks, 1990; Kyriakidis, 1997. . The major advantage of the simulation-based approach is that it provides a nonparametric measure of the uncertainty attached to the prediction of a single block or multiple spatially dependent blocks Ž Goovaerts, 1999. . 38 P. GooÕaertsr Geoderma 89 (1999) 1–45 Stochastic simulation can also be used to upscale properties like permeability that do not average linearly in space. Note that the use of block probability may lead to over-optimistic decisions in that large local probabilities of contamination or large local pollutant concentrations are smoothed out through the averaging. One would then prefer dealing with local probabilities of contamination and, for example, decide to clean the block if the probability threshold is exceeded for a given proportion of locations within the block. Such a proportion can be derived using block-indicator kriging ŽGoovaerts, 1997a, p. 306.. Other decision making criteria for soil contamination are discussed in Goovaerts Ž 1997b. and Kyriakidis Ž 1997.. The impact of a given scenario, such as a particular remediation process or land use policy, can be investigated from a simulated map that reproduces aspects of the pattern of spatial dependence or other statistics deemed consequential for the problem at hand Že.g., connectivity of large values, spatial correlation with secondary attributes.. Moreover, having many equiprobable realizations allows one to assess the uncertainty about the consequences of this particular scenario. Consider the problem of assessing the economic impact of declaring the study area safe with respect to Cd. For each realization, this global cost is obtained as the sum of local costs computed from each simulated value using the cost function Ž Eq. Ž 30.. . The three realizations of Fig. 16 yield, respectively, a cost of 1058, 1001, and 825. These differences reflect the uncertainty about the economic impact of taking no remedial measure which results from our imperfect knowledge of the spatial distribution of Cd values in the area. The same operation can be repeated for 100 realizations, and the resulting histogram of costs allows a probabilistic assessment of financial consequences, e.g., there is here a 75% probability that the cost will be larger than 930 Ž Fig. 17. . Fig. 17. The distribution of costs resulting from a wrong decision to declare the study area safe with respect to Cd. This distribution is obtained by applying the cost function of Fig. 14 Žsolid line. to 100 realizations of the spatial distribution of Cd values. P. GooÕaertsr Geoderma 89 (1999) 1–45 39 6.2. Simulation algorithms Like estimation, simulation can be accomplished using a growing variety of techniques which differ in the underlying random function model Ž multi-Gaussian or nonparametric. , the amount and type of information that can be accounted for, and the computer requirements. Goovaerts Ž 1997a. Ž pp. 376–424. provide a detailed description of the main simulation algorithms, and these are available in the public domain geostatistical software library GSLIB ŽDeutsch and Journel, 1998. . The key concept of each class of algorithms is briefly outlined below. Consider the simulation of the continuous attribute z at N grid nodes uXj conditional to the data z Žu a . , a s 1, . . . ,n4 . Sequential simulation amounts to modeling the ccdf F ŽuXj ; z <Ž n.. s Prob ZŽuXj . F z <Ž n.4 , then sampling it at each of the N nodes visited along a random sequence. To ensure reproduction of the semivariogram model, each ccdf is made conditional not only to the original n data but also to all values simulated at previously visited locations. Two major classes of sequential simulation algorithms can be distinguished, depending on whether the series of conditional cdfs are modeled using the multi-Gaussian or the indicator formalism. The probability field approach also requires the sampling of N successive ccdfs. However, unlike in the sequential approach, all ccdfs are conditioned only to the original n data. Reproduction of the semivariogram model is here approximated by imposing an autocorrelation pattern on the probability values used for sampling these ccdfs. Unlike previous simulation algorithms, in simulated annealing the creation of a stochastic image is formulated as an optimization problem without any reference to a random function model. The basic idea is to perturb an initial Žseed. image gradually so as to match target constraints, such as reproduction of the semivariogram model. Stochastic simulation has been seldom used in soil science so far unlike in mining or petroleum. Most applications relate to soil pollution and involve diverse algorithms such as sequential Gaussian simulation Ž Desbarats, 1996. , sequential indicator simulation Ž Goovaerts, 1997c; Garcia and Froidevaux, 1997., probability field Ž Bourgault et al., 1995; Kyriakidis, 1997. . Simulated annealing was used by Sterk and Stein Ž 1997. to generate unsmoothed maps of wind-blown sediment transport. Bierkens and Burrough Ž 1993a,b. used the categorical formalism of sequential indicator simulation to generate maps of water table classes which were then used to assess land suitability for pastures. 7. Conclusions and perspectives The growing interest of soil scientists in geostatistics arises because they increasingly realise that quantitative spatial prediction must incorporate the spatial correlation among observations. Also, geostatistics offers an increasingly 40 P. GooÕaertsr Geoderma 89 (1999) 1–45 wide palette of techniques well suited to the diversity of problems and information soil scientists have to deal with. The recent developments of data acquisition and computational resources have provided the geostatistician with large amounts of information of different types Žcontinuous, categorical. , which can be stored and managed in Geographic Information Systems, and which they can process rapidly. Multivariate approaches, such as factorial kriging analysis, can be used to investigate how the correlation between variables changes as a function of the spatial scale, and to improve our understanding of scale-dependent physical processes. Multivariate geostatistical interpolation also allows one to supplement a few expensive measurements of the attribute of interest Že.g., metal concentrations. by more abundant data on correlated attributes that are cheaper to determine Že.g., pH or elevation. . In particular, indicator geostatistics enables secondary categorical information such as provided by land use or soil map to be accounted for in the prediction of continuous variables. There is still research to be done on the incorporation of variables measured on different supports, in particular the combination of field data with remote sensing information. Another avenue of research is the three-dimensional spatial modeling of soil processes since the development of measurement techniques allows the collection of a larger amount of data in both the horizontal and vertical directions. There is necessarily uncertainty about the attribute value at an unsampled location, and its assessment is critical for some applications. The uncertainty can be assessed from the kriging variance using a Gaussian-type confidence interval centered on the kriging estimate. A more promising approach is to assess first the uncertainty about the unknown, then deduce an estimate that is optimal in some appropriate sense. This can be achieved using an indicator approach that provides not only an estimate but also the probability of exceeding critical values, such as regulatory thresholds in soil pollution or criteria for soil quality. The last five years have witnessed a growing use of geostatistics to create colorful probability maps, yet the practical use of these maps for decision making, such as delineation of contaminated areas or areas of good soil quality, has received little attention. We should now combine probabilistic models provided by geostatisticians with expert knowledge in order to assess the financial impacts of the different options available as is being done in mining. In the future, more research should be devoted to this important topic, in particular the incorporation of the uncertainty related to several correlated attributes such as multiple criteria for land suitability. Another way to model uncertainty is to generate numerous images Ž realizations. that all honor the data and reproduce aspects of the patterns of spatial dependence or other statistics deemed consequential for the problem at hand. A given scenario Ž remediation process, land use policy. can be applied to the set of realizations, allowing the uncertainty of the response Ž remediation efficiency, soil productivity. to be assessed. Stochastic simulation is one of the most active P. GooÕaertsr Geoderma 89 (1999) 1–45 41 areas of research in geostatistics, and one can expect an increasing use of these techniques to model soil spatial uncertainty. The modeling of the space–time variability is also becoming one of the major avenues of research is environmental geostatistics. The interested reader should refer to the following papers for a brief overview of the techniques currently available and references to the few space–time studies that have been conducted Ž 1994. , so far in soil science: Goovaerts and Sonnet Ž 1993. , Papritz and Fluhler ¨ Heuvelink et al. Ž1997.. Finally, geostatistics was originally developed to solve practical problems, namely, evaluating recoverable reserves in mining and its growth bears witness to its utility and success. We should do well to keep this trademark of practicality in soil science and to foster the increasing and successful application of geostatistics to soil-related issues. Acknowledgements This work was partly done while the author was with the Unite´ Biometrie, ´ Universite´ Catholique de Louvain, Belgium. The author thanks Mr. J.-P. Dubois of the Swiss Federal Institute of Technology at Lausanne for the data and the National Fund for Scientific Research Ž Belgium. for its financial support. Data can be downloaded from http:rrwww-personal.engin.umich.edur;goovaertr. References Almeida, A., Journel, A.G., 1994. Joint simulation of multiple variables with a Markov-type coregionalization model. Math. Geol. 26, 565–588. Armstrong, M., 1994. Is research in mining geostats as dead as a dodo? In: Dimitrakopoulos, R. ŽEd.., Geostatistics for The Next Century. Kluwer Academic Publishers, Dordrecht, pp. 303–312. Asli, M., Marcotte, D., 1995. Comparison of approaches to spatial estimation in a bivariate context. Math. Geol. 27, 641–658. Atteia, O., Dubois, J.-P., Webster, R., 1994. Geostatistical analysis of soil contamination in the Swiss Jura. Environ. Pollut. 86, 315–327. Bierkens, M.F.P., Burrough, P.A., 1993a. The indicator approach to categorical soil data: I. Theory. J. Soil Sci. 44, 361–368. Bierkens, M.F.P., Burrough, P.A., 1993b. The indicator approach to categorical soil data: II. Application to mapping and land use suitability analysis. J. Soil Sci. 44, 369–381. Bierkens, M.F.P., 1997. Using stratification and residual kriging to map soil pollution in urban areas. In: Baafi, E.Y., Schofield, N.A. ŽEds.., Geostatistics Wollongong ’96. Kluwer Academic Publishers, Dordrecht, pp. 996–1007. Bouma, J., 1997. The role of quantitative approaches in soil science when interacting with stakeholders. Geoderma 78, 1–12. Bourennane, H., King, D., Chery, ´ P., Bruand, A., 1996. Improving the kriging of a soil variable using slope gradient as external drift. Eur. J. Soil Sci. 47, 473–483. 42 P. GooÕaertsr Geoderma 89 (1999) 1–45 Bourgault, G., Journel, A.G., Lesh, S.M., Rhoades, J.D., Corwin, D.L., 1995. Geostatistical analysis of a soil salinity data set. In: Corwin, D.L., Loague, K. ŽEds.., Proc. of the 1995 Bouyoucos Conference on the Applications of GIS to the Modeling of Non-Point Source Pollutants in the Vadose Zone, Riverside, CA. 1–3 May 1995. U.S. Salinity Lab., Riverside, CA, pp. 53–114. Burrough, P.A., 1983. Multiscale sources of spatial variation in soil: II. A non-Brownian fractal model and its application in soil survey. J. Soil Sci. 34, 599–620. Burrough, P.A., 1989. Fuzzy mathematical methods for soil survey and land evaluation. J. Soil Sci. 40, 477–492. Chu, J., 1996. Fast sequential indicator simulation: beyond reproduction of indicator variograms. Math. Geol. 28, 923–936. Colin, P., Froidevaux, R., Garcia, M., Nicoletis, S., 1996. Integrating geophysical data for mapping the contamination of industrial sites by polycyclic aromatic hydrocarbons: a geostatistical approach. In: Rouhani, S., Srivastava, R.M., Desbarats, A.J., Cromer, M.V., Johnson, A.I. ŽEds.., Geostatistics for Environmental and Geotechnical Applications. American Society for Testing and Materials STP 1283, Philadelphia, pp. 69–87. Desbarats, A.J., 1996. Modeling spatial variability using geostatistical simulation. In: Rouhani, S., Srivastava, R.M., Desbarats, A.J., Cromer, M.V., Johnson, A.I. ŽEds.., Geostatistics for Environmental and Geotechnical Applications. American Society for Testing and Materials STP 1283, Philadelphia, pp. 32–48. Deutsch, C.V., Journel, A.G., 1998. GSLIB: Geostatistical Software Library and User’s Guide: 2nd edn. Oxford Univ. Press, New York, 369 pp. Deutsch, C.V., Lewis, R., 1992. Advances in the practical implementation of indicator geostatistics. In: Proceedings of the 23rd International APCOM Symposium, Tucson, AZ, Society of Mining Engineers, pp. 169–179. de Gruijter, J.J., Walvoort, D.J.J., van Gaans, P.F.M., 1997. Continuous soil maps—a fuzzy set approach to bridge the gap between aggregation levels of process and distribution models. Geoderma 77, 169–195. Dobermann, A., Goovaerts, P., George, T., 1995. Sources of soil variation in an acid Ultisol of the Philippines. Geoderma 68, 173–191. Dobermann, A., Goovaerts, P., Neue, H.U., 1997. Scale-dependent correlations among soil properties in two tropical lowland rice fields. Soil Sci. Soc. Am. J. 61, 1483–1496. Garcia, M., Froidevaux, R., 1997. Application of geostatistics to 3D modelling of contaminated sites: a case study. In: Soares, A., Gomez-Hernandez, J., Froidevaux, R. ŽEds.., geoENV ´ ´ I—Geostatistics for Environmental Applications. Kluwer Academic Publishers, Dordrecht, pp. 309–325. Gomez-Hernandez, J.J., 1997. Issues on environmental risk assessment. In: Baafi, E.Y., Schofield, ´ ´ N.A. ŽEds.., Geostatistics Wollongong ’96. Kluwer Academic Publishers, Dordrecht, pp. 15–26. Goovaerts, P., 1992. Factorial kriging analysis: a useful tool for exploring the structure of multivariate spatial soil information. J. Soil Sci. 43, 597–619. Goovaerts, P., 1994a. Comparison of coIK, IK, and mIK performances for modeling conditional probabilities of categorical variables. In: Dimitrakopoulos, R. ŽEd.., Geostatistics for The Next Century. Kluwer Academic Publishers, Dordrecht, pp. 18–29. Goovaerts, P., 1994b. On a controversial method for modeling a coregionalization. Math. Geol. 26, 197–204. Goovaerts, P., 1994c. Study of spatial relationships between two sets of variables using multivariate geostatistics. Geoderma 62, 93–107. Goovaerts, P., 1994d. Comparative performance of indicator algorithms for modeling conditional probability distribution functions. Math. Geol. 26, 389–411. P. GooÕaertsr Geoderma 89 (1999) 1–45 43 Goovaerts, P., 1997a. Geostatistics for Natural Resources Evaluation. Oxford Univ. Press, New York, 512 pp. Goovaerts, P., 1997b. Accounting for local uncertainty in environmental decision-making processes. In: Baafi, E.Y., Schofield, N.A. ŽEds.., Geostatistics Wollongong ’96. Kluwer Academic Publishers, Dordrecht, pp. 929-940. Goovaerts, P., 1997c. Kriging vs. stochastic simulation for risk analysis in soil contamination. In: Soares, A., Gomez-Hernandez, J., Froidevaux, R. ŽEds.., geoENV I—Geostatistics for Envi´ ´ ronmental Applications. Kluwer Academic Publishers, Dordrecht, pp. 247-258. Goovaerts, P., 1998. Ordinary cokriging revisited. Math. Geol. 30, 21–42. Goovaerts, P., 1999. Geostatistical tools for deriving block-averaged values of soil properties. J. Environ. Quality, submitted. Goovaerts, P., Sonnet, Ph., 1993. Study of spatial and temporal variations of hydrogeochemical variables using factorial kriging analysis. In: Soares, A. ŽEd.., Geostatistics Troia ´ ’92, Kluwer Academic Publishers, Dordrecht, pp. 745-756. Goovaerts, P., Webster, R., 1994. Scale-dependent correlation between topsoil copper and cobalt concentrations in Scotland. Eur. J. Soil Sci. 45, 79–95. Goovaerts, P., Journel, A.G., 1995. Integrating soil map information in modelling the spatial variation of continuous soil properties. Eur. J. Soil Sci. 46, 397–414. Goovaerts, P., Webster, R., Dubois, J.-P., 1997. Assessing the risk of soil contamination in the Swiss Jura using indicator geostatistics. Environ. Ecol. Stat. 4, 31–48. Gotway, C.A., Hartford, A.H., 1996. Geostatistical methods for incorporating auxiliary information in the prediction of spatial variables. J. Agric. Biol. Environ. Stat. 1, 17–39. Goulard, M., 1989. Inference in a coregionalization model. In: Armstrong, M. ŽEd.., Geostatistics, Kluwer Academic Publishers, Dordrecht, pp. 397-408. Goulard, M., Voltz, M., 1992. Linear coregionalization model: tools for estimation and choice of cross-variogram matrix. Math. Geol. 24, 269–286. Heuvelink, G.B.M., 1996. Identification of field attribute error under different models of spatial variation. Int. J. GIS 10, 921–935. Heuvelink, G.B.M., Bierkens, M.F.P., 1992. Combining soil maps with interpolations from point observations to predict quantitative soil properties. Geoderma 55, 1–15. Heuvelink, G.B.M., Musters, P., Pebesma, E.J., 1997. Spatio-temporal kriging of soil water content. In: Baafi, E.Y., Schofield, N.A. ŽEds.., Geostatistics Wollongong ’96. Kluwer Academic Publishers, Dordrecht, pp. 1020-1030. Isaaks, E.H., 1990. The Application of Monte Carlo Methods to the Analysis of Spatially Correlated Data. PhD thesis, Stanford University, Stanford, CA. Isaaks, E.H., Srivastava, R.M., 1989. An Introduction to Applied Geostatistics. Oxford Univ. Press, New York, 561 p. Journel, A.G., 1983. Non-parametric estimation of spatial distributions. Math. Geol. 15, 445–468. Journel, A.G., 1984. The place of non-parametric geostatistics. In: Verly, G., David, M., Journel, A.G., Marechal, A. ŽEds.., Geostatistics for Natural Resources Characterization, Reidel, ´ Dordrecht, pp. 307-355. Journel, A.G., 1987. Geostatistics for the Environmental Sciences. EPA project no CR 811893. Technical report, US Environmental Protection Agency, EMS Laboratory, Las Vegas, NV. Journel, A.G., 1993. Geostatistics: roadblocks and challenges. In: Soares, A. ŽEd.., Geostatistics Troia ´ ’92, Kluwer Academic Publishers, Dordrecht, pp. 213-224. Journel, A.G., 1997. Geostatistics: tools for advanced spatial modeling in GIS. In: Corwin, D.L., Loague, K. ŽEds.., Application of GIS to the Modeling of Non-point Source Pollutants in the Vadose Zone, Special SSSA Publications, pp. 39–55. Journel, A.G., Huijbregts, C.J., 1978. Mining Geostatistics. Academic Press, New York, 600 p. Kyriakidis, P.C., 1997. Selecting panels for remediation in contaminated soils via stochastic 44 P. GooÕaertsr Geoderma 89 (1999) 1–45 imaging. In: Baafi, E.Y., Schofield, N.A. ŽEds.., Geostatistics Wollongong ’96. Kluwer Academic Publishers, Dordrecht, pp. 973-983. Lark, R.M., Bolam, H.C., 1997. Uncertainty in prediction and interpretation of spatially variable data on soils. Geoderma 78, 263–282. Leenaers, H., Okx, J.P., Burrough, P.A., 1990. Employing elevation data for efficient mapping of soil pollution on floodplains. Soil Use and Management 6, 105–114. Leonte, D., Schofield, N., 1996. Evaluation of a soil contaminated site and clean-up criteria: a geostatistical approach. In: Rouhani, S., Srivastava, R.M., Desbarats, A.J., Cromer, M.V., Johnson, A.I. ŽEds.., Geostatistics for Environmental and Geotechnical Applications. American Society for Testing and Materials STP 1283, Philadelphia, pp. 133-145. Matheron, G., 1976. A simple substitute for conditional expectation: the disjunctive kriging. In: Guarascio, M., David, M., Huijbregts, C.J. ŽEds.., Advanced Geostatistics in the Mining Industry, Reidel, Dordrecht, pp. 221-236. Matheron, G., 1982. Pour une analyse krigeante de donnees ´ regionalisees. ´ ´ Centre de Geostatis´ tique, Ecole des Mines de Paris, Report N-732, Fontainebleau. McBratney, A.B., Webster, R., 1983. Optimal interpolation and isarithmic mapping of soil properties: V. Co-regionalization and multiple sampling strategy. J. Soil Sci. 34, 137–162. McBratney, A.B., Webster, R., 1986. Choosing functions for semi-variograms of soil properties and fitting them to sampling estimates. J. Soil Sci. 37, 617–639. Myers, D.E., 1982. Matrix formulation of co-kriging. Math. Geol. 14, 249–257. Odeh, I.O.A., McBratney, A.B., Slater, B.K., 1997. Predicting soil properties from ancillary information: non-spatial models compared with geostatistical and combined methods. In: Baafi, E.Y., Schofield, N.A. ŽEds.., Geostatistics Wollongong ’96. Kluwer Academic Publishers, Dordrecht, pp. 1008-1019. Oliver, M.A., Webster, R., 1986a. Semi-variograms for modelling the spatial pattern of landform and soil properties. Earth Surface Processes and Landforms 11, 491–504. Oliver, M.A., Webster, R., 1986b. Combining nested and linear sampling for determining the scale and form of spatial variation of regionalized variables. Geogr. Anal. 18, 227–242. Papritz, A., Fluhler, H., 1994. Temporal change of spatially autocorrelated soil properties: optimal ¨ estimation by cokriging. Geoderma 62, 29–43. Rautman, C.A., 1997. Geostatistics and cost-effective environmental remediation. In: Baafi, E.Y., Schofield, N.A. ŽEds.., Geostatistics Wollongong ’96. Kluwer Academic Publishers, Dordrecht, pp. 941-950. Rivoirard, J., 1994. Introduction to Disjunctive Kriging and Non-linear Geostatistics. Oxford Univ. Press, New York, 180 pp. Smith, J.L., Halvorson, J.J., Papendick, R.I., 1993. Using multiple-variable indicator kriging for evaluating soil quality. Soil Sci. Soc. Am. J. 57, 743–749. Soares, A., 1992. Geostatistical estimation of multi-phase structures. Math. Geol. 24, 149–160. Srivastava, R.M., 1987. Minimum variance or maximum profitability?. Canadian Industrial Mining Bulletin 80, 63–68. Stein, A., Hoogerwerf, M., Bouma, J., 1988. Use of soil-map delineations to improve Žco.kriging of point data on moisture deficits. Geoderma 43, 163–177. Sterk, G., Stein, A., 1997. Mapping wind-blown mass transport by modeling variability in space and time. Soil Sci. Soc. Am. J. 61, 232–239. Trangmar, B.B., Yost, R.S., Uehara, G., 1985. Application of geostatistics to spatial studies of soil properties. Advances in Agronomy 38, 45–94. Vanderborght, J., Jacques, D., Mallants, D., Tseng, P.H., Feyen, J., 1997. Analysis of solute redistribution in heterogeneous soil: II. Numerical simulation of solute transport. In: Soares, A., Gomez-Hernandez, J., Froidevaux, R. ŽEds.., geoENV I—Geostatistics for Environmental ´ ´ Applications. Kluwer Academic Publishers, Dordrecht, pp. 283–295. P. GooÕaertsr Geoderma 89 (1999) 1–45 45 Van Meirvenne, M., Scheldeman, K., Baert, G., Hofman, G., 1994. Quantification of soil textural fractions of Bas-Zaire using soil map polygons andror point observations. Geoderma 62, 69–82. Verly, G., 1983. The multi-Gaussian approach and its applications to the estimation of local reserves. Math. Geol. 15, 259–286. Vieira, S.R., Hatfield, J.L., Nielsen, D.R., Biggar, J.W., 1983. Geostatistical theory and application to variability of some agronomical properties. Hilgardia 51, 1–75. Voltz, M., Goulard, M., 1994. Spatial interpolation of soil moisture retention curves. Geoderma 62, 109–123. Voltz, M., Webster, R., 1990. A comparison of kriging, cubic splines and classification for predicting soil properties from sample information. J. Soil Sci. 41, 473–490. Wackernagel, H., 1988. Geostatistical techniques for interpreting multivariate spatial information. In: Chung, C.F., Fabbri, A.G., Sinding-Larsen, R. ŽEds.., Quantitative Analysis of Mineral and Energy Resources. Reidel, Dordrecht, pp. 393–409. Wackernagel, H., 1994. Cokriging versus kriging in regionalized multivariate data analysis. Geoderma 62, 83–92. Wackernagel, H., 1995. Multivariate Geostatistics: An Introduction with Applications. SpringerVerlag, Berlin, 256 p. Warrick, A.W., Myers, D.E., Nielsen, D.R., 1986. Geostatistical methods applied to soil science. In: Methods of Soil Analysis, Part 1. Physical and Mineralogical Methods. Agronomy Monograph no. 9, 2nd edn., pp. 53–82. Webster, R., 1991. Local disjunctive kriging of soil properties with change of support. J. Soil Sci. 42, 301–318. Webster, R., Oliver, M.A., 1989. Optimal interpolation and isarithmic mapping of soil properties: VI. Disjunctive kriging and mapping the conditional probability. J. Soil Sci. 40, 497–512. Webster, R., Oliver, M.A., 1993. How large a sample is needed to estimate the regional variogram adequately? In: Soares, A. ŽEd.., Geostatistics Troia ´ ’92, Kluwer Academic Publishers, Dordrecht, pp. 155–166. Webster, R., Atteia, O., Dubois, J.-P., 1994. Coregionalization of trace metals in the soil in the Swiss Jura. Eur. J. Soil Sci. 45, 205–218. Yates, S.R., Warrick, A.W., 1987. Estimating soil water content using cokriging. Soil Sci. Soc. Am. J. 51, 23–30. Zhang, R., Warrick, A.W., Myers, D.E., 1992. Improvement of the prediction of soil particle size fractions using spectral properties. Geoderma 52, 223–234. Zhu, H., Journel, A.G., 1993. Formatting and integrating soft data: Stochastic imaging via the Markov-Bayes algorithm. In: Soares, A. ŽEd.., Geostatistics Troia ´ ’92, Kluwer Academic Publishers, Dordrecht, pp. 1–12.