Determination of the optimum spatiotemporal sampling density to map soil pollution in an intensive mining area

K. Modis1,* and K. Vatalis2

1 School of Mining and Metallurgical Engineering, National Technical University of Athens, 9 Heroon Polytechniou St, 157 80 Zografou Campus, Athens, Greece
2 Dept of Geotechnology & Environmental Engineering, Technological Educational Institution of W Macedonia, 501 00 Kozani, Greece
*Corresponding author: E-mail: kmodis@mail.ntua.gr, Tel: +30 210 7722323, Fax: +30 210 7722156

Abstract

According to previously established theoretical analysis and under certain conditions, a critical sampling grid can be determined for an earth-related spatiotemporally-distributed natural variable. Sampling above this critical limit adds little to the mapping results, while, based on this limit, the ideal process of reproducing the original phenomenon is theoretically defined. The objective of this paper is the application of the above theory to the study of soil pollution in the wider lignite opencast mining and industrial area of Ptolemais, western Macedonia. Geochemical data related to environmental studies in this area show that the waste characteristics favor solubilisation and mobilization of inorganic contaminants and in some cases the generation of acidic leachates. Data acquired for soil pollution by arsenic and various heavy metals were analyzed in the space and time domains, in order to derive their respective correlation functions. The spatiotemporal structural analysis of the deposit generated a spherical covariance model with two ranges, one for the spatial and one for the temporal part, from which the critical sampling grid was identified. The conclusion drawn is that in some parts of the studied area, the sampling grid is denser than required. Finally, an optimal spatiotemporal sampling grid is proposed in order to gain the maximum information at the lowest cost.
Keywords: optimal sampling density; spatiotemporal geostatistics; soil pollution; risk assessment

1. INTRODUCTION

Lignite, a poor quality coal with properties intermediate between those of bituminous coal and peat, is the only significant domestic fossil fuel in Greece. There are significant resources of lignite, located mostly in the northern part of the country, in the region of Western Macedonia. Total exploitable lignite reserves are estimated at about 4 billion tons [1]. Activities associated with lignite and coal mining and beneficiation usually result in the production of huge volumes of reactive solid wastes and the subsequent generation of acidic leachates containing heavy metals and metalloids, causing widespread contamination of soil, surface water and groundwater [2]. During lignite combustion, the major, minor and trace elements may concentrate in the fly ash while the mineral matter undergoes a series of physical and chemical changes [3].

The assessment of the degree of soil and water contamination in wider mining and waste disposal sites is in general a complex process and involves calculation of ecological or health risk in mining, waste disposal, industrial, agricultural or residential areas as well as in ecosystems. The conventional methodology is based on the principle “source – pathway – target” and accounts for spatial and temporal variability of contaminant patterns. A probabilistic assessment incorporates variability of parameters and uncertainty in measurements [4].

Geostatistics is used to predict the extent of soil and groundwater contamination as well as to calculate the risk in active or abandoned mining, waste disposal and urban sites, by accounting for the spatial distribution and uncertainty of the estimates. It facilitates quantification of the spatial features of soil parameters and enables spatial interpolation [5], [6].
Even though they can be highly sophisticated, geostatistical and other estimators are affected by the quality of the sampling campaign. An important question that should be answered is how representative of the formations the samples are, regardless of the estimation procedure. Apart from core recovery and the accuracy of chemical analyses, which can be improved using appropriate techniques, the size of the sampling grid is the main parameter that affects the accuracy of the estimation. In the present paper, the optimum sampling density for the prediction of soil contamination in the wider lignite opencast mining and industrial area of Ptolemais is estimated using a mathematical approach based on information theory.

To date, the usual practice for determining an optimum grid size has been to employ the estimation variance as a criterion of efficiency [7], [8]: one experiments with various grid setups and plots the average estimation variance as a function of sampling density. The optimal grid size is the threshold below which a further decrease in grid spacing offers no improvement in estimation variance. The contribution of this work is that, avoiding this subjective and awkward process, it calculates the theoretical optimum sampling grid size, with a minimum mean square error, that enables the accurate representation of the existing formations. This work is therefore complementary to the methodology followed to date.

2. MATERIALS AND METHODS

The area under study covers a 600 km2 (30 x 20 km) field belonging to the wider lignite mining and waste disposal region of Ptolemais, 600 km north of Athens, Greece (Figure 1).

Figure 1. Map of the study area with marks of the sampling locations

2.1.
Related notions and theorems from sampling theory

Originally developed for deterministic electrical signals [9], [10], information theory states that a band limited random waveform can be reconstructed from its samples if the sampling rate is greater than a critical value depending on the signal characteristics [11]. If the spectrum of a sampled signal is obtained, then from the uniqueness of the Fourier transform, the spectrum of the original signal can also be obtained and in turn the original signal can be reconstructed [12]. The critical sampling rate is called the Nyquist rate.

In real world sampling, random noise is always present; thus the signal can be modelled by a random function f(x). The sampling theorem then states that if f(x) is sampled uniformly at distance Δx, it can be recovered without error from the sample values f(mΔx), m = 0, ±1, ±2, …, provided that the sampling frequency ξxs = 1/Δx is greater than the Nyquist frequency (Figure 2) [13]. The Nyquist frequency for band limited signals is twice the signal bandwidth.

Figure 2. Signal reconstruction

A continuous stationary random field f(x) is called band limited if its power spectral density function S(ξ) is band limited [13]. Consequently, it is proven that

$$\hat{f}(x) = \sum_{m=-\infty}^{\infty} f(m\,\Delta x)\,\operatorname{sinc}(x\,\xi_{xs} - m)$$

converges to f(x) in the mean square sense, where ξxs = 1/Δx, while the sinc function, defined as

$$\operatorname{sinc}(x) = \begin{cases} 1, & x = 0 \\ \dfrac{\sin(\pi x)}{\pi x}, & x \neq 0 \end{cases}$$

is the continuous inverse Fourier transform of the rectangular pulse Π(x) of width 2π and height 1.

The sampling theorem stated above says that a continuous signal sampled at a rate higher than its Nyquist rate can be reconstructed with no error by convolving the sampled signal with a sinc filter of infinite length. In the case of two-dimensional discrete images two sinc filters are used, one for each dimension. In practice, the two sinc functions are truncated to a finite length S determined by the size of the interpolation grid.

Figure 3.
Fourier transform of the spherical correlation model

Returning now to earth related sciences, natural variables under investigation are usually modelled as random fields. For that case, Modis and Papaodysseus [14] demonstrated that the most common random function models described by a covariance function with an influence range a are approximately band limited (for example the “spherical scheme” in Figure 3), with Nyquist interval (lag) Δs = a/2. In other words, if the structural analysis of the ore body reveals an underlying structure with a certain influence range, then the optimum exploratory grid size is half the value of this range.

2.2. Sampling procedure and statistics

Most trace elements in coal are associated with the mineral fraction, although some are organically bound [15]. At sufficiently high exposures, some of these elements (e.g. As, Cd, Cr, Hg, Ni, Pb, Se, Bi and U) can be harmful to human health and to the environment. Although these elements are present in small concentrations in the coal basin (generally a few ppm), the vast amount of coal that is burned annually mobilizes tons of these pollutants [16]. Deposition of these pollutants downwind from power plants can lead to high trace element concentrations in soil and uptake by plants. The pollutants can retard plant growth or enter the food chain, causing adverse health effects in animals and humans [17], [18].

During the period from early 2003 to 2006, 101 soil samples were taken at various times from a set of 48 locations in the wider area (Figure 1), covering not only the waste disposal site but also the surrounding cultivated areas. Between 1 and 4 yearly samples are available for each location. The samples were collected from depths of 10-15 and 15-30 cm and analyzed for a variety of characteristics, including the presence of As, Cr, Ni, Bi and other heavy metals.

Table 1.
Means, ranges and tentative allowable concentrations (ppm) of chemical pollutants

pollutant   mean    stdev   min    median   max     TAC
As          12.30   10.10   2.40   7.10     42.20    7.00
Cr          17.50   19.90   0.60   6.50     60.50   55.00
Ni          10.10   10.50   0.40   6.50     57.50   20.00
Bi          12.70   16.10   0.00   0.30     43.50    0.20

The arithmetic average concentrations and ranges of the most important pollutants (in ppm) are given in Table 1, together with the Tentative Allowable Concentration limits (TAC) for these pollutants in soil according to Clarke and Sloss [19], Swaine [20] and Adriano [21]. A first processing of the data reveals that 51% of the samples exceed the TAC for As, 14% for Cr, 19% for Ni and 61% for Bi.

2.3. Variography

Further statistical analysis of the spatial distribution of the samples and the application of the geostatistical methodology were carried out using the SEKS-GUI software library [22]. This methodology has already been applied successfully to soil and groundwater modeling problems [23], [24], [14].

In order to ensure the proper calculation of the covariance function, which is the main structural characteristic of the spatiotemporal distribution of the pollutants, it is necessary to work on a homogeneous dataset. The data are detrended by applying a Gaussian kernel [25]. To ensure normality of the data, a Normal Scores transformation [26] is also applied to the detrended data.

The next step in the exploratory analysis is the investigation of the systematic dependencies in the data. A physically and statistically acceptable covariance model is sought to describe the correlation among the data. Figure 4 shows the average experimental spatiotemporal covariance over all directions and the fitted spherical model for the case of Cr concentrations. The other pollutants exhibit similar behaviour. According to the fitted models, the spatial range of influence is 3000 to 5000 m. In the temporal part, a spherical structure is again evident in all pollutant distributions, with ranges from 3 to 5 years.
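As an illustration of how the fitted ranges translate into a critical sampling interval, the following minimal Python sketch evaluates a spherical covariance model and applies the half-range rule Δs = a/2 of Section 2.1 to the ranges reported above. It is not part of the SEKS-GUI workflow used in this study. The function names (spherical_covariance, critical_lag, mean_grid_spacing) and the unit sill are assumptions introduced here, and the mean spacing treats the 48 sampling locations as if they lay on a regular square grid over the 600 km2 study area, which is only a rough approximation of the actual, uneven network.

```python
import math


def spherical_covariance(h, a, sill=1.0):
    """Spherical covariance with influence range a.

    The unit sill (normalized covariance) is an assumption of this sketch.
    """
    h = abs(h)
    if h >= a:
        return 0.0  # no correlation beyond the range
    r = h / a
    return sill * (1.0 - 1.5 * r + 0.5 * r ** 3)


def critical_lag(a):
    """Nyquist interval for a model with influence range a (half-range rule)."""
    return a / 2.0


def mean_grid_spacing(area, n_locations):
    """Node spacing of a regular square grid of n_locations over area.

    Idealization only: the real sampling network is not a regular grid.
    """
    return math.sqrt(area / n_locations)


# Ranges reported in the text: 3000 to 5000 m in space, 3 to 5 years in time.
spatial_lags_m = [critical_lag(a) for a in (3000.0, 5000.0)]
temporal_lags_yr = [critical_lag(a) for a in (3.0, 5.0)]

# 48 locations over 600 km2 (= 600e6 m2), idealized as a square grid.
spacing_m = mean_grid_spacing(600e6, 48)

print("critical spatial lag (m):", spatial_lags_m)
print("critical temporal lag (years):", temporal_lags_yr)
print("idealized mean spacing (m):", round(spacing_m))
print("normalized covariance at half the range:",
      spherical_covariance(1500.0, 3000.0))
```

Under these assumptions the half-range rule gives spatial lags of 1500 to 2500 m and temporal lags of 1.5 to 2.5 years, while the idealized mean spacing of the existing network is about 3.5 km. The strict half-range values for the temporal part are slightly longer than the 1 to 2 years quoted in Section 3, which appears to be expressed in whole sampling years.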
Figure 4. Experimental and model normalized spatiotemporal covariance of Cr concentrations.

3. RESULTS AND DISCUSSION

Using the above mentioned Modis and Papaodysseus [14] formula to estimate the optimal sampling grid, we get a value of 1500 to 2500 m for the spatial part and 1 to 2 years for the temporal part, depending on the model. From the above analysis and referring to Figure 1, it is apparent that the area, while locally over-sampled, is generally under-sampled, since the average sampling grid is sparser than the recommended one. The sampling locations are distributed unevenly, leading to over-sampling of certain neighbourhoods and under-sampling of the rest, whereas the optimum sampling grid could have been obtained at a more or less similar cost. It is clear that, in order to reduce the costs of the sampling survey while maximizing the information input, the sampling grid could be planned more effectively. From the temporal point of view, it seems that the area is sufficiently sampled with the annual samples, though a biennial sampling rate could possibly be enough for certain pollutants.

4. CONCLUSIONS

From the covariance models generated by the structural analysis, one for each pollutant, an average critical spatiotemporal sampling grid was identified. The conclusion drawn from the analysis is that in some parts of the area the sampling grid is denser than required. The design of the sampling program can be formulated as an optimisation problem. The selection of the appropriate sampling network in relation to the grid size can help to save money and time as well as to maximize the information about the spatiotemporal distribution of soil pollution.

References

1. Kavouridis, K., 2008. Lignite industry in Greece within a world context: Mining, energy supply and environment. Energy Policy, 36, 1257-1272.
2. Komnitsas, K., A. Kontopoulos, I. Lazar, M. Cambridge, 1998.
Risk assessment and proposed remedial actions in coastal tailings disposal sites in Romania, Miner. Eng., 1179-1190.
3. Filippidis, A. and Georgakopoulos, A., 1992. Mineralogical and chemical investigation of fly ash from the main and northern lignite fields in Ptolemais Greece, Fuel, 71(4), 373-376.
4. Komnitsas, K. and K. Modis, 2006. Soil risk assessment of As and Zn contamination in a coal mining region using geostatistics, Sci. Total Environ., 190-196.
5. Journel, A.G. and Huijbregts, C.J., 1978. Mining Geostatistics, Academic Press, London.
6. Modis, K. and Komnitsas, K., 2007. Optimum sampling density for the prediction of acid mine drainage in an underground sulphides mine. Mine Water Environ., 237-242.
7. David, M., 1976. What Happens If? Some Remarks on Useful Geostatistical Concepts in the Design of Sampling Patterns, in Proceedings, Australasian Institute of Mining and Metallurgy, Symposium on Sampling Practices in the Minerals Industry, 1-15.
8. Dowd, P.A. and Milton, D.W., 1987. Geostatistical Estimation of a Section of the Perseverance Nickel Deposit, in Geostatistical Case Studies, G. Matheron and M. Armstrong (eds), Reidel, 39-67.
9. Whittaker, E.T., 1915. On the Functions which are Represented by the Expansions of the Interpolation Theory. Proc. Roy. Soc. Edinburgh, Section A 35, 181-194.
10. Shannon, C.E., 1949. Communications in the Presence of Noise, Proc. IRE 37, 10-21.
11. Lloyd, S.P., 1959. A Sampling Theorem for Stationary (Wide Sense) Stochastic Processes, Trans. Am. Math. Soc. 92, 1-12.
12. Papoulis, A. and Pillai, U., 2002. Probability, Random Variables and Stochastic Processes, McGraw Hill, New York.
13. Jain, A.K., 1989. Fundamentals of Digital Image Processing, Prentice Hall, 84-99.
14. Modis, K. and Papaodysseus, K., 2006. Theoretical Estimation of the Critical Sampling Size for Homogeneous Orebodies with Small Nugget Effect. Mathematical Geology, 38, 489-501.
15. Finkelman, R.B., 1981.
Modes of occurrence of trace elements in coal. US Geological Survey Open-File Report, 81-99.
16. Orem, W.H. and Finkelman, R.B., 2004. Coal Formation and Geochemistry. US Geological Survey, Reston, VA, USA. Treatise on Geochemistry, 7, 191-222.
17. Adriano, D.C., 2001. Trace Elements in Terrestrial Environments: Biogeochemistry, Bioavailability and Risks of Metals, Springer, New York.
18. Swaine, D.J., 1989. Environmental aspects of trace elements in coal, J. Coal Qual. 8, 67–71.
19. Clarke, L.B. and Sloss, L.L., 1992. Trace elements from coal combustion and gasification. IEACR 149, IEA Coal Research, London.
20. Swaine, D.J., 1990. Trace Elements in Coal, Butterworths, London.
21. Adriano, D.C., 1986. Trace elements in the terrestrial environment, Springer-Verlag, New York.
22. Yu, H-L., Kolovos, A., Christakos, G., Chen, J-C., Warmerdam, S. and Dev, B., 2007. Interactive spatiotemporal modeling of health systems: the SEKS–GUI framework. Stoch Environ Res Risk Assess, 555-572.
23. Christakos, G., 1998. Spatiotemporal information systems in soil and environmental sciences, Geoderma, 141-179.
24. Serre, M.L., G. Christakos, H. Li, C.T. Miller, 2003. A BME solution of the inverse problem for saturated groundwater flow, Stoch Environ Res Risk Assess, 354-369.
25. Hart, J.D. and Wehrly, T.E., 1986. Kernel regression estimation using repeated measurements data, J. Am. Statist. Assoc., 1080–1088.
26. Bogaert, P., Serre, M. and Christakos, G., 1999. Efficient computational BME analysis of non-Gaussian data in terms of transformation functions, in S.J. Lippard, A. Naess and R. Sinding-Larsen (Eds): Proceedings of IAMG'99, Fifth Annual Conference of the International Association for Mathematical Geology, Tapir, Trondheim, Norway, 57-62.