Optimum Area Sampling Frame Using High Resolution Satellite Images with Operational Objective: How to Conciliate Statistical Requirements and Practical Aspects?

Hélène de Boissezon¹, Hervé Jeanjean²

¹ Remote sensing expert in agronomic statistics, SCOT Conseil, 1 rue Hermès, F-31526 Ramonville, France
² Remote sensing expert in forestry, SCOT Conseil, 1 rue Hermès, F-31526 Ramonville, France

Abstract: Although high resolution satellite images can be used to measure land use characteristics, exhaustive coverage of large areas is usually impossible to envisage. Sample survey methods are a feasible alternative for assessing land use areas. In many cases, classical sampling techniques can be used; however, when spatial correlation is observed between sites, geostatistical principles could be introduced into the inventory design. With classical statistical techniques, the target variable has to be estimated from the elementary measurements on the sample sites through an extrapolation model. Designing such a statistical process raises many issues: the characteristics of the population to be considered, the sampling design itself, the extrapolation model to be applied to the whole study area, and the sampling error. Practical considerations and constraints should not be underestimated in any sample survey. In this paper, statistical issues in designing sampling frames based on high resolution satellite images are addressed, with emphasis on practical implementation. The sampling designs of some operational remote sensing projects around the world are reviewed, such as FAO/FRA 90 and the European MARS, TREES and FIRS programmes. Some recommendations are then proposed for setting up sound sampling schemes.

INTRODUCTION

In Earth observation, analysing and monitoring land surface characteristics such as the proportion of crops or forest types often requires high resolution satellite images. Although some programmes use this type of data on a wall-to-wall basis, full coverage of large areas becomes difficult because of the high cost involved and the huge amount of data to be processed. A sampling scheme based on sound statistical concepts is a feasible alternative, and some operational regional and international programmes have already developed and implemented such procedures. However, the sampling scheme has sometimes been designed a posteriori, i.e. the samples were not selected and located under rigorous rules from the start. Moreover, practical constraints can be either overestimated or underestimated, depending on whether they are given too much or too little consideration. This results in problems with final accuracy and with implementing an extrapolation model. This paper addresses the design of an optimum area sampling frame, giving consideration to both statistical requirements and practical constraints.

RECENT ADVANCES IN SAMPLING DESIGNS

Few investigations have been carried out on the statistical procedures best suited to covering large areas with a sample of high resolution images. However, the literature is rather rich in studies of small areas within satellite images, where sampling schemes can be tested and evaluated: the true variance can be calculated, and the sampling can be repeated as many times as desired.

The geostatistical approach

Geostatistical concepts constitute an alternative to classical sampling techniques. They assume that the sample units are not independent and that their autocorrelation can be used in the model. Geostatistics requires systematic rather than random sampling. The semivariogram is the basic tool in geostatistics.
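As an illustration only, the classical (Matheron) estimator of the experimental semivariogram, gamma(h) = (1 / (2 N(h))) * sum of (z(x) - z(x + h))^2 over the N(h) pairs of sites separated by lag h, can be sketched as follows for values measured on a regular grid of sample sites (for instance the forest proportion of systematically selected scenes). The grid layout, the function name `empirical_semivariogram` and the restriction to lags along the two grid axes (which assumes a roughly isotropic spatial structure) are assumptions made for brevity, not part of the studies cited here.

```python
def empirical_semivariogram(grid, max_lag):
    """Classical (Matheron) estimator of the semivariogram on a regular grid.

    grid: list of equal-length rows of values (hypothetical example: forest
          proportion measured on systematically selected scenes).
    max_lag: largest lag, in grid steps, to evaluate.
    Only pairs aligned with the grid axes (along rows and along columns) are
    used, which assumes the spatial structure is roughly isotropic.
    Returns {lag: gamma(lag)} for every lag that has at least one pair.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    gamma = {}
    for h in range(1, max_lag + 1):
        squared_diffs = []
        for r in range(n_rows):
            for c in range(n_cols):
                z = grid[r][c]
                if c + h < n_cols:  # pair at lag h along the row
                    squared_diffs.append((z - grid[r][c + h]) ** 2)
                if r + h < n_rows:  # pair at lag h along the column
                    squared_diffs.append((z - grid[r + h][c]) ** 2)
        if squared_diffs:  # N(h) > 0
            gamma[h] = sum(squared_diffs) / (2 * len(squared_diffs))
    return gamma


if __name__ == "__main__":
    # Hypothetical forest proportions on a 3 x 4 grid of sample sites.
    toy_grid = [
        [0.10, 0.15, 0.30, 0.40],
        [0.12, 0.20, 0.35, 0.45],
        [0.20, 0.25, 0.40, 0.55],
    ]
    for lag, value in empirical_semivariogram(toy_grid, max_lag=2).items():
        print(lag, round(value, 4))
```

A semivariogram that rises with the lag before levelling off indicates spatial correlation up to that distance; a flat semivariogram suggests that the sites behave as independent units.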
In remote sensing images, strong spatial correlation has been observed in landscapes (Ramstein and Raffy, 1989; Webster et al., 1989; Gohin and Langlois, 1993; Wood et al., 1994). McGwire et al. (1993) showed that the spatial correlation increases with the size of the sample unit. Atkinson (1995) confirmed this result and pointed out that the gain of systematic over random sampling also increases with the size of the sample unit. A minimum distance between sample sites can reduce the effects of spatial correlation (the sampling units become less dependent) and can thus improve the efficiency of the scheme. According to McGwire et al. (1993), geostatistics could be tested and applied in large vegetation mapping and monitoring projects, and the studies carried out by Jupp et al. (1988) are a good starting point for developing such methods. In spite of these suggestions, no complete study has yet demonstrated that geostatistical concepts apply when sampling large areas (country to continent scale) with high resolution images. In particular, spatial correlation between SPOT or TM scenes selected on a systematic basis has still to be proved. The main bottleneck in evaluating the usefulness of geostatistics is the considerable means (number of satellite scenes and processing time) that building and testing a semivariogram would require. Classical statistical concepts therefore still generally prevail, since their application rules and their limits are well known.

Gain of systematic sampling

Previous studies have shown that systematic sampling becomes more efficient as the sample size or the sampling intensity increases (Atkinson, 1995). As the sample size increases, the sampling units get closer to each other and are likely to be more and more correlated; a random sample then tends to include clusters of nearby, partly redundant units, whereas a systematic grid keeps the units evenly spaced. However, it is always advisable to examine carefully the spatial behaviour of the variable to be estimated.
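The gain of systematic sampling can be checked on a simulated transect. The sketch below is only illustrative: it generates a hypothetical spatially autocorrelated population with a first-order autoregressive (AR(1)) model, then compares the exact design variance of the sample mean under systematic sampling with a random start (averaged over all possible starts) with the textbook variance under simple random sampling without replacement, (1 - n/N) S^2 / n. The AR(1) model, the parameter values and all function names are assumptions, not taken from the studies cited above.

```python
import math
import random
import statistics


def ar1_transect(n_units, phi, seed=0):
    """Hypothetical spatially autocorrelated transect: a stationary AR(1) series
    in which neighbouring units have correlation phi (requires |phi| < 1)."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0 / math.sqrt(1.0 - phi ** 2))]  # stationary start
    for _ in range(n_units - 1):
        z.append(phi * z[-1] + rng.gauss(0.0, 1.0))
    return z


def srs_variance_of_mean(population, n):
    """Exact variance of the sample mean under simple random sampling without
    replacement: (1 - n/N) * S^2 / n, with S^2 the population variance (divisor N - 1)."""
    N = len(population)
    return (1 - n / N) * statistics.variance(population) / n


def systematic_variance_of_mean(population, step):
    """Exact variance of the sample mean under systematic sampling with a random
    start: every start in 0..step-1 is equally likely, so average the squared
    deviation of each possible sample mean from the population mean."""
    if len(population) % step:
        raise ValueError("population size must be a multiple of the step")
    pop_mean = statistics.fmean(population)
    sample_means = [statistics.fmean(population[start::step]) for start in range(step)]
    return statistics.fmean((m - pop_mean) ** 2 for m in sample_means)


if __name__ == "__main__":
    population = ar1_transect(2000, phi=0.95, seed=1)
    step = 10  # 200 sample units, i.e. a 10 % sampling intensity
    ratio = systematic_variance_of_mean(population, step) / srs_variance_of_mean(
        population, len(population) // step
    )
    print(f"systematic / random variance ratio: {ratio:.2f}")
```

On a positively autocorrelated transect the ratio should come out well below 1, i.e. systematic sampling is more precise; with phi = 0 (no spatial correlation) it should be close to 1 on average.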
If the variable follows a periodic pattern (a sequence of ridges and valleys, for instance), a bias is likely to be introduced. Although several simulations have confirmed the gain of systematic sampling (Cochran, 1981; Brion and Fournier, 1995), the magnitude of the gain varies and depends on the spatial correlation (Dunn and Harrison, 1993).

Multi-phase/stage sampling

When the population to be assessed is not well known, e.g. when stratification is not possible, it may be convenient to carry out multiple-phase sampling. In remote sensing, double sampling with regression has been tested or recommended in many studies, either between high resolution images and ground measurements (Smiatek, 1995) or between coarse resolution and high resolution satellite images (Mayaux and Lambin, 1995; Kleinn et al., 1993; Päivinen and Pitkänen, 1992; Iverson et al., 1989; Nelson and Holben, 1986). Multistage sampling is a feasible alternative; in that case, the total variance is often mainly due to the first stage.

Gain of stratification

A priori stratification consists in partitioning the population with criteria that are well correlated with the variable to be estimated, and the final gain depends on the strength of this relationship. A posteriori stratification, which consists in grouping sampling units into sets defined after the sample has been drawn, may also provide good results (Cochran, 1981; Dunn and Harrison, 1993).

EXPERIENCE OF SOME LARGE SCALE REMOTE SENSING PROJECTS

Several operational remote sensing projects have set up sound sampling schemes using high resolution satellite images as sampling units. Activity B of the MARS project (Monitoring Agriculture by Remote Sensing) is a European programme carried out by the Joint Research Centre at the Institute of Remote Sensing Applications; it aims to provide fast estimates of crop areas and yields to DG VI in Brussels.
A total of 60 sampling sites (40 × 40 km²) were located after stratification, which corresponds to a 6 % sampling intensity. The sites are allocated in proportion to the agricultural land in the country and coincide with SPOT scenes (K, J grid). A weighting procedure is then used to derive the extrapolation model. The MERA 95 project (MARS extension and Environment Related Applications) is an extension of MARS to Central and Eastern Europe. Its sampling design was slightly modified: there is no stratification, and 40 sites were selected from a systematic sample base, in proportion to agricultural activity.

The FAO Forest Resources Assessment 1990, following recommendations given by statisticians (Czaplewski, 1991), uses a sample of 117 TM sites located randomly in the tropical belt (FAO, 1993). The variables to be estimated are the forest area and the change in forest cover over the last 10 years. The sampling intensity is 10 %, and stratification was performed using ecological criteria. The sample size was set with the objective of a 5 % standard error. The deforestation rate was estimated with an error of 12.5 %, which can be explained by the high coefficient of variation of deforestation.

The FIRS project (Forest Information by Remote Sensing) is another JRC activity in Italy. One of its fundamental activities was to stratify Europe into 115 homogeneous strata using three main criteria: proportion of forest cover, species composition and timber volume. The strata are grouped into 6 statistical regions, in which 223 sites were selected by unaligned systematic sampling. The sampling intensity is 3 % of the total area, or 10 % of the forest land.

The TREES project (Tropical Ecosystem Environment observation by Satellites) uses a double sampling scheme with regression, in which the forest cover proportion derived from AVHRR is the auxiliary variable and the forest cover proportion derived from TM is the target variable.
The sampling unit corresponds to a block of 11 × 11 AVHRR pixels. A total of 1121 units were selected, each TM scene generating more than 200 units (cluster sampling). A regression is then fitted between the forest proportions estimated at TM and AVHRR levels.

DESIGNING AN OPTIMUM SCHEME

Constraints

The constraints in designing an operational sampling scheme are related to the satellite itself (data availability, acquisition delays, revisit frequency for multitemporal analysis, etc.), to the operational character of the system (fulfilment of the requirements and technical feasibility), to efficiency and cost-effectiveness (the measurement error induced by remote sensing techniques must be compared with, and added to, the sampling error), and to the client requirements (terms of reference, and the required precision at a given confidence level).

Definition of a sound sampling scheme using HRSD

The first step is to identify which land cover parameters are to be analysed. The sampling scheme will depend on the characteristics of the land cover type under investigation: its area proportion in the entire domain (study area), the size of its elementary components (size of agricultural plots, for instance), its spatial distribution (presence or absence in the different strata of the study area), its frequency of occurrence (proportion of the land cover type in the strata), and its temporal variation (crop cycle, for instance). With high resolution satellite images, the variable to be estimated with satisfactory precision can be the proportion of a land cover type, e.g. the proportion of agricultural land or of forest. It can also be the change in that proportion, using a multitemporal data set. A priori knowledge of the domain is a critical issue for setting up a sound sampling scheme, since the sample must be as representative as possible of the entire population. If limited knowledge is available, multiphase sampling is recommended.
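The double sampling with regression used in TREES, and recommended here when prior knowledge is limited, can be sketched as follows. This is a minimal illustration of the standard two-phase regression estimator found in sampling textbooks such as Cochran, not the TREES implementation: x is an auxiliary variable observed on a large first-phase sample (for instance AVHRR-derived forest proportion), y is the target variable observed only on a second-phase subsample (for instance TM-derived forest proportion), and the variance is the usual large-sample approximation, which ignores finite population corrections and the cluster structure of the TREES units. Function and variable names are hypothetical.

```python
import statistics


def double_sampling_regression(x_phase1, x_phase2, y_phase2):
    """Two-phase (double) sampling with a linear regression estimator.

    x_phase1: auxiliary values on the large first-phase sample
              (hypothetically, AVHRR-derived forest proportions).
    x_phase2, y_phase2: paired auxiliary and target values on the second-phase
              subsample (hypothetically, AVHRR- and TM-derived proportions).
    Returns (estimated mean of y, approximate variance of that estimate).
    Needs at least 3 paired second-phase units and some spread in x_phase2.
    """
    n1, n2 = len(x_phase1), len(x_phase2)
    if n2 < 3 or len(y_phase2) != n2:
        raise ValueError("need at least 3 paired second-phase units")
    x1_mean = statistics.fmean(x_phase1)
    x2_mean = statistics.fmean(x_phase2)
    y2_mean = statistics.fmean(y_phase2)

    # Least-squares slope of y on x, fitted on the subsample only.
    s_xy = sum((x - x2_mean) * (y - y2_mean) for x, y in zip(x_phase2, y_phase2)) / (n2 - 1)
    slope = s_xy / statistics.variance(x_phase2)

    # Regression estimator: adjust the subsample mean of y by the difference
    # between the first-phase and second-phase means of x.
    y_reg = y2_mean + slope * (x1_mean - x2_mean)

    # Large-sample approximation: residual variance enters at the second-phase
    # size, the variance explained by x at the first-phase size.
    s_res = sum(
        (y - y2_mean - slope * (x - x2_mean)) ** 2 for x, y in zip(x_phase2, y_phase2)
    ) / (n2 - 2)
    s_yy = statistics.variance(y_phase2)
    variance = s_res / n2 + max(s_yy - s_res, 0.0) / n1
    return y_reg, variance


if __name__ == "__main__":
    # Hypothetical data: AVHRR forest proportion on 12 first-phase units,
    # with TM forest proportion available on 5 of them.
    avhrr_all = [0.62, 0.55, 0.71, 0.40, 0.35, 0.80, 0.66, 0.52, 0.47, 0.59, 0.73, 0.44]
    avhrr_sub = [0.62, 0.40, 0.80, 0.52, 0.73]
    tm_sub = [0.58, 0.33, 0.84, 0.49, 0.70]
    estimate, var = double_sampling_regression(avhrr_all, avhrr_sub, tm_sub)
    print(f"forest proportion: {estimate:.3f} (approx. s.e. {var ** 0.5:.3f})")
```

The stronger the correlation between the coarse and fine resolution proportions, the smaller the residual term, and the more the precision is governed by the cheap first-phase sample.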
Table 1 proposes sampling schemes according to the characteristics of the item.

Table 1. Designing a sampling scheme based on the characteristics of the item (Cochran, 1981).

| No. | Characteristics of item | Type of sampling needed |
|---|---|---|
| 1 | Widespread throughout the region, occurring with reasonable frequency in all parts | A general survey with low sampling intensity |
| 2 | Widespread throughout the region but with low frequency | A general survey, but with a higher sampling ratio |
| 3 | Occurring with reasonable frequency in most parts of the region, but with a more sporadic distribution, being absent in some parts and highly concentrated in others | For best results, a stratified sample with different intensities in different parts of the region; can sometimes be included in a general survey with supplementary sampling |
| 4 | Distribution very sporadic or concentrated in a small part of the region | Not suitable for a general survey; requires a sample geared to its distribution |

The type of sampling has to be examined when using satellite images, because measurements are not made at a single point but over an area (a SPOT scene, for instance). The results may be assigned either to the scene centre (a sample of points taken from an infinity of points) or to the entire area (area frame sampling, with a sample taken from N sites covering the entire region). Defining the sample base is an important step of the sampling scheme: in principle, it is composed of all the units (points or frames) making up the entire region. When SPOT scene centres are used as the sample base over a very large region, a geometric problem arises: the tracks narrow towards the poles, which induces a higher density of units. The sampling scheme should take this distortion into account by selecting the sample units on a proportional basis (proportional to the distance between two orbits, for instance).
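One way to select scene centres on a proportional basis is systematic sampling with probability proportional to size (PPS). The sketch below assumes, as a simplification not stated in the paper, that the distance between adjacent orbit tracks, and hence the ground area represented by each scene centre, shrinks roughly with the cosine of latitude; each candidate centre is weighted accordingly, so that the denser rows of centres at high latitudes are not oversampled. The cos-latitude weight and the function names `track_spacing_weight` and `systematic_pps` are hypothetical.

```python
import math
import random
from bisect import bisect_right
from itertools import accumulate


def track_spacing_weight(lat_deg):
    """Hypothetical weight for a scene centre at latitude lat_deg.

    Assumes the distance between adjacent orbit tracks, and hence the ground
    area represented by one scene centre, varies roughly as cos(latitude).
    """
    return math.cos(math.radians(lat_deg))


def systematic_pps(weights, n, rng):
    """Systematic selection of n units with probability proportional to weight.

    Units are laid end to end on a line of length sum(weights); n equally
    spaced points with a random start pick the units they fall in. A unit
    heavier than sum(weights) / n can be picked more than once.
    """
    cumulative = list(accumulate(weights))
    interval = cumulative[-1] / n
    start = rng.random() * interval  # random start in [0, interval)
    last = len(weights) - 1
    # bisect_right finds the unit whose cumulative segment contains each point;
    # min() guards against floating-point rounding at the very end of the line.
    return [min(bisect_right(cumulative, start + i * interval), last) for i in range(n)]


if __name__ == "__main__":
    # Hypothetical sample base: scene-centre latitudes, denser towards the pole.
    latitudes = [0, 0, 20, 20, 40, 40, 40, 60, 60, 60, 60]
    weights = [track_spacing_weight(lat) for lat in latitudes]
    print(systematic_pps(weights, n=4, rng=random.Random(42)))
```

Ordering the sample base geographically (for example along the K, J grid) before selection also spreads the sample over the region, which acts as an implicit stratification.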
Stratification is then often required, especially when the a priori proportion of the item is known with sufficient accuracy. The ideal stratification variate is the variable to be estimated itself; in practice, other sources of information are used, such as ancillary data (existing maps or statistics). The stratification process can lead to different types of strata: strata in which the item is almost absent and in which no sample will be taken, and strata with different proportions of the item (from low to high). It is sometimes advisable to delineate special strata corresponding to other criteria (mountainous areas, swampy zones, etc.). The total area of each stratum must be measured, and the sum of these areas checked against the total area of the inventoried region. The number of strata will be low if a priori knowledge of the population is limited.

Systematic sampling in two dimensions is a good area sampling frame scheme. An alternative is unaligned sampling (Cochran, 1981). A combination of systematic sampling (to select the cluster) and random sampling (to select the unit within the cluster) may also give good results.

The number of sample units in each stratum may be calculated with the following equation (Eq. 1):

$$n_j = n\,\frac{S_j\sqrt{Var_j}}{\sum_{k} S_k\sqrt{Var_k}} \qquad (1)$$

where $n$ is the total sample size, $n_j$ the sample size in stratum $j$, $S_j$ the size of stratum $j$, $Var_j$ the variance of stratum $j$, and the sum runs over all strata. The total sample size $n$ is determined by the required overall precision and the available budget. To optimize the design for cost-effectiveness, the method of Lagrange multipliers can be used to solve the system. In stratified sampling, the overall estimate of the mean value is weighted by the stratum sizes:

$$\bar{Y} = \frac{1}{N}\sum_{j} N_j\,\bar{Y}_j, \qquad N = \sum_{j} N_j$$

where $N_j$ is the size of stratum $j$ ($S_j$ in Eq. 1), $\bar{Y}_j$ the estimated mean value in stratum $j$, and $N$ the total number of units in the sample base. The total error of the design is the sum of the sampling error and the measurement error.
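The allocation and estimation steps of stratified sampling can be sketched as follows. This is a minimal illustration under stated assumptions: the allocation formula (Eq. 1) is taken to be Neyman allocation, n_j = n S_j sqrt(Var_j) / sum_k S_k sqrt(Var_k); the variance of the stratified mean is the standard textbook formula with finite population correction; the required total sample size uses Cochran's Neyman-allocation formula without finite population correction; and the sampling and measurement errors are assumed independent, so that their variances add. All function names and example figures are hypothetical.

```python
import math


def neyman_allocation(n, sizes, variances):
    """Eq. 1 read as Neyman allocation: n_j = n * S_j * sqrt(Var_j) / sum_k S_k * sqrt(Var_k).

    Returns fractional sample sizes; rounding them is left to the user.
    """
    products = [s * math.sqrt(v) for s, v in zip(sizes, variances)]
    total = sum(products)
    return [n * p / total for p in products]


def stratified_mean(sizes, means):
    """Overall mean weighted by stratum sizes N_j: sum(N_j * Y_j) / N."""
    return sum(N_j * y_j for N_j, y_j in zip(sizes, means)) / sum(sizes)


def stratified_variance(sizes, variances, sample_sizes):
    """Variance of the stratified mean, with finite population correction:
    sum over strata of (N_j / N)^2 * (1 - n_j / N_j) * Var_j / n_j."""
    N = sum(sizes)
    return sum(
        (N_j / N) ** 2 * (1 - n_j / N_j) * v_j / n_j
        for N_j, v_j, n_j in zip(sizes, variances, sample_sizes)
    )


def required_total_n(sizes, variances, target_variance):
    """Total n giving target_variance under Neyman allocation, ignoring the
    finite population correction: n = (sum W_j * sd_j)^2 / V, with W_j = N_j / N."""
    N = sum(sizes)
    weighted_sd = sum(N_j / N * math.sqrt(v_j) for N_j, v_j in zip(sizes, variances))
    return weighted_sd ** 2 / target_variance


def total_standard_error(sampling_variance, measurement_variance):
    """Combine sampling and measurement error, assuming they are independent
    so that their variances add."""
    return math.sqrt(sampling_variance + measurement_variance)


if __name__ == "__main__":
    # Hypothetical strata: sizes in sample-base units and prior variances of the
    # land cover proportion (e.g. from ancillary maps or a pilot survey).
    sizes = [100, 300, 600]
    variances = [0.04, 0.01, 0.0025]
    n = math.ceil(required_total_n(sizes, variances, target_variance=0.0001))
    print(n, [round(a, 1) for a in neyman_allocation(n, sizes, variances)])
```

The last function makes the point of the text concrete: once the measurement variance dominates, further reducing the sampling variance barely changes the total standard error.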
When the measurement error is large, it is not worth designing a complex sampling scheme with very high precision, since the overall precision will be low anyway.

CONCLUSION

In sampling statistics, estimates are always given with a confidence interval at a given probability level. This means that we are never absolutely sure of the error made with the sample. It is sometimes difficult to decide how much error the client or the user should tolerate. Moreover, starting from the user requirements and means (the budget allocated for the survey), and looking into the characteristics of the parameter to be estimated, it should be pointed out that there is no single optimum sampling design. For our problem, which consists in optimizing the sample size and the location of the sample units, we conclude that geostatistical concepts should be further investigated before being applied in an operational project. Combining several standard statistical principles often leads to satisfactory results (for instance, systematic sampling at the first stage and random sampling at the second stage). Using standard principles always simplifies the design, and the error can then be easily calculated. In operational projects, using a satellite reference grid (SPOT or TM) seems to be the best way of building up the sampling base (FAO/FRA 90 and MARS projects). However, at high latitudes, the distortion of the grid has to be examined in order to avoid oversampling. This issue will have to be addressed in global surveys over large areas, such as the agricultural statistics project in Russia, which has just been launched with European Union support (TACIS) and in which SCOT Conseil is providing technical assistance.

REFERENCES

Atkinson P., 1994, Testing the efficiency of sampling strategies with simulated remotely sensed data, SFPT, no. 137, p. 12.
Brion P. and Fournier P., 1995, Etude Geostat-Maroc: Partie statistique, CNES report, INSEE and SCEES work.
Chevrou R.B., 1988 (March), Inventaire Forestier National, Méthodes et procédures, D E W Montpellier.
Cochran W.G., 1981 (March), Sampling Techniques, third edition.
Czaplewski R.L., 1991, Analysis of alternative sample survey designs, FAO.
Dagnelie P., 1973, Théorie et méthodes statistiques, Volume 1, Centre de Documentation de Toulouse.
Dagnelie P., 1975, Théorie et méthodes statistiques, Volume 2, Centre de Documentation de Toulouse.
Dunn R. and Harrison A.R., 1993, Two-dimensional systematic sampling of land use, Applied Statistics, 42(4), pp. 585-601.
Fitzpatrick-Lins K., 1981 (March), Comparison of sampling procedures and data analysis for a land-use and land-cover map, Photogrammetric Engineering and Remote Sensing, 47(3), pp. 343-351.
Food and Agriculture Organization (FAO), 1993, Forest resources assessment 1990: tropical countries, FAO Forestry Paper 112, Rome, 61 pp.
Gallego F.J., Vossen P., Dallemand J.F. and Perdigao V., 1995, Sampling plans in MERA project, MARS project.
Houllier F., 1986 (6 June), Échantillonnage et modélisation de la dynamique des peuplements forestiers: application au cas de l'Inventaire Forestier National, PhD thesis.
Iverson L.R., Cook E.A. and Graham R.L., 1989, A technique for extrapolating and validating forest cover across large regions: calibrating AVHRR data with TM data, International Journal of Remote Sensing, 10(11), pp. 1805-1812.
Kleinn C., Dees M. and Pelz D.R., 1993, Sampling aspects in the TREES project: global inventory of tropical forests, final report, Freiburg Universität, Germany, contract for JRC Ispra, Italy.
McGwire K., Friedl M. and Estes J.E., 1993, Spatial structure, sampling design and scale in remotely-sensed imagery of a California savanna woodland, International Journal of Remote Sensing, 14(11), pp. 2137-2164.
Nelson R. and Holben B., 1986, Identifying deforestation in Brazil using multiresolution satellite data, International Journal of Remote Sensing, 7, pp. 429-448.
Päivinen R. and Pitkänen J., 1992, Calibrating AVHRR data with TM data for tropical forest cover assessment, IUFRO S4.02.05 Wacharakitti international workshop "Remote sensing and permanent plot techniques for world forest monitoring", Pattaya, Thailand.
Ram M., Lacaze B., Rambal S. and Winkel T., 1994 (August), Identifying spatial patterns of Mediterranean landscapes from geostatistical analysis of remotely-sensed data, International Journal of Remote Sensing, 15(12), special issue: scaling in remote sensing.
Ram M. and Puech C., 1994 (August), Thresholds of homogeneity in targets in the landscape: relationship with remote sensing, International Journal of Remote Sensing, 15(12), special issue: scaling in remote sensing.
SCOT Conseil and GAF for JRC EMAP Unit, 1995, FIRS Project, Regionalization and stratification of European forest ecosystems, final report.
SCOT Conseil, CCE, 1994, Documentation Action IV, Estimations rapides des superficies et des productions potentielles au niveau européen, Volume I: Document de synthèse.
SCOT Conseil, CCE, 1994, Documentation Action IV, Estimations rapides des superficies et des productions potentielles au niveau européen, Volume III: Méthodologie.
Smiatek G., 1995, Sampling thematic mapper imagery for land use data, Remote Sensing of Environment, 52, pp. 116-121.