Sampling Satellite Images for Area Estimates in a Large Region

MARS Project, IRSA, JRC, 21020 Ispra (Va), Italy

Abstract. This paper describes two alternative methods (square sites and strips) for selecting a sample of sites for satellite image analysis, giving priority to a particular type of area (intensive agricultural or forest) when no suitable stratification is available. Both methods are based on two-phase sampling. For square sites, a systematic sample is first drawn on the SPOT reference grid. The pre-selected sites are visually photo-interpreted to obtain a rough estimate of the percentage of agricultural land or forest, which determines the sampling probability in the second phase. According to a simulation study based on ground survey data in Spain, the alternative scheme, based on thin satellite image strips, seems to be more efficient than sampling square sites.

INTRODUCTION

We consider here two problems for which high resolution satellite images are sampled because analysing a complete coverage is too expensive:

- Area estimation of a particular land cover (an annual crop, forest, etc.) or estimation of area change between two dates.
- Accuracy assessment of a land cover map obtained by other means, for example by multi-temporal analysis of low resolution satellite images.

The Institute for Remote Sensing Applications (IRSA) of the European Commission (EC) is involved in several projects where these problems are tackled:

1. Rapid estimates of area change for the main crops in the European Union (EU), part of the MARS Project (Monitoring Agriculture with Remote Sensing).
2. Multi-country crop area estimates in central Europe, part of the MERA-95 Project (MARS and Environmental Related Applications).
3. FIRS Project (Forest Information from Remote Sensing): forest mapping and estimation in Europe, from the Atlantic to the Urals (Kennedy et al., 1995).
4. TREES Project (Tropical Ecosystem Environmental observations by Satellite): forest mapping in the inter-tropical belt (Malingreau et al., 1995).
5. Estimation of crop area change in countries of the former Soviet Union.

The examples we present below refer to items 1 and 2 (MARS Project), where the purpose is mainly statistical and mapping is not a priority. For FIRS and TREES (items 3 and 4) mapping is the main priority, but area estimation and evaluation of mapping accuracy are associated targets (Mayaux and Lambin, 1995). Item 5 refers to a contract of the EC with the French company "SCOT Conseil", in which IRSA will be responsible for site sampling.

SAMPLING SITES OF 40 KM x 40 KM

This unit size is reasonable for multitemporal analysis of high resolution satellite images. Sites of this size are large enough to include a variety of land cover types in most landscapes and, if conveniently located, fit inside marketed images.

The "Rapid Crop Area Estimates in the EC" of the MARS Project

Rapid crop area estimates are being produced in the European Union by analysis of high resolution satellite images on a sample of sites of 40 km x 40 km (figure 1). Estimates are computed for inter-annual variations, not for a particular year (Pous et al., 1995; Sharman and de Boissezon, 1992). The method is run on a non-random sample of 53 square sites of 40 km x 40 km for EU-12 (the 12 members before 1995). With the 3 new member states (Austria, Finland and Sweden), the number of sites has become 60. Ground surveys are conducted on a sample of approximately 16 segments of 1400 m x 1400 m in each site (Carfagna and Gallego, 1995): 40 points per segment are visited and a subsample of farms is interviewed (Gallego et al., 1994). Extrapolating the results raises some problems because of the non-random character of the sample of sites.
Sampling Sites in Central Europe

A different approach has been followed to draw a sample of sites in Central Europe (Poland, Czech Republic, Slovakia, Hungary, Romania and Bulgaria) in the frame of the MERA-95 Project (MARS and Environmental Related Applications). The sampling procedure, described in more detail by Vossen et al. (1995), has two phases:

- First sampling phase: A systematic sample is selected by blocks on the K-J indexes of the "SPOT reference grid". K is related to the satellite orbit, and J refers to the latitude. Sites on the border are selected if more than 50% of the site is inside the studied region. In the example of central Europe, 58 sites were selected in this phase (figure 2). The sampling intensity is higher in the north of the area because the distance $d(\mathrm{lat}_s)$ between contiguous satellite orbits is shorter there.
- Second sampling phase: SPOT XS digital quick-looks or Landsat-TM photographic printouts of the preselected sites were analysed for a rough visual estimation $A_s$ of the percentage of agricultural land. 40 sites were subsampled without replacement with a probability proportional to $P_s = A_s \times d(\mathrm{lat}_s)$ (figure 3).

Estimates are computed with a Horvitz-Thompson or π estimator (Särndal et al., 1992), using $1/A_s$ as weights rather than $1/P_s$ to compensate for the variable sampling rate in the first phase.

Figure 2: Systematic sample of sites based on the SPOT reference grid.

Figure 3: Sample of sites for image analysis.

SAMPLING SEGMENTS IN SATELLITE IMAGE STRIPS

An alternative sampling scheme is currently being evaluated. This approach is based on long narrow sites. The site is a strip as long as the studied territory along a satellite orbit track (figure 4). The width of the site can be between 1 km and 10 km, depending on the land use category to be estimated. A width of 1 km or 2 km can suit agricultural estimates, but wider strips would probably be more efficient in forestry.
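The second-phase selection of square sites described above for Central Europe (probability proportional to the rough agricultural share times the distance between orbits) can be illustrated with a minimal Python sketch. It is not the procedure actually used in MERA-95: the function names are hypothetical, the draw uses systematic unequal-probability selection as one common way to sample without replacement, and the estimator shown is the plain Horvitz-Thompson total over the first-phase sample with weights equal to the inverse second-phase inclusion probabilities. The additional correction for the variable first-phase rate (weights in 1/A_s) is not reproduced here.

```python
import random
from itertools import accumulate


def second_phase_probabilities(agri_share, orbit_distance, n):
    """Inclusion probabilities proportional to agri_share * orbit_distance.

    agri_share and orbit_distance are hypothetical per-site inputs; the
    probabilities are scaled so that they sum to the sample size n.
    """
    sizes = [a * d for a, d in zip(agri_share, orbit_distance)]
    if any(s <= 0 for s in sizes):
        raise ValueError("every site needs a positive size measure")
    total = sum(sizes)
    probs = [n * s / total for s in sizes]
    if max(probs) > 1:
        raise ValueError("one site dominates; reduce n or cap its size measure")
    return probs


def systematic_pps_sample(probs, n, rng):
    """Systematic unequal-probability selection without replacement."""
    cum = list(accumulate(probs))
    cum[-1] = float(n)  # guard against rounding drift in the last cumulative sum
    start = rng.random()  # random start in [0, 1)
    selected = []
    for i, upper in enumerate(cum):
        # Selection points are start, start + 1, ..., start + n - 1;
        # a site is taken when the next pending point falls below its upper bound.
        if len(selected) < n and start + len(selected) < upper:
            selected.append(i)
    return selected


def horvitz_thompson_total(values, probs, sample):
    """Sum of sampled values weighted by inverse inclusion probabilities."""
    return sum(values[i] / probs[i] for i in sample)


if __name__ == "__main__":
    # Made-up illustration data: 8 preselected sites, 3 to be retained.
    share = [0.2, 0.5, 0.8, 0.4, 0.6, 0.9, 0.3, 0.7]
    distance = [1.0, 1.0, 1.0, 1.0, 1.2, 1.2, 1.2, 1.2]
    crop_area = [12.0, 30.0, 55.0, 20.0, 41.0, 70.0, 15.0, 48.0]
    probs = second_phase_probabilities(share, distance, 3)
    sample = systematic_pps_sample(probs, 3, random.Random(42))
    print(sample, horvitz_thompson_total(crop_area, probs, sample))
```

When the variable of interest is exactly proportional to the size measure, this estimator returns the true total for every possible sample, which is why a good auxiliary index of agricultural intensity is expected to reduce the variance.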
Different sets of sites can be defined for different satellite sensors, such as Landsat-TM, SPOT-XS, or IRS-1C.

Sampling Segments inside the Strips

Segment sampling inside the strips follows a two-phase procedure, similar to the site sampling scheme of the previous section. The first phase can be systematic or random; in the simulations presented below we have used systematic sampling, which seems to be generally more efficient in a geographic context (Dunn and Harrison, 1993). In the second phase, a subsample is selected with a probability proportional to an index of agricultural intensity (or forest intensity) and to the distance between satellite orbits.

Image Acquisition

One important condition for the strip approach to be efficient is that images are marketed as continuous strips instead of approximately square scenes. In the case of Landsat TM, the image delivered for agricultural purposes would have, for example, a size of 2000 km x 3 km ≈ 40 Mbytes instead of the standard frame of 180 km x 180 km ≈ 200 Mbytes. Strips may be divided into pieces corresponding to different receiving stations.

During the agricultural season, image strips would be delivered regardless of the cloud coverage. In Europe this gives approximately 11 TM images or 6-7 SPOT images per year. The selection of images that can be exploited for land use identification in a particular segment would be made later by the photo-interpreter. The target is to obtain an "exploitable sample" of at least n segments. For annual crops, a possible definition of "exploitable segment" is: a segment for which at least 3 cloud-free images are available during two successive agricultural seasons, with a time interval of at least one month between them. The definition of exploitable segment is to be adapted to each case; for many forest applications, one cloud-free image may be enough. The number of second-phase sample segments required to reach the targeted number of exploitable segments is being studied.
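The definition of "exploitable segment" given above for annual crops can be turned into a simple check. The following Python sketch rests on several assumptions that the text does not settle: "one month" is taken as 30 days, the interval is read as applying between consecutive retained images, and the caller is expected to pass only cloud-free acquisition dates that fall inside the two successive agricultural seasons. The function names are hypothetical.

```python
from datetime import date


def count_spaced_images(cloud_free_dates, min_gap_days=30):
    """Largest number of images that are pairwise at least min_gap_days apart.

    Taking the earliest date and then always the next date that is far
    enough from the last retained one maximises the count.
    """
    count = 0
    last_kept = None
    for acquired in sorted(set(cloud_free_dates)):
        if last_kept is None or (acquired - last_kept).days >= min_gap_days:
            count += 1
            last_kept = acquired
    return count


def is_exploitable(cloud_free_dates, min_images=3, min_gap_days=30):
    """True if enough sufficiently spaced cloud-free images exist (assumed rule)."""
    return count_spaced_images(cloud_free_dates, min_gap_days) >= min_images


if __name__ == "__main__":
    # Made-up acquisition dates for one segment.
    dates = [date(1995, 3, 1), date(1995, 3, 20), date(1995, 4, 15), date(1995, 6, 1)]
    print(is_exploitable(dates))
```

For a forest application where a single cloud-free image is enough, the same check would be called with `min_images=1`.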
Figure 4: Sites on Landsat tracks

COMPARING THE EFFICIENCY OF SAMPLING STRATEGIES

We do not yet have data to test the efficiency of the strip approach with real strips following satellite tracks, but we can simulate the scheme with data from ground survey segments. The data set used for this test comes from the 1992 and 1993 ground surveys of the Spanish Ministry of Agriculture on 8023 square segments of 49 ha in an area of 270,000 km² (Castilla y León, Madrid, Castilla-La Mancha and Andalucía). The set of 8023 segments was obtained by systematic sampling with 3 replicates in blocks of 10 km x 10 km. The sample is described by MAPA (1990), Ambrosio et al. (1993), Fuentes and Gallego (1994) and Gallego (1995).

We consider this set of 8023 segments as the population from which we select samples of 80 segments with different strategies (figure 5). Unequal probability subsampling, when applied, used 1992 data to compute agricultural intensity indexes. We compare area estimates for 1993 and estimates of area change between 1992 and 1993. For each strategy, we repeated the procedure 100 times in order to estimate variances empirically, avoiding the instability problems that are well known for the variance computation of the Horvitz-Thompson estimator (Cochran, 1977; Särndal et al., 1992).

Sampling Strategies Compared

- Random: Simple random sampling without replacement. This is the reference used to compute the relative efficiency of the other strategies as a ratio of variances.
- Systematic: Systematic sampling, ordering population segments by geographic co-ordinates (latitude and then longitude). Uniform probability.
- Proportional: Sample with a probability proportional to arable land in 1992.
- Sites systematic: Systematic sample by clusters of 16 segments approximately arranged in squares of 40 km x 40 km. Uniform probability.
- Sites probabilistic: The same as the previous one, with probability proportional to the arable land in 1992. This approach nearly coincides with the one described above, which has been proposed for the MERA-95 Project.
- Strips: Two-stage sampling: systematic sampling of strips 100 km apart from each other and subsampling with probability proportional to 1992 arable land.

Figure 5: Samples with different strategies (panels: Proportional, Sites probabilistic, Strips)

Comments on Simulation Results

Some provisional conclusions can be drawn from Table 1, which gives the relative efficiency of the studied strategies compared with simple random sampling for the main annual crops and fallow (understood as arable land left to rest in the crop rotation).

Systematic sampling looks slightly superior to random sampling, but there are some important exceptions.

Sampling unclustered segments with probability proportional to an index of agricultural intensity (strategy Proportional) is much more efficient (ratio ranging between 2 and 9) than simple random or systematic sampling. However, this strategy may be unsuitable for satellite image acquisition.

Sampling by square sites of 40 km x 40 km reduces efficiency both with systematic and with unequal probability sampling (Sites systematic compared with Systematic, or Sites probabilistic compared with Proportional). In the case of systematic sampling, efficiency ratios become close to 1 when area changes are estimated instead of crop area for a particular year. For the more efficient unequal probability sampling, sites of 40 km x 40 km still perform poorly.

Sampling by strips is not as good as sampling unclustered segments, but it is more efficient than sampling by square sites and is compatible with satellite image acquisition.

The efficiency gain with two-phase sampling, where the second phase is a subsampling with probability proportional to arable land, is higher than the efficiency obtained in most European countries with area frame sampling stratification (Taylor et al., 1996).
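The relative efficiency used in this comparison, the ratio between the variance under simple random sampling and the variance under the strategy being assessed, both estimated from repeated draws, can be sketched in Python as follows. This is only an illustration under simplifying assumptions: it compares two equal-probability designs (simple random and systematic) on a made-up ordered population, uses the sample mean as the estimator, and does not reproduce the unequal-probability strategies or the Spanish survey data. All function names are hypothetical.

```python
import random
from statistics import variance


def srs_mean(population, n, rng):
    """Sample mean under simple random sampling without replacement."""
    return sum(rng.sample(population, n)) / n


def systematic_mean(population, n, rng):
    """Sample mean under systematic sampling with a random start.

    The population is assumed to be already ordered (for instance by
    geographic co-ordinates) and its size to be at least n.
    """
    step = len(population) // n
    start = rng.randrange(step)
    sample = population[start::step][:n]
    return sum(sample) / len(sample)


def relative_efficiency(population, n, strategy, replicates=100, seed=0):
    """Variance under simple random sampling divided by variance under strategy.

    Values above 1 mean that the strategy is more efficient than simple
    random sampling; both variances are estimated from repeated draws.
    """
    rng = random.Random(seed)
    reference = [srs_mean(population, n, rng) for _ in range(replicates)]
    candidate = [strategy(population, n, rng) for _ in range(replicates)]
    return variance(reference) / variance(candidate)


if __name__ == "__main__":
    # Made-up population with a strong spatial trend, where systematic
    # sampling is expected to do much better than simple random sampling.
    population = [float(i) for i in range(800)]
    print(relative_efficiency(population, 80, systematic_mean))
```

With only 100 replicates the estimated ratio is itself noisy, so small differences from 1 should not be over-interpreted.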
Some exceptions to the comments above can be noted for crops with a relatively small area, as can be seen in the case of sugar beet.

Table 1: Relative efficiency compared with random sampling.
Columns: Wheat 93, Barley 93, Cereals 93, Sunflower 93, Sugar beet 93, Fallow 93, Δ Wheat, [illegible], Δ Cereals, Δ Sunflower, Δ Sugar beet, Δ Fallow.
Rows: Systematic uniform, Proportional, Sites systematic, Sites probabilistic, Strips.
Values legible in the scan (Systematic uniform row, column alignment uncertain): 1.20, 1.61, 1.42, 0.82, 1.11, 1.13, 0.87, 1.00.

CONCLUSIONS

Two-phase sampling may provide a scheme for an adequate use of remote sensing applied to land cover area estimation in large regions, especially if no suitable stratification is available or if the available stratification is not very efficient.

Image acquisition by long thin strips instead of approximately square scenes can improve the efficiency of the associated sampling plans. However, many factors are still to be studied, including image distribution policy, cost of ground surveys and image analysis, and efficiency ratios for different land cover types in different landscapes.

ACKNOWLEDGEMENTS

We are grateful to the Spanish Ministry of Agriculture, in particular to Porfirio Sánchez, José María Fernández del Pozo, and María José Postigo, who kindly provided data to test the efficiency of sampling strategies.

REFERENCES

Ambrosio L., Alonso R., Villa A., 1993, Estimación de superficies cultivadas por muestreo de áreas y teledetección. Precisión relativa. Estadística Española, Vol. 35, pp. 91-103.
Carfagna E., Gallego F.J., 1995, Yield Estimates from Area Frame at European Level. Seminar on Yield Forecasting. Villefranche sur Mer, 24-27 October 1994. pp. 237-240.
Cochran W., 1977, Sampling Techniques. New York: John Wiley and Sons.
Dunn R., Harrison A.R., 1993, Two-dimensional systematic sampling of land use. JRSS: Applied Statistics, vol. 42, n. 4, pp. 585-601.
Fuentes M., Gallego F.J., 1994, Cluster area estimation on a sample by square grid. getting a stable variance for the survey of the Spanish Ministry of Agriculture. Conference on the MARS Project: overview and perspectives. Office for Publications of the E.C. Luxembourg. pp. 146-149.
Gallego F.J., Delincé J., Carfagna E., 1994, Two-Stage Area Frame Sampling on Square Segments for Farm Surveys. Survey Methodology, vol. 20, No. 2, pp. 107-115.
Gallego F.J., 1995, Sampling Frames of Square Segments, Report EUR 16317, Office for Publications of the E.C. Luxembourg. 68 pp. ISBN 92-827-5106-6.
Kennedy P., Folving S., McCormick N., 1995, European Forest Ecosystems Mapping and Forest Statistics: the FIRS Project. Proceedings of the workshop on "Defining a System of Nomenclature for European Forest Mapping", June 13-15, 1994, Joensuu, Finland. pp. 33-48. Office for Publications of the E.C. Luxembourg.
Malingreau J.P., Achard F., D'Souza G., Stibig H.T., D'Souza J., Estreguil C., Eva H., 1995, AVHRR for Global Tropical Forest Monitoring: The Lessons of the TREES Project. Remote Sensing Reviews, Vol. 12, pp. 29-40.
MAPA, 1990, El Marco de Areas como Instrumento de base para la Estadística de superficies de cultivo. Boletín Mensual de Estad. Min. Agric. n. 12, pp. 85-106. Madrid.
Mayaux Ph., Lambin E.F., 1995, Estimation of tropical forest area from coarse spatial resolution data: a two-step correction function for proportional errors due to spatial aggregation. Remote Sens. Environ. n. 53, pp. 1-15.
Pous B., Reboux D., Guérif M., Delécolle R., Fischer A., de Boissezon H., 1995, Prévisions du rendement et information satellitaire: acquis de l'action IV du projet MARS et nouvelles perspectives. Seminar on Yield Forecasting. Villefranche sur Mer, 24-27 October 1994. pp. 567-578.
Särndal C.E., Swensson B., Wretman J., 1992, Model Assisted Survey Sampling. Springer Verlag.
Sharman M., de Boissezon H., 1992, Action IV, de l'image aux statistiques: bilan opérationnel après deux années d'estimation rapides des superficies et des rendements potentiels au niveau Européen. Conference on the Appl. of Remote Sensing to Agricultural Statistics (Belgirate). Office for Publications of the E.C. Luxembourg. pp. 177-186.
Taylor J., Sannier C., Delincé J., Gallego F.J., 1996, Regional crop inventories in Europe assisted by remote sensing (in press).

BIOGRAPHICAL SKETCH

Javier Gallego graduated in Mathematical Statistics in Valladolid (Spain) and obtained a PhD in Multivariate Descriptive Analysis in Paris. He then taught Applied Statistics in Valladolid, where he became Professor in 1986. Then he decided to flee the University and join the European Commission. Since 1988 he has worked at the Joint Research Centre in Ispra (Italy), where he is in charge of statistical aspects of the MARS Project (Monitoring Agriculture with Remote Sensing).