Sampling Satellite Images for Area
Estimates in a Large Region
Abstract. This paper describes two alternative methods (square sites and
strips) to select a sample of sites for satellite image analysis giving
priority to a particular type of area (intensive agricultural or forest) when
no suitable stratification is available. Both methods are based on two-phase sampling. For square sites, a systematic sample is first drawn on
the SPOT reference grid. Pre-selected sites are visually photo-interpreted
for a rough estimation of the percentage of agricultural land or forest,
which determines the sampling probability in the second phase. The
alternative scheme, based on thin satellite image strips, seems to be more
efficient than sampling square sites according to a simulation study
based on ground survey data in Spain.
We consider here two problems for which high resolution satellite images are
sampled because analysing a complete coverage is too expensive:
Area estimation of a particular land cover (an annual crop, forest, etc.) or
estimation of area change between two dates.
Accuracy assessment of a land cover map obtained by other means, for
example by multi-temporal analysis of low resolution satellite images.
The Institute for Remote Sensing Applications (IRSA) of the European
Commission (EC) is involved in several projects where these problems arise:
1. Rapid estimates of area change for main crops in the European Union (EU) of
the MARS Project (Monitoring Agriculture with Remote Sensing).
2. Multi-country crop area estimates in central Europe. This is part of the
MERA-95 Project (MARS and Environmental related Applications).
3. FIRS Project (Forest Information from Remote Sensing). Forest mapping and
estimation in Europe, from the Atlantic to the Urals (Kennedy et al., 1995).
4. TREES Project (Tropical Ecosystem Environmental observations by Satellite):
Forest mapping in the inter-tropical belt (Malingreau et al., 1995).
5. Estimation of crop area change in countries of the former Soviet Union.
MARS Project, IRSA, JRC, 21020 Ispra (Va), Italy
The examples we present below refer to items 1 and 2 (MARS Project),
where the purpose is mainly statistical. Mapping is not a priority. For FIRS and
TREES (items 3 and 4) mapping is the main priority, but area estimation and
evaluation of mapping accuracy are associated targets (Mayaux and Lambin,
1995). Item 5 refers to a contract of the EC with the French company "SCOT
Conseil", in which IRSA will be responsible for site sampling.
This unit size is reasonable for multitemporal analysis of high resolution
satellite images. They are large enough to include a variety of land cover types in
most landscapes, and, if conveniently located, fit inside marketed images.
The "Rapid Crop Area Estimates in the EC" of the MARS Project.
Rapid crop area estimates are being produced in the European Union by analysis
of high resolution satellite images on a sample of sites of 40 km x 40 km
(figure 1). Estimates are computed for inter-annual variations, not for a
particular year (Pous et al., 1995; Sharman and De Boissezon, 1992). The method
is run on a non-random sample of 53 square sites of 40 km x 40 km for EU-12 (the
12 members before 1995). With the 3 new member states (Austria, Finland and
Sweden), the number of sites has become 60. Ground surveys are conducted on a
sample of approximately 16 segments of 1400 m x 1400 m in each site (Carfagna,
1995): 40 points per segment are visited and a subsample of farms is interviewed
(Gallego et al., 1994). Extrapolating the results is problematic because the
sample of sites is not random.
Sampling Sites in Central Europe
A different approach has been followed to draw a sample of sites in Central
Europe (Poland, Czech Republic, Slovakia, Hungary, Romania and Bulgaria) in
the frame of the MERA-95 Project (MARS and Environmental Related
Applications). The sampling procedure, described in more detail by Vossen et al.
(1995), has two phases:
First sampling phase:
A systematic sample is selected by blocks on the K-J indexes of the "SPOT
reference grid". K is related to the satellite orbit, and J refers to the latitude.
Sites on the border are selected if more than 50% is inside the studied region. In
the example of central Europe, 58 sites were selected in this phase (figure 2). The
sampling intensity is higher in the north of the area because the distance d(lat)
between contiguous satellite orbits is shorter.
Second sampling phase:
SPOT XS digital quick-looks or Landsat-TM photographic printouts of the
pre-selected sites were analysed for a rough visual estimation A_s of the
percentage of agricultural land. 40 sites were subsampled without replacement
with a probability proportional to P_s = A_s x d(lat), where d(lat) is the
distance between contiguous satellite orbits (figure 3). Estimates are computed
with a Horvitz-Thompson or π estimator (Särndal, 1992) using 1/A_s as weights
rather than 1/P_s to compensate for the variable sampling rate in the first phase.
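The weighting argument above can be illustrated with a minimal sketch. It
assumes that the first-phase sampling density is proportional to the inverse of
the distance between contiguous orbits, and that the final inclusion probability
of a site is n times its share of the combined size measure. The function names
and all numbers are hypothetical and are not taken from the MERA-95 data.

```python
def second_phase_size(agri_share, orbit_distance_km):
    """Second-phase size measure: rough agricultural share of the site
    times the distance between contiguous orbits at the site."""
    return agri_share * orbit_distance_km


def horvitz_thompson_total(values, inclusion_probs):
    """Horvitz-Thompson estimate of a total: sum of y_i / pi_i over the sample."""
    return sum(y / p for y, p in zip(values, inclusion_probs))


# Hypothetical population of four pre-selected sites (illustrative numbers only).
agri_share = [0.2, 0.4, 0.6, 0.8]          # rough visual estimate per site
orbit_km = [60.0, 60.0, 80.0, 80.0]        # assumed distance between orbits
crop_area = [200.0, 400.0, 600.0, 800.0]   # assumed true crop area per site

# Assumption: first-phase density is proportional to 1 / orbit distance, so the
# product with the second-phase size measure depends on the share alone.
combined = [second_phase_size(a, d) / d for a, d in zip(agri_share, orbit_km)]

n = 2  # number of sites kept in the second phase
# Assumption: inclusion probability = n * combined_i / sum(combined).
incl = [n * c / sum(combined) for c in combined]

sample = [0, 3]  # indices of the sites that happened to be selected
estimate = horvitz_thompson_total(
    [crop_area[i] for i in sample], [incl[i] for i in sample]
)
print(round(estimate, 6))
```

In this toy population the crop area is exactly proportional to the agricultural
share, so the estimate reproduces the true total of 2000 (up to floating-point
rounding) whichever two sites are drawn. With real data the estimate varies from
sample to sample.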
Figure 2: Systematic sample of sites based on
the SPOT reference grid.
Figure 3: Sample of sites for image analysis
An alternative sampling scheme is currently being evaluated. This approach is
based on long narrow sites. The site is a strip as long as the studied territory along
a satellite orbit track (figure 4). The width of the site can be between 1 km and
10 km, depending on the land use category to be estimated. A width of 1 km or 2 km
may be suitable for agricultural estimates, but wider strips would probably be more efficient in
forestry. Different sets of sites can be defined for different satellite sensors, such
as Landsat-TM, SPOT-XS, or IRS-1C.
Sampling Segments inside the Strips
Segment sampling inside the strips follows a two-phase procedure, similar to
the site sampling scheme of the previous section. The first phase can be
systematic or random; in the simulations presented below we have used
systematic sampling, which seems to be generally more efficient in a geographic
context (Dunn and Harrison, 1993). In the second phase, a subsample is selected
with a probability proportional to an index of agricultural intensity (or forest
intensity), and to the distance between satellite orbits.
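A second-phase selection of this kind can be sketched as follows. The text does
not say which unequal-probability algorithm is used, so this example swaps in
systematic selection on cumulated sizes, one common way of drawing a sample with
probability proportional to size. The function name and the size values are
hypothetical.

```python
import random


def systematic_pps(sizes, n, rng):
    """Select n units with probability proportional to size by stepping
    through the cumulated sizes at a fixed interval from a random start.
    A unit larger than the interval can be selected more than once."""
    step = sum(sizes) / n
    start = rng.random() * step
    targets = [start + k * step for k in range(n)]
    picks = []
    cumulated = 0.0
    t = 0
    for index, size in enumerate(sizes):
        cumulated += size
        # Every target that falls inside this unit's interval selects it.
        while t < n and targets[t] < cumulated:
            picks.append(index)
            t += 1
    return picks


# Hypothetical size measure per first-phase segment, for example an index of
# agricultural intensity multiplied by the distance between orbits.
sizes = [1, 2, 3, 4] * 5
picks = systematic_pps(sizes, 5, random.Random(1))
print(picks)
```

Because every size here is smaller than the selection interval (50 / 5 = 10), no
segment can be picked twice, which corresponds to subsampling without
replacement.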
Image Acquisition
One important condition for the strip approach to be efficient is that images
are marketed as continuous strips instead of approximately square scenes. In the
case of Landsat TM, the image delivered for agricultural purposes would have, for
example, a size of 2000 km x 3 km (≈ 40 Mbytes) instead of the standard frame of
180 km x 180 km (≈ 200 Mbytes).
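The orders of magnitude quoted above can be checked with a short calculation.
This is a rough sketch that assumes 30 m pixels, the six reflective TM bands and
one byte per pixel per band. These parameters are assumptions made for the
example and are not stated in the text.

```python
def image_megabytes(length_km, width_km, pixel_m=30.0, bands=6, bytes_per_value=1):
    """Approximate uncompressed size of an image in millions of bytes."""
    pixels = (length_km * 1000.0 / pixel_m) * (width_km * 1000.0 / pixel_m)
    return pixels * bands * bytes_per_value / 1e6


strip_mb = image_megabytes(2000, 3)     # long thin strip
frame_mb = image_megabytes(180, 180)    # standard frame
print(round(strip_mb), round(frame_mb))
```

Under these assumptions the strip comes to about 40 Mbytes and the frame to about
216 Mbytes, which is in line with the approximate figures given above.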
Strips may be divided into pieces corresponding to different receiving
stations. During the agricultural season, image strips would be delivered
regardless of the cloud coverage. This gives in Europe approximately 11 TM
images or 6-7 SPOT images per year. The selection of images that can be
exploited for land use identification in a particular segment would be made later
by the photo-interpreter.
The target is getting an "exploitable sample" of at least n segments. For annual
crops, a possible definition of "exploitable segment" can be: a segment for which
at least 3 cloud-free images are available during two successive agricultural
seasons with a time interval of at least one month. The definition of exploitable
segment is to be adapted to each case; for many forest applications, one
cloud-free image may be enough. The required number of second-phase sample
segments to reach the targeted number of exploitable segments is being studied.
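One possible reading of the definition of an exploitable segment for annual crops
can be sketched as follows. The sketch interprets "a time interval of at least
one month" as at least 30 days between consecutive usable images and does not
check season boundaries. Both points are assumptions, and the function name and
dates are hypothetical.

```python
from datetime import date


def is_exploitable(cloud_free_dates, min_images=3, min_gap_days=30):
    """Return True if at least min_images cloud-free dates can be chosen
    so that consecutive chosen dates are at least min_gap_days apart."""
    chosen = 0
    last = None
    for current in sorted(cloud_free_dates):
        # Taking the earliest admissible date each time maximises the count.
        if last is None or (current - last).days >= min_gap_days:
            chosen += 1
            last = current
    return chosen >= min_images


spread_out = [date(1992, 4, 1), date(1992, 4, 10), date(1992, 5, 15), date(1993, 6, 1)]
too_close = [date(1992, 4, 1), date(1992, 4, 10), date(1992, 4, 20)]
print(is_exploitable(spread_out), is_exploitable(too_close))
```

For a forest application, the same function could be called with min_images=1.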
Figure 4: Sites on Landsat tracks
We do not yet have data to test the efficiency of the strip approach with real
strips following satellite tracks, but we can simulate the scheme with data from
ground survey segments. The set of data used for this test comes from 1992 and
1993 ground surveys on 8023 square segments of 49 ha from the Spanish Ministry
of Agriculture in an area of 270,000 km² (Castilla y León, Madrid, Castilla-La
Mancha and Andalucía). The set of 8023 segments was obtained by systematic
sampling with 3 replicates in blocks of 10 km x 10 km. The sample is described
by MAPA (1990), Ambrosio (1993), Fuentes (1994) and Gallego (1995). We
consider this set of 8023 segments as the population from which we select samples of
80 segments with different strategies (figure 4). Unequal probability subsampling,
when applied, was made using 1992 data to compute agricultural intensity
indexes. We compare area estimates in 1993 or area change estimates 1993-1992.
For each strategy, we have repeated the procedure 100 times in order to estimate
variances without the instability problems that are well known for the variance
computation of the Horvitz-Thompson estimator (Cochran, 1977; Särndal, 1992).
Sampling Strategies Compared
Random: simple random sampling without replacement. This is the reference
to compute the relative efficiency of the other strategies as ratio of variances.
Systematic: Systematic sampling ordering population segments by geographic
co-ordinates (latitude and then longitude). Uniform probability.
Proportional: Sample with a probability proportional to arable land in 1992.
Sites systematic: Systematic sample by clusters of 16 segments approximately
arranged in squares of 40 km x 40 km. Uniform probability.
Sites probabilistic: The same as the previous one with probability proportional
to the arable land in 1992. This approach nearly coincides with the one
described above, which has been proposed for the MERA-95 Project.
Strips: Two-stage sampling: systematic sampling of strips 100 km apart from
each other and sub-sample with probability proportional to 1992 arable land.
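The comparison of strategies by ratio of variances over repeated samples can be
sketched as follows. The sketch uses a synthetic population with a simple trend
rather than the Spanish survey data, and compares only simple random and
systematic sampling. All names and values are hypothetical.

```python
import random
import statistics


def srs_mean(population, n, rng):
    """Mean of a simple random sample drawn without replacement."""
    return statistics.fmean(rng.sample(population, n))


def systematic_mean(population, n, rng):
    """Mean of a systematic sample: random start, then a fixed step."""
    step = len(population) // n
    start = rng.randrange(step)
    return statistics.fmean(population[start::step][:n])


def relative_efficiency(population, n, strategy, reps=100, seed=0):
    """Variance of the simple random sampling estimate divided by the
    variance of the estimate under the given strategy, over reps samples."""
    rng = random.Random(seed)
    reference = [srs_mean(population, n, rng) for _ in range(reps)]
    alternative = [strategy(population, n, rng) for _ in range(reps)]
    return statistics.variance(reference) / statistics.variance(alternative)


# Synthetic population of 8000 ordered units with a smooth trend, so that
# neighbouring units are similar (illustrative only).
population = [float(i) for i in range(8000)]
efficiency = relative_efficiency(population, 80, systematic_mean)
print(efficiency > 1.0)
```

A ratio above 1 means that the strategy gives a smaller variance than simple
random sampling for the same sample size. The strong trend in this synthetic
population favours systematic sampling far more than real crop data would.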
Figure 4: Samples with different strategies (panel: Sites probabilistic)
Comments on Simulation Results.
Some provisional conclusions can be drawn by examining Table 1, which gives
the relative efficiency of the studied strategies compared with simple random
sampling for the main annual crops and fallow (understood as arable land left
to rest in the crop rotation).
Systematic sampling looks slightly superior to random sampling, but there are
some important exceptions.
Sampling unclustered segments with probability proportional to an index of
agricultural intensity (strategy Proportional) is much more efficient (ratio
ranging between 2 and 9) than simple random or systematic sampling.
However this strategy may be unsuitable for satellite image acquisition.
Sampling by square sites of 40 km x 40 km reduces efficiency both with
systematic and unequal probability sampling (Sites systematic compared with
Systematic or Sites probabilistic compared with Proportional). In the case of
systematic sampling, efficiency ratios become close to 1 when area changes
are estimated instead of crop area for a particular year. For the more efficient
unequal probability sampling, sites of 40 km x 40 km still have a poor efficiency.
Sampling by strips is not as good as sampling unclustered segments, but is
more efficient than sampling by square sites and is compatible with satellite
image acquisition.
The efficiency gain with two-phase sampling, where the second phase is a
subsampling with a probability proportional to arable land, is higher than the
efficiency obtained in most European countries with area frame sampling
stratification (Taylor et al., 1996).
Some exceptions to the comments above can be noted for crops with a
relatively small area, as we can check in the case of sugar beet.
Table 1: Relative efficiency compared with random sampling
Area in 1993: Wheat, Barley, Cereals, Sunflower, Sugar beet, Fallow
Area change 1993-1992: Δ Wheat, Δ Cereals, Δ Sunflower, Δ Sugar beet, Δ Fallow
Random (reference strategy): 1.00
Two-phase sampling may provide a scheme for an adequate use of remote
sensing applied to land cover area estimation in large regions, especially if no
suitable stratification is available or if available stratification is not very efficient.
Image acquisition by long thin strips instead of approximately square scenes
can improve the efficiency of associated sampling plans. However, many factors
are still to be studied, including image distribution policy, cost of ground surveys
and image analysis, and efficiency ratio for different land cover types in different
We are grateful to the Spanish Ministry of Agriculture, in particular to Porfirio
Sánchez, José María Fernández del Pozo, and María José Postigo, who kindly
provided data to test the efficiency of sampling strategies.
Javier Gallego graduated in Mathematical Statistics in Valladolid (Spain) and
completed a PhD in Multivariate Descriptive Analysis in Paris. He taught Applied
Statistics again in Valladolid, where he became Professor in 1986. Then he
decided to flee the University and join the European Commission. Since 1988 he
has worked in the Joint Research Centre in Ispra (Italy), where he is in charge of
statistical aspects of the MARS Project (Monitoring Agriculture with Remote