This file was created by scanning the printed publication.
Errors identified by the software have been corrected;
however, some errors may remain.
Optimum Area Sampling Frame Using
High Resolution Satellite Images with
Operational Objective :
How to Conciliate Statistical
Requirements and Practical Aspects?
Hélène de Boissezon¹, Hervé Jeanjean²
Abstract: Although high resolution satellite images can be used for measuring
land use characteristics, it is most often impossible to envisage an
exhaustive coverage of large areas. Sample survey methods represent a
feasible alternative when assessing land use surface. In many cases,
classical sampling survey techniques can be used. However, when spatial
correlation is observed between sites, geostatistical principles could be
introduced into the inventory design. Using classical statistical techniques,
the target variable has to be estimated from the elementary measurements
on the sample sites through an extrapolation model. When designing such
a statistical process, many issues have to be addressed, such as the
characteristics of the population to be considered, the sampling design
itself, the extrapolation model to be applied to the whole study area, and
the sampling error. Practical considerations and constraints should not be
underestimated in any sample survey. In this paper, statistical issues in
sampling frame design based on high resolution satellite images are
addressed with emphasis on practical implementation. Sampling survey
designs of some operational remote sensing projects around the world are
reviewed, such as FAO/FRA 90 and the European MARS, TREES and
FIRS programmes. Some recommendations are then proposed for setting
up sound sampling schemes.
1 Remote sensing expert in agronomic statistics, SCOT Conseil, 1 rue Hermès, F-31526 Ramonville, France
2 Remote sensing expert in forestry, SCOT Conseil, 1 rue Hermès, F-31526 Ramonville, France
INTRODUCTION
In Earth observation, the analysis and monitoring of land surface
characteristics such as the proportion of crops or forest types often require high
resolution satellite images. Although some programmes use this type of data
on a wall to wall basis, it becomes difficult to assess large areas with a full
coverage, because of the high cost induced and the huge amount of data to be
processed. A sampling scheme, based on sound statistical concepts, represents a
feasible method. Some operational regional and international programmes have
already developed and implemented such procedures. However, the sampling
scheme has sometimes been designed a posteriori, i.e. the samples have not been
selected and located under rigorous rules at the beginning. Moreover, practical
constraints can be either overestimated or underestimated, depending on whether they are given
too much or too little consideration. This situation results in problems with the final
accuracy and with the implementation of an extrapolation model. This paper addresses the
design of an optimum area sampling frame: consideration will be
given to both statistical requirements and practical constraints.
RECENT ADVANCES IN SAMPLING DESIGNS
Few investigations have been carried out on the most suitable statistical
procedures for covering large areas with a sample of high resolution images.
However, the literature is rather rich in research studies concerning small areas
within satellite images, where sampling schemes can be tested and evaluated (the
true variance can be calculated, and the sampling can be repeated as many times as
desired).
The geostatistical approach
Geostatistical concepts constitute an alternative to classical sampling
techniques. They assume that the sample units are not independent, and that the
autocorrelation can be used in the model. Geostatistics requires a systematic
sampling rather than a random sampling. The semivariogram is the basic tool in
geostatistics. In remote sensing images, strong correlation has been observed
between landscapes (Ramstein and Raffy, 1989; Webster et al., 1989; Gohin and
Langlois, 1993; Wood et al., 1994). McGwire et al. (1993) have shown that the
spatial correlation increases with the sample unit size. Atkinson (1995) confirms
this result and has pointed out that the gain of systematic sampling over
random sampling increases with the size of the sample unit. A
minimum distance between sample sites can reduce the effects of the spatial
correlation (the sampling units become less dependent), and thus can improve the
efficiency of the scheme. According to McGwire et al. (1993), geostatistics may
be tested and applied in wide-area vegetation mapping and monitoring projects. The
studies carried out by Jupp et al. (1988) constitute a good starting point for
developing such methods. In spite of these suggestions, no complete study has
proved the applicability of geostatistical concepts when sampling large areas
(country to continent scale) with high resolution images. In particular, the spatial
correlation between SPOT or TM scenes selected on a systematic basis has still to
be demonstrated. The main bottleneck for evaluating the usefulness of a geostatistical basis
is the substantial resources (number of satellite scenes and processing time)
which would be required for building and testing a semivariogram. Therefore,
classical statistical concepts generally still prevail, since their application
rules and their limits are well known.
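As an illustration of the semivariogram mentioned above, the following minimal sketch computes the classical empirical estimator, gamma(h) = sum of squared differences at lag h divided by twice the number of pairs, along a regularly spaced transect. The function name and the transect values are hypothetical and are not taken from any of the cited studies.

```python
def empirical_semivariogram(values, max_lag):
    """Empirical semivariogram of a regularly spaced 1-D transect.

    Returns a dict {lag: gamma(lag)} where
    gamma(h) = sum((z[i] - z[i+h])**2) / (2 * number_of_pairs).
    """
    gamma = {}
    for h in range(1, max_lag + 1):
        squared_diffs = [(values[i] - values[i + h]) ** 2
                         for i in range(len(values) - h)]
        gamma[h] = sum(squared_diffs) / (2 * len(squared_diffs))
    return gamma


# Hypothetical forest-cover proportions measured on 12 consecutive sample units.
transect = [0.10, 0.12, 0.15, 0.20, 0.26, 0.30,
            0.33, 0.35, 0.34, 0.30, 0.25, 0.22]
gamma = empirical_semivariogram(transect, max_lag=4)
for lag, value in gamma.items():
    print(f"lag {lag}: gamma = {value:.5f}")
```

On spatially correlated data such as this transect, gamma grows with the lag: nearby units resemble each other more than distant ones, which is the property a geostatistical design would try to exploit.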
Gain of systematic sampling
Previous studies have shown that systematic sampling becomes more
efficient when the sample size or the sampling intensity increases (Atkinson, 1995).
In fact, as the sample size increases, the sampling units are likely to be more and
more correlated since they are closer to each other. However, it is always
recommended to carefully examine the spatial behaviour of the variable to be
estimated. If the variable follows a periodic pattern (a sequence of ridges and valleys
for instance), a bias is likely to be introduced. Although several simulations have confirmed the
gain of systematic sampling (Cochran, 1981; Brion and Fournier, 1995), the
magnitude of the gain varies and depends on the spatial correlation
(Dunn and Harrison, 1993).
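The behaviour described above can be checked on small artificial populations. The sketch below is only an illustration under stated assumptions: a one-dimensional population, a sampling interval k, the exact variance of the systematic sample mean obtained by enumerating all k possible starts, and the textbook variance of a simple random sample of the same size. The two populations (a smooth gradient and a pattern whose period equals the sampling interval) are hypothetical.

```python
import math


def srs_variance(pop, n):
    """Variance of the mean of a simple random sample of size n (without replacement)."""
    N = len(pop)
    mean = sum(pop) / N
    s2 = sum((y - mean) ** 2 for y in pop) / (N - 1)
    return (1 - n / N) * s2 / n


def systematic_variance(pop, k):
    """Exact variance of the systematic sample mean over the k possible random starts."""
    N = len(pop)
    mean = sum(pop) / N
    sample_means = []
    for start in range(k):
        sample = pop[start::k]          # every k-th unit from this start
        sample_means.append(sum(sample) / len(sample))
    return sum((m - mean) ** 2 for m in sample_means) / k


N, k = 120, 10
n = N // k

# Smooth gradient: neighbouring units are strongly correlated.
trend = [0.2 + 0.005 * i for i in range(N)]
# Periodic pattern whose period equals the sampling interval (ridges and valleys).
periodic = [0.5 + 0.3 * math.sin(2 * math.pi * i / k) for i in range(N)]

for name, pop in (("trend", trend), ("periodic", periodic)):
    print(name, "systematic:", systematic_variance(pop, k),
          "random:", srs_variance(pop, n))
```

With the gradient, the systematic sample is far more precise than the random one. With the periodic pattern, every unit of a given systematic sample falls at the same phase of the cycle, so the estimate from a single sample can be badly off and the systematic design performs much worse than random sampling.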
Multi phase/stage sampling
When the population to be assessed is not well known, e.g. when stratification
is not possible, it may be convenient to use multiple phase sampling. In
remote sensing, double phase sampling with regression has been tested or
recommended in many studies, whether between high resolution images and
ground measurements (Smiatek, 1995) or between coarse resolution satellite
images and high resolution satellite images (Mayaux and Lambin, 1995; Kleinn et
al., 1993; Päivinen and Pitkänen, 1992; Iverson et al., 1989; Nelson and Holben,
1986). Multistage sampling is a feasible alternative. In that case, the total variance
is often mainly due to the first stage.
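A minimal sketch of double phase sampling with regression is given below, using the standard regression estimator: the mean of the target variable on the small second-phase sample is corrected by the fitted slope times the difference between the first-phase and second-phase means of the auxiliary variable. The function name and all data values are hypothetical; here x stands for a coarse resolution forest proportion and y for the corresponding high resolution proportion.

```python
def regression_estimate(x_large, x_small, y_small):
    """Double sampling regression estimator of the mean of y.

    x_large : auxiliary variable on the large first-phase sample
    x_small : auxiliary variable on the second-phase subsample
    y_small : target variable on the same second-phase subsample
    """
    n = len(x_small)
    x_bar_small = sum(x_small) / n
    y_bar_small = sum(y_small) / n
    x_bar_large = sum(x_large) / len(x_large)
    # Ordinary least squares slope fitted on the second-phase sample.
    sxy = sum((x - x_bar_small) * (y - y_bar_small)
              for x, y in zip(x_small, y_small))
    sxx = sum((x - x_bar_small) ** 2 for x in x_small)
    slope = sxy / sxx
    return y_bar_small + slope * (x_bar_large - x_bar_small)


# Hypothetical data: coarse resolution proportions on 6 units,
# high resolution proportions available on 3 of them only.
x_large = [0.2, 0.5, 0.8, 0.9, 0.6, 0.6]
x_small = [0.2, 0.5, 0.8]
y_small = [0.21, 0.45, 0.69]

estimate = regression_estimate(x_large, x_small, y_small)
print(f"regression estimate of the mean proportion: {estimate:.3f}")
```

The subsample mean of y alone is 0.45. Because the first-phase sample shows a higher auxiliary mean than the subsample, the regression estimator shifts the estimate upwards.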
Gain of stratification
A priori stratification consists in partitioning the population using criteria
well correlated with the variable to be estimated. The final gain depends on the strength of this
relationship. A posteriori stratification, which consists in grouping sampling
units into classes defined after the sample has been drawn, may provide good results
(Cochran, 1981; Dunn and Harrison, 1993).
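The dependence of the gain on the quality of the stratification criteria can be illustrated with a small sketch. It compares the variance of the mean under simple random sampling with the variance under stratified sampling with proportional allocation, using the textbook formulas and ignoring the finite population correction (an assumption made here for brevity). The two stratifications of the same twelve hypothetical values are invented for the example.

```python
def variance(values):
    """Unbiased variance (divisor n - 1)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def stratification_gain(strata):
    """Ratio Var(simple random) / Var(stratified, proportional allocation).

    Both variances are for the same total sample size, which cancels out.
    Finite population corrections are ignored (assumption).
    """
    population = [v for values in strata.values() for v in values]
    N = len(population)
    within = sum(len(values) / N * variance(values)
                 for values in strata.values())
    return variance(population) / within


# Same 12 hypothetical forest proportions, stratified in two ways.
good = {  # strata well correlated with the variable
    "low cover": [0.05, 0.10, 0.08, 0.12, 0.06, 0.09],
    "high cover": [0.70, 0.80, 0.75, 0.85, 0.78, 0.72],
}
poor = {  # strata unrelated to the variable
    "zone A": [0.05, 0.80, 0.08, 0.85, 0.06, 0.72],
    "zone B": [0.70, 0.10, 0.75, 0.12, 0.78, 0.09],
}

print("gain with relevant strata:  ", round(stratification_gain(good), 1))
print("gain with irrelevant strata:", round(stratification_gain(poor), 2))
```

A ratio well above 1 means that stratification pays off. A ratio close to 1 means that the strata carry no information about the variable and the extra design effort brings nothing.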
EXPERIENCE OF SOME LARGE SCALE REMOTE SENSING PROJECTS
Several operational remote sensing projects have set up sound sampling
schemes based on high resolution satellite images as sampling units. Activity B
of the MARS project (Monitoring Agriculture by Remote Sensing) is a European
programme carried out by the Joint Research Centre at the Institute of Remote
Sensing Applications and is aimed at providing fast estimates of crop areas and
yields to DG VI in Brussels. A total of 60 sampling sites (40 km x 40 km) have been
located after stratification, which corresponds to a 6 % sampling intensity. The
allocation of sites is proportional to the agricultural land in the country, and the sites
coincide with SPOT scenes (KJ). A weighting procedure is then employed to
derive the extrapolation model. The MERA 95 project (MARS extension and
Environment Related Applications) is an extension of MARS to Central and
Eastern Europe. The sampling design has been slightly modified, with no
stratification, and 40 sites have been selected from a systematic sample base,
in proportion to the agricultural activity. The FAO Forest Resources
Assessment 1990, following recommendations given by statisticians (Czaplewski,
1991), uses a sample of 117 TM sites located randomly in the tropical belt
(FAO, 1993). The variables to be estimated are the forest areas and the change in
forest cover over the last 10 years. The sampling intensity is 10 %, and a stratification
has been performed using ecological criteria. The sample size has been fixed with
a target standard error of 5 %. The deforestation rate was estimated
with an error of 12.5 %, which can be explained by the high coefficient of variation
of deforestation. The FIRS project (Forest Information by Remote Sensing) is
another activity of JRC, Italy. One of its fundamental activities was to stratify
Europe into 115 homogeneous strata with three main criteria: proportion of forest
cover, species composition, and timber volume. The strata are grouped into 6
statistical regions, where 223 sites have been selected with unaligned systematic
sampling. The sampling intensity is 3 % of the total area, or 10 % of the forest
lands. The TREES project (Tropical Ecosystem Environment observation by
Satellites) uses a double sampling scheme with regression between the AVHRR
derived forest cover proportion as the auxiliary variable and the TM derived forest
cover proportion as the target variable. The sampling unit corresponds to a
block of 11x11 AVHRR pixels. A total of 1121 units have been selected, each TM
scene generating more than 200 units (cluster sampling). A regression is
performed between the forest proportions estimated at TM and AVHRR levels.
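The link between sample size, coefficient of variation and relative standard error reported for FAO/FRA 90 can be made concrete with a rough calculation. The sketch below assumes simple random sampling with a negligible finite population correction, so that the relative standard error equals CV / sqrt(n). It ignores the stratification actually used in that project, and the function names are hypothetical, so the figures are orders of magnitude only.

```python
import math


def implied_cv(relative_se, n):
    """Coefficient of variation implied by a relative standard error observed with n units."""
    return relative_se * math.sqrt(n)


def required_sample_size(cv, target_relative_se):
    """Smallest n such that cv / sqrt(n) does not exceed the target relative standard error."""
    # A tiny tolerance avoids rounding up because of floating point noise.
    return math.ceil((cv / target_relative_se) ** 2 - 1e-9)


# Figures quoted in the text: 117 sites, 12.5 % error on the deforestation rate.
cv_deforestation = implied_cv(0.125, 117)
print(f"implied coefficient of variation: {cv_deforestation:.2f}")
print("sites needed for a 5 % target:", required_sample_size(cv_deforestation, 0.05))
```

Under these simplifying assumptions, a 12.5 % error with 117 sites corresponds to a coefficient of variation above 1, and reaching a 5 % error on such a variable would take several hundred sites.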
DESIGNING AN OPTIMUM SCHEME
Constraints
The constraints in designing an operational sampling scheme are related to the
satellite itself (availability of data, acquisition delays, repetitivity for multitemporal
analysis...), the operationality of the system (fulfilment of the requirements and
technical feasibility), efficiency and cost-effectiveness (the measurement error
induced by remote sensing techniques must be compared with and added to the
sampling error), and the client requirements (terms of reference, and required
precision with its level of confidence).
Definition of a sound sampling scheme using HRSD
At the very first stage, it is necessary to identify which land cover parameters are
to be analysed. The sampling scheme will depend on the characteristics of the land
cover type under investigation, e.g. its surface proportion in the entire domain
(study area), the size of the elementary components of this land cover type (size of
agricultural plots for instance), its spatial distribution (presence or absence in the
different strata of the study area), its frequency of occurrence (proportion of the land
cover type in the strata), and its temporal variation (crop cycle for instance).
Based on high resolution satellite images, the variable to be estimated with satisfactory
precision can be the proportion of the land cover type, e.g. the proportion of
agricultural areas or of forest areas. It can also be the variation of this proportion over
a multitemporal data set. The a priori knowledge of the domain is a critical issue
for setting up a sound sampling scheme. The sample must be as representative as
possible of the entire population. If limited knowledge is available, multiphase
sampling is recommended. In Table 1, sampling schemes are proposed with respect
to the characteristics of the item.
Table 1.-Designing a sampling scheme based on the characteristics of the item
(Cochran, 1981).
   Characteristics of item                      Type of sampling needed
1. Widespread throughout the region,            A general survey with low sampling
   occurring with reasonable frequency          intensity.
   in all parts.
2. Widespread throughout the region but         A general survey, but with a higher
   with low frequency.                          sampling ratio.
3. Occurring with reasonable frequency in       For best results, a stratified sample
   most parts of the region, but with more      with different intensities in different
   sporadic distribution, being absent in       parts of the region. Can sometimes be
   some parts and highly concentrated in        included in a general survey with
   others.                                      supplementary sampling.
4. Distribution very sporadic or                Not suitable for a general survey.
   concentrated in a small part of the          Requires a sample geared to its
   region.                                      distribution.
The sampling type has to be examined when using satellite images. Indeed, the
measurements are not made on a single point, but within an area (SPOT scene for
instance). The results may be assigned to the scene centre (sample of points taken
from an infinity of points) or to the entire area (area frame sampling with a sample
taken from N sites covering the entire region). The definition of the sample base is
an important step of the sampling scheme: it is, in principle, composed of all units
(points or frames) constituting the entire region. When using SPOT scene centres
as the sample base over a very large region, a geometric problem is raised by the
narrowing of tracks towards the pole, which induces a higher density of units. The
sampling scheme should take this distortion into consideration by selecting the
sample units on a proportional basis (proportional to the distance between two
orbits for instance). A stratification is then often required, especially when the a priori
proportion of the item is known with sufficient accuracy. The ideal variate for
stratification is the value of the variable to be estimated. In practice, other sources of
information are used, such as ancillary data (existing maps or statistics). The
stratification process can lead to different types of strata: the strata in which the
item is almost absent and in which no sample will be taken, and the strata with
different proportions of the item (from low to high proportion). It is sometimes
recommended to delineate special strata which correspond to other criteria
(mountainous areas, swampy zones, etc.). It is important to measure the total area
of each stratum, and to verify that the sum gives the total area of the
inventoried region. The number of strata will be low if the a priori knowledge of
the population is limited. Systematic sampling in two dimensions represents a good
area sampling frame scheme. An alternative is unaligned sampling (Cochran,
1981). A combination of systematic sampling (selecting the cluster) and random
sampling (selecting the unit in the cluster) may also give good results. The number
of sample units within each stratum may be calculated with the following equation
(Eq. 1):
[Eq. 1 is missing from the scanned source]
with n as the total sample size, nj the sample size in stratum j, Sj the size of stratum
j, and Varj the variance of stratum j. The total sample size n is determined by the
required overall precision and the available budget. For optimizing the design (cost
effectiveness), the Lagrange multiplier method can be used to solve the system. In stratified
sampling, the overall estimate of the mean value is weighted by the strata sizes:
$Y = \frac{1}{N} \sum_j N_j Y_j$
with Nj as the stratum size, Yj the estimated mean value in stratum j, and N the
total number of sample units. The total error of the design is the sum of the sampling
error and the measurement error. When the measurement error is large, it is
not worth designing a complex sampling scheme with a very high precision since
the overall precision will be low.
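The allocation equation referred to above is not reproduced in this copy. As a sketch only, the code below assumes it is the standard Neyman allocation, in which the sample size of stratum j is proportional to the stratum size times the square root of the stratum variance; this is an assumption based on the symbols defined in the text, not a confirmed reading of the original equation. The weighted mean follows the definition given in the text. Function names and numerical values are hypothetical.

```python
import math


def neyman_allocation(n, sizes, variances):
    """Split a total sample size n between strata (assumed Neyman allocation).

    n_j = n * S_j * sqrt(Var_j) / sum_k(S_k * sqrt(Var_k))
    Returned values are not rounded; rounding to whole sites is left to the user.
    """
    weights = [s * math.sqrt(v) for s, v in zip(sizes, variances)]
    total = sum(weights)
    return [n * w / total for w in weights]


def stratified_mean(sizes, means):
    """Overall mean: stratum means weighted by stratum sizes."""
    return sum(s * m for s, m in zip(sizes, means)) / sum(sizes)


# Hypothetical region with three strata (sizes in number of frame units).
sizes = [400, 300, 300]
variances = [0.01, 0.04, 0.09]     # a priori variance of the proportion per stratum
stratum_means = [0.10, 0.40, 0.70]  # estimated mean proportion per stratum

allocation = neyman_allocation(60, sizes, variances)
print("sites per stratum:", [round(a, 1) for a in allocation])
print("overall mean proportion:", round(stratified_mean(sizes, stratum_means), 3))
```

Under this assumption, the largest stratum receives the fewest sites because it is the most homogeneous, while the most variable stratum receives almost half of the sample.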
CONCLUSION
In sampling statistics, the estimates are always given with a confidence interval
and a probability level. This means that we are never absolutely sure of the error we
have made with the sample. It is sometimes difficult to decide how much error
should be tolerated by the client or the user. Moreover, starting from the user
requirements and means (budget allocated for the survey), and looking into the
characteristics of the parameter to be estimated, it should be pointed out that there
is no single optimum sampling design. For the problem considered here, which consists in optimizing the
sample size and the location of the sample units, it can be
concluded that geostatistical concepts should be further investigated before being
applied in an operational project. The combination of several standard statistical
principles often leads to satisfactory results (systematic sampling at first stage and
random sampling at second stage for instance). The use of standard principles
always simplifies the design, since the error can be easily calculated. In operational
projects, the use of a satellite reference grid (SPOT or TM) seems to be the best
way of building up the sampling base (FAO/FRA 90 and MARS projects).
However, at high latitudes, the distortion of the grid has to be examined in order
to avoid oversampling. This issue should be addressed by global surveys over
large areas, such as the agricultural statistics project in Russia which has just been
launched under European Union support (TACIS) and in which SCOT Conseil is
providing technical assistance.
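One simple way of correcting the grid distortion at high latitudes is sketched below. It assumes that the ground distance between adjacent tracks shrinks roughly with the cosine of the latitude, which is only an approximation for a near-polar orbit, and it thins the columns of each grid row accordingly so that the retained scene centres keep a roughly constant spacing on the ground. The function names and the grid dimensions are hypothetical.

```python
import math


def inclusion_weight(lat_deg, reference_lat_deg=0.0):
    """Relative track spacing at lat_deg compared with the reference latitude
    (assumption: spacing proportional to cos(latitude))."""
    return (math.cos(math.radians(lat_deg))
            / math.cos(math.radians(reference_lat_deg)))


def thin_columns(n_columns, lat_deg, reference_lat_deg=0.0):
    """Indices of the grid columns kept on the row located at lat_deg.

    The number of columns kept is proportional to the track spacing, so the
    density of retained scene centres per unit area stays roughly constant.
    """
    weight = inclusion_weight(lat_deg, reference_lat_deg)
    keep = min(n_columns, max(1, round(n_columns * weight)))
    step = n_columns / keep
    return [int(i * step) for i in range(keep)]


# Hypothetical grid with 100 columns per row.
for lat in (0.0, 45.0, 60.0):
    kept = thin_columns(100, lat)
    print(f"latitude {lat:4.1f}: {len(kept)} of 100 scene centres kept")
```

At 60 degrees of latitude only every second column is retained, which compensates for the doubling of scene density on the ground compared with the equator.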
REFERENCES
Atkinson, P., 1994, Testing the efficiency of sampling strategies with simulated remotely
sensed data, in SFPT, no 137, p. 12.
Brion, P., and Fournier, P., 1995, Etude Geostat-Maroc: Partie statistique. Rapport CNES
travaux INSEE, SCEES.
Chevrou R.B., D E W Montpellier, March 1988, Inventaire Forestier National, Methodes et
Procedures.
Cochran W. G., March 1981, Sampling techniques, third edition.
Czaplewski R.L., Analysis of alternative sample survey designs, FAO 1991.
Dagnelie P., 1973, Theorie et methodes statistiques, Volume 1, Centre de Documentation
de Toulouse.
Dagnelie P., 1975, Theorie et methodes statistiques, Volume 2, Centre de Documentation
de Toulouse.
Dunn R., Harrison A.R., 1993, Two-dimensional systematic sampling of land use, Appl.
Statistics, 42, No 4, pp. 585-601.
Fitzpatrick-Lins K., March 1981, Comparison of sampling procedures and data analysis for
a land-use and land-cover map, PERS vol. 47, No 3, pp. 343-351.
Food and Agriculture Organization, 1993, Forest resources assessment 1990: tropical
countries, FAO Forestry Paper 112, Rome, 61 pp.
Gallego FJ, P Vossen, JF Dallemand, V Perdigao, Sampling plans in MERA Project,
MARS project, 1995.
Houllier F., 6 June 1986, Echantillonnage et modelisation de la dynamique des
peuplements forestiers/Application au cas de l'Inventaire Forestier National, Thesis.
Iverson L. R., Cook E.A., Graham R. L., 1989, A technique for extrapolating and
validating forest cover across large regions - Calibrating AVHRR data with TM data,
International Journal of Remote Sensing, Vol. 10, no 11, p. 1805-1812.
Kleinn C., Dees M., Pelz D.R., 1993, Sampling aspects in the TREES project - global
inventory of tropical forests, final report, Freiburg Universität, Germany, contract for
JRC Ispra, Italy.
McGwire K., Friedl M., Estes J.E., 1993, Spatial structure, sampling design and scale in
remotely-sensed imagery of a California savanna woodland, International Journal of
Remote Sensing, Vol. 14, no 11, p. 2137-2164.
Nelson R. and Holben B., 1986, Identifying deforestation in Brazil using multiresolution
satellite data, International Journal of Remote Sensing, vol. 7, pp. 429-448.
Päivinen R. and Pitkänen J., 1992, Calibrating AVHRR data with TM data for tropical
forest cover assessment, IUFRO S 4.02.05 Wacharakitti international workshop
"Remote sensing and permanent plot techniques for world forest monitoring", Pattaya,
Thailand.
Ram M., Lacaze B., Rambal S. and Winkel T., August 1994, Identifying spatial patterns
of Mediterranean landscapes from geostatistical analysis of remotely-sensed data,
International Journal of Remote Sensing, Volume 15, Number 12, Special issue:
scaling in remote sensing.
Ram M., Puech C., August 1994, Thresholds of homogeneity in targets in the landscape.
Relationship with remote sensing, International Journal of Remote Sensing, Volume
15, Number 12, Special issue: scaling in remote sensing.
SCOT CONSEIL & GAF for JRC EMAP Unit, 1995, FIRS Project, Regionalization and
stratification of European forest ecosystems, final report.
SCOT CONSEIL, CCE, 1994, Documentation Action IV, Estimations rapides des
superficies et des productions potentielles au niveau europeen, Volume I - Document
de synthese.
SCOT CONSEIL, CCE, 1994, Documentation Action IV, Estimations rapides des
superficies et des productions potentielles au niveau europeen, Volume III - Methodologie.
Smiatek G., 1995, Sampling thematic mapper imagery for land use data, Remote Sensing
of Environment, 52: 116-121.