This file was created by scanning the printed publication.
Errors identified by the software have been corrected;
however, some errors may remain.
A Calibration-Based Model for
Correcting Area Estimates From
Coarse Resolution Land Cover Data
Aaron Moody
*
Abstract.-A two-stage modeling strategy significantly improves area
estimates by correcting coarse-resolution measurements of class proportions. Stage I models use measurements of scale-invariant landscape spatial properties to estimate the slope and intercept of proportion transition
relationships. A stage II model uses a regression estimator to predict true
class proportions based on measured coarse-scale proportions and the
slope and intercept estimates from the stage I models. Model development and testing on a calibration site is followed by testing and inversion
for a validation site. Inversion involves using spatial variables measured
at the coarse scale as input to the stage I models. A probabilistic sampling strategy allows statistical assessment of the models and results.
INTRODUCTION
Image spatial resolution will influence land-cover area estimates made from
classified remotely sensed data (Mayaux and Lambin 1995; Moody and Woodcock 1994). Models of these resolution effects can lead to improved area estimates derived from coarse resolution remote sensing. One might consider two
approaches for modeling this scale-dependent areal bias. Mixture models, incorporated into the classification process, can estimate the subpixel composition of
pixels if pure-class spectra are known (Adams et al. 1986). Alternatively, calibrated models, applied in a post-classification mode, can improve area estimates
from coarse-resolution data if the relationship between "true" and measured proportions can be modeled (Mayaux and Lambin 1995; Kalkhan et al. 1995).
This paper describes the development and evaluation of three linked statistical models that provide post-classification correction of area estimates. The first
two models estimate the slope and intercept of a line characterizing the relationship between true and coarse-scale proportions. In this case, "true" proportions
refer to proportions measured at 30 m, and coarse-scale proportions refer to measurements at 1020 m. The slope and intercept are modeled based on a small set of
relatively scale-invariant, measurable spatial properties of the landscape. The
third model uses these estimated slopes and intercepts to predict 30 m proportions
based on measurements of proportions at 1020 m.
*Assistant Professor, Department of Geography, University of North Carolina, Chapel Hill, NC 27599
BACKGROUND
Two basic spatial effects contribute to biased area estimates. A first order
effect is the tendency of large classes to increasingly dominate the landscape
when measured at increasingly coarse scales. Accordingly, small classes tend to
diminish in size. Figure 1 illustrates these patterns for all the data used in this
study. Second order effects refer to modulations of these basic patterns due to
specific landscape spatial organization (Moody and Woodcock 1995). These
effects result in the scatter about the smoothed fit notable in Figure 1d. In either
case, scale-dependent changes in the apparent area of classes result from class
membership transitions between scales. This effect can thus be characterized in
terms of proportion transition lines relating true and coarse resolution proportions.
The slope of such a line can summarize the rate of transition.
If transition rates depend partly on landscape spatial organization, it is sensible to try to model them using measures of spatial pattern. A variety of spatial
measures exist, and many reviews and summaries are present in vegetation
analysis and landscape ecology literature (Legendre and Fortin 1989; Cullinan
and Thomas 1992). In this research, landscape spatial properties are used to
model the slopes and intercepts of the proportion transition lines for a large set of
sampling units in independent calibration and validation sites. These are the
stage I models. Model inversion will require that the landscape pattern measures
be relatively scale-invariant, or resistant to resolution.
Figure 1.-Relationship between proportions at 30 m and at 4 coarser scales. [Four panels, a-d; x-axis in each: Measured Proportions at 30 m.]
There are several calibration-based methods for improving coarse resolution
area estimates (Czaplewski and Catts 1992; Kalkhan et al. 1995). The model
used here is a form of "classical" model in which known but incorrect values are
estimated using unknown correct values (Brown 1982). For example, if $P_0$
represents true proportions and $P_r$ represents measured proportions at some
coarse spatial resolution $r$, then:
$$P_r = \beta_0 + \beta_1 P_0 + \text{error} \qquad (1)$$
where $\beta_0$ and $\beta_1$ are the intercept and slope of the proportion transition line that
relates true proportions to proportions measured at resolution $r$. Inverting this
simple model provides:
$$\hat{P}_0 = \frac{P_r - \hat{\beta}_0}{\hat{\beta}_1} \qquad (2)$$
which is the stage II model used in this paper.
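As a minimal sketch of this calibration idea, assume the transition line has the form P_r = b0 + b1 * P_0 and is fitted by ordinary least squares within one sampling unit. The function names and the sample proportions below are hypothetical and are not taken from the paper.

```python
def fit_transition_line(true_props, coarse_props):
    """Least squares fit of coarse = b0 + b1 * true; returns (b0, b1)."""
    n = len(true_props)
    mean_x = sum(true_props) / n
    mean_y = sum(coarse_props) / n
    sxx = sum((x - mean_x) ** 2 for x in true_props)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(true_props, coarse_props))
    b1 = sxy / sxx               # slope of the proportion transition line
    b0 = mean_y - b1 * mean_x    # intercept
    return b0, b1


def invert_transition_line(coarse_prop, b0, b1):
    """Classical-calibration inverse: estimate the true proportion."""
    return (coarse_prop - b0) / b1


# Hypothetical class proportions for one sampling unit (illustrative only).
p30 = [0.05, 0.15, 0.30, 0.50]     # "true" proportions at 30 m
p1020 = [0.00, 0.10, 0.30, 0.60]   # measured proportions at 1020 m

b0, b1 = fit_transition_line(p30, p1020)
estimates = [invert_transition_line(p, b0, b1) for p in p1020]
print(b0, b1, estimates)
```

In this sketch the line is fitted with the true proportions in hand; in the inverted setting described in the text, the slope and intercept would instead have to be supplied by a separate model.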
A five-part strategy is employed as follows: a) identify a set of scale-invariant spatial measures; b) use a subset of these measures (calculated at 30 m) ($X_{30}$)
to develop stage I models for predicting the slope ($\beta_1$) and intercept ($\beta_0$) of the
proportion transition lines for the calibration site; c) predict 30 m proportions
($P_0$) for the calibration site by applying the stage II model in Eq. 2 using the
measured 1020 m proportions ($P_r$), and the slopes ($\hat{\beta}_1$) and intercepts ($\hat{\beta}_0$)
estimated from the stage I models; d) using the stage I models developed on the
calibration site, repeat step c and evaluate the procedure when applied to the data
from the validation site; e) invert and evaluate the procedure by running the stage
I models using the spatial variables as measured at 1020 m ($X_{1020}$), and supply
the results to the stage II model to predict 30 m proportions based only on information measured at 1020 m for the validation site.
METHODS and RESULTS
The Plumas and Stanislaus National Forests are used as calibration and validation sites, respectively. Landsat Thematic Mapper data have been classified to
produce maps of general land-cover categories for each site. Classes include barren, brush, hardwood, water, and conifer. Although the two sites have similar
characteristics, they are spatially separated by roughly 2° of latitude.
The data for each site are aggregated to 1020 m resolution using a plurality-based aggregation procedure. This involves coding each grid cell in a 1020 m
sampling grid with the most frequently occurring subgrid-cell class.
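The plurality rule can be sketched as follows. This is an illustrative version only: the function names are hypothetical, the tie-breaking behavior (first class encountered wins) is an assumption the text does not specify, and the toy grid uses a factor of 2 rather than the 34 x 34 blocks implied by aggregating 30 m pixels to 1020 m.

```python
from collections import Counter


def plurality_aggregate(grid, factor):
    """Label each factor x factor block with its most frequent class."""
    n_rows = len(grid) // factor
    n_cols = len(grid[0]) // factor
    coarse = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            counts = Counter(
                grid[i][j]
                for i in range(r * factor, (r + 1) * factor)
                for j in range(c * factor, (c + 1) * factor)
            )
            # Assumption: ties go to the class encountered first in the block.
            row.append(counts.most_common(1)[0][0])
        coarse.append(row)
    return coarse


def class_proportions(grid):
    """Proportion of the grid occupied by each class present in it."""
    counts = Counter(cell for row in grid for cell in row)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


# Toy 4 x 4 class map (hypothetical data), aggregated by a factor of 2.
fine = [
    ["conifer", "conifer", "brush", "water"],
    ["conifer", "brush", "brush", "brush"],
    ["barren", "barren", "water", "water"],
    ["barren", "conifer", "water", "hardwood"],
]
coarse = plurality_aggregate(fine, 2)
print(class_proportions(fine))
print(class_proportions(coarse))
```

In the toy map the single hardwood cell disappears after aggregation, which is the kind of small-class loss the correction procedure is meant to address.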
A set of randomly located 238x238 pixel subregions serves as the sampling
units for the analyses. Each unit contains 56,644 30 m pixels (238 x 238) and 49 1020 m pixels (7 x 7, since each 1020 m cell spans 34 x 34 of the 30 m pixels). The Plumas contains fifty sampling units for model development and initial
testing. The Stanislaus contains thirty-five units for model validation and model
inversion. These units represent 30% of all possible such units from
each site. Within each sampling unit the following measurements are collected:
proportions at 30 m for each class, proportions at 1020 m for each class, a set of
spatial measures at 30 m, and the same set of spatial measures at 1020 m. The
ultimate goal is to estimate 30 m proportions by supplying 1020 m area measurements and slope and intercept coefficients to the stage II model (Eq. 2). An intermediate goal is to estimate the proper slopes and intercepts using a multiple
regression model with a parsimonious set of spatial measures as the independent
variables. The slopes and intercepts of the proportion transition lines are the
dependent variables in these stage I models.
A variety of spatial measures are determined within each unit using the r.le
software (Baker and Cai 1992). Five of these demonstrate scale-invariance as
determined by examining the simple correlations between each variable and itself
at the two different scales. Of these five, three prove significant in modeling both
the slope and the intercept of the proportion transitions as determined within each
sampling unit. An additional variable (c below) that does not have the scale-invariance
property is included because it captures an important landscape
property not previously included in the model. The four variables used are:
maximum class size (mx), inverse Simpson's index ($s^{-1}$), contagion (c), and
entropy (ent). Expressions for the latter 3 variables are:
$$c = 2\ln(k) - ent \qquad (4)$$
where k is the number of classes present, $P_i$ is the proportion of class i in the
sampling unit, and $P_{ij}$ are elements of a k x k co-occurrence matrix and represent
adjacency probabilities between classes i and j. Maximum class size refers to
the proportion of the largest class in the sampling unit. Simpson's index indicates
the probability of randomly selecting two pixels of the same attribute. Contagion
measures the degree of clumping in the landscape. Entropy is maximized when
all pixels of a given class are as far away from one another as possible. Table 1
shows the cross-scale correlation matrix for this set of variables.
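The four measures can be sketched for a small class map as below. Several choices here are assumptions rather than the paper's or r.le's definitions: the inverse Simpson's index is taken as 1 / sum(P_i^2), entropy as -sum(P_ij * ln P_ij), and adjacency is counted over horizontally and vertically neighboring cell pairs in both orders. Only the contagion relation c = 2 ln(k) - ent comes directly from the text. The function name is hypothetical.

```python
import math
from collections import Counter


def spatial_measures(grid):
    """Return mx, inverse Simpson, entropy and contagion for a class map."""
    cells = [cell for row in grid for cell in row]
    counts = Counter(cells)
    props = [n / len(cells) for n in counts.values()]
    k = len(counts)                       # number of classes present

    mx = max(props)                       # proportion of the largest class
    inv_simpson = 1.0 / sum(p * p for p in props)   # assumed form

    # Assumed adjacency rule: 4-neighbor pairs, counted in both orders.
    pairs = Counter()
    n_rows, n_cols = len(grid), len(grid[0])
    for i in range(n_rows):
        for j in range(n_cols):
            if j + 1 < n_cols:
                pairs[(grid[i][j], grid[i][j + 1])] += 1
                pairs[(grid[i][j + 1], grid[i][j])] += 1
            if i + 1 < n_rows:
                pairs[(grid[i][j], grid[i + 1][j])] += 1
                pairs[(grid[i + 1][j], grid[i][j])] += 1
    n_pairs = sum(pairs.values())

    # Entropy of the adjacency probabilities (assumed form).
    entropy = -sum((n / n_pairs) * math.log(n / n_pairs)
                   for n in pairs.values())
    contagion = 2 * math.log(k) - entropy  # relation given in the text
    return {"mx": mx, "inv_simpson": inv_simpson,
            "entropy": entropy, "contagion": contagion}


# Hypothetical 2 x 2 checkerboard of two classes.
print(spatial_measures([["a", "b"], ["b", "a"]]))
```

Under these assumed forms, entropy cannot exceed 2 ln(k), so contagion is never negative, and a map with a single class gives zero for both.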
Table 2 presents regression summaries for the two stage I models. The
independent variables are the spatial variables measured at 30 m as described
above. The dependent variables are a) the intercepts and b) the slopes of the proportion transition lines as determined using a linear least squares fit between the
30 and 1020 m proportions for the classes present within each sampling unit.
Table 1.-Cross-scale correlations (scale-invariance) of independent variables.
Table 2.-Stage I models. Slope model R²adj = 0.63. Intercept model R²adj = 0.70.

Slope Model
Variable           Coefficient   Standard Error   t-value   P > |t|
$\beta_0$            -5.77          1.05           -5.49     0.00
$mx_{30}$             4.95          0.84            5.92     0.00
$s^{-1}_{30}$        -0.74          0.22           -3.33     0.002
$c_{30}$             -0.679         0.17           -4.06     0.0002
$ent_{30}$            2.83          0.49            5.83     0.00

Intercept Model
Variable           Coefficient   Standard Error   t-value   P > |t|
$\beta_0$             0.0042        0.093           0.045    0.96
$mx_{30}$            -0.30          0.074          -4.054    0.0002
$s^{-1}_{30}$         0.067         0.020           3.3904   0.0015
$c_{30}$              0.113         0.015           7.68     0.00
$ent_{30}$           -0.145         0.043          -3.39     0.0015
The stage I models described in Table 2 estimate the $\hat{\beta}_0$ and $\hat{\beta}_1$ coefficients
necessary for employing the stage II model (Eq. 2). This sequential modeling
process is conducted three times. First, the stage I models are developed using
the data from the Plumas (calibration site). The predicted values from these
models are then used in Eq. 2 to test the overall modeling process for the calibration data. Second, the stage I models are applied in a predictive mode using the
independent variables as measured at 30 m from the Stanislaus (validation site).
Again, predicted $\hat{\beta}_0$ and $\hat{\beta}_1$ values supply the coefficients to run Eq. 2 and estimate
30 m proportions for the Stanislaus. Third, the stage I models are applied using the
variables measured at 1020 m from the Stanislaus, and the results are again used
to estimate 30 m proportions. The first two cases are forward models in the sense
that they require high resolution information to perform the correction. The third
is an inverted model, because it relies only on coarse resolution data. The results
from these three tests are presented in Figures 2, 3, and 4.
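The sequential use of the two stages might be sketched as follows. The coefficients are those read from Table 2; everything else is an assumption for illustration: the function names, the input values, and the stage II form (coarse proportion minus intercept, divided by slope). The source does not say whether corrected estimates are clipped or renormalized, so neither is done here.

```python
# Coefficients read from Table 2, ordered: constant, mx, s^-1, c, ent.
SLOPE_COEFS = (-5.77, 4.95, -0.74, -0.679, 2.83)
INTERCEPT_COEFS = (0.0042, -0.30, 0.067, 0.113, -0.145)


def stage_one(coefs, mx, inv_simpson, contagion, entropy):
    """Stage I: linear prediction of a slope or intercept from spatial measures."""
    const, b_mx, b_s, b_c, b_ent = coefs
    return (const + b_mx * mx + b_s * inv_simpson
            + b_c * contagion + b_ent * entropy)


def stage_two(coarse_prop, intercept, slope):
    """Stage II (assumed form): invert the proportion transition line."""
    return (coarse_prop - intercept) / slope


def correct_unit(coarse_props, measures):
    """Correct every class proportion in one sampling unit."""
    slope = stage_one(SLOPE_COEFS, **measures)
    intercept = stage_one(INTERCEPT_COEFS, **measures)
    # Note: a slope of zero would raise ZeroDivisionError; not handled here.
    return {cls: stage_two(p, intercept, slope)
            for cls, p in coarse_props.items()}


# Hypothetical inputs for one sampling unit (not values from the paper).
measures = {"mx": 0.5, "inv_simpson": 3.0, "contagion": 0.7, "entropy": 2.5}
coarse_props = {"conifer": 0.55, "brush": 0.25, "barren": 0.20}
print(correct_unit(coarse_props, measures))
```

Passing measures computed at 30 m corresponds to the forward cases, and passing measures computed at 1020 m corresponds to the inverted case. Because the inverse is linear, estimates can fall outside the 0 to 1 range and need not sum to one.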
DISCUSSION and CONCLUSIONS
Like Figure 1, Figures 2a, 3a, and 4a illustrate the basic scaling
effects for class proportions. Note that at 1020 m, the greatest underestimations
occur for intermediate-small classes, and the greatest overestimations occur for
intermediate-large classes. Very large, very small, and moderate sized classes (at
the cross-over point, around 30%) are all reasonably estimated at 1020 m.
The goal of the two stage modeling procedure is to pull the coarse-resolution
area estimates closer to the zero-one line. Figures 2b and 3b demonstrate that the
model improves area estimates for both the calibration and the validation sites
when operated in the forward mode. The correction procedure performs best for
large classes. For small classes a notable dip (albeit reduced) below the zero-one
line still occurs. An intercept effect is also evidenced by the vertical alignment of
estimates above the zero value of the x-axis. Results from the inverted model
(Figure 4b) also show general improvement, although considerable scatter occurs
for very large classes (note the 2 outliers).
Figure 2.-Forward model results for the calibration site. [Panel titles recovered from the scan: "Forward Model: Plumas" and "Forward Model: Total Error by Region". Axis labels: Measured Proportions at 30 m; Pre-Correction Error. Boxplot groups: Pre-Correction, Post-Correction. Reference line: Zero-One Line.]
Figure 3.-Forward model results for the validation site. [Panel titles recovered from the scan: "Forward Model: Stanislaus" and "Forward Model: Total Error by Region". Axis labels: Measured Proportions at 30 m; Pre-Correction Error. Boxplot groups: Pre-Correction, Post-Correction. Reference line: Zero-One Line.]
Once corrected values are derived, it is possible to tabulate the total absolute
error within each sampling unit. For a given sampling unit g, the total error is:
$$E_g = \sum_{i=1}^{k} \left| P_{i,1020} - P_{i,30} \right|$$
where $P_{i,1020}$ and $P_{i,30}$ are the proportions for class i at 1020 and 30 m, respectively. By calculating these values for both pre- and postcorrection data, the
results can readily be compared. Figures 2c and 3c show the relationship between
pre- and postcorrection error for the calibration and validation sites using the forward model. For any point falling below the zero-one line, the total error is
reduced by the correction procedure. At both sites, the total error for roughly
90% of the sampling units is either reduced or unchanged after correction. Conversely, the error for roughly 10% of the units is increased. For the inverted
model (Figure 4c), 80% of the regions are either improved or unchanged after
correction, with two positive outliers (regions 21 and 25).
Figures 2d, 3d, and 4d are boxplots comparing the distributions of the pre- and
postcorrection total error values. In all cases, the correction procedure results
in a significant reduction in error. For the inverted model the test was performed
after removing the outliers. In each case, however, the T-tests are suspect due to
unequal variance.
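The error tabulation can be sketched as below, taking the total error of a sampling unit to be the sum over classes of the absolute difference between an estimated proportion and the 30 m proportion. That form is an assumption based on the phrase "total absolute error"; the function names and sample values are hypothetical.

```python
def total_absolute_error(estimated, reference):
    """Sum over classes of |estimated proportion - reference proportion|."""
    classes = set(estimated) | set(reference)
    # A class missing from one map is treated as having proportion zero there.
    return sum(abs(estimated.get(c, 0.0) - reference.get(c, 0.0))
               for c in classes)


def share_not_worse(pre_errors, post_errors):
    """Fraction of sampling units whose error is reduced or unchanged."""
    not_worse = sum(1 for pre, post in zip(pre_errors, post_errors)
                    if post <= pre)
    return not_worse / len(pre_errors)


# Hypothetical proportions for one sampling unit.
p30 = {"conifer": 0.50, "brush": 0.40, "hardwood": 0.10}
p1020 = {"conifer": 0.60, "brush": 0.40}            # pre-correction
corrected = {"conifer": 0.53, "brush": 0.41, "hardwood": 0.06}

pre = total_absolute_error(p1020, p30)
post = total_absolute_error(corrected, p30)
print(pre, post)
```

Applying `share_not_worse` to the per-unit pre- and postcorrection errors gives the kind of summary quoted in the text, such as the share of units that are improved or unchanged.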
Figure 4.-Inverted model results for the validation site. [Panel titles recovered from the scan: "Inverted Model: Stanislaus", "Inverted Model: Total Error by Region", and "Inverted Model: Two Outliers Removed". Axis labels: Measured Proportions at 30 m; Pre-Correction Error. Boxplot groups: Pre-Correction, Post-Correction.]
Several interesting questions deserve continued attention. How do the spatial measures used govern the proportion transitions? Will the scale-invariance
property transfer to other landscape types?
Might other scale-invariant measures better predict proportion transitions? Is the
general procedure extensible across landscape types? What is the effect of constraining the intercept to zero? What are the sensitivities of the cross-over point
seen in Figure 1? Resolving these and other issues will help formalize a body of
understanding of how class proportions scale. This understanding hopefully will
lead to improved land-cover area estimates at local, regional and global scales.
REFERENCES
Adams, J. B., Smith, M. O., and Johnson, P. E. 1986. Spectral mixture modeling:
A new analysis of rock and soil types at the Viking Lander 1 site. J. Geophys. Res. 91(B8):8098-8112.
Baker, W. L. and Cai, Y. 1992. The r.le programs for multiscale analysis of
landscape structure using the GRASS geographical information system.
Landscape Ecology 7(4):291-302.
Brown, P. J. 1982. Multivariate calibration. J. Royal Statistical Soc. 3:287-321.
Cullinan, V. I. and Thomas, I. M. 1992. A comparison of quantitative methods for
examining landscape pattern and scale. Landscape Ecology 7(3):211-227.
Czaplewski, R. L. and Catts, G. P. 1992. Calibration of remotely sensed proportion or area estimates for misclassification error. Remote Sens. Environ.
39:29-43.
Kalkhan, M. A., Reich, R. M., and Czaplewski, R. L. 1995. Evaluation of statistical properties of the inverse estimator for remotely sensed areal estimates
using simple random sampling. Proc. Amer. Soc. Photogramm. and Remote
Sensing Conf., 27 Feb. - 2 Mar. 1995, Charlotte, NC, 3:258-270.
Legendre, P. and Fortin, M-J. 1989. Spatial pattern and ecological analysis.
Vegetatio 80:107-138.
Mayaux, P. and Lambin, E. F. 1995. Estimation of tropical forest area from
coarse spatial resolution data: A two step correction function for proportional errors. Remote Sens. Environ. 53:1-16.
Moody, A. and Woodcock, C. E. 1995. The influence of scale and the spatial
characteristics of landscapes on land-cover mapping using remote sensing.
Landscape Ecology 10(6):363-379.
Moody, A. and Woodcock, C. E. 1994. Scale-dependent errors in the estimation
of land-cover proportions - Implications for global land-cover datasets.
Photogramm. Eng. Remote Sens. 60(5):585-594.
BIOGRAPHICAL SKETCH
Aaron Moody is a geographer at the University of North Carolina, Chapel
Hill with a specialization in remote sensing of vegetation. He holds a Ph.D. from
Boston University and an M.A. from the University of California at Santa Barbara, both in geography.