A Calibration-Based Model for Correcting Area Estimates From Coarse Resolution Land Cover Data

Aaron Moody*

Abstract.-A two-stage modeling strategy significantly improves area estimates by correcting coarse-resolution measurements of class proportions. Stage I models use measurements of scale-invariant landscape spatial properties to estimate the slope and intercept of proportion transition relationships. A stage II model uses a regression estimator to predict true class proportions based on measured coarse-scale proportions and the slope and intercept estimates from the stage I models. Model development and testing on a calibration site is followed by testing and inversion for a validation site. Inversion involves using spatial variables measured at the coarse scale as input to the stage I models. A probabilistic sampling strategy allows statistical assessment of the models and results.

INTRODUCTION

Image spatial resolution influences land-cover area estimates made from classified remotely sensed data (Mayaux and Lambin 1995; Moody and Woodcock 1994). Models of these resolution effects can improve area estimates derived from coarse-resolution remote sensing. Two approaches are available for modeling this scale-dependent areal bias. Mixture models, incorporated into the classification process, can estimate the subpixel composition of pixels if pure-class spectra are known (Adams et al. 1986). Alternatively, calibrated models, applied in a post-classification mode, can improve area estimates from coarse-resolution data if the relationship between "true" and measured proportions can be modeled (Mayaux and Lambin 1995; Kalkhan et al. 1995). This paper describes the development and evaluation of three linked statistical models that provide post-classification correction of area estimates.
The first two models estimate the slope and intercept of a line characterizing the relationship between true and coarse-scale proportions. Here, "true" proportions are those measured at 30 m, and coarse-scale proportions are those measured at 1020 m. The slope and intercept are modeled from a small set of relatively scale-invariant, measurable spatial properties of the landscape. The third model uses these estimated slopes and intercepts to predict 30 m proportions from proportions measured at 1020 m.

*Assistant Professor, Department of Geography, University of North Carolina, Chapel Hill, NC 27599

BACKGROUND

Two basic spatial effects contribute to biased area estimates. A first-order effect is the tendency of large classes to increasingly dominate the landscape when measured at increasingly coarse scales. Accordingly, small classes tend to diminish in size. Figure 1 illustrates these patterns for all the data used in this study. Second-order effects are modulations of these basic patterns due to specific landscape spatial organization (Moody and Woodcock 1995). These effects produce the scatter about the smoothed fit visible in Figure 1d.

In either case, scale-dependent changes in the apparent area of classes result from class membership transitions between scales. This effect can thus be characterized in terms of proportion transition lines relating true and coarse-resolution proportions. The slope of such a line summarizes the rate of transition. If transition rates depend partly on landscape spatial organization, it is sensible to model them using measures of spatial pattern. A variety of spatial measures exist, and many reviews and summaries appear in the vegetation analysis and landscape ecology literature (Legendre and Fortin 1989; Cullinan and Thomas 1992).
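As an illustration of a proportion transition line, the following minimal sketch fits coarse-scale proportions against true proportions by ordinary least squares for a single sampling unit. The function name `fit_transition_line` and the five class proportions are invented for illustration and are not taken from the study data.

```python
def fit_transition_line(p_true, p_coarse):
    """Ordinary least-squares fit of p_coarse = b0 + b1 * p_true.

    Returns (b0, b1): the intercept and slope of the transition line.
    """
    n = len(p_true)
    if n != len(p_coarse) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(p_true) / n
    mean_y = sum(p_coarse) / n
    sxx = sum((x - mean_x) ** 2 for x in p_true)
    if sxx == 0:
        raise ValueError("true proportions must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(p_true, p_coarse))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical class proportions for one sampling unit (invented numbers).
p30 = [0.05, 0.10, 0.15, 0.25, 0.45]    # "true" proportions at 30 m
p1020 = [0.02, 0.06, 0.12, 0.26, 0.54]  # proportions measured at 1020 m

b0, b1 = fit_transition_line(p30, p1020)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
```

In this made-up unit the slope exceeds one and the intercept is negative, which is the pattern expected when large classes gain area and small classes lose area at the coarser scale.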
In this research, landscape spatial properties are used to model the slopes and intercepts of the proportion transition lines for a large set of sampling units in independent calibration and validation sites. These are the stage I models. Model inversion requires that the landscape pattern measures be relatively scale-invariant, or resistant to changes in resolution.

Figure 1.-Relationship between proportions at 30 m and at 4 coarser scales. (Four panels; axis label: Measured Proportions at 30 m.)

There are several calibration-based methods for improving coarse-resolution area estimates (Czaplewski and Catts 1992; Kalkhan et al. 1995). The model used here is a form of "classical" model in which known but incorrect values are estimated using unknown correct values (Brown 1982). For example, if $P_0$ represents true proportions and $P_r$ represents measured proportions at some coarse spatial resolution $r$, then:

$$P_r = \beta_0 + \beta_1 P_0 + \text{error} \qquad (1)$$

where $\beta_0$ and $\beta_1$ are the intercept and slope of the proportion transition line that relates true proportions to proportions measured at resolution $r$. Inverting this simple model provides:

$$\hat{P}_0 = \frac{P_r - \beta_0}{\beta_1} \qquad (2)$$

which is the stage II model used in this paper.

A five-part strategy is employed as follows:
a) identify a set of scale-invariant spatial measures;
b) use a subset of these measures, calculated at 30 m ($X_{30}$), to develop stage I models for predicting the slope ($\beta_1$) and intercept ($\beta_0$) of the proportion transition lines for the calibration site;
c) predict 30 m proportions ($P_0$) for the calibration site by applying the stage II model in Eq. 2 using the measured 1020 m proportions ($P_r$) and the slopes ($\hat{\beta}_1$) and intercepts ($\hat{\beta}_0$) estimated from the stage I models;
d) using the stage I models developed on the calibration site, repeat step c and evaluate the procedure when applied to the data from the validation site;
e) invert and evaluate the procedure by running the stage I models using the spatial variables as measured at 1020 m ($X_{1020}$), and supply the results to the stage II model to predict 30 m proportions based only on information measured at 1020 m for the validation site.

METHODS and RESULTS

The Plumas and Stanislaus National Forests are used as calibration and validation sites, respectively. Landsat Thematic Mapper data have been classified to produce maps of general land-cover categories for each site. Classes include barren, brush, hardwood, water, and conifer. Although the two sites have similar characteristics, they are spatially separated by roughly 2° of latitude.

The data for each site are aggregated to 1020 m resolution using a plurality-based aggregation procedure. This involves coding each grid cell in a 1020 m sampling grid with the most frequently occurring subgrid-cell class. A set of randomly located 238 x 238 pixel subregions serve as the sampling units for the analyses. Each unit contains 56,644 30 m pixels and 49 1020 m pixels. The Plumas contains fifty sampling units for model development and initial testing. The Stanislaus contains thirty-five units for model validation and model inversion. The number of units represents 30% of all possible such units from each site.

Within each sampling unit the following measurements are collected: proportions at 30 m for each class, proportions at 1020 m for each class, a set of spatial measures at 30 m, and the same set of spatial measures at 1020 m. The ultimate goal is to estimate 30 m proportions by supplying 1020 m area measurements and slope and intercept coefficients to the stage II model (Eq. 2).
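The plurality-based aggregation and the per-unit class proportions described above can be sketched as follows. This is a toy illustration and not the processing chain used in the study: the function names, the 4 x 4 grid, and the tie-breaking rule are assumptions. In the study, each 1020 m cell covers a 34 x 34 block of 30 m pixels (1020 / 30 = 34), whereas the example uses a factor of 2 to stay readable.

```python
from collections import Counter


def plurality_aggregate(grid, factor):
    """Code each coarse cell with the most frequent class among its fine cells."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    coarse = []
    for r0 in range(0, rows, factor):
        coarse_row = []
        for c0 in range(0, cols, factor):
            counts = Counter(
                grid[r][c]
                for r in range(r0, r0 + factor)
                for c in range(c0, c0 + factor)
            )
            top = max(counts.values())
            # Assumption: ties go to the smallest class label; the paper
            # does not say how ties are resolved.
            coarse_row.append(min(k for k, v in counts.items() if v == top))
        coarse.append(coarse_row)
    return coarse


def class_proportions(grid):
    """Proportion of cells in each class, keyed by class label."""
    counts = Counter(cell for row in grid for cell in row)
    total = sum(counts.values())
    return {k: counts[k] / total for k in sorted(counts)}


# Invented 4 x 4 fine-resolution map, aggregated by a factor of 2.
fine = [
    ["conifer", "conifer", "brush", "conifer"],
    ["conifer", "brush", "conifer", "conifer"],
    ["water", "conifer", "brush", "brush"],
    ["conifer", "conifer", "brush", "conifer"],
]
coarse = plurality_aggregate(fine, 2)
print(class_proportions(fine))
print(class_proportions(coarse))
```

In this toy map the dominant class grows from 0.625 to 0.75 of the area after aggregation, and the smallest class disappears entirely.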
An intermediate goal is to estimate the proper slopes and intercepts using a multiple regression model with a parsimonious set of spatial measures as the independent variables. The slopes and intercepts of the proportion transition lines are the dependent variables in these stage I models.

A variety of spatial measures are determined within each unit using the r.le software (Baker and Cai 1992). Five of these demonstrate scale invariance, as determined by examining the simple correlation between each variable at 30 m and the same variable at 1020 m. Of these five, three prove significant in modeling both the slope and the intercept of the proportion transition lines determined within each sampling unit. An additional variable ($c$ below) that lacks the scale-invariance property is included because it characterizes an important landscape characteristic not otherwise represented in the model. The four variables used are maximum class size ($mx$), inverse Simpson's index ($s^{-1}$), contagion ($c$), and entropy ($ent$). Expressions for the latter three variables are:

$$s^{-1} = \left( \sum_{i=1}^{k} P_i^2 \right)^{-1} \qquad (3)$$

$$c = 2\ln(k) - ent \qquad (4)$$

$$ent = -\sum_{i=1}^{k} \sum_{j=1}^{k} P_{ij} \ln(P_{ij}) \qquad (5)$$

where $k$ is the number of classes present, $P_i$ is the proportion of class $i$ in the sampling unit, and $P_{ij}$ are elements of a $k \times k$ co-occurrence matrix and represent adjacency probabilities between classes $i$ and $j$. Maximum class size refers to the proportion of the largest class in the sampling unit. Simpson's index indicates the probability of randomly selecting two pixels of the same attribute. Contagion measures the degree of clumping in the landscape. Entropy is maximized when all pixels of a given class are as far away from one another as possible.

Table 1 shows the cross-scale correlation matrix for this set of variables. Table 2 presents regression summaries for the two stage I models. The independent variables are the spatial variables measured at 30 m as described above.
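The four spatial variables can be computed for a small class grid as in the sketch below. It is not the r.le implementation. It assumes that inverse Simpson's index is the reciprocal of the sum of squared class proportions, that entropy is the negative sum of P_ij ln(P_ij) over the co-occurrence matrix, and that adjacency means horizontally or vertically neighboring cells with each pair counted in both directions. The function name and the example grid are hypothetical.

```python
import math
from collections import Counter


def spatial_measures(grid):
    """Return mx, inverse Simpson's index, entropy and contagion for a class grid."""
    rows, cols = len(grid), len(grid[0])
    counts = Counter(cell for row in grid for cell in row)
    n_cells = rows * cols
    props = [v / n_cells for v in counts.values()]
    k = len(counts)  # number of classes present

    mx = max(props)                                # maximum class size
    inv_simpson = 1.0 / sum(p * p for p in props)  # assumed form: 1 / sum(P_i^2)

    # Assumption: co-occurrence counts use 4-neighbor adjacency, with every
    # adjacent pair counted in both directions (symmetric matrix).
    pairs = Counter()
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                pairs[(grid[r][c], grid[r][c + 1])] += 1
                pairs[(grid[r][c + 1], grid[r][c])] += 1
            if r + 1 < rows:
                pairs[(grid[r][c], grid[r + 1][c])] += 1
                pairs[(grid[r + 1][c], grid[r][c])] += 1
    total = sum(pairs.values())

    ent = 0.0
    for v in pairs.values():
        p_ij = v / total
        ent -= p_ij * math.log(p_ij)  # assumed form: -sum(P_ij * ln P_ij)

    contagion = 2 * math.log(k) - ent  # c = 2 ln(k) - ent
    return {"mx": mx, "inv_simpson": inv_simpson, "ent": ent, "contagion": contagion}


# Invented 4 x 4 unit: left half class "a", right half class "b".
grid = [["a", "a", "b", "b"] for _ in range(4)]
m = spatial_measures(grid)
print(m)
```

Running the same function on the 30 m grid and on its 1020 m aggregate gives the paired values whose cross-scale correlation is used in the text to judge scale invariance.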
The dependent variables are a) the intercepts and b) the slopes of the proportion transition lines as determined using a linear least squares fit between the 30 m and 1020 m proportions for the classes present within each sampling unit.

Table 1.-Cross-scale correlations (scale-invariance) of independent variables.

Table 2.-Stage I models. Slope model $R^2_{adj}$ = 0.63. Intercept model $R^2_{adj}$ = 0.70.

Slope Model
Variable         Coefficient   Standard Error   t-value   P > |t|
$\beta_0$        -5.77         1.05             -5.49     0.00
$mx_{30}$         4.95         0.84              5.92     0.00
$s^{-1}_{30}$    -0.74         0.22             -3.33     0.002
$c_{30}$         -0.679        0.17             -4.06     0.0002
$ent_{30}$        2.83         0.49              5.83     0.00

Intercept Model
Variable         Coefficient   Standard Error   t-value   P > |t|
$\beta_0$         0.0042       0.093             0.045    0.96
$mx_{30}$        -0.30         0.074            -4.054    0.0002
$s^{-1}_{30}$     0.067        0.020             3.3904   0.0015
$c_{30}$          0.113        0.015             7.68     0.00
$ent_{30}$       -0.145        0.043            -3.39     0.0015

The stage I models described in Table 2 estimate the $\hat{\beta}_0$ and $\hat{\beta}_1$ coefficients necessary for employing the stage II model (Eq. 2). This sequential modeling process is conducted three times. First, the stage I models are developed using the data from the Plumas (calibration site). The predicted values from these models are then used in Eq. 2 to test the overall modeling process for the calibration data. Second, the stage I models are applied in a predictive mode using the independent variables as measured at 30 m from the Stanislaus (validation site). Again, the predicted $\hat{\beta}_0$ and $\hat{\beta}_1$ values supply the coefficients to run Eq. 2 and estimate 30 m proportions for the Stanislaus. Third, the stage I models are applied using the variables measured at 1020 m from the Stanislaus, and the results are again used to estimate 30 m proportions.

The first two cases are forward models in the sense that they require high-resolution information to perform the correction. The third is an inverted model because it relies only on coarse-resolution data. The results from these three tests are presented in Figures 2, 3, and 4.

DISCUSSION and CONCLUSIONS

Like Figure 1, Figures 2a, 3a, and 4a illustrate the basic scaling effects for class proportions.
Note that at 1020 m, the greatest underestimations occur for intermediate-small classes, and the greatest overestimations occur for intermediate-large classes. Very large, very small, and moderate-sized classes (at the cross-over point, around 30%) are all reasonably estimated at 1020 m.

The goal of the two-stage modeling procedure is to pull the coarse-resolution area estimates closer to the zero-one line. Figures 2b and 3b demonstrate that the model improves area estimates for both the calibration and the validation sites when operated in the forward mode. The correction procedure performs best for large classes. For small classes a notable dip (albeit reduced) below the zero-one line still occurs. An intercept effect is also evidenced by the vertical alignment of estimates above the zero value of the x-axis. Results from the inverted model (Figure 4b) also show general improvement, although considerable scatter occurs for very large classes (note the 2 outliers).

Figure 2.-Forward model results for the calibration site. (Panel titles include "Forward Model: Plumas" and "Forward Model: Total Error by Region"; axis labels: Measured Proportions at 30 m, Pre-Correction Error; boxplot groups: Pre-Correction, Post-Correction; reference line: Zero-One Line.)

Figure 3.-Forward model results for the validation site. (Panel titles include "Forward Model: Stanislaus" and "Forward Model: Total Error by Region"; axis labels: Measured Proportions at 30 m, Pre-Correction Error; boxplot groups: Pre-Correction, Post-Correction; reference line: Zero-One Line.)

Once corrected values are derived, it is possible to tabulate the total absolute error within each sampling unit. For a given sampling unit $g$, the total error is:

$$E_g = \sum_{i=1}^{k} \left| P_{i,1020} - P_{i,30} \right| \qquad (6)$$

where $P_{i,1020}$ and $P_{i,30}$ are the proportions for class $i$ at 1020 and 30 m, respectively. By calculating these values for both pre- and postcorrection data, the results can readily be compared.
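The pre- and postcorrection comparison can be sketched for one sampling unit as follows. The correction step inverts the transition line by subtracting the estimated intercept from each coarse proportion and dividing by the estimated slope, and the error is the sum of absolute differences from the 30 m proportions. All numbers are invented, and `b0_hat` and `b1_hat` are hypothetical stand-ins for values that a stage I model would predict.

```python
def stage2_correct(p_coarse, b0_hat, b1_hat):
    """Estimate fine-scale proportions as (P_r - b0_hat) / b1_hat per class."""
    if b1_hat == 0:
        raise ValueError("slope must be non-zero to invert the transition line")
    return [(p - b0_hat) / b1_hat for p in p_coarse]


def total_abs_error(p_est, p_ref):
    """Total absolute error over classes within one sampling unit."""
    if len(p_est) != len(p_ref):
        raise ValueError("inputs must have the same number of classes")
    return sum(abs(e - t) for e, t in zip(p_est, p_ref))


# Hypothetical sampling unit with five classes (invented numbers).
p30 = [0.05, 0.10, 0.15, 0.25, 0.45]    # reference proportions at 30 m
p1020 = [0.02, 0.06, 0.12, 0.26, 0.54]  # proportions measured at 1020 m
b0_hat, b1_hat = -0.066, 1.33           # assumed stage I predictions

corrected = stage2_correct(p1020, b0_hat, b1_hat)
pre = total_abs_error(p1020, p30)       # error before correction
post = total_abs_error(corrected, p30)  # error after correction
print(f"pre-correction error = {pre:.3f}, post-correction error = {post:.3f}")
```

A unit plots below the zero-one line of a pre- versus postcorrection error plot whenever `post` is smaller than `pre`, as it is for these invented values.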
Figures 2c and 3c show the relationship between pre- and postcorrection error for the calibration and validation sites using the forward model. For any point falling below the zero-one line, the total error is reduced by the correction procedure. At both sites, the total error for roughly 90% of the sampling units is either reduced or unchanged after correction. Conversely, the error for roughly 10% of the units is increased. For the inverted model (Figure 4c), 80% of the regions are either improved or unchanged after correction, with two positive outliers (regions 21 and 25).

Figures 2d, 3d, and 4d are boxplots comparing the distributions of the pre- and postcorrection total error values. In all cases, the correction procedure results in a significant reduction in error. For the inverted model the test was performed after removing the outliers. In each case, however, the t-tests are suspect due to unequal variance.

Figure 4.-Inverted model results for the validation site. (Panel titles include "Inverted Model: Stanislaus", "Inverted Model: Total Error by Region", and "Inverted Model: Two Outliers Removed"; axis labels: Measured Proportions at 30 m, Pre-Correction Error; boxplot groups: Pre-Correction, Post-Correction.)

Several interesting questions deserve continued attention. How do the spatial measures used govern the proportion transitions? Will the scale-invariance property transfer to other landscape types? Might other scale-invariant measures better predict proportion transitions? Is the general procedure extensible across landscape types? What is the effect of constraining the intercept to zero? What are the sensitivities of the cross-over point seen in Figure 1? Resolving these and other issues will help formalize a body of understanding of how class proportions scale. This understanding hopefully will lead to improved land-cover area estimates at local, regional, and global scales.

REFERENCES

Adams, J. B., Smith, M. O., and Johnson, P. E. 1986. Spectral mixture modeling: A new analysis of rock and soil types at the Viking Lander 1 site. J. Geophy. Res. 91(B8):8098-8112.
Baker, W. L. and Cai, Y. 1992. The r.le programs for multiscale analysis of landscape structure using the GRASS geographical information system. Landscape Ecology 7(4):291-302.
Brown, P. J. 1982. Multivariate calibration. J. Royal Statistical Soc. 3:287-321.
Cullinan, V. I. and Thomas, I. M. 1992. A comparison of quantitative methods for examining landscape pattern and scale. Landscape Ecology 7(3):211-227.
Czaplewski, R. L. and Catts, G. P. 1992. Calibration of remotely sensed proportion or area estimates for misclassification error. Remote Sens. Environ. 39:29-43.
Kalkhan, M. A., Reich, R. M., and Czaplewski, R. L. 1995. Evaluation of statistical properties of the inverse estimator for remotely sensed areal estimates using simple random sampling. Proc. Amer. Soc. Photogramm. and Remote Sensing Conf., 27 Feb. - 2 Mar. 1995, Charlotte, NC, 3:258-270.
Legendre, P. and Fortin, M-J. 1989. Spatial pattern and ecological analysis. Vegetatio 80:107-138.
Mayaux, P. and Lambin, E. F. 1995. Estimation of tropical forest area from coarse spatial resolution data: A two step correction function for proportional errors. Remote Sens. Environ. 53:1-16.
Moody, A. and Woodcock, C. E. 1995. The influence of scale and the spatial characteristics of landscapes on land-cover mapping using remote sensing. Landscape Ecology 10(6):363-379.
Moody, A. and Woodcock, C. E. 1994. Scale-dependent errors in the estimation of land-cover proportions - Implications for global land-cover datasets. Photogramm. Eng. Remote Sens. 60(5):585-594.

BIOGRAPHICAL SKETCH

Aaron Moody is a geographer at the University of North Carolina, Chapel Hill, with a specialization in remote sensing of vegetation. He holds a Ph.D. from Boston University and an M.A. from the University of California at Santa Barbara, both in geography.