The propagation of the uncertainty of land use maps to modeled landscape dynamics

Shoufan Fang1, George Z. Gertner1, Guangxing Wang1, and Alan B. Anderson2

1 Department of Natural Resources and Environmental Sciences, University of Illinois at Urbana-Champaign, W503 Turner Hall, 1102 S. Goodwin Avenue, Urbana, IL 61801, USA. E-mail: Gertner@uiuc.edu
2 US Army Corps of Engineers, Construction Engineering Research Laboratory, 9000 Research Park, Champaign, IL 61824, USA.

__________________________________________________________________________________
Abstract
Land use maps are widely used in modeling land use change, urban sprawl, and other landscape-related studies. The misclassification rate of a land use map is usually provided as a measure of its quality. However, this important information is rarely considered in land-use-based studies, especially in modeling landscape dynamics. Ignoring the uncertainty of land use maps may cause models to produce unreliable predictions. This study investigates the impact of the accuracy of land use maps on modeling urban sprawl. The regional confusion matrix was localized using a topographical map, and several uncertainty levels were adopted based on the regional and local confusion matrices. The error rates in the localized confusion matrix changed markedly, reflecting the character of the study area. The predictions made at different uncertainty levels were quite different. The sources of uncertainty are also analyzed.
__________________________________________________________________________________
Introduction
Spatial modeling of ecosystem dynamics has grown rapidly in landscape-based studies (Fang et al., 2005a). Previous spatial modeling usually used land use/cover maps as base maps and implicitly assumed that the maps were error free or that their errors were negligible.
However, this assumption is doubtful because of classification errors, usually summarized in a confusion matrix, that arise from remote sensors and data processing (Lunetta et al., 1991). Fang et al. (2002) showed that classification error was a major source of uncertainty and that its influence was significant. Therefore, misclassification of land use maps has to be considered in spatial modeling in order to increase the reliability of spatially modeled ecosystem dynamics.

Large trees can conceal the urban development around them in urban and suburban areas. As a result, urbanized land can be misclassified as forest when satellite imagery is used to make land use maps. If a land use map covers a large area, its confusion matrix may not represent conditions in a small study area with this character, so the confusion matrix needs to be modified for the small area. Suitable historical information has to be used for the modification: once a land use map has been published, ground observations can no longer be collected for this purpose, because ground conditions have changed since the satellite imagery was taken.

The objectives of this study are to use a topographical map as historical information to modify the confusion matrix of a land use map for a study area, in order to obtain more realistic error rates, and to investigate the influence of misclassification in a land use map on the predicted probability of urban sprawl.

Study Area, Materials, and Methodology
This study was conducted in the east of Peoria, Illinois, USA. The study area measured 5.7 × 6.9 km. It contained mainly a lake, hills, commercial-industrial and residential districts, forests, agricultural fields, and wetland. One of the major characteristics of this area is that many houses (residential land use) have been built on a hill and are covered by tall trees.
The 1993 land use map (resolution: 30 × 30 m), its accuracy, and the feature maps for predicting urban sprawl in the study area were provided by the research team of the Land Use Evolution and Impact Assessment Model (LEAM). The confusion matrix of the land use map was converted from the map accuracy (Table 1).

Table 1 The regional confusion matrix obtained from the EPA accuracy table. Rows are the classified categories.

Classified   10      20      30      40      50      80      90      Total
10           0.910   0.002   0.006   0.050   0.000   0.000   0.032   1.000
20           0.004   0.617   0.005   0.068   0.004   0.291   0.012   1.000
30           0.000   0.005   0.110   0.573   0.029   0.111   0.170   0.999
40           0.001   0.001   0.001   0.822   0.003   0.051   0.120   1.000
50           0.045   0.012   0.008   0.211   0.087   0.362   0.275   1.000
80           0.000   0.003   0.001   0.017   0.001   0.967   0.010   1.000
90           0.000   0.002   0.006   0.212   0.008   0.106   0.667   1.000

The topographical map (scale: 1:24000) made by the United States Geological Survey (USGS) in 1996 was used to modify the confusion matrix. The map was scanned into an image file. The resolution of the scanned map was 3.9 and 4.2 meters in the north-south and east-west directions, respectively. A block (matrix) of 56 pixels (8 rows by 7 columns) on the scanned topographical map was used to match one pixel on the 1993 land use map. Six distinct points were selected to estimate the coefficients of a pair of models that overlay the two maps, using least squares adjustment (Wolf and Ghilani, 1997):

\[
\begin{cases}
x_{tp} = A_{1,0} + A_{1,1} \cdot x_{LU} + A_{1,2} \cdot y_{LU} \\
y_{tp} = A_{2,0} + A_{2,1} \cdot x_{LU} + A_{2,2} \cdot y_{LU}
\end{cases}
\tag{1}
\]

where x and y are the map coordinates, the subscripts tp and LU denote the topographical and land use maps, respectively, and the A's are the unknown coefficients.

About 0.5% of the pixels on the 1993 land use map were randomly selected to modify the original confusion matrix.
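As an illustration of the least squares fit in Eq. (1), the following minimal Python sketch estimates the six coefficients from matched control points by solving the normal equations. It is not the authors' implementation: the function names (`solve_linear_system`, `fit_affine`) and the control-point coordinates are hypothetical, chosen only to show the mechanics.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's matrix is left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("control points are collinear or too few")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_affine(lu_points: Sequence[Point],
               tp_points: Sequence[Point]) -> Tuple[List[float], List[float]]:
    """Return ([A10, A11, A12], [A20, A21, A22]) of Eq. (1) by ordinary least squares."""
    design = [[1.0, x, y] for x, y in lu_points]
    # Normal equations (D^T D) a = D^T t; D^T D is shared by both models.
    dtd = [[sum(row[i] * row[j] for row in design) for j in range(3)] for i in range(3)]
    coeffs = []
    for axis in (0, 1):  # 0 -> x_tp model, 1 -> y_tp model
        dtt = [sum(row[i] * tp[axis] for row, tp in zip(design, tp_points))
               for i in range(3)]
        coeffs.append(solve_linear_system(dtd, dtt))
    return coeffs[0], coeffs[1]


# Hypothetical control points (land use map -> topographical map), for illustration only.
lu_demo = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0), (5.0, 2.0), (3.0, 8.0)]
tp_demo = [(5.0 + 1.05 * x + 0.12 * y, -3.0 - 0.11 * x + 0.94 * y) for x, y in lu_demo]
a_x, a_y = fit_affine(lu_demo, tp_demo)
print([round(v, 4) for v in a_x], [round(v, 4) for v in a_y])
```

With real map coordinates, which can be in the order of millions of meters, it would be prudent to subtract the mean coordinates before forming the normal equations, since large offsets make them poorly conditioned.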
The majority method was used to determine the category of each sampled pixel on the topographical map, that is, the category covering most of the matching block, and the classification from the topographical map was treated as the “truth” in modifying the original confusion matrix.

Table 2 Levels of uncertainty adopted in predicting the probability of urban sprawl.

Level   Description
ER0     Treat the land use map as the “truth”
ER1     Original error rates on neighboring cells
ER2     Modified error rates on neighboring cells
ER3     Original error rates on all cells
ER4     Modified error rates on all cells

The uncertainty in the 1993 land use map was assumed at five levels in the probability prediction, based on the regional and modified confusion matrices (Table 2). The probability of urban sprawl was predicted from the 1993 land use map under each assumed uncertainty level. The probability model had the following form (for details see Fang et al., 2005a):

\[
\log\left(\frac{P(U\_R)'}{1 - P(U\_R)'}\right) = \sum_{i=1}^{55} A_i X_i + \sum_{j=1}^{13} B_j U_j
\tag{2}
\]

where P(U_R)’ is the probability of land use being converted from available (Undeveloped) to Residential, A and B are coefficients, and X and U are the mapped features and their cross products.

Map uncertainty was introduced into the prediction in two ways. For uncertainty levels ER1 and ER2, the error rates of the neighboring pixels were applied to compute the effect of the immediate neighboring pixels (N_E):

\[
N\_E = \frac{1}{8} \sum_{k=1}^{8} I(C_k = R) \cdot p(R \mid C_k)
\tag{3}
\]

where I(.) is an indicator function, C_k is the category of the kth neighboring pixel, R is the category Residential, and p(R | C_k) is the rate of R when the pixel was classified as C_k. For uncertainty levels ER3 and ER4, Eq. (3) was first used to introduce the errors from the neighbors, and the predicted probability from Eq. (2) was then adjusted to obtain the final probability:

\[
P(U\_R) = P(U\_R)' \cdot p(av \mid C)
\tag{4}
\]

where P(U_R)’ is the original predicted probability from Eq. (2) and p(av | C) is the rate of the categories available for development when the pixel being estimated was classified as C. For further details about the uncertainty methods described here, see Fang et al. (2005b).

Results
The two fitted models for overlaying the topographical and land use maps were of very high quality (see Table 3). Both R-squares were at least 0.9999, and the residuals were at most about half of the land use pixel width/length.

Table 3 Estimated coefficients and the quality measures of the fitted matching models.

Matching model   A0 (Intercept) (p-value)   A1 (xLU) (p-value)    A2 (yLU) (p-value)   Model’s F-value (p-value)   R-square
xtp              -1206598 (<0.0001)         1.04650 (<0.0001)     0.11640 (<0.0016)    15498.3 (<0.0001)           0.9999
ytp              -857779 (<0.0001)          -0.11314 (<0.0001)    0.93771 (<0.0001)    94304.8 (<0.0001)           1.0000

Residual (unit: meter)
Model   Largest    Median    Mean      Smallest
xtp     -15.956    13.265    11.721    3.223
ytp     4.451      -3.145    -3.747    1.167

More than 80% of the sample fell in the categories Forest and Urbanization on the topographical map, and each of the other categories had only a small number of observations. For this reason, only the rates of Forest and Urbanization were modified for the study area. When a pixel was classified as Forest on the land use map, the rates at which the “truth” was Forest and Urbanization were 0.770 and 0.197, respectively. For pixels classified as Urbanization, the rates at which the truth was Urbanization and Forest were 0.877 and 0.096, respectively. These error rates differed clearly from those in the regional confusion matrix. The error rates of the other categories in the modified confusion matrix were taken from the original confusion matrix.

Figure 1 (panels A to E) Predicted urban sprawl probabilities when different levels of error were used in prediction. From A to E, the uncertainty levels are ER0 to ER4, respectively.

Figure 1 shows the probability maps predicted under the different uncertainty levels.
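To make the two ways of introducing map uncertainty concrete, the following minimal Python sketch evaluates Eq. (3) and Eq. (4) for a single pixel. It reads Eq. (3) literally and uses the localized Forest and Urbanization rates reported above. Several things are assumptions of the sketch rather than statements from the study: the function names, treating Urbanization as the Residential category R, treating Forest as the only category available for development, the neighbor configuration, and the value 0.40 for P(U_R)’.

```python
from typing import Dict, Iterable, Sequence

# p(true category | classified category). The four values are the localized
# rates reported in the text; all other categories are omitted in this sketch.
LOCAL_RATES: Dict[str, Dict[str, float]] = {
    "Forest": {"Forest": 0.770, "Urbanization": 0.197},
    "Urbanization": {"Urbanization": 0.877, "Forest": 0.096},
}


def neighbor_effect(neighbors: Sequence[str],
                    rates: Dict[str, Dict[str, float]],
                    target: str = "Urbanization") -> float:
    """Eq. (3) read literally: (1/8) * sum of I(C_k = R) * p(R | C_k)."""
    if len(neighbors) != 8:
        raise ValueError("expected the 8 immediate neighbors")
    total = 0.0
    for c_k in neighbors:
        if c_k == target:                         # indicator I(C_k = R)
            total += rates[c_k].get(target, 0.0)  # p(R | C_k)
    return total / 8.0


def adjusted_probability(p_prime: float,
                         pixel_class: str,
                         rates: Dict[str, Dict[str, float]],
                         available: Iterable[str]) -> float:
    """Eq. (4): P(U_R) = P(U_R)' * p(av | C)."""
    p_available = sum(rates[pixel_class].get(c, 0.0) for c in available)
    return p_prime * p_available


# Hypothetical pixel: classified Forest, with 3 Urbanization and 5 Forest neighbors.
neighbors_demo = ["Urbanization"] * 3 + ["Forest"] * 5
n_e = neighbor_effect(neighbors_demo, LOCAL_RATES)
# 0.40 is an invented value standing in for the output of Eq. (2).
p_final = adjusted_probability(0.40, "Forest", LOCAL_RATES, available={"Forest"})
print(round(n_e, 6), round(p_final, 6))
```

For comparison, an error-free map (level ER0) would give this pixel a neighbor effect of 3/8 = 0.375 and would leave P(U_R)’ unadjusted.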
When the original confusion matrix was applied only to neighboring pixels, the predicted probability of urban sprawl was almost the same as that predicted without considering misclassification (Figures 1.A and B). When the modified confusion matrix was applied only to neighboring pixels, the predicted probability map differed noticeably (Figures 1.A and C), with much higher probabilities predicted on the hill along the east side of the lake. The much higher Urbanization-in-Forest error rate in the modified confusion matrix seemed to be the major cause of the increased probability. When a confusion matrix was applied to all pixels, the major difference in the predicted probability maps was that a nonzero probability appeared almost everywhere except at water pixels (Figures 1.D and E). When the modified confusion matrix was applied, the probabilities of development at forest pixels were lower than those predicted with the original confusion matrix. This also reflected the higher rate of Urbanization-in-Forest.

Discussion and Conclusion
The modified confusion matrix reflects the ground characteristics better than the original one. It may also reflect differences between the definitions used in the land use map and those used in the historical information that served to localize the confusion matrix.

It is very difficult to eliminate processing uncertainty when historical data are used to modify a confusion matrix. Temporal and scale uncertainty, together with the differences in definitions between land use maps and the historical information used to localize confusion matrices, are the most important sources of uncertainty. When additional processing (such as scanning) is needed to prepare the historical information before sampling, that processing also introduces uncertainty. To reduce the uncertainty of localized confusion matrices, as many of the uncertainty sources mentioned above as possible should be eliminated.
Incorporating the uncertainty of land use maps into land use prediction may change the outcome of spatial modeling. The impact depends on which factors carry uncertainty and how much uncertainty each carries. A spatially specified confusion matrix map (Gertner et al., 2002) is needed to improve spatial modeling that accounts for the uncertainty of land use maps.

Acknowledgments
The authors thank the U.S. Army Corps of Engineers, Construction Engineering Research Laboratory, for its support and the Land Use Evolution and Impact Assessment Model (LEAM) team for providing the maps and the LEAM model.

References
Fang, S., G. Z. Gertner, Z. Sun, and A. A. Anderson (2005a) The impact of interactions in spatial simulation of the dynamics of urban sprawl. Landscape and Urban Planning 73(4):294-306.
Fang, S., G. Z. Gertner, G. Wang, and A. A. Anderson (2005b) The impact of misclassification in land use maps in the prediction of landscape dynamics. Landscape Ecology (Accepted).
Fang, S., S. Wente, G. Z. Gertner, G. Wang, and A. B. Anderson (2002) Uncertainty analysis of predicted disturbance from off-road vehicular traffic in complex landscapes at Fort Hood. Environmental Management 30(2):199-208.
Gertner, G. Z., S. Fang, G. Wang, and S. Shinkareva (2002) Image-aided spatial accuracy assessment of land cover classification. In the International Union of Forestry Research Organization Conference entitled, Symposium on Statistics and Information Technology in Forestry, 09/2002, Blacksburg, Virginia.
Lunetta, R. S., R. G. Congalton, L. K. Fenstermaker, J. R. Jensen, K. C. McGwire, and L. R. Tinney (1991) Remote sensing and geographic information system data integration: Error sources and research issues. Photogrammetric Engineering & Remote Sensing 57(6):677-687.
Wolf, P. R. and C. D. Ghilani (1997) Adjustment computations: statistics and least squares in surveying and GIS. John Wiley, New York.