Proceeding of the 10th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences Florianopolis-SC, Brazil, July 10-13, 2012. Geostatistical methodologies for spatial accuracy assessment and update of land cover maps. João D. Carneiro1 , Amílcar O. Soares2 and Maria J. Pereira2 CERENA/IST, Instituto Superior Técnico Av. Rovisco Pais 1049-001, Lisboa Portugal 1 jcarneiro@geostatisticskitchen.org 2 asoraes@ist.utl.pt 3 maria.pereira@ist.utl.pt Abstract The application of classification algorithms on remote sensed data in order to obtain land use thematic maps is widely used because it is a fast and simple procedure. Difficulties arise, however, when the time comes to make use of this data for land use planning due to the difficulty in correctly assessing classification accuracy. The sole use of the confusion matrix disregards all spatial dependent error information. The use of a geostatistical framework to tackle this issue has achieved some success. Previous works proposed the use of indicator kriging to local varying means and sequential indicator simulation with prediction via collocated indicator cokriging. Such approaches have made significant progress. Two main problems remain however unsolved: the full incorporation of spatial error patterns for each thematic class and the correct assessment of patch size influence for each class.Also, the information contained in the confusion matrix remains to be related with spatial error patterns. In the present work, these two issues are address through the use of patch size weighted spatial covariance estimation together with a Direct Sequential Simulation algorithm where the histogram sampling method takes in account the error analysis in unequal spatial supports (thematic patch sizes) and also the information provided by the confusion matrix. Early tests of the methodology applied to a synthetic application shown promising results. Keywords: Geostatistics, Stochastic Simulation, Confusion Matrix, accuracy assessment 1. Introduction The classification of satellite images for producing land cover maps is one of the most common applications of remote sensing. As any classification model it contains errors of different types which may derive from several origins such as misregistration problems, mixed pixels, sensors properties and ground conditions and, last but not less important, the spatial stationarity assumptions on radiometric data between the training sites and the application sites. Although classification accuracy assessment is now widely accepted as a fundamental component of thematic mapping investigations it is not uncommon for map accuracy to be inadequately quantified. Presently land cover classification accuracy assessments are based on 2 Proceeding of the 10th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences Florianopolis-SC, Brazil, July 10-13, 2012. error or confusion matrix, which is a simple cross-tabulation of the mapped class against that observed in the reference data at a set of validation pixels (Story and Congalton 1986; Congalton 1991; Stehman 1997; Stehman 1998) providing a summary of commission (type I) errors and omission (type II) errors. The widely used target accuracy of 85 % of global error can be inappropriate and that the approach to accuracy assessment commonly adopted in remote sensing is pessimistically biased (Giles Foody 2008). The errors of classification derived from satellite images are not distributed evenly by all land cover classes: some land cover classes are easier to discriminate than others and very often the occurrence of mixed pixels of certain thematic classes difficult the labeling of those classes. Also, a certain real class tends to be misclassified in a small sub-set of land cover classes which have spectral features that are similar to the real class, and not indiscriminately in any class (Stehman et al. 2003; Zhu et al. 2000). Moreover, fragmented landscapes tend to be more difficult to classify, i.e., for a certain thematic class small patches usually tend to produce more uncertain classifications classes than more continuous patches. Consequently, the spatial distribution of classification errors is not typically but the confusion matrix and other metrics derived from it are location- independent measures, thus they don’t provide any information about the spatial patterns of the error. Spatial characterization of classification errors is of prime importance (Giles Foody 2008) for the further use of classified thematic maps such as characterization of spatial uncertainty areas, evaluation of the classified themes which can be considered reliable or with the need of local field samples. 1.1. Using geostatistics Geostatistics framework is appropriate to model spatial variation of the classification uncertainty. Kyriakidis and Dungan (2001) proposed to map local indices of classification quality derived from local (posterior) conditional distributions functions of thematic classes obtained by indicator kriging with locally varying means (Amilcar Soares 1992). This method integrates in a unique step reference data (hard data) and the image classifier’s posterior probability vectors (soft data). The same authors proposed the use of Sequential Indicator Simulation (SIS) and the multiphase classification (Amilcar Soares 1990; Amilcar Soares 1992; Amilcar Soares 1998) to evaluate the propagation of classification uncertainty on ecological model predictions, using as local mean values the user’s accuracy ( Foody 2002), which means that a constant local mean for each class was used in the conditioning process. This may be too simplistic because often there is a distinct pattern of the thematic errors over the region (classified area) due to sensor’s properties and/or the ground conditions. de Bruin ( 2000) and Magnussen and Bruin (2003) used SIS with prediction via collocated indicator cokriging, which is a co-simulation method to generate stochastic realizations of thematic classes maps for updating cover type maps and that are suited for the estimation of the spatial distribution of prediction errors. The use of collocated indicator cokriging for data integration explicitly takes into account with cross-correlation between hard and soft data (Goovaerts 1994) . Consequently, the collocated cokriging estimates are potentially less influenced by sharp local contrasts in the soft data, then SIS with local varying means. Magnussen and Bruin (2003) state that this is an advantage, but their results clearly shown that some of the thematic classes tend to disappear which in our opinion is a strong disadvantage for this type of applications. For example, the water bodies are Proceeding of the 10th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences Florianopolis-SC, Brazil, July 10-13, 2012. usually very easy to classify in remote sensing images when compared with other land cover types and produce very sharp contrasts with other thematic classes, but water courses (linear) will tend to disappear when using this method for updating the land cover map, despite of the success of the classifier in discriminating them. Moreover, different thematic classes may produce distinct patterns of error and taking into account the spatial cross-correlation between hard and soft data may blur this pattern. None of the previously described methods take into account the influence of the uncertainty derived from the size of the patches neither the information provided by the confusion matrix is directly used in the estimation and simulation processes. 1.2. Used Methodologies The stated issues are tackled in the present work by making use of the following approaches: 1. Use of weighted variograms (Monestiez et al. 2006) to access and filter the influence of thematic patch size in error mapping. 2. Use of DSS-Direct Sequential Simulation (Soares 2001) with weighted Simple Kriging. The weights are assigned to each hard data point in direct proportion to the thematic patch in which they fall. Soft data influence is also considered by using Local Varying Means. The reasoning follows the one used in incorporating block measuring error in the block kriging procedure (Liu 2007). 3. Use of the confusion matrix as a joint distribution function in the DSS formalism following closely the formalism of Horta and Soares (2010). The direct integration of the confusion matrix in the DSS formalism requires that this matrix be identified as a joint probability mass function and its related cumulative distribution function. Hence, the following assumption is required: ๏ท The class labels are not mere identifiers but in some way must relate to to a continuous value akin to the procedure used when estimating cdf´s of continuous threshold values of a random variable. It is the same procedure used in the Indicator framework. As such, an order relation must be contained in the class labels. When considering continuous spectral classes it can be assumed that such an order exists. If the stated assumption is found acceptable when considering the data under analysis, the DSS algorithm is used to simulate the variable corresponding to the ground data variable of the confusion matrix. The available soft data is used to select a conditional distribution derived from the confusion matrix based joint distribution function (it is the conditioning probability value). Being G the random variable representing the hard data and S the one representing the known soft data, for a given spatial coordinate where S=s, the sampled distribution in the DSS algorithm is given by: ๐(๐บ ≤ ๐ฅ |๐ ≤ ๐ ) = ๐(๐บ ≤ ๐ฅ, ๐ ≤ ๐ )⁄๐(๐ ≤ ๐ ) (1) This is the only modification to algorithm outlined by Soares (2001) and expanded to Joint Distributions by Horta and Soares (2010). 3 4 Proceeding of the 10th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences Florianopolis-SC, Brazil, July 10-13, 2012. 2. Preliminary Results For the two outlined methods, a classified map of Portuguese rural landscape is used (figure 1). The classified image contains four thematic classes and the ground data contains fifteen data points. Figure 1: Classified image (left) and corresponding ground data (right). The first method used, DSS with simple weighted kriging and local varying means provided by the soft data, is presented here for class 2 on a 50x50 regular grid. The obtained mean image (figure 2) shows low values for the central band correpondng to where the available hard data agreed with the soft. Note the isolated human body shaped like patch for which the hard data value is class 2. The smaller area of the patch induces higher values of standard deviation than what to be expected by using the unweighted approach (not shown). Figure 2: DSS with weighted simple kriging mean image of the classification error for class 2 (left) and its corresponding standard deviations (right). Note the isolated human body shaped like patch for which the hard data value is class 2. The smaller area of the patch induces higher values of standard deviation than what to be expected by using the unweigthed approach. Proceeding of the 10th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences Florianopolis-SC, Brazil, July 10-13, 2012. The second approach, based on the confusion matrix, is now compared with simple DSS. The results can be observed in figure 3. The most striking feature that can be observed relates to the disappearance of class 4. That fact is believed to be related to the fact that only one of hard data points belonged to this class and also to the fact there weren’t miss classifications present for this class. Being such, it should be expected that the conditional distributions reached probability value of one for the previous class (labeled 3). That is indeed the present case. The Confusion matrix approach seemed to fare better in reproducing the ground data available for class 2 when taking into account the soft image available and the hard data. The standard deviation map also shows less random patterns. Figure 3: Usual DSS maps (left) showing the mean classified image (top) and its correponding standard deviation map (bottom). The same maps can be observed for the confusion matrix based DSS (right) 3. Conclusions and Ongoing Work The methods presented in the current work are preliminary attempts at tackling both the influence of thematic patch size and integrating the confusion matrix in a geostatistical framework. There is still considerable work remaining (especially in 5 6 Proceeding of the 10th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences Florianopolis-SC, Brazil, July 10-13, 2012. the confusion matrix based DSS method) either from a theoretical or a pratical implementation point of view. However, the authors believe that promising avenues are here shown for future work. References Someone, J. (2009), This is the title of the thesis. PhD thesis, University of Somecity, Somecountry. Smith, J., Jones, A.B. (2010), “A paper we will write someday”. Journal of Spatial Data Quality, Vol. 1(1):13-24. Silva, A., Ferreira, B., Fernandes, M. (2000), “How to reference a proceedings paper”. In: Gates, J., Lewis, C. (eds.). Proceedings of some symposium, Place, Country, pp. 313-319. Someone, A., Lewis, B. (2008), “How to reference a paper in a book”. In: Silva, J., Ferreira, C. (eds.). A book of papers, Publisher, pp. 313-319. Jones, A.B. (1998), This is a book title, Publisher, London, U.K., 215p. de Bruin, S. 2000. “Predicting the Areal Extent of Land-Cover Types Using Classified Imagery and Geostatistics.” Remote Sensing of Environment 74 (3) (December): 387396. Congalton, R. 1991. “A review of assessing the accuracy of classifications of remotely sensed data.” Remote Sensing of Environment 37 (1) (July): 35-46. Foody, G. 2002. “Status of land cover classification accuracy assessment.” Remote Sensing of Environment 80 (1) (April): 185-201. Foody, Giles. 2008. “Harshness in image classification accuracy assessment.” International Journal of Remote Sensing 29 (11) (June): 3137-3158. Goovaerts, Pierre. 1994. Comparison of CoIK, IK and MIK performances for Modeling Conditional Probabilities of Categorical Variables. In Geostatistics for the Next Century, ed. Dimitrakopoukos, 18-29. Kluwer, Dordrecht. Horta, Ana, and Amílcar Soares. 2010. “Direct Sequential Co-simulation with Joint Probability Distributions.” Mathematical Geosciences 42 (3) (February): 269-292. Kyriakidis, Phaedon, and Jennifer Dungan. 2001. “A geostatistical approach for mapping thematic classification accuracy and evaluating the impact of inaccurate spatial data on ecological model predictions.” Environmental and Ecological Statistics: 311-330. Magnussen, S;, and S Bruin. 2003. “Updating cover type maps using sequential indicator simulation.” Remote Sensing of Environment 87 (2-3) (October): 161-170. Proceeding of the 10th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences Florianopolis-SC, Brazil, July 10-13, 2012. Monestiez, P, L Dubroca, E Bonnin, J Durbec, and C Guinet. 2006. “Geostatistical modelling of spatial distribution of Balaenoptera physalus in the Northwestern Mediterranean Sea from sparse count data and heterogeneous observation efforts.” Ecological Modelling 193 (3-4) (March): 615-628. Soares, Amilcar. 1990. “Geostatistical Estimation of Orebody Geometry: Morphological Kriging.” Mathematical Geology. Soares, Amilcar 1992. “Geostatistical Estimation of Multi-Phase Structures.” Mathematical Geology. Soares, Amilcar. 1998. “Short Note Sequential Indicator Simulation with Correction for Local Probabilities.” Mathematical Geology 30 (6): 761-765. Soares, Amilcar. 2001. “Direct Sequential Simulation and Cosimulation.” Mathematical Geology 33 (8): 911-926. Stehman, S. 1997. “Selecting and interpreting measures of thematic classification accuracy.” Remote Sensing of Environment 62 (1) (October): 77-89. doi:10.1016/S00344257(97)00083-7. http://linkinghub.elsevier.com/retrieve/pii/S0034425797000837. Stehman, S and Czaplewski, R. 1998. “Design and Analysis for Thematic Map Accuracy AssessmentFundamental Principles.” Remote Sensing of Environment 64 (3) (June): 331344. Stehman, S, J.D Wickhamb, J.H. Smithc, and L. Yang. 2003. “Thematic accuracy of the 1992 National Land-Cover Data for the eastern United States: Statistical methodology and regional results.” Remote Sensing of Environment 86 (4) (August): 500-516. Story, Michael, and Russell G Congalton. 1986. “Accuracy Assessmentโฏ: A User ’ s Perspective.” Society 52 (3): 397-399. Zhu, Zhillang, Umin Yang, Stephen V Stehman, and Raymond L Czaplewski. 2000. “Accuracy Assessment for the U . S . Geological Survey Regional Land-Cover Mapping Programโฏ: New York and New Jersey Region.” Photogrammetric Engineering & Remote Sensing 66 (12): 1425-1435. 7