Working Note The Iterated Contextual Probability classifier (ICP) Heiko Balzter1 and John Baker2 1 Centre for Ecology and Hydrology Institute of Terrestrial Ecology Monks Wood Abbots Ripton Huntingdon Cambridgeshire PE17 2LS UK Tel. +44-(0)1487-773381 Fax: +44-(0)1487-773277 Email: hbal@ceh.ac.uk 2 Remote Sensing Applications Consultants 7/12/1999 TABLE OF CONTENTS 1. BRIEF REVIEW OF CONTEXTUAL CLASSIFICATION .......................................................... 4 1.1. ITERATED CONDITIONAL MODES (ICM) ........................................................................................... 6 1.2. CLASSIFICATION OF HOMOGENEOUS OBJECTS ................................................................................... 6 1.3. PROBABILISTIC RELAXATION ............................................................................................................ 6 1.4. CONTEXTUAL CLASSIFICATION USING STOCHASTIC MODELS ............................................................ 7 2. THE ICP ALGORITHM .................................................................................................................... 8 3. EXAMPLE: FOREST/NON-FOREST CLASSIFICATION ........................................................ 10 4. CONCLUSIONS ............................................................................................................................... 12 5. REFERENCES .................................................................................................................................. 13 1. Brief review of contextual classification Classification has the objective to label a number of objects (pixels) with one of n classes. The labelling process unavoidably contains errors, caused by outliers, extreme values, unusual pixel signatures or errors in the image data. A good classification algorithm minimises these errors. Traditional classification approaches often neglect the information in an image about spatial neighbourhood relationships between adjacent pixels. A standard maximum likelihood classification is based solely on information about the pixel grey values in the image. It does not matter at which location in the image the particular pixel is situated. The pixel is labelled with the class having the most similar signature (mean and standard deviation of grey values) to the pixel itself. However, in practice pixels of the same class tend to appear in clusters, and the neighbourhood of a pixel provides potentially valuable extra-information. This is illustrated in Image 1. The top image shows a SAR image of Ust-Ilimsk. The spatial location of the pixels is kept as information in this image. If this spatial information would be neglected, the pixel coordinates could be assigned randomly, keeping the spectral information but losing the spatial information. This has been done in the bottom image of Image 1. Clearly, it would seem a waste to classify the bottom image rather than the top one. Note, however, that exactly this is done by most traditional classification algorithms. Contextual classification exploits the spatial information to resolve this problem. A brief summary of such algorithms is given below, followed by a description of the ICP algorithm. Image 1: SAR image of part of the Ust-Ilimsk study area in Siberia. ERS coherence (red), JERS-1 intensity (green) and ERS-1 intensity (blue). Top: Pixels are shown at their geocoded location. Bottom: The same pixels but with randomly assigned coordinates. 5 1.1. Iterated Conditional Modes (ICM) The idea of iterated approximation to an otherwise computationally expensive but qualitatively good classification was published by Besag (1986). ICM was developed as a quicker approximation to the maximum a posteriori (MAP) classifier, which chooses the classification with the highest overall maximum probability. As the number of pixels in an image increases, the MAP classification clearly creates computational problems. The same is true for the method of simulated annealing. ICM avoids the increase of computational burden with image size, by maximizing the posterior marginal probabilities at each pixel. 1.2. Classification of homogeneous objects The image is first segmented into large homogeneous areas compared to the pixel spacing. These areas are then treated as objects and classified. 1.3. Probabilistic relaxation This technique is based on an iterative adjustment of the a posteriori probabilities of each pixel belonging to the classes. The new probability of pixel i belonging to class k{1,...,K} is obtained from 1 w K p k 1 2 rij k , g p (ft ) g w f 1 g 1 p i( t 1) k 2 K w K p i( t ) h 1 1 rij h, g p (ft ) g w 2 f 1 g 1 h 1 2 (t ) i 6 where w2 is the number of pixels in the window, t and t+1 are subsequent iterations, and rij k , h N ij k , h K K N k , h N k , h g 1 i g 1 j is a compatibility measure of classes i and j, estimated from the numbers of occurence of the classes in neighbouring pixels. However, if the initial classification is not accurate, the estimates of rij will not be reliable, and little improvement through the probabilistic relaxation algorithm may be expected (Sharma and Sarkar 1998). This is a major drawback, given that the aim of the algorithm is to improve an initially poor classification. 1.4. Contextual classification using stochastic models The influence of neighbouring pixels may also be expressed in terms of a stochastic model. The spectral information in the neighbourhood is taken into account explicitly, as well as the distribution of classes in the neighbourhood. The decision rule of pixel i belonging to class k can then be formulated as P y, 1 Pk P k P k , f g 1 where is the vector of spectral observations in the neighbourhood of the pixel, is the vector of classes in the neighbourhood, f() is a normalisation constant, P(|k) is 7 the probability of observing a neighbourhood configuration given the central pixel belongs to class k and P(|k,) is the probability of observing a spectral vector given a class configuration k and . It is obvious that these distributions are difficult to determine, especially for a large number of classes or large neighbourhoods, and autocorrelation models have been used for this purpose. In fact, Sharma and Sarkar (1998) describe just a four pixel neighbourhood for this technique. 2. The ICP algorithm The ICP algorithm is based on a Bayesian classification using adaptive a priori probabilities for polarimetric SAR data described in Van Zyl and Burnette (1992). Assuming multivariate Gaussian distribution, the likelihood of observing a backscatter vector Y with d elements given a particular class i is estimated from the spectral signatures with mean i. Its probability density function is given by Devijver and Kittler (1982): P y 2 d 2 1 2 1 T exp Y i 1 Y i 2 where -1 is the inverse of the covariance matrix and d is the number of spectral channels. y denotes the true backscatter vector and the true class to which it belongs. The likelihood contains the spectral information from the SAR image, including location, size and shape of the d-dimensional hyperellipsoids and correlation between the channels. 8 In the first iteration, uniform prior probabilities are assumed. According to Bayes' theorem, multiplication of the prior with the likelihood and normalisation gives the posterior: P y P P y P y Up to this step, the classification is a standard maximum likelihood classification. To refine the classification result iteratively, contextual information about the a posteriori probabilities of each pixel is now taken into account. For a predefined window size w, any window contains =w2 pixels, the posterior probabilities of a pixel belonging to a given class are averaged and used as a contextual prior probability in the following iteration. Finally, the weight of the contextual against the spectral information can be adjusted by raising the contextual probabilities to the power : P i y Pi i n The normalisation constant is determined by pixel-wise summation over all products for each class: P y P y P i 9 Adjustable parameters in this procedure are the window size, the number of iterations and the contextual weight. For computational reasons, some additions have been made to the algorithm. For incomplete windows, the pixel values are kept constant. This affects only w/2 pixels at the margins of the image. It identifies a NULL pixel as belonging to the margin around the image, if the pixel has only NULL pixels either to the left or to the right. The pixel is then kept unclassified. Otherwise the pixel is assumed to be mistakenly unclassified. Given d classes, each class is assigned a prior probability of 1/d and enters the next classification iteration. This is aimed at pixels with exceptionally high distances from the hyperellipsoids of all classes. 3. Example: Forest/non-forest classification Image 2 shows an application of ICP based on a supervised maximum likelihood classification (MLC) using training sites. ICP parameters were a 5x5 window, 3 iterations and a weight of contextual information of 2. Input channels were two JERS-1 passes, all three ERS-1 and 2 passes and ERS coherence, 50 m pixel spacing. No filtering has been applied. The test site is Ust-Ilimsk. Figure 1 shows that within the first three iterations most of the changes are happening, and the algorithm reaches a stable solution. At little extra cost, the noisy MLC classification can be improved considerably. 10 MLC ICP, iteration 1 ICP, iteration 2 ICP, iteration 3 Image 2: Results from MLC and the first three iterations of ICP. 11 0.4 0.3 4 0.2 Kappa % change 6 2 0.1 0 0.0 0 2 4 6 8 10 ICP iteration Figure 1: Percentage of pixels with changed class assignment and coefficient of agreement during 10 iterations of ICP. The result of MLC is labelled iteration 0. Note that is lower than it should be because of the time lag of 7 years between the ground data and image acquisitions and the coregistration error. 4. Conclusions ICP can be used as a robust method for removing noise in classified maps based on maximum likelihood classification (Gaveau et al., 1999). At little computational expense, ICP uses the prior information that the same class tends to occur in two neighbouring pixels. ICP merges this prior information with the information gained from the signature of the class of each pixel. If the signature of the pixel is extreme enough, it will maintain its class, even if all its neighbours belong to a different class. 12 5. References Besag, J. 1986: On the statistical analysis of dirty pictures. Journal of the Royal Statistical Society B 48, 259-302. Devijver, P. A. and Kittler, J. 1982: Pattern recognition. London: Prentice-Hall. Gaveau, D.L.A., Balzter, H. and Plummer, S. 1999: Boreal forest InSAR classification properties. Proceedings of the CEOS SAR Working group meeting, Toulouse, 26-29 Oct. 1999. Sharma, K. M. S. and Sarkar, A. 1998: Modified contextual classification technique for remote sensing data. Photogrammetric Engineering and Remote Sensing 64, 273280. Van Zyl, J. J. and Burnette, C. F. 1992: Bayesian Classification of Polarimetric Sar Images Using Adaptive a- Priori Probabilities. International Journal of Remote Sensing 13, 835-840. 13