
Working Note
The Iterated Contextual Probability
classifier (ICP)
Heiko Balzter (1) and John Baker (2)

(1) Centre for Ecology and Hydrology, Institute of Terrestrial Ecology, Monks Wood, Abbots Ripton, Huntingdon, Cambridgeshire, PE17 2LS, UK
Tel. +44-(0)1487-773381
Fax: +44-(0)1487-773277
Email: hbal@ceh.ac.uk

(2) Remote Sensing Applications Consultants
7/12/1999
TABLE OF CONTENTS
1. Brief review of contextual classification
1.1. Iterated Conditional Modes (ICM)
1.2. Classification of homogeneous objects
1.3. Probabilistic relaxation
1.4. Contextual classification using stochastic models
2. The ICP algorithm
3. Example: Forest/non-forest classification
4. Conclusions
5. References
1. Brief review of contextual classification
The objective of classification is to label each of a number of objects (pixels) with one of n
classes. The labelling process unavoidably contains errors, caused by outliers,
extreme values, unusual pixel signatures or errors in the image data. A good
classification algorithm minimises these errors.
Traditional classification approaches often neglect the information in an image about
spatial neighbourhood relationships between adjacent pixels. A standard maximum
likelihood classification is based solely on information about the pixel grey values in
the image. It does not matter at which location in the image the particular pixel is
situated. The pixel is labelled with the class having the most similar signature (mean
and standard deviation of grey values) to the pixel itself. However, in practice pixels
of the same class tend to appear in clusters, so the neighbourhood of a pixel provides
potentially valuable extra information. This is illustrated in Image 1. The top image
shows a SAR image of Ust-Ilimsk in which the spatial location of the pixels is retained.
If this spatial information were neglected, the pixel coordinates could be assigned
randomly, keeping the spectral information but losing the spatial information; this has
been done in the bottom image of Image 1. Clearly, it would seem a waste to classify the
bottom image rather than the top one. Note, however, that this is exactly what most
traditional classification algorithms do.
Contextual classification exploits the spatial information to resolve this problem.
A brief summary of such algorithms is given below, followed by a description of the
ICP algorithm.
Image 1: SAR image of part of the Ust-Ilimsk study area in Siberia. ERS coherence
(red), JERS-1 intensity (green) and ERS-1 intensity (blue).
Top: Pixels are shown at their geocoded location.
Bottom: The same pixels but with randomly assigned coordinates.
1.1. Iterated Conditional Modes (ICM)
The idea of iterated approximation to an otherwise computationally expensive but
qualitatively good classification was published by Besag (1986). ICM was developed
as a quicker approximation to the maximum a posteriori (MAP) classifier, which
chooses the classification with the highest overall maximum probability. As the
number of pixels in an image increases, the MAP classification clearly creates
computational problems. The same is true for the method of simulated annealing.
ICM avoids the increase of computational burden with image size by maximising the
posterior marginal probabilities at each pixel.
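As a rough illustration only (this sketch is not from the original note), one ICM sweep under an assumed Potts-type smoothness prior of strength beta over a 4-connected neighbourhood could look like this in Python:

import numpy as np

def icm_sweep(labels, log_likelihood, beta=1.5):
    """One ICM sweep: each pixel is relabelled to maximise its local posterior
    given the current labels of its 4-connected neighbours (assumed Potts prior).

    labels         : (H, W) integer array of current class labels
    log_likelihood : (H, W, K) array of log P(pixel data | class k)
    beta           : assumed strength of the spatial smoothness prior
    """
    H, W, K = log_likelihood.shape
    new_labels = labels.copy()
    for row in range(H):
        for col in range(W):
            # labels of the 4-connected neighbours
            neigh = []
            if row > 0:     neigh.append(labels[row - 1, col])
            if row < H - 1: neigh.append(labels[row + 1, col])
            if col > 0:     neigh.append(labels[row, col - 1])
            if col < W - 1: neigh.append(labels[row, col + 1])
            # number of neighbours agreeing with each candidate class
            agreement = np.array([sum(n == k for n in neigh) for k in range(K)])
            # spectral term plus contextual (prior) term, maximised per pixel
            new_labels[row, col] = int(np.argmax(log_likelihood[row, col] + beta * agreement))
    return new_labels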
1.2. Classification of homogeneous objects
The image is first segmented into homogeneous areas that are large compared to the pixel
spacing. These areas are then treated as objects and classified.
1.3. Probabilistic relaxation
This technique is based on an iterative adjustment of the a posteriori probabilities of
each pixel belonging to the classes. The new probability of pixel i belonging to class
k ∈ {1, ..., K} is obtained from

$$p_i^{(t+1)}(k) \;=\; \frac{p_i^{(t)}(k)\left[\,1 + \frac{1}{w^2}\sum_{j=1}^{w^2}\sum_{g=1}^{K} r_{ij}(k,g)\, p_j^{(t)}(g)\right]}{\sum_{h=1}^{K} p_i^{(t)}(h)\left[\,1 + \frac{1}{w^2}\sum_{j=1}^{w^2}\sum_{g=1}^{K} r_{ij}(h,g)\, p_j^{(t)}(g)\right]}$$

where w² is the number of pixels in the window, t and t+1 denote subsequent iterations,
and
$$r_{ij}(k,h) \;=\; \frac{N_{ij}(k,h)}{\sum_{g=1}^{K} N_i(k,g)\;\sum_{g=1}^{K} N_j(h,g)}$$

is a compatibility measure between class k at pixel i and class h at pixel j, estimated
from the numbers of occurrence of the classes in neighbouring pixels.
However, if the initial classification is not accurate, the estimates of rij will not be
reliable, and little improvement through the probabilistic relaxation algorithm may be
expected (Sharma and Sarkar 1998). This is a major drawback, given that the aim of
the algorithm is to improve an initially poor classification.
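For illustration only (this sketch is not from the original note), one relaxation update for the centre pixel of a window might be coded in Python as follows; for simplicity a single compatibility matrix r, shared by all pixel pairs in the window, replaces the pair-specific r_ij:

import numpy as np

def relaxation_update(p_window, r, centre):
    """One probabilistic-relaxation update for the centre pixel of a w x w window.

    p_window : (w*w, K) current class probabilities, one row per pixel in the window
    r        : (K, K) compatibility matrix r(k, g); a single matrix is assumed for all
               pixel pairs here instead of the pair-specific r_ij
    centre   : row index of the centre pixel within p_window
    """
    # contextual support q(k) = (1/w^2) * sum_f sum_g r(k, g) * p_f(g)
    q = r @ p_window.mean(axis=0)
    numer = p_window[centre] * (1.0 + q)
    # renormalise over the K classes
    return numer / numer.sum()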
1.4. Contextual classification using stochastic models
The influence of neighbouring pixels may also be expressed in terms of a stochastic
model. The spectral information in the neighbourhood is taken into account explicitly,
as well as the distribution of classes in the neighbourhood. The decision rule of pixel i
belonging to class k can then be formulated as
$$P(k \mid y, \delta) \;=\; \frac{1}{f(y,\delta)}\, P(k)\, P(\delta \mid k)\, P(y \mid k, \delta)$$

where y is the vector of spectral observations in the neighbourhood of the pixel, δ is
the vector of classes in the neighbourhood, f(·) is a normalisation constant, P(δ | k) is
the probability of observing a neighbourhood configuration δ given that the central pixel
belongs to class k, and P(y | k, δ) is the probability of observing the spectral vector y
given the class configuration k and δ.
It is obvious that these distributions are difficult to determine, especially for a large
number of classes or large neighbourhoods, and autocorrelation models have been
used for this purpose. In fact, Sharma and Sarkar (1998) describe only a four-pixel
neighbourhood for this technique.
2. The ICP algorithm
The ICP algorithm is based on a Bayesian classification using adaptive a priori
probabilities for polarimetric SAR data described in Van Zyl and Burnette (1992).
Assuming a multivariate Gaussian distribution, the likelihood of observing a backscatter
vector y with d elements given a particular class ω_i is estimated from the spectral
signatures with mean μ_i. Its probability density function is given by Devijver and
Kittler (1982):
$$P(y \mid \omega_i) \;=\; (2\pi)^{-\frac{d}{2}}\, \left|\Sigma\right|^{-\frac{1}{2}} \exp\!\left(-\tfrac{1}{2}\,(y - \mu_i)^T\, \Sigma^{-1}\, (y - \mu_i)\right)$$

where Σ⁻¹ is the inverse of the covariance matrix and d is the number of spectral
channels; y denotes the true backscatter vector and ω the true class to which it belongs.
The likelihood contains the spectral information from the SAR image, including the
location, size and shape of the d-dimensional hyperellipsoids and the correlation between
the channels.
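For illustration (an assumed implementation, not the authors' code), this density can be evaluated per pixel and candidate class as follows; mu and cov stand for the class mean vector and covariance matrix estimated from the training signatures:

import numpy as np

def gaussian_likelihood(y, mu, cov):
    """Multivariate Gaussian density P(y | class), evaluated for one pixel.

    y   : (d,) observed backscatter vector
    mu  : (d,) class mean vector from the training signatures
    cov : (d, d) class covariance matrix from the training signatures
    """
    d = y.shape[0]
    diff = y - mu
    norm = (2.0 * np.pi) ** (-d / 2.0) * np.linalg.det(cov) ** (-0.5)
    return norm * float(np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff))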
In the first iteration, uniform prior probabilities are assumed. According to Bayes'
theorem, multiplication of the prior with the likelihood and normalisation gives the
posterior:
$$P(\omega \mid y) \;=\; \frac{P(\omega)\, P(y \mid \omega)}{P(y)}$$
Up to this step, the classification is a standard maximum likelihood classification.
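A minimal sketch of this step (assumed code, not from the note): for one pixel, the posterior vector follows from the likelihood vector and the current priors:

import numpy as np

def posteriors(likelihoods, priors):
    """Posterior class probabilities for one pixel via Bayes' theorem.

    likelihoods : (K,) array of P(y | class k), e.g. from the Gaussian density above
    priors      : (K,) array of prior probabilities (uniform in the first iteration)
    """
    joint = priors * likelihoods          # P(class k) * P(y | class k)
    return joint / joint.sum()            # divide by P(y) = sum_k P(class k) P(y | class k)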
To refine the classification result iteratively, contextual information about the a
posteriori probabilities of each pixel is now taken into account. For a predefined window
size w, each window contains n = w² pixels. Within a window, the posterior probabilities
of the pixels belonging to a given class are averaged and used as the contextual prior
probability in the following iteration. Finally, the weight of the contextual against the
spectral information can be adjusted by raising the contextual probabilities to a power λ:

$$P(\omega_i) \;=\; \left( \frac{1}{n} \sum_{j=1}^{n} P_j(\omega_i \mid y) \right)^{\lambda}$$
The normalisation constant is determined by pixel-wise summation over all products
for each class:
P y    P y  P 
i
Adjustable parameters in this procedure are the window size, the number of iterations
and the contextual weight.
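The following Python sketch is an assumed implementation, not the authors' code; the function name, interface and the use of SciPy's uniform_filter are assumptions, and the border mode "nearest" only approximates the treatment of incomplete windows described below:

import numpy as np
from scipy.ndimage import uniform_filter

def icp(posterior, likelihood, window=5, iterations=3, weight=2.0):
    """Sketch of the ICP refinement loop (assumed interface).

    posterior  : (H, W, K) posterior probabilities from the initial MLC
    likelihood : (H, W, K) class likelihoods P(y | k), fixed over all iterations
    window     : side length w of the averaging window
    iterations : number of ICP iterations
    weight     : contextual weight, i.e. the exponent applied to the averaged posteriors
    """
    post = posterior.copy()
    K = post.shape[-1]
    for _ in range(iterations):
        # Contextual prior: window mean of the posteriors, raised to the weight.
        # mode="nearest" replicates edge values, an approximation of the note's
        # treatment of incomplete windows at the image margins.
        prior = np.stack(
            [uniform_filter(post[..., k], size=window, mode="nearest")
             for k in range(K)], axis=-1) ** weight
        # Bayes update with the new prior, then renormalisation over the classes.
        post = prior * likelihood
        post /= post.sum(axis=-1, keepdims=True)
    return post.argmax(axis=-1), post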
For computational reasons, some additions have been made to the algorithm.
For incomplete windows at the image margins, the pixel values are kept constant. This
affects only w/2 pixels at each margin of the image.
The algorithm identifies a NULL pixel as belonging to the margin around the image if the
pixel has only NULL pixels either to its left or to its right; such a pixel is kept
unclassified. Otherwise the pixel is assumed to be mistakenly unclassified: each class is
then assigned an equal prior probability (one over the number of classes) and the pixel
enters the next classification iteration. This rule is aimed at pixels with exceptionally
large distances from the hyperellipsoids of all classes.
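A hypothetical sketch of this NULL-pixel rule (assumed code, with K denoting the number of classes):

import numpy as np

def reset_null_pixels(post, null_mask):
    """Hypothetical sketch of the NULL-pixel rule described above.

    post      : (H, W, K) class probabilities, modified in place
    null_mask : (H, W) boolean array marking NULL (unclassified) pixels
    """
    H, W, K = post.shape
    for r, c in zip(*np.nonzero(null_mask)):
        row = null_mask[r, :]
        # margin test: only NULL pixels to the left, or only NULL pixels to the right
        if row[:c].all() or row[c + 1:].all():
            post[r, c, :] = 0.0        # image margin: the pixel stays unclassified
        else:
            post[r, c, :] = 1.0 / K    # mistakenly unclassified: equal prior for every class
    return post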
3. Example: Forest/non-forest classification
Image 2 shows an application of ICP based on a supervised maximum likelihood
classification (MLC) using training sites. The ICP parameters were a 5×5 window, 3
iterations and a contextual weight of 2.
Input channels were two JERS-1 passes, all three ERS-1 and ERS-2 passes and the ERS
coherence, at 50 m pixel spacing. No filtering was applied.
The test site is Ust-Ilimsk.
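A toy usage example (hypothetical; random numbers stand in for the seven input channels, and the icp() sketch from Section 2 is assumed to be available in scope) would be:

import numpy as np

# Toy demonstration only: random numbers replace the real SAR channels.
rng = np.random.default_rng(0)
H, W, K = 200, 200, 2                                   # forest / non-forest
likelihood = rng.random((H, W, K))                      # placeholder class likelihoods
posterior = likelihood / likelihood.sum(axis=-1, keepdims=True)  # uniform-prior MLC posteriors
labels, post = icp(posterior, likelihood, window=5, iterations=3, weight=2.0)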
Figure 1 shows that most of the changes happen within the first three iterations, after
which the algorithm reaches a stable solution. At little extra cost, the noisy MLC
classification can be improved considerably.
Image 2: Results from MLC and the first three iterations of ICP (panels: MLC; ICP
iteration 1; ICP iteration 2; ICP iteration 3).
Figure 1: Percentage of pixels with changed class assignment and κ coefficient of
agreement during 10 iterations of ICP. The result of MLC is labelled iteration 0. Note
that κ is lower than it should be because of the time lag of 7 years between the ground
data and the image acquisitions, and because of the coregistration error.
4. Conclusions
ICP can be used as a robust method for removing noise in classified maps based on
maximum likelihood classification (Gaveau et al., 1999). At little computational
expense, ICP uses the prior information that the same class tends to occur in two
neighbouring pixels. ICP merges this prior information with the information gained
from the signature of the class of each pixel. If the signature of the pixel is extreme
enough, it will maintain its class, even if all its neighbours belong to a different class.
5. References
Besag, J. 1986: On the statistical analysis of dirty pictures. Journal of the Royal
Statistical Society B 48, 259-302.
Devijver, P. A. and Kittler, J. 1982: Pattern recognition. London: Prentice-Hall.
Gaveau, D.L.A., Balzter, H. and Plummer, S. 1999: Boreal forest InSAR
classification properties. Proceedings of the CEOS SAR Working group meeting,
Toulouse, 26-29 Oct. 1999.
Sharma, K. M. S. and Sarkar, A. 1998: Modified contextual classification technique
for remote sensing data. Photogrammetric Engineering and Remote Sensing 64, 273-280.
Van Zyl, J. J. and Burnette, C. F. 1992: Bayesian classification of polarimetric SAR
images using adaptive a priori probabilities. International Journal of Remote
Sensing 13, 835-840.