TARGETED CHANGE DETECTION: A NOVEL SENSOR-INDEPENDENT PARTIALLY-SUPERVISED APPROACH

advertisement
In: Wagner W., Székely, B. (eds.): ISPRS TC VII Symposium – 100 Years ISPRS, Vienna, Austria, July 5–7, 2010, IAPRS, Vol. XXXVIII, Part 7B
Contents
Author Index
Keyword Index
TARGETED CHANGE DETECTION:
A NOVEL SENSOR-INDEPENDENT PARTIALLY-SUPERVISED APPROACH
D. Fernàndez-Prieto a, M. Marconcini a, *
a
Dept. of Earth Observation Science, Applications and Future Technologies, European Space Agency, ESA-ESRIN,
Via Galileo Galilei, 00044, Frascati, Rome, Italy - (diego.fernandez, mattia.marconcini)@esa.int
KEY WORDS: Change Detection, Land Cover, Satellite, Multitemporal, Multispectral, Hyperspectral
ABSTRACT:
In several real-world applications (e.g., forestry, agriculture), the objective of change detection is actually limited to one (or few)
specific “targeted” land-cover transition(s) affecting a certain area in a given time period. In such cases, ground-truth information is
generally available for the only land-cover classes of interest at the two dates, which limits (or hinders) the possibility of successfully employing standard supervised approaches. Moreover, even unsupervised change-detection methods cannot be effectively
used, as they allow identifying all the areas experiencing any type of change, but not discriminating where specific land-cover transitions of interest occur. In this paper, we present a novel technique capable of addressing this challenging issue (formulated in terms
of a compound decision problem) by exploiting the only ground truth available for the targeted land-cover classes at the two dates.
In particular, the proposed method relies on a partially-supervised approach and jointly exploits the Expectation-Maximization (EM)
algorithm and an iterative labelling strategy based on Markov random fields (MRF) accounting for spatial and temporal correlation
between the two images. Moreover, it also allows handling images acquired by different sensors at the two investigated times. Experimental results on different multi-temporal and multi-sensor data sets confirmed the effectiveness and the reliability of the proposed technique, which provided change-detection accuracies comparable with those obtained by fully-supervised methods.
pervised methods represent an ideal approach to changedetection analysis, since they permit both to identify areas experiencing changes, as well as to reliably determine the associated land-cover transitions. Nevertheless, their range of applicability is significantly limited by the difficulties in gathering
exhaustive and accurate ground-truth information for all the
land-cover classes characterizing each date under analysis. Indeed, such a requirement is costly, time consuming and not
always possible or feasible to satisfy.
However, in several operational change-detection problems the
main objective is not to characterize all the land-cover transitions occurred in the investigated area, but rather to identify a
single (or few) targeted land-cover transition(s) of interest. This
is typical for instance in agriculture, urban planning, or forestry
applications. In such circumstances, when only one (or few)
specific land-cover transitions need to be identified, it is reasonable to assume that the collection of ground-truth information associated with the only (or few) land-cover class(es) of
interest at the two considered dates is highly simplified. However, under this assumption, neither supervised nor unsupervised change-detection techniques can be effectively employed.
Let us consider for instance the case of two images acquired
over the same area at different times t1 and t2 , where the objective is to identify all the patterns experiencing the targeted landcover transitions from class “A” (e.g., forest) to “B” (e.g., urban
area) under the hypothesis that a ground truth for class “A” at
t1 and for class “B” at t2 is available (or can be easily retrieved
by an operator), respectively. In this context, on the one hand,
supervised techniques cannot be used, since the lack of an exhaustive ground truth characterizing all the land-cover classes at
the two dates under consideration will not allow a successful
training of the classifiers. On the other hand, unsupervised
techniques may allow identifying all the areas experiencing any
type of change, but not discriminating where specific targeted
land-cover transitions of interest occur. In this latter case, a
comparison (e.g., trough a significance-testing approach (Jeon
and Landgrebe, 1999)) of the labelled samples available for the
1. INTRODUCTION
Detecting changes occurring on the Earth’s surface represents
one of the main applications of satellite remote sensing. Indeed,
in a variety of different fields and applications (e.g., urban
planning, forestry, agriculture, disaster management, etc.) the
employment of multi-temporal satellite data has become essential for identifying where and (when possible) which types of
transitions have occurred between two given dates (Jensen,
2009).
Generally, change-detection methods are categorized as either
supervised or unsupervised, depending on the availability of
suitable prior information (Coppin et al., 2004; Duda et al.,
2000; Lu et al., 2004; Radke et al., 2005; Singh, 1989).
When an exhaustive multi-temporal ground truth characterizing
all the land-cover classes over the area of interest at both times
is available, then, supervised approaches can be applied. These
types of techniques are generally robust and effective, and allow identifying all the land-cover transitions occurred between
the two considered dates. In this framework, three main approaches are generally employed: post-classification comparison (PCC), supervised direct multi-data classification (DMC)
and compound classification (Duda et al., 2000; Lu et al., 2004;
Singh, 1989).
When no ground truth is available, instead, unsupervised techniques must be used, which allow detecting areas experiencing
changes (being even capable of separating land-cover transitions of different nature and characterizing their distribution),
but are unable to provide information on the specific type of
changes occurred. Several unsupervised change-detection techniques have been presented so far in the literature. Most of them
are based on image differencing, image ratioing, image regression, change vector analysis (CVA) and principal component
analysis (PCA), which all require the selection of proper thresholds for determining changed regions (Coppin et al., 2004; Lu
et al., 2004; Radke et al., 2005).
It is worth noting that, in the above described framework, su-
213
In: Wagner W., Székely, B. (eds.): ISPRS TC VII Symposium – 100 Years ISPRS, Vienna, Austria, July 5–7, 2010, IAPRS, Vol. XXXVIII, Part 7B
Contents
Author Index
maximizing the posterior probability given the two images X 1 ,
X 2 and, finally, to draw pixels experiencing the targeted landcover transition from ωint1 to ωint2 . This can be formalized as a
compound decision problem (Duda et al., 2000):
classes of interest at the two dates against all those identified as
changed pixels may provide some degree of information on the
type of land-cover transition, but not with the accuracy and
reliability provided by fully-supervised approaches).
In this paper, we formulate this complex issue in terms of a
compound decision problem (Duda et al., 2000) and propose a
novel partially-supervised change-detection (PSCD) technique
capable of exploiting the only prior knowledge available for the
specific land-cover classes of interest at the two times (thus
avoiding the need to rely on exhaustive ground-truth information for all the classes), while providing accuracies comparable
with those of fully-supervised methods. Moreover, it has the
great advantage of being sensor-independent, which allows
selecting at each date the set of sensors and features most suitable for characterizing the targeted class(es) of interest.
The proposed method aims at estimating at each date the probability density function (PDF) and the prior of both the class(es)
of interest and the remaining unknown land-cover classes (for
which no ground truth is available) represented as a single unknown information class. In particular, PDFs are approximated
by a mixture of suitable basis functions whose free parameters
are determined employing the iterative ExpectationMaximization (EM) algorithm (Dempster et al., 1977). Changed
pixels are then identified using an iterative labelling strategy
based on Markov random fields (MRF) (Solberg et al., 1996)
which allows taking into account both spatial and temporal
correlation between the two images, as well as properly constraining the probability estimates.
For demonstrating the capabilities of the proposed method,
extensive experimental trials have been carried out with different combinations of multispectral, hyperspectral and SAR data.
Obtained results confirmed the effectiveness and the reliability
of the proposed technique, which provided very promising results. In particular, accuracies are comparable to those achieved
with the PCC method in the presence of an exhaustive ground
truth for each image both considering the maximum likelihood
(ML) classifier (Richards and Jia, 2006), as well as support
vector machines (SVM) (Cristianini and Shawe-Taylor, 2000).
{C 1 , C 2 } = argmax{P (C 1 , C 2 | X 1 ,X 2 )}
According with the Bayes theory, finding a solution to (1) is
equal to determine the sets of labels maximizing the likelihood
L(X 1 ,X 2 | C 1 , C 2 ) = P(C 1 , C 2 ) ⋅ p (X 1 ,X 2 | C 1 , C 2 ) .
In the reasonable hypothesis of time-conditional independence,
the problem can be written as:
{C1,C 2}= argmax{L(X1,X 2 | C1,C 2 ) = P(C1,C 2 )⋅ p(X1 | C1 )⋅ p(X 2 | C 2 )} (2)
C1 ,C 2
where p (X 1 | C 1 ) and p(X 2 | C 2 ) represent the conditional
PDFs at t1 and t2 respectively.
3. PROPOSED PARTIALLY-SUPERVISED CHANGEDETECTION TECHNIQUE
For addressing the complex task described in Section 2, we
propose a novel partially-supervised technique aimed at app (X 1 | C 1 ) ,
proximating the class-conditional densities
2
2
p(X | C ) as mixtures of suitable basis kernel functions and
estimating the joint prior probability P (C 1 , C 2 ) properly taking
into consideration the spatio-temporal context.
The rationale is based on the observation that the PDF of an
image can be always approximated by a mixture of suitable
kernels (i.e., Parzen density estimation (Duda et al., 2000)).
Accordingly, similarly to what is commonly done in the context
of Radial Basis Function Neural Networks (RBF-NN) (Bruzzone and Fernàndez-Prieto, 1999), we model for each pixel of
both images the PDFs p( x1ij ) , p (xij2 ) as a mixture of K circularly symmetric multivariate Gaussian functions. Kernel parameters (i.e., centres and variances) are initialized using the kmean clustering algorithm (Bruzzone and Fernàndez-Prieto,
1999), whereas final estimates are obtained by using the EM
algorithm (Dempster et al., 1977). Then, class-conditional densities of the interest class p( x1ij | ωint1 ) , p( xij2 | ωint2 ) are modelled
by properly weighting the resulting set of kernels using again
the EM algorithm over the training samples available for ωint1
and ωint2 . This is somewhat analogous to the training phase of
RBF-NN which is generally carried out in two steps: i) selection of centres and variances of the kernel functions associated
with hidden units on the basis of clustering techniques; and ii)
computation of weights associated with the connections between the hidden and output layers on the basis of available
training patterns.
The PDF of the entire image is itself a mixture of the interest
and unknown class-conditional densities, weighted by corresponding prior probabilities. Accordingly, we obtain a first
1
2
) , p(x ij2 | ωunk
) initializing
rough approximation for p(x1ij | ωunk
priors to 0.5. Afterwards, estimates are refined using a novel
MRF-based iterative labelling strategy accounting for spatiotemporal correlation, which permits to model P (Cij1 , Cij2 ) mutually considering the local neighbourhood of each pixel in the
two images. Finally, changed pixels are identified and associated with the targeted land-cover transition by minimizing a
For the sake of simplicity we will describe the problem and the
proposed method under the assumption of a single targeted
land-cover transition of interest. The extension to the case of
multiple transitions is straightforward.
Let us consider two I × J co-registered remote-sensing images
X 1 ={x1ij }iI,,j=J 1 , x1ij ∈ D , and X 2 = {xij2 }iI,,j=J 1 , xij2 ∈ D , referring
the same geographical area at times t1 and t2 , respectively,
where x1ij , xij2 represent corresponding feature vectors (even
derived from different sets of sensors at each date, respectively,
and merged using a stacked vector approach (Richards and Jia,
2006)) associated with the pixel at position (i, j ) , and D1 , D2
define respective dimensionalities.
Let Ω1 = {ω11 ,…,ω L1 } and Ω 2 = {ω12 ,…,ω L2 } be the set of landcover classes characterizing X 1 and X 2 , respectively. In the
following, we will denote as ωint1 ∈Ω1 and ωint2 ∈Ω2 the information classes of interest at t1 and t2 , for which N1 and N 2
labelled training patters are available, respectively. Hence,
1
2
ωunk
= {Ω1 − ωint1 } and ωunk
= {Ω 2 − ωint2 } will represent corresponding unknown classes (each consisting of the merger of
remaining classes, for which no ground truth is available).
Let C 1 = {Cij1 }iI,,jJ=1 and C 2 = {Cij2 }iI,,jJ=1 denote two sets of labels for
t
X 1 and X 2 , respectively, where Cij1 ∈{ωintt ,ωunk
} and
t
Cij2 ∈{ωintt ,ωunk
} are associated with the pixel at position (i, j ) .
In this framework, our aim is to identify the two sets C 1 , C 2
1
(1)
C 1 ,C 2
2. PROBLEM FORMULATION AND ASSUMPTIONS
1
Keyword Index
2
2
214
In: Wagner W., Székely, B. (eds.): ISPRS TC VII Symposium – 100 Years ISPRS, Vienna, Austria, July 5–7, 2010, IAPRS, Vol. XXXVIII, Part 7B
Contents
Author Index
Keyword Index
3.1.1 Estimation of pˆ ( xtij )
For computing both the centres M t = {μ kt }kK=1 and variances
Σt = {σ 2 tk }kK=1 of all the kernels, as well as the set of weights W t
defining pˆ (xtij ) , we employ the EM algorithm over all the pixels of X t . EM allows determining the maximum likelihood
(ML) estimator of the parameters characterizing a certain distribution in the presence of incomplete observations (Dempster et
al., 1977). Indeed, our objective is to identify the ML estimate
for the set of parameters θ t = {M t , Σt , W t } = {μ kt ,σ 2 tk , wkt }kK=1 that
allows maximizing the log-likelihood of X t , i.e.
proper energy function. In the following, we will first introduce
the method adopted for modelling both the class-conditional
densities and the joint prior probability; then, we will present
the iterative strategy for identifying the sets of labels C 1 , C 2
maximizing the likelihood L(X 1 ,X 2 | C 1 , C 2 ) .
3.1 Conditional Density Modelling
Computing p (X t | C t ) , t = 1,2 , requires at the two considered
dates the estimation of the class-conditional densities
t
p(x tij | ωintt ) and p(x tij | ωunk
) , ∀xtij ∈X t . The proposed approach
is based on the observation that the PDF of each pixel can be
modelled as a mixture of the conditional PDFs of both the interest and unknown classes:
I ,J
ln L(θ t ) = ln [ p (X t |θ t )] = ∑ ln [ pˆ (xijt )]
(8)
i , j =1
t
t
p ( xtij ) = P (ωintt ) ⋅ p (xtij | ωintt ) + P(ωunk
) ⋅ p (xijt | ωunk
)
(3)
(l )
At each iteration l , the set of estimated parameters [θ t ] provides an increase in the log-likelihood until a local maximum is
(l )
(l )
reached, i.e. ln L ([θ t ] ) ≥ ln L ([θ t ] ) .
For simplicity, weights are initially set to 1/ K , whereas kernel
parameters are initialized using the k-means clustering algorithm (Bruzzone and Fernàndez-Prieto, 1999) fixing the number
of clusters equal to K . In particular, centres and variances of
the Gaussians are initialized to the centres and variances of the
resulting clusters. Then, according with Dempster et al., 1977,
the updated estimates for the unknown parameters are given by:
According with Parzen density estimation (Duda et al., 2000),
we aim at obtaining a reliable nonparametric estimate for
p (xtij ) as a mixture of a suitable set of kernel functions
Φ t = {φkt (⋅)}kK=1 :
K
pˆ (xtij ) = ∑ wkt ⋅φkt (xtij )
(4)
k =1
[ wkt ](l -1) ⋅[φkt (xijt )](l -1)
[ pˆ (xtij )](l -1)
[ wkt ](l ) = i , j =1
I ⋅J
I ,J
∑
where K denotes the number of kernels (a free parameter to be
K
set by the user), and W t = {wkt }k =1 represent the weights regulating the contribution of each kernel.
However, since the density p (xtij ) is given by a linear combit
nation of p( xtij | ωintt ) and p(x tij | ωunk
) , it is worth noting that, if
t
the set of kernels Φ allows obtaining a reliable estimate
pˆ (xtij ) , then also both the class-conditional densities can be
reliably approximated as a linear combination of Φ t . Hence,
they can be estimated as:
(9)
[ wkt ](l -1) ⋅[φkt (xijt )](l -1) t
⋅ xij
[ pˆ (xtij )](l -1)
[ μ kt ](l ) = i , j =1I , J
[ wkt ](l -1) ⋅[φkt ( xijt )](l -1)
∑
[ pˆ (xtij )](l -1)
i , j =1
I ,J
∑
(10)
K
pˆ (x tij | ωintt ) = ∑ wintt k ⋅φkt (xtij )
(5)
[ wkt ](l -1) ⋅[φkt ( xijt )](l -1) t
⋅ xij − [ μ kt ](l )
[ pˆ (x tij )](l -1)
i , j =1
2 t (l )
[σ k ] =
I ,J
[ wkt ](l -1) ⋅[φkt (x ijt )](l -1)
Dt ⋅ ∑
[ pˆ (xtij )](l -1)
i , j =1
I ,J
∑
k =1
K
t
t
t
t
) = ∑ wunk
pˆ (x tij | ωunk
k ⋅φk ( x ij )
(6)
2
(11)
k =1
Reasonably, we assume that convergence is reached when the
relative increase in the log-likelihood is lower than a prefixed
threshold ε .
t
t
= {wunk
are the weights reguwhere Wintt = {wintt k }k =1 , Wunk
k }k =1
lating the contribution of each kernel for the interest and unknown classes, respectively.
As commonly done in the literature we consider normalized
isotropic Gaussian kernels, i.e.
K
φkt (x tij ) =
K
t
t 2
⎡ || xij − μ k || ⎤
exp ⎢−
⎣
2σ 2tk ⎥⎦
( 2πσ 2 tk )
1
Dt
3.1.2 Estimation of pˆ (x tij | ωintt )
Once M t and Σt have been determined (and hence the set of
kernels Φ t properly defined), we exploit the available ground
truth for the class of interest ωintt at each date for deriving the
estimate of the corresponding conditional density pˆ ( xtij | ωintt ) .
In particular, the set of weights Wintt associated with ωintt is
determined using again the EM algorithm, but solely on the
available training samples T t = {xijt ∈X t | yij = ωintt } , | T t |= N t ,
where yij denotes the true label for pixel at position (i, j ) .
Weights are initialized to 1/ K , and then updated (according
with Dempster et al., 1977) using the following equation:
(7)
where μ kt is the centre and σ 2 tk is the variance (which tunes the
smoothness of the estimate).
In the following, we describe into details the procedures
t
),
adopted for estimating pˆ (xtij ) , pˆ (x tij | ωintt ) and pˆ (x tij | ωunk
respectively.
215
In: Wagner W., Székely, B. (eds.): ISPRS TC VII Symposium – 100 Years ISPRS, Vienna, Austria, July 5–7, 2010, IAPRS, Vol. XXXVIII, Part 7B
Contents
I ,J
∑α
ij
[ wintt k ](l ) = i , j =1
⋅
Author Index
[ wintt k ](l -1) ⋅φkt (xijt )
t
⎧1 if xij ∈ T
[ pˆ (x tij | ωintt )](l -1)
, α ij = ⎨
t
Nt
⎩0 if xij ∉ T
where β > 0 tunes the influence of the context and δ is the
Kronecker delta function defined as:
t
t
Keyword Index
(12)
1
2
1
2
⎧1 if (Cij ,Cij ) = (Cgh ,Cgh )
1
2
1
2
⎩0 if (Cij ,Cij ) ≠ (Cgh ,Cgh )
δ [(Cij1 ,Cij2 ),(C gh1 ,Cgh2 )] = ⎨
The corresponding log-likelihood is given by:
3.3 Iterative Labelling
I ,J
ln L(θ t ≡ Wintt ) = ∑α ij ⋅ ln[ pˆ (x tij | ωintt )]
(13)
i , j =1
Solving Eq. (2) is equivalent to maximize the log-likelihood
ln L(X 1 ,X 2 | C1 ,C 2 ) , which can be written as:
Even in this case we assume that convergence is reached if the
relative increase in the log-likelihood is lower than ε .
2
t
ln L(X 1 ,X 2 | C 1 , C 2 ) = −∑U data
(X t , C t ) −U contex (C 1 , C 2 ) − ln( Z ) (19)
t =1
t
3.1.3 Estimation of pˆ (x tij | ωunk
)
It is worth noting that Eq. (3) can be re-written as:
K
K
K
k =1
k =1
k =1
t
(X t , C t ) = − ln[ p (X t | C t )] , represents the classwhere U data
conditional energy function at date t = 1,2 , while U contex is
given by (17). Since Z solely depends on the selected type of
neighbourhood, the final problem becomes solving:
t
t
t
t
t
t
t
t
t
t
t
P(ωunk
) ⋅ ∑ wunk
k ⋅φk (xij ) = ∑ wk ⋅φk ( xij ) − P(ωint ) ⋅ ∑ wint k ⋅φk (xij ) (14)
t
t
t
t
t
) ⋅ wunk
Hence, ∀k = 1,…, K it holds P (ωunk
k = wk − P (ωint ) ⋅ wint k .
t
) = 1 , we have:
Since P (ωintt ) + P (ωunk
t
wunk
k =
(18)
wkt − P(ωintt ) ⋅ wintt k wkt − P(ωintt ) ⋅ wintt k
=
t
P (ωunk
)
1− P(ωintt )
1
2
{C 1 , C 2 } = argmin{U contex (C 1 , C 2 ) + U data
(X 1 , C 1 ) + U data
(X 2 , C 2 )} (20)
C 1 ,C 2
To this aim, we propose a strategy based on the Iterated Conditional Modes (ICM) algorithm (Besag, 1996) which allows
maximizing local conditional probabilities sequentially. In particular, at each iteration l we update the estimated prior probabilities for the class of interest at each date Pˆ (ωintt ) and, accordingly, also the class-conditional densities of the unknown
t
) . The algorithm works as follows:
classes pˆ (x tij | ωunk
(15)
Then, as M t , Σt , W t and Wintt have been determined, we can
t
) substituting (15) into (6), upon it is possicompute pˆ (x tij | ωunk
ble to obtain a reliable estimate Pˆ (ωintt ) for the prior probability
of the class of interest. This can be properly accomplished
throughout the iterative labelling phase presented below.
Step 1. After estimating pˆ ( xtij | ωintt ) , ∀xtij ∈X t , t = 1,2 following the approach described in the previous paragraphs, set
Pˆ (ωintt ) = 0.5 (no prior knowledge is assumed to be available
about the true prior P (ωintt ) ) and compute the conditional dent
) accordingly;
sity of the unknown classes pˆ (x tij | ωunk
Step 2. Derive the initial sets of labels C 1 , C 2 by solely minimizing the non-contextual terms of Eq. (20), i.e.
1
2
{C 1 , C 2 } = argmin{U data
(X 1 , C 1 ) + U data
(X 2 , C 2 )} ;
Step 3. On the basis of current C 1 and C 2 , compute the new
estimated prior probabilities ratioing the number of pixels associated with the class of interest over the whole number of pixels, i.e. Pˆ (ωintt ) = Cintt ( I ⋅ J ) , Cintt = {Cijt | Cijt ∈ C t ,Cijt = ωintt }iI,,jJ=1 ;
then, update the class-conditional densities for the unknown
t
classes pˆ (x tij | ωunk
) accordingly;
Step 4. Update C 1 and C 2 according with Eq. (20);
Step 5. Repeat Step 3 and Step 4 until no changes occur between successive iterations.
At the end of the process, the final targeted change-detection
map C * is defined as:
3.2 Joint Prior Modelling
For modelling the joint prior P (C 1 , C 2 ) , we propose an approach based on MRF (Solberg et al., 1996). In particular, we
assume that the couple of labels Cij1 , Cij2 associated with pixel
at position (i, j ) at times t1 and t2 depends on the couples of
labels associated with pixels belonging to the spatial
neighbourhood G ij of (i, j ) at the two dates (we always considered first order neighbourhoods).
In other words, the higher the number of its spatial neighbours
experiencing a certain land-cover transition is, the higher the
probability for a given pixel of experiencing the same transition
is. In this hypothesis, according with the MRF theory, it holds
the equivalence:
P (Cij1 ,Cij2 | Cgh1 , Cgh2 ;( g , h) ∈ G ij ) = Z −1 ⋅ exp[−U contex (Cij1 ,Cij2 )] (16)
⎧1
C * = {Cij*}iI,,jJ=1 , Cij* = ⎨
⎩0
where Z = Z (G ij ) is a normalizing constant called partition
function, while U contex is a Gibbs energy function (accounting
for the spatio-temporal context) of the form:
if Cij1 = ωint1 and Cij2 = ωint2
otherwise
(21)
4. EXPERIMENTAL RESULTS
U contex (C ,C ) = −
1
ij
2
ij
∑
( g ,h )∈G ij
β ⋅δ [(C ,C ),(C ,C )]
1
ij
2
ij
1
gh
2
gh
(17)
In order to assess the effectiveness of the proposed technique,
we carried out several experiments with different combinations
216
In: Wagner W., Székely, B. (eds.): ISPRS TC VII Symposium – 100 Years ISPRS, Vienna, Austria, July 5–7, 2010, IAPRS, Vol. XXXVIII, Part 7B
Contents
Author Index
Keyword Index
(Cristianini and Shawe-Taylor, 2000). For the selection of the
two free parameters (i.e., a penalization parameter and the variance of considered Gaussian kernels) we employed a 10-fold
cross-validation strategy (Duda et al., 2000).
Available prior knowledge has been used for defining regions
of interest composed on the whole by 21941 pixels whose
ground truth was known at the two times, respectively. 10 landcover classes have been considered at t1 , whereas 9 have been
taken into consideration at t2 . At both dates, from all the available labeled samples we defined two spatially-disjoint training
sets (see Table 1). This means that there is no overlapping between training samples at t1 and t2 . All of them have been used
for training both the ML and SVM supervised classifiers at each
time.
of multispectral, hyperspectral and SAR data. Nevertheless, due
to space constraints, we will focus the attention solely on a
representative change-detection problem referring to an intense
farming area experiencing several land-cover transitions located
in Barrax, a village close to Albacete (Castilla-La Mancha,
Spain). In particular, available data consist in: two Landsat-5
Thematic Mapper (TM) images acquired on 15th July 2003 and
17th July 2004, respectively (as generally done in the literature
among the 7 spectral bands we did not consider the lowresolution band associated with the thermal infrared channel);
one PROBA CHRIS image (composed by 63 spectral bands
with centre wavelengths from 400 to 1050 nm) acquired on 16th
July 2004; and one Envisat ASAR alternate polarization (VV,
HH) image (despeckled using a 3× 3 Gamma filter) acquired on
18th July 2004. All of them have been properly co-registered to
a common spatial geometry of 30 meters and a study area of
512 × 512 pixels has been selected. July 2003 will be referred
to as t1 , whereas July 2004 will be referred to as t2 (no significant changes indeed occurred between 16th and 18th July 2004).
From the original available images we derived the following
three datasets (using the stacked vector approach for multisource data fusion (Richards and Jia, 2006)) composed by: i)
the 6 Landsat TM bands at both t1 and t2 (i.e., D1 = D2 = 6 )
[Dataset I]; ii) the 6 Landsat TM bands at t1 and the merger of
the 2 Envisat ASAR backscattering intensity images with the 6
Landsat TM bands at t2 (i.e., D1 = 6 and D2 = 8 ) [Dataset II];
and iii) the 6 Landsat TM bands at t1 and the 63 PROBA
CHRIS bands at t2 (i.e., D1 = 6 and D2 = 63 ) [Dataset III].
However, it is worth noting that our objective is not to seek for
a set of features at both dates which could be more effective for
solving the investigated problem, but rather to demonstrate that
the presented method is even capable of effectively handling
different types of data at the two times.
As described in Section 3, the user is required to set the number
of Gaussian kernels K to be employed for approximating the
PDFs. Hence, in order to understand how significant the selection of this free parameter is, we performed a series of experiments varying K from 20 to 120 with steps of 10.
According with a variety of experiments on toy and real datasets we fixed ε = 10−4 and β = 102 .
In all the trials we employed the k-means clustering for initializing both centres and variances of kernel functions. Nevertheless, the very high complexity of the algorithm (i.e., approximately O( N D⋅( K +1) log N ) , where N and D represent the number of samples to be clustered and their dimensionality, respectively (Inaba et al., 1994)), prevented us from using all the patterns of each investigated image at a time, as this would have
required a very high computational burden. In order to overcome this limitation, for each image we ran the k-means algorithm on a random subset containing one third of the total
amount of samples. However, as this might affect the final
change-detection accuracies of the proposed technique, for each
number of considered kernels K , we performed 10 different
trials running each time the k-means clustering on a different
random subset. Moreover, we finally also combined the 10
resulting change-detection maps through a majority voting ensemble.
For validating the potentialities of the presented method, we
compared the results with those obtained by supervised PCC. In
particular, we considered ML and SVM fully-supervised classifiers trained by exploiting a complete ground truth for all the
land-cover classes characterizing each considered date. ML is a
simple yet generally rather effective statistical classifier, which
does not require the user to set any free parameter (Richards
and Jia, 2006). SVM are advanced state-of-art classifiers, which
proved capable of outperforming other traditional approaches
Land-cover class
alfalfa
bare soil
corn
garlic
grasslands
onions
poppy
potatoes
spring crops
stubble
sugar beet
sunflower
wheat
total
t1 (July 2003)
2031
2585
1737
101
–
213
336
208
–
2416
365
–
369
10361
t2 (July 2004)
634
–
2664
302
42
220
–
283
3318
2247
–
449
–
10159
Table 1. Number of spatially-disjoint training samples
considered at the two dates.
Change-detection results have been evaluated (over those samples whose land-cover class is known at both dates) in terms of:
percentage overall accuracy OA% (i.e., the percentage of samples correctly identified as both changed or unchanged over the
whole number of samples), and kappa coefficient of accuracy
(which also takes into consideration errors and their type)
(Richards and Jia, 2006).
Among the different land-cover transitions occurred between
the two dates, here we take into consideration (one at a time)
two of them, namely “bare soil to spring crops” and “alfalfa to
corn” (experienced by 4583 and 1035 pixels, respectively, over
the whole available 21941 whose ground truth was known at
both times).
In our trials, we empirically experienced that a common range
for K resulting in average high detection accuracies spans
from 60 to 80. When instead nearing the lower or the upper
bound of the considered interval, performances tend to vary
depending on the specific land-cover transition of interest. Accordingly, in Table 2 and Table 3 we show the results obtained
with the proposed technique for K = 40 , 60, 80, 100. In particular, we report the median over the 10 realizations with different k-means clustering initialization. Moreover, also accuracies finally obtained with the majority voting ensemble (denoted as PSCDMV), as well as those obtained by supervised PCC
using ML and SVM (denoted as PCCML and PCCSVM, respectively) are presented.
While investigating the “bare soil to spring crops” transition
with the proposed PSCD technique, from all the training pixels
reported in Table 1, we considered the only 2585 available for
bare soil at t1 and the only 3318 spatially-disjoint available for
217
In: Wagner W., Székely, B. (eds.): ISPRS TC VII Symposium – 100 Years ISPRS, Vienna, Austria, July 5–7, 2010, IAPRS, Vol. XXXVIII, Part 7B
Contents
Author Index
two times, while providing accuracies comparable with those of
fully-supervised methods. In particular, the PSCD technique
relies on a partially-supervised approach and jointly exploits the
Expectation-Maximization (EM) algorithm and an iterative
labelling strategy based on Markov random fields (MRF) accounting for spatial and temporal correlation between the two
images. Moreover, it also allows handling images acquired by
different sensors at the two considered times. Experimental
results on multi-sensor datasets derived from multispectral,
hyperspectral and SAR data confirmed the effectiveness and the
reliability of the proposed technique
spring crops at t2 . Obtained results are very satisfactory, as
confirmed by both the high kappa and OA% values reported in
Table 2 (always higher than 0.79 and 92, respectively). Moreover, by employing the majority voting ensemble it is possible
to further improve the performances and obtaining for both
indexes accuracies even closer to those obtained by PCCML and
PCCSVM with a fully-supervised training at both dates.
K
40
60
80
100
PSCDMV
PCCML
PCCSVM
Dataset I
kappa OA%
0.8925 96.59
0.8903 96.53
0.8822 96.25
0.8878 96.47
0.9076 97.08
0.9339 97.88
0.9501 98.36
Dataset II
kappa OA%
0.8403 95.09
0.8458 95.21
0.8412 95.11
0.8470 95.31
0.8649 95.82
0.9323 97.83
0.9359 97.90
Dataset III
kappa OA%
0.7955 92.61
0.8488 94.97
0.8441 94.80
0.8502 94.93
0.8622 95.32
0.9330 97.86
0.9703 99.02
REFERENCES
Besag, J., 1986. On the statistical analysis of dirty pictures.
Journal of the Royal Statistical Society, Series B, 48, pp. 259302.
Bruzzone, L., and Fernàndez-Prieto, D., 1999. A Technique for
the Selection of Kernel-Function Parameters in RBF Neural
Networks for Classification of Remote-Sensing Images. IEEE
Trans. Geosci. Remote Sens., 37(2), pp. 1179-1184.
Table 2. kappa coefficient of accuracy and OA% obtained for
the “bare soil to spring crops” land-cover transition.
While addressing the “alfalfa to corn” transition with the proposed PSCD technique, from all the training pixels reported in
Table 1, we considered the only 2031 available for alfalfa at t1
and the only 2664 spatially-disjoint available for corn at t2 .
Such a transition is rather difficult to characterize, as only experienced by few fields in the considered area. Indeed, according with the results in Table 3, this is confirmed by the very low
accuracies obtained by PCCML despite fully-supervised training.
Instead, in the light of the high complexity of the problem, performances exhibited by the proposed method are very promising, especially for Dataset II and Dataset III. Moreover, with
the majority voting ensemble the gap with respect to PCCSVM
becomes very small (Dataset II) or it is even possible to outperform results obtained with SVMs (Dataset III).
K
40
60
80
100
PSCDMV
PCCML
PCCSVM
Dataset I
kappa OA%
0.6725 97.36
0.7110 97.79
0.6530 97.25
0.6553 97.32
0.7524 98.16
0.5163 96.87
0.9003 99.08
Dataset II
kappa OA%
0.8549 98.79
0.8190 98.49
0.8172 98.45
0.7233 97.92
0.8896 99.06
0.5134 96.86
0.9244 99.32
Keyword Index
Coppin, P., Jonckheere, I., Nackaerts, K., and Muys, B., 2004.
Digital change detection methods in ecosystem monitoring: a
review. Int. J. Remote Sens., 25(9), pp. 1565-1596.
Cristianini N., and Shawe-Taylor, J., 2000. An Introduction to
Support Vector Machines. Cambridge University Press.
Dempster, A., Laird, N., and Rubin, D., 1977. Maximum Likelihood from Incomplete Data Via the EM Algorithm. The Royal
Statistical Society, Series B, 39(1), pp. 1-38.
Duda, R. O., Hart, P. E., and Stork, D. G., 2000. Pattern Classification. Wiley, New York.
Inaba, M., Katoh, N., and Imai, H., 1994. Applications of
weighted voronoi diagrams and randomization to variancebased k-clustering. Proceedings of the 10th Annual Symposium
on Computational Geometry, pp. 332–339.
Dataset III
kappa OA%
0.8117 98.40
0.8343 98.60
0.8206 98.45
0.8367 98.62
0.9079 99.22
0.5353 97.05
0.8969 99.04
Jensen, J. R., 2009. Remote Sensing Of The Environment. Pearson, Upper Saddle River.
Jeon, B., and Landgrebe, D. A., 1999. Partially supervised classification using weighted unsupervised clustering. IEEE Trans.
Geosci. Remote Sens., 37(3), pp. 1073-1079.
Table 3. kappa coefficient of accuracy and OA% obtained for
the “alfalfa to corn” land-cover transition.
Lu, D., Mausel, P., Brondizio, E., and Moran, E., 2004. Change
detection techniques. Int. J. Remote Sens., 25(12), pp. 23652407.
5. CONCLUSIONS
Radke, R. J., Andra, S., Al-Kofahi, O., and Roysam, B., 2005.
Image Change Detection Algorithms: A systematic Survey.
IEEE Trans. Geosci. Remote Sens., 14(3), pp. 294-307, 2005.
In this paper we presented a novel partially-supervised changedetection (PSCD) technique capable of addressing targeted
change-detection problems where the objective is to identify
one (or few) targeted land-cover transitions, under the assumption that ground-truth information is available for the only (few)
class(es) of interests at the two investigated dates.
In this context, either supervised or unsupervised standard approaches cannot be effectively employed. The proposed
method, instead, allows exploiting the only prior knowledge
available for the specific land-cover classes of interest at the
Richards, J. A., and Jia, X., 2006. Remote Sensing Digital Image Analysis. An Introduction. Springer, Berlin.
Singh, A., 1989. Digital change detection techniques using
remotely-sensed data. Int. J. Remote Sens., 10(6), pp. 989-1003.
Solberg, A. H. S., Taxt, T., and Jain, A. K., 1996. A Markov
Random Field Model for Classification of Multisource Satellite
Imagery. IEEE Trans. Geosci. Remote Sens., 34(1), pp. 100-113.
218
Download