In: Wagner W., Székely, B. (eds.): ISPRS TC VII Symposium – 100 Years ISPRS, Vienna, Austria, July 5–7, 2010, IAPRS, Vol. XXXVIII, Part 7B Contents Author Index Keyword Index TARGETED CHANGE DETECTION: A NOVEL SENSOR-INDEPENDENT PARTIALLY-SUPERVISED APPROACH D. Fernàndez-Prieto a, M. Marconcini a, * a Dept. of Earth Observation Science, Applications and Future Technologies, European Space Agency, ESA-ESRIN, Via Galileo Galilei, 00044, Frascati, Rome, Italy - (diego.fernandez, mattia.marconcini)@esa.int KEY WORDS: Change Detection, Land Cover, Satellite, Multitemporal, Multispectral, Hyperspectral ABSTRACT: In several real-world applications (e.g., forestry, agriculture), the objective of change detection is actually limited to one (or few) specific “targeted” land-cover transition(s) affecting a certain area in a given time period. In such cases, ground-truth information is generally available for the only land-cover classes of interest at the two dates, which limits (or hinders) the possibility of successfully employing standard supervised approaches. Moreover, even unsupervised change-detection methods cannot be effectively used, as they allow identifying all the areas experiencing any type of change, but not discriminating where specific land-cover transitions of interest occur. In this paper, we present a novel technique capable of addressing this challenging issue (formulated in terms of a compound decision problem) by exploiting the only ground truth available for the targeted land-cover classes at the two dates. In particular, the proposed method relies on a partially-supervised approach and jointly exploits the Expectation-Maximization (EM) algorithm and an iterative labelling strategy based on Markov random fields (MRF) accounting for spatial and temporal correlation between the two images. Moreover, it also allows handling images acquired by different sensors at the two investigated times. Experimental results on different multi-temporal and multi-sensor data sets confirmed the effectiveness and the reliability of the proposed technique, which provided change-detection accuracies comparable with those obtained by fully-supervised methods. pervised methods represent an ideal approach to changedetection analysis, since they permit both to identify areas experiencing changes, as well as to reliably determine the associated land-cover transitions. Nevertheless, their range of applicability is significantly limited by the difficulties in gathering exhaustive and accurate ground-truth information for all the land-cover classes characterizing each date under analysis. Indeed, such a requirement is costly, time consuming and not always possible or feasible to satisfy. However, in several operational change-detection problems the main objective is not to characterize all the land-cover transitions occurred in the investigated area, but rather to identify a single (or few) targeted land-cover transition(s) of interest. This is typical for instance in agriculture, urban planning, or forestry applications. In such circumstances, when only one (or few) specific land-cover transitions need to be identified, it is reasonable to assume that the collection of ground-truth information associated with the only (or few) land-cover class(es) of interest at the two considered dates is highly simplified. However, under this assumption, neither supervised nor unsupervised change-detection techniques can be effectively employed. Let us consider for instance the case of two images acquired over the same area at different times t1 and t2 , where the objective is to identify all the patterns experiencing the targeted landcover transitions from class “A” (e.g., forest) to “B” (e.g., urban area) under the hypothesis that a ground truth for class “A” at t1 and for class “B” at t2 is available (or can be easily retrieved by an operator), respectively. In this context, on the one hand, supervised techniques cannot be used, since the lack of an exhaustive ground truth characterizing all the land-cover classes at the two dates under consideration will not allow a successful training of the classifiers. On the other hand, unsupervised techniques may allow identifying all the areas experiencing any type of change, but not discriminating where specific targeted land-cover transitions of interest occur. In this latter case, a comparison (e.g., trough a significance-testing approach (Jeon and Landgrebe, 1999)) of the labelled samples available for the 1. INTRODUCTION Detecting changes occurring on the Earth’s surface represents one of the main applications of satellite remote sensing. Indeed, in a variety of different fields and applications (e.g., urban planning, forestry, agriculture, disaster management, etc.) the employment of multi-temporal satellite data has become essential for identifying where and (when possible) which types of transitions have occurred between two given dates (Jensen, 2009). Generally, change-detection methods are categorized as either supervised or unsupervised, depending on the availability of suitable prior information (Coppin et al., 2004; Duda et al., 2000; Lu et al., 2004; Radke et al., 2005; Singh, 1989). When an exhaustive multi-temporal ground truth characterizing all the land-cover classes over the area of interest at both times is available, then, supervised approaches can be applied. These types of techniques are generally robust and effective, and allow identifying all the land-cover transitions occurred between the two considered dates. In this framework, three main approaches are generally employed: post-classification comparison (PCC), supervised direct multi-data classification (DMC) and compound classification (Duda et al., 2000; Lu et al., 2004; Singh, 1989). When no ground truth is available, instead, unsupervised techniques must be used, which allow detecting areas experiencing changes (being even capable of separating land-cover transitions of different nature and characterizing their distribution), but are unable to provide information on the specific type of changes occurred. Several unsupervised change-detection techniques have been presented so far in the literature. Most of them are based on image differencing, image ratioing, image regression, change vector analysis (CVA) and principal component analysis (PCA), which all require the selection of proper thresholds for determining changed regions (Coppin et al., 2004; Lu et al., 2004; Radke et al., 2005). It is worth noting that, in the above described framework, su- 213 In: Wagner W., Székely, B. (eds.): ISPRS TC VII Symposium – 100 Years ISPRS, Vienna, Austria, July 5–7, 2010, IAPRS, Vol. XXXVIII, Part 7B Contents Author Index maximizing the posterior probability given the two images X 1 , X 2 and, finally, to draw pixels experiencing the targeted landcover transition from ωint1 to ωint2 . This can be formalized as a compound decision problem (Duda et al., 2000): classes of interest at the two dates against all those identified as changed pixels may provide some degree of information on the type of land-cover transition, but not with the accuracy and reliability provided by fully-supervised approaches). In this paper, we formulate this complex issue in terms of a compound decision problem (Duda et al., 2000) and propose a novel partially-supervised change-detection (PSCD) technique capable of exploiting the only prior knowledge available for the specific land-cover classes of interest at the two times (thus avoiding the need to rely on exhaustive ground-truth information for all the classes), while providing accuracies comparable with those of fully-supervised methods. Moreover, it has the great advantage of being sensor-independent, which allows selecting at each date the set of sensors and features most suitable for characterizing the targeted class(es) of interest. The proposed method aims at estimating at each date the probability density function (PDF) and the prior of both the class(es) of interest and the remaining unknown land-cover classes (for which no ground truth is available) represented as a single unknown information class. In particular, PDFs are approximated by a mixture of suitable basis functions whose free parameters are determined employing the iterative ExpectationMaximization (EM) algorithm (Dempster et al., 1977). Changed pixels are then identified using an iterative labelling strategy based on Markov random fields (MRF) (Solberg et al., 1996) which allows taking into account both spatial and temporal correlation between the two images, as well as properly constraining the probability estimates. For demonstrating the capabilities of the proposed method, extensive experimental trials have been carried out with different combinations of multispectral, hyperspectral and SAR data. Obtained results confirmed the effectiveness and the reliability of the proposed technique, which provided very promising results. In particular, accuracies are comparable to those achieved with the PCC method in the presence of an exhaustive ground truth for each image both considering the maximum likelihood (ML) classifier (Richards and Jia, 2006), as well as support vector machines (SVM) (Cristianini and Shawe-Taylor, 2000). {C 1 , C 2 } = argmax{P (C 1 , C 2 | X 1 ,X 2 )} According with the Bayes theory, finding a solution to (1) is equal to determine the sets of labels maximizing the likelihood L(X 1 ,X 2 | C 1 , C 2 ) = P(C 1 , C 2 ) ⋅ p (X 1 ,X 2 | C 1 , C 2 ) . In the reasonable hypothesis of time-conditional independence, the problem can be written as: {C1,C 2}= argmax{L(X1,X 2 | C1,C 2 ) = P(C1,C 2 )⋅ p(X1 | C1 )⋅ p(X 2 | C 2 )} (2) C1 ,C 2 where p (X 1 | C 1 ) and p(X 2 | C 2 ) represent the conditional PDFs at t1 and t2 respectively. 3. PROPOSED PARTIALLY-SUPERVISED CHANGEDETECTION TECHNIQUE For addressing the complex task described in Section 2, we propose a novel partially-supervised technique aimed at app (X 1 | C 1 ) , proximating the class-conditional densities 2 2 p(X | C ) as mixtures of suitable basis kernel functions and estimating the joint prior probability P (C 1 , C 2 ) properly taking into consideration the spatio-temporal context. The rationale is based on the observation that the PDF of an image can be always approximated by a mixture of suitable kernels (i.e., Parzen density estimation (Duda et al., 2000)). Accordingly, similarly to what is commonly done in the context of Radial Basis Function Neural Networks (RBF-NN) (Bruzzone and Fernàndez-Prieto, 1999), we model for each pixel of both images the PDFs p( x1ij ) , p (xij2 ) as a mixture of K circularly symmetric multivariate Gaussian functions. Kernel parameters (i.e., centres and variances) are initialized using the kmean clustering algorithm (Bruzzone and Fernàndez-Prieto, 1999), whereas final estimates are obtained by using the EM algorithm (Dempster et al., 1977). Then, class-conditional densities of the interest class p( x1ij | ωint1 ) , p( xij2 | ωint2 ) are modelled by properly weighting the resulting set of kernels using again the EM algorithm over the training samples available for ωint1 and ωint2 . This is somewhat analogous to the training phase of RBF-NN which is generally carried out in two steps: i) selection of centres and variances of the kernel functions associated with hidden units on the basis of clustering techniques; and ii) computation of weights associated with the connections between the hidden and output layers on the basis of available training patterns. The PDF of the entire image is itself a mixture of the interest and unknown class-conditional densities, weighted by corresponding prior probabilities. Accordingly, we obtain a first 1 2 ) , p(x ij2 | ωunk ) initializing rough approximation for p(x1ij | ωunk priors to 0.5. Afterwards, estimates are refined using a novel MRF-based iterative labelling strategy accounting for spatiotemporal correlation, which permits to model P (Cij1 , Cij2 ) mutually considering the local neighbourhood of each pixel in the two images. Finally, changed pixels are identified and associated with the targeted land-cover transition by minimizing a For the sake of simplicity we will describe the problem and the proposed method under the assumption of a single targeted land-cover transition of interest. The extension to the case of multiple transitions is straightforward. Let us consider two I × J co-registered remote-sensing images X 1 ={x1ij }iI,,j=J 1 , x1ij ∈ D , and X 2 = {xij2 }iI,,j=J 1 , xij2 ∈ D , referring the same geographical area at times t1 and t2 , respectively, where x1ij , xij2 represent corresponding feature vectors (even derived from different sets of sensors at each date, respectively, and merged using a stacked vector approach (Richards and Jia, 2006)) associated with the pixel at position (i, j ) , and D1 , D2 define respective dimensionalities. Let Ω1 = {ω11 ,…,ω L1 } and Ω 2 = {ω12 ,…,ω L2 } be the set of landcover classes characterizing X 1 and X 2 , respectively. In the following, we will denote as ωint1 ∈Ω1 and ωint2 ∈Ω2 the information classes of interest at t1 and t2 , for which N1 and N 2 labelled training patters are available, respectively. Hence, 1 2 ωunk = {Ω1 − ωint1 } and ωunk = {Ω 2 − ωint2 } will represent corresponding unknown classes (each consisting of the merger of remaining classes, for which no ground truth is available). Let C 1 = {Cij1 }iI,,jJ=1 and C 2 = {Cij2 }iI,,jJ=1 denote two sets of labels for t X 1 and X 2 , respectively, where Cij1 ∈{ωintt ,ωunk } and t Cij2 ∈{ωintt ,ωunk } are associated with the pixel at position (i, j ) . In this framework, our aim is to identify the two sets C 1 , C 2 1 (1) C 1 ,C 2 2. PROBLEM FORMULATION AND ASSUMPTIONS 1 Keyword Index 2 2 214 In: Wagner W., Székely, B. (eds.): ISPRS TC VII Symposium – 100 Years ISPRS, Vienna, Austria, July 5–7, 2010, IAPRS, Vol. XXXVIII, Part 7B Contents Author Index Keyword Index 3.1.1 Estimation of pˆ ( xtij ) For computing both the centres M t = {μ kt }kK=1 and variances Σt = {σ 2 tk }kK=1 of all the kernels, as well as the set of weights W t defining pˆ (xtij ) , we employ the EM algorithm over all the pixels of X t . EM allows determining the maximum likelihood (ML) estimator of the parameters characterizing a certain distribution in the presence of incomplete observations (Dempster et al., 1977). Indeed, our objective is to identify the ML estimate for the set of parameters θ t = {M t , Σt , W t } = {μ kt ,σ 2 tk , wkt }kK=1 that allows maximizing the log-likelihood of X t , i.e. proper energy function. In the following, we will first introduce the method adopted for modelling both the class-conditional densities and the joint prior probability; then, we will present the iterative strategy for identifying the sets of labels C 1 , C 2 maximizing the likelihood L(X 1 ,X 2 | C 1 , C 2 ) . 3.1 Conditional Density Modelling Computing p (X t | C t ) , t = 1,2 , requires at the two considered dates the estimation of the class-conditional densities t p(x tij | ωintt ) and p(x tij | ωunk ) , ∀xtij ∈X t . The proposed approach is based on the observation that the PDF of each pixel can be modelled as a mixture of the conditional PDFs of both the interest and unknown classes: I ,J ln L(θ t ) = ln [ p (X t |θ t )] = ∑ ln [ pˆ (xijt )] (8) i , j =1 t t p ( xtij ) = P (ωintt ) ⋅ p (xtij | ωintt ) + P(ωunk ) ⋅ p (xijt | ωunk ) (3) (l ) At each iteration l , the set of estimated parameters [θ t ] provides an increase in the log-likelihood until a local maximum is (l ) (l ) reached, i.e. ln L ([θ t ] ) ≥ ln L ([θ t ] ) . For simplicity, weights are initially set to 1/ K , whereas kernel parameters are initialized using the k-means clustering algorithm (Bruzzone and Fernàndez-Prieto, 1999) fixing the number of clusters equal to K . In particular, centres and variances of the Gaussians are initialized to the centres and variances of the resulting clusters. Then, according with Dempster et al., 1977, the updated estimates for the unknown parameters are given by: According with Parzen density estimation (Duda et al., 2000), we aim at obtaining a reliable nonparametric estimate for p (xtij ) as a mixture of a suitable set of kernel functions Φ t = {φkt (⋅)}kK=1 : K pˆ (xtij ) = ∑ wkt ⋅φkt (xtij ) (4) k =1 [ wkt ](l -1) ⋅[φkt (xijt )](l -1) [ pˆ (xtij )](l -1) [ wkt ](l ) = i , j =1 I ⋅J I ,J ∑ where K denotes the number of kernels (a free parameter to be K set by the user), and W t = {wkt }k =1 represent the weights regulating the contribution of each kernel. However, since the density p (xtij ) is given by a linear combit nation of p( xtij | ωintt ) and p(x tij | ωunk ) , it is worth noting that, if t the set of kernels Φ allows obtaining a reliable estimate pˆ (xtij ) , then also both the class-conditional densities can be reliably approximated as a linear combination of Φ t . Hence, they can be estimated as: (9) [ wkt ](l -1) ⋅[φkt (xijt )](l -1) t ⋅ xij [ pˆ (xtij )](l -1) [ μ kt ](l ) = i , j =1I , J [ wkt ](l -1) ⋅[φkt ( xijt )](l -1) ∑ [ pˆ (xtij )](l -1) i , j =1 I ,J ∑ (10) K pˆ (x tij | ωintt ) = ∑ wintt k ⋅φkt (xtij ) (5) [ wkt ](l -1) ⋅[φkt ( xijt )](l -1) t ⋅ xij − [ μ kt ](l ) [ pˆ (x tij )](l -1) i , j =1 2 t (l ) [σ k ] = I ,J [ wkt ](l -1) ⋅[φkt (x ijt )](l -1) Dt ⋅ ∑ [ pˆ (xtij )](l -1) i , j =1 I ,J ∑ k =1 K t t t t ) = ∑ wunk pˆ (x tij | ωunk k ⋅φk ( x ij ) (6) 2 (11) k =1 Reasonably, we assume that convergence is reached when the relative increase in the log-likelihood is lower than a prefixed threshold ε . t t = {wunk are the weights reguwhere Wintt = {wintt k }k =1 , Wunk k }k =1 lating the contribution of each kernel for the interest and unknown classes, respectively. As commonly done in the literature we consider normalized isotropic Gaussian kernels, i.e. K φkt (x tij ) = K t t 2 ⎡ || xij − μ k || ⎤ exp ⎢− ⎣ 2σ 2tk ⎥⎦ ( 2πσ 2 tk ) 1 Dt 3.1.2 Estimation of pˆ (x tij | ωintt ) Once M t and Σt have been determined (and hence the set of kernels Φ t properly defined), we exploit the available ground truth for the class of interest ωintt at each date for deriving the estimate of the corresponding conditional density pˆ ( xtij | ωintt ) . In particular, the set of weights Wintt associated with ωintt is determined using again the EM algorithm, but solely on the available training samples T t = {xijt ∈X t | yij = ωintt } , | T t |= N t , where yij denotes the true label for pixel at position (i, j ) . Weights are initialized to 1/ K , and then updated (according with Dempster et al., 1977) using the following equation: (7) where μ kt is the centre and σ 2 tk is the variance (which tunes the smoothness of the estimate). In the following, we describe into details the procedures t ), adopted for estimating pˆ (xtij ) , pˆ (x tij | ωintt ) and pˆ (x tij | ωunk respectively. 215 In: Wagner W., Székely, B. (eds.): ISPRS TC VII Symposium – 100 Years ISPRS, Vienna, Austria, July 5–7, 2010, IAPRS, Vol. XXXVIII, Part 7B Contents I ,J ∑α ij [ wintt k ](l ) = i , j =1 ⋅ Author Index [ wintt k ](l -1) ⋅φkt (xijt ) t ⎧1 if xij ∈ T [ pˆ (x tij | ωintt )](l -1) , α ij = ⎨ t Nt ⎩0 if xij ∉ T where β > 0 tunes the influence of the context and δ is the Kronecker delta function defined as: t t Keyword Index (12) 1 2 1 2 ⎧1 if (Cij ,Cij ) = (Cgh ,Cgh ) 1 2 1 2 ⎩0 if (Cij ,Cij ) ≠ (Cgh ,Cgh ) δ [(Cij1 ,Cij2 ),(C gh1 ,Cgh2 )] = ⎨ The corresponding log-likelihood is given by: 3.3 Iterative Labelling I ,J ln L(θ t ≡ Wintt ) = ∑α ij ⋅ ln[ pˆ (x tij | ωintt )] (13) i , j =1 Solving Eq. (2) is equivalent to maximize the log-likelihood ln L(X 1 ,X 2 | C1 ,C 2 ) , which can be written as: Even in this case we assume that convergence is reached if the relative increase in the log-likelihood is lower than ε . 2 t ln L(X 1 ,X 2 | C 1 , C 2 ) = −∑U data (X t , C t ) −U contex (C 1 , C 2 ) − ln( Z ) (19) t =1 t 3.1.3 Estimation of pˆ (x tij | ωunk ) It is worth noting that Eq. (3) can be re-written as: K K K k =1 k =1 k =1 t (X t , C t ) = − ln[ p (X t | C t )] , represents the classwhere U data conditional energy function at date t = 1,2 , while U contex is given by (17). Since Z solely depends on the selected type of neighbourhood, the final problem becomes solving: t t t t t t t t t t t P(ωunk ) ⋅ ∑ wunk k ⋅φk (xij ) = ∑ wk ⋅φk ( xij ) − P(ωint ) ⋅ ∑ wint k ⋅φk (xij ) (14) t t t t t ) ⋅ wunk Hence, ∀k = 1,…, K it holds P (ωunk k = wk − P (ωint ) ⋅ wint k . t ) = 1 , we have: Since P (ωintt ) + P (ωunk t wunk k = (18) wkt − P(ωintt ) ⋅ wintt k wkt − P(ωintt ) ⋅ wintt k = t P (ωunk ) 1− P(ωintt ) 1 2 {C 1 , C 2 } = argmin{U contex (C 1 , C 2 ) + U data (X 1 , C 1 ) + U data (X 2 , C 2 )} (20) C 1 ,C 2 To this aim, we propose a strategy based on the Iterated Conditional Modes (ICM) algorithm (Besag, 1996) which allows maximizing local conditional probabilities sequentially. In particular, at each iteration l we update the estimated prior probabilities for the class of interest at each date Pˆ (ωintt ) and, accordingly, also the class-conditional densities of the unknown t ) . The algorithm works as follows: classes pˆ (x tij | ωunk (15) Then, as M t , Σt , W t and Wintt have been determined, we can t ) substituting (15) into (6), upon it is possicompute pˆ (x tij | ωunk ble to obtain a reliable estimate Pˆ (ωintt ) for the prior probability of the class of interest. This can be properly accomplished throughout the iterative labelling phase presented below. Step 1. After estimating pˆ ( xtij | ωintt ) , ∀xtij ∈X t , t = 1,2 following the approach described in the previous paragraphs, set Pˆ (ωintt ) = 0.5 (no prior knowledge is assumed to be available about the true prior P (ωintt ) ) and compute the conditional dent ) accordingly; sity of the unknown classes pˆ (x tij | ωunk Step 2. Derive the initial sets of labels C 1 , C 2 by solely minimizing the non-contextual terms of Eq. (20), i.e. 1 2 {C 1 , C 2 } = argmin{U data (X 1 , C 1 ) + U data (X 2 , C 2 )} ; Step 3. On the basis of current C 1 and C 2 , compute the new estimated prior probabilities ratioing the number of pixels associated with the class of interest over the whole number of pixels, i.e. Pˆ (ωintt ) = Cintt ( I ⋅ J ) , Cintt = {Cijt | Cijt ∈ C t ,Cijt = ωintt }iI,,jJ=1 ; then, update the class-conditional densities for the unknown t classes pˆ (x tij | ωunk ) accordingly; Step 4. Update C 1 and C 2 according with Eq. (20); Step 5. Repeat Step 3 and Step 4 until no changes occur between successive iterations. At the end of the process, the final targeted change-detection map C * is defined as: 3.2 Joint Prior Modelling For modelling the joint prior P (C 1 , C 2 ) , we propose an approach based on MRF (Solberg et al., 1996). In particular, we assume that the couple of labels Cij1 , Cij2 associated with pixel at position (i, j ) at times t1 and t2 depends on the couples of labels associated with pixels belonging to the spatial neighbourhood G ij of (i, j ) at the two dates (we always considered first order neighbourhoods). In other words, the higher the number of its spatial neighbours experiencing a certain land-cover transition is, the higher the probability for a given pixel of experiencing the same transition is. In this hypothesis, according with the MRF theory, it holds the equivalence: P (Cij1 ,Cij2 | Cgh1 , Cgh2 ;( g , h) ∈ G ij ) = Z −1 ⋅ exp[−U contex (Cij1 ,Cij2 )] (16) ⎧1 C * = {Cij*}iI,,jJ=1 , Cij* = ⎨ ⎩0 where Z = Z (G ij ) is a normalizing constant called partition function, while U contex is a Gibbs energy function (accounting for the spatio-temporal context) of the form: if Cij1 = ωint1 and Cij2 = ωint2 otherwise (21) 4. EXPERIMENTAL RESULTS U contex (C ,C ) = − 1 ij 2 ij ∑ ( g ,h )∈G ij β ⋅δ [(C ,C ),(C ,C )] 1 ij 2 ij 1 gh 2 gh (17) In order to assess the effectiveness of the proposed technique, we carried out several experiments with different combinations 216 In: Wagner W., Székely, B. (eds.): ISPRS TC VII Symposium – 100 Years ISPRS, Vienna, Austria, July 5–7, 2010, IAPRS, Vol. XXXVIII, Part 7B Contents Author Index Keyword Index (Cristianini and Shawe-Taylor, 2000). For the selection of the two free parameters (i.e., a penalization parameter and the variance of considered Gaussian kernels) we employed a 10-fold cross-validation strategy (Duda et al., 2000). Available prior knowledge has been used for defining regions of interest composed on the whole by 21941 pixels whose ground truth was known at the two times, respectively. 10 landcover classes have been considered at t1 , whereas 9 have been taken into consideration at t2 . At both dates, from all the available labeled samples we defined two spatially-disjoint training sets (see Table 1). This means that there is no overlapping between training samples at t1 and t2 . All of them have been used for training both the ML and SVM supervised classifiers at each time. of multispectral, hyperspectral and SAR data. Nevertheless, due to space constraints, we will focus the attention solely on a representative change-detection problem referring to an intense farming area experiencing several land-cover transitions located in Barrax, a village close to Albacete (Castilla-La Mancha, Spain). In particular, available data consist in: two Landsat-5 Thematic Mapper (TM) images acquired on 15th July 2003 and 17th July 2004, respectively (as generally done in the literature among the 7 spectral bands we did not consider the lowresolution band associated with the thermal infrared channel); one PROBA CHRIS image (composed by 63 spectral bands with centre wavelengths from 400 to 1050 nm) acquired on 16th July 2004; and one Envisat ASAR alternate polarization (VV, HH) image (despeckled using a 3× 3 Gamma filter) acquired on 18th July 2004. All of them have been properly co-registered to a common spatial geometry of 30 meters and a study area of 512 × 512 pixels has been selected. July 2003 will be referred to as t1 , whereas July 2004 will be referred to as t2 (no significant changes indeed occurred between 16th and 18th July 2004). From the original available images we derived the following three datasets (using the stacked vector approach for multisource data fusion (Richards and Jia, 2006)) composed by: i) the 6 Landsat TM bands at both t1 and t2 (i.e., D1 = D2 = 6 ) [Dataset I]; ii) the 6 Landsat TM bands at t1 and the merger of the 2 Envisat ASAR backscattering intensity images with the 6 Landsat TM bands at t2 (i.e., D1 = 6 and D2 = 8 ) [Dataset II]; and iii) the 6 Landsat TM bands at t1 and the 63 PROBA CHRIS bands at t2 (i.e., D1 = 6 and D2 = 63 ) [Dataset III]. However, it is worth noting that our objective is not to seek for a set of features at both dates which could be more effective for solving the investigated problem, but rather to demonstrate that the presented method is even capable of effectively handling different types of data at the two times. As described in Section 3, the user is required to set the number of Gaussian kernels K to be employed for approximating the PDFs. Hence, in order to understand how significant the selection of this free parameter is, we performed a series of experiments varying K from 20 to 120 with steps of 10. According with a variety of experiments on toy and real datasets we fixed ε = 10−4 and β = 102 . In all the trials we employed the k-means clustering for initializing both centres and variances of kernel functions. Nevertheless, the very high complexity of the algorithm (i.e., approximately O( N D⋅( K +1) log N ) , where N and D represent the number of samples to be clustered and their dimensionality, respectively (Inaba et al., 1994)), prevented us from using all the patterns of each investigated image at a time, as this would have required a very high computational burden. In order to overcome this limitation, for each image we ran the k-means algorithm on a random subset containing one third of the total amount of samples. However, as this might affect the final change-detection accuracies of the proposed technique, for each number of considered kernels K , we performed 10 different trials running each time the k-means clustering on a different random subset. Moreover, we finally also combined the 10 resulting change-detection maps through a majority voting ensemble. For validating the potentialities of the presented method, we compared the results with those obtained by supervised PCC. In particular, we considered ML and SVM fully-supervised classifiers trained by exploiting a complete ground truth for all the land-cover classes characterizing each considered date. ML is a simple yet generally rather effective statistical classifier, which does not require the user to set any free parameter (Richards and Jia, 2006). SVM are advanced state-of-art classifiers, which proved capable of outperforming other traditional approaches Land-cover class alfalfa bare soil corn garlic grasslands onions poppy potatoes spring crops stubble sugar beet sunflower wheat total t1 (July 2003) 2031 2585 1737 101 – 213 336 208 – 2416 365 – 369 10361 t2 (July 2004) 634 – 2664 302 42 220 – 283 3318 2247 – 449 – 10159 Table 1. Number of spatially-disjoint training samples considered at the two dates. Change-detection results have been evaluated (over those samples whose land-cover class is known at both dates) in terms of: percentage overall accuracy OA% (i.e., the percentage of samples correctly identified as both changed or unchanged over the whole number of samples), and kappa coefficient of accuracy (which also takes into consideration errors and their type) (Richards and Jia, 2006). Among the different land-cover transitions occurred between the two dates, here we take into consideration (one at a time) two of them, namely “bare soil to spring crops” and “alfalfa to corn” (experienced by 4583 and 1035 pixels, respectively, over the whole available 21941 whose ground truth was known at both times). In our trials, we empirically experienced that a common range for K resulting in average high detection accuracies spans from 60 to 80. When instead nearing the lower or the upper bound of the considered interval, performances tend to vary depending on the specific land-cover transition of interest. Accordingly, in Table 2 and Table 3 we show the results obtained with the proposed technique for K = 40 , 60, 80, 100. In particular, we report the median over the 10 realizations with different k-means clustering initialization. Moreover, also accuracies finally obtained with the majority voting ensemble (denoted as PSCDMV), as well as those obtained by supervised PCC using ML and SVM (denoted as PCCML and PCCSVM, respectively) are presented. While investigating the “bare soil to spring crops” transition with the proposed PSCD technique, from all the training pixels reported in Table 1, we considered the only 2585 available for bare soil at t1 and the only 3318 spatially-disjoint available for 217 In: Wagner W., Székely, B. (eds.): ISPRS TC VII Symposium – 100 Years ISPRS, Vienna, Austria, July 5–7, 2010, IAPRS, Vol. XXXVIII, Part 7B Contents Author Index two times, while providing accuracies comparable with those of fully-supervised methods. In particular, the PSCD technique relies on a partially-supervised approach and jointly exploits the Expectation-Maximization (EM) algorithm and an iterative labelling strategy based on Markov random fields (MRF) accounting for spatial and temporal correlation between the two images. Moreover, it also allows handling images acquired by different sensors at the two considered times. Experimental results on multi-sensor datasets derived from multispectral, hyperspectral and SAR data confirmed the effectiveness and the reliability of the proposed technique spring crops at t2 . Obtained results are very satisfactory, as confirmed by both the high kappa and OA% values reported in Table 2 (always higher than 0.79 and 92, respectively). Moreover, by employing the majority voting ensemble it is possible to further improve the performances and obtaining for both indexes accuracies even closer to those obtained by PCCML and PCCSVM with a fully-supervised training at both dates. K 40 60 80 100 PSCDMV PCCML PCCSVM Dataset I kappa OA% 0.8925 96.59 0.8903 96.53 0.8822 96.25 0.8878 96.47 0.9076 97.08 0.9339 97.88 0.9501 98.36 Dataset II kappa OA% 0.8403 95.09 0.8458 95.21 0.8412 95.11 0.8470 95.31 0.8649 95.82 0.9323 97.83 0.9359 97.90 Dataset III kappa OA% 0.7955 92.61 0.8488 94.97 0.8441 94.80 0.8502 94.93 0.8622 95.32 0.9330 97.86 0.9703 99.02 REFERENCES Besag, J., 1986. On the statistical analysis of dirty pictures. Journal of the Royal Statistical Society, Series B, 48, pp. 259302. Bruzzone, L., and Fernàndez-Prieto, D., 1999. A Technique for the Selection of Kernel-Function Parameters in RBF Neural Networks for Classification of Remote-Sensing Images. IEEE Trans. Geosci. Remote Sens., 37(2), pp. 1179-1184. Table 2. kappa coefficient of accuracy and OA% obtained for the “bare soil to spring crops” land-cover transition. While addressing the “alfalfa to corn” transition with the proposed PSCD technique, from all the training pixels reported in Table 1, we considered the only 2031 available for alfalfa at t1 and the only 2664 spatially-disjoint available for corn at t2 . Such a transition is rather difficult to characterize, as only experienced by few fields in the considered area. Indeed, according with the results in Table 3, this is confirmed by the very low accuracies obtained by PCCML despite fully-supervised training. Instead, in the light of the high complexity of the problem, performances exhibited by the proposed method are very promising, especially for Dataset II and Dataset III. Moreover, with the majority voting ensemble the gap with respect to PCCSVM becomes very small (Dataset II) or it is even possible to outperform results obtained with SVMs (Dataset III). K 40 60 80 100 PSCDMV PCCML PCCSVM Dataset I kappa OA% 0.6725 97.36 0.7110 97.79 0.6530 97.25 0.6553 97.32 0.7524 98.16 0.5163 96.87 0.9003 99.08 Dataset II kappa OA% 0.8549 98.79 0.8190 98.49 0.8172 98.45 0.7233 97.92 0.8896 99.06 0.5134 96.86 0.9244 99.32 Keyword Index Coppin, P., Jonckheere, I., Nackaerts, K., and Muys, B., 2004. Digital change detection methods in ecosystem monitoring: a review. Int. J. Remote Sens., 25(9), pp. 1565-1596. Cristianini N., and Shawe-Taylor, J., 2000. An Introduction to Support Vector Machines. Cambridge University Press. Dempster, A., Laird, N., and Rubin, D., 1977. Maximum Likelihood from Incomplete Data Via the EM Algorithm. The Royal Statistical Society, Series B, 39(1), pp. 1-38. Duda, R. O., Hart, P. E., and Stork, D. G., 2000. Pattern Classification. Wiley, New York. Inaba, M., Katoh, N., and Imai, H., 1994. Applications of weighted voronoi diagrams and randomization to variancebased k-clustering. Proceedings of the 10th Annual Symposium on Computational Geometry, pp. 332–339. Dataset III kappa OA% 0.8117 98.40 0.8343 98.60 0.8206 98.45 0.8367 98.62 0.9079 99.22 0.5353 97.05 0.8969 99.04 Jensen, J. R., 2009. Remote Sensing Of The Environment. Pearson, Upper Saddle River. Jeon, B., and Landgrebe, D. A., 1999. Partially supervised classification using weighted unsupervised clustering. IEEE Trans. Geosci. Remote Sens., 37(3), pp. 1073-1079. Table 3. kappa coefficient of accuracy and OA% obtained for the “alfalfa to corn” land-cover transition. Lu, D., Mausel, P., Brondizio, E., and Moran, E., 2004. Change detection techniques. Int. J. Remote Sens., 25(12), pp. 23652407. 5. CONCLUSIONS Radke, R. J., Andra, S., Al-Kofahi, O., and Roysam, B., 2005. Image Change Detection Algorithms: A systematic Survey. IEEE Trans. Geosci. Remote Sens., 14(3), pp. 294-307, 2005. In this paper we presented a novel partially-supervised changedetection (PSCD) technique capable of addressing targeted change-detection problems where the objective is to identify one (or few) targeted land-cover transitions, under the assumption that ground-truth information is available for the only (few) class(es) of interests at the two investigated dates. In this context, either supervised or unsupervised standard approaches cannot be effectively employed. The proposed method, instead, allows exploiting the only prior knowledge available for the specific land-cover classes of interest at the Richards, J. A., and Jia, X., 2006. Remote Sensing Digital Image Analysis. An Introduction. Springer, Berlin. Singh, A., 1989. Digital change detection techniques using remotely-sensed data. Int. J. Remote Sens., 10(6), pp. 989-1003. Solberg, A. H. S., Taxt, T., and Jain, A. K., 1996. A Markov Random Field Model for Classification of Multisource Satellite Imagery. IEEE Trans. Geosci. Remote Sens., 34(1), pp. 100-113. 218