Mean shift-based clustering of remotely sensed data¹

LIOR FRIEDMAN†, NATHAN S. NETANYAHU†§ and MAXIM SHOSHANI‡
† Dept. of Computer Science, Bar-Ilan University, Ramat-Gan 52900, Israel
‡ Dept. of Civil and Environment Engineering, Technion, Israel Institute of Technology, Haifa 32000, Israel
§ Center for Automation Research, University of Maryland at College Park, MD 20742, USA
The mean shift algorithm (MSA) is a statistical approach to the clustering problem, based on a variant of density estimation. In this paper we present the approach and its use for clustering of remotely sensed images. We explain how the approach can be applied to this type of data and show experimental results obtained on real data sets, which indicate that the MSA technique has fairly good accuracy and high reliability. The adaptation of the procedure to a parallel environment is also presented and discussed.
1. Introduction
Unsupervised clustering plays a significant role in numerous applications of image processing and remote sensing. For example, unsupervised clustering is often used to ultimately classify an area of interest into land cover categories. The approach is especially useful when reliable training data are either scarce or expensive, and when relatively little is known about the data in advance. Thus, unsupervised clustering serves as a fundamental building block in the pursuit of unsupervised classification (Jain et al. 2000).
An approach based on the principle of mean shift (MS) (Backer 1995) has been pursued in
recent years for image segmentation and clustering (Bezdek 1981, Castleman 1996, Cihlar
et al. 2000). The principle comprises, essentially, a variant of statistical density estimation.
Typically, an MS procedure (which operates in feature space) examines, for each data point, the "center of mass" of its local neighborhood and then shifts the point in the general direction of that center of mass. This process is repeated iteratively until every point converges to its cluster center.
Given the special characteristics of MS-based clustering (see below) and its enhanced performance on color images (Castleman 1996), we believe that it is of interest to the
¹ A preliminary version of this paper appeared in Proceedings of the IEEE International Conference on Geosciences and Remote Sensing (Friedman et al. 2003).
remote sensing community to investigate the method's applicability also to large, multispectral data sets. In general, an MS-based approach has the following characteristics:
1. It does not assume a specific type of data distribution, unlike many standard clustering approaches which assume, e.g., a Gaussian distribution. Instead, it takes the most general approach, whereby the immediate region of each point is examined and a corresponding cluster is estimated.
2. Unlike ISOCLUS (or k-means), it does not require a pre-specified number of
clusters.
3. In general, it is fully deterministic (assuming that its specific implementation does not depend on initially selecting a subset of the data points). This makes it easy to understand and analyze.
4. It is highly parallelizable, which should prove especially valuable and efficient in
remote sensing applications. (Parallelization can be realized, e.g., by applying the
MS procedure in parallel to disjoint feature subsets and combining the individual
clusters obtained.)
Many standard clustering approaches make certain assumptions as to the data distribution.
The most common is that of a Gaussian distribution. The mean shift procedure does not
make any such assumptions. Instead, it takes the most general approach to the problem, by
examining for each point its neighboring region, and estimating which cluster the point
belongs to according to its local density.
The mean shift algorithm is a deterministic process. Although some variants of the
process depend on an initial choice of points from the starting data set, the main approach
by which we treat the entire data set is deterministic and is, therefore, easier to understand
and analyze.
In addition, the algorithm is highly parallel, which is a most important feature in remote sensing applications. The aim of such applications is to analyze satellite-acquired images of very large regions. Typically these images are multispectral and cover a large area, resulting in large data sets.
Recent hardware developments have significantly reduced the cost of parallel systems, and the penetration of local area networks into the office environment has made such systems more pervasive. Overall, the growing availability of CPU power for distributed processing makes it possible nowadays to process larger data sets more accurately, which, as mentioned, is critical in remote sensing applications.
We have studied the mean shift algorithm in the context of clustering. The main goal of this paper is to demonstrate the applicability of the mean shift approach to clustering of remotely sensed data. Furthermore, we demonstrate the use of parallelism to reduce the running time of the procedure, thereby demonstrating the benefits of local area networks in the computation of demanding image processing applications.
The rest of the paper is organized as follows: Section 2 provides background on the mean shift procedure. In Section 3 we present the steps taken to demonstrate the applicability of mean shift in remote sensing, as well as the steps taken for a parallel implementation. In Section 4 we show the results achieved by the work described, and in Section 5 we present our conclusions.
2. Background of the mean shift algorithm
The idea of the mean shift algorithm was first suggested by Fukunaga and Hostetler (1975) for use in cluster analysis (Silverman 1986, Castleman 1996). It is categorized within the statistical approaches to clustering or, more specifically, as a density estimation approach. Simply put, the idea behind the algorithm is to estimate, in the proximity of each point, the average density and shift the point in that general direction.
Mean shift is a simple iterative procedure, in which each data point is "shifted" toward the average of the data points in its neighborhood. The output of the algorithm is a mapping of each point in the original data set S to its appropriate cluster center; this information is denoted by the vector of means. Figure 1, taken from Comaniciu and Meer (1999), depicts the trajectory of a single point during the mean shift process.
Figure 1: Trajectory of a point in a mean shift procedure.
Cheng (1995) generalized the original definition in three aspects:
1. First, he allows the use of any kernel instead of the flat kernel. A kernel function is defined by him as follows: Let X be the n-dimensional Euclidean space, $\mathbb{R}^n$. Denote the i-th component of $x \in X$ by $x_i$. The norm of $x \in X$ is a nonnegative number $\|x\|$ such that $\|x\|^2 = \sum_{i=1}^{n} x_i^2$, and the inner product of x and y in X is $\langle x, y \rangle = \sum_{i=1}^{n} x_i y_i$. A function $K : X \to \mathbb{R}$ is said to be a kernel if there exists a profile $k : [0, \infty) \to \mathbb{R}$, such that $K(x) = k(\|x\|^2)$ and
   - k is non-negative,
   - k is non-increasing, i.e., $k(a) \ge k(b)$ if $a < b$, and
   - k is piecewise continuous and $\int_0^{\infty} k(r)\,dr < \infty$.
2. The second generalization is weighting the points in the computation of the sample mean. This weight function can be constant throughout the procedure or can change between iterations.
3. Finally, he allowed the initial data set to be any subset of the original data.
This leads to the following generalization of the mean shift procedure: Let $S \subset X$ be a finite set (the "data"), let K be a kernel, and let $w : S \to (0, \infty)$ be a weight function. The sample mean at $x \in X$ with kernel K is defined as

$$ m(x) = \frac{\sum_{s \in S} K(s - x)\, w(s)\, s}{\sum_{s \in S} K(s - x)\, w(s)}. \qquad (1) $$

Let $T \subset S$ be a finite set. The evolution of T in the form of iterations $T \leftarrow M(T)$, with $M(T) = \{ m(t) : t \in T \}$, is called the generalized mean shift algorithm.
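To make these definitions concrete, the following minimal Python sketch (our illustration; the function names are ours, not from the cited work) implements two admissible profiles and the sample mean of equation (1):

import numpy as np

def flat_profile(r, h=1.0):
    # Flat (uniform) profile: k(r) = 1 if r <= h^2, else 0.
    return (r <= h * h).astype(float)

def epanechnikov_profile(r, h=1.0):
    # Epanechnikov profile: k(r) = 1 - r/h^2 on [0, h^2], else 0.
    return np.clip(1.0 - r / (h * h), 0.0, None)

def sample_mean(x, S, w, profile=flat_profile, h=1.0):
    # Weighted sample mean m(x) of equation (1), with K(s - x) = k(||s - x||^2).
    # x: query point, shape (d,); S: data, shape (n, d); w: weights, shape (n,).
    r = ((S - x) ** 2).sum(axis=1)   # squared distances ||s - x||^2
    k = profile(r, h) * w            # kernel values times point weights
    return (k[:, None] * S).sum(axis=0) / k.sum()

Here h plays the role of the kernel window size (bandwidth); with a flat profile, a query point whose window contains no data points has an undefined mean, so in practice the window must be chosen large enough.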
The original mean shift version proposed by Fukunaga and Hostetler (1975) initially assigns T to be S; it also uses the computed means M(T) at the end of each iteration to replace S. This procedure is known as the "blurring" process. However, as Cheng (1995) showed, T can be assigned any subset of the data while S remains intact throughout the procedure. It will be shown later that this convention has great advantages in constructing a parallel implementation of the mean shift. The pseudocode of the generalized version of the mean shift algorithm is provided below in Figure 2.
Although in the mean shift procedure each point is shifted toward the maximum average density of its neighborhood, the convergence of the process is not self-evident; each iteration only guarantees movement in infinitesimal steps. Cheng and Wan (1992) first proved the convergence of the process for the discrete case. This was also shown by Comaniciu and Meer (1999). Cheng (1995) also addressed issues such as the rate of convergence, the number of clusters produced by the process, and the effect of the kernel size on the convergence rate.
Comaniciu et al. (2001) further studied the effect of the kernel function and the window size on the process. In Comaniciu and Meer (1999) they suggested the use of the Epanechnikov kernel, which minimizes the mean integrated squared error (MISE), for filtering and segmentation of an image. This kernel, defined by

$$ K(x) = \begin{cases} 1 - \|x\|^2 & \text{if } \|x\| \le 1, \\ 0 & \text{otherwise,} \end{cases} $$

was used by them in applying the procedure to a so-called joint spatial-range domain for segmentation. In Comaniciu et al. (2001) the advantages of using an adaptive window size were introduced. It was shown that for some applications it is beneficial to allow the computation of the kernel to adapt to each evaluated point according to its local neighborhood.
As can be seen from Cheng's generalization and other related work (Meer 2005), before any attempt to implement a mean shift procedure one must consider the following parameters:
1. the specific algorithmic variant,
2. the kernel function to be used, and
3. the window size.
S ← initial data set
T ← S
K ← kernel function
w : S → (0, ∞)
i ← 0; M_0(T) ← T
do {  // a single shift iteration (computes M_{i+1}(T))
    for each t ∈ T {
        m_{i+1}(t) ← 0    // will hold the new sample mean of t
        k ← 0             // will hold the weighted sum of the kernel
        for each x ∈ S {
            m_{i+1}(t) ← m_{i+1}(t) + K(m_i(t) − x) · w(x) · x
            k ← k + K(m_i(t) − x) · w(x)
        }
        m_{i+1}(t) ← m_{i+1}(t) / k
    }
    S ← M_{i+1}(T)    // optional (done in the blurring variant)
    i ← i + 1
} while M_i(T) ≠ M_{i−1}(T)    // until convergence

Figure 2: Pseudocode of the generalized mean shift algorithm.
The exact effect of these parameters on the results is discussed in the following sections. It suffices to say that different combinations of these parameters will result in different clustering outcomes.
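As a companion to Figure 2, the following is a runnable sketch of the generalized procedure (our illustration, assuming the sample_mean helper sketched in Section 2 and a simple tolerance-based convergence test):

def mean_shift(S, T=None, w=None, profile=flat_profile, h=1.0,
               blurring=False, tol=1e-6, max_iter=100):
    # Generalized mean shift: iterate T <- M(T) until the shifts vanish.
    # blurring=True replaces S by the shifted means after each iteration
    # (the original Fukunaga-Hostetler variant); otherwise S stays fixed.
    S = np.asarray(S, dtype=float)
    T = S.copy() if T is None else np.asarray(T, dtype=float)
    w = np.ones(len(S)) if w is None else np.asarray(w, dtype=float)
    for _ in range(max_iter):
        M = np.array([sample_mean(t, S, w, profile, h) for t in T])
        shift = np.abs(M - T).max()   # largest movement in this iteration
        T = M
        if blurring:
            S = M.copy()              # optional: blurring variant (T = S)
        if shift < tol:               # until convergence
            break
    return T                          # rows converge to cluster centers

Points whose returned rows (nearly) coincide belong to the same cluster; a post-processing pass that merges centers closer than some small threshold yields the final cluster labels.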
One of the downsides of the mean shift procedure is its computational cost. In the naive approach, the new sample means are computed for all the points in the subset at each iteration. To compute the mean of a single point, we go over all the points in the original data set and apply the kernel function to them. The cost of applying the kernel function is proportional to the dimensionality of the data, resulting in an overall complexity of O(nmdN), where:
- n is the number of points in the entire data set S,
- m is the number of points chosen for the subset T,
- d is the dimension of the feature space, and
- N is the number of iterations until convergence is achieved.
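For illustration (the numbers here are ours, not from the experiments): clustering a modest 1000 × 1000-pixel scene with n = m = 10⁶ points, d = 7 bands, and N = 50 iterations already requires nmN = 5 × 10¹³ kernel evaluations, each costing O(d) operations, which explains why even relatively small images can take hours to process.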
Several approaches for reducing the complexity of the algorithm were explored.
Comaniciu and Meer (1998) combined the mean shift procedure with the k-nearest neighbor
technique in order to reduce the running time by using a smaller number of points (m). They
chose a relatively small, evenly distributed subset T and applied to it the mean shift
procedure. The rest of the points in the data set were associated with the resulting clusters
using a k-nearest neighbor approach.
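A minimal sketch of this two-stage idea (our illustration, with a simple nearest-center assignment standing in for the full k-nearest neighbor rule):

def cluster_with_subset(S, m=500, h=1.0, seed=0):
    # Run mean shift on a small random subset T only, then associate every
    # point with the nearest converged mode, in the spirit of Comaniciu and
    # Meer (1998). Near-identical modes would be merged in practice.
    S = np.asarray(S, dtype=float)
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(S), size=min(m, len(S)), replace=False)
    modes = mean_shift(S, T=S[idx], h=h)   # shift the subset only
    d2 = ((S[:, None, :] - modes[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), modes        # labels, cluster centers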
A different approach was taken by Elgammal et al. (2003), who showed how the fast Gauss transform (FGT) can speed up the calculation of the Gaussian kernel function. The FGT is an important variant of the fast multipole method, which relies on the fact that the computation of the Gaussian is required only up to a certain degree of accuracy. This approach was further generalized to higher dimensions by Yang et al. (2003), who showed that although a direct extension of the FGT to higher dimensions is exponential, an improved version of this approach can still achieve linear computational complexity.
Another method for improving the computation of the sampled mean was introduced by
Georgescu et al. (2003). They showed how the use of locality sensitive hashing (LSH) can
reduce the time it takes to compute adaptive mean shift. They showed that by improving the
time it takes to perform neighborhood queries (when evaluating the kernel function) a
substantial speedup can be achieved.
Yang et al. (2003) also showed how the use of a quasi-Newton method, specifically the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method, can improve the convergence rate of the mean shift procedure, thereby obtaining a faster procedure.
Another way to address the computational needs of the mean shift procedure is to adapt it to run in parallel. As Cheng (1995) pointed out, mean shift is, at its core, a parallel process. In this research we explored the applicability of such an approach to the mean shift procedure, and demonstrated that a parallel implementation can achieve the expected linear speedup.
3. Our methodology
3.1 Application to remote sensing
We have applied the mean shift algorithm to remotely sensed data, and obtained some early
promising results. In addition to a mere qualitative visual assessment of the results (which is
common in performance evaluation of image segmentation), we have measured
quantitatively the accuracy of our implemented version of MS against ground truth (GT) of
several remotely sensed images. On average, our current results indicate an overall accuracy
of roughly 70%. (Running k-means on the same data sets did not yield higher accuracy.)
In applying MS to remotely sensed data, we have tested the method with several kernel types and sizes. Although a more complex kernel may provide slightly better accuracy, taking into overall consideration the procedure's simplicity, running time, etc., the flat kernel appears to be "optimal" (at least according to our practical experience so far).
have also tested several MS variants, in addition to the common blurring procedure (see Comaniciu and Meer (1998) for further details). The alternative variants did not seem to improve the overall accuracy. They do have some advantages, however, as far as running time and ease of parallel implementation are concerned. As mentioned, the kernel
bandwidth is the only parameter that needs to be decided upon before the procedure is
invoked. For the specific task we have considered (i.e., clustering of NDVI data), the
optimal kernel size (i.e., the one for which the highest overall accuracy was attained)
remained essentially the same for all tested images. Note that for a very small kernel the
accuracy will approach 100% (each point will be in its own cluster); as the kernel size
increases, the overall accuracy (and the number of clusters found) will decrease. When the
number of clusters found approaches the real number of clusters, the overall accuracy will
start rising again until an optimum is reached. Afterwards, increasing the kernel size results
in reducing the overall accuracy until a single cluster is found. (See Figure 5, for curves
obtained on two real data sets.)
We have also observed that the accuracy of MS rose in proportion to d, i.e., the number of bands (dimensions) used: as more bands were used, the overall accuracy was higher. (Of course, the running time grows considerably with the dimension.)
3.2 Computationally efficient algorithm(s) for mean shift
Efficiency is an important issue for every algorithm. In remote sensing applications it is all the more significant, as we usually deal with very large images (i.e., very many points). Standard images may contain up to several million data points, and unlike standard image processing applications, which usually deal with up to three bands (RGB), remotely sensed images are typically multispectral and may contain a large number of spectral bands. For example, Landsat images contain seven different bands, and new satellites are currently capable of providing hyperspectral images that contain several hundred spectral bands. Occasionally it also makes sense to examine multitemporal images, i.e., a stack of images of the same area that were taken at different times and composed into a single image. This leads, of course, to an even larger number of image bands that need to be processed. Therefore, it is not uncommon that processing even a relatively small image could take several hours or even days.
Given this critical issue, we aimed at deriving relatively fast methods for the computation of the mean shift algorithm. Basically, mean shift runs in O(n²) time, where n is the number of data points to be processed. If we take into account that the number of bands is typically large, the multiplicative constants can grow significantly and the running time of the procedure can be very high.
[Figure 3 here: running time (in seconds) as a function of the number of points (750 to 5250), with and without hashing.]
Figure 3: Running time of mean shift with and without hashing.
We started out by examining how the different parameters (algorithmic variant, kernel type, kernel window size, etc.) affect the running time. We continued by exploring several optimization methods and their use in our application. It turns out that the mean shift algorithm has a special characteristic that is very useful for optimizing the running time. Being a deterministic process, the mean shift computation performed on a given point does not depend on other computations within the same iteration. Furthermore, since data points tend to coincide, much of the processing can be saved by storing intermediate results on the fly. To take advantage of this characteristic, we maintained a hash table that maps data points to their computed means in a given iteration. Before a new mean is computed, the algorithm first checks the table to see whether the point was already processed. Only if it was not does its mean get computed (and stored for future use).
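A minimal sketch of this memoization (the rounding used to form the hash key is our assumption; the exact key construction is an implementation detail):

def shift_all_memoized(T, S, w, profile=flat_profile, h=1.0, decimals=6):
    # One mean shift iteration with a hash table mapping points to their
    # already-computed means; coinciding points are shifted only once.
    cache = {}
    out = np.empty_like(T)
    for i, t in enumerate(T):
        key = tuple(np.round(t, decimals))   # coinciding points share a key
        if key not in cache:                 # compute each mean only once
            cache[key] = sample_mean(t, S, w, profile, h)
        out[i] = cache[key]
    return out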
Having implemented a hash table to exploit this observation, it turned out that in the blurring process, on average, 90% of the mean computations were repeated at some stage. Thus, it was possible to reduce the running time by approximately 70%-80%. Figure 3 shows the running time as a function of the number of points for both implementations, with and without the hash table mechanism. The test was performed on randomly generated data using the blurring variant with a flat kernel.
While exploring the hashing mechanism we came across another interesting aspect. If we treat close points as having the same mean, more computation can be saved. By controlling the "closeness" parameter, one can increase or decrease the processing time of the procedure at the expense of its accuracy. This technique is somewhat related to the bucketing technique that uses fast Gauss transforms (Elgammal et al. 2003). Although it seemed a promising avenue of further research, it was deemed at that stage that the accuracy of the algorithm was more important than the additional speedup that could be gained. Thus, no further study was performed to check the practical impact of this technique.
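In terms of the hash table sketched above, the closeness idea amounts to coarsening the hash key, e.g. (a sketch; the grid-cell key is our assumption):

def closeness_key(t, closeness=0.5):
    # Points falling in the same grid cell of side `closeness` share a hash
    # entry, and hence one computed mean: speed is gained at the cost of accuracy.
    return tuple(np.floor(t / closeness).astype(int))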
3.3 Parallelized mean shift
As Cheng and Wan (1992) noted, mean shift can be executed in parallel. Consider, for example, the blurring process. At each iteration the new sample mean is calculated for the entire set T using only results from previous iterations and the weight function w. This means that the computations at each iteration can be divided among several processors, as long as they possess the results of the previous iteration.
On P_1:
    S ← initial data set
    K ← kernel function
    w : S → (0, ∞)
    divide T ( = S ) into n sections T_1, …, T_n
    send S and T_j to each P_j, j = 2, …, n
    compute M(T_1) until convergence
    for j = 2, …, n
        receive M(T_j) from P_j

On P_j ∈ {P_2, …, P_n}:
    receive S and T_j from P_1
    compute M(T_j) until convergence
    send M(T_j) to P_1

Figure 4: Pseudocode of parallel mean shift using the initial data variant.
Of course, other algorithmic variants can be adapted to a multiprocessor environment. Figure 4 contains pseudocode describing a parallel implementation of the initial data variant. Using this variant removes the need to communicate intermediate results back between iterations, so each processor can carry the entire mean shift computation independently to its end. (In this variant the data set S is kept constant throughout the procedure, while each processor computes the next-phase results relative to S; see Cheng 1995.)
Our parallel implementation involves the use of a server, a machine responsible for coordinating the entire process. The process starts when the server loads the data set for the tested image. It then waits until all the clients are connected to it and are ready to start processing. The server then communicates the initial data to all the clients and allocates the points according to the number of clients. When a client receives a set of points, it starts processing and reports the results back to the server. The server then combines the received results to create a single image. If necessary, as in the blurring variant, the process is repeated several times until the entire procedure is completed and the final results are integrated into the output image. Parallel implementations can vary in many aspects: the initial data set, the approximation used, whether intermediate results are communicated back between iterations, and the exact work distribution among the different processors.
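For illustration, the server/client division can be sketched with Python's multiprocessing module (our own sketch; the actual implementation used client machines on a local area network rather than local processes):

from multiprocessing import Pool

def _client(args):
    # Client side: run the initial data variant to convergence on one
    # section of T, with the full data set S kept fixed throughout.
    T_section, S, h = args
    return mean_shift(S, T=T_section, h=h, blurring=False)

def parallel_mean_shift(S, h=1.0, n_clients=4):
    # Server side: split T = S into sections, farm them out, and
    # concatenate the converged means (the initial data variant, Figure 4).
    S = np.asarray(S, dtype=float)
    sections = np.array_split(S, n_clients)
    with Pool(n_clients) as pool:
        results = pool.map(_client, [(sec, S, h) for sec in sections])
    return np.concatenate(results)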
In summary, the parallel implementation of the mean shift procedure does yield the expected speedup. Although there are currently still some open issues regarding the optimal distribution between the processors, it is clear that even a relatively simple mechanism can considerably reduce the processing time.
4. Experimental results
The selected study area represents a Mediterranean environment in Southern Israel, and contains a relatively large number of crop types. Ground truth of the area is available thanks to a survey conducted by experts of the Israeli Ministry of Agriculture. Five Landsat-TM images were acquired during the '96-'97 growing season, under clear sky conditions (see Figures 6 and 7). These images were radiometrically calibrated using the empirical line method (Shoshany and Svoray 2002, Cohen and Shoshany 2002), and geometrically rectified with 0.5-pixel positional accuracy. The image acquisition dates allow for a representation of the different phyto-phenologies of crop types in this environment, i.e., distinguishing between summer and winter crops. Each of the images was converted to a Normalized Difference Vegetation Index (NDVI) layer to form a multi-layer input for subsequent clustering/classification. Following is an exemplary result of applying mean shift to the given area. The overall accuracy was arrived at by first mapping the clusters found against the clusters provided by GT: a cluster is mapped to the GT cluster with which it has the largest number of overlapping points, and the remaining points are considered errors. In some cases, if several clusters correspond to the same GT cluster, they are merged.
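The overall accuracy computation just described can be summarized as follows (a sketch; merging clusters mapped to the same GT cluster is implicit in the per-cluster majority count):

def overall_accuracy(ms_labels, gt_labels):
    # Map each MS cluster to the GT cluster with the largest overlap, then
    # count the fraction of points covered by these majority mappings.
    ms_labels, gt_labels = np.asarray(ms_labels), np.asarray(gt_labels)
    correct = 0
    for c in np.unique(ms_labels):
        gt_in_c = gt_labels[ms_labels == c]
        _, counts = np.unique(gt_in_c, return_counts=True)
        correct += counts.max()   # majority overlap; the rest are errors
    return correct / len(gt_labels)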
Table 1: Contingency table for Image 2, with flat kernel, and bands 2, 3, 4, 5.
Results obtained: overall accuracy = 70.12%, number of clusters = 8
(2,888 points in ground truth), and optimal kernel distance = 21.

GT/MS        1        2        3        4        5        Reliability
1            0        5        24       52       430      84.15%
2            2        18       19       12       170      77.63%
3            3        0        51       47       310      75.98%
4            7        0        2        73       650      89.66%
5            76       5        0        37       465      91.72%
Unknown      2        0        132      138      158      36.92%
Accuracy     82.69%   97.14%   66.38%   70.73%   57.62%
The first set of tests was aimed at examining the basic behavior of the algorithm. To keep things simple, the blurring variant was used with the flat kernel function. Various tests were then performed on one of the images. These tests included runs with different band combinations and various kernel sizes. The aim was to find the optimal kernel size for a given band combination and to determine whether it remained roughly fixed for the same data type. The experiments were then repeated for different kernels. Specifically, we experimented with the Gaussian kernel, the truncated kernel, and the Epanechnikov kernel. The last set of experiments focused on different algorithmic variants, e.g., the blurring variant, the initial data variant, and the random select variant (Comaniciu and Meer 1999) (in which only a random subset of the data is clustered using mean shift, and the rest is classified according to minimal distance). The optimal parameter values obtained previously were used in this last set of experiments.
Having experimented extensively with the first image, it was deemed unnecessary to repeat all of the above tests on the rest of the images. Only a subset of the tests was conducted to establish the validity of the parameter settings. It appeared that the optimal values remained relatively fixed for the same data type. The rest of this section provides a detailed presentation of the results obtained.
We arrived at two important conclusions. First, the mean shift procedure reaches an overall accuracy of roughly 70% and an overall reliability of 80%. This performance is satisfactory, since it is comparable to the performance of different approaches on the same data. Secondly, the more complex kernel functions did not seem to have a drastic effect on the accuracy. Considering the significant processing they require, it might be desirable to use the simpler flat kernel instead.
At this point it was also evident, as was assumed initially, that working with a larger number of bands adds more information, which results in a more accurate classification. For example, using two bands resulted in an overall accuracy of roughly 60%, while using 3 and 4 bands resulted in 65% and 70%, respectively. Of course, this justifies the additional processing required for a larger number of bands. It was not conclusive, however, which algorithmic variant to use. Although the blurring process is generally more accurate than the initial data variant, it seems that the difference in accuracy is not significant (although the initial data variant usually tends to recognize more clusters than there are). On the other hand, as will be explained in the following section, the advantages of the initial data variant, specifically in terms of parallelization, might justify its use.
[Figure 5 here: overall accuracy (64% to 72%) as a function of kernel size (14 to 24), for two data sets.]
Figure 5: Overall accuracy for two data sets as a function of the kernel size (with flat kernel).
Secondly, as can be seen in Figure 5, the accuracy is usually high for a small kernel distance. As the kernel size increases, the overall accuracy tends to decrease, as does the number of clusters that the procedure recognizes. This trend continues until the number of detected clusters approaches the real underlying number of clusters. In that region the accuracy has a local minimum, after which the overall accuracy rises again (this shows as a knee shape in the graphs). When the kernel distance is further increased, the accuracy starts to decrease again, as all points converge into a small number of clusters. The local minimum behavior is a good method of detecting the region in which the optimal kernel distance resides.
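A sketch of this heuristic (our illustration, not part of the original implementation): sweep the kernel size, record the overall accuracy, and look for the interior local minimum of the resulting curve:

def optimal_kernel_region(hs, accuracies):
    # hs: increasing kernel sizes; accuracies: overall accuracy per size.
    # The optimal kernel size lies near the local maximum that follows the
    # first interior local minimum of the accuracy curve (the "knee").
    a = np.asarray(accuracies)
    interior = range(1, len(a) - 1)
    minima = [i for i in interior if a[i] <= a[i - 1] and a[i] <= a[i + 1]]
    if not minima:
        return None                    # no knee detected in the sweep
    i0 = minima[0]                     # first interior local minimum
    return hs[i0 + int(np.argmax(a[i0:]))]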
Table 1 shows the results of the MS procedure invoked on 4 bands with a kernel size of 21. GT contained 2,888 points. The overall accuracy and reliability obtained were roughly 70% and 80%, respectively. Figure 8 shows the resulting clustered image.
Figure 6: Grey scale representation of NDVI data taken in November 1996.
Figure 7: Grey scale representation of NDVI data taken in February 1996.
Figure 8: Resulting clustered image of test Image 2 after performing mean shift on bands 2, 3, 4, 5, with a flat kernel of size 21. Overall accuracy = 70.12% (see Table 1).
5. Conclusions
The first goal of the research was to demonstrate that mean shift can be used successfully to cluster remotely sensed images. Although the results due to mean shift are not significantly better than those due to other common techniques, it is evident that mean shift is comparable. The accuracy for the tested images was roughly 70%, with a relatively high reliability. This accuracy was also produced by the ISODATA algorithm when run on the same images, and it is a common ballpark figure for various other approaches. Therefore, we can safely conclude that the mean shift procedure performs acceptably, as far as clustering of remotely sensed data is concerned.
However, the usefulness of the mean shift algorithm does not rely solely on the accuracy it achieves. Its other advantages make it an important technique for image clustering. The first advantage of mean shift is its simplicity. It has very few free variables that need to be decided upon by the user. Once the exact variant and the kernel type are chosen, only the kernel distance needs to be determined. Other approaches usually require a larger number of free variables. This advantage is very important, and it is made even more so by the ability to pinpoint the optimal values. It appeared that the accuracy had a characteristic behavior versus the kernel size. It is speculated that this distinct behavior might be used in an automated process to focus on the relatively small range in which the optimal kernel distance is expected to reside. Also, it is important to note that once found, the optimal kernel distance remained relatively fixed for all the test images. To conclude, it seems fairly reasonable to assume, at least for remotely sensed images, that the kernel distance can be easily determined for any environment type, after which it can be used for any data sets of the same type.
Another, even more important, advantage of mean shift concerns its processing time. Although mean shift is not the most efficient process around, we have shown that by using a parallel implementation of the process we can achieve a linear speedup in its processing time. This is of course a huge advantage, especially in today's computing environment, where the use of local area networks, which contain many relatively weak processors, is most common. Other approaches do not offer such an ability, and when trying to process big data sets there is no option other than to wait for the full process to be performed on a single-processor machine.
Another important thing to note is the tradeoff between the number of clusters detected by the procedure and the overall accuracy. As the number of detected clusters increases, the overall accuracy increases too. This is a direct byproduct of our computation: the overall accuracy is computed by assigning each cluster found to the ground truth cluster with which it shares the greatest number of points. For a small kernel distance, each point represents its own cluster, so the accuracy is 100%. On the other hand, for a sufficiently large kernel distance, all the points converge into a single cluster and the accuracy is proportional to the biggest real cluster (i.e., the cluster that has the most points in ground truth). These extreme cases demonstrate two things. First, in principle, controlling the kernel distance determines the number of clusters found in the image; in general, as the kernel distance increases, more points converge into the same clusters, so fewer clusters are reported by the procedure. Secondly, the computation of the overall accuracy should be combined with the number of clusters found. Only when this number is correlated with the actual ground truth can the results be considered correct. Fortunately, as we found in our experiments, the overall accuracy has a local minimum when the detected number of clusters approaches the actual number of clusters in the image, and this minimum can be used to determine the optimal kernel size.
As to open issues regarding mean shift, we propose two research directions that might yield some useful results. Although a linear speedup was demonstrated by the parallel implementation, the current application can still benefit from a more sophisticated distribution of the processing. We showed that a hash table saves considerable running time on a single processor, but it is less effective in our parallel implementation. The hash-table-related advantages can still be exploited in a parallel environment, though: if the points are distributed such that "close" points are allocated to the same machine, there will be far fewer hash misses, i.e., the benefit of the hash mechanism is expected to become apparent. A relatively simple approach to this might be that, before distributing the jobs to the different machines, the points are divided into a number of sets, each of which contains points that are relatively near each other and should therefore converge to the same cluster.
The second research direction is also related to hashing. As mentioned earlier, hashing can be used for tuning the tradeoff between time and algorithmic correctness, i.e., by allowing sufficiently close points (where "close" is variable) to share the same mean, much computation can be saved. However, the effect on the overall accuracy of the procedure should be further studied. It might turn out that such a tradeoff is not of interest, since it yields a significant reduction in the overall accuracy.
In conclusion, this research demonstrated the use of the mean shift procedure in clustering of remotely sensed images. We showed that the overall performance of this technique is comparable to that of other common approaches. We showed how the technique can be applied in remote sensing and how to adapt the various parameters to this type of data. We also demonstrated that a parallel implementation results in a linear speedup, and discussed several issues for such an implementation. The main conclusion that can be drawn is that, in view of the significant advantages that mean shift offers, it is worthwhile to incorporate it in other applications as well.
References
Jain, A. K., Duin, R. P. W., Mao, J., 2000, Statistical Pattern Recognition: A Review, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, pp. 4-37.

Backer, E., 1995, Computer-Assisted Reasoning in Cluster Analysis (New Jersey: Prentice Hall International).

Bezdek, J. C., 1981, Pattern Recognition with Fuzzy Objective Function Algorithms (New York: Plenum Press).

Castleman, K. R., 1996, Digital Image Processing (New Jersey: Prentice Hall International).

Cihlar, J., Latifovic, R., Beaubien, J., 2000, A Comparison of Clustering Strategies for Unsupervised Classification, Canadian Journal of Remote Sensing, 26, pp. 446-454.

Cheng, Y., 1995, Mean Shift, Mode Seeking, and Clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence, 17, pp. 790-799.

Cheng, Y. and Wan, Z., 1992, Analysis of the Blurring Process, in Computational Learning Theory and Natural Learning Systems, Petsche, T. et al. (Eds.) (London: MIT Press), pp. 257-276.

Cohen, Y. and Shoshany, M., 2002, Integration of remote sensing, GIS and expert knowledge in national knowledge-based crop recognition in Mediterranean environment, International Journal of Applied Earth Observation and Geoinformation, 4, pp. 75-87.

Comaniciu, D., Meer, P., 1998, Distribution Free Decomposition of Multivariate Data, International Workshops on Advances in Pattern Recognition, 1451, pp. 602-610.

Comaniciu, D., Meer, P., 1999, Mean Shift Analysis and Applications, IEEE International Conference on Computer Vision, Kerkyra, Greece, pp. 1197-1203.

Comaniciu, D., Meer, P., 2002, Mean Shift: A Robust Approach toward Feature Space Analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, pp. 603-619.

Comaniciu, D., Meer, P., 2003, Kernel Based Object Tracking, IEEE Transactions on Pattern Analysis and Machine Intelligence, 25, pp. 564-577.

Comaniciu, D., Ramesh, V., Meer, P., 2001, The Variable Bandwidth Mean Shift and Data-Driven Scale Selection, International Conference on Computer Vision, 1, pp. 438-445.

Elgammal, A., Duraiswami, R., Davis, L. S., 2003, Efficient Kernel Density Estimation Using the Fast Gauss Transform with Applications to Color Modeling and Tracking, IEEE Transactions on Pattern Analysis and Machine Intelligence, 25, pp. 1499-1504.

Fukunaga, K. and Hostetler, L. D., 1975, The Estimation of the Gradient of a Density Function with Applications in Pattern Recognition, IEEE Transactions on Information Theory, 21, pp. 32-40.

Georgescu, B., Shimshoni, I., Meer, P., 2003, Mean shift based clustering in high dimensions: A texture classification example, in International Conference on Computer Vision, Nice, France, pp. 456-463.

Meer, P., 2005, Robust Techniques for Computer Vision, in Emerging Topics in Computer Vision, Medioni, G. and Kang, S. B. (Eds.) (New Jersey: Prentice Hall).

Silverman, B. W., 1986, Density Estimation for Statistics and Data Analysis (London: Chapman and Hall).

Shoshany, M. and Svoray, T., 2002, Multidate adaptive unmixing and its application to analysis of ecosystem transitions along a climatic gradient, Remote Sensing of Environment, 82, pp. 5-20.

Yang, C., Duraiswami, R., Gumerov, N. A., Davis, L., 2003, Improved Fast Gauss Transform and Efficient Kernel Density Estimation, in International Conference on Computer Vision, Nice, France, pp. 464-471.

Yang, C., Duraiswami, R., DeMenthon, D., Davis, L., 2003, Mean Shift Analysis using Quasi Newton Methods, International Conference on Image Processing, Barcelona, 3, pp. 447-450.