Image Segmentation via Clustering Consensus Methods
Jacob Tamir
tamiob@gmail.com
Kucherov Tamir
tamirkucherov@gmail.com
Supervisors:
Prof. Volkovich Zeev
Prof. Barzily Zeev
MSc in Software Engineering, ORT Braude College
This report is submitted in partial fulfillment of the requirements
for the degree of MSc in Software Engineering, Second Stage
7.9.2012
Abstract
This project is devoted to the segmentation of images using consensus-clustering methods.
We apply clustering aggregation methods in order to obtain relevant answers to predefined questions. As is well known, no single algorithm can perfectly reflect the structure of an image. In our project we therefore cluster the images by means of several clustering algorithms and summarize the results via two clustering consensus techniques: HGPA (HyperGraph-Partitioning Algorithm) and IDLA (Information Distance Learning Approach). The first is a well-known approach used in many practical systems. The second is a new approach developed at ORT Braude College; it differs from HGPA in that it takes additional machine-learning steps into account.
The outcomes obtained by the two algorithms were summarized and compared. We came to the conclusion that the IDLA technique is preferable; its improvement over the HGPA method is 10.7%.
Table of Contents
1 Introduction ..... 1
2 Research Approach ..... 3
  2.1 Theory ..... 3
    2.1.1 Cluster Algorithms ..... 3
    2.1.2 Consensus Clustering Algorithms ..... 3
    2.1.3 Rand Index ..... 4
  2.2 Implementation ..... 5
    2.2.1 Project general flow ..... 5
    2.2.2 Data used ..... 5
    2.2.3 Design ..... 7
    2.2.4 User Interface ..... 8
    2.2.5 Use case diagram ..... 10
    2.2.6 Class Diagram ..... 11
    2.2.7 Constraints and Limitations ..... 12
    2.2.8 Programming environment ..... 12
  2.3 Testing and Validation ..... 13
    2.3.1 Experiment #1 ..... 13
    2.3.2 Experiment #2 ..... 14
    2.3.3 Experiment #3 ..... 15
    2.3.4 Experiment #4 ..... 16
    2.3.5 Experiment #5 ..... 17
  2.4 Outcomes ..... 18
References ..... 21
Appendix A ..... 22
  Distance Function ..... 22
    Euclidian distance ..... 22
    Manhattan distance ..... 22
    Minkowski distance ..... 22
Appendix B ..... 23
  Raw Results ..... 23
  Average Rand Index ..... 26
1 Introduction
Image segmentation is the process of partitioning an image into segments. Each segment represents a group of nearby pixels. The segmentation is based on similarities between pixel properties, such as pattern, color and brightness. The purpose of the segmentation process is to simplify image understanding by emphasizing the objects that interest us.
Image-recognition and machine-vision algorithms use the result of the segmentation process in order to recognize desired objects in the image, such as borders, lines, or predefined shapes.
In practice, image segmentation is used in face recognition, medical imaging, fingerprint recognition and other fields.
There are several well-known algorithms and techniques for image segmentation; the technique changes according to the nature of the problem. The segmentation technique we focus on combines clustering with consensus aggregation methods.
The main task of clustering is to partition a dataset of objects into groups (clusters), so that data objects in the same group share common properties and features and are more similar to each other than to objects in other groups [11].
The task of clustering can be achieved using a wide range of clustering algorithms. It is known that no single algorithm can perfectly reflect the image structure. In order to achieve a more accurate result, clustering should be an iterative process that summarizes several different clustering approaches. At each step, a different clustering algorithm is applied with a different initial partition and different input parameters. The output partitions are then explored and the best one is selected. In our work we investigate approaches (consensus clustering) that take the output partitions and create a single consensus partition that summarizes them.
Consensus clustering, also called cluster aggregation or cluster ensemble, refers to the problem of finding a single (consensus) partition from a number of different partitions. Input partitions are combined to output a single consensus partition that is “better” than the existing partitions [12].
Most consensus clustering methods are based on the clustering results only; thus special features of the data can be lost. A solution to this problem is to use consensus clustering together with a distance-learning procedure.
In our work we investigated two consensus-clustering approaches: HGPA (HyperGraph-Partitioning Algorithm) [2] and IDLA (Information Distance Learning Approach). A comparison between the two methods is provided.
Image Segmentation via Clustering Consensus Methods
1
As part of our work, we implemented the newly proposed aggregation method (IDLA). The new method extends the distance-learning method of Wang and Jin [9] by using it for cluster aggregation.
In order to test the IDLA method and visualize the results, a user interface (UI) was developed. The user can choose the clustering algorithms, define their parameters (number of clusters and distance function), visualize the intermediate results and use these results as input for the consensus methods (IDLA and HGPA). The outcome of each consensus method is an aggregated partition, which is then used to draw the final image.
Based on the produced images, we compare the two algorithms and draw conclusions. The comparison is made with the Rand Index method; details are given in Section 2.
Our application was tested with various images as input, and a total of 162 samples were gathered. Based on the samples and the analysis of the results, we came to the conclusion that the IDLA method produces a better consensus than the HGPA method, with an improvement of 10.7%.
The rest of the paper is organized as follows: Section 2 presents the research that was carried out. It contains the project theory, the general flow and the implementation; experiments and outcomes are also presented. The last sections provide the references used and extensions of the algorithms.
2 Research Approach
2.1 Theory
In our project we use two iterative clustering algorithms (PAM and K-Means) for
grouping the data, and two consensus algorithms (IDLA and HGPA).
2.1.1 Cluster Algorithms
Clustering is the task of assigning a set of objects into groups (called clusters); objects in the same cluster are more similar to each other than to those in other clusters. In our project we use two iterative clustering methods (PAM and K-Means). An illustration of the clustering process is presented below:
Figure 2.1: Illustration of the clustering process.
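As an illustration of the iterative clustering step, a minimal K-Means sketch follows. The project itself is implemented in C# and Matlab; this Python version is only a language-agnostic illustration (PAM is analogous, but restricts centers to actual data points, i.e. medoids).

```python
import random

def kmeans(points, k, iters=20):
    """Plain k-means over tuples (e.g. RGB pixel values); illustrative only."""
    centers = random.sample(points, k)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center
        # (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        # Update step: move each center to the mean of its cluster.
        for j, cl in enumerate(clusters):
            if cl:
                centers[j] = tuple(sum(xs) / len(cl) for xs in zip(*cl))
    # Final labeling against the converged centers.
    return [min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points]
```

Applied to an image, each point would be a pixel's color vector, and the returned labels define the segments.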
2.1.2 Consensus Clustering Algorithms
Consensus clustering aims to find a single, final clustering that is better in some sense than the existing clusterings. In this approach, all input clustering solutions, or a selected subset of them, are combined to output a single, better, consensus clustering. The result is a better clustering that emphasizes the desired objectives: clusters are better separated and objective functions are improved [12].
Figure 2.2: Consensus clustering illustration (many partitions of the dataset are combined into a single partition).
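To make the idea concrete, here is a sketch of one simple consensus scheme, a co-association (evidence accumulation) approach. This is not the HGPA or IDLA method used in the project; it is only an illustration, in Python for brevity, of how several input partitions can be merged into one:

```python
def co_association(partitions):
    """Fraction of input partitions in which each pair of points co-clusters."""
    n = len(partitions[0])
    m = len(partitions)
    sim = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    sim[i][j] += 1.0 / m
    return sim

def consensus_by_threshold(partitions, t=0.5):
    """Merge points whose co-association exceeds t (single-link style)."""
    sim = co_association(partitions)
    n = len(sim)
    parent = list(range(n))          # union-find: each point starts alone

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] > t:
                parent[find(i)] = find(j)
    return [find(i) for i in range(n)]
```

Pairs that co-cluster in a majority of the input partitions end up in the same consensus cluster; this illustrates why methods based on clustering results alone can lose the special features of the underlying data.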
We use two algorithms for consensus clustering: HGPA and IDLA.
HGPA (HyperGraph-Partitioning Algorithm)
The HGPA algorithm aggregates an ensemble of clusterings into one clustering. The cluster-ensemble problem is formulated as partitioning a hypergraph by cutting a minimal number of hyperedges; all vertices and hyperedges are equally weighted. The aim is to find a hyperedge separator that partitions the hypergraph into k unconnected components (clusters) of approximately the same size [2].
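In HGPA the partitioning step itself is delegated to a hypergraph partitioner such as hMetis, so only the construction of the hyperedges from the input partitions is sketched here (an illustrative Python sketch, not the project's C#/Matlab implementation):

```python
def hyperedges(partitions):
    """Each cluster in each input partition becomes one hyperedge:
    the set of point indices assigned to that cluster.
    A hypergraph partitioner (e.g. hMetis) would then cut as few
    of these hyperedges as possible."""
    edges = []
    for labels in partitions:
        for c in set(labels):
            edges.append({i for i, l in enumerate(labels) if l == c})
    return edges
```

Every point is a vertex, and every hyperedge ties together the points that some input clustering placed in one cluster; cutting few hyperedges therefore preserves as much of the input groupings as possible.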
IDLA (Information Distance Learning Approach)
IDLA is a proposed solution to a problem that exists in consensus clustering methods: the loss of special data features. Most aggregation methods are based on the clustering results only; e.g., if x and y belong to the same cluster in the majority of partitions, they will belong to the same cluster in the consensus partition. The IDLA method extends the distance-learning method of Wang and Jin [9] by using it for cluster aggregation.
In this paper we do not present the details of each algorithm; such information can be found in the first-stage project paper.
2.1.3 Rand Index
The similarity between two data clusterings is measured with the Rand Index. Given a set of n elements S = {q1, ..., qn} and two partitions of S to compare, X = {x1, ..., xr}, a partition of S into r subsets, and Y = {y1, ..., ys}, a partition of S into s subsets, we define the following parameters:
- a, the number of pairs of elements in S that are in the same set in X and in the same set in Y.
- b, the number of pairs of elements in S that are in different sets in X and in different sets in Y.
- c, the number of pairs of elements in S that are in the same set in X and in different sets in Y.
- d, the number of pairs of elements in S that are in different sets in X and in the same set in Y.
The Rand Index is calculated with the formula:
R = (a + b) / (a + b + c + d)
The Rand Index takes values between 0 and 1; the value 0 indicates that the two partitions are completely different and the value 1 indicates that they are exactly the same.
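The formula above can be sketched directly (an illustrative Python version; the project computes the same quantity in its C# code). A pair contributes to the numerator when the two partitions agree on it, i.e. it is an "a" pair (together in both) or a "b" pair (apart in both):

```python
from itertools import combinations

def rand_index(x, y):
    """x, y: cluster labels per element. Returns (a + b) / (a + b + c + d)."""
    agree = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        same_x = x[i] == x[j]
        same_y = y[i] == y[j]
        if same_x == same_y:       # both together (a) or both apart (b)
            agree += 1
    return agree / len(pairs)
```

Note that the index depends only on which elements are grouped together, not on the cluster labels themselves, so relabeled but identical partitions score 1.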
2.2 Implementation
This section describes the project general flow, implementation, data used and design
of our system.
2.2.1 Project general flow
Figure 2.3: Project general flow (the input picture is clustered by K-Means and PAM into several partitions, which are then combined into one consensus partition by HGPA and IDLA).
The system receives an image as input. The image is then processed by two clustering algorithms (PAM and K-Means). The clusters of each algorithm are created and presented in the user interface. Each group of clusters is considered a partition and serves as input for the clustering aggregation methods (HGPA and IDLA). The result of each clustering aggregation algorithm is presented as an image.
2.2.2 Data used
Our application receives images as input; in every run, one image is processed. The images used in our application include images we created for testing, common images used in image-processing checks and real-world images.
During development we tested the program flow with small images containing up to 10 colors and a small number of pixels; thus we verified that the internal logic is correct. Examples of such pictures are presented below.
Figure 2.4: Images used for testing, containing up to 5 colors.
Once the logic and results were verified, we tested our system on images common in the image-processing field [14]. Examples of such pictures:
Figure 2.5: Common images used in image-processing checks.
The algorithms were also applied to real-world images:
Figure 2.6: Real world images used for checks.
Our project was tested with a variety of images. The outcomes of the experiments are presented in the Testing and Validation section.
2.2.3 Design
The system uses the following layers:
1. User Interface – built with WPF, a development framework intended for building and designing rich user interfaces.
2. Business Logic – contains the application classes; this is the logic part of the system. External libraries (such as Matlab DLL files and C# matrix-calculation libraries) are also used here.
3. Data Access – one of the system's aims is to load images from the local file system; this module contains the code for such operations.
4. File System – used for storing the input images. Temporary files for internal computations are also managed on the file system, in a dedicated folder.
The system module architecture is presented in Figure 2.7 (the User Interface uses the Business Logic, which uses the Data Access layer, which uses the File System).
Figure 2.7: General system module architecture.
2.2.4 User Interface
The user interface was developed with WPF and contains two main screens. The first is the program's main page, where the user has the following options:
1. Choose the image to process.
2. Add and remove clustering algorithms. For each algorithm the user can choose the number of clusters and the distance function (Euclidean / Minkowski / Manhattan). Information about each distance function can be found in Appendix A.
3. Choose the number of clusters for the HGPA and IDLA algorithms.
Figure 2.8: Main user interface screenshot.
Once the run is launched, results for the PAM and K-Means clustering algorithms are shown. In the following example, these clustering configurations were chosen:
1. PAM with 6 clusters and the Minkowski distance function.
2. K-Means with 4 clusters and the Euclidean distance function.
3. PAM with 6 clusters and the Manhattan distance function.
4. K-Means with 5 clusters and the Minkowski distance function.
5. PAM with 7 clusters and the Euclidean distance function.
The intermediate clustering results are shown in the UI, Figure 2.9.
Figure 2.9: Visualization of intermediate clustering results.
For every clustering algorithm, the results are shown in the UI. Each clustering result (which represents a partition) is compared to the HGPA and IDLA consensus partitions; the similarity parameter is computed with the Rand Index method.
The Rand Index takes values between 0 and 1; the better result is the one with the higher Rand Index. More information about the Rand Index is presented in Section 2.1.3.
2.2.5 Use case diagram
Figure 2.10: Use-case diagram of the general system operation.
The use case above defines the interactions between the user and the system. The user loads an image that he is interested in processing; the initial image is visualized in the UI. The user can then add or remove clustering methods (PAM and K-Means). For each clustering method, there is an option to define the number of clusters and to choose the distance function to apply (Euclidean / Manhattan / Minkowski; more information about the distance functions can be found in Appendix A). An image is visualized for each clustering method. The groups of resulting clusters are created, and the aggregation methods (IDLA and HGPA) are applied. The result of each aggregation method is displayed in the UI for visualization.
2.2.6 Class Diagram
Following is the class diagram used in our project:
Figure 2.11: System class diagram.
2.2.7 Constraints and Limitations
The system design considers the following guidelines:
- Parts of the segmentation process are implemented using existing libraries (Matlab and hMetis).
- The application supports future extensions; additional algorithms, such as clustering algorithms or new consensus methods, could be integrated easily.
- Images for segmentation are loaded from the local file system.
- We refrained from using large input images (more than ~60x60 pixels). The reason is the maximum allowed matrix allocation size: on 32-bit Windows operating systems we cannot define a matrix with more than 100 million cells. The solution to this problem is to run the application on a 64-bit Windows operating system, where a larger number of elements can be defined.
2.2.8 Programming environment
The project was developed in the Microsoft Visual Studio 2010 SP2.0 development environment. The programming language is C#; the user interface is built using WPF (Windows Presentation Foundation).
The PAM algorithm and part of IDLA were implemented in the Matlab environment. With the Matlab deployment tool we created a *.dll file and used it as an external reference in our project.
2.3 Testing and Validation
This section presents our test results.
2.3.1 Experiment #1
In the first experiment we use the House image. Two runs with the PAM clustering method and one with the K-Means method were used; for each one we set a different number of clusters and a different distance function. Images for the HGPA and IDLA consensus clusterings are produced, both with seven clusters.
Figure 2.12: Experiment 1, HGPA and IDLA results in the right Results pane.
Intermediate results for each clustering method are produced. The Rand Index parameters compared to HGPA and IDLA are calculated and shown in the UI as well. All three runs produced a higher Rand Index for the IDLA algorithm; a higher Rand Index indicates that the partition of the image is more similar to the HGPA/IDLA consensus.
Figure 2.13: Experiment 1 – Intermediate results.
2.3.2 Experiment #2
Figure 2.14: Experiment 2.
Figure 2.15: Experiment 2 – Intermediate results.
In this experiment, we also obtain a higher Rand Index for the IDLA algorithm.
2.3.3 Experiment #3
Figure 2.16: Experiment 3.
Figure 2.17: Experiment 3 – Intermediate results.
In this experiment, all resulting partitions are more similar to the IDLA consensus clustering.
2.3.4 Experiment #4
Figure 2.18: Experiment 4.
Figure 2.19: Experiment 4 – Intermediate results.
In this test, two of the three images are more similar to the IDLA consensus clustering (the left one and the right one); the image in the middle is slightly more similar to the HGPA consensus clustering.
2.3.5 Experiment #5
Figure 2.20: Experiment 5.
Figure 2.21: Experiment 5 – Intermediate results.
In this test, all resulting partitions are more similar to the IDLA consensus clustering.
Five experiments were presented above; we performed 162 samples in total. The results are summarized in the next section, and details about each sample are presented in Appendix B.
2.4 Outcomes
The following section is based on the results in Appendix B. We analyzed a total of 162 samples. Table 2.1 presents the number of samples in every Rand Index interval for the HGPA and IDLA algorithms.
| Algorithm | Below 0.7 | 0.7 - 0.74 | 0.75 - 0.79 | 0.8 - 0.84 | 0.85 - 0.89 | 0.9 - 0.94 | 0.95 - 1 |
|---|---|---|---|---|---|---|---|
| HGPA | 36 | 12 | 27 | 32 | 34 | 19 | 2 |
| IDLA | 4 | 2 | 10 | 29 | 33 | 60 | 24 |

Table 2.1: Number of samples for every Rand Index interval.
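Given a flat list of per-sample Rand Index values (as gathered in Appendix B), the binning behind a table like Table 2.1 can be sketched as follows (an illustrative Python sketch, not the project's code):

```python
def bin_counts(values):
    """Count samples per Rand Index interval, using Table 2.1's bins.
    The last bin's upper bound is nudged so that 1.0 is included."""
    bins = [(0.0, 0.70), (0.70, 0.75), (0.75, 0.80),
            (0.80, 0.85), (0.85, 0.90), (0.90, 0.95), (0.95, 1.0001)]
    return [sum(1 for v in values if lo <= v < hi) for lo, hi in bins]
```

Running this over the 162 HGPA and 162 IDLA Rand Index values from Appendix B would reproduce the two rows of Table 2.1 and the histograms below.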
The following two histograms represent the Rand Index distribution over the samples.
Figure 2.22: HGPA number of samples Vs. Rand Index parameter.
Figure 2.23: IDLA number of samples Vs. Rand Index parameter.
Graph conclusions:
- In the HGPA graph, 66% of the Rand Index values are below 0.85 and 29.6% of the samples are below 0.75.
- In the IDLA graph, 27.7% of the Rand Index values are below 0.85 and 3.7% of the samples are below 0.75.
- In the IDLA graph, the distribution of the data is closer to 1 than in the HGPA graph.
- The average Rand Index of the HGPA graph is 0.777, while in the IDLA graph the average is 0.884.
- The IDLA algorithm improves the clustering similarity by 10.7%: 10.7% = (0.884 - 0.777) * 100.
The following graph presents the influence of the number of clusters on the Rand Index parameter.
Figure 2.24: Rand Index value vs. number of clusters.
Graph conclusions:
- When the number of clusters is below 6, HGPA produces a much lower Rand Index value than the IDLA algorithm.
- The HGPA Rand Index improves as we increase the number of clusters.
- The IDLA results are more stable than the HGPA results.
- As the number of clusters increases, the Rand Index of both algorithms converges to 0.91.
Final conclusions:
1. The consensus partition produced by the IDLA algorithm is more similar to the input partitions than that produced by the HGPA algorithm.
2. The IDLA algorithm produces a better consensus regardless of the number of clusters.
3. The improvement of the IDLA method over the HGPA method is 10.7%.
4. The IDLA approach is preferable due to the fact that it takes additional machine-learning steps into account.
References
[1] Strehl, A. and Ghosh, J. (2003). Relationship-based clustering and visualization for
high-dimensional data mining. INFORMS Journal on Computing 15(2):208–230.
[2] Strehl, A.; Ghosh, J.; and Cardie, C. (2002). Cluster ensembles - a knowledge reuse
framework for combining multiple partitions. Journal of Machine Learning Research
3:583–617
[3] Yi Zhang and Tao Li. Consensus Clustering + Meta Clustering = Multiple
Consensus Clustering. School of Computer Science, Florida International University,
Miami
[4] G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar. Multilevel hypergraph partitioning: applications in VLSI domain. In Proceedings of the Design Automation Conference, (1997).
[5] G. Karypis and V. Kumar. Multilevel k-way partitioning scheme for irregular graphs. SIAM Journal on Scientific Computing, 20(1):359–392, (1998).
[6] Sankar K. Pal and Pabitra Mitra. Pattern Recognition Algorithms for Data Mining (pages 1–5, Pattern Recognition in Brief; pages 167–170, Ensemble Classifiers).
[7] David Vernon – Machine Vision, Automated visual inspection and Robot Vision.
(Pages 118-122, Image Analysis).
[8] Nam Nguyen, Rich Caruana, Consensus Clusterings, Department of Computer
Science, Cornell University Ithaca, New York
[9] Shijun Wang and Rong Jin. An information geometry approach for distance metric learning. Dept. of Radiology and Imaging Sciences, National Institutes of Health, and Dept. of Computer Science and Engineering, Michigan State University (respectively).
[10] Martin Hahmann, Dirk Habich, and Wolfgang Lehner, Browsing Robust
Clustering-Alternatives, TU Dresden; Database Technology Group; Dresden, Germany
[11] A. K. Jain, M. N. Murty, and P. J. Flynn. Data clustering: a review. ACM
Comput. Surv., 31(3), (1999).
[12] A. Gionis, H. Mannila, and P. Tsaparas. Clustering aggregation. In Proc. of ICDE,
2005.
[13] J. MacQueen. Some methods for classification and analysis of multivariate observations. University of California, Los Angeles.
[14] Link for testing images, University of Southern California, Signal and Image
Processing Institute:
http://sipi.usc.edu/database/database.php?volume=misc&image=7#top
Appendix A
Distance Function
A distance function is a mathematical formula which defines the distance between elements in the data set.
Euclidian distance
The Euclidean distance is the distance between two points that could be measured with a ruler. For points p = (p1, ..., pn) and q = (q1, ..., qn) it is calculated by the Pythagorean formula:
d(p, q) = sqrt((p1 - q1)^2 + ... + (pn - qn)^2)
Manhattan distance
The distance between two points in a grid based on a strictly horizontal and/or vertical path. The Manhattan distance is the sum of the horizontal and vertical components, d(p, q) = |p1 - q1| + ... + |pn - qn|, whereas the diagonal (Euclidean) distance would be computed by applying the Pythagorean theorem.
Minkowski distance
The Minkowski distance is a metric on Euclidean space which can be considered a generalization of both the Euclidean distance and the Manhattan distance. The Minkowski distance of order p between two points x = (x1, ..., xn) and y = (y1, ..., yn) is defined as:
d(x, y) = (|x1 - y1|^p + ... + |xn - yn|^p)^(1/p)
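The three distance functions can be sketched in a few lines; since Minkowski generalizes the other two, one function suffices (an illustrative Python sketch; the project implements these in C#/Matlab):

```python
def minkowski(p, q, r=2):
    """Minkowski distance of order r; r=1 gives Manhattan, r=2 Euclidean."""
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1.0 / r)

def euclidean(p, q):
    # Straight-line (ruler) distance, via the Pythagorean formula.
    return minkowski(p, q, 2)

def manhattan(p, q):
    # Grid distance: sum of absolute horizontal and vertical components.
    return minkowski(p, q, 1)
```

In the application these distances are applied to pixel color vectors when assigning pixels to clusters.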
Appendix B
Raw Results
Following are the complete results as measured. The graphs in Section 2.4 and the conclusions are based on these results.
| Picture name | Clustering algorithm | Number of clusters | Distance function | Clusters in HGPA alg. | Clusters in IDLA alg. | Rand Index HGPA | Rand Index IDLA |
|---|---|---|---|---|---|---|---|
| Lenna | K-Means | 7 | Minkowski | 6 | 6 | 0.85 | 0.91 |
| Lenna | K-Means | 8 | Euclidian | 6 | 6 | 0.84 | 0.92 |
| Lenna | K-Means | 9 | Manhattan | 6 | 6 | 0.86 | 0.94 |
| Lenna | PAM | 7 | Minkowski | 7 | 7 | 0.81 | 0.89 |
| Lenna | PAM | 8 | Euclidian | 7 | 7 | 0.82 | 0.89 |
| Lenna | PAM | 9 | Manhattan | 7 | 7 | 0.8 | 0.88 |
| Lenna | PAM | 10 | Minkowski | 10 | 10 | 0.88 | 0.95 |
| Lenna | PAM | 11 | Euclidian | 10 | 10 | 0.89 | 0.93 |
| Lenna | PAM | 12 | Manhattan | 10 | 10 | 0.87 | 0.91 |
| Jelly beans | K-Means | 10 | Euclidian | 10 | 10 | 0.81 | 0.94 |
| Jelly beans | K-Means | 11 | Euclidian | 10 | 10 | 0.82 | 0.94 |
| Jelly beans | K-Means | 12 | Euclidian | 10 | 10 | 0.84 | 0.88 |
| Jelly beans | K-Means | 7 | Euclidian | 8 | 8 | 0.71 | 0.81 |
| Jelly beans | K-Means | 8 | Euclidian | 8 | 8 | 0.73 | 0.81 |
| Jelly beans | K-Means | 9 | Euclidian | 8 | 8 | 0.71 | 0.82 |
| Luna | PAM | 5 | Minkowski | 6 | 6 | 0.79 | 0.84 |
| Luna | PAM | 4 | Euclidian | 6 | 6 | 0.83 | 0.88 |
| Luna | K-Means | 6 | Manhattan | 6 | 6 | 0.72 | 0.8 |
| Luna | K-Means | 7 | Euclidian | 8 | 8 | 0.69 | 0.96 |
| Luna | K-Means | 8 | Euclidian | 8 | 8 | 0.68 | 0.92 |
| Luna | K-Means | 9 | Euclidian | 8 | 8 | 0.73 | 0.84 |
| Luna | K-Means | 9 | Euclidian | 9 | 9 | 0.78 | 0.95 |
| Luna | K-Means | 9 | Euclidian | 9 | 9 | 0.78 | 0.87 |
| Luna | K-Means | 9 | Euclidian | 9 | 9 | 0.77 | 0.84 |
| Mountains | K-Means | 10 | Euclidian | 9 | 9 | 0.76 | 0.84 |
| Mountains | K-Means | 11 | Euclidian | 9 | 9 | 0.8 | 0.88 |
| Mountains | K-Means | 12 | Euclidian | 9 | 9 | 0.8 | 0.88 |
| Mountains | K-Means | 10 | Euclidian | 11 | 11 | 0.83 | 0.91 |
| Mountains | K-Means | 11 | Euclidian | 11 | 11 | 0.88 | 0.94 |
| Mountains | K-Means | 12 | Euclidian | 11 | 11 | 0.89 | 0.94 |
| Mountains | PAM | 4 | Euclidian | 4 | 4 | 0.62 | 0.96 |
| Mountains | PAM | 4 | Euclidian | 4 | 4 | 0.62 | 0.96 |
| Mountains | PAM | 4 | Euclidian | 4 | 4 | 0.61 | 0.94 |
| Shoes | PAM | 4 | Euclidian | 4 | 4 | 0.61 | 0.83 |
| Shoes | PAM | 5 | Euclidian | 4 | 4 | 0.64 | 0.86 |
| Shoes | PAM | 6 | Euclidian | 4 | 4 | 0.66 | 0.86 |
| Shoes | PAM | 4 | Euclidian | 6 | 6 | 0.85 | 0.84 |
| Shoes | PAM | 5 | Euclidian | 6 | 6 | 0.9 | 0.83 |
| Shoes | PAM | 6 | Euclidian | 6 | 6 | 0.87 | 0.89 |
| Fish | PAM | 5 | Minkowski | 6 | 6 | 0.85 | 0.82 |
| Fish | PAM | 4 | Euclidian | 6 | 6 | 0.83 | 0.83 |
| Fish | K-Means | 6 | Manhattan | 6 | 6 | 0.86 | 0.86 |
| Fish | PAM | 7 | Minkowski | 7 | 7 | 0.83 | 0.8 |
| Fish | PAM | 8 | Euclidian | 7 | 7 | 0.86 | 0.84 |
| Fish | K-Means | 9 | Manhattan | 7 | 7 | 0.87 | 0.87 |
| Fish | PAM | 10 | Minkowski | 10 | 10 | 0.84 | 0.9 |
| Fish | PAM | 11 | Euclidian | 10 | 10 | 0.85 | 0.94 |
| Fish | K-Means | 12 | Manhattan | 10 | 10 | 0.84 | 0.93 |
| Picture name | Clustering algorithm | Number of clusters | Distance function | Clusters in HGPA alg. | Clusters in IDLA alg. | Rand Index HGPA | Rand Index IDLA |
|---|---|---|---|---|---|---|---|
| Fish | PAM | 13 | Minkowski | 13 | 13 | 0.9 | 0.93 |
| Fish | PAM | 14 | Euclidian | 13 | 13 | 0.93 | 0.92 |
| Fish | K-Means | 15 | Manhattan | 13 | 13 | 0.93 | 0.93 |
| Fish | K-Means | 13 | Minkowski | 13 | 13 | 0.93 | 0.97 |
| Fish | K-Means | 14 | Euclidian | 13 | 13 | 0.95 | 0.98 |
| Fish | K-Means | 15 | Manhattan | 13 | 13 | 0.93 | 0.95 |
| Plane | K-Means | 4 | Minkowski | 4 | 4 | 0.58 | 0.98 |
| Plane | K-Means | 5 | Euclidian | 4 | 4 | 0.6 | 0.82 |
| Plane | K-Means | 6 | Manhattan | 4 | 4 | 0.64 | 0.78 |
| Plane | K-Means | 7 | Minkowski | 7 | 7 | 0.87 | 0.99 |
| Plane | K-Means | 8 | Euclidian | 7 | 7 | 0.9 | 0.93 |
| Plane | K-Means | 9 | Manhattan | 7 | 7 | 0.91 | 0.94 |
| Plane | K-Means | 10 | Minkowski | 10 | 10 | 0.86 | 0.99 |
| Plane | K-Means | 11 | Euclidian | 10 | 10 | 0.88 | 0.93 |
| Plane | K-Means | 12 | Manhattan | 10 | 10 | 0.88 | 0.94 |
| Drop | K-Means | 5 | Euclidian | 6 | 6 | 0.85 | 0.96 |
| Drop | K-Means | 4 | Euclidian | 6 | 6 | 0.83 | 0.93 |
| Drop | K-Means | 6 | Euclidian | 6 | 6 | 0.85 | 0.98 |
| Drop | K-Means | 7 | Euclidian | 9 | 9 | 0.79 | 0.89 |
| Drop | K-Means | 8 | Euclidian | 9 | 9 | 0.84 | 0.94 |
| Drop | K-Means | 9 | Euclidian | 9 | 9 | 0.85 | 0.97 |
| Drop | K-Means | 10 | Euclidian | 12 | 12 | 0.89 | 0.93 |
| Drop | K-Means | 11 | Euclidian | 12 | 12 | 0.91 | 0.93 |
| Drop | K-Means | 12 | Euclidian | 12 | 12 | 0.89 | 0.95 |
| Camels | PAM | 4 | Minkowski | 6 | 6 | 0.8 | 0.86 |
| Camels | PAM | 5 | Minkowski | 6 | 6 | 0.78 | 0.82 |
| Camels | PAM | 6 | Minkowski | 6 | 6 | 0.81 | 0.9 |
| Camels | PAM | 7 | Manhattan | 9 | 9 | 0.74 | 0.88 |
| Camels | PAM | 8 | Manhattan | 9 | 9 | 0.77 | 0.94 |
| Camels | PAM | 9 | Manhattan | 9 | 9 | 0.78 | 0.93 |
| Camels | PAM | 10 | Minkowski | 12 | 12 | 0.9 | 0.9 |
| Camels | PAM | 11 | Minkowski | 12 | 12 | 0.96 | 0.93 |
| Camels | PAM | 12 | Minkowski | 12 | 12 | 0.92 | 0.92 |
| Diva | K-Means | 4 | Euclidian | 6 | 6 | 0.83 | 0.89 |
| Diva | K-Means | 5 | Euclidian | 6 | 6 | 0.71 | 0.82 |
| Diva | K-Means | 6 | Euclidian | 6 | 6 | 0.84 | 0.88 |
| Diva | K-Means | 7 | Euclidian | 8 | 8 | 0.73 | 0.89 |
| Diva | K-Means | 8 | Euclidian | 8 | 8 | 0.76 | 0.95 |
| Diva | K-Means | 9 | Euclidian | 8 | 8 | 0.77 | 0.93 |
| Plane2 | K-Means | 2 | Euclidian | 3 | 3 | 0.68 | 0.68 |
| Plane2 | K-Means | 3 | Euclidian | 3 | 3 | 0.69 | 0.68 |
| Plane2 | K-Means | 4 | Euclidian | 3 | 3 | 0.66 | 0.66 |
| Plane2 | K-Means | 5 | Euclidian | 5 | 5 | 0.59 | 0.77 |
| Plane2 | K-Means | 6 | Euclidian | 5 | 5 | 0.68 | 0.77 |
| Plane2 | K-Means | 7 | Euclidian | 5 | 5 | 0.71 | 0.76 |
| Plane2 | K-Means | 5 | Euclidian | 7 | 7 | 0.62 | 0.73 |
| Plane2 | K-Means | 6 | Euclidian | 7 | 7 | 0.73 | 0.8 |
| Plane2 | K-Means | 7 | Euclidian | 7 | 7 | 0.76 | 0.84 |
| Plane2 | PAM | 8 | Minkowski | 8 | 8 | 0.7 | 0.8 |
| Plane2 | PAM | 9 | Euclidian | 8 | 8 | 0.76 | 0.8 |
| Plane2 | PAM | 10 | Manhattan | 8 | 8 | 0.76 | 0.84 |
| Picture name | Clustering algorithm | Number of clusters | Distance function | Clusters in HGPA alg. | Clusters in IDLA alg. | Rand Index HGPA | Rand Index IDLA |
|---|---|---|---|---|---|---|---|
| ToyHouse | K-Means | 2 | Euclidian | 4 | 4 | 0.44 | 0.8 |
| ToyHouse | K-Means | 3 | Euclidian | 4 | 4 | 0.51 | 0.94 |
| ToyHouse | K-Means | 4 | Euclidian | 4 | 4 | 0.54 | 0.99 |
| ToyHouse | PAM | 5 | Euclidian | 7 | 7 | 0.76 | 0.94 |
| ToyHouse | PAM | 6 | Euclidian | 7 | 7 | 0.89 | 0.79 |
| ToyHouse | PAM | 7 | Euclidian | 7 | 7 | 0.92 | 0.82 |
| ToyHouse | PAM | 8 | Euclidian | 9 | 9 | 0.75 | 0.88 |
| ToyHouse | PAM | 9 | Euclidian | 9 | 9 | 0.76 | 0.86 |
| ToyHouse | PAM | 10 | Euclidian | 9 | 9 | 0.78 | 0.87 |
| ToyHouse | K-Means | 11 | Euclidian | 13 | 13 | 0.89 | 0.89 |
| ToyHouse | K-Means | 12 | Euclidian | 13 | 13 | 0.88 | 0.93 |
| ToyHouse | K-Means | 13 | Euclidian | 13 | 13 | 0.89 | 0.88 |
| ToyHouse2 | PAM | 2 | Euclidian | 4 | 4 | 0.5 | 0.74 |
| ToyHouse2 | PAM | 3 | Euclidian | 4 | 4 | 0.56 | 0.79 |
| ToyHouse2 | PAM | 4 | Euclidian | 4 | 4 | 0.59 | 0.79 |
| ToyHouse2 | K-Means | 5 | Euclidian | 7 | 7 | 0.91 | 0.85 |
| ToyHouse2 | K-Means | 6 | Euclidian | 7 | 7 | 0.92 | 0.83 |
| ToyHouse2 | K-Means | 7 | Euclidian | 7 | 7 | 0.9 | 0.89 |
| ToyHouse2 | PAM | 8 | Euclidian | 10 | 10 | 0.84 | 0.89 |
| ToyHouse2 | PAM | 9 | Euclidian | 10 | 10 | 0.85 | 0.91 |
| ToyHouse2 | PAM | 10 | Euclidian | 10 | 10 | 0.86 | 0.9 |
| ToyHouse2 | K-Means | 11 | Euclidian | 11 | 11 | 0.81 | 0.95 |
| ToyHouse2 | K-Means | 12 | Euclidian | 11 | 11 | 0.82 | 0.92 |
| ToyHouse2 | K-Means | 13 | Euclidian | 11 | 11 | 0.82 | 0.95 |
| Tree | K-Means | 2 | Minkowski | 2 | 2 | 0.5 | 0.97 |
| Tree | K-Means | 3 | Euclidian | 2 | 2 | 0.5 | 0.81 |
| Tree | K-Means | 4 | Manhattan | 2 | 2 | 0.5 | 0.76 |
| Tree | PAM | 5 | Minkowski | 5 | 5 | 0.66 | 0.89 |
| Tree | PAM | 6 | Euclidian | 5 | 5 | 0.66 | 0.9 |
| Tree | PAM | 7 | Manhattan | 5 | 5 | 0.67 | 0.93 |
| Tree | K-Means | 8 | Minkowski | 8 | 8 | 0.76 | 0.99 |
| Tree | K-Means | 9 | Euclidian | 8 | 8 | 0.77 | 0.92 |
| Tree | K-Means | 10 | Manhattan | 8 | 8 | 0.78 | 0.93 |
| Tree | PAM | 11 | Minkowski | 13 | 13 | 0.9 | 0.91 |
| Tree | PAM | 12 | Euclidian | 13 | 13 | 0.91 | 0.89 |
| Tree | PAM | 13 | Manhattan | 13 | 13 | 0.91 | 0.91 |
| USA | PAM | 2 | Minkowski | 4 | 4 | 0.43 | 0.62 |
| USA | PAM | 3 | Euclidian | 4 | 4 | 0.55 | 0.77 |
| USA | PAM | 4 | Manhattan | 4 | 4 | 0.6 | 0.89 |
| USA | K-Means | 5 | Minkowski | 7 | 7 | 0.83 | 0.91 |
| USA | K-Means | 6 | Euclidian | 7 | 7 | 0.85 | 0.9 |
| USA | K-Means | 7 | Manhattan | 7 | 7 | 0.83 | 0.92 |
| USA | PAM | 8 | Minkowski | 10 | 10 | 0.77 | 0.93 |
| USA | PAM | 9 | Euclidian | 10 | 10 | 0.79 | 0.9 |
| USA | PAM | 10 | Manhattan | 10 | 10 | 0.8 | 0.92 |
| USA | K-Means | 11 | Minkowski | 12 | 12 | 0.89 | 0.93 |
| USA | K-Means | 12 | Euclidian | 12 | 12 | 0.9 | 0.98 |
| USA | K-Means | 13 | Manhattan | 12 | 12 | 0.89 | 0.93 |
| Jelly beans 2 | K-Means | 2 | Minkowski | 3 | 3 | 0.48 | 0.78 |
| Jelly beans 2 | K-Means | 3 | Euclidian | 3 | 3 | 0.52 | 0.96 |
| Jelly beans 2 | K-Means | 4 | Manhattan | 3 | 3 | 0.53 | 0.91 |
| Jelly beans 2 | PAM | 5 | Minkowski | 6 | 6 | 0.68 | 0.97 |
| Jelly beans 2 | PAM | 6 | Euclidian | 6 | 6 | 0.74 | 0.84 |
| Jelly beans 2 | PAM | 7 | Manhattan | 6 | 6 | 0.77 | 0.8 |
| Picture name | Clustering algorithm | Number of clusters | Distance function | Clusters in HGPA alg. | Clusters in IDLA alg. | Rand Index HGPA | Rand Index IDLA |
|---|---|---|---|---|---|---|---|
| Jelly beans 2 | K-Means | 8 | Minkowski | 10 | 10 | 0.78 | 0.85 |
| Jelly beans 2 | K-Means | 9 | Euclidian | 10 | 10 | 0.78 | 0.85 |
| Jelly beans 2 | K-Means | 10 | Manhattan | 10 | 10 | 0.81 | 0.85 |
| House | PAM | 8 | Minkowski | 10 | 10 | 0.79 | 0.9 |
| House | PAM | 9 | Euclidian | 10 | 10 | 0.81 | 0.9 |
| House | PAM | 10 | Manhattan | 10 | 10 | 0.81 | 0.9 |
| House | K-Means | 5 | Minkowski | 7 | 7 | 0.83 | 0.92 |
| House | K-Means | 6 | Euclidian | 7 | 7 | 0.85 | 0.9 |
| House | K-Means | 7 | Manhattan | 7 | 7 | 0.85 | 0.91 |
Average Rand Index
The following tables represent the average Rand Index of the HGPA and IDLA algorithms, grouped by number of clusters. The table results were used in Figure 2.24.