Image Segmentation via Clustering Consensus Methods

Jacob Tamir (tamiob@gmail.com)
Kucherov Tamir (tamirkucherov@gmail.com)

Supervisors: Prof. Volkovich Zeev, Prof. Barzily Zeev

MSc in Software Engineering, Ort Braude College
This report is submitted in partial fulfillment of the requirements for the degree of MSc in Software Engineering, Second Stage.
7.9.2012

Abstract

This project is devoted to the segmentation of images using consensus-clustering methods. We apply clustering aggregation methods in order to obtain relevant answers to predefined questions. As is well known, no single algorithm can perfectly reflect the structure of an image. In our project we therefore cluster the images by means of several clustering algorithms and summarize the results via two clustering consensus techniques: HGPA (HyperGraph Partitioning Algorithm) and IDLA (Information Distance Learning Approach). The first is a well-known approach used in many practical systems. The second is a new approach developed at ORT Braude College, which differs from HGPA by taking additional machine-learning steps into account. The outcomes obtained by the two algorithms were summarized and compared. We came to the conclusion that the IDLA technique is preferable, with an improvement of 10.7% over the HGPA method.

Table of Contents

1 Introduction
2 Research Approach
  2.1 Theory
    2.1.1 Cluster Algorithms
    2.1.2 Consensus Clustering Algorithms
    2.1.3 Rand Index
  2.2 Implementation
    2.2.1 Project general flow
    2.2.2 Data used
    2.2.3 Design
    2.2.4 User Interface
    2.2.5 Use case diagram
    2.2.6 Class Diagram
    2.2.7 Constraints and Limitations
    2.2.8 Programming environment
  2.3 Testing and Validation
    2.3.1 Experiment #1
    2.3.2 Experiment #2
    2.3.3 Experiment #3
    2.3.4 Experiment #4
    2.3.5 Experiment #5
  2.4 Outcomes
References
Appendix A
  Distance Functions
    Euclidean distance
    Manhattan distance
    Minkowski distance
Appendix B
  Raw Results
  Average Rand Index

1 Introduction

Image segmentation is the process of partitioning an image into segments, where each segment represents a group of nearby pixels. The segmentation is based on similarities between pixel properties, such as pattern, color, and brightness. The purpose of the segmentation process is to simplify image understanding by emphasizing the objects of interest. Image-recognition and machine-vision algorithms use the result of the segmentation process in order to recognize desired objects in the image, such as borders, lines, or predefined shapes. Image segmentation is used in practice in face recognition, medical imaging, fingerprint recognition, and other fields.
There are several well-known algorithms and techniques for image segmentation, and the appropriate technique depends on the nature of the problem. The technique we focus on combines clustering with a consensus aggregation method.

The main task of clustering is to partition a dataset of objects into groups (clusters), so that data objects in the same group share common properties and features and are more similar to each other than to objects in other groups [11]. The task of clustering can be accomplished with a wide range of clustering algorithms, but it is known that no single algorithm can perfectly reflect the image structure. In order to achieve a more accurate result, clustering should be an iterative process that summarizes several different clustering approaches: at each step a different clustering algorithm is applied with a different initial partition and different input parameters, the output partitions are explored, and the best one is selected. In our work we investigate approaches (consensus clustering) that take the output partitions and create a single summarizing consensus partition.

Consensus clustering, also called clustering aggregation or cluster ensembles, refers to the problem of finding a single (consensus) partition from a number of different partitions. The input partitions are combined to output a single consensus partition that is "better" than the existing partitions [12]. Most consensus clustering methods are based on the clustering results only, so special features of the data can be lost. A solution to this problem is to use consensus clustering with a distance-learning procedure. In our work we investigated two consensus clustering approaches: HGPA (HyperGraph Partitioning Algorithm) [2] and IDLA (Information Distance Learning Approach). A comparison between the two methods is provided.
As part of our work, we implemented the newly proposed aggregation method (IDLA). The new method extends the distance-learning method of Wang and Jin [9] by using it for cluster aggregation. In order to test the IDLA method and visualize the results, a user interface (UI) was developed. The user can choose the clustering algorithms, define their parameters (number of clusters and distance function), visualize the intermediate results, and use these results as input for the consensus methods (IDLA and HGPA). The outcome of each consensus method is an aggregated partition, which is then used to draw the final image. Based on the produced images, we compare the two algorithms and draw conclusions. The comparison is made with the Rand index; details are given in Section 2.

Our application was tested with various input images, and a total of 162 samples were gathered. Based on these samples and the analysis of the results, we came to the conclusion that the IDLA method produces a better consensus than the HGPA method, with an improvement of 10.7% over the HGPA method.

The rest of the paper is organized as follows: Section 2 presents the research that was carried out, including the project theory, general flow, and implementation; experiments and outcomes are also presented there. The last two sections provide the references used and extensions of the algorithms.

2 Research Approach

2.1 Theory

In our project we use two iterative clustering algorithms (PAM and K-Means) for grouping the data, and two consensus algorithms (IDLA and HGPA).

2.1.1 Cluster Algorithms

Clustering is the task of assigning a set of objects into groups (called clusters) such that objects in the same cluster are more similar to each other than to those in other clusters. In our project we use two iterative clustering methods (PAM and K-Means).
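To make the iterative nature of these methods concrete, the following is a minimal K-Means sketch in Python. It is illustrative only: the project's implementations are in Matlab and C#, and PAM follows the same assign/update loop but keeps the cluster centers at actual data points (the medoids).

```python
import random

def k_means(points, k, iterations=20):
    """Minimal K-Means: alternate between assigning points to the
    nearest center and moving each center to its cluster mean."""
    centers = random.sample(points, k)
    for _ in range(iterations):
        # Assignment step: label each point with its nearest center.
        labels = [min(range(k),
                      key=lambda j: sum((p - c) ** 2
                                        for p, c in zip(pt, centers[j])))
                  for pt in points]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [pt for pt, l in zip(points, labels) if l == j]
            if members:  # guard against an emptied cluster
                centers[j] = tuple(sum(dim) / len(members)
                                   for dim in zip(*members))
    return labels

random.seed(0)
# Pixels as (R, G, B) tuples: two well-separated color groups.
pixels = [(250, 10, 10), (245, 5, 8), (10, 240, 12), (8, 250, 9)]
labels = k_means(pixels, k=2)
```

Running this on the four sample pixels separates the two reddish pixels from the two greenish ones, regardless of which points are drawn as the initial centers.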
An illustration of the clustering process is presented below:

Figure 2.1: Illustration of the clustering process (a dataset is partitioned into clusters).

2.1.2 Consensus Clustering Algorithms

Consensus clustering aims to find a single, final clustering which is better in some sense than the existing clusterings. In this approach, all input clustering solutions, or a selected subset of them, are combined to output a single, better, consensus clustering. The result is a better clustering that emphasizes the desired objectives: clusters are better separated and objective functions are improved [12].

Figure 2.2: Consensus clustering illustration (many partitions of a dataset are combined into a single partition).

We use two algorithms for consensus clustering: HGPA and IDLA.

HGPA (HyperGraph Partitioning Algorithm)

HGPA is an approach for aggregating an ensemble of clusterings into one clustering. The cluster ensemble problem is formulated as partitioning a hypergraph by cutting a minimal number of hyperedges, where all vertices and hyperedges are equally weighted. The aim is to find a hyperedge separator that partitions the hypergraph into k unconnected components (clusters) of approximately the same size [2].

IDLA (Information Distance Learning Approach)

IDLA is a proposed solution to a problem that exists in consensus clustering methods: the loss of special data features. Most aggregation methods are based on the clustering results only; for example, if x and y belong to the same cluster in the majority of partitions, they will belong to the same cluster in the consensus partition. The IDLA method extends the distance-learning method of Wang and Jin [9] by using it for cluster aggregation.

In this paper we do not present the details of each algorithm; such information can be found in the first-stage project paper.
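To illustrate the general idea of combining several partitions into one, the following Python sketch uses a simple co-association count (the fraction of input partitions in which two objects share a cluster) and merges strongly co-associated objects. This toy approach is neither HGPA nor IDLA, only an illustration of the consensus-clustering idea.

```python
from itertools import combinations

def co_association(partitions, n):
    """Fraction of input partitions in which each pair of the n
    objects shares a cluster (the co-association matrix)."""
    votes = {pair: 0 for pair in combinations(range(n), 2)}
    for labels in partitions:
        for i, j in votes:
            if labels[i] == labels[j]:
                votes[(i, j)] += 1
    return {pair: v / len(partitions) for pair, v in votes.items()}

def consensus(partitions, n, threshold=0.5):
    """Merge objects whose co-association exceeds the threshold
    (single-link union-find over the co-association matrix)."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x
    for (i, j), v in co_association(partitions, n).items():
        if v > threshold:
            parent[find(j)] = find(i)
    return [find(i) for i in range(n)]

# Three input partitions of five objects; objects 0-2 mostly together.
P = [[0, 0, 0, 1, 1],
     [0, 0, 1, 1, 1],
     [0, 0, 0, 1, 1]]
labels = consensus(P, n=5)
```

On this input the consensus puts objects 0, 1, and 2 in one cluster and objects 3 and 4 in another, even though the second input partition disagrees about object 2.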
2.1.3 Rand Index

The similarity between two data clusterings is measured with the Rand index. Given a set of n elements S = {q_1, ..., q_n} and two partitions of S to compare, X = {x_1, ..., x_r}, a partition of S into r subsets, and Y = {y_1, ..., y_s}, a partition of S into s subsets, we define the following parameters:

a: the number of pairs of elements in S that are in the same set in X and in the same set in Y;
b: the number of pairs of elements in S that are in different sets in X and in different sets in Y;
c: the number of pairs of elements in S that are in the same set in X and in different sets in Y;
d: the number of pairs of elements in S that are in different sets in X and in the same set in Y.

The Rand index is then calculated with the formula:

R = (a + b) / (a + b + c + d)

The Rand index takes values between 0 and 1: the value 0 indicates that the two partitions are completely different, and the value 1 indicates that the two partitions are exactly the same.

2.2 Implementation

This section describes the project's general flow, implementation, data used, and the design of our system.

2.2.1 Project general flow

Figure 2.3: Project general flow. The input picture is clustered by K-Means and PAM into several partitions each; the clustering aggregation methods (HGPA and IDLA) then combine the partitions into one consensus partition.

The system receives an image as input. The image is processed by two clustering algorithms (PAM and K-Means). The clusters of each algorithm are created and presented in the user interface. Each group of clusters is considered a partition and serves as input for the clustering aggregation methods (HGPA and IDLA). The result of each clustering aggregation algorithm is presented as an image.

2.2.2 Data used

Our application receives images as input. In every run, one image is processed.
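The Rand index defined in Section 2.1.3 can be computed directly from its pair-counting definition. The following Python sketch is illustrative, not the project's implementation:

```python
from itertools import combinations

def rand_index(X, Y):
    """Rand index between two partitions given as label lists:
    R = (a + b) / (a + b + c + d) over all pairs of elements."""
    a = b = c = d = 0
    for i, j in combinations(range(len(X)), 2):
        same_x = X[i] == X[j]
        same_y = Y[i] == Y[j]
        if same_x and same_y:
            a += 1        # together in both partitions
        elif not same_x and not same_y:
            b += 1        # apart in both partitions
        elif same_x:
            c += 1        # together in X, apart in Y
        else:
            d += 1        # apart in X, together in Y
    return (a + b) / (a + b + c + d)

# Identical partitions give 1.0; partial agreement gives less.
print(rand_index([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0
print(rand_index([0, 0, 1, 1], [0, 0, 0, 1]))  # 0.5
```

Here a = 1, c = 1, b = 2, d = 2 for the second pair of partitions, giving R = 3/6 = 0.5.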
The images used in our application include images we created for testing, common images used in image-processing benchmarks, and real-world images. During development we tested the program flow with small images containing up to 10 colors and a small number of pixels; in this way we verified that the internal logic is correct. Examples of such pictures are presented below.

Figure 2.4: Images used for testing, containing up to 5 colors.

Once the logic and results were verified, we tested our system on images commonly used in the image-processing field [14]. Examples of such pictures:

Figure 2.5: Common images used in image-processing benchmarks.

The algorithms were also applied to real-world images:

Figure 2.6: Real-world images used for checks.

Our project was tested with a variety of images. The outcomes of the experiments are presented in the Testing and Validation section.

2.2.3 Design

The following system layers were used:

1. User interface: built with the WPF technology, a development framework intended for building and designing rich user interfaces.
2. Business logic: contains the application classes; this is the logic part of the system. External libraries (such as Matlab DLL files and C# matrix-calculation libraries) are also used here.
3. Data access: one of the system's aims is to load images from the local file system; this module contains the code for such operations.
4. File system: used for storing the input images. Temporary files for internal computations are also managed on the file system, in a dedicated folder.

The system module architecture is presented in Figure 2.7.

Figure 2.7: General system module architecture (User Interface uses Business Logic, which uses Data Access, which uses the File System).

2.2.4 User Interface

The user interface was developed with the WPF technology and contains two main screens. The first one is the program's main page.
The user has the following options:

1. Choose the image to process.
2. Add and remove clustering algorithms. For each algorithm the user can choose the number of clusters and the distance function (Euclidean / Minkowski / Manhattan). Information about each distance function can be found in Appendix A.
3. For the HGPA and IDLA algorithms, choose the number of clusters.

Figure 2.8: Main user interface screenshot.

Once the run is launched, results for the PAM and K-Means clustering algorithms are shown. In the following examples the following clustering configurations were chosen:

1. PAM algorithm with 6 clusters and the Minkowski distance function.
2. K-Means algorithm with 4 clusters and the Euclidean distance function.
3. PAM algorithm with 6 clusters and the Manhattan distance function.
4. K-Means algorithm with 5 clusters and the Minkowski distance function.
5. PAM algorithm with 7 clusters and the Euclidean distance function.

Intermediate clustering results are shown in the UI (Figure 2.9).

Figure 2.9: Visualization of intermediate clustering results.

For every clustering algorithm, the results are shown in the UI. Each clustering result (which represents a partition) is compared to the HGPA and IDLA consensus partitions. The similarity parameter is computed with the Rand index, which takes values between 0 and 1; the better result is the one with the higher Rand index value. More information about the Rand index is presented in the Theory section, 2.1.3.

2.2.5 Use case diagram

Figure 2.10: Use-case diagram of the general system operation.

The use case above defines the interactions between the user and the system. The user loads the image he is interested in processing, and the initial image is visualized in the UI. The user can then add or remove clustering methods (PAM and K-Means).
For each clustering method, there is an option to define the number of clusters and to choose the distance function to apply (Euclidean / Manhattan / Minkowski; more information about the distance functions can be found in Appendix A). Images are visualized to the user for each clustering method. Groups of result clusters are created, and the aggregation methods (IDLA and HGPA) are applied. The result of each aggregation method is displayed in the UI for visualization.

2.2.6 Class Diagram

The following is the class diagram used in our project:

Figure 2.11: System class diagram.

2.2.7 Constraints and Limitations

The system design considers the following guidelines:

1. Parts of the segmentation process are implemented using existing libraries (Matlab and hMetis).
2. The application supports future extensions: additional algorithms, such as new clustering algorithms or new consensus methods, can be integrated easily.
3. Images for segmentation are loaded from the local file system.
4. We refrained from using large input images (more than about 60x60 pixels). The reason is the maximum allowed matrix allocation size: on 32-bit Windows operating systems we cannot define a matrix exceeding 100 million cells. The solution to this problem is to run the application on a 64-bit Windows operating system, where a larger number of elements can be allocated.

2.2.8 Programming environment

The project was developed in the Microsoft Visual Studio 2010 SP2.0 development environment. The programming language is C#, and the user interface is built using the WPF (Windows Presentation Foundation) technology. The PAM algorithm and part of IDLA were implemented in the Matlab environment; with the Matlab deployment tool we created a *.dll file and used it as an external reference in our project.
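The three distance functions offered to the user (described in Appendix A) can be sketched in Python as follows. This is an illustrative sketch, not the project's C#/Matlab code; note that Euclidean and Manhattan are just the Minkowski distance with p = 2 and p = 1.

```python
def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

def euclidean(x, y):
    """Straight-line (ruler) distance: Minkowski with p = 2."""
    return minkowski(x, y, 2)

def manhattan(x, y):
    """Grid distance (sum of per-coordinate differences): p = 1."""
    return minkowski(x, y, 1)

# Distances between two RGB pixels.
p1, p2 = (255, 0, 0), (250, 12, 0)
print(manhattan(p1, p2))   # 17.0
print(euclidean(p1, p2))   # 13.0
```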
2.3 Testing and Validation

This section demonstrates our test results.

2.3.1 Experiment #1

In the first experiment we use the House image. Two runs with the PAM clustering method and one with the K-Means method were used; for each one we set a different number of clusters and a different distance function. Images for the HGPA and IDLA consensus clusterings are produced, both with seven clusters.

Figure 2.12: Experiment 1; HGPA and IDLA results in the right Results pane.

Intermediate results for each clustering method are produced. The Rand index of each result compared to HGPA and to IDLA is calculated and shown in the UI as well. All three examples produced a higher Rand index for the IDLA algorithm; a higher Rand index indicates that the partition of the image is more similar to the HGPA/IDLA consensus.

Figure 2.13: Experiment 1 – Intermediate results.

2.3.2 Experiment #2

Figure 2.14: Experiment 2.

Figure 2.15: Experiment 2 – Intermediate results.

In this experiment, we also obtain a higher Rand index for the IDLA algorithm.

2.3.3 Experiment #3

Figure 2.16: Experiment 3.

Figure 2.17: Experiment 3 – Intermediate results.

In this experiment, all result partitions are more similar to the IDLA consensus clustering.

2.3.4 Experiment #4

Figure 2.18: Experiment 4.

Figure 2.19: Experiment 4 – Intermediate results.

In this test, two out of three images are more similar to the IDLA consensus clustering (the left one and the right one); the image in the middle is slightly more similar to the HGPA consensus clustering.

2.3.5 Experiment #5

Figure 2.20: Experiment 5.

Figure 2.21: Experiment 5 – Intermediate results.

In this test, all result partitions are more similar to the IDLA consensus clustering.

Five experiments were presented above.
We performed 162 samples in total. The results are summarized in the next section; details about each sample are presented in Appendix B.

2.4 Outcomes

The following section is based on the results in Appendix B. We analyzed a total of 162 samples. Table 2.1 presents the number of samples in every Rand index interval for the HGPA and IDLA algorithms.

Rand index value | Below 0.7 | 0.7-0.74 | 0.75-0.79 | 0.8-0.84 | 0.85-0.89 | 0.9-0.94 | 0.95-1
HGPA             | 36        | 12       | 27        | 32       | 34        | 19       | 2
IDLA             | 4         | 2        | 10        | 29       | 33        | 60       | 24

Table 2.1: Number of samples in every Rand index interval.

The following two histograms represent the Rand index distribution over the samples.

Figure 2.22: HGPA, number of samples vs. Rand index parameter.

Figure 2.23: IDLA, number of samples vs. Rand index parameter.

Graph conclusions:

1. In the HGPA graph, 66% of the Rand index values are below 0.85 and 29.6% of the samples are below 0.75.
2. In the IDLA graph, 27.7% of the Rand index values are below 0.85 and only 3.7% of the samples are below 0.75.
3. In the IDLA graph, the distribution of the data is closer to 1 than in the HGPA graph.
4. The average Rand index of the HGPA graph is 0.777, while in the IDLA graph the average is 0.884.
5. The IDLA algorithm improves the clustering similarity by 10.7%: (0.884 - 0.777) * 100 = 10.7.

The following graph presents the influence of the number of clusters on the Rand index parameter.

Figure 2.24: Rand index value vs. number of clusters.

Graph conclusions:

1. When the number of clusters is below 6, HGPA produces a much lower Rand index value than the IDLA algorithm.
2. The HGPA Rand index parameter improves as the number of clusters increases.
3. The IDLA results are more stable than the HGPA results.
4. As the number of clusters increases, the Rand index parameters of both algorithms converge to 0.91.

Final conclusions:

1. The consensus partition produced by the IDLA algorithm is more similar to the input partitions than the one produced by the HGPA algorithm.
2.
The IDLA algorithm produces a better consensus regardless of the number of clusters.
3. The improvement of the IDLA method over the HGPA method is 10.7%.
4. The IDLA approach is preferable due to the fact that it takes additional machine-learning steps into account.

References

[1] Strehl, A. and Ghosh, J. (2003). Relationship-based clustering and visualization for high-dimensional data mining. INFORMS Journal on Computing 15(2):208–230.
[2] Strehl, A.; Ghosh, J.; and Cardie, C. (2002). Cluster ensembles - a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research 3:583–617.
[3] Yi Zhang and Tao Li. Consensus Clustering + Meta Clustering = Multiple Consensus Clustering. School of Computer Science, Florida International University, Miami.
[4] G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar. Multilevel hypergraph partitioning: applications in VLSI domain. In Proceedings of the Design and Automation Conference, 1997.
[5] G. Karypis and V. Kumar. Multilevel k-way partitioning scheme for irregular graphs. SIAM Journal on Scientific Computing, 20(1):359–392, 1998.
[6] Sankar K. Pal and Pabitra Mitra. Pattern Recognition Algorithms for Data Mining (pages 1-5, Pattern Recognition in Brief; pages 167-170, Ensemble Classifiers).
[7] David Vernon. Machine Vision: Automated Visual Inspection and Robot Vision (pages 118-122, Image Analysis).
[8] Nam Nguyen and Rich Caruana. Consensus Clusterings. Department of Computer Science, Cornell University, Ithaca, New York.
[9] Shijun Wang and Rong Jin. An Information Geometry Approach for Distance Metric Learning. Dept. of Radiology and Imaging Sciences, National Institutes of Health, and Dept. of Computer Science and Engineering, Michigan State University (respectively).
[10] Martin Hahmann, Dirk Habich, and Wolfgang Lehner. Browsing Robust Clustering-Alternatives. TU Dresden, Database Technology Group, Dresden, Germany.
[11] A. K. Jain, M. N. Murty, and P. J. Flynn.
Data clustering: a review. ACM Comput. Surv., 31(3), 1999.
[12] A. Gionis, H. Mannila, and P. Tsaparas. Clustering aggregation. In Proc. of ICDE, 2005.
[13] J. MacQueen. Some methods for classification and analysis of multivariate observations. University of California, Los Angeles.
[14] Testing images, University of Southern California, Signal and Image Processing Institute: http://sipi.usc.edu/database/database.php?volume=misc&image=7#top

Appendix A

Distance Functions

A distance function is a mathematical formula that defines the distance between elements of a data set. In the formulas below, x = (x_1, ..., x_n) and y = (y_1, ..., y_n) are two points.

Euclidean distance

The Euclidean distance is the distance between two points that could be measured with a ruler. It is calculated by the Pythagorean formula:

d(x, y) = sqrt( (x_1 - y_1)^2 + ... + (x_n - y_n)^2 )

Manhattan distance

The Manhattan distance is the distance between two points in a grid based on a strictly horizontal and/or vertical path. It is the sum of the horizontal and vertical components, whereas the diagonal distance would be computed by applying the Pythagorean theorem:

d(x, y) = |x_1 - y_1| + ... + |x_n - y_n|

Minkowski distance

The Minkowski distance is a metric on Euclidean space which can be considered a generalization of both the Euclidean distance and the Manhattan distance. The Minkowski distance of order p between two points is defined as:

d(x, y) = ( |x_1 - y_1|^p + ... + |x_n - y_n|^p )^(1/p)

Appendix B

Raw Results

The following are the complete results as measured. The graphs in Section 2.4 and the conclusions are based on these results.
Picture | Clustering algorithm | Clusters | Distance | Clusters (HGPA) | Clusters (IDLA) | Rand (HGPA) | Rand (IDLA)
Lenna | K-Means | 7 | Minkowski | 6 | 6 | 0.85 | 0.91
Lenna | K-Means | 8 | Euclidean | 6 | 6 | 0.84 | 0.92
Lenna | K-Means | 9 | Manhattan | 6 | 6 | 0.86 | 0.94
Lenna | PAM | 7 | Minkowski | 7 | 7 | 0.81 | 0.89
Lenna | PAM | 8 | Euclidean | 7 | 7 | 0.82 | 0.89
Lenna | PAM | 9 | Manhattan | 7 | 7 | 0.80 | 0.88
Lenna | PAM | 10 | Minkowski | 10 | 10 | 0.88 | 0.95
Lenna | PAM | 11 | Euclidean | 10 | 10 | 0.89 | 0.93
Lenna | PAM | 12 | Manhattan | 10 | 10 | 0.87 | 0.91
Jelly beans | K-Means | 10 | Euclidean | 10 | 10 | 0.81 | 0.94
Jelly beans | K-Means | 11 | Euclidean | 10 | 10 | 0.82 | 0.94
Jelly beans | K-Means | 12 | Euclidean | 10 | 10 | 0.84 | 0.88
Jelly beans | K-Means | 7 | Euclidean | 8 | 8 | 0.71 | 0.81
Jelly beans | K-Means | 8 | Euclidean | 8 | 8 | 0.73 | 0.81
Jelly beans | K-Means | 9 | Euclidean | 8 | 8 | 0.71 | 0.82
Luna | PAM | 5 | Minkowski | 6 | 6 | 0.79 | 0.84
Luna | PAM | 4 | Euclidean | 6 | 6 | 0.83 | 0.88
Luna | K-Means | 6 | Manhattan | 6 | 6 | 0.72 | 0.80
Luna | K-Means | 7 | Euclidean | 8 | 8 | 0.69 | 0.96
Luna | K-Means | 8 | Euclidean | 8 | 8 | 0.68 | 0.92
Luna | K-Means | 9 | Euclidean | 8 | 8 | 0.73 | 0.84
Luna | K-Means | 9 | Euclidean | 9 | 9 | 0.78 | 0.95
Luna | K-Means | 9 | Euclidean | 9 | 9 | 0.78 | 0.87
Luna | K-Means | 9 | Euclidean | 9 | 9 | 0.77 | 0.84
Mountains | K-Means | 10 | Euclidean | 9 | 9 | 0.76 | 0.84
Mountains | K-Means | 11 | Euclidean | 9 | 9 | 0.80 | 0.88
Mountains | K-Means | 12 | Euclidean | 9 | 9 | 0.80 | 0.88
Mountains | K-Means | 10 | Euclidean | 11 | 11 | 0.83 | 0.91
Mountains | K-Means | 11 | Euclidean | 11 | 11 | 0.88 | 0.94
Mountains | K-Means | 12 | Euclidean | 11 | 11 | 0.89 | 0.94
Mountains | PAM | 4 | Euclidean | 4 | 4 | 0.62 | 0.96
Mountains | PAM | 4 | Euclidean | 4 | 4 | 0.62 | 0.96
Mountains | PAM | 4 | Euclidean | 4 | 4 | 0.61 | 0.94
Shoes | PAM | 4 | Euclidean | 4 | 4 | 0.61 | 0.83
Shoes | PAM | 5 | Euclidean | 4 | 4 | 0.64 | 0.86
Shoes | PAM | 6 | Euclidean | 4 | 4 | 0.66 | 0.86
Shoes | PAM | 4 | Euclidean | 6 | 6 | 0.85 | 0.84
Shoes | PAM | 5 | Euclidean | 6 | 6 | 0.90 | 0.83
Shoes | PAM | 6 | Euclidean | 6 | 6 | 0.87 | 0.89
Fish | PAM | 5 | Minkowski | 6 | 6 | 0.85 | 0.82
Fish | PAM | 4 | Euclidean | 6 | 6 | 0.83 | 0.83
Fish | K-Means | 6 | Manhattan | 6 | 6 | 0.86 | 0.86
Fish | PAM | 7 | Minkowski | 7 | 7 | 0.83 | 0.80
Fish | PAM | 8 | Euclidean | 7 | 7 | 0.86 | 0.84
Fish | K-Means | 9 | Manhattan | 7 | 7 | 0.87 | 0.87
Fish | PAM | 10 | Minkowski | 10 | 10 | 0.84 | 0.90
Fish | PAM | 11 | Euclidean | 10 | 10 | 0.85 | 0.94
Fish | K-Means | 12 | Manhattan | 10 | 10 | 0.84 | 0.93
Fish | PAM | 13 | Minkowski | 13 | 13 | 0.90 | 0.93
Fish | PAM | 14 | Euclidean | 13 | 13 | 0.93 | 0.92
Fish | K-Means | 15 | Manhattan | 13 | 13 | 0.93 | 0.93
Fish | K-Means | 13 | Minkowski | 13 | 13 | 0.93 | 0.97
Fish | K-Means | 14 | Euclidean | 13 | 13 | 0.95 | 0.98
Fish | K-Means | 15 | Manhattan | 13 | 13 | 0.93 | 0.95
Plane | K-Means | 4 | Minkowski | 4 | 4 | 0.58 | 0.98
Plane | K-Means | 5 | Euclidean | 4 | 4 | 0.60 | 0.82
Plane | K-Means | 6 | Manhattan | 4 | 4 | 0.64 | 0.78
Plane | K-Means | 7 | Minkowski | 7 | 7 | 0.87 | 0.99
Plane | K-Means | 8 | Euclidean | 7 | 7 | 0.90 | 0.93
Plane | K-Means | 9 | Manhattan | 7 | 7 | 0.91 | 0.94
Plane | K-Means | 10 | Minkowski | 10 | 10 | 0.86 | 0.99
Plane | K-Means | 11 | Euclidean | 10 | 10 | 0.88 | 0.93
Plane | K-Means | 12 | Manhattan | 10 | 10 | 0.88 | 0.94
Drop | K-Means | 5 | Euclidean | 6 | 6 | 0.85 | 0.96
Drop | K-Means | 4 | Euclidean | 6 | 6 | 0.83 | 0.93
Drop | K-Means | 6 | Euclidean | 6 | 6 | 0.85 | 0.98
Drop | K-Means | 7 | Euclidean | 9 | 9 | 0.79 | 0.89
Drop | K-Means | 8 | Euclidean | 9 | 9 | 0.84 | 0.94
Drop | K-Means | 9 | Euclidean | 9 | 9 | 0.85 | 0.97
Drop | K-Means | 10 | Euclidean | 12 | 12 | 0.89 | 0.93
Drop | K-Means | 11 | Euclidean | 12 | 12 | 0.91 | 0.93
Drop | K-Means | 12 | Euclidean | 12 | 12 | 0.89 | 0.95
Camels | PAM | 4 | Minkowski | 6 | 6 | 0.80 | 0.86
Camels | PAM | 5 | Minkowski | 6 | 6 | 0.78 | 0.82
Camels | PAM | 6 | Minkowski | 6 | 6 | 0.81 | 0.90
Camels | PAM | 7 | Manhattan | 9 | 9 | 0.74 | 0.88
Camels | PAM | 8 | Manhattan | 9 | 9 | 0.77 | 0.94
Camels | PAM | 9 | Manhattan | 9 | 9 | 0.78 | 0.93
Camels | PAM | 10 | Minkowski | 12 | 12 | 0.90 | 0.90
Camels | PAM | 11 | Minkowski | 12 | 12 | 0.96 | 0.93
Camels | PAM | 12 | Minkowski | 12 | 12 | 0.92 | 0.92
Diva | K-Means | 4 | Euclidean | 6 | 6 | 0.83 | 0.89
Diva | K-Means | 5 | Euclidean | 6 | 6 | 0.71 | 0.82
Diva | K-Means | 6 | Euclidean | 6 | 6 | 0.84 | 0.88
Diva | K-Means | 7 | Euclidean | 8 | 8 | 0.73 | 0.89
Diva | K-Means | 8 | Euclidean | 8 | 8 | 0.76 | 0.95
Diva | K-Means | 9 | Euclidean | 8 | 8 | 0.77 | 0.93
Plane2 | K-Means | 2 | Euclidean | 3 | 3 | 0.68 | 0.68
Plane2 | K-Means | 3 | Euclidean | 3 | 3 | 0.69 | 0.68
Plane2 | K-Means | 4 | Euclidean | 3 | 3 | 0.66 | 0.66
Plane2 | K-Means | 5 | Euclidean | 5 | 5 | 0.59 | 0.77
Plane2 | K-Means | 6 | Euclidean | 5 | 5 | 0.68 | 0.77
Plane2 | K-Means | 7 | Euclidean | 5 | 5 | 0.71 | 0.76
Plane2 | K-Means | 5 | Euclidean | 7 | 7 | 0.62 | 0.73
Plane2 | K-Means | 6 | Euclidean | 7 | 7 | 0.73 | 0.80
Plane2 | K-Means | 7 | Euclidean | 7 | 7 | 0.76 | 0.84
Plane2 | PAM | 8 | Minkowski | 8 | 8 | 0.70 | 0.80
Plane2 | PAM | 9 | Euclidean | 8 | 8 | 0.76 | 0.80
Plane2 | PAM | 10 | Manhattan | 8 | 8 | 0.76 | 0.84
ToyHouse | K-Means | 2 | Euclidean | 4 | 4 | 0.44 | 0.80
ToyHouse | K-Means | 3 | Euclidean | 4 | 4 | 0.51 | 0.94
ToyHouse | K-Means | 4 | Euclidean | 4 | 4 | 0.54 | 0.99
ToyHouse | PAM | 5 | Euclidean | 7 | 7 | 0.76 | 0.94
ToyHouse | PAM | 6 | Euclidean | 7 | 7 | 0.89 | 0.79
ToyHouse | PAM | 7 | Euclidean | 7 | 7 | 0.92 | 0.82
ToyHouse | PAM | 8 | Euclidean | 9 | 9 | 0.75 | 0.88
ToyHouse | PAM | 9 | Euclidean | 9 | 9 | 0.76 | 0.86
ToyHouse | PAM | 10 | Euclidean | 9 | 9 | 0.78 | 0.87
ToyHouse | K-Means | 11 | Euclidean | 13 | 13 | 0.89 | 0.89
ToyHouse | K-Means | 12 | Euclidean | 13 | 13 | 0.88 | 0.93
ToyHouse | K-Means | 13 | Euclidean | 13 | 13 | 0.89 | 0.88
ToyHouse2 | PAM | 2 | Euclidean | 4 | 4 | 0.50 | 0.74
ToyHouse2 | PAM | 3 | Euclidean | 4 | 4 | 0.56 | 0.79
ToyHouse2 | PAM | 4 | Euclidean | 4 | 4 | 0.59 | 0.79
ToyHouse2 | K-Means | 5 | Euclidean | 7 | 7 | 0.91 | 0.85
ToyHouse2 | K-Means | 6 | Euclidean | 7 | 7 | 0.92 | 0.83
ToyHouse2 | K-Means | 7 | Euclidean | 7 | 7 | 0.90 | 0.89
ToyHouse2 | PAM | 8 | Euclidean | 10 | 10 | 0.84 | 0.89
ToyHouse2 | PAM | 9 | Euclidean | 10 | 10 | 0.85 | 0.91
ToyHouse2 | PAM | 10 | Euclidean | 10 | 10 | 0.86 | 0.90
ToyHouse2 | K-Means | 11 | Euclidean | 11 | 11 | 0.81 | 0.95
ToyHouse2 | K-Means | 12 | Euclidean | 11 | 11 | 0.82 | 0.92
ToyHouse2 | K-Means | 13 | Euclidean | 11 | 11 | 0.82 | 0.95
Tree | K-Means | 2 | Minkowski | 2 | 2 | 0.50 | 0.97
Tree | K-Means | 3 | Euclidean | 2 | 2 | 0.50 | 0.81
Tree | K-Means | 4 | Manhattan | 2 | 2 | 0.50 | 0.76
Tree | PAM | 5 | Minkowski | 5 | 5 | 0.66 | 0.89
Tree | PAM | 6 | Euclidean | 5 | 5 | 0.66 | 0.90
Tree | PAM | 7 | Manhattan | 5 | 5 | 0.67 | 0.93
Tree | K-Means | 8 | Minkowski | 8 | 8 | 0.76 | 0.99
Tree | K-Means | 9 | Euclidean | 8 | 8 | 0.77 | 0.92
Tree | K-Means | 10 | Manhattan | 8 | 8 | 0.78 | 0.93
Tree | PAM | 11 | Minkowski | 13 | 13 | 0.90 | 0.91
Tree | PAM | 12 | Euclidean | 13 | 13 | 0.91 | 0.89
Tree | PAM | 13 | Manhattan | 13 | 13 | 0.91 | 0.91
USA | PAM | 2 | Minkowski | 4 | 4 | 0.43 | 0.62
USA | PAM | 3 | Euclidean | 4 | 4 | 0.55 | 0.77
USA | PAM | 4 | Manhattan | 4 | 4 | 0.60 | 0.89
USA | K-Means | 5 | Minkowski | 7 | 7 | 0.83 | 0.91
USA | K-Means | 6 | Euclidean | 7 | 7 | 0.85 | 0.90
USA | K-Means | 7 | Manhattan | 7 | 7 | 0.83 | 0.92
USA | PAM | 8 | Minkowski | 10 | 10 | 0.77 | 0.93
USA | PAM | 9 | Euclidean | 10 | 10 | 0.79 | 0.90
USA | PAM | 10 | Manhattan | 10 | 10 | 0.80 | 0.92
USA | K-Means | 11 | Minkowski | 12 | 12 | 0.89 | 0.93
USA | K-Means | 12 | Euclidean | 12 | 12 | 0.90 | 0.98
USA | K-Means | 13 | Manhattan | 12 | 12 | 0.89 | 0.93
Jelly beans 2 | K-Means | 2 | Minkowski | 3 | 3 | 0.48 | 0.78
Jelly beans 2 | K-Means | 3 | Euclidean | 3 | 3 | 0.52 | 0.96
Jelly beans 2 | K-Means | 4 | Manhattan | 3 | 3 | 0.53 | 0.91
Jelly beans 2 | PAM | 5 | Minkowski | 6 | 6 | 0.68 | 0.97
Jelly beans 2 | PAM | 6 | Euclidean | 6 | 6 | 0.74 | 0.84
Jelly beans 2 | PAM | 7 | Manhattan | 6 | 6 | 0.77 | 0.80
Jelly beans 2 | K-Means | 8 | Minkowski | 10 | 10 | 0.78 | 0.85
Jelly beans 2 | K-Means | 9 | Euclidean | 10 | 10 | 0.78 | 0.85
Jelly beans 2 | K-Means | 10 | Manhattan | 10 | 10 | 0.81 | 0.85
House | PAM | 8 | Minkowski | 10 | 10 | 0.79 | 0.90
House | PAM | 9 | Euclidean | 10 | 10 | 0.81 | 0.90
House | PAM | 10 | Manhattan | 10 | 10 | 0.81 | 0.90
House | K-Means | 5 | Minkowski | 7 | 7 | 0.83 | 0.92
House | K-Means | 6 | Euclidean | 7 | 7 | 0.85 | 0.90
House | K-Means | 7 | Manhattan | 7 | 7 | 0.85 | 0.91

Average Rand Index

The following tables represent the average Rand index parameter in the HGPA and IDLA algorithms, grouped by the number of clusters. The table results were used in Figure 2.24.