PARTITIONAL CLUSTERING
Deniz ÜSTÜN

CONTENT
- WHAT IS CLUSTERING?
- WHAT IS PARTITIONAL CLUSTERING?
- THE ALGORITHMS USED IN PARTITIONAL CLUSTERING

What is Clustering?
Clustering is the process of organizing data into groups so that the objects within a group are similar to one another. Clustering techniques are unsupervised methods.

What is Partitional Clustering?
- Partitional clustering algorithms place similar objects in the same cluster.
- They are successful at finding center-based clusters.
- They divide n objects into k clusters, where k is a parameter given in advance.
- They start with a randomly chosen clustering and then optimize it according to some accuracy measure.

The Algorithms Used in Partitional Clustering
- K-MEANS ALGORITHM
- K-MEDOIDS ALGORITHM
- FUZZY C-MEANS ALGORITHM

K-MEANS ALGORITHM
The K-MEANS algorithm was introduced by J.B. MacQueen in 1967 as one of the simplest unsupervised learning algorithms for solving the clustering problem (MacQueen, 1967). K-MEANS allows each data point to belong to only one cluster, so it is a hard (crisp) clustering algorithm.

Suppose a set of N samples in an n-dimensional space is partitioned into K clusters {C1, C2, …, CK}, where cluster $C_k$ contains $n_k$ samples. The mean vector $M_k$ of cluster $C_k$ is given by (Kantardzic, 2003):

$$M_k = \frac{1}{n_k} \sum_{i=1}^{n_k} X_{ik}$$

where $X_{ik}$ is the i-th sample belonging to $C_k$. The square-error for $C_k$ is:

$$e_k^2 = \sum_{i=1}^{n_k} (X_{ik} - M_k)^2$$

The square-error for $C_k$ is called the within-cluster variation. The square-error for all the clusters is the sum of the within-cluster variations.
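The mean vector and square-error definitions above can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the function names and the toy partition are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

Point = Sequence[float]


def cluster_mean(samples: List[Point]) -> List[float]:
    """Mean vector M_k: component-wise average of the samples in one cluster."""
    n_k = len(samples)
    dims = len(samples[0])
    return [sum(x[d] for x in samples) / n_k for d in range(dims)]


def within_cluster_error(samples: List[Point]) -> float:
    """Square-error e_k^2: sum of squared Euclidean distances to M_k."""
    m_k = cluster_mean(samples)
    return sum((x[d] - m_k[d]) ** 2 for x in samples for d in range(len(m_k)))


def total_error(clusters: List[List[Point]]) -> float:
    """Square-error of the whole clustering: sum of the within-cluster variations."""
    return sum(within_cluster_error(c) for c in clusters)


# Hypothetical toy partition (illustration only, not data from the slides).
clusters = [[(1.0, 1.0), (3.0, 3.0)], [(10.0, 0.0)]]

print(cluster_mean(clusters[0]))          # [2.0, 2.0]
print(within_cluster_error(clusters[0]))  # 4.0
print(total_error(clusters))              # 4.0
```

A cluster with a single sample contributes zero error, because its mean coincides with that sample.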
Written as a formula, the square-error over all K clusters is:

$$E_K^2 = \sum_{k=1}^{K} e_k^2$$

The aim of the square-error method is to find, for a given K, the K clusters that minimize $E_K^2$.

K-MEANS ALGORITHM EXAMPLE-1
Initial cluster memberships:

| Observation | Variable 1 | Variable 2 | Cluster Membership |
|---|---|---|---|
| X1 | 3 | 2 | C1 |
| X2 | 2 | 3 | C2 |
| X3 | 7 | 8 | C1 |

$$M_1 = \left(\frac{3+7}{2}, \frac{2+8}{2}\right) = (5, 5), \qquad M_2 = \left(\frac{2}{1}, \frac{3}{1}\right) = (2, 3)$$

[Figure: scatter plot of the observations and the cluster means, both axes from 0 to 10]

$$e_1^2 = \left[(3-5)^2 + (2-5)^2\right] + \left[(7-5)^2 + (8-5)^2\right] = 26$$

$$e_2^2 = (2-2)^2 + (3-3)^2 = 0$$

$$E^2 = e_1^2 + e_2^2 = 26 + 0 = 26$$

Distances of each observation to the cluster means:

$$d(M_1, X_1) = \sqrt{(5-3)^2 + (5-2)^2} = 3.61 \qquad d(M_2, X_1) = \sqrt{(2-3)^2 + (3-2)^2} = 1.41$$

$$d(M_1, X_2) = \sqrt{(5-2)^2 + (5-3)^2} = 3.61 \qquad d(M_2, X_2) = \sqrt{(2-2)^2 + (3-3)^2} = 0$$

$$d(M_1, X_3) = \sqrt{(5-7)^2 + (5-8)^2} = 3.61 \qquad d(M_2, X_3) = \sqrt{(2-7)^2 + (3-8)^2} = 7.07$$

| Observation | d(M1) | d(M2) | Cluster Membership |
|---|---|---|---|
| X1 | 3.61 | 1.41 | C2 |
| X2 | 3.61 | 0 | C2 |
| X3 | 3.61 | 7.07 | C1 |

Since $d(M_2, X_1) < d(M_1, X_1)$, X1 moves from C1 to C2.

Updated cluster memberships:

| Observation | Variable 1 | Variable 2 | Cluster Membership |
|---|---|---|---|
| X1 | 3 | 2 | C2 |
| X2 | 2 | 3 | C2 |
| X3 | 7 | 8 | C1 |

$$M_2 = \left(\frac{3+2}{2}, \frac{2+3}{2}\right) = (2.5, 2.5), \qquad M_1 = \left(\frac{7}{1}, \frac{8}{1}\right) = (7, 8)$$

[Figure: scatter plot of the observations and the updated cluster means, both axes from 0 to 10]

$$e_2^2 = \left[(3-2.5)^2 + (2-2.5)^2\right] + \left[(2-2.5)^2 + (3-2.5)^2\right] = 1$$

$$e_1^2 = (7-7)^2 + (8-8)^2 = 0$$

$$E^2 = e_1^2 + e_2^2 = 0 + 1 = 1$$

Distances to the updated cluster means:

$$d(M_1, X_1) = \sqrt{(7-3)^2 + (8-2)^2} = 7.21 \qquad d(M_2, X_1) = \sqrt{(2.5-3)^2 + (2.5-2)^2} = 0.71$$

$$d(M_1, X_2) = \sqrt{(7-2)^2 + (8-3)^2} = 7.07 \qquad d(M_2, X_2) = \sqrt{(2.5-2)^2 + (2.5-3)^2} = 0.71$$

$$d(M_1, X_3) = \sqrt{(7-7)^2 + (8-8)^2} = 0 \qquad d(M_2, X_3) = \sqrt{(2.5-7)^2 + (2.5-8)^2} = 7.11$$

| Observation | d(M1) | d(M2) | Cluster Membership |
|---|---|---|---|
| X1 | 7.21 | 0.71 | C2 |
| X2 | 7.07 | 0.71 | C2 |
| X3 | 0 | 7.11 | C1 |

The memberships no longer change, so the clustering is final.

[Figure: final clusters C1 and C2, both axes from 0 to 10]

K-MEANS ALGORITHM EXAMPLE-2

| Dataset | Number of Samples | Number of Features | Number of Classes |
|---|---|---|---|
| Synthetic | 1200 | 2 | 4 |

[Figure: scatter plot of the synthetic dataset, both axes from 0 to 1]
[Figure: K-MEANS result for K=2]
[Figure: K-MEANS result for K=3]
[Figure: K-MEANS result for K=4]

K-MEDOIDS ALGORITHM
The aim of the K-MEDOIDS algorithm is to find K representative objects (Kaufman and
Rousseeuw, 1987). Each cluster in the K-MEDOIDS algorithm is represented by one of the objects in the cluster. K-MEANS determines the clusters by using the mean, whereas K-MEDOIDS uses the medoid, the most centrally located object of the cluster. With $O_k$ denoting the representative object of cluster $C_k$, the square-error is:

$$e_k^2 = \sum_{i=1}^{n_k} (X_{ik} - O_k)^2$$

K-MEDOIDS ALGORITHM EXAMPLE-1
[Figures: step-by-step illustration of the algorithm]
1. Randomly select K medoids.
2. Allocate each point to the closest medoid.
3. Determine a new medoid for each cluster.
4. Allocate each point to the closest medoid again.
5. Stop the process when the medoids no longer change.

K-MEDOIDS ALGORITHM EXAMPLE-2

| Dataset | Number of Samples | Number of Features | Number of Classes |
|---|---|---|---|
| Synthetic | 2000 | 2 | 3 |

[Figure: scatter plot of the synthetic dataset, both axes from 0 to 1]
[Figure: K-MEDOIDS result for K=2]
[Figure: K-MEDOIDS result for K=3]

FUZZY C-MEANS ALGORITHM
Fuzzy C-MEANS is the best known and most widely used fuzzy clustering method. It was introduced by Dunn in 1973 and improved by Bezdek in 1981 [Höppner et al., 2000]. Fuzzy C-MEANS allows an object to belong to two or more clusters. The membership values of an object over all clusters sum to one; however, the membership value for the cluster that contains the object is higher than for the other clusters. The algorithm is based on the least-squares method [Höppner et al., 2000], minimizing the objective function:

$$J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \, \lVert x_i - c_j \rVert^2, \qquad 1 \le m < \infty$$

The algorithm starts from a randomly initialized membership matrix U and then calculates the center vectors [Höppner et al., 2000].
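To make the objective function concrete, the sketch below evaluates $J_m$ for a given membership matrix and set of centers. It is an illustrative Python sketch, not part of the original slides: the function names, the toy data, and the choice m = 2 are assumptions.

```python
from typing import List, Sequence


def squared_distance(x: Sequence[float], c: Sequence[float]) -> float:
    """Squared Euclidean distance ||x - c||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def fcm_objective(data: List[Sequence[float]],
                  centers: List[Sequence[float]],
                  u: List[List[float]],
                  m: float = 2.0) -> float:
    """J_m = sum over objects i and clusters j of u[i][j]**m * ||x_i - c_j||^2."""
    # Each row of u holds the memberships of one object and must sum to one.
    for row in u:
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("memberships of each object must sum to one")
    return sum(
        (u[i][j] ** m) * squared_distance(x, c)
        for i, x in enumerate(data)
        for j, c in enumerate(centers)
    )


# Hypothetical toy values (illustration only): the middle object is shared
# equally between the two clusters.
data = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
centers = [(0.0, 0.0), (4.0, 0.0)]
u = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]

print(fcm_objective(data, centers, u, m=2.0))  # 2.5
```

Only the shared object contributes here: 0.5² · 1 + 0.5² · 9 = 2.5.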
Center vector of cluster j:

$$c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m} \, x_i}{\sum_{i=1}^{N} u_{ij}^{m}}$$

Using the calculated center vectors, the membership matrix U is recomputed as:

$$u_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \dfrac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{\frac{2}{m-1}}}$$

The new membership matrix $U_{new}$ is compared with the old membership matrix $U_{old}$, and the process continues until the difference is smaller than ε.

FUZZY C-MEANS ALGORITHM EXAMPLE

| Dataset | Number of Samples | Number of Features | Number of Classes |
|---|---|---|---|
| Synthetic | 2000 | 2 | 3 |

[Figure: scatter plot of the synthetic dataset, both axes from 0 to 1]
[Figure: FUZZY C-MEANS result for C=3, m=5, ε=1e-6]

Results
- K-MEDOIDS gives the best results compared with K-MEANS and FUZZY C-MEANS. However, it is suitable for small datasets.
- K-MEANS is the most appropriate algorithm in terms of running time.
- In FUZZY C-MEANS, an object can belong to more than one cluster, whereas in the other two algorithms an object can belong to only one cluster.

References
[MacQueen, 1967] J.B. MacQueen, “Some Methods for Classification and Analysis of Multivariate Observations”, Proc. Symp. Math. Statist. and Probability (5th), 281-297, (1967).
[Kantardzic, 2003] M. Kantardzic, “Data Mining: Concepts, Methods and Algorithms”, Wiley, (2003).
[Kaufman and Rousseeuw, 1987] L. Kaufman, P.J. Rousseeuw, “Clustering by Means of Medoids”, Statistical Data Analysis Based on the L1–Norm and Related Methods, edited by Y. Dodge, North-Holland, 405–416, (1987).
[Kaufman and Rousseeuw, 1990] L. Kaufman, P.J. Rousseeuw, “Finding Groups in Data: An Introduction to Cluster Analysis”, John Wiley and Sons, (1990).
[Höppner et al., 2000] F. Höppner, F. Klawonn, R. Kruse, T. Runkler, “Fuzzy Cluster Analysis”, John Wiley & Sons, Chichester, (2000).
[Işık and Çamurcu, 2007] M. Işık, A.Y. Çamurcu, “K-MEANS, K-MEDOIDS ve Bulanık C-MEANS Algoritmalarının Uygulamalı olarak Performanslarının Tespiti”, İstanbul Ticaret Üniversitesi Fen Bilimleri Dergisi, No. 11, 31-45, (2007).
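Supplementary sketch
The FUZZY C-MEANS iteration described in the slides (random membership matrix, center update, membership update, stop when the change in U is smaller than ε) could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the implementation used for the examples: all function names, the default m = 2, the iteration cap, the fixed random seed, and the toy data are assumptions, and the maximum absolute change in U is used as the difference measure.

```python
import math
import random
from typing import List, Sequence, Tuple


def distance(x: Sequence[float], c: Sequence[float]) -> float:
    """Euclidean distance ||x - c||."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, c)))


def update_centers(data, u, m: float) -> List[List[float]]:
    """c_j = sum_i u_ij^m x_i / sum_i u_ij^m (membership-weighted mean)."""
    dims = len(data[0])
    centers = []
    for j in range(len(u[0])):
        weights = [row[j] ** m for row in u]
        total = sum(weights)  # assumes every cluster keeps nonzero membership
        centers.append(
            [sum(w * x[d] for w, x in zip(weights, data)) / total
             for d in range(dims)]
        )
    return centers


def update_memberships(data, centers, m: float) -> List[List[float]]:
    """u_ij = 1 / sum_k (||x_i - c_j|| / ||x_i - c_k||)^(2/(m-1))."""
    power = 2.0 / (m - 1.0)
    u = []
    for x in data:
        d = [distance(x, c) for c in centers]
        if 0.0 in d:
            # Object lies exactly on a center: give it full membership there.
            hits = [1.0 if dj == 0.0 else 0.0 for dj in d]
            u.append([h / sum(hits) for h in hits])
        else:
            u.append([1.0 / sum((dj / dk) ** power for dk in d) for dj in d])
    return u


def fuzzy_c_means(data, n_clusters: int, m: float = 2.0, eps: float = 1e-6,
                  max_iter: int = 300, seed: int = 0
                  ) -> Tuple[List[List[float]], List[List[float]]]:
    """Alternate center and membership updates until U changes by less than eps."""
    rng = random.Random(seed)
    # Random initial membership matrix whose rows sum to one.
    u = []
    for _ in data:
        row = [rng.random() + 1e-12 for _ in range(n_clusters)]
        u.append([v / sum(row) for v in row])
    centers = update_centers(data, u, m)
    for _ in range(max_iter):
        centers = update_centers(data, u, m)
        new_u = update_memberships(data, centers, m)
        change = max(abs(a - b)
                     for new_row, old_row in zip(new_u, u)
                     for a, b in zip(new_row, old_row))
        u = new_u
        if change < eps:
            break
    return centers, u


# Hypothetical toy data: two well-separated groups of three points each.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, u = fuzzy_c_means(data, n_clusters=2)
for point, row in zip(data, u):
    print(point, [round(v, 3) for v in row])
```

Each printed row holds the memberships of one point in the two clusters; the rows sum to one, and assigning every point to its highest membership gives a hard partition comparable to the other two algorithms.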