International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org
Volume 3, Issue 5, May 2014 ISSN 2319-4847

An Adaptive Fuzzy Clustering Algorithm with Generalized Entropy Based on Weighted Samples

Kai Li1, Lijuan Cui2 and Xiuchen Ye3
1 School of Mathematics and Computer Science, Hebei University, Baoding 071002, China
2 Library, Hebei University, Baoding 071002, China
3 Computer Center, Hebei University, Baoding 071002, China

Abstract
An adaptive fuzzy clustering algorithm with generalized entropy based on weighted samples is presented. First, sample weights are introduced into the objective function of fuzzy clustering with generalized entropy, which yields an optimization problem for weighted-sample fuzzy clustering with generalized entropy. The Lagrange multiplier method is then used to solve this optimization problem, giving the degree of membership of each sample in each cluster, the cluster centers, and the sample weights. Finally, experiments on representative datasets selected from the UCI repository show the effectiveness of the presented algorithms.

Keywords: fuzzy clustering, generalized entropy, weighted samples, adaptive method

1. INTRODUCTION
Clustering is an important data analysis method that has been applied to pattern recognition, data mining, and other fields. Researchers have proposed many different clustering algorithms. Among them, partition-based cluster analysis (also called objective-function-based cluster analysis) is one of the most commonly used approaches; K-means and Fuzzy C-means (FCM) are typical examples. However, these algorithms treat all data points, and all attributes, as equally important. To overcome this limitation, researchers have proposed many improved algorithms.
Huang et al. [1] introduced variable weights into the k-means clustering process and presented a k-means-type clustering algorithm that can calculate the variable weights automatically. Jing et al. [2] included a weight entropy term in the objective function to extend the k-means clustering process: they compute a weight for each dimension in each cluster and use these weight values to identify the subsets of important dimensions that characterize different clusters. To reduce the FCM algorithm's dependence on the initial cluster centers and its sensitivity to noise, Su et al. [3] introduced a weighting parameter that adjusts the locations of the cluster centers. To account for the different contributions of individual features, Li et al. [4] presented a new feature-weighted fuzzy clustering algorithm. In addition, Karayiannis [5] introduced entropy into fuzzy clustering and proposed a fuzzy clustering algorithm based on maximum entropy. Following that, Li and Mukaidono [6] and Tran and Wagner [7] combined the loss function from data samples to cluster centers with entropy to propose maximum entropy clustering algorithms. Wei and Fahn [8] presented a bidirectional-association fuzzy clustering network to solve the fuzzy clustering problem.

In this paper, adaptive fuzzy clustering with generalized entropy based on weighted samples is studied. During clustering, the sample weights are updated as the membership degrees and cluster centers change. This paper is organized as follows. In Section 2, we give a weighted-sample objective function for fuzzy clustering with generalized entropy and use the Lagrange multiplier method to obtain the sample memberships and cluster centers. In Section 3, an adaptive fuzzy clustering algorithm with generalized entropy based on weighted samples is given. In Section 4, we test the presented algorithms on commonly used UCI datasets. Conclusions are given in the final section.
2. FUZZY CLUSTERING WITH GENERALIZED ENTROPY BASED ON WEIGHTED SAMPLES

Let $X = \{x_1, x_2, \ldots, x_n\}$ be a data set, where $x_j \in R^s$; $c$ is a positive integer greater than one and $m > 1$ is the fuzzy index; $\mu_{ij} = \mu_i(x_j) \ge 0$ is the degree of membership of $x_j$ in the $i$th cluster with center $v_i$, and $\sum_{i=1}^{c} \mu_{ij} = 1$. $U$ is the membership matrix composed of all the $\mu_{ij}$ ($i = 1, 2, \ldots, c$; $j = 1, 2, \ldots, n$), and $V$ is the vector whose components are the cluster centers $v_i$ ($i = 1, 2, \ldots, c$). The objective function of fuzzy clustering with generalized entropy is

$$ J_{G}(U,V) = \sum_{j=1}^{n}\sum_{i=1}^{c} \mu_{ij}^{m}\,\|x_j - v_i\|^{2} + \delta\,(2^{1-m}-1)^{-1}\sum_{j=1}^{n}\Big(\sum_{i=1}^{c}\mu_{ij}^{m}-1\Big). \quad (1) $$

Applying the Lagrange multiplier method to (1) yields the GEFCM algorithm [9]. Since the samples in a data set have different importance in the clustering process, we introduce sample weights into objective function (1) and obtain the following objective function:

$$ J_{WG}(U,V) = \sum_{j=1}^{n}\sum_{i=1}^{c} w_j\,\mu_{ij}^{m}\,\|x_j - v_i\|^{2} + \delta\,(2^{1-m}-1)^{-1}\sum_{j=1}^{n}\Big(\sum_{i=1}^{c}\mu_{ij}^{m}-1\Big), \quad (2) $$

where $\delta$ is an adjustable parameter and $w_j$ is the weight of the $j$th data sample. Combining objective function (2) with the membership constraint, we obtain the following optimization problem for fuzzy clustering with generalized entropy based on weighted samples:

$$ \min J_{WG}(U,V) \quad \text{s.t.} \ \sum_{i=1}^{c}\mu_{ij}=1,\ j=1,2,\ldots,n. \quad (3) $$

In the following, we use Lagrange multipliers to solve optimization problem (3). The Lagrange function $L$ corresponding to (3) is

$$ L(U,V;\lambda_1,\ldots,\lambda_n) = \sum_{j=1}^{n}\sum_{i=1}^{c} w_j\,\mu_{ij}^{m}\,\|x_j - v_i\|^{2} + \delta\,(2^{1-m}-1)^{-1}\sum_{j=1}^{n}\Big(\sum_{i=1}^{c}\mu_{ij}^{m}-1\Big) - \lambda_1\Big(\sum_{i=1}^{c}\mu_{i1}-1\Big) - \lambda_2\Big(\sum_{i=1}^{c}\mu_{i2}-1\Big) - \cdots - \lambda_n\Big(\sum_{i=1}^{c}\mu_{in}-1\Big). $$

Setting the derivatives of $L$ with respect to $\lambda_j$, $\mu_{ij}$ and $v_i$ equal to zero gives

$$ \frac{\partial L}{\partial \lambda_j} = \sum_{i=1}^{c}\mu_{ij} - 1 = 0, \quad (4) $$

$$ \frac{\partial L}{\partial \mu_{ij}} = m\,w_j\,\mu_{ij}^{m-1}\|x_j - v_i\|^{2} + \delta\,m\,\mu_{ij}^{m-1}(2^{1-m}-1)^{-1} - \lambda_j = 0, \quad (5) $$

$$ \frac{\partial L}{\partial v_i} = -2\sum_{j=1}^{n} w_j\,\mu_{ij}^{m}\,(x_j - v_i) = 0. \quad (6) $$
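As a concrete illustration of the weighted objective (2), the following Python sketch evaluates $J_{WG}$ for given memberships, centers and sample weights. The function name `j_wg` and the array layout (U of shape (c, n), V of shape (c, s)) are our own illustrative choices, not part of the paper:

```python
import numpy as np

def j_wg(X, U, V, w, m=2.0, delta=-0.01):
    """Evaluate the weighted generalized-entropy objective (2) -- a sketch.

    X: (n, s) data; U: (c, n) memberships; V: (c, s) centers;
    w: (n,) sample weights; m > 1 is the fuzzy index, delta is adjustable.
    """
    # Squared distances ||x_j - v_i||^2, shape (c, n)
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
    # Weighted fidelity term: sum_j sum_i w_j * u_ij^m * d_ij^2
    fidelity = (w[None, :] * U ** m * d2).sum()
    # Generalized entropy term: delta * (2^{1-m} - 1)^{-1} * sum_j (sum_i u_ij^m - 1)
    entropy = delta / (2.0 ** (1.0 - m) - 1.0) * ((U ** m).sum(axis=0) - 1.0).sum()
    return fidelity + entropy
```

Note that for a crisp partition (each $\mu_{ij}$ equal to 0 or 1) with $m > 1$, the entropy term vanishes and only the weighted distance term remains.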
From (4), (5) and (6), simple algebra yields the following degree of membership for each sample and the cluster centers:

$$ \mu_{ij} = \frac{\big(w_j\|x_j-v_i\|^{2}+\delta(2^{1-m}-1)^{-1}\big)^{-1/(m-1)}}{\sum_{k=1}^{c}\big(w_j\|x_j-v_k\|^{2}+\delta(2^{1-m}-1)^{-1}\big)^{-1/(m-1)}},\quad i=1,\ldots,c;\ j=1,\ldots,n, \quad (7) $$

$$ v_i = \frac{\sum_{j=1}^{n} w_j\,\mu_{ij}^{m}\,x_j}{\sum_{j=1}^{n} w_j\,\mu_{ij}^{m}},\quad i=1,\ldots,c. \quad (8) $$

Note that here the weight $w_j$ of each sample is determined by some fixed method. Iterating (7) and (8) gives the fuzzy clustering algorithm with generalized entropy based on weighted samples, which we call the WGEFCM algorithm.

3. ADAPTIVE FUZZY CLUSTERING WITH GENERALIZED ENTROPY BASED ON WEIGHTED SAMPLES

To further improve clustering performance, we modify (2) to obtain the following objective function:

$$ J_{AWG}(U,V) = \sum_{j=1}^{n}\sum_{i=1}^{c} w_j^{\beta}\,\mu_{ij}^{m}\,\|x_j - v_i\|^{2} + \delta\,(2^{1-m}-1)^{-1}\sum_{j=1}^{n}\Big(\sum_{i=1}^{c}\mu_{ij}^{m}-1\Big), \quad (9) $$

where $\beta$ is a parameter. Based on objective function (9), we obtain the following optimization problem for adaptive fuzzy clustering with generalized entropy based on weighted samples:

$$ \min J_{AWG}(U,V) \quad \text{s.t.} \ \sum_{i=1}^{c}\mu_{ij}=1,\ j=1,2,\ldots,n; \quad \sum_{j=1}^{n} w_j = 1. \quad (10) $$

For optimization problem (10), we use the Lagrange multiplier approach to solve for the degree of membership of each sample, the cluster centers and the sample weights. The Lagrange function $L$ corresponding to (10) is

$$ L(U,V;\lambda_1,\ldots,\lambda_n,\gamma) = \sum_{j=1}^{n}\sum_{i=1}^{c} w_j^{\beta}\,\mu_{ij}^{m}\,\|x_j - v_i\|^{2} + \delta\,(2^{1-m}-1)^{-1}\sum_{j=1}^{n}\Big(\sum_{i=1}^{c}\mu_{ij}^{m}-1\Big) - \lambda_1\Big(\sum_{i=1}^{c}\mu_{i1}-1\Big) - \cdots - \lambda_n\Big(\sum_{i=1}^{c}\mu_{in}-1\Big) - \gamma\Big(\sum_{j=1}^{n}w_j-1\Big). $$

Setting the derivatives of $L$ with respect to $\lambda_j$, $\mu_{ij}$, $v_i$, $w_j$ and $\gamma$ equal to zero gives

$$ \frac{\partial L}{\partial \lambda_j} = \sum_{i=1}^{c}\mu_{ij} - 1 = 0, \quad (11) $$

$$ \frac{\partial L}{\partial \mu_{ij}} = m\,w_j^{\beta}\,\mu_{ij}^{m-1}\|x_j - v_i\|^{2} + \delta\,m\,\mu_{ij}^{m-1}(2^{1-m}-1)^{-1} - \lambda_j = 0, \quad (12) $$

$$ \frac{\partial L}{\partial v_i} = -2\sum_{j=1}^{n} w_j^{\beta}\,\mu_{ij}^{m}\,(x_j - v_i) = 0, \quad (13) $$

$$ \frac{\partial L}{\partial w_j} = \beta\,w_j^{\beta-1}\sum_{i=1}^{c}\mu_{ij}^{m}\|x_j - v_i\|^{2} - \gamma = 0, \quad (14) $$
$$ \frac{\partial L}{\partial \gamma} = \sum_{j=1}^{n} w_j - 1 = 0. \quad (15) $$

From (11)-(15), simple algebra yields the following degree of membership for each sample, cluster centers and sample weights:

$$ \mu_{ij} = \frac{\big(w_j^{\beta}\|x_j-v_i\|^{2}+\delta(2^{1-m}-1)^{-1}\big)^{-1/(m-1)}}{\sum_{k=1}^{c}\big(w_j^{\beta}\|x_j-v_k\|^{2}+\delta(2^{1-m}-1)^{-1}\big)^{-1/(m-1)}},\quad i=1,\ldots,c;\ j=1,\ldots,n, \quad (16) $$

$$ v_i = \frac{\sum_{j=1}^{n} w_j^{\beta}\,\mu_{ij}^{m}\,x_j}{\sum_{j=1}^{n} w_j^{\beta}\,\mu_{ij}^{m}},\quad i=1,\ldots,c, \quad (17) $$

$$ w_j = \frac{\Big(\sum_{i=1}^{c}\mu_{ij}^{m}\|x_j-v_i\|^{2}\Big)^{1/(1-\beta)}}{\sum_{l=1}^{n}\Big(\sum_{i=1}^{c}\mu_{il}^{m}\|x_l-v_i\|^{2}\Big)^{1/(1-\beta)}},\quad j=1,\ldots,n. \quad (18) $$

We now give the adaptive weighted-sample fuzzy clustering algorithm with generalized entropy, which we name AWGEFCM.

Step 1: Initialize the c cluster centers and the sample weights, and assign values to m, β and δ.
Step 2: Compute the degree of membership μij of each sample according to (16).
Step 3: Compute the center vi of each cluster according to (17).
Step 4: Compute the sample weights wj according to (18).
Step 5: Repeat Steps 2 to 4 until the cluster centers vi no longer change.

4. EXPERIMENTS

To verify the effectiveness of the proposed AWGEFCM algorithm, we select five datasets from the UCI data repository [10] and use two indexes to evaluate clustering performance: accuracy (ACC) and mutual information (MI). ACC is defined as

ACC = (n - err) / n × 100%,

where n is the number of samples in dataset X and err is the number of misclassified samples. In the following experiments, the cluster centers are initialized with samples chosen randomly from the dataset.

First, we use the Iris dataset for a detailed experimental study. The parameter δ is fixed at -0.01, the fuzzy index m takes the values 1.1, 1.5, 2, 2.5, 3, 5, 7, 9 and 11, and β takes the values 5, 10, 20, 50 and 100.
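Steps 1-5 above can be sketched in code. The following Python implementation is a minimal illustration of the AWGEFCM iteration using updates (16)-(18); the function name, default parameter values, the optional explicit initial centers `V0`, and the small floors guarding against division by zero are our own assumptions, not part of the paper:

```python
import numpy as np

def awgefcm(X, c, m=2.0, beta=2.0, delta=-0.01, tol=1e-6, max_iter=300, V0=None, seed=0):
    """Sketch of the AWGEFCM iteration: memberships (16), centers (17), weights (18)."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    # Step 1: initialize centers (random samples unless given) and uniform weights
    if V0 is not None:
        V = np.array(V0, dtype=float)
    else:
        V = X[rng.choice(n, size=c, replace=False)].astype(float)
    w = np.full(n, 1.0 / n)
    const = delta / (2.0 ** (1.0 - m) - 1.0)  # delta * (2^{1-m} - 1)^{-1}
    for _ in range(max_iter):
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)  # ||x_j - v_i||^2, (c, n)
        # Step 2, eq. (16): u_ij proportional to (w_j^beta * d_ij^2 + const)^(-1/(m-1))
        t = np.maximum(w[None, :] ** beta * d2 + const, 1e-12) ** (-1.0 / (m - 1.0))
        U = t / t.sum(axis=0, keepdims=True)
        # Step 3, eq. (17): centers as weighted means of the samples
        g = w[None, :] ** beta * U ** m
        V_new = (g @ X) / g.sum(axis=1, keepdims=True)
        # Step 4, eq. (18): w_j proportional to D_j^{1/(1-beta)}, D_j = sum_i u_ij^m d_ij^2
        D = np.maximum((U ** m * d2).sum(axis=0), 1e-12)
        w = D ** (1.0 / (1.0 - beta))
        w /= w.sum()
        # Step 5: stop when the centers no longer change
        if np.linalg.norm(V_new - V) < tol:
            V = V_new
            break
        V = V_new
    return U, V, w
```

With δ < 0 and m > 1, the coefficient δ(2^{1-m}-1)^{-1} in (16) is positive, which keeps the base of the negative power positive. On a small two-cluster toy set, passing `V0` as one point from each cluster recovers the two groups.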
The experimental results for AWGEFCM are shown in Figure 1, where panels (a), (c), (e), (g) and (i) show the relation between the fuzzy index m and accuracy, while panels (b), (d), (f), (h) and (j) show the relation between m and mutual information (MI), for β = 5, 10, 20, 50 and 100, respectively.

[Figure 1: Clustering performance of the AWGEFCM algorithm on Iris. Panels (a), (c), (e), (g) and (i) plot accuracy against the fuzzy index m, and panels (b), (d), (f), (h) and (j) plot mutual information (MI) against m, for β = 5, 10, 20, 50 and 100, respectively.]

It can be seen that when β = 5, the clustering performance of AWGEFCM remains approximately stable as the fuzzy index m changes. However, when β is 10, 20, 50 or 100, the clustering performance changes considerably; in particular, better clustering results are obtained when m is 7, 9 or 11.

Second, for the datasets Australian, Breast-w, Heart and Ionosphere, we fix the fuzzy index m at 3 and δ at -0.1 or -200.
β takes the values 5, 10, 20, 50, 100 and 1000. The experimental results are shown in Figure 2.

[Figure 2: Accuracy of the AWGEFCM algorithm on (a) Australian, (b) Breast-w, (c) Heart and (d) Ionosphere as β takes the values 5, 10, 20, 50, 100 and 1000.]

Figure 2 shows that, for each dataset, the clustering performance varies as β takes different values. We also compare the clustering performance of different algorithms on the datasets Australian, Breast-w, Heart, Ionosphere and Iris. The algorithms compared are FCM, GEFCM [9] and WGEFCM. The experimental results are given in Table 1; the presented AWGEFCM algorithm obtains better clustering results than FCM, GEFCM and WGEFCM on some of the datasets.

Table 1: Comparison of clustering accuracy for the different algorithms

Dataset      FCM      GEFCM    WGEFCM   AWGEFCM
Australian   56.09%   56.09%   56.67%   60.14%
Breast-w     95.59%   95.61%   95.88%   97.5%
Heart        59.25%   59.26%   60.74%   60.74%
Ionosphere   70.94%   71.23%   71.23%   71.23%
Iris         89.33%   92.67%   91.33%   92.67%

5. CONCLUSIONS

In this paper, we study fuzzy clustering with sample weights based on generalized entropy. An objective function for weighted-sample fuzzy clustering with generalized entropy is formulated, and an adaptive fuzzy clustering algorithm with generalized entropy based on weighted samples is then presented. Experiments on representative datasets show that the presented algorithm is effective.
Acknowledgment
This work is supported by the Natural Science Foundation of China (No. 61375075) and the Natural Science Foundation of Hebei Province (No. F2012201014).

REFERENCES
[1] J. Z. Huang, M. K. Ng, H. Rong, and Z. Li, "Automated Variable Weighting in k-Means Type Clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27(5), pp. 657-668, 2005.
[2] L. Jing, M. K. Ng, and J. Z. Huang, "An Entropy Weighting k-Means Algorithm for Subspace Clustering of High-Dimensional Sparse Data," IEEE Transactions on Knowledge and Data Engineering, vol. 19(8), pp. 1026-1041, 2007.
[3] X. Su, X. Wang, Z. Wang, and Y. Xiao, "A New Fuzzy Clustering Algorithm Based on Entropy Weighting," Journal of Computational Information Systems, vol. 6(10), pp. 3319-3326, 2010.
[4] J. Li, X. Gao, and X. Jiao, "A new feature weighted fuzzy clustering algorithm," Acta Electronica Sinica, vol. 34(1), pp. 89-92, 2006.
[5] N. B. Karayiannis, "MECA: maximum entropy clustering algorithm," In Proceedings of the Third IEEE Conference on Fuzzy Systems, vol. 1, pp. 630-635, 1994.
[6] R. P. Li and M. Mukaidono, "A maximum entropy approach to fuzzy clustering," In Proceedings of the Fourth IEEE International Conference on Fuzzy Systems, Yokohama, Japan, pp. 2227-2232, 1995.
[7] D. Tran and M. Wagner, "Fuzzy entropy clustering," In Proceedings of the Ninth IEEE International Conference on Fuzzy Systems, vol. 1, pp. 152-157, 2000.
[8] C. Wei and C. Fahn, "The multisynapse neural network and its application to fuzzy clustering," IEEE Transactions on Neural Networks, vol. 13(3), pp. 600-618, 2002.
[9] K. Li, H. Y. Ma, and Y. Wang, "Unified model of fuzzy clustering algorithm based on entropy and its application to image segmentation," Journal of Computational Information Systems, vol. 7(15), pp. 5476-5483, 2011.
[10] C. L. Blake and C. J. Merz, "UCI Repository of Machine Learning Databases," Irvine, CA: University of California, Department of Information and Computer Sciences, http://www.ics.uci.edu/mlearn/MLRepository.html, 1998.

AUTHORS
Kai Li received the B.S. and M.S. degrees from the Mathematics Department and the Electrical Engineering Department of Hebei University, Baoding, China, in 1982 and 1992, respectively, and the Ph.D. degree from Beijing Jiaotong University, Beijing, China, in 2001. He is currently a Professor in the School of Mathematics and Computer Science, Hebei University. His current research interests include machine learning, data mining, computational intelligence, and pattern recognition.

Lijuan Cui received the B.S. degree from the Education Department of Hebei University, Baoding, China, in 2007. She is currently an associate professor with the Library of Hebei University. Her current research interests include data mining and information retrieval.

Xiuchen Ye received the B.S. degree from the Electrical Engineering Department of Hebei University, Baoding, China, in 1987. He is currently an associate professor with the Computer Center of Hebei University. His current research interests include data mining and machine learning.