International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org
Volume 3, Issue 5, May 2014
ISSN 2319-4847
An Adaptive Fuzzy Clustering Algorithm with Generalized Entropy Based on Weighted Sample
Kai Li 1, Lijuan Cui 2 and Xiuchen Ye 3
1 School of Mathematics and Computer, Hebei University, Baoding 071002, China
2 Library, Hebei University, Baoding 071002, China
3 Center of Computer, Hebei University, Baoding 071002, China
Abstract
Aiming at fuzzy clustering with generalized entropy, an adaptive fuzzy clustering algorithm with generalized entropy based on weighted samples is presented. First, sample weights are introduced into the objective function for fuzzy clustering with generalized entropy. From this, we obtain the optimization problem for fuzzy clustering with generalized entropy based on weighted samples. Then, we use the Lagrange multiplier method to solve the corresponding optimization problem and obtain the degree of membership of each sample in each cluster, the cluster centers, and the sample weights. Finally, we select some representative datasets from the UCI repository to conduct experiments. Experimental results show the effectiveness of the presented algorithms.
Keywords: fuzzy clustering, generalized entropy, weighted sample, adaptive method
1. INTRODUCTION
Clustering is an important data analysis method and has been applied to pattern recognition, data mining, and other fields. Up to now, researchers have proposed many different clustering algorithms. Among them, partition-based cluster analysis (also called objective-function-based cluster analysis) is one of the most commonly used approaches, exemplified by K-means and Fuzzy C-means. However, these clustering algorithms treat all data points or data attributes as equally important. To address this problem, researchers have proposed many improved algorithms. Huang et al. [1] introduced variable weights into the k-means clustering process and presented a k-means-type clustering algorithm that can automatically calculate variable weights. Jing et al. [2] included a weight entropy term in the objective function to extend the k-means clustering process; they calculate a weight for each dimension in each cluster and use the weight values to identify the subsets of important dimensions that characterize different clusters. To reduce the FCM algorithm's dependence on the initial cluster centers and to handle noise, Su et al. [3] introduced a weighting parameter that adjusts the location of the cluster centers. To account for the particular contributions of different features, Li et al. [4] presented a new feature-weighted fuzzy clustering algorithm. In addition, Karayiannis [5] introduced entropy into fuzzy clustering and proposed a fuzzy clustering algorithm based on maximum entropy. Following that, Li et al. [6] and Tran and Wagner [7] combined the loss function from data samples to cluster centers to propose maximum entropy clustering algorithms. Wei et al. [8] presented a bidirectional associative fuzzy clustering network to solve the fuzzy clustering problem. In this paper, an adaptive fuzzy clustering algorithm with generalized entropy based on weighted samples is studied. In the clustering process, the sample weights are updated along with the degrees of membership and the cluster centers.
This paper is organized as follows. In Section 2, we give an objective function over weighted samples for fuzzy clustering with generalized entropy and use the Lagrange method to obtain the sample memberships and cluster centers. In Section 3, an adaptive fuzzy clustering algorithm with generalized entropy based on weighted samples is given. In Section 4, we choose commonly used datasets from the UCI repository to test the presented algorithms' performance. In the final section, conclusions are given.
2. FUZZY CLUSTERING WITH GENERALIZED ENTROPY BASED ON WEIGHTED SAMPLE
Let X = {x_1, x_2, ..., x_n} be a data set, where x_j \in R^s; c is a positive integer greater than one and m > 1 is the fuzzy index; \mu_{ij} = \mu_i(x_j) \ge 0 is the degree of membership of x_j in the ith cluster with center v_i, and \sum_{i=1}^{c} \mu_{ij} = 1. U is the membership matrix composed of all \mu_{ij} (i = 1, 2, ..., c; j = 1, 2, ..., n); V is the vector whose components are the cluster centers v_i (i = 1, 2, ..., c). The objective function for fuzzy clustering with generalized entropy is represented as

J_G(U, V) = \sum_{j=1}^{n} \sum_{i=1}^{c} \mu_{ij}^{m} \|x_j - v_i\|^2 + \delta (2^{1-m} - 1)^{-1} \sum_{j=1}^{n} \Big( \sum_{i=1}^{c} \mu_{ij}^{m} - 1 \Big).    (1)
According to (1), the Lagrange multiplier method yields the GEFCM algorithm [9]. As the samples in a data set have different importance in the clustering process, sample weights are introduced into objective function (1). Thus, we obtain the following objective function
J_WG(U, V) = \sum_{j=1}^{n} \sum_{i=1}^{c} w_j \mu_{ij}^{m} \|x_j - v_i\|^2 + \delta (2^{1-m} - 1)^{-1} \sum_{j=1}^{n} \Big( \sum_{i=1}^{c} \mu_{ij}^{m} - 1 \Big),    (2)
where δ is an adjustable parameter and wj is the weight of the j-th data sample.
Based on objective function (2) and constrained condition, we obtain the following optimization problem for fuzzy
clustering with generalized entropy based on weighted sample
min J_WG(U, V)
s.t. \sum_{i=1}^{c} \mu_{ij} = 1, \quad j = 1, 2, ..., n.    (3)
In the following, we use the Lagrange multiplier method to solve optimization problem (3). The Lagrange function L corresponding to (3) is written as
L(U, V; \lambda_1, ..., \lambda_n) = \sum_{j=1}^{n} \sum_{i=1}^{c} w_j \mu_{ij}^{m} \|x_j - v_i\|^2 + \delta (2^{1-m} - 1)^{-1} \sum_{j=1}^{n} \Big( \sum_{i=1}^{c} \mu_{ij}^{m} - 1 \Big) + \lambda_1 \Big( \sum_{i=1}^{c} \mu_{i1} - 1 \Big) + \lambda_2 \Big( \sum_{i=1}^{c} \mu_{i2} - 1 \Big) + ... + \lambda_n \Big( \sum_{i=1}^{c} \mu_{in} - 1 \Big).
Here, we take the derivatives of L with respect to \lambda_j, \mu_{ij} and v_i and set them equal to zero, namely

\partial L / \partial \lambda_j = \sum_{i=1}^{c} \mu_{ij} - 1 = 0,    (4)

\partial L / \partial \mu_{ij} = w_j m \mu_{ij}^{m-1} \|x_j - v_i\|^2 + \delta m (2^{1-m} - 1)^{-1} \mu_{ij}^{m-1} + \lambda_j = 0,    (5)

\partial L / \partial v_i = -2 \sum_{j=1}^{n} w_j \mu_{ij}^{m} (x_j - v_i) = 0.    (6)
From (4), (5) and (6), using simple algebra, we obtain the following degree of membership for each sample and the cluster centers:

\mu_{ij} = \frac{\big[ w_j \|x_j - v_i\|^2 + \delta (2^{1-m} - 1)^{-1} \big]^{-\frac{1}{m-1}}}{\sum_{k=1}^{c} \big[ w_j \|x_j - v_k\|^2 + \delta (2^{1-m} - 1)^{-1} \big]^{-\frac{1}{m-1}}}, \quad i = 1, 2, ..., c; \; j = 1, 2, ..., n,    (7)

v_i = \frac{\sum_{j=1}^{n} w_j \mu_{ij}^{m} x_j}{\sum_{j=1}^{n} w_j \mu_{ij}^{m}}, \quad i = 1, 2, ..., c.    (8)
Note that in this setting the weight w_j of each sample is determined by some fixed method. Now, by iterating (7) and (8), we obtain the fuzzy clustering algorithm with generalized entropy based on weighted samples, which we call the WGEFCM algorithm.
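As an illustration, the coupled updates (7) and (8) can be sketched in NumPy as follows; the function name, array layout and default parameter values are our own assumptions for this sketch, not taken from the paper.

```python
import numpy as np

def wgefcm_step(X, V, w, m=2.0, delta=-0.01):
    """One WGEFCM iteration: memberships via Eq. (7), centers via Eq. (8).
    X: (n, s) samples; V: (c, s) centers; w: (n,) fixed sample weights.
    Names and defaults are illustrative assumptions, not from the paper."""
    # d2[i, j] = ||x_j - v_i||^2
    d2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    # delta * (2^{1-m} - 1)^{-1}; positive when m > 1 and delta < 0, as in the experiments
    ent = delta / (2.0 ** (1.0 - m) - 1.0)
    base = (w[None, :] * d2 + ent) ** (-1.0 / (m - 1.0))
    U = base / base.sum(axis=0, keepdims=True)        # Eq. (7): normalize over clusters
    wu = w[None, :] * U ** m
    V_new = (wu @ X) / wu.sum(axis=1, keepdims=True)  # Eq. (8): weighted mean update
    return U, V_new
```

Each column of U sums to one by construction, matching the constraint in (3).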
3. ADAPTIVE FUZZY CLUSTERING WITH GENERALIZED ENTROPY BASED ON WEIGHTED SAMPLE
To further improve performance of clustering, we modify (2) to obtain the following objective function
J_AWG(U, V) = \sum_{j=1}^{n} \sum_{i=1}^{c} w_j^{\beta} \mu_{ij}^{m} \|x_j - v_i\|^2 + \delta (2^{1-m} - 1)^{-1} \sum_{j=1}^{n} \Big( \sum_{i=1}^{c} \mu_{ij}^{m} - 1 \Big),    (9)
where β is a parameter.
Based on objective function (9), we obtain the following optimization problem for fuzzy clustering with generalized entropy based on weighted samples

min J_AWG(U, V)
s.t. \sum_{i=1}^{c} \mu_{ij} = 1, \quad j = 1, 2, ..., n; \qquad \sum_{j=1}^{n} w_j = 1.    (10)
For optimization problem (10), we use the Lagrange multiplier approach to solve for the degrees of membership, the cluster centers and the sample weights. The Lagrange function L corresponding to (10) is written as

L(U, V; \lambda_1, ..., \lambda_n, \gamma) = \sum_{j=1}^{n} \sum_{i=1}^{c} w_j^{\beta} \mu_{ij}^{m} \|x_j - v_i\|^2 + \delta (2^{1-m} - 1)^{-1} \sum_{j=1}^{n} \Big( \sum_{i=1}^{c} \mu_{ij}^{m} - 1 \Big) + \lambda_1 \Big( \sum_{i=1}^{c} \mu_{i1} - 1 \Big) + ... + \lambda_n \Big( \sum_{i=1}^{c} \mu_{in} - 1 \Big) + \gamma \Big( \sum_{j=1}^{n} w_j - 1 \Big).
Here, we take the derivatives of L with respect to \lambda_j, \mu_{ij}, v_i, w_j and \gamma, and set them equal to zero, namely

\partial L / \partial \lambda_j = \sum_{i=1}^{c} \mu_{ij} - 1 = 0,    (11)

\partial L / \partial \mu_{ij} = w_j^{\beta} m \mu_{ij}^{m-1} \|x_j - v_i\|^2 + \delta m (2^{1-m} - 1)^{-1} \mu_{ij}^{m-1} + \lambda_j = 0,    (12)

\partial L / \partial v_i = -2 \sum_{j=1}^{n} w_j^{\beta} \mu_{ij}^{m} (x_j - v_i) = 0,    (13)

\partial L / \partial w_j = \beta w_j^{\beta - 1} \sum_{i=1}^{c} \mu_{ij}^{m} \|x_j - v_i\|^2 + \gamma = 0,    (14)

\partial L / \partial \gamma = \sum_{j=1}^{n} w_j - 1 = 0.    (15)
From (11)-(15), using simple algebra, we obtain the following degree of membership for each sample, cluster centers and sample weights:

\mu_{ij} = \frac{\big[ w_j^{\beta} \|x_j - v_i\|^2 + \delta (2^{1-m} - 1)^{-1} \big]^{-\frac{1}{m-1}}}{\sum_{k=1}^{c} \big[ w_j^{\beta} \|x_j - v_k\|^2 + \delta (2^{1-m} - 1)^{-1} \big]^{-\frac{1}{m-1}}}, \quad i = 1, 2, ..., c; \; j = 1, 2, ..., n,    (16)

v_i = \frac{\sum_{j=1}^{n} w_j^{\beta} \mu_{ij}^{m} x_j}{\sum_{j=1}^{n} w_j^{\beta} \mu_{ij}^{m}}, \quad i = 1, 2, ..., c,    (17)

w_j = \frac{\big( \sum_{i=1}^{c} \mu_{ij}^{m} \|x_j - v_i\|^2 \big)^{\frac{1}{1-\beta}}}{\sum_{l=1}^{n} \big( \sum_{i=1}^{c} \mu_{il}^{m} \|x_l - v_i\|^2 \big)^{\frac{1}{1-\beta}}}, \quad j = 1, 2, ..., n.    (18)
In the following, we give the adaptive fuzzy clustering algorithm with generalized entropy based on weighted samples, which we name the AWGEFCM algorithm.
Step 1 Initialize c centers of clusters and weights of samples, and assign m, β and δ.
Step 2 Compute degree of membership μij for each sample according to (16).
Step 3 Compute center of cluster vi for each cluster according to (17).
Step 4 Calculate weights of samples wj according to (18).
Step 5 Repeat Steps 2 through 4 until the cluster centers v_i no longer change.
4. EXPERIMENTS
In order to verify the effectiveness of the proposed algorithm AWGEFCM, we select five datasets from the UCI data repository [10]. In addition, we choose two indexes to evaluate clustering performance: accuracy (ACC) and mutual information (MI), where ACC = (n - err) / n × 100%, n is the number of samples in dataset X and err is the number of misclassified samples. Moreover, in the following experiments we initialize the cluster centers by random selection from the dataset.
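As a concrete reading of the ACC index, the sketch below computes (n - err) / n × 100%, counting err under the best one-to-one mapping between cluster labels and class labels; the brute-force search over label permutations is our own implementation choice (practical only for small c), as the paper does not specify the matching procedure.

```python
from itertools import permutations

def clustering_accuracy(true_labels, cluster_labels, c):
    """ACC = (n - err) / n * 100%, where err is the number of misclassified
    samples under the best one-to-one mapping of cluster ids to class ids."""
    n = len(true_labels)
    best = 0
    for perm in permutations(range(c)):
        # perm[p] is the class assigned to cluster p under this mapping
        hits = sum(1 for t, p in zip(true_labels, cluster_labels) if t == perm[p])
        best = max(best, hits)
    err = n - best
    return (n - err) / n * 100.0
```

For example, a clustering that exactly inverts the two class labels still scores 100%, since cluster labels are arbitrary.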
Firstly, we choose dataset Iris to conduct a detailed experimental study. In the experiment, parameter δ is fixed at -0.01; fuzzy index m is taken as 1.1, 1.5, 2, 2.5, 3, 5, 7, 9 and 11, and β is taken as 5, 10, 20, 50 and 100.
respectively. Aimed at algorithms AWGEFCM, experimental results are seen in Figure 1, where (a), (c), (e), (g) and (i)
Volume 3, Issue 5, May 2014
Page 139
International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org
Volume 3, Issue 5, May 2014
ISSN 2319 - 4847
1
0.9
0.8
0.7
0.6
0.5
0.8
0.78
MI
Accuracy
are relation between fuzzy index m and accuracy whereas (b), (d), (f), (h) and (j) are relation between fuzzy index m and
mutual information (MI).
0.74
0.72
1.1 1.5
2
2.5 3 5 7
Fuzzy index m
(a) β=5
9
11
1.1 1.5 2 2.5 3 5
Fuzzy index m
(b) β=5
0.92
MI
Accuracy
0.94
0.9
0.88
0.86
9
1.1 1.5
9
11
2 2.5 3 5
Fuzzy index m
7
9
11
7
9
11
7
9
11
7
9
11
(d) β=10
0.94
0.8
0.92
0.78
0.9
0.76
0.74
0.88
0.72
0.86
1.1 1.5 2 2.5 3 5 7
Fuzzy index m
(e) β=20
9
1.1 1.5
11
MI
0.92
0.9
0.88
0.86
1.1 1.5 2 2.5 3 5 7
Fuzzy index m
(g) β=50
9
0.8
0.78
0.76
0.74
0.72
0.7
1.1 1.5
11
2 2.5 3 5
Fuzzy index m
(h) β=20
0.94
MI
0.92
0.9
0.88
0.86
1.1 1.5 2 2.5 3 5 7
Fuzzy index m
(i) β=100
2 2.5 3 5
Fuzzy index m
(f) β=20
0.94
Accuracy
7
0.8
0.78
0.76
0.74
0.72
0.7
11
MI
Accuracy
1.1 1.5 2 2.5 3 5 7
Fuzzy index m
(c) β=10
Accuracy
0.76
9
11
0.8
0.78
0.76
0.74
0.72
0.7
1.1 1.5
2 2.5 3 5
Fuzzy index m
(j) β=100
Figure 1 Clustering performance of algorithm AWGEFCM: (a), (c), (e), (g) and (i) show the relation between fuzzy index m and accuracy, whereas (b), (d), (f), (h) and (j) show the relation between fuzzy index m and mutual information (MI).
It can be seen that when β = 5, the clustering performance of algorithm AWGEFCM remains approximately stable as fuzzy index m changes. However, when β is taken as 10, 20, 50 and 100, the clustering performance varies considerably; in particular, when fuzzy index m is taken as 7, 9 and 11, we obtain better clustering results.
Secondly, for datasets Australian, Breast-w, Heart and Ionosphere, we fix fuzzy index m at 3 and δ at -0.1 or -200, and let β take the values 5, 10, 20, 50, 100 and 1000. Experimental results are shown in Figure 2.
[Figure 2 plots omitted: accuracy versus β (5, 10, 20, 50, 100, 1000) for each of the four datasets.]
Figure 2 Accuracy on different datasets using algorithm AWGEFCM
From Figure 2, we see that for each dataset the clustering performance differs as β takes different values.
Besides, we also compare the clustering performance of different algorithms on datasets Australian, Breast-w, Heart, Ionosphere and Iris. The selected clustering algorithms include FCM, GEFCM [9] and WGEFCM. Experimental results are given in Table 1. It is seen that the presented algorithm AWGEFCM obtains better clustering results than FCM, GEFCM and WGEFCM on some datasets.
Table 1: Comparison results (accuracy) of different algorithms

Dataset      FCM      GEFCM    WGEFCM   AWGEFCM
Australian   56.09%   56.09%   56.67%   60.14%
Breast-w     95.59%   95.61%   95.88%   97.5%
Heart        59.25%   59.26%   60.74%   60.74%
Ionosphere   70.94%   71.23%   71.23%   71.23%
Iris         89.33%   92.67%   91.33%   92.67%
5. CONCLUSIONS
In this paper, we study fuzzy clustering with sample weights based on generalized entropy. The objective function for the fuzzy clustering method with generalized entropy based on weighted samples is obtained. Then, an adaptive fuzzy clustering algorithm with generalized entropy based on weighted samples is presented. We select some representative datasets to conduct an experimental study. Experimental results show that the presented algorithm is effective.
Acknowledgment
This work is supported by the Natural Science Foundation of China (No. 61375075) and the Natural Science Foundation of Hebei Province (No. F2012201014).
REFERENCES
[1] J.Z. Huang, K.N. Michael, H. Rong, and Z. Li, “Automated Variable Weighting in k-Means Type Clustering,”
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27( 5),pp. 657-668 , 2005.
[2] L. Jing, K.N. Michael, and J.Z. Huang, "An Entropy Weighting k-Means Algorithm for Subspace Clustering of High-Dimensional Sparse Data," IEEE Transactions on Knowledge and Data Engineering, vol. 19(8), pp. 1026-1041, 2007.
[3] X. Su, X. Wang, Z. Wang, Y. Xiao, “A New Fuzzy Clustering Algorithm Based on Entropy Weighting,” Journal of
Computational Information Systems, vol. 6(10), pp. 3319-3326,2010.
[4] J. Li, X. Gao, X. Jiao, “A new feature weighted fuzzy clustering algorithm,” Acta electronica sinica, vol. 34(1), pp.
89-92, 2006.
[5] N.B. Karayiannis, "MECA: maximum entropy clustering algorithm," In Proceedings of the Third IEEE Conference on Fuzzy Systems, vol. 1, pp. 630-635, 1994.
[6] R.P. Li, M. Mukaidono, "A maximum entropy approach to fuzzy clustering," In Proceedings of the Fourth IEEE International Conference on Fuzzy Systems, Yokohama, Japan, pp. 2227-2232, 1995.
[7] D. Tran and M.Wagner, “Fuzzy entropy clustering,” In Proceedings of the Ninth IEEE International Conference on
Fuzzy Systems, vol. 1, pp. 152-157,2000.
[8] C. Wei, C.Fahn, “The multisynapse neural network and its application to fuzzy clustering,” IEEE Transactions on
Neural Networks, vol.13(3), pp. 600-618, 2002.
[9] K. Li, H. Y. Ma and Y. Wang, “Unified model of fuzzy clustering algorithm based on entropy and its application to
image segmentation,” Journal of Computational Information Systems, vol. 7(15), pp. 5476-5483,2011.
[10] C.L. Blake, C.J. Merz, "UCI Repository of Machine Learning Databases," Irvine, CA: University of California, Department of Information and Computer Sciences, http://www.ics.uci.edu/mlearn/MLRepository.html, 1998.
AUTHORS
Kai Li received the B.S. and M.S. degrees from the Mathematics Department and the Electrical Engineering Department of Hebei University, Baoding, China, in 1982 and 1992, respectively. He received the Ph.D. degree from Beijing Jiaotong University, Beijing, China, in 2001.
He is currently a Professor in the School of Mathematics and Computer Science, Hebei University. His current research interests include machine learning, data mining, computational intelligence, and pattern recognition.
Lijuan Cui received the B.S. degree from the Education Department of Hebei University, Baoding, China, in 2007. She is currently an associate professor with the Library of Hebei University. Her current research interests include data mining and information retrieval.
Xiuchen Ye received the B.S. degree from the Electrical Engineering Department of Hebei University, Baoding, China, in 1987. He is currently an associate professor with the Center of Computer of Hebei University. His current research interests include data mining and machine learning.