PARTITIONAL
CLUSTERING
Deniz ÜSTÜN
CONTENT

WHAT IS CLUSTERING ?

WHAT IS PARTITIONAL CLUSTERING ?

ALGORITHMS USED IN PARTITIONAL CLUSTERING
What is Clustering ?


Clustering is the process of classifying objects that are similar to one
another and organizing the data into groups.
Clustering techniques are among the unsupervised learning methods.
What is Partitional Clustering ?




The Partitional Clustering algorithms separate similar
objects into clusters.
The Partitional Clustering algorithms are successful at determining
center-based clusters.
The Partitional Clustering algorithms divide n objects into k
clusters, where k is a user-given parameter.
Partitional Clustering techniques start from a
randomly chosen clustering and then optimize that clustering
according to some accuracy measure.
Algorithms Used in Partitional
Clustering

K-MEANS ALGORITHM

K-MEDOIDS ALGORITHM

FUZZY C-MEANS ALGORITHM
K-MEANS ALGORITHM


The K-MEANS algorithm was introduced by J.B. MacQueen in 1967 as one of
the simplest unsupervised learning algorithms that solve clustering
problems (MacQueen, 1967).
The K-MEANS algorithm allows each data point to belong to only one
cluster.

Therefore, this algorithm is a hard (crisp) clustering algorithm.

Suppose N samples are given in an n-dimensional space.
K-MEANS ALGORITHM


This space is partitioned into K clusters, {C1, C2, ..., CK}.
The mean vector M_k of the cluster C_k is given by (Kantardzic, 2003):

    M_k = (1 / n_k) * Σ_{i=1}^{n_k} X_ik

where X_ik is the i-th sample belonging to C_k.
The square-error for C_k is given by:

    e_k^2 = Σ_{i=1}^{n_k} (X_ik - M_k)^2
K-MEANS ALGORITHM


The square-error e_k^2 for C_k is called the within-cluster variation.
The square-error for all the clusters is the sum of the within-cluster
variations:

    E_K^2 = Σ_{k=1}^{K} e_k^2

The aim of the square-error method is to find, for the given value of K,
the K clusters that minimize the value of E_K^2.
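The procedure described above (random start, assign each point to the nearest mean M_k, recompute the means, repeat) can be sketched in pure Python. This is a minimal illustration, not the lecture's own code; the function name, seed, and iteration cap are assumptions.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-MEANS sketch: random initial centroids, then alternating
    assignment and mean-update steps until membership stops changing."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # randomly chosen initial clustering
    assign = None
    for _ in range(max_iter):
        # assignment step: each point joins the cluster of its nearest centroid
        new_assign = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_assign == assign:               # no membership change: converged
            break
        assign = new_assign
        # update step: M_k becomes the mean of the points now in cluster k
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = tuple(sum(coord) / len(members)
                                     for coord in zip(*members))
    return centroids, assign
```

Running it on the three observations used in the example below, with K = 2, converges to the centroids (2.5, 2.5) and (7, 8).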
K-MEANS ALGORITHM
EXAMPLE
Observations   Variable 1   Variable 2   Cluster Membership
X1             3            2            C1
X2             2            3            C2
X3             7            8            C1

    M1 = ((3 + 7) / 2, (2 + 8) / 2) = (5, 5)
    M2 = (2 / 1, 3 / 1) = (2, 3)

[Scatter plot of X1, X2, X3 and the centroids M1, M2 on a 0-10 grid.]
K-MEANS ALGORITHM
EXAMPLE


e1^2 = (3 - 5)^2 + (2 - 5)^2 + (7 - 5)^2 + (8 - 5)^2 = 26
e2^2 = (2 - 2)^2 + (3 - 3)^2 = 0

E^2 = e1^2 + e2^2 = 26 + 0 = 26
K-MEANS ALGORITHM
EXAMPLE
d M 1 , X 1  
5  32  5  22
 2,82
d M 2 , X 1  
2  32  3  22
 1,41
d M 1 , X 2  
5  22  5  32
 3,60
d M 2 , X 2  
2  22  3  32
0
d M 1 , X 3  
5  72  5  82
 3,60
d M 2 , X 3  
2  7 2  3  82
 7,07
Gözlemler d(M1) d(M2)
Küme Üyeliği
X1
2,82
1,41
C2
X2
3,60
0
C2
X3
3,60
7,07
C1
d M 2 , X1   d M1 , X1 
K-MEANS ALGORITHM
EXAMPLE
Observations   Variable 1   Variable 2   Cluster Membership
X1             3            2            C2
X2             2            3            C2
X3             7            8            C1

    M1 = (7 / 1, 8 / 1) = (7, 8)
    M2 = ((3 + 2) / 2, (2 + 3) / 2) = (2.5, 2.5)

[Scatter plot of the three points and the updated centroids M1, M2 on a 0-10 grid.]
K-MEANS ALGORITHM
EXAMPLE



e1^2 = (7 - 7)^2 + (8 - 8)^2 = 0
e2^2 = (3 - 2.5)^2 + (2 - 2.5)^2 + (2 - 2.5)^2 + (3 - 2.5)^2 = 1

E^2 = e1^2 + e2^2 = 0 + 1 = 1
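The second-pass error figures can be checked numerically with a few lines of Python (a minimal sketch; the helper name is illustrative):

```python
def sq_error(cluster, centroid):
    """Within-cluster squared error: sum of squared componentwise
    deviations of each point from the cluster centroid."""
    return sum((x - m) ** 2 for p in cluster for x, m in zip(p, centroid))

# second-pass clusters from the example: C1 = {X3}, C2 = {X1, X2}
e1 = sq_error([(7, 8)], (7, 8))               # 0
e2 = sq_error([(3, 2), (2, 3)], (2.5, 2.5))   # 4 * 0.5^2 = 1.0
print(e1, e2, e1 + e2)
```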
K-MEANS ALGORITHM
EXAMPLE

d(M1, X1) = sqrt((7 - 3)^2 + (8 - 2)^2) ≈ 7.21
d(M2, X1) = sqrt((2.5 - 3)^2 + (2.5 - 2)^2) ≈ 0.71
d(M1, X2) = sqrt((7 - 2)^2 + (8 - 3)^2) ≈ 7.07
d(M2, X2) = sqrt((2.5 - 2)^2 + (2.5 - 3)^2) ≈ 0.71
d(M1, X3) = sqrt((7 - 7)^2 + (8 - 8)^2) = 0
d(M2, X3) = sqrt((2.5 - 7)^2 + (2.5 - 8)^2) ≈ 7.11

Observations   d(M1)   d(M2)   Cluster Membership
X1             7.21    0.71    C2
X2             7.07    0.71    C2
X3             0       7.11    C1

No membership changes, so the algorithm stops.

[Scatter plot of the final clusters C1 = {X3} and C2 = {X1, X2}.]
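The convergence check can be reproduced in a few lines of Python, using the centroid values from the example (a minimal sketch, not the lecture's code):

```python
import math

# centroids after the second pass of the worked example
M1, M2 = (7, 8), (2.5, 2.5)
points = {"X1": (3, 2), "X2": (2, 3), "X3": (7, 8)}

for name, p in points.items():
    d1, d2 = math.dist(M1, p), math.dist(M2, p)
    # every point keeps its cluster, so K-MEANS stops here
    print(f"{name}: d(M1)={d1:.2f}, d(M2)={d2:.2f} -> {'C1' if d1 < d2 else 'C2'}")
```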
K-MEANS ALGORITHM
EXAMPLE-2
Dataset     Number of Instances   Number of Features   Number of Classes
Synthetic   1200                  2                    4

[Scatter plot of the synthetic dataset on the unit square.]
K-MEANS ALGORITHM
EXAMPLE-2
K = 2

[Scatter plot of the K-MEANS clustering result for K = 2.]
K-MEANS ALGORITHM
EXAMPLE-2
K = 3

[Scatter plot of the K-MEANS clustering result for K = 3.]
K-MEANS ALGORITHM
EXAMPLE-2
K = 4

[Scatter plot of the K-MEANS clustering result for K = 4.]
K-MEDOIDS ALGORITHM



The aim of the K-MEDOIDS algorithm is to find K
representative objects, called medoids (Kaufman and Rousseeuw, 1987).
Each cluster in the K-MEDOIDS algorithm is represented by
one of the objects in the cluster.
The K-MEANS algorithm determines the clusters by taking means,
whereas the K-MEDOIDS algorithm finds each cluster's most
centrally located object:

    e_k^2 = Σ_{i=1}^{n_k} (X_ik - O_k)^2

where O_k is the medoid of cluster C_k.
K-MEDOIDS ALGORITHM
EXAMPLE-1

1. Select K medoids at random.
2. Allocate each point to the closest medoid.
3. Determine the new medoid of each cluster.
4. Re-allocate each point to the closest medoid.
5. Stop when the medoids no longer change.

[Sequence of scatter plots illustrating these steps.]
K-MEDOIDS ALGORITHM
EXAMPLE-2
Dataset     Number of Instances   Number of Features   Number of Classes
Synthetic   2000                  2                    3

[Scatter plot of the synthetic dataset on the unit square.]
K-MEDOIDS ALGORITHM
EXAMPLE-2
K = 2

[Scatter plot of the K-MEDOIDS clustering result for K = 2.]
K-MEDOIDS ALGORITHM
EXAMPLE-2
K = 3

[Scatter plot of the K-MEDOIDS clustering result for K = 3.]
FUZZY C-MEANS ALGORITHM






The Fuzzy C-MEANS algorithm is one of the best known and most widely
used fuzzy clustering methods.
The Fuzzy C-MEANS algorithm was introduced by Dunn in 1973
and improved by Bezdek in 1981 (Höppner et al., 2000).
Fuzzy C-MEANS lets an object belong to two or
more clusters.
The total membership value of a data point over all the
clusters is equal to one.
However, the membership value for the cluster that best fits
the object is higher than for the other clusters.
The algorithm is based on the least-squares method (Höppner
et al., 2000).
FUZZY C-MEANS ALGORITHM
The objective function is:

    J_m = Σ_{i=1}^{N} Σ_{j=1}^{C} u_ij^m ||x_i - c_j||^2,   1 ≤ m < ∞

The algorithm starts from a randomly chosen membership matrix
(U) and then calculates the center vectors (Höppner et al., 2000):

    c_j = ( Σ_{i=1}^{N} u_ij^m x_i ) / ( Σ_{i=1}^{N} u_ij^m )
FUZZY C-MEANS ALGORITHM
According to the calculated center vectors, the membership
matrix (U) is updated as:

    u_ij = 1 / Σ_{k=1}^{C} ( ||x_i - c_j|| / ||x_i - c_k|| )^(2 / (m - 1))

The new membership matrix (U_new) is compared with the old
membership matrix (U_old), and the process continues until
the difference is smaller than the value of ε.
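The two update equations can be turned into a short pure-Python sketch. This is an illustrative implementation under the stated stopping rule; the function name, defaults, and the small floor on distances (to avoid division by zero when a point coincides with a centre) are assumptions.

```python
import math
import random

def fuzzy_cmeans(points, c, m=2.0, eps=1e-6, max_iter=100, seed=0):
    """Fuzzy C-MEANS sketch: start from a random membership matrix U,
    then alternate the centre and membership updates until U changes
    by less than eps."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # random memberships, each row normalised to sum to one
    u = [[rng.random() for _ in range(c)] for _ in range(n)]
    u = [[v / sum(row) for v in row] for row in u]
    for _ in range(max_iter):
        # centre update: c_j = sum_i u_ij^m x_i / sum_i u_ij^m
        centres = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            centres.append(tuple(
                sum(wi * p[d] for wi, p in zip(w, points)) / sum(w)
                for d in range(dim)))
        # membership update: u_ij = 1 / sum_k (||x_i-c_j|| / ||x_i-c_k||)^(2/(m-1))
        new_u = []
        for p in points:
            dist = [max(math.dist(p, cj), 1e-12) for cj in centres]
            new_u.append([
                1.0 / sum((dist[j] / dist[k]) ** (2.0 / (m - 1))
                          for k in range(c))
                for j in range(c)])
        diff = max(abs(a - b) for ru, rv in zip(new_u, u) for a, b in zip(ru, rv))
        u = new_u
        if diff < eps:             # U barely changed: converged
            break
    return centres, u
```

Each row of the returned membership matrix sums to one, matching the constraint stated above.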
FUZZY C-MEANS ALGORITHM
EXAMPLE
Dataset     Number of Instances   Number of Features   Number of Classes
Synthetic   2000                  2                    3

[Scatter plot of the synthetic dataset on the unit square.]
FUZZY C-MEANS ALGORITHM
EXAMPLE
C = 3, m = 5, ε = 1e-6

[Scatter plot of the FUZZY C-MEANS clustering result.]
Results





K-MEDOIDS gives the best clustering quality compared with K-MEANS
and FUZZY C-MEANS.
However, the K-MEDOIDS algorithm is suitable only for small
datasets.
The K-MEANS algorithm is the most appropriate in terms of
running time.
In the FUZZY C-MEANS algorithm, an object can belong to one
or more clusters.
However, an object can belong to only one cluster in the other
two algorithms.
References






[MacQueen, 1967] J.B. MacQueen, “Some Methods for Classification and Analysis of
Multivariate Observations”, Proc. Symp. Math. Statist. and Probability (5th), 281-297, (1967).
[Kantardzic, 2003] M. Kantardzic, “Data Mining: Concepts, Methods and Algorithms”, Wiley,
(2003).
[Kaufman and Rousseeuw, 1987] L. Kaufman, P.J. Rousseeuw, “Clustering by Means of
Medoids”, Statistical Data Analysis Based on the L1-Norm and Related Methods, edited by
Y. Dodge, North-Holland, 405-416, (1987).
[Kaufman and Rousseeuw, 1990] L. Kaufman, P.J. Rousseeuw, “Finding Groups in Data:
An Introduction to Cluster Analysis”, John Wiley and Sons, (1990).
[Höppner et al., 2000] F. Höppner, F. Klawonn, R. Kruse, T. Runkler, “Fuzzy Cluster
Analysis”, John Wiley & Sons, Chichester, (2000).
[Işık and Çamurcu, 2007] M. Işık, A.Y. Çamurcu, “K-MEANS, K-MEDOIDS ve Bulanık C-MEANS
Algoritmalarının Uygulamalı olarak Performanslarının Tespiti”, İstanbul Ticaret
Üniversitesi Fen Bilimleri Dergisi, No. 11, 31-45, (2007).