Clustering Validity

Adriano Joaquim de O Cruz ©2006
NCE/UFRJ
adriano@nce.ufrj.br
Clustering Validity
• The number of clusters is not always known in advance.
• In many problems the number of classes is known, but that number may not give the best configuration.
• It is therefore necessary to study methods that indicate and/or validate the number of classes.
*@2006 Adriano Cruz
*NCE e IM - UFRJ
Cluster ‹#›
Clustering Validity Example 1
• Consider the problem of digit recognition.
• It is known that there are 10 classes (10 digits).
• The number of clusters, however, may be greater than 10.
• This is the result of different handwritings of the same digit.
Clustering Validity Example 2
• Consider the problem of segmenting a thermal image of a room.
• It is known that there are 2 temperature classes: body temperature and room temperature.
• This is a problem where the number of classes is well defined.
Clustering Validity Problem
• First, the data is partitioned into different numbers of clusters.
• It is also important to try different initial conditions for the same number of partitions.
• Validity measures are applied to these partitions to estimate their quality.
• Quality must be estimated both when the number of partitions changes and, for the same number, when the initial conditions differ.
Clustering Validity
L-Clusters
Initial Definitions
• d(ei, ek) is the dissimilarity between elements ei and ek.
• The Euclidean distance is an example of a dissimilarity measure.
L–Cluster Definition
• C is an L-cluster if, for each object ei belonging to C:
    max d(ei, ek), ek ∈ C  <  min d(ei, eh), eh ∉ C
• That is, the maximum distance between ei and any element ek of C is smaller than the minimum distance between ei and any element eh outside C.
L-cluster
[Figure: example of an L-cluster C]
L* – Definition
• C is an L*-cluster if, for each object ei belonging to C:
    max d(ei, ek), ek ∈ C  <  min d(el, eh), el ∈ C, eh ∉ C
• That is, the largest within-cluster distance from ei is smaller than the minimum distance between any element of C and any element outside C.
L*-cluster
[Figure: example of an L*-cluster C]
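The two definitions above can be checked directly in code. Below is an illustrative Python sketch (the function names and the choice of Euclidean dissimilarity are our assumptions, not part of the slides; it assumes at least two clusters):

```python
from math import dist  # Euclidean distance between two points

def is_l_cluster(points, labels, c):
    """L-cluster test: for every ei in cluster c, the maximum distance from ei
    to any element of c must be smaller than the minimum distance from ei to
    any element outside c."""
    inside = [p for p, l in zip(points, labels) if l == c]
    outside = [p for p, l in zip(points, labels) if l != c]
    return all(max(dist(ei, ek) for ek in inside) <
               min(dist(ei, eh) for eh in outside)
               for ei in inside)

def is_l_star_cluster(points, labels, c):
    """L*-cluster test: the maximum within-cluster distance from each ei must
    be smaller than the minimum distance between ANY element of c and any
    element outside c -- a stricter condition than the L-cluster one."""
    inside = [p for p, l in zip(points, labels) if l == c]
    outside = [p for p, l in zip(points, labels) if l != c]
    min_between = min(dist(el, eh) for el in inside for eh in outside)
    return all(max(dist(ei, ek) for ek in inside) < min_between
               for ei in inside)
```

Note that every L*-cluster is also an L-cluster, since the minimum over all pairs (el, eh) can never exceed the minimum computed for a single ei.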
Clustering Validity
Silhouettes
Introduction
• P.J. Rousseeuw, "Silhouettes: a graphical aid to the interpretation and validation of cluster analysis", Journal of Computational and Applied Mathematics, 1987.
• Each cluster is represented by one silhouette, showing which objects lie well within the cluster.
• The user can compare the quality of the clusters.
Method - I
• Consider a cluster A.
• For each element ei ∈ A calculate the average dissimilarity to all other objects of A: a(ei) = d(ei, A).
• Therefore, A cannot be a singleton.
• The Euclidean distance is an example of a dissimilarity.
Method - II
• Consider all clusters Ck different from A.
• Calculate dk(ei, Ck), the average dissimilarity of ei to all elements of Ck.
• Select b(ei) = min(dk(ei, Ck)).
• Let us call B the cluster whose dissimilarity is b(ei).
• B is the second-best choice for ei.
Method - III
• The silhouette s(ei) is equal to:
    s(ei) = 1 − a(ei)/b(ei)    if a(ei) < b(ei)
    s(ei) = 0                  if a(ei) = b(ei)
    s(ei) = b(ei)/a(ei) − 1    if a(ei) > b(ei)
• or, equivalently:
    s(ei) = [b(ei) − a(ei)] / max(a(ei), b(ei))
• −1 ≤ s(ei) ≤ +1
Understanding s(ei)
• s(ei) ≈ 1: the within dissimilarity a(ei) << b(ei); ei is well classified.
• s(ei) ≈ 0: a(ei) ≈ b(ei); ei may belong to either cluster.
• s(ei) ≈ −1: the within dissimilarity a(ei) >> b(ei); ei is misclassified and should belong to B.
Silhouette
• The silhouette of cluster A is the plot of all s(ei) ranked in decreasing order.
• The average of the s(ei) over all elements of the cluster is called the average silhouette.
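The procedure of Methods I–III can be sketched in Python (an illustrative sketch under our assumptions: Euclidean dissimilarity, at least two clusters, and no singleton clusters; the function name is ours):

```python
from math import dist  # Euclidean distance between two points

def silhouette_values(points, labels):
    """For each element ei: a(ei) = average dissimilarity to the other
    elements of its own cluster, b(ei) = smallest average dissimilarity to
    any other cluster, and s(ei) = (b - a) / max(a, b)."""
    clusters = {c: [p for p, l in zip(points, labels) if l == c]
                for c in set(labels)}
    s = []
    for idx, (ei, li) in enumerate(zip(points, labels)):
        own = [p for j, (p, l) in enumerate(zip(points, labels))
               if l == li and j != idx]
        a = sum(dist(ei, p) for p in own) / len(own)  # cluster not a singleton
        b = min(sum(dist(ei, p) for p in members) / len(members)
                for c, members in clusters.items() if c != li)
        s.append((b - a) / max(a, b))
    return s
```

The silhouette of a cluster is then these values ranked in decreasing order, and their mean is the average silhouette.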
Example of use I
QTY = 100;
% two Gaussian clouds whose centres are 1 unit apart (strong overlap)
X = [randn(QTY,2) + 0.5*ones(QTY,2); ...
     randn(QTY,2) - 0.5*ones(QTY,2)];
opts = statset('Display','final');
% k-means with the city-block distance and 5 restarts
[cidx, ctrs] = kmeans(X, 2, 'Distance','cityblock', ...
                      'Replicates',5, 'Options',opts);
figure;
plot(X(cidx==1,1), X(cidx==1,2), 'r.', ...
     X(cidx==2,1), X(cidx==2,2), 'b.', ...
     ctrs(:,1), ctrs(:,2), 'kx');
figure;
[s, h] = silhouette(X, cidx, 'sqEuclidean');
Ex Silhouette 1
[Figure: clustering result for Example of use I]
Ex Silhouette 2
[Figure: silhouette plot for Example of use I]
Example of use II
QTY = 100;
% two Gaussian clouds whose centres are 4 units apart (well separated)
X = [randn(QTY,2) + 2*ones(QTY,2); ...
     randn(QTY,2) - 2*ones(QTY,2)];
opts = statset('Display','final');
[cidx, ctrs] = kmeans(X, 2, 'Distance','cityblock', ...
                      'Replicates',5, 'Options',opts);
figure;
plot(X(cidx==1,1), X(cidx==1,2), 'r.', ...
     X(cidx==2,1), X(cidx==2,2), 'b.', ...
     ctrs(:,1), ctrs(:,2), 'kx');
figure;
[s, h] = silhouette(X, cidx, 'sqEuclidean');
Ex silhouette 3
[Figure: clustering result for Example of use II]
Ex silhouette 4
[Figure: silhouette plot for Example of use II]
Cluster Validity
Partition Coefficient
Partition Coefficient
• This coefficient is defined as
    F = (1/n) · Σ(i=1..c) Σ(j=1..n) (μij)²
• 1/c ≤ F ≤ 1
Partition Coefficient comments
• F is inversely proportional to the number of clusters.
• F is not appropriate for finding the best number of partitions.
• F is best suited to selecting the best partition among those with the same number of clusters.
Partition Coefficient
• When F = 1/c the partition is entirely fuzzy, since every element belongs to all clusters with the same degree of membership.
• When F = 1 the partition is rigid and membership values are either 1 or 0.
• This measure can only be applied to fuzzy partitions.
Partition Coefficient Example
• The partition matrix (columns w1..w4) is
    U = | 1 1 0 0 |
        | 0 0 1 1 |
    F = (1² + 1² + 1² + 1²) / 4 = 1
Partition Coefficient Example
• The partition matrix (columns w1..w4) is
    U = | 0.5 0.5 0.5 0.5 |
        | 0.5 0.5 0.5 0.5 |
    F = (8 × 0.5²) / 4 = 0.5 = 1/2 = 1/c
Partition Coefficient Example
• The partition matrix (columns x1..x6) is
    U1 = | 0.5 1 0 0.9 0.3 0.2 |
         | 0.5 0 1 0.1 0.7 0.8 |
    F = (0.5² + 1² + 0² + 0.9² + 0.3² + 0.2² + 0.5² + 0² + 1² + 0.1² + 0.7² + 0.8²) / 6
    F ≈ 0.763
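The three examples above can be reproduced with a few lines of Python (an illustrative sketch; `partition_coefficient` is our name, and the matrices are the ones from the slides):

```python
def partition_coefficient(U):
    """F = (1/n) * sum over i,j of (mu_ij)^2, where U is a c x n
    partition matrix given as a list of rows (one row per cluster)."""
    n = len(U[0])
    return sum(u * u for row in U for u in row) / n

U_crisp = [[1, 1, 0, 0],
           [0, 0, 1, 1]]          # rigid partition -> F = 1
U_fuzzy = [[0.5, 0.5, 0.5, 0.5],
           [0.5, 0.5, 0.5, 0.5]]  # entirely fuzzy -> F = 1/c = 0.5
U1 = [[0.5, 1, 0, 0.9, 0.3, 0.2],
      [0.5, 0, 1, 0.1, 0.7, 0.8]]

print(partition_coefficient(U_crisp))       # 1.0
print(partition_coefficient(U_fuzzy))       # 0.5
print(round(partition_coefficient(U1), 3))  # 0.763
```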
Cluster Validity
Partition Entropy
Partition Entropy
• Partition Entropy is defined as
    H = −(1/n) · Σ(i=1..c) Σ(j=1..n) μij · log(μij)
• 0 ≤ H ≤ log(c)
• When H = 0 the partition is rigid.
• When H = log(c) the fuzziness is maximum.
• 0 ≤ 1 − F ≤ H
Partition Entropy comments
• Partition Entropy (H) is directly proportional to the number of partitions.
• H is more appropriate for selecting the best partition among several runs of an algorithm.
• H is strictly a fuzzy measure.
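A minimal Python sketch of the Partition Entropy (illustrative; the slides do not fix the base of the logarithm, so the natural logarithm is assumed here, and 0·log 0 is taken as 0):

```python
from math import log

def partition_entropy(U):
    """H = -(1/n) * sum over i,j of mu_ij * log(mu_ij);
    terms with mu_ij = 0 contribute nothing."""
    n = len(U[0])
    return sum(-u * log(u) for row in U for u in row if u > 0) / n

U_crisp = [[1, 1, 0, 0], [0, 0, 1, 1]]
U_fuzzy = [[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5]]

print(partition_entropy(U_crisp))            # 0.0 (rigid partition)
print(round(partition_entropy(U_fuzzy), 3))  # 0.693, i.e. log(2): maximum fuzziness for c = 2
```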
Cluster Validity
Compactness and Separation
Compactness and Separation
• CS is defined as
    CS = Jm / (n · (dmin)²)
• Jm is the objective function minimized by the FCM algorithm.
• n is the number of elements.
• dmin is the minimum Euclidean distance between the centres of two clusters.
Compactness and Separation
• The minimum distance is defined as
    dmin = min(i≠j) ||ci − cj||
• The complete formula is
    CS = [ Σ(i=1..c) Σ(j=1..n) (μij)^m ||vi − xj||² ] / [ n · min(i≠j) ||vi − vj||² ]
Compactness and Separation
• This is a very complete validation measure.
• It validates the number of clusters and checks the separation among clusters.
• From our experiments it works well even when the degree of overlap is high.
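The CS formula can be transcribed directly into Python (an illustrative sketch; it assumes the data X, the centres V and the c × n membership matrix U have already been produced by FCM, with fuzzifier m = 2):

```python
from math import dist  # Euclidean distance between two points

def compactness_separation(X, V, U, m=2):
    """CS = Jm / (n * dmin^2), where Jm is the FCM objective
    sum_i sum_j mu_ij^m * ||v_i - x_j||^2 and dmin is the minimum
    Euclidean distance between two cluster centres."""
    n, c = len(X), len(V)
    Jm = sum(U[i][j] ** m * dist(V[i], X[j]) ** 2
             for i in range(c) for j in range(n))
    dmin = min(dist(V[i], V[k]) for i in range(c) for k in range(c) if i < k)
    return Jm / (n * dmin ** 2)

X = [(0, 0), (0, 1), (10, 0), (10, 1)]
V = [(0, 0.5), (10, 0.5)]
U = [[1, 1, 0, 0], [0, 0, 1, 1]]
print(compactness_separation(X, V, U))  # 0.0025: compact, well-separated clusters give a small CS
```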
Cluster Validity
Fuzzy Linear Discriminant
Fisher Linear Discriminant
• Fisher's Linear Discriminant (FLD) is an important technique used in pattern recognition to evaluate the compactness and separation of the partitions produced by crisp clustering techniques.
Fisher Linear Discriminant
• It is easier to handle classification problems in which the sampled data has few features.
• So it is important to reduce the dimensionality of the problem.
• When FLD is applied to a crisply partitioned space it produces an operator (W) that maps the original set (R^p) into a new set (R^k), where k < p.
Fisher Linear Discriminant
[Figure: projection of samples from 2 classes onto a line by Fisher's Linear Discriminant]
FLD
• FLD measures the compactness and separation of all categories when crisp partitions are created.
• FLD uses two matrices:
  – SB: the between-class scatter matrix
  – SW: the within-class scatter matrix
FLD – SB Matrix
• Measures the quality of the separation between classes:
    SB = Σ(i=1..c) ni (mi − m)(mi − m)^T
    m = (1/n) Σ(j=1..n) xj
    mi = (1/ni) Σ(j=1..ni) xj,  xj ∈ ci
FLD – SB Matrix
• m is the average of all samples.
• mi is the average of the samples belonging to cluster i.
• n is the number of samples.
• ni is the number of samples belonging to cluster i.
    SB = Σ(i=1..c) ni (mi − m)(mi − m)^T
    m = (1/n) Σ(j=1..n) xj
    mi = (1/ni) Σ(j=1..ni) xj,  xj ∈ ci
FLD – SW Matrix
• Measures the compactness of all classes.
• It is the sum of all internal scatterings:
    SW = Σ(i=1..c) Σ(xj ∈ ci) (xj − mi)(xj − mi)^T
Total Scattering
• The total scattering is the sum of the internal scattering and the scattering between the classes:
    ST = SW + SB
• In an optimal partition the separation between classes (SB) must be maximum and the scattering within classes (SW) minimum.
J criteria
• Fisher defined the criterion J, which must be maximized:
    J = |SB| / |SW|
• A simplified way to evaluate J is
    J = trace(SB) / trace(SW)
J comments
• J may vary in the interval 0 ≤ J < ∞.
• J is strictly rigid (it applies only to crisp partitions).
• J loses precision as the overlap between samples increases.
EFLD
• EFLD measures the compactness and separation of all categories when fuzzy partitions are created.
• EFLD uses two matrices:
  – SBe: the between-class scatter matrix
  – SWe: the within-class scatter matrix
EFLD – SBe Matrix
• Measures the quality of the separation between classes:
    SBe = Σ(i=1..c) Σ(j=1..n) μij (mei − m)(mei − m)^T
    m = (1/n) Σ(j=1..n) xj
    mei = [ Σ(j=1..n) μij xj ] / [ Σ(j=1..n) μij ]
EFLD – SWe Matrix
• Measures the compactness of all classes.
• It is the sum of all internal scatterings:
    SWe = Σ(i=1..c) Σ(j=1..n) μij (xj − mei)(xj − mei)^T
Total Scattering
• The total scattering is the sum of the internal scattering and the scattering between the classes:
    STe = SWe + SBe
• In an optimal partition the separation between classes (SBe) must be maximum and the scattering within classes (SWe) minimum.
Je criteria
• Je is the criterion that must be maximized:
    Je = |SBe| / |SWe|
• A simplified way to evaluate Je is
    Je = trace(SBe) / trace(SWe)
Simplifying Je criteria
• A simplified way to evaluate Je:
• It can be proved that sT = trace(STe) is constant and equal to
    sT = Σ(j=1..n) ||xj − m||²
• Hence, writing sBe = trace(SBe) and sWe = trace(SWe):
    Je = sBe / sWe = sBe / (sT − sBe)
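Using the simplification above, Je can be computed from traces alone, since the trace of a sum of outer products v·v^T is the sum of ||v||². An illustrative Python sketch (the function name is ours; it assumes Σi μij = 1 for every j, as FCM guarantees):

```python
def je_criterion(X, U):
    """Je = sBe / sWe via traces: sBe = sum_i (sum_j mu_ij) * ||m_ei - m||^2
    and sWe = sum_i sum_j mu_ij * ||x_j - m_ei||^2, where m_ei is the fuzzy
    mean of cluster i and m the global mean.  sBe + sWe equals the constant
    sT = sum_j ||x_j - m||^2."""
    c, n, p = len(U), len(X), len(X[0])
    m = [sum(x[d] for x in X) / n for d in range(p)]  # global mean
    s_be = s_we = 0.0
    for i in range(c):
        w = sum(U[i])  # total membership of cluster i
        mei = [sum(U[i][j] * X[j][d] for j in range(n)) / w for d in range(p)]
        s_be += w * sum((mei[d] - m[d]) ** 2 for d in range(p))
        s_we += sum(U[i][j] * sum((X[j][d] - mei[d]) ** 2 for d in range(p))
                    for j in range(n))
    return s_be / s_we
```

On two tight, well-separated clusters Je is large; as overlap grows, sBe shrinks relative to sWe and Je decreases.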
Je comments
• Je may vary in the interval 0 ≤ Je < ∞.
• Je is strictly rigid.
• Je loses precision as the overlap between samples increases.
Applying EFLD
                 Number of Categories
EFLD         2        3        4        5        6
Samples X1   4.6815   4.9136   0.2943   0.2559   0.3157
Samples X2   0.3271   0.8589   0.8757   0.9608   1.0674
Cluster Validity
Inter Class Contrast
Comments
• EFLD:
  – Increases as the number of clusters rises.
  – Increases when classes have a high degree of overlap.
  – Reaches its maximum at a wrong number of clusters.
ICC
• Evaluates crisp and fuzzy clustering algorithms.
• Measures:
  – partition compactness
  – partition separation
• ICC must be maximized.
ICC
• ICC is defined as
    ICC = (sBe · Dmin) / n
• sBe estimates the quality of the placement of the centres.
• 1/n is a scale factor that compensates for the influence of the number of points in sBe.
ICC - 2
    ICC = (sBe · Dmin) / n
• Dmin is the minimum Euclidean distance between all pairs of the c centres.
  – It neutralizes the tendency of sBe to grow, avoiding the maximum being reached for a number of clusters greater than the ideal value.
  – When 2 or more clusters represent one class, Dmin decreases abruptly.
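Putting the two ingredients together, ICC can be sketched as follows (an illustrative reconstruction of the slide's formula, read here as ICC = sBe · Dmin / n; the function name is ours):

```python
from math import dist  # Euclidean distance between two points

def icc(X, V, U):
    """ICC = sBe * Dmin / n: sBe (trace of the fuzzy between-class scatter
    matrix) rewards well-placed centres, 1/n removes the influence of the
    number of points, and Dmin (the minimum distance between the c centres)
    drops abruptly when two clusters share one class."""
    n, c, p = len(X), len(V), len(X[0])
    m = [sum(x[d] for x in X) / n for d in range(p)]  # global mean
    s_be = 0.0
    for i in range(c):
        w = sum(U[i])  # total membership of cluster i
        mei = [sum(U[i][j] * X[j][d] for j in range(n)) / w for d in range(p)]
        s_be += w * sum((mei[d] - m[d]) ** 2 for d in range(p))
    dmin = min(dist(V[i], V[k]) for i in range(c) for k in range(c) if i < k)
    return s_be * dmin / n
```

Splitting one class into two clusters makes Dmin collapse, so ICC peaks at the correct number of clusters rather than growing monotonically with c.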
ICC Fuzzy Application
• Five classes with 500 points each.
• No class overlap.
• X1 – centres (1,2), (6,2), (1,6), (6,6), (3.5,9); std 0.3.
• Apply FCM for m = 2 and c = 2 ... 10.
ICC Fuzzy Application Results
                      Number of clusters
Measure    M/m    2        3        4        5
ICC        M      7.596    41.99    51.92    96.70
ICCTra     M      7.596    41.99    51.92    96.70
ICCDet     M      IND      154685   259791   673637
EFLD       M      0.185    0.986    1.877    13.65
EFLDTra    M      0.185    0.986    1.877    13.65
EFLDDet    M      IND      0.955    3.960    182.70
CS         m      0.350    0.096    0.070    0.011
F          M      0.705    0.713    0.795    0.943
MinHT      M      0.647    0.572    2.124    1.994
MeanHT     M      0.519    0.496    1.327    1.887
MinRF      0      0.100    0.316    0        0
ICC Fuzzy Application Time
               Number of Categories
Time       2        3        4        5
ICC        0.0061   0.0069   0.0082   0.00914
ICCTra     0.0078   0.0060   0.0088   0.0110
ICCDet     0.0110   0.0088   0.0110   0.0132
EFLD       0.0053   0.0071   0.0063   0.0080
EFLDTra    0.7678   1.0870   1.4780   1.8982
EFLDDet    0.7800   1.1392   1.5510   2.0160
CS         0.0226   0.0261   0.0382   0.0476
NFI        0.0061   0.0056   0.0058   0.00603
F          0.0044   0.0045   0.0049   0.00491
FPI        0.0061   0.0045   0.0049   0.00532
Application with Overlapping
• Five classes with 500 points each.
• High cluster overlap.
• X1 – centres (1,2), (6,2), (1,6), (6,6), (3.5,9); std 0.3.
• Apply FCM for m = 2 and c = 2 ... 10.
Application Overlapping Results
                      Number of clusters
Measure    M/m    2        3        4        5        10
ICC        M      5.065    4.938    6.191    7.829    5.69
ICCTra     M      5.065    4.938    6.191    7.829    5.69
ICCDet     M      IND      715.19   3572     7048     6024
EFLD       M      0.450    0.585    0.839    1.095    1.344
EFLDTra    M      0.450    0.585    0.839    1.095    1.344
EFLDDet    M      IND      0.049    0.315    0.743    1.200
CS         m      0.164    0.225    0.191    0.122    0.223
F          M      0.754    0.621    0.591    0.586    0.439
MeanHT     M      0.632    0.485    0.550    0.597    0.429
MinRF      0      0.170    0.294    0.194    0.210    0.402
MPE        m      0.568    0.601    0.561    0.525    0.565
Application Time Results
               Number of Clusters
Time       2        3        4        5
ICC        0.0060   0.0064   0.0077   0.00881
ICCTra     0.0066   0.0060   0.0098   0.0110
ICCDet     0.0110   0.0078   0.0110   0.0120
EFLD       0.0063   0.0088   0.0096   0.0110
EFLDTra    0.7930   2.1038   1.7598   2.2584
EFLDDet    0.9720   1.2580   1.6090   1.8450
CS         0.0220   0.0283   0.0362   0.05903
F          0.0112   0.0121   0.0061   0.0164
MPE        0.0167   0.0271   0.0319   0.03972
ICC conclusions
• Fast and efficient.
• Works with both fuzzy and crisp partitions.
• Efficient even with highly overlapping clusters.
• High rate of correct results.