Lecture 21 clustering

Intro. ANN & Fuzzy Systems
Lecture 21
Clustering (2)
Outline
• Similarity (Distance) Measures
• Distortion Criteria
• Scattering Criterion
• Hierarchical Clustering and other clustering methods
(C) 2001-2003 by Yu Hen Hu
Distance Measure
• Distance measure: what does "similar" mean?
– Norm: $d(x, y) = \|x - y\|_m = \left( \sum_{i=1}^{N} |x_i - y_i|^m \right)^{1/m}$
– Mahalanobis distance: $d(x, y) = (x - y)^T S_{xy}^{-1} (x - y)$
– Angle: $d(x, y) = x^T y / (\|x\| \cdot \|y\|)$
• Binary and symbolic features (x and y contain only 0s and 1s):
– Tanimoto coefficient: $d(x, y) = \dfrac{x^T y}{x^T x + y^T y - x^T y}$
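The measures on the distance slide can be tried out with a short sketch. This is an illustration only, written in plain Python: the function names `minkowski`, `cosine` and `tanimoto` are hypothetical, the Tanimoto coefficient is written in its standard form, and the Mahalanobis distance is left out because it needs a matrix inverse.

```python
import math

def minkowski(x, y, m=2):
    """m-norm distance: (sum_i |x_i - y_i|^m)^(1/m)."""
    return sum(abs(a - b) ** m for a, b in zip(x, y)) ** (1.0 / m)

def cosine(x, y):
    """Angle measure: x^T y / (|x| |y|). Larger means more similar."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def tanimoto(x, y):
    """Tanimoto coefficient: x^T y / (x^T x + y^T y - x^T y)."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sum(a * a for a in x) + sum(b * b for b in y) - dot)

# Two binary feature vectors (made-up example data).
x = [1, 0, 1, 1]
y = [1, 1, 0, 1]

print("city-block (m=1):", minkowski(x, y, 1))
print("Euclidean  (m=2):", minkowski(x, y, 2))
print("cosine          :", cosine(x, y))
print("Tanimoto        :", tanimoto(x, y))
```

For binary vectors the Tanimoto coefficient is the number of shared 1s divided by the number of positions where either vector has a 1, so here it is 2 / 4.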
Clustering Criteria
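• Is the current clustering assignment good enough? The most popular criterion is the mean-square error distortion measure
$$D = \sum_{i=1}^{c} \sum_{k=1}^{N} I(x_k, i)\, \|x_k - W(i)\|^2 = \sum_{i=1}^{c} \frac{1}{N_i} \sum_{x, y \in C(i)} \|x - y\|^2, \qquad N_i = \sum_{k=1}^{N} I(x_k, i)$$
where I(x_k, i) = 1 if x_k is assigned to cluster i and 0 otherwise, and W(i) is the center of cluster i. The second form holds when W(i) is the mean of cluster i and each pair {x, y} is counted once.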
• Other distortion measures can also be used:
$$D = \sum_{i=1}^{c} \frac{1}{N_i} \sum_{x, y \in C(i)} d(x, y)$$
$$D = \sum_{i=1}^{c} \frac{1}{N_i} \min_{x, y \in C(i)} d(x, y)$$
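As a numerical check of the mean-square error distortion, the sketch below computes D in two ways: from the distances to each cluster mean, and from the pairwise distances inside each cluster. It assumes that W(i) is the cluster mean and that each pair of points is counted once. The clusters are given as plain lists of points instead of through the indicator I(x_k, i), and the helper names are hypothetical.

```python
def sq_dist(x, y):
    """Squared Euclidean distance ||x - y||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def centroid(points):
    """Mean vector of a list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def distortion_centroid(clusters):
    """D = sum over clusters of squared distances to the cluster mean."""
    total = 0.0
    for pts in clusters:
        w = centroid(pts)
        total += sum(sq_dist(p, w) for p in pts)
    return total

def distortion_pairwise(clusters):
    """D = sum over clusters of (1/N_i) * sum over unordered pairs ||x - y||^2."""
    total = 0.0
    for pts in clusters:
        n = len(pts)
        pair_sum = sum(sq_dist(pts[a], pts[b])
                       for a in range(n) for b in range(a + 1, n))
        total += pair_sum / n
    return total

# Made-up example: one cluster of three points, one of two points.
clusters = [
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    [(5.0, 5.0), (6.0, 5.0)],
]

d_centroid = distortion_centroid(clusters)
d_pairwise = distortion_pairwise(clusters)
print(d_centroid, d_pairwise)  # the two values agree up to rounding
```

The pairwise form is useful when only a distance table is available and no cluster means can be computed.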
Scatter Matrices
• Scatter matrices are defined
in the context of analysis of
variance in statistics.
• They are used in linear
discriminant analysis.
• However, they can also be
used to gauge the fitness of
a particular clustering
assignment.
• Mean vector for the i-th cluster: $m_i = \frac{1}{N_i} \sum_{k=1}^{N} I(x_k, i)\, x_k$
• Total mean vector: $m = \frac{1}{N} \sum_{i=1}^{c} N_i m_i = \frac{1}{N} \sum_{k=1}^{N} x_k$
• Scatter matrix for the i-th cluster: $S_i = \sum_{k=1}^{N} I(x_k, i)\, (x_k - m_i)(x_k - m_i)^T$
• Within-cluster scatter matrix: $S_W = \sum_{i=1}^{c} S_i$
• Between-cluster scatter matrix: $S_B = \sum_{i=1}^{c} N_i (m_i - m)(m_i - m)^T$
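The scatter-matrix definitions can be followed step by step on a tiny example. The sketch below is a plain-Python illustration with hypothetical helper names and made-up two-dimensional data; each cluster is a list of points, which plays the role of the indicator I(x_k, i).

```python
def mean(points):
    """Mean vector of a list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def outer(u, v):
    """Outer product u v^T as a nested list."""
    return [[a * b for b in v] for a in u]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(A, s):
    return [[s * a for a in row] for row in A]

def zeros(d):
    return [[0.0] * d for _ in range(d)]

def scatter(points, center):
    """Sum of (x - center)(x - center)^T over the given points."""
    S = zeros(len(center))
    for p in points:
        diff = sub(p, center)
        S = mat_add(S, outer(diff, diff))
    return S

clusters = [
    [(0.0, 0.0), (2.0, 0.0)],
    [(4.0, 4.0), (4.0, 6.0)],
]
all_points = [p for pts in clusters for p in pts]
dim = len(all_points[0])

m = mean(all_points)                       # total mean vector
means = [mean(pts) for pts in clusters]    # m_i for each cluster

S_W = zeros(dim)  # within-cluster scatter
S_B = zeros(dim)  # between-cluster scatter
for pts, m_i in zip(clusters, means):
    S_W = mat_add(S_W, scatter(pts, m_i))
    diff = sub(m_i, m)
    S_B = mat_add(S_B, scale(outer(diff, diff), len(pts)))

print("m   =", m)
print("S_W =", S_W)
print("S_B =", S_B)
```

In this example the points spread only a little around their own cluster means, so S_W has small entries, while the two cluster means lie far apart, so S_B has large entries.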
Scattering Criteria
• Total scatter matrix: $S_T = \sum_{k=1}^{N} (x_k - m)(x_k - m)^T = S_W + S_B$
• Note that the total scatter matrix is independent of the assignment $I(x_k, i)$. But …
• $S_W$ and $S_B$ both depend on $I(x_k, i)$!
• Desired clustering property
– SW small
– SB large
• How to gauge whether $S_W$ is small or $S_B$ is large? There are several ways.
• Tr $S_W$ (trace of $S_W$): let $S_W = \sum_{m=1}^{M} \lambda_m v_m v_m^T$ be the eigenvalue decomposition of $S_W$; then
$$\mathrm{Tr}\, S_W = \sum_{m=1}^{M} \lambda_m = \sum_{i=1}^{c} \mathrm{Tr}\, S_i = \sum_{i=1}^{c} \sum_{k=1}^{N} I(x_k, i)\, \|x_k - m_i\|^2 = D$$
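The slides state the two identities of the scattering criteria without derivation. A short derivation is sketched here, assuming that every sample belongs to exactly one cluster, so that $\sum_{i=1}^{c} I(x_k, i) = 1$ for each k, and that the cluster center W(i) in D is the cluster mean $m_i$.

```latex
\begin{aligned}
S_T &= \sum_{i=1}^{c} \sum_{k=1}^{N} I(x_k, i)\,
       \bigl[(x_k - m_i) + (m_i - m)\bigr]\bigl[(x_k - m_i) + (m_i - m)\bigr]^T \\
    &= \sum_{i=1}^{c} \sum_{k=1}^{N} I(x_k, i)\,(x_k - m_i)(x_k - m_i)^T
     + \sum_{i=1}^{c} N_i\,(m_i - m)(m_i - m)^T \\
    &= S_W + S_B .
\end{aligned}
% The cross terms vanish because, by the definition of m_i,
%   \sum_{k=1}^{N} I(x_k, i)\,(x_k - m_i) = 0 .

\operatorname{Tr} S_W
  = \sum_{i=1}^{c} \sum_{k=1}^{N} I(x_k, i)\,
    \operatorname{Tr}\bigl[(x_k - m_i)(x_k - m_i)^T\bigr]
  = \sum_{i=1}^{c} \sum_{k=1}^{N} I(x_k, i)\,\|x_k - m_i\|^2
  = D .
```

The second line uses $\operatorname{Tr}(u u^T) = u^T u = \|u\|^2$ for any vector u. Since the total scatter is fixed by the data, making the trace of the within-cluster scatter small makes the trace of the between-cluster scatter large by the same amount.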
Cluster Separating Measure (CSM)
• Similar to scattering criteria.
• csm = $(m_i - m_j)/(\sigma_i + \sigma_j)$
• The larger its value, the more separable the two clusters.
• Assumes the underlying data distribution is Gaussian.
[Figure: three panels titled "std = 0.3, csm = 1.6667", "std = 0.5, csm = 1", and "std = 0.8, csm = 0.625".]
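The csm values quoted with the figure can be reproduced with a few lines of Python. This is a sketch under two assumptions that are not stated on the slide: both clusters share the same standard deviation, and the two means are 0 and 1 (a separation of 1 is what the quoted numbers imply). The absolute value is added here so that the result does not depend on the order of the two clusters.

```python
def csm(m_i, m_j, sigma_i, sigma_j):
    """Cluster separating measure: |m_i - m_j| / (sigma_i + sigma_j)."""
    return abs(m_i - m_j) / (sigma_i + sigma_j)

# Assumed means 0 and 1; equal standard deviation in both clusters.
for std in (0.3, 0.5, 0.8):
    print(f"std = {std}, csm = {csm(0.0, 1.0, std, std):.4f}")
```

As the standard deviation grows with the means held fixed, the measure drops, which matches the intuition that wider clusters overlap more.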
Hierarchical Clustering
• Merge method: initially, each $x_k$ is its own cluster. During each iteration, the nearest pair of distinct clusters is merged, until the number of clusters is reduced to 1.
• How to measure the distance between two clusters:
– $d_{\min}(C(i), C(j)) = \min d(x, y)$, with $x \in C(i)$, $y \in C(j)$ (leads to a minimum spanning tree)
– $d_{\max}(C(i), C(j)) = \max d(x, y)$, with $x \in C(i)$, $y \in C(j)$
– $d_{avg}(C(i), C(j)) = \frac{1}{N_i N_j} \sum_{x \in C(i)} \sum_{y \in C(j)} d(x, y)$
– $d_{mean}(C(i), C(j)) = \|m_i - m_j\|$
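The merge method and the four cluster-to-cluster distances can be sketched as follows. This is a naive illustration, not an efficient implementation: it assumes Euclidean distance for d(x, y), recomputes all cluster distances in every iteration, and takes a hypothetical `n_clusters` parameter so that merging can stop before a single cluster is left.

```python
import math

def dist(x, y):
    """Euclidean distance, used here as d(x, y) (an assumption)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def mean(points):
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def d_min(A, B):
    """Nearest-neighbor distance between two clusters."""
    return min(dist(x, y) for x in A for y in B)

def d_max(A, B):
    """Farthest-neighbor distance between two clusters."""
    return max(dist(x, y) for x in A for y in B)

def d_avg(A, B):
    """Average of all pairwise distances between two clusters."""
    return sum(dist(x, y) for x in A for y in B) / (len(A) * len(B))

def d_mean(A, B):
    """Distance between the two cluster means."""
    return dist(mean(A), mean(B))

def merge_clustering(points, linkage=d_min, n_clusters=1):
    """Start with one cluster per point; repeatedly merge the nearest pair."""
    clusters = [[p] for p in points]
    while len(clusters) > max(n_clusters, 1):
        best = None  # (distance, index a, index b) of the nearest pair so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Made-up data: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
result = merge_clustering(points, linkage=d_min, n_clusters=3)
print(result)
```

Passing `d_max`, `d_avg` or `d_mean` as `linkage` switches the cluster distance without changing the merge loop.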
Hierarchical Clustering (II)
Split method:
• Initially, there is only one cluster. Iteratively, a cluster is split into two or more clusters, until the total number of clusters reaches a predefined goal.
• The scattering criterion can be used to decide how to split a given cluster into two or more clusters.
• Another way is to perform an m-way clustering, using, say, the k-means algorithm, to split a cluster into m smaller clusters.
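One possible reading of the split method is sketched below. The choices here are assumptions made for illustration and are not taken from the slides: the cluster with the largest mean-square distortion is split next, each split is a 2-way k-means (m = 2), and the two initial centers are the farthest pair of points in the cluster so that the run is deterministic.

```python
def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def mean(points):
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def distortion(points):
    """Sum of squared distances to the cluster mean."""
    c = mean(points)
    return sum(sq_dist(p, c) for p in points)

def two_means(points, iters=20):
    """Split one cluster into two groups with a basic 2-means loop."""
    # Seed with the farthest pair of points (deterministic; an assumption).
    seeds = max(((p, q) for p in points for q in points),
                key=lambda pq: sq_dist(pq[0], pq[1]))
    centers = [list(seeds[0]), list(seeds[1])]
    groups = [list(points), []]
    for _ in range(iters):
        groups = [[], []]
        for p in points:
            j = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            groups[j].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split, e.g. all points identical
        new_centers = [mean(g) for g in groups]
        if new_centers == centers:
            break  # assignments have stopped changing
        centers = new_centers
    return groups

def split_clustering(points, n_clusters):
    """Start with one cluster; keep splitting until n_clusters is reached."""
    clusters = [list(points)]
    while len(clusters) < n_clusters:
        candidates = [c for c in clusters if len(c) > 1]
        if not candidates:
            break
        worst = max(candidates, key=distortion)
        left, right = two_means(worst)
        if not left or not right:
            break
        clusters.remove(worst)
        clusters.extend([left, right])
    return clusters

# Made-up data: three tight pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0), (10.0, 1.0)]
result = split_clustering(points, 3)
print(result)
```

Other rules for choosing which cluster to split, or a scattering criterion for choosing how to split it, would fit into the same loop.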