# K-means Clustering via Principal Component Analysis

According to the paper by Chris Ding and Xiaofeng He, Int’l Conf., 2004

Minimizing the sum of squared errors

$$
J_K = \sum_{k=1}^{K} \sum_{i \in C_k} (x_i - m_k)^2
$$

Where the data matrix is

$$
X = (x_1, \ldots, x_n), \qquad x_i = (x_{i1}, \ldots, x_{id})^T
$$

Centroid of cluster $C_k$

$$
m_k = \frac{1}{n_k} \sum_{i \in C_k} x_i
$$

where $n_k$ is the number of points in $C_k$.

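The sum-of-squared-errors criterion above can be evaluated directly from a labeling. The following is a minimal sketch, assuming NumPy and a data matrix stored with one point per column; the function name `kmeans_objective` and the toy data are illustrative and not taken from the paper.

```python
import numpy as np

def kmeans_objective(X, labels):
    """Sum of squared errors J_K for X of shape (d, n), one point per column."""
    total = 0.0
    for k in np.unique(labels):
        cluster = X[:, labels == k]                 # points x_i with i in C_k
        m_k = cluster.mean(axis=1, keepdims=True)   # centroid m_k
        total += np.sum((cluster - m_k) ** 2)
    return total

# Toy data (assumed for illustration): four 2-D points forming two tight pairs.
X = np.array([[0.0, 0.0, 10.0, 10.0],
              [0.0, 2.0, 0.0, 2.0]])
labels = np.array([0, 0, 1, 1])

J = kmeans_objective(X, labels)
print(J)  # 4.0
```

Each pair sits at distance 1 from its centroid, so the four squared errors add up to 4. Pairing the points the other way, with labels `[0, 1, 0, 1]`, gives a much larger value of 100.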
Principal Component Analysis (PCA)

Centered data matrix

$$
Y = (y_1, \ldots, y_n), \qquad y_i = x_i - \bar{x}, \qquad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
$$

Covariance matrix

$$
\frac{1}{n-1} Y Y^T = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T
$$

The factor $\frac{1}{n-1}$ is ignored.

PCA - continuation
Eigenvalues and eigenvectors

$$
Y Y^T u_k = \lambda_k u_k, \qquad Y^T Y v_k = \lambda_k v_k, \qquad v_k = Y^T u_k / \lambda_k^{1/2}
$$

Singular value decomposition (SVD)

$$
Y = \sum_{k} \lambda_k^{1/2} u_k v_k^T
$$

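The eigenvector and SVD relations above can be checked numerically. This is a sketch under assumptions: it uses NumPy, random data with one point per column, and `numpy.linalg.svd`, whose squared singular values play the role of the eigenvalues $\lambda_k$. The variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 50))               # d = 3 features, n = 50 points (columns)

# Centered data matrix Y, with y_i = x_i - x_bar.
Y = X - X.mean(axis=1, keepdims=True)

# Thin SVD: Y = U diag(s) Vt, so s**2 are the eigenvalues lambda_k.
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
lam = s ** 2

k = 0                                      # check the leading component
u_k, v_k = U[:, k], Vt[k]

lhs_u = Y @ Y.T @ u_k                      # should equal lam[k] * u_k
lhs_v = Y.T @ Y @ v_k                      # should equal lam[k] * v_k
v_from_u = Y.T @ u_k / np.sqrt(lam[k])     # should equal v_k
Y_rebuilt = (U * s) @ Vt                   # sum_k lam_k^{1/2} u_k v_k^T
```

The covariance factor $1/(n-1)$ is left out here as well, matching the slides; including it would rescale the eigenvalues but not change the eigenvectors.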
PCA - example
K-means → PCA

Indicator vectors

$$
h_k = (0, \ldots, 0, \overbrace{1, \ldots, 1}^{n_k}, 0, \ldots, 0)^T / n_k^{1/2}, \qquad H_K = (h_1, \ldots, h_K)
$$

Criterion

$$
J_K = \mathrm{Tr}(X^T X) - \mathrm{Tr}(H_K^T X^T X H_K)
$$

Linear transform by a $K \times K$ orthonormal matrix $T$

$$
Q_K = (q_1, \ldots, q_K) = H_K T
$$

Last column of $T$

$$
t_K = \left( \sqrt{n_1 / n}, \ldots, \sqrt{n_K / n} \right)^T
$$

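The trace form of the criterion can be compared against the direct sum of squared errors. The sketch below assumes NumPy, random data, and an arbitrary labeling with no empty cluster; the helper `indicator_matrix` is a hypothetical name introduced only for this illustration.

```python
import numpy as np

def indicator_matrix(labels, K):
    """Build H_K: column k is 1/sqrt(n_k) on the members of cluster k, else 0."""
    H = np.zeros((labels.size, K))
    for k in range(K):
        members = labels == k
        H[members, k] = 1.0 / np.sqrt(members.sum())
    return H

rng = np.random.default_rng(1)
K = 3
X = rng.normal(size=(2, 12))          # d = 2, n = 12, one point per column
labels = np.arange(12) % K            # arbitrary labeling, every cluster non-empty

H = indicator_matrix(labels, K)
gram = X.T @ X

# Trace form: J_K = Tr(X^T X) - Tr(H_K^T X^T X H_K).
J_trace = np.trace(gram) - np.trace(H.T @ gram @ H)

# Direct form: sum over clusters of squared distances to the centroid.
J_direct = 0.0
for k in range(K):
    cluster = X[:, labels == k]
    J_direct += np.sum((cluster - cluster.mean(axis=1, keepdims=True)) ** 2)
```

The two values agree because $X h_k = \sqrt{n_k}\, m_k$, so the second trace equals $\sum_k n_k \|m_k\|^2$. The columns of `H` are also orthonormal, which is what allows the rotation by $T$.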
K-means → PCA - continuation
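
Therefore

$$
q_K = \sqrt{\frac{n_1}{n}}\, h_1 + \cdots + \sqrt{\frac{n_K}{n}}\, h_K = \sqrt{\frac{1}{n}}\, e
$$

where $e$ is the vector of all ones. Since $Y e = 0$ for centered data, $q_K$ drops out of the criterion.

Criterion

$$
J_K = \mathrm{Tr}(Y^T Y) - \mathrm{Tr}(Q_{K-1}^T Y^T Y Q_{K-1})
$$

Optimization becomes

$$
\max_{Q_{K-1}} \mathrm{Tr}(Q_{K-1}^T Y^T Y Q_{K-1})
$$

Solution is the first $K-1$ principal components

$$
Q_{K-1} = (v_1, \ldots, v_{K-1})
$$
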
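Because the relaxed problem is solved by the first $K-1$ principal components, its optimum gives a lower bound on the k-means criterion for any labeling. The following sketch checks this numerically, assuming NumPy, synthetic data, and an arbitrary labeling; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, K, d = 30, 3, 5
labels = np.arange(n) % K                          # arbitrary partition into K clusters
X = rng.normal(size=(d, n)) + 4.0 * labels         # shift each cluster (broadcast over rows)
Y = X - X.mean(axis=1, keepdims=True)              # centered data

# Indicator matrix H_K with entries 1/sqrt(n_k) for cluster members.
sizes = np.bincount(labels, minlength=K)
H = np.zeros((n, K))
H[np.arange(n), labels] = 1.0 / np.sqrt(sizes[labels])

# q_K = H_K t_K should be the constant vector e / sqrt(n).
t_K = np.sqrt(sizes / n)
q_K = H @ t_K

# k-means criterion for this labeling, written with the centered data.
gram = Y.T @ Y
J = np.trace(gram) - np.trace(H.T @ gram @ H)

# Relaxed optimum: Q_{K-1} = first K-1 right singular vectors of Y.
_, s, Vt = np.linalg.svd(Y, full_matrices=False)
Q = Vt[:K - 1].T
relaxed_max = np.trace(Q.T @ gram @ Q)             # sum of the K-1 largest eigenvalues
lower_bound = np.trace(gram) - relaxed_max         # J can never fall below this
```

The bound holds for every labeling, not only the optimal one, since any valid $Q_{K-1}$ derived from indicator vectors is one feasible point of the relaxed maximization.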
PCA → K-means
Clustering by PCA
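
$$
C = e e^T / n + \sum_{k=1}^{K-1} v_k v_k^T = \sum_{k=1}^{K} q_k q_k^T = \sum_{k=1}^{K} h_k h_k^T
$$

Probability of connectivity between $i$ and $j$

$$
p_{ij} = \frac{c_{ij}}{c_{ii}^{1/2}\, c_{jj}^{1/2}}
$$

$$
p_{ij} =
\begin{cases}
0, & \text{if } p_{ij} < \beta \\
1, & \text{if } p_{ij} \geq \beta
\end{cases}
\qquad 0 < \beta < 1, \quad \text{usually } \beta = 0.5
$$
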
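The connectivity construction can be sketched on synthetic data. The example below assumes NumPy, three well-separated 2-D clusters with very small noise, and a threshold of 0.5; the cluster centers, noise level, and variable names are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
K, n_per = 3, 10
centers = np.array([[0.0, 10.0, 0.0],
                    [0.0, 0.0, 10.0]])             # d = 2, one center per column
labels = np.repeat(np.arange(K), n_per)            # ground-truth cluster of each point
X = centers[:, labels] + 0.01 * rng.normal(size=(2, K * n_per))
n = X.shape[1]

# First K-1 principal directions v_1, ..., v_{K-1} of the centered data.
Y = X - X.mean(axis=1, keepdims=True)
_, _, Vt = np.linalg.svd(Y, full_matrices=False)
V = Vt[:K - 1].T

# Connectivity matrix C = e e^T / n + sum_k v_k v_k^T.
e = np.ones((n, 1))
C = e @ e.T / n + V @ V.T

# Normalized connectivity p_ij = c_ij / sqrt(c_ii * c_jj), then thresholding.
diag_root = np.sqrt(np.diag(C))
P = C / np.outer(diag_root, diag_root)
beta = 0.5
P_bin = (P >= beta).astype(int)

# Reference: 1 where two points share a ground-truth cluster, else 0.
same_cluster = (labels[:, None] == labels[None, :]).astype(int)
```

On data this cleanly separated, the thresholded matrix reproduces the block structure of the true clusters. With overlapping clusters the blocks become blurred and the choice of threshold matters more.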
Eigenvalues
• Case 1: 164030, 58, 5
• Case 2: 212920, 1892, 157