K-means Clustering via Principal Component Analysis

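According to the paper by Chris Ding and Xiaofeng He, Int'l Conf. Machine Learning, Banff, Canada, 2004.

Traditional K-means Clustering

K-means minimizes the sum of squared errors

$$J_K = \sum_{k=1}^{K} \sum_{i \in C_k} (x_i - m_k)^2,$$

where the data matrix is $X = (x_1, \dots, x_n)$ and each data point $x_i$ is a $d$-dimensional vector. The centroid of cluster $C_k$ is

$$m_k = \frac{1}{n_k} \sum_{i \in C_k} x_i,$$

where $n_k$ is the number of points in $C_k$.

Principal Component Analysis (PCA)

Centered data matrix:

$$Y = (y_1, \dots, y_n), \qquad y_i = x_i - \bar{x}, \qquad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

Covariance matrix:

$$\frac{1}{n-1} Y Y^T = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T$$

The factor $\frac{1}{n-1}$ is ignored from here on.

PCA - continuation

Eigenvalues and eigenvectors:

$$Y Y^T u_k = \lambda_k u_k, \qquad Y^T Y v_k = \lambda_k v_k, \qquad v_k = Y^T u_k / \lambda_k^{1/2}$$

Singular value decomposition (SVD):

$$Y = \sum_k \lambda_k^{1/2} u_k v_k^T$$

PCA - example

(figure not preserved)

K-means → PCA

Indicator vectors, where $h_k$ has $n_k$ nonzero entries marking the members of $C_k$:

$$h_k = (0, \dots, 0, \underbrace{1, \dots, 1}_{n_k}, 0, \dots, 0)^T / n_k^{1/2}, \qquad H_K = (h_1, \dots, h_K)$$

Criterion:

$$J_K = \mathrm{Tr}(X^T X) - \mathrm{Tr}(H_K^T X^T X H_K)$$

Linear transform by a $K \times K$ orthonormal matrix $T$:

$$Q_K = (q_1, \dots, q_K) = H_K T$$

The last column of $T$ is

$$t_K = \left(\sqrt{n_1/n}, \dots, \sqrt{n_K/n}\right)^T$$

K-means → PCA - continuation

With this choice of $t_K$, the last column of $Q_K$ is constant:

$$q_K = \sqrt{\frac{n_1}{n}}\, h_1 + \dots + \sqrt{\frac{n_K}{n}}\, h_K = \sqrt{\frac{1}{n}}\, e,$$

where $e$ is the vector of all ones. Therefore the criterion can be written in terms of the centered data:

$$J_K = \mathrm{Tr}(Y^T Y) - \mathrm{Tr}(Q_{K-1}^T Y^T Y Q_{K-1})$$

The optimization becomes

$$\max_{Q_{K-1}} \mathrm{Tr}(Q_{K-1}^T Y^T Y Q_{K-1})$$

When the entries of $Q_{K-1}$ are allowed to take continuous values, the solution is given by the first $K-1$ principal components:

$$Q_{K-1} = (v_1, \dots, v_{K-1})$$

PCA → K-means

Clustering by PCA:

$$C = \frac{e e^T}{n} + \sum_{k=1}^{K-1} v_k v_k^T \approx \sum_{k=1}^{K} q_k q_k^T = \sum_{k=1}^{K} h_k h_k^T$$

Probability of connectivity between points $i$ and $j$:

$$p_{ij} = \frac{c_{ij}}{c_{ii}^{1/2}\, c_{jj}^{1/2}}$$

The probabilities are then thresholded:

$$p_{ij} = \begin{cases} 0, & \text{if } p_{ij} < \beta \\ 1, & \text{if } p_{ij} \ge \beta \end{cases} \qquad 0 < \beta < 1, \text{ usually } \beta = 0.5$$

Eigenvalues

• Case 1: 164030, 58, 5
• Case 2: 212920, 1892, 157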
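The quantities in these slides can be checked numerically on a small example. The following is a minimal NumPy sketch, not the authors' implementation. The toy data, the variable names, and the helper functions `kmeans_objective` and `trace_objective` are assumptions made for illustration. It follows the slides' convention that the data matrix X stores one point per column. For K = 2 it splits the points by the sign of the first principal component v_1 (the two-cluster special case of the result in the cited paper), builds the connectivity matrix C with the usual threshold of 0.5, and compares the sum-of-squared-errors form of J_K with its trace form.

```python
import numpy as np

# Toy data (an assumption for illustration): two well-separated groups of
# five 2-D points, stored as columns so that X is d x n as in the slides.
X = np.array([
    [0.0, 1.0, 0.0, 1.0, 0.5, 10.0, 11.0, 10.0, 11.0, 10.5],
    [0.0, 0.0, 1.0, 1.0, 0.5, 10.0, 10.0, 11.0, 11.0, 10.5],
])
n = X.shape[1]
K = 2

# Centered data matrix Y: subtract the mean point from every column.
Y = X - X.mean(axis=1, keepdims=True)

# SVD of Y. The squared singular values are the eigenvalues lambda_k of
# Y Y^T (and of Y^T Y); the rows of Vt are the eigenvectors v_k of Y^T Y.
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
eigvals = s ** 2
V = Vt[:K - 1].T  # n x (K-1) matrix holding v_1, ..., v_{K-1}

# For K = 2, split the points by the sign of v_1. The overall sign of a
# singular vector is arbitrary, so only the partition is meaningful.
labels = V[:, 0] > 0

# Connectivity matrix C = e e^T / n + sum_k v_k v_k^T, normalized to
# p_ij = c_ij / (c_ii^{1/2} c_jj^{1/2}) and thresholded at 0.5.
C = np.ones((n, n)) / n + V @ V.T
scale = np.sqrt(np.diag(C))
P_raw = C / np.outer(scale, scale)
threshold = 0.5
P = (P_raw >= threshold).astype(int)


def kmeans_objective(X, labels):
    """J_K as the sum of squared distances to the cluster centroids."""
    total = 0.0
    for c in np.unique(labels):
        pts = X[:, labels == c]
        total += ((pts - pts.mean(axis=1, keepdims=True)) ** 2).sum()
    return total


def trace_objective(X, labels):
    """J_K as Tr(X^T X) - Tr(H^T X^T X H) with scaled indicator vectors."""
    classes = np.unique(labels)
    H = np.zeros((X.shape[1], len(classes)))
    for k, c in enumerate(classes):
        members = labels == c
        H[members, k] = 1.0 / np.sqrt(members.sum())
    G = X.T @ X
    return np.trace(G) - np.trace(H.T @ G @ H)


J_direct = kmeans_objective(X, labels)
J_trace = trace_objective(X, labels)

print("partition:", labels.astype(int))
print("eigenvalues:", eigvals)
print("J_K (direct):", J_direct, " J_K (trace form):", J_trace)
```

On data this cleanly separated, the thresholded matrix P is block-diagonal, with ones inside each group and zeros between groups. On overlapping clusters the blocks would be noisier, and the sign split only applies to K = 2.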
