Spectral Clustering
Jianping Fan
Dept of Computer Science
UNC, Charlotte

Lecture Outline
Motivation
Graph overview and construction
Spectral Clustering
Cool implementations

Semantic interpretations of clustering
[Figure: clusters]

Spectral Clustering Example – 2 Spirals
The dataset exhibits complex cluster shapes. K-means performs very poorly in the original space because of its bias toward dense, spherical clusters. In the embedded space given by the two leading eigenvectors, the clusters are trivial to separate.
[Figure: the two-spirals data in the original space, and the same points in the embedding given by the two leading eigenvectors]

Spectral Clustering Example
[Figure panels: Original Points; K-means (2 Clusters)]
Why does k-means fail on these two examples?

Graph-based Representation of Data Similarity
[Figure: similarity]

Graph-based Representation of Data Relationships
Manifold
Data Graph Construction
Graph Cut
Eigenvectors & Eigenvalues

Normalized Cut
A graph G(V, E) can be partitioned into two disjoint sets A and B, with A ∪ B = V and A ∩ B = ∅. Writing w(u, v) for the weight (similarity) of edge (u, v), the cut is defined as
$$\mathrm{cut}(A, B) = \sum_{u \in A,\, v \in B} w(u, v)$$
The optimal partition of G is achieved by minimizing the cut: $\min \mathrm{cut}(A, B)$.
Minimizing the plain cut tends to favor cutting off small, isolated sets of nodes. The normalized cut corrects for this by measuring the cut relative to the association between each partition set and the whole graph:
$$\mathrm{assoc}(A, V) = \sum_{u \in A,\, t \in V} w(u, t)$$
$$\mathrm{Ncut}(A, B) = \frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(A, V)} + \frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(B, V)}$$
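The cut and normalized cut, Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V), can be checked numerically on a small graph. This is a minimal NumPy sketch, assuming the graph is stored as a dense symmetric weight matrix W with W[u, v] = w(u, v); the function names cut, assoc and ncut and the 4-node example graph are hypothetical illustrations.

```python
import numpy as np

def cut(W, A, B):
    """cut(A, B): total weight of edges with one end in A and the other in B."""
    return W[np.ix_(A, B)].sum()

def assoc(W, A):
    """assoc(A, V): total weight of edges from the nodes in A to every node in V."""
    return W[A, :].sum()

def ncut(W, A, B):
    """Normalized cut: the cut measured relative to each side's association with V."""
    c = cut(W, A, B)
    return c / assoc(W, A) + c / assoc(W, B)

# Hypothetical graph: two strongly linked pairs {0, 1} and {2, 3} joined by weak edges
W = np.array([
    [0.0, 1.0, 0.1, 0.0],
    [1.0, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 1.0],
    [0.0, 0.1, 1.0, 0.0],
])

print(round(ncut(W, [0, 1], [2, 3]), 4))   # 0.1818: balanced split along the weak edges
print(round(ncut(W, [0], [1, 2, 3]), 4))   # 1.3333: cutting off a single node is penalized
```

The balanced split cuts only the two weak edges (cut = 0.2), so its Ncut is small; isolating node 0 cuts a strong edge and leaves a small side, so both terms of Ncut grow.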
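In matrix form, with the affinity matrix W ($W_{ij} = w(i, j)$) and the diagonal degree matrix D ($D_{ii} = \sum_j W_{ij}$), the normalized cut becomes
$$\min_{A, B} \mathrm{Ncut}(A, B) = \min_{y} \frac{y^{T}(D - W)\,y}{y^{T} D\, y}, \qquad y_i \in \{1, -b\}, \quad y^{T} D \mathbf{1} = 0,$$
where $y_i$ indicates whether node i lies in A or B and b > 0 is a constant determined by the partition. Relaxing y to take real values, the normalized cut can be solved by the generalized eigenvalue equation
$$(D - W)\,y = \lambda D\, y,$$
and the eigenvector with the second-smallest eigenvalue gives the relaxed partition.

K-way Min-Max Cut
For k clusters, the min-max principle maximizes intra-cluster similarity while minimizing inter-cluster similarity; combining the two gives the decision function for spectral clustering.

Mathematical Description of Spectral Clustering
The decision function for spectral clustering is refined by further definitions, and the refined decision function is then solved.

Spectral Clustering Algorithm (Ng, Jordan, and Weiss)
Motivation: given a set of points $S = \{s_1, \dots, s_n\} \subset \mathbb{R}^l$, we would like to cluster them into k subsets.

Algorithm
1. Form the affinity matrix $W \in \mathbb{R}^{n \times n}$ defined by $W_{ij} = \exp\left(-\|s_i - s_j\|^2 / 2\sigma^2\right)$ if $i \neq j$, and $W_{ii} = 0$. The scaling parameter σ is chosen by the user.
2. Define D as the diagonal matrix whose (i, i) element is the sum of W's row i.
3. Form the matrix $L = D^{-1/2} W D^{-1/2}$.
4. Find $x_1, x_2, \dots, x_k$, the eigenvectors of L with the k largest eigenvalues, and use them as the columns of the new matrix $X = [x_1\, x_2 \cdots x_k] \in \mathbb{R}^{n \times k}$. Note that this reduces the dimension from n × n to n × k.
5. Form the matrix $Y \in \mathbb{R}^{n \times k}$ by renormalizing each of X's rows to unit length: $Y_{ij} = X_{ij} / \left(\sum_j X_{ij}^2\right)^{1/2}$.
6. Treat each row of Y as a point in $\mathbb{R}^k$ and cluster the rows into k clusters via k-means.
7. Final cluster assignment: assign point $s_i$ to cluster j if and only if row i of Y was assigned to cluster j.

Why? If we eventually use k-means, why not just apply k-means to the original data? Because this method allows us to cluster non-convex regions.

Some Examples

User's Prerogative
- Affinity matrix construction
- Choice of the scaling factor σ: realistically, search over σ² and pick the value that gives the tightest clusters
- Choice of k, the number of clusters
- Choice of clustering method

How to select k?
Eigengap: the difference between two consecutive eigenvalues.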
The most stable clustering is generally given by the value of k that maximises the expression
$$\Delta_k = |\lambda_k - \lambda_{k-1}|$$
[Figure: largest eigenvalues of the Cisi/Medline data, eigenvalue versus k for k = 1 to 20; the largest gap is max Δ_k = |λ_2 − λ_1|, so choose k = 2]

Recap – The bottom line

Summary
- Spectral clustering can help us in hard clustering problems.
- The technique is simple to understand.
- The solution comes from solving a simple linear algebra (eigenvector) problem, which is not hard to implement.
- Great care should be taken in choosing the "starting conditions".
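The generalized eigenvalue equation (D − W)y = λDy from the Normalized Cut section can be solved with a dense symmetric eigendecomposition. This is a minimal sketch, assuming NumPy and a dense symmetric W whose rows all have positive sums: it rewrites the problem in the equivalent symmetric form D^{-1/2}(D − W)D^{-1/2}z = λz, takes the eigenvector with the second-smallest eigenvalue, maps it back with y = D^{-1/2}z, and splits the nodes by the sign of y. The sign split is one simple choice (Shi and Malik also discuss other splitting points, such as the median); ncut_bipartition and the example graph are hypothetical.

```python
import numpy as np

def ncut_bipartition(W):
    """Two-way partition from the relaxed normalized cut (D - W) y = lambda D y."""
    d = W.sum(axis=1)                          # degrees: D_ii = sum_j W_ij
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric form: I - D^{-1/2} W D^{-1/2} = D^{-1/2} (D - W) D^{-1/2}
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)         # eigenvalues in ascending order
    y = d_inv_sqrt * eigvecs[:, 1]             # second-smallest generalized eigenvector
    return y >= 0                              # boolean side label for each node

# Hypothetical graph: two strongly linked pairs {0, 1} and {2, 3} joined by weak edges
W = np.array([
    [0.0, 1.0, 0.1, 0.0],
    [1.0, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 1.0],
    [0.0, 0.1, 1.0, 0.0],
])
labels = ncut_bipartition(W)
print(labels)
```

The smallest eigenvalue is 0 with a constant y, which carries no partition information; that is why the second eigenvector is used. Eigenvectors are only defined up to sign, so which side is labeled True can vary, but nodes 0 and 1 always land on the opposite side from nodes 2 and 3 here.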
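The Ng, Jordan, and Weiss steps described earlier can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: it assumes dense matrices (O(n²) memory), uses numpy.linalg.eigh for step 4, and substitutes a plain Lloyd's k-means with deterministic farthest-point initialization (simple_kmeans, a hypothetical helper) for step 6. The two-rings data set and σ = 0.5 are hypothetical values chosen for the example.

```python
import numpy as np

def simple_kmeans(Y, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(1, k):
        # Next center: the row farthest from all centers chosen so far
        dist = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1), axis=1)
        new_centers = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def njw_spectral_clustering(S, k, sigma):
    """Sketch of the Ng-Jordan-Weiss algorithm; returns one cluster label per point."""
    S = np.asarray(S, dtype=float)
    # Step 1: W_ij = exp(-||s_i - s_j||^2 / (2 sigma^2)) for i != j, W_ii = 0
    sq_dists = ((S[:, None, :] - S[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Step 2: D_ii = sum of row i of W (kept as a vector of degrees)
    d = W.sum(axis=1)
    # Step 3: L = D^{-1/2} W D^{-1/2}
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # Step 4: columns of X = eigenvectors of L with the k largest eigenvalues (eigh sorts ascending)
    _, eigvecs = np.linalg.eigh(L)
    X = eigvecs[:, -k:]
    # Step 5: renormalize each row of X to unit length
    Y = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Steps 6-7: k-means on the rows of Y; label i is the cluster of point s_i
    return simple_kmeans(Y, k)

# Hypothetical data: two concentric rings (radius 1 and 4), 30 points each
theta = np.linspace(0, 2 * np.pi, 30, endpoint=False)
inner = np.c_[np.cos(theta), np.sin(theta)]
outer = 4 * np.c_[np.cos(theta), np.sin(theta)]
S = np.vstack([inner, outer])
labels = njw_spectral_clustering(S, k=2, sigma=0.5)
print(labels)
```

The two rings are a non-convex case: k-means on the raw coordinates can only split the plane with a straight boundary, so it cannot separate the inner ring from the outer one. After steps 1 to 5, each ring collapses to a nearly identical unit-length row in Y, and k-means separates the two groups easily.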
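The eigengap rule can be applied mechanically once the eigenvalues are sorted. This is a small sketch, assuming NumPy and following the slide's definition Δ_k = |λ_k − λ_{k−1}| with λ_1 ≥ λ_2 ≥ ...; choose_k_by_eigengap is a hypothetical helper, and the eigenvalues in the example are made-up numbers, not the Cisi/Medline values.

```python
import numpy as np

def choose_k_by_eigengap(eigenvalues):
    """Return the k (k >= 2) that maximises Delta_k = |lambda_k - lambda_{k-1}|."""
    # Sort in descending order: lam[0] = lambda_1 >= lam[1] = lambda_2 >= ...
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    # gaps[i] = |lam[i+1] - lam[i]|, which is Delta_{i+2} in 1-based indexing
    gaps = np.abs(np.diff(lam))
    return int(np.argmax(gaps)) + 2

# Hypothetical eigenvalues with one large drop after the first eigenvalue
print(choose_k_by_eigengap([45.0, 20.0, 8.0, 6.0, 5.0]))  # 2
```

Other references phrase the heuristic as picking the k with the largest gap between λ_k and λ_{k+1}; that indexing can return a k smaller by one, so it is worth checking which convention a given method assumes.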