spectral clustering between friends

spectral clustering (a la Ng-Jordan-Weiss)
data, e.g. a similarity graph: edges have weights $w(i,j)$
the Laplacian: diagonal matrix $D$
Normalized Laplacian: energy
Normalized Laplacian: spectral embedding
Compute the first $k$ eigenvectors: $v_1, v_2, \ldots, v_k$
clustering: run k-means to cluster the points

spectral clustering … what to prove?
Sidi et al. 2011 [TelAviv-SFU]
Many, many variants…
Many opinions

why should spectral clustering work?
spectral embedding
$k$ perfect clusters

graph expansion
Expansion: for a subset $S \subseteq V$, define $E(S)$ = set of edges with one endpoint in $S$.
[Figure: vertex set partitioned into $S_1, S_2, S_3, S_4$]
k-way expansion constant: $\rho_G(k) = \min \{ \max_i \phi(S_i) : S_1, S_2, \ldots, S_k \subseteq V \text{ disjoint} \}$
Theorem [Cheeger70, Alon-Milman85, Sinclair-Jerrum89]: $\frac{\lambda_2}{2} \le \rho_G(2) \le \sqrt{2\lambda_2}$
“most important result in spectral graph theory” -- Wikipedia

Miclo’s conjecture
Higher-order Cheeger Conjecture [Miclo 08]: for every graph $G$ and $k \in \mathbb{N}$, we have
$\frac{\lambda_k}{2} \le \rho_G(k) \le C(k) \sqrt{\lambda_k}$
for some $C(k)$ depending only on $k$.
[Lee-OveisGharan-Trevisan 2012]: True with [formula missing]. This bound for $C(k)$ is tight.
Algorithm of Ng-Jordan-Weiss works, changing the last step.

the clustering step
Run k-means to cluster the points
we do: random projection, random space partition
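The Ng-Jordan-Weiss pipeline named in the slides (similarity graph, normalized Laplacian, first k eigenvectors, k-means) can be sketched roughly as follows. This is a minimal illustration, not the speaker's code: the function names, the toy graph (two 4-cliques joined by one weak edge), and the farthest-first k-means initialization are all assumptions made for the example, and it uses the plain k-means last step rather than the random projection / random partition variant mentioned in the talk.

```python
import numpy as np

def spectral_embedding(W, k):
    """Embed each vertex using the first k eigenvectors of the normalized Laplacian."""
    d = W.sum(axis=1)                      # weighted degrees (diagonal of D)
    d_inv_sqrt = 1.0 / np.sqrt(d)          # assumes every vertex has positive degree
    # Normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    V = eigvecs[:, :k]                     # columns v1, ..., vk
    # Ng-Jordan-Weiss step: rescale each row (embedded point) to unit length
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    return eigvals[:k], V

def kmeans(X, k, iters=50):
    """Plain Lloyd iterations; farthest-first start keeps the sketch deterministic."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def spectral_clustering(W, k):
    _, X = spectral_embedding(W, k)
    return kmeans(X, k)

# Toy similarity graph (hypothetical data): two 4-cliques joined by a weak edge.
W = np.zeros((8, 8))
W[:4, :4] = 1.0
W[4:, 4:] = 1.0
np.fill_diagonal(W, 0.0)
W[3, 4] = W[4, 3] = 0.1

eigvals, X = spectral_embedding(W, 2)
labels = spectral_clustering(W, 2)
print(labels)
```

On this toy graph the two cliques should land in different clusters; which clique receives label 0 depends on the initialization.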
hybrid algorithms
Suppose the data has some nice low-dimensional structure.
Spectral embedding could lose that information: back in a high-dimensional space.
Use spectral embedding distances to deform the data.
Do clustering on the transformed data set.

unraveling the mysteries of complexity

the unique games conjecture
Consider linear equations in two variables, modulo a prime $p$.
Variables: $x_1, x_2, \ldots, x_n$
$x_{12} + x_2 = 4$
$x_4 - 3x_7 = 1$
$x_9 + 8x_{12} = 9$
…
If there exists a solution that satisfies 99% of the equations, can you find one that satisfies 10%?
Conjectured to be NP-hard [Khot 2002]

a spectral attack
Construct a graph with one vertex for every variable, and an edge whenever two variables occur in the same constraint.
A “good” solution to the equations implies a partition of the graph into $p$ nice clusters!

Higher-order Cheeger Theorem: for every graph $G$ and $k \in \mathbb{N}$, we have [formula missing]
Unnecessary for large $k$: [Arora-Barak-Steurer 2010]
A better asymptotic dependence would disprove the UGC.
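The constraint graph from the spectral attack slide (one vertex per variable, an edge whenever two variables share a constraint) can be illustrated with a small sketch. The tuple encoding `(a, i, b, j, c)` for a*x_i + b*x_j = c (mod p), the choice p = 11, the sample assignment, and both function names are assumptions made for this example; the talk does not fix any of them.

```python
# Each constraint (a, i, b, j, c) encodes a*x_i + b*x_j = c (mod p).
# The encoding and the prime P = 11 are assumptions made for the example.
P = 11
equations = [
    (1, 12, 1, 2, 4),   # x12 + x2 = 4
    (1, 4, -3, 7, 1),   # x4 - 3*x7 = 1
    (1, 9, 8, 12, 9),   # x9 + 8*x12 = 9
]

def constraint_graph(eqs):
    """One vertex per variable; an edge for each pair that shares a constraint."""
    return {(min(i, j), max(i, j)) for _, i, _, j, _ in eqs}

def satisfied_fraction(eqs, assignment, p):
    """Fraction of constraints satisfied by `assignment` (dict: index -> value mod p)."""
    ok = sum((a * assignment[i] + b * assignment[j] - c) % p == 0
             for a, i, b, j, c in eqs)
    return ok / len(eqs)

# A hypothetical assignment that happens to satisfy all three sample equations.
assignment = {2: 3, 4: 1, 7: 0, 9: 1, 12: 1}
edges = constraint_graph(equations)
fraction = satisfied_fraction(equations, assignment, P)
```

Here `edges` holds the graph's edge set as sorted index pairs, and `satisfied_fraction` is the quantity the 99% versus 10% question on the unique games slide is about.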