spectral clustering between friends
spectral clustering (à la Ng-Jordan-Weiss)
data
similarity graph
edges have weights w(i,j)
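A minimal sketch of the similarity-graph step, assuming the data points are the rows of a NumPy array and that the weights come from a Gaussian kernel $w(i,j) = \exp(-\|x_i - x_j\|^2 / 2\sigma^2)$ with no self-loops. The kernel and the bandwidth `sigma` are assumptions here (a common choice, not the only one), and the function name `gaussian_similarity` is hypothetical.

```python
import numpy as np

def gaussian_similarity(X: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Dense weight matrix with w(i,j) = exp(-||x_i - x_j||^2 / (2 sigma^2)) and w(i,i) = 0."""
    # Pairwise squared Euclidean distances via broadcasting: shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    # No self-loops in the similarity graph.
    np.fill_diagonal(W, 0.0)
    return W

# Hypothetical example: two nearby points and one far-away point.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
W = gaussian_similarity(X, sigma=1.0)
```

Nearby points get weights close to 1 and distant points get weights close to 0, so `sigma` sets the length scale at which two points count as "similar".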
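the Laplacian
diagonal matrix D, with $D(i,i) = \sum_j w(i,j)$ (the weighted degree of vertex i)
Normalized Laplacian: $\mathcal{L} = I - D^{-1/2} W D^{-1/2}$, where $W = (w(i,j))$ is the weight matrix
energy
spectral embedding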
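As a sketch, assuming a dense symmetric weight matrix `W` with zero diagonal and no isolated vertices, the normalized Laplacian $I - D^{-1/2} W D^{-1/2}$ can be formed directly, and its energy $f^\top \mathcal{L} f$ can be checked against the edge-sum form $\sum_{i<j} w(i,j)\,(g(i) - g(j))^2$ with $g = D^{-1/2} f$. The helper names `normalized_laplacian` and `edge_energy` are hypothetical.

```python
import numpy as np

def normalized_laplacian(W: np.ndarray) -> np.ndarray:
    """Return I - D^{-1/2} W D^{-1/2}, where D(i,i) = sum_j w(i,j)."""
    d = W.sum(axis=1)                  # weighted degrees; assumed strictly positive
    d_inv_sqrt = 1.0 / np.sqrt(d)
    return np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

def edge_energy(W: np.ndarray, f: np.ndarray) -> float:
    """Sum over pairs i < j of w(i,j) * (g(i) - g(j))^2, with g = D^{-1/2} f."""
    g = f / np.sqrt(W.sum(axis=1))
    n = len(f)
    return float(sum(W[i, j] * (g[i] - g[j]) ** 2
                     for i in range(n) for j in range(i + 1, n)))

# Hypothetical example: random symmetric weights on 5 vertices, no self-loops.
rng = np.random.default_rng(0)
A = rng.random((5, 5))
W = (A + A.T) / 2
np.fill_diagonal(W, 0.0)
L = normalized_laplacian(W)
f = rng.standard_normal(5)
print(f @ L @ f, edge_energy(W, f))   # the two numbers agree up to rounding
```

The edge-sum form makes the energy visibly nonnegative, and it vanishes for $f = D^{1/2}\mathbf{1}$, so the smallest eigenvalue of $\mathcal{L}$ is 0.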
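Compute the first k eigenvectors $v_1, v_2, \ldots, v_k$ (those of the k smallest eigenvalues $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_k$ of the normalized Laplacian)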
clustering
Run k-means to cluster the points
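The three steps above (normalized Laplacian, first k eigenvectors, k-means) can be sketched end to end as follows. Assumptions: `W` is a dense symmetric NumPy weight matrix with positive degrees; each row of the embedding is rescaled to unit length, as in the Ng-Jordan-Weiss paper; and k-means is a plain Lloyd iteration with deterministic farthest-point initialization rather than any particular library's implementation. All function names are hypothetical.

```python
import numpy as np

def spectral_embedding(W: np.ndarray, k: int) -> np.ndarray:
    """Rows are the embedded points (v_1(i), ..., v_k(i)), rescaled to unit length."""
    d = W.sum(axis=1)                                    # degrees, assumed > 0
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)                          # eigenvalues in ascending order
    F = vecs[:, :k]                                      # first k eigenvectors as columns
    return F / np.linalg.norm(F, axis=1, keepdims=True)  # NJW row normalization

def kmeans(P: np.ndarray, k: int, iters: int = 100) -> np.ndarray:
    """Plain Lloyd iterations with deterministic farthest-point initialization."""
    centers = [P[0]]
    for _ in range(1, k):
        # Squared distance from each point to its nearest chosen center.
        dist = np.min([np.sum((P - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(P[np.argmax(dist)])
    centers = np.array(centers)
    labels = np.zeros(len(P), dtype=int)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = np.argmin(((P[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1), axis=1)
        # Move each center to the mean of its points (keep it if its cluster is empty).
        new_centers = np.array([P[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                                for c in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def spectral_clustering(W: np.ndarray, k: int) -> np.ndarray:
    return kmeans(spectral_embedding(W, k), k)

# Hypothetical example: two 4-cliques joined by one weak edge.
W = np.zeros((8, 8))
W[:4, :4] = 1.0
W[4:, 4:] = 1.0
np.fill_diagonal(W, 0.0)
W[3, 4] = W[4, 3] = 0.01
labels = spectral_clustering(W, 2)
```

On this example the weak edge barely perturbs the two-cluster structure, so the embedding places the two cliques near two well-separated unit vectors and k-means recovers them.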
spectral clustering
… what to prove?
Sidi et al. 2011 [Tel Aviv-SFU]
Many, many variants…
Many opinions
why should spectral clustering work?
spectral embedding
k perfect clusters
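A small numerical illustration of the "k perfect clusters" case, using a hypothetical toy graph made of two disconnected cliques ($K_3$ and $K_4$). The normalized Laplacian $I - D^{-1/2} W D^{-1/2}$ then has eigenvalue 0 with multiplicity k = 2, and after row normalization the spectral embedding sends every vertex of a cluster to the same unit vector, with the two clusters landing on orthogonal vectors. This holds for any orthonormal basis of the zero eigenspace, which is why the final clustering step has an easy job here.

```python
import numpy as np

# Hypothetical toy graph with k = 2 perfect clusters: a triangle (vertices 0-2)
# and a 4-clique (vertices 3-6), with no edges between them.
W = np.zeros((7, 7))
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)

d = W.sum(axis=1)
L = np.eye(7) - W / np.sqrt(np.outer(d, d))   # I - D^{-1/2} W D^{-1/2}
vals, vecs = np.linalg.eigh(L)

k = 2
F = vecs[:, :k]                                 # eigenvectors of the k smallest eigenvalues
F = F / np.linalg.norm(F, axis=1, keepdims=True)

print(np.round(vals, 6))   # two (numerically) zero eigenvalues, then a gap
print(np.round(F, 6))      # rows 0-2 coincide, rows 3-6 coincide
```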
graph expansion
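Expansion: For a subset $S \subseteq V$, define $\phi(S) = \frac{w(E(S))}{\mathrm{vol}(S)}$, where $\mathrm{vol}(S) = \sum_{i \in S} D(i,i)$.
E(S) = set of edges with exactly one endpoint in S.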
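A sketch of the expansion of a single set, under the assumption (ours, consistent with the normalized-Laplacian setting of this talk) that $\phi(S)$ is the boundary weight $w(E(S))$ divided by the volume $\mathrm{vol}(S) = \sum_{i \in S} D(i,i)$. `W` is a dense symmetric weight matrix, and the name `expansion` is hypothetical.

```python
import numpy as np

def expansion(W: np.ndarray, S) -> float:
    """phi(S) = w(E(S)) / vol(S), where E(S) = edges with exactly one endpoint in S."""
    in_S = np.zeros(W.shape[0], dtype=bool)
    in_S[list(S)] = True
    boundary = W[np.ix_(in_S, ~in_S)].sum()   # total weight of edges leaving S
    vol = W[in_S].sum()                        # sum of degrees D(i,i) over i in S
    return float(boundary / vol)

# Hypothetical example: the 6-cycle with unit weights.
n = 6
W = np.zeros((n, n))
for i in range(n):
    W[i, (i + 1) % n] = W[(i + 1) % n, i] = 1.0

print(expansion(W, {0, 1, 2}))   # 2 boundary edges / volume 6
print(expansion(W, {0}))         # 2 boundary edges / volume 2
```

Dividing by volume rather than by the number of vertices keeps high-degree sets from looking artificially well connected, which is the normalization that matches the normalized Laplacian.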
k-way expansion constant:
$\rho_G(k) = \min\{\max_i \phi(S_i) : S_1, S_2, \ldots, S_k \subseteq V \text{ disjoint}\}$
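For intuition, $\rho_G(k)$ can be computed by brute force on a tiny graph by enumerating every way to pick k disjoint nonempty sets. This is exponential in the number of vertices and only meant as a sanity check. It takes the expansion to be $\phi(S) = w(E(S))/\mathrm{vol}(S)$ (an assumption of this sketch), and the names `expansion` and `rho` are hypothetical.

```python
import itertools
import numpy as np

def expansion(W: np.ndarray, members) -> float:
    """phi(S) = w(E(S)) / vol(S) for the vertex list `members`."""
    in_S = np.zeros(W.shape[0], dtype=bool)
    in_S[members] = True
    return float(W[np.ix_(in_S, ~in_S)].sum() / W[in_S].sum())

def rho(W: np.ndarray, k: int) -> float:
    """Brute-force rho_G(k): min over k disjoint nonempty sets of the max expansion."""
    n = W.shape[0]
    best = np.inf
    # Label 0 means "vertex unused"; labels 1..k mean membership in S_1..S_k.
    for labels in itertools.product(range(k + 1), repeat=n):
        sets = [[v for v in range(n) if labels[v] == i] for i in range(1, k + 1)]
        if any(len(S) == 0 for S in sets):
            continue
        best = min(best, max(expansion(W, S) for S in sets))
    return best

# Hypothetical example: the 6-cycle with unit weights.
n = 6
W = np.zeros((n, n))
for i in range(n):
    W[i, (i + 1) % n] = W[(i + 1) % n, i] = 1.0

print(rho(W, 2), rho(W, 3))   # two arcs of 3 vertices; three arcs of 2 vertices
```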
Theorem [Cheeger70, Alon-Milman85, Sinclair-Jerrum89]:
$\frac{\lambda_2}{2} \le \rho_G(2) \le \sqrt{2\lambda_2}$
“most important result in spectral graph theory”
-- Wikipedia
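The inequality can be sanity-checked numerically on small graphs. For k = 2, $\rho_G(2)$ equals the minimum of $\phi(S)$ over nonempty S with $\mathrm{vol}(S) \le \mathrm{vol}(V)/2$: of two disjoint sets, one has at most half the volume, and pairing such an S with its complement gives $\max(\phi(S), \phi(V \setminus S)) = \phi(S)$ because both share the same boundary. The sketch below brute-forces that minimum and compares it with $\lambda_2$ of $I - D^{-1/2} W D^{-1/2}$. It assumes $\phi(S) = w(E(S))/\mathrm{vol}(S)$; the test graphs and function names are hypothetical.

```python
import itertools
import numpy as np

def conductance(W: np.ndarray) -> float:
    """min phi(S) over nonempty S with vol(S) <= vol(V)/2, by brute force."""
    n = W.shape[0]
    deg = W.sum(axis=1)
    total = deg.sum()
    best = np.inf
    for r in range(1, n):
        for S in itertools.combinations(range(n), r):
            mask = np.zeros(n, dtype=bool)
            mask[list(S)] = True
            vol = deg[mask].sum()
            if vol <= total / 2:
                best = min(best, W[np.ix_(mask, ~mask)].sum() / vol)
    return float(best)

def lambda2(W: np.ndarray) -> float:
    """Second-smallest eigenvalue of the normalized Laplacian."""
    d = W.sum(axis=1)
    L = np.eye(len(d)) - W / np.sqrt(np.outer(d, d))
    return float(np.linalg.eigvalsh(L)[1])

# Hypothetical test graphs: a 6-vertex path, and two 4-cliques joined by one edge.
P = np.zeros((6, 6))
for i in range(5):
    P[i, i + 1] = P[i + 1, i] = 1.0
B = np.zeros((8, 8))
B[:4, :4] = 1.0
B[4:, 4:] = 1.0
np.fill_diagonal(B, 0.0)
B[3, 4] = B[4, 3] = 1.0

for G in (P, B):
    lam, phi = lambda2(G), conductance(G)
    print(lam / 2, phi, np.sqrt(2 * lam))   # lower bound, rho_G(2), upper bound
```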
Miclo’s conjecture
Higher-order Cheeger Conjecture [Miclo 08]:
For every graph G and $k \in \mathbb{N}$, we have
$\frac{\lambda_k}{2} \le \rho_G(k) \le C(k)\sqrt{\lambda_k}$
for some C(k) depending only on k.
[Lee-OveisGharan-Trevisan 2012]:
True with
This bound for C(k) is tight.
Algorithm of Ng-Jordan-Weiss works, changing the last step.
the clustering step
Instead of running k-means to cluster the points, we do a random projection followed by a random space partition.
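A toy sketch of "random projection, then random space partition", assuming the points are the rows of the spectral embedding `F` (an n × k array): project with a Gaussian random matrix to `dim` dimensions, then cut space with an axis-aligned grid of side `cell_width` and a uniformly random offset, and call each nonempty grid cell a cluster. This only illustrates the two ingredients; it is not the specific partitioning procedure or parameter choice of Lee-OveisGharan-Trevisan, and `random_partition`, `dim`, and `cell_width` are hypothetical.

```python
import numpy as np

def random_partition(F: np.ndarray, dim: int, cell_width: float, seed: int = 0) -> np.ndarray:
    """Toy sketch: Gaussian random projection, then a randomly shifted grid partition."""
    rng = np.random.default_rng(seed)
    _, k = F.shape
    # Random projection: scaling by 1/sqrt(dim) preserves squared lengths in expectation.
    G = rng.standard_normal((k, dim)) / np.sqrt(dim)
    Y = F @ G
    # Random space partition: grid cells of side cell_width with a uniform random offset.
    shift = rng.uniform(0.0, cell_width, size=dim)
    cells = np.floor((Y + shift) / cell_width).astype(int)
    # Give each distinct nonempty cell its own cluster id, in order of first appearance.
    ids = {}
    return np.array([ids.setdefault(tuple(c), len(ids)) for c in cells])

# Hypothetical example: three embedded points, the first two identical.
F = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
labels = random_partition(F, dim=2, cell_width=0.5, seed=1)
```

Unlike k-means, the number of clusters here is not fixed to k: it is the number of nonempty grid cells, which depends on `cell_width` and on the random draw.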
hybrid algorithms
Suppose the data has some nice low-dimensional structure
Spectral embedding could lose that information: it puts the points back in a high-dimensional space.
Use spectral embedding distances to deform the data, then do clustering on the transformed data set.
unraveling the mysteries of complexity
the unique games conjecture
Consider linear equations in two variables, modulo a prime p
Variables: x1, x2, …, xn
x12 + x2 = 4
x4 – 3 x7 = 1
x9 + 8 x12 = 9
…
If there exists a solution that satisfies 99% of the equations,
can you find one that satisfies 10%?
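A small sketch of such an instance, assuming each equation is stored as a tuple (i, a, j, b, c) meaning $a x_i + b x_j \equiv c \pmod p$ with a and b nonzero mod p. The example encodes the three equations listed above with p = 11; the slides do not fix p, so 11 is an assumption (any prime larger than the constants shown would do). It also shows why these are "unique" constraints: once $x_i$ is fixed, $x_j$ is forced to a single value. The function names are hypothetical, and `pow(b, -1, p)` needs Python 3.8 or later.

```python
def satisfied_fraction(constraints, x, p):
    """Fraction of constraints (i, a, j, b, c), i.e. a*x_i + b*x_j = c (mod p), that x satisfies."""
    ok = sum((a * x[i] + b * x[j] - c) % p == 0 for i, a, j, b, c in constraints)
    return ok / len(constraints)

def forced_value(a, b, c, xi, p):
    """The unique x_j with a*xi + b*x_j = c (mod p); requires b invertible mod p."""
    return (c - a * xi) * pow(b, -1, p) % p

p = 11  # assumed modulus
# x12 + x2 = 4,  x4 - 3 x7 = 1,  x9 + 8 x12 = 9
constraints = [(12, 1, 2, 1, 4), (4, 1, 7, -3, 1), (9, 1, 12, 8, 9)]
x = {2: 4, 12: 0, 9: 9, 4: 1, 7: 0}

print(satisfied_fraction(constraints, x, p))   # every equation holds
print(forced_value(1, 8, 9, 2, p))             # x9 = 2 forces x12 = 5
```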
Conjectured to be NP-hard [Khot 2002]
a spectral attack
Construct a graph with one vertex for every variable, and an
edge whenever two variables occur in the same constraint.
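The graph construction can be sketched as follows, encoding each equation as a tuple (i, a, j, b, c) for $a x_i + b x_j \equiv c \pmod p$ (an assumption of this sketch). Each edge is keyed by its unordered pair of variables, and its weight counts the constraints on that pair, which plays the role of w(i,j). The name `constraint_graph` is hypothetical.

```python
from collections import Counter

def constraint_graph(constraints):
    """Edge weights: one vertex per variable, w(i,j) = number of constraints on {i, j}."""
    w = Counter()
    for i, _a, j, _b, _c in constraints:
        w[frozenset((i, j))] += 1
    return w

# x12 + x2 = 4,  x4 - 3 x7 = 1,  x9 + 8 x12 = 9
constraints = [(12, 1, 2, 1, 4), (4, 1, 7, -3, 1), (9, 1, 12, 8, 9)]
graph = constraint_graph(constraints)
neighbors_of_12 = {v for e in graph if 12 in e for v in e if v != 12}
```

Only the pair structure of each constraint enters the graph; the coefficients are dropped, so the spectral step sees which variables interact but not how.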
A “good” solution to the equations implies a partition of the
graph into p nice clusters!
a spectral attack
Higher-order Cheeger Theorem:
For every graph G and $k \in \mathbb{N}$, we have
Unnecessary for large k:
[Arora-Barak-Steurer 2010]
A better asymptotic dependence would disprove the UGC.