Consistency of Modularity Clustering on Random Geometric Graphs Erik Davis May 10, 2016

advertisement
Consistency of Modularity Clustering on Random
Geometric Graphs
Erik Davis
The University of Arizona
May 10, 2016
Outline
Introduction to Modularity Clustering
Pointwise Convergence
Convergence of Optimal Partitions
Graph Clustering
G = (X , W )
Graph Clustering
G = (X , W )
clustering = partition = coloring
Graph Clustering
G = (X , W )
1
2m
P
i,j
clustering = partition = coloring
Wij δ(ci , cj ) where m =
1
2
P
i,j
Wij .
Modularity (Newman & Girvan ‘04)
Q(U) =
1 X di dj
1 X
Wij δ(ci , cj ) −
δ(ci , cj )
2m
2m
2m
i,j
where di =
P
j
Wij , U a clustering.
i,j
Modularity Clustering
Modularity Clustering:
U ∗ = arg max Q(U),
|U |≤K
Modularity Clustering
Modularity Clustering:
U ∗ = arg max Q(U),
|U |≤K
More generally (α-modularity):
Q(U) =
1 X
1 X α α
Wij δ(ci , cj ) − 2
di dj δ(ci , cj )
2m
S
i,j
where S =
P
i
diα .
i,j
Random Geometric Graphs
Fix open D ⊂ Rd with Lipschitz boundary, ν = ρ(x) dx probability
measure on D.
Random Geometric Graphs
Fix open D ⊂ Rd with Lipschitz boundary, ν = ρ(x) dx probability
measure on D. Take {Xi }i∈N i.i.d., and Xn = {Xi }ni=1 .
Random Geometric Graphs
Fix open D ⊂ Rd with Lipschitz boundary, ν = ρ(x) dx probability
measure on D. Take {Xi }i∈N i.i.d., and Xn = {Xi }ni=1 .
To assign weights, we pick a kernel η : Rd → R, length scale n .
Random Geometric Graphs
Fix open D ⊂ Rd with Lipschitz boundary, ν = ρ(x) dx probability
measure on D. Take {Xi }i∈N i.i.d., and Xn = {Xi }ni=1 .
To assign weights, we pick a kernel η : Rd → R, length scale n .
( X −X η i n j , if i 6= j,
Wij =
0,
otherwise.
Random Geometric Graphs
Fix open D ⊂ Rd with Lipschitz boundary, ν = ρ(x) dx probability
measure on D. Take {Xi }i∈N i.i.d., and Xn = {Xi }ni=1 .
To assign weights, we pick a kernel η : Rd → R, length scale n .
( X −X i
j
1
=: ηn (Xi − Xj ), if i 6= j,
dη
n
Wij = n
0,
otherwise.
Graph Gn = (Xn , W ).
Questions
Gn gives (random) modularity functional Qn .
Questions
Gn gives (random) modularity functional Qn .
1. What is the behavior of Qn as n → ∞?
Questions
Gn gives (random) modularity functional Qn .
1. What is the behavior of Qn as n → ∞?
2. What do optimal modularity clusterings
Un∗ ∈ arg max Qn (Un )
|Un |≤K
look like?
Questions
Gn gives (random) modularity functional Qn .
1. What is the behavior of Qn as n → ∞?
2. What do optimal modularity clusterings
Un∗ ∈ arg max Qn (Un )
|Un |≤K
look like?
Consistency: Subject to certain technical assumptions, Un∗ → U ∗
where U ∗ is a partition of D characterized as the solution to a
(deterministic) continuum optimization problem.
Consistency of Clustering Methods
I
K-Means (Pollard 1981)
I
Spectral Clustering (von Luxburg, Belkin, & Bosquet 2008)
I
Modularity (Bickel & Chen 2009, Zhao, Levina, & Zhu 2012)
I
Cheeger Cut (Garcı́a-Trillos, Slepčev, von Brecht, Laurent, &
Bresson 2014)
Outline
Introduction to Modularity Clustering
Pointwise Convergence
Convergence of Optimal Partitions
Key Identity
Let Un = {Un,k }K
k=1 be a partition of Xn , and let un,k = 1Un,k .
Key Identity
Let Un = {Un,k }K
k=1 be a partition of Xn , and let un,k = 1Un,k .
Then
1 − 1/K − Qn (Un ) =
K
2
1 X X α
d
u
(X
)
−
1/K
i
n,k
i
S2
+ n
i
k=1
K
2 X
n
4m
GTVn (un,k ),
k=1
where
GTVn (u) :=
1 1 X
ηn (Xi − Xj )|u(Xi ) − u(Xj )|.
n n2
i,j
Continuum Partitioning
Domain D ⊂ Rd , fixed K ≥ 1, and partition U = {Uk }K
k=1 of D.
Continuum Partitioning
Domain D ⊂ Rd , fixed K ≥ 1, and partition U = {Uk }K
k=1 of D.
I
U is balanced with respect to µ if µ(Uk ) = 1/K for k = 1, . . . , K .
Continuum Partitioning
Domain D ⊂ Rd , fixed K ≥ 1, and partition U = {Uk }K
k=1 of D.
I
U is balanced with respect to µ if µ(Uk ) = 1/K for k = 1, . . . , K .
I
The perimeter of Uk in D, with respect to a weight ρ2 , is
Z
Per(Uk ; ρ2 ) =
ρ2 (x) dHd−1 (x).
∂Uk
Continuum Partitioning
Domain D ⊂ Rd , fixed K ≥ 1, and partition U = {Uk }K
k=1 of D.
I
U is balanced with respect to µ if µ(Uk ) = 1/K for k = 1, . . . , K .
I
The perimeter of Uk in D, with respect to a weight ρ2 , is
Z
Per(Uk ; ρ2 ) =
ρ2 (x) dHd−1 (x).
∂Uk
More generally,
Per(Uk ; ρ2 ) = TV (1Uk ; ρ2 ) :=
Z
sup
1Uk (x)div Φ(x) dx.
Φ∈Cc1 (D;Rd )
|Φ(x)|≤ρ2 (x)
Remark: When f smooth, TV (f ; ρ2 ) =
R
D
D
|∇f |ρ2 (x) dx.
Continuum Partitioning
∗
U = arg min
K
X
|U |=K
k=1
µ(Uk )=1/K
Per(Uk ; ρ2 ).
Continuum Partitioning
∗
U = arg min
K
X
Per(Uk ; ρ2 ).
|U |=K
k=1
µ(Uk )=1/K
(a) K = 4
(b) K = 9.
(c) K = 25.
Figure: Local minimizers on D = (0, 1)2 , with ρ(x) = 1, dµ = dx,
produced using The Surface Evolver.
“Pointwise” Convergence
A (finite perimeter) partition U of D induces a partition Un of
Xn ⊂ D.
“Pointwise” Convergence
A (finite perimeter) partition U of D induces a partition Un of
Xn ⊂ D.
“Pointwise” Convergence
A (finite perimeter) partition U of D induces a partition Un of
Xn ⊂ D.
“Pointwise” Convergence
What is the behavior of Qn as n → ∞?
“Pointwise” Convergence
What is the behavior of Qn as n → ∞?
Theorem (Asymptotics)
Let U be a finite perimeter partition. Suppose {n } satisfies
P∞
P
(d+1)/2
d
) < ∞ when α = 0, 1 and ∞
n=1 exp(−nn
n=1 exp(−nn ) < ∞
otherwise.
“Pointwise” Convergence
What is the behavior of Qn as n → ∞?
Theorem (Asymptotics)
Let U be a finite perimeter partition. Suppose {n } satisfies
P∞
P
(d+1)/2
d
) < ∞ when α = 0, 1 and ∞
n=1 exp(−nn
n=1 exp(−nn ) < ∞
otherwise. Then, as n → ∞,

2
C PK Per(U ; ρ2 ) if PK
= 0,
1 − 1/K − Qn (Un ) a.s.
η,ρ
k
k=1
k=1 µ(Uk ) − 1/K
−−→
∞
n
otherwise,
where
dµ(x) = R
R
ρ1+α (x) dx
n η(x)|x1 | dx
and Cη,ρ = R R 2
.
1+α (x) dx
ρ
2
ρ (x) dx
D
D
Sketch of Proof: Convergence of Graph Total Variation
Recall
GTVn (u) =
1 1 X
ηn (Xi − Xj )|u(Xi ) − u(Xj )|.
n n 2
i,j
Sketch of Proof: Convergence of Graph Total Variation
Recall
GTVn (u) =
1 1 X
ηn (Xi − Xj )|u(Xi ) − u(Xj )|.
n n 2
i,j
Proposition
Fix u = 1U , and let {n }n∈N be a sequence converging to zero such that
∞
X
exp(−nn(d+1)/2 ) < +∞.
n=1
Then
a.s.
GTVn (u) −−→ ση TV (u; ρ2 ).
Sketch of Proof: Convergence of Graph Total Variation
Recall
GTVn (u) =
1 1 X
ηn (Xi − Xj )|u(Xi ) − u(Xj )|.
n n 2
i,j
Proposition
Fix u = 1U , and let {n }n∈N be a sequence converging to zero such that
∞
X
exp(−nn(d+1)/2 ) < +∞.
n=1
Then
a.s.
GTVn (u) −−→ ση TV (u; ρ2 ).
Ingredients:
Sketch of Proof: Convergence of Graph Total Variation
Recall
GTVn (u) =
1 1 X
ηn (Xi − Xj )|u(Xi ) − u(Xj )|.
n n 2
i,j
Proposition
Fix u = 1U , and let {n }n∈N be a sequence converging to zero such that
∞
X
exp(−nn(d+1)/2 ) < +∞.
n=1
Then
a.s.
GTVn (u) −−→ ση TV (u; ρ2 ).
Ingredients:
I Nonlocal TV (Ponce ‘04)
Sketch of Proof: Convergence of Graph Total Variation
Recall
GTVn (u) =
1 1 X
ηn (Xi − Xj )|u(Xi ) − u(Xj )|.
n n 2
i,j
Proposition
Fix u = 1U , and let {n }n∈N be a sequence converging to zero such that
∞
X
exp(−nn(d+1)/2 ) < +∞.
n=1
Then
a.s.
GTVn (u) −−→ ση TV (u; ρ2 ).
Ingredients:
I Nonlocal TV (Ponce ‘04)
I Exponential bounds for U-statistics (Giné, Latala, & Zinn ‘00)
Outline
Introduction to Modularity Clustering
Pointwise Convergence
Convergence of Optimal Partitions
Convergence of Partitions
Convergence of Partitions
U3,1
U3,2
Convergence of Partitions
γ3,1 ∈ P(D × {0, 1})
Convergence of Partitions
U1
γ3,1 ∈ P(D × {0, 1})
U2
Convergence of Partitions
γ3,1 ∈ P(D × {0, 1})
γ1 ∈ P(D × {0, 1})
Convergence of Partitions
Given partition Un = {Un,k }K
k=1 of Xn , we associate the measures
γn,k ∈ P(D × {0, 1}), for k = 1, . . . , K , by
n
γn,k
1X
δ(Xi ,1U (Xi )) = (Id × 1Un,k )] νn .
=
n,k
n
i=1
Convergence of Partitions
Given partition Un = {Un,k }K
k=1 of Xn , we associate the measures
γn,k ∈ P(D × {0, 1}), for k = 1, . . . , K , by
n
γn,k
1X
δ(Xi ,1U (Xi )) = (Id × 1Un,k )] νn .
=
n,k
n
i=1
Given partition U = {Uk }K
k=1 of D, we similarly define
γk = (Id × 1Uk )] ν.
Convergence of Partitions
Given partition Un = {Un,k }K
k=1 of Xn , we associate the measures
γn,k ∈ P(D × {0, 1}), for k = 1, . . . , K , by
n
γn,k
1X
δ(Xi ,1U (Xi )) = (Id × 1Un,k )] νn .
=
n,k
n
i=1
Given partition U = {Uk }K
k=1 of D, we similarly define
γk = (Id × 1Uk )] ν.
w
We say that Un −
→ U if there exists a sequence {πn }n∈N of
permutations of {1, . . . , K } such that
w
γn,πn k −
→ γk , for k = 1, . . . , K .
Convergence of Optimal Partitions
Theorem (Convergence of Optimal Partitions)
Suppose {n }n∈N satisfies suitable conditions. For n ≥ 1, let
Un∗ ∈ arg max|U |≤K Qn (U) be an optimal partition.
If U ∗ is the unique solution (up to relabeling of its constituent sets) to
the problem
K
X
minimize
Per(Uk ; ρ2 )
(P)
|U |=K
µ(Uk )=1/K k=1
with dµ(x) = ρ1+α (x) dx/
a.s.
R
D
ρ1+α (x) dx, then Un∗ −−→ U ∗ .
Convergence of Optimal Partitions
Theorem (Convergence of Optimal Partitions)
Suppose {n }n∈N satisfies suitable conditions. For n ≥ 1, let
Un∗ ∈ arg max|U |≤K Qn (U) be an optimal partition.
If U ∗ is the unique solution (up to relabeling of its constituent sets) to
the problem
K
X
minimize
Per(Uk ; ρ2 )
(P)
|U |=K
µ(Uk )=1/K k=1
R
a.s.
with dµ(x) = ρ1+α (x) dx/ D ρ1+α (x) dx, then Un∗ −−→ U ∗ . If there is
more than one solution to (P), then almost surely i) {Un∗ }n∈N has at least
one cluster point, and ii) every cluster point is a solution to (P).
Example
Remarks on Theorem
I
Weak convergence of measures via Wasserstein metric.
Remarks on Theorem
I
Weak convergence of measures via Wasserstein metric.
I
Useful tool: transport maps relating empirical measures νn to ν,
with bound on ∞-transport cost (Garcı́a-Trillos, Slepčev).
Remarks on Theorem
I
Weak convergence of measures via Wasserstein metric.
I
Useful tool: transport maps relating empirical measures νn to ν,
with bound on ∞-transport cost (Garcı́a-Trillos, Slepčev).
I
Because modularity clusterings are optimizers of discrete energies,
we use Γ-convergence to prove that their limit is the optimizer of a
continuum energy.
Remarks on Theorem
I
Weak convergence of measures via Wasserstein metric.
I
Useful tool: transport maps relating empirical measures νn to ν,
with bound on ∞-transport cost (Garcı́a-Trillos, Slepčev).
I
Because modularity clusterings are optimizers of discrete energies,
we use Γ-convergence to prove that their limit is the optimizer of a
continuum energy.
I
Balance constraint in the continuum problem presents a technical
difficulty.
Remarks on Theorem
I
Weak convergence of measures via Wasserstein metric.
I
Useful tool: transport maps relating empirical measures νn to ν,
with bound on ∞-transport cost (Garcı́a-Trillos, Slepčev).
I
Because modularity clusterings are optimizers of discrete energies,
we use Γ-convergence to prove that their limit is the optimizer of a
continuum energy.
I
Balance constraint in the continuum problem presents a technical
difficulty.
I
We modified the notion of Γ-convergence for random functionals to
allow the use of our pointwise convergence result.
Existence of Transport Maps
Let D ⊂ Rd be open, connected with Lipschitz boundary. Assume
ν = ρ(x) dx with ρ continuous and bounded above/below by
positive constants.
Proposition (Garcı́a-Trillos, Slepčev ‘14)
There is a constant C > 0 such that, with probability one, there exists a
sequence of transportation maps {Tn }n∈N , Tn] ν = νn with
lim sup
n→∞
lim sup
n→∞
lim sup
n→∞
n1/2 kId − Tn k∞
≤C
(2 log log n)1/2
(d = 1),
n1/d kId − Tn k∞
≤C
(log n)3/4
(d = 2),
n1/d kId − Tn k∞
≤C
(log n)1/d
(d ≥ 3).
Assumption on n
Assume {n }n∈N is such that n > 0 and n → 0. For α = 0, 1, assume
that
√
2 log log n 1
√
lim
= 0,
if d = 1
n→∞
n
n
(log n)3/4 1
=0
n→∞
n1/2 n
(log n)1/d 1
lim
=0
n→∞
n1/d n
lim
if d = 2
if d ≥ 3.
For α 6= 0, 1, assume that
∞
X
n=1
n exp(−nd+1
) < ∞.
n
The role of α
(a) Level sets
(b) α = −1
(c) α = 0
(d) α = 1
dµ(x) ∝ ρ(x)1+α , ρ(x) ∝ min(2 exp(−4||x − x0 ||2 ), 1/2).
Thanks for coming!
Download