Consistency of Modularity Clustering on Random Geometric Graphs Erik Davis The University of Arizona May 10, 2016 Outline Introduction to Modularity Clustering Pointwise Convergence Convergence of Optimal Partitions Graph Clustering G = (X , W ) Graph Clustering G = (X , W ) clustering = partition = coloring Graph Clustering G = (X , W ) 1 2m P i,j clustering = partition = coloring Wij δ(ci , cj ) where m = 1 2 P i,j Wij . Modularity (Newman & Girvan ‘04) Q(U) = 1 X di dj 1 X Wij δ(ci , cj ) − δ(ci , cj ) 2m 2m 2m i,j where di = P j Wij , U a clustering. i,j Modularity Clustering Modularity Clustering: U ∗ = arg max Q(U), |U |≤K Modularity Clustering Modularity Clustering: U ∗ = arg max Q(U), |U |≤K More generally (α-modularity): Q(U) = 1 X 1 X α α Wij δ(ci , cj ) − 2 di dj δ(ci , cj ) 2m S i,j where S = P i diα . i,j Random Geometric Graphs Fix open D ⊂ Rd with Lipschitz boundary, ν = ρ(x) dx probability measure on D. Random Geometric Graphs Fix open D ⊂ Rd with Lipschitz boundary, ν = ρ(x) dx probability measure on D. Take {Xi }i∈N i.i.d., and Xn = {Xi }ni=1 . Random Geometric Graphs Fix open D ⊂ Rd with Lipschitz boundary, ν = ρ(x) dx probability measure on D. Take {Xi }i∈N i.i.d., and Xn = {Xi }ni=1 . To assign weights, we pick a kernel η : Rd → R, length scale n . Random Geometric Graphs Fix open D ⊂ Rd with Lipschitz boundary, ν = ρ(x) dx probability measure on D. Take {Xi }i∈N i.i.d., and Xn = {Xi }ni=1 . To assign weights, we pick a kernel η : Rd → R, length scale n . ( X −X η i n j , if i 6= j, Wij = 0, otherwise. Random Geometric Graphs Fix open D ⊂ Rd with Lipschitz boundary, ν = ρ(x) dx probability measure on D. Take {Xi }i∈N i.i.d., and Xn = {Xi }ni=1 . To assign weights, we pick a kernel η : Rd → R, length scale n . ( X −X i j 1 =: ηn (Xi − Xj ), if i 6= j, dη n Wij = n 0, otherwise. Graph Gn = (Xn , W ). Questions Gn gives (random) modularity functional Qn . Questions Gn gives (random) modularity functional Qn . 1. What is the behavior of Qn as n → ∞? Questions Gn gives (random) modularity functional Qn . 1. What is the behavior of Qn as n → ∞? 2. What do optimal modularity clusterings Un∗ ∈ arg max Qn (Un ) |Un |≤K look like? Questions Gn gives (random) modularity functional Qn . 1. What is the behavior of Qn as n → ∞? 2. What do optimal modularity clusterings Un∗ ∈ arg max Qn (Un ) |Un |≤K look like? Consistency: Subject to certain technical assumptions, Un∗ → U ∗ where U ∗ is a partition of D characterized as the solution to a (deterministic) continuum optimization problem. Consistency of Clustering Methods I K-Means (Pollard 1981) I Spectral Clustering (von Luxburg, Belkin, & Bosquet 2008) I Modularity (Bickel & Chen 2009, Zhao, Levina, & Zhu 2012) I Cheeger Cut (Garcı́a-Trillos, Slepčev, von Brecht, Laurent, & Bresson 2014) Outline Introduction to Modularity Clustering Pointwise Convergence Convergence of Optimal Partitions Key Identity Let Un = {Un,k }K k=1 be a partition of Xn , and let un,k = 1Un,k . Key Identity Let Un = {Un,k }K k=1 be a partition of Xn , and let un,k = 1Un,k . Then 1 − 1/K − Qn (Un ) = K 2 1 X X α d u (X ) − 1/K i n,k i S2 + n i k=1 K 2 X n 4m GTVn (un,k ), k=1 where GTVn (u) := 1 1 X ηn (Xi − Xj )|u(Xi ) − u(Xj )|. n n2 i,j Continuum Partitioning Domain D ⊂ Rd , fixed K ≥ 1, and partition U = {Uk }K k=1 of D. Continuum Partitioning Domain D ⊂ Rd , fixed K ≥ 1, and partition U = {Uk }K k=1 of D. I U is balanced with respect to µ if µ(Uk ) = 1/K for k = 1, . . . , K . Continuum Partitioning Domain D ⊂ Rd , fixed K ≥ 1, and partition U = {Uk }K k=1 of D. I U is balanced with respect to µ if µ(Uk ) = 1/K for k = 1, . . . , K . I The perimeter of Uk in D, with respect to a weight ρ2 , is Z Per(Uk ; ρ2 ) = ρ2 (x) dHd−1 (x). ∂Uk Continuum Partitioning Domain D ⊂ Rd , fixed K ≥ 1, and partition U = {Uk }K k=1 of D. I U is balanced with respect to µ if µ(Uk ) = 1/K for k = 1, . . . , K . I The perimeter of Uk in D, with respect to a weight ρ2 , is Z Per(Uk ; ρ2 ) = ρ2 (x) dHd−1 (x). ∂Uk More generally, Per(Uk ; ρ2 ) = TV (1Uk ; ρ2 ) := Z sup 1Uk (x)div Φ(x) dx. Φ∈Cc1 (D;Rd ) |Φ(x)|≤ρ2 (x) Remark: When f smooth, TV (f ; ρ2 ) = R D D |∇f |ρ2 (x) dx. Continuum Partitioning ∗ U = arg min K X |U |=K k=1 µ(Uk )=1/K Per(Uk ; ρ2 ). Continuum Partitioning ∗ U = arg min K X Per(Uk ; ρ2 ). |U |=K k=1 µ(Uk )=1/K (a) K = 4 (b) K = 9. (c) K = 25. Figure: Local minimizers on D = (0, 1)2 , with ρ(x) = 1, dµ = dx, produced using The Surface Evolver. “Pointwise” Convergence A (finite perimeter) partition U of D induces a partition Un of Xn ⊂ D. “Pointwise” Convergence A (finite perimeter) partition U of D induces a partition Un of Xn ⊂ D. “Pointwise” Convergence A (finite perimeter) partition U of D induces a partition Un of Xn ⊂ D. “Pointwise” Convergence What is the behavior of Qn as n → ∞? “Pointwise” Convergence What is the behavior of Qn as n → ∞? Theorem (Asymptotics) Let U be a finite perimeter partition. Suppose {n } satisfies P∞ P (d+1)/2 d ) < ∞ when α = 0, 1 and ∞ n=1 exp(−nn n=1 exp(−nn ) < ∞ otherwise. “Pointwise” Convergence What is the behavior of Qn as n → ∞? Theorem (Asymptotics) Let U be a finite perimeter partition. Suppose {n } satisfies P∞ P (d+1)/2 d ) < ∞ when α = 0, 1 and ∞ n=1 exp(−nn n=1 exp(−nn ) < ∞ otherwise. Then, as n → ∞, 2 C PK Per(U ; ρ2 ) if PK = 0, 1 − 1/K − Qn (Un ) a.s. η,ρ k k=1 k=1 µ(Uk ) − 1/K −−→ ∞ n otherwise, where dµ(x) = R R ρ1+α (x) dx n η(x)|x1 | dx and Cη,ρ = R R 2 . 1+α (x) dx ρ 2 ρ (x) dx D D Sketch of Proof: Convergence of Graph Total Variation Recall GTVn (u) = 1 1 X ηn (Xi − Xj )|u(Xi ) − u(Xj )|. n n 2 i,j Sketch of Proof: Convergence of Graph Total Variation Recall GTVn (u) = 1 1 X ηn (Xi − Xj )|u(Xi ) − u(Xj )|. n n 2 i,j Proposition Fix u = 1U , and let {n }n∈N be a sequence converging to zero such that ∞ X exp(−nn(d+1)/2 ) < +∞. n=1 Then a.s. GTVn (u) −−→ ση TV (u; ρ2 ). Sketch of Proof: Convergence of Graph Total Variation Recall GTVn (u) = 1 1 X ηn (Xi − Xj )|u(Xi ) − u(Xj )|. n n 2 i,j Proposition Fix u = 1U , and let {n }n∈N be a sequence converging to zero such that ∞ X exp(−nn(d+1)/2 ) < +∞. n=1 Then a.s. GTVn (u) −−→ ση TV (u; ρ2 ). Ingredients: Sketch of Proof: Convergence of Graph Total Variation Recall GTVn (u) = 1 1 X ηn (Xi − Xj )|u(Xi ) − u(Xj )|. n n 2 i,j Proposition Fix u = 1U , and let {n }n∈N be a sequence converging to zero such that ∞ X exp(−nn(d+1)/2 ) < +∞. n=1 Then a.s. GTVn (u) −−→ ση TV (u; ρ2 ). Ingredients: I Nonlocal TV (Ponce ‘04) Sketch of Proof: Convergence of Graph Total Variation Recall GTVn (u) = 1 1 X ηn (Xi − Xj )|u(Xi ) − u(Xj )|. n n 2 i,j Proposition Fix u = 1U , and let {n }n∈N be a sequence converging to zero such that ∞ X exp(−nn(d+1)/2 ) < +∞. n=1 Then a.s. GTVn (u) −−→ ση TV (u; ρ2 ). Ingredients: I Nonlocal TV (Ponce ‘04) I Exponential bounds for U-statistics (Giné, Latala, & Zinn ‘00) Outline Introduction to Modularity Clustering Pointwise Convergence Convergence of Optimal Partitions Convergence of Partitions Convergence of Partitions U3,1 U3,2 Convergence of Partitions γ3,1 ∈ P(D × {0, 1}) Convergence of Partitions U1 γ3,1 ∈ P(D × {0, 1}) U2 Convergence of Partitions γ3,1 ∈ P(D × {0, 1}) γ1 ∈ P(D × {0, 1}) Convergence of Partitions Given partition Un = {Un,k }K k=1 of Xn , we associate the measures γn,k ∈ P(D × {0, 1}), for k = 1, . . . , K , by n γn,k 1X δ(Xi ,1U (Xi )) = (Id × 1Un,k )] νn . = n,k n i=1 Convergence of Partitions Given partition Un = {Un,k }K k=1 of Xn , we associate the measures γn,k ∈ P(D × {0, 1}), for k = 1, . . . , K , by n γn,k 1X δ(Xi ,1U (Xi )) = (Id × 1Un,k )] νn . = n,k n i=1 Given partition U = {Uk }K k=1 of D, we similarly define γk = (Id × 1Uk )] ν. Convergence of Partitions Given partition Un = {Un,k }K k=1 of Xn , we associate the measures γn,k ∈ P(D × {0, 1}), for k = 1, . . . , K , by n γn,k 1X δ(Xi ,1U (Xi )) = (Id × 1Un,k )] νn . = n,k n i=1 Given partition U = {Uk }K k=1 of D, we similarly define γk = (Id × 1Uk )] ν. w We say that Un − → U if there exists a sequence {πn }n∈N of permutations of {1, . . . , K } such that w γn,πn k − → γk , for k = 1, . . . , K . Convergence of Optimal Partitions Theorem (Convergence of Optimal Partitions) Suppose {n }n∈N satisfies suitable conditions. For n ≥ 1, let Un∗ ∈ arg max|U |≤K Qn (U) be an optimal partition. If U ∗ is the unique solution (up to relabeling of its constituent sets) to the problem K X minimize Per(Uk ; ρ2 ) (P) |U |=K µ(Uk )=1/K k=1 with dµ(x) = ρ1+α (x) dx/ a.s. R D ρ1+α (x) dx, then Un∗ −−→ U ∗ . Convergence of Optimal Partitions Theorem (Convergence of Optimal Partitions) Suppose {n }n∈N satisfies suitable conditions. For n ≥ 1, let Un∗ ∈ arg max|U |≤K Qn (U) be an optimal partition. If U ∗ is the unique solution (up to relabeling of its constituent sets) to the problem K X minimize Per(Uk ; ρ2 ) (P) |U |=K µ(Uk )=1/K k=1 R a.s. with dµ(x) = ρ1+α (x) dx/ D ρ1+α (x) dx, then Un∗ −−→ U ∗ . If there is more than one solution to (P), then almost surely i) {Un∗ }n∈N has at least one cluster point, and ii) every cluster point is a solution to (P). Example Remarks on Theorem I Weak convergence of measures via Wasserstein metric. Remarks on Theorem I Weak convergence of measures via Wasserstein metric. I Useful tool: transport maps relating empirical measures νn to ν, with bound on ∞-transport cost (Garcı́a-Trillos, Slepčev). Remarks on Theorem I Weak convergence of measures via Wasserstein metric. I Useful tool: transport maps relating empirical measures νn to ν, with bound on ∞-transport cost (Garcı́a-Trillos, Slepčev). I Because modularity clusterings are optimizers of discrete energies, we use Γ-convergence to prove that their limit is the optimizer of a continuum energy. Remarks on Theorem I Weak convergence of measures via Wasserstein metric. I Useful tool: transport maps relating empirical measures νn to ν, with bound on ∞-transport cost (Garcı́a-Trillos, Slepčev). I Because modularity clusterings are optimizers of discrete energies, we use Γ-convergence to prove that their limit is the optimizer of a continuum energy. I Balance constraint in the continuum problem presents a technical difficulty. Remarks on Theorem I Weak convergence of measures via Wasserstein metric. I Useful tool: transport maps relating empirical measures νn to ν, with bound on ∞-transport cost (Garcı́a-Trillos, Slepčev). I Because modularity clusterings are optimizers of discrete energies, we use Γ-convergence to prove that their limit is the optimizer of a continuum energy. I Balance constraint in the continuum problem presents a technical difficulty. I We modified the notion of Γ-convergence for random functionals to allow the use of our pointwise convergence result. Existence of Transport Maps Let D ⊂ Rd be open, connected with Lipschitz boundary. Assume ν = ρ(x) dx with ρ continuous and bounded above/below by positive constants. Proposition (Garcı́a-Trillos, Slepčev ‘14) There is a constant C > 0 such that, with probability one, there exists a sequence of transportation maps {Tn }n∈N , Tn] ν = νn with lim sup n→∞ lim sup n→∞ lim sup n→∞ n1/2 kId − Tn k∞ ≤C (2 log log n)1/2 (d = 1), n1/d kId − Tn k∞ ≤C (log n)3/4 (d = 2), n1/d kId − Tn k∞ ≤C (log n)1/d (d ≥ 3). Assumption on n Assume {n }n∈N is such that n > 0 and n → 0. For α = 0, 1, assume that √ 2 log log n 1 √ lim = 0, if d = 1 n→∞ n n (log n)3/4 1 =0 n→∞ n1/2 n (log n)1/d 1 lim =0 n→∞ n1/d n lim if d = 2 if d ≥ 3. For α 6= 0, 1, assume that ∞ X n=1 n exp(−nd+1 ) < ∞. n The role of α (a) Level sets (b) α = −1 (c) α = 0 (d) α = 1 dµ(x) ∝ ρ(x)1+α , ρ(x) ∝ min(2 exp(−4||x − x0 ||2 ), 1/2). Thanks for coming!