Consistency of Modularity Clustering on Random Geometric Graphs Erik Davis May 10, 2016

Consistency of Modularity Clustering on Random Geometric Graphs Erik Davis The University of Arizona May 10, 2016 Outline Introduction to Modularity Clustering Pointwise Convergence Convergence of Optimal Partitions Graph Clustering G = (X , W ) Graph Clustering G = (X , W ) clustering = partition = coloring Graph Clustering G = (X , W ) 1 2m P i,j clustering = partition = coloring Wij δ(ci , cj ) where m = 1 2 P i,j Wij . Modularity (Newman & Girvan ‘04) Q(U) = 1 X di dj 1 X Wij δ(ci , cj ) − δ(ci , cj ) 2m 2m 2m i,j where di = P j Wij , U a clustering. i,j Modularity Clustering Modularity Clustering: U ∗ = arg max Q(U), |U |≤K Modularity Clustering Modularity Clustering: U ∗ = arg max Q(U), |U |≤K More generally (α-modularity): Q(U) = 1 X 1 X α α Wij δ(ci , cj ) − 2 di dj δ(ci , cj ) 2m S i,j where S = P i diα . i,j Random Geometric Graphs Fix open D ⊂ Rd with Lipschitz boundary, ν = ρ(x) dx probability measure on D. Random Geometric Graphs Fix open D ⊂ Rd with Lipschitz boundary, ν = ρ(x) dx probability measure on D. Take {Xi }i∈N i.i.d., and Xn = {Xi }ni=1 . Random Geometric Graphs Fix open D ⊂ Rd with Lipschitz boundary, ν = ρ(x) dx probability measure on D. Take {Xi }i∈N i.i.d., and Xn = {Xi }ni=1 . To assign weights, we pick a kernel η : Rd → R, length scale n . Random Geometric Graphs Fix open D ⊂ Rd with Lipschitz boundary, ν = ρ(x) dx probability measure on D. Take {Xi }i∈N i.i.d., and Xn = {Xi }ni=1 . To assign weights, we pick a kernel η : Rd → R, length scale n . ( X −X η i n j , if i 6= j, Wij = 0, otherwise. Random Geometric Graphs Fix open D ⊂ Rd with Lipschitz boundary, ν = ρ(x) dx probability measure on D. Take {Xi }i∈N i.i.d., and Xn = {Xi }ni=1 . To assign weights, we pick a kernel η : Rd → R, length scale n . ( X −X i j 1 =: ηn (Xi − Xj ), if i 6= j, dη n Wij = n 0, otherwise. Graph Gn = (Xn , W ). Questions Gn gives (random) modularity functional Qn . Questions Gn gives (random) modularity functional Qn . 1. What is the behavior of Qn as n → ∞? Questions Gn gives (random) modularity functional Qn . 1. What is the behavior of Qn as n → ∞? 2. What do optimal modularity clusterings Un∗ ∈ arg max Qn (Un ) |Un |≤K look like? Questions Gn gives (random) modularity functional Qn . 1. What is the behavior of Qn as n → ∞? 2. What do optimal modularity clusterings Un∗ ∈ arg max Qn (Un ) |Un |≤K look like? Consistency: Subject to certain technical assumptions, Un∗ → U ∗ where U ∗ is a partition of D characterized as the solution to a (deterministic) continuum optimization problem. Consistency of Clustering Methods I K-Means (Pollard 1981) I Spectral Clustering (von Luxburg, Belkin, & Bosquet 2008) I Modularity (Bickel & Chen 2009, Zhao, Levina, & Zhu 2012) I Cheeger Cut (Garcı́a-Trillos, Slepčev, von Brecht, Laurent, & Bresson 2014) Outline Introduction to Modularity Clustering Pointwise Convergence Convergence of Optimal Partitions Key Identity Let Un = {Un,k }K k=1 be a partition of Xn , and let un,k = 1Un,k . Key Identity Let Un = {Un,k }K k=1 be a partition of Xn , and let un,k = 1Un,k . Then 1 − 1/K − Qn (Un ) = K 2 1 X X α d u (X ) − 1/K i n,k i S2 + n i k=1 K 2 X n 4m GTVn (un,k ), k=1 where GTVn (u) := 1 1 X ηn (Xi − Xj )|u(Xi ) − u(Xj )|. n n2 i,j Continuum Partitioning Domain D ⊂ Rd , fixed K ≥ 1, and partition U = {Uk }K k=1 of D. Continuum Partitioning Domain D ⊂ Rd , fixed K ≥ 1, and partition U = {Uk }K k=1 of D. I U is balanced with respect to µ if µ(Uk ) = 1/K for k = 1, . . . , K . Continuum Partitioning Domain D ⊂ Rd , fixed K ≥ 1, and partition U = {Uk }K k=1 of D. I U is balanced with respect to µ if µ(Uk ) = 1/K for k = 1, . . . , K . I The perimeter of Uk in D, with respect to a weight ρ2 , is Z Per(Uk ; ρ2 ) = ρ2 (x) dHd−1 (x). ∂Uk Continuum Partitioning Domain D ⊂ Rd , fixed K ≥ 1, and partition U = {Uk }K k=1 of D. I U is balanced with respect to µ if µ(Uk ) = 1/K for k = 1, . . . , K . I The perimeter of Uk in D, with respect to a weight ρ2 , is Z Per(Uk ; ρ2 ) = ρ2 (x) dHd−1 (x). ∂Uk More generally, Per(Uk ; ρ2 ) = TV (1Uk ; ρ2 ) := Z sup 1Uk (x)div Φ(x) dx. Φ∈Cc1 (D;Rd ) |Φ(x)|≤ρ2 (x) Remark: When f smooth, TV (f ; ρ2 ) = R D D |∇f |ρ2 (x) dx. Continuum Partitioning ∗ U = arg min K X |U |=K k=1 µ(Uk )=1/K Per(Uk ; ρ2 ). Continuum Partitioning ∗ U = arg min K X Per(Uk ; ρ2 ). |U |=K k=1 µ(Uk )=1/K (a) K = 4 (b) K = 9. (c) K = 25. Figure: Local minimizers on D = (0, 1)2 , with ρ(x) = 1, dµ = dx, produced using The Surface Evolver. “Pointwise” Convergence A (finite perimeter) partition U of D induces a partition Un of Xn ⊂ D. “Pointwise” Convergence A (finite perimeter) partition U of D induces a partition Un of Xn ⊂ D. “Pointwise” Convergence A (finite perimeter) partition U of D induces a partition Un of Xn ⊂ D. “Pointwise” Convergence What is the behavior of Qn as n → ∞? “Pointwise” Convergence What is the behavior of Qn as n → ∞? Theorem (Asymptotics) Let U be a finite perimeter partition. Suppose {n } satisfies P∞ P (d+1)/2 d ) < ∞ when α = 0, 1 and ∞ n=1 exp(−nn n=1 exp(−nn ) < ∞ otherwise. “Pointwise” Convergence What is the behavior of Qn as n → ∞? Theorem (Asymptotics) Let U be a finite perimeter partition. Suppose {n } satisfies P∞ P (d+1)/2 d ) < ∞ when α = 0, 1 and ∞ n=1 exp(−nn n=1 exp(−nn ) < ∞ otherwise. Then, as n → ∞,  2 C PK Per(U ; ρ2 ) if PK = 0, 1 − 1/K − Qn (Un ) a.s. η,ρ k k=1 k=1 µ(Uk ) − 1/K −−→ ∞ n otherwise, where dµ(x) = R R ρ1+α (x) dx n η(x)|x1 | dx and Cη,ρ = R R 2 . 1+α (x) dx ρ 2 ρ (x) dx D D Sketch of Proof: Convergence of Graph Total Variation Recall GTVn (u) = 1 1 X ηn (Xi − Xj )|u(Xi ) − u(Xj )|. n n 2 i,j Sketch of Proof: Convergence of Graph Total Variation Recall GTVn (u) = 1 1 X ηn (Xi − Xj )|u(Xi ) − u(Xj )|. n n 2 i,j Proposition Fix u = 1U , and let {n }n∈N be a sequence converging to zero such that ∞ X exp(−nn(d+1)/2 ) < +∞. n=1 Then a.s. GTVn (u) −−→ ση TV (u; ρ2 ). Sketch of Proof: Convergence of Graph Total Variation Recall GTVn (u) = 1 1 X ηn (Xi − Xj )|u(Xi ) − u(Xj )|. n n 2 i,j Proposition Fix u = 1U , and let {n }n∈N be a sequence converging to zero such that ∞ X exp(−nn(d+1)/2 ) < +∞. n=1 Then a.s. GTVn (u) −−→ ση TV (u; ρ2 ). Ingredients: Sketch of Proof: Convergence of Graph Total Variation Recall GTVn (u) = 1 1 X ηn (Xi − Xj )|u(Xi ) − u(Xj )|. n n 2 i,j Proposition Fix u = 1U , and let {n }n∈N be a sequence converging to zero such that ∞ X exp(−nn(d+1)/2 ) < +∞. n=1 Then a.s. GTVn (u) −−→ ση TV (u; ρ2 ). Ingredients: I Nonlocal TV (Ponce ‘04) Sketch of Proof: Convergence of Graph Total Variation Recall GTVn (u) = 1 1 X ηn (Xi − Xj )|u(Xi ) − u(Xj )|. n n 2 i,j Proposition Fix u = 1U , and let {n }n∈N be a sequence converging to zero such that ∞ X exp(−nn(d+1)/2 ) < +∞. n=1 Then a.s. GTVn (u) −−→ ση TV (u; ρ2 ). Ingredients: I Nonlocal TV (Ponce ‘04) I Exponential bounds for U-statistics (Giné, Latala, & Zinn ‘00) Outline Introduction to Modularity Clustering Pointwise Convergence Convergence of Optimal Partitions Convergence of Partitions Convergence of Partitions U3,1 U3,2 Convergence of Partitions γ3,1 ∈ P(D × {0, 1}) Convergence of Partitions U1 γ3,1 ∈ P(D × {0, 1}) U2 Convergence of Partitions γ3,1 ∈ P(D × {0, 1}) γ1 ∈ P(D × {0, 1}) Convergence of Partitions Given partition Un = {Un,k }K k=1 of Xn , we associate the measures γn,k ∈ P(D × {0, 1}), for k = 1, . . . , K , by n γn,k 1X δ(Xi ,1U (Xi )) = (Id × 1Un,k )] νn . = n,k n i=1 Convergence of Partitions Given partition Un = {Un,k }K k=1 of Xn , we associate the measures γn,k ∈ P(D × {0, 1}), for k = 1, . . . , K , by n γn,k 1X δ(Xi ,1U (Xi )) = (Id × 1Un,k )] νn . = n,k n i=1 Given partition U = {Uk }K k=1 of D, we similarly define γk = (Id × 1Uk )] ν. Convergence of Partitions Given partition Un = {Un,k }K k=1 of Xn , we associate the measures γn,k ∈ P(D × {0, 1}), for k = 1, . . . , K , by n γn,k 1X δ(Xi ,1U (Xi )) = (Id × 1Un,k )] νn . = n,k n i=1 Given partition U = {Uk }K k=1 of D, we similarly define γk = (Id × 1Uk )] ν. w We say that Un − → U if there exists a sequence {πn }n∈N of permutations of {1, . . . , K } such that w γn,πn k − → γk , for k = 1, . . . , K . Convergence of Optimal Partitions Theorem (Convergence of Optimal Partitions) Suppose {n }n∈N satisfies suitable conditions. For n ≥ 1, let Un∗ ∈ arg max|U |≤K Qn (U) be an optimal partition. If U ∗ is the unique solution (up to relabeling of its constituent sets) to the problem K X minimize Per(Uk ; ρ2 ) (P) |U |=K µ(Uk )=1/K k=1 with dµ(x) = ρ1+α (x) dx/ a.s. R D ρ1+α (x) dx, then Un∗ −−→ U ∗ . Convergence of Optimal Partitions Theorem (Convergence of Optimal Partitions) Suppose {n }n∈N satisfies suitable conditions. For n ≥ 1, let Un∗ ∈ arg max|U |≤K Qn (U) be an optimal partition. If U ∗ is the unique solution (up to relabeling of its constituent sets) to the problem K X minimize Per(Uk ; ρ2 ) (P) |U |=K µ(Uk )=1/K k=1 R a.s. with dµ(x) = ρ1+α (x) dx/ D ρ1+α (x) dx, then Un∗ −−→ U ∗ . If there is more than one solution to (P), then almost surely i) {Un∗ }n∈N has at least one cluster point, and ii) every cluster point is a solution to (P). Example Remarks on Theorem I Weak convergence of measures via Wasserstein metric. Remarks on Theorem I Weak convergence of measures via Wasserstein metric. I Useful tool: transport maps relating empirical measures νn to ν, with bound on ∞-transport cost (Garcı́a-Trillos, Slepčev). Remarks on Theorem I Weak convergence of measures via Wasserstein metric. I Useful tool: transport maps relating empirical measures νn to ν, with bound on ∞-transport cost (Garcı́a-Trillos, Slepčev). I Because modularity clusterings are optimizers of discrete energies, we use Γ-convergence to prove that their limit is the optimizer of a continuum energy. Remarks on Theorem I Weak convergence of measures via Wasserstein metric. I Useful tool: transport maps relating empirical measures νn to ν, with bound on ∞-transport cost (Garcı́a-Trillos, Slepčev). I Because modularity clusterings are optimizers of discrete energies, we use Γ-convergence to prove that their limit is the optimizer of a continuum energy. I Balance constraint in the continuum problem presents a technical difficulty. Remarks on Theorem I Weak convergence of measures via Wasserstein metric. I Useful tool: transport maps relating empirical measures νn to ν, with bound on ∞-transport cost (Garcı́a-Trillos, Slepčev). I Because modularity clusterings are optimizers of discrete energies, we use Γ-convergence to prove that their limit is the optimizer of a continuum energy. I Balance constraint in the continuum problem presents a technical difficulty. I We modified the notion of Γ-convergence for random functionals to allow the use of our pointwise convergence result. Existence of Transport Maps Let D ⊂ Rd be open, connected with Lipschitz boundary. Assume ν = ρ(x) dx with ρ continuous and bounded above/below by positive constants. Proposition (Garcı́a-Trillos, Slepčev ‘14) There is a constant C > 0 such that, with probability one, there exists a sequence of transportation maps {Tn }n∈N , Tn] ν = νn with lim sup n→∞ lim sup n→∞ lim sup n→∞ n1/2 kId − Tn k∞ ≤C (2 log log n)1/2 (d = 1), n1/d kId − Tn k∞ ≤C (log n)3/4 (d = 2), n1/d kId − Tn k∞ ≤C (log n)1/d (d ≥ 3). Assumption on n Assume {n }n∈N is such that n > 0 and n → 0. For α = 0, 1, assume that √ 2 log log n 1 √ lim = 0, if d = 1 n→∞ n n (log n)3/4 1 =0 n→∞ n1/2 n (log n)1/d 1 lim =0 n→∞ n1/d n lim if d = 2 if d ≥ 3. For α 6= 0, 1, assume that ∞ X n=1 n exp(−nd+1 ) < ∞. n The role of α (a) Level sets (b) α = −1 (c) α = 0 (d) α = 1 dµ(x) ∝ ρ(x)1+α , ρ(x) ∝ min(2 exp(−4||x − x0 ||2 ), 1/2). Thanks for coming!

Consistency of Modularity Clustering on Random Geometric Graphs Erik Davis May 10, 2016

Related documents

Products

Support

Consistency of Modularity Clustering on Random Geometric Graphs Erik Davis May 10, 2016

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib