Estimation of Latent Variable Densities in Networks

Sharmodeep Bhattacharyya
Department of Statistics
University of California, Berkeley and Oregon State University
Workshop on Theory of Big Data, UCL, January, 2015
(Joint work with Peter J. Bickel, UC Berkeley and Patrick J. Wolfe, UCL)

Sharmodeep Bhattacharyya (berkeley) Networks January 8, 2015

Outline
1 Introduction and Motivation
2 Feature and Models of Networks
    Nonparametric Latent Space Models
    Density Functional Estimation
    Estimation of Latent Variable Density
    Regularization
3 Summary

Network Data

$G = (V, E)$: undirected graph, with $V = \{v_1, \dots, v_n\}$ arbitrarily labeled vertices.
The symmetric adjacency matrix $[A_{ij}]_{i,j=1}^n$ numerically represents network data:

\[
A_{ij} =
\begin{cases}
1 & \text{if node } i \text{ links to node } j, \\
0 & \text{otherwise.}
\end{cases}
\]

Example: Collegiate Social Network

Figure: Facebook network adjacency matrices for two different colleges, shown in two different rows (Traud et al. (2011), SIAM Review).

Nonparametric Latent Variable Models

Derived from the representation of exchangeable random infinite arrays by Aldous and Hoover (1983).

NP Model
Define $P(\{A_{ij}\}_{i,j=1}^n)$ conditionally given latent variables $\{\xi_i\}_{i=1}^n$ associated with vertices $\{v_i\}_{i=1}^n$ respectively (Bickel & Chen (2009), Bollobás et al. (2007), Hoff et al. (2002)).

\[
\xi_1, \dots, \xi_n \overset{iid}{\sim} U(0, 1),
\qquad
\Pr(A_{ij} = 1 \mid \xi_i = u, \xi_j = v) = h_n(u, v) = \rho_n w(u, v).
\]

$w(u, v)$ is the conditional latent variable density given $A_{ij} = 1$.
Define $\lambda_n \equiv n\rho_n$ as the expected degree parameter and $P = [P_{ij}]_{i,j=1}^n = [\rho_n w(\xi_i, \xi_j)]_{i,j=1}^n$.
$h_n$ is not uniquely defined: $h_n(\varphi(u), \varphi(v))$, with measure-preserving $\varphi$, gives the same model.

Stochastic Block Model (Holland, Laskey and Leinhardt 1983)

A $K$-block stochastic block model with parameters $(\pi, P)$ is defined as follows. Consider latent variables corresponding to the vertices, $z = (z_1, z_2, \dots, z_n)$, with

\[
z_1, \dots, z_n \overset{iid}{\sim} \mathrm{Multinomial}(1; (\pi_1, \dots, \pi_K)),
\qquad
\Pr(A_{ij} = 1 \mid z_i, z_j) = P_{z_i z_j},
\]

where $P = [P_{ab}]$ is a $K \times K$ symmetric matrix for undirected networks.

Parameters of Interest

Density Functional
The integral parameter on a subgraph $R$ is defined as

\[
P(R) = E\left[ \prod_{(i,j) \in R} h(\xi_i, \xi_j) \prod_{(i,j) \in \bar{R}} \bigl(1 - h(\xi_i, \xi_j)\bigr) \right],
\]

where $\bar{R} = \{(i, j) \notin R,\ i \in V(G),\ j \in V(G)\}$.

Density
Estimate a representation of the latent variable density $w$ or $h$.
Estimate the equivalence class of the latent variable density $w$ or $h$ with respect to norms of the form of the cut metric (Lovász (2006)).

Empirical "Moments" / Count Statistics

Count statistics are normalized subgraph counts and smooth functions of them.
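As a concrete illustration of the stochastic block model and of count statistics, the following minimal sketch simulates a small network and computes two normalized subgraph counts (edges and triangles). It is only an illustration under stated assumptions: the function names `sample_sbm`, `edge_density` and `triangle_density` are hypothetical, community labels are 0-based, and triangles are counted by brute force over all vertex triples, which is only practical for small $n$.

```python
import random
from itertools import combinations
from math import comb


def sample_sbm(n, pi, P, seed=0):
    """Draw labels z and a symmetric 0/1 adjacency matrix A from a K-block SBM (pi, P)."""
    rng = random.Random(seed)
    K = len(pi)
    # z_1, ..., z_n iid Multinomial(1; pi); labels are 0-based here (assumption).
    z = rng.choices(range(K), weights=pi, k=n)
    A = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        # Pr(A_ij = 1 | z_i, z_j) = P[z_i][z_j]; no self-loops.
        if rng.random() < P[z[i]][z[j]]:
            A[i][j] = A[j][i] = 1
    return z, A


def edge_density(A):
    """Number of edges divided by the number of vertex pairs."""
    n = len(A)
    edges = sum(A[i][j] for i, j in combinations(range(n), 2))
    return edges / comb(n, 2)


def triangle_density(A):
    """Number of triangles divided by the number of vertex triples (brute force)."""
    n = len(A)
    triangles = sum(
        A[i][j] * A[j][k] * A[i][k] for i, j, k in combinations(range(n), 3)
    )
    return triangles / comb(n, 3)


if __name__ == "__main__":
    # Hypothetical parameters: two equal-sized assortative communities.
    z, A = sample_sbm(60, [0.5, 0.5], [[0.30, 0.05], [0.05, 0.30]], seed=1)
    print(edge_density(A), triangle_density(A))
```

Both quantities are averages of subgraph indicators over all vertex subsets of the relevant size, which is the sense in which they are "empirical moments" of the network.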
The subgraph count $\hat{P}(R)$ for a subgraph $R$ with $p = |V(R)|$ vertices is

\[
\hat{P}(R) = \frac{1}{\binom{n}{p} |\mathrm{Hom}(R)|} \sum_{S \subseteq K_n,\ S \cong R} 1(S \subseteq G) \tag{1}
\]

where $\mathrm{Hom}(R)$ is the group of homomorphisms of $R$ and $K_n$ is the complete graph on $n$ vertices.

Examples
(a) The average degree of a network is a count statistic: $\bar{D} = \frac{1}{n} \sum_{i=1}^n D_i$, with $D_i = \sum_{j \neq i} A_{ij}$.
(b) Another well-known statistic is

\[
\text{Transitivity} = \frac{\text{Normalized Count of } \Delta}{\text{Normalized Count of } \Delta + \text{`V'}}.
\]

Computation of Count and Variance of Count Statistics

Counts:
The worst-case computational complexity of exactly counting the number of subgraphs $R$ in $G_n$ is $O(n^p)$, where $p = |V(R)|$.
Computational complexity varies with the subgraph and the sparsity of the graph.
For dense graphs and complex patterns, the approximate counts are very crude.

Variances:
Finding variances of complex patterns also becomes theoretically challenging.

So, instead of exact counting, we try approximate counting (a similar idea is used by Holmes and Reinert (2004)).

Bootstrap Scheme

1 For the $b$th iterate of the bootstrap, $b = 1, \dots, B$,
2 Fix $p$ = size of $R$ = $|V(R)|$.
3 Perform the random breadth-first search described in Wernicke (2006) with a set of sampling probabilities $(q_1, \dots, q_p)$.
4 Calculate $\hat{P}_b(R)$, given by the formula

\[
\hat{P}_b(R) = \frac{1}{\prod_{d=1}^p q_d \, \binom{n}{p} |\mathrm{Hom}(R)|} \sum_{S \in \mathcal{S}_p^R} 1(S \cong R),
\qquad
\bar{P}_B(R) = \frac{1}{B} \sum_{b=1}^B \hat{P}_b(R),
\]

where $\mathcal{S}_p^R$ is the set of all size-$p$ randomly selected subgraphs of $G$.

Bootstrap Theorem

Theorem (B. and Bickel (2013))
Suppose $R$ is fixed, acyclic with $|V(R)| = p$, $e = |E(R)|$, and $\int_0^\infty \int_0^\infty w^{2|R|}(u, v)\,du\,dv < \infty$. For $B \to \infty$ and $q_d \to 0$ for all $d = 1, \dots, p$ such that $B \prod_{d=2}^p q_d \geq \frac{1}{n^{p-1} \rho_n^e}$, $\frac{1}{B}\left(\frac{1}{q_1} - 1\right) \to 0$, and $n \to \infty$, $\lambda_n \to \infty$, and under $G$ generated from the nonparametric latent variable model, then:

(i)
\[
\sqrt{n} \left( \frac{\hat{\rho}_n^{-e} \bar{P}_B(R) - \rho_n^{-e} P(R)}{\hat{\sigma}_B^2(R)} \right) \Rightarrow N(0, 1)
\]

(ii) Given $G$,
\[
\mathrm{Var}\left( \rho_n^{-e} \hat{P}_b(R) \mid G \right) = O\left( \left( \frac{1}{q_1} - 1 \right) \frac{1}{n} + \frac{1}{n \rho_n^{e-p+1}} \cdot \frac{1}{\prod_{d=2}^p \lambda_n q_d} \right). \tag{2}
\]

(iii) We can set bootstrap confidence intervals for $P(R)$.

A General Principle for Estimating Variance

[Figure panels (a) and (b)]

If $p = |V(R)|$, $e = |E(R)|$,

\[
\mathrm{Var}\left( \frac{T(R)}{\rho_n^e} \right) = \frac{1}{\left( \rho_n^e \binom{n}{p} |\mathrm{Iso}(R)| \right)^2} \, E\left[ \sum_{\substack{S, T \subseteq K_n \\ S, T \cong R,\ S \cap T \neq \emptyset}} 1(S, T \subseteq H) \right]
\]

A General Principle for Estimating Variance

[Figure panels (c) and (d)]

If $p = |V(R)|$, $e = |E(R)|$,

\[
\mathrm{Var}\left( \frac{T(R)}{\rho_n^e} \right) \approx \frac{1}{\left( \rho_n^e \binom{n}{p} |\mathrm{Iso}(R)| \right)^2} \sum_{\substack{W \subseteq K_n,\ W = S \cup T \\ S, T \cong R,\ |S \cap T| = 1}} 1(W \subseteq G)
\]

Block Model Approximation

For a fixed number of communities $K$, a community assignment function $z$ assigns communities based on a symmetric matrix $M_{n \times n}$, defined as

\[
z(M)(i) \equiv z_i(M) : \{1, \dots, n\} \to \{1, \dots, K\} \tag{3}
\]

The metrics we will mainly refer to are
(i) $\|w_1 - w_2\|_2^2 = \inf_\sigma \int_0^1 \int_0^1 (w_1 - w_2)^2(u, \sigma v)\,du\,dv$
(ii) $\|z^{(1)} - z^{(2)}\|_H = \inf_\pi H(z^{(1)}, \pi \circ z^{(2)})$,
where $\sigma : [0, 1] \to [0, 1]$ is a measure-preserving transformation, $\pi$ is any permutation of $\{1, \dots, n\}$, and $H$ is the normalized Hamming distance

\[
H(z^{(1)}, z^{(2)}) = \frac{1}{n} \sum_{i=1}^n 1\left( z_i^{(1)} \neq z_i^{(2)} \right) \tag{4}
\]

Block Model Approximation

Given $z(M)$, we can form a $K \times K$ mean matrix $\bar{M}^z$ from any symmetric matrix $M_{n \times n}$:

\[
\bar{M}^z_{ab} \equiv \frac{1}{O_{ab}} \sum_{i=1}^n \sum_{j=1}^n M_{ij} \, 1(z_i = a, z_j = b), \qquad 1 \leq a, b \leq K, \tag{5}
\]

where

\[
O_{ab} \equiv
\begin{cases}
n_a n_b, & 1 \leq a, b \leq K,\ a \neq b, \\
n_a (n_a - 1), & 1 \leq a \leq K,\ a = b,
\end{cases}
\qquad
n_a \equiv \sum_{i=1}^n 1(z_i = a), \quad 1 \leq a \leq K.
\]

Estimation of Latent Variable Density

Now, we define the estimate of the latent variable density $w$, based on the adjacency matrix $A_{n \times n}$, as

\[
\hat{w}(x, y; z) \equiv \hat{\rho}^{-1} \bar{A}^{z(A)}_{z_{G(x)}(A),\, z_{G(y)}(A)}, \qquad (x, y) \in [0, 1]^2, \tag{6}
\]

where

\[
\hat{\rho} = \frac{1}{\binom{n}{2}} \sum_{i > j} A_{ij}
\quad \text{and} \quad
G(x) \equiv \min\left\{ i \in [n] : \frac{i}{n} \geq x \right\}. \tag{7}
\]

Assumptions

Let $w_0$ be the true latent variable density. Define $z_0 \equiv z(P)$ and $w_0(\cdot, \cdot; z_0)$ by replacing $A$ by $P$ and $\hat{\rho}$ by $\rho$ in (6). Define $\hat{z} \equiv z(A)$.

A1 Assumption on $w_0$: $w_0 \leq M_0 < \infty$.
A2 Assumption on $w_0$ and $z$: the smallest and largest community sizes under $z_0$, $n_\wedge(z_0)$ and $n_\vee(z_0)$, are bounded below and above, respectively, by multiples of $n/K$.
A3 Assumption on $w_0$ and $z$: $\|w_0(\cdot, \cdot) - w_0(\cdot, \cdot; z_0)\|_2 \leq \mu_n \to 0$. Under conditions we can show $\mu_n^2 \leq M_1 / K^2$.
A4 Assumption on $z$: $\|\hat{z} - z_0\|_H = O_P(\Delta_n(K))$, where $\Delta_n \to 0$.
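The block-average estimator defined in (5)-(7) can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, community labels are 0-based, the assignment `z` is assumed to be supplied by some clustering method, and blocks with $O_{ab} = 0$ are set to zero by convention (an assumption, since (5) is undefined there).

```python
from math import ceil, comb


def block_mean_matrix(M, z, K):
    """K x K mean matrix of eq. (5) for a symmetric matrix M and labels z in {0, ..., K-1}."""
    n = len(M)
    sizes = [sum(1 for label in z if label == a) for a in range(K)]
    sums = [[0.0] * K for _ in range(K)]
    for i in range(n):
        for j in range(n):
            sums[z[i]][z[j]] += M[i][j]
    means = [[0.0] * K for _ in range(K)]
    for a in range(K):
        for b in range(K):
            # O_ab = n_a * n_b off the diagonal, n_a * (n_a - 1) on it.
            o_ab = sizes[a] * (sizes[a] - 1) if a == b else sizes[a] * sizes[b]
            # Assumption: empty or singleton blocks get the value 0.
            means[a][b] = sums[a][b] / o_ab if o_ab > 0 else 0.0
    return means


def rho_hat(A):
    """Edge density of eq. (7): sum of A_ij over i > j, divided by C(n, 2)."""
    n = len(A)
    return sum(A[i][j] for i in range(n) for j in range(i)) / comb(n, 2)


def G(x, n):
    """Eq. (7): smallest i in {1, ..., n} with i / n >= x (1-based index)."""
    return max(1, ceil(n * x))


def w_hat(x, y, A, z, K):
    """Eq. (6): rescaled block mean at the blocks of vertices G(x) and G(y).

    Requires at least one edge, otherwise rho_hat(A) is zero.
    """
    n = len(A)
    means = block_mean_matrix(A, z, K)
    a = z[G(x, n) - 1]  # convert the 1-based vertex index to a 0-based list index
    b = z[G(y, n) - 1]
    return means[a][b] / rho_hat(A)
```

For a network with two communities and edges only inside them, `w_hat` is a step function that is large on the two diagonal blocks and zero elsewhere; with more communities it becomes a finer histogram of $w$.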
Main Theorem

Theorem (B., Bickel and Wolfe (2014))
Let $A_{n \times n}$ be the adjacency matrix of a simple random graph under the model equation. Under assumptions A1-A4 and for community assignment function $z$,

\[
\|w_0(\cdot, \cdot) - \hat{w}(\cdot, \cdot; \hat{z})\|_2 = \max\left\{ O(\mu_n(K)),\ O_P\left( \sqrt{\frac{K}{n \rho_n}} \right),\ O_P\left( K^{3/2} \rho_n \Delta_n \right) \right\}.
\]

Methods of Obtaining $\hat{w}$

Existing Methods
Olhede and Wolfe (2013) proposed a scheme using profile likelihood as the estimation method.
Airoldi and Chan (2014) proposed a method using the degree distribution as the estimation method.
Latouche and Robin (2013) and Lloyd et al. (2013) proposed Bayesian methods for exchangeable network model inference.
Gao, Lu and Zhou (2014) give minimax rates for the dense case.

Generalization
Any estimation method for the block model that satisfies the conditions on estimation error can be used to give $\hat{w}$ from (6).
Examples of such methods include maximum likelihood, variational likelihood, spectral clustering, SDP relaxation and other sufficiently accurate clustering schemes.

Special Case: Spectral Clustering

Theorem 2
Let $A_{n \times n}$ be the adjacency matrix of a simple random graph under the model equation. Assume A1-A3 for the spectral assignment function $z^{sp}$, and let $\gamma_n$ be the absolute difference between the $K$th and $(K + 1)$th eigenvalues of $P$. As $n \to \infty$,

\[
\|w_0(\cdot, \cdot) - \hat{w}(\cdot, \cdot; \hat{z})\|_2 = \max\left\{ O(\mu_n(K)),\ O_P\left( \sqrt{\frac{K}{n \rho_n}} \right),\ O_P\left( K^{3/2} \rho_n \Delta_n \right) \right\},
\]

where

\[
\Delta_n(K) = O\left( \frac{nK \left( \|P - \bar{P}^{z^{sp}(P)}\| + \|A - P\| \right)^2}{\gamma_n^2} \right).
\]
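To make the idea of a spectral assignment function concrete, here is a minimal sketch for the special case $K = 2$. It is a simplification and not the procedure analysed in the theorem: instead of clustering the $K$ leading eigenvectors, it splits vertices by the sign of the eigenvector belonging to the second-largest eigenvalue of $A$, which is only sensible when within-community links are more likely than between-community links. The function names are hypothetical, and eigenvectors are found by plain power iteration on the shifted matrix $A + cI$ (with $c$ one more than the maximum degree, so that the shifted matrix is positive definite).

```python
import random


def _matvec(B, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(b * v for b, v in zip(row, x)) for row in B]


def _normalize(x):
    """Scale x to unit Euclidean norm."""
    norm = sum(v * v for v in x) ** 0.5
    return [v / norm for v in x]


def _top_eigvec(B, ortho=(), iters=1000, seed=0):
    """Power iteration for a leading eigenvector of a symmetric positive
    definite matrix B, restricted to the orthogonal complement of `ortho`."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in B]
    for _ in range(iters):
        for u in ortho:
            dot = sum(a * b for a, b in zip(x, u))
            x = [a - dot * b for a, b in zip(x, u)]
        x = _normalize(_matvec(B, x))
    return x


def spectral_bisection(A, iters=1000, seed=0):
    """Assign each vertex to community 0 or 1 (K = 2 only, assortative case)."""
    n = len(A)
    # Shift so that every eigenvalue of B is positive; eigenvectors are unchanged.
    c = max(sum(row) for row in A) + 1
    B = [[A[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]
    v1 = _top_eigvec(B, iters=iters, seed=seed)
    v2 = _top_eigvec(B, ortho=(v1,), iters=iters, seed=seed + 1)
    # Which sign maps to which label is arbitrary.
    return [0 if v >= 0 else 1 for v in v2]
```

The returned labels can play the role of $\hat{z}$ in (6). For general $K$, a standard approach is to run a clustering step on the rows of the matrix of $K$ leading eigenvectors, which this sketch does not attempt.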
Regularization: Choice of K (Ongoing Work)

One idea is to use cross-validation based on the density functionals. For size-$r$ subgraphs $\{a_r\}$,

\[
P_r(a_r) \equiv \Pr[A_{ij} = a_{ij} : 1 \leq i, j \leq r]
= \int_0^1 \cdots \int_0^1 \prod_{1 \leq i < j \leq r} [\rho_n w(\xi_i, \xi_j)]^{a_{ij}} [1 - \rho_n w(\xi_i, \xi_j)]^{1 - a_{ij}} \, d\xi_1 \cdots d\xi_r.
\]

Define

\[
\|P_r - Q_r\| = \sum_{a_r \in \{0, 1\}^{r \times r}} \left| P[A_r = a_r] - Q[A_r = a_r] \right|.
\]

Regularization: Choice of K (Ongoing Work)

$\hat{P}_r(K)$ is obtained by using $\hat{w}$.
$\|\hat{P}_r(K) - P_r\|_2^2$ is estimated by $\|\hat{P}_r(K) - \hat{P}_r\|_2^2$.

Lemma 4
If $d_{cut}(\hat{P}(K), P) \leq \Delta_n$ and $0 < \delta \leq w, \hat{w} \leq 1/\delta$,

\[
\mathrm{MSE}(K) = \|\hat{P}_r(K) - P_r\| = O_P\left( \binom{r}{2} 2^{r^2/2} \Delta_n(K) \right). \tag{8}
\]

\[
K_{opt} = \operatorname*{argmin}_K \|\hat{P}_r(K) - \hat{P}_r^B\|_2^2, \tag{9}
\]

where $\hat{P}_r^B$ is the bootstrap estimate of $\hat{P}_r$.

Facebook Data

Figure: Top left picture is the adjacency matrix of the network. The rest of the figures represent the $\hat{w}$ generating the network for $K = 8, 13, 22$.

Figure: The cross-validation test using $r = 3$ between the actual network and the estimated network with number of clusters $K$.
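The distance between size-$r$ subgraph distributions used in the cross-validation idea can be illustrated for $r = 3$. The sketch below is an assumption-laden stand-in, not the procedure from the slides: it estimates each distribution empirically from a single adjacency matrix, by tabulating the edge pattern $(A_{ij}, A_{ik}, A_{jk})$ over all vertex triples $i < j < k$, and then sums the absolute differences. Comparing an observed network with one simulated from a fitted $K$-block model is one hypothetical way such a comparison could be set up; the function names are illustrative only.

```python
from itertools import combinations, product
from math import comb


def triple_pattern_distribution(A):
    """Empirical distribution of the edge pattern (A_ij, A_ik, A_jk) over all i < j < k."""
    n = len(A)
    counts = {pattern: 0 for pattern in product((0, 1), repeat=3)}
    for i, j, k in combinations(range(n), 3):
        counts[(A[i][j], A[i][k], A[j][k])] += 1
    total = comb(n, 3)
    return {pattern: count / total for pattern, count in counts.items()}


def l1_distance(P, Q):
    """Sum of absolute differences between two pattern distributions on the same keys."""
    return sum(abs(P[pattern] - Q[pattern]) for pattern in P)
```

The distance is 0 for identical distributions and reaches its maximum of 2 when the two distributions put their mass on disjoint sets of patterns, for example a complete graph against an empty one.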
Future Works

Works in Progress
Extension of the subsampling bootstrap to more general statistics.
Provide a proper regularization scheme and general principles under which block model approximations work.
Extend nonparametric latent space models to more general models.
Verify the usefulness of the method on real network data sets.

References

S. Bhattacharyya (2013). A Study of High-dimensional Clustering and Statistical Inference of Networks. PhD Thesis.
S. Bhattacharyya and P. J. Bickel (2013). Subsampling bootstrap of count features of networks. (Under revision, Ann. Stat.)
S. Bhattacharyya and P. J. Bickel (2013). Community detection in networks using graph distance. arXiv.
S. Bhattacharyya, P. J. Bickel and P. J. Wolfe (2014). Estimating Latent Variable Densities for Exchangeable Network Models. In progress.
P. J. Bickel and A. Chen (2009). A nonparametric view of network models and Newman-Girvan and other modularities. PNAS.
P. J. Bickel, A. Chen and E. Levina (2011). The method of moments and degree distributions for network models. Ann. Stat.
P. Wolfe and S. Olhede (2013). Nonparametric graphon estimation. arXiv.