Estimation of Latent Variable Densities in Networks

advertisement
Estimation of Latent Variable Densities in
Networks
Sharmodeep Bhattacharyya
Department of Statistics
University of California, Berkeley and Oregon State University
Workshop on Theory of Big Data, UCL, January, 2015
(Joint work with Peter J. Bickel, UC Berkeley and Patrick J. Wolfe,
UCL)
Sharmodeep Bhattacharyya (berkeley)
Networks
January 8, 2015
1 / 32
Outline
1
Introduction and Motivation
2
Feature and Models of Networks
Nonparametric Latent Space Models
Density Functional Estimation
Estimation of Latent Variable Density
Regularization
3
Summary
Sharmodeep Bhattacharyya (berkeley)
Networks
January 8, 2015
2 / 32
Introduction and Motivation
Network Data
G = (V , E ): undirected graph and V = {v1 , · · · , vn } arbitrarily labeled vertices.
Adjacency matrices (Symmetric), [Aij ]ni,j=1 numerically represent network data:
Aij =
Sharmodeep Bhattacharyya (berkeley)



1
if node i links to node j,


0
otherwise.
Networks
January 8, 2015
3 / 32
Introduction and Motivation
Example: Collegiate Social Network
Figure : Facebook network adjacency matrix for two different colleges in two different rows
Sharmodeep
(Traud et.Bhattacharyya
al. (2011)(berkeley)
SIAM Review).
Networks
January 8, 2015
4 / 32
Feature and Models of Networks
Nonparametric Models
Outline
1
Introduction and Motivation
2
Feature and Models of Networks
Nonparametric Latent Space Models
Density Functional Estimation
Estimation of Latent Variable Density
Regularization
3
Summary
Sharmodeep Bhattacharyya (berkeley)
Networks
January 8, 2015
5 / 32
Feature and Models of Networks
Nonparametric Models
Nonparametric Latent Variable Models
Derived from representation of exchangeable random infinite array by Aldous and
Hoover (1983).
NP Model
Define P({Aij }ni,j=1 ) conditionally given latent variables {ξi }ni=1 associated with vertices
{vi }ni=1 respectively. (Bickel & Chen (2009), Bollobás et.al. (2007), Hoff et.al. (2002)).
ξ1 , . . . , ξn
Pr(Aij = 1|ξi = u, ξj = v )
iid
∼ U(0, 1)
=
hn (u, v ) = ρn w (u, v ),
w (u, v ) is the conditional latent variable density given Aij = 1.
Define λn ≡ nρn as the expected degree parameter and P = [Pij ]ni,j = [ρn w (ξi , ξj )]ni,j .
hn : not uniquely defined. hn ϕ(u), ϕ(v ) , with measure-preserving ϕ, gives same model.
Sharmodeep Bhattacharyya (berkeley)
Networks
January 8, 2015
6 / 32
Feature and Models of Networks
Nonparametric Models
Stochastic Block Model (Holland, Laskey and Leinhardt 1983)
A K -block stochastic block model with parameters (π, P) is defined as
follows. Consider latent variable corresponding to vertices as
z = (z 1 , z 2 , . . . , z n ) with
z 1, . . . , z n
iid
∼ Multinomial(1; (π1 , . . . , πK ))
Pr(Aij = 1|z i , z j ) = Pz i z j ,
where P = [Pab ] is a K × K symmetric matrix for undirected networks.
Sharmodeep Bhattacharyya (berkeley)
Networks
January 8, 2015
7 / 32
Feature and Models of Networks
Nonparametric Models
Parameters of Interest
Density Functional
Integral parameter on subgraph, R is defined as integral 
P(R) = E 

Y
(i,j)∈R
h(ξi , ξj )
Y
(1 − h(ξi , ξj ))
(i,j)∈R̄
where, R̄ = {(i, j) ∈
/ R, i ∈ V (G ), j ∈ V (G )}.
Density
Estimate a representation of the latent variable density w or h.
Estimate equivalence class of latent variable density w or h with
respect to norms of the form of cut-metric (Lovász (2006)).
Sharmodeep Bhattacharyya (berkeley)
Networks
January 8, 2015
8 / 32
Feature and Models of Networks
Density Functional Estimation
Outline
1
Introduction and Motivation
2
Feature and Models of Networks
Nonparametric Latent Space Models
Density Functional Estimation
Estimation of Latent Variable Density
Regularization
3
Summary
Sharmodeep Bhattacharyya (berkeley)
Networks
January 8, 2015
9 / 32
Feature and Models of Networks
Density Functional Estimation
Empirical “Moments”/ Count statistics
Count statistics are normalized subgraph counts and smooth functions of them.
The subgraph count,P̂(R), for subgraph R is P̂(R) =
1
X
n
p |Hom(R)| S⊆Kn ,S ∼
=R
1(S ⊆ G )
(1)
where, Hom(R) is the group of Homomorphisms of R and Kn is the complete graph on n
vertices.
Examples
(a) Average degree of a network is a count statistic, D̄ =
1
n
Pn
i=1 Di
and Di =
P
j6=i
Aij .
(b) Another well-known statistic is
Transitivity =
Sharmodeep Bhattacharyya (berkeley)
Normalized Count of ∆
Normalized Count of ∆ + ‘V 0
Networks
January 8, 2015
10 / 32
Feature and Models of Networks
Density Functional Estimation
Computation of Count and Variance of Count Statistics
Counts:
Worst case computational complexity of exact counting of number of
subgraphs, R in Gn is O(np ), where, p = |V (R)|.
Computational complexity varies with subgraph and sparsity of graph.
For dense graphs and complex patterns, the approximate counts are
very crude.
Variances: Finding variances of complex patterns also become
theoretically challenging.
So, instead of exact counting we try approximate counting (Similar idea
used by Holmes and Reinert (2004)).
Sharmodeep Bhattacharyya (berkeley)
Networks
January 8, 2015
11 / 32
Feature and Models of Networks
Density Functional Estimation
Bootstrap Scheme
1
For b th iterate of the bootstrap, b = 1, . . . , B,
2
Fix p = Size of R = |V (R)|.
3
Perform random breadth-first search described in Wernicke (2006) with a set
of sampling probabilities (q1 , . . . , qp )
4
Calculate P̂b (R), given by formula
1
P̂b (R) =
Qp
P̄B (R) =
B
1 X
P̂b (R)
B b=1
X
n
d=1 qd p |Hom(R)| S∈S R
p
1(S ∼
= R)
where, SpR is the set of all size-p randomly selected subgraphs of G .
Sharmodeep Bhattacharyya (berkeley)
Networks
January 8, 2015
12 / 32
Feature and Models of Networks
Density Functional Estimation
Bootstrap Theorem
Theorem (B. and Bickel (2013))
Suppose R is fixed, acyclic with |V (R)| = p and
R∞R∞
0
B → ∞ and qd → 0 for all d = 1, . . . , p such that
B
Qp
d=2 qd
≥
1
np−1 ρen
0
1
B
Ä
w 2|R| (u, v )dudv < ∞. For
1
q1
ä
− 1 → 0 and
and n → ∞, λn → ∞ and under G generated from (1), then,
(i)
√
n
Ä
å
Ç −e
ρ̂n P̄B (R) − ρ−e
n P(R)
σ̂B2 (R)
ä
(ii) Given G , Var ρ−e
n P̂b (R)|G = O
Ä
1
q1
−1
ä
1
n
+
⇒ N(0, 1)
1
nρe−p+1
n
·
Qp
1
d=2 λn qd
(2)
.
(iii) We can set bootstrap confidence interval for P(R).
Sharmodeep Bhattacharyya (berkeley)
Networks
January 8, 2015
13 / 32
Feature and Models of Networks
Density Functional Estimation
A General Principle for Estimating Variance
(a)
(b)
If p = |V (R)|, e = |E (R)|,
Ç
Var
T (R)
ρen
å


1
=Ä
ä2 E 

ρen pn |Iso(R)|
Sharmodeep Bhattacharyya (berkeley)

X
S,T ⊆Kn
S,T ∼
=R,S∩T 6=φ
Networks

1(S, T ⊆ H)

January 8, 2015
14 / 32
Feature and Models of Networks
Density Functional Estimation
A General Principle for Estimating Variance
(c)
(d)
If p = |V (R)|, e = |E (R)|,
Ç
Var
T (R)
ρen
å
1
ŀ
ä2
n
e
ρn p |Iso(R)|
Sharmodeep Bhattacharyya (berkeley)
X
1(W ⊆ G )
W ⊆Kn ,W =S∪T ,S,T ∼
=R,|S∩T |=1
Networks
January 8, 2015
15 / 32
Feature and Models of Networks
Estimation of Latent Variable Density
Outline
1
Introduction and Motivation
2
Feature and Models of Networks
Nonparametric Latent Space Models
Density Functional Estimation
Estimation of Latent Variable Density
Regularization
3
Summary
Sharmodeep Bhattacharyya (berkeley)
Networks
January 8, 2015
16 / 32
Feature and Models of Networks
Estimation of Latent Variable Density
Block Model Approximation
For fixed number of communities to be K , a community assignment function z
assigns community based on symmetric matrix Mn×n , defined as z(M)(i) ≡ z i (M) : {1, . . . , n} → {1, . . . , K }
(3)
The metric we will mainly refer to are
(i) kw1 − w2 k22 = inf σ
R1R1
0
0
(w1 − w2 )2 (u, σv )dudv
(ii) kz (1) − z (2) kH = inf π H(z (1) , π ◦ z (2) ).
where, σ : [0, 1] → [0, 1] measure-preserving transformation, π: any permutation of
{1, . . . , n} and H is normalized Hamming distance
H(z (1) , z (2) ) =
Sharmodeep Bhattacharyya (berkeley)
n
1X
(1)
(2)
1 z i 6= z i
n i=1
Networks
(4)
January 8, 2015
17 / 32
Feature and Models of Networks
Estimation of Latent Variable Density
Block Model Approximation
Given z(M), we can form an K × K mean matrix M̄ z from any symmetric
matrix Mn×n z
M̄ab
≡
n X
n
1 X
Mij 1 (z i = a, z j = b) ,
Oab i=1 j=1
1 ≤ a, b ≤ K ,
(5)
where,
Oab ≡


 na nb ,
1 ≤ a, b ≤ K , a 6= b

 na (na − 1),
1 ≤ a ≤ K, a = b
where,
na ≡
n
X
1 (z i = a) ,
1≤a≤K
i=1
Sharmodeep Bhattacharyya (berkeley)
Networks
January 8, 2015
18 / 32
Feature and Models of Networks
Estimation of Latent Variable Density
Block Model Approximation
Sharmodeep Bhattacharyya (berkeley)
Networks
January 8, 2015
19 / 32
Feature and Models of Networks
Estimation of Latent Variable Density
Estimation of Latent Variable Density
Now, we define the estimate of latent variable density w , based on
adjacency matrix An×n as,
z(A)
ŵ (x, y ; z) ≡ ρ̂−1 Āz
G (x) (A),z G (y ) (A)
,
(x, y ) ∈ [0, 1]2
(6)
where,
ρ̂ =
1 X
n
2
Sharmodeep Bhattacharyya (berkeley)
ß
Aij and G (x) ≡ min
i∈[n]
i>j
Networks
i
≥x
n
™
(7)
January 8, 2015
20 / 32
Feature and Models of Networks
Estimation of Latent Variable Density
Estimation of Latent Variable Density
Sharmodeep Bhattacharyya (berkeley)
Networks
January 8, 2015
21 / 32
Feature and Models of Networks
Estimation of Latent Variable Density
Assumptions
Let w0 be the true latent variable density.
Define z 0 ≡ z(P), w0 (·, ·; z 0 ) by replacing A by P and ρ̂ by ρ in (6).
Define ẑ ≡ z(A).
Assumptions
A1 Assumption on w0 : w0 ≤ M0 < ∞.
A2 Assumption on w0 and z: n∧ (z 0 ) ≥ Kn and n∨ (z 0 ) ≤
1 n
K.
A3 Assumption on w0 and z: kw0 (·, ·) − w0 (·, ·; z 0 )k2 ≤ µn → 0. Under
conditions we can show µn ≤
M1
.
2 K 2
A4 Assumption on z: kẑ − z 0 kH = OP (∆n (K )) where, ∆n → 0.
Sharmodeep Bhattacharyya (berkeley)
Networks
January 8, 2015
22 / 32
Feature and Models of Networks
Estimation of Latent Variable Density
Main Theorem
Theorem (B., Bickel and Wolfe (2014))
Let An×n be the adjacency matrix of a simple random graph under model
equation. Under assumptions A1-A4 and for community assignment
function z,
ß
Å
kw0 (·, ·) − ŵ (·, ·; ẑ)k2 = max O(µn (K )), OP
Sharmodeep Bhattacharyya (berkeley)
Networks
ã
™
Ä
ä
p
K
, OP K 3/2 ρn ∆n .
nρn
January 8, 2015
23 / 32
Feature and Models of Networks
Estimation of Latent Variable Density
Methods of Obtaining ŵ
Existing Methods
Olhede, Wolfe (2013) proposed a scheme using profile likelihood as the estimation
method.
Airoldi, Chan (2014) proposed a method using degree distribution as the estimation
method.
Latouche and Robin (2013) and Lloyd et.al. (2013) proposed Bayesian methods for
exchangeable network model inference.
Gao, Lu and Zhou (2014) give minimax rates for dense case.
Generalization
Any estimation method of block model, satisfying conditions on estimation error, can be
used to give ŵ from (6).
Examples of method include maximum likelihood, variational likelihood, spectral
clustering, SDP relaxation and other sufficiently accurate clustering schemes.
Sharmodeep Bhattacharyya (berkeley)
Networks
January 8, 2015
24 / 32
Feature and Models of Networks
Estimation of Latent Variable Density
Special Case: Spectral Clustering
Theorem 2
Let An×n be the adjacency matrix of a simple random graph under model
equation. Assume A1-A3 for spectral assignment function z sp and γn is the
absolute difference between the K and (K + 1)th eigenvalue of P. As n → ∞,
ß
Å
kw0 (·, ·) − ŵ (·, ·; ẑ)k2 = max O(µn (K )), OP
ã
™
Ä
ä
p
K
, OP K 3/2 ρn ∆n .
nρn
where,
∆n (K ) = O
Sharmodeep Bhattacharyya (berkeley)
nK (||P − P̄z sp (P) || + ||A − P||)2
γn2
Networks
!
January 8, 2015
25 / 32
Feature and Models of Networks
Regularization
Outline
1
Introduction and Motivation
2
Feature and Models of Networks
Nonparametric Latent Space Models
Density Functional Estimation
Estimation of Latent Variable Density
Regularization
3
Summary
Sharmodeep Bhattacharyya (berkeley)
Networks
January 8, 2015
26 / 32
Feature and Models of Networks
Regularization
Regularization: Choice of K (Ongoing Work)
One idea is using cross-validation by using the density functionals.
For size-r subgraphs {ar },
Pr (ar ) ≡ Pr [Aij = aij : 1 ≤ i, j ≤ r ]
Z 1
=
0
···
Z 1
Y
[ρn w (ξi , ξj )]aij [1 − ρn w (ξi , ξj )]1−aij dξ1 · · · dξr
0 1≤i<j≤r
Define
kPr − Qr k =
X
|P [Ar = ar ] − Q [Ar = ar ]| .
ar ∈{0,1}r ×r
Sharmodeep Bhattacharyya (berkeley)
Networks
January 8, 2015
27 / 32
Feature and Models of Networks
Regularization
Regularization: Choice of K (Ongoing Work)
P̂r (K ) is obtained by using ŵ .
kP̂r (K ) − Pk22 is estimated by kP̂r (K ) − Pˆr k22 .
Lemma 4
If dcut (P̂(K ), P) ≤ ∆n and 0 < δ ≤ w , ŵ ≤ 1/δ,
ÇÇ å
MSE(K ) = kP̂r (K ) − Pr k = OP
å
r r 2 /2
2
∆n (K ) .
2
Kopt = argmin kP̂r (K ) − Pˆr B k22 .
(8)
(9)
K
where, Pˆr B is the bootstrap estimate of Pˆr .
Sharmodeep Bhattacharyya (berkeley)
Networks
January 8, 2015
28 / 32
Feature and Models of Networks
Regularization
Facebook Data
Figure : The
cross-validation test using
r = 3 between actual
Figure : Top left picture is the adjacency matrix of the
network. The rest of the figures represent the ŵ generating the
network for K = 8, 13, 22.
Sharmodeep Bhattacharyya (berkeley)
network and the estimated
network with number of
clusters K .
Networks
January 8, 2015
29 / 32
Conclusion
Outline
1
Introduction and Motivation
2
Feature and Models of Networks
Nonparametric Latent Space Models
Density Functional Estimation
Estimation of Latent Variable Density
Regularization
3
Summary
Sharmodeep Bhattacharyya (berkeley)
Networks
January 8, 2015
30 / 32
Conclusion
Future Works
Works in Progress
Extension of subsampling bootstrap for more general statistics.
Provide a proper regularization scheme and general principles under
which block model approximations work.
Extend nonparametric latent space models to more general models.
Verify the usefulness of the method on real network data sets.
Sharmodeep Bhattacharyya (berkeley)
Networks
January 8, 2015
31 / 32
Conclusion
References
S. Bhattacharyya (2013) A Study of High-dimensional Clustering and Statistical
Inference of Networks. PhD Thesis.
S. Bhattacharyya and P. J. Bickel (2013) Subsampling bootstrap of count features
of networks. (Under Revision Ann Stat)
S. Bhattacharyya and P. J. Bickel (2013) Community detection in networks using
graph distance. Arxiv.
S. Bhattacharyya, P. J. Bickel and P. J. Wolfe (2014) Estimating Latent Variable
Densities for Exchangeable Network Models. In Progress.
P.J. Bickel and A. Chen (2009) A nonparametric view of network models and
Newman-Girvan and other modularities. PNAS.
P.J. Bickel, A. Chen and E. Levina (2011) The method of moments and degree
distributions for network models. Ann Stat.
P. Wolfe and S. Olhede (2013) Nonparametric graphon estimation. Arxiv.
Sharmodeep Bhattacharyya (berkeley)
Networks
January 8, 2015
32 / 32
Download