Bayesian Approach to Network Modularity
Jake M. Hofman*
Department of Physics, Columbia University, New York, New York 10027, USA
Chris H. Wiggins†
Department of Applied Physics and Applied Mathematics, Columbia University, New York, New York 10027, USA
(Received 23 September 2007; published 23 June 2008)
We present an efficient, principled, and interpretable technique for inferring module assignments and
for identifying the optimal number of modules in a given network. We show how several existing methods
for finding modules can be described as variant, special, or limiting cases of our work, and how the
method overcomes the resolution limit problem, accurately recovering the true number of modules. Our
approach is based on Bayesian methods for model selection which have been used with success for almost
a century, implemented using a variational technique developed only in the past decade. We apply the
technique to synthetic and real networks and outline how the method naturally allows selection among
competing models.
DOI: 10.1103/PhysRevLett.100.258701
PACS numbers: 89.75.Hc, 02.50.-r, 02.50.Tt
Large-scale networks describing complex interactions
among a multitude of objects have found application in a
wide array of fields, from biology to social science to
information technology [1,2]. In these applications one
often wishes to model networks, suppressing the complexity of the full description while retaining relevant information about the structure of the interactions [3]. One such
network model groups nodes into modules, or ‘‘communities,’’ with different densities of intra- and interconnectivity for nodes in the same or different modules. We present
here a computationally efficient Bayesian framework for
inferring the number of modules, model parameters, and
module assignments for such a model.
The problem of finding modules in networks (or ‘‘community detection’’) has received much attention in the
physics literature, wherein many approaches [4,5] focus
on optimizing an energy-based cost function with fixed
parameters over possible assignments of nodes into modules. The particular cost functions vary, but most compare
a given node partitioning to an implicit null model, the two
most popular being the configuration model and a limited
version of the stochastic block model (SBM) [6,7]. While
much effort has gone into how to optimize these cost
functions, less attention has been paid to what is to be
optimized. In recent studies which emphasize the importance of the latter question it was shown that there are
inherent problems with existing approaches regardless of
how optimization is performed, wherein parameter choice
sets a lower limit on the size of detected modules, referred
to as the ‘‘resolution limit’’ problem [8,9]. We extend
recent probabilistic treatments of modular networks
[10,11] to develop a solution to this problem that relies
on inferring distributions over the model parameters, as
opposed to asserting parameter values a priori, to determine the modular structure of a given network. The developed techniques are principled, interpretable, computationally efficient, and can be shown to generalize several
previous studies on module detection.
We specify an N-node network by its adjacency matrix A, where $A_{ij} = 1$ if there is an edge between nodes i and j and $A_{ij} = 0$ otherwise, and define $\sigma_i \in \{1, \ldots, K\}$ to be the unobserved module membership of the ith node. We use a constrained SBM, which consists of a multinomial distribution over module assignments with weights $\vec{\pi}$ and Bernoulli distributions over edges contained within and between modules with weights $\theta_c \equiv p(A_{ij} = 1 \mid \sigma_i = \sigma_j; \vec{\theta})$ and $\theta_d \equiv p(A_{ij} = 1 \mid \sigma_i \neq \sigma_j; \vec{\theta})$, respectively. In short, to generate a random undirected graph under this model we roll a K-sided die (biased by $\vec{\pi}$) N times to determine module assignments for each of the N nodes; we then flip one of two biased coins (for either intra- or intermodule connection, biased by $\theta_c$, $\theta_d$, respectively) for each of the $N(N-1)/2$ pairs of nodes to determine if the pair is connected. The extension to directed graphs is straightforward.
Using this model, we write the joint probability $p(A, \vec{\sigma} \mid \vec{\pi}, \vec{\theta}) = p(A \mid \vec{\sigma}, \vec{\theta})\, p(\vec{\sigma} \mid \vec{\pi})$ (conditional dependence on K has been suppressed below for brevity) as

$$p(A, \vec{\sigma} \mid \vec{\pi}, \vec{\theta}) = \theta_c^{c_+} (1-\theta_c)^{c_-}\, \theta_d^{d_+} (1-\theta_d)^{d_-} \prod_{\mu=1}^{K} \pi_\mu^{n_\mu}, \quad (1)$$

where $c_+ = \sum_{i>j} A_{ij}\, \delta_{\sigma_i,\sigma_j}$ is the number of edges contained within communities, $c_- = \sum_{i>j} (1 - A_{ij})\, \delta_{\sigma_i,\sigma_j}$ is the number of nonedges contained within communities, $d_+ = \sum_{i>j} A_{ij} (1 - \delta_{\sigma_i,\sigma_j})$ is the number of edges between different communities, $d_- = \sum_{i>j} (1 - A_{ij})(1 - \delta_{\sigma_i,\sigma_j})$ is the number of nonedges between different communities, and $n_\mu = \sum_{i=1}^{N} \delta_{\sigma_i,\mu}$ is the occupation number of the $\mu$th module.
Defining $H \equiv -\ln p(A, \vec{\sigma} \mid \vec{\pi}, \vec{\theta})$ and regrouping terms by local and global counts, we recover (up to additive constants) a generalized version of [10]:

$$H = -\sum_{i>j} (J_L A_{ij} - J_G)\, \delta_{\sigma_i,\sigma_j} - \sum_{\mu=1}^{K} h_\mu \sum_{i=1}^{N} \delta_{\sigma_i,\mu}, \quad (2)$$

a Potts model Hamiltonian with unknown coupling constants $J_G = \ln[(1-\theta_d)/(1-\theta_c)]$, $J_L = \ln(\theta_c/\theta_d) + J_G$, and chemical potentials $h_\mu = \ln \pi_\mu$. (Note that many previous methods omit a chemical potential term, implicitly assuming equally sized groups.)
While previous approaches [4,10] minimize related Hamiltonians as a function of $\vec{\sigma}$, these methods require that the user specify values for these unknown constants, which gives rise to the resolution limit problem [8,9]. Our approach, however, uses a disorder-averaged calculation to infer distributions over these parameters, avoiding this issue. To do so, we take beta (B) and Dirichlet (D) distributions over $\vec{\theta}$ and $\vec{\pi}$, respectively:
$$p(\vec{\theta})\, p(\vec{\pi}) = B(\theta_c; \tilde{c}_{+0}, \tilde{c}_{-0})\, B(\theta_d; \tilde{d}_{+0}, \tilde{d}_{-0})\, D(\vec{\pi}; \tilde{n}_0). \quad (3)$$

These conjugate prior distributions are defined on the full range of $\vec{\theta}$ and $\vec{\pi}$, respectively, and their functional forms are preserved when integrated against the model to obtain updated parameter distributions. Their hyperparameters $\{\tilde{c}_{+0}, \tilde{c}_{-0}, \tilde{d}_{+0}, \tilde{d}_{-0}, \tilde{n}_0\}$ act as pseudocounts that augment observed edge counts and occupation numbers.
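In code, the priors of Eq. (3) might be set up as follows (a sketch using scipy; the flat choice of hyperparameters is our illustrative assumption, not a prescription from the paper):

```python
import numpy as np
from scipy.stats import beta, dirichlet

K = 4
# Flat pseudocounts: one prior "edge" and "nonedge" for each Bernoulli weight,
# and one prior occupant per module.
c0_plus, c0_minus = 1.0, 1.0
d0_plus, d0_minus = 1.0, 1.0
n0 = np.ones(K)

p_theta_c = beta(c0_plus, c0_minus)   # B(theta_c; c0+, c0-)
p_theta_d = beta(d0_plus, d0_minus)   # B(theta_d; d0+, d0-)
p_pi = dirichlet(n0)                  # D(pi; n0)
```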
In this framework the problem of module detection can be stated as follows: given an adjacency matrix A, determine the most probable number of modules (i.e., occupied spin states) $K^\ast = \mathrm{argmax}_K\, p(K \mid A)$ and infer posterior distributions over the model parameters (i.e., coupling constants and chemical potentials) $p(\vec{\pi}, \vec{\theta} \mid A)$ and the latent module assignments (i.e., spin states) $p(\vec{\sigma} \mid A)$. In the absence of a priori belief about the number of modules, we demand that $p(K)$ is sufficiently weak that maximizing $p(K \mid A) \propto p(A \mid K)\, p(K)$ is equivalent to maximizing $p(A \mid K)$, referred to as the evidence. This approach to model selection [12], proposed by the statistical physicist Jeffreys in 1935 [13], balances model fidelity and complexity to determine, in this context, the number of modules.
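Schematically, model selection then reduces to a one-line search over K, assuming some routine log_evidence(A, K) returning the (approximate) log evidence $\ln p(A \mid K)$; both the routine name and the flat prior over K are our assumptions here:

```python
import numpy as np

def select_K(A, K_max, log_evidence):
    """K* = argmax_K p(K|A); with a weak prior p(K) this is argmax_K ln p(A|K)."""
    scores = [log_evidence(A, K) for K in range(1, K_max + 1)]
    return 1 + int(np.argmax(scores))  # candidate K runs from 1 to K_max
```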
A more physically intuitive interpretation of the evidence is as the disorder-averaged partition function of a spin glass, calculated by marginalizing over the possible quenched values of the parameters $\vec{\theta}$ and $\vec{\pi}$ as well as the spin configurations $\vec{\sigma}$:

$$Z \equiv p(A \mid K) = \sum_{\vec{\sigma}} \int d\vec{\theta} \int d\vec{\pi}\; p(A, \vec{\sigma} \mid \vec{\pi}, \vec{\theta})\, p(\vec{\theta})\, p(\vec{\pi}). \quad (4)$$
While the $\vec{\theta}$ and $\vec{\pi}$ integrals in Eq. (4) can be performed analytically, the remaining sum over module assignments $\vec{\sigma}$ scales as $K^N$ and becomes computationally intractable for networks of even modest sizes. To accommodate large-scale networks we use a variational approach that is
well known to the statistical physics community [14] and
has recently found application in the statistics and machine
learning literature, commonly termed variational Bayes
(VB) [15]. We proceed by taking the negative logarithm
of Z and using Gibbs’s inequality:
$$-\ln Z = -\ln \sum_{\vec{\sigma}} \int d\vec{\theta} \int d\vec{\pi}\; e^{-H(\vec{\sigma})}\, p(\vec{\theta})\, p(\vec{\pi}) = -\ln \sum_{\vec{\sigma}} \int d\vec{\theta} \int d\vec{\pi}\; q(\vec{\sigma}, \vec{\pi}, \vec{\theta})\, \frac{p(A, \vec{\sigma}, \vec{\pi}, \vec{\theta} \mid K)}{q(\vec{\sigma}, \vec{\pi}, \vec{\theta})} \quad (5)$$

$$\leq -\sum_{\vec{\sigma}} \int d\vec{\theta} \int d\vec{\pi}\; q(\vec{\sigma}, \vec{\pi}, \vec{\theta}) \ln \frac{p(A, \vec{\sigma}, \vec{\pi}, \vec{\theta} \mid K)}{q(\vec{\sigma}, \vec{\pi}, \vec{\theta})}. \quad (6)$$
That is, we first multiply and divide by an arbitrary approximating distribution $q(\vec{\sigma}, \vec{\pi}, \vec{\theta})$ and then upper-bound the log of the expectation by the expectation of the log. We define the quantity to be minimized, the expression in Eq. (6), as the variational free energy $F\{q, A\}$, a functional of $q(\vec{\sigma}, \vec{\pi}, \vec{\theta})$. (Note that the negative log of $q(\vec{\sigma}, \vec{\pi}, \vec{\theta})$ plays the role of a test Hamiltonian in variational approaches in statistical mechanics.)
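The bound is Jensen's inequality; a toy numerical check (entirely our own construction) for a three-state system illustrates it:

```python
import numpy as np

H = np.array([0.5, 1.0, 2.0])      # toy Hamiltonian over three states
Z = np.sum(np.exp(-H))             # exact partition function

q = np.array([0.5, 0.3, 0.2])      # any normalized trial distribution
F = np.sum(q * (H + np.log(q)))    # variational free energy E_q[H] - S(q)

assert F >= -np.log(Z)             # F upper-bounds -ln Z; equality iff q = exp(-H)/Z
```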
We next choose a factorized approximating distribution $q(\vec{\sigma}, \vec{\pi}, \vec{\theta}) = q_{\vec{\sigma}}(\vec{\sigma})\, q_{\vec{\pi}}(\vec{\pi})\, q_{\vec{\theta}}(\vec{\theta})$ with $q_{\vec{\pi}}(\vec{\pi}) = D(\vec{\pi}; \tilde{n})$ and $q_{\vec{\theta}}(\vec{\theta}) = q_c(\theta_c)\, q_d(\theta_d) = B(\theta_c; \tilde{c}_+, \tilde{c}_-)\, B(\theta_d; \tilde{d}_+, \tilde{d}_-)$; as in mean field theory, we factorize $q_{\vec{\sigma}}(\vec{\sigma}) = \prod_i q_i(\sigma_i)$, writing $q_i(\sigma_i = \mu) \equiv Q_{i\mu}$, an N-by-K matrix which gives the probability that the ith node belongs to the $\mu$th module. Evaluating $F\{q, A\}$ with this functional form for $q(\vec{\sigma}, \vec{\pi}, \vec{\theta})$ gives a function of the variational parameters $\{\tilde{c}_+, \tilde{c}_-, \tilde{d}_+, \tilde{d}_-, \tilde{n}\}$ and matrix elements $Q_{i\mu}$ which can subsequently be minimized by taking the appropriate derivatives.
We summarize the resulting iterative algorithm, which provably converges to a local minimum of $F\{q, A\}$ and provides controlled approximations to the evidence $p(A \mid K)$ as well as the posteriors $p(\vec{\pi}, \vec{\theta} \mid A)$ and $p(\vec{\sigma} \mid A)$:

Initialization.—Initialize the N-by-K matrix $Q = Q^0$ and set pseudocounts $\tilde{c}_+ = \tilde{c}_{+0}$, $\tilde{c}_- = \tilde{c}_{-0}$, $\tilde{d}_+ = \tilde{d}_{+0}$, $\tilde{d}_- = \tilde{d}_{-0}$, and $\tilde{n} = \tilde{n}_0$.

Main loop.—Until convergence in $F\{q, A\}$:

(i) Update the expected value of the coupling constants and chemical potentials:

$$\langle J_L \rangle = \psi(\tilde{c}_+) - \psi(\tilde{c}_-) - \psi(\tilde{d}_+) + \psi(\tilde{d}_-) \quad (7)$$

$$\langle J_G \rangle = \psi(\tilde{d}_-) - \psi(\tilde{d}_+ + \tilde{d}_-) - \psi(\tilde{c}_-) + \psi(\tilde{c}_+ + \tilde{c}_-) \quad (8)$$

$$\langle h_\mu \rangle = \psi(\tilde{n}_\mu) - \psi\Big(\sum_\nu \tilde{n}_\nu\Big), \quad (9)$$

where $\psi(x)$ is the digamma function;
(ii) Update the variational distribution over each spin $\sigma_i$:

$$Q_{i\mu} \propto \exp\Big\{ \sum_{j \neq i} \big( \langle J_L \rangle A_{ij} - \langle J_G \rangle \big) Q_{j\mu} + \langle h_\mu \rangle \Big\}, \quad (10)$$

normalized such that $\sum_\mu Q_{i\mu} = 1$ for all i;
(iii) Update the variational distribution over parameters from the expected counts and pseudocounts:

$$\tilde{n}_\mu = \langle n_\mu \rangle + \tilde{n}_{0\mu} = \sum_{i=1}^{N} Q_{i\mu} + \tilde{n}_{0\mu} \quad (11)$$

$$\tilde{c}_+ = \langle c_+ \rangle + \tilde{c}_{+0} = \tfrac{1}{2} \mathrm{Tr}(Q^T A Q) + \tilde{c}_{+0} \quad (12)$$

$$\tilde{c}_- = \langle c_- \rangle + \tilde{c}_{-0} = \tfrac{1}{2} \mathrm{Tr}\big[ Q^T (\vec{u} \langle \vec{n} \rangle^T - Q) \big] - \langle c_+ \rangle + \tilde{c}_{-0} \quad (13)$$

$$\tilde{d}_+ = \langle d_+ \rangle + \tilde{d}_{+0} = M - \langle c_+ \rangle + \tilde{d}_{+0} \quad (14)$$

$$\tilde{d}_- = \langle d_- \rangle + \tilde{d}_{-0} = C - M - \langle c_- \rangle + \tilde{d}_{-0}, \quad (15)$$

where $C = N(N-1)/2$, $M = \sum_{i>j} A_{ij}$, and $\vec{u}$ is an N-by-1 vector of 1's;
(iv) Calculate the updated optimized free energy:

$$F\{q, A\} = \ln \frac{Z_c Z_d Z_{\vec{\pi}}}{\tilde{Z}_c \tilde{Z}_d \tilde{Z}_{\vec{\pi}}} + \sum_{\mu=1}^{K} \sum_{i=1}^{N} Q_{i\mu} \ln Q_{i\mu}, \quad (16)$$

where $\tilde{Z}_{\vec{\pi}} = B(\tilde{n})$ is the beta function with a vector-valued argument, the partition function for the Dirichlet distribution $q_{\vec{\pi}}(\vec{\pi})$ [likewise for $q_c(\theta_c)$, $q_d(\theta_d)$]. As this provably converges to a local optimum, VB is best implemented with multiple randomly chosen initializations of $Q^0$ to find the global minimum of $F\{q, A\}$.
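Putting steps (i)-(iii) together, the following is a compact Python sketch of the main loop (a minimal reading of Eqs. (7)-(15), not the authors' vbmod code; for brevity it monitors convergence of Q rather than of $F\{q, A\}$ and assumes A is symmetric with zero diagonal):

```python
import numpy as np
from scipy.special import digamma

def vb_modules(A, K, n_iter=200, tol=1e-6, seed=0,
               c0=(1.0, 1.0), d0=(1.0, 1.0), n0=1.0):
    """One VB run for the constrained SBM, following Eqs. (7)-(15)."""
    N = A.shape[0]
    rng = np.random.default_rng(seed)
    M = np.triu(A, k=1).sum()        # number of edges
    C = N * (N - 1) // 2             # number of pairs
    cp0, cm0 = c0                    # prior pseudocounts for theta_c
    dp0, dm0 = d0                    # prior pseudocounts for theta_d
    n0 = np.full(K, n0)              # prior pseudocounts for pi

    # Initialization: random normalized responsibilities Q^0.
    Q = rng.random((N, K))
    Q /= Q.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # (iii) Variational parameters from expected counts + pseudocounts.
        n_exp = Q.sum(axis=0)                                      # <n_mu>
        c_plus = 0.5 * np.trace(Q.T @ A @ Q)                       # <c_+>
        c_minus = 0.5 * (n_exp @ n_exp - np.sum(Q * Q)) - c_plus   # <c_->
        nt = n_exp + n0                                            # Eq. (11)
        cp, cm = c_plus + cp0, c_minus + cm0                       # Eqs. (12), (13)
        dp = M - c_plus + dp0                                      # Eq. (14)
        dm = C - M - c_minus + dm0                                 # Eq. (15)

        # (i) Expected couplings and chemical potentials, Eqs. (7)-(9).
        J_L = digamma(cp) - digamma(cm) - digamma(dp) + digamma(dm)
        J_G = digamma(dm) - digamma(dp + dm) - digamma(cm) + digamma(cp + cm)
        h = digamma(nt) - digamma(nt.sum())

        # (ii) Mean-field update of Q, Eq. (10); A_ii = 0 enforces j != i.
        field = J_L * (A @ Q) - J_G * (n_exp[None, :] - Q) + h
        field -= field.max(axis=1, keepdims=True)   # numerical stability
        Q_new = np.exp(field)
        Q_new /= Q_new.sum(axis=1, keepdims=True)

        done = np.abs(Q_new - Q).max() < tol
        Q = Q_new
        if done:
            break

    return Q, Q.argmax(axis=1)
```

Called with a generous K, the columns of Q belonging to extraneous modules empty out as the iteration proceeds, which is how the number of occupied modules is selected, as described next.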
Convergence of the above algorithm provides the approximate posterior distributions $q_{\vec{\sigma}}(\vec{\sigma})$, $q_{\vec{\pi}}(\vec{\pi})$, and $q_{\vec{\theta}}(\vec{\theta})$ and simultaneously returns $K^\ast$, the number of nonempty modules that maximizes the evidence. As such, one needs only to specify a maximum number of allowed modules and run VB; the probability of occupation for extraneous modules converges to zero as the algorithm runs and the most probable number of occupied modules remains.
This is significantly more accurate than other approximate methods, such as the Bayesian information criterion (BIC) [16] and the integrated classification likelihood (ICL) [17,18], and is less computationally expensive than empirical methods such as cross-validation (CV) [19,20], in which one must perform the associated procedure after fitting the model for each considered value of K. Specifically, BIC and ICL are suggested for single-peaked likelihood functions well approximated by Laplace integration and studied in the large-N limit. For an SBM the first assumption of a single-peaked function is invalidated by the underlying symmetries of the latent variables; i.e., nodes are distinguishable and modules indistinguishable.
See Fig. 1 for a comparison of our method with the Girvan-Newman modularity [5] in the resolution limit test [8,9], where VB consistently identifies the correct number of modules. [Note that VB is both accurate and fast: it performs competitively in the ‘‘four groups’’ test [21] and scales as $O(MK)$. Runtime for the main loop in MATLAB on a 2 GHz laptop is about 6 min for $N = 10^6$ nodes with average degree 16 and $K = 4$.]
Furthermore, we note that previous methods in which
parameter inference is performed by optimizing a likelihood function via expectation maximization (EM)
[11,18] are also special cases of the framework presented
here. EM is a limiting case of VB in which one collapses
the distributions over parameters to point estimates at the
mode of each distribution; however, EM is prone to overfitting and cannot be used to determine the appropriate
number of modules, as the likelihood of observed data
increases with the number of modules in the model. As
such, VB performs at least as well as EM while simultaneously providing complexity control [22,23].
In addition to validating the method on synthetic networks, we apply VB to the 2000 NCAA American football schedule shown in Fig. 2 [24]. Each of the 115 nodes represents an individual team and each of the 613 edges represents a game played between the nodes joined. The algorithm correctly identifies the presence of the 12 conferences which comprise the schedule, where teams tend to play more games within than between conferences, making most modules assortative. Of the 115 teams, 105 are assigned to their corresponding conferences, with the majority of exceptions belonging to the frequently misclassified independent teams [25], the only disassortative group in the network.
FIG. 1 (color). Results for the resolution limit test suggested in [8,9]. Shapes and colors correspond to the inferred modules. (Left) Our method, variational Bayes, in which all 15 modules are correctly identified (each clique is assigned a unique color/shape). (Right) GN modularity optimization, where failure due to the resolution limit is observed; neighboring cliques are incorrectly grouped together. (Bottom) The results of this test implemented for a range of true numbers of modules, $K_{\mathrm{true}}$, the number of 4-node cliques in the ringlike graph. Note that our method correctly infers the number of communities $K_{\mathrm{VB}}$ over the entire range of $K_{\mathrm{true}}$, while GN modularity initially finds the correct number of communities but fails for $K_{\mathrm{true}} \geq 15$, as shown analytically in [9].
FIG. 2 (color). Each of the 115 nodes represents an NCAA team and each of the 613 edges a game played in 2000 between the two teams it joins. The inferred module assignments (designated by color) on the football network recover the 12 NCAA conferences (designated by shape). Nodes 29, 43, 59, 60, 64, 81, 83, 91, 98, and 111 are misclassified and are mostly independent teams, represented by parallelograms.
We emphasize that, unlike other methods in which the number of conferences must be asserted, VB determines 12 as the most probable number of conferences automatically.
Posing module detection as inference of a latent variable
within a probabilistic model has a number of advantages. It
clarifies what precisely is to be optimized and suggests a
principled and efficient procedure for how to perform this
optimization. Inferring distributions over model parameters reveals the natural scale of a given modular network,
avoiding resolution limit problems. This method allows us
to view a number of approaches to the problem by physicists, applied mathematicians, social scientists, and computer scientists as related subparts of a larger problem. In
short, it suggests how a number of seemingly disparate
methods may be recast and united. A second advantage of
this work is its generalization to other models, including
those designed to reveal structural features other than
modularity. Finally, use of the evidence allows model
selection not only among nested models, e.g., models
differing only in the number of parameters, but even
among models of different parametric families. The last
strikes us as a natural area for progress in the statistical
study of real-world networks.
It is a pleasure to acknowledge useful conversations on
modeling with Joel Bader and Matthew Hastings, on
Monte Carlo methods for Potts models with Jonathan
Goodman, with David Blei on variational methods, and
with Aaron Clauset for his feedback on this manuscript
[26]. J. H. was supported by NIH No. 5PN2EY016586; C. W. was supported by NSF No. ECS-0425850 and NIH No. 1U54CA121852.
*jmh2045@columbia.edu
†chris.wiggins@columbia.edu
[1] R. Albert and A.-L. Barabási, Rev. Mod. Phys. 74, 47
(2002).
[2] D. J. Watts and S. H. Strogatz, Nature (London) 393, 440
(1998).
[3] E. Ziv, M. Middendorf, and C. H. Wiggins, Phys. Rev. E
71, 046117 (2005).
[4] J. Reichardt and S. Bornholdt, Phys. Rev. E 74, 016110
(2006).
[5] M. Newman and M. Girvan, Phys. Rev. E 69, 026113
(2004).
[6] P. Holland and S. Leinhardt, Sociological Methodology 7,
1 (1976).
[7] F. McSherry, in Proceedings of the 42nd IEEE Symposium
on the Foundations of Computer Science, 2001 (IEEE,
New York, 2001), pp. 529–537.
[8] J. Kumpula, J. Saramäki, K. Kaski, and J. Kertész, Eur.
Phys. J. B 56, 41 (2007).
[9] S. Fortunato and M. Barthélemy, Proc. Natl. Acad. Sci.
U.S.A. 104, 36 (2007).
[10] M. B. Hastings, Phys. Rev. E 74, 035102(R) (2006).
[11] M. E. J. Newman and E. A. Leicht, Proc. Natl. Acad. Sci.
U.S.A. 104, 9564 (2007).
[12] R. E. Kass and A. E. Raftery, J. Am. Stat. Assoc. 90, 773
(1995).
[13] H. Jeffreys, Proc. Cambridge Philos. Soc. 31, 203 (1935).
[14] R. P. Feynman, Statistical Mechanics, A Set of Lectures
(W. A. Benjamin, Reading, MA, 1972).
[15] M. I. Jordan, Z. Ghahramani, T. Jaakkola, and L. K. Saul,
Mach. Learn. 37, 183 (1999).
[16] G. Schwarz, Ann. Stat. 6, 461 (1978).
[17] C. Biernacki, G. Celeux, and G. Govaert, IEEE Trans.
Pattern Anal. Mach. Intell. 22, 719 (2000).
[18] C. Ambroise, H. Zanghi, and V. Miele, Statistics for
Systems Biology Group Research Report No. SSB-RR8, 2007.
[19] E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing,
arXiv:0705.4485v1.
[20] M. Stone, J. Royal Stat. Soc. 36, 111 (1974).
[21] L. Danon, A. Díaz-Guilera, J. Duch, and A. Arenas,
J. Stat. Mech. (2005) P09008.
[22] C. M. Bishop, Pattern Recognition and Machine Learning
(Springer, New York, 2006).
[23] D. J. MacKay, Information Theory, Inference, and
Learning Algorithms (Cambridge University Press,
Cambridge, U.K., 2003).
[24] M. Girvan and M. E. J. Newman, Proc. Natl. Acad. Sci.
U.S.A. 99, 7821 (2002).
[25] A. Clauset, C. Moore, and M. E. J. Newman, in ICML
2006 Ws, Lecture Notes in Computer Science, edited by
E. M. Airoldi (Springer-Verlag, Berlin, 2007).
[26] Related software can be downloaded from http://vbmod.
sourceforge.net.