A Bayesian approach to network modularity Jake Hofman Wiggins Group Columbia University 2008.03.12 1 Network modularity (community detection) Assign nodes to modules? 2 The “resolution limit” problem 39 42 38 40 43 35 36 37 34 44 41 33 31 47 45 30 32 48 46 29 51 49 26 27 52 50 25 28 54 53 24 23 55 56 21 22 1 18 19 17 2 4 20 3 5 13 8 9 16 6 15 14 7 10 12 11 HRB = − ! ij (Aij − γpij )δzi ,zj Girvan & Newman (2004), Reichardt & Bornholdt (2006) Fortunato et. al. (2007), Kumpula et. al. (2007) 3 The “resolution limit” problem 39 42 38 40 43 35 36 37 34 44 41 33 31 47 45 30 32 48 46 29 51 49 26 27 52 50 25 28 54 53 24 23 55 56 21 22 1 18 19 17 2 4 20 3 5 13 8 9 16 6 15 14 7 10 12 11 HRB = − ! ij (Aij − γpij )δzi ,zj Girvan & Newman (2004), Reichardt & Bornholdt (2006) Fortunato et. al. (2007), Kumpula et. al. (2007) 3 The “resolution limit” problem 43 35 36 37 34 47 42 38 40 52 44 41 47 45 30 48 40 38 37 34 35 55 36 33 29 56 49 53 52 50 26 31 59 25 28 39 41 45 50 44 46 51 27 46 49 54 32 43 51 33 31 42 48 39 30 29 58 32 57 60 54 53 24 23 55 56 21 22 3 1 18 19 17 1 28 3 5 14 18 10 10 12 13 12 7 11 11 24 17 9 6 15 23 22 8 8 9 16 7 21 5 6 13 26 2 2 4 20 27 25 4 15 14 20 19 16 HRB = − ! ij (Aij − γpij )δzi ,zj Girvan & Newman (2004), Reichardt & Bornholdt (2006) Fortunato et. al. (2007), Kumpula et. al. (2007) 3 The “resolution limit” problem Fixed parameters -> fixed resolution or complexity 46 39 42 38 40 43 35 36 37 34 41 50 47 45 41 38 40 37 46 36 55 29 34 56 35 54 49 33 53 52 31 58 50 26 60 25 28 39 49 48 51 27 44 43 45 30 32 48 52 51 44 33 31 42 47 29 57 30 32 59 54 53 24 23 55 56 28 21 4 22 3 1 18 19 17 25 1 2 21 2 4 24 3 22 23 5 6 20 5 13 16 6 15 14 7 8 9 7 17 9 8 19 13 10 12 10 12 27 26 15 14 11 11 20 18 16 HRB = − ! ij (Aij − γpij )δzi ,zj Girvan & Newman (2004), Reichardt & Bornholdt (2006) Fortunato et. al. (2007), Kumpula et. al. (2007) 3 Community detection as inference Know ensemble (parameters, assignment variables, complexity) Infer ensemble (parameters, latent variables, complexity) Sample microstates Observe microstate 4 Complexity control in probabilistic models Increasing complexity 5 Complexity control in probabilistic models • Maximize evidence (integrating over unknown parameters) to infer most probable model complexity p(D|θ̂, K) p(D|K) = ! dθ p(D|θ, K)p(θ|K) http://research.microsoft.com/~minka/statlearn/demo/ 6 Generating modular networks • For each node: • Roll K-sided die with bias ! to determine zi=1,...,K, the (unobserved) module assignment for ith node • For each pair of nodes (i,j): • If zi=zj, flip “in community” coin with bias !c to determine edge Aij • If zi"zj, flip “between communities” coin with bias !d to determine edge Aij Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987) 7 Generating modular networks • For each node: • Roll K-sided die with bias ! to determine zi=1,...,K, the (unobserved) module assignment for ith node 1 2 3 4 5 6 7 8 9 • For each pair of nodes (i,j): • If zi=zj, flip “in community” coin with bias !c to determine edge Aij • If zi"zj, flip “between communities” coin with bias !d to determine edge Aij Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987) 7 Generating modular networks • For each node: • Roll K-sided die with bias ! to determine zi=1,...,K, the (unobserved) module assignment for ith node 1 2 3 4 5 6 7 8 9 • For each pair of nodes (i,j): • If zi=zj, flip “in community” coin with bias !c to determine edge Aij • If zi"zj, flip “between communities” coin with bias !d to determine edge Aij Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987) 7 Generating modular networks • For each node: • Roll K-sided die with bias ! to determine zi=1,...,K, the (unobserved) module assignment for ith node 1 2 3 4 5 6 7 8 9 • For each pair of nodes (i,j): • If zi=zj, flip “in community” coin with bias !c to determine edge Aij • If zi"zj, flip “between communities” coin with bias !d to determine edge Aij Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987) 7 Generating modular networks • For each node: • Roll K-sided die with bias ! to determine zi=1,...,K, the (unobserved) module assignment for ith node 1 2 3 4 5 6 7 8 9 • For each pair of nodes (i,j): • If zi=zj, flip “in community” coin with bias !c to determine edge Aij • If zi"zj, flip “between communities” coin with bias !d to determine edge Aij Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987) 7 Generating modular networks • For each node: • Roll K-sided die with bias ! to determine zi=1,...,K, the (unobserved) module assignment for ith node 1 2 3 4 5 6 7 8 9 • For each pair of nodes (i,j): • If zi=zj, flip “in community” coin with bias !c to determine edge Aij • If zi"zj, flip “between communities” coin with bias !d to determine edge Aij Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987) 7 Generating modular networks • For each node: • Roll K-sided die with bias ! to determine zi=1,...,K, the (unobserved) module assignment for ith node 1 2 3 4 5 6 7 8 9 • For each pair of nodes (i,j): • If zi=zj, flip “in community” coin with bias !c to determine edge Aij • If zi"zj, flip “between communities” coin with bias !d to determine edge Aij Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987) 7 Generating modular networks • For each node: • Roll K-sided die with bias ! to determine zi=1,...,K, the (unobserved) module assignment for ith node 1 2 3 4 5 6 7 8 9 • For each pair of nodes (i,j): • If zi=zj, flip “in community” coin with bias !c to determine edge Aij • If zi"zj, flip “between communities” coin with bias !d to determine edge Aij Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987) 7 Generating modular networks • For each node: • Roll K-sided die with bias ! to determine zi=1,...,K, the (unobserved) module assignment for ith node 1 2 3 4 5 6 7 8 9 • For each pair of nodes (i,j): • If zi=zj, flip “in community” coin with bias !c to determine edge Aij • If zi"zj, flip “between communities” coin with bias !d to determine edge Aij Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987) 7 Generating modular networks • For each node: • Roll K-sided die with bias ! to determine zi=1,...,K, the (unobserved) module assignment for ith node 1 2 3 4 5 6 7 8 9 • For each pair of nodes (i,j): • If zi=zj, flip “in community” coin with bias !c to determine edge Aij • If zi"zj, flip “between communities” coin with bias !d to determine edge Aij Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987) 7 Generating modular networks • For each node: • Roll K-sided die with bias ! to determine zi=1,...,K, the (unobserved) module assignment for ith node 1 2 3 4 5 6 7 8 9 • For each pair of nodes (i,j): • If zi=zj, flip “in community” coin with bias !c to determine edge Aij • If zi"zj, flip “between communities” coin with bias !d to determine edge Aij Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987) 7 Generating modular networks • For each node: • Roll K-sided die with bias ! to determine zi=1,...,K, the (unobserved) module assignment for ith node 1 2 3 4 5 6 7 8 9 • For each pair of nodes (i,j): • If zi=zj, flip “in community” coin with bias !c to determine edge Aij • If zi"zj, flip “between communities” coin with bias !d to determine edge Aij Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987) 7 Generating modular networks • For each node: • Roll K-sided die with bias ! to determine zi=1,...,K, the (unobserved) module assignment for ith node 1 2 3 4 5 6 7 8 9 • For each pair of nodes (i,j): • If zi=zj, flip “in community” coin with bias !c to determine edge Aij • If zi"zj, flip “between communities” coin with bias !d to determine edge Aij Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987) 7 Generating modular networks • For each node: • Roll K-sided die with bias ! to determine zi=1,...,K, the (unobserved) module assignment for ith node • For each pair of nodes (i,j): • If zi=zj, flip “in community” coin with bias !c to determine edge Aij • If zi"zj, flip “between communities” coin with bias !d to determine edge Aij 1 2 7 3 5 9 4 6 8 Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987) 8 Generating modular networks • Die rolling, coin flipping <-> infinite-range spin-glass Potts model: ! =− H ≡ − ln p(A, !z |!π , θ) JG JL hµ ! i,j (JL Aij − JG )δzi ,zj + K ! hµ µ=1 N ! δzi ,µ i=1 ≡ ln ϑc /ϑd ≡ ln(1 − ϑd )/(1 − ϑc ) + JG ≡ − ln πµ • Infer distributions over spin assignments, coupling constants, and chemical potentials and find number of occupied spin states p(A|K) = !" ! z dθ! " ! = d!π p(A, !z , !π , θ) !" ! z dθ! " ! π) d!π e−H p(θ)p(! Extends Newman (2004, 2006), Hastings (2006), Bornholdt & Reichardt (2006) 9 Generating modular networks • Die rolling, coin flipping <-> infinite-range spin-glass Potts model: ! =− H ≡ − ln p(A, !z |!π , θ) JG JL hµ ! i,j (JL Aij − JG )δzi ,zj + K ! hµ µ=1 N ! δzi ,µ i=1 ≡ ln ϑc /ϑd ≡ ln(1 − ϑd )/(1 − ϑc ) + JG ≡ − ln πµ • Infer distributions over spin assignments, coupling constants, and chemical potentials and find number of occupied spin states p(A|K) = !" ! z dθ! " ! = d!π p(A, !z , !π , θ) !" ! z dθ! " ! π) d!π e−H p(θ)p(! Extends Newman (2004, 2006), Hastings (2006), Bornholdt & Reichardt (2006) Can do integrals, but sum is intractable, O(KN); use mean-field variational technique 9 Validation: Toy graph • Automatic complexity control: probability of occupation for extraneous modules goes to zero 10 Validation: Toy graph • Automatic complexity control: probability of occupation for extraneous modules goes to zero 10 The “resolution limit” problem Variational Bayesian approach correctly infers complexity 42 47 46 43 52 42 47 46 44 48 43 52 44 48 39 39 51 51 50 38 45 55 50 38 40 41 45 55 49 40 41 49 37 37 54 34 56 54 35 53 35 34 56 53 36 36 33 33 58 58 59 31 57 60 29 29 4 25 1 25 2 27 3 27 3 28 30 32 1 2 31 57 30 32 4 59 60 28 26 26 5 5 21 21 6 6 8 23 9 17 7 22 8 23 9 17 7 13 22 13 24 12 15 24 12 18 10 15 20 11 14 19 16 Variational Bayes 18 10 20 11 14 19 16 Girvan-Newman modularity 11 APS March Meeting 2008 co-authorship network 12 APS March Meeting 2008 co-authorship network 12 Theoretical superconductors APS March Meeting 2008 co-authorship network Nanotubes, Graphene Experimental superconductors 12 Conclusions • Used variational Bayesian inference to infer distributions over module assignments, model parameters, and model complexity • Resulted in a principled, accurate, and scalable algorithm which addresses the resolution limit problem • Validated technique on synthetic and real networks • Future: extend model to handle alternative network structure, using same framework • preprint: http://arxiv.org/abs/0709.3512 13 Acknowledgements • Useful discussions • Jonathan Goodman (NYU) • Joel Bader (Hopkins) • Matt Hastings (LANL) • Aaron Clauset (SFI) • Edo Airoldi (Princeton) 14