A Bayesian approach to network modularity Jake Hofman Wiggins Group

advertisement
A Bayesian approach to
network modularity
Jake Hofman
Wiggins Group
Columbia University
2008.03.12
1
Network modularity (community detection)
Assign nodes to modules?
2
The “resolution limit” problem
39
42
38
40
43
35
36
37
34
44
41
33
31
47
45
30
32
48
46
29
51
49
26
27
52
50
25
28
54
53
24
23
55
56
21
22
1
18
19
17
2
4
20
3
5
13
8
9
16
6
15
14
7
10
12
11
HRB = −
!
ij
(Aij − γpij )δzi ,zj
Girvan & Newman (2004), Reichardt & Bornholdt (2006)
Fortunato et. al. (2007), Kumpula et. al. (2007)
3
The “resolution limit” problem
39
42
38
40
43
35
36
37
34
44
41
33
31
47
45
30
32
48
46
29
51
49
26
27
52
50
25
28
54
53
24
23
55
56
21
22
1
18
19
17
2
4
20
3
5
13
8
9
16
6
15
14
7
10
12
11
HRB = −
!
ij
(Aij − γpij )δzi ,zj
Girvan & Newman (2004), Reichardt & Bornholdt (2006)
Fortunato et. al. (2007), Kumpula et. al. (2007)
3
The “resolution limit” problem
43
35
36
37
34
47
42
38
40
52
44
41
47
45
30
48
40
38
37
34
35
55
36
33
29
56
49
53
52
50
26
31
59
25
28
39
41
45
50
44
46
51
27
46
49
54
32
43
51
33
31
42
48
39
30
29
58
32
57
60
54
53
24
23
55
56
21
22
3
1
18
19
17
1
28
3
5
14
18
10
10
12
13
12
7
11
11
24
17
9
6
15
23
22
8
8
9
16
7
21
5
6
13
26
2
2
4
20
27
25
4
15
14
20
19
16
HRB = −
!
ij
(Aij − γpij )δzi ,zj
Girvan & Newman (2004), Reichardt & Bornholdt (2006)
Fortunato et. al. (2007), Kumpula et. al. (2007)
3
The “resolution limit” problem
Fixed parameters -> fixed resolution or complexity
46
39
42
38
40
43
35
36
37
34
41
50
47
45
41
38
40
37
46
36
55
29
34
56
35
54
49
33
53
52
31
58
50
26
60
25
28
39
49
48
51
27
44
43
45
30
32
48
52
51
44
33
31
42
47
29
57
30
32
59
54
53
24
23
55
56
28
21
4
22
3
1
18
19
17
25
1
2
21
2
4
24
3
22
23
5
6
20
5
13
16
6
15
14
7
8
9
7
17
9
8
19
13
10
12
10
12
27
26
15
14
11
11
20
18
16
HRB = −
!
ij
(Aij − γpij )δzi ,zj
Girvan & Newman (2004), Reichardt & Bornholdt (2006)
Fortunato et. al. (2007), Kumpula et. al. (2007)
3
Community detection as inference
Know ensemble
(parameters,
assignment
variables,
complexity)
Infer ensemble
(parameters,
latent variables,
complexity)
Sample
microstates
Observe
microstate
4
Complexity control in probabilistic models
Increasing complexity
5
Complexity control in probabilistic models
• Maximize evidence (integrating over unknown parameters) to infer most probable
model complexity
p(D|θ̂, K)
p(D|K) =
!
dθ p(D|θ, K)p(θ|K)
http://research.microsoft.com/~minka/statlearn/demo/
6
Generating modular networks
• For each node:
• Roll K-sided die with bias ! to
determine zi=1,...,K, the
(unobserved) module
assignment for ith node
• For each pair of nodes (i,j):
• If zi=zj, flip “in community”
coin with bias !c to determine
edge Aij
• If zi"zj, flip “between
communities” coin with bias !d
to determine edge Aij
Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987)
7
Generating modular networks
• For each node:
• Roll K-sided die with bias ! to
determine zi=1,...,K, the
(unobserved) module
assignment for ith node
1
2
3
4
5
6
7
8
9
• For each pair of nodes (i,j):
• If zi=zj, flip “in community”
coin with bias !c to determine
edge Aij
• If zi"zj, flip “between
communities” coin with bias !d
to determine edge Aij
Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987)
7
Generating modular networks
• For each node:
• Roll K-sided die with bias ! to
determine zi=1,...,K, the
(unobserved) module
assignment for ith node
1
2
3
4
5
6
7
8
9
• For each pair of nodes (i,j):
• If zi=zj, flip “in community”
coin with bias !c to determine
edge Aij
• If zi"zj, flip “between
communities” coin with bias !d
to determine edge Aij
Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987)
7
Generating modular networks
• For each node:
• Roll K-sided die with bias ! to
determine zi=1,...,K, the
(unobserved) module
assignment for ith node
1
2
3
4
5
6
7
8
9
• For each pair of nodes (i,j):
• If zi=zj, flip “in community”
coin with bias !c to determine
edge Aij
• If zi"zj, flip “between
communities” coin with bias !d
to determine edge Aij
Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987)
7
Generating modular networks
• For each node:
• Roll K-sided die with bias ! to
determine zi=1,...,K, the
(unobserved) module
assignment for ith node
1
2
3
4
5
6
7
8
9
• For each pair of nodes (i,j):
• If zi=zj, flip “in community”
coin with bias !c to determine
edge Aij
• If zi"zj, flip “between
communities” coin with bias !d
to determine edge Aij
Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987)
7
Generating modular networks
• For each node:
• Roll K-sided die with bias ! to
determine zi=1,...,K, the
(unobserved) module
assignment for ith node
1
2
3
4
5
6
7
8
9
• For each pair of nodes (i,j):
• If zi=zj, flip “in community”
coin with bias !c to determine
edge Aij
• If zi"zj, flip “between
communities” coin with bias !d
to determine edge Aij
Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987)
7
Generating modular networks
• For each node:
• Roll K-sided die with bias ! to
determine zi=1,...,K, the
(unobserved) module
assignment for ith node
1
2
3
4
5
6
7
8
9
• For each pair of nodes (i,j):
• If zi=zj, flip “in community”
coin with bias !c to determine
edge Aij
• If zi"zj, flip “between
communities” coin with bias !d
to determine edge Aij
Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987)
7
Generating modular networks
• For each node:
• Roll K-sided die with bias ! to
determine zi=1,...,K, the
(unobserved) module
assignment for ith node
1
2
3
4
5
6
7
8
9
• For each pair of nodes (i,j):
• If zi=zj, flip “in community”
coin with bias !c to determine
edge Aij
• If zi"zj, flip “between
communities” coin with bias !d
to determine edge Aij
Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987)
7
Generating modular networks
• For each node:
• Roll K-sided die with bias ! to
determine zi=1,...,K, the
(unobserved) module
assignment for ith node
1
2
3
4
5
6
7
8
9
• For each pair of nodes (i,j):
• If zi=zj, flip “in community”
coin with bias !c to determine
edge Aij
• If zi"zj, flip “between
communities” coin with bias !d
to determine edge Aij
Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987)
7
Generating modular networks
• For each node:
• Roll K-sided die with bias ! to
determine zi=1,...,K, the
(unobserved) module
assignment for ith node
1
2
3
4
5
6
7
8
9
• For each pair of nodes (i,j):
• If zi=zj, flip “in community”
coin with bias !c to determine
edge Aij
• If zi"zj, flip “between
communities” coin with bias !d
to determine edge Aij
Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987)
7
Generating modular networks
• For each node:
• Roll K-sided die with bias ! to
determine zi=1,...,K, the
(unobserved) module
assignment for ith node
1
2
3
4
5
6
7
8
9
• For each pair of nodes (i,j):
• If zi=zj, flip “in community”
coin with bias !c to determine
edge Aij
• If zi"zj, flip “between
communities” coin with bias !d
to determine edge Aij
Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987)
7
Generating modular networks
• For each node:
• Roll K-sided die with bias ! to
determine zi=1,...,K, the
(unobserved) module
assignment for ith node
1
2
3
4
5
6
7
8
9
• For each pair of nodes (i,j):
• If zi=zj, flip “in community”
coin with bias !c to determine
edge Aij
• If zi"zj, flip “between
communities” coin with bias !d
to determine edge Aij
Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987)
7
Generating modular networks
• For each node:
• Roll K-sided die with bias ! to
determine zi=1,...,K, the
(unobserved) module
assignment for ith node
1
2
3
4
5
6
7
8
9
• For each pair of nodes (i,j):
• If zi=zj, flip “in community”
coin with bias !c to determine
edge Aij
• If zi"zj, flip “between
communities” coin with bias !d
to determine edge Aij
Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987)
7
Generating modular networks
• For each node:
• Roll K-sided die with bias ! to
determine zi=1,...,K, the
(unobserved) module
assignment for ith node
• For each pair of nodes (i,j):
• If zi=zj, flip “in community”
coin with bias !c to determine
edge Aij
• If zi"zj, flip “between
communities” coin with bias !d
to determine edge Aij
1
2
7
3
5
9
4
6
8
Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987)
8
Generating modular networks
• Die rolling, coin flipping <-> infinite-range spin-glass Potts model:
! =−
H ≡ − ln p(A, !z |!π , θ)
JG
JL
hµ
!
i,j
(JL Aij − JG )δzi ,zj +
K
!
hµ
µ=1
N
!
δzi ,µ
i=1
≡ ln ϑc /ϑd
≡ ln(1 − ϑd )/(1 − ϑc ) + JG
≡ − ln πµ
• Infer distributions over spin assignments, coupling constants, and chemical
potentials and find number of occupied spin states
p(A|K) =
!"
!
z
dθ!
"
! =
d!π p(A, !z , !π , θ)
!"
!
z
dθ!
"
! π)
d!π e−H p(θ)p(!
Extends Newman (2004, 2006), Hastings (2006), Bornholdt & Reichardt (2006)
9
Generating modular networks
• Die rolling, coin flipping <-> infinite-range spin-glass Potts model:
! =−
H ≡ − ln p(A, !z |!π , θ)
JG
JL
hµ
!
i,j
(JL Aij − JG )δzi ,zj +
K
!
hµ
µ=1
N
!
δzi ,µ
i=1
≡ ln ϑc /ϑd
≡ ln(1 − ϑd )/(1 − ϑc ) + JG
≡ − ln πµ
• Infer distributions over spin assignments, coupling constants, and chemical
potentials and find number of occupied spin states
p(A|K) =
!"
!
z
dθ!
"
! =
d!π p(A, !z , !π , θ)
!"
!
z
dθ!
"
! π)
d!π e−H p(θ)p(!
Extends Newman (2004, 2006), Hastings (2006), Bornholdt & Reichardt (2006)
Can do integrals,
but sum is
intractable, O(KN);
use mean-field
variational
technique
9
Validation: Toy graph
• Automatic complexity control: probability of occupation for extraneous modules
goes to zero
10
Validation: Toy graph
• Automatic complexity control: probability of occupation for extraneous modules
goes to zero
10
The “resolution limit” problem
Variational Bayesian approach correctly infers complexity
42
47
46
43
52
42
47
46
44
48
43
52
44
48
39
39
51
51
50
38
45
55
50
38
40
41
45
55
49
40
41
49
37
37
54
34
56
54
35
53
35
34
56
53
36
36
33
33
58
58
59
31
57
60
29
29
4
25
1
25
2
27
3
27
3
28
30
32
1
2
31
57
30
32
4
59
60
28
26
26
5
5
21
21
6
6
8
23
9
17
7
22
8
23
9
17
7
13
22
13
24
12
15
24
12
18
10
15
20
11
14
19
16
Variational Bayes
18
10
20
11
14
19
16
Girvan-Newman
modularity
11
APS March Meeting
2008 co-authorship
network
12
APS March Meeting
2008 co-authorship
network
12
Theoretical
superconductors
APS March Meeting
2008 co-authorship
network
Nanotubes,
Graphene
Experimental
superconductors
12
Conclusions
• Used variational Bayesian inference to infer distributions over module assignments,
model parameters, and model complexity
• Resulted in a principled, accurate, and scalable algorithm which addresses the
resolution limit problem
• Validated technique on synthetic and real networks
• Future: extend model to handle alternative network structure, using same framework
• preprint: http://arxiv.org/abs/0709.3512
13
Acknowledgements
• Useful discussions
• Jonathan Goodman (NYU)
• Joel Bader (Hopkins)
• Matt Hastings (LANL)
• Aaron Clauset (SFI)
• Edo Airoldi (Princeton)
14
Download