Community Structure in Networks: Practice and Significance Elizabeth Leicht 7 January 2011

advertisement
Community Structure in Networks:
Practice and Significance
Elizabeth Leicht
Research Fellow CABDyN Complexity Centre
7 January 2011
Learning from Networks
EMOTIONS MAPPED BY NEW GEOGRAPHY
New York Times 1857; Apr 3, 1933; ProQuest Historical Newspapers The New York Times (1851 - 2004)
pg. 17
Friendship map of students in a 7th grade
class–adapted from Who Shall Survive,
Jacob Moreno, 1934.
Dealing with large networks
Detecting communities in networks
Girvan & Newman
J. Moreno
Adamic & Glance
Calculating modularity
M. E. J. Newman PNAS 103, 8577 (2006).
E. A. Leicht and M. E. J. Newman Phys. Rev. Lett. 100, 118703, (2008).
n
1 X
Q=
[Aij − Pij ] δci ,cj
m i,j=1
1, if there is an edge from j to i
.
0, otherwise
Pij = the expected number of edges from j to i.
ci = the community to which i belongs.
Aij =
What is the expected number of edges between two nodes?
kiin
kjout
kiin kjout
Pij =
m
Division of a network into two communities
si = +1
sj = −1
#
"
n
kiin kjout
1 X
Q=
(si sj +1)
Aij −
2m ij
m
h
Let Bij = Aij −
matrix.
Q=
kiin kjout
m
i
1
δij = (si sj + 1)
2
be an element of the modularity
1 T
1 T T
1 T
s Bs =
s B s=
s B + BT s
2m
2m
4m
Approximate group ID by the
sign of the entry for the node
in the leading eigenvector, v(1) .
(
si =
+1,
−1,
(1)
if vi > 0
(1)
if vi < 0
Two communities and more
Friendship network from 7th grade
class divided into two communities by
method.
Communities with bias in edge direction
Construct a network
of n nodes and
connect pairs of
nodes with
probability p.
Allow random edge
direction for
intra-community
edges.
Bias edge direction
for inter-community
edges.
Communities with bias in edge direction
Allowing directed edges
1
Ignoring directed edges1
M. E. J. Newman PNAS 103, 8577 (2006).
Edge direction bias in real networks
Pennsylvania State
Pennsylvania State
Northwestern
Minnesota
Iowa
Ohio State
Michigan
Michigan
Illinois
Michigan State
Wisconsin
Michigan State
Illinois
Ohio State
Northwestern
Wisconsin
Minnesota
Iowa
Purdue
Indiana
Accounting for win-loss result
Purdue
Indiana
Tracking only games played
American football games among US “Big Ten” schools
with directed edges from losing team to winning team.
Exploratory analysis of networks structure
M. E. J. Newman and E. A. Leicht PNAS 104, 9564-9569, (2007.)
a
b
Group identity inferred from network structure.
A pattern for edges is not pre-determined.
Method: the data and the model
Data
Observed: network edges, Aij ∀i, j.
Missing: group identity of each node, gi ∀i.
Model parameters
θri : probability there exists an edge from a node in
(group) r to a node i.
n
X
θri = 1
i=1
πr : probability a randomly selected node ∈ (group) r .
n
X
i=1
πi = 1
A likelihood problem
The likelihood of the data given the model is,
Pr(A, g |π, θ) = Pr(A|g , π, θ) Pr(g |π, θ)
where
Pr(A|g , π, θ) =
Y
A
θgjij,i and
Pr(g |π, θ) =
ij
Y
π gj
j
Frequently, one works not with the likelihood itself, but with
the log-likelihood,
"
#
X
Y A
L = ln Pr(A, g |π, θ) =
ln πgj +
θgjij,i
j
i
Dealing with missing data
We cannot directly observe g .
We can calculate an expected value for the log-likelihood
over all possible values of g .
L=
=
c
X
c
i
Xh
X
X
ln πgi +
...
Pr(g |A, π, θ)
Aij ln θgi ,j
g1 =1
gn =1
X
h
ir
qir ln πr +
i
X
Aij ln θrj
j
i
j
where
Q A
πr j θrj ij
Pr(A, gi = r |π, θ)
=P
qir = Pr(gi = r |A, π, θ) =
Q Aij
Pr(A|π, θ)
s πs
j θsj
An iterative method–the EM algorithm
Initialize model parameters (θ, π) with random values.
Find the probability a given node i is a member of group
r (E-step).
Q A
πr j θrj ij
qir = P
Q Aij .
s πs
j θsj
Maximize the model parameter (M-step)
P
Aij qir
1X
πr =
qir ,
θrj = Pi
,
n i
i ki qir
Iterate until convergence.
Zachary karate club
Disassortative word network
Assortative & disassortative structure
assortative
disassortative
success
1
0.5
this paper
max. modularity
min. modularity
0
0.1
1
probability ratio pout /pin
10
Keystone network
Modularity method
EM method
We assign nodes to groups based on the set of
keystone nodes to which they are connected.
“Big Ten” results with EM approach
Node size is
proportional to the
probability of the
team losing to teams
assigned to group 1.
Node shading
corresponds to the
probability that the
node is assigned to
group 1.
qj1 →
Two methods for one network
Pennsylvania State
Northwestern
Minnesota
Iowa
Illinois
Purdue
Ohio State
Michigan
Wisconsin
Michigan State
Indiana
Purdue
Indiana
Wisconsin
Illinois
Iowa
Minesota
Pennsylvania
Michigan State
Northwestern
Michigan
Ohio State
Summary
There are many
existing methods for
detecting structure
in complex.
Moving forward we
need to focus on
improving our
understanding of
what these
structures indicate
in real networks.
Friendship map of students in a 7th
grade class–adapted from Who Shall
Survive, Jacob Moreno, 1934.
Download