Anti-Bipartite approach to Modularity
Optimisation in Networks
Mathematics Course 499
Brendan O’Dowd
School of Mathematics
Trinity College, University of Dublin
With thanks to supervisor Dr. Conor Houghton
May 14, 2009
Abstract
Many real networks exhibit community structure, whereby nodes tend to form
clusters with a higher density of edges within them than between them. I
propose new algorithms for cluster detection in networks, all based on vector
partitioning, which use eigenvectors of the modularity matrix that are more normally used for the analysis of bipartite networks. By adopting an ‘anti-bipartite’ approach
to cluster detection, my algorithms found the correct partitions of several sample
networks. Furthermore, the vector representation of this approach shows that
these eigenvectors contain information about the structure of the network.
Contents

1 Introduction to Networks and Modularity
  1.1 Networks and Clustering
  1.2 Modularity

2 Vector Partitioning
  2.1 Background to Vector Partitioning
  2.2 Testing Vector Partitioning
  2.3 Bipartite networks and negative modularity

3 Anti-bipartite approaches to Vector Partitioning
  3.1 Using both positive and negative eigenvalues to find clusters in ordinary networks
  3.2 Adjusting the vectors when βn and βn−1 are used
  3.3 Adjusting the vectors when β1 and βn are used

4 Conclusions and Further Work

A Range of Modularity
List of Figures

2.1 Sample network
2.2 Modularity chart of sample network
2.3 Best cut of sample network
2.4 Facebook friend network diagram
2.5 Facebook friend network - Trinity College subsection
2.6 Bipartite network
2.7 Vector representation of approximately bipartite network
3.1 Vector representation using high and low eigenvalues
3.2 Modularity plot using high and low eigenvalues
3.3 Vector representation using high and low eigenvalues with cut
3.4 Plot using low eigenvalues for normal network
3.5 Modularity plot with lowest eigenvalues and vectors reflected through the origin
3.6 Plot using high and low eigenvalues for normal network
3.7 Plot of modularity when using high and low eigenvalues
3.8 Partitioned vector representation of the network using high and low eigenvalues
3.9 Sample network showing location of cut between clusters
3.10 Diagram of a linear network
3.11 Comparison of methods using linear network
Chapter 1
Introduction to Networks and Modularity
1.1 Networks and Clustering
A wide range of areas under research can be represented as networks, which are
composed of a set of nodes which may be connected to one another by edges.
Examples include social networks, where people may be connected if they are friends or are related through family or background, and biological networks such as food webs, genetic networks, protein networks, metabolic systems and neural networks [1][2][3]. Other networks of interest are citation networks, the internet, power grids, communication and distribution networks and many more besides [2][4][5][6].
Of particular interest is a common phenomenon known as community structure or ‘clustering’ within networks, whereby nodes in a network tend to form
groups with a higher density of connections within them than between them [7].
This behaviour usually corresponds to some real grouping or interdependence.
For example, clusters in a social network may correspond to groups of people
that work closely together or have similar interests, or clusters appearing in a
cell network may indicate that these cells perform some special function together
[1][3]. In the former example where we may have expected to find communities,
it may be of practical use to locate a specific group of people. In the latter
example, finding community structure that we did not expect to see may help
in our understanding of how the network operates on a macro level.
Our natural ability to identify clusters depends on the size of the network,
how that network is presented and our familiarity with the entities of which that
network is comprised. For many networks under current investigation these factors prevent us from identifying clusters manually in any reasonable amount
of time. Thus the problem of identifying and quantifying community structure becomes a computational task, and as such, has developed considerably
following the arrival of high-performance computing in the mid 1980s. Of key
concern here is how a computer program might identify clusters within a network, which requires that we try to convert our own intuition of what a cluster
is into something more specific.
In this paper I consider only networks that are unweighted and undirected,
i.e. the magnitude and direction of the connection of two nodes in a network
is ignored. An example of a weighted network would be a road network which
takes into account the average volume of traffic between any two connected
points, A and B. An example of a directed network would be an information
flow network, such as Ann tells Brendan a story, who in turn tells the story to
Carol.
1.2 Modularity
One of the earliest methods of separating clusters was Graph Partitioning [8].
This method arose from a need to increase computational performance by assigning tasks (e.g. calculations) to different processors within a computer. Many
of these tasks would have to communicate amongst one another before being
carried out, as they may rely on the results of a separate calculation. Typically
communication is much faster within a processor than between them. The goal,
therefore, was to minimise the number of times that data had to be transferred
from processor to processor. In network terms this means grouping nodes into
clusters such that the number of connections (or edges) between the clusters is
minimised. The number of edges between clusters is called the cut size.
While minimisation of the cut size works well for computational purposes,
it does not seem to divide groups of nodes into communities as we understand
them [7]. One of the reasons is that the number of nodes in each group must
first be given, which is unlikely to be available in any real scenario other than
a computer network. A more fundamental reason is that a good division into
separate clusters may not be one in which the number of edges between groups
is minimised. An alternative approach is suggested by M. E. J. Newman, who
says that a good division “is one in which the number of edges between groups
is smaller than expected” [7]. Equally this means finding a division in which the number of edges within groups is larger than expected. With this latter interpretation in mind he introduces Modularity, Q, which he defines in a conceptual sense by
Q = (number of edges within clusters) − (expected number of such edges).  (1.1)
The job becomes one of allocating nodes into clusters, and finding the arrangement where Q is maximised.
It is important to note here that modularity is all about telling us how good
a division of a network is. One of the problems with graph partitioning is that it
finds a division of the network regardless of whether a natural division actually
exists. If there is no natural clustering behaviour in a network, then a good
cluster detection approach should have some sense that this is the case. The
modularity approach has this advantage, since it is used to quantify as well as
detect the clusters.
The first term in eqn. (1.1) is calculated via the n × n adjacency matrix Aij
(where n is the number of nodes in the network), given by
Aij = 1 if node i is connected to node j, and 0 otherwise.  (1.2)
Note that we assume that no node is self connected, i.e. Aii = 0 for all i ∈
[1, n]. The degree of node i, denoted ki , is the number of other nodes connected
to it, and can be calculated from Aij thus
ki = Σ_{j=1}^{n} Aij.  (1.3)
If m is the total number of edges in the network, then the sum of all the degrees, Σ_i ki, will be double this amount, 2m, since each edge is counted by the degree
of two separate nodes.
If we let gi be the cluster to which node i belongs, then the total number of edges within clusters is given by

Σ_{ij} Aij δ(gi, gj),  (1.4)
where δ(gi , gj ) equals one if gi = gj and zero otherwise.
What we mean by the expected number of edges within clusters requires a
little more thought. How many edges would you expect there to be between
nodes i and j? It is useful to imagine this as the probability of i and j being
connected if all the edges were taken up and laid between nodes at random. Let
us denote this value Pij . We will assume that the expected degree of each node
is the same as in the real network, i.e.
ki = Σ_j Aij = Σ_j Pij.  (1.5)
This assumption is made so that the random model will have the same degree
distribution as the original network. This is thought to be important, since
almost all real networks have a right skewed degree distribution [9].
The assumption that the degrees of the nodes in the random network remain
the same implies that the total number of edges is also unchanged:
Σ_i ki = Σ_{ij} Aij = Σ_{ij} Pij = 2m.  (1.6)
We assume that the probability of an edge being connected to node i is solely a
function of its degree which we can call f (ki ). This means that the probability
of nodes i and j being connected is a product of these functions
Pij = f(ki) f(kj).  (1.7)
Summing over j gives

Σ_{j=1}^{n} Pij = f(ki) Σ_{j=1}^{n} f(kj) = ki.  (1.8)
Since this is true for all i and the summation over f(kj) does not depend on i, we see that f(ki) is simply a constant (C, say) times ki. If we now also sum over i we must get 2m, from (1.6). Therefore we have

Pij = f(ki) f(kj) = C ki C kj,  (1.9)

which implies

Σ_{ij} Pij = C² (Σ_i ki)(Σ_j kj),  (1.10)

and so

2m = C² (2m)(2m),  C = 1/√(2m).  (1.11)
Using (1.7) we can now write a simple, calculable expression for Pij,

Pij = ki kj / 2m,  (1.12)

and for the expected number of edges within clusters:

Σ_{ij} (ki kj / 2m) δ(gi, gj).  (1.13)
Note that this expression for Pij is easily calculated. Alternative forms for
Pij are suggested by other authors, but the form presented here has proven
successful in several studies and will be used throughout this report.
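As a quick check, this form of Pij automatically satisfies the degree-preserving constraints (1.5) and (1.6). A minimal NumPy sketch (the small adjacency matrix is an arbitrary example of mine, not one of the networks in this report):

```python
import numpy as np

# A small, arbitrary symmetric adjacency matrix (5 nodes, zero diagonal).
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)

k = A.sum(axis=1)            # degrees k_i, eqn (1.3)
two_m = k.sum()              # 2m: each edge is counted twice
P = np.outer(k, k) / two_m   # P_ij = k_i k_j / 2m, eqn (1.12)

# Constraint (1.5): each row of P sums to that node's degree.
assert np.allclose(P.sum(axis=1), k)
# Constraint (1.6): all entries of P sum to 2m.
assert np.isclose(P.sum(), two_m)
```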
The modularity is now given by
Q = (1/2m) Σ_{ij} [Aij − Pij] δ(gi, gj)  (1.14)
  = (1/2m) Σ_{ij} [Aij − ki kj/2m] δ(gi, gj),  (1.15)
where the prefactor of 1/2m is included to ensure that the maximum modularity
is one, in line with other measures of network division (such as the clustering coefficient [1]) and previous definitions of modularity. The minimum possible value
for modularity is −1/2 [10], despite numerous sources (including Wikipedia [11])
claiming that the lower bound is −1. A derivation of these limits is given in
Appendix A.
Some authors feel that when a community detection algorithm is used to
detect clusters for a random graph of the Erdős–Rényi model (where all nodes
are connected with equal probability P ), or for scale-free models which do have
a right skewed degree distribution, the result should be zero. However this
has been shown by Guimerà et al. not to be the case for Newman’s modularity.
They have shown that the modularity can be calculated from simple parameters
of those networks [12]. This shows that due to fluctuations, even stochastic
network models give rise to community structure as defined by Newman. Thus
the selection of a random model used to calculate the “expected number of links
between clusters” is the subject of debate, but one which I will not get into in this report. It is fair to say that if it is the magnitude of the modularity one
is interested in, one may have to compare it to the modularity of a similar sized
network that has no intended community structure. As I am only interested in
the maxima and minima of modularities I will absolve myself of this task.
It is tempting to absorb the prefactor of 1/2m into the definition of
P and say that Pij is simply the product of ki /2m and kj /2m, which each look
like they could be the probability of a random edge being attached to nodes i
and j. This is intuitively satisfactory, but hard to justify and there seems to be
no reason why 1/2m would be absorbed into the definition of Aij .
In the current form (eqn. (1.15)) the modularity can be calculated exactly,
yet from a computational point of view the delta function is somewhat awkward.
Fortunately a convenient matrix form is available. We introduce the n × c index
matrix S, where c is the number of clusters that we want to split the network
into. The entries of S are defined
Sij = 1 if node i is a member of cluster j, and 0 otherwise.  (1.16)
Note that the j th column of S is a vector of n elements and tells us which nodes
are in the j th cluster. This interpretation will be of use later when S is broken
up.
The delta function δ(gi, gj) can now be given by

δ(gi, gj) = Σ_{k=1}^{c} Sik Sjk  (1.17)

and so the modularity now becomes

Q = (1/2m) Σ_{ij} Σ_{k=1}^{c} [Aij − Pij] Sik Sjk.  (1.18)
We now introduce the modularity matrix B, equal to A − P, which leaves us with an important matrix formula for Q:

Q = Tr(Sᵀ B S).  (1.19)
Note that all the terms in the matrix definition for modularity can be easily plugged
into a computer program. This definition is therefore very useful for measuring
the modularity exactly, and we will return to it throughout the report.
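As a sketch of how the matrix formula plugs into a program, here is a minimal NumPy version. The toy two-triangle network and all variable names are mine, not from the report, and I restore the 1/2m prefactor of eqn (1.18) so that the result is the exact modularity:

```python
import numpy as np

# A toy network of my own: two triangles (nodes 0-2 and 3-5) joined by
# a single bridge edge, so the intended split is obvious.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

k = A.sum(axis=1)
two_m = k.sum()
B = A - np.outer(k, k) / two_m   # modularity matrix B = A - P

# Index matrix S of eqn (1.16): column j flags the members of cluster j.
S = np.zeros((6, 2))
S[[0, 1, 2], 0] = 1.0
S[[3, 4, 5], 1] = 1.0

# Q = Tr(S^T B S), with the 1/2m prefactor of eqn (1.18) restored.
Q = np.trace(S.T @ B @ S) / two_m
```

For this toy network the intended split of the two triangles gives Q = 5/14 ≈ 0.357.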
Chapter 2
Vector Partitioning
2.1 Background to Vector Partitioning
While the matrix definition for modularity given in eqn. (1.19) will tell us when
we have found a good split of the network into clusters, it offers little insight
into how we practically find the clusters themselves. Unfortunately, trying every
possible partition of the network is computationally unfeasible since the number
of different partitions for a network of n nodes is the nth Bell number [13]. Thus
even for a small network with 30 nodes an exhaustive search would require that we calculate the modularity 8.5 × 10²³ times. Another approach is required.
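To see the scale of the problem, the Bell numbers can be generated with the Bell-triangle recurrence. This short sketch (the function name is mine; the report only quotes the figure) reproduces the count quoted above:

```python
# The n-th Bell number counts the partitions of n items. Bell triangle:
# each row starts with the last entry of the previous row, and each
# further entry adds the entry above; B(n) is the first entry of row n.
def bell(n):
    row = [1]
    for _ in range(n):
        new_row = [row[-1]]
        for entry in row:
            new_row.append(new_row[-1] + entry)
        row = new_row
    return row[0]
```

Here bell(30) is a 24-digit number of roughly 8.5 × 10²³, matching the count quoted for the 30-node network.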
Vector partitioning is a useful method that can help us to do this. First we
expand B in terms of the matrix of its eigenvectors U (where U = (u1 |u2 | . . . |un ))
and the diagonal matrix of eigenvalues D so that (1.19) becomes
Q = Tr(Sᵀ U D Uᵀ S).  (2.1)
If we call the eigenvalues βi where Dii = βi , then the modularity can be written
Q = Σ_{j=1}^{n} Σ_{k=1}^{c} βj (ujᵀ sk)²,  (2.2)
where sk is the k th column of the n × c matrix S. What we essentially have now
is a sum of n elements, where the j th element is proportional to βj and depends
on the corresponding eigenvector, uj , i.e.
Q = β1 [stuff1(u1)] + β2 [stuff2(u2)] + … + βn [stuffn(un)].  (2.3)
If we arrange the eigenvalues in order of size, so that β1 ≥ β2 ≥ . . . ≥ βn−1 ≥ βn
then we see that a good approach would be to maximise the “stuff” preceded
by large positive eigenvalues such as β1, β2 and so on. For reasons of computational cost, and for practical reasons that we will soon see, it is neither practical nor useful to consider very many of these terms.
Typically it is sufficient to maximise two or three of these “stuffs” to achieve
high modularity. In order to do this it is necessary to modify (2.1) as follows:

Q = Tr(Sᵀ U (D − αI + αI) Uᵀ S)
  = nα + Tr(Sᵀ U (D − αI) Uᵀ S)
  = nα + Σ_{j=1}^{n} Σ_{k=1}^{c} (βj − α) [Σ_{i=1}^{n} Uij Sik]²,  (2.4)
where α is a constant that we will discuss momentarily. The n prefactor to α
arises since Tr(S T S) = n and U is orthogonal. We now reduce the summation
over all n eigenvalues to just the first p leading eigenvalues, giving

Q ≈ nα + Σ_{j=1}^{p} Σ_{k=1}^{c} (βj − α) [Σ_{i=1}^{n} Uij Sik]².  (2.5)
The choice of α deals with the minimisation of the error associated with this
approximation. Here we will only give the result of this exercise, which finds
that α is an average of the n − p lowest eigenvalues
α = (1/(n − p)) Σ_{i=p+1}^{n} βi.  (2.6)
Finally, we simplify (2.5) by defining a set of n vectors ri , i = 1, 2, . . . n, each
of dimension p. They are given by
[ri]j = √(βj − α) Uij.  (2.7)

Because α is the average of the eigenvalues not considered for maximising modularity, it will be less than all of the p leading eigenvalues, so the components of all ri will be real. Inserting this into (2.5) gives
Q ≈ nα + Σ_{j=1}^{p} Σ_{k=1}^{c} [Σ_{i=1}^{n} √(βj − α) Uij Sik]²
  = nα + Σ_{j=1}^{p} Σ_{k=1}^{c} [Σ_{i∈Gk} [ri]j]²,  (2.8)
where the summation over i ∈ Gk indicates that we are summing over all vectors ri that correspond to nodes that are in the same cluster. This reduced
summation has come about due to the behaviour of S discussed at eqn. (1.17).
We now introduce the “community vectors” Rk which represent the sum of
all vectors that make up cluster k. This simplifies (2.8) further, giving

Q = nα + Σ_{k=1}^{c} |Rk|².  (2.9)
To maximise the magnitudes of the community vectors, we need to group together those vectors ri that point in roughly the same direction. This is extremely useful, since it gives us
not only a graphical representation of the network, but also a method to go
about finding the clusters.
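As a sketch of how these vectors might be computed in practice (the helper name and the toy two-triangle network are mine, not from the report):

```python
import numpy as np

def node_vectors(B, p=2):
    """Vectors r_i of eqn (2.7) from the p leading eigenpairs of B, with
    alpha the mean of the remaining n - p eigenvalues, eqn (2.6)."""
    beta, U = np.linalg.eigh(B)        # eigh returns ascending eigenvalues
    beta, U = beta[::-1], U[:, ::-1]   # reorder so beta_1 >= ... >= beta_n
    alpha = beta[p:].mean()
    return np.sqrt(beta[:p] - alpha) * U[:, :p]  # beta_j >= alpha for j <= p

# Toy network: two triangles (nodes 0-2 and 3-5) joined by one bridge edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
k = A.sum(axis=1)
B = A - np.outer(k, k) / k.sum()

r = node_vectors(B, p=2)
# The first components of nodes 0-2 share one sign and nodes 3-5 the
# other, so the two intended clusters occupy opposite half-planes.
```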
From here it is relatively easy to prove (see [7] for derivation) that if we
consider breaking the network into just two clusters, then the community vectors will be of equal magnitude and exactly oppositely directed. (This is true
when the contributions of all n dimensions are used. When two dimensions
are used, as in many of the following examples, the community vectors appear
only approximately equal in magnitude and oppositely directed.) It can also be
proven that it is never beneficial to place a vector within a cluster such that
its associated community vector points in the opposite direction, i.e. we should
never let node i be in cluster k if ri · Rk < 0.
Thus for two clusters we can be given the direction of R1 , R2 or the dividing
plane between the clusters to define the network split. If we consider just the
two leading eigenvalues β1 and β2,¹ then we can find the best split by imagining
the line dividing the clusters rotating from 0◦ to 360◦ and grouping the vectors
ri at each angle depending on what side of the dividing line they are on. At
every angle the vectors to each side are added to give the community vectors
R1 and R2. The squared magnitudes of the two community vectors are then used to find the modularity as given by eqn. (2.9). When this has been done
for every angle a plot of modularity versus angle of R1 can be built up and the
maximum modularity found.
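The rotating-line search described above can be sketched as follows. The function name and the synthetic 2-D vectors are my own, and only the angle-dependent part of eqn (2.9), Σ|Rk|², needs comparing across angles:

```python
import numpy as np

def best_split_by_angle(r, steps=1800):
    """Rotate a dividing line through the origin (0 to 180 degrees is
    enough, by the symmetry noted in the text) and keep the split that
    maximises |R1|^2 + |R2|^2, the angle-dependent part of eqn (2.9)."""
    best_score, best_side = -np.inf, None
    for step in range(steps):
        theta = np.pi * step / steps
        normal = np.array([np.cos(theta), np.sin(theta)])
        side = r @ normal >= 0                 # which side of the line
        if side.all() or not side.any():
            continue                           # ignore the empty split
        R1 = r[side].sum(axis=0)               # community vectors
        R2 = r[~side].sum(axis=0)
        score = R1 @ R1 + R2 @ R2
        if score > best_score:
            best_score, best_side = score, side
    return best_side

# Two synthetic bundles of vectors pointing in roughly opposite directions.
r = np.array([[1.0, 0.2], [0.9, -0.1], [1.1, 0.0],
              [-1.0, 0.1], [-0.9, -0.2], [-1.1, 0.0]])
side = best_split_by_angle(r)   # recovers the two bundles
```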
It is not just the angle of a vector ri , but also its magnitude that tells us
about a node’s membership of a cluster. To illustrate this, consider a set of
vectors {ri } (i from 1 to n) that has been optimally split into two clusters
using the method described above. If we now move a node whose vector is very
long into the wrong cluster, then from eqn. (2.8) the modularity will fall by
a substantial amount. Conversely, if that vector is very small then it can be
moved at relatively little cost to the modularity. Thus the length of a vector
indicates the “certainty” of the algorithm in assigning it to its cluster. A large
vector shows that we can be quite certain that the algorithm has placed that
node in the correct cluster whereas a short vector shows that the node is not a
clear member of either group.
2.2 Testing Vector Partitioning
Using the C programming language, I set about writing a program, or rather a series of programs, to separate networks into clusters. Some networks were real, based on
data recorded by network researchers examining networks of football teams,
email records and citations of physicists’ publications. I also created several
simple networks with around a dozen nodes and very obvious clustering. These
¹It is more difficult, yet by no means impossible, to use more positive eigenvalues such as β3, β4, etc. and cycle through all directions for a dividing plane in higher dimensions. In this report we will look only at instances where two eigenvalues were used.
were useful when examining how the procedure worked, and how it might be
improved.
The first of my programs took in the data about the connections of each
node, created the matrices A and P as defined by (1.2) and (1.12) and then
subtracted one from the other to get the modularity matrix B. The second
program took B and calculated its eigenvalues and eigenvectors, and the third
program then used a selection of these to calculate each of the ri using (2.7).
The fourth program rotated the direction of a line through the origin from 0◦ to
360◦ . At each tenth of a degree the program checked which side of this dividing
line the vectors lay on, found their sum and then calculated the modularity
using (2.9). Since this modularity is an approximation using only two out of
the available n eigenvectors, I wrote a fifth program which took this suggested
division and calculated the modularity exactly using (2.1).
Fig. 2.1a shows an example of a simple network that I drew where it is
intended that nodes 1 to 5 form one cluster and nodes 6 to 11 form another.
Fig. 2.1b shows the plot of the vectors associated with each node as defined
by (2.7). Note that the vector for node 5 is the shortest, indicating that it is
least clear for this node which cluster it belongs to, as discussed in the previous
section. This seems reasonable given its location in fig. 2.1a. Fig. 2.2 shows
(a) Sample network
(b) Vector representation
Figure 2.1: Sample network of 11 nodes and the associated vector representation.
the plot of modularity versus the direction of the dividing line for the simple
network given in fig. 2.1a (certain prefactors to the modularity have been left
out, which doesn’t matter since we are only concerned with finding the point of
maximum modularity). The first obvious feature one notices about this plot is
that the region from 0◦ to 180◦ is identical to that from 180◦ to 360◦ . This is
because the procedure has no notion of what we have called R1 and R2 , and
calculates the same sum as each community vector passes by the same point. We
could just have rotated the dividing line from 0◦ to 180◦ but this is a relatively
minor issue. The point of maximum modularity corresponds to a cut through
the vector plot of fig. 2.1b at around 90◦ . As can be seen from fig. 2.3 where
Figure 2.2: Modularity chart of sample network (modularity plotted against the angle of the dividing line in degrees).
Figure 2.3: Vector representation of sample network cut by dividing line (blue) into two clusters shown here in red and green.
the blue line divides the clusters, this corresponds to the separation of the nodes
that we had hoped for.
While the method described above can find the maximum modularity when
the network is split into only two clusters, plotting all the ri can still show a
network split that is composed of many more communities, even when plotted
in only two dimensions. To illustrate this, I used the social networking website
Facebook to compose my friend network. Facebook is a hugely popular social
networking website with over 175 million active users worldwide [14]. Facebook
allows users to create a personal profile, add other users as friends, send them
messages and so on. I used my 133 friends on Facebook to create my social
network, and then checked the list of mutual friends of each of these to see
which people were connected.
As can be seen from fig. 2.4, my friends form three distinct groups: those from
my home town of Castlebar (coloured red), those who are in college with me
(coloured green) and another group of friends I met when at the International
Conference for Physics Students (ICPS) held in Krakow during the summer of
2008 (coloured blue). The colouring was done manually by me since I know how
I met all of these people.
Figure 2.4: Vector representation of my friend network on social networking site
Facebook. My friends from Castlebar are in red, College friends are in green
and ICPS friends are in blue. A number of friends who do not fall into these
categories are in pink, but these vectors are barely visible.
There is also a small group of 5 people who do not fit comfortably into any
of these three categories and so the program, not knowing where they belong,
has left them very close to the centre. They are coloured pink in fig. 2.4
but are hardly visible. Only 3 people do not fit exactly into their groups and
these are marked by their initials. Z.B. stands for my girlfriend Zara who is
from Castlebar but is now studying in Trinity College and so is placed between
those two groups. O.N. and J.S. stand for Orna Nicholl and Jessica Stanley, an undergraduate and a graduate respectively of Trinity College. They are both
involved with the Institute of Physics and have met many of my friends from
the ICPS at various other conferences, and so they are placed between the
Trinity College cluster and the ICPS cluster. This graph shows how accurately
vector partitioning can find clusters within networks, even when only using 2
eigenvectors out of a possible 133.
The group of my 64 college friends on Facebook is further broken up in
fig. 2.5. The vectors are coloured according to the subject being studied by
each friend; Physics students are in red, Maths students are in green, and my
fellow Theoretical Physics students are in blue. Friends at college not in any of
these subjects are in pink. Students of Theoretical Physics take classes in both
Figure 2.5: Vector representation of my college friends based on Facebook connections. Physics students are in red, Maths students are in green and Theoretical Physics students are in blue. The pink vectors correspond to friends in
none of these courses.
the School of Physics and the School of Maths and tend to have a mixture of
friends from all three courses, whereas Maths and Physics students do not share
classes outside their own department. One notices that the clusters are not as
clearly defined as the previous case where all my friends were considered. This
shows a higher level of connectedness on this local level compared to my entire
friend network in fig. 2.4. The graph also seems to indicate that students of
Theoretical Physics are more friendly with Maths students than those studying
Physics. This may be due to the fact that we spent a majority of our time doing Maths courses in first and second year. If the blue vectors are ignored, we
can see that there is little interaction between Maths and Physics students.
One classmate has suggested that a camaraderie developed between students
of Theoretical Physics and Mathematics while working at regular homework
assignments during our first year together.
2.3 Bipartite networks and negative modularity
We have seen that the eigenvectors corresponding to the most positive eigenvalues have proven very useful in splitting networks. It turns out that at the other
end of the spectrum the very negative eigenvalues, βn , βn−1 etc. are very useful
for splitting a different type of network that has a lower than expected number
of links within clusters. A network in which there are no connections within its
k clusters is said to be k-partite, and if k is 2 then it is said to be bipartite.
Fig. 2.6 is my attempt at drawing an approximately bipartite network, which has
only 4 edges within the clusters and the other 11 between them. This type
Figure 2.6: An example of a network that is approximately bipartite.
of network is not just a curiosity. A real life example would be a relationship
network where the nodes are people and edges indicate a relationship between
two people at some point in the past. If our clusters are comprised of males and
females then we would, based on statistics on sexual orientation (for examples
see [15]), expect a majority of the edges to lie between the clusters. From here
on where there may be confusion over which type of network I am talking about,
I will refer to the network discussed up to now (with a higher density of links
within clusters) as a ‘normal’ network.
If we are trying to minimise rather than maximise the number of links
within clusters, then we need to minimise the modularity as we have defined it.
Looking back at our expression for modularity as a sum of terms proportional to
the eigenvalues (2.3) it now seems logical to maximise the terms with the most
negative prefactors, i.e. “stuff n” and “stuff n-1”. Retracing our steps through
vector partitioning, except this time using βn and βn−1 and their associated
eigenvalues, we come across a problem when it comes to defining the ri vectors
at eqn. (2.7). We will now find that βn − α and βn−1 − α will be negative,
so that the components of each ri will be complex. Newman’s approach is to redefine ri as follows:

[ri]j = √(α − βj) Uij,  (2.10)

so that the components of ri are all real. He then gives the modularity by

Q = nα − Σ_{k=1}^{c} |Rk|².  (2.11)
I prefer to leave the definition of the ri the same, and allow the components to be imaginary. Here the subscripts have been changed to µ, ν to avoid confusion with the imaginary i = √−1:

[rµ]ν = √(βν − α) Uµν = √(−(α − βν)) [uν]µ = i √(α − βν) [uν]µ.  (2.12)

Since our rµ will be two dimensional, let ν = n correspond to the x-component and ν = n − 1 correspond to the y-component of the vector, i.e.

rµ = (i √(α − βn) [un]µ) x̂ + (i √(α − βn−1) [un−1]µ) ŷ,  (2.13)

where x̂ and ŷ are unit vectors in the x and y directions respectively. The community vectors Rk will be sums of such vectors and so will be of the form

Rk = i a x̂ + i b ŷ,  (2.14)

where a, b ∈ ℝ, so that

|Rk|² = −(a² + b²).  (2.15)
While I like this interpretation from a theoretical point of view, my program
for finding approximately bipartite clusters works in the same way as if I had
taken Newman’s redefinitions at eqns. (2.10), (2.11). Obviously the vectors
have to be plotted with real components, but this approach helps to explain the
redefinition of modularity for the bipartite case at eqn. (2.11). It will also be
of much use later when explaining my approach for using β1 and βn .
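Operationally, then, the program follows the real-valued form of eqns. (2.10) and (2.11). A minimal NumPy sketch (the complete bipartite test network K3,3 and all variable names are mine, not from the report):

```python
import numpy as np

# A perfectly bipartite test network: complete bipartite K_{3,3},
# nodes 0-2 on one side and 3-5 on the other.
A = np.zeros((6, 6))
A[:3, 3:] = 1.0
A[3:, :3] = 1.0
k = A.sum(axis=1)
two_m = k.sum()
B = A - np.outer(k, k) / two_m

p = 2
beta, U = np.linalg.eigh(B)     # ascending order: beta_n comes first
alpha = beta[p:].mean()         # mean of the eigenvalues not being used
r = np.sqrt(alpha - beta[:p]) * U[:, :p]   # eqn (2.10), all real

# Splitting by the sign of the dominant component recovers the two sides;
# eqn (2.11), with the 1/2m prefactor restored, then gives the minimum.
side = r[:, 0] >= 0
R1, R2 = r[side].sum(axis=0), r[~side].sum(axis=0)
Q_min = (len(A) * alpha - (R1 @ R1 + R2 @ R2)) / two_m
```

For K3,3 this yields Q = −1/2 exactly, the lower bound on modularity quoted in section 1.2.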
I amended my program to look for the smallest modularity using the eigenvectors with the lowest eigenvalues. The vector representation for the approximately bipartite network is shown in fig. 2.7, as is the line dividing the nodes at
the point of minimum modularity. The clusters suggested by the program for
this network didn’t reflect the groups I had intended at all. It was only when
I redrew the network arranged according to the clusters recommended by the
algorithm that I saw a much better arrangement (see fig. 2.8b). As I had drawn
it the modularity was quite low at -0.233, but the new arrangement found by
the program reduced it much further to -0.436. The program managed to beat
my attempts to draw bipartite networks in this way on more than one occasion,
indicating that these things are hard to draw.
Figure 2.7: Vector representation of the approximately bipartite network shown
in fig. 2.6. The blue line shows the division of the nodes at minimum modularity.
Interestingly, this groups the nodes in a way (see fig. 2.8b) completely different to, and much better than, the one I expected.
(a) My attempt at drawing an approximately
bipartite network
(b) Better cut of the same network
Figure 2.8: My attempt at drawing two clusters of nodes in an approximately
bipartite fashion (left), and the clusters suggested by the program (right). The
modularity for the clusters I intended is −0.233, while the lowest modularity
found by the program in the new arrangement is −0.436.
Chapter 3
Anti-bipartite approaches to Vector Partitioning
3.1 Using both positive and negative eigenvalues to find clusters in ordinary networks
While studying the approaches to splitting both ordinary networks and those
that are approximately bipartite, it became apparent to me that u1 and un (the
eigenvectors corresponding to β1 and βn ) seemed to ‘know’ the most about how
a network was connected. I wanted to see if both of these vectors could be used
together to find clusters in an ordinary network. Some of the inspiration for
this idea came from looking at eqn. (2.3) and wondering if “stuff 1” could be
maximised and “stuff n” simultaneously minimised to achieve a modularity as
good as, or better than the existing procedure. I felt that rejecting a partition
with a low number of links within clusters would aid a search for a high number
of links within clusters. I set about altering my program to give the user the
option of defining the vectors rµ by
[rµ]ν = √(α − βν) Uµν,   ν = 1, n.   (3.1)
Letting the ν = 1 component correspond to x co-ordinates and ν = n component
correspond to y co-ordinates, we get
rµ = ( √(β1 − α) [u1]µ ) x̂ + ( √(βn − α) [un]µ ) ŷ.   (3.2)
Since the x component will be real and the y component will be imaginary, this
is equivalent to
rµ = ( √(β1 − α) [u1]µ ) x̂ + ( i√(α − βn) [un]µ ) ŷ.   (3.3)
Hence the community vectors will be of the form
Figure 3.1: Vector representation of the network in fig. 2.1a using the eigenvectors
of the highest and lowest eigenvalues.
Figure 3.2: Plot of modularity versus angle of dividing line using the new method
for the network shown in fig. 2.1a.
Rk = a x̂ + i b ŷ,   a, b ∈ ℝ,   (3.4)

and their squared magnitudes will have both positive and negative terms:

|Rk|² = a² − b².   (3.5)
Since the modularity is still given by

Q = nα + Σ_{k=1}^{c} |Rk|²,   (3.6)
we will effectively be trying to maximise the x components of the community
vectors and minimise their y components. We shall use the previous network
example shown in fig. 2.1a to see if this works. The vector representation is
shown in fig. 3.1. Again, the vectors are plotted as though their components were real,
but the imaginary aspect of the y components has given us eqn. (3.5).
Using the new method of finding the magnitude of community vectors, we
get the plot of modularity versus angle of the dividing line shown in fig. 3.2.
The maximum modularity is again at around 90◦, which corresponds to the cut we
had hoped for, as shown in fig. 3.3. This method found the correct clusters in
about a dozen other sample networks that I came up with.
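As a sketch, the whole procedure of this section — eqn. (3.3) together with the rotating dividing line and the mixed-sign magnitude of eqn. (3.5) — might be implemented as below. The choices α = 0 (any α with βn ≤ α ≤ β1 keeps both square roots real) and a one-degree sweep step are illustrative assumptions, not taken from the thesis program.

```python
import numpy as np

def split_with_beta1_betan(A, alpha=0.0, step_deg=1.0):
    """Sweep a dividing line through the origin; on each side the node
    vectors are summed into a community vector whose contribution to Q is
    a**2 - b**2 (the y component is imaginary, hence the minus sign)."""
    n = len(A)
    k = A.sum(axis=1)
    B = A - np.outer(k, k) / k.sum()
    vals, vecs = np.linalg.eigh(B)                 # ascending eigenvalues
    x = np.sqrt(vals[-1] - alpha) * vecs[:, -1]    # beta_1 part, real
    y = np.sqrt(alpha - vals[0]) * vecs[:, 0]      # beta_n part, imaginary
    best = None
    # 180 degrees is enough: swapping the two sides leaves Q unchanged
    for theta in np.radians(np.arange(0.0, 180.0, step_deg)):
        side = np.cos(theta) * y - np.sin(theta) * x >= 0
        q = n * alpha
        for mask in (side, ~side):
            a, b = x[mask].sum(), y[mask].sum()
            q += a * a - b * b                     # |R_k|^2 with imaginary y
        if best is None or q > best[0]:
            best = (q, side)
    return best[1]

# Two triangles {0,1,2} and {3,4,5} joined by the single edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
print(np.where(split_with_beta1_betan(A))[0])      # one of the two triangles
```

On this two-triangle example the best dividing line is the vertical one, separating the triangles exactly.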
3.2 Adjusting the vectors when βn and βn−1 are used
From the previous section, it seems possible that the most negative eigenvalue
βn can be used to find clusters in normal (non-bipartite) networks. I decided
to investigate if clusters could be found using only the two most negative eigenvalues, βn and βn−1 . At first it seemed like a good idea to plot the vectors
using the eigenvectors un and un−1 as though the network was approximately
bipartite, and then look for the highest modularity. This never worked, however, for reasons which are best understood by studying plots of the vectors
in this scenario. Fig. 3.4a shows the vectors for our sample network using un
and un−1 . We would like the vectors 1 - 5 to be in one cluster, and 6 - 11 in
another. However, this method of using βn and βn−1 separates nodes that are
closely connected to opposite sides of the origin. Vectors that should be added
together to form a community vector end up directly opposite each other, such
as the points 4 and 5 or 9 and 11. Therefore we will never group them correctly
by simply dividing them by a line at any angle, regardless of whether we are
looking for high or low modularity.
However, the fact that neighbouring nodes form vectors that are oppositely
directed can be useful. Nodes that are closely connected tend to form ‘jets’
of vectors pointing in opposite directions. In fig. 3.4a vectors 1 - 6 seem to
form two horizontal jets pointing left and right, while vectors 7 - 11 form jets
pointing upwards and downwards. I decided to reflect all the vectors through
Figure 3.3: Vector representation of the network in fig. 2.1a using the eigenvectors of the
highest and lowest eigenvalues. The two clusters are coloured in red and green,
and the blue line indicates the line dividing the clusters at the point of maximum
modularity.
(a) Vector representation using the lowest eigenvalues. (b) The same plot with vectors reflected through the origin.
Figure 3.4: Vector representation of the network in fig. 2.1a using the eigenvectors of the lowest eigenvalues. The vectors seem to form horizontal and vertical
‘jets’ corresponding to neighbouring nodes (left). It was therefore decided to
reflect all the points through the origin so they could form groups again (right).
the origin to exploit this pattern and get neighbouring nodes beside each other
again. This can be seen in fig. 3.4b.
Up to now, a dividing line was rotated while the vectors to each side were
added to form the community vectors. Now, with jets of neighbouring vectors
at intervals of roughly 90◦ it seemed logical to me to only add vectors within
90◦ of one end of the rotating line. This means that while there are two vectors
for every node, only the vector nearest one end of the dividing line contributes
to a community vector.
Since the vectors had been reflected in this way, I decided that the imaginary
nature of the y components should be ignored, and the community vectors
calculated in the original way with real components. The result of calculating
the modularity in this way as the dividing line made its way through 360◦ is
shown in fig. 3.5. In this plot we see that the pattern repeats itself not twice, but
Figure 3.5: Plot of modularity versus angle of dividing line for the vectors given in
fig. 3.4b. Only vectors within 90◦ of one end of the dividing line contributed to
the community vectors R1 and R2.
four times, since every vector now appears within a 180◦ range. The maximum
modularity is found at around 45◦. This almost gives us the correct cut: nodes
6 and 7 are grouped with 1–5 instead of with 8–11. This is still a very good
result given that the input information, un and un−1, is normally used to solve
a completely different type of problem. If we move 7 to its correct group, then
the approximate modularity calculated by this method only drops by 0.08%
(of course, when calculated exactly, this change causes Q to increase). A very
encouraging point here is that if we look at fig. 3.4b and go clockwise
from 2, we pass by every point in the same order as if we were going from left
to right in the diagram of the network (see fig. 2.1a).
This technique was tested on a variety of other test networks. It gave perfect
results about half of the time and made one mistake about a quarter of the time.
Even in the cases where it made more than one mistake, the result showed a
good attempt at finding the clusters, at no stage bearing a similarity to a random
allocation of nodes to clusters.
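The description above leaves a little freedom in exactly how the 90◦ window is applied. The sketch below is one reading of it, and only that: each rµ (built here from un and un−1, with α = 0 assumed so both weights are real) is duplicated at ±rµ, only the copy within 90◦ of one end of the line is kept, and the line splits that half-disk into two quadrants which supply R1 and R2; all components are treated as real, as in the text. The quadrant assignment is my interpretation, not necessarily the thesis program's.

```python
import numpy as np

def split_with_two_lowest(A, step_deg=1.0):
    """Cluster using the eigenvectors of the two most negative eigenvalues,
    with every node vector reflected through the origin (assumes the two
    lowest eigenvalues of the modularity matrix are negative)."""
    k = A.sum(axis=1)
    B = A - np.outer(k, k) / k.sum()
    vals, vecs = np.linalg.eigh(B)              # ascending eigenvalues
    x = np.sqrt(-vals[0]) * vecs[:, 0]          # beta_n component
    y = np.sqrt(-vals[1]) * vecs[:, 1]          # beta_{n-1} component
    phi = np.arctan2(y, x)                      # direction of each r_mu
    best = None
    # the pattern repeats every 90 degrees, so a quarter turn is enough
    for theta in np.radians(np.arange(0.0, 90.0, step_deg)):
        # angle of the copy nearest the theta end, relative to the line
        rel = (phi - theta + np.pi / 2) % np.pi - np.pi / 2
        in_first = rel >= 0                     # quadrant (theta, theta + 90)
        flip = np.cos(phi - theta) < 0          # -r_mu was the nearer copy
        sx, sy = np.where(flip, -x, x), np.where(flip, -y, y)
        q = sum(sx[m].sum() ** 2 + sy[m].sum() ** 2 for m in (in_first, ~in_first))
        if best is None or q > best[0]:
            best = (q, in_first)
    return best[1]
```

Under this reading each node contributes exactly one of its two copies, and rotating the line by 90◦ leaves the two quadrant sums unchanged in magnitude, reproducing the four-fold repetition seen in fig. 3.5.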
3.3 Adjusting the vectors when β1 and βn are used
My real interest was in using the eigenvectors u1 and un , since they seemed to
know the most about the structure of the network and they had already proven
useful in Section 3.1. I wanted to plot all the nodes as vectors built from these
eigenvectors and then look for ways of discerning the clusters from that plot.
Let the components of the ri be as before, with those corresponding to u1 on
the x-axis and those corresponding to un on the y-axis. From what we have seen
so far with this set-up, we now expect that nodes that are close together
in the network will be close to each other in the x direction, but separated
from each other in the y direction. We must take this latter effect and turn it
around in some way similar to what was done in Section 3.2 when all the vectors
were reflected through the origin. I decided to just reflect the vectors in the y
direction, since the nodes are already gathered correctly in the x direction. Plots
of these vectors and the original ri before the reflection operation are shown in
figs. 3.6b and 3.6a respectively.
In previous approaches, we were able to identify the clusters by rotating a
line through 360◦ and, as before, grouping vectors in some region to its left or right.
This will not work this time however, since there is the potential for double
counting of vectors, especially those lying close to the x-axis whose reflection
will land right next to themselves. It is hard to imagine an algorithm related to
those seen already that counts some vectors twice and others only once. For this
reason I decided to neglect the negative y region entirely, and focus on splitting
those vectors ri above the x-axis.
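This reflection-and-sweep procedure can be sketched as follows (again with α = 0 assumed, and the function name my own): the y components from un are folded into the upper half-plane with an absolute value, and the dividing line is rotated through 180◦, with all components treated as real.

```python
import numpy as np

def split_reflected(A, step_deg=1.0):
    """x from u1, y from un as in Section 3.1, with every vector
    reflected into the upper half-plane before the line sweep."""
    k = A.sum(axis=1)
    B = A - np.outer(k, k) / k.sum()
    vals, vecs = np.linalg.eigh(B)                 # ascending eigenvalues
    x = np.sqrt(vals[-1]) * vecs[:, -1]            # beta_1 component
    y = np.abs(np.sqrt(-vals[0]) * vecs[:, 0])     # beta_n component, reflected up
    best = None
    for theta in np.radians(np.arange(0.0, 180.0, step_deg)):
        side = np.cos(theta) * y - np.sin(theta) * x >= 0
        q = sum(x[m].sum() ** 2 + y[m].sum() ** 2 for m in (side, ~side))
        if best is None or q > best[0]:
            best = (q, side)
    return best[1]
```

On the two-triangle example of Section 3.1 this finds the same correct cut, with the maximum at the vertical line θ = 90◦, i.e. a split by the sign of the u1 coordinate alone — consistent with the observation below that the maximum almost always occurs at 90◦.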
With this set-up, I can now introduce a dividing line rotating about the
origin that groups vectors ri to its left and right into community vectors Rk .
At every angle we once more measure the lengths of the community vectors and
calculate the modularity using (2.9). The maximum modularity was found at
90◦ (see fig. 3.7), corresponding exactly to the cut that we wanted (see fig. 3.8).
This procedure worked for every test network it was tried on, with two
interesting points to note. The first is that the modularity was almost always
found to be at its maximum when the vectors were cut at 90◦ . This in itself
is not a good sign, since whether or not a vector lies to the left or right of a
vertical line through the origin depends only on its x co-ordinate, which here
depends solely on u1 , so it is possible that u1 is playing no important role at
all.
The second point I noticed is that as we rotate the line from 0◦ to 180◦ and
look at each vector as we pass it, the order we pass them in is
almost always the same as their order from left to right. This helpful feature is
due in part to un , since any random allocation of the y co-ordinates would not
(a) Vector representation using the highest and lowest eigenvalues
(b) Plot of same vectors reflected above the x-axis
Figure 3.6: On the left is the vector representation for the network shown in
fig. 2.1a using u1 and un . On the right any vectors below the x-axis have been
reflected through it.
Figure 3.7: Plot of modularity when using β1 and βn. We can see that the
modularity reaches a maximum at 90◦.
Figure 3.8: Plot of vectors divided into their correct clusters using β1 and βn .
The clusters are coloured red and green, and the line dividing them at maximum
modularity is coloured blue.
Figure 3.9: The sample network, with the cut indicated by the dashed
red line.
give us the same result. In particular, we can see in fig. 3.8 that the vectors
closest to the line which divides the clusters - 4, 5 and 6 - are the nodes which are
closest to where the cut lies in the actual network (see fig. 3.9). Similarly those
vectors which are furthest away from the dividing line - 9, 11 and 2 - belong to
the nodes which are furthest away from the cut’s position in the network.
As one further test of this feature I decided to use both the existing method
(using u1 and u2 ) and my new method (using u1 and un ) on a network of
ten nodes connected in a line as shown in fig. 3.10. Both methods found
Figure 3.10: Diagram of a linear network.
the maximum modularity by correctly grouping 1–5 and 6–10 into clusters, and
the vector representation for each procedure is very similar. In both diagrams
(a) Vector representation of the linear network using u1 and u2. (b) Vector representation of the linear network using u1 and un.
Figure 3.11: These plots are both vector representations of the network shown
in fig. 3.10. The one on the left uses u1 and u2 while the one on the right
uses u1 and un , and has had all its vectors flipped above the x-axis. Both
methods identified the correct cut with nodes 1-5 in one cluster (red) and 6-10
in the other (green). The line dividing the clusters is in blue. The layout of the
vectors using each method is very similar.
(see fig. 3.11) we can see that the vectors are arranged in an angular fashion
according to the order they appear in the network. Even the angles between
adjacent pairs of vectors in each diagram are similar in scale.
Chapter 4
Conclusions and Further Work
We have seen, particularly from Sections 3.1 and 3.3, that the eigenvector un
which corresponds to the lowest eigenvalue βn of the modularity matrix B can
be used to find clusters in normal networks (those with a higher density of
edges within clusters). When used in a form of vector partitioning which allows
components of vectors to be imaginary, this eigenvector has proven to be just
as useful as u2 , whose associated eigenvalue is positive and is more normally
used for this purpose. The most promising evidence for this is in the way the
arrangement of the vectors associated with the nodes represents their position
in the network, as seen in figs. 3.8 and 3.11.
A potential strategy for those who use p positive eigenvalues in network
partitioning would be to instead use the p eigenvalues with the largest magnitude. They could then search for a modified modularity, subtracting all the
components of the community vectors corresponding to eigenvectors of negative
eigenvalues. To test this method properly would require that a program be
written which generates random networks with intended community structure.
This would be done by letting the probability that nodes within the same cluster are connected be P1, and that for nodes in different clusters be P2, with P1 > P2.
Then the performance of this strategy could be compared to existing algorithms
while varying n and some function of P1 and P2 .
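Such a generator is straightforward to sketch; the interface below (function name, NumPy usage) is hypothetical:

```python
import numpy as np

def planted_partition(sizes, p1, p2, seed=None):
    """Random undirected network with intended community structure:
    nodes in the same cluster are linked with probability p1, nodes in
    different clusters with probability p2, where p1 > p2."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(len(sizes)), sizes)
    n = len(labels)
    prob = np.where(np.equal.outer(labels, labels), p1, p2)
    upper = np.triu(rng.random((n, n)) < prob, k=1)   # each pair sampled once
    A = (upper | upper.T).astype(float)               # symmetric, zero diagonal
    return A, labels
```

With, say, two clusters of twenty nodes, P1 = 0.8 and P2 = 0.05, the resulting adjacency matrix has a much higher edge density within clusters than between them, which is exactly the setting the proposed comparison would need.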
We have seen in Section 3.3 the ability of the anti-bipartite approach to
represent the structure of networks, in terms of how firmly a vector belongs to
its assigned cluster (see figs. 3.11, 3.9). I think an interesting test of this would
be to calculate the modularity exactly for each of the n + 1 possible positions
of the dividing line in these diagrams. This would show if the arrangement
of vectors is in order of increasing modularity up to the maximum point, and
then decreasing after. I have a suspicion that this is the case, and that this
would allow us to immediately grade all nodes in terms of their firmness in, or
‘allegiance’ to, their cluster.
Bibliography
[1] M. Girvan and M. E. J. Newman, “Community structure in social and biological networks,” PNAS, vol. 99, no. 12, pp. 7821–7826 (2002).
[2] M. E. J. Newman, “The Structure and Function of Complex Networks,” SIAM Review, vol. 45, pp. 167–253 (2003).
[3] M. Parter, N. Kashtan, and U. Alon, “Environmental variability and modularity of bacterial metabolic networks,” BMC Evolutionary Biology, vol. 7, 169 (2007).
[4] G. W. Flake, S. Lawrence, C. L. Giles, and F. M. Coetzee, “Self-Organization and Identification of Web Communities,” IEEE Computer, vol. 35, no. 3, pp. 66–71 (2002).
[5] D. J. Watts and S. H. Strogatz, “Collective dynamics of ‘small-world’ networks,” Nature, vol. 393, pp. 440–442 (1998).
[6] G. P. Garnett, J. P. Hughes, R. M. Anderson, B. P. Stoner, S. O. Aral, W. L. Whittington, H. H. Handsfield, and K. K. Holmes, “Sexual mixing patterns of patients attending sexually transmitted diseases clinics,” Sexually Transmitted Diseases, vol. 23, pp. 248–257 (1996).
[7] M. E. J. Newman, “Finding community structure in networks using the eigenvectors of matrices,” Phys. Rev. E, vol. 74, 036104 (2006).
[8] B. W. Kernighan and S. Lin, “An efficient heuristic procedure for partitioning graphs,” Bell System Technical Journal, vol. 49, no. 2, pp. 291–307 (1970).
[9] A.-L. Barabási and R. Albert, “Emergence of Scaling in Random Networks,” Science, vol. 286, pp. 509–512 (1999).
[10] U. Brandes et al., “On Modularity Clustering,” IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 2, pp. 172–188 (2008).
[11] Wikipedia article on “Modularity (networks)”, available at: http://en.wikipedia.org/wiki/Modularity_(networks)
[12] R. Guimerà, M. Sales-Pardo, and L. A. Nunes Amaral, “Modularity from fluctuations in random graphs and complex networks,” Phys. Rev. E, vol. 70, 025101(R) (2004).
[13] A. Arenas, J. Duch, A. Fernández, and S. Gómez, “Size reduction of complex networks preserving modularity,” New Journal of Physics, vol. 9, 176 (2007).
[14] Facebook online Press Room, available at: http://www.facebook.com/press/info.php?statistics
[15] A. F. Bogaert, “The prevalence of male homosexuality: the effect of fraternal birth order and variations in family size,” Journal of Theoretical Biology, vol. 230, no. 1, pp. 33–37 (2004).
[13] Arenas, Duch, Fernández and Gómez, “Size reduction of complex networks
preserving modularity,” New Journal of Physics Vol. 9 no. 176 (2007)
[14] Facebook online Press Room, available at:
http://www.facebook.com/press/info.php?statistics
[15] A. F. Bogaert. “The prevalence of male homosexuality: the effect of fraternal birth order and variations in family size,” Journal of Theoretical
Biology Vol. 230, Issue 1, 33-37 (2004).
Appendix A
Range of Modularity
We had the modularity given by

Q = (1/2m) Σij [ Aij − ki kj/(2m) ] δ(gi, gj).   (A.1)
Here the delta function makes sure that we are only adding terms that are in
the same cluster. We can equivalently call the clusters Ck and sum over k from
1 to c instead (c is the number of clusters), including only those nodes within
that cluster:
Q = (1/2m) Σ_{k=1}^{c} Σij [ Aij − ki kj/(2m) ],   i, j ∈ Ck.   (A.2)
The second term here can also be expressed as the square of the sum of all the
degrees:

Q = (1/2m) Σ_{k=1}^{c} [ Σij Aij − (1/2m) ( Σi ki )² ],   i, j ∈ Ck.   (A.3)
We now think of the degrees of the nodes in terms of endpoints of edges.
These edges are either completely contained in the cluster (intra-cluster edges)
or travel from cluster Ck to some other cluster (inter-cluster edges). The sum
of all the degrees must equal the number of endpoints of inter- and intra-cluster
edges. Thus the sum of all the degrees represents double the number of intracluster edges (they are double counted because both ends are in cluster) plus
the number of inter-cluster edges. Let us call the number of intra-cluster edges
of cluster k mk , and the number of inter-cluster edges m̂k . Thus we have
( Σi ki )² = (2mk + m̂k)²,   i ∈ Ck.   (A.4)
In fact, the sum over Aij is also a sum over all edges in the cluster, again with
double counting. With this in mind the modularity is given by
Q = (1/2m) Σ_{k=1}^{c} [ 2mk − (2mk + m̂k)²/(2m) ]
  = Σ_{k=1}^{c} [ mk/m − ( (2mk + m̂k)/(2m) )² ].   (A.5)
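The equality of the edge-count form (A.5) with the original δ-function definition (A.1) is easy to confirm numerically; the two helper functions below are illustrative only:

```python
import numpy as np

def q_delta(A, g):
    """Modularity from the delta-function definition (A.1)."""
    k = A.sum(axis=1)
    two_m = k.sum()
    B = A - np.outer(k, k) / two_m
    return (B * np.equal.outer(g, g)).sum() / two_m

def q_edge_counts(A, g):
    """Modularity from (A.5): sum over clusters of
    m_k/m - ((2*m_k + mhat_k) / 2m)**2."""
    m = A.sum() / 2
    q = 0.0
    for c in np.unique(g):
        inside = g == c
        m_k = A[np.ix_(inside, inside)].sum() / 2     # intra-cluster edges
        mhat_k = A[np.ix_(inside, ~inside)].sum()     # inter-cluster edges
        q += m_k / m - ((2 * m_k + mhat_k) / (2 * m)) ** 2
    return q
```

For two triangles joined by a single edge, split into the two triangles, both forms give Q = 5/14, as a hand calculation with mk = 3, m̂k = 1 and m = 7 confirms.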
As we would expect, decreasing the number of inter-cluster edges m̂k has the
effect of increasing the modularity. We will therefore let this equal zero to find
the maximum modularity:
Q = Σ_{k=1}^{c} [ mk/m − (mk/m)² ] = Σ_{k=1}^{c} mk(m − mk)/m².   (A.6)
From this we see that Q is maximised if all the mk are equal. Allowing this,
and using c × mk = m, we have

Q = c · mk(m − mk)/m² = 1 − mk/m = 1 − 1/c.   (A.7)
This is maximised by letting the number of clusters, c, approach infinity. Thus
Q has a maximum value of one. Note that with two clusters, this equation also
proves that the upper bound for Q is 1/2.
Some authors have called the quantity mk /m the “coverage” of cluster k and
have defined it to be one when m, and thus mk , are equal to zero. With this
alternative, modularity of exactly one can be achieved. This is not the approach
taken in this report.
The lower bound can also be calculated from eqn. (A.5). Since the modularity is strictly decreasing in m̂k , we must then minimise mk . We get
Q = − Σ_{k=1}^{c} ( m̂k/(2m) )².   (A.8)
Again, this reaches a limit when all the m̂k are equal. This time we note that
c × m̂k = 2m, since each inter-cluster edge is counted twice, which gives us
Q = −c ( m̂k/(2m) )² = −c (1/c)² = −1/c.   (A.9)
Thus the modularity is minimised when c is small. Since we must have at least
one inter-cluster edge (excluding any uninteresting cases where there are no
links or no clusters), the smallest c can be is two. Thus the lower bound for
modularity is −1/2.
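As a quick numerical illustration (my example, not from the report): splitting a complete bipartite graph into its two sides makes every edge inter-cluster, and with c = 2 this attains the bound:

```python
import numpy as np

# Complete bipartite graph K_{4,4}: every edge runs between the two sides.
n = 4
A = np.zeros((2 * n, 2 * n))
A[:n, n:] = 1.0
A[n:, :n] = 1.0
g = np.array([0] * n + [1] * n)               # split the network along the two sides
k = A.sum(axis=1)
two_m = k.sum()
B = A - np.outer(k, k) / two_m
Q = (B * np.equal.outer(g, g)).sum() / two_m  # definition (A.1)
print(Q)                                      # -0.5, the lower bound
```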