Anti-Bipartite Approach to Modularity Optimisation in Networks

Mathematics Course 499
Brendan O'Dowd
School of Mathematics
Trinity College, University of Dublin

With thanks to supervisor Dr. Conor Houghton

May 14, 2009

Abstract

Many real networks exhibit community structure, whereby nodes tend to form clusters with a higher density of edges within them than between them. I propose new algorithms for cluster detection in networks, all based on vector partitioning, which use eigenvectors of the modularity matrix more normally used for analysis of bipartite networks. By adopting an 'anti-bipartite' approach to cluster detection, my algorithms found the correct partitions of several sample networks. Furthermore, the vector representation of this approach shows that these eigenvectors contain information about the structure of the network.

Contents

1 Introduction to Networks and Modularity
  1.1 Networks and Clustering
  1.2 Modularity

2 Vector Partitioning
  2.1 Background to Vector Partitioning
  2.2 Testing Vector Partitioning
  2.3 Bipartite networks and negative modularity

3 Anti-bipartite approaches to Vector Partitioning
  3.1 Using both positive and negative eigenvalues to find clusters in ordinary networks
  3.2 Adjusting the vectors when β_n and β_{n-1} are used
  3.3 Adjusting the vectors when β_1 and β_n are used

4 Conclusions and Further Work

A Range of Modularity

List of Figures

2.1 Sample network
2.2 Modularity chart of sample network
2.3 Best cut of sample network
2.4 Facebook friend network diagram
2.5 Facebook friend network – Trinity College subsection
2.6 Bipartite network
2.7 Vector representation of approximately bipartite network

3.1 Vector representation using high and low eigenvalues
3.2 Modularity plot using high and low eigenvalues
3.3 Vector representation using high and low eigenvalues with cut
3.4 Plot using low eigenvalues for normal network
3.5 Modularity plot with lowest eigenvalues and vectors reflected through the origin
3.6 Plot using high and low eigenvalues for normal network
3.7 Plot of modularity when using high and low eigenvalues
3.8 Partitioned vector representation of the network using high and low eigenvalues
3.9 Sample network showing location of cut between clusters
3.10 Diagram of a linear network
3.11 Comparison of methods using linear network

Chapter 1

Introduction to Networks and Modularity

1.1 Networks and Clustering

A wide range of areas under research can be represented as networks, which are composed of a set of nodes that may be connected to one another by edges. Examples include social networks, where people may be connected if they are friends or are related through family or background, and biological networks such as food webs, genetic networks, protein networks, metabolic systems and neural networks [1][2][3]. Other networks of interest are citation networks, the internet, power grids, communication and distribution networks and many more besides [2][4][5][6].
Of particular interest is a common phenomenon known as community structure or 'clustering' within networks, whereby nodes in a network tend to form groups with a higher density of connections within them than between them [7]. This behaviour usually corresponds to some real grouping or interdependence. For example, clusters in a social network may correspond to groups of people that work closely together or have similar interests, and clusters appearing in a cell network may indicate that those cells perform some special function together [1][3]. In the former example, where we may have expected to find communities, it may be of practical use to locate a specific group of people. In the latter example, finding community structure that we did not expect to see may help our understanding of how the network operates on a macro level.

Our natural ability to identify clusters depends on the size of the network, how that network is presented and our familiarity with the entities of which that network is comprised. For many networks under current investigation these factors prevent us from identifying clusters manually in any reasonable amount of time. Thus the problem of identifying and quantifying community structure becomes a computational task, and as such it has developed considerably following the arrival of high-performance computing in the mid-1980s. Of key concern here is how a computer program might identify clusters within a network, which requires that we try to convert our own intuition of what a cluster is into something more specific.

In this paper I consider only networks that are unweighted and undirected, i.e. the magnitude and direction of the connection between two nodes are ignored. An example of a weighted network would be a road network which takes into account the average volume of traffic between any two connected points, A and B.
An example of a directed network would be an information flow network: Ann tells Brendan a story, who in turn tells the story to Carol.

1.2 Modularity

One of the earliest methods of separating clusters was graph partitioning [8]. This method arose from a need to increase computational performance by assigning tasks (e.g. calculations) to different processors within a computer. Many of these tasks would have to communicate with one another before being carried out, as they may rely on the results of a separate calculation. Typically communication is much faster within a processor than between processors. The goal, therefore, was to minimise the number of times that data had to be transferred from processor to processor. In network terms this means grouping nodes into clusters such that the number of connections (or edges) between the clusters is minimised. The number of edges between clusters is called the cut size.

While minimisation of the cut size works well for computational purposes, it does not seem to divide groups of nodes into communities as we understand them [7]. One of the reasons is that the number of nodes in each group must first be given, which is unlikely to be available in any real scenario other than a computer network. A more fundamental reason is that a good division into separate clusters may not be one in which the number of edges between groups is minimised. An alternative approach is suggested by M. E. J. Newman, who says that a good division "is one in which the number of edges between groups is smaller than expected" [7]. Equally, this means finding a division in which the number of edges within groups is larger than expected. With this latter interpretation in mind he introduces Modularity, Q, which he defines in a conceptual sense by

    Q = (number of edges within clusters) − (expected number of such edges).   (1.1)

The job becomes one of allocating nodes to clusters, and finding the arrangement where Q is maximised.
It is important to note here that modularity is all about telling us how good a division of a network is. One of the problems with graph partitioning is that it finds a division of the network regardless of whether a natural division actually exists. If there is no natural clustering behaviour in a network, then a good cluster detection approach should have some sense that this is the case. The modularity approach has this advantage, since it is used to quantify as well as detect the clusters.

The first term in eqn. (1.1) is calculated via the n × n adjacency matrix A (where n is the number of nodes in the network), with entries given by

    A_ij = 1 if node i is connected to node j, and 0 otherwise.   (1.2)

Note that we assume that no node is self-connected, i.e. A_ii = 0 for all i ∈ [1, n]. The degree of node i, denoted k_i, is the number of other nodes connected to it, and can be calculated from A thus:

    k_i = Σ_{j=1}^{n} A_ij.   (1.3)

If m is the total number of edges in the network, then the sum of all the degrees k_i will be double this amount, 2m, since each edge is counted by the degrees of two separate nodes. If we let g_i be the cluster to which node i belongs, then the total number of edges within clusters is given by

    Σ_ij A_ij δ(g_i, g_j),   (1.4)

where δ(g_i, g_j) equals one if g_i = g_j and zero otherwise (note that this sum counts each within-cluster edge twice).

What we mean by the expected number of edges within clusters requires a little more thought. How many edges would you expect there to be between nodes i and j? It is useful to imagine this as the probability of i and j being connected if all the edges were taken up and laid between nodes at random. Let us denote this value P_ij. We will assume that the expected degree of each node is the same as in the real network, i.e.

    k_i = Σ_j A_ij = Σ_j P_ij.   (1.5)

This assumption is made so that the random model will have the same degree distribution as the original network. This is thought to be important, since almost all real networks have a right-skewed degree distribution [9].
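As a concrete illustration of eqns. (1.2)–(1.4), the quantities above can be computed in a few lines. The sketch below is in Python rather than the C used for my actual programs, and the six-node network (two triangles joined by a single edge) and its cluster labels are invented purely for illustration:

```python
import numpy as np

# Toy 6-node network: two triangles {0,1,2} and {3,4,5} joined by one edge.
# This network is invented for illustration only.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = A[j, i] = 1        # eqn (1.2); A[i, i] stays 0

k = A.sum(axis=1)                # degrees, eqn (1.3)
m = k.sum() // 2                 # sum of degrees is 2m

g = np.array([0, 0, 0, 1, 1, 1]) # cluster label g_i of each node
# Eqn (1.4): sum of A_ij over pairs in the same cluster (counts each edge twice).
within_doubled = (A * (g[:, None] == g[None, :])).sum()
print(m, within_doubled // 2)    # 7 edges in total, 6 of them within clusters
```

Here the doubled count of eqn. (1.4) is halved only for display; the factor of two is absorbed by the 1/2m prefactor introduced later.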
The assumption that the degrees of the nodes in the random network remain the same implies that the total number of edges is also unchanged:

    Σ_i k_i = Σ_ij A_ij = Σ_ij P_ij = 2m.   (1.6)

We assume that the probability of an edge being connected to node i is solely a function of its degree, which we can call f(k_i). This means that the probability of nodes i and j being connected is a product of these functions,

    P_ij = f(k_i) f(k_j).   (1.7)

Summing over j gives

    Σ_{j=1}^{n} P_ij = f(k_i) Σ_{j=1}^{n} f(k_j) = k_i.   (1.8)

Since this is true for all i and the summation over f(k_j) does not depend on i, we see that f(k_i) is simply a constant (C, say) times k_i. If we now sum over i as well we must get 2m, from (1.6). Therefore we have

    P_ij = f(k_i) f(k_j) = C k_i · C k_j,   (1.9)

which implies

    Σ_ij P_ij = C² Σ_i k_i Σ_j k_j   (1.10)

and so

    2m = C² (2m)(2m),  i.e.  C = 1/√(2m).   (1.11)

Using (1.7) we can now write a simple, calculable expression for P_ij,

    P_ij = k_i k_j / 2m,   (1.12)

and the expected number of edges within clusters:

    Σ_ij (k_i k_j / 2m) δ(g_i, g_j).   (1.13)

Note that this expression for P_ij is easily calculated. Alternative forms for P_ij are suggested by other authors, but the form presented here has proven successful in several studies and will be used throughout this report. The modularity is now given by

    Q = (1/2m) Σ_ij [A_ij − P_ij] δ(g_i, g_j)   (1.14)
      = (1/2m) Σ_ij [A_ij − k_i k_j / 2m] δ(g_i, g_j),   (1.15)

where the prefactor of 1/2m is included to ensure that the maximum modularity is one, in line with other measures of network division (such as the clustering coefficient [1]) and previous definitions of modularity. The minimum possible value for modularity is −1/2 [10], despite numerous sources (including Wikipedia [11]) claiming that the lower bound is −1. A derivation of these limits is given in Appendix A.
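Eqns. (1.12)–(1.15) translate directly into code. The Python sketch below (the six-node two-triangle toy network and the partitions are invented for illustration) computes Q for a sensible split, and for the trivial split with everything in one cluster, which gives exactly zero because the row sums of A and P agree:

```python
import numpy as np

# Toy network: two triangles joined by one edge (invented for illustration).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1
k = A.sum(axis=1)
two_m = k.sum()

P = np.outer(k, k) / two_m       # P_ij = k_i k_j / 2m, eqn (1.12)

def modularity(g):
    """Q from eqn (1.15): (1/2m) sum_ij [A_ij - P_ij] delta(g_i, g_j)."""
    delta = (g[:, None] == g[None, :])
    return ((A - P) * delta).sum() / two_m

Q_split = modularity(np.array([0, 0, 0, 1, 1, 1]))   # the two triangles
Q_one   = modularity(np.zeros(n, dtype=int))         # everything in one cluster
print(round(Q_split, 4))         # 0.3571; Q_one is zero up to round-off
```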
Some authors feel that when a community detection algorithm is used to detect clusters in a random graph of the Erdős–Rényi model (where all pairs of nodes are connected with equal probability P), or in scale-free models, which do have a right-skewed degree distribution, the result should be zero. However, this has been shown by Guimerà et al. not to be the case for Newman's modularity: they have shown that the modularity can be calculated from simple parameters of those networks [12]. This shows that, due to fluctuations, even stochastic network models give rise to community structure as defined by Newman. Thus the selection of a random model used to calculate the "expected number of links between clusters" is the subject of debate, but one which I will not get into in this report. It is fair to say that if it is the magnitude of the modularity one is interested in, one may have to compare it to the modularity of a similar-sized network that has no intended community structure. As I am only interested in the maxima and minima of modularities I will absolve myself of this task.

It is tempting to absorb the 1/2m prefactor into the definition of P and say that P_ij is simply the product of k_i/2m and k_j/2m, which each look like they could be the probability of a random edge being attached to nodes i and j. This is intuitively satisfactory, but hard to justify, and there seems to be no reason why 1/2m would be absorbed into the definition of A_ij.

In the current form (eqn. (1.15)) the modularity can be calculated exactly, yet from a computational point of view the delta function is somewhat awkward. Fortunately a convenient matrix form is available. We introduce the n × c index matrix S, where c is the number of clusters that we want to split the network into. The entries of S are defined by

    S_ij = 1 if node i is a member of cluster j, and 0 otherwise.   (1.16)

Note that the j-th column of S is a vector of n elements and tells us which nodes are in the j-th cluster.
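Numerically, the index matrix turns the delta function into a matrix product, which is what makes the matrix form that follows work. A quick check in Python (the six-node two-triangle toy network is invented for illustration):

```python
import numpy as np

# Toy network: two triangles {0,1,2} and {3,4,5} joined by one edge (illustrative).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n, c = 6, 2
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1
k = A.sum(axis=1)
two_m = k.sum()
B = A - np.outer(k, k) / two_m          # modularity matrix B = A - P

g = np.array([0, 0, 0, 1, 1, 1])        # cluster labels g_i
S = np.zeros((n, c))
S[np.arange(n), g] = 1                  # index matrix of eqn (1.16)

# (S S^T)_ij = sum_k S_ik S_jk reproduces delta(g_i, g_j) exactly.
delta = (g[:, None] == g[None, :]).astype(float)
assert np.array_equal(S @ S.T, delta)

Q_delta = (B * delta).sum() / two_m     # eqn (1.15)
Q_trace = np.trace(S.T @ B @ S) / two_m # the same quantity as a trace
```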
This interpretation will be of use later when S is broken up. The delta function δ(g_i, g_j) can now be given by

    δ(g_i, g_j) = Σ_{k=1}^{c} S_ik S_jk   (1.17)

and so the modularity becomes

    Q = (1/2m) Σ_{k=1}^{c} Σ_ij [A_ij − P_ij] S_ik S_jk.   (1.18)

We now introduce the modularity matrix B, equal to A − P. Dropping the constant prefactor of 1/2m, which does not affect where the maximum occurs, we are left with an important matrix formula for Q:

    Q = Tr(S^T B S).   (1.19)

Note that all the terms in the matrix definition for modularity can be easily plugged into a computer program. This definition is therefore very useful for measuring the modularity exactly, and we will return to it throughout the report.

Chapter 2

Vector Partitioning

2.1 Background to Vector Partitioning

While the matrix definition for modularity given in eqn. (1.19) will tell us when we have found a good split of the network into clusters, it offers little insight into how we practically find the clusters themselves. Unfortunately, trying every possible partition of the network is computationally unfeasible, since the number of different partitions of a network of n nodes is the nth Bell number [13]. Thus even for a small network with 30 nodes an exhaustive search would require that we calculate the modularity 8.5 × 10^23 times. Another approach is required.

Vector partitioning is a useful method that can help us to do this. First we expand B in terms of the matrix of its eigenvectors U (where U = (u_1 | u_2 | ... | u_n)) and the diagonal matrix of eigenvalues D, so that (1.19) becomes

    Q = Tr(S^T U D U^T S).   (2.1)

If we call the eigenvalues β_i, where D_ii = β_i, then the modularity can be written

    Q = Σ_{j=1}^{n} Σ_{k=1}^{c} β_j (u_j^T s_k)²,   (2.2)

where s_k is the k-th column of the n × c matrix S. What we essentially have now is a sum of n elements, where the j-th element is proportional to β_j and depends on the corresponding eigenvector u_j, i.e.

    Q = β_1 [stuff_1(u_1)] + β_2 [stuff_2(u_2)] + ... + β_n [stuff_n(u_n)].   (2.3)

If we arrange the eigenvalues in order of size, so that β_1 ≥ β_2 ≥ ...
≥ β_{n−1} ≥ β_n, then we see that a good approach would be to maximise the "stuff" preceded by large positive eigenvalues such as β_1, β_2 and so on. For reasons such as minimising computational time, and for other practical reasons that we will soon see, it is neither practical nor useful to consider very many of these terms. Typically it is sufficient to maximise two or three of these "stuffs" to achieve high modularity. In order to do this it is necessary to modify (2.1) as follows:

    Q = Tr(S^T U (D − αI + αI) U^T S)
      = nα + Tr(S^T U (D − αI) U^T S)
      = nα + Σ_{j=1}^{n} Σ_{k=1}^{c} (β_j − α) [ Σ_{i=1}^{n} U_ij S_ik ]²,   (2.4)

where α is a constant that we will discuss momentarily. The prefactor n on α arises since Tr(S^T S) = n and U is orthogonal. We now reduce the summation over all n eigenvalues to just the first p leading eigenvalues, giving

    Q ≈ nα + Σ_{j=1}^{p} Σ_{k=1}^{c} (β_j − α) [ Σ_{i=1}^{n} U_ij S_ik ]².   (2.5)

The choice of α deals with the minimisation of the error associated with this approximation. Here we will only give the result of this exercise, which finds that α is the average of the n − p lowest eigenvalues:

    α = (1/(n − p)) Σ_{i=p+1}^{n} β_i.   (2.6)

Finally, we simplify (2.5) by defining a set of n vectors r_i, i = 1, 2, ..., n, each of dimension p. They are given by

    [r_i]_j = √(β_j − α) U_ij.   (2.7)

Because α is the average of the eigenvalues not considered for maximising modularity, it will be less than all of the p leading eigenvalues, so the components of all r_i will be real. Inserting this into (2.5) gives

    Q ≈ nα + Σ_{k=1}^{c} Σ_{j=1}^{p} [ Σ_{i∈G_k} [r_i]_j ]²,   (2.8)

where the summation over i ∈ G_k indicates that we are summing over all vectors r_i that correspond to nodes in the same cluster k. This reduced summation has come about due to the behaviour of S discussed at eqn. (1.17). We now introduce the "community vectors" R_k, which represent the sum of all vectors that make up cluster k.
This simplifies (2.8) further, giving

    Q = nα + Σ_{k=1}^{c} |R_k|².   (2.9)

To maximise the community vectors, we need to group those vectors r_i that point in roughly the same direction. This is extremely useful, since it gives us not only a graphical representation of the network, but also a method for finding the clusters. From here it is relatively easy to prove (see [7] for the derivation) that if we consider breaking the network into just two clusters, then the community vectors will be of equal magnitude and exactly oppositely directed. (This is true when the contributions of all n dimensions are used. When two dimensions are used, as in many of the following examples, the community vectors appear only approximately equal in magnitude and oppositely directed.) It can also be proven that it is never beneficial to place a vector within a cluster such that its associated community vector points in the opposite direction, i.e. we should never let node i be in cluster k if r_i · R_k < 0. Thus for two clusters we can be given the direction of R_1, of R_2, or of the dividing plane between the clusters to define the network split.

If we consider just the two leading eigenvalues β_1 and β_2 (see footnote 1), then we can find the best split by imagining the line dividing the clusters rotating from 0° to 360° and grouping the vectors r_i at each angle depending on which side of the dividing line they are on. At every angle the vectors on each side are added to give the community vectors R_1 and R_2. The squared magnitudes of the two community vectors are then used to find the modularity as given by eqn. (2.9). When this has been done for every angle, a plot of modularity versus the angle of R_1 can be built up and the maximum modularity found.

It is not just the angle of a vector r_i, but also its magnitude, that tells us about a node's membership of a cluster.
To illustrate this, consider a set of vectors {r_i} (i from 1 to n) that has been optimally split into two clusters using the method described above. If we now move a node whose vector is very long into the wrong cluster, then from eqn. (2.8) the modularity will fall by a substantial amount. Conversely, if that vector is very short then it can be moved at relatively little cost to the modularity. Thus the length of a vector indicates the "certainty" of the algorithm in assigning it to its cluster. A long vector shows that we can be quite certain that the algorithm has placed that node in the correct cluster, whereas a short vector shows that the node is not a clear member of either group.

Footnote 1: It is more difficult, yet by no means impossible, to use more positive eigenvalues such as β_3, β_4, etc. and cycle through all directions for a dividing plane in higher dimensions. In this report we will look only at instances where two eigenvalues were used.

2.2 Testing Vector Partitioning

Using the C programming language, I set about writing a program, or rather a series of programs, to separate networks into clusters. Some networks were real, based on data recorded by network researchers examining networks of football teams, email records and citations of physicists' publications. I also created several simple networks with around a dozen nodes and very obvious clustering. These were useful when examining how the procedure worked, and how it might be improved.

The first of my programs took in the data about the connections of each node, created the matrices A and P as defined by (1.2) and (1.12), and then subtracted one from the other to get the modularity matrix B. The second program took B and calculated its eigenvalues and eigenvectors, and the third program then used a selection of these to calculate each of the r_i using (2.7). The fourth program rotated the direction of a line through the origin from 0° to 360°.
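In outline, the third, fourth and fifth programs amount to the following sketch (Python here rather than the C I actually used; the six-node two-triangle toy network is invented for illustration, and p = 2 leading eigenpairs are kept):

```python
import numpy as np

# Toy network: two triangles {0,1,2} and {3,4,5} joined by one edge (illustrative).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n, p = 6, 2
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1
k = A.sum(axis=1)
B = A - np.outer(k, k) / k.sum()

beta, U = np.linalg.eigh(B)                # eigh sorts eigenvalues ascending
beta, U = beta[::-1], U[:, ::-1]           # reorder so beta_1 >= ... >= beta_n
alpha = beta[p:].mean()                    # eqn (2.6)
r = np.sqrt(beta[:p] - alpha) * U[:, :p]   # eqn (2.7): row i is the vector r_i

best_Q, best_side = -np.inf, None
for deg in np.arange(0.0, 180.0, 0.1):     # 0-180 suffices by symmetry
    normal = np.array([np.cos(np.radians(deg)), np.sin(np.radians(deg))])
    side = r @ normal >= 0                 # which side of the dividing line
    R1, R2 = r[side].sum(axis=0), r[~side].sum(axis=0)
    Q = n * alpha + R1 @ R1 + R2 @ R2      # eqn (2.9), constant prefactors dropped
    if Q > best_Q:
        best_Q, best_side = Q, side

# Exact modularity of the best split, from eqn (1.15) -- the "fifth program" step.
g = best_side.astype(int)
Q_exact = (B * (g[:, None] == g[None, :])).sum() / k.sum()
```

On this toy network the sweep recovers the two triangles as the best two-cluster split, with exact modularity 5/14 ≈ 0.357.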
At each tenth of a degree the program checked which side of this dividing line the vectors lay on, found their sums and then calculated the modularity using (2.9). Since this modularity is an approximation using only two of the available n eigenvectors, I wrote a fifth program which took the suggested division and calculated the modularity exactly using (2.1).

Fig. 2.1a shows an example of a simple network that I drew, where it is intended that nodes 1 to 5 form one cluster and nodes 6 to 11 form another. Fig. 2.1b shows the plot of the vectors associated with each node as defined by (2.7). Note that the vector for node 5 is the shortest, indicating that it is least clear for this node which cluster it belongs to, as discussed in the previous section. This seems reasonable given its location in fig. 2.1a.

[Figure 2.1: Sample network of 11 nodes and the associated vector representation. (a) Sample network; (b) vector representation.]

Fig. 2.2 shows the plot of modularity versus the direction of the dividing line for the simple network given in fig. 2.1a (certain prefactors to the modularity have been left out, which doesn't matter since we are only concerned with finding the point of maximum modularity). The first obvious feature one notices about this plot is that the region from 0° to 180° is identical to that from 180° to 360°. This is because the procedure has no notion of which community vector we have called R_1 and which R_2, and calculates the same sum as each community vector passes the same point. We could just have rotated the dividing line from 0° to 180°, but this is a relatively minor issue. The point of maximum modularity corresponds to a cut through the vector plot of fig. 2.1b at around 90°.

[Figure 2.2: Modularity chart of sample network. Axes: modularity versus angle of dividing line in degrees.]

As can be seen from fig. 2.3, where
[Figure 2.3: Vector representation of sample network cut by dividing line (blue) into two clusters, shown here in red and green.]

the blue line divides the clusters, this corresponds to the separation of the nodes that we had hoped for.

While the method described above can find the maximum modularity when the network is split into only two clusters, plotting all the r_i can still show a network split that is composed of many more communities, even when plotted in only two dimensions. To illustrate this, I used the social networking website Facebook to compose my friend network. Facebook is a hugely popular social networking website with over 175 million active users worldwide [14]. It allows users to create a personal profile, add other users as friends, send them messages and so on. I used my 133 friends on Facebook to create my social network, and then checked the list of mutual friends of each of these to see which people were connected.

As can be seen from fig. 2.4, my friends form 3 distinct groups: those from my home town of Castlebar (coloured red), those who are in college with me (coloured green), and another group of friends I met at the International Conference for Physics Students (ICPS) held in Krakow during the summer of 2008 (coloured blue). The colouring was done manually by me, since I know how I met all of these people.

[Figure 2.4: Vector representation of my friend network on the social networking site Facebook. My friends from Castlebar are in red, college friends are in green and ICPS friends are in blue. A number of friends who do not fall into these categories are in pink, but these vectors are barely visible.]

There is also a small group of 5 people who do not fit comfortably into any of these three categories, and so the program, not knowing where they belong, has left them very close to the centre. They are coloured pink in fig.
2.4 but are hardly visible. Only 3 people do not sit exactly within their groups, and these are marked by their initials. Z.B. stands for my girlfriend Zara, who is from Castlebar but is now studying in Trinity College, and so is placed between those two groups. O.N. and J.S. stand for Orna Nicholl and Jessica Stanley, an undergraduate and a graduate respectively of Trinity College. They are both involved with the Institute of Physics and have met many of my friends from the ICPS at various other conferences, and so they are placed between the Trinity College cluster and the ICPS cluster. This graph shows how accurately vector partitioning can find clusters within networks, even when using only 2 eigenvectors out of a possible 133.

The group of my 64 college friends on Facebook is further broken up in fig. 2.5. The vectors are coloured according to the subject being studied by each friend: Physics students are in red, Maths students are in green, and my fellow Theoretical Physics students are in blue. Friends at college not in any of these subjects are in pink.

[Figure 2.5: Vector representation of my college friends based on Facebook connections. Physics students are in red, Maths students are in green and Theoretical Physics students are in blue. The pink vectors correspond to friends in none of these courses.]

Students of Theoretical Physics take classes in both the School of Physics and the School of Maths and tend to have a mixture of friends from all three courses, whereas Maths and Physics students do not share classes outside their own department. One notices that the clusters are not as clearly defined as in the previous case where all my friends were considered. This shows a higher level of connectedness at this local level compared to my entire friend network in fig. 2.4. The graph also seems to indicate that students of Theoretical Physics are more friendly with Maths students than with those studying Physics.
This may be due to the fact that we spent the majority of our time doing Maths courses in first and second year. If the blue vectors are ignored, we can see that there is little interaction between Maths and Physics students. One classmate has suggested that a camaraderie developed between students of Theoretical Physics and Mathematics while working on regular homework assignments during our first year together.

2.3 Bipartite networks and negative modularity

We have seen that the eigenvectors corresponding to the most positive eigenvalues have proven very useful in splitting networks. It turns out that at the other end of the spectrum the very negative eigenvalues, β_n, β_{n−1}, etc., are very useful for splitting a different type of network, one that has a lower than expected number of links within clusters. A network in which there are no connections within its k clusters is said to be k-partite, and if k is 2 then it is said to be bipartite. Fig. 2.6 is my attempt at drawing an approximately bipartite network, which has only 4 edges within the clusters and the other 11 between them.

[Figure 2.6: An example of a network that is approximately bipartite.]

This type of network is not just a curiosity. A real-life example would be a relationship network where the nodes are people and edges indicate a relationship between two people at some point in the past. If our clusters are comprised of males and females then we would, based on statistics on sexual orientation (for examples see [15]), expect a majority of the edges to lie between the clusters. From here on, where there may be confusion over which type of network I am talking about, I will refer to the type of network discussed up to now (with a higher density of links within clusters) as a 'normal' network.

If we are trying to minimise rather than maximise the number of links within clusters, then we need to minimise the modularity as we have defined it.
Looking back at our expression for modularity as a sum of terms proportional to the eigenvalues (2.3), it now seems logical to maximise the terms with the most negative prefactors, i.e. "stuff n" and "stuff n−1". Retracing our steps through vector partitioning, except this time using β_n and β_{n−1} and their associated eigenvectors, we come across a problem when it comes to defining the r_i vectors at eqn. (2.7). We will now find that β_n − α and β_{n−1} − α are negative, so that the components of each r_i will be complex. Newman's approach is to redefine r_i as follows:

    [r_i]_j = √(α − β_j) U_ij,   (2.10)

so that the components of r_i are all real. He then gives the modularity by

    Q = nα − Σ_{k=1}^{c} |R_k|².   (2.11)

I prefer to leave the definition of the r_i the same, and allow the components to be imaginary. Here the subscripts have been changed to µ, ν to avoid confusion with the imaginary unit i = √(−1):

    [r_µ]_ν = √(β_ν − α) U_µν = √(−(α − β_ν)) [u_ν]_µ = i √(α − β_ν) [u_ν]_µ.   (2.12)

Since our r_µ will be two-dimensional, let ν = n correspond to the x-component and ν = n − 1 correspond to the y-component of the vector, i.e.

    r_µ = ( i √(α − β_n) [u_n]_µ ) x̂ + ( i √(α − β_{n−1}) [u_{n−1}]_µ ) ŷ,   (2.13)

where x̂ and ŷ are unit vectors in the x and y directions respectively. The community vectors R_k will be sums of such vectors and so will be of the form

    R_k = i a x̂ + i b ŷ,   (2.14)

where a, b ∈ ℝ, so that

    |R_k|² = −(a² + b²).   (2.15)

While I like this interpretation from a theoretical point of view, my program for finding approximately bipartite clusters works in the same way as if I had taken Newman's redefinitions at eqns. (2.10) and (2.11). Obviously the vectors have to be plotted with real components, but this approach helps to explain the redefinition of modularity for the bipartite case at eqn. (2.11). It will also be of much use later when explaining my approach for using β_1 and β_n. I amended my program to look for the smallest modularity using the eigenvectors with the lowest eigenvalues.
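That amendment amounts to flipping a sign in the angle sweep. A Python sketch (the perfectly bipartite toy network K_{3,3} is invented for illustration, and I assume that α is now taken as the average of the unused, highest eigenvalues, mirroring eqn. (2.6)):

```python
import numpy as np

# Perfectly bipartite toy network K_{3,3}: every node in {0,1,2} is connected
# to every node in {3,4,5}.  Invented for illustration.
n, p = 6, 2
A = np.zeros((n, n))
for i in range(3):
    for j in range(3, 6):
        A[i, j] = A[j, i] = 1
k = A.sum(axis=1)
B = A - np.outer(k, k) / k.sum()

beta, U = np.linalg.eigh(B)         # ascending: beta[0] is the most negative
alpha = beta[p:].mean()             # assumed: average of the unused (highest) eigenvalues
# Eqn (2.10); maximum() guards against tiny negative round-off values.
r = np.sqrt(np.maximum(alpha - beta[:p], 0.0)) * U[:, :p]

best_Q, best_side = np.inf, None
for deg in np.arange(0.0, 180.0, 0.1):
    normal = np.array([np.cos(np.radians(deg)), np.sin(np.radians(deg))])
    side = r @ normal >= 0
    R1, R2 = r[side].sum(axis=0), r[~side].sum(axis=0)
    Q = n * alpha - (R1 @ R1 + R2 @ R2)   # eqn (2.11): note the minus sign
    if Q < best_Q:
        best_Q, best_side = Q, side

# Exact modularity of the recovered split, from eqn (1.15).
g = best_side.astype(int)
Q_exact = (B * (g[:, None] == g[None, :])).sum() / k.sum()
```

For K_{3,3} the minimising sweep recovers the exact bipartition {0,1,2} | {3,4,5}, whose modularity is the extreme value −1/2 noted in Chapter 1.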
The vector representation for the approximately bipartite network is shown in fig. 2.7, as is the line dividing the nodes at the point of minimum modularity. The clusters suggested by the program for this network didn't reflect the groups I had intended at all. It was only when I redrew the network arranged according to the clusters recommended by the algorithm that I saw a much better arrangement (see fig. 2.8b). As I had drawn it the modularity was quite low, at −0.233, but the new arrangement found by the program reduced it much further, to −0.436. The program managed to beat my attempts to draw bipartite networks in this way on more than one occasion, indicating that such networks are hard to draw.

[Figure 2.7: Vector representation of the approximately bipartite network shown in fig. 2.6. The blue line shows the division of the nodes at minimum modularity. Interestingly, this groups the nodes in a way (see fig. 2.8b) completely different to, and much better than, that which I expected.]

[Figure 2.8: My attempt at drawing two clusters of nodes in an approximately bipartite fashion (a, left), and the clusters suggested by the program (b, right). The modularity for the clusters I imagined is −0.233, while the lowest modularity found by the program in the new arrangement is −0.436.]

Chapter 3

Anti-bipartite approaches to Vector Partitioning

3.1 Using both positive and negative eigenvalues to find clusters in ordinary networks

While studying the approaches to splitting both ordinary networks and those that are approximately bipartite, it became apparent to me that u_1 and u_n (the eigenvectors corresponding to β_1 and β_n) seemed to 'know' the most about how a network was connected. I wanted to see if both of these vectors could be used together to find clusters in an ordinary network.
Some of the inspiration for this idea came from looking at eqn. (2.3) and wondering if "stuff 1" could be maximised and "stuff n" simultaneously minimised to achieve a modularity as good as, or better than, that of the existing procedure. I felt that rejecting a partition with a low number of links within clusters would aid a search for a high number of links within clusters. I set about altering my program to give the user the option of defining the vectors r_μ by

    [r_\mu]_\nu = \sqrt{\beta_\nu - \alpha}\, U_{\mu\nu}, \qquad \nu = 1, n.    (3.1)

Letting the ν = 1 component correspond to x co-ordinates and the ν = n component to y co-ordinates, we get

    \mathbf{r}_\mu = \left( \sqrt{\beta_1 - \alpha}\, [u_1]_\mu \right) \hat{x} + \left( \sqrt{\beta_n - \alpha}\, [u_n]_\mu \right) \hat{y}.    (3.2)

Since the x component will be real and the y component imaginary, this is equivalent to

    \mathbf{r}_\mu = \left( \sqrt{\beta_1 - \alpha}\, [u_1]_\mu \right) \hat{x} + i \left( \sqrt{\alpha - \beta_n}\, [u_n]_\mu \right) \hat{y}.    (3.3)

Hence the community vectors will be of the form

    \mathbf{R}_k = a \hat{x} + i b \hat{y}, \qquad a, b \in \mathbb{R},    (3.4)

and their magnitudes will have both positive and negative terms:

    |\mathbf{R}_k|^2 = a^2 - b^2.    (3.5)

Since the modularity is still given by

    Q = n\alpha + \sum_{k=1}^{c} |\mathbf{R}_k|^2,    (3.6)

we will effectively be trying to maximise the x components of the community vectors while minimising their y components. We shall use the previous network example shown in fig. 2.1a to see if this works. The vector representation is shown in fig. 3.1. Again, the vectors are plotted as though their components were real, but the imaginary aspect of the y components has given us eqn. (3.5).

Figure 3.1: Vector representation of the network of fig. 2.1a using the eigenvectors of the highest and lowest eigenvalues.

Figure 3.2: Plot of modularity versus angle of the dividing line using the new method for the network shown in fig. 2.1a.
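The signed score of eqns. (3.5)-(3.6) can be sketched as follows (Python with NumPy; the example network, the names, and the choice α = 0 are illustrative assumptions, with β_n ≤ 0 ≤ β_1 holding for the example used):

```python
import numpy as np

def high_low_vectors(A, alpha=0.0):
    """x components from u_1 (real) and y components from u_n (imaginary),
    as in eqn. (3.3), stored as real numbers for plotting."""
    k = A.sum(axis=1)
    beta, U = np.linalg.eigh(A - np.outer(k, k) / k.sum())  # ascending order
    x = np.sqrt(beta[-1] - alpha) * U[:, -1]   # beta_1, u_1
    y = np.sqrt(alpha - beta[0]) * U[:, 0]     # beta_n, u_n
    return x, y

def signed_score(x, y, groups):
    """sum_k (a_k^2 - b_k^2), i.e. eqn. (3.6) without the constant n*alpha."""
    groups = np.asarray(groups)
    total = 0.0
    for g in np.unique(groups):
        a, b = x[groups == g].sum(), y[groups == g].sum()
        total += a * a - b * b
    return total

# example: two triangles (nodes 0-2 and 3-5) joined by a single edge
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
x, y = high_low_vectors(A)
```

Splitting the two triangles apart gives a higher signed score than a mixed allocation, and the x co-ordinates from u_1 separate the two triangles by sign.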
Using the new method of finding the magnitude of the community vectors, we get the plot of modularity versus angle of the dividing line shown in fig. 3.2. The maximum modularity is again at around 90°, which corresponds to the cut we had hoped for, as shown in fig. 3.3. This method found the correct clusters in about a dozen other sample networks that I came up with.

3.2 Adjusting the vectors when βn and βn−1 are used

From the previous section, it seems possible that the most negative eigenvalue β_n can be used to find clusters in normal (non-bipartite) networks. I decided to investigate whether clusters could be found using only the two most negative eigenvalues, β_n and β_{n−1}. At first it seemed like a good idea to plot the vectors using the eigenvectors u_n and u_{n−1} as though the network were approximately bipartite, and then look for the highest modularity. This never worked, however, for reasons best understood by studying plots of the vectors in this scenario. Fig. 3.4a shows the vectors for our sample network using u_n and u_{n−1}. We would like the vectors 1-5 to be in one cluster, and 6-11 in another. However, this method of using β_n and β_{n−1} separates nodes that are closely connected to opposite sides of the origin. Vectors that should be added together to form a community vector end up directly opposite each other, such as the points 4 and 5 or 9 and 11. Therefore we will never group them correctly simply by dividing them with a line at any angle, regardless of whether we are looking for high or low modularity. However, the fact that neighbouring nodes form vectors that are oppositely directed can be useful. Nodes that are closely connected tend to form 'jets' of vectors pointing in opposite directions. In fig. 3.4a vectors 1-6 seem to form two horizontal jets pointing left and right, while vectors 7-11 form jets pointing upwards and downwards.
I decided to reflect all the vectors through the origin to exploit this pattern and bring neighbouring nodes next to each other again. This can be seen in fig. 3.4b.

Figure 3.3: Vector representation of the network of fig. 2.1a using eigenvectors of the highest and lowest eigenvalues. The two clusters are coloured in red and green, and the blue line indicates the line dividing the clusters at the point of maximum modularity.

Figure 3.4: Vector representation of the network in fig. 2.1a using the eigenvectors of the lowest eigenvalues. The vectors seem to form horizontal and vertical 'jets' corresponding to neighbouring nodes (a, left). All the points were therefore reflected through the origin so that they could form groups again (b, right).

Up to now, a dividing line was rotated while the vectors to each side were added to form the community vectors. Now, with jets of neighbouring vectors at intervals of roughly 90°, it seemed logical to add only the vectors within 90° of one end of the rotating line. This means that while there are two vectors for every node, only the vector nearest one end of the dividing line contributes to a community vector. Since the vectors had been reflected in this way, I decided that the imaginary nature of the y components should be ignored, and the community vectors calculated in the original way with real components. The result of calculating the modularity in this way as the dividing line made its way through 360° is shown in fig. 3.5.
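This reflect-and-select procedure can be sketched as follows (Python with NumPy; the sector bookkeeping and the names are my own illustrative reading of the steps above, and the score omits the constant nα term of the modularity):

```python
import numpy as np

def reflected_scan(points, step_deg=1.0):
    """Duplicate every vector by reflection through the origin, then rotate a
    dividing line; at each angle only the copies within 90 degrees of one end
    of the line are kept, split into the sectors on either side of the line.
    Returns the angle (degrees) maximising |R1|^2 + |R2|^2."""
    doubled = np.vstack([points, -points])
    phi = np.degrees(np.arctan2(doubled[:, 1], doubled[:, 0]))
    best_score, best_angle = -np.inf, 0.0
    for theta in np.arange(0.0, 360.0, step_deg):
        rel = (phi - theta) % 360.0
        R1 = doubled[(rel > 0) & (rel < 90)].sum(axis=0)   # sector on one side of the line
        R2 = doubled[rel > 270].sum(axis=0)                # sector on the other side
        score = R1 @ R1 + R2 @ R2
        if score > best_score:
            best_score, best_angle = score, theta
    return best_angle, best_score
```

On two perpendicular 'jets' of points this picks out a line that places one jet in each sector.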
In this plot we see that the pattern repeats itself not twice but four times, since every vector now appears within a 180° range.

Figure 3.5: Plot of modularity versus angle of the dividing line for the vectors given in fig. 3.4b. Only vectors within 90° of one end of the dividing line contributed to the community vectors R1 and R2.

The maximum modularity is found at around 45°. This almost gives us the correct cut: nodes 6 and 7 are grouped with 1-5 instead of with 8-11. This is still a very good result given that the input information, u_n and u_{n−1}, is normally used to solve a completely different type of problem. If we move 7 to its correct group, the approximate modularity calculated by this method only drops by 0.08% (of course, when calculated exactly, this change causes Q to increase). A very encouraging point here is that if we look at the graph in fig. 3.4b and go clockwise from 2, we pass every point in the same order as if we were going from left to right in the diagram of the network (see fig. 2.1a). This technique was tested on a variety of other test networks. It gave perfect results about half of the time and made one mistake about a quarter of the time. Even in the cases where it made more than one mistake, the result showed a good attempt at finding the clusters, at no stage resembling a random allocation of nodes to clusters.

3.3 Adjusting the vectors when β1 and βn are used

My real interest was in using the eigenvectors u_1 and u_n, since they seemed to 'know' the most about the structure of the network and they had already proven useful in Section 3.1. I wanted to plot all the nodes as vectors using these eigenvectors and then look for ways of discerning the clusters from that plot. Let the components of the r_i be as before, with those corresponding to u_1 on the x-axis and those corresponding to u_n on the y-axis.
From what we have seen so far with this set-up, we now expect that nodes which are close together in the network will be close to each other in the x direction, but separated from each other in the y direction. We must take this latter effect and turn it around, in a way similar to what was done in Section 3.2 when all the vectors were reflected through the origin. I decided to reflect the vectors in the y direction only, since the nodes are already gathered correctly in the x direction. Plots of these vectors and of the original r_i before the reflection operation are shown in figs. 3.6b and 3.6a respectively. In previous approaches, we were able to identify the clusters by rotating a line through 360° as before, grouping the vectors to its left and right. This will not work this time, however, since there is the potential for double counting of vectors, especially those lying close to the x-axis, whose reflections land right next to themselves. It is hard to imagine an algorithm related to those seen already that counts some vectors twice and others only once. For this reason I decided to neglect the negative y region entirely, and focus on splitting those vectors r_i above the x-axis. With this set-up, I can now introduce a dividing line rotating about the origin that groups the vectors r_i to its left and right into community vectors R_k. At every angle we once more measure the lengths of the community vectors and calculate the modularity using (2.9). The maximum modularity was found at 90° (see fig. 3.7), corresponding exactly to the cut that we wanted (fig. 3.8). This procedure worked for every test network it was tried on, with two interesting points to note. The first is that the modularity was almost always found to be at its maximum when the vectors were cut at 90°.
This in itself is not a good sign, since whether a vector lies to the left or right of a vertical line through the origin depends only on its x co-ordinate, which here depends solely on u_1, so it is possible that u_n is playing no important role at all. The second point I noticed is that as we rotate from 0° to 180°, the order in which we pass the vectors is almost always the same as their order from left to right. This helpful feature is due in part to u_n, since a random allocation of the y co-ordinates would not give us the same result.

Figure 3.6: On the left (a) is the vector representation for the network shown in fig. 2.1a using u_1 and u_n. On the right (b) any vectors below the x-axis have been reflected through it.

Figure 3.7: Plot of modularity when using β_1 and β_n. The modularity reaches a maximum at 90°.

Figure 3.8: Plot of the vectors divided into their correct clusters using β_1 and β_n. The clusters are coloured red and green, and the line dividing them at maximum modularity is coloured blue.

Figure 3.9: The sample network with the cut indicated by the dashed red line.

In particular, we can see in fig. 3.8 that the vectors closest to the line which divides the clusters (4, 5 and 6) are the nodes which are closest to where the cut lies in the actual network (see fig. 3.9). Similarly, the vectors which are furthest from the dividing line (9, 11 and 2) belong to the nodes which are furthest from the cut's position in the network.
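The reflect-above-the-axis search of this section can be sketched as follows (Python with NumPy; the point set and names are illustrative assumptions, and the score again omits the constant nα term of eqn. (2.9)):

```python
import numpy as np

def upper_half_scan(points, step_deg=1.0):
    """Reflect every vector below the x-axis through it, then rotate a
    dividing line from 0 to 180 degrees, splitting the upper-half-plane
    vectors to its left and right into community vectors R1 and R2.
    Returns the angle (degrees) maximising |R1|^2 + |R2|^2."""
    up = points.astype(float).copy()
    up[:, 1] = np.abs(up[:, 1])                        # reflect above the x-axis
    phi = np.degrees(np.arctan2(up[:, 1], up[:, 0]))   # in [0, 180] for these points
    best_score, best_angle = -np.inf, 0.0
    for theta in np.arange(0.0, 180.0, step_deg):
        left = phi > theta
        R1 = up[left].sum(axis=0)
        R2 = up[~left].sum(axis=0)
        score = R1 @ R1 + R2 @ R2
        if score > best_score:
            best_score, best_angle = score, theta
    return best_angle, best_score
```

For points forming two groups on opposite sides in x, the best dividing line falls between the groups, as it did at 90° above.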
As one further test of this feature I decided to use both the existing method (using u_1 and u_2) and my new method (using u_1 and u_n) on a network of ten nodes connected in a line, as shown in fig. 3.10.

Figure 3.10: Diagram of a linear network.

Both methods found the maximum modularity by correctly grouping 1-5 and 6-10 into clusters, and the vector representations produced by the two procedures are very similar.

Figure 3.11: These plots are both vector representations of the network shown in fig. 3.10. The one on the left (a) uses u_1 and u_2, while the one on the right (b) uses u_1 and u_n and has had all its vectors flipped above the x-axis. Both methods identified the correct cut, with nodes 1-5 in one cluster (red) and 6-10 in the other (green). The line dividing the clusters is in blue. The layout of the vectors using each method is very similar.

In both diagrams (see fig. 3.11) we can see that the vectors are arranged in an angular fashion according to the order in which they appear in the network. Even the angles between adjacent pairs of vectors in the two diagrams are similar in scale.

Chapter 4

Conclusions and Further Work

We have seen, particularly from Sections 3.1 and 3.3, that the eigenvector u_n which corresponds to the lowest eigenvalue β_n of the modularity matrix B can be used to find clusters in normal networks (those with a higher density of edges within clusters than between them). When used in a form of vector partitioning which allows components of vectors to be imaginary, this eigenvector has proven to be just as useful as u_2, whose associated eigenvalue is positive and which is more normally used for this purpose.
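The linear-network test of fig. 3.10 is easy to spot-check against the exact modularity of eqn. (A.1); a minimal sketch (Python with NumPy; the names are illustrative, and this is not the program used for the figures):

```python
import numpy as np

def path_adjacency(n):
    """Adjacency matrix of n nodes connected in a line, as in fig. 3.10."""
    A = np.zeros((n, n))
    for i in range(n - 1):
        A[i, i + 1] = A[i + 1, i] = 1.0
    return A

def exact_modularity(A, groups):
    """Q = (1/2m) sum_ij [A_ij - k_i k_j/(2m)] delta(g_i, g_j), eqn. (A.1)."""
    k = A.sum(axis=1)
    two_m = k.sum()
    B = A - np.outer(k, k) / two_m
    g = np.asarray(groups)
    return (B * (g[:, None] == g[None, :])).sum() / two_m

A = path_adjacency(10)
# exact modularity of every contiguous two-way split of the line
Qs = [exact_modularity(A, [0] * g + [1] * (10 - g)) for g in range(1, 10)]
```

The exact modularity is maximised by the symmetric 5-5 split, the cut that both the u_1, u_2 method and the u_1, u_n method found.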
The most promising evidence for this is the way the arrangement of the vectors associated with the nodes represents their position in the network, as seen in figs. 3.9 and 3.11. A potential strategy for those who use the p most positive eigenvalues in network partitioning would be to instead use the p eigenvalues with the largest magnitude. They could then search for a modified modularity, subtracting all the components of the community vectors corresponding to eigenvectors of negative eigenvalues. To test this method properly would require a program which generates random networks with intended community structure. This would be done by letting the probability of two nodes within the same cluster being connected be P1, and that for nodes in different clusters P2, with P1 > P2. The performance of this strategy could then be compared to existing algorithms while varying n and some function of P1 and P2.

We have seen in Section 3.3 the ability of the anti-bipartite approach to represent the structure of networks, in terms of how firmly a vector belongs to its assigned cluster (see figs. 3.11, 3.9). I think an interesting test of this would be to calculate the modularity exactly for each of the n + 1 possible positions of the dividing line in these diagrams. This would show whether the arrangement of vectors is in order of increasing modularity up to the maximum point, and decreasing after it. I have a suspicion that this is the case, and that this would allow us to immediately grade all nodes in terms of their firmness in, or 'allegiance' to, their cluster.

Bibliography

[1] M. E. J. Newman, "Community structure in social and biological networks," PNAS vol. 99, no. 12, 7821-7826 (2002)

[2] M. E. J. Newman, "The Structure and Function of Complex Networks," SIAM Review 45, 167-253 (2003)

[3] M. Parter, N. Kashtan, and U. Alon, "Environmental variability and modularity of bacterial metabolic networks," BMC Evol. Biol. 7: 169 (2007)

[4] G. W. Flake, S.
Lawrence, C. L. Giles, and F. M. Coetzee, "Self-Organization and Identification of Web Communities," IEEE Computer, 35(3), 66-71 (2002)

[5] D. J. Watts and S. H. Strogatz, "Collective dynamics of 'small-world' networks," Nature, 393:440-442 (1998)

[6] G. P. Garnett, J. P. Hughes, R. M. Anderson, B. P. Stoner, S. O. Aral, W. L. Whittington, H. H. Handsfield, and K. K. Holmes, "Sexual mixing patterns of patients attending sexually transmitted diseases clinics," Sexually Transmitted Diseases 23, 248-257 (1996)

[7] M. E. J. Newman, "Finding community structure in networks using the eigenvectors of matrices," Phys. Rev. E 74, 036104 (2006)

[8] B. W. Kernighan and S. Lin, "An efficient heuristic procedure for partitioning graphs," Bell System Technical Journal, vol. 49, no. 2, 291-307 (1970)

[9] A. L. Barabási and R. Albert, "Emergence of Scaling in Random Networks," Science vol. 286, 509-512 (1999)

[10] U. Brandes et al., "On Modularity Clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 2, 172-188 (2008)

[11] Article on "Modularity (Networks)", available at: http://en.wikipedia.org/wiki/Modularity_(networks)

[12] R. Guimerà, M. Sales-Pardo, and L. A. Nunes Amaral, "Modularity from fluctuations in random graphs and complex networks," Phys. Rev. E 70, 025101(R) (2004)

[13] Arenas, Duch, Fernández and Gómez, "Size reduction of complex networks preserving modularity," New Journal of Physics vol. 9, 176 (2007)

[14] Facebook online Press Room, available at: http://www.facebook.com/press/info.php?statistics

[15] A. F. Bogaert, "The prevalence of male homosexuality: the effect of fraternal birth order and variations in family size," Journal of Theoretical Biology vol. 230, no. 1, 33-37 (2004)

Appendix A

Range of Modularity

We had the modularity given by

    Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(g_i, g_j).    (A.1)

Here the delta function makes sure that we are only adding terms for pairs of nodes in the same cluster.
We can equivalently label the clusters C_k and sum over k from 1 to c instead (c is the number of clusters), including only those nodes within each cluster:

    Q = \frac{1}{2m} \sum_{k=1}^{c} \sum_{i,j \in C_k} \left[ A_{ij} - \frac{k_i k_j}{2m} \right].    (A.2)

The second term here can also be expressed as the square of the sum of all the degrees in the cluster:

    Q = \frac{1}{2m} \sum_{k=1}^{c} \left[ \sum_{i,j \in C_k} A_{ij} - \frac{1}{2m} \Big( \sum_{i \in C_k} k_i \Big)^2 \right].    (A.3)

We now think of the degrees of the nodes in terms of endpoints of edges. These edges are either completely contained in the cluster (intra-cluster edges) or travel from cluster C_k to some other cluster (inter-cluster edges). The sum of all the degrees must equal the number of endpoints of inter- and intra-cluster edges. Thus the sum of the degrees represents double the number of intra-cluster edges (they are double counted because both ends are in the cluster) plus the number of inter-cluster edges. Let us call the number of intra-cluster edges of cluster k m_k, and the number of inter-cluster edges m̂_k. Thus we have

    \Big( \sum_{i \in C_k} k_i \Big)^2 = (2 m_k + \hat{m}_k)^2.    (A.4)

In fact, the sum over A_ij is also a sum over all edges in the cluster, again with double counting. With this in mind the modularity is given by

    Q = \frac{1}{2m} \sum_{k=1}^{c} \left[ 2 m_k - \frac{(2 m_k + \hat{m}_k)^2}{2m} \right]
      = \sum_{k=1}^{c} \left[ \frac{m_k}{m} - \Big( \frac{2 m_k + \hat{m}_k}{2m} \Big)^2 \right].    (A.5)

As we would expect, decreasing the number of inter-cluster edges m̂_k has the effect of increasing the modularity. We will therefore set it to zero to find the maximum modularity:

    Q = \sum_{k=1}^{c} \left[ \frac{m_k}{m} - \Big( \frac{m_k}{m} \Big)^2 \right]
      = \sum_{k=1}^{c} \frac{m_k (m - m_k)}{m^2}.    (A.6)

From this we see that Q is maximised if all the m_k are equal. Allowing this, and using c m_k = m, we have

    Q = c \, \frac{m_k (m - m_k)}{m^2} = 1 - \frac{m_k}{m} = 1 - \frac{1}{c}.    (A.7)

This is maximised by letting the number of clusters, c, approach infinity. Thus Q has a maximum value of one. Note that with two clusters, this equation also shows that the upper bound for Q is 1/2. Some authors have called the quantity m_k/m the "coverage" of cluster k and have defined it to be one when m, and thus m_k, are equal to zero.
With this alternative definition, a modularity of exactly one can be achieved. That is not the approach taken in this report.

The lower bound can also be calculated from eqn. (A.5). Since the modularity is strictly decreasing in m̂_k, we must now minimise m_k. Setting it to zero gives

    Q = -\sum_{k=1}^{c} \Big( \frac{\hat{m}_k}{2m} \Big)^2.    (A.8)

Again, this reaches its limit when all the m̂_k are equal. This time we note that c m̂_k = 2m, since each inter-cluster edge is counted twice, which gives us

    Q = -c \Big( \frac{\hat{m}_k}{2m} \Big)^2 = -c \Big( \frac{1}{c} \Big)^2 = -\frac{1}{c}.    (A.9)

Thus the modularity is minimised when c is small. Since we must have at least two clusters (excluding any uninteresting cases where there are no links or no clusters), the smallest value c can take is two. Thus the lower bound for modularity is −1/2.
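Both bounds are easy to confirm numerically from eqn. (A.1); a minimal sketch (Python with NumPy; the example graphs and names are illustrative):

```python
import numpy as np

def exact_modularity(A, groups):
    """Q = (1/2m) sum_ij [A_ij - k_i k_j/(2m)] delta(g_i, g_j), eqn. (A.1)."""
    k = A.sum(axis=1)
    two_m = k.sum()
    B = A - np.outer(k, k) / two_m
    g = np.asarray(groups)
    return (B * (g[:, None] == g[None, :])).sum() / two_m

def disconnected_cliques(c, s):
    """c cliques of s nodes each with no inter-cluster edges (m_hat_k = 0),
    the configuration that attains Q = 1 - 1/c in eqn. (A.7)."""
    A = np.zeros((c * s, c * s))
    for q in range(c):
        block = slice(q * s, (q + 1) * s)
        A[block, block] = 1.0
    np.fill_diagonal(A, 0.0)
    return A, np.repeat(np.arange(c), s)

def complete_bipartite(n):
    """K_{n,n} split into its two sides: every edge is inter-cluster
    (m_k = 0), so with c = 2 eqn. (A.9) gives Q = -1/2."""
    A = np.block([[np.zeros((n, n)), np.ones((n, n))],
                  [np.ones((n, n)), np.zeros((n, n))]])
    return A, [0] * n + [1] * n
```

Disconnected equal cliques attain the upper limit 1 − 1/c, and a complete bipartite graph cut along its two sides attains the lower bound −1/2.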