Network theory III
David Lusseau
BIOL4062/5062
d.lusseau@dal.ca

Outline
16 March: community structure

Suggested reading: Newman M.E.J. 2003. The structure and function of complex networks. SIAM Review 45, 167-256.

What is a community?
A cluster of individuals that are more linked to one another than to others.

Traditional techniques
- Cluster analysis (hierarchical)
- Multi-dimensional scaling (MDS)
- Principal coordinate analysis

Traditional techniques: limitations
- How representative is the result? MDS has a loss-of-information measure (stress)
- Cluster analysis: what is the best division?
- Peripheral individuals are lumped together

Girvan-Newman algorithm
- Divisive clustering algorithm that finds the boundaries of communities
- Divides a population of n vertices into 1 to n communities
- The weakest links between communities are identified by their edge betweenness
- Standardise betweenness at each step
- Recalculate edge betweenness at each step
[Figure: Zachary karate club (Girvan & Newman 2002, PNAS)]

Finding the best division
- For each step, calculate a modularity coefficient Q
- The best division has the most edges within communities and the fewest between them
- Community size is taken into account
$Q = \sum_i (e_{ii} - a_i^2)$, with $a_i = \sum_j e_{ij}$
where $e_{ij}$ is the fraction of edges that link community $i$ to community $j$.

Worked example with three communities ($e_{ij} \times 108$):
        j=1   j=2   j=3
i=1      30     2     5
i=2       2    10     2
i=3       5     2    50

The row sums 37, 14 and 57 give $a_i \times 108$, so:
$Q = \left(\frac{30}{108} - \left(\frac{37}{108}\right)^2\right) + \left(\frac{10}{108} - \left(\frac{14}{108}\right)^2\right) + \left(\frac{50}{108} - \left(\frac{57}{108}\right)^2\right) = 0.42$
[Figure: Zachary karate club (Newman & Girvan 2003, Physical Review E)]

Modularity coefficient
The principle of modularity-coefficient optimisation can be applied to any community-structure algorithm.

Extension to weighted matrices
Edge betweenness:
- Transform the similarity matrix into a dissimilarity matrix
- Calculate geodesic paths using Dijkstra's algorithm
- Problem: strongly associated pairs become short edges that carry many geodesics, so edges between strongly connected pairs are more likely to be removed

Alternative: modularity optimisation
Forget edge betweenness: optimise for high Q!
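The divisive procedure and the modularity coefficient above can be sketched in plain Python. This is a minimal illustration, not the implementation used by any particular package: it assumes an unweighted, undirected edge list, computes edge betweenness with Brandes' algorithm, recalculates it after every removal, and scores each division with $Q = \sum_i (e_{ii} - a_i^2)$. The function names are hypothetical, and the standardisation step is left out.

```python
# Minimal sketch (assumption): unweighted, undirected graph; names are illustrative.
from collections import Counter, deque


def edge_betweenness(adj):
    """Brandes' algorithm for edge betweenness; adj maps vertex -> set of neighbours."""
    eb = Counter()
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma)
        dist = {s: 0}
        sigma = Counter({s: 1})
        pred = {v: [] for v in adj}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Accumulate dependencies, farthest vertices first
        delta = Counter()
        for w in reversed(order):
            for v in pred[w]:
                c = sigma[v] / sigma[w] * (1 + delta[w])
                eb[frozenset((v, w))] += c
                delta[v] += c
    # Every unordered pair of vertices was counted once from each end
    return {e: b / 2 for e, b in eb.items()}


def components(adj):
    """Connected components, as a list of vertex sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, queue = {s}, deque([s])
        while queue:
            for w in adj[queue.popleft()]:
                if w not in comp:
                    comp.add(w)
                    queue.append(w)
        seen |= comp
        comps.append(comp)
    return comps


def modularity(edges, communities):
    """Q = sum_i (e_ii - a_i^2) for an unweighted, undirected edge list."""
    m = len(edges)
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    label = {v: i for i, c in enumerate(communities) for v in c}
    q = 0.0
    for i, c in enumerate(communities):
        e_ii = sum(1 for u, v in edges if label[u] == label[v] == i) / m
        a_i = sum(degree[v] for v in c) / (2 * m)
        q += e_ii - a_i ** 2
    return q


def girvan_newman(edges):
    """Repeatedly remove the edge with the highest betweenness (recalculated
    after every removal) and return the division with the highest Q."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    best = components(adj)
    best_q = modularity(edges, best)
    while any(adj.values()):
        eb = edge_betweenness(adj)
        u, v = max(eb, key=eb.get)  # weakest link between communities
        adj[u].discard(v)
        adj[v].discard(u)
        comps = components(adj)
        q = modularity(edges, comps)  # Q is always scored on the original graph
        if q > best_q:
            best, best_q = comps, q
    return best, best_q


if __name__ == "__main__":
    # Two triangles (0,1,2) and (3,4,5) joined by the single edge 2-3
    edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
    best, q = girvan_newman(edges)
    print([sorted(c) for c in best], round(q, 3))  # → [[0, 1, 2], [3, 4, 5]] 0.357
```

In this toy graph the bridge 2-3 carries all 9 shortest paths between the triangles, so it is removed first, and the two-triangle division gives the highest Q (5/14).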
Drawbacks of direct Q optimisation
- Computer intensive
- Prone to false (local) optima, which are difficult to detect: iterate the optimisation to check, though this is not always successful

Modularity: greedy algorithm
- Start with n communities, one per vertex (agglomerative clustering method)
- At each step, join the pair of communities that gives the greatest increase (or the smallest decrease) in Q
[Figure: greedy Q optimisation vs Girvan-Newman]

Overlapping communities
- Recognise that some individuals sit on the fence
- Do not force them into one community or the other; identify them as overlapping
(Palla et al. 2005, Nature)

Palla algorithm
- Based on the k-clique principle: a community is composed of a number of k-cliques
- k-clique: a fully connected subgraph of k vertices
- Adjacent k-cliques share k-1 vertices
- Community: a series of adjacent k-cliques
Steps:
1. Find all k-cliques
2. Calculate the clique-clique overlap matrix
3. Define adjacent cliques
Issues (and advantages):
- k is user-defined; find the 'best' k by trial and error
- Works only on binary networks (weighted networks must first be transformed)

Simply the best method: the modularity matrix
A matrix? Let's eigenanalyse!
Rewrite the modularity coefficient for a division into two communities, with $s_i = +1$ or $-1$ depending on the community vertex $i$ is assigned to:
$Q = \frac{1}{4m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) s_i s_j$
where $A_{ij}$ is the adjacency matrix, $k_i$ the degree of vertex $i$ and $m$ the number of edges. The $s_i s_j$ term carries the community identification; $k_i k_j / 2m$ is the expected number of edges between $i$ and $j$ if links were distributed at random (Newman 2006, PNAS).

Modularity matrix
$B_{ij} = A_{ij} - \frac{k_i k_j}{2m}$
- Every row and every column of B sums to 0
- So (1, 1, 1, …) is always an eigenvector, with eigenvalue 0 (as for the graph Laplacian)
- The eigenvector of the largest (most positive) eigenvalue gives the best division into 2 communities (approximately, since each $s_i$ must be ±1): vertices with negative elements go in one community, positive in the other
- The magnitude of each eigenvector element tells us how well a vertex is classified (whether it belongs to the core or the periphery of its community)
[Figure: Zachary karate club]

Finding the best division
- Repeat the process on each subgraph
- Recalculate the modularity coefficient for the whole graph
- If the new division makes a zero or negative contribution to modularity, do not make it; otherwise continue

Power of the modularity matrix method
Different types of null model can be tested, as long as (1, 1, 1, …) remains an eigenvector with eigenvalue 0:
$Q = \frac{1}{4m} \sum_{ij} (A_{ij} - P_{ij}) s_i s_j$
where $P_{ij}$ is the expected number of edges between $i$ and $j$ under the null model. To do so, subtract the sum of each row from its diagonal element.

Uncertainty
- Bootstrapped algorithm: m results from the community algorithm
- Matrix: likelihood that 2 individuals belong to the same community
- Coarse-grains community identity
- Provides uncertainty and overlap

Software
- Girvan-Newman: Netdraw
- Modularity matrix: Socprog
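The leading-eigenvector method and the bootstrap co-membership matrix described above can be sketched with NumPy. This is a hedged illustration under stated assumptions, not the Socprog implementation: the adjacency matrix is symmetric and unweighted, the null model is $k_i k_j / 2m$, and the function names are hypothetical. Each subgraph is split using its own modularity matrix with the row sums subtracted from the diagonal, and a split is kept only if $\Delta Q = s^T B^{(g)} s / 4m$ is positive. `co_membership` assumes you already have one partition per bootstrap replicate; how the association data are resampled is not shown.

```python
# Minimal sketch (assumptions): symmetric unweighted adjacency matrix,
# k_i k_j / 2m null model, illustrative function names.
import numpy as np


def modularity_matrix(A):
    """B_ij = A_ij - k_i k_j / 2m for a symmetric adjacency matrix A."""
    k = A.sum(axis=1)
    two_m = k.sum()
    return A - np.outer(k, k) / two_m, two_m


def try_split(B, idx, tol=1e-10):
    """Bisect the vertices in idx with the leading eigenvector, or return None."""
    Bg = B[np.ix_(idx, idx)]
    # Subtract each row sum from the diagonal so that (1,1,...,1)
    # is an eigenvector with eigenvalue 0 for this subgraph too.
    Bg = Bg - np.diag(Bg.sum(axis=1))
    vals, vecs = np.linalg.eigh(Bg)  # eigenvalues in ascending order
    if vals[-1] <= tol:
        return None  # no division can increase Q
    s = np.where(vecs[:, -1] >= 0, 1.0, -1.0)  # split on the sign of each element
    if s @ Bg @ s <= tol:  # 4m times the change in Q; keep positive contributions only
        return None
    return idx[s > 0], idx[s < 0]


def leading_eigenvector(A):
    """Repeatedly bisect until no split increases modularity."""
    B, _ = modularity_matrix(A)
    pending, groups = [np.arange(len(A))], []
    while pending:
        idx = pending.pop()
        parts = try_split(B, idx)
        if parts is None:
            groups.append(idx)
        else:
            pending.extend(parts)
    return groups


def modularity(A, groups):
    """Q = (1/2m) * sum of B_ij over pairs of vertices in the same group."""
    B, two_m = modularity_matrix(A)
    return sum(B[np.ix_(g, g)].sum() for g in groups) / two_m


def co_membership(partitions, n):
    """Fraction of partitions (e.g. bootstrap replicates) placing i and j together."""
    C = np.zeros((n, n))
    for groups in partitions:
        for g in groups:
            g = np.asarray(g)
            C[np.ix_(g, g)] += 1
    return C / len(partitions)


if __name__ == "__main__":
    # Two triangles (0,1,2) and (3,4,5) joined by the edge 2-3
    A = np.zeros((6, 6))
    for u, v in [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]:
        A[u, v] = A[v, u] = 1
    groups = leading_eigenvector(A)
    print(sorted(sorted(g.tolist()) for g in groups), round(modularity(A, groups), 3))
    # → [[0, 1, 2], [3, 4, 5]] 0.357
```

On this toy graph the leading eigenvector separates the two triangles, and neither triangle is split further because its row-sum-corrected modularity matrix has no positive eigenvalue.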