Community Structure in Networks: Practice and Significance Elizabeth Leicht Research Fellow CABDyN Complexity Centre 7 January 2011 Learning from Networks EMOTIONS MAPPED BY NEW GEOGRAPHY New York Times 1857; Apr 3, 1933; ProQuest Historical Newspapers The New York Times (1851 - 2004) pg. 17 Friendship map of students in a 7th grade class–adapted from Who Shall Survive, Jacob Moreno, 1934. Dealing with large networks Detecting communities in networks Girvan & Newman J. Moreno Adamic & Glance Calculating modularity M. E. J. Newman PNAS 103, 8577 (2006). E. A. Leicht and M. E. J. Newman Phys. Rev. Lett. 100, 118703, (2008). n 1 X Q= [Aij − Pij ] δci ,cj m i,j=1 1, if there is an edge from j to i . 0, otherwise Pij = the expected number of edges from j to i. ci = the community to which i belongs. Aij = What is the expected number of edges between two nodes? kiin kjout kiin kjout Pij = m Division of a network into two communities si = +1 sj = −1 # " n kiin kjout 1 X Q= (si sj +1) Aij − 2m ij m h Let Bij = Aij − matrix. Q= kiin kjout m i 1 δij = (si sj + 1) 2 be an element of the modularity 1 T 1 T T 1 T s Bs = s B s= s B + BT s 2m 2m 4m Approximate group ID by the sign of the entry for the node in the leading eigenvector, v(1) . ( si = +1, −1, (1) if vi > 0 (1) if vi < 0 Two communities and more Friendship network from 7th grade class divided into two communities by method. Communities with bias in edge direction Construct a network of n nodes and connect pairs of nodes with probability p. Allow random edge direction for intra-community edges. Bias edge direction for inter-community edges. Communities with bias in edge direction Allowing directed edges 1 Ignoring directed edges1 M. E. J. Newman PNAS 103, 8577 (2006). Edge direction bias in real networks Pennsylvania State Pennsylvania State Northwestern Minnesota Iowa Ohio State Michigan Michigan Illinois Michigan State Wisconsin Michigan State Illinois Ohio State Northwestern Wisconsin Minnesota Iowa Purdue Indiana Accounting for win-loss result Purdue Indiana Tracking only games played American football games among US “Big Ten” schools with directed edges from losing team to winning team. Exploratory analysis of networks structure M. E. J. Newman and E. A. Leicht PNAS 104, 9564-9569, (2007.) a b Group identity inferred from network structure. A pattern for edges is not pre-determined. Method: the data and the model Data Observed: network edges, Aij ∀i, j. Missing: group identity of each node, gi ∀i. Model parameters θri : probability there exists an edge from a node in (group) r to a node i. n X θri = 1 i=1 πr : probability a randomly selected node ∈ (group) r . n X i=1 πi = 1 A likelihood problem The likelihood of the data given the model is, Pr(A, g |π, θ) = Pr(A|g , π, θ) Pr(g |π, θ) where Pr(A|g , π, θ) = Y A θgjij,i and Pr(g |π, θ) = ij Y π gj j Frequently, one works not with the likelihood itself, but with the log-likelihood, " # X Y A L = ln Pr(A, g |π, θ) = ln πgj + θgjij,i j i Dealing with missing data We cannot directly observe g . We can calculate an expected value for the log-likelihood over all possible values of g . L= = c X c i Xh X X ln πgi + ... Pr(g |A, π, θ) Aij ln θgi ,j g1 =1 gn =1 X h ir qir ln πr + i X Aij ln θrj j i j where Q A πr j θrj ij Pr(A, gi = r |π, θ) =P qir = Pr(gi = r |A, π, θ) = Q Aij Pr(A|π, θ) s πs j θsj An iterative method–the EM algorithm Initialize model parameters (θ, π) with random values. Find the probability a given node i is a member of group r (E-step). Q A πr j θrj ij qir = P Q Aij . s πs j θsj Maximize the model parameter (M-step) P Aij qir 1X πr = qir , θrj = Pi , n i i ki qir Iterate until convergence. Zachary karate club Disassortative word network Assortative & disassortative structure assortative disassortative success 1 0.5 this paper max. modularity min. modularity 0 0.1 1 probability ratio pout /pin 10 Keystone network Modularity method EM method We assign nodes to groups based on the set of keystone nodes to which they are connected. “Big Ten” results with EM approach Node size is proportional to the probability of the team losing to teams assigned to group 1. Node shading corresponds to the probability that the node is assigned to group 1. qj1 → Two methods for one network Pennsylvania State Northwestern Minnesota Iowa Illinois Purdue Ohio State Michigan Wisconsin Michigan State Indiana Purdue Indiana Wisconsin Illinois Iowa Minesota Pennsylvania Michigan State Northwestern Michigan Ohio State Summary There are many existing methods for detecting structure in complex. Moving forward we need to focus on improving our understanding of what these structures indicate in real networks. Friendship map of students in a 7th grade class–adapted from Who Shall Survive, Jacob Moreno, 1934.