Full Talk

Community Detection Algorithm and Community Quality Metric
Mingming Chen & Boleslaw K. Szymanski
Department of Computer Science, Rensselaer Polytechnic Institute

Community Structure
- Many networks display community structure: groups of nodes within which connections are denser than between them.
- Community detection algorithms
- Community quality metrics

Two Related Community Detection Topics
- Community detection algorithm
  - LabelRank: a stabilized label propagation community detection algorithm (Xie and Szymanski, 2013).
  - LabelRankT: an extension of LabelRank to dynamic networks (Xie, Chen, and Szymanski, 2013).
- A new community quality metric solving two problems of Modularity (M. E. J. Newman, 2006; Newman and Girvan, 2004).

LabelRank Algorithm
- Four operators applied to the labels:
  - Label propagation operator
  - Inflation operator
  - Cutoff operator
  - Conditional update operator
- Figure (diagram not preserved): four nodes answer the question "NP = P?". Node 1: No; Node 2: No; Node 3: No; Node 4: Yes.
  - In one case, $P_1(\text{No}) = 3/100$ and $P_1(\text{Yes}) = 97/100$, giving Node 1: Yes.
  - In the other, $P_1(\text{No}) = 3/4$ and $P_1(\text{Yes}) = 1/4$, giving Node 1: No.

Label Propagation Operator
- The operator is $W \times P$, where $W$ is the n x n weighted adjacency matrix and $P$ is the n x n label probability distribution matrix, composed of n row vectors $P_i$ (each 1 x n), one for each node.
- Each element $P_i(c)$ holds the current estimate of the probability of node $i$ observing label $c \in C$, where $C$ is the set of labels (here, suppose $C = \{1, 2, \ldots, n\}$).
- Ex. $P_i = (0.1, 0.2, \ldots, 0.05, \ldots)$
- To initialize $P$, each node is assigned a distribution of probabilities over all incoming edges:

  $$P_i(c) = \frac{w_{ic}}{\sum_{k \in Nb(i)} w_{ik}}, \quad \forall c \in C \ \text{s.t.}\ w_{ic} \neq 0.$$

- Each node receives the label probability distributions from its neighbors and computes the new distribution:

  $$P'_i(c) = \frac{\sum_{j \in Nb(i)} w_{ij} P_j(c)}{\sum_{k \in Nb(i)} w_{ik}}, \quad \forall c \in C.$$
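As a rough illustration of the initialization and propagation rules above, the following Python sketch applies them to a small 10-node graph. The graph itself, the self-loops (so that each node counts as its own neighbor), and the function names `init_distribution` and `propagate` are assumptions made for this example and are not taken from the talk.

```python
def init_distribution(W):
    """P_i(c) = w_ic / sum_k w_ik; labels c with w_ic == 0 get probability 0."""
    return [[w / sum(row) for w in row] for row in W]


def propagate(W, P):
    """One propagation step: P'_i(c) = sum_j w_ij * P_j(c) / sum_k w_ik."""
    n = len(W)
    return [
        [sum(W[i][j] * P[j][c] for j in range(n)) / sum(W[i]) for c in range(n)]
        for i in range(n)
    ]


# Hypothetical test graph: node 0 is linked to nodes 1, 2, 3,
# and each of those is linked to two leaf nodes.
edges = [(0, 1), (0, 2), (0, 3), (1, 4), (1, 5), (2, 6), (2, 7), (3, 8), (3, 9)]
n = 10
W = [[0.0] * n for _ in range(n)]
for u, v in edges:
    W[u][v] = W[v][u] = 1.0
for i in range(n):
    W[i][i] = 1.0  # assumed self-loop: a node also receives its own label

P = init_distribution(W)
P_next = propagate(W, P)
print(P[0])       # initial distribution of node 0
print(P_next[0])  # distribution of node 0 after one propagation step
```

With this graph, node 0 starts at (0.25, 0.25, 0.25, 0.25, 0, ..., 0) and moves to (0.25, 0.125, 0.125, 0.125, 0.0625, ..., 0.0625) after one step, which coincides with the P1 vectors in the talk's propagation example.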
Example of one propagation step for node 1:
- P1 = (0.25, 0.25, 0.25, 0.25, 0, 0, 0, 0, 0, 0)
- P2 = (0.25, 0.25, 0, 0, 0.25, 0.25, 0, 0, 0, 0)
- P3 = (0.25, 0, 0.25, 0, 0, 0, 0.25, 0.25, 0, 0)
- P4 = (0.25, 0, 0, 0.25, 0, 0, 0, 0, 0.25, 0.25)
- After propagation: P1 = (0.25, 0.125, 0.125, 0.125, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625)

Inflation Operator
- Each element $P_i(c)$ is raised to the power $in$:

  $$\Gamma_{in} P_i(c) = \frac{P_i(c)^{in}}{\sum_{j \in C} P_i(j)^{in}}$$

- It increases the probabilities of high-probability labels and decreases those of low-probability labels during label propagation.
- Example ($in = 2$): P1 = (0.25, 0.125, 0.125, 0.125, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625) becomes P1 = (0.129, 0.0323, 0.0323, 0.0323, 0.00806, 0.00806, 0.00806, 0.00806, 0.00806, 0.00806).

Cutoff Operator
- The cutoff operator $\Phi_r$ removes from $P$ the labels whose probability is below the threshold $r \in [0, 1]$. It is helped by the inflation operator, which lowers the probabilities of low-probability labels during propagation.
- $\Phi_r$ efficiently reduces the space complexity from quadratic to linear.
- Example ($r = 0.1$): P1 = (0.129, 0.0323, 0.0323, 0.0323, 0.00806, 0.00806, 0.00806, 0.00806, 0.00806, 0.00806) becomes P1 = (0.129).
- With r = 0.1, the average number of labels in each node is less than 3.

Conditional Update Operator
- At each iteration, node $i$ is updated only when it is significantly different from its incoming neighbors in terms of labels:

  $$\sum_{j \in Nb(i)} \text{isSubset}(C_i^*, C_j^*) \le q k_i,$$

  where $C_i^*$ is the set of maximum-probability labels at node $i$ at the previous step, $\text{isSubset}(s_1, s_2)$ returns 1 if $s_1 \subseteq s_2$ and 0 otherwise, $k_i$ is the degree of node $i$, and $q \in [0, 1]$.
- isSubset can be viewed as a measure of similarity between two nodes.

Effect of Conditional Update Operator
[Figure not preserved.]

Running time of LabelRank
- O(Tm), where m is the number of edges and T is the number of iterations.
- LabelRank is a linear-time algorithm.

Performance of LabelRank
[Figure not preserved.]

LabelRankT
- LabelRankT is LabelRank with one extra conditional update rule: only nodes involved in changes are updated.
- Changes are handled by comparing the neighbors of node $i$ at two consecutive steps, $Nb_{t-1}(i)$ and $Nb_t(i)$.

Two Problems of Modularity Maximization
- Splitting large communities
  - Favors small communities
- Resolution limit problem
  - Modularity optimization may fail to discover communities smaller than a certain scale, even when the communities are unambiguously defined.
  - This scale depends on the total number of edges in the network and the degree of interconnectedness of the communities.
  - Favors large communities
(Fortunato et al., 2008; Li et al., 2008; Arenas et al., 2008; Berry et al., 2009; Good et al., 2010; Ronhovde et al., 2010; Fortunato, 2010; Lancichinetti et al., 2011; Traag et al., 2011; Darst et al., 2013.)

Modularity
- Modularity (Q): the fraction of edges falling within communities minus the expected value of that fraction in an equivalent network with edges placed at random (M. E. J. Newman, 2006):

  $$Q = \frac{1}{2|E|} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2|E|} \right] \delta_{c_i, c_j}, \qquad \delta_{c_i, c_j} = \begin{cases} 1 & \text{if nodes } i \text{ and } j \text{ are in the same community,} \\ 0 & \text{otherwise.} \end{cases}$$

- Equivalent definition (Newman and Girvan, 2004):

  $$Q = \sum_{c_i \in C} \left[ \frac{|E^{in}_{c_i}|}{|E|} - \left( \frac{2|E^{in}_{c_i}| + |E^{out}_{c_i}|}{2|E|} \right)^2 \right],$$

  where $C$ is the set of communities, $|E^{in}_{c_i}|$ is the number of intra-community edges of community $c_i$, and $|E^{out}_{c_i}|$ is the number of inter-community edges of community $c_i$.

Modularity with Split Penalty
- Modularity (Q): the modularity of the community detection result:

  $$Q = \frac{1}{2|E|} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2|E|} \right] \delta_{c_i, c_j}.$$

- Split penalty (SP): the fraction of edges that connect nodes of different communities:

  $$SP = \frac{1}{2|E|} \sum_{ij} A_{ij} \left( 1 - \delta_{c_i, c_j} \right).$$

- $Q_s = Q - SP$: addresses Modularity's problem of favoring small communities:

  $$Q_s = Q - SP = \frac{1}{2|E|} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2|E|} \right] \delta_{c_i, c_j} - \frac{1}{2|E|} \sum_{ij} A_{ij} \left( 1 - \delta_{c_i, c_j} \right).$$
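The pairwise definitions of Q, SP and Qs can be evaluated directly on a small graph. The sketch below is a minimal, unoptimized Python version for unweighted, undirected graphs. The function name `q_sp_qs`, the helper `two_cliques`, and the test graph (two 4-cliques joined by two edges) are assumptions chosen for illustration.

```python
from itertools import combinations


def q_sp_qs(edges, community_of):
    """Return (Q, SP, Qs) computed from the pairwise sums over all node pairs (i, j)."""
    m = len(edges)
    nodes = list(community_of)
    degree = {node: 0 for node in nodes}
    adjacent = set()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        adjacent.add((u, v))
        adjacent.add((v, u))
    q = sp = 0.0
    for i in nodes:
        for j in nodes:
            a_ij = 1 if (i, j) in adjacent else 0
            if community_of[i] == community_of[j]:   # delta = 1
                q += a_ij - degree[i] * degree[j] / (2 * m)
            else:                                    # delta = 0
                sp += a_ij
    q /= 2 * m
    sp /= 2 * m
    return q, sp, q - sp


def two_cliques(num_inter_edges):
    """Hypothetical test graph: two 4-cliques joined by the given number of edges."""
    left, right = range(4), range(4, 8)
    inter = [(u, v) for u in left for v in right][:num_inter_edges]
    return list(combinations(left, 2)) + list(combinations(right, 2)) + inter


edges = two_cliques(2)
split = {node: node // 4 for node in range(8)}   # one community per clique
whole = {node: 0 for node in range(8)}           # everything in one community

q, sp, qs = q_sp_qs(edges, split)
print(round(q, 3), round(sp, 3), round(qs, 3))
print(q_sp_qs(edges, whole))
```

For the split partition this gives Q = 5/14, SP = 1/7 and Qs = 3/14 (about 0.357, 0.143 and 0.214). For the single-community partition all three values are 0, up to floating-point rounding.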
Qs with Community Density
- Resolution limit: Modularity optimization may fail to detect communities smaller than a certain scale.
- Intuitively, put density into Modularity and Split Penalty to solve the resolution limit problem:

  $$Q_{ds} = \frac{1}{2|E|} \sum_{ij} \left[ A_{ij} d_{c_i} - \frac{k_i k_j}{2|E|} d_{c_i}^2 \right] \delta_{c_i, c_j} - \frac{1}{2|E|} \sum_{ij} A_{ij} d_{c_i, c_j} \left( 1 - \delta_{c_i, c_j} \right),$$

  $$d_{c_i} = \frac{|E^{in}_{c_i}|}{|c_i| (|c_i| - 1) / 2}, \qquad d_{c_i, c_j} = \frac{|E_{c_i, c_j}|}{|c_i| |c_j|}.$$

- Equivalent definition:

  $$Q_{ds} = \sum_{c_i \in C} \left[ \frac{|E^{in}_{c_i}|}{|E|} d_{c_i} - \left( \frac{2|E^{in}_{c_i}| + |E^{out}_{c_i}|}{2|E|} d_{c_i} \right)^2 - \sum_{c_j \in C,\, c_j \neq c_i} \frac{|E_{c_i, c_j}|}{2|E|} d_{c_i, c_j} \right]$$

Example of Two Well-Separated Communities

| Partition | Modularity (Q) | Split Penalty (SP) | Qs = Q - SP | Qds |
|---|---|---|---|---|
| 2 communities | 0.5 | 0 | 0.5 | 0.5 |
| 1 community | 0 | 0 | 0 | 0.245 |

Example of Two Weakly Connected Communities

| Partition | Modularity (Q) | Split Penalty (SP) | Qs = Q - SP | Qds |
|---|---|---|---|---|
| 2 communities | 0.357 | 0.143 | 0.214 | 0.339 |
| 1 community | 0 | 0 | 0 | 0.25 |

Ambiguity between One and Two Communities

| Partition | Modularity (Q) | Split Penalty (SP) | Qs = Q - SP | Qds |
|---|---|---|---|---|
| 2 communities | 0.3 | 0.2 | 0.1 | 0.263 |
| 1 community | 0 | 0 | 0 | 0.249 |

Ambiguity between One and Two Communities

| Partition | Modularity (Q) | Split Penalty (SP) | Qs = Q - SP | Qds |
|---|---|---|---|---|
| 2 communities | 0.25 | 0.25 | 0 | 0.188 |
| 1 community | 0 | 0 | 0 | 0.245 |

Example of One Well Connected Community

| Partition | Modularity (Q) | Split Penalty (SP) | Qs = Q - SP | Qds |
|---|---|---|---|---|
| 2 communities | 0.167 | 0.333 | -0.167 | 0.0417 |
| 1 community | 0 | 0 | 0 | 0.23 |

Example of One Very Well Connected Community

| Partition | Modularity (Q) | Split Penalty (SP) | Qs = Q - SP | Qds |
|---|---|---|---|---|
| 2 communities | 0.0455 | 0.455 | -0.409 | -0.239 |
| 1 community | 0 | 0 | 0 | 0.168 |

Example of One Complete Graph
Community quality on a complete graph with 8 nodes:

| Partition | Modularity (Q) | Split Penalty (SP) | Qs = Q - SP | Qds |
|---|---|---|---|---|
| 2 communities | -0.0714 | 0.571 | -0.643 | -0.643 |
| 1 community | 0 | 0 | 0 | 0 |

Modularity Has Nothing to Do with #Nodes

$$Q(\text{clique}) = Q(\text{tree}) = 2 \left[ \frac{12}{26} - \left( \frac{13}{26} \right)^2 \right] = 0.4231;$$

$$Q_s(\text{clique}) = Q_s(\text{tree}) = 2 \left[ \frac{12}{26} - \left( \frac{13}{26} \right)^2 - \frac{1}{26} \right] = 0.3462;$$
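$$Q_{ds}(\text{clique}) = 2 \left[ \frac{12}{26} \cdot 1 - \left( \frac{13}{26} \cdot 1 \right)^2 - \frac{1}{26} \cdot \frac{1}{4 \cdot 4} \right] = 0.4183;$$

$$Q_{ds}(\text{tree}) = 2 \left[ \frac{12}{26} \cdot \frac{2}{7} - \left( \frac{13}{26} \cdot \frac{2}{7} \right)^2 - \frac{1}{26} \cdot \frac{1}{7 \cdot 7} \right] = 0.2214.$$

5-clique Example

| Partition | Modularity (Q) | Split Penalty (SP) | Qs = Q - SP | Qds |
|---|---|---|---|---|
| 30 communities | 0.8758 | 0.09091 | 0.7848 | 0.8721 |
| 15 communities | 0.8879 | 0.04545 | 0.8424 | 0.4305 |

$\Delta Q_s = 0.8424 - 0.7848 = 0.0576 > \Delta Q = 0.8879 - 0.8758 = 0.0121$

Thanks! Q&A

Example of Two Weakly Connected Communities

| Partition | Modularity (Q) | Split Penalty (SP) | Qs = Q - SP | Qds |
|---|---|---|---|---|
| 2 communities | 0.309 | 0.25 | 0.0586 | 0.264 |
| 1 community | -0.00586 | 0.125 | -0.131 | 0.202 |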
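The per-community form of Qds is straightforward to check numerically. The Python sketch below is one possible reading of that formula for unweighted, undirected graphs. The function names, the convention that a single-node community has density 0, and the two test graphs (two 4-cliques joined by a few edges, and a ring of thirty 5-cliques) are assumptions: the talk's figures are not preserved, so these graphs were chosen only because they are consistent with the reported values.

```python
from collections import defaultdict
from itertools import combinations


def qds(edges, community_of):
    """Q_ds summed over communities, using intra- and inter-community densities."""
    m = len(edges)
    size = defaultdict(int)
    for c in community_of.values():
        size[c] += 1
    e_in = defaultdict(int)        # intra-community edge counts
    e_between = defaultdict(int)   # edge counts per unordered pair of communities
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        if cu == cv:
            e_in[cu] += 1
        else:
            e_between[frozenset((cu, cv))] += 1
    total = 0.0
    for c, n_c in list(size.items()):
        pairs = n_c * (n_c - 1) / 2
        d_c = e_in[c] / pairs if pairs else 0.0   # assumption: singleton density is 0
        e_out = 0
        split = 0.0
        for pair, count in e_between.items():
            if c in pair:
                (other,) = pair - {c}
                e_out += count
                split += count / (2 * m) * (count / (n_c * size[other]))
        total += e_in[c] / m * d_c - ((2 * e_in[c] + e_out) / (2 * m) * d_c) ** 2 - split
    return total


def two_cliques(num_inter_edges):
    """Hypothetical graph: two 4-cliques joined by the given number of edges."""
    left, right = range(4), range(4, 8)
    inter = [(u, v) for u in left for v in right][:num_inter_edges]
    return list(combinations(left, 2)) + list(combinations(right, 2)) + inter


def ring_of_cliques(num_cliques, clique_size):
    """Hypothetical graph: cliques arranged in a ring, one edge between neighbors."""
    edges = []
    for i in range(num_cliques):
        members = [(i, j) for j in range(clique_size)]
        edges += list(combinations(members, 2))
        edges.append(((i, 0), ((i + 1) % num_cliques, 1)))
    return edges


split = {node: node // 4 for node in range(8)}
whole = {node: 0 for node in range(8)}
qds_weak_split = qds(two_cliques(2), split)
qds_weak_whole = qds(two_cliques(2), whole)

ring = ring_of_cliques(30, 5)
ring_nodes = {node for edge in ring for node in edge}
qds_30 = qds(ring, {node: node[0] for node in ring_nodes})        # one community per clique
qds_15 = qds(ring, {node: node[0] // 2 for node in ring_nodes})   # adjacent cliques merged
print(qds_weak_split, qds_weak_whole, qds_30, qds_15)
```

Under these assumptions the sketch gives roughly 0.339 and 0.25 for the two 4-cliques joined by two edges, and roughly 0.8721 and 0.4305 for the ring of cliques with 30 and 15 communities, in line with the corresponding rows of the tables above.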
