
Algorithms for Community Detection in Large Networks

(And guidelines on CS3230R)

Leong Hon Wai (梁汉槐)

Department of Computer Science

National University of Singapore

leonghw@comp.nus.edu.sg

http://www.comp.nus.edu.sg/~leonghw/

Hon Wai Leong, NUS

CS3230R Talk: 13 Feb 2014

(… Finding Communities) Page 1

© Leong Hon Wai, 2013

For CS3230R

* Choose community detection (CD) algorithm(s)

* Check availability of code

* READ and understand the chosen algorithm

* Quick survey of CLOSELY-related algorithms

* Prepare implementation, testing, and evaluation

* Prepare report

* Prepare presentation

Hon Wai Leong, Computer Science, NUS

© Leong Hon Wai

(PPI Complex Detection, Sep 2013) Page 2

CS3230 Talks

* Need a talk on testing of CD algorithms

* Schedule

  * 20-Feb (Wk 6): Discussion and choosing topics

  * 27-Feb (Break): no talk

  * 06-Mar (Wk 7): Feedback, plan

  * 13-Mar (Wk 8): Davin, WenBo

  * 20-Mar (Wk 9): Yujian, Darius

  * 27-Mar (Wk 10): ??

  * 03-Apr (Wk 11): ??


Large Real-World Networks

* Internet graphs, WWW graphs

* Citation networks, actor networks

* Transportation networks, email networks

* Food webs

* Social networks (FB, LinkedIn, etc.)

* Biochemical networks

* Protein-Protein Interaction (PPI) networks


Community Structure (example)


Community Structure

“groups of vertices with dense intra-group connections, and sparse inter-group connections.”

* Within-group (intra-group) edges: high density.

* Between-group (inter-group) edges: low density.
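The definition above can be checked numerically. The sketch below is only an illustration, not part of the original slides: the function name `group_densities`, the toy graph, and the choice of "density = edges present / vertex pairs possible" are all assumptions made for the example.

```python
from itertools import combinations


def group_densities(edges, group_of):
    """Return (intra_density, inter_density) for a grouping of the vertices.

    edges    : iterable of (u, v) pairs of an undirected simple graph
    group_of : dict mapping every vertex to its group label
    Density here means: edges present / vertex pairs that could be linked.
    """
    edge_set = {frozenset(e) for e in edges}
    intra_pairs = inter_pairs = 0
    intra_edges = inter_edges = 0
    for u, v in combinations(group_of, 2):
        linked = frozenset((u, v)) in edge_set
        if group_of[u] == group_of[v]:
            intra_pairs += 1
            intra_edges += linked
        else:
            inter_pairs += 1
            inter_edges += linked
    intra = intra_edges / intra_pairs if intra_pairs else 0.0
    inter = inter_edges / inter_pairs if inter_pairs else 0.0
    return intra, inter


# Hypothetical toy graph: two triangles joined by a single edge (2, 3).
toy_edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
toy_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
intra_density, inter_density = group_densities(toy_edges, toy_groups)
print(intra_density, inter_density)
```

On this toy graph every within-group pair is linked, while only one of the nine between-group pairs is, which is the "dense inside, sparse between" pattern the quote describes.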


Examples of Community Structures

* Communities of a biochemical network might correspond to “functional units” of some kind.

* Communities of a web graph might correspond to sets of “web sites dealing with related topics”.


Community Structure (example)


Where is the Rabbit (Sept 2013)

Typhoon Usagi (ウサギ, rabbit)

(16-24 Sept 2013)

http://en.wikipedia.org/wiki/Typhoon_Usagi_(2013)

Outline of Talk

* Large Networks are Everywhere

* Community Detection: A Quick Overview

* Application in Computational Biology

* Protein Complex Detection

* Specialized Algorithms

* Performance Evaluation

* Challenges and Conclusion


http://www.cscs.umich.edu/~crshalizi/notebooks/community-discovery.html

THERE ARE MANY WAYS TO SKIN A CAT...

THERE ARE EVEN MORE WAYS TO FIND COMMUNITIES IN NETWORKS...

Recommended:

* Aaron Clauset, "Finding local community structure in networks", physics/0503036 = Physical Review E 72 (2005): 026132 [Clever; but then, Aaron is clever.]

* Aaron Clauset, M. E. J. Newman and Cristopher Moore, "Finding Community Structure in Very Large Networks", cond-mat/0408187 = Physical Review E 70 (2004): 066111

* J.-J. Daudin, F. Picard and S. Robin, "A Mixture Model for Random Graphs", Statistics and Computing 18 (2008): 173--183

* Michelle Girvan and M. E. J. Newman, "Community structure in social and biological networks", cond-mat/0112110 = Proceedings of the National Academy of Sciences (USA) 99 (2002): 7821--7826

* Roger Guimera, Marta Sales-Pardo and Luis A. N. Amaral, "Modularity from Fluctuations in Random Graphs", cond-mat/0403660 = Physical Review E 70 (2004): 025101

* Jake M. Hofman, Chris H. Wiggins, "A Bayesian Approach to Network Modularity", arxiv:0709.3512 [For "Bayesian", read "smoothed maximum likelihood". But nonetheless: cool.]

* Andrea Lancichinetti, Santo Fortunato, Janos Kertesz, "Detecting the overlapping and hierarchical community structure of complex networks", arxiv:0802.1218 [An interesting approach, but not quite as novel as they claim --- cf. Reichardt and Bornholdt --- and I'd really like to see more evidence of superior accuracy and/or robustness]

* E. A. Leicht, M. E. J. Newman, "Community structure in directed networks", arxiv:0709.4500

* M. E. J. Newman
  * "Modularity and community structure in networks", physics/0602124 = Proceedings of the National Academy of Sciences (USA) 103 (2006): 8577--8582
  * "Finding community structure in networks using the eigenvectors of matrices", Physical Review E 74 (2006): 036104 = physics/0605087

* M. E. J. Newman and Michelle Girvan
  * "Mixing patterns and community structure in networks", cond-mat/0210146
  * "Finding and evaluating community structure in networks", Physical Review E 69 (2004): 026113 = cond-mat/0308217

* Jörg Reichardt and Stefan Bornholdt [Code is available by e-mail from Reichardt, who was very helpful to me when I needed to implement their algorithm.]
  * "Detecting Fuzzy Community Structures in Complex Networks with a Potts Model", Physical Review Letters 93 (2004): 218701 = cond-mat/0402349
  * "Statistical Mechanics of Community Detection", cond-mat/0603718 = Physical Review E 74 (2006): 016110
  * "Clustering of sparse data via network communities - a prototype study of a large online market", Journal of Statistical Mechanics: Theory and Experiment (2007): P06016

* Jörg Reichardt and Douglas R. White, "Role models for complex networks", arxiv:0708.0958 [Discussion]

* M. Sales-Pardo, R. Guimera, A. Moreira, L. Amaral, "Extracting the hierarchical organization of complex systems", arxiv:0705.1679

Modesty forbids me to recommend:

* CRS, Marcelo F. Camperi and Kristina Lisa Klinkner, "Discovering Functional Communities in Dynamical Networks", q-bio.NC/0609008

To read:

* Edoardo M. Airoldi, David M. Blei, Stephen E. Fienberg and Eric P. Xing, "Mixed membership stochastic blockmodels", arxiv:0705.4485

* Nelson Augusto Alves, "Unveiling community structures in weighted networks", physics/0703087

* Leonardo Angelini, Stefano Boccaletti, Daniele Marinazzo, Mario Pellicoro, and Sebastiano Stramaglia, "Fast identification of network modules by optimization of ratio association", cond-mat/0610182

* L. Angelini, D. Marinazzo, M. Pellicoro and S. Stramaglia, "Natural clustering: the modularity

* Jim Bagrow and Erik Bollt, "A Local Method for Detecting Communities", cond-mat/0412482

* James Bagrow, Erik Bollt, Luciano da F. Costa, "Network Structure Revealed by Short Cycles", cond-mat/0612502

* S. Boccaletti, M. Ivanchenko, V. Latora, A. Pluchino and A. Rapisarda, "Dynamical clustering methods to find community structures", physics/0607179

* Michael James Bommarito II, Daniel Martin Katz, Jon Zelner, "On the Stability of Community Detection Algorithms on Longitudinal Citation Data", arxiv:0908.0449

* U. Brandes, D. Delling, M. Gaertler, R. Goerke, M. Hoefer, Z. Nikoloski, and D. Wagner, "Maximizing Modularity is hard", physics/0608255 [i.e., maximizing Newman's Q is NP hard. I haven't read beyond the abstract yet, so I don't know if they address the question of what makes it hard in the hard cases, and whether those are properties we should expect to see in real-world networks. Conceivably, actual social networks are, on average, easy to modularize...]

* Andrea Capocci, Vito D. P. Servedio, Guido Caldarelli, Francesca Colaiori, "Detecting communities in large networks", cond-mat/0402499

* Horacio Castellini and Lilia Romanelli, "Social network from communities of electronic mail", nlin.CD/0509021

* Leon Danon, Albert Díaz-Guilera, and Alex Arenas, "The effect of size heterogeneity on community identification in complex networks", Journal of Statistical Mechanics: Theory and Experiment (2006): P11010 = physics/0601144

* Leon Danon, Albert Díaz-Guilera, Jordi Duch and Alex Arenas, "Comparing community structure identification", Journal of Statistical Mechanics: Theory and Experiment (2005): P09008 = cond-mat/0505245

* Jordi Duch and Alex Arenas, "Community detection in complex networks using extremal optimization", Physical Review E 72 (2005): 027104

* Illes J. Farkas, Daniel Abel, Gergely Palla, Tamas Vicsek, "Weighted network modules", cond-mat/0703706

* Sam Field, Kenneth A. Frank, Kathryn Schiller, Catherine Riegle-Crumb and Chandra Muller, "Identifying positions from affiliation networks: Preserving the duality of people and events", Social Networks 28 (2006): 97--123

* G. W. Flake, S. R. Lawrence, C. L. Giles and F. M. Coetzee, "Self-organization and identification of Web communities", IEEE Computer 36 (2002): 66--71

* Santo Fortunato, "Community detection in graphs", arxiv:0906.0612

* Santo Fortunato and Marc Barthélemy, "Resolution limit in community detection", physics/0607100 = Proceedings of the National Academy of Sciences (USA) 104 (2007): 36--41

* Santo Fortunato and Claudio Castellano, "Community Structure in Graphs", arxiv:0712.2716 [Review paper; thanks to Ed Vielmetti for the pointer]

* Santo Fortunato, Vito Latora and Massimo Marchiori, "A Method to Find Community Structures Based on Information Centrality", cond-mat/0402522

* Kenneth A. Frank, "Identifying Cohesive Subgroups", Social Networks 17 (1995): 27--56

* David Gfeller, Jean-Cédric Chappelier, and Paolo De Los Rios, "Finding instabilities in the community structure of complex networks", Physical Review E 72 (2005): 056135

* Rumi Ghosh, Kristina Lerman, "Structure of Heterogeneous Networks", arxiv:0906.2212

* V. Gol'dshtein and G. A. Koganov, "An indicator for community structure", physics/0607159

* Mark S. Handcock, Adrian E. Raftery and Jeremy Tantrum, "Model-Based Clustering for Social Networks", Journal of the Royal Statistical Society A 170 (2007): 301--354 [PDF preprint]

* M. B. Hastings, "Community detection as an inference problem", Physical Review E 74 (2006): 035102 = cond-mat/0604429

* Erik Holmström, Nicolas Bock and Joan Brännlund, "Density Analysis of Network Community Divisions", cond-mat/0608612

* I. Ispolatov, I. Mazo, A. Yuryev, "Finding mesoscopic communities in sparse networks", q-bio.MN/0512038 = Journal of Statistical Mechanics (2006): P09014

* Brian Karrer, Elizaveta Levina, M. E. J. Newman, "Robustness of community structure in networks", arxiv:0709.2108

* Jussi M. Kumpula, Jari Saramaki, Kimmo Kaski, and Janos Kertesz, "Resolution limit in complex network community detection with Potts model approach", cond-mat/0610370

* Andrea Lancichinetti, Santo Fortunato, "Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities", arxiv:0904.3940

* Sune Lehmann, Martin Schwartz, Lars Kai Hansen, "Bi-clique Communities", arxiv:0710.4867

* Michele Leone, Sumedha, Martin Weigt, "Clustering by soft-constraint affinity propagation: Applications to gene-expression data", arxiv:0705.2646

* Claire P. Massen, Jonathan P. K. Doye, "Thermodynamics of Community Structure", cond-mat/0610077

* Ian X.Y. Leung, Pan Hui, Pietro Lio', Jon Crowcroft, "Towards Real Time Community Detection in Large Networks", arxiv:0808.2633

* Stefanie Muff, Francesco Rao, and Amedeo Caflisch, "Local modularity measure for network clusterizations", Physical Review E 72 (2005): 056107

* Andreas Noack, "Modularity clustering is force-directed layout", arxiv:0807.4052

* Gergely Palla, Imre Derenyi, Illes Farkas and Tamas Vicsek, "Uncovering the overlapping community structure of complex networks in nature and society", Nature 435 (2005): 814--818 = physics/0506133

* Gergely Palla, Illes J. Farkas, Peter Pollner, Imre Derenyi, Tamas Vicsek, "Directed network modules", physics/0703248

* Nicolas Pissard and Houssem Assadi, "Detecting overlapping communities in linear time with P&A algorithm", physics/0509254

* Pascal Pons, "Post-Processing Hierarchical Community Structures: Quality Improvements and Multi-scale View", cs.DS/0608050

* Mason A. Porter, Jukka-Pekka Onnela, Peter J. Mucha, "Communities in Networks", arxiv:0902.3788

* Josep M. Pujol, Javier Béjar, and Jordi Delgado, "Clustering algorithm for determining community structure in large networks", Physical Review E 74 (2006): 016107

* Francisco A. Rodrigues, Gonzalo Travieso, Luciano da F. Costa, "Fast Community Identification by Hierarchical Growth", physics/0602144

* Huaijun Qiu and Edwin R. Hancock, "Graph matching and clustering using spectral partitions", Pattern Recognition 39 (2006): 22--34 [In this context, for the ideas on hierarchical decomposition, which sounds like it might work for community discovery, if in fact it's not equivalent to some existing community-discovery algorithm.]

* Usha Nandini Raghavan, Reka Albert, Soundar Kumara, "Near linear time algorithm to detect community structures in large-scale networks", arxiv:0709.2938 ["every node is initialized with a unique label and at every step each node adopts the label that most of its neighbors currently have"]

* Jörg Reichardt and Stefan Bornholdt, "When are networks truly modular?", cond-mat/0606220

* Jörg Reichardt and Michele Leone, "(Un)detectable cluster structure in sparse networks", arxiv:0711.1452

* Martin Rosvall and Carl T. Bergstrom
  * "An information-theoretic framework for resolving community structure in complex networks", physics/0612035 [Or, MDL to the rescue!]
  * "Maps of Information Flow Reveal Community Structure In Complex Networks" [Thanks to Martin and Carl for a preprint]

* Erin N. Sawardecker, Marta Sales-Pardo, Luís A. Nunes Amaral, "Detection of node group membership in networks with group overlap", arxiv:0812.1243

* Chayant Tantipathananandh, Tanya Berger-Wolf and David Kempe, "A Framework For Community Identification in Dynamic Social Networks" [PDF]

* Joshua R. Tyler, Dennis M. Wilkinson and Bernardo A. Huberman, "Email as Spectroscopy: Automated Discovery of Community Structure within Organizations", cond-mat/0303264

* I. Vragovic and E. Louis, "Network community structure and loop coefficient method", Physical Review E 74 (2006): 016105

* Huijie Yang, Wenxu Wang, Tao Zhou, Binghong Wang and Fangcui Zhao, "Reconstruct the Hierarchical Structure in a Complex Network", physics/0508026 ["Based upon the eigenvector centrality (EC) measure, a method is proposed to reconstruct the hierarchical structure of a complex network. It is tested on the Santa Fe Institute collaboration network, whose structure is well known."]

* Haijun Zhou
  * "Distance, dissimilarity index, and network community structure", physics/0302032
  * "Network Landscape from a Brownian Particle's Perspective", physics/0302030

* Etay Ziv, Manuel Middendorf and Chris Wiggins, "An Information-Theoretic Approach to Network Modularity", q-bio.QM/0411033
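One rule quoted in the list above (the Raghavan, Albert and Kumara entry) is concrete enough to try out: every node starts with a unique label and repeatedly adopts the label most common among its neighbours. The sketch below is a minimal illustration under assumptions, not the authors' code: the function name, the random visiting order, the random tie-breaking, the fixed `seed`, and the `max_sweeps` cap are all choices made here for the example.

```python
import random
from collections import Counter


def label_propagation(adj, seed=0, max_sweeps=100):
    """Assumed sketch of label propagation on an undirected graph.

    adj : dict mapping each node to the set of its neighbours
    Returns a dict node -> label; nodes sharing a label form a community.
    """
    rng = random.Random(seed)
    labels = {v: v for v in adj}          # every node starts with a unique label
    nodes = list(adj)
    for _ in range(max_sweeps):
        rng.shuffle(nodes)                # visit nodes in random order
        changed = False
        for v in nodes:
            if not adj[v]:
                continue                  # isolated node keeps its own label
            counts = Counter(labels[w] for w in adj[v])
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            if labels[v] not in candidates:
                labels[v] = rng.choice(candidates)   # break ties at random
                changed = True
        if not changed:                   # every node already agrees with its neighbours
            break
    return labels


# Hypothetical toy input: two separate triangles.
two_triangles = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1},
    3: {4, 5}, 4: {3, 5}, 5: {3, 4},
}
labels = label_propagation(two_triangles)
print(labels)
```

Which label survives in each triangle depends on the random order, so only the grouping (not the label values) is meaningful. On connected graphs the outcome can differ between runs.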

Largest component of SFI collaborations

Add Health Data


Goal is to minimize R

Adjacency Matrix

Families of Community Finding Methods / Algorithms

1. DIVISIVE METHODS

When do you stop cutting?

Modularity

Newman, Girvan (2004)

e_ij counts the links between community i and community j; in Newman and Girvan's definition of modularity it is expressed as a fraction of all links in the network.
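For reference, the Newman-Girvan modularity is commonly written as

$$Q = \sum_i \left( e_{ii} - a_i^2 \right), \qquad a_i = \sum_j e_{ij},$$

where e_ii is the fraction of links inside community i and a_i is the fraction of link ends attached to community i. The sketch below computes Q for a given division, as one possible answer to "when do you stop cutting?" (keep the division with the highest Q). The function name and toy graph are assumptions for illustration.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman-Girvan modularity Q of a division of an undirected graph.

    edges        : list of (u, v) pairs, each undirected link listed once
    community_of : dict mapping every node to its community label
    """
    m = len(edges)
    inside = defaultdict(int)     # number of links with both ends in community c
    ends = defaultdict(int)       # number of link ends attached to community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        ends[cu] += 1
        ends[cv] += 1
        if cu == cv:
            inside[cu] += 1
    # e_cc = inside[c] / m, a_c = ends[c] / (2m)
    return sum(inside[c] / m - (ends[c] / (2 * m)) ** 2 for c in ends)


# Hypothetical toy graph: two triangles joined by the link (2, 3).
toy_edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
lumped = {v: "A" for v in range(6)}
q_split = modularity(toy_edges, split)
q_lumped = modularity(toy_edges, lumped)
print(q_split, q_lumped)
```

Here the two-triangle division gives Q = 5/14, while putting everything in one community gives Q = 0, so the split is preferred.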

It is important to recalculate

Newman, Girvan (2004)
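A minimal sketch of one divisive step in the style of Girvan and Newman, assuming the recalculation mentioned above refers to recomputing edge betweenness after every link removal. Everything here (function names, the pure-Python breadth-first betweenness, stopping as soon as the graph splits once) is an illustrative assumption, not the authors' implementation.

```python
from collections import defaultdict, deque


def edge_betweenness(adj):
    """Shortest-path edge betweenness for an unweighted, undirected graph."""
    score = defaultdict(float)
    for s in adj:
        dist, sigma, preds, order = {s: 0}, defaultdict(float), defaultdict(list), []
        sigma[s] = 1.0
        queue = deque([s])
        while queue:                              # BFS counting shortest paths
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = defaultdict(float)
        for w in reversed(order):                 # accumulate path credit backwards
            for v in preds[w]:
                credit = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += credit
                delta[v] += credit
    return score


def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        comps.append(comp)
    return comps


def divisive_split(edges):
    """Remove highest-betweenness links until the graph gains a component."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = len(components(adj))
    while len(components(adj)) == start:
        score = edge_betweenness(adj)             # recalculated after every removal
        if not score:
            break                                 # no links left to cut
        u, v = max(score, key=score.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Hypothetical toy graph: two triangles joined by the link (2, 3).
toy_edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
parts = divisive_split(toy_edges)
print(parts)
```

Repeating `divisive_split` on each resulting part would give the full hierarchy of divisions.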


Families of Community Finding Methods / Algorithms

2. CLIQUE PERCOLATION METHODS

Want to use the Clique Percolation Method? Just google “CFinder”.

Also available online: just google “BCFinder”.
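To see the idea behind clique percolation without the tools named above, here is a brute-force sketch for small graphs. It assumes the usual formulation: two k-cliques are adjacent when they share k-1 nodes, and a community is the union of a chain of adjacent k-cliques. It is not how CFinder is implemented; the function name and toy graph are made up for the example.

```python
from itertools import combinations


def k_clique_communities(edges, k):
    """Brute-force clique percolation; only suitable for small graphs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Enumerate every k-clique by testing all k-node subsets.
    cliques = [
        frozenset(nodes)
        for nodes in combinations(sorted(adj), k)
        if all(b in adj[a] for a, b in combinations(nodes, 2))
    ]

    # Union-find over cliques: merge cliques that share k-1 nodes.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    communities = {}
    for i, clique in enumerate(cliques):
        communities.setdefault(find(i), set()).update(clique)
    return list(communities.values())


# Hypothetical toy graph: triangles {0,1,2} and {1,2,3} share a link,
# while triangle {3,4,5} touches them only at node 3.
toy_edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]
communities = k_clique_communities(toy_edges, 3)
print(communities)
```

Node 3 ends up in both communities, which shows that this family of methods allows overlapping communities.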

Families of Community Finding Methods / Algorithms

3. LINK CLUSTERING METHODS

COMMUNITY:

“a group of densely interconnected nodes”

Y.-Y. Ahn, J. P. Bagrow, and S. Lehmann, "Communities and Hierarchical Organization of Links in Complex Networks", submitted, arxiv:0903.3178.

COMMUNITY:

“a group of TOPOLOGICALLY SIMILAR LINKS”

[Figure sequence: a network whose links are grouped and labelled Colleagues, Family, and Friends; later panels highlight the ‘Friends’ links and the ‘Nerds & geeks’ links.]

Node: multiple membership

Links: (almost) unique membership
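A small sketch of the link-clustering idea, assuming the similarity used by Ahn, Bagrow and Lehmann: for two links (i, k) and (j, k) sharing node k, the similarity is the Jaccard index of the inclusive neighbourhoods of i and j (each node's neighbours plus the node itself). The fixed `threshold` cut and the function names are assumptions for illustration. The paper selects the cut level differently.

```python
from itertools import combinations


def link_communities(edges, threshold):
    """Single-linkage grouping of links whose similarity is >= threshold."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    inclusive = {n: nbrs | {n} for n, nbrs in adj.items()}

    links = [frozenset(e) for e in edges]
    parent = {link: link for link in links}

    def find(link):
        while parent[link] != link:
            parent[link] = parent[parent[link]]
            link = parent[link]
        return link

    # Compare every pair of links that share a node k.
    for k, nbrs in adj.items():
        for i, j in combinations(sorted(nbrs), 2):
            shared = len(inclusive[i] & inclusive[j])
            total = len(inclusive[i] | inclusive[j])
            if shared / total >= threshold:
                parent[find(frozenset((i, k)))] = find(frozenset((j, k)))

    groups = {}
    for link in links:
        groups.setdefault(find(link), set()).add(link)
    return list(groups.values())


def nodes_of(link_group):
    """Nodes touched by a group of links."""
    return set().union(*link_group)


# Hypothetical toy graph: two triangles sharing node 2 (a "bow tie").
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]
groups = link_communities(toy_edges, threshold=0.5)
node_sets = sorted(sorted(nodes_of(g)) for g in groups)
print(node_sets)
```

Each link lands in exactly one group, while node 2 appears in both node sets: links have (almost) unique membership, nodes can have multiple membership.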

Thank you.

Q & A

Contact:

Hon Wai Leong (梁汉槐)

FB, email: leonghw@comp.nus.edu.sg

http://www.comp.nus.edu.sg/~leonghw/
