Lecture slides

CS8803-NS

Network Science

Fall 2013

Instructor: Constantine Dovrolis constantine@gatech.edu

http://www.cc.gatech.edu/~dovrolis/Courses/NetSci/

Disclaimers

The following slides include only the figures or videos that we use in class; they do not include detailed explanations, derivations or descriptions covered in class.

Many of the following figures are copied from open sources on the Web. I do not claim any intellectual property for the following material.

Outline

• What does “network community” mean?

• Community detection versus graph partitioning versus hierarchical clustering

• Graph partitioning algorithms

– Spectral partitioning (Fiedler’s method based on graph Laplacian)

• Modularity metric for community detection

– Spectral-based modularity optimization

– Other methods for modularity optimization

• Community detection methods that do not rely on modularity metric

– Betweenness-Centrality method

– Radicchi et al. method

• Hierarchical agglomerative clustering

Outline for next week’s class

• Variations of the community detection problem

– Overlapping communities

– Dynamic communities

– Link-based communities

• Properties of real-world network communities

• Applications of community detection

– In social networks

– In biological networks

– In brain networks

– In ecological networks

– In climate networks

Today’s outline (reordered)

• What does “network community” mean?

• Community detection versus graph partitioning versus hierarchical clustering

• Modularity metric for community detection

– Spectral-based modularity optimization

– Other methods for modularity optimization

• Community detection methods that do not rely on modularity metric

– Betweenness-Centrality method

– Radicchi et al. method

• Hierarchical agglomerative clustering

• Graph partitioning algorithms

– Spectral partitioning (Fiedler’s method based on graph Laplacian)

Hierarchical network

Graph partitioning vs Community detection

• In graph partitioning, the desired number and size of the partitions are given

– E.g., graph bisection into two equal-sized partitions

– NP-Hard

• In community detection, the number of communities (and their size) results from the method itself

– It is a property of the network

• The community detection problem is less well-defined than the graph partitioning problem

Spectral bisection method for graph partitioning

(see last few slides for more details)

Graph partitioning vs Hierarchical clustering

Community detection vs Hierarchical clustering

• Hierarchical clustering comes in two forms:

– Divisive algorithms: top-down

– Agglomerative: bottom-up

• Key points:

– Need a similarity metric for any two nodes

• Which metric to use?

• How to examine similarity of groups of nodes?

– Which horizontal partition gives more insight?

– Some clusters are artificial; not “real communities”

– Fundamentally, many networks are NOT hierarchical

Modularity definition

• Fraction of edges between pairs of nodes that belong to the same community

RELATIVE TO

• Expected fraction of edges between the same pairs of nodes if edges were placed randomly (but in a degree-preserving manner)
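The definition above is commonly written as Q = (1/2m) Σ_ij [A_ij − k_i k_j / 2m] δ(c_i, c_j), where A is the adjacency matrix, k_i is the degree of node i, m is the number of edges and c_i is the community of node i. The following is a minimal sketch of that formula for an undirected, unweighted graph, assuming NumPy; the function name `modularity` and the toy graph (two triangles joined by one edge) are illustrative choices, not taken from the slides.

```python
import numpy as np

def modularity(A, labels):
    """Modularity Q of a node-to-community assignment (undirected graph)."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    k = A.sum(axis=1)                  # node degrees
    two_m = k.sum()                    # 2m: each edge is counted twice
    # Observed adjacency minus the degree-preserving random expectation
    B = A - np.outer(k, k) / two_m
    # Keep only node pairs that share a community
    same = labels[:, None] == labels[None, :]
    return (B * same).sum() / two_m

# Toy graph (assumption): triangles {0,1,2} and {3,4,5} joined by edge 2-3
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1

q_split = modularity(A, [0, 0, 0, 1, 1, 1])   # 5/14, about 0.357
q_all = modularity(A, [0, 0, 0, 0, 0, 0])     # 0: one big community
```

Placing all nodes in one community always gives Q = 0, which is why a positive Q indicates more intra-community edges than the random baseline.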

Spectral maximization of modularity

(2006)

Spectral maximization of modularity

(see class notes for detailed derivations)
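As a rough illustration of the spectral approach, the sketch below splits a graph into two groups using the sign pattern of the leading eigenvector of the modularity matrix B = A − k kᵀ / 2m. It assumes NumPy and a dense eigensolver, which is convenient for small examples only; the function name and the toy graph are hypothetical, and the class notes remain the reference for the derivation.

```python
import numpy as np

def spectral_modularity_bisection(A, tol=1e-9):
    """Split nodes into two groups by the leading eigenvector of B.

    Returns a vector of +1/-1 labels, or all +1 if the leading eigenvalue
    is not positive (no split improves modularity).
    """
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)                       # node degrees
    two_m = k.sum()                         # 2m
    B = A - np.outer(k, k) / two_m          # modularity matrix
    eigvals, eigvecs = np.linalg.eigh(B)    # eigenvalues in ascending order
    if eigvals[-1] <= tol:
        return np.ones(len(A), dtype=int)   # treat the graph as indivisible
    leading = eigvecs[:, -1]
    return np.where(leading >= 0, 1, -1)    # group by sign of each entry

# Toy graph (assumption): triangles {0,1,2} and {3,4,5} joined by edge 2-3
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1

s = spectral_modularity_bisection(A)
```

The overall sign of an eigenvector is arbitrary, so only the grouping matters: here nodes 0, 1, 2 receive one label and nodes 3, 4, 5 the other.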

Dividing a community into smaller communities

Spectral maximization of modularity

(see class notes for detailed derivations)

Greedy optimization of modularity (2004)

Complexity of Clauset et al.’s method

The algorithm of Girvan-Newman

The algorithm of Radicchi et al.

Hierarchical clustering

http://condor.depaul.edu/ntomuro/courses/578/notes/notes-Clustering.html

Hierarchical agglomerative clustering

http://condor.depaul.edu/ntomuro/courses/578/notes/notes-Clustering.html

Hierarchical divisive clustering

http://mines.humanoriented.com/classes/2010/fall/csci568/portfolio_exports/mvoget/cluster/cluster.html

Node similarity metrics

Cluster similarity – 3 approaches

Key points

(See class notes for detailed derivations)

• Define Laplacian of an (undirected, unweighted) graph

– Show that all eigenvalues of Laplacian are non-negative

– Show that Laplacian has at least one zero eigenvalue

– The number of zero eigenvalues is equal to the number of connected components in the graph

– The second-smallest eigenvalue is called “algebraic connectivity”; it is non-zero exactly when the graph is connected, and the min cut size of the relaxed bisection problem is proportional to it
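The Laplacian properties listed above can be checked numerically on small graphs. The sketch below assumes the standard definition L = D − A (D is the diagonal degree matrix) and uses NumPy; the helper names and the two toy graphs are illustrative only.

```python
import numpy as np

def laplacian(A):
    """Graph Laplacian L = D - A of an undirected, unweighted graph."""
    A = np.asarray(A, dtype=float)
    return np.diag(A.sum(axis=1)) - A

def count_zero_eigenvalues(A, tol=1e-9):
    """Number of (numerically) zero Laplacian eigenvalues."""
    eigvals = np.linalg.eigvalsh(laplacian(A))
    return int(np.sum(np.abs(eigvals) < tol))

# Toy graph 1 (assumption): two disconnected triangles, i.e. 2 components
two_triangles = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    two_triangles[i, j] = two_triangles[j, i] = 1

# Toy graph 2 (assumption): path 0 - 1 - 2, a single component
path = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]])

zeros_two_triangles = count_zero_eigenvalues(two_triangles)   # 2
path_eigvals = np.linalg.eigvalsh(laplacian(path))            # about [0, 1, 3]
algebraic_connectivity = path_eigvals[1]                      # about 1
```

All eigenvalues are non-negative in both cases, and the count of zero eigenvalues matches the number of connected components.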

Key points (cont’)

(See class notes for detailed derivations)

• Formulate graph bisection problem as a constrained optimization problem

• Show that, in the relaxed problem, the min cut size is proportional to the algebraic connectivity (the second-smallest eigenvalue of the Laplacian, which is non-zero for a connected graph)

• Compute corresponding eigenvector (appropriately normalized)

• And determine graph partitions based on the values of that eigenvector

• For a sparse graph, this method is O(n^2)

– If the second eigenvector is computed using the orthogonalization or Lanczos method (which is O(m*n), with m the number of edges and m proportional to n in a sparse graph)
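The steps above can be sketched as follows, assuming NumPy. For brevity this sketch uses a dense eigensolver rather than the Lanczos method mentioned on the slide, so it does not achieve the stated O(n^2) cost; the function names and the toy graph are hypothetical. Nodes are sorted by their entry in the eigenvector of the second-smallest Laplacian eigenvalue (the Fiedler vector) and the lower half forms one partition.

```python
import numpy as np

def fiedler_bisection(A):
    """Bisect a connected graph into two equal-sized parts (n assumed even).

    Returns (part, lambda2): a 0/1 label per node and the algebraic
    connectivity.
    """
    A = np.asarray(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A           # Laplacian L = D - A
    eigvals, eigvecs = np.linalg.eigh(L)     # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                  # eigenvector of 2nd-smallest one
    order = np.argsort(fiedler)              # nodes ranked by Fiedler entry
    half = len(A) // 2
    part = np.zeros(len(A), dtype=int)
    part[order[half:]] = 1                   # upper half goes to partition 1
    return part, eigvals[1]

def cut_size(A, part):
    """Number of edges whose endpoints lie in different partitions."""
    A = np.asarray(A, dtype=float)
    part = np.asarray(part)
    different = part[:, None] != part[None, :]
    return int((A * different).sum() // 2)   # each cut edge is counted twice

# Toy graph (assumption): triangles {0,1,2} and {3,4,5} joined by edge 2-3
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1

part, lambda2 = fiedler_bisection(A)
edges_cut = cut_size(A, part)                # 1: only the bridge 2-3 is cut
```

For large sparse graphs, an iterative sparse eigensolver would be the natural replacement for the dense call used here.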
