Network theory III

David Lusseau
BIOL4062/5062
d.lusseau@dal.ca
Outline

16 March: community structure

Suggested readings:

Newman M.E.J. 2003. The structure and function of complex networks. SIAM Review 45, 167-256
What is a community?

A cluster of individuals that are more strongly linked to
one another than to the rest of the population
Traditional techniques

Cluster analysis (hierarchical)

Multi-Dimensional Scaling

Principal Coordinate Analysis
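Principal Coordinate Analysis (classical scaling) can be sketched in a few lines of NumPy. This is a minimal sketch, assuming a symmetric dissimilarity matrix D (for example, 1 minus an association index). The function name `pcoa` is hypothetical. The squared dissimilarities are double-centred, and the leading eigenvectors give the coordinates.

```python
import numpy as np

def pcoa(D, n_axes=2):
    """Classical scaling (PCoA) of a symmetric dissimilarity matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n              # centring matrix
    G = -0.5 * J @ (D ** 2) @ J                      # double-centred squared dissimilarities
    eigvals, eigvecs = np.linalg.eigh(G)             # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_axes]       # keep the largest ones
    scale = np.sqrt(np.clip(eigvals[order], 0, None))  # negative eigenvalues set to 0
    return eigvecs[:, order] * scale                 # one column per principal axis

# Hypothetical data: four individuals whose dissimilarities are positions on a line
pos = np.array([0.0, 1.0, 2.0, 4.0])
D = np.abs(pos[:, None] - pos[None, :])
coords = pcoa(D, n_axes=1)
print(coords.ravel())
```

The dissimilarities here are exactly Euclidean, so one axis reproduces them up to reflection and translation. Real association data usually need more axes and lose some information.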
Traditional techniques

How representative is the result?


Loss of information measure: Stress in MDS
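A sketch of the stress measure, assuming the metric form of Kruskal's stress-1. In this form the original dissimilarities stand in for the disparities that non-metric MDS fits by monotone regression. The name `stress1` and the toy data are hypothetical. A value of 0 means the map preserves every dissimilarity, and larger values mean more information is lost.

```python
import math
from itertools import combinations

def stress1(dissim, coords):
    """Metric stress-1: sqrt(sum (d_ij - delta_ij)^2 / sum d_ij^2) over pairs i < j."""
    num = den = 0.0
    for i, j in combinations(range(len(coords)), 2):
        d = math.dist(coords[i], coords[j])   # distance in the MDS map
        num += (d - dissim[i][j]) ** 2        # mismatch with the original dissimilarity
        den += d ** 2
    return math.sqrt(num / den)

dissim = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]    # three equally dissimilar individuals
line = [(0.0,), (1.0,), (2.0,)]               # a poor one-dimensional map of them
print(round(stress1(dissim, line), 3))        # 0.408
```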
What is the best division?


Cluster analysis
Peripheral individuals are lumped together
Girvan-Newman algorithm

Divisive clustering algorithm


Find the boundaries of communities


Divide a population of n vertices into 1 to n communities
Weakest links between communities: edges with the highest betweenness
Standardise betweenness at each step

Re-calculate edge betweenness at each step
Zachary karate club
Girvan & Newman 2002 PNAS
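The following is a minimal pure-Python sketch of the divisive procedure for an unweighted, undirected network. The network is stored as a dict of neighbour sets, which is a hypothetical representation, and all function names are illustrative. Edge betweenness is computed by Brandes-style breadth-first accumulation and is recalculated after every edge removal. The standardisation step is not shown.

```python
from collections import deque

def edge_betweenness(adj):
    """Edge betweenness of an unweighted, undirected graph (Brandes-style).
    adj maps each vertex to the set of its neighbours."""
    eb = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma)
        dist, sigma = {s: 0}, {v: 0 for v in adj}
        sigma[s] = 1
        preds, order, queue = {v: [] for v in adj}, [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest vertices, sharing path credit among edges
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1 + delta[w])
                eb[frozenset((v, w))] += c
                delta[v] += c
    return {e: b / 2 for e, b in eb.items()}  # each path was counted from both ends

def components(adj):
    """Connected components as a list of vertex sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def girvan_newman(adj):
    """Successive partitions, from the initial components down to n singletons."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    levels = [components(adj)]
    while any(adj.values()):
        eb = edge_betweenness(adj)          # recalculated at every step
        u, v = tuple(max(eb, key=eb.get))   # edge with the highest betweenness
        adj[u].discard(v)
        adj[v].discard(u)
        comps = components(adj)
        if len(comps) > len(levels[-1]):    # a new community has split off
            levels.append(comps)
    return levels

# Hypothetical example: two triangles joined by a single bridge (2-3)
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
         3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
levels = girvan_newman(graph)
print(levels[1])
```

The bridge carries all 9 shortest paths between the triangles, so it is removed first, and the first split separates the two triangles.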
Finding the best division

For each step calculate a modularity coefficient


Best division will have the most edges within
communities and the least between
Take community size into consideration
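$$Q = \sum_i \left( e_{ii} - a_i^2 \right), \qquad a_i = \sum_j e_{ij}$$
e_ij: fraction of edges linking community i to community j

Example (e_ij as counts, normalised by the matrix total of 108):
      1    2    3
1    30    2    5
2     2   10    2
3     5    2   50
$$Q = \left(\frac{30}{108} - \left(\frac{37}{108}\right)^2\right) + \left(\frac{10}{108} - \left(\frac{14}{108}\right)^2\right) + \left(\frac{50}{108} - \left(\frac{57}{108}\right)^2\right) = 0.42$$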
Zachary karate club
Newman & Girvan 2003 Physical Review E
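The worked example can be checked in a few lines of Python. The helper name `modularity_from_e` is hypothetical. The helper normalises the count matrix by its total, as in the example, and then applies Q = Σ_i (e_ii − a_i²).

```python
def modularity_from_e(e_counts):
    """Q = sum_i (e_ii - a_i^2), with e normalised so that its entries sum to 1."""
    total = sum(map(sum, e_counts))
    e = [[x / total for x in row] for row in e_counts]
    q = 0.0
    for i, row in enumerate(e):
        a_i = sum(row)               # a_i = sum_j e_ij
        q += e[i][i] - a_i ** 2      # within-community fraction minus expected
    return q

e_counts = [[30, 2, 5],
            [2, 10, 2],
            [5, 2, 50]]
print(round(modularity_from_e(e_counts), 2))  # 0.42
```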
Modularity coefficient

The principle of modularity coefficient
optimisation can be applied to any community
structure algorithm
Extension to weighted matrices

Edge betweenness



Transform similarity matrix into dissimilarity matrix
Calculate geodesic paths using Dijkstra’s algorithm
Problem: strong links become short distances and carry more geodesics,
so edges between strongly connected pairs are more likely to be removed
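This sketch shows why strong links attract betweenness once similarities become distances. It assumes distance = 1/association strength, which is one common transformation. The helper names and the three-individual network are hypothetical. The shortest paths are then found with Dijkstra's algorithm using a binary heap.

```python
import heapq

def to_dissimilarity(sim):
    """Similarity dict {u: {v: w}} -> distance dict, assuming distance = 1 / w."""
    return {u: {v: 1.0 / w for v, w in nbrs.items() if w > 0}
            for u, nbrs in sim.items()}

def dijkstra(graph, source):
    """Geodesic (shortest weighted path) lengths from source."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale queue entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

sim = {"A": {"B": 0.8, "C": 0.1},
       "B": {"A": 0.8, "C": 0.5},
       "C": {"A": 0.1, "B": 0.5}}
d = dijkstra(to_dissimilarity(sim), "A")
print(d)
```

The geodesic from A to C (length 1.25 + 2 = 3.25) runs through the two strong links instead of the weak direct one (length 10). Strong edges therefore accumulate betweenness.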
Alternative: Modularity optimisation

Forget edge betweenness

Optimise for high Q!

Computationally intensive

Prone to local optima (false maxima of Q)


Local optima are difficult to detect
Iterate the optimisation to detect them

Not always successful
Modularity: greedy algorithm

Start with n communities (agglomerative
clustering method)

At each step join the pair of communities that provides
the greatest increase (or the smallest decrease)
in Q
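The following is a naive sketch of the agglomerative greedy method. It starts from one community per vertex, and at each step it joins the linked pair that leaves the highest Q. It records the best partition seen along the way. The names and the dict-of-sets graph format are hypothetical. Q is recomputed from scratch for every candidate, so this sketch suits only small networks. Newman's fast algorithm updates ΔQ incrementally instead.

```python
from itertools import combinations

def modularity(adj, partition):
    """Q = sum_i (e_ii - a_i^2): fraction of edges inside minus (fraction of edge ends)^2."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    q = 0.0
    for comm in partition:
        inside = sum(1 for u in comm for v in adj[u] if v in comm) / 2
        ends = sum(len(adj[u]) for u in comm)
        q += inside / m - (ends / (2 * m)) ** 2
    return q

def greedy_modularity(adj):
    partition = [frozenset([v]) for v in adj]   # start with n communities
    best, best_q = partition, modularity(adj, partition)
    while len(partition) > 1:
        candidates = []
        for a, b in combinations(partition, 2):
            if any(v in b for u in a for v in adj[u]):   # only join linked communities
                merged = [c for c in partition if c not in (a, b)] + [a | b]
                candidates.append((modularity(adj, merged), merged))
        if not candidates:                       # remaining communities share no edges
            break
        # Join the pair giving the greatest increase (or smallest decrease) in Q
        q, partition = max(candidates, key=lambda t: t[0])
        if q > best_q:
            best, best_q = partition, q
    return best, best_q

# Hypothetical example: two triangles joined by a bridge (2-3)
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
         3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
best, q = greedy_modularity(graph)
print(best, round(q, 3))
```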
Modularity: greedy algorithm
[Figure: Q optimisation vs Girvan-Newman]
Overlapping communities


Recognise that some individuals sit on the fence
Do not force them into one community or the
other but identify them as overlapping
Palla et al. 2005 Nature
Palla algorithm

Based on the k-clique principle: a community is
composed of a number of k-cliques

k-cliques: fully connected subgraphs of k vertices

Adjacent k-cliques share k-1 vertices

Community: the union of k-cliques connected through a series of adjacent k-cliques
Palla et al. 2005 Nature
Palla algorithm

Find all k-cliques
Calculate the clique-clique overlap matrix
Define adjacent cliques
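The three steps above can be sketched as k-clique percolation in pure Python. The sketch makes these assumptions: a binary, undirected network stored as a dict of neighbour sets; brute-force clique search, which is only practical for small networks; and a pairwise clique overlap check that plays the role of the clique-clique overlap matrix. This is not Palla et al.'s CFinder implementation, and the function name is hypothetical.

```python
from itertools import combinations

def k_clique_communities(adj, k):
    # 1. Find all k-cliques (brute force over vertex subsets of size k)
    cliques = [frozenset(c) for c in combinations(sorted(adj), k)
               if all(v in adj[u] for u, v in combinations(c, 2))]
    # 2-3. Two k-cliques are adjacent if they share k-1 vertices; link them (union-find)
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)
    # A community is the union of a chain of adjacent k-cliques
    groups = {}
    for i, c in enumerate(cliques):
        groups.setdefault(find(i), set()).update(c)
    return list(groups.values())

# Hypothetical example: two triangles sharing vertex 2
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4}, 4: {2, 3}}
comms = k_clique_communities(graph, k=3)
print(comms)
```

With k = 3 the two triangles share only one vertex, so they are not adjacent. Vertex 2 therefore ends up in both communities and is identified as overlapping.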

Issues (and advantages):




k is user-defined, find ‘best’ k by trial and error
Works only on binary networks
(weighted networks must first be transformed into binary ones)
Palla et al. 2005 Nature
Simply the best method
Modularity matrix

A matrix? Let’s eigenanalyse!

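Let’s rewrite the modularity coefficient:
$$Q = \frac{1}{4m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) s_i s_j$$
s_i s_j: community identification (s_i = +1 or -1 for a division into two communities)
k_i k_j / 2m: expected number of links between i and j if links were distributed at random
(A_ij: adjacency matrix, k_i: degree of i, m: number of edges)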
Newman 2006 PNAS
Modularity matrix
$$B_{ij} = A_{ij} - \frac{k_i k_j}{2m}$$
Every row and every column of B sums to 0
So (1, 1, 1, …) is always an eigenvector with eigenvalue 0, as for the graph Laplacian
The eigenvector of the largest (most positive) eigenvalue gives
the best community division into 2 communities:
vertices are split by the sign (negative or positive) of their element
Magnitude of eigenvector elements

Tells us how well a vertex is classified (whether
it belongs to the core or the periphery of the
community)
Zachary karate club
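A NumPy sketch of the split into 2 communities, assuming an unweighted, symmetric adjacency matrix. The function name and the two-triangle example are hypothetical. `np.linalg.eigh` is appropriate because B is symmetric and it returns eigenvalues in ascending order. The last column is therefore the leading eigenvector.

```python
import numpy as np

def leading_eigenvector_split(A):
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)                       # degrees k_i
    two_m = k.sum()                         # 2m
    B = A - np.outer(k, k) / two_m          # modularity matrix B_ij
    eigvals, eigvecs = np.linalg.eigh(B)    # symmetric B; eigenvalues ascending
    u = eigvecs[:, -1]                      # eigenvector of the largest eigenvalue
    s = np.where(u >= 0, 1, -1)             # community assigned by sign of the element
    q = s @ B @ s / (2 * two_m)             # Q = (1/4m) s^T B s
    return s, u, q

# Two triangles (0-1-2 and 3-4-5) joined by the edge 2-3
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]])
s, u, q = leading_eigenvector_split(A)
print(s, np.round(np.abs(u), 2), round(q, 3))
```

The two bridge vertices (2 and 3) get smaller |u| than the others. This matches the point about magnitude: they sit closer to the periphery of their community.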
Finding the best division

Repeat the process on each subgraph



Recalculate the modularity coefficient for the whole
graph
If a new division makes a zero or negative contribution to
modularity, do not make it
Otherwise, continue
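This is a sketch of the repeated bisection. The sketch assumes the generalised modularity matrix of Newman 2006, in which, for a subgraph g, each row sum within g is subtracted from the diagonal: B(g)_ij = B_ij − δ_ij Σ_{l∈g} B_il. The contribution of a split is then ΔQ = (1/4m) sᵀB(g)s. The function name and the tolerance are hypothetical, and the fine-tuning of each split is omitted.

```python
import numpy as np

def split_recursively(A, tol=1e-10):
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)
    two_m = k.sum()
    B = A - np.outer(k, k) / two_m          # modularity matrix of the whole graph
    communities, to_split = [], [np.arange(len(A))]
    while to_split:
        g = to_split.pop()
        # Generalised matrix for subgraph g: subtract row sums (within g) from the diagonal
        Bg = B[np.ix_(g, g)]
        Bg = Bg - np.diag(Bg.sum(axis=1))
        eigvals, eigvecs = np.linalg.eigh(Bg)
        s = np.where(eigvecs[:, -1] >= 0, 1, -1)
        dq = s @ Bg @ s / (2 * two_m)       # contribution to Q of this split, (1/4m) s^T B(g) s
        if eigvals[-1] <= tol or dq <= tol or abs(s.sum()) == len(g):
            communities.append(g.tolist())  # zero or negative contribution: do not split
        else:
            to_split += [g[s == 1], g[s == -1]]
    return communities

# Two triangles (0-1-2 and 3-4-5) joined by the edge 2-3
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]])
print(sorted(split_recursively(A)))
```

Here the first split yields the two triangles. Splitting either triangle again cannot raise Q, because its largest generalised eigenvalue is 0, so the process stops.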
Power of modularity matrix method

Different types of null models can be tested

As long as the matrix has

one eigenvector (1, 1, 1, …) with eigenvalue 0
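$$Q = \frac{1}{4m} \sum_{ij} \left( A_{ij} - P_{ij} \right) s_i s_j$$
P_ij: expected number of links between i and j under the chosen null model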

To do so, subtract each row sum of A - P from the corresponding diagonal element
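A sketch of the diagonal correction for an arbitrary null model P. The null model used here, in which every pair is equally likely to be linked (P_ij = 2m/n², summing to 2m), is a hypothetical choice for illustration only. After the correction, every row of B sums to zero, so (1, 1, …, 1) has eigenvalue 0.

```python
import numpy as np

def general_modularity_matrix(A, P):
    """B = A - P, with each row sum subtracted from its diagonal element."""
    B = np.asarray(A, dtype=float) - np.asarray(P, dtype=float)
    return B - np.diag(B.sum(axis=1))

A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]])
n = len(A)
P = np.full((n, n), A.sum() / n ** 2)   # hypothetical uniform null model
B = general_modularity_matrix(A, P)
print(np.allclose(B @ np.ones(n), 0))   # True
```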
Uncertainty


Bootstrapped algorithm
Collect m results from the community algorithm, one per bootstrap replicate

Matrix: likelihood (proportion of the m results) that 2 individuals belong to the
same community
Coarse-grain community identity
Provides a measure of uncertainty and of overlap
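A sketch of the co-membership matrix built from m bootstrap results. It assumes that the replicates have already been produced, for example by resampling the data and rerunning any community algorithm. The function name and the three toy results are hypothetical.

```python
from itertools import combinations

def co_membership(partitions, individuals):
    """Proportion of the m partitions in which each pair shares a community."""
    idx = {v: i for i, v in enumerate(individuals)}
    n = len(individuals)
    counts = [[0] * n for _ in range(n)]
    for partition in partitions:            # one partition per bootstrap replicate
        for community in partition:
            for u, v in combinations(community, 2):
                counts[idx[u]][idx[v]] += 1
                counts[idx[v]][idx[u]] += 1
    m = len(partitions)
    M = [[c / m for c in row] for row in counts]
    for i in range(n):
        M[i][i] = 1.0                       # an individual always shares its own community
    return M

runs = [[["a", "b"], ["c"]],
        [["a", "b"], ["c"]],
        [["a"], ["b", "c"]]]
M = co_membership(runs, ["a", "b", "c"])
print(M)
```

Pairs near 1 are consistently grouped together. Intermediate values, such as b with c here, flag individuals whose community identity is uncertain or overlapping.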
Girvan-Newman in Netdraw
Modularity matrix in Socprog