Desynchronization algorithms

Introduction of Network Science
Prof. Cheng-Shang Chang (張正尚教授)
Institute of Communications Engineering
National Tsing Hua University
Hsinchu Taiwan
Outline







What is network science?
A brief history of network science
Review of the mathematics of networks
Diffusion, distributed averaging, random
gossip, synchronization
Network formation
Structure of networks (Community
detection)
Conclusion
What is network science?
2005 National Research Council of
the National Academies
 “Organized knowledge of networks
based on their study using the
scientific method”
 Social networks, biological networks,
communication networks, power
grids, …

A visualization of the network structure of the
Internet at the level of “autonomous systems”
(Newman, 2003)
A social network (Newman, 2003)
A food web of predator-prey interactions between
species in a freshwater lake (Newman, 2003)
Power grid map
http://www.treehugger.com/files/2009/04/nprs-interactivepower-grid-map-shows-whos-got-the-power.php
Citation networks
http://www.public.asu.edu/~majansse/pubs/SupplementIHDP.htm
Two key ingredients
The study of a collection of nodes
and links (graphs) that represent
something real
 The study of dynamic behavior of
the aggregation of nodes and links

Definition of Network
G(t)={V(t), E(t), f(t): J(t)}
 t: time
 V: node (vertex, actor)
 E: link (edge)
 f: NxN topology (adjacency matrix)
 J: algorithm for the evolution of the
network (microrule)

Definition of Network
Science (by Ted G. Lewis)

The study of the theoretical foundation of
network structure/dynamic behaviors and the
application of network to many subfields
 Social network analysis (SNA)
 Collaboration networks (citations, online social
networks)
 Emergent systems (power grids, the Internet)
 Physical science systems (phase transition,
percolation theory)
 Life science systems (epidemics, metabolic
processes)
A brief history: The pre-network
period (1736-1966)




1736 Leonhard Euler: the Seven Bridges of
Königsberg problem
1925 Yule: preferential attachment
 An explanation for the evolution of the
Internet and WWW
1927 Kermack and McKendrick: epidemic model
(diffusion of innovation, the spread of
information)
1959-1960 Erdos and Renyi: random graph
model
The meso-network period
(1967-1998)
1967 Stanley Milgram
 “Six degrees of separation”
 Communication project
 If you do not know the target
person, forward the request to a
personal acquaintance
 Small-world effect: the diameter of
a network increases as ln(n)

The meso-network period
(1967-1998)






1972 Bonacich: influence network
Distributed consensus
Kirchhoff’s network: the value of a node is equal
to the difference between the sum of values
from input and output links
States and differential equations
Fixed point (steady state)
1984 Kuramoto: synchronization in coupled
linear systems
The modern period (1998-present)
1998 Holland: emergence as the
final state (of the fixed point
problem)
 1998 Watts and Strogatz: a
generative procedure of rewiring the
links in a regular graph
 The small-world model
 Crossover point and phase transition

The modern period (1998-present)





1999 M. Faloutsos, P. Faloutsos and C.
Faloutsos: observed a power law in their graph
of the Internet
1999 Barabasi: math model for scale-free
networks
2000 Dorogovtsev: power law in many biological
systems
1999 Kleinberg: power law in webgraph
2002 Girvan and Newman: community structure
The modern period (1998-present)
Atay network (a generalization of
the Kirchhoff network)
 Emergence and synchronization:
 Heart beating
 The chirping of crickets
 Distributed consensus
 Propagation of influence

Review of the mathematics of
networks
Networks and their
representations









A network is a graph
Vertices (nodes, sites, actors)
Edges (links, bonds, ties)
n: number of nodes
m: number of edges
[Figure: a four-vertex example network with its multiedges and self-edges (self-loops) labeled]
Simple network (simple graph): a network that
has neither self-edges nor multiedges
Multigraph: a network with multiedges
Adjacency matrix





A: an n×n matrix
Aij=1 if there is an edge between
vertices i and j.
Aij=0 otherwise.
For a network with no self-edges, the
diagonal elements are all zero.
For an undirected network, A is symmetric.
[Figure: an example undirected network on vertices 1-4 and its adjacency matrix A]
Directed networks

Adjacency matrix: Aij =1 if there is
an edge from j to i.

With self-edges: Aii =1 for a single edge
from vertex i to itself in a directed
network.
[Figure: an example directed network on vertices 1-4 and its adjacency matrix A]
Degree
The degree of a vertex is the
number of edges connected to it.
 ki: the degree of vertex i
 m: number of edges
 2m ends of edges (every edge has
two ends)

Mean degree

c: the mean degree of a vertex in an
undirected graph

The maximum possible number of
edges is (n-1)n/2.
Density

Density (connectance): the fraction
of the maximum number of edges
that are actually present, $\rho = \frac{m}{\binom{n}{2}} = \frac{2m}{n(n-1)} = \frac{c}{n-1}$

For a large network (n is very large), $\rho \approx c/n$
Density
A (large) network is said to be dense
if the density ρ tends to a constant
as $n \to \infty$.
 On the other hand, it is said to be
sparse if ρ tends to 0 as $n \to \infty$.

Regular graphs




A regular graph is a graph in which all
the vertices have the same degree.
k-regular graph: every vertex has
degree k
2-regular: ring
4-regular: square lattice
Path
Path: a sequence of connected
vertices
 Self-avoiding path: a path that does
not intersect itself
 Length of a path: the number of
edges in the path
 If there is a path of length 2 from j
to i via k, then AikAkj=1.

Paths and adjacency matrix

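$N_{ij}^{(2)} = \sum_{k=1}^{n} A_{ik} A_{kj} = [A^2]_{ij}$: the number of paths of length
2 from j to i

$N_{ij}^{(3)} = \sum_{k,l=1}^{n} A_{ik} A_{kl} A_{lj} = [A^3]_{ij}$: the number of paths of length
3 from j to i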
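
As a small illustration of counting paths with powers of the adjacency matrix, the sketch below uses plain Python lists. The 4-vertex graph (a 4-cycle plus one chord, with vertices numbered from 0) and the helper name `mat_mul` are assumptions made only for this example. Paths counted this way may revisit vertices.

```python
def mat_mul(X, Y):
    """Multiply two square matrices stored as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Hypothetical example graph: cycle 0-1-2-3-0 plus the chord 0-2.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n = 4
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = 1
    A[j][i] = 1  # undirected graph, so A is symmetric

A2 = mat_mul(A, A)   # A2[i][j]: number of paths of length 2 from j to i
A3 = mat_mul(A2, A)  # A3[i][j]: number of paths of length 3 from j to i

# Each triangle contributes 6 closed paths of length 3 (3 starting points, 2 directions).
triangles = sum(A3[i][i] for i in range(n)) // 6
```

Here `A2[i][i]` equals the degree of vertex i, since every edge at i gives one out-and-back path of length 2.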
Geodesic paths





A geodesic path (shortest path) is a path
between two vertices such that no shorter path
exists
Geodesic distance (shortest distance):
the length of a geodesic path
The smallest value r such that $[A^r]_{ij} > 0$
Geodesic paths are self-avoiding (Why?)
Geodesic paths are not necessarily unique
Diameter
The diameter d of a graph is the
length of the longest geodesic path
between any pair of vertices in the
network.
 Suppose that $d_{ij}$ is the geodesic
distance between vertices i and j; then $d = \max_{i,j} d_{ij}$

Components



A network is connected if there is a path
from every vertex to any other vertex.
Disconnected networks can be separated
into several components.
Components:


There is a path from every vertex in the subnetwork to
any other vertex in the same subnetwork.
No other vertex can be added while preserving this
property.
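
The definition of components can be made concrete with a breadth-first search: starting from any unvisited vertex, collect everything reachable from it. This is only a sketch; the function name `components` and the 6-vertex example are assumptions for illustration.

```python
from collections import deque

def components(n, edges):
    """Return the components of an undirected graph on vertices 0..n-1."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    seen = [False] * n
    result = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        comp = []
        while queue:
            v = queue.popleft()
            comp.append(v)
            for w in adj[v]:
                if not seen[w]:
                    seen[w] = True
                    queue.append(w)
        result.append(sorted(comp))  # maximal set of mutually reachable vertices
    return result

# Hypothetical example: a path 0-1-2, an edge 3-4, and the isolated vertex 5.
example = components(6, [(0, 1), (1, 2), (3, 4)])
```

The network is connected exactly when the returned list has a single entry.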
Diffusion


Diffusion is the process by which gas
moves from regions of high density to
regions of low, driven by the relative
pressure of the different regions.
Diffusion in a network (Influence
network):
 The spread of an idea
 The spread of a disease
Diffusion in a network
Suppose that we have some
commodity on the vertices.
 Let $\psi_i(t)$ be the amount of the
commodity at vertex i at time t
 Suppose that the commodity moves
from vertex j to an adjacent vertex i
at rate $C(\psi_j - \psi_i)$
 C is called the diffusion constant.

Governing equation for
diffusion in a network

$\frac{d\psi_i}{dt} = C \sum_j A_{ij} (\psi_j - \psi_i) = C \sum_j (A_{ij} - \delta_{ij} k_i) \psi_j$

$k_i$ is the degree of vertex i
$\delta_{ij}$ is the Kronecker delta, which is 1 if i=j and 0
otherwise.
Governing equation for
diffusion in a network

Let D be the diagonal matrix with vertex
degrees along its diagonal.
Graph Laplacian: L=D-A
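In matrix form, $\frac{d\psi}{dt} = C(A - D)\psi = -C L \psi$, where $\psi$ is the vector of commodity amounts on the vertices

A system of linear differential equations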


Solving the system of linear
differential equations


Suppose vi and λi are the ith eigenvector
and eigenvalue of the Laplacian.
Guess the solution has the form $\psi(t) = \sum_i a_i(t) v_i$, which gives $a_i(t) = a_i(0) e^{-C \lambda_i t}$
Eigenvalues of the graph
Laplacian
The Laplacian is symmetric.
 It has real eigenvalues.
 The Laplacian is positive semidefinite.
 All its eigenvalues are nonnegative.
 The vector (1,1,…,1) is an
eigenvector with eigenvalue 0.

Algebraic connectivity




The number of zero eigenvalues of the
Laplacian is the number of components.
The Laplacian can be written in a block form.
The network is connected if and only if the
second smallest eigenvalue of the Laplacian
is nonzero.
Algebraic connectivity: the second smallest
eigenvalue of the Laplacian
Distributed averaging
consensus





Lin Xiao and Stephen Boyd, Systems & Control
Letters, 53 (2004) 65-78.
Consider a network (connected graph) G=(V,E)
Each vertex i holds an initial scalar value xi(0) in
R, and x(0)=(x1(0),…, xn(0))
Two vertices can communicate with each other,
if and only if they are neighbors.
The problem is to compute the average of the
initial values, $\frac{1}{n} \sum_{i=1}^{n} x_i(0)$, via a distributed
algorithm
Motivation

Sensor networks (measuring temperature)

A flock of flying birds
Distributed linear iterations

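Constant edge weights: $x_i(t+1) = x_i(t) + \alpha \sum_{j \in N_i} (x_j(t) - x_i(t))$, where $N_i$ is the set of neighbors of vertex i and $\alpha$ is the edge weight

In matrix form, $x(t+1) = W x(t)$ with $W = I - \alpha L$

L=D-A is the Laplacian of the graph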
Distributed linear iterations






$W = I - \alpha L$, where $\alpha$ is the constant edge weight
The vector (1,1,…,1) is an eigenvector with
eigenvalue 0 of the Laplacian L.
L is symmetric for an undirected graph
W is a doubly stochastic matrix, i.e., all the row
sums and column sums are equal to 1.
If W is a nonnegative matrix, then W can be
viewed as the probability transition matrix of a
Markov chain and $\lim_{t \to \infty} W^t = \frac{1}{n} J$,
where J is a matrix with all its elements being
1.
Condition for convergence

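As $t \to \infty$, $x(t) = W^t x(0)$ tends to the vector whose entries all equal the average of the initial values
 The condition for W to be a nonnegative matrix: $1 - \alpha k_i \ge 0$ for all i, where $\alpha \ge 0$ is the constant edge weight


ki is the degree of vertex i
Distributed linear iteration is guaranteed to
converge if $0 < \alpha < 1/\max_i k_i$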
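
The constant-edge-weight iteration can be tried on a toy example. The sketch below assumes a hypothetical 4-vertex path graph and picks the edge weight strictly below one over the maximum degree, so that the weight matrix is nonnegative with a positive diagonal; the function name `averaging_step` and the numbers are illustrative assumptions, not taken from the cited paper.

```python
def averaging_step(x, neighbors, alpha):
    """One synchronous step: x_i <- x_i + alpha * sum over neighbors j of (x_j - x_i)."""
    return [x[i] + alpha * sum(x[j] - x[i] for j in neighbors[i])
            for i in range(len(x))]

# Hypothetical path graph 0-1-2-3; each vertex only talks to its neighbors.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
x = [4.0, 0.0, 8.0, 0.0]
target = sum(x) / len(x)  # the average the vertices should agree on

k_max = max(len(v) for v in neighbors.values())
alpha = 0.9 / k_max  # assumed choice, strictly below 1 / k_max

for _ in range(500):
    x = averaging_step(x, neighbors, alpha)
```

Because the update is symmetric across each edge, the sum of the values is preserved at every step, which is why the common limit is the initial average.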
Randomized gossip
algorithms





Stephen Boyd, Arpita Ghosh, Balaji Prabhakar, and
Devavrat Shah, IEEE Transactions on Information Theory,
VOL. 52, NO. 6, pp. 2508-2530, JUNE 2006.
Gossip algorithm: an algorithm in which each
node can communicate with no more than one
neighbor in each time slot.
Consider a network (connected graph) G=(V,E)
Each vertex i holds an initial scalar value xi(0) in
R, and x(0)=(x1(0),…, xn(0))
The problem is to compute the average of the
initial values, $\frac{1}{n} \sum_{i=1}^{n} x_i(0)$, via a gossip algorithm
Asynchronous time model




Each vertex has a clock which ticks at the times
of a rate 1 Poisson process.
Superposition of independent Poisson processes
is also a Poisson process with the rate equal to
the sum of the rates of the original Poisson
processes.
Uniformization: consider a Poisson process with
rate n for clock ticks (as there are n vertices).
With probability 1/n, a clock tick is chosen for
vertex i.
Asynchronous time model


In the kth time slot, let node i’s clock tick and let
it contact some neighboring node j with
probability Pij.
Both vertices set their values equal to the
average of their current values.

With probability $\frac{1}{n} P_{ij}$, the random matrix W(k) is $W(k) = \frac{1}{2}(I + Q)$,

where Q is the permutation matrix that
interchanges the ith and jth coordinates.
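
A minimal simulation of the asynchronous model, under stated assumptions: each clock tick is assigned to a uniformly chosen vertex, and that vertex contacts a uniformly chosen neighbor (one possible choice of the contact probabilities). The 8-vertex ring, the seed, and the function name `gossip` are hypothetical.

```python
import random

def gossip(x, neighbors, num_ticks, rng):
    """Run pairwise-averaging gossip for a given number of clock ticks."""
    x = list(x)
    n = len(x)
    for _ in range(num_ticks):
        i = rng.randrange(n)          # the tick belongs to vertex i with probability 1/n
        j = rng.choice(neighbors[i])  # assumed: uniform contact probability over neighbors
        avg = (x[i] + x[j]) / 2
        x[i] = avg                    # both vertices take the average of their values
        x[j] = avg
    return x

rng = random.Random(0)
ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
x0 = [float(i) for i in range(8)]
x_final = gossip(x0, ring, 5000, rng)
```

Each tick leaves the sum of the values unchanged, so the values can only agree on the average of the initial values, here 3.5.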
Spread of information





Other objective functions, e.g., max, min.
How fast is information distributed over a
network via a randomized gossip algorithm?
Start from the initial state x(0)=(1,0,…,0), i.e.,
only the first vertex has the information.
If xi(t)>0, then vertex i must have been “visited”
(at least once) by time t via the randomized
gossip algorithm.
can be used to bound the
probability that all the vertices received the
information.
Influence network





xi(t): the degree of influence (power) of vertex i
at time t
$W_{ij}$: the influence from j to i
We still have $x(t+1) = W x(t)$
But the weight matrix W is much more
complicated. It may not be nonnegative, or
doubly stochastic.
Convergence might be a problem.
Command hierarchies
Emergent power




Define power as the degree of influence
in a social network
How does one increase his/her power?
Acquisition of weight influence: increase
the influence to others and reduce the
influence from others
Acquisition of link influence: rewiring links
(by knowing more important people) is
less effective.
Synchronization and
desynchronization
• Phenomenon of mutual synchronization
• The flashing of fireflies in south Asia.
• Spreading identical oscillators into a round-robin
schedule.
• Desynchronization has many applications
• Resource scheduling in wireless sensor networks.
• Fair resource scheduling as Time Division Multiple Access.
Desynchronization algorithms
• A general framework for distributed algorithms to achieve the
desynchronization needed in TDMA.
Degesys, Rose, Patel, Nagpal (2007)
• All the nodes can communicate with each other.
• Each node is modelled by an oscillator with the same
fundamental frequency.
• There is no clock drift in any oscillator.
J. Degesys, I. Rose, A. Patel, R. Nagpal, “Desync: Self-organizing desynchronization
and TDMA on wireless sensor networks,” IPSN, 2007
Desynchronization algorithms
• The DESYNC-STALE algorithm
• When a node reaches the end of the cycle, it fires and
resets its phase back to 0.
• It waits for the next node to fire and then jumps to a new phase
according to a certain function.
• The jumping function only uses the firing information of
the node that fires before it and the node that fires after it.
• The rate of convergence is only conjectured to be $O(n^2)$
from various computer simulations.
Desynchronization algorithms
• Generalized processor sharing (GPS) scheme
• An extension of the fair scheduling scheme to the GPS.
Pagliari, Hong, Scaglione (2010)
• Every node is assigned a weight and the amount of
bandwidth received by a node is proportional to its weight.
• They proposed an algorithm with two oscillators in each
node and showed the convergence in the ideal case.
R. Pagliari, Y.-W. Hong, and A. Scaglione, “Bio-inspired algorithms for decentralized
round-robin and proportional fair scheduling,” IEEE Journal on Selected Areas in
Communications: Special Issue on Bio-Inspired Networking, 2010
Desynchronization algorithms
• Both the DESYNC-STALE and the extension of the GPS scheme
are shown to work properly by various computer simulations.
• Lack of rigorous theoretical proofs in many aspects
• The rate of convergence of the DESYNC-STALE algorithm.
• The convergence of the stale GPS scheme.
Desynchronization algorithms
• The nodes are unlikely to all be identical
• A particular node may need to interact with the “outside”
world, and might not have the freedom to adjust its
local clock.
• The master node in Bluetooth.
• The collector node in a wireless sensor network.
• The master clock in parallel analog-to-digital converters.
Desynchronization algorithms
• Consider the desynchronization problem with an anchored node
• The anchored node never adjusts its phase.
• Except for the anchored node, all the other nodes are
identical and they do not know which node the
anchored node is.
Dynamics in Anchored Desynchronization
Network formation
Erdos-Renyi random graph
 Configuration model
 Preferential attachment
 Small world
 Formation of social networks by
random triad connections

Random Graphs
The G(n,p) model

There are n vertices.
With probability p, we place an edge independently
between each distinct pair.
First studied by Solomonoff and Rapoport in 1951.
Erdos and Renyi published a series of papers on this
model.
For a sample G in G(n,p) with m edges, $P(G) = p^m (1-p)^{\binom{n}{2} - m}$

The probability of drawing a graph with m edges is then $P(m) = \binom{\binom{n}{2}}{m} p^m (1-p)^{\binom{n}{2} - m}$




The mean degree in the
G(n,p) model

The mean value of m is $\langle m \rangle = \binom{n}{2} p$

This is a direct result of the binomial distribution as there
are n(n-1)/2 independent Bernoulli random variables with
parameter p.
The mean degree in a graph with m edges is 2m/n. Thus
the mean degree in G(n,p) is $c = \frac{2}{n} \binom{n}{2} p = (n-1)p$

Degree distribution





A given vertex in G(n,p) is connected with probability p to
each of the n-1 other vertices.
The degree of a given vertex is thus a random variable
with the binomial distribution B(n-1,p), i.e., $p_k = \binom{n-1}{k} p^k (1-p)^{n-1-k}$
Let the mean degree c=(n-1)p be fixed as n goes to $\infty$
Then the binomial distribution B(n-1,p) converges to the
Poisson distribution with mean c, i.e., $p_k = e^{-c} c^k / k!$
This is called the Poisson random graph (as the limit of the
Erdos-Renyi random graph)
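
The G(n,p) construction and the binomial-to-Poisson limit can be checked numerically. The sketch below is only illustrative: the values n = 1000 and c = 4, the seed, and the function names are assumptions.

```python
import math
import random

def sample_gnp(n, p, rng):
    """Place an edge independently with probability p between each distinct pair."""
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]

def binomial_pmf(k, trials, p):
    """Probability of degree k under B(trials, p)."""
    return math.comb(trials, k) * p**k * (1 - p) ** (trials - k)

def poisson_pmf(k, c):
    """Probability of degree k under a Poisson distribution with mean c."""
    return math.exp(-c) * c**k / math.factorial(k)

rng = random.Random(1)
n, c = 1000, 4.0
p = c / (n - 1)                      # chosen so that the mean degree (n-1)p equals c
edges = sample_gnp(n, p, rng)
mean_degree = 2 * len(edges) / n     # mean degree of the sampled graph, 2m/n

# Largest gap between the binomial and Poisson degree probabilities for small k.
max_gap = max(abs(binomial_pmf(k, n - 1, p) - poisson_pmf(k, c)) for k in range(15))
```

For a single sample the measured mean degree fluctuates around c, and the gap between the two distributions shrinks as n grows with c held fixed.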
Clustering coefficient




The clustering coefficient C is a measure of
transitivity.
It is defined as the probability that two network
neighbors of a vertex are also neighbors of each
other.
As each edge is present independently with
probability p, $C = p = c/(n-1)$
This is one of the aspects in which the random
graph differs most sharply from real-world
networks.
Giant component





Consider the Poisson random graph
For the case p=0, there are no edges in the
network at all. Each vertex is completely
isolated.
For the case p=1, every possible edge in the
network is present and the network is an n-clique.
Phase transition: an interesting question is how
the transition between the two extremes occurs
if we increase p from 0 to 1.
Giant component: a network component whose
size grows in proportion to n.
The size of the giant
component



Let u be the average fraction of the vertices in
the random graph that do not belong to the
giant component.
Suppose that a randomly chosen vertex i does
not belong to the giant component (with
probability u).
For any other vertex j


Either i is not connected to j (with probability 1-p),
or i is connected to j but j itself is not a member of the
giant component (with probability pu).
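
Combining the two cases, the probability that i is not linked to the giant component through a given vertex j is 1 - p + pu, so over the n - 1 other vertices u = (1 - p + pu)^(n-1). With p = c/(n-1) and large n this becomes u = exp(-c(1 - u)), or S = 1 - exp(-cS) for the giant-component fraction S = 1 - u. This is the standard continuation of the argument, stated here as an assumption about where the derivation goes next. The sketch solves the equation by fixed-point iteration; the function name and iteration count are illustrative.

```python
import math

def giant_component_fraction(c, iterations=1000):
    """Iterate S <- 1 - exp(-c * S), starting from S = 1, for mean degree c."""
    S = 1.0
    for _ in range(iterations):
        S = 1.0 - math.exp(-c * S)
    return S

below = giant_component_fraction(0.5)  # mean degree below 1
above = giant_component_fraction(2.0)  # mean degree above 1
```

Starting from S = 1 the iteration decreases monotonically to the largest solution, which is 0 for c below 1 and strictly positive for c above 1; this is the phase transition as p increases.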
A sample of an Erdos-Renyi graph
http://igraph.sourceforge.net/screenshots2.html
The configuration model




Given a specific degree
sequence
Give vertex i ki “stubs”
Choose two of the stubs
uniformly at random and
create an edge between the
two vertices of the two chosen
stubs (they might belong to the
same vertex, giving a self-edge).
Repeat the process until all the
stubs are chosen.
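
The stub-matching procedure can be sketched as follows. Shuffling the list of stubs and pairing consecutive entries is used here as a stand-in for repeatedly choosing two remaining stubs uniformly at random; the degree sequence, the seed, and the function name are assumptions for illustration.

```python
import random

def configuration_model(degrees, rng):
    """Pair up stubs uniformly at random; self-edges and multiedges may occur."""
    if sum(degrees) % 2 != 0:
        raise ValueError("the degree sequence must have an even sum")
    # Vertex v contributes degrees[v] stubs.
    stubs = [v for v, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]

rng = random.Random(2)
degrees = [3, 2, 2, 1, 1, 1]
edges = configuration_model(degrees, rng)

# Recount the degrees from the generated edges (a self-edge adds 2 to its vertex).
realized = [0] * len(degrees)
for u, v in edges:
    realized[u] += 1
    realized[v] += 1
```

Whatever pairing is drawn, the realized degrees match the given sequence exactly, which is the defining property of the model.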
From degree sequence to
degree distribution
Specify the degree distribution pk
 Draw a degree sequence k1,k2, …
with probability $\prod_i p_{k_i}$
 Then use the degree sequence to
generate a random graph using
the configuration model.
 Scale free networks: the degree
distribution obeys the power law
(Pareto distribution).

Preferential attachment
Richer-get-richer effect
 Cumulative advantage
 Experience of shopping (brand effect)

Price’s model
Every newly appearing paper cites
previous ones chosen at random with
probability proportional to the number of
citations these previous papers
already have.
 Degree distributions obey the power
law.
 A special case is the Barabasi and
Albert model

The small-world model
Regular graphs vs. random
graphs
Random graphs have low transitivity
(clustering coefficient).
 Random graphs have small
diameter.
 On the other hand, regular graphs
have large diameter and some of
them have large transitivity.

A simple one-dimensional
network model (ring)
Every vertex is connected to c nearest
neighbors in a line (a) or in a ring (b). Here
c=6.
Clustering coefficient




Count the number of triangles
Two steps forward and one step
back
The length of the last step can be
chosen between 1,2,…,c/2.
The number of ways to choose the
first two steps is the number of
positive integer solutions to
$x_1 + x_2 = l$, where l is the length of the last step, which is $l - 1$
Clustering coefficient

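The total number of triangles is $n \sum_{l=1}^{c/2} (l - 1) = \frac{1}{4} n c \left(\frac{c}{2} - 1\right)$, where l is the length of the backward step

Each vertex has degree c and the number of
connected triples centered at a vertex is $\binom{c}{2} = \frac{1}{2} c (c - 1)$

The clustering coefficient is therefore $C = \frac{3 \times \text{(number of triangles)}}{\text{(number of connected triples)}} = \frac{3(c-2)}{4(c-1)}$
This clustering coefficient varies from 0 for c=2
up to a maximum of ¾ when c $\to \infty$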
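
The ring-model clustering coefficient, 3(c-2)/(4(c-1)) by the triangle count above, can be checked by brute force on a small ring. The sketch assumes n = 60 and c = 6 (n must be large compared with c so that triangles do not wrap around the ring); the function names are illustrative.

```python
from itertools import combinations

def ring_lattice(n, c):
    """Connect every vertex to its c nearest neighbors on a ring (c even)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, c // 2 + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj

def clustering_coefficient(adj):
    """Fraction of connected triples (pairs of neighbors of a vertex) that are closed."""
    closed = 0
    triples = 0
    for v, nbrs in adj.items():
        for a, b in combinations(sorted(nbrs), 2):
            triples += 1
            if b in adj[a]:
                closed += 1  # each triangle is counted once per corner, i.e. 3 times
    return closed / triples

n, c = 60, 6
measured = clustering_coefficient(ring_lattice(n, c))
predicted = 3 * (c - 2) / (4 * (c - 1))
```

For c = 6 both numbers are 0.6, already close to the limiting value of 3/4.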
The mean shortest path




The farthest one can move around the ring in a
single step is c/2.
So two vertices m lattice spacings apart are
connected by a shortest path of 2m/c steps.
Averaging over the complete range of m from 0
to n/2 gives a mean shortest path of n/2c.
By contrast, the mean shortest path in the ER
random graph is approximately $\ln n / \ln c$
The small-world model




Watts and Strogatz 1998
Interpolate between the ring model and the
random graph by moving or rewiring edges from
the ring to random positions.
Start with a ring model with n vertices in which
every vertex has degree c.
Go through each edge in turn and with
probability p we remove that edge and replace it
with one (shortcut) that joins two vertices
chosen uniformly at random.
The small-world model


The crucial point about the Watts-Strogatz
small-world model is that as p is increased from
0 the clustering coefficient is maintained up to
quite large values of p while the small-world
behavior, meaning short average path lengths,
already appears for quite modest values of p.
As a result, there is a substantial range of
intermediate values for which the model shows
both effects simultaneously, i.e., large clustering
coefficient and short average path length.
Clustering coefficient (solid line) and
average path length (dashed line)
Scaling function for the
small-world model
Formation of Social Networks by
Random Triad Connections
Joint work with Prof. Duan-Shin Lee
 Director of the Institute of
Communications Engineering
 National Tsing Hua University

A Network Formation Model for Social Networks
• At time zero, the network consists of a clique with
m0 vertices.
• At time t, which is a non-negative integer, a new
vertex is attached to one of the existing vertices
in the network.
– The attached existing vertex is selected with equal
probability.
– This step is called the uniform attachment step.
• Each neighbor of the attached existing vertex is
attached to the new vertex with probability a and
not attached with probability 1-a.
– This step is called the triad formation step.
– Friends’ friends are more likely to be friends.
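
The growth rule (uniform attachment followed by triad formation) can be simulated directly. This is a minimal sketch of the steps as listed; the function name `grow_network`, the seeds, and the parameter values in the examples are assumptions.

```python
import random

def grow_network(m0, steps, a, rng):
    """Grow a network from an m0-clique by uniform attachment and triad formation."""
    # Time zero: a clique on vertices 0..m0-1, stored as neighbor sets.
    adj = {v: set(range(m0)) - {v} for v in range(m0)}
    for _ in range(steps):
        new = len(adj)
        target = rng.randrange(new)   # uniform attachment: every existing vertex equally likely
        friends = list(adj[target])   # neighbors of the target before the new vertex joins
        adj[new] = {target}
        adj[target].add(new)
        for w in friends:             # triad formation: link to each neighbor with probability a
            if rng.random() < a:
                adj[new].add(w)
                adj[w].add(new)
    return adj

no_triads = grow_network(3, 50, 0.0, random.Random(3))   # a = 0: one new edge per step
all_triads = grow_network(3, 1, 1.0, random.Random(3))   # a = 1: attach to every neighbor too
mixed = grow_network(3, 200, 0.5, random.Random(3))
```

With a = 0 the model reduces to pure uniform attachment, while larger values of a create more triangles around the attached vertex.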
Uniform Attachment and Triad Formation
• when t = 0
Uniform Attachment and Triad Formation
• when t = 1
do nothing with
probability 1-a
uniform
attachment
triad formation
with probability a
Uniform Attachment and Triad Formation
• when t = 2
Community detection
 INFOCOM 2011
 Cheng-Shang Chang, Chin-Yi Hsu, Jay Cheng, and Duan-Shin Lee
Institute of Communications Engineering
National Tsing Hua University
Taiwan, R.O.C.
Detecting Community


Community :
 It is the appearance of densely connected groups of
vertices, with only sparser connections between
groups.
Modularity (Girvan and Newman 2002) :
 It is a property of a network and a specifically
proposed division of that network into communities.
 It measures whether the division is a good one, in the
sense that there are fewer than expected edges
between communities.
Detecting Community

Example :
Algorithms for Detecting Community
 Many algorithms have been proposed in the
literature. Basically, they can be classified into four
categories:
 (1) divisive algorithms
 (2) agglomerative algorithms (our algorithm belongs to this class; Newman’s fast algorithm also belongs to this class)
 (3) graph partitioning and clustering algorithms
 (4) data compression algorithms
Agglomerative Algorithms

Example :
Our Contributions
 In spite of all the efforts in developing community detection
algorithms, there are still many questions for which we do not have
satisfactory answers.
 What is a community in a network?
 Even with a definition of a community, what would be the
right index for measuring the performance of a graph
partition?
 We will provide a general probabilistic framework for these
questions.
Our Contributions
 Characterization of a graph: the key idea of our framework is to
characterize a graph by a bivariate distribution that specifies the probability of
the two vertices appearing at both ends of a “randomly” selected path in the
graph.
 Definition of a community: With such a bivariate distribution, we
can then define a community as a set of vertices with the property that it is
more likely to find the other end in the same community given one of the two
ends in a randomly selected path is already in the community.
 Correlation measures: To detect communities, we define a class of
correlation measures that can be used for measuring how two vertices (and two
communities) are related. Two communities are positively (resp. negatively)
correlated if the value of a correlation measure for these two communities is
positive (resp. negative).
Our Contributions
 A class of distribution-based clustering algorithms :
as a generalization of Newman’s fast algorithm, we propose a
class of distribution-based clustering algorithms for
community detection.
 Two theoretic results that can be proved for a
distribution-based clustering algorithm:
 (i) it guarantees that every community detected by the
algorithm satisfies the definition of a community under
certain technical conditions for the bivariate distribution,
 (ii) the algorithm increases the “modularity” index in each
merge of two positively correlated communities.
Correlation Measures
 Definition: For any two indicator random variables X
and Y, $\rho(X, Y)$ is called a correlation measure in this
paper if
 (1) $\rho(X, Y)$ is solely determined by the bivariate
distribution of X and Y,
 (2) $\rho(X, Y) = 0$ if and only if X and Y are independent,
 (3) $\rho(X, Y) > 0$ if and only if X and Y are positively
correlated, i.e.,
$P(X = 1, Y = 1) > P(X = 1) P(Y = 1)$
 From (2) and (3), we also know that $\rho(X, Y) < 0$ if and
only if X and Y are negatively correlated.
Examples of Correlation Measures
 Covariance: For two indicator random variables X and
Y, we have
$\mathrm{Cov}(X, Y) = P(X = 1, Y = 1) - P(X = 1) P(Y = 1)$
Examples of Correlation Measures
 Correlation: Note that the correlation of two random
variables X and Y, denoted by Correl[X, Y], can be
computed as follows:
$\mathrm{Correl}[X, Y] = \frac{\mathrm{Cov}[X, Y]}{\sqrt{\mathrm{Var}(X) \mathrm{Var}(Y)}} = \frac{P(X = 1, Y = 1) - P(X = 1) P(Y = 1)}{\sqrt{\mathrm{Var}(X) \mathrm{Var}(Y)}}$, where
$\mathrm{Var}(X) = P(X = 1) - (P(X = 1))^2$
$\mathrm{Var}(Y) = P(Y = 1) - (P(Y = 1))^2$
Examples of Correlation Measures
 Mutual information: The mutual information of two
random variables X and Y, denoted by I(X; Y), can be
computed as follows:
$I(X; Y) = \sum_{(x, y) \in \mathrm{supp}(P_{X,Y})} P_{X,Y}(x, y) \log \frac{P_{X,Y}(x, y)}{P_X(x) P_Y(y)}$
$\rho(X, Y) = \mathrm{Sgn}(\mathrm{Cov}(X, Y)) \cdot I(X; Y)$
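
The three example measures (covariance, correlation, and signed mutual information) can be computed from the joint and marginal probabilities of two indicator variables. The sketch below is illustrative only: the function name, its argument names, and the numeric inputs are assumptions, and it requires both marginals to lie strictly between 0 and 1.

```python
import math

def correlation_measures(p11, px, py):
    """Return (covariance, correlation, signed mutual information).

    p11 = P(X=1, Y=1), px = P(X=1), py = P(Y=1); assumes 0 < px < 1 and 0 < py < 1.
    """
    cov = p11 - px * py
    var_x = px - px**2
    var_y = py - py**2
    correl = cov / math.sqrt(var_x * var_y)

    # Full joint distribution of the two indicator variables.
    joint = {(1, 1): p11, (1, 0): px - p11, (0, 1): py - p11,
             (0, 0): 1 - px - py + p11}
    marg_x = {1: px, 0: 1 - px}
    marg_y = {1: py, 0: 1 - py}
    # Sum only over the support of the joint distribution.
    mi = sum(q * math.log(q / (marg_x[x] * marg_y[y]))
             for (x, y), q in joint.items() if q > 0)
    sign = (cov > 0) - (cov < 0)
    return cov, correl, sign * mi

independent = correlation_measures(0.25, 0.5, 0.5)
positive = correlation_measures(0.4, 0.5, 0.5)
negative = correlation_measures(0.1, 0.5, 0.5)
```

All three values share the same sign, which is what the definition of a correlation measure requires.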
Probabilistic Framework
 Instead of characterizing a network by a graph, we
characterize a network by a bivariate distribution.
 As mentioned before,
$P((V, W) = (v, w)) = \begin{cases} 1/2m, & \text{if vertices } v \text{ and } w \text{ are connected,} \\ 0, & \text{otherwise.} \end{cases}$
 Now, let $\sigma(A)$ be the sum of all the elements in a matrix
A, i.e.,
$\sigma(A) = \sum_v \sum_w A_{vw}$
 Then, we can rewrite the bivariate distribution:
$P(V = v, W = w) = p(v, w) = \frac{1}{\sigma(A)} A_{vw}$
Probabilistic Framework
 Recall that the bivariate distribution above is the
probability for the two ends of a randomly selected
edge in a graph.
 Now, our idea is to generate the needed bivariate
distribution by randomly selecting the two ends of a
path.
 We first consider a function f that maps an adjacency
matrix A to another matrix f(A).
 Then we define a bivariate distribution from f(A) by
$P(V = v, W = w) = \frac{1}{\sigma(f(A))} f(A)_{vw}$
Probabilistic Framework
 A random selection of a path with length not greater
than 2: Consider a graph with an $n \times n$ adjacency matrix
A and
$f(A) = \lambda_0 I + \lambda_1 A + \lambda_2 A^2$
 $\lambda_0$, $\lambda_1$, and $\lambda_2$ are three nonnegative constants
$\sigma(f(A)) = \lambda_0 \sigma(I) + \lambda_1 \sigma(A) + \lambda_2 \sigma(A^2)$
 A path with length l is selected with probability
$\lambda_l / \sigma(f(A))$ for l = 0, 1, and 2
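
The construction of a bivariate distribution from a weighted combination of I, A, and A squared can be sketched in a few lines: form the matrix and divide by the sum of its entries. The function names, the argument names `lam0`, `lam1`, `lam2`, and the 3-vertex path graph are assumptions for illustration.

```python
def mat_mul(X, Y):
    """Multiply two square matrices stored as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def total(M):
    """Sum of all the elements of a matrix."""
    return sum(map(sum, M))

def bivariate_distribution(A, lam0, lam1, lam2):
    """Normalize lam0*I + lam1*A + lam2*A^2 so that its entries sum to 1."""
    n = len(A)
    A2 = mat_mul(A, A)
    F = [[lam0 * (i == j) + lam1 * A[i][j] + lam2 * A2[i][j] for j in range(n)]
         for i in range(n)]
    s = total(F)
    return [[F[i][j] / s for j in range(n)] for i in range(n)]

# Hypothetical path graph 0-1-2 (m = 2 edges).
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
p_edges = bivariate_distribution(A, 0, 1, 0)        # two ends of a random edge
p_self = bivariate_distribution(A, 1, 1, 0)         # also allows paths of length 0
p_paths = bivariate_distribution(A, 1, 0.5, 0.25)   # paths of length 0, 1, and 2
```

With weights (0, 1, 0) every connected ordered pair gets probability 1/2m, which recovers the edge-based distribution.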
Probabilistic Framework
 A random walk on a graph: It can be characterized by
a Markov chain with the $n \times n$ transition probability
matrix $R = (R_{v,w})$, where
 $R_{v,w} = A_{vw} / k_v$ is the transition probability from
vertex v to vertex w.
 The stationary probability that the Markov chain is
in vertex v, denoted by $\pi_v$, is $k_v / 2m$.
 $\lambda_l$ is the probability that we select a path with
length l, l = 1, 2, … .
 The probability of selecting a random walk (path)
with vertices $v = v_1, v_2, \ldots, v_{l+1} = w$ is
$\lambda_l \pi_{v_1} \prod_{i=1}^{l} R_{v_i, v_{i+1}}$
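Probabilistic Framework
 We then have the bivariate distribution
$p(v, w) = \pi_v \sum_{l=1}^{\infty} \lambda_l \sum_{v_2} \cdots \sum_{v_l} \prod_{i=1}^{l} R_{v_i, v_{i+1}}$, with $v_1 = v$ and $v_{l+1} = w$
 We can simply let $\lambda_l = 0$ for all l > 2 and this leads to
$p(v, w) = \frac{\lambda_1}{2m} A_{v,w} + \frac{\lambda_2}{2m} \sum_{v_2=1}^{n} \frac{A_{v,v_2} A_{v_2,w}}{k_{v_2}}$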
Distribution-based Clustering Algorithm
 (1) Input a bivariate distribution $p(v, w)$, v, w = 1, 2, … ,
n that characterizes the two randomly selected nodes
V and W, and a correlation measure $\rho(X, Y)$ for two
indicator random variables.
 (2) Initially, there are n communities, indexed from 1
to n, with each community containing exactly one
node. Specifically, let $S_i$ be the set of nodes in
community i. Then $S_i = \{i\}$, i = 1, 2, … , n.
Distribution-based Clustering Algorithm
 (3) Let $X_i$ (resp. $Y_j$) be the indicator random variable
for the event that V is in community i (resp. W is in
community j). Then
$P(X_i = 1) = \sum_{v \in S_i} p_V(v) = p_V(i)$
$P(Y_j = 1) = \sum_{w \in S_j} p_W(w) = p_W(j)$
$P(X_i = 1, Y_j = 1) = \sum_{v \in S_i, w \in S_j} p(v, w) = p(i, j)$
Compute $\rho(X_i, Y_j)$ for all i and j.
Distribution-based Clustering Algorithm
 (4) Find the two (distinct) communities that have the
largest correlation measure. Group these two
communities into a new community. Suppose that
community i and community j are grouped into a new
community k. Then $S_k = S_i \cup S_j$ and update
$P(X_k = 1) = P(X_i = 1) + P(X_j = 1)$
$P(Y_k = 1) = P(Y_i = 1) + P(Y_j = 1)$
$P(X_k = 1, Y_k = 1) = P(X_i = 1, Y_i = 1) + P(X_i = 1, Y_j = 1) + P(X_j = 1, Y_i = 1) + P(X_j = 1, Y_j = 1)$
$P(X_k = 1, Y_l = 1) = P(X_i = 1, Y_l = 1) + P(X_j = 1, Y_l = 1)$
$P(X_l = 1, Y_k = 1) = P(X_l = 1, Y_i = 1) + P(X_l = 1, Y_j = 1)$
Distribution-based Clustering Algorithm
 (5) For all $l \ne k$, compute $\rho(X_k, Y_l)$ and $\rho(X_l, Y_k)$.
 (6) Repeat (4) until either there is only one community
left or all the remaining pairs of communities have
negative measures, i.e., $\rho(X_i, Y_j) \le 0$ for all $i \ne j$.
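
Steps (1) to (6) can be sketched as a short agglomerative loop. This is a simplified illustration rather than the authors' implementation: it recomputes the measure for every pair in each round instead of updating only the merged community, it stops as soon as no pair has a positive measure, and the function names, the covariance choice, and the two-triangle test graph are all assumptions.

```python
def covariance(p11, p_x, p_y):
    """Covariance of two indicator variables, used here as the correlation measure."""
    return p11 - p_x * p_y

def cluster(p, measure):
    """Greedy merging of communities driven by a correlation measure."""
    n = len(p)
    comms = {i: {i} for i in range(n)}                        # step (2): singletons
    joint = {(i, j): p[i][j] for i in range(n) for j in range(n)}
    px = {i: sum(p[i]) for i in range(n)}                     # P(X_i = 1)
    py = {j: sum(p[i][j] for i in range(n)) for j in range(n)}  # P(Y_j = 1)
    next_id = n
    while len(comms) > 1:
        # Step (4): find the pair of distinct communities with the largest measure.
        best, (i, j) = max(((measure(joint[a, b], px[a], py[b]), (a, b))
                            for a in comms for b in comms if a != b),
                           key=lambda t: t[0])
        if best <= 0:                                         # step (6): stop
            break
        k = next_id
        next_id += 1
        others = [l for l in comms if l not in (i, j)]
        joint[k, k] = joint[i, i] + joint[i, j] + joint[j, i] + joint[j, j]
        for l in others:
            joint[k, l] = joint[i, l] + joint[j, l]
            joint[l, k] = joint[l, i] + joint[l, j]
        px[k] = px[i] + px[j]
        py[k] = py[i] + py[j]
        comms[k] = comms[i] | comms[j]
        del comms[i], comms[j]
    return sorted(sorted(s) for s in comms.values())

# Hypothetical test graph: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
n = 6
p = [[0.0] * n for _ in range(n)]
for u, v in edges:
    p[u][v] = p[v][u] = 1 / (2 * len(edges))   # each connected ordered pair gets 1/2m

communities = cluster(p, covariance)
```

On this graph the loop merges each triangle into one community and then stops, because the two triangles are negatively correlated under the covariance measure.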
Definition of a Community
 Definition: A set of nodes S is a community in a
probabilistic sense if
$P(V \in S, W \in S) \ge P(V \in S) P(W \in S)$
 If $P(W \in S) > 0$, then this is equivalent to
$P(V \in S \mid W \in S) \ge P(V \in S)$
 It is more likely to find the other node in the same
community given that one of a randomly selected pair of
two nodes is already in the community.
Definition of a Community
 Theorem 1: Suppose that $p(v, w)$ is symmetric and
$p(v, v) \ge (p_V(v))^2$ for all v = 1, 2, … , n. Then every
community detected by any distribution-based
clustering algorithm is a community in the
probabilistic sense.
Definition of the Modularity Index
 Definition: Consider a bivariate distribution $p(v, w)$ with
v, w = 1, 2, … , n. Let $S_c$, c = 1, 2, … , C, be a partition of
{1, 2, … , n}, i.e., $S_c \cap S_{c'}$ is an empty set for $c \ne c'$ and
$\bigcup_{c=1}^{C} S_c = \{1, 2, \ldots, n\}$.
The modularity index Q with respect to the partition $S_c$,
where c = 1, 2, … , C, is
$Q = \sum_{c=1}^{C} \left( P(V \in S_c, W \in S_c) - P(V \in S_c) P(W \in S_c) \right)$
 Theorem 2: Suppose that $p(v, w)$ is symmetric. Then
for any distribution-based clustering algorithm, the
modularity index is non-decreasing in every iteration.
Simulation Result
 Each point in these figures is an average over 100
random graphs. In these figures, we also show 95%
confidence intervals for all data points.
 In our simulation results, we will consider three
distribution-based clustering algorithms:
 (1) covariance algorithm
 (2) correlation algorithm
 (3) mutual information algorithm
Simulation Result
 To map a graph with an adjacency matrix A to a
bivariate distribution $p(v, w)$, recall that
$f(A) = \lambda_0 I + \lambda_1 A + \lambda_2 A^2$. We will consider the following
three types of functions:
 (1) $(\lambda_0, \lambda_1, \lambda_2) = (0, 1, 0)$, i.e.,
$f_1(A) = A$
 (2) $(\lambda_0, \lambda_1, \lambda_2) = (1, 1, 0)$, i.e.,
$f_2(A) = I + A$
 (3) $(\lambda_0, \lambda_1, \lambda_2) = (1, 0.5, 0.25)$, i.e.,
$f_3(A) = I + 0.5 A + 0.25 A^2$
Simulation Result
W. W. Zachary, J. Anthropol. Res. 33, 452, 1977.
Simulation Result
[Figures: results for $f_1(A) = A$, $f_2(A) = I + A$, and $f_3(A) = I + 0.5 A + 0.25 A^2$]
Conclusion


In 2005, the National Research Council in the
U.S. realized that there is a need for “organized
knowledge of networks” based on the scientific
method.
This will require the integration of the knowledge
in various fields, including the Internet, power
grids, social networks, physical networks, and
biological networks. The main mathematical tool
for network science is the study of the dynamics
of graphs.
Research problems




How is life formed? Is the emergence of life through
random rewiring of DNAs according to a certain microrule?
How powerful is a person in a community? How much is
he/she worth? Can these be evaluated by the people
he/she knows?
How can one bring down the Internet? What is the best
strategy to defend one’s network from malicious attacks?
How are these related to the topology of a network?
Why is there a phase change from water to ice? Can this
be explained by using the percolation theory? Does the
large deviation theory play a role here?