Network Motifs: Simple Building Blocks of Complex Network

advertisement
Network Motifs: Simple Building
Blocks of Complex Network
Lecturer: Jian Li
Introduction
 Recently, it was found that
biochemical and neuronal network
share a similar property: they contain
recurring circuit elements which occur
more often far more than that in
randomized networks.
 We call such simple building blocks
network motifs.
Introduction
 In the case of biological regulation
networks, it has been suggested that
network motifs play key information
processing roles.
Introduction
 Some examples:
Three major network mortifs were found in the
transcription network of bacteria and yeast.
One of these the feed-forward loop, has been
shown theoretically to perform information
processing tasks such as sign-sensitive
filtering, response acceleration and pulsegeneration.
Introduction
 Some examples:
Introduction
 Schematic Illustration:
Red dashed line indicate edges that participate in the feedforward loop
motif, which occur five times in the real network.
Introduction
 Applications in other network




Ecology (food web)
Neurobiology (neuron connectivity)
Engineering (electronic circuit, WWW)
……………………
Introduction
 Some remarks:
 The solution we get is closely related to the
randomized network model. So a
reasonable select of randomized network
model is very important.
 Some functional-important but lessfrequent building block will be missed no
matter how we select our model. To find
this type of things need specific knowledge
and information which are beyond the
sweep of graph theory approach.
Related Problems
 Theoretical Perspective:
efficiently counting cycle.
counting spanning trees.
number of nonisomorphic graphs
testing isomorphism
approximating perfect matching.
approximating frequent subgraphs based
on the regularity lemma.
 …………………






Related Problems
 Data mining perspective.




Mining frequent subgraphs.
Mining a given subgraph.
Mining subgraphs in sparse network.
Graph-based substructure pattern
mining(gSpan)…………………
Related Problems
 Random network.
 Generating randomized network with
prescribed degree sequence.
 Estimating subgraphs in random
networks.
Related Problems
 Random network.
 Erdos model
 -the distribution of the number of
edges per node exhibit a Poissonian
distribution.
 Scale-free model
 -the distribution of the number of
edges per node exhibit a exponential
distribution.
Randomized Network
 Generating randomized network
 Here we only give a simple algorithm.
 We employed a Markov-chain algorithm, based on
starting with the real network and repeatedly swapping
randomly chosen pairs of connections (X1->Y1, X2 ->Y2
is replaced by X1->Y2, X2->Y1) until the network is well
randomized.
 Switching is prohibited if the either of the connections
X1->Y2 or X2->Y1 already exist.
Randomized Network
 Controlling for Appearances of (n – 1)Node Motifs
 We generate a series of randomized network
ensembles, each of which has the same (n –
1)-node subgraph count as the real network,
as a null hypothesis for detecting n-node
motifs.
 This is done to avoid assigning high
significance to a structure only because of the
fact that it includes a highly significant
substructure.
Randomized Network
 Controlling for Appearances of (n – 1)Node Motifs
 Metropolis Monte-Carlo approach
 Vreal,k be the number of appearances of each of the
kth (n-1)-node subgraphs in the real network and
Vrand,k be the corresponding vector in the randomized
network.
 We define an energy
E = k(|Vreal,k – Vrand,k|/(Vreal,k + Vrand,k)).
 The energy E is zero only when all the three-node
subgraph counts of the real and randomized graphs
are equal.
Randomized Network
 Controlling for Appearances of (n – 1)Node Motifs





start by fully randomizing the network according to
first algorithm.
Then, we generate a random switch (X1->Y1, X2->
Y2 to (X1->Y2, X2->Y1), and similarly for double edges,
as described above).
If this switch lowers E, it is accepted.
Otherwise, it is accepted with probability exp(–M E/T),
where ME is the difference in energy before and after
the switch and T is an effective temperature.
Graph Theoretical Results
 Controlling for Appearances of (n – 1)Node Motifs


This process is repeated, with a simulated annealing
regiment to lower T slowly until a solution with E = 0 is
obtained.
This can be readily generalized to form (n – 1)-node nullhypothesis networks
Algorithm: Counting
 Goal: find all n-node network motif
 Method:
 Do the following for both real network and
randomized network
 Simply enumerate all the possible n node
subgraphs, classify them into nonisomorphic class.
 Count the number of subgraphs in each
class.[see all types of 3,4node
nonisomorphic graphs]
Algorithm: Counting
 Efficiently count all connected n-node
subgraphs in a connectivity matrix M
main{
for all rows i ;
for each nonzero element (i, j);
search (i,j);
}
search(i,j)
{
for each k such that Mik = 1 and k!=j{
if an n-node subgraph is obtained then record it and return;
else search (i,k);
}
do similar things for each Mki = 1, Mkj = 1, Mjk = 1;
}
Algorithm: Counting
 A table is formed that counts the
number of appearances of each type
of subgraph in the network,
 This process is repeated for each of
the randomized networks. The
number of appearances of each type
of subgraph in the random ensemble
is recorded, to assess its statistical
significance.
Algorithm: Counting
 Criteria for Network Motif Selection
 (i) The probability that it appears in a randomized
network an equal or greater number of times than in
the real network is smaller than P = 0.01.
 (ii) The number of times it appears in the real
network with distinct sets of nodes is at least 4.
 (iii) The number of appearances in the real network is
significantly larger than in the randomized networks:
Nreal – Nrand > 0.1Nrand. This is done to avoid detecting
as motifs some common subgraphs that have only a
slight difference between Nrand and Nreal but have a
narrow distribution in the randomized networks.
Algorithm: Counting
 Result
Ci=Ni/i Ni
Z-scores :
Z = (Creal –Crand)/Varrand
(note the inequality:
P[|(X-E(x))|>Z*Var(x)]<1/Z2
)
High Z-scores indicate the
event is quit unlikely.
Algorithm: Sampling
 A clever trade-off between accuracy
and efficiency.
 The counting algorithm can exactly
enumerate the number of subgraph,
but to detect network motifs, we only
need to know which type of subgraph
occur more frequently in real network
than in randomized network.
Algorithm: Sampling
 Using random sampling method can
do pretty good estimation.
 Random sampling has many
applications.
 -approximating dense subset
 -approximating #P-complete problem
 -mechine learning
 ……………
Algorithm: Sampling
 This algorithm does not enumerate
subgraphs exhaustively but instead
samples subgraphs in order to estimate
their relative frequency.
 The runtime of the algorithm asymptotically
does not depend on the network size.
 Surprisingly, few samples are needed to
detect network motifs reliably.
 The sampling method is useful for analyzing
very large networks or for detection of
high-order motifs, which are beyond the
reach of exhaustive enumeration algorithms.
Algorithm: Sampling
 Definition:Es is the set of picked edges
 Vs is the set of all node that are touch be the edges in Es
 ALGORITHM Sampling:






Initiate Vs= and Es =
1.Pick a random edge e1=(vi,vj),update Es={e1},Vs={vi,vj}
2.Make a list L of all neighboring edges of Es, omit all edges
between Vs.if L= return to 1
3.pick a random edge e=(vk,vl)from L. Update Es=Es U {e},
Vs=Vs U {vk,vl}
4.Repeat steps 2-3 until completing n-node subgraph S.
5.Calculate the probability P to sample S.
Algorithm: Sampling
 The probability of sampling the subgraph is
the sum of the probabilities of all such
possible ordered sets of n-1 edges:
 Where Sm is a set of all (n-1)-permutations
of the edges from the specific subgraph
edges that could lead to a sample of the
subgraph. Ej is the j -th edge in a specific
(n-1)-permutation (σ).
Algorithm: Sampling
Algorithm: Sampling
 Add score W = 1/P to the
accumulated score, Si , of the
relevant subgraph type i: Si = Si + W.
After ST samples, assuming we
sampled L different subgraph types,
we calculate the estimated subgraph
concentrations
Ci =Si/k=1L Sk
Algorithm: Sampling
 Z-scores is calculated as before.
Z = (Creal –<Crand>)/Varrand
 where Creal is the concentration in the
real network, <Crand> and Varrand are
the mean and SD in the randomized
networks.
Algorithm: Sampling
Sampling method versus exhaustive enumeration, *Highlighted
subgraphs were found to be network motifs.
Algorithm: Sampling
 Algorithm convergence
 The subgraph concentrations calculated by
the sampling algorithm converged to the
fully enumerated concentrations. Different
numbers of samples were required for
achieving good estimations for different
subgraphs and in different networks.
 All of the simulations we performed, on a
variety of networks, showed that the
results converge toward the real values
within ST = 105 samples or less.
Algorithm: Sampling
 Algorithm convergence
 It is seen that even with a small number of
samples one can estimate reliably
concentrations as low as C = 10-5.
 It is possible to use convergence studies in
order to decide the required number of
samples.(adaptive sampling method,using
instantaneous convergence rate to decide
how many samples are enough)
Algorithm: Sampling
 The sampling method allows accurate counting of
rare, high-order subgraphs and motifs
Some discuss and Future attempt
 We focus on comparing between the real
network and the randomized network with
prescribed degree sequence. So our
question is whether some real frequent
building block are caused by the degree
sequence.
 If so, so what we have done will miss this
type of building block. Some other
randomized network model (rather than the
ones with prescribed degree sequence)
could be introduced to deal with such case.
Some discuss and Future attempt
 Embedding the graph to euclidean
space, and considering the subgraph
with no only topological properties
but also geometric properties.
THANKS~~~~~
Download