Network Motifs: Simple Building Blocks of Complex Networks

Lecturer: Jian Li

Introduction

Recently, it was found that biochemical and neuronal networks share a similar property: they contain recurring circuit elements that occur far more often than in randomized networks. We call such simple building blocks network motifs.

In the case of biological regulation networks, it has been suggested that network motifs play key information-processing roles.

Some examples: Three major network motifs were found in the transcription networks of bacteria and yeast. One of these, the feed-forward loop, has been shown theoretically to perform information-processing tasks such as sign-sensitive filtering, response acceleration, and pulse generation.

Schematic illustration: Red dashed lines indicate edges that participate in the feed-forward loop motif, which occurs five times in the real network.

Applications in other networks:
- Ecology (food webs)
- Neurobiology (neuron connectivity)
- Engineering (electronic circuits, WWW)
- ...

Some remarks:
- The solution we get depends closely on the randomized network model, so a reasonable choice of randomized network model is very important.
- Some functionally important but less frequent building blocks will be missed no matter how we select our model. Finding them requires specific knowledge and information beyond the scope of the graph-theoretic approach.

Related Problems

Theoretical perspective:
- Efficiently counting cycles.
- Counting spanning trees.
- Number of nonisomorphic graphs.
- Testing isomorphism.
- Approximating perfect matchings.
- Approximating frequent subgraphs based on the regularity lemma.
- ...

Data mining perspective:
- Mining frequent subgraphs.
- Mining a given subgraph.
- Mining subgraphs in sparse networks.
- Graph-based substructure pattern mining (gSpan).
- ...

Random networks:
- Generating randomized networks with a prescribed degree sequence.
- Estimating subgraphs in random networks.
- Erdős model: the number of edges per node follows a Poisson distribution.
- Scale-free model: the number of edges per node follows a power-law distribution.

Randomized Network

Generating randomized networks

Here we give only a simple algorithm. We employ a Markov-chain algorithm that starts with the real network and repeatedly swaps randomly chosen pairs of connections (X1->Y1, X2->Y2 is replaced by X1->Y2, X2->Y1) until the network is well randomized. A switch is prohibited if either of the connections X1->Y2 or X2->Y1 already exists.

Controlling for appearances of (n - 1)-node motifs

We generate a series of randomized network ensembles, each of which has the same (n - 1)-node subgraph counts as the real network, as a null hypothesis for detecting n-node motifs. This avoids assigning high significance to a structure only because it includes a highly significant substructure.

Metropolis Monte-Carlo approach: let $V_{real,k}$ be the number of appearances of the k-th (n - 1)-node subgraph in the real network and $V_{rand,k}$ the corresponding count in the randomized network. We define an energy

$E = \sum_k \frac{|V_{real,k} - V_{rand,k}|}{V_{real,k} + V_{rand,k}}$

The energy E is zero only when all the three-node subgraph counts of the real and randomized graphs are equal.

We start by fully randomizing the network according to the first algorithm. Then we generate a random switch (X1->Y1, X2->Y2 to X1->Y2, X2->Y1, and similarly for double edges, as described above). If this switch lowers E, it is accepted.
Otherwise, it is accepted with probability $\exp(-\Delta E / T)$, where $\Delta E$ is the difference in energy before and after the switch and T is an effective temperature.

Graph Theoretical Results

Controlling for appearances of (n - 1)-node motifs

This process is repeated, with a simulated annealing regimen that lowers T slowly, until a solution with E = 0 is obtained. This can be readily generalized to form (n - 1)-node null-hypothesis networks.

Algorithm: Counting

Goal: find all n-node network motifs.

Method: do the following for both the real network and the randomized networks.
- Enumerate all possible n-node subgraphs and classify them into nonisomorphic classes.
- Count the number of subgraphs in each class. [see all types of 3- and 4-node nonisomorphic graphs]

Efficiently count all connected n-node subgraphs in a connectivity matrix M:

    main {
        for all rows i
            for each nonzero element (i, j)
                search(i, j)
    }

    search(i, j) {
        for each k such that M[i][k] = 1 and k != j {
            if an n-node subgraph is obtained
                record it and return
            else
                search(i, k)
        }
        do similar things for each k with M[k][i] = 1, M[k][j] = 1, M[j][k] = 1
    }

A table is formed that counts the number of appearances of each type of subgraph in the network. This process is repeated for each of the randomized networks. The number of appearances of each type of subgraph in the random ensemble is recorded to assess its statistical significance.

Criteria for network motif selection:
(i) The probability that it appears in a randomized network an equal or greater number of times than in the real network is smaller than P = 0.01.
(ii) The number of times it appears in the real network with distinct sets of nodes is at least 4.
(iii) The number of appearances in the real network is significantly larger than in the randomized networks: Nreal - Nrand > 0.1 Nrand.
This is done to avoid detecting as motifs some common subgraphs that show only a slight difference between Nrand and Nreal but have a narrow distribution in the randomized networks.

Result

The concentration of subgraph type i is $C_i = N_i / \sum_k N_k$.

Z-scores: $Z = (C_{real} - \langle C_{rand} \rangle) / \sigma_{rand}$, where $\langle C_{rand} \rangle$ and $\sigma_{rand}$ are the mean and standard deviation of the concentration in the randomized networks. (Note Chebyshev's inequality: $P[|X - E(X)| \ge Z \sigma(X)] \le 1/Z^2$.) A high Z-score indicates that the event is quite unlikely.

Algorithm: Sampling

A clever trade-off between accuracy and efficiency. The counting algorithm enumerates subgraphs exactly, but to detect network motifs we only need to know which types of subgraph occur more frequently in the real network than in the randomized networks.

Random sampling can give a pretty good estimate. Random sampling has many applications:
- approximating dense subsets
- approximating #P-complete problems
- machine learning
- ...

This algorithm does not enumerate subgraphs exhaustively but instead samples subgraphs in order to estimate their relative frequency. The runtime of the algorithm asymptotically does not depend on the network size. Surprisingly, few samples are needed to detect network motifs reliably. The sampling method is useful for analyzing very large networks or for detecting high-order motifs, which are beyond the reach of exhaustive enumeration algorithms.

Definitions: Es is the set of picked edges; Vs is the set of all nodes touched by the edges in Es.

ALGORITHM Sampling: Initialize Vs = ∅ and Es = ∅.
1. Pick a random edge e1 = (vi, vj). Update Es = {e1}, Vs = {vi, vj}.
2. Make a list L of all neighboring edges of Es, omitting all edges between members of Vs. If L = ∅, return to step 1.
3. Pick a random edge e = (vk, vl) from L. Update Es = Es ∪ {e}, Vs = Vs ∪ {vk, vl}.
4. Repeat steps 2-3 until an n-node subgraph S is completed.
5. Calculate the probability P of sampling S.
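Steps 1-4 of the sampling procedure can be sketched in Python as follows. This is only an illustrative sketch, not the lecture's implementation: the function name `sample_subgraph`, the edge-list representation, and the `max_restarts` safeguard are assumptions made for the example, and step 5 (computing P) is left out.

```python
import random


def sample_subgraph(edges, n, rng=random, max_restarts=1000):
    """Sample one connected n-node subgraph by random edge expansion.

    edges: list of (u, v) pairs (hypothetical edge-list representation).
    Returns (Vs, Es): the sampled node set and the list of picked edges.
    """
    edges = list(edges)
    for _ in range(max_restarts):  # assumed safeguard, not part of the slides
        # Step 1: pick a random starting edge.
        first = rng.choice(edges)
        es = [first]
        vs = set(first)
        while len(vs) < n:
            # Step 2: neighboring edges have exactly one endpoint in Vs;
            # edges between two nodes already in Vs are omitted.
            candidates = [e for e in edges if (e[0] in vs) != (e[1] in vs)]
            if not candidates:
                break  # L is empty: go back to step 1
            # Step 3: pick a random neighboring edge and grow both sets.
            e = rng.choice(candidates)
            es.append(e)
            vs.update(e)
        # Step 4: stop once n nodes have been collected.
        if len(vs) == n:
            return vs, es
    raise ValueError("no connected n-node subgraph found")
```

The subgraph S is then the subgraph induced by Vs, that is, all edges of the network between nodes of Vs, not only the picked edges in Es.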
The probability of sampling the subgraph is the sum of the probabilities of all possible ordered sets of n - 1 edges that lead to it:

$P = \sum_{\sigma \in S_m} \prod_{E_j \in \sigma} \Pr[E_j \mid E_1, \ldots, E_{j-1}]$

where $S_m$ is the set of all (n - 1)-permutations of the subgraph's edges that could lead to a sample of the subgraph, and $E_j$ is the j-th edge in a specific (n - 1)-permutation $\sigma$.

Add the score W = 1/P to the accumulated score $S_i$ of the relevant subgraph type i: $S_i = S_i + W$. After $S_T$ samples, assuming we sampled L different subgraph types, we calculate the estimated subgraph concentrations

$C_i = S_i / \sum_{k=1}^{L} S_k$

Z-scores are calculated as before: $Z = (C_{real} - \langle C_{rand} \rangle) / \sigma_{rand}$, where $C_{real}$ is the concentration in the real network, and $\langle C_{rand} \rangle$ and $\sigma_{rand}$ are the mean and standard deviation in the randomized networks.

[Table: sampling method versus exhaustive enumeration. Highlighted subgraphs were found to be network motifs.]

Algorithm convergence

The subgraph concentrations calculated by the sampling algorithm converged to the fully enumerated concentrations. Different numbers of samples were required to achieve good estimates for different subgraphs and in different networks. All of the simulations we performed, on a variety of networks, showed that the results converge toward the real values within $S_T = 10^5$ samples or fewer.

Even with a small number of samples, one can reliably estimate concentrations as low as $C = 10^{-5}$. Convergence studies can be used to decide the required number of samples (adaptive sampling method: use the instantaneous convergence rate to decide how many samples are enough).

The sampling method allows accurate counting of rare, high-order subgraphs and motifs.

Discussion and Future Work

We focus on comparing the real network with randomized networks that have a prescribed degree sequence.
So our question is whether some frequent building blocks in real networks are caused by the degree sequence itself. If so, what we have done will miss this type of building block. Other randomized network models (rather than those with a prescribed degree sequence) could be introduced to deal with such cases.

Future attempt: embedding the graph in Euclidean space and considering subgraphs with not only topological properties but also geometric properties.

Thanks!