Motifs in Gene Regulatory Networks Based on the work of Uri Alon and his group. Report for the Seminar : “Bioinformatics of Gene Regulation” Pavlos Pavlidis PhD student – 1st Semester, Autumn 2005 Abstract Gene Regulatory Networks describe the components and their relationships of the cell’s transcriptional machinery. They are the on-off switches and rheostats of a cell operating at the gene level and dynamically orchestrate the transcriptional orchestra in order to respond to an inner or outer stimulus. (doegenomes.org). In the current work we review the specific attributes of those networks. These attributes are represented by specific types of “motifs”, which do not occur in random networks. We show that the motifs enable the network to respond in specific challenges of the cellular environment. Introduction The design principles of gene regulatory networks define the mechanism of regulation, determining their basic building blocks. The building blocks are responsible for the properties of the network since they enable it to respond with a certain way to stimuli. Often, in biological system a critical factor for the survival is the speed of cellular response. S Mangan & U Alon (2003) showed that specific motifs (feed-forwards loops) determine the speed of response in presence or in absence of a factor. Methods Databases The organization of the genome is different in Eucaryotes and Procaryotes. In the latter the genes are organized in operons: a functionally integrated genetic unit for the control of gene expression in bacteria. It consists of one or more genes that encode one or more polypeptide(s) and the adjacent site (promoter and operator) that controls their expression by regulating the transcription of the structural genes (www.fao.org). A database that contains information for the operons in E.coli, their products and the transcription factors that are involved, is the Regulon DB (http://www.cifn.unam.mx/Computational_Genomics/regulondb/) RegulonDB is a Java Interface Tool that enables somebody to navigate graphically in the chromosome and zoom into operons, transcription units and genes (Figure 1). In 2002, the database contained 577 interactions and 424 operons. Furthermore, the number of the involved transcription factor was 116. Using this information Shai S. et al (2002) presented the transcriptional network as a directed graph, in which each node simulates an operon and the edge (directed) between two operons symbolize that the first operon produces a transcription factor which regulates the transcription of the second operon. The transcription factor can be either activator or repressor. Fig1: The RegulonDB Scanning the network The network implemented using a matrix M. The value of Mij is 1 if the product of operon j regulates the transcription of operon i and 0 otherwise. The authors scanned the network in order to detect recurrent patterns (i.e. subgraphs that are repeated in the network representation). A pattern consists of n nodes. Therefore it is called n-node subgraph. The algorithm loops through all rows i of M. For each element (i, j) that is different that zero it loops through all connected elements (i, k), (k, i), (j, k), (k, j). This means that it loops in the i row and in the j line and considers the cells that have a non-zero value. These are the cells that are connected directly with the (i, j). The procedure is reapeated recursively until the n-node subgraph is obtained. The total number of occerences is stored. It should be noted that the algorithm performs corrections for multiple isomorphic occurrences of the same submatrix (because of symmetries). The procedure is performed also to a random network (RN). The term random here, needs further rubrication. A random network must retain the basic properties of the original real network (ON). These properties are: The number of nodes in the RN must be equal to the number of nodes in the ON. For each node, the number of incoming and out-going edges remains the same as in the ON. The number of n-1 subgraphs in the RN equals to the corresponding number in the ON. This must hold because otherwise if a subgraph in a motif is significant then the whole motif will be significant. In order to avoid this sideeffect the authors put this third requirement. The number of occurrences of an n-node subgraph is calculated for each one of 1000 RNs. The next step consists of the calculation of distribution and the probability that an RN will have the same or more appearances of a particular n-node subgraph. If the following requirements hold then the n-node subgraph is considered significant. The probability that a motif appears in a randomized network an equal or greater number of times than in ON must be smaller than 0.01. This requirement ensures that the appearance of the motif in a real network is happening much more often than in a random network. For this reason, it can be assumed that the corresponding n-node subgraph has a specific biological function that is implemented by the ON. The number of times that a motif appears in the ON (with distinct set of nodes) is at least 4. The number of appearances of the motif in the ON must be significant greater than the number it appears in a RN. That is formulated as Nreal – Nrand > 0.1Nrand, where Nreal is the number of occurrences in the ON and Nrand is the number of occurrences in the RN. This ensures that a motif that occurs in the RN approximately the same times as in the ON, but also has a very narrow distribution (so the requirement of p<0.01 holds) will not be considered as a significant one. It should be mentioned, that the algorithm does not recognizes biologically significant motifs based on experimental or functional information, but statistically significant motifs that occur in a real network but occur with a very low probability in a randomized network. The creation of the randomized networks As mentioned above in the randomized networks each node must have the same number of incoming and outgoing edges. In terms of Matrix representation that means that each row in the randomized graph must have the same number of 1s as in the ON. The same requirement must hold for the columns. This is implemented using a Markov-chain algorithm which is used often in adjacenty matrices. The algorithm begins with the matrix M of the ON and repeatedly swaps randomly chosen pairs of connections. (X1 Y1, X2 Y2 is replaced by X1 Y2, X2 Y1) until the network is well mixed. Notice that according to this procedure the number of the connections in each row and each column will remain the same as in the original matrix. Results The first significant motif is a 3-node graph. There are eight types of 3-node motifs (Fig 2). From these motifs only the feed-forward loop occurs significantly more frequently in biological networks than in random networks. The feed – forward loop (FFL) consists of two distinct types. The first one is termed as coherent FFL (cFFL) and the second one incoherent FFL (iFFL). cFFL means that the effect of the factor X to the product Z in the direct link (positive or negative) is the same as the indirect link through Y (see Fig 2). It turns out (see below) that each type of those two motifs have different properties and that they affect the response of the system to external changes of the environment. The second motif, is the SIM. SIM means single input module. In this motif there is a transcription factor X that regulates the expression of a set of operons. All of the operons are under the same kind of control (positive or negative). The transcription factors that control the expression of a SIM motif are most of the times autoregulated. Specifically 70% of them are auto-repressed. This means that they repress their further expression when their transcription level is increased. In E.coli the SIM moti appears 24 times. Large SIMs occur in randomized networks with probability less than 0.01. The third motif is called DOR, which means dense overlapping regulons. Those motifs contain a overlapping interactions between operons as an output, and as an input a set of transcription factors. The critical point is the determination of the word “density”. The authors show that in randomized networks the frequency of pairs of genes regulated by the same two transcription factors is much smaller than in real networks (57±14 vs 203). An example of DOR is the set of operons which are regulated by RpoS in the stationary phase. An interesting property of the DORs is that it seems that there is an inner organization of this motif. This means that a set of genes in a DOR can share exactly the same transcription factors, which is a subset of the transcription factors of the whole DOR. A figure represents the three motifs (FFL, SIM, DOR) is shown below. Fig 2: a. the FFL motif and in b a real example is the motif of ara. c. the SIM motif. A transcription factor is autoregulated and simultaneously regulates the expression of many genes. d. A biological example which represents a SIM motif. It’s the arginine biosynthesis. e. The DOR motif. A set of operons Z1…Zn are regulated by a combination of a set of Input transcription factors, X1…Xn. f. An example of the DOR motif is illustrated. It represents the stationary phase response. The meaning of FFL motif Mangan & Alon (2003) investigate in detail, the properties of the FFL motif. The FFL contains three interactions as is shown in Fig2. One direct from the transcription factor X (general transcription factor) to the operon Z, and two more from X to Z through the specific transcription factor Y. Each interaction can be either negative or positive. Therefore there are 8 different FFL motifs. Four of them are called coherent, when the type of regulation between X and Z is the same as the overall regulation through Y, and the complementary set is called incoherent. Fig3 : The eight FFL motifs. The upper row shows the coherent FFLs, and the second row the incoherent. The distinction is based on the direct effect of X to Z compared to the indirect through Y. The FFL motifs provide a great flexibility to the network considering the expression of the target genes. Both coherent and incoherent FFL is sign sensitive. They respond only in one direction. For example the first coherent type, which is the most common in E.coli and in Yeast responds rapidly to stimuli that decrease X but slowly to stimuli that increase it. On the other hand the last type of the incoherent FFL has a pulse-like behaviour. In the beginning the production of Z is increased, but as the production of X increases the Y decreases. After a certain point the quantity of Y is not able to produce Z and that means that Z decreases. Conclusions The work of Alon’s group represents the mining of motifs of a network, which is completely analogous to the mining of important patterns in a DNA sequence. With many publications from 2002 until 2005 they showed that biological regulatory networks are not structured randomly but they show preferences for some certain motifs. These motifs can be simple as the FFL motif which consists only of three nodes or more complicated as the DOR motif which can be viewed as a super-motif. The important result is that each one of those motifs provokes a specific function. The authors showed that FFL respond asymmetrically to responses that increase or decrease X and they proposed roles for some types of those motifs. References Mangan, S. and U. Alon. 2003. Structure and function of the feed-forward loop network motif. Proc Natl Acad Sci U S A 100: 11980-11985. Mangan, S., A. Zaslaver, and U. Alon. 2003. The coherent feedforward loop serves as a sign-sensitive delay element in transcription networks. J Mol Biol 334: 197204. Milo, R., S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. 2002. Network motifs: simple building blocks of complex networks. Science 298: 824-827. Rosenfeld, N., M.B. Elowitz, and U. Alon. 2002. Negative autoregulation speeds the response times of transcription networks. J Mol Biol 323: 785-793. Shen-Orr, S.S., R. Milo, S. Mangan, and U. Alon. 2002. Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet 31: 64-68.