Pavlidis_BiGR

advertisement
Motifs in Gene Regulatory Networks
Based on the work of Uri Alon and his group.
Report for the Seminar : “Bioinformatics of Gene Regulation”
Pavlos Pavlidis
PhD student – 1st Semester, Autumn 2005
Abstract
Gene Regulatory Networks describe the components and their relationships of the
cell’s transcriptional machinery. They are the on-off switches and rheostats of a cell
operating at the gene level and dynamically orchestrate the transcriptional orchestra in
order to respond to an inner or outer stimulus. (doegenomes.org). In the current work
we review the specific attributes of those networks. These attributes are represented
by specific types of “motifs”, which do not occur in random networks. We show that
the motifs enable the network to respond in specific challenges of the cellular
environment.
Introduction
The design principles of gene regulatory networks define the mechanism of
regulation, determining their basic building blocks. The building blocks are
responsible for the properties of the network since they enable it to respond with a
certain way to stimuli. Often, in biological system a critical factor for the survival is
the speed of cellular response. S Mangan & U Alon (2003) showed that specific
motifs (feed-forwards loops) determine the speed of response in presence or in
absence of a factor.
Methods
Databases
The organization of the genome is different in Eucaryotes and Procaryotes. In the
latter the genes are organized in operons: a functionally integrated genetic unit for the
control of gene expression in bacteria. It consists of one or more genes that encode
one or more polypeptide(s) and the adjacent site (promoter and operator) that controls
their expression by regulating the transcription of the structural genes (www.fao.org).
A database that contains information for the operons in E.coli, their products and the
transcription
factors
that
are
involved,
is
the
Regulon
DB
(http://www.cifn.unam.mx/Computational_Genomics/regulondb/) RegulonDB is a
Java Interface Tool that enables somebody to navigate graphically in the chromosome
and zoom into operons, transcription units and genes (Figure 1). In 2002, the
database contained 577 interactions and 424 operons. Furthermore, the number of the
involved transcription factor was 116. Using this information Shai S. et al (2002)
presented the transcriptional network as a directed graph, in which each node
simulates an operon and the edge (directed) between two operons symbolize that the
first operon produces a transcription factor which regulates the transcription of the
second operon. The transcription factor can be either activator or repressor.
Fig1: The RegulonDB
Scanning the network
The network implemented using a matrix M. The value of Mij is 1 if the product of
operon j regulates the transcription of operon i and 0 otherwise.
The authors scanned the network in order to detect recurrent patterns (i.e. subgraphs
that are repeated in the network representation). A pattern consists of n nodes.
Therefore it is called n-node subgraph.
The algorithm loops through all rows i of M. For each element (i, j) that is different
that zero it loops through all connected elements (i, k), (k, i), (j, k), (k, j). This means
that it loops in the i row and in the j line and considers the cells that have a non-zero
value. These are the cells that are connected directly with the (i, j). The procedure is
reapeated recursively until the n-node subgraph is obtained. The total number of
occerences is stored. It should be noted that the algorithm performs corrections for
multiple isomorphic occurrences of the same submatrix (because of symmetries).
The procedure is performed also to a random network (RN). The term random here,
needs further rubrication.
A random network must retain the basic properties of the original real network (ON).
These properties are:
 The number of nodes in the RN must be equal to the number of nodes in the
ON.


For each node, the number of incoming and out-going edges remains the same
as in the ON.
The number of n-1 subgraphs in the RN equals to the corresponding number in
the ON. This must hold because otherwise if a subgraph in a motif is
significant then the whole motif will be significant. In order to avoid this sideeffect the authors put this third requirement.
The number of occurrences of an n-node subgraph is calculated for each one of 1000
RNs. The next step consists of the calculation of distribution and the probability that
an RN will have the same or more appearances of a particular n-node subgraph.
If the following requirements hold then the n-node subgraph is considered significant.
 The probability that a motif appears in a randomized network an equal or
greater number of times than in ON must be smaller than 0.01. This
requirement ensures that the appearance of the motif in a real network is
happening much more often than in a random network. For this reason, it can
be assumed that the corresponding n-node subgraph has a specific biological
function that is implemented by the ON.
 The number of times that a motif appears in the ON (with distinct set of
nodes) is at least 4.
The number of appearances of the motif in the ON must be significant greater than the
number it appears in a RN. That is formulated as Nreal – Nrand > 0.1Nrand, where Nreal is
the number of occurrences in the ON and Nrand is the number of occurrences in the
RN. This ensures that a motif that occurs in the RN approximately the same times as
in the ON, but also has a very narrow distribution (so the requirement of p<0.01
holds) will not be considered as a significant one.
It should be mentioned, that the algorithm does not recognizes biologically
significant motifs based on experimental or functional information, but statistically
significant motifs that occur in a real network but occur with a very low probability in
a randomized network.
The creation of the randomized networks
As mentioned above in the randomized networks each node must have the same
number of incoming and outgoing edges. In terms of Matrix representation that means
that each row in the randomized graph must have the same number of 1s as in the ON.
The same requirement must hold for the columns. This is implemented using a
Markov-chain algorithm which is used often in adjacenty matrices. The algorithm
begins with the matrix M of the ON and repeatedly swaps randomly chosen pairs of
connections. (X1  Y1, X2  Y2 is replaced by X1  Y2, X2  Y1) until the network
is well mixed. Notice that according to this procedure the number of the connections in
each row and each column will remain the same as in the original matrix.
Results
The first significant motif is a 3-node graph. There are eight types of 3-node motifs (Fig
2). From these motifs only the feed-forward loop occurs significantly more frequently in
biological networks than in random networks. The feed – forward loop (FFL) consists of
two distinct types. The first one is termed as coherent FFL (cFFL) and the second one
incoherent FFL (iFFL). cFFL means that the effect of the factor X to the product Z in the
direct link (positive or negative) is the same as the indirect link through Y (see Fig 2). It
turns out (see below) that each type of those two motifs have different properties and that
they affect the response of the system to external changes of the environment.
The second motif, is the SIM. SIM means single input module. In this motif there is a
transcription factor X that regulates the expression of a set of operons. All of the operons
are under the same kind of control (positive or negative). The transcription factors that
control the expression of a SIM motif are most of the times autoregulated. Specifically
70% of them are auto-repressed. This means that they repress their further expression
when their transcription level is increased. In E.coli the SIM moti appears 24 times. Large
SIMs occur in randomized networks with probability less than 0.01.
The third motif is called DOR, which means dense overlapping regulons. Those motifs
contain a overlapping interactions between operons as an output, and as an input a set of
transcription factors. The critical point is the determination of the word “density”. The
authors show that in randomized networks the frequency of pairs of genes regulated by
the same two transcription factors is much smaller than in real networks (57±14 vs 203).
An example of DOR is the set of operons which are regulated by RpoS in the
stationary phase. An interesting property of the DORs is that it seems that there is an
inner organization of this motif. This means that a set of genes in a DOR can share
exactly the same transcription factors, which is a subset of the transcription factors of
the whole DOR.
A figure represents the three motifs (FFL, SIM, DOR) is shown below.
Fig 2:
a. the FFL motif and in b a real example is the
motif of ara.
c. the SIM motif. A transcription factor is
autoregulated and simultaneously regulates the
expression of many genes.
d. A biological example which represents a SIM
motif. It’s the arginine biosynthesis.
e. The DOR motif. A set of operons Z1…Zn are
regulated by a combination of a set of Input
transcription factors, X1…Xn.
f. An example of the DOR motif is illustrated. It
represents the stationary phase response.
The meaning of FFL motif
Mangan & Alon (2003) investigate in detail, the properties of the FFL motif. The FFL
contains three interactions as is shown in Fig2. One direct from the transcription
factor X (general transcription factor) to the operon Z, and two more from X to Z
through the specific transcription factor Y. Each interaction can be either negative or
positive. Therefore there are 8 different FFL motifs. Four of them are called coherent,
when the type of regulation between X and Z is the same as the overall regulation
through Y, and the complementary set is called incoherent.
Fig3 :
The eight FFL motifs. The upper
row shows the coherent FFLs, and
the second row the incoherent.
The distinction is based on the
direct effect of X to Z compared
to the indirect through Y.
The FFL motifs provide a great flexibility to the network considering the expression
of the target genes. Both coherent and incoherent FFL is sign sensitive. They respond
only in one direction. For example the first coherent type, which is the most common
in E.coli and in Yeast responds rapidly to stimuli that decrease X but slowly to stimuli
that increase it. On the other hand the last type of the incoherent FFL has a pulse-like
behaviour. In the beginning the production of Z is increased, but as the production of
X increases the Y decreases. After a certain point the quantity of Y is not able to
produce Z and that means that Z decreases.
Conclusions
The work of Alon’s group represents the mining of motifs of a network, which is
completely analogous to the mining of important patterns in a DNA sequence. With
many publications from 2002 until 2005 they showed that biological regulatory
networks are not structured randomly but they show preferences for some certain
motifs. These motifs can be simple as the FFL motif which consists only of three
nodes or more complicated as the DOR motif which can be viewed as a super-motif.
The important result is that each one of those motifs provokes a specific function. The
authors showed that FFL respond asymmetrically to responses that increase or
decrease X and they proposed roles for some types of those motifs.
References
Mangan, S. and U. Alon. 2003. Structure and function of the feed-forward loop
network motif. Proc Natl Acad Sci U S A 100: 11980-11985.
Mangan, S., A. Zaslaver, and U. Alon. 2003. The coherent feedforward loop serves as
a sign-sensitive delay element in transcription networks. J Mol Biol 334: 197204.
Milo, R., S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. 2002.
Network motifs: simple building blocks of complex networks. Science 298:
824-827.
Rosenfeld, N., M.B. Elowitz, and U. Alon. 2002. Negative autoregulation speeds the
response times of transcription networks. J Mol Biol 323: 785-793.
Shen-Orr, S.S., R. Milo, S. Mangan, and U. Alon. 2002. Network motifs in the
transcriptional regulation network of Escherichia coli. Nat Genet 31: 64-68.
Download