6. Conclusions
In this thesis we elucidated the principles underlying the evolution of transcription factors, the
regulatory networks they form, how these networks evolve within and across organisms, and
the dynamic usage of such networks in response to different cellular conditions. We also
provided insights that could be useful in engineering regulatory networks in different organisms.
We developed a computational method to reliably predict DNA binding transcription factors
using domain assignments to protein sequences. This allowed us to identify and characterise
271 transcription factors in the E. coli genome for which we determined domain architecture and
evolutionary relationships. We showed that in ~40% of the cases, DNA binding domains
combine with a partner domain, whose role is to recognize a variety of small molecules, and
that close to 75% of all transcription factors evolved by gene duplication.
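The domain-based prediction step can be illustrated with a minimal sketch. It assumes that each protein already carries a list of domain family assignments and that a curated set of DNA binding domain families is available. The family names, the protein identifiers and both function names below are hypothetical placeholders, not the ones used in the thesis.

```python
# Hypothetical set of DNA binding domain (DBD) families; the actual
# curated list used in the thesis is not reproduced here.
DNA_BINDING_FAMILIES = {"winged_helix", "lambda_repressor_like", "homeodomain_like"}


def predict_transcription_factors(domain_assignments, dbd_families=DNA_BINDING_FAMILIES):
    """Return {protein: (dbd_domains, partner_domains)} for every protein
    that carries at least one DNA binding domain."""
    predicted = {}
    for protein, domains in domain_assignments.items():
        dbds = [d for d in domains if d in dbd_families]
        if dbds:
            partners = [d for d in domains if d not in dbd_families]
            predicted[protein] = (dbds, partners)
    return predicted


def fraction_with_partner(predicted):
    """Fraction of predicted transcription factors with a partner domain."""
    if not predicted:
        return 0.0
    with_partner = sum(1 for _, partners in predicted.values() if partners)
    return with_partner / len(predicted)


# Toy input (made up for illustration only).
toy_assignments = {
    "protA": ["winged_helix", "periplasmic_binding_like"],
    "protB": ["homeodomain_like"],
    "protC": ["p_loop_ntpase"],
}
tfs = predict_transcription_factors(toy_assignments)
partner_fraction = fraction_with_partner(tfs)
```

On real data the same bookkeeping would give the share of transcription factors whose DNA binding domain is combined with a partner domain.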
We also showed that activators, repressors and dual regulators in E. coli belong to the same
families and share common domain architectures, suggesting that the regulatory role of a
transcription factor is not determined by evolutionary relationships. To a large extent, it is the
position of the transcription factor binding site that is indicative of its role: activators essentially
bind upstream, while over two thirds of repressors have at least one downstream binding site.
These findings can be used to improve current tools for predicting the position of DNA binding
sites.
In general, the method we developed to study transcription factors in E. coli can be applied to
any other genome. We used the same approach to identify transcription factors in both human
and mouse genomes. These predictions are currently being experimentally verified in the
context of a functional genomics project at the RIKEN Genome Science Center in Japan.
Transcription factors have evolved by extensive duplication and we were able to quantify it. We
then studied how gene duplication can explain network growth. We proposed a model for
network evolution based on gene duplication and quantified the extent to which this model
applies to experimental data. We found that close to 90% of the genes in E. coli and yeast
regulatory networks have evolved by gene duplication. Of these events, 45% can be
explained by the conservation of both genes and their regulatory interactions, while in the
remaining cases gene duplication was followed by the gain of new interactions during the
divergence phase.
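One simplified way to separate the two outcomes of duplication is sketched below. It assumes that genes have been grouped into homologous families and labels an interaction as "inherited" when another interaction in the network connects the same or a homologous transcription factor to the same or a homologous target. All other interactions are labelled "gained". This criterion, the function names and the toy data are illustrative assumptions, not the exact procedure of the thesis.

```python
def homologous(a, b, family_of):
    """True when a and b are distinct genes assigned to the same family."""
    fa, fb = family_of.get(a), family_of.get(b)
    return a != b and fa is not None and fa == fb


def classify_interactions(edges, family_of):
    """Label each (tf, target) edge as 'inherited' or 'gained'."""
    edges = set(edges)
    labels = {}
    for tf, target in edges:
        inherited = False
        for other_tf, other_target in edges:
            if (other_tf, other_target) == (tf, target):
                continue  # an edge cannot support itself
            tf_match = other_tf == tf or homologous(tf, other_tf, family_of)
            target_match = (other_target == target
                            or homologous(target, other_target, family_of))
            if tf_match and target_match:
                inherited = True
                break
        labels[(tf, target)] = "inherited" if inherited else "gained"
    return labels


# Toy network (made up): g1 and g2 are duplicates regulated by the same factor.
toy_edges = [("tf1", "g1"), ("tf1", "g2"), ("tf2", "g3")]
toy_families = {"tf1": "T1", "tf2": "T2", "g1": "F1", "g2": "F1", "g3": "F2"}
labels = classify_interactions(toy_edges, toy_families)
```

Counting the two labels over a whole network gives the kind of split between conserved and newly gained interactions summarised above.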
We conclude that a very small fraction of the interactions have evolved by recombination
events. Formation of motifs cannot be considered the by-product of duplication events because
there are no cases of entire motifs that evolved from ancestral ones. Therefore, we suggest that
the appearance of network motifs is a matter of convergent evolution, with motifs repetitively
and independently selected for during evolution, perhaps because of their functional
significance. The extent of target gene duplication for the highly interconnected transcription
factors clearly indicates that duplication events alone cannot account for the resulting scale-free
property of the network. This implies the presence of other evolutionary forces at work in
shaping the network topology. What exactly these forces are remains to be determined.
To investigate how evolution operates across genomes, we developed a procedure to predict
transcriptional interactions for 175 prokaryotic genomes, based on the experimentally
determined E. coli network. By comparing conservation of genes across these genomes we
found that transcription factors evolve more quickly than their target genes, suggesting that the
underlying regulatory network changes rapidly during evolution. We could identify conserved
transcriptional response pathways in these genomes. Despite the lack of conservation at the
motif level, all of the reconstructed networks exhibit a scale-free behaviour at the global level.
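The idea of predicting interactions in another genome from a reference network can be sketched as follows. The sketch assumes an ortholog map from reference genes to genes in the target genome and transfers an interaction only when both the transcription factor and its target gene have an ortholog. The gene names, the ortholog map and the function names are hypothetical, and how orthologs are detected is outside the scope of the example.

```python
def transfer_interactions(reference_edges, orthologs):
    """Map (tf, target) edges onto a target genome through an ortholog map.

    An edge is transferred only if both partners have an ortholog.
    """
    predicted = set()
    for tf, target in reference_edges:
        if tf in orthologs and target in orthologs:
            predicted.add((orthologs[tf], orthologs[target]))
    return predicted


def conserved_fraction(genes, orthologs):
    """Fraction of the given reference genes that have an ortholog."""
    genes = set(genes)
    if not genes:
        return 0.0
    return sum(1 for g in genes if g in orthologs) / len(genes)


# Toy reference network and ortholog map (made up for illustration).
reference_edges = {("tfA", "g1"), ("tfA", "g2"), ("tfB", "g3")}
orthologs = {"tfA": "x_tfA", "g1": "x_g1", "g3": "x_g3"}

predicted_edges = transfer_interactions(reference_edges, orthologs)
tf_conservation = conserved_fraction({tf for tf, _ in reference_edges}, orthologs)
target_conservation = conserved_fraction({tg for _, tg in reference_edges}, orthologs)
```

Comparing the two conservation fractions across many genomes is one way to ask whether transcription factors are retained less often than their target genes.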
It appears that by conserving or losing specific transcription factors, the same gene can be
regulated in completely different ways, dictated by the organism's specific lifestyle. Our results
have predictive value as we identified transcription factors and their response pathways in the
analysed genomes. Along with transcription factor predictions, we also provide insights into how
transcriptional programs in these organisms may work. The full potential of our method
will be realised as more interaction data become available from large-scale experiments
similar in spirit to current genomics projects, but whose specific aim would be
to identify the full set of transcriptional interactions in a given organism.
Cells are highly dynamic systems, in which not all genes are expressed at the same time. To
study the regulatory network dynamics, we integrated the static network for yeast with gene
expression data and identified condition-specific sub-networks. We then studied structural
properties of these sub-networks, and showed that they remain scale-free. We found that
different motifs are preferentially used depending on the requirements posed by the cellular
condition. Our analysis reveals that the structure of the active sub-network for any given
condition is re-organised in such a way that target genes are efficiently regulated. For example,
during stress, the organism needs to respond quickly to the signal and therefore there is
preferential usage of single input motifs, which require only one transcription factor to
activate many target genes simultaneously.
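A minimal sketch of deriving a condition-specific sub-network and finding single input motifs in it is given below. It assumes that an interaction is active in a condition when both the transcription factor and the target gene are in the set of genes expressed in that condition, and it treats a single input motif as one transcription factor that is the sole regulator of several targets. Both definitions, the function names and the toy data are simplifying assumptions made for illustration.

```python
from collections import defaultdict


def active_subnetwork(edges, expressed_genes):
    """Keep edges whose transcription factor and target are both expressed."""
    return {(tf, tg) for tf, tg in edges
            if tf in expressed_genes and tg in expressed_genes}


def single_input_motifs(edges, min_targets=2):
    """Return {tf: targets} where each target has that tf as its only regulator."""
    regulators = defaultdict(set)
    for tf, tg in edges:
        regulators[tg].add(tf)
    motifs = defaultdict(set)
    for tg, tfs in regulators.items():
        if len(tfs) == 1:
            (tf,) = tfs
            motifs[tf].add(tg)
    return {tf: tgs for tf, tgs in motifs.items() if len(tgs) >= min_targets}


# Toy static network and expression set (made up for illustration).
static_edges = {("tf1", "a"), ("tf1", "b"), ("tf1", "c"), ("tf2", "c"), ("tf3", "d")}
expressed = {"tf1", "tf3", "a", "b", "c", "d"}

active = active_subnetwork(static_edges, expressed)
motifs_static = single_input_motifs(static_edges)
motifs_active = single_input_motifs(active)
```

In the toy data, gene `c` has two regulators in the static network but only one in the active sub-network, so the single input motif around `tf1` is larger in the condition-specific view.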
Additionally, we found that the role of the regulatory hubs in the network also changes: although
there is a small number of transcription factors that act as hubs throughout all cellular
conditions, most of them behave like hubs in specific conditions only. The resulting picture is
that of a network that shifts its weight between different foci to coordinate distinct cellular
processes. Our approach can be used to derive condition-specific sub-networks for any given
organism, and the results from such studies could have implications for developing drugs that
affect the regulatory hubs in a condition-specific manner.
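The distinction between hubs that persist across conditions and hubs that appear only in specific conditions can be sketched with a simple out-degree cutoff. The cutoff `min_targets`, the condition names, the function names and the toy sub-networks are assumptions chosen for illustration, and the thesis may define hubs differently.

```python
from collections import Counter


def hubs(edges, min_targets):
    """Transcription factors regulating at least min_targets genes."""
    out_degree = Counter(tf for tf, _ in set(edges))
    return {tf for tf, n in out_degree.items() if n >= min_targets}


def permanent_and_transient_hubs(subnetworks, min_targets):
    """Split hubs into those present in every condition and the rest."""
    hub_sets = [hubs(edges, min_targets) for edges in subnetworks.values()]
    if not hub_sets:
        return set(), set()
    permanent = set.intersection(*hub_sets)
    transient = set.union(*hub_sets) - permanent
    return permanent, transient


# Toy condition-specific sub-networks (made up for illustration).
toy_subnetworks = {
    "stress": {("tf1", "a"), ("tf1", "b"), ("tf2", "c"), ("tf2", "d")},
    "cell_cycle": {("tf1", "a"), ("tf1", "c"), ("tf3", "b"), ("tf3", "d")},
}
permanent, transient = permanent_and_transient_hubs(toy_subnetworks, min_targets=2)
```

Here `tf1` is a hub in both toy conditions, while `tf2` and `tf3` are hubs in one condition each.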
The work presented in this thesis is a step towards understanding how transcriptional regulatory
systems evolve. The insights gained by studying simple and well-characterized organisms like
E. coli and yeast can be applied to understand transcriptional networks and other kinds of
biological networks in more complex organisms, some of which may not be experimentally
tractable, like most human pathogens.
This work was possible only due to the availability of new kinds of genomics experimental data,
like microarray and ChIP-chip data. The methods we developed and our predictions can be
readily applied to design novel experiments that will aid in understanding intricate details of how
transcriptional systems work in other organisms, including humans.