6. Conclusions

In this thesis we elucidated the principles underlying the evolution of transcription factors, the regulatory networks they form, how these networks evolve within and across organisms, and the dynamic usage of such networks in response to different cellular conditions. We also provided insights that could be useful in engineering regulatory networks in different organisms.

We developed a computational method to reliably predict DNA-binding transcription factors using domain assignments to protein sequences. This allowed us to identify and characterise 271 transcription factors in the E. coli genome, for which we determined domain architecture and evolutionary relationships. We showed that in ~40% of cases the DNA-binding domain combines with a partner domain whose role is to recognise a variety of small molecules, and that close to 75% of all transcription factors evolved by gene duplication. We also showed that activators, repressors and dual regulators in E. coli belong to the same families and share common domain architectures, suggesting that the regulatory role of a transcription factor is not determined by evolutionary relationships. To a large extent, it is the position of the transcription factor binding site that is indicative of its role: activators essentially bind upstream, while over two thirds of repressors have at least one downstream binding site. These findings can be used to improve current tools for predicting the position of DNA binding sites.

In general, the method we developed to study transcription factors in E. coli can be applied to any other genome. We used the same approach to identify transcription factors in both the human and mouse genomes. These predictions are currently being experimentally verified in the context of a functional genomics project at the RIKEN Genome Science Center in Japan.

Transcription factors have evolved by extensive duplication, and we were able to quantify it.
We then studied how gene duplication can explain network growth. We proposed a model for network evolution based on gene duplication and quantified the extent to which this model applies to experimental data. We found that close to 90% of the genes in the E. coli and yeast regulatory networks have evolved by gene duplication. Of these events, 45% can be explained by the conservation of both genes and their regulatory interactions, while in the remaining cases gene duplication was followed by the gain of new interactions during the divergence phase. We conclude that only a very small fraction of the interactions have evolved by recombination events.

The formation of motifs cannot be considered a by-product of duplication events, because there are no cases of entire motifs that evolved from ancestral ones. We therefore suggest that the appearance of network motifs is a matter of convergent evolution, with motifs repeatedly and independently selected for during evolution, perhaps because of their functional significance. The extent of target gene duplication for the highly interconnected transcription factors clearly indicates that duplication events alone cannot account for the scale-free property of the network. This implies that other evolutionary forces are at work in shaping the network topology. What exactly these forces are remains to be determined.

To investigate how evolution operates across genomes, we developed a procedure to predict transcriptional interactions for 175 prokaryotic genomes, based on the experimentally determined E. coli network. By comparing the conservation of genes across these genomes, we found that transcription factors evolve more quickly than their target genes, suggesting that the underlying regulatory network changes rapidly during evolution. We could identify conserved transcriptional response pathways in these genomes.
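
As an illustration of how interactions from a reference network can be projected onto another genome, the following minimal Python sketch assumes a simple transfer rule: an interaction is predicted in a target genome only if both the transcription factor and the target gene have an orthologue there. The rule, the function names `transfer_interactions` and `conservation`, the orthologue mapping and the placeholder gene identifiers are all assumptions made for this example; it is not a description of the exact procedure used in this thesis.

```python
from typing import Dict, Iterable, Set, Tuple

Interaction = Tuple[str, str]  # (transcription factor, target gene)


def transfer_interactions(
    reference_network: Iterable[Interaction],
    orthologues: Dict[str, str],
) -> Set[Interaction]:
    """Project reference interactions onto a target genome.

    `orthologues` maps a reference gene to its orthologue in the target
    genome; genes without an orthologue are absent from the mapping.
    Assumption: an interaction is transferred only if both partners
    have an orthologue.
    """
    predicted = set()
    for tf, target in reference_network:
        if tf in orthologues and target in orthologues:
            predicted.add((orthologues[tf], orthologues[target]))
    return predicted


def conservation(genes: Iterable[str], orthologues: Dict[str, str]) -> float:
    """Fraction of the given reference genes that have an orthologue."""
    gene_set = set(genes)
    if not gene_set:
        return 0.0
    return sum(g in orthologues for g in gene_set) / len(gene_set)


# Hypothetical toy data (placeholder identifiers, not real results).
reference = [("tfA", "g1"), ("tfA", "g2"), ("tfB", "g2"), ("tfB", "g3")]
orthologue_map = {"tfA": "x_tfA", "g1": "x_g1", "g2": "x_g2", "g3": "x_g3"}

predicted = transfer_interactions(reference, orthologue_map)
tf_cons = conservation({tf for tf, _ in reference}, orthologue_map)
tg_cons = conservation({tg for _, tg in reference}, orthologue_map)

print(sorted(predicted))
print(tf_cons, tg_cons)
```

In this toy case tfB has no orthologue, so neither of its interactions is transferred, and the transcription factors come out as less conserved than the target genes (0.5 against 1.0). Real data would of course require an orthologue detection step, which is taken as given here.
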
Despite the lack of conservation at the motif level, all of the reconstructed networks exhibit scale-free behaviour at the global level. It appears that by conserving or losing specific transcription factors, the same gene can be regulated in completely different ways, dictated by the organism's specific lifestyle. Our results have predictive value, as we identified transcription factors and their response pathways in the analysed genomes. Along with the transcription factor predictions, we also provide insights into how transcriptional programs in these organisms may work. The full potential of our method will be realised as more interaction data become available from large-scale experiments similar in spirit to current genomics projects, but whose specific aim is to identify the full set of transcriptional interactions in a given organism.

Cells are highly dynamic systems, in which not all genes are expressed at the same time. To study the dynamics of the regulatory network, we integrated the static network for yeast with gene expression data and identified condition-specific sub-networks. We then studied the structural properties of these sub-networks and showed that they remain scale-free. We found that different motifs are preferentially used depending on the requirements posed by the cellular condition. Our analysis reveals that the structure of the active sub-network for any given condition is re-organised in such a way that target genes are efficiently regulated. For example, during stress the organism needs to respond quickly to the signal, and therefore there is preferential usage of single input motifs, which require only one transcription factor to activate many target genes simultaneously.
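
The idea of deriving a condition-specific sub-network, and of asking which targets are controlled by a single transcription factor, can be sketched in Python as follows. The sketch assumes that an interaction is active in a condition when both the transcription factor and the target gene are flagged as expressed. This criterion, the function names and the toy identifiers are hypothetical and serve only as an illustration; they are not the method used in this thesis.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Interaction = Tuple[str, str]  # (transcription factor, target gene)


def active_subnetwork(
    network: Iterable[Interaction], expressed: Set[str]
) -> List[Interaction]:
    """Keep interactions whose regulator and target are both expressed.

    Assumption: expression of both partners is used as a simple proxy
    for an interaction being active in the condition.
    """
    return [(tf, tg) for tf, tg in network if tf in expressed and tg in expressed]


def single_input_targets(subnetwork: Iterable[Interaction]) -> Dict[str, Set[str]]:
    """Group targets that have exactly one active regulator by that regulator."""
    regulators: Dict[str, Set[str]] = defaultdict(set)
    for tf, tg in subnetwork:
        regulators[tg].add(tf)
    single: Dict[str, Set[str]] = defaultdict(set)
    for tg, tfs in regulators.items():
        if len(tfs) == 1:
            (tf,) = tfs  # unpack the only regulator
            single[tf].add(tg)
    return dict(single)


# Hypothetical static network and expression flags (placeholder identifiers).
static_network = [
    ("tfA", "g1"), ("tfA", "g2"), ("tfA", "g3"),
    ("tfB", "g3"), ("tfB", "g4"),
]
expressed_in_stress = {"tfA", "g1", "g2", "g3"}

stress_net = active_subnetwork(static_network, expressed_in_stress)
single = single_input_targets(stress_net)
# Number of active targets per transcription factor in this condition.
out_degree = dict(Counter(tf for tf, _ in stress_net))

print(stress_net)
print(single)
print(out_degree)
```

In the static toy network g3 has two regulators, but in the toy stress condition only tfA is expressed, so all three active targets are single-input targets of tfA. Repeating the same steps with a different set of expressed genes gives the sub-network for another condition.
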
Additionally, we found that the role of the regulatory hubs in the network also changes: although a small number of transcription factors act as hubs throughout all cellular conditions, most behave like hubs in specific conditions only. The resulting picture is that of a network that shifts its weight between different foci to coordinate distinct cellular processes. Our approach can be used to derive condition-specific sub-networks for any given organism, and the results from such studies could have implications for developing drugs that affect the regulatory hubs in a condition-specific manner.

The work presented in this thesis is a step towards understanding how transcriptional regulatory systems evolve. The insights gained by studying simple and well-characterised organisms like E. coli and yeast can be applied to understand transcriptional networks and other kinds of biological networks in more complex organisms, some of which may not be experimentally tractable, like most human pathogens. This work was possible only because of the availability of new kinds of experimental genomics data, such as microarray and ChIP-chip data. The methods we developed and our predictions can be readily applied to design novel experiments that will aid in understanding the intricate details of how transcriptional systems work in other organisms, including humans.