TABLE OF CONTENTS

DECLARATION .... I
ACKNOWLEDGEMENTS .... II
SUMMARY .... IV
ABBREVIATIONS .... V
TABLE OF CONTENTS .... VI
LIST OF FIGURES .... X
LIST OF TABLES .... XII

INTRODUCTION .... 0-1

CHAPTER 1: METHODS TO STUDY EVOLUTION OF TRANSCRIPTION FACTORS AND REGULATORY NETWORKS
  1.1 INTRODUCTION .... 1-1
  1.2 MOLECULAR MECHANISM FOR THE CONTROL OF GENE EXPRESSION .... 1-2
  1.3 APPROACHES TO STUDY EVOLUTION OF TRANSCRIPTION FACTORS .... 1-3
    1.3.1 COMPARISON OF PROTEIN SEQUENCES TO INFER HOMOLOGY .... 1-3
    1.3.2 DOMAINS OF A PROTEIN CAN BE USED TO INFER EVOLUTIONARY HISTORY .... 1-4
    1.3.3 PROCEDURES TO ASSIGN DOMAINS TO PROTEIN SEQUENCES .... 1-7
    1.3.4 PROTEIN EVOLUTION .... 1-8
  1.4 APPROACHES TO STUDY GENE EXPRESSION PROGRAMS .... 1-10
    1.4.1 METHODS TO STUDY PROTEIN-DNA INTERACTIONS .... 1-11
    1.4.2 GENE EXPRESSION ANALYSIS .... 1-12
  1.5 STRUCTURE OF TRANSCRIPTIONAL REGULATORY NETWORKS .... 1-17
    1.5.1 MOTIFS .... 1-18
    1.5.2 MODULES .... 1-19
    1.5.3 GLOBAL NETWORK ORGANISATION .... 1-20
  1.6 REFERENCES .... 1-21

CHAPTER 2: EVOLUTION OF TRANSCRIPTION FACTORS IN E. COLI
  2.1 INTRODUCTION .... 2-1
  2.2 METHODS .... 2-2
    2.2.1 IDENTIFICATION OF TRANSCRIPTION FACTORS .... 2-2
    2.2.2 ACTIVATORS, REPRESSORS AND DUAL REGULATORS .... 2-4
  2.3 RESULTS AND DISCUSSION .... 2-4
    2.3.1 ELEVEN DNA-BINDING DOMAIN FAMILIES .... 2-4
    2.3.2 TRANSCRIPTION FACTORS AND PARTNER DOMAINS .... 2-6
    2.3.3 TRANSCRIPTION FACTORS AND GENE DUPLICATION .... 2-8
    2.3.4 PROTEIN FAMILIES AND REGULATORY FUNCTION .... 2-9
    2.3.5 BINDING SITE POSITION AND REGULATORY FUNCTION .... 2-11
    2.3.6 REPRESSOR BINDING SITES .... 2-14
    2.3.7 GLOBAL REGULATORS .... 2-14
  2.4 CONCLUSIONS .... 2-21
  2.5 REFERENCES .... 2-22

CHAPTER 3: TRANSCRIPTIONAL REGULATORY NETWORK GROWTH BY GENE DUPLICATION
  3.1 INTRODUCTION .... 3-1
  3.2 MATERIALS AND METHODS .... 3-2
    3.2.1 GENE REGULATORY NETWORKS AND MOTIFS .... 3-2
    3.2.2 IDENTIFICATION OF DUPLICATED GENES .... 3-2
    3.2.3 IDENTIFICATION OF DUPLICATED EDGES AND SIMULATION PROCEDURE .... 3-3
    3.2.4 METHOD TO INTRODUCE ERRONEOUS INTERACTIONS INTO THE NETWORK .... 3-3
  3.3 RESULTS AND DISCUSSION .... 3-4
    3.3.1 CHARACTERISTICS OF THE REGULATORY NETWORKS .... 3-4
    3.3.2 DUPLICATION AND DOMAINS .... 3-7
    3.3.3 FORMATION OF THE REGULATORY NETWORK BY GENE DUPLICATIONS .... 3-9
    3.3.4 DUPLICATION AND NETWORK TOPOLOGY .... 3-17
    3.3.5 DUPLICATION AND NETWORK MOTIFS .... 3-19
  3.4 CONCLUSIONS .... 3-23
  3.5 REFERENCES .... 3-23

CHAPTER 4: EVOLUTIONARY CHANGES IN THE BLUEPRINT FOR TRANSCRIPTIONAL REGULATION IN PROKARYOTES
  4.1 INTRODUCTION .... 4-1
  4.2 MATERIALS AND METHODS .... 4-2
    4.2.1 ALGORITHM TO RECONSTRUCT TRANSCRIPTIONAL NETWORKS .... 4-2
    4.2.2 PROCEDURE TO IDENTIFY ORTHOLOGOUS PROTEINS .... 4-3
    4.2.3 PROCEDURE TO EVALUATE SIGNIFICANCE OF THE BIAS IN GENE CONSERVATION .... 4-4
    4.2.4 ALGORITHM TO RECONSTRUCT ANCESTRAL NETWORKS .... 4-5
    4.2.5 PROCEDURE TO GROUP GENOMES BY INTERACTIONS AND GENES CONSERVED .... 4-6
    4.2.6 PROCEDURE TO ANALYSE SCALE FREE BEHAVIOUR OF CONSERVED NETWORKS .... 4-7
    4.2.7 ALGORITHM TO ANALYSE CONSERVATION OF ‘NETWORK MOTIFS’ .... 4-8
    4.2.8 PROCEDURE TO EVALUATE SIGNIFICANCE OF MOTIF INTERACTION CONSERVATION .... 4-9
  4.3 RESULTS AND DISCUSSION .... 4-10
    4.3.1 RECONSTRUCTION OF TRANSCRIPTIONAL REGULATORY NETWORKS .... 4-10
    4.3.2 CONSERVATION OF TRANSCRIPTION FACTORS AND TARGET GENES .... 4-13
    4.3.3 CONSERVATION OF TRANSCRIPTIONAL REGULATORY NETWORKS .... 4-17
    4.3.4 EVOLUTION OF GLOBAL NETWORK STRUCTURE .... 4-20
    4.3.5 EVOLUTION OF LOCAL NETWORK STRUCTURE .... 4-23
  4.4 CONCLUSIONS .... 4-29
  4.5 REFERENCES .... 4-30

CHAPTER 5: GENOME SCALE ANALYSIS OF REGULATORY NETWORK DYNAMICS
  5.1 INTRODUCTION .... 5-1
  5.2 MATERIALS AND METHODS .... 5-2
    5.2.1 DATASETS .... 5-2
    5.2.2 BACK-TRACKING ALGORITHM .... 5-3
    5.2.3 INTERCHANGE INDEX .... 5-4
    5.2.4 TOPOLOGICAL MEASURES .... 5-4
    5.2.5 NORMALIZATION FOR REGULATORY HUBS .... 5-5
    5.2.6 REGULATORY MOTIFS .... 5-5
    5.2.7 RANDOM NETWORKS .... 5-6
    5.2.8 SENSITIVITY ANALYSIS .... 5-6
  5.3 RESULTS AND DISCUSSION .... 5-6
    5.3.1 REGULATORY NETWORK IN YEAST .... 5-6
    5.3.2 DIFFERENTIAL USE OF THE REGULATORY NETWORK .... 5-9
    5.3.3 DYNAMICS OF REGULATORY INTERACTIONS .... 5-9
    5.3.4 REGULATORY SPECIFICITY THROUGH TF COMBINATIONS .... 5-14
    5.3.5 LARGE-SCALE TOPOLOGICAL CHANGES .... 5-15
    5.3.6 TF HUBS IN THE REGULATORY NETWORK .... 5-18
    5.3.7 PREFERENTIAL USE OF NETWORK MOTIFS .... 5-21
    5.3.8 INTER-REGULATION OF TFS IN THE CELL CYCLE AND SPORULATION .... 5-23
  5.4 CONCLUSIONS .... 5-28
  5.5 REFERENCES .... 5-28

CHAPTER 6: DISCUSSION AND CONCLUSIONS .... 6-1

APPENDIX A: SUPPLEMENTARY MATERIAL

CHAPTER 2: EVOLUTION OF TRANSCRIPTION FACTORS IN E. COLI .... A-1

CHAPTER 3: TRANSCRIPTIONAL REGULATORY NETWORK GROWTH BY GENE DUPLICATION .... A-1
  3.1 PSEUDOCODES .... A-1
  3.2 REPRESENTATION OF THE TRANSCRIPTIONAL REGULATORY NETWORK .... A-1
  3.3 CHARACTERIZATION OF THE VERTICES IN THE NETWORK .... A-2
  3.4 CHARACTERIZATION OF THE EDGES IN THE NETWORK .... A-2
  3.5 IDENTIFICATION OF NETWORK MOTIFS .... A-4
  3.6 IDENTIFICATION OF DIRECTLY AND INDIRECTLY REGULATED GENES .... A-5
  3.7 SIMULATION PROCEDURE .... A-5
  3.8 INTERNAL DUPLICATION IN THE YEAST SIMS .... A-6
  3.9 INTERNAL DUPLICATIONS IN THE E. COLI SIMS .... A-7
  3.10 INTERNAL DUPLICATIONS IN THE YEAST FFMS .... A-7

CHAPTER 4: EVOLUTIONARY CHANGES IN THE BLUEPRINT FOR TRANSCRIPTIONAL REGULATION IN PROKARYOTES .... A-8
  4.1 ALGORITHM TO SIMULATE NETWORK EVOLUTION .... A-8
  4.2 ALGORITHM TO EVALUATE OBSERVED NUMBER OF WHOLE MOTIFS IN GENOMES BY CREATING RANDOM NETWORKS .... A-9
  4.3 ALGORITHM TO EVALUATE OBSERVED CONSERVATION OF INTERACTIONS IN MOTIFS IN GENOMES .... A-9
  4.4 CONSERVATION OF GENES, INTERACTIONS, GENOME SIZE AND NUMBER OF PREDICTED TRANSCRIPTION FACTORS FOR EACH OF THE 176 GENOMES .... A-11

CHAPTER 5: GENOME SCALE ANALYSIS OF REGULATORY NETWORK DYNAMICS .... A-15

APPENDIX B: OTHER PUBLISHED WORKS

1. NON-CANONICAL INTERACTIONS IN PROTEIN STRUCTURES
  1.1 NCI: A SERVER TO IDENTIFY NON-CANONICAL INTERACTIONS IN PROTEINS .... B-1
  1.2 A C-H..O HYDROGEN BOND STABILISED POLYPEPTIDE CHAIN REVERSAL MOTIF .... B-2
  1.3 REGISTERING HELICES AND STRANDS USING C-H..O HYDROGEN BONDS .... B-3

2. PROTEIN INTERACTION AND GENE EXPRESSION
  2.1 CONSERVATION OF GENE COREGULATION .... B-4
  2.2 INTRODUCTION TO MICROARRAY DATA ANALYSIS .... B-5
  2.3 UNIVERSAL MICROARRAYS .... B-6
  2.4 STATISTICAL ANALYSIS OF DOMAINS IN INTERACTING PROTEIN PAIRS .... B-7

3. MICROBIAL GENOME EVOLUTION
  3.1 DOLOP – DATABASE OF BACTERIAL LIPOPROTEINS .... B-8
  3.2 LOSS OF SIGMA FACTORS AND PSEUDOGENE ACCUMULATION IN M. LEPRAE .... B-9

LIST OF PUBLICATIONS .... C-1