Appendix x SUPPLEMENTARY MATERIAL CHAPTER 2: EVOLUTION OF TRANSCRIPTION FACTORS IN E. COLI .............A-1 CHAPTER 3: TRANSCRIPTIONAL REGULATORY NETWORK GROWTH BY GENE DUPLICATION...............................................................................................A-1 3.1 PSEUDOCODES .................................................................................................. A-1 3.2 REPRESENTATION OF THE TRANSCRIPTIONAL REGULATORY NETWORK .................. A-1 3.3 CHARACTERIZATION OF THE VERTICES IN THE NETWORK ...................................... A-2 3.4 CHARACTERIZATION OF THE EDGES IN THE NETWORK ........................................... A-2 3.5 IDENTIFICATION OF NETWORK MOTIFS.................................................................. A-4 3.6 IDENTIFICATION OF DIRECTLY AND INDIRECTLY REGULATED GENES ....................... A-5 3.7 SIMULATION PROCEDURE.................................................................................... A-5 3.8 INTERNAL DUPLICATION IN THE YEAST SIMS ........................................................ A-6 3.9 INTERNAL DUPLICATIONS IN THE E. COLI SIMS ..................................................... A-7 3.10 INTERNAL DUPLICATIONS IN THE YEAST FFMS.................................................... A-7 CHAPTER 4: EVOLUTIONARY CHANGES IN THE BLUEPRINT FOR TRANSCRIPTIONAL REGULATION IN PROKARYOTES .....................................A-8 4.1 ALGORITHM TO SIMULATE NETWORK EVOLUTION. ................................................. A-8 4.2 ALGORITHM TO EVALUATE OBSERVED NUMBER OF WHOLE MOTIFS IN GENOMES BY CREATING RANDOM NETWORKS. ................................................................................ A-9 4.3 ALGORITHM TO EVALUATE OBSERVED CONSERVATION OF INTERACTIONS IN MOTIFS IN GENOMES ................................................................................................................ A-9 4.4 CONSERVATION OF GENES, INTERACTIONS, GENOME SIZE AND NUMBER OF PREDICTED TRANSCRIPTION FACTORS FOR EACH OF THE 176 GENOMES. .................. A-11 CHAPTER 5: GENOME SCALE ANALYSIS OF REGULATORY NETWORK DYNAMICS .............................................................................................................A-15 A Supplementary Material APPENDIX A SUPPLEMENTARY MATERIAL Chapter 2: Evolution of Transcription Factors In E. coli Additional supplementary material is available at: http://www.mrc-lmb.cam.ac.uk/genomes/madanm/ec_tf http://www.mrc-lmb.cam.ac.uk/genomes/madanm/ec_tf_bs Chapter 3: Transcriptional Regulatory Network Growth By Gene Duplication Additional supplementary material is available at: http://www.mrc-lmb.cam.ac.uk/genomes/madanm/net_evol 3.1 Pseudocodes Pseudocodes to calculate various properties discussed in this chapter are given below. 3.2 Representation of the transcriptional regulatory network The gene regulatory network is represented as a directed graph G=(V, E) where each protein in the network is a vertex (v V). The protein can be a transcription factor (tf) or a regulated gene (rg) where tf, rg V. The transcriptional interaction (from a transcription factor to the regulated gene) is a directed edge (e E). The network is implemented using two hash Tables PR (Parent of a Regulated gene) and CT (Child of a Transcription factor). PR consists of |V| keys, one for each vertex in V. The key is a regulated gene and the values correspond to all the transcription factors regulating it (i.e. for each key rg, the value corresponds to all vertices tf such that there is an edge (tf → rg) E). CT consists of |V| keys, one for each vertex in V. The key is a transcription factor and the values correspond to all genes that it regulates (i.e. for each key tf, the value corresponds to all vertices rg such that there is an edge (tf → rg) E). A-1 Supplementary Material Each protein (v V) is characterised by a domain architecture, i.e. the ordered set of domains from the N-terminus to the C-terminus. Proteins with the same domain architecture, ignoring gaps and internal duplications, are considered homologues in this analysis. This information is stored in a hash Table DA of |D| elements, one for each domain architecture d. A key corresponds to a domain architecture d and values correspond to all vertices h (homologous proteins) in the graph that has the domain architecture d. In other words, this hash Table represents genes (h) that have arisen by duplication from a common ancestor. Another hash Table DAG consists of |V| keys, one for each vertex in V such that the key is a gene and the value is its domain architecture, d. An explanation for each line in the algorithm for the first section below is given as a guide. 3.3 Characterization of the vertices in the network A. Algorithm to identify duplicated genes (regulated genes) that are controlled by the same transcription factors 01: for each d DA 02: for each h DA [d] 03: for each tf PR [h] 04: X [tf] ← 0 05: for each h DA [d] 06: for each tf PR [h] 07: X [tf] ← X [tf] + 1 08: for each h DA [d] 09: for each tf PR [h] 10: if X [tf] > 1 11: print (h) # for each domain architecture (d) in DA # for each homologous gene (h) with domain architecture (d) # for each tf that regulates h # set the count X [tf] to 0 # for each homologous gene (h) with domain architecture (d) # for each tf that regulates h # increment the count X [tf] # for each homologous gene (h) with domain architecture (d) # for each tf that regulates h # if the tf regulates more than one homologous gene # print the rg B. Algorithm to identify duplicated genes (transcription factors) that control the same target genes 01: for each d DA 02: for each h DA [d] 03: for each rg CT [h] 04: X [rg] ← 0 05: for each h DA [d] 06: for each rg CT [h] 07: X [rg] ← X [rg] + 1 08: for each h DA [d] 09: for each rg CT [h] 10: if X [rg] > 1 11: print (h) # for each domain architecture (d) in DA # for each homologous gene (h) with domain architecture (d) # for each rg that is regulated by h # set the count X [rg] to 0 # for each homologous gene (h) with domain architecture (d) # for each rg that is regulated by h # increment the count X [rg] # for each homologous gene (h) with domain architecture (d) # for each rg that is regulated by h # if the rg is regulated by more than one homologous tf # print the tf 3.4 Characterization of the edges in the network A. Algorithm to identify edges that have evolved by the duplication of RGs: 01: for each d DA 02: for each h DA [d] 03: for each tf PR [h] 04: X [tf] ← 0 05: for each h DA [d] 06: for each tf PR [h] 07: X [tf] ← X [tf] + 1 A-2 Supplementary Material 08: 09: 10: 11: for each h DA [d] for each tf PR [h] if X [tf] > 1 print E (tf → h) B. Algorithm to identify edges that have evolved by the duplication of TFs: 01: for each d DA 02: for each h DA [d] 03: for each rg CT [h] 04: X [rg] ← 0 05: for each h DA [d] 06: for each rg CT [h] 07: X [rg] ← X [rg] + 1 08: for each h DA [d] 09: for each rg CT [h] 10: if X [rg] > 1 11: print E (h → rg) As mentioned, Algorithms A and B have precedence over C, meaning cases picked up by A and B are excluded from consideration by C. C. Algorithm to identify edges that have evolved by duplication of TFs and RGs: 01: for each d DA 02: for each h DA [d] 03: X [d] ← X [d] + 1 04: for each rg CT [h] 05: Y [ DAG [rg] ] ← Y [ DAG[rg] ] + 1 06: Z [rg] ← Z [rg] + 1 07: if X [d] > 1 08: for each rg CT [h] 09: if Z [rg] = 1 and Y [ DAG[rg] ] > 1 10: print E (h → rg) As mentioned, Algorithms A, B and C have precedence over D, E meaning cases picked up by A, B and C are excluded from consideration by D and E. D. Algorithm to identify edges that have evolved by duplication of TFs/RGs but lost their old interactions and gained new interactions: D.1 Edges where the RG duplicates and gets regulated by a new TF and loses old interactions: 01: for each d DA 02: for each h DA [d] 03: X [d] ← X [d] + 1 04: for each tf PR [h] 05: Y [tf] ← 0 06: for each h DA [d] 07: for each tf PR [h] 08: Y [tf] ← Y [tf] + 1 09: for each h DA [d] 10: for each tf PR [h] 11: if Y [tf] = 1 and X [ DAG[h] ] > 1 A-3 Supplementary Material 12: print E (tf → h) D.2 Edges where the TF duplicates and regulates new RGs and loses old ones: 01: for each d DA 02: for each h DA [d] 03 X [d] ← X [d] + 1 04: for each rg CT [h] 05: Y [rg] ← 0 06: for each h DA [d] 07: for each rg CT [h] 08: Y [rg] ← Y [rg] + 1 09: for each h DA [d] 10: for each rg CT [h] 11: if Y [rg] = 1 and X [ DAG[h] ] > 1 12: print E (h → rg) E. Algorithm to identify edges that have evolved by innovation of new regulatory interactions: 01: for each d DA 02: for each h DA [d] 03: X [d] ← X [d] + 1 04: for each d DA 05: for each h DA [d] 06: if X [d] = 1 07: for each rg CT [h] 08: if X [ DAG [rg] ] = 1 09: print E (h → rg) 3.5 Identification of network motifs The algorithms to identify Feed Forward Motifs and Single Input Motifs are described below. In these algorithms, P is an array which contains a list of transcription factors that regulate genes that are regulated by only one transcription factor and u, v, w V A. Feed Forward Motif 01: for each u CT 02: X [u] ← 0 03: for each v CT [u] 04: X [v] ← 1 05: for each w CT [v] 06: X [w] ← X [w] + 1 07: if X [w] = 2 & (u ≠ v, u ≠ w, v ≠ w) 08: print E (u → v → w) 09: X [w] ← 1 B. Single Input Motif 01: for each rg PR 02: for each tf PR [rg] 03: X [rg] ← X [rg] + 1 04: for each rg PR A-4 Supplementary Material 05: if X [rg] = 1 06: Push (P, PR [rg]) 07: for each tf P 08: for each rg CT [tf] 09: if X [rg] = 1 10: print E (tf → rg) 3.6 Identification of directly and indirectly regulated genes The breadth-first search method, which uses a recursive algorithm, was implemented to identify all nodes that can be reached from a given node. This was used to identify all other genes that can be affected by a transcription factor. The hash Table CT provides a list of all directly regulated genes. The difference between the two sets gives the indirectly regulated genes by a transcription factor. 3.7 Simulation procedure The domain architecture for each transcription factor is stored in a hash Table DATF of |D| elements, one for each domain architecture d. A key corresponds to a domain architecture d and values correspond to all vertices w that are transcription factors and have the domain architecture d. In other words, this hash Table represents groups of homologous transcription factors. Similarly a hash Table DARG is created where a key corresponds to a domain architecture and the values correspond to all vertices w that are target genes and have the domain architecture d. Randomise is a function that assigns a random domain architecture to a gene. However it is ensured that the number of genes with each domain architecture is the same as seen in the real network (as implemented in the hash Tables). A. Calculation of P-value for the duplication of target genes (RGs) that are controlled by the same transcription factor (TF) 01: for i = 1 to 10,000 02: Randomise (DARG) 03: Perform S.1.1.A and count the number of RGs that satisfy the criteria 04: if the number of RGs in simulation ≥ number seen in the real network 05: n ← n+1 06: P-value = n/10,000 B. Calculation of P-value for the duplication of transcription factors that control the same target genes 01: for i = 1 to 10,000 02: Randomise (DATF) 03: Perform S.1.1.B and count the number of TFs that satisfy the criteria 04: if the number of TFs in simulation ≥ number seen in the real network 05: n ← n+1 06: P-value = n/10,000 C. Simulation procedure to estimate the significance of duplication in determining network topology A-5 Supplementary Material 01: for i = 1 to 10,000 02: Randomise (DARG) 03: for each TF in CT 04: if number of distinct domain architectures of it’s RG ≤ number seen in the real network 05: n ← n+1 06: P-value = n/10,000 3.8 Internal duplication in the yeast SIMs YHR119W YDR207C YPL075W YDR423C YML100W YMR261C YDR123C YDR285W YER044CA YER179W YGL073W YGL213C YGR258C YHR152W YJR052W YLR263W YLR329W YOR351C YPL153C YBR118W YCL050C YCR031C YKL152C YPR080W YAL020C YAL021C YAL029C YAL030W YBR061C YDR463W YGR094W YHR163W YHR164C YJL071W YJL127C YJL164C YJL165C YJL221C YKL019W YMR290C YNL290W YOR363C YFL031W YCL043C YDR518W YDR519W YMR043W YBR160W YBR202W YGR108W YJL159W YLR131C YLR274W YPR113W YPR119W YPR200C YBR159W YDL078C YDR244W YER015W YGL205W YJL218W YJR019C YKL188C YLR284C YML042W YNL009W YNL202W YNL329C YOL002C YOL147C YOR100C YOR180C YPL095C Blue circles represent transcription factors and yellow circles represent duplicated genes within each SIM. A-6 Supplementary Material 3.9 Internal duplications in the E. coli SIMs FliA LexA Lrp PolB, RecA, RecN RpsU_DnaG_RpoD, Ssb, SulA UnuDC, UvrA, UvrB, UvrC, UvrD GltBDF, IlvIH, Kbl_Tdh, LivJ LivKHMGF, LysU, OppABCDF, SdaA, SerA ArgR Fur FlgKL, FliC, FlgMN, FliDST tarTapcheRBYZ EntCEBA, FepA_EntD, FepB FepDGC, FhuACDB, TonB ArgCBH, ArgD, ArgE ArgF, ArgI RpoE PurR RpoH PhoR 17 PhnCDE_f73_ PhnFGHIJKLMNOP, PhoA, PhoE, PstSCAB_PhoU GrpE, HflB, HtpG, HtpY, IbpAB Lon, MopA, MopB, ClpB, DnaKJ GlnB, GuaBA, PrsA, PurC PurEK, PurHD, PurL, PurMN PyrC, PyrD, SpeA, YcfC_PurB CodBA, CvpA_PurF_UbiX CysB BirA TyrR CysDNC, CysJIH CysK, CysPUWAM BioA, BioBFCD AroF_TyrA, AroG, AroP TyrB, TyrP Fis EcfF, EcfG, EcfH, EcfK, EcfLM FkpA, KsgA_EpaG_EpaH, LpxDA_FabZ MdoGH, NlpB_PurA, OstA_SurA_PdxA RfaDFCL, RpoD, Upps_CdsA_EcfE, CutC DapA_NlpB_PurA, EcfABC, EcfD AlaWX, LeuQPV, LeuX, LysT_ValT_LysW, MetT_LeuW_GlnUW_MetU_GlnVX MetY_YhbC_NusA_InfB, PheU, PheV, ArgU ProK, ProL, ArgW, ArgX_HisR_LeuT_ProM SerT, SerX, ThrU_TyrU_GlyT_ThrT, ThrW, TyrTV, ValUXY_AspV 3.10 Internal duplications in the yeast FFMs YER040W YJL110C YER040W YKR034W YER040W YKR034W YBR208C YER040W YKR034W YIR028W YER040W YKR034W YKR039W YJL110C YJL110C YKR039W YER040W YKR034W YEL063C YER040W YKR034W YIR029W YER040W YKR034W YLR142W YJL110C YKR034W YDL210W YER040W YKR034W YKR039W YER040W YKR034W YGR019W YER040W YKR034W YIR031C YGL013C YBL005W YKR034W YDL210W YER040W YKR034W YHR037W YER040W YER040W YKR034W YIR032C YBL005W YKR034W YOR375C YBL005W YKR039W YKR034W YJL110C YER040W YKR034W YOR348C YGL013C YGL013C YDR011W YJL110C YER040W YKR034W YHL016C YGL013C YBR008C YJL110C YKR034W YDR406W YBL005W YGR281W A-7 Supplementary Material YGL013C YGL013C YBL005W YJL219W YBR019C YDR040C YHR043C YGL035C YGL035C YLL043W YGL035C YOR153W YGL035C YDR146C YBR050C YGL035C YDR345C YGL035C YGL035C YGL035C YHR094C YBR101C YPL248C YGL035C YGL035C YGL035C YGL035C YDR516C YGL035C YBR126C YGL035C YMR280C YGL035C YDR009W YGL209W YEL070W YGL035C YFL054C YGL209W YGL209W YIL162W YGL035C YGL209W YGL209W YKL109W YGL035C YKR075C YGL209W YGL209W YLR044C YGL035C YGL209W YGL209W YGL209W YLR042C YOR328W YGL209W YGL209W YHR092C YBL005W YGL209W YGL209W YGL209W YGL209W YGL209W YGL209W YGL035C YBR020W YGL209W YGL209W YGL035C YGL035C YBL005W YGL209W YGL209W YGL209W YGL035C YOL156W YGL209W YGL209W YGL035C YBL005W YGL013C YGL013C YLR377C YGL035C YMR011W Colored circles within each FF represent transcription factors that are duplicates. Chapter 4: Evolutionary Changes In The Blueprint For Transcriptional Regulation In Prokaryotes Additional supplementary material is available at: http://www.mrc-lmb.cam.ac.uk/genomes/madanm/reconstruct_net/ 4.1 Algorithm to simulate network evolution. Since there was a bias in the conservation of target genes and transcription factors, we were interested in asking if there was a selection for regulatory hubs to be conserved or not. The table in the main text shows that there is no trend for regulatory hubs to be conserved. To see how it affects the network structure in terms of the interactions being conserved, the following simulation procedure was carried out. (a) Selection for hubs: 01: order transcription factors (ascending) according to the # of target genes they regulate 02: for each of the 175 genomes 03: identify # of TF and # of TG 04: create network with same # of TF and # of TG for genome, but choosing the TF with highest connectivity first 05: reconstruct transcriptional interactions (steps 03 – 08 from M1) 06: plot % interactions lost v/s % genes conserved (b) Neutral conservation of genes: 01: for each of the 175 genomes 02: identify # of TF and # of TG A-8 Supplementary Material 03: create network with same # of TF and # of TG for genome, choosing TF & TG without bias 04: reconstruct transcriptional interactions (steps 03 – 08 from M1) 05: plot % interactions lost v/s % genes conserved (c) Selection against hubs: 01: order transcription factors (descending) according to the # of target genes they regulate 02: for each of the 175 genomes 03: identify # of TF and # of TG 04: create network with same # of TF and # of TG for genome, but choosing the TF with the lowest connectivity first 05: reconstruct transcriptional interactions (steps 03 – 08 from M1) 06: plot % interactions lost v/s % genes conserved (d) Random conservation of non-interacting pairs of genes: 01: for each of the 175 genomes 02: identify # of interactions in reconstructed network 03: create network with same number of interactions but with pairs that are known not to interact – it can be TG-TG pair or TF-TG, TF-TG non-interacting pair 04: plot % interactions lost v/s % genes conserved 4.2 Algorithm to evaluate observed number of whole motifs in genomes by creating random networks. To evaluate if the number of motifs that were seen in the conserved networks were significant compared to random networks, the following procedure was carried out: 01: for each of 10,000 times 02: create 175 random networks of similar sizes (one for each of the 175 genomes) 03: for each of the 175 random networks identify number of feed-forward motifs single input modules and multi input motifs and store the values 04: for each of the 175 genomes and for each of the three types of motifs 05: calculate mean and standard deviation for each genome over the 10,000 runs 06: calculate P-value as the fraction of the runs where value was >= observed 07: calculate Z-score as Z = (observed – mean)/ 08: plot % genes conserved v/s number of motifs 4.3 Algorithm to evaluate observed conservation of interactions in motifs in genomes To evaluate whether interactions in a motif are more conserved than any interaction in the network, we introduce a term called conservation index (C.I.), which is defined as follows: R C.I . genomeX log 2 motif R all R motif I motif genome X I motif E. coli and R all I all genome X I all E. coli A-9 Supplementary Material In this definition, I motif genome X is the number of interactions that forms a motif in E. coli, which has motif been conserved in genome X. I E. coli is the number of interactions in a motif in E. coli. I all genome X all is the total number of interactions that have been conserved in genome X and I E. coli is the total number of interactions in E. coli. (a) Interactions in motifs are selected for 01: note the number of interactions in motifs in E. coli Iem and all interactions in E. coli Iea 02: for each genome x of the 175 genomes, note the number of interactions conserved, I xa & interactions conserved that forms a motif in E. coli, Ixm. 03: if (Ixa < Iem) Rmotif = Ixa/Iem 04: else Rmotif = 1 05: Rall = Ixa/Iea 06: C.I. = log2 (Rmotif/Rall) 07: plot % genes conserved v/s C.I for every genome (b) Interactions in motifs are neutrally removed 01: note the number of interactions in motifs in E. coli Iem and all interactions in E. coli Iea 02: for each of 10,000 times 03: create 175 random networks of similar sizes (one for each of the 175 genomes) 04: for each of the 175 networks, note the number of interactions conserved, I xa & interactions conserved that forms a motif in E. coli, Ixm. 05: Rmotif = Ixm/Iem 06: Rall = Ixa/Iea 07: C.I. = log2 (Rmotif/Rall) 08: calculate mean and of C.I. for each of the 175 networks over the 10,000 runs 08: calculate P-value as the fraction of the runs where C.I. was >= observed 10: calculate Z-score as Z = (observed – mean)/ 11: plot % genes conserved and the mean C.I. value (c) Interactions in motifs are selected against 01: note the number of interactions in motifs in E. coli Iem and all interactions in E. coli Iea 02: d = Iea – Iem 03: for each of the 175 genomes, x, note the number of interactions conserved, I xa & interactions conserved that forms a motif in E. coli, Ixm. 04: if (Ixm > d) Rmotif = (Ixm – d)/Iem 05: else Rmotif = 0 06: Rall = Ixa/Iea 07: C.I. = log2 (Rmotif/Rall) 08: plot % genes conserved v/s C.I for every genome A-10 Supplementary Material 4.4 Conservation of genes, interactions, genome size and number of predicted transcription factors for each of the 176 genomes. Genome Bifidobacterium_longum Corynebacterium_diphtheriae Corynebacterium_efficiens_YS-314 Corynebacterium_glutamicum Mycobacterium_bovis Mycobacterium_leprae Mycobacterium_tuberculosis_CDC1551 Mycobacterium_tuberculosis_H37Rv Streptomyces_avermitilis Streptomyces_coelicolor Thermobifida_fusca Tropheryma_whipplei_Twist Tropheryma_whipplei_TW08_27 Aquifex_aeolicus Bacteroides_thetaiotaomicron_VPI5482 Cytophaga_hutchinsonii Porphyromonas_gingivalis_W83 Chlamydophila_caviae Chlamydia_muridarum Chlamydophila_pneumoniae_AR39 Chlamydophila_pneumoniae_CWL029 Chlamydophila_pneumoniae_J138 Chlamydophila_pneumoniae_TW_183 Chlamydia_trachomatis Chlorobium_tepidum_TLS Chloroflexus_aurantiacus Aeropyrum_pernix Pyrobaculum_aerophilum Sulfolobus_solfataricus Sulfolobus_tokodaii Nostoc_sp Gloeobacter_violaceus Nostoc_punctiforme Prochlorococcus_marinus_CCMP1375 Prochlorococcus_marinus_MIT9313 Prochlorococcus_marinus_MED4 Synechococcus_sp_WH8102 %Genes %Interacti Predi Genom Abbreviation conserv ons cted e size ed conserved Tfs Phyla Blo Cdi Cef Cgl Mbo Mle Mtu_C Mtu_H Sav Scoe Tfus Twh_T Twh_TW Aae 31.61 38.25 41.57 42.37 42.77 30.02 42.77 43.3 52.2 52.59 39.71 17.93 18.2 33.21 13.59 15.67 22.77 23.01 17.14 12.04 17.14 17.37 27.79 25.32 11.19 1.31 1.31 7.18 1727 2272 2950 2993 3920 1605 4187 3927 7575 7769 2941 808 783 1529 87 88 131 142 177 39 172 173 623 723 164 8 7 33 actinobacteria actinobacteria actinobacteria actinobacteria actinobacteria actinobacteria actinobacteria actinobacteria actinobacteria actinobacteria actinobacteria actinobacteria actinobacteria aquificae Bth_V 42.77 13.82 4778 235 bacteroidetes Chu Pgi Cca Cmu Cpn_A Cpn_C Cpn_J Cpn_T Ct Ctep Cau Ap Pyae Sso Sst Ana Gvi Npu Pmar_C Pmar_M Pmar_ME Scsp 38.78 27.36 18.6 17.27 18.46 18.73 19 18.86 17.27 34.27 42.9 25.5 28.82 33.21 30.02 42.63 42.63 42.9 30.55 34.67 31.08 36.13 11.42 10.27 2.77 1.46 2.31 2.31 2.39 2.39 1.23 10.96 15.59 6.02 4.4 4.86 7.49 13.35 16.21 16.52 8.72 13.51 8.18 8.72 3499 1909 998 904 1112 1054 1069 1113 895 2252 3995 1841 2605 2977 2826 5366 4430 6037 1882 2265 1712 2517 96 64 7 4 7 7 7 7 6 33 114 30 57 115 102 143 149 148 20 31 22 34 bacteroidetes bacteroidetes chlamydiae chlamydiae chlamydiae chlamydiae chlamydiae chlamydiae chlamydiae chlorobi chloroflexi crenarchaeota crenarchaeota crenarchaeota crenarchaeota cyanobacteria cyanobacteria cyanobacteria cyanobacteria cyanobacteria cyanobacteria cyanobacteria A-11 Supplementary Material Genome %Genes Abbreviation conserv ed Ssp 38.12 Ter 38.25 Thel 33.6 Dr 39.45 Af 30.02 Fac 26.3 H_sp 29.22 Mac 34.53 Mba 32.94 Mj 24.97 Mka 23.51 Mma 34 Synechocystis_PCC6803 Trichodesmium_erythraeum Thermosynechococcus_elongatus Deinococcus_radiodurans Archaeoglobus_fulgidus Ferroplasma_acidarmanus Halobacterium_sp Methanosarcina_acetivorans Methanosarcina_barkeri Methanococcus_jannaschii Methanopyrus_kandleri Methanosarcina_mazei Methanothermobacter_thermautotrophic Mta us Methanobacterium_thermoautotrophicu Mth m Pyrococcus_abyssi Pa Pyrococcus_furiosus Pfu Pyrococcus_horikoshii Ph Thermoplasma_acidophilum Tac Thermoplasma_volcanium Tvo Bacillus_anthracis_A2012 Ban_A2 Bacillus_anthracis_Ames Ban_Am Bacillus_cereus_ATCC14579 Bc_A Bacillus_halodurans Bha Bacillus_subtilis Bs Clostridium_acetobutylicum Cac Clostridium_perfringens Cpe Clostridium_tetani_E88 Cte_E Clostridium_thermocellum Cth Desulfitobacterium_hafniense Dha Enterococcus_faecalis_V583 Efa_V Lactobacillus_gasseri Lga Listeria_innocua Lin Lactococcus_lactis Lla Leuconostoc_mesenteroides Lme Listeria_monocytogenes Lmo Lactobacillus_plantarum Lpl Mycoplasma_gallisepticum Mga Mycoplasma_genitalium Mge Mycoplasma_penetrans Mpe Mycoplasma_pneumoniae Mpn %Interacti Predi Genom ons cted Phyla e size conserved Tfs 10.65 3167 91 cyanobacteria 10.96 3914 68 cyanobacteria 10.5 2475 41 cyanobacteria 11.66 2629 78 deinococcus 7.87 2420 95 euryarchaeota 5.4 1745 46 euryarchaeota 1.54 2075 63 euryarchaeota 4.24 4540 118 euryarchaeota 8.95 3420 76 euryarchaeota 4.4 1729 50 euryarchaeota 2.85 1687 28 euryarchaeota 10.19 3371 95 euryarchaeota 25.5 1.62 1866 46 euryarchaeota 25.24 1.85 1873 47 euryarchaeota 28.42 28.16 27.23 23.51 25.24 51.27 50.47 50.6 50.87 50.07 40.51 35.6 32.28 32.28 54.59 36.39 23.78 38.38 34.27 29.89 39.85 38.38 13.15 10.5 15.94 11.56 3.32 1.38 5.55 4.47 8.03 24.4 20.38 19.22 24.4 26.56 13.43 18.14 13.51 8.8 33.51 11.04 8.18 13.2 11.04 8.57 16.13 7.87 1.31 0.23 2.93 0.15 1896 2125 1956 1482 1499 5544 5311 5234 4066 4112 3672 2660 2373 2532 6810 3113 1641 2968 2321 1846 2846 3009 726 484 1037 689 73 82 68 54 50 296 305 293 247 249 211 118 118 73 335 172 85 184 121 89 188 215 5 4 9 4 euryarchaeota euryarchaeota euryarchaeota euryarchaeota euryarchaeota firmicutes firmicutes firmicutes firmicutes firmicutes firmicutes firmicutes firmicutes firmicutes firmicutes firmicutes firmicutes firmicutes firmicutes firmicutes firmicutes firmicutes firmicutes firmicutes firmicutes firmicutes A-12 Supplementary Material Genome Mycoplasma_pulmonis Oceanobacillus_iheyensis Oenococcus_oeni Staphylococcus_aureus_Mu50 Staphylococcus_aureus_MW2 Staphylococcus_aureus_N315 Streptococcus_agalactiae_2603 Streptococcus_agalactiae_NEM316 Staphylococcus_epidermidis_ATCC_12 228 Streptococcus_mutans Streptococcus_pneumoniae_R6 Streptococcus_pneumoniae_TIGR4 Streptococcus_pyogenes Streptococcus_pyogenes_MGAS315 Streptococcus_pyogenes_MGAS8232 Streptococcus_pyogenes_SSI-1 Thermoanaerobacter_tengcongensis Ureaplasma_urealyticum Fusobacterium_nucleatum Nanoarchaeum_equitans Pirellula_sp Agrobacterium_tumefaciens_C58_Cere on Agrobacterium_tumefaciens_C58_UWa sh Azotobacter_vinelandii Buchnera_aphidicola Buchnera_aphidicola_Sg Bordetella_bronchiseptica Blochmannia_floridanus Burkholderia_fungorum Brucella_melitensis Bordetella_pertussis Bordetella_parapertussis Bradyrhizobium_japonicum Brucella_suis_1330 Buchnera_sp Coxiella_burnetii Caulobacter_crescentus Campylobacter_jejuni Chromobacterium_violaceum %Genes Abbreviation conserv ed Mpu 15.14 Oih 48.74 Ooe 28.69 Sa_Mu 41.31 Sa_MW 42.37 Sa_N 41.04 Sag_2 31.74 Sag_N 32.41 %Interacti Predi Genom ons cted Phyla e size conserved Tfs 0.92 782 7 firmicutes 17.68 3500 181 firmicutes 5.86 1639 71 firmicutes 17.29 2714 124 firmicutes 18.14 2632 115 firmicutes 17.14 2593 117 firmicutes 10.73 2124 106 firmicutes 11.58 2094 101 firmicutes Sep_A 38.52 16.21 2419 92 firmicutes Smu Spn Spn_T Spy Spy_M3 Spy_M8 Spy_S Tte Uu Fnu Meq Psp 31.21 32.01 30.82 28.42 28.29 28.29 28.16 37.72 11.03 31.35 9.97 42.63 8.41 4.01 3.78 10.11 9.65 9.8 9.34 14.82 0.3 11.73 0.61 10.5 1960 2043 2094 1697 1865 1845 1861 2588 614 2067 563 7325 122 91 92 79 92 101 89 113 5 54 9 124 firmicutes firmicutes firmicutes firmicutes firmicutes firmicutes firmicutes firmicutes firmicutes fusobacteria nanoarchaeota planctomycetes Atu_c 43.7 24.71 2721 161 proteobacteria Atu_w 43.3 24.01 2785 168 proteobacteria Avi Bap Bap_S Bbr Bfl Bfu Bmel Bp Bpp Brja Bs_1 Bsp Cbu Ccr Cj Cvi 59.9 15.14 17.93 54.85 19.13 64.55 37.99 49.01 53.26 59.77 37.99 18.46 32.67 47.55 33.34 58.31 44.47 0.54 3.78 29.88 0.15 42.85 20.69 26.25 29.72 43.93 20.61 3.86 13.82 24.78 7.49 43.55 4230 504 546 4994 583 7151 2059 3447 4185 8317 2116 564 2009 3737 1634 4407 251 4 9 435 6 560 89 481 370 530 85 8 33 206 27 246 proteobacteria proteobacteria proteobacteria proteobacteria proteobacteria proteobacteria proteobacteria proteobacteria proteobacteria proteobacteria proteobacteria proteobacteria proteobacteria proteobacteria proteobacteria proteobacteria A-13 Supplementary Material Genome Desulfovibrio_desulfuricans Escherichia_coli_CFT073 Escherichia_coli_K12 Escherichia_coli_O157H7 Escherichia_coli_O157H7_EDL933 Geobacter_metallireducens Haemophilus_ducreyi_35000HP Helicobacter_hepaticus Haemophilus_influenzae Helicobacter_pylori_26695 Helicobacter_pylori_J99 Haemophilus_somnus Klebsiella_pneumoniae Legionella_pneumophila Microbulbifer_degradans Mesorhizobium_loti Magnetospirillum_magnetotacticum Nitrosomonas_europaea Neisseria_meningitidis_MC58 Neisseria_meningitidis_Z2491 Pseudomonas_aeruginosa Pseudomonas_fluorescens Photorhabdus_luminescens Pasteurella_multocida Pseudomonas_putida_KT2440 Pseudomonas_syringae Rickettsia_conorii Rhodobacter_sphaeroides Ralstonia_metallidurans Rickettsia_prowazekii Rhodopseudomonas_palustris Rhodospirillum_rubrum Ralstonia_solanacearum Salmonella_enterica Shigella_flexneri_2a Shigella_flexneri_2a_2457T Sinorhizobium_meliloti Shewanella_oneidensis Novosphingobium_aromaticivorans Salmonella_typhimurium_LT2 Salmonella_typhi Salmonella_typhi_Ty2 %Genes Abbreviation conserv ed Dde 40.64 Ec_C 95.62 Ec_K 100 Ec_O 95.49 Ec_OE 95.62 Gme 43.7 Hdu_3 33.74 Hhe 33.47 Hi 45.02 Hp_2 27.23 Hp_J 28.16 Hso 40.64 Kpn 23.11 Lpn 7.04 Mde 49.67 Mlo 59.63 Mmag 55.12 Neu 39.18 Nm_M 35.99 Nm_Z 37.19 Pae 64.68 Pfl 64.95 Plu 67.47 Pmu 50.47 Ppu 60.56 Psy 63.35 Rco 22.98 Rhsp 52.59 Rme 59.77 Rp 18.86 Rpa 52.46 Rru 49.67 Rsol 51 Sen 85.4 Sfl 87.52 Sfl_2 87.92 Sme 47.02 Son 58.57 Spar 47.15 St 87.79 Sty 86.19 Sty_T 86.19 %Interacti Predi Genom ons cted Phyla e size conserved Tfs 13.59 2853 107 proteobacteria 95.67 5379 300 proteobacteria 100 4311 268 proteobacteria 95.59 5361 305 proteobacteria 95.44 5324 297 proteobacteria 21.38 3025 76 proteobacteria 23.55 1717 42 proteobacteria 7.18 1875 24 proteobacteria 31.04 1657 58 proteobacteria 6.79 1576 12 proteobacteria 3.86 1491 13 proteobacteria 28.18 1647 50 proteobacteria 13.28 438 33 proteobacteria 2.08 294 8 proteobacteria 34.05 3698 157 proteobacteria 32.04 6746 508 proteobacteria 38.14 8578 252 proteobacteria 23.24 2461 94 proteobacteria 15.44 2079 57 proteobacteria 16.6 2065 51 proteobacteria 48.95 5567 449 proteobacteria 51.04 5711 417 proteobacteria 57.45 4683 307 proteobacteria 38.99 2015 67 proteobacteria 47.18 5350 401 proteobacteria 50.27 5471 329 proteobacteria 2.08 1374 14 proteobacteria 30.42 4126 209 proteobacteria 43.93 5798 432 proteobacteria 1.69 835 9 proteobacteria 30.65 4577 240 proteobacteria 32.81 3700 197 proteobacteria 27.18 3440 206 proteobacteria 83.24 4655 279 proteobacteria 87.72 4180 223 proteobacteria 87.72 4068 217 proteobacteria 30.42 3341 219 proteobacteria 43.16 4324 204 proteobacteria 27.25 3851 199 proteobacteria 88.18 4451 289 proteobacteria 87.02 4395 270 proteobacteria 86.94 4323 269 proteobacteria A-14 Supplementary Material Genome Vibrio_cholerae Vibrio_parahaemolyticus Vibrio_vulnificus_CMCP6 Vibrio_vulnificus_YJ016 Wigglesworthia_glossinidia Wigglesworthia_brevipalpis Wolinella_succinogenes Xanthomonas_axonopodis Xanthomonas_campestris Xanthomonas_citri Xylella_fastidiosa Xylella_fastidiosa_Temecula1 Yersinia_pestis_CO92 Yersinia_pestis_KIM Borrelia_burgdorferi Leptospira_interrogans Treponema_pallidum Thermotoga_maritima %Genes Abbreviation conserv ed Vch 51.53 Vpa 52.99 Vvu_C 51.27 Vvu_Y 53.39 Wbe 17.53 Wbr 17.4 Wsu 36.79 Xax 50.34 Xca 51 Xci 51 Xfa 39.31 Xfa_T 37.99 Ype_C 73.58 Ype_K 73.31 Bb 14.75 Lint 37.19 Tp 18.07 Tm 32.94 %Interacti Predi Genom ons cted Phyla e size conserved Tfs 41.77 2742 130 proteobacteria 43.32 3080 145 proteobacteria 42 2972 127 proteobacteria 45.55 3262 132 proteobacteria 1.93 657 11 proteobacteria 1.85 611 11 proteobacteria 8.33 2044 60 proteobacteria 26.1 4239 184 proteobacteria 31.04 4181 188 proteobacteria 29.96 4312 190 proteobacteria 20.3 2766 68 proteobacteria 19.3 2034 54 proteobacteria 63.62 3885 236 proteobacteria 63.24 4090 225 proteobacteria 0.69 851 3 spirochaetes 13.51 4360 57 spirochaetes 4.55 1036 10 spirochaetes 10.42 1858 58 thermotogae Chapter 5: Genome Scale Analysis Of Regulatory Network Dynamics Additional supplementary material is available at: http://bioinfo.mbb.yale.edu/regulation/dynamics/ A-15