Chapter 5: Genome Scale Analysis Of Regulatory Network Dynamics

advertisement
Appendix
x
SUPPLEMENTARY MATERIAL
CHAPTER 2: EVOLUTION OF TRANSCRIPTION FACTORS IN E. COLI .............A-1
CHAPTER 3: TRANSCRIPTIONAL REGULATORY NETWORK GROWTH BY
GENE DUPLICATION...............................................................................................A-1
3.1 PSEUDOCODES .................................................................................................. A-1
3.2 REPRESENTATION OF THE TRANSCRIPTIONAL REGULATORY NETWORK .................. A-1
3.3 CHARACTERIZATION OF THE VERTICES IN THE NETWORK ...................................... A-2
3.4 CHARACTERIZATION OF THE EDGES IN THE NETWORK ........................................... A-2
3.5 IDENTIFICATION OF NETWORK MOTIFS.................................................................. A-4
3.6 IDENTIFICATION OF DIRECTLY AND INDIRECTLY REGULATED GENES ....................... A-5
3.7 SIMULATION PROCEDURE.................................................................................... A-5
3.8 INTERNAL DUPLICATION IN THE YEAST SIMS ........................................................ A-6
3.9 INTERNAL DUPLICATIONS IN THE E. COLI SIMS ..................................................... A-7
3.10 INTERNAL DUPLICATIONS IN THE YEAST FFMS.................................................... A-7
CHAPTER 4: EVOLUTIONARY CHANGES IN THE BLUEPRINT FOR
TRANSCRIPTIONAL REGULATION IN PROKARYOTES .....................................A-8
4.1 ALGORITHM TO SIMULATE NETWORK EVOLUTION. ................................................. A-8
4.2 ALGORITHM TO EVALUATE OBSERVED NUMBER OF WHOLE MOTIFS IN GENOMES BY
CREATING RANDOM NETWORKS. ................................................................................ A-9
4.3 ALGORITHM TO EVALUATE OBSERVED CONSERVATION OF INTERACTIONS IN MOTIFS IN
GENOMES ................................................................................................................ A-9
4.4 CONSERVATION OF GENES, INTERACTIONS, GENOME SIZE AND NUMBER OF
PREDICTED TRANSCRIPTION FACTORS FOR EACH OF THE 176 GENOMES. .................. A-11
CHAPTER 5: GENOME SCALE ANALYSIS OF REGULATORY NETWORK
DYNAMICS .............................................................................................................A-15
A
Supplementary Material
APPENDIX
A
SUPPLEMENTARY MATERIAL
Chapter 2: Evolution of Transcription Factors In E. coli
Additional supplementary material is available at:
http://www.mrc-lmb.cam.ac.uk/genomes/madanm/ec_tf
http://www.mrc-lmb.cam.ac.uk/genomes/madanm/ec_tf_bs
Chapter 3: Transcriptional Regulatory Network Growth By Gene
Duplication
Additional supplementary material is available at:
http://www.mrc-lmb.cam.ac.uk/genomes/madanm/net_evol
3.1 Pseudocodes
Pseudocodes to calculate various properties discussed in this chapter are given below.
3.2 Representation of the transcriptional regulatory network
The gene regulatory network is represented as a directed graph G=(V, E) where each protein in
the network is a vertex (v  V). The protein can be a transcription factor (tf) or a regulated
gene (rg) where tf, rg  V. The transcriptional interaction (from a transcription factor to the
regulated gene) is a directed edge (e  E). The network is implemented using two hash Tables
PR (Parent of a Regulated gene) and CT (Child of a Transcription factor). PR consists of |V|
keys, one for each vertex in V. The key is a regulated gene and the values correspond to all the
transcription factors regulating it (i.e. for each key rg, the value corresponds to all vertices tf
such that there is an edge (tf → rg)  E). CT consists of |V| keys, one for each vertex in V. The
key is a transcription factor and the values correspond to all genes that it regulates (i.e. for each
key tf, the value corresponds to all vertices rg such that there is an edge (tf → rg)  E).
A-1
Supplementary Material
Each protein (v  V) is characterised by a domain architecture, i.e. the ordered set of domains
from the N-terminus to the C-terminus. Proteins with the same domain architecture, ignoring
gaps and internal duplications, are considered homologues in this analysis. This information is
stored in a hash Table DA of |D| elements, one for each domain architecture d. A key
corresponds to a domain architecture d and values correspond to all vertices h (homologous
proteins) in the graph that has the domain architecture d. In other words, this hash Table
represents genes (h) that have arisen by duplication from a common ancestor. Another hash
Table DAG consists of |V| keys, one for each vertex in V such that the key is a gene and the
value is its domain architecture, d.
An explanation for each line in the algorithm for the first section below is given as a guide.
3.3 Characterization of the vertices in the network
A. Algorithm to identify duplicated genes (regulated genes) that are controlled by the same
transcription factors
01: for each d  DA
02: for each h  DA [d]
03:
for each tf  PR [h]
04:
X [tf] ← 0
05: for each h  DA [d]
06:
for each tf  PR [h]
07:
X [tf] ← X [tf] + 1
08: for each h  DA [d]
09:
for each tf  PR [h]
10:
if X [tf] > 1
11:
print (h)
# for each domain architecture (d) in DA
# for each homologous gene (h) with domain architecture (d)
# for each tf that regulates h
# set the count X [tf] to 0
# for each homologous gene (h) with domain architecture (d)
# for each tf that regulates h
# increment the count X [tf]
# for each homologous gene (h) with domain architecture (d)
# for each tf that regulates h
# if the tf regulates more than one homologous gene
# print the rg
B. Algorithm to identify duplicated genes (transcription factors) that control the same target
genes
01: for each d  DA
02: for each h  DA [d]
03:
for each rg  CT [h]
04:
X [rg] ← 0
05: for each h  DA [d]
06:
for each rg  CT [h]
07:
X [rg] ← X [rg] + 1
08:
for each h  DA [d]
09:
for each rg  CT [h]
10:
if X [rg] > 1
11:
print (h)
# for each domain architecture (d) in DA
# for each homologous gene (h) with domain architecture (d)
# for each rg that is regulated by h
# set the count X [rg] to 0
# for each homologous gene (h) with domain architecture (d)
# for each rg that is regulated by h
# increment the count X [rg]
# for each homologous gene (h) with domain architecture (d)
# for each rg that is regulated by h
# if the rg is regulated by more than one homologous tf
# print the tf
3.4 Characterization of the edges in the network
A. Algorithm to identify edges that have evolved by the duplication of RGs:
01: for each d  DA
02: for each h  DA [d]
03:
for each tf  PR [h]
04:
X [tf] ← 0
05: for each h  DA [d]
06:
for each tf  PR [h]
07:
X [tf] ← X [tf] + 1
A-2
Supplementary Material
08:
09:
10:
11:
for each h  DA [d]
for each tf  PR [h]
if X [tf] > 1
print E (tf → h)
B. Algorithm to identify edges that have evolved by the duplication of TFs:
01: for each d  DA
02: for each h  DA [d]
03:
for each rg  CT [h]
04:
X [rg] ← 0
05: for each h  DA [d]
06:
for each rg  CT [h]
07:
X [rg] ← X [rg] + 1
08:
for each h  DA [d]
09:
for each rg  CT [h]
10:
if X [rg] > 1
11:
print E (h → rg)
As mentioned, Algorithms A and B have precedence over C, meaning cases picked up by A and
B are excluded from consideration by C.
C. Algorithm to identify edges that have evolved by duplication of TFs and RGs:
01: for each d  DA
02: for each h  DA [d]
03:
X [d] ← X [d] + 1
04:
for each rg  CT [h]
05:
Y [ DAG [rg] ] ← Y [ DAG[rg] ] + 1
06:
Z [rg] ← Z [rg] + 1
07:
if X [d] > 1
08:
for each rg  CT [h]
09:
if Z [rg] = 1 and Y [ DAG[rg] ] > 1
10:
print E (h → rg)
As mentioned, Algorithms A, B and C have precedence over D, E meaning cases picked up by
A, B and C are excluded from consideration by D and E.
D. Algorithm to identify edges that have evolved by duplication of TFs/RGs but lost their old
interactions and gained new interactions:
D.1 Edges where the RG duplicates and gets regulated by a new TF and loses old interactions:
01: for each d  DA
02: for each h  DA [d]
03:
X [d] ← X [d] + 1
04:
for each tf  PR [h]
05:
Y [tf] ← 0
06: for each h  DA [d]
07:
for each tf  PR [h]
08:
Y [tf] ← Y [tf] + 1
09: for each h  DA [d]
10:
for each tf  PR [h]
11:
if Y [tf] = 1 and X [ DAG[h] ] > 1
A-3
Supplementary Material
12:
print E (tf → h)
D.2 Edges where the TF duplicates and regulates new RGs and loses old ones:
01: for each d  DA
02: for each h  DA [d]
03
X [d] ← X [d] + 1
04:
for each rg  CT [h]
05:
Y [rg] ← 0
06: for each h  DA [d]
07:
for each rg  CT [h]
08:
Y [rg] ← Y [rg] + 1
09: for each h  DA [d]
10:
for each rg  CT [h]
11:
if Y [rg] = 1 and X [ DAG[h] ] > 1
12:
print E (h → rg)
E. Algorithm to identify edges that have evolved by innovation of new regulatory interactions:
01: for each d  DA
02: for each h  DA [d]
03:
X [d] ← X [d] + 1
04: for each d  DA
05: for each h  DA [d]
06:
if X [d] = 1
07:
for each rg  CT [h]
08:
if X [ DAG [rg] ] = 1
09:
print E (h → rg)
3.5 Identification of network motifs
The algorithms to identify Feed Forward Motifs and Single Input Motifs are described below.
In these algorithms, P is an array which contains a list of transcription factors that regulate
genes that are regulated by only one transcription factor and u, v, w  V
A. Feed Forward Motif
01: for each u  CT
02: X [u] ← 0
03: for each v  CT [u]
04:
X [v] ← 1
05: for each w  CT [v]
06:
X [w] ← X [w] + 1
07:
if X [w] = 2 & (u ≠ v, u ≠ w, v ≠ w)
08:
print E (u → v → w)
09:
X [w] ← 1
B. Single Input Motif
01: for each rg  PR
02: for each tf  PR [rg]
03:
X [rg] ← X [rg] + 1
04: for each rg  PR
A-4
Supplementary Material
05: if X [rg] = 1
06:
Push (P, PR [rg])
07: for each tf  P
08: for each rg CT [tf]
09:
if X [rg] = 1
10:
print E (tf → rg)
3.6 Identification of directly and indirectly regulated genes
The breadth-first search method, which uses a recursive algorithm, was implemented to identify
all nodes that can be reached from a given node. This was used to identify all other genes that
can be affected by a transcription factor. The hash Table CT provides a list of all directly
regulated genes. The difference between the two sets gives the indirectly regulated genes by a
transcription factor.
3.7 Simulation procedure
The domain architecture for each transcription factor is stored in a hash Table DATF of |D|
elements, one for each domain architecture d. A key corresponds to a domain architecture d
and values correspond to all vertices w that are transcription factors and have the domain
architecture d. In other words, this hash Table represents groups of homologous transcription
factors. Similarly a hash Table DARG is created where a key corresponds to a domain
architecture and the values correspond to all vertices w that are target genes and have the
domain architecture d.
Randomise is a function that assigns a random domain architecture to a gene. However it is
ensured that the number of genes with each domain architecture is the same as seen in the real
network (as implemented in the hash Tables).
A. Calculation of P-value for the duplication of target genes (RGs) that are controlled by the
same transcription factor (TF)
01: for i = 1 to 10,000
02: Randomise (DARG)
03: Perform S.1.1.A and count the number of RGs that satisfy the criteria
04: if the number of RGs in simulation ≥ number seen in the real network
05:
n ← n+1
06: P-value = n/10,000
B. Calculation of P-value for the duplication of transcription factors that control the same target
genes
01: for i = 1 to 10,000
02: Randomise (DATF)
03: Perform S.1.1.B and count the number of TFs that satisfy the criteria
04: if the number of TFs in simulation ≥ number seen in the real network
05:
n ← n+1
06: P-value = n/10,000
C. Simulation procedure to estimate the significance of duplication in determining network
topology
A-5
Supplementary Material
01: for i = 1 to 10,000
02: Randomise (DARG)
03: for each TF in CT
04:
if number of distinct domain architectures of it’s RG ≤ number seen in the real network
05:
n ← n+1
06: P-value = n/10,000
3.8 Internal duplication in the yeast SIMs
YHR119W
YDR207C
YPL075W
YDR423C
YML100W YMR261C
YDR123C YDR285W YER044CA YER179W
YGL073W YGL213C YGR258C YHR152W
YJR052W YLR263W YLR329W
YOR351C YPL153C
YBR118W YCL050C YCR031C
YKL152C YPR080W
YAL020C YAL021C YAL029C YAL030W
YBR061C YDR463W YGR094W YHR163W
YHR164C YJL071W YJL127C YJL164C
YJL165C YJL221C YKL019W YMR290C YNL290W
YOR363C
YFL031W
YCL043C YDR518W YDR519W
YMR043W
YBR160W
YBR202W YGR108W
YJL159W YLR131C YLR274W
YPR113W YPR119W YPR200C
YBR159W YDL078C YDR244W YER015W
YGL205W YJL218W YJR019C YKL188C
YLR284C YML042W YNL009W YNL202W
YNL329C YOL002C YOL147C YOR100C
YOR180C YPL095C
Blue circles represent transcription factors and yellow circles represent duplicated genes within
each SIM.
A-6
Supplementary Material
3.9 Internal duplications in the E. coli SIMs
FliA
LexA
Lrp
PolB, RecA, RecN
RpsU_DnaG_RpoD, Ssb, SulA
UnuDC, UvrA, UvrB, UvrC, UvrD
GltBDF, IlvIH, Kbl_Tdh, LivJ
LivKHMGF, LysU,
OppABCDF, SdaA, SerA
ArgR
Fur
FlgKL, FliC, FlgMN, FliDST
tarTapcheRBYZ
EntCEBA, FepA_EntD, FepB
FepDGC, FhuACDB, TonB
ArgCBH, ArgD, ArgE
ArgF, ArgI
RpoE
PurR
RpoH
PhoR
17
PhnCDE_f73_
PhnFGHIJKLMNOP,
PhoA, PhoE,
PstSCAB_PhoU
GrpE, HflB, HtpG, HtpY, IbpAB
Lon, MopA, MopB, ClpB, DnaKJ
GlnB, GuaBA, PrsA, PurC
PurEK, PurHD, PurL, PurMN
PyrC, PyrD, SpeA, YcfC_PurB
CodBA, CvpA_PurF_UbiX
CysB
BirA
TyrR
CysDNC, CysJIH
CysK, CysPUWAM
BioA, BioBFCD
AroF_TyrA, AroG, AroP
TyrB, TyrP
Fis
EcfF, EcfG, EcfH, EcfK, EcfLM
FkpA, KsgA_EpaG_EpaH, LpxDA_FabZ
MdoGH, NlpB_PurA, OstA_SurA_PdxA
RfaDFCL, RpoD, Upps_CdsA_EcfE, CutC
DapA_NlpB_PurA, EcfABC, EcfD
AlaWX, LeuQPV, LeuX, LysT_ValT_LysW,
MetT_LeuW_GlnUW_MetU_GlnVX
MetY_YhbC_NusA_InfB, PheU, PheV, ArgU
ProK, ProL, ArgW, ArgX_HisR_LeuT_ProM
SerT, SerX, ThrU_TyrU_GlyT_ThrT,
ThrW, TyrTV, ValUXY_AspV
3.10 Internal duplications in the yeast FFMs
YER040W
YJL110C
YER040W
YKR034W
YER040W
YKR034W YBR208C
YER040W
YKR034W YIR028W
YER040W
YKR034W YKR039W
YJL110C
YJL110C
YKR039W
YER040W
YKR034W YEL063C
YER040W
YKR034W YIR029W
YER040W
YKR034W YLR142W
YJL110C
YKR034W YDL210W
YER040W
YKR034W YKR039W
YER040W
YKR034W YGR019W
YER040W
YKR034W YIR031C
YGL013C
YBL005W
YKR034W
YDL210W
YER040W
YKR034W YHR037W
YER040W
YER040W
YKR034W YIR032C
YBL005W
YKR034W YOR375C
YBL005W
YKR039W
YKR034W YJL110C
YER040W
YKR034W YOR348C
YGL013C
YGL013C
YDR011W
YJL110C
YER040W
YKR034W YHL016C
YGL013C
YBR008C
YJL110C
YKR034W
YDR406W
YBL005W
YGR281W
A-7
Supplementary Material
YGL013C
YGL013C
YBL005W
YJL219W
YBR019C
YDR040C
YHR043C
YGL035C
YGL035C
YLL043W
YGL035C
YOR153W
YGL035C
YDR146C
YBR050C
YGL035C
YDR345C
YGL035C
YGL035C
YGL035C
YHR094C
YBR101C
YPL248C
YGL035C
YGL035C
YGL035C
YGL035C
YDR516C
YGL035C
YBR126C
YGL035C
YMR280C
YGL035C
YDR009W
YGL209W
YEL070W
YGL035C
YFL054C
YGL209W
YGL209W
YIL162W
YGL035C
YGL209W
YGL209W
YKL109W
YGL035C
YKR075C
YGL209W
YGL209W
YLR044C
YGL035C
YGL209W
YGL209W
YGL209W
YLR042C
YOR328W
YGL209W
YGL209W
YHR092C
YBL005W
YGL209W
YGL209W
YGL209W
YGL209W
YGL209W
YGL209W
YGL035C
YBR020W
YGL209W
YGL209W
YGL035C
YGL035C
YBL005W
YGL209W
YGL209W
YGL209W
YGL035C
YOL156W
YGL209W
YGL209W
YGL035C
YBL005W
YGL013C
YGL013C
YLR377C
YGL035C
YMR011W
Colored circles within each FF represent transcription factors that are duplicates.
Chapter 4: Evolutionary Changes In The Blueprint For
Transcriptional Regulation In Prokaryotes
Additional supplementary material is available at:
http://www.mrc-lmb.cam.ac.uk/genomes/madanm/reconstruct_net/
4.1 Algorithm to simulate network evolution.
Since there was a bias in the conservation of target genes and transcription factors, we were
interested in asking if there was a selection for regulatory hubs to be conserved or not. The
table in the main text shows that there is no trend for regulatory hubs to be conserved. To see
how it affects the network structure in terms of the interactions being conserved, the following
simulation procedure was carried out.
(a) Selection for hubs:
01: order transcription factors (ascending) according to the # of target genes they regulate
02: for each of the 175 genomes
03:
identify # of TF and # of TG
04:
create network with same # of TF and # of TG for genome, but choosing the TF with
highest connectivity first
05:
reconstruct transcriptional interactions (steps 03 – 08 from M1)
06: plot % interactions lost v/s % genes conserved
(b) Neutral conservation of genes:
01: for each of the 175 genomes
02:
identify # of TF and # of TG
A-8
Supplementary Material
03:
create network with same # of TF and # of TG for genome, choosing TF & TG without
bias
04:
reconstruct transcriptional interactions (steps 03 – 08 from M1)
05: plot % interactions lost v/s % genes conserved
(c) Selection against hubs:
01: order transcription factors (descending) according to the # of target genes they regulate
02: for each of the 175 genomes
03:
identify # of TF and # of TG
04:
create network with same # of TF and # of TG for genome, but choosing the TF with the
lowest connectivity first
05:
reconstruct transcriptional interactions (steps 03 – 08 from M1)
06: plot % interactions lost v/s % genes conserved
(d) Random conservation of non-interacting pairs of genes:
01: for each of the 175 genomes
02:
identify # of interactions in reconstructed network
03:
create network with same number of interactions but with pairs that are known not to
interact – it can be TG-TG pair or TF-TG, TF-TG non-interacting pair
04: plot % interactions lost v/s % genes conserved
4.2 Algorithm to evaluate observed number of whole motifs in genomes by creating
random networks.
To evaluate if the number of motifs that were seen in the conserved networks were significant
compared to random networks, the following procedure was carried out:
01: for each of 10,000 times
02:
create 175 random networks of similar sizes (one for each of the 175 genomes)
03:
for each of the 175 random networks identify number of feed-forward motifs
single input modules and multi input motifs and store the values
04: for each of the 175 genomes and for each of the three types of motifs
05:
calculate mean and standard deviation for each genome over the 10,000 runs
06:
calculate P-value as the fraction of the runs where value was >= observed
07:
calculate Z-score as Z = (observed – mean)/
08: plot % genes conserved v/s number of motifs
4.3 Algorithm to evaluate observed conservation of interactions in motifs in genomes
To evaluate whether interactions in a motif are more conserved than any interaction in the
network, we introduce a term called conservation index (C.I.), which is defined as follows:
R
C.I . genomeX  log 2  motif
 R all
R motif 
I motif
genome X
I motif
E. coli
and



R all 
I all
genome X
I all
E. coli
A-9
Supplementary Material
In this definition,
I motif
genome X is the number of interactions that forms a motif in E. coli, which has
motif
been conserved in genome X. I E. coli is the number of interactions in a motif in E. coli.
I all
genome X
all
is the total number of interactions that have been conserved in genome X and I E. coli is the total
number of interactions in E. coli.
(a) Interactions in motifs are selected for
01: note the number of interactions in motifs in E. coli Iem and all interactions in E. coli Iea
02: for each genome x of the 175 genomes, note the number of interactions conserved, I xa &
interactions conserved that forms a motif in E. coli, Ixm.
03:
if (Ixa < Iem) Rmotif = Ixa/Iem
04:
else Rmotif = 1
05:
Rall = Ixa/Iea
06:
C.I. = log2 (Rmotif/Rall)
07: plot % genes conserved v/s C.I for every genome
(b) Interactions in motifs are neutrally removed
01: note the number of interactions in motifs in E. coli Iem and all interactions in E. coli Iea
02: for each of 10,000 times
03:
create 175 random networks of similar sizes (one for each of the 175 genomes)
04:
for each of the 175 networks, note the number of interactions conserved, I xa &
interactions conserved that forms a motif in E. coli, Ixm.
05:
Rmotif = Ixm/Iem
06:
Rall = Ixa/Iea
07:
C.I. = log2 (Rmotif/Rall)
08: calculate mean and  of C.I. for each of the 175 networks over the 10,000 runs
08:
calculate P-value as the fraction of the runs where C.I. was >= observed
10:
calculate Z-score as Z = (observed – mean)/
11: plot % genes conserved and the mean C.I. value
(c) Interactions in motifs are selected against
01: note the number of interactions in motifs in E. coli Iem and all interactions in E. coli Iea
02: d = Iea – Iem
03: for each of the 175 genomes, x, note the number of interactions conserved, I xa &
interactions conserved that forms a motif in E. coli, Ixm.
04:
if (Ixm > d) Rmotif = (Ixm – d)/Iem
05:
else Rmotif = 0
06:
Rall = Ixa/Iea
07:
C.I. = log2 (Rmotif/Rall)
08: plot % genes conserved v/s C.I for every genome
A-10
Supplementary Material
4.4 Conservation of genes, interactions, genome size and number of predicted
transcription factors for each of the 176 genomes.
Genome
Bifidobacterium_longum
Corynebacterium_diphtheriae
Corynebacterium_efficiens_YS-314
Corynebacterium_glutamicum
Mycobacterium_bovis
Mycobacterium_leprae
Mycobacterium_tuberculosis_CDC1551
Mycobacterium_tuberculosis_H37Rv
Streptomyces_avermitilis
Streptomyces_coelicolor
Thermobifida_fusca
Tropheryma_whipplei_Twist
Tropheryma_whipplei_TW08_27
Aquifex_aeolicus
Bacteroides_thetaiotaomicron_VPI5482
Cytophaga_hutchinsonii
Porphyromonas_gingivalis_W83
Chlamydophila_caviae
Chlamydia_muridarum
Chlamydophila_pneumoniae_AR39
Chlamydophila_pneumoniae_CWL029
Chlamydophila_pneumoniae_J138
Chlamydophila_pneumoniae_TW_183
Chlamydia_trachomatis
Chlorobium_tepidum_TLS
Chloroflexus_aurantiacus
Aeropyrum_pernix
Pyrobaculum_aerophilum
Sulfolobus_solfataricus
Sulfolobus_tokodaii
Nostoc_sp
Gloeobacter_violaceus
Nostoc_punctiforme
Prochlorococcus_marinus_CCMP1375
Prochlorococcus_marinus_MIT9313
Prochlorococcus_marinus_MED4
Synechococcus_sp_WH8102
%Genes %Interacti
Predi
Genom
Abbreviation conserv
ons
cted
e size
ed
conserved
Tfs
Phyla
Blo
Cdi
Cef
Cgl
Mbo
Mle
Mtu_C
Mtu_H
Sav
Scoe
Tfus
Twh_T
Twh_TW
Aae
31.61
38.25
41.57
42.37
42.77
30.02
42.77
43.3
52.2
52.59
39.71
17.93
18.2
33.21
13.59
15.67
22.77
23.01
17.14
12.04
17.14
17.37
27.79
25.32
11.19
1.31
1.31
7.18
1727
2272
2950
2993
3920
1605
4187
3927
7575
7769
2941
808
783
1529
87
88
131
142
177
39
172
173
623
723
164
8
7
33
actinobacteria
actinobacteria
actinobacteria
actinobacteria
actinobacteria
actinobacteria
actinobacteria
actinobacteria
actinobacteria
actinobacteria
actinobacteria
actinobacteria
actinobacteria
aquificae
Bth_V
42.77
13.82
4778
235
bacteroidetes
Chu
Pgi
Cca
Cmu
Cpn_A
Cpn_C
Cpn_J
Cpn_T
Ct
Ctep
Cau
Ap
Pyae
Sso
Sst
Ana
Gvi
Npu
Pmar_C
Pmar_M
Pmar_ME
Scsp
38.78
27.36
18.6
17.27
18.46
18.73
19
18.86
17.27
34.27
42.9
25.5
28.82
33.21
30.02
42.63
42.63
42.9
30.55
34.67
31.08
36.13
11.42
10.27
2.77
1.46
2.31
2.31
2.39
2.39
1.23
10.96
15.59
6.02
4.4
4.86
7.49
13.35
16.21
16.52
8.72
13.51
8.18
8.72
3499
1909
998
904
1112
1054
1069
1113
895
2252
3995
1841
2605
2977
2826
5366
4430
6037
1882
2265
1712
2517
96
64
7
4
7
7
7
7
6
33
114
30
57
115
102
143
149
148
20
31
22
34
bacteroidetes
bacteroidetes
chlamydiae
chlamydiae
chlamydiae
chlamydiae
chlamydiae
chlamydiae
chlamydiae
chlorobi
chloroflexi
crenarchaeota
crenarchaeota
crenarchaeota
crenarchaeota
cyanobacteria
cyanobacteria
cyanobacteria
cyanobacteria
cyanobacteria
cyanobacteria
cyanobacteria
A-11
Supplementary Material
Genome
%Genes
Abbreviation conserv
ed
Ssp
38.12
Ter
38.25
Thel
33.6
Dr
39.45
Af
30.02
Fac
26.3
H_sp
29.22
Mac
34.53
Mba
32.94
Mj
24.97
Mka
23.51
Mma
34
Synechocystis_PCC6803
Trichodesmium_erythraeum
Thermosynechococcus_elongatus
Deinococcus_radiodurans
Archaeoglobus_fulgidus
Ferroplasma_acidarmanus
Halobacterium_sp
Methanosarcina_acetivorans
Methanosarcina_barkeri
Methanococcus_jannaschii
Methanopyrus_kandleri
Methanosarcina_mazei
Methanothermobacter_thermautotrophic
Mta
us
Methanobacterium_thermoautotrophicu
Mth
m
Pyrococcus_abyssi
Pa
Pyrococcus_furiosus
Pfu
Pyrococcus_horikoshii
Ph
Thermoplasma_acidophilum
Tac
Thermoplasma_volcanium
Tvo
Bacillus_anthracis_A2012
Ban_A2
Bacillus_anthracis_Ames
Ban_Am
Bacillus_cereus_ATCC14579
Bc_A
Bacillus_halodurans
Bha
Bacillus_subtilis
Bs
Clostridium_acetobutylicum
Cac
Clostridium_perfringens
Cpe
Clostridium_tetani_E88
Cte_E
Clostridium_thermocellum
Cth
Desulfitobacterium_hafniense
Dha
Enterococcus_faecalis_V583
Efa_V
Lactobacillus_gasseri
Lga
Listeria_innocua
Lin
Lactococcus_lactis
Lla
Leuconostoc_mesenteroides
Lme
Listeria_monocytogenes
Lmo
Lactobacillus_plantarum
Lpl
Mycoplasma_gallisepticum
Mga
Mycoplasma_genitalium
Mge
Mycoplasma_penetrans
Mpe
Mycoplasma_pneumoniae
Mpn
%Interacti
Predi
Genom
ons
cted
Phyla
e size
conserved
Tfs
10.65
3167 91
cyanobacteria
10.96
3914 68
cyanobacteria
10.5
2475 41
cyanobacteria
11.66
2629 78
deinococcus
7.87
2420 95
euryarchaeota
5.4
1745 46
euryarchaeota
1.54
2075 63
euryarchaeota
4.24
4540 118 euryarchaeota
8.95
3420 76
euryarchaeota
4.4
1729 50
euryarchaeota
2.85
1687 28
euryarchaeota
10.19
3371 95
euryarchaeota
25.5
1.62
1866
46
euryarchaeota
25.24
1.85
1873
47
euryarchaeota
28.42
28.16
27.23
23.51
25.24
51.27
50.47
50.6
50.87
50.07
40.51
35.6
32.28
32.28
54.59
36.39
23.78
38.38
34.27
29.89
39.85
38.38
13.15
10.5
15.94
11.56
3.32
1.38
5.55
4.47
8.03
24.4
20.38
19.22
24.4
26.56
13.43
18.14
13.51
8.8
33.51
11.04
8.18
13.2
11.04
8.57
16.13
7.87
1.31
0.23
2.93
0.15
1896
2125
1956
1482
1499
5544
5311
5234
4066
4112
3672
2660
2373
2532
6810
3113
1641
2968
2321
1846
2846
3009
726
484
1037
689
73
82
68
54
50
296
305
293
247
249
211
118
118
73
335
172
85
184
121
89
188
215
5
4
9
4
euryarchaeota
euryarchaeota
euryarchaeota
euryarchaeota
euryarchaeota
firmicutes
firmicutes
firmicutes
firmicutes
firmicutes
firmicutes
firmicutes
firmicutes
firmicutes
firmicutes
firmicutes
firmicutes
firmicutes
firmicutes
firmicutes
firmicutes
firmicutes
firmicutes
firmicutes
firmicutes
firmicutes
A-12
Supplementary Material
Genome
Mycoplasma_pulmonis
Oceanobacillus_iheyensis
Oenococcus_oeni
Staphylococcus_aureus_Mu50
Staphylococcus_aureus_MW2
Staphylococcus_aureus_N315
Streptococcus_agalactiae_2603
Streptococcus_agalactiae_NEM316
Staphylococcus_epidermidis_ATCC_12
228
Streptococcus_mutans
Streptococcus_pneumoniae_R6
Streptococcus_pneumoniae_TIGR4
Streptococcus_pyogenes
Streptococcus_pyogenes_MGAS315
Streptococcus_pyogenes_MGAS8232
Streptococcus_pyogenes_SSI-1
Thermoanaerobacter_tengcongensis
Ureaplasma_urealyticum
Fusobacterium_nucleatum
Nanoarchaeum_equitans
Pirellula_sp
Agrobacterium_tumefaciens_C58_Cere
on
Agrobacterium_tumefaciens_C58_UWa
sh
Azotobacter_vinelandii
Buchnera_aphidicola
Buchnera_aphidicola_Sg
Bordetella_bronchiseptica
Blochmannia_floridanus
Burkholderia_fungorum
Brucella_melitensis
Bordetella_pertussis
Bordetella_parapertussis
Bradyrhizobium_japonicum
Brucella_suis_1330
Buchnera_sp
Coxiella_burnetii
Caulobacter_crescentus
Campylobacter_jejuni
Chromobacterium_violaceum
%Genes
Abbreviation conserv
ed
Mpu
15.14
Oih
48.74
Ooe
28.69
Sa_Mu
41.31
Sa_MW
42.37
Sa_N
41.04
Sag_2
31.74
Sag_N
32.41
%Interacti
Predi
Genom
ons
cted
Phyla
e size
conserved
Tfs
0.92
782
7
firmicutes
17.68
3500 181 firmicutes
5.86
1639 71
firmicutes
17.29
2714 124 firmicutes
18.14
2632 115 firmicutes
17.14
2593 117 firmicutes
10.73
2124 106 firmicutes
11.58
2094 101 firmicutes
Sep_A
38.52
16.21
2419
92
firmicutes
Smu
Spn
Spn_T
Spy
Spy_M3
Spy_M8
Spy_S
Tte
Uu
Fnu
Meq
Psp
31.21
32.01
30.82
28.42
28.29
28.29
28.16
37.72
11.03
31.35
9.97
42.63
8.41
4.01
3.78
10.11
9.65
9.8
9.34
14.82
0.3
11.73
0.61
10.5
1960
2043
2094
1697
1865
1845
1861
2588
614
2067
563
7325
122
91
92
79
92
101
89
113
5
54
9
124
firmicutes
firmicutes
firmicutes
firmicutes
firmicutes
firmicutes
firmicutes
firmicutes
firmicutes
fusobacteria
nanoarchaeota
planctomycetes
Atu_c
43.7
24.71
2721
161
proteobacteria
Atu_w
43.3
24.01
2785
168
proteobacteria
Avi
Bap
Bap_S
Bbr
Bfl
Bfu
Bmel
Bp
Bpp
Brja
Bs_1
Bsp
Cbu
Ccr
Cj
Cvi
59.9
15.14
17.93
54.85
19.13
64.55
37.99
49.01
53.26
59.77
37.99
18.46
32.67
47.55
33.34
58.31
44.47
0.54
3.78
29.88
0.15
42.85
20.69
26.25
29.72
43.93
20.61
3.86
13.82
24.78
7.49
43.55
4230
504
546
4994
583
7151
2059
3447
4185
8317
2116
564
2009
3737
1634
4407
251
4
9
435
6
560
89
481
370
530
85
8
33
206
27
246
proteobacteria
proteobacteria
proteobacteria
proteobacteria
proteobacteria
proteobacteria
proteobacteria
proteobacteria
proteobacteria
proteobacteria
proteobacteria
proteobacteria
proteobacteria
proteobacteria
proteobacteria
proteobacteria
A-13
Supplementary Material
Genome
Desulfovibrio_desulfuricans
Escherichia_coli_CFT073
Escherichia_coli_K12
Escherichia_coli_O157H7
Escherichia_coli_O157H7_EDL933
Geobacter_metallireducens
Haemophilus_ducreyi_35000HP
Helicobacter_hepaticus
Haemophilus_influenzae
Helicobacter_pylori_26695
Helicobacter_pylori_J99
Haemophilus_somnus
Klebsiella_pneumoniae
Legionella_pneumophila
Microbulbifer_degradans
Mesorhizobium_loti
Magnetospirillum_magnetotacticum
Nitrosomonas_europaea
Neisseria_meningitidis_MC58
Neisseria_meningitidis_Z2491
Pseudomonas_aeruginosa
Pseudomonas_fluorescens
Photorhabdus_luminescens
Pasteurella_multocida
Pseudomonas_putida_KT2440
Pseudomonas_syringae
Rickettsia_conorii
Rhodobacter_sphaeroides
Ralstonia_metallidurans
Rickettsia_prowazekii
Rhodopseudomonas_palustris
Rhodospirillum_rubrum
Ralstonia_solanacearum
Salmonella_enterica
Shigella_flexneri_2a
Shigella_flexneri_2a_2457T
Sinorhizobium_meliloti
Shewanella_oneidensis
Novosphingobium_aromaticivorans
Salmonella_typhimurium_LT2
Salmonella_typhi
Salmonella_typhi_Ty2
%Genes
Abbreviation conserv
ed
Dde
40.64
Ec_C
95.62
Ec_K
100
Ec_O
95.49
Ec_OE
95.62
Gme
43.7
Hdu_3
33.74
Hhe
33.47
Hi
45.02
Hp_2
27.23
Hp_J
28.16
Hso
40.64
Kpn
23.11
Lpn
7.04
Mde
49.67
Mlo
59.63
Mmag
55.12
Neu
39.18
Nm_M
35.99
Nm_Z
37.19
Pae
64.68
Pfl
64.95
Plu
67.47
Pmu
50.47
Ppu
60.56
Psy
63.35
Rco
22.98
Rhsp
52.59
Rme
59.77
Rp
18.86
Rpa
52.46
Rru
49.67
Rsol
51
Sen
85.4
Sfl
87.52
Sfl_2
87.92
Sme
47.02
Son
58.57
Spar
47.15
St
87.79
Sty
86.19
Sty_T
86.19
%Interacti
Predi
Genom
ons
cted
Phyla
e size
conserved
Tfs
13.59
2853 107 proteobacteria
95.67
5379 300 proteobacteria
100
4311 268 proteobacteria
95.59
5361 305 proteobacteria
95.44
5324 297 proteobacteria
21.38
3025 76
proteobacteria
23.55
1717 42
proteobacteria
7.18
1875 24
proteobacteria
31.04
1657 58
proteobacteria
6.79
1576 12
proteobacteria
3.86
1491 13
proteobacteria
28.18
1647 50
proteobacteria
13.28
438
33
proteobacteria
2.08
294
8
proteobacteria
34.05
3698 157 proteobacteria
32.04
6746 508 proteobacteria
38.14
8578 252 proteobacteria
23.24
2461 94
proteobacteria
15.44
2079 57
proteobacteria
16.6
2065 51
proteobacteria
48.95
5567 449 proteobacteria
51.04
5711 417 proteobacteria
57.45
4683 307 proteobacteria
38.99
2015 67
proteobacteria
47.18
5350 401 proteobacteria
50.27
5471 329 proteobacteria
2.08
1374 14
proteobacteria
30.42
4126 209 proteobacteria
43.93
5798 432 proteobacteria
1.69
835
9
proteobacteria
30.65
4577 240 proteobacteria
32.81
3700 197 proteobacteria
27.18
3440 206 proteobacteria
83.24
4655 279 proteobacteria
87.72
4180 223 proteobacteria
87.72
4068 217 proteobacteria
30.42
3341 219 proteobacteria
43.16
4324 204 proteobacteria
27.25
3851 199 proteobacteria
88.18
4451 289 proteobacteria
87.02
4395 270 proteobacteria
86.94
4323 269 proteobacteria
A-14
Supplementary Material
Genome
Vibrio_cholerae
Vibrio_parahaemolyticus
Vibrio_vulnificus_CMCP6
Vibrio_vulnificus_YJ016
Wigglesworthia_glossinidia
Wigglesworthia_brevipalpis
Wolinella_succinogenes
Xanthomonas_axonopodis
Xanthomonas_campestris
Xanthomonas_citri
Xylella_fastidiosa
Xylella_fastidiosa_Temecula1
Yersinia_pestis_CO92
Yersinia_pestis_KIM
Borrelia_burgdorferi
Leptospira_interrogans
Treponema_pallidum
Thermotoga_maritima
%Genes
Abbreviation conserv
ed
Vch
51.53
Vpa
52.99
Vvu_C
51.27
Vvu_Y
53.39
Wbe
17.53
Wbr
17.4
Wsu
36.79
Xax
50.34
Xca
51
Xci
51
Xfa
39.31
Xfa_T
37.99
Ype_C
73.58
Ype_K
73.31
Bb
14.75
Lint
37.19
Tp
18.07
Tm
32.94
%Interacti
Predi
Genom
ons
cted
Phyla
e size
conserved
Tfs
41.77
2742 130 proteobacteria
43.32
3080 145 proteobacteria
42
2972 127 proteobacteria
45.55
3262 132 proteobacteria
1.93
657
11
proteobacteria
1.85
611
11
proteobacteria
8.33
2044 60
proteobacteria
26.1
4239 184 proteobacteria
31.04
4181 188 proteobacteria
29.96
4312 190 proteobacteria
20.3
2766 68
proteobacteria
19.3
2034 54
proteobacteria
63.62
3885 236 proteobacteria
63.24
4090 225 proteobacteria
0.69
851
3
spirochaetes
13.51
4360 57
spirochaetes
4.55
1036 10
spirochaetes
10.42
1858 58
thermotogae
Chapter 5: Genome Scale Analysis Of Regulatory Network
Dynamics
Additional supplementary material is available at:
http://bioinfo.mbb.yale.edu/regulation/dynamics/
A-15
Download