De Keersmaecker et al. – Top down systems microbiology

Supplementary information: Overview of omics data

Omics data can be generated either in high-throughput experiments or by in silico prediction through mining of genome sequences. Although some datasets are provided as supplementary data to the corresponding research papers, the number of prokaryotic omics datasets publicly available in databases is still limited (Table S1). However, such repositories are starting to be constructed for several bacterial species (Table S1).

Omics data generated by high-throughput experiments

At the level of the genome

A laboratory strain of Haemophilus influenzae was the first bacterium to be sequenced (Fleischmann et al., 1995). To date, more than 300 complete bacterial genomes are publicly available and many others are anticipated. These data are often a prerequisite for recent omics technologies.

At the level of the transcriptome

Expression profiling experiments measure the changes in mRNA levels after a genetic or environmental perturbation of the system. Microarray technology is the preferred platform to study gene expression changes on a global scale (Schena et al., 1995). Most bacterial microarrays are constructed in-house, although for some species, such as Escherichia coli, they are commercially available. Different experimental setups can be used to compare global expression profiles, each of which gives different information and requires specific analysis procedures: in a genetic perturbation experiment, mRNA expression is compared between a knockout strain and the wild type, while in an environmental perturbation experiment, expression is monitored after applying an environmental trigger (for instance, growth under stress conditions). Static experiments measure gene expression after the cell has adapted to its new environment, while dynamic experiments profile the changes in expression level during cellular adaptation.
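As an illustration of the genetic perturbation setup described above, the following minimal Python sketch computes log2 expression ratios between a knockout strain and the wild type and lists the genes whose expression changes at least two-fold. The gene names, signal values, pseudocount and two-fold threshold are hypothetical assumptions chosen for illustration; they are not taken from any dataset or analysis procedure discussed here, and a real microarray analysis would additionally require normalization and replicate-based statistics.

```python
import math


def log2_ratios(perturbed, reference, pseudocount=1.0):
    """Return log2(perturbed / reference) for every gene measured in both samples.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes with no detectable signal.
    """
    ratios = {}
    for gene, ref_signal in reference.items():
        if gene in perturbed:
            ratio = (perturbed[gene] + pseudocount) / (ref_signal + pseudocount)
            ratios[gene] = math.log2(ratio)
    return ratios


def changed_genes(ratios, threshold=1.0):
    """Return the sorted names of genes with |log2 ratio| >= threshold."""
    return sorted(gene for gene, r in ratios.items() if abs(r) >= threshold)


# Hypothetical signal intensities for three genes (illustrative values only).
wild_type = {"geneA": 99.0, "geneB": 199.0, "geneC": 49.0}
knockout = {"geneA": 399.0, "geneB": 199.0, "geneC": 24.0}

ratios = log2_ratios(knockout, wild_type)
print(ratios)                 # geneA: 2.0, geneB: 0.0, geneC: -1.0
print(changed_genes(ratios))  # ['geneA', 'geneC']
```

The same function could be applied to an environmental perturbation experiment by passing the signals measured after the trigger as `perturbed` and those of the untreated culture as `reference`; for a dynamic experiment it would be called once per time point.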
Mainly encouraged by journal publishers, expression experiments are increasingly stored in public repositories, according to predefined standards (e.g., MIAME (Brazma et al., 2001)) (Table S1). Changes in mRNA abundance between conditions are a function not only of transcription, but also of mRNA stability. Therefore, Bernstein et al. determined the half-lives and steady-state abundance of 4,288 E. coli mRNAs (Bernstein et al., 2002).

At the level of the RNome

Over the last few years, the importance of small non-coding RNAs (sRNAs) with diverse regulatory roles has been widely recognized (Vogel et al., 2003; Gottesman, 2005). The most exhaustive searches have taken place in E. coli, resulting in the identification of more than 50 sRNAs, corresponding to 1%-2% of the number of protein-coding genes. In prokaryotes, sRNAs can regulate either protein synthesis, by affecting mRNA transcription, translation and/or stability, or the activity of specific proteins, by binding to them (Masse et al., 2003). As such, identifying sRNAs and recording their expression profiles will become as important as recording mRNA expression profiles in understanding how cells modulate gene expression. Until now, however, available data are restricted to listings of identified sRNAs in some model organisms (Table S1).

At the level of the proteome

Protein expression profiling experiments measure the changes in protein expression levels after a genetic or environmental perturbation of the system (Pedersen et al., 1978; VanBogelen et al., 1999). Both gel-based and gel-free (e.g. LC/MS) methods have been developed over the years (Volker and Hecker, 2005). The more recent introduction of fluorescent 2D difference gel electrophoresis (2D-DIGE) (Unlu et al., 1997) further increased the reproducibility and accuracy of quantification (Van den Bergh and Arckens, 2004).
Gel-based systems also readily allow the detection of post-translational modifications (PTMs), such as phosphorylation (Van den Bergh and Arckens, 2005), although this approach is not as widely developed for prokaryotes as it is for yeasts (Ptacek et al., 2005). Non-gel-based alternatives, which rely on the separation of peptides rather than proteins, are on the other hand better suited to accessing low-abundance proteins such as membrane proteins (Volker and Hecker, 2005). Like microarray experiments, protein expression profiling can be dynamic or static. Especially for gel-based methods, however, profile analysis is complicated because the link between the observed signal and the identity of the protein is not predefined as it is in microarrays. For the availability of microbial protein maps and data, see Table S1.

At the level of the metabolome

Measurement of metabolites gives information on how functional proteins act to transform energy and process materials. Metabolomics (Oliver et al., 1998) aims at the non-targeted, rapid, and unambiguous identification of hundreds of these metabolites in highly complex preparations. Metabolomics is a fast-moving omics platform, with the majority of the papers published only in the last two years (e.g. (Koek et al., 2006)). Although a multitude of analytical platforms has been used, including mass spectrometry and NMR-based methods, no single technique currently enables the multiparallel analysis of the complete metabolome (Birkemeyer et al., 2005; van der Werf et al., 2005; Kell et al., 2005). In contrast to transcriptomics and proteomics, the technology involved in metabolomics is generic, as a given metabolite is the same in every organism that contains it, but we still lack metabolome databases capable of storing the plethora of data (Schauer et al., 2005) (Table S1). Comprehensive pathway databases do exist, such as KEGG (Table S1).
However, the available metabolite definitions have not yet been reconciled with metabolite profiles (Kopka, 2006).

At the level of the interactome

DNA-protein interactions

Most protein-DNA interaction data in databases involve predicted and/or experimentally validated DNA motifs (see Table 1 and below). Interactions between proteins and DNA can be investigated experimentally at a high-throughput level by combining chromatin immunoprecipitation (ChIP) with whole-genome DNA microarrays (chips). The ChIP-chip technology allows the determination of the entire spectrum of in vivo DNA binding sites for any given protein (transcriptional regulator) (Buck and Lieb, 2004). The technology was initially developed in yeast (Ren et al., 2000); Laub et al. were the first to apply it in bacteria, namely in Caulobacter crescentus (Laub et al., 2002). Like gene expression data, ChIP-chip data are condition dependent, and some interactions between a regulator and its target genes occur only in very specific conditions. Because ChIP-chip data are tedious to generate, requiring a separate set of microarray experiments per tested regulator and per tested condition, it is unlikely that a separate ChIP-chip compendium will be available for each condition and for each transcription factor in the short term. Extrapolating the interactions already measured to infer molecular networks for conditions not tested in the ChIP-chip assay is thus required. ChIP-chip data are available only for selected regulators in E. coli (e.g. (Grainger et al., 2004)), Salmonella enterica (Navarre et al., 2006; Lucchini et al., 2006), Bacillus subtilis (e.g. (Molle et al., 2003)) and C. crescentus (Laub et al., 2002). In contrast to yeast, no large ChIP-chip compendia are yet available for bacteria.

Protein-protein interactions

Protein interaction data provide experimental information on direct interactions between proteins.
To date, most of the experimental efforts to detect interactions are based on either the yeast two-hybrid system (and variants thereof) (e.g. (Rain et al., 2001)) or MS identification of proteins that co-affinity purify (co-AP) with a bait protein (e.g. (Butland et al., 2005)). The two technologies detect complementary types of interactions (Uetz and Finley, 2005): yeast two-hybrid detects physical interactions, while co-AP/MS detects groups of proteins in stable complexes (functional interactions). Protein interaction data are again condition dependent, and the lack of overlap between different datasets for a particular proteome (Uetz and Finley, 2005) and the high false negative rates (~85% in large yeast two-hybrid screens and 50% in co-AP/MS screens (Edwards et al., 2002; von Mering et al., 2002)) leave much room for improvement. Table S1 lists some databases containing experimental protein interaction data for several organisms. Other databases cover predicted protein-protein interactions (e.g. STRING, Table S1).

At the level of the phenome

Phenomics analyses mutation-driven phenotypes with the goal of understanding the relationship between genes and higher levels of organization in the cell. An efficient technology for assessing cellular phenotypes is the Biolog system (Bochner, 2003), which uses phenotype arrays to screen the growth of mutants on different substrates. Phenotypic traits of diverse model organisms are stored in public databases, such as ASAP (Table S1). The Keio collection is a set of precisely defined, single-gene deletions of all nonessential genes in E. coli (Baba et al., 2006).

In vivo molecular fluxes through metabolic pathways (the fluxome) can be considered to be the functional determinants of cellular physiology, as they reflect the integration of genetic and metabolic regulation.
Currently, flux analysis is based on 13C-labelling of substrates and isotopomer distribution analysis by 2D NMR, GC-MS or LC-MS (Emmerling et al., 2002; Sauer, 2004).

Omics data generated by in silico predictions based on comparative genomics

Computational predictions based solely on sequence data can be considered complementary to, and not confounded with, experimental omics data. Tools have been developed (e.g. STRING, NEBULON) that predict which proteins interact with each other or belong to the same biochemical pathway, based on their co-occurrence in related genomes (phylogenetic profiling), their close linkage in several related genomes, sometimes their fusion in certain genomes, and rearrangements of predicted operons. Other comparative tools (phylogenetic footprinting) compare intergenic regions of orthologs to search for evolutionarily conserved regions. The conservation of these regions through evolution might point towards their biological importance. Some of these regions correspond to DNA motifs (i.e., the short conserved DNA sequences located in the promoter region of genes that serve as the recognition sites for transcriptional regulators) (McCue et al., 2001), others to regulatory RNA elements located in the non-coding regions (e.g. (Rivas et al., 2001; Rodionov et al., 2003)). Specialized databases exist that contain information on regulatory motifs of diverse organisms (Table S1). The drawback of in silico predictions is that they usually contain many false positives (e.g., motifs that are not biologically relevant).

References

1. Baba, T., Ara, T., Hasegawa, M., Takai, Y., Okumura, Y., Baba, M., Datsenko, K.A., Tomita, M., Wanner, B.L., and Mori, H. (2006) Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol Syst Biol msb4100050-E.
2. Bernstein, J.A., Khodursky, A.B., Lin, P.H., Lin-Chao, S., and Cohen, S.N.
(2002) Global analysis of mRNA decay and abundance in Escherichia coli at single-gene resolution using two-color fluorescent DNA microarrays. Proc Natl Acad Sci U S A 99: 9697-9702.
3. Birkemeyer, C., Luedemann, A., Wagner, C., Erban, A., and Kopka, J. (2005) Metabolome analysis: the potential of in vivo labeling with stable isotopes for metabolite profiling. Trends Biotechnol 23: 28-33.
4. Bochner, B.R. (2003) New technologies to assess genotype-phenotype relationships. Nat Rev Genet 4: 309-314.
5. Brazma, A., Hingamp, P., Quackenbush, J., Sherlock, G., Spellman, P., Stoeckert, C., Aach, J., Ansorge, W., Ball, C.A., Causton, H.C., Gaasterland, T., Glenisson, P., Holstege, F.C., Kim, I.F., Markowitz, V., Matese, J.C., Parkinson, H., Robinson, A., Sarkans, U., Schulze-Kremer, S., Stewart, J., Taylor, R., Vilo, J., and Vingron, M. (2001) Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet 29: 365-371.
6. Buck, M.J., and Lieb, J.D. (2004) ChIP-chip: considerations for the design, analysis, and application of genome-wide chromatin immunoprecipitation experiments. Genomics 83: 349-360.
7. Butland, G., Peregrin-Alvarez, J.M., Li, J., Yang, W., Yang, X., Canadien, V., Starostine, A., Richards, D., Beattie, B., Krogan, N., Davey, M., Parkinson, J., Greenblatt, J., and Emili, A. (2005) Interaction network containing conserved and essential protein complexes in Escherichia coli. Nature 433: 531-537.
8. Edwards, A.M., Kus, B., Jansen, R., Greenbaum, D., Greenblatt, J., and Gerstein, M. (2002) Bridging structural biology and genomics: assessing protein interaction data with known complexes. Trends Genet 18: 529-536.
9. Emmerling, M., Dauner, M., Ponti, A., Fiaux, J., Hochuli, M., Szyperski, T., Wuthrich, K., Bailey, J.E., and Sauer, U. (2002) Metabolic flux responses to pyruvate kinase knockout in Escherichia coli. J Bacteriol 184: 152-164.
10.
Fleischmann, R.D., Adams, M.D., White, O., Clayton, R.A., Kirkness, E.F., Kerlavage, A.R., Bult, C.J., Tomb, J.F., Dougherty, B.A., and Merrick, J.M. (1995) Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269: 496-512.
11. Gottesman, S. (2005) Micros for microbes: non-coding regulatory RNAs in bacteria. Trends Genet 21: 399-404.
12. Grainger, D.C., Overton, T.W., Reppas, N., Wade, J.T., Tamai, E., Hobman, J.L., Constantinidou, C., Struhl, K., Church, G., and Busby, S.J. (2004) Genomic studies with Escherichia coli MelR protein: applications of chromatin immunoprecipitation and microarrays. J Bacteriol 186: 6938-6943.
13. Kell, D.B., Brown, M., Davey, H.M., Dunn, W.B., Spasic, I., and Oliver, S.G. (2005) Metabolic footprinting and systems biology: the medium is the message. Nat Rev Microbiol 3: 557-565.
14. Koek, M.M., Muilwijk, B., van der Werf, M.J., and Hankemeier, T. (2006) Microbial metabolomics with gas chromatography/mass spectrometry. Anal Chem 78: 1272-1281.
15. Kopka, J. (2006) Current challenges and developments in GC-MS based metabolite profiling technology. J Biotechnol 124: 312-322.
16. Laub, M.T., Chen, S.L., Shapiro, L., and McAdams, H.H. (2002) Genes directly controlled by CtrA, a master regulator of the Caulobacter cell cycle. Proc Natl Acad Sci U S A 99: 4632-4637.
17. Lucchini, S., Rowley, G., Goldberg, M.D., Hurd, D., Harrison, M., and Hinton, J.C. (2006) H-NS mediates the silencing of laterally acquired genes in bacteria. PLoS Pathog 2.
18. Masse, E., Majdalani, N., and Gottesman, S. (2003) Regulatory roles for small RNAs in bacteria. Curr Opin Microbiol 6: 120-124.
19. McCue, L., Thompson, W., Carmack, C., Ryan, M.P., Liu, J.S., Derbyshire, V., and Lawrence, C.E. (2001) Phylogenetic footprinting of transcription factor binding sites in proteobacterial genomes. Nucleic Acids Res 29: 774-782.
20.
Molle, V., Fujita, M., Jensen, S.T., Eichenberger, P., Gonzalez-Pastor, J.E., Liu, J.S., and Losick, R. (2003) The Spo0A regulon of Bacillus subtilis. Mol Microbiol 50: 1683-1701.
21. Navarre, W.W., Porwollik, S., Wang, Y., McClelland, M., Rosen, H., Libby, S.J., and Fang, F.C. (2006) Selective silencing of foreign DNA with low GC content by the H-NS protein in Salmonella. Science 313: 236-238.
22. Oliver, S.G., Winson, M.K., Kell, D.B., and Baganz, F. (1998) Systematic functional analysis of the yeast genome. Trends Biotechnol 16: 373-378.
23. Pedersen, S., Bloch, P.L., Reeh, S., and Neidhardt, F.C. (1978) Patterns of protein synthesis in E. coli: a catalog of the amount of 140 individual proteins at different growth rates. Cell 14: 179-190.
24. Ptacek, J., Devgan, G., Michaud, G., Zhu, H., Zhu, X., Fasolo, J., Guo, H., Jona, G., Breitkreutz, A., Sopko, R., McCartney, R.R., Schmidt, M.C., Rachidi, N., Lee, S.J., Mah, A.S., Meng, L., Stark, M.J., Stern, D.F., De Virgilio, C., Tyers, M., Andrews, B., Gerstein, M., Schweitzer, B., Predki, P.F., and Snyder, M. (2005) Global analysis of protein phosphorylation in yeast. Nature 438: 679-684.
25. Rain, J.C., Selig, L., De Reuse, H., Battaglia, V., Reverdy, C., Simon, S., Lenzen, G., Petel, F., Wojcik, J., Schachter, V., Chemama, Y., Labigne, A., and Legrain, P. (2001) The protein-protein interaction map of Helicobacter pylori. Nature 409: 211-215.
26. Ren, B., Robert, F., Wyrick, J.J., Aparicio, O., Jennings, E.G., Simon, I., Zeitlinger, J., Schreiber, J., Hannett, N., Kanin, E., Volkert, T.L., Wilson, C.J., Bell, S.P., and Young, R.A. (2000) Genome-wide location and function of DNA binding proteins. Science 290: 2306-2309.
27. Rivas, E., Klein, R.J., Jones, T.A., and Eddy, S.R. (2001) Computational identification of noncoding RNAs in E. coli by comparative genomics. Curr Biol 11: 1369-1373.
28. Rodionov, D.A., Vitreschak, A.G., Mironov, A.A., and Gelfand, M.S.
(2003) Regulation of lysine biosynthesis and transport genes in bacteria: yet another RNA riboswitch? Nucleic Acids Res 31: 6748-6757.
29. Sauer, U. (2004) High-throughput phenomics: experimental methods for mapping fluxomes. Curr Opin Biotechnol 15: 58-63.
30. Schauer, N., Steinhauser, D., Strelkov, S., Schomburg, D., Allison, G., Moritz, T., Lundgren, K., Roessner-Tunali, U., Forbes, M.G., Willmitzer, L., Fernie, A.R., and Kopka, J. (2005) GC-MS libraries for the rapid identification of metabolites in complex biological samples. FEBS Lett 579: 1332-1337.
31. Schena, M., Shalon, D., Davis, R.W., and Brown, P.O. (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270: 467-470.
32. Uetz, P., and Finley, R.L., Jr. (2005) From protein networks to biological systems. FEBS Lett 579: 1821-1827.
33. Unlu, M., Morgan, M.E., and Minden, J.S. (1997) Difference gel electrophoresis: a single gel method for detecting changes in protein extracts. Electrophoresis 18: 2071-2077.
34. Van den Bergh, G., and Arckens, L. (2004) Fluorescent two-dimensional difference gel electrophoresis unveils the potential of gel-based proteomics. Curr Opin Biotechnol 15: 38-43.
35. Van den Bergh, G., and Arckens, L. (2005) Recent advances in 2D electrophoresis: an array of possibilities. Expert Rev Proteomics 2: 243-252.
36. van der Werf, M.J., Jellema, R.H., and Hankemeier, T. (2005) Microbial metabolomics: replacing trial-and-error by the unbiased selection and ranking of targets. J Ind Microbiol Biotechnol 32: 234-252.
37. VanBogelen, R.A., Schiller, E.E., Thomas, J.D., and Neidhardt, F.C. (1999) Diagnosis of cellular states of microbial organisms using proteomics. Electrophoresis 20: 2149-2159.
38. Vogel, J., Bartels, V., Tang, T.H., Churakov, G., Slagter-Jager, J.G., Huttenhofer, A., and Wagner, E.G. (2003) RNomics in Escherichia coli detects new sRNA species and indicates parallel transcriptional output in bacteria. Nucleic Acids Res 31: 6435-6443.
39.
Volker, U., and Hecker, M. (2005) From genomics via proteomics to cellular physiology of the Gram-positive model organism Bacillus subtilis. Cell Microbiol 7: 1077-1085.
40. von Mering, C., Krause, R., Snel, B., Cornell, M., Oliver, S.G., Fields, S., and Bork, P. (2002) Comparative assessment of large-scale data sets of protein-protein interactions. Nature 417: 399-403.