De Keersmaecker et al. – Top down systems microbiology
Supplementary information: Overview of omics data
Omics data can be generated either in high-throughput experiments or by in silico prediction
through mining of genome sequences. Although some datasets are provided as supplementary data to the
corresponding research papers, the amount of prokaryotic omics data publicly available in
databases is still limited (Table S1). However, such repositories are starting to be constructed for
several bacterial species (Table S1).
Omics data generated by high-throughput experiments
At the level of the genome
A laboratory strain of Haemophilus influenzae was the first bacterium to be sequenced
(Fleischmann et al., 1995). To date, more than 300 complete bacterial genomes are publicly
available and many others are anticipated. These data are often a prerequisite for recent omics
technologies.
At the level of the transcriptome
Expression profiling experiments measure the changes in mRNA levels after a genetic or
environmental perturbation of the system. The microarray technology is the preferred platform to
study gene expression changes on a global scale (Schena et al., 1995). Most bacterial microarrays
are constructed in-house, although for some species, such as Escherichia coli, they are
commercially available. Different experimental setups can be used to compare global expression
profiles, each of which gives different information and requires specific analysis procedures: in a
genetic perturbation experiment, mRNA expression is compared between a knockout strain and a
wild-type strain, while in an environmental perturbation experiment, expression is monitored after
applying an environmental trigger (for instance, growth under stress conditions). Static
experiments measure gene expression after the cell has adapted to its new environment, while
dynamic experiments profile the changes in expression level during cellular adaptation.
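The experimental designs described above can be illustrated with a minimal sketch, assuming that expression is summarized as a log2 ratio against a reference. The gene names, intensity values and the helper log2_ratios are hypothetical and serve only as illustration; they are not taken from any dataset discussed here.

```python
import math

# Hypothetical normalized signal intensities (arbitrary units); not real data.
wild_type = {"geneA": 200.0, "geneB": 50.0, "geneC": 120.0}
knockout = {"geneA": 50.0, "geneB": 200.0, "geneC": 120.0}


def log2_ratios(perturbed, reference):
    """Return log2(perturbed / reference) for genes measured in both samples."""
    return {
        gene: math.log2(perturbed[gene] / reference[gene])
        for gene in perturbed
        if gene in reference and perturbed[gene] > 0 and reference[gene] > 0
    }


# Genetic perturbation (static design): knockout strain versus wild type.
ratios = log2_ratios(knockout, wild_type)

# Dynamic design: one profile per gene over time points after the trigger,
# each expressed relative to the first time point (hypothetical values).
time_course = {"geneA": [100.0, 200.0, 400.0]}
dynamic = {
    gene: [math.log2(value / values[0]) for value in values]
    for gene, values in time_course.items()
}

print(ratios)
print(dynamic)
```

A static design yields one value per gene, whereas a dynamic design yields a profile per gene, which is why the two require different analysis procedures.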
Mainly encouraged by journal publishers, expression experiments are increasingly stored in
public repositories, according to predefined standards (e.g., MIAME (Brazma et al., 2001))
(Table S1).
Changes in mRNA abundance between conditions are not only a function of transcription, but
also of mRNA stability. Therefore, Bernstein et al. determined the half-lives and steady-state
abundance of 4,288 E. coli mRNAs (Bernstein et al., 2002).
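The dependence of abundance on stability can be made explicit with a simple first-order model, dM/dt = k_s - k_d M, in which the steady-state abundance is k_s / k_d and k_d = ln 2 / half-life. This model is a textbook assumption used here for illustration only, not the analysis procedure of Bernstein et al.; the rates and half-lives below are hypothetical.

```python
import math


def decay_constant(half_life_min):
    """First-order decay constant (per minute) from a half-life in minutes."""
    if half_life_min <= 0:
        raise ValueError("half-life must be positive")
    return math.log(2) / half_life_min


def steady_state_abundance(synthesis_rate, half_life_min):
    """Steady-state abundance k_s / k_d under the first-order model."""
    return synthesis_rate / decay_constant(half_life_min)


# Two hypothetical mRNAs transcribed at the same rate but with different stability.
stable = steady_state_abundance(synthesis_rate=10.0, half_life_min=8.0)
unstable = steady_state_abundance(synthesis_rate=10.0, half_life_min=2.0)
fold_difference = stable / unstable

print(round(fold_difference, 3))
```

Under these assumptions, a fourfold difference in half-life alone produces a fourfold difference in abundance at an identical transcription rate.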
At the level of the RNome
Over the last few years, the importance of small non-coding RNAs (sRNAs) with diverse
regulatory roles has been widely recognized (Vogel et al., 2003; Gottesman, 2005). The most
exhaustive searches have taken place in E. coli, resulting in the identification of more than 50
sRNAs, corresponding to 1%-2% of the number of protein-coding genes. In prokaryotes, sRNAs
can either regulate protein synthesis, by affecting mRNA transcription, translation and/or
stability, or the activity of specific proteins by binding to them (Masse et al., 2003). As such,
identifying sRNAs and recording their expression profiles will become as important as recording
mRNA expression profiles in understanding how cells modulate gene expression. Until now,
however, available data are restricted to listings of identified sRNAs in some model organisms
(Table S1).
At the level of the proteome
Protein expression profiling experiments measure the changes in protein expression levels after a
genetic or environmental perturbation of the system (Pedersen et al., 1978; VanBogelen et al.,
1999). Both gel-based and gel-free (e.g. LC/MS) methods have been developed over the years
(Volker and Hecker, 2005). The more recent introduction of fluorescent 2D difference gel
electrophoresis (2D-DIGE) (Unlu et al., 1997) further increased reproducibility and accuracy of
quantification (Van den Bergh and Arckens, 2004).
Gel-based systems also easily allow the detection of post-translational modifications (PTM), such
as phosphorylation (Van den Bergh and Arckens, 2005), although this approach is not as widely
developed for prokaryotes as it is for yeasts (Ptacek et al., 2005). Non-gel-based alternatives
relying on the separation of peptides rather than proteins, on the other hand, are more suitable to
access low-abundance proteins such as membrane proteins (Volker and Hecker, 2005). Like
microarray experiments, protein expression profiling can be dynamic or static. Especially for gel-based
methods, however, profile analysis is complicated since the link between the observed
signal and the identity of the protein is not predefined as it is in microarrays. For availability of
microbial protein maps and data, see Table S1.
At the level of the metabolome
Measurement of metabolites gives information on how functional proteins act to transform
energy and process materials. Metabolomics (Oliver et al., 1998) aims at the non-targeted, rapid,
and unambiguous identification of hundreds of these metabolites in highly complex preparations.
Metabolomics is a fast moving omics platform, with the majority of the papers published only in
the last two years (e.g. (Koek et al., 2006)). Although a multitude of analytical platforms has
previously been used, including mass spectrometry and NMR-based methods, no single technique
currently enables the multiparallel analysis of the complete metabolome (Birkemeyer et al., 2005;
van der Werf et al., 2005; Kell et al., 2005). In contrast to transcriptomics and proteomics, the
technology involved in metabolomics is generic, as a given metabolite is the same in every
organism that contains it, but we still lack metabolome databases capable of storing the plethora
of data (Schauer et al., 2005) (Table S1). There exist comprehensive pathway databases, such as
KEGG (Table S1). However, the available metabolite definitions have not yet been reconciled
with metabolite profiles (Kopka, 2006).
At the level of the interactome
DNA-protein interactions
Most protein-DNA interaction data in databases involve predicted and/or experimentally
validated DNA motifs (see table 1 and below). Interactions between proteins and DNA can be
experimentally investigated at a high-throughput level, by combining chromatin
immunoprecipitation (ChIP) with whole-genome DNA microarrays (chips). The ChIP-chip
technology allows the determination of the entire spectrum of in vivo DNA binding sites for any
given protein (transcriptional regulator) (Buck and Lieb, 2004). The technology was initially
developed in yeast (Ren et al., 2000); Laub et al. were the first to apply it in bacteria, namely in
Caulobacter crescentus (Laub et al., 2002). Like gene expression data, ChIP-chip data are
condition dependent and some interactions between a regulator and its target genes only occur in
very specific conditions. Because ChIP-chip data are tedious to generate, requiring a separate set
of microarray experiments per tested regulator and per tested condition, it is unlikely that a
ChIP-chip compendium will become available for each condition and each transcription factor in
the short term. Extrapolating the already measured interactions to infer molecular networks for
conditions not primarily tested in the ChIP-chip assay is thus required. ChIP-chip data are only available for
well selected regulators in E. coli (e.g. (Grainger et al., 2004)), Salmonella enterica (Navarre et
al., 2006; Lucchini et al., 2006), Bacillus subtilis (e.g. (Molle et al., 2003)) and C. crescentus
(Laub et al., 2002). In contrast to yeast, no large ChIP-chip compendia are yet available for
bacteria.
Protein-protein interactions
Protein interaction data provide experimental information on direct interactions between proteins.
To date, most of the experimental efforts to detect interactions are based on either the yeast two-hybrid
system (and variants thereof) (e.g. (Rain et al., 2001)) or MS identification of proteins that
co-affinity purify (co-AP) with a bait protein (e.g. (Butland et al., 2005)). The two technologies
detect complementary types of interactions (Uetz and Finley, 2005): yeast two-hybrid detects
physical interactions while co-AP/MS detects groups of proteins in stable complexes (functional
interactions). Protein interaction data are again condition-dependent, and the lack of overlap
between different datasets for a particular proteome (Uetz and Finley, 2005) and the high false
negative rates (~85% in large yeast two-hybrid screens and 50% in co-AP/MS screens (Edwards
et al., 2002; von Mering et al., 2002)) leave much room for improvement. Table S1 lists some
databases containing experimental protein interaction data for several organisms. Other databases
cover predicted protein-protein interactions (e.g. STRING, Table S1).
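The lack of overlap between interaction datasets mentioned above can be quantified in several ways. The following is a minimal sketch using the Jaccard index on unordered protein pairs; the choice of this measure, the protein identifiers and the two small datasets are assumptions made for illustration and do not come from the cited studies.

```python
def as_pairs(interactions):
    """Represent each interaction as an unordered pair so that A-B equals B-A."""
    return {frozenset(pair) for pair in interactions}


def jaccard(first, second):
    """Shared interactions divided by all interactions seen in either dataset."""
    union = first | second
    return len(first & second) / len(union) if union else 0.0


# Hypothetical interaction lists from two screening technologies.
two_hybrid = as_pairs([("P1", "P2"), ("P2", "P3"), ("P4", "P5")])
co_ap_ms = as_pairs([("P2", "P1"), ("P3", "P4"), ("P5", "P6")])

overlap = jaccard(two_hybrid, co_ap_ms)
shared = two_hybrid & co_ap_ms

print(overlap, shared)
```

Treating pairs as unordered avoids undercounting the overlap when the same interaction is reported with bait and prey swapped.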
At the level of the phenome
Phenomics analyses mutation-driven phenotypes with the goal of understanding the relationship
between genes and higher levels of organization in the cell. An efficient technology for assessing
cellular phenotypes makes use of the Biolog system (Bochner, 2003) which uses phenotype
arrays to screen growth of mutants on different substrates. Phenotypic traits of diverse model
organisms are stored in public databases such as ASAP (Table S1). The Keio collection is a set
of precisely defined, single-gene deletions of all nonessential genes in E. coli (Baba et al., 2006).
In vivo molecular fluxes through metabolic pathways (the fluxome) can be considered to be the
functional determinants of cellular physiology as they reflect the integration of genetic and
metabolic regulation. Currently, flux analysis is based on 13C-labelling of substrates and
isotopomer distribution analysis by 2D NMR, GC-MS or LC-MS (Emmerling et al., 2002; Sauer,
2004).
Omics data generated by in silico predictions based on comparative genomics
Computational predictions solely based on sequence data can be considered as information
complementary to and not confounded with other experimental ‘omics data’. Tools have been
developed (e.g. STRING, NEBULON) that make predictions on proteins interacting with each
other or belonging to the same biochemical pathway based on their co-occurrence in related
genomes (phylogenetic profiling), their close linkage in several related genomes and sometimes
their fusion in certain genomes and rearrangements of predicted operons. Other comparative tools
(phylogenetic footprinting) compare intergenic regions of orthologs to search for evolutionarily
conserved regions. The conservation of these regions through evolution might point towards their
biological importance. Some of these regions correspond to DNA motifs (i.e., the short
conserved DNA sequences located in the promoter region of genes that serve as the recognition
sites for transcriptional regulators) (McCue et al., 2001), others to regulatory RNA elements
located in the non-coding regions (e.g. (Rivas et al., 2001; Rodionov et al., 2003)). Specialized
databases exist that contain information on regulatory motifs of diverse organisms (Table S1).
The drawback of in silico predictions is that they usually contain many false positives (e.g.,
motifs that are not biologically relevant).
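The idea behind phylogenetic profiling can be sketched as follows, assuming that each protein is described by a presence/absence vector across a set of genomes and that similar vectors suggest a functional link. The profiles, the similarity measure and the 0.8 threshold are hypothetical choices for illustration; they do not reproduce the scoring used by STRING or any other tool named above.

```python
from itertools import combinations

# Hypothetical presence (1) / absence (0) of each protein in five genomes.
profiles = {
    "protA": (1, 1, 0, 1, 0),
    "protB": (1, 1, 0, 1, 0),
    "protC": (0, 1, 1, 0, 1),
}


def profile_similarity(first, second):
    """Jaccard similarity: genomes containing both proteins / genomes containing either."""
    both = sum(1 for x, y in zip(first, second) if x and y)
    either = sum(1 for x, y in zip(first, second) if x or y)
    return both / either if either else 0.0


def candidate_links(profile_table, threshold=0.8):
    """Return protein pairs whose profiles co-occur at or above the threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(profile_table), 2)
        if profile_similarity(profile_table[a], profile_table[b]) >= threshold
    ]


links = candidate_links(profiles)
print(links)
```

Lowering the threshold recovers more candidate links at the cost of more false positives, which mirrors the drawback of in silico predictions noted above.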
References
1. Baba, T., Ara, T., Hasegawa, M., Takai, Y., Okumura, Y., Baba, M., Datsenko, K.A.,
Tomita, M., Wanner, B.L., and Mori, H. (2006) Construction of Escherichia coli K-12 in-frame,
single-gene knockout mutants: the Keio collection. Mol Sys Biol msb4100050-E.
2. Bernstein, J.A., Khodursky, A.B., Lin, P.H., Lin-Chao, S., and Cohen, S.N. (2002) Global
analysis of mRNA decay and abundance in Escherichia coli at single-gene resolution using
two-color fluorescent DNA microarrays. Proc Natl Acad Sci U S A 99: 9697-9702.
3. Birkemeyer, C., Luedemann, A., Wagner, C., Erban, A., and Kopka, J. (2005) Metabolome
analysis: the potential of in vivo labeling with stable isotopes for metabolite profiling.
Trends Biotechnol 23: 28-33.
4. Bochner, B.R. (2003) New technologies to assess genotype-phenotype relationships. Nat
Rev Genet 4: 309-314.
5. Brazma, A., Hingamp, P., Quackenbush, J., Sherlock, G., Spellman, P., Stoeckert, C., Aach,
J., Ansorge, W., Ball, C.A., Causton, H.C., Gaasterland, T., Glenisson, P., Holstege, F.C.,
Kim, I.F., Markowitz, V., Matese, J.C., Parkinson, H., Robinson, A., Sarkans, U., Schulze-Kremer,
S., Stewart, J., Taylor, R., Vilo, J., and Vingron, M. (2001) Minimum information
about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet
29: 365-371.
6. Buck, M.J., and Lieb, J.D. (2004) ChIP-chip: considerations for the design, analysis, and
application of genome-wide chromatin immunoprecipitation experiments. Genomics 83:
349-360.
7. Butland, G., Peregrin-Alvarez, J.M., Li, J., Yang, W., Yang, X., Canadien, V., Starostine,
A., Richards, D., Beattie, B., Krogan, N., Davey, M., Parkinson, J., Greenblatt, J., and
Emili, A. (2005) Interaction network containing conserved and essential protein complexes
in Escherichia coli. Nature 433: 531-537.
8. Edwards, A.M., Kus, B., Jansen, R., Greenbaum, D., Greenblatt, J., and Gerstein, M. (2002)
Bridging structural biology and genomics: assessing protein interaction data with known
complexes. Trends Genet 18: 529-536.
9. Emmerling, M., Dauner, M., Ponti, A., Fiaux, J., Hochuli, M., Szyperski, T., Wuthrich, K.,
Bailey, J.E., and Sauer, U. (2002) Metabolic flux responses to pyruvate kinase knockout in
Escherichia coli. J Bacteriol 184: 152-164.
10. Fleischmann, R.D., Adams, M.D., White, O., Clayton, R.A., Kirkness, E.F., Kerlavage,
A.R., Bult, C.J., Tomb, J.F., Dougherty, B.A., and Merrick, J.M. (1995) Whole-genome
random sequencing and assembly of Haemophilus influenzae Rd. Science 269: 496-512.
11. Gottesman, S. (2005) Micros for microbes: non-coding regulatory RNAs in bacteria. Trends
Genet 21: 399-404.
12. Grainger, D.C., Overton, T.W., Reppas, N., Wade, J.T., Tamai, E., Hobman, J.L.,
Constantinidou, C., Struhl, K., Church, G., and Busby, S.J. (2004) Genomic studies with
Escherichia coli MelR protein: applications of chromatin immunoprecipitation and
microarrays. J Bacteriol 186: 6938-6943.
13. Kell, D.B., Brown, M., Davey, H.M., Dunn, W.B., Spasic, I., and Oliver, S.G. (2005)
Metabolic footprinting and systems biology: the medium is the message. Nat Rev Microbiol
3: 557-565.
14. Koek, M.M., Muilwijk, B., van der Werf, M.J., and Hankemeier, T. (2006) Microbial
metabolomics with gas chromatography/mass spectrometry. Anal Chem 78: 1272-1281.
15. Kopka, J. (2006) Current challenges and developments in GC-MS based metabolite
profiling technology. J Biotechnol 124: 312-322.
16. Laub, M.T., Chen, S.L., Shapiro, L., and McAdams, H.H. (2002) Genes directly controlled
by CtrA, a master regulator of the Caulobacter cell cycle. Proc Natl Acad Sci U S A 99:
4632-4637.
17. Lucchini, S., Rowley, G., Goldberg, M.D., Hurd, D., Harrison, M., and Hinton, J.C. (2006)
H-NS mediates the silencing of laterally acquired genes in bacteria. PLoS Pathog 2.
18. Masse, E., Majdalani, N., and Gottesman, S. (2003) Regulatory roles for small RNAs in
bacteria. Curr Opin Microbiol 6: 120-124.
19. McCue, L., Thompson, W., Carmack, C., Ryan, M.P., Liu, J.S., Derbyshire, V., and
Lawrence, C.E. (2001) Phylogenetic footprinting of transcription factor binding sites in
proteobacterial genomes. Nucleic Acids Res 29: 774-782.
20. Molle, V., Fujita, M., Jensen, S.T., Eichenberger, P., Gonzalez-Pastor, J.E., Liu, J.S., and
Losick, R. (2003) The Spo0A regulon of Bacillus subtilis. Mol Microbiol 50: 1683-1701.
21. Navarre, W.W., Porwollik, S., Wang, Y., McClelland, M., Rosen, H., Libby, S.J., and Fang,
F.C. (2006) Selective silencing of foreign DNA with low GC content by the H-NS protein
in Salmonella. Science 313: 236-238.
22. Oliver, S.G., Winson, M.K., Kell, D.B., and Baganz, F. (1998) Systematic functional
analysis of the yeast genome. Trends Biotechnol 16: 373-378.
23. Pedersen, S., Bloch, P.L., Reeh, S., and Neidhardt, F.C. (1978) Patterns of protein synthesis
in E. coli: a catalog of the amount of 140 individual proteins at different growth rates. Cell
14: 179-190.
24. Ptacek, J., Devgan, G., Michaud, G., Zhu, H., Zhu, X., Fasolo, J., Guo, H., Jona, G.,
Breitkreutz, A., Sopko, R., McCartney, R.R., Schmidt, M.C., Rachidi, N., Lee, S.J., Mah,
A.S., Meng, L., Stark, M.J., Stern, D.F., De Virgilio, C., Tyers, M., Andrews, B., Gerstein,
M., Schweitzer, B., Predki, P.F., and Snyder, M. (2005) Global analysis of protein
phosphorylation in yeast. Nature 438: 679-684.
25. Rain, J.C., Selig, L., De Reuse, H., Battaglia, V., Reverdy, C., Simon, S., Lenzen, G., Petel,
F., Wojcik, J., Schachter, V., Chemama, Y., Labigne, A., and Legrain, P. (2001) The
protein-protein interaction map of Helicobacter pylori. Nature 409: 211-215.
26. Ren, B., Robert, F., Wyrick, J.J., Aparicio, O., Jennings, E.G., Simon, I., Zeitlinger, J.,
Schreiber, J., Hannett, N., Kanin, E., Volkert, T.L., Wilson, C.J., Bell, S.P., and Young,
R.A. (2000) Genome-wide location and function of DNA binding proteins. Science 290:
2306-2309.
27. Rivas, E., Klein, R.J., Jones, T.A., and Eddy, S.R. (2001) Computational identification of
noncoding RNAs in E. coli by comparative genomics. Curr Biol 11: 1369-1373.
28. Rodionov, D.A., Vitreschak, A.G., Mironov, A.A., and Gelfand, M.S. (2003) Regulation of
lysine biosynthesis and transport genes in bacteria: yet another RNA riboswitch? Nucleic
Acids Res 31: 6748-6757.
29. Sauer, U. (2004) High-throughput phenomics: experimental methods for mapping
fluxomes. Curr Opin Biotechnol 15: 58-63.
30. Schauer, N., Steinhauser, D., Strelkov, S., Schomburg, D., Allison, G., Moritz, T.,
Lundgren, K., Roessner-Tunali, U., Forbes, M.G., Willmitzer, L., Fernie, A.R., and Kopka,
J. (2005) GC-MS libraries for the rapid identification of metabolites in complex biological
samples. FEBS Lett 579: 1332-1337.
31. Schena, M., Shalon, D., Davis, R.W., and Brown, P.O. (1995) Quantitative monitoring of
gene expression patterns with a complementary DNA microarray. Science 270: 467-470.
32. Uetz, P., and Finley, R.L.Jr. (2005) From protein networks to biological systems. FEBS Lett
579: 1821-1827.
33. Unlu, M., Morgan, M.E., and Minden, J.S. (1997) Difference gel electrophoresis: a single
gel method for detecting changes in protein extracts. Electrophoresis 18: 2071-2077.
34. Van den Bergh, G., and Arckens, L. (2004) Fluorescent two-dimensional difference gel
electrophoresis unveils the potential of gel-based proteomics. Curr Opin Biotechnol 15: 38-43.
35. Van den Bergh, G., and Arckens, L. (2005) Recent advances in 2D electrophoresis: an array of
possibilities. Expert Rev Proteomics 2: 243-252.
36. van der Werf, M.J., Jellema, R.H., and Hankemeier, T. (2005) Microbial metabolomics:
replacing trial-and-error by the unbiased selection and ranking of targets. J Ind Microbiol
Biotechnol 32: 234-252.
37. VanBogelen, R.A., Schiller, E.E., Thomas, J.D., and Neidhardt, F.C. (1999) Diagnosis of
cellular states of microbial organisms using proteomics. Electrophoresis 20: 2149-2159.
38. Vogel, J., Bartels, V., Tang, T.H., Churakov, G., Slagter-Jager, J.G., Huttenhofer, A., and
Wagner, E.G. (2003) RNomics in Escherichia coli detects new sRNA species and indicates
parallel transcriptional output in bacteria. Nucleic Acids Res 31: 6435-6443.
39. Volker, U., and Hecker, M. (2005) From genomics via proteomics to cellular physiology of the
Gram-positive model organism Bacillus subtilis. Cell Microbiol 7: 1077-1085.
40. von Mering, C., Krause, R., Snel, B., Cornell, M., Oliver, S.G., Fields, S., and Bork, P.
(2002) Comparative assessment of large-scale data sets of protein-protein interactions.
Nature 417: 399-403.