High-resolution microarray analysis of RNA degradation in Escherichia coli

A thesis presented by Douglas Wayne Selinger to The Division of Medical Sciences in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the subject of Genetics

Harvard University
Cambridge, Massachusetts
November 22, 2002

Copyright 2002 by Douglas Wayne Selinger
All rights reserved.

Advisor: George M. Church

Abstract

Reductionist biological research has been one of the most successful scientific enterprises of our age, elucidating everything from the molecular basis of genetic information to the functioning of the cellular machinery. Perhaps it was inevitable that computers, the other salient scientific development of our time, would transform biology with their paradigms of miniaturization, automation, and digital information. DNA microarrays are an exciting product of this technological fusion, allowing the simultaneous monitoring of thousands of RNA transcripts in a miniaturized, massively parallel, machine-readable format. In Chapter 2, I describe the first use of a "genome" microarray, which has probes for both open reading frames (ORFs) and intergenic regions in the sequenced model organism Escherichia coli MG1655. This array, synthesized by Affymetrix using a highly parallel light-directed in situ oligonucleotide synthesis method adapted from the semiconductor industry, contains almost 300,000 oligonucleotide probes of known sequence. This large number of oligos allows the genome to be sampled at an average resolution of ~1 oligonucleotide probe every 30 bases. In the course of this work I developed an RNA labeling protocol based on random priming useful for expression analysis in E. coli and potentially other prokaryotes.
I also developed a set of freely available software tools, collectively named Genome Array Processing Software (GAPS) (Appendix B), which are useful for analyzing gene expression data as well as for subgenic-resolution mapping of expression data to the genome. I describe the application of this technology to compare RNA expression profiles between cultures of E. coli growing in rich medium at logarithmic versus stationary phase. In Chapter 3, I describe a global analysis of RNA degradation which resulted in the measurement of as many as 2,679 RNA chemical half-lives (listed in Appendix C), representing ~60% of the known and predicted ORFs. High-resolution analysis of this rifampicin timecourse revealed that there are highly significant positional patterns in the degradation of different operonic regions, with 5' regions degraded more quickly and 3' ones more slowly. This result confirms, and further generalizes, the current model of a net 5' to 3' directionality of degradation.

Table of Contents

Chapter 1 - Introduction
1.1 Systems Biology Completion (SBC)
1.2 Prokaryotic DNA microarray analysis
1.3 RNA decay in E. coli
Chapter 2 - RNA expression analysis using a 30 base pair resolution Escherichia coli genome array
Chapter 3 - Global RNA half-life analysis in Escherichia coli reveals positional patterns of transcript degradation
Chapter 4 - Conclusion
Appendix A - Selinger D. W. et al., Nature Biotechnology 18, 1262-1268 (2000).
Appendix B - Genome Array Processing Software (GAPS) manual
Appendix C - Half-lives of 2,679 E. coli mRNAs

I dedicate this thesis to my wife, who is my support and inspiration, and to my parents, who let me find my path and gave me the strength to follow it. This thesis is also dedicated to those who marvel at the workings of nature, but do not have the means to explore them.

Acknowledgements

My decision to pursue a Ph.D.
degree was made in the 10th grade, precisely at the moment I learned about DNA and the molecular basis of genetics. It has been a long and exciting road from that moment to the completion of my thesis, and it has been possible only through the support of many people along the way. Let me begin, however, at the end. George Church has been a better guide for my scientific wanderings than I could have possibly hoped for. It is plain to see that his drive to continually push the envelope in his research is motivated by the pure joy of discovery. His lab is a haven for ideas and expertise from every field, where order is imposed gently, and mainly by example. George's visionary nature, unique brand of pragmatism, and exceptional rigor inspire emulation by his students and breed independent thinking. He enjoys his students' company, and is a devoted mentor and teacher. I cannot imagine a better place to have spent my graduate career. When I first arrived in the lab, I was fascinated by the spirited lab meeting debates, often with Fritz and Pete on opposing ends, and the rest of us moderating and contributing in between. They were always rigorous and hard-fought, but also constructive and balanced. They energized me, as they continue to do to this day. I learned the importance of thinking quantitatively, the utility of ignoring artificial boundaries between disciplines, and how to distinguish between things that can't be done and those that simply haven't been done yet. This initial group of people were both mentors and colleagues to me. I thank Fritz Roth for being the first to take me under his wing and for suggesting I apply DNA microarrays to RNA degradation. I thank Pete Estep for giving me an appreciation for the biotech business, many challenging debates, and his boundless and infectious energy. I thank Martha Bulyk for her wealth of experimental knowledge and her willingness to share it.
I thank Saeed Tavazoie for his enthusiasm for math and physics, and for proclaiming long before most biologists, "this data is screaming to be clustered!" I thank Dereth Phillips for being a model teacher, matriarch of the lab, and spreader of good cheer and good science. I thank Jason Johnson for many insights into protein structure and for the best fish fry in history. I thank Jason Hughes, father of AlignACE, for his many critiques of computational biology issues. I thank Martin Steffen for helpful advice, beyond measure, on experimental design, and for the wealth of facts he brings to a discussion on just about any topic. I thank Rob Mitra for his wealth of humbly-presented, always helpful, questions and insights, and for his lab meeting presentations, which were filled with enough intrigue and methodical detective work to put Arthur Conan Doyle to shame. I thank John Aach for the rigorous and quantitative advice he humbly provides on almost any topic, and for many discussions on philosophy. I thank Abby McGuire for her contributions to microbial informatics and for many programming tips. I thank Keith Robison for introducing bioinformatics to Harvard. I had the pleasure of learning yeast genetics during a rotation with Fred Winston, who is both a master scientist and teacher. I also spent a rewarding rotation with Roger Brent who encouraged me to think about biology more broadly. Soon the field of genomics exploded and the Church lab expanded rapidly to include an exciting group of students and postdocs. I was incredibly fortunate to be surrounded by such bright and interesting people. They have been my companions through discussions of almost every topic imaginable. I thank Vasu Badarinarayana for being my prokaryotic ally as the lab turned eukaryotic (him too, eventually) and for being the first user of GAPS©.
I thank Kevin Cheung, a Harvard undergraduate at the time, for his help in carrying out the RNA rifampicin timecourse and for his excitement for research. I thank Barak Cohen for years of personal and practical advice and for stories from the front lines of competitive bird watching. I thank Patrik D'haeseleer for always being a cheerful and interested source of advice. I thank Adnan Derti for the initial BLASTing of the Affymetrix E. coli oligos against the genome, and for his general programming prowess and social conscience. I thank Aimee Dudley for her deep understanding of yeast genetics in particular, and experimental practice in general, which has so benefited the lab. I thank Jeremy Edwards for discussions about flux balance analysis and metabolism. I thank Yonatan Grad for endless, and I mean endless, (but generally funny) puns. I thank Xiaohua Huang for experimental advice on array signal amplification. I thank Jake Jaffe for teaching me about mass spectrometry and Mycoplasmas, and for being one of the few to join me on my biking expeditions. I thank Dan Janse for his parties and for his tea room companionship. I thank Peter Karchenko for his expertise on biochemical systems modeling. I thank Felix Lam for his help with the fermentor and his streamlined ordering system. I thank Kyriacos Leptos for identifying the linguistic roots of any word imaginable. I thank Nobuhisa Masuda for his helpful Nihongo (Japanese) lessons. I thank Tzachi Pilpel for his enthusiasm for teaching and mentoring and his early Matlab help. I thank Allegra Petti for sitting next to me in the computer lab and sharing the little frustrations that computers like to send our way. I thank Nick Reppas for his clear thinking as a TF for biophysics 101 and for sharing his experiences with Buddhism. I thank Wayne Rindone for keeping track of the important details of maintaining ExpressDB and helping me post my microarray data on the web.
I thank Dan Segre for discussions on biochemical modeling and life in Israel. I thank Jay Shendure for numerous interesting tea room discussions. I thank Priya Sudarsanam for following me from the Winston lab and for her fine example of how to go from a first rate biologist into a first rate computational biologist. I thank Matt Wright for his patient explanation of anything mathematical, physical, or chemical and for our shared, idealistic pursuit of the 'big' problems of science and philosophy. I thank "Dr. Kazu" Yanai for his Japanese restaurant guide, without which I would never have tasted shabu-shabu. I thank Zhou Zhu for being an example of dedication and for her cheerful computer room presence. It was always a pleasure dealing with Cindy Reyes, Isabelle Jacquet, Mary Beth, Eva Marie and Bob Tannis, and they kept the lab and the department running with absolute efficiency. Phil Leder made the Genetics department a great place to be, and as far as I'm concerned, Connie Cepko and the BBS administrators have put together the best Ph.D. program anywhere in the world. I thank the National Science Foundation and the Japanese MEXT for the Monbusho program which allowed me to spend the summer of 2001 in Kyoto, Japan. I thank Minoru Kanehisa for hosting me in his lab at Kyoto 'Daigaku' and Nakao-san and all the members of the Kanehisa lab for their amazing patience in teaching me everything from programming to sushi. My first year Child Hall floormates Joe, Laurie, Jeremiah, Nancy, Darby, Vyjayanthi, and Glen became my Harvard family, and Vyjayanthi, also my future sister-in-law. We grew together, pulled each other through the downs, and had many, many ups. Chuck and Paras, old high school friends, managed to follow me up to Boston and, in my good fortune, re-inserted themselves into my life.
Rutgers University prepared me well for my scientific career and continued to fan the flames of my intellectual curiosity, allowing me to explore philosophy, languages, and foreign cultures - including studying a year abroad in Bristol, England. After graduation, the Fulbright Association awarded me a scholarship to study in Madrid, Spain, where I worked with Manuel Espinosa and Gloria del Solar. In addition to science, I learned to speak Spanish and to see myself more as a citizen of the world. Mr. Kenneth Card, the now fabled 10th (and 12th) grade biology teacher, introduced me to DNA and nurtured my pursuit of knowledge in every way. Mr. Steven Holtzman, my 10th grade English teacher, pushed me to strive for excellence and to search for the deeper meanings of literature, and of life. While still in high school, I was given the extraordinary opportunity to learn cutting-edge molecular biology through a cooperative education program with the research labs of Hoffmann-La Roche in Nutley, New Jersey. Under the guidance of Mary Graves and Liberata DeSantis, I poured my first gels and made my first recombinant DNA constructs. This early experience gave me a tremendous clarity of purpose and propelled me into my chosen career. My training was bolstered in later years by summer internships at Merck and again at Hoffmann-La Roche, where I continued to grow as a scientist. Without these opportunities, I would not be where I am today. My family has made everything possible. My brother Jeff has been, over the years, my protégé and my role model, my philosophical companion and my friend. My sister Debbie is a great listener and has brought me through many times of doubt. I could never express enough gratitude to my parents, who have loved me and taught me so much. I am truly grateful to have met my wife, Rosanna Marlene, in the course of my doctoral work. She is my perfect companion, and I soar with her beside me.
Knowing I can share my accomplishments with her makes them especially sweet.

Chapter 1 - Introduction

1.1 Systems Biology Completion
This section will form the basis of an invited review article for Trends in Biotechnology. It has benefited from a number of discussions with many members of the Church lab, most notably Matthew Wright.

1.2 Prokaryotic DNA Microarray Analysis
This section describes the motivations for, and development of, experimental and computational tools for E. coli DNA microarray analysis.

1.3 RNA Decay in E. coli
This section reviews the current state of knowledge concerning RNA decay in E. coli, and summarizes the contributions made by the data and methods presented in this thesis.

1.1 Systems Biology Completion

The generation of large-scale biological data, like those described in this thesis, has created a great deal of excitement in biology. In the post-genomics era, we are reevaluating the ultimate goals of biology and the proper ends to which we should apply our newly-developed tools. A simple story has been told of the basic philosophy of science. It tells of a drunk man in search of his lost keys in the middle of the night. He searches only under lampposts, not because his keys are more likely to be there, but because those are the only places he has a hope of finding them. Scientists, too, search under the lampposts; the questions surrounding us in the darkness may be more interesting, or even more important, but they are beyond the elucidating beam of our experimental methods and must await a new day. This strategy has brought the natural sciences a long way: from Aristotle's passive observations, to Galileo's experimental probings, to our own elaborately contrived and controlled microdissections of nature. But we risk becoming too comfortable searching next to our favorite lamppost and ignoring the flickering of new lights as they come to life around us.
The floodlights have recently come on in biology in the form of systematic, quantitative, large-scale experiments with machine-readable outputs. Yes, we can shine them on our favorite genes, but it's clear we can also do far more. It's time to take stock of what has suddenly been illuminated, what is soon to be illuminated, and to map the boundary of the semi-darkness for those determined squinters among us. With new tools naturally come new goals. Classical molecular methods forced us to focus our gaze on small numbers of molecules at a time, so we laboriously built up descriptions in human language (predominantly English), pictures, and the occasional video clip. The overarching goal of biology, if there was one, was to compile a large number of systems that are interesting (those that define a general rule, break one, or appeal to us as idiosyncratic human beings) or applicable (those that contribute to the engineering, reverse-engineering, or modification of a system). The defining feature of this "compilation strategy" is that it is more a process than a goal. It specifies no endpoint other than continual accumulation. Long reserved for physicists searching for a "theory of everything", the idea of completion has now become pervasive in biology. The extent to which sequencing of complete genomes is taken for granted is well illustrated by a conversation I had with Sydney Brenner in 1998 at the Cold Spring Harbor Genome Meeting. After telling me how his group was almost finished with the sequence of a bacterial species, he realized he had forgotten its name. After a brief moment of embarrassment, he insisted that forgetting which genome one sequenced must be a milestone of some kind or another. Historians of science take note. (I should note myself that I have subsequently been unable to identify which genome he was referring to.)
But now that "completion" has entered the biologist's lexicon it raises the questions of where else it rightfully applies and whether it constitutes a new sort of goal for biological inquiry. The proliferation of the "-ome" suffix attests to widespread acceptance that biology is rife with things to be completed, whether it's the proteome, the metabolome, or the physiome. What sort of overarching goal, then, is implied by all these projects?

There seem to be two distinct levels of completion. The first, and simpler of the two, is 'parts list completion'. Put most simply, completion at this level is defined as the fraction of the total predicted parts that have been observed. This is well underway, and consists of the various 'ome' projects such as genomes, transcriptomes, and proteomes. The second, more ambitious and less well-defined level of completion, is at the level of 'systems biology', of how the parts work together to form a working biological system. It is systems biology completion (SBC) that I will discuss here.

SBC is necessarily model-dependent, requiring specification of a model type and its requisite components. Using a traditional ab initio modeling strategy we would start from a set of rules and, given an initial state, apply them to derive the future states of the system. This approach can be valuable if i) such rules can be discovered, ii) appropriate initial conditions can be stated, and iii) it is practical to calculate future states, at a relevant time resolution, with current computing capacity. An atomic model easily satisfies requirement i, and it may be possible to guess a relevant initial condition for requirement ii, but it is highly unlikely we will meet requirement iii for the system sizes and timescales relevant in biology.
Ordinary differential equation models also have their main difficulties in meeting requirement iii, because their nonlinearity can make them problematic for numerical solvers and because it can be difficult to choose an appropriate time step to capture a wide enough range of biologically relevant timescales while maintaining computability.

The goal of modeling may be stated as finding a set of rules which are capable of mapping the space of all possible inputs (Fig. 1, blue area), e.g. descriptions of the cell's environment, to the space of all possible outputs allowed by the cell (Fig. 1, yellow area), e.g. the concentration of all of its RNAs. By large-scale experimental sampling of input-output pairs (Fig. 1, red-yellow dot pairs), such as condition-transcriptome pairs, one may be able to derive rules that allow the prediction of outputs for novel inputs (Krupa 2002). The accuracy of these predictions, then, would be related to the density with which the input space is sampled, as well as to various properties of the input space itself.

[Figure 1: diagram with regions labeled Input, Rules, and Output]

Figure 1. A general schema for modeling as an exercise in mapping input space (blue area), e.g. all possible environments in which a cell can live, to output space (yellow area), e.g. all possible cellular responses. The red-yellow dot pairs represent measured input-output pairs, which, in large numbers, can be used to derive rules (arrows) to predict outputs for novel inputs.

We are then forced to consider how to determine when the input space is adequately sampled. In other words, how many measurements, at least to the order of magnitude, would it take to populate the space of all possible inputs (e.g. conditions) with enough measured outputs (e.g. transcriptomes, proteomes, etc.) to make interpolation useful? This is a difficult question, but we can begin by defining what factors would affect our estimate.
There are four factors which appear to be important: i) number of cell components, ii) conditions/cell types, iii) the required accuracy of prediction, and iv) the extent to which similar inputs give similar outputs. Firstly, the more components a cell has, such as the number of gene products, the more measurements we need to make. Secondly, the more environments in which a cell is capable of living, the larger the input space; and the more ways a cell is capable of responding, the larger the output space. Larger input and output spaces, of course, require more sampling. Thirdly, the accuracy needed for our model affects the number of measurements needed, because more accurate interpolations require a more densely sampled space. Finally, if nearby points in input space map to nearby points in output space (i.e. the mapping function is relatively smooth) then we do not need to sample as densely. With respect to time, we don't need to sample much more finely than the timescale of the phenomena of interest; with respect to conditions, we don't want to focus all of our measurements in a small region of biological possibility (say, small increments of glucose concentration) because we know the cell response will be largely identical. Likewise, all of our measurements should not be from the same differentiated cell type if we want a general model of cells defined by a genotype. At the extremes of estimates for SBC, a cell which lives in only one environment and never changes needs only one measurement to cover all of input-output space, while a cell which is capable of living in many environments and exhibits a different response to even small environmental changes would need a fine sampling of a very large space, therefore requiring many, many measurements. Of course, we are not completely ignorant about where on this spectrum actual biological systems lie.
Cells are not likely to reinvent themselves for slight changes of environment, but instead may rely on a relatively small number of programs which they use in combination to respond to the various natural environments for which they have evolved. In fact, a very simple cell, like Mycoplasma genitalium, may even be an example of a cell with approximately one state, as it seems to lack any transcriptional regulation and lives in an exquisitely controlled environment within its human host (Razin et al. 1998).

Large-scale experimental data may be useful in modeling by providing large numbers of constraints, and may therefore aid in large-scale determination of the model rules. One can attempt to make large-scale measurements of input-output pairs which uniformly span all of input and output space, and using rules derived from these observed mappings, predict the output for an unmeasured input. For example, we can make separate transcriptome measurements of E. coli after heat shock and after lac induction, and predict what the transcriptome might be for the combination of these two inductions. For orthogonal conditions, the rules may be simply additive, whereas for interdependent conditions the rules will probably be more complicated, perhaps involving intermediate induction or epistasis. Study of these more complicated cases can give us important information about the structure of the network.

The choice of a model type is a critical part of any SBC effort as it determines the type of rules which need to be discovered and the number and type of component measurements which need to be made. Table 1 gives examples of several model types. On one end of the spectrum, we can imagine atomic level, or even subatomic level descriptions of a complete cell.
While large-scale measurements at this level are not forthcoming in the foreseeable future (and certain measurements are impossible even in theory, according to the Heisenberg uncertainty principle), these model types set an upper bound on detail. Towards the lower end of the detail spectrum we have Boolean models, which we can build from logical statements such as, "if the lac repressor is bound to the operator then the lac operon is off."

Model | Scope | Applicable Rules | Model Components | # of Components | Examples of Components
Atomic | Cell c at time t | Physics | Atomic positions & momentums | 10^8-10^12 | 13C position & momentum
Molecular | Cell c at time t | Chemistry | Small molecule positions & momentums | 10^7-10^11 | Glucose position & momentum
Biomolecular (discrete) | Cell c at time t | Molecular Mechanics | Macromolecule positions & momentums | 10^6-10^10 | Hexokinase position & momentum
Biomolecular (statistical) | Biochemically equivalent cells | Chemical kinetics & thermodynamics described by differential equations | Macromolecule concentrations, compartments | 10^5-10^7 | Hexokinase concentration in cytoplasm
Biomolecular (steady-state) | Genetically equivalent cells, similar growth conditions, steady state | Flux Balance | Molecular fluxes | 10^3-10^4 | Flux of Glucose to Glucose-6P
Boolean | Genetically equivalent cells | Genetic and Metabolic "circuits" | Regulons, Pathways | 10^2-10^3 | Glycolysis "on", Gluconeogenesis "off"
Cell Population | Equivalent inoculums and culture conditions | Growth kinetics, reproductive fitness | Cell growth rates | 10^0-10^1 | # of wild type cells, # of mutant cells

Table 1. Examples of hypothetical systems biology projects to be completed, listed from most complex (top) to least (bottom). We can currently collect complete component datasets for some classes of biomolecules at the level of macromolecular concentrations.

As we move from more to less detailed models we make certain trade-offs.
The more detailed models make fewer assumptions, and are therefore potentially more accurate for the systems they describe. On the other hand, they tend to be more problematic with regard to computability and component measurement, and are therefore difficult to apply to large systems. As we enhance our ability to make large numbers of measurements, we may be able to generate enough input-output pairs, i.e. constraints, to allow SBC using more and more detailed model types. Using order of magnitude component estimates, together with the considerations of input-output space size and sampling discussed previously, we can get a rough idea of the number of measurements which might be needed for SBC of a particular system at a given level of detail. While admittedly rough, such an estimate would represent a conceptual starting point.

In the pregenomic era, our sampling of input-output space was far too sparse for most model organisms and model types to warrant a claim of SBC. Component measurements were hard to come by and were acquired by any means necessary: from one-at-a-time extraction from the literature to educated guesses. As large-scale biology proceeds, we are dramatically increasing our capability to accurately sample significant amounts of input-output space. Large-scale RNA half-life measurements, like those described in this thesis, could eventually contribute to SBC of a biomolecular statistical model, in which the concentrations of all biomolecules and their changes with respect to time are incorporated into a set of differential equations. Judicious use of this newly powerful experimental sampling capability could lead to justified claims of SBC for systems of increasing complexity.
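As a minimal sketch of how a measured RNA half-life might enter a differential-equation model of the kind just described, the following treats a single transcript as obeying dR/dt = k_syn - k_deg * R, with k_deg = ln(2) / half-life. The single-transcript simplification, the synthesis rate, the function names, and the forward Euler integrator are all illustrative assumptions and are not taken from the methods of this thesis.

```python
import math


def decay_constant(half_life_min):
    """First-order decay constant (per minute) implied by a half-life."""
    return math.log(2) / half_life_min


def steady_state_level(k_syn, half_life_min):
    """Level at which synthesis balances decay: R = k_syn / k_deg."""
    return k_syn / decay_constant(half_life_min)


def simulate(k_syn, half_life_min, r0=0.0, t_end=60.0, dt=0.01):
    """Integrate dR/dt = k_syn - k_deg * R with forward Euler steps."""
    k_deg = decay_constant(half_life_min)
    r = r0
    for _ in range(int(round(t_end / dt))):
        r += dt * (k_syn - k_deg * r)
    return r


if __name__ == "__main__":
    # Hypothetical transcript: 10 copies/min synthesized, 5 min half-life.
    print(steady_state_level(10.0, 5.0))
    print(simulate(10.0, 5.0))
```

Under these assumptions the simulated level approaches the analytic steady state after several half-lives, which illustrates why both a synthesis rate and a decay rate are needed to predict a transcript's concentration.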
1.2 Prokaryotic DNA microarray analysis

While the seeds for microarray technology had been planted long ago (Gillespie and Spiegelman 1965; Grunstein and Hogness 1975; Lennon and Lehrach 1991), the field has truly exploded in the last half-decade, and this has resulted in a radical change in the landscape of modern biology. When my work on this thesis began in earnest at the beginning of 1998, a search on PubMed with the keyword "microarray" would have yielded only 7 articles on DNA microarrays. That same search run today (October 2002) yields more than 2,300 articles. Given the rapid pace of recent developments, it is important to put the present work into 'historical' context.

DNA microarray analysis was initially developed for gene expression analysis in eukaryotes (Lockhart et al. 1996; Schena et al. 1996). As such, initial RNA labeling protocols were developed to take advantage of the ubiquitous polyA tails of eukaryotic messenger RNAs, which allowed them to be preferentially labeled over the far more abundant ribosomal and transfer RNAs. Prokaryotes, of course, are of central importance in biology, and were of particular interest to us because of their relatively small genomes, which make them potential model organisms for systems biology. We were, therefore, interested in extending microarray analysis to prokaryotes in general, and to the classical model organism Escherichia coli in particular. Thus, our initial contact with Affymetrix involved a collaboration to develop a labeling protocol useful for prokaryotes which included access to newly-designed E. coli oligonucleotide arrays.

Development of an RNA labeling protocol (which for the Affymetrix platform generally means biotinylation) proved to be difficult, ultimately taking about 1½ years.
Some of the factors which we considered during protocol development were: biotinylation efficiency, cost of the labeling reagent (and the quantity needed), amount of interaction of unincorporated labeling reagent with the array surface, robustness and relative complexity of the protocol, and its generalizability to other prokaryotes. Our initial strategies proved unsuccessful, including several direct chemical RNA labeling methods, polyadenylation with the catalytic subunit of yeast poly(A) polymerase using biotinylated ATP, and polyadenylation followed by the standard Affymetrix labeling protocol (polyT priming, double-stranded cDNA synthesis, followed by T7 in vitro transcription with biotinylated ribonucleotides to create labeled cRNA). These methods typically yielded high fluorescent signal for rRNA and tRNA features, but almost none for mRNAs. A variety of on-chip (i.e. after hybridization) signal amplification methods were also tried unsuccessfully, including on-chip polyadenylation using yeast poly(A) polymerase and biotinylated ATP. The standard Affymetrix staining protocol involves the use of streptavidin-phycoerythrin (streptavidin to bind the biotinylated target nucleic acid, phycoerythrin as a fluorophore). An optional amplification step can be added using a biotinylated anti-streptavidin antibody, followed by another streptavidin-phycoerythrin stain. Iterations of this amplification procedure were explored as a way to increase the signal-to-noise ratio of mRNA probes. I found that although I could get a reproducible 2-3 fold increase in technical signal-to-noise (where signal-to-noise ratio is defined as fluorescent intensity divided by the standard deviation of the background), it did not increase the number of mRNAs I was able to detect.
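The signal-to-noise definition given above (fluorescent intensity divided by the standard deviation of the background) can be sketched as follows. The function names and the example intensities are hypothetical, and the use of the sample standard deviation (rather than the population form) is an assumption, since the text does not specify which was used.

```python
import statistics


def signal_to_noise(feature_intensity, background_intensities):
    """Feature intensity divided by the standard deviation of the background.

    Uses the sample standard deviation (an assumption; the source does not
    say which estimator was used).
    """
    sd = statistics.stdev(background_intensities)
    if sd == 0:
        raise ValueError("background has zero variance; ratio is undefined")
    return feature_intensity / sd


def fold_change_in_snr(snr_before, snr_after):
    """Fold increase in signal-to-noise, e.g. after an amplification round."""
    return snr_after / snr_before


if __name__ == "__main__":
    # Hypothetical background pixel intensities and one feature intensity.
    background = [100.0, 102.0, 98.0, 101.0, 99.0]
    print(signal_to_noise(500.0, background))
```

Note that a higher ratio computed this way reflects only the technical contrast of a feature against background; as described above, it did not translate into more detectable mRNAs.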
Ultimately, I was successful in developing a protocol based on chemical fragmentation of total RNA, single-strand cDNA synthesis using random octamer primers, and 3' biotinylation by terminal deoxynucleotidyl transferase (TdT) using biotinylated dideoxynucleotides. (Use of TdT for the biotinylation gave slightly less signal, but significantly lower chip background, than incorporation of biotinylated nucleotides during the cDNA synthesis step.) The protocol originally required 1 mg of total RNA, but this requirement was subsequently reduced to ~100 µg in our hands, and to ~20 µg using a somewhat different random-priming protocol independently developed by Affymetrix (Rosenow et al. 2001). Details of the protocol can be found in the methods section of Chapter 2.

Initial attempts to analyze the resulting data with GeneChip software (version 3.2) were problematic and revealed a number of limitations of Affymetrix's software package. First of all, the algorithms for transcript detection and quantitation were developed empirically for eukaryotic transcription analysis and it wasn't clear whether they would perform reliably with the increased noisiness of prokaryotic experiments (due, presumably, to increased cross-hybridization from ribosomal and transfer RNAs). Furthermore, the algorithm was kept secret by Affymetrix, preventing us from assessing or modifying it. Additionally, their metrics were not based on standard statistical methods, making interpretation of the results difficult. A number of other limitations were apparent, including poor annotation and an inability to access data from individual oligos on a large scale. (It should be noted that serious attempts were made to address all of these issues in MAS 5.0, a major re-write of Affymetrix's microarray analysis software.)
These considerations led me to write a series of Perl scripts, collectively named Genome Array Processing Software or GAPS, which directly accessed the raw .CEL files generated by GeneChip, and did all subsequent processing in a more flexible and statistically rigorous manner. A detailed survey and explanation of the features of GAPS can be found in Appendix B. At our insistence, we were provided full access to the sequences of the oligonucleotides on the E. coli arrays, despite the fact that, at the time, these sequences were a well-guarded Affymetrix secret. This sequence knowledge ultimately allowed us to develop novel analyses which took full advantage of the tremendous density of oligos, which sampled the genomic sequence, on average, once every 30 bases. We envisioned such sub-genic resolution would allow important biological measurements to be made, such as the identification of transcript boundaries, abortive termination events, and other position-specific features of transcription and RNA degradation. After winning the approval of Affymetrix, we were allowed to release the complete set of E. coli oligos as a supplement to our publication (Selinger et al. 2000) and as part of GAPS, which was the first microarray analysis tool to allow global subgenic-resolution expression analysis. This feature ultimately led to the discovery of a 5' to 3' directionality of RNA decay, described in Chapter 3. This first-ever release of Affymetrix oligo sequence data proved very popular with the scientific community and was shortly followed by the public release of complete sequence information for all Affymetrix chips. I believe this degree of openness is vital for microarray data interpretation, including meta-analysis, quality control, and the development of novel experimental and computational analyses.
Although microarray expression analysis of prokaryotes is perhaps now taken for granted, the work described in Chapter 2 represents one of the first global RNA expression profiles of E. coli and the first using the Affymetrix platform (Arfin et al. 2000; Khodursky et al. 2000; Richmond et al. 1999; Tao et al. 1999). Additionally, it represents the first RNA expression analysis in any organism to be conducted at subgenic-level resolution. Subgenic-resolution expression analysis has more recently been applied to humans (Kapranov et al. 2002; Shoemaker et al. 2001) and is emerging as an important tool for empirical transcription boundary mapping and exon discovery/verification.

1.3 RNA Decay in E. coli

Gene expression is controlled on many different levels, including transcription, RNA degradation, translation, and post-translation. Steady state gene expression is a result of the combined kinetics of several of these processes. Historically, studies of gene regulation have focused on transcription and translation, with relatively little effort devoted to understanding the mechanisms of RNA degradation. Half-lives of transcripts in E. coli can vary anywhere from 40 seconds to 20 min, suggesting that there may be a significant amount of regulation at the level of RNA stability, and that RNA degradation is not merely a constitutively active salvage pathway (Kushner 2002). Here I present a brief review of the current state of knowledge of RNA decay in E. coli. RNA degradation in E. coli is largely accounted for by three central enzymes: two 3'-5' exonucleases (RNase II and polynucleotide phosphorylase, PNPase) and a 5'-end-dependent endonuclease (RNase E). Transcript cleavage is often observed to occur in a 5' to 3' direction (Bechhofer 1993; Carpousis et al. 1999).
It has been proposed that this is due to a rate limiting initial cleavage by RNase E, which is inhibited by 5' stem-loop structures as well as the triphosphate present at the 5' termini of a new transcript (Mackie 1998). Once this initial endonucleolytic cleavage is made, possibly with the aid of additional targeting factors, the rest of the transcript, which now lacks a 5' triphosphate or a protective secondary structure, is rapidly degraded. RNase E cleavage is quickly followed by exonucleolytic digestion in the 3' to 5' direction. Stem loop structures are known to play an important role in the stabilization of transcripts. 5' stem loop structures have the strongest stabilizing effect, accounting for some of the longest lived mRNAs in the cell, and can confer similar stability to transcripts to which they are fused (Chen et al. 1991; Emory et al. 1992; Lopez and Dreyfus 1996). They are thought to confer stability by inhibiting downstream cleavage by RNase E (and possibly other 5'-end-dependent nucleases). RNase II and PNPase are both inhibited by stable stem-loops (although RNase II more so), which are often present at the 3' end of transcripts as a result of rho-independent termination (Higgins et al. 1993). Polyadenylation has also been shown to play a role in mRNA degradation (O'Hara et al. 1995). E. coli contains two poly(A) polymerases (PAPI and PAPII). Depending on the gene, anywhere between 2 and 50% of its transcripts will have a poly(A) tail of between 10 and 50 nucleotides. This tail has been proposed to affect mRNA stability differently depending on its context (Sarkar 1996; Sarkar 1997). For transcripts which lack a 3' stem loop structure, polyadenylation acts as a stabilizing factor, presumably by competing with 3'-5' exonucleases to add instead of remove nucleotides.
For transcripts which have a stable stem loop, polyadenylation creates a site which is recognized by the RNA degradosome - a complex which contains RNase E, PNPase, RhlB (an RNA helicase) and enolase (whose function in this complex is unclear). This complex then rapidly degrades the transcript through an unknown mechanism (although given the members of the complex it's not hard to imagine one). The link between translation and mRNA stability has also been investigated (Arnold et al. 1998; Petersen 1993). The assumption is that frequently transiting ribosomes may reduce the accessibility of the transcript to nucleolytic attack. Ribosomes have been found to have a stabilizing effect on transcripts, though the extent of the stabilization varies greatly from transcript to transcript and depends on the mechanism of degradation. The list of players on the mRNA degradation scene is still longer (Ehretsmann et al. 1992; Kushner 1996). Notably missing from the above discussion is RNase III, which cleaves in double-stranded regions and is known to play a role in the degradation of a subset of E. coli transcripts. There are about 20 ribonucleases in all, many of which still await characterization. There is still a tremendous amount to be learned about the mechanisms and players involved in mRNA degradation in E. coli. Analysis of this process on a global scale is likely to yield crucial insights into the genetic regulation of prokaryotes. Importantly, by studying large numbers of RNAs, and the details of their degradation, one can begin to identify common patterns. Bioinformatic analysis, or further experiments, may then help identify features shared by these transcripts which are responsible for their particular mode of degradation. Furthermore, large-scale measurements can help determine whether known degradation mechanisms are general for many transcripts, or specific to the relatively small number of RNAs which have been studied so far.
In this fashion, Chapter 3 makes a number of contributions to the study of prokaryotic RNA decay and sets the groundwork for a number of possible future studies. Before the advent of microarray analysis, the degradation of fewer than 25 bacterial RNAs had ever been studied (Bernstein et al. 2002). Here I present measured half-lives for as many as 2,679 mRNAs (Appendix C), representing about 60% of the known and predicted ORFs. Furthermore, I describe the first global positional analysis of RNA degradation, in which it is found that the 5' ends of operons degrade significantly faster than the 3' ends. Groups of operons with similar degradation patterns were identified, allowing mechanistic explanations for their decay to be sought.

References

Arfin, S.M., A.D. Long, E.T. Ito, L. Tolleri, M.M. Riehle, E.S. Paegle, and G.W. Hatfield. 2000. Global gene expression profiling in Escherichia coli K12. The effects of integration host factor. J Biol Chem 275: 29672-29684.
Arnold, T.E., J. Yu, and J.G. Belasco. 1998. mRNA stabilization by the ompA 5' untranslated region: two protective elements hinder distinct pathways for mRNA degradation. RNA 4: 319-330.
Bechhofer, D. 1993. 5' mRNA Stabilizers. In Control of Messenger RNA Stability (ed. G.B. Joel Belasco), pp. 31-50. Academic Press, Inc., San Diego.
Bernstein, J.A., A.B. Khodursky, P.H. Lin, S. Lin-Chao, and S.N. Cohen. 2002. Global analysis of mRNA decay and abundance in Escherichia coli at single-gene resolution using two-color fluorescent DNA microarrays. Proc Natl Acad Sci U S A 99: 9697-9702.
Carpousis, A.J., N.F. Vanzo, and L.C. Raynal. 1999. mRNA degradation. A tale of poly(A) and multiprotein machines. Trends Genet 15: 24-28.
Chen, L.H., S.A. Emory, A.L. Bricker, P. Bouvet, and J.G. Belasco. 1991. Structure and function of a bacterial mRNA stabilizer: analysis of the 5' untranslated region of ompA mRNA. J Bacteriol 173: 4578-4586.
Ehretsmann, C.P., A.J. Carpousis, and H.M. Krisch. 1992. mRNA degradation in procaryotes. FASEB J 6: 3186-3192.
Emory, S.A., P. Bouvet, and J.G. Belasco. 1992. A 5'-terminal stem-loop structure can stabilize mRNA in Escherichia coli. Genes Dev 6: 135-148.
Gillespie, D. and S. Spiegelman. 1965. A quantitative assay for DNA-RNA hybrids with DNA immobilized on a membrane. J Mol Biol 12: 829-842.
Grunstein, M. and D.S. Hogness. 1975. Colony hybridization: a method for the isolation of cloned DNAs that contain a specific gene. Proc Natl Acad Sci U S A 72: 3961-3965.
Higgins, C., H. Causton, G. Dance, and E. Mudd. 1993. The Role of the 3' End in mRNA Stability and Decay. In Control of Messenger RNA Stability (ed. G.B. Joel Belasco), pp. 13-27. Academic Press, Inc., San Diego.
Kapranov, P., S.E. Cawley, J. Drenkow, S. Bekiranov, R.L. Strausberg, S.P. Fodor, and T.R. Gingeras. 2002. Large-scale transcriptional activity in chromosomes 21 and 22. Science 296: 916-919.
Khodursky, A.B., B.J. Peter, N.R. Cozzarelli, D. Botstein, P.O. Brown, and C. Yanofsky. 2000. DNA microarray analysis of gene expression in response to physiological and genetic changes that affect tryptophan metabolism in Escherichia coli. Proc Natl Acad Sci U S A 97: 12170-12175.
Krupa, B. 2002. On the Number of Experiments Required to Find the Causal Structure of Complex Systems. J Theor Biol 219: 257-267.
Kushner, S. 1996. mRNA Decay. In Escherichia coli and Salmonella (ed. F. Neidhardt), pp. 851-858. ASM Press, Washington.
Kushner, S.R. 2002. mRNA decay in Escherichia coli comes of age. J Bacteriol 184: 4658-4665; discussion 4657.
Lennon, G.G. and H. Lehrach. 1991. Hybridization analyses of arrayed cDNA libraries. Trends Genet 7: 314-317.
Lockhart, D.J., H. Dong, M.C. Byrne, M.T. Follettie, M.V. Gallo, M.S. Chee, M. Mittmann, C. Wang, M. Kobayashi, H. Horton, and E.L. Brown. 1996. Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat Biotechnol 14: 1675-1680.
Lopez, P.J. and M. Dreyfus. 1996. The lacZ mRNA can be stabilised by the T7 late mRNA leader in E coli. Biochimie 78: 408-415.
Mackie, G.A. 1998. Ribonuclease E is a 5'-end-dependent endonuclease. Nature 395: 720-723.
O'Hara, E.B., J.A. Chekanova, C.A. Ingle, Z.R. Kushner, E. Peters, and S.R. Kushner. 1995. Polyadenylylation helps regulate mRNA decay in Escherichia coli. Proc Natl Acad Sci U S A 92: 1807-1811.
Petersen, C. 1993. Translation and mRNA Stability in Bacteria: A Complex Relationship. In Control of Messenger RNA Stability (ed. G.B. Joel Belasco), pp. 117-141. Academic Press, Inc., San Diego.
Razin, S., D. Yogev, and Y. Naot. 1998. Molecular biology and pathogenicity of mycoplasmas. Microbiol Mol Biol Rev 62: 1094-1156.
Richmond, C.S., J.D. Glasner, R. Mau, H. Jin, and F.R. Blattner. 1999. Genome-wide expression profiling in Escherichia coli K-12. Nucleic Acids Res 27: 3821-3835.
Rosenow, C., R.M. Saxena, M. Durst, and T.R. Gingeras. 2001. Prokaryotic RNA preparation methods useful for high density array analysis: comparison of two approaches. Nucleic Acids Res 29: E112.
Sarkar, N. 1996. Polyadenylation of mRNA in bacteria. Microbiology 142 (Pt 11): 3125-3133.
Sarkar, N. 1997. Polyadenylation of mRNA in prokaryotes. Annu Rev Biochem 66: 173-197.
Schena, M., D. Shalon, R. Heller, A. Chai, P.O. Brown, and R.W. Davis. 1996. Parallel human genome analysis: microarray-based expression monitoring of 1000 genes. Proc Natl Acad Sci U S A 93: 10614-10619.
Selinger, D.W., K.J. Cheung, R. Mei, E.M. Johansson, C.S. Richmond, F.R. Blattner, D.J. Lockhart, and G.M. Church. 2000. RNA expression analysis using a 30 base pair resolution Escherichia coli genome array. Nat Biotechnol 18: 1262-1268.
Shoemaker, D.D., E.E. Schadt, C.D. Armour, Y.D. He, P. Garrett-Engele, P.D. McDonagh, P.M. Loerch, A. Leonardson, P.Y. Lum, G. Cavet, L.F. Wu, S.J. Altschuler, S. Edwards, J. King, J.S. Tsang, G. Schimmack, J.M. Schelter, J. Koch, M. Ziman, M.J. Marton, B. Li, P. Cundiff, T. Ward, J. Castle, M. Krolewski, M.R. Meyer, M. Mao, J. Burchard, M.J. Kidd, H. Dai, J.W. Phillips, P.S. Linsley, R. Stoughton, S. Scherer, and M.S. Boguski. 2001. Experimental annotation of the human genome using microarray technology. Nature 409: 922-927.
Tao, H., C. Bausch, C. Richmond, F.R. Blattner, and T. Conway. 1999. Functional genomics: expression analysis of Escherichia coli growing on minimal and rich media. J Bacteriol 181: 6425-6440.

Chapter 2

RNA expression analysis using a 30 base pair resolution Escherichia coli genome array

Douglas W. Selinger, Kevin J. Cheung, Rui Mei, Eric M. Johansson, Craig S. Richmond, Frederick R. Blattner, David J. Lockhart, and George M. Church

As published in Nature Biotechnology 18(12): 1262-68 (2000).

A high resolution ‘genome array’ has been developed for the study of gene expression and regulation in Escherichia coli. This array contains on average one 25-mer oligonucleotide probe per 30 base pairs over the entire genome, with one every 6 bases for the intergenic regions and every 60 bases for the 4,290 open reading frames (ORFs). Two-fold concentration differences can be detected at levels as low as 0.2 mRNA copies per cell, and differences can be seen over a dynamic range of 3 orders of magnitude. In rich medium we detected transcripts for 97% and 87% of the ORFs in stationary and log phases, respectively. 1,529 transcripts were found to be differentially expressed under these conditions. As expected, genes involved in translation were expressed at higher levels in log phase, whereas many genes known to be involved in the starvation response were expressed at higher levels in stationary phase. Many novel growth-phase regulated genes were identified, such as a putative receptor (b0836) and a 30S ribosomal protein subunit (S22), both of which are highly upregulated in stationary phase. Transcription of between 3,000 and 4,000 predicted ORFs was observed from the antisense strand, suggesting most of the genome is transcribed at a detectable level.
Examples are also presented for high resolution array analysis of transcript start and stop sites and RNA secondary structure.

Keywords: E. coli, stationary phase, gene expression, functional genomics, DNA chips, oligonucleotide arrays, microarrays

The ability to simultaneously measure RNA abundance for large numbers of genes has revolutionized biological research by allowing the analysis of global gene expression patterns. Oligonucleotide arrays have been used to examine differential gene expression in many organisms, including yeast, human, mouse, and bacteria1-5. Various analytical approaches have been developed and applied to these datasets to further characterize transcriptional regulation and the connectivity of genetic networks6-10. Global gene expression analyses in prokaryotes have lagged behind those in eukaryotes in part because of the lack of polyadenylation of prokaryotic mRNA, which has thwarted separation or selective labeling of mRNA in the presence of the much more abundant tRNA and rRNA1, 11-13. We describe here a ‘genome array,’ on which both coding and non-coding regions of the Escherichia coli genome are represented, and describe a genome-wide analysis of RNA at sub-transcript level resolution. A labeling protocol was developed based on random priming of total RNA which is reproducible, quantitative over 3 orders of magnitude, and sufficiently sensitive to detect as few as 0.2 copies per cell. When used to compare gene expression in log versus stationary phase, this method yields results which both agree with the literature and identify novel sets of co-regulated genes. We also present evidence that sub-transcript level resolution paired with complete genomic representation of E. coli on the array allows for analysis of operon structure, identification of small RNAs and antisense RNAs, and some aspects of RNA secondary structure.

Results and discussion

Array design.
The array consists of a 544 by 544 grid of 24 x 24 micron regions that each contain ~10^7 copies of selected 25-mer oligonucleotides (295,936 total) of defined sequence. The oligonucleotides on the array are synthesized in situ on a derivatized glass surface using a combination of photolithography and combinatorial chemistry2, 14. Probe oligonucleotides are arranged in pairs, or probe pairs, one of which is perfectly complementary to the target sequence (the perfect match, or PM oligonucleotide) and one with a single base mismatch at the central position (the mismatch, or MM oligonucleotide) which serves as a control for nonspecific hybridization. Oligonucleotides on the array are further organized into groups, or probe sets, which are complementary to different regions of the same putative transcript. Probe sets are present for 4,403 'b-numbers', which include all 4,290 predicted ORFs15, as well as all rRNAs and tRNAs. Both strands of intergenic regions at least 40 bp in length are represented whereas only the strand predicted to be transcribed is represented for the ORFs. Most probe sets have 15 probe pairs, although certain selected RNAs, such as lpp and Bacillus subtilis control transcripts, have 60 or more. Oligonucleotides are arranged in alternating rows of PM and MM features (Fig. 1). The top half of the array contains oligonucleotides targeting ORFs and miscellaneous untranslated RNAs, and the bottom half targets intergenic regions. The extreme bottom has probes for tRNAs and rRNAs. A biotinylated control oligonucleotide is added to the hybridization mixture and binds to the checkerboard border, corners, the AFFX-E COLI1 logo, and 100 pairs of features in a regularly spaced grid across the array. These patterns are used for grid alignment and to correct for spatial variations in array brightness (see Experimental Protocols).

Choice of a metric for RNA abundance.
Signals from the 15 probe pairs in each probe set must be quantitated and combined into a measure of RNA concentration. The significant systematic differences in signal within a probe set for a given RNA led us to investigate metrics which used different regions of the signal distribution, in addition to the previously reported "average difference" metric2, 3, or AD, which uses the mean of all PM-MMs after outliers are discarded. When probe pairs of the probe sets were ranked by intensity difference (PM-MM), and probe pairs of different ranks were used to represent the entire probe set, we found that the number of genes detected increased as brighter probe pairs were used. An exception was the brightest probe pair, which gave fewer detected transcripts because of the high variability of the maximal probe pair of the negative controls. Transcripts were considered detected if the probe pair intensity difference of a given rank was at least 3 standard deviations above the mean of probe pairs of the same rank taken from control probe sets for which no transcript was present (see Experimental Protocols). Using the second maximal probe pair, 87% of the ORFs were detected in log phase, compared to 23% for the maximal and 70% for the third maximal. The use of the second maximal signal also led to the detection of more RNAs than measures of central tendency such as the median intensity (20%) and AD (18%). We therefore chose to use the second maximal probe pair intensity, or '2max', as a metric for RNA abundance. The three metrics investigated (2max, the median, and AD) had a sensitivity of less than 0.2 copies/cell, were approximately linear for relative changes less than 10-fold, nonlinear over a dynamic range of 3 orders of magnitude, and were about equally precise (R ≈ 0.94) (Fig. 2).
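The rank-based metric and detection rule described above can be sketched in a few lines. This is a minimal illustration, not the GAPS implementation: the function names, the use of the sample standard deviation, and all intensity values are hypothetical and chosen only to show the logic.

```python
import statistics


def ranked_difference(pm, mm, rank=2):
    """PM-MM intensity difference of the given rank (1 = brightest, 2 = '2max')."""
    diffs = sorted((p - m for p, m in zip(pm, mm)), reverse=True)
    if rank < 1 or rank > len(diffs):
        raise ValueError("rank is outside the range of the probe set")
    return diffs[rank - 1]


def is_detected(pm, mm, control_sets, rank=2, n_sd=3.0):
    """True if the ranked PM-MM is at least n_sd standard deviations above the
    mean of the same-rank PM-MM values from negative-control probe sets."""
    control_values = [ranked_difference(cpm, cmm, rank) for cpm, cmm in control_sets]
    # Assumption: sample standard deviation of the control values.
    threshold = statistics.mean(control_values) + n_sd * statistics.stdev(control_values)
    return ranked_difference(pm, mm, rank) >= threshold


# Hypothetical three-probe-pair sets (real probe sets have 15 or more pairs).
controls = [
    ([10, 20, 30], [5, 5, 5]),
    ([12, 22, 28], [5, 5, 5]),
    ([8, 18, 32], [5, 5, 5]),
]
expressed = ([100, 500, 300], [50, 100, 100])
silent = ([10, 20, 25], [5, 5, 5])

print(ranked_difference(*expressed))      # 2max of the expressed example
print(is_detected(*expressed, controls))
print(is_detected(*silent, controls))
```

Passing `rank=1` or `rank=3` gives the maximal and third maximal variants that the text compares against 2max.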
The lowest concentration of RNA for which a 2-fold concentration difference could be detected was a change from 0.2 to 0.4 copies per cell, which was called significant in 4/4 probe sets with an average measured fold change of 1.65 +/- 0.35. We detected spiked RNAs from 100% (12/12) of probe sets at 0.2 copies/cell and 25% (2/8) at 0.02 copies/cell.

Stationary Phase vs. Log Phase Expression Analysis. We compared the expression profiles of cells grown in rich media (LB) to either mid-log phase (OD600 = 0.6) in a fermentor or to late stationary phase in an overnight shaken culture. As expected, log phase cells showed increased RNA levels for genes involved in protein synthesis (rRNAs, tRNAs, and ribosomal proteins) and cell membrane synthesis (lpp) while stationary phase cells showed increases in stress/starvation response genes such as dps and rmf. Of 69 genes known to be differentially regulated in stationary phase16, 22 were called significantly changed in agreement with the literature (Table 1). One gene, rpoH, which is known to be regulated post-transcriptionally17, was called significantly changed in the reverse direction from that reported. The remaining 46 were not significantly changed. Some discrepancies and apparent "missed" changes are expected because most of the changes reported in the literature were detected at the protein level (usually by activity of lacZ fusions) and the correlation between gene transcript levels and protein product activity is expected to be imperfect. A notable transcript which was not called changed is the gene for the stationary phase sigma factor, rpoS. This is expected because the transcript is known to peak in early stationary phase and decrease thereafter, and therefore may not be significantly elevated by late stationary phase. RpoS is also known to be regulated at the level of translation and protein stability18.
However, the mRNA levels of 16 genes known to be rpoS regulated are increased in stationary phase, suggesting that rpoS activity has, in fact, increased. Altogether, there were 1,529 RNAs (including tRNAs and rRNAs) whose abundance significantly changed (see Experimental Protocol), which represents about 35% of the putative 4,403 RNAs in the genome. 926 were increased in stationary phase and 603 were decreased. Of these, 77% were changed by more than 2-fold. It is unclear how many of these changes have biological significance and whether the size of the absolute change (copies per cell) or relative change is more important in the regulation of genetic networks, although it is likely to be gene- and condition-dependent. For genes with post-transcriptional regulation, changes in transcript level may have little effect on the final activity of the gene product. Still, the sheer number of changes detected suggests there are many transcriptionally regulated genes important for adaptation to stationary phase, or stresses in general, which have previously gone unrecognized. It is interesting to note that of the 25 RNAs most increased in stationary phase (ranked by absolute change), 14 are genes of unknown function (Table 2). This includes a gene (b0836), annotated as a putative receptor19, which is measured to increase in stationary phase by more than 1000-fold and 30S ribosomal protein subunit S22 which increases 48-fold. Also found in the top 10 most increased in stationary phase are yjbJ, hdeA, and dps whose protein products were reported to be the first, sixth, and fifth most abundant in stationary phase, respectively20. Of the 10 genes of "known" function, only 3 were already known to be increased in stationary phase. The complete results of this analysis are in an expression database21, 22. 
Novel Applications of a Genome Array: Identification of Small and Antisense RNAs

Inclusion of probes for predicted intergenic regions allows genome-wide scanning for previously unidentified RNAs (Fig. 3). csrB, a small (360 bases) untranslated RNA which is known to be abundant in stationary phase23 but was not present in our annotation database, was easily detected by probes targeting the region between loci b2793 and b2792. Genome arrays made by in situ synthesis of oligonucleotides also present an opportunity for the identification of antisense RNAs. By simply inverting the synthesis, a complementary array can be synthesized which contains probes that will bind to antisense RNAs24. Hybridization of a stationary phase sample to such a reverse complement chip resulted in the detection of antisense transcription of between 3,000 and 4,000 predicted ORFs, suggesting that there is a low level of transcription throughout the E. coli genome. The physiological significance of this transcription is unclear. An example of a detected antisense RNA is b1365 (Fig. 3B), a predicted ORF located in the Rac prophage. This transcript may be from an overlapping gene encoded on the opposite strand, a common occurrence in phage and viruses. Alternatively, it could result from read-through transcription of an upstream IS5 insertion. Consistent with this is the detection of IS5 transcription as well as antisense transcripts for the intervening ORFs, b1366 - b1369. It is important to note that transcription at a given locus may be part of a long 5' or 3' UTR, a spacer within an operon, an untranslated RNA, an ORF, or the result of an incorrectly predicted ORF start or stop site. The ability to establish transcript starts and stops would aid in the interpretation of these RNAs, and is discussed in the next section.

Sub-transcript resolution

The large number of oligonucleotides (295,936) on the array allowed transcripts to be probed at high resolution.
Intergenic regions were probed, on average, every 6 bases whereas ORFs and known RNAs were probed on average every 60 bases. This makes it possible to obtain reasonably high-resolution information on transcript starts and stops and operon structure. Analysis of oligonucleotide probes for selected transcripts revealed a large amount of intensity variation across the probes within a probe set, but also a striking consistency to the patterns (Fig. 4). A highly reproducible pattern was seen for all probe sets inspected. The intensity variation is likely due to sequence-dependent differences in hybridization affinity and accessibility and to the effects of secondary structure on hybridization. The similarity of the patterns obtained using RNA samples labeled by random primers and genomic DNA labeled directly with terminal transferase suggests that the pattern is not a result of variations in priming or labeling efficiency. The signal pattern correlates well with regions of experimentally confirmed RNA secondary structure, such as the ompA 5' stem-loop25 (data not shown), but poorly with G/C content or hypothetical hairpin formation of the probe oligonucleotides26, 27. It is currently being investigated whether the signal is correlated with other predicted local RNA secondary structures. It has been shown that secondary structure can strongly affect oligonucleotide hybridization24, 28. Locations of known secondary structures in the lpp and rpsO 3' UTRs are highlighted in Figure 4. It must be noted, however, that lack of signal may indicate early transcription termination. Signal from flanking regions and/or independent information about transcription starts and stops can be used to rule out this possibility. Analysis of transcription in predicted intergenic regions allows 5' and 3' UTRs to be mapped. Transcriptional starts and stops derived from array data for lpp and rpsO (Fig. 4) agree well with those determined with other methods.
Lpp is known to be transcribed from -33 to 284, ending in a hairpin29, 30, and rpsO from -100, continuing through a 3' stem-loop structure into pnp, with which it is co-transcribed31. To map transcription endpoints with the array, the ability of each oligonucleotide to hybridize to its target was determined. Oligonucleotides were considered 'reliable' if, when hybridized to genomic DNA, their intensity difference (PM-MM) was at least 3 standard deviations above noise. Oligonucleotides below this cut-off are referred to as 'unreliable'. Transcription was considered detectable at positions which had reliable oligonucleotides if the mean intensity difference at that position was greater than its standard deviation. Signal from lpp was detected starting between oligonucleotides centered at positions -30 and -37 and can be detected until the last reliable probe at position 250. The probes from 274 to 284 are unreliable and correspond to the location of a known hairpin. Transcription of rpsO is first detected at position -94 and begins no earlier than -117, the first reliable oligonucleotide for which no transcription is detected. RpsO transcription is detected, albeit irregularly, throughout the 3' UTR, where it presumably continues into pnp. Probes for pnp, however, are located only at the 3' end of the ORF so this continuation was not directly observed. RpsO and pnp are co-transcribed and contain a structured attenuator sequence between them which causes a high frequency of rho-independent termination before the pnp coding region. This structured region also serves as a 3' stabilizer for rpsO and a 5' stabilizer for pnp and is targeted by RNase E and RNase III, which lead to rapid degradation of both rpsO and pnp RNAs32, 33. RpsO was seen to increase 400-fold in log phase, the largest relative fold increase in log phase, whereas pnp showed no change.
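The endpoint-mapping rule described above (reliable versus unreliable oligonucleotides, followed by a per-position detection call) might be sketched as follows. This is an illustration under stated assumptions rather than the analysis code used here: "3 standard deviations above noise" is interpreted as the noise mean plus three noise standard deviations, the class and function names are invented for the example, and the probe values are hypothetical (only the positions echo the rpsO case).

```python
import statistics
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Probe:
    position: int                # probe centre relative to the ORF start
    genomic_diff: float          # PM-MM from the genomic DNA hybridization
    rna_diffs: Sequence[float]   # PM-MM from replicate RNA hybridizations


def classify(probes, noise_mean, noise_sd, n_sd=3.0):
    """Label each position 'unreliable', 'detected', or 'not detected'."""
    calls = {}
    for probe in probes:
        if probe.genomic_diff < noise_mean + n_sd * noise_sd:
            calls[probe.position] = "unreliable"
        elif statistics.mean(probe.rna_diffs) > statistics.stdev(probe.rna_diffs):
            calls[probe.position] = "detected"
        else:
            calls[probe.position] = "not detected"
    return calls


def start_bounds(calls):
    """Return (upstream_limit, first_detected): the first position with signal and
    the nearest reliable, silent position upstream of it (None if absent)."""
    upstream_limit = None
    for position in sorted(calls):
        if calls[position] == "detected":
            return upstream_limit, position
        if calls[position] == "not detected":
            upstream_limit = position
    return upstream_limit, None


# Hypothetical intensities for illustration only.
probes = [
    Probe(-117, 500.0, [5.0, -3.0, 4.0, -6.0]),
    Probe(-105, 20.0, [300.0, 310.0]),
    Probe(-94, 600.0, [200.0, 220.0, 210.0, 190.0]),
    Probe(-80, 700.0, [400.0, 380.0]),
]
calls = classify(probes, noise_mean=0.0, noise_sd=50.0)
print(calls)
print(start_bounds(calls))
```

Unreliable positions are skipped when bracketing the start, which is why a transcript start can only be placed between the last silent reliable probe and the first probe with signal.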
Interestingly, the oligonucleotide hybridization pattern shows some differences between log and stationary phase toward the 3' end of rpsO (Fig. 4B). This region is between two known RNase III sites and is increased in stationary phase relative to the other probe pairs in the probe set, perhaps indicating that RNase III processing at this site is increased in stationary phase, leading to a decrease in local RNA secondary structure and increased hybridization to the array.

Oligonucleotide Arrays and Cross-Hybridization. Considerably more cross-hybridization is observed on E. coli arrays than on eukaryotic arrays, presumably because of the presence of large amounts of labeled rRNA and tRNA. Because perfect match (PM) features are tiled immediately above their mismatch (MM) counterparts, PM and MM features of equal intensity appear as rectangles in the image. These can be seen throughout the array images (Figs 1B-D). If the MM feature were not used, a large number of cross-hybridizing PM oligonucleotides would be included in the analysis and increase the noise of the system. The combination of MM signal subtraction and removal of outliers has proven effective in quantifying RNA abundance changes with oligonucleotide arrays2. We considered using MM features to identify cross-hybridizing PM features, discarding them, and then using the raw PM intensities of the remaining features to derive abundance measures. Our preliminary analysis suggested that this approach yields results similar to those using PM-MM, so we did not pursue this line further.

The Future of Genome Arrays. The noise present in a high complexity hybridization reaction encourages use of increased statistical rigor to determine the significance of probe signal patterns. Corrections for systematic noise due to cross-hybridization, variability in probe efficiency, and spatial variability across the array surface can be used to increase the sensitivity and precision of the data.
Because of the complexity of the factors influencing array signal, internal negative controls, such as probe sets which target RNAs that are not present, may be the best way to estimate the amount of signal which can be expected from all factors besides specific hybridization. Replicate array expression experiments, in combination with array hybridizations of genomic DNA, can be used to extract information from single oligonucleotides, allowing transcripts to be mapped at high resolution. The ability to interpret genome-wide transcription data at 10-100 base pair resolution has many potential applications for the study of gene regulation in both prokaryotes and eukaryotes, including the identification of alternative promoters and the experimental identification of regions of transcription that are missed by ORF-predicting algorithms, a problem which is becoming more urgent as annotators deal with the difficult task of predicting genes in higher eukaryotic genomes [34]. There are a number of advantages of arrays which use short single-stranded probes over those which utilize longer double-stranded DNAs [35, 36]. These advantages include higher resolution, better cross-hybridization controls, potential for paralog discrimination, splice variant identification, and strand-specific transcript detection.

DNA arrays with probes covering entire genomes, rather than just ORFs, are a logical step in the evolution of arrays. Inclusion of intergenic regions allows arrays to be used as readouts for techniques which enrich for DNA sequences of interest, such as protein-bound sequences using Whole-Genome In vivo Methylase Protection [37] or ChIP (Chromatin Immuno-Precipitation) [38, 39]. If they are double-stranded, they could be used as a direct in vitro assay of DNA-protein interactions [40]. Genome arrays should also be useful for genotyping both ORF and promoter sequences [41, 42].
Integration of these data into an understanding of genetic networks and cell physiology will remain a central challenge in the post-genomic era.

Experimental protocol

Cell Culture. E. coli MG1655 was grown to mid-log phase in LB in a fermentor at 37º C with constant aeration of 11 liters/min and agitation of 300 rpm. Stationary phase cultures were grown at 37º C overnight in culture flasks containing LB aerated by shaking at 225 rpm. Samples were taken in duplicate from the log phase culture and once from the stationary phase culture. Each log phase duplicate was labeled once and the single stationary phase RNA was labeled twice independently.

RNA Preparation. RNA was prepared by acid phenol:chloroform extraction. Briefly, samples of culture were transferred directly into acid phenol:chloroform, 5:1 (Ambion, Austin, TX) at 65º C to ensure rapid lysis and inactivation of RNases. Two additional acid phenol:chloroform extractions were performed, followed by ethanol precipitation, treatment with 1.25 U of DNase I (Gibco BRL) per ml of culture and 20 µg of proteinase K (Boehringer Mannheim, Mannheim, Germany) per ml of culture, and a final ethanol precipitation. The pellet was then washed with 70% ethanol, resuspended in DEPC-treated water, quantified by A260, and visualized on a denaturing polyacrylamide gel. We subsequently found that contaminating salts and sugars from the media were inhibiting the reverse transcription reaction used to make labeled cDNA. The yield was dramatically improved (see below) by removing salts and sugars after the first precipitation by three passes through Centricon PL-20 concentrator columns (Centricon, Beverly, MA), which have a cut-off of about 30 bases, and diluting the concentrate with DEPC water.

cDNA synthesis, biotinylation.
The protocol currently supported by Affymetrix for prokaryotic expression analysis was not available at the time of this study, and limited direct comparison has been made with the protocol used here. In our labeling protocol 1.5 mg* of total RNA was fragmented in a high Mg2+ buffer (40 mM Tris-acetate, pH 8.1, 100 mM KOAc, 30 mM MgOAc) at 94º C for 30 min in the presence of random octamers (6.7 mM) and 4 control RNAs generated by in vitro transcription (B. subtilis dapB, thrB, lysA, and pheB). After fragmentation the sample was put immediately on ice. The reaction was then diluted two-fold into the following reverse transcription reaction: 1X Superscript II buffer, dNTPs (1.3 mM), DTT (10 mM), and 3,000 units of Superscript II Reverse Transcriptase (Gibco BRL), which was incubated at 42º C for 3 hrs. RNA was then degraded by treatment with 135 units of RNase One (Promega, Madison, WI). RNase One was then heat inactivated, and unincorporated nucleotides and random octamers were removed with Centrisep Spin Columns (Princeton Separations, Adelphia, NJ). This reaction typically yields ~30 µg of first strand cDNA. 10 µg was then biotinylated with 30 units of Terminal Deoxynucleotidyl Transferase (Gibco BRL) and 50 µM Biotin-N6-ddATP (Dupont NEN, Boston, MA) in 1X One-Phor-All buffer (Pharmacia, Piscataway, NJ) and incubated at 37º C for 2 hrs. Genomic DNA was fragmented with DNaseI (Promega), 1.1 U per µg of DNA, in 1X One-Phor-All buffer to an average size of 100 bp and then biotinylated with TdT as above. 10 µg of biotinylated cDNA or gDNA was then hybridized to an E. coli array (Affymetrix, Santa Clara, CA) at 45º C for 40 hours, washed, and stained with streptavidin-phycoerythrin (Molecular Probes, Eugene, OR). Arrays used for expression analysis are denoted "antisense" by Affymetrix because they contain probes which will bind to the reverse complement of the transcript, e.g. cDNA, whereas "sense" arrays (Part# 900284) will bind to the transcripts themselves.
Antisense arrays are not yet commercially available. It should be noted, however, that the commercially available sense chips can be used to analyze both strands: Affymetrix's RNA labeling protocol can be used for expression analysis, and our cDNA labeling protocol for reverse complement analysis. In this article, we refer to antisense arrays as "expression arrays" and sense arrays as "reverse complement arrays". Most arrays were scanned after a single staining, but one stationary phase array and the reverse complement array were signal amplified with a biotinylated anti-streptavidin antibody, followed by a second streptavidin-phycoerythrin staining, according to standard Affymetrix protocols. This amplification increased the signal/noise ratio about 2- to 3-fold, but did not result in a significant increase in the number of transcripts detected. The array was then scanned with an HP-Affymetrix array scanner.

*Note: 50 µg of column-purified total RNA (RNA preparation section) yielded >10 µg of cDNA, enough for an array hybridization. Taking into account a 67% loss from the Centricon columns, 150 µg of RNA from a phenol:chloroform prep is enough for an array experiment. This hybridization sample can be recovered and re-used at least 3 times without significant loss of signal [3]. The use of Centricon columns caused no noticeable changes in the nature of the resulting array data.

Data processing and normalization. Background was determined using GeneChip 3.2, which divides the array into 16 sectors and takes the average of the lowest 2% of features of each sector. After background subtraction, mismatch features were subtracted from perfect match features, and the resulting difference was multiplied by a scaling factor derived from GeneChip software. For spiked control RNAs the scaling factor was derived from setting the 16S ribosomal mean average differences to 50,000. For the log vs.
stationary phase analysis, intensities were scaled so that the mean average difference for all probe sets was 5,000 units. All array analyses after the derivation of background and scaling factors were done with a set of Perl scripts which we have dubbed "Genome Array Processing Software" or "GAPS". GAPS takes ".CEL" files, generated by GeneChip, as input. GAPS and the .CEL files used in this study can be found at Express DB [22].

The array contains a regularly spaced 10 x 10 grid of control feature pairs which all hybridize to the same control oligonucleotide, and should thus be of equal intensity. However, we found that the fluorescence intensity of these features typically varied about 2-3 fold across the surface of the array, possibly because of local differences in washing/staining efficiencies. To correct for this spatial variation, the control grid was used to estimate local deviations in fluorescence intensity. First, each pair of controls was averaged. Then experimental features were multiplied by a correction factor which is derived from control features representing the relative brightness of the region. Control features closer to the probe pair contributed more to the final correction factor than distant ones. This correction factor was determined by the following equation:

$$\text{Correction Factor} = \frac{\bar{c}}{\sum_{i=1}^{4} \left( \dfrac{1/d_i}{\sum_{j=1}^{4} 1/d_j} \right) c_i}$$

where $d_i$ (or $d_j$) is the Euclidean distance from the PM feature to the $i$-th (or $j$-th) of the 4 closest control features, $c_i$ is the intensity of control feature $i$, and $\bar{c}$ is the mean of all control features on the array.

RNA abundance metrics: average difference and 2max. Five control RNAs from Bacillus subtilis, which each have 4 probe sets on the array, were analyzed at concentrations which ranged from ~20 to ~0.0002 copies/cell, and with no RNA, which served as a negative control. These control RNAs were spiked into total cellular RNA before labeling. There were a total of 100 independent pairwise comparisons made.
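The spatial correction described earlier in this section (an inverse-distance-weighted average of the 4 nearest control features, compared against the array-wide control mean) can be sketched as follows. This is a minimal illustration, assuming that reading of the correction factor; the function name, the coordinate representation, and the handling of a feature that coincides with a control are choices made for the example and are not taken from GAPS.

```python
import math


def correction_factor(feature_xy, controls, k=4):
    """Spatial correction factor for one PM feature.

    feature_xy: (x, y) position of the PM feature on the array.
    controls:   list of ((x, y), intensity), one entry per control pair,
                where intensity is the averaged pair intensity.
    Returns mean(all controls) / local inverse-distance-weighted control
    intensity, so features in bright regions get a factor below 1.
    """
    grand_mean = sum(c for _, c in controls) / len(controls)
    # The k control features closest to this feature (Euclidean distance).
    nearest = sorted(controls, key=lambda ctl: math.dist(feature_xy, ctl[0]))[:k]

    weights = []
    for xy, intensity in nearest:
        d = math.dist(feature_xy, xy)
        if d == 0:
            # Assumption: a feature sitting on a control uses that control alone.
            return grand_mean / intensity
        weights.append(1.0 / d)

    total = sum(weights)
    local = sum(w / total * c for w, (_, c) in zip(weights, nearest))
    return grand_mean / local


# Toy 2 x 2 control grid: the row at y = 0 is dimmer than the row at y = 10.
grid = [((0, 0), 100.0), ((10, 0), 100.0), ((0, 10), 300.0), ((10, 10), 300.0)]
print(correction_factor((5, 5), grid))  # equidistant from all four controls
print(correction_factor((5, 0), grid))  # in the dim region, so factor > 1
```

Multiplying a feature's PM-MM difference by this factor scales dim regions up and bright regions down toward the array-wide control mean.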
Copies/cell was estimated by assuming cells have approximately 60 femtograms of total RNA [43]. Copies per cell can be recalculated for different total RNA contents, which normally range from 20 to 200 femtograms/cell. For example, 1 copy per cell in a cell with 60 femtograms of total RNA is equivalent to 2 copies per cell in a cell with 120. The average transcript size of our spiked RNAs was 4.6 kb. Probe pairs were averaged over duplicates and then ranked by their mean intensity difference (PM-MM). The total intensity normalized values reported in the tables and the online datafile are approximately 90% of the ribosomal normalized values of Figure 2A. The relationship between fluorescent signal and copies/cell is given by the equations of the regression lines of Figure 2A:

2max Signal = 13000 * ln(Copies/Cell) + 39000, R^2 = 0.76
Median Signal = 5500 * ln(Copies/Cell) + 16000, R^2 = 0.80
Average Difference Signal = 6000 * ln(Copies/Cell) + 18000, R^2 = 0.86

Conversions between fluorescence intensity and copies/cell should be used with extreme caution. In addition to the cell-size issues noted above, there is a significant amount of error introduced by the large variability of probe signal, such that probes whose target RNA is present at equal concentration will have variable raw fluorescence intensity (see Fig. 2A). Experiments are in progress to use a hybridization of genomic DNA (where all genes are equimolar) to calibrate this conversion and allow more accurate measurement of absolute RNA levels. For the purposes of this study, we focus on the change in fluorescence of identical probe sets (thus bypassing inherent variability between different probe sets) and report "absolute change" and "fold change" (Tables 1, 2) rather than absolute RNA levels. We found that by using the intensity difference of the second maximal probe pair to represent a probe set we maximized the number of detected genes.
We therefore chose the second maximal probe pair intensity difference, or "2max", as a measure of RNA abundance. Using Excel, an exponential trend line was fit to a plot of observed vs. expected fold change, and the equation was used to calibrate estimates of fold change in our stationary vs. log expression comparison (Fig. 2B). The calibration equation is as follows: calibrated fold change = 1.2 x (measured fold change)^1.9. Pairwise comparisons of the 2max of the same probe sets on duplicate arrays yielded an average linear correlation coefficient of 0.85 +/- 0.04.

Transcript detection. To determine which transcripts were detected, we used a set of 4 distinct Bacillus subtilis probe sets whose target RNA was not used in our spiking experiments. After normalization to total intensity we determined the average 2max of these probe sets on the arrays used in the stationary vs. log comparison. Transcripts were considered detected if their 2max was at least 3 standard deviations above the mean of the 4 probe sets for the absent B. subtilis RNA. 97% and 87% of transcripts were detected in stationary and log phase, respectively. 1.7% were not detected in either condition. Because the negative controls were used to determine the detection threshold, they could not be used to estimate false positives. The false positive rates for the 2max and median metrics, therefore, were estimated by using probe sets whose RNAs were spiked at 0.004 copies/cell or less, well below the sensitivity of the assay. These metrics both yielded a false positive rate of 0% (0/20) by this method. For the average difference metric, detection is decided by Affymetrix's calling algorithm, which works independently of internal negative controls. We therefore used the negative controls to estimate the false positive rate, which was also 0% (0/15).
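The 2max metric, the negative-control detection threshold, and the fold-change calibration described above are simple enough to sketch directly. The following Python is an illustration only, not the GAPS code: the function names are invented, the standard deviation of the negative controls is assumed to be the sample standard deviation, and the calibration is applied exactly as the stated equation without any special handling of decreases.

```python
from statistics import mean, stdev


def second_max(probe_pair_diffs):
    """2max: the second-largest PM-MM intensity difference in a probe set."""
    if len(probe_pair_diffs) < 2:
        raise ValueError("a probe set needs at least two probe pairs")
    return sorted(probe_pair_diffs, reverse=True)[1]


def detection_threshold(negative_control_2max, n_sd=3.0):
    """Mean of the negative-control 2max values plus n_sd standard deviations
    (assumed here to be the sample standard deviation)."""
    return mean(negative_control_2max) + n_sd * stdev(negative_control_2max)


def is_detected(probe_pair_diffs, negative_control_2max):
    """A transcript is called detected if its 2max reaches the threshold."""
    return second_max(probe_pair_diffs) >= detection_threshold(negative_control_2max)


def calibrated_fold_change(measured_fold_change):
    """Calibration equation from the text: 1.2 x (measured fold change)^1.9."""
    return 1.2 * measured_fold_change ** 1.9


# Made-up numbers for illustration.
negatives = [100.0, 120.0, 80.0, 100.0]    # 2max of the 4 absent-RNA probe sets
probe_set = [10.0, 500.0, 300.0, -20.0]    # PM-MM differences of one probe set
print(second_max(probe_set))
print(detection_threshold(negatives))
print(is_detected(probe_set, negatives))
print(calibrated_fold_change(2.0))
```

Using the second-largest rather than the largest difference makes the metric robust to a single aberrantly bright probe pair while still favoring the best-performing probes in the set.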
The parameters used in Affymetrix's software package, GeneChip 3.2, were the following: SDT multiplier = 4, ratio threshold = 1.5, ratio limit = 10, horizontal zones = 4, vertical zones = 4, % background cells = 2, pos/neg min = 3, pos/neg max = 4, pos ratio min = 0.33, pos ratio max = 0.43, avg. log ratio min = 0.9, avg. log ratio max = 1.3.

It is important to note that 2max does not detect the maximal number of transcripts in every experiment. The maximum number of transcripts (4,033) on the reverse complement array was detected using the fourth brightest probe pair, or "4max". Averaging the 4th through 8th ranks ("4-8max"), which represented the peak of detection, gave 3,470 detected transcripts (78% of predicted RNAs). In this case 20 B. subtilis probe sets were used as negative controls, with a detection cutoff of 3 standard deviations above the mean. Widespread detection of transcription in E. coli with a reverse complement array has been confirmed in our lab on an independent RNA sample using the current Affymetrix labeling protocol, in which 4,344 transcripts were detected (99% of predicted RNAs) using 4-8max (Daniel Janse, unpublished data). The agreement is particularly striking considering the many differences between our original experiment and the confirmation experiment, which were, respectively: biotinylated total cDNA vs. mRNA-enriched biotinylated RNA, antisense vs. sense chip, and stationary phase vs. log phase RNA samples. Both protocols include a DNaseI digestion to remove genomic DNA, and no gDNA contamination was detected by EtBr staining.

Significance of changes. To determine which changes in 2max were significant, we devised a calling algorithm which uses both a t-test and a consensus measure.
If either of the following criteria is fulfilled for a transcript which was detected in at least one condition, the transcript is called significantly changed: i) the mean 2max from duplicates is determined to be significantly different in the two conditions by a two-tailed Student's t-test with >95% confidence, or ii) after discarding the brightest and dimmest probe pairs, at least 11/13 of the remaining probe pairs are all changed in the same direction, by any amount. For transcripts with >15 probe pairs, the 15 brightest were identified and processed in the same way as the other probe sets. In the rare cases in which these two criteria conflicted, the decision based on the second maximal probe pair was used. It is important to note that the magnitudes of the fold or absolute changes are not considered in deciding their significance, although 77% of the significant changes were greater than 2-fold. Out of 100 independent pairwise comparisons, 52 were detected in at least one condition. The algorithm correctly assigned significant changes to all 52 of these probe sets, all of which had fold changes of at least 2-fold. Probe sets for control RNAs spiked at equal concentrations showed no significant changes (0/16).

Acknowledgements

We thank Jeremy Edwards for improvements to the labeling protocol, Daniel Janse for sharing unpublished data, Adnan Derti and Allegra Petti for bioinformatics contributions, Felix Lam for help with the fermentor, Michael Mittmann for array design, Phillip Juels for impeccable computer tech support, Wayne Rindone and John Aach for expression database support, Barak Cohen, Robi Mitra, Martha Bulyk, Pete Estep, Martin Steffen, and the rest of the Church lab for the many helpful discussions and encouragement which made this work possible. We also thank the reviewers for significant improvements to the manuscript. This work was supported by grants from Aventis Pharma, Lipper Foundation, DOE and NSF.
________________________________________________________________________ 1. de Saizieu, A., et al. Bacterial transcript imaging by hybridization of total RNA to oligonucleotide arrays. Nat. Biotechnol. 16, 45-8 (1998). 2. Lockhart, D.J., et al. Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat. Biotechnol. 14, 1675-80 (1996). 3. Wodicka, L., et al. Genome-wide expression monitoring in Saccharomyces cerevisiae. Nat. Biotechnol. 15, 1359-67 (1997). 4. Lee, C.K., et al. Gene expression profile of aging and its retardation by caloric restriction. Science 285, 1390-3 (1999). 5. Zhu, H., et al. Cellular gene expression altered by human cytomegalovirus: global monitoring with oligonucleotide arrays. Proc. Natl. Acad. Sci. USA 95, 14470-5 (1998). 6. Wen, X., et al. Large-scale temporal gene expression mapping of central nervous system development. Proc. Natl. Acad. Sci. USA 95, 334-9 (1998). 7. Roth, F.P., et al. Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation. Nat. Biotechnol. 16, 939-45 (1998). 8. Tavazoie, S., et al. Systematic determination of genetic network architecture. Nat. Genet. 22, 281-5 (1999). 9. Eisen, M.B., et al. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 95, 14863-8 (1998). 10. Tamayo, P., et al. Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc. Natl. Acad. Sci. USA 96, 2907-12 (1999). 11. Richmond, C.S., et al. Genome-wide expression profiling in Escherichia coli K12. Nucleic Acids Res. 27, 3821-35 (1999). 12. Tao, H., et al. Functional genomics: expression analysis of Escherichia coli growing on minimal and rich media. J. Bacteriol. 181, 6425-40 (1999). 13. Chuang, S.E., D.L. Daniels, & F.R. Blattner. Global regulation of gene expression in Escherichia coli. J. Bacteriol. 175, 2026-36 (1993). 14. Pease, A.C., et al. 
Light-generated oligonucleotide arrays for rapid DNA sequence analysis. Proc. Natl. Acad. Sci. USA 91, 5022-6 (1994). 15. Blattner, F.R., et al. The complete genome sequence of Escherichia coli K-12. Science 277, 1453-74 (1997). 16. Hengge-Aronis, R. In Escherichia Coli and Salmonella: Cellular and Molecular Biology. (eds. Neidhardt, F. C. et al.) 1497-1512 (ASM Press, Washington D.C.; 1996). 17. Yuzawa, H., et al. Heat induction of sigma 32 synthesis mediated by mRNA secondary structure: a primary step of the heat shock response in Escherichia coli. Nucleic Acids Res. 21, 5449-55 (1993). 18. Lange, R. & R. Hengge-Aronis. The cellular concentration of the sigma S subunit of RNA polymerase in Escherichia coli is controlled at the levels of transcription, translation, and protein stability. Genes Dev. 8, 1600-12 (1994). 19. http://www.genome.wisc.edu 20. Link, A.J., K. Robison, & G.M. Church. Comparing the predicted and observed properties of proteins encoded in the genome of Escherichia coli K-12. Electrophoresis 18, 1259-313 (1997). 21. Aach, J., W. Rindone, & G.M. Church. Systematic Management and Analysis of Yeast Gene Expression Data. Genome Res. 10, 431-445 (2000). 22. http://arep.med.harvard.edu/cgi-bin/ExpressDBecoli/EXDStart 23. Liu, M.Y., et al. The RNA molecule CsrB binds to the global regulatory protein CsrA and antagonizes its activity in Escherichia coli. J. Biol. Chem. 272, 17502-10 (1997). 24. Southern, E.M., N. Milner, & K.U. Mir. Discovering antisense reagents by hybridization of RNA to oligonucleotide arrays. Ciba Found. Symp. 209, 38-44 (1997). 25. Chen, L.H., et al. Structure and function of a bacterial mRNA stabilizer: analysis of the 5' untranslated region of ompA mRNA. J. Bacteriol. 173, 4578-86 (1991). 26. SantaLucia, J., Jr. A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proc. Natl. Acad. Sci. USA 95, 1460-5 (1998). 27. http://mfold2.wustl.edu/~mfold/dna/form1.cgi 28. Mir, K.U. & E.M.
Southern. Determining the influence of structure on hybridization using oligonucleotide arrays. Nat. Biotechnol. 17, 788-92 (1999). 29. Taljanidisz, J., P. Karnik, & N. Sarkar. Messenger ribonucleic acid for the lipoprotein of the Escherichia coli outer membrane is polyadenylated. J. Mol. Biol. 193, 507-15 (1987). 30. Cao, G.J. & N. Sarkar. Poly(A) RNA in Escherichia coli: nucleotide sequence at the junction of the lpp transcript and the polyadenylate moiety. Proc. Natl. Acad. Sci. USA 89, 7546-50 (1992). 31. Portier, C. & P. Regnier. Expression of the rpsO and pnp genes: structural analysis of a DNA fragment carrying their control regions. Nucleic Acids Res. 12, 6091-102 (1984). 32. Portier, C., et al. The first step in the functional inactivation of the Escherichia coli polynucleotide phosphorylase messenger is a ribonuclease III processing at the 5' end. Embo J. 6, 2165-70 (1987). 33. Regnier, P. & E. Hajnsdorf. Decay of mRNA encoding ribosomal protein S15 of Escherichia coli is initiated by an RNase E-dependent endonucleolytic cleavage that removes the 3' stabilizing stem and loop structure. J. Mol. Biol. 217, 283-92 (1991). 34. Pennisi, E., Are Sequencers Ready to 'Annotate' the Human Genome?, in Science. 2000. p. 2183. 35. DeRisi, J., et al. Use of a cDNA microarray to analyse gene expression patterns in human cancer. Nat. Genet. 14, 457-60 (1996). 36. DeRisi, J.L., V.R. Iyer, & P.O. Brown. Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278, 680-6 (1997). 37. Tavazoie, S. & G.M. Church. Quantitative whole-genome analysis of DNA-protein interactions by in vivo methylase protection in E. coli. Nat. Biotechnol. 16, 566-71 (1998). 38. Dedon, P.C., et al. A simplified formaldehyde fixation and immunoprecipitation technique for studying protein-DNA interactions. Anal. Biochem. 197, 83-90 (1991). 39. Orlando, V. & R. Paro.
Mapping Polycomb-repressed domains in the bithorax complex using in vivo formaldehyde cross-linked chromatin. Cell 75, 1187-98 (1993). 40. Bulyk, M.L., et al. Quantifying DNA-protein interactions by double-stranded DNA arrays. Nat. Biotechnol. 17, 573-7 (1999). 41. Winzeler, E.A., et al. Whole genome genetic-typing in yeast using high-density oligonucleotide arrays. Parasitology 118, S73-80 (1999). 42. Gingeras, T.R., et al. Simultaneous genotyping and species identification using hybridization pattern recognition analysis of generic Mycobacterium DNA arrays. Genome Res. 8, 435-48 (1998). 43. Neidhardt, F.C., J.L. Ingraham, & M. Schaechter. Physiology of the Bacterial Cell: A Molecular Approach. 1st ed. (Sinauer Associates, Massachusetts; 1990).

[Figure 1 image. Panels A-D. Image labels: lpp 5' end of coding region; lpp 3' end of coding region; color scale from high to low.]

Figure 1. False-color images of scanned Escherichia coli genome array hybridized with a sample derived from a stationary phase culture growing in LB. (A) Whole array (top half: ORFs, bottom half: intergenic regions, very bottom: rRNAs and tRNAs). (B) Close-up of coding regions. The bright streak on the lower left is rmf. (C) Close-up of intergenic regions, rRNAs, tRNAs. (D) lpp coding region. Note: Apparent saturation (esp. in C) is due to display settings and not signal saturation.

[Figure 2 image. Panels A and B. Axes: Signal vs. Copies / cell; Observed fold change vs. Known fold change. Inset, genes detected (log, stationary): 2max 87%, 97%; Median 20%, 90%; AD 18%, 87%. Fit equations shown: y = 1.11x^0.46 (R = 0.94); y = 1.95x^0.37 (R = 0.94); y = 1.08x^0.54 (R = 0.95).]

Figure 2. Comparison of 2max (●), median (■), and average difference (▲) abundance metrics using Bacillus subtilis control RNAs. (A) Abundance measurement vs. RNA concentration, with present calls.
Genes are considered detected by the 2max and median metrics if they are at least 3 standard deviations above negative controls for which no RNA is present. Detection using the average difference metric is determined using an algorithm implemented in the GeneChip 3.2 software package. No false positives were detected for any of the metrics (see Experimental Protocol). (B) Plot of observed fold changes measured by various metrics vs. known fold changes. The relationship between observed and known fold change is non-linear for all three metrics over a dynamic range of 3 orders of magnitude, and approximately linear for changes less than 10-fold.

[Figure 3 image. Panels A and B. Image labels: Crick strand; Watson strand (same array); csrB; csrB (untranscribed strand); Expression array; Reverse complement array; b1365 (sense); b1365 (antisense); pos. 171, 228.]

Figure 3. The E. coli array can detect strand-specific transcription and can be used to identify (A) small untranslated RNAs, such as csrB, and (B) a previously unidentified antisense RNA in the Rac prophage. Transcription was detected on the strand opposite b1365 from positions 171 to 228. Position is given as the number of base pairs from the central nucleotide of the oligonucleotide probe to the translation start of b1365. The oligonucleotides are closely spaced but three of them are non-overlapping. The 15 probe pairs in this probe set are outlined by the white grid, with the PM features on the top row. Probe sets for the untranscribed strands show the background signal typical of undetected transcripts. The oligonucleotides in A and on the expression array of B are tiled from left to right in the 5' to 3' direction.
[Figure 4 image. Panels A (lpp) and B (rpsO). Axes: Intensity (PM - MM) / 2max vs. Bases from translation start. Annotations in A: Known transcription start (position -33); Known hairpin; Translation stop (237 bases). Annotations in B: Reported transcription start (-100); Known RNaseIII sites; Known region of secondary structure; pnp; Translation stop (270 bases).]

Figure 4. Determination of transcription starts of (A) lpp and (B) rpsO. Both genes exhibit reproducible hybridization patterns despite large log phase fold increases of 60- and 400-fold, respectively. 2max-normalized (PM - MM) fluorescence intensities of log phase (), stationary phase (▲), and genomic DNA (●) arrays were plotted against the distance from the center of the oligonucleotide to the translation start site. Points for log and stationary phase are the means of duplicate experiments. Oligonucleotides which target both the open reading frame and the flanking intergenic regions allow this region to be probed at ~6 base pair average resolution for lpp and ~13 for rpsO. Transcription starts are detected between -30 and -37 for lpp (reported -33) [28, 29] and between -94 and -117 for rpsO (reported -100) [30]. lpp is known to sometimes extend to position 284, ending in a hairpin structure. Oligonucleotides in this region showed no hybridization, suggesting early termination of transcription and/or sensitivity of the array to secondary structure. Variability in the hybridization pattern at the 3' end may reflect differential processing. The rpsO transcript has a 3' hairpin and can be co-transcribed with downstream pnp. The hairpin structure serves as a stabilizing element for both rpsO and pnp as well as a transcriptional attenuator [31, 32]. Processing by RNaseIII may relieve secondary structure in this region and lead to the increased signal seen at the 3' end in stationary phase.

Table 1.
ORFs with significant changes in probe set intensity, previously known to be differentially regulated in stationary phase

Gene   Abs. Change   Fold Change   Annotation
rmf        120465       17         ribosome modulation factor
glgS       118425       160        glycogen biosynthesis, rpoS dependent
hdeA       104184       41         orf, hypothetical protein
dps         91763       55         global regulator, starvation conditions
hdeB        34968       5          orf, hypothetical protein
osmY        21914       9          hyperosmotically inducible periplasmic protein
himA        19920       23         integration host factor (IHF), alpha subunit; site specific recombination
csgB        19385       >30        minor curlin subunit precursor, similar to CsgA
clpA        17369       8          ATP-binding component of serine protease
wrbA        15845       7          trp repressor binding protein; affects association of trp repressor and operator
fic         14900       26         induced in stationary phase, recognized by rpoS, affects cell division
htrE        14893       >24        probable outer membrane porin protein involved in fimbrial assembly
cstA        13475       11         carbon starvation protein
sspA        13076       4          regulator of transcription; stringent starvation protein A
ftsA        11171       >5         ATP-binding cell division protein, septation process, complexes with FtsZ, associated with junctions of inner and outer membranes
hyaE        10406       >4         processing of HyaA and HyaB proteins
dacC        10064       8          D-alanyl-D-alanine carboxypeptidase; penicillin-binding protein 6
emrA         8433       >4         multidrug resistance secretion protein
otsB         8276       2          trehalose-6-phosphate phophatase, biosynthetic
cfa          7896       >4         cyclopropane fatty acyl phospholipid synthase
iciA         7506       >4         replication initiation inhibitor, binds to 13-mers at oriC
rpoH       -26713       0.4        RNA polymerase, sigma(32) factor; regulation of proteins induced at high temperatures
hns       -170027       0.04       DNA-binding protein HLP-II (HU, BH2, HD, NS); pleiotropic regulator

Genes are ranked by absolute change, given as 2max in arbitrary fluorescence units. Signal was normalized to total array intensity. Fold changes were adjusted based on calibration with spiked transcripts (Fig. 2B).
For those transcripts which were called absent in one condition the fold change was estimated (indicated by a ">") by substituting the mean of the negative controls + 3 standard deviations for the undetected transcript. 23 out of 69 transcripts which are known to be differentially expressed [15] and which are present on the array were called as significantly changed. The remaining 46 were not significantly changed. 22 out of 23 of the significant changes agree with the direction of change reported in the literature. rpoH, the heat-shock sigma factor, is reported to increase in stationary phase although RNA levels decreased about 3-fold in our experiment. This may be a result of translational control, which is known to play a role in the regulation of rpoH [16]. Altogether, there were 1,529 genes (including tRNAs and rRNAs) which were significantly changed. 926 were increased in stationary phase and 603 were decreased. Annotations are from the University of Wisconsin Genome Project [15, 19]. The complete dataset can be found at Express DB [21, 22].

Table 2. ORFs with the largest significant increases in probe set intensity in stationary phase

Bnumber   Gene         Abs. Change   Fold Change   Annotation
b1005     ycdF            135446       102         orf, hypothetical protein
b0836     -               130009       >1000       putative receptor
b0953     rmf (a)         120465       17          ribosome modulation factor
b3049     glgS (a)        118425       160         glycogen biosynthesis, rpoS dependent
b4045     yjbJ (b)        117238       9           orf, hypothetical protein
b3510     hdeA (a, b)     104184       41          orf, hypothetical protein
b0812     dps (a, b)       91763       55          global regulator, starvation conditions
b1480     rpsV             74063       48          30S ribosomal subunit protein S22
b2665     ygaU             71120       60          orf, hypothetical protein
b3555     yiaG             67426       12          orf, hypothetical protein
b3239     yhcO             64840       140         orf, hypothetical protein
b1240     -                53219       4           orf, hypothetical protein
b1635     gst              51788       81          glutathionine S-transferase
b1051     msyB             51334       11          acidic protein suppresses mutants lacking function of protein export
b0966     yccV             50782       16          orf, hypothetical protein
b1318     ycjV             48950       75          putative ATP-binding component of a transport system
b1154     ycfK             46949       >180        orf, hypothetical protein
b1566     flxA             45987       13          orf, hypothetical protein
b2212     alkB             43206       6           DNA repair system specific for alkylated DNA
b1492     xasA             42971       85          acid sensitivity protein, putative transporter
b2266     elaB             42249       >140        orf, hypothetical protein
b1164     ycgZ             41961       3           orf, hypothetical protein
b3183     yhbZ             41925       7           putative GTP-binding factor
b1262     trpC             41711       7           N-(5-phosphoribosyl)anthranilate isomerase and indole-3-glycerolphosphate synthetase
b1739     osmE             40691       24          activator of ntrL gene

Same analysis as Table 1. (a) These genes are known to be differentially regulated in stationary phase [16]. (b) The products of yjbJ, dps, and hdeA are the first, fifth, and sixth most abundant proteins, respectively, in stationary phase E. coli [20].

Chapter 3

Global RNA half-life analysis in Escherichia coli reveals positional patterns of transcript degradation

Douglas W. Selinger, Rini Mukherjee Saxena, Kevin J. Cheung, George M. Church, and Carsten Rosenow

The research described in this chapter will be published in the February 2003 issue of Genome Research. The RNA degradation experiment described in this chapter is the result of a close and fruitful collaboration with Rini Saxena and Carsten Rosenow at Affymetrix.
We began our collaboration after discovering that we had independently generated microarray datasets of an E. coli rifampicin timecourse. The data analyzed here are those generated by Rini Saxena. I carried out the data analysis using GAPS, as well as other programs I developed specifically for RNA degradation analysis, written in Perl, Matlab, and Mathematica. I was also primarily responsible for writing the manuscript for publication.

Abstract

Sub-genic resolution oligonucleotide microarrays were used to study global RNA degradation in wild-type Escherichia coli MG1655. RNA chemical half-lives were measured for 1,036 open reading frames (ORFs) and for 329 known and predicted operons. The half-life of total mRNA was 6.8 minutes under the conditions tested. Furthermore, we observed significant relationships between gene functional assignments and transcript stability. Unexpectedly, transcription of a single operon (tdcABCDEFG) was relatively rifampicin insensitive and showed significant increases 2.5 minutes after rifampicin addition. This supports a novel mechanism of transcription for the tdc operon, whose promoter lacks any recognizable sigma binding sites. Probe-by-probe analysis of all known and predicted operons showed that the 5' ends of operons degrade, on average, more quickly than the rest of the transcript, with stability increasing in a 3' direction, supporting and further generalizing the current model of a net 5' to 3' directionality of degradation. Hierarchical clustering analysis of operon degradation patterns revealed that this pattern predominates but is not exclusive. We found weak but highly significant correlation between the degradation of adjacent operon regions, suggesting that stability is determined by a combination of local and operon-wide stability determinants.
The 16-ORF dcw gene cluster, which has a complex promoter structure and a partially characterized degradation pattern, was studied at high resolution, allowing a detailed and integrated description of its abundance and degradation. We discuss the application of sub-genic resolution DNA microarray analysis to study global mechanisms of RNA transcription and processing.

Introduction

Gene regulation is a dynamic process which can be controlled by a number of mechanisms as genetic information flows from nucleic acids to proteins. The study of gene regulation in the steady state, while informative, overlooks the underlying dynamics of the processes. Steady state transcript levels are a result of both RNA synthesis and degradation, and as such, measurements of degradation rates can be used to determine rates of synthesis (if the steady state levels are known) as well as to reveal regulation which occurs via changes in RNA stability. For the genetic regulatory network of E. coli to be understood and eventually modeled, all means of regulation in use by the cell must be given due attention.

RNA degradation in eubacteria was once viewed as a non-specific, unregulated process. Today it is known to involve multiple degradation pathways and a multisubunit protein complex (the degradosome), and to be an important regulatory mechanism for the expression of some genes. For reviews see (Grunberg-Manago 1999; Regnier and Arraiano 2000; Rauhut and Klug 1999). A small number of large-scale RNA degradation analyses have recently been reported in budding yeast (Wang et al. 2002), humans (Lam et al. 2001), and E. coli (Bernstein et al. 2002).

RNA expression analysis with DNA microarrays has allowed transcription to be studied at an unprecedented scale. Nevertheless, the potential of the technology to elucidate the low-level details of the transcription and processing of RNA has been poorly explored.
In this study we have taken a first step by identifying global RNA degradation patterns at the operonic, genic, and subgenic levels. High-density oligonucleotide arrays from Affymetrix were used to study the degradation of RNA over essentially the entire transcriptome of Escherichia coli MG1655 (Selinger et al. 2000). These arrays have subgenic-resolution coverage of the genome (both coding and non-coding regions), allowing us to examine transcription and degradation in a relatively continuous and unbiased manner.

We present RNA half-life measurements for 1,036 open reading frames (ORFs) and for 329 known and predicted operons. We present significant over- and under-representation of ORF functional categories in the set of most labile RNAs. We identify an unusual rifampicin-insensitive promoter (of the tdc operon) and strengthen the case for its transcription by a novel mechanism. We present evidence for the higher lability of the 5' ends of operons relative to their 3' ends, supporting the current model of an overall 5' to 3' direction of degradation. Finally, we explore positional patterns of RNA degradation and discuss the current state of the art of high-resolution global transcription analysis.

Results and Discussion

Half-life determination. For the determination of half-lives all experiments were done in triplicate for each RNA preparation. On average, 23% of the genes were detected at 2.33 standard deviations (99% confidence) above the negative control probe sets. Half-lives were calculated for 1,036 ORFs, of which 479 were calculated exactly and 557 represent upper bounds. Average half-lives were calculated for 329 known and predicted operons (see Methods) (Tables 1 and 2), although these are only a rough approximation as typically only a subset of the ORFs had measurable half-lives, and there can be considerable differences between the degradation of different operonic regions.
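The operon averaging described above can be sketched as follows. This is a minimal illustration, not the GAPS implementation: the function name and the dictionary layout are assumptions made for the example. The fic and yhfG values are taken from Table 1, and treating pabA as undetermined is an assumption inferred from the operon average listed in Table 2.

```python
from statistics import mean

def operon_average_half_life(orf_half_lives):
    """Mean half-life (minutes) over member ORFs with a determined value.

    orf_half_lives maps ORF name -> half-life in minutes, or None when no
    half-life could be determined (hypothetical input layout).
    Returns None if no member ORF has a determined half-life.
    """
    measured = [hl for hl in orf_half_lives.values() if hl is not None]
    return mean(measured) if measured else None

# fic and yhfG half-lives are from Table 1; pabA is assumed undetermined.
example_operon = {"pabA": None, "fic": 1.4, "yhfG": 1.3}
print(operon_average_half_life(example_operon))
```

Because the mean covers only the measured members, an operon average can rest on as few as one ORF, which is why these values are described as a rough approximation.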
After addition of rifampicin, which prevents initiation of new transcripts by binding to the β subunit of RNA polymerase (Campbell et al. 2001), the total intensity for all mRNAs decreases exponentially with time (R = 0.98) with an estimated overall chemical half-life of 6.8 minutes. This is in rough agreement with a recently reported half-life of 7.5 minutes for total pulse-labeled RNA in comparable conditions (Mohanty and Kushner 1999). Although absolute decay rates are known to vary appreciably across experiments, especially those determined in different laboratories, we observe qualitative agreement with some well-studied transcripts, such as ompA, a very stable RNA in fast-growing cells (Nilsson et al. 1984) (see Methods), and cspA, an extremely unstable one which is transiently stabilized upon cold shock (Goldenberg et al. 1996) (Table 1).

Genes encoding enzymes known to be involved in RNA decay, such as pnp, rhlB, and rho, show exponential decay patterns starting immediately after rifampicin treatment. The genes rne and rnc also show progressive decay patterns but were expressed at relatively low levels, making half-life measurement difficult. The genes rnb and pcnB were undetected throughout the time-course.

Average operon half-lives were calculated by taking the mean of the operons' member ORFs for which half-lives had been determined. A number of the most unstable operons (Table 2) enable metabolism that is presumably unnecessary in rich media, such as amino acid biosynthesis (thr, cad), alternative carbon source catabolism (lac, sdh), and nucleotide biosynthesis (deo). It would be interesting to see whether these transcripts are more stable in minimal media.

Discovery of a rifampicin-insensitive promoter. Surprisingly, a single operon, tdcABCDEFG, which encodes a pathway for the transport and anaerobic degradation of L-threonine, was relatively rifampicin insensitive.
All seven ORFs of this operon were significantly upregulated at 2.5 minutes after rifampicin addition. After their initial increase at 2.5 minutes, the ORFs of the tdc operon show either gradual decay or stability through the 5- and 10-minute time-points, followed by near-complete degradation by the 20-minute time-point (data not shown). Because rifampicin targets the core of the only RNA polymerase (RNAP) in E. coli, we were initially surprised to find an operon which could still be transcribed after rifampicin addition. However, differential sensitivity to rifampicin by RNAP holoenzyme containing different sigma subunits (σ70 vs. σ32) has been previously observed (Wegrzyn et al. 1998), suggesting that certain holoenzymes may be rifampicin insensitive. Furthermore, the tdc promoter is unusual in that it does not contain any recognizable sigma binding sites, but does contain sites for a number of transcription factors, including CRP, IHF, FNR, LysR, TdcA, and TdcR. It has also been suggested that the tdc promoter is controlled by a novel mechanism and can be activated by altering its local topology (Wu and Datta 1995; Sawers 2001).

RNA decay related to function. To determine whether transcripts whose gene products participate in the same cellular processes tended to be degraded at the same rates, we looked at the over- and under-representation of 23 gene functional categories (Blattner et al. 1997) within different half-life ranges (Table 3). P-values were calculated using the cumulative hypergeometric distribution and a 95% confidence level was used as a cutoff (Tavazoie et al. 1999).

In the set of short-lived (≤ 5 minutes) transcripts, genes annotated as putative enzymes were significantly over-represented. Rapidly degraded transcripts are good candidates for regulation via RNA stability and many of these may be transiently stabilized in some environmental condition in which they are needed.
The instability of their transcripts, and likely low protein levels, may have been a hindrance to their discovery and/or characterization. Genes involved in translation and post-translational modification were significantly under-represented among short-lived (≤ 5 minutes) transcripts, reflecting the known stability of the cell's translational machinery.

Genes involved in energy metabolism were significantly over-represented among transcripts with intermediate half-lives of between 10 and 20 minutes. The genes in this category are, in general, well studied and are regulated by a variety of mechanisms unrelated to RNA stability, although in most cases regulation via transcript stability has not been ruled out.

To assess whether our experiment preferentially measured the half-lives of some groups of genes relative to others, we looked for differential representations of genes with measured half-lives relative to all genes on the array. Genes whose half-lives could be determined in our experiment were significantly over-represented for those involved in translation and post-translational modification, which are generally very highly expressed and easy to detect. Those classified as "Hypothetical, unclassified, unknown" or as putative transport proteins were significantly under-represented, suggesting that both of these classes in general are expressed at a very low level and/or may contain a number of spuriously predicted ORFs. These two uncharacterized groups stand in contrast to putative enzymes and putative regulatory proteins, which were detected at a rate indistinguishable from other groups.

5' to 3' directionality of degradation. RNA is degraded within the cell by the combined action of RNA exo- and endonucleases. The precise way in which this process occurs has been a subject of intense study (Grunberg-Manago 1999; Regnier and Arraiano 2000). Stable 5' secondary structures have been shown to confer stability on downstream sequences (Emory et al.
1992), while 3' polyadenylation targets transcripts for degradation (Sarkar 1997). To investigate whether degradation is targeted preferentially towards the 5' or 3' end of the mRNA, we measured the variability of degradation rates at different positions of predicted and known operons containing at least 2 ORFs. Each operon coding region was divided into 3 equal regions (5', middle, and 3'), while 30 bases upstream and downstream of the operons were denoted 5' and 3' UTRs, respectively. The UTR was chosen to be relatively short to increase the probability that it was in fact cotranscribed with the operon. The average log2 ratio of probes in each region was calculated for each operon (see Methods). Log2 ratios of each region were averaged for all operons, as well as for subsets with specified half-lives, to compare the degradation rates of different transcript regions (Fig. 1). In the set of all operons, the log2 ratios were most negative for the 5' UTR and became less negative in a 5' to 3' direction, consistent with a predominantly 5' to 3' directional mechanism of degradation.

To determine whether positional patterns varied depending on overall stability, operons were grouped based on their average half-lives (Fig. 1). The same trend of stability increasing toward the 3' end was seen for all groups, regardless of overall half-life. This trend was most consistent for the 20-40 minute operons, whereas for the <5 minute and 5-20 minute operons there were some discrepancies at their 5' ends, especially at the later time-points.

To assess the significance of the differential degradation rates we used a one-way ANOVA to test whether the differences between average degradation rates of different operon regions could be accounted for by chance. Significant differences between regional mean degradation rates were found for almost all time-points in all half-life sets using α = 0.05 or 0.10, as detailed in the Figure 1 legend.
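As a rough sketch of the test just described, the one-way ANOVA F statistic compares the variance between regional means with the variance within regions. The function name and the example log2 ratios below are hypothetical, and only the F statistic is computed; turning it into a p-value requires the F distribution (for instance via a statistics library such as SciPy's `scipy.stats.f_oneway`), which this standard-library sketch omits.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups.

    Each group holds the per-operon average log2 ratios for one operon
    region at a single time-point. Assumes at least two groups and a
    non-zero within-group variance.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical log2 ratios for three regions (not data from this study).
regions = [[-3.0, -2.0, -1.0], [-2.0, -1.0, 0.0], [1.0, 2.0, 3.0]]
f_stat, df_b, df_w = one_way_anova_f(regions)
print(f_stat, df_b, df_w)
```

A large F relative to the F distribution with (df_between, df_within) degrees of freedom indicates that the regional means differ by more than chance would explain.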
The results for the analysis of all 835 operons were especially significant, with all p-values below 1 × 10^-12. We conclude that the observed variation in the rate of degradation of different operonic regions is significant.

Clustering of degradation patterns. It is important to note that while the 5' to 3' directionality illustrated by Figure 1 indicates that, in general, the 5' ends of operons are degraded more quickly than their 3' ends, it does not indicate whether this is the only pattern of operon degradation, or simply the most common one. To distinguish between these two possibilities the degradation patterns of all operons were clustered using a hierarchical clustering algorithm and displayed as a tree (Eisen et al. 1998) (Fig. 2). 149 known and predicted operons for which complete data were available were divided into 5 operon regions: 5' and 3' UTR (representing 30 bases up- and downstream of the translation start and stop, respectively), and equal-length 5', middle, and 3' coding regions. Within each operon, each region was ranked from most stable (5) to least stable (1) based on the average log2 ratio of oligos in that region at each time-point. This within-operon normalization allows operons with similar patterns to be grouped together regardless of their overall rate of degradation.

The results of the clustering analysis indicate that while there is a clear predominance of a 5' to 3' degradation pattern, other patterns are also present. Nevertheless, the degradation ranks for each region, when averaged over all operons, show a clear trend consistent with an overall 5' to 3' directionality of degradation. To assess the statistical significance of the observed directionality we performed a χ2 goodness of fit test on each transcript region. We are easily able to reject the null hypothesis that each region has an equiprobable distribution of ranks, with p-values ranging from 2 × 10^-6 to 2 × 10^-38 (Fig. 2).
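The goodness of fit test against an equiprobable rank distribution can be illustrated with a short standard-library sketch. The function names and the example rank counts are hypothetical (they are not the counts behind Fig. 2), and the closed-form tail probability used here holds only for an even number of degrees of freedom, which covers the case of five ranks (4 degrees of freedom).

```python
import math

def chi_square_uniform(observed):
    """Chi-square statistic for observed counts against equal expected counts."""
    total = sum(observed)
    expected = total / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)

def chi_square_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable.

    Uses the closed form exp(-x/2) * sum_{i<df/2} (x/2)^i / i!,
    which is valid only when df is a positive even integer.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Hypothetical counts of ranks 1..5 for one operon region.
rank_counts = [50, 40, 30, 19, 10]
stat = chi_square_uniform(rank_counts)
p_value = chi_square_sf_even_df(stat, df=len(rank_counts) - 1)
print(stat, p_value)
```

A small p-value means the ranks of that region are unlikely to be equiprobable; the direction of the skew still has to be read from the rank distribution itself.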
From inspection of the rank distributions we conclude that 5' regions of operons are significantly more likely to be degraded quickly and 3' regions more likely to be degraded slowly.

Because certain transcript features, such as the ompA stabilizer (Emory et al. 1992), are known to exert their effects along an entire transcript, we analyzed the extent to which the degradation of one region is correlated to other regions. The average Pearson's linear correlation coefficient (R) between the degradation of adjacent regions was 0.38, and the average correlation between any two operon regions was 0.26. These weak, but statistically significant (p < 0.005), correlations suggest that while there are important operon-wide determinants of stability, local determinants may play a larger role in the stability of RNAs. This emphasizes the need to scrutinize transcription and degradation at a higher level of resolution.

It should be noted that despite the difficulties of defining transcript boundaries, as well as the existence of operons with multiple promoters and terminators, we were still able to identify significant patterns. As our knowledge of these confounding factors increases we may expect to see even clearer patterns emerge.

High-resolution analysis of the dcw gene cluster. The dcw gene cluster, important for cell envelope biosynthesis and cell division, contains 16 ORFs and has a complex promoter structure (Fig. 3) (Vicente et al. 1998; Dewar and Dorazi 2000). It is transcribed mainly from two clusters of promoters located at the 5' end (~ORFs 1-3), and near the 3' end (ORFs 12-14). We observe a complex degradation pattern for this operon, with 3 primary domains of stability (Figs. 3, 4). The 5' end is degraded most rapidly, consistent with the most commonly observed pattern. The central region is relatively stable from murE to murC.
The 3' end, from ddlB to envA, has an intermediate stability, with ftsA and ftsZ having nearly identical half-lives, as has been reported previously (Cam et al. 1996). These domains of stability roughly coincide with the clusters of promoters, suggesting they represent somewhat independent units which the cell chooses to regulate simultaneously by both transcriptional initiation and degradation.

Interestingly, the relatively high signal intensity at mraZ and ddlB corresponds to the positions of the two major promoters Pmra and ftsQ2p1p, respectively (Flardh et al. 1997; Mengin-Lecreulx et al. 1998) (Fig. 4). This suggests that the regions downstream of these promoters are maintained at higher steady-state RNA levels in the cell, although we are cautious about making a firm conclusion in this regard due to the only semi-quantitative nature of the relationship between microarray signal intensity and absolute RNA abundance. Nevertheless, this observation is consistent with previous measurements which show that about one-third of the transcription of ftsZ originates at promoters located within and between ddlB and ftsA, with the other two-thirds originating upstream of ddlB (de la Fuente et al. 2001; Flardh et al. 1998).

The future of high-resolution transcriptome analysis. The type of transcriptome data presented here enables genome-wide analyses which until now have only been done on a small scale. For example, the relationship between RNA degradation and RNA sequence features, such as RNase sites and known and predicted secondary structures, can be assessed, as well as the effects of mutations, especially to the RNA degradation machinery. These data are also useful in the empirical definition of transcription boundaries (Selinger et al. 2000; Tjaden et al. 2002) and promoter usage. We expect such high-resolution analyses to increase in precision.
Probe-to-probe variation, which can mask local changes in RNA abundance, can be reduced by smoothing or, perhaps, by more sophisticated model-based (Li and Hung Wong 2001) or correlation-based methods (Cohen et al. 2000). High-resolution mapping of human exon boundaries using oligonucleotide arrays has also been reported (Kapranov et al. 2002; Shoemaker et al. 2001). Microarrays could be designed with probes more evenly spaced throughout the ORFs and the intergenic regions to allow more comprehensive coverage of the transcriptome. The continually increasing density of oligonucleotide arrays suggests that transcriptome data, and our resulting understanding of transcriptional regulation, will increase not only in scope, but also in detail.

Methods

Growth of bacterial strains and transcript inhibition. E. coli wild-type strain MG1655 was grown in LB broth medium in shaken flasks at 37°C to mid-logarithmic phase (A600 = 0.8) and then split into five flasks of 20 ml each. To initiate transcription inhibition, four of these samples were treated with rifampicin (Sigma, St. Louis, MO) at a concentration of 50 µg/ml and incubated for an additional 2.5, 5, 10, and 20 minutes respectively, followed by immediate harvesting of the cells. The fifth sample was used as a control and cells were harvested immediately (at time-point zero). All RNA isolation procedures were accomplished with the MasterPure Complete DNA and RNA Purification kit from Epicentre Technologies, Madison, WI, as described previously (Rosenow et al. 2001).

RNA labeling and hybridization. The cDNA synthesis method was described previously (Rosenow et al. 2001). Briefly, 10 µg of total RNA was reverse transcribed using the Superscript II system for first strand cDNA synthesis from Life Technologies (Rockville, MD). The remaining RNA was removed using 2 U RNase H (Life Technologies, Rockville, MD) and 1 µg RNase A (Epicentre, Madison, WI) for 10 min at 37°C in 100 µl total volume.
The cDNA was purified using the Qiaquick PCR purification kit from Qiagen (Valencia, CA). Isolated cDNA was quantitated based on the absorption at 260 nm and fragmented using a partial DNase I digest. The fragmented cDNA was 3' end-labeled using terminal transferase (Roche Molecular Biochemicals, Indianapolis, IN) and biotin-N6-ddATP (DuPont/NEN, Boston, MA). The fragmented and end-labeled cDNA was added to the hybridization solution without further purification. Three microarray hybridizations were carried out for each time-point.

Chip Scaling, Transcript Detection. To account for experimental and chip variations, all intensities were normalized according to the variations of the cRNA controls, which were added before the RNA labeling reaction and contain 4 probe sets targeting RNAs not present in the E. coli genome. The controls show a variation of less than 10% before scaling for all 15 labeling reactions (data not shown). Transcript abundances for each RNA were calculated in GAPS© by taking a mean of the perfect match (PM) minus mismatch (MM) probes, after removing the highest and two lowest (2-13max) (Selinger et al. 2000), and are referred to here simply as "average difference" (AD) (Lockhart et al. 1996). Each RNA is typically targeted by 15 unique oligonucleotide probe pairs. A transcript was considered "detected" if it was 2.33 standard deviations (99% confidence) above the negative controls (90 probe sets for genes not present in the MG1655 genome). For the five time-points (0, 2.5, 5, 10, and 20 minutes) mRNA detection rates were 24, 27, 27, 18, and 6 percent, respectively, with detection cutoffs of 1766, 1014, 975, 1202, and 1327 AD units. The mean of the negative controls has been subtracted from all reported values so that values greater than 0 signify an average difference greater than the negative controls. For high-resolution analysis (including the directionality analysis) we calculated log2 ratios as log2((PM-MM of time t)/(PM-MM of time 0)).
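The per-probe log2 ratio defined above, together with the time-zero intensity filter stated in the next sentence, can be sketched as below. The function name and input values are hypothetical. Skipping probe pairs whose PM-MM at time t is zero or negative is an assumption made so that the logarithm is defined; the text does not say how such probe pairs were handled.

```python
import math

def probe_log2_ratios(pm_mm_t0, pm_mm_t, min_baseline=100.0):
    """log2((PM-MM at time t) / (PM-MM at time 0)) for usable probe pairs.

    pm_mm_t0 and pm_mm_t are parallel lists of PM-MM differences for the
    same probe pairs at time 0 and at time t.
    """
    ratios = []
    for baseline, later in zip(pm_mm_t0, pm_mm_t):
        if baseline <= min_baseline:
            continue  # filter from the text: baseline must exceed 100 units
        if later <= 0:
            continue  # assumption: undefined log2 ratios are dropped
        ratios.append(math.log2(later / baseline))
    return ratios

# Hypothetical PM-MM values for four probe pairs (not data from this study).
time_zero = [400.0, 50.0, 1600.0, 200.0]
time_t = [100.0, 30.0, 400.0, -20.0]
print(probe_log2_ratios(time_zero, time_t))
```

Averaging the returned values over the probes falling in one operon region gives the regional average log2 ratio used in the directionality analysis.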
We only used probe pairs in which PM-MM at time 0 was greater than 100 normalized fluorescent units.

RNA Chemical Half-life Determination. Probe pairs (perfect match - mismatch) were averaged over the triplicates of each time-point (0, 2.5, 5, 10 and 20 minutes after rifampicin addition), resulting in an average probe set intensity for each ORF. RNA abundances were determined using the average difference metric implemented by GAPS©. Chemical half-life was determined for each RNA by the following "two-fold" algorithm: i) The earliest time-point at which the transcript was detected was used as the baseline abundance. ii) The earliest successive time-point for which a two-fold decrease was detected was used as the experimental abundance and the half-life was calculated assuming exponential decay. When the baseline but not the experimental time-point was detected the half-life was estimated (yielding an upper-bound estimate) using the noise value in place of the experimental value. Other categories were defined, such as "stable" (transcript is detected but no change as great as two-fold observed), "possible increase" (a minimum two-fold change between any two time-points), "erratic" (both a two-fold increase and decrease observed), and "possibly stable" (at least a two-fold decrease observed, but later returns to baseline level).

Slot blots for 4 genes were carried out as a validation of the array-measured RNA half-lives and gave the following results (slot blot/array): ompA 20.2 min/stable; cspC 17.2 min/possibly stable; fldA 10 min/6.7 min; sodA 9.5 min/6.9 min.

Half-lives were alternatively calculated by fitting an exponential decay curve to all time-points, regardless of fold change or signal-to-noise thresholds.
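One possible reading of the "two-fold" algorithm is sketched below. It is not the code used for the analysis: the function name is hypothetical, the "noise value" is assumed to be the detection cutoff of the corresponding time-point, and the additional categories ("stable", "erratic", and so on) are not modeled. The signal values for yjfN and cspA are from Table 1 and the cutoffs are those listed under Chip Scaling, Transcript Detection.

```python
import math

def two_fold_half_life(times, signals, cutoffs):
    """Return (half_life_minutes, is_upper_bound), or None.

    None is returned when the transcript is never detected or when no
    two-fold decrease from the baseline is found.
    """
    # i) Baseline: earliest time-point at which the transcript is detected.
    base = next((i for i, (s, c) in enumerate(zip(signals, cutoffs)) if s >= c),
                None)
    if base is None:
        return None
    # ii) Earliest later time-point showing at least a two-fold decrease.
    for j in range(base + 1, len(times)):
        detected = signals[j] >= cutoffs[j]
        # Assumption: an undetected signal is replaced by the cutoff (noise value).
        value = signals[j] if detected else cutoffs[j]
        if value <= signals[base] / 2.0:
            elapsed = times[j] - times[base]
            # Exponential decay: N(t) = N0 * 2^(-t / half_life).
            half_life = elapsed * math.log(2) / math.log(signals[base] / value)
            return half_life, not detected
    return None

times = [0, 2.5, 5, 10, 20]
cutoffs = [1766, 1014, 975, 1202, 1327]
yjfN = [8782, 744, 67, -41, -270]
cspA = [20403, 4696, 3056, 1556, 100]
print(two_fold_half_life(times, yjfN, cutoffs))
print(two_fold_half_life(times, cspA, cutoffs))
```

Under these assumptions the sketch gives an upper bound for yjfN and an exact estimate for cspA, in line with how the two are flagged in Table 1. The alternative mentioned above instead fits a single exponential decay curve through every time-point.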
This approach was deemed inferior to the two-fold algorithm because it gave considerably poorer agreement with slot blot data, showed less sensitivity to rapidly degrading transcripts, and gave spurious results for RNAs whose signal dropped below the detection threshold at later time-points (data not shown). Average half-lives were calculated for predicted and observed operons from RegulonDB (Salgado et al. 2001) by taking a mean for all operon members whose half-life had been determined. Half-lives with estimated upper bounds of greater than 40 minutes were set equal to 40 minutes to avoid skewing the results. The complete list of transcripts, calculated half-lives (of both ORFs and operons), and pattern categories are available at http://arep.med.harvard.edu/rna_decay/. The dataset was also deposited in ExpressDB (Aach et al. 2000) at http://arep.med.harvard.edu/ExpressDB/.

[Figure 1 plot. Five panels: All Operons (n=835), <5 min Operons (n=82), 5-20 min Operons (n=166), 20-40 min Operons (n=81), and HL Not Determined (n=506). x-axis: operon region (5p UTR, Op 5p, Op M, Op 3p, 3p UTR); y-axis: Average Log2 Ratio; series: 2.5, 5, 10, and 20 min.]

Figure 1. Positional differences in operon degradation. Operon regions are plotted on the x-axis, average log2 ratios (compared to the 0 minute time-point) are plotted on the y-axis. Vertical bars indicate standard error. Operons were divided into 5 regions: 30 bases upstream (5p UTR) and downstream (3p UTR), and three equal length regions of the coding region: 5 prime (Op 5p), middle (Op M), and 3 prime (Op 3p). Patterns of operons with different average half-lives were compared.
A 5' to 3' directionality is observable in the coding regions of all operon subsets. This directionality generally extends at least 30 bases into the UTRs, although the 5' UTR of quickly degrading operons (<5 min) seems to be more stable than the coding region. All curves in this figure have statistically significant variation between means by one-way ANOVA at α = 0.001, with the following exceptions: the 2.5 minute curve of the '20 - 40 min' graph, and the 5 and 20 minute curves of the 'half-life not determined' graph, which were significant at α = 0.05, 0.05, and 0.10, respectively. P-values for time-points on the 'all operons' graph were all below 1 × 10^-12.

Figure 2. Whole genome cluster analysis of operon degradation (following page). The degradation patterns of 149 operons (containing 2 or more ORFs, and oligo probes in all targeted regions) were hierarchically clustered after ranking the relative degradation rate of each region. The algorithm was implemented using the GeneCluster/TreeView package (Eisen et al. 1998). Transcript regions are on the x-axis, with each region split into 2.5, 5, 10, and 20-minute time-points. The average rank increases from 5' to 3', supporting a predominant 5' to 3' directionality of degradation (cluster c). The clustering also reveals that a variety of degradation patterns are present, such as operons with relatively stable 5' UTRs (cluster a). One group of operons (cluster b) is initially degraded most quickly at its 3' UTR at 2.5 and 5 minutes, but then by the 10 minute time-point is more quickly degraded at its middle and 3' coding regions. χ2 goodness of fit tests show that the distributions of degradation ranks are highly non-random, with 5' regions more likely to be degraded quickly and 3' regions more likely to be degraded slowly. The complete clustering file, including gene names, is available at http://arep.med.harvard.edu/rna_decay/.
[Figure 2 image. Clustered heat map with columns for the five transcript regions (5' UTR, 5', M, 3', 3' UTR) and marked clusters a, b, and c. Average rank by region: 5' UTR 2.4, 5' 2.8, M 2.9, 3' 3.4, 3' UTR 3.4. Rank distributions run from Unstable (1) to Stable (5); χ2 p-values for the five regions range from 2 × 10^-38 to 2 × 10^-6.]

Figure 2

[Figure 3 plot. x-axis: Position (kb); y-axis: Average Log2 Ratio; series: 2.5, 5, 10, and 20 min. ORFs with estimated half-lives (min), in order of transcription: mraZ 1.8, mraW 5<n<10, ftsL 6.7, ftsI 4.7, murE 10<n<20, murF N/D, mraY 10<n<20, murD N/D, ftsW >20, murG >20, murC >20, ddlB 8.9, ftsQ 10, ftsA 9.2, ftsZ 8.9, envA 8.3.]

Figure 3. High-resolution analysis of the dcw Gene Cluster Transcripts. Average log2 ratios (y-axis) were plotted against operon position (x-axis) for 3-probe sliding windows of the 156 probes in the dcw gene cluster, including 30 bases up- and downstream of the first and last ORFs. The positions of known promoters (○) and the ORFs with their estimated half-lives are given in the upper part of the graph. Arrows indicate known RNase E processing sites. An additional weak promoter is thought to be present in either murD or ftsW (Mengin-Lecreulx et al. 1998). Rho-independent terminators are present in the 5' region of mraZ and immediately downstream of envA. Degradation is fastest at the 5' end of the operon, with three apparent regions of distinct degradation rates: the 5' region (mraZ-ftsI) is degraded the fastest, the middle (murE-murC) is relatively stable, and the 3' region (ddlB-envA) is degraded at an intermediate rate.

[Figure 4 plot. x-axis: the 16 ORFs of the dcw gene cluster; y-axis: Signal Intensity (AD); series: 0, 2.5, 5, 10, and 20 min.]

Figure 4. Transcript Abundance and Degradation of the dcw Gene Cluster. The dcw gene cluster contains 16 ORFs involved in cell envelope biosynthesis and cell division. Several promoters have been described (see Fig. 3) and it is likely that they are all used, to varying extents.
It has also been speculated that the cluster may sometimes be transcribed in its entirety. The ORFs have been plotted in the order they are transcribed, showing their array signal intensities (average differences) throughout the time-course. Although average difference is only an approximate indicator of transcript abundance, relatively high levels of steady state RNA are observed downstream of the mraZ and ddlB promoters, at the 5' end and about two-thirds of the way into the transcript, respectively. The middle portion of the operon has lower steady state RNA levels and is degraded more slowly (see Fig. 3).

B#     Name     HL    0 min  2.5 min  5 min  10 min  20 min
b4188  yjfN     0.8*  8782   744      67     -41     -270
b3605  lldD     0.9*  7031   770      1424   221     -109
b3914  cpxP(2)  1.0   10398  1812     3530   997     -11
b0990  cspG     1.1   6302   1324     935    389     105
b3913  cpxP(1)  1.1   10811  2352     3506   790     27
b0553  nmpC     1.2   4704   1147     742    187     -183
b2398  yfeC     1.2   5062   1218     614    122     -107
b3494  uspB     1.2*  4330   870      366    -33     -191
b3556  cspA     1.2   20403  4696     3056   1556    100
b3685  yidE     1.2*  4373   651      722    202     -150
b0726  sucA     1.3   4699   1236     1001   701     -161
b1205  ychH     1.3   11630  2964     692    67      -171
b3362  yhfG     1.3*  3959   884      1014   556     -114
b0162  cdaR     1.4*  3366   859      473    105     520
b1060  yceP     1.4   11780  3294     5286   1886    1374
b2080  yegP     1.4   5355   1567     2914   1769    922
b2377  yfdY     1.4   6525   1880     1189   944     53
b3361  fic      1.4   6270   1888     2035   1267    39
b4132  cadB     1.4   6923   2019     3046   2287    238
b4396  rob      1.4   4685   1339     734    -32     -322

Table 1. 20 most labile mRNAs. The twenty most labile mRNAs with their half-lives (HL, in minutes) and their average difference (AD) intensities at each time-point. 12 out of 20 have unknown or putative functions. High lability may be an indication of regulation at the level of RNA stability. This is known to be the case for cspA, which is extremely unstable at 37°C but transiently stable after a shift to 15°C (Goldenberg et al. 1996). The lability of cspG suggests that it may behave similarly. Numbers shaded in grey are below the 99% confidence detection threshold (see Methods). *Half-life represents an upper bound.
Avg. HL   Operon
1.35      pabA fic yhfG
1.35      yfeC yfeD
1.65      cadA cadB cadC
1.75      deoC deoA deoB deoD
1.95      yhcH yhcI nanE nanT
2.05      ynfB speG
2.1       thrL thrA thrB thrC
2.2       sdhC sdhD sdhA sdhB
2.2       yjbQ yjbR
2.35      lacA lacY lacZ
2.4       folX yfcH
2.45      ybjC mdaA
2.47      nagD nagC nagA nagB

Table 2. Operons with average half-lives ≤ 2.5 minutes
A number of these unstable operons enable biosynthesis that is presumably unnecessary in rich media, such as amino acid biosynthesis (thr, cad), alternative carbon sources (lac, sdh), and nucleotide biosynthesis (deo). Underlining indicates half-lives used in the average.

Functional Category                              Experimental Group        Rep.    p-value
Putative enzymes                                 HL <= 5 min               over    6.5 x 10^-5
Translation and posttranslational modification   HL <= 5 min               under   1.8 x 10^-5
Energy metabolism                                10 min < HL < 20 min      over    5.4 x 10^-3
Translation, post-translational modification     ORFs with measured HLs    over    1.8 x 10^-23
Hypothetical, unclassified, unknown              ORFs with measured HLs    under   1.3 x 10^-8
Putative transport proteins                      ORFs with measured HLs    under   2.8 x 10^-5

Table 3. Functional category representation of half-life groups
Several half-life groupings (HL ≤ 5 min, 5 < HL ≤ 10, 10 < HL ≤ 20, and HL > 20) were tested for over- or under-representation of 23 different functional categories (Blattner et al. 1997) relative to all genes whose half-lives were estimated. Categories were also identified which were over- or under-represented in the set of all ORFs with measured half-lives. P-values were calculated using the cumulative hypergeometric distribution (Tavazoie et al. 1999). A 95% confidence level was achieved using a cutoff of 2.2 x 10^-3 to account for multiple hypotheses. Transcripts displaying no preference were those encoding proteins involved in transport and binding, structure and membrane proteins, carbon compound metabolism, amino acid- and nucleotide biosynthesis and metabolism, and central intermediary metabolism, transcription, and post-transcriptional regulation.
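The enrichment test named in this caption can be sketched as follows. This is a minimal illustration of a cumulative hypergeometric tail probability, not the code used for the thesis; the function names and the toy counts are hypothetical. The quoted cutoff of 2.2 x 10^-3 is numerically consistent with dividing 0.05 by the 23 categories tested (a Bonferroni-style adjustment), which is an inference from the numbers rather than something the caption states.

```python
from math import comb  # requires Python 3.8 or later


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) when `draws` items are taken without replacement
    from `population` items, `successes` of which belong to the category."""
    first = max(observed, 0)
    last = min(draws, successes)
    favorable = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(first, last + 1)
    )
    return favorable / comb(population, draws)


def hypergeom_lower_tail(population, successes, draws, observed):
    """P(X <= observed), the tail relevant to under-representation."""
    return 1.0 - hypergeom_upper_tail(population, successes, draws, observed + 1)


def bonferroni_cutoff(alpha, n_tests):
    """Per-test significance cutoff for n_tests simultaneous hypotheses."""
    return alpha / n_tests


# Toy numbers (made up): 10 genes, 4 in the category, 3 genes in the group,
# 2 of which fall in the category.
p_over = hypergeom_upper_tail(10, 4, 3, 2)
print(f"p(over-representation) = {p_over:.4f}")
print(f"cutoff for 23 categories = {bonferroni_cutoff(0.05, 23):.1e}")
```

In this framing the population is the set of genes with estimated half-lives, the draws are the genes in one half-life group, and the successes are the genes annotated with the category being tested.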
Despite the rapid degradation of some well-studied genes (such as pnp, rhlB, and rho), as a whole, genes involved in RNA degradation were not significantly enriched in any half-life group.

Acknowledgements
We thank Sidney Kushner for advice and the provision of mutants (not used in this study). We thank Kenn Rudd and Joel Belasco for critical reviews of the manuscript. One of the authors (DS) was graciously hosted in the lab of Minoru Kanehisa for part of this work. This work was supported by grants from the NSF-MEXT Monbusho program, Lipper Foundation, NSF, and DOE.

References
Aach, J., W. Rindone, and G.M. Church. 2000. Systematic Management and Analysis of Yeast Gene Expression Data. Genome Res 10: 431-445.
Bernstein, J.A., A.B. Khodursky, P.H. Lin, S. Lin-Chao, and S.N. Cohen. 2002. Global analysis of mRNA decay and abundance in Escherichia coli at single-gene resolution using two-color fluorescent DNA microarrays. Proc Natl Acad Sci U S A 99: 9697-9702.
Blattner, F.R., G. Plunkett, 3rd, C.A. Bloch, N.T. Perna, V. Burland, M. Riley, J. Collado-Vides, J.D. Glasner, C.K. Rode, G.F. Mayhew, J. Gregor, N.W. Davis, H.A. Kirkpatrick, M.A. Goeden, D.J. Rose, B. Mau, and Y. Shao. 1997. The complete genome sequence of Escherichia coli K-12. Science 277: 1453-1474.
Cam, K., G. Rome, H.M. Krisch, and J.P. Bouche. 1996. RNase E processing of essential cell division genes mRNA in Escherichia coli. Nucleic Acids Res 24: 3065-3070.
Campbell, E.A., N. Korzheva, A. Mustaev, K. Murakami, S. Nair, A. Goldfarb, and S.A. Darst. 2001. Structural mechanism for rifampicin inhibition of bacterial RNA polymerase. Cell 104: 901-912.
Cohen, B.A., R.D. Mitra, J.D. Hughes, and G.M. Church. 2000. A computational analysis of whole-genome expression data reveals chromosomal domains of gene expression. Nat Genet 26: 183-186.
de la Fuente, A., P. Palacios, and M. Vicente. 2001.
Transcription of the Escherichia coli dcw cluster: evidence for distal upstream transcripts being involved in the expression of the downstream ftsZ gene. Biochimie 83: 109-115.
Dewar, S.J. and R. Dorazi. 2000. Control of division gene expression in Escherichia coli. FEMS Microbiol Lett 187: 1-7.
Eisen, M.B., P.T. Spellman, P.O. Brown, and D. Botstein. 1998. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A 95: 14863-14868.
Emory, S.A., P. Bouvet, and J.G. Belasco. 1992. A 5'-terminal stem-loop structure can stabilize mRNA in Escherichia coli. Genes Dev 6: 135-148.
Flardh, K., T. Garrido, and M. Vicente. 1997. Contribution of individual promoters in the ddlB-ftsZ region to the transcription of the essential cell-division gene ftsZ in Escherichia coli. Mol Microbiol 24: 927-936.
Flardh, K., P. Palacios, and M. Vicente. 1998. Cell division genes ftsQAZ in Escherichia coli require distant cis-acting signals upstream of ddlB for full expression. Mol Microbiol 30: 305-315.
Goldenberg, D., I. Azar, and A.B. Oppenheim. 1996. Differential mRNA stability of the cspA gene in the cold-shock response of Escherichia coli. Mol Microbiol 19: 241-248.
Grunberg-Manago, M. 1999. Messenger RNA stability and its role in control of gene expression in bacteria and phages. Annu Rev Genet 33: 193-227.
Kapranov, P., S.E. Cawley, J. Drenkow, S. Bekiranov, R.L. Strausberg, S.P. Fodor, and T.R. Gingeras. 2002. Large-scale transcriptional activity in chromosomes 21 and 22. Science 296: 916-919.
Lam, L.T., O.K. Pickeral, A.C. Peng, A. Rosenwald, E.M. Hurt, J.M. Giltnane, L.M. Averett, H. Zhao, R.E. Davis, M. Sathyamoorthy, L.M. Wahl, E.D. Harris, J.A. Mikovits, A.P. Monks, M.G. Hollingshead, E.A. Sausville, and L.M. Staudt. 2001. Genomic-scale measurement of mRNA turnover and the mechanisms of action of the anti-cancer drug flavopiridol. Genome Biol 2.
Li, C. and W. Hung Wong. 2001.
Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application. Genome Biol 2.
Lockhart, D.J., H. Dong, M.C. Byrne, M.T. Follettie, M.V. Gallo, M.S. Chee, M. Mittmann, C. Wang, M. Kobayashi, H. Horton, and E.L. Brown. 1996. Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat Biotechnol 14: 1675-1680.
Mengin-Lecreulx, D., J. Ayala, A. Bouhss, J. van Heijenoort, C. Parquet, and H. Hara. 1998. Contribution of the Pmra promoter to expression of genes in the Escherichia coli mra cluster of cell envelope biosynthesis and cell division genes. J Bacteriol 180: 4406-4412.
Mohanty, B.K. and S.R. Kushner. 1999. Analysis of the function of Escherichia coli poly(A) polymerase I in RNA metabolism. Mol Microbiol 34: 1094-1108.
Nilsson, G., J.G. Belasco, S.N. Cohen, and A. von Gabain. 1984. Growth-rate dependent regulation of mRNA stability in Escherichia coli. Nature 312: 75-77.
Rauhut, R. and G. Klug. 1999. mRNA degradation in bacteria. FEMS Microbiol Rev 23: 353-370.
Regnier, P. and C.M. Arraiano. 2000. Degradation of mRNA in bacteria: emergence of ubiquitous features. Bioessays 22: 235-244.
Rosenow, C., R.M. Saxena, M. Durst, and T.R. Gingeras. 2001. Prokaryotic RNA preparation methods useful for high density array analysis: comparison of two approaches. Nucleic Acids Res 29: E112.
Salgado, H., A. Santos-Zavaleta, S. Gama-Castro, D. Millan-Zarate, E. Diaz-Peredo, F. Sanchez-Solano, E. Perez-Rueda, C. Bonavides-Martinez, and J. Collado-Vides. 2001. RegulonDB (version 3.2): transcriptional regulation and operon organization in Escherichia coli K-12. Nucleic Acids Res 29: 72-74.
Sarkar, N. 1997. Polyadenylation of mRNA in prokaryotes. Annu Rev Biochem 66: 173-197.
Sawers, G. 2001. A novel mechanism controls anaerobic and catabolite regulation of the Escherichia coli tdc operon. Mol Microbiol 39: 1285-1298.
Selinger, D.W., K.J. Cheung, R. Mei, E.M. Johansson, C.S. Richmond, F.R. Blattner, D.J.
Lockhart, and G.M. Church. 2000. RNA expression analysis using a 30 base pair resolution Escherichia coli genome array. Nat Biotechnol 18: 1262-1268.
Shoemaker, D.D., E.E. Schadt, C.D. Armour, Y.D. He, P. Garrett-Engele, P.D. McDonagh, P.M. Loerch, A. Leonardson, P.Y. Lum, G. Cavet, L.F. Wu, S.J. Altschuler, S. Edwards, J. King, J.S. Tsang, G. Schimmack, J.M. Schelter, J. Koch, M. Ziman, M.J. Marton, B. Li, P. Cundiff, T. Ward, J. Castle, M. Krolewski, M.R. Meyer, M. Mao, J. Burchard, M.J. Kidd, H. Dai, J.W. Phillips, P.S. Linsley, R. Stoughton, S. Scherer, and M.S. Boguski. 2001. Experimental annotation of the human genome using microarray technology. Nature 409: 922-927.
Tavazoie, S., J.D. Hughes, M.J. Campbell, R.J. Cho, and G.M. Church. 1999. Systematic determination of genetic network architecture. Nat Genet 22: 281-285.
Tjaden, B., D.R. Haynor, S. Stolyar, C. Rosenow, and E. Kolker. 2002. Identifying operons and untranslated regions of transcripts using Escherichia coli RNA expression analysis. Bioinformatics 18 Suppl 1: S337-S344.
Vicente, M., M.J. Gomez, and J.A. Ayala. 1998. Regulation of transcription of cell division genes in the Escherichia coli dcw cluster. Cell Mol Life Sci 54: 317-324.
Wang, Y., C.L. Liu, J.D. Storey, R.J. Tibshirani, D. Herschlag, and P.O. Brown. 2002. Precision and functional specificity in mRNA decay. Proc Natl Acad Sci U S A 99: 5860-5865.
Wegrzyn, A., A. Szalewska-Palasz, A. Blaszczak, K. Liberek, and G. Wegrzyn. 1998. Differential inhibition of transcription from sigma70- and sigma32-dependent promoters by rifampicin. FEBS Lett 440: 172-174.
Wu, Y. and P. Datta. 1995. Influence of DNA topology on expression of the tdc operon in Escherichia coli K-12. Mol Gen Genet 247: 764-767.

Chapter 4 Conclusion

At the outset of this thesis, there was little doubt that microarray expression analysis, which had already been successfully applied to eukaryotes, would eventually be extended to prokaryotes.
In the course of my doctoral research I developed experimental methods generally useful in prokaryotic microarray analysis, as well as software tools which enable subgenic resolution analysis of transcription (Selinger et al. 2000). The global and high resolution nature of this approach led to a number of observations with important implications for bacterial transcription and the elucidation of RNA decay pathways. Here I consider some of these implications and speculate on future avenues of research.

Widespread antisense transcription

The detection of widespread transcription from the antisense strand (relative to known and predicted ORFs) was perhaps the most surprising result, and has been the subject of much speculation. Transcription from the antisense strand was detected for >90% of the ORFs, initially using a first strand cDNA labeling approach on stationary phase cells (Chapter 2) and later confirmed by a direct RNA labeling approach on log phase cells. The chance that this highly specific signal is an artifact of incomplete removal of gDNA was reduced by treatment of isolated cellular RNA with DNase I before labeling, and by the subsequent lack of observable genomic DNA by ethidium bromide staining. Additionally, antisense transcription was observed using two different labeling protocols, one of which involved direct labeling of RNA.

Figure 1. Hybridization of directly labeled RNA isolated from cells growing at log phase to both a standard "sense" array (left) and a reverse complement "antisense" array (right) to detect antisense transcription.

Comparison of hybridization of the same sample to both "sense" and "antisense" arrays (Fig. 1) reveals that signal on the sense strand has a wider range of intensities, indicating that the corresponding transcripts are needed in differing amounts, whereas the antisense signal tends to be relatively uniform and of lower intensity.
(Note that the intensity of the antisense array in this image was increased for clarity.) The almost universal detection of antisense transcription at a low, and relatively uniform level suggests that it may be, for the most part, transcriptional "noise" caused by the imperfect initiation and/or termination of RNA polymerase. However, it is also likely that at least some of these small RNAs have a bona fide biological function. In fact, small RNAs are receiving a large amount of attention, recently being declared the "Breakthrough of the Year" by Science (Couzin 2002), due in part to genome scale screens like the one described here. Historically, this is an interesting return to the roots of molecular biology, as Jacob and Monod initially proposed RNA as the likely trans-acting factor in the transcriptional control of operons (Jacob and Monod 1961), before the focus later shifted decisively to proteins. It is unclear precisely how many functional small RNAs exist in E. coli, although many are known (Wassarman et al. 1999) and new ones are continually being identified (Argaman et al. 2001; Eddy 2001; Wassarman et al. 2001).

There are several ways to assess the biological significance of small RNAs. CsrB, a known small untranslated RNA, was inadvertently represented in the wrong orientation on the sense array, and in a stationary phase experiment (Chapter 2, Fig. 3) was clearly detected on an antisense array at a signal intensity far above any other. This suggests that signal intensity may be a reliable indicator for biological significance. Also, the presence of upstream consensus promoter sequences, evolutionary conservation, or known RNA hairpin structures may be good indicators of biological function. Transcriptional or post-transcriptional regulation of small RNAs across timecourses, conditions, or mutants would also strongly suggest a biological function.
Our RNA degradation analysis found that ORFs annotated as "hypothetical, unclassified, unknown" had an increased likelihood of showing no change in RNA levels throughout a rifampicin timecourse (Chapter 3, Table 3), suggesting that many of these computationally predicted ORFs are erroneous. This observation, together with the possibility of widespread transcriptional noise, highlights the need for ORF- or small RNA-finding algorithms with higher levels of specificity. The widespread transcription we observe should also sound a cautionary note that many verifiably transcribed RNAs may not have a biological function, and that further tests of functionality are necessary before a firm conclusion can be drawn.

RNA decay pathways

In 1973, Apirion proposed that mRNA decay in E. coli involves both endo- and exonucleolytic events (Apirion 1973). These were assumed to be part of a ribonucleotide salvage pathway, and the possibility that RNA stability may play a role in gene regulation was not originally considered. As with many areas of molecular biology, the picture is considerably more complicated now than when it was initially envisioned. E. coli is known to contain at least 5 endo- and 8 exoribonucleases, and RNA decay is thought to be a carefully controlled process which, in many cases, plays a gene regulatory role (Grunberg-Manago 1999; Kushner 2002). Although far more attention has been paid to gene regulation at the level of transcriptional initiation, RNA stability is emerging as an important determinant of gene activity. Gene expression is a dynamic process, and we may learn a lot by paying attention to both the creation and destruction of this key intermediate in the flow of cellular information.

Before the advent of microarrays, the decay of fewer than 25 bacterial RNAs had ever been studied (Bernstein et al. 2002).
While these careful studies have led to a wealth of information about the mechanisms of RNA degradation, it has always been difficult to draw general conclusions from such a small sampling of the transcriptome. Traditionally, transcriptome-wide measurements lacked gene level detail, and more detailed analyses only applied to a small number of transcripts at a time. In the RNA degradation study described in this thesis, transcriptome-wide coverage was combined with subgenic level detail to allow a preliminary glimpse into the global patterns of RNA degradation.

RNA, being a directionally oriented linear polymer, can be degraded in many distinct ways: from the 5' end, the middle, the 3' end, or some combination thereof. Different patterns have different biological consequences, making some more intuitively favorable than others. For example, a 3' → 5' directionality would cause the degradation machinery to run against the translating ribosomes as well as allow the translation of many incomplete peptide products; both presumably poor adaptations. In contrast, a 5' → 3' directionality would allow faster inactivation of transcripts, no incomplete proteins, and would work co-directionally with translation. Thus, it has been hypothesized that most transcripts are degraded in a 5' → 3' direction, a proposal which has been shown rigorously for β-Gal mRNA (Cannistraro and Kennell 1985; Cannistraro et al. 1986; Kennell 2002; Kushner 2002). Notably, examples of 3' → 5' directionality are also known, suggesting multiple pathways of RNA decay exist (Arraiano et al. 1997; von Gabain et al. 1983). Interestingly, E. coli doesn't appear to have a 5' → 3' exonuclease, and the net 5' → 3' directionality is thought to result from an endonuclease (RNase E/G) which initiates degradation at the 5' end and then successively jumps to more 3' sites (Regnier and Arraiano 2000).
This step is thought to be closely followed by a 3' → 5' exonuclease digestion (RNase II, PNPase) which quickly degrades the resulting fragments into oligoribonucleotides, which in turn are degraded to mononucleotides by oligoribonuclease (Ghosh and Deutscher 1999).

The results presented in Chapter 3 confirm and further generalize the 5' → 3' directionality of RNA degradation in E. coli as well as identify a significant number of transcripts which do not conform to the model. Hierarchical clustering of the degradation patterns, in addition to confirming that the 5' → 3' mechanism predominates, also revealed a number of transcripts which appear to have very stable 5' UTRs (Chap. 3, Fig. 2, cluster a) and some cases in which the region most targeted by RNases changes midway through the timecourse (Chap. 3, Fig. 2, cluster b). Furthermore, although the degradation of bulk mRNA was exponential, our data suggest that the degradation of many individual transcripts is not (Appendix C).

The transcripts with unusual degradation patterns highlighted by this analysis would make excellent targets for follow-up studies, as they may be processed by alternative degradation pathways. Further subgenic resolution microarray studies using various mutants of the RNA decay pathway would give invaluable information about the genes responsible for the patterns observed. It would also be interesting to see if RNA sequence or structural motifs could be associated with particular patterns of degradation. While this has been found for individual transcripts (Emory et al. 1992), initial genome wide searches so far have been unsuccessful (Bernstein et al. 2002).

Final thoughts

Microarray analysis provides a powerful combination of exhaustiveness and detail, allowing systematic surveys of transcription and RNA decay.
The analyses described in this thesis led to the discovery of widespread antisense transcription, generated large scale kinetic data on RNA processing, and provided evidence supporting the generalization of the current model for RNA decay. The empirical mapping of transcriptional boundaries was also explored, and its utility in transcriptome mapping was later demonstrated in eukaryotic expression studies (Kapranov et al. 2002; Shoemaker et al. 2001).

Functional genomic data provide abundant raw material for hypothesis generation at both the single gene and genome wide level. Specific hypotheses can be tested either with traditional approaches, or by independent and/or more refined large scale measurements. With the increasing amount of quantitative data, hypotheses are beginning to include computational models. These models can be retested as new datasets become available, and refined as necessary. Computational modeling is slowly becoming more accessible to the average biologist and may eventually become as widespread and useful as tools like BLAST (Altschul et al. 1990). The generation, in recent years, of astounding quantities of diverse biological information, combined with the ever increasing power of computers, heralds a new age in the understanding, and perhaps engineering, of biological systems.

References
Altschul, S.F., W. Gish, W. Miller, E.W. Myers, and D.J. Lipman. 1990. Basic local alignment search tool. J Mol Biol 215: 403-410.
Apirion, D. 1973. Degradation of RNA in Escherichia coli. A hypothesis. Mol Gen Genet 122: 313-322.
Argaman, L., R. Hershberg, J. Vogel, G. Bejerano, E.G. Wagner, H. Margalit, and S. Altuvia. 2001. Novel small RNA-encoding genes in the intergenic regions of Escherichia coli. Curr Biol 11: 941-950.
Arraiano, C.M., A.A. Cruz, and S.R. Kushner. 1997. Analysis of the in vivo decay of the Escherichia coli dicistronic pyrF-orfF transcript: evidence for multiple degradation pathways. J Mol Biol 268: 261-272.
Bernstein, J.A., A.B.
Khodursky, P.H. Lin, S. Lin-Chao, and S.N. Cohen. 2002. Global analysis of mRNA decay and abundance in Escherichia coli at single-gene resolution using two-color fluorescent DNA microarrays. Proc Natl Acad Sci U S A 99: 9697-9702.
Cannistraro, V.J. and D. Kennell. 1985. Evidence that the 5' end of lac mRNA starts to decay as soon as it is synthesized. J Bacteriol 161: 820-822.
Cannistraro, V.J., M.N. Subbarao, and D. Kennell. 1986. Specific endonucleolytic cleavage sites for decay of Escherichia coli mRNA. J Mol Biol 192: 257-274.
Couzin, J. 2002. Breakthrough of the year. Small RNAs make big splash. Science 298: 2296-2297.
Eddy, S.R. 2001. Non-coding RNA genes and the modern RNA world. Nat Rev Genet 2: 919-929.
Emory, S.A., P. Bouvet, and J.G. Belasco. 1992. A 5'-terminal stem-loop structure can stabilize mRNA in Escherichia coli. Genes Dev 6: 135-148.
Ghosh, S. and M.P. Deutscher. 1999. Oligoribonuclease is an essential component of the mRNA decay pathway. Proc Natl Acad Sci U S A 96: 4372-4377.
Grunberg-Manago, M. 1999. Messenger RNA stability and its role in control of gene expression in bacteria and phages. Annu Rev Genet 33: 193-227.
Jacob, F. and J. Monod. 1961. Genetic regulatory mechanisms in the synthesis of proteins. J Mol Biol 3: 318-356.
Kapranov, P., S.E. Cawley, J. Drenkow, S. Bekiranov, R.L. Strausberg, S.P. Fodor, and T.R. Gingeras. 2002. Large-scale transcriptional activity in chromosomes 21 and 22. Science 296: 916-919.
Kennell, D. 2002. Processing endoribonucleases and mRNA degradation in bacteria. J Bacteriol 184: 4645-4657; discussion 4665.
Kushner, S.R. 2002. mRNA decay in Escherichia coli comes of age. J Bacteriol 184: 4658-4665; discussion 4657.
Regnier, P. and C.M. Arraiano. 2000. Degradation of mRNA in bacteria: emergence of ubiquitous features. Bioessays 22: 235-244.
Selinger, D.W., K.J. Cheung, R. Mei, E.M. Johansson, C.S. Richmond, F.R. Blattner, D.J. Lockhart, and G.M. Church. 2000.
RNA expression analysis using a 30 base pair resolution Escherichia coli genome array. Nat Biotechnol 18: 1262-1268.
Shoemaker, D.D., E.E. Schadt, C.D. Armour, Y.D. He, P. Garrett-Engele, P.D. McDonagh, P.M. Loerch, A. Leonardson, P.Y. Lum, G. Cavet, L.F. Wu, S.J. Altschuler, S. Edwards, J. King, J.S. Tsang, G. Schimmack, J.M. Schelter, J. Koch, M. Ziman, M.J. Marton, B. Li, P. Cundiff, T. Ward, J. Castle, M. Krolewski, M.R. Meyer, M. Mao, J. Burchard, M.J. Kidd, H. Dai, J.W. Phillips, P.S. Linsley, R. Stoughton, S. Scherer, and M.S. Boguski. 2001. Experimental annotation of the human genome using microarray technology. Nature 409: 922-927.
von Gabain, A., J.G. Belasco, J.L. Schottel, A.C. Chang, and S.N. Cohen. 1983. Decay of mRNA in Escherichia coli: investigation of the fate of specific segments of transcripts. Proc Natl Acad Sci U S A 80: 653-657.
Wassarman, K.M., F. Repoila, C. Rosenow, G. Storz, and S. Gottesman. 2001. Identification of novel small RNAs using comparative genomics and microarrays. Genes Dev 15: 1637-1651.
Wassarman, K.M., A. Zhang, and G. Storz. 1999. Small RNAs in Escherichia coli. Trends Microbiol 7: 37-45.

Appendix A
RNA expression analysis using a 30 base pair resolution Escherichia coli genome array
Douglas W. Selinger, Kevin J. Cheung, Rui Mei, Eric M. Johansson, Craig S. Richmond, Frederick R. Blattner, David J. Lockhart, and George M. Church
Original publication format, Nature Biotechnology 18(12): 1262-68 (2000).

Appendix B
Genome Array Processing Software (GAPS)

GAPS©
Genome Array Processing Software©
Developed by Doug Selinger (selinger@fas.harvard.edu)
Thesis Advisor: George M. Church, Ph.D.
Harvard Medical School, Dept. of Genetics
last updated March 2001

This package is designed for the analysis of Affymetrix E. coli oligonucleotide arrays as described in Selinger et al, Nature Biotechnology 18:1262-8 (2000).
It can be used in conjunction with Affymetrix's GeneChip© software package and extends its functionality in several ways:
- More flexibility in the way the analysis is done
- An algorithm which corrects for varying signal in different regions of the chip
- High resolution analysis, allowing data to be analyzed oligo by oligo
- Handles replicates and multiple timepoints
- Automatic annotation of results
- All output files can easily be opened and analyzed in Excel
- Detection of transcripts is decided by a simple statistical test

We are grateful for the tremendous foresight that Affymetrix has shown in permitting the public release of all of the E. coli chip oligo sequences. This has allowed expression to be monitored at an unprecedented resolution, with on average one probe every 30 bases throughout the entire genome. We feel that high resolution expression profiling is the logical next step in the evolution of high density DNA array experiments and we thank Affymetrix for making it possible.

Note to users: This package is not an Affymetrix product, and therefore their technical help line will not answer questions about it. All questions should be directed to Doug Selinger. I have devoted considerable effort to making this package usable by the general biological community and its use does not require any knowledge of programming. However, its proper use does require thorough reading of this manual, as it is not the type of program that can be figured out as you go along. To be able to support this package without it becoming my full-time occupation I am setting some ground rules on the questions I will respond to. I will not answer:
i) questions which I feel are already answered clearly in this manual
ii) questions about Perl (setup or use), Excel, or other programs which might be used in conjunction with GAPS©, or
iii) general computer questions not specifically related to the use of GAPS©
All other comments and questions are welcome.

I.
Outline of package components

preGAPS© - This script takes a .CEL file output from GeneChip© and outputs a file (.adf) in which the following processing has been done:
- background is subtracted
- mismatch (MM) features are subtracted from perfect match (PM) features
- the result is multiplied by a scaling factor to correct for differences between chips
- the result is multiplied by a correction factor for varying fluorescence within the same chip. (This typically varies 2-3 fold across the chip surface.)
Note: The background and scaling factors must be entered by the user and can be taken from a GeneChip© analysis or by another method.

GAPS© - This script is the heart of the package and takes a set of .adf files (see above) and averages those which are replicates and compares multiple conditions/timepoints. It has two separate analysis modes which can be used together or separately:
- ORF summary: Each predicted ORF is assigned a single fluorescence intensity and these are reported for each condition along with annotation.
- High resolution analysis: The intensity of every oligo is reported across all chips and all conditions. This is output as two large tab-delimited files (one for the Watson strand and one for the Crick) which can be opened in a text editor, and portions copied into Excel. Alternatively, portions of these files can be returned by genome position using GAPScan© (see below).

GAPScan© - This script is used to analyze sections of high resolution files output by GAPS©. The user enters the region of the genome and the desired strand, and the program returns all oligos and predicted ORFs in that range along with all associated data. A high resolution plot of the oligos along the genome can then be generated with Excel.

II. Getting Started

System requirements
- Windows, Linux, or Macintosh with at least 128 MB of physical RAM (256 MB recommended).
The large memory requirement is due to the large amount of data which needs to be analyzed - each chip has almost 300,000 oligos.
- Perl needs to be installed on the user's machine before these scripts can be used. There are versions of Perl available free for all major platforms:
Windows, Linux, Solaris - ActivePerl http://www.activestate.com/Products/ActivePerl/Download.html
Macintosh - MacPerl http://www.macperl.com/

For those unfamiliar with it, Perl is a relatively simple-to-use but powerful programming language which can be used on any platform. I have provided the source code (since Perl is runtime-compiled) which you are free to modify. At the beginning of each script there is a portion of code which contains the user-definable parameters. In most cases, these must be modified with user specific information, such as file locations, etc. I have tried to make this portion of the program well-annotated so that non-programmers will be comfortable modifying these parameters. They can be modified in any text editor, as long as the resulting file is saved as plain text and ends in .pl so it is recognized as a Perl script. (I use the EditPlus text editor (for Windows) and illustrations in this manual will be screen-shots from that program, using its default color-coding settings.)

III. PreGAPS©

Command line: program_name filename background scalingfactor chiptype
Example: pregaps.pl my_chip.CEL 100 2 s
where:
program_name = pregaps.pl
filename = my_chip.CEL (the .CEL file created by GeneChip©, placed in the "rawdata" folder.)
background = 100 units
scalingfactor = 2
chiptype = "s" or blank for sense, "a" for antisense

Chip type: Currently Affymetrix only sells "sense" chips. A "sense" chip contains oligos which will bind to mRNA. An "antisense" chip contains oligos which will hybridize to the complement of mRNA, such as cDNA.

Be sure that .CEL files are placed in the "rawdata" folder (and not in a subfolder) so that preGAPS© can find them.
The .CEL files used in Selinger et al, Nature Biotechnology 2000 can be found on our web site. GAPS© is set up to do an analysis of these files as a demonstration. Normalization parameters for preGAPS© are provided in the "ref" folder.

The background and scaling factors can be derived from a .CHP file generated with GeneChip©. The .CHP file was saved as a text file and then opened in Word. The background (underlined, highlighted in red) and scaling factor (underlined, highlighted in blue) are at the top of the file. These parameters can also be found by opening the .CHP file in GeneChip© and going to View -> Probe Info (which will include the background) and then View -> Parameters (which will include the scaling factor, SF).

Output

This program will output the .CEL file with the added extension .adf (which stands for "adjusted difference file"). These .adf files are needed as input for GAPS©. The following is an example of a .adf file:

At the top of the file are the chip type, background, and scaling factor values that were used by PreGAPS©. The columns give the x and y coordinates of each PM feature and the fluorescence intensity values, after background subtraction, of the PM and MM features. The Difference, with a small rounding error, is found by the following equation: (PM-MM) * scaling factor * correction factor. The correction factor is dependent on the intensity of local control features and corrects for uneven fluorescence intensity across the surface of the chip, as explained in the following section.

Spatial Correction

The array contains a regularly spaced 10 x 10 grid of control feature pairs which all hybridize to the same control oligonucleotide, and should thus be of equal intensity. However, we found that fluorescence intensity of these features typically varied about 2-3 fold across the surface of the array, possibly because of local differences in washing/staining efficiencies.
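The Difference formula and the spatial correction can be sketched together as follows. This is a minimal illustration in Python, not the preGAPS© Perl source. It assumes that the correction factor is the array-wide mean of the control features divided by an inverse-distance-weighted average of the four closest controls, which is how this section goes on to describe it; the function names, the data layout, and the handling of a feature lying exactly on a control are all assumptions made for the example.

```python
import math  # math.dist requires Python 3.8 or later


def correction_factor(pm_xy, controls, n_nearest=4):
    """Array-wide control mean divided by the inverse-distance-weighted
    mean of the nearest control features.

    controls: list of ((x, y), intensity), one entry per averaged control pair.
    """
    global_mean = sum(intensity for _, intensity in controls) / len(controls)
    nearest = sorted(
        ((math.dist(pm_xy, xy), intensity) for xy, intensity in controls),
        key=lambda pair: pair[0],
    )[:n_nearest]
    # Assumption: a feature located exactly on a control uses that control alone.
    if nearest[0][0] == 0:
        return global_mean / nearest[0][1]
    weights = [1.0 / distance for distance, _ in nearest]
    local_mean = sum(w * c for w, (_, c) in zip(weights, nearest)) / sum(weights)
    return global_mean / local_mean


def adjusted_difference(pm, mm, scaling_factor, corr):
    """(PM - MM) * scaling factor * correction factor.
    PM and MM are assumed to be background-subtracted already."""
    return (pm - mm) * scaling_factor * corr


# Toy control grid (made-up values): the right-hand side of the chip is brighter.
controls = [((0, 0), 100), ((0, 10), 100), ((10, 0), 300), ((10, 10), 300)]
for position in [(5, 5), (9, 5), (1, 5)]:
    print(position, round(correction_factor(position, controls), 3))
```

With this weighting, a probe in a bright region receives a factor below 1 and a probe in a dim region a factor above 1, so signals are pulled toward what they would be under uniform staining.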
The following is a graph of the variation in the signal from these control features across the surface of the chip for two log phase replicates (Note: the data to create these graphs is automatically generated and can be found in the control_grid folder with a _controls suffix):

[Figure: control feature intensity plotted against X and Y position on the array, in two panels: Log Phase 1 and Log Phase 2.]

To correct for this spatial variation, the control grid was used to estimate local deviations in fluorescence intensity. First, each pair of controls was averaged. Then experimental features were multiplied by a correction factor, derived from the control features, which represents the relative brightness of the region. Control features closer to the probe pair contributed more to the final correction factor than distant ones. This correction factor was determined by the following equation:

\[
\text{Correction Factor} = \frac{\bar{c}}{\displaystyle\sum_{i=1}^{4} c_i \, \frac{1/d_i}{\sum_{j=1}^{4} 1/d_j}}
\]

where \(d_i\) (or \(d_j\)) is the Euclidean distance from the PM feature to the \(i\)th (or \(j\)th) of the 4 closest control features, \(c_i\) is the intensity of control feature \(i\), and \(\bar{c}\) is the mean of all control features on the array.

In the Log Phase 1 vs Log Phase 2 example above, where the control features showed a large amount of variation between arrays, the correction factor reduced the average coefficient of variation (coefficient of variation = standard deviation / mean) from 0.66 to 0.28. In Stationary Phase 1 vs. Stationary Phase 2, where the control features showed a similar pattern between arrays, there was a probably insignificant reduction from 0.53 to 0.52. These numbers were calculated from the 2max of ORF probe sets, which are located on the top half of the arrays.

IV.
GAPS©

Command line: program_name output_file_name
Example: gaps.pl my_analysis
where:
program_name = gaps.pl
output_file_name = my_analysis

Here are the user-definable parameters, named "parameters.txt" in the "ref" folder:

############# List of files to be processed
$input_file_list = "input_LPvsSPvsgDNA";
This points GAPS© to a file which contains all of the names and paths of pre-GAPS© files to be used in the analysis. This is where the user assigns pre-GAPS© files to timepoints/conditions. The format for this file is as follows:

header <tab> timepoint 1 name <cr>
filename 1 <cr>
filename 2 <cr>
header <tab> timepoint 2 name <cr>
filename 3 <cr>
filename 4 <cr>

where <tab> is the tab key, <cr> is carriage return, and the word "header" is typed literally, i.e. not substituted by a name.

Example
This file is provided in the "ref" folder and is set up to analyze the .CEL files from Selinger et al Nature Biotechnology 2000 after they have been processed by preGAPS©.

header	LP
pdata/pregaps_output_files/log1.CEL.adf
pdata/pregaps_output_files/log2.CEL.adf
header	SP
pdata/pregaps_output_files/stat1.CEL.adf
pdata/pregaps_output_files/stat2.CEL.adf
header	gDNA
pdata/pregaps_output_files/gDNA.CEL.adf

Note that there should be no extra lines or carriage returns at the beginning or end of the file and the file should be plain text only. The example reference file tells GAPS© that the first condition has two pre-GAPS© files (replicates) which are both for the condition named LP (log phase). The second condition contains two replicates for the condition named SP (stationary phase) and the last contains only a single pre-GAPS© file named gDNA (genomic DNA). The number of conditions and chips GAPS© can handle is limited only by available RAM. A rule of thumb for memory requirements is ~30 MB/chip. Many conditions require more memory than many replicates.

############# Choice of Analyses
$orf_summary = 1;
Generate summary report of ORFs? 1 = yes, 0 = no.
(Default = 1) This tells GAPS© whether or not to create a report describing changes in relative ORF abundances.

$hires_file = 1;
Generate high resolution file? 1 = yes, 0 = no. (Default = 1) This tells GAPS© whether or not to run an oligo by oligo analysis. This allows the user to later use GAPScan© to analyze expression results at high resolution. Two large files (usually > 5 MB) are generated, one for the Watson strand and one for the Crick strand. These files are not affected by changing ORF summary parameters (see below), so high resolution analysis only needs to be done once per dataset, even if multiple ORF summary analyses are done.

############# ORF Summary Parameters:
$cutoff_sdevs = 3;
The number of standard deviations above the negative controls to consider a transcript present (Default = 3). The negative controls are any B. subtilis control RNAs which were NOT spiked in by the user. The user must tell GAPS© which control RNAs were not spiked by appropriately setting the $neg_start and $neg_end parameters (described below).

$rank_report = 2;
$last_rank = 2;
The range of the ranked probe pairs which should represent the probe set (Default: $rank_report = 2, $last_rank = 2). The fifteen probe pairs are ranked from brightest (1) to dimmest (15) and a subset of these is used to measure the abundance of the transcript. Setting both to 2 gives the "2max" (second maximal) and setting both to 8 gives the median. A range of ranks can also be used; for example, setting them to 2 and 14 will give the mean of the 2nd to 14th ranks, "2-14max". 2-14max is analogous to Affymetrix's "average difference" measurement. 2-8max also works well as it takes the upper half of the signal distribution. Whichever metric is used is also applied to the negative controls to measure the amount of signal which that metric can be expected to give at random. This is then used to call the transcripts present or absent, according to the standard deviation cutoff defined in $cutoff_sdevs above.
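The rank-based summary metric and the present/absent call described above can be sketched as follows. This is a minimal illustration in Python, not the Perl code of GAPS©: the function names are hypothetical, and the cutoff is assumed to be the mean of the negative-control scores plus $cutoff_sdevs sample standard deviations, which is one reading of the description rather than a confirmed detail of the implementation.

```python
from statistics import mean, stdev

def rank_metric(differences, first_rank=2, last_rank=2):
    """Mean of the probe-pair values ranked first_rank..last_rank (1 = brightest).

    first_rank/last_rank play the role of $rank_report/$last_rank:
    (2, 2) is "2max", (8, 8) is the median of 15 pairs, (2, 14) is "2-14max".
    """
    ranked = sorted(differences, reverse=True)
    if not 1 <= first_rank <= last_rank <= len(ranked):
        raise ValueError("rank range outside the probe set")
    return mean(ranked[first_rank - 1:last_rank])

def detection_cutoff(negative_sets, first_rank=2, last_rank=2, cutoff_sdevs=3):
    """Apply the same metric to each negative-control probe set.

    Assumption: cutoff = mean + cutoff_sdevs * sample standard deviation
    of the negative-control scores.
    """
    scores = [rank_metric(s, first_rank, last_rank) for s in negative_sets]
    return mean(scores) + cutoff_sdevs * stdev(scores)

def call_present(differences, cutoff, first_rank=2, last_rank=2):
    """Return "P" if the probe set's metric exceeds the cutoff, else "A"."""
    score = rank_metric(differences, first_rank, last_rank)
    return "P" if score > cutoff else "A"

# Made-up example data: one probe set of 15 pairs and three negative controls.
probe_set = [10 * i for i in range(1, 16)]
negatives = [[5] * 15, [15] * 15, [10] * 15]
cutoff = detection_cutoff(negatives)
print(rank_metric(probe_set), cutoff, call_present(probe_set, cutoff))
```

Using the same rank range for the ORFs and the negative controls keeps the two scores comparable, which is the point made in the text: a brighter-biased metric such as 2max also raises the background it is judged against.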
$use_ttest = 1;
"1" uses a t-test to assign significance to changes, "0" doesn't (uses the change cutoff regardless) (Default = 1). A t-test for significant changes can only be done when a single probe pair is used, such as 2max or median.

$c_value = 4.303;
Critical value for the t-test (decides significant changes). With 2 degrees of freedom use 4.303 for 95% confidence. The degrees of freedom, and thus the critical value, will change depending on the number of replicates of each condition/timepoint. Consult a critical value table for a Student's t-test.

$change_cutoff = 11;
Number of probe pairs which must be different (by any amount) for a change to be considered significant (Default = 11). The fifteen ranked probe pairs are compared between two conditions, rank 1 to rank 1, rank 2 to rank 2, etc., regardless of whether they are actually the same probe pair (although they usually are). If this is set at 11, then 11 of these must be larger or smaller in one condition for the change to be considered significant. Using a $change_cutoff of 11, the algorithm correctly assigned significant changes to 52/52 of the probe sets for control RNAs spiked at known concentration changes, all of which had fold changes of at least 2-fold. Probe sets for control RNAs spiked at equal concentrations showed no significant changes (0/16).

############ ORF Summary Parameters not needed for GAPS© version 1.1 or higher. See important note below*.
$neg_start = 17;
$neg_end = 20;
These parameters define which probe sets to use as negative controls. These MUST correspond to control B. subtilis probe sets for which NO RNA was spiked. These probe sets allow GAPS© to decide the distribution of signal expected when there is no RNA present. All signal at the negative controls is assumed to be nonspecific and is used as a baseline to determine which transcripts have specific signal and should thus be called "present." There are 20 control probe sets corresponding to 5 different B. subtilis RNAs.
1-4 are dap, 5-8 are lys, 9-12 are phe, 13-16 are thr, and 17-20 are trp. If no control RNAs were spiked, set $neg_start = 1 and $neg_end = 20. If all but trp RNA were spiked, set $neg_start = 17 and $neg_end = 20. Currently the choices need to be consecutive, so you cannot, for example, use dap and trp but not lys as negative controls. The user must be certain to correctly define the negative controls, or the ORF summary analysis will be meaningless.

*Version 1.1 of GAPS© (GAPS1-1.pl) uses 90 control probe sets which are present on the E. coli chip but not in the standard MG1655 genome. These include genes from bacteriophage lambda and several plasmid-encoded genes. These probe sets are located from coordinates 421,81 to 152,87. Please check that this region of the chip doesn't contain any genes which you expect to be present in your experiment. If it does, we recommend using GAPS© version 1.0 (gaps.pl). This change was made so that more negative controls can be included (90 vs. a maximum of 20) and to make the negative controls independent of the B. subtilis controls, which are then freed for use as positive controls and/or for spiking experiments. These two parameters ($neg_start and $neg_end) are ignored by GAPS© version 1.1 (but absolutely necessary for GAPS© version 1.0).

GAPS© Output Files

A. ORF Summary Output
There are two output files produced by the ORF summary analysis. These are identified by the file suffixes:
_rpt = These are the report files which contain a summary of how all transcripts behaved in the experiments.
_info = A summary of the negative controls, including their mean, standard deviation, the resulting detection cutoff, number of probe sets used, and the metric used (such as rank 2 or ranks 2 to 14).

The following is an example of the _rpt output from the ORF summary analysis opened in Excel:

This is data from a comparison of log vs. stationary phase, sorted by bnumber.
All data is output as tab-delimited text files which are easily imported into Excel. We'll step through this data file column by column. Some of the column headers are cut off due to space constraints.

Bnumber: The unique number assigned to all ORFs in the Blattner annotations.
Gene: The common name of the ORF according to the Blattner annotations.
LP Intensity: The fluorescence intensity of condition "LP" (log phase) using the metric selected in the parameters (discussed above). The fluorescence of the negative controls is subtracted from this number, so that numbers greater than 0 are brighter than the negative controls. The same summary metric is used for the negative controls as for the ORFs so they can be compared. For example, if GAPS© is set to average probe pairs of ranks 2 - 14, it will do the same for the negative controls.
LP sdev*: The standard deviation of LP replicates. If the condition contains only one chip, this will be zero.
LP detection: Whether the transcript is considered present (P) or absent (A). In order to be considered present the intensity must be a certain number of standard deviations (as defined by the $cutoff_sdevs parameter) above the negative controls.
SP Intensity, sdev, and detection: Same as above but for the condition labeled SP (stationary phase).

The following columns are comparisons to the first condition (LP), which is taken as the baseline condition.

SP absolute change: SP Intensity - LP Intensity
SP fold change sig: If the transcript is present in both conditions, this will be blank. If it is absent in one of the conditions it will have a greater-than (>) or less-than (<) sign indicating that the fold change in the following column is expected to be an over- or under-estimate, respectively. The estimate is made by replacing the intensity of the absent transcript by the intensity of the negative controls + (the standard deviation * the number of standard deviations specified in the $cutoff_sdevs parameter).
If both transcripts are absent, it will be scored "not determined" (ND).
SP fold change: SP/LP. When the transcript is absent in one condition this will be an estimate (see above), and if absent in both conditions it will be scored "not determined" (ND).
SP t-test*: The t statistic from a Student's t-test comparing the difference between the SP and LP intensities. This will only be meaningful when SP and LP are done in replicate.
SP call: This scores whether the transcript has changed and whether this change is considered significant according to the user's parameters. The possible scores are increased (I), significantly increased (SI), decreased (D), significantly decreased (SD), or not determined (ND). This column deserves a little more elaboration. First of all, for the change to be considered for significance the transcript must have been called present in at least one condition. Otherwise, it is scored ND. If it has been detected in at least one condition, the change will be called significant if it meets at least one of the following two criteria:
i) It passes a Student's t-test using the critical value supplied in the $c_value parameter. This test will only be performed if the $use_ttest parameter is set to 1.
ii) When the probes at each rank are compared, at least a certain number of probes (defined in the $change_cutoff parameter) are greater in one condition than in the other, by any amount.
RNA type: Whether the RNA represents a coding sequence (CDS), ribosomal RNA (rRNA), transfer RNA (tRNA), or miscellaneous RNA (misc_RNA).
Length (bases): Length of the RNA in bases.
Annotation: According to the Blattner annotations.

*This column will not be present if the summary metric chosen is an average of multiple ranks.

GAPS© can handle any number of conditions, as long as there is enough available RAM. Additional columns, of the same types as those above, will be added automatically and all comparisons are done to the first condition, which is considered the baseline.
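The decision rules behind the "call" column can be illustrated with a short sketch. This is written in Python rather than the Perl of GAPS©, all function and argument names are hypothetical, and two details are assumptions: the t statistic is taken to be the two-sample pooled-variance form (which gives 2 degrees of freedom for duplicate chips, matching the default $c_value of 4.303), and a tie in intensity is treated as a decrease because the manual does not say how ties are scored.

```python
from math import sqrt
from statistics import mean, variance

def t_statistic(base_reps, test_reps):
    """Two-sample Student's t statistic with pooled variance.

    Needs at least two replicates per condition with some spread;
    identical replicates in both conditions give a division by zero.
    """
    na, nb = len(base_reps), len(test_reps)
    pooled = ((na - 1) * variance(base_reps)
              + (nb - 1) * variance(test_reps)) / (na + nb - 2)
    return (mean(test_reps) - mean(base_reps)) / sqrt(pooled * (1 / na + 1 / nb))

def ranks_changed(base_pairs, test_pairs):
    """Compare probe pairs rank to rank; return (n_higher, n_lower) in test."""
    base = sorted(base_pairs, reverse=True)
    test = sorted(test_pairs, reverse=True)
    higher = sum(t > b for b, t in zip(base, test))
    lower = sum(t < b for b, t in zip(base, test))
    return higher, lower

def change_call(base_present, test_present, base_intensity, test_intensity,
                base_pairs, test_pairs, t=None, c_value=4.303, change_cutoff=11):
    """Return ND, I, SI, D or SD following the two criteria in the text.

    Pass t=None to skip the t-test (the $use_ttest = 0 case).
    """
    if not (base_present or test_present):
        return "ND"
    # Assumption: ties are scored as a decrease.
    direction = "I" if test_intensity > base_intensity else "D"
    higher, lower = ranks_changed(base_pairs, test_pairs)
    by_ttest = t is not None and abs(t) > c_value
    by_ranks = max(higher, lower) >= change_cutoff
    return ("S" if by_ttest or by_ranks else "") + direction

# Made-up example: duplicate chips per condition and 15 probe pairs each.
lp_pairs = list(range(1, 16))
sp_pairs = list(range(2, 17))
t = t_statistic([100, 110], [300, 320])
print(change_call(True, True, 105, 310, lp_pairs, sp_pairs, t=t))
```

Because either criterion is sufficient, the rank comparison can still flag a change when there is only one chip per condition and no t-test is possible.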
B. High Resolution Analysis
Note that high resolution output files contain oligo sequences which are copyrighted by Affymetrix and their use is subject to certain terms and conditions.

This analysis generates two high resolution files, plus a summary of the negative controls:
_wat = The Watson strand
_crk = The Crick strand
_neg = A summary of the negative control features

The _wat and _crk files contain all of the oligos on the chip, in the order they are found in the genome (5' to 3' for Watson, 3' to 5' for Crick), and all of the data for each oligo. For this reason these files tend to be large and cannot be opened in Excel. However, they can be opened in a text editor and regions of interest can be selected and pasted into Excel. GAPScan©, the last script in the GAPS© package, can select regions of interest automatically.

The _neg file gives detailed information on the negative controls, oligo by oligo. It gives the chip position (x,y coordinates), PM-MM for each chip, the average of replicates, and the standard deviation. Then, for each condition, it reports the mean and standard deviation of all of the negative control probe pairs on the chip. If there are replicates then this is the mean of all the probe pair averages.

The _wat and _crk files give a high resolution oligo by oligo analysis of transcription. Here is an example of a section of the Watson strand:

- The "Name" column is the name given to the probe set by GAPS©. In this case the first one is ig_ds_b4403, which means this intergenic region (ig_) is downstream (ds_) of b4403, the last predicted ORF. For Crick files the intergenic annotation denotes the gene the region is upstream of.
- The "Affy_ID" column is the name given to this probe set by Affymetrix so that it can be found with Affymetrix's GeneChip© software.
- The "Oligo" column has the oligo sequence.
- Then follow the x and y coordinates of the feature on the chip, the 5' and 3' ends of the ORF in the genome, and the position of the center of the oligo in the genome.
- Then follow the intensities of each oligo on each chip (LP1, LP2, etc), the mean of replicates (LP Average), the number of standard deviations above the negative controls (LP sig - note that the data on the negative controls can be found in the _neg file), the standard deviation of the LP duplicates (LP SD), and finally markers for the 5' and 3' end of each ORF to make it easier to include the ORF boundaries in a plot of the oligo data.

The following is a sample graph generated in Excel with the above data:

[Figure: b0001 - Threonine Leader Peptide. Intensity (PM-MM) plotted against genome position, with series LP Average, SP Average, ORF 5' and ORF 3'.]

V. GAPScan©

GAPScan© is a "reader" for the high resolution files which are output from GAPS©. It was used to select and organize the data for the above graph. The user specifies the high resolution file to use, the region of the genome to report, and the strand; GAPScan© then searches through the files and returns all corresponding oligos and data. The command line format is as follows:

gapscan.pl datafile locusname ORF_start ORF_end strand

Example
gapscan.pl LPvsSP my_fav_locus 100 2000 w

where the datafile is the base name of the high resolution file generated by GAPS© (leave off the "_crk.txt" or "_wat.txt"). This will run the program gapscan.pl on the LPvsSP high resolution analysis file and output the information from the Watson strand between genome positions 100 and 2000 to a file named "my_fav_locus". If the strand is left out, the program outputs both strands for the specified region.

The program will work for queries which begin and end on opposite sides of the origin, but will give warnings about how the resulting output may differ from other queries, such as the file being out of order.

A datafile listing predicted operons and their locations is provided in the "ref" folder for the user's convenience.
This file is a list of operons predicted by Julio Collado-Vides and colleagues (in modified form). The most current list can be found at RegulonDB: http://tula.cifn.unam.mx:8850/regulondb/regulon_intro.frameset

Reference
Salgado, et al. RegulonDB (version 3.0): transcriptional regulation and operon organization in Escherichia coli K-12. Nucleic Acids Res. 2000 Jan 1;28(1):65-67.

Acknowledgements
I thank George Church for giving me the opportunity to work with such an exciting emerging technology; Jeremy Edwards, Dan Janse, and Vasudeo Badarinarayana for being the first users of my software and suggesting valuable features which have since been included; Adnan Derti for BLASTing all the oligos on the chip against the genome and providing me with the results; and the rest of the Church lab for the support and advice which greatly aided the development of these software tools. We thank Affymetrix for supplying us with their technology before it was publicly available and for their foresightedness in allowing release of the E. coli chip oligo sequences, which make possible this software package, as well as many future analyses. We hope Affymetrix will continue in this spirit of openness by keeping .CEL files openly readable and by considering the release of more chip oligo sequences so that researchers may get the fullest possible benefit from their data.

Appendix C
Measured half-lives of 2,679 Escherichia coli mRNAs

Half-life calculation by non-linear least squares curve fitting
In order to improve the statistical rigor and accuracy of the half-life calculation, the data for each RNA was fitted to an exponential function (of the form \(A = A_0 e^{kt}\)) using a non-linear least squares algorithm implemented in MATLAB (function nlinfit in the Statistics Toolbox, which uses the Gauss-Newton method).
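For a decaying transcript k is negative, and the exponential model above implies a half-life of ln(2) / (-k). The sketch below illustrates that conversion in Python. Note that it swaps in a simpler technique than the one used here: it estimates A0 and k by ordinary least squares on ln(A) against t, not by the Gauss-Newton non-linear fit of nlinfit, so it weights noisy low-abundance points differently. The function names, timepoints and abundances are all made up for illustration.

```python
from math import exp, log

def fit_exponential_loglinear(times, abundances):
    """Estimate (A0, k) for A = A0 * exp(k * t) by a straight-line fit to ln(A).

    A simplification of the non-linear fit described in the text;
    all abundances must be positive.
    """
    logs = [log(a) for a in abundances]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    k = sxy / sxx
    a0 = exp(y_mean - k * t_mean)
    return a0, k

def half_life(k):
    """Half-life in the time units of k; only defined for decay (k < 0)."""
    if k >= 0:
        raise ValueError("k must be negative for a decaying transcript")
    return log(2) / -k

# Hypothetical timecourse (minutes) for a transcript decaying at k = -0.1 per min.
times = [0, 2, 4, 6, 8]
abundances = [1000 * exp(-0.1 * t) for t in times]
a0, k = fit_exponential_loglinear(times, abundances)
print(round(a0, 1), round(k, 3), round(half_life(k), 2))
```

The same conversion applied to the lower and upper confidence limits of k gives the bounds on the half-life, with the more negative limit giving the shorter half-life.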
The function nlparci was then used to estimate 95% confidence intervals for the two parameters \(A_0\) and \(k\) (giving \(A_{0,\text{low}}\), \(k_{\text{low}}\), \(A_{0,\text{hi}}\), and \(k_{\text{hi}}\)), which were then used to calculate upper and lower bounds of the half-lives. Half-lives were calculated for the 2,679 RNAs for which \(A_{0,\text{low}}\) was positive (95% confidence of detection) and \(k_{\text{hi}}\) was negative (95% confidence of decrease). Half-lives are reported in minutes.

The 907 half-lives calculated by both methods (the exponential fit and the 2-fold method) are plotted in Figure 1. Transcripts which fall along the line fit an exponential degradation pattern, whereas those that fall off the line do not. Those above the line degrade more slowly at the beginning of the timecourse (over their first 2-fold change) and more quickly later on. Those below the line have the opposite pattern.

[Figure 1: Exponential Fit vs. 2-Fold Method. 2-fold half-life (min) plotted against exponential fit half-life (min), with a reference line of slope = 1; R = 0.57.]

b0074 b0076 b0077 b0078 b0080 b0081 b0082 b0083 b0084 b0085 b0086 b0087 b0088 b0089 b0090 b0091 b0092 b0093 b0095 b0096 b0097 b0098 b0100 b0102 b0103 b0104 b0105 b0106 b0109 b0110 b0111 b0112 b0114 b0115 b0116 b0118 b0119 b0120 b0121 b0122 b0123 b0125 b0126 b0127 b0128 b0129 b0130 b0131 b0132 b0133 b0134 b0143

Measured half-lives for 2,679 E. coli mRNAs, with best estimate and 95% confidence lower and upper bounds (HL, HL min, HL max), in minutes.
B# b0002 b0003 b0008 b0009 b0014 b0015 b0016 b0019 b0020 b0022 b0023 b0025 b0026 b0027 b0028 b0030 b0031 b0036 b0037 b0038 b0039 b0043 b0045 b0048 b0049 b0050 b0051 b0052 b0053 b0054 b0055 b0056 b0057 b0058 b0059 b0062 b0063 b0064 b0065 b0067 b0069 b0070 b0071 b0072 b0073 Gene thrA thrB talB mog dnaK dnaJ yi81_1 nhaA nhaR insA_1 rpsT ribF ileS lspA slpA yaaF dapB caiD caiC caiB caiA fixC yaaU folA apaH apaG ksgA pdxA surA imp yabH yabP yabQ yabO hepA araA araB araC yabI yabK yabN yabM leuD leuC leuB HL HL min HL max 5.8 3.3 23.7 5.1 2.6 237.2 11.4 9.4 14.4 7.6 4.9 16.3 15.4 8.7 63.2 13.1 8.6 27.3 13.4 7.6 58.6 8.3 5.5 16.4 6.6 3.6 42.5 10.2 7.0 18.8 10.1 5.5 56.3 13.8 8.2 43.2 9.6 5.4 43.7 12.6 7.0 61.6 13.3 7.2 92.7 11.5 6.5 51.1 13.3 6.8 211.2 17.8 8.9 1641.7 19.5 10.9 92.1 8.7 6.1 15.7 2.5 1.5 8.5 10.5 5.9 43.8 15.1 9.1 43.2 2.4 1.4 7.0 10.1 5.7 43.2 7.3 5.7 10.4 7.1 5.4 10.5 6.7 3.7 31.1 7.5 6.0 10.0 6.4 4.4 11.6 7.6 5.6 11.9 8.7 6.3 13.8 11.4 7.2 28.1 16.2 9.6 50.9 9.9 6.8 18.4 20.5 11.1 132.7 12.2 7.4 35.7 11.4 7.0 31.1 11.6 6.6 46.0 7.8 4.3 39.1 9.4 5.7 26.6 2.5 1.5 7.3 8.7 5.1 27.5 11.8 7.2 33.8 13.9 9.3 27.6 114 leuA leuO ilvI ilvH fruR yabB yabC ftsL ftsI murE murF mraY murD ftsW murG murC ddlB ftsQ ftsZ lpxC yacA secA yacF yacE guaC hofC nadC ampD ampE aroP aceE aceF lpdA acnB yacL speD speE yacC yacK hpt yadF yadG yadH yadI yadE panD yadD panC panB pcnB 13.4 9.1 5.5 2.7 3.9 3.1 4.4 6.1 6.3 10.3 11.5 11.2 13.6 9.8 11.6 10.0 7.9 9.0 11.3 9.0 8.5 12.7 12.2 8.1 7.4 15.3 11.0 8.2 7.8 3.0 7.4 12.2 11.7 15.0 12.2 9.0 2.6 12.9 6.0 5.1 16.9 11.1 6.3 11.4 11.5 4.0 14.9 5.6 3.7 10.6 5.0 8.8 8.4 5.9 3.8 1.7 3.1 2.4 3.4 4.8 5.0 7.2 7.9 7.6 8.8 6.5 7.7 6.9 5.3 5.6 6.2 6.1 4.4 7.5 6.6 6.0 4.7 10.6 5.7 5.7 5.5 2.6 5.6 8.5 7.8 8.2 8.4 7.4 1.9 9.1 4.2 4.0 10.3 6.3 4.3 8.1 7.5 3.0 8.8 4.1 2.6 7.2 3.9 5.7 33.0 19.9 9.8 5.8 5.3 4.3 6.0 8.4 8.6 18.1 21.3 21.7 29.7 19.8 23.6 18.4 16.0 23.9 59.8 16.8 89.2 42.1 73.9 12.5 17.2 27.6 153.9 14.3 13.4 3.6 11.0 21.3 23.9 91.5 22.0 
11.6 4.0 21.7 10.3 7.3 47.5 49.1 12.3 19.2 24.1 5.9 49.5 8.6 6.7 19.8 7.0 19.2 b0145 b0146 b0147 b0148 b0149 b0150 b0151 b0152 b0153 b0154 b0155 b0156 b0159 b0162 b0163 b0164 b0165 b0166 b0167 b0168 b0169 b0170 b0171 b0172 b0173 b0174 b0175 b0176 b0177 b0178 b0179 b0180 b0181 b0182 b0183 b0184 b0185 b0186 b0187 b0188 b0190 b0191 b0192 b0194 b0195 b0196 b0197 b0198 b0200 b0207 b0208 b0209 dksA sfsA yadP hrpB mrcB fhuA fhuC fhuD fhuB hemL yadQ yadR pfs yaeG yaeH yaeI dapD glnD map rpsB tsf pyrH frr yaeM yaeS cdsA yaeL yaeT hlpA lpxD fabZ lpxA lpxB rnhB dnaE accA ldcC yaeR mesJ yaeQ yaeJ cutF proS yaeB rcsF yaeC yaeE yaeD yafB yafC yafD 7.5 5.6 3.6 6.9 14.5 8.3 9.4 11.2 13.7 9.5 5.5 6.5 7.6 1.9 10.6 8.4 7.9 7.9 7.5 6.5 8.9 9.8 7.2 7.3 13.3 8.0 6.4 9.1 10.0 17.5 9.8 8.4 12.7 7.7 15.0 10.0 12.6 4.7 4.8 11.9 7.1 7.3 6.1 10.0 12.1 5.7 8.1 6.5 9.0 10.5 4.0 2.6 5.9 3.9 2.5 5.1 10.4 5.0 6.3 6.1 8.0 5.2 4.2 4.4 5.3 1.1 6.8 5.4 5.4 5.9 5.8 5.4 7.0 7.6 6.0 5.4 8.1 5.7 4.4 6.8 8.1 10.2 6.8 6.5 8.8 4.9 9.6 7.6 8.8 3.7 3.7 8.0 5.6 5.3 4.8 6.9 7.2 3.7 4.1 3.4 6.9 6.5 2.1 2.0 10.5 10.3 6.8 10.7 23.7 24.9 18.0 66.4 48.5 57.6 7.9 12.1 13.8 6.4 24.5 19.3 14.4 12.0 10.7 8.4 12.1 13.9 9.2 11.4 36.9 13.5 11.5 13.6 13.1 58.8 17.7 12.1 22.4 17.6 33.6 14.6 22.2 6.6 6.8 22.9 9.8 11.8 8.3 18.3 38.8 12.2 275.3 67.8 12.8 28.6 121.9 3.8 b0210 b0212 b0213 b0214 b0215 b0219 b0220 b0222 b0223 b0224 b0225 b0226 b0227 b0228 b0231 b0232 b0234 b0235 b0237 b0238 b0239 b0240 b0241 b0242 b0243 b0249 b0250 b0251 b0254 b0255 b0257 b0258 b0260 b0261 b0265 b0267 b0268 b0269 b0271 b0275 b0276 b0280 b0281 b0287 b0288 b0300 b0304 b0305 b0306 b0307 b0308 b0311 115 yafE gloB yafS rnhA dnaQ yafV ykfE gmhA yafJ yafK yafQ dinJ yafL yafM dinP yafN yafP pepD gpt yafA crl phoE proB proA ykfF ykfB yafY perR yi91a ykfC ykfD yagD insA_2 yagA yagE yagF yagH insA_3 yagJ yagN intF yagU ykgJ ykgA ykgC ykgD ykgE ykgF ykgG betA 9.3 8.7 6.1 6.5 5.3 2.1 9.5 5.2 7.9 6.0 3.0 2.4 8.4 3.9 4.4 1.9 12.6 10.7 9.9 8.4 4.4 9.0 14.1 7.0 5.0 
18.3 4.8 5.6 10.0 10.7 8.9 9.3 7.9 1.7 9.7 12.3 4.5 12.3 15.4 9.7 5.4 4.7 8.1 3.5 9.7 10.9 12.0 4.9 4.8 4.9 6.2 4.3 6.1 5.7 4.4 4.7 3.5 1.1 6.8 3.9 6.1 4.1 2.2 1.8 5.3 2.8 2.9 1.3 7.5 6.2 7.2 5.1 2.5 4.8 8.2 4.7 3.4 10.8 3.4 3.0 6.8 5.7 6.5 4.7 5.3 1.2 6.9 7.7 2.9 7.7 8.0 6.9 3.3 2.7 5.2 2.6 5.3 6.6 6.4 3.6 3.5 3.8 5.2 2.2 20.0 17.9 9.9 10.6 10.5 22.3 15.5 7.8 10.9 11.1 5.0 3.5 19.2 6.3 9.4 3.4 40.2 37.9 15.9 23.1 19.7 78.6 51.9 13.5 9.6 61.2 8.0 39.3 18.7 82.7 14.3 359.1 15.9 2.7 16.6 30.3 9.7 29.4 231.1 16.6 14.6 18.9 17.8 5.3 56.6 30.3 105.3 7.4 7.5 7.1 7.9 57.9 b0313 b0314 b0315 b0318 b0319 b0320 b0322 b0325 b0328 b0329 b0331 b0332 b0333 b0335 b0338 b0342 b0343 b0344 b0346 b0347 b0349 b0352 b0353 b0354 b0356 b0357 b0358 b0362 b0366 b0369 b0371 b0376 b0380 b0381 b0382 b0384 b0385 b0386 b0387 b0388 b0389 b0390 b0391 b0393 b0394 b0395 b0397 b0398 b0399 b0401 b0402 b0403 betI betT yahA yahD yahE yahF yahH yahK yahN yahO prpB prpC prpE cynR lacA lacY lacZ mhpR mhpA mhpC mhpE mhpT yaiL adhC yaiN yaiO tauB hemB yaiT yaiH ddlA yaiB psiF yaiC proC yaiI aroL yaiA aroM yaiE yaiD yajF sbcC sbcD phoB brnQ proY malZ 3.1 8.5 8.5 10.3 14.7 8.0 8.9 11.9 2.3 13.7 14.5 8.8 3.7 9.8 6.8 2.4 2.0 2.1 5.3 6.7 6.7 8.0 4.2 7.5 9.7 9.0 19.1 8.3 9.2 8.4 13.1 7.4 3.3 6.1 3.3 5.2 17.8 9.5 9.5 8.3 10.9 10.9 9.6 9.2 4.8 7.6 18.6 14.2 7.4 8.4 13.8 11.5 2.0 5.7 4.9 6.2 8.0 4.1 6.4 8.1 1.7 7.7 9.4 5.4 2.1 6.4 4.2 1.9 1.6 1.6 2.7 4.2 3.6 4.7 2.7 5.9 6.7 5.2 10.0 4.2 6.4 5.9 6.9 5.0 2.5 4.6 2.6 3.6 10.3 6.2 6.1 5.2 6.6 5.5 6.8 6.0 3.4 6.5 9.4 7.7 4.2 4.6 7.5 8.3 6.9 17.0 31.4 31.4 88.8 252.9 14.8 21.9 3.5 61.1 31.8 23.2 13.7 21.3 17.3 3.3 2.7 3.3 626.5 17.1 44.6 26.0 10.1 10.5 17.5 36.6 239.4 353.6 16.5 14.5 123.5 14.9 4.6 9.0 4.6 9.4 66.7 21.1 20.8 20.2 31.3 811.6 16.3 19.2 7.8 9.3 592.8 90.7 28.9 45.2 83.7 18.9 b0404 b0405 b0406 b0407 b0408 b0409 b0410 b0411 b0413 b0414 b0415 b0416 b0417 b0418 b0419 b0420 b0421 b0422 b0423 b0424 b0425 b0426 b0427 b0428 b0429 b0430 b0431 b0432 b0433 b0434 b0435 
b0436 b0437 b0438 b0439 b0440 b0441 b0442 b0443 b0444 b0445 b0449 b0452 b0453 b0454 b0456 b0457 b0458 b0459 b0460 b0461 b0462 116 yajB queA tgt yajC secD secF yajD tsx ybaD ribD ribH nusB thiL pgpA yajO dxs ispA xseB yajK thiJ apbA yajQ yajR cyoE cyoD cyoC cyoB cyoA ampG yajG bolA tig clpP clpX lon hupB ybaU ybaV ybaW ybaX ybaE mdlB tesB ybaY ybaZ ybaA ylaB ylaC ylaD hha ybaJ acrB 12.1 7.8 8.9 8.7 7.7 10.3 7.8 5.7 4.7 4.2 9.6 9.5 20.1 4.3 9.8 12.1 6.5 5.2 10.2 6.1 3.5 7.1 12.8 10.6 5.0 11.6 6.4 8.0 10.8 8.2 5.9 11.6 4.9 8.4 11.2 15.8 8.7 8.5 7.7 9.0 6.6 11.0 3.3 6.6 14.9 13.2 12.7 4.1 16.9 10.1 6.1 6.7 6.5 4.4 6.6 6.2 5.3 5.5 5.5 4.2 3.8 3.2 7.1 7.4 11.6 3.0 7.4 9.0 4.9 3.9 6.9 4.6 2.6 5.2 7.1 8.1 3.8 8.5 4.7 5.8 7.4 4.9 3.1 6.1 3.9 6.4 8.6 10.2 6.3 5.8 5.3 5.7 4.0 7.1 1.9 5.0 8.1 7.6 7.7 2.9 9.7 6.4 4.2 4.2 79.7 32.7 13.7 14.8 14.1 78.3 13.3 8.9 6.1 5.9 14.8 13.3 75.0 7.8 14.1 18.8 9.8 7.7 19.5 8.8 5.5 11.3 62.9 15.3 7.5 18.2 10.0 13.1 19.8 25.9 51.8 135.6 6.6 12.3 16.1 34.9 14.2 15.7 14.3 21.7 18.6 24.7 13.1 9.7 97.1 51.1 35.5 6.9 63.5 23.8 11.0 17.1 b0463 b0464 b0465 b0466 b0467 b0468 b0469 b0471 b0472 b0474 b0475 b0476 b0477 b0478 b0480 b0482 b0483 b0485 b0486 b0487 b0488 b0489 b0490 b0492 b0493 b0494 b0495 b0496 b0505 b0506 b0514 b0518 b0521 b0524 b0525 b0526 b0527 b0529 b0542 b0543 b0546 b0549 b0553 b0555 b0564 b0565 b0571 b0572 b0577 b0578 b0579 b0581 acrA acrR aefA ybaM priC ybaN apt ybaB recR adk hemH ybaC gsk ybaL ushA ybaP ybaQ ybaS ybaT ybbI ybbJ ybbK ybbL ybbN ybbO tesA ybbA ybbP ybbT ybbU ybbZ fdrA arcC ybbF ppiB cysS ybcI folD emrE ybcM ybcO nmpC ybcS appY ompT ylcA ylcB ybdG nfnB ybdF ybdK 8.6 5.0 12.6 16.7 6.9 6.2 11.3 11.7 13.4 12.3 6.4 10.1 12.2 7.0 11.6 8.0 4.5 2.9 6.5 3.0 6.0 2.7 5.6 11.9 13.0 8.9 8.4 8.9 11.9 3.7 13.1 13.0 5.8 13.5 8.5 14.2 11.0 5.9 6.5 14.9 10.6 9.7 1.7 14.3 5.2 4.4 10.2 6.1 6.4 8.1 1.6 5.6 6.7 3.1 8.1 9.1 5.1 3.7 6.6 6.5 6.9 6.1 4.8 7.1 6.6 5.4 7.6 5.4 3.3 2.2 4.7 2.0 4.0 1.7 4.4 8.3 8.5 6.5 5.5 6.3 6.9 2.1 8.7 8.7 3.4 9.9 
6.2 8.1 6.6 4.0 4.1 7.9 5.8 5.2 1.4 8.6 2.8 3.0 6.2 3.4 4.5 6.0 1.0 3.8 11.9 13.1 28.4 96.1 10.4 18.2 41.0 58.9 222.0 2423.3 9.6 17.5 73.8 10.1 24.5 15.5 7.3 4.5 10.7 5.4 12.2 6.3 7.8 21.4 27.4 14.3 17.6 15.0 44.5 17.0 26.6 25.8 21.1 21.3 13.8 56.4 33.4 11.4 15.9 124.5 65.5 69.7 2.2 44.0 32.9 8.4 29.0 26.6 11.1 12.4 3.1 11.3 b0582 b0583 b0584 b0585 b0587 b0593 b0594 b0595 b0598 b0599 b0600 b0601 b0602 b0604 b0605 b0606 b0607 b0608 b0609 b0610 b0611 b0612 b0620 b0621 b0623 b0628 b0630 b0631 b0632 b0634 b0636 b0637 b0639 b0641 b0642 b0643 b0644 b0646 b0648 b0651 b0652 b0655 b0657 b0658 b0659 b0660 b0662 b0671 b0674 b0675 b0676 b0677 117 yi81_2 entD fepA fes fepE entC entE entB cstA ybdH ybdL ybdM ybdN dsbG ahpC ahpF ybdQ ybdR rnk rna ybdS citB dcuC cspE lipA lipB ybeD dacA mrdB ybeA ybeB ybeN rlpB leuS ybeL ybeQ ybeS ybeU ybeK gltL ybeJ lnt ybeX ybeY ybeZ yleB asnB nagD nagC nagA 13.4 23.5 20.8 18.1 14.9 11.4 13.8 10.7 4.1 2.2 3.2 3.5 22.2 7.5 11.1 7.8 5.9 6.7 4.8 11.6 7.5 21.7 9.2 19.1 5.0 11.1 12.5 13.2 9.0 8.7 10.9 7.2 15.7 9.2 8.6 7.5 5.7 21.0 23.5 5.7 11.2 9.9 13.7 8.4 7.2 8.3 8.0 4.6 6.9 4.2 4.5 2.0 7.6 11.8 10.7 9.7 8.7 8.4 8.7 8.1 3.3 1.3 1.6 1.9 12.2 5.5 8.0 5.9 4.2 4.1 2.5 7.5 5.4 11.3 6.2 10.8 2.7 6.9 8.0 8.1 5.9 6.2 7.5 4.9 9.2 6.5 6.3 5.0 3.1 11.0 13.3 4.5 7.7 7.0 8.9 6.5 4.8 4.7 4.5 2.4 4.0 2.9 3.7 1.6 58.6 5295.7 359.2 137.5 50.6 17.6 33.3 15.4 5.3 9.6 52.6 23.8 123.1 11.8 18.0 11.6 9.8 17.9 66.2 25.6 12.3 264.7 17.4 84.8 42.6 29.2 28.0 34.6 19.2 14.7 20.3 13.2 53.9 15.4 13.2 15.4 38.6 258.0 99.7 7.7 20.7 16.7 29.3 12.1 15.1 34.2 35.1 60.9 27.0 7.9 5.7 2.7 b0678 b0679 b0680 b0682 b0683 b0684 b0686 b0687 b0688 b0695 b0696 b0698 b0699 b0706 b0707 b0710 b0711 b0712 b0714 b0721 b0722 b0723 b0724 b0725 b0726 b0727 b0729 b0730 b0735 b0736 b0737 b0738 b0739 b0740 b0741 b0742 b0750 b0751 b0752 b0753 b0754 b0755 b0756 b0757 b0758 b0759 b0760 b0762 b0764 b0766 b0767 b0773 nagB nagE glnS ybfN fur fldA ybfF seqA pgm kdpD kdpC kdpA ybfA ybfD ybgA ybgI ybgJ ybgK nei 
sdhC sdhD sdhA sdhB sucA sucB sucD farR ybgE ybgC tolQ tolR tolA tolB pal ybgF nadA pnuC ybgR aroG gpmA galM galK galT galE modF modB ybhA ybhE ybhB 2.6 4.5 10.9 16.1 6.8 6.9 8.1 6.0 8.9 5.9 11.6 5.7 3.9 12.5 10.6 9.0 4.5 6.7 15.6 6.8 3.4 4.7 3.9 5.9 1.9 2.4 4.1 6.2 18.3 6.7 11.5 13.6 10.1 10.9 10.3 9.9 7.2 6.5 5.1 18.8 7.7 17.7 6.2 5.2 4.9 4.3 11.6 2.5 1.7 8.4 9.9 10.0 2.0 3.4 6.5 8.8 5.2 5.1 4.8 4.2 4.9 3.5 7.0 3.3 2.6 6.8 6.0 5.3 3.3 4.7 10.3 5.1 2.8 3.5 2.9 4.5 1.3 1.8 2.9 5.1 10.7 4.8 7.4 8.9 7.1 7.7 7.9 6.5 3.7 3.3 2.5 10.1 4.1 9.7 4.8 3.9 4.2 3.2 8.0 1.5 1.0 5.1 7.7 7.8 3.8 6.7 32.6 102.4 9.9 10.5 25.8 10.3 55.4 18.7 33.5 20.5 7.2 82.3 46.7 30.5 7.1 11.8 32.9 10.1 4.6 7.4 5.7 8.3 3.4 3.5 7.1 7.9 64.3 10.8 25.1 28.9 17.4 18.7 14.8 20.8 120.1 332.2 1360.2 143.1 72.0 105.7 8.7 7.9 6.0 6.6 21.4 8.2 8.0 23.7 13.7 13.9 b0774 b0775 b0777 b0778 b0779 b0780 b0782 b0783 b0784 b0785 b0786 b0789 b0790 b0791 b0792 b0793 b0794 b0795 b0796 b0798 b0799 b0800 b0801 b0803 b0804 b0806 b0808 b0809 b0810 b0811 b0813 b0814 b0815 b0817 b0819 b0820 b0821 b0829 b0831 b0833 b0834 b0837 b0838 b0839 b0840 b0841 b0842 b0843 b0844 b0845 b0848 b0849 118 bioA bioB bioC bioD uvrB ybhK moaB moaC moaD moaE ybhL ybhO ybhP ybhQ ybhR ybhS ybhF ybiH ybiA dinG ybiB ybiC ybiI ybiX ybiM ybiO glnQ glnP glnH ybiF ompX ybiP ybiS ybiT ybiU yliI yliJ dacC deoR ybjG cmr ybjH ybjM grxA 15.2 14.8 11.0 11.1 14.2 8.2 11.3 9.1 15.8 10.4 7.3 9.7 7.5 3.7 9.3 11.3 5.4 3.7 5.4 2.3 9.7 8.6 9.4 2.8 15.1 9.3 10.4 21.8 33.6 10.5 9.9 2.6 9.8 4.1 9.5 13.2 13.0 9.7 10.2 20.6 16.9 3.1 10.8 3.9 8.3 14.7 8.4 7.4 4.0 9.8 4.1 15.1 7.7 9.1 6.4 7.9 10.0 6.2 7.6 6.0 8.9 7.0 4.0 5.6 4.8 2.6 6.5 7.8 4.5 2.8 4.0 1.8 6.5 5.8 6.2 1.9 8.3 5.5 6.9 11.1 18.1 7.2 5.1 2.0 5.0 2.7 6.3 7.0 8.6 6.7 6.5 11.9 8.9 1.7 7.5 3.0 5.7 9.6 6.2 4.4 2.2 7.6 2.5 9.2 431.5 39.1 39.4 19.0 24.9 12.1 21.5 19.4 68.0 20.0 39.7 37.9 17.1 6.3 16.0 20.3 6.7 5.7 8.2 3.2 19.1 16.1 19.1 5.4 78.9 29.5 21.2 775.0 233.0 19.6 215.6 3.5 192.1 7.9 19.3 110.8 27.0 18.0 
24.4 76.1 159.5 20.6 19.3 5.6 15.5 31.7 12.6 25.3 25.8 13.6 12.6 42.4 b0850 b0851 b0853 b0856 b0858 b0861 b0862 b0863 b0864 b0865 b0866 b0867 b0868 b0869 b0870 b0871 b0872 b0874 b0876 b0877 b0879 b0880 b0881 b0882 b0884 b0887 b0888 b0889 b0890 b0891 b0893 b0901 b0902 b0903 b0904 b0905 b0906 b0907 b0908 b0910 b0911 b0912 b0915 b0916 b0917 b0918 b0919 b0921 b0922 b0923 b0924 b0925 ybjC mdaA ybjN potH ybjO artM artQ artI artP ybjP ybjT ybjU poxB ybjE ybjD ybjX ybjZ cspD yljA clpA infA cydD trxB lrp ftsK lolA serS ycaK pflA pflB focA ycaO ycaP serC aroA cmk rpsA himD ycaH ycaQ ycaR kdsB smtA mukF mukE mukB ycbB 2.7 3.7 9.2 16.6 16.4 9.8 9.2 8.4 4.5 6.3 2.3 2.4 5.2 9.6 9.1 8.6 10.3 6.0 12.2 7.6 11.5 9.7 4.4 6.1 7.2 6.6 12.6 8.5 7.2 8.2 7.6 4.1 7.4 25.4 5.9 11.8 4.3 12.7 10.9 9.6 12.1 11.5 11.0 8.6 6.5 5.9 5.8 4.6 11.1 6.6 11.2 3.1 2.2 2.9 6.8 10.2 9.7 5.3 6.4 6.3 3.3 4.7 1.8 1.8 3.5 6.3 6.1 5.9 6.4 4.3 8.2 4.0 7.9 6.5 3.3 5.0 5.2 5.2 7.4 6.5 5.0 5.5 5.5 2.2 5.3 15.0 4.7 6.6 3.1 10.1 7.7 6.5 8.0 7.7 7.2 5.2 5.3 4.7 4.1 3.8 8.8 4.9 7.1 2.5 3.5 5.3 14.1 45.2 53.5 56.7 16.7 12.5 7.2 9.5 3.1 3.7 10.1 20.5 17.8 15.9 26.1 9.5 23.8 76.4 21.5 19.0 6.6 7.9 11.5 9.0 43.2 12.3 13.0 16.1 12.2 35.4 12.0 81.7 8.0 61.2 6.9 17.1 18.4 18.4 24.9 22.6 23.5 25.1 8.5 7.9 9.7 5.7 15.2 10.2 26.6 4.1 b0926 b0927 b0928 b0929 b0930 b0931 b0932 b0933 b0934 b0938 b0944 b0948 b0949 b0950 b0951 b0952 b0954 b0955 b0956 b0958 b0959 b0960 b0961 b0962 b0963 b0964 b0965 b0966 b0967 b0970 b0973 b0974 b0978 b0981 b0986 b0989 b0990 b0992 b0995 b0997 b0999 b1000 b1001 b1002 b1003 b1004 b1010 b1011 b1014 b1015 b1018 b1024 119 ycbK ycbL aspC ompF asnS pncB pepN ycbE ycbM ycbQ ycbF ycbY uup pqiA pqiB ymbA fabA ycbG sulA yccF helD mgsA yccV yccA hyaB hyaC appC yccC ymcC cspH cspG yccM torR torA yccD cbpA yccE agp yccJ wrbA putA putP ycdO ycdS 11.1 8.7 12.1 21.2 7.6 11.8 8.7 14.9 13.6 13.8 12.5 13.5 9.0 10.4 9.1 12.3 10.6 9.8 7.8 2.6 4.2 9.1 7.0 11.6 7.6 9.1 10.4 3.2 8.9 8.3 9.7 18.8 16.2 15.7 16.8 11.1 1.5 21.2 
10.4 17.3 4.5 4.7 15.7 20.8 13.4 9.1 12.5 14.5 4.5 17.3 11.5 9.7 7.7 5.7 9.0 11.6 4.2 6.7 4.7 7.6 7.4 7.7 7.8 9.8 4.8 7.4 6.7 7.9 6.0 8.0 5.4 2.0 2.9 5.7 3.9 9.1 5.9 5.5 5.3 2.4 5.5 4.6 7.2 10.4 9.4 10.3 9.4 6.4 1.2 10.7 6.0 10.6 3.2 3.6 8.0 12.5 9.2 6.5 8.4 7.4 2.8 8.9 7.0 5.7 19.6 18.2 18.7 128.5 44.6 50.2 58.0 435.0 87.1 62.3 32.1 21.9 68.7 17.1 14.2 27.9 48.7 12.5 13.9 3.7 7.1 23.5 34.1 16.2 10.6 25.4 201.8 4.7 23.6 45.0 14.7 100.5 58.0 32.9 78.3 42.9 2.1 811.3 39.5 47.3 7.9 6.9 483.2 62.5 24.6 14.9 24.8 469.3 11.1 371.6 31.2 32.7 b1025 b1033 b1034 b1035 b1036 b1040 b1041 b1045 b1046 b1048 b1050 b1051 b1053 b1054 b1056 b1060 b1061 b1062 b1063 b1064 b1065 b1066 b1067 b1068 b1069 b1070 b1071 b1072 b1073 b1074 b1075 b1076 b1077 b1078 b1081 b1082 b1084 b1086 b1087 b1088 b1089 b1090 b1091 b1092 b1093 b1096 b1097 b1098 b1100 b1101 b1103 b1104 ycdT ycdW ycdX ycdY ycdZ csgD csgB ymdC mdoG yceK msyB yceE htrB yceI yceP dinI pyrC yceB grxB yceL rimJ yceH mviM mviN flgN flgM flgA flgB flgC flgD flgE flgF flgG flgJ flgK rne yceC yceF yceD rpmF plsX fabH fabD fabG pabC yceG tmk ycfH ptsG ycfF ycfL 17.5 6.7 12.0 13.3 10.7 7.9 13.0 2.7 6.8 10.9 11.6 11.2 8.2 8.9 10.8 3.3 9.4 9.2 9.4 10.0 15.9 3.8 4.9 6.6 9.7 6.0 5.1 14.6 6.9 3.5 6.1 5.6 2.9 8.4 14.8 8.7 10.3 5.1 7.9 8.6 10.0 7.0 11.7 9.2 13.6 5.8 7.0 7.2 3.0 3.8 6.8 7.8 10.8 5.1 9.2 9.5 7.2 5.0 7.5 2.1 4.1 5.5 6.4 6.9 4.4 6.7 7.7 2.2 6.4 4.9 6.8 8.6 11.3 3.1 4.0 5.3 7.6 4.1 3.2 8.3 4.3 2.4 4.2 3.2 1.6 5.7 8.9 5.8 5.5 2.7 5.3 5.9 6.4 5.4 8.0 7.3 7.1 3.4 4.9 5.4 2.3 3.2 5.5 6.0 44.8 9.8 17.2 22.3 20.8 18.6 50.3 3.6 20.6 314.7 65.0 29.7 51.1 13.5 18.3 6.2 17.7 71.9 15.5 12.0 26.9 4.7 6.2 8.9 13.2 11.5 11.9 63.2 17.2 6.7 11.0 19.8 15.7 16.1 43.4 16.7 75.7 34.5 15.6 15.7 22.8 9.7 21.7 12.6 173.0 20.1 12.1 11.1 4.2 4.6 8.9 11.1 b1105 b1106 b1107 b1108 b1109 b1111 b1112 b1113 b1114 b1116 b1117 b1118 b1119 b1123 b1125 b1126 b1127 b1128 b1130 b1131 b1132 b1134 b1135 b1137 b1138 b1139 b1140 b1143 b1145 b1146 b1150 b1153 b1158 b1162 
b1163 b1170 b1171 b1172 b1174 b1175 b1176 b1177 b1178 b1179 b1180 b1182 b1183 b1184 b1186 b1187 b1188 b1189 120 ycfM ycfN ycfO ycfP ndh ycfQ ycfR ycfS mfd ycfU ycfV ycfW ycfX potD potB potA pepT ycfD phoP purB ycfC ymfC ymfD ymfE lit intE ymfI ymfR pin ycgE minE minD minC ycgJ ycgK ycgL hlyE umuD umuC nhaB fadR ycgB dadA 6.8 8.2 5.7 4.8 12.0 7.1 3.5 21.1 15.8 11.9 7.5 7.1 12.5 11.1 13.2 21.5 6.0 8.0 5.7 14.0 6.4 13.4 11.3 2.5 2.3 7.5 8.1 13.3 6.0 14.2 6.5 12.6 13.5 5.5 13.2 17.4 3.7 15.0 7.4 6.4 3.5 6.7 8.1 6.0 6.3 5.8 15.9 9.3 9.4 3.9 3.4 6.4 5.3 5.9 4.6 3.5 7.1 5.1 2.1 10.6 10.8 8.1 4.9 5.3 6.7 8.0 8.5 12.0 4.5 6.4 4.2 8.6 4.3 8.8 7.8 1.3 1.5 4.5 4.7 9.6 3.8 8.1 4.3 7.2 8.4 4.1 7.2 10.2 2.1 10.2 5.8 4.8 2.9 4.3 5.4 4.5 4.6 3.3 9.1 5.5 6.0 2.9 2.5 3.9 9.8 13.6 7.5 7.4 37.7 11.7 10.4 1380.3 29.0 22.2 16.4 10.6 92.1 18.1 29.6 102.6 9.2 10.6 9.0 37.3 12.1 28.0 21.0 20.3 5.0 22.0 28.3 21.5 15.3 58.2 13.1 50.0 33.7 8.2 75.9 59.5 17.9 28.8 10.2 9.4 4.5 15.1 16.9 8.9 9.7 23.1 62.0 30.6 22.4 6.1 5.5 18.4 b1190 b1191 b1195 b1197 b1198 b1199 b1200 b1201 b1203 b1205 b1208 b1209 b1210 b1214 b1215 b1216 b1217 b1219 b1221 b1222 b1224 b1225 b1227 b1232 b1233 b1234 b1235 b1236 b1238 b1240 b1243 b1244 b1245 b1246 b1247 b1248 b1249 b1250 b1251 b1253 b1254 b1255 b1256 b1260 b1261 b1263 b1266 b1267 b1269 b1270 b1271 b1272 dadX ymgE treA ycgC ychF ychH ychB hemM hemA ychA kdsA chaA chaB ychN narL narX narG narH narI purU ychJ ychK hnr galU tdk oppA oppB oppC oppD oppF cls kch yciI yciA yciB yciC yciD trpA trpB trpD yciV yciO yciL btuR yciK sohB 9.9 8.8 4.3 2.9 7.5 10.3 6.9 6.7 9.4 1.3 8.6 6.7 5.5 5.5 14.5 10.4 13.6 11.6 6.0 6.0 7.3 10.7 9.9 9.8 8.2 8.2 9.8 11.5 16.7 11.0 6.2 6.6 9.4 11.4 14.3 7.8 6.1 5.6 7.6 7.4 4.9 4.5 9.1 11.8 9.1 11.5 7.1 6.2 15.3 12.1 7.0 6.5 5.9 4.7 2.9 2.2 4.0 7.3 4.9 3.7 6.6 1.2 7.1 5.6 3.9 4.4 9.9 5.3 6.9 8.8 3.9 4.1 4.2 5.9 5.3 6.3 4.8 4.4 6.4 8.3 9.2 7.1 5.2 4.8 6.1 7.2 8.5 4.8 5.1 4.0 5.5 5.2 3.6 3.4 5.0 8.4 6.1 8.2 5.1 4.8 8.3 6.9 4.8 4.1 30.0 67.4 7.9 4.3 
48.9 17.6 11.8 31.3 16.3 1.5 10.8 8.3 9.4 7.4 26.9 198.7 317.5 16.8 12.5 10.9 29.6 59.4 78.8 21.6 30.6 54.1 21.0 18.2 89.9 24.3 7.6 10.5 20.1 27.5 46.2 20.0 7.6 9.7 12.0 12.4 7.4 6.5 52.1 20.3 17.7 18.7 11.2 8.8 106.1 48.5 12.6 15.9 b1273 b1274 b1275 b1276 b1277 b1278 b1279 b1281 b1282 b1283 b1284 b1285 b1286 b1287 b1288 b1289 b1290 b1291 b1292 b1295 b1303 b1304 b1305 b1306 b1307 b1308 b1309 b1311 b1318 b1319 b1320 b1324 b1325 b1326 b1327 b1329 b1332 b1338 b1343 b1344 b1356 b1361 b1363 b1364 b1365 b1371 b1374 b1375 b1376 b1377 b1379 b1380 121 yciN topA cysB acnA ribA pgpB yciS pyrF yciH osmB yciR rnb yciW fabI ycjD sapF sapD sapC ymjA pspF pspA pspB pspC pspD pspE ycjM ycjO ycjV ompG ycjW tpx ycjG ycjI ynaJ ydaJ dbpA ydaO ydaR ydaW trkG ynaE ynaF hslJ ldhA 7.4 8.9 6.6 11.8 4.7 6.7 3.6 15.5 20.1 13.6 8.2 16.4 8.7 5.8 12.0 13.8 7.5 10.1 4.2 23.2 2.3 17.3 10.7 12.1 15.5 18.2 6.0 12.0 3.3 7.1 21.2 13.7 8.4 3.2 5.2 6.5 6.3 13.3 16.7 14.0 1.4 14.3 21.2 21.1 16.3 10.6 7.4 1.1 6.6 8.7 2.9 10.8 5.4 5.4 4.9 8.9 3.7 4.5 2.6 9.1 10.7 8.0 5.6 9.3 5.4 3.8 8.7 7.1 4.6 6.7 2.1 12.5 1.4 10.4 6.6 7.3 10.7 12.1 3.1 6.8 2.1 4.9 12.0 10.2 6.2 2.5 3.6 5.3 3.6 8.0 8.8 7.2 1.1 9.8 12.6 11.4 9.0 7.9 5.0 0.8 4.8 4.4 1.6 7.9 12.0 24.6 10.3 17.3 6.3 12.5 5.5 52.2 163.4 44.3 15.2 70.6 23.3 12.3 19.3 231.4 21.8 19.9 141.3 156.5 6.6 51.2 27.8 35.0 27.9 37.2 104.7 48.6 7.4 12.9 95.3 20.8 13.0 4.4 9.3 8.3 22.9 39.0 154.8 234.6 2.0 26.2 68.6 136.8 84.3 16.2 14.0 1.7 10.5 899.4 17.4 17.2 b1381 b1382 b1383 b1385 b1386 b1393 b1397 b1399 b1400 b1401 b1406 b1407 b1411 b1412 b1413 b1415 b1416 b1417 b1418 b1419 b1423 b1425 b1427 b1429 b1430 b1431 b1432 b1433 b1434 b1435 b1437 b1438 b1439 b1440 b1441 b1442 b1443 b1444 b1445 b1448 b1449 b1452 b1457 b1461 b1462 b1465 b1466 b1467 b1469 ydbH ynbE ydbL feaB tynA ydbS ydbA_1 ydbC ydbD ynbD acpD hrpA aldA gapC_2 gapC_1 cybB ydcA rimL tehA tehB ydcN ydcP yncB ydcD ydcE narV narW narY narU 10.7 6.4 9.4 15.1 9.3 5.4 14.1 2.8 7.0 7.0 5.8 4.1 5.8 8.6 5.3 2.8 9.0
2.3 5.0 4.8 69.2 14.8 24.4 65.1 38.1 60.6 33.2 3.7 11.9 12.8 6.9 25.7 11.7 2.7 14.3 7.6 9.6 5.1 13.0 6.2 1.7 8.7 5.0 7.1 10.5 1905.9 106.1 6.8 39.2 15.8 14.8 7.7 5.8 11.3 7.3 4.3 4.6 14.5 3.7 6.0 9.2 7.6 5.7 8.5 7.4 7.6 11.2 8.3 12.1 2.8 4.3 5.2 15.0 24.2 12.2 15.8 7.3 8.2 10.3 5.6 6.2 12.1 13.3 11.2 16.8 5.5 2.4 3.1 10.0 2.9 4.3 7.2 5.1 3.3 4.3 4.7 5.4 6.6 6.2 6.9 2.3 3.3 4.2 10.5 12.8 6.1 10.9 4.9 4.4 6.0 3.4 3.7 8.1 8.4 7.3 12.0 11.0 20.6 9.2 26.4 5.4 10.1 12.6 15.2 22.1 339.6 17.1 13.0 37.9 12.7 48.1 3.5 6.0 6.7 26.2 230.8 3739.3 29.3 14.0 53.3 34.2 16.0 20.4 23.5 32.4 23.6 27.8 b1473 b1475 b1476 b1477 b1478 b1479 b1480 b1482 b1488 b1490 b1491 b1492 b1497 b1498 b1499 b1507 b1508 b1509 b1512 b1514 b1516 b1517 b1518 b1519 b1520 b1521 b1522 b1523 b1524 b1525 b1529 b1530 b1531 b1533 b1534 b1537 b1538 b1539 b1540 b1542 b1544 b1545 b1547 b1549 b1552 b1556 b1557 b1558 b1561 b1562 b1563 b1564 122 yddG fdnH fdnI yddM adhP sfcA rpsV osmC xasA hipA hipB ydeW ydeY yneB uxaB yneH ydeB marR marA ydeD ydeF ydeJ dcp ydfG ydfH ydfI ydfK ydfO cspI cspB cspF rem relF relE relB 14.4 14.7 17.5 5.3 12.9 8.8 23.1 10.9 7.2 12.1 24.8 15.2 3.3 19.3 3.3 6.2 2.9 17.3 6.4 11.8 6.3 8.0 8.5 7.2 8.0 14.6 9.9 8.3 14.3 10.0 11.0 11.8 9.8 3.0 14.4 17.8 9.4 7.7 5.7 8.9 1.1 7.4 13.6 1.2 1.1 13.1 0.8 0.7 14.7 8.9 6.9 6.1 9.5 8.3 11.1 3.8 9.1 6.6 12.8 6.9 3.7 7.6 13.0 10.2 2.9 10.4 2.1 3.9 2.1 9.2 4.1 7.9 4.5 5.3 6.0 5.5 6.1 8.5 6.4 4.4 8.1 6.6 6.8 8.4 6.9 2.2 9.1 10.0 6.1 6.0 3.7 6.3 0.8 5.0 6.9 0.6 0.7 8.5 0.6 0.5 8.3 6.5 4.4 3.7 29.8 61.1 40.9 8.8 21.8 13.4 112.3 26.0 221.0 29.0 297.4 29.9 3.8 137.5 7.6 14.7 4.8 133.9 15.4 23.8 10.6 16.7 14.4 10.1 11.7 51.2 21.1 68.9 58.7 21.3 29.9 19.7 16.5 4.8 34.1 81.4 21.1 10.7 12.3 15.4 1.7 14.0 833.3 23.8 2.3 28.4 1.0 1.8 61.0 14.0 15.5 18.0 b1570 b1572 b1574 b1578 b1579 b1582 b1583 b1584 b1585 b1586 b1591 b1592 b1593 b1594 b1595 b1596 b1597 b1598 b1599 b1600 b1602 b1603 b1604 b1605 b1606 b1607 b1608 b1609 b1611 b1612 b1613 b1614 b1616 b1617 b1618 b1619 
b1621 b1622 b1623 b1624 b1626 b1627 b1628 b1630 b1632 b1635 b1636 b1637 b1638 b1640 b1641 b1642 dicA ydfB dicF speG ynfC mlc ynfL ynfM asr pntB pntA ydgB ydgC rstA rstB fumC fumA manA ydgA uidB uidA uidR hdhA malX malY add ydgO ydgQ gst pdxY tyrS pdxH slyB slyA 5.1 10.1 15.8 6.8 14.9 8.6 3.8 5.2 6.7 6.6 9.5 19.3 6.6 2.3 10.3 9.5 8.4 9.6 10.9 3.3 8.4 8.4 10.7 3.5 7.8 2.7 11.1 7.3 6.4 6.8 8.8 5.1 4.3 11.1 5.0 7.1 7.7 6.0 6.4 8.6 24.4 8.3 14.9 8.5 12.5 5.6 7.4 13.5 7.1 9.0 7.4 7.0 4.3 5.6 8.4 3.6 7.5 6.2 2.8 3.7 5.7 4.2 5.5 10.5 3.9 1.8 6.8 5.5 6.0 5.8 6.7 2.3 5.1 5.5 7.1 2.1 5.1 1.5 7.8 5.5 4.9 5.6 6.8 4.2 2.4 6.9 3.8 4.7 5.0 3.5 5.5 6.3 12.7 4.7 7.8 4.3 8.7 4.7 4.4 10.1 4.7 5.6 5.3 5.3 6.1 54.4 149.9 58.1 2960.9 14.0 5.7 9.2 8.0 15.1 33.1 118.0 23.6 3.2 21.2 34.3 14.0 27.4 29.7 6.0 24.9 17.4 21.2 10.4 16.6 14.6 19.3 10.9 9.2 8.4 12.4 6.5 26.0 28.0 7.3 14.6 17.0 22.0 7.8 13.4 327.0 31.6 191.4 764.0 21.8 7.2 23.0 20.2 14.1 22.1 12.2 10.1 b1644 b1645 b1646 b1647 b1650 b1651 b1652 b1654 b1655 b1657 b1658 b1659 b1661 b1662 b1663 b1664 b1667 b1668 b1676 b1678 b1679 b1680 b1681 b1682 b1683 b1684 b1685 b1686 b1687 b1688 b1689 b1692 b1693 b1694 b1699 b1700 b1701 b1702 b1703 b1704 b1705 b1706 b1707 b1708 b1710 b1711 b1712 b1713 b1714 b1717 b1718 b1719 123 sodC nemA gloA rnt ydhD ydhO purR ydhB cfa ribE ydhE pykF ynhG ynhA ynhC ynhD ynhE ydiC ydiJ ydiB aroD ydiF ydiS ydiT ydiD ppsA ydiA aroH ydiE nlpC btuE btuC himA pheT pheS rpmI infC thrS 16.9 25.3 4.4 7.4 11.4 8.2 8.8 7.3 8.4 9.6 6.1 12.0 9.2 6.7 12.6 3.8 13.2 4.3 13.6 6.7 6.9 5.2 8.2 6.1 4.1 4.8 13.8 9.3 13.8 7.7 7.7 15.2 10.1 9.2 6.1 9.6 12.6 16.2 6.2 7.1 5.1 5.4 6.4 11.3 7.8 6.5 11.3 8.6 9.7 15.8 15.9 11.4 9.3 15.1 3.2 5.5 9.6 6.0 6.2 5.7 6.2 5.2 3.8 7.6 7.1 5.4 8.4 2.7 8.3 3.2 9.8 4.1 4.8 4.0 6.0 4.6 3.4 3.5 10.2 7.1 10.0 5.6 4.8 8.6 7.0 5.1 3.6 6.4 7.4 10.3 5.3 3.6 3.1 3.9 5.2 8.4 6.7 4.4 7.8 5.9 6.1 8.3 9.4 7.1 90.5 78.1 6.8 11.5 14.0 13.0 15.6 10.1 13.0 64.8 15.4 27.9 13.2 8.8 24.7 6.1 31.8 6.3 22.1 18.2 12.2 7.5 
13.2 9.0 5.3 7.3 20.9 13.6 21.8 12.1 19.9 62.7 18.3 45.4 20.7 19.5 43.8 38.1 7.6 153.9 14.3 8.8 8.3 17.0 9.3 12.2 20.1 16.3 23.4 143.4 53.6 27.9 b1723 b1724 b1725 b1726 b1727 b1728 b1729 b1731 b1733 b1735 b1736 b1738 b1739 b1740 b1741 b1743 b1744 b1745 b1746 b1747 b1749 b1750 b1753 b1754 b1757 b1758 b1761 b1763 b1764 b1765 b1768 b1777 b1778 b1780 b1781 b1782 b1783 b1784 b1787 b1789 b1791 b1792 b1793 b1794 b1797 b1798 b1799 b1800 b1802 b1803 b1804 b1807 pfkB yniC ydjC celD celC celA osmE nadE spy ydjS xthA ydjX ynjA gdhA topB selD ydjA ydjB yeaA yeaD yeaF yeaG yeaH yeaK yeaL yeaN yeaO yoaF yeaP yeaR yeaS yeaT yeaU yeaW yeaX rnd yeaZ 8.1 9.1 3.4 5.6 6.3 18.8 10.9 13.2 16.1 7.8 9.8 3.5 9.0 9.1 8.2 12.1 13.5 8.6 13.3 16.0 7.5 4.3 13.5 13.5 5.0 10.4 14.4 8.7 6.5 3.7 9.6 2.3 3.3 7.6 4.6 9.1 11.1 6.3 8.0 16.1 12.0 6.5 9.5 6.1 21.6 13.4 1.2 6.7 9.9 19.7 8.1 13.1 6.3 7.1 2.7 4.5 4.6 10.0 7.8 9.2 8.6 5.9 6.1 2.8 6.3 6.9 5.6 8.9 8.2 5.1 9.0 9.8 5.3 2.3 8.7 8.0 3.5 6.5 9.0 6.7 5.2 2.9 7.0 1.6 2.4 6.1 3.4 6.5 6.6 5.1 5.5 9.7 8.3 4.7 5.4 4.0 11.9 7.3 0.9 4.7 5.2 11.7 5.5 7.4 11.2 12.6 4.7 7.3 9.6 163.7 18.1 23.0 140.3 11.7 25.4 4.7 15.7 13.5 15.6 18.9 38.8 29.2 25.0 45.0 13.0 36.9 29.9 44.1 9.1 25.4 35.4 12.3 8.9 5.2 15.6 4.7 5.2 10.1 7.2 15.0 35.1 8.0 15.3 46.3 21.6 10.8 37.5 12.5 118.7 86.5 2.0 11.4 103.8 62.3 15.3 56.2 b1808 b1809 b1811 b1812 b1813 b1814 b1815 b1816 b1820 b1821 b1822 b1825 b1826 b1827 b1829 b1830 b1831 b1832 b1835 b1836 b1837 b1839 b1840 b1841 b1842 b1843 b1844 b1845 b1846 b1847 b1848 b1850 b1851 b1852 b1853 b1854 b1855 b1856 b1857 b1858 b1859 b1860 b1861 b1862 b1863 b1864 b1865 b1866 b1867 b1869 b1870 b1871 124 pabB yeaB sdaA yoaE yebH htpX prc yebJ yebU holE ptrB yebE yebF yebG eda edd zwf yebK pykA msbB yebA yebL yebM yebI ruvB ruvA yebB ruvC yebC ntpA aspS yecD yecN yecO yecP 11.4 8.9 10.3 10.7 8.8 7.2 6.8 9.1 9.4 11.9 9.7 7.6 4.3 4.1 6.3 12.0 7.6 4.9 15.0 8.2 2.8 9.3 6.8 6.1 2.0 9.7 11.3 8.8 3.6 7.7 2.5 9.3 10.8 7.9 3.7 8.7 7.8 9.9 6.0 12.3 14.0 8.2 5.7 
18.3 12.1 12.0 10.6 9.5 7.7 6.6 10.2 9.4 8.0 6.8 7.0 6.4 6.6 5.5 4.1 5.8 5.7 7.4 6.3 4.0 3.3 3.0 4.5 8.3 5.8 4.0 9.2 6.5 1.9 7.0 5.4 5.0 1.2 5.7 8.2 5.1 2.6 5.6 2.1 6.4 8.5 6.1 3.0 7.2 5.8 6.2 4.6 7.9 7.8 5.2 4.1 9.2 7.9 8.3 8.0 7.3 5.3 4.7 7.6 6.1 20.1 12.8 19.6 33.6 13.3 10.6 20.5 21.0 26.6 30.5 21.2 74.9 6.1 6.6 10.5 21.8 10.9 6.2 40.5 11.1 4.9 14.2 9.1 7.6 5.9 33.3 18.2 30.6 5.5 12.5 3.2 17.0 14.8 11.2 5.1 10.8 11.9 25.0 8.5 27.7 66.1 19.0 9.1 1187.4 25.7 21.5 15.8 13.7 14.5 11.4 15.2 19.8 b1874 b1875 b1882 b1883 b1884 b1886 b1887 b1888 b1890 b1891 b1892 b1894 b1895 b1896 b1897 b1898 b1899 b1900 b1901 b1902 b1903 b1905 b1907 b1908 b1912 b1913 b1914 b1915 b1916 b1917 b1918 b1919 b1920 b1921 b1922 b1923 b1924 b1926 b1927 b1928 b1929 b1930 b1931 b1932 b1938 b1939 b1940 b1941 b1942 b1943 b1945 b1947 cutC yecM cheY cheB cheR tar cheW cheA motA flhC flhD insA_5 yecG otsA otsB araH_2 araH_1 araG araF yecI ftn tyrP yecA pgsA uvrC uvrY yecF sdiA yecC yecS yedO fliY fliZ fliA fliC fliD fliT amyA yedD yedE yedF yedK yedL fliF fliG fliH fliI fliJ fliK fliM fliO 5.3 12.6 10.6 15.9 10.7 5.8 10.9 14.3 10.5 13.6 8.4 10.2 3.8 9.2 4.3 4.3 2.6 4.9 5.6 2.9 13.8 8.5 11.4 8.5 3.6 8.5 5.6 11.8 4.6 11.6 9.0 7.6 6.4 10.6 6.6 8.1 10.0 14.0 7.9 6.1 11.8 4.0 11.8 3.6 10.1 10.6 15.3 8.8 14.4 8.4 13.8 7.7 3.3 9.1 6.3 10.4 7.7 4.0 5.8 8.2 6.0 8.6 5.8 7.0 2.8 5.8 3.2 2.9 1.8 3.3 3.8 2.2 8.0 6.4 5.7 6.3 3.0 7.0 4.8 6.9 3.1 8.2 6.6 5.5 4.6 6.6 4.0 6.2 5.7 9.3 6.2 4.7 7.7 2.8 7.5 2.1 5.8 7.1 8.3 5.6 9.1 5.4 8.3 5.0 12.6 20.6 32.9 34.4 17.5 10.1 80.6 58.4 42.7 32.8 15.2 18.8 5.9 22.1 6.9 8.8 4.8 9.3 10.4 4.2 49.3 12.6 1108.5 12.8 4.6 10.7 6.6 39.5 9.2 19.6 14.4 12.5 10.9 26.3 17.4 11.9 39.4 28.4 10.8 8.7 24.9 7.1 28.0 14.2 38.0 20.8 98.5 20.9 34.2 18.6 40.5 16.8 b1948 b1952 b1953 b1955 b1957 b1958 b1960 b1961 b1962 b1963 b1964 b1965 b1966 b1967 b1968 b1969 b1970 b1971 b1973 b1976 b1978 b1981 b1982 b1983 b1988 b1990 b1991 b1992 b1993 b1995 b1998 b1999 b2002 b2004 b2005 b2006 b2007 b2008 b2009 
b2010 b2011 b2012 b2015 b2016 b2017 b2018 b2019 b2021 b2022 b2023 b2024 b2025 125 fliP dsrB yedI vsr dcm yedJ yedU yedV yedW shiA amn nac erfK cobT cobS cobU yeeP yeeS yeeU yeeV yeeW yeeX yeeA sbmC dacD sbcB yeeD yeeY yefM hisL hisG hisC hisB hisH hisA hisF 13.0 9.6 3.2 8.5 8.4 5.7 9.0 3.1 12.5 3.8 3.3 11.8 21.9 8.9 11.5 7.3 17.1 5.4 7.7 4.3 9.2 9.0 8.7 9.2 13.0 2.8 9.4 5.6 6.1 2.1 17.1 16.7 8.2 4.6 5.8 18.8 8.0 8.3 5.1 4.7 4.6 8.0 9.1 4.4 2.9 1.2 6.5 3.6 4.5 6.8 10.0 10.1 7.0 6.3 2.2 6.0 6.0 4.5 6.2 1.9 7.9 2.7 2.2 7.0 11.8 6.4 6.0 5.1 11.0 3.5 4.4 3.5 5.3 6.1 7.0 6.4 7.5 2.1 7.4 3.8 4.1 1.1 10.4 8.8 4.7 2.7 3.0 11.8 6.6 6.5 3.7 3.4 3.3 5.7 6.2 3.6 2.3 0.7 4.4 2.7 3.4 4.2 6.6 7.9 89.2 20.0 5.9 14.5 14.1 7.7 16.1 7.9 29.2 6.4 6.7 37.5 160.7 14.9 114.9 12.4 38.5 11.5 28.3 5.6 34.2 17.6 11.4 16.5 49.2 4.0 12.9 10.5 11.9 32.2 46.8 149.0 30.4 16.8 97.2 46.1 10.3 11.3 8.1 7.5 7.7 13.7 17.1 5.7 4.1 3.2 12.1 5.1 6.7 17.2 20.5 14.0 b2026 b2027 b2029 b2031 b2032 b2033 b2034 b2035 b2036 b2037 b2038 b2039 b2040 b2041 b2042 b2044 b2046 b2047 b2048 b2050 b2051 b2052 b2060 b2063 b2064 b2065 b2068 b2070 b2071 b2072 b2073 b2076 b2077 b2078 b2080 b2081 b2082 b2084 b2086 b2090 b2091 b2097 b2098 b2099 b2100 b2101 b2102 b2104 b2105 b2107 b2108 b2109 hisI wzzB gnd yefJ wbbK wbbJ wbbI wbbH glf rfbX rfbC rfbA rfbD rfbB galF wcaL wzxC wcaJ cpsG wcaI wcaH wcaG yegH asmA dcd alkA yegO yegB baeS yegQ ogrK gatR_2 gatD yegT yegW yegX thiM yohL yehA yehB 12.5 9.4 16.1 3.8 4.9 5.6 6.0 7.9 11.1 15.1 16.2 12.1 11.6 9.3 8.6 10.9 16.8 17.0 14.8 7.2 13.8 3.4 9.5 8.4 12.9 2.7 10.9 14.0 18.4 16.3 23.0 13.6 12.8 7.7 6.9 19.3 12.2 8.0 12.2 12.8 11.6 10.4 13.6 6.4 16.6 2.9 5.3 12.0 6.4 8.5 21.3 6.1 9.0 6.5 11.7 3.0 3.5 4.0 3.9 4.8 6.7 8.6 9.1 8.6 8.4 6.3 6.3 6.8 11.1 9.4 10.2 3.8 7.8 2.3 4.9 6.6 9.2 1.9 6.2 8.0 9.9 9.7 11.6 6.8 7.2 4.0 4.3 10.6 7.7 5.1 7.6 7.4 7.5 7.7 8.0 3.5 10.7 1.8 3.4 7.1 3.6 5.4 11.1 3.9 20.3 16.9 25.7 5.1 8.2 9.7 13.6 23.6 32.7 61.8 74.6 20.7 18.9 17.3 13.5 26.7 33.9 91.2 27.0 57.5 
56.8 6.4 173.2 11.5 21.7 4.7 45.8 55.3 135.4 50.3 1153.7 2590.0 62.2 80.2 18.2 115.6 29.4 18.4 30.9 49.8 25.2 15.9 47.4 43.6 37.4 8.7 12.3 36.9 28.3 19.4 264.5 14.4 b2112 b2113 b2114 b2119 b2121 b2123 b2125 b2126 b2127 b2128 b2129 b2130 b2131 b2133 b2134 b2135 b2136 b2137 b2139 b2143 b2144 b2146 b2147 b2149 b2150 b2151 b2153 b2154 b2155 b2156 b2157 b2160 b2162 b2168 b2169 b2170 b2171 b2172 b2173 b2175 b2176 b2177 b2178 b2180 b2181 b2183 b2184 b2185 b2186 b2187 b2188 b2190 126 yehE mrp metG yehL yehP yehR yehT yehU yehV yehW yehX yehY yehZ dld pbpG yohC yohD yohF yohH cdd sanA yeiA mglA mglB galS folE yeiG cirA lysP yeiE yeiI yeiK fruK fruB yeiO yeiP yeiQ yeiR spr rtn yejA yejB yejF yejG rsuA yejH rplY yejK yejL yejM yejO 7.4 7.3 2.2 9.7 10.8 24.9 6.6 5.6 8.2 8.5 11.6 9.8 11.5 8.5 13.4 7.8 3.3 10.4 7.3 7.7 6.6 8.4 12.5 8.7 7.8 15.3 7.3 8.2 11.4 8.6 6.7 5.4 18.9 4.6 3.6 9.0 11.6 6.6 9.8 10.0 6.2 12.6 8.1 15.0 9.7 4.6 10.6 8.4 9.8 5.7 9.7 4.7 5.2 5.7 1.4 5.0 6.4 13.2 4.0 4.4 4.8 5.7 7.8 7.2 7.2 7.2 9.3 5.5 2.4 7.4 3.7 5.2 4.4 5.8 7.9 5.9 5.1 8.7 4.8 6.3 6.5 5.8 5.2 3.5 10.6 3.3 2.5 4.9 8.5 5.0 5.2 6.5 4.3 8.3 6.6 8.3 5.2 3.3 6.3 5.5 6.8 4.3 6.5 3.4 13.0 10.0 6.2 119.4 33.6 222.0 18.0 7.8 28.9 17.0 22.6 15.6 27.8 10.3 23.7 13.6 5.3 17.5 651.4 15.0 12.9 15.4 30.3 16.6 16.5 61.8 14.5 12.0 44.3 16.8 9.2 12.1 83.3 7.5 6.6 54.6 18.3 9.7 99.4 22.3 10.6 26.4 10.4 75.8 64.1 8.0 33.4 17.8 17.9 8.2 18.8 7.6 b2191 b2193 b2194 b2196 b2198 b2199 b2200 b2202 b2203 b2208 b2209 b2211 b2212 b2213 b2216 b2217 b2218 b2219 b2220 b2225 b2226 b2227 b2229 b2231 b2232 b2233 b2234 b2235 b2237 b2239 b2240 b2241 b2242 b2245 b2248 b2249 b2251 b2254 b2255 b2256 b2257 b2259 b2261 b2262 b2264 b2265 b2266 b2267 b2268 b2274 b2276 b2277 narP ccmH ccmF ccmD ccmC ccmB napC napB napF eco yojI alkB ada yojN rcsB rcsC atoS atoC gyrA ubiG yfaL nrdA nrdB inaA glpQ glpT glpA glpB yfaO pmrD menC menB menD menF elaB elaA elaC nuoN nuoM 17.3 9.3 13.2 4.9 13.2 4.9 12.7 3.1 12.0 8.2 11.2 7.9 15.4 13.2 4.9 6.1 11.0 
4.4 5.2 9.3 21.9 13.3 15.6 13.9 6.8 15.2 4.1 6.0 4.6 11.6 8.6 7.9 16.1 10.8 18.4 7.4 6.1 6.9 13.3 13.7 13.9 3.8 11.2 17.0 13.4 8.0 19.3 8.9 9.9 15.5 17.7 16.1 10.6 7.0 7.1 3.0 8.2 3.1 7.7 2.2 7.4 4.7 8.0 4.2 10.0 8.4 3.9 5.0 8.7 3.4 4.0 5.3 12.7 7.6 8.1 9.8 4.0 9.5 2.1 4.7 3.1 8.4 6.2 4.0 9.8 6.3 10.9 4.9 4.5 4.1 8.6 9.7 8.3 2.7 7.3 10.6 8.1 6.0 11.6 6.5 7.3 8.4 12.3 10.0 47.7 13.7 91.9 14.9 33.6 10.7 37.9 5.1 31.0 30.9 19.0 64.1 33.1 30.9 6.5 7.9 15.1 6.3 7.5 37.3 77.5 54.4 193.9 23.5 22.2 38.3 52.0 8.3 8.5 19.0 13.9 475.2 43.7 37.5 58.1 14.9 9.6 22.5 29.1 23.1 41.5 6.3 23.9 42.9 39.0 12.2 58.8 14.1 15.6 109.6 32.1 41.3 b2278 b2279 b2280 b2281 b2282 b2283 b2284 b2285 b2286 b2287 b2288 b2289 b2290 b2291 b2292 b2293 b2294 b2295 b2296 b2297 b2299 b2300 b2301 b2302 b2303 b2304 b2305 b2306 b2307 b2308 b2309 b2313 b2314 b2315 b2316 b2317 b2318 b2319 b2320 b2322 b2323 b2325 b2326 b2328 b2329 b2330 b2331 b2332 b2334 b2335 b2337 b2339 127 nuoL nuoK nuoJ nuoI nuoH nuoG nuoF nuoE nuoC nuoB nuoA lrhA yfbS yfbT ackA pta yfcE yfcF yfcG folX yfcI hisP hisM hisQ hisJ cvpA dedD folC accD dedA truA usg pdxB fabB mepA aroC yfcB - 12.1 13.4 11.6 11.1 11.2 10.8 9.3 9.4 7.5 8.0 6.3 6.2 6.4 8.6 4.1 13.7 11.4 7.6 10.3 13.0 12.6 6.3 7.8 2.6 4.8 6.0 8.4 14.9 26.3 7.1 4.8 10.0 6.1 6.2 9.0 11.1 11.4 13.2 9.3 15.2 4.5 14.4 11.4 11.1 8.5 13.2 3.2 4.6 14.8 12.6 12.2 11.7 8.8 9.2 8.2 8.4 8.2 8.0 6.5 6.3 5.8 5.6 4.8 4.8 4.8 5.5 3.0 9.3 7.0 5.3 7.2 7.6 9.6 5.2 5.7 1.9 3.8 4.5 5.5 9.0 15.4 3.8 3.5 6.2 5.2 4.9 6.4 7.3 7.7 8.1 6.5 8.7 3.7 8.6 7.0 7.1 6.2 7.4 2.1 3.1 7.6 7.1 7.2 6.5 19.5 24.1 20.1 16.4 17.8 16.9 16.4 18.6 10.4 14.0 9.2 8.7 9.8 19.9 6.5 26.0 30.1 12.9 17.6 43.1 18.5 8.0 12.6 4.0 6.6 9.2 17.7 44.6 89.0 56.6 7.7 25.2 7.4 8.5 15.2 22.8 22.5 36.1 16.2 57.8 5.8 42.6 32.0 26.3 13.3 63.1 6.6 8.9 303.6 60.1 40.3 57.4 b2340 b2341 b2342 b2343 b2345 b2346 b2347 b2350 b2351 b2353 b2356 b2358 b2361 b2366 b2368 b2369 b2370 b2375 b2377 b2378 b2379 b2380 b2381 b2382 b2383 b2384 b2386 b2388 b2392 b2393 
b2394 b2395 b2398 b2399 b2400 b2405 b2406 b2410 b2411 b2412 b2413 b2414 b2415 b2416 b2417 b2418 b2420 b2421 b2423 b2425 b2426 b2427 vacJ yfdC yfdM yfdO dsdA emrK evgA evgS ddg glk nupC yi81_3 yfeA yfeC yfeD gltX xapR xapB yfeH lig zipA cysZ cysK ptsH ptsI crr pdxK cysM cysW cysP ucpA yfeT 8.1 2.1 5.7 12.5 11.1 6.0 7.6 12.4 12.4 15.4 11.8 13.9 7.1 17.2 16.3 2.4 13.9 12.8 1.9 3.7 9.0 4.0 5.5 14.5 16.4 9.1 19.2 6.3 9.2 10.5 13.4 13.5 1.5 1.8 9.7 7.6 12.1 7.3 16.2 6.2 10.8 6.4 12.9 12.9 11.4 6.7 12.0 8.2 7.3 7.3 3.1 10.3 5.9 1.7 3.5 9.2 5.9 5.0 4.5 7.4 7.6 9.1 6.3 7.5 4.1 9.5 8.6 2.0 8.9 7.2 1.5 2.6 6.9 2.3 4.1 10.0 9.7 6.2 11.0 5.1 5.7 7.8 7.6 9.1 1.3 1.6 6.3 4.8 7.9 5.1 9.3 4.8 7.9 5.6 7.8 8.7 8.0 5.2 6.9 4.3 4.2 4.5 2.6 6.6 13.4 2.9 16.4 19.6 95.3 7.3 24.4 38.2 33.6 51.3 87.6 91.4 26.2 89.2 183.9 2.9 31.6 58.7 2.7 6.1 12.8 16.0 8.4 26.7 51.6 17.4 75.1 8.2 24.2 16.0 58.6 26.1 1.9 2.2 21.2 18.0 25.7 12.8 65.2 8.6 17.3 7.5 36.8 25.1 19.8 9.6 45.3 97.8 28.0 19.3 3.9 22.7 b2428 b2429 b2430 b2431 b2432 b2433 b2434 b2435 b2438 b2439 b2440 b2441 b2442 b2443 b2445 b2447 b2449 b2450 b2452 b2454 b2456 b2457 b2459 b2460 b2463 b2464 b2465 b2468 b2469 b2471 b2472 b2473 b2474 b2475 b2476 b2477 b2478 b2479 b2480 b2488 b2489 b2490 b2491 b2492 b2493 b2494 b2495 b2496 b2498 b2499 b2500 b2501 128 yfeU amiA eutC eutB eutH eutJ cchB cchA talA tktB yffG narQ yffB dapE ypfH ypfI purC nlpB dapA gcvR bcp hyfH hyfI hyfR focB perM upp purM purN ppk 4.3 8.5 7.2 6.3 16.0 13.1 6.0 9.3 12.4 3.5 6.4 8.3 12.7 3.0 7.4 18.0 4.0 9.8 22.3 11.1 20.8 13.7 9.3 14.9 9.5 6.1 9.8 19.1 15.4 8.0 8.1 9.1 8.7 13.3 14.7 9.7 6.5 3.5 7.3 7.4 14.5 17.7 12.6 9.0 5.9 11.0 12.0 7.8 9.7 8.3 7.1 6.7 3.1 5.8 5.8 5.0 10.3 9.6 4.3 5.3 8.8 2.1 4.6 6.6 7.4 2.0 3.7 10.2 2.3 7.6 11.4 6.5 10.6 9.1 5.7 9.2 7.4 4.7 7.1 10.9 9.2 6.4 6.1 6.1 6.3 7.6 10.5 7.5 5.3 2.7 5.6 4.4 8.1 10.7 8.7 5.3 3.2 7.8 8.1 5.8 6.8 4.4 5.2 4.7 7.2 15.3 9.4 8.3 35.5 20.3 10.2 35.6 21.3 11.9 10.5 11.2 46.8 5.6 308.0 77.9 16.4 13.6 470.9 37.8 541.1 27.4 24.6 
39.3 13.2 8.8 16.1 80.2 47.3 10.7 12.1 17.6 14.2 56.1 24.4 13.5 8.4 5.1 10.4 22.5 69.5 51.5 22.9 31.3 49.1 18.1 23.4 11.8 17.1 76.9 11.0 11.4 b2502 b2504 b2506 b2508 b2511 b2512 b2513 b2514 b2515 b2516 b2518 b2519 b2520 b2521 b2522 b2523 b2524 b2525 b2526 b2527 b2528 b2529 b2530 b2531 b2532 b2533 b2534 b2535 b2536 b2538 b2541 b2542 b2543 b2544 b2546 b2548 b2549 b2551 b2552 b2553 b2554 b2555 b2556 b2557 b2559 b2560 b2561 b2562 b2563 b2564 b2565 b2567 ppx guaB hisS gcpE yfgA ndk pbpC sseA sseB pepB yfhJ fdx hscA yfhE yfhF yfhO suhB csiE hcaT hcaA1 hcaB hcaD yphA yphB yphD yphF yphG glyA hmpA glnB yfhA yfhG yfhK purL yfhC yfhB yfhH yfhL acpS pdxJ recO rnc 8.6 9.7 8.9 7.6 21.0 13.2 9.3 8.9 6.3 5.7 6.1 11.4 8.6 4.8 10.9 10.7 11.6 9.7 8.9 7.1 9.8 9.1 10.8 4.4 2.8 4.4 17.3 1.3 16.1 5.8 20.4 6.4 7.4 17.9 12.0 13.0 5.7 8.0 9.9 6.2 14.3 11.1 6.1 10.6 6.0 3.7 9.5 22.4 8.3 7.9 12.0 7.9 5.8 6.7 6.0 5.8 13.0 7.2 5.9 6.2 4.9 4.9 4.0 6.6 5.7 3.5 7.1 7.9 8.5 6.8 7.0 3.9 7.5 6.5 7.9 3.1 2.3 2.6 10.0 1.1 9.7 3.5 12.7 3.4 5.1 9.3 6.7 7.9 3.7 6.1 7.2 3.7 9.3 7.3 4.8 7.2 4.0 2.8 6.4 12.5 6.6 5.3 9.4 5.9 16.5 17.5 16.8 11.0 54.5 86.7 21.9 16.1 8.8 7.0 13.4 41.9 17.3 7.3 23.2 16.9 18.2 16.6 12.4 42.1 14.0 15.3 17.1 7.5 3.7 14.4 63.5 1.5 47.1 15.6 51.4 43.4 13.8 279.6 56.3 37.7 12.6 11.6 15.9 19.3 31.8 23.2 8.4 20.2 11.9 5.4 18.9 108.6 11.3 15.1 16.6 12.1 b2569 b2570 b2571 b2572 b2573 b2575 b2576 b2577 b2578 b2579 b2580 b2581 b2583 b2584 b2585 b2587 b2592 b2593 b2594 b2595 b2596 b2599 b2600 b2601 b2602 b2603 b2604 b2605 b2606 b2607 b2608 b2609 b2610 b2611 b2612 b2613 b2614 b2615 b2616 b2617 b2618 b2619 b2620 b2622 b2627 b2629 b2630 b2631 b2638 b2640 b2643 b2645 129 lepA rseC rseB rseA rpoE yfiC srmB yfiE yfiK yfiD ung yfiF yfiP yfiQ pssA kgtP clpB yfiH sfhB pheA tyrA aroF yfiL yfiN yfiB rplS trmD yfjA rpsP ffh ypjE yfjD grpE yfjB recN smpA smpB intA yfjK yfjM yfjN yfjO yfjX yfjZ 11.6 9.5 10.1 10.9 8.0 20.2 11.5 4.4 12.5 10.6 5.1 13.8 21.3 5.1 8.7 5.3 9.4 6.1 7.8 8.0 1.5 8.7 15.2 3.6 6.3 
4.8 6.2 9.8 19.1 11.7 13.8 8.6 5.8 3.5 8.0 8.6 10.0 6.8 6.4 8.1 7.0 5.3 4.4 3.1 10.4 4.1 6.9 8.2 13.3 9.1 16.9 2.2 7.6 5.4 6.9 6.7 5.6 10.9 7.6 3.2 7.6 7.6 3.7 8.9 12.0 4.2 6.1 4.3 7.1 3.6 6.1 5.4 0.8 5.0 8.4 2.5 4.0 3.5 3.8 6.2 14.8 7.5 7.8 6.9 4.9 2.3 6.0 5.5 6.8 4.9 4.2 5.9 4.7 4.1 3.7 2.7 7.1 2.2 5.2 6.3 7.6 5.7 9.8 1.2 25.2 39.6 19.1 27.9 14.0 132.8 23.4 6.9 35.9 17.6 8.3 30.4 94.3 6.3 15.2 6.8 13.8 19.6 10.7 15.5 14.4 33.8 78.9 6.4 13.9 7.8 16.5 23.6 26.8 26.4 57.1 11.3 7.1 7.1 12.0 18.9 19.1 11.1 13.5 13.1 14.4 7.5 5.4 3.7 19.6 24.2 10.3 11.6 52.2 22.5 58.9 18.0 b2647 b2660 b2661 b2662 b2663 b2664 b2665 b2666 b2667 b2669 b2670 b2672 b2674 b2676 b2677 b2679 b2680 b2682 b2683 b2684 b2686 b2687 b2688 b2689 b2690 b2696 b2697 b2698 b2699 b2700 b2701 b2703 b2704 b2705 b2706 b2707 b2708 b2709 b2712 b2714 b2715 b2717 b2718 b2719 b2722 b2724 b2726 b2727 b2728 b2729 b2730 b2731 ypjA ygaF gabD gabT gabP ygaE ygaU stpA ygaM nrdI nrdF proV proX ygaH emrR emrB ygaG gshA yqaB csrA alaS oraA recA ygaD mltB srlE srlB srlD gutM srlR gutQ ygaA hypF ascG ascF hycI hycH hycG hycD hycB hypA hypB hypC hypD hypE fhlA 9.6 4.5 2.9 6.8 6.4 11.8 8.2 4.8 1.4 15.9 3.9 16.8 9.4 12.4 4.1 6.8 11.0 3.6 5.9 7.1 9.8 12.3 9.6 7.2 4.4 7.2 8.0 9.0 6.6 7.2 10.9 11.2 17.3 23.3 6.3 6.8 9.9 10.4 14.7 12.8 15.7 7.0 11.2 8.4 16.5 9.2 5.4 8.7 3.2 9.9 9.8 12.7 5.5 3.2 2.2 4.6 4.2 7.2 5.7 3.5 0.7 8.5 2.5 10.4 6.2 6.9 2.6 5.5 7.2 2.4 4.8 5.5 5.7 9.3 7.0 5.2 2.8 5.2 6.6 4.9 4.8 5.4 8.0 8.3 11.1 13.0 5.1 5.3 6.7 6.7 8.3 8.2 8.1 4.5 6.4 4.8 8.3 5.7 3.6 5.6 1.8 6.5 5.6 8.0 38.6 7.4 4.3 13.0 12.9 33.0 14.6 7.9 15.7 133.3 8.8 43.6 19.2 61.7 9.3 9.0 23.7 8.1 7.7 10.2 35.5 18.0 15.1 11.5 10.1 11.6 10.1 54.0 10.7 11.1 16.8 17.1 39.2 117.0 8.3 9.5 19.5 23.7 65.5 29.2 210.8 15.3 44.4 34.5 1339.3 23.9 11.0 19.0 12.2 20.5 37.9 31.7 b2732 b2733 b2735 b2736 b2738 b2739 b2741 b2742 b2743 b2744 b2745 b2746 b2747 b2748 b2749 b2754 b2755 b2756 b2759 b2760 b2761 b2762 b2763 b2764 b2766 b2767 b2768 b2769 b2771 b2773 b2775 
b2776 b2777 b2780 b2781 b2782 b2783 b2784 b2785 b2786 b2787 b2788 b2789 b2790 b2791 b2792 b2793 b2794 b2795 b2796 b2797 b2798 130 ygbA mutS ygbI ygbL ygbM rpoS nlpD pcm surE ygbO ygbB ygbP ygbE ygbF ygcK ygcB cysH cysI cysJ ygcN ygcO ygcP ygcQ ygcS ygcU yqcE ygcE ygcF pyrG mazG chpA chpR relA ygcA barA ygcX ygcY yqcB syd yqcD ygdH sdaC sdaB exo 11.9 8.4 10.3 16.0 8.5 12.2 10.1 8.7 6.4 2.2 3.1 12.8 14.2 6.4 11.2 18.0 9.8 16.8 6.6 6.2 11.7 9.2 6.8 12.5 7.5 5.6 20.7 11.6 23.4 11.7 2.0 10.9 2.7 6.8 12.2 11.0 8.2 6.6 5.5 10.6 13.3 5.8 3.4 5.8 4.1 7.5 4.5 4.7 7.9 7.7 13.2 9.1 6.6 6.5 7.2 10.2 5.8 7.9 6.5 5.6 5.3 1.2 2.5 8.5 8.6 4.6 6.9 9.1 6.1 9.4 3.9 3.5 7.5 5.7 4.8 7.7 5.1 3.1 12.5 7.0 12.6 7.7 1.1 6.7 2.2 5.8 7.4 7.1 6.4 4.6 4.5 6.0 8.4 4.6 2.6 4.1 2.6 5.3 3.0 3.8 5.6 5.8 8.2 6.2 59.6 11.6 17.9 37.8 15.9 26.0 21.9 19.6 8.2 13.3 4.2 25.6 41.3 10.2 30.0 690.5 25.1 81.9 20.6 24.7 26.4 23.4 11.6 34.0 14.4 27.3 60.3 34.8 167.2 24.3 17.4 28.6 3.7 8.4 35.0 23.9 11.3 11.4 7.1 44.9 31.6 8.0 4.8 9.9 10.5 13.1 8.5 6.2 13.2 11.7 33.6 17.1 b2799 b2800 b2802 b2803 b2804 b2805 b2806 b2809 b2810 b2811 b2813 b2817 b2818 b2819 b2820 b2821 b2822 b2824 b2825 b2826 b2827 b2829 b2830 b2833 b2834 b2836 b2837 b2838 b2839 b2840 b2841 b2843 b2845 b2847 b2859 b2863 b2865 b2869 b2875 b2876 b2877 b2887 b2888 b2889 b2890 b2891 b2892 b2893 b2894 b2895 b2896 b2897 fucO fucA fucI fucK fucU fucR ygdE ygdK mltA argA recD recB ptr recC ygdB ppdB ppdA thyA ptsP ygdP aas galR lysA lysR ygeA araE kduI yqeI ygeV ygfJ ygfT ygfU lysS prfB recJ dsbC xerD fldB ygfY 6.0 7.5 10.9 10.4 11.2 3.8 15.1 12.2 5.6 5.8 5.2 26.7 18.6 13.2 16.2 7.9 10.0 16.2 11.0 14.6 15.0 5.5 2.2 9.1 5.6 6.6 2.3 13.7 16.0 9.8 12.1 7.7 7.8 10.9 13.8 8.2 15.4 3.4 9.4 7.3 1.9 16.8 18.2 2.8 13.5 10.6 12.4 7.8 4.6 9.5 8.7 8.2 4.0 5.2 6.7 6.8 8.1 2.9 10.1 7.6 4.1 4.1 3.5 15.0 10.8 8.1 10.8 5.8 7.6 9.9 6.6 8.1 9.3 3.7 1.9 5.6 4.4 5.1 1.7 9.7 8.1 5.6 7.3 5.9 4.0 6.5 7.6 4.1 10.0 2.9 7.0 5.2 1.4 11.2 9.3 1.9 9.6 7.8 8.6 6.0 3.5 6.3 5.7 5.9 12.1 
13.5 28.8 22.1 18.3 5.4 29.6 31.2 8.6 9.5 10.7 120.5 67.6 35.9 32.9 12.1 14.5 45.4 34.0 73.6 38.5 11.3 2.8 24.1 7.7 9.4 3.4 23.4 897.2 38.6 36.0 10.9 183.2 33.5 75.2 1449.4 33.2 4.2 14.3 12.1 2.7 33.5 405.6 5.8 22.7 16.6 22.6 11.1 6.7 18.5 17.7 13.7 b2898 b2899 b2900 b2901 b2903 b2905 b2906 b2907 b2908 b2909 b2910 b2912 b2913 b2914 b2916 b2919 b2921 b2922 b2923 b2924 b2925 b2927 b2928 b2929 b2930 b2934 b2935 b2936 b2937 b2938 b2939 b2941 b2942 b2946 b2947 b2948 b2949 b2950 b2951 b2952 b2953 b2954 b2955 b2957 b2958 b2959 b2960 b2961 b2962 b2963 b2964 b2965 131 ygfZ yqfB bglA gcvP gcvT visC ubiH pepP ygfB ygfE ygfA serA rpiA iciA ygfG ygfI yggE yggA yggB fba epd yggC yggD yggF cmtB tktA yggG speB speA yqgB yqgD metK yggJ gshB yqgE yqgF yggR yggS yggT yggU yggV yggW ansB yggN yggL yggH mutY yggX mltC nupG speC 6.2 2.6 6.5 10.6 12.4 7.2 8.8 4.6 4.6 3.8 4.3 3.9 8.7 7.6 6.0 18.5 6.2 5.4 6.1 18.3 14.1 5.7 6.6 6.2 3.9 12.2 13.0 6.0 18.0 9.7 19.7 16.2 2.9 3.8 6.8 5.6 8.5 3.9 5.5 5.2 11.5 10.8 16.0 15.0 17.4 10.1 6.3 10.2 8.6 10.9 2.7 18.0 4.6 1.8 4.5 6.8 8.5 5.1 6.5 3.5 4.0 3.2 3.4 2.9 6.9 5.8 4.2 12.3 3.8 4.1 4.2 12.3 10.6 4.7 4.8 4.6 2.0 8.4 8.7 4.4 11.4 6.8 10.1 9.7 2.1 3.0 5.6 4.5 6.5 2.4 4.3 4.1 8.4 7.2 9.8 10.1 11.0 7.3 4.5 7.4 6.8 7.5 2.1 10.6 9.6 4.5 11.6 24.7 22.4 12.0 13.6 6.7 5.6 4.8 5.6 6.1 11.8 10.8 10.5 37.6 16.2 7.8 11.5 36.0 21.2 7.3 10.6 9.8 124.9 22.3 25.8 9.6 42.7 16.8 386.8 48.5 4.6 5.0 8.7 7.6 12.3 11.9 7.5 7.2 18.5 21.5 42.8 29.6 40.8 16.3 10.5 16.7 11.8 19.8 3.6 59.4 b2968 b2974 b2977 b2978 b2981 b2984 b2988 b2989 b2992 b2993 b2994 b2995 b2996 b2997 b3000 b3001 b3002 b3003 b3005 b3007 b3008 b3009 b3011 b3012 b3015 b3016 b3017 b3018 b3020 b3021 b3022 b3023 b3024 b3025 b3026 b3028 b3029 b3031 b3032 b3033 b3034 b3035 b3036 b3037 b3038 b3039 b3040 b3041 b3042 b3049 b3051 b3052 yghD glcG glcF yghR gsp hybE hybD hybC hybB hybA yqhA yghA exbD metC yghB yqhD yqhE ygiR sufI plsC ygiW ygiX ygiY mdaB ygiN yqiA icc yqiB yqiE tolC ygiA ygiB ygiC ygiD ygiE ribB 
glgS - 8.3 13.8 15.5 8.8 10.7 10.4 9.2 8.9 16.8 12.8 12.2 9.1 7.7 4.4 10.0 5.4 6.1 11.2 1.5 8.1 16.3 8.6 7.8 8.5 15.2 14.5 12.2 4.9 7.6 2.6 3.2 9.8 14.0 8.0 13.3 5.0 6.3 6.5 5.3 5.4 7.0 7.1 9.0 5.2 5.7 17.7 3.1 5.6 4.8 10.0 17.3 11.7 4.9 7.5 8.7 6.1 5.3 6.6 6.7 6.7 8.8 7.4 6.5 6.1 6.2 3.8 6.4 4.4 3.3 6.6 0.8 4.3 9.7 6.5 4.9 5.6 8.5 9.1 6.1 3.6 6.2 1.8 2.4 7.1 9.8 5.3 7.1 3.4 4.6 5.2 4.9 4.2 5.3 6.2 6.2 4.4 3.4 10.8 2.3 4.1 3.8 6.3 9.7 7.3 28.1 82.1 70.5 16.0 3675.0 23.9 14.8 13.0 160.6 45.6 102.3 17.9 10.1 5.2 22.9 7.0 35.3 37.6 5.6 78.7 50.6 12.9 19.2 17.9 72.2 36.1 1038.8 7.6 9.8 4.6 4.8 16.0 24.4 17.0 113.0 9.2 9.9 8.7 5.7 7.6 10.4 8.3 16.3 6.3 17.6 48.8 4.4 8.6 6.5 23.4 81.9 29.2 b3053 b3054 b3055 b3056 b3057 b3058 b3064 b3065 b3066 b3067 b3068 b3070 b3071 b3072 b3074 b3075 b3080 b3082 b3083 b3085 b3086 b3087 b3091 b3092 b3093 b3094 b3095 b3096 b3097 b3099 b3100 b3101 b3102 b3103 b3107 b3108 b3109 b3110 b3115 b3116 b3117 b3122 b3124 b3127 b3128 b3129 b3130 b3131 b3133 b3135 b3136 b3139 132 glnE ygiF ygiM cca bacA ygiG ygjD rpsU dnaG rpoD ygjF yqjH yqjI aer ygjH ebgR ygjK ygjM ygjN ygjP ygjQ ygjR uxaA uxaC exuT exuR yqjA yqjB yqjC yqjE yqjF yqjG yhaH yhaL yhaM yhaN yhaO tdcD tdcC tdcB yhaD yhaU yhaG sohA yhaV agaR agaV agaA agaS agaC 11.6 3.3 15.8 13.5 7.2 4.9 5.6 7.5 7.7 20.1 3.2 9.1 9.3 10.8 9.2 13.9 11.9 9.1 8.9 9.7 15.0 5.2 4.7 7.7 3.1 4.4 6.3 7.5 11.5 7.0 14.9 11.7 9.9 7.0 9.5 9.6 6.2 10.0 12.0 9.2 8.2 12.8 12.3 3.1 4.3 3.0 4.9 3.3 4.4 9.6 10.2 4.2 7.8 2.3 9.2 9.2 5.1 3.6 3.7 4.6 5.5 10.5 2.4 5.7 6.5 7.0 6.1 8.7 7.3 5.5 5.9 5.9 10.1 3.8 2.5 5.4 2.2 3.3 4.7 6.0 6.4 5.2 11.1 8.5 7.8 5.3 6.9 7.0 4.3 7.1 7.1 6.2 5.6 10.5 6.8 2.0 2.2 2.3 4.0 2.4 2.3 6.4 6.2 2.8 23.1 5.7 55.8 25.4 12.4 7.6 11.9 20.6 12.8 246.8 4.8 21.6 16.5 23.2 19.4 35.1 32.5 24.6 17.9 27.3 29.2 8.2 59.7 13.3 5.2 6.3 9.5 9.9 55.2 10.9 22.6 18.7 13.5 10.1 15.1 15.3 10.7 17.0 38.1 18.5 15.1 16.5 64.0 7.2 645.5 4.4 6.4 5.4 33.4 19.0 28.1 8.8 b3141 b3142 b3148 b3149 b3150 b3151 b3152 b3153 b3154 
b3155 b3156 b3157 b3160 b3162 b3163 b3164 b3165 b3166 b3167 b3168 b3169 b3170 b3172 b3173 b3175 b3176 b3177 b3178 b3179 b3180 b3181 b3182 b3184 b3185 b3186 b3187 b3188 b3189 b3190 b3191 b3192 b3193 b3194 b3195 b3197 b3198 b3199 b3200 b3201 b3202 b3203 b3204 agaI yraH yraN yraO yraP yraQ yraR yhbO yhbP yhbQ yhbS yhbT yhbW deaD yhbM pnp rpsO truB rbfA infB nusA yhbC argG yhbX secG mrsA folP hflB ftsJ yhbY greA dacB yhbE rpmA rplU ispB nlp murA yrbA yrbB yrbC yrbD yrbE yrbF yrbH yrbI yrbK yhbN yhbG rpoN yhbH ptsN 5.6 11.3 6.5 8.3 7.2 15.3 16.2 9.2 12.0 15.3 8.7 4.9 6.0 20.9 5.0 8.9 9.8 11.1 8.5 11.9 8.5 4.1 9.1 2.0 4.9 6.5 2.7 11.3 7.4 8.4 13.4 7.3 7.3 7.4 9.9 5.6 11.0 13.7 8.8 9.8 7.2 8.1 12.9 12.9 4.5 9.6 10.0 9.6 11.6 11.8 10.8 11.8 3.6 7.4 4.4 6.0 5.3 9.8 10.2 6.2 7.8 7.8 6.0 3.8 3.4 11.3 3.4 5.7 6.4 6.8 6.2 8.7 6.5 3.3 6.7 1.0 3.6 5.4 1.9 7.7 5.8 5.5 7.5 4.7 4.7 4.3 6.2 4.6 6.7 9.6 5.9 7.5 5.7 6.3 9.6 8.4 2.3 5.8 5.3 4.9 6.5 6.9 6.4 6.8 11.9 24.3 12.8 13.3 11.2 34.8 39.3 17.7 25.6 416.4 15.5 6.9 24.6 138.2 9.5 20.0 21.5 29.5 13.2 19.1 12.3 5.3 14.0 29.0 7.7 8.3 4.6 21.0 10.3 18.6 64.8 16.8 16.3 25.7 24.4 7.3 29.2 23.5 17.3 14.1 9.9 11.7 19.8 27.1 99.6 27.8 87.1 255.5 55.9 42.3 34.1 42.3 b3205 b3206 b3207 b3208 b3209 b3210 b3212 b3213 b3216 b3217 b3220 b3221 b3222 b3223 b3225 b3226 b3228 b3229 b3230 b3231 b3232 b3233 b3234 b3235 b3237 b3239 b3241 b3243 b3244 b3245 b3246 b3247 b3248 b3249 b3250 b3251 b3252 b3253 b3255 b3256 b3257 b3259 b3260 b3261 b3263 b3267 b3268 b3279 b3281 b3282 b3283 b3284 133 yhbJ ptsO yrbL mtgA yhbL arcB gltB gltD yhcD yhcE yhcG yhcH yhcI yhcJ nanA yhcK sspB sspA rpsI rplM yhcM yhcB degQ degS argR yhcO yhcQ yhcS tldD yhdP yhdR cafA yhdE mreD mreC mreB yhdA yhdH accB accC yhdT prmA yhdG fis yhdU yhdV yhdW yrdA aroE yrdC yrdD smg 11.3 20.3 3.8 10.1 8.9 6.9 8.3 14.2 10.0 17.1 20.2 6.8 3.6 4.9 4.4 4.7 7.6 7.4 9.6 9.7 8.4 8.1 13.5 17.1 4.3 7.6 7.1 8.7 7.7 14.7 10.6 10.7 7.3 6.4 15.3 11.6 14.7 4.7 8.6 12.9 9.3 14.1 12.4 13.9 12.1 19.0 16.1 7.3 
14.5 13.5 7.6 5.6 7.5 10.6 2.9 6.5 6.5 5.2 5.7 9.7 6.5 10.8 11.6 3.9 2.7 3.5 3.4 3.9 5.3 5.8 7.1 6.7 5.2 5.2 8.2 8.8 2.6 4.4 4.2 5.7 5.9 9.7 6.8 6.7 5.4 4.7 8.8 7.4 9.5 3.6 6.9 9.3 5.3 9.8 8.0 8.2 6.8 10.7 8.1 4.1 8.3 7.4 5.3 4.2 22.7 225.4 5.3 22.0 14.3 10.1 15.5 26.2 22.3 40.4 79.2 24.1 5.4 8.5 6.3 5.8 13.5 10.2 14.6 17.4 21.7 18.5 37.1 289.5 12.5 28.6 22.7 18.0 11.2 30.0 23.9 26.9 11.3 9.9 56.4 26.5 32.8 6.8 11.3 21.1 36.9 25.1 27.7 45.3 54.7 85.8 683.1 32.5 60.2 75.4 13.6 8.6
b3285  smf_2   2.7   2.1    3.8
b3286  smf_1   8.2   5.5   16.3
b3287  def     3.1   2.5    4.0
b3288  fmt     5.9   4.8    7.8
b3289  sun     6.8   5.7    8.6
b3290  trkA    9.3   6.5   16.3
b3291  mscL    7.2   5.3   11.5
b3292  yhdM    5.4   4.2    7.6
b3293  yhdN    6.4   4.8    9.5
b3294  rplQ   13.3   9.1   25.0
b3295  rpoA   14.9   9.7   32.1
b3296  rpsD   14.1   8.5   42.4
b3297  rpsK   13.6   8.3   38.0
b3298  rpsM   13.3   8.7   28.5
b3300  prlA   12.1   7.3   36.2
b3301  rplO   14.8   9.2   36.8
b3302  rpmD   12.1   9.5   16.6
b3303  rpsE   11.5   8.8   16.4
b3304  rplR   12.1   8.7   19.8
b3305  rplF   13.7   8.1   43.6
b3306  rpsH   11.3   8.0   18.9
b3307  rpsN   12.0   8.1   23.3
b3308  rplE   12.1   7.4   33.0
b3309  rplX   14.3   8.3   53.7
b3317  rplB   15.4   7.9  291.6
b3318  rplW   11.4   6.1   81.1
b3319  rplD   12.2   6.9   54.5
b3320  rplC    9.3   6.5   16.2
b3321  rpsJ    9.7   7.6   13.4
b3322  pinO   10.5   6.3   30.2
b3323  yheD   20.5  11.8   77.0
b3325  yheF   10.5   5.7   57.8
b3326  yheG   14.3   8.3   51.6
b3327  hofF   10.9   6.3   40.5
b3329  hofH   18.3   9.8  142.9
b3330  yheH    7.3   4.8   14.8
b3335  hofD    6.4   4.0   16.7
b3336  bfr    12.0   7.9   25.4
b3337  yheA   10.9   5.6  194.8
b3340  fusA   16.8  10.8   37.5
b3341  rpsG   13.2   9.2   23.4
b3342  rpsL   13.5   8.6   31.5
b3343  yheL   10.0   7.8   13.7
b3344  yheM   12.2   9.2   18.1
b3345  yheN    6.8   4.5   13.9
b3346  yheO    9.4   6.7   15.4
b3348  slyX   15.1  10.0   30.8
b3350  kefB    7.1   3.8   64.3
b3354  yheU    3.9   2.3   11.5
b3355  prkB    5.1   3.1   14.7
b3356  yhfA    4.6   3.2    7.7
b3357  crp     6.2   5.0    8.2
b3358 b3359 b3360 b3361 b3362 b3363 b3366 b3368 b3372 b3373 b3374 b3375 b3377 b3378 b3380 b3382 b3384 b3390 b3392 b3395 b3396 b3397 b3398 b3399 b3402 b3403 b3404 b3405 b3406 b3407 b3408 b3411 b3412 b3413 b3414 b3415 b3416 b3417 b3418 b3419
b3420 b3423 b3424 b3425 b3426 b3428 b3429 b3430 b3431 b3432 b3433 b3434 134 yhfK argD pabA fic yhfG ppiA nirD cysG yhfO yhfP yhfQ yhfR yhfT yhfU yhfW yhfY trpS aroK yrfA yrfD mrcA yrfE yrfF yrfG yhgE pckA envZ ompR greB yhgF feoA yhgA bioH yhgH yhgI gntT malQ malP malT yhgJ rtcA glpR glpG glpE glpD glgP glgA glgC glgX glgB asd yhgN 17.2 5.9 6.8 2.9 2.1 6.0 8.9 12.6 15.6 10.1 9.6 2.7 11.8 8.4 16.3 9.0 19.1 10.1 16.4 9.0 7.3 7.2 7.0 4.9 4.0 14.1 5.9 6.1 3.9 7.8 2.4 12.7 5.0 5.4 7.7 8.1 10.2 11.1 6.1 22.2 14.4 14.5 6.5 6.5 9.4 17.0 9.3 7.8 6.1 6.3 6.8 13.1 11.4 4.2 4.5 2.1 1.5 4.4 4.8 8.3 8.1 7.0 5.8 2.0 7.8 5.3 9.9 4.7 10.2 5.7 9.1 5.4 4.4 4.9 3.8 3.6 2.7 10.3 4.0 5.0 2.7 5.0 1.4 8.3 3.8 4.3 6.2 4.6 6.7 7.2 4.1 11.7 7.7 7.8 4.6 3.7 6.0 9.9 7.0 5.9 4.6 5.3 5.5 8.9 34.8 9.8 13.6 4.8 3.4 9.3 56.9 26.5 215.5 17.8 28.6 4.2 24.7 20.4 46.3 94.4 142.2 42.9 83.1 27.0 20.5 13.4 42.1 7.5 7.3 22.2 12.0 7.8 7.0 17.3 10.4 27.4 7.4 7.3 10.1 35.5 21.1 25.0 12.4 214.9 109.5 98.7 11.1 27.3 21.0 58.7 13.9 11.5 9.2 7.9 9.1 24.7 b3435 b3438 b3439 b3440 b3443 b3444 b3447 b3448 b3450 b3452 b3457 b3459 b3460 b3461 b3463 b3464 b3465 b3466 b3467 b3468 b3469 b3471 b3472 b3473 b3474 b3475 b3476 b3477 b3478 b3479 b3481 b3483 b3487 b3488 b3493 b3494 b3496 b3497 b3498 b3499 b3500 b3501 b3502 b3503 b3506 b3507 b3508 b3509 b3510 b3511 b3512 b3513 gntU_2 gntR yhhW yhhX yrhA insA_6 ggt yhhA ugpC ugpA livH yhhK livJ rpoH ftsE ftsY yhhF yhhL yhhM yhhN zntA yhhQ yhhS yhhT yhhU nikA nikB nikC nikD yhhG yhhH yhiI yhiJ pitA yhiO yhiP yhiQ prlC yhiR gor arsR arsB arsC slp yhiF yhiD hdeB hdeA hdeD yhiE yhiU 18.8 8.5 7.8 10.8 10.3 10.2 9.6 2.8 10.0 11.1 13.3 13.5 8.8 6.8 8.8 4.9 4.7 6.2 8.5 2.5 8.5 11.9 5.5 7.1 3.7 3.8 18.1 9.7 14.7 10.6 11.7 10.0 7.4 8.1 13.9 1.4 6.2 12.8 16.4 7.8 11.1 6.6 8.5 8.6 5.6 6.9 9.8 8.3 8.9 3.0 2.0 12.0 10.2 6.4 5.7 7.1 5.7 7.0 5.7 2.0 7.4 6.4 7.4 9.7 5.2 4.8 6.6 4.2 3.6 3.9 5.9 1.5 6.5 6.5 4.2 4.6 2.8 2.3 10.1 7.0 9.3 8.3 8.2 5.3 5.2 4.5 10.2 1.2 5.2 9.0 11.8 5.1 8.5 3.6 5.0 5.4 
3.8 4.0 7.1 6.3 6.3 2.5 1.5 7.8 120.0 12.6 12.6 22.0 53.8 18.8 30.0 4.4 15.6 41.1 64.3 22.0 30.8 11.8 13.2 6.0 6.6 14.7 15.1 7.1 12.3 68.6 8.1 15.7 5.6 11.3 87.0 15.8 34.8 14.8 20.0 82.9 12.3 46.8 21.5 1.6 7.6 22.0 26.8 15.9 15.9 40.3 26.7 20.9 10.5 25.1 15.9 12.4 14.9 3.7 3.1 26.1 b3514 b3515 b3516 b3518 b3519 b3520 b3521 b3522 b3523 b3524 b3526 b3527 b3528 b3529 b3533 b3534 b3535 b3536 b3537 b3538 b3540 b3541 b3542 b3543 b3544 b3549 b3554 b3555 b3556 b3559 b3560 b3562 b3565 b3566 b3567 b3569 b3570 b3571 b3581 b3582 b3588 b3589 b3590 b3591 b3592 b3597 b3598 b3599 b3600 b3601 b3602 b3603 135 yhiV yhiW yhiX yhjA treF yhjB yhjC yhjD yhjE yhjG kdgK yhjJ dctA yhjK yhjO yhjQ yhjR yhjS yhjT yhjU dppF dppD dppC dppB dppA tag yiaF yiaG cspA glyS glyQ yiaA xylA xylF xylG xylR bax malS sgbH sgbU aldB yiaY selB selA yibF yibH yibI mtlA mtlD mtlR yibL lldP 10.0 2.0 1.8 11.5 10.2 13.6 9.1 3.8 17.9 6.0 6.4 9.0 4.1 10.8 11.6 14.2 5.5 5.9 5.5 6.6 16.8 9.2 9.6 7.3 5.0 12.4 6.6 5.6 1.4 11.5 7.1 18.7 2.3 5.9 6.9 6.5 8.8 6.6 20.0 17.6 13.8 24.2 13.8 8.8 3.4 4.2 3.2 3.7 4.5 6.5 2.7 10.6 5.3 1.6 1.4 6.0 5.5 7.8 5.0 2.7 10.4 4.1 4.3 5.9 3.4 7.9 7.3 7.6 4.5 5.0 4.5 3.8 9.1 5.8 4.8 5.2 3.4 8.2 4.7 3.4 1.0 8.4 5.5 9.8 1.7 4.4 4.9 4.1 6.2 5.2 11.4 8.9 7.3 12.2 7.1 5.2 2.3 2.9 2.0 2.8 3.5 5.0 1.7 7.2 87.4 2.5 2.7 121.1 77.5 53.0 46.1 6.6 66.9 11.6 13.1 18.8 5.3 16.8 28.7 117.3 7.3 7.2 6.9 24.5 113.9 22.2 407.3 12.6 9.4 25.1 11.5 15.7 2.4 18.2 10.3 213.2 3.6 9.0 11.7 16.3 15.1 9.1 81.7 930.3 142.6 4073.7 233.0 28.4 6.5 7.5 9.0 5.3 6.5 9.4 6.2 20.1 b3605 b3606 b3607 b3608 b3609 b3610 b3611 b3612 b3613 b3615 b3617 b3630 b3631 b3632 b3633 b3634 b3635 b3636 b3637 b3638 b3639 b3640 b3641 b3642 b3644 b3645 b3646 b3647 b3648 b3649 b3650 b3651 b3652 b3653 b3660 b3661 b3667 b3669 b3670 b3672 b3674 b3675 b3676 b3677 b3681 b3683 b3685 b3686 b3687 b3688 b3699 b3701 lldD yibK cysE gpsA secB grxC yibN yibO yibP yibD kbl rfaP rfaG rfaQ kdtA kdtB mutM rpmG rpmB radC dfp dut ttk pyrE yicC dinD yicG yicF gmk 
rpoZ spoT spoU recG gltS yicL nlpA uhpC uhpA ilvN ivbL yidF yidG yidH yidI glvG glvC yidE ibpB ibpA yidQ gyrB dnaN 1.2 19.3 9.9 4.3 6.2 6.6 7.7 4.7 10.4 17.5 17.6 11.7 11.1 9.2 10.2 7.7 21.1 10.9 10.3 4.4 7.0 9.1 11.3 11.6 12.1 4.9 11.2 12.6 5.3 6.4 9.8 10.0 27.8 3.5 11.2 6.3 4.2 14.2 6.3 3.3 1.5 6.2 6.1 8.7 10.2 7.9 1.3 21.9 11.7 5.0 15.8 7.6 0.8 1.9 11.8 53.5 8.2 12.5 3.6 5.5 5.4 7.3 5.3 8.6 5.9 11.1 3.6 6.8 7.3 18.5 8.8 681.8 9.8 87.9 7.7 24.8 7.5 21.3 7.1 13.2 7.6 15.6 5.1 15.3 10.8 499.9 6.8 26.2 7.1 19.0 3.3 6.4 4.9 12.3 5.7 23.0 7.7 20.6 6.8 39.1 8.1 23.6 3.6 8.0 6.5 37.9 8.6 23.7 4.4 6.8 5.3 8.1 8.0 12.4 6.7 19.8 15.2 169.0 1.9 20.8 7.0 28.7 3.6 27.5 2.2 33.1 7.1 10554.4 3.7 20.5 2.7 4.2 1.0 3.1 3.2 215.3 3.9 14.0 5.9 16.7 6.1 31.9 4.5 32.8 1.0 2.0 11.7 174.3 6.4 71.3 3.9 6.8 8.7 91.7 5.3 13.5 b3702 b3703 b3704 b3706 b3709 b3712 b3713 b3715 b3717 b3718 b3724 b3725 b3727 b3736 b3737 b3738 b3739 b3741 b3742 b3744 b3745 b3746 b3748 b3749 b3750 b3751 b3752 b3753 b3754 b3755 b3762 b3763 b3764 b3766 b3777 b3778 b3779 b3780 b3781 b3782 b3783 b3784 b3785 b3786 b3787 b3788 b3789 b3790 b3791 b3792 b3793 b3794 136 dnaA rpmH rnpA thdF tnaB yieE yieF yieH yieJ yieK phoU pstB pstC atpF atpE atpB atpI gidA mioC asnA yieM yieN rbsD rbsA rbsC rbsB rbsK rbsR yieO yieP yifA pssR yifE ilvL yifN rep gppA rhlB trxA rhoL rho rfe wzzE wecB wecC rffG rffH wecD wecE wzxE wecF wecG 9.4 7.6 10.9 8.6 11.4 9.5 11.6 9.0 11.7 10.1 7.6 10.5 13.6 12.8 12.7 8.7 4.8 11.0 9.2 10.1 5.2 6.1 3.8 2.4 8.2 14.5 6.3 8.8 5.7 5.1 6.5 3.9 11.1 9.2 7.4 11.7 19.3 4.9 5.5 1.9 6.5 8.0 10.9 10.1 10.6 13.2 13.6 12.4 15.7 11.9 10.1 12.3 7.2 5.5 7.4 5.9 7.7 7.2 8.7 6.0 7.5 6.2 5.6 7.7 8.0 7.2 8.3 5.9 3.6 7.4 6.8 6.4 3.7 4.2 2.8 2.0 6.9 7.6 5.0 6.2 3.8 4.2 5.4 3.1 7.4 6.6 4.0 6.3 12.0 4.1 4.6 1.6 5.3 6.1 9.0 6.0 7.3 7.8 7.6 7.7 10.2 7.0 7.2 8.0 13.6 12.1 20.3 16.2 21.5 14.1 17.6 17.3 26.5 26.0 11.9 16.2 47.2 57.5 27.1 16.8 6.9 21.3 14.2 23.7 8.7 11.0 5.8 3.2 10.3 143.8 8.5 15.0 11.4 6.3 8.4 5.3 22.8 15.1 52.6 
78.2 50.0 6.1 6.9 2.4 8.7 11.9 14.0 31.2 19.4 40.7 67.7 32.5 34.0 40.1 16.9 26.6 b3795 b3800 b3801 b3802 b3803 b3804 b3805 b3806 b3807 b3809 b3820 b3821 b3822 b3823 b3825 b3826 b3827 b3830 b3831 b3832 b3833 b3834 b3835 b3836 b3837 b3838 b3839 b3840 b3842 b3843 b3844 b3845 b3859 b3860 b3861 b3863 b3865 b3866 b3867 b3869 b3870 b3871 b3872 b3876 b3878 b3881 b3883 b3884 b3885 b3886 b3898 yifK aslB aslA hemY hemX hemD hemC cyaA cyaY dapF yigI pldA recQ yigJ pldB yigL yigM ysgA udp yigN ubiE yigP yigR yigU yigW_ 1 rfaH yigC ubiB fadA yihE dsbA yihF polA yihA yihI hemN glnL glnA yihK yihL yihO yihQ yihT yihV yihW yihX rbn frvX 10.0 13.4 15.1 5.9 6.8 5.0 4.1 11.1 13.5 14.1 16.3 7.8 9.7 4.8 7.4 10.0 12.0 11.3 9.4 4.9 4.3 6.1 6.8 6.6 8.2 8.5 10.5 12.2 7.1 8.6 8.4 4.6 5.7 4.2 3.3 6.5 10.0 7.2 8.3 4.4 5.4 3.0 5.4 7.2 6.7 7.0 7.1 3.8 3.7 5.3 5.8 5.5 6.6 6.8 7.7 7.9 17.0 30.8 78.1 8.4 8.4 6.1 5.6 39.1 20.7 275.2 534.6 36.7 47.7 12.8 12.1 16.4 57.8 29.4 13.9 6.8 5.2 7.1 8.4 8.1 10.6 11.1 16.7 26.8 4.7 6.1 9.3 7.9 11.2 16.6 11.9 5.5 6.3 3.1 8.8 8.1 10.8 11.8 6.0 14.8 6.0 10.6 27.5 5.3 6.2 6.9 8.5 2.5 4.3 5.7 4.2 6.2 9.8 7.9 4.3 4.9 2.3 6.8 6.2 6.6 6.5 4.4 7.6 3.9 7.1 14.9 3.7 4.8 3.9 4.8 46.3 10.1 25.9 65.6 56.0 53.3 24.8 7.9 8.9 5.0 12.6 11.6 30.3 67.7 9.7 299.9 13.5 21.1 176.8 9.5 8.6 31.6 36.9 b3899 b3900 b3902 b3903 b3904 b3905 b3906 b3908 b3909 b3910 b3911 b3912 b3913 b3914 b3915 b3916 b3917 b3918 b3919 b3920 b3921 b3922 b3933 b3934 b3935 b3936 b3937 b3938 b3939 b3942 b3943 b3945 b3946 b3947 b3949 b3950 b3952 b3954 b3955 b3956 b3957 b3958 b3974 b3981 b3982 b3983 b3984 b3985 b3986 b3991 b3993 b3995 137 frvB frvA rhaD rhaA rhaB rhaS rhaR sodA kdgT yiiM cpxA cpxR yiiP pfkA sbp cdh tpiA yiiQ yiiR yiiS ftsN cytR priA rpmE yiiX metJ metB katG yijE gldA talC ptsA frwC frwB pflC yijO yijP ppc argE argC coaA secE nusG rplK rplA rplJ rplL thiG thiE yjaE 9.9 2.3 8.2 7.4 7.7 11.2 9.9 8.0 16.2 5.5 4.8 4.0 2.0 1.8 5.5 9.5 8.9 11.5 12.6 5.2 4.9 13.3 7.3 5.8 7.3 9.8 16.4 4.1 7.4 6.7 5.0 17.0 
8.5 8.8 12.0 9.3 14.9 19.8 19.1 7.8 8.2 12.1 9.7 9.6 9.7 10.2 10.6 11.1 16.9 25.8 5.8 3.2 6.1 1.2 5.6 5.2 4.7 6.1 6.9 6.1 9.4 4.2 3.4 3.3 1.4 1.3 4.4 7.0 6.0 7.9 10.2 3.3 3.4 7.7 5.4 4.4 4.0 6.9 10.0 3.1 4.8 3.9 2.8 11.3 6.1 5.4 8.4 4.9 8.4 12.3 12.3 5.9 5.7 6.4 6.2 6.8 6.7 7.0 7.2 7.2 9.3 13.4 3.7 2.3 25.9 21.5 15.7 12.6 21.1 77.7 17.6 11.6 58.1 8.1 8.7 5.0 3.1 3.2 7.2 15.1 17.5 21.6 16.3 12.6 9.3 49.1 11.2 8.5 42.2 16.4 45.3 5.8 16.1 24.9 23.0 34.4 14.1 23.7 20.9 79.8 63.2 50.5 42.8 11.3 14.4 113.5 21.7 16.5 17.3 18.3 20.3 24.6 96.6 316.3 13.9 5.2 b3996 b3997 b3999 b4000 b4001 b4003 b4005 b4019 b4020 b4021 b4022 b4023 b4024 b4025 b4027 b4030 b4031 b4032 b4033 b4037 b4039 b4040 b4041 b4042 b4043 b4054 b4055 b4056 b4057 b4058 b4059 b4062 b4064 b4065 b4067 b4069 b4070 b4072 b4073 b4075 b4076 b4077 b4079 b4090 b4093 b4094 b4098 b4104 b4105 b4107 b4108 b4111 yjaD hemE yjaG hupA yjaH hydH purD metH yjbB pepE yjbC yjbD lysC pgi yjbF yjbA xylE malG malF malM ubiC ubiA plsB dgkA lexA tyrB aphA yjbQ yjbR uvrA ssb soxS yjcD yjcE yjcG acs nrfA nrfC nrfD nrfF nrfG gltP fdhF rpiB phnO phnN phnJ phnE phnD phnB phnA proP 5.7 8.9 6.7 20.4 10.0 11.9 4.7 11.4 7.5 8.5 3.4 10.5 6.8 7.2 8.1 8.2 12.4 9.6 7.7 10.9 4.7 4.2 9.4 6.2 9.9 15.2 4.2 3.8 6.3 14.7 6.2 1.4 7.1 5.3 22.2 2.6 12.0 3.3 10.5 17.3 34.5 9.0 9.7 13.1 9.4 8.5 23.7 10.6 11.3 9.1 8.5 5.6 3.8 7.0 5.5 10.2 6.9 7.5 2.5 7.2 5.2 6.1 2.3 7.9 3.7 5.6 4.7 4.8 8.1 6.6 5.4 6.1 3.4 3.5 6.3 4.2 6.4 8.5 3.0 2.9 4.3 8.8 4.6 1.0 4.5 4.1 11.4 2.0 7.0 1.7 7.4 10.6 17.8 5.0 6.2 7.5 6.9 5.5 12.3 7.0 6.9 5.9 5.9 4.2 10.9 12.0 8.6 5454.0 18.7 29.2 28.5 26.6 13.7 14.2 6.5 15.5 40.0 9.9 29.0 28.4 26.4 17.2 13.2 53.8 7.5 5.4 18.4 11.2 22.4 75.2 6.7 5.7 12.3 44.8 9.7 2.2 17.5 7.7 459.1 3.7 43.4 75.1 18.1 47.4 543.9 41.3 22.9 49.2 14.9 18.3 300.0 22.1 31.0 20.0 14.9 8.2 b4112 b4113 b4114 b4116 b4126 b4127 b4129 b4130 b4131 b4132 b4135 b4136 b4137 b4138 b4139 b4140 b4141 b4142 b4143 b4144 b4146 b4147 b4148 b4149 b4150 b4151 b4152 b4153 b4166 b4167 
b4168 b4169 b4170 b4171 b4172 b4173 b4174 b4175 b4177 b4178 b4179 b4181 b4183 b4184 b4188 b4189 b4191 b4193 b4199 b4203 b4206 b4207 138 basS basR yjdB adiY yjdI yjdJ lysU yjdL cadA cadB yjdC dsbD cutA dcuA aspA yjeH mopB mopA yjeI yjeK efp sugE blc ampC frdD frdC frdB yjeS yjeF yjeE amiB mutL miaA hfq hflX hflK hflC purA yjeB vacB yjfI yjfK yjfL yjfN yjfO yjfQ sgaT yjfY rplI ytfB fklB 5.8 1.8 9.2 13.2 10.4 10.8 11.4 7.0 11.3 4.6 6.3 8.1 9.3 9.0 6.5 10.4 9.8 11.8 5.5 6.4 10.5 10.3 11.8 2.9 13.2 7.6 9.9 9.2 10.0 5.8 4.5 5.9 5.2 12.2 6.6 7.6 12.2 12.3 7.1 4.3 8.1 14.9 7.5 8.2 0.8 1.4 12.0 8.8 16.0 12.9 5.5 10.0 4.4 1.1 6.1 6.7 5.9 6.7 7.4 4.3 6.8 3.0 4.6 5.0 7.2 7.0 4.5 6.8 6.8 7.9 3.8 5.6 7.4 6.5 7.8 2.3 7.6 5.9 7.3 5.9 5.1 4.3 2.9 4.1 4.3 6.4 5.1 6.0 8.2 8.8 5.3 3.6 6.6 8.7 4.6 5.8 0.7 0.8 6.4 4.8 8.5 6.9 4.3 7.8 8.7 5.0 19.0 401.2 45.7 28.3 25.2 18.5 32.9 10.0 10.1 20.8 13.1 12.5 11.9 22.5 17.7 22.9 9.8 7.6 17.9 24.3 23.7 4.0 53.2 10.7 15.3 21.0 381.5 8.8 10.1 10.1 6.6 111.3 9.2 10.3 24.4 20.5 10.5 5.4 10.6 49.5 21.2 13.8 1.0 6.5 84.3 56.5 147.9 94.0 7.7 14.1 b4208 b4209 b4210 b4211 b4213 b4214 b4215 b4216 b4217 b4218 b4219 b4220 b4221 b4222 b4224 b4225 b4226 b4227 b4239 b4242 b4243 b4244 b4245 b4247 b4248 b4250 b4252 b4255 b4256 b4258 b4259 b4260 b4261 b4263 b4279 b4280 b4281 b4288 b4289 b4291 b4294 b4295 b4296 b4297 b4298 b4302 b4304 b4305 b4322 b4323 b4327 b4329 cycA ytfE ytfF ytfG cpdB cysQ ytfI ytfJ ytfK ytfL msrA ytfM ytfN ytfP chpS chpB ppa ytfQ treC mgtA yjgF pyrI pyrB yjgG yjgH yjgK yjgD valS holC pepA yjgP yjgR yjhB yjhC yjhD fecD fecC fecA insA_7 yjhU yjhF yjhG yjhH sgcA sgcC sgcX uxuA uxuB yjiE yjiG 10.9 10.6 10.1 9.1 10.6 8.2 6.4 7.6 3.8 4.9 3.4 11.2 6.4 8.7 6.6 3.6 13.7 6.6 6.5 9.6 9.8 10.5 10.8 8.9 7.5 10.1 14.1 7.8 8.9 10.7 8.3 5.1 8.4 5.5 2.7 8.4 15.2 15.0 12.8 8.8 13.9 4.2 23.0 8.4 10.6 6.5 17.1 9.7 5.1 8.0 5.2 6.9 7.6 6.8 7.2 6.0 7.8 5.7 3.3 6.3 2.7 2.7 2.7 8.3 4.1 5.7 4.0 2.7 7.4 4.4 4.2 6.7 7.6 7.0 6.9 4.5 5.0 5.8 8.9 6.4 6.2 7.3 5.7 4.3 6.1 
3.2 1.9 5.6 9.0 8.7 7.4 6.3 9.6 3.3 14.4 4.6 7.1 4.0 9.2 4.8 4.1 6.4 2.7 4.5 19.4 24.4 16.9 18.4 16.9 14.3 109.0 9.7 6.7 30.9 4.7 17.0 13.9 18.0 19.4 5.6 93.9 13.3 14.2 16.6 13.9 21.2 25.3 881.4 14.8 40.6 34.7 10.1 15.3 20.3 15.8 6.3 13.3 18.2 4.7 17.4 50.6 53.8 50.1 14.5 25.4 5.7 58.1 46.5 20.9 16.3 113.4 1513.1 6.9 10.7 103.6 14.1
b4330  yjiH   13.7   7.4   93.4
b4331  yjiI    7.2   5.3   11.2
b4332  yjiJ   11.9   7.3   30.8
b4334  yjiL    8.9   6.4   14.9
b4335  yjiM    3.6   2.8    4.9
b4336  yjiN    2.6   1.9    3.8
b4337  yjiO   13.3   8.8   26.7
b4339  yjiQ   12.9   8.0   33.2
b4341  yjiS   13.6   7.4   84.4
b4350  hsdR    6.6   3.3  746.1
b4351  mrr     2.1   1.2    9.1
b4352  yjiA    4.1   3.2    5.5
b4353  yjiX    4.0   3.4    4.9
b4354  yjiY    7.6   5.5   12.6
b4356  yjiZ    9.3   6.3   18.2
b4357  yjjM    2.1   1.3    5.7
b4358  yjjN   12.0   7.1   40.3
b4359  mdoB   11.7   8.4   19.2
b4360  yjjA    8.0   5.6   14.1
b4361  dnaC    8.5   6.0   14.6
b4362  dnaT    9.5   6.8   16.0
b4364  yjjP   15.0   9.8   32.6
b4373  rimI    6.7   4.6   12.7
b4376  osmY    8.3   5.5   16.9
b4377  yjjU   11.5   7.4   24.9
b4387  smp    14.4   7.3  560.5
b4389  sms     5.0   2.8   21.6
b4390  nadR    4.7   3.4    7.5
b4391  yjjK    9.1   7.1   12.9
b4392  slt     4.4   3.7    5.4
b4393  trpR    5.5   3.3   17.7
b4394  yjjX    3.5   2.0   11.8
b4396  rob     1.7   1.5    2.1
b4397  creA    5.0   3.5    8.4
b4398  creB    8.4   5.5   17.8
b4401  arcA    5.5   4.5    7.1
b4402  yjjY    5.1   3.9    7.4
b4403  lasT   14.7   9.6   30.5