
High-resolution microarray analysis of RNA degradation in
Escherichia coli
A thesis presented
by
Douglas Wayne Selinger
to
The Division of Medical Sciences
in partial fulfillment of the requirements
for the degree of
Doctor of Philosophy
in the subject of
Genetics
Harvard University
Cambridge, Massachusetts
November 22, 2002
Copyright 2002 by Douglas Wayne Selinger
All rights reserved.
Advisor: George M. Church
Douglas Wayne Selinger
High-resolution microarray analysis of RNA degradation in
Escherichia coli
Abstract
Reductionist biological research has been one of the most successful scientific
enterprises of our age, elucidating everything from the molecular basis of genetic
information to the functioning of the cellular machinery. Perhaps it was inevitable that
computers, the other salient scientific development of our time, would transform biology
with their paradigms of miniaturization, automation, and digital information. DNA
microarrays are an exciting product of this technological fusion, allowing the
simultaneous monitoring of thousands of RNA transcripts in a miniaturized, massively
parallel, machine-readable format.
In Chapter 2, I describe the first use of a "genome" microarray, which has probes
for both open reading frames (ORFs) and intergenic regions in the sequenced model
organism Escherichia coli MG1655. This array, synthesized by Affymetrix using a
highly parallel light-directed in situ oligonucleotide synthesis method adapted from the
semiconductor industry, contains almost 300,000 oligonucleotide probes of known
sequence. This large number of oligos allows the genome to be sampled at an average
resolution of ~1 oligonucleotide probe every 30 bases.
In the course of this work I developed an RNA labeling protocol based on random
priming useful for expression analysis in E. coli and potentially other prokaryotes. I also
developed a set of freely-available software tools, collectively named Genome Array
Processing Software (GAPS) (Appendix B), which are useful for analyzing gene
expression data as well as for subgenic-resolution mapping of expression data to the
genome. I describe the application of this technology to compare RNA expression
profiles between cultures of E. coli growing in rich medium at logarithmic versus
stationary phase.
In Chapter 3, I describe a global analysis of RNA degradation which resulted in
the measurement of as many as 2,679 RNA chemical half-lives (listed in Appendix C),
representing ~60% of the known and predicted ORFs. High-resolution analysis of the underlying
rifampicin timecourse revealed that there are highly significant positional patterns in the
degradation of different operonic regions, with 5' regions degraded more quickly and 3'
ones more slowly. This result confirms, and further generalizes, the current model of a
net 5' to 3' directionality of degradation.
Table of Contents
Chapter 1 - Introduction
1.1 Systems Biology Completion (SBC)
1.2 Prokaryotic DNA microarray analysis
1.3 RNA decay in E. coli
Chapter 2
RNA expression analysis using a 30 base pair resolution
Escherichia coli genome array
Chapter 3
Global RNA half-life analysis in Escherichia coli reveals
positional patterns of transcript degradation
Chapter 4
Conclusion
Appendix A
Selinger D. W. et al., Nature Biotechnology 18, 1262-1268 (2000).
Appendix B
Genome Array Processing Software (GAPS) manual
Appendix C
Half-lives of 2,679 E. coli mRNAs
I dedicate this thesis to my wife,
who is my support and inspiration,
and to my parents,
who let me find my path and gave me the strength to follow it.
This thesis is also dedicated to those who marvel at the workings of nature,
but do not have the means to explore them.
Acknowledgements
My decision to pursue a Ph.D. degree was made in the 10th grade, precisely at the
moment I learned about DNA and the molecular basis of genetics. It has been a long and
exciting road from that moment to the completion of my thesis, and has been possible
only by the support of many people along the way. Let me begin, however, at the end.
George Church has been a better guide for my scientific wanderings than I could
have possibly hoped for. It is plain to see that his drive to continually push the envelope
in his research is motivated by the pure joy of discovery. His lab is a haven for ideas and
expertise from every field, where order is imposed gently, and mainly by example.
George's visionary nature, unique brand of pragmatism, and exceptional rigor inspire
emulation by his students and breed independent thinking. He enjoys his students'
company and is a devoted mentor and teacher. I cannot imagine a better place to have
spent my graduate career.
When I first arrived in the lab, I was fascinated by the spirited lab meeting
debates, often with Fritz and Pete on opposing ends, and the rest of us moderating and
contributing in between. They were always rigorous and hard-fought, but also
constructive and balanced. They energized me as they continue to do to this day. I
learned the importance of thinking quantitatively, the utility of ignoring artificial
boundaries between disciplines, and how to distinguish between things that can't be done,
and those that simply haven't been done yet. This initial group of people were both
mentors and colleagues to me. I thank Fritz Roth for being the first to take me under his
wing and for suggesting I apply DNA microarrays to RNA degradation. I thank Pete
Estep for giving me an appreciation for the biotech business, many challenging debates,
and his boundless and infectious energy. I thank Martha Bulyk for her wealth of
experimental knowledge and her willingness to share it. I thank Saeed Tavazoie for his
enthusiasm for math and physics, and for proclaiming long before most biologists, "this
data is screaming to be clustered!" I thank Dereth Phillips for being a model teacher,
matriarch of the lab, and spreader of good cheer and good science. I thank Jason Johnson
for many insights into protein structure and for the best fish fry in history. I thank Jason
Hughes, father of AlignACE, for his many critiques of computational biology issues. I
thank Martin Steffen for helpful advice, beyond measure, on experimental design, and for
the wealth of facts he brings to a discussion on just about any topic. I thank Rob Mitra for
his wealth of humbly-presented, always helpful, questions and insights, and for his lab
meeting presentations, which were filled with enough intrigue and methodical detective
work to put Arthur Conan Doyle to shame. I thank John Aach for the rigorous and
quantitative advice he humbly provides on almost any topic, and for many discussions on
philosophy. I thank Abby McGuire for her contributions to microbial informatics and for
many programming tips. I thank Keith Robison for introducing bioinformatics to
Harvard. I had the pleasure of learning yeast genetics during a rotation with Fred
Winston, who is both a master scientist and teacher. I also spent a rewarding rotation with
Roger Brent who encouraged me to think about biology more broadly.
Soon the field of genomics exploded and the Church lab expanded rapidly to
include an exciting group of students and post docs. I was incredibly fortunate to be
surrounded by such bright and interesting people. They have been my companions
through discussions of almost every topic imaginable. I thank Vasu Badarinarayana for
being my prokaryotic ally as the lab turned eukaryotic (him too, eventually) and for being
the first user of GAPS©. I thank Kevin Cheung, a Harvard undergraduate at the time, for
his help in carrying out the RNA rifampicin timecourse and for his excitement for
research. I thank Barak Cohen for years of personal and practical advice and for stories
from the front lines of competitive bird watching. I thank Patrik D'haeseleer for always
being a cheerful and interested source of advice. I thank Adnan Derti for the initial
BLASTing of the Affymetrix E. coli oligos against the genome, and for his general
programming prowess and social conscience. I thank Aimee Dudley for her deep
understanding of yeast genetics in particular, and experimental practice in general, which
has so benefited the lab. I thank Jeremy Edwards for discussions about flux balance
analysis and metabolism. I thank Yonatan Grad for endless, and I mean endless, (but
generally funny) puns. I thank Xiaohua Huang for experimental advice on array signal
amplification. I thank Jake Jaffe for teaching me about mass spectrometry and
Mycoplasmas, and for being one of the few to join me on my biking expeditions. I thank
Dan Janse for his parties and for his tea room companionship. I thank Peter Karchenko
for his expertise on biochemical systems modeling. I thank Felix Lam for his help with
the fermentor and his streamlined ordering system. I thank Kyriacos Leptos for
identifying the linguistic roots of any word imaginable. I thank Nobuhisa Masuda for his
helpful Nihongo (Japanese) lessons. I thank Tzachi Pilpel for his enthusiasm for teaching
and mentoring and his early Matlab help. I thank Allegra Petti for sitting next to me in the
computer lab and sharing the little frustrations that computers like to send our way. I
thank Nick Reppas for his clear thinking as a TF for biophysics 101 and for sharing his
experiences with Buddhism. I thank Wayne Rindone for keeping track of the important
details of maintaining ExpressDB and helping me post my microarray data on the web. I
thank Dan Segre for discussions on biochemical modeling and life in Israel. I thank Jay
Shendure for numerous interesting tea room discussions. I thank Priya Sudarsanam for
following me from the Winston lab and for her fine example of how to go from a first
rate biologist into a first rate computational biologist. I thank Matt Wright for his patient
explanation of anything mathematical, physical, or chemical and for our shared, idealistic
pursuit of the 'big' problems of science and philosophy. I thank "Dr. Kazu" Yanai for his
Japanese restaurant guide, without which I would never have tasted shabu-shabu. I thank Zhou Zhu for being an example of dedication and for her cheerful
computer room presence.
It was always a pleasure dealing with Cindy Reyes, Isabelle Jacquet, Mary Beth,
Eva Marie and Bob Tannis, and they kept the lab and the department running with
absolute efficiency. Phil Leder made the Genetics department a great place to be, and as
far as I'm concerned, Connie Cepko and the BBS administrators have put together the
best Ph.D. program anywhere in the world.
I thank the National Science Foundation and the Japanese MEXT for the
Monbusho program which allowed me to spend the summer of 2001 in Kyoto, Japan. I
thank Minoru Kanehisa for hosting me in his lab at Kyoto 'Daigaku' and Nakao-san and
all the members of the Kanehisa lab for their amazing patience in teaching me everything
from programming to sushi.
My first year Child Hall floormates Joe, Laurie, Jeremiah, Nancy, Darby,
Vyjayanthi, and Glen became my Harvard family, and Vyjayanthi would later become my sister-in-law. We grew together, pulled each other through the downs, and had many, many
ups. Chuck and Paras, old high school friends, managed to follow me up to Boston and,
in my good fortune, re-inserted themselves into my life.
Rutgers University prepared me well for my scientific career and continued to fan
the flames of my intellectual curiosity, allowing me to explore philosophy, languages,
and foreign cultures - including studying a year abroad in Bristol, England. After
graduation, the Fulbright Association awarded me a scholarship to study in Madrid,
Spain, where I worked with Manuel Espinosa and Gloria del Solar. In addition to science,
I learned to speak Spanish and to see myself more as a citizen of the world.
Mr. Kenneth Card, the now fabled 10th (and 12th) grade biology teacher,
introduced me to DNA and nurtured my pursuit of knowledge in every way. Mr. Steven
Holtzman, my 10th grade English teacher, pushed me to strive for excellence and to
search for the deeper meanings of literature, and of life. While still in high school, I was
given the extraordinary opportunity to learn cutting-edge molecular biology through a
cooperative education program with the research labs of Hoffmann-La Roche in Nutley,
New Jersey. Under the guidance of Mary Graves and Liberata DeSantis, I poured my first
gels and made my first recombinant DNA constructs. This early experience gave me a
tremendous clarity of purpose and propelled me into my chosen career. My training was
bolstered in later years by summer internships at Merck and again at Hoffmann-La Roche,
where I continued to grow as a scientist. Without these opportunities, I would not be
where I am today.
My family has made everything possible. My brother Jeff has been, over the
years, my protégé and my role model, my philosophical companion and my friend. My
sister Debbie is a great listener and has brought me through many times of doubt. I could
never express enough gratitude to my parents, who have loved me and taught me so
much. I am truly grateful to have met my wife, Rosanna Marlene, in the course of my
doctoral work. She is my perfect companion, and I soar with her beside me. Knowing I
can share my accomplishments with her makes them especially sweet.
Chapter 1 - Introduction
1.1 Systems Biology Completion
This section will form the basis of an invited review article for Trends in Biotechnology.
It has benefited from a number of discussions with many members of the Church lab,
most notably Matthew Wright.
1.2 Prokaryotic DNA Microarray Analysis
This section describes the motivations for, and development of, experimental and
computational tools for E. coli DNA microarray analysis.
1.3 RNA Decay in E. coli
This section reviews the current state of knowledge concerning RNA decay in E. coli,
and summarizes the contributions made by the data and methods presented in this thesis.
1.1 Systems Biology Completion
Large-scale biological data, like those described in this thesis, have generated a great
deal of excitement in biology. In the post-genomics era, we are re-evaluating the ultimate goals of biology and the proper ends to which we should apply
our newly developed tools.
A simple story is often told to illustrate the basic philosophy of science. It tells of a drunk
man searching for his lost keys in the middle of the night. He searches only under lampposts,
not because the keys are more likely to be there, but because those are the only places
he has any hope of finding them. Scientists, too, search under the lampposts; the questions
surrounding us in the darkness may be more interesting, or even more important, but they
are beyond the elucidating beam of our experimental methods and must await a new day.
This strategy has brought the natural sciences a long way: from Aristotle's passive
observations, to Galileo's experimental probings, to our own elaborately contrived and
controlled microdissections of nature. But we risk becoming too comfortable searching
next to our favorite lamppost and ignoring the flickering of new lights as they come to
life around us. The floodlights have recently come on in biology in the form of
systematic, quantitative, large-scale experiments with machine-readable outputs. Yes, we
can shine them on our favorite genes, but it's clear we can also do far more. It's time to
take stock of what has suddenly been illuminated, what is soon-to-be illuminated, and to
map the boundary of the semi-darkness for those determined squinters among us.
With new tools naturally come new goals. Classical molecular methods forced us
to focus our gaze on small numbers of molecules at a time, so we laboriously built up
descriptions in human language (predominantly English), pictures, and the occasional
video clip. The overarching goal of biology, if there was one, was to compile a large
number of systems that are interesting (those that define a general rule, break one, or
appeal to us as idiosyncratic human beings) or applicable (those that contribute to the
engineering, reverse-engineering, or modification of a system). The defining feature of
this "compilation strategy" is that it is more a process than a goal. It specifies no endpoint
other than continual accumulation.
Long reserved for physicists searching for a "theory of everything", the idea of
completion has now become pervasive in biology. The extent to which sequencing of
complete genomes is taken for granted is well illustrated by a conversation I had with
Sydney Brenner in 1998 at the Cold Spring Harbor Genome Meeting. After telling me
how his group was almost finished with the sequence of a bacterial species, he realized he
had forgotten its name. After a brief moment of embarrassment, he insisted that
forgetting which genome one sequenced must be a milestone of some kind or another.
Historians of science take note. (I should note myself that I have subsequently been
unable to identify which genome he was referring to.)
But now that "completion" has entered the biologist's lexicon it raises the
questions of where else it rightfully applies and whether it constitutes a new sort of goal
for biological inquiry. The proliferation of the "-ome" suffix attests to widespread
acceptance that biology is rife with things to be completed, whether it's the proteome, the
metabolome, or the physiome. What sort of overarching goal, then, is implied by all these
projects?
There seem to be two distinct levels of completion. The first, and simpler of the
two, is 'parts list completion'. Put most simply, completion at this level is defined as the
ratio of observed parts to total predicted parts. This effort is well underway and consists of the
various 'ome' projects such as genomes, transcriptomes, and proteomes. The second,
more ambitious and less well-defined level of completion, is at the level of 'systems
biology', of how the parts work together to form a working biological system. It is
systems biology completion (SBC) that I will discuss here.
SBC is necessarily model-dependent, requiring specification of a model
type and its requisite components. Using a traditional ab initio modeling strategy we
would start from a set of rules and, given an initial state, apply them to derive the future
states of the system. This approach can be valuable if i) such rules can be discovered, ii)
appropriate initial conditions can be stated, and iii) it is practical to calculate future states,
at a relevant time resolution, with current computing capacity. An atomic model easily
satisfies requirement i, and it may be possible to guess a relevant initial condition for part
ii, but it is highly unlikely we will meet requirement iii for the system sizes and
timescales relevant in biology. Ordinary differential equation models also have difficulty
mainly with requirement iii: their nonlinearity can make them problematic for numerical
solvers, and it can be difficult to choose a single time step that captures a wide enough
range of biologically relevant timescales while keeping the computation tractable.
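To make the time-step problem concrete, here is a minimal Python sketch. It deliberately uses a linear toy system (two independent first-order decays with hypothetical rate constants five orders of magnitude apart), so it isolates the multiple-timescale issue rather than nonlinearity, and it uses forward Euler only because its stability limit, h < 2/k, is easy to state. None of the names or values come from this thesis.

```python
import math


def euler_decay(k, x0, h, t_end):
    """Integrate dx/dt = -k*x by forward (explicit) Euler.

    Returns (x at t_end, number of steps taken)."""
    n_steps = round(t_end / h)
    x = x0
    for _ in range(n_steps):
        x = x - h * k * x  # each step multiplies x by (1 - h*k)
    return x, n_steps


# Hypothetical rate constants (1/s), five orders of magnitude apart.
K_FAST = 1000.0
K_SLOW = 0.01

# A step that is comfortable for the slow process is unstable for the fast one:
# (1 - h*k) = 1 - 10 = -9, so |x| grows ninefold per step instead of decaying.
x_fast_bad, _ = euler_decay(K_FAST, 1.0, h=0.01, t_end=1.0)

# Forward Euler is stable here only for h < 2 / K_FAST = 0.002; h = 1e-4 decays properly.
x_fast_ok, _ = euler_decay(K_FAST, 1.0, h=1e-4, t_end=1.0)

# Keeping that small step to follow the slow process for one time constant
# (1 / K_SLOW = 100 s) costs a million steps.
x_slow, steps = euler_decay(K_SLOW, 1.0, h=1e-4, t_end=1.0 / K_SLOW)

print(f"unstable fast decay: {x_fast_bad:.3g}")
print(f"stable fast decay:   {x_fast_ok:.3g}")
print(f"slow decay: {x_slow:.6f} vs exact {math.exp(-1):.6f} after {steps} steps")
```

Implicit (stiff) solvers such as backward Euler relax this particular stability limit, but the basic tension remains: the step must resolve the fastest process of interest, while the run must span the slowest.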
The goal of modeling may be stated as finding a set of rules which are capable of
mapping the space of all possible inputs (Fig. 1, blue area), e.g. descriptions of the cell's
environment, to the space of all possible outputs allowed by the cell (Fig. 1, yellow area),
e.g. the concentration of all of its RNAs. By large-scale experimental sampling of input-output pairs (Fig. 1, yellow-red dots), such as condition-transcriptome pairs, one may be
able to derive rules that allow the prediction of outputs for novel inputs (Krupa 2002).
The accuracy of these predictions, then, would be related to the density with which the
input space is sampled, as well as to various properties of the input space itself.
[Figure 1: Input space mapped by Rules to Output space]
Figure 1. A general schema for modeling as an exercise in mapping input space
(blue area), e.g. all possible environments in which a cell can live, to output space
(yellow area), e.g. all possible cellular responses. The red-yellow dot pairs represent
measured input-output pairs, which, in large numbers, can be used to derive rules
(arrows) to predict outputs for novel inputs.
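The input-output picture of Fig. 1 can be sketched in a few lines of Python. This is a toy under stated assumptions: a single input dimension on [0, 1], a hypothetical smooth response function standing in for the cell's unknown mapping, and piecewise-linear interpolation between measured pairs as the "rule". The sketch shows prediction error at novel inputs falling as the input space is sampled more densely.

```python
import bisect
import math


def interpolate(xs, ys, x):
    """Predict the output at an unmeasured input x by linear interpolation
    between the two nearest measured input-output pairs (xs must be sorted)."""
    i = bisect.bisect_right(xs, x)
    if i == 0:
        return ys[0]
    if i == len(xs):
        return ys[-1]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def max_prediction_error(response, n_samples, n_test=1000):
    """Sample `response` at n_samples evenly spaced inputs on [0, 1], then report
    the worst interpolation error at n_test novel inputs between them."""
    xs = [i / (n_samples - 1) for i in range(n_samples)]
    ys = [response(x) for x in xs]
    novel = [(j + 0.5) / n_test for j in range(n_test)]
    return max(abs(interpolate(xs, ys, x) - response(x)) for x in novel)


def response(x):
    """Hypothetical smooth input-output mapping (not measured data)."""
    return math.sin(2 * math.pi * x)


for n in (5, 10, 50):
    print(n, round(max_prediction_error(response, n), 4))
```

For a smooth response, halving the sample spacing roughly quarters the worst-case error, since linear interpolation error scales with the square of the spacing.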
We are then forced to consider how to determine when the input space is
adequately sampled. In other words, how many measurements, at least to the order of
magnitude, would it take to populate the space of all possible inputs (e.g. conditions) with
enough measured outputs (e.g. transcriptomes, proteomes, etc.) to make interpolation
useful? This is a difficult question, but we can begin by defining what factors would
affect our estimate.
There are four factors which appear to be important: i) number of cell
components, ii) conditions/cell types, iii) the required accuracy of prediction, and iv) the
extent to which similar inputs give similar outputs. Firstly, the more components a cell
has, such as the number of gene products, the more measurements we need to make.
Secondly, the more environments in which a cell is capable of living, the larger the input
space; and the more ways a cell is capable of responding, the larger the output space.
Larger input and output spaces, of course, require more sampling. Thirdly, the accuracy
needed for our model affects the number of measurements needed, because more accurate
interpolations require a more densely sampled space. Finally, if nearby points in input
space map to nearby points in output space (i.e. the mapping function is relatively
smooth) then we do not need to sample as densely. With respect to time, we don't need to
sample much more finely than the timescale of the phenomena of interest; with respect to
conditions, we don't want to focus all of our measurements in a small region of biological
possibility (say, small increments of glucose concentration) because we expect the cell's response
there to be largely identical. Likewise, all of our measurements should not be from
the same differentiated cell type if we want a general model of cells defined by a
genotype.
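The four factors can be combined into a rough, order-of-magnitude calculator. The Python sketch below rests on hypothetical assumptions: a full grid over independent condition axes, piecewise-linear interpolation as the prediction rule, and a known upper bound on the curvature of the response, using the standard error bound h^2 * max|f''| / 8 for linear interpolation. The function and parameter names are illustrative, not taken from this thesis.

```python
import math


def samples_per_axis(axis_range, tolerance, curvature):
    """Grid points needed along one condition axis so that linear interpolation
    error stays below `tolerance`, using |error| <= h**2 * curvature / 8,
    where `curvature` is an assumed upper bound on |f''| (smoothness, factor iv)."""
    h_max = math.sqrt(8.0 * tolerance / curvature)
    return math.ceil(axis_range / h_max) + 1


def measurements_needed(n_components, n_axes, axis_range, tolerance, curvature):
    """Rough count of conditions and component measurements for a full grid.

    n_components -> factor i   (cell components measured per condition)
    n_axes       -> factor ii  (independent environmental dimensions)
    tolerance    -> factor iii (required prediction accuracy)
    curvature    -> factor iv  (how smooth the input-output mapping is)"""
    conditions = samples_per_axis(axis_range, tolerance, curvature) ** n_axes
    return conditions, conditions * n_components


# Hypothetical scenario: 4,000 measured components, 3 independent condition axes.
smooth = measurements_needed(4000, 3, axis_range=1.0, tolerance=0.01, curvature=1.0)
rough = measurements_needed(4000, 3, axis_range=1.0, tolerance=0.01, curvature=100.0)
print("smooth mapping (conditions, measurements):", smooth)
print("rough mapping  (conditions, measurements):", rough)
```

In this bound, a 100-fold tighter tolerance costs the same as a 100-fold rougher mapping, and each added independent condition axis multiplies the total, which is why fully gridded sampling quickly becomes impractical.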
At the extremes of estimates for SBC, a cell which lives in only one environment
and never changes needs only one measurement to cover all of input-output space, while
a cell which is capable of living in many environments and exhibits a different response
to even small environmental changes would need a fine sampling of a very large space,
therefore requiring many, many measurements. Of course, we are not completely
ignorant about where on this spectrum actual biological systems lie. Cells are not likely
to reinvent themselves for slight changes of environment, but instead may rely on a
relatively small number of programs which they use in combination to respond to the
various natural environments for which they have evolved. In fact, a very simple cell, like
Mycoplasma genitalium, may even be an example of a cell with approximately one state,
as it seems to lack any transcriptional regulation and lives in an exquisitely controlled
environment within its human host (Razin et al. 1998).
Large-scale experimental data may be useful in modeling by providing large
numbers of constraints, and therefore aid in large-scale determination of the model rules.
One can attempt to make large-scale measurements of input-output pairs which uniformly
span all of input and output space, and using rules derived from these observed mappings,
predict the output for an unmeasured input. For example, we can make separate
transcriptome measurements of E. coli after heat shock and after lac induction, and predict
what the transcriptome might be for the combination of these two inductions. For
orthogonal conditions, the rules may be simply additive, whereas for interdependent
conditions the rules will probably be more complicated, perhaps involving intermediate
induction or epistasis. Study of these more complicated cases can give us important
information about the structure of the network.
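A minimal Python sketch of the additive prediction described above, assuming that "additive" means log2 expression ratios (relative to an untreated control) add, i.e. fold changes multiply. The gene names and values are hypothetical placeholders, not measurements. Genes whose observed response to the combined treatment departs from the additive prediction are candidates for interdependent behavior.

```python
def predict_additive(log2_a, log2_b):
    """Predicted log2 ratio for the combined condition, gene by gene,
    assuming the two single-condition responses add in log space."""
    return {g: log2_a[g] + log2_b[g] for g in sorted(log2_a.keys() & log2_b.keys())}


def non_additive_genes(predicted, observed, threshold=1.0):
    """Genes whose observed combined response differs from the additive
    prediction by more than `threshold` (in log2 units), sorted by name."""
    return sorted(
        g for g in predicted.keys() & observed.keys()
        if abs(observed[g] - predicted[g]) > threshold
    )


# Hypothetical log2 ratios vs. an untreated control.
heat_shock = {"gene1": 3.0, "gene2": 0.0, "gene3": 1.0}
lac_induction = {"gene1": 0.0, "gene2": 4.0, "gene3": 1.0}
both_observed = {"gene1": 3.1, "gene2": 3.8, "gene3": -0.5}

predicted = predict_additive(heat_shock, lac_induction)
print(predicted)
print(non_additive_genes(predicted, both_observed))
```

Genes flagged this way would be natural starting points for studying the intermediate induction or epistasis mentioned above.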
The choice of a model type is a critical part of any SBC effort as it determines the
type of rules which need to be discovered and the number and type of component
measurements which need to be made. Table 1 gives examples of several model types.
On one end of the spectrum, we can imagine atomic level, or even subatomic level
descriptions of a complete cell. While large-scale measurements at this level are not
forthcoming in the foreseeable future (and certain measurements impossible even in
theory, according to the Heisenberg uncertainty principle) these model types set an upper
bound on detail. Towards the lower end of the detail spectrum we have Boolean models,
which we can build from logical statements such as, "if the lac repressor is bound to the
operator then the lac operon is off."
Model | Scope | Applicable Rules | Model Components | # of Components | Examples of Components
Atomic | Cell c at time t | Physics | Atomic positions & momentums | 10^8-10^12 | 13C position & momentum
Molecular | Cell c at time t | Chemistry | Small molecule positions & momentums | 10^7-10^11 | Glucose position & momentum
Biomolecular (discrete) | Cell c at time t | Molecular Mechanics | Macromolecule positions & momentums | 10^6-10^10 | Hexokinase position & momentum
Biomolecular (statistical) | Biochemically equivalent cells | Chemical kinetics & thermodynamics described by differential equations | Macromolecule concentrations, compartments | 10^5-10^7 | Hexokinase concentration in cytoplasm
Biomolecular (steady-state) | Genetically equivalent cells, similar growth conditions, steady state | Flux Balance | Molecular fluxes | 10^3-10^4 | Flux of Glucose to Glucose-6P
Boolean | Genetically equivalent cells | Genetic and Metabolic "circuits" | Regulons, Pathways | 10^2-10^3 | Glycolysis "on", Gluconeogenesis "off"
Cell Population | Equivalent inoculums and culture conditions | Growth kinetics, reproductive fitness | Cell growth rates | 10^0-10^1 | # of wild type cells, # of mutant cells
Table 1. Examples of hypothetical systems biology projects to be completed, listed from
most complex (top) to least (bottom). We can currently collect complete component
datasets for some classes of biomolecules at the level of macromolecular concentrations.
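The Boolean rule quoted before Table 1 ("if the lac repressor is bound to the operator then the lac operon is off") can be written directly as code. The Python sketch below is a hypothetical two-rule model: besides the quoted rule, it assumes that an inducer releases the repressor from the operator, it ignores all other regulatory inputs, and it updates every rule synchronously, one discrete time step at a time.

```python
def update(state, inducer_present):
    """One synchronous update: every rule reads the *current* state."""
    return {
        # Assumed rule: the repressor sits on the operator unless inducer is present.
        "repressor_bound": not inducer_present,
        # Rule from the text: repressor bound to the operator -> lac operon off.
        "operon_on": not state["repressor_bound"],
    }


def simulate(state, inducer_present, n_steps):
    """Return the list of states visited, starting with the initial one."""
    trajectory = [state]
    for _ in range(n_steps):
        state = update(state, inducer_present)
        trajectory.append(state)
    return trajectory


# Start uninduced (repressor bound, operon off), then add inducer.
start = {"repressor_bound": True, "operon_on": False}
for t, s in enumerate(simulate(start, inducer_present=True, n_steps=3)):
    print(t, s)
```

With synchronous updating, the operon switches on one step after the repressor is released and the induced state is then a fixed point. The one-step lag is an artifact of the update scheme rather than a kinetic prediction, which is typical of the trade-offs made at this level of detail.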
As we move from more to less detailed models we make certain trade-offs. The
more detailed models make fewer assumptions, and are therefore potentially more
accurate for the systems they describe. On the other hand, they tend to be more
problematic with regard to computability and component measurement, and are therefore
difficult to apply to large systems. As we enhance our ability to make large numbers of
measurements, we may be able to generate enough input-output pairs, i.e. constraints, to
allow SBC using more and more detailed model types. Using order of magnitude
component estimates, together with the considerations of input-output space size and
sampling discussed previously, we can get a rough idea of the number of measurements
which might be needed for SBC of a particular system at a given level of detail. While
admittedly rough, such an estimate would represent a conceptual starting point.
In the pregenomic era, our sampling of input-output space was far too sparse for
most model organisms and model types to warrant a claim of SBC. Component
measurements were hard to come by and were acquired by any means necessary: from
one-at-a-time extraction from the literature to educated guesses. As large-scale biology
proceeds, we are dramatically increasing our capability to accurately sample significant
amounts of input-output space. Large-scale RNA half-life measurements, like those
described in this thesis, could eventually contribute to SBC of a biomolecular statistical
model, in which the concentrations of all biomolecules and their changes with respect to
time are incorporated into a set of differential equations. Judicious use of this newly powerful experimental sampling capability could lead to justified claims of SBC for
systems of increasing complexity.
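As a small example of such a differential equation, here is a minimal Python sketch assuming the simplest one-species model, dm/dt = s - k*m, with constant synthesis s and first-order decay k = ln 2 / t_half. It is an illustration, not the analysis method of Chapter 3. The steady-state level is s/k; if new transcription is blocked at t = 0 (as with rifampicin), m(t) = m* exp(-k*t), so the half-life can be recovered from the slope of ln m against time. The synthesis rate and half-life are hypothetical.

```python
import math


def steady_state(synthesis_rate, half_life):
    """Steady state of dm/dt = s - k*m, with k = ln 2 / half-life: m* = s / k."""
    return synthesis_rate * half_life / math.log(2)


def decay_after_block(m0, half_life, times):
    """Solution of dm/dt = -k*m once synthesis stops: m(t) = m0 * exp(-k*t)."""
    k = math.log(2) / half_life
    return [m0 * math.exp(-k * t) for t in times]


def fit_half_life(times, abundances):
    """Least-squares slope of ln(abundance) versus time; half-life = ln 2 / -slope."""
    logs = [math.log(a) for a in abundances]
    t_mean = sum(times) / len(times)
    y_mean = sum(logs) / len(logs)
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    return math.log(2) / -(sxy / sxx)


# Hypothetical transcript: 10 molecules/min synthesized, 2-minute half-life.
m_star = steady_state(10.0, 2.0)
times = [0.0, 1.0, 2.0, 4.0, 8.0]  # minutes after synthesis is blocked
abundances = decay_after_block(m_star, 2.0, times)
print(f"steady state: {m_star:.2f}; fitted half-life: {fit_half_life(times, abundances):.3f} min")
```

With noise-free synthetic data the fit recovers the input half-life exactly; real timecourse data would add measurement noise and call for checking the log-linear assumption.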
1.2 Prokaryotic DNA microarray analysis
While the seeds for microarray technology had been planted long ago (Gillespie
and Spiegelman 1965; Grunstein and Hogness 1975; Lennon and Lehrach 1991), it has
truly exploded in the last half-decade, and has resulted in a radical change in the
landscape of modern biology. When my work on this thesis began in earnest at the
beginning of 1998, a search on PubMed with the keyword "microarray" would have
yielded only 7 articles on DNA microarrays. That same search, run today (October 2002),
yields more than 2,300 articles. Given the rapid pace of recent developments, it is
important to put the present work into 'historical' context.
DNA microarray analysis was initially developed for gene expression analysis in
eukaryotes (Lockhart et al. 1996; Schena et al. 1996). As such, initial RNA labeling
protocols were developed to take advantage of the ubiquitous polyA tails of eukaryotic
messenger RNAs, which allowed them to be preferentially labeled over the far more
abundant ribosomal and transfer RNAs. Prokaryotes, of course, are of central importance
in biology, and were of particular interest to us because of their relatively small genomes,
which make them potential model organisms for systems biology. We were, therefore,
interested in extending microarray analysis to prokaryotes in general, and to the classical
model organism Escherichia coli in particular. Thus, our initial contact with Affymetrix
involved a collaboration to develop a labeling protocol useful for prokaryotes which
included access to newly-designed E. coli oligonucleotide arrays.
Development of an RNA labeling protocol (which for the Affymetrix platform
generally means biotinylation) proved to be difficult, ultimately taking about 1½ years.
Some of the factors which we considered during protocol development were:
biotinylation efficiency, cost of the labeling reagent (and the quantity needed), amount of
interaction of unincorporated labeling reagent with the array surface, robustness and
relative complexity of the protocol, and its generalizability to other prokaryotes. Our
initial strategies proved unsuccessful, including several direct chemical RNA labeling
methods, polyadenylation with the catalytic subunit of yeast poly(A) polymerase using
biotinylated ATP, and polyadenylation followed by the standard Affymetrix labeling
protocol (polyT priming, double-stranded cDNA synthesis, followed by T7 in vitro
transcription with biotinylated ribonucleotides to create labeled cRNA). These methods
typically yielded high fluorescent signal for rRNA and tRNA features, but almost none
for mRNAs.
A variety of on-chip (i.e. after hybridization) signal amplification methods were
also tried unsuccessfully, including on-chip polyadenylation using yeast poly(A)
polymerase and biotinylated ATP. The standard Affymetrix staining protocol involves
the use of streptavidin-phycoerythrin (streptavidin to bind the biotinylated target nucleic
acid, phycoerythrin as a fluorophore). An optional amplification step can be added using
a biotinylated anti-streptavidin antibody, followed by another streptavidin-phycoerythrin
stain. Iterations of this amplification procedure were explored as a way to increase the
signal-to-noise ratio of mRNA probes. I found that although I could obtain a reproducible 2-3-fold
increase in technical signal-to-noise (where signal-to-noise ratio is defined as
fluorescent intensity divided by the standard deviation of the background), it did not
increase the number of mRNAs I was able to detect.
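The signal-to-noise definition used above (fluorescent intensity divided by the standard deviation of the background) takes only a few lines to compute. This minimal Python sketch assumes background-subtracted probe intensities and the sample standard deviation over a set of background cells; the values and the detection threshold are hypothetical placeholders.

```python
import statistics


def signal_to_noise(intensity, background):
    """SNR as defined in the text: intensity / standard deviation of background.
    `intensity` is assumed to be background-subtracted already."""
    return intensity / statistics.stdev(background)


def count_detected(intensities, background, threshold=3.0):
    """Number of probes whose SNR exceeds a (hypothetical) detection threshold."""
    return sum(signal_to_noise(i, background) > threshold for i in intensities)


# Hypothetical fluorescence values (arbitrary units).
background = [100.0, 110.0, 90.0, 105.0, 95.0]  # sample SD = sqrt(62.5), about 7.9
probe_intensities = [15.0, 30.0, 400.0, 1200.0]

print([round(signal_to_noise(i, background), 1) for i in probe_intensities])
print(count_detected(probe_intensities, background))
```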
Ultimately, I was successful in developing a protocol based on chemical
fragmentation of total RNA, single-strand cDNA synthesis using random octamer
primers, and 3' biotinylation by terminal deoxynucleotidyl transferase (TdT) using biotinylated
dideoxynucleotides. (Use of TdT for the biotinylation gave slightly less signal, but
significantly lower chip background, than incorporation of biotinylated nucleotides
during the cDNA synthesis step.) The protocol originally required 1 mg of total RNA but
was subsequently reduced to ~100 g in our hands, and to ~20 g using a somewhat
11
different random-priming protocol independently developed by Affymetrix (Rosenow et
al. 2001). Details of the protocol can be found in the methods section of Chapter 2.
Initial attempts to analyze the resulting data with GeneChip software (version 3.2)
were problematic and revealed a number of limitations of Affymetrix's software package.
First of all, the algorithms for transcript detection and quantitation were developed
empirically for eukaryotic transcription analysis and it wasn't clear whether they would
perform reliably with the increased noisiness of prokaryotic experiments (due,
presumably, to increased cross-hybridization from ribosomal and transfer RNAs).
Furthermore, the algorithm was kept secret by Affymetrix, preventing us from assessing
or modifying it. Additionally, their metrics were not based on standard statistical
methods, making interpretation of the results difficult. A number of other limitations
were apparent, including poor annotation and an inability to access data from individual
oligos on a large scale. (It should be noted that serious attempts were made to address all
of these issues in MAS 5.0, a major re-write of Affymetrix's microarray analysis
software.) These considerations led me to write a series of Perl scripts, collectively
named Genome Array Processing Software or GAPS, which directly accessed the raw
.CEL files generated by GeneChip, and did all subsequent processing in a more flexible
and statistically rigorous manner. A detailed survey and explanation of the features of
GAPS can be found in Appendix B.
At our insistence, we were provided full access to the sequences of the
oligonucleotides on the E. coli arrays, despite the fact that, at the time, these sequences
were a well-guarded Affymetrix secret. This sequence knowledge ultimately allowed us
to develop novel analyses which took full advantage of the tremendous density of oligos,
which sampled the genomic sequence, on average, once every 30 bases. We envisioned
such sub-genic resolution would allow important biological measurements to be made,
such as the identification of transcript boundaries, abortive termination events, and other
position-specific features of transcription and RNA degradation. After winning the
approval of Affymetrix, we were allowed to release the complete set of E. coli oligos as a
supplement to our publication (Selinger et al. 2000) and as part of GAPS, which was the
first microarray analysis tool to allow global subgenic-resolution expression analysis.
This feature ultimately led to the discovery of a 5' to 3' directionality of RNA decay,
described in Chapter 3. This first-ever release of Affymetrix oligo sequence data proved
very popular with the scientific community and was shortly followed by the public
release of complete sequence information for all Affymetrix chips. I believe this degree
of openness is vital for microarray data interpretation, including meta-analysis, quality
control, and the development of novel experimental and computational analyses.
Although microarray expression analysis of prokaryotes may now be taken
for granted, the work described in Chapter 2 represents one of the first global RNA
expression profiles of E. coli and the first using the Affymetrix platform (Arfin et al.
2000; Khodursky et al. 2000; Richmond et al. 1999; Tao et al. 1999). Additionally, it
represents the first RNA expression analysis in any organism to be conducted at subgenic
level resolution. Subgenic-resolution expression analysis has more recently been applied
to humans (Kapranov et al. 2002; Shoemaker et al. 2001) and is emerging as an important
tool for empirical transcription boundary mapping and exon discovery/verification.
1.3 RNA Decay in E. coli
Gene expression is controlled at many different levels, including transcription,
RNA degradation, translation, and post-translational processes. Steady-state gene expression is a result
of the combined kinetics of several of these processes. Historically, studies of gene
regulation have focused on transcription and translation, with relatively little effort
devoted to understanding the mechanisms of RNA degradation. Half-lives of transcripts
in E. coli can vary anywhere from 40 seconds to 20 min, suggesting that there may be a
significant amount of regulation at the level of RNA stability, and that RNA degradation
is not merely a constitutively active salvage pathway (Kushner 2002). Here I present a
brief review of the current state of knowledge of RNA decay in E. coli.
RNA degradation in E. coli is largely accounted for by three central enzymes: two
3'-to-5' exonucleases (RNase II and polynucleotide phosphorylase, or PNPase) and a
5'-end-dependent endonuclease (RNase E). Transcript cleavage is often observed to occur in a 5'
to 3' direction (Bechhofer 1993; Carpousis et al. 1999). It has been proposed that this is
due to a rate limiting initial cleavage by RNase E, which is inhibited by 5' stem-loop
structures as well as by the triphosphate present at the 5' terminus of a new transcript (Mackie
1998). Once this initial endonucleolytic cleavage is made, possibly with the aid of
additional targeting factors, the rest of the transcript, which now lacks a 5' triphosphate or
a protective secondary structure, is rapidly degraded. RNase E cleavage is quickly
followed by exonucleolytic digestion in the 3' to 5' direction.
Stem loop structures are known to play an important role in the stabilization of
transcripts. 5' stem loop structures have the strongest stabilizing effect, accounting for
some of the longest lived mRNAs in the cell, and can confer similar stability to
transcripts to which they are fused (Chen et al. 1991; Emory et al. 1992; Lopez and
Dreyfus 1996). They are thought to confer stability by inhibiting downstream cleavage by
RNase E (and possibly other 5'-end-dependent nucleases). RNase II and PNPase are
both inhibited by stable stem-loops (although RNase II more so), which are often present
at the 3' end of transcripts as a result of rho-independent termination (Higgins et al.
1993).
Polyadenylation has also been shown to play a role in mRNA degradation
(O'Hara et al. 1995). E. coli contains two poly(A) polymerases (PAPI and PAPII).
Depending on the gene, between 2% and 50% of its transcripts carry a poly(A)
tail of between 10 and 50 nucleotides. This tail has been proposed to affect mRNA
stability differently depending on its context (Sarkar 1996; Sarkar 1997). For transcripts
which lack a 3' stem loop structure, polyadenylation acts as a stabilizing factor,
presumably by competing with 3'-to-5' exonucleases to add rather than remove nucleotides.
For transcripts which have a stable stem loop, polyadenylation creates a site which is
recognized by the RNA degradosome - a complex which contains RNase E, PNPase,
RhlB (an RNA helicase) and enolase (whose function in this complex is unclear). This
complex then rapidly degrades the transcript through an unknown mechanism (although
given the members of the complex it's not hard to imagine one).
The link between translation and mRNA stability has also been investigated
(Arnold et al. 1998; Petersen 1993). The assumption is that frequently transiting
ribosomes may reduce the accessibility of the transcript to nucleolytic attack. Ribosomes
have been found to have a stabilizing effect on transcripts, though the extent of the
stabilization varies greatly from transcript to transcript and depends on the mechanism of
degradation.
The list of players on the mRNA degradation scene is still longer (Ehretsmann et
al. 1992; Kushner 1996). Notably missing from the above discussion is RNase III, which
cleaves in double-stranded regions and is known to play a role in the degradation of a
subset of E. coli transcripts. There are about 20 ribonucleases in all, many of which still
await characterization.
There is still a tremendous amount to be learned about the mechanisms and
players involved in mRNA degradation in E. coli. Analysis of this process on a global
scale is likely to yield crucial insights into the genetic regulation of prokaryotes.
Importantly, by studying large numbers of RNAs, and the details of their degradation,
one can begin to identify common patterns. Bioinformatic analysis, or further
experiments, may then help identify features shared by these transcripts which are
responsible for their particular mode of degradation. Furthermore, large-scale
measurements can help determine whether known degradation mechanisms are general
for many transcripts, or specific to the relatively small number of RNAs which have been
studied so far.
In this fashion, Chapter 3 makes a number of contributions to the study of
prokaryotic RNA decay and sets the groundwork for a number of possible future studies.
Before the advent of microarray analysis, the degradation of fewer than 25 bacterial
RNAs had ever been studied (Bernstein et al. 2002). Here I present measured half-lives
for as many as 2,679 mRNAs (Appendix C), representing about 60% of the known and
predicted ORFs. Furthermore, I describe the first global positional analysis of RNA
degradation, in which it is found that the 5' ends of operons degrade significantly faster
than the 3' ends. Groups of operons with similar degradation patterns were identified,
allowing mechanistic explanations for their decay to be sought.
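Half-lives such as these (ranging from about 40 seconds to 20 minutes, as noted above) are commonly estimated by assuming first-order decay, A(t) = A0·e^(-kt), so that t½ = ln 2 / k. The sketch below fits ln A(t) against time by ordinary least squares. It is a generic illustration under that exponential-decay assumption, not necessarily the procedure used in Chapter 3, and `half_life` is a hypothetical name.

```python
import math


def half_life(times: list[float], abundances: list[float]) -> float:
    """Half-life from a first-order decay fit, ln A(t) = ln A0 - k*t (least squares).

    Returned in the same time units as `times`.
    """
    if len(times) != len(abundances) or len(times) < 2:
        raise ValueError("need at least two matched time points")
    if any(a <= 0 for a in abundances):
        raise ValueError("abundances must be positive to take logarithms")
    logs = [math.log(a) for a in abundances]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    k = -sxy / sxx  # decay constant; positive when abundance falls over time
    if k <= 0:
        raise ValueError("no net decay in this time course")
    return math.log(2) / k


# Hypothetical time course (minutes) for an mRNA with a 2-minute half-life.
times = [0.0, 2.0, 4.0, 8.0]
abundances = [1000.0 * 0.5 ** (t / 2.0) for t in times]
print(round(half_life(times, abundances), 6))  # 2.0
```

Fitting probes near the 5' and 3' ends of an operon separately is one way such a fit could be used to compare positional decay rates.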
References
Arfin, S.M., A.D. Long, E.T. Ito, L. Tolleri, M.M. Riehle, E.S. Paegle, and G.W.
Hatfield. 2000. Global gene expression profiling in Escherichia coli K12. The
effects of integration host factor. J Biol Chem 275: 29672-29684.
Arnold, T.E., J. Yu, and J.G. Belasco. 1998. mRNA stabilization by the ompA 5'
untranslated region: two protective elements hinder distinct pathways for mRNA
degradation. RNA 4: 319-330.
Bechhofer, D. 1993. 5' mRNA Stabilizers. In Control of Messenger RNA Stability (eds.
J.G. Belasco and G. Brawerman), pp. 31-50. Academic Press, Inc., San Diego.
Bernstein, J.A., A.B. Khodursky, P.H. Lin, S. Lin-Chao, and S.N. Cohen. 2002. Global
analysis of mRNA decay and abundance in Escherichia coli at single-gene
resolution using two-color fluorescent DNA microarrays. Proc Natl Acad Sci U S
A 99: 9697-9702.
Carpousis, A.J., N.F. Vanzo, and L.C. Raynal. 1999. mRNA degradation. A tale of
poly(A) and multiprotein machines. Trends Genet 15: 24-28.
Chen, L.H., S.A. Emory, A.L. Bricker, P. Bouvet, and J.G. Belasco. 1991. Structure and
function of a bacterial mRNA stabilizer: analysis of the 5' untranslated region of
ompA mRNA. J Bacteriol 173: 4578-4586.
Ehretsmann, C.P., A.J. Carpousis, and H.M. Krisch. 1992. mRNA degradation in
procaryotes. FASEB J 6: 3186-3192.
Emory, S.A., P. Bouvet, and J.G. Belasco. 1992. A 5'-terminal stem-loop structure can
stabilize mRNA in Escherichia coli. Genes Dev 6: 135-148.
Gillespie, D. and S. Spiegelman. 1965. A quantitative assay for DNA-RNA hybrids with
DNA immobilized on a membrane. J Mol Biol 12: 829-842.
Grunstein, M. and D.S. Hogness. 1975. Colony hybridization: a method for the isolation
of cloned DNAs that contain a specific gene. Proc Natl Acad Sci U S A 72: 3961-3965.
Higgins, C., H. Causton, G. Dance, and E. Mudd. 1993. The Role of the 3' End in mRNA
Stability and Decay. In Control of Messenger RNA Stability (eds. J.G. Belasco and
G. Brawerman), pp. 13-27. Academic Press, Inc., San Diego.
Kapranov, P., S.E. Cawley, J. Drenkow, S. Bekiranov, R.L. Strausberg, S.P. Fodor, and
T.R. Gingeras. 2002. Large-scale transcriptional activity in chromosomes 21 and
22. Science 296: 916-919.
Khodursky, A.B., B.J. Peter, N.R. Cozzarelli, D. Botstein, P.O. Brown, and C. Yanofsky.
2000. DNA microarray analysis of gene expression in response to physiological
and genetic changes that affect tryptophan metabolism in Escherichia coli. Proc
Natl Acad Sci U S A 97: 12170-12175.
Krupa, B. 2002. On the Number of Experiments Required to Find the Causal Structure of
Complex Systems. J Theor Biol 219: 257-267.
Kushner, S. 1996. mRNA Decay. In Escherichia coli and Salmonella (ed. F. Neidhardt),
pp. 851-858. ASM Press, Washington.
Kushner, S.R. 2002. mRNA decay in Escherichia coli comes of age. J Bacteriol 184:
4658-4665; discussion 4657.
Lennon, G.G. and H. Lehrach. 1991. Hybridization analyses of arrayed cDNA libraries.
Trends Genet 7: 314-317.
Lockhart, D.J., H. Dong, M.C. Byrne, M.T. Follettie, M.V. Gallo, M.S. Chee, M.
Mittmann, C. Wang, M. Kobayashi, H. Horton, and E.L. Brown. 1996.
Expression monitoring by hybridization to high-density oligonucleotide arrays.
Nat Biotechnol 14: 1675-1680.
Lopez, P.J. and M. Dreyfus. 1996. The lacZ mRNA can be stabilised by the T7 late
mRNA leader in E coli. Biochimie 78: 408-415.
Mackie, G.A. 1998. Ribonuclease E is a 5'-end-dependent endonuclease. Nature 395:
720-723.
O'Hara, E.B., J.A. Chekanova, C.A. Ingle, Z.R. Kushner, E. Peters, and S.R. Kushner.
1995. Polyadenylylation helps regulate mRNA decay in Escherichia coli. Proc
Natl Acad Sci U S A 92: 1807-1811.
Petersen, C. 1993. Translation and mRNA Stability in Bacteria: A Complex Relationship.
In Control of Messenger RNA Stability (eds. J.G. Belasco and G. Brawerman), pp. 117-141.
Academic Press, Inc., San Diego.
Razin, S., D. Yogev, and Y. Naot. 1998. Molecular biology and pathogenicity of
mycoplasmas. Microbiol Mol Biol Rev 62: 1094-1156.
Richmond, C.S., J.D. Glasner, R. Mau, H. Jin, and F.R. Blattner. 1999. Genome-wide
expression profiling in Escherichia coli K-12. Nucleic Acids Res 27: 3821-3835.
Rosenow, C., R.M. Saxena, M. Durst, and T.R. Gingeras. 2001. Prokaryotic RNA
preparation methods useful for high density array analysis: comparison of two
approaches. Nucleic Acids Res 29: E112.
Sarkar, N. 1996. Polyadenylation of mRNA in bacteria. Microbiology 142 (Pt 11): 3125-3133.
Sarkar, N. 1997. Polyadenylation of mRNA in prokaryotes. Annu Rev Biochem 66: 173-197.
Schena, M., D. Shalon, R. Heller, A. Chai, P.O. Brown, and R.W. Davis. 1996. Parallel
human genome analysis: microarray-based expression monitoring of 1000 genes.
Proc Natl Acad Sci U S A 93: 10614-10619.
Selinger, D.W., K.J. Cheung, R. Mei, E.M. Johansson, C.S. Richmond, F.R. Blattner,
D.J. Lockhart, and G.M. Church. 2000. RNA expression analysis using a 30 base
pair resolution Escherichia coli genome array. Nat Biotechnol 18: 1262-1268.
Shoemaker, D.D., E.E. Schadt, C.D. Armour, Y.D. He, P. Garrett-Engele, P.D.
McDonagh, P.M. Loerch, A. Leonardson, P.Y. Lum, G. Cavet, L.F. Wu, S.J.
Altschuler, S. Edwards, J. King, J.S. Tsang, G. Schimmack, J.M. Schelter, J.
Koch, M. Ziman, M.J. Marton, B. Li, P. Cundiff, T. Ward, J. Castle, M.
Krolewski, M.R. Meyer, M. Mao, J. Burchard, M.J. Kidd, H. Dai, J.W. Phillips,
P.S. Linsley, R. Stoughton, S. Scherer, and M.S. Boguski. 2001. Experimental
annotation of the human genome using microarray technology. Nature 409: 922-927.
Tao, H., C. Bausch, C. Richmond, F.R. Blattner, and T. Conway. 1999. Functional
genomics: expression analysis of Escherichia coli growing on minimal and rich
media. J Bacteriol 181: 6425-6440.
Chapter 2
RNA expression analysis using a 30 base pair resolution Escherichia coli
genome array
Douglas W. Selinger, Kevin J. Cheung, Rui Mei, Eric M. Johansson, Craig S. Richmond,
Frederick R. Blattner, David J. Lockhart, and George M. Church
As published in Nature Biotechnology 18(12): 1262-68 (2000).
A high resolution ‘genome array’ has been developed for the study of gene
expression and regulation in Escherichia coli. This array contains on average one
25-mer oligonucleotide probe per 30 base pairs over the entire genome, with one
every 6 bases for the intergenic regions and every 60 bases for the 4,290 open
reading frames (ORFs). Two-fold concentration differences can be detected at
levels as low as 0.2 mRNA copies per cell, and differences can be seen over a
dynamic range of 3 orders of magnitude. In rich medium we detected transcripts
for 97% and 87% of the ORFs in stationary and log phases, respectively. 1,529
transcripts were found to be differentially expressed under these conditions. As
expected, genes involved in translation were expressed at higher levels in log phase,
whereas many genes known to be involved in the starvation response were expressed
at higher levels in stationary phase. Many novel growth-phase regulated genes were
identified, such as a putative receptor (b0836) and a 30S ribosomal protein subunit
(S22), both of which are highly upregulated in stationary phase. Transcription of
between 3,000 and 4,000 predicted ORFs was observed from the antisense strand,
suggesting most of the genome is transcribed at a detectable level. Examples are
also presented for high resolution array analysis of transcript start and stop sites
and RNA secondary structure.
Keywords: E. coli, stationary phase, gene expression, functional genomics, DNA chips, oligonucleotide
arrays, microarrays
The ability to simultaneously measure RNA abundance for large numbers of
genes has revolutionized biological research by allowing the analysis of global gene
expression patterns. Oligonucleotide arrays have been used to examine differential gene
expression in many organisms, including yeast, human, mouse, and bacteria¹⁻⁵. Various
analytical approaches have been developed and applied to these datasets to further
characterize transcriptional regulation and the connectivity of genetic networks⁶⁻¹⁰.
Global gene expression analyses in prokaryotes have lagged behind those in eukaryotes
in part because of the lack of polyadenylation of prokaryotic mRNA, which has thwarted
separation or selective labeling of mRNA in the presence of the much more abundant
tRNA and rRNA¹,¹¹⁻¹³.
We describe here a ‘genome array,’ on which both coding and non-coding regions
of the Escherichia coli genome are represented, and describe a genome-wide analysis of
RNA at sub-transcript level resolution. A labeling protocol was developed based on
random priming of total RNA which is reproducible, quantitative over 3 orders of
magnitude, and sufficiently sensitive to detect as few as 0.2 copies per cell. When used
to compare gene expression in log versus stationary phase, this method yields results
which both agree with the literature and identify novel sets of co-regulated genes. We
also present evidence that sub-transcript level resolution paired with complete genomic
representation of E. coli on the array allows for analysis of operon structure,
identification of small RNAs and antisense RNAs, and some aspects of RNA secondary
structure.
Results and discussion
Array design. The array consists of a 544 by 544 grid of 24 x 24 micron regions that
each contain ~10⁷ copies of selected 25-mer oligonucleotides (295,936 total) of defined
sequence. The oligonucleotides on the array are synthesized in situ on a derivatized glass
surface using a combination of photolithography and combinatorial chemistry²,¹⁴. Probe
oligonucleotides are arranged in pairs, or probe pairs, one of which is perfectly
complementary to the target sequence (the perfect match, or PM oligonucleotide) and one
with a single base mismatch at the central position (the mismatch, or MM
oligonucleotide) which serves as a control for nonspecific hybridization.
Oligonucleotides on the array are further organized into groups, or probe sets, which are
complementary to different regions of the same putative transcript. Probe sets are present
for 4,403 'b-numbers', which include all 4,290 predicted ORFs¹⁵, as well as all rRNAs
and tRNAs. Both strands of intergenic regions at least 40 bp in length are represented
whereas only the strand predicted to be transcribed is represented for the ORFs. Most
probe sets have 15 probe pairs, although certain selected RNAs, such as lpp and Bacillus
subtilis control transcripts have 60 or more.
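The probe-pair layout can be illustrated by generating an MM probe from its PM partner. This is a minimal sketch, assuming (as in standard Affymetrix designs, though not stated explicitly here) that the central 13th base of the 25-mer is replaced by its Watson-Crick complement; `mismatch_probe` and the example sequence are hypothetical and not part of GAPS.

```python
# Sketch: build a mismatch (MM) probe from a perfect-match (PM) 25-mer.
# Assumption: the MM differs from the PM only at the central (13th) base,
# which is replaced by its Watson-Crick complement.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
PROBE_LENGTH = 25


def mismatch_probe(pm_probe: str) -> str:
    """Return the MM partner of a 25-mer PM probe."""
    pm_probe = pm_probe.upper()
    if len(pm_probe) != PROBE_LENGTH:
        raise ValueError(f"expected a {PROBE_LENGTH}-mer, got {len(pm_probe)} bases")
    mid = PROBE_LENGTH // 2  # 0-based index 12, i.e. the 13th base
    return pm_probe[:mid] + COMPLEMENT[pm_probe[mid]] + pm_probe[mid + 1:]


# The 544 x 544 feature grid accounts for the 295,936 features on the array.
GRID_SIDE = 544
assert GRID_SIDE * GRID_SIDE == 295_936

pm = "ACGTACGTACGTACGTACGTACGTA"  # hypothetical probe sequence
print(pm)
print(mismatch_probe(pm))
```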
Oligonucleotides are arranged in alternating rows of PM and MM features (Fig.
1). The top half of the array contains oligonucleotides targeting ORFs and miscellaneous
untranslated RNAs, and the bottom half targets intergenic regions. The extreme bottom
has probes for tRNAs and rRNAs. A biotinylated control oligonucleotide is added to the
hybridization mixture and binds to the checkerboard border, corners, the AFFX-E COLI1 logo, and 100 pairs of features in a regularly spaced grid across the array. These
patterns are used for grid alignment and to correct for spatial variations in array
brightness (see Experimental Protocols).
Choice of a metric for RNA abundance. Signals from the 15 probe pairs in each probe
set must be quantitated and combined into a measure of RNA concentration. The
significant systematic differences in signal within a probe set for a given RNA led us to
investigate metrics which used different regions of the signal distribution, in addition to
the previously reported "average difference" metric²,³, or AD, which uses the mean of all
PM-MMs after outliers are discarded. When probe pairs of the probe sets were ranked by
intensity difference (PM-MM), and probe pairs of different ranks were used to represent
the entire probe set, we found that the number of genes detected increased as brighter
probe pairs were used. An exception was the brightest probe pair, which gave fewer
detected transcripts because of the high variability of the maximal probe pair of the
negative controls. Transcripts were considered detected if the probe pair intensity
difference of a given rank was at least 3 standard deviations above the mean of probe
pairs of the same rank taken from control probe sets for which no transcript was present
(see Experimental Protocols). Using the second maximal probe pair, 87% of the ORFs
were detected in log phase, compared to 23% for the maximal and 70% for the third
maximal. The use of the second maximal signal also led to the detection of more RNAs
than measures of central tendency such as the median intensity (20%) and AD (18%).
We therefore chose to use the second maximal probe pair intensity, or '2max', as a metric
for RNA abundance.
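The rank-based metric and detection rule described above can be sketched as follows. Assumptions: each probe set is supplied as parallel lists of PM and MM intensities, negative controls are probe sets whose transcripts are absent, and the sample standard deviation is used. The function names and the control values are hypothetical; this is not the GAPS code.

```python
import statistics


def rank_value(pm: list[float], mm: list[float], rank: int) -> float:
    """PM - MM intensity difference of the probe pair at a given rank (1 = brightest)."""
    diffs = sorted((p - m for p, m in zip(pm, mm)), reverse=True)
    if not 1 <= rank <= len(diffs):
        raise ValueError("rank out of range for this probe set")
    return diffs[rank - 1]


def two_max(pm: list[float], mm: list[float]) -> float:
    """'2max': the second-largest PM - MM difference in a probe set."""
    return rank_value(pm, mm, 2)


def is_detected(pm, mm, negative_controls, rank=2, n_sd=3.0):
    """Detected if the rank-th PM - MM difference is at least n_sd standard deviations
    above the mean of same-rank values from negative-control probe sets.

    negative_controls is a list of (pm, mm) tuples for probe sets whose
    transcripts are assumed absent.
    """
    control_values = [rank_value(c_pm, c_mm, rank) for c_pm, c_mm in negative_controls]
    threshold = statistics.mean(control_values) + n_sd * statistics.stdev(control_values)
    return rank_value(pm, mm, rank) >= threshold


# Hypothetical negative-control probe sets (three probe pairs each).
controls = [
    ([110, 105, 120], [100, 100, 100]),
    ([130, 100, 115], [100, 100, 100]),
    ([100, 112, 104], [100, 100, 100]),
]
print(two_max([500, 400, 300, 200], [100, 100, 100, 100]))  # 300
print(is_detected([500, 400, 300, 200], [100, 100, 100, 100], controls))  # True
```

Using a fixed rank rather than the maximum keeps a single bright, possibly cross-hybridizing probe pair from deciding the call, which matches the observation that the brightest pair was the most variable among negative controls.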
The three metrics investigated: 2max, the median, and AD, had a sensitivity of
less than 0.2 copies/cell, were approximately linear for relative changes less than 10-fold,
nonlinear over a dynamic range of 3 orders of magnitude, and were about equally precise
(R ≈ 0.94) (Fig. 2). The lowest concentration of RNA at which a 2-fold concentration change
could be detected was a change from 0.2 to 0.4 copies per cell, which was called
significant in 4/4 probe sets with an average measured fold change of 1.65 ± 0.35. We
detected spiked RNAs from 100% (12/12) of probe sets at 0.2 copies/cell and 25% (2/8)
at 0.02 copies/cell.
Stationary Phase vs. Log Phase Expression Analysis. We compared the expression
profiles of cells grown in rich media (LB) to either mid-log phase (OD600 = 0.6) in a
fermentor or to late stationary phase in an overnight shaken culture. As expected, log
phase cells showed increased RNA levels for genes involved in protein synthesis
(rRNAs, tRNAs, and ribosomal proteins) and cell membrane synthesis (lpp) while
stationary phase cells showed increases in stress/starvation response genes such as dps
and rmf. Of 69 genes known to be differentially regulated in stationary phase¹⁶, 22
were called significantly changed in agreement with the literature (Table 1). One
gene, rpoH, which is known to be regulated post-transcriptionally¹⁷, was called
significantly changed in the reverse direction from that reported. The remaining 46 were
not significantly changed. Some discrepancies and apparent "missed" changes are
expected because most of the changes reported in the literature were detected at the
protein level (usually by activity of lacZ fusions) and the correlation between gene
transcript levels and protein product activity is expected to be imperfect. A notable
transcript which was not called changed is the gene for the stationary phase sigma factor,
rpoS. This is expected because the transcript is known to peak in early stationary phase
and decrease thereafter, and therefore may not be significantly elevated by late stationary
phase. RpoS is also known to be regulated at the level of translation and protein
stability¹⁸. However, the mRNA levels of 16 genes known to be rpoS-regulated are
increased in stationary phase, suggesting that rpoS activity has, in fact, increased.
Altogether, there were 1,529 RNAs (including tRNAs and rRNAs) whose
abundance significantly changed (see Experimental Protocol), which represents about
35% of the putative 4,403 RNAs in the genome. 926 were increased in stationary phase
and 603 were decreased. Of these, 77% were changed by more than 2-fold. It is unclear
how many of these changes have biological significance and whether the size of the
absolute change (copies per cell) or relative change is more important in the regulation of
genetic networks, although it is likely to be gene- and condition-dependent. For genes
with post-transcriptional regulation, changes in transcript level may have little effect on
the final activity of the gene product. Still, the sheer number of changes detected
suggests there are many transcriptionally regulated genes important for adaptation to
stationary phase, or stresses in general, which have previously gone unrecognized. It is
interesting to note that of the 25 RNAs most increased in stationary phase (ranked by
absolute change), 14 are genes of unknown function (Table 2). This includes a gene
(b0836), annotated as a putative receptor¹⁹, which was measured to increase in stationary
phase by more than 1000-fold, and the 30S ribosomal protein subunit S22, which increased
48-fold. Also found in the top 10 most increased in stationary phase are yjbJ, hdeA, and
dps, whose protein products were reported to be the first, sixth, and fifth most abundant in
stationary phase, respectively²⁰. Of the 10 genes of "known" function, only 3 were
already known to be increased in stationary phase. The complete results of this analysis
are in an expression database²¹,²².
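The distinction between absolute change (copies per cell) and relative change discussed above can make the same data rank differently. The sketch below computes both and counts RNAs changed more than 2-fold. It assumes abundances have already been converted to copies per cell, are all greater than zero, and belong to RNAs already called significantly changed; the function names and gene values are hypothetical.

```python
def compare_conditions(log_phase: dict[str, float], stationary: dict[str, float]):
    """Return (absolute, fold) change dicts, stationary relative to log phase.

    Assumption: values are abundances in copies per cell, all greater than zero.
    """
    absolute = {g: stationary[g] - log_phase[g] for g in log_phase}
    fold = {g: stationary[g] / log_phase[g] for g in log_phase}
    return absolute, fold


def top_increased(absolute: dict[str, float], n: int) -> list[str]:
    """Names of the n RNAs with the largest absolute increase in stationary phase."""
    return sorted(absolute, key=absolute.get, reverse=True)[:n]


def fraction_beyond_two_fold(fold: dict[str, float]) -> float:
    """Fraction of RNAs changed more than 2-fold in either direction."""
    changed = [g for g, f in fold.items() if f > 2.0 or f < 0.5]
    return len(changed) / len(fold)


# Hypothetical abundances in copies per cell.
log_phase = {"geneA": 0.5, "geneB": 10.0, "geneC": 4.0, "geneD": 2.0}
stationary = {"geneA": 6.0, "geneB": 12.0, "geneC": 1.0, "geneD": 3.0}
absolute, fold = compare_conditions(log_phase, stationary)
print(top_increased(absolute, 2))  # ['geneA', 'geneB']
print(fraction_beyond_two_fold(fold))  # 0.5
```

In this toy data geneB ranks second by absolute increase yet changes only 1.2-fold, illustrating why the two rankings can disagree.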
Novel Applications of a Genome Array: Identification of Small and Antisense RNAs
Inclusion of probes for predicted intergenic regions allows genome-wide scanning
for previously unidentified RNAs (Fig. 3). csrB, a small (360 bases) untranslated RNA
that is known to be abundant in stationary phase²³ but was not present in our
annotation database, was easily detected by probes targeting the region between loci
b2793 and b2792.
Genome arrays made by in situ synthesis of oligonucleotides also present an
opportunity for the identification of antisense RNAs. By simply inverting the synthesis, a
complementary array can be synthesized which contains probes that will bind to antisense
RNAs²⁴. Hybridization of a stationary phase sample to such a reverse complement chip
resulted in the detection of antisense transcription of between 3,000 and 4,000 predicted
ORFs, suggesting that there is a low level of transcription throughout the E. coli genome.
The physiological significance of this transcription is unclear. An example of a detected
antisense RNA is b1365 (Fig. 3B), a predicted ORF located in the Rac prophage. This
transcript may be from an overlapping gene encoded on the opposite strand, a common
occurrence in phage and viruses. Alternatively, it could result from read-through
transcription of an upstream IS5 insertion. Consistent with this is the detection of IS5
transcription as well as antisense transcripts for the intervening ORFs, b1366 - b1369.
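"Inverting the synthesis" amounts to using the reverse complement of every probe sequence. The sketch below shows why the resulting probes detect antisense RNA; it is a hypothetical helper, not the Affymetrix design pipeline, and it assumes probes contain only A, C, G, and T.

```python
# Complement table for DNA bases. Assumption: probes contain only A, C, G, T;
# any other character would pass through translate() unchanged.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def antisense_probes(sense_probes: list[str]) -> list[str]:
    """Probes for a 'reverse complement' array.

    A sense-array probe is complementary to the sense RNA, so its reverse
    complement has the sense sequence and can pair with an antisense transcript.
    """
    return [reverse_complement(p) for p in sense_probes]


print(reverse_complement("ATGCCG"))  # CGGCAT
```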
It is important to note that transcription at a given locus may be part of a long 5'
or 3' UTR, a spacer within an operon, an untranslated RNA, an ORF, or the result of an
incorrectly predicted ORF start or stop site. The ability to establish transcript start and
stops would aid in the interpretation of these RNAs, and is discussed in the next section.
Sub-transcript resolution
The large number of oligonucleotides (295,936) on the array allowed transcripts
to be probed at high resolution. Intergenic regions were probed, on average, every 6
bases whereas ORFs, and known RNAs were probed on average every 60 bases. This
makes it possible to obtain reasonably high-resolution information on transcript starts and
stops and operon structure.
Analysis of oligonucleotide probes for selected transcripts revealed a large
amount of intensity variation across the probes within a probe set, but also a striking
consistency to the patterns (Fig. 4). A highly reproducible pattern was seen for all probe
sets inspected. The intensity variation is likely due to sequence-dependent differences in
hybridization affinity and accessibility and to the effects of secondary structure on
hybridization. The similarity of the patterns obtained using RNA samples labeled by
random primers and genomic DNA labeled directly with terminal transferase suggests
that the pattern is not a result of variations in priming or labeling efficiency. The signal
pattern correlates well with regions of experimentally confirmed RNA secondary
structure, such as the ompA 5' stem-loop²⁵ (data not shown), but poorly with G/C content
or hypothetical hairpin formation of the probe oligonucleotides²⁶,²⁷. It is currently being
investigated whether the signal is correlated with other predicted local RNA secondary
structures. It has been shown that secondary structure can strongly affect oligonucleotide
hybridization²⁴,²⁸. Locations of known secondary structures in the lpp and rpsO 3' UTRs
are highlighted in Fig. 4. It must be noted, however, that lack of signal may indicate
early transcription termination. Signal from flanking regions and/or independent
information about transcription starts and stops can be used to rule out this possibility.
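One simple way to quantify how similar two probe-level patterns are (for example, random-primed RNA versus terminal-transferase-labeled genomic DNA) is to correlate log intensities across the probes of a probe set. The text does not specify how the patterns were compared, so this Pearson-on-log sketch, its flooring constant, and the function names are assumptions.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pattern_similarity(profile_a, profile_b, floor=1.0):
    """Correlate log10 probe-intensity profiles from two hybridizations of one probe set.

    Values are floored at `floor` (an assumption) so that zero or negative
    PM - MM differences can be log-transformed.
    """
    la = [math.log10(max(v, floor)) for v in profile_a]
    lb = [math.log10(max(v, floor)) for v in profile_b]
    return pearson(la, lb)


# Hypothetical profiles: the second is the first scaled 3-fold (same pattern).
a = [100.0, 400.0, 50.0, 800.0]
b = [300.0, 1200.0, 150.0, 2400.0]
print(round(pattern_similarity(a, b), 6))  # 1.0
```

Working on a log scale makes a uniform difference in overall labeling efficiency between the two samples a constant offset, which the correlation ignores, so only the shape of the probe-to-probe pattern is compared.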
Analysis of transcription in predicted intergenic regions allows 5' and 3' UTRs to
be mapped. Transcriptional start and stops derived from array data for lpp and rpsO (Fig.
4) agree well with those determined with other methods. Lpp is known to be transcribed
from -33 to 284, ending in a hairpin²⁹,³⁰, and rpsO starting from -100 and continuing
through a 3' stem-loop structure into pnp, with which it is co-transcribed³¹. To map
transcription endpoints with the array, the ability of each oligonucleotide to hybridize to
its target was determined. Oligonucleotides were considered 'reliable' if, when
hybridized to genomic DNA, their intensity difference (PM-MM) was at least 3 standard
deviations above noise. Oligonucleotides below this cut-off are referred to as 'unreliable'.
Transcription was considered detectable at positions which had reliable oligonucleotides
if the mean intensity difference at that position was greater than its standard deviation.
Signal from lpp was detected starting between oligonucleotides centered at positions -30
and -37 and can be detected until the last reliable probe at position 250. The probes from
274 to 284 are unreliable and correspond to the location of a known hairpin.
Transcription of rpsO is first detected at position -94 and begins no earlier than -117, the
first reliable oligonucleotide for which no transcription is detected. RpsO transcription is
detected, albeit irregularly, throughout the 3' UTR, where it presumably continues into
pnp. Probes for pnp, however, are located only at the 3' end of the ORF so this
continuation was not directly observed.
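The endpoint-mapping rules above (reliable probes defined from genomic DNA hybridization, transcription called where the mean intensity difference exceeds its standard deviation) can be sketched as follows. Assumptions: the mean and standard deviation are taken over replicate RNA hybridizations, "noise" is summarized by a mean and standard deviation supplied by the caller, and positions are coordinates such as offsets from the ORF start. The function names and all numbers are hypothetical; the positions merely echo the rpsO example.

```python
import statistics


def call_positions(probes, noise_mean, noise_sd, n_sd=3.0):
    """Return {position: transcribed?} for 'reliable' probes only.

    probes maps a position to (genomic_diff, rna_diffs): the PM - MM difference
    with genomic DNA and a list of PM - MM differences from replicate RNA
    hybridizations (an assumption about what the statistics are taken over).
    """
    calls = {}
    for pos, (genomic_diff, rna_diffs) in sorted(probes.items()):
        if genomic_diff < noise_mean + n_sd * noise_sd:
            continue  # 'unreliable' probe: cannot report on transcription
        calls[pos] = statistics.mean(rna_diffs) > statistics.stdev(rna_diffs)
    return calls


def start_interval(calls):
    """Bracket the transcription start between the last reliable silent probe and
    the first detected probe; either bound may be None."""
    positions = sorted(calls)
    detected = [p for p in positions if calls[p]]
    if not detected:
        return None, None
    first = detected[0]
    silent_before = [p for p in positions if p < first]  # all silent by construction
    return (silent_before[-1] if silent_before else None), first


# Hypothetical probe data: position -> (genomic DNA PM-MM, replicate RNA PM-MM values).
probes = {
    -130: (500.0, [0.0, 5.0, -5.0]),
    -117: (500.0, [2.0, -1.0, 1.0]),
    -105: (10.0, [300.0, 310.0, 290.0]),  # unreliable: weak with genomic DNA
    -94: (500.0, [200.0, 210.0, 190.0]),
    -50: (500.0, [400.0, 380.0, 420.0]),
}
calls = call_positions(probes, noise_mean=0.0, noise_sd=20.0)
print(start_interval(calls))  # (-117, -94)
```

An end interval can be bracketed the same way from the last detected probe and the next reliable silent probe; unreliable probes, such as those over a strong hairpin, simply leave gaps.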
RpsO and pnp are co-transcribed and contain a structured attenuator sequence
between them which causes a high frequency of rho-independent termination before the
pnp coding region. This structured region also serves as a 3' stabilizer for rpsO and a 5'
stabilizer for pnp and is targeted by RNase E and RNase III, which lead to rapid
degradation of both rpsO and pnp RNAs³²,³³. RpsO was seen to increase 400-fold in log
phase, the largest relative fold increase in log phase, whereas pnp showed no change.
Interestingly, the oligonucleotide hybridization pattern shows some differences between
log and stationary phase toward the 3' end of rpsO (Fig. 4B). This region lies between two
known RNase III sites, and its signal is increased in stationary phase relative to the other probe pairs
in the probe set, perhaps indicating that RNase III processing at this site is increased in
stationary phase, leading to a decrease in local RNA secondary structure and increased
hybridization to the array.
Oligonucleotide Arrays and Cross-Hybridization. Considerably more cross-hybridization is observed on E. coli arrays than on eukaryotic arrays, presumably because
of the presence of large amounts of labeled rRNA and tRNA. Because perfect match
(PM) features are tiled immediately above their mismatch (MM) counterparts, PM and
MM features of equal intensity appear as rectangles in the image. These can be seen
throughout the array images (Figs 1B-D). If the MM feature were not used, a large
number of cross-hybridizing PM oligonucleotides would be included in the analysis and
increase the noise of the system. The combination of MM signal subtraction and removal
of outliers has proven effective in quantifying RNA abundance changes with
oligonucleotide arrays2. We considered using MM features to identify cross-hybridizing
PM features, discarding them, and then using the raw PM intensities of the remaining
features to derive abundance measures. Our preliminary analysis suggested that this
approach yields results similar to those using PM-MM, so we did not pursue this line
further.
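A small Python sketch can contrast the two strategies compared above. The first is PM-MM subtraction. The second uses MM only to flag and discard cross-hybridizing PM features, then keeps raw PM intensities. The flagging rule here is a hypothetical choice for illustration only: an MM partner reaching 80% of the PM intensity. The text does not specify the rule that was tried.

```python
from statistics import mean

def pm_minus_mm(pairs):
    """Per-pair intensity differences (PM - MM)."""
    return [pm - mm for pm, mm in pairs]

def filtered_pm(pairs, ratio=0.8):
    """Alternative considered in the text: use MM only to flag cross-hybridizing
    PM features, discard them, and keep the raw PM intensities of the rest.
    The `ratio` cut-off is a hypothetical choice for illustration."""
    return [pm for pm, mm in pairs if mm < ratio * pm]

# Hypothetical (PM, MM) intensities for one probe set; the second pair shows the
# equal-intensity "rectangle" pattern typical of cross-hybridization.
pairs = [(900, 200), (5000, 4900), (700, 150), (1100, 300)]
print(mean(pm_minus_mm(pairs)), mean(filtered_pm(pairs)))
```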
The Future of Genome Arrays. The noise present in a high-complexity hybridization
reaction encourages the use of more rigorous statistics to determine the significance of
probe signal patterns. Corrections for systematic noise due to cross-hybridization,
variability in probe efficiency, and spatial variability across the array surface can be used
to increase the sensitivity and precision of the data. Because of the complexity of the
factors influencing array signal, internal negative controls, such as probe sets which
target RNAs that are not present, may be the best way to estimate the amount of signal
which can be expected from all factors besides specific hybridization. Replicate array
expression experiments, in combination with array hybridizations of genomic DNA, can
be used to extract information from single oligonucleotides, allowing transcripts to be
mapped at high resolution. The ability to interpret genome-wide transcription data at 10-100 base pair resolution has many potential applications for the study of gene regulation
in both prokaryotes and eukaryotes, including the identification of alternative promoters and
the experimental identification of transcribed regions that are missed by ORF-predicting algorithms, a problem that is becoming more urgent as annotators deal with
the difficult task of predicting genes in higher eukaryotic genomes34.
There are a number of advantages of arrays which use short single-stranded
probes over those which utilize longer double-stranded DNAs35, 36. These advantages
include higher resolution, better cross-hybridization controls, potential for paralog
discrimination, splice variant identification, and strand-specific transcript detection.
DNA arrays with probes covering entire genomes, rather than just ORFs, are a logical
step in the evolution of arrays. Inclusion of intergenic regions allows arrays to be used as
readouts for techniques which enrich for DNA sequences of interest, such as protein-bound sequences using Whole-Genome In vivo Methylase Protection37 or ChIP
(Chromatin Immuno-Precipitation)38, 39. If the probes are double-stranded, such arrays could
also be used as a direct in vitro assay of DNA-protein interactions40. Genome arrays should also be
useful for genotyping both ORF and promoter sequences41, 42. Integration of these data
into an understanding of genetic networks and cell physiology will remain a central
challenge in the post-genomic era.
Experimental protocol
Cell Culture. E. coli MG1655 was grown to mid-log phase in LB in a fermentor at 37 °C
with constant aeration of 11 liters/min and agitation of 300 rpm. Stationary phase
cultures were grown overnight at 37 °C in culture flasks containing LB, aerated by
shaking at 225 rpm. The log phase culture was sampled in duplicate and the stationary
phase culture was sampled once. Each log phase duplicate was labeled once, and the
single stationary phase RNA sample was labeled twice independently.
RNA Preparation. RNA was prepared by acid phenol:chloroform extraction. Briefly,
samples of culture were transferred directly into acid phenol:chloroform (5:1; Ambion,
Austin, TX) at 65 °C to ensure rapid lysis and inactivation of RNases. Two additional
acid phenol:chloroform extractions were performed, followed by ethanol precipitation,
treatment with 1.25 U of DNase I (Gibco BRL) and 20 µg of proteinase K (Boehringer
Mannheim, Mannheim, Germany) per ml of culture, and a final ethanol precipitation. The
pellet was then washed with 70% ethanol, resuspended in DEPC-treated water, quantified
by A260, and visualized on a denaturing polyacrylamide gel. We subsequently found that
contaminating salts and sugars from the medium were inhibiting the reverse transcription
reaction used to make labeled cDNA. The yield was dramatically improved (see below)
by removing salts and sugars after the first precipitation with three passes through
Centricon PL-20 concentrator columns (Centricon, Beverly, MA), which have a cut-off of
about 30 bases, and diluting the concentrate with DEPC-treated water.
cDNA synthesis, biotinylation. The protocol currently supported by Affymetrix for
prokaryotic expression analysis was not available at the time of this study, and only limited
direct comparison has been made with the protocol used here. In our labeling protocol,
1.5 mg* of total RNA was fragmented in a high-Mg2+ buffer (40 mM Tris-acetate, pH
8.1, 100 mM KOAc, 30 mM MgOAc) at 94 °C for 30 min in the presence of random
octamers (6.7 mM) and 4 control RNAs generated by in vitro transcription (B. subtilis
dapB, thrB, lysA, and pheB). After fragmentation the sample was put immediately on ice.
The reaction was then diluted two-fold into the following reverse transcription reaction:
1X Superscript II buffer, dNTPs (1.3 mM), DTT (10 mM), and 3,000 units of Superscript II
Reverse Transcriptase (Gibco BRL), incubated at 42 °C for 3 h. RNA was then degraded
by treatment with 135 units of RNase One (Promega, Madison, WI). RNase One was then
heat-inactivated, and unincorporated nucleotides and random octamers were removed with
Centrisep Spin Columns (Princeton Separations, Adelphia, NJ). This reaction typically
yields ~30 µg of first-strand cDNA. Of this, 10 µg was biotinylated with 30 units of
Terminal Deoxynucleotidyl Transferase (TdT; Gibco BRL) and 50 µM Biotin-N6-ddATP
(Dupont NEN, Boston, MA) in 1X One-Phor-All buffer (Pharmacia, Piscataway, NJ) and
incubated at 37 °C for 2 h. Genomic DNA was fragmented with DNase I (Promega; 1.1 U
per µg of DNA) in 1X One-Phor-All buffer to an average size of 100 bp and then
biotinylated with TdT as above. Ten micrograms of biotinylated cDNA or gDNA was then
hybridized to an E. coli array (Affymetrix, Santa Clara, CA) at 45 °C for 40 hours, washed,
and stained with streptavidin-phycoerythrin (Molecular Probes, Eugene, OR). Arrays used
for expression analysis are denoted "antisense" by Affymetrix because they contain probes
that bind to the reverse complement of the transcript, e.g. cDNA, whereas "sense" arrays
(Part# 900284) bind to the transcripts themselves. Antisense arrays are not yet
commercially available. Note, however, that the commercially available sense chips can
be used to analyze both strands: Affymetrix's RNA labeling protocol can be used for
expression analysis, and our cDNA labeling protocol for reverse complement analysis. In
this article, we refer to antisense arrays as "expression arrays" and to sense arrays as
"reverse complement arrays". Most arrays were scanned after a single staining, but one
stationary phase array and the reverse complement array were signal-amplified with a
biotinylated anti-streptavidin antibody, followed by a second streptavidin-phycoerythrin
staining, according to standard Affymetrix protocols. This amplification increased the
signal/noise ratio about 2- to 3-fold but did not significantly increase the number of
transcripts detected. The arrays were then scanned with an HP-Affymetrix array scanner.
*Note: 50 g of column-purified total RNA (RNA preparation section) yielded >10 g of
cDNA, enough for an array hybridization. Taking into account a 67% loss from the
Centricon columns, 150 g of RNA from a phenol:chloroform prep is enough for an
array experiment. This hybridization sample can be recovered and re-used at least 3
times without significant loss of signal3. The use of Centricon columns caused no
noticeable changes in the nature of the resulting array data.
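The input estimate in the note follows from a simple loss correction. To end up with 50 µg of RNA after a step that loses 67% of the material, one needs 50 / (1 - 0.67) ≈ 152 µg of input, which the note rounds to 150 µg. The helper below is a hypothetical name used only to make the arithmetic explicit.

```python
def required_input_ug(needed_after_column_ug, fractional_loss):
    """Input RNA (µg) needed so that `needed_after_column_ug` survives a
    purification step that loses `fractional_loss` of the material."""
    if not 0 <= fractional_loss < 1:
        raise ValueError("fractional_loss must be in [0, 1)")
    return needed_after_column_ug / (1.0 - fractional_loss)

# 50 µg of column-purified RNA is enough for one hybridization;
# the Centricon step loses about 67% of the input.
print(round(required_input_ug(50, 0.67), 1))  # prints 151.5
```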
Data processing and normalization. Background was determined using GeneChip 3.2,
which divides the array into 16 sectors and takes the average of the lowest 2% of features
of each sector. After background subtraction, mismatch features were subtracted from
perfect match features, and the resulting difference was multiplied by a scaling factor
derived from GeneChip software. For spiked control RNAs, the scaling factor was
derived by setting the mean average difference of the 16S ribosomal probe sets to 50,000. For the log
vs. stationary phase analysis, intensities were scaled so that the mean average difference
for all probe sets was 5,000 units. All array analyses after the derivation of background
and scaling factors were done with a set of Perl scripts which we have dubbed "Genome
Array Processing Software" or "GAPS". GAPS takes ".CEL" files, generated by
GeneChip, as input. GAPS and the .CEL files used in this study can be found at Express
DB22.
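The background and scaling steps can be approximated in Python. This is a simplified sketch, not GeneChip 3.2 or GAPS. It assumes the array is given as a 2-D list whose dimensions divide evenly into 4 x 4 sectors. Each feature uses its own sector's background, with no smoothing between sectors; GeneChip's actual handling may differ. The 5,000-unit target corresponds to the log vs. stationary comparison.

```python
import math
from statistics import mean

def sector_backgrounds(grid, zones=4, frac=0.02):
    """Background per sector: mean of the lowest `frac` of feature intensities.

    grid is a 2-D list of raw intensities whose dimensions are assumed to be
    divisible by `zones` (zones x zones sectors, 16 for zones=4).
    """
    n_rows, n_cols = len(grid), len(grid[0])
    if n_rows % zones or n_cols % zones:
        raise ValueError("grid dimensions must be divisible by zones")
    h, w = n_rows // zones, n_cols // zones
    backgrounds = {}
    for zr in range(zones):
        for zc in range(zones):
            values = sorted(grid[r][c]
                            for r in range(zr * h, (zr + 1) * h)
                            for c in range(zc * w, (zc + 1) * w))
            k = max(1, math.ceil(frac * len(values)))  # use at least one feature
            backgrounds[(zr, zc)] = mean(values[:k])
    return backgrounds

def background_subtract(grid, backgrounds, zones=4):
    """Subtract each feature's sector background (no smoothing between sectors)."""
    h, w = len(grid) // zones, len(grid[0]) // zones
    return [[value - backgrounds[(r // h, c // w)] for c, value in enumerate(row)]
            for r, row in enumerate(grid)]

def scaling_factor(average_differences, target=5000):
    """Factor bringing the mean average difference of all probe sets to `target`."""
    return target / mean(average_differences)
```

PM-MM differences would then be computed on the background-subtracted intensities and multiplied by the scaling factor. Because PM and MM features are adjacent and share a sector, the background largely cancels in the difference.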
The array contains a regularly spaced 10 x 10 grid of control feature pairs which
all hybridize to the same control oligonucleotide and should thus be of equal intensity.
However, we found that the fluorescence intensity of these features typically varied about
2- to 3-fold across the surface of the array, possibly because of local differences in
washing/staining efficiencies. To correct for this spatial variation, the control grid was
used to estimate local deviations in fluorescence intensity. First, each pair of controls was
averaged. Then each experimental feature was multiplied by a correction factor derived
from the control features, representing the relative brightness of its region. Control
features closer to the probe pair contributed more to the final correction factor than
distant ones. This correction factor was determined by the following equation:
$$\text{Correction Factor} = \frac{\bar{c}}{\displaystyle\sum_{i=1}^{4} \left( \frac{1/d_i}{\sum_{j=1}^{4} 1/d_j} \right) c_i}$$
where $d_i$ (or $d_j$) is the Euclidean distance from the PM feature to the $i$-th (or $j$-th)
of its 4 closest control features, $c_i$ is the intensity of control feature $i$, and $\bar{c}$ is
the mean of all control features on the array.
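The correction factor is the array-wide control mean divided by an inverse-distance-weighted average of the 4 nearest control intensities. The Python sketch below computes it under stated assumptions. Control positions are given as (x, y) feature coordinates. Each control's intensity is the average of its pair. A tiny floor on the distance guards against division by zero. The function name is illustrative, not taken from GAPS.

```python
import math
from statistics import mean

def correction_factor(feature_xy, controls, k=4):
    """Inverse-distance-weighted spatial correction for one PM feature.

    controls: list of ((x, y), intensity) for the averaged control pairs.
    Returns c_bar / sum_i w_i * c_i over the k nearest controls,
    with weights w_i = (1/d_i) / sum_j (1/d_j).
    """
    c_bar = mean(intensity for _, intensity in controls)
    nearest = sorted(controls, key=lambda ctrl: math.dist(feature_xy, ctrl[0]))[:k]
    # Guard against a zero distance (feature coinciding with a control).
    inv_d = [1.0 / max(math.dist(feature_xy, xy), 1e-12) for xy, _ in nearest]
    total = sum(inv_d)
    local = sum(w / total * c for w, (_, c) in zip(inv_d, nearest))
    return c_bar / local

# Hypothetical control grid: a locally bright region around (5, 5)
# plus one dim control far away that still lowers the array-wide mean.
controls = [((0, 0), 200), ((0, 10), 200), ((10, 0), 200),
            ((10, 10), 200), ((100, 100), 0)]
raw = 1000
print(raw * correction_factor((5, 5), controls))
```

A feature in a region brighter than the array average gets a factor below 1, which dims it toward the array-wide level.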
RNA abundance metrics: average difference and 2max. Five control RNAs from
Bacillus subtilis, each with 4 probe sets on the array, were analyzed at concentrations
ranging from ~20 to ~0.0002 copies/cell, and at zero concentration (no RNA) as a
negative control. These control RNAs were spiked into total cellular RNA before
labeling. A total of 100 independent pairwise comparisons were made. Copies/cell was
estimated by assuming that cells contain approximately 60 femtograms of total RNA43.
Copies per cell can be recalculated for different total RNA contents, which normally
range from 20 to 200 femtograms/cell. For example, 1 copy per cell in a cell with 60
femtograms of total RNA is equivalent to 2 copies per cell in a cell with 120 femtograms.
The average transcript size of our spiked RNAs was 4.6 kb. Probe pairs were averaged over
duplicates and then ranked by their mean intensity difference (PM-MM). The
total-intensity-normalized values reported in the tables and the online datafile are
approximately 90% of the ribosomal-normalized values of Figure 2A. The relationship
between fluorescent signal and copies/cell is given by the equations of the regression
lines of Figure 2A:
2max Signal = 13000 * ln(Copies/Cell) + 39000, R² = 0.76
Median Signal = 5500 * ln(Copies/Cell) + 16000, R² = 0.80
Average Difference Signal = 6000 * ln(Copies/Cell) + 18000, R² = 0.86
Conversions between fluorescence intensity and copies/cell should be used with
extreme caution. In addition to the cell-size issues noted above, there is a significant amount
of error introduced by the large variability of probe signal, such that probes whose target
RNA is present at equal concentration will have variable raw fluorescence intensity (see
Fig. 2A). Experiments are in progress to use a hybridization of genomic DNA (where all
genes are equimolar) to calibrate this conversion and allow more accurate measurement
of absolute RNA levels. For the purposes of this study, we focus on the change in
fluorescence of identical probe sets (thus bypassing inherent variability between different
probe sets) and report "absolute change" and "fold change" (Tables 1, 2) rather than
absolute RNA levels.
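The regression lines can be inverted to give a rough copies/cell estimate from a signal, subject to all the caveats above. The sketch below also rescales for a different total RNA content, following the 60 fg vs. 120 fg example. It assumes copy number scales in direct proportion to total RNA per cell. The dictionary and function names are hypothetical.

```python
import math

# Regression fits quoted in the text (Fig. 2A):
#   signal = slope * ln(copies/cell) + intercept
FITS = {
    "2max": (13000, 39000),
    "median": (5500, 16000),
    "average_difference": (6000, 18000),
}

def copies_per_cell(signal, metric="2max"):
    """Invert a fit to get a rough copies/cell estimate (assumes 60 fg RNA/cell)."""
    slope, intercept = FITS[metric]
    return math.exp((signal - intercept) / slope)

def rescale_copies(copies_at_60fg, total_rna_fg):
    """Rescale for a different total RNA content, assuming copy number scales
    in proportion to total RNA per cell (1 copy at 60 fg -> 2 copies at 120 fg)."""
    return copies_at_60fg * total_rna_fg / 60.0

estimate = copies_per_cell(52000)  # about e (~2.7) copies/cell
print(estimate, rescale_copies(estimate, 120))
```

Because the fits are logarithmic, a modest error in signal becomes a multiplicative error in copies/cell. This is one reason the tables report changes in the same probe sets rather than absolute levels.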
We found that by using the intensity difference of the second maximal probe pair
to represent a probe set we maximized the number of detected genes. We therefore chose
the second maximal probe pair intensity difference, or "2max", as a measure of RNA
abundance. Using Excel, a power-law trend line (y = a·x^b) was fit to a plot of observed vs.
expected fold change, and the resulting equation was used to calibrate estimates of fold change in
our stationary vs. log expression comparison (Fig. 2B). The calibration equation is as
follows: calibrated fold change = 1.2 × (measured fold change)^1.9. Pairwise comparisons
of the 2max of the same probe sets on duplicate arrays yielded an average linear
correlation coefficient of 0.85 ± 0.04.
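The 2max metric and the fold-change calibration can be written out in Python. This is a minimal sketch. The intensities are hypothetical, and the measured fold change is taken as the ratio of the two 2max values. The sketch assumes both values are positive; the text handles undetected transcripts separately by substituting the detection threshold. The calibration coefficients are the ones quoted above.

```python
def two_max(pm, mm):
    """Second-largest PM - MM difference in a probe set ('2max')."""
    diffs = sorted((p - m for p, m in zip(pm, mm)), reverse=True)
    if len(diffs) < 2:
        raise ValueError("need at least two probe pairs")
    return diffs[1]

def calibrated_fold_change(measured):
    """Calibration quoted in the text: 1.2 x (measured fold change)^1.9."""
    return 1.2 * measured ** 1.9

# Hypothetical PM/MM intensities for one probe set in two conditions.
log_pm, log_mm = [1200, 3400, 900, 2500], [300, 400, 350, 500]
stat_pm, stat_mm = [4100, 9800, 2600, 7600], [500, 800, 400, 600]
measured = two_max(stat_pm, stat_mm) / two_max(log_pm, log_mm)
print(measured, calibrated_fold_change(measured))
```

Using the second rather than the first maximum makes the metric robust to a single aberrantly bright probe pair, while still tracking the strongest specific signal in the set.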
Transcript detection. To determine which transcripts were detected, we used a set of 4
distinct Bacillus subtilis probe sets whose target RNA was not used in our spiking
experiments. After normalization to total intensity we determined the average 2max of
these probe sets on the arrays used in the stationary vs. log comparison. Transcripts were
considered detected if their 2max was at least 3 standard deviations above the mean of
the 4 probe sets for the absent B. subtilis RNA. By this criterion, 97% and 87% of transcripts were
detected in stationary and log phase, respectively, and 1.7% were not detected in either
condition. Because the negative controls were used to determine the detection threshold,
they could not be used to estimate false positives. The false positive rates for the 2max
and median metrics, therefore, were estimated by using probe sets whose RNAs were
spiked at 0.004 copies/cell or less, well below the sensitivity of the assay. These metrics
both yielded a false positive rate of 0% (0/20) by this method. For the average difference
metric, detection is decided by Affymetrix's calling algorithm which works independently
of internal negative controls. We therefore used the negative controls to estimate the false
positive rate, which was also 0% (0/15). The parameters used in Affymetrix's software
package, GeneChip 3.2, were the following: SDT multiplier = 4, ratio threshold = 1.5,
ratio limit = 10, horizontal zones = 4, vertical zones = 4, % background cells = 2, pos/neg
min = 3, pos/neg max = 4, pos ratio min = 0.33, pos ratio max = 0.43, avg. log ratio min
= 0.9, avg. log ratio max = 1.3.
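The detection rule for the 2max metric can be sketched in Python. A transcript is detected if its 2max is at least 3 standard deviations above the mean of the absent-RNA negative-control probe sets. The sketch assumes the sample standard deviation is used; the text does not say whether the sample or population form was used. All values are hypothetical total-intensity-normalized 2max values.

```python
from statistics import mean, stdev

def detection_threshold(negative_controls, n_sd=3):
    """Mean + n_sd * SD of the 2max values of the absent-RNA probe sets."""
    return mean(negative_controls) + n_sd * stdev(negative_controls)

def detection_calls(two_max_values, negative_controls, n_sd=3):
    """True where a transcript's 2max is at least n_sd SDs above the controls."""
    threshold = detection_threshold(negative_controls, n_sd)
    return {name: value >= threshold for name, value in two_max_values.items()}

# Hypothetical normalized 2max values.
negatives = [100, 200, 300, 400]  # 4 absent B. subtilis probe sets
transcripts = {"geneA": 500, "geneB": 700, "geneC": 5000}
calls = detection_calls(transcripts, negatives)
print(calls, sum(calls.values()) / len(calls))  # fraction detected
```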
It is important to note that 2max does not detect the maximal number of
transcripts in every experiment. The maximum number of transcripts (4,033) on the
reverse complement array was detected using the fourth brightest probe pair, or "4max".
Averaging the probe pairs ranked 4th through 8th ("4-8max"), the ranks at which detection peaked,
gave 3,470 detected transcripts (78% of predicted RNAs). In this case, 20 B. subtilis
probe sets were used as negative controls, with a detection cutoff of 3 standard deviations
above the mean. Widespread detection of transcription in E. coli with a reverse
complement array has been confirmed in our lab on an independent RNA sample using
the current Affymetrix labeling protocol in which 4,344 transcripts were detected (99%
of predicted RNAs) using 4-8max (Daniel Janse, unpublished data). The agreement is
particularly striking considering the many differences between our original experiment
and the confirmation experiment, which were, respectively: biotinylated total cDNA vs.
mRNA-enriched biotinylated RNA, antisense vs. sense chip, and stationary phase vs. log
phase RNA samples. Both protocols include a DNaseI digestion to remove genomic
DNA, and no gDNA contamination was detected by EtBr staining.
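The rank-based metrics discussed above (2max, 4max, 4-8max) are all special cases of averaging PM-MM over a window of ranks within a probe set. A minimal Python sketch of that generalization follows. The function name is illustrative, and the probe set is hypothetical, with MM values set to zero so the ranks are easy to read.

```python
from statistics import mean

def rank_window(pm, mm, first, last):
    """Mean PM - MM over probe pairs ranked first..last (rank 1 = brightest).

    rank_window(pm, mm, 2, 2) is 2max, (4, 4) is 4max, (4, 8) is 4-8max.
    """
    diffs = sorted((p - m for p, m in zip(pm, mm)), reverse=True)
    if not 1 <= first <= last <= len(diffs):
        raise ValueError("rank window outside the probe set")
    return mean(diffs[first - 1:last])

# Hypothetical probe set with 10 probe pairs.
pm = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
mm = [0] * 10
print(rank_window(pm, mm, 2, 2), rank_window(pm, mm, 4, 4), rank_window(pm, mm, 4, 8))
```

Choosing the window trades robustness against sensitivity. Lower ranks track the strongest signal but are more exposed to single bright outliers, while averaging a mid-rank window smooths over individual probes.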
Significance of changes. To determine which changes in 2max were significant, we
devised a calling algorithm which uses both a t-test and a consensus measure. If either of
the following criteria are fulfilled for transcripts which were detected in at least one
condition, the transcript is called significantly changed: i) the mean 2max from duplicates is
significantly different between the two conditions by a two-tailed Student's t-test with >95%
confidence, or ii) after discarding the brightest and dimmest probe pairs, at
least 11/13 of the remaining probe pairs are all changed in the same direction, by any
amount. For transcripts with >15 probe pairs, the 15 brightest were identified and
processed in the same way as the other probe sets. In the rare cases in which these two
criteria conflicted, the decision based on the second maximal probe pair was used. It is
important to note that the magnitudes of the fold or absolute changes are not considered in
deciding their significance, although 77% of the significant changes were greater than 2-fold.
Of the 100 independent pairwise comparisons of spiked control RNAs, 52 were detected in at least one
condition. The algorithm correctly assigned significant changes to all 52 of these probe
sets, all of which had fold changes of at least 2-fold. Probe sets for control RNAs spiked
at equal concentrations showed no significant changes (0/16).
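The two-criterion calling rule can be sketched in Python. This is an illustration under several assumptions, not the original implementation:

- To stay dependency-free, the t-test hard-codes the two-tailed 95% critical value for 2 degrees of freedom (4.303), which applies to duplicates in each condition.
- Probe-pair brightness is ranked by the summed PM-MM across both conditions.
- Probe sets with fewer than 15 pairs use the same 11/13 fraction.
- The tie-break used when the criteria conflict is not modeled.

```python
import math
from statistics import mean, stdev

T_CRIT_DF2 = 4.303  # two-tailed 95% critical value of Student's t with 2 d.f.

def t_test_changed(dup_a, dup_b):
    """Pooled-variance Student's t-test on duplicate 2max values (n = 2 + 2)."""
    if len(dup_a) != 2 or len(dup_b) != 2:
        raise ValueError("expects duplicate measurements in each condition")
    pooled_var = (stdev(dup_a) ** 2 + stdev(dup_b) ** 2) / 2
    se = math.sqrt(pooled_var * (1 / 2 + 1 / 2))
    diff = abs(mean(dup_a) - mean(dup_b))
    if se == 0:
        return diff > 0
    return diff / se > T_CRIT_DF2

def consensus_changed(pairs_a, pairs_b, need=11, out_of=13):
    """Consensus rule: keep the 15 brightest probe pairs, drop the brightest and
    dimmest, and require need/out_of of the rest to move in the same direction."""
    # Assumption: brightness is ranked by summed PM-MM across both conditions.
    ranked = sorted(zip(pairs_a, pairs_b), key=lambda ab: ab[0] + ab[1], reverse=True)
    kept = ranked[:15][1:-1]
    if not kept:
        return False
    up = sum(b > a for a, b in kept)
    down = sum(b < a for a, b in kept)
    # Assumption: sets with fewer than 15 pairs use the same 11/13 fraction.
    return max(up, down) * out_of >= need * len(kept)

def significantly_changed(dup_a, dup_b, pairs_a, pairs_b):
    """Called changed if either criterion holds (conflict resolution not modeled)."""
    return t_test_changed(dup_a, dup_b) or consensus_changed(pairs_a, pairs_b)

print(significantly_changed([100, 110], [300, 310], [100] * 15, [200] * 15))
```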
Acknowledgements
We thank Jeremy Edwards for improvements to the labeling protocol, Daniel Janse for
sharing unpublished data, Adnan Derti and Allegra Petti for bioinformatics contributions,
Felix Lam for help with the fermentor, Michael Mittmann for array design, Phillip Juels
for impeccable computer tech support, Wayne Rindone and John Aach for expression
database support, Barak Cohen, Robi Mitra, Martha Bulyk, Pete Estep, Martin Steffen,
and the rest of the Church lab for the many helpful discussions and encouragement which
made this work possible. We also thank the reviewers for significant improvements to
the manuscript. This work was supported by grants from Aventis Pharma, Lipper
Foundation, DOE and NSF.
________________________________________________________________________
1. de Saizieu, A., et al. Bacterial transcript imaging by hybridization of total RNA to oligonucleotide arrays. Nat. Biotechnol. 16, 45-8 (1998).
2. Lockhart, D.J., et al. Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat. Biotechnol. 14, 1675-80 (1996).
3. Wodicka, L., et al. Genome-wide expression monitoring in Saccharomyces cerevisiae. Nat. Biotechnol. 15, 1359-67 (1997).
4. Lee, C.K., et al. Gene expression profile of aging and its retardation by caloric restriction. Science 285, 1390-3 (1999).
5. Zhu, H., et al. Cellular gene expression altered by human cytomegalovirus: global monitoring with oligonucleotide arrays. Proc. Natl. Acad. Sci. USA 95, 14470-5 (1998).
6. Wen, X., et al. Large-scale temporal gene expression mapping of central nervous system development. Proc. Natl. Acad. Sci. USA 95, 334-9 (1998).
7. Roth, F.P., et al. Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation. Nat. Biotechnol. 16, 939-45 (1998).
8. Tavazoie, S., et al. Systematic determination of genetic network architecture. Nat. Genet. 22, 281-5 (1999).
9. Eisen, M.B., et al. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 95, 14863-8 (1998).
10. Tamayo, P., et al. Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc. Natl. Acad. Sci. USA 96, 2907-12 (1999).
11. Richmond, C.S., et al. Genome-wide expression profiling in Escherichia coli K12. Nucleic Acids Res. 27, 3821-35 (1999).
12. Tao, H., et al. Functional genomics: expression analysis of Escherichia coli growing on minimal and rich media. J. Bacteriol. 181, 6425-40 (1999).
13. Chuang, S.E., D.L. Daniels, & F.R. Blattner. Global regulation of gene expression in Escherichia coli. J. Bacteriol. 175, 2026-36 (1993).
14. Pease, A.C., et al. Light-generated oligonucleotide arrays for rapid DNA sequence analysis. Proc. Natl. Acad. Sci. USA 91, 5022-6 (1994).
15. Blattner, F.R., et al. The complete genome sequence of Escherichia coli K-12. Science 277, 1453-74 (1997).
16. Hengge-Aronis, R. In Escherichia coli and Salmonella: Cellular and Molecular Biology. (eds. Neidhardt, F.C. et al.) 1497-1512 (ASM Press, Washington D.C.; 1996).
17. Yuzawa, H., et al. Heat induction of sigma 32 synthesis mediated by mRNA secondary structure: a primary step of the heat shock response in Escherichia coli. Nucleic Acids Res. 21, 5449-55 (1993).
18. Lange, R. & R. Hengge-Aronis. The cellular concentration of the sigma S subunit of RNA polymerase in Escherichia coli is controlled at the levels of transcription, translation, and protein stability. Genes Dev. 8, 1600-12 (1994).
19. http://www.genome.wisc.edu
20. Link, A.J., K. Robison, & G.M. Church. Comparing the predicted and observed properties of proteins encoded in the genome of Escherichia coli K-12. Electrophoresis 18, 1259-313 (1997).
21. Aach, J., W. Rindone, & G.M. Church. Systematic Management and Analysis of Yeast Gene Expression Data. Genome Res. 10, 431-445 (2000).
22. http://arep.med.harvard.edu/cgi-bin/ExpressDBecoli/EXDStart
23. Liu, M.Y., et al. The RNA molecule CsrB binds to the global regulatory protein CsrA and antagonizes its activity in Escherichia coli. J. Biol. Chem. 272, 17502-10 (1997).
24. Southern, E.M., N. Milner, & K.U. Mir. Discovering antisense reagents by hybridization of RNA to oligonucleotide arrays. Ciba Found. Symp. 209, 38-44 (1997).
25. Chen, L.H., et al. Structure and function of a bacterial mRNA stabilizer: analysis of the 5' untranslated region of ompA mRNA. J. Bacteriol. 173, 4578-86 (1991).
26. SantaLucia, J., Jr. A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proc. Natl. Acad. Sci. USA 95, 1460-5 (1998).
27. http://mfold2.wustl.edu/~mfold/dna/form1.cgi
28. Mir, K.U. & E.M. Southern. Determining the influence of structure on hybridization using oligonucleotide arrays. Nat. Biotechnol. 17, 788-92 (1999).
29. Taljanidisz, J., P. Karnik, & N. Sarkar. Messenger ribonucleic acid for the lipoprotein of the Escherichia coli outer membrane is polyadenylated. J. Mol. Biol. 193, 507-15 (1987).
30. Cao, G.J. & N. Sarkar. Poly(A) RNA in Escherichia coli: nucleotide sequence at the junction of the lpp transcript and the polyadenylate moiety. Proc. Natl. Acad. Sci. USA 89, 7546-50 (1992).
31. Portier, C. & P. Regnier. Expression of the rpsO and pnp genes: structural analysis of a DNA fragment carrying their control regions. Nucleic Acids Res. 12, 6091-102 (1984).
32. Portier, C., et al. The first step in the functional inactivation of the Escherichia coli polynucleotide phosphorylase messenger is a ribonuclease III processing at the 5' end. EMBO J. 6, 2165-70 (1987).
33. Regnier, P. & E. Hajnsdorf. Decay of mRNA encoding ribosomal protein S15 of Escherichia coli is initiated by an RNase E-dependent endonucleolytic cleavage that removes the 3' stabilizing stem and loop structure. J. Mol. Biol. 217, 283-92 (1991).
34. Pennisi, E. Are sequencers ready to 'annotate' the human genome? Science, p. 2183 (2000).
35. DeRisi, J., et al. Use of a cDNA microarray to analyse gene expression patterns in human cancer. Nat. Genet. 14, 457-60 (1996).
36. DeRisi, J.L., V.R. Iyer, & P.O. Brown. Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278, 680-6 (1997).
37. Tavazoie, S. & G.M. Church. Quantitative whole-genome analysis of DNA-protein interactions by in vivo methylase protection in E. coli. Nat. Biotechnol. 16, 566-71 (1998).
38. Dedon, P.C., et al. A simplified formaldehyde fixation and immunoprecipitation technique for studying protein-DNA interactions. Anal. Biochem. 197, 83-90 (1991).
39. Orlando, V. & R. Paro. Mapping Polycomb-repressed domains in the bithorax complex using in vivo formaldehyde cross-linked chromatin. Cell 75, 1187-98 (1993).
40. Bulyk, M.L., et al. Quantifying DNA-protein interactions by double-stranded DNA arrays. Nat. Biotechnol. 17, 573-7 (1999).
41. Winzeler, E.A., et al. Whole genome genetic-typing in yeast using high-density oligonucleotide arrays. Parasitology 118, S73-80 (1999).
42. Gingeras, T.R., et al. Simultaneous genotyping and species identification using hybridization pattern recognition analysis of generic Mycobacterium DNA arrays. Genome Res. 8, 435-48 (1998).
43. Neidhardt, F.C., J.L. Ingraham, & M. Schaechter. Physiology of the Bacterial Cell: A Molecular Approach. 1st ed. (Sinauer Associates, Massachusetts; 1990).
[Figure 1 image: panels A-D with a false-color intensity scale from low to high; panel D marks the 5' and 3' ends of the lpp coding region.]
Figure 1. False-color images of a scanned Escherichia coli genome array
hybridized with a sample derived from a stationary phase culture growing
in LB. (A) Whole array (top half: ORFs; bottom half: intergenic regions;
very bottom: rRNAs and tRNAs). (B) Close-up of coding regions. The
bright streak on the lower left is rmf. (C) Close-up of intergenic regions,
rRNAs, and tRNAs. (D) lpp coding region. Note: apparent saturation (esp. in C)
is due to display settings and not signal saturation.
[Figure 2 image. (A) Signal vs. copies/cell for each metric, with an inset table of genes detected in log and stationary phase by 2max, median, and average difference (AD). (B) Observed fold change vs. known fold change, with a power-law fit for each metric.]
Figure 2. Comparison of 2max (●), median (■), and average difference (▲)
abundance metrics using Bacillus subtilis control RNAs. (A) Abundance
measurement vs. RNA concentration, with present calls. Genes are
considered detected by the 2max and median metrics if they are at least 3
standard deviations above negative controls for which no RNA is present.
Detection using the average difference metric is determined using an
algorithm implemented in the GeneChip 3.2 software package. No false
positives were detected for any of the metrics (see Experimental Protocol).
(B) Plot of observed fold changes measured by various metrics vs. known
fold changes. The relationship between observed and known fold change
is non-linear for all three metrics over a dynamic range of 3 orders of
magnitude, and approximately linear for changes less than 10-fold.
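[Figure 3 image. (A) csrB probe sets for the Crick strand and the Watson strand on the same array, including csrB (untranscribed strand). (B) b1365 (sense) and b1365 (antisense) probe sets on the expression array and the reverse complement array, with positions 171 and 228 marked.]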
Figure 3. The E. coli array can detect strand-specific transcription and can
be used to identify (A) small untranslated RNAs, such as csrB, and (B) a
previously unidentified antisense RNA in the Rac prophage.
Transcription was detected on the strand opposite b1365 from positions
171 to 228. Position is given as the number of base pairs from the central
nucleotide of the oligonucleotide probe to the translation start of b1365.
The oligonucleotides are closely spaced, but three of them are non-overlapping.
The 15 probe pairs in each probe set are outlined by the
white grid, with the PM features in the top row. Probe sets for the
untranscribed strands show the background signal typical of undetected
transcripts. The oligonucleotides in A and on the expression array of B are
tiled from left to right in the 5' to 3' direction.
[Figure 4 image. Both panels plot Intensity (PM - MM) / 2max against bases from translation start. (A) lpp: known transcription start (position -33), translation stop (237 bases), and known hairpin. (B) rpsO: reported transcription start (-100), known RNaseIII sites, known region of secondary structure, translation stop (270 bases), and downstream pnp.]
Figure 4. Determination of transcription starts of (A) lpp and (B) rpsO.
Both genes exhibit reproducible hybridization patterns despite large log
phase fold increases of 60- and 400-fold, respectively. 2max-normalized
(PM - MM) fluorescence intensity of log phase (), stationary phase (▲),
and genomic DNA (●) arrays were plotted against distance from center of
oligonucleotide to translation start site. Points for log and stationary
phase are the means of duplicate experiments. Oligonucleotides which
target both the open reading frame and the flanking intergenic regions
allow this region to be probed at ~6 base pair average resolution for lpp
and ~13 for rpsO. Transcription starts are detected between -30 and -37 for
lpp (reported -33)29,30 and between -94 and -117 for rpsO (reported -100)31.
lpp is known to sometimes extend to position 284, ending in a hairpin
structure. Oligonucleotides in this region showed no hybridization,
suggesting early termination of transcription and/or sensitivity of the array
to secondary structure. Variability in the hybridization pattern at the 3' end
may reflect differential processing. The rpsO transcript has a 3' hairpin
and can be co-transcribed with downstream pnp. The hairpin structure
serves as a stabilizing element for both rpsO and pnp as well as a
transcriptional attenuator31,32. Processing by RNaseIII may relieve
secondary structure in this region and lead to the increased signal seen at
the 3' end in stationary phase.
Table 1. ORFs with significant changes in probe set intensity, previously known to
be differentially regulated in stationary phase
Gene   Abs. change  Fold change  Annotation
rmf         120465           17  ribosome modulation factor
glgS        118425          160  glycogen biosynthesis, rpoS dependent
hdeA        104184           41  orf, hypothetical protein
dps          91763           55  global regulator, starvation conditions
hdeB         34968            5  orf, hypothetical protein
osmY         21914            9  hyperosmotically inducible periplasmic protein
himA         19920           23  integration host factor (IHF), alpha subunit; site specific recombination
csgB         19385          >30  minor curlin subunit precursor, similar to CsgA
clpA         17369            8  ATP-binding component of serine protease
wrbA         15845            7  trp repressor binding protein; affects association of trp repressor and operator
fic          14900           26  induced in stationary phase, recognized by rpoS, affects cell division
htrE         14893          >24  probable outer membrane porin protein involved in fimbrial assembly
cstA         13475           11  carbon starvation protein
sspA         13076            4  regulator of transcription; stringent starvation protein A
ftsA         11171           >5  ATP-binding cell division protein, septation process, complexes with FtsZ, associated with junctions of inner and outer membranes
hyaE         10406           >4  processing of HyaA and HyaB proteins
dacC         10064            8  D-alanyl-D-alanine carboxypeptidase; penicillin-binding protein 6
emrA          8433           >4  multidrug resistance secretion protein
otsB          8276            2  trehalose-6-phosphate phosphatase, biosynthetic
cfa           7896           >4  cyclopropane fatty acyl phospholipid synthase
iciA          7506           >4  replication initiation inhibitor, binds to 13-mers at oriC
rpoH        -26713          0.4  RNA polymerase, sigma(32) factor; regulation of proteins induced at high temperatures
hns        -170027         0.04  DNA-binding protein HLP-II (HU, BH2, HD, NS); pleiotropic regulator
Genes are ranked by absolute change, given as 2max in arbitrary fluorescence units. Signal was normalized to total array intensity. Fold changes were adjusted based on calibration with spiked transcripts (Fig. 2B). For transcripts called absent in one condition, the fold change was estimated (indicated by ">") by substituting the mean of the negative controls + 3 standard deviations for the undetected transcript. Of the 69 transcripts known to be differentially expressed¹⁵ that are present on the array, 23 were called as significantly changed; the remaining 46 were not. Of the 23 significant changes, 22 agree with the direction of change reported in the literature. rpoH, the heat-shock sigma factor, is reported to increase in stationary phase, although its RNA levels decreased about 3-fold in our experiment. This may be a result of translational control, which is known to play a role in the regulation of rpoH¹⁶. Altogether, 1,529 genes (including tRNAs and rRNAs) were significantly changed: 926 increased in stationary phase and 603 decreased. Annotations are from the University of Wisconsin Genome Project¹⁵,¹⁹. The complete dataset can be found at ExpressDB²¹,²².
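The ">" estimate described in the caption can be sketched as follows. This is a minimal illustration under our own assumptions, not the GAPS implementation: the function name `estimate_fold_change` and its arguments are hypothetical, the spike-in calibration step is omitted, and flagging a substituted stationary-phase value with "<" is our addition (only ">" cases appear in the table).

```python
import statistics


def estimate_fold_change(log_value, stat_value, negative_controls,
                         log_detected=True, stat_detected=True):
    """Stationary/log fold change with a noise-floor substitute for absent calls.

    Hypothetical helper. An undetected value is replaced by the mean of the
    negative controls + 3 standard deviations. If the log-phase value is
    replaced, the true ratio can only be larger ('>'); if the stationary-phase
    value is replaced, the true ratio can only be smaller ('<').
    """
    if not (log_detected or stat_detected):
        raise ValueError("transcript undetected in both conditions")
    floor = (statistics.mean(negative_controls)
             + 3 * statistics.stdev(negative_controls))
    qualifier = ""
    if not log_detected:
        log_value, qualifier = floor, ">"
    if not stat_detected:
        stat_value, qualifier = floor, "<"
    return qualifier, stat_value / log_value


# Toy numbers only: negative controls with mean 10 and SD 10 give a floor of 40.
absent_in_log = estimate_fold_change(None, 400.0, [0, 10, 20], log_detected=False)
```

Substituting the noise floor, rather than zero, keeps the ratio finite while still bounding it in the right direction.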
Table 2. ORFs with the largest significant increases in probe set intensity in stationary phase

Bnumber  Gene      Abs. Change  Fold Change  Annotation
b1005    ycdF      135446       102          orf, hypothetical protein
b0836    –         130009       >1000        putative receptor
b0953    rmfᵃ      120465       17           ribosome modulation factor
b3049    glgSᵃ     118425       160          glycogen biosynthesis, rpoS dependent
b4045    yjbJᵇ     117238       9            orf, hypothetical protein
b3510    hdeAᵃ,ᵇ   104184       41           orf, hypothetical protein
b0812    dpsᵃ,ᵇ    91763        55           global regulator, starvation conditions
b1480    rpsV      74063        48           30S ribosomal subunit protein S22
b2665    ygaU      71120        60           orf, hypothetical protein
b3555    yiaG      67426        12           orf, hypothetical protein
b3239    yhcO      64840        140          orf, hypothetical protein
b1240    –         53219        4            orf, hypothetical protein
b1635    gst       51788        81           glutathione S-transferase
b1051    msyB      51334        11           acidic protein suppresses mutants lacking function of protein export
b0966    yccV      50782        16           orf, hypothetical protein
b1318    ycjV      48950        75           putative ATP-binding component of a transport system
b1154    ycfK      46949        >180         orf, hypothetical protein
b1566    flxA      45987        13           orf, hypothetical protein
b2212    alkB      43206        6            DNA repair system specific for alkylated DNA
b1492    xasA      42971        85           acid sensitivity protein, putative transporter
b2266    elaB      42249        >140         orf, hypothetical protein
b1164    ycgZ      41961        3            orf, hypothetical protein
b3183    yhbZ      41925        7            putative GTP-binding factor
b1262    trpC      41711        7            N-(5-phosphoribosyl)anthranilate isomerase and indole-3-glycerolphosphate synthetase
b1739    osmE      40691        24           activator of ntrL gene
Same analysis as Table 1. ᵃThese genes are known to be differentially regulated in stationary phase¹⁶. ᵇThe products of yjbJ, dps, and hdeA are the first, fifth, and sixth most abundant proteins, respectively, in stationary-phase E. coli²⁰.
Chapter 3
Global RNA half-life analysis in Escherichia coli reveals
positional patterns of transcript degradation
Douglas W. Selinger, Rini Mukherjee Saxena, Kevin J. Cheung, George M. Church, and
Carsten Rosenow
The research described in this chapter will be published in the February 2003
issue of Genome Research.
The RNA degradation experiment described in this chapter is the result of a close
and fruitful collaboration with Rini Saxena and Carsten Rosenow at Affymetrix. We
began our collaboration after discovering that we had independently generated
microarray datasets of an E. coli rifampicin timecourse. The data analyzed here are those
generated by Rini Saxena. I carried out the data analysis using GAPS, as well as other
programs I developed specifically for RNA degradation analysis, written in Perl, Matlab,
and Mathematica. I was also primarily responsible for writing the manuscript for
publication.
Abstract
Sub-genic resolution oligonucleotide microarrays were used to study global RNA degradation in wild-type Escherichia coli MG1655. RNA chemical half-lives were measured for 1,036 open reading frames (ORFs) and for 329 known and predicted operons. The half-life of total mRNA was 6.8 minutes under the conditions tested. Furthermore, we observed significant relationships between gene functional assignments and transcript stability.
Unexpectedly, transcription of a single operon (tdcABCDEFG) was relatively rifampicin-insensitive and showed significant increases 2.5 minutes after rifampicin addition. This supports a novel mechanism of transcription for the tdc operon, whose promoter lacks any recognizable sigma binding sites. Probe-by-probe analysis of all known and predicted operons showed that the 5' ends of operons degrade, on average, more quickly than the rest of the transcript, with stability increasing in a 3' direction, supporting and further generalizing the current model of a net 5' to 3' directionality of degradation. Hierarchical clustering analysis of operon degradation patterns revealed that this pattern predominates but is not exclusive. We found weak but highly significant correlations between the degradation of adjacent operon regions, suggesting that stability is determined by a combination of local and operon-wide stability determinants. The 16-ORF dcw gene cluster, which has a complex promoter structure and a partially characterized degradation pattern, was studied at high resolution, allowing a detailed and integrated description of its abundance and degradation. We discuss the application of sub-genic resolution DNA microarray analysis to the study of global mechanisms of RNA transcription and processing.
Introduction
Gene regulation is a dynamic process that can be controlled by a number of mechanisms as genetic information flows from nucleic acids to proteins. The study of gene regulation in the steady state, while informative, overlooks the underlying dynamics of these processes. Steady-state transcript levels result from both RNA synthesis and degradation; consequently, measured degradation rates can be used to infer a transcript's rate of synthesis (when its steady-state level is known), as well as to reveal regulation that occurs via changes in RNA stability.
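Under the common simplifying assumption (ours, not stated in the text) of zero-order synthesis and first-order decay, the relationship between steady-state level, half-life, and synthesis rate can be written as:

```latex
\frac{dA}{dt} = s - kA, \qquad k = \frac{\ln 2}{t_{1/2}}
\quad\Longrightarrow\quad
s = k\,A_{ss} = \frac{\ln 2}{t_{1/2}}\,A_{ss}
```

Here A is transcript abundance, s the synthesis rate, k the first-order decay constant, and A_ss the steady-state abundance. When rifampicin blocks initiation (s = 0), the solution is A(t) = A_ss e^{-kt}, which is the exponential decay assumed in the half-life calculations below.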
For the genetic regulatory network of E. coli to be understood and eventually modeled, every means of regulation used by the cell must be given due attention. RNA degradation in eubacteria was once viewed as a non-specific, unregulated process. Today it is known to involve multiple degradation pathways and a multisubunit protein complex (the degradosome), and to be an important regulatory mechanism for the expression of some genes (for reviews, see Grunberg-Manago 1999; Regnier and Arraiano 2000; Rauhut and Klug 1999). A small number of large-scale RNA degradation analyses have recently been reported in budding yeast (Wang et al. 2002), humans (Lam et al. 2001), and E. coli (Bernstein et al. 2002).
RNA expression analysis with DNA microarrays has allowed transcription to be studied at an unprecedented scale. Nevertheless, the potential of the technology to elucidate the low-level details of RNA transcription and processing remains largely unexplored. In this study we take a first step by identifying global RNA degradation patterns at the operonic, genic, and subgenic levels.
High-density oligonucleotide arrays from Affymetrix were used to study the
degradation of RNA over essentially the entire transcriptome of Escherichia coli
MG1655 (Selinger et al. 2000). These arrays have subgenic-resolution coverage of the
genome (both coding and non-coding regions), allowing us to examine transcription and
degradation in a relatively continuous and unbiased manner.
We present RNA half-life measurements for 1,036 open reading frames (ORFs) and for 329 known and predicted operons, and we report significant over- and under-representation of ORF functional categories in the set of most labile RNAs. We identify an unusual rifampicin-insensitive promoter (of the tdc operon) and strengthen the case for its transcription by a novel mechanism. We present evidence that the 5' ends of operons are more labile than their 3' ends, supporting the current model of an overall 5' to 3' direction of degradation. Finally, we explore positional patterns of RNA degradation and discuss the current state of the art of high-resolution global transcription analysis.
Results and Discussion
Half-life determination. For the determination of half-lives, all experiments were done in triplicate for each RNA preparation. On average, 23% of the genes were detected at 2.33 standard deviations (99% confidence) above the negative control probe sets. Half-lives were calculated for 1,036 ORFs, of which 479 were calculated exactly and 557 represent upper bounds. Average half-lives were calculated for 329 known and predicted operons (see Methods) (Tables 1 and 2), although these are only a rough approximation, as typically only a subset of an operon's ORFs had measurable half-lives, and there can be considerable differences between the degradation of different operonic regions.
After addition of rifampicin, which prevents initiation of new transcripts by binding to the β subunit of RNA polymerase (Campbell et al. 2001), the total intensity for all mRNAs decreases exponentially with time (R = 0.98), with an estimated overall chemical half-life of 6.8 minutes. This is in rough agreement with a recently reported half-life of 7.5 minutes for total pulse-labeled RNA under comparable conditions (Mohanty and Kushner 1999). Although absolute decay rates are known to vary appreciably across experiments, especially those performed in different laboratories, we observe qualitative agreement for some well-studied transcripts, such as ompA, a very stable RNA in fast-growing cells (Nilsson et al. 1984; see Methods), and cspA, an extremely unstable one that is transiently stabilized upon cold shock (Goldenberg et al. 1996) (Table 1).
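The text does not state how the exponential was fitted to the total mRNA intensity. One common approach, assumed here, is ordinary least squares on the logarithm of intensity; the function name `fit_half_life` is hypothetical, and the toy data below are generated from a 6.8-minute half-life rather than taken from the measurements. A nonlinear fit on the untransformed intensities could give a somewhat different half-life and R.

```python
import math


def fit_half_life(times, intensities):
    """Fit ln(I) = ln(I0) - k*t by ordinary least squares.

    Returns (half_life, r), where half_life = ln(2)/k and r is the Pearson
    correlation of ln(I) with time (negative for a decaying signal).
    """
    ys = [math.log(v) for v in intensities]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(ys) / n
    s_tt = sum((t - mean_t) ** 2 for t in times)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    s_ty = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
    slope = s_ty / s_tt                # equals -k
    r = s_ty / math.sqrt(s_tt * s_yy)
    return math.log(2) / -slope, r


# Toy data generated from a 6.8-minute half-life (not the measured values).
times = [0, 2.5, 5, 10, 20]
toy = [1000 * 2 ** (-t / 6.8) for t in times]
half_life, r = fit_half_life(times, toy)
```

On noise-free toy data the fit recovers the generating half-life and |r| = 1; real data give |r| < 1.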
Genes encoding enzymes known to be involved in RNA decay, such as pnp, rhlB, and rho, show exponential decay patterns starting immediately after rifampicin treatment. The genes rne and rnc also show progressive decay patterns but were expressed at relatively low levels, making half-life measurement difficult. The genes rnb and pcnB were undetected throughout the time-course.
Average operon half-lives were calculated by taking the mean half-life of each operon's member ORFs for which half-lives had been determined. A number of the most unstable operons (Table 2) enable metabolism that is presumably unnecessary in rich media, such as amino acid biosynthesis (thr, cad), alternative carbon source catabolism (lac, sdh), and nucleotide biosynthesis (deo). It would be interesting to see whether these transcripts are more stable in minimal media.
Discovery of a rifampicin-insensitive promoter. Surprisingly, a single operon, tdcABCDEFG, which encodes a pathway for the transport and anaerobic degradation of L-threonine, was relatively rifampicin-insensitive. All seven ORFs of this operon were significantly upregulated 2.5 minutes after rifampicin addition. After this initial increase, the ORFs of the tdc operon show either gradual decay or stability through the 5- and 10-minute time-points, followed by near-complete degradation by the 20-minute time-point (data not shown). Because rifampicin targets the core of the only RNA polymerase (RNAP) in E. coli, we were initially surprised to find an operon that could still be transcribed after rifampicin addition. However, differential rifampicin sensitivity of RNAP holoenzymes containing different sigma subunits (σ70 vs. σ32) has been observed previously (Wegrzyn et al. 1998), suggesting that certain holoenzymes may be rifampicin-insensitive. Furthermore, the tdc promoter is unusual in that it does not contain any recognizable sigma binding sites, but does contain sites for a number of transcription factors, including CRP, IHF, FNR, LysR, TdcA, and TdcR. It has also been suggested that the tdc promoter is controlled by a novel mechanism and can be activated by altering its local topology (Wu and Datta 1995; Sawers 2001).
RNA decay related to function. To determine whether transcripts whose gene products participate in the same cellular processes tend to be degraded at the same rates, we looked at the over- and under-representation of 23 gene functional categories (Blattner et al. 1997) within different half-life ranges (Table 3). P-values were calculated using the cumulative hypergeometric distribution, and a 95% confidence level was used as a cutoff (Tavazoie et al. 1999). In the set of short-lived (≤5 minutes) transcripts, genes annotated as putative enzymes were significantly over-represented. Rapidly degraded transcripts are good candidates for regulation via RNA stability, and many of these may be transiently stabilized under environmental conditions in which they are needed. The instability of their transcripts, and their likely low protein levels, may have hindered their discovery and/or characterization. Genes involved in translation and post-translational modification were significantly under-represented among short-lived (≤5 minutes) transcripts, reflecting the known stability of the cell's translational machinery. Genes involved in energy metabolism were significantly over-represented among transcripts with intermediate half-lives of between 10 and 20 minutes. The genes in this category are, in general, well studied and are regulated by a variety of mechanisms unrelated to RNA stability, although in most cases regulation via transcript stability has not been ruled out.
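The cumulative hypergeometric test can be sketched as below. This is a minimal illustration assuming the background set is all genes with measured half-lives; the function name `hypergeom_tail` and its argument names are hypothetical, and the example counts are toy values, not those behind Table 3.

```python
from math import comb


def hypergeom_tail(k, drawn, in_category, total, upper=True):
    """Cumulative hypergeometric probability for category enrichment.

    total       -- genes in the background set
    in_category -- background genes in the functional category
    drawn       -- genes in the half-life bin being tested
    k           -- category genes observed in that bin
    upper=True gives P(X >= k) (over-representation);
    upper=False gives P(X <= k) (under-representation).
    """
    lo = max(0, drawn - (total - in_category))   # smallest possible overlap
    hi = min(drawn, in_category)                 # largest possible overlap
    support = range(max(k, lo), hi + 1) if upper else range(lo, min(k, hi) + 1)
    numerator = sum(comb(in_category, i) * comb(total - in_category, drawn - i)
                    for i in support)
    return numerator / comb(total, drawn)


# Toy example: all 5 category genes out of 10 land in a bin of 5 genes.
p_over = hypergeom_tail(5, drawn=5, in_category=5, total=10)
```

A category is then called over- (or under-) represented in a bin when the corresponding tail probability falls below 0.05.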
To assess whether our experiment preferentially measured the half-lives of some groups of genes relative to others, we looked for differential representation of genes with measured half-lives relative to all genes on the array. Genes whose half-lives could be determined in our experiment were significantly enriched for those involved in translation and post-translational modification, which are generally very highly expressed and easy to detect. Those classified as "Hypothetical, unclassified, unknown" or as putative transport proteins were significantly under-represented, suggesting that both of these classes are, in general, expressed at very low levels and/or may contain a number of spuriously predicted ORFs. These two uncharacterized groups stand in contrast to putative enzymes and putative regulatory proteins, which were detected at a rate indistinguishable from other groups.
5' to 3' directionality of degradation. RNA is degraded within the cell by the combined action of RNA exo- and endonucleases. The precise way in which this process occurs has been a subject of intense study (Grunberg-Manago 1999; Regnier and Arraiano 2000). Stable 5' secondary structures have been shown to confer stability on downstream sequences (Emory et al. 1992), while 3' polyadenylation targets transcripts for degradation (Sarkar 1997). To investigate whether degradation is targeted preferentially towards the 5' or 3' end of the mRNA, we measured the variability of degradation rates at different positions of predicted and known operons containing at least 2 ORFs. Each operon coding region was divided into 3 equal regions (5', middle, and 3'), while the 30 bases upstream and downstream of each operon were denoted the 5' and 3' UTRs, respectively. The UTR windows were kept short to increase the probability that they were in fact co-transcribed with the operon. The average log2 ratio of probes in each region was calculated for each operon (see Methods).
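The partitioning into five regions can be sketched as follows. This assumes probe positions are already expressed in transcript orientation (minus-strand operons flipped) with an inclusive coding span from `start` to `end`; the names `region_of` and `region_means` are hypothetical, and each probe's log2 ratio is assumed to be computed as described in Methods.

```python
from statistics import mean

REGIONS = ("5p UTR", "Op 5p", "Op M", "Op 3p", "3p UTR")


def region_of(pos, start, end, utr=30):
    """Assign a probe position to one of the five operon regions, or None."""
    if start - utr <= pos < start:
        return "5p UTR"
    if end < pos <= end + utr:
        return "3p UTR"
    if start <= pos <= end:
        third = (end - start + 1) / 3            # three equal coding thirds
        idx = min(int((pos - start) // third), 2)
        return REGIONS[1 + idx]
    return None                                  # outside operon + UTR windows


def region_means(probes, start, end):
    """Average log2 ratio per region for one operon and one time-point.

    probes -- iterable of (position, log2_ratio) pairs
    """
    buckets = {r: [] for r in REGIONS}
    for pos, ratio in probes:
        region = region_of(pos, start, end)
        if region is not None:
            buckets[region].append(ratio)
    return {r: mean(v) for r, v in buckets.items() if v}


# Toy operon with a 300-base coding region (positions 100-399).
means = region_means([(80, -2.0), (90, -3.0), (150, -1.0), (420, 0.5)], 100, 399)
```

Regions with no usable probes are simply omitted from the result, so downstream averages across operons use only observed regions.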
Log2 ratios of each region were averaged for all operons, as well as for subsets
with specified half-lives, to compare the degradation rates of different transcript regions
(Fig. 1). In the set of all operons, the log2 ratios were most negative for the 5' UTR and
became less negative in a 5' to 3' direction, consistent with a predominantly 5' to 3'
directional mechanism of degradation.
To determine whether positional patterns varied with overall stability, operons were grouped based on their average half-lives (Fig. 1). The same trend of 3'-increasing stability was seen for all groups, regardless of overall half-life. This trend was most consistent for the 20-40 minute operons, whereas for the <5 minute and 5-20 minute operons there were some discrepancies at their 5' ends, especially at the later time-points.
To assess the significance of the differential degradation rates, we used a one-way ANOVA to test whether the differences between the average degradation rates of different operon regions could be accounted for by chance. Significant differences between regional mean degradation rates were found for almost all time-points in all half-life sets using α = 0.05 or 0.10, as detailed in the Figure 1 legend. The results for the analysis of all 835 operons were especially significant, with all p-values below 1×10⁻¹². We conclude that the observed variation in the rate of degradation of different operonic regions is significant.
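For readers who want to reproduce the test, the one-way ANOVA F statistic can be computed as below; the function name `one_way_anova_f` and the toy groups are hypothetical. In this setting each group would hold the regional log2 ratios of all operons at one time-point. The p-value is the upper tail of the F distribution with the returned degrees of freedom (for example via `scipy.stats.f.sf`, or `scipy.stats.f_oneway` for the whole test), which is left out here to keep the sketch dependency-free.

```python
def one_way_anova_f(groups):
    """F statistic and degrees of freedom for a one-way ANOVA.

    groups -- one list of values per operon region (e.g. 5 regions).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [4, 5, 6]])
```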
Clustering of degradation patterns. It is important to note that while the 5' to 3' directionality illustrated by Figure 1 indicates that, in general, the 5' ends of operons are degraded more quickly than their 3' ends, it does not indicate whether this is the only pattern of operon degradation or simply the most common one. To distinguish between these two possibilities, the degradation patterns of all operons were clustered using a hierarchical clustering algorithm and displayed as a tree (Eisen et al. 1998) (Fig. 2). The 149 known and predicted operons for which complete data were available were divided into 5 operon regions: 5' and 3' UTRs (representing 30 bases up- and downstream of the translation start and stop, respectively), and equal-length 5', middle, and 3' coding regions. Within each operon, each region was ranked from most stable (5) to least stable (1) based on the average log2 ratio of oligos in that region at each time-point. This within-operon normalization allows operons with similar patterns to be grouped together regardless of their overall rate of degradation. The results of the clustering analysis indicate that, while there is a clear predominance of a 5' to 3' degradation pattern, other patterns are also present. Nevertheless, the degradation ranks for each region, when averaged over all operons, show a clear trend consistent with an overall 5' to 3' directionality of degradation.
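The within-operon ranking that precedes clustering can be sketched as follows. This is an illustration under our own assumptions: the names `rank_regions` and `rank_profile` are hypothetical, ties are broken by region order (the text does not say how ties were handled), and the resulting vector is laid out region by region with each region split into time-points, matching the column layout described for Figure 2.

```python
REGIONS = ("5p UTR", "Op 5p", "Op M", "Op 3p", "3p UTR")


def rank_regions(region_log2):
    """Rank regions 1 (most negative log2 ratio, least stable) to 5 (most stable).

    Ties keep the order of REGIONS because sorted() is stable.
    """
    order = sorted(REGIONS, key=lambda r: region_log2[r])
    return {region: rank for rank, region in enumerate(order, start=1)}


def rank_profile(per_timepoint):
    """Concatenate ranks into one clustering vector, region by region.

    per_timepoint -- list of {region: average log2 ratio}, one dict per time-point
    """
    ranks = [rank_regions(tp) for tp in per_timepoint]
    return [rk[region] for region in REGIONS for rk in ranks]


toy_tp = {"5p UTR": -2.5, "Op 5p": -1.8, "Op M": -1.2, "Op 3p": -0.9, "3p UTR": -1.0}
profile = rank_profile([toy_tp, toy_tp])
```

Because only ranks enter the vector, a fast-decaying and a slow-decaying operon with the same regional ordering produce identical profiles, which is the normalization the text describes.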
To assess the statistical significance of the observed directionality, we performed a χ² goodness-of-fit test on each transcript region. We could readily reject the null hypothesis that each region has an equiprobable distribution of ranks, with p-values ranging from 2×10⁻⁶ to 2×10⁻³⁸ (Fig. 2). From inspection of the rank distributions, we conclude that the 5' regions of operons are significantly more likely to be degraded quickly and the 3' regions more likely to be degraded slowly.
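A self-contained version of the χ² test against equiprobable ranks is sketched below; the function name `chi2_uniform_ranks` and the example counts are hypothetical. With 5 ranks there are 4 degrees of freedom, and for an even number of degrees of freedom the χ² survival function has a closed form, so no statistics library is needed.

```python
import math


def chi2_uniform_ranks(counts):
    """Chi-square goodness of fit of rank counts against equal probabilities.

    counts -- number of operons assigned rank 1, 2, ..., k for one region.
    Returns (statistic, p_value). The p-value uses the closed-form survival
    function of the chi-square distribution, valid only for even degrees of
    freedom (k = 5 ranks gives df = 4).
    """
    k = len(counts)
    df = k - 1
    if df % 2:
        raise ValueError("closed form needs an even number of degrees of freedom")
    expected = sum(counts) / k
    stat = sum((c - expected) ** 2 / expected for c in counts)
    half = stat / 2
    # P(X > x) = exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
    p_value = math.exp(-half) * sum(half ** i / math.factorial(i)
                                    for i in range(df // 2))
    return stat, p_value


# Illustrative counts for 150 operons (not the thesis data).
stat, p = chi2_uniform_ranks([10, 20, 30, 40, 50])
```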
Because certain transcript features, such as the ompA stabilizer (Emory et al. 1992), are known to exert their effects along an entire transcript, we analyzed the extent to which the degradation of one region is correlated with that of other regions. The average Pearson's linear correlation coefficient (R) between the degradation of adjacent regions was 0.38, and the average correlation between any two operon regions was 0.26. These weak but statistically significant (p < 0.005) correlations suggest that, while there are important operon-wide determinants of stability, local determinants may play a larger role in the stability of RNAs. This emphasizes the need to scrutinize transcription and degradation at a higher level of resolution.
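The two averages (adjacent regions versus any two regions) can be sketched as below. The text does not specify whether ranks or log2 ratios were correlated, or how time-points were combined; this sketch assumes one degradation value per operon per region, and the names `pearson_r` and `mean_r` are hypothetical.

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson's linear correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mean_r(operons, pairs):
    """Average R over region pairs, correlating values across operons.

    operons -- list of {region: degradation value}, one dict per operon
    """
    rs = [pearson_r([op[a] for op in operons], [op[b] for op in operons])
          for a, b in pairs]
    return sum(rs) / len(rs)


regions = ["A", "B", "C"]                      # toy region labels
ops = [{"A": 1, "B": 2, "C": 3}, {"A": 2, "B": 4, "C": 1}, {"A": 3, "B": 6, "C": 2}]
adjacent = mean_r(ops, zip(regions, regions[1:]))   # neighbouring regions only
any_two = mean_r(ops, combinations(regions, 2))     # every pair of regions
```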
It should be noted that despite the difficulties of defining transcript boundaries, as
well as the existence of operons with multiple promoters and terminators, we were still
able to identify significant patterns. As our knowledge of these confounding factors
increases we may expect to see even clearer patterns emerge.
High-resolution analysis of the dcw gene cluster. The dcw gene cluster, important for cell envelope biosynthesis and cell division, contains 16 ORFs and has a complex promoter structure (Fig. 3) (Vicente et al. 1998; Dewar and Dorazi 2000). It is transcribed mainly from two clusters of promoters, located at the 5' end (~ORFs 1-3) and near the 3' end (ORFs 12-14). We observe a complex degradation pattern for this operon, with 3 primary domains of stability (Figs. 3 and 4). The 5' end is degraded most rapidly, consistent with the most commonly observed pattern. The central region is relatively stable from murE to murC. The 3' end, from ddlB to envA, has an intermediate stability, with ftsA and ftsZ having nearly identical half-lives, as has been reported previously (Cam et al. 1996).
These domains of stability roughly coincide with the clusters of promoters, suggesting that they represent somewhat independent units which the cell regulates simultaneously through both transcriptional initiation and degradation. Interestingly, the relatively high signal intensity at mraZ and ddlB corresponds to the positions of the two major promoters, Pmra and ftsQ2p1p, respectively (Flardh et al. 1997; Mengin-Lecreulx et al. 1998) (Fig. 4). This suggests that the regions downstream of these promoters are maintained at higher steady-state RNA levels in the cell, although we are cautious about drawing a firm conclusion here, because the relationship between microarray signal intensity and absolute RNA abundance is only semi-quantitative. Nevertheless, this observation is consistent with previous measurements showing that about one-third of the transcription of ftsZ originates at promoters located within and between ddlB and ftsA, with the other two-thirds originating upstream of ddlB (de la Fuente et al. 2001; Flardh et al. 1998).
The future of high-resolution transcriptome analysis. The type of transcriptome data
presented here enables genome-wide analyses which until now have only been done on a
small scale. For example, the relationship between RNA degradation and RNA sequence
features, such as RNase sites and known and predicted secondary structures, can be
assessed, as well as the effects of mutations, especially to the RNA degradation
machinery. These data are also useful in the empirical definition of transcription
boundaries (Selinger et al. 2000; Tjaden et al. 2002) and promoter usage.
We expect such high-resolution analyses to increase in precision. Probe-to-probe variation, which can mask local changes in RNA abundance, can be reduced by smoothing or, perhaps, by more sophisticated model-based (Li and Hung Wong 2001) or correlation-based methods (Cohen et al. 2000). High-resolution mapping of human exon boundaries using oligonucleotide arrays has also been reported (Kapranov et al. 2002; Shoemaker et al. 2001). Microarrays could be designed with probes spaced more evenly throughout the ORFs and intergenic regions to allow more comprehensive coverage of the transcriptome. The continually increasing density of oligonucleotide arrays suggests that transcriptome data, and our resulting understanding of transcriptional regulation, will increase not only in scope but also in detail.
Methods
Growth of bacterial strains and transcript inhibition. E. coli wild-type strain MG1655 was grown in LB broth in shaken flasks at 37°C to mid-logarithmic phase (A600 = 0.8) and then split into five flasks of 20 ml each. To initiate transcription inhibition, four of these samples were treated with rifampicin (Sigma, St. Louis, MO) at a concentration of 50 µg ml⁻¹ and incubated for an additional 2.5, 5, 10, and 20 minutes, respectively, followed by immediate harvesting of the cells. The fifth sample was used as a control, and its cells were harvested immediately (time-point zero). All RNA isolation procedures were performed with the MasterPure Complete DNA and RNA Purification kit from Epicentre Technologies (Madison, WI), as described previously (Rosenow et al. 2001).
RNA labeling and hybridization. The cDNA synthesis method has been described previously (Rosenow et al. 2001). Briefly, 10 µg of total RNA was reverse transcribed using the Superscript II system for first-strand cDNA synthesis from Life Technologies (Rockville, MD). The remaining RNA was removed using 2 U RNase H (Life Technologies, Rockville, MD) and 1 µg RNase A (Epicentre, Madison, WI) for 10 min at 37°C in a 100 µl total volume. The cDNA was purified using the Qiaquick PCR purification kit from Qiagen (Valencia, CA). Isolated cDNA was quantitated by absorbance at 260 nm and fragmented using a partial DNase I digest. The fragmented cDNA was 3' end-labeled using terminal transferase (Roche Molecular Biochemicals, Indianapolis, IN) and biotin-N6-ddATP (DuPont/NEN, Boston, MA). The fragmented and end-labeled cDNA was added to the hybridization solution without further purification. Three microarray hybridizations were carried out for each time-point.
Chip scaling and transcript detection. To account for experimental and chip variations, all intensities were normalized according to the variation of the cRNA controls, which were added before the RNA labeling reaction and are measured by 4 probe sets targeting RNAs not present in the E. coli genome. The controls showed a variation of less than 10% before scaling across all 15 labeling reactions (data not shown). Transcript abundances for each RNA were calculated in GAPS© by taking the mean of the perfect match (PM) minus mismatch (MM) probe pairs after removing the highest and the two lowest values (2-13max) (Selinger et al. 2000), and are referred to here simply as "average difference" (AD) (Lockhart et al. 1996). Each RNA is typically targeted by 15 unique oligonucleotide probe pairs. A transcript was considered "detected" if it was 2.33 standard deviations (99% confidence) above the negative controls (90 probe sets for genes not present in the MG1655 genome). For the five time-points (0, 2.5, 5, 10, and 20 minutes), mRNA detection rates were 24, 27, 27, 18, and 6 percent, respectively, with detection cutoffs of 1766, 1014, 975, 1202, and 1327 AD units. The mean of the negative controls has been subtracted from all reported values, so that values greater than 0 signify an average difference greater than that of the negative controls. For high-resolution analysis (including the directionality analysis), we calculated log2 ratios as log2((PM-MM at time t)/(PM-MM at time 0)). We used only probe pairs for which PM-MM at time 0 was greater than 100 normalized fluorescence units.
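The three calculations in this paragraph (2-13max average difference, detection cutoff, and per-probe log2 ratio) can be sketched as follows. These are illustrative re-implementations, not GAPS code; the function names are hypothetical, and excluding probes whose PM-MM at time t is non-positive is our own assumption, since the log is undefined there.

```python
import math
import statistics


def average_difference_2_13max(pm, mm):
    """Mean PM-MM after dropping the highest and the two lowest probe pairs.

    With the typical 15 probe pairs per RNA this keeps ranks 2-13.
    """
    diffs = sorted((p - m for p, m in zip(pm, mm)), reverse=True)
    return statistics.mean(diffs[1:-2])


def detection_cutoff(negative_control_ads, z=2.33):
    """Negative-control mean plus z standard deviations (z = 2.33 ~ 99%)."""
    return (statistics.mean(negative_control_ads)
            + z * statistics.stdev(negative_control_ads))


def probe_log2_ratios(diff_t, diff_0, min_baseline=100):
    """Per-probe log2((PM-MM at t)/(PM-MM at 0)); None marks an excluded probe.

    Probes with PM-MM at time 0 <= min_baseline are excluded as in the text;
    also excluding non-positive PM-MM at time t is an assumption.
    """
    return [math.log2(dt / d0) if d0 > min_baseline and dt > 0 else None
            for dt, d0 in zip(diff_t, diff_0)]


# Toy probe set: 15 PM values 100..1500 with zero MM.
ad = average_difference_2_13max(range(100, 1600, 100), [0] * 15)
```

Dropping the extreme probe pairs before averaging makes the abundance estimate less sensitive to a single saturated or cross-hybridizing probe.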
RNA chemical half-life determination. Probe pairs (perfect match minus mismatch) were averaged over the triplicates of each time-point (0, 2.5, 5, 10, and 20 minutes after rifampicin addition), resulting in an average probe set intensity for each ORF. RNA abundances were determined using the average difference metric implemented by GAPS©. Chemical half-life was determined for each RNA by the following "two-fold" algorithm: i) the earliest time-point at which the transcript was detected was used as the baseline abundance; ii) the earliest subsequent time-point at which a two-fold decrease was detected was used as the experimental abundance, and the half-life was calculated assuming exponential decay. When the baseline but not the experimental time-point was detected, the half-life was estimated (yielding an upper-bound estimate) using the noise value in place of the experimental value. Other categories were defined, such as "stable" (transcript is detected but no change as great as two-fold is observed), "possible increase" (a minimum two-fold change between any two time-points), "erratic" (both a two-fold increase and a two-fold decrease observed), and "possibly stable" (at least a two-fold decrease observed, but the signal later returns to baseline level). Slot blots for 4 genes were carried out to validate the array-measured RNA half-lives and gave the following results (slot blot/array): ompA 20.2 min/stable; cspC 17.2 min/possibly stable; fldA 10 min/6.7 min; sodA 9.5 min/6.9 min. Half-lives were alternatively calculated by fitting an exponential decay curve to all time-points, regardless of fold change or signal-to-noise thresholds. This approach was deemed inferior to the two-fold algorithm because it gave considerably poorer agreement with the slot blot data, showed less sensitivity to rapidly degrading transcripts, and gave spurious results for RNAs whose signal dropped below the detection threshold at later time-points (data not shown). Average half-lives were calculated for predicted and observed operons from RegulonDB (Salgado et al. 2001) by taking the mean over all operon members whose half-life had been determined. Half-lives with estimated upper bounds greater than 40 minutes were set to 40 minutes to avoid skewing the results. The complete list of transcripts, calculated half-lives (of both ORFs and operons), and pattern categories are available at http://arep.med.harvard.edu/rna_decay/. The dataset was also deposited in ExpressDB (Aach et al. 2000) at http://arep.med.harvard.edu/ExpressDB/.
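The two-fold algorithm and the operon averaging can be sketched as below. This is a simplified reading of the description, not the original Perl/Matlab code: the function names are hypothetical, only the "exact", "upper_bound", and "stable" outcomes are modeled (not "erratic", "possible increase", or "possibly stable"), and the 40-minute cap is applied to every value, which in practice affects only upper bounds because exact half-lives from a 20-minute time-course cannot exceed 20 minutes. Under exponential decay, t½ = Δt · ln 2 / ln(A_baseline / A_experimental).

```python
import math


def two_fold_half_life(times, values, detected, noise):
    """Sketch of the "two-fold" chemical half-life rule.

    times    -- minutes after rifampicin, e.g. [0, 2.5, 5, 10, 20]
    values   -- average-difference abundance at each time-point
    detected -- detection call at each time-point
    noise    -- value substituted for an undetected experimental time-point
                (assumed to be below any detected baseline)
    Returns (half_life, kind); kind is "exact", "upper_bound", "stable",
    or "undetected".
    """
    if True not in detected:
        return None, "undetected"
    b = detected.index(True)                # i) earliest detected point = baseline
    t0, a0 = times[b], values[b]
    for t, a, det in zip(times[b + 1:], values[b + 1:], detected[b + 1:]):
        if not det:                         # signal lost: noise gives an upper bound
            return (t - t0) * math.log(2) / math.log(a0 / noise), "upper_bound"
        if a <= a0 / 2:                     # ii) first two-fold decrease
            return (t - t0) * math.log(2) / math.log(a0 / a), "exact"
    return None, "stable"


def operon_half_life(orf_half_lives, cap=40.0):
    """Mean over member ORFs with a half-life, capping values at 40 minutes."""
    vals = [min(h, cap) for h in orf_half_lives if h is not None]
    return sum(vals) / len(vals) if vals else None


times = [0, 2.5, 5, 10, 20]
hl, kind = two_fold_half_life(times, [1000, 800, 400, 200, 50], [True] * 5, noise=100)
```

In the toy example the first two-fold drop occurs at 5 minutes (400 versus a baseline of 1000), giving t½ = 5 · ln 2 / ln 2.5, roughly 3.8 minutes.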
[Figure 1 plot panels: All Operons (n=835), <5 min Operons, 5-20 min Operons, 20-40 min Operons, and HL Not Determined. x-axis: operon region (5p UTR, Op 5p, Op M, Op 3p, 3p UTR); y-axis: Average Log2 Ratio (0 to -3.5); one curve each for the 2.5, 5, 10, and 20 min time-points.]
Figure 1. Positional differences in operon degradation.
Operon regions are plotted on the x-axis and average log2 ratios (relative to the 0-minute time-point) on the y-axis. Vertical bars indicate standard error. Operons were divided into 5 regions: 30 bases upstream (5p UTR) and downstream (3p UTR), and three equal-length regions of the coding region: 5 prime (Op 5p), middle (Op M), and 3 prime (Op 3p). Patterns of operons with different average half-lives were compared. A 5' to 3' directionality is observable in the coding regions of all operon subsets. This directionality generally extends at least 30 bases into the UTRs, although the 5' UTR of quickly degrading operons (<5 min) seems to be more stable than the coding region. All curves in this figure have statistically significant variation between means by one-way ANOVA at α = 0.001, with the following exceptions: the 2.5 min curve of the '20-40 min' graph and the 5 and 20 minute curves of the 'half-life not determined' graph, which were significant at α = 0.05, 0.05, and 0.10, respectively. P-values for time-points on the 'all operons' graph were all below 1×10⁻¹².
Figure 2. Whole genome cluster analysis of operon degradation (following page).
The degradation patterns of 149 operons (containing 2 or more ORFs, and oligo probes in all targeted regions) were hierarchically clustered after ranking the relative degradation rate of each region. The algorithm was implemented using the GeneCluster/TreeView package (Eisen et al. 1998). Transcript regions are on the x-axis, with each region split into 2.5, 5, 10, and 20-minute time-points. The average rank increases from 5' to 3', supporting a predominant 5' to 3' directionality of degradation (cluster c). The clustering also reveals that a variety of degradation patterns are present, such as operons with relatively stable 5' UTRs (cluster a). One group of operons (cluster b) is initially degraded most quickly at its 3' UTR at 2.5 and 5 minutes, but by the 10-minute time-point is degraded more quickly at its middle and 3' coding regions. χ² goodness-of-fit tests show that the distributions of degradation ranks are highly non-random, with 5' regions more likely to be degraded quickly and 3' regions more likely to be degraded slowly. The complete clustering file, including gene names, is available at http://arep.med.harvard.edu/rna_decay/.
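[Figure 2: heat map of degradation ranks for each operon. Columns are grouped by transcript region (5' UTR, 5', M, 3', 3' UTR) and time point, and clusters a, b, and c are marked. The rank scale runs from 1 (unstable) to 5 (stable). Average rank by region: 2.4, 2.8, 2.9, 3.4, 3.4. Rank-distribution χ² p-values range from 2×10⁻⁶ to 2×10⁻³⁸.]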
[Figure 3 plot: average log2 ratio (y-axis) versus position in the dcw cluster (kb, x-axis) for the 2.5, 5, 10, and 20 min time points. Estimated half-lives (min) shown above the plot, in transcription order: mraZ 1.8, mraW 5 to 10, ftsL 6.7, ftsI 4.7, murE 10 to 20, murF N/D, mraY 10 to 20, murD N/D, ftsW >20, murG >20, murC >20, ddlB 8.9, ftsQ 10, ftsA 9.2, ftsZ 8.9, envA 8.3.]
Figure 3. High-resolution analysis of the dcw Gene Cluster Transcripts.
Average log2 ratios (y-axis) were plotted against operon position (x-axis) for 3-probe
sliding windows of the 156 probes in the dcw gene cluster, including 30 bases up- and
downstream of the first and last ORFs. The positions of known promoters (○) and the
ORFs with their estimated half-lives are given in the upper part of the graph. Arrows
indicate known RNase E processing sites. An additional weak promoter is thought to be
present in either murD or ftsW (Mengin-Lecreulx et al. 1998). Rho-independent
terminators are present in the 5' region of mraZ and immediately downstream of envA.
Degradation is fastest at the 5' end of the operon, with three apparent regions of distinct
degradation rates: the 5' region (mraZ-ftsI) is degraded the fastest, the middle
(murE-murC) is relatively stable, and the 3' region (ddlB-envA) is degraded at an
intermediate rate.
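The 3-probe sliding window used for this figure can be sketched in Python as shown below. This is an illustration under stated assumptions, not the GAPS code. Each window reports the mean position and the mean log2 ratio of three consecutive probes, and only full windows are returned, so 156 probes give 154 points.

```python
def sliding_window(positions, log2_ratios, window=3):
    """Average consecutive probes in windows of `window` (full windows only).

    Returns a list of (mean_position, mean_log2_ratio) tuples. Positions are
    assumed to be sorted along the transcript.
    """
    n = len(log2_ratios)
    if len(positions) != n:
        raise ValueError("positions and log2_ratios must have the same length")
    if not 1 <= window <= n:
        raise ValueError("window must be between 1 and the number of probes")
    smoothed = []
    for i in range(n - window + 1):
        pos = sum(positions[i:i + window]) / window
        val = sum(log2_ratios[i:i + window]) / window
        smoothed.append((pos, val))
    return smoothed


print(sliding_window([0, 30, 60, 90], [-1.0, -2.0, -3.0, -4.0]))
# [(30.0, -2.0), (60.0, -3.0)]
```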
[Figure 4 plot: signal intensity (average difference, AD; y-axis from -1000 to 8000) of each dcw ORF, mraZ through envA in transcription order, at 0, 2.5, 5, 10, and 20 min.]
Figure 4. Transcript Abundance and Degradation of the dcw Gene Cluster
The dcw gene cluster contains 16 ORFs involved in cell envelope biosynthesis and cell
division. Several promoters have been described (see Fig. 3) and it is likely that they are
all used, to varying extents. It has also been speculated that the cluster may sometimes be
transcribed in its entirety. The ORFs have been plotted in the order they are transcribed,
showing their array signal intensities (average differences) throughout the time-course.
Although average difference is only an approximate indicator of transcript abundance,
relatively high levels of steady state RNA are observed downstream of the mraZ and ddlB
promoters, at the 5' end and about two-thirds of the way into the transcript, respectively.
The middle portion of the operon has lower steady state RNA levels and is degraded
more slowly (see Fig. 3).
B#      Name      HL (min)     0 min 2.5 min   5 min  10 min  20 min
b4188   yjfN      0.8*          8782     744      67     -41    -270
b3605   lldD      0.9*          7031     770    1424     221    -109
b3914   cpxP(2)   1.0          10398    1812    3530     997     -11
b0990   cspG      1.1           6302    1324     935     389     105
b3913   cpxP(1)   1.1          10811    2352    3506     790      27
b0553   nmpC      1.2           4704    1147     742     187    -183
b2398   yfeC      1.2           5062    1218     614     122    -107
b3494   uspB      1.2*          4330     870     366     -33    -191
b3556   cspA      1.2          20403    4696    3056    1556     100
b3685   yidE      1.2*          4373     651     722     202    -150
b0726   sucA      1.3           4699    1236    1001     701    -161
b1205   ychH      1.3          11630    2964     692      67    -171
b3362   yhfG      1.3*          3959     884    1014     556    -114
b0162   cdaR      1.4*          3366     859     473     105     520
b1060   yceP      1.4          11780    3294    5286    1886    1374
b2080   yegP      1.4           5355    1567    2914    1769     922
b2377   yfdY      1.4           6525    1880    1189     944      53
b3361   fic       1.4           6270    1888    2035    1267      39
b4132   cadB      1.4           6923    2019    3046    2287     238
b4396   rob       1.4           4685    1339     734     -32    -322
Table 1. 20 most labile mRNAs
The twenty most labile mRNAs with their average difference (AD) intensities at each
time point. Twelve of the 20 have unknown or putative functions. High lability may
indicate regulation at the level of RNA stability. This is known to be the case for
cspA, which is extremely unstable at 37 °C but transiently stable after a shift to 15 °C
(Goldenberg et al. 1996). The lability of cspG suggests that it may behave similarly.
Numbers shaded in grey are below the 99% confidence detection threshold (see
methods). *Half-life represents an upper bound.
Avg. HL (min)  Operon
1.35           pabA fic yhfG
1.35           yfeC yfeD
1.65           cadA cadB cadC
1.75           deoC deoA deoB deoD
1.95           yhcH yhcI nanE nanT
2.05           ynfB speG
2.1            thrL thrA thrB thrC
2.2            sdhC sdhD sdhA sdhB
2.2            yjbQ yjbR
2.35           lacA lacY lacZ
2.4            folX yfcH
2.45           ybjC mdaA
2.47           nagD nagC nagA nagB
Table 2. Operons with average half-lives ≤ 2.5 minutes
A number of these unstable operons encode functions that are presumably unnecessary
in rich media, such as amino acid biosynthesis (thr, cad), alternative carbon sources (lac,
sdh), and nucleotide biosynthesis (deo). Underlining indicates half-lives used in the
average.
Functional category                             Experimental group       Representation  p-value
Putative enzymes                                HL ≤ 5 min               over            6.5×10⁻⁵
Translation, post-translational modification    HL ≤ 5 min               under           1.8×10⁻⁵
Energy metabolism                               10 min < HL < 20 min     over            5.4×10⁻³
Translation, post-translational modification    ORFs with measured HLs   over            1.8×10⁻²³
Hypothetical, unclassified, unknown             ORFs with measured HLs   under           1.3×10⁻⁸
Putative transport proteins                     ORFs with measured HLs   under           2.8×10⁻⁵
Table 3. Functional category representation of half-life groups
Several half-life groupings (HL ≤ 5 min, 5 < HL ≤ 10, 10 < HL ≤ 20, and HL > 20) were
tested for over- or under-representation of 23 different functional categories (Blattner et
al. 1997) relative to all genes whose half-lives were estimated. Categories that were over-
or under-represented in the set of all ORFs with measured half-lives were also identified.
P-values were calculated using the cumulative hypergeometric distribution (Tavazoie et
al. 1999). A 95% confidence level was achieved using a cutoff of 2.2×10⁻³ to account for
multiple hypotheses. Transcripts displaying no preference were those encoding proteins
involved in transport and binding; structure and membrane proteins; carbon compound
metabolism; amino acid and nucleotide biosynthesis and metabolism; central
intermediary metabolism; transcription; and post-transcriptional regulation. Despite the
rapid degradation of some well-studied genes (such as pnp, rhlB, and rho), genes involved
in RNA degradation as a whole were not significantly enriched in any half-life group.
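The enrichment test in this caption can be sketched with exact hypergeometric tails in Python. Here N is the number of background genes, K the number in a functional category, n the number in a half-life group, and k the overlap. These symbols and the function names are illustrative. The cutoff of 2.2×10⁻³ matches 0.05/23 ≈ 2.17×10⁻³, which is consistent with a Bonferroni correction over the 23 categories, although that reading is an assumption.

```python
from math import comb


def hypergeom_upper(N, K, n, k):
    """P(X >= k): chance of k or more category members among n genes drawn from N."""
    top = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1)) / comb(N, n)


def hypergeom_lower(N, K, n, k):
    """P(X <= k): used to test for under-representation."""
    bottom = max(0, n - (N - K))
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(bottom, k + 1)) / comb(N, n)


ALPHA = 0.05
N_CATEGORIES = 23
cutoff = ALPHA / N_CATEGORIES  # ~2.17e-3, reported as 2.2x10^-3

# Toy example: 10 genes, 4 in the category, 3 in the group, 2 overlapping.
print(hypergeom_upper(10, 4, 3, 2))  # 0.3333333333333333
print(hypergeom_lower(10, 4, 3, 1))  # 0.6666666666666666
print(round(cutoff, 4))  # 0.0022
```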
Acknowledgements
We thank Sidney Kushner for advice and the provision of mutants (not used in this
study). We thank Kenn Rudd and Joel Belasco for critical reviews of the manuscript. One
of the authors (DS) was graciously hosted in the lab of Minoru Kanehisa for part of this
work. This work was supported by grants from the NSF-MEXT Monbusho program,
Lipper Foundation, NSF, and DOE.
References
Aach, J., W. Rindone, and G.M. Church. 2000. Systematic Management and Analysis of
Yeast Gene Expression Data. Genome Res 10: 431-445.
Bernstein, J.A., A.B. Khodursky, P.H. Lin, S. Lin-Chao, and S.N. Cohen. 2002. Global
analysis of mRNA decay and abundance in Escherichia coli at single-gene
resolution using two-color fluorescent DNA microarrays. Proc Natl Acad Sci U S
A 99: 9697-9702.
Blattner, F.R., G. Plunkett, 3rd, C.A. Bloch, N.T. Perna, V. Burland, M. Riley, J.
Collado-Vides, J.D. Glasner, C.K. Rode, G.F. Mayhew, J. Gregor, N.W. Davis,
H.A. Kirkpatrick, M.A. Goeden, D.J. Rose, B. Mau, and Y. Shao. 1997. The
complete genome sequence of Escherichia coli K-12. Science 277: 1453-1474.
Cam, K., G. Rome, H.M. Krisch, and J.P. Bouche. 1996. RNase E processing of essential
cell division genes mRNA in Escherichia coli. Nucleic Acids Res 24: 3065-3070.
Campbell, E.A., N. Korzheva, A. Mustaev, K. Murakami, S. Nair, A. Goldfarb, and S.A.
Darst. 2001. Structural mechanism for rifampicin inhibition of bacterial RNA
polymerase. Cell 104: 901-912.
Cohen, B.A., R.D. Mitra, J.D. Hughes, and G.M. Church. 2000. A computational analysis
of whole-genome expression data reveals chromosomal domains of gene
expression. Nat Genet 26: 183-186.
de la Fuente, A., P. Palacios, and M. Vicente. 2001. Transcription of the Escherichia coli
dcw cluster: evidence for distal upstream transcripts being involved in the
expression of the downstream ftsZ gene. Biochimie 83: 109-115.
Dewar, S.J. and R. Dorazi. 2000. Control of division gene expression in Escherichia coli.
FEMS Microbiol Lett 187: 1-7.
Eisen, M.B., P.T. Spellman, P.O. Brown, and D. Botstein. 1998. Cluster analysis and
display of genome-wide expression patterns. Proc Natl Acad Sci U S A 95:
14863-14868.
Emory, S.A., P. Bouvet, and J.G. Belasco. 1992. A 5'-terminal stem-loop structure can
stabilize mRNA in Escherichia coli. Genes Dev 6: 135-148.
Flardh, K., T. Garrido, and M. Vicente. 1997. Contribution of individual promoters in the
ddlB-ftsZ region to the transcription of the essential cell-division gene ftsZ in
Escherichia coli. Mol Microbiol 24: 927-936.
Flardh, K., P. Palacios, and M. Vicente. 1998. Cell division genes ftsQAZ in Escherichia
coli require distant cis-acting signals upstream of ddlB for full expression. Mol
Microbiol 30: 305-315.
Goldenberg, D., I. Azar, and A.B. Oppenheim. 1996. Differential mRNA stability of the
cspA gene in the cold-shock response of Escherichia coli. Mol Microbiol 19: 241-248.
Grunberg-Manago, M. 1999. Messenger RNA stability and its role in control of gene
expression in bacteria and phages. Annu Rev Genet 33: 193-227.
Kapranov, P., S.E. Cawley, J. Drenkow, S. Bekiranov, R.L. Strausberg, S.P. Fodor, and
T.R. Gingeras. 2002. Large-scale transcriptional activity in chromosomes 21 and
22. Science 296: 916-919.
Lam, L.T., O.K. Pickeral, A.C. Peng, A. Rosenwald, E.M. Hurt, J.M. Giltnane, L.M.
Averett, H. Zhao, R.E. Davis, M. Sathyamoorthy, L.M. Wahl, E.D. Harris, J.A.
Mikovits, A.P. Monks, M.G. Hollingshead, E.A. Sausville, and L.M. Staudt.
2001. Genomic-scale measurement of mRNA turnover and the mechanisms of
action of the anti-cancer drug flavopiridol. Genome Biol 2.
Li, C. and W. Hung Wong. 2001. Model-based analysis of oligonucleotide arrays: model
validation, design issues and standard error application. Genome Biol 2.
Lockhart, D.J., H. Dong, M.C. Byrne, M.T. Follettie, M.V. Gallo, M.S. Chee, M.
Mittmann, C. Wang, M. Kobayashi, H. Horton, and E.L. Brown. 1996.
Expression monitoring by hybridization to high-density oligonucleotide arrays.
Nat Biotechnol 14: 1675-1680.
Mengin-Lecreulx, D., J. Ayala, A. Bouhss, J. van Heijenoort, C. Parquet, and H. Hara.
1998. Contribution of the Pmra promoter to expression of genes in the
Escherichia coli mra cluster of cell envelope biosynthesis and cell division genes.
J Bacteriol 180: 4406-4412.
Mohanty, B.K. and S.R. Kushner. 1999. Analysis of the function of Escherichia coli
poly(A) polymerase I in RNA metabolism. Mol Microbiol 34: 1094-1108.
Nilsson, G., J.G. Belasco, S.N. Cohen, and A. von Gabain. 1984. Growth-rate dependent
regulation of mRNA stability in Escherichia coli. Nature 312: 75-77.
Rauhut, R. and G. Klug. 1999. mRNA degradation in bacteria. FEMS Microbiol Rev 23:
353-370.
Regnier, P. and C.M. Arraiano. 2000. Degradation of mRNA in bacteria: emergence of
ubiquitous features. Bioessays 22: 235-244.
Rosenow, C., R.M. Saxena, M. Durst, and T.R. Gingeras. 2001. Prokaryotic RNA
preparation methods useful for high density array analysis: comparison of two
approaches. Nucleic Acids Res 29: E112.
Salgado, H., A. Santos-Zavaleta, S. Gama-Castro, D. Millan-Zarate, E. Diaz-Peredo, F.
Sanchez-Solano, E. Perez-Rueda, C. Bonavides-Martinez, and J. Collado-Vides.
2001. RegulonDB (version 3.2): transcriptional regulation and operon
organization in Escherichia coli K-12. Nucleic Acids Res 29: 72-74.
Sarkar, N. 1997. Polyadenylation of mRNA in prokaryotes. Annu Rev Biochem 66: 173-197.
Sawers, G. 2001. A novel mechanism controls anaerobic and catabolite regulation of the
Escherichia coli tdc operon. Mol Microbiol 39: 1285-1298.
Selinger, D.W., K.J. Cheung, R. Mei, E.M. Johansson, C.S. Richmond, F.R. Blattner,
D.J. Lockhart, and G.M. Church. 2000. RNA expression analysis using a 30 base
pair resolution Escherichia coli genome array. Nat Biotechnol 18: 1262-1268.
Shoemaker, D.D., E.E. Schadt, C.D. Armour, Y.D. He, P. Garrett-Engele, P.D.
McDonagh, P.M. Loerch, A. Leonardson, P.Y. Lum, G. Cavet, L.F. Wu, S.J.
Altschuler, S. Edwards, J. King, J.S. Tsang, G. Schimmack, J.M. Schelter, J.
Koch, M. Ziman, M.J. Marton, B. Li, P. Cundiff, T. Ward, J. Castle, M.
Krolewski, M.R. Meyer, M. Mao, J. Burchard, M.J. Kidd, H. Dai, J.W. Phillips,
P.S. Linsley, R. Stoughton, S. Scherer, and M.S. Boguski. 2001. Experimental
annotation of the human genome using microarray technology. Nature 409: 922-927.
Tavazoie, S., J.D. Hughes, M.J. Campbell, R.J. Cho, and G.M. Church. 1999. Systematic
determination of genetic network architecture. Nat Genet 22: 281-285.
Tjaden, B., D.R. Haynor, S. Stolyar, C. Rosenow, and E. Kolker. 2002. Identifying
operons and untranslated regions of transcripts using Escherichia coli RNA
expression analysis. Bioinformatics 18 Suppl 1: S337-S344.
Vicente, M., M.J. Gomez, and J.A. Ayala. 1998. Regulation of transcription of cell
division genes in the Escherichia coli dcw cluster. Cell Mol Life Sci 54: 317-324.
Wang, Y., C.L. Liu, J.D. Storey, R.J. Tibshirani, D. Herschlag, and P.O. Brown. 2002.
Precision and functional specificity in mRNA decay. Proc Natl Acad Sci U S A
99: 5860-5865.
Wegrzyn, A., A. Szalewska-Palasz, A. Blaszczak, K. Liberek, and G. Wegrzyn. 1998.
Differential inhibition of transcription from sigma70- and sigma32-dependent
promoters by rifampicin. FEBS Lett 440: 172-174.
Wu, Y. and P. Datta. 1995. Influence of DNA topology on expression of the tdc operon
in Escherichia coli K-12. Mol Gen Genet 247: 764-767.
Chapter 4
Conclusion
At the outset of this thesis, there was little doubt that microarray expression
analysis, which had already been successfully applied to eukaryotes, would eventually be
extended to prokaryotes. In the course of my doctoral research I developed experimental
methods generally useful in prokaryotic microarray analysis, as well as software tools
which enable subgenic resolution analysis of transcription (Selinger et al. 2000). The
global and high resolution nature of this approach led to a number of observations with
important implications for bacterial transcription and the elucidation of RNA decay
pathways. Here I consider some of these implications and speculate on future avenues of
research.
Widespread antisense transcription
The detection of widespread transcription from the antisense strand (relative to
known and predicted ORFs) was perhaps the most surprising result, and has been the
subject of much speculation. Antisense transcription was initially observed using
first-strand cDNA labeling of RNA from stationary-phase cells (Chapter 2) and later
confirmed using direct RNA labeling of log-phase cells; we detected transcription from the
antisense strand for >90% of the ORFs. The likelihood that this highly specific signal is an
artifact of incomplete removal of genomic DNA was reduced by treating isolated cellular
RNA with DNase I before labeling and by the subsequent lack of observable genomic DNA
on ethidium bromide staining. Additionally, antisense transcription was observed with two
different labeling protocols, one of which involved direct labeling of RNA.
[Figure 1 image: two array scans, labeled Sense (left) and Antisense (right).]
Figure 1. Hybridization of directly labeled RNA isolated from cells growing at log phase
to both a standard "sense" array (left) and a reverse complement "antisense" array (right)
to detect antisense transcription.
Comparison of hybridization of the same sample to both the "sense" and "antisense"
arrays (Fig. 1) reveals that signal on the sense array spans a wider range of intensities,
indicating that the corresponding transcripts are needed in differing amounts, whereas the
antisense signal tends to be relatively uniform and of lower intensity. (Note that the
intensity of the antisense array in this image was increased for clarity.)
The almost universal detection of antisense transcription at a low, and relatively
uniform level suggests that it may be, for the most part, transcriptional "noise" caused by
the imperfect initiation and/or termination of RNA polymerase. However, it is also likely
that at least some of these small RNAs have a bona fide biological function. In fact, small
RNAs are receiving a large amount of attention, recently being declared the
"Breakthrough of the Year" by Science (Couzin 2002), due in part to genome scale
screens like the one described here. Historically, this is an interesting return to the roots
of molecular biology, as Jacob and Monod initially proposed RNA as the likely
trans-acting factor in the transcriptional control of operons (Jacob and Monod 1961), before the
focus later shifted decisively to proteins. It is unclear precisely how many functional
small RNAs exist in E. coli, although many are known (Wassarman et al. 1999) and new
ones are continually being identified (Argaman et al. 2001; Eddy 2001; Wassarman et al.
2001).
There are several ways to assess the biological significance of small RNAs. CsrB,
a known small untranslated RNA, was inadvertently represented in the wrong orientation
on the sense array, and in a stationary phase experiment (Chapter 2, Fig. 3) was clearly
detected on an antisense array at a signal intensity far above any other. This suggests that
signal intensity may be a useful indicator of biological significance. Also, the presence
of upstream consensus promoter sequences, evolutionary conservation, or known RNA
hairpin structures may be good indicators of biological function.
Transcriptional or post-transcriptional regulation of small RNAs across
timecourses, conditions, or mutants would also strongly suggest a biological function.
Our RNA degradation analysis found that ORFs annotated as "hypothetical, unclassified,
unknown" had an increased likelihood of showing no change in RNA levels throughout a
rifampicin timecourse (Chapter 3, Table 3), suggesting that many of these
computationally predicted ORFs are erroneous. This observation, together with the
possibility of widespread transcriptional noise, highlights the need for ORF- and
small-RNA-finding algorithms with higher specificity. The widespread transcription
we observe should also sound a cautionary note that many verifiably transcribed RNAs
may not have a biological function, and that further tests of functionality are necessary
before a firm conclusion can be drawn.
RNA decay pathways
In 1973, Apirion proposed that mRNA decay in E. coli involves both endo- and
exonucleolytic events (Apirion 1973). These were assumed to be part of a ribonucleotide
salvage pathway, and the possibility that RNA stability may play a role in gene regulation
was not originally considered. As with many areas of molecular biology, the picture is
considerably more complicated now than when it was initially envisioned. E. coli is
known to contain at least 5 endo- and 8 exoribonucleases, and RNA decay is thought to
be a carefully controlled process which, in many cases, plays a gene regulatory role
(Grunberg-Manago 1999; Kushner 2002). Although far more attention has been paid to
gene regulation at the level of transcriptional initiation, RNA stability is emerging as an
important determinant of gene activity. Gene expression is a dynamic process, and much
can be learned by attending to both the creation and the destruction of this key
intermediate in the flow of cellular information.
Before the advent of microarrays, the decay of fewer than 25 bacterial RNAs had
ever been studied (Bernstein et al. 2002). While these careful studies have led to a
wealth of information about the mechanisms of RNA degradation, it has always been
difficult to draw general conclusions from such a small sampling of the transcriptome.
Traditionally, transcriptome-wide measurements lacked gene level detail, and more
detailed analyses only applied to a small number of transcripts at a time. In the RNA
degradation study described in this thesis, transcriptome-wide coverage was combined
with subgenic level detail to allow a preliminary glimpse into the global patterns of RNA
degradation.
RNA, being a directionally oriented linear polymer, can be degraded in many
distinct ways: from the 5' end, the middle, the 3' end, or some combination thereof.
Different patterns have different biological consequences, making some intuitively more
favorable than others. For example, a 3' → 5' directionality would cause the degradation
machinery to run against the translating ribosomes and would allow the translation of
many incomplete peptide products, both presumably poor adaptations. In contrast, a 5' →
3' directionality would inactivate transcripts faster, produce no incomplete proteins,
and work co-directionally with translation. Thus, it has been hypothesized that
most transcripts are degraded in a 5' → 3' direction, a proposal that has been shown
rigorously for β-gal mRNA (Cannistraro and Kennell 1985; Cannistraro et al. 1986;
Kennell 2002; Kushner 2002). Notably, examples of 3' → 5' directionality are also
known, suggesting that multiple pathways of RNA decay exist (Arraiano et al. 1997; von
Gabain et al. 1983). Interestingly, E. coli does not appear to have a 5' → 3' exonuclease,
and the net 5' → 3' directionality is thought to result from an endonuclease (RNase E/G)
which initiates degradation at the 5' end and then successively jumps to more 3' sites
(Regnier and Arraiano 2000). This step is thought to be closely followed by a 3' → 5'
exonuclease digestion (RNase II, PNPase) which quickly degrades the resulting
fragments into oligoribonucleotides, which in turn, are degraded to mononucleotides by
oligoribonuclease (Ghosh and Deutscher 1999).
The results presented in Chapter 3 confirm and further generalize the 5' → 3'
directionality of RNA degradation in E. coli as well as identify a significant number of
transcripts which do not conform to the model. Hierarchical clustering of the degradation
patterns, in addition to confirming that the 5' → 3' mechanism predominates, also
revealed a number of transcripts which appear to have very stable 5' UTRs (Chap. 3, Fig.
2, cluster a) and some cases in which the region most targeted by RNases changes
midway through the timecourse (Chap. 3, Fig. 2, cluster b). Furthermore, although the
degradation of bulk mRNA was exponential, our data suggest that the degradation of
many individual transcripts is not (Appendix C). The transcripts with unusual degradation
patterns highlighted by this analysis would make excellent targets for follow-up studies,
as they may be processed by alternative degradation pathways. Further subgenic
resolution microarray studies using various mutants of the RNA decay pathway would
give invaluable information about the genes responsible for the patterns observed. It
would also be interesting to see if RNA sequence or structural motifs could be associated
with particular patterns of degradation. While this has been found for individual
transcripts (Emory et al. 1992), initial genome-wide searches have so far been
unsuccessful (Bernstein et al. 2002).
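The Python sketch below makes the contrast between exponential and non-exponential decay concrete. It fits first-order decay to one transcript's time course by least squares on log2 signal, so that the slope equals -1/t½. Systematic curvature in the residuals is one way a non-exponential transcript would show up. The fitting procedure actually used in this thesis is not described in this chapter, so this method is an assumption, and the function name is hypothetical. Non-positive signals, which occur for background-subtracted intensities, cannot be log-transformed and are rejected here.

```python
from math import log2


def fit_half_life(times, signals):
    """Fit log2(signal) = a - t / t_half by ordinary least squares.

    Returns (t_half, residuals). Assumes first-order (exponential) decay.
    """
    if len(times) != len(signals) or len(times) < 2:
        raise ValueError("need at least two paired time points")
    if any(s <= 0 for s in signals):
        raise ValueError("log fit requires positive signals")
    ys = [log2(s) for s in signals]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    slope = sxy / sxx  # log2 units per minute; equals -1 / t_half
    if slope >= 0:
        raise ValueError("no decay detected")
    intercept = y_mean - slope * t_mean
    residuals = [y - (intercept + slope * t) for t, y in zip(times, ys)]
    return -1.0 / slope, residuals


# A perfectly exponential toy transcript that halves every 2 minutes.
t_half, resid = fit_half_life([0, 2, 4, 6], [1000, 500, 250, 125])
print(round(t_half, 6))  # 2.0
```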
Final thoughts
Microarray analysis provides a powerful combination of exhaustiveness and
detail, allowing systematic surveys of transcription and RNA decay. The analyses
described in this thesis led to the discovery of widespread antisense transcription,
generated large-scale kinetic data on RNA processing, and provided evidence supporting
the generalization of the current model for RNA decay. The empirical mapping of
transcriptional boundaries was also explored, and its utility in transcriptome mapping was
later demonstrated in eukaryotic expression studies (Kapranov et al. 2002; Shoemaker et
al. 2001).
Functional genomic data provide abundant raw material for hypothesis generation
at both the single-gene and genome-wide levels. Specific hypotheses can be tested either
with traditional approaches, or by independent and/or more refined large scale
measurements. With the increasing amount of quantitative data, hypotheses are beginning
to include computational models. These models can be retested as new datasets become
available, and refined as necessary. Computational modeling is slowly becoming more
accessible to the average biologist and may eventually become as widespread and useful
as tools like BLAST (Altschul et al. 1990). The generation, in recent years, of astounding
quantities of diverse biological information, combined with the ever increasing power of
computers, heralds a new age in the understanding, and perhaps engineering, of
biological systems.
References
Altschul, S.F., W. Gish, W. Miller, E.W. Myers, and D.J. Lipman. 1990. Basic local
alignment search tool. J Mol Biol 215: 403-410.
Apirion, D. 1973. Degradation of RNA in Escherichia coli. A hypothesis. Mol Gen Genet
122: 313-322.
Argaman, L., R. Hershberg, J. Vogel, G. Bejerano, E.G. Wagner, H. Margalit, and S.
Altuvia. 2001. Novel small RNA-encoding genes in the intergenic regions of
Escherichia coli. Curr Biol 11: 941-950.
Arraiano, C.M., A.A. Cruz, and S.R. Kushner. 1997. Analysis of the in vivo decay of the
Escherichia coli dicistronic pyrF-orfF transcript: evidence for multiple
degradation pathways. J Mol Biol 268: 261-272.
Bernstein, J.A., A.B. Khodursky, P.H. Lin, S. Lin-Chao, and S.N. Cohen. 2002. Global
analysis of mRNA decay and abundance in Escherichia coli at single-gene
resolution using two-color fluorescent DNA microarrays. Proc Natl Acad Sci U S
A 99: 9697-9702.
Cannistraro, V.J. and D. Kennell. 1985. Evidence that the 5' end of lac mRNA starts to
decay as soon as it is synthesized. J Bacteriol 161: 820-822.
Cannistraro, V.J., M.N. Subbarao, and D. Kennell. 1986. Specific endonucleolytic
cleavage sites for decay of Escherichia coli mRNA. J Mol Biol 192: 257-274.
Couzin, J. 2002. Breakthrough of the year. Small RNAs make big splash. Science 298:
2296-2297.
Eddy, S.R. 2001. Non-coding RNA genes and the modern RNA world. Nat Rev Genet 2:
919-929.
Emory, S.A., P. Bouvet, and J.G. Belasco. 1992. A 5'-terminal stem-loop structure can
stabilize mRNA in Escherichia coli. Genes Dev 6: 135-148.
Ghosh, S. and M.P. Deutscher. 1999. Oligoribonuclease is an essential component of the
mRNA decay pathway. Proc Natl Acad Sci U S A 96: 4372-4377.
Grunberg-Manago, M. 1999. Messenger RNA stability and its role in control of gene
expression in bacteria and phages. Annu Rev Genet 33: 193-227.
Jacob, F. and J. Monod. 1961. Genetic regulatory mechanisms in the synthesis of
proteins. J Mol Biol 3: 318-356.
Kapranov, P., S.E. Cawley, J. Drenkow, S. Bekiranov, R.L. Strausberg, S.P. Fodor, and
T.R. Gingeras. 2002. Large-scale transcriptional activity in chromosomes 21 and
22. Science 296: 916-919.
Kennell, D. 2002. Processing endoribonucleases and mRNA degradation in bacteria. J
Bacteriol 184: 4645-4657; discussion 4665.
Kushner, S.R. 2002. mRNA decay in Escherichia coli comes of age. J Bacteriol 184:
4658-4665; discussion 4657.
Regnier, P. and C.M. Arraiano. 2000. Degradation of mRNA in bacteria: emergence of
ubiquitous features. Bioessays 22: 235-244.
Selinger, D.W., K.J. Cheung, R. Mei, E.M. Johansson, C.S. Richmond, F.R. Blattner,
D.J. Lockhart, and G.M. Church. 2000. RNA expression analysis using a 30 base
pair resolution Escherichia coli genome array. Nat Biotechnol 18: 1262-1268.
Shoemaker, D.D., E.E. Schadt, C.D. Armour, Y.D. He, P. Garrett-Engele, P.D.
McDonagh, P.M. Loerch, A. Leonardson, P.Y. Lum, G. Cavet, L.F. Wu, S.J.
Altschuler, S. Edwards, J. King, J.S. Tsang, G. Schimmack, J.M. Schelter, J.
Koch, M. Ziman, M.J. Marton, B. Li, P. Cundiff, T. Ward, J. Castle, M.
Krolewski, M.R. Meyer, M. Mao, J. Burchard, M.J. Kidd, H. Dai, J.W. Phillips,
P.S. Linsley, R. Stoughton, S. Scherer, and M.S. Boguski. 2001. Experimental
annotation of the human genome using microarray technology. Nature 409: 922-927.
von Gabain, A., J.G. Belasco, J.L. Schottel, A.C. Chang, and S.N. Cohen. 1983. Decay of
mRNA in Escherichia coli: investigation of the fate of specific segments of
transcripts. Proc Natl Acad Sci U S A 80: 653-657.
Wassarman, K.M., F. Repoila, C. Rosenow, G. Storz, and S. Gottesman. 2001.
Identification of novel small RNAs using comparative genomics and microarrays.
Genes Dev 15: 1637-1651.
Wassarman, K.M., A. Zhang, and G. Storz. 1999. Small RNAs in Escherichia coli.
Trends Microbiol 7: 37-45.
Appendix A
RNA expression analysis using a 30 base pair resolution
Escherichia coli genome array
Douglas W. Selinger, Kevin J. Cheung, Rui Mei, Eric M. Johansson, Craig S. Richmond,
Frederick R. Blattner, David J. Lockhart, and George M. Church
Original publication format, Nature Biotechnology 18(12): 1262-68 (2000).
Appendix B
Genome Array Processing Software (GAPS)
GAPS©: Genome Array Processing Software©
Developed by Doug Selinger (selinger@fas.harvard.edu)
Thesis Advisor: George M. Church, Ph.D.
Harvard Medical School, Dept. of Genetics
last updated March 2001
This package is designed for the analysis of Affymetrix E. coli oligonucleotide
arrays as described in Selinger et al., Nature Biotechnology 18:1262-8 (2000). It
can be used in conjunction with Affymetrix's GeneChip© software package and
extends its functionality in several ways:
- More flexibility in the way the analysis is done
- An algorithm which corrects for varying signal in different regions of the chip
- High resolution analysis, allowing data to be analyzed oligo by oligo
- Handles replicates and multiple timepoints
- Automatic annotation of results
- All output files can easily be opened and analyzed in Excel
- Detection of transcripts is decided by a simple statistical test
We are grateful for the tremendous foresight that Affymetrix has shown in
permitting the public release of all of the E. coli chip oligo sequences. This has
allowed expression to be monitored at an unprecedented resolution, with on
average one probe every 30 bases throughout the entire genome. We feel that
high resolution expression profiling is the logical next step in the evolution of high
density DNA array experiments and we thank Affymetrix for making it possible.
Note to users: This package is not an Affymetrix product, and therefore their
technical help line will not answer questions about it. All questions should be
directed to Doug Selinger. I have devoted considerable effort to making this
package usable by the general biological community and its use does not require
any knowledge of programming. However, its proper use does require thorough
reading of this manual, as it is not the type of program that can be figured out as
you go along. So that supporting this package does not become my full-time occupation, I am setting some ground rules on the questions I will respond
to. I will not answer:
i) questions which I feel are already answered clearly in this manual
ii) questions about Perl (setup or use), Excel, or other programs which
might be used in conjunction with GAPS©, or
iii) general computer questions not specifically related to the use of
GAPS©
All other comments and questions are welcome.
I. Outline of package components
preGAPS© - This script takes a .CEL file output from GeneChip© and outputs a
file (.adf) in which the following processing has been done:
- background is subtracted
- mismatch (MM) features are subtracted from perfect match (PM) features
- the result is multiplied by a scaling factor to correct for differences between
chips
- the result is multiplied by a correction factor for varying fluorescence within the
same chip. (This typically varies 2-3 fold across the chip surface.)
Note: The background and scaling factors must be entered by the user and can
be taken from a GeneChip© analysis or by another method.
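The per-feature arithmetic that preGAPS© applies can be sketched as follows. This is an illustrative Python sketch, not the Perl source, and the function name is hypothetical. Background and scaling factor are single user-supplied numbers as described above. The within-chip correction factor is assumed to be supplied per feature, since this section does not describe how GAPS© derives it.

```python
def pregaps_values(pm, mm, background, scaling_factor, correction):
    """Process parallel lists of PM, MM, and per-feature correction factors.

    Steps, in the order listed for preGAPS: subtract background, subtract MM
    from PM, multiply by the chip scaling factor, multiply by the spatial
    correction factor for that feature.
    """
    if not (len(pm) == len(mm) == len(correction)):
        raise ValueError("pm, mm, and correction must have the same length")
    processed = []
    for p, m, c in zip(pm, mm, correction):
        # With one chip-wide background value, this subtraction cancels in PM - MM;
        # it is kept only to mirror the listed steps.
        diff = (p - background) - (m - background)
        processed.append(diff * scaling_factor * c)
    return processed


print(pregaps_values([500, 300], [200, 350], 100, 2.0, [1.0, 1.5]))
# [600.0, -150.0]
```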
GAPS© - This script is the heart of the package and takes a set of .adf files
(see above) and averages those which are replicates and compares multiple
conditions/timepoints. It has two separate analysis modes which can be used
together or separately:
- ORF summary: Each predicted ORF is assigned a single fluorescence
intensity and these are reported for each condition along with annotation.
- High resolution analysis: The intensity of every oligo is reported across
all chips and all conditions. This is output as two large tab-delimited files
(one for the Watson strand and one for the Crick strand), which can be opened in
a text editor so that portions can be copied into Excel. Alternatively, portions of
these files can be retrieved by genome position using GAPScan© (see
below).
GAPScan© - This script is used to analyze sections of high resolution files
output by GAPS©. The user enters the region of the genome and the desired
strand, and the program returns all oligos and predicted ORFs in that range
along with all associated data. A high resolution plot of the oligos along the
genome can then be generated with Excel.
II. Getting Started
System requirements
- Windows, Linux, or Macintosh with at least 128 MB of physical RAM (256 MB
recommended). The large memory requirement is due to the large amount of
data which needs to be analyzed - each chip has almost 300,000 oligos.
- Perl needs to be installed on the user's machine before these scripts can be
used. There are versions of Perl available free for all major platforms:
Windows, Linux, Solaris - ActivePerl
http://www.activestate.com/Products/ActivePerl/Download.html
Macintosh - MacPerl
http://www.macperl.com/
For those unfamiliar with Perl, it is a relatively simple to use but powerful
programming language which can be used on any platform. I have provided the
source code (since Perl is runtime-compiled) which you are free to modify. At the
beginning of each script there is a portion of code which contains the user-definable parameters. In most cases, these must be modified with user-specific
information, such as file locations, etc. I have tried to make this portion of the
program well-annotated so that non-programmers will be comfortable modifying
these parameters. They can be modified in any text editor, as long as the
resulting file is saved as plain text and ends in .pl so it is recognized as a Perl
script. (I use the EditPlus text editor for Windows, and illustrations in this
manual are screenshots from that program, using its default color-coding
settings.)
III. PreGAPS©
Command line:
program_name filename background scalingfactor chiptype
Example:
pregaps.pl my_chip.CEL 100 2 s
where:
program_name = pregaps.pl
filename = my_chip.CEL (the .CEL file created by GeneChip©, placed in the
"rawdata" folder.)
background = 100 units
scaling factor = 2
chip type = "s" or blank for sense, "a" for antisense
Chip type: Currently Affymetrix only sells "sense" chips. A "sense" chip contains
oligos which will bind to mRNA. An "antisense" chip contains oligos which will
hybridize to the complement of mRNA, such as cDNA.
Be sure that .CEL files are placed in the "rawdata" folder (and not in a subfolder) so that preGAPS© can find them. The .CEL files used in Selinger et al,
Nature Biotechnology 2000 can be found on our web site. GAPS© is set up to do
an analysis of these files as a demonstration. Normalization parameters for
preGAPS© are provided in the "ref" folder.
The background and scaling factors can be derived from a .CHP file generated
with GeneChip©. The .CHP file was saved as a text file and then opened in
Word. The background (underlined, highlighted in red) and scaling factor
(underlined, highlighted in blue) are at the top of the file.
These parameters can also be found by opening the .CHP file in GeneChip© and
going to View -> Probe Info (which will include the background) and then View ->
Parameters (which will include the scaling factor, SF).
Output
This program outputs a file named after the .CEL file with the extension .adf
appended (for example, my_chip.CEL.adf; .adf stands for "adjusted difference file"). These .adf files are needed as input for
GAPS©. The following is an example of an .adf file:
At the top of the file are the chip type, background, and scaling factor values that
were used by preGAPS©. The columns give the x and y coordinates of each PM
feature and the fluorescence intensity values, after background subtraction, of
the PM and MM features. The Difference, up to a small rounding error, is given
by the following equation: (PM - MM) * scaling factor * correction factor. The
correction factor depends on the intensity of local control features and
corrects for uneven fluorescence intensity across the surface of the chip, as
explained in the following section.
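The per-feature arithmetic above can be sketched as follows. This is a minimal illustration, not the preGAPS© Perl source. The function name and arguments are hypothetical, and the spatial correction factor is taken as a given input.

```python
def adjusted_difference(pm_raw, mm_raw, background, scaling_factor, correction_factor):
    """Return (PM, MM, Difference) for one probe pair, as reported in an .adf file.

    pm_raw, mm_raw:    raw .CEL intensities of the perfect-match and mismatch features
    background:        user-supplied background (e.g. read from a .CHP file)
    scaling_factor:    user-supplied chip-to-chip scaling factor
    correction_factor: spatial correction factor for this feature's location
    """
    pm = pm_raw - background  # background-subtracted PM intensity
    mm = mm_raw - background  # background-subtracted MM intensity
    difference = (pm - mm) * scaling_factor * correction_factor
    return pm, mm, difference


# Background 100 and scaling factor 2 match the pregaps.pl example;
# the raw intensities and the correction factor of 1.1 are made up.
pm, mm, diff = adjusted_difference(1500.0, 700.0, 100.0, 2.0, 1.1)
print(pm, mm, round(diff, 1))  # 1400.0 600.0 1760.0
```

The same background is subtracted from both PM and MM, so it cancels in the Difference. It only changes the PM and MM columns reported in the file.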
Spatial Correction
The array contains a regularly spaced 10 x 10 grid of control feature pairs which
all hybridize to the same control oligonucleotide, and should thus be of equal
intensity. However, we found that fluorescence intensity of these features
typically varied about 2- to 3-fold across the surface of the array, possibly because
of local differences in washing/staining efficiencies. The following is a graph of
the variation in the signal from these control features across the surface of the
chip for two log phase replicates (Note: the data to create these graphs is
automatically generated and can be found in the control_grid folder with a
_controls suffix):
[Figure: intensity of the control features plotted against their X and Y positions on the chip for two log phase replicates (panels: Log Phase 1, Log Phase 2).]
To correct for this spatial variation, the control grid was used to estimate local
deviations in fluorescence intensity. First, the two features of each control pair were averaged.
Then experimental features were multiplied by a correction factor derived from
the control features, which represent the relative brightness of the region.
Control features closer to the probe pair contributed more to the final correction
factor than distant ones. This correction factor was determined by the following
equation:
$$\text{Correction Factor} = \frac{\bar{c}}{\displaystyle\sum_{i=1}^{4} \left( \frac{1/d_i}{\sum_{j=1}^{4} 1/d_j} \right) c_i}$$
where $d_i$ (or $d_j$) is the Euclidean distance from the PM feature to control feature $i$ (or $j$) among the 4 closest
control features, $c_i$ is the intensity of control feature $i$, and $\bar{c}$ is the mean of
all control features on the array.
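A minimal sketch of this inverse-distance weighting follows. It reads the equation as the array-wide control mean divided by a weighted average of the four nearest controls, and it assumes each control pair has already been averaged into one intensity with its position given as (x, y) chip coordinates. The function name and data layout are hypothetical. The special case of a feature lying exactly on a control is an added assumption.

```python
import math


def correction_factor(feature_xy, controls, global_mean):
    """Spatial correction factor for one PM feature (inverse-distance weighting).

    feature_xy:  (x, y) chip coordinates of the PM feature
    controls:    list of ((x, y), intensity), one entry per averaged control pair
    global_mean: mean intensity of all control features on the array (c-bar)
    """
    # Keep the 4 control features closest to this feature (Euclidean distance).
    nearest = sorted((math.dist(feature_xy, xy), c) for xy, c in controls)[:4]
    # Assumption: a feature lying exactly on a control just uses that control.
    for d, c in nearest:
        if d == 0:
            return global_mean / c
    inv_sum = sum(1.0 / d for d, _ in nearest)
    # Local brightness: the weights (1/d_i) / sum_j (1/d_j) add up to 1.
    local = sum((1.0 / d) / inv_sum * c for d, c in nearest)
    # Dim regions (local < global mean) get a factor above 1, bright ones below 1.
    return global_mean / local


controls = [((0, 5), 1000.0), ((10, 5), 1000.0), ((5, 0), 3000.0),
            ((5, 10), 3000.0), ((100, 100), 99999.0)]  # the last control is too far away to count
print(round(correction_factor((5, 5), controls, 2000.0), 6))  # 1.0
```

With this structure, a feature whose neighbourhood is exactly as bright as the array average is left unchanged. Closer controls dominate the local estimate because their weights grow as 1/d.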
In the Log Phase 1 vs Log Phase 2 example above, where the control features
showed a large amount of variation between arrays, the correction factor reduced
the average coefficient of variation (coefficient of variation = standard deviation /
mean) from 0.66 to 0.28. In Stationary Phase 1 vs. Stationary Phase 2, where
the control features showed a similar pattern between arrays, there was a
probably insignificant reduction from 0.53 to 0.52. These numbers were
calculated from the 2max of ORF probe sets, which are located on the top half of
the arrays.
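For reference, an average coefficient of variation like the ones quoted above can be computed as follows. This sketch assumes one summary value (such as 2max) per probe set on each of two replicate arrays. It uses the sample standard deviation; which form was used for the numbers above is not stated. The function name is hypothetical.

```python
import statistics


def mean_cv(replicate_1, replicate_2):
    """Average coefficient of variation (sd / mean) across probe sets.

    replicate_1, replicate_2: per-probe-set values (e.g. 2max) from two
    replicate arrays, listed in the same probe-set order.
    """
    cvs = []
    for a, b in zip(replicate_1, replicate_2):
        m = statistics.mean([a, b])
        if m > 0:  # CV is not meaningful for non-positive means; skip those probe sets
            cvs.append(statistics.stdev([a, b]) / m)
    return statistics.mean(cvs)


print(round(mean_cv([100.0, 200.0], [100.0, 600.0]), 3))  # 0.354
```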
IV. GAPS©
Command line: program_name output_file_name
Example: gaps.pl my_analysis
where:
program_name = gaps.pl
output_file_name = my_analysis
Here are the user-definable parameters, found in the file "parameters.txt" in the "ref"
folder:
############# List of files to be processed
$input_file_list = "input_LPvsSPvsgDNA";
This points GAPS© to a file which contains all of the names and paths
of pre-GAPS© files to be used in the analysis. This is where the user
assigns pre-GAPS© files to timepoints/conditions. The format for this
file is as follows:
header <tab> timepoint 1 name <cr>
filename 1 <cr>
filename 2 <cr>
header <tab> timepoint 2 name <cr>
filename 3 <cr>
filename 4 <cr>
where <tab> is the tab key, <cr> is carriage return, and the word "header"
is typed literally, i.e. not substituted by a name.
Example
This file is provided in the "ref" folder and is setup to analyze the .CEL files
from Selinger et al Nature Biotechnology 2000 after they have been
processed by preGAPS©.
header	LP
pdata/pregaps_output_files/log1.CEL.adf
pdata/pregaps_output_files/log2.CEL.adf
header	SP
pdata/pregaps_output_files/stat1.CEL.adf
pdata/pregaps_output_files/stat2.CEL.adf
header	gDNA
pdata/pregaps_output_files/gDNA.CEL.adf
Note there should be no extra lines or carriage returns at the beginning or
end of the file and the file should be plain text only. The example
reference file tells GAPS© that the first condition has two pre-GAPS© files
(replicates) which are both for the condition named LP (log phase). The
second condition contains two replicates for the condition named SP
(stationary phase) and the last contains only a single pre-GAPS© file
named gDNA (genomic DNA). The number of conditions and chips
GAPS© can handle is limited only by available RAM. A rule of thumb for
memory requirements is ~30 MB/chip. Many conditions require more
memory than many replicates.
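The input list format described above is simple to parse. The following hypothetical sketch is not the GAPS© code. It reads such a file into a mapping from condition name to .adf paths. Python dictionaries preserve insertion order, so the first condition stays first, which matches GAPS©'s use of the first condition as the baseline.

```python
def parse_input_list(text):
    """Parse a GAPS input list into {condition name: [.adf paths]}, in file order.

    A line 'header<TAB>name' starts a new condition; every other line is a path
    belonging to the most recent condition. Blank lines are rejected, since the
    format does not allow extra lines.
    """
    conditions = {}
    current = None
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            raise ValueError(f"line {line_no}: blank line")
        if line.startswith("header\t"):
            current = line.split("\t", 1)[1].strip()
            if current in conditions:
                raise ValueError(f"line {line_no}: duplicate condition {current!r}")
            conditions[current] = []
        elif current is None:
            raise ValueError(f"line {line_no}: file listed before the first header")
        else:
            conditions[current].append(line.strip())
    return conditions


example = (
    "header\tLP\n"
    "pdata/pregaps_output_files/log1.CEL.adf\n"
    "pdata/pregaps_output_files/log2.CEL.adf\n"
    "header\tSP\n"
    "pdata/pregaps_output_files/stat1.CEL.adf\n"
    "pdata/pregaps_output_files/stat2.CEL.adf\n"
    "header\tgDNA\n"
    "pdata/pregaps_output_files/gDNA.CEL.adf"
)
for name, files in parse_input_list(example).items():
    print(name, len(files))  # LP 2, then SP 2, then gDNA 1
```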
############# Choice of Analyses
$orf_summary = 1;
Generate summary report of ORFs? 1 = yes, 0 = no. (Default = 1) This
tells GAPS© whether or not to create a report describing changes in
relative ORF abundances.
$hires_file = 1;
Generate high resolution file? 1 = yes, 0 = no. (Default = 1) This tells
GAPS© whether or not to run an oligo by oligo analysis. This allows the
user to later use GAPScan© to analyze expression results at high
resolution. Two large files (usually > 5 MB) are generated, one for the Watson
strand and one for the Crick strand. These files will not be affected by
changing ORF summary parameters (see below) so high resolution
analysis only needs to be done once per dataset, even if multiple ORF
summary analyses are done.
############# ORF Summary Parameters:
$cutoff_sdevs = 3;
the number of standard deviations above the negative controls to consider
a transcript present (Default = 3). The negative controls are any B. subtilis
control RNAs which were NOT spiked in by the user. The user must tell
GAPS© which control RNAs were not spiked by appropriately setting the
$neg_start and $neg_end parameters (see below).
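The present/absent call amounts to a threshold on the summary metric. Below is a minimal sketch with a hypothetical function name. It assumes the sample standard deviation of the negative-control values and a strict inequality; neither detail is specified here.

```python
import statistics


def detection_call(value, negative_values, cutoff_sdevs=3):
    """Return "P" (present) or "A" (absent) for one probe set.

    value:           the probe set's summary metric (e.g. 2max), before any
                     negative-control subtraction
    negative_values: the same metric computed for each negative-control probe set
    cutoff_sdevs:    corresponds to $cutoff_sdevs
    """
    mean = statistics.mean(negative_values)
    sd = statistics.stdev(negative_values)
    threshold = mean + cutoff_sdevs * sd
    return "P" if value > threshold else "A"


negatives = [10.0, 20.0, 30.0]  # mean 20, sd 10, so the 3-SD threshold is 50
print(detection_call(60.0, negatives), detection_call(40.0, negatives))  # P A
```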
$rank_report = 2;
$last_rank = 2;
the range of the ranked probe pairs which should represent the probe set
(Default: $rank_report = 2, $last_rank = 2). The fifteen probe pairs are
ranked from brightest (1) to dimmest (15) and a subset of these is used to
measure the abundance of the transcript. Setting both to 2 gives the
"2max" (second maximal) and setting both to 8 gives the median. A range
of ranks can also be used, for example setting them to 2 and 14 will give
the mean of the 2nd to 14th ranks, "2-14max". 2-14max is analagous to
Affymetrix's "average difference" measurement. 2-8max also works well
as it takes the upper half of the signal distribution. Whichever metric is
used is also applied to the negative controls to measure the amount of
signal which that metric can be expected to give at random. This is then
used to call the transcripts present or absent, according to the standard
deviation cutoff defined in $cutoff_sdevs above.
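The rank-based summary can be sketched as below. The function name is hypothetical, and ties are simply broken by the sort. With the 15 probe pairs of a probe set, (2, 2) gives 2max, (8, 8) the median, and (2, 14) the 2-14max mean.

```python
def rank_metric(probe_pair_values, rank_report=2, last_rank=2):
    """Summarize a probe set by its ranked probe pairs.

    Ranks run from 1 (brightest) downward; the result is the mean of ranks
    rank_report..last_rank inclusive ($rank_report and $last_rank).
    """
    if not 1 <= rank_report <= last_rank <= len(probe_pair_values):
        raise ValueError("need 1 <= rank_report <= last_rank <= number of probe pairs")
    ranked = sorted(probe_pair_values, reverse=True)  # brightest first
    chosen = ranked[rank_report - 1:last_rank]
    return sum(chosen) / len(chosen)


values = [float(v) for v in range(1, 16)]  # 15 made-up PM-MM values
print(rank_metric(values), rank_metric(values, 8, 8), rank_metric(values, 2, 14))  # 14.0 8.0 8.0
```

The same function would be applied to the negative-control probe sets so that their summary values are directly comparable with the ORFs.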
$use_ttest = 1;
"1" uses t-test to assign significance to changes, "0" doesn't (uses change
cutoff regardless)(Default = 1) A t-test for significant changes can only be
done when a single probe pair is used, such as 2max or median.
$c_value = 4.303;
critical value for t-test (decides significant changes). With 2 degrees of
freedom, use 4.303 for 95% confidence. The degrees of freedom, and
thus the critical value, will change depending on the number of replicates
of each condition/timepoint. Consult a critical value table for Student's t-test.
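The t-test criterion can be sketched as follows. The manual does not say which form of Student's t-test GAPS© uses. This sketch assumes the pooled-variance two-sample test, whose n1 + n2 - 2 degrees of freedom equal 2 for duplicate chips in each condition, consistent with the default critical value of 4.303. The function names are hypothetical.

```python
import math
import statistics


def students_t(base, other):
    """Pooled-variance two-sample Student's t statistic for other vs. base replicates."""
    n1, n2 = len(base), len(other)
    if n1 < 2 or n2 < 2:
        raise ValueError("each condition needs at least two replicates")
    pooled = ((n1 - 1) * statistics.variance(base)
              + (n2 - 1) * statistics.variance(other)) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    diff = statistics.mean(other) - statistics.mean(base)
    if se == 0:  # identical replicates: any difference gives an infinite t
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / se


def significant_by_ttest(base, other, c_value=4.303):
    """True if |t| exceeds the critical value ($c_value)."""
    return abs(students_t(base, other)) > c_value


lp, sp = [10.0, 12.0], [20.0, 22.0]  # made-up duplicate 2max values
print(round(students_t(lp, sp), 2), significant_by_ttest(lp, sp))  # 7.07 True
```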
$change_cutoff = 11;
number of probe pairs which must be different (by any amount) for
change to be considered significant (Default = 11). The fifteen
ranked probe pairs are compared between two conditions, rank 1 to
rank 1, rank 2 to rank 2, etc., regardless of whether they are
actually the same probe pair (although they usually are). If this is
set at 11, then 11 of these must be larger or smaller in one
condition for the change to be considered significant. Using a
$change_cutoff of 11, the algorithm correctly assigned significant
changes to 52/52 of probe sets for control RNAs spiked at known
concentration changes, all of which had fold changes of at least 2-fold. Probe sets for control RNAs spiked at equal concentrations
showed no significant changes (0/16).
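The rank-by-rank comparison behind $change_cutoff can be sketched as below. Each condition is sorted brightest-first on its own, so rank 1 is compared with rank 1 even when the two come from different probe pairs, as the text describes. The function name is hypothetical. Ties count toward neither condition, which is an assumption.

```python
def significant_by_rank_comparison(base_pairs, other_pairs, change_cutoff=11):
    """True if at least change_cutoff ranks are larger in one condition than the other."""
    if len(base_pairs) != len(other_pairs):
        raise ValueError("conditions must have the same number of probe pairs")
    ranked_base = sorted(base_pairs, reverse=True)
    ranked_other = sorted(other_pairs, reverse=True)
    higher_in_other = sum(1 for b, o in zip(ranked_base, ranked_other) if o > b)
    higher_in_base = sum(1 for b, o in zip(ranked_base, ranked_other) if b > o)
    return max(higher_in_base, higher_in_other) >= change_cutoff


print(significant_by_rank_comparison([100.0] * 15, [200.0] * 15))               # True
print(significant_by_rank_comparison([100.0] * 15, [200.0] * 10 + [50.0] * 5))  # False
```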
############ ORF Summary Parameters not needed for GAPS©
version 1.1 or higher. See important note below*.
$neg_start = 17;
$neg_end = 20;
These parameters define which probe sets to use as negative controls.
These MUST correspond to control B. subtilis probe sets for which NO
RNA was spiked. These probe sets allow GAPS© to decide the
distribution of signal expected when there is no RNA present. All signal at
the negative controls is assumed to be nonspecific and is used as a
baseline to determine which transcripts have specific signal and should
thus be called "present." There are 20 control probe sets corresponding
to 5 different B. subtilis RNAs: 1-4 are dap, 5-8 are lys, 9-12 are phe, 13-16 are thr, and 17-20 are trp. If no control RNAs were spiked, set
$neg_start = 1 and $neg_end = 20. If all but trp RNA were spiked, set
$neg_start = 17 and $neg_end = 20. Currently the choices must be
consecutive, so you cannot, for example, use dap and trp but not lys as negative
controls. The user must be certain to correctly define the negative
controls, or the ORF summary analysis will be meaningless.
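For GAPS© version 1.0, a small helper can check the mapping from unspiked control RNAs to $neg_start and $neg_end. This is a hypothetical sketch, not part of GAPS©. It uses every unspiked RNA as a negative control and rejects non-consecutive choices, following the rules above.

```python
CONTROL_RNAS = ["dap", "lys", "phe", "thr", "trp"]  # probe sets 1-4, 5-8, 9-12, 13-16, 17-20


def negative_control_range(spiked):
    """Return ($neg_start, $neg_end) covering all B. subtilis control RNAs NOT spiked in."""
    unknown = set(spiked) - set(CONTROL_RNAS)
    if unknown:
        raise ValueError(f"unknown control RNA(s): {sorted(unknown)}")
    unspiked = [i for i, name in enumerate(CONTROL_RNAS) if name not in spiked]
    if not unspiked:
        raise ValueError("no unspiked control RNAs are left to use as negative controls")
    if unspiked != list(range(unspiked[0], unspiked[-1] + 1)):
        raise ValueError("unspiked controls are not consecutive; choose a consecutive subset")
    # Each RNA has 4 probe sets, numbered from 1.
    return 4 * unspiked[0] + 1, 4 * unspiked[-1] + 4


print(negative_control_range([]))                           # (1, 20)
print(negative_control_range(["dap", "lys", "phe", "thr"]))  # (17, 20)
```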
*Version 1.1 of GAPS© (GAPS1-1.pl) uses 90 control probe sets which are
present on the E. coli chip but not in the standard MG1655 genome. These
include genes from bacteriophage lambda and several plasmid encoded genes.
These probe sets are located from coordinates 421,81 to 152,87. Please check
that this region of the chip doesn't contain any genes which you expect to be
present in your experiment. If it does, we recommend using GAPS© version 1.0
(gaps.pl). This change was made so that more negative controls can be included
(90 vs. a maximum of 20) and to make the negative controls independent of the
B. subtilis controls, which are then freed for use as positive controls and/or for
spiking experiments. These two parameters ($neg_start and $neg_end) are
ignored by GAPS© version 1.1 (but absolutely necessary for GAPS© version
1.0).
GAPS© Output Files
A. ORF Summary Output
There are two output files produced by the ORF summary analysis. These are
identified by the file suffixes:
_rpt = These are the report files which contain a summary of how all transcripts
behaved in the experiments.
_info = A summary of the negative controls, including their mean, standard
deviation, the resulting detection cutoff, number of probe sets used, and the
metric used (such as rank 2 or ranks 2 to 14).
The following is an example of the _rpt output from the ORF summary analysis
opened in Excel:
This is data from a comparison of log vs. stationary phase, sorted by bnumber.
All data is output as tab-delimited text files which are easily imported into Excel.
We'll step through this data file column by column. Some of the column headers
are cut off due to space constraints.
Bnumber: The unique number assigned to all ORFs in the Blattner annotations.
Gene: The common name of the ORF according to the Blattner annotations.
LP Intensity: The fluorescence intensity of condition "LP" (log phase) using the
metric selected in the parameters (discussed above). The fluorescence of the
negative controls is subtracted from this number, so that numbers greater than 0
are brighter than the negative controls. The same summary metric is used for
the negative controls as for the ORFs so they can be compared. For example, if
GAPS© is set to average probe pairs of ranks 2 - 14, it will do the same for the
negative controls.
LP sdev*: The standard deviation of LP replicates. If the condition contains only
one chip, this will be zero.
LP detection: Whether the transcript is considered present (P) or absent (A). In
order to be considered present the intensity must be a certain number of
standard deviations (as defined by the $cutoff_sdevs parameter) above the
negative controls.
SP Intensity, sdev, and detection: Same as above but for the condition labeled
SP (stationary phase).
The following columns are comparisons to the first condition (LP) which is taken
as the baseline condition.
SP absolute change: SP Intensity - LP Intensity
SP fold change sig: If the transcript is present in both conditions, this will be
blank. If it is absent in one of the conditions it will have a greater-than (>) or
less-than (<) sign indicating that the fold change in the following column is
expected to be an over- or under-estimate, respectively. The estimate is made
by replacing the intensity of the absent transcript by the intensity of the negative
controls + (the standard deviation * the number of standard deviations specified
in the $cutoff_sdevs parameter). If both transcripts are absent, it will be scored
"not determined" (ND).
SP fold change: SP/LP. When the transcript is absent in one condition this will
be an estimate (see above), and if absent in both conditions it will be scored "not
determined" (ND).
SP t-test*: The t statistic from a Student's t-test comparing the difference
between the SP and LP intensities. This will only be meaningful when SP and
LP are done in replicate.
SP call: This scores whether the transcript has changed and whether this
change is considered significant according to the user's parameters. The possible
scores are increased (I), significantly increased (SI), decreased (D), significantly
decreased (SD), or not determined (ND).
This column deserves a little more elaboration. First of all, for the change to be
considered for significance it must have been called present in at least one
condition. Otherwise, it is scored ND. If it has been detected in at least one
condition the change will be called significant if it meets at least one of the
following two criteria: i) It passes a Student's t-test using the critical value
supplied in the $c_value parameter. This test will only be performed if the
$use_ttest parameter is set to 1. ii) When the probes at each rank are
compared, at least a certain number of probes (defined in the $change_cutoff
parameter) are greater in one condition than in the other, by any amount.
RNA type: Whether the RNA represents a coding sequence (CDS), ribosomal
RNA (rRNA), transfer RNA (tRNA), or miscellaneous RNA (misc_RNA).
Length (bases): Length of the RNA in bases.
Annotation: According to the Blattner annotations.
*This column will not be present if the summary metric chosen is an average of
multiple ranks.
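The fold-change estimate and the change call described above can be sketched as follows. This is a hypothetical illustration. Here detection_limit stands for the negative-control intensity plus $cutoff_sdevs standard deviations, in the same units as the intensities passed in. The sketch states the direction of the possible bias in words rather than reproducing the report's > and < symbols. How GAPS© treats exact ties in the call is not stated; here they fall under "D".

```python
def fold_change(base, other, base_present, other_present, detection_limit):
    """Estimate other/base (e.g. SP/LP) following the report's rules for absent calls.

    Returns (fold, bound): bound is None when both transcripts are present,
    "upper" when the estimate can only overstate the true ratio (other absent),
    and "lower" when it can only understate it (base absent).
    """
    if not base_present and not other_present:
        return "ND", None
    if base_present and other_present:
        return other / base, None
    if not other_present:
        # The true 'other' value is at most the detection limit.
        return detection_limit / base, "upper"
    # The true 'base' value is at most the detection limit.
    return other / detection_limit, "lower"


def change_call(base, other, present_in_either, ttest_significant, rank_significant):
    """Return "ND", "I", "SI", "D", or "SD" for other relative to base."""
    if not present_in_either:
        return "ND"
    direction = "I" if other > base else "D"  # ties fall under "D" (assumption)
    return "S" + direction if (ttest_significant or rank_significant) else direction


print(fold_change(100.0, 300.0, True, True, 50.0))   # (3.0, None)
print(fold_change(100.0, 10.0, True, False, 50.0))   # (0.5, 'upper')
print(change_call(100.0, 300.0, True, False, True))  # SI
```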
GAPS© can handle any number of conditions, as long as there is enough
available RAM. Additional columns, of the same types as those above, will be
added automatically and all comparisons are done to the first condition, which is
considered the baseline.
B. High Resolution Analysis
Note that high resolution output files contain oligo sequences which are
copyrighted by Affymetrix and their use is subject to certain terms and conditions.
This analysis generates two high resolution files and one negative control summary file:
_wat = The Watson strand
_crk = The Crick strand
_neg = A summary of the negative control features
These files contain all of the oligos on the chip, in the order they are found in the
genome (5' to 3' for Watson, 3' to 5' for Crick), and all of the data for each oligo.
For this reason these files tend to be large and cannot be opened in Excel.
However, they can be opened in a text editor and regions of interest can be selected
and pasted into Excel. GAPScan©, the last script in the GAPS© package, can
select regions of interest automatically.
The _neg file gives detailed information on the negative controls, oligo by oligo. It gives
the chip position (x,y coordinates), PM-MM for each chip, the average of
replicates, and standard deviation. Then, for each condition, it reports the mean
and standard deviation of all of the negative control probe pairs on the chip. If
there are replicates then this is the mean of all the probe pair averages.
The _wat and _crk files give a high resolution oligo by oligo analysis of
transcription. Here is an example of a section of the Watson strand:
- The "Name" column is the name given to the probe set by GAPS©. In this case
the first one is ig_ds_b4403 which means this intergenic region (ig_) is
downstream (ds_) of b4403, the last predicted ORF. For Crick files the intergenic
annotation denotes the gene the region is upstream of.
- The "Affy_ID" column is the name given to this probe set by Affymetrix so that it
can be found with Affymetrix's GeneChip© software.
- The "Oligo" column has the oligo sequence.
- Then follow the x and y coordinates of the feature on the chip, and the 5' and 3'
ends of the ORF in the genome, and the position of the center of the oligo in the
genome.
- Then follow the intensities of each oligo on each chip (LP1, LP2, etc), the mean
of replicates (LP Average), number of standard deviations above the negative
controls (LP sig - note that the data on the negative controls can be found in the
_neg file), the standard deviation of the LP duplicates (LP SD) and finally
markers for the 5' and 3' end of each ORF to make it easier to include the ORF
boundaries in a plot of the oligo data. The following is a sample graph generated
in Excel with the above data:
[Figure: b0001 - Threonine Leader Peptide. Intensity (PM-MM) versus Genome Position, showing LP Average, SP Average, and the ORF 5' and ORF 3' end markers.]
V. GAPScan©
GAPScan© is a "reader" for the high resolution files which are output from
GAPS©. It was used to select and organize the data for the above graph. The
user specifies the high resolution file to use, the region of the genome to report,
and the strand, and GAPScan© searches through the files and returns all
corresponding oligos and data.
The command line format is as follows:
gapscan.pl datafile locusname ORF_start ORF_end strand
Example
gapscan.pl LPvsSP my_fav_locus 100 2000 w
where datafile is the base name of the high resolution file generated by
GAPS© (leave off the "_crk.txt" or "_wat.txt" suffix).
This will run the program GAPScan.pl on the LPvsSP high resolution analysis file
and output the information from the Watson strand between genome positions
100 and 2000 to a file named "my_fav_locus". If the strand is left out, the
program outputs both strands for the specified region.
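The region query that GAPScan© performs can be sketched as a filter on oligo positions. This is a hypothetical Python illustration, not the GAPScan© Perl code. It assumes each oligo row has already been read into a dictionary with a numeric genome position under a key named "center" (the real column header may differ). It treats start > end as a query spanning the origin and returns the part before the origin first.

```python
def select_region(rows, start, end, position_key="center"):
    """Return oligo rows whose genome position lies between start and end (inclusive).

    rows: one dict per oligo, in genome order.
    If start > end, the query is taken to span the origin of the circular genome.
    """
    if start <= end:
        return [r for r in rows if start <= r[position_key] <= end]
    # Origin-spanning query: from start to the end of the genome,
    # then from the beginning of the genome to end.
    tail = [r for r in rows if r[position_key] >= start]
    head = [r for r in rows if r[position_key] <= end]
    return tail + head


rows = [{"center": p} for p in (50, 150, 500, 1500, 2500, 4600000)]
print([r["center"] for r in select_region(rows, 100, 2000)])     # [150, 500, 1500]
print([r["center"] for r in select_region(rows, 4000000, 200)])  # [4600000, 50, 150]
```

In the origin-spanning case, positions jump from the end of the genome back to its start, so genome coordinates in the output are no longer monotonic. This mirrors the warning above about such queries.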
The program will work for queries which begin and end on opposite sides of the
origin, but will give warnings about how the resulting output may differ from other
queries, such as the file being out of order.
A datafile listing predicted operons and their locations is provided in the "ref"
folder for the user's convenience. This file is a list of operons predicted by Julio
Collado-Vides and colleagues (in modified form). The most current list can be
found at RegulonDB:
http://tula.cifn.unam.mx:8850/regulondb/regulon_intro.frameset
Reference
Salgado, et al. RegulonDB (version 3.0): transcriptional regulation and operon
organization in Escherichia coli K-12. Nucleic Acids Res. 2000 Jan 1;28(1):65-67.
Acknowledgements
I thank George Church for giving me the opportunity to work with such an
exciting emerging technology, Jeremy Edwards, Dan Janse, and Vasudeo
Badarinarayana for being the first users of my software and suggesting valuable
features which have since been included, Adnan Derti for BLASTing all the oligos
on the chip against the genome and providing me with the results, and the rest of
the Church lab for the support and advice which greatly aided the development of
these software tools.
We thank Affymetrix for supplying us with their technology before it was publicly
available and for their foresightedness in allowing release of the E. coli chip oligo
sequences, which make possible this software package, as well as many future
analyses. We hope Affymetrix will continue in this spirit of openness by keeping
.CEL files openly readable and by considering the release of more chip oligo
sequences so that researchers may get the fullest possible benefit from their
data.
Appendix C
Measured half-lives of 2,679 Escherichia coli mRNAs
Half-life calculation by non-linear least squares curve fitting
In order to improve the statistical rigor and accuracy of the half-life calculation,
the data for each RNA was fitted to an exponential function of the form A = A0·e^(kt) using
a non-linear least squares algorithm implemented in MATLAB (function nlinfit in the
Statistics Toolbox, which uses the Gauss-Newton method). The function nlparci was
then used to estimate 95% confidence intervals for the two parameters (A0,low to A0,hi
for A0, and klow to khi for k), which were then used to calculate lower and upper bounds
of the half-lives (for exponential decay, the half-life is ln 2/(-k)).
Half-lives were calculated for 2,679 RNAs for which A0,low was positive (95%
confidence of detection) and khi was negative (95% confidence of decrease). Half-lives
are reported in minutes. Figure 1 plots the 907 half-lives calculated by both this method and the 2-fold method.
Transcripts which fall along the line fit an exponential degradation pattern,
whereas those that fall off the line do not. Those above the line degrade more slowly at
the beginning of the timecourse (over their first 2-fold change) and more quickly later on.
Those below the line have the opposite pattern.
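As an illustration of the fitting step, the sketch below fits A = A0·e^(kt) by plain Gauss-Newton iterations, starting from a log-linear fit. It then converts k to a half-life with ln 2/(-k). It is not the MATLAB code used here and does not reproduce nlparci's confidence intervals. Instead, half_life_bounds shows how an interval [klow, khi] for k maps to lower and upper half-life bounds. All function names are hypothetical.

```python
import math


def fit_exponential(t, a, iterations=50):
    """Fit a = a0 * exp(k * t) by Gauss-Newton nonlinear least squares.

    The starting point comes from a straight-line fit of ln(a) against t,
    using only the positive measurements.
    """
    pts = [(ti, math.log(ai)) for ti, ai in zip(t, a) if ai > 0]
    mt = sum(ti for ti, _ in pts) / len(pts)
    my = sum(yi for _, yi in pts) / len(pts)
    k = (sum((ti - mt) * (yi - my) for ti, yi in pts)
         / sum((ti - mt) ** 2 for ti, _ in pts))
    a0 = math.exp(my - k * mt)
    for _ in range(iterations):
        # Accumulate J^T J and J^T r for the parameters (a0, k).
        jtj = [[0.0, 0.0], [0.0, 0.0]]
        jtr = [0.0, 0.0]
        for ti, ai in zip(t, a):
            e = math.exp(k * ti)
            r = ai - a0 * e              # residual
            jac = (e, a0 * ti * e)       # d(model)/d(a0), d(model)/d(k)
            for p in range(2):
                jtr[p] += jac[p] * r
                for q in range(2):
                    jtj[p][q] += jac[p] * jac[q]
        det = jtj[0][0] * jtj[1][1] - jtj[0][1] * jtj[1][0]
        if det == 0:
            break
        # Solve the 2x2 normal equations (J^T J) step = J^T r by Cramer's rule.
        step_a0 = (jtr[0] * jtj[1][1] - jtj[0][1] * jtr[1]) / det
        step_k = (jtj[0][0] * jtr[1] - jtj[1][0] * jtr[0]) / det
        a0 += step_a0
        k += step_k
        if abs(step_a0) <= 1e-12 * abs(a0) and abs(step_k) <= 1e-12:
            break
    return a0, k


def half_life(k):
    """Half-life ln(2) / -k, in the time units of t; defined only for decay (k < 0)."""
    if k >= 0:
        raise ValueError("half-life requires k < 0")
    return math.log(2) / -k


def half_life_bounds(k_low, k_high):
    """(HL min, HL max) from a confidence interval [k_low, k_high] with k_high < 0."""
    return half_life(k_low), half_life(k_high)


times = [0.0, 2.5, 5.0, 10.0, 20.0]  # minutes (made-up timecourse)
signal = [100.0 * math.exp(-0.1 * ti) for ti in times]
a0, k = fit_exponential(times, signal)
print(round(a0, 3), round(k, 4), round(half_life(k), 2))  # 100.0 -0.1 6.93
```

Because the half-life is ln 2/(-k), an upper confidence limit khi close to zero produces a very large HL max. This is consistent with the wide upper bounds seen for some transcripts in the table.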
[Figure 1. Exponential Fit vs. 2-Fold Method: 2-fold half-life (min) plotted against exponential fit half-life (min), with a line of slope = 1; R = 0.57.]
b0074
b0076
b0077
b0078
b0080
b0081
b0082
b0083
b0084
b0085
b0086
b0087
b0088
b0089
b0090
b0091
b0092
b0093
b0095
b0096
b0097
b0098
b0100
b0102
b0103
b0104
b0105
b0106
b0109
b0110
b0111
b0112
b0114
b0115
b0116
b0118
b0119
b0120
b0121
b0122
b0123
b0125
b0126
b0127
b0128
b0129
b0130
b0131
b0132
b0133
b0134
b0143
Measured half-lives for 2,679 E. coli
mRNAs, with best estimate (HL) and 95%
confidence lower (HL min) and upper (HL max)
bounds, in minutes.
B#      Gene     HL      HL min   HL max
b0002   thrA     5.8     3.3      23.7
b0003   thrB     5.1     2.6      237.2
b0008   talB     11.4    9.4      14.4
b0009   mog      7.6     4.9      16.3
b0014   dnaK     15.4    8.7      63.2
b0015   dnaJ     13.1    8.6      27.3
b0016   yi81_1   13.4    7.6      58.6
b0019   nhaA     8.3     5.5      16.4
b0020   nhaR     6.6     3.6      42.5
b0022   insA_1   10.2    7.0      18.8
b0023   rpsT     10.1    5.5      56.3
b0025   ribF     13.8    8.2      43.2
b0026   ileS     9.6     5.4      43.7
b0027   lspA     12.6    7.0      61.6
b0028   slpA     13.3    7.2      92.7
b0030   yaaF     11.5    6.5      51.1
b0031   dapB     13.3    6.8      211.2
b0036   caiD     17.8    8.9      1641.7
b0037   caiC     19.5    10.9     92.1
b0038   caiB     8.7     6.1      15.7
b0039   caiA     2.5     1.5      8.5
b0043   fixC     10.5    5.9      43.8
b0045   yaaU     15.1    9.1      43.2
b0048   folA     2.4     1.4      7.0
b0049   apaH     10.1    5.7      43.2
b0050   apaG     7.3     5.7      10.4
b0051   ksgA     7.1     5.4      10.5
b0052   pdxA     6.7     3.7      31.1
b0053   surA     7.5     6.0      10.0
b0054   imp      6.4     4.4      11.6
b0055   yabH     7.6     5.6      11.9
b0056   yabP     8.7     6.3      13.8
b0057   yabQ     11.4    7.2      28.1
b0058   yabO     16.2    9.6      50.9
b0059   hepA     9.9     6.8      18.4
b0062   araA     20.5    11.1     132.7
b0063   araB     12.2    7.4      35.7
b0064   araC     11.4    7.0      31.1
b0065   yabI     11.6    6.6      46.0
b0067   yabK     7.8     4.3      39.1
b0069   yabN     9.4     5.7      26.6
b0070   yabM     2.5     1.5      7.3
b0071   leuD     8.7     5.1      27.5
b0072   leuC     11.8    7.2      33.8
b0073   leuB     13.9    9.3      27.6
leuA
leuO
ilvI
ilvH
fruR
yabB
yabC
ftsL
ftsI
murE
murF
mraY
murD
ftsW
murG
murC
ddlB
ftsQ
ftsZ
lpxC
yacA
secA
yacF
yacE
guaC
hofC
nadC
ampD
ampE
aroP
aceE
aceF
lpdA
acnB
yacL
speD
speE
yacC
yacK
hpt
yadF
yadG
yadH
yadI
yadE
panD
yadD
panC
panB
pcnB
13.4
9.1
5.5
2.7
3.9
3.1
4.4
6.1
6.3
10.3
11.5
11.2
13.6
9.8
11.6
10.0
7.9
9.0
11.3
9.0
8.5
12.7
12.2
8.1
7.4
15.3
11.0
8.2
7.8
3.0
7.4
12.2
11.7
15.0
12.2
9.0
2.6
12.9
6.0
5.1
16.9
11.1
6.3
11.4
11.5
4.0
14.9
5.6
3.7
10.6
5.0
8.8
8.4
5.9
3.8
1.7
3.1
2.4
3.4
4.8
5.0
7.2
7.9
7.6
8.8
6.5
7.7
6.9
5.3
5.6
6.2
6.1
4.4
7.5
6.6
6.0
4.7
10.6
5.7
5.7
5.5
2.6
5.6
8.5
7.8
8.2
8.4
7.4
1.9
9.1
4.2
4.0
10.3
6.3
4.3
8.1
7.5
3.0
8.8
4.1
2.6
7.2
3.9
5.7
33.0
19.9
9.8
5.8
5.3
4.3
6.0
8.4
8.6
18.1
21.3
21.7
29.7
19.8
23.6
18.4
16.0
23.9
59.8
16.8
89.2
42.1
73.9
12.5
17.2
27.6
153.9
14.3
13.4
3.6
11.0
21.3
23.9
91.5
22.0
11.6
4.0
21.7
10.3
7.3
47.5
49.1
12.3
19.2
24.1
5.9
49.5
8.6
6.7
19.8
7.0
19.2
b0145
b0146
b0147
b0148
b0149
b0150
b0151
b0152
b0153
b0154
b0155
b0156
b0159
b0162
b0163
b0164
b0165
b0166
b0167
b0168
b0169
b0170
b0171
b0172
b0173
b0174
b0175
b0176
b0177
b0178
b0179
b0180
b0181
b0182
b0183
b0184
b0185
b0186
b0187
b0188
b0190
b0191
b0192
b0194
b0195
b0196
b0197
b0198
b0200
b0207
b0208
b0209
dksA
sfsA
yadP
hrpB
mrcB
fhuA
fhuC
fhuD
fhuB
hemL
yadQ
yadR
pfs
yaeG
yaeH
yaeI
dapD
glnD
map
rpsB
tsf
pyrH
frr
yaeM
yaeS
cdsA
yaeL
yaeT
hlpA
lpxD
fabZ
lpxA
lpxB
rnhB
dnaE
accA
ldcC
yaeR
mesJ
yaeQ
yaeJ
cutF
proS
yaeB
rcsF
yaeC
yaeE
yaeD
yafB
yafC
yafD
7.5
5.6
3.6
6.9
14.5
8.3
9.4
11.2
13.7
9.5
5.5
6.5
7.6
1.9
10.6
8.4
7.9
7.9
7.5
6.5
8.9
9.8
7.2
7.3
13.3
8.0
6.4
9.1
10.0
17.5
9.8
8.4
12.7
7.7
15.0
10.0
12.6
4.7
4.8
11.9
7.1
7.3
6.1
10.0
12.1
5.7
8.1
6.5
9.0
10.5
4.0
2.6
5.9
3.9
2.5
5.1
10.4
5.0
6.3
6.1
8.0
5.2
4.2
4.4
5.3
1.1
6.8
5.4
5.4
5.9
5.8
5.4
7.0
7.6
6.0
5.4
8.1
5.7
4.4
6.8
8.1
10.2
6.8
6.5
8.8
4.9
9.6
7.6
8.8
3.7
3.7
8.0
5.6
5.3
4.8
6.9
7.2
3.7
4.1
3.4
6.9
6.5
2.1
2.0
10.5
10.3
6.8
10.7
23.7
24.9
18.0
66.4
48.5
57.6
7.9
12.1
13.8
6.4
24.5
19.3
14.4
12.0
10.7
8.4
12.1
13.9
9.2
11.4
36.9
13.5
11.5
13.6
13.1
58.8
17.7
12.1
22.4
17.6
33.6
14.6
22.2
6.6
6.8
22.9
9.8
11.8
8.3
18.3
38.8
12.2
275.3
67.8
12.8
28.6
121.9
3.8
b0210
b0212
b0213
b0214
b0215
b0219
b0220
b0222
b0223
b0224
b0225
b0226
b0227
b0228
b0231
b0232
b0234
b0235
b0237
b0238
b0239
b0240
b0241
b0242
b0243
b0249
b0250
b0251
b0254
b0255
b0257
b0258
b0260
b0261
b0265
b0267
b0268
b0269
b0271
b0275
b0276
b0280
b0281
b0287
b0288
b0300
b0304
b0305
b0306
b0307
b0308
b0311
yafE
gloB
yafS
rnhA
dnaQ
yafV
ykfE
gmhA
yafJ
yafK
yafQ
dinJ
yafL
yafM
dinP
yafN
yafP
pepD
gpt
yafA
crl
phoE
proB
proA
ykfF
ykfB
yafY
perR
yi91a
ykfC
ykfD
yagD
insA_2
yagA
yagE
yagF
yagH
insA_3
yagJ
yagN
intF
yagU
ykgJ
ykgA
ykgC
ykgD
ykgE
ykgF
ykgG
betA
9.3
8.7
6.1
6.5
5.3
2.1
9.5
5.2
7.9
6.0
3.0
2.4
8.4
3.9
4.4
1.9
12.6
10.7
9.9
8.4
4.4
9.0
14.1
7.0
5.0
18.3
4.8
5.6
10.0
10.7
8.9
9.3
7.9
1.7
9.7
12.3
4.5
12.3
15.4
9.7
5.4
4.7
8.1
3.5
9.7
10.9
12.0
4.9
4.8
4.9
6.2
4.3
6.1
5.7
4.4
4.7
3.5
1.1
6.8
3.9
6.1
4.1
2.2
1.8
5.3
2.8
2.9
1.3
7.5
6.2
7.2
5.1
2.5
4.8
8.2
4.7
3.4
10.8
3.4
3.0
6.8
5.7
6.5
4.7
5.3
1.2
6.9
7.7
2.9
7.7
8.0
6.9
3.3
2.7
5.2
2.6
5.3
6.6
6.4
3.6
3.5
3.8
5.2
2.2
20.0
17.9
9.9
10.6
10.5
22.3
15.5
7.8
10.9
11.1
5.0
3.5
19.2
6.3
9.4
3.4
40.2
37.9
15.9
23.1
19.7
78.6
51.9
13.5
9.6
61.2
8.0
39.3
18.7
82.7
14.3
359.1
15.9
2.7
16.6
30.3
9.7
29.4
231.1
16.6
14.6
18.9
17.8
5.3
56.6
30.3
105.3
7.4
7.5
7.1
7.9
57.9
b0313
b0314
b0315
b0318
b0319
b0320
b0322
b0325
b0328
b0329
b0331
b0332
b0333
b0335
b0338
b0342
b0343
b0344
b0346
b0347
b0349
b0352
b0353
b0354
b0356
b0357
b0358
b0362
b0366
b0369
b0371
b0376
b0380
b0381
b0382
b0384
b0385
b0386
b0387
b0388
b0389
b0390
b0391
b0393
b0394
b0395
b0397
b0398
b0399
b0401
b0402
b0403
betI
betT
yahA
yahD
yahE
yahF
yahH
yahK
yahN
yahO
prpB
prpC
prpE
cynR
lacA
lacY
lacZ
mhpR
mhpA
mhpC
mhpE
mhpT
yaiL
adhC
yaiN
yaiO
tauB
hemB
yaiT
yaiH
ddlA
yaiB
psiF
yaiC
proC
yaiI
aroL
yaiA
aroM
yaiE
yaiD
yajF
sbcC
sbcD
phoB
brnQ
proY
malZ
3.1
8.5
8.5
10.3
14.7
8.0
8.9
11.9
2.3
13.7
14.5
8.8
3.7
9.8
6.8
2.4
2.0
2.1
5.3
6.7
6.7
8.0
4.2
7.5
9.7
9.0
19.1
8.3
9.2
8.4
13.1
7.4
3.3
6.1
3.3
5.2
17.8
9.5
9.5
8.3
10.9
10.9
9.6
9.2
4.8
7.6
18.6
14.2
7.4
8.4
13.8
11.5
2.0
5.7
4.9
6.2
8.0
4.1
6.4
8.1
1.7
7.7
9.4
5.4
2.1
6.4
4.2
1.9
1.6
1.6
2.7
4.2
3.6
4.7
2.7
5.9
6.7
5.2
10.0
4.2
6.4
5.9
6.9
5.0
2.5
4.6
2.6
3.6
10.3
6.2
6.1
5.2
6.6
5.5
6.8
6.0
3.4
6.5
9.4
7.7
4.2
4.6
7.5
8.3
6.9
17.0
31.4
31.4
88.8
252.9
14.8
21.9
3.5
61.1
31.8
23.2
13.7
21.3
17.3
3.3
2.7
3.3
626.5
17.1
44.6
26.0
10.1
10.5
17.5
36.6
239.4
353.6
16.5
14.5
123.5
14.9
4.6
9.0
4.6
9.4
66.7
21.1
20.8
20.2
31.3
811.6
16.3
19.2
7.8
9.3
592.8
90.7
28.9
45.2
83.7
18.9
b0404
b0405
b0406
b0407
b0408
b0409
b0410
b0411
b0413
b0414
b0415
b0416
b0417
b0418
b0419
b0420
b0421
b0422
b0423
b0424
b0425
b0426
b0427
b0428
b0429
b0430
b0431
b0432
b0433
b0434
b0435
b0436
b0437
b0438
b0439
b0440
b0441
b0442
b0443
b0444
b0445
b0449
b0452
b0453
b0454
b0456
b0457
b0458
b0459
b0460
b0461
b0462
yajB
queA
tgt
yajC
secD
secF
yajD
tsx
ybaD
ribD
ribH
nusB
thiL
pgpA
yajO
dxs
ispA
xseB
yajK
thiJ
apbA
yajQ
yajR
cyoE
cyoD
cyoC
cyoB
cyoA
ampG
yajG
bolA
tig
clpP
clpX
lon
hupB
ybaU
ybaV
ybaW
ybaX
ybaE
mdlB
tesB
ybaY
ybaZ
ybaA
ylaB
ylaC
ylaD
hha
ybaJ
acrB
12.1
7.8
8.9
8.7
7.7
10.3
7.8
5.7
4.7
4.2
9.6
9.5
20.1
4.3
9.8
12.1
6.5
5.2
10.2
6.1
3.5
7.1
12.8
10.6
5.0
11.6
6.4
8.0
10.8
8.2
5.9
11.6
4.9
8.4
11.2
15.8
8.7
8.5
7.7
9.0
6.6
11.0
3.3
6.6
14.9
13.2
12.7
4.1
16.9
10.1
6.1
6.7
6.5
4.4
6.6
6.2
5.3
5.5
5.5
4.2
3.8
3.2
7.1
7.4
11.6
3.0
7.4
9.0
4.9
3.9
6.9
4.6
2.6
5.2
7.1
8.1
3.8
8.5
4.7
5.8
7.4
4.9
3.1
6.1
3.9
6.4
8.6
10.2
6.3
5.8
5.3
5.7
4.0
7.1
1.9
5.0
8.1
7.6
7.7
2.9
9.7
6.4
4.2
4.2
79.7
32.7
13.7
14.8
14.1
78.3
13.3
8.9
6.1
5.9
14.8
13.3
75.0
7.8
14.1
18.8
9.8
7.7
19.5
8.8
5.5
11.3
62.9
15.3
7.5
18.2
10.0
13.1
19.8
25.9
51.8
135.6
6.6
12.3
16.1
34.9
14.2
15.7
14.3
21.7
18.6
24.7
13.1
9.7
97.1
51.1
35.5
6.9
63.5
23.8
11.0
17.1
b0463
b0464
b0465
b0466
b0467
b0468
b0469
b0471
b0472
b0474
b0475
b0476
b0477
b0478
b0480
b0482
b0483
b0485
b0486
b0487
b0488
b0489
b0490
b0492
b0493
b0494
b0495
b0496
b0505
b0506
b0514
b0518
b0521
b0524
b0525
b0526
b0527
b0529
b0542
b0543
b0546
b0549
b0553
b0555
b0564
b0565
b0571
b0572
b0577
b0578
b0579
b0581
acrA
acrR
aefA
ybaM
priC
ybaN
apt
ybaB
recR
adk
hemH
ybaC
gsk
ybaL
ushA
ybaP
ybaQ
ybaS
ybaT
ybbI
ybbJ
ybbK
ybbL
ybbN
ybbO
tesA
ybbA
ybbP
ybbT
ybbU
ybbZ
fdrA
arcC
ybbF
ppiB
cysS
ybcI
folD
emrE
ybcM
ybcO
nmpC
ybcS
appY
ompT
ylcA
ylcB
ybdG
nfnB
ybdF
ybdK
8.6
5.0
12.6
16.7
6.9
6.2
11.3
11.7
13.4
12.3
6.4
10.1
12.2
7.0
11.6
8.0
4.5
2.9
6.5
3.0
6.0
2.7
5.6
11.9
13.0
8.9
8.4
8.9
11.9
3.7
13.1
13.0
5.8
13.5
8.5
14.2
11.0
5.9
6.5
14.9
10.6
9.7
1.7
14.3
5.2
4.4
10.2
6.1
6.4
8.1
1.6
5.6
6.7
3.1
8.1
9.1
5.1
3.7
6.6
6.5
6.9
6.1
4.8
7.1
6.6
5.4
7.6
5.4
3.3
2.2
4.7
2.0
4.0
1.7
4.4
8.3
8.5
6.5
5.5
6.3
6.9
2.1
8.7
8.7
3.4
9.9
6.2
8.1
6.6
4.0
4.1
7.9
5.8
5.2
1.4
8.6
2.8
3.0
6.2
3.4
4.5
6.0
1.0
3.8
11.9
13.1
28.4
96.1
10.4
18.2
41.0
58.9
222.0
2423.3
9.6
17.5
73.8
10.1
24.5
15.5
7.3
4.5
10.7
5.4
12.2
6.3
7.8
21.4
27.4
14.3
17.6
15.0
44.5
17.0
26.6
25.8
21.1
21.3
13.8
56.4
33.4
11.4
15.9
124.5
65.5
69.7
2.2
44.0
32.9
8.4
29.0
26.6
11.1
12.4
3.1
11.3
b0582
b0583
b0584
b0585
b0587
b0593
b0594
b0595
b0598
b0599
b0600
b0601
b0602
b0604
b0605
b0606
b0607
b0608
b0609
b0610
b0611
b0612
b0620
b0621
b0623
b0628
b0630
b0631
b0632
b0634
b0636
b0637
b0639
b0641
b0642
b0643
b0644
b0646
b0648
b0651
b0652
b0655
b0657
b0658
b0659
b0660
b0662
b0671
b0674
b0675
b0676
b0677
yi81_2
entD
fepA
fes
fepE
entC
entE
entB
cstA
ybdH
ybdL
ybdM
ybdN
dsbG
ahpC
ahpF
ybdQ
ybdR
rnk
rna
ybdS
citB
dcuC
cspE
lipA
lipB
ybeD
dacA
mrdB
ybeA
ybeB
ybeN
rlpB
leuS
ybeL
ybeQ
ybeS
ybeU
ybeK
gltL
ybeJ
lnt
ybeX
ybeY
ybeZ
yleB
asnB
nagD
nagC
nagA
13.4
23.5
20.8
18.1
14.9
11.4
13.8
10.7
4.1
2.2
3.2
3.5
22.2
7.5
11.1
7.8
5.9
6.7
4.8
11.6
7.5
21.7
9.2
19.1
5.0
11.1
12.5
13.2
9.0
8.7
10.9
7.2
15.7
9.2
8.6
7.5
5.7
21.0
23.5
5.7
11.2
9.9
13.7
8.4
7.2
8.3
8.0
4.6
6.9
4.2
4.5
2.0
7.6
11.8
10.7
9.7
8.7
8.4
8.7
8.1
3.3
1.3
1.6
1.9
12.2
5.5
8.0
5.9
4.2
4.1
2.5
7.5
5.4
11.3
6.2
10.8
2.7
6.9
8.0
8.1
5.9
6.2
7.5
4.9
9.2
6.5
6.3
5.0
3.1
11.0
13.3
4.5
7.7
7.0
8.9
6.5
4.8
4.7
4.5
2.4
4.0
2.9
3.7
1.6
58.6
5295.7
359.2
137.5
50.6
17.6
33.3
15.4
5.3
9.6
52.6
23.8
123.1
11.8
18.0
11.6
9.8
17.9
66.2
25.6
12.3
264.7
17.4
84.8
42.6
29.2
28.0
34.6
19.2
14.7
20.3
13.2
53.9
15.4
13.2
15.4
38.6
258.0
99.7
7.7
20.7
16.7
29.3
12.1
15.1
34.2
35.1
60.9
27.0
7.9
5.7
2.7
b0678
b0679
b0680
b0682
b0683
b0684
b0686
b0687
b0688
b0695
b0696
b0698
b0699
b0706
b0707
b0710
b0711
b0712
b0714
b0721
b0722
b0723
b0724
b0725
b0726
b0727
b0729
b0730
b0735
b0736
b0737
b0738
b0739
b0740
b0741
b0742
b0750
b0751
b0752
b0753
b0754
b0755
b0756
b0757
b0758
b0759
b0760
b0762
b0764
b0766
b0767
b0773
nagB
nagE
glnS
ybfN
fur
fldA
ybfF
seqA
pgm
kdpD
kdpC
kdpA
ybfA
ybfD
ybgA
ybgI
ybgJ
ybgK
nei
sdhC
sdhD
sdhA
sdhB
sucA
sucB
sucD
farR
ybgE
ybgC
tolQ
tolR
tolA
tolB
pal
ybgF
nadA
pnuC
ybgR
aroG
gpmA
galM
galK
galT
galE
modF
modB
ybhA
ybhE
ybhB
2.6
4.5
10.9
16.1
6.8
6.9
8.1
6.0
8.9
5.9
11.6
5.7
3.9
12.5
10.6
9.0
4.5
6.7
15.6
6.8
3.4
4.7
3.9
5.9
1.9
2.4
4.1
6.2
18.3
6.7
11.5
13.6
10.1
10.9
10.3
9.9
7.2
6.5
5.1
18.8
7.7
17.7
6.2
5.2
4.9
4.3
11.6
2.5
1.7
8.4
9.9
10.0
2.0
3.4
6.5
8.8
5.2
5.1
4.8
4.2
4.9
3.5
7.0
3.3
2.6
6.8
6.0
5.3
3.3
4.7
10.3
5.1
2.8
3.5
2.9
4.5
1.3
1.8
2.9
5.1
10.7
4.8
7.4
8.9
7.1
7.7
7.9
6.5
3.7
3.3
2.5
10.1
4.1
9.7
4.8
3.9
4.2
3.2
8.0
1.5
1.0
5.1
7.7
7.8
3.8
6.7
32.6
102.4
9.9
10.5
25.8
10.3
55.4
18.7
33.5
20.5
7.2
82.3
46.7
30.5
7.1
11.8
32.9
10.1
4.6
7.4
5.7
8.3
3.4
3.5
7.1
7.9
64.3
10.8
25.1
28.9
17.4
18.7
14.8
20.8
120.1
332.2
1360.2
143.1
72.0
105.7
8.7
7.9
6.0
6.6
21.4
8.2
8.0
23.7
13.7
13.9
b0774
b0775
b0777
b0778
b0779
b0780
b0782
b0783
b0784
b0785
b0786
b0789
b0790
b0791
b0792
b0793
b0794
b0795
b0796
b0798
b0799
b0800
b0801
b0803
b0804
b0806
b0808
b0809
b0810
b0811
b0813
b0814
b0815
b0817
b0819
b0820
b0821
b0829
b0831
b0833
b0834
b0837
b0838
b0839
b0840
b0841
b0842
b0843
b0844
b0845
b0848
b0849
bioA
bioB
bioC
bioD
uvrB
ybhK
moaB
moaC
moaD
moaE
ybhL
ybhO
ybhP
ybhQ
ybhR
ybhS
ybhF
ybiH
ybiA
dinG
ybiB
ybiC
ybiI
ybiX
ybiM
ybiO
glnQ
glnP
glnH
ybiF
ompX
ybiP
ybiS
ybiT
ybiU
yliI
yliJ
dacC
deoR
ybjG
cmr
ybjH
ybjM
grxA
15.2
14.8
11.0
11.1
14.2
8.2
11.3
9.1
15.8
10.4
7.3
9.7
7.5
3.7
9.3
11.3
5.4
3.7
5.4
2.3
9.7
8.6
9.4
2.8
15.1
9.3
10.4
21.8
33.6
10.5
9.9
2.6
9.8
4.1
9.5
13.2
13.0
9.7
10.2
20.6
16.9
3.1
10.8
3.9
8.3
14.7
8.4
7.4
4.0
9.8
4.1
15.1
7.7
9.1
6.4
7.9
10.0
6.2
7.6
6.0
8.9
7.0
4.0
5.6
4.8
2.6
6.5
7.8
4.5
2.8
4.0
1.8
6.5
5.8
6.2
1.9
8.3
5.5
6.9
11.1
18.1
7.2
5.1
2.0
5.0
2.7
6.3
7.0
8.6
6.7
6.5
11.9
8.9
1.7
7.5
3.0
5.7
9.6
6.2
4.4
2.2
7.6
2.5
9.2
431.5
39.1
39.4
19.0
24.9
12.1
21.5
19.4
68.0
20.0
39.7
37.9
17.1
6.3
16.0
20.3
6.7
5.7
8.2
3.2
19.1
16.1
19.1
5.4
78.9
29.5
21.2
775.0
233.0
19.6
215.6
3.5
192.1
7.9
19.3
110.8
27.0
18.0
24.4
76.1
159.5
20.6
19.3
5.6
15.5
31.7
12.6
25.3
25.8
13.6
12.6
42.4
b0850
b0851
b0853
b0856
b0858
b0861
b0862
b0863
b0864
b0865
b0866
b0867
b0868
b0869
b0870
b0871
b0872
b0874
b0876
b0877
b0879
b0880
b0881
b0882
b0884
b0887
b0888
b0889
b0890
b0891
b0893
b0901
b0902
b0903
b0904
b0905
b0906
b0907
b0908
b0910
b0911
b0912
b0915
b0916
b0917
b0918
b0919
b0921
b0922
b0923
b0924
b0925
ybjC
mdaA
ybjN
potH
ybjO
artM
artQ
artI
artP
ybjP
ybjT
ybjU
poxB
ybjE
ybjD
ybjX
ybjZ
cspD
yljA
clpA
infA
cydD
trxB
lrp
ftsK
lolA
serS
ycaK
pflA
pflB
focA
ycaO
ycaP
serC
aroA
cmk
rpsA
himD
ycaH
ycaQ
ycaR
kdsB
smtA
mukF
mukE
mukB
ycbB
2.7
3.7
9.2
16.6
16.4
9.8
9.2
8.4
4.5
6.3
2.3
2.4
5.2
9.6
9.1
8.6
10.3
6.0
12.2
7.6
11.5
9.7
4.4
6.1
7.2
6.6
12.6
8.5
7.2
8.2
7.6
4.1
7.4
25.4
5.9
11.8
4.3
12.7
10.9
9.6
12.1
11.5
11.0
8.6
6.5
5.9
5.8
4.6
11.1
6.6
11.2
3.1
2.2
2.9
6.8
10.2
9.7
5.3
6.4
6.3
3.3
4.7
1.8
1.8
3.5
6.3
6.1
5.9
6.4
4.3
8.2
4.0
7.9
6.5
3.3
5.0
5.2
5.2
7.4
6.5
5.0
5.5
5.5
2.2
5.3
15.0
4.7
6.6
3.1
10.1
7.7
6.5
8.0
7.7
7.2
5.2
5.3
4.7
4.1
3.8
8.8
4.9
7.1
2.5
3.5
5.3
14.1
45.2
53.5
56.7
16.7
12.5
7.2
9.5
3.1
3.7
10.1
20.5
17.8
15.9
26.1
9.5
23.8
76.4
21.5
19.0
6.6
7.9
11.5
9.0
43.2
12.3
13.0
16.1
12.2
35.4
12.0
81.7
8.0
61.2
6.9
17.1
18.4
18.4
24.9
22.6
23.5
25.1
8.5
7.9
9.7
5.7
15.2
10.2
26.6
4.1
b0926
b0927
b0928
b0929
b0930
b0931
b0932
b0933
b0934
b0938
b0944
b0948
b0949
b0950
b0951
b0952
b0954
b0955
b0956
b0958
b0959
b0960
b0961
b0962
b0963
b0964
b0965
b0966
b0967
b0970
b0973
b0974
b0978
b0981
b0986
b0989
b0990
b0992
b0995
b0997
b0999
b1000
b1001
b1002
b1003
b1004
b1010
b1011
b1014
b1015
b1018
b1024
ycbK
ycbL
aspC
ompF
asnS
pncB
pepN
ycbE
ycbM
ycbQ
ycbF
ycbY
uup
pqiA
pqiB
ymbA
fabA
ycbG
sulA
yccF
helD
mgsA
yccV
yccA
hyaB
hyaC
appC
yccC
ymcC
cspH
cspG
yccM
torR
torA
yccD
cbpA
yccE
agp
yccJ
wrbA
putA
putP
ycdO
ycdS
11.1
8.7
12.1
21.2
7.6
11.8
8.7
14.9
13.6
13.8
12.5
13.5
9.0
10.4
9.1
12.3
10.6
9.8
7.8
2.6
4.2
9.1
7.0
11.6
7.6
9.1
10.4
3.2
8.9
8.3
9.7
18.8
16.2
15.7
16.8
11.1
1.5
21.2
10.4
17.3
4.5
4.7
15.7
20.8
13.4
9.1
12.5
14.5
4.5
17.3
11.5
9.7
7.7
5.7
9.0
11.6
4.2
6.7
4.7
7.6
7.4
7.7
7.8
9.8
4.8
7.4
6.7
7.9
6.0
8.0
5.4
2.0
2.9
5.7
3.9
9.1
5.9
5.5
5.3
2.4
5.5
4.6
7.2
10.4
9.4
10.3
9.4
6.4
1.2
10.7
6.0
10.6
3.2
3.6
8.0
12.5
9.2
6.5
8.4
7.4
2.8
8.9
7.0
5.7
19.6
18.2
18.7
128.5
44.6
50.2
58.0
435.0
87.1
62.3
32.1
21.9
68.7
17.1
14.2
27.9
48.7
12.5
13.9
3.7
7.1
23.5
34.1
16.2
10.6
25.4
201.8
4.7
23.6
45.0
14.7
100.5
58.0
32.9
78.3
42.9
2.1
811.3
39.5
47.3
7.9
6.9
483.2
62.5
24.6
14.9
24.8
469.3
11.1
371.6
31.2
32.7
b1025
b1033
b1034
b1035
b1036
b1040
b1041
b1045
b1046
b1048
b1050
b1051
b1053
b1054
b1056
b1060
b1061
b1062
b1063
b1064
b1065
b1066
b1067
b1068
b1069
b1070
b1071
b1072
b1073
b1074
b1075
b1076
b1077
b1078
b1081
b1082
b1084
b1086
b1087
b1088
b1089
b1090
b1091
b1092
b1093
b1096
b1097
b1098
b1100
b1101
b1103
b1104
ycdT
ycdW
ycdX
ycdY
ycdZ
csgD
csgB
ymdC
mdoG
yceK
msyB
yceE
htrB
yceI
yceP
dinI
pyrC
yceB
grxB
yceL
rimJ
yceH
mviM
mviN
flgN
flgM
flgA
flgB
flgC
flgD
flgE
flgF
flgG
flgJ
flgK
rne
yceC
yceF
yceD
rpmF
plsX
fabH
fabD
fabG
pabC
yceG
tmk
ycfH
ptsG
ycfF
ycfL
17.5
6.7
12.0
13.3
10.7
7.9
13.0
2.7
6.8
10.9
11.6
11.2
8.2
8.9
10.8
3.3
9.4
9.2
9.4
10.0
15.9
3.8
4.9
6.6
9.7
6.0
5.1
14.6
6.9
3.5
6.1
5.6
2.9
8.4
14.8
8.7
10.3
5.1
7.9
8.6
10.0
7.0
11.7
9.2
13.6
5.8
7.0
7.2
3.0
3.8
6.8
7.8
10.8
5.1
9.2
9.5
7.2
5.0
7.5
2.1
4.1
5.5
6.4
6.9
4.4
6.7
7.7
2.2
6.4
4.9
6.8
8.6
11.3
3.1
4.0
5.3
7.6
4.1
3.2
8.3
4.3
2.4
4.2
3.2
1.6
5.7
8.9
5.8
5.5
2.7
5.3
5.9
6.4
5.4
8.0
7.3
7.1
3.4
4.9
5.4
2.3
3.2
5.5
6.0
44.8
9.8
17.2
22.3
20.8
18.6
50.3
3.6
20.6
314.7
65.0
29.7
51.1
13.5
18.3
6.2
17.7
71.9
15.5
12.0
26.9
4.7
6.2
8.9
13.2
11.5
11.9
63.2
17.2
6.7
11.0
19.8
15.7
16.1
43.4
16.7
75.7
34.5
15.6
15.7
22.8
9.7
21.7
12.6
173.0
20.1
12.1
11.1
4.2
4.6
8.9
11.1
b1105
b1106
b1107
b1108
b1109
b1111
b1112
b1113
b1114
b1116
b1117
b1118
b1119
b1123
b1125
b1126
b1127
b1128
b1130
b1131
b1132
b1134
b1135
b1137
b1138
b1139
b1140
b1143
b1145
b1146
b1150
b1153
b1158
b1162
b1163
b1170
b1171
b1172
b1174
b1175
b1176
b1177
b1178
b1179
b1180
b1182
b1183
b1184
b1186
b1187
b1188
b1189
ycfM
ycfN
ycfO
ycfP
ndh
ycfQ
ycfR
ycfS
mfd
ycfU
ycfV
ycfW
ycfX
potD
potB
potA
pepT
ycfD
phoP
purB
ycfC
ymfC
ymfD
ymfE
lit
intE
ymfI
ymfR
pin
ycgE
minE
minD
minC
ycgJ
ycgK
ycgL
hlyE
umuD
umuC
nhaB
fadR
ycgB
dadA
6.8
8.2
5.7
4.8
12.0
7.1
3.5
21.1
15.8
11.9
7.5
7.1
12.5
11.1
13.2
21.5
6.0
8.0
5.7
14.0
6.4
13.4
11.3
2.5
2.3
7.5
8.1
13.3
6.0
14.2
6.5
12.6
13.5
5.5
13.2
17.4
3.7
15.0
7.4
6.4
3.5
6.7
8.1
6.0
6.3
5.8
15.9
9.3
9.4
3.9
3.4
6.4
5.3
5.9
4.6
3.5
7.1
5.1
2.1
10.6
10.8
8.1
4.9
5.3
6.7
8.0
8.5
12.0
4.5
6.4
4.2
8.6
4.3
8.8
7.8
1.3
1.5
4.5
4.7
9.6
3.8
8.1
4.3
7.2
8.4
4.1
7.2
10.2
2.1
10.2
5.8
4.8
2.9
4.3
5.4
4.5
4.6
3.3
9.1
5.5
6.0
2.9
2.5
3.9
9.8
13.6
7.5
7.4
37.7
11.7
10.4
1380.3
29.0
22.2
16.4
10.6
92.1
18.1
29.6
102.6
9.2
10.6
9.0
37.3
12.1
28.0
21.0
20.3
5.0
22.0
28.3
21.5
15.3
58.2
13.1
50.0
33.7
8.2
75.9
59.5
17.9
28.8
10.2
9.4
4.5
15.1
16.9
8.9
9.7
23.1
62.0
30.6
22.4
6.1
5.5
18.4
b1190
b1191
b1195
b1197
b1198
b1199
b1200
b1201
b1203
b1205
b1208
b1209
b1210
b1214
b1215
b1216
b1217
b1219
b1221
b1222
b1224
b1225
b1227
b1232
b1233
b1234
b1235
b1236
b1238
b1240
b1243
b1244
b1245
b1246
b1247
b1248
b1249
b1250
b1251
b1253
b1254
b1255
b1256
b1260
b1261
b1263
b1266
b1267
b1269
b1270
b1271
b1272
dadX
ymgE
treA
ycgC
ychF
ychH
ychB
hemM
hemA
ychA
kdsA
chaA
chaB
ychN
narL
narX
narG
narH
narI
purU
ychJ
ychK
hnr
galU
tdk
oppA
oppB
oppC
oppD
oppF
cls
kch
yciI
yciA
yciB
yciC
yciD
trpA
trpB
trpD
yciV
yciO
yciL
btuR
yciK
sohB
9.9
8.8
4.3
2.9
7.5
10.3
6.9
6.7
9.4
1.3
8.6
6.7
5.5
5.5
14.5
10.4
13.6
11.6
6.0
6.0
7.3
10.7
9.9
9.8
8.2
8.2
9.8
11.5
16.7
11.0
6.2
6.6
9.4
11.4
14.3
7.8
6.1
5.6
7.6
7.4
4.9
4.5
9.1
11.8
9.1
11.5
7.1
6.2
15.3
12.1
7.0
6.5
5.9
4.7
2.9
2.2
4.0
7.3
4.9
3.7
6.6
1.2
7.1
5.6
3.9
4.4
9.9
5.3
6.9
8.8
3.9
4.1
4.2
5.9
5.3
6.3
4.8
4.4
6.4
8.3
9.2
7.1
5.2
4.8
6.1
7.2
8.5
4.8
5.1
4.0
5.5
5.2
3.6
3.4
5.0
8.4
6.1
8.2
5.1
4.8
8.3
6.9
4.8
4.1
30.0
67.4
7.9
4.3
48.9
17.6
11.8
31.3
16.3
1.5
10.8
8.3
9.4
7.4
26.9
198.7
317.5
16.8
12.5
10.9
29.6
59.4
78.8
21.6
30.6
54.1
21.0
18.2
89.9
24.3
7.6
10.5
20.1
27.5
46.2
20.0
7.6
9.7
12.0
12.4
7.4
6.5
52.1
20.3
17.7
18.7
11.2
8.8
106.1
48.5
12.6
15.9
b1273
b1274
b1275
b1276
b1277
b1278
b1279
b1281
b1282
b1283
b1284
b1285
b1286
b1287
b1288
b1289
b1290
b1291
b1292
b1295
b1303
b1304
b1305
b1306
b1307
b1308
b1309
b1311
b1318
b1319
b1320
b1324
b1325
b1326
b1327
b1329
b1332
b1338
b1343
b1344
b1356
b1361
b1363
b1364
b1365
b1371
b1374
b1375
b1376
b1377
b1379
b1380
yciN
topA
cysB
acnA
ribA
pgpB
yciS
pyrF
yciH
osmB
yciR
rnb
yciW
fabI
ycjD
sapF
sapD
sapC
ymjA
pspF
pspA
pspB
pspC
pspD
pspE
ycjM
ycjO
ycjV
ompG
ycjW
tpx
ycjG
ycjI
ynaJ
ydaJ
dbpA
ydaO
ydaR
ydaW
trkG
ynaE
ynaF
hslJ
ldhA
7.4
8.9
6.6
11.8
4.7
6.7
3.6
15.5
20.1
13.6
8.2
16.4
8.7
5.8
12.0
13.8
7.5
10.1
4.2
23.2
2.3
17.3
10.7
12.1
15.5
18.2
6.0
12.0
3.3
7.1
21.2
13.7
8.4
3.2
5.2
6.5
6.3
13.3
16.7
14.0
1.4
14.3
21.2
21.1
16.3
10.6
7.4
1.1
6.6
8.7
2.9
10.8
5.4
5.4
4.9
8.9
3.7
4.5
2.6
9.1
10.7
8.0
5.6
9.3
5.4
3.8
8.7
7.1
4.6
6.7
2.1
12.5
1.4
10.4
6.6
7.3
10.7
12.1
3.1
6.8
2.1
4.9
12.0
10.2
6.2
2.5
3.6
5.3
3.6
8.0
8.8
7.2
1.1
9.8
12.6
11.4
9.0
7.9
5.0
0.8
4.8
4.4
1.6
7.9
12.0
24.6
10.3
17.3
6.3
12.5
5.5
52.2
163.4
44.3
15.2
70.6
23.3
12.3
19.3
231.4
21.8
19.9
141.3
156.5
6.6
51.2
27.8
35.0
27.9
37.2
104.7
48.6
7.4
12.9
95.3
20.8
13.0
4.4
9.3
8.3
22.9
39.0
154.8
234.6
2.0
26.2
68.6
136.8
84.3
16.2
14.0
1.7
10.5
899.4
17.4
17.2
b1381
b1382
b1383
b1385
b1386
b1393
b1397
b1399
b1400
b1401
b1406
b1407
b1411
b1412
b1413
b1415
b1416
b1417
b1418
b1419
b1423
b1425
b1427
b1429
b1430
b1431
b1432
b1433
b1434
b1435
b1437
b1438
b1439
b1440
b1441
b1442
b1443
b1444
b1445
b1448
b1449
b1452
b1457
b1461
b1462
b1465
b1466
b1467
b1469
ydbH
ynbE
ydbL
feaB
tynA
ydbS
ydbA_1
ydbC
ydbD
ynbD
acpD
hrpA
aldA
gapC_2
gapC_1
cybB
ydcA
rimL
tehA
tehB
ydcN
ydcP
yncB
ydcD
ydcE
narV
narW
narY
narU
10.7
6.4
9.4
15.1
9.3
5.4
14.1
2.8
7.0
7.0
5.8
4.1
5.8
8.6
5.3
2.8
9.0
2.3
5.0
4.8
69.2
14.8
24.4
65.1
38.1
60.6
33.2
3.7
11.9
12.8
6.9
25.7
11.7
2.7
14.3
7.6
9.6
5.1
13.0
6.2
1.7
8.7
5.0
7.1
10.5
1905.9
106.1
6.8
39.2
15.8
14.8
7.7
5.8
11.3
7.3
4.3
4.6
14.5
3.7
6.0
9.2
7.6
5.7
8.5
7.4
7.6
11.2
8.3
12.1
2.8
4.3
5.2
15.0
24.2
12.2
15.8
7.3
8.2
10.3
5.6
6.2
12.1
13.3
11.2
16.8
5.5
2.4
3.1
10.0
2.9
4.3
7.2
5.1
3.3
4.3
4.7
5.4
6.6
6.2
6.9
2.3
3.3
4.2
10.5
12.8
6.1
10.9
4.9
4.4
6.0
3.4
3.7
8.1
8.4
7.3
12.0
11.0
20.6
9.2
26.4
5.4
10.1
12.6
15.2
22.1
339.6
17.1
13.0
37.9
12.7
48.1
3.5
6.0
6.7
26.2
230.8
3739.3
29.3
14.0
53.3
34.2
16.0
20.4
23.5
32.4
23.6
27.8
b1473
b1475
b1476
b1477
b1478
b1479
b1480
b1482
b1488
b1490
b1491
b1492
b1497
b1498
b1499
b1507
b1508
b1509
b1512
b1514
b1516
b1517
b1518
b1519
b1520
b1521
b1522
b1523
b1524
b1525
b1529
b1530
b1531
b1533
b1534
b1537
b1538
b1539
b1540
b1542
b1544
b1545
b1547
b1549
b1552
b1556
b1557
b1558
b1561
b1562
b1563
b1564
yddG
fdnH
fdnI
yddM
adhP
sfcA
rpsV
osmC
xasA
hipA
hipB
ydeW
ydeY
yneB
uxaB
yneH
ydeB
marR
marA
ydeD
ydeF
ydeJ
dcp
ydfG
ydfH
ydfI
ydfK
ydfO
cspI
cspB
cspF
rem
relF
relE
relB
14.4
14.7
17.5
5.3
12.9
8.8
23.1
10.9
7.2
12.1
24.8
15.2
3.3
19.3
3.3
6.2
2.9
17.3
6.4
11.8
6.3
8.0
8.5
7.2
8.0
14.6
9.9
8.3
14.3
10.0
11.0
11.8
9.8
3.0
14.4
17.8
9.4
7.7
5.7
8.9
1.1
7.4
13.6
1.2
1.1
13.1
0.8
0.7
14.7
8.9
6.9
6.1
9.5
8.3
11.1
3.8
9.1
6.6
12.8
6.9
3.7
7.6
13.0
10.2
2.9
10.4
2.1
3.9
2.1
9.2
4.1
7.9
4.5
5.3
6.0
5.5
6.1
8.5
6.4
4.4
8.1
6.6
6.8
8.4
6.9
2.2
9.1
10.0
6.1
6.0
3.7
6.3
0.8
5.0
6.9
0.6
0.7
8.5
0.6
0.5
8.3
6.5
4.4
3.7
29.8
61.1
40.9
8.8
21.8
13.4
112.3
26.0
221.0
29.0
297.4
29.9
3.8
137.5
7.6
14.7
4.8
133.9
15.4
23.8
10.6
16.7
14.4
10.1
11.7
51.2
21.1
68.9
58.7
21.3
29.9
19.7
16.5
4.8
34.1
81.4
21.1
10.7
12.3
15.4
1.7
14.0
833.3
23.8
2.3
28.4
1.0
1.8
61.0
14.0
15.5
18.0
b1570
b1572
b1574
b1578
b1579
b1582
b1583
b1584
b1585
b1586
b1591
b1592
b1593
b1594
b1595
b1596
b1597
b1598
b1599
b1600
b1602
b1603
b1604
b1605
b1606
b1607
b1608
b1609
b1611
b1612
b1613
b1614
b1616
b1617
b1618
b1619
b1621
b1622
b1623
b1624
b1626
b1627
b1628
b1630
b1632
b1635
b1636
b1637
b1638
b1640
b1641
b1642
dicA
ydfB
dicF
speG
ynfC
mlc
ynfL
ynfM
asr
pntB
pntA
ydgB
ydgC
rstA
rstB
fumC
fumA
manA
ydgA
uidB
uidA
uidR
hdhA
malX
malY
add
ydgO
ydgQ
gst
pdxY
tyrS
pdxH
slyB
slyA
5.1
10.1
15.8
6.8
14.9
8.6
3.8
5.2
6.7
6.6
9.5
19.3
6.6
2.3
10.3
9.5
8.4
9.6
10.9
3.3
8.4
8.4
10.7
3.5
7.8
2.7
11.1
7.3
6.4
6.8
8.8
5.1
4.3
11.1
5.0
7.1
7.7
6.0
6.4
8.6
24.4
8.3
14.9
8.5
12.5
5.6
7.4
13.5
7.1
9.0
7.4
7.0
4.3
5.6
8.4
3.6
7.5
6.2
2.8
3.7
5.7
4.2
5.5
10.5
3.9
1.8
6.8
5.5
6.0
5.8
6.7
2.3
5.1
5.5
7.1
2.1
5.1
1.5
7.8
5.5
4.9
5.6
6.8
4.2
2.4
6.9
3.8
4.7
5.0
3.5
5.5
6.3
12.7
4.7
7.8
4.3
8.7
4.7
4.4
10.1
4.7
5.6
5.3
5.3
6.1
54.4
149.9
58.1
2960.9
14.0
5.7
9.2
8.0
15.1
33.1
118.0
23.6
3.2
21.2
34.3
14.0
27.4
29.7
6.0
24.9
17.4
21.2
10.4
16.6
14.6
19.3
10.9
9.2
8.4
12.4
6.5
26.0
28.0
7.3
14.6
17.0
22.0
7.8
13.4
327.0
31.6
191.4
764.0
21.8
7.2
23.0
20.2
14.1
22.1
12.2
10.1
b1644
b1645
b1646
b1647
b1650
b1651
b1652
b1654
b1655
b1657
b1658
b1659
b1661
b1662
b1663
b1664
b1667
b1668
b1676
b1678
b1679
b1680
b1681
b1682
b1683
b1684
b1685
b1686
b1687
b1688
b1689
b1692
b1693
b1694
b1699
b1700
b1701
b1702
b1703
b1704
b1705
b1706
b1707
b1708
b1710
b1711
b1712
b1713
b1714
b1717
b1718
b1719
sodC
nemA
gloA
rnt
ydhD
ydhO
purR
ydhB
cfa
ribE
ydhE
pykF
ynhG
ynhA
ynhC
ynhD
ynhE
ydiC
ydiJ
ydiB
aroD
ydiF
ydiS
ydiT
ydiD
ppsA
ydiA
aroH
ydiE
nlpC
btuE
btuC
himA
pheT
pheS
rpmI
infC
thrS
16.9
25.3
4.4
7.4
11.4
8.2
8.8
7.3
8.4
9.6
6.1
12.0
9.2
6.7
12.6
3.8
13.2
4.3
13.6
6.7
6.9
5.2
8.2
6.1
4.1
4.8
13.8
9.3
13.8
7.7
7.7
15.2
10.1
9.2
6.1
9.6
12.6
16.2
6.2
7.1
5.1
5.4
6.4
11.3
7.8
6.5
11.3
8.6
9.7
15.8
15.9
11.4
9.3
15.1
3.2
5.5
9.6
6.0
6.2
5.7
6.2
5.2
3.8
7.6
7.1
5.4
8.4
2.7
8.3
3.2
9.8
4.1
4.8
4.0
6.0
4.6
3.4
3.5
10.2
7.1
10.0
5.6
4.8
8.6
7.0
5.1
3.6
6.4
7.4
10.3
5.3
3.6
3.1
3.9
5.2
8.4
6.7
4.4
7.8
5.9
6.1
8.3
9.4
7.1
90.5
78.1
6.8
11.5
14.0
13.0
15.6
10.1
13.0
64.8
15.4
27.9
13.2
8.8
24.7
6.1
31.8
6.3
22.1
18.2
12.2
7.5
13.2
9.0
5.3
7.3
20.9
13.6
21.8
12.1
19.9
62.7
18.3
45.4
20.7
19.5
43.8
38.1
7.6
153.9
14.3
8.8
8.3
17.0
9.3
12.2
20.1
16.3
23.4
143.4
53.6
27.9
b1723
b1724
b1725
b1726
b1727
b1728
b1729
b1731
b1733
b1735
b1736
b1738
b1739
b1740
b1741
b1743
b1744
b1745
b1746
b1747
b1749
b1750
b1753
b1754
b1757
b1758
b1761
b1763
b1764
b1765
b1768
b1777
b1778
b1780
b1781
b1782
b1783
b1784
b1787
b1789
b1791
b1792
b1793
b1794
b1797
b1798
b1799
b1800
b1802
b1803
b1804
b1807
pfkB
yniC
ydjC
celD
celC
celA
osmE
nadE
spy
ydjS
xthA
ydjX
ynjA
gdhA
topB
selD
ydjA
ydjB
yeaA
yeaD
yeaF
yeaG
yeaH
yeaK
yeaL
yeaN
yeaO
yoaF
yeaP
yeaR
yeaS
yeaT
yeaU
yeaW
yeaX
rnd
yeaZ
8.1
9.1
3.4
5.6
6.3
18.8
10.9
13.2
16.1
7.8
9.8
3.5
9.0
9.1
8.2
12.1
13.5
8.6
13.3
16.0
7.5
4.3
13.5
13.5
5.0
10.4
14.4
8.7
6.5
3.7
9.6
2.3
3.3
7.6
4.6
9.1
11.1
6.3
8.0
16.1
12.0
6.5
9.5
6.1
21.6
13.4
1.2
6.7
9.9
19.7
8.1
13.1
6.3
7.1
2.7
4.5
4.6
10.0
7.8
9.2
8.6
5.9
6.1
2.8
6.3
6.9
5.6
8.9
8.2
5.1
9.0
9.8
5.3
2.3
8.7
8.0
3.5
6.5
9.0
6.7
5.2
2.9
7.0
1.6
2.4
6.1
3.4
6.5
6.6
5.1
5.5
9.7
8.3
4.7
5.4
4.0
11.9
7.3
0.9
4.7
5.2
11.7
5.5
7.4
11.2
12.6
4.7
7.3
9.6
163.7
18.1
23.0
140.3
11.7
25.4
4.7
15.7
13.5
15.6
18.9
38.8
29.2
25.0
45.0
13.0
36.9
29.9
44.1
9.1
25.4
35.4
12.3
8.9
5.2
15.6
4.7
5.2
10.1
7.2
15.0
35.1
8.0
15.3
46.3
21.6
10.8
37.5
12.5
118.7
86.5
2.0
11.4
103.8
62.3
15.3
56.2
b1808
b1809
b1811
b1812
b1813
b1814
b1815
b1816
b1820
b1821
b1822
b1825
b1826
b1827
b1829
b1830
b1831
b1832
b1835
b1836
b1837
b1839
b1840
b1841
b1842
b1843
b1844
b1845
b1846
b1847
b1848
b1850
b1851
b1852
b1853
b1854
b1855
b1856
b1857
b1858
b1859
b1860
b1861
b1862
b1863
b1864
b1865
b1866
b1867
b1869
b1870
b1871
pabB
yeaB
sdaA
yoaE
yebH
htpX
prc
yebJ
yebU
holE
ptrB
yebE
yebF
yebG
eda
edd
zwf
yebK
pykA
msbB
yebA
yebL
yebM
yebI
ruvB
ruvA
yebB
ruvC
yebC
ntpA
aspS
yecD
yecN
yecO
yecP
11.4
8.9
10.3
10.7
8.8
7.2
6.8
9.1
9.4
11.9
9.7
7.6
4.3
4.1
6.3
12.0
7.6
4.9
15.0
8.2
2.8
9.3
6.8
6.1
2.0
9.7
11.3
8.8
3.6
7.7
2.5
9.3
10.8
7.9
3.7
8.7
7.8
9.9
6.0
12.3
14.0
8.2
5.7
18.3
12.1
12.0
10.6
9.5
7.7
6.6
10.2
9.4
8.0
6.8
7.0
6.4
6.6
5.5
4.1
5.8
5.7
7.4
6.3
4.0
3.3
3.0
4.5
8.3
5.8
4.0
9.2
6.5
1.9
7.0
5.4
5.0
1.2
5.7
8.2
5.1
2.6
5.6
2.1
6.4
8.5
6.1
3.0
7.2
5.8
6.2
4.6
7.9
7.8
5.2
4.1
9.2
7.9
8.3
8.0
7.3
5.3
4.7
7.6
6.1
20.1
12.8
19.6
33.6
13.3
10.6
20.5
21.0
26.6
30.5
21.2
74.9
6.1
6.6
10.5
21.8
10.9
6.2
40.5
11.1
4.9
14.2
9.1
7.6
5.9
33.3
18.2
30.6
5.5
12.5
3.2
17.0
14.8
11.2
5.1
10.8
11.9
25.0
8.5
27.7
66.1
19.0
9.1
1187.4
25.7
21.5
15.8
13.7
14.5
11.4
15.2
19.8
b1874
b1875
b1882
b1883
b1884
b1886
b1887
b1888
b1890
b1891
b1892
b1894
b1895
b1896
b1897
b1898
b1899
b1900
b1901
b1902
b1903
b1905
b1907
b1908
b1912
b1913
b1914
b1915
b1916
b1917
b1918
b1919
b1920
b1921
b1922
b1923
b1924
b1926
b1927
b1928
b1929
b1930
b1931
b1932
b1938
b1939
b1940
b1941
b1942
b1943
b1945
b1947
cutC
yecM
cheY
cheB
cheR
tar
cheW
cheA
motA
flhC
flhD
insA_5
yecG
otsA
otsB
araH_2
araH_1
araG
araF
yecI
ftn
tyrP
yecA
pgsA
uvrC
uvrY
yecF
sdiA
yecC
yecS
yedO
fliY
fliZ
fliA
fliC
fliD
fliT
amyA
yedD
yedE
yedF
yedK
yedL
fliF
fliG
fliH
fliI
fliJ
fliK
fliM
fliO
5.3
12.6
10.6
15.9
10.7
5.8
10.9
14.3
10.5
13.6
8.4
10.2
3.8
9.2
4.3
4.3
2.6
4.9
5.6
2.9
13.8
8.5
11.4
8.5
3.6
8.5
5.6
11.8
4.6
11.6
9.0
7.6
6.4
10.6
6.6
8.1
10.0
14.0
7.9
6.1
11.8
4.0
11.8
3.6
10.1
10.6
15.3
8.8
14.4
8.4
13.8
7.7
3.3
9.1
6.3
10.4
7.7
4.0
5.8
8.2
6.0
8.6
5.8
7.0
2.8
5.8
3.2
2.9
1.8
3.3
3.8
2.2
8.0
6.4
5.7
6.3
3.0
7.0
4.8
6.9
3.1
8.2
6.6
5.5
4.6
6.6
4.0
6.2
5.7
9.3
6.2
4.7
7.7
2.8
7.5
2.1
5.8
7.1
8.3
5.6
9.1
5.4
8.3
5.0
12.6
20.6
32.9
34.4
17.5
10.1
80.6
58.4
42.7
32.8
15.2
18.8
5.9
22.1
6.9
8.8
4.8
9.3
10.4
4.2
49.3
12.6
1108.5
12.8
4.6
10.7
6.6
39.5
9.2
19.6
14.4
12.5
10.9
26.3
17.4
11.9
39.4
28.4
10.8
8.7
24.9
7.1
28.0
14.2
38.0
20.8
98.5
20.9
34.2
18.6
40.5
16.8
b1948
b1952
b1953
b1955
b1957
b1958
b1960
b1961
b1962
b1963
b1964
b1965
b1966
b1967
b1968
b1969
b1970
b1971
b1973
b1976
b1978
b1981
b1982
b1983
b1988
b1990
b1991
b1992
b1993
b1995
b1998
b1999
b2002
b2004
b2005
b2006
b2007
b2008
b2009
b2010
b2011
b2012
b2015
b2016
b2017
b2018
b2019
b2021
b2022
b2023
b2024
b2025
fliP
dsrB
yedI
vsr
dcm
yedJ
yedU
yedV
yedW
shiA
amn
nac
erfK
cobT
cobS
cobU
yeeP
yeeS
yeeU
yeeV
yeeW
yeeX
yeeA
sbmC
dacD
sbcB
yeeD
yeeY
yefM
hisL
hisG
hisC
hisB
hisH
hisA
hisF
13.0
9.6
3.2
8.5
8.4
5.7
9.0
3.1
12.5
3.8
3.3
11.8
21.9
8.9
11.5
7.3
17.1
5.4
7.7
4.3
9.2
9.0
8.7
9.2
13.0
2.8
9.4
5.6
6.1
2.1
17.1
16.7
8.2
4.6
5.8
18.8
8.0
8.3
5.1
4.7
4.6
8.0
9.1
4.4
2.9
1.2
6.5
3.6
4.5
6.8
10.0
10.1
7.0
6.3
2.2
6.0
6.0
4.5
6.2
1.9
7.9
2.7
2.2
7.0
11.8
6.4
6.0
5.1
11.0
3.5
4.4
3.5
5.3
6.1
7.0
6.4
7.5
2.1
7.4
3.8
4.1
1.1
10.4
8.8
4.7
2.7
3.0
11.8
6.6
6.5
3.7
3.4
3.3
5.7
6.2
3.6
2.3
0.7
4.4
2.7
3.4
4.2
6.6
7.9
89.2
20.0
5.9
14.5
14.1
7.7
16.1
7.9
29.2
6.4
6.7
37.5
160.7
14.9
114.9
12.4
38.5
11.5
28.3
5.6
34.2
17.6
11.4
16.5
49.2
4.0
12.9
10.5
11.9
32.2
46.8
149.0
30.4
16.8
97.2
46.1
10.3
11.3
8.1
7.5
7.7
13.7
17.1
5.7
4.1
3.2
12.1
5.1
6.7
17.2
20.5
14.0
b2026
b2027
b2029
b2031
b2032
b2033
b2034
b2035
b2036
b2037
b2038
b2039
b2040
b2041
b2042
b2044
b2046
b2047
b2048
b2050
b2051
b2052
b2060
b2063
b2064
b2065
b2068
b2070
b2071
b2072
b2073
b2076
b2077
b2078
b2080
b2081
b2082
b2084
b2086
b2090
b2091
b2097
b2098
b2099
b2100
b2101
b2102
b2104
b2105
b2107
b2108
b2109
hisI
wzzB
gnd
yefJ
wbbK
wbbJ
wbbI
wbbH
glf
rfbX
rfbC
rfbA
rfbD
rfbB
galF
wcaL
wzxC
wcaJ
cpsG
wcaI
wcaH
wcaG
yegH
asmA
dcd
alkA
yegO
yegB
baeS
yegQ
ogrK
gatR_2
gatD
yegT
yegW
yegX
thiM
yohL
yehA
yehB
12.5
9.4
16.1
3.8
4.9
5.6
6.0
7.9
11.1
15.1
16.2
12.1
11.6
9.3
8.6
10.9
16.8
17.0
14.8
7.2
13.8
3.4
9.5
8.4
12.9
2.7
10.9
14.0
18.4
16.3
23.0
13.6
12.8
7.7
6.9
19.3
12.2
8.0
12.2
12.8
11.6
10.4
13.6
6.4
16.6
2.9
5.3
12.0
6.4
8.5
21.3
6.1
9.0
6.5
11.7
3.0
3.5
4.0
3.9
4.8
6.7
8.6
9.1
8.6
8.4
6.3
6.3
6.8
11.1
9.4
10.2
3.8
7.8
2.3
4.9
6.6
9.2
1.9
6.2
8.0
9.9
9.7
11.6
6.8
7.2
4.0
4.3
10.6
7.7
5.1
7.6
7.4
7.5
7.7
8.0
3.5
10.7
1.8
3.4
7.1
3.6
5.4
11.1
3.9
20.3
16.9
25.7
5.1
8.2
9.7
13.6
23.6
32.7
61.8
74.6
20.7
18.9
17.3
13.5
26.7
33.9
91.2
27.0
57.5
56.8
6.4
173.2
11.5
21.7
4.7
45.8
55.3
135.4
50.3
1153.7
2590.0
62.2
80.2
18.2
115.6
29.4
18.4
30.9
49.8
25.2
15.9
47.4
43.6
37.4
8.7
12.3
36.9
28.3
19.4
264.5
14.4
b2112
b2113
b2114
b2119
b2121
b2123
b2125
b2126
b2127
b2128
b2129
b2130
b2131
b2133
b2134
b2135
b2136
b2137
b2139
b2143
b2144
b2146
b2147
b2149
b2150
b2151
b2153
b2154
b2155
b2156
b2157
b2160
b2162
b2168
b2169
b2170
b2171
b2172
b2173
b2175
b2176
b2177
b2178
b2180
b2181
b2183
b2184
b2185
b2186
b2187
b2188
b2190
yehE
mrp
metG
yehL
yehP
yehR
yehT
yehU
yehV
yehW
yehX
yehY
yehZ
dld
pbpG
yohC
yohD
yohF
yohH
cdd
sanA
yeiA
mglA
mglB
galS
folE
yeiG
cirA
lysP
yeiE
yeiI
yeiK
fruK
fruB
yeiO
yeiP
yeiQ
yeiR
spr
rtn
yejA
yejB
yejF
yejG
rsuA
yejH
rplY
yejK
yejL
yejM
yejO
7.4
7.3
2.2
9.7
10.8
24.9
6.6
5.6
8.2
8.5
11.6
9.8
11.5
8.5
13.4
7.8
3.3
10.4
7.3
7.7
6.6
8.4
12.5
8.7
7.8
15.3
7.3
8.2
11.4
8.6
6.7
5.4
18.9
4.6
3.6
9.0
11.6
6.6
9.8
10.0
6.2
12.6
8.1
15.0
9.7
4.6
10.6
8.4
9.8
5.7
9.7
4.7
5.2
5.7
1.4
5.0
6.4
13.2
4.0
4.4
4.8
5.7
7.8
7.2
7.2
7.2
9.3
5.5
2.4
7.4
3.7
5.2
4.4
5.8
7.9
5.9
5.1
8.7
4.8
6.3
6.5
5.8
5.2
3.5
10.6
3.3
2.5
4.9
8.5
5.0
5.2
6.5
4.3
8.3
6.6
8.3
5.2
3.3
6.3
5.5
6.8
4.3
6.5
3.4
13.0
10.0
6.2
119.4
33.6
222.0
18.0
7.8
28.9
17.0
22.6
15.6
27.8
10.3
23.7
13.6
5.3
17.5
651.4
15.0
12.9
15.4
30.3
16.6
16.5
61.8
14.5
12.0
44.3
16.8
9.2
12.1
83.3
7.5
6.6
54.6
18.3
9.7
99.4
22.3
10.6
26.4
10.4
75.8
64.1
8.0
33.4
17.8
17.9
8.2
18.8
7.6
b2191
b2193
b2194
b2196
b2198
b2199
b2200
b2202
b2203
b2208
b2209
b2211
b2212
b2213
b2216
b2217
b2218
b2219
b2220
b2225
b2226
b2227
b2229
b2231
b2232
b2233
b2234
b2235
b2237
b2239
b2240
b2241
b2242
b2245
b2248
b2249
b2251
b2254
b2255
b2256
b2257
b2259
b2261
b2262
b2264
b2265
b2266
b2267
b2268
b2274
b2276
b2277
narP
ccmH
ccmF
ccmD
ccmC
ccmB
napC
napB
napF
eco
yojI
alkB
ada
yojN
rcsB
rcsC
atoS
atoC
gyrA
ubiG
yfaL
nrdA
nrdB
inaA
glpQ
glpT
glpA
glpB
yfaO
pmrD
menC
menB
menD
menF
elaB
elaA
elaC
nuoN
nuoM
17.3
9.3
13.2
4.9
13.2
4.9
12.7
3.1
12.0
8.2
11.2
7.9
15.4
13.2
4.9
6.1
11.0
4.4
5.2
9.3
21.9
13.3
15.6
13.9
6.8
15.2
4.1
6.0
4.6
11.6
8.6
7.9
16.1
10.8
18.4
7.4
6.1
6.9
13.3
13.7
13.9
3.8
11.2
17.0
13.4
8.0
19.3
8.9
9.9
15.5
17.7
16.1
10.6
7.0
7.1
3.0
8.2
3.1
7.7
2.2
7.4
4.7
8.0
4.2
10.0
8.4
3.9
5.0
8.7
3.4
4.0
5.3
12.7
7.6
8.1
9.8
4.0
9.5
2.1
4.7
3.1
8.4
6.2
4.0
9.8
6.3
10.9
4.9
4.5
4.1
8.6
9.7
8.3
2.7
7.3
10.6
8.1
6.0
11.6
6.5
7.3
8.4
12.3
10.0
47.7
13.7
91.9
14.9
33.6
10.7
37.9
5.1
31.0
30.9
19.0
64.1
33.1
30.9
6.5
7.9
15.1
6.3
7.5
37.3
77.5
54.4
193.9
23.5
22.2
38.3
52.0
8.3
8.5
19.0
13.9
475.2
43.7
37.5
58.1
14.9
9.6
22.5
29.1
23.1
41.5
6.3
23.9
42.9
39.0
12.2
58.8
14.1
15.6
109.6
32.1
41.3
b2278
b2279
b2280
b2281
b2282
b2283
b2284
b2285
b2286
b2287
b2288
b2289
b2290
b2291
b2292
b2293
b2294
b2295
b2296
b2297
b2299
b2300
b2301
b2302
b2303
b2304
b2305
b2306
b2307
b2308
b2309
b2313
b2314
b2315
b2316
b2317
b2318
b2319
b2320
b2322
b2323
b2325
b2326
b2328
b2329
b2330
b2331
b2332
b2334
b2335
b2337
b2339
nuoL
nuoK
nuoJ
nuoI
nuoH
nuoG
nuoF
nuoE
nuoC
nuoB
nuoA
lrhA
yfbS
yfbT
ackA
pta
yfcE
yfcF
yfcG
folX
yfcI
hisP
hisM
hisQ
hisJ
cvpA
dedD
folC
accD
dedA
truA
usg
pdxB
fabB
mepA
aroC
yfcB
-
12.1
13.4
11.6
11.1
11.2
10.8
9.3
9.4
7.5
8.0
6.3
6.2
6.4
8.6
4.1
13.7
11.4
7.6
10.3
13.0
12.6
6.3
7.8
2.6
4.8
6.0
8.4
14.9
26.3
7.1
4.8
10.0
6.1
6.2
9.0
11.1
11.4
13.2
9.3
15.2
4.5
14.4
11.4
11.1
8.5
13.2
3.2
4.6
14.8
12.6
12.2
11.7
8.8
9.2
8.2
8.4
8.2
8.0
6.5
6.3
5.8
5.6
4.8
4.8
4.8
5.5
3.0
9.3
7.0
5.3
7.2
7.6
9.6
5.2
5.7
1.9
3.8
4.5
5.5
9.0
15.4
3.8
3.5
6.2
5.2
4.9
6.4
7.3
7.7
8.1
6.5
8.7
3.7
8.6
7.0
7.1
6.2
7.4
2.1
3.1
7.6
7.1
7.2
6.5
19.5
24.1
20.1
16.4
17.8
16.9
16.4
18.6
10.4
14.0
9.2
8.7
9.8
19.9
6.5
26.0
30.1
12.9
17.6
43.1
18.5
8.0
12.6
4.0
6.6
9.2
17.7
44.6
89.0
56.6
7.7
25.2
7.4
8.5
15.2
22.8
22.5
36.1
16.2
57.8
5.8
42.6
32.0
26.3
13.3
63.1
6.6
8.9
303.6
60.1
40.3
57.4
b2340
b2341
b2342
b2343
b2345
b2346
b2347
b2350
b2351
b2353
b2356
b2358
b2361
b2366
b2368
b2369
b2370
b2375
b2377
b2378
b2379
b2380
b2381
b2382
b2383
b2384
b2386
b2388
b2392
b2393
b2394
b2395
b2398
b2399
b2400
b2405
b2406
b2410
b2411
b2412
b2413
b2414
b2415
b2416
b2417
b2418
b2420
b2421
b2423
b2425
b2426
b2427
vacJ
yfdC
yfdM
yfdO
dsdA
emrK
evgA
evgS
ddg
glk
nupC
yi81_3
yfeA
yfeC
yfeD
gltX
xapR
xapB
yfeH
lig
zipA
cysZ
cysK
ptsH
ptsI
crr
pdxK
cysM
cysW
cysP
ucpA
yfeT
8.1
2.1
5.7
12.5
11.1
6.0
7.6
12.4
12.4
15.4
11.8
13.9
7.1
17.2
16.3
2.4
13.9
12.8
1.9
3.7
9.0
4.0
5.5
14.5
16.4
9.1
19.2
6.3
9.2
10.5
13.4
13.5
1.5
1.8
9.7
7.6
12.1
7.3
16.2
6.2
10.8
6.4
12.9
12.9
11.4
6.7
12.0
8.2
7.3
7.3
3.1
10.3
5.9
1.7
3.5
9.2
5.9
5.0
4.5
7.4
7.6
9.1
6.3
7.5
4.1
9.5
8.6
2.0
8.9
7.2
1.5
2.6
6.9
2.3
4.1
10.0
9.7
6.2
11.0
5.1
5.7
7.8
7.6
9.1
1.3
1.6
6.3
4.8
7.9
5.1
9.3
4.8
7.9
5.6
7.8
8.7
8.0
5.2
6.9
4.3
4.2
4.5
2.6
6.6
13.4
2.9
16.4
19.6
95.3
7.3
24.4
38.2
33.6
51.3
87.6
91.4
26.2
89.2
183.9
2.9
31.6
58.7
2.7
6.1
12.8
16.0
8.4
26.7
51.6
17.4
75.1
8.2
24.2
16.0
58.6
26.1
1.9
2.2
21.2
18.0
25.7
12.8
65.2
8.6
17.3
7.5
36.8
25.1
19.8
9.6
45.3
97.8
28.0
19.3
3.9
22.7
b2428
b2429
b2430
b2431
b2432
b2433
b2434
b2435
b2438
b2439
b2440
b2441
b2442
b2443
b2445
b2447
b2449
b2450
b2452
b2454
b2456
b2457
b2459
b2460
b2463
b2464
b2465
b2468
b2469
b2471
b2472
b2473
b2474
b2475
b2476
b2477
b2478
b2479
b2480
b2488
b2489
b2490
b2491
b2492
b2493
b2494
b2495
b2496
b2498
b2499
b2500
b2501
yfeU
amiA
eutC
eutB
eutH
eutJ
cchB
cchA
talA
tktB
yffG
narQ
yffB
dapE
ypfH
ypfI
purC
nlpB
dapA
gcvR
bcp
hyfH
hyfI
hyfR
focB
perM
upp
purM
purN
ppk
4.3
8.5
7.2
6.3
16.0
13.1
6.0
9.3
12.4
3.5
6.4
8.3
12.7
3.0
7.4
18.0
4.0
9.8
22.3
11.1
20.8
13.7
9.3
14.9
9.5
6.1
9.8
19.1
15.4
8.0
8.1
9.1
8.7
13.3
14.7
9.7
6.5
3.5
7.3
7.4
14.5
17.7
12.6
9.0
5.9
11.0
12.0
7.8
9.7
8.3
7.1
6.7
3.1
5.8
5.8
5.0
10.3
9.6
4.3
5.3
8.8
2.1
4.6
6.6
7.4
2.0
3.7
10.2
2.3
7.6
11.4
6.5
10.6
9.1
5.7
9.2
7.4
4.7
7.1
10.9
9.2
6.4
6.1
6.1
6.3
7.6
10.5
7.5
5.3
2.7
5.6
4.4
8.1
10.7
8.7
5.3
3.2
7.8
8.1
5.8
6.8
4.4
5.2
4.7
7.2
15.3
9.4
8.3
35.5
20.3
10.2
35.6
21.3
11.9
10.5
11.2
46.8
5.6
308.0
77.9
16.4
13.6
470.9
37.8
541.1
27.4
24.6
39.3
13.2
8.8
16.1
80.2
47.3
10.7
12.1
17.6
14.2
56.1
24.4
13.5
8.4
5.1
10.4
22.5
69.5
51.5
22.9
31.3
49.1
18.1
23.4
11.8
17.1
76.9
11.0
11.4
b2502
b2504
b2506
b2508
b2511
b2512
b2513
b2514
b2515
b2516
b2518
b2519
b2520
b2521
b2522
b2523
b2524
b2525
b2526
b2527
b2528
b2529
b2530
b2531
b2532
b2533
b2534
b2535
b2536
b2538
b2541
b2542
b2543
b2544
b2546
b2548
b2549
b2551
b2552
b2553
b2554
b2555
b2556
b2557
b2559
b2560
b2561
b2562
b2563
b2564
b2565
b2567
ppx
guaB
hisS
gcpE
yfgA
ndk
pbpC
sseA
sseB
pepB
yfhJ
fdx
hscA
yfhE
yfhF
yfhO
suhB
csiE
hcaT
hcaA1
hcaB
hcaD
yphA
yphB
yphD
yphF
yphG
glyA
hmpA
glnB
yfhA
yfhG
yfhK
purL
yfhC
yfhB
yfhH
yfhL
acpS
pdxJ
recO
rnc
8.6
9.7
8.9
7.6
21.0
13.2
9.3
8.9
6.3
5.7
6.1
11.4
8.6
4.8
10.9
10.7
11.6
9.7
8.9
7.1
9.8
9.1
10.8
4.4
2.8
4.4
17.3
1.3
16.1
5.8
20.4
6.4
7.4
17.9
12.0
13.0
5.7
8.0
9.9
6.2
14.3
11.1
6.1
10.6
6.0
3.7
9.5
22.4
8.3
7.9
12.0
7.9
5.8
6.7
6.0
5.8
13.0
7.2
5.9
6.2
4.9
4.9
4.0
6.6
5.7
3.5
7.1
7.9
8.5
6.8
7.0
3.9
7.5
6.5
7.9
3.1
2.3
2.6
10.0
1.1
9.7
3.5
12.7
3.4
5.1
9.3
6.7
7.9
3.7
6.1
7.2
3.7
9.3
7.3
4.8
7.2
4.0
2.8
6.4
12.5
6.6
5.3
9.4
5.9
16.5
17.5
16.8
11.0
54.5
86.7
21.9
16.1
8.8
7.0
13.4
41.9
17.3
7.3
23.2
16.9
18.2
16.6
12.4
42.1
14.0
15.3
17.1
7.5
3.7
14.4
63.5
1.5
47.1
15.6
51.4
43.4
13.8
279.6
56.3
37.7
12.6
11.6
15.9
19.3
31.8
23.2
8.4
20.2
11.9
5.4
18.9
108.6
11.3
15.1
16.6
12.1
b2569
b2570
b2571
b2572
b2573
b2575
b2576
b2577
b2578
b2579
b2580
b2581
b2583
b2584
b2585
b2587
b2592
b2593
b2594
b2595
b2596
b2599
b2600
b2601
b2602
b2603
b2604
b2605
b2606
b2607
b2608
b2609
b2610
b2611
b2612
b2613
b2614
b2615
b2616
b2617
b2618
b2619
b2620
b2622
b2627
b2629
b2630
b2631
b2638
b2640
b2643
b2645
lepA
rseC
rseB
rseA
rpoE
yfiC
srmB
yfiE
yfiK
yfiD
ung
yfiF
yfiP
yfiQ
pssA
kgtP
clpB
yfiH
sfhB
pheA
tyrA
aroF
yfiL
yfiN
yfiB
rplS
trmD
yfjA
rpsP
ffh
ypjE
yfjD
grpE
yfjB
recN
smpA
smpB
intA
yfjK
yfjM
yfjN
yfjO
yfjX
yfjZ
11.6
9.5
10.1
10.9
8.0
20.2
11.5
4.4
12.5
10.6
5.1
13.8
21.3
5.1
8.7
5.3
9.4
6.1
7.8
8.0
1.5
8.7
15.2
3.6
6.3
4.8
6.2
9.8
19.1
11.7
13.8
8.6
5.8
3.5
8.0
8.6
10.0
6.8
6.4
8.1
7.0
5.3
4.4
3.1
10.4
4.1
6.9
8.2
13.3
9.1
16.9
2.2
7.6
5.4
6.9
6.7
5.6
10.9
7.6
3.2
7.6
7.6
3.7
8.9
12.0
4.2
6.1
4.3
7.1
3.6
6.1
5.4
0.8
5.0
8.4
2.5
4.0
3.5
3.8
6.2
14.8
7.5
7.8
6.9
4.9
2.3
6.0
5.5
6.8
4.9
4.2
5.9
4.7
4.1
3.7
2.7
7.1
2.2
5.2
6.3
7.6
5.7
9.8
1.2
25.2
39.6
19.1
27.9
14.0
132.8
23.4
6.9
35.9
17.6
8.3
30.4
94.3
6.3
15.2
6.8
13.8
19.6
10.7
15.5
14.4
33.8
78.9
6.4
13.9
7.8
16.5
23.6
26.8
26.4
57.1
11.3
7.1
7.1
12.0
18.9
19.1
11.1
13.5
13.1
14.4
7.5
5.4
3.7
19.6
24.2
10.3
11.6
52.2
22.5
58.9
18.0
b2647
b2660
b2661
b2662
b2663
b2664
b2665
b2666
b2667
b2669
b2670
b2672
b2674
b2676
b2677
b2679
b2680
b2682
b2683
b2684
b2686
b2687
b2688
b2689
b2690
b2696
b2697
b2698
b2699
b2700
b2701
b2703
b2704
b2705
b2706
b2707
b2708
b2709
b2712
b2714
b2715
b2717
b2718
b2719
b2722
b2724
b2726
b2727
b2728
b2729
b2730
b2731
ypjA
ygaF
gabD
gabT
gabP
ygaE
ygaU
stpA
ygaM
nrdI
nrdF
proV
proX
ygaH
emrR
emrB
ygaG
gshA
yqaB
csrA
alaS
oraA
recA
ygaD
mltB
srlE
srlB
srlD
gutM
srlR
gutQ
ygaA
hypF
ascG
ascF
hycI
hycH
hycG
hycD
hycB
hypA
hypB
hypC
hypD
hypE
fhlA
9.6
4.5
2.9
6.8
6.4
11.8
8.2
4.8
1.4
15.9
3.9
16.8
9.4
12.4
4.1
6.8
11.0
3.6
5.9
7.1
9.8
12.3
9.6
7.2
4.4
7.2
8.0
9.0
6.6
7.2
10.9
11.2
17.3
23.3
6.3
6.8
9.9
10.4
14.7
12.8
15.7
7.0
11.2
8.4
16.5
9.2
5.4
8.7
3.2
9.9
9.8
12.7
5.5
3.2
2.2
4.6
4.2
7.2
5.7
3.5
0.7
8.5
2.5
10.4
6.2
6.9
2.6
5.5
7.2
2.4
4.8
5.5
5.7
9.3
7.0
5.2
2.8
5.2
6.6
4.9
4.8
5.4
8.0
8.3
11.1
13.0
5.1
5.3
6.7
6.7
8.3
8.2
8.1
4.5
6.4
4.8
8.3
5.7
3.6
5.6
1.8
6.5
5.6
8.0
38.6
7.4
4.3
13.0
12.9
33.0
14.6
7.9
15.7
133.3
8.8
43.6
19.2
61.7
9.3
9.0
23.7
8.1
7.7
10.2
35.5
18.0
15.1
11.5
10.1
11.6
10.1
54.0
10.7
11.1
16.8
17.1
39.2
117.0
8.3
9.5
19.5
23.7
65.5
29.2
210.8
15.3
44.4
34.5
1339.3
23.9
11.0
19.0
12.2
20.5
37.9
31.7
b2732
b2733
b2735
b2736
b2738
b2739
b2741
b2742
b2743
b2744
b2745
b2746
b2747
b2748
b2749
b2754
b2755
b2756
b2759
b2760
b2761
b2762
b2763
b2764
b2766
b2767
b2768
b2769
b2771
b2773
b2775
b2776
b2777
b2780
b2781
b2782
b2783
b2784
b2785
b2786
b2787
b2788
b2789
b2790
b2791
b2792
b2793
b2794
b2795
b2796
b2797
b2798
ygbA
mutS
ygbI
ygbL
ygbM
rpoS
nlpD
pcm
surE
ygbO
ygbB
ygbP
ygbE
ygbF
ygcK
ygcB
cysH
cysI
cysJ
ygcN
ygcO
ygcP
ygcQ
ygcS
ygcU
yqcE
ygcE
ygcF
pyrG
mazG
chpA
chpR
relA
ygcA
barA
ygcX
ygcY
yqcB
syd
yqcD
ygdH
sdaC
sdaB
exo
11.9
8.4
10.3
16.0
8.5
12.2
10.1
8.7
6.4
2.2
3.1
12.8
14.2
6.4
11.2
18.0
9.8
16.8
6.6
6.2
11.7
9.2
6.8
12.5
7.5
5.6
20.7
11.6
23.4
11.7
2.0
10.9
2.7
6.8
12.2
11.0
8.2
6.6
5.5
10.6
13.3
5.8
3.4
5.8
4.1
7.5
4.5
4.7
7.9
7.7
13.2
9.1
6.6
6.5
7.2
10.2
5.8
7.9
6.5
5.6
5.3
1.2
2.5
8.5
8.6
4.6
6.9
9.1
6.1
9.4
3.9
3.5
7.5
5.7
4.8
7.7
5.1
3.1
12.5
7.0
12.6
7.7
1.1
6.7
2.2
5.8
7.4
7.1
6.4
4.6
4.5
6.0
8.4
4.6
2.6
4.1
2.6
5.3
3.0
3.8
5.6
5.8
8.2
6.2
59.6
11.6
17.9
37.8
15.9
26.0
21.9
19.6
8.2
13.3
4.2
25.6
41.3
10.2
30.0
690.5
25.1
81.9
20.6
24.7
26.4
23.4
11.6
34.0
14.4
27.3
60.3
34.8
167.2
24.3
17.4
28.6
3.7
8.4
35.0
23.9
11.3
11.4
7.1
44.9
31.6
8.0
4.8
9.9
10.5
13.1
8.5
6.2
13.2
11.7
33.6
17.1
b2799
b2800
b2802
b2803
b2804
b2805
b2806
b2809
b2810
b2811
b2813
b2817
b2818
b2819
b2820
b2821
b2822
b2824
b2825
b2826
b2827
b2829
b2830
b2833
b2834
b2836
b2837
b2838
b2839
b2840
b2841
b2843
b2845
b2847
b2859
b2863
b2865
b2869
b2875
b2876
b2877
b2887
b2888
b2889
b2890
b2891
b2892
b2893
b2894
b2895
b2896
b2897
fucO
fucA
fucI
fucK
fucU
fucR
ygdE
ygdK
mltA
argA
recD
recB
ptr
recC
ygdB
ppdB
ppdA
thyA
ptsP
ygdP
aas
galR
lysA
lysR
ygeA
araE
kduI
yqeI
ygeV
ygfJ
ygfT
ygfU
lysS
prfB
recJ
dsbC
xerD
fldB
ygfY
6.0
7.5
10.9
10.4
11.2
3.8
15.1
12.2
5.6
5.8
5.2
26.7
18.6
13.2
16.2
7.9
10.0
16.2
11.0
14.6
15.0
5.5
2.2
9.1
5.6
6.6
2.3
13.7
16.0
9.8
12.1
7.7
7.8
10.9
13.8
8.2
15.4
3.4
9.4
7.3
1.9
16.8
18.2
2.8
13.5
10.6
12.4
7.8
4.6
9.5
8.7
8.2
4.0
5.2
6.7
6.8
8.1
2.9
10.1
7.6
4.1
4.1
3.5
15.0
10.8
8.1
10.8
5.8
7.6
9.9
6.6
8.1
9.3
3.7
1.9
5.6
4.4
5.1
1.7
9.7
8.1
5.6
7.3
5.9
4.0
6.5
7.6
4.1
10.0
2.9
7.0
5.2
1.4
11.2
9.3
1.9
9.6
7.8
8.6
6.0
3.5
6.3
5.7
5.9
12.1
13.5
28.8
22.1
18.3
5.4
29.6
31.2
8.6
9.5
10.7
120.5
67.6
35.9
32.9
12.1
14.5
45.4
34.0
73.6
38.5
11.3
2.8
24.1
7.7
9.4
3.4
23.4
897.2
38.6
36.0
10.9
183.2
33.5
75.2
1449.4
33.2
4.2
14.3
12.1
2.7
33.5
405.6
5.8
22.7
16.6
22.6
11.1
6.7
18.5
17.7
13.7
b2898
b2899
b2900
b2901
b2903
b2905
b2906
b2907
b2908
b2909
b2910
b2912
b2913
b2914
b2916
b2919
b2921
b2922
b2923
b2924
b2925
b2927
b2928
b2929
b2930
b2934
b2935
b2936
b2937
b2938
b2939
b2941
b2942
b2946
b2947
b2948
b2949
b2950
b2951
b2952
b2953
b2954
b2955
b2957
b2958
b2959
b2960
b2961
b2962
b2963
b2964
b2965
ygfZ
yqfB
bglA
gcvP
gcvT
visC
ubiH
pepP
ygfB
ygfE
ygfA
serA
rpiA
iciA
ygfG
ygfI
yggE
yggA
yggB
fba
epd
yggC
yggD
yggF
cmtB
tktA
yggG
speB
speA
yqgB
yqgD
metK
yggJ
gshB
yqgE
yqgF
yggR
yggS
yggT
yggU
yggV
yggW
ansB
yggN
yggL
yggH
mutY
yggX
mltC
nupG
speC
6.2
2.6
6.5
10.6
12.4
7.2
8.8
4.6
4.6
3.8
4.3
3.9
8.7
7.6
6.0
18.5
6.2
5.4
6.1
18.3
14.1
5.7
6.6
6.2
3.9
12.2
13.0
6.0
18.0
9.7
19.7
16.2
2.9
3.8
6.8
5.6
8.5
3.9
5.5
5.2
11.5
10.8
16.0
15.0
17.4
10.1
6.3
10.2
8.6
10.9
2.7
18.0
4.6
1.8
4.5
6.8
8.5
5.1
6.5
3.5
4.0
3.2
3.4
2.9
6.9
5.8
4.2
12.3
3.8
4.1
4.2
12.3
10.6
4.7
4.8
4.6
2.0
8.4
8.7
4.4
11.4
6.8
10.1
9.7
2.1
3.0
5.6
4.5
6.5
2.4
4.3
4.1
8.4
7.2
9.8
10.1
11.0
7.3
4.5
7.4
6.8
7.5
2.1
10.6
9.6
4.5
11.6
24.7
22.4
12.0
13.6
6.7
5.6
4.8
5.6
6.1
11.8
10.8
10.5
37.6
16.2
7.8
11.5
36.0
21.2
7.3
10.6
9.8
124.9
22.3
25.8
9.6
42.7
16.8
386.8
48.5
4.6
5.0
8.7
7.6
12.3
11.9
7.5
7.2
18.5
21.5
42.8
29.6
40.8
16.3
10.5
16.7
11.8
19.8
3.6
59.4
b2968
b2974
b2977
b2978
b2981
b2984
b2988
b2989
b2992
b2993
b2994
b2995
b2996
b2997
b3000
b3001
b3002
b3003
b3005
b3007
b3008
b3009
b3011
b3012
b3015
b3016
b3017
b3018
b3020
b3021
b3022
b3023
b3024
b3025
b3026
b3028
b3029
b3031
b3032
b3033
b3034
b3035
b3036
b3037
b3038
b3039
b3040
b3041
b3042
b3049
b3051
b3052
yghD
glcG
glcF
yghR
gsp
hybE
hybD
hybC
hybB
hybA
yqhA
yghA
exbD
metC
yghB
yqhD
yqhE
ygiR
sufI
plsC
ygiW
ygiX
ygiY
mdaB
ygiN
yqiA
icc
yqiB
yqiE
tolC
ygiA
ygiB
ygiC
ygiD
ygiE
ribB
glgS
-
8.3
13.8
15.5
8.8
10.7
10.4
9.2
8.9
16.8
12.8
12.2
9.1
7.7
4.4
10.0
5.4
6.1
11.2
1.5
8.1
16.3
8.6
7.8
8.5
15.2
14.5
12.2
4.9
7.6
2.6
3.2
9.8
14.0
8.0
13.3
5.0
6.3
6.5
5.3
5.4
7.0
7.1
9.0
5.2
5.7
17.7
3.1
5.6
4.8
10.0
17.3
11.7
4.9
7.5
8.7
6.1
5.3
6.6
6.7
6.7
8.8
7.4
6.5
6.1
6.2
3.8
6.4
4.4
3.3
6.6
0.8
4.3
9.7
6.5
4.9
5.6
8.5
9.1
6.1
3.6
6.2
1.8
2.4
7.1
9.8
5.3
7.1
3.4
4.6
5.2
4.9
4.2
5.3
6.2
6.2
4.4
3.4
10.8
2.3
4.1
3.8
6.3
9.7
7.3
28.1
82.1
70.5
16.0
3675.0
23.9
14.8
13.0
160.6
45.6
102.3
17.9
10.1
5.2
22.9
7.0
35.3
37.6
5.6
78.7
50.6
12.9
19.2
17.9
72.2
36.1
1038.8
7.6
9.8
4.6
4.8
16.0
24.4
17.0
113.0
9.2
9.9
8.7
5.7
7.6
10.4
8.3
16.3
6.3
17.6
48.8
4.4
8.6
6.5
23.4
81.9
29.2
b3053
b3054
b3055
b3056
b3057
b3058
b3064
b3065
b3066
b3067
b3068
b3070
b3071
b3072
b3074
b3075
b3080
b3082
b3083
b3085
b3086
b3087
b3091
b3092
b3093
b3094
b3095
b3096
b3097
b3099
b3100
b3101
b3102
b3103
b3107
b3108
b3109
b3110
b3115
b3116
b3117
b3122
b3124
b3127
b3128
b3129
b3130
b3131
b3133
b3135
b3136
b3139
glnE
ygiF
ygiM
cca
bacA
ygiG
ygjD
rpsU
dnaG
rpoD
ygjF
yqjH
yqjI
aer
ygjH
ebgR
ygjK
ygjM
ygjN
ygjP
ygjQ
ygjR
uxaA
uxaC
exuT
exuR
yqjA
yqjB
yqjC
yqjE
yqjF
yqjG
yhaH
yhaL
yhaM
yhaN
yhaO
tdcD
tdcC
tdcB
yhaD
yhaU
yhaG
sohA
yhaV
agaR
agaV
agaA
agaS
agaC
11.6
3.3
15.8
13.5
7.2
4.9
5.6
7.5
7.7
20.1
3.2
9.1
9.3
10.8
9.2
13.9
11.9
9.1
8.9
9.7
15.0
5.2
4.7
7.7
3.1
4.4
6.3
7.5
11.5
7.0
14.9
11.7
9.9
7.0
9.5
9.6
6.2
10.0
12.0
9.2
8.2
12.8
12.3
3.1
4.3
3.0
4.9
3.3
4.4
9.6
10.2
4.2
7.8
2.3
9.2
9.2
5.1
3.6
3.7
4.6
5.5
10.5
2.4
5.7
6.5
7.0
6.1
8.7
7.3
5.5
5.9
5.9
10.1
3.8
2.5
5.4
2.2
3.3
4.7
6.0
6.4
5.2
11.1
8.5
7.8
5.3
6.9
7.0
4.3
7.1
7.1
6.2
5.6
10.5
6.8
2.0
2.2
2.3
4.0
2.4
2.3
6.4
6.2
2.8
23.1
5.7
55.8
25.4
12.4
7.6
11.9
20.6
12.8
246.8
4.8
21.6
16.5
23.2
19.4
35.1
32.5
24.6
17.9
27.3
29.2
8.2
59.7
13.3
5.2
6.3
9.5
9.9
55.2
10.9
22.6
18.7
13.5
10.1
15.1
15.3
10.7
17.0
38.1
18.5
15.1
16.5
64.0
7.2
645.5
4.4
6.4
5.4
33.4
19.0
28.1
8.8
b3141
b3142
b3148
b3149
b3150
b3151
b3152
b3153
b3154
b3155
b3156
b3157
b3160
b3162
b3163
b3164
b3165
b3166
b3167
b3168
b3169
b3170
b3172
b3173
b3175
b3176
b3177
b3178
b3179
b3180
b3181
b3182
b3184
b3185
b3186
b3187
b3188
b3189
b3190
b3191
b3192
b3193
b3194
b3195
b3197
b3198
b3199
b3200
b3201
b3202
b3203
b3204
agaI
yraH
yraN
yraO
yraP
yraQ
yraR
yhbO
yhbP
yhbQ
yhbS
yhbT
yhbW
deaD
yhbM
pnp
rpsO
truB
rbfA
infB
nusA
yhbC
argG
yhbX
secG
mrsA
folP
hflB
ftsJ
yhbY
greA
dacB
yhbE
rpmA
rplU
ispB
nlp
murA
yrbA
yrbB
yrbC
yrbD
yrbE
yrbF
yrbH
yrbI
yrbK
yhbN
yhbG
rpoN
yhbH
ptsN
5.6
11.3
6.5
8.3
7.2
15.3
16.2
9.2
12.0
15.3
8.7
4.9
6.0
20.9
5.0
8.9
9.8
11.1
8.5
11.9
8.5
4.1
9.1
2.0
4.9
6.5
2.7
11.3
7.4
8.4
13.4
7.3
7.3
7.4
9.9
5.6
11.0
13.7
8.8
9.8
7.2
8.1
12.9
12.9
4.5
9.6
10.0
9.6
11.6
11.8
10.8
11.8
3.6
7.4
4.4
6.0
5.3
9.8
10.2
6.2
7.8
7.8
6.0
3.8
3.4
11.3
3.4
5.7
6.4
6.8
6.2
8.7
6.5
3.3
6.7
1.0
3.6
5.4
1.9
7.7
5.8
5.5
7.5
4.7
4.7
4.3
6.2
4.6
6.7
9.6
5.9
7.5
5.7
6.3
9.6
8.4
2.3
5.8
5.3
4.9
6.5
6.9
6.4
6.8
11.9
24.3
12.8
13.3
11.2
34.8
39.3
17.7
25.6
416.4
15.5
6.9
24.6
138.2
9.5
20.0
21.5
29.5
13.2
19.1
12.3
5.3
14.0
29.0
7.7
8.3
4.6
21.0
10.3
18.6
64.8
16.8
16.3
25.7
24.4
7.3
29.2
23.5
17.3
14.1
9.9
11.7
19.8
27.1
99.6
27.8
87.1
255.5
55.9
42.3
34.1
42.3
b3205
b3206
b3207
b3208
b3209
b3210
b3212
b3213
b3216
b3217
b3220
b3221
b3222
b3223
b3225
b3226
b3228
b3229
b3230
b3231
b3232
b3233
b3234
b3235
b3237
b3239
b3241
b3243
b3244
b3245
b3246
b3247
b3248
b3249
b3250
b3251
b3252
b3253
b3255
b3256
b3257
b3259
b3260
b3261
b3263
b3267
b3268
b3279
b3281
b3282
b3283
b3284
yhbJ
ptsO
yrbL
mtgA
yhbL
arcB
gltB
gltD
yhcD
yhcE
yhcG
yhcH
yhcI
yhcJ
nanA
yhcK
sspB
sspA
rpsI
rplM
yhcM
yhcB
degQ
degS
argR
yhcO
yhcQ
yhcS
tldD
yhdP
yhdR
cafA
yhdE
mreD
mreC
mreB
yhdA
yhdH
accB
accC
yhdT
prmA
yhdG
fis
yhdU
yhdV
yhdW
yrdA
aroE
yrdC
yrdD
smg
11.3
20.3
3.8
10.1
8.9
6.9
8.3
14.2
10.0
17.1
20.2
6.8
3.6
4.9
4.4
4.7
7.6
7.4
9.6
9.7
8.4
8.1
13.5
17.1
4.3
7.6
7.1
8.7
7.7
14.7
10.6
10.7
7.3
6.4
15.3
11.6
14.7
4.7
8.6
12.9
9.3
14.1
12.4
13.9
12.1
19.0
16.1
7.3
14.5
13.5
7.6
5.6
7.5
10.6
2.9
6.5
6.5
5.2
5.7
9.7
6.5
10.8
11.6
3.9
2.7
3.5
3.4
3.9
5.3
5.8
7.1
6.7
5.2
5.2
8.2
8.8
2.6
4.4
4.2
5.7
5.9
9.7
6.8
6.7
5.4
4.7
8.8
7.4
9.5
3.6
6.9
9.3
5.3
9.8
8.0
8.2
6.8
10.7
8.1
4.1
8.3
7.4
5.3
4.2
22.7
225.4
5.3
22.0
14.3
10.1
15.5
26.2
22.3
40.4
79.2
24.1
5.4
8.5
6.3
5.8
13.5
10.2
14.6
17.4
21.7
18.5
37.1
289.5
12.5
28.6
22.7
18.0
11.2
30.0
23.9
26.9
11.3
9.9
56.4
26.5
32.8
6.8
11.3
21.1
36.9
25.1
27.7
45.3
54.7
85.8
683.1
32.5
60.2
75.4
13.6
8.6
b3285
b3286
b3287
b3288
b3289
b3290
b3291
b3292
b3293
b3294
b3295
b3296
b3297
b3298
b3300
b3301
b3302
b3303
b3304
b3305
b3306
b3307
b3308
b3309
b3317
b3318
b3319
b3320
b3321
b3322
b3323
b3325
b3326
b3327
b3329
b3330
b3335
b3336
b3337
b3340
b3341
b3342
b3343
b3344
b3345
b3346
b3348
b3350
b3354
b3355
b3356
b3357
smf_2
smf_1
def
fmt
sun
trkA
mscL
yhdM
yhdN
rplQ
rpoA
rpsD
rpsK
rpsM
prlA
rplO
rpmD
rpsE
rplR
rplF
rpsH
rpsN
rplE
rplX
rplB
rplW
rplD
rplC
rpsJ
pinO
yheD
yheF
yheG
hofF
hofH
yheH
hofD
bfr
yheA
fusA
rpsG
rpsL
yheL
yheM
yheN
yheO
slyX
kefB
yheU
prkB
yhfA
crp
2.7
8.2
3.1
5.9
6.8
9.3
7.2
5.4
6.4
13.3
14.9
14.1
13.6
13.3
12.1
14.8
12.1
11.5
12.1
13.7
11.3
12.0
12.1
14.3
15.4
11.4
12.2
9.3
9.7
10.5
20.5
10.5
14.3
10.9
18.3
7.3
6.4
12.0
10.9
16.8
13.2
13.5
10.0
12.2
6.8
9.4
15.1
7.1
3.9
5.1
4.6
6.2
2.1
5.5
2.5
4.8
5.7
6.5
5.3
4.2
4.8
9.1
9.7
8.5
8.3
8.7
7.3
9.2
9.5
8.8
8.7
8.1
8.0
8.1
7.4
8.3
7.9
6.1
6.9
6.5
7.6
6.3
11.8
5.7
8.3
6.3
9.8
4.8
4.0
7.9
5.6
10.8
9.2
8.6
7.8
9.2
4.5
6.7
10.0
3.8
2.3
3.1
3.2
5.0
3.8
16.3
4.0
7.8
8.6
16.3
11.5
7.6
9.5
25.0
32.1
42.4
38.0
28.5
36.2
36.8
16.6
16.4
19.8
43.6
18.9
23.3
33.0
53.7
291.6
81.1
54.5
16.2
13.4
30.2
77.0
57.8
51.6
40.5
142.9
14.8
16.7
25.4
194.8
37.5
23.4
31.5
13.7
18.1
13.9
15.4
30.8
64.3
11.5
14.7
7.7
8.2
b3358
b3359
b3360
b3361
b3362
b3363
b3366
b3368
b3372
b3373
b3374
b3375
b3377
b3378
b3380
b3382
b3384
b3390
b3392
b3395
b3396
b3397
b3398
b3399
b3402
b3403
b3404
b3405
b3406
b3407
b3408
b3411
b3412
b3413
b3414
b3415
b3416
b3417
b3418
b3419
b3420
b3423
b3424
b3425
b3426
b3428
b3429
b3430
b3431
b3432
b3433
b3434
yhfK
argD
pabA
fic
yhfG
ppiA
nirD
cysG
yhfO
yhfP
yhfQ
yhfR
yhfT
yhfU
yhfW
yhfY
trpS
aroK
yrfA
yrfD
mrcA
yrfE
yrfF
yrfG
yhgE
pckA
envZ
ompR
greB
yhgF
feoA
yhgA
bioH
yhgH
yhgI
gntT
malQ
malP
malT
yhgJ
rtcA
glpR
glpG
glpE
glpD
glgP
glgA
glgC
glgX
glgB
asd
yhgN
17.2
5.9
6.8
2.9
2.1
6.0
8.9
12.6
15.6
10.1
9.6
2.7
11.8
8.4
16.3
9.0
19.1
10.1
16.4
9.0
7.3
7.2
7.0
4.9
4.0
14.1
5.9
6.1
3.9
7.8
2.4
12.7
5.0
5.4
7.7
8.1
10.2
11.1
6.1
22.2
14.4
14.5
6.5
6.5
9.4
17.0
9.3
7.8
6.1
6.3
6.8
13.1
11.4
4.2
4.5
2.1
1.5
4.4
4.8
8.3
8.1
7.0
5.8
2.0
7.8
5.3
9.9
4.7
10.2
5.7
9.1
5.4
4.4
4.9
3.8
3.6
2.7
10.3
4.0
5.0
2.7
5.0
1.4
8.3
3.8
4.3
6.2
4.6
6.7
7.2
4.1
11.7
7.7
7.8
4.6
3.7
6.0
9.9
7.0
5.9
4.6
5.3
5.5
8.9
34.8
9.8
13.6
4.8
3.4
9.3
56.9
26.5
215.5
17.8
28.6
4.2
24.7
20.4
46.3
94.4
142.2
42.9
83.1
27.0
20.5
13.4
42.1
7.5
7.3
22.2
12.0
7.8
7.0
17.3
10.4
27.4
7.4
7.3
10.1
35.5
21.1
25.0
12.4
214.9
109.5
98.7
11.1
27.3
21.0
58.7
13.9
11.5
9.2
7.9
9.1
24.7
b3435
b3438
b3439
b3440
b3443
b3444
b3447
b3448
b3450
b3452
b3457
b3459
b3460
b3461
b3463
b3464
b3465
b3466
b3467
b3468
b3469
b3471
b3472
b3473
b3474
b3475
b3476
b3477
b3478
b3479
b3481
b3483
b3487
b3488
b3493
b3494
b3496
b3497
b3498
b3499
b3500
b3501
b3502
b3503
b3506
b3507
b3508
b3509
b3510
b3511
b3512
b3513
gntU_2
gntR
yhhW
yhhX
yrhA
insA_6
ggt
yhhA
ugpC
ugpA
livH
yhhK
livJ
rpoH
ftsE
ftsY
yhhF
yhhL
yhhM
yhhN
zntA
yhhQ
yhhS
yhhT
yhhU
nikA
nikB
nikC
nikD
yhhG
yhhH
yhiI
yhiJ
pitA
yhiO
yhiP
yhiQ
prlC
yhiR
gor
arsR
arsB
arsC
slp
yhiF
yhiD
hdeB
hdeA
hdeD
yhiE
yhiU
18.8
8.5
7.8
10.8
10.3
10.2
9.6
2.8
10.0
11.1
13.3
13.5
8.8
6.8
8.8
4.9
4.7
6.2
8.5
2.5
8.5
11.9
5.5
7.1
3.7
3.8
18.1
9.7
14.7
10.6
11.7
10.0
7.4
8.1
13.9
1.4
6.2
12.8
16.4
7.8
11.1
6.6
8.5
8.6
5.6
6.9
9.8
8.3
8.9
3.0
2.0
12.0
10.2
6.4
5.7
7.1
5.7
7.0
5.7
2.0
7.4
6.4
7.4
9.7
5.2
4.8
6.6
4.2
3.6
3.9
5.9
1.5
6.5
6.5
4.2
4.6
2.8
2.3
10.1
7.0
9.3
8.3
8.2
5.3
5.2
4.5
10.2
1.2
5.2
9.0
11.8
5.1
8.5
3.6
5.0
5.4
3.8
4.0
7.1
6.3
6.3
2.5
1.5
7.8
120.0
12.6
12.6
22.0
53.8
18.8
30.0
4.4
15.6
41.1
64.3
22.0
30.8
11.8
13.2
6.0
6.6
14.7
15.1
7.1
12.3
68.6
8.1
15.7
5.6
11.3
87.0
15.8
34.8
14.8
20.0
82.9
12.3
46.8
21.5
1.6
7.6
22.0
26.8
15.9
15.9
40.3
26.7
20.9
10.5
25.1
15.9
12.4
14.9
3.7
3.1
26.1
b3514
b3515
b3516
b3518
b3519
b3520
b3521
b3522
b3523
b3524
b3526
b3527
b3528
b3529
b3533
b3534
b3535
b3536
b3537
b3538
b3540
b3541
b3542
b3543
b3544
b3549
b3554
b3555
b3556
b3559
b3560
b3562
b3565
b3566
b3567
b3569
b3570
b3571
b3581
b3582
b3588
b3589
b3590
b3591
b3592
b3597
b3598
b3599
b3600
b3601
b3602
b3603
yhiV
yhiW
yhiX
yhjA
treF
yhjB
yhjC
yhjD
yhjE
yhjG
kdgK
yhjJ
dctA
yhjK
yhjO
yhjQ
yhjR
yhjS
yhjT
yhjU
dppF
dppD
dppC
dppB
dppA
tag
yiaF
yiaG
cspA
glyS
glyQ
yiaA
xylA
xylF
xylG
xylR
bax
malS
sgbH
sgbU
aldB
yiaY
selB
selA
yibF
yibH
yibI
mtlA
mtlD
mtlR
yibL
lldP
10.0
2.0
1.8
11.5
10.2
13.6
9.1
3.8
17.9
6.0
6.4
9.0
4.1
10.8
11.6
14.2
5.5
5.9
5.5
6.6
16.8
9.2
9.6
7.3
5.0
12.4
6.6
5.6
1.4
11.5
7.1
18.7
2.3
5.9
6.9
6.5
8.8
6.6
20.0
17.6
13.8
24.2
13.8
8.8
3.4
4.2
3.2
3.7
4.5
6.5
2.7
10.6
5.3
1.6
1.4
6.0
5.5
7.8
5.0
2.7
10.4
4.1
4.3
5.9
3.4
7.9
7.3
7.6
4.5
5.0
4.5
3.8
9.1
5.8
4.8
5.2
3.4
8.2
4.7
3.4
1.0
8.4
5.5
9.8
1.7
4.4
4.9
4.1
6.2
5.2
11.4
8.9
7.3
12.2
7.1
5.2
2.3
2.9
2.0
2.8
3.5
5.0
1.7
7.2
87.4
2.5
2.7
121.1
77.5
53.0
46.1
6.6
66.9
11.6
13.1
18.8
5.3
16.8
28.7
117.3
7.3
7.2
6.9
24.5
113.9
22.2
407.3
12.6
9.4
25.1
11.5
15.7
2.4
18.2
10.3
213.2
3.6
9.0
11.7
16.3
15.1
9.1
81.7
930.3
142.6
4073.7
233.0
28.4
6.5
7.5
9.0
5.3
6.5
9.4
6.2
20.1
b3605
b3606
b3607
b3608
b3609
b3610
b3611
b3612
b3613
b3615
b3617
b3630
b3631
b3632
b3633
b3634
b3635
b3636
b3637
b3638
b3639
b3640
b3641
b3642
b3644
b3645
b3646
b3647
b3648
b3649
b3650
b3651
b3652
b3653
b3660
b3661
b3667
b3669
b3670
b3672
b3674
b3675
b3676
b3677
b3681
b3683
b3685
b3686
b3687
b3688
b3699
b3701
lldD
yibK
cysE
gpsA
secB
grxC
yibN
yibO
yibP
yibD
kbl
rfaP
rfaG
rfaQ
kdtA
kdtB
mutM
rpmG
rpmB
radC
dfp
dut
ttk
pyrE
yicC
dinD
yicG
yicF
gmk
rpoZ
spoT
spoU
recG
gltS
yicL
nlpA
uhpC
uhpA
ilvN
ivbL
yidF
yidG
yidH
yidI
glvG
glvC
yidE
ibpB
ibpA
yidQ
gyrB
dnaN
1.2
19.3
9.9
4.3
6.2
6.6
7.7
4.7
10.4
17.5
17.6
11.7
11.1
9.2
10.2
7.7
21.1
10.9
10.3
4.4
7.0
9.1
11.3
11.6
12.1
4.9
11.2
12.6
5.3
6.4
9.8
10.0
27.8
3.5
11.2
6.3
4.2
14.2
6.3
3.3
1.5
6.2
6.1
8.7
10.2
7.9
1.3
21.9
11.7
5.0
15.8
7.6
0.8
1.9
11.8
53.5
8.2
12.5
3.6
5.5
5.4
7.3
5.3
8.6
5.9
11.1
3.6
6.8
7.3
18.5
8.8
681.8
9.8
87.9
7.7
24.8
7.5
21.3
7.1
13.2
7.6
15.6
5.1
15.3
10.8
499.9
6.8
26.2
7.1
19.0
3.3
6.4
4.9
12.3
5.7
23.0
7.7
20.6
6.8
39.1
8.1
23.6
3.6
8.0
6.5
37.9
8.6
23.7
4.4
6.8
5.3
8.1
8.0
12.4
6.7
19.8
15.2
169.0
1.9
20.8
7.0
28.7
3.6
27.5
2.2
33.1
7.1
10554.4
3.7
20.5
2.7
4.2
1.0
3.1
3.2
215.3
3.9
14.0
5.9
16.7
6.1
31.9
4.5
32.8
1.0
2.0
11.7
174.3
6.4
71.3
3.9
6.8
8.7
91.7
5.3
13.5
b3702
b3703
b3704
b3706
b3709
b3712
b3713
b3715
b3717
b3718
b3724
b3725
b3727
b3736
b3737
b3738
b3739
b3741
b3742
b3744
b3745
b3746
b3748
b3749
b3750
b3751
b3752
b3753
b3754
b3755
b3762
b3763
b3764
b3766
b3777
b3778
b3779
b3780
b3781
b3782
b3783
b3784
b3785
b3786
b3787
b3788
b3789
b3790
b3791
b3792
b3793
b3794
dnaA
rpmH
rnpA
thdF
tnaB
yieE
yieF
yieH
yieJ
yieK
phoU
pstB
pstC
atpF
atpE
atpB
atpI
gidA
mioC
asnA
yieM
yieN
rbsD
rbsA
rbsC
rbsB
rbsK
rbsR
yieO
yieP
yifA
pssR
yifE
ilvL
yifN
rep
gppA
rhlB
trxA
rhoL
rho
rfe
wzzE
wecB
wecC
rffG
rffH
wecD
wecE
wzxE
wecF
wecG
9.4
7.6
10.9
8.6
11.4
9.5
11.6
9.0
11.7
10.1
7.6
10.5
13.6
12.8
12.7
8.7
4.8
11.0
9.2
10.1
5.2
6.1
3.8
2.4
8.2
14.5
6.3
8.8
5.7
5.1
6.5
3.9
11.1
9.2
7.4
11.7
19.3
4.9
5.5
1.9
6.5
8.0
10.9
10.1
10.6
13.2
13.6
12.4
15.7
11.9
10.1
12.3
7.2
5.5
7.4
5.9
7.7
7.2
8.7
6.0
7.5
6.2
5.6
7.7
8.0
7.2
8.3
5.9
3.6
7.4
6.8
6.4
3.7
4.2
2.8
2.0
6.9
7.6
5.0
6.2
3.8
4.2
5.4
3.1
7.4
6.6
4.0
6.3
12.0
4.1
4.6
1.6
5.3
6.1
9.0
6.0
7.3
7.8
7.6
7.7
10.2
7.0
7.2
8.0
13.6
12.1
20.3
16.2
21.5
14.1
17.6
17.3
26.5
26.0
11.9
16.2
47.2
57.5
27.1
16.8
6.9
21.3
14.2
23.7
8.7
11.0
5.8
3.2
10.3
143.8
8.5
15.0
11.4
6.3
8.4
5.3
22.8
15.1
52.6
78.2
50.0
6.1
6.9
2.4
8.7
11.9
14.0
31.2
19.4
40.7
67.7
32.5
34.0
40.1
16.9
26.6
b3795
b3800
b3801
b3802
b3803
b3804
b3805
b3806
b3807
b3809
b3820
b3821
b3822
b3823
b3825
b3826
b3827
b3830
b3831
b3832
b3833
b3834
b3835
b3836
b3837
b3838
b3839
b3840
b3842
b3843
b3844
b3845
b3859
b3860
b3861
b3863
b3865
b3866
b3867
b3869
b3870
b3871
b3872
b3876
b3878
b3881
b3883
b3884
b3885
b3886
b3898
yifK
aslB
aslA
hemY
hemX
hemD
hemC
cyaA
cyaY
dapF
yigI
pldA
recQ
yigJ
pldB
yigL
yigM
ysgA
udp
yigN
ubiE
yigP
yigR
yigU
yigW_1
rfaH
yigC
ubiB
fadA
yihE
dsbA
yihF
polA
yihA
yihI
hemN
glnL
glnA
yihK
yihL
yihO
yihQ
yihT
yihV
yihW
yihX
rbn
frvX
10.0
13.4
15.1
5.9
6.8
5.0
4.1
11.1
13.5
14.1
16.3
7.8
9.7
4.8
7.4
10.0
12.0
11.3
9.4
4.9
4.3
6.1
6.8
6.6
8.2
8.5
10.5
12.2
7.1
8.6
8.4
4.6
5.7
4.2
3.3
6.5
10.0
7.2
8.3
4.4
5.4
3.0
5.4
7.2
6.7
7.0
7.1
3.8
3.7
5.3
5.8
5.5
6.6
6.8
7.7
7.9
17.0
30.8
78.1
8.4
8.4
6.1
5.6
39.1
20.7
275.2
534.6
36.7
47.7
12.8
12.1
16.4
57.8
29.4
13.9
6.8
5.2
7.1
8.4
8.1
10.6
11.1
16.7
26.8
4.7
6.1
9.3
7.9
11.2
16.6
11.9
5.5
6.3
3.1
8.8
8.1
10.8
11.8
6.0
14.8
6.0
10.6
27.5
5.3
6.2
6.9
8.5
2.5
4.3
5.7
4.2
6.2
9.8
7.9
4.3
4.9
2.3
6.8
6.2
6.6
6.5
4.4
7.6
3.9
7.1
14.9
3.7
4.8
3.9
4.8
46.3
10.1
25.9
65.6
56.0
53.3
24.8
7.9
8.9
5.0
12.6
11.6
30.3
67.7
9.7
299.9
13.5
21.1
176.8
9.5
8.6
31.6
36.9
b3899
b3900
b3902
b3903
b3904
b3905
b3906
b3908
b3909
b3910
b3911
b3912
b3913
b3914
b3915
b3916
b3917
b3918
b3919
b3920
b3921
b3922
b3933
b3934
b3935
b3936
b3937
b3938
b3939
b3942
b3943
b3945
b3946
b3947
b3949
b3950
b3952
b3954
b3955
b3956
b3957
b3958
b3974
b3981
b3982
b3983
b3984
b3985
b3986
b3991
b3993
b3995
frvB
frvA
rhaD
rhaA
rhaB
rhaS
rhaR
sodA
kdgT
yiiM
cpxA
cpxR
yiiP
pfkA
sbp
cdh
tpiA
yiiQ
yiiR
yiiS
ftsN
cytR
priA
rpmE
yiiX
metJ
metB
katG
yijE
gldA
talC
ptsA
frwC
frwB
pflC
yijO
yijP
ppc
argE
argC
coaA
secE
nusG
rplK
rplA
rplJ
rplL
thiG
thiE
yjaE
9.9
2.3
8.2
7.4
7.7
11.2
9.9
8.0
16.2
5.5
4.8
4.0
2.0
1.8
5.5
9.5
8.9
11.5
12.6
5.2
4.9
13.3
7.3
5.8
7.3
9.8
16.4
4.1
7.4
6.7
5.0
17.0
8.5
8.8
12.0
9.3
14.9
19.8
19.1
7.8
8.2
12.1
9.7
9.6
9.7
10.2
10.6
11.1
16.9
25.8
5.8
3.2
6.1
1.2
5.6
5.2
4.7
6.1
6.9
6.1
9.4
4.2
3.4
3.3
1.4
1.3
4.4
7.0
6.0
7.9
10.2
3.3
3.4
7.7
5.4
4.4
4.0
6.9
10.0
3.1
4.8
3.9
2.8
11.3
6.1
5.4
8.4
4.9
8.4
12.3
12.3
5.9
5.7
6.4
6.2
6.8
6.7
7.0
7.2
7.2
9.3
13.4
3.7
2.3
25.9
21.5
15.7
12.6
21.1
77.7
17.6
11.6
58.1
8.1
8.7
5.0
3.1
3.2
7.2
15.1
17.5
21.6
16.3
12.6
9.3
49.1
11.2
8.5
42.2
16.4
45.3
5.8
16.1
24.9
23.0
34.4
14.1
23.7
20.9
79.8
63.2
50.5
42.8
11.3
14.4
113.5
21.7
16.5
17.3
18.3
20.3
24.6
96.6
316.3
13.9
5.2
b3996
b3997
b3999
b4000
b4001
b4003
b4005
b4019
b4020
b4021
b4022
b4023
b4024
b4025
b4027
b4030
b4031
b4032
b4033
b4037
b4039
b4040
b4041
b4042
b4043
b4054
b4055
b4056
b4057
b4058
b4059
b4062
b4064
b4065
b4067
b4069
b4070
b4072
b4073
b4075
b4076
b4077
b4079
b4090
b4093
b4094
b4098
b4104
b4105
b4107
b4108
b4111
yjaD
hemE
yjaG
hupA
yjaH
hydH
purD
metH
yjbB
pepE
yjbC
yjbD
lysC
pgi
yjbF
yjbA
xylE
malG
malF
malM
ubiC
ubiA
plsB
dgkA
lexA
tyrB
aphA
yjbQ
yjbR
uvrA
ssb
soxS
yjcD
yjcE
yjcG
acs
nrfA
nrfC
nrfD
nrfF
nrfG
gltP
fdhF
rpiB
phnO
phnN
phnJ
phnE
phnD
phnB
phnA
proP
5.7
8.9
6.7
20.4
10.0
11.9
4.7
11.4
7.5
8.5
3.4
10.5
6.8
7.2
8.1
8.2
12.4
9.6
7.7
10.9
4.7
4.2
9.4
6.2
9.9
15.2
4.2
3.8
6.3
14.7
6.2
1.4
7.1
5.3
22.2
2.6
12.0
3.3
10.5
17.3
34.5
9.0
9.7
13.1
9.4
8.5
23.7
10.6
11.3
9.1
8.5
5.6
3.8
7.0
5.5
10.2
6.9
7.5
2.5
7.2
5.2
6.1
2.3
7.9
3.7
5.6
4.7
4.8
8.1
6.6
5.4
6.1
3.4
3.5
6.3
4.2
6.4
8.5
3.0
2.9
4.3
8.8
4.6
1.0
4.5
4.1
11.4
2.0
7.0
1.7
7.4
10.6
17.8
5.0
6.2
7.5
6.9
5.5
12.3
7.0
6.9
5.9
5.9
4.2
10.9
12.0
8.6
5454.0
18.7
29.2
28.5
26.6
13.7
14.2
6.5
15.5
40.0
9.9
29.0
28.4
26.4
17.2
13.2
53.8
7.5
5.4
18.4
11.2
22.4
75.2
6.7
5.7
12.3
44.8
9.7
2.2
17.5
7.7
459.1
3.7
43.4
75.1
18.1
47.4
543.9
41.3
22.9
49.2
14.9
18.3
300.0
22.1
31.0
20.0
14.9
8.2
b4112
b4113
b4114
b4116
b4126
b4127
b4129
b4130
b4131
b4132
b4135
b4136
b4137
b4138
b4139
b4140
b4141
b4142
b4143
b4144
b4146
b4147
b4148
b4149
b4150
b4151
b4152
b4153
b4166
b4167
b4168
b4169
b4170
b4171
b4172
b4173
b4174
b4175
b4177
b4178
b4179
b4181
b4183
b4184
b4188
b4189
b4191
b4193
b4199
b4203
b4206
b4207
basS
basR
yjdB
adiY
yjdI
yjdJ
lysU
yjdL
cadA
cadB
yjdC
dsbD
cutA
dcuA
aspA
yjeH
mopB
mopA
yjeI
yjeK
efp
sugE
blc
ampC
frdD
frdC
frdB
yjeS
yjeF
yjeE
amiB
mutL
miaA
hfq
hflX
hflK
hflC
purA
yjeB
vacB
yjfI
yjfK
yjfL
yjfN
yjfO
yjfQ
sgaT
yjfY
rplI
ytfB
fklB
5.8
1.8
9.2
13.2
10.4
10.8
11.4
7.0
11.3
4.6
6.3
8.1
9.3
9.0
6.5
10.4
9.8
11.8
5.5
6.4
10.5
10.3
11.8
2.9
13.2
7.6
9.9
9.2
10.0
5.8
4.5
5.9
5.2
12.2
6.6
7.6
12.2
12.3
7.1
4.3
8.1
14.9
7.5
8.2
0.8
1.4
12.0
8.8
16.0
12.9
5.5
10.0
4.4
1.1
6.1
6.7
5.9
6.7
7.4
4.3
6.8
3.0
4.6
5.0
7.2
7.0
4.5
6.8
6.8
7.9
3.8
5.6
7.4
6.5
7.8
2.3
7.6
5.9
7.3
5.9
5.1
4.3
2.9
4.1
4.3
6.4
5.1
6.0
8.2
8.8
5.3
3.6
6.6
8.7
4.6
5.8
0.7
0.8
6.4
4.8
8.5
6.9
4.3
7.8
8.7
5.0
19.0
401.2
45.7
28.3
25.2
18.5
32.9
10.0
10.1
20.8
13.1
12.5
11.9
22.5
17.7
22.9
9.8
7.6
17.9
24.3
23.7
4.0
53.2
10.7
15.3
21.0
381.5
8.8
10.1
10.1
6.6
111.3
9.2
10.3
24.4
20.5
10.5
5.4
10.6
49.5
21.2
13.8
1.0
6.5
84.3
56.5
147.9
94.0
7.7
14.1
b4208
b4209
b4210
b4211
b4213
b4214
b4215
b4216
b4217
b4218
b4219
b4220
b4221
b4222
b4224
b4225
b4226
b4227
b4239
b4242
b4243
b4244
b4245
b4247
b4248
b4250
b4252
b4255
b4256
b4258
b4259
b4260
b4261
b4263
b4279
b4280
b4281
b4288
b4289
b4291
b4294
b4295
b4296
b4297
b4298
b4302
b4304
b4305
b4322
b4323
b4327
b4329
cycA
ytfE
ytfF
ytfG
cpdB
cysQ
ytfI
ytfJ
ytfK
ytfL
msrA
ytfM
ytfN
ytfP
chpS
chpB
ppa
ytfQ
treC
mgtA
yjgF
pyrI
pyrB
yjgG
yjgH
yjgK
yjgD
valS
holC
pepA
yjgP
yjgR
yjhB
yjhC
yjhD
fecD
fecC
fecA
insA_7
yjhU
yjhF
yjhG
yjhH
sgcA
sgcC
sgcX
uxuA
uxuB
yjiE
yjiG
10.9
10.6
10.1
9.1
10.6
8.2
6.4
7.6
3.8
4.9
3.4
11.2
6.4
8.7
6.6
3.6
13.7
6.6
6.5
9.6
9.8
10.5
10.8
8.9
7.5
10.1
14.1
7.8
8.9
10.7
8.3
5.1
8.4
5.5
2.7
8.4
15.2
15.0
12.8
8.8
13.9
4.2
23.0
8.4
10.6
6.5
17.1
9.7
5.1
8.0
5.2
6.9
7.6
6.8
7.2
6.0
7.8
5.7
3.3
6.3
2.7
2.7
2.7
8.3
4.1
5.7
4.0
2.7
7.4
4.4
4.2
6.7
7.6
7.0
6.9
4.5
5.0
5.8
8.9
6.4
6.2
7.3
5.7
4.3
6.1
3.2
1.9
5.6
9.0
8.7
7.4
6.3
9.6
3.3
14.4
4.6
7.1
4.0
9.2
4.8
4.1
6.4
2.7
4.5
19.4
24.4
16.9
18.4
16.9
14.3
109.0
9.7
6.7
30.9
4.7
17.0
13.9
18.0
19.4
5.6
93.9
13.3
14.2
16.6
13.9
21.2
25.3
881.4
14.8
40.6
34.7
10.1
15.3
20.3
15.8
6.3
13.3
18.2
4.7
17.4
50.6
53.8
50.1
14.5
25.4
5.7
58.1
46.5
20.9
16.3
113.4
1513.1
6.9
10.7
103.6
14.1
b4330
b4331
b4332
b4334
b4335
b4336
b4337
b4339
b4341
b4350
b4351
b4352
b4353
b4354
b4356
b4357
b4358
b4359
b4360
b4361
b4362
b4364
b4373
b4376
b4377
b4387
b4389
b4390
b4391
b4392
b4393
b4394
b4396
b4397
b4398
b4401
b4402
b4403
yjiH
yjiI
yjiJ
yjiL
yjiM
yjiN
yjiO
yjiQ
yjiS
hsdR
mrr
yjiA
yjiX
yjiY
yjiZ
yjjM
yjjN
mdoB
yjjA
dnaC
dnaT
yjjP
rimI
osmY
yjjU
smp
sms
nadR
yjjK
slt
trpR
yjjX
rob
creA
creB
arcA
yjjY
lasT
13.7
7.2
11.9
8.9
3.6
2.6
13.3
12.9
13.6
6.6
2.1
4.1
4.0
7.6
9.3
2.1
12.0
11.7
8.0
8.5
9.5
15.0
6.7
8.3
11.5
14.4
5.0
4.7
9.1
4.4
5.5
3.5
1.7
5.0
8.4
5.5
5.1
14.7
7.4
5.3
7.3
6.4
2.8
1.9
8.8
8.0
7.4
3.3
1.2
3.2
3.4
5.5
6.3
1.3
7.1
8.4
5.6
6.0
6.8
9.8
4.6
5.5
7.4
7.3
2.8
3.4
7.1
3.7
3.3
2.0
1.5
3.5
5.5
4.5
3.9
9.6
93.4
11.2
30.8
14.9
4.9
3.8
26.7
33.2
84.4
746.1
9.1
5.5
4.9
12.6
18.2
5.7
40.3
19.2
14.1
14.6
16.0
32.6
12.7
16.9
24.9
560.5
21.6
7.5
12.9
5.4
17.7
11.8
2.1
8.4
17.8
7.1
7.4
30.5