Insights Into the Relation Between mRNA and
Protein Expression Patterns: II. Experimental
Observations in Escherichia coli
Pat S. Lee,1 Leah B. Shaw,1,2 Leila H. Choe,1 Amit Mehra,3
Vassily Hatzimanikatis,3 Kelvin H. Lee1
1 Department of Chemical Engineering, Cornell University, 120 Olin Hall, Ithaca, New York 14853
2 Department of Physics, Cornell University, 117 Clark Hall, Ithaca, New York 14853
3 Department of Chemical Engineering, Northwestern University, Evanston, Illinois; telephone: 607-255-4215; fax: 607-255-9166; e-mail: khl9@cornell.edu
Published online 24 November 2003 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/bit.10841
Abstract: There is a need for improved appreciation of the
importance of genome-wide mRNA and protein expression
measurements and their role in understanding translation
and in relation to genome-wide mathematical frameworks
for gene expression regulation. We investigated the use of
a high-density microarray technique for mRNA expression
analysis and a two-dimensional protein electrophoresis–tandem
mass spectrometry method for protein analysis to
monitor changes in gene expression. We applied these
analytical tools in the context of an environmental perturbation of Escherichia coli cells—the addition of varying
amounts of IPTG. We also tested the application of these
tools to the study of a genetic perturbation of Escherichia
coli cells—the ability of certain strains to hypersecrete the
hemolysin protein. We observed a lack of correspondence
between mRNA and protein expression profiles. Although
our data do not include measurements on all expressed
genes (because the ability to measure protein expression
profiles is limiting), we observed that the qualitative and
quantitative behavior of the measurements of a subset of
expressed genes is similar to the behavior of the entire
system. The change in observed average mRNA and protein
amplification factors for 77 and 52 genes coincided with the
observed change in mRNA amplification factor for the entire
system. Furthermore, we found that relative changes in
expression could be used to elucidate mechanisms of gene
expression regulation for the system studied, even when
measurements were made on a small subset of the system.
© 2003 Wiley Periodicals, Inc.
Keywords: proteomics; functional genomics; translation
Correspondence to: K. H. Lee
P.S.L., L.B.S., and L.H.C. contributed equally to this work
Contract grant sponsors: National Science Foundation; USDA Agricultural Research Service; Corning Foundation Fellowship; Liu Memorial
Award Fellowship
Contract grant numbers: BES 9874938; BES 0120315; SCA#58-19071-146
INTRODUCTION
Together with complete genome sequence information, the
ability to generate genome-wide patterns of changes in gene
expression at both the mRNA and protein levels has signaled
a new era in the ability to monitor, understand, and
manipulate organisms. However, the promise of this new
paradigm in the study of biological systems has remained
unfulfilled due to technical challenges (e.g., an inability to
quantify changes in protein expression for all expressed
proteins) and a lack of useful statistical and computational
tools to manage, interpret, and integrate these new types of
information. Thus, many of the previous studies that rely on
both mRNA and protein expression profiles have been
observational in nature (Anderson and Seilhamer, 1997;
Baliga et al., 2002; Betts et al., 2002; Griffin et al., 2002;
Gygi et al., 1999).
The relevant previous literature originates in studies of
human liver based on Coomassie blue-stained two-dimensional protein electrophoresis (2DE) measurements for
protein expression and on transcript image methods for
mRNA measurements (Anderson and Seilhamer, 1997).
Investigators have observed a correlation coefficient of 0.48
between mRNA and protein abundances for 19 different
genes. A larger study in yeast (Gygi et al., 1999), which used
serial analysis of gene expression for mRNA expression
level quantification and silver-stained 2DE for protein
measurements, had a Pearson product moment correlation
coefficient of 0.935 for a set of 106 genes. However, this
value would change to 0.356 if one considers genes expressed at ten message copies per cell or fewer, which
accounts for 69% (73 of 106) of the genes measured. Beyond
these two initial investigations, other studies have reported
on a variety of organisms (Baliga et al., 2002; Betts et al.,
2002; Griffin et al., 2002; Ideker et al., 2001). A key feature
of all the observations and a common issue raised in the discussion of such results is the lack of an obvious linear correlation between mRNA expression and protein expression.
The poor correlation between mRNA and protein
expression levels may be the result of multiple factors.
Technical variability is a key challenge that remains for
both mRNA expression profiling and protein expression
profiling. Experimental variability leads many investigators
to consider biological and chip replicates for array
measurements and biological and gel replicates for protein
studies. The issue of variability is confounded by measurements of low-abundance mRNA and proteins that are often
at or below the statistically significant range permitted by
the experimental technique (Gygi et al., 2000), and has
motivated the proteomics community in particular to
consider alternative strategies.
Nonetheless, the previous studies have provided important data regarding the mRNA-to-protein relationship and
raise important questions about the broad applicability and
utility of genome-wide measurements. An obvious conclusion that can be drawn from these studies is that it is
essential to measure both mRNA and protein expression
profiles, whenever feasible, to gain insights into a biological system. The integration of these measurements
together with DNA sequence information provides a
starting point for further analysis. Indeed, it has been
shown mathematically that an understanding of simple
networks of gene expression requires analysis at both the
mRNA and protein levels (Hatzimanikatis and Lee, 1999).
Of course, genome-wide measurements of protein posttranslational modifications, activities, and metabolite concentrations are also important, but analogous technologies
do not yet exist for these measurements. The need for the
integration of DNA, mRNA, and protein measurements is
at least as important as the need for novel information
technology platforms and databases that permit the storage
and management of information of multiple data types
(such as sequences, arrays, gel images, spectra, etc.).
Although there are unmet needs with respect to standards
for the organization of this information, an often overlooked aspect of the integration of different levels of
biological information is the synergistic use of these data to
understand the underlying biological phenomenon in the
system of interest.
The development of an effective, mechanistic, mathematical framework to describe gene expression (DNA
sequence, and mRNA and protein expression) may provide
an important advance in the ability to understand biological
systems for at least two reasons. First, it may enable one to
predict changes in protein expression, based on changes in
mRNA expression. The ability to predict such changes
would address some of the technical challenges that remain
in protein sample preparation and separations, which make
it currently impossible to measure protein expression for all
expressed proteins in a biological system in a single
experiment. Second, and perhaps more importantly, the
combination of experimental data integrated into a
mathematical framework may be useful in identifying
which key regulatory features of translation are modified
by a prokaryote in response to environmental or genetic
perturbations. Examples of such regulatory features are the
mRNA degradation rates for particular messages, the
ribosome copy number, or the protein degradation rates.
If one could obtain such knowledge about the regulatory
features of a system of interest, then subsequent metabolic
and genetic engineering of the system could be facilitated.
The effective integration of mRNA and protein expression data within a mathematical framework will probably
rely on measurements of the relative change in gene
expression rather than on absolute measurements. The use
of relative changes versus absolute measurements is more
consistent with the use of dimensionless equations and also
easier to measure by experiment. One feature of the previous studies that makes it difficult to integrate the prior
data into appropriate modeling frameworks is the lack of
relative measurements. Typically, the data have been presented as an abundance of mRNA or abundance of protein,
perhaps over two different conditions (genetic or environmental perturbations). The use of relative ratios or scaled
parameters in modeling efforts typically requires at least
three experimental conditions to be tested. Thus, in this
study we report experimental measurements and analyze
them in the context of a mathematical framework for
translation (Mehra et al., 2003).
In this investigation we present mRNA expression and
protein expression levels based on an Affymetrix GeneChip
probe array analysis of Escherichia coli compared with protein expression levels measured using fluorescent tagging
of 2DE-separated proteins imaged using a laser-based scanning system. We use these data in the development of a
mechanistic mathematical framework for prokaryotic translation. We chose to employ these technologies because they
are used extensively for similar measurements and because
computational tools to extract data from the raw images are
commercially available. For the mRNA profile measurements in particular, we note that there are multiple analysis
techniques to quantify gene expression based on the same
raw dataset. For proteins, there are also a variety of quantification methods for protein expression based on different
staining and imaging technologies. We present results using
a variety of methods to test both genetic (a hemolysin
supersecretion phenotype) and environmental (varying concentrations of IPTG [isopropyl-β-D-thiogalactopyranoside])
perturbations and demonstrate that similar results can be
obtained using any of these methods if appropriate normalization is used. Our measurements further suggest that
simple models of gene expression may provide a useful
foundation for the integration of mRNA and protein expression information.
MATERIALS AND METHODS
Strains and Culture Conditions
In what follows we consider the effect that a minor
environmental perturbation may have on Escherichia coli
LEE ET AL.: RELATION BETWEEN MRNA AND PROTEIN EXPRESSION
gene expression with the addition of IPTG, commonly used
as an inducer for heterologous gene expression. Frozen E.
coli MG1655 cell stocks were streaked onto Luria–Bertani
(LB) plates and grown at 37°C. Single colonies were
subsequently streaked onto M9 minimal medium plates
supplemented with 0.4% glucose and 1 mg/L thiamine and
incubated at 37°C; single colonies were used to inoculate
125-mL flasks containing 25 mL of M9 minimal medium
with 0.4% glucose and 1 mg/L thiamine, and then shaken at
250 rpm at 37°C. Overnight cultures were subcultured in six
250-mL flasks containing 50 mL of M9 minimal medium
with glucose and thiamine, at a starting OD600 = 0.1. When
cultures reached OD600 = 0.8, four cultures were supplemented with IPTG at either 0.1 mM (low) or 1 mM (high).
Flasks were shaken for 1 h after IPTG addition, and cells
were harvested by centrifugation. For [35S]-methionine-labeled samples, six 1-mL aliquots were removed at OD600 =
0.6 and grown in culture tubes until OD600 = 0.8. IPTG was
added at 0.1 mM (low) or 1 mM (high) to four cultures.
Thirty minutes after addition of IPTG, 100 µCi [35S]-methionine was added, corresponding to 0.083 µM. Five
minutes later, cultures were supplemented with 1 µM cold
methionine. Culture tubes were grown for 30 min after
methionine chase and cells were harvested by centrifugation.
We designated the samples as having no, low, or high levels
of IPTG.
In the case of studying the effect of a genetic mutation on
Escherichia coli gene expression, we considered the effect
of hypersecretion of hemolysin. Plasmids pWAM1097 and
pWAM716 were obtained from Rodney Welch at the University of Wisconsin (Felmlee et al., 1985). pWAM1097
contains the genes for HlyC, HlyA, and ampicillin resistance. pWAM716 contains the genes for HlyB, HlyD, and
chloramphenicol resistance. In this study the strains used
were W3110, W3110 with the pWAM1097 and pWAM716
plasmids (Hly parent), and a hypersecretion mutant strain
(Lee and Lee, in preparation) derived from the Hly parent
strain (Hly mutant). Frozen stocks of these three strains
were streaked onto LB plates, or LB plates with 150 µg/mL
ampicillin and 170 µg/mL chloramphenicol, and incubated
at 37°C to form single colonies. Flasks (250 mL) containing
50 mL of LB medium, or LB with 150 µg/mL ampicillin and
170 µg/mL chloramphenicol, were inoculated with single colonies of W3110, Hly parent, or Hly mutant. Cultures were
split at OD600 ≈ 0.1 to create biological replicates. Flasks
were shaken at 250 rpm at 37°C until OD600 = 1.0 (mid-log
phase). Cells were harvested by centrifugation at 4°C. We
designated samples as W3110, Hly parent, or Hly mutant.
mRNA Analysis
Samples for RNA purification were resuspended in RNAlater (Ambion, Austin, TX) and stored at 4°C until processing (no more than 3 days after harvesting). Prior to
processing, cells were pelleted and washed in cold
phosphate-buffered saline to remove RNAlater. Samples
were processed using MasterPure RNA purification kits
(Epicentre, Madison, WI). Cells were lysed using proteinase
K, and genomic DNA was removed using DNase. RNA was
quantified using A260 measurements, and purity was
assessed using the A260/A280 ratio.
MG1655 samples were treated with polymerase chain
reaction (PCR) primers for rRNA, and the double-stranded
fragments digested, thereby enriching the sample for
mRNA. The enriched mRNA was directly end-labeled with
biotin, to which fluorescently tagged streptavidin binds. The
fluorescently labeled mRNA was then hybridized to E. coli
sense GeneChip probe arrays (Affymetrix, Santa Clara, CA).
cDNA fragments from W3110-derived samples were
created by two rounds of reverse-transcriptase PCR using
random hexamer primers (Rosenow et al., 2001). The resulting cDNA was then fluorescently labeled and hybridized
to E. coli antisense GeneChip probe arrays (Affymetrix).
The resulting data were analyzed using MICROARRAY
SUITE 4.0 (Affymetrix), and genome array processing
software (GAPS) (Selinger et al., 2000). The GAPS software
automatically calculates and subtracts background for each
chip set. For the W3110 sample sets, the calculated background levels were high relative to gene values; thus,
background values were added back to gene intensities.
IPTG sample background levels were much lower than gene
values, allowing direct use of these data.
Protein Analysis
2DE was performed as previously described (Hatzimanikatis
et al., 1999). MG1655 samples were lysed using sonication
as described previously (Choe et al., 1999). W3110 samples
were lysed using a combination of freeze–thaw cycles and
sonication (Lee and Lee, 2003). Lysed samples were used to
rehydrate pH 3–10 nonlinear Immobiline gels (Amersham
Biosciences, Piscataway, NJ). Isoelectric focusing and
sodium dodecyl sulfate–polyacrylamide gel electrophoresis
(SDS-PAGE) were performed as previously described (Choe
et al., 1999). 2DE-separated proteins were visualized using
one of a variety of techniques. Colloidal blue staining
(Invitrogen, Carlsbad, CA) was performed for 24 h; gels were
destained in water and then visualized using a laser
densitometer (Molecular Dynamics). Sypro Ruby staining
(Molecular Probes, Eugene, OR) was performed overnight,
followed by destaining in 7% acetic acid and 10% methanol
overnight; gels were then visualized with a laser fluorescence scanner
(Model FLA-3000, Fujifilm Medical Systems, Stamford,
CT). Ammoniacal silver stain was performed as previously
described (Hatzimanikatis et al., 1999) and imaged using a
laser densitometer (Molecular Dynamics). [35S] gels were
fixed in 40% ethanol, 10% acetic acid for 30 min; rehydrated
in 10% glycerol, 5% ethanol, 5% acetic acid for 1 h; and
dried (GelAir Dryer, Bio-Rad Laboratories) according to the
manufacturer’s instructions. These gels were subsequently
imaged with a phosphorimager (Model GS-525, Bio-Rad)
using Imaging Screen-CS, which was exposed for 3 days. In
addition to the dried 2DE gel, a dilution series of [35S]
standards loaded into a microwell plate was imaged with
BIOTECHNOLOGY AND BIOENGINEERING, VOL. 84, NO. 7, DECEMBER 30, 2003
each [35S] 2DE gel and also counted using a scintillation
counter (Model LS6800, Beckman) to serve as a reference
value for normalization across gels.
Image analysis for 2DE was performed with MELANIE 3
software (GeneBio, Geneva, Switzerland) using manual
editing after spot detection with default parameters. The
percent volume (%vol) was used for images generated with
protein stain, and the volume, normalized against the amount
of standards loaded (vol/stds), was used for [35S]-based
images. These parameters were selected to normalize for any
slight variation in protein load or imaging times.
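The two normalizations can be sketched as follows. This is a minimal illustration only: it assumes that %vol is a spot's volume expressed as a percentage of the summed volume of all detected spots on the same gel, and that vol/stds divides a spot's volume by the signal of the co-imaged [35S] standards. The function names and the spot values are hypothetical and are not taken from MELANIE 3.

```python
def percent_volume(spot_volumes):
    """Express each spot volume as a percentage of the gel's total spot volume.

    Assumed definition of %vol; spot_volumes maps a spot label to its raw volume.
    """
    total = sum(spot_volumes.values())
    if total <= 0:
        raise ValueError("total spot volume must be positive")
    return {spot: 100.0 * vol / total for spot, vol in spot_volumes.items()}


def volume_per_standard(spot_volumes, standards_signal):
    """Scale each spot volume by the signal of the co-imaged standards (assumed vol/stds)."""
    if standards_signal <= 0:
        raise ValueError("standards signal must be positive")
    return {spot: vol / standards_signal for spot, vol in spot_volumes.items()}


# Hypothetical gel, and the same sample at twice the protein load.
gel_a = {"spot1": 200.0, "spot2": 300.0, "spot3": 500.0}
gel_b = {spot: 2.0 * vol for spot, vol in gel_a.items()}

# %vol is unchanged by the difference in load.
pv_a = percent_volume(gel_a)
pv_b = percent_volume(gel_b)
```

Under these assumptions, a uniform change in protein load or exposure scales every spot equally and cancels out of %vol, which is the motivation given in the text for choosing these parameters.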
2DE protein spots for characterization by mass spectrometry were excised from 2DE gels, digested using trypsin, and
analyzed using a Model 4700 Proteomics Analyzer (Applied
Biosystems) as described elsewhere (Lee and Lee, 2003).
Mascot searches of the E. coli genome using both MS and
MS/MS data were performed using GPS EXPLORER, version
1.1 (Applied Biosystems), and confidence intervals >95%
were accepted.
RESULTS AND DISCUSSION
Selection of Metrics
A key consideration in the measurement of changes in
mRNA expression profiles is the selection of an appropriate
metric that distills the raw data (fluorescence intensities of
Affymetrix probe pairs and probe sets) into values for the
expression level of various genes. In the current context, we
were particularly interested in three possible metrics, which
have been commonly used as reasonable measures of gene
expression (Selinger et al., 2000). These metrics are the
Average Difference, 2Max, and Median (Selinger et al.,
2000) and can be extracted from the MICROARRAY SUITE
and GAPS software packages. As a representative example
of how the data are extracted, consider the mRNA expression
changes for leuC, 3-isopropylmalate isomerase subunit
(P30127; B0072), a gene consisting of 1401 basepairs
(bp). We calculated the fold change in mRNA expression for
the case of low IPTG induction relative to the no-IPTG case,
Table I. Comparison of mRNA expression metrics.

mRNA fold change        Low/no IPTG   High/no IPTG   High to low
leuC
  Average difference    1.29          1.75           1.36
  2 maximum             1.08          1.23           1.14
  Median                1.75          2.09           1.19
  Average of metrics                                 1.23
  CV of metrics                                      9.3%
uspA
  Average difference    1.59          3.35           2.11
  2 maximum             1.94          6.94           3.58
  Median                1.85          3.37           1.82
  Average of metrics                                 2.5
  CV of metrics                                      37.7%
Table II. Comparison of protein expression metrics.

Protein fold change     Low/no IPTG   High/no IPTG   High to low
leuC
  Ruby (%vol)           1.14          0.88           0.77
  Silver (%vol)         1.99          1.75           0.88
  Blue (%vol)           0.97          0.58           0.60
  [35S] (vol/stds)      0.76          0.35           0.46
  Average of metrics                                 0.68
  CV of metrics                                      27.3%
uspA
  Ruby (%vol)           1.11          1.13           1.02
  Silver (%vol)         0.93          1.06           1.14
  Average of metrics                                 1.08
  CV of metrics                                      7.9%
and the fold change in mRNA expression for the case of high
IPTG induction relative to the no-IPTG case, using each of
the three possible metrics just given. leuC message was
reported as ‘‘present’’ by both the MICROARRAY SUITE
software as well as by GAPS software. Table I provides the
values of these two cases relative to a reference state because
our objective was to integrate and interpret this information
in the context of a mathematical framework (Mehra et al.,
2003). We observed that the relative expression change for
the high versus low case for leuC under these conditions is
consistent among these three metrics, with an average value
of 1.23 and a coefficient of variation (CV) of 9%. This result
suggests that the selection of a particular metric may not
introduce substantial errors in measurement for this gene in
the conditions tested. This increase in leuC expression of
23% for a comparison of low to high IPTG was reported to be
statistically significant by both the MICROARRAY SUITE
software and GAPS software. Table I reports similar measurements on uspA, universal stress protein A (P28242;
B3495). Here, we again observed a statistically significant
2.5-fold increase in uspA message expression (high versus low IPTG) with CV = 38%
among the different metrics. Based on these similar initial
results, we reported mRNA expression analysis results using
the Average Difference metric.
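The high-to-low fold changes and their spread across metrics can be recomputed from the Low/no and High/no values in Table I. The sketch below is illustrative: it assumes the CV is the sample standard deviation (n - 1 denominator) divided by the mean, which reproduces the tabulated values to within rounding, and the names TABLE_I and summarize are hypothetical.

```python
from statistics import mean, stdev

# (Low/no IPTG, High/no IPTG) fold changes per metric, taken from Table I.
TABLE_I = {
    "leuC": {
        "Average difference": (1.29, 1.75),
        "2 maximum": (1.08, 1.23),
        "Median": (1.75, 2.09),
    },
    "uspA": {
        "Average difference": (1.59, 3.35),
        "2 maximum": (1.94, 6.94),
        "Median": (1.85, 3.37),
    },
}


def summarize(metrics):
    """Return (mean high-to-low fold change, CV in percent) across metrics."""
    # High/low is the ratio of the two fold changes that share the no-IPTG reference.
    ratios = [high / low for low, high in metrics.values()]
    avg = mean(ratios)
    cv_percent = 100.0 * stdev(ratios) / avg  # assumption: sample standard deviation
    return avg, cv_percent


leuc_avg, leuc_cv = summarize(TABLE_I["leuC"])
uspa_avg, uspa_cv = summarize(TABLE_I["uspA"])
```

Because the inputs are the rounded table entries, the recomputed CVs can differ from the tabulated 9.3% and 37.7% in the last digit.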
The quantitation of protein expression using 2DE can be
accomplished by a number of different detection chemistries
and metrics. In the current study we consider the possibility
of using fluorescence stains, ammoniacal silver, colloidal
blue, and radioisotope tagging of proteins. The use of Sypro
Ruby is known to provide a reasonably linear and broad
dynamic range for protein expression quantitation in 2DE
gels (Lopez et al., 2000), and we have previously reported
the effective use of ammoniacal silver stain as a method for
quantitation (Dutt and Lee, 1999). Here again, as a
representative example, we considered first the protein fold
change for leuC using these four measures of protein
expression. The results using each of these detection
strategies are reported in Table II. Figure 1 depicts the
relevant regions from 2DE gels including the identity of
some of the protein spots near the leuC protein. We observed
a 32% decrease in protein expression for leuC for the low
Figure 1. Comparison of the regions of 2DE gels visualized using Sypro
Ruby, ammoniacal silver, colloidal blue, and [35S]. The positions of
several identified protein spots are noted, including the position of the
protein corresponding to leuC.
versus high IPTG case. In this case, the CV for the different
measures of leuC protein was 27%. Note that this decrease of
32% in protein expression for leuC is in contrast with the
23% increase in corresponding message expression. For the
case of uspA (Fig. 2) we reported values from Sypro Ruby
and silver stains, as these were the most reliable values
obtained for this experiment. The expression level of this
protein remained unchanged for the low versus high IPTG
experiment with a CV of 7.8%. Based on these initial results,
we reported protein expression analysis data using Sypro
Ruby stain, because this is a commonly used stain with
excellent dynamic range and reproducibility (Choe and
Lee, 2003).
Comparison of Protein Amplification Factor and
mRNA Amplification Factor
Figures 3 and 4 present data for 77 genes from the IPTG and
52 genes from the hemolysin experiments. Criteria for
inclusion in the dataset require that the mRNA for a
particular gene be expressed (versus absent) and for the 2DE
spot to have been identified. These data are termed ‘‘mRNA
amplification factors,’’ as measured using the Average
Difference metric, and ‘‘protein amplification factors,’’ as
measured using the percent volume metric from Sypro
Ruby-stained 2DE gels. The line of one-to-one correspondence between mRNA and protein expression is presented in
the figures as a reference. Consistent with observation in
other organisms, we observed no clear relationship between
mRNA amplification and protein amplification factors for
Escherichia coli. The r² correlation coefficients for the cases
studied were 0.02 for the low/no IPTG case, 7 × 10⁻⁴ for the
high/no IPTG case, 0.04 for the W3110/Hly parent case, and
3 × 10⁻⁴ for the Hly mutant/Hly parent case. These values
reinforce the notion that, although the regulation of mRNA
expression might have evolved to support cellular functions
under perturbations, the translation of this transcriptional
program into protein, and ultimately into cellular phenotype,
is complex. This complexity calls for the development of
mathematical frameworks for the analysis and study of gene
and protein expression that will provide insights into key
biological parameters that affect the mRNA–protein
expression relationship for individual genes.
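An r² value of this kind can be obtained as the square of the Pearson correlation between the per-gene mRNA and protein amplification factors. The sketch below assumes that definition (the text does not state how r² was computed), and the example amplification factors are made up for illustration; they are not data from this study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def r_squared(x, y):
    """Squared Pearson correlation."""
    return pearson_r(x, y) ** 2


# Made-up amplification factors for five hypothetical genes.
mrna_factors = [0.8, 1.0, 1.2, 1.5, 2.0]
protein_factors = [1.1, 0.7, 1.3, 0.9, 1.0]
example_r2 = r_squared(mrna_factors, protein_factors)
```

Values near zero, such as those reported for the four cases, indicate that a straight line through the per-gene points explains almost none of the variance in protein amplification.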
Theoretical studies and mathematical analyses have
suggested that the genome-wide average mRNA amplification factor fGm (or f m from Mehra et al., 2003) has a significant
effect on the relation between mRNA and protein expression
levels (Table III). Values >1 suggest that the total amount of
message has increased; whereas values <1 suggest a
decrease. From dimensionless analysis (Equation VIII.4 in
Mehra et al. [2003]), we expect that if this value is <1, then
one can predict that, in general, but not for any particular
gene, the ratio of protein to mRNA expression should be >1,
assuming no change in the underlying kinetic and thermodynamic properties of the system; that is, the ‘‘centroid’’ of
data points in a plot similar to Figures 3 and 4 should be
above the one-to-one correspondence line. As the value of fGm
approaches 1, we would expect that the centroid would
approach the one-to-one line. Furthermore, we predict that,
as fGm increases to >1, then the centroid of the data will move
further below the one-to-one correspondence line. We
calculated the fGm based on all expressed genes in the genome
using the MICROARRAY SUITE and GAPS software for each
of the four cases studied and present these values in Figures 3
and 4. We observed, in a comparison of low to high IPTG
(relative to the same reference state), a change of fGm from
1.15 to 1.29. For the hemolysin-secreting strains we
observed a change in this value from 0.85 to 0.95. The
centroid of the data points for each of the four cases was
generated by calculating the fEm and the fEp. These terms refer
to the average value of the mRNA and protein amplifications
from the subset of genes for which we have experimental
observations on both message and protein. The experimental
data collected are consistent with the predictions from our
model of translation (Mehra et al., 2003).
We also calculated the ratio of the lengths of the line
segments connecting the centroid point to the one-to-one line
Figure 2. Comparison of the regions of 2DE gels stained using Sypro
Ruby and ammoniacal silver. The positions of several identified protein
spots are noted, as is the position of uspA.
Figure 3. A plot of the relative mRNA and protein amplification factors for a change in IPTG concentration for Escherichia coli MG1655. The values for
individual genes are plotted as the ratio of the condition of interest: (A) 0.1 mM IPTG or (B) 1.0 mM IPTG, relative to a reference state (0 mM IPTG). The
average mRNA amplification factor calculated for all expressed genes is listed and denoted by fGm for ‘‘genome-wide’’ mRNA amplification factor. The
centroid of the experimentally observed data points (average mRNA and protein amplification factors) denoted by the x and y coordinates, fEm and fEp , is also
plotted and listed.
and the ratios of the deviations of fGm from one for
the IPTG and the hemolysin experiments; for example: $(f_{G}^{m} - 1)_{\mathrm{IPTG,lo}} / (f_{G}^{m} - 1)_{\mathrm{IPTG,hi}}$. In the case of the IPTG
experiments, the ratio of the lengths of line segments for high
versus low, based on fEm and fEp, was 1.98, and the
ratio of the deviations of fGm from one was 1.93,
which is in excellent agreement. Note that the difference
between the values may represent the fact that the centroid
value and the line segment calculation were based only on
data collected from genes in which we were able to measure
both protein and mRNA expression; whereas the fGm was
calculated from mRNA expression for all expressed genes in
the genome (with no regard to protein expression). In the
case of the Hly experiment, we also observed surprisingly
close agreement between the line segment length ratio of fEm
and fEp (W3110 versus Hly mutant) of 2.98 and the ratio of the
deviation from one of the fGm , which was 3.00. This result is
surprising because the experimental data represent only a
Figure 4. A plot of the relative mRNA and protein amplification factors for a change in genetic background for Escherichia coli W3110. The values for
individual genes are plotted as the ratio of the condition of interest: (A) wild-type or (B) a hypersecretion mutant, relative to a reference state (a secreting
parent strain). The average mRNA amplification factor calculated for all expressed genes is listed and denoted by fGm for ‘‘genome-wide’’ mRNA
amplification factor. The centroid of the experimentally observed data points (average mRNA and protein amplification factors) denoted by the x and y
coordinates, fEm and fEp , is also plotted and listed.
Table III. Definitions for amplification factors.

fGm   The average mRNA amplification factor for all expressed genes genome-wide. Analogous to f m from Mehra et al. (2003).
fEm   The average mRNA amplification factor for the experimentally observed subset of genes. Only includes genes for which mRNA and protein expression was measured.
fEp   The average protein amplification factor for the experimentally observed subset of genes. Only includes genes for which mRNA and protein expression was measured.
relatively small fraction of the total expressed genes,
whereas the fGm was calculated from all of the expressed
genes. Thus, the experimental observations, which are a
small subset of total genes expressed, seem to be
representative of the behavior of the entire system for the
cases studied. There are reports in the literature about the
drawbacks of using only the experimentally observed data
because of bias introduced in the proteomics experiments
(e.g., Gygi et al., 2000). However, these observations were
made based on absolute measurements and did not provide
observations on relative changes in gene expression. The use
of relative changes in gene expression may result in the
ability to better extrapolate system-wide behavior based on
measurements of only part of the system.
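The deviation ratios quoted for fGm can be checked directly from the reported values (1.15 and 1.29 for low and high IPTG; 0.85 and 0.95 for the hemolysin strains). The sketch below makes two assumptions that are not stated explicitly in the text: the reported 1.93 is taken as high over low IPTG, and 0.85 is assigned to W3110 and 0.95 to the Hly mutant, since that assignment reproduces the reported 3.00. The line-segment length is taken as the perpendicular distance from the centroid to the one-to-one line; a vertical distance would give the same ratio. All function names are hypothetical.

```python
from math import sqrt


def deviation_ratio(f_gm_a, f_gm_b):
    """Ratio of deviations from one: (f_a - 1) / (f_b - 1)."""
    if f_gm_b == 1.0:
        raise ValueError("denominator condition has no deviation from one")
    return (f_gm_a - 1.0) / (f_gm_b - 1.0)


def distance_to_identity(f_em, f_ep):
    """Perpendicular distance from the centroid (f_em, f_ep) to the line y = x."""
    return abs(f_ep - f_em) / sqrt(2.0)


def segment_length_ratio(centroid_a, centroid_b):
    """Ratio of centroid-to-line distances for two conditions."""
    return distance_to_identity(*centroid_a) / distance_to_identity(*centroid_b)


iptg_ratio = deviation_ratio(1.29, 1.15)  # high IPTG over low IPTG
hly_ratio = deviation_ratio(0.85, 0.95)   # assumed: W3110 over Hly mutant
```

The centroid coordinates themselves are listed only in Figures 3 and 4, so segment_length_ratio is left to be called with those values.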
A closer inspection of Figures 3 and 4 also yields
interesting observations about the mRNA–protein relationship. Consider the vector traversed by the centroid between
Figure 3A and 3B. In the cases studied, the centroid migrated
more along the horizontal direction than the vertical
direction (0.28 units versus 0.09 units for the IPTG case).
This suggests that, although the fEm may have been increasing
in the cases studied, the corresponding change in fEp was not
as great. If there was no change in the underlying kinetics
and thermodynamics of the system upon perturbation with
IPTG, then the observed change in centroid position would
necessarily correspond to no significant change in the
number of ribosomes (or other factors not considered here,
such as mRNA or protein degradation rates). If the
availability of total ribosomes was constant and if the
amount of mRNA were to increase, then one would expect
the corresponding protein expression levels to decrease
relative to the change in mRNA because ribosome availability would become limiting. That is, the presence of more
messages without a change in the number of ribosomes
engaged in translation will result in proportionately less
protein expressed. In Figure 3, protein levels do not increase
proportionately based on the migration of the centroid. We
therefore conclude that there may have been no significant
increase in the number of ribosomes in the conditions tested
in the present experiments.
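The direction of the centroid migration can be summarized by its slope and angle, using the displacements reported above for the IPTG case (0.28 units in fEm and 0.09 units in fEp). This is only an illustrative calculation; the function name is hypothetical, and the 45-degree comparison is an interpretation aid rather than a quantity defined in the text.

```python
from math import atan2, degrees


def centroid_shift(delta_mrna, delta_protein):
    """Return (slope, angle in degrees) of the centroid displacement vector."""
    if delta_mrna == 0:
        raise ValueError("slope is undefined for a purely vertical shift")
    slope = delta_protein / delta_mrna
    angle = degrees(atan2(delta_protein, delta_mrna))
    return slope, angle


# Displacements reported for the IPTG case (Figure 3A to 3B).
slope, angle = centroid_shift(0.28, 0.09)
```

A shift parallel to the one-to-one line would have a slope of 1 (45 degrees); the much shallower direction found here reflects protein amplification rising less than mRNA amplification.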
Taken further, if one were to observe a substantial change
in fEp and a relatively small change in fEm, then one could
conclude that the number and availability of ribosomes
had increased significantly. Although we did not observe
this experimentally, it could be expected that the only means
to disproportionately increase the amount of protein
expression relative to message expression is to increase the
number of ribosomes engaged in translation.
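The ribosome argument in the two preceding paragraphs can be made concrete with a toy calculation. The sketch below is an assumption made for illustration only, not the mathematical framework of the companion paper: it supposes that the protein synthesis rate is proportional to the number of engaged ribosomes and saturates with message level. The function name, the rate constant and the saturation constant are all hypothetical, and the units are arbitrary.

```python
def protein_rate(mrna, ribosomes, k=1.0, k_sat=1.0):
    """Toy synthesis rate: linear in ribosomes, saturating in mRNA.

    This functional form is an assumption chosen for illustration.
    """
    return k * ribosomes * mrna / (k_sat + mrna)


baseline = protein_rate(mrna=1.0, ribosomes=1.0)

# Case A: message level doubles while the ribosome pool stays fixed.
fold_more_mrna = protein_rate(mrna=2.0, ribosomes=1.0) / baseline

# Case B: message level is unchanged while the ribosome pool doubles.
fold_more_ribosomes = protein_rate(mrna=1.0, ribosomes=2.0) / baseline
```

Under these assumptions, doubling the message raises protein output by less than twofold (case A), whereas doubling the ribosomes raises protein output twofold with no change in message (case B). Case B is the disproportionate protein increase that the text attributes to an increase in ribosomes engaged in translation.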
In conclusion, the approaches presented herein represent
an initial attempt to quantify gene expression levels and
integrate these data into a modeling framework. Based on
these observations, we found that experimental mRNA and
protein expression data on a subset of the genes expressed
in Escherichia coli were representative of genome-wide
response. Furthermore, the change in relative position of
the “centroid” of experimental data points (the fEm and the
fEp), supported by theoretical analysis, suggests a critical
role for ribosomes in determining the relationship between
mRNA and protein expression. We attempted to use
metrics and experimental procedures that are generally
available. However, the use of improved methods for
protein quantitation, such as those offered by isotopic
dilution strategies, may offer further refinement of these
data. A key outcome of this study is the observation that,
although experimental measurements of mRNA and protein
expression may not reveal any obvious correlation, such
measurements can provide important insights into gene
expression regulation—especially when taken in the
context of a mathematical framework for translation.
The authors thank Dave Garfin at Bio-Rad Laboratories for access to
a gel dryer, and Rodney Welch for the hemolysin plasmids. We also
thank Andrew Brooks at the University of Rochester Medical Center
for Affymetrix analysis.
References
Anderson L, Seilhamer J. 1997. A comparison of selected mRNA and
protein abundances in human liver. Electrophoresis 18:533 – 537.
Baliga NS, Pan M, Goo YA, Yi EC, Goodlett DR, Dimitrov K, Shannon P,
Aebersold R, Ng WV, Hood L. 2002. Coordinate regulation of energy
transduction modules in Halobacterium sp analyzed by a global
systems approach. Proc Natl Acad Sci USA 99:14913 – 14918.
Betts JC, Lukey PT, Robb LC, McAdam RA, Duncan K. 2002. Evaluation of
a nutrient starvation model of Mycobacterium tuberculosis persistence
by gene and protein expression profiling. Mol Microbiol 43:717 – 731.
Choe LH, Chen W, Lee KH. 1999. Proteome analysis of factor for inversion
stimulation (Fis) overproduction in Escherichia coli. Electrophoresis
20:798 – 805.
Choe LH, Lee KH. 2003. Quantitative and qualitative measure of
intralaboratory two-dimensional gel reproducibility and the effects of
sample preparation, sample load, and image analysis. Electrophoresis
24:3500 – 3507.
Dutt MJ, Lee KH. 2001. The scaled volume as an image analysis variable for
detecting changes in protein expression levels by silver stain. Electrophoresis 22:1627 – 1632.
Felmlee T, Pellett S, Welch R. 1985. Nucleotide sequence of an
Escherichia coli chromosomal hemolysin. J Bacteriol 163:94 – 105.
Griffin TJ, Gygi SP, Ideker T, Rist B, Eng J, Hood L, Aebersold R. 2002.
Complementary profiling of gene expression at the transcriptome and
proteome levels in Saccharomyces cerevisiae. Mol Cell Proteomics 1:
323 – 333.
Gygi SP, Rochon Y, Franza BR, Aebersold R. 1999. Correlation between
protein and mRNA abundance in yeast. Mol Cell Biol 19:1720 – 1730.
Gygi SP, Corthals GL, Zhang Y, Rochon Y, Aebersold R. 2000.
Evaluation of two-dimensional gel electrophoresis-based proteome
analysis technology. Proc Natl Acad Sci USA 97:9390 – 9395.
Hatzimanikatis V, Choe LH, Lee KH. 1999. Proteomics: Theoretical and
experimental considerations. Biotechnol Progr 15:312 – 318.
Hatzimanikatis V, Lee KH. 1999. Dynamical analysis of gene networks
requires both mRNA and protein expression information. Metab Eng
1:275 – 281.
Ideker T, Thorsson V, Ranish JA, Christmas R, Buhler J, Eng JK,
Bumgarner R, Goodlett DR, Aebersold R, Hood L. 2001. Integrated
genomic and proteomic analyses of a systematically perturbed metabolic network. Science 292:929 – 934.
Lee P, Lee KH. 2003. Escherichia coli—a model system that benefits
from and contributes to the evolution of proteomics. Biotechnol Bioeng
84:801 – 814.
Lopez MF, Berggren K, Chernokalskaya E, Lazarev A, Robinson M,
Patton WF. 2000. A comparison of silver stain and Sypro Ruby
protein gel stain with respect to protein detection in two-dimensional
gels and identification by peptide mass profiling. Electrophoresis 21:
3673 – 3683.
Mehra A, Lee KH, Hatzimanikatis VH. 2003. Insights into the relation
between mRNA and protein expression patterns. I. Theoretical
considerations. Biotechnol Bioeng 84:822 – 833.
Rosenow C, Saxena RM, Durst M, Gingeras TR. 2001. Prokaryotic RNA
preparation methods useful for high density array analysis: Comparison of two approaches. Nucl Acids Res 29:1 – 8.
Selinger DW, Cheung KJ, Mei R, Johansson EM, Richmond CS, Blattner
FR, Lockhart DJ, Church GM. 2000. RNA expression analysis using a
30 base pair resolution Escherichia coli genome array. Nat Biotechnol
18:1262 – 1268.
Yoshida K, Kobayashi K, Miwa Y, Kang CM, Matsunaga M, Yamaguchi
H, Tojo S, Yamamoto M, Nishi R, Ogasawara N, Nakayama T, Fujita
Y. 2001. Combined transcriptome and proteome analysis as a
powerful approach to study genes under glucose repression in Bacillus
subtilis. Nucl Acids Res 29:683 – 692.