Insights Into the Relation Between mRNA and Protein Expression Patterns: II. Experimental Observations in Escherichia coli

Pat S. Lee,1 Leah B. Shaw,1,2 Leila H. Choe,1 Amit Mehra,3 Vassily Hatzimanikatis,3 Kelvin H. Lee1

1 Department of Chemical Engineering, Cornell University, 120 Olin Hall, Ithaca, New York 14853
2 Department of Physics, Cornell University, 117 Clark Hall, Ithaca, New York 14853
3 Department of Chemical Engineering, Northwestern University, Evanston, Illinois
telephone: 607-255-4215; fax: 607-255-9166; e-mail: khl9@cornell.edu

Biotechnology and Bioengineering, Vol. 84, No. 7, December 30, 2003
Published online 24 November 2003 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/bit.10841

Abstract: There is a need for improved appreciation of the importance of genome-wide mRNA and protein expression measurements and their role in understanding translation and in relation to genome-wide mathematical frameworks for gene expression regulation. We investigated the use of a high-density microarray technique for mRNA expression analysis and a two-dimensional protein electrophoresis–tandem mass spectrometry method for protein analysis to monitor changes in gene expression. We applied these analytical tools in the context of an environmental perturbation of Escherichia coli cells: the addition of varying amounts of IPTG. We also tested the application of these tools to the study of a genetic perturbation of Escherichia coli cells: the ability of certain strains to hypersecrete the hemolysin protein. We observed a lack of correspondence between mRNA and protein expression profiles. Although our data do not include measurements on all expressed genes (because the ability to measure protein expression profiles is limiting), we observed that the qualitative and quantitative behavior of the measurements of a subset of expressed genes is similar to the behavior of the entire system.
The change in observed average mRNA and protein amplification factors for 77 and 52 genes coincided with the observed change in mRNA amplification factor for the entire system. Furthermore, we found that relative changes in expression could be used to elucidate mechanisms of gene expression regulation for the system studied, even when measurements were made on a small subset of the system. © 2003 Wiley Periodicals, Inc.

Keywords: proteomics; functional genomics; translation

Correspondence to: K. H. Lee
P.S.L., L.B.S., and L.H.C. contributed equally to this work
Contract grant sponsors: National Science Foundation; USDA Agricultural Research Service; Corning Foundation Fellowship; Liu Memorial Award Fellowship
Contract grant numbers: BES 9874938; BES 0120315; SCA#58-19071-146

INTRODUCTION

Together with complete genome sequence information, the ability to generate genome-wide patterns of changes in gene expression at both the mRNA and protein levels has signaled a new era in the ability to monitor, understand, and manipulate organisms. However, the promise of this new paradigm in the study of biological systems has remained unfulfilled due to technical challenges (e.g., an inability to quantify changes in protein expression for all expressed proteins) and a lack of useful statistical and computational tools to manage, interpret, and integrate these new types of information. Thus, many of the previous studies that rely on both mRNA and protein expression profiles have been observational in nature (Anderson and Seilhamer, 1997; Baliga et al., 2002; Betts et al., 2002; Griffin et al., 2002; Gygi et al., 1999). The relevant previous literature originates in studies of human liver based on Coomassie blue-stained two-dimensional protein electrophoresis (2DE) measurements for protein expression and on transcript image methods for mRNA measurements (Anderson and Seilhamer, 1997).
Investigators have observed a correlation coefficient of 0.48 between mRNA and protein abundances for 19 different genes. A larger study in yeast (Gygi et al., 1999), which used serial analysis of gene expression for mRNA expression level quantification and silver-stained 2DE for protein measurements, had a Pearson product moment correlation coefficient of 0.935 for a set of 106 genes. However, this value would change to 0.356 if one considers genes expressed at ten message copies per cell or fewer, which accounts for 69% (73 of 106) of the genes measured. Beyond these two initial investigations, other studies have reported on a variety of organisms (Baliga et al., 2002; Betts et al., 2002; Griffin et al., 2002; Ideker et al., 2001). A key feature of all the observations and a common issue raised in the discussion of such results is the lack of an obvious linear correlation between mRNA expression and protein expression. The poor correlation between mRNA and protein expression levels may be the result of multiple factors. Technical variability is a key challenge that remains for both mRNA expression profiling and protein expression profiling. Experimental variability leads many investigators to consider biological and chip replicates for array measurements and biological and gel replicates for protein studies. The issue of variability is confounded by measurements of low-abundance mRNA and proteins that are often at or below the statistically significant range permitted by the experimental technique (Gygi et al., 2000), and has motivated the proteomics community in particular to consider alternative strategies. Nonetheless, the previous studies have provided important data regarding the mRNA-to-protein relationship and raise important questions about the broad applicability and utility of genome-wide measurements. 
An obvious conclusion that can be drawn from these studies is that it is essential to measure both mRNA and protein expression profiles, whenever feasible, to gain insights into a biological system. The integration of these measurements together with DNA sequence information provides a starting point for further analysis. Indeed, it has been shown mathematically that an understanding of simple networks of gene expression requires analysis at both the mRNA and protein levels (Hatzimanikatis and Lee, 1999). Of course, genome-wide measurements of protein posttranslational modifications, activities, and metabolite concentrations are also important, but analogous technologies do not yet exist for these measurements. The need for the integration of DNA, mRNA, and protein measurements is at least as important as the need for novel information technology platforms and databases that permit the storage and management of information of multiple data types (such as sequences, arrays, gel images, spectra, etc.). Although there are unmet needs with respect to standards for the organization of this information, an often overlooked aspect of the integration of different levels of biological information is the synergistic use of these data to understand the underlying biological phenomenon in the system of interest. The development of an effective, mechanistic, mathematical framework to describe gene expression (DNA sequence, and mRNA and protein expression) may provide an important advance in the ability to understand biological systems for at least two reasons. First, it may enable one to predict changes in protein expression, based on changes in mRNA expression. The ability to predict such changes would address some of the technical challenges that remain in protein sample preparation and separations, which make it currently impossible to measure protein expression for all expressed proteins in a biological system in a single experiment. 
Second, and perhaps more importantly, the combination of experimental data integrated into a mathematical framework may be useful in identifying which key regulatory features of translation are modified by a prokaryote in response to environmental or genetic perturbations. Examples of such regulatory features are the mRNA degradation rates for particular messages, the ribosome copy number, or the protein degradation rates. If one could obtain such knowledge about the regulatory features of a system of interest, then subsequent metabolic and genetic engineering of the system could be facilitated. The effective integration of mRNA and protein expression data within a mathematical framework will probably rely on measurements of the relative change in gene expression rather than on absolute measurements. The use of relative changes versus absolute measurements is more consistent with the use of dimensionless equations and also easier to measure by experiment. One feature of the previous studies that makes it difficult to integrate the prior data into appropriate modeling frameworks is the lack of relative measurements. Typically, the data have been presented as an abundance of mRNA or abundance of protein, perhaps over two different conditions (genetic or environmental perturbations). The use of relative ratios or scaled parameters in modeling efforts typically requires at least three experimental conditions to be tested. Thus, in this study we report experimental measurements and analyze them in the context of a mathematical framework for translation (Mehra et al., 2003). In this investigation we present mRNA expression and protein expression levels based on an Affymetrix GeneChip probe array analysis of Escherichia coli compared with protein expression levels measured using fluorescent tagging of 2DE-separated proteins imaged using a laser-based scanning system. We use these data in the development of a mechanistic mathematical framework for prokaryotic translation. 
We chose to employ these technologies because they are used extensively for similar measurements and because computational tools to extract data from the raw images are commercially available. For the mRNA profile measurements in particular, we note that there are multiple analysis techniques to quantify gene expression based on the same raw dataset. For proteins, there are also a variety of quantification methods for protein expression based on different staining and imaging technologies. We present results using a variety of methods to test both genetic (a hemolysin supersecretion phenotype) and environmental (varying concentrations of IPTG, isopropyl-β-D-thiogalactopyranoside) perturbations and demonstrate that similar results can be obtained using any of these methods if appropriate normalization is used. Our measurements further suggest that simple models of gene expression may provide a useful foundation for the integration of mRNA and protein expression information.

MATERIALS AND METHODS

Strains and Culture Conditions

In what follows we consider the effect that a minor environmental perturbation may have on Escherichia coli gene expression with the addition of IPTG, commonly used as an inducer for heterologous gene expression. Frozen E. coli MG1655 cell stocks were streaked onto Luria–Bertani (LB) plates and grown at 37°C. Single colonies were subsequently streaked onto M9 minimal medium plates supplemented with 0.4% glucose and 1 mg/L thiamine and incubated at 37°C; single colonies were used to inoculate 125-mL flasks containing 25 mL of M9 minimal medium with 0.4% glucose and 1 mg/L thiamine, which were then shaken at 250 rpm at 37°C. Overnight cultures were subcultured in six 250-mL flasks containing 50 mL of M9 minimal medium with glucose and thiamine, at a starting OD600 = 0.1. When cultures reached OD600 = 0.8, four cultures were supplemented with IPTG at either 0.1 mM (low) or 1 mM (high).
Flasks were shaken for 1 h after IPTG addition, and cells were harvested by centrifugation. For [35S]-methionine-labeled samples, six 1-mL aliquots were removed at OD600 = 0.6 and grown in culture tubes until OD600 = 0.8. IPTG was added at 0.1 mM (low) or 1 mM (high) to four cultures. Thirty minutes after addition of IPTG, 100 µCi [35S]methionine was added, corresponding to 0.083 µM. Five minutes later, cultures were supplemented with 1 µM cold methionine. Culture tubes were grown for 30 min after methionine chase and cells were harvested by centrifugation. We designated the samples as having no, low, or high levels of IPTG.

In the case of studying the effect of a genetic mutation on Escherichia coli gene expression, we considered the effect of hypersecretion of hemolysin. Plasmids pWAM1097 and pWAM716 were obtained from Rodney Welch at the University of Wisconsin (Felmlee et al., 1985). pWAM1097 contains the genes for HlyC, HlyA, and ampicillin resistance. pWAM716 contains the genes for HlyB, HlyD, and chloramphenicol resistance. In this study the strains used were W3110, W3110 with the pWAM1097 and pWAM716 plasmids (Hly parent), and a hypersecretion mutant strain (Lee and Lee, in preparation) derived from the Hly parent strain (Hly mutant). Frozen stocks of these three strains were streaked onto LB plates, or LB plates with 150 µg/mL ampicillin and 170 µg/mL chloramphenicol, and incubated at 37°C to form single colonies. Flasks (250 mL) containing 50 mL of LB medium, or LB with 150 µg/mL ampicillin and 170 µg/mL chloramphenicol, were inoculated with single colonies of W3110, Hly parent, or Hly mutant. Cultures were split at OD600 ≈ 0.1 to create biological replicates. Flasks were shaken at 250 rpm at 37°C until OD600 = 1.0 (mid-log phase). Cells were harvested by centrifugation at 4°C. We designated samples as W3110, Hly parent, or Hly mutant.
mRNA Analysis

Samples for RNA purification were resuspended in RNAlater (Ambion, Austin, TX) and stored at 4°C until processing (no more than 3 days after harvesting). Prior to processing, cells were pelleted and washed in cold phosphate-buffered saline to remove RNAlater. Samples were processed using MasterPure RNA purification kits (Epicentre, Madison, WI). Cells were lysed using proteinase K, and genomic DNA was removed using DNase. RNA was quantified using A260 measurements, and purity was assessed using the A260/A280 ratio. MG1655 samples were treated with polymerase chain reaction (PCR) primers for rRNA, and the double-stranded fragments were digested, thereby enriching the sample for mRNA. The enriched mRNA was directly end-labeled with biotin, to which fluorescently tagged streptavidin binds. The fluorescently labeled mRNA was then hybridized to E. coli sense GeneChip probe arrays (Affymetrix, Santa Clara, CA). cDNA fragments from W3110-derived samples were created by two rounds of reverse-transcriptase PCR using random hexamer primers (Rosenow et al., 2001). The resulting cDNA was then fluorescently labeled and hybridized to E. coli antisense GeneChip probe arrays (Affymetrix). The resulting data were analyzed using MICROARRAY SUITE 4.0 (Affymetrix) and genome array processing software (GAPS) (Selinger et al., 2000). The GAPS software automatically calculates and subtracts background for each chip set. For the W3110 sample sets, the calculated background levels were high relative to gene values; thus, background values were added back to gene intensities. IPTG sample background levels were much lower than gene values, allowing direct use of these data.

Protein Analysis

2DE was performed as previously described (Hatzimanikatis et al., 1999). MG1655 samples were lysed using sonication as described previously (Choe et al., 1999). W3110 samples were lysed using a combination of freeze–thaw cycles and sonication (Lee and Lee, 2003).
Lysed samples were used to rehydrate pH 3–10 nonlinear Immobiline gels (Amersham Biosciences, Piscataway, NJ). Isoelectric focusing and sodium dodecyl sulfate–polyacrylamide gel electrophoresis (SDS-PAGE) were performed as previously described (Choe et al., 1999). 2DE-separated proteins were visualized using one of a variety of techniques. Colloidal blue staining (Invitrogen, Carlsbad, CA) was performed for 24 h; gels were destained in water and then visualized using a laser densitometer (Molecular Dynamics). Sypro Ruby staining (Molecular Probes, Eugene, OR) was performed overnight, followed by destaining in 7% acetic acid and 10% methanol overnight; gels were then visualized with a laser fluorescence scanner (Model FLA-3000, Fujifilm Medical Systems, Stamford, CT). Ammoniacal silver stain was performed as previously described (Hatzimanikatis et al., 1999) and imaged using a laser densitometer (Molecular Dynamics). [35S] gels were fixed in 40% ethanol, 10% acetic acid for 30 min; rehydrated in 10% glycerol, 5% ethanol, 5% acetic acid for 1 h; and dried (GelAir Dryer, Bio-Rad Laboratories) according to the manufacturer's instructions. These gels were subsequently imaged with a phosphorimager (Model GS-525, Bio-Rad) using Imaging Screen-CS, which was exposed for 3 days. In addition to the dried 2DE gel, a dilution series of [35S] standards loaded into a microwell plate was imaged with each [35S] 2DE gel and also counted using a scintillation counter (Model LS6800, Beckman) to serve as a reference value for normalization across gels.

Image analysis for 2DE was performed with MELANIE 3 software (GeneBio, Geneva, Switzerland) using manual editing after spot detection with default parameters. The percent volume (%vol) was used for images generated with protein stain, and the volume, normalized against the amount of standards loaded (vol/stds), was used for [35S]-based images.
These parameters were selected to normalize for any slight variation in protein load or imaging times. 2DE protein spots for characterization by mass spectrometry were excised from 2DE gels, digested using trypsin, and analyzed using a Model 4700 Proteomics Analyzer (Applied Biosystems) as described elsewhere (Lee and Lee, 2003). Mascot searches of the E. coli genome using both MS and MS/MS data were performed using GPS EXPLORER, version 1.1 (Applied Biosystems), and confidence intervals >95% were accepted.

RESULTS AND DISCUSSION

Selection of Metrics

A key consideration in the measurement of changes in mRNA expression profiles is the selection of an appropriate metric that distills the raw data (fluorescence intensities of Affymetrix probe pairs and probe sets) into values for the expression level of various genes. In the current context, we were particularly interested in three possible metrics, which have been commonly used as reasonable measures of gene expression (Selinger et al., 2000). These metrics are the Average Difference, 2Max, and Median (Selinger et al., 2000) and can be extracted from the MICROARRAY SUITE and GAPS software packages. As a representative example of how the data are extracted, consider the mRNA expression changes for leuC, 3-isopropylmalate isomerase subunit (P30127; B0072), a gene consisting of 1401 basepairs (bp). We calculated the fold change in mRNA expression for the case of low IPTG induction relative to the no-IPTG case, and the fold change in mRNA expression for the case of high IPTG induction relative to the no-IPTG case, using each of the three possible metrics just given. leuC message was reported as “present” by both the MICROARRAY SUITE software and the GAPS software. Table I provides the values of these two cases relative to a reference state because our objective was to integrate and interpret this information in the context of a mathematical framework (Mehra et al., 2003). We observed that the relative expression change for the high versus low case for leuC under these conditions is consistent among these three metrics, with an average value of 1.23 and a coefficient of variation (CV) of 9%. This result suggests that the selection of a particular metric may not introduce substantial errors in measurement for this gene in the conditions tested. This increase in leuC expression of 23% for a comparison of low to high IPTG was reported to be statistically significant by both the MICROARRAY SUITE software and GAPS software. Table I reports similar measurements on uspA, universal stress protein A (P28242; B3495). Here, we again observed a statistically significant increase in uspA message expression (2.5-fold on average) with CV = 38% among the different metrics. Based on these similar initial results, we reported mRNA expression analysis results using the Average Difference metric.

Table I. Comparison of mRNA expression metrics (mRNA fold change).

Gene  Metric              Low/no IPTG  High/no IPTG  High to low
leuC  Average difference  1.29         1.75          1.36
leuC  2 maximum           1.08         1.23          1.14
leuC  Median              1.75         2.09          1.19
leuC  Average of metrics                             1.23
leuC  CV of metrics                                  9.3%
uspA  Average difference  1.59         3.35          2.11
uspA  2 maximum           1.94         6.94          3.58
uspA  Median              1.85         3.37          1.82
uspA  Average of metrics                             2.5
uspA  CV of metrics                                  37.7%

Table II. Comparison of protein expression metrics (protein fold change).

Gene  Metric              Low/no IPTG  High/no IPTG  High to low
leuC  Ruby (%vol)         1.14         0.88          0.77
leuC  Silver (%vol)       1.99         1.75          0.88
leuC  Blue (%vol)         0.97         0.58          0.60
leuC  [35S] (vol/stds)    0.76         0.35          0.46
leuC  Average of metrics                             0.68
leuC  CV of metrics                                  27.3%
uspA  Ruby (%vol)         1.11         1.13          1.02
uspA  Silver (%vol)       0.93         1.06          1.14
uspA  Average of metrics                             1.08
uspA  CV of metrics                                  7.9%

The quantitation of protein expression using 2DE can be accomplished by a number of different detection chemistries and metrics. In the current study we consider the possibility of using fluorescence stains, ammoniacal silver, colloidal blue, and radioisotope tagging of proteins.
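As a brief aside on the metric comparison, the "Average of metrics" and "CV of metrics" entries in Tables I and II can be reproduced from the per-metric high-to-low fold changes. The sketch below is a minimal check and does not reproduce the authors' own calculation. It assumes the CV is the sample standard deviation (n − 1 denominator) divided by the mean, an assumption made here because it matches the tabulated values. The dictionary keys and the `summarize` helper are illustrative names only.

```python
from statistics import mean, stdev

# High-to-low IPTG fold changes per metric, copied from Tables I and II.
fold_changes = {
    "leuC mRNA": [1.36, 1.14, 1.19],            # Average Difference, 2 maximum, Median
    "uspA mRNA": [2.11, 3.58, 1.82],
    "leuC protein": [0.77, 0.88, 0.60, 0.46],   # Ruby, silver, blue, [35S]
    "uspA protein": [1.02, 1.14],               # Ruby, silver
}


def summarize(values):
    """Return (mean, percent CV); CV uses the sample standard deviation (assumption)."""
    avg = mean(values)
    cv_percent = 100.0 * stdev(values) / avg
    return avg, cv_percent


summary = {name: summarize(vals) for name, vals in fold_changes.items()}

for name, (avg, cv) in summary.items():
    print(f"{name}: mean fold change = {avg:.2f}, CV = {cv:.1f}%")
```

Under these assumptions the computed means and CVs agree with the tabulated entries to within rounding, which suggests that the variation among metrics is summarized over the high-to-low ratios and not over the individual low/no and high/no columns.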
The use of Sypro Ruby is known to provide a reasonably linear and broad dynamic range for protein expression quantitation in 2DE gels (Lopez et al., 2000), and we have previously reported the effective use of ammoniacal silver stain as a method for quantitation (Dutt and Lee, 2001). Here again, as a representative example, we considered first the protein fold change for leuC using these four measures of protein expression. The results using each of these detection strategies are reported in Table II. Figure 1 depicts the relevant regions from 2DE gels including the identity of some of the protein spots near the leuC protein. We observed a 32% decrease in protein expression for leuC for the low versus high IPTG case. In this case, the CV for the different measures of leuC protein was 27%. Note that this decrease of 32% in protein expression for leuC is in contrast with the 23% increase in corresponding message expression. For the case of uspA (Fig. 2) we reported values from Sypro Ruby and silver stains, as these were the most reliable values obtained for this experiment. The expression level of this protein remained essentially unchanged for the low versus high IPTG experiment with a CV of 7.8%. Based on these initial results, we reported protein expression analysis data using Sypro Ruby stain, because this is a commonly used stain with excellent dynamic range and reproducibility (Choe and Lee, 2003).

Figure 1. Comparison of the regions of 2DE gels visualized using Sypro Ruby, ammoniacal silver, colloidal blue, and [35S]. The positions of several identified protein spots are noted, including the position of the protein corresponding to leuC.

Comparison of Protein Amplification Factor and mRNA Amplification Factor

Figures 3 and 4 present data for 77 genes from the IPTG and 52 genes from the hemolysin experiments.
Criteria for inclusion in the dataset require that the mRNA for a particular gene be expressed (versus absent) and that the 2DE spot has been identified. These data are termed “mRNA amplification factors,” as measured using the Average Difference metric, and “protein amplification factors,” as measured using the percent volume metric from Sypro Ruby-stained 2DE gels. The line of one-to-one correspondence between mRNA and protein expression is presented in the figures as a reference. Consistent with observations in other organisms, we observed no clear relationship between mRNA amplification and protein amplification factors for Escherichia coli. The r² correlation coefficients for the cases studied were 0.02 for the low/no IPTG case, 7 × 10⁻⁴ for the high/no IPTG case, 0.04 for the W3110/Hly parent case, and 3 × 10⁻⁴ for the Hly mutant/Hly parent case. These values reinforce the notion that, although the regulation of mRNA expression might have evolved to support cellular functions under perturbations, the translation of this transcriptional program into protein, and ultimately into cellular phenotype, is complex. This complexity calls for the development of mathematical frameworks for the analysis and study of gene and protein expression that will provide insights into key biological parameters that affect the mRNA–protein expression relationship for individual genes. Theoretical studies and mathematical analyses have suggested that the genome-wide average mRNA amplification factor fGm (or f m from Mehra et al., 2003) has a significant effect on the relation between mRNA and protein expression levels (Table III). Values >1 suggest that the total amount of message has increased, whereas values <1 suggest a decrease. From dimensionless analysis (Equation VIII.4 in Mehra et al.
[2003]), we expect that if this value is <1, then one can predict that, in general, but not for any particular gene, the ratio of protein to mRNA expression should be >1, assuming no change in the underlying kinetic and thermodynamic properties of the system; that is, the “centroid” of data points in a plot similar to Figures 3 and 4 should be above the one-to-one correspondence line. As the value of fGm approaches 1, we would expect that the centroid would approach the one-to-one line. Furthermore, we predict that, as fGm increases to >1, the centroid of the data will move further below the one-to-one correspondence line. We calculated the fGm based on all expressed genes in the genome using the MICROARRAY SUITE and GAPS software for each of the four cases studied and present these values in Figures 3 and 4. We observed, in a comparison of low to high IPTG (relative to the same reference state), a change of fGm from 1.15 to 1.29. For the hemolysin-secreting strains we observed a change in this value from 0.85 to 0.95. The centroid of the data points for each of the four cases was generated by calculating the fEm and the fEp. These terms refer to the average value of the mRNA and protein amplifications from the subset of genes for which we have experimental observations on both message and protein. The experimental data collected are consistent with the predictions from our model of translation (Mehra et al., 2003).

We also calculated the ratio of the lengths of the line segments connecting the centroid point to the one-to-one line and the ratios of the deviations from one of the fGm for the IPTG and the hemolysin experiments; for example: (fGm − 1)IPTG,lo / (fGm − 1)IPTG,hi. In the case of the IPTG experiments, the ratio of the lengths of the line segments for high versus low, which is based on fEm and fEp, was 1.98, and the ratio of the deviation from one of the fGm values was 1.93, which is in excellent agreement. Note that the difference between the values may represent the fact that the centroid value and the line segment calculation were based only on data collected from genes in which we were able to measure both protein and mRNA expression, whereas the fGm was calculated from mRNA expression for all expressed genes in the genome (with no regard to protein expression). In the case of the Hly experiment, we also observed surprisingly close agreement between the line segment length ratio of fEm and fEp (W3110 versus Hly mutant) of 2.98 and the ratio of the deviation from one of the fGm, which was 3.00. This result is surprising because the experimental data represent only a relatively small fraction of the total expressed genes, whereas the fGm was calculated from all of the expressed genes. Thus, the experimental observations, which are a small subset of total genes expressed, seem to be representative of the behavior of the entire system for the cases studied. There are reports in the literature about the drawbacks of using only the experimentally observed data because of bias introduced in the proteomics experiments (e.g., Gygi et al., 2000). However, these observations were made based on absolute measurements and did not provide observations on relative changes in gene expression. The use of relative changes in gene expression may result in the ability to better extrapolate system-wide behavior based on measurements of only part of the system.

Figure 2. Comparison of the regions of 2DE gels stained using Sypro Ruby and ammoniacal silver. The positions of several identified protein spots are noted, as is the position of uspA.

Figure 3. A plot of the relative mRNA and protein amplification factors for a change in IPTG concentration for Escherichia coli MG1655. The values for individual genes are plotted as the ratio of the condition of interest: (A) 0.1 mM IPTG or (B) 1.0 mM IPTG, relative to a reference state (0 mM IPTG). The average mRNA amplification factor calculated for all expressed genes is listed and denoted by fGm for “genome-wide” mRNA amplification factor. The centroid of the experimentally observed data points (average mRNA and protein amplification factors) denoted by the x and y coordinates, fEm and fEp, is also plotted and listed.

Figure 4. A plot of the relative mRNA and protein amplification factors for a change in genetic background for Escherichia coli W3110. The values for individual genes are plotted as the ratio of the condition of interest: (A) wild-type or (B) a hypersecretion mutant, relative to a reference state (a secreting parent strain). The average mRNA amplification factor calculated for all expressed genes is listed and denoted by fGm for “genome-wide” mRNA amplification factor. The centroid of the experimentally observed data points (average mRNA and protein amplification factors) denoted by the x and y coordinates, fEm and fEp, is also plotted and listed.

Table III. Definitions for amplification factors.

fGm  The average mRNA amplification factor for all expressed genes genome-wide. Analogous to f m from Mehra et al. (2003).
fEm  The average mRNA amplification factor for the experimentally observed subset of genes. Only includes genes for which mRNA and protein expression was measured.
fEp  The average protein amplification factor for the experimentally observed subset of genes. Only includes genes for which mRNA and protein expression was measured.

A closer inspection of Figures 3 and 4 also yields interesting observations about the mRNA–protein relationship. Consider the vector traversed by the centroid between Figure 3A and 3B.
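Before following that vector, the deviation ratios quoted above can be checked directly from the reported fGm values. The sketch below is illustrative only. It takes the ratio in the "high versus low" order, which is the order that reproduces the quoted 1.93, and it assumes that 0.85 belongs to the W3110 comparison and 0.95 to the Hly mutant comparison, since that assignment yields the quoted 3.00. The function names are hypothetical. The centroid coordinates (fEm, fEp) appear only in Figures 3 and 4, so the distance helper is shown without the paper's data.

```python
import math

# Genome-wide mRNA amplification factors (fGm) quoted in the text.
F_GM_IPTG_LOW = 1.15
F_GM_IPTG_HIGH = 1.29
F_GM_W3110 = 0.85        # assumed assignment (see lead-in)
F_GM_HLY_MUTANT = 0.95   # assumed assignment (see lead-in)


def deviation_ratio(f_a, f_b):
    """Ratio of the deviations from one: (f_a - 1) / (f_b - 1)."""
    return (f_a - 1.0) / (f_b - 1.0)


def distance_to_identity(f_em, f_ep):
    """Perpendicular distance from a centroid (fEm, fEp) to the one-to-one line.

    Treating the segment as perpendicular is an assumption; a vertical segment
    would differ only by a constant factor, so ratios of lengths are unchanged.
    """
    return abs(f_ep - f_em) / math.sqrt(2.0)


iptg_ratio = deviation_ratio(F_GM_IPTG_HIGH, F_GM_IPTG_LOW)
hly_ratio = deviation_ratio(F_GM_W3110, F_GM_HLY_MUTANT)

print(f"IPTG, high versus low: {iptg_ratio:.2f}")
print(f"Hly, W3110 versus Hly mutant: {hly_ratio:.2f}")
```

These ratios are the quantities compared in the text with the line-segment length ratios of 1.98 and 2.98 derived from the centroids.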
In the cases studied, the centroid migrated more along the horizontal direction than the vertical direction (0.28 units versus 0.09 units for the IPTG case). This suggests that, although the fEm may have been increasing in the cases studied, the corresponding change in fEp was not as great. If there was no change in the underlying kinetics and thermodynamics of the system upon perturbation with IPTG, then the observed change in centroid position would necessarily correspond to no significant change in the number of ribosomes (or other factors not considered here, such as mRNA or protein degradation rates). If the availability of total ribosomes was constant and if the amount of mRNA were to increase, then one would expect the corresponding protein expression levels to decrease relative to the change in mRNA because ribosome availability would become limiting. That is, the presence of more messages without a change in the number of ribosomes engaged in translation will result in proportionately less protein expressed. In Figure 3, protein levels do not increase proportionately based on the migration of the centroid. We therefore conclude that there may have been no significant increase in the number of ribosomes in the conditions tested in the present experiments. Taken further, if one were to observe a substantial change in fEp and a relatively small change in fEm , then one may conclude that the number and availability of ribosomes would increase significantly. Although we did not observe this experimentally, it could be expected that the only means to disproportionately increase the amount of protein expression relative to message expression is to increase the number of ribosomes engaged in translation. In conclusion, the approaches presented herein represent an initial attempt to quantify gene expression levels and integrate these data into a modeling framework.
Based on these observations, we found that experimental mRNA and protein expression data on a subset of the genes expressed in Escherichia coli were representative of the genome-wide response. Furthermore, the change in relative position of the "centroid" of experimental data points (the fEm and the fEp), supported by theoretical analysis, suggests a critical role for ribosomes in determining the relationship between mRNA and protein expression. We attempted to use metrics and experimental procedures that are generally available. However, improved methods for protein quantitation, such as those offered by isotopic dilution strategies, may offer further refinement of these data. A key outcome of this study is the observation that, although experimental measurements of mRNA and protein expression may not reveal any obvious correlation, such measurements can provide important insights into gene expression regulation, especially when taken in the context of a mathematical framework for translation.

The authors thank Dave Garfin at Bio-Rad Laboratories for access to a gel dryer, and Rodney Welch for the hemolysin plasmids. We also thank Andrew Brooks at the University of Rochester Medical Center for Affymetrix analysis.

References

Anderson L, Seilhamer J. 1997. A comparison of selected mRNA and protein abundances in human liver. Electrophoresis 18:533–537.
Baliga NS, Pan M, Goo YA, Yi EC, Goodlett DR, Dimitrov K, Shannon P, Aebersold R, Ng WV, Hood L. 2002. Coordinate regulation of energy transduction modules in Halobacterium sp analyzed by a global systems approach. Proc Natl Acad Sci USA 99:14913–14918.
Betts JC, Lukey PT, Robb LC, McAdam RA, Duncan K. 2002. Evaluation of a nutrient starvation model of Mycobacterium tuberculosis persistence by gene and protein expression profiling. Mol Microbiol 43:717–731.
Choe LH, Chen W, Lee KH. 1999. Proteome analysis of factor for inversion stimulation (Fis) overproduction in Escherichia coli.
Electrophoresis 20:798–805.
Choe LH, Lee KH. 2003. Quantitative and qualitative measure of intralaboratory two-dimensional gel reproducibility and the effects of sample preparation, sample load, and image analysis. Electrophoresis 24:3500–3507.
Dutt MJ, Lee KH. 2001. The scaled volume as an image analysis variable for detecting changes in protein expression levels by silver stain. Electrophoresis 22:1627–1632.
Felmlee T, Pellett S, Welch R. 1985. Nucleotide sequence of an Escherichia coli chromosomal hemolysin. J Bacteriol 163:94–105.
Griffin TJ, Gygi SP, Ideker T, Rist B, Eng J, Hood L, Aebersold R. 2002. Complementary profiling of gene expression at the transcriptome and proteome levels in Saccharomyces cerevisiae. Mol Cell Proteomics 1:323–333.
Gygi SP, Rochon Y, Franza BR, Aebersold R. 1999. Correlation between protein and mRNA abundance in yeast. Mol Cell Biol 19:1720–1730.
Gygi SP, Corthals GL, Zhang Y, Rochon Y, Aebersold R. 2000. Evaluation of two-dimensional gel electrophoresis-based proteome analysis technology. Proc Natl Acad Sci USA 97:9390–9395.
BIOTECHNOLOGY AND BIOENGINEERING, VOL. 84, NO. 7, DECEMBER 30, 2003
Hatzimanikatis V, Choe LH, Lee KH. 1999. Proteomics: Theoretical and experimental considerations. Biotechnol Progr 15:312–318.
Hatzimanikatis V, Lee KH. 1999. Dynamical analysis of gene networks requires both mRNA and protein expression information. Metab Eng 1:275–281.
Ideker T, Thorsson V, Ranish JA, Christmas R, Buhler J, Eng JK, Bumgarner R, Goodlett DR, Aebersold R, Hood L. 2001. Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science 292:929–934.
Lee P, Lee KH. 2003. Escherichia coli--a model system that benefits from and contributes to the evolution of proteomics. Biotechnol Bioeng 84:801–814.
Lopez MF, Berggren K, Chernokalskaya E, Lazarev A, Robinson M, Patton WF. 2000.
A comparison of silver stain and Sypro Ruby protein gel stain with respect to protein detection in two-dimensional gels and identification by peptide mass profiling. Electrophoresis 21:3673–3683.
Mehra A, Lee KH, Hatzimanikatis VH. 2003. Insights into the relation between mRNA and protein expression patterns. I. Theoretical considerations. Biotechnol Bioeng 84:822–833.
Rosenow C, Saxena RM, Durst M, Gingeras TR. 2001. Prokaryotic RNA preparation methods useful for high density array analysis: Comparison of two approaches. Nucl Acids Res 29:1–8.
Selinger DW, Cheung KJ, Mei R, Johansson EM, Richmond CS, Blattner FR, Lockhart DJ, Church GM. 2000. RNA expression analysis using a 30 base pair resolution Escherichia coli genome array. Nat Biotechnol 18:1262–1268.
Yoshida K, Kobayashi K, Miwa Y, Kang CM, Matsunaga M, Yamaguchi H, Tojo S, Yamamoto M, Nishi R, Ogasawara N, Nakayama T, Fujita Y. 2001. Combined transcriptome and proteome analysis as a powerful approach to study genes under glucose repression in Bacillus subtilis. Nucl Acids Res 29:683–692.