What are the most abundant proteins in a cell? Even after reading several textbooks on proteins, one may still be left wondering which of these critical molecular players in the life of a cell are the most abundant. Though figuring this out by pure thought alone is generally not easy, cells in the leaves of plants are that rare case in which it is relatively easy to make an estimate. The carbon-fixing enzyme Rubisco, the molecular gatekeeper between the inorganic and the organic worlds, is required at extremely high concentrations. Let’s see why. As schematically depicted in Figure 1, the photon flux under full illumination is about 2000 μEinstein/(m²·s), or roughly 10⁹ photons per μm² per second. At most about 10-30% of this flux is utilized; beyond that the photosynthetic apparatus saturates. Roughly 10 photons supply enough energy to fix one carbon atom. Rubisco works at a sluggish maximal rate of ≈1-3 carbon fixations per second per catalytic site. From this alone, we can see that the leaf needs ≈0.3-3×10⁷ Rubisco molecules per μm² of cross section. A Rubisco monomer has a mass of 60 kDa (BNID 105007), so the Rubisco mass per μm² of cross section is ≈0.3-3×10⁻¹² g. Let’s now estimate the total protein content of a leaf. A characteristic leaf has a thickness of about 200 μm, so each μm² of cross section sits above about 200 μm³ of tissue. About 80% of this volume is vacuoles (BNID 103442). Of the remaining ≈40 μm³ (≈40×10⁻¹² g at a density close to that of water), dry mass is ≈30% and proteins constitute about half of the dry mass, so we arrive at about 6×10⁻¹² g of protein per μm² of cross section, as derived in Figure 1. We conclude that about 5-50% of the protein mass is Rubisco. Indeed, experimental determinations in C3 plants such as wheat, potato and tobacco find that Rubisco constitutes 25-60% of all soluble proteins in such cells (BNID 101762).

The protein census for other organisms, even model microorganisms, is more complicated. In the late 1970s, a unique catalog of the quantities of 140 proteins under different growth rates in E. 
coli was created using 2D gel electrophoresis and ¹⁴C labeling (Pedersen et al., Cell 1978, BNID 106195). Newer methods have recently enabled extensive proteome-wide surveys of protein content using mass spectrometry (BNID xxx), TAP tagging (Ghaemmaghami 2003, BNID 101845) and fluorescence light microscopy (Taniguchi et al., 2010, BNID xxx). A new database (http://paxdb.org/) has been created to collect such data on protein abundances across organisms.

The picture emerging from these kinds of experiments shows several prominent players. First, not surprisingly, ribosomal proteins and their ancillary components are highly abundant. The elongation factor EF-Tu, responsible for mediating the entrance of the tRNA to the free site of the ribosome, was characterized as the most abundant protein in the original 1978 catalog, with a copy number of ~58,000 per bacterial genome. This absolute molecular count can be repackaged in concentration units and is roughly equivalent to 100 μM (BNID 104733). Recall that under different growth conditions the cell size, and thus the total protein content, can change severalfold (see, for example, the vignette on yeast size); this dependence of the protein census on the growth medium is especially important for ribosomal proteins.

Another contender for the title of most abundant protein is ACP, the acyl carrier protein, which plays an important role in fatty acid biosynthesis. This protein carries fatty acid chains as the chains are elongated. It is claimed to be the most abundant protein in E. coli, with about 60,000 molecules per cell (BNID 106194). In a recent high-throughput mass spectrometry measurement on minimal medium (Lu, 2007, BNID 104246), a value of ≈76,000 was reported, making it the third most abundant protein in that study. Table 1 gives a rank ordering of some of the most abundant proteins found in E. 
coli, though it should be noted that there are inconsistencies between the different experimental approaches that have not yet been fully settled. The most abundant proteins found in this particular survey of E. coli are RplL, a ribosomal protein (estimated at ≈109,000 copies per cell, and reported (Subramanian, 1975) to be present in 4 copies per ribosome, in contrast to other ribosomal proteins, which have one copy per ribosome), and TufB (the elongation factor also known as EF-Tu, estimated at ≈87,000 copies per cell). The next most abundant reported proteins are GroS (MopB, 65,000), a component of the GroEL-GroES chaperone system necessary for proper folding of many proteins, and GapA (49,000), a key enzyme in glycolysis.

Structural proteins can also be highly abundant. FimA is the major subunit of the 100-300 fimbriae (pili) of E. coli (BNID 101473). Every pilus has about 1000 copies (BNID 100107), and thus a simple estimate leads us to expect hundreds of thousands of copies of this repeating monomer on the outside of the cell.

As noted above, protein content varies with growth conditions and gene induction. For example, LacZ, the enzyme (β-galactosidase) responsible for breaking lactose into glucose and galactose, is usually repressed and present in only a small number of copies (10 to 20, BNID 106200), but under full induction it was characterized to have a concentration of 50 μM (BNID 100735), i.e. about 100,000 copies per cell. In summary, though results from different measurement methods can vary significantly even under similar conditions, the overall picture of the most abundant proteins in E. coli is generally consistent.

As usual, it is interesting to contrast what has been discovered in bacteria with similar experiments in eukaryotic microorganisms. In yeast, an overall estimate of ≈50,000,000 proteins per cell was reported (BNID 106198). Measurements based on a TAP tag (Ghaemmaghami 2003, BNID 101845) report that out of this huge store of proteins, only three are found with over a million copies per cell. 
These are a cell wall protein (YKL096W-a); the plasma membrane H+-ATPase (YGL008C), which pumps protons out of the cell; and fructose 1,6-bisphosphate aldolase (YKL060C), essential for glycolysis and gluconeogenesis. Different reports on the abundance of proteins in glycolysis, an intensely studied model system, led to an overall estimate of ≈25% of total protein content (BNID 101928). As with E. coli, new high-throughput mass spectrometry data are becoming available for yeast (BNID 104245, 104188). Table 2 shows the 10 most abundant yeast proteins in rich as well as minimal media. In rich media, the proteins with highest abundance are mostly glycolytic. In minimal media, several of the most abundant proteins are still of unclear function, which further highlights how limited our knowledge of these most elementary questions remains.

Why are people going to all the trouble of carrying out these increasingly refined censuses of some of the most favored model organisms? Many of the biochemical and regulatory pathways that make up the life of a cell have been or are now being mapped in exquisite detail, and many of the nodes have essential roles. But a wiring diagram does not a cell make. To really understand the relative rates of the various components of these pathways, we need to know the abundances of the various proteins and their substrates. Further, if one is interested in assessing the biosynthetic burden of these various molecular players, the actual abundance is critical. Similarly, the many binding reactions that are the basis for much of the busy biochemical activity of cells, whether specific binding of intended partners or spurious nonspecific binding between unnatural partners, are ultimately dictated by molecular counts. Finally, there is a growing appreciation of the constraints that are inflicted on the cell as a result of noise in copy numbers. 
For understanding and predicting such effects it is vital to know if one is dealing with tens of thousands of copies per cell or only tens of copies per cell, as turns out to often be the case in unicellular organisms. In these small-numbers limits, fluctuations are a fact of life and both we and the cell must account for them.

Figure 1: Estimate of the fraction of Rubisco proteins of total protein content in a leaf cell.

Table 1-2: Most abundant proteins in prokaryotes and eukaryotes. Several methods using mass spec (APEX, Lu et al., 2007, PMID 17187058), using a yellow fluorescent protein fusion library (Taniguchi et al., 2010, PMID 20671182), creation of a yeast fusion library where each open reading frame is tagged with a high-affinity epitope and expressed from its natural chromosomal location (Ghaemmaghami et al., 2003, PMID 14562106) and mass spectrometry data of mouse fibroblast cells (Schwanhäusser et al., 2011, PMID 21593866). Gene annotation: yeast, SGD; E. coli, Ecoliwiki; mouse, Uniprot. Color code: yellow, translation; cyan, glycolysis; green, chaperones. The sum is based on adding together all the absolute values reported in each study. Each entry lists the protein, its percentage of the sum, and its annotation.

Table 1 (prokaryotes):

| Protein rank | E. coli, minimal media, Nat Biotechnol, Lu 2007 (total of 2-3×10⁶ proteins/cell, sum of proteins in reference is 2,500,000) | E. coli, M9 minimal media, Science, Taniguchi 2010 (sum of proteins in reference is 95,000) | B. subtilis, minimal medium during exponential growth, Analytical Chemistry, Maass 2011 (sum of proteins in reference is 2,300,000) | S. aureus, synthetic medium during exponential growth, Anal Chem, Maass 2011 (sum of proteins in reference is 350,000) | Leptospira interrogans, EMJH (Ellinghausen-McCullough-Johnson-Harris) medium, Malmström 2009 (sum of proteins in reference is 820,000) |
|---|---|---|---|---|---|
| 1 | RplL, 4.4%, 50S ribosomal subunit (**) | CspC, 8.3%, stress protein (**) | TufA, 4.3%, Elongation factor Tu (********) | Asp23, 7.1%, Alkaline shock protein 23 (**) | LipL32, 4.6%, external encapsulating structure |
| 2 | TufB, 3.5%, EF-Tu, Elongation Factor-Translation (********) | TufA, 3.6%, protein chain elongation factor EF-Tu (********) | CspD, 4.0%, Cold shock protein CspD (**) | SodA, 6.9%, Superoxide dismutase [Mn/Fe] 1 (***) | Peptidoglycan associated cytoplasmic membrane, 3.7%, external encapsulating structure |
| 3 | AcpP, 3.0%, acyl carrier protein (ACP) | RpsV, 3.3%, 30S ribosomal subunit | IlvC, 3.3%, Ketol-acid reductoisomerase (**) | CspA, 4.3%, Cold shock protein (**) | 60 kDa chaperonin (Protein Cpn60) (groEL protein) (Heat shock 58 kDa protein), 2.2%, nucleotide binding |
| 4 | GroS, 2.6%, 10 kDa chaperonin (****) | CspE, 3.2%, DNA-binding transcriptional repressor | AhpC, 3.0%, Alkyl hydroperoxide reductase subunit C (***) | Tuf, 3.7%, Elongation factor Tu (********) | Elongation factor Tu (EF-Tu), 1.7%, hydrolase activity (********) |
| 5 | GapA, 2.0%, glyceraldehyde 3-phosphate dehydrogenase-A (****) | DnaK, 2.5%, chaperone Hsp70 | YfmK, 2.5%, Uncharacterized N-acetyltransferase (**) | RplL, 2.9%, 50S ribosomal protein L7/L12 (**) | LipL36, 1.7%, external encapsulating structure |
| 6 | MetE, 1.6%, Methionine synthase (**) | GapA, 2.5%, glyceraldehyde-3-phosphate dehydrogenase A (****) | YheA, 2.0%, UPF0342 protein (**) | GapA1, 2.8%, Glyceraldehyde-3-phosphate dehydrogenase 1 (****) | Flagellin protein, 1.7%, flagellum |
| 7 | CspC, 1.6%, stress protein (**) | TufB, 2.3%, protein chain elongation factor EF-Tu (********) | Icd, 1.8%, Isocitrate dehydrogenase [NADP] participates in mapk signaling pathway (**) | Eno, 2.1%, Enolase (***) | transcriptional regulator (ArsR family), 1.5%, transcription factor & regulators |
| 8 | RplW, 1.5%, 50S ribosomal subunit | Rho, 2.3%, transcription termination factor | GroS, 1.8%, 10 kDa chaperonin (****) | (no name, locus SACOL2595), 1.8%, Putative uncharacterized protein | Electron transfer flavoprotein alpha-subunit, 1.5%, nucleotide binding (***) |
| 9 | RpsP, 1.2%, 30S ribosomal subunit | GroS, 2.2%, 10 kDa chaperonin (****) | SodA, 1.6%, Superoxide dismutase [Mn] (***) | (no name, locus SACOL0427), 1.7%, Putative uncharacterized protein | LipL41, 1.3%, external encapsulating structure |
| 10 | Mdh, 1.2%, Component of malate dehydrogenase | GlyA, 1.7%, serine hydroxymethyltransferase | TrxA, 1.5%, Thioredoxin (**) | AhpC, 1.4%, Alkyl hydroperoxide reductase subunit C (***) | LipL21, 1.1%, external encapsulating structure |

Table 2 (eukaryotes):

| Protein rank | S. cerevisiae, rich media, Nat Biotechnol, Lu 2007 (total of 5×10⁷ proteins/cell according to primary source, sum of proteins in reference is also 50,000,000) | S. cerevisiae, minimal media, Nat Biotechnol, Lu 2007 (total of 5×10⁷ proteins/cell according to primary source, sum of proteins in reference is also 50,000,000) | S. cerevisiae, rich media, Nature, Ghaemmaghami 2003 (sum of proteins in reference is 47,000,000) | M. musculus (NIH3T3 cells), light (L) SILAC medium, Nature, Schwanhäusser et al., 2011 (sum of proteins in reference is 570,000,000) |
|---|---|---|---|---|
| 1 | ENO2, 6.2%, Enolase II | ABM1, 4.6%, unknown function, required for normal microtubule organization | CWP2, 3.4%, Cell Wall Protein | ACTB, 2.8%, Actin, cytoplasmic 1 |
| 2 | FBA1, 4.0%, Fructose 1,6-bisphosphate aldolase (***) | YMR181C, 4.2%, unknown function | PMA1, 2.8%, Plasma Membrane ATPase | HIST1H4A, 2.6%, Histone H4 |
| 3 | TDH3, 4.0%, Glyceraldehyde-3-phosphate dehydrogenase | YLR407W, 4.2%, unknown function | FBA1, 2.1%, Fructose 1,6-bisphosphate aldolase (***) | HIST1H2AF, 2.6%, Histone H2A type 1-F |
| 4 | PGK1, 3.8%, 3-phosphoglycerate kinase | ORT1, 3.0%, Ornithine transporter of the mitochondrial inner membrane | ILV5, 1.9%, IsoLeucine-plus-Valine requiring | HIST2H2BB, 1.9%, Histone H2B type 2-B |
| 5 | ENO1, 3.6%, Enolase I (***) | YMR115W (SGD name: MGR3), 2.6%, Subunit of the mitochondrial (mt) i-AAA protease supercomplex | YEF3, 1.9%, Yeast Elongation Factor (translation) | HIST1H3B, 1.5%, Histone H3.2 |
| 6 | PDC1, 2.6%, Major of three pyruvate decarboxylase isozymes | YIL077C, 2.2%, unknown function | HHF2, 1.4%, Histone H Four | EEF1A1, 0.93%, Elongation factor 1-alpha 1 (translation) |
| 7 | ADH1, 2.6%, Alcohol dehydrogenase | YDR193W, 2.0%, Dubious open reading frame | RPP2B, 1.4%, Ribosomal Protein P2 Beta | RPS27A, 0.9%, Ubiquitin-40S ribosomal protein S27a |
| 8 | TEF2, 2.4%, Translational elongation factor EF-1 alpha | DOA1, 1.8%, WD repeat protein | HHF1, 1.1%, Histone H Four | S100A4, 0.75%, Protein S100-A4 (similar to Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) isoform 1) |
| 9 | TDH2, 1.9%, Glyceraldehyde-3-phosphate dehydrogenase | CCZ1, 1.6%, Protein involved in vacuolar assembly | SOD1, 1.1%, SuperOxide Dismutase | TUBB5, 0.75%, Tubulin beta-5 chain |
| 10 | CDC19, 1.8%, Pyruvate kinase | RPS26A, 1.5%, small (40S) ribosomal subunit | RPS26B, 1.1%, Ribosomal Protein of the Small subunit | ANXA2, 0.67%, Annexin A2 |
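
The two back-of-the-envelope calculations in this vignette, the Rubisco fraction summarized in Figure 1 and the conversion of an EF-Tu copy number into a concentration, can be checked numerically. The Python sketch below is only an illustration under stated assumptions: the function names are hypothetical, the tissue density is taken to be that of water, and the E. coli cell volume of 1 μm³ is a rule-of-thumb value that is not given in the text. Because the inputs are not rounded at each step, the results should match the ranges quoted in the text only to within order-of-magnitude rounding.

```python
AVOGADRO = 6.022e23          # molecules (or photons) per mole
GRAMS_PER_DALTON = 1.66e-24  # mass of one dalton in grams


def rubisco_estimate(used_fraction, turnover_per_s):
    """Rubisco estimate per square micron of leaf cross section.

    used_fraction: fraction of the photon flux actually utilized (0.1-0.3 in the text)
    turnover_per_s: Rubisco carbon fixations per second per site (1-3 in the text)
    Returns (rubisco molecules, rubisco mass in g, protein mass in g, mass fraction).
    """
    photon_flux_uE = 2000.0          # microEinstein per m^2 per s (from the text)
    photons_per_carbon = 10.0        # photons needed to fix one carbon (from the text)
    rubisco_mass_da = 60e3           # 60 kDa monomer (from the text)
    leaf_thickness_um = 200.0        # from the text
    non_vacuole_fraction = 0.2       # 80% of the volume is vacuoles (from the text)
    dry_mass_fraction = 0.3          # from the text
    protein_fraction_of_dry = 0.5    # from the text
    density_g_per_um3 = 1e-12        # ASSUMPTION: water-like density, 1 g/mL

    # Convert micromoles of photons per m^2 to photons per um^2 (1 m^2 = 1e12 um^2).
    photons_per_um2_s = photon_flux_uE * 1e-6 * AVOGADRO / 1e12
    carbons_per_um2_s = photons_per_um2_s * used_fraction / photons_per_carbon
    rubisco_per_um2 = carbons_per_um2_s / turnover_per_s
    rubisco_g = rubisco_per_um2 * rubisco_mass_da * GRAMS_PER_DALTON

    # Protein mass in the column of tissue sitting under one um^2 of leaf surface.
    protein_g = (leaf_thickness_um * non_vacuole_fraction * density_g_per_um3
                 * dry_mass_fraction * protein_fraction_of_dry)
    return rubisco_per_um2, rubisco_g, protein_g, rubisco_g / protein_g


def copies_to_micromolar(copies, cell_volume_um3=1.0):
    """Convert copies per cell to micromolar.

    ASSUMPTION: default cell volume of 1 um^3 (1e-15 L), not stated in the text.
    """
    liters = cell_volume_um3 * 1e-15
    return copies / AVOGADRO / liters * 1e6


if __name__ == "__main__":
    # Low end: little light used, fast enzyme. High end: more light used, slow enzyme.
    for label, used, rate in [("low", 0.1, 3.0), ("high", 0.3, 1.0)]:
        count, r_mass, p_mass, frac = rubisco_estimate(used, rate)
        print(f"{label}: {count:.1e} Rubisco/um^2, {r_mass:.1e} g Rubisco, "
              f"{p_mass:.1e} g protein, fraction {frac:.0%}")
    print(f"EF-Tu at 58,000 copies: {copies_to_micromolar(58000):.0f} uM")
```

Changing `cell_volume_um3` shows how strongly the concentration for a given copy number depends on cell size, which is one reason copy numbers measured under different growth conditions are hard to compare directly.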