What are the most abundant proteins in a cell?

Even after reading several textbooks on proteins, one may still be left wondering which of these
critical molecular players in the life of a cell are the most abundant. Though
figuring this out by pure thought alone is generally not easy, cells in the leaves of plants are that
rare case in which it is relatively easy to make an estimate. The carbon-fixing enzyme Rubisco,
the molecular gatekeeper between the inorganic and the organic worlds, is required at extremely
high concentrations. Let's see why. As schematically depicted in Figure 1, the photon flux under
full illumination is about 2000 microEinstein/(m²·s). At most about 10-30% of this flux is
utilized; beyond that the photosynthetic apparatus saturates. About every 10
photons supply enough energy to fix one carbon atom. Rubisco works at a sluggish maximal rate
of ≈1-3 per second per catalytic site. From this alone, we can see that the cell needs ≈0.3-3×10⁷
Rubisco molecules per µm² of cross section. A Rubisco monomer has a mass of 60 kDa (BNID
105007), so the Rubisco mass per µm² is ≈0.3-3×10⁻¹² g. Let's estimate the total protein content
in the leaf. A characteristic leaf has a height of about 200 µm. About 80% of the volume is vacuoles
(BNID 103442), the dry mass is ≈30% of the remaining volume, and proteins constitute about half of the dry mass,
so we arrive at about 6×10⁻¹² g of protein per µm² of cross section, as derived in Figure 1. We conclude that about
5-50% of the protein mass is Rubisco. Indeed, experimental determinations in C3 plants such
as wheat, potato and tobacco find that Rubisco constitutes 25-60% of all soluble
proteins in such cells (BNID 101762).
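The Rubisco estimate can be retraced numerically. The sketch below is one way to do it, not the calculation behind Figure 1 itself: the function name is made up for illustration, and the density of 1 g/cm³ used to turn leaf volume into mass is an assumption that the text does not state. With the extreme values of the two quoted ranges, the Rubisco share of protein mass comes out at several percent to several tens of percent, in line with the rounded range given above.

```python
AVOGADRO = 6.022e23  # molecules (or photons) per mole


def rubisco_estimate(utilized_fraction, rate_per_site_per_s):
    """Estimate, per µm² of leaf cross section, the Rubisco count, the Rubisco
    mass (g), the total protein mass (g) and the Rubisco share of protein."""
    # 2000 microEinstein/(m²·s) converted to photons per µm² per second
    photons = 2000e-6 * AVOGADRO / 1e12
    # about 10 photons are needed per fixed carbon atom
    carbons_fixed = photons * utilized_fraction / 10
    # each catalytic site fixes rate_per_site_per_s carbons per second
    rubisco_count = carbons_fixed / rate_per_site_per_s
    # 60 kDa monomer = 60e3 g/mol
    rubisco_mass_g = rubisco_count * 60e3 / AVOGADRO

    # A 200 µm tall column over 1 µm² is 200 µm³; 1 µm³ = 1e-12 cm³
    column_volume_cm3 = 200 * 1e-12
    density_g_per_cm3 = 1.0  # ASSUMPTION: water-like density, not from the text
    # 20% of the volume is not vacuole, 30% of that is dry mass, half is protein
    protein_mass_g = column_volume_cm3 * density_g_per_cm3 * 0.2 * 0.3 * 0.5
    return rubisco_count, rubisco_mass_g, protein_mass_g, rubisco_mass_g / protein_mass_g


# Extremes of the quoted ranges: 10-30% of the flux used, 1-3 reactions per second
low = rubisco_estimate(utilized_fraction=0.10, rate_per_site_per_s=3.0)
high = rubisco_estimate(utilized_fraction=0.30, rate_per_site_per_s=1.0)

for label, (count, mass, protein, share) in (("low", low), ("high", high)):
    print(f"{label}: {count:.1e} Rubisco/µm², {mass:.1e} g Rubisco, "
          f"{protein:.1e} g protein, share {share:.0%}")
```

Changing any single input (leaf height, vacuole fraction, photons per carbon) shifts the answer proportionally, which is why the estimate is best read as an order-of-magnitude statement.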
The protein census for other organisms, even model microorganisms, is more complicated. In the
late 1970s, a unique catalog of the quantities of 140 proteins under different growth rates in E.
coli was created using 2D gel electrophoresis and 14C labeling (Pedersen et al., Cell 1978, BNID
106195). Newer methods have recently enabled extensive proteome-wide surveys of protein
content using mass spectrometry (BNID xxx), TAP labeling (Ghaemmaghami 2003, BNID 101845)
and fluorescence light microscopy (Taniguchi et al., 2010, BNID xxx). A new database
(http://paxdb.org/) has been created to collect such data on protein abundances across organisms. The
picture emerging from these kinds of experiments shows several prominent players. First, not
surprisingly, ribosomal proteins and their ancillary components are highly abundant. The
elongation factor EF-Tu, responsible for mediating the entrance of the tRNA to the free site of the
ribosome, was characterized as the most abundant protein in the original 1978 catalog, with a
copy number of ~58,000 proteins per bacterial genome. This absolute molecular count can be
repackaged in concentration units and is roughly equivalent to 100 μM (BNID 104733). Recall
that under different growth conditions the cell size, and thus the total protein content, can change
several fold (see, for example, the vignette on yeast size), and this medium dependence of the
protein census is especially important for ribosomal proteins.
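The repackaging of a copy number as a concentration is a one-line calculation. The sketch below assumes a cell volume of 1 fL (1 µm³), a commonly used round figure for E. coli that is not stated in the text; the function names are illustrative only. With that assumed volume, ~58,000 copies comes out just under 100 μM, matching the value quoted above.

```python
AVOGADRO = 6.022e23  # molecules per mole


def copies_to_micromolar(copies, cell_volume_fl=1.0):
    """Convert a per-cell copy number to a concentration in µM.

    cell_volume_fl is the cell volume in femtoliters (1 fL = 1e-15 L).
    The default of 1 fL is an ASSUMPTION, not a value taken from the text.
    """
    liters = cell_volume_fl * 1e-15
    molar = copies / (AVOGADRO * liters)
    return molar * 1e6


def micromolar_to_copies(micromolar, cell_volume_fl=1.0):
    """Inverse conversion: concentration in µM to copies per cell."""
    liters = cell_volume_fl * 1e-15
    return micromolar * 1e-6 * AVOGADRO * liters


ef_tu_um = copies_to_micromolar(58_000)
print(f"EF-Tu: {ef_tu_um:.0f} µM in an assumed 1 fL cell")
```

Because cell volume changes several fold with growth conditions, the same copy number corresponds to a different concentration in a larger cell; passing a different `cell_volume_fl` makes that dependence explicit.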
Another contender for the title of most abundant protein is ACP, the acyl carrier protein, which
plays an important role in fatty acid biosynthesis. This protein carries fatty acid chains as the
chains are elongated. It is claimed to be the most abundant protein in E. coli, with about 60,000
molecules per cell (BNID 106194). In a recent high-throughput mass spectrometry measurement
on minimal medium (Lu, 2007, BNID 104246), a value of ≈76,000 was reported, making it the
third most abundant protein in that survey. Table 1 gives a rank ordering of some of the most
abundant proteins found in E. coli, though it should be noted that there are inconsistencies
between the different experimental approaches that have not yet been fully settled. The two most
abundant proteins found in this particular survey of E. coli are RplL, a ribosomal protein (estimated
at ≈109,000 copies per cell, and reported (Subramanian, 1975) to be present in 4 copies per ribosome, in
contrast to other ribosomal proteins, which have one copy per ribosome), and TufB (the
elongation factor also known as EF-Tu, estimated at ≈87,000 copies per cell). The next most
abundant reported proteins are GroS (MopB, 65,000), a component of the chaperone system
GroEL-GroES necessary for proper folding of many proteins, and GapA (49,000), a key enzyme in
glycolysis.
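The copy numbers quoted in this paragraph can be turned into fractions of the total protein count. The sketch below divides each by 2,500,000, the sum of proteins given for the Lu 2007 data set in the Table 1 column header. The variable and function names are illustrative, and the percentages are rounded to one decimal place for comparison with the table.

```python
# Copies per cell quoted in the text for the Lu 2007 minimal-medium survey
COPIES_PER_CELL = {
    "RplL": 109_000,
    "TufB": 87_000,
    "AcpP": 76_000,
    "GroS": 65_000,
    "GapA": 49_000,
}

# Sum of all proteins reported in that study, as given in the table header
TOTAL_PROTEINS = 2_500_000


def percent_of_total(copies, total=TOTAL_PROTEINS):
    """Share of the summed protein copies, in percent, rounded to 0.1%."""
    return round(100.0 * copies / total, 1)


# Rank the proteins from most to least abundant
ranking = sorted(COPIES_PER_CELL.items(), key=lambda item: item[1], reverse=True)
shares = {name: percent_of_total(copies) for name, copies in ranking}

for rank, (name, copies) in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {copies:,} copies, {shares[name]}% of total")
```

The resulting shares reproduce the percentages listed for these five proteins in the E. coli column of the table, which is a useful check that the copy numbers in the text and the table entries refer to the same data set.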
Structural proteins can also be highly abundant. FimA is the major subunit of the 100-300 fimbriae
(pili) of E. coli (BNID 101473). Every pilus has about 1000 copies (BNID 100107), and thus a
simple estimate leads us to expect hundreds of thousands of copies of this repeating monomer on the
outside of the cell.
As noted above, protein content varies with growth conditions and gene induction. For
example, LacZ, the enzyme responsible for breaking lactose into glucose and galactose, is usually
repressed and the protein has only a small number of copies (10 to 20, BNID 106200), but under
full induction it was characterized to have a concentration of 50 µM (BNID 100735), i.e. about
100,000 copies per cell.
In summary, though results from different measurement methods can vary
significantly even under similar conditions, the overall picture of the most abundant proteins in
E. coli is generally consistent.
As usual, it is interesting to contrast what has been discovered in bacteria with similar
experiments in eukaryotic microorganisms. In yeast, an overall estimate of ≈50,000,000
proteins per cell was reported (BNID 106198). Measurements based on a TAP tag (Ghaemmaghami
2003, BNID 101845) report that out of this huge store of proteins, only three are found with
over a million copies per cell. These are a cell wall protein (YKL096W-a), the plasma membrane
H+-ATPase (YGL008C), which pumps protons out of the cell, and fructose 1,6-bisphosphate
aldolase (YKL060C), essential for glycolysis and gluconeogenesis. Different reports on the
abundance of proteins in glycolysis, an intensely studied model system, led to an overall estimate
that they make up ≈25% of total protein content (BNID 101928). As with E. coli, new
high-throughput MS data are becoming available for yeast (BNID 104245, 104188). Table 1 shows the top 10
most abundant yeast proteins in rich as well as minimal media. In rich media, the proteins with
highest abundance are mostly glycolytic. In minimal media, the most abundant proteins are still of
unclear function, which further highlights how limited our knowledge of these most elementary
questions remains.
Why are people going to all the trouble of carrying out these increasingly refined censuses of
some of the most favored model organisms? Many of the biochemical and regulatory pathways
that make up the life of a cell have been or are now being mapped with exquisite detail and many
of the nodes have essential roles. But a wiring diagram does not a cell make. To really
understand the relative rates of the various components of these pathways, we need to know
about the abundances of the various proteins and their substrates. Further, if one is interested in
assessing the biosynthetic burden of these various molecular players, the actual abundance is
critical. Similarly, the many binding reactions that are the basis for much of the busy
biochemical activity of cells, whether specific binding of intended partners or spurious
nonspecific binding between unnatural partners, are ultimately dictated by molecular counts.
Finally, there is a growing appreciation of the constraints that are inflicted on the cell as a result
of noise in copy numbers. For understanding and predicting such effects it is vital to know if one
is dealing with tens of thousands of copies per cell or only tens of copies per cell, as turns out to
often be the case in unicellular organisms. In these small-numbers limits, fluctuations are a fact
of life and both we and the cell must account for them.
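To get a feel for why the difference between tens and tens of thousands of copies matters, consider the simplest possible noise model. The sketch below assumes that copy numbers follow Poisson statistics, so the relative fluctuation (standard deviation divided by mean) is 1/√N. This model is an illustrative assumption rather than a claim made in the text, real cells often show more noise than this lower bound, and the function name is hypothetical.

```python
import math


def poisson_relative_fluctuation(mean_copies):
    """Relative fluctuation (standard deviation / mean) of a copy number,
    ASSUMING Poisson statistics: sqrt(N) / N = 1 / sqrt(N)."""
    return 1.0 / math.sqrt(mean_copies)


# Compare a low-copy protein with a highly abundant one
for mean_copies in (10, 100, 10_000):
    cv = poisson_relative_fluctuation(mean_copies)
    print(f"{mean_copies:>6} copies per cell: relative fluctuation about {cv:.1%}")
```

Under this assumption a protein present at ten copies fluctuates by roughly a third of its mean from cell to cell, while one present at ten thousand copies fluctuates by about one percent.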
Figure 1: Estimate of the fraction of Rubisco proteins of total protein content in a leaf cell.
Table 1-2: Most abundant proteins in prokaryotes and eukaryotes. The data come from several methods: mass
spectrometry (APEX, Lu et al., 2007, PMID 17187058); a yellow fluorescent protein fusion library
(Taniguchi et al., 2010, PMID 20671182); a yeast fusion library in which each open
reading frame is tagged with a high-affinity epitope and expressed from its natural chromosomal
location (Ghaemmaghami et al., 2003, PMID 14562106); and mass spectrometry of mouse
fibroblast cells (Schwanhäusser et al., 2011, PMID 21593866). Gene annotation: yeast – SGD, E.
coli – EcoliWiki, mouse – UniProt. Color code: yellow – translation, cyan – glycolysis, green –
chaperones. The sum quoted for each study is obtained by adding together all the absolute values reported in that study.
Prokaryotes (protein rank, protein, percentage, annotation)

E. coli – minimal media, Nat Biotechnol, Lu 2007 (total of 2-3×10⁶ proteins/cell, sum of proteins in reference is 2,500,000)
1. RplL, 4.4%, 50S ribosomal subunit (**)
2. TufB, 3.5%, EF-Tu, Elongation Factor-Translation (********)
3. AcpP, 3.0%, acyl carrier protein (ACP)
4. GroS, 2.6%, 10 kDa chaperonin (****)
5. GapA, 2.0%, glyceraldehyde 3-phosphate dehydrogenase-A (****)
6. MetE, 1.6%, Methionine synthase (**)
7. CspC, 1.6%, stress protein (**)
8. RplW, 1.5%, 50S ribosomal subunit
9. RpsP, 1.2%, 30S ribosomal subunit
10. Mdh, 1.2%, Component of malate dehydrogenase

E. coli – M9 minimal media, Science, Taniguchi 2010 (sum of proteins in reference is 95,000)
1. CspC, 8.3%, stress protein (**)
2. TufA, 3.6%, protein chain elongation factor EF-Tu (********)
3. RpsV, 3.3%, 30S ribosomal subunit
4. CspE, 3.2%, DNA-binding transcriptional repressor
5. DnaK, 2.5%, chaperone Hsp70
6. GapA, 2.5%, glyceraldehyde-3-phosphate dehydrogenase A (****)
7. TufB, 2.3%, protein chain elongation factor EF-Tu (********)
8. Rho, 2.3%, transcription termination factor
9. GroS, 2.2%, 10 kDa chaperonin (****)
10. GlyA, 1.7%, serine hydroxymethyltransferase

B. subtilis – minimal medium during exponential growth, Analytical Chemistry, Maass 2011 (sum of proteins in reference is 2,300,000)
1. TufA, 4.3%, Elongation factor Tu (********)
2. CspD, 4.0%, Cold shock protein CspD (**)
3. IlvC, 3.3%, Ketol-acid reductoisomerase (**)
4. AhpC, 3.0%, Alkyl hydroperoxide reductase subunit C (***)
5. YfmK, 2.5%, Uncharacterized N-acetyltransferase (**)
6. YheA, 2.0%, UPF0342 protein (**)
7. Icd, 1.8%, Isocitrate dehydrogenase [NADP], participates in mapk signaling pathway (**)
8. GroS, 1.8%, 10 kDa chaperonin (****)
9. SodA, 1.6%, Superoxide dismutase [Mn] (***)
10. TrxA, 1.5%, Thioredoxin (**)

S. aureus – synthetic medium during exponential growth, Anal Chem, Maass 2011 (sum of proteins in reference is 350,000)
1. Asp23, 7.1%, Alkaline shock protein 23 (**)
2. SodA, 6.9%, Superoxide dismutase [Mn/Fe] 1 (***)
3. CspA, 4.3%, Cold shock protein (**)
4. Tuf, 3.7%, Elongation factor Tu (********)
5. RplL, 2.9%, 50S ribosomal protein L7/L12 (**)
6. GapA1, 2.8%, Glyceraldehyde-3-phosphate dehydrogenase 1 (****)
7. Eno, 2.1%, Enolase (***)
8. (no name, locus SACOL2595), 1.8%, Putative uncharacterized protein
9. (no name, locus SACOL0427), 1.7%, Putative uncharacterized protein
10. AhpC, 1.4%, Alkyl hydroperoxide reductase subunit C (***)

Leptospira interrogans – EMJH (Ellinghausen-McCullough-Johnson-Harris) medium, Malmström 2009 (sum of proteins in reference is 820,000)
1. LipL32, 4.6%, external encapsulating structure
2. Peptidoglycan associated cytoplasmic membrane, 3.7%, external encapsulating structure
3. 60 kDa chaperonin (Protein Cpn60) (groEL protein) (Heat shock 58 kDa protein), 2.2%, nucleotide binding
4. Elongation factor Tu (EF-Tu), 1.7%, hydrolase activity (********)
5. LipL36, 1.7%, external encapsulating structure
6. Flagellin protein, 1.7%, flagellum
7. transcriptional regulator (ArsR family), 1.5%, transcription factor & regulators
8. Electron transfer flavoprotein alpha-subunit, 1.5%, nucleotide binding (***)
9. LipL41, 1.3%, external encapsulating structure
10. LipL21, 1.1%, external encapsulating structure

Eukaryotes (protein rank, protein, percentage, annotation)

S. cerevisiae – rich media, Nat Biotechnol, Lu 2007 (total of 5×10⁷ proteins/cell according to primary source, sum of proteins in reference is also 50,000,000)
1. ENO2, 6.2%, Enolase II
2. FBA1, 4.0%, Fructose 1,6-bisphosphate aldolase (***)
3. TDH3, 4.0%, Glyceraldehyde-3-phosphate dehydrogenase
4. PGK1, 3.8%, 3-phosphoglycerate kinase
5. ENO1, 3.6%, Enolase I (***)
6. PDC1, 2.6%, Major of three pyruvate decarboxylase isozymes
7. ADH1, 2.6%, Alcohol dehydrogenase
8. TEF2, 2.4%, Translational elongation factor EF-1 alpha
9. TDH2, 1.9%, Glyceraldehyde-3-phosphate dehydrogenase
10. CDC19, 1.8%, Pyruvate kinase

S. cerevisiae – minimal media, Nat Biotechnol, Lu 2007 (total of 5×10⁷ proteins/cell according to primary source, sum of proteins in reference is also 50,000,000)
1. ABM1, 4.6%, unknown function, required for normal microtubule organization
2. YMR181C, 4.2%, unknown function
3. YLR407W, 4.2%, unknown function
4. ORT1, 3.0%, Ornithine transporter of the mitochondrial inner membrane
5. YMR115W (SGD name: MGR3), 2.6%, Subunit of the mitochondrial (mt) i-AAA protease supercomplex
6. YIL077C, 2.2%, unknown function
7. YDR193W, 2.0%, Dubious open reading frame
8. DOA1, 1.8%, WD repeat protein
9. CCZ1, 1.6%, Protein involved in vacuolar assembly
10. RPS26A, 1.5%, small (40S) ribosomal subunit

S. cerevisiae – rich media, Nature, Ghaemmaghami 2003 (sum of proteins in reference is 47,000,000)
1. CWP2, 3.4%, Cell Wall Protein
2. PMA1, 2.8%, Plasma Membrane ATPase
3. FBA1, 2.1%, Fructose 1,6-bisphosphate aldolase (***)
4. ILV5, 1.9%, IsoLeucine-plus-Valine requiring
5. YEF3, 1.9%, Yeast Elongation Factor (translation)
6. HHF2, 1.4%, Histone H Four
7. RPP2B, 1.4%, Ribosomal Protein P2 Beta
8. HHF1, 1.1%, Histone H Four
9. SOD1, 1.1%, SuperOxide Dismutase
10. RPS26B, 1.1%, Ribosomal Protein of the Small subunit

M. musculus (NIH3T3 cells) – light (L) SILAC medium, Nature, Schwanhäusser et al., 2011 (sum of proteins in reference is 570,000,000)
1. ACTB, 2.8%, Actin, cytoplasmic 1
2. HIST1H4A, 2.6%, Histone H4
3. HIST1H2AF, 2.6%, Histone H2A type 1-F
4. HIST2H2BB, 1.9%, Histone H2B type 2-B
5. HIST1H3B, 1.5%, Histone H3.2
6. EEF1A1, 0.93%, Elongation factor 1-alpha 1 (translation)
7. RPS27A, 0.9%, Ubiquitin-40S ribosomal protein S27a
8. S100A4, 0.75%, Protein S100-A4 (similar to Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) isoform 1)
9. TUBB5, 0.75%, Tubulin beta-5 chain
10. ANXA2, 0.67%, Annexin A2