Opinion: Is proteomics heading in the wrong direction?

PERSPECTIVES
OPINION
Is proteomics heading in the
wrong direction?
Lukas A. Huber
Proteomics is now considered to be one of
the most important ‘post-genome’
approaches to help us understand gene
function. In fact, several genomics
companies have launched large-scale
proteomics projects, and have started to
annotate the entire human proteome.
The ‘holistic view’ painted by a human
proteome project is seductive, but is it
realistic?
“Proteome indicates the proteins expressed by
a genome or tissue” — Marc Wilkins, 1994
(BOX 1). Proteomics is therefore any global
analysis of changes in the quantities, and
post-translational modifications, of all the
proteins in cells, taking the genome sequence
as a starting point. Growth, differentiation,
senescence, environmental changes, genetic
manipulation, or other events might bring
about such changes.
The main difference between genomics
and proteomics is that the genome is a static
collection of genes, whereas the proteome is
not a concrete entity, but rather a dynamic
collection of proteins that will differ from
individual to individual, and even from cell
to cell. Although it is meaningful to talk of
‘the human genome’ as a species-typical set
of genes, on the basis of the definition above
it is highly unlikely that there will be a single
collection of proteins that can be defined as
‘the human proteome’ — instead, there will
be many proteomes that are characteristic of
specific cell types and disease states.
Proteomics is the application of evolving
technologies (BOX 2) to analyse proteins on a
large, ‘genomic’ scale to study protein-expression profiles, for example, to
compare physiological and disease states. These technologies include
two-dimensional (2D) gel electrophoresis, chromatography, mass
spectrometry (MS), bioinformatics and protein ‘chips’. One of the first challenges for
proteomics is to establish routine, reliable
and efficient technologies for the acquisition
and analysis of data. To fulfil these criteria,
the technologies need to facilitate consistent
sample preparation, automation and assimilation of the information generated.
www.nature.com/reviews/molcellbio
Box 1 | The history of the term ‘proteome’
As Jon Cohen tells us in The proteomics payoff (REF. 36), before mid-1994 the word ‘proteome’ did not
exist. Cohen reveals that it was then that “Marc Wilkins, a student at Australia’s Macquarie
University, struggled to find the right words while cobbling together a scientific paper to support
his PhD thesis on rapidly identifying proteins. Wilkins found himself repeatedly writing, ‘all
proteins expressed by a genome, cell or tissue’, a phrase he didn’t like. ‘This was cumbersome,
inelegant and made for a lot of extra typing’, explains Wilkins, who now works at Sydney’s
Proteome Systems. So he started playing with words that would communicate the protein
equivalent of the genome. After discarding ‘proteinome’ and ‘protome’, he settled on proteome,
‘the one that seemed to work best and roll off the tongue nicely’.
In September 1994, Wilkins referred to the proteome at a scientific conference in Italy, and the
word stuck.”
Essentially, reproducible high-throughput
technologies are required.
The technology most commonly used to
monitor changes in the expression of complex protein mixtures is still 2D-PAGE (REFS 1,2).
Generally, computer analysis is then used to
reveal the patterns of protein expression.
Proteins of interest are then cut from the gel
one by one, enzymatically chopped into fragments and fed into a mass spectrometer to
generate a ‘mass fingerprint’ of the proteins’
fragments. From this fingerprint, the probable combination of peptide masses that comprises the protein of interest can be worked
out, and this information can then be compared to the information in a genomics database to identify the corresponding DNA
sequence.
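The fingerprint-matching step described above can be sketched in a few lines. This is a minimal illustration, not the algorithm used by any particular search engine: the toy database, the protein names and the 0.2 Da tolerance are assumptions made for the example, and the masses are neutral monoisotopic values (a real MALDI spectrum reports protonated ions).

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def tryptic_digest(sequence):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of a linear peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


def fingerprint_score(observed_masses, sequence, tolerance=0.2):
    """Count observed masses explained by the in-silico digest."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(
        any(abs(obs - theo) <= tolerance for theo in theoretical)
        for obs in observed_masses
    )


def identify(observed_masses, database, tolerance=0.2):
    """Return the database entry whose digest explains the most masses."""
    return max(
        database,
        key=lambda name: fingerprint_score(
            observed_masses, database[name], tolerance
        ),
    )


# Hypothetical two-entry database and a hypothetical measured fingerprint.
TOY_DATABASE = {
    "toy_protein_A": "MAGKLLDRPEWTK",
    "toy_protein_B": "GSHMKAAVR",
}
observed = [405.20, 1156.62]
best = identify(observed, TOY_DATABASE)
print(best)
```

Scoring by a simple count of matched masses keeps the sketch short; it ignores missed cleavages, modifications and the statistical weighting that production tools apply.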
Proteomics is a very young discipline and
is used by different people in different ways.
The technologies are exciting, but they still
have considerable limitations. In this article, I
discuss several important questions from the
view of the cell biologist. What are the
promises and pitfalls of proteomics? Which
research questions offer the greatest promise
for proteomics applications? And, what new
proteomics methods do we need to achieve
our goals?
The promise
In the post-Human Genome Organization
(HUGO) era, there is a substantial movement
of effort from genomics towards proteomics.
Proteomics is the next step in the effort to
uncover information about how genes are
related to biological functions and disease
states. There is also great interest in the power
of proteomics to identify new targets for disease intervention and treatment, given that
most drug targets are proteins. Knowledge of
protein expression patterns can provide
insights into potential toxic side effects during
drug screening and can direct the optimization process. In addition, specific proteins can
be identified as highly accurate and sensitive
NATURE REVIEWS | MOLECULAR CELL BIOLOGY
biomarkers for disease at a very early stage of
disease onset, which ensures their usefulness
in diagnosis and prognosis.
By aiming to understand the structure
and function of all the proteins in the
body, proteomics promises to deliver
potentially life-saving medical treatments
that are targeted at the protein building
blocks of every cell in every tissue.
Consequently, the international Human
Proteome Organisation (HUPO) initiative
was launched recently (for more information, see BOX 3). HUPO aims to help increase
the awareness of proteomics across society
and biomedicine — in particular, the benefits that are offered by knowledge of the
human proteome. As a global body, HUPO
has the objective of fostering international
cooperation between the research community and government and financial agencies,
and of promoting large-scale proteome
research — that is, the cataloguing and
annotation of the entire human proteome.
The pitfalls
However, several scientists, including myself,
are skeptical not only of the realization of a
human proteome project, but also of its
long-term goals. Present estimates of the
number of genes in the human genome that
are expressed in a particular cell type easily
reach 10,000. The actual number of proteins
in the entire human body is expected to be
many times greater. Thousands of chemical
modifications are made to proteins after
they have been expressed, which changes
properties such as enzymatic activity, binding ability and how long proteins remain
active. This myriad of modifications might
give rise to 10–20 million chemically distinct
polypeptides in a single tissue (BOX 4).
Furthermore, the state of a protein changes
over time and is dependent on many external stimuli. One of the main differences
between genomics and proteomics is that
proteomics does not deal with one static
genome per organism, but with a nearly infinite number of proteomes.
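A back-of-the-envelope calculation shows how quickly modifications multiply the number of distinct species. The sketch below assumes, purely for illustration, that every expressed protein carries the same number of independent sites that are each either modified or not; the figure of ten sites is a hypothetical choice, not a value taken from the text.

```python
def modified_forms(independent_sites):
    """Forms of one protein if each site is independently on or off."""
    return 2 ** independent_sites


def distinct_polypeptides(expressed_proteins, independent_sites):
    """Total chemically distinct species under the simplifying assumption."""
    return expressed_proteins * modified_forms(independent_sites)


# 10,000 expressed genes (the estimate quoted above) and an assumed
# ten independent on/off modification sites per protein.
total = distinct_polypeptides(10_000, 10)
print(total)  # 10240000
```

Under these assumptions, ten binary sites per protein are already enough to reach the order of magnitude of the 10–20 million estimate.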
At present, we do not have a common and
standardized gel matrix, which would enable
us to reproducibly align protein patterns.
Although 2D gels were invented in 1975 (REFS
1,2), the technology is still tedious and difficult. It did not develop and mature with the
same breathtaking speed as other downstream technologies of proteomics, such as
MS. And this is what causes the first serious
problem in proteomics. How can we relate the
rise or fall of the expression levels of proteins
on 2D gels to the biology of a system when, to
begin with, we can only see a small fraction of
all the proteins present? Only after exhausting
all kinds of time-consuming methodological
‘tricks’, such as subcellular fractionation (REF. 3),
affinity-purification of samples (REF. 4) or the use of
zoom gels (REF. 5; these are used in 2D-PAGE to
cover narrow pH ranges and to give better
resolution, as well as higher sensitivity), can
low-copy-number proteins be detected.
However, protein patterns per se can sometimes be influenced by the gel system that is
applied. For example, artificial spots can be
generated by protein modification during
sample preparation. The chemistry of amino
acids is, unfortunately, much more difficult to
handle than that of nucleic acids. For proteins, there is no amplification step that is
analogous to the polymerase chain reaction
(PCR) method for gene amplification. This
means that proteins present in small amounts
are ‘muffled’ by highly abundant proteins,
which we often refer to as ‘housekeeping’
proteins. In addition, although membrane
proteins (REFS 6,7) and basic proteins (REF. 5) can be separated on 2D gels, hydrophobic properties and
charges still have a strong impact on whether
a protein migrates in 2D gels or not.
The tremendous speed of protein identification today and the fact that it might
soon be as easy as measuring messenger
RNAs have given rise to premature enthusiasm. Most companies so far are using more-or-less the same brute-force approach to
determine which proteins are present in various tissues. Databases are swamped with
information incredibly quickly; however,
the databases mainly contain information
on the most abundant and separable protein
species (for example, cytoskeletal proteins,
chaperones, endoplasmic-reticulum proteins, proteasome components and matrix
proteins), because these are the species that
are detected most commonly by classical
2D-PAGE and MS. They are present in every
database, whereas regulatory proteins of low
abundance (for example, GTPases, kinases
and phosphatases) or ‘difficult’ proteins (for
VOLUME 4 | JANUARY 2003 | 75
Box 2 | Technologies for proteomics studies
Proteomics applies many different technologies, and brief descriptions of some of these are
provided below.
• Two-dimensional polyacrylamide gel electrophoresis (2D-PAGE). The separation of ions or
proteins in an electric field. This separation is usually carried out on polyacrylamide gels as a
matrix. In the first dimension, the proteins are separated by isoelectric focusing, whereas in the
second dimension, they are separated on the basis of their molecular weight.
• Chromatography. A method for separating molecules on the basis of their different absorption
and elution properties.
• Electrospray. An ionization method used in MS to generate ions.
• High-pressure liquid chromatography (HPLC). A chromatographic separation technique in
which the sample is forced through a packed column of finely divided particles at high pressure.
• Isoelectric focusing. The electrophoretic migration of proteins in a pH gradient to the pH at
which they have no net charge (the isoelectric point).
• Isotope coded affinity tag (ICAT). A method for quantifying differential protein expression using
an ICAT reagent, HPLC and MS.
• Mass spectrometry (MS). A very accurate and sensitive technique that measures the mass of an
ion in a vacuum.
• Matrix-assisted laser desorption ionization (MALDI). A method that is used to produce ions
from solid-phase samples in small-molecule matrices that absorb energy from a laser beam.
• Microarray. An array of oligonucleotides that are immobilized on a surface. By defining the
sequences that hybridize, this method can be used to analyse the expression levels of several
genes.
• Multidimensional protein identification technology (MudPIT). A large-scale proteome analysis
that uses multidimensional liquid chromatography, tandem MS and database searching using
the SEQUEST algorithm.
• Phage display. This tool uses phages that have proteins displayed on their surface and identifies
protein–protein interactions by screening phage libraries.
• Serial analysis of gene expression (SAGE). A method that uses tags and cloning techniques to
analyse gene expression patterns.
• Subcellular fractionation techniques. The disruption of cells (by ‘breaking’ them under
conditions that prevent their deterioration), followed by the separation of the mixed
components and the isolation of the desired component using centrifugation.
• Tandem-affinity purification (TAP). A method for purifying complexes from different cellular
compartments. It involves introducing the TAP tag into gene-specific cassettes.
example, hydrophobic transmembrane
receptors and basic nuclear proteins) are
under-represented. In addition, each entry
in a database (expression patterns and proteins catalogued) relates only to one particular situation, in one particular tissue, in
one particular gel system. The latter point
makes cross-laboratory correlation almost
impossible, even when identical biological
systems are used. This problem has always
been the main hurdle in proteomics
research, and it becomes particularly evident when screening available protein-expression profiles in 2D-gel databases
through the internet (BOX 3; for example, see
world 2D-PAGE at the ExPASy Molecular
Biology server). Therefore, such catalogues
hardly expand on the information that is
already present in genome databases.
Because of the nearly infinite number of
proteomes, we will have to repeat the same
procedure over and over again. This brings
Sisyphus to mind: the ancient Greek gods
condemned Sisyphus to rolling a rock ceaselessly to the top of a mountain, and each
time he did so, the rock would fall back
down the mountain under its own weight;
the gods believed that no punishment was
more dreadful than futile and hopeless
labour.
An important consideration is how to systematically organize and analyse proteome
data (BOX 3; see the ExPASy and European
Bioinformatics Institute servers). The field of
proteomics will soon experience the explosion of data that other fields, such as
genomics and transcriptomics, have seen in
recent years (REF. 8). More than ever, data validation
will be crucial for the successful establishment
of qualified databases. This means that, if the
tedious validation work is not carried out
with dedication, databases will be swamped
with inaccurate expression and interaction
data, which would be worse than swamping
them with redundant data. In addition, the
core characteristics of databases, such as data
integration and access, will become vital
issues for resources in proteomics-related
bioinformatics (REF. 8). Open and easy access to proteomics data will be fundamental to enable
the scientific community to extract the greatest benefit from the data being generated.
Proteomics projects will require coordination
that is even more efficient than that for
genome projects. This is where HUPO comes
in. HUPO will need to coordinate initiatives
aimed at resources, technology development,
proteome informatics, and the maintenance
of publicly available and qualified databases.
The successes
Open your web browser, log into PubMed and
search for the term ‘genomics’ together with
your personal favourite of the top 5% of journals. Repeat this search, but this time replace
‘genomics’ with ‘proteomics’. You can repeat
this again with ‘expression profiling’ or several
other keyword combinations together with
‘proteomics’, and the overall impression will
always be the same — there are only a few
exceptional cases in proteomics that have
made it into the ‘premier-league’ journals.
Taking into account the considerable technical
limitations of proteomics, it is not surprising
that success stories in proteomics are rather
rare, at least when these successes are measured
in terms of so-called top publications.
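The same comparison can be scripted rather than typed into a browser. The sketch below only builds the query URLs for the NCBI E-utilities ESearch endpoint and does not contact the server; the journal name and the choice of field tags are assumptions made for illustration.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def count_query_url(keyword, journal):
    """Build an ESearch URL that asks only for the number of hits."""
    term = f'{keyword}[Title/Abstract] AND "{journal}"[Journal]'
    params = {"db": "pubmed", "term": term, "rettype": "count"}
    return f"{ESEARCH}?{urlencode(params)}"


# Hypothetical comparison for one journal; substitute any favourite.
for keyword in ("genomics", "proteomics"):
    print(count_query_url(keyword, "Nature"))
```

Fetching each URL returns a small XML document whose count element holds the number of matching records, so the two keywords can be compared journal by journal.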
New and powerful genomics technologies,
such as DNA microarrays or serial analysis of
gene expression (SAGE), have made it possible to analyse the expression levels of several
genes simultaneously, both in health and disease. In combination with proteomics, these
technologies promised to revolutionize biology — in particular, the area of molecular
medicine. However, although RNA expression profiling has become a state-of-the-art
technique with measurable success rates (for
examples, see REFS 9–15), protein-expression
profiling on a global level is only slowly catching on. Examples of successful protein-expression profiling in health and disease are
rare. For example, Celis et al. (REF. 16) recently
applied proteomics and immunohistochemistry to show tumour heterogeneity among
urothelial papillomas, with the long-term goal
of predicting prognosis. Compared with
rather robust genomics technologies, such as
DNA microarrays, the expenditure for proteomics expression profiling seems significantly higher. Celis et al. (REF. 16), for example, could
only carry out such studies because they
could take advantage of their own experience
Box 3 | Useful web site links
• Human Proteome Organisation (HUPO): http://www.hupo.org/
Contains news, statements, tools and useful links.
• ExPASy Molecular Biology Server: http://www.expasy.ch
A proteomics server with knowledge databases (for example, SWISS-PROT and index to the
world 2D-PAGE server network), software tools and training opportunities.
• Proteomics server of the European Bioinformatics Institute (EBI):
http://www.ebi.ac.uk/proteome/
Contains information on the statistical analysis of proteomes from eukaryotes, archaea and
bacteria. In addition, the EBI toolbox area provides a comprehensive range of tools for the field
of bioinformatics.
• American Society for Mass Spectrometry (ASMS): http://www.asms.org/index.php
Contains society news, tutorials and discussion platforms.
• The RESID Database: http://pir.georgetown.edu/pirwww/dbinfo/resid.html
A complete collection of annotations and structures for protein modifications.
and of annotated 2D-gel data that had been
accumulated from epithelial cells in one gel
system for more than 15 years (REF. 17). Proteomics
has, despite this negativity, been extremely
successful when targeted to multiprotein
complexes and subcellular organelles, and
when it has been applied to large-scale protein–protein interaction screens.
Examples of such successes are
described in more detail below. These rare
and exceptional cases differ from the many
other proteomics papers that appear
weekly, as they did not merely provide
expression-profile inventories — they took
a leap forward in ascertaining the integration of cellular components in functionally
targeted proteome analyses.
Analysing multiprotein complexes. Multiprotein complexes carry out most cellular
processes, and the identification and analysis
of their components provides an insight into
how the proteome is organized into functional units. Rout et al. (REF. 18) have established a
comprehensive inventory of the molecular
components of the nuclear pore complex
(NPC). This complex has large pores that are
embedded in the nuclear envelope and that
allow the passage of proteins and RNA
between the nucleus and the cytoplasm. Rout
et al. (REF. 18) classified all of the components
(nucleoporins) of the yeast NPC. This
involved identifying all of the proteins that
were present in a highly enriched NPC fraction, determining which of these proteins
were nucleoporins and localizing each nucleoporin in the NPC. Using these data, the
authors presented a map of the molecular
architecture of the yeast NPC and provided
evidence for a Brownian-affinity gating
mechanism for protein transport from the
nucleus to the cytoplasm. These data, together
with crystallography data, enabled a picture
of the complete structure to be assembled and
provided clues to biochemical functions that
would not have been detectable from
sequence analysis alone.
Recently, Houry et al. (REF. 19) analysed the in vivo
substrates of the chaperonin GroEL, which
has an essential role in mediating protein folding in the cytosol of Escherichia coli. The
authors identified a well-defined set of ~300
newly translated polypeptides — including
essential components of the transcription and
translation machinery and metabolic enzymes
— that strongly interacted with GroEL. About
a third of the identified proteins were structurally unstable and repeatedly returned to
GroEL for conformational maintenance. In
addition, the identified GroEL substrates were
found to be composed preferentially of two or
more domains with αβ-folds that contain
α-helices and buried β-sheets with extensive
hydrophobic surfaces, which have an impact
on the folding and aggregation properties of
the identified substrates.
Analysing subcellular organelle composition. A
comprehensive proteomics analysis of human
nucleoli was carried out recently using a
combination of MS and sequence-database
searches that included online analysis of the
human genome sequence (REF. 20). The authors of this
study identified 271 proteins in the nucleoli,
and showed that nucleoli have a surprisingly
large protein complexity. Many new factors and
different classes of proteins were found to be in
this location, which supports the view that the
nucleolus might carry out additional functions
beyond its known role in ribosome-subunit
biogenesis. This extensive proteomics analysis
also showed for the first time that the protein
composition of nucleoli can alter significantly
in response to the metabolic state of the cell.
Phagosomes are the key organelles in
macrophages that provide these cells with the
innate ability to participate in tissue remodelling, to clear apoptotic cells and to restrict the
spread of intracellular pathogens. The establishment of a comprehensive 2D-gel database
enabled Desjardins and colleagues (REF. 21) to analyse
how phagosome composition is modulated
during phagolysosome biogenesis. Using this
approach, the authors found that during this
process hydrolases (enzymes that catalyse
the hydrolytic cleavage of chemical bonds, such as peptide bonds) are not
delivered in bulk to phagosomes, but are
acquired sequentially instead. In a follow-up
study by the same group (REF. 22), this proteome
characterization also provided new insights
into phagosomes as endoplasmic-reticulum-mediated entry sites for intracellular
pathogens, regardless of their final trafficking
in the host. This is one of the rare examples
where new ‘text-book’ knowledge has been
generated by a global characterization of an
organelle proteome.
Detecting protein–protein interactions. So far,
the generation of large-scale protein–protein
interaction maps has relied on the yeast two-hybrid system, which detects binary interactions through the activation of reporter-gene
expression. Two large-scale yeast two-hybrid
screens (REF. 23) were undertaken to identify protein–protein interactions between full-length
open reading frames (ORFs) that were predicted from the Saccharomyces cerevisiae
genome sequence. This approach resulted in
the detection of 957 putative interactions that
involved 1,004 S. cerevisiae proteins (REF. 23).
Recently, two groups (REFS 4,24), with slightly different strategies, embarked on a high-throughput
analysis of multiprotein complexes in S.
cerevisiae. Gavin et al. (REF. 4) processed 1,739 genes,
Box 4 | Post-translational modifications of proteins
Proteins, once synthesized on the ribosomes, are subject to a multitude of modification steps
such as amino- or carboxy-terminal cleavages, glycosylation, phosphorylation and sulphation.
At present, more than 100 different types of post-translational modifications are known and
many more are likely to be discovered (BOX 3; see the RESID Database). Consequently, there are
many more proteins in the proteome than there are genes in the genome. “Thus the number of
different protein molecules expressed by the human genome is probably closer to a million than
to the hundred thousand generally considered by genome scientists.” (REF. 37).
[Figure 1 labels: cell extract; bait protein with TAP tag (calmodulin-binding peptide, TEV protease cleavage site, Protein A); specific binding partner; contaminant; first affinity column (IgG beads); TEV protease cleavage; second affinity column (calmodulin beads); native elution (EGTA).]
Figure 1 | Tandem-affinity purification. The tandem-affinity-purification (TAP) tag consists of three
components: a calmodulin-binding peptide, a tobacco etch virus (TEV) protease cleavage site and Protein
A as an immunoglobulin G (IgG)-binding domain. Cells or organisms are generated that contain TAP-tagged protein(s). Extracts are then prepared under mild conditions and TAP is carried out. The first
column consists of IgG beads. TEV protease cleaves the immobilized multiprotein complexes. Another
round of binding is carried out on a second column that consists of calmodulin beads. The native complex
is then eluted by chelating calcium using EGTA.
sequences. It is involved in cell–cell communication and in signal transduction from the
cell surface to the nucleus. Application of this
strategy to yeast SH3 domains generated a
phage-display network that contained 394
interactions between 206 proteins and a twohybrid network containing 233 interactions
between 145 proteins. Computational analysis identified 59 highly probable interactions
that were common to both networks26.
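The intersection step lends itself to a short illustration. The following Python sketch assumes that each interaction can be treated as an unordered pair of proteins; the helper name `as_edges` and the protein labels are hypothetical placeholders, not data or code from Tong et al.

```python
def as_edges(pairs):
    """Normalize (a, b) pairs into a set of unordered edges."""
    return {frozenset(pair) for pair in pairs}

# Hypothetical toy networks; the labels are placeholders, not real results.
predicted_network = as_edges([("P1", "P2"), ("P2", "P3"), ("P4", "P5")])
two_hybrid_network = as_edges([("P2", "P1"), ("P3", "P6"), ("P5", "P4")])

# Interactions supported by both the predicted and the experimental network.
common = predicted_network & two_hybrid_network
common_sorted = sorted(tuple(sorted(edge)) for edge in common)
print(common_sorted)
```

Storing each interaction as a `frozenset` makes the comparison independent of which partner was used as bait, which is a simplification; a real analysis might want to keep that direction.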
The success rates of these different large-scale approaches for studying protein–protein
interactions cannot be compared directly.
However, the smallest common denominator
for these approaches is the need for thorough
bioinformatic analysis. Tong et al.26 identified
key interactions by calculating the intersection of predicted and experimental networks,
whereas the huge amount of data produced
by large-scale yeast two-hybrid screens in S.
cerevisiae23 gained its meaning through subsequent bioinformatic analyses. Possible functions were assigned to proteins on the basis of
the known functions of their interacting partners27; the topological properties of interacting protein networks and their regulatory
genetic network were addressed28; and the
question of how the organization of protein
networks affects the evolution of the proteins
that comprise them was considered29.
Together with sophisticated bioinformatics
analyses, these interaction maps now provide
fundamental biological information in the
context of new approaches to drug discovery.
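As a rough illustration of assigning a possible function from interaction partners, the sketch below takes a simple majority vote over the annotations of a protein's direct partners. The voting rule, the function name `predict_function` and all protein and function labels are assumptions made for this example; they are not necessarily the procedure used in ref. 27.

```python
from collections import Counter

def predict_function(protein, interactions, annotations):
    """Return the most frequent annotation among direct partners, or None."""
    votes = Counter()
    for a, b in interactions:
        if protein in (a, b):
            partner = b if a == protein else a
            if partner != protein:  # ignore self-interactions
                votes.update(annotations.get(partner, ()))
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical toy data; labels are placeholders.
interactions = [("X", "A"), ("X", "B"), ("C", "X"), ("A", "B")]
annotations = {"A": ["transport"], "B": ["transport"], "C": ["splicing"]}

prediction = predict_function("X", interactions, annotations)
print(prediction)
```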
Methods for the future?
including 1,143 human orthologues of relevance to human biology, and purified 589
protein assemblies. The key to their work was
tandem-affinity purification (TAP; FIG. 1; BOX 2).
A cassette that encoded a so-called TAP tag,
which consists of a calmodulin-binding peptide, a specific enzyme cleavage domain and
Protein A from Staphylococcus aureus, was
inserted into the cells being studied, and a
tagged library was generated. The resulting
fusion proteins, together with their binding
partners, were then isolated from total cell
lysates by their tag under mild conditions.
Bioinformatic analysis of these assemblies
defined 232 distinct multiprotein complexes
and proposed new cellular roles for 344 proteins, which included 231 proteins with no
previous functional annotation4.
Beginning with 10% of predicted yeast
proteins as bait, Ho et al.24 detected 3,617 proteins that associated with the bait, using high-throughput mass spectrometric protein
complex identification. This number corresponds to more than half of the yeast ‘proteome’
(~6,000 protein-coding genes have been predicted)25, and numerous protein complexes
were identified, which included many new
interactions in various signalling pathways
and in the DNA-damage response24.
Tong et al.26 have developed a strategy that
combines computational prediction of interactions from phage-display ligand consensus
sequences and large-scale two-hybrid physical
interaction tests. They first screened random
peptide libraries by phage display to define
consensus sequences for preferred ligands. On
the basis of those consensus sequences, a
computational protein–protein interaction
network was derived. They then generated a
second network using yeast two-hybrid
screening for all the possible binding partners
for each motif. Finally, the intersection of predicted and experimental networks was determined and the key interactions were, once
again, experimentally tested for relevance.
The Src-homology-3 (SH3) domain is a small
conserved sequence of ~60 amino-acid
residues, which binds to proline-rich
Although 2D-PAGE will remain the main technology
for protein display for some time, especially
because of the recent improvements in
immobilized narrow pH gradients,
chromatography-coupled MS approaches and
gel-independent techniques are likely to replace
it in the future.
Yates and colleagues30 have described an
automated method for shotgun proteomics,
which is known as multidimensional protein
identification technology (MudPIT; FIG. 2), that
combines multidimensional liquid chromatography with electrospray ionization tandem MS. Analogous to DNA sequencing, they
named this method ‘shotgun’ sequencing
because it can easily be automated and it
improves the overall analysis of proteomes by
identifying proteins of all functional and physical classes. The multidimensional liquid-chromatography method integrates a strong
cation-exchange resin and a reversed-phase
resin in a biphasic column. With this largely
unbiased method, Yates and colleagues
analysed the S. cerevisiae strain BJ5460 that was
www.nature.com/reviews/molcellbio
[Figure 2 schematic. Labelled elements: (1) off-line loading of a complex protein
mixture onto a column packed with strong cation exchanger and reversed-phase
material; (2) column inserted into the system, HPLC gradient, electrospray
ionization (kV), ion trap mass spectrometer, waste; database searching.]
Figure 2 | Multidimensional protein
identification technology. This method (which
is known as MudPIT) combines multidimensional
liquid chromatography with electrospray ionization
tandem mass spectrometry. In a biphasic column,
the first chromatography dimension consists of a
strong cation exchanger and the second
dimension consists of a reversed-phase resin.
The column is loaded off-line with a complex
protein mixture. Next, the high-pressure liquid
chromatography (HPLC) gradient is applied,
the proteins are eluted and then they are directly
analysed by mass spectrometry and database
searching. Modified with permission from REF. 31
© Macmillan Magazines Ltd.
grown to mid log-phase and produced the
largest proteome analysis to date31. A total of
1,484 proteins were identified. Importantly, a
dynamic ratio of 10,000:1 was shown between
the most-abundant and least-abundant peptides in a complex peptide mixture, which is
very similar to the dynamic range calculated by
O’Farrell in his original 2D-gel publication2.
Furthermore, they identified 131 proteins with
3–12 predicted transmembrane domains,
which might have escaped identification with
conventional gel-based approaches.
Aebersold and colleagues32 have recently
introduced selective labelling chemistries for
the quantitative measurement of peptide and
protein abundance. This method relies on the
selective conjugation of cysteine thiol groups
in proteins, followed by enzymatic digestion
and quantitative analysis of the peptide conjugates by MS. The isotope-coded and
biotinylated affinity tags are molecular handles for the highly selective and reversible
affinity capture of conjugates from complex
biological mixtures, such as cell homogenates
and subcellular organelles (FIG. 3). The isotope-coded affinity tag (ICAT) approach is
highly accurate, because it is based on stable
isotope dilution techniques, and it allows the
rapid and accurate quantification of protein
activity and content.
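The quantitative read-out of such an experiment can be sketched as follows. This is a minimal Python illustration, assuming that a light/heavy pair is separated by 8 mass units per labelled cysteine (divided by the charge on the m/z axis) and that relative abundance is taken as the ratio of the two peak intensities. The function name `pair_icat_peaks`, the tolerance and the toy spectrum are hypothetical and are not taken from ref. 32.

```python
MASS_SHIFT = 8.0  # heavy minus light tag, per labelled cysteine (value from the text)

def pair_icat_peaks(peaks, charge=1, n_cys=1, tol=0.01):
    """Pair light/heavy peaks and return (light m/z, heavy m/z, light/heavy ratio).

    peaks is a list of (m/z, intensity) tuples; tol is an assumed m/z tolerance.
    """
    delta = MASS_SHIFT * n_cys / charge  # expected spacing on the m/z axis
    ordered = sorted(peaks)
    pairs = []
    for i, (mz_light, int_light) in enumerate(ordered):
        for mz_heavy, int_heavy in ordered[i + 1:]:
            if abs((mz_heavy - mz_light) - delta) <= tol and int_heavy > 0:
                pairs.append((mz_light, mz_heavy, int_light / int_heavy))
    return pairs

# Hypothetical singly charged toy spectrum: one ICAT pair and one unrelated peak.
spectrum = [(500.0, 200.0), (508.0, 100.0), (620.3, 50.0)]
pairs = pair_icat_peaks(spectrum)
print(pairs)
```

A ratio above 1 would indicate that the peptide is more abundant in the sample carrying the light tag. Real spectra would also require handling of isotope envelopes and unknown charge states, which this sketch ignores.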
Using this strategy, the Aebersold group32
compared protein expression in S. cerevisiae
that was using either ethanol or galactose as a
NATURE REVIEWS | MOLECULAR CELL BIOLOGY
carbon source. The differences measured in
protein expression correlated with known
yeast metabolic function under glucose-repressed conditions. The ICAT approach
should provide a widely applicable means to
quantitatively compare protein expression in
cells and tissues. However, a clear drawback of
this method is the complexity of the generated peptides and, therefore, its still-limited
suitability for large-scale biological problems.
At the From Genome to Proteome meeting in
Siena, Italy (September 2002), R. Aebersold
(Institute for Systems Biology, Seattle, WA,
USA) provided a simulation to show that,
with the present technology and throughput,
proteomics is still very slow. He used the
assumptions that all yeast genes are concurrently expressed, that trypsin is used as the
protease (which allows for one missed cleavage site) and that all peptides are sequenced in
a tandem mass spectrometer at a frequency of
one peptide per second. Using these assumptions, a total of 6,118 yeast proteins would
give rise to ~350,000 peptides after digestion,
and, with the present capacity of liquid
chromatography–MS/MS and subsequent
data interpretation, the ICAT analysis would
take 72 days. This clearly shows the need for
pre-fractionation and for better bioinformatic tools for automated data collection
and interpretation.
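The arithmetic behind this simulation can be made explicit. The Python sketch below first shows what digestion with one allowed missed cleavage means on a toy sequence, and then compares pure acquisition time at one peptide per second with the quoted 72 days. The cleavage rule (after K or R, but not before P) is a common convention assumed here, and the toy sequence and function names are hypothetical; only the peptide count, the sequencing rate and the 72-day figure come from the text.

```python
SECONDS_PER_DAY = 86_400

def tryptic_fragments(seq):
    """Cleave after K or R unless the next residue is P (assumed convention)."""
    fragments, start = [], 0
    for i, residue in enumerate(seq):
        next_residue = seq[i + 1] if i + 1 < len(seq) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])
    return fragments

def with_missed_cleavages(fragments, missed=1):
    """Join up to `missed` + 1 adjacent fragments into each reported peptide."""
    peptides = []
    for i in range(len(fragments)):
        last = min(i + missed, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides

toy_sequence = "AKRPGKC"  # hypothetical sequence, for illustration only
peptides = with_missed_cleavages(tryptic_fragments(toy_sequence))

N_PEPTIDES = 350_000   # approximate peptide count quoted in the text
RATE_PER_SECOND = 1.0  # one peptide sequenced per second (stated assumption)
QUOTED_DAYS = 72       # total analysis time quoted in the text

acquisition_days = N_PEPTIDES / RATE_PER_SECOND / SECONDS_PER_DAY
implied_seconds_per_peptide = QUOTED_DAYS * SECONDS_PER_DAY / N_PEPTIDES

print(peptides)
print(f"{acquisition_days:.1f} days of pure acquisition")
print(f"{implied_seconds_per_peptide:.1f} s per peptide implied by the quoted total")
```

Under these assumptions, sequencing alone accounts for roughly 4 days, so the 72-day total corresponds to about 18 s per peptide overall. The text does not itemize the difference beyond attributing it to liquid chromatography–MS/MS capacity and subsequent data interpretation.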
There are several highly promising techniques that are not based on MS. Protein arrays
for studying protein–protein or protein–antibody interactions are in their early days, and
many problems — such as protein solubility,
folding and ideal binding milieu — have to be
overcome. However, some very promising
approaches are on their way, and only when
they are put into practice will we discover their
general feasibility. Because the yeast genome
has been sequenced (and was found to contain
more than 6,200 ORFs33), Snyder and colleagues34 were able to overproduce nearly all of
the yeast proteins as glutathione-S-transferase
fusion proteins and to purify these proteins.
These proteins were then ‘printed’ onto slides
at a high spatial density to form a yeast proteome microarray and were screened for their
ability to interact with proteins and phospholipids. Snyder and colleagues34 identified many
new calmodulin- and phospholipid-interacting proteins, and a common potential binding
motif was identified for many of the calmodulin-binding proteins. In July 2002, at the
meeting of the European Life Science
Organization (ELSO) in Nice, M. Snyder (New
Haven, CT, USA) reported an exciting extension of this approach that would enable us to
screen for various high-throughput biochemical assays, such as phosphorylation assays, ATP
and GTP binding assays, and protein–nucleic
acid, protein–lipid and protein–protein interaction assays.
In addition to the human draft sequence,
the complete genome sequences of an
increasing number of model organisms are
now available (E. coli and a large number of
microorganisms, S. cerevisiae, Caenorhabditis
elegans, Drosophila melanogaster and
Arabidopsis thaliana). This sequence information is expected to revolutionize the way biological questions can be addressed. Molecular
mechanisms should now be approachable on
a more global scale in the context of nearly
complete sets of genes, rather than by
analysing genes individually. Recently, the
predicted ORFs of C. elegans were amplified
by PCR from a highly representative cDNA
library, cloned and then sequenced to generate ORF sequence tags35. The possibility of a
complete or nearly complete set of ORFs —
the ‘ORFeome’, by analogy with genome,
transcriptome and proteome — has very
important consequences for functional
human proteomics approaches in general.
Once such approaches are possible in
humans, there will be a clear transition from
[Figure 3 schematic. Labelled elements: 'light' version (protein, linker, biotin)
and 'heavy' version (protein, deuterium linker, biotin); pool and cut;
ICAT-labelled peptides; affinity purification via the biotin tag; MS/MS, with
peptide pairs separated by Δ8 on the m/z axis.]
Figure 3 | Isotope-coded affinity tag
methodology. Two populations of proteins from
different cellular states or growth conditions are
isolated, and each population is tagged with a
different isotope-coded affinity tag (ICAT). The light
version, with hydrogen, and the heavy version,
with deuterium, have a mass difference of 8. The
ICAT-tagged protein populations are then pooled
and the proteins are cut into smaller peptides. The
peptides are affinity purified (using their biotin tag)
and, finally, they are analysed quantitatively as
ICAT pairs of peptides using tandem mass
spectrometry (MS/MS). For a flash-animated
version of the ICAT method, see the Online links
section. m/z, mass-to-charge ratio.
large-scale protein-annotation projects,
which are based mainly on MS, towards functional protein analysis on a global scale.
Concluding remarks
As the saying goes, “these boots are made for
walking”, and it now seems that the ‘proteomics
boots’ fit, and that we even have an idea about
the ‘right direction’ in which to walk. So far, the
proven strength of proteomics has been in targeted and focused analyses. However, in the
future, global functional approaches seem feasible, and it is here that cell biology is a rich area
for proteome research. By confirming the subcellular localization of proteins and their molecular interactions, we can learn a great deal
about the functions of proteins — and that,
after all, is the whole point of proteomics.
Subcellular proteomes, protein-interaction
networks and large signalling complexes provide unprecedented opportunities to unlock
the mysteries of biological processes and to
develop new rational therapeutics (proteomics
will soon be competing with proven technologies
in, for example, target identification and
validation in drug discovery). Showing that
proteomics, in combination with cell biology,
can deliver functional insights into systems as
large as an organelle gives us hope that it will
work in its promised sense, that is, on the
level of entire proteomes, in the future.
Lukas A. Huber is at the Institute of Anatomy and
Histology, Department of Histology and Molecular
Cell Biology, University of Innsbruck,
6020 Innsbruck, Austria.
e-mail: Lukas.A.Huber@uibk.ac.at
doi:10.1038/nrm1007
1. Klose, J. Protein mapping by combined isoelectric
focusing and electrophoresis of mouse tissues. A novel
approach to testing for induced point mutations in
mammals. Humangenetik 26, 231–243 (1975).
2. O’Farrell, P. H. High resolution two-dimensional
electrophoresis of proteins. J. Biol. Chem. 250,
4007–4021 (1975).
3. Pasquali, C., Fialka, I. & Huber, L. A. Subcellular
fractionation, electromigration analysis and mapping of
organelles. J. Chromatogr. B Biomed. Sci. Appl. 722,
89–102 (1999).
4. Gavin, A. C. et al. Functional organization of the yeast
proteome by systematic analysis of protein complexes.
Nature 415, 141–147 (2002).
5. Gorg, A. et al. The current state of two-dimensional
electrophoresis with immobilized pH gradients.
Electrophoresis 21, 1037–1053 (2000).
6. Santoni, V., Molloy, M. & Rabilloud, T. Membrane proteins
and proteomics: un amour impossible? Electrophoresis
21, 1054–1070 (2000).
7. Pasquali, C., Fialka, I. & Huber, L. A. Preparative
two-dimensional gel electrophoresis of membrane proteins.
Electrophoresis 18, 2573–2581 (1997).
8. Stupka, E. Large-scale open bioinformatics
data resources. Curr. Opin. Mol. Ther. 4, 265–274
(2002).
9. Alizadeh, A. A. et al. Distinct types of diffuse large B-cell
lymphoma identified by gene expression profiling. Nature
403, 503–511 (2000).
10. Dhanasekaran, S. M. et al. Delineation of prognostic
biomarkers in prostate cancer. Nature 412, 822–826
(2001).
11. van’t Veer, L. J. et al. Gene expression profiling predicts
clinical outcome of breast cancer. Nature 415, 530–536
(2002).
12. Huang, Q. et al. The plasticity of dendritic cell responses
to pathogens and their components. Science 294,
870–875 (2001).
13. Caron, H. et al. The human transcriptome map:
clustering of highly expressed genes in chromosomal
domains. Science 291, 1289–1292 (2001).
14. Hill, A. A., Hunter, C. P., Tsung, B. T., Tucker-Kellogg, G.
& Brown, E. L. Genomic analysis of gene expression in
C. elegans. Science 290, 809–812 (2000).
15. St Croix, B. et al. Genes expressed in human
tumor endothelium. Science 289, 1197–1202
(2000).
16. Celis, J. E. et al. Proteomic strategies to reveal tumor
heterogeneity among urothelial papillomas. Mol. Cell
Proteomics 1, 269–279 (2002).
17. Celis, J. E. Toward establishing a database of human
protein information derived from the analysis of
two-dimensional gels. Leukemia 1, 706 (1987).
18. Rout, M. P. et al. The yeast nuclear pore complex:
composition, architecture, and transport mechanism.
J. Cell Biol. 148, 635–651 (2000).
19. Houry, W. A., Frishman, D., Eckerskorn, C., Lottspeich, F.
& Hartl, F. U. Identification of in vivo substrates of the
chaperonin GroEL. Nature 402, 147–154 (1999).
20. Andersen, J. S. et al. Directed proteomic analysis of the
human nucleolus. Curr. Biol. 12, 1–11 (2002).
21. Garin, J. et al. The phagosome proteome: insight into
phagosome functions. J. Cell Biol. 152, 165–180 (2001).
22. Gagnon, E. et al. Endoplasmic reticulum-mediated
phagocytosis is a mechanism of entry into macrophages.
Cell 110, 119–131 (2002).
23. Uetz, P. et al. A comprehensive analysis of
protein–protein interactions in Saccharomyces
cerevisiae. Nature 403, 623–627 (2000).
24. Ho, Y. et al. Systematic identification of protein
complexes in Saccharomyces cerevisiae by mass
spectrometry. Nature 415, 180–183 (2002).
25. Payne, W. E. & Garrels, J. I. Yeast Protein Database
(YPD): a database for the complete proteome of
Saccharomyces cerevisiae. Nucleic Acids Res. 25,
57–62 (1997).
26. Tong, A. H. et al. A combined experimental and
computational strategy to define protein interaction
networks for peptide recognition modules. Science 295,
321–324 (2002).
27. Schwikowski, B., Uetz, P. & Fields, S. A network of
protein–protein interactions in yeast. Nature Biotechnol.
18, 1257–1261 (2000).
28. Maslov, S. & Sneppen, K. Specificity and stability in
topology of protein networks. Science 296, 910–913
(2002).
29. Fraser, H. B., Hirsh, A. E., Steinmetz, L. M., Scharfe, C. &
Feldman, M. W. Evolutionary rate in the protein
interaction network. Science 296, 750–752 (2002).
30. Wolters, D. A., Washburn, M. P. & Yates, J. R. 3rd. An
automated multidimensional protein identification
technology for shotgun proteomics. Anal. Chem. 73,
5683–5690 (2001).
31. Washburn, M. P., Wolters, D. & Yates, J. R. 3rd. Large-scale analysis of the yeast proteome by multidimensional
protein identification technology. Nature Biotechnol. 19,
242–247 (2001).
32. Gygi, S. P. et al. Quantitative analysis of complex protein
mixtures using isotope-coded affinity tags. Nature
Biotechnol. 17, 994–999 (1999).
33. Skovgaard, M., Jensen, L. J., Brunak, S., Ussery, D. &
Krogh, A. On the total number of genes and their length
distribution in complete microbial genomes. Trends
Genet. 17, 425–428 (2001).
34. Zhu, H. et al. Global analysis of protein activities using
proteome chips. Science 293, 2101–2105 (2001).
35. Reboul, J. et al. Open-reading-frame sequence tags
(OSTs) support the existence of at least 17,300 genes in
C. elegans. Nature Genet. 27, 332–336 (2001).
36. Cohen, J. The proteomics payoff. Technol. Rev. October,
55–60 (2001).
37. Human Proteomics Initiative. ExPASy Molecular Biology
Server [online], (June 2002)
http://ca.expasy.org/sprot/hpi/hpi_desc.html (2002).
Acknowledgements
I would like to thank M. Glotzer for critically reading and discussing
this manuscript with me. I would also like to thank Tommy Beck
for helping with the web links.
Online links
FURTHER INFORMATION
Lukas Huber’s laboratory:
http://www.uibk.ac.at/c/c5/c552/c55200/index.html
Isotope-Coded Affinity Tags (ICAT) Methodology — Flash
Animation:
http://www.bio.davidson.edu/Courses/genomics/ICAT/ICAT.html
PubMed: http://www.ncbi.nlm.nih.gov/entrez/
Access to this interactive links box is free online.