OPINION

Is proteomics heading in the wrong direction?

Lukas A. Huber

Nature Reviews Molecular Cell Biology, Volume 4, January 2003

Proteomics is now considered to be one of the most important ‘post-genome’ approaches to help us understand gene function. In fact, several genomics companies have launched large-scale proteomics projects, and have started to annotate the entire human proteome. The ‘holistic view’ painted by a human proteome project is seductive, but is it realistic?

“Proteome indicates the proteins expressed by a genome or tissue” (Marc Wilkins, 1994; BOX 1).
Proteomics is therefore any global analysis of changes in the quantities, and post-translational modifications, of all the proteins in cells, taking the genome sequence as a starting point. Growth, differentiation, senescence, environmental changes, genetic manipulation, or other events might bring about such changes. The main difference between genomics and proteomics is that the genome is a static collection of genes, whereas the proteome is not a concrete entity, but rather a dynamic collection of proteins that will differ from individual to individual, and even from cell to cell. Although it is meaningful to talk of ‘the human genome’ as a species-typical set of genes, on the basis of the definition above it is highly unlikely that there will be a single collection of proteins that can be defined as ‘the human proteome’. Instead, there will be many proteomes that are characteristic of specific cell types and disease states.

Proteomics is the application of evolving technologies (BOX 2) to analyse proteins on a large, ‘genomic’ scale to study protein-expression profiles, for example, to compare physiological and disease states. These technologies include two-dimensional (2D) gel electrophoresis, chromatography, mass spectrometry (MS), bioinformatics and protein ‘chips’. One of the first challenges for proteomics is to establish routine, reliable and efficient technologies for the acquisition and analysis of data. To fulfil these criteria, the technologies need to facilitate consistent sample preparation, automation and assimilation of the information generated.

Box 1 | The history of the term ‘proteome’

As Jon Cohen tells us in The proteomics payoff (REF. 36), before mid-1994 the word ‘proteome’ did not exist.
Cohen reveals that it was then that “Marc Wilkins, a student at Australia’s Macquarie University, struggled to find the right words while cobbling together a scientific paper to support his PhD thesis on rapidly identifying proteins. Wilkins found himself repeatedly writing, ‘all proteins expressed by a genome, cell or tissue’, a phrase he didn’t like. ‘This was cumbersome, inelegant and made for a lot of extra typing’, explains Wilkins, who now works at Sydney’s Proteome Systems. So he started playing with words that would communicate the protein equivalent of the genome. After discarding ‘proteinome’ and ‘protome’, he settled on proteome, ‘the one that seemed to work best and roll off the tongue nicely’. In September 1994, Wilkins referred to the proteome at a scientific conference in Italy, and the word stuck.”

Essentially, reproducible high-throughput technologies are required. The technology most commonly used to monitor changes in the expression of complex protein mixtures is still 2D-PAGE (REFS 1,2). Generally, computer analysis is then used to reveal the patterns of protein expression. Proteins of interest are then cut from the gel one by one, enzymatically chopped into fragments and fed into a mass spectrometer to generate a ‘mass fingerprint’ of the proteins’ fragments. From this fingerprint, the probable combination of peptide masses that comprises the protein of interest can be worked out, and this information can then be compared to the information in a genomics database to identify the corresponding DNA sequence.

Proteomics is a very young discipline and is used by different people in different ways. The technologies are exciting, but they still have considerable limitations. In this article, I discuss several important questions from the view of the cell biologist. What are the promises and pitfalls of proteomics? Which research questions offer the greatest promise for proteomics applications? And, what new proteomics methods do we need to achieve our goals?
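The mass-fingerprint identification described above can be sketched in a few lines of Python. This is a minimal illustration and not a method taken from the article: it assumes trypsin as the protease (cleaving after K or R unless the next residue is P), unmodified peptides, monoisotopic residue masses, a fixed mass tolerance, and a search against protein sequences rather than the DNA sequences mentioned in the text. The function names, the tolerance value and the two-entry toy database are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056  # one water is added per free peptide (H + OH termini)

# Hypothetical two-entry database, for illustration only.
TOY_DATABASE = {
    "protein_a": "MKWVTFISLLR",
    "protein_b": "GASPVKTCLNDR",
}


def tryptic_digest(sequence):
    """Cut after K or R, except when the next residue is P (assumed rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without K/R
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified, uncharged peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER_MASS


def fingerprint_score(observed_masses, sequence, tolerance=0.2):
    """Count observed masses that match a theoretical tryptic peptide."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(
        any(abs(mass - t) <= tolerance for t in theoretical)
        for mass in observed_masses
    )


def identify(observed_masses, database, tolerance=0.2):
    """Return the database entry with the most matching peptide masses."""
    return max(
        database,
        key=lambda name: fingerprint_score(
            observed_masses, database[name], tolerance
        ),
    )
```

Counting matched masses is only the simplest possible scoring choice. A realistic search would also have to allow for missed cleavages, modified residues and measurement error, which is part of why the low-abundance and modified proteins discussed later are hard to identify.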
The promise

In the post-Human Genome Organization (HUGO) era, there is a substantial movement of effort from genomics towards proteomics. Proteomics is the next step in the effort to uncover information about how genes are related to biological functions and disease states. There is also great interest in the power of proteomics to identify new targets for disease intervention and treatment, given that most drug targets are proteins. Knowledge of protein expression patterns can provide insights into potential toxic side effects during drug screening and can direct the optimization process. In addition, specific proteins can be identified as highly accurate and sensitive biomarkers for disease at a very early stage of disease onset, which ensures their usefulness in diagnosis and prognosis. By aiming to understand the structure and function of all the proteins in the body, proteomics promises to deliver potentially life-saving medical treatments that are targeted at the protein building blocks of every cell in every tissue.

Consequently, the international Human Proteome Organisation (HUPO) initiative was launched recently (for more information, see BOX 3). HUPO aims to help increase the awareness of proteomics across society and biomedicine, in particular the benefits that are offered by knowledge of the human proteome. As a global body, HUPO has the objective of fostering international cooperation between the research community and government and financial agencies, and of promoting large-scale proteome research, that is, the cataloguing and annotation of the entire human proteome.

The pitfalls

However, several scientists, including myself, are skeptical not only of the realization of a human proteome project, but also of its long-term goals. Present estimates of the number of genes in the human genome that are expressed in a particular cell type easily reach 10,000.
The actual number of proteins in the entire human body is expected to be many times greater. Thousands of chemical modifications are made to proteins after they have been expressed, which changes properties such as enzymatic activity, binding ability and how long proteins remain active. This myriad of modifications might give rise to 10–20 million chemically distinct polypeptides in a single tissue (BOX 4). Furthermore, the state of a protein changes over time and is dependent on many external stimuli. One of the main differences between genomics and proteomics is that proteomics does not deal with one static genome per organism, but with a nearly infinite number of proteomes. At present, we do not have a common and standardized gel matrix, which would enable us to reproducibly align protein patterns. Although 2D gels were invented in 1975 (REFS 1,2), the technology is still tedious and difficult. It did not develop and mature with the same breathtaking speed as other downstream technologies of proteomics, such as MS. And this is what causes the first serious problem in proteomics. How can we relate the rise or fall of the expression levels of proteins on 2D gels to the biology of a system when, to begin with, we can only see a small fraction of all the proteins present? Only after exhausting all kinds of time-consuming methodological ‘tricks’, such as subcellular fractionation3, affinity-purification of samples4 or the use of zoom gels5 (which are used in 2D-PAGE to cover narrow pH ranges and to give better resolution, as well as higher sensitivity), can low-copy-number proteins be detected. However, protein patterns per se can sometimes be influenced by the gel system that is applied. For example, artificial spots can be generated by protein modification during sample preparation. The chemistry of amino acids is, unfortunately, much more difficult to handle than that of nucleic acids. 
For proteins, there is no amplification step that is analogous to the polymerase chain reaction (PCR) method for gene amplification. This means that proteins present in small amounts are ‘muffled’ by highly abundant proteins, which we often refer to as ‘housekeeping’ proteins. In addition, although membrane proteins6,7 and basic proteins5 can be separated on 2D gels, hydrophobic properties and charges still have a strong impact on whether a protein migrates in 2D gels or not.

The tremendous speed of protein identification today and the fact that it might soon be as easy as measuring messenger RNAs have given rise to premature enthusiasm. Most companies so far are using more-or-less the same brute-force approach to determine which proteins are present in various tissues. Databases are swamped with information incredibly quickly; however, the databases mainly contain information on the most abundant and separable protein species (for example, cytoskeletal proteins, chaperones, endoplasmic-reticulum proteins, proteasome components and matrix proteins), because these are the species that are detected most commonly by classical 2D-PAGE and MS. They are present in every database, whereas regulatory proteins of low abundance (for example, GTPases, kinases and phosphatases) or ‘difficult’ proteins (for example, hydrophobic transmembrane receptors and basic nuclear proteins) are under-represented. In addition, each entry in a database (expression patterns and proteins catalogued) relates only to one particular situation, in one particular tissue, in one particular gel system. The latter point makes cross-laboratory correlation almost impossible, even when identical biological systems are used. This problem has always been the main hurdle in proteomics research, and it becomes particularly evident when screening available protein-expression profiles in 2D-gel databases through the internet (BOX 3; for example, see world 2D-PAGE at the ExPASy Molecular Biology server). Therefore, such catalogues hardly expand on the information that is already present in genome databases.

Box 2 | Technologies for proteomics studies

Proteomics applies many different technologies, and brief descriptions of some of these are provided below.
• Two-dimensional polyacrylamide gel electrophoresis (2D-PAGE). Electrophoresis is the separation of ions or proteins in an electric field, usually carried out with polyacrylamide gels as the matrix. In the first dimension, the proteins are separated by isoelectric focusing, whereas in the second dimension, they are separated on the basis of their molecular weight.
• Chromatography. A method for separating molecules on the basis of their different absorption and elution properties.
• Electrospray. An ionization method used in MS to generate ions.
• High-pressure liquid chromatography (HPLC). A chromatographic separation technique in which the sample is forced through a packed column of finely divided particles at high pressure.
• Isoelectric focusing. The electrophoretic migration of proteins in a pH gradient to the pH at which they have no net charge (the isoelectric point).
• Isotope coded affinity tag (ICAT). A method for quantifying differential protein expression using an ICAT reagent, HPLC and MS.
• Mass spectrometry (MS). A very accurate and sensitive technique that measures the mass of an ion in a vacuum.
• Matrix-assisted laser desorption ionization (MALDI). A method that is used to produce ions from solid-phase samples in small-molecule matrices that absorb energy from a laser beam.
• Microarray. An array of oligonucleotides that are immobilized on a surface. By defining the sequences that hybridize, this method can be used to analyse the expression levels of several genes.
• Multidimensional protein identification technology (MudPIT). A large-scale proteome analysis that uses multidimensional liquid chromatography, tandem MS and database searching using the SEQUEST algorithm.
• Phage display. This tool uses phages that have proteins displayed on their surface and identifies protein–protein interactions by screening phage libraries.
• Serial analysis of gene expression (SAGE). A method that uses tags and cloning techniques to analyse gene expression patterns.
• Subcellular fractionation techniques. The disruption of cells (by ‘breaking’ them under conditions that prevent their deterioration), followed by the separation of the mixed components and the isolation of the desired component using centrifugation.
• Tandem-affinity purification (TAP). A method for purifying complexes from different cellular compartments. It involves introducing the TAP tag into gene-specific cassettes.

Because of the nearly infinite number of proteomes, we will have to repeat the same procedure over and over again. This brings Sisyphus to mind: the ancient Greek gods condemned Sisyphus to rolling a rock ceaselessly to the top of a mountain, and each time he did so, the rock would fall back down the mountain under its own weight; the gods believed that no punishment was more dreadful than futile and hopeless labour.

An important consideration is how to systematically organize and analyse proteome data (BOX 3; see the ExPASy and European Bioinformatics Institute servers). The field of proteomics will soon experience the explosion of data that other fields, such as genomics and transcriptomics, have seen in recent years8. More than ever, data validation will be crucial for the successful establishment of qualified databases. This means that, if the tedious validation work is not carried out with dedication, databases will be swamped with inaccurate expression and interaction data, which would be worse than swamping them with redundant data.
In addition, the core characteristics of databases, such as data integration and access, will become vital issues for resources in proteomics-related bioinformatics8. Open and easy access to proteomics data will be fundamental to enable the scientific community to extract the greatest benefit from the data being generated. Proteomics projects will require coordination that is even more efficient than that for genome projects. This is where HUPO comes in. HUPO will need to coordinate initiatives aimed at resources, technology development, proteome informatics, and the maintenance of publicly available and qualified databases.

The successes

Open your web browser, log into PubMed and search for the term ‘genomics’ together with your personal favourite of the top 5% journals. Repeat this search, but this time replace ‘genomics’ with ‘proteomics’. You can repeat this again with ‘expression profiling’ or several other keyword combinations together with ‘proteomics’, and the overall impression will always be the same: there are only a few exceptional cases in proteomics that have made it into the ‘premier-league’ journals. Taking into account the considerable technical limitations of proteomics, it is not surprising that success stories in proteomics are rather rare, at least when these successes are measured in terms of so-called top publications. New and powerful genomics technologies, such as DNA microarrays or serial analysis of gene expression (SAGE), have made it possible to analyse the expression levels of several genes simultaneously, both in health and disease. In combination with proteomics, these technologies promised to revolutionize biology, in particular the area of molecular medicine. However, although RNA expression profiling has become a state-of-the-art technique with measurable success rates (for examples, see REFS 9–15), protein-expression profiling on a global level is only slowly catching on.
Examples of successful protein-expression profiling in health and disease are rare. For example, Celis et al.16 recently applied proteomics and immunohistochemistry to show tumour heterogeneity among urothelial papillomas, with the long-term goal of predicting prognosis. Compared with rather robust genomics technologies, such as DNA microarrays, the expenditure for proteomics expression profiling seems significantly higher. Celis et al.16, for example, could only carry out such studies because they could take advantage of their own experience and of annotated 2D-gel data that had been accumulated from epithelial cells in one gel system for more than 15 years17.

Box 3 | Useful web site links

• Human Proteome Organisation (HUPO): http://www.hupo.org/ Contains news, statements, tools and useful links.
• ExPASy Molecular Biology Server: http://www.expasy.ch A proteomics server with knowledge databases (for example, SWISS-PROT and index to the world 2D-PAGE server network), software tools and training opportunities.
• Proteomics server of the European Bioinformatics Institute (EBI): http://www.ebi.ac.uk/proteome/ Contains information on the statistical analysis of proteomes from eukaryotes, archaea and bacteria. In addition, the EBI toolbox area provides a comprehensive range of tools for the field of bioinformatics.
• American Society for Mass Spectrometry (ASMS): http://www.asms.org/index.php Contains society news, tutorials and discussion platforms.
• The RESID Database: http://pir.georgetown.edu/pirwww/dbinfo/resid.html A complete collection of annotations and structures for protein modifications.

Proteomics has, despite this negativity, been extremely successful when targeted to multiprotein complexes and subcellular organelles, and when it has been applied to large-scale protein–protein interaction screens. Examples of such successes are described in more detail below.
These rare and exceptional cases differ from the many other proteomics papers that appear weekly, as they did not merely provide expression-profile inventories; they took a leap forward in ascertaining the integration of cellular components in functionally targeted proteome analyses.

Analysing multiprotein complexes. Multiprotein complexes carry out most cellular processes, and the identification and analysis of their components provides an insight into how the proteome is organized into functional units. Rout et al.18 have established a comprehensive inventory of the molecular components of the nuclear pore complex (NPC). This complex has large pores that are embedded in the nuclear envelope and that allow the passage of proteins and RNA between the nucleus and the cytoplasm. Rout et al.18 classified all of the components (nucleoporins) of the yeast NPC. This involved identifying all of the proteins that were present in a highly enriched NPC fraction, determining which of these proteins were nucleoporins and localizing each nucleoporin in the NPC. Using these data, the authors presented a map of the molecular architecture of the yeast NPC and provided evidence for a Brownian-affinity gating mechanism for protein transport from the nucleus to the cytoplasm. These data, together with crystallography data, enabled a picture of the complete structure to be assembled and provided clues to biochemical functions that would not have been detectable from sequence analysis alone.

Recently, Houry et al.19 analysed the in vivo substrates of the chaperonin GroEL, which has an essential role in mediating protein folding in the cytosol of Escherichia coli. The authors identified a well-defined set of ~300 newly translated polypeptides (including essential components of the transcription and translation machinery and metabolic enzymes) that strongly interacted with GroEL.
About a third of the identified proteins were structurally unstable and repeatedly returned to GroEL for conformational maintenance. In addition, the identified GroEL substrates were found to be composed preferentially of two or more domains with αβ-folds that contain α-helices and buried β-sheets with extensive hydrophobic surfaces, which have an impact on the folding and aggregation properties of the identified substrates. Analysing subcellular organelle composition. A comprehensive proteomics analysis of human nucleoli was carried out recently using a combination of MS and sequence-database searches that included online analysis of the human genome sequence20. The authors of this study identified 271 proteins in the nucleoli, and showed that nucleoli have a surprisingly large protein complexity. Many new factors and different classes of proteins were found to be in this location, which supports the view that the nucleolus might carry out additional functions beyond its known role in ribosome-subunit biogenesis. This extensive proteomics analysis also showed for the first time that the protein composition of nucleoli can alter significantly in response to the metabolic state of the cell. Phagosomes are the key organelles in macrophages that provide these cells with the innate ability to participate in tissue remodelling, to clear apoptotic cells and to restrict the spread of intracellular pathogens. The establishment of a comprehensive 2D-gel database enabled Desjardins and colleagues21 to analyse how phagosome composition is modulated during phagolysosome biogenesis. Using this approach, the authors found that during this process hydrolases — enzymes that catalyse the hydrolysis of peptide bonds — are not delivered in bulk to phagosomes, but are acquired sequentially instead. 
In a follow-up study by the same group22, this proteome characterization also provided new insights into phagosomes as endoplasmic-reticulum-mediated entry sites for intracellular pathogens, regardless of their final trafficking in the host. This is one of the rare examples where new ‘text-book’ knowledge has been generated by a global characterization of an organelle proteome.

Detecting protein–protein interactions. So far, the generation of large-scale protein–protein interaction maps has relied on the yeast two-hybrid system, which detects binary interactions through the activation of reporter-gene expression. Two large-scale yeast two-hybrid screens23 were undertaken to identify protein–protein interactions between full-length open reading frames (ORFs) that were predicted from the Saccharomyces cerevisiae genome sequence. This approach resulted in the detection of 957 putative interactions that involved 1,004 S. cerevisiae proteins23.

Box 4 | Post-translational modifications of proteins

Proteins, once synthesized on the ribosomes, are subject to a multitude of modification steps such as amino- or carboxy-terminal cleavages, glycosylation, phosphorylation and sulphation. At present, more than 100 different types of post-translational modifications are known and many more are likely to be discovered (BOX 3; see the RESID Database). Consequently, there are many more proteins in the proteome than there are genes in the genome. “Thus the number of different protein molecules expressed by the human genome is probably closer to a million than to the hundred thousand generally considered by genome scientists.” (REF. 37).

Recently, two groups4,24, with slightly different strategies, embarked on a high-throughput analysis of multiprotein complexes in S. cerevisiae. Gavin et al.4 processed 1,739 genes, including 1,143 human orthologues of relevance to human biology, and purified 589 protein assemblies. The key to their work was tandem-affinity purification (TAP; FIG. 1; BOX 2). A cassette that encoded a so-called TAP tag, which consists of a calmodulin-binding peptide, a specific enzyme cleavage domain and Protein A from Staphylococcus aureus, was inserted into the cells being studied, and a tagged library was generated. The resulting fusion proteins, together with their binding partners, were then isolated from total cell lysates by their tag under mild conditions. Bioinformatic analysis of these assemblies defined 232 distinct multiprotein complexes and proposed new cellular roles for 344 proteins, which included 231 proteins with no previous functional annotation4. Beginning with 10% of predicted yeast proteins as bait, Ho et al.24 detected 3,617 proteins that associated with the bait, using high-throughput mass spectrometric protein complex identification. This number corresponds to more than half of the yeast ‘proteome’ (~6,000 protein-coding genes have been predicted)25, and numerous protein complexes were identified, which included many new interactions in various signalling pathways and in the DNA-damage response24.

Figure 1 | Tandem-affinity purification. The tandem-affinity-purification (TAP) tag consists of three components: a calmodulin-binding peptide, a tobacco etch virus (TEV) protease cleavage site and Protein A as an immunoglobulin G (IgG)-binding domain. Cells or organisms are generated that contain TAP-tagged protein(s). Extracts are then prepared under mild conditions and TAP is carried out. The first column consists of IgG beads. TEV protease cleaves the immobilized multiprotein complexes. Another round of binding is carried out on a second column that consists of calmodulin beads. The native complex is then eluted by chelating calcium using EGTA.

Tong et al.26 have developed a strategy that combines computational prediction of interactions from phage-display ligand consensus sequences and large-scale two-hybrid physical interaction tests. They first screened random peptide libraries by phage display to define consensus sequences for preferred ligands. On the basis of those consensus sequences, a computational protein–protein interaction network was derived. They then generated a second network using yeast two-hybrid screening for all the possible binding partners for each motif. Finally, the intersection of predicted and experimental networks was determined and the key interactions were, once again, experimentally tested for relevance. The Src-homology-3 (SH3) domain is a small conserved sequence of ~60 amino-acid residues, which binds to proline-rich sequences. It is involved in cell–cell communication and in signal transduction from the cell surface to the nucleus. Application of this strategy to yeast SH3 domains generated a phage-display network that contained 394 interactions between 206 proteins and a two-hybrid network containing 233 interactions between 145 proteins. Computational analysis identified 59 highly probable interactions that were common to both networks26.

The success rates of these different large-scale approaches for studying protein–protein interactions cannot be compared directly. However, the smallest common denominator for these approaches is the need for thorough bioinformatic analysis. Tong et al.26 identified key interactions by calculating the intersection of predicted and experimental networks, whereas the huge amount of data produced by large-scale yeast two-hybrid screens in S. cerevisiae23 gained its meaning through subsequent bioinformatic analyses. Possible functions were assigned to proteins on the basis of the known functions of their interacting partners27; the topological properties of interacting protein networks and their regulatory genetic network were addressed28; and the question of how the organization of protein networks affects the evolution of the proteins that comprise them was considered29. Together with sophisticated bioinformatics analyses, these interaction maps now provide fundamental biological information in the context of new approaches to drug discovery.

Methods for the future?

Although 2D-PAGE technology will, for some time, still be the main technology for protein display, especially because of the recent improvements in immobilized narrow pH gradients, chromatography-coupled MS approaches and gel-independent techniques are likely to replace 2D-PAGE technology in the future. Yates and colleagues30 have described an automated method for shotgun proteomics, which is known as multidimensional protein identification technology (MudPIT; FIG. 2), that combines multidimensional liquid chromatography with electrospray ionization tandem MS. Analogous to DNA sequencing, they named this method ‘shotgun’ sequencing because it can easily be automated and it improves the overall analysis of proteomes by identifying proteins of all functional and physical classes. The multidimensional liquid-chromatography method integrates a strong cation-exchange resin and a reversed-phase resin in a biphasic column. With this largely unbiased method, Yates and colleagues analysed the S.
cerevisiae strain BJ5460 that was grown to mid log-phase and produced the largest proteome analysis to date31. A total of 1,484 proteins were identified. Importantly, a dynamic range of 10,000:1 was shown between the most-abundant and least-abundant peptides in a complex peptide mixture, which is very similar to the dynamic range calculated by O’Farrell in his original 2D-gel publication2. Furthermore, they identified 131 proteins with 3–12 predicted transmembrane domains, which might have escaped identification with conventional gel-based approaches.

Figure 2 | Multidimensional protein identification technology. This method (which is known as MudPIT) combines multidimensional liquid chromatography with electrospray ionization tandem mass spectrometry. In a biphasic column, the first chromatography dimension consists of a strong cation exchanger and the second dimension consists of a reversed-phase resin. The column is loaded off-line with a complex protein mixture. Next, the high-pressure liquid chromatography (HPLC) gradient is applied, the proteins are eluted and then they are directly analysed by mass spectrometry and database searching. Modified with permission from REF. 31 © Macmillan Magazines Ltd.

Aebersold and colleagues32 have recently introduced selective labelling chemistries for the quantitative measurement of peptide and protein abundance. This method relies on the selective conjugation of cysteine thiol groups in proteins, followed by enzymatic digestion and quantitative analysis of the peptide conjugates by MS.
The isotope-coded and biotinylated affinity tags are molecular handles for the highly selective and reversible affinity capture of conjugates from complex biological mixtures, such as cell homogenates and subcellular organelles (FIG. 3). The isotope-coded affinity tag (ICAT) approach is highly accurate, because it is based on stable isotope dilution techniques, and it allows the rapid and accurate quantification of protein activity and content. Using this strategy, the Aebersold group32 compared protein expression in S. cerevisiae that was using either ethanol or galactose as a carbon source. The differences measured in protein expression correlated with known yeast metabolic function under glucose-repressed conditions. The ICAT approach should provide a widely applicable means to quantitatively compare protein expression in cells and tissues. However, a clear drawback of this method is the complexity of the generated peptides and, therefore, its still-limited suitability for large-scale biological problems.

At the From Genome to Proteome meeting in Siena, Italy (September 2002), R. Aebersold (Institute for Systems Biology, Seattle, WA, USA) provided a simulation to show that, with the present technology and throughput, proteomics is still very slow. He assumed that all yeast genes are concurrently expressed, that trypsin is used as the protease with one missed cleavage site allowed, and that all peptides are sequenced in a tandem mass spectrometer at a frequency of one peptide per second. Using these assumptions, a total of 6,118 yeast proteins would give rise to ~350,000 peptides after digestion, and, with the present capacity of liquid chromatography–MS/MS and subsequent data interpretation, the ICAT analysis would take 72 days. This clearly shows the need for pre-fractionation and for better bioinformatic tools for automated data collection and interpretation.
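The arithmetic behind an estimate of this kind can be sketched in a few lines of Python. The example below is a minimal illustration and not the simulation presented at the meeting. It assumes the common simplification that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P), allows one missed cleavage, and converts a peptide count into instrument time at one peptide per second. The function names and the short test sequence are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def cleave(sequence: str) -> list[str]:
    """Split a protein into fully cleaved tryptic fragments.

    Assumed rule: cut after K or R, except when the next residue is P.
    """
    fragments = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # C-terminal fragment that does not end in K or R
        fragments.append(sequence[start:])
    return fragments


def tryptic_peptides(sequence: str, missed_cleavages: int = 1) -> list[str]:
    """List peptides that span up to `missed_cleavages` uncut sites."""
    fragments = cleave(sequence)
    peptides = []
    for i in range(len(fragments)):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


def sequencing_days(n_peptides: int, peptides_per_second: float = 1.0) -> float:
    """Raw instrument time, in days, to sequence every peptide once."""
    return n_peptides / peptides_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    toy_protein = "MKTAYRPEKLLR"  # hypothetical sequence, for illustration only
    print(tryptic_peptides(toy_protein))
    print(f"{sequencing_days(350_000):.2f} days")
```

At one peptide per second, ~350,000 peptides correspond to about four days of continuous sequencing. The 72-day estimate quoted above is much larger than this raw figure, so it presumably covers steps that this sketch does not model.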
There are several highly promising techniques that are not based on MS. Protein arrays for studying protein–protein or protein–antibody interactions are in their early days, and many problems (such as protein solubility, folding and ideal binding milieu) have to be overcome. However, some very promising approaches are on their way, and only when they are put into practice will we discover their general feasibility. Because the yeast genome has been sequenced (and was found to contain more than 6,200 ORFs33), Snyder and colleagues34 were able to overproduce nearly all of the yeast proteins as glutathione-S-transferase fusion proteins and to purify these proteins. These proteins were then ‘printed’ onto slides at a high spatial density to form a yeast proteome microarray and were screened for their ability to interact with proteins and phospholipids. Snyder and colleagues34 identified many new calmodulin- and phospholipid-interacting proteins, and a common potential binding motif was identified for many of the calmodulin-binding proteins. In July 2002, at the meeting of the European Life Science Organization (ELSO) in Nice, M. Snyder (New Haven, CT, USA) reported an exciting extension of this approach that would enable high-throughput screening with various biochemical assays, such as phosphorylation assays, ATP and GTP binding assays, and protein–nucleic acid, protein–lipid and protein–protein interaction assays.

In addition to the human draft sequence, the complete genome sequences of an increasing number of model organisms are now available (E. coli and a large number of microorganisms, S. cerevisiae, Caenorhabditis elegans, Drosophila melanogaster and Arabidopsis thaliana). This sequence information is expected to revolutionize the way biological questions can be addressed. Molecular mechanisms should now be approachable on a more global scale in the context of nearly complete sets of genes, rather than by analysing genes individually.
Recently, the predicted ORFs of C. elegans were amplified by PCR from a highly representative cDNA library, cloned and then sequenced to generate ORF sequence tags35. The possibility of a complete or nearly complete set of ORFs (the ‘ORFeome’, by analogy with genome, transcriptome and proteome) has very important consequences for functional human proteomics approaches in general. Once such approaches are possible in humans, there will be a clear transition from large-scale protein-annotation projects, which are based mainly on MS, towards functional protein analysis on a global scale.

Figure 3 | Isotope-coded affinity tag methodology. Two populations of proteins from different cellular states or growth conditions are isolated, and each population is tagged with a different isotope-coded affinity tag (ICAT). The light version, with hydrogen, and the heavy version, with deuterium, have a mass difference of 8 Da. The ICAT-tagged protein populations are then pooled and the proteins are cut into smaller peptides. The peptides are affinity purified (using their biotin tag) and, finally, they are analysed quantitatively as ICAT pairs of peptides using tandem mass spectrometry (MS/MS). For a flash-animated version of the ICAT method, see the Online links section. m/z, mass-to-charge ratio.

Concluding remarks

As the saying goes, “these boots are made for walking”, and it now seems that the ‘proteomics boots’ fit, and that we even have an idea about the ‘right direction’ in which to walk. So far, the proven strength of proteomics has been in targeted and focused analyses. However, in the future, global functional approaches seem feasible, and it is here that cell biology is a rich area for proteome research.
By confirming the subcellular localization of proteins and their molecular interactions, we can learn a great deal about the functions of proteins, and that, after all, is the whole point of proteomics. Subcellular proteomes, protein-interaction networks and large signalling complexes provide unprecedented opportunities to unlock the mysteries of biological processes and to develop new rational therapeutics (proteomics will soon be competing with proven technologies for tasks such as target identification and validation in drug discovery). Showing that proteomics, in combination with cell biology, can deliver functional insights into systems as large as an organelle gives us hope that it will work in its promised sense (that is, on the level of entire proteomes) in the future.

Lukas A. Huber is at the Institute of Anatomy and Histology, Department of Histology and Molecular Cell Biology, University of Innsbruck, 6020 Innsbruck, Austria.
e-mail: Lukas.A.Huber@uibk.ac.at
doi:10.1038/nrm1007

1. Klose, J. Protein mapping by combined isoelectric focusing and electrophoresis of mouse tissues. A novel approach to testing for induced point mutations in mammals. Humangenetik 26, 231–243 (1975).
2. O’Farrell, P. H. High resolution two-dimensional electrophoresis of proteins. J. Biol. Chem. 250, 4007–4021 (1975).
3. Pasquali, C., Fialka, I. & Huber, L. A. Subcellular fractionation, electromigration analysis and mapping of organelles. J. Chromatogr. B Biomed. Sci. Appl. 722, 89–102 (1999).
4. Gavin, A. C. et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 415, 141–147 (2002).
5. Görg, A. et al. The current state of two-dimensional electrophoresis with immobilized pH gradients. Electrophoresis 21, 1037–1053 (2000).
6. Santoni, V., Molloy, M. & Rabilloud, T. Membrane proteins and proteomics: un amour impossible? Electrophoresis 21, 1054–1070 (2000).
7. Pasquali, C., Fialka, I. & Huber, L. A. Preparative two-dimensional gel electrophoresis of membrane proteins. Electrophoresis 18, 2573–2581 (1997).
8. Stupka, E. Large-scale open bioinformatics data resources. Curr. Opin. Mol. Ther. 4, 265–274 (2002).
9. Alizadeh, A. A. et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403, 503–511 (2000).
10. Dhanasekaran, S. M. et al. Delineation of prognostic biomarkers in prostate cancer. Nature 412, 822–826 (2001).
11. van’t Veer, L. J. et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature 415, 530–536 (2002).
12. Huang, Q. et al. The plasticity of dendritic cell responses to pathogens and their components. Science 294, 870–875 (2001).
13. Caron, H. et al. The human transcriptome map: clustering of highly expressed genes in chromosomal domains. Science 291, 1289–1292 (2001).
14. Hill, A. A., Hunter, C. P., Tsung, B. T., Tucker-Kellogg, G. & Brown, E. L. Genomic analysis of gene expression in C. elegans. Science 290, 809–812 (2000).
15. St Croix, B. et al. Genes expressed in human tumor endothelium. Science 289, 1197–1202 (2000).
16. Celis, J. E. et al. Proteomic strategies to reveal tumor heterogeneity among urothelial papillomas. Mol. Cell Proteomics 1, 269–279 (2002).
17. Celis, J. E. Toward establishing a database of human protein information derived from the analysis of two-dimensional gels. Leukemia 1, 706 (1987).
18. Rout, M. P. et al. The yeast nuclear pore complex: composition, architecture, and transport mechanism. J. Cell Biol. 148, 635–651 (2000).
19. Houry, W. A., Frishman, D., Eckerskorn, C., Lottspeich, F. & Hartl, F. U. Identification of in vivo substrates of the chaperonin GroEL. Nature 402, 147–154 (1999).
20. Andersen, J. S. et al. Directed proteomic analysis of the human nucleolus. Curr. Biol. 12, 1–11 (2002).
21. Garin, J. et al. The phagosome proteome: insight into phagosome functions. J. Cell Biol. 152, 165–180 (2001).
22. Gagnon, E. et al. Endoplasmic reticulum-mediated phagocytosis is a mechanism of entry into macrophages. Cell 110, 119–131 (2002).
23. Uetz, P. et al. A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae. Nature 403, 623–627 (2000).
24. Ho, Y. et al. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 415, 180–183 (2002).
25. Payne, W. E. & Garrels, J. I. Yeast Protein Database (YPD): a database for the complete proteome of Saccharomyces cerevisiae. Nucleic Acids Res. 25, 57–62 (1997).
26. Tong, A. H. et al. A combined experimental and computational strategy to define protein interaction networks for peptide recognition modules. Science 295, 321–324 (2002).
27. Schwikowski, B., Uetz, P. & Fields, S. A network of protein–protein interactions in yeast. Nature Biotechnol. 18, 1257–1261 (2000).
28. Maslov, S. & Sneppen, K. Specificity and stability in topology of protein networks. Science 296, 910–913 (2002).
29. Fraser, H. B., Hirsh, A. E., Steinmetz, L. M., Scharfe, C. & Feldman, M. W. Evolutionary rate in the protein interaction network. Science 296, 750–752 (2002).
30. Wolters, D. A., Washburn, M. P. & Yates, J. R. 3rd. An automated multidimensional protein identification technology for shotgun proteomics. Anal. Chem. 73, 5683–5690 (2001).
31. Washburn, M. P., Wolters, D. & Yates, J. R. 3rd. Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nature Biotechnol. 19, 242–247 (2001).
32. Gygi, S. P. et al. Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nature Biotechnol. 17, 994–999 (1999).
33. Skovgaard, M., Jensen, L. J., Brunak, S., Ussery, D. & Krogh, A. On the total number of genes and their length distribution in complete microbial genomes. Trends Genet. 17, 425–428 (2001).
34. Zhu, H. et al. Global analysis of protein activities using proteome chips. Science 293, 2101–2105 (2001).
35. Reboul, J. et al. Open-reading-frame sequence tags (OSTs) support the existence of at least 17,300 genes in C. elegans. Nature Genet. 27, 332–336 (2001).
36. Cohen, J. The proteomics payoff. Technol. Rev. October, 55–60 (2001).
37. Human Proteomics Initiative. ExPASy Molecular Biology Server [online], (June 2002) http://ca.expasy.org/sprot/hpi/hpi_desc.html (2002).

Acknowledgements
I would like to thank M. Glotzer for critically reading and discussing this manuscript with me. I would also like to thank Tommy Beck for helping with the web links.

Online links
FURTHER INFORMATION
Lukas Huber’s laboratory: http://www.uibk.ac.at/c/c5/c552/c55200/index.html
Isotope-Coded Affinity Tags (ICAT) Methodology, Flash Animation: http://www.bio.davidson.edu/Courses/genomics/ICAT/ICAT.html
PubMed: http://www.ncbi.nlm.nih.gov/entrez/

www.nature.com/reviews/molcellbio